OpenShift

This chapter documents the OpenShift setup in Phoenix.

Instances

Currently, two instances are deployed: Staging and Production. Both share the same configuration: 3 masters, 3 nodes and a load balancer handling both API/UI and application traffic.

API endpoints

Instance     API endpoint                                First master                           Note
Production   https://shift.ovirt.org:8443                shift-m01.phx.ovirt.org
Staging      https://staging-shift.phx.ovirt.org:8443    staging-shift-master01.phx.ovirt.org   API reachable via OpenVPN only

Node hierarchy

Nodes are arranged using region and zone labels to split workloads:

region    zone     Workload
infra     default  infra pods needed for cluster operation (registry, etc.)
primary   prod     production pods with user apps
primary   logs     logging infrastructure
primary   ci       CI pods
external  ci       CI pods running on bare-metal hosts located outside of PHX

The ci zone also contains a type label to better utilize node capacity:

type                 Description
vm                   VM node for CI jobs that don't need nested virtualization
bare-metal           Bare metal node located in PHX
bare-metal-external  Bare metal node located outside of PHX

Every new project should be configured with a default node selector based on one of the above labels, at least the zone label.

Remote access using oc

External authentication is used, so to log in remotely with the oc command-line tool, first authenticate in the UI, click on the username in the top right corner and select "Copy Login Command". This generates an authentication token and copies the complete login command to the clipboard.
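
The copied command has the following form (the token is a placeholder generated per session):

oc login https://shift.ovirt.org:8443 --token=<GENERATED_TOKEN>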

Administrative console

To perform administrative tasks on the cluster, such as upgrades and permission modifications, please log in as root to the first master node indicated in the table above. All changes should be tested on Staging first.

Creating a project

To create a new project called myprod that will run on nodes in the prod zone, run the following command:

oc adm new-project myprod --node-selector='zone=prod'
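
The default node selector is stored as the openshift.io/node-selector annotation on the project's namespace; a minimal sketch for inspecting (and, if needed, correcting) it on the myprod project created above:

oc get namespace myprod -o yaml | grep node-selector
oc annotate namespace myprod openshift.io/node-selector='zone=prod' --overwrite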

Adding a new user

Authentication happens using Google Auth, so anyone can log in. For this reason, a new user cannot do anything by default, and permissions must be granted explicitly, for example to create projects. To do that, first ask the new user to log into the UI so that a user mapping is created. Then list users to confirm the new user's email is visible:

oc get users

Single project access

To provide access to an existing project, run the following command:

oc adm policy add-role-to-user admin newuser@test.com -n NAME_OF_EXISTING_PROJECT
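
If full project administration is not needed, the less privileged edit or view roles can be granted the same way; a sketch using the same placeholders:

oc adm policy add-role-to-user edit newuser@test.com -n NAME_OF_EXISTING_PROJECT
oc adm policy add-role-to-user view newuser@test.com -n NAME_OF_EXISTING_PROJECT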

Project creation permission

To allow the new user to create projects, add the self-provisioner role:

oc adm policy add-cluster-role-to-user self-provisioner newuser@test.com
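
Should the permission need to be revoked later, the role can be removed again; a sketch:

oc adm policy remove-cluster-role-from-user self-provisioner newuser@test.com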

Cluster admin role

In rare cases when a user needs to have instance-wide admin access, add the cluster-admin role:

oc adm policy add-cluster-role-to-user cluster-admin newadmin@test.com
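
Because this grants unrestricted access to the whole instance, it is worth reviewing the existing bindings occasionally and removing the role when it is no longer needed; a sketch:

oc get clusterrolebindings | grep cluster-admin
oc adm policy remove-cluster-role-from-user cluster-admin newadmin@test.com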

For more info, check out the official docs on user and role management.

Managing persistent storage

Persistent volumes are used to save data across pod restarts and are provisioned manually. To view existing volumes and their states, run:

oc get pv

The STATUS column shows "Bound" for volumes that are currently used by pods.

To add a new volume, create a new YAML file listing the name, size and NFS path to use, as in the sample below. More info is provided in the official docs.

A sample persistent volume definition is presented below:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: new-pv-name
spec:
  capacity:
    storage: 4Gi
  accessModes:
  - ReadWriteOnce
  nfs:
    path: /nfs/export/path
    server: NFS_SERVER_IP
  persistentVolumeReclaimPolicy: Recycle
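
Assuming the definition above is saved as new-pv-name.yaml, the volume can then be created and verified; a sketch:

oc create -f new-pv-name.yaml
oc get pv new-pv-name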

Upgrading an instance

The Ansible hosts file and playbooks are stored on the first master. The playbooks live in /root/openshift-ansible; to update them, run a git pull in that directory.
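
A minimal sketch:

cd /root/openshift-ansible
git pull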

To perform maintenance tasks please follow the official docs, testing them on Staging first.

Adding a node

To add a new node to the cluster, first make sure that the following preparations are in place:

  • CentOS 7 installed and up-to-date
  • docker installed, overlay2 storage configured (default on CentOS 7)
  • NetworkManager installed and enabled
  • firewalld installed and enabled
  • SELinux set to Enforcing mode
  • the first master's SSH pubkey is installed on the node
  • if the node is external, ensure it can connect back to the following services in PHX (see the connectivity check after this list):
      • apiserver endpoint: https://shift-int.phx.ovirt.org:8443
      • SDN on infra nodes: VxLAN (UDP/4789) open for the PHX public subnet 66.187.230.0/25
  • if the node is used for kubevirt, ensure it has access to templates.ovirt.org/kubevirt
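
For external nodes, reachability of the apiserver endpoint can be checked from the node itself before running the scale-up playbook; a minimal sketch, assuming curl is available on the node (any HTTP response, even an authentication error, already proves network connectivity):

curl -k https://shift-int.phx.ovirt.org:8443/healthz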

Connect to the first master and update the Ansible hosts file /etc/ansible/hosts, adding the new node to the [new_nodes] section.

Now run the node scale-up playbook:

ansible-playbook /root/openshift-ansible/playbooks/openshift-node/scaleup.yml

Ensure the playbook completes without errors. Verify the node is added to the cluster:

oc get nodes

If the node is present in the list and its status is "Ready", the process is complete.
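
Once the node is Ready, it still needs the region/zone (and, for CI nodes, type) labels described in the node hierarchy above so that project workloads are scheduled on it; a sketch with a hypothetical node name and label values:

oc label node new-node.phx.ovirt.org region=primary zone=ci type=vm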

SSL

SSL is managed using openshift-acme, an automated ACME controller.

Enabling openshift-acme on a route

The controller will only act on routes that have it explicitly enabled to avoid abuse and certificate requests for non-existing domains. The following annotation needs to be added to a route definition:

metadata:
  annotations:
    kubernetes.io/tls-acme: "true"

Alternatively, patch the route using the CLI:

oc patch route ROUTE_NAME -p '{"metadata":{"annotations":{"kubernetes.io/tls-acme":"true"}}}'

This will instruct the controller to generate a new certificate and install it on the route. Upon expiration the controller will renew the certificate automatically.
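
To confirm that the controller has processed the route, its TLS configuration can be inspected; a sketch (ROUTE_NAME as above):

oc get route ROUTE_NAME -o yaml

Once the certificate has been issued, the spec.tls section of the route should contain the generated certificate and key.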

Deploying openshift-acme

Standard upstream instructions can be used to deploy openshift-acme after a reinstall:

oc new-project acme
oc create -f https://raw.githubusercontent.com/tnozicka/openshift-acme/master/deploy/letsencrypt-live/cluster-wide/{clusterrole,serviceaccount,imagestream,deployment}.yaml -n acme
oc adm policy add-cluster-role-to-user openshift-acme -z openshift-acme -n acme

The last step grants the service account the access permissions required to read routes and modify them by adding the generated certificates.

Troubleshooting certificate renewal

The controller runs as a pod in the acme namespace. In case of issues ensure that the pod is running and review its logs for further information.
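
A minimal sketch, assuming the deployment keeps the upstream name openshift-acme:

oc get pods -n acme
oc logs deployment/openshift-acme -n acme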