oVirt Hosts

All oVirt hosts in PHX have CentOS 7 installed with a hardware RAID5 setup and bonding on all interfaces.

Hosts are split across datacenters. There is one that hosts the hosted engine and production VMs while others host CI workloads.

PHX oVirt datacenter organization

There are several datacenters defined: Production for critical VMs that uses shared storage and local datacenters to host CI workloads.

The Production datacenter consists of three hosts which are located in the DMZ VLAN and have static IPs defined:

Host	IP
ovirt-srv01	66.187.230.3
ovirt-srv02	66.187.230.4
ovirt-srv03	66.187.230.5

These hosts are connected to the Storage array

Other datacenters are of the Local Storage type so there is one host per datacenter. These reside in the infra VLAN with IPs also assigned statically according to the hostname:

Host	IP
ovirt-srv09	172.19.11.9
ovirt-srv10	172.19.11.10
ovirt-srv11	172.19.11.11
ovirt-srv12	172.19.11.12
ovirt-srv13	172.19.11.13
ovirt-srv14	172.19.11.14

There is also two POWER8 hosts in use by oVirt hosting ppc64le VMs:

Host	FQDN
ovirt-srv15	ovirt-srv15.phx.ovirt.org
ovirt-srv16	ovirt-srv16.phx.ovirt.org

See the network layout for more details about PHX VLANs.

Production VMs

VMs in this datacenter can be installed through Foreman or using templates imported from Glance.

Some of the vms that are located in the Production datacenter:

foreman: Foreman master
foreman-phx: Foreman proxy serving the phoenix network, includes DHCP, TFTP and DNS services. Also serves (or will) as DNS for the network.
HostedEngine: VM with the hosted engine, is not actually managed by itself but by the hosted engine services.
resources02-phx-ovirt-org: Frontend to serve repositories in resources.ovirt.org. It's connected to a special shared disk where the repos are stored, so it's easy to plug-unplug it from the vm if need upgrading or anything.
proxy-phx-ovirt-org: This will be the local network squid proxy used to conserve traffic and increase speed when building with mock.
gw02.phx.ovirt.org: PHX gateway for routing internal VLANs

Jenkins VMs

The jenkins local DCs have all the slaves and templates used to build them. The amount and oses/distros varies often but the organization should be quite stable.

The slaves are named following the pattern:

vm${NUMBER}

The number is used only to distinguish between the vms from one another so it's only requirement is to be unique.

Currently the number ranges are used as follows:

first VM	last VM	distro
vm0001	vm0049	el7
vm0050	vm0063	fedora
vm0064	vm0099	el7
vm0100	vm0199	fedora
vm0200	vm0299	el7

These are located in the workers VLAN and have IPs assigned via DHCP based on the MAC address used during VM creation. Some examples:

VM	MAC	IP
vm0001	00:16:3e:11:00:01	172.19.12.1
vm0100	00:16:3e:11:01:00	172.19.12.100
vm0222	00:16:3e:11:02:22	172.19.12.222
vm1001	00:16:3e:11:10:01	172.19.15.233

The workers VLAN has a /22 subnet assigned so it can contain up to 1024 hosts. IPs in this subnet are internal and are not reachable from the outside.

The templates are named the same way the slaves are, but instead of using the vm${NUMBER} suffix you only have two suffixes, -base and -worker. The -base template (sometimes you'll see also a vm with that name, used to update the template) is a template you can use to build any server, it has only the base foreman hostgroup applied. The -worker template has the cloud-init config defined to install software to act as a Jenkins slave.

Also keep in mind that puppet may be run again by the foreman finisher script when creating a new machine to make sure to apply the latest puppet manifests and configurations.

Network configuration

All interfaces are bonded and the first one has PXE enabled. If a rebuild is needed, use the first interface and then bond using this "custom" bondiing mode in the oVirt Admin Interface.

"mode=4 miimon=100 lacp_rate=1"

This will ensure the keepalive rate matches that on the switch.

Hosted engine management

The three first hosts (when writing this), ovirt-srv01, ovirt-srv02 and ovirt-srv03 are the ones that manage the hosted engine vm. That hosted engine is not handled by itself but by a couple of services and scripts installed bu the hosted-engine rpms.

To check the current status of the hosted engine cluster, you can run from any of those hosts:

[root@ovirt-srv01 ~]# hosted-engine --vm-status


--== Host 1 status ==--

Status up-to-date                  : True
Hostname                           : 66.187.230.3
Host ID                            : 1
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 2400
Local maintenance                  : False
Host timestamp                     : 1415642612
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=1415642612 (Mon Nov 10 11:03:32 2014)
    host-id=1
    score=2400
    maintenance=False
    state=EngineUp


--== Host 2 status ==--

Status up-to-date                  : True
Hostname                           : 66.187.230.4
Host ID                            : 2
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 2400
Local maintenance                  : False
Host timestamp                     : 1415642616
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=1415642616 (Mon Nov 10 11:03:36 2014)
    host-id=2
    score=2400
    maintenance=False
    state=EngineDown


--== Host 3 status ==--

Status up-to-date                  : True
Hostname                           : ovirt-srv03.ovirt.org
Host ID                            : 3
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 2400
Local maintenance                  : False
Host timestamp                     : 1415642615
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=1415642615 (Mon Nov 10 11:03:35 2014)
    host-id=3
    score=2400
    maintenance=False
    state=EngineDown

You can see that the engine is running only on one of the hosts. You can set one host into maintenance mode executing:

[root@ovirt-srv01 ~]# hosted-engine --set-maintenance=local

From the selected host. You can also handle the vm engine with hoset-endine command (don't do it through the engine ui).

Tips and Tricks

Strange routing/network issues

For example, once saw that ping was unable to resolve any names, while dig/nslookup worked perfectly, that was caused by having wrong custom routing rules in a routing table aside from the main one, to see all the routing rules you can type:

ip route show table all

Those were defined in the /etc/network-scripts/rules-ovirtmgmt file.

VDSM did not create the ovirtmgmt libvirt network

In one of the hosts, after messing the network, vdsm did not automatically create the ovirtmgmt network in the libvirt setting, you can create it manually by:

$ echo <<EOC > ovirtmgmt_net.xml
<network>
  <name>vdsm-ovirtmgmt</name>
  <forward mode='bridge'/>
  <bridge name='ovirtmgmt'/>
</network>
$ virsh -c qemu:///system
user: vdsm@ovirt
pass: shibboleth
(virsh)$ net-create ovirtmgmt_net.xml
# this creates the network in non-persistent mode, to force persistent we can 
# just edit it and add a newline at the end
(virsh)$ net-edit ovirtmgmt
(virsh)$ net-autostart ovirtmgmt