oVirt Hosts

All the hosts have a server installation of Fedora 19, with a hardware RAID5 setup and bonding on all interfaces.

The hosts are split into two groups: those that run the hosted engine, and all the others. Right now one of the hosts (ovirt-srv08.ovirt.org) is also reserved for the new integration testing framework.

Network configuration

Due to a restriction of the bonding drivers on Fedora 19, the network bond interface has to be bond1 instead of bond0, so the relevant networking configuration files end up as:

[root@ovirt-srv01 ~]# cat /etc/sysconfig/network
HOSTNAME=ovirt-srv01
GATEWAY=66.187.230.126
[root@ovirt-srv01 ~]# cat /etc/sysconfig/network-scripts/ifcfg-em1
# Generated by VDSM version 4.14.9-0.fc19
DEVICE=em1
ONBOOT=yes
HWADDR=f8:bc:12:3b:4e:08
NM_CONTROLLED=no
SLAVE=yes
MASTER=bond1
USERCTL=no


[root@ovirt-srv01 ~]# cat /etc/sysconfig/network-scripts/ifcfg-bond1
DEVICE=bond1
ONBOOT=yes
BRIDGE=ovirtmgmt
NM_CONTROLLED=no
STP=no
HOTPLUG=no
BONDING_OPTS="mode=4 miimon=100 lacp_rate=1"

Note the special BONDING_OPTS, which set the bonding mode (mode=4, i.e. 802.3ad/LACP) and the LACP rate (lacp_rate=1, i.e. fast) to match what is configured on the switch.
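
To confirm that the bond actually negotiated with those options, the kernel bonding driver exposes its state under /proc/net/bonding/. The following is a minimal sketch; the helper name bond_summary is made up for illustration and is not part of any installed tooling.

```shell
# bond_summary [FILE]
# Print the bonding mode, link status and LACP rate lines from a bonding
# status file. FILE defaults to bond1's status file, as used on these hosts.
bond_summary() {
    grep -E '^(Bonding Mode|MII Status|LACP rate):' "${1:-/proc/net/bonding/bond1}"
}

# Usage on a host: bond_summary
```

With mode=4 the "Bonding Mode" line should mention IEEE 802.3ad, and with lacp_rate=1 the "LACP rate" line should read "fast". "MII Status" is printed once for the bond and once per slave interface.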

The network range we are currently using is 66.187.230.0/25, with the gateway at 66.187.230.126. All the IPs are public, but there's a transparent firewall that blocks any incoming request.
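
When adding a host, it can be handy to check that an address really falls inside that range. The sketch below does the check with plain shell arithmetic; the function names ip_to_int and in_cidr are illustrative helpers, not existing tools.

```shell
# ip_to_int A.B.C.D
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# in_cidr ADDRESS NETWORK PREFIXLEN
# Exit status 0 if ADDRESS is inside NETWORK/PREFIXLEN, non-zero otherwise.
in_cidr() {
    mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
    [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$2") & mask )) ]
}

# Usage, with the gateway from /etc/sysconfig/network:
#   in_cidr 66.187.230.126 66.187.230.0 25 && echo "gateway is in range"
```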

Hosted engine management

The first three hosts (at the time of writing), ovirt-srv01, ovirt-srv02 and ovirt-srv03, are the ones that manage the hosted engine VM. The hosted engine is not managed by itself but by a couple of services and scripts installed by the hosted-engine rpms.

To check the current status of the hosted engine cluster, you can run from any of those hosts:

[root@ovirt-srv01 ~]# hosted-engine --vm-status


--== Host 1 status ==--

Status up-to-date                  : True
Hostname                           : 66.187.230.3
Host ID                            : 1
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 2400
Local maintenance                  : False
Host timestamp                     : 1415642612
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=1415642612 (Mon Nov 10 11:03:32 2014)
    host-id=1
    score=2400
    maintenance=False
    state=EngineUp


--== Host 2 status ==--

Status up-to-date                  : True
Hostname                           : 66.187.230.4
Host ID                            : 2
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 2400
Local maintenance                  : False
Host timestamp                     : 1415642616
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=1415642616 (Mon Nov 10 11:03:36 2014)
    host-id=2
    score=2400
    maintenance=False
    state=EngineDown


--== Host 3 status ==--

Status up-to-date                  : True
Hostname                           : ovirt-srv03.ovirt.org
Host ID                            : 3
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 2400
Local maintenance                  : False
Host timestamp                     : 1415642615
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=1415642615 (Mon Nov 10 11:03:35 2014)
    host-id=3
    score=2400
    maintenance=False
    state=EngineDown

You can see that the engine is running on only one of the hosts. To put a host into local maintenance mode, run:

[root@ovirt-srv01 ~]# hosted-engine --set-maintenance --mode=local

Run it on the host you want to put into maintenance. You can also manage the engine VM with the hosted-engine command (don't do it through the engine UI).
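
Before putting a host into maintenance it helps to know which host is currently running the engine VM. The status output shown earlier can be filtered for that; this is a minimal sketch that assumes the output layout stays as in the example, and engine_host is a made-up helper name.

```shell
# engine_host
# Read `hosted-engine --vm-status` output on stdin and print the Hostname
# of every host whose Engine status reports the VM as up.
engine_host() {
    awk -F' : ' '
        /^Hostname/ { host = $2 }                         # remember the current host
        /^Engine status/ && /"vm": "up"/ { print host }   # the engine runs here
    '
}

# Usage on one of the hosted-engine hosts:
#   hosted-engine --vm-status | engine_host
```

The hostname is printed exactly as it appears in the status output, so it may be an IP address or a FQDN depending on how the host was registered.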

oVirt datacenter organization

All the hosts are distributed across two datacenters, jenkins and production (Default in the UI). The first one is meant to host all the jenkins-related VMs and any testing VM outside jenkins. The production one is configured for high availability and is meant to host all the VMs that run production services like foreman, jenkins or resources01.

Production VMs

There are no templates yet in this datacenter, but some base netinstall images are uploaded, so if you have to create a new VM you can do so using the netinstall boot from foreman.

Currently these are the VMs that we have in the production datacenter:

  • foreman-phx: Foreman proxy serving the phoenix network; includes DHCP, TFTP and DNS services. Also serves (or will serve) as DNS for the network.
  • HostedEngine: VM running the hosted engine; it is not managed by the engine itself but by the hosted engine services.
  • resources01-phx-ovirt-org: Frontend that serves the old repositories in resources.ovirt.org. It's connected to a special shared disk where the repos are stored, so the disk is easy to unplug from the VM and plug back in if the VM needs upgrading or anything else.
  • proxy-phx-ovirt-org: This will be the local network squid proxy. It is not yet functional, but the idea is to use it mostly to cache yum packages, as we make intensive use of those when building with mock.

Jenkins VMs

The jenkins DC has all the slaves and the templates used to build them. The number of slaves and their OSes/distros change often, but the organization should be quite stable.

The slaves are named following the pattern:

${DISTRO}${VERSION}-vm${NUMBER}-phx-ovirt-org

That way it is fairly easy to know the relevant information about a slave just from its name. The number is used only to distinguish between VMs of the same distro/version, so its only requirement is to be unique, though we usually try to use the lowest available number (that might change in the future when we automate slave creation; the number might then be replaced with the build name or just a hash).
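
As an illustration of the naming pattern, here is a minimal sketch that splits a slave name back into its parts. The example name fc20-vm01-phx-ovirt-org is hypothetical, and the sketch assumes that DISTRO is made of lowercase letters and VERSION of digits (optionally with dots), which is not stated anywhere in the pattern itself.

```shell
# parse_slave_name NAME
# Print "DISTRO VERSION NUMBER" for a name that follows the pattern
# ${DISTRO}${VERSION}-vm${NUMBER}-phx-ovirt-org, or nothing if it does not.
parse_slave_name() {
    printf '%s\n' "$1" |
        sed -n -E 's/^([a-z]+)([0-9.]+)-vm([0-9]+)-phx-ovirt-org$/\1 \2 \3/p'
}

# Usage (hypothetical slave name):
#   parse_slave_name fc20-vm01-phx-ovirt-org
```

Names that do not follow the pattern, such as the production VMs, produce no output, which makes the function usable as a quick filter over a list of VM names.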

The templates are named the same way as the slaves, but instead of the vm${NUMBER} suffix there are only two suffixes, -base and -jenkins-slave. The -base template (sometimes you'll also see a VM with that name, used to update the template) is a template you can use to build any server; it has only the base foreman hostgroup applied. The -jenkins-slave template has the jenkins-slave hostgroup applied.

Also keep in mind that puppet will be run again by the foreman finisher script when creating a new machine, to make sure the latest puppet manifests and configurations are applied.

Tips and Tricks

Strange routing/network issues

For example, we once saw that ping was unable to resolve any names while dig/nslookup worked perfectly. That was caused by wrong custom routing rules in a routing table other than the main one. To see all the routes, across every routing table, you can type:

ip route show table all

Those were defined in the /etc/sysconfig/network-scripts/rules-ovirtmgmt file.
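
The routes are only half of the picture: policy rules decide which table gets consulted, and `ip rule show` lists them. As a sketch, the filter below hides the three rules the kernel installs by default so that only custom ones remain; custom_rules is a made-up helper name.

```shell
# custom_rules
# Read `ip rule show` output on stdin and drop the kernel's default rules
# (the ones pointing at the local, main and default tables).
custom_rules() {
    grep -v -E 'lookup (local|main|default)[[:space:]]*$'
}

# Usage on a host:
#   ip rule show | custom_rules
```

Any line that survives the filter points at an extra routing table, whose contents can then be inspected with `ip route show table` followed by the table name or number from that line.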

VDSM did not create the ovirtmgmt libvirt network

On one of the hosts, after messing with the network, vdsm did not automatically create the ovirtmgmt network in the libvirt settings. You can create it manually with:

$ cat <<EOC > ovirtmgmt_net.xml
<network>
  <name>vdsm-ovirtmgmt</name>
  <forward mode='bridge'/>
  <bridge name='ovirtmgmt'/>
</network>
EOC
$ virsh -c qemu:///system
user: vdsm@ovirt
pass: shibboleth
(virsh)$ net-create ovirtmgmt_net.xml
# this creates the network in non-persistent mode; to make it persistent we can
# just edit it and add a newline at the end
# (the libvirt network name is the one from the <name> element)
(virsh)$ net-edit vdsm-ovirtmgmt
(virsh)$ net-autostart vdsm-ovirtmgmt
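
A non-interactive variant of the same procedure can be sketched as follows. It relies on `virsh net-define`, which registers the network persistently from the start, so no edit step is needed. The helper name bridge_net_xml is made up for illustration, and naming the libvirt network vdsm-<bridge> is an assumption taken from the XML above.

```shell
# bridge_net_xml BRIDGE
# Print a libvirt network definition that forwards to an existing bridge.
bridge_net_xml() {
    cat <<EOF
<network>
  <name>vdsm-$1</name>
  <forward mode='bridge'/>
  <bridge name='$1'/>
</network>
EOF
}

# Usage on a host (virsh will still ask for the libvirt credentials):
#   bridge_net_xml ovirtmgmt > ovirtmgmt_net.xml
#   virsh -c qemu:///system net-define ovirtmgmt_net.xml
#   virsh -c qemu:///system net-start vdsm-ovirtmgmt
#   virsh -c qemu:///system net-autostart vdsm-ovirtmgmt
```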