
OpenStack – Juno Release – Layer 3 High-Availability

In the Juno release of OpenStack we were blessed with many new features and functionality. One of them is the Layer 3 (L3) High-Availability (HA) functionality. The L3 HA support that came in Juno uses Keepalived with Virtual Routing Redundancy Protocol (VRRP).

This blog is not meant to give you a full dump of Keepalived or VRRP. It's not even meant to be a deep dive on L3 HA. I will provide some explanation of my pictures and configs, but you really should familiarize yourself with the original documentation on L3 HA.

Why do we need L3 HA in OpenStack?

Historically, there was not much in the way of First Hop Redundancy Protocol (FHRP) support for the tenant Neutron router (L3 agent). If a tenant used a router for connectivity to other tenants or external access (Internet) and the node that housed the L3 agent died, or the agent itself puked, then the tenant was isolated with no connectivity in or out. The one exception is if you did not use a tenant router and instead used the provider network model with VLANs, where the first L3 hop for the tenant instance was a physical L3 device (like an aggregation layer L3 switch). In that case, the FHRP (i.e., VRRP, HSRP, GLBP) being used between the redundant aggregation layer switches would provide your L3 HA capabilities.

So, we needed an answer for this L3 HA issue. In Juno, the L3 HA functionality was released, giving us redundancy for the Neutron router (L3 agent).

High-Level Considerations

There are a few things to keep in mind when using the L3 HA functionality:

  • L3 HA can be configured manually via the Neutron client by an admin:
    • neutron router-create --ha True|False
  • L3 HA functionality can be set as a system default within the /etc/neutron/neutron.conf and l3_agent.ini files (see examples later in the post)
  • An existing non-HA router can be updated to HA:
    • neutron router-update <router-name> --ha=True
  • Requires a minimum of two network nodes (or controllers, each running the L3 agent); you can verify this with neutron agent-list, as shown below
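
To confirm that at least two L3 agents are alive before relying on L3 HA, list the Neutron agents. The output below is a trimmed sketch from my two-node example (IDs removed; your hosts will differ):

[root@net1 ~]# neutron agent-list
+-----+------------+------+-------+----------------+
| id  | agent_type | host | alive | admin_state_up |
+-----+------------+------+-------+----------------+
| ... | L3 agent   | net1 | :-)   | True           |
| ... | L3 agent   | net2 | :-)   | True           |
+-----+------------+------+-------+----------------+
. . . output abbreviated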

What does the tenant see?

The tenant sees one router with a single gateway IP address. (Note: Non-admin users cannot control whether the router is HA or non-HA.) From the tenant's perspective, the router behaves the same in HA or non-HA mode. In Figure 1 below, the tenant sees the instances, the private network, and a single router (the example below is from an admin user).

Figure 1. Tenant View of L3 HA-enabled Router


Routing View

In Figure 2 below, a basic view of the routing and VRRP setup is shown. In this example the tenant network is assigned 10.10.30.x/24. VRRP is using 169.254.0.x over a dedicated HA-only network that traverses the same tenant network type (i.e., VXLAN). The router (L3 agent) on the left is the VRRP master and is the tenant gateway (10.10.30.1).

Tenant instances will use 10.10.30.1 as their gateway, and northbound traffic will pass through the master (the left router in this example). Return traffic will pass back through the same router while it is acting as master.

Figure 2. Routing View of L3 HA

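If you want to see the master/backup split on a running deployment, check the qrouter namespace on each node; keepalived only plumbs the VIPs on the master. A minimal sketch with placeholder IDs (substitute your own router and port IDs):

[root@net1 ~]# ip netns exec qrouter-<router-id> ip addr show qr-<port-id> | grep 10.10.30.1
    inet 10.10.30.1/24 scope global qr-<port-id>
[root@net2 ~]# ip netns exec qrouter-<router-id> ip addr show qr-<port-id> | grep 10.10.30.1
[root@net2 ~]#

The empty result on the second node tells you it is the backup; the gateway address only exists on the master.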

Host View of L3 HA

In Figure 3 below, there are three OpenStack nodes: a compute node, a control node, and a network node. (Note: In this example the control node is acting as a network node as well.) The L3 HA-enabled router has been created and there is a Neutron router (L3 agent) running on both the control and network nodes. Keepalived/VRRP is enabled by the L3 HA code. In this example, br-int has a variety of ports connecting:

  • Port to br-tun (used for VXLAN)
  • Port to br-eth1 (used for eth1 connection)
  • Port to qrouter (qr-xxxx)
  • Port to keepalived (ha-xxxx)

The Neutron router (qrouter) in this example holds the qr-xxxx and ha-xxxx ports listed above on br-int, but also has a port to br-eth1 (via qg-xxxx).

The compute node has instances attached to br-int (via tap interfaces, veth pairs, and a Linux bridge). br-int has a port connecting it to br-tun, again, used in this example for VXLAN.

Figure 3. Host View of L3 HA

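If you want to verify this wiring on a node running the L3 agent, ovs-vsctl can list the ports on br-int. A sketch from my setup (the ha-xxxx and qr-xxxx names come from the keepalived.conf example later in the post; patch-tun and int-br-eth1 are the usual OVS agent names for the ports toward br-tun and br-eth1):

[root@net1 ~]# ovs-vsctl list-ports br-int
ha-0d655b16-c6
int-br-eth1
patch-tun
qr-c3090bd6-1b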

Figure 4 shows a basic traffic flow example.  L3 HA uses keepalived/VRRPv2 to manage the master/backup relationship between the Neutron routers (L3 agents). VRRPv2 control traffic is using the 169.254.192.x network (configurable) and advertisements are sent using the well-known IPv4 multicast group of 224.0.0.18 (not configurable).

In this example, traffic leaving an instance on the compute node will follow the path to its default gateway (via the VXLAN tunnels between br-tun on each node). The traffic flows through whichever L3 agent is acting as master; in this example, that L3 agent is running on the control node. The L3 agent will then route the traffic towards its destination.

Figure 4. Traffic Flow Example for L3 HA

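You can see the routing the master will perform by dumping the routing table inside the qrouter namespace on the node acting as master. A minimal sketch with a placeholder router ID (the routes shown match the keepalived.conf example later in the post):

[root@net1 ~]# ip netns exec qrouter-<router-id> ip route
default via 192.168.81.2 dev qg-4f163e63-c4
10.10.30.0/24 dev qr-c3090bd6-1b  proto kernel  scope link  src 10.10.30.1
. . . output abbreviated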

Enabling L3 HA in Juno

On the node running Neutron server (controller), edit the /etc/neutron/neutron.conf file and uncomment/edit the following lines:

router_distributed = False
# =========== items for l3 extension ==============
# Enable high availability for virtual routers.
l3_ha = True
#
# Maximum number of l3 agents which a HA router will be scheduled on. If it
# is set to 0 the router will be scheduled on every agent.
max_l3_agents_per_router = 3
#
# Minimum number of l3 agents which a HA router will be scheduled on. The
# default value is 2.
min_l3_agents_per_router = 2
#
# CIDR of the administrative network if HA mode is enabled
l3_ha_net_cidr = 169.254.192.0/18
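
Note that l3_ha only sets the default for newly created routers and is read by the Neutron server, so restart the server after editing neutron.conf (unit name from my RDO-based install; adjust for your distribution):

systemctl restart neutron-server.service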

On the nodes running the L3 agent (in my example the control and network nodes), edit the /etc/neutron/l3_agent.ini file and uncomment/edit the following lines (Note: Set a better password than the one I’ve included ;-)):

# Location to store keepalived and all HA configurations
ha_confs_path = $state_path/ha_confs

# VRRP authentication type AH/PASS
ha_vrrp_auth_type = PASS

# VRRP authentication password
ha_vrrp_auth_password = cisco123

# The advertisement interval in seconds
ha_vrrp_advert_int = 2

Restart the L3 agent service on each node:

systemctl restart neutron-l3-agent.service

If you completed the above steps then you can create the Neutron router without the --ha flag set and the router will still be created as an HA-enabled router. You can also create an HA-enabled router explicitly by using the --ha flag as shown in the following example (Note: Only admins have permission to run with the --ha flag set):

[root@net1 ~]# neutron router-create --ha True test1
Created a new router:
+-----------------------+--------------------------------------+
| Field                 | Value                                |
+-----------------------+--------------------------------------+
| admin_state_up        | True                                 |
| distributed           | False                                |
| external_gateway_info |                                      |
| ha                    | True                                 |
| id                    | 1fe9e406-2bb5-42c4-af62-3daef314e181 |
| name                  | test1                                |
| routes                |                                      |
| status                | ACTIVE                               |
| tenant_id             | 45e1c2a0b3a244a3a9fad48f67e28ef4     |
+-----------------------+--------------------------------------+
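
You can also verify which L3 agents the HA router was scheduled on (per the min/max settings from neutron.conf earlier). A trimmed sketch from my two-node example (IDs removed):

[root@net1 ~]# neutron l3-agent-list-hosting-router test1
+-----+------+----------------+-------+
| id  | host | admin_state_up | alive |
+-----+------+----------------+-------+
| ... | net1 | True           | :-)   |
| ... | net2 | True           | :-)   |
+-----+------+----------------+-------+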

Once the router is created via the dashboard, the CLI, or Heat, and your networks are all attached, you can look at the keepalived settings in the /var/lib/neutron/ha_confs/<id>/keepalived.conf file. In the example below, the "interface ha-0d655b16-c6" line identifies the L3 HA interface. VRRP will track that interface. The virtual IP (VIP) address for the ha-xxxx interface is 169.254.0.1 (only the master holds this address). The tenant-facing router IP (10.10.30.1) and the public-facing router IP (192.168.81.13) are the VIPs for their respective networks. The external default gateway is 192.168.81.2.

[root@net1 ~]# cat /var/lib/neutron/ha_confs/719b853f-539e-420b-a76b-0440146f05de/keepalived.conf
. . . output abbreviated 
vrrp_instance VR_1 {
    state BACKUP
    interface ha-0d655b16-c6
    virtual_router_id 1
    priority 50
    nopreempt
    advert_int 2
    authentication {
        auth_type PASS
        auth_pass cisco123
    }
    track_interface {
        ha-0d655b16-c6
    }
    virtual_ipaddress {
        169.254.0.1/24 dev ha-0d655b16-c6
    }
    virtual_ipaddress_excluded {
        10.10.30.1/24 dev qr-c3090bd6-1b
        192.168.81.13/24 dev qg-4f163e63-c4
    }
    virtual_routes {
        0.0.0.0/0 via 192.168.81.2 dev qg-4f163e63-c4
    }
}

Run a tcpdump on the L3 HA interface (inside the router namespace) to watch the VRRPv2 advertisements:

[root@net1 ~]# ip netns exec qrouter-719b853f-539e-420b-a76b-0440146f05de tcpdump -n -i ha-0d655b16-c6
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ha-0d655b16-c6, link-type EN10MB (Ethernet), capture size 65535 bytes
14:00:03.123895 IP 169.254.192.33 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype simple, intvl 2s, length 20
14:00:05.125386 IP 169.254.192.33 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype simple, intvl 2s, length 20
14:00:07.128133 IP 169.254.192.33 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype simple, intvl 2s, length 20
14:00:09.129421 IP 169.254.192.33 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype simple, intvl 2s, length 20
14:00:11.130814 IP 169.254.192.33 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype simple, intvl 2s, length 20
14:00:13.131529 IP 169.254.192.33 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype simple, intvl 2s, length 20

Testing a failure

First, check who is master:

[root@net1 ~]# cat /var/lib/neutron/ha_confs/719b853f-539e-420b-a76b-0440146f05de/state
master
[root@net2 ~]# cat /var/lib/neutron/ha_confs/719b853f-539e-420b-a76b-0440146f05de/state
backup

Simulate a failure by shutting down the HA interface (remember that it is in the tracked interface list). Have a ping running on the instance to verify connectivity through the failure:

[root@net1 ~]# ip netns exec qrouter-719b853f-539e-420b-a76b-0440146f05de ifconfig ha-0d655b16-c6 down

Check to see that the master role changed after failure:

[root@net1 ~]# cat /var/lib/neutron/ha_confs/719b853f-539e-420b-a76b-0440146f05de/state
fault
[root@net2 ~]# cat /var/lib/neutron/ha_confs/719b853f-539e-420b-a76b-0440146f05de/state
master

Checking in on the ping shows a delay but no loss (In a loaded system you will likely see a brief loss of traffic):

ubuntu@server1:~$ ping 8.8.8.8
64 bytes from 8.8.8.8: icmp_seq=20 ttl=127 time=65.4 ms
64 bytes from 8.8.8.8: icmp_seq=21 ttl=127 time=107 ms
64 bytes from 8.8.8.8: icmp_seq=22 ttl=127 time=64.5 ms
64 bytes from 8.8.8.8: icmp_seq=23 ttl=127 time=67.6 ms
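
To recover, bring the HA interface back up on the original node. Since keepalived is running with 'nopreempt' (see the keepalived.conf output above), the recovered router stays in the backup role rather than taking master back:

[root@net1 ~]# ip netns exec qrouter-719b853f-539e-420b-a76b-0440146f05de ifconfig ha-0d655b16-c6 up
[root@net1 ~]# cat /var/lib/neutron/ha_confs/719b853f-539e-420b-a76b-0440146f05de/state
backup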

Happy testing!