OpenBGPd and route filters

Many moons ago, OpenBGPd was extensively used throughout the networking world as a Route Server. However, over the years, many have stopped using it and have migrated away to other implementations. Recently, I have been getting more involved with the networking community, so I decided to ask "why". Almost exclusively, they told me "filter performance".

Many IXP operators made a point to tell me that they liked the syntax and use of OpenBGPd. They were very concerned that there was apparently only a single piece of software that was now being used for Route Servers, and even some of them used multiple versions of it to "satisfy" their desire for divergent implementations. (insert 'eek' here)

So, I decided to look into why the filters were so slow.

Turns out, we were being very naive with the filters and checking every filter against every rule, even when we couldn't match. This is very common in the first implementation of a filter and even showed up in the initial version of PF. Obviously, filter performance in PF is rather critical to having a performant firewall so I looked at its implemetation. Turns out, it was rather simple and straight-forward.

PF's filter performance is known as "skip steps", which exploit the fact that some rules will never match. As an example, if you write a rule like "allow from AS 65111", this can never match if a peer has AS 65333. In such a case, we need to skip all of those rules.

There are two parts to skip-steps. First is when you parse and load the filters, you need to calculate where to skip to, if a group doesn't apply. This happens at load time. The second part is when we are iterating through the filters, is to detect when we get to a rule that can't apply, and skip to the next group. That happens while applying the rules.

Andy from LONAP (the 15th largest IX in the world, according to wikipedia), gave me the last configuration they used with OpenBGPd. This had 314 peers configured, and about 35,000 prefix filters. For additional torture, I added the IRR ruleset for a friend's network, which brought the total prefix rules up to around 600,000 prefixes filters. To this configuration, I sent 65,000 routes.

In OpenBSD 5.8, processing these routes would take approx 35 minutes on my laptop. With the change I committed today (Nov 6), the same processing takes only 30 seconds.


Stuart Henderson (a fellow OpenBSD developer, and one of the people that reviewd and OKed my change) adds a valuable comment in the discussion on Undeadly:

"Turns out, it was rather simple and straight-forward." - Henning pointed out a few times that it was only ever meant to be a quick first implementation, to be replaced with something better! Such a nice idea to use skip-steps here, with hindsight it's blindingly obvious, but that's often the way.

Since skip-steps come from PF, we can look in that direction for guidance about performance. An old Undeadly article about PF rule optimization talks about this ("Ordering rulesets to maximize skip steps") and the principles are the same here. You want to group the rules together to reduce the number of evaluations done.

Take these rules, for example:

allow quick from AS 112 prefix { 192.31.196.0/24, 192.175.48.0/24, 2001:4:112::/48, 2620:4f:8000::/48 }
allow quick to AS 112 prefix { $MY_PREFIXES }

allow quick from AS 2818 prefix { 212.58.224.0/19, 212.58.224.0/20, 212.58.240.0/20, [...] }
allow quick to AS 2818 prefix { $MY_PREFIXES }

allow quick from AS 8330 prefix { 199.212.92.0/23, 199.212.90.0/23,  199.80.128.0/17 [...] }
allow quick to AS 8330 prefix { $MY_PREFIXES }

And here are the skip-step groupings used in the code:

#define RDE_FILTER_SKIP_DIR            0
#define RDE_FILTER_SKIP_GROUPID        1
#define RDE_FILTER_SKIP_REMOTE_AS      2
#define RDE_FILTER_SKIP_PEERID         3

You can probably make a good guess at the meanings. So the rules above are not optimal - a simple reordering as below would put all of the "from" rules together and allow half of the lines to be skipped in one go. (Fewer than half of the rules, because the prefix lists are expanded to multiple rules, but hey there needs to be some room for borrowing something else from PF for future improvement ;-)

allow quick from AS 112 prefix { 192.31.196.0/24, 192.175.48.0/24, 2001:4:112::/48, 2620:4f:8000::/48 }
allow quick from AS 2818 prefix { 212.58.224.0/19, 212.58.224.0/20, 212.58.240.0/20, [...] }
allow quick from AS 8330 prefix { 199.212.92.0/23, 199.212.90.0/23, 199.80.128.0/17 [...] }

allow quick to AS 112 prefix { $MY_PREFIXES }
allow quick to AS 2818 prefix { $MY_PREFIXES }
allow quick to AS 8330 prefix { $MY_PREFIXES }

Nowadays PF has a ruleset optimizer that is aware of skip-steps and will reorder rules to take advantage of them automatically. (In fact, in some cases with PF it can still be better to disable the optimizer and do this by hand; you might know something about your traffic that pfctl doesn't). In the case of IXP route-server BGP filters, they're likely to be much larger than any PF ruleset you're ever going to see, so a ruleset optimizer would be relatively slow and take a lot of memory. Fortunately with BGP filters, they're already going to be machine-generated from routing registries or local databases, so it's a lot easier to order them correctly from the start.

social