[OpenBSD]

[Previous: Anchors] [Contents] [Next: Address Pools and Load Balancing]

PF: Packet Queueing and Prioritization


Table of Contents


Queueing

To queue something is to store it, in order, while it awaits processing. In a computer network, when data packets are sent out from a host, they enter a queue where they await processing by the operating system. The operating system then decides which queue and which packet(s) from that queue should be processed. The order in which the operating system selects the packets to process can affect network performance. For example, imagine a user running two network applications: SSH and FTP. Ideally, the SSH packets should be processed before the FTP packets because of the time-sensitive nature of SSH; when a key is typed in the SSH client, an immediate response is expected, but an FTP transfer being delayed by a few extra seconds hardly bears any notice. But what happens if the router handling these connections processes a large chunk of packets from the FTP connection before processing the SSH connection? Packets from the SSH connection will remain in the queue (or possibly be dropped by the router if the queue isn't big enough to hold all of the packets) and the SSH session may appear to lag or slow down. By modifying the queueing strategy being used, network bandwidth can be shared fairly between different applications, users, and computers.

Note that queueing is only useful for packets in the outbound direction. Once a packet arrives on an interface in the inbound direction it's already too late to queue it -- it's already consumed network bandwidth to get to the interface that just received it. The only solution is to enable queueing on the adjacent router or, if the host that received the packet is acting as a router, to enable queueing on the internal interface where packets exit the router.

Schedulers

The scheduler is what decides which queues to process and in what order. By default, OpenBSD uses a First In First Out (FIFO) scheduler. A FIFO queue works like the line-up at a supermarket's checkout -- the first item into the queue is the first processed. As new packets arrive they are added to the end of the queue. If the queue becomes full, and here the analogy with the supermarket stops, newly arriving packets are dropped. This is known as tail-drop.

OpenBSD supports two additional schedulers:

Class Based Queueing

Class Based Queueing (CBQ) is a queueing algorithm that divides a network connection's bandwidth among multiple queues or classes. Each queue then has traffic assigned to it based on source or destination address, port number, protocol, etc. A queue may optionally be configured to borrow bandwidth from its parent queue if the parent is being under-utilized. Queues are also given a priority such that those containing interactive traffic, such as SSH, can have their packets processed ahead of queues containing bulk traffic, such as FTP.

CBQ queues are arranged in an hierarchical manner. At the top of the hierarchy is the root queue which defines the total amount of bandwidth available. Child queues are created under the root queue, each of which can be assigned some portion of the root queue's bandwidth. For example, queues might be defined as follows:

Root Queue (2Mbps)
Queue A (1Mbps)
Queue B (500Kbps)
Queue C (500Kbps)

In this case, the total available bandwidth is set to 2 megabits per second (Mbps). This bandwidth is then split among three child queues.

The hierarchy can further be expanded by defining queues within queues. To split bandwidth equally among different users and also classify their traffic so that certain protocols don't starve others for bandwidth, a queueing structure like this might be defined:

Root Queue (2Mbps)
UserA (1Mbps)
ssh (50Kbps)
bulk (950Kbps)
UserB (1Mbps)
audio (250Kbps)
bulk (750Kbps)
http (100Kbps)
other (650Kbps)

Note that at each level the sum of the bandwidth assigned to each of the queues is not more than the bandwidth assigned to the parent queue.

A queue can be configured to borrow bandwidth from its parent if the parent has excess bandwidth available due to it not being used by the other child queues. Consider a queueing setup like this:

Root Queue (2Mbps)
UserA (1Mbps)
ssh (100Kbps)
ftp (900Kbps, borrow)
UserB (1Mbps)

If traffic in the ftp queue exceeds 900Kbps and traffic in the UserA queue is less than 1Mbps (because the ssh queue is using less than its assigned 100Kbps), the ftp queue will borrow the excess bandwidth from UserA. In this way the ftp queue is able to use more than its assigned bandwidth when it faces overload. When the ssh queue increases its load, the borrowed bandwidth will be returned.

CBQ assigns each queue a priority level. Queues with a higher priority are preferred during congestion over queues with a lower priority as long as both queues share the same parent (in other words, as long as both queues are on the same branch in the hierarchy). Queues with the same priority are processed in a round-robin fashion. For example:

Root Queue (2Mbps)
UserA (1Mbps, priority 1)
ssh (100Kbps, priority 5)
ftp (900Kbps, priority 3)
UserB (1Mbps, priority 1)

CBQ will process the UserA and UserB queues in a round-robin fashion -- neither queue will be preferred over the other. During the time when the UserA queue is being processed, CBQ will also process its child queues. In this case, the ssh queue has a higher priority and will be given preferential treatment over the ftp queue if the network is congested. Note how the ssh and ftp queues do not have their priorities compared to the UserA and UserB queues because they are not all on the same branch in the hierarchy.

For a more detailed look at the theory behind CBQ, please see References on CBQ.

Priority Queueing

Priority Queueing (PRIQ) assigns multiple queues to a network interface with each queue being given a priority level. A queue with a higher priority is always processed ahead of a queue with a lower priority. If two or more queues are assigned the same priority then those queues are processed in a round-robin fashion.

The queueing structure in PRIQ is flat -- you cannot define queues within queues. The root queue is defined, which sets the total amount of bandwidth that is available, and then sub queues are defined under the root. Consider the following example:

Root Queue (2Mbps)
Queue A (priority 1)
Queue B (priority 2)
Queue C (priority 3)

The root queue is defined as having 2Mbps of bandwidth available to it and three subqueues are defined. The queue with the highest priority (the highest priority number) is served first. Once all the packets in that queue are processed, or if the queue is found to be empty, PRIQ moves onto the queue with the next highest priority. Within a given queue, packets are processed in a First In First Out (FIFO) manner.

It is important to note that when using PRIQ you must plan your queues very carefully. Because PRIQ always processes a higher priority queue before a lower priority one, it's possible for a high priority queue to cause packets in a lower priority queue to be delayed or dropped if the high priority queue is receiving a constant stream of packets.

Random Early Detection

Random Early Detection (RED) is a congestion avoidance algorithm. Its job is to avoid network congestion by making sure that the queue doesn't become full. It does this by continually calculating the average length (size) of the queue and comparing it to two thresholds, a minimum threshold and a maximum threshold. If the average queue size is below the minimum threshold then no packets will be dropped. If the average is above the maximum threshold then all newly arriving packets will be dropped. If the average is between the threshold values then packets are dropped based on a probability calculated from the average queue size. In other words, as the average queue size approaches the maximum threshold, more and more packets are dropped. When dropping packets, RED randomly chooses which connections to drop packets from. Connections using larger amounts of bandwidth have a higher probability of having their packets dropped.

RED is useful because it avoids a situation known as global synchronization and it is able to accommodate bursts of traffic. Global synchronization refers to a loss of total throughput due to packets being dropped from several connections at the same time. For example, if congestion occurs at a router carrying traffic for 10 FTP connections and packets from all (or most) of these connections are dropped (as is the case with FIFO queueing), overall throughput will drop sharply. This isn't an ideal situation because it causes all of the FTP connections to reduce their throughput and also means that the network is no longer being used to its maximum potential. RED avoids this by randomly choosing which connections to drop packets from instead of choosing all of them. Connections using large amounts of bandwidth have a higher chance of their packets being dropped. In this way, high bandwidth connections will be throttled back, congestion will be avoided, and sharp losses of overall throughput will not occur. In addition, RED is able to handle bursts of traffic because it starts to drop packets before the queue becomes full. When a burst of traffic comes through there will be enough space in the queue to hold the new packets.

RED should only be used when the transport protocol is capable of responding to congestion indicators from the network. In most cases this means RED should be used to queue TCP traffic and not UDP or ICMP traffic.

For a more detailed look at the theory behind RED, please see References on RED.

Explicit Congestion Notification

Explicit Congestion Notification (ECN) works in conjunction with RED to notify two hosts communicating over the network of any congestion along the communication path. It does this by enabling RED to set a flag in the packet header instead of dropping the packet. Assuming the sending host has support for ECN, it can then read this flag and throttle back its network traffic accordingly.

For more information on ECN, please refer to RFC 3168.

Configuring Queueing

Since OpenBSD 3.0 the Alternate Queueing (ALTQ) queueing implementation has been a part of the base system. Starting with OpenBSD 3.3 ALTQ has been integrated into PF. OpenBSD's ALTQ implementation supports the Class Based Queueing (CBQ) and Priority Queueing (PRIQ) schedulers. It also supports Random Early Detection (RED) and Explicit Congestion Notification (ECN).

Because ALTQ has been merged with PF, PF must be enabled for queueing to work. Instructions on how to enable PF can be found in Getting Started.

Queueing is configured in pf.conf. There are two types of directives that are used to configure queueing:

The syntax for the altq on directive is:

altq on interface scheduler bandwidth bw qlimit qlim \
   tbrsize size queue { queue_list }

For example:

altq on fxp0 cbq bandwidth 2Mb queue { std, ssh, ftp }
This enables CBQ on the fxp0 interface. The total bandwidth available is set to 2Mbps. Three child queues are defined: std, ssh, and ftp.

The syntax for the queue directive is:

queue name [on interface] bandwidth bw [priority pri] [qlimit qlim] \
   scheduler ( sched_options ) { queue_list }

Continuing with the example above:

queue std bandwidth 50% cbq(default)
queue ssh bandwidth 25% { ssh_login, ssh_bulk }
  queue ssh_login bandwidth 25% priority 4 cbq(ecn)
  queue ssh_bulk bandwidth 75% cbq(ecn)
queue ftp bandwidth 500Kb priority 3 cbq(borrow red)

Here the parameters of the previously defined child queues are set. The std queue is assigned a bandwidth of 50% of the root queue's bandwidth (or 1Mbps) and is set as the default queue. The ssh queue is assigned 25% of the root queue's bandwidth (500kb) and also contains two child queues, ssh_login and ssh_bulk. The ssh_login queue is given a higher priority than ssh_bulk and both have ECN enabled. The ftp queue is assigned a bandwidth of 500Kbps and given a priority of 3. It can also borrow bandwidth when extra is available and has RED enabled.

NOTE: Each child queue definition has its bandwidth specified. Without specifying the bandwidth, PF will give the queue 100% of the parent queue's bandwidth. In this situation, that would cause an error when the rules are loaded since if there's a queue with 100% of the bandwidth, no other queue can be defined at that level since there is no free bandwidth to allocate to it.

Assigning Traffic to a Queue

To assign traffic to a queue, the queue keyword is used in conjunction with PF's filter rules. For example, consider a set of filtering rules containing a line such as:

pass out on fxp0 proto tcp to port 22

Packets matching that rule can be assigned to a specific queue by using the queue keyword:

pass out on fxp0 proto tcp to port 22 queue ssh

When a state table entry is created by this rule, PF will record the queue in the state table entry; this will be used for other packets permitted by the entry:

pass in on fxp0 proto tcp to port 80 queue http
With this rule, packets traveling back out fxp0 that match the stateful connection will end up in the http queue. Note that even though the queue keyword is being used on a rule filtering incoming traffic, the goal is to specify a queue for the corresponding outgoing traffic; the above rule does not queue incoming packets.

When using the queue keyword with block directives, any resulting TCP RST or ICMP Unreachable packets are assigned to the specified queue.

Note that queue designation can happen on an interface other than the one defined in the altq on directive:

altq on fxp0 cbq bandwidth 2Mb queue { std, ftp }
queue std bandwidth 500Kb cbq(default)
queue ftp bandwidth 1.5Mb

pass in on dc0 proto tcp to port 21 queue ftp

Queueing is enabled on fxp0 but the designation takes place on dc0. If packets matching the pass rule (or the state created by this rule) exit from interface fxp0, they will be queued in the ftp queue. This type of queueing can be very useful on routers.

Normally only one queue name is given with the queue keyword, but if a second name is specified that queue will be used for packets with a Type of Service (ToS) of low-delay and for TCP ACK packets with no data payload. A good example of this is found when using SSH. SSH login sessions will set the ToS to low-delay while SCP and SFTP sessions will not. PF can use this information to queue packets belonging to a login connection in a different queue than non-login connections. This can be useful to prioritize login connection packets over file transfer packets.

pass out on fxp0 from any to any port 22 queue(ssh_bulk, ssh_login)

This assigns packets belonging to SSH login connections to the ssh_login queue and packets belonging to SCP and SFTP connections to the ssh_bulk queue. SSH login connections will then have their packets processed ahead of SCP and SFTP connections because the ssh_login queue has a higher priority.

Assigning TCP ACK packets to a higher priority queue is useful on asymmetric connections, that is, connections that have different upload and download bandwidths such as ADSL lines. With an ADSL line, if the upload channel is being maxed out and a download is started, the download will suffer because the TCP ACK packets it needs to send will run into congestion when they try to pass through the upload channel. Testing has shown that to achieve the best results, the bandwidth on the upload queue should be set to a value less than what the connection is capable of. For instance, if an ADSL line has a max upload of 640Kbps, setting the root queue's bandwidth to a value such as 600Kb should result in better performance. Trial and error will yield the best bandwidth setting.

Example #1: Small, Home Network

  
    [ Alice ]    [ Charlie ]
        |             |                              ADSL
     ---+-----+-------+------ dc0 [ OpenBSD ] fxp0 -------- ( Internet )
              |
           [ Bob ]

In this example, OpenBSD is being used on an Internet gateway for a small home network with three workstations. The gateway is performing packet filtering and NAT duties. The Internet connection is via an ADSL line running at 2Mbps down and 640Kbps up.

The queueing policy for this network:

Below is the ruleset that meets this network policy. Note that only the pf.conf directives that apply directly to the above policy are present.

# enable queueing on the external interface to control traffic going to
# the Internet. use the priq scheduler to control only priorities. set
# the bandwidth to 610Kbps to get the best performance out of the TCP
# ACK queue.

altq on fxp0 priq bandwidth 610Kb queue { std_out, ssh_im_out, dns_out, \
	tcp_ack_out }

# define the parameters for the child queues.
# std_out      - the standard queue. any filter rule below that does not
#                explicitly specify a queue will have its traffic added
#                to this queue.
# ssh_im_out   - interactive SSH and various instant message traffic.
# dns_out      - DNS queries.
# tcp_ack_out  - TCP ACK packets with no data payload.

queue std_out     priq(default)
queue ssh_im_out  priority 4 priq(red)
queue dns_out     priority 5
queue tcp_ack_out priority 6

# enable queueing on the internal interface to control traffic coming in
# from the Internet. use the cbq scheduler to control bandwidth. max
# bandwidth is 2Mbps.

altq on dc0 cbq bandwidth 2Mb queue { std_in, ssh_im_in, dns_in, bob_in }

# define the parameters for the child queues.
# std_in      - the standard queue. any filter rule below that does not
#               explicitly specify a queue will have its traffic added
#               to this queue.
# ssh_im_in   - interactive SSH and various instant message traffic.
# dns_in      - DNS replies.
# bob_in      - bandwidth reserved for Bob's workstation. allow him to
#               borrow.

queue std_in    bandwidth 1.6Mb cbq(default)
queue ssh_im_in bandwidth 200Kb priority 4
queue dns_in    bandwidth 120Kb priority 5
queue bob_in    bandwidth 80Kb cbq(borrow)


# ... in the filtering section of pf.conf ...

alice         = "192.168.0.2"
bob           = "192.168.0.3"
charlie       = "192.168.0.4"
local_net     = "192.168.0.0/24"
ssh_ports     = "{ 22 2022 }"
im_ports      = "{ 1863 5190 5222 }"

# filter rules for fxp0 inbound
block in on fxp0 all

# filter rules for fxp0 outbound
block out on fxp0 all
pass  out on fxp0 inet proto tcp from (fxp0) queue(std_out, tcp_ack_out)
pass  out on fxp0 inet proto { udp icmp } from (fxp0)
pass  out on fxp0 inet proto { tcp udp } from (fxp0) to port domain \
	queue dns_out
pass  out on fxp0 inet proto tcp from (fxp0) to port $ssh_ports \
	queue(std_out, ssh_im_out)
pass  out on fxp0 inet proto tcp from (fxp0) to port $im_ports \
	queue(ssh_im_out, tcp_ack_out)

# filter rules for dc0 inbound
block in on dc0 all
pass  in on dc0 from $local_net

# filter rules for dc0 outbound
block out on dc0 all
pass  out on dc0 to $local_net
pass  out on dc0 proto { tcp udp } from port domain to $local_net \
	queue dns_in
pass  out on dc0 proto tcp from port $ssh_ports to $local_net \
	queue(std_in, ssh_im_in)
pass  out on dc0 proto tcp from port $im_ports to $local_net \
	queue ssh_im_in
pass  out on dc0 to $bob queue bob_in

Example #2: Company Network


  ( IT Dept )  [ Boss's PC ]
       |          |                                   T1
     --+----+-----+---------- dc0 [ OpenBSD ] fxp0 -------- ( Internet )
            |                         fxp1
         [ COMP1 ]    [ WWW ]         /
                         |           / 
                       --+----------' 

In this example, the OpenBSD host is acting as a firewall for a company network. The company runs a WWW server in the DMZ portion of their network where customers upload their websites via FTP. The IT department has their own subnet connected to the main network, and the boss has a PC on his desk that's used for email and surfing the web. The connection to the Internet is via a T1 line running at 1.5Mbps in both directions. All other network segments are using Fast Ethernet (100Mbps).

The network administrator has decided on the following policy:

Below is the ruleset that meets this network policy. Note that only the pf.conf directives that apply directly to the above policy are present; nat, rdr, options, etc., are not shown.

# enable queueing on the external interface to queue packets going out
# to the Internet. use the cbq scheduler so that the bandwidth use of
# each queue can be controlled. the max outgoing bandwidth is 1.5Mbps.

altq on fxp0 cbq bandwidth 1.5Mb queue { std_ext, www_ext, boss_ext }

# define the parameters for the child queues.
# std_ext        - the standard queue. also the default queue for
#                  outgoing traffic on fxp0.
# www_ext        - container queue for WWW server queues. limit to
#                  500Kbps.
#   www_ext_http - http traffic from the WWW server; higher priority.
#   www_ext_misc - all non-http traffic from the WWW server.
# boss_ext       - traffic coming from the boss's computer.

queue std_ext        bandwidth 500Kb cbq(default borrow)
queue www_ext        bandwidth 500Kb { www_ext_http, www_ext_misc }
  queue www_ext_http bandwidth 50% priority 3 cbq(red borrow)
  queue www_ext_misc bandwidth 50% priority 1 cbq(borrow)
queue boss_ext       bandwidth 500Kb priority 3 cbq(borrow)

# enable queueing on the internal interface to control traffic coming
# from the Internet or the DMZ. use the cbq scheduler to control the
# bandwidth of each queue. bandwidth on this interface is set to the
# maximum. traffic coming from the DMZ will be able to use all of this
# bandwidth while traffic coming from the Internet will be limited to
# 1.0Mbps (because 0.5Mbps (500Kbps) is being allocated to fxp1).

altq on dc0 cbq bandwidth 100% queue { net_int, www_int }

# define the parameters for the child queues.
# net_int    - container queue for traffic from the Internet. bandwidth
#              is 1.0Mbps.
#   std_int  - the standard queue. also the default queue for outgoing
#              traffic on dc0.
#   it_int   - traffic to the IT Dept network; reserve them 500Kbps.
#   boss_int - traffic to the boss's PC; assign a higher priority.
# www_int    - traffic from the WWW server in the DMZ; full speed.

queue net_int    bandwidth 1.0Mb { std_int, it_int, boss_int }
  queue std_int  bandwidth 250Kb cbq(default borrow)
  queue it_int   bandwidth 500Kb cbq(borrow)
  queue boss_int bandwidth 250Kb priority 3 cbq(borrow)
queue www_int    bandwidth 99Mb cbq(red borrow)

# enable queueing on the DMZ interface to control traffic destined for
# the WWW server. cbq will be used on this interface since detailed
# control of bandwidth is necessary. bandwidth on this interface is set
# to the maximum. traffic from the internal network will be able to use
# all of this bandwidth while traffic from the Internet will be limited
# to 500Kbps.

altq on fxp1 cbq bandwidth 100% queue { internal_dmz, net_dmz }

# define the parameters for the child queues.
# internal_dmz   - traffic from the internal network.
# net_dmz        - container queue for traffic from the Internet.
#   net_dmz_http - http traffic; higher priority.
#   net_dmz_misc - all non-http traffic. this is also the default queue.

queue internal_dmz   bandwidth 99Mb cbq(borrow)
queue net_dmz        bandwidth 500Kb { net_dmz_http, net_dmz_misc }
  queue net_dmz_http bandwidth 50% priority 3 cbq(red borrow)
  queue net_dmz_misc bandwidth 50% priority 1 cbq(default borrow)


# ... in the filtering section of pf.conf ...

main_net  = "192.168.0.0/24"
it_net    = "192.168.1.0/24"
int_nets  = "{ 192.168.0.0/24, 192.168.1.0/24 }"
dmz_net   = "10.0.0.0/24"

boss      = "192.168.0.200"
wwwserv   = "10.0.0.100"

# default deny
block on { fxp0, fxp1, dc0 } all

# filter rules for fxp0 inbound
pass in on fxp0 proto tcp from any to $wwwserv port { 21, \
	> 49151 } queue www_ext_misc
pass in on fxp0 proto tcp from any to $wwwserv port 80 queue www_ext_http

# filter rules for fxp0 outbound
pass out on fxp0 from $int_nets
pass out on fxp0 from $boss queue boss_ext

# filter rules for dc0 inbound
pass in on dc0 from $int_nets
pass in on dc0 from $it_net queue it_int
pass in on dc0 from $boss queue boss_int
pass in on dc0 proto tcp from $int_nets to $wwwserv port { 21, 80, \
	> 49151 } queue www_int

# filter rules for dc0 outbound
pass out on dc0 from dc0 to $int_nets

# filter rules for fxp1 inbound
pass in on fxp1 proto { tcp, udp } from $wwwserv to port 53

# filter rules for fxp1 outbound
pass out on fxp1 proto tcp to $wwwserv port { 21, \
	> 49151 } queue net_dmz_misc
pass out on fxp1 proto tcp to $wwwserv port 80 queue net_dmz_http
pass out on fxp1 proto tcp from $int_nets to $wwwserv port { 80, \
	21, > 49151 } queue internal_dmz

[Previous: Anchors] [Contents] [Next: Address Pools and Load Balancing]