
Greatly improve network performance by changing TCP congestion control to BBR

BBR can improve Linux server response time. You can greatly improve Linux network performance by changing the TCP congestion control algorithm to BBR (Bottleneck Bandwidth and RTT).

What is BBR

BBR stands for Bottleneck Bandwidth and RTT (round-trip time). BBR congestion control computes the sending rate based on the delivery rate (throughput) estimated from ACKs.

Google contributed BBR to the Linux kernel in 2016, and it has been available since kernel 4.9.

BBR has significantly increased throughput and reduced latency for connections on Google's internal backbone networks and google.com and YouTube Web servers.

BBR requires only changes on the sender side, not in the network or the receiver side. Thus it can be incrementally deployed on today's Internet, or in datacenters.

How to enable BBR

BBR needs Linux kernel version 4.9 or above. Use uname -r (or uname -a, as below) to check your Linux kernel version:

$ uname -a
Linux pi3 4.19.97-v7+ #1294
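
The version check above can also be scripted. The following is a minimal sketch, assuming a POSIX shell and a release string of the usual MAJOR.MINOR... form; the function name kernel_at_least is made up for this example and is not part of any tool.

```shell
# kernel_at_least RELEASE MAJOR MINOR  (hypothetical helper)
# Succeeds (exit 0) when RELEASE, e.g. "4.19.97-v7+", is >= MAJOR.MINOR.
kernel_at_least() {
    release=$1 want_major=$2 want_minor=$3
    major=${release%%.*}          # text before the first dot
    rest=${release#*.}            # text after the first dot
    minor=${rest%%[!0-9]*}        # leading digits of the remainder
    [ "$major" -gt "$want_major" ] ||
        { [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]; }
}

if kernel_at_least "$(uname -r)" 4 9; then
    echo "kernel is new enough for BBR"
else
    echo "kernel is older than 4.9; BBR is not available"
fi
```

Comparing major and minor as integers avoids the trap of a plain string comparison, where "4.19" would sort before "4.9".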

List available congestion control algorithms and your current setting:

$ sysctl net.ipv4.tcp_available_congestion_control
net.ipv4.tcp_available_congestion_control = reno cubic

$ sysctl net.ipv4.tcp_congestion_control
net.ipv4.tcp_congestion_control = cubic

To enable BBR, load the tcp_bbr kernel module and make it load automatically at boot:

# modprobe tcp_bbr
# echo "tcp_bbr" > /etc/modules-load.d/bbr.conf

After modprobe tcp_bbr, bbr should be available in the list of tcp_available_congestion_control:

$ sysctl net.ipv4.tcp_available_congestion_control
net.ipv4.tcp_available_congestion_control = reno cubic bbr
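
If you want to check this from a script rather than by eye, something like the sketch below may help. It reads the same value from /proc that the sysctl command prints; the helper name has_algo is an assumption of this example.

```shell
# has_algo LIST NAME  (hypothetical helper)
# Succeeds when NAME appears as a whole word in the space-separated LIST.
has_algo() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Same data that `sysctl net.ipv4.tcp_available_congestion_control` prints.
proc_file=/proc/sys/net/ipv4/tcp_available_congestion_control
if [ -r "$proc_file" ]; then
    if has_algo "$(cat "$proc_file")" bbr; then
        echo "bbr is available"
    else
        echo "bbr is not available; try: modprobe tcp_bbr"
    fi
fi
```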

Then add the following two settings to /etc/sysctl.conf:

# BBR must be used with the fq qdisc
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr
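
If you apply these settings from a provisioning script, you probably do not want to append the same lines on every run. Here is a minimal sketch of an idempotent append; ensure_line is a hypothetical helper name, and it assumes the file ends with a newline.

```shell
# ensure_line FILE LINE  (hypothetical helper)
# Append LINE to FILE unless an identical line is already present.
ensure_line() {
    file=$1 line=$2
    touch "$file"
    # -x: match whole lines only, -F: treat LINE as a fixed string
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage (as root), with the two settings from this article:
#   ensure_line /etc/sysctl.conf "net.core.default_qdisc = fq"
#   ensure_line /etc/sysctl.conf "net.ipv4.tcp_congestion_control = bbr"
```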

Then reload /etc/sysctl.conf:

$ sudo sysctl -p
...
...
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr

Now you can double check to make sure bbr is enabled:

$ sysctl net.ipv4.tcp_congestion_control
net.ipv4.tcp_congestion_control = bbr

How to test network performance

iperf3 is a utility to perform network throughput tests.

$ sudo apt-get install -y iperf3

Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
  libiperf0 libsctp1
Suggested packages:
  lksctp-tools
The following NEW packages will be installed:
  iperf3 libiperf0 libsctp1
...

iperf3 can use -C (or --congestion) to choose the congestion control algorithm. In our test, we can specify bbr.

-C, --congestion algo
      Set the congestion control algorithm (Linux and FreeBSD only). An older --linux-congestion synonym
      for this flag is accepted but is deprecated.

$ iperf3 -C bbr -c example.com  # replace example.com with your test target, which must be running iperf3 -s
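
To see the difference on your own link, you can run the same test once per algorithm. This is a rough sketch, assuming an iperf3 server (iperf3 -s) is already running on the target; compare_cc is a hypothetical function name, and -t sets the test duration in seconds.

```shell
# compare_cc TARGET [SECONDS]  (hypothetical helper)
# Run one iperf3 test per congestion control algorithm against TARGET.
compare_cc() {
    target=$1 seconds=${2:-10}
    for algo in cubic bbr; do
        echo "== $algo =="
        iperf3 -C "$algo" -c "$target" -t "$seconds"
    done
}

# Usage: compare_cc example.com 30
```

Results on a shared network vary from run to run, so it is worth repeating the comparison a few times before drawing conclusions.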

How can I monitor Linux TCP BBR connections?

You can use ss (another utility to investigate sockets) to monitor BBR state variables, including pacing rate, cwnd, bandwidth estimate, min_rtt estimate, etc.

ss -tin example output:

$ ss -tin
State       Recv-Q       Send-Q              Local Address:Port                 Peer Address:Port        Process
ESTAB       0            36                      10.0.0.55:22                 123.23.12.98:61030
	 bbr wscale:6,7 rto:292 rtt:91.891/20.196 ato:40 mss:1448 pmtu:9000 rcvmss:1448 advmss:8948 cwnd:48 bytes_sent:95301
   bytes_retrans:136 bytes_acked:95129 bytes_received:20641 segs_out:813 segs_in:1091 data_segs_out:792 data_segs_in:481
   bbr:(bw:1911880bps,mrtt:73.825,pacing_gain:2.88672,cwnd_gain:2.88672) send 6050995bps lastsnd:4 lastrcv:8 lastack:8
   pacing_rate 5463880bps delivery_rate 1911928bps delivered:791 app_limited busy:44124ms unacked:1 retrans:0/2
   dsack_dups:1 rcv_space:56576 rcv_ssthresh:56576 minrtt:73.825

The following fields may appear:

ts     show string "ts" if the timestamp option is set

sack   show string "sack" if the sack option is set

ecn    show string "ecn" if the explicit congestion notification option is set

ecnseen
        show string "ecnseen" if the saw ecn flag is found in received packets

fastopen
        show string "fastopen" if the fastopen option is set

cong_alg
        the congestion algorithm name, the default congestion algorithm is "cubic"

wscale:<snd_wscale>:<rcv_wscale>
        if window scale option is used, this field shows the send scale factor and receive scale factor

rto:<icsk_rto>
        tcp re-transmission timeout value, the unit is millisecond

backoff:<icsk_backoff>
        used for exponential backoff re-transmission, the actual re-transmission timeout value is
        icsk_rto << icsk_backoff

rtt:<rtt>/<rttvar>
        rtt is the average round trip time, rttvar is the mean deviation of rtt, their units are
        millisecond

ato:<ato>
        ack timeout, unit is millisecond, used for delay ack mode

mss:<mss>
        max segment size

cwnd:<cwnd>
        congestion window size

pmtu:<pmtu>
        path MTU value

ssthresh:<ssthresh>
        tcp congestion window slow start threshold

bytes_acked:<bytes_acked>
        bytes acked

bytes_received:<bytes_received>
        bytes received

segs_out:<segs_out>
        segments sent out

segs_in:<segs_in>
        segments received

send <send_bps>bps
        egress bps

lastsnd:<lastsnd>
        how long time since the last packet sent, the unit is millisecond

lastrcv:<lastrcv>
        how long time since the last packet received, the unit is millisecond

lastack:<lastack>
        how long time since the last ack received, the unit is millisecond

pacing_rate <pacing_rate>bps/<max_pacing_rate>bps
        the pacing rate and max pacing rate

rcv_space:<rcv_space>
        a helper variable for TCP internal auto tuning socket receive buffer
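
The bw and mrtt values inside the bbr:(...) field of the ss output are BBR's estimates of bottleneck bandwidth and minimum round-trip time. Their product is the bandwidth-delay product (BDP), roughly the amount of data that has to be in flight to keep the path full. As a small worked example, the sketch below computes it from the sample values shown earlier; bdp_bytes is a hypothetical helper, and it assumes bw is given in bits per second and mrtt in milliseconds, as in that sample.

```shell
# bdp_bytes BW_BPS MIN_RTT_MS  (hypothetical helper)
# Print the bandwidth-delay product in whole bytes.
bdp_bytes() {
    # bits/s * ms / 1000 = bits in flight; / 8 = bytes
    awk -v bw="$1" -v rtt="$2" 'BEGIN { printf "%d\n", bw * rtt / 1000 / 8 }'
}

# Values taken from bbr:(bw:1911880bps,mrtt:73.825,...) in the sample output.
bdp_bytes 1911880 73.825    # prints 17643
```

With an mss of 1448 bytes, that is about 12 full-size segments.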

TCP Throughput Improvement Examples

From Google

Google Search and YouTube deployed BBR and gained TCP performance improvements.

Example performance results, to illustrate the difference between BBR and CUBIC:

  • Resilience to random loss (e.g. from shallow buffers):

    Consider a netperf TCP_STREAM test lasting 30 secs on an emulated path with a 10Gbps bottleneck, 100ms RTT, and 1% packet loss rate. CUBIC gets 3.27 Mbps, and BBR gets 9150 Mbps (2798x higher).

  • Low latency with the bloated buffers common in today’s last-mile links:

    Consider a netperf TCP_STREAM test lasting 120 secs on an emulated path with a 10Mbps bottleneck, 40ms RTT, and 1000-packet bottleneck buffer. Both fully utilize the bottleneck bandwidth, but BBR achieves this with a median RTT 25x lower (43 ms instead of 1.09 secs).

From AWS CloudFront

During March and April 2019, AWS CloudFront deployed BBR. Per the AWS blog post "TCP BBR Congestion Control with Amazon CloudFront":

Using BBR in CloudFront has been favorable overall, with performance gains of up to 22% improvement on aggregate throughput across several networks and regions.

From Shadowsocks

I have a Shadowsocks server running on a Raspberry Pi. Without BBR, the client download speed is around 450 KB/s. With BBR, the client download speed improves to 3.6 MB/s, which is 8 times the default.

BBR v2

BBR v2 is ongoing work and is still in the alpha phase.

Troubleshooting

sysctl: setting key "net.core.default_qdisc": No such file or directory

Setting net.core.default_qdisc with sysctl may fail with the following error:

sysctl: setting key "net.core.default_qdisc": No such file or directory

The reason is that the tcp_bbr kernel module is not loaded yet. To load tcp_bbr, run the following command:

sudo modprobe tcp_bbr

To verify that tcp_bbr is loaded, use lsmod. For example, the output of the following command should include a tcp_bbr line:

$ lsmod | grep tcp_bbr
tcp_bbr                20480  3
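
The same check can be done in a script by reading /proc/modules, which is where lsmod gets its data. This is a minimal sketch; module_loaded is a hypothetical helper, and the optional second argument exists only so the function can be pointed at a different file.

```shell
# module_loaded NAME [MODULES_FILE]  (hypothetical helper)
# Succeeds when NAME is listed in /proc/modules (the file lsmod reads).
module_loaded() {
    name=$1 modules_file=${2:-/proc/modules}
    # Each line starts with the module name followed by a space.
    [ -r "$modules_file" ] && grep -q "^$name " "$modules_file"
}

if module_loaded tcp_bbr; then
    echo "tcp_bbr module is loaded"
else
    echo "tcp_bbr not listed; it may be unloaded or built into the kernel"
fi
```

Note that a kernel with BBR compiled in (rather than built as a module) will not list tcp_bbr here even though bbr is available.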

If sudo modprobe tcp_bbr does not work, reboot and try again.
