Greatly improve network performance by changing TCP congestion control to BBR

BBR improves Linux server response time. You can greatly improve Linux network performance by changing the TCP congestion control algorithm to BBR (Bottleneck Bandwidth and RTT).

What is BBR

BBR stands for Bottleneck Bandwidth and RTT. The BBR congestion control algorithm computes the sending rate based on the delivery rate (throughput) estimated from ACKs.

Google contributed BBR to the Linux kernel in 2016, and it has been available since kernel version 4.9.

BBR has significantly increased throughput and reduced latency for connections on Google's internal backbone networks and YouTube web servers.

BBR requires only changes on the sender side, not in the network or the receiver side. Thus it can be incrementally deployed on today's Internet, or in datacenters.

How to enable BBR

BBR needs Linux kernel version 4.9 or above. Use uname -a (or uname -r for just the release string) to check your Linux kernel version:

$ uname -a
Linux pi3 4.19.97-v7+ #1294
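The version check above can also be scripted. The following is a minimal sketch, not part of the original setup: the helper names version_at_least and kernel_numeric are made up for illustration, and it assumes a sort that supports -V (version sort), as GNU coreutils does.

```shell
#!/bin/sh
# Succeed (exit status 0) if version $1 is at least version $2.
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
version_at_least() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Keep only the leading numeric part, e.g. "4.19.97-v7+" -> "4.19.97".
kernel_numeric() {
    printf '%s\n' "$1" | sed 's/[^0-9.].*$//'
}

if version_at_least "$(kernel_numeric "$(uname -r)")" 4.9; then
    echo "kernel is new enough for BBR"
else
    echo "kernel is older than 4.9, BBR is not available"
fi
```

A plain string comparison would get this wrong (4.19 would sort before 4.9), which is why the sketch leans on version sort.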

List available congestion control algorithms and your current setting:

$ sysctl net.ipv4.tcp_available_congestion_control
net.ipv4.tcp_available_congestion_control = reno cubic

$ sysctl net.ipv4.tcp_congestion_control
net.ipv4.tcp_congestion_control = cubic

To enable BBR, load the tcp_bbr kernel module and configure it to load at boot:

# modprobe tcp_bbr
# echo "tcp_bbr" > /etc/modules-load.d/bbr.conf

After modprobe tcp_bbr, bbr should appear in the tcp_available_congestion_control list:

$ sysctl net.ipv4.tcp_available_congestion_control
net.ipv4.tcp_available_congestion_control = reno cubic bbr

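Then add the following settings to /etc/sysctl.conf. Keep the comment on its own line, because sysctl.conf only treats lines that begin with # or ; as comments:

# BBR should be paired with the fq qdisc (required on kernels before 4.13); see the note at the end
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr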

Then reload /etc/sysctl.conf:

$ sudo sysctl -p
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr

Now you can double check to make sure bbr is enabled:

$ sysctl net.ipv4.tcp_congestion_control
net.ipv4.tcp_congestion_control = bbr
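The same checks can be done in one script by reading the files under /proc/sys that back these sysctl keys. This is a rough sketch; the helper name algo_listed is invented for illustration, and the script does nothing on systems where the /proc files are not readable.

```shell
#!/bin/sh
# Succeed if algorithm $2 appears as a whole word in the space-separated list $1.
algo_listed() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# sysctl key net.ipv4.X corresponds to the file /proc/sys/net/ipv4/X.
proc=/proc/sys/net/ipv4
if [ -r "$proc/tcp_congestion_control" ]; then
    available=$(cat "$proc/tcp_available_congestion_control")
    current=$(cat "$proc/tcp_congestion_control")
    algo_listed "$available" bbr || echo "bbr is not available: is tcp_bbr loaded?"
    if [ "$current" = bbr ]; then
        echo "bbr is active"
    else
        echo "active algorithm: $current"
    fi
fi
```

Matching on whole words avoids a false positive if a differently named algorithm merely contains the string bbr.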

How to test network performance

iperf3 is a utility to perform network throughput tests.

$ sudo apt-get install -y iperf3

Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
  libiperf0 libsctp1
Suggested packages:
The following NEW packages will be installed:
  iperf3 libiperf0 libsctp1

iperf3 accepts -C (or --congestion) to choose the congestion control algorithm. In our test, we can specify bbr. From the iperf3 man page:

-C, --congestion algo
      Set the congestion control algorithm (Linux and FreeBSD only).  An  older  --linux-congestion  synonym
      for this flag is accepted but is deprecated.

$ iperf3 -C bbr -c <target>  # replace <target> with your test target
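To compare algorithms, the same test can be repeated once per algorithm. The sketch below is only an illustration: TARGET and IPERF3 are hypothetical variables introduced here (TARGET is a host already running iperf3 -s, IPERF3 lets you substitute echo for a dry run), and it uses only the -c, -C and -t flags of iperf3.

```shell
#!/bin/sh
# Run one iperf3 client test against host $1 using congestion control $2.
# Optional $3 is the test duration in seconds (default 10).
# IPERF3 is a hypothetical override for dry runs, e.g. IPERF3=echo.
run_test() {
    "${IPERF3:-iperf3}" -c "$1" -C "$2" -t "${3:-10}"
}

# TARGET is a hypothetical variable: the host running "iperf3 -s".
# Nothing runs unless it is set.
if [ -n "${TARGET:-}" ]; then
    for algo in cubic bbr; do
        echo "== $algo =="
        run_test "$TARGET" "$algo"
    done
fi
```

Since BBR only needs changes on the sender, it is the client side (which sends by default in iperf3) that must have bbr available.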

How can I monitor Linux TCP BBR connections?

You can use ss (another utility to investigate sockets) to monitor BBR state variables, including pacing rate, cwnd, bandwidth estimate, min_rtt estimate, etc.

ss -tin example output:

$ ss -tin
State       Recv-Q       Send-Q              Local Address:Port                 Peer Address:Port        Process
ESTAB       0            36                   
	 bbr wscale:6,7 rto:292 rtt:91.891/20.196 ato:40 mss:1448 pmtu:9000 rcvmss:1448 advmss:8948 cwnd:48 bytes_sent:95301
   bytes_retrans:136 bytes_acked:95129 bytes_received:20641 segs_out:813 segs_in:1091 data_segs_out:792 data_segs_in:481
   bbr:(bw:1911880bps,mrtt:73.825,pacing_gain:2.88672,cwnd_gain:2.88672) send 6050995bps lastsnd:4 lastrcv:8 lastack:8
   pacing_rate 5463880bps delivery_rate 1911928bps delivered:791 app_limited busy:44124ms unacked:1 retrans:0/2
   dsack_dups:1 rcv_space:56576 rcv_ssthresh:56576 minrtt:73.825

The following fields may appear (from the ss man page):

ts     show string "ts" if the timestamp option is set

sack   show string "sack" if the sack option is set

ecn    show string "ecn" if the explicit congestion notification option is set

ecnseen
        show string "ecnseen" if the saw ecn flag is found in received packets

fastopen
        show string "fastopen" if the fastopen option is set

cong_alg
        the congestion algorithm name, the default congestion algorithm is "cubic"

wscale:<snd_wscale>:<rcv_wscale>
        if window scale option is used, this field shows the send scale factor and receive scale factor

rto:<icsk_rto>
        tcp re-transmission timeout value, the unit is millisecond

backoff:<icsk_backoff>
        used for exponential backoff re-transmission, the actual re-transmission timeout value is
        icsk_rto << icsk_backoff

rtt:<rtt>/<rttvar>
        rtt is the average round trip time, rttvar is the mean deviation of rtt, their units are millisecond

ato:<ato>
        ack timeout, unit is millisecond, used for delay ack mode

mss:<mss>
        max segment size

cwnd:<cwnd>
        congestion window size

pmtu:<pmtu>
        path MTU value

ssthresh:<ssthresh>
        tcp congestion window slow start threshold

bytes_acked:<bytes_acked>
        bytes acked

bytes_received:<bytes_received>
        bytes received

segs_out:<segs_out>
        segments sent out

segs_in:<segs_in>
        segments received

send <send_bps>bps
        egress bps

lastsnd:<lastsnd>
        how long time since the last packet sent, the unit is millisecond

lastrcv:<lastrcv>
        how long time since the last packet received, the unit is millisecond

lastack:<lastack>
        how long time since the last ack received, the unit is millisecond

pacing_rate <pacing_rate>bps/<max_pacing_rate>bps
        the pacing rate and max pacing rate

rcv_space:<rcv_space>
        a helper variable for TCP internal auto tuning socket receive buffer
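To watch only the BBR state variables, the bbr:(...) group can be pulled out of the ss -tin output with sed. This is a small sketch under the assumption that the group has the same four fields, in the same order, as in the example output above; other ss versions may print it differently. The function name bbr_fields is invented for illustration.

```shell
#!/bin/sh
# Print "bw mrtt pacing_gain cwnd_gain" for every bbr:(...) group read from stdin.
# Lines without such a group are skipped.
bbr_fields() {
    sed -n 's/.*bbr:(bw:\([^,]*\),mrtt:\([^,]*\),pacing_gain:\([^,]*\),cwnd_gain:\([^)]*\)).*/\1 \2 \3 \4/p'
}

# Sample line copied from the ss -tin example output above.
sample='bbr:(bw:1911880bps,mrtt:73.825,pacing_gain:2.88672,cwnd_gain:2.88672) send 6050995bps lastsnd:4 lastrcv:8 lastack:8'
printf '%s\n' "$sample" | bbr_fields
```

On a live system the function would be fed directly, for example ss -tin | bbr_fields, possibly wrapped in watch to refresh periodically.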

TCP Throughput Improvement Examples

From Google

Google Search and YouTube deployed BBR and gained TCP performance improvements.

Example performance results, to illustrate the difference between BBR and CUBIC:

  • Resilience to random loss (e.g. from shallow buffers):

    Consider a netperf TCP_STREAM test lasting 30 secs on an emulated path with a 10Gbps bottleneck, 100ms RTT, and 1% packet loss rate. CUBIC gets 3.27 Mbps, and BBR gets 9150 Mbps (2798x higher).

  • Low latency with the bloated buffers common in today’s last-mile links:

    Consider a netperf TCP_STREAM test lasting 120 secs on an emulated path with a 10Mbps bottleneck, 40ms RTT, and 1000-packet bottleneck buffer. Both fully utilize the bottleneck bandwidth, but BBR achieves this with a median RTT 25x lower (43 ms instead of 1.09 secs).
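The arithmetic behind these two results can be reproduced with a few lines of awk. This is a back-of-the-envelope sketch: the 1500-byte packet size is an assumption (the source does not state it), and the function names are made up for illustration.

```shell
#!/bin/sh
# Queueing delay in ms when a buffer is full: packets * bytes * 8 bits / link rate.
# usage: full_buffer_delay_ms <packets> <bytes_per_packet> <link_bps>
full_buffer_delay_ms() {
    awk -v p="$1" -v b="$2" -v r="$3" 'BEGIN { printf "%.0f\n", p * b * 8 / r * 1000 }'
}

# Ratio of two measurements, rounded to one decimal place.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

full_buffer_delay_ms 1000 1500 10000000   # 1000-packet buffer, assumed 1500 B packets, 10 Mbps
ratio 9150 3.27                           # BBR vs CUBIC throughput in the 1% loss test
ratio 1090 43                             # CUBIC vs BBR median RTT in ms
```

This prints 1200, 2798.2 and 25.3. Under the packet-size assumption, a full 1000-packet buffer adds about 1.2 s of queueing delay at 10 Mbps, which is in line with the 1.09 s median RTT reported for CUBIC, while BBR stays close to the 40 ms base RTT.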

From AWS CloudFront

During March and April 2019, AWS CloudFront deployed BBR. Per the AWS blog post "TCP BBR Congestion Control with Amazon CloudFront":

Using BBR in CloudFront has been favorable overall, with performance gains of up to 22% improvement on aggregate throughput across several networks and regions.

From Shadowsocks

I have a Shadowsocks server running on a Raspberry Pi. Without BBR, the client download speed is around 450 KB/s. With BBR, the client download speed improves to 3.6 MB/s, which is 8 times the default.

BBR v2

Work on BBR v2 is ongoing, and it is still in the alpha phase.


sysctl: setting key “net.core.default_qdisc”: No such file or directory

Setting net.core.default_qdisc with sysctl may run into the following error:

sysctl: setting key "net.core.default_qdisc": No such file or directory

The reason is that the tcp_bbr kernel module is not loaded yet. To load tcp_bbr, run the following command:

$ sudo modprobe tcp_bbr

To verify that tcp_bbr is loaded, use lsmod. For example, the following command should print a tcp_bbr line:

$ lsmod | grep tcp_bbr
tcp_bbr                20480  3

If sudo modprobe tcp_bbr does not work, reboot and try again.