2. "TCP Selective Acknowledgment Options" [RFC2018] (SACK) is able to treat multiple packet losses as a single congestion event. SACK allows windows to grow larger than with the ubiquitous Reno TCP, since Reno will time out and reduce its congestion window to one packet if a congested router drops several packets due to a single congestion event. SACK recognizes burst losses by having the receiver piggyback lost-packet information on acknowledgements. The authors have added the PSC SACK implementation to the hosts used in testing [SAC98].
3. "Path MTU Discovery" [RFC1191, Ste94] allows the largest possible packet size (Maximum Transmission Unit) to be sent between two hosts. Without pMTU discovery, hosts are often restricted to sending packets of around 576 bytes. Using small packets can lead to reduced performance [MSMO97]; a throughput bound illustrating why appears after this list. A sender implements pMTU discovery by setting the "Don't Fragment" bit in packets, and reducing the packet size according to ICMP messages received from intermediate routers. Since not all of the operating systems used in this paper support pMTU discovery, static routes specifying the path MTU were established on the test hosts.
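To see why small packets hurt, recall the macroscopic throughput model of [MSMO97], which bounds steady-state TCP bandwidth by roughly

    BW <= (MSS / RTT) * (C / sqrt(p))

where p is the packet loss rate and C is a constant near one. For a given round-trip time and loss rate, the achievable bandwidth therefore scales linearly with the packet size.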
1.2 Receive Socket Buffer Background
Before delving into implementation details for dynamically adjusting socket buffers, it may be useful to understand conventional tuning of socket buffers. Socket buffers are the hand-off area between TCP and an application, storing data that is to be sent or that has been received. Sender-side and receiver-side buffers behave in very different ways.
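Conventional tuning is static: the application (or a system default) picks fixed buffer sizes, typically through the sockets API. A minimal sketch follows; the sizes shown are illustrative only, not recommendations:

    #include <sys/types.h>
    #include <sys/socket.h>

    /* Statically request larger socket buffers for one connection.
     * The kernel may reject or clamp values above its per-socket limit. */
    int
    set_static_buffers(int sock)
    {
        int sndbuf = 512 * 1024;    /* illustrative values only */
        int rcvbuf = 512 * 1024;

        if (setsockopt(sock, SOL_SOCKET, SO_SNDBUF,
                       &sndbuf, sizeof(sndbuf)) < 0)
            return (-1);
        return (setsockopt(sock, SOL_SOCKET, SO_RCVBUF,
                           &rcvbuf, sizeof(rcvbuf)));
    }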
The receiver's socket buffer is used to reassemble the data in sequential order, queuing it for delivery to the application. On hosts that are not CPU-limited or application-limited, the buffer is largely empty except during data recovery, when it holds an entire window of data minus the dropped packets. The amount of available space in the receive buffer determines the receiver's advertised window (or receive window), the maximum amount of data the receiver allows the sender to transmit beyond the highest-numbered acknowledgement issued by the receiver [RFC793, Ste94].
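In 4.4 BSD-derived stacks this relationship is direct: the window offered in each outgoing segment is essentially the free space left in the receive socket buffer. The following lines sketch the idea (simplified and paraphrased from tcp_output(); not a verbatim excerpt):

    /* Offered receive window: free space in the receive buffer,
     * clamped to the largest value the (scaled) window field can carry. */
    long win = sbspace(&so->so_rcv);        /* roughly sb_hiwat - sb_cc */
    if (win > (long)TCP_MAXWIN << tp->rcv_scale)
        win = (long)TCP_MAXWIN << tp->rcv_scale;
    ti->ti_win = htons((u_short)(win >> tp->rcv_scale));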
The intended purpose of the receive window is to implement end-to-end flow control, allowing an application to limit the amount of data sent to it. It was designed at a time when RAM was expensive, and the receiver needed a way to throttle the sender to limit the amount of memory required.
If the receiver's advertised window is smaller than cwnd on the sender, the connection is receive window-limited (controlled by the receiver), rather than congestion window-limited (controlled by the sender using feedback from the network).
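To make the distinction concrete: the sender can never have more than min(rwnd, cwnd) bytes outstanding, so throughput is bounded by min(rwnd, cwnd)/RTT. With illustrative numbers (not from our tests), a 16 KB receive window over a 70 ms round-trip path caps throughput at about 16384 bytes / 0.07 s, or roughly 230 KB/s (under 2 Mbit/s), no matter how large cwnd or the link capacity is.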
A small receive window may be configured as an intentional limit by interactive applications, such as telnet, to prevent large buffering delays (i.e., a ^C may not take effect until many pages of queued text have scrolled by). However, most applications, including WWW and FTP, are better served by the high throughput that large buffers offer.
1.3 Send Socket Buffer Background
In contrast to the receive buffer, the sender's socket buffer holds data that the application has passed to TCP until the receiver has acknowledged receipt of the data. It is nearly always full for applications with much data to send, and is nearly always empty for interactive applications.
If the send buffer size is excessively large compared to the bandwidth-delay product of the path, bulk transfer applications still keep the buffer full, wasting kernel memory. As a result, the number of concurrent connections that can exist is limited. If the send buffer is too small, data is trickled into the network and low throughput results, because TCP must wait until acknowledgements are received before allowing old data in the buffer to be replaced by new, unsent data1.
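As a rule of thumb, the send buffer should therefore be matched to the path's bandwidth-delay product. With illustrative numbers (not from our measurements), a 10 Mbit/s bottleneck and an 80 ms round-trip time give a bandwidth-delay product of about (10^7 / 8) * 0.08 = 100 kilobytes; a 16 KB send buffer on that path can keep at most 16 KB in flight per round trip, limiting throughput to about 1.6 Mbit/s.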
Applications have no information about network congestion or kernel memory availability to make informed calculations of optimal buffer sizes and should not be burdened by lower layers.
2 Implementation
Our implementation involved changes to the socket code and TCP code in the NetBSD 1.2 kernel [Net96]. The standard NetBSD 1.2 kernel supports RFC 1323 TCP extensions for high performance. Our kernel also included the PSC SACK port [SAC98].
Since NetBSD is based on 4.4 BSD Lite [MBKQ96, WS95], the code changes should be widely applicable to a large number of operating systems.
2.1 Large Receive Socket Buffer
During a transfer, the receiver has no simple way of determining the congestion window size, so the receive buffer cannot easily be tuned dynamically. One idea for dynamic tuning of the receive socket buffer is to increase the buffer size when it is mostly empty, since the lack of data queued for delivery to the application indicates a low data rate that could be the result of a receive window-limited connection. The peak usage is reached during recovery (indicated by a lost packet), so the buffer size can be reduced if it is much larger than the space required during recovery. If the low data rate is not caused by a small receive window, but rather by a slow bottleneck link, the buffer size will still calibrate itself when it detects a packet loss. This idea was inspired by a discussion with Greg Minshall [Min97] and requires further research.
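A rough sketch of such a heuristic follows; the function, thresholds, and recovery bookkeeping are hypothetical illustrations of the idea above, not code from our kernel:

    /* Hypothetical receive-buffer tuning heuristic (illustration only).
     * Grow the buffer while it stays nearly empty; after a recovery
     * episode, shrink it if far more space was reserved than was used. */
    void
    rcvbuf_tune(struct sockbuf *sb, u_long recovery_peak, int in_recovery)
    {
        if (!in_recovery && sb->sb_cc < sb->sb_hiwat / 8) {
            /* Mostly empty: possibly receive window-limited. */
            u_long newsize = 2 * sb->sb_hiwat;
            if (newsize > sb_max)
                newsize = sb_max;
            sbreserve(sb, newsize);
        } else if (in_recovery && sb->sb_hiwat > 4 * recovery_peak) {
            /* Much larger than recovery actually required. */
            sbreserve(sb, 2 * recovery_peak);
        }
    }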
However, the complexity of a receive buffer tuning algorithm may be completely unnecessary for all practical purposes. If a network expert configured the receive buffer size for an application desiring high throughput, they would set it to be two times larger than the congestion window for the connection. The buffer would typically be empty, and would, during recovery, hold one congestion window's worth of data plus a limited amount of new data sent to maintain the Self-clock.
The same effect can be obtained simply by configuring the receive buffer size to be the operating system's maximum socket buffer size, which our auto-tuning TCP implementation does by default2. In addition, if an application manu-