Federico Mengozzi

Transport Layer

Connection-Oriented Transport: TCP

The TCP Connection

TCP is said to be a connection-oriented protocol because two processes must perform a handshake before they can start communicating. The protocol provides a full-duplex service, meaning that data can flow from $A$ to $B$ and, independently and at the same time, from $B$ to $A$. A TCP connection is also point-to-point: it always involves exactly two parties (no multicasting).

Although each party in a TCP connection can both send and receive messages, the party that initiates the connection is considered the client and the other party the server.

After the handshake, the protocol reserves memory for the data structures it will use while sending packets. One of these data structures is the send buffer, which the application fills with the data that needs to be sent. TCP removes chunks of bytes from this buffer and sends them to the receiver. Each chunk can be at most as large as the MSS (maximum segment size). The MSS is usually derived from the MTU (maximum transmission unit), the largest link-layer frame the sending host can transmit, so that a segment plus its TCP/IP headers (typically 40 bytes) fits in a single frame. TCP pairs the data to be sent with a header, thereby forming an actual TCP segment; the segment is passed down to the network layer, where it’s encapsulated within an IP datagram.
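
As a minimal sketch of the chunking step described above, the following Python function cuts a byte stream into pieces no larger than the MSS. The name `segment_stream` and the numbers are illustrative assumptions (a 1500-byte MTU, and 20-byte IP and TCP headers without options); a real TCP stack performs this step inside the operating system.

```python
def segment_stream(data: bytes, mss: int) -> list[bytes]:
    """Split an application byte stream into chunks of at most mss bytes."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    return [data[i:i + mss] for i in range(0, len(data), mss)]


# Assumed example values: 1500-byte MTU, 20-byte IP header, 20-byte TCP header.
MTU = 1500
MSS = MTU - 20 - 20

chunks = segment_stream(b"x" * 4000, MSS)
chunk_sizes = [len(c) for c in chunks]
print(chunk_sizes)  # two full-size chunks followed by a shorter one
```

Only the last chunk may be shorter than the MSS; concatenating the chunks gives back the original stream.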

TCP Segment Structure

A TCP segment consists of a header and a data field; its structure resembles the one below.

Each row is 32 bits wide:

source port | destination port
sequence number
acknowledgement number
header length | unused | BIT fields | receive window
Internet checksum | urgent data pointer

  • sequence number - since TCP manages a stream of bytes, the sequence number is the byte-stream number of the first byte in the segment (counting from the first data byte and assuming every segment carries exactly $MSS$ bytes, the $i$-th segment, with $i$ starting from $0$, has number $i \cdot MSS$)
  • receive window - used for flow control
  • urgent data pointer - indicates the location of the last byte of the urgent data
  • options (optional) - used, for example, to negotiate the MSS
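
The sequence-number rule from the list above can be illustrated with a short Python sketch. The helper `sequence_numbers` is hypothetical, and the numbering is relative (it starts from an assumed initial value of 0 rather than from a random initial sequence number).

```python
SEQ_SPACE = 2 ** 32  # the sequence number field is 32 bits wide


def sequence_numbers(first_seq: int, payload_lengths: list[int]) -> list[int]:
    """Return the sequence number carried by each segment.

    Each segment is labelled with the byte-stream number of its first byte,
    so the next label is the previous one plus the previous payload length.
    """
    seqs = []
    seq = first_seq
    for length in payload_lengths:
        seqs.append(seq)
        seq = (seq + length) % SEQ_SPACE
    return seqs


# Three full segments with an assumed MSS of 1000 bytes.
seqs = sequence_numbers(0, [1000, 1000, 1000])
print(seqs)
```

With equal-sized segments the labels are $0$, $MSS$, $2 \cdot MSS$, which is the $i \cdot MSS$ pattern mentioned in the list.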

The bits in the BIT fields are used as flags to mark different kinds of segments

  • CWR, ECE - used for congestion notification
  • URG - marks the data as urgent
  • ACK - indicates that the value in the acknowledgement number field is valid
  • PSH - indicates that the receiver should pass the data to the upper layer immediately
  • RST, SYN, FIN - used for connection setup and teardown (for example, the server replies with the RST bit set when it receives a SYN segment for a port on which it’s not listening for TCP connections; this is similar to the ICMP datagram a host sends when a UDP packet arrives at a port with no listener)
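
To make the header layout and the flag bits more concrete, here is a hedged Python sketch that unpacks the fixed 20-byte part of a TCP header with the standard `struct` module. The function `parse_tcp_header` and the sample values (ports, sequence number, window) are made up for illustration; options are ignored.

```python
import struct

# Flag names in the order they appear in the flags byte, most significant first.
FLAG_NAMES = ["CWR", "ECE", "URG", "ACK", "PSH", "RST", "SYN", "FIN"]


def parse_tcp_header(raw: bytes) -> dict:
    """Decode the fixed 20-byte TCP header (network byte order)."""
    (src_port, dst_port, seq, ack, offset_byte, flag_byte,
     window, checksum, urgent) = struct.unpack("!HHIIBBHHH", raw[:20])
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        # The header length is stored in 32-bit words in the upper 4 bits.
        "header_len": (offset_byte >> 4) * 4,
        "flags": [name for bit, name in enumerate(FLAG_NAMES)
                  if flag_byte & (0x80 >> bit)],
        "window": window,
        "checksum": checksum,
        "urgent_ptr": urgent,
    }


# A hypothetical SYN segment: header length of 5 words, only the SYN bit set.
raw = struct.pack("!HHIIBBHHH", 50000, 80, 1000, 0, 5 << 4, 0x02, 65535, 0, 0)
hdr = parse_tcp_header(raw)
print(hdr["flags"], hdr["header_len"])
```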

The initial sequence number is chosen randomly by each party during the handshake. Each segment that arrives from host $B$ carries a sequence number for the data flowing from $B$ to $A$. The acknowledgement number that host $A$ puts in its segment is the sequence number of the next byte host $A$ is expecting from host $B$. This means that the acknowledgement for a data segment from host $B$ to $A$ can be carried in a segment carrying data from host $A$ to $B$; such an acknowledgement is said to be piggybacked on the $A$-to-$B$ data segment.
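
A small Python sketch of a piggybacked acknowledgement, under made-up numbers: host $B$ sends 8 bytes starting at sequence number 79, and host $A$ answers with its own data while acknowledging byte 87. The helper `ack_for` and the dictionary representation of a segment are assumptions for illustration only.

```python
SEQ_SPACE = 2 ** 32  # sequence numbers wrap around after 32 bits


def ack_for(seq: int, payload: bytes) -> int:
    """Acknowledgement number for a segment: the next byte expected."""
    return (seq + len(payload)) % SEQ_SPACE


# Segment from B to A (values are invented for the example).
segment_from_b = {"seq": 79, "ack": 42, "data": b"8 bytes!"}

# A's reply carries A's own data AND the acknowledgement for B's data.
reply_from_a = {
    "seq": segment_from_b["ack"],  # B said it expects byte 42 from A
    "ack": ack_for(segment_from_b["seq"], segment_from_b["data"]),
    "data": b"hi",
}
print(reply_from_a)
```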

Like a Go-Back-N protocol, TCP uses cumulative acknowledgements: an acknowledgement number $n$ confirms that all bytes up to $n - 1$ have been received.

Round-Trip Time Estimation and Timeout

The timer that decides when to resend a packet is central to TCP performance. A good approach for choosing the timeout is the following.

The protocol measures a $sampleRTT$ and computes the $estimateRTT$ using the following formula

$estimateRTT = (1 - \alpha) estimateRTT + \alpha \cdot sampleRTT$ with $\alpha = 0.125$

The $sampleRTT$ values are measured on approximately every acknowledged packet (packets that were not acknowledged and had to be resent are not used for the calculation). The $estimateRTT$ is an exponentially weighted moving average (EWMA) of these samples.

Another useful value is the variability of the RTT, $DevRTT$, which is calculated as follows

$DevRTT = (1-\beta)DevRTT + \beta \cdot \mid sampleRTT - estimateRTT \mid$ with $\beta = 0.25$.

With this information it is possible to calculate $TimeoutInterval = estimateRTT + 4 \cdot DevRTT$. The timeout is therefore at least as large as $estimateRTT$, plus a margin that grows with the variability of the RTT.
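
The three formulas above can be combined into a small estimator. This is only a sketch: the class name `RttEstimator`, the starting values (first sample as initial estimate, zero deviation), and the choice to update the estimate before the deviation are assumptions; real implementations initialise and order these steps differently.

```python
ALPHA = 0.125  # weight of the newest sample in estimateRTT
BETA = 0.25    # weight of the newest deviation in DevRTT


class RttEstimator:
    """EWMA-based round-trip time estimator (times in milliseconds)."""

    def __init__(self, first_sample: float):
        # Assumed initialisation, chosen to keep the example simple.
        self.estimate_rtt = first_sample
        self.dev_rtt = 0.0

    def update(self, sample_rtt: float) -> float:
        """Fold in a new sample and return the resulting timeout interval."""
        self.estimate_rtt = (1 - ALPHA) * self.estimate_rtt + ALPHA * sample_rtt
        self.dev_rtt = ((1 - BETA) * self.dev_rtt
                        + BETA * abs(sample_rtt - self.estimate_rtt))
        return self.timeout_interval()

    def timeout_interval(self) -> float:
        return self.estimate_rtt + 4 * self.dev_rtt


est = RttEstimator(first_sample=100.0)
timeout = est.update(120.0)
print(est.estimate_rtt, est.dev_rtt, timeout)
```

In this run a single 120 ms sample after a 100 ms start moves the estimate only slightly (to 102.5 ms), while the deviation term widens the timeout to 120 ms.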

Reliable Data Transfer

It’s theoretically possible to use a timer for each segment sent, but in practice this requires too much overhead; most implementations use just one timer.

According to the protocol, whenever the timeout event occurs TCP retransmits the not-yet-acknowledged segment with the smallest sequence number. Each time TCP retransmits, however, it sets the next timeout interval to twice the previous value. This modification provides a limited form of congestion control (since the delay may be due to the network already being congested).
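
The doubling rule can be sketched as follows. The class `RetransmitTimer` is hypothetical, and resetting the interval when a new acknowledgement arrives is an assumption of this sketch (here the reset simply restores a fixed base value instead of recomputing it from the RTT estimate).

```python
class RetransmitTimer:
    """Single retransmission timer with exponential backoff (seconds)."""

    def __init__(self, base_interval: float):
        self.base_interval = base_interval
        self.interval = base_interval

    def on_timeout(self) -> float:
        """Called when the timer expires: double the next interval."""
        self.interval *= 2
        return self.interval

    def on_new_ack(self) -> None:
        """Assumed behaviour: fresh progress resets the backoff."""
        self.interval = self.base_interval


timer = RetransmitTimer(base_interval=1.0)
backoff = [timer.on_timeout() for _ in range(3)]
print(backoff)  # intervals after three consecutive timeouts
timer.on_new_ack()
```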

Fast Retransmit

Usually the sender can detect packet loss well before the timer expires. Because the receiver keeps acknowledging the last in-order segment it received, three duplicate ACKs (acknowledgements that repeat the same acknowledgement number) tell the sender that the segment following the acknowledged data has most likely been lost. The sender can then go ahead and resend that segment without waiting for the timer.
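
A minimal sketch of the duplicate-ACK counter described above. The class name `FastRetransmitDetector` is an assumption, and the sketch ignores details such as old acknowledgements arriving late or sequence-number wrap-around.

```python
class FastRetransmitDetector:
    """Signals a fast retransmit on the third duplicate ACK."""

    DUP_THRESHOLD = 3

    def __init__(self):
        self.last_ack = None
        self.dup_count = 0

    def on_ack(self, ack_num: int) -> bool:
        """Return True exactly when the third duplicate of an ACK arrives."""
        if ack_num == self.last_ack:
            self.dup_count += 1
            return self.dup_count == self.DUP_THRESHOLD
        # A new acknowledgement number: remember it and restart the count.
        self.last_ack = ack_num
        self.dup_count = 0
        return False


detector = FastRetransmitDetector()
signals = [detector.on_ack(a) for a in [100, 100, 100, 100, 200]]
print(signals)
```

The first ACK for byte 100 is the original one; the retransmission is triggered by the fourth ACK carrying the same number, that is, the third duplicate.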

Consideration: like Go-Back-N, the TCP sender keeps a reference (sequence number) to the oldest sent but not yet acknowledged byte and uses cumulative acknowledgements; in contrast to Go-Back-N, a TCP receiver can store out-of-order packets. In addition, if the acknowledgement for packet $k < N$ is lost, Go-Back-N would resend all packets $k, k+1, \dots, N-1$, whereas TCP would resend at most packet $k$ (if the receiver has buffered the out-of-order packets, it can acknowledge them cumulatively without the sender resending every packet after the $k$-th).
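
The receiver-side behaviour in this consideration (cumulative acknowledgements plus buffering of out-of-order data) can be sketched as below. The class `Receiver` and the 100-byte payloads are illustrative assumptions; overlapping segments and sequence-number wrap-around are not handled.

```python
class Receiver:
    """Cumulative-ACK receiver that buffers out-of-order segments."""

    def __init__(self, first_expected: int = 0):
        self.next_expected = first_expected
        self.out_of_order = {}  # sequence number -> payload

    def on_segment(self, seq: int, payload: bytes) -> int:
        """Process a segment and return the acknowledgement number to send."""
        if seq == self.next_expected:
            self.next_expected += len(payload)
            # Deliver any buffered segments that are now in order.
            while self.next_expected in self.out_of_order:
                buffered = self.out_of_order.pop(self.next_expected)
                self.next_expected += len(buffered)
        elif seq > self.next_expected:
            self.out_of_order[seq] = payload  # a gap: keep for later
        return self.next_expected


rx = Receiver()
data = b"a" * 100
# The segment starting at byte 100 is delayed; 200 and 300 overtake it.
acks = [rx.on_segment(seq, data) for seq in (0, 200, 300, 100)]
print(acks)
```

While the gap is open the receiver repeats the same acknowledgement number; once the missing segment arrives, one acknowledgement covers everything that was buffered.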

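An extension to TCP, selective acknowledgement (SACK, specified in RFC 2018), lets the receiver acknowledge out-of-order segments individually; combined with selective retransmission, so that the sender skips segments already acknowledged this way, it can lead to even better performance.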

Flow Control

Bytes received in order are placed in a buffer on the receiver side, from which the application in the upper layer can read the data. If the application doesn’t read the data immediately, however, the buffer could easily overflow when the sender sends data too quickly.

TCP uses a relatively simple flow-control service: the sender maintains a variable called the receive window to keep track of the free space in the receiver’s buffer (the connection being full-duplex, each side maintains its own receive window). Other variables are used for flow control

  • LastByteRead - the number of the last byte read from the buffer by the application process at the receiver
  • LastByteRcvd - the number of the last byte received and placed in the buffer

To avoid overflow, the condition $LastByteRcvd - LastByteRead \leq RcvBuffer$ must always hold, and the receive window ($rwnd$) is set to $rwnd = RcvBuffer - (LastByteRcvd - LastByteRead)$. The receiver places the value of $rwnd$ in the receive window field of every segment it sends. The sender keeps track of the $rwnd$ variable and ensures that $LastByteSent - LastByteAcked \leq rwnd$.
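
The two inequalities above translate directly into code. In this sketch the function names `receive_window` and `sendable_bytes` and all byte counts are invented for illustration.

```python
def receive_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Receiver side: free space left in the receive buffer (rwnd)."""
    buffered = last_byte_rcvd - last_byte_read
    if not 0 <= buffered <= rcv_buffer:
        raise ValueError("buffer invariant violated")
    return rcv_buffer - buffered


def sendable_bytes(last_byte_sent: int, last_byte_acked: int, rwnd: int) -> int:
    """Sender side: how many more bytes may be sent without exceeding rwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, rwnd - in_flight)


# Receiver: 4096-byte buffer, 3000 bytes received, 1000 already read.
rwnd = receive_window(4096, last_byte_rcvd=3000, last_byte_read=1000)
# Sender: 1000 bytes are in flight and not yet acknowledged.
budget = sendable_bytes(last_byte_sent=5000, last_byte_acked=4000, rwnd=rwnd)
print(rwnd, budget)
```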

Once $rwnd = 0$ on the receiver side, the sender could in principle stop sending data altogether, but then the receiver would have no way to notify the sender once space in its buffer is freed. In this scenario TCP requires the sender to keep sending segments with one data byte; these segments are acknowledged by the receiver, which puts the updated value of $rwnd$ in the acknowledgement packet.

TCP Connection Management

Connection Setup

A TCP connection is established using a series of steps

  • The TCP client sends a TCP segment to the TCP server. The segment carries no data; it has the SYN bit set to $1$ and a randomly chosen initial sequence number $client\_isn$
  • The TCP server, upon receiving the SYN packet, allocates the necessary buffers and variables for the connection. It then sends a connection-granted segment without data, with the SYN bit set to $1$, the acknowledgement field set to $client\_isn + 1$ and a randomly chosen initial sequence number $server\_isn$. This packet is referred to as the SYNACK segment
  • Once the client receives the SYNACK segment, it allocates its own buffers and variables and acknowledges the segment by sending a segment with the acknowledgement field set to $server\_isn + 1$ and the SYN bit set to $0$. The three-way handshake is now complete; this last segment, which acknowledges the SYNACK, can already carry actual data.
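
The three steps above can be traced with a short simulation. This is a sketch only: segments are plain dictionaries, the function `three_way_handshake` is hypothetical, and a seeded `random.Random` stands in for however a real stack picks its initial sequence numbers.

```python
import random

SEQ_SPACE = 2 ** 32


def three_way_handshake(rng: random.Random) -> list[dict]:
    """Return the three segments exchanged during connection setup."""
    # Step 1: the client sends a SYN with a random initial sequence number.
    client_isn = rng.randrange(SEQ_SPACE)
    syn = {"SYN": 1, "seq": client_isn}

    # Step 2: the server answers with a SYNACK acknowledging client_isn + 1.
    server_isn = rng.randrange(SEQ_SPACE)
    synack = {"SYN": 1, "seq": server_isn,
              "ack": (syn["seq"] + 1) % SEQ_SPACE}

    # Step 3: the client acknowledges server_isn + 1, with SYN set to 0.
    ack = {"SYN": 0, "seq": synack["ack"],
           "ack": (synack["seq"] + 1) % SEQ_SPACE}
    return [syn, synack, ack]


syn, synack, ack = three_way_handshake(random.Random(0))
print(syn, synack, ack, sep="\n")
```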

Connection Teardown

When the TCP client decides to close the connection, the following events occur

  • The TCP client sends a segment with the FIN bit set to $1$. The server sends back a segment acknowledging the shutdown request.
  • The TCP server then sends its own shutdown request (a segment with the FIN bit set to $1$), and the client acknowledges this last segment too.
  • All resources are now deallocated

Connection States

A connection makes transitions through numerous TCP states

  • The TCP client begins in a CLOSED state
  • Once the client sends the SYN segment, the state becomes SYN_SENT
  • After receiving the SYNACK segment from the server and acknowledging it, the client enters the ESTABLISHED state and can now communicate freely with the server
  • When the client wants to terminate the connection, it sends a FIN segment and its state becomes FIN_WAIT_1
  • In the FIN_WAIT_1 state the client waits for the acknowledgement of the FIN packet; upon receiving it, the client enters FIN_WAIT_2
  • The last action the client performs is waiting for the FIN segment sent by the server. Once it receives it, the client acknowledges it and enters the TIME_WAIT state, typically for $30$ or $60$ seconds (depending on the implementation); this time allows the client to resend the ACK for the server’s FIN if it’s lost.

After the wait all resources are released
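
The client-side state sequence can be written as a small transition table. This is a sketch under stated assumptions: the event strings and the function `run_client` are invented labels, the final waiting state is called TIME_WAIT (its standard name), and only the transitions listed in this section are modelled.

```python
# (current state, event) -> next state, for the client side only.
CLIENT_TRANSITIONS = {
    ("CLOSED", "send SYN"): "SYN_SENT",
    ("SYN_SENT", "receive SYNACK, send ACK"): "ESTABLISHED",
    ("ESTABLISHED", "send FIN"): "FIN_WAIT_1",
    ("FIN_WAIT_1", "receive ACK"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "receive FIN, send ACK"): "TIME_WAIT",
    ("TIME_WAIT", "wait expires"): "CLOSED",
}


def run_client(events: list[str], state: str = "CLOSED") -> list[str]:
    """Apply events in order and return every state visited."""
    trace = [state]
    for event in events:
        key = (state, event)
        if key not in CLIENT_TRANSITIONS:
            raise ValueError(f"event {event!r} is not valid in state {state}")
        state = CLIENT_TRANSITIONS[key]
        trace.append(state)
    return trace


trace = run_client([
    "send SYN",
    "receive SYNACK, send ACK",
    "send FIN",
    "receive ACK",
    "receive FIN, send ACK",
    "wait expires",
])
print(" -> ".join(trace))
```

An event that is not valid in the current state (for example sending a FIN while still CLOSED) raises an error instead of silently changing state.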
