Transmission Control Protocol (TCP)

 

By Farrokh Ghani Zadegan, Vengatanathan Krishnamoorthi, Rahul Hiran and Niklas Carlsson, January 2016
(This lab manual is partly based on "Wireshark Lab: TCP", version 2.0 (September 2009) by J.F. Kurose, K.W. Ross, available here.)

Overview of the Assignment

The goal of this assignment is to make you familiar with the basics of the Transmission Control Protocol (TCP). Before proceeding with this assignment, you should review Sections 3.5-3.7 of your textbook, Computer Networking: A Top-Down Approach. During this assignment, you will learn more about Wireshark and TCP congestion control, and you will take a quick look at TCP connection fairness.

To carry out the assignment, you should download and extract the files in this archive. The archive contains one Wireshark trace file and some text-based trace files. The text-based trace files are exported from Wireshark and their format will be described later in this manual. In this lab, you will use gnuplot to generate TCP sequence number plots and transmission window plots from the given text-based trace files.

In this assignment, you will be asked to answer and/or discuss a number of questions. To save time, it is important that you read the instructions carefully so that you provide answers in the desired format(s).

TCP Basics

First, you will analyze the provided Wireshark trace file tcp-ethereal-trace-1. This trace file was captured while uploading a 150KB text file to a Web server through the HTTP POST method. Run Wireshark and open the above trace file. Enter tcp (lowercase) into the filter input box.

Note: After you've changed the expression in the filter input box, do not forget to press the Apply button (or the Enter/Return key twice), to apply this filter string to the displayed trace file.

You should now see the initial three-way handshake (packet numbers 1, 2, and 3) used to set up the TCP connection before transmitting the HTTP data. Since this POST request is larger than what fits into a single TCP segment, the transfer is split across multiple segments. Wireshark, depending on its version, might show this POST request (in the packet list pane) as the last transmitted packet of this HTTP request. Therefore, to find out which packet contains the actual POST request (i.e. which packet is the start of HTTP data transmission) you can use the Find Packet feature of Wireshark (choose Edit->Find Packet...), which is shown in Figure 1.

Figure 1: The Find Packet feature in Wireshark

Next, please consider the following practice questions. When needed, print out the packet(s) and annotate them to explain your answer. To print out packet information you can use File->Print, choose Selected packet only, choose Packet summary line, and select the minimum amount of packet details that you need to answer the questions. Hand in such printouts along with your answers.

Note 1: In answering the following questions, you may find it convenient to add some of the TCP segment fields, such as Sequence Number and Acknowledgment Number, as columns to the Packet List Pane. To do so, simply right-click on the desired field in the Packet Details Pane and choose "Apply as column" from the pop-up menu that appears.
Note 2: Wireshark shows the sequence and acknowledgment numbers relative to the initial numbers exchanged during the TCP handshaking. Therefore, in answering questions 4, 5, and 6, look for the actual (and not the relative) numbers!
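
The relation between relative and actual sequence numbers mentioned in Note 2 can be sketched as follows. This is only an illustration, not Wireshark's own code: the function name and the example values are made up, and the only protocol fact relied on is that TCP sequence numbers are 32-bit values that wrap around.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32-bit values


def relative_seq(raw_seq, initial_seq):
    """Return raw_seq relative to initial_seq, handling 32-bit wrap-around."""
    return (raw_seq - initial_seq) % SEQ_SPACE


# Hypothetical values, not taken from the trace file
isn = 4_000_000_000
print(relative_seq(isn, isn))                               # the SYN itself
print(relative_seq(isn + 1, isn))                           # first data byte
print(relative_seq((isn + 400_000_000) % SEQ_SPACE, isn))   # after wrap-around
```
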
Practice questions (not to be explicitly answered):
  1. What are the first and last packets for the POST request?
  2. What is the IP address and the TCP port used by the client computer (source) that is transferring the file to gaia.cs.umass.edu?
  3. What is the IP address of gaia.cs.umass.edu? On what port number is it sending and receiving TCP segments for this connection?
  4. What is the sequence number of the TCP SYN segment that is used to initiate the TCP connection between the client computer and gaia.cs.umass.edu? What is it in the segment that identifies the segment as a SYN segment?
  5. What is the sequence number of the SYNACK segment sent by gaia.cs.umass.edu to the client computer in reply to the SYN? What is the value of the ACKnowledgement field in the SYNACK segment? How did gaia.cs.umass.edu determine that value? What is it in the segment that identifies the segment as a SYNACK segment?
  6. What is the sequence number of the TCP segment containing the HTTP POST command?
  7. Consider the TCP segment containing the HTTP POST as the first segment in the TCP connection. What are the sequence numbers of the first six segments in the TCP connection (including the segment containing the HTTP POST)? At what time was each segment sent? When was the ACK for each segment received? Given the difference between when each TCP segment was sent, and when its acknowledgement was received, what is the RTT value for each of the six segments? What is the EstimatedRTT value (see page 277 in text) after the receipt of each ACK? Assume that the value of the EstimatedRTT is equal to the measured RTT for the first segment, and then is computed using the EstimatedRTT equation on page 277 for all subsequent segments.
Note: Wireshark has a nice feature that allows you to plot the RTT for each of the TCP segments sent. Select a TCP segment in the Packet List Pane that is being sent from the client to the gaia.cs.umass.edu server. Then select: Statistics->TCP Stream Graph->Round Trip Time Graph.
  8. What is the length of each of the first six TCP segments?
  9. What is the minimum amount of available buffer space advertised at the receiver for the entire trace? Does the lack of receiver buffer space ever throttle the sender?
  10. Are there any retransmitted segments in the trace file? What did you check for (in the trace) in order to answer this question?
  11. How much data does the receiver typically acknowledge in an ACK? Can you identify cases where the receiver is ACKing every other received segment (see Table 3.2 on page 285 in the text)?
  12. What is the throughput (bytes transferred per unit time) for the TCP connection? Explain how you calculated this value.
Task A: Now, based on questions 1-12, please write two paragraphs explaining and discussing your observations from the above practice questions. One paragraph should describe and discuss the connections at a high level. The second paragraph should discuss the impact of RTT estimates, packet losses, and interpreted packet loss events. Note that your answer may benefit from explaining and/or referring to some of your observations from the practice questions explicitly. Note that, similar to previous assignments, you are expected to convince us that you understand these aspects of TCP.
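
The EstimatedRTT calculation asked for in practice question 7 can be sketched as below. The sketch assumes that the textbook equation is the usual exponentially weighted moving average, EstimatedRTT = (1 - alpha) * EstimatedRTT + alpha * SampleRTT, with alpha = 0.125 as recommended in RFC 6298. The function name and the sample values are hypothetical and are not taken from the trace.

```python
ALPHA = 0.125  # weight of the newest sample (assumed; value recommended in RFC 6298)


def estimated_rtts(sample_rtts, alpha=ALPHA):
    """Return the EstimatedRTT after each sample.

    The first estimate is set equal to the first sample, as question 7 asks.
    """
    estimates = []
    estimate = None
    for sample in sample_rtts:
        if estimate is None:
            estimate = sample
        else:
            estimate = (1 - alpha) * estimate + alpha * sample
        estimates.append(estimate)
    return estimates


# Hypothetical RTT samples in seconds
samples = [0.100, 0.200, 0.100]
for i, est in enumerate(estimated_rtts(samples), start=1):
    print(f"after ACK {i}: EstimatedRTT = {est:.4f} s")
```

With these made-up samples the estimate moves only one eighth of the way towards each new measurement, which is why a single unusually long RTT has a limited effect on the estimate.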

TCP Congestion Control in Action

In practice, a great many TCP versions have been proposed, and a wide range of them are in use on various systems. For example, Linux machines today typically use CUBIC TCP, and some Microsoft machines use an extension to Reno called Compound TCP, which uses a combination of losses and delay measurements to adjust the congestion window. In addition, companies such as Google are advocating for a large initial window and implementing their own transport layer solutions. Other TCP versions are designed specifically for data centers, wireless environments, and long-haul links in research networks. One method to understand how different TCP versions (implemented on different machines and operating systems) operate is to collect packet traces under different traffic conditions (degrees of congestion) and see how the protocols behave. In this part of the assignment you will learn how time-sequence graphs can be used for this task. For this part of the assignment you will look at TWO different traces. In addition to the original trace files, you can find a more recent sample trace here.

Wireshark's Time-Sequence Graph

Let's now examine the amount of data sent per unit time from the client to the server. Rather than (tediously!) calculating this from the raw data in the Wireshark window, we'll use one of Wireshark's TCP graphing utilities, Time-Sequence-Graph(Stevens), to plot the data. Select a TCP segment in Wireshark's Packet List Pane. Then select the menu: Statistics->TCP Stream Graph->Time-Sequence-Graph(Stevens). You should see a plot that looks similar to the plot in Figure 2.

Note: You will not get the graph as shown in Figure 2 if you click on the wrong packet, e.g. if you click on an acknowledgment packet instead of a TCP segment containing data!
Figure 2a: TCP sequence number plot using one (old) sample trace. Figure 2b: TCP sequence number plot using one (recent) sample trace.

Here, each dot represents a TCP segment sent, plotting the sequence number of the segment versus the time at which it was sent. Note that a set of dots stacked above each other represents a series of packets that were sent back-to-back by the sender. Left-clicking on a dot in the graph selects (i.e. moves the highlight to) the corresponding segment in the Packet List Pane.

Along with the graph window, Wireshark also shows the graph control window (see Figure 3). By clicking on the Help button in this window you can find out the keyboard shortcuts for zooming, navigating, etc.

Use Ctrl + "+" to zoom in and Ctrl + "-" to zoom out.

Figure 3: Wireshark's graph control window

Task B: Please answer and discuss the following three questions:

  13. Use the Time-Sequence-Graph (Stevens) plotting tool to view the sequence number versus time plot of segments being sent from the client to the server (Figure 2a and Figure 2b). For each of the two traces, can you identify where TCP's slow start phase begins and ends, and where congestion avoidance takes over? If you can, explain how. If not, explain why not. To better identify these phases, you may need to find the number of unacknowledged packets (or bytes) at different times and plot the unacknowledged packets (y-axis) as a function of time (x-axis). Note that the number of unacknowledged packets at different times can be found by comparing the number of packets that have been sent with the number of packets that have been acknowledged. After plotting the number of unacknowledged packets versus time, comment on ways in which the measured data differs from the idealized behavior of TCP that we've studied in the text.
  14. Explain the relationship between (i) the congestion window, (ii) the receiver advertised window, (iii) the number of unacknowledged bytes, and (iv) the effective window at the sender.
  15. Is it generally possible to find the congestion window size (i.e. cwnd), and how it changes with time, from the captured trace files? If so, please explain how. If not, please explain when it is possible and when it is not. Motivate your answer and give examples. Your answer may also benefit from a discussion in the context of the two prior questions.
There is a lot that can be learned from time-sequence graphs. If you are interested in learning more about generating and interpreting these types of figures, you are encouraged to do the optional section at the end of this assignment.

A Short Study of TCP Fairness

Task C: Please carefully answer and discuss questions 16-18 as outlined in this section.

In this part of the assignment, three cases will be presented to you based on some example measurements that Farrokh performed at the beginning of fall 2011. You will be asked to discuss these scenarios (and the high-level results provided) with regard to TCP fairness (see Section 3.7.1 of the text). As a hint, consider that the textbook presents the following formula to estimate the steady-state throughput of a TCP connection:

Average throughput of a connection = ( 1.22 * MSS ) / ( RTT * sqrt(L) )

where MSS is the maximum segment size, RTT is the round-trip time, and L is the loss rate. For simplicity, you can assume that the loss rate is the same for connections sharing the same bottleneck link.
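
The formula above can be turned into a small calculation, sketched below. The function name, the units (MSS in bytes, RTT in seconds, result in bits per second) and the example values are assumptions made for illustration only.

```python
import math


def estimated_throughput_bps(mss_bytes, rtt_seconds, loss_rate):
    """Steady-state throughput estimate, 1.22 * MSS / (RTT * sqrt(L)), in bits per second."""
    if rtt_seconds <= 0:
        raise ValueError("RTT must be positive")
    if not 0 < loss_rate <= 1:
        raise ValueError("loss rate must be in (0, 1]")
    bytes_per_second = 1.22 * mss_bytes / (rtt_seconds * math.sqrt(loss_rate))
    return bytes_per_second * 8


# Hypothetical values: MSS of 1460 bytes, 1% loss, two different RTTs
for rtt in (0.1, 0.2):
    bps = estimated_throughput_bps(1460, rtt, 0.01)
    print(f"RTT {rtt * 1000:.0f} ms -> about {bps:.0f} bps")
```

Note that, with MSS and L held fixed, doubling the RTT halves the estimate. This inverse dependence on RTT is the property to keep in mind when comparing connections that share a bottleneck link.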

The first case to consider is four concurrent downloads from the same server using four different clients (all clients are on the same host). The following table shows the total number of bytes, the duration, and the RTT associated with each of the connections:

Connection  Total transferred bytes  Duration (in seconds)  RTT (in milliseconds)
1           165095720                521                    12
2           165842766                521                    12
3           165458792                514                    12
4           163235772                512                    12
  16. What is the throughput of each of the connections in bps (bits per second)? What is the total bandwidth of the host on which the clients are running? Discuss the TCP fairness for this case.
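
The throughput arithmetic used in this section can be sketched as follows. The function names and the example numbers are hypothetical (they are not the values from the tables). The sketch converts bytes and seconds to bits per second and reports each connection's fraction of the summed rate.

```python
def throughput_bps(total_bytes, duration_seconds):
    """Average throughput of one connection in bits per second."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return total_bytes * 8 / duration_seconds


def shares(connections):
    """Return (per-connection rates, their sum, each rate as a fraction of the sum).

    connections is a list of (total_bytes, duration_seconds) pairs.
    """
    rates = [throughput_bps(b, d) for b, d in connections]
    total = sum(rates)
    return rates, total, [r / total for r in rates]


# Hypothetical (bytes, seconds) pairs
rates, total, fractions = shares([(1_000_000, 4), (3_000_000, 4)])
for rate, fraction in zip(rates, fractions):
    print(f"{rate:.0f} bps ({fraction:.0%} of the total)")
print(f"sum: {total:.0f} bps")
```

Summing the per-connection rates approximates the bandwidth used by the host only to the extent that the connections were active at the same time.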

Another case to consider is downloading the same file from different mirror servers around the world. The following table lists the details of each of the connections:

Connection  Total transferred bytes  Duration (in seconds)  RTT (in milliseconds)
1           261319130                90                     13
2           175995832                90                     35
3           151894552                90                     68
4           140388568                90                     73
5           108610702                90                     49
6           70644690                 90                     33
7           65744938                 90                     135
8           43212876                 90                     326
9           39222524                 90                     322
  17. What is the throughput of each of the connections in bps (bits per second)? What is the total bandwidth of the host on which the clients are running? Discuss the TCP fairness for this case.

The final case to consider is a BitTorrent download from multiple peers. Similar to the previous cases, the details of each of the connections are presented in the following table. This time only ten of the connections are presented.

Connection  Total transferred bytes  Duration (in seconds)  RTT (in milliseconds)
1           108851134                58                     40
2           90435681                 58                     36
3           57971584                 53                     100
4           32000012                 29                     68
5           32557334                 35                     31
6           27199361                 31                     33
7           26329578                 31                     122
8           38834490                 56                     146
9           23571761                 35                     74
10          36252962                 55                     66
  18. Discuss the TCP fairness for this case.

For all of these questions you must take a closer look at the relationships between the characteristics of the different connections and discuss your findings in the context of the different experiments. You are expected to show that you understand the concept of TCP fairness and how the different scenarios may impact the throughput relationships that you observe and those that you may expect in general. To help the discussion you may, for example, want to create a scatter plot that shows the estimated round-trip time (RTT) and the throughput of the different connections against each other. You should also carefully examine and discuss the above throughput equation and how it may apply to each scenario.
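
One possible way to prepare data for such a scatter plot is sketched below. The function names, the output file name scatter.txt and the example connections are all hypothetical. Besides RTT and throughput, the sketch also computes the product throughput * RTT, which the throughput equation predicts to be the same for connections with equal MSS and loss rate.

```python
def scatter_rows(connections):
    """Turn (total_bytes, duration_s, rtt_ms) tuples into
    (rtt_ms, throughput_bps, throughput_bps * rtt_seconds) rows."""
    rows = []
    for total_bytes, duration_s, rtt_ms in connections:
        bps = total_bytes * 8 / duration_s
        rows.append((rtt_ms, bps, bps * rtt_ms / 1000))
    return rows


def write_scatter_file(rows, path):
    """Write the rows as whitespace-separated columns that gnuplot can read."""
    with open(path, "w") as f:
        f.write("# rtt_ms throughput_bps throughput_times_rtt_bits\n")
        for rtt_ms, bps, product in rows:
            f.write(f"{rtt_ms} {bps:.0f} {product:.0f}\n")


if __name__ == "__main__":
    # Hypothetical connections, not the values from the tables
    example = [(10_000_000, 10, 20), (5_000_000, 10, 40)]
    write_scatter_file(scatter_rows(example), "scatter.txt")
```

The resulting file can be plotted in gnuplot with, for example, plot "scatter.txt" using 1:2 with points. If the third column varies widely between connections, the differences in throughput are not explained by RTT alone.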

Demonstration and Report

For this assignment you will need to write a report that carefully answers each of the three tasks A (Q1-12), B (Q13-15), and C (Q16-18, as outlined above). Please structure your report such that your answers are clearly indicated for each question (and section of the assignment). It is not the TA's task to search for the answers. Both the questions themselves and the corresponding answers should be clearly stated (and indicated) in your report. Structure your report accordingly. Furthermore, your answers should be explained and supported using additional evidence, when applicable.

It is important that you demonstrate the assignment (and discuss your report) with the TA before handing in the report. Also, in addition to having a draft of the report ready, please make sure to open Wireshark and have the trace files ready before calling the TA for the demonstration.

To assess your understanding of the lab, during the demonstration, the TA may ask questions similar to those in the report. As the assignments are done in groups of two, both members of the group will be asked to answer questions. You are expected to clearly explain and motivate your answers both verbally AND in the written report.

Additional instructions and information about the reports can be found here. Please take this chance to read the guidelines carefully.

OPTIONAL: More Detailed Time Sequence Number Plots

The Time-Sequence-Graph tool in Wireshark does not show the acknowledgment packets together with those transmitted segments they correspond to. In the next part of this assignment, you will use gnuplot to plot both TCP segments and their corresponding acknowledgments on the same graph:

gnuplot> set xrange [0:0.5]
gnuplot> replot
Figure 4: gnuplot's window

As you will shortly see, you can export the plots generated by gnuplot to a number of formats including Postscript, PNG, JPEG, etc. Additionally, you can put a number of gnuplot commands in a text file (called a gnuplot script file) and run it using the following command at the terminal:

% gnuplot scriptfile.gp

where scriptfile.gp is the name of the text file containing the gnuplot commands. Here's a sample gnuplot script file which demonstrates exporting the same plot to multiple file formats with different resolutions. The # character comments out the words following it on the same line.

# This is a sample gnuplot script file

# Positioning the graph key (legend)
set key top left
set key box

set size 1.0, 1.0

# Configuring the output to be Postscript
set terminal postscript landscape enhanced color
set output "plot1.ps"

plot "sender.txt"  using 1:2 title 'Data Packet' with points , \
        "acks.txt" using 1:2 title 'Ack Packet' with points


# Configuring the output to be PNG
set terminal png size 800, 600
set output "plot1.png"

replot

# Configuring the output to be JPEG
set terminal jpeg size 1024,768
set output "plot1.jpeg"

replot

Note: You can convert postscript files to PDF by typing ps2pdf filename.ps filename.pdf in the terminal.

Estimating the Congestion Window Size

In Section 3.7 of the textbook, you read that the amount of unacknowledged data at the sender side is always less than or equal to min{cwnd,rwnd}, where cwnd is the congestion window size and rwnd is the amount of memory space left in the receive buffer (see Section 3.5.5). There, it is also mentioned that if the receive window is so large that the rwnd constraint can be ignored, the number of unacknowledged bytes is limited only by cwnd. Therefore, under this assumption there will be a correspondence between the number of unacknowledged packets and the value of cwnd. In this part of the assignment, you will plot the estimated cwnd versus time graph by assuming that the rwnd constraint in min{cwnd,rwnd} can be ignored.
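
The idea of tracking unacknowledged data over time can be sketched as below. This is a minimal sketch under stated assumptions, not the expected solution: the input is assumed to be two in-memory lists of (time, number) pairs rather than the actual trace file format, the function name is made up, and sequence number wrap-around is ignored. Because a sequence number marks the first byte of a segment, the estimate leaves out the payload of the most recently sent segment.

```python
def outstanding_bytes(sent, acked):
    """Estimate the unacknowledged bytes after every send or ACK event.

    sent  -- list of (time, sequence_number) pairs for data segments
    acked -- list of (time, acknowledgment_number) pairs for ACKs
    Returns a list of (time, estimated_outstanding_bytes).
    """
    if not sent:
        return []
    base = min(seq for _, seq in sent)
    highest_seq = base  # highest sequence number sent so far
    highest_ack = base  # highest acknowledgment number received so far
    # Merge both lists in time order; 0 marks a send event, 1 marks an ACK event
    events = sorted([(t, 0, n) for t, n in sent] + [(t, 1, n) for t, n in acked])
    series = []
    for t, kind, number in events:
        if kind == 0:
            highest_seq = max(highest_seq, number)  # max() ignores retransmissions
        else:
            highest_ack = max(highest_ack, number)
        series.append((t, max(highest_seq - highest_ack, 0)))
    return series


# Hypothetical events, not taken from the trace files
sent = [(0.00, 1), (0.01, 1461), (0.02, 2921)]
acked = [(0.05, 1461), (0.06, 2921)]
for t, outstanding in outstanding_bytes(sent, acked):
    print(f"{t:.2f} {outstanding}")
```

The printed columns (time, outstanding bytes) can be plotted directly with gnuplot. Whether the resulting curve is a good estimate of cwnd depends on the assumption that rwnd is not the limiting factor, and on where the trace was captured.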

Practice questions (not to be explicitly answered):

  1. Use gnuplot to plot the timestamps (x-axis) and sequence numbers (y-axis) of the sent packets, as well as the corresponding acknowledgement numbers (y-axis), in the same time-sequence plot (such that sequence numbers and acknowledgements are lined up). You may want to use a base value of zero for the first byte, so that the y-axis starts at zero. Also, unless you have already done so in previous questions, plot the estimated cwnd as a function of time, carefully taking into account the exact timestamps of each packet in the trace.
  2. Implement an algorithm (in a language of your choice) to calculate the data required for plotting the cwnd versus time graph, and export that data as a text file to be used by gnuplot. As input, your algorithm should receive two text files as described below:
  3. Plot the cwnd versus time using gnuplot and the output of the program that you developed in the previous question. Compare this plot with the plot that you obtained manually in the first question of this section.

More Experiments with RTT and Throughput

Among the files provided to you for this lab are the following two pairs of input files:

  upload_1_seq.txt and upload_1_ack.txt
  download_1_seq.txt and download_1_ack.txt

These files use the same format as the input to the program you developed in the previous section. The name of each file shows whether it contains the sequence numbers or the acknowledgment numbers. If the name contains the word upload, the file was captured at the transmitting side; if it contains the word download, it was captured at the receiving side. As you will see in this task, the capture side can affect how the packet data should be interpreted.

Practice questions (not to be explicitly answered):

  1. Run the algorithm you developed in the previous section using the upload files (i.e. upload_1_seq.txt and upload_1_ack.txt) as input, and plot cwnd versus time using gnuplot and the output of your algorithm.
  2. Plot the time-sequence graph for the upload files and use it to validate the cwnd plot you obtained in the previous question. Explain the validation method you used, with some examples.
  3. Repeat the previous two questions for the download files (i.e. download_1_seq.txt and download_1_ack.txt) and see if you can validate the cwnd versus time plot. Explain the discrepancies between the plots and suggest a simple workaround for your algorithm.