Energy Saving in Mobile Devices

 

By Farrokh Ghani Zadegan and Niklas Carlsson, September 2012

Overview of the Assignment

The purpose of this assignment is to investigate potential energy savings in mobile devices while surfing the Web. For this task we will use Wireshark, which is a packet sniffer, to study a prerecorded trace file. A trace file is a set of packets captured from a network interface card and stored in a standard packet capture format. If you are not already familiar with Wireshark, you can use Wireshark Lab: Getting Started as a quick start guide. However, this manual will explain every step, such that only a basic familiarity with the Wireshark graphical user interface is needed.

In this assignment, we will analyze this trace file, which was captured on a smartphone running Android while surfing the Web. The trace was captured while visiting the five most popular Web sites according to www.alexa.com. The basic idea is to identify the idle times and data transfers during the trace period, and to use this information to investigate the energy savings that may be possible. We have performed a rough preliminary analysis of the trace file. Figure 1 shows a bar chart of all TCP connections that carry HTTP data. The horizontal axis shows the time (in milliseconds) and the vertical axis shows the connection ID. Bars are colored according to the server address. The number in front of each bar is the amount of data, in kilobytes, exchanged during the corresponding TCP connection.

Figure 1: TCP connections carrying HTTP data

Consider the first five connections. Note that there is no correspondence between the length of a TCP connection and the amount of exchanged data. This can be explained, for example, by the browser's use of persistent connections, in which multiple requests and responses are serialized over the same TCP connection. Unfortunately, this makes it non-trivial to determine how much idle time there may be between a request and its corresponding response, or between a response and the next request. (Note that a single user request may result in a large series of HTTP requests. For example, in our trace file, the user made only five page requests, each after the previous Web page had been fully loaded in the mobile browser.) In this lab, we will use Wireshark to extract more information about the time and duration of each HTTP request and its corresponding response. We will use this information to estimate the amount of idle time during which the smartphone could potentially turn off some hardware components to save energy (e.g., let the radio go to sleep).

Information Extraction

Using Wireshark to extract information

To get Wireshark running on a Sun machine in the SU labs, enter module add net/wireshark at the terminal command prompt and press Enter. After the module has loaded successfully, type wireshark and press Enter. Download the trace file from here and store it in a local folder. In the main window of Wireshark, choose File->Open to load this trace file. After loading the file, the Wireshark window should look similar to Figure 2.

Figure 2: Wireshark window after loading the trace file.

In order to extract the start time and duration of each request and response, we need to use a filter string similar to the following:

tcp contains "GET" || tcp contains "HTTP/1.1" || tcp.segment

The above filter string requires a bit of explanation. First of all, large HTTP requests and responses (i.e., those larger than the maximum TCP segment size, typically 1460 bytes) are split across multiple TCP segments. While Wireshark marks the last segment of such a series as the "actual" request/response, this information is not sufficient for the purpose of this lab, as we want to extract both the starting time and the duration of each request/response. The first term in the filter string tells Wireshark that we are interested in the TCP segments that contain the string "GET"; this returns the packets containing the actual HTTP request. Similarly, the second term tells Wireshark to return the first packet of each HTTP response. Finally, the third term (tcp.segment) tells Wireshark that we are also interested in the last segment of each request and response (which Wireshark has marked with the number of reassembled segments).

After entering and applying the above filter string, the Wireshark screen should look similar to the one displayed in Figure 3.

Figure 3: Wireshark window after applying the filter string

Clicking on packet number 211 shows (in the packet details pane) that this packet is reassembled from multiple frames and that the first packet of the series is number 26. Note that our filter string allowed us to find the first packet as well as the last packet. When you click on the last packet of a series, such as packet number 211, Wireshark reassembles and (if needed) decompresses the data and shows the whole request/response in a separate tab in the packet bytes pane (the lowermost pane in the Wireshark window).

So far, we have filtered out the first and last packets of all of the requests and responses in the trace file. Please note that if a request or response is not split across multiple segments, there is only one record for it in the filtered list. However, we need to know which response goes with which request in order to calculate the idle times between requests and responses (and between a response and the next request). Wireshark assigns a number to each TCP connection, called the TCP stream number. We will add this stream number as a column to the packet listing pane in Wireshark and use it to find the matching request-response pairs. To add the TCP stream number as a column, select Edit->Preferences..., and click on the Columns item under the User Interface node in the leftmost list of the window that opens. Click on the Add button, select Custom from the dropdown menu for the Field type, and enter tcp.stream in the field name box. Optionally, you can click on the name New Column that Wireshark has assigned to the newly added column and enter a meaningful name such as Stream Number. You may want to reorder the columns by dragging the newly added column above the Info column. Once you have added the new column, the list of columns in the Preferences window might look like Figure 4. After closing the Preferences window (by clicking on the OK button), the new column will be shown in the packet listing pane.

Figure 4: The Preferences window in Wireshark after adding the TCP Stream number as a new column.

We will now move the list of packets to a spreadsheet for further analysis. We will use OpenOffice.org Calc for this purpose, since it is installed on the lab machines. However, you may use MS Excel if you feel more comfortable with it. To export the filtered list of packets, select File->Export->As "CSV" (Comma Separated Values packet summary) file. In the opened window enter a filename (use .csv extension), click on All packets and the Displayed button in the packet range group, and press OK.

Using OpenOffice.org Calc for Processing the Data

To get the packet list file imported into Calc, it should have a csv extension (otherwise Calc will not show the text import wizard). Run Calc and open the file. The text import wizard shows up. In the Separator Options, choose Separated by and Comma, and press OK.

The packet listing is now imported into the worksheet. It might be a good idea to save the file as an ODF Spreadsheet (.ods) to keep the original text file intact. We now need to sort the data by the Stream Number and then by the time. This sorting places the packets belonging to a given TCP stream in consecutive rows. To do the sorting, choose Data->Sort, and within the Sort window choose Sort by Stream Number and Then by Time, both Ascending.
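For readers who prefer to script this step, the same ordering can be sketched in Python. This is only an illustration: the function name is hypothetical, the column headers "Stream Number" and "Time" are assumed to match the names chosen during the export, and the sample rows are made up rather than taken from the trace.

```python
import csv
import io

def sort_by_stream_then_time(csv_text):
    """Return the exported rows sorted by stream number, then by time."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Convert to numbers before comparing; sorting the raw strings would
    # put "10" ahead of "9".
    rows.sort(key=lambda r: (int(r["Stream Number"]), float(r["Time"])))
    return rows

# Made-up sample rows (assumed column names, not data from the trace).
sample = '''"No.","Time","Stream Number","Info"
"3","0.30","1","GET /b HTTP/1.1"
"1","0.10","0","GET /a HTTP/1.1"
"2","0.20","1","[TCP segment of a reassembled PDU]"
'''

for row in sort_by_stream_then_time(sample):
    print(row["Stream Number"], row["Time"], row["Info"])
```

Sorting on numeric keys matters here because the CSV export stores every field as text.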

Enter the text "Length" into the cell H1 and the following formula into the cell H2:

=IF(ISNUMBER(FIND("TCP segment";G2));B3-B2;IF(ISNUMBER(FIND("TCP segment";G1));0;0.000001))
Important Note: If your system settings use a comma as the decimal symbol, you need to (1) replace 0.000001 with 0,000001 in the above formula, and (2) select the data in the "Time" column and replace dots with commas. When doing so, make sure that in the Find & Replace window you choose More Options and then Current selection only, so that data in other columns is not changed inadvertently.

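This formula does the following (you might want to find out how it works):

If the Info cell of the current row contains "TCP segment", the row is the first packet of a multi-segment request or response, and the length is the time of the next row minus the time of the current row.
Otherwise, if the Info cell of the previous row contains "TCP segment", the row is the last packet of a message whose length has already been counted, and the length is 0.
Otherwise, the request or response fits in a single packet, and the length is set to a nominal 0.000001.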

Copy and paste this formula into the H cells for the rest of the rows. Calc will take care of appropriate modifications in the formula to match the destination row number. Here, you might want to change the cell format to show the values with up to 6 decimal places.
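The logic of the Length formula can also be written out in Python, which may help when checking the spreadsheet results. The sketch below is an assumption-laden mirror of the formula, not part of the lab material: the function name is hypothetical, and it expects two parallel lists (times and Info strings) that are already sorted by stream and time. Unlike the spreadsheet, it returns 0 rather than a negative value if the very last row is a first segment with no following row.

```python
def event_lengths(times, infos):
    """Mirror the spreadsheet Length formula for rows sorted by stream and time."""
    lengths = []
    for i, info in enumerate(infos):
        if "TCP segment" in info:
            # First packet of a multi-segment message: it lasts until the
            # next row, which is the last packet of the same message.
            nxt = times[i + 1] if i + 1 < len(times) else times[i]
            lengths.append(nxt - times[i])
        elif i > 0 and "TCP segment" in infos[i - 1]:
            # Last packet of a message already accounted for by the row above.
            lengths.append(0.0)
        else:
            # Single-packet message: nominal, very short duration.
            lengths.append(0.000001)
    return lengths

# Made-up example values, not taken from the trace.
print(event_lengths(
    [1.0, 1.5, 2.0],
    ["[TCP segment of a reassembled PDU]", "HTTP/1.1 200 OK", "GET / HTTP/1.1"],
))
```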

This may be a good time to take a closer look at the records and to check whether everything seems reasonable. As mentioned earlier, Wireshark marks the last segment of a multi-segment HTTP transfer as the actual request/response. If for any reason, such as a lost TCP segment, the request/response cannot be reconstructed from the segments that arrived, Wireshark cannot mark the last segment as the HTTP request/response. Unfortunately, there is one such case in the trace file used for this lab. At this time, you should identify and deal with the affected records in a reasonable way; e.g., you may want to delete the affected records or manually add a record to compensate for the missing one.

We are also interested in having a Boolean value to show whether a row corresponds to a request or to a response. Enter "Is Request" into the cell I1 and the following formula into I2:

=IF(ISNUMBER(FIND("TCP segment";G2));IF(ISNUMBER(FIND("GET";G3));1;0);IF(ISNUMBER(FIND("GET";G2));1;0))

Copy and paste this formula into the I cells for the rest of the rows. A value of 1 in a cell in the I column signifies a request and a value of zero signifies a response.
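As with the Length column, the Is Request formula can be mirrored in Python for cross-checking. This is a sketch under the same assumptions as before (hypothetical function name, Info strings already sorted by stream and time). Like the spreadsheet FIND function, the substring test is case-sensitive.

```python
def is_request_flags(infos):
    """Mirror the spreadsheet Is Request formula: 1 = request, 0 = response."""
    flags = []
    for i, info in enumerate(infos):
        if "TCP segment" in info:
            # A first segment carries no method in its Info text, so the
            # type is taken from the following row (the reassembled message).
            nxt = infos[i + 1] if i + 1 < len(infos) else ""
            flags.append(1 if "GET" in nxt else 0)
        else:
            flags.append(1 if "GET" in info else 0)
    return flags

# Made-up example values, not taken from the trace.
print(is_request_flags(
    ["[TCP segment of a reassembled PDU]", "HTTP/1.1 200 OK", "GET / HTTP/1.1"]
))
```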

So far, in columns B, H, and I, we have a list of HTTP events, each represented by a start time, a length, and a type (i.e., request or response). In the next step of this assignment we will develop a small program (in the language of your choice) that reads this information and calculates the total amount of idle time, i.e., the periods in which no request or response is in progress. As input to this program, a text file is required in which each row contains the start, length, and type of an event. This text file should be sorted in ascending order of event start time.

To create this file, we need to copy the values in columns B, H, and I into a new worksheet, sort it on the start time, and export it as a text file. To copy the values, hold down the Ctrl key (for multiple selection) and click on the headers of columns B, H, and I. Press Ctrl+C, choose Insert->Sheet, and press OK in the Insert Sheet window. Make sure A1 is selected in the newly inserted sheet and press Ctrl+V. Select Data->Sort and, in the Sort window, select Sort by "Time" in ascending order, and press OK. To save this worksheet as a text file, choose File->Save As, choose Text CSV as the file type in the Save As window, enter a file name, and press Save. You will be warned that formatting will be lost by saving as a text file; press Keep Current Format. In the next window, choose a field delimiter (depending on which delimiter is handled most easily by the programming language you will use for the idle time calculation) and press OK. You will receive a warning that only the active worksheet is saved, which is fine. You should now have a file in which each row holds the start time, length, and type of one event.
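A program that consumes this file first has to parse it. The following Python sketch shows one way to do that; it assumes a comma as the field delimiter, a dot as the decimal symbol, and an optional header row, and the function name and sample values are invented for illustration rather than taken from the trace.

```python
import csv
import io

def read_events(text, delimiter=","):
    """Parse rows of start, length, is_request; skip a header row if present."""
    events = []
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if len(row) < 3:
            continue  # blank or incomplete line
        try:
            start, length = float(row[0]), float(row[1])
        except ValueError:
            continue  # non-numeric row, e.g. the column headers
        events.append((start, length, int(float(row[2])) == 1))
    # The export should already be sorted, but sorting again is cheap.
    events.sort(key=lambda e: e[0])
    return events

# Invented sample content in the assumed layout.
sample_text = "Time,Length,Is Request\n0.5,0.000001,1\n0.7,0.25,0\n"
print(read_events(sample_text))
```

If the file was saved with a different delimiter, pass it through the delimiter argument, for example read_events(text, delimiter=";").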

Estimating Energy Saving Potential

Create an event-based simulator that uses the above .csv file (with the start time and duration of each event) as input. Your simulator should use this file to simulate when the radio needs to be turned on and when it could potentially be turned off (to save energy). Please do as many of the questions below as you have time for, but at least the first. Make assumptions as you find necessary, but try to justify each additional assumption you make.

While you will not need to present any results for this lab, it should be noted that there likely will be an exam question based on this assignment. Furthermore, if you decide to do the later extensions I would be very happy if you could email (or present) your solutions to me (Niklas). Also, if you are interested in further refinements of modeling such a system, I would be happy to direct you to additional resources.

Basic model: Idle time estimations

In the simplest version, you can assume that the radio can be turned off whenever there are no active HTTP requests or responses. This corresponds to identifying the idle periods in the above trace file. Note that you have the start time of each transaction, as well as how long that transaction keeps the system busy. For this task, you simply need to write a program that calculates how much of the time there is no active transfer (request or response). Hint: You may want to create a queue of events and process the events in the order they occur.
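One possible way to compute the idle time is a single sweep over the events sorted by start time, which is a simpler alternative to a full event queue. The Python sketch below is only one approach under stated assumptions: the function name is hypothetical, events are (start, length) pairs, and the observation window is assumed to run from the first start to the last end, so time before the first event and after the last one is not counted.

```python
def total_idle_time(events):
    """Sum the gaps during which no request or response is in progress.

    events: iterable of (start, length) pairs in any order.
    """
    intervals = sorted((start, start + length) for start, length in events)
    if not intervals:
        return 0.0
    idle = 0.0
    busy_until = intervals[0][1]
    for start, end in intervals[1:]:
        if start > busy_until:
            # Nothing was active between busy_until and this start.
            idle += start - busy_until
        # Overlapping transfers extend the current busy period.
        busy_until = max(busy_until, end)
    return idle

# Made-up events: two overlapping transfers, then two isolated ones.
print(total_idle_time([(0, 2), (1, 2), (5, 1), (10, 0)]))
```

Tracking busy_until with max handles overlapping transfers on different TCP connections, which a simple "next start minus previous end" subtraction would get wrong.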

Extension 1: Warmup and shutdown period

You may now assume that the radio requires X time units to start up and Y time units to shut itself off, but that you still (optimistically) know exactly when data will be transferred, such that no additional delays are introduced and the radio can be turned on and off just in time for each transfer. How do your energy savings decrease as X and Y increase?
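A minimal sketch of this extension in Python follows. It rests on assumptions that are not part of the assignment text: the radio is counted as on during both startup and shutdown, and the radio stays on through any idle gap that is not longer than X + Y. The function and parameter names are hypothetical.

```python
def idle_gaps(events):
    """Return the lengths of the gaps between busy periods."""
    intervals = sorted((start, start + length) for start, length in events)
    gaps = []
    if not intervals:
        return gaps
    busy_until = intervals[0][1]
    for start, end in intervals[1:]:
        if start > busy_until:
            gaps.append(start - busy_until)
        busy_until = max(busy_until, end)
    return gaps

def radio_off_time(events, startup, shutdown):
    """Time the radio can be fully off, given startup (X) and shutdown (Y)."""
    overhead = startup + shutdown
    # A gap is only worth using if it is longer than the switching overhead.
    return sum(gap - overhead for gap in idle_gaps(events) if gap > overhead)

# Made-up events with idle gaps of 2 and 4 time units.
events = [(0, 2), (1, 2), (5, 1), (10, 0)]
for x, y in [(0, 0), (1, 1), (3, 2)]:
    print(x, y, radio_off_time(events, x, y))
```

Sweeping X and Y over a range of values and plotting the result is one way to answer the question above for the real trace.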

Extension 2: PSM

As in the previous question, you may now assume that the radio requires X time units to start up and Y time units to shut itself off; however, in contrast to the previous question, you must now rely on PSM beacons and PSM pull messages to pull the information from the access point. How can you simulate such a system? What additional parameters do you need? How do you capture the additional delay that the user would have to endure in these simulations? Please show your simulation model and describe any energy saving tradeoffs that you identify. What assumptions did you make about the PSM implementation? What information source did you use to justify your assumptions? Note that for this question, you may need to treat requests and responses differently.
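As one possible starting point, the extra delay seen by a response can be modeled as the wait until the next beacon. The Python sketch below makes several simplifying assumptions that you would need to justify or replace: beacons occur at times 0, T, 2T, ... for a beacon interval T (a parameter that is not given in this assignment), the access point holds each response until the first beacon at or after its original start time, the device wakes for every beacon, and requests are sent without waiting. The function names are hypothetical.

```python
import math

def next_beacon(t, beacon_interval):
    """Time of the first beacon at or after t, with beacons at 0, T, 2T, ..."""
    return math.ceil(t / beacon_interval) * beacon_interval

def added_delays(response_starts, beacon_interval):
    """Extra wait for each response that is held until the next beacon."""
    return [next_beacon(t, beacon_interval) - t for t in response_starts]

# Made-up response start times and beacon interval, in milliseconds.
print(added_delays([250, 300, 10], 100))
```

A fuller model would also shift every later event on the same connection by the accumulated delay and account for the wake-ups at each beacon when estimating energy.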