Assignment 4 (TDTS11): Scripts, utility tools, and trace analysis
By Niklas Carlsson and Rahul Hiran, February 2016. Revised by Carl Magnus Bruhner, April 2019.
IMPORTANT: We want to remind you to read the complete instructions before starting this assignment. Also note that this assignment is much more demanding than your previous assignments in this course, so you should make sure to get started right away.
Motivated by the concept of problem-based learning, the final assignment involves a number of tasks of varying difficulty. Some of them may require you to identify and learn tools and techniques that are not explicitly taught in the theory part of the course but that, once mastered (after some practice), may significantly simplify and speed up some of the tasks in this assignment and help you in your future careers.
The assignment consists of a number of "tasks", and there are some pointers to help you on your way. However, to simulate real-world problems, the information is not spoon-fed, there is often no single correct answer (but there are, of course, good and bad answers!), and you will need to identify the relevant information that best helps you complete the tasks. (In fact, there may even be somewhat conflicting pieces of advice, which you must weigh against each other.)
In your own words, please (concisely) explain what the following tools, sites, and/or services do and what information they may provide:
Note that Google and the 'man' pages on many systems can be valuable information sources.
(For example, try typing "man sort" to find out how you can use "sort" with different arguments.)
In this task you have two options: (i) collect your own trace, or (ii) find a person in the class who has collected a trace and borrow that trace. If you select option (ii), your report should clearly state whom you borrowed the trace from. If you select option (i), please share the trace with your classmates. (Looking at each other's traces and/or comparing results may provide additional insight into the differences you observe, as well as help you sanity-check your results.)
The trace should be collected as follows:
You should now have a trace in which the "front" page of the 10 most popular sites on the Internet (according to Alexa/Why No HTTPS?) was visited.
Note 1: Analysis of traces from purely text-based browsers (that only display the text) will not be acceptable.
Note 2: In order to receive usable data for all the tasks, you must disable ad-blockers.
Pick five (5) information-rich pages from the top 10 that have many different HTTP requests, and answer and discuss the following questions for each of them. Note that you may need to use additional tools or complementary information sources to answer some of the questions below. (Please also refer to the lecture notes for concepts such as RTT, hop count, etc.)
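A rough hop count can sometimes be read directly from a trace. The sketch below uses one common heuristic, not a method prescribed by this assignment: it assumes the remote host started from one of a few typical initial TTL values (32, 64, 128, or 255, an assumption that does not hold for every system) and that the file is an export with IP details, like the "IP packets with IP and HTTP details" example later in this document. Only packets received from a remote server are meaningful here; an outgoing packet captured on the sending host (such as the TTL 64 packet in that example) naturally gives 0 hops. IPv6 packets carry a "Hop Limit" field instead and are not handled. The names estimate_hops and hops_by_source are hypothetical, and the result is worth cross-checking with a tool such as traceroute.

```python
import re
import sys

# Assumption: typical initial TTL values; not every operating system uses these.
COMMON_INITIAL_TTLS = (32, 64, 128, 255)

# Matches e.g. "Internet Protocol Version 4, Src: 10.0.1.82 (10.0.1.82), Dst: ..."
SRC_LINE = re.compile(r"^Internet Protocol Version 4, Src: ([^\s,]+)")
# Matches e.g. "    Time to live: 64"
TTL_LINE = re.compile(r"Time to live:\s*(\d+)")


def estimate_hops(observed_ttl):
    """Guess the hops travelled, assuming the sender used the smallest
    common initial TTL that is >= the observed TTL."""
    for initial in COMMON_INITIAL_TTLS:
        if observed_ttl <= initial:
            return initial - observed_ttl
    raise ValueError(f"TTL out of range: {observed_ttl}")


def hops_by_source(lines):
    """Map each IPv4 source address to an estimated hop count.
    If a source appears in several packets, the last one seen wins."""
    result = {}
    current_src = None
    for line in lines:
        match = SRC_LINE.match(line.strip())
        if match:
            current_src = match.group(1)
            continue
        match = TTL_LINE.search(line)
        if match and current_src is not None:
            result[current_src] = estimate_hops(int(match.group(1)))
            current_src = None
    return result


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (file name is only an example): python ttl_hops.py ip_details.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for src, hops in sorted(hops_by_source(f).items()):
            print(f"{src}: ~{hops} hops")
```

An observed TTL of 52, for instance, is read as 64 - 52 = 12 hops. If a server's operating system uses an unusual initial TTL, the estimate will be off, which is why comparing with an active measurement is a good sanity check.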
Which of the top-25 sites on Alexa have gained the most relative popularity over the last year? Are there any other interesting trends on the World Wide Web? If possible, please provide both statistical and visual support for your answer.
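One possible way to quantify "relative popularity gain" is to compare two rank snapshots of the same list taken a year apart. The sketch below assumes you have collected those ranks yourself (for example from archived copies of the Alexa top-sites page) into two dictionaries. The ratio old_rank / new_rank is only one choice of metric: values above 1 mean the site climbed, and a move from rank 20 to rank 10 counts as much as a move from rank 2 to rank 1. The function name relative_rank_change and the site names in the example are hypothetical placeholders, not real data.

```python
def relative_rank_change(old_ranks, new_ranks):
    """Return (site, old_rank / new_rank) pairs, largest relative gain first.

    Ranks follow the Alexa convention that 1 is the most popular site.
    Sites missing from either snapshot are skipped, since no ratio exists.
    """
    changes = []
    for site, new_rank in new_ranks.items():
        old_rank = old_ranks.get(site)
        if old_rank is None:
            continue
        changes.append((site, old_rank / new_rank))
    # Highest ratio (biggest relative climb) first.
    return sorted(changes, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Placeholder snapshots; replace them with ranks you have actually collected.
    last_year = {"site-a.example": 20, "site-b.example": 5, "site-c.example": 10}
    this_year = {"site-a.example": 10, "site-b.example": 5, "site-c.example": 12,
                 "site-d.example": 3}
    for site, ratio in relative_rank_change(last_year, this_year):
        print(f"{site}: {ratio:.2f}x")
```

Sites that are new to the list (like the placeholder "site-d.example") get no ratio and are better discussed separately. The sorted ratios also translate directly into a bar chart for the visual support the question asks for.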
Eighth task [2pt]: The size of the Internet?
In this task you should build your own Web crawler that explores the tree structure of the links (and the objects that are used) on a given set of Web sites. Your crawler should take a URL as input and output a tree structure of the domains that are linked. For example, a site "www.foo.com" that has links to the domains "bar1.com" and "bar2.com" and uses an image from the domain "img.com" should result in the following relationships: "www.foo.com -> bar1.com", "www.foo.com -> bar2.com", and "www.foo.com -> img.com". You do not have to visualize the tree structure (although this can be a fun and interesting exercise in itself). The crawler should be able to handle trees of variable size, and for the demonstration you should show the crawler going three (3) levels deep with three (3) URLs at each depth (3x3 = 9 URLs fetched/displayed).
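Such a crawler can be prototyped with nothing but Python's standard library. The sketch below is one possible design under stated assumptions, not a reference solution: it treats the host part of each URL (the "netloc") as the "domain", records an edge for every cross-domain link (a href) or embedded object (src attributes, link href) found on a fetched page, and follows only a href links, keeping at most "width" URLs per level, each pointing to a domain not selected before, so that depth=3 and width=3 select at most 3x3 = 9 URLs. Pages whose links are generated by JavaScript will look nearly empty to this parser. The names crawl, LinkExtractor, and fetch, and the User-Agent string, are hypothetical choices.

```python
import sys
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import Request, urlopen


class LinkExtractor(HTMLParser):
    """Collect absolute http(s) URLs from href/src attributes of one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []    # <a href=...>: pages we may crawl further
        self.objects = []  # other href/src values: images, scripts, stylesheets, ...

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name not in ("href", "src") or not value:
                continue
            url = urljoin(self.base_url, value)  # resolve relative links
            if urlparse(url).scheme not in ("http", "https"):
                continue  # skip mailto:, javascript:, data:, ...
            if tag == "a" and name == "href":
                self.links.append(url)
            else:
                self.objects.append(url)


def fetch(url, timeout=10):
    """Download a page as text (the User-Agent value is an arbitrary choice)."""
    request = Request(url, headers={"User-Agent": "tdts11-crawler-sketch"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def crawl(start_url, depth=3, width=3, fetch_page=fetch):
    """Breadth-first crawl keeping at most `width` new URLs per level.

    Returns (edges, levels): edges is a set of (from_domain, to_domain)
    pairs for every cross-domain link or object seen on a fetched page;
    levels[i] lists the URLs selected at depth i + 1.
    """
    edges = set()
    levels = []
    frontier = [start_url]
    seen_domains = {urlparse(start_url).netloc}
    for _ in range(depth):
        next_level = []
        for url in frontier:
            try:
                html = fetch_page(url)
            except Exception:  # unreachable host, HTTP error, timeout, ...
                continue
            parser = LinkExtractor(url)
            parser.feed(html)
            src = urlparse(url).netloc
            for target in parser.links + parser.objects:
                dst = urlparse(target).netloc
                if dst and dst != src:
                    edges.add((src, dst))
            # Follow only <a> links, and only to domains not selected before.
            for link in parser.links:
                dst = urlparse(link).netloc
                if dst and dst not in seen_domains and len(next_level) < width:
                    seen_domains.add(dst)
                    next_level.append(link)
        levels.append(next_level)
        frontier = next_level
    return edges, levels


if __name__ == "__main__" and len(sys.argv) > 1:
    found_edges, found_levels = crawl(sys.argv[1])
    for src, dst in sorted(found_edges):
        print(f"{src} -> {dst}")
    for i, urls in enumerate(found_levels, start=1):
        print(f"level {i}: {urls}")
```

A run would look like "python crawler.py https://www.example.com/" (the file name is only an example). Passing fetch_page as a parameter makes it possible to test the traversal offline with a dictionary of fake pages before touching the network; when crawling real sites, adding a short delay between requests is also a good idea.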
For this final task you should write half a page describing what you have learned in this assignment through the tasks you have chosen. What challenges did you run into? What are the key lessons, and why do they matter? Please refer to the specific tasks when explaining.
Your answers should clearly explain what you learned and how you solved the questions. (Note that in many cases the steps taken to obtain an answer are more important than the answer itself.) Giving only the answer is not acceptable.
Please explain whether you found additional tools and information sources that helped you answer the above questions. Finally, if you could not solve a question, please explain why it was not possible and what information you would need to solve it.
Please structure your report so that your answers are clearly indicated for each question (and each section of the assignment); it is not the TA's task to search for the answers. Both the questions themselves and the corresponding answers should be clearly stated in your report. Furthermore, your answers should be explained and, when applicable, supported with additional evidence. During the demonstration the TA may ask similar questions to assess your understanding of the lab, and you are expected to clearly explain and motivate your answers. As the assignments are done in groups of two, both members of the group will be asked to answer questions.
Additional instructions and information about the reports can be found here. Please take this chance to read the guidelines carefully.
Using the "Filter" window, you can filter the packets/information that you want to export. For example, if you filter using "http", only the HTTP packet information will be displayed. When you export using the "As displayed" option, only the displayed packets will be exported to the text file.
You may want to use different filters to answer different questions in the assignment. Note that part of the assignment is to determine which information to extract, so as to simplify the processing of your exported traces. Here, it is important to keep track of which protocol information to filter for when answering each of the questions.
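Filtering does not have to happen only inside Wireshark. As a sketch, assuming a detail export such as the "HTTP with details" example below (where every packet starts with a line beginning "Frame N:"), the exported text can be split into per-packet blocks and filtered again afterwards. This is a crude keyword match on the exported text, not a re-implementation of Wireshark's display-filter language, and the function names split_frames and frames_containing are hypothetical.

```python
import re
import sys

# Every packet in a detail export starts with a line like "Frame 12: 440 bytes on wire ..."
FRAME_START = re.compile(r"^Frame \d+:")


def split_frames(lines):
    """Group exported lines into per-packet blocks (lists of lines).
    Any lines before the first "Frame" line form a block of their own."""
    frames, current = [], []
    for line in lines:
        line = line.rstrip("\n")
        if FRAME_START.match(line) and current:
            frames.append(current)
            current = []
        current.append(line)
    if current:
        frames.append(current)
    return frames


def frames_containing(frames, keyword):
    """Keep only the blocks in which some line contains `keyword`."""
    return [frame for frame in frames if any(keyword in line for line in frame)]


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python refilter.py httpAsdisplayed.txt "Referer:"
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        matching = frames_containing(split_frames(f), sys.argv[2])
    print(f"{len(matching)} packets contain {sys.argv[2]!r}")
```

Because whole packets are kept together, a question such as "how many requests carried a Referer header?" can be answered per packet rather than per line, which avoids the line-window guesswork of "grep -A".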
It is also possible to only export the "Packet summary line". Other packet details options are:
Below are some basic examples.
1. Packet summary:
No.  Time         Source           Destination      Protocol  Length  Sequence number  Acknowledgement number  Info
1    0.000000000  10.0.1.82        205.251.219.181  HTTP      440     1                1                       GET /images/help/bubble.png HTTP/1.1
2    0.004133000  10.0.1.82        205.251.219.181  HTTP      447     1                1                       GET /images/help/bubble_filler.png HTTP/1.1
6    0.029374000  205.251.219.181  10.0.1.82        HTTP      740     1449             375                     HTTP/1.0 200 OK (PNG)
8    0.034654000  205.251.219.181  10.0.1.82        HTTP      754     1                382                     HTTP/1.0 200 OK (PNG)
2. HTTP with details:
Frame 1: 440 bytes on wire (3520 bits), 440 bytes captured (3520 bits)
Ethernet II, Src: QuantaMi_13:af:92 (aa:aa:aa:aa:aa:aa), Dst: Apple_b9:73:56 (aa:aa:aa:aa:aa:aa)
Internet Protocol Version 4, Src: 10.0.1.82 (10.0.1.82), Dst: 205.251.219.181 (205.251.219.181)
Transmission Control Protocol, Src Port: 38710 (38710), Dst Port: http (80), Seq: 1, Ack: 1, Len: 374
Hypertext Transfer Protocol
    GET /images/help/bubble.png HTTP/1.1\r\n
    Host: pcache.alexa.com\r\n
    Connection: keep-alive\r\n
    User-Agent: Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.70 Safari/537.17\r\n
    Accept: */*\r\n
    Referer: http://www.alexa.com/topsites\r\n
    Accept-Encoding: gzip,deflate,sdch\r\n
    Accept-Language: en-US,en;q=0.8\r\n
    Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3\r\n
    \r\n
    [Full request URI: http://pcache.alexa.com/images/help/bubble.png]

Frame 2: 447 bytes on wire (3576 bits), 447 bytes captured (3576 bits)
Ethernet II, Src: QuantaMi_13:af:92 (aa:aa:aa:aa:aa:aa), Dst: Apple_b9:73:56 (aa:aa:aa:aa:aa:aa)
Internet Protocol Version 4, Src: 10.0.1.82 (10.0.1.82), Dst: 205.251.219.181 (205.251.219.181)
Transmission Control Protocol, Src Port: 38707 (38707), Dst Port: http (80), Seq: 1, Ack: 1, Len: 381
Hypertext Transfer Protocol
    GET /images/help/bubble_filler.png HTTP/1.1\r\n
    Host: pcache.alexa.com\r\n
    Connection: keep-alive\r\n
    User-Agent: Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.70 Safari/537.17\r\n
    Accept: */*\r\n
    Referer: http://www.alexa.com/topsites\r\n
    Accept-Encoding: gzip,deflate,sdch\r\n
    Accept-Language: en-US,en;q=0.8\r\n
    Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3\r\n
    \r\n
    [Full request URI: http://pcache.alexa.com/images/help/bubble_filler.png]
3. IP packets with IP and HTTP details:
Frame 1: 440 bytes on wire (3520 bits), 440 bytes captured (3520 bits)
Ethernet II, Src: QuantaMi_13:af:92 (aa:aa:aa:aa:aa:aa), Dst: Apple_b9:73:56 (aa:aa:aa:aa:aa:aa)
Internet Protocol Version 4, Src: 10.0.1.82 (10.0.1.82), Dst: 205.251.219.181 (205.251.219.181)
    Version: 4
    Header length: 20 bytes
    Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00: Not-ECT (Not ECN-Capable Transport))
    Total Length: 426
    Identification: 0xbe51 (48721)
    Flags: 0x02 (Don't Fragment)
    Fragment offset: 0
    Time to live: 64
    Protocol: TCP (6)
    Header checksum: 0xc5f9 [correct]
    Source: 10.0.1.82 (10.0.1.82)
    Destination: 205.251.219.181 (205.251.219.181)
Transmission Control Protocol, Src Port: 38710 (38710), Dst Port: http (80), Seq: 1, Ack: 1, Len: 374
Hypertext Transfer Protocol
    GET /images/help/bubble.png HTTP/1.1\r\n
    Host: pcache.alexa.com\r\n
    Connection: keep-alive\r\n
    User-Agent: Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.70 Safari/537.17\r\n
    Accept: */*\r\n
    Referer: http://www.alexa.com/topsites\r\n
    Accept-Encoding: gzip,deflate,sdch\r\n
    Accept-Language: en-US,en;q=0.8\r\n
    Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3\r\n
    \r\n
    [Full request URI: http://pcache.alexa.com/images/help/bubble.png]
4. Overview of all protocol headers:
No.  Time         Source           Destination      Protocol  Length  Sequence number  Acknowledgement number  Info
1    0.000000000  10.0.1.82        205.251.219.181  HTTP      440     1                1                       GET /images/help/bubble.png HTTP/1.1

Frame 1: 440 bytes on wire (3520 bits), 440 bytes captured (3520 bits)
Ethernet II, Src: QuantaMi_13:af:92 (20:7c:8f:13:af:92), Dst: Apple_b9:73:56 (24:ab:81:b9:73:56)
Internet Protocol Version 4, Src: 10.0.1.82 (10.0.1.82), Dst: 205.251.219.181 (205.251.219.181)
Transmission Control Protocol, Src Port: 38710 (38710), Dst Port: http (80), Seq: 1, Ack: 1, Len: 374
Hypertext Transfer Protocol

No.  Time         Source           Destination      Protocol  Length  Sequence number  Acknowledgement number  Info
2    0.004133000  10.0.1.82        205.251.219.181  HTTP      447     1                1                       GET /images/help/bubble_filler.png HTTP/1.1

Frame 2: 447 bytes on wire (3576 bits), 447 bytes captured (3576 bits)
Ethernet II, Src: QuantaMi_13:af:92 (20:7c:8f:13:af:92), Dst: Apple_b9:73:56 (24:ab:81:b9:73:56)
Internet Protocol Version 4, Src: 10.0.1.82 (10.0.1.82), Dst: 205.251.219.181 (205.251.219.181)
Transmission Control Protocol, Src Port: 38707 (38707), Dst Port: http (80), Seq: 1, Ack: 1, Len: 381
Hypertext Transfer Protocol

No.  Time         Source           Destination      Protocol  Length  Sequence number  Acknowledgement number  Info
3    0.020771000  205.251.219.181  10.0.1.82        TCP       66      1                375                     http > 38710 [ACK] Seq=1 Ack=375 Win=89 Len=0 TSval=2765196414 TSecr=4294963077

Frame 3: 66 bytes on wire (528 bits), 66 bytes captured (528 bits)
Ethernet II, Src: Apple_b9:73:56 (24:ab:81:b9:73:56), Dst: QuantaMi_13:af:92 (20:7c:8f:13:af:92)
Internet Protocol Version 4, Src: 205.251.219.181 (205.251.219.181), Dst: 10.0.1.82 (10.0.1.82)
Transmission Control Protocol, Src Port: http (80), Dst Port: 38710 (38710), Seq: 1, Ack: 375, Len: 0
...
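Exported summary lines are easy to post-process because each row has a fixed set of columns followed by free-text Info. The sketch below assumes the column layout of the "Packet summary" example (No., Time, Source, Destination, Protocol, Length, Sequence number, Acknowledgement number, Info) and requires the sequence and acknowledgement columns to be numeric; rows without them (for example DNS over UDP) are skipped rather than misparsed. It totals the bytes sent by each source and measures the time span covered. The function names are hypothetical.

```python
import re
import sys
from collections import Counter

# No. Time Source Destination Protocol Length Seq Ack Info
ROW = re.compile(
    r"^\s*(\d+)\s+(\d+\.\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(.*)$"
)


def parse_summary(lines):
    """Turn summary rows into dicts; header and non-matching lines are skipped."""
    rows = []
    for line in lines:
        match = ROW.match(line)
        if not match:
            continue
        no, ts, src, dst, proto, length, seq, ack, info = match.groups()
        rows.append({"no": int(no), "time": float(ts), "src": src, "dst": dst,
                     "proto": proto, "length": int(length),
                     "seq": int(seq), "ack": int(ack), "info": info.strip()})
    return rows


def bytes_by_source(rows):
    """Sum the frame lengths (bytes on the wire) sent by each source address."""
    totals = Counter()
    for row in rows:
        totals[row["src"]] += row["length"]
    return totals


def time_span(rows):
    """Seconds between the first and the last parsed packet."""
    times = [row["time"] for row in rows]
    return max(times) - min(times) if times else 0.0


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        parsed = parse_summary(f)
    for src, total in bytes_by_source(parsed).most_common():
        print(f"{src}: {total} bytes")
    print(f"{len(parsed)} packets over {time_span(parsed):.3f} s")
```

Applied to the four rows of the "Packet summary" example, this gives 887 bytes from 10.0.1.82 and 1494 bytes from 205.251.219.181 within about 0.035 s. If you add or remove columns in Wireshark before exporting, the regular expression has to be adjusted to match.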
Some commands that might be useful in this task:
awk, grep, sed, sort, uniq, wc
Finally, some random examples of the above commands that may be worth playing around with:
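# Distinct Host values: the first number printed by wc is the number of distinct hosts (drop "| wc" to see each host with its count)
cat httpAsdisplayed.txt | grep "Host" | sed 's/\\r\\n//g' | awk '{print $2}' | sort | uniq -c | wc
# Number of Referer lines found within 15 lines after a line containing "Host"
cat httpAsdisplayed.txt | grep -A 15 "Host" | grep "Referer" | wc
# Number of lines containing "Host" (roughly one per HTTP request)
cat httpAsdisplayed.txt | grep "Host" | wc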
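The first pipeline above (count the distinct "Host" values in httpAsdisplayed.txt) can also be sketched in Python, which makes it easier to extend later. This version is slightly stricter than the grep version: it only counts lines whose text starts with "Host:", so the word "Host" appearing elsewhere in a line (for example in "X-Forwarded-Host") is ignored. It assumes the HTTP headers were exported one per line with a literal \r\n at the end, as in the "HTTP with details" example; the function name count_hosts is hypothetical.

```python
import re
import sys
from collections import Counter

# "    Host: pcache.alexa.com\r\n" -> "pcache.alexa.com"; the trailing \r\n in
# Wireshark's text export is a literal backslash sequence, not a line break.
HOST_LINE = re.compile(r"^\s*Host:\s*(\S+?)(?:\\r\\n)?\s*$")


def count_hosts(lines):
    """Count how many exported HTTP requests carry each Host header value."""
    counts = Counter()
    for line in lines:
        match = HOST_LINE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python hosts.py httpAsdisplayed.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        counts = count_hosts(f)
    for host, n in counts.most_common():
        print(f"{n:6d} {host}")  # like "sort | uniq -c", but ordered by count
    print(f"{len(counts)} distinct hosts, {sum(counts.values())} Host lines")
```

The printed "distinct hosts" figure corresponds to the line count that the first shell pipeline reports through wc.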