At the risk of being repetitive, please clearly state and motivate any assumptions that you make. Explain and comment any code that you submit, and make sure that your answers are clearly explained and motivated.
The assignment can give up to four (4) bonus marks, assigned based on the quality of the solution. The bonus marks are only valid for the original exam (in October 2012). You may write in either Swedish or English and are expected to solve the questions individually.
The assignment is due on October 12 at 16:59 and should be submitted as a hardcopy (on paper) to Niklas (or placed in his mailbox) by the deadline.
While this question can be modeled analytically, you may want to use simulations to solve the problem. Consider a file-sharing community with M users. Assume that all M users want to download the same file, and that N of these users make the file available at time zero. Furthermore, assume that n of these N users make corrupt copies available (malicious or fake copies, for example) and that the downloader of a file cannot tell whether the file is corrupt until it has been fully downloaded. (The other N-n copies are good copies of the file.) You can assume that it takes T time units to upload a file to a single user and that all users are willing to upload copies until every peer in the system, except the n bad peers, has a good copy of the file.
Please simulate and plot how the number of good (and bad) copies in the system changes with time for different initial values of n, N, and M. (Without loss of generality you can pick T=1.) You can initially consider the case where the "good" peers always check whether the file is corrupt before uploading it to somebody else. If a peer discovers that the file is corrupt, it deletes the file and tries to download a new copy at random. Also consider the more general case where peers only check the validity of a file with probability p each time they have downloaded or uploaded the file. What happens when you change the probability p?
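For illustration, a minimal discrete-time sketch of this model in Python (one simulation round per T = 1 time unit) could look as follows. The pairing of one downloader per uploader per round, the function and parameter names, and the choice to check a copy right after a download (and, for an undetected corrupt copy, after an upload) are assumptions made only for this sketch, not part of the problem statement:

    import random

    def simulate_sharing(M, N, n, p=1.0, seed=0, max_rounds=1000):
        # Sketch only: state per non-malicious peer is 'none', 'good', or
        # 'bad' (an undetected corrupt copy). The n malicious seeds always
        # offer corrupt copies and are represented by the index -1 below.
        rng = random.Random(seed)
        peers = ['good'] * (N - n) + ['none'] * (M - N)
        history = []                 # (round, #good copies, #bad copies held by good peers)
        for t in range(max_rounds):
            good, bad = peers.count('good'), peers.count('bad')
            history.append((t, good, bad))
            if good == M - n:        # every non-malicious peer has a good copy
                break
            uploaders = [i for i, s in enumerate(peers) if s != 'none'] + [-1] * n
            downloaders = [i for i, s in enumerate(peers) if s == 'none']
            rng.shuffle(uploaders)
            rng.shuffle(downloaders)
            for dst, src in zip(downloaders, uploaders):   # each uploader serves one peer per round
                copy = 'bad' if (src == -1 or peers[src] == 'bad') else 'good'
                if copy == 'bad' and rng.random() < p:
                    copy = 'none'    # corrupt copy detected and deleted; retry in a later round
                peers[dst] = copy
                if src != -1 and peers[src] == 'bad' and rng.random() < p:
                    peers[src] = 'none'   # uploader checks its own copy after uploading
        return history

The returned history (plus the n permanently corrupt copies held by the malicious peers) can then be written to a file and plotted, for example with gnuplot or matplotlib, for different n, N, M, and p.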
For this question you may want to use a combination of wireshark, scripts, gnuplot, and various utility tools available in UNIX and/or on the Internet. Identify the top 100 most popular Web sites on the Internet (using alexa.com). If you access only the first page of each of these sites, how many servers do you contact in total? How many of the contacted servers (or hosts) are operated by a different company than the originally contacted host? How many of the sites are served by at least one server that is owned by Google? How many of the servers use gzip? How much bandwidth does gzip save in these cases? Where are all the servers located? How far away are they? Can you visualize these results using graphs, plots, and/or tables?
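If it helps, the gzip-related sub-questions for the front pages themselves can be scripted; a small Python sketch is given below. The input file name top100.txt (one hostname per line, for example exported from alexa.com) is just an assumed placeholder, and the remaining sub-questions (all contacted servers, third-party hosts, ownership, and location) are better answered from a wireshark trace of the page loads combined with whois/GeoIP lookups:

    import gzip
    import urllib.request

    def gzip_savings(host):
        # Request the front page and ask for gzip; urllib does not decode
        # the body, so len(body) is the number of bytes on the wire.
        req = urllib.request.Request('http://' + host + '/',
                                     headers={'Accept-Encoding': 'gzip'})
        with urllib.request.urlopen(req, timeout=10) as resp:
            body = resp.read()
            if resp.headers.get('Content-Encoding', '') == 'gzip':
                return len(body), len(gzip.decompress(body))  # (compressed, uncompressed)
            return len(body), len(body)                       # server did not use gzip

    with open('top100.txt') as f:              # assumed input file, one hostname per line
        for host in (line.strip() for line in f if line.strip()):
            try:
                comp, uncomp = gzip_savings(host)
                print(host, comp, uncomp, uncomp - comp)      # bandwidth saved in bytes
            except Exception as exc:
                print(host, 'error:', exc)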
For this question you are expected to use math (and possibly simulations) to estimate the average time that different content spends in a Web cache. You can assume that the cache serves a very large number of users from a fixed set of objects. Assume that there are N objects, that all objects are of the same size L, that the cache has a total size B, and that the requests for each object i follow a Poisson process with request rate λi, where the rates λi follow a Zipf distribution with Zipf parameter α. You should also assume that the cache uses an FCFS cache replacement policy.
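Note that the problem statement does not fix the exact Zipf convention; one common interpretation (and the one that makes λ0 in the next paragraph the request rate of the most popular object) is λi = λ0 / i^α for i = 1, ..., N, for example:

    def zipf_rates(N, alpha, lam0=1.0):
        # Assumed convention: lambda_i = lam0 / i**alpha, i = 1..N,
        # so lam0 is the request rate of the most popular object.
        return [lam0 / (i ** alpha) for i in range(1, N + 1)]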
Given the above information, you should determine a reasonable estimate of the time it takes for an object to be evicted from the cache, as well as the fraction of time each object would spend in the cache if the system remained in steady state for a very long time. When interpreting your results, you may want to consider different values of B, N, λ0, and α. (Without loss of generality you can pick L=1.)
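If you choose to complement the math with a simulation, an event-driven sketch along the following lines can be used to sanity-check your estimates. It assumes L = 1, the rate convention λi = λ0 / i^α from above, and that a cache hit does not change the FCFS eviction order; all names and default parameter values are placeholders:

    import random
    from collections import deque

    def simulate_fcfs_cache(N=1000, B=100, alpha=0.8, lam0=1.0,
                            sim_time=1e5, seed=0):
        rng = random.Random(seed)
        rates = [lam0 / (i ** alpha) for i in range(1, N + 1)]
        total = sum(rates)
        cache = deque()                 # FCFS order, oldest object at the left
        inserted = {}                   # object -> time of its latest insertion
        time_in_cache = [0.0] * N       # accumulated residence time per object
        sojourns = []                   # completed insertion-to-eviction times
        t = 0.0
        while t < sim_time:
            t += rng.expovariate(total)          # merged Poisson request stream
            r, obj = rng.random() * total, 0     # pick object i with prob. rates[i]/total
            while obj < N - 1 and r > rates[obj]:
                r -= rates[obj]
                obj += 1
            if obj in inserted:                  # hit: FCFS order unchanged
                continue
            if len(cache) >= B:                  # miss with full cache: evict the oldest
                old = cache.popleft()
                dt = t - inserted.pop(old)
                time_in_cache[old] += dt
                sojourns.append(dt)
            cache.append(obj)
            inserted[obj] = t
        mean_evict = sum(sojourns) / len(sojourns) if sojourns else float('nan')
        frac = [x / t for x in time_in_cache]    # ignores copies still cached at time t
        return mean_evict, frac

For a quick comparison with your analytical estimate, you can, for example, print mean_evict for a few combinations of B, N, λ0, and α.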
This exercise involves both reading and some exploration/implementation/analysis. First, carefully read a full-length research paper (not posters or demos!) that has been published in the last 5 years (2008-2012) in either ACM SIGCOMM or ACM SIGMETRICS. These papers are typically roughly 12 pages long and can typically be accessed through the authors' websites, the conference website, and/or the ACM digital library. (The ACM digital library can be accessed for free from the university; otherwise author copies are typically easy to find by pasting the title into Google and letting Google do the job for you ...) Second, carefully summarize the paper and the key insights from the paper. Finally, build your own (relatively) simple simulator (or alternatively build a simple mathematical model) to analyze/validate some smaller/specific aspect considered in the paper (or a question raised in the paper).