Bonus Assignment (TDTS11, 2015)

For this bonus assignment you are expected to solve one out of three exercises. These exercises are fairly open ended and there are many ways they can be solved. These questions are for bonus marks, so you will likely need to spend some time figuring out how to best solve the questions. This may involve doing some detective work, and learning new tools and/or techniques. It is important that you clearly explain the problem you are solving, how you solved the problem, including any abstractions/assumptions that you make to capture the dynamics of the system in the question, and what insights that your solution provide.

With the risk of being repetitive, please clearly state and motivate any assumptions that you make. Explain and comment code that you submit, and make sure that answers are clearly explained and motivated.

The assignment can give up to four (4) bonus marks, with marks assigned based on the quality of the solution. The bonus marks will only be valid for the original exam (in March 2015). You are expected to solve the question in pairs (with your lab partner).

The assignment is due on March 6, at 16:59 and should be submitted as a hardcopy (on paper) to Niklas (or placed in his mailbox) by the time of the deadline AND if you do the extension of assignment 4 also to Rahul.

Note that the deadline of this bonus assignment is earlier than the final deadline for the other assignments.

Good vs Bad: File Sharing Simulations

While this question can be analytically modeled, you may want to use simulations to solve the problem. Consider a file-sharing community with M users. Assume that all M users want to download the same file, and N of these users makes the file available at time zero. Furthermore, assume that n out of these N users are making available corrupt copies (malicious or fake copies, for example) and that the downloader of a file would not know if a file is corrupt or not until the file is fully downloaded. (The other N-n copies are good copies of the file.) You can assume that it takes T time units to upload a file to a single user and all users are willing to upload copies until every peer in the system except the n bad peers have a good copy of the file.

Please simulate and plot how the number of good (and bad) copies in the system changes with time for different initial values n, N, and M. (Without loss of generality you can pick T=1.) You can initially consider the case when the "good" peers always check if the file is corrupt or not, before uploading the file to somebody else. In the case a peer discover that the file is corrupt, they delete the file and try to download a new copy at random. Also, consider the more general case when peers only check the validity of a file with a probability p each time they either have downloaded or uploaded the file. What happen when you change the probability p?

Extension of Assignment 4

For this question you should have handed in assignment 4 by the above deadline and without any additional work (after this date), sometimes requested by the TA, you should have successfully completed and been awarded at least 24 points for the assignment. and done some additional investigation using one of the tools created (automated scripts, crawler, or netninny proxy).

For the bonus part you should hand in

Examples of additional investigation may be a visualization and comparison of the tree topology observed for some different websites, a comparison of the type of files types used on different sites, a discussion about the amount of compression at the different sites, of something else you may be interested in taking a closer look at, related to the top 25 sites, for example.

IMPORTANT NOTE: This extension assignments (and the information about which tasks that have been solved) will be coordinated between Rahul and Niklas.

Cache Occupancy

For this question you are expected to use math (and possibly simulations) to estimate the average time that different content spend in a Web cache. You can assume that the cache serves a very large number of users from a fixed set of objects. Assume that there are N objects, all objects are of the same size L, the cache has a total size B, and the requests for each file i follows a Poisson process with request rate λi, where the rates λi follow a Zipf distribution with Zipf parameter α. You should also assume that the cache use a FCFS cache replacement policy.

Given the above information you should determine a reasonable estimate for the time it takes for a file to get kicked out of the cache, as well as how much of the time each object would spend in the cache if the system remained in steady-state for a very long time. When interpreting your results, you may want to consider different B, N, λ0, and α. (Without loss of generality you can pick L=1.)

Current Research Topic

This exercise involves both reading and some exploration/implementation/analysis. First, carefully read a full-length research paper (not posters or demos!) that has been published the last 5 years (2010-2014) in one of the following conferences: ACM SIGCOMM, ACM SIGMETRICS, or ACM IMC. These papers are typically 12 or 14 pages long and can typically be accessed through the author's websites, the conference website, and/or through the ACM digital library. (The ACM digital library can be accessed for free from any computer on campus, thanks to the library; otherwise, author copies are typically easy to find by pasting the title into google, and let google do the job for you ...) Second, carefully summarize the paper and the key insights from the paper. Finally, build your own (relatively) simple simulator, mathematical model, or measurement framework to analyze/validate some smaller/specific aspect considered in the paper or to further investigate some question(s) raised in the paper.