Master Thesis - Past Projects - Abstract

Hadoop Read Performance During Datanode Crashes

ID: LIU-IDA/LITH-EX-G--16/056--SE

This bachelor thesis evaluates the impact of datanode crashes on the read performance of the Hadoop Distributed File System (HDFS). The goal is to better understand how datanode crashes, together with certain parameters, affect read performance, measured as the execution time of the get command. The parameters varied are the number of crashed nodes, the block size and the file size. Data was collected by running tests in a Linux test environment consisting of ten virtual machines with Hadoop installed. From this data, the average execution time and standard deviation of the get command were calculated. The network activity during the tests was also measured. The results showed that neither the number of crashed nodes nor the block size had any significant effect on the execution time. They also showed that the execution time of the get command was not directly proportional to the size of the fetched file: a file four times as large sometimes took more than four times as long to fetch, up to 4.5 times as long. However, the consequences of a datanode crash appear to be much greater when fetching a small file than a large one. The average execution time increased by up to 36% when a large file was fetched, but by as much as 85% when a small file was fetched.
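The measurements described in the abstract (average execution time, standard deviation, relative increase after a crash, and scaling with file size) can be illustrated with a minimal Python sketch. This is not the authors' test harness: the helper names `time_command`, `summarize`, `relative_increase` and `scaling_factor` are hypothetical, and the durations in the example are made-up illustrative values, not results from the thesis.

```python
import statistics
import subprocess
import time


def time_command(argv):
    """Run argv once and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    return time.perf_counter() - start


def summarize(samples):
    """Return (mean, sample standard deviation) of a list of durations."""
    return statistics.mean(samples), statistics.stdev(samples)


def relative_increase(baseline, observed):
    """Percentage by which observed exceeds baseline."""
    return (observed - baseline) / baseline * 100.0


def scaling_factor(time_small, time_large):
    """How many times longer the larger fetch took than the smaller one."""
    return time_large / time_small


if __name__ == "__main__":
    # Illustrative durations in seconds (assumed values, not thesis data).
    no_crash = [19.0, 20.0, 21.0]
    with_crash = [24.0, 25.0, 26.0]

    base_mean, base_sd = summarize(no_crash)
    crash_mean, crash_sd = summarize(with_crash)
    print(f"no crash:   mean={base_mean:.1f}s sd={base_sd:.1f}s")
    print(f"with crash: mean={crash_mean:.1f}s sd={crash_sd:.1f}s")
    print(f"increase:   {relative_increase(base_mean, crash_mean):.0f}%")
```

To collect real samples, `time_command` could be called repeatedly with something like `["hdfs", "dfs", "-get", "/data/testfile", "/tmp/testfile"]`, where both paths are placeholders; the local copy would need to be removed between runs so that each fetch starts from the same state.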

Keywords: hadoop, read performance, datanode, crashes, impact, file size, block size, number, crashed, nodes, distributed systems

File: Click here to download/view the thesis

Author(s): Fabian Johannsen and Mattias Hellsing

Contact: Mikael Asplund

Last modified February 2017.