Analysis of MapReduce Model on Big Data Processing within Cloud Computing

Marcellinus Ferdinand Suciadi

Abstract


Cloud computing has become a common platform for big data processing. Google created the MapReduce model to simplify big data computation: the input is split into key/value pairs that are processed in parallel, usually across a network of computers, and the results are then merged. However, the MapReduce model has limitations. Researchers have tried to improve it, resulting in newer models such as Mantri, Camdoop, Sudo, and Nectar. Each of these exploits different characteristics of the MapReduce model to make improvements in different ways and for different cases. Challenges and room for improvement remain in these enhanced models, which opens new possibilities for research.
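To make the split, process-in-parallel, and merge steps concrete, the following is a minimal single-machine sketch of the MapReduce pattern applied to word counting. It is an illustration only, not Google's implementation: the function names (map_words, shuffle, reduce_counts, word_count) and the sample input are assumptions chosen for this example, and a thread pool stands in for the network of computers.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_words(chunk):
    """Map step: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped):
    """Group all emitted values by key so each key is reduced once."""
    grouped = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce step: merge the values of one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    """Run map over the splits in parallel, then shuffle and reduce."""
    # Hypothetical stand-in for a cluster: each split is mapped by a worker thread.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_words, chunks))
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    splits = ["big data in the cloud", "the cloud and big clusters"]
    print(word_count(splits))
```

In a real deployment the map and reduce functions run on different machines and the shuffle moves data over the network, which is the stage that several of the improved models target.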


