Design and Approximation of Big Data Investigation using Mapreduce and Hadoop

Authors

  • Sahil Joshi, student

Keywords:

Hadoop, MapReduce, Hadoop Acceleration

Abstract

Hadoop is a popular open-source implementation of the MapReduce
programming model. MapReduce and HDFS are the two major components
of Hadoop. To avoid network congestion, a new method preprocesses
intermediate data between the map and reduce stages, thus increasing
the throughput of Hadoop clusters. Hadoop's existing limitations
include a serialization barrier that delays the reduce phase, as well
as repetitive merges and disk accesses. Handling large datasets
requires improving performance by modifying the existing Hadoop
system. This paper describes Hadoop-A, an acceleration framework that
optimizes Hadoop with a plugin mechanism for fast data movement,
overcoming these limitations. A merge algorithm is introduced to
merge data without repetition and disk access.
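The map, shuffle and reduce stages, and the idea of preprocessing intermediate data before it crosses the network, can be illustrated with a small in-process word count. This is a minimal sketch, not the method proposed in this paper: it assumes a combiner-style local aggregation (a standard MapReduce technique) as a stand-in for the preprocessing step, and the names `map_fn`, `combine`, `shuffle`, `reduce_fn` and `run_job` are hypothetical. It runs in a single Python process and does not use the Hadoop API.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

KV = Tuple[str, int]  # intermediate (key, value) pair


def map_fn(line: str) -> Iterator[KV]:
    # Map: emit (word, 1) for every whitespace-separated token.
    for word in line.split():
        yield word.lower(), 1


def combine(pairs: Iterable[KV]) -> List[KV]:
    # Hypothetical stand-in for preprocessing intermediate data: sum values
    # per key on the map side so fewer pairs cross the network.
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return sorted(counts.items())


def shuffle(map_outputs: Iterable[List[KV]]) -> Dict[str, List[int]]:
    # Shuffle: group all intermediate values by key across every mapper.
    groups: Dict[str, List[int]] = defaultdict(list)
    for output in map_outputs:
        for key, value in output:
            groups[key].append(value)
    return groups


def reduce_fn(key: str, values: List[int]) -> KV:
    # Reduce: fold all values that share a key.
    return key, sum(values)


def run_job(splits: List[List[str]], use_combiner: bool = True) -> Tuple[Dict[str, int], int]:
    """Run map -> (combine) -> shuffle -> reduce; return counts and pairs shuffled."""
    map_outputs = []
    for split in splits:  # one input split per simulated mapper
        pairs = [kv for line in split for kv in map_fn(line)]
        map_outputs.append(combine(pairs) if use_combiner else pairs)
    shuffled = sum(len(output) for output in map_outputs)
    groups = shuffle(map_outputs)
    counts = dict(reduce_fn(key, values) for key, values in sorted(groups.items()))
    return counts, shuffled


if __name__ == "__main__":
    splits = [["the cat sat", "the cat ran"], ["the dog sat"]]
    for flag in (False, True):
        counts, shuffled = run_job(splits, use_combiner=flag)
        print(f"combiner={flag}: {shuffled} pairs shuffled, {counts}")
```

With this input, 9 intermediate pairs are shuffled without the combiner and 7 with it, while the final word counts are identical.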
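The abstract does not spell out the merge algorithm itself. As a hedged illustration of the general idea, merging sorted map outputs in a single pass without repeated merge rounds or intermediate disk files, the sketch below performs a k-way merge with a min-heap. It assumes each segment is already sorted by key (Hadoop sorts map output before the shuffle); the plain Python iterators stand in for segments that a real system would fetch from remote nodes on demand. The names `kway_merge` and `reduce_merged` are hypothetical.

```python
import heapq
from itertools import groupby
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, int]  # (key, value) pair from one map output segment


def kway_merge(segments: List[Iterable[Record]]) -> Iterator[Record]:
    """Merge key-sorted segments in one pass using a min-heap.

    Only the current head record of each segment sits in the heap, and every
    record is pushed and popped exactly once, so no intermediate merged files
    are written (hypothetical in-memory sketch).
    """
    iters = [iter(seg) for seg in segments]
    heap: List[Tuple[str, int, Record]] = []
    for idx, it in enumerate(iters):
        first = next(it, None)
        if first is not None:
            # The segment index breaks ties, so records are never compared.
            heapq.heappush(heap, (first[0], idx, first))
    while heap:
        _, idx, record = heapq.heappop(heap)
        yield record
        # Pull the next record only from the segment that was just consumed.
        nxt = next(iters[idx], None)
        if nxt is not None:
            heapq.heappush(heap, (nxt[0], idx, nxt))


def reduce_merged(merged: Iterable[Record]) -> Iterator[Record]:
    # Merged output is sorted by key, so equal keys are adjacent for groupby.
    for key, group in groupby(merged, key=lambda r: r[0]):
        yield key, sum(value for _, value in group)


if __name__ == "__main__":
    segments = [[("cat", 2), ("the", 2)], [("dog", 1), ("the", 1)], [("ant", 4)]]
    print(dict(reduce_merged(kway_merge(segments))))
```

Python's standard `heapq.merge` provides the same single-pass merge; the explicit loop is shown only to make the one-head-record-per-segment structure visible.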

Author Biography

Sahil Joshi, student

Department of Computer Science Engineering, Global Institute of Technology, Jaipur, Rajasthan, India

Published

2020-05-14