Design and Approximation of Big Data Investigation using Mapreduce and Hadoop
Keywords:
Hadoop, MapReduce, Hadoop Acceleration
Abstract
Hadoop is a popular open-source implementation of the MapReduce
programming model. MapReduce and HDFS are the two major components
of Hadoop. Stock Hadoop has several limitations, including a
serialization barrier that delays the reduce phase and repetitive
merges and disk accesses. Handling large datasets therefore requires
improving performance by modifying the existing Hadoop system. To
avoid network congestion, this paper proposes a new method to
preprocess intermediate data between the map and reduce stages, thus
increasing the throughput of Hadoop clusters. We describe Hadoop-A, an
acceleration framework that optimizes Hadoop with a plugin mechanism
employed for fast data movement, overcoming its existing limitations.
A merge algorithm is introduced to merge data without repetition and
disk access.
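
To make the idea of merging without repetition concrete, the following is a minimal sketch of a single-pass k-way merge over key-sorted map-output segments, held in memory and driven by a priority queue. It is not the Hadoop-A merge algorithm itself and does not use any Hadoop API; the function name merge_sorted_segments and the (key, value) record shape are assumptions made purely for illustration.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

# Hypothetical record shape for illustration: a (key, value) pair.
Record = Tuple[str, int]


def merge_sorted_segments(segments: List[Iterable[Record]]) -> Iterator[Record]:
    """Merge key-sorted segments in one pass using a min-heap.

    Each segment is read exactly once and nothing is spilled to disk,
    so no record is merged more than once.
    """
    iterators = [iter(seg) for seg in segments]
    heap = []
    # Seed the heap with the first record of every non-empty segment.
    for idx, it in enumerate(iterators):
        first = next(it, None)
        if first is not None:
            # The segment index breaks ties between equal keys.
            heap.append((first[0], idx, first[1]))
    heapq.heapify(heap)

    while heap:
        key, idx, value = heapq.heappop(heap)
        yield key, value
        # Refill from the segment that just produced the smallest key.
        nxt = next(iterators[idx], None)
        if nxt is not None:
            heapq.heappush(heap, (nxt[0], idx, nxt[1]))


# Three small, already-sorted segments standing in for map outputs.
segments = [
    [("a", 1), ("c", 3)],
    [("b", 2), ("c", 4)],
    [("a", 5)],
]
merged = list(merge_sorted_segments(segments))
print(merged)
```

The heap holds at most one record per segment, so memory use grows with the number of segments rather than with the total data size. A real system would stream records over the network instead of reading Python lists.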