Statistics: The Essential Tool for Data Analysis

Authors

  • Avinash Mishra, UG Student, Department of Computer Science and Engineering, Chandigarh University, Gharuan, Punjab, India.

Keywords:

Data Science, Data Exploration, Architectures, Statistics

Abstract

In this essay, we argue that statistics is one of the most important disciplines for providing tools and methods that uncover structure in data and give deeper insight into it, and that it is the most important discipline for analysing and quantifying uncertainty. We give an overview of various proposed architectures for data science and discuss the impact of statistics on phases such as data gathering and enrichment, data exploration, data analysis and modelling, validation, representation, and reporting. We also highlight the errors that arise when statistical reasoning is disregarded.

Published

2023-10-13