Statistics: The Essential Tool for Data Analysis
Keywords:
Data Science, Data Exploration, Architectures, Statistics
Abstract
In this essay, we argue that statistics is one of the most important disciplines for providing the tools and methods that uncover structure in data and give deeper insight into it, and the most important discipline for analysing and quantifying uncertainty. We give an overview of various proposed architectures for data science and discuss the impact of statistics on phases such as data gathering and enrichment, data exploration, data analysis and modelling, validation and representation, and reporting. We also highlight errors that arise when statistical reasoning is disregarded.