Big Data analytics ontology
DOI:
https://doi.org/10.15587/2312-8372.2018.123612
Keywords:
Big Data analysis ontology, visualization data, data mining, Text Mining, MapReduce
Abstract
The object of this research is the process of Big Data (BD) analysis. One of the main problems is the lack of a clear classification of BD analysis methods; such a classification would greatly facilitate the selection of an optimal and efficient algorithm for analyzing the data, depending on their structure.
In the course of the study, Data Mining methods, Text Mining technologies, MapReduce technology, data visualization, and other analysis technologies and techniques were used. This made it possible to determine their main characteristics and features for constructing a formal model of Big Data analysis. Rules for analyzing Big Data were developed in the form of an ontological knowledge base, with the aim of using it to process and analyze any data.
A classifier for forming a set of Big Data analysis rules has been obtained. Each BD set has parameters and criteria that determine the methods and technologies of its analysis: the purpose of the BD, its structure, and its content determine the techniques and technologies for further analysis. Thanks to the knowledge-base ontology of BD analysis developed in Protégé 3.4.7 and the set of RABD rules built into it, the process of selecting methodologies and technologies for further analysis is shortened and the analysis of the selected BD is automated. This is because the proposed approach to Big Data analysis has a number of features, in particular an ontological knowledge base built on modern methods of artificial intelligence.
Thanks to this, a complete set of Big Data analysis rules can be obtained, but only if the parameters and criteria of the specific Big Data set are clearly analyzed.
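The idea of selecting analysis methods from the parameters of a data set can be illustrated with a minimal Python sketch. It is not the RABD rule set or the Protégé ontology described above: the profile fields (`structure`, `content`, `purpose`, `distributed`), the rule names, and the rule conditions are all hypothetical and chosen only to show how a rule base could map data set characteristics to the method families named in the abstract.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class DatasetProfile:
    """Hypothetical parameters describing a Big Data set (assumed, not from the paper)."""
    structure: str     # e.g. "structured", "semi-structured", "unstructured"
    content: str       # e.g. "text", "numeric", "mixed"
    purpose: str       # e.g. "exploration", "prediction", "reporting"
    distributed: bool  # whether the data is spread across many nodes


@dataclass(frozen=True)
class Rule:
    """One illustrative selection rule: if the condition holds, suggest the method."""
    name: str
    condition: Callable[[DatasetProfile], bool]
    method: str


# Illustrative rules only; the real rules would come from the ontology.
RULES: List[Rule] = [
    Rule("text-content", lambda p: p.content == "text", "Text Mining"),
    Rule(
        "structured-prediction",
        lambda p: p.structure == "structured" and p.purpose == "prediction",
        "Data Mining",
    ),
    Rule("distributed-storage", lambda p: p.distributed, "MapReduce"),
    Rule("exploration", lambda p: p.purpose == "exploration", "data visualization"),
]


def select_methods(profile: DatasetProfile, rules: List[Rule] = RULES) -> List[str]:
    """Return the methods of every rule whose condition matches, in rule order."""
    return [rule.method for rule in rules if rule.condition(profile)]


if __name__ == "__main__":
    profile = DatasetProfile("unstructured", "text", "exploration", True)
    print(select_methods(profile))
```

With this sketch, a profile that matches no rule yields an empty list, which mirrors the point made above: the rule set is only as complete as the analysis of the data set's parameters and criteria.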
License
Copyright (c) 2018 Victoria Vysotska
This work is licensed under a Creative Commons Attribution 4.0 International License.
The assignment of copyright and the conditions for its transfer (identification of authorship) are set out in the License Agreement. In particular, the authors retain the right of authorship of their manuscript and grant the journal the right of first publication of this work under the terms of the Creative Commons CC BY license. At the same time, they have the right to conclude additional agreements on their own concerning the non-exclusive distribution of the work in the form in which it was published by this journal, provided that the link to the first publication of the article in this journal is preserved.