An Examination of Machine Learning in the Process of Data Integration

	© 2023 by IJCTT Journal
	Volume-71 Issue-6
	Year of Publication : 2023
	Authors : Sandeep Rangineni, Divya Marupaka, Arvind Kumar Bhardwaj
	DOI : 10.14445/22312803/IJCTT-V71I6P114

How to Cite?

Sandeep Rangineni, Divya Marupaka, Arvind Kumar Bhardwaj, "An Examination of Machine Learning in the Process of Data Integration," International Journal of Computer Trends and Technology, vol. 71, no. 6, pp. 79-85, 2023. Crossref, https://doi.org/10.14445/22312803/IJCTT-V71I6P114

Abstract
This article discusses some of the challenges of real-world machine learning and data analysis and offers solutions. Although data-driven approaches in industrial and corporate applications can yield significant gains in productivity and efficiency, the associated cost and complexity can be daunting. In practice, creating machine learning applications requires arduous manual work, and it is frequently carried out by an experienced analyst who lacks deep domain expertise in the field of application. We go through some of the most common challenges encountered during analysis projects and provide advice for overcoming them. When applying machine learning methods to complicated data, for example in industrial applications, it is crucial to ensure that the processes generating the data are modelled correctly. The relevant features must be formalized and expressed so that computations can be carried out effectively. This makes it possible to build statistical models that are both consistent and expressive, which makes complicated systems easier to represent. Taking a Bayesian perspective, we make the models usable even when only a small amount of data is available and permit the encoding of prior information. We discuss how to extract this structure from sequences of data. Exploiting the dependencies between consecutive data points, we develop a correlation measure based on information theory that avoids the pitfalls of traditional methods. Iterative, interactive classification is favored in a wide variety of diagnostic settings. Data analysis projects can be made more efficient by focusing not only on the models used but also on the techniques and applications that can simplify the work. We provide a technique for data preparation together with a software library tailored toward rapid evaluation, prototyping, and implementation. Lastly, we look at several real-world applications, including classification, prediction, and anomaly detection.

Keywords
Machine learning, Data analytics, Artificial intelligence, Challenges, Data integrity, Data analysis, Data quality.
