Flexible Approach for Data Mining using Grid based Computing Concepts

International Journal of Computer Trends and Technology (IJCTT)
© 2017 by IJCTT Journal
Volume-48 Number-3
Year of Publication : 2017
Authors : Abdul Ahad, Dr.Y.Suresh Babu
DOI : 10.14445/22312803/IJCTT-V48P129


Abdul Ahad, Dr.Y.Suresh Babu "Flexible Approach for Data Mining using Grid based Computing Concepts". International Journal of Computer Trends and Technology (IJCTT) V48(3):160-164, June 2017. ISSN:2231-2803. www.ijcttjournal.org. Published by Seventh Sense Research Group.

Abstract -
Nowadays, in both the life sciences and business, knowledge discovery has become a common task, driven by the growing amount of data being gathered and by the complexity of the analyses that need to be performed on it. Today’s data sources also have some unique characteristics, such as heterogeneity, high dimensionality, distributed nature, and large volume. The distribution of data and computation supports the increasing trend towards decentralized business organizations, and the distribution of users, software, and hardware systems magnifies the need for more advanced and flexible approaches and solutions. Here we present the state of the art in the major data mining techniques, systems, and approaches. This paper discusses how distributed and Grid computing can be used to support distributed data mining. In particular, a distinction is made between distributed and Grid-based data mining methods.

Keywords -
Data mining, Distributed data, Grid computing, Knowledge discovery, Data sharing.