Preparing Data Sets For Data Mining Using CASE, PIVOT And SPJ

International Journal of Computer Trends and Technology (IJCTT)          
© - October Issue 2013 by IJCTT Journal
Volume-4 Issue-10                           
Year of Publication : 2013
Authors: I. Lakshmi Kantha Reddy, M. Samba sivudu


I. Lakshmi Kantha Reddy, M. Samba sivudu, "Preparing Data Sets For Data Mining Using CASE, PIVOT And SPJ," International Journal of Computer Trends and Technology (IJCTT), V4(10):3670-3678, October Issue 2013. ISSN Published by Seventh Sense Research Group.

Abstract: Data mining plays an important role in real-time applications, extracting business intelligence from business data to support expert decisions. Datasets are the input from which data mining discovers knowledge. However, preparing datasets manually is tedious because it involves aggregating relations and other complex operations. A further difficulty is that standard SQL aggregations do not produce datasets: they return only single-value results, which are not suitable for data mining. Data mining requires data in a horizontal layout. For this reason, this paper focuses on horizontal aggregations that can produce datasets. To that end, we build three constructs that can be used along with SQL queries to produce datasets automatically: SPJ, CASE, and PIVOT. We built a prototype for experiments, and the results show that the proposed aggregations are able to produce the required datasets.
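To make the idea of a horizontal layout concrete, the following is a minimal sketch of the CASE construct only, not the authors' prototype. It assumes a hypothetical table F(store_id, day, amount) held in an in-memory SQLite database; the table, its columns, and the sample rows are all invented for illustration. A plain GROUP BY returns one row per (store_id, day) pair, whereas the CASE form returns one row per store with one aggregated column per day.

```python
import sqlite3

# Hypothetical fact table F(store_id, day, amount); all names and rows
# are illustrative assumptions, not taken from the paper.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE F (store_id INTEGER, day TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO F VALUES (?, ?, ?)",
    [
        (1, "Mon", 10.0),
        (1, "Mon", 5.0),
        (1, "Tue", 7.0),
        (2, "Mon", 3.0),
        (2, "Wed", 8.0),
    ],
)

# Vertical aggregation: one row per (store_id, day) group.
vertical = conn.execute(
    "SELECT store_id, day, SUM(amount) FROM F "
    "GROUP BY store_id, day ORDER BY store_id, day"
).fetchall()

# Horizontal aggregation via CASE: one SUM(CASE ...) column per distinct
# value of the transposing column (day). Values are bound as parameters;
# the aliases are built from the sample values, which are known to be safe.
days = [row[0] for row in conn.execute("SELECT DISTINCT day FROM F ORDER BY day")]
columns = ", ".join(
    f'SUM(CASE WHEN day = ? THEN amount END) AS "amount_{d}"' for d in days
)
query = f"SELECT store_id, {columns} FROM F GROUP BY store_id ORDER BY store_id"
horizontal = conn.execute(query, days).fetchall()

for row in vertical:
    print("vertical:", row)
for row in horizontal:
    print("horizontal:", row)
```

In this sketch a store with no rows for a given day gets NULL (None in Python) in that column, because a CASE expression without an ELSE branch yields NULL and SUM over only NULLs is NULL. Under the same assumptions, the SPJ form would instead join one aggregated subquery per day, and the PIVOT form would rely on a DBMS-specific operator that SQLite does not provide.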



Keywords: SQL, aggregations, horizontal aggregations