Extended SQL Aggregation for Database Transformation

Archana A. Chaudhari; Harmeet Kaur Khanuja

doi:https://doi.org/10.14445/22312803/IJCTT-V18P157

Research Article | Open Access

Volume 18 | Number 1 | Year 2014 | Article Id. IJCTT-V18P157 | DOI: https://doi.org/10.14445/22312803/IJCTT-V18P157

Citation:

Archana A. Chaudhari, Harmeet Kaur Khanuja, "Extended SQL Aggregation for Database Transformation," International Journal of Computer Trends and Technology (IJCTT), vol. 18, no. 1, pp. 272-275, 2014. Crossref, https://doi.org/10.14445/22312803/IJCTT-V18P157

Abstract

Preparing a normalized data set from a relational database for analysis requires significant effort and is a time-consuming task. The main reason is that a database generally grows to contain many tables and views that must be joined, aggregated, and transformed to build the required data set. As a result, most SQL queries are written independently, multiple times, and in a disorganized manner, which creates problems for database evolution and software maintenance. To address this issue, we propose simple methods to generate SQL code that returns aggregated columns in a horizontal tabular layout, where every row corresponds to an observation, instance, or point (possibly varying over time) and every column is associated with one variable or dimension. This new class of functions is called horizontal aggregations. Horizontal aggregations build data sets with a horizontal denormalized layout (e.g., point-dimension, observation-variable, instance-feature), which is the standard layout required by most data mining algorithms. These standard data sets can be provided as input to a decision tree generation algorithm to generate a decision tree; similarly, they can be used to generate an extended ER model.
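As an illustration of the horizontal layout described above, the following is a minimal sketch, not the authors' implementation. It generates a SQL query that uses CASE expressions inside an aggregate, which is one common way to pivot rows into columns; the abstract does not state which evaluation strategy the paper uses. The table `sales`, its columns, the sample rows, and the helper `horizontal_aggregation_sql` are all hypothetical names chosen for this example, and SQLite is used only because it ships with Python.

```python
import sqlite3


def horizontal_aggregation_sql(table, group_col, pivot_col, value_col,
                               pivot_values, agg="SUM"):
    """Build a SELECT with one row per group_col value and one aggregated
    column per value of pivot_col (CASE-based pivoting).

    Hypothetical helper: names are interpolated directly into the SQL text,
    so it is only suitable for trusted, illustrative input.
    """
    columns = ", ".join(
        f"{agg}(CASE WHEN {pivot_col} = '{v}' THEN {value_col} END) "
        f"AS {value_col}_{v}"
        for v in pivot_values
    )
    return (f"SELECT {group_col}, {columns} FROM {table} "
            f"GROUP BY {group_col} ORDER BY {group_col}")


# Hypothetical sample data: one row per individual sale.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store_id INTEGER, quarter TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "Q1", 100.0), (1, "Q1", 50.0), (1, "Q2", 70.0),
     (2, "Q1", 30.0), (2, "Q2", 20.0), (2, "Q2", 5.0)],
)

# Standard (vertical) aggregation: one row per (store, quarter) pair.
vertical = conn.execute(
    "SELECT store_id, quarter, SUM(amount) FROM sales "
    "GROUP BY store_id, quarter ORDER BY store_id, quarter"
).fetchall()

# Discover the pivot values, then generate and run the horizontal query:
# one row per store, one column per quarter.
pivot_values = [row[0] for row in conn.execute(
    "SELECT DISTINCT quarter FROM sales ORDER BY quarter")]
sql = horizontal_aggregation_sql("sales", "store_id", "quarter", "amount",
                                 pivot_values)
horizontal = conn.execute(sql).fetchall()

print(sql)
print(vertical)
print(horizontal)
```

In this sketch the vertical result has four rows, one per store and quarter, while the horizontal result has one row per store with a separate column for each quarter. That second shape is the observation-by-variable layout the abstract describes as the input expected by most data mining algorithms.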

Keywords

Data mining, Transformation, Aggregation, Data preparation, Pivoting
