Classification Rule Discovery Using Genetic Algorithm-Based Approach

Syed Shaheena; Shaik Habeeb

doi:https://doi.org/10.14445/22312803/IJCTT-V4I8P158

Research Article | Open Access

Volume 4 | Issue 8 | Year 2013 | Article Id. IJCTT-V4I8P158

Citation:

Syed Shaheena, Shaik Habeeb, "Classification Rule Discovery Using Genetic Algorithm-Based Approach," International Journal of Computer Trends and Technology (IJCTT), vol. 4, no. 8, pp. 2710-2715, 2013. Crossref, https://doi.org/10.14445/22312803/IJCTT-V4I8P158

Abstract

The goal of data mining is to extract knowledge from large databases. To this end, a database can be viewed as a large search space and a mining algorithm as a search strategy. A search space generally contains an enormous number of elements, which makes exhaustive search infeasible. Genetic algorithms, introduced by J. H. Holland, are a search strategy that has been applied successfully in many fields. Data mining (DM) is acknowledged as an effective technique for the problem of ‘abundant data but poor knowledge’. Mining algorithms, the kernel of DM, have been investigated extensively; by analyzing existing data, they generate a precise class description that is then used to classify unknown data. A genetic algorithm can extract high-level classification/prediction rules of the following form: IF some conditions are satisfied THEN predict the value of some goal attribute. Genetic algorithms cannot operate on the data directly; the data must first be encoded in the form of a chromosome. Based on the notion of survival of the fittest, a new population is formed that consists of the fittest rules in the current population as well as offspring of these rules. Offspring are created by applying genetic operators such as crossover and mutation. In crossover, substrings from a pair of rules are swapped to form new rules; in mutation, randomly selected bits in a rule's string are inverted. In this work, we implement rule discovery for the Indian Liver Patient database (ILPD), collected from the north east of Andhra Pradesh, India.
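The operators described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the bit-list encoding, the function names (`crossover`, `mutate`, `next_generation`), the parameters `n_keep` and `rate`, and the toy fitness function (a count of 1 bits) are all assumptions made for illustration. A real fitness function would score a rule's predictive accuracy on the training data.

```python
import random


def crossover(rule_a, rule_b, point):
    """One-point crossover: swap the substrings after `point` between two rules."""
    child_a = rule_a[:point] + rule_b[point:]
    child_b = rule_b[:point] + rule_a[point:]
    return child_a, child_b


def mutate(rule, rate, rng):
    """Invert each bit of the rule independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in rule]


def next_generation(population, fitness, n_keep, rate, rng):
    """Form a new population from the fittest rules plus their offspring.

    Assumes n_keep >= 2 and rules of length >= 2 (hypothetical constraints
    of this sketch, needed to pick two parents and a crossover point).
    """
    # Survival of the fittest: keep the n_keep best rules unchanged.
    survivors = sorted(population, key=fitness, reverse=True)[:n_keep]
    new_population = [list(rule) for rule in survivors]
    # Fill the remaining slots with mutated offspring of the survivors.
    while len(new_population) < len(population):
        parent_a, parent_b = rng.sample(survivors, 2)
        point = rng.randrange(1, len(parent_a))
        for child in crossover(parent_a, parent_b, point):
            if len(new_population) < len(population):
                new_population.append(mutate(child, rate, rng))
    return new_population


# Usage: six random 8-bit rules, toy fitness = number of 1 bits (placeholder).
rng = random.Random(0)
population = [[rng.randint(0, 1) for _ in range(8)] for _ in range(6)]
population = next_generation(population, fitness=sum, n_keep=2, rate=0.05, rng=rng)
```

Keeping the survivors unchanged is a simple form of elitism; the abstract only states that the new population contains the fittest rules and their offspring, so the exact selection scheme here is one possible reading.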

Keywords

Knowledge Discovery in Databases (KDD), Data Mining, Machine Learning, Genetic Algorithm, Classification Rule, Genetic Operators, Fitness Function, Predictive Accuracy.
