International Journal of Computer Trends and Technology

Research Article | Open Access

Volume 4 | Issue 3 | Year 2013 | Article Id. IJCTT-V4I3P132 | DOI : https://doi.org/10.14445/22312803/IJCTT-V4I3P132

An Efficient Classification Approach for the XML Documents


Navya sree.Yarramsetti, G.Siva Nageswara Rao

Citation :

Navya sree.Yarramsetti, G.Siva Nageswara Rao, "An Efficient Classification Approach for the XML Documents," International Journal of Computer Trends and Technology (IJCTT), vol. 4, no. 3, pp. 362-366, 2013. Crossref, https://doi.org/10.14445/22312803/IJCTT-V4I3P132

Abstract

Extensible Markup Language (XML) is widely used as a standard format for data representation over the Internet. An XML document usually organizes textual data according to a predefined logical structure. Because of this inherent structure, conventional text classification methods cannot be applied to XML documents directly. In this paper, we address the problem of learning from XML documents by drawing on three research areas. The first is knowledge representation: we use a method based on a typed higher-order logic formalism, where the main focus is how to represent an XML document as higher-order logic terms that capture both its content and its structure. The second is symbolic machine learning: we present a new decision-tree learning algorithm driven by the precision/recall breakeven point (PRDT) for the XML document classification problem. A precision/recall heuristic is used because XML documents have strong connections with text documents. The third is semi-supervised learning: we present an algorithm based on PRDT and the co-training framework. Preliminary results show that the framework produces comprehensible theories and achieves good performance with both machine learning techniques.
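
For readers unfamiliar with the precision/recall breakeven point mentioned above, the following is a minimal sketch of the standard text-classification definition: documents are ranked by classifier score, and precision equals recall when the cutoff is set to the number of truly positive documents. The function name `breakeven_point` and the sample scores and labels are hypothetical, chosen only for illustration. The abstract does not say how PRDT uses this measure inside the decision tree, so this sketch should not be read as the authors' algorithm.

```python
def breakeven_point(scores, labels):
    """Precision (= recall) at the cutoff k = number of positive documents.

    scores: classifier confidence per document (higher = more likely positive)
    labels: 1 for a positive document, 0 for a negative one
    Note: hypothetical helper, not taken from the paper.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    # Rank documents from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    # Among the top n_pos documents, count the true positives.
    true_pos = sum(label for _, label in ranked[:n_pos])
    # With k = n_pos, precision = true_pos / k and recall = true_pos / n_pos coincide.
    return true_pos / n_pos


# Illustrative data only (assumed, not from the paper).
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
labels = [1, 0, 1, 1, 0]
print(breakeven_point(scores, labels))
```

In this toy example there are three positive documents and two of them appear in the top three ranks, so precision and recall are both 2/3 at the breakeven cutoff.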

Keywords

Precision/recall, co-training, machine learning, knowledge representation, semi-supervised learning.
