Web Crawl Detection and Analysis of Semantic Data

International Journal of Computer Trends and Technology (IJCTT)          
© 2015 by IJCTT Journal
Volume-21 Number-1
Year of Publication : 2015
Authors : Abhishek Yadav, Piyush Singh
DOI : 10.14445/22312803/IJCTT-V21P101


Abhishek Yadav, Piyush Singh, "Web Crawl Detection and Analysis of Semantic Data". International Journal of Computer Trends and Technology (IJCTT) V21(1):1-6, March 2015. ISSN: 2231-2803. www.ijcttjournal.org. Published by Seventh Sense Research Group.

Abstract -
Web mining is the mining of the World Wide Web to retrieve useful knowledge about user behavior, user queries, and the content and structure of the web. This paper addresses the mining of both structured and unstructured data. The rapid growth of websites and web portals that offer downloadable data has created demand for a specific strategy that delivers meaningful data to users and helps predict otherwise uncertain user behavior on the server. The Semantic Web is about machine-understandable web pages that make the web more intelligent and able to provide useful services to users. In this paper we propose an agent-based Semantic Web Mining System (SWMS). It classifies and clusters web content according to the links users navigate and the timing of their navigation to other pages, thereby facilitating knowledge-based responses to the user and highlighting otherwise unnoticed patterns. The system mainly comprises interface agents, a collection agent supported by an ontology database, a content mining agent, and a clustering agent. The content mining agent works in collaboration with a descriptive metadata agent and a semantic metadata agent.
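
As a rough illustration of grouping web content by navigated links and navigation time, the following minimal sketch may help. It is not the SWMS implementation. The click-log format, the reading of "time" as seconds spent on a page, the function names, and the bucket thresholds are all assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical click log: (page, next_page, seconds_spent_on_page).
# The record layout and the values are invented for illustration.
CLICKS = [
    ("/home", "/papers", 5),
    ("/home", "/about", 7),
    ("/papers", "/papers/rdf", 40),
    ("/papers", "/papers/owl", 50),
    ("/papers/rdf", "/home", 120),
    ("/about", "/home", 8),
]


def page_features(clicks):
    """Return {page: (mean seconds on page, number of distinct outgoing links)}."""
    dwell = defaultdict(list)
    links = defaultdict(set)
    for page, next_page, seconds in clicks:
        dwell[page].append(seconds)
        links[page].add(next_page)
    return {page: (mean(dwell[page]), len(links[page])) for page in dwell}


def cluster_by_dwell(features, short=10, long=60):
    """Group pages into three dwell-time buckets.

    The thresholds (in seconds) are arbitrary assumptions, not values
    taken from the paper.
    """
    clusters = {"skimmed": [], "read": [], "studied": []}
    for page, (avg_seconds, _link_count) in sorted(features.items()):
        if avg_seconds < short:
            clusters["skimmed"].append(page)
        elif avg_seconds < long:
            clusters["read"].append(page)
        else:
            clusters["studied"].append(page)
    return clusters


features = page_features(CLICKS)
clusters = cluster_by_dwell(features)
print(clusters)
```

A fixed-threshold grouping is used here only to keep the sketch short; a clustering agent as described in the abstract would more plausibly apply a learned clustering method over richer features.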

Keywords -
Semantic web mining, Resource Description Framework