Extended CurlCrawler: A focused and path-oriented framework for crawling the web with thumb

 
International Journal of Computer Trends and Technology (IJCTT)          
 
© - Issue 2012 by IJCTT Journal
Volume-3 Issue-3                           
Year of Publication : 2012
Authors: Dr Ela Kumar, Ashok Kumar

MLA

Dr Ela Kumar, Ashok Kumar. "Extended CurlCrawler: A focused and path-oriented framework for crawling the web with thumb". International Journal of Computer Trends and Technology (IJCTT), V3(3):327-335, Issue 2012. ISSN 2231-2803. www.ijcttjournal.org. Published by Seventh Sense Research Group.

Abstract: Information plays a vital and versatile role, and its availability has moved from the church, through books, to the web. The WWW is now a huge, open and up-to-date repository of information available to everyone, everywhere, at any time [1]. It is a thrust area of engineering endeavor and is evolving without a grand design blueprint. An age has come in which information has become an instrument, a tool that can be used to solve many problems. The biggest challenge posed by the Internet is its ever-growing size, with an endless pool of information hosted on the World Wide Web (WWW). It is difficult to identify and locate the desired information, with a graphical frame of mind, among the large set of web pages returned by a search engine, while reducing chaff and retaining the cross features of the framework. As the Internet grows further, the problem grows exponentially. Crawlers can retrieve data much more quickly and in greater depth than human searchers, so they can have a crippling impact on the performance of a site [7, 17]. Building a web crawler that serves a given purpose is not a difficult task, but choosing the right strategies and building an effective architecture leads to a multi-agent framework and, in turn, to a highly featured web crawler application [2, 3]. This paper is an experimental effort to develop and implement an extended framework, with an extended architecture, that makes search engines more efficient by using programming features for local resource utilization. It reports implementation experience with a focused and path-oriented approach that provides a cross-featured framework for search engines with a human-powered approach. In addition to curl programming, personalization of information, caching and graphical perception, the main features of this framework are that it is cross-platform, cross-architecture, focused, path-oriented and human-powered.

References

[1]. Segev, Elad (2010). Google and the Digital Divide: The Biases of Online Knowledge. Oxford: Chandos Publishing.
[2]. Vaughan, L. and Thelwall, M. (2004). "Search engine coverage bias: evidence and possible causes". Information Processing & Management 40 (4): 693-707.
[3]. Gandal, Neil (2001). "The dynamics of competition in the internet search engine market". International Journal of Industrial Organization 19 (7): 1103–1117.
[4]. Kobayashi, M. and Takeda, K. (2000). "Information retrieval on the web". ACM Computing Surveys (ACM Press).
[5]. Lawrence, Steve and Giles, C. Lee (1999). "Accessibility of information on the web". Nature 400 (6740): 107–9.
[6]. Zeinalipour-Yazti, D. and Dikaiakos, M. D. (2002). "Design and implementation of a distributed crawler and filtering processor". In Proceedings of the Fifth Next Generation Information Technologies and Systems (NGITS).
[7]. Cho, Junghoo (2001). "Crawling the Web: Discovery and Maintenance of Large-Scale Web Data". Ph.D. dissertation, Department of Computer Science, Stanford University, November 2001.
[8]. Shkapenyuk, V. and Suel, T. (2002). "Design and implementation of a high performance distributed web crawler". In Proceedings of the 18th International Conference on Data Engineering (ICDE), pages 357-368, San Jose, California. IEEE CS Press.
[9]. Edwards, J., McCurley, K. S., and Tomlin, J. A. (2001). "An adaptive model for optimizing performance of an incremental web crawler". In Proceedings of the Tenth Conference on World Wide Web (Hong Kong: Elsevier Science).
[10]. Shestakov, Denis (2008). Search Interfaces on the Web: Querying and Characterizing. TUCS Doctoral Dissertations 104, University of Turku.
[11]. Chakrabarti, S., van den Berg, M., and Dom, B. (1999). "Focused crawling: a new approach to topic-specific web resource discovery". Computer Networks,

Keywords: Topical, SOAP, Interacting Agent, WSDL, Thumb, Whois, CachedDatabase, IECapture, Searchcon, Main_spider, UDDI.