A Web Crawler Design for Data Warehousing

International Journal of Computer Trends and Technology (IJCTT)          
 
© 2017 by IJCTT Journal
Volume-50 Number-3
Year of Publication : 2017
Authors : Prof. Leena H. Patil, Ankit Khobragade, Priyakant Satpudke, Nikhil Sangani
DOI :  10.14445/22312803/IJCTT-V50P127

MLA

Prof. Leena H. Patil, Ankit Khobragade, Priyakant Satpudke, Nikhil Sangani. "A Web Crawler Design for Data Warehousing". International Journal of Computer Trends and Technology (IJCTT) V50(3):151-154, August 2017. ISSN:2231-2803. www.ijcttjournal.org. Published by Seventh Sense Research Group.

Abstract -
The size of the web has become a focus of research, and the internet is today the largest collection of data. Any computer program that processes web pages at large scale therefore needs a web crawler at some stage to fetch the pages to be analysed. A web crawler is a program that browses the internet in a methodical, automated manner; this process is called web crawling. A search engine uses a web crawler to collect web pages from the internet. In this paper we review a web crawler design for data warehousing that also allows searching in offline mode.
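
As an illustration of the idea described in the abstract, the following is a minimal sketch of a breadth-first crawler that stores fetched pages so they can be searched offline. It is not the authors' design: the names `crawl`, `search_offline`, `LinkExtractor` and `FAKE_WEB`, the `max_pages` limit, and the choice of breadth-first order are all assumptions made for this example. The `fetch` argument is a hypothetical callable that maps a URL to its HTML, so the sketch runs against an in-memory stand-in for the web instead of issuing real HTTP requests.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_url, fetch, max_pages=100):
    """Breadth-first crawl from seed_url; returns {url: html} for offline use.

    fetch is a hypothetical callable: url -> HTML string, or None on failure.
    """
    store = {}                 # the local page store (assumed in-memory here)
    queue = deque([seed_url])  # frontier of URLs still to visit
    seen = {seed_url}          # URLs already queued, to avoid revisiting
    while queue and len(store) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        store[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment part.
            link = urldefrag(urljoin(url, href)).url
            if link.startswith(("http://", "https://")) and link not in seen:
                seen.add(link)
                queue.append(link)
    return store


def search_offline(store, keyword):
    """Return the stored URLs whose HTML contains keyword (case-insensitive)."""
    needle = keyword.lower()
    return sorted(url for url, html in store.items() if needle in html.lower())


# Tiny in-memory "web" standing in for real HTTP requests (illustrative data).
FAKE_WEB = {
    "http://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.com/a": 'Data warehouse notes <a href="/">home</a>',
    "http://example.com/b": 'Crawler notes <a href="/a#top">A</a>',
}

pages = crawl("http://example.com/", FAKE_WEB.get)
hits = search_offline(pages, "warehouse")
```

Once `crawl` has returned, `search_offline` works only on the stored copies, which is the sense in which the search can run without a network connection. A real crawler would additionally need politeness delays, robots.txt handling and persistent storage, none of which are shown here.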

Keywords
Web crawler; Search engine; Offline mode.