International Journal of Computer
Trends and Technology

Research Article | Open Access

Volume 50 | Number 2 | Year 2017 | Article Id. IJCTT-V50P127 | DOI : https://doi.org/10.14445/22312803/IJCTT-V50P127

A Web Crawler Design for Data Warehousing


Prof. Leena H. Patil, Ankit Khobragade, Priyakant Satpudke, Nikhil Sangani

Citation :

Prof. Leena H. Patil, Ankit Khobragade, Priyakant Satpudke, Nikhil Sangani, "A Web Crawler Design for Data Warehousing," International Journal of Computer Trends and Technology (IJCTT), vol. 50, no. 2, pp. 151-154, 2017. Crossref, https://doi.org/10.14445/22312803/IJCTT-V50P127

Abstract

The size of the web is becoming a focus of research activity. The internet is the largest collection of data available today, so any large-scale processing of web pages has to be carried out by computer programs. Such processing needs a web crawler at some stage to fetch the pages that are to be analysed. A web crawler is a program that browses the internet in a methodical, automated manner; this process is called web crawling. A search engine uses a web crawler to collect web pages from the internet. In this paper we review a web crawler design for data warehousing that also allows searching in offline mode.
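
The idea summarised above (fetch pages methodically, keep them, and search them later without a connection) can be illustrated with a minimal sketch. This is not the design from the paper: it assumes a breadth-first visiting order, an in-memory dictionary as the stored collection, and a caller-supplied `fetch` function so that the example runs without network access. The names `LinkParser`, `crawl`, `search_offline`, and the `PAGES` test data are all hypothetical.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects the href values of <a> tags and the visible text of a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())


def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl from `seed`; returns {url: page text}.

    `fetch` is an assumed callable mapping a URL to its HTML, or None
    when the page cannot be retrieved.
    """
    store = {}              # stands in for the warehouse of fetched pages
    queue = deque([seed])   # frontier of URLs still to visit
    seen = {seed}           # avoids queueing the same URL twice
    while queue and len(store) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = LinkParser()
        parser.feed(html)
        store[url] = " ".join(parser.text)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return store


def search_offline(store, term):
    """Search the stored pages only; no network access is needed."""
    term = term.lower()
    return sorted(url for url, text in store.items() if term in text.lower())


# Hypothetical in-memory "web" used in place of real HTTP requests.
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a> home page',
    "http://example.test/a": '<a href="/">home</a> crawler notes',
    "http://example.test/b": "warehouse notes",
}

store = crawl("http://example.test/", PAGES.get)
hits = search_offline(store, "notes")
```

Passing `fetch` as a parameter keeps the traversal logic separate from the transport, so a real HTTP client could be substituted without changing `crawl`. A production crawler would also need politeness delays, robots.txt handling, and persistent storage, none of which are shown here.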

Keywords

Web crawler; Search engine; Offline mode.

References

[1] Mini Singh Ahuja, Dr Jatinder Singh Bal, Varnica, “Web Crawler: Extracting the Web Data,” International Journal of Computer Trends and Technology (IJCTT), vol. 13, no. 3, July 2014.
[2] Pavalam, S. M., SV Kashmir Raja, Felix K. Akorli, and M. Jawahar, “A Survey of Web Crawler Algorithms,” International Journal of Computer Science, vol. 8, iss. 6, no. 1, Nov. 2011.
[3] Component of web search system figure, Accessed July 25, 2017. https://www.google.co.in/url?sa=i&rct=j&q=&esrc=s&source=images&cd=&ved=0ahUKEwionMns_aTVAhXLpo8KHaB0BPQQjRwIBw&url=https%3A%2F%2Fen.wikipedia.org%2Fwiki%2FWeb_crawler&psig=AFQjCNGV10hs3Dm9EBk_yJgFmXskFXXq6g&ust=1501090566478463
[4] http://www.mathcs.emory.edu/~cheung/Courses/171/Syllabus/11-Graph/bfs.html
[5] https://www.researchgate.net/figure/315347498_fig1_Fig2-Pseudocode-for-Best-First-Search-algorithm
[6] http://db.cs.duke.edu/courses/fall11/cps149s/notes/a_star.pdf
[7] Shalini Sharma, “Web Crawler,” International Journal of Advanced Research in Computer Science and Software Engineering, vol. 4, iss. 4, April 2014.
[8] Aviral Nigam, “Web Crawling Algorithms,” International Journal of Computer Science and Artificial Intelligence, vol. 4, iss. 3, pp. 63-67, Sept. 2014.