A Web Crawler Design for Data Warehousing

The sheer size of the web has made it a focus of research activity. The internet is arguably the largest collection of data available today, so any large-scale processing of web pages has to be carried out by computer programs. Such processing needs a web crawler at some stage to fetch the pages that are to be analysed. A web crawler is a program that browses the internet in a methodical, automated manner; this process is called web crawling. A search engine uses a web crawler to collect web pages from the internet. In this paper we review a web crawler design for data warehousing that also allows searching in offline mode.

Web crawler; Search engine; Offline mode.