Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/2415
Full metadata record
DC Field | Value | Language
dc.date.accessioned | 2011-08-25T10:33:10Z | -
dc.date.available | 2011-08-25T10:33:10Z | -
dc.date.issued | 2011-08-25 | -
dc.identifier.uri | http://hdl.handle.net/10603/2415 | -
dc.description.abstract | The World Wide Web (WWW) is a huge repository of interlinked hypertext documents known as web pages, which users access via the Internet. Since its inception in 1990, the WWW has grown manyfold in size: it now contains more than 50 billion publicly accessible web documents distributed all over the world on thousands of web servers, and it is still growing at an exponential rate. Searching for information in such a huge collection is very difficult, because web pages are not organized like books on shelves in a library, nor are they completely catalogued at one central location. A search engine is the basic information retrieval tool used to access information on the WWW. Users submit search queries through the search engine's interface. In response, the search engine searches its database for relevant documents and returns the results ranked by relevance. The search engine builds its database with the help of web crawlers, where a web crawler is a program that traverses the Web and collects information about web documents. To maximize the download rate and to retrieve the whole or a significant portion of the Web, search engines run multiple crawlers in parallel. Overlap among downloaded web documents, quality, network bandwidth, and refreshing of web documents are the major challenges faced by existing parallel web crawlers that are addressed in this work. A novel Multi-Threaded (MT) server-based architecture for an incremental parallel web crawler has been designed that helps to reduce the overlap, quality, and network bandwidth problems. Additionally, web page change detection methods have been developed to refresh web documents by detecting structural, presentation-level, and content-level changes in them. These change detection methods help to detect whether the version of a web page held on the search engine side has changed at the web server end. If it has changed, the web crawler should replace the existing version in the search engine's database to keep its repository up to date. | en_US
dc.format.extent | xvi, 160p. | en_US
dc.language | English | en_US
dc.rights | university | en_US
dc.title | Design of a novel incremental parallel webcrawler | en_US
dc.creator.researcher | Yadav, Divakar | en_US
dc.subject.keyword | Computer Science | en_US
dc.subject.keyword | webcrawler | en_US
dc.subject.keyword | Information retrieval | en_US
dc.subject.keyword | World wide web | en_US
dc.subject.keyword | Information Technology | en_US
dc.description.note | References p. 123-132, Appendix p. 133-147, Synopsis p. synopsis-1-synopsis-12 | en_US
dc.contributor.guide | Gupta, J P | en_US
dc.contributor.guide | Sharma, A K | en_US
dc.publisher.place | Noida | en_US
dc.publisher.university | Jaypee Institute of Information Technology | en_US
dc.publisher.institution | Department of Computer Science Engineering and Information Technology | en_US
dc.date.completed | 2010 | en_US
dc.date.awarded | 2010 | en_US
dc.format.accompanyingmaterial | None | en_US
dc.type.degree | Ph.D. | en_US
dc.source.inflibnet | INFLIBNET | en_US
Appears in Departments: Department of Computer Science Engineering and Information Technology

Files in This Item:
File | Description | Size | Format
01_title.pdf | Attached File | 21.65 kB | Adobe PDF
02_table of contents.pdf | | 15.99 kB | Adobe PDF
03_declaration.pdf | | 9.78 kB | Adobe PDF
04_certificate.pdf | | 10.01 kB | Adobe PDF
05_acknowledgement.pdf | | 10.39 kB | Adobe PDF
06_abstracts.pdf | | 11.02 kB | Adobe PDF
07_list of acronyms & abbreviations.pdf | | 10.13 kB | Adobe PDF
08_list of figures.pdf | | 14.21 kB | Adobe PDF
09_list of tables.pdf | | 10 kB | Adobe PDF
10_chapter 1.pdf | | 119.84 kB | Adobe PDF
11_chapter 2.pdf | | 271.94 kB | Adobe PDF
12_chapter 3.pdf | | 158.14 kB | Adobe PDF
13_chapter 4.pdf | | 155.62 kB | Adobe PDF
14_chapter 5.pdf | | 499.26 kB | Adobe PDF
15_chapter 6.pdf | | 109.77 kB | Adobe PDF
16_references.pdf | | 119.94 kB | Adobe PDF
17_appendix.pdf | | 690.53 kB | Adobe PDF
19_synopsis.pdf | | 52.65 kB | Adobe PDF


Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
