Design of a novel incremental parallel webcrawler

Yadav, Divakar

Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/2415

Full metadata record

DC Field	Value	Language
dc.date.accessioned	2011-08-25T10:33:10Z	-
dc.date.available	2011-08-25T10:33:10Z	-
dc.date.issued	2011-08-25	-
dc.identifier.uri	http://hdl.handle.net/10603/2415	-
dc.description.abstract	The World Wide Web (WWW) is a huge repository of interlinked hypertext documents known as web pages, which users access via the Internet. Since its inception in 1990, the WWW has grown many-fold in size; it now contains more than 50 billion publicly accessible web documents distributed all over the world on thousands of web servers, and it is still growing at an exponential rate. Searching for information in such a huge collection is very difficult, because web pages are not organized like books on shelves in a library, nor are they completely catalogued at one central location. A search engine is the basic information retrieval tool used to access information on the WWW. Users submit search queries through the search engine's interface. In response, the search engine searches its database for relevant documents and returns the results ranked by relevance. The search engine builds this database with the help of web crawlers, where a web crawler is a program that traverses the Web and collects information about web documents. To maximize the download rate and to retrieve the whole Web or a significant portion of it, search engines run multiple crawlers in parallel. Overlap among downloaded web documents, quality, network bandwidth and refreshing of web documents are the major challenges faced by existing parallel web crawlers, and they are addressed in this work. A novel Multi Threaded (MT) server based architecture for an incremental parallel web crawler has been designed that helps to reduce the overlap, quality and network bandwidth problems. Additionally, web page change detection methods have been developed to refresh web documents by detecting structural, presentation and content level changes in them. These change detection methods help to detect whether the version of a web page held at the search engine side has changed at the web server end. If it has changed, the web crawler should replace the existing version in the search engine's database to keep the repository up to date.	en_US
dc.format.extent	xvi, 160p.	en_US
dc.language	English	en_US
dc.rights	university	en_US
dc.title	Design of a novel incremental parallel webcrawler	en_US
dc.creator.researcher	Yadav, Divakar	en_US
dc.subject.keyword	Computer Science	en_US
dc.subject.keyword	webcrawler	en_US
dc.subject.keyword	Information retrieval	en_US
dc.subject.keyword	World wide web	en_US
dc.subject.keyword	Information Technology	en_US
dc.description.note	References p. 123-132, Appendix p. 133-147, Synopsis p. synopsis-1-synopsis-12	en_US
dc.contributor.guide	Gupta, J P	en_US
dc.contributor.guide	Sharma, A K	en_US
dc.publisher.place	Noida	en_US
dc.publisher.university	Jaypee Institute of Information Technology	en_US
dc.publisher.institution	Department of Computer Science Engineering and Information Technology	en_US
dc.date.completed	2010	en_US
dc.date.awarded	2010	en_US
dc.format.accompanyingmaterial	None	en_US
dc.type.degree	Ph.D.	en_US
dc.source.inflibnet	INFLIBNET	en_US
Appears in Departments:	Department of Computer Science Engineering and Information Technology
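
The abstract describes refreshing a stored page when its copy on the web server has changed. This record does not describe the thesis's actual change detection methods, so the following is only a minimal sketch of one generic approach (comparing content checksums), not the author's algorithm. The names `content_fingerprint`, `has_changed`, `refresh` and the dictionary-based `repository` are hypothetical and chosen purely for illustration.

```python
import hashlib


def content_fingerprint(html: str) -> str:
    """Return a SHA-256 digest of the page text with whitespace normalized."""
    # Collapse runs of whitespace so trivial reformatting is not a "change".
    normalized = " ".join(html.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def has_changed(stored_fingerprint: str, fetched_html: str) -> bool:
    """Compare the stored fingerprint with that of the freshly fetched copy."""
    return content_fingerprint(fetched_html) != stored_fingerprint


def refresh(repository: dict, url: str, fetched_html: str) -> bool:
    """Replace the stored copy if it is new or changed; return True if replaced.

    `repository` is an assumed in-memory stand-in for the search engine's
    database, mapping each URL to its stored HTML and fingerprint.
    """
    entry = repository.get(url)
    if entry is not None and not has_changed(entry["fingerprint"], fetched_html):
        return False  # stored version is still current
    repository[url] = {
        "html": fetched_html,
        "fingerprint": content_fingerprint(fetched_html),
    }
    return True
```

A checksum of this kind only signals that the content differs; it does not distinguish the structural, presentation and content level changes that the abstract mentions.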

Files in This Item:

File	Description	Size	Format
01_title.pdf	Attached File	21.65 kB	Adobe PDF
02_table of contents.pdf		15.99 kB	Adobe PDF
03_declaration.pdf		9.78 kB	Adobe PDF
04_certificate.pdf		10.01 kB	Adobe PDF
05_acknowledgement.pdf		10.39 kB	Adobe PDF
06_abstracts.pdf		11.02 kB	Adobe PDF
07_list of acronyms & abbreviations.pdf		10.13 kB	Adobe PDF
08_list of figures.pdf		14.21 kB	Adobe PDF
09_list of tables.pdf		10 kB	Adobe PDF
10_chapter 1.pdf		119.84 kB	Adobe PDF
11_chapter 2.pdf		271.94 kB	Adobe PDF
12_chapter 3.pdf		158.14 kB	Adobe PDF
13_chapter 4.pdf		155.62 kB	Adobe PDF
14_chapter 5.pdf		499.26 kB	Adobe PDF
15_chapter 6.pdf		109.77 kB	Adobe PDF
16_references.pdf		119.94 kB	Adobe PDF
17_appendix.pdf		690.53 kB	Adobe PDF
19_synopsis.pdf		52.65 kB	Adobe PDF

Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).

Shodhganga : a reservoir of Indian theses @ INFLIBNET