Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/12226
Title: Intelligent techniques for enhancing recall and precision in cross lingual search among Indian languages
Researcher: Siva Kumar, A P
Guide(s): Govardhan, A
Premchand, A
Keywords: LINGUAL SEARCH AMONG INDIAN LANGUAGES
Upload Date: 24-Oct-2013
University: Jawaharlal Nehru Technological University, Anantapuram
Completed Date: 23.07.2011
Abstract: Indian language information access technologies face severe precision and recall problems when they rely on conventional Information Retrieval techniques (used for English-like languages). This is a study of Indian language information access. During this study, we investigated the web extensively for Indian languages, and in the process we came up with some solutions for the low recall and precision problems. We focused our research on the key components of cross-lingual search: indexing, transliteration, stemming, translation and summarization. The following are some of the major contributions of this thesis.
- We built a query-biased summarizer based on the similarity of sentences.
- We produced summaries that have good linguistic quality and are non-redundant, with good recall and precision rates.
- We developed Hindi-English CLIR without using query translation.
- We resolved the polysemy and synonymy problems caused by dictionary-based indexing.
- We retrieved documents based on the semantic relation between documents.
- We defined and developed a language-independent stemmer.
- We modeled an unsupervised Telugu stemmer that does not require inflection-root pairs for training.
- We showed retrieval effectiveness and reduced the size of the index for the Telugu information retrieval task.
- We constructed a machine transliteration model that combines grapheme-based and phoneme-based transliteration models.
- We came up with user-friendly transliteration of words by considering both the spelling and the pronunciation of a word.
- We designed a complete file transliteration model.
All our evaluations were based on Telugu, Hindi and English datasets, and the techniques are shown to work reasonably well for most Indian languages with minor modifications.
In most of our experiments, we used system-based evaluation, a methodology widely used in the Information Access research community. All available standard evaluation datasets were used. In other cases, we built our evaluation datasets us
Pagination: 239 Pages
URI: http://hdl.handle.net/10603/12226
Appears in Departments: Department of Computer Science and Engineering

Files in This Item:
File                             Description     Size        Format
01_title.pdf                     Attached File   77.49 kB    Adobe PDF
02_certificates.pdf                              84.47 kB    Adobe PDF
03_acknowledgements.pdf                          276.43 kB   Adobe PDF
04_contents.pdf                                  239.41 kB   Adobe PDF
05_preface.pdf                                   186.16 kB   Adobe PDF
06_list of tables figures.pdf                    163.21 kB   Adobe PDF
07_chapter1.pdf                                  419.51 kB   Adobe PDF
08_chapter2.pdf                                  533.42 kB   Adobe PDF
09_chapter3.pdf                                  486.96 kB   Adobe PDF
10_chapter4.pdf                                  784.98 kB   Adobe PDF
11_chapter5.pdf                                  590.71 kB   Adobe PDF
12_chapter6.pdf                                  587.6 kB    Adobe PDF
13_chapter7.pdf                                  94.5 kB     Adobe PDF
14_references.pdf                                295.85 kB   Adobe PDF
15_appendix.pdf                                  72.79 kB    Adobe PDF


Items in Shodhganga are protected by copyright, with all rights reserved, unless otherwise indicated.