Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/314165
Title: Microblog processing summarization and impoliteness detection
Researcher: Modha, Sandip Jayantilal
Guide(s): Majumder, Prasenjit
Keywords: Engineering and Technology; Computer Science; Computer Science Interdisciplinary Applications; Social networks; Data Processing; Machine theory
University: Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT)
Completed Date: 2019
Abstract: Social media is an excellent source for studying human interaction and behavior. Smart autonomous applications that sense social media such as Facebook and Twitter empower their user communities with real-time information as events unfold across different parts of the world. In this thesis, we study social media text from the perspectives of summarization and impoliteness. In the first part of the thesis, microblog summarization is explored in three scenarios. In the first scenario, we present a summarization system, built over the Twitter stream, that summarizes a topic over a given duration. A daily summary, or digest, from a microblog is a way to update social media users on what happened that day on the subjects of their interest. In designing a microblog-based summarization system, tweet ranking is the primary task. After ranking, selecting the relevant tweets is the crucial task for any summarization system because of the massive volume of tweets in the Twitter stream. In addition, the summarization system should include novel tweets in the summary or digest. Relevance is typically measured by the similarity score between the user's information need and a tweet, obtained from various text similarity algorithms: the more similar the tweet, the higher the score. We therefore need to choose a score threshold that minimizes false-positive judgments. We have developed various methods that exploit statistical features of the rank list to estimate these thresholds, and evaluated them against thresholds determined via grid search. We have used language models to rank tweets and select the relevant ones, where the choice of smoothing technique and its parameters is critical. Results are also compared with the standard probabilistic ranking algorithm BM25. Learning to Rank strategies are also implemented and show substantial improvement on some of the evaluation metrics. In the second scenario, we develop a real-time version of the summarization system that continually monitors the Twitter stream.
Pagination: xvi, 164 p.
URI: http://hdl.handle.net/10603/314165
Appears in Departments: Department of Information and Communication Technology

Files in This Item:
File                                Description    Size       Format
01_title.pdf                        Attached File  28.9 kB    Adobe PDF
02_declaration and certificate.pdf                 22.52 kB   Adobe PDF
03_acknowledgments.pdf                             19.8 kB    Adobe PDF
04_contents.pdf                                    29.3 kB    Adobe PDF
05_abstract.pdf                                    26 kB      Adobe PDF
06_list of tables.pdf                              26.3 kB    Adobe PDF
07_list of figures.pdf                             21.88 kB   Adobe PDF
08_chapter 1.pdf                                   51.22 kB   Adobe PDF
09_chapter 2.pdf                                   102.83 kB  Adobe PDF
10_chapter 3.pdf                                   154.74 kB  Adobe PDF
11_chapter 4.pdf                                   57.64 kB   Adobe PDF
12_chapter 5.pdf                                   590.75 kB  Adobe PDF
13_chapter 6.pdf                                   382.22 kB  Adobe PDF
14_chapter 7.pdf                                   32.86 kB   Adobe PDF
15_references.pdf                                  64.63 kB   Adobe PDF
16_appendix.pdf                                    43.04 kB   Adobe PDF
80_recommendation.pdf                              43.61 kB   Adobe PDF

Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
