Please use this identifier to cite or link to this item:
http://hdl.handle.net/10603/314165
Title: | Microblog processing summarization and impoliteness detection |
Researcher: | Modha, Sandip Jayantilal |
Guide(s): | Majumder, Prasenjit |
Keywords: | Engineering and Technology; Computer Science; Computer Science Interdisciplinary Applications; Social networks; Data Processing; Machine theory |
University: | Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT) |
Completed Date: | 2019 |
Abstract: | Social media is an excellent source for studying human interaction and behavior. When smart autonomous applications sense social media such as Facebook and Twitter, they empower their user communities with real-time information as it unfolds across different parts of the world. In this thesis, we study social media text from the perspectives of summarization and impoliteness. In the first part of the thesis, microblog summarization is explored in three scenarios. In the first scenario, we present a summarization system, built over the Twitter stream, that summarizes a topic for a given duration. A daily summary or digest from a microblog is a way to update social media users on what happened today on the subjects that interest them. In designing a microblog-based summarization system, tweet ranking is the primary task. After ranking, selecting the relevant tweets is the crucial task for any summarization system because of the massive volume of tweets in the Twitter stream. In addition, the summarization system should include novel tweets in the summary or digest. The measure of relevance is typically the similarity score between the user's information need and a tweet, obtained from different text similarity algorithms: the more similar the tweet, the higher the score. We therefore need to choose a score threshold that minimizes false-positive judgments. We have developed various methods that exploit statistical features of the ranked list to estimate these thresholds, and we have evaluated them against thresholds determined via grid search. We have used language models to rank tweets for relevance selection, where the choice of the smoothing technique and its parameters is critical. Results are also compared with the standard probabilistic ranking algorithm BM25. Learning to Rank strategies are also implemented and show substantial improvement in some of the result metrics. 
In the second scenario, we develop a real-time version of the summarization system that continually monitors the Twitter stream. |
Pagination: | xvi, 164 p. |
URI: | http://hdl.handle.net/10603/314165 |
Appears in Departments: | Department of Information and Communication Technology |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
01_title.pdf | Attached File | 28.9 kB | Adobe PDF | View/Open |
02_declaration and certificate.pdf | | 22.52 kB | Adobe PDF | View/Open |
03_acknowledgments.pdf | | 19.8 kB | Adobe PDF | View/Open |
04_contents.pdf | | 29.3 kB | Adobe PDF | View/Open |
05_abstract.pdf | | 26 kB | Adobe PDF | View/Open |
06_list of tables.pdf | | 26.3 kB | Adobe PDF | View/Open |
07_list of figures.pdf | | 21.88 kB | Adobe PDF | View/Open |
08_chapter 1.pdf | | 51.22 kB | Adobe PDF | View/Open |
09_chapter 2.pdf | | 102.83 kB | Adobe PDF | View/Open |
10_chapter 3.pdf | | 154.74 kB | Adobe PDF | View/Open |
11_chapter 4.pdf | | 57.64 kB | Adobe PDF | View/Open |
12_chapter 5.pdf | | 590.75 kB | Adobe PDF | View/Open |
13_chapter 6.pdf | | 382.22 kB | Adobe PDF | View/Open |
14_chapter 7.pdf | | 32.86 kB | Adobe PDF | View/Open |
15_references.pdf | | 64.63 kB | Adobe PDF | View/Open |
16_appendix.pdf | | 43.04 kB | Adobe PDF | View/Open |
80_recommendation.pdf | | 43.61 kB | Adobe PDF | View/Open |
Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).