Please use this identifier to cite or link to this item:
http://hdl.handle.net/10603/592434
Title: | Design and development of progressive sampling models for increasing the efficiency of data analytics |
Researcher: | B C, Yathish Aradhya |
Guide(s): | H A, Dinesha |
Keywords: | Computer Science; Computer Science Interdisciplinary Applications; Engineering and Technology |
University: | Visvesvaraya Technological University, Belagavi |
Completed Date: | 2024 |
Abstract: | Big data is known for its huge volume and multi-faceted nature, which pose obstacles for analytics and make it impractical to use the complete dataset for training learning algorithms. Sampling such large volumes of data is difficult. Dealing with big data involves processing huge amounts of data, creating a substantial challenge for both academia and industry. Conventional sampling strategies fall short when confronted with difficulties like data imbalance, substantial data heterogeneity, and multi-dimensionality. Existing progressive sampling systems frequently rely on random sample selection, which may be inadequate, particularly in scenarios with high data heterogeneity and unbalanced data. The limitations of random feature selection can lead to erroneous performance due to insufficient or insignificant feature learning. Therefore, a resilient computing environment, along with better pre-processing, feature-sensitive feature selection, and classification, offers a promising solution for effective big data analytics.
In the first study, innovative approaches to enhance the operational efficiency of Probabilistic Sampling Algorithms (PSA) have been explored. The focus is mainly on a substantial reduction in the cardinality of the training dataset while preserving the accuracy of the learning method within the framework of Probably Approximately Correct (PAC) learning using Bounds of Rademacher Averages (RMA). The newly suggested strategies rely on the divergence of statistics for the selection of an initial set of samples for Rademacher averages and on a sampling schedule that depends on the data and the ε-approximation, giving a tight constraint for the learning process. These approaches are developed to maximize the theoretical assessments of Rademacher averages in the context of a progressive sampling algorithm for estimating stopping time. The PSA's runtime is successfully controlled through the adoption of these novel approaches, with a time complexity of O(ε-approximation |
Pagination: | 158 |
URI: | http://hdl.handle.net/10603/592434 |
Appears in Departments: | Department of Computer Science and Engineering |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
01_title.pdf | Attached File | 32.59 kB | Adobe PDF |
02_prelim pages.pdf | | 476.62 kB | Adobe PDF |
03_content.pdf | | 196.82 kB | Adobe PDF |
04_abstract.pdf | | 159.68 kB | Adobe PDF |
05_chapter 1.pdf | | 284.7 kB | Adobe PDF |
06_chapter 2.pdf | | 146.73 kB | Adobe PDF |
07_chapter 3.pdf | | 323.23 kB | Adobe PDF |
08_chapter 4.pdf | | 580.52 kB | Adobe PDF |
09_chapter 5.pdf | | 413.73 kB | Adobe PDF |
10_annexures.pdf | | 297.87 kB | Adobe PDF |
11_chapter 6.pdf | | 437.32 kB | Adobe PDF |
80_recommendation.pdf | | 18.18 kB | Adobe PDF |
Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).