Text Summarization using K-Means, Tanimoto Distance & Jaccard Similarity

Authors: Annu Sharma, Ms. Nandini Sharma
Text summarization is the process of reducing a source text or passage to a shorter version while preserving its crucial and significant information. This scheme summarizes web content such as reviews, blogs, and news, based on content and context for a specific category or class, using machine learning techniques: K-Means clustering, Tanimoto distance, Jaccard similarity, and word frequency weighting. The aim is to summarize reviews, blogs, and news web pages automatically, shortening the process of finding the core content of that information. The analysis measured the accuracy of the summaries by calculating precision and recall. The results showed an accuracy of approximately 80% for the precise summary and approximately 73% for the concise summary, for English-language reviews, blogs, and news available online. The proposed scheme shows that combining two or more machine learning techniques was relatively successful and effective in capturing the essence of reviews, blogs, and news, comparable to summaries written manually by humans.
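
The similarity and evaluation measures named above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that sentences are compared as sets of lowercase words, that the Tanimoto distance is taken as one minus the Jaccard similarity of those sets, and that precision and recall are computed over the indices of sentences chosen for the summary versus those chosen by a human. The function names (tokens, jaccard_similarity, tanimoto_distance, precision_recall) are hypothetical and used only for illustration.

```python
import re


def tokens(sentence):
    """Split a sentence into a set of lowercase alphanumeric words (assumed tokenization)."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def jaccard_similarity(a, b):
    """Jaccard similarity of two sets: |A intersect B| / |A union B|."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(a | b)


def tanimoto_distance(a, b):
    """Tanimoto distance on sets, assumed here to be 1 - Jaccard similarity."""
    return 1.0 - jaccard_similarity(a, b)


def precision_recall(selected, reference):
    """Precision and recall of selected sentence indices against a reference summary."""
    selected, reference = set(selected), set(reference)
    hits = len(selected & reference)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


if __name__ == "__main__":
    s1 = tokens("The cat sat")
    s2 = tokens("the cat ran")
    print(jaccard_similarity(s1, s2))  # shared words: "the", "cat"; union has 4 words
    print(tanimoto_distance(s1, s2))
    # Hypothetical sentence indices: system summary vs. human summary
    print(precision_recall([0, 2, 3], [0, 3, 5, 7]))
```

In a pipeline of the kind described, a distance such as tanimoto_distance could serve as the dissimilarity measure when grouping sentences with K-Means, and precision_recall could compare the automatic summary with a manually written one; how the authors combine these steps is not specified here.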

Keywords: Automatic Text Summarization, Machine Learning, K-Means Clustering, Tanimoto Distance, Jaccard Similarity.

