A Theoretical Review on SMS Normalization using Hidden Markov Models (HMMs)

International Journal of Computer Trends and Technology (IJCTT)
© July Issue 2013 by IJCTT Journal
Volume-4 Issue-7
Year of Publication: 2013
Authors: Ratika Bali


Ratika Bali, "A Theoretical Review on SMS Normalization using Hidden Markov Models (HMMs)", International Journal of Computer Trends and Technology (IJCTT), V4(7):2388-2387, July Issue 2013. ISSN 2231-2803. www.ijcttjournal.org. Published by Seventh Sense Research Group.

Abstract: SMS language, or textese, is a term for the abbreviations and slang most commonly used because of the brevity that mobile phone text messaging demands, in particular over the widespread SMS (Short Message Service) communication protocol. [1] Recent years have seen strong growth in mobile-based data services that let people access those services through SMS. With the rapid spread of mobile phones, social networking and microblogging, textese, characterized by atypical acronyms, shortenings and omissions, has quickly become the language of young people. It poses a challenge to conventional electronic text processing and therefore calls for SMS normalization. This paper illustrates how Hidden Markov Models (HMMs) can be used to perform SMS normalization: the textese is filtered to produce a noise-free, conventional form of the original text.
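The abstract describes HMM-based normalization only at a high level. As a minimal sketch, assuming a common formulation that is not spelled out here (hidden states are standard English words, observations are SMS tokens, each SMS token aligns to exactly one word, and probabilities are estimated by counting over a small aligned training set), the model can be trained by relative frequency and decoded with the Viterbi algorithm. The training pairs, the `train` and `viterbi` helpers, and the `floor` smoothing value are hypothetical illustrations, not the paper's own implementation.

```python
import math
from collections import defaultdict

# Hypothetical toy training set: each pair aligns SMS tokens with their
# standard-English words, one token to one word (a simplifying assumption).
TRAINING_PAIRS = [
    (["c", "u", "2moro"], ["see", "you", "tomorrow"]),
    (["how", "r", "u"], ["how", "are", "you"]),
    (["r", "u", "ok"], ["are", "you", "ok"]),
    (["c", "u", "l8r"], ["see", "you", "later"]),
]

START = "<s>"  # pseudo-state marking the beginning of a message


def train(pairs):
    """Estimate transition P(word | previous word) and emission
    P(SMS token | word) by relative frequency (maximum likelihood)."""
    trans_counts = defaultdict(lambda: defaultdict(int))
    emit_counts = defaultdict(lambda: defaultdict(int))
    for sms, words in pairs:
        prev = START
        for token, word in zip(sms, words):
            trans_counts[prev][word] += 1
            emit_counts[word][token] += 1
            prev = word

    def normalize(counts):
        probs = {}
        for a, row in counts.items():
            total = sum(row.values())
            probs[a] = {b: c / total for b, c in row.items()}
        return probs

    return normalize(trans_counts), normalize(emit_counts)


def viterbi(sms, trans, emit, floor=1e-6):
    """Return the most probable standard-word sequence for a list of SMS
    tokens. Unseen transitions or emissions get a small floor probability
    (a crude smoothing assumption)."""
    if not sms:
        return []
    states = list(emit)  # every standard word seen in training

    def lp(table, a, b):
        return math.log(table.get(a, {}).get(b, floor))

    # best[i][w] = (log prob of best path ending in state w at i, backpointer)
    best = [{w: (lp(trans, START, w) + lp(emit, w, sms[0]), None) for w in states}]
    for token in sms[1:]:
        column = {}
        for w in states:
            prev, score = max(
                ((p, best[-1][p][0] + lp(trans, p, w)) for p in states),
                key=lambda pair: pair[1],
            )
            column[w] = (score + lp(emit, w, token), prev)
        best.append(column)

    # Trace the backpointers from the highest-scoring final state.
    word = max(best[-1], key=lambda w: best[-1][w][0])
    path = [word]
    for i in range(len(best) - 1, 0, -1):
        word = best[i][word][1]
        path.append(word)
    return list(reversed(path))
```

With this toy data, `viterbi(["c", "u", "l8r"], *train(TRAINING_PAIRS))` returns `['see', 'you', 'later']`. Working in log probabilities avoids numerical underflow on longer messages; a realistic system would also need a far larger training set, proper smoothing, and a way to handle tokens that map to several words.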


[1] http://en.wikipedia.org/wiki/SMS_language
[2] http://www.dtxtrapp.com/
[3] http://transl8it.com/
[4] http://www.lingo2word.com/translate.php
[5] http://classes.soe.ucsc.edu/cmpe264/Fall06/LecHMM.pdf
[6] http://en.wikipedia.org/wiki/Markov_model
[7] http://digital.cs.usu.edu/~cyan/CS7960/hmm-tutorial.pdf
[5] P. Deepak, V. Subramaniam, et al. (2012). “Correcting SMS Text Automatically”, CSI Communications.
[8] AiTi Aw, et al. (2006). “A Phrase-Based Statistical Model for SMS Text Normalization”, Proceedings of COLING/ACL Conference, Sydney, Australia.
[9] Brown, P., et al. (1993). “The mathematics of statistical machine translation: parameter estimation”, Computational Linguistics, 19(2), 263-311.
[10] Choudhury, M., et al. (2007). “Investigation and modeling of the structure of texting language”, 1st Intl. Workshop on Analytics for Noisy Unstructured Text Data, Hyderabad, India.

Keywords: SMS, textese, noise, normalization, HMMs, training set.