A Theoretical Review on SMS Normalization using Hidden Markov Models (HMMs)

Ratika Bali

doi:https://doi.org/10.14445/22312803/IJCTT-V4I7P177

Volume 4 | Issue 7 | Year 2013 | Article Id. IJCTT-V4I7P177 | DOI : https://doi.org/10.14445/22312803/IJCTT-V4I7P177

Citation:

Ratika Bali, "A Theoretical Review on SMS Normalization using Hidden Markov Models (HMMs)," International Journal of Computer Trends and Technology (IJCTT), vol. 4, no. 7, pp. 2388-2391, 2013. Crossref, https://doi.org/10.14445/22312803/IJCTT-V4I7P177

Abstract

SMS language, or textese, is the set of abbreviations and slang that arose from the brevity imposed by mobile phone text messaging, in particular the widespread SMS (Short Message Service) communication protocol [1]. Mobile data services have grown rapidly in recent years, and many of them let people access the service through SMS. With the spread of mobile phones, social networking, and microblogging, textese, characterized by atypical acronyms, shortenings, and omissions, has quickly become the language of the youth. It poses a challenge to conventional electronic text processing and therefore calls for SMS normalization. This paper illustrates how Hidden Markov Models (HMMs) can be used to perform SMS normalization: filtering the textese and generating a noise-free, conventional form of the original text.
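To make the idea concrete, the following is a minimal sketch of HMM-based normalization. It is not the paper's implementation. It assumes the hidden states are standard words and the observations are textese tokens, and it decodes the most likely word sequence with the Viterbi algorithm. The function name `viterbi`, the tables `STATES`, `START`, `TRANS`, and `EMIT`, and every probability value are hypothetical and hand-picked for illustration. In practice they would be estimated from a training set.

```python
import math


def viterbi(tokens, states, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most likely hidden word sequence for the observed tokens."""
    if not tokens:
        return []

    def lp(table, key):
        # Log-probability lookup; unseen events get a small floor value.
        return math.log(table.get(key, floor))

    # best[t][s]: log-probability of the best path that ends in state s at step t
    best = [{s: lp(start_p, s) + lp(emit_p[s], tokens[0]) for s in states}]
    back = [{}]
    for t in range(1, len(tokens)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + lp(trans_p[p], s)) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + lp(emit_p[s], tokens[t])
            back[t][s] = prev

    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(tokens) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Hypothetical toy model (illustrative values, not normalized distributions).
STATES = ["i", "want", "to", "two", "go"]
START = {"i": 0.9}
TRANS = {
    "i": {"want": 0.9},
    "want": {"to": 0.7, "two": 0.2},
    "to": {"go": 0.9},
    "two": {"go": 0.1},
    "go": {},
}
EMIT = {  # P(textese token | standard word)
    "i": {"i": 1.0},
    "want": {"want": 0.6, "wnt": 0.4},
    "to": {"to": 0.5, "2": 0.5},
    "two": {"two": 0.6, "2": 0.4},
    "go": {"go": 0.9, "g": 0.1},
}

print(" ".join(viterbi("i wnt 2 go".split(), STATES, START, TRANS, EMIT)))
```

With these toy values the script prints `i want to go`: the ambiguous token `2` resolves to "to" rather than "two" because the assumed transition from "want" to "to" is the more probable one.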

Keywords

SMS, textese, noise, normalization, HMMs, training set.

References

[1] http://en.wikipedia.org/wiki/SMS_language
[2] http://www.dtxtrapp.com/
[3] http://transl8it.com/
[4] http://www.lingo2word.com/translate.php
[5] http://classes.soe.ucsc.edu/cmpe264/Fall06/LecHMM.pdf
[6] http://en.wikipedia.org/wiki/Markov_model
[7] http://digital.cs.usu.edu/~cyan/CS7960/hmm-tutorial.pdf
[5] P. Deepak, V Subramaniam, et al. (2012). “Correcting SMS Text Automatically”, CSI Communications.
[8] AiTi Aw, et al. (2006). “A Phrase-Based Statistical Model for SMS Text Normalization”, Proceedings of COLING/ ACL Conference, Sydney, Australia.
[9] Brown, P, et al. (1993). “The mathematics of statistical machine translation: parameter estimation”, Computational Linguistics, 19(2), 263-311.
[10] Choudhury, M, et al. (2007). “Investigation and modeling of the structure of texting language”, 1st Intl. Workshop on Analytics for Noisy Unstructured Text Data, Hyderabad, India.
[11] Contractor, D, et al. (2010). “Unsupervised cleansing of noisy text”, Proceedings of the COLING Conference, Beijing, China.
[12] Kobus, C, et al. (2008). “Normalizing SMS: are two metaphors better than one?” Proceedings of the COLING Conference, Manchester.
[13] Venkata Subramaniam, L, et al. (2009). “A survey of types of text noise and techniques to handle noisy text”, Proceedings of the Third Workshop on Analytics for Noisy Unstructured Text Data, Barcelona, Spain.
[14] Venkata Subramaniam, L (2010). “Noisy Text Analytics”, Tutorial at the NAACL HLT Conference, Los Angeles, USA.