Variational Autoencoder based Data Augmentation & Corroboration

© 2021 by IJCTT Journal
Volume-69 Issue-4
Year of Publication : 2021
Authors : Atharva Bankar, Chanavi Singh, Lakshanya Shinde, Pallavi Udatewar
DOI :  10.14445/22312803/IJCTT-V69I4P106

How to Cite?

Atharva Bankar, Chanavi Singh, Lakshanya Shinde, Pallavi Udatewar, "Variational Autoencoder based Data Augmentation & Corroboration," International Journal of Computer Trends and Technology, vol. 69, no. 4, pp. 23-33, 2021.

Cybersecurity attacks spanning countries and organizations are triggered by networks compromised with cryptographic ransomware, resulting in the loss of millions of dollars in extortion payments. This type of malicious software encrypts user files, holds them hostage, and demands a large ransom payment in exchange for the decryption key. In most cases, cryptocurrency is the method of payment. The combination of efficient, well-implemented cryptographic methods for taking data hostage, the Tor protocol for anonymous correspondence, and cryptocurrency for collecting unmediated payments gives ransomware attackers a high degree of impunity. Every year, ransomware attacks compel various institutions to set aside large sums of money to pay the ransom so that they can regain access to their files quickly, which makes this a problem that needs to be addressed. In this paper, we propose the use of Autoencoders (AE) and Variational Autoencoders (VAE) to augment data consisting of ransomware properties using two techniques: applying the AE and VAE to the entire test set, and applying them to each ransomware independently. The augmented data is used to verify the robustness of three Machine Learning models: Extra Tree Classifier, XGBoost Classifier, and Random Forest Classifier. The metrics used to judge the classification of the ransomware verify whether the generated data is in accordance with the dataset used.

Cryptocurrency, Bitcoin, Ransomware, Machine Learning, Autoencoder, Variational Autoencoders.
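
The abstract does not describe the network architecture or training setup, so the following is only a minimal sketch of the two standard ingredients of a VAE (the reparameterized latent sample and the closed-form KL term) and of how synthetic rows could be drawn from a trained decoder. All function names are hypothetical, and `toy_decoder` is a stand-in for a trained decoder network, not the authors' model.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), one draw per latent dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


def augment(decoder, latent_dim, n_samples, rng):
    """Generate synthetic rows by decoding draws from the N(0, I) prior."""
    return [decoder([rng.gauss(0.0, 1.0) for _ in range(latent_dim)])
            for _ in range(n_samples)]


def toy_decoder(z):
    # Hypothetical stand-in for a trained decoder: maps a 2-d latent
    # vector to a 2-feature row with a fixed linear rule.
    return [2.0 * z[0] + 1.0, z[0] - z[1]]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
z = reparameterize([0.5, -1.0], [0.0, -2.0], rng)
synthetic = augment(toy_decoder, latent_dim=2, n_samples=5, rng=rng)
```

In a setup like the one the abstract outlines, rows such as `synthetic` would then be passed to the trained classifiers, and the resulting classification metrics would indicate whether the generated data is consistent with the original dataset.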

