Challenges and Opportunities for Categorizing Big Video Data on the Web

International Journal of Computer Trends and Technology (IJCTT)

© 2016 by IJCTT Journal
Volume-42 Number-1
Year of Publication : 2016
Authors : Mr. Pramod B. Deshmukh, Miss. Snehal Mundada, Miss. Vishakha Metre, Miss. Shraddha K. Popat
DOI : 10.14445/22312803/IJCTT-V42P102

MLA

Mr. Pramod B. Deshmukh, Miss. Snehal Mundada, Miss. Vishakha Metre, Miss. Shraddha K. Popat, "Challenges and Opportunities for Categorizing Big Video Data on the Web." International Journal of Computer Trends and Technology (IJCTT) V42(1):7-10, December 2016. ISSN:2231-2803. www.ijcttjournal.org. Published by Seventh Sense Research Group.

Abstract -
Video categorization is an important problem with many applications, such as content search and organization, smart content-aware advertising, and open-source intelligence analysis. This paper discusses selected representative research progress in categorizing big video data, with a focus on user-generated videos on the Internet. We identify two major challenges in this vibrant field and envision promising directions that deserve in-depth future investigation. The discussion in this paper is brief but hopefully useful for quickly understanding current progress and where the field should go in the next few years.
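To make the categorization pipeline named in the keywords concrete, the sketch below is a minimal illustration (not taken from the paper) of the common recipe behind many such systems: extract per-frame features, average-pool them into one video-level vector, and train a Support Vector Machine. It assumes Python with NumPy and scikit-learn, and uses random vectors as a stand-in for a real frame feature extractor such as CNN activations or Histogram of Oriented Gradients descriptors.

# Minimal sketch: frame features -> average pooling -> linear SVM.
# The feature extractor is a random stand-in; in practice it would be
# CNN activations from a pretrained image network or HOG descriptors.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def extract_frame_features(video, dim=512):
    """Stand-in for a real per-frame feature extractor (CNN or HOG)."""
    return rng.normal(size=(video["num_frames"], dim))

def video_descriptor(video):
    """Average-pool frame features into one fixed-length video vector."""
    return extract_frame_features(video).mean(axis=0)

# Toy dataset: 40 videos of varying length with binary category labels.
videos = [{"num_frames": int(rng.integers(30, 300))} for _ in range(40)]
labels = rng.integers(0, 2, size=40)

X = np.stack([video_descriptor(v) for v in videos])
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, random_state=0)

clf = LinearSVC()  # a linear SVM; kernel SVMs are also widely used here
clf.fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))

With random features the accuracy is naturally near chance; the point of the sketch is only the pooling-plus-classifier structure, which more recent work replaces end to end with deep networks trained directly on video.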

Keywords
Big Data, intelligence analysis, Convolutional Neural Network, Support Vector Machines, Histogram of Oriented Gradients.