Video Summarization with Neural Networks: A Systematic Comparison of State-of-the-Art Techniques

M. Hamza Eissa; Hesham Farouk; Kamal Eldahshan; Amr Abozeid

Research Article | Open Access

Volume 73 | Issue 2 | Year 2025 | Article ID IJCTT-V73I2P111 | DOI: https://doi.org/10.14445/22312803/IJCTT-V73I2P111

Received: 31 Dec 2024 | Revised: 27 Jan 2025 | Accepted: 18 Feb 2025 | Published: 28 Feb 2025

Citation:

M. Hamza Eissa, Hesham Farouk, Kamal Eldahshan, Amr Abozeid, "Video Summarization with Neural Networks: A Systematic Comparison of State-of-the-Art Techniques," International Journal of Computer Trends and Technology (IJCTT), vol. 73, no. 2, pp. 90-108, 2025. https://doi.org/10.14445/22312803/IJCTT-V73I2P111

Abstract

Video summarization is an essential research field dedicated to developing fast methods for extracting valuable content from extensive video collections. This paper evaluates neural-network-based video summarization strategies, analyzing their methodologies, architectures, evaluation methods, datasets, and performance. It conducts an in-depth analysis of different methodological approaches, including supervised and unsupervised learning, reinforcement learning, hybrid models, and object-centric methods, and assesses their performance characteristics and current limitations. Deep learning techniques such as attention mechanisms and transformers, along with hierarchical reinforcement learning, continue to grow in popularity because they improve summarization accuracy and efficiency. Combining visual, audio, and textual features yields higher-quality summaries for applications such as video tutorials, tournament highlights, and security footage monitoring. The paper also examines evaluation frameworks, highlighting weaknesses in existing benchmark metrics and advocating Performance over Random (PoR) as a robust alternative. Current models still face challenges with real-time operation, computational speed, and control of summary generation according to user needs. Finally, the paper reviews existing state-of-the-art approaches and proposes research directions focused on scalability, user-customized frameworks, and improved evaluation metrics.

Keywords

Video Summarization, Feature Extraction, Neural Networks, Deep Learning, Reinforcement Learning.
