Augmenting Intelligent Document Processing (IDP) Workflows with Contemporary Large Language Models (LLMs)

© 2023 by IJCTT Journal
Volume-71 Issue-10
Year of Publication : 2023
Authors : Shreekant Mandvikar
DOI : 10.14445/22312803/IJCTT-V71I10P110

How to Cite?

Shreekant Mandvikar, "Augmenting Intelligent Document Processing (IDP) Workflows with Contemporary Large Language Models (LLMs)," International Journal of Computer Trends and Technology, vol. 71, no. 10, pp. 80-91, 2023. Crossref, https://doi.org/10.14445/22312803/IJCTT-V71I10P110

Abstract
The current decade has witnessed an explosion in the volume of documents generated by businesses, academic institutions, and other organizations, and managing, analyzing, and extracting value from this vast array of documents has become increasingly challenging. Integrating Large Language Models (LLMs) into Intelligent Document Processing (IDP) can significantly address this challenge. This research examines how LLMs can enhance each stage of the IDP workflow, using the IDP workflow offered on AWS as a reference. In the initial document classification stage, LLMs offer improved semantic and hierarchical classification of documents, although they can introduce challenges such as overfitting, bias, and increased computational overhead. During the document extraction stage, LLMs provide benefits in contextual interpretation, cross-referencing of data, and data transformation. In the review and validation stage, LLMs can augment human effort with automated suggestions and anomaly detection, although this can sometimes produce false alarms. In the document enrichment stage, LLMs contribute contextual enrichment, improved sentiment analysis, and topic modeling, but risk over-enriching data. In the data integration stage, LLMs can synthesize data for consistency, generate automated narratives, and facilitate API interactions for smoother integration. Across all of these stages, LLMs remain subject to limitations such as increased computational cost, dependency on training data for specialized tasks, and latency in real-time operations.
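The semantic classification stage summarized above can be illustrated with a minimal sketch. This is not the paper's implementation: the label set, prompt wording, and the `call_llm` placeholder (standing in for any hosted LLM endpoint, such as one invoked from an AWS-based IDP pipeline) are assumptions for illustration only.

```python
# Minimal sketch of LLM-based semantic document classification.
# `call_llm` is a hypothetical stand-in for a hosted model endpoint.

LABELS = ["invoice", "contract", "resume", "report"]

def build_prompt(document_text, labels=LABELS):
    """Assemble a zero-shot classification prompt for the LLM."""
    return (
        "Classify the document into exactly one of: "
        + ", ".join(labels)
        + ".\nDocument:\n"
        + document_text
        + "\nLabel:"
    )

def parse_label(llm_reply, labels=LABELS):
    """Map a free-text LLM reply onto a known label; None if no match."""
    reply = llm_reply.strip().lower()
    for label in labels:
        if label in reply:
            return label
    return None

def call_llm(prompt):
    # Placeholder: a real system would call a hosted model here.
    return "Label: invoice"

doc = "Invoice #1042\nAmount due: $350.00\nPayment terms: Net 30"
label = parse_label(call_llm(build_prompt(doc)))
print(label)  # invoice
```

Constraining the model to a fixed label set and parsing its free-text reply back onto that set is one simple way to keep the probabilistic output compatible with a downstream rule-based workflow.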

Keywords
Artificial intelligence-driven document enrichment, Intelligent document classification, Intelligent data extraction, Intelligent document processing, Large language models, Semantic understanding.
