Navigating Integration Complexities in Hybrid BI and Data Lake Architectures

© 2024 by IJCTT Journal
Volume-72 Issue-10
Year of Publication : 2024
Authors : Savio Dmello
DOI : 10.14445/22312803/IJCTT-V72I10P127

How to Cite?

Savio Dmello, "Navigating Integration Complexities in Hybrid BI and Data Lake Architectures," International Journal of Computer Trends and Technology, vol. 72, no. 10, pp. 199-205, 2024. Crossref, https://doi.org/10.14445/22312803/IJCTT-V72I10P127

Abstract
Large organizations often operate hybrid, heterogeneous IT landscapes that combine on-premises and cloud-based business intelligence (BI) and data lake applications. These include platforms such as Databricks, Snowflake, SAP Analytics Cloud (SAC), SAP Datasphere, and SAP BW/4HANA, as well as reporting tools such as Microsoft Power BI and Tableau. Such environments serve diverse business needs and must be integrated seamlessly to support complex data modeling, reporting, forecasting, and predictive analytics. However, each system has its own SQL dialect, architecture, and data model, so integrating them presents significant challenges in user authentication, data security, data aggregation, and keeping formulas and calculation behavior consistent across processing layers. This study examines the inconsistencies that arise in these environments, focusing on how integration across non-native systems leads to discrepancies in data structure, query syntax, authentication protocols, and semantics. These discrepancies can produce inaccurate calculations and can degrade algorithms, data accuracy, system security, and query execution plans. The findings point to the need for a comprehensive redesign of data flows and calculation logic in hybrid and heterogeneous landscapes. The study also proposes recommendations to improve integration and ensure reliable reporting outcomes in these environments.
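
As an illustration of the kind of calculation discrepancy the abstract describes, the following minimal Python sketch applies the same margin formula at two different points in a pipeline: once on aggregated totals and once per row before averaging. The sample rows, the function names, and the margin formula are assumptions made for illustration; they are not taken from the paper, and no specific BI platform is being modeled.

```python
# Hypothetical sales rows: (region, revenue, cost). Values are illustrative only.
rows = [
    ("EMEA", 100.0, 80.0),
    ("EMEA", 300.0, 150.0),
    ("APAC", 50.0, 45.0),
]


def margin_after_aggregation(rows):
    """Sum revenue and cost first, then apply the margin formula once."""
    revenue = sum(r for _, r, _ in rows)
    cost = sum(c for _, _, c in rows)
    return (revenue - cost) / revenue


def margin_before_aggregation(rows):
    """Apply the margin formula per row, then average the row-level results."""
    ratios = [(r - c) / r for _, r, c in rows]
    return sum(ratios) / len(ratios)


after = margin_after_aggregation(rows)
before = margin_before_aggregation(rows)

# The two layers report different margins for the same underlying data.
print(f"formula after aggregation:  {after:.4f}")
print(f"formula before aggregation: {before:.4f}")
```

Both functions use the same formula, yet they disagree because a ratio is not additive: averaging row-level ratios weights every row equally, whereas computing on totals weights each row by its revenue. If one layer of a hybrid landscape evaluates the formula before aggregation and another evaluates it afterward, reports built on the two layers will not reconcile.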

Keywords
Aggregation, Data security, Query execution plan, System authentication, Enterprise cloud and on-premises systems, Business intelligence, Formula collision, SAP, Databricks, Snowflake, System integration.
