International Journal of Computer Trends and Technology

Research Article | Open Access

Volume 73 | Issue 4 | Year 2025 | Article Id. IJCTT-V73I4P119 | DOI: https://doi.org/10.14445/22312803/IJCTT-V73I4P119

Implementing Enterprise-Wide Lakehouse using Microsoft Azure Databricks and Delta Lake


Mehul K Bhuva

Received: 20 Mar 2025 | Revised: 18 Apr 2025 | Accepted: 23 Apr 2025 | Published: 30 Apr 2025

Citation:

Mehul K Bhuva, "Implementing Enterprise-Wide Lakehouse using Microsoft Azure Databricks and Delta Lake," International Journal of Computer Trends and Technology (IJCTT), vol. 73, no. 4, pp. 135-139, 2025. Crossref, https://doi.org/10.14445/22312803/IJCTT-V73I4P119

Abstract

This article presents a practical, scalable approach to implementing an enterprise-wide Lakehouse using Azure Databricks and Delta Lake. As data grows in volume, variety, and velocity, organizations need a unified platform that combines the reliability of a data warehouse with the scalability of a data lake. The Lakehouse paradigm meets this need by bringing transactional guarantees to the data lake, supporting both analytical and operational workloads on one platform. The article discusses the architecture, key components, implementation strategies, and real-world considerations for building such a system on Azure. The results show improved data governance, reduced data duplication, and faster time to insight. The architecture has broad implications for digital transformation and advanced analytics.
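To make the idea of a "transactional data lake" more concrete, the following is a toy Python model of the ordered transaction-log approach that Delta Lake is built around. It is a simplified illustration, not Delta Lake's actual implementation. Each commit is a numbered JSON file in a `_delta_log` directory listing `add`/`remove` data-file actions. A writer publishes version n+1 only if no other writer has already done so (optimistic concurrency), and readers rebuild a table snapshot by replaying the log. The function names (`commit`, `snapshot`, `latest_version`) and the two-action schema are assumptions for illustration. Real Delta commits carry further action types (for example `metaData`, `protocol`, `commitInfo`) and periodic checkpoints, and on cloud object stores such as ADLS Gen2 the put-if-absent step depends on storage-specific mechanisms rather than `os.link`.

```python
import json
import os
import tempfile

# Toy model only: directory name and zero-padded file naming follow Delta Lake's
# convention, but the action schema here is a hypothetical simplification.
LOG_DIR = "_delta_log"


def _log_path(table_root: str, version: int) -> str:
    # Commit files are named by a zero-padded 20-digit version number.
    return os.path.join(table_root, LOG_DIR, f"{version:020d}.json")


def latest_version(table_root: str) -> int:
    """Highest committed version, or -1 for an empty log."""
    log = os.path.join(table_root, LOG_DIR)
    versions = [int(name[:-5]) for name in os.listdir(log) if name.endswith(".json")]
    return max(versions, default=-1)


def commit(table_root: str, read_version: int, actions: list) -> int:
    """Publish version read_version + 1, or raise if another writer got there first."""
    version = read_version + 1
    target = _log_path(table_root, version)
    # Write the whole commit to a temp file first so readers never see a partial commit.
    fd, tmp = tempfile.mkstemp(dir=os.path.join(table_root, LOG_DIR), suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        for action in actions:
            f.write(json.dumps(action) + "\n")
    try:
        # os.link fails if the target already exists: an atomic put-if-absent
        # on a local file system, which is what makes the commit all-or-nothing.
        os.link(tmp, target)
    except FileExistsError:
        raise RuntimeError(f"conflict: version {version} already committed") from None
    finally:
        os.remove(tmp)
    return version


def snapshot(table_root: str, version=None) -> set:
    """Replay add/remove actions up to `version` to get the live data files."""
    if version is None:
        version = latest_version(table_root)
    files = set()
    for v in range(version + 1):
        with open(_log_path(table_root, v)) as f:
            for line in f:
                action = json.loads(line)
                if "add" in action:
                    files.add(action["add"]["path"])
                elif "remove" in action:
                    files.discard(action["remove"]["path"])
    return files


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    os.makedirs(os.path.join(root, LOG_DIR))
    v0 = commit(root, -1, [{"add": {"path": "part-0.parquet"}}])
    v1 = commit(root, v0, [{"add": {"path": "part-1.parquet"}}])
    # A second writer that also read version 0 now conflicts and must retry.
    try:
        commit(root, v0, [{"remove": {"path": "part-0.parquet"}}])
    except RuntimeError as err:
        print(err)
    print(sorted(snapshot(root)))             # latest state
    print(sorted(snapshot(root, version=0)))  # earlier state
```

Replaying the log only up to an earlier version is how this model mimics Delta Lake's time travel. In the demo, the losing writer would re-read the log and retry its commit against the new latest version rather than overwrite the other writer's work.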

Keywords

Azure Databricks, Data lakehouse, Data pipeline, Delta Lake, Enterprise data architecture.
