Big Data Architectural Pattern to Ingest Multiple Sources and Standardization to Immune Downstream Applications

© 2020 by IJCTT Journal
Volume-68 Issue-1
Year of Publication : 2020
Authors : Imran Quadri Syed
DOI :  10.14445/22312803/IJCTT-V68I1P102

How to Cite?

Imran Quadri Syed, "Big Data Architectural Pattern to Ingest Multiple Sources and Standardization to Immune Downstream Applications," International Journal of Computer Trends and Technology, vol. 68, no. 1, pp. 5-10, 2020. Crossref, https://doi.org/10.14445/22312803/IJCTT-V68I1P102

Abstract
Organizations today handle large volumes of varied data to meet their business needs. They also receive data for the same data domain from numerous sources, in different layouts and formats. This article presents a big data architectural pattern that insulates traditional downstream systems from changes to source systems. A Datahub (big data platform) achieves this by ingesting data from different sources, standardizing it into a denormalized canonical form, integrating it with reference data, reprocessing rejected records, and publishing extracts to traditional downstream systems using big data technologies such as Hive and Impala. The article also discusses how a key management service (KMS) is used to identify the latest iteration of a record, which makes querying easier and supports generating standard publications for downstream systems.
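
The flow described in the abstract can be illustrated with a minimal in-memory sketch. It is not the article's implementation: the class `KeyService`, the field names, the two source layouts ("A" and "B"), and the reject rule are all hypothetical, and plain Python stands in for Hive and Impala. It also assumes that the KMS issues one stable key per business record together with an increasing iteration number, which is one plausible reading of the abstract.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class CanonicalRecord:
    """Hypothetical denormalized canonical form shared by all sources."""
    record_key: int   # stable key issued by the key service
    iteration: int    # 1 for the first version of a record, then 2, 3, ...
    customer_id: str
    amount: float
    source: str


class KeyService:
    """Hypothetical stand-in for the KMS: one stable key per business id."""

    def __init__(self) -> None:
        self._keys: Dict[str, int] = {}
        self._iterations: Dict[int, int] = {}

    def next_iteration(self, business_id: str) -> Tuple[int, int]:
        # Reuse the existing key for a known business id, else issue a new one.
        key = self._keys.setdefault(business_id, len(self._keys) + 1)
        self._iterations[key] = self._iterations.get(key, 0) + 1
        return key, self._iterations[key]


# One mapper per source layout (both layouts are invented for illustration).
def from_source_a(row: dict) -> dict:
    return {"customer_id": row["cust_id"], "amount": float(row["amt"])}


def from_source_b(row: dict) -> dict:
    return {"customer_id": row["customerNumber"], "amount": row["total_cents"] / 100}


MAPPERS = {"A": from_source_a, "B": from_source_b}


def standardize(source: str, row: dict, keys: KeyService,
                reference_ids: Set[str],
                rejects: List[Tuple[str, dict]]) -> Optional[CanonicalRecord]:
    """Map a source row to canonical form, or park it as a reject."""
    fields = MAPPERS[source](row)
    if fields["customer_id"] not in reference_ids:
        # Unknown in reference data: keep the raw row for later reprocessing.
        rejects.append((source, row))
        return None
    key, iteration = keys.next_iteration(fields["customer_id"])
    return CanonicalRecord(key, iteration, fields["customer_id"],
                           fields["amount"], source)


def latest_iterations(records: List[CanonicalRecord]) -> List[CanonicalRecord]:
    """Keep only the highest iteration per record key, ready for publishing."""
    latest: Dict[int, CanonicalRecord] = {}
    for rec in records:
        current = latest.get(rec.record_key)
        if current is None or rec.iteration > current.iteration:
            latest[rec.record_key] = rec
    return sorted(latest.values(), key=lambda r: r.record_key)
```

In this sketch, a downstream extract would be built only from the output of `latest_iterations`, so adding a new source layout means adding one mapper rather than changing any consumer. Rows in `rejects` can be passed through `standardize` again once the reference data has been updated.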

Keywords
Big data, Data Ingestion, Data Integration, Standardization, Reject Reprocessing, Architecture, Key Management Service (KMS), Hive, Impala, Datahub, publisher subscriber pattern.
