AWS Data Architect

Financial institutions such as FINRA, Nasdaq, and National Australia Bank have built data lakes on AWS to collect, store, and analyze increasing amounts of data at speed and scale. A data lake allows organizations to break down data silos and store all of their data – structured, semi-structured, and unstructured – in a centralized repository at any scale.

As an example, in order to comply with Anti-Money Laundering (AML) compliance requirements, a financial institution needs to detect and report suspicious activities, such as securities fraud and market manipulation. Criminals use money laundering techniques to conceal their activities. In order to properly track, trace, and uncover such activities, a financial institution should collect and centralize transactions from all lines of business and their respective applications, create a 360-degree view of the customer, product, and transactions, and apply AML detective analytical scenarios.

One of the key components of creating and maintaining an effective data lake is data governance. Data governance refers to a framework that includes people, processes, and technology, and that enables business users to work collaboratively with technologists to drive clean, certified, and trusted data. Without governance, a data lake becomes a data swamp where data continues to be consumed in a siloed manner without consistency and accuracy. As a result, the original issue is not fixed; it just resurfaces on a new platform.

A data governance framework consists of multiple components, including data quality, data ownership, data catalog, data lineage, operations, and compliance. In this blog we will focus on data quality.

There are two types of data quality issues that can arise in a data lake. The first is operational, where information such as a customer's birth date or address is entered incorrectly by a person or a system. This type of data quality issue should be assigned to the source application owner and remediated at the source system. One example of remediation is scrubbing, which includes reaching out to customers to confirm certain information or referencing the original onboarding documents. Another form of remediation is to leverage data quality metrics to enhance business processes. For example, if the majority of customer birth dates are null, the source system's application can be updated so that it requires a valid date of birth before a customer can be onboarded. Additionally, machine learning capabilities such as computer vision and optical character recognition (OCR) could be used to automatically extract a customer's name, address, and birth date from their driver's license, thereby minimizing errors resulting from manual data entry.

The second type of data quality issue is introduced when data silos are integrated into the data lake. During this integration process, data from various source systems are fed into the centralized data lake, often resulting in the same attributes having different values and formats. These attributes are maintained in siloed databases at the source systems because their usage is directly linked to the operation of each source system.
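To make the birth-date example above concrete, here is a minimal sketch of how such a data quality metric could be computed over a customer table in the lake. It assumes PySpark and uses hypothetical names (the customers path and the birth_date column are not from the original post); it illustrates the idea rather than prescribing an implementation.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq-birth-date-check").getOrCreate()

# Hypothetical customer table in the data lake; substitute any readable path.
customers = spark.read.parquet("s3://example-bucket/curated/customers/")

total = customers.count()
missing = customers.filter(F.col("birth_date").isNull()).count()
null_pct = 100.0 * missing / total if total else 0.0

print(f"{null_pct:.1f}% of customer records are missing a birth date")

# The metric can feed back into the business process: if most birth dates
# are missing, flag the source application for remediation at onboarding.
if null_pct > 50.0:
    print("Escalate to the source application owner: require a valid date of birth at onboarding")
```

A check like this can be scheduled against each source feed so the metric is tracked over time rather than measured once.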
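The post does not name a specific OCR service for the driver's-license extraction. One way to do it on AWS is Amazon Textract's identity document analysis; the sketch below assumes boto3 with AWS credentials configured and a hypothetical local image file.

```python
import boto3

textract = boto3.client("textract")

# Hypothetical scan of the customer's driver's license.
with open("drivers_license.jpg", "rb") as f:
    image_bytes = f.read()

response = textract.analyze_id(DocumentPages=[{"Bytes": image_bytes}])

# Keep only the fields mentioned in the post: name, address, and birth date.
wanted = {"FIRST_NAME", "LAST_NAME", "ADDRESS", "DATE_OF_BIRTH"}
for doc in response["IdentityDocuments"]:
    for field in doc["IdentityDocumentFields"]:
        field_type = field["Type"]["Text"]
        if field_type in wanted:
            print(field_type, "=>", field["ValueDetection"]["Text"])
```

Feeding these extracted values into the onboarding application, instead of relying on manual keying, is one way to reduce the operational errors described above.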
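For the second type of issue, the integration step is where the same attribute arrives in different formats. The hypothetical PySpark sketch below shows one way to standardize a birth date that one source system stores as MM/dd/yyyy and another stores as ISO yyyy-MM-dd before the records land in the lake; the source names and columns are made up for illustration.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("standardize-birth-date").getOrCreate()

# Two hypothetical source extracts holding the same attribute in different formats.
crm = spark.createDataFrame(
    [("C-001", "07/21/1984"), ("C-002", None)],
    ["customer_id", "birth_date"],  # MM/dd/yyyy strings
)
core_banking = spark.createDataFrame(
    [("C-001", "1984-07-21"), ("C-003", "1990-02-03")],
    ["customer_id", "birth_date"],  # ISO yyyy-MM-dd strings
)

# Normalize each feed to a proper date type so the attribute has a single,
# consistent representation in the centralized data lake.
crm_std = crm.withColumn("birth_date", F.to_date("birth_date", "MM/dd/yyyy")) \
             .withColumn("source", F.lit("crm"))
core_std = core_banking.withColumn("birth_date", F.to_date("birth_date", "yyyy-MM-dd")) \
                       .withColumn("source", F.lit("core_banking"))

unified = crm_std.unionByName(core_std)
unified.show()
```

Keeping the source column makes it possible to trace a conflicting value back to the system that produced it, which supports the ownership and lineage components of the governance framework.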