Extract, Transform, and Load (ETL) is a process that is typically used to create and maintain Data Warehouse (DWH). ETL represents a three step process of Extract from source table (/s), Transform that data to conform to the target data structures, and Load into the Target database. However, creating a Data Warehouse usually consists of multiple intermediate steps as indicated in the diagram below
In this blog, I shall discuss some key challenges that are inherent to Data Warehouse testing and why it is different than regular testing of transactional systems including databases.
- Data Volume – The volume of data running through ETL processes is typically in the tune of millions of transactions/data records. Data Volume poses a significant problem in terms writing up SQL queries to validate the data and then manually reviewing the results in Excel spreadsheets
- Multiple Data Owners– Data could be from a list of source systems that are on different technologies, driven by different rules for data governance, and more importantly having different masters. It is not just sufficient to detect non-conformances to the defined business requirements of the Data Warehouse; it is more critical to identify the source of the defect. It is then a different matter altogether on whether that data condition can be corrected or not – leading to data exceptions that you now need to continue to manage
- Test Bed Availability – The two points discussed above primarily contribute to this issue. Creating a test bed that has a representative sample of data from all the data sources is a challenge, faced by most organizations. Not having the right sample set to test against significantly increases the risk of defects leaking into production
- Dynamic Data Governance Rules – Over time, the source systems can change or the data providers may be replaced. Therefore, it is critically important for the testing processes to be nimble enough to accommodate drastic changes
- Testing Effort Estimation – In many cases, the organization is challenged by its lack of knowledge of the source data and its quality. Therefore, it is an expensive exercise to develop comprehensive business requirements/data rules document. As described above the testing effort here is more than validating whether the ETL process developed is according to the business rules. Identifying data issues and fixing the ETL processes to resolve them, make up for bulk of the build effort. Testing effort has to account for these peaks and troughs in ETL testing cycle
ETL Testing is NOT like Testing of Transactional Database Testing
In the next blog I shall discuss the ETL Testing Types in detail, some testing types and techniques.
Santosh, a Testing thought-leader at Mphasis, has over 20 years of experience consulting with Fortune 500 enterprises especially in the Financial Services and Insurance industries. He has successfully led Managed Services Testing programs through across phases of pre-sales, solution development, contract negotiation, transition, and delivery. Driving performance of Testing organizations using a metrics-driven approach, implementing methodologies to improve testing effectiveness, and building high-performing testing teams are areas that Santosh specializes in. He can be reached at Santosh.Subramanian@mphasis.com