As referenced in my previous blog, ETL testing is quite different than testing of transactional systems, and consequently testing has to be approached in a way so as to address some of the challenges as well as nuances of ETL projects. In this blog, we shall discuss some of the Testing Types and techniques associated with Testing in ETL and Data Warehousing projects.
While the table below represents the most common types of testing, it is not mandated to conduct every type of test for an ETL project. Based on the anticipated risk areas associated with the ETL project it is important to pick the right subset from this list below. To that end, the determining factors for when the testing type is required has also been indicated.
Considering that ETL processes deal with heterogeneous data sources and typical involve large volumes of data, performance testing of the ETL process is critical. Performance testing should validate that the following are within acceptable performance limits in the context of the overall integrated system the upstream and downstream systems. Some performance testing focus areas are:
Scalability testing validates if the system can sustain the growth of data volume and what are its impacts. It should also address the expected increase in the number of users accessing this data. Some of the test scenarios that covers scalability testing are:
The typical process to conduct a performance or scalability test is:
As it is with every other system, Data Warehouse is a living system – as the environment around it changes it has to change as well otherwise, it will lose relevance and integrity of the data it holds. A well-defined regression test suite should be created based on priority business scenarios, but does not need to include all of the different functional tests listed above.
Tagging the regression test case with the data source and the data-warehouse entities it validates, makes it easier to identify the test cases that need to be part of a regression test run for a new release. When doing regression testing, it is much quicker to compare execution results from a previous run
In the next blog in this series, we will discuss the skills required for ETL testing and some pitfalls that we should avoid
Santosh Subramanian
Santosh, a Testing thought-leader at Mphasis, has over 20 years of experience consulting with Fortune 500 enterprises especially in the Financial Services and Insurance industries. He has successfully led Managed Services Testing programs through across phases of pre-sales, solution development, contract negotiation, transition, and delivery. Driving performance of Testing organizations using a metrics-driven approach, implementing methodologies to improve testing effectiveness, and building high-performing testing teams are areas that Santosh specializes in. He can be reached at Santosh.Subramanian@mphasis.com