social share alt icon
Thought Leadership
Banner Image
May 10, 2016
ETL TESTING – Testing Types and Techniques
Santosh Subramanian

As referenced in my previous blog, ETL testing is quite different than testing of transactional systems, and consequently testing has to be approached in a way so as to address some of the challenges as well as nuances of ETL projects. In this blog, we shall discuss some of the Testing Types and techniques associated with Testing in ETL and Data Warehousing projects.

Functional Testing

While the table below represents the most common types of testing, it is not mandated to conduct every type of test for an ETL project. Based on the anticipated risk areas associated with the ETL project it is important to pick the right subset from this list below. To that end, the determining factors for when the testing type is required has also been indicated.

Performance & Scalability

Considering that ETL processes deal with heterogeneous data sources and typical involve large volumes of data, performance testing of the ETL process is critical. Performance testing should validate that the following are within acceptable performance limits in the context of the overall integrated system the upstream and downstream systems. Some performance testing focus areas are:

  1. Testing of the initial load – this is key to developing the Production deployment plan and identify any risks associated with it
  2. Testing of incremental loads to validate that its performance is within expected performance limits and identify any impact it might have on any existing batch cycle(/s)
  3. Performance of any OLAP cube refreshes
  4. Report and ad-hoc query performance

Scalability testing validates if the system can sustain the growth of data volume and what are its impacts. It should also address the expected increase in the number of users accessing this data. Some of the test scenarios that covers scalability testing are:

  1. Comparative analysis of ETL load timings at different data volumes , including testing it at “x” times current expected load, where x is typically equal to 50, 100 or more
  2. Loading of the data warehouse with significantly large amount of data (10 – 30 times expected growth)

The typical process to conduct a performance or scalability test is:

  1. Validate performance expectations, gather user demand & business growth statistics
  2. Build out performance test environment and identify test scenarios
  3. Develop test scripts and test data and conduct tests
  4. Performance analysis & reporting

Regression Testing

As it is with every other system, Data Warehouse is a living system – as the environment around it changes it has to change as well otherwise, it will lose relevance and integrity of the data it holds. A well-defined regression test suite should be created based on priority business scenarios, but does not need to include all of the different functional tests listed above.

Tagging the regression test case with the data source and the data-warehouse entities it validates, makes it easier to identify the test cases that need to be part of a regression test run for a new release. When doing regression testing, it is much quicker to compare execution results from a previous run

In the next blog in this series, we will discuss the skills required for ETL testing and some pitfalls that we should avoid

Santosh Subramanian

Santosh, a Testing thought-leader at Mphasis, has over 20 years of experience consulting with Fortune 500 enterprises especially in the Financial Services and Insurance industries. He has successfully led Managed Services Testing programs through across phases of pre-sales, solution development, contract negotiation, transition, and delivery.  Driving performance of Testing organizations using a metrics-driven approach, implementing methodologies to improve testing effectiveness, and building high-performing testing teams are areas that Santosh specializes in. He can be reached at