What is ETL?
ETL, if we say it in
simpler terms, it is nothing but “Extract,
Transform and Load”. We know
that testing is an essential part of the design life cycle of any software
product. This is also true that for succeeding in any data warehousing projects,
testing plays a critical role as one can gain user’s confidence only when the
quality of information we provide is up to the mark.
We have also seen these days that Verification and
Validation, when done independently, grabbing business’s attention as they see in
it a huge market potential and business gains as well. Beside these, customers
have been offered different range of products in terms of service providers,
based on the different technologies, processes and solutions, also distributed
in many areas. Hence, ETL testing and Data warehouse testing are continuously
emerging as the best techniques to provide customers the best of the services
and offerings in different areas.

3. L(oading)- Loading is the process to take data from the Staging or Landing area to the final destination known as the ‘Base Database’ or ‘Target System’ .
b) Incremental Load: It requires some data warehousing concepts so that it can maintain partial or full history. SCD that is Slowly Changing Dimensions (which is later discussed in the blog) concept is also involved in this case. Also, the concepts of Insert/Update or Delete/Insert is used.
Terms used in ETL Testing:
1. E (xtraction)- is the operation of selecting data
from a source system for further processing. The extracted data is first
received in a Database space identified as "Staging" area.
It is of
two types:
a) Full Extraction
a) Full Extraction
b)
Incremental Extraction
Let’s
discuss these in detail:
a)
Full
Extraction: It is a type of extraction where whole data from source system
is extracted at ones, it replicates data as it is into the target system. In
this case, target is always empty or truncated. This happens during initial
load of the system and during data migration from one system to another. Data
is cumulated at source.
b)
Incremental
Extraction: In this type of extraction only additional data is extracted
for specific period and then that extracted data is appended or updated in the
target system. In this case, target system is not empty prior to load. Data is
cumulated at target. Here we define a Load
Period Window by ‘Start time’ and ‘End time’. This type of extraction is
also known as “Delta Extraction”
2. T (ransformation)-
is the change data which will undergo as
part of data processing and can be further categorized as follows:
a) Data Cleansing: It means that the data is being cleaned and errors will be fixed.
a) Data Cleansing: It means that the data is being cleaned and errors will be fixed.
There are some processes involved to remove the anomalies
such as:
·
Unwanted spaces
·
Junk characters
·
Aligning data format such as date format
b) Applying
business rules: These form the core transformation and are of following
types:
·
Business rules such as Commission of agent on
the basis of sales
·
Complex equations such as Net Asset values, Net
Profit Margin, Gross Profit Margin
·
Apply calculation such as calculating the profit
given cost price and selling price
Data Quality Checks that can be applied is broadly classified as :
Data Quality Checks that can be applied is broadly classified as :

3. L(oading)- Loading is the process to take data from the Staging or Landing area to the final destination known as the ‘Base Database’ or ‘Target System’ .
There are some important points
which must be kept in mind while doing data loading such as:
As we discussed that the Extraction is done in 2 ways similarly Data loading can also be done in 2 ways:
a) History Load: In this case, the target tables are truncated and entire data is loaded into it. Right utilities should be chosen in this case as it involves huge data load. Hence, there is always some dedicated code written for performing History Loads.
As we discussed that the Extraction is done in 2 ways similarly Data loading can also be done in 2 ways:
a) History Load: In this case, the target tables are truncated and entire data is loaded into it. Right utilities should be chosen in this case as it involves huge data load. Hence, there is always some dedicated code written for performing History Loads.
b) Incremental Load: It requires some data warehousing concepts so that it can maintain partial or full history. SCD that is Slowly Changing Dimensions (which is later discussed in the blog) concept is also involved in this case. Also, the concepts of Insert/Update or Delete/Insert is used.
No comments:
Post a Comment