Open Source Extract-Transform-Load (ETL) Overview
ETL (Extract-Transform-Load) is one of the most critical processes in BI and Data Warehouse applications.
What is ETL?
The Extract-Transform-Load process consists of three sub-processes that transfer data from production systems to the data warehouse, where it is consumed by BI applications:
- Extraction of the data from production applications and databases.
- Transformation of the data to reconcile it across source systems, including any required data cleaning. The data is also transformed to meet the requirements of the target systems (Star Schema, Slowly Changing Dimensions, etc).
- Loading of the transformed data into the Data Warehouse, Data Marts and other BI applications.
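The three sub-processes above can be sketched in a few lines of Python. This is a minimal illustration only, not how any particular ETL product works: the `customers` source table, the `dim_customer` target table, the country-code mapping and all function names are assumptions made for the example, and in-memory SQLite databases stand in for the production system and the data warehouse.

```python
import sqlite3


def extract(source_conn):
    """Extract: read raw rows from a (hypothetical) production table."""
    return source_conn.execute(
        "SELECT id, name, country FROM customers"
    ).fetchall()


def transform(rows):
    """Transform: clean names and reconcile country values to one code set."""
    # Hypothetical mapping used to reconcile values across source systems.
    country_codes = {"uk": "GB", "united kingdom": "GB", "usa": "US", "us": "US"}
    cleaned = []
    for cust_id, name, country in rows:
        if not name or not name.strip():
            continue  # data cleaning: skip rows without a usable name
        key = (country or "").strip().lower()
        cleaned.append(
            (cust_id, name.strip().title(), country_codes.get(key, "UNKNOWN"))
        )
    return cleaned


def load(warehouse_conn, rows):
    """Load: write the transformed rows into a warehouse dimension table."""
    warehouse_conn.execute(
        "CREATE TABLE IF NOT EXISTS dim_customer ("
        "customer_id INTEGER PRIMARY KEY, name TEXT, country_code TEXT)"
    )
    warehouse_conn.executemany(
        "INSERT OR REPLACE INTO dim_customer VALUES (?, ?, ?)", rows
    )
    warehouse_conn.commit()
    return len(rows)


def run_etl(source_conn, warehouse_conn):
    """Run the three steps in order and return the number of rows loaded."""
    return load(warehouse_conn, transform(extract(source_conn)))


def demo():
    """Build a tiny in-memory source, run the ETL, and return the result."""
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE customers (id INTEGER, name TEXT, country TEXT)")
    source.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, "  alice smith ", "UK"), (2, "", "USA"), (3, "bob jones", "United Kingdom")],
    )
    warehouse = sqlite3.connect(":memory:")
    run_etl(source, warehouse)
    return warehouse.execute(
        "SELECT customer_id, name, country_code FROM dim_customer "
        "ORDER BY customer_id"
    ).fetchall()


if __name__ == "__main__":
    print(demo())
```

Hand-written code of this kind is what smaller projects have tended to rely on. It works for a single table, but scheduling, error handling and metadata management all have to be built and maintained separately.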
Traditionally, most proprietary data integration and ETL offerings were designed for large projects such as data warehousing or master data management. These offerings typically carried expensive licenses and required teams of experts to implement over long consulting engagements, so only large companies could afford them.
Small to medium sized projects could not afford such a large upfront ETL investment and tended to rely on custom code, at the price of increased risk and maintenance costs.
More recently an alternative has emerged with the rise of Open Source ETL software. The two leaders in this field are Pentaho Data Integration and Talend Open Studio. This overview focuses on the capabilities of the Talend Enterprise version, known as Talend Data Integration Suite.

Talend Enterprise Data Integration Suite
Talend Open Source Enterprise Integration Suite contains the key features that one would expect from proprietary alternatives, at a low subscription support cost and without any upfront licensing costs. The key features include:
- Business Modeller – provides a top-down GUI approach for the design of the ETL integration processes from a business perspective.
- Job Designer – a graphical and functional designer of the actual ETL processes, using a graphical palette of components and connectors.
- Metadata Manager – provides a metadata repository that centralises all aspects of design and execution.
- Data Cleaning and Profiling capabilities (offered as part of Talend Data Profiling and Cleaning).
- Job Conductor – coordinates and schedules the execution of all jobs, including event-based scheduling for real-time integration.
- Grid Conductor – distributes jobs across a grid of execution servers and performs automatic load balancing and failover.
- Execution Monitoring performed via:
  - Activity Monitoring Console that monitors job execution events (successes, failures, etc), execution times and data volumes.
  - Activity Monitoring Dashboard that provides a business oriented view of the Activity Monitoring Console via a web interface and using real-time gauges and status indicators. Business managers will be able to view the current and historical status and data associated with any job.
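To make the kind of information gathered by execution monitoring concrete, the following generic Python sketch records one event per job run with its status, execution time and data volume. It does not use or reproduce any Talend API: `run_monitored`, `summarize` and the event fields are hypothetical names chosen for this illustration.

```python
import time


def run_monitored(job_name, job, log):
    """Run job() and append one execution event to log.

    job is assumed to be a callable returning the number of rows processed.
    """
    start = time.perf_counter()
    event = {"job": job_name, "status": "success", "rows": 0, "error": None}
    try:
        event["rows"] = job()
    except Exception as exc:  # record the failure instead of stopping the run
        event["status"] = "failure"
        event["error"] = str(exc)
    event["seconds"] = time.perf_counter() - start
    log.append(event)
    return event


def summarize(log):
    """Aggregate the event log into the figures a dashboard might display."""
    return {
        "runs": len(log),
        "failures": sum(1 for e in log if e["status"] == "failure"),
        "rows": sum(e["rows"] for e in log),
    }
```

A monitoring console would typically persist such events and expose them over a web interface. The sketch keeps them in a plain list so that the data being tracked is easy to see.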
Azinta ETL Services
Azinta Data Warehouse and BI solutions use Talend Enterprise Integration Suite to significantly cut the cost of the ETL process compared with proprietary offerings. This enables both SMEs and large companies to make significant savings on their BI and data integration investments. For further information contact: sales@azinta.com
Data Warehouse Disrupter
Open Source Data Warehouse has been described by Claudia Imhoff, a leading industry guru, as a disruptive force.
To see why you should consider using Enterprise Open Source Data Warehouse, read the Claudia Imhoff Report.
Azinta solutions are designed to reduce costs, increase sales revenues and shareholder value.
We compare pre-implementation ROI projections with post-implementation ROI results, ensuring that our solutions deliver as promised.