Column-based Open Source Data Warehouse Overview

Business managers are increasingly aware that there is an urgent requirement for business intelligence (BI) that will spot trends about customers, products, markets and suppliers to identify new business opportunities, manage risk or formulate strategic initiatives that can lead to increased profitable revenues.

Furthermore, in the current recession companies should consider upgrading their BI capabilities so that they can explore the “how, why and when”, and not just the “what”, of the data stored within their organisations. What is needed is advanced data analytics that discovers useful patterns within rapidly changing customer buying preferences, creating cross-marketing opportunities and leading to increased customer revenues.

At the heart of advanced analytics is the Data Warehouse (DW). However, most data warehouse implementations were not designed for advanced data analytics, which requires fast access to data so that data mining and statistical queries can be completed in minutes rather than hours.

Traditional Data Warehouse Problems
The traditional data warehouse suffered from a number of problems:
  • High licensing and storage costs meant that only large companies could afford to implement data warehouse solutions.
  • Large and expensive teams of database administrators and support staff were required to tune and manage the data warehouse so that it could supply the constantly changing BI information.
  • Recent requirements for predictive analytics have resulted in very slow query performance. This has frustrated business users who want to test their business models quickly.
  • Small and medium-sized companies often could not afford to implement BI and data warehouse solutions and were therefore at a considerable competitive disadvantage.

What was required was a new type of data warehouse that was both very fast at performing advanced analytical queries and had a low total cost of ownership (TCO), so that it could be used by SMEs and large companies alike.

Infobright architecture diagram

The Rise of the Open Source Column-Oriented Data Warehouse
Infobright's column-oriented data warehouse technology combines a column-oriented database with its Knowledge Grid architecture to produce a very fast, low-cost analytical data warehouse suitable for both SMEs and large companies.

Whilst other data warehouse products require extensive IT support to create indexes and partition data, Infobright technology is self-managed and capable of producing fast analytical answers on the fly. The Infobright solution leverages MySQL technology, scales to 50TB+ on a single server, and compresses data at ratios from 10:1 up to 40:1, significantly reducing data storage costs.

Infobright Technology
The Infobright Analytic Data Warehouse is based on the following four concepts:
  • Column oriented
  • Data Packs
  • Knowledge Grid
  • The Optimizer

Column Oriented
Infobright performance is based on its column-oriented database technology, where data is stored column-by-column instead of the traditional row-by-row. Most queries involve only a subset of a table's columns, so only the columns needed to answer the query have to be retrieved. This leads to significant savings on disk I/O. Furthermore, data stored column-by-column can be compressed to much higher levels than data stored row-by-row.
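The difference between the two layouts can be illustrated with a minimal sketch. It is not Infobright code: the toy table, the column names and the helper functions are all hypothetical, and counting "values read" is only a rough stand-in for disk I/O.

```python
# Hypothetical toy table; the column names and values are illustrative only.
rows = [
    {"customer_id": 1, "region": "North", "product": "A", "amount": 120.0},
    {"customer_id": 2, "region": "South", "product": "B", "amount": 75.5},
    {"customer_id": 3, "region": "North", "product": "A", "amount": 42.0},
    {"customer_id": 4, "region": "East", "product": "C", "amount": 310.25},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_amount_row_store(rows):
    """Sum 'amount' from a row layout; every field of every row is read."""
    values_read = sum(len(row) for row in rows)
    return sum(row["amount"] for row in rows), values_read


def total_amount_column_store(columns):
    """Sum 'amount' from a column layout; only that one column is read."""
    values_read = len(columns["amount"])
    return sum(columns["amount"]), values_read


columns = to_columns(rows)
print(total_amount_row_store(rows))        # total and values read, row layout
print(total_amount_column_store(columns))  # same total, fewer values read
```

Both functions return the same total, but the column layout touches a quarter of the values in this four-column example. Values within a single column also tend to resemble each other, which is one reason column stores generally compress well.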

Data Packs and Knowledge Grid
Data is stored in Data Packs of 65K values each. Each Data Pack has a Data Pack Node that contains a set of statistics about the data stored (in compressed form) in that Data Pack. Knowledge Nodes provide an additional set of metadata about the Data Packs or about column relationships.

The Data Packs and Knowledge Nodes form the Knowledge Grid and are automatically created and managed by the system. This is the key to the low maintenance costs, because there are no traditional database indexes that require ongoing management.

The Optimizer
The Optimizer uses the Knowledge Grid to determine the minimum set of Data Packs that need to be decompressed in order to get the results for a given query.
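The idea of using per-pack statistics to avoid decompression can be sketched as follows. This is a simplified illustration under stated assumptions, not Infobright's actual optimizer: the PackStats class, the use of only minimum and maximum values, the tiny pack size and the three-way classification of packs are all hypothetical choices made for this example.

```python
from dataclasses import dataclass

PACK_SIZE = 4  # tiny for illustration; the text describes 65K Data Packs


@dataclass
class PackStats:
    """Hypothetical stand-in for a Data Pack Node: summary statistics only."""
    minimum: int
    maximum: int


def build_packs(values, pack_size=PACK_SIZE):
    """Split one column into packs and record min/max statistics per pack."""
    packs = [values[i:i + pack_size] for i in range(0, len(values), pack_size)]
    stats = [PackStats(min(p), max(p)) for p in packs]
    return packs, stats


def count_greater_than(packs, stats, threshold):
    """Count values > threshold, opening only packs the stats cannot decide."""
    count = 0
    opened = 0
    for pack, s in zip(packs, stats):
        if s.maximum <= threshold:
            continue  # no value in this pack can match; skip it entirely
        if s.minimum > threshold:
            count += len(pack)  # every value matches; answer from stats alone
            continue
        opened += 1  # undecided: this pack would have to be decompressed
        count += sum(1 for v in pack if v > threshold)
    return count, opened


values = [1, 2, 3, 4, 10, 20, 30, 40, 50, 60, 70, 80]
packs, stats = build_packs(values)
print(count_greater_than(packs, stats, 25))  # (6, 1): one of three packs opened
```

In this example the query is answered by scanning a single pack. The first pack is ruled out and the third is counted in full from the statistics alone, which is the kind of saving the Knowledge Grid is described as providing.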

Infobright Compression
Infobright technology can significantly reduce the cost of data storage. For example, 10 TB of raw data can be stored in 1 TB of space (or even less) on average, including the overhead associated with Data Pack Nodes and the Knowledge Grid. With other database technologies, the stored size increases by a factor of 2 or more because of the overhead required to store indexes and other special structures.
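The arithmetic behind these figures can be worked through in a few lines. The function names are hypothetical, and the ratios are simply the ones quoted in this overview (10:1 to 40:1 compression, a factor of 2 expansion elsewhere), not measurements.

```python
def compressed_size_tb(raw_tb, compression_ratio):
    """Stored size when raw data shrinks by the given ratio (10 means 10:1)."""
    return raw_tb / compression_ratio


def expanded_size_tb(raw_tb, overhead_factor):
    """Stored size when indexes and other structures inflate the raw data."""
    return raw_tb * overhead_factor


raw_tb = 10.0
typical = compressed_size_tb(raw_tb, 10)    # 10:1 compression -> 1.0 TB
best_case = compressed_size_tb(raw_tb, 40)  # 40:1 compression -> 0.25 TB
indexed = expanded_size_tb(raw_tb, 2)       # factor-of-2 overhead -> 20.0 TB

print(typical, best_case, indexed)
print(indexed / typical)  # 20.0: relative storage footprint in this scenario
```

Under these assumed ratios, the same 10 TB of raw data occupies 1 TB in one case and 20 TB in the other, a twentyfold difference in disk space.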

Infobright Architecture

Infobright MySQL Integration
Infobright data warehouse technology is deployed as a storage engine for MySQL. It can therefore connect automatically to all the BI tools that connect to MySQL, including Business Objects, Cognos, Pentaho and Jaspersoft.

Azinta Infobright Solutions
Azinta partners with Infobright to deliver affordable Data Warehouse solutions. For further information contact: sales@azinta.com

  
 

Data Warehouse Disrupter
The Open Source Data Warehouse has been described by Claudia Imhoff, a leading industry guru, as a disruptive force.

To see why you should consider using an Enterprise Open Source Data Warehouse, read the Claudia Imhoff report.

 

Azinta solutions are designed to reduce costs, increase sales revenues and shareholder value.

 

We compare pre-implementation ROI projections with post-implementation ROI results, ensuring that our solutions deliver as promised.

Copyright © Azinta Systems Ltd, 2008-2009