I’ve used Vertica, a commercial column-based database storage engine, and was reasonably impressed. During O’Reilly MySQL conference last month, I checked out Infobright’s vendor booth and talked with some users. I became curious and wanted to test it out. Infobright has a free community version (ICE Infobright Community Edition)of its column-based storage engine that works with MySQL, which was what I used for my testing. I have no relationship with Infobright whatsoever, I happen to think that column-based storage can be a potentially disruptive technology in the BI/DW field. I’d love to hear your comments/experiences.
Here are some noteworthy points:
1. Setup is pretty easy. You can follow steps here. Note that the package has most relavant MySQL tools. A separate install of mysql client and server is NOT needed.
The ICE package has the following storage engines bundled:
2. I used TPC-H’s dbgen tool to generate data for testing. The raw text file is around 8.8 GB, about 72 million rows;
3. I used Rackspace’s cloud server, CentOS 5.5, 1 GB memory, 64-bit, 35 GB of hard drive space for testing. I created 2 databases, each with one table called lineitem: one table uses the BRIGHTHOUSE storage engine, the other uses the MyISAM storage engine. No index on the MyISAM table initially;
4. On this particular Rackspace server, below is the record of how long it took to load that amount of data into BRIGHTHOUSE table:
# time mysql-ib infobright < load.sql real 22m46.974s user 0m2.320s sys 0m16.140s [/sourcecode] And here is the record of how long it takes to load into the MyISAM table: [sourcecode language="text"] # time mysql-ib test < load.sql real 6m11.966s user 0m1.960s sys 0m14.420s [/sourcecode] Here is what's inside load.sql: [sourcecode language="text"] load data local infile '/root/dbgen/lineitem.tbl' into table lineitem fields terminated by '|' lines terminated by '\n'; [/sourcecode] 5. Size comparison, in GB. Here you can see the power of the impressive compression rate of a column-based storage engine: Raw Text: 8.8 Raw Text After GZIP: 2.6 MyISAM File Size: 7.2 InfobrightFileSize: 1.5 6. I did a rudimentary performance comparison. The first one is on Infobright table, the second on MyISAM table: [sourcecode language="text"] mysql> select count(*) from lineitem where l_shipdate between '1993-01-01' and '1995-01-01'; +----------+ | count(*) | +----------+ | 21880025 | +----------+ 1 row in set (15.06 sec) mysql> use test; Database changed mysql> select count(*) from lineitem where l_shipdate between '1993-01-01' and '1995-01-01'; +----------+ | count(*) | +----------+ | 21880025 | +----------+ 1 row in set (1 min 9.23 sec)
I then created an index on l_shipdate for the MyISAM table, and improved the performance to a bit more than 10 seconds.