A few days ago I wrote about Infobright’s column-based storage engine, and compared the sizes of raw text data file, gzipped file, MyISAM files, and Infobright files. At that time, I also wanted to compare that against data compression in SQL Server 2008, which is a new feature. But the Windows cloud server instance I fired up at the time didn’t have enough disk space, so I temporarily aborted that endeavour, until today.
Once again, testing data was generated using TPC-H’s dbgen tool. In fact I took the same steps outlined here. The total raw text file size is around 8.8 GB. I then created 2 SQL Server tables without any index (heap), one without compression and one with page compression. The DDL used is based on DDL listed in this post, without the indexes.
I used BULK INSERT for data loading. Here is the statement used:
[sourcecode language=”sql”]
BULK INSERT [testDb].[dbo].[LINEITEM]
FROM ‘C:\Users\Administrator\Documents\lineitem.tbl’
WITH(CHECK_CONSTRAINTS,CODEPAGE=’RAW’,DATAFILETYPE=’char’,
FIELDTERMINATOR=’|’,ROWTERMINATOR=’0x0a’)
[/sourcecode]
I then calculated storage space taken with sp_spaceused stored procedure.
Here is the results in GB:
Raw Text: 8.8
Raw Text After GZIP: 2.6
Uncompressed SQL Server table data size: 9.5
Compressed SQL Server table data size: 7.4
For clarity’s sake, here is the results with Infobright and MyISAM on Linux:
MyISAM File Size: 7.2
InfobrightFileSize: 1.5
SQL Server 2008 provides a system stored procedure that gives a size estimation if compression is used for a non-compressed table/index.The estimated size of a compressed table is 5.4 GB. Comparing it with 7.4, the estimation appears to be optimistic, in this case.