Script to generate index rebuild with PAGE compression

BI data warehouse databases change little and typically take up a lot of space, so compressing their indexes is an easy way to save space.

I came across some BI databases whose indexes were created without compression. We are in the process of migrating those databases to a new server, so I took the opportunity to rebuild all of those indexes with PAGE compression.

Two things are of interest:

  • Indexes are rebuilt from the smallest to the largest. The rationale is that a rebuild needs roughly twice the actual index size while it runs. If we start with the largest index, the data file may have to grow unnecessarily. If we start with the smallest one, there may already be enough free space in the file to accommodate that rebuild. Once it is done, compression has freed some space, leaving more room for the next rebuild. This way the rebuilds need little or no additional space;
  • Re-indexing is done on a new server with few or no connections to it, so the script sets the MAXDOP option in the hope of making the process faster.

Here is the script:
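-- Generates one ALTER INDEX ... REBUILD statement per index, smallest index first.
-- Run the statements from the AlterIndex column in the order returned.
SELECT 
    s.name AS SchemaName,
    t.name AS TableName,
    i.name AS IndexName,
    -- QUOTENAME protects names that contain spaces or reserved words
    'ALTER INDEX ' + QUOTENAME(i.name) + ' ON ' + QUOTENAME(s.name) + '.' + QUOTENAME(t.name) + ' REBUILD WITH (SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE, MAXDOP = 20);' AS AlterIndex,
    SUM(a.total_pages) * 8 AS TotalSpaceKB,   -- a page is 8 KB
    SUM(a.used_pages) * 8 AS UsedSpaceKB, 
    (SUM(a.total_pages) - SUM(a.used_pages)) * 8 AS UnusedSpaceKB
FROM 
    sys.tables t
INNER JOIN 
    sys.schemas s ON s.schema_id = t.schema_id
INNER JOIN      
    sys.indexes i ON t.object_id = i.object_id
INNER JOIN 
    sys.partitions p ON i.object_id = p.object_id AND i.index_id = p.index_id
INNER JOIN 
    sys.allocation_units a ON p.partition_id = a.container_id
WHERE 
    t.name NOT LIKE 'dt%'      -- skips legacy diagram tables; drop this if your own tables start with "dt"
    AND t.is_ms_shipped = 0
    AND i.object_id > 255 
    AND i.index_id > 0         -- skip heaps, which have no index to rebuild
GROUP BY 
    s.name, t.name, i.name
ORDER BY 
    UsedSpaceKB;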
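
The smallest-first reasoning can be checked with a little arithmetic. The Python sketch below is only a simplified model, not a description of how SQL Server allocates space: it assumes a rebuild needs free space equal to the index's current size, and that the compressed copy ends up at half the original size. The function name, the index sizes, and the 8 GB of free space are all hypothetical.

```python
def file_growth_needed(index_sizes_gb, free_gb, compressed_ratio=0.5):
    """Return how much the data file must grow to rebuild indexes in the given order.

    Simplified model (an assumption, not SQL Server's exact behaviour):
    - a rebuild needs free space equal to the index's current size;
    - afterwards the old copy is released and the new copy occupies
      compressed_ratio * size.
    """
    growth = 0.0
    free = float(free_gb)
    for size in index_sizes_gb:
        shortfall = max(0.0, size - free)       # space missing for this rebuild
        growth += shortfall                     # the file has to grow by that much
        free += shortfall
        free += size * (1 - compressed_ratio)   # net space released by compression
    return growth


sizes = [18, 8, 12]  # hypothetical index sizes in GB
smallest_first = file_growth_needed(sorted(sizes), free_gb=8)
largest_first = file_growth_needed(sorted(sizes, reverse=True), free_gb=8)
print(smallest_first, largest_first)  # prints: 0.0 10.0
```

With these made-up numbers, rebuilding from smallest to largest never grows the file, while starting with the largest index forces 10 GB of growth up front.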

JiMetrics now gathers SQL Server startup account

Over the last few days, I’ve refactored JiMetrics and added a new feature:

  • Used Pester to create more test cases for the PowerShell functions I wrote;
  • Enhanced the design and code so JiMetrics also gathers each SQL Server instance’s startup account, which can be useful.

If you don’t know what JiMetrics is, go to this page to find out. It is a tool that uses SQL Server to gather important SQL Server metrics in your enterprise. There is no additional software to install, no registry change, and no files to copy and move around that pollute your system. It only uses SQL Server, which you already have, and it just works! It has been pretty useful to me and my co-workers. JiMetrics is open source and free to use. Check it out and let me know what you think.

I will keep improving this. My next objective is to play with SQL Server 2014 columnstore indexes and see if I can convert the collection database tables to columnstore.

Removing duplicate rows in small batches based on date column

Due to double scheduling, some duplicate rows were inserted into the Windows.TableStats table in JiMetrics. To confirm that the table had duplicates, I used the T-SQL script below. It sticks to standard SQL, so it should work with little or no change on other major RDBMS platforms such as MySQL. On Oracle, where DATE keeps the time of day, use TRUNC (CollectionDate) instead of the CAST. Adjust table and column names to fit your needs.

SELECT HostID,
       InstanceID,
       DbName,
       SchemaName,
       TableName,
       CAST (CollectionDate AS DATE) AS CollectionDay,
       COUNT (*) AS RowsPerDay
  FROM Windows.TableStats
GROUP BY HostID,
         InstanceID,
         DbName,
         SchemaName,
         TableName,
         CAST (CollectionDate AS DATE)
HAVING COUNT (*) > 1

So the duplicates need to be removed. In Microsoft SQL Server, that can be accomplished with a Common Table Expression (CTE) using the ROW_NUMBER ranking function, which is a pretty elegant solution.

However, when using DELETE to remove a potentially large number of rows, say millions, it is always advisable to do it in small batches. Otherwise, you risk running out of tempdb space, log space, or even disk space, not to mention that you may take bigger and coarser locks than necessary.

So I decided to remove the duplicates one day at a time to lessen the impact on the database instance. A @runningDate variable starts at the first date and is incremented by one day until it reaches the desired end date. Within each iteration of the loop, that day’s duplicates are removed. Tweak it to suit your needs.

I think things like this make a good job interview question for a database administrator.

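DECLARE @runningDate DATE;
SET @runningDate = '20140101';

-- Process one day per iteration so that each DELETE stays small
WHILE (@runningDate < '20140406')
BEGIN
   WITH dupRows
        AS (SELECT RecordID,
                   -- Number the rows within each group of duplicates
                   ROW_NUMBER ()
                      OVER (PARTITION BY HostID,
                                         InstanceID,
                                         DbName,
                                         SchemaName,
                                         TableName,
                                         CAST (CollectionDate AS DATE)
                            ORDER BY
                               HostID,
                               InstanceID,
                               DbName,
                               SchemaName,
                               TableName)
                      AS RankID
              FROM Windows.TableStats
             WHERE CAST (CollectionDate AS DATE) = @runningDate)
   -- Keep the first row of each group and delete the rest
   DELETE FROM dupRows
    WHERE RankID > 1;

   -- Variable names are case-sensitive on a case-sensitive instance, so keep the casing consistent
   SET @runningDate = DATEADD (DAY, 1, @runningDate);
END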
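
To try the day-by-day idea without touching a real server, here is a small, self-contained Python sketch that uses an in-memory SQLite database as a stand-in. Everything in it is an assumption made for illustration: the table is a cut-down version of Windows.TableStats, the sample rows are made up, and the row with the lowest RecordID is the one kept. SQLite cannot delete through a CTE the way SQL Server does, so the sketch deletes by RecordID with an IN subquery instead. Window functions need SQLite 3.25 or later.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TableStats ("
    " RecordID INTEGER PRIMARY KEY, HostID INTEGER, TableName TEXT,"
    " CollectionDate TEXT)"
)
rows = [
    (1, "Orders", "2014-01-01 02:00:00"),
    (1, "Orders", "2014-01-01 02:05:00"),  # duplicate for the same day
    (1, "Orders", "2014-01-02 02:00:00"),
    (2, "Orders", "2014-01-02 02:00:00"),
    (2, "Orders", "2014-01-02 02:05:00"),  # duplicate for the same day
]
conn.executemany(
    "INSERT INTO TableStats (HostID, TableName, CollectionDate) VALUES (?, ?, ?)",
    rows,
)

# Rank the rows of one day within each duplicate group, then delete rank 2 and up.
delete_one_day = """
DELETE FROM TableStats
 WHERE RecordID IN (
        SELECT RecordID
          FROM (SELECT RecordID,
                       ROW_NUMBER() OVER (PARTITION BY HostID, TableName,
                                                       date(CollectionDate)
                                          ORDER BY RecordID) AS RankID
                  FROM TableStats
                 WHERE date(CollectionDate) = ?) AS ranked
         WHERE RankID > 1)
"""

running_date = date(2014, 1, 1)
end_date = date(2014, 1, 3)
deleted_per_day = []
while running_date < end_date:
    with conn:  # one short transaction per day
        cur = conn.execute(delete_one_day, (running_date.isoformat(),))
    deleted_per_day.append(cur.rowcount)
    running_date += timedelta(days=1)

remaining = conn.execute("SELECT COUNT(*) FROM TableStats").fetchone()[0]
print(deleted_per_day, remaining)  # prints: [1, 1] 3
```

Each pass through the loop commits on its own, so a failure on one day leaves the earlier days already cleaned up.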

Collecting Windows BIOS and Host Serial Number

After improving JiMetrics yesterday so that it tries to determine whether the host is a VM, I made another improvement today: collecting the host server’s BIOS information and serial number.

I don’t know about you, but in the past, when I needed a Windows server’s BIOS details or serial number for troubleshooting, I typically rebooted the machine, pressed whatever function key was necessary to get into the BIOS, and copied things down. Wouldn’t it be nice if we could get that programmatically and store it somewhere for easy reference?

So today I made another enhancement to JiMetrics to address that need. Here is what I did:

  1. Added three columns to the Windows.Host table: SMBIOSVersion, BIOSReleaseDate, SerialNumber;
  2. Improved the PowerShell script so that it now uses Win32_BIOS to gather those values and store them in the JiMetrics database.

Together with HardwareVendor and HardwareModel, these columns make it easy to find the latest BIOS software for your server host. In addition, based on SMBIOSVersion and BIOSReleaseDate, you can determine if your BIOS is out of date and decide whether to update it. In my experience, a lot of companies have servers whose BIOS is way out of date. Armed with this information, they don’t have to be.

With this improvement, it is even easier for system admins to stay on top of things. JiMetrics is not just for SQL Server DBAs; a Windows admin will also find it extremely valuable. What are you waiting for? Go to jimetrics.com and get it 🙂

Determining if a Windows host is a VM in JiMetrics

I’ve been using my own SQL Server metrics collection package, called JiMetrics, for a couple of years. It is easy to set up. All you need are two things: a SQL Server instance and an account that has admin access to both the servers and the instances you care about.

JiMetrics doesn’t do anything that will be hard to undo: no binary install, no registry changes, no cookies or temp files to store state, and none of the junk that pollutes your environment. Its footprint is small, and it quietly collects important metrics data at an interval of your own choosing. Follow the steps here and start today. As a system administrator, you will be glad to have the metrics data for analysis and decision making.

Today I made the database a bit better: I changed the database schema by adding a new column, IsVM, to the Windows.Host table. Here is the idea:

  1. IsVM tells you whether a particular Windows host is a virtual machine;
  2. IsVM can have only one of two values, “Y” or “N”, and is not NULLable;
  3. IsVM is a persisted computed column. It is computed from the HardwareVendor column, which holds the Manufacturer property of the Win32_ComputerSystem class;
  4. Currently, if the Manufacturer is one of “microsoft”, “xen”, or “innotek GmbH”, then the IsVM value is “Y”; otherwise it is “N”.

Here is the relevant portion of the computed column definition:

	[IsVM] AS (CASE HardwareVendor 
			WHEN 'innotek GmbH' THEN 'Y'
			WHEN 'microsoft' THEN 'Y'
			WHEN 'xen' THEN 'Y'
			ELSE 'N'
		  END) PERSISTED,

Enjoy! If you have any comments on either of these questions, I’d love to hear them:
1. What other values of Manufacturer in Win32_ComputerSystem indicate a VM, other than the three I listed above?
2. Are there better ways to detect if a Windows host is a VM?