Archive for Technology

Script to generate index rebuild statements with PAGE compression

For BI data warehouse databases, the data does not change much and typically takes up a lot of space, so it makes a lot of sense to compress the indexes to save space.

I came across some BI databases whose indexes were created without compression. We are in the process of migrating those databases to a new server so I took this opportunity to completely rebuild those indexes with PAGE compression.

Two things are of interest:

  • Index rebuilds go from the smallest index to the largest. The rationale is that a rebuild needs roughly twice the actual index size while it runs. If we start with the largest index, the data file may need to grow unnecessarily. On the other hand, if we start with the smallest one, there may already be enough free space in the file to accommodate that rebuild. Once that rebuild is done, compression frees more space, leaving more room for the next rebuild. This way we can accommodate all the index rebuilds with little or no additional space;
  • Re-indexing is done on a new server with no or very few connections to it, so the script sets the MAXDOP option to hopefully make the process faster.
-- Generate ALTER INDEX ... REBUILD statements with PAGE compression,
-- listed from the smallest index to the largest (by used space).
SELECT 
    s.name AS SchemaName,
    t.name AS TableName,
    i.name AS IndexName,
    'ALTER INDEX ' + i.name + ' ON ' + s.name + '.' + t.name + ' REBUILD WITH (SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE, MAXDOP = 20);' AS AlterIndex,
    SUM(a.total_pages) * 8 AS TotalSpaceKB, 
    SUM(a.used_pages) * 8 AS UsedSpaceKB, 
    (SUM(a.total_pages) - SUM(a.used_pages)) * 8 AS UnusedSpaceKB
FROM 
    sys.tables t
INNER JOIN 
    sys.schemas s ON s.schema_id = t.schema_id
INNER JOIN 
    sys.indexes i ON t.object_id = i.object_id
INNER JOIN 
    sys.partitions p ON i.object_id = p.object_id AND i.index_id = p.index_id
INNER JOIN 
    sys.allocation_units a ON p.partition_id = a.container_id
WHERE 
    t.name NOT LIKE 'dt%'      -- skip leftover database diagram tables
    AND t.is_ms_shipped = 0    -- skip Microsoft-shipped objects
    AND i.object_id > 255      -- skip system objects
    AND i.index_id > 0         -- skip heaps
GROUP BY 
    s.name, t.name, i.name
ORDER BY 
    UsedSpaceKB
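
To actually run the rebuilds in that order, one option is to feed the same query into a cursor and execute each generated statement in turn. Below is a minimal sketch of that idea, not part of the original script; the cursor name is arbitrary and the MAXDOP value should be adjusted to your environment.

-- Sketch only: execute the generated ALTER INDEX statements,
-- smallest index first, largest last.
DECLARE @sql NVARCHAR(MAX);

DECLARE rebuild_cur CURSOR LOCAL FAST_FORWARD FOR
    SELECT 'ALTER INDEX ' + i.name + ' ON ' + s.name + '.' + t.name
           + ' REBUILD WITH (SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE, MAXDOP = 20);'
      FROM sys.tables t
     INNER JOIN sys.schemas s ON s.schema_id = t.schema_id
     INNER JOIN sys.indexes i ON t.object_id = i.object_id
     INNER JOIN sys.partitions p ON i.object_id = p.object_id AND i.index_id = p.index_id
     INNER JOIN sys.allocation_units a ON p.partition_id = a.container_id
     WHERE t.name NOT LIKE 'dt%'
       AND t.is_ms_shipped = 0
       AND i.object_id > 255
       AND i.index_id > 0
     GROUP BY s.name, t.name, i.name
     ORDER BY SUM(a.used_pages) * 8;   -- smallest used space first

OPEN rebuild_cur;
FETCH NEXT FROM rebuild_cur INTO @sql;

WHILE @@FETCH_STATUS = 0
BEGIN
    PRINT @sql;                -- log the statement about to run
    EXEC sp_executesql @sql;   -- rebuild this index with PAGE compression
    FETCH NEXT FROM rebuild_cur INTO @sql;
END

CLOSE rebuild_cur;
DEALLOCATE rebuild_cur;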


JiMetrics now gathers SQL Server startup account

During the last few days, I’ve refactored JiMetrics and added new functionality:

  • Used Pester to create more test cases for the PowerShell functions I wrote;
  • Enhanced the design and code so JiMetrics also gathers each SQL Server instance’s startup account, which can be useful; a minimal example query along those lines is sketched right after this list.
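
The startup account is exposed through the sys.dm_server_services DMV (SQL Server 2008 R2 SP1 and later). The snippet below is only a rough sketch of how such a collection query might look; it is not the exact query JiMetrics uses.

-- Sketch only, not the exact JiMetrics query: read the startup account
-- of the local SQL Server engine service.
SELECT servicename,
       service_account,      -- the startup account the service runs under
       startup_type_desc,
       status_desc
  FROM sys.dm_server_services
 WHERE servicename LIKE 'SQL Server (%';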

If you don’t know what JiMetrics is, go to this page to find out. It is a tool that uses SQL Server to gather important SQL Server metrics in your enterprise. There is no additional software to install, no registry change, and no files to copy and move around that pollute your system; it only uses SQL Server, which you already have, and it just works! It has been pretty useful to me and my co-workers. JiMetrics is open source and free to use. Check it out and let me know what you think.

I will keep improving this. My next objective is to play with SQL Server 2014 column-store and see if I can convert the collection database tables to column-store.
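
For reference, one way to make that conversion in SQL Server 2014 is to create a clustered columnstore index on the table. The statement below is only an illustration, assuming the existing clustered index on Windows.TableStats has been dropped first and that all of its column types are supported by columnstore; it is not a tested migration script.

-- Sketch only: convert Windows.TableStats into a clustered columnstore table.
-- Assumes the current clustered index/primary key has already been dropped
-- and that every column type is columnstore-compatible in SQL Server 2014.
CREATE CLUSTERED COLUMNSTORE INDEX CCI_TableStats
    ON Windows.TableStats;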


Speaking at SQLSaturday Chicago

I will deliver a session at SQLSaturday Chicago this coming Saturday, April 26th. I will be discussing “Big Data”: cloud computing, distributed computing, column-store databases, in-memory OLTP databases, Hadoop, machine learning, and data analytics. Hope to see you there.


Removing duplicate rows in small batches based on date column

Due to double scheduling, some duplicate rows were inserted into the Windows.TableStats table in JiMetrics. To confirm that the table has duplicates, here is the T-SQL script I used. The same script should also work, with minor adjustments, on other major RDBMS platforms such as MySQL and Oracle. Adjust table and column names to fit your needs.

SELECT HostID,
       InstanceID,
       DbName,
       SchemaName,
       TableName,
       CAST (CollectionDate AS DATE) AS CollectionDay,
       COUNT (*) AS DupCount
  FROM Windows.TableStats
 GROUP BY HostID,
          InstanceID,
          DbName,
          SchemaName,
          TableName,
          CAST (CollectionDate AS DATE)
HAVING COUNT (*) > 1

So the duplicates need to be removed. In Microsoft SQL Server, that can be accomplished with a Common Table Expression (CTE) using the ROW_NUMBER ranking function, which is a pretty elegant solution.

However, when using DELETE to remove a potentially large number of rows, say millions, it is always advisable to do so in small batches. Otherwise, you risk running out of tempdb, log, or even disk space, not to mention creating bigger and coarser locks than necessary.
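
As an aside, a common generic pattern for batched deletes (a sketch only, not the daily-loop script I used below) is to delete TOP (N) rows in a loop until nothing qualifies. The table name, predicate, and batch size here are placeholders.

-- Generic batched-delete sketch; dbo.SomeLargeTable, the WHERE clause,
-- and the batch size of 5000 are all placeholders.
DECLARE @batchSize INT = 5000;

WHILE 1 = 1
BEGIN
   DELETE TOP (@batchSize)
     FROM dbo.SomeLargeTable
    WHERE IsObsolete = 1;

   IF @@ROWCOUNT < @batchSize
      BREAK;   -- last, partial batch: we are done
END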

So I decided to remove the duplicates one day at a time to lessen the impact on the database instance. A @runningDate variable is incremented by one day until the desired end date, and within each iteration of the loop that day’s duplicates are removed. Tweak it to suit your needs.

I think things like this make a good job interview question for a database administrator.

DECLARE @runningDate DATE;
SET @runningDate = '20140101';

WHILE (@runningDate < '20140406')
BEGIN
   -- Rank the rows within each logical duplicate group for this day;
   -- anything with RankID > 1 is a duplicate and gets deleted.
   ;WITH dupRows
        AS (SELECT RecordID,
                   ROW_NUMBER ()
                      OVER (PARTITION BY HostID,
                                         InstanceID,
                                         DbName,
                                         SchemaName,
                                         TableName,
                                         CAST (CollectionDate AS DATE)
                            ORDER BY
                               HostID,
                               InstanceID,
                               DbName,
                               SchemaName,
                               TableName)
                      AS RankID
              FROM Windows.TableStats
             WHERE CAST (CollectionDate AS DATE) = @runningDate)
   DELETE dupRows
    WHERE RankID > 1;

   -- Move on to the next day.
   SET @runningDate = DATEADD (DAY, 1, @runningDate);
END


WordPress post update services and Nginx upstream time out

Starting around last Friday, I think, this site became unresponsive. Checking the Tengine/Nginx error logs, here is a sample of what I saw:

2014/03/xx 08:43:40 [error] 3837#0: *6149 connect() to unix:/var/run/php-fastcgi/php-fastcgi.socket failed (11: Resource temporarily unavailable) while connecting to upstream, client: xxx.xx.xxx.xxx, server: haidongji.com, request: "POST /xmlrpc.php HTTP/1.0", upstream: "fastcgi://unix:/var/run/php-fastcgi/php-fastcgi.socket:", host: "www.haidongji.com"
2014/03/xx 08:45:13 [error] 3837#0: *6151 upstream timed out (110: Connection timed out) while reading response header from upstream, client: x.xxx.xx.xxx, server: haidongji.com, request: "GET /wiki/doku.php?id=wiki:dokuwiki HTTP/1.0", upstream: "fastcgi://unix:/var/run/php-fastcgi/php-fastcgi.socket", host: "www.haidongji.com", referrer: "http://www.haidongji.com/"
2014/03/xx 08:46:00 [error] 3837#0: *6159 upstream timed out (110: Connection timed out) while reading response header from upstream, client: xx.xxx.xx.xxx, server: haidongji.com, request: "GET /category/technology/mysql/feed/ HTTP/1.1", upstream: "fastcgi://unix:/var/run/php-fastcgi/php-fastcgi.socket", host: "www.haidongji.com"

So I started researching. A number of web pages, including Stack Overflow, recommended increasing the values of some or all of the following Tengine/Nginx parameters so the timeouts wouldn’t occur:

  • fastcgi_read_timeout
  • proxy_connect_timeout
  • proxy_read_timeout
  • proxy_send_timeout

Those suggestions seemed reasonable, so I tried raising the parameter values one at a time. Their default is 60 seconds; I raised them to 600 just to test.

In my case, the problem persisted. The only difference was that the site suffered a slower death, and during the few minutes it managed to stay alive it was unacceptably slow.

At this point I’d spent hours debugging this, so I decided to give myself a break and come back to it the next day. Before falling asleep, I made a note to check some reputable forums that search engines may not necessarily index, such as Linode’s forums. Failing that, I was determined to read up on more documentation and code to get to the bottom of it.

It turned out that Linode’s forum provided the clue that finally resolved this. The problem lay in WordPress’s blog update (ping) services. The idea behind them is to notify blog aggregators like Technorati and FeedBurner (remember them?) as soon as you publish a post. One of the aggregator sites listed in my WordPress settings had stopped accepting those notifications. Not knowing this, WordPress kept trying to send the notification in vain, which caused the whole mess. After removing that URL, everything went back to normal.

On your WordPress admin page, you can find which URLs your blog pings by going to Settings -> Writing -> Update Services. Update: the same problem happened again, so I removed pingomatic.com as well. Now my WordPress does not ping or update anybody, and so far the site has been stable.

Hope this helps!

Lesson learned/reinforced:
When you have worked on a problem for a long time and are stuck, take a break, go for a walk, or find a friend to talk to and describe the issue and what you’ve done thus far. You’d be surprised at how effective this can be.

