Archive for Technology

JiMetrics now gathers SQL Server startup account

During the last few days, I’ve refactored JiMetrics and added a new function:

  • Used Pester to create more test cases for PowerShell functions I wrote;
  • Enhanced the design and code so JiMetrics also gathers SQL Server instance’s startup account, which can be useful.

If you don’t know what JiMetrics is, go to this page to find out. It’s is a tool that uses SQL Server to gather important SQL Server metrics in your enterprise. No additional software install, no registry change, no files to copy and move around that pollute your system, it only uses SQL Server, which you already have and it just works! It has been pretty useful to me and my co-workers. JiMetrics is open source and free to use. Check it out and let me know what you think.

I will keep improving this. My next objective is to play with SQL Server 2014 column-store and see if I can convert the collection database tables to column-store.


Speaking at SQLSaturday Chicago

I will deliver a session for SQLSaturday this coming Saturday, April 26th. I will be discussing “Big Data”: cloud computing, distributed computing, column-store databases, in-memory OLTP databases, Hadoop, machine learning, and data analytics. Hope to see you there.


Removing duplicate rows in small batches based on date column

Due to double scheduling, some duplicate rows were inserted into the Windows.TableStats table in JiMetrics. To confirm that the table has duplicates, here is the T-SQL script I used. Note that this same script should work in all other major RDBMS platforms like MySQL, Oracle, etc. Adjust table and column names to fit your needs.

       CAST (CollectionDate AS DATE),
       COUNT (*)
  FROM Windows.TableStats
         CAST (CollectionDate AS DATE)

So the duplicates need to be removed. In Microsoft SQL Server, that can be accomplished with a Common Table Expression (CTE) using the ROW_NUMBER ranking function, which is a pretty elegant solution.

However, when using DELETE to remove potentially large number of rows, say millions, it is always advisable to do that in small batches. Otherwise, you are risking running out of temp, log space, or even disk space, not to mention that you are potentially creating bigger and coarser locks than necessary.

So I decided to remove the duplicates on a daily basis to lessen the impact on the database instance. A runningDate variable is used and incremented by one until the desired end date. Within each iteration of the loop, that day’s duplicates are removed. Tweak it to suit your needs.

I think things like this make a good job interview question for a database administrator.

DECLARE @runningDate DATE
SET @runningDate = '20140101'

WHILE (@runningDate < '20140406')
   WITH dupRows
        AS (SELECT RecordID,
                   ROW_NUMBER ()
                      OVER (PARTITION BY HostID,
                                         cast (CollectionDate AS DATE)
                            ORDER BY
                      AS RankID
              FROM Windows.TableStats
             WHERE CAST (CollectionDate AS DATE) = @runningDate)
   DELETE dupRows
    WHERE RankID > 1

   SET @runningDate = DATEADD (DAY, 1, @runningdate)


WordPress post update services and Nginx upstream time out

Starting around last Friday, I think, this site became irresponsive. Checking Tengine/Nginx error logs, here is a sample of what I saw:

2014/03/xx 08:43:40 [error] 3837#0: *6149 connect() to unix:/var/run/php-fastcgi/php-fastcgi.socket failed (11: Resource temporarily unavailable) while connecting to upstream, client:, server:, request: "POST /xmlrpc.php HTTP/1.0", upstream: "fastcgi://unix:/var/run/php-fastcgi/php-fastcgi.socket:", host: ""
2014/03/xx 08:45:13 [error] 3837#0: *6151 upstream timed out (110: Connection timed out) while reading response header from upstream, client:, server:, request: "GET /wiki/doku.php?id=wiki:dokuwiki HTTP/1.0", upstream: "fastcgi://unix:/var/run/php-fastcgi/php-fastcgi.socket", host: "", referrer: ""
2014/03/xx 08:46:00 [error] 3837#0: *6159 upstream timed out (110: Connection timed out) while reading response header from upstream, client:, server:, request: "GET /category/technology/mysql/feed/ HTTP/1.1", upstream: "fastcgi://unix:/var/run/php-fastcgi/php-fastcgi.socket", host: ""

So I started researching. A number of web pages, including StackOverflow, recommended increasing values for some or all of the following Tengine/Nginx parameters, so time out wouldn’t occur:

  • fastcgi_read_timeout
  • proxy_connect_timeout
  • proxy_read_timeout
  • proxy_send_timeout

Those suggestion appeared to be reasonable. So I tried raising those parameter values one at a time. Their default value is 60 seconds. I raised them to 600 just to test.

In my case, the problem persisted. The only difference was that the site suffered a slower death and the few minutes it was able to stay alive, the site was terribly slow and unacceptable.

At this point I’d spent hours debugging this. I decided to give myself a break and come back to it the next day. Before falling asleep, I wrote down a note of checking some reputable forums the search engines may not necessarily able to get, such as Linode’s forums. Failing that, I am determined to read up on more documentation and code to get to the bottom of this.

It turned out that Linode’s forum provided the necessary clue to finally resolve this. The problem lies in the WordPress blog publishing update service. The idea behind it was to notify blog aggregators like Technorati/Feedburner (remember them?) as soon as you publish a blog. One of the aggregator sites listed in my WordPress setting decided to not accepting those notifications anymore. Not knowing this, WordPress still tries to send notification out in vain, which caused the whole mess. After removing that URL, everything went back to normal.

On your WordPress admin page, you can find which URL your blog pings to by going to: Settings -> Writing -> Update Services. Update: The same problem happened again. So I removed also. Now my WordPress does not ping or update anybody. So far the site has been stable.

Hope this helps!

Lesson learned/reinforced:
After working on a problem for a long time and you are stuck, take a break, go for a walk, find a friend to talk to and describe the issue and what you’ve done thus far. You’d be surprised at how effective this can be.

Comments (2)

Collecting Windows BIOS and Host Serial Number

After imporving JiMetrics yesterday so it tries to determine if the host is a VM or not, I made another improvement today: collecting host server BIOS related information and serial number.

I don’t know about you, in the past when I needed to gather a Windows server’s BIOS or serial number for troubleshooting, I typically rebooted the machine, pressed whatever function key was necessary to get into the BIOS, and coped things down. Wouldn’t it be nice if we can get it programmatically and store that somewhere for easy reference?

So today I made another enhancement to JiMetrics to address that need. Here is what I did:

  1. Added additional columns to the Windows.Host table: SMBIOSVersion, BIOSReleaseDate, SerialNumber;
  2. I made improvement to the PowerShell script, so it now uses Win32_BIOS to gather and store that in the JiMetrics database.

Together with HardwareVendor and HardwareModel, it will be easy to find the latest BIOS software for your server host. In addition, based on SMBIOSVersion and BIOSRealeaseDate, you can determine if your BIOS is out of date and decide if you want to update the host BIOS accordingly. Based on my past experience, a lot of companies have servers whose BIOS is way out of date. Armed with this information, they won’t be.

With this improvement, it is even more easier for system admins to be on top of things. JiMetrics is not just for SQL Server DBAs, a Windows admin will also find it extremely valuable. What are you waiting for? Go to and get it :)


« Previous entries