Archive for July, 2011

Generating dimension data for dates

Most analytical and BI databases have one or more date dimension tables, and one frequently needs to generate and populate that data. Below is a Python script that generates it. To load the output into your own database server (MySQL, SQL Server, Oracle, etc.), use the database driver/module appropriate to that server.

Notes:

1. It takes 2 parameters, start date and end date, in YYYYMMDD format, inclusive. The script checks that both inputs are 8-digit numbers, that they form valid dates, and that the end date is not earlier than the start date. Let me know if you have comments/suggestions;

2. The script produces a Python dictionary (associative array) for each date in the range and prints its contents;

3. The output includes dayNumber: a day’s position in a year. For example, 2011-02-01 is the 32nd day of 2011, therefore its dayNumber is 32;

4. The output includes weekNumber: a week’s position in a year. The week number is based on the ISO standard. From the Python documentation: the ISO year consists of 52 or 53 full weeks, where a week starts on a Monday and ends on a Sunday. The first week of an ISO year is the first (Gregorian) calendar week of a year containing a Thursday. This is called week number 1, and the ISO year of that Thursday is the same as its Gregorian year.

So, 2011-01-01 has the weekNumber 52, because it falls on a Saturday and belongs to the last week of 2010.

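5. The output includes weekday information as well. 4 different numbering conventions are included, each under its own key:
Sunday 0, Monday 1, and so on (dayOfWeekSunday0Monday1)
Sunday 1, Monday 2, and so on (dayOfWeekSunday1Monday2)
Monday 0, Tuesday 1, and so on (dayOfWeekSunday6Monday0)
Monday 1, Tuesday 2, and so on (dayOfWeekSunday7Monday1)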

6. The script requires the argparse module, which ships with Python 2.7. Python versions prior to 2.7 do not include it by default, so you need to install it separately.
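Notes 3 to 5 can be checked in isolation with a short sketch like the one below. It is separate from the script itself, and the function names (day_number, week_number, weekday_variants) and dictionary keys are made up for illustration. It uses only the standard datetime module.

```python
from datetime import date


def day_number(d):
    """Position of the day within its year (1 January is 1)."""
    return d.timetuple().tm_yday


def week_number(d):
    """ISO week number, taken from date.isocalendar()."""
    return d.isocalendar()[1]


def weekday_variants(d):
    """The four weekday numbering conventions listed in note 5."""
    iso = d.isoweekday()  # Monday 1 ... Sunday 7
    return {
        'sunday0_monday1': iso % 7,
        'sunday1_monday2': iso % 7 + 1,
        'monday0_tuesday1': d.weekday(),
        'monday1_tuesday2': iso,
    }


print(day_number(date(2011, 2, 1)))         # 32
print(week_number(date(2011, 1, 1)))        # 52 (last ISO week of 2010)
print(weekday_variants(date(2011, 1, 2)))   # 2011-01-02 is a Sunday
```

Note that isocalendar() also returns the ISO year as its first element, which is 2010 for 2011-01-01. A dimension table that groups by week may want to store that value alongside the week number.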

import argparse
import sys
from datetime import date, timedelta

parser = argparse.ArgumentParser(description="Generating date dimension data")
parser.add_argument('-s', '--startDate', help='Start date in YYYYMMDD format', required=True, dest='startDate')
parser.add_argument('-e', '--endDate', help='End date in YYYYMMDD format', required=True, dest='endDate')
argList = parser.parse_args()

def isYYYYMMDD(value):
    # Exactly 8 characters, all digits
    return value.isdigit() and len(value) == 8

# Comparing the raw strings is safe here because YYYYMMDD strings sort chronologically
if not isYYYYMMDD(argList.startDate) or not isYYYYMMDD(argList.endDate) or argList.startDate > argList.endDate:
    # print() with a single argument behaves the same under Python 2.7 and Python 3
    print("Input(s) must be numeric in YYYYMMDD format and end date must not be earlier than start date")
    sys.exit(1)

# date() raises ValueError for impossible dates such as 20110231
try:
    startDate = date(int(argList.startDate[0:4]), int(argList.startDate[4:6]), int(argList.startDate[6:8]))
    endDate = date(int(argList.endDate[0:4]), int(argList.endDate[4:6]), int(argList.endDate[6:8]))
except ValueError:
    print("Input(s) must be valid date value in YYYYMMDD format")
    sys.exit(1)

# Build and print one dictionary per day, start and end dates inclusive
while startDate <= endDate:
    dateInfo = {'dateYYYYMMDD': startDate.strftime('%Y%m%d'), 'calDate': startDate.strftime('%Y-%m-%d'), 'calDay': startDate.day, 'calMonth': startDate.month, 'calYear': startDate.year}
    dateInfo['dayOfWeekSunday0Monday1'] = startDate.isoweekday() % 7
    dateInfo['dayOfWeekSunday1Monday2'] = startDate.isoweekday() % 7 + 1
    dateInfo['dayOfWeekSunday6Monday0'] = startDate.weekday()
    dateInfo['dayOfWeekSunday7Monday1'] = startDate.isoweekday()
    # Days elapsed since 31 December of the previous year
    dateInfo['dayNumber'] = startDate.toordinal() - date(startDate.year - 1, 12, 31).toordinal()
    # ISO week number (second element of the isocalendar() tuple)
    dateInfo['weekNumber'] = startDate.isocalendar()[1]
    print(dateInfo)
    startDate = startDate + timedelta(1)
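
The script only prints each dictionary; loading the rows into a database is left to the driver for your server. As a rough sketch of that step, the example below uses Python's built-in sqlite3 module as a stand-in for a real server. The table name dim_date, its columns, and the helper functions date_rows and load_dim_date are all hypothetical and would need to match your own schema.

```python
import sqlite3
from datetime import date, timedelta


def date_rows(start, end):
    """Yield one tuple per day from start to end, inclusive."""
    current = start
    while current <= end:
        yield (
            current.strftime('%Y%m%d'),      # date key
            current.isoformat(),             # calendar date, YYYY-MM-DD
            current.timetuple().tm_yday,     # dayNumber
            current.isocalendar()[1],        # weekNumber (ISO)
            current.isoweekday(),            # Monday 1 ... Sunday 7
        )
        current += timedelta(days=1)


def load_dim_date(conn, start, end):
    """Create the hypothetical dim_date table if needed and fill it."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dim_date ("
        "date_key TEXT PRIMARY KEY, cal_date TEXT, day_number INTEGER, "
        "week_number INTEGER, iso_weekday INTEGER)"
    )
    conn.executemany(
        "INSERT INTO dim_date VALUES (?, ?, ?, ?, ?)",
        date_rows(start, end),
    )
    conn.commit()


# In-memory database, used here only so the sketch is self-contained
conn = sqlite3.connect(':memory:')
load_dim_date(conn, date(2011, 1, 1), date(2011, 12, 31))
print(conn.execute("SELECT COUNT(*) FROM dim_date").fetchone()[0])  # 365
```

The ? placeholders are specific to sqlite3. Other drivers may use a different parameter style, such as %s, so check the documentation of the module you use.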


Sysinternals and PAL

Sysinternals and PAL (Performance Analysis of Logs) are two fantastic tools for general server information gathering and troubleshooting on Windows.

The Sysinternals suite is a set of tools that can be downloaded for free from Microsoft. One particularly attractive feature is that the tools run directly after download, with no installation and none of the footprint a typical installation leaves on the host machine (new directories under C:\Program Files\, registry entries, data files, and so on). I have found them very valuable and handy.

In particular, PsInfo provides a good summary of the server. For example, psinfo -s -h -d reports basic system information, installed software, installed Windows hotfixes, and disk volume information.

PAL: install PAL on your test/analysis/general-purpose machine. Install the mschart control first, as it is a prerequisite of PAL. Here is how I used it:

1. Produce Perfmon data gathering template files using PAL. I exported 3 template files: overview, quick overview, and SQL Server 2005/2008;

Perfmon is the general-purpose data instrumentation tool on Windows. Through Perfmon you can gather system-wide counters for things like CPU, memory, network, and disk IO. In addition, many applications, such as SQL Server and Exchange, expose application-level instrumentation data that you can collect via Perfmon as well.

It is best to have a few data collection templates handy, hence this step.

2. On the Windows server that I am interested in monitoring, import the Perfmon counter template file produced above by opening a Command Prompt as Administrator and executing:

logman import -n templateNameIdefine -xml pathAndName2TemplateXmlFile

3. Open Perfmon, find the data collector set you imported, and start collecting;

4. After collection is done, copy the log file to another machine and use PAL to analyze it. PAL generates a very nice and intuitive report. Please don’t run PAL on the system you are diagnosing. Be patient, as it takes a while for PAL to churn through the data (a log file of about 30 MB took 2 hours on a Rackspace cloud server with 2 CPUs and 1 GB of RAM).
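
If you repeat steps 2 and 3 on several servers, they can be scripted. The sketch below only builds and prints the commands unless dry_run is turned off. The import command is the one from step 2; logman start and logman stop are used here as a command-line alternative to starting and stopping the collector in the Perfmon GUI, which is my substitution and not part of the steps above. The collector name and template path are hypothetical.

```python
import subprocess


def logman_import_cmd(collector_name, template_xml):
    """Command from step 2: import a template XML file under a chosen name."""
    return ['logman', 'import', '-n', collector_name, '-xml', template_xml]


def logman_start_cmd(collector_name):
    """Start the imported collector (alternative to the Perfmon GUI)."""
    return ['logman', 'start', collector_name]


def logman_stop_cmd(collector_name):
    """Stop the collector once enough data has been gathered."""
    return ['logman', 'stop', collector_name]


def run(cmd, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(' '.join(cmd))
    if not dry_run:
        subprocess.check_call(cmd)


# Hypothetical names; replace them with your own
name = 'PalQuickOverview'
template = r'C:\temp\QuickSystemOverview.xml'
for cmd in (logman_import_cmd(name, template), logman_start_cmd(name)):
    run(cmd)
```

Run it with dry_run=False from a Command Prompt opened as Administrator on the server being monitored, and call logman_stop_cmd when the collection window is over.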
