Generating dimension data for dates

Most analytical and BI databases have date dimension table(s). One frequently needs to generate and populate such data. I present a solution below for such data generation, written in Python. Please use different database drivers/modules to connect to your specific database server (MySQL, SQL Server, Oracle, etc.) for data population.

Notes:

1. It takes 2 parameters, start date and end date, in YYYYMMDD format, inclusive. Extensive error checking is built in, but let me know if you have comments/suggestions;

2. The script produce a Python dictionary (associated array) and print out its content;

3. The output includes dayNumber: a day’s position in a year. For example, 2011-02-01 is the 32ed day in 2011, therefore its dayNumber is 32;

4. The output includes weekNumber: a week’s position in a year. The week number in year is based on ISO standard. From documentation: the ISO year consists of 52 or 53 full weeks, where a week starts on a Monday and ends on a Sunday. The first week of an ISO year is the first (Gregorian) calendar week of a year containing a Thursday. This is called week number 1, and the ISO year of that Thursday is the same as its Gregorian year.

So, 2011-01-01 has the weekNumber 52, because it falls on a Saturday and belongs to the last week of 2010.

5. The output includes weekday information as well. 4 different variations are included:
Sunday 0, Monday 1, and so on
Sunday 1, Monday 2, and so on
Monday 0, Tuesday 1, and so on
Monday 1, Tuesday 2, and so on

6. The script requires the argparse module. It comes with Python 2.7. Python version prior to 2.7 does not have it by default, therefore you need to install it.

import argparse, sys, time
from datetime import date, timedelta

parser = argparse.ArgumentParser(description="Generating date dimension data")
parser.add_argument('-s', '--startDate', help='Start date in YYYYMMDD format', required=True, dest='startDate')
parser.add_argument('-e', '--endDate', help='end date in YYYYMMDD format', required=True, dest='endDate')

argList = parser.parse_args()

if (((not argList.startDate.isdigit()) or (not (len(argList.startDate) == 8))) or ((not argList.endDate.isdigit()) or (not (len(argList.endDate) == 8))) or (argList.startDate > argList.endDate)):
	print "Input(s) must be numeric in YYYYMMDD format and end date must not be earlier than start date"
	sys.exit (1)

try:
	startDate = date(int(argList.startDate[0:4]), int(argList.startDate[4:6]), int(argList.startDate[6:8]))
	endDate = date(int(argList.endDate[0:4]), int(argList.endDate[4:6]), int(argList.endDate[6:8]))
except ValueError:
	print "Input(s) must be valid date value in YYYYMMDD format"
	sys.exit (1)

start = time.time()

while startDate <= endDate:
	dateInfo = {'dateYYYYMMDD': startDate.strftime('%Y%m%d'), 'calDate': startDate.strftime('%Y-%m-%d'), 'calDay': startDate.day, 'calMonth': startDate.month, 'calYear': startDate.year}
	dateInfo['dayOfWeekSunday0Monday1'] = startDate.isoweekday() % 7
	dateInfo['dayOfWeekSunday1Monday2'] = startDate.isoweekday() % 7 + 1
	dateInfo['dayOfWeekSunday6Monday0'] = startDate.weekday()
	dateInfo['dayOfWeekSunday7Monday1'] = startDate.isoweekday()
	dateInfo['dayNumber'] = startDate.toordinal() - date(startDate.year - 1, 12, 31).toordinal()
	dateInfo['weekNumber'] = startDate.isocalendar()[1]
	print dateInfo
	startDate = startDate + timedelta(1)

1 Comment »

  1. donald Said,

    August 4, 2011 @

    This is a really helpful guide—probably the best one I’ve found–I gave it to all my interns learning Python.

RSS feed for comments on this post · TrackBack URI

Leave a Comment