Archive for Linux

Generating dimension data for dates

Most analytical and BI databases have date dimension table(s). One frequently needs to generate and populate such data. I present a solution below for such data generation, written in Python. Please use different database drivers/modules to connect to your specific database server (MySQL, SQL Server, Oracle, etc.) for data population.

Notes:

1. It takes 2 parameters, start date and end date, in YYYYMMDD format, inclusive. Extensive error checking is built in, but let me know if you have comments/suggestions;

2. The script produce a Python dictionary (associated array) and print out its content;

3. The output includes dayNumber: a day’s position in a year. For example, 2011-02-01 is the 32ed day in 2011, therefore its dayNumber is 32;

4. The output includes weekNumber: a week’s position in a year. The week number in year is based on ISO standard. From documentation: the ISO year consists of 52 or 53 full weeks, where a week starts on a Monday and ends on a Sunday. The first week of an ISO year is the first (Gregorian) calendar week of a year containing a Thursday. This is called week number 1, and the ISO year of that Thursday is the same as its Gregorian year.

So, 2011-01-01 has the weekNumber 52, because it falls on a Saturday and belongs to the last week of 2010.

5. The output includes weekday information as well. 4 different variations are included:
Sunday 0, Monday 1, and so on
Sunday 1, Monday 2, and so on
Monday 0, Tuesday 1, and so on
Monday 1, Tuesday 2, and so on

6. The script requires the argparse module. It comes with Python 2.7. Python version prior to 2.7 does not have it by default, therefore you need to install it.

import argparse, sys, time
from datetime import date, timedelta
parser = argparse.ArgumentParser(description="Generating date dimension data")
parser.add_argument('-s', '--startDate', help='Start date in YYYYMMDD format', required=True, dest='startDate')
parser.add_argument('-e', '--endDate', help='end date in YYYYMMDD format', required=True, dest='endDate')
argList = parser.parse_args()
if (((not argList.startDate.isdigit()) or (not (len(argList.startDate) == 8))) or ((not argList.endDate.isdigit()) or (not (len(argList.endDate) == 8))) or (argList.startDate > argList.endDate)):
	print "Input(s) must be numeric in YYYYMMDD format and end date must not be earlier than start date"
	sys.exit (1)
try:
	startDate = date(int(argList.startDate[0:4]), int(argList.startDate[4:6]), int(argList.startDate[6:8]))
	endDate = date(int(argList.endDate[0:4]), int(argList.endDate[4:6]), int(argList.endDate[6:8]))
except ValueError:
	print "Input(s) must be valid date value in YYYYMMDD format"
	sys.exit (1)
start = time.time()
while startDate <= endDate:
	dateInfo = {'dateYYYYMMDD': startDate.strftime('%Y%m%d'), 'calDate': startDate.strftime('%Y-%m-%d'), 'calDay': startDate.day, 'calMonth': startDate.month, 'calYear': startDate.year}
	dateInfo['dayOfWeekSunday0Monday1'] = startDate.isoweekday() % 7
	dateInfo['dayOfWeekSunday1Monday2'] = startDate.isoweekday() % 7 + 1
	dateInfo['dayOfWeekSunday6Monday0'] = startDate.weekday()
	dateInfo['dayOfWeekSunday7Monday1'] = startDate.isoweekday()
	dateInfo['dayNumber'] = startDate.toordinal() - date(startDate.year - 1, 12, 31).toordinal()
	dateInfo['weekNumber'] = startDate.isocalendar()[1]
	print dateInfo
	startDate = startDate + timedelta(1)

Comments (1)

A comparison of HandlerSocket and mysql client libraries with Python

I’ve done some benchmark testing of 2 Python modules for MySQL data retrieval: MySQLdb and pyhs. MySQLdb uses MySQL’s client libraries, whereas pyhs uses HandlerSocket that bypasses MySQL’s client layer and interfaces Innodb storage engine’s files directly. In my testing, HandlerSocket results in 82% improvement over mysql client libraries based on number of rows retrieved. The tests were conducted under different conditions: right after a start when cache is cold, a warmed up cache after running SELECT * FROM customer, and alternating the execution order of those 2 Python files. The results are fairly consistent in that they all fall in the same range. Below is a sample output.

root@ubuntu:~# python hanSolo.py
Using HandlerSocket, below is a report of how many customer's name and address can be retrieved based on customer key:
Seconds elapsed:  61.0000810623
Rows retrieved:  509863
root@ubuntu:~# python mclient.py
Using mysql client libraries, below is a report of how many customer's name and address can be retrieved based on customer key:
Seconds elapsed:  61.0001530647
Rows retrieved:  280120

Here is my setup:

Hardware and software:
1. Rackspace Cloud server Ubuntu 10.04 Lucid Lynx, 1 gig memory, 40 gig hard disk space, 64-bit
Linux ubuntu 2.6.35.4-rscloud #8 SMP Mon Sep 20 15:54:33 UTC 2010 x86_64 GNU/Linux

2. Following instruction here to get Percona’s APT repository;

3. apt-get install percona-server-client-5.5

4. apt-get install percona-server-server-5.5

Enable HandlerSocket plugin. HandlerSocket is bundled with Percona Server 5.5, so you don’t have to download source files, config, make, build yourself:

1. mysql> install plugin handlersocket soname 'handlersocket.so'

2 cp /usr/share/mysql/my-large.cnf /etc/mysql/my.cnf

3. vim /etc/mysql/my.cnf with the following under mysqld section

loose_handlersocket_port = 9998
loose_handlersocket_port_wr = 9999
loose_handlersocket_threads = 16
loose_handlersocket_threads_wr = 1
open_files_limit = 65535

4. service mysql restart

Acquire Python’s MySQLdb and pyhs modules:

1. apt-get install libmysqlclient-dev
Necessary for building Python’s MySQLdb

2. apt-get install python-dev
Necessary Python header files, gcc, make, etc., for building Python modules

2. wget the appropriate egg from this page. Get the one for your version of Python.

3. sh eggFileDownloadedFromTheStepAbove

4. easy_install MySQL-python
MySQLdb module, which uses mysql client for MySQL access

5. easy_install python-handler-socket

Prepare testing data

1. Follow instructions here to get dbgen compiled;

2. While at the proper directory, run
./dbgen -v -T c
It will generate a customer file that has 150000 rows

3. Create customer table in test. Here is the DDL:

CREATE TABLE customer ( C_CUSTKEY     INTEGER NOT NULL,
C_NAME        VARCHAR(25) NOT NULL,
C_ADDRESS     VARCHAR(40) NOT NULL,
C_NATIONKEY   INTEGER NOT NULL,
C_PHONE       CHAR(15) NOT NULL,
C_ACCTBAL     DECIMAL(15,2)   NOT NULL,
C_MKTSEGMENT  CHAR(10) NOT NULL,
C_COMMENT     VARCHAR(117) NOT NULL,
primary key (C_CUSTKEY));

4. load data local infile ‘/root/dbgen/customer.tbl’ into table customer fields terminated by ‘|’ lines terminated by ‘\n’;
Adjust file location as necessary.

Finally, here is the content of my throwaway Python test scripts. One highlight of mclient.py is it demonstrate how to return results back in as dict with MySQLdb.

hanSolo.py

import time
from pyhs import Manager
hs = Manager()
start = time.time()
i = 1
j= 0
while i < 150000:
	data = hs.get('test', 'customer', ['C_CUSTKEY', 'C_NAME', 'C_ADDRESS'], '%s' % i)
	i=i+1
	if i == 150000:
		i = 1
	end = time.time()
	j = j + 1
	if int(end - start) > 60:
		break
print "Using HandlerSocket, below is a report of how many customer's name and address can be retrieved based on customer key:"
print "Seconds elapsed: ", str(end - start)
print "Rows retrieved: ", str(j)

mclient.py

import sys, MySQLdb, time
my_host = "localhost"
my_user = "root"
my_pass = ""
my_db = "test"
try:
    db = MySQLdb.connect(host=my_host, user=my_user, passwd=my_pass, db=my_db)
except MySQLdb.Error, e:
     print "Error %d: %s" % (e.args[0], e.args[1])
     sys.exit (1)
cursor = db.cursor (MySQLdb.cursors.DictCursor)
i=1
j=0
start = time.time()
while i < 150000:
	sql = "select c_custkey, c_name, c_address from customer where c_custkey=%s" % i;
	cursor.execute(sql)
	results = cursor.fetchall()
	i=i+1
	if i==150000:
		i=1
	end = time.time()
	j=j+1
	if int(end - start) > 60:
		break
print "Using mysql client libraries, below is a report of how many customer's name and address can be retrieved based on customer key:"
print "Seconds elapsed: ", str(end - start)
print "Rows retrieved: ", str(j)
db.close()

Comments

Setting up replication with XtraBackup

I attended Vadim Tkachenko’s talk on XtraBackup during MySQL conference in Santa Clara last week. Backups are obviously very important, but the use case I had in mind is this:
Replicating a database that has Innodb tables in it, while keeping both master and slave on line if possible.

Tangent: by the way, I love the native backup utility that was once promised in MySQL 6.0, similar to SQL Server’s way of backup. It was like running “BACKUP myDb to DISK = ‘/backupDirectory/myDb.bak’” under mysql client, but I digress…

I have used mysqldump to accomplish this in the past, but I wondered how XtraBackup would fare in this task, especially after hearing Vadim’s talk and reading news on Percona’s development effort. To cut to the chase, this is my conclusion. Reproducing steps are listed immediately afterwards.
1. innobackupex provides a consisten database backup, spitting out log file and log positions in stdout, which is nice and useful for slave initiation;
2. It works with both MyISAM and innodb tables;
3. If MyISAM tables are all you have, just run innobackupex –prepare /directoryWhereBackupIs, and then move the database directory from under /directoryWhereBackupIs to under your slave’s datadir, then make the necessary group and owner change to said directory and its content files, and you are ready to run the “change master” command and start slave;
4. If the database has innodb tables, then in addition to step 3, you will also need to stop mysql on slave, move the ibdata1 file to datadir, then restart mysql, and run “change master…” and “start slave” commands. It does not matter if you are using innodb_file_per_table or not.

It will be nice if I can keep the slave up and running during this step when the database has innodb tables in it. Did I do anything wrong? Is there a better way? What if the slave has a database that has innodb tables and thus uses ibdata1 to begin with? What do you do then? Should I play with Tungsten’s replication? What are the compelling reasons to use Tungsten’s replication?

In any case, from my limited testing, I think I will use innobackupex for future replication creation tasks, if I can afford a mysqld restart. Overall, it feels a bit easier than mysqldump approach that I’ve been using in the past.

Here are the steps needed to reproduce:

1. Fire up 2 Rackspace CentOS 5.5 servers. Rackspace cloud servers beat Amazon EC2 servers hands down, in my view, for developing/sandboxing purposes;
2. Install the required mysql client, server, and XtraBackup on both servers;
3. Make /etc/my.cnf by cloning the sample cnf files under /usr/share/my-small.cnf. 3 minimum changes were necessary: log-bin=mysql-bin, server-id=a unique number, datadir=/var/lib/mysql. The first 2 are necessary for replication, the last is needed for innobackupex

Well, while you are at it, on slave, add in read-only and skip-slave-start if appropriate. That’s best practice for read only slave.

4. Add master server’s public key to authorized_keys on slave, to facilitate easy ssh connection.
5. On master, run this command:

innobackupex --databases=test --stream=tar /tmp/ --slave-info | ssh root@slave "tar xfi - -C /root"
When it finishes, you should see something like this:
110419 18:54:21  innobackupex: completed OK!
tar: Read 6656 bytes from -

Take note of 3 lines immediately above it, where it states the binlog file and log position, like this:

innobackupex: MySQL binlog position: filename 'mysql-bin.000002', position 2515

6. On slave, run this command:

innobackupex --apply-log /locationWhereBackupIs

then, assuming the database name is test, run the 2 commands below to change the group and owner to mysql:

chgrp -R mysql test
chown -R mysql test

move the directory under mysqld’s datadir:

mv test/ /mysql/datadir

If test database has innodb tables in it, stop mysql on slave, then copy ibdata1 to datadir, restart mysql.

7. On master, open up port 3306 if it is not already open, then create the replication account:

grant replication slave, replication client on *.* to repl@'50.56.121.%' identified by 'p@ssw0rd';

8. On slave, run:

change master to master_host='50.56.121.96', master_user='repl', master_password='p@ssw0rd', master_log_file='see output from innobackupex backup command on master', master_log_pos=numFrominnobackupexOutputOnMaster;
start slave;
show slave status\G

Comments

Install MySQLdb module for Python

Update:

Commenter MarkR made a great point: if possible, use some packaging tools, to try to maintain proper dependencies, to the extent that is possible. Install from the source should be Plan B. So, try yum install MySQL-python first.

This is mostly for my own future reference. It’ll be icing on the cake if it helps you!

This is geared for CentOS or Red Hat. Use apt-get or other packaging tools for different flavours of Linux.

1. Get Python module setuptools called easy_install. I love easy_install, by the way, sort of like CPAN for Perl modules;
2. To install MySQLdb package, you would think easy_install MySQLdb would do. But that is not the case. I hope the developer would fix that. Instead, you need:

easy_install MySQL-python

3. If you have build errors, you may need:
yum install python-devel or yum install gcc or both.

Update: one more tip. If you encounter:

gcc: error trying to exec ‘cc1plus’: execvp: No such file or directory, you probably need:

yum install gcc-c++

Comments (2)

SELinux and “failed to map segment from shared object” error

I am reading and following examples in MySQL 5.1 Plugin Development. After compiling and moving a .so file (think DLL or Assembly file in Windows) into MySQL plugin directory, I got this message when I tried to create a UDF (User Defined Function):

mysql> create function udf_staticexample returns integer soname 'udf_staticexample.so';
ERROR 1126 (HY000): Can't open shared library 'udf_staticexample.so' (errno: 0 /usr/lib/mysql/plugin/udf_staticexample.so: failed to map segment from shared object: Permission denied)

This is caused by the fact that the .so shared object file is not in the right SELinux security context:

[root@asusfedora plugin]# ll -Z
lrwxrwxrwx. root root system_u:object_r:lib_t:s0       ha_archive.so -> ha_archive.so.0.0.0
lrwxrwxrwx. root root system_u:object_r:lib_t:s0       ha_archive.so.0 -> ha_archive.so.0.0.0
-rwxr-xr-x. root root system_u:object_r:lib_t:s0       ha_archive.so.0.0.0
lrwxrwxrwx. root root system_u:object_r:lib_t:s0       ha_blackhole.so -> ha_blackhole.so.0.0.0
lrwxrwxrwx. root root system_u:object_r:lib_t:s0       ha_blackhole.so.0 -> ha_blackhole.so.0.0.0
-rwxr-xr-x. root root system_u:object_r:lib_t:s0       ha_blackhole.so.0.0.0
lrwxrwxrwx. root root system_u:object_r:lib_t:s0       ha_example.so -> ha_example.so.0.0.0
lrwxrwxrwx. root root system_u:object_r:lib_t:s0       ha_example.so.0 -> ha_example.so.0.0.0
-rwxr-xr-x. root root system_u:object_r:lib_t:s0       ha_example.so.0.0.0
lrwxrwxrwx. root root system_u:object_r:lib_t:s0       ha_federated.so -> ha_federated.so.0.0.0
lrwxrwxrwx. root root system_u:object_r:lib_t:s0       ha_federated.so.0 -> ha_federated.so.0.0.0
-rwxr-xr-x. root root system_u:object_r:lib_t:s0       ha_federated.so.0.0.0
-rwxr-xr-x. root root unconfined_u:object_r:user_home_t:s0 udf_staticexample.so

To fix this, use chcon –reference to change the security context of the file in question to a referenced file, like this:

[root@asusfedora plugin]# chcon --reference=/usr/lib/mysql/plugin/ha_archive.so /usr/lib/mysql/plugin/udf_staticexample.so
[root@asusfedora plugin]# ll -Z
lrwxrwxrwx. root root system_u:object_r:lib_t:s0       ha_archive.so -> ha_archive.so.0.0.0
lrwxrwxrwx. root root system_u:object_r:lib_t:s0       ha_archive.so.0 -> ha_archive.so.0.0.0
-rwxr-xr-x. root root system_u:object_r:lib_t:s0       ha_archive.so.0.0.0
lrwxrwxrwx. root root system_u:object_r:lib_t:s0       ha_blackhole.so -> ha_blackhole.so.0.0.0
lrwxrwxrwx. root root system_u:object_r:lib_t:s0       ha_blackhole.so.0 -> ha_blackhole.so.0.0.0
-rwxr-xr-x. root root system_u:object_r:lib_t:s0       ha_blackhole.so.0.0.0
lrwxrwxrwx. root root system_u:object_r:lib_t:s0       ha_example.so -> ha_example.so.0.0.0
lrwxrwxrwx. root root system_u:object_r:lib_t:s0       ha_example.so.0 -> ha_example.so.0.0.0
-rwxr-xr-x. root root system_u:object_r:lib_t:s0       ha_example.so.0.0.0
lrwxrwxrwx. root root system_u:object_r:lib_t:s0       ha_federated.so -> ha_federated.so.0.0.0
lrwxrwxrwx. root root system_u:object_r:lib_t:s0       ha_federated.so.0 -> ha_federated.so.0.0.0
-rwxr-xr-x. root root system_u:object_r:lib_t:s0       ha_federated.so.0.0.0
-rwxr-xr-x. root root system_u:object_r:lib_t:s0       udf_staticexample.so

All is well afterwards.

mysql> create function udf_staticexample returns integer soname 'udf_staticexample.so';
Query OK, 0 rows affected (0.00 sec)
mysql> select udf_staticexample();
+---------------------+
| udf_staticexample() |
+---------------------+
|              318749 |
+---------------------+
1 row in set (0.00 sec)

Comments

Data generation with TPC-H’s dbgen for load testing

2011-06-26 update:

I am not sure if there are any changes in the latest make and gcc packages. Anyway, I noticed when run make, I encountered the message below:

make: g: Command not found
make: [qgen] Error 127 (ignored)

To fix this, find where gcc is at, then created a symbolic link g that points to gcc. All is well afterwards:
[root@ip-10-245-209-196 dbgen]# which gcc
/usr/bin/gcc
[root@ip-10-245-209-196 dbgen]# cd /usr/bin/
[root@ip-10-245-209-196 bin]# ln -s gcc g

End update
Recently I found myself doing some data loading benchmark testing with table partition. Data loading and storing for BI/DW/DSS stuff almost always involves data partitioning. SQL Server partition has a nice feature called partition switch, where you can swap data in and out of a partitioned table. In MySQL, it is called exchange, but it hasn’t been released. Perhaps other RDBMS have similar things as well.

To facilitate testing, I need some data that lends itself easily for partition, which led me to TPC-H’s dbgen tool.

dbgen is pretty easy to build and compile on Linux. It also has a Windows version. I didn’t bother trying because getting it built on Linux was such a quick and easy process. However, the documentation of dbgen, if it can be called as such, leaves a lot to be desired. But by experimenting and surfing around, I was able to get what I needed.

Instructions here are for CentOS. Building it on other flavors of Linux should also be easy.

1. yum install make and yum install gcc if you don’t have that already;

2. wget source code;

3. tar xf tarball;

4. cd dbgen

5. vi makefile.suite and change 4 lines. See uncommented lines below:

################
## CHANGE NAME OF ANSI COMPILER HERE
################
CC      = gcc
# Current values for DATABASE are: INFORMIX, DB2, TDAT (Teradata)
#                                  SQLSERVER, SYBASE, ORACLE
# Current values for MACHINE are:  ATT, DOS, HP, IBM, ICL, MVS,
#                                  SGI, SUN, U2200, VMS, LINUX, WIN32
# Current values for WORKLOAD are:  TPCH
DATABASE= SQLSERVER
MACHINE = LINUX
WORKLOAD = TPCH

Note: It does not have MySQL and others for DATABASE setting, but that really doesn’t matter, because the file generated, which is delimited by a pipe, |, can be used for loading into any system.

6. make -f makefile.suite
This creats the dbgen binary that we can use in the next step.

7. ./dbgen -h gives you a brief description of switches that can be used. I am just interested in creating the lineitem table, because it has a good combination of integer, decimal, date, and character fields. The date inside the tables spans between 1992-01-01 and 1998-12-31, which is good for a partitioned table based on date. The DDL for this table is inside the dss.ddl file in the same directory. Please modify it as necessary. In fact, Lubor Kollar has it ready for SQL Server here, and Vadim Tkachenko has it for MySQL here.

Here is the command I used and its results:

[root@centos dbgen]# ./dbgen -v -T L -s 2
TPC-H Population Generator (Version 2.14.0)
Copyright Transaction Processing Performance Council 1994 - 2010
Generating data for lineitem table/
Preloading text ... 100%
Do you want to overwrite ./lineitem.tbl ? [Y/N]: Y
done.

-v: verbose, -T L: lineitem only, -s 2: scale factor of 2. My understanding is that it roughly indicates that the end file will be close to 2 gig.

It turns out the end file is about 1.5 gig, with close to 12 million rows in it.

-rw-r--r-- 1 root root  1.5G Mar 30 21:33 lineitem.tbl
[root@centos dbgen]# wc -l lineitem.tbl
11997996 lineitem.tbl

You can play around with different key switches to get the size of file you want. For example, quoting from Lubor Kollar:

dbgen –T L -s 4 -C 3 -S 1

Using the –s option, we set the scale to 4 that generates a Lineitem table of 3GB. Using the –C option we split the table into 3 portions, and then using the –S option we chose only the first 1GB portion of the Lineitem table.

Running the command above, I got a file named lineitem.tbl.1 with close to 8 million rows, about 976 meg.

When trying to load it into SQL Server with SSIS bulk copying task, remember the delimiter is | and line separator is LF (line feed). For MySQL, something like this should do:

load data local infile '/root/dbgen/lineitem.tbl.1' into table lineitem fields terminated by '|' lines terminated by '\n';

Comments (2)

Parse sqlio log file with Python pyparsing

A few weeks ago I posted some questions on 3 Python text processing modules: pyparsing, SimpleParse, and NLTK. Today I need to analyze a log file generated by sqlio. I decided to use pyparsing. I am pretty pleased with it.

I am aware that there are at least 2 utility scripts for this: one is in Perl and then other is in Windows PowerShell. But I wanted to write in Python.

Noteworthy points:

1. In analyzing and building up the grammar/structure of the text for Python, I opted for line by line analysis and coding. I think this brings clarity and is easier to read. It felt a bit tedious in building it up, but once it is done, it is pretty rewarding to see the end results;
2. After parsing, the script writes the results to a csv file. You can then open that in a spreadsheet program to create some charts. Seeing the results in a column diagram chart brings clarity and focus to the result. It worked pretty well for me.

All in all, a good and productive day. Below are the details. The top part is the sample log content to be parsed, followed by the program.

"""
c:\Program Files (x86)\SQLIO>sqlio -kW -t2 -s30 -dH -o1 -frandom -b64 -BH -LS Testfile.dat
sqlio v1.5.SG
using system counter for latency timings, 2208037 counts per second
2 threads writing for 30 secs to file H:Testfile.dat
	using 64KB random IOs
	enabling multiple I/Os per thread with 1 outstanding
	buffering set to use hardware disk cache (but not file cache)
using current size: 24576 MB for file: H:Testfile.dat
initialization done
CUMULATIVE DATA:
throughput metrics:
IOs/sec:  6106.50
MBs/sec:   381.65
latency metrics:
Min_Latency(ms): 0
Avg_Latency(ms): 0
Max_Latency(ms): 5
histogram:
ms: 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+
%: 99  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
c:\Program Files (x86)\SQLIO>sqlio -kW -t2 -s30 -dH -o2 -frandom -b64 -BH -LS Testfile.dat
sqlio v1.5.SG
using system counter for latency timings, 2208037 counts per second
2 threads writing for 30 secs to file H:Testfile.dat
	using 64KB random IOs
	enabling multiple I/Os per thread with 2 outstanding
	buffering set to use hardware disk cache (but not file cache)
using current size: 24576 MB for file: H:Testfile.dat
initialization done
CUMULATIVE DATA:
throughput metrics:
IOs/sec:  6951.19
MBs/sec:   434.44
latency metrics:
Min_Latency(ms): 0
Avg_Latency(ms): 0
Max_Latency(ms): 6
histogram:
ms: 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+
%: 87 12  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
"""
from pyparsing import *
output = open("sample.csv", "w")
input = open("hDriveSqlioResults.txt", "r")
data = input.read()
#------------------------------------------------------------------------
# Define Grammars
#------------------------------------------------------------------------
integer = Word(nums)
singleLetter = Word(alphas, max=1)
realNumber = Combine(Word(nums) + '.' + Word(nums))
reading = Literal("reading")
writing = Literal("writing")
readingOrWriting = reading | writing
random = Literal("random")
sequential = Literal("sequential")
randomOrSequential = random | sequential
#Line 1 is useful. We want to suck out the juicy parts.
sectionBeginning = CaselessLiteral("c:\\program files (x86)\\sqlio>sqlio ")
ioKind = "-k" + singleLetter("k")
numThreads = "-t" + integer("t")
numSeconds = "-s" + integer("s")
driveOrMountPath = "-d" + singleLetter("d")
outstandingRequests = "-o" + integer("o")
accessType = "-f" + Word(alphas)("f")
ioBlockSizeInKB = "-b" + integer("b")
#Line 2 can be safely ignored
line2 = "sqlio v" + restOfLine
#Line 3 can be safely ignored
line3 = "using system counter for latency timings" + restOfLine
#Line 4 can be safely ignored
line4 = integer + "threads" + readingOrWriting + restOfLine
#Line 5 can be safely ignored
line5 = "using" + integer + "KB" + randomOrSequential + "IOs" + restOfLine
#Line 6 can be safely ignored
line6 = "enabling multiple " + restOfLine
#Line 7 can be safely ignored
line7 = "buffering set to use" + restOfLine
#Line 8 we should get the file size out
line8 = "using current size: " + integer("fileSize") + Word(alphas)("fileSizeUnit") + restOfLine
#Line 9 can be safely ignored
line9 = "initialization done" + restOfLine
#Line 10 can be safely ignored
line10 = "CUMULATIVE" + restOfLine
#Line 11 can be safely ignored
line11 = "throughput" + restOfLine
#Line 12 we want IOPS
line12 = "IOs/sec:" + realNumber("IOPS") + restOfLine
#Line 13 we want MBPS
line13 = "MBs/sec:" + realNumber("MBPS") + restOfLine
#Line 14 can be safely ignored
line14 = "latency" + restOfLine
#Line 15 we need to get minLatency out
line15 = "Min_Latency(ms):" + integer("minLatency") + restOfLine
#Line 16 we need to get avgLatency out
line16 = "Avg_Latency(ms):" + integer("avgLatency") + restOfLine
#Line 17 we need to get maxLatency out
line17 = "Max_Latency(ms):" + integer("maxLatency") + restOfLine
#Line 18 can be safely ignored
line18 = "histogram" + restOfLine
#Line 19 can be safely ignored
line19 = "ms:" + restOfLine
#Line 20 can be safely ignored
line20 = "\%:" + restOfLine
extraStuff = ZeroOrMore(line14 | line15 | line16 | line17 | line18 | line19 | line20)
logEntry = sectionBeginning + ioKind + numThreads + numSeconds + driveOrMountPath + outstandingRequests + accessType + ioBlockSizeInKB + restOfLine + line2 + line3 + line4 + line5 + line6 + line7 + line8 + line9 + line10 + line11 + line12 + line13 + extraStuff
output.write("IO property, IOPS, MBPS, minLatencyInMS, avgLatencyInMS, maxLatencyInMS\n")
for tokens in logEntry.searchString(data):
	output.write("%(k)s%(t)s threads %(o)s queue %(f)s in %(b)s KB chunks,%(IOPS)s,%(MBPS)s,%(minLatency)s,%(avgLatency)s,%(maxLatency)s\n" % tokens)

Comments (5)

Moving a LAMP site

I’ve done this in the past, but thought this time I’ve got to take notes. It can be used as a crude check list in the future. Don’t underestimate the power of a practical, down-to-earth check list! Perhaps documents like this should be kept in a wiki page, for easy updating to avoid being stale, a proeblem with blog entries, it seems.

P in LAMP here stands for php, not Python or Perl. L is CentOS (I used CentOS 5.5) or Red Hat Linux. I am not covering moving all databases in a MySQL instance, just a select few or just one.

I’d appreciate your comments or suggestions.

Software install and configuration
MySQL:
MariaDB or Percona.

MariaDB can be had here.

For Percona server and client tools, it’s best to have direct access to Percona’s repository:

yum install gpg
rpm -Uhv http://www.percona.com/downloads/percona-release/percona-release-0.0-1.x86_64.rpm

Do

yum search Percona

to find Percona packages you are interested in.

Percona MySQL Server package does not create /etc/my.cnf for you. In today’s server, I think you should use /usr/share/doc/Percona-Server-server-51-5.1.54/my-huge.cnf as default /etc/my.cnf if you are not sure. Remember to turn on the slow-query-log. Also, remember to set interactive-timeout=600. You may also need to increase the max_connections setting.

Open up port 3306:
Edit /etc/sysconfig/iptables: add this line somewhere above COMMIT

-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 3306 -j ACCEPT

Restart the firewall:

service iptables restart

Note: Here is how I came up with this configuration. I found the line that has 22 in it, which is the default port for ssh, yanked and pasted it, replacing 22 with 3306. This works, but I am not sure if it is done correctly. Comments welcome.

On source MySQL server:
1. FLUSH TABLES WITH READ LOCK
2. The backup can be done many ways. This time I used xtrabackup, but I am leaning toward using mysqldump or mysqlhotcopy exclusively next time I do it, because I was not satisfied with innobackupex-1.5.1′s –stream and –apply-log functions. Perhaps my bad experience was due to working with a particularly messed up MySQL instance and/or a under-powered server. Percona did mention that they are not satisfied with innobackupex so hopefully a better version will come out.

In any case, here is what I did with innobackupex-1.5.1 for database backup. It did get all MyISAM tables correctly, though, just not the InnoDB tables:

innobackupex-1.5.1 --databases=myDB --user=samIam --password=IamSam --stream=tar ./ | gzip -> /home/sam/myDb.tar.gz

For InnoDB tables, I did:

mysqldump -usamIam -pIamSam myDb innoDbTable > innoDbTable.sql

For stored routines, I did:
[sourcecode language="text"}
mysqldump --routines --no-create-info --no-data --no-create-db --skip-opt myDb > routine.sql
[/sourcecode]

3. Move the backup file to the new server. Percona mentioned a few creative ways of moving files here.

4. Use the excellent mk-show-grants, part of Maatkit, to get the login/user/permission generation script. Modify that as necessary.

On destination MySQL server:

1. Unpack myDB.tar.gz and move all myDb directory and its contents to MySQL’s data-dir, perhaps /var/lib/mysql or /usr/local/mysql/data. While inside MySQL data-dir on the shell, do

chgrp -R mysql myDB/
chown -R mysql myDB/

If you took a dump earlier, pun not intended, run this on the shell

mysql myDB < innoDbTable.sql

For stored routines, move over the dumped file, and run this:

mysql myDb < routine.sql

2. Do a select count(*) on all myDB tables. I don’t have a clever way of doing this, I did “show tables”, grabbed the results, and hacked with Vim’s macro to generate a bunch of select count(*) statements and ran it. This should let you know if there are any issues with your newly restored myDB. mysqldump -d myDB is also a good thing to run to find out problematic tables, I think.

3. Run the login creation script generated earlier with mk-show-grants

Web server:

If a brand new web server, these are needed, at a minimum

yum install php
yum install php-mysql
yum install httpd

It might be necessary to edit /etc/php.ini, so there is a line
extension=myql.so
under Dynamic Extensions

Comments

Search objects in SQL Server databases with pyodbc

Command line utility to search for objects based on names. Search results will be displayed in well formatted tabular format, left-justified.

1. pyodbc installed;
2. Use trusted authentication by default. Find the relevant code to adjust to login/password as needed;
3. -w for wild card search. Default is exact name match, case insensitive in almost all cases, depending on your SQL Server configuration;
4. If -d is not specified, all databases in instance will be searched, minus the ones that one not accessible at the moment. To specify database(s) you want to search into, use -d followed by database names. Separate them with comma if more than one;
5. Use -S to specify instance;
6. Good sample code for command line parameters processing and pyodbc usage;
7. Won’t work with SQL Server 2000 or lower.

Example 1: python objectSearch.py myobject -S instance1
Result: Database objects that are named myobject in instance1 will be displayed

Example 2: python objectSearch.py -S server1 -w -d db1,db2 myobject
Result: Database objects that has “myobject” in db1 and db2 in server1 will be displayed

import pyodbc, argparse
def pp(cursor, data=None, check_row_lengths=True):
    if not data:
        data = cursor.fetchall( )
    names = [  ]
    lengths = [  ]
    rules = [  ]
    for col, field_description in enumerate(cursor.description):
        field_name = field_description[0]
        names.append(field_name)
        field_length = field_description[2] or 12
        field_length = max(field_length, len(field_name))
        if check_row_lengths:
            # double-check field length, if it's unreliable
            data_length = max([ len(str(row[col])) for row in data ])
            field_length = max(field_length, data_length)
        lengths.append(field_length)
        rules.append('-' * field_length)
    format = " ".join(["%%-%ss" % l for l in lengths])
    result = [ format % tuple(names), format % tuple(rules) ]
    for row in data:
        result.append(format % tuple(row))
    return "\n".join(result)
def objectMatch(wildcardSearch, objectNamePattern, dbName, cn):
	if wildcardSearch:
		sql = "select '%s' as dbName, s.name schema_name, o.name, o.type from %s.sys.objects o inner join %s.sys.schemas s on o.schema_id = s.schema_id where o.name like '%%%s%%\' order by dbname, schema_name, o.name" % (dbName, dbName, dbName, objectNamePattern)
		cursor = cn.cursor()
		cursor.execute(sql)
		rows = cursor.fetchall()
		if len(rows) > 0:
			print pp(cursor, rows, 1)
	else:
		sql = "select '%s' as dbName, s.name schema_name, o.name, o.type from %s.sys.objects o inner join %s.sys.schemas s on o.schema_id = s.schema_id where o.name = '%s' order by dbname, schema_name, o.name" % (dbName, dbName, dbName, objectNamePattern)
		cursor = cn.cursor()
		cursor.execute(sql)
		rows = cursor.fetchall()
		if len(rows) > 0:
			print pp(cursor, rows, 1)
parser = argparse.ArgumentParser(description='SQL Server object search. Case insensitive')
parser.add_argument('-S', '--server', help='Instance you wish to connect to. myInstance is the default if not specified', dest='instance', default='myInstance')
parser.add_argument('-w', help='Wildcard search indicator. If specified, LIKE clause will be used for pattern matching. Otherwise it will look for an exact match. Exact match is default', action="store_true", dest='wild', default=False)
parser.add_argument('-d', '--database', help='Database(s) you want to search in. If more than one, separate them by comma only. It will search all databases by default', action="store", dest="dbs")
parser.add_argument('objectNamePattern', help='Object name pattern you want to search for', action="store")
argList = parser.parse_args()
try:
	cn = pyodbc.connect("DRIVER={SQL Server};SERVER=%s;DATABASE=master;Trusted_Connection=yes" % argList.instance)
except:
	print "Couldn't connect to %s. It is down or you might have had a typo." % argList.instance
if argList.dbs is None:
	cursor = cn.cursor()
	cursor.execute("select name from sys.databases where state = 0")
	rows = cursor.fetchall()
	for row in rows:
		objectMatch(argList.wild, argList.objectNamePattern, row.name, cn)
else:
	dbs = argList.dbs.split(',')
	for db in dbs:
		objectMatch(argList.wild, argList.objectNamePattern, db, cn)

Comments

Packages needed for building MySQL/MariaDb/Percona

From a stock/standard/typical/desktop install of Linux, it seems these are required in order to build MySQL/MariaDb/Percona forks:

gcc
gcc-c++
automake
libtool
bison
ncurses (Thanks Justin!)

Do apt-get, yum, rpm, emerge, or whatever to get them before doing configure, make and such. I am missing one, and I think it has “curse” or something like that in its name. Will update this post when I find that out.

Comments (4)

« Previous entries

Page optimized by WP Minify WordPress Plugin