Archive for June, 2011

A comparison of HandlerSocket and mysql client libraries with Python

I’ve done some benchmark testing of 2 Python modules for MySQL data retrieval: MySQLdb and pyhs. MySQLdb uses MySQL’s client libraries, whereas pyhs uses HandlerSocket that bypasses MySQL’s client layer and interfaces Innodb storage engine’s files directly. In my testing, HandlerSocket results in 82% improvement over mysql client libraries based on number of rows retrieved. The tests were conducted under different conditions: right after a start when cache is cold, a warmed up cache after running SELECT * FROM customer, and alternating the execution order of those 2 Python files. The results are fairly consistent in that they all fall in the same range. Below is a sample output.

root@ubuntu:~# python hanSolo.py
Using HandlerSocket, below is a report of how many customer's name and address can be retrieved based on customer key:
Seconds elapsed:  61.0000810623
Rows retrieved:  509863
root@ubuntu:~# python mclient.py
Using mysql client libraries, below is a report of how many customer's name and address can be retrieved based on customer key:
Seconds elapsed:  61.0001530647
Rows retrieved:  280120

Here is my setup:

Hardware and software:
1. Rackspace Cloud server Ubuntu 10.04 Lucid Lynx, 1 gig memory, 40 gig hard disk space, 64-bit
Linux ubuntu 2.6.35.4-rscloud #8 SMP Mon Sep 20 15:54:33 UTC 2010 x86_64 GNU/Linux

2. Following instruction here to get Percona’s APT repository;

3. apt-get install percona-server-client-5.5

4. apt-get install percona-server-server-5.5

Enable HandlerSocket plugin. HandlerSocket is bundled with Percona Server 5.5, so you don’t have to download source files, config, make, build yourself:

1. mysql> install plugin handlersocket soname 'handlersocket.so'

2 cp /usr/share/mysql/my-large.cnf /etc/mysql/my.cnf

3. vim /etc/mysql/my.cnf with the following under mysqld section

loose_handlersocket_port = 9998
loose_handlersocket_port_wr = 9999
loose_handlersocket_threads = 16
loose_handlersocket_threads_wr = 1
open_files_limit = 65535

4. service mysql restart

Acquire Python’s MySQLdb and pyhs modules:

1. apt-get install libmysqlclient-dev
Necessary for building Python’s MySQLdb

2. apt-get install python-dev
Necessary Python header files, gcc, make, etc., for building Python modules

2. wget the appropriate egg from this page. Get the one for your version of Python.

3. sh eggFileDownloadedFromTheStepAbove

4. easy_install MySQL-python
MySQLdb module, which uses mysql client for MySQL access

5. easy_install python-handler-socket

Prepare testing data

1. Follow instructions here to get dbgen compiled;

2. While at the proper directory, run
./dbgen -v -T c
It will generate a customer file that has 150000 rows

3. Create customer table in test. Here is the DDL:

CREATE TABLE customer ( C_CUSTKEY     INTEGER NOT NULL,
C_NAME        VARCHAR(25) NOT NULL,
C_ADDRESS     VARCHAR(40) NOT NULL,
C_NATIONKEY   INTEGER NOT NULL,
C_PHONE       CHAR(15) NOT NULL,
C_ACCTBAL     DECIMAL(15,2)   NOT NULL,
C_MKTSEGMENT  CHAR(10) NOT NULL,
C_COMMENT     VARCHAR(117) NOT NULL,
primary key (C_CUSTKEY));

4. load data local infile ‘/root/dbgen/customer.tbl’ into table customer fields terminated by ‘|’ lines terminated by ‘\n’;
Adjust file location as necessary.

Finally, here is the content of my throwaway Python test scripts. One highlight of mclient.py is it demonstrate how to return results back in as dict with MySQLdb.

hanSolo.py

import time
from pyhs import Manager
hs = Manager()
start = time.time()
i = 1
j= 0
while i < 150000:
	data = hs.get('test', 'customer', ['C_CUSTKEY', 'C_NAME', 'C_ADDRESS'], '%s' % i)
	i=i+1
	if i == 150000:
		i = 1
	end = time.time()
	j = j + 1
	if int(end - start) > 60:
		break
print "Using HandlerSocket, below is a report of how many customer's name and address can be retrieved based on customer key:"
print "Seconds elapsed: ", str(end - start)
print "Rows retrieved: ", str(j)

mclient.py

import sys, MySQLdb, time
my_host = "localhost"
my_user = "root"
my_pass = ""
my_db = "test"
try:
    db = MySQLdb.connect(host=my_host, user=my_user, passwd=my_pass, db=my_db)
except MySQLdb.Error, e:
     print "Error %d: %s" % (e.args[0], e.args[1])
     sys.exit (1)
cursor = db.cursor (MySQLdb.cursors.DictCursor)
i=1
j=0
start = time.time()
while i < 150000:
	sql = "select c_custkey, c_name, c_address from customer where c_custkey=%s" % i;
	cursor.execute(sql)
	results = cursor.fetchall()
	i=i+1
	if i==150000:
		i=1
	end = time.time()
	j=j+1
	if int(end - start) > 60:
		break
print "Using mysql client libraries, below is a report of how many customer's name and address can be retrieved based on customer key:"
print "Seconds elapsed: ", str(end - start)
print "Rows retrieved: ", str(j)
db.close()

Comments

20110622随便写写

靠,太多事儿啦。但看到以前那些说自己忙的帖子,再抱怨,连自由女神都要掩嘴羞答答地扑哧笑了!

看完了《平凡的世界》这三卷巨著,大都是见缝插针和睡觉前读的,真是一种莫大的享受。特别是读进去后,就开始牵挂着那些鲜活的人物的发展和命运。也读了路遥的《早晨从中午开始》的那篇文章,对他的心路有了更深的认识。这样的作品很难得。感谢周燕老师,我对《平凡的世界》的兴趣,产生于几年前读完她的一篇博客,里面提到了这部书。有时间,真想写一下自己的体会和这本书激起的记忆。

上个周末和全家到本地的动物保护协会做义工。据告知美国每年杀掉约两百万只狗,主要原因是不少狗没有做过阉割之类的计划生育绝育手术,生下来的不被人得宠的就只有死路一条了。这个问题据说在南方严重一些。有些狗深得广大人民群众的喜爱,就有人专门做非法的气狗生意,所谓puppy mill,不管种狗的死活,是业界的一大问题之一。

我们可能想养只狗。我小时候有只狗,也养过两只山羊,这个算我的宠物吧。小时候喂的猪和鸡,不算宠物,但那些毛绒绒的小鸡、小鸭、和小鹅真的很可爱。那些黄毛小鸭子,颤颤地往村上的几个汪里走,然后游起来,很招人喜欢。

我奶奶养过不少猫,都是吃干巴鱼儿,娇惯地好像不捉老鼠。我上周才知道,猫到六七个月就能气小猫,那搞结扎套环阉割手术等就更要早。咳,这里的道德、伦理、和尺度很复杂和不好把握。

周六快结束时和太座和儿子坐在地上,拿着绒线和小猫玩儿。那三只猫从我们肩头爬过,软软的猫爪,喉咙里呼噜呼噜地,很有意思,逗得儿子猫和狗都想要。

Comments

Installing Perl DBI and DBD-mysql on Windows 64 bit

I had trouble getting Perl DBI and DBD-mysql on Windows in the past. In addition, on Windows 64-bit, you sometimes see recommendations of using 32-bit Perl.

Today I got to test the latest 64-bit ActiveState Perl distro for Windows, version 5.12.3.1204. I tested it on Windows 2008 R2 64-bit. I am happy to report that it works. I am not categorically recommend FOR the installation of 64-bit Perl on Windows, though.

Here are the steps:
1. Get the ActiveState Perl 64-bit package for Windows and install it, following all the default options;
2. On command prompt, do:
cd c:\perl64\bin
ppm install DBI
ppm install DBD-mysql

I then tested against both Oracle’s MySQL 5.5 Community Server and MariaDb’s 5.2.7 on Windows with MaatKit’s mk-table-checksum to confirm. And it worked fine:

C:\Users\Administrator\Downloads\maatkit-7540\maatkit-7540\bin>c:\Perl64\bin\perl.exe mk-table-checksum –databases mysql h=localhost,u=root,p=password

Comments

最近忙啊,并且这种繁忙可能还要持续一阵子。

今天忙完了,就把我们培育的辣椒栽子种上了,十几棵吧,希望有个好收成。看到旁边的几棵牡丹花,已经盛开,白里透红,怪好看的。

想起了小时候的流行歌曲,《牡丹之歌》,这个我会唱,第一段的歌词会背:

啊牡丹
百花丛中最鲜艳
啊牡丹
众香国里最壮观
有人说你娇媚
娇媚的生命哪有这样丰满
有人说你富贵
哪知道你曾历尽贫寒
啊牡丹啊牡丹
哪知道你曾历尽贫寒

啊牡丹
百花丛中最鲜艳
啊牡丹
众香国里最壮观
冰封大地的时候
你正蕴育着生机一片
春风吹来的时候
你把美丽带给人间
啊牡丹啊牡丹
你把美丽带给人间
你把美丽带给人间

也因此联想到现在的对所谓“红歌”的讨论。我最近刚读完Tom Brokaw(汤姆-布罗考)的Boom!-Talking about the Sixties,还好啦,马马虎虎。但其中的保罗-西蒙(如翻成“萨尔蒙”会更接近其原音)的一段话挺有意思的:

For people who were there, the songs are associated with a certain memory or person, and the power of that magnified over the years. The songs are our lives. If I tried to write something like ‘The Sound of Silence’ now, it would be wrong. That time frame doesn’t exist anymore.(简装本第529页)

我对西蒙的上述评论深表赞同。唱一下自己年轻时的歌,唱一下自己父母喜欢的歌,唱一下自己成长中的歌,怀旧一下,不行吗?五毛和美分,民主柿油党和乌有之乡们,跳出来上纲上线,非得左呀右呀地到处贴个标签,喳喳呼呼,往自己预先订阅的“主义”里安排,这样才觉得安全,才觉得真理在握。你们猴什么急呀?!

感谢虾米,让我可以听到那些老歌曲。我自己的一些精选集里,有许巍,汪峰,崔健,伍佰,山东老乡并且唱出山东口音的谢天笑与冷血动物,也有《三峡好人》里的小马哥用来当手机铃声的《上海滩》,张蔷的《走过咖啡屋》,汪明荃的《万水千山总是情》,刘文正的《雨中即景》(“你有钱坐不到!”),邓丽君的《原乡人》,当然也有刀郎演绎的《祝酒歌》(“豪情那胜过长江水,胜过长江水”)和《驼铃》(“顶天立地雄心在,不负人民养育情”),李谷一的《边疆的泉水清又纯》,郭颂的《乌苏里船歌》,关贵敏的《浪花里飞出欢乐的歌》,卞小贞的《我们的生活充满阳光》(那时髦的电子琴声音多有意思呀),叶佩英的《我爱你中国》,等。有时间我还要细细发掘,如郭兰英的《人说山西好风光》等。

西蒙还说了段这个,也挺有意思的:

The culture we’re in now is controlled by popular polarizers. If that’s the way the world is headed, we’re in for a lot of pain.

I have an exceptionally privileged life, and I am not entitled to any complaints. But then I go on and complain anyway.(简装本第529页)

哈,2003年,我曾有幸和我的爱人一起看过西蒙和佳芬克尔的Old Friends演唱会,挺不错的。我爱人曾在伦敦当过一阵子au pair(翻译成丫鬟比较贴切)时接触到西蒙和佳芬克尔的音乐。我是90年代初在厦大第一次接触到西蒙和佳芬克尔还有披头士,当时我就震惊了。当然也在那时听到了更多的卡朋特音乐!

读完了《平凡的世界》第一部。力荐!

Comments

Page optimized by WP Minify WordPress Plugin