Archive for May, 2011

Finding long running INNODB transactions

Notes:
1. The script prints out elapsed time since transaction started, MySQL thread id, and the kill statement for transactions running longer than a defined threshold value, in seconds. Just copy, paster, and then execute the kill statement if you want to terminate the long transaction(s);
2. Adjust shellCmd variable;
3. Adjust longRunningThreshold value as needed. It is measured in seconds;
4. No special libraries/modules needed, as long as there is a working mysql client;
5. re module is used for regex processing. Good place to find examples of regular expression search and grouping. A status variable is used to assist locating MySQL thread id once a transaction running longer than the defined threshold is found.

import re, shlex, subprocess
def runCmd(cmd):
    proc = subprocess.Popen(shlex.split(cmd), stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=False)
    out, err = proc.communicate()
    ret = proc.returncode
    return (ret, out, err)
shellCmd = """mysql -h hostName -e "show innodb status\\G""""
longRunningThreshold = 600
returnCode, stdOut, stdErr = runCmd(shellCmd)
targetTransactionFound = False
for line in stdOut.split('\n'):
    if targetTransactionFound:
        match = re.search(r'^MySQL\sthread\sid\s(\d+),', line)
        if match:
            print 'MySQL thread id', match.group(1), ' has been running for ', secondsTransactionElapsed, ' seconds'
            print 'To kill it, run: kill', match.group(1)
            targetTransactionFound = False
    else:
        match = re.search(r'^---TRANSACTION\s\w+,\sACTIVE\s(\d+)\ssec', line)
        if match:
            if (long(match.group(1)) > longRunningThreshold):
                targetTransactionFound = True
                secondsTransactionElapsed = match.group(1)

Comments (3)

雅舍小品若干

1. 昨天在外面和儿子及朋友玩完了讲中文的地产大亨游戏,回到家后,大小肠里积了些气体。我坐在椅子上,慢慢地左右挪动,轻掰肛门,放了几个哑屁,舒服了不少。但结果一个动作做大,虽没出声,仍被老婆逮个正着。她不客气地指出了我的行径,遂大笑。我顺水推舟,给儿子讲了下爸爸的小秘密和几个注意事项:哑屁要在没臭味的时候才能放,要不动声色。老婆反讽道:“呿,还不动声色,你以为别人看不出来?!”

2. 在读路遥的《平凡的世界》,非常非常好的一本书,绝对是中国文学的一个经典和丰碑。路遥的英年早逝真是太可惜了。我现在很想看看根据他的《人生》小说而改编的同名电影。我小时候应当看过这部电影,知道反响很大,但那时候什么都不懂。

《平凡的世界》勾起了我对小时候我们农村老家的回忆:农村农业户口小地方出身的人有很多是相通的,虽然路遥比我大二十多岁,虽然我们的家乡隔不少的距离。那些记忆的片段,童年的感觉,味道,气息,苦涩,尴尬,趣味等等等等,都散落在脑海的各个角落。一时想不起,并不意味着不存在。它们散落在记忆的旮旯里,只要有了恰当的引子,就如涌泉一样浮了上来,给人回味、悲伤、感激、希望、和力量。

我还没有读完,并且这是我的第一本路遥的书。但仅从我现在的体会,我感觉百度百科的一些评论很贴切,如:

–路遥的追求与成功,他的忧思与矛盾,都同他的心理结构有着密切的联系,他被称为“土著”作家,主要受到的是农民文化的影响,作为农民的儿子,他深深地爱着他的故乡,承袭和接受了传统文化的影响,以农民生活作为他取之不尽的源泉。但他又是一名“文明”的作家,他立意高远,广纳博取,时时瞻望世界文化,他喜欢《红楼梦》 、鲁迅的作品,巴尔扎克、托尔斯泰、肖洛霍夫的作品他更是百读不厌。各种报纸期刊他也经常翻阅,勤奋大量的阅读,丰富多彩的现实生活,再参之以他独特的生活体验,使他的创作博大宏阔却又情深意长。

– 著名作家、陕师大副教授朱鸿表示,路遥的精神遗产至少有以下四点:第一,他对文学事业的那种神圣感,以整个生命去打造自己的文学;第二,他对普通人命运深刻、持久地关注;第三,他所塑造的高加林、孙少平等人物形象,给了社会底层特别是正处于奋斗中的青年,以永远的感情共鸣与精神鼓励;第四,他尽可能地挖掘、表现了每个人本身潜在的朴素而又宝贵的精神。这四点足以使一位作家永远不朽。“路遥是我尊敬的朋友和师长”,著名作家、省文联副主席高建群如此说道。一个作家去世近二十年了,人们还在热烈地怀念他,还在谈论他的作品,这本身就是对一个作家最高的奖励。路遥的作品中那些人物及其命运,已远远超越了文学的范畴,他给一切卑微的人物以勇气与光亮,让他们知道自己能够走多远。

3. 今天早上,我给老婆谈起这本书。我说,《平凡的世界》那么宏大,页数那么多,并且中文相比英文是那么的浓缩,够我享受一阵子。老婆装天真开玩笑:“呕,我有个提议:在全世界普及中文,这样印刷出来的书不会浪费那么多纸张。绿色世界,指日可待!”

4. 出去翻地松土。我们培育的辣椒栽子快能种上了。

What’s up with WordPress? It’s not behaving lately and I’ve got to jump through hoops to save and publish a post! Man, the trouble I go through for my legion of fans worldwide!

Comments

Some SQL Server 2008 page compression observations

A few days ago I wrote about Infobright’s column-based storage engine, and compared the sizes of raw text data file, gzipped file, MyISAM files, and Infobright files. At that time, I also wanted to compare that against data compression in SQL Server 2008, which is a new feature. But the Windows cloud server instance I fired up at the time didn’t have enough disk space, so I temporarily aborted that endeavour, until today.

Once again, testing data was generated using TPC-H’s dbgen tool. In fact I took the same steps outlined here. The total raw text file size is around 8.8 GB. I then created 2 SQL Server tables without any index (heap), one without compression and one with page compression. The DDL used is based on DDL listed in this post, without the indexes.

I used BULK INSERT for data loading. Here is the statement used:

BULK INSERT [testDb].[dbo].[LINEITEM]
FROM 'C:\Users\Administrator\Documents\lineitem.tbl'
WITH(CHECK_CONSTRAINTS,CODEPAGE='RAW',DATAFILETYPE='char',
FIELDTERMINATOR='|',ROWTERMINATOR='0x0a')

I then calculated storage space taken with sp_spaceused stored procedure.

Here is the results in GB:

Raw Text: 8.8
Raw Text After GZIP: 2.6
Uncompressed SQL Server table data size: 9.5
Compressed SQL Server table data size: 7.4

For clarity’s sake, here is the results with Infobright and MyISAM on Linux:
MyISAM File Size: 7.2
InfobrightFileSize: 1.5

SQL Server 2008 provides a system stored procedure that gives a size estimation if compression is used for a non-compressed table/index.The estimated size of a compressed table is 5.4 GB. Comparing it with 7.4, the estimation appears to be optimistic, in this case.

Comments

Some notes and observations on ICE storage engine

I’ve used Vertica, a commercial column-based database storage engine, and was reasonably impressed. During O’Reilly MySQL conference last month, I checked out Infobright’s vendor booth and talked with some users. I became curious and wanted to test it out. Infobright has a free community version (ICE Infobright Community Edition)of its column-based storage engine that works with MySQL, which was what I used for my testing. I have no relationship with Infobright whatsoever, I happen to think that column-based storage can be a potentially disruptive technology in the BI/DW field. I’d love to hear your comments/experiences.

Here are some noteworthy points:

1. Setup is pretty easy. You can follow steps here. Note that the package has most relavant MySQL tools. A separate install of mysql client and server is NOT needed.

The ICE package has the following storage engines bundled:
BRIGHTHOUSE
MRG_MYISAM
CSV
MyISAM
MEMORY

2. I used TPC-H’s dbgen tool to generate data for testing. The raw text file is around 8.8 GB, about 72 million rows;

3. I used Rackspace’s cloud server, CentOS 5.5, 1 GB memory, 64-bit, 35 GB of hard drive space for testing. I created 2 databases, each with one table called lineitem: one table uses the BRIGHTHOUSE storage engine, the other uses the MyISAM storage engine. No index on the MyISAM table initially;

4. On this particular Rackspace server, below is the record of how long it took to load that amount of data into BRIGHTHOUSE table:

# time mysql-ib infobright < load.sql
real	22m46.974s
user	0m2.320s
sys	0m16.140s

And here is the record of how long it takes to load into the MyISAM table:

# time mysql-ib test < load.sql
real	6m11.966s
user	0m1.960s
sys	0m14.420s

Here is what’s inside load.sql:

load data local infile '/root/dbgen/lineitem.tbl' into table lineitem fields terminated by '|' lines terminated by '\n';

5. Size comparison, in GB. Here you can see the power of the impressive compression rate of a column-based storage engine:
Raw Text: 8.8
Raw Text After GZIP: 2.6
MyISAM File Size: 7.2
InfobrightFileSize: 1.5

6. I did a rudimentary performance comparison. The first one is on Infobright table, the second on MyISAM table:

mysql> select count(*) from lineitem where l_shipdate between '1993-01-01' and '1995-01-01';
+----------+
| count(*) |
+----------+
| 21880025 |
+----------+
1 row in set (15.06 sec)
mysql> use test;
Database changed
mysql> select count(*) from lineitem where l_shipdate between '1993-01-01' and '1995-01-01';
+----------+
| count(*) |
+----------+
| 21880025 |
+----------+
1 row in set (1 min 9.23 sec)

I then created an index on l_shipdate for the MyISAM table, and improved the performance to a bit more than 10 seconds.

Comments (2)

Page optimized by WP Minify WordPress Plugin