Archive for Linux

Install MySQLdb module for Python

Update:

Commenter MarkR made a great point: if possible, use some packaging tools, to try to maintain proper dependencies, to the extent that is possible. Install from the source should be Plan B. So, try yum install MySQL-python first.

This is mostly for my own future reference. It’ll be icing on the cake if it helps you!

This is geared for CentOS or Red Hat. Use apt-get or other packaging tools for different flavours of Linux.

1. Get Python module setuptools called easy_install. I love easy_install, by the way, sort of like CPAN for Perl modules;
2. To install MySQLdb package, you would think easy_install MySQLdb would do. But that is not the case. I hope the developer would fix that. Instead, you need:

easy_install MySQL-python

3. If you have build errors, you may need:
yum install python-devel or yum install gcc or both.

Update: one more tip. If you encounter:

gcc: error trying to exec ‘cc1plus’: execvp: No such file or directory, you probably need:

yum install gcc-c++

Comments (2)

SELinux and “failed to map segment from shared object” error

I am reading and following examples in MySQL 5.1 Plugin Development. After compiling and moving a .so file (think DLL or Assembly file in Windows) into MySQL plugin directory, I got this message when I tried to create a UDF (User Defined Function):

mysql> create function udf_staticexample returns integer soname 'udf_staticexample.so';
ERROR 1126 (HY000): Can't open shared library 'udf_staticexample.so' (errno: 0 /usr/lib/mysql/plugin/udf_staticexample.so: failed to map segment from shared object: Permission denied)

This is caused by the fact that the .so shared object file is not in the right SELinux security context:

[root@asusfedora plugin]# ll -Z
lrwxrwxrwx. root root system_u:object_r:lib_t:s0       ha_archive.so -> ha_archive.so.0.0.0
lrwxrwxrwx. root root system_u:object_r:lib_t:s0       ha_archive.so.0 -> ha_archive.so.0.0.0
-rwxr-xr-x. root root system_u:object_r:lib_t:s0       ha_archive.so.0.0.0
lrwxrwxrwx. root root system_u:object_r:lib_t:s0       ha_blackhole.so -> ha_blackhole.so.0.0.0
lrwxrwxrwx. root root system_u:object_r:lib_t:s0       ha_blackhole.so.0 -> ha_blackhole.so.0.0.0
-rwxr-xr-x. root root system_u:object_r:lib_t:s0       ha_blackhole.so.0.0.0
lrwxrwxrwx. root root system_u:object_r:lib_t:s0       ha_example.so -> ha_example.so.0.0.0
lrwxrwxrwx. root root system_u:object_r:lib_t:s0       ha_example.so.0 -> ha_example.so.0.0.0
-rwxr-xr-x. root root system_u:object_r:lib_t:s0       ha_example.so.0.0.0
lrwxrwxrwx. root root system_u:object_r:lib_t:s0       ha_federated.so -> ha_federated.so.0.0.0
lrwxrwxrwx. root root system_u:object_r:lib_t:s0       ha_federated.so.0 -> ha_federated.so.0.0.0
-rwxr-xr-x. root root system_u:object_r:lib_t:s0       ha_federated.so.0.0.0
-rwxr-xr-x. root root unconfined_u:object_r:user_home_t:s0 udf_staticexample.so

To fix this, use chcon –reference to change the security context of the file in question to a referenced file, like this:

[root@asusfedora plugin]# chcon --reference=/usr/lib/mysql/plugin/ha_archive.so /usr/lib/mysql/plugin/udf_staticexample.so
[root@asusfedora plugin]# ll -Z
lrwxrwxrwx. root root system_u:object_r:lib_t:s0       ha_archive.so -> ha_archive.so.0.0.0
lrwxrwxrwx. root root system_u:object_r:lib_t:s0       ha_archive.so.0 -> ha_archive.so.0.0.0
-rwxr-xr-x. root root system_u:object_r:lib_t:s0       ha_archive.so.0.0.0
lrwxrwxrwx. root root system_u:object_r:lib_t:s0       ha_blackhole.so -> ha_blackhole.so.0.0.0
lrwxrwxrwx. root root system_u:object_r:lib_t:s0       ha_blackhole.so.0 -> ha_blackhole.so.0.0.0
-rwxr-xr-x. root root system_u:object_r:lib_t:s0       ha_blackhole.so.0.0.0
lrwxrwxrwx. root root system_u:object_r:lib_t:s0       ha_example.so -> ha_example.so.0.0.0
lrwxrwxrwx. root root system_u:object_r:lib_t:s0       ha_example.so.0 -> ha_example.so.0.0.0
-rwxr-xr-x. root root system_u:object_r:lib_t:s0       ha_example.so.0.0.0
lrwxrwxrwx. root root system_u:object_r:lib_t:s0       ha_federated.so -> ha_federated.so.0.0.0
lrwxrwxrwx. root root system_u:object_r:lib_t:s0       ha_federated.so.0 -> ha_federated.so.0.0.0
-rwxr-xr-x. root root system_u:object_r:lib_t:s0       ha_federated.so.0.0.0
-rwxr-xr-x. root root system_u:object_r:lib_t:s0       udf_staticexample.so

All is well afterwards.

mysql> create function udf_staticexample returns integer soname 'udf_staticexample.so';
Query OK, 0 rows affected (0.00 sec)
mysql> select udf_staticexample();
+---------------------+
| udf_staticexample() |
+---------------------+
|              318749 |
+---------------------+
1 row in set (0.00 sec)

Comments

Data generation with TPC-H’s dbgen for load testing

2011-06-26 update:

I am not sure if there are any changes in the latest make and gcc packages. Anyway, I noticed when run make, I encountered the message below:

make: g: Command not found
make: [qgen] Error 127 (ignored)

To fix this, find where gcc is at, then created a symbolic link g that points to gcc. All is well afterwards:
[root@ip-10-245-209-196 dbgen]# which gcc
/usr/bin/gcc
[root@ip-10-245-209-196 dbgen]# cd /usr/bin/
[root@ip-10-245-209-196 bin]# ln -s gcc g

End update
Recently I found myself doing some data loading benchmark testing with table partition. Data loading and storing for BI/DW/DSS stuff almost always involves data partitioning. SQL Server partition has a nice feature called partition switch, where you can swap data in and out of a partitioned table. In MySQL, it is called exchange, but it hasn’t been released. Perhaps other RDBMS have similar things as well.

To facilitate testing, I need some data that lends itself easily for partition, which led me to TPC-H’s dbgen tool.

dbgen is pretty easy to build and compile on Linux. It also has a Windows version. I didn’t bother trying because getting it built on Linux was such a quick and easy process. However, the documentation of dbgen, if it can be called as such, leaves a lot to be desired. But by experimenting and surfing around, I was able to get what I needed.

Instructions here are for CentOS. Building it on other flavors of Linux should also be easy.

1. yum install make and yum install gcc if you don’t have that already;

2. wget source code;

3. tar xf tarball;

4. cd dbgen

5. vi makefile.suite and change 4 lines. See uncommented lines below:

################
## CHANGE NAME OF ANSI COMPILER HERE
################
CC      = gcc
# Current values for DATABASE are: INFORMIX, DB2, TDAT (Teradata)
#                                  SQLSERVER, SYBASE, ORACLE
# Current values for MACHINE are:  ATT, DOS, HP, IBM, ICL, MVS,
#                                  SGI, SUN, U2200, VMS, LINUX, WIN32
# Current values for WORKLOAD are:  TPCH
DATABASE= SQLSERVER
MACHINE = LINUX
WORKLOAD = TPCH

Note: It does not have MySQL and others for DATABASE setting, but that really doesn’t matter, because the file generated, which is delimited by a pipe, |, can be used for loading into any system.

6. make -f makefile.suite
This creats the dbgen binary that we can use in the next step.

7. ./dbgen -h gives you a brief description of switches that can be used. I am just interested in creating the lineitem table, because it has a good combination of integer, decimal, date, and character fields. The date inside the tables spans between 1992-01-01 and 1998-12-31, which is good for a partitioned table based on date. The DDL for this table is inside the dss.ddl file in the same directory. Please modify it as necessary. In fact, Lubor Kollar has it ready for SQL Server here, and Vadim Tkachenko has it for MySQL here.

Here is the command I used and its results:

[root@centos dbgen]# ./dbgen -v -T L -s 2
TPC-H Population Generator (Version 2.14.0)
Copyright Transaction Processing Performance Council 1994 - 2010
Generating data for lineitem table/
Preloading text ... 100%
Do you want to overwrite ./lineitem.tbl ? [Y/N]: Y
done.

-v: verbose, -T L: lineitem only, -s 2: scale factor of 2. My understanding is that it roughly indicates that the end file will be close to 2 gig.

It turns out the end file is about 1.5 gig, with close to 12 million rows in it.

-rw-r--r-- 1 root root  1.5G Mar 30 21:33 lineitem.tbl
[root@centos dbgen]# wc -l lineitem.tbl
11997996 lineitem.tbl

You can play around with different key switches to get the size of file you want. For example, quoting from Lubor Kollar:

dbgen –T L -s 4 -C 3 -S 1

Using the –s option, we set the scale to 4 that generates a Lineitem table of 3GB. Using the –C option we split the table into 3 portions, and then using the –S option we chose only the first 1GB portion of the Lineitem table.

Running the command above, I got a file named lineitem.tbl.1 with close to 8 million rows, about 976 meg.

When trying to load it into SQL Server with SSIS bulk copying task, remember the delimiter is | and line separator is LF (line feed). For MySQL, something like this should do:

load data local infile '/root/dbgen/lineitem.tbl.1' into table lineitem fields terminated by '|' lines terminated by '\n';

Comments (2)

Parse sqlio log file with Python pyparsing

A few weeks ago I posted some questions on 3 Python text processing modules: pyparsing, SimpleParse, and NLTK. Today I need to analyze a log file generated by sqlio. I decided to use pyparsing. I am pretty pleased with it.

I am aware that there are at least 2 utility scripts for this: one is in Perl and then other is in Windows PowerShell. But I wanted to write in Python.

Noteworthy points:

1. In analyzing and building up the grammar/structure of the text for Python, I opted for line by line analysis and coding. I think this brings clarity and is easier to read. It felt a bit tedious in building it up, but once it is done, it is pretty rewarding to see the end results;
2. After parsing, the script writes the results to a csv file. You can then open that in a spreadsheet program to create some charts. Seeing the results in a column diagram chart brings clarity and focus to the result. It worked pretty well for me.

All in all, a good and productive day. Below are the details. The top part is the sample log content to be parsed, followed by the program.

"""
c:\Program Files (x86)\SQLIO>sqlio -kW -t2 -s30 -dH -o1 -frandom -b64 -BH -LS Testfile.dat
sqlio v1.5.SG
using system counter for latency timings, 2208037 counts per second
2 threads writing for 30 secs to file H:Testfile.dat
	using 64KB random IOs
	enabling multiple I/Os per thread with 1 outstanding
	buffering set to use hardware disk cache (but not file cache)
using current size: 24576 MB for file: H:Testfile.dat
initialization done
CUMULATIVE DATA:
throughput metrics:
IOs/sec:  6106.50
MBs/sec:   381.65
latency metrics:
Min_Latency(ms): 0
Avg_Latency(ms): 0
Max_Latency(ms): 5
histogram:
ms: 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+
%: 99  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
c:\Program Files (x86)\SQLIO>sqlio -kW -t2 -s30 -dH -o2 -frandom -b64 -BH -LS Testfile.dat
sqlio v1.5.SG
using system counter for latency timings, 2208037 counts per second
2 threads writing for 30 secs to file H:Testfile.dat
	using 64KB random IOs
	enabling multiple I/Os per thread with 2 outstanding
	buffering set to use hardware disk cache (but not file cache)
using current size: 24576 MB for file: H:Testfile.dat
initialization done
CUMULATIVE DATA:
throughput metrics:
IOs/sec:  6951.19
MBs/sec:   434.44
latency metrics:
Min_Latency(ms): 0
Avg_Latency(ms): 0
Max_Latency(ms): 6
histogram:
ms: 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+
%: 87 12  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
"""
from pyparsing import *
output = open("sample.csv", "w")
input = open("hDriveSqlioResults.txt", "r")
data = input.read()
#------------------------------------------------------------------------
# Define Grammars
#------------------------------------------------------------------------
integer = Word(nums)
singleLetter = Word(alphas, max=1)
realNumber = Combine(Word(nums) + '.' + Word(nums))
reading = Literal("reading")
writing = Literal("writing")
readingOrWriting = reading | writing
random = Literal("random")
sequential = Literal("sequential")
randomOrSequential = random | sequential
#Line 1 is useful. We want to suck out the juicy parts.
sectionBeginning = CaselessLiteral("c:\\program files (x86)\\sqlio>sqlio ")
ioKind = "-k" + singleLetter("k")
numThreads = "-t" + integer("t")
numSeconds = "-s" + integer("s")
driveOrMountPath = "-d" + singleLetter("d")
outstandingRequests = "-o" + integer("o")
accessType = "-f" + Word(alphas)("f")
ioBlockSizeInKB = "-b" + integer("b")
#Line 2 can be safely ignored
line2 = "sqlio v" + restOfLine
#Line 3 can be safely ignored
line3 = "using system counter for latency timings" + restOfLine
#Line 4 can be safely ignored
line4 = integer + "threads" + readingOrWriting + restOfLine
#Line 5 can be safely ignored
line5 = "using" + integer + "KB" + randomOrSequential + "IOs" + restOfLine
#Line 6 can be safely ignored
line6 = "enabling multiple " + restOfLine
#Line 7 can be safely ignored
line7 = "buffering set to use" + restOfLine
#Line 8 we should get the file size out
line8 = "using current size: " + integer("fileSize") + Word(alphas)("fileSizeUnit") + restOfLine
#Line 9 can be safely ignored
line9 = "initialization done" + restOfLine
#Line 10 can be safely ignored
line10 = "CUMULATIVE" + restOfLine
#Line 11 can be safely ignored
line11 = "throughput" + restOfLine
#Line 12 we want IOPS
line12 = "IOs/sec:" + realNumber("IOPS") + restOfLine
#Line 13 we want MBPS
line13 = "MBs/sec:" + realNumber("MBPS") + restOfLine
#Line 14 can be safely ignored
line14 = "latency" + restOfLine
#Line 15 we need to get minLatency out
line15 = "Min_Latency(ms):" + integer("minLatency") + restOfLine
#Line 16 we need to get avgLatency out
line16 = "Avg_Latency(ms):" + integer("avgLatency") + restOfLine
#Line 17 we need to get maxLatency out
line17 = "Max_Latency(ms):" + integer("maxLatency") + restOfLine
#Line 18 can be safely ignored
line18 = "histogram" + restOfLine
#Line 19 can be safely ignored
line19 = "ms:" + restOfLine
#Line 20 can be safely ignored
line20 = "\%:" + restOfLine
extraStuff = ZeroOrMore(line14 | line15 | line16 | line17 | line18 | line19 | line20)
logEntry = sectionBeginning + ioKind + numThreads + numSeconds + driveOrMountPath + outstandingRequests + accessType + ioBlockSizeInKB + restOfLine + line2 + line3 + line4 + line5 + line6 + line7 + line8 + line9 + line10 + line11 + line12 + line13 + extraStuff
output.write("IO property, IOPS, MBPS, minLatencyInMS, avgLatencyInMS, maxLatencyInMS\n")
for tokens in logEntry.searchString(data):
	output.write("%(k)s%(t)s threads %(o)s queue %(f)s in %(b)s KB chunks,%(IOPS)s,%(MBPS)s,%(minLatency)s,%(avgLatency)s,%(maxLatency)s\n" % tokens)

Comments (5)

Moving a LAMP site

I’ve done this in the past, but thought this time I’ve got to take notes. It can be used as a crude check list in the future. Don’t underestimate the power of a practical, down-to-earth check list! Perhaps documents like this should be kept in a wiki page, for easy updating to avoid being stale, a proeblem with blog entries, it seems.

P in LAMP here stands for php, not Python or Perl. L is CentOS (I used CentOS 5.5) or Red Hat Linux. I am not covering moving all databases in a MySQL instance, just a select few or just one.

I’d appreciate your comments or suggestions.

Software install and configuration
MySQL:
MariaDB or Percona.

MariaDB can be had here.

For Percona server and client tools, it’s best to have direct access to Percona’s repository:

yum install gpg
rpm -Uhv http://www.percona.com/downloads/percona-release/percona-release-0.0-1.x86_64.rpm

Do

yum search Percona

to find Percona packages you are interested in.

Percona MySQL Server package does not create /etc/my.cnf for you. In today’s server, I think you should use /usr/share/doc/Percona-Server-server-51-5.1.54/my-huge.cnf as default /etc/my.cnf if you are not sure. Remember to turn on the slow-query-log. Also, remember to set interactive-timeout=600. You may also need to increase the max_connections setting.

Open up port 3306:
Edit /etc/sysconfig/iptables: add this line somewhere above COMMIT

-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 3306 -j ACCEPT

Restart the firewall:

service iptables restart

Note: Here is how I came up with this configuration. I found the line that has 22 in it, which is the default port for ssh, yanked and pasted it, replacing 22 with 3306. This works, but I am not sure if it is done correctly. Comments welcome.

On source MySQL server:
1. FLUSH TABLES WITH READ LOCK
2. The backup can be done many ways. This time I used xtrabackup, but I am leaning toward using mysqldump or mysqlhotcopy exclusively next time I do it, because I was not satisfied with innobackupex-1.5.1′s –stream and –apply-log functions. Perhaps my bad experience was due to working with a particularly messed up MySQL instance and/or a under-powered server. Percona did mention that they are not satisfied with innobackupex so hopefully a better version will come out.

In any case, here is what I did with innobackupex-1.5.1 for database backup. It did get all MyISAM tables correctly, though, just not the InnoDB tables:

innobackupex-1.5.1 --databases=myDB --user=samIam --password=IamSam --stream=tar ./ | gzip -> /home/sam/myDb.tar.gz

For InnoDB tables, I did:

mysqldump -usamIam -pIamSam myDb innoDbTable > innoDbTable.sql

For stored routines, I did:
[sourcecode language="text"}
mysqldump --routines --no-create-info --no-data --no-create-db --skip-opt myDb > routine.sql
[/sourcecode]

3. Move the backup file to the new server. Percona mentioned a few creative ways of moving files here.

4. Use the excellent mk-show-grants, part of Maatkit, to get the login/user/permission generation script. Modify that as necessary.

On destination MySQL server:

1. Unpack myDB.tar.gz and move all myDb directory and its contents to MySQL’s data-dir, perhaps /var/lib/mysql or /usr/local/mysql/data. While inside MySQL data-dir on the shell, do

chgrp -R mysql myDB/
chown -R mysql myDB/

If you took a dump earlier, pun not intended, run this on the shell

mysql myDB < innoDbTable.sql

For stored routines, move over the dumped file, and run this:

mysql myDb < routine.sql

2. Do a select count(*) on all myDB tables. I don’t have a clever way of doing this, I did “show tables”, grabbed the results, and hacked with Vim’s macro to generate a bunch of select count(*) statements and ran it. This should let you know if there are any issues with your newly restored myDB. mysqldump -d myDB is also a good thing to run to find out problematic tables, I think.

3. Run the login creation script generated earlier with mk-show-grants

Web server:

If a brand new web server, these are needed, at a minimum

yum install php
yum install php-mysql
yum install httpd

It might be necessary to edit /etc/php.ini, so there is a line
extension=myql.so
under Dynamic Extensions

Comments

Search objects in SQL Server databases with pyodbc

Command line utility to search for objects based on names. Search results will be displayed in well formatted tabular format, left-justified.

1. pyodbc installed;
2. Use trusted authentication by default. Find the relevant code to adjust to login/password as needed;
3. -w for wild card search. Default is exact name match, case insensitive in almost all cases, depending on your SQL Server configuration;
4. If -d is not specified, all databases in instance will be searched, minus the ones that one not accessible at the moment. To specify database(s) you want to search into, use -d followed by database names. Separate them with comma if more than one;
5. Use -S to specify instance;
6. Good sample code for command line parameters processing and pyodbc usage;
7. Won’t work with SQL Server 2000 or lower.

Example 1: python objectSearch.py myobject -S instance1
Result: Database objects that are named myobject in instance1 will be displayed

Example 2: python objectSearch.py -S server1 -w -d db1,db2 myobject
Result: Database objects that has “myobject” in db1 and db2 in server1 will be displayed

import pyodbc, argparse
def pp(cursor, data=None, check_row_lengths=True):
    if not data:
        data = cursor.fetchall( )
    names = [  ]
    lengths = [  ]
    rules = [  ]
    for col, field_description in enumerate(cursor.description):
        field_name = field_description[0]
        names.append(field_name)
        field_length = field_description[2] or 12
        field_length = max(field_length, len(field_name))
        if check_row_lengths:
            # double-check field length, if it's unreliable
            data_length = max([ len(str(row[col])) for row in data ])
            field_length = max(field_length, data_length)
        lengths.append(field_length)
        rules.append('-' * field_length)
    format = " ".join(["%%-%ss" % l for l in lengths])
    result = [ format % tuple(names), format % tuple(rules) ]
    for row in data:
        result.append(format % tuple(row))
    return "\n".join(result)
def objectMatch(wildcardSearch, objectNamePattern, dbName, cn):
	if wildcardSearch:
		sql = "select '%s' as dbName, s.name schema_name, o.name, o.type from %s.sys.objects o inner join %s.sys.schemas s on o.schema_id = s.schema_id where o.name like '%%%s%%\' order by dbname, schema_name, o.name" % (dbName, dbName, dbName, objectNamePattern)
		cursor = cn.cursor()
		cursor.execute(sql)
		rows = cursor.fetchall()
		if len(rows) > 0:
			print pp(cursor, rows, 1)
	else:
		sql = "select '%s' as dbName, s.name schema_name, o.name, o.type from %s.sys.objects o inner join %s.sys.schemas s on o.schema_id = s.schema_id where o.name = '%s' order by dbname, schema_name, o.name" % (dbName, dbName, dbName, objectNamePattern)
		cursor = cn.cursor()
		cursor.execute(sql)
		rows = cursor.fetchall()
		if len(rows) > 0:
			print pp(cursor, rows, 1)
parser = argparse.ArgumentParser(description='SQL Server object search. Case insensitive')
parser.add_argument('-S', '--server', help='Instance you wish to connect to. myInstance is the default if not specified', dest='instance', default='myInstance')
parser.add_argument('-w', help='Wildcard search indicator. If specified, LIKE clause will be used for pattern matching. Otherwise it will look for an exact match. Exact match is default', action="store_true", dest='wild', default=False)
parser.add_argument('-d', '--database', help='Database(s) you want to search in. If more than one, separate them by comma only. It will search all databases by default', action="store", dest="dbs")
parser.add_argument('objectNamePattern', help='Object name pattern you want to search for', action="store")
argList = parser.parse_args()
try:
	cn = pyodbc.connect("DRIVER={SQL Server};SERVER=%s;DATABASE=master;Trusted_Connection=yes" % argList.instance)
except:
	print "Couldn't connect to %s. It is down or you might have had a typo." % argList.instance
if argList.dbs is None:
	cursor = cn.cursor()
	cursor.execute("select name from sys.databases where state = 0")
	rows = cursor.fetchall()
	for row in rows:
		objectMatch(argList.wild, argList.objectNamePattern, row.name, cn)
else:
	dbs = argList.dbs.split(',')
	for db in dbs:
		objectMatch(argList.wild, argList.objectNamePattern, db, cn)

Comments

Packages needed for building MySQL/MariaDb/Percona

From a stock/standard/typical/desktop install of Linux, it seems these are required in order to build MySQL/MariaDb/Percona forks:

gcc
gcc-c++
automake
libtool
bison
ncurses (Thanks Justin!)

Do apt-get, yum, rpm, emerge, or whatever to get them before doing configure, make and such. I am missing one, and I think it has “curse” or something like that in its name. Will update this post when I find that out.

Comments (4)

SSH without typing password using public key

I’ve done transferring public key id_rsa.pub (or id_dsa.pub and identity.pub) and appending that to authorized_keys (or authorized_keys2) file on remote host many times in the past, but the last time I did that was a bit over 2 months ago. Today I wanted to write a Python script for blog database backup and copying to my new home Linux machine with paramiko. So I thought I should put a note here for setting up ssh connection without typing the password by using public key. Here is what I did on my machine.

Note: The machine you operate on might have public identity key generated already. Look for identity.pub, id_rsa.pub, or id_dsa.pub under ~/.ssh/. In my case, my machine is fairly new and I haven’t generated it. ssh-gen asks for a passphrase and I didn’t provide any.

ssh-keygen
ssh-copy-id -i ~/.ssh/id_rsa.pub login@remoteHost

ssh-copy-id takes care of the whole key copying and appending business, which is nice. In my search, I also came across sshfs that can mount remote file system, which looks intriguing and useful.

Additional resources:
SSH with Keys in a console window
ssh-copy-id man page.

Comments

Using Linux Mint as my main operating system

Man, a functional, reliable, day-to-day use Linux desktop distro, I’ve been chasing that unicorn for a while. I think I might have caught one yesterday. Time will tell, but so far it looks promising.

Here is a list of distros I’ve tried:
Ubuntu 9.04 64-bit and 32-bit. I am sure I’ve tried versions prior to 9.04;
Fedora 11 64-bit. I might have tried 32-bit and versions prior to 11;
Suse Linux 11.1 64-bit. I might have tried 32-bit and versions prior to 11.1;
Sabayon 4.2 64-bit. I am sure I’ve tried the 32-bit and versions prior to 4.2;
Kubuntu 9.04 64-bit and 32-bit;
Centos 5.3 and 5.5 64-bit and 32-bit.
Mint 7 and 8, a distro based on Ubuntu, both 32-bit and 64-bit.

During my last gig, I used Centos 5.3 as my main workhorse computer with identical dual monitors for close to a year, and I really liked it. I used MacBook Pro during that time as well. MacBook is a decent machine, but I couldn’t stand the cult surrounding Apple and Apple’s arrogance.

So I want to continue the practice of using Linux as my default day to day operating system, and running a Windows virtual machine on top of it for Outlook email and calendar, and some other necessary Windows functions. I was really impressed with Mint last time I tried it. My machine is this several years old HP dv6253cl laptop. Here is the gist of things:

1. Mint 9 64-bit didn’t work. I installed Mint 9 32-bit;
2. Administration -> Hardware Drivers. I installed Broadcom STA wireless driver. To do that, you need a wired connection first so it could download necessary files;
3. Administration -> Hardware Drivers. I installed NVIDIA accelerated graphics driver (Version 173). I then went to Administration -> NVIDIA X Server Settings and setup dual monitors. Experiment and find the setup you like;
4. Skype works. Use Software Manager, search for Skype, and install;
5. GMail “call phone” works once you install the 32-bit deb package;
6. I installed VirtualBox and built a Windows 7 VM. I used VMWare Player first but I had trouble setting up the VM’s network connection.
7. Flash works. No need to compile and install it yourself.

Comments (1)

Testing Windows IO with SQLIO and SysBench

To benchmark IO on Linux and MySQL transaction processing, SysBench is a popular choice that can do both. After poking around at the source code, it seems PostgreSQL and Oracle are also included for transaction processing testing if you have the proper header files, but I didn’t test those.

To benchmark IO on Windows and SQL Server transaction processing, Microsoft provides two tools, SQLIO and SQLIOSim. SQLIO is a misnomer in that it really doesn’t have much to do with SQL Server. It is a general purpose disk IO benchmark tool.

So today I was playing with SysBench and noticed that I can compile and build it on Windows as well. I decided I should run IO benchmark on a single machine with both tools (SQLIO and SysBench), and see if I could reconcile the results.

To make things simple, I thought I would just benchmark random read of 3G (orders of magnitude bigger than disk controller cache) files for 5 minutes (300 seconds) with a single thread using 16Kb block size, without adding any additional queue. I tested this on both my laptop and an Amazon EC2 instance. The commands for both tools are listed below, and they should perform the same thing, as far as I can tell. Let me know if you have any comments/pointers or if I missed anything.

SysBench commands:

sysbench --test=fileio --file-total-size=3G prepare
sysbench.exe --test=fileio --file-total-size=3G --file-test-mode=rndrd --max-time=300 run

Fro SQLIO, here is the line in param.txt and command used:

c:\testfile.dat 1 0x0 3072
sqlio -kR -s300 -dc -b16 -frandom -Fparam.txt

As this is a quick test, I ran the same test twice and took the average value for comparison purposes. The detailed output is pasted at the end of this post.

On my Windows XP Pro Service Pack 2 laptop with Intel X-25 SSD:

IO/SecondThroughput/Second
SQLIO3833.559.90Mb
SysBench3390.7752.98Mb

So on my laptop, SQLIO’s results are 13% higher than that of SysBench.

On Amazon EC2 ami-c3e40daa with EBS device running Windows Server 2008 Datacenter Edition Service Pack 2, whose results varied widely between my two runs:

IO/SecondThroughput/Second
SQLIO678.9110.61Mb
SysBench408.966.39Mb

On this machine, SQLIO results are 66% higher than that of SysBench.

Below is the gory details.

Here are the detailed output on my laptop:
SQLIO
C:\Program Files\SQLIO>sqlio -kR -s300 -dc -b16 -frandom -Fparam.txt
sqlio v1.5.SG
parameter file used: param.txt
file c:\testfile.dat with 1 thread (0) using mask 0×0 (0)
1 thread reading for 300 secs from file c:\testfile.dat
using 16KB random IOs
using specified size: 3072 MB for file: c:\testfile.dat
initialization done
CUMULATIVE DATA:
throughput metrics:
IOs/sec: 3835.39
MBs/sec: 59.92

C:\Program Files\SQLIO>sqlio -kR -s300 -dc -b16 -frandom -Fparam.txt
sqlio v1.5.SG
parameter file used: param.txt
file c:\testfile.dat with 1 thread (0) using mask 0×0 (0)
1 thread reading for 300 secs from file c:\testfile.dat
using 16KB random IOs
using specified size: 3072 MB for file: c:\testfile.dat
initialization done
CUMULATIVE DATA:
throughput metrics:
IOs/sec: 3832.00
MBs/sec: 59.87

SysBench
C:\MessAround\sysbench-0.4.12\sysbench-0.4.12\sysbench\RelWithDebInfo>sysbench.e
xe –test=fileio –file-total-size=3G –file-test-mode=rndrd –max-time=300 run
sysbench 0.4: multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Extra file open flags: 0
128 files, 24Mb each
3Gb total file size
Block size 16Kb
Number of random requests for random IO: 10000
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random read test
Threads started!
WARNING: Operation time (18446744073709226000.000000) is greater than maximal co
unted value, counting as 10000000000000.000000
WARNING: Percentile statistics will be inaccurate
Done.

Operations performed: 10000 Read, 0 Write, 0 Other = 10000 Total
Read 156.25Mb Written 0b Total transferred 156.25Mb (52.143Mb/sec)
3337.16 Requests/sec executed

Test execution summary:
total time: 2.9966s
total number of events: 10000
total time taken by event execution: 2.9343
per-request statistics:
min: 0.01ms
avg: 0.29ms
max: 18446744073709.47ms
approx. 95 percentile: 0.48ms

Threads fairness:
events (avg/stddev): 10000.0000/0.00
execution time (avg/stddev): 2.9343/0.00

C:\MessAround\sysbench-0.4.12\sysbench-0.4.12\sysbench\RelWithDebInfo>sysbench.e
xe –test=fileio –file-total-size=3G –file-test-mode=rndrd –max-time=300 run
sysbench 0.4: multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Extra file open flags: 0
128 files, 24Mb each
3Gb total file size
Block size 16Kb
Number of random requests for random IO: 10000
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random read test
Threads started!
WARNING: Operation time (18446744073694841000.000000) is greater than maximal co
unted value, counting as 10000000000000.000000
WARNING: Percentile statistics will be inaccurate
Done.

Operations performed: 10000 Read, 0 Write, 0 Other = 10000 Total
Read 156.25Mb Written 0b Total transferred 156.25Mb (53.818Mb/sec)
3444.38 Requests/sec executed

Test execution summary:
total time: 2.9033s
total number of events: 10000
total time taken by event execution: 2.8777
per-request statistics:
min: 0.01ms
avg: 0.29ms
max: 18446744073696.34ms
approx. 95 percentile: 15.39ms

Threads fairness:
events (avg/stddev): 10000.0000/0.00
execution time (avg/stddev): 2.8777/0.00

Here are the detailed output from Amazon EC2 ami-c3e40daa with EBS device:
SQLIO
c:\Program Files\SQLIO>sqlio -kR -t1 -s300 -dC -frandom -b16 -Fparam.txt -BH -LS

sqlio v1.5.SG
using system counter for latency timings, 3579545 counts per second
parameter file used: param.txt
file c:\testfile.dat with 1 thread (0) using mask 0×0 (0)
1 thread reading for 300 secs from file c:\testfile.dat
using 16KB random IOs
buffering set to use hardware disk cache (but not file cache)
size of file c:\testfile.dat needs to be: 3221225472 bytes
current file size: 0 bytes
need to expand by: 3221225472 bytes
expanding c:\testfile.dat … done.
using specified size: 3072 MB for file: c:\testfile.dat
initialization done
CUMULATIVE DATA:
throughput metrics:
IOs/sec: 1230.94
MBs/sec: 19.23
latency metrics:
Min_Latency(ms): 0
Avg_Latency(ms): 0
Max_Latency(ms): 204
histogram:
ms: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+
%: 98 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

c:\Program Files\SQLIO>sqlio -kR -t1 -s300 -dC -frandom -b16 -Fparam.txt -BH -LS

sqlio v1.5.SG
using system counter for latency timings, 3579545 counts per second
parameter file used: param.txt
file c:\testfile.dat with 1 thread (0) using mask 0×0 (0)
1 thread reading for 300 secs from file c:\testfile.dat
using 16KB random IOs
buffering set to use hardware disk cache (but not file cache)
using specified size: 3072 MB for file: c:\testfile.dat
initialization done
CUMULATIVE DATA:
throughput metrics:
IOs/sec: 126.88
MBs/sec: 1.98
latency metrics:
Min_Latency(ms): 0
Avg_Latency(ms): 7
Max_Latency(ms): 497
histogram:
ms: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+
%: 13 9 0 3 7 8 8 8 8 8 8 8 2 1 1 1 1 1 1 1 1 0 0 0 2

C:\Users\Administrator\Documents\sysbench-0.4.12\sysbench\RelWithDebInfo>sysbenc
h.exe –test=fileio –file-total-size=3G –file-test-mode=rndrd –max-time=300 r
un
sysbench 0.4: multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Extra file open flags: 0
128 files, 24Mb each
3Gb total file size
Block size 16Kb
Number of random requests for random IO: 10000
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random read test
Threads started!
Done.

Operations performed: 10000 Read, 0 Write, 0 Other = 10000 Total
Read 156.25Mb Written 0b Total transferred 156.25Mb (10.64Mb/sec)
680.95 Requests/sec executed

Test execution summary:
total time: 14.6854s
total number of events: 10000
total time taken by event execution: 14.6048
per-request statistics:
min: 0.01ms
avg: 1.46ms
max: 150.29ms
approx. 95 percentile: 4.77ms

Threads fairness:
events (avg/stddev): 10000.0000/0.00
execution time (avg/stddev): 14.6048/0.00

C:\Users\Administrator\Documents\sysbench-0.4.12\sysbench\RelWithDebInfo>sysbenc
h.exe –test=fileio –file-total-size=3G –file-test-mode=rndrd –max-time=300 r
un
sysbench 0.4: multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Extra file open flags: 0
128 files, 24Mb each
3Gb total file size
Block size 16Kb
Number of random requests for random IO: 10000
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random read test
Threads started!
Done.

Operations performed: 10000 Read, 0 Write, 0 Other = 10000 Total
Read 156.25Mb Written 0b Total transferred 156.25Mb (2.1371Mb/sec)
136.77 Requests/sec executed

Test execution summary:
total time: 73.1139s
total number of events: 10000
total time taken by event execution: 73.0284
per-request statistics:
min: 0.02ms
avg: 7.30ms
max: 728.84ms
approx. 95 percentile: 23.08ms

Threads fairness:
events (avg/stddev): 10000.0000/0.00
execution time (avg/stddev): 73.0284/0.00

Comments (4)

Next entries » · « Previous entries

Page optimized by WP Minify WordPress Plugin