Archive for Technology

Remove common lines from files with Python

I am digging Python. I am writing small pieces of code that each do one thing and do it well, kind of like building a solid, reliable Lego piece. When I have a collection of them, I can snap ‘em together to make something useful. In fact, I’ve used Python to generate some content behind the wiki I built, http://www.haidongji.com/wiki

One useful thing that I wrote recently solves this problem: suppose you have two files, 1.txt and 2.txt, and your objective is to remove from 1.txt the lines that exist in both files. I came up with 4 lines of Python code (including the import statement) to solve it. I am a bit amused by this, although I don’t necessarily like this style of programming. It is clever, but can be hard to understand and maintain later on. Here is the code. Just for demo purposes, no error handling!

#!/usr/bin/env python

import fileinput

for line in fileinput.input("1.txt", inplace=1):
    if line not in open("2.txt", "r"):
        print line,

Note the comma at the end of the print statement. It is necessary, otherwise you will have extra newline characters in your file.

To create a simple test, create 1.txt with the English alphabet, with each letter occupying a line. Then create 2.txt, say with the letters in the word “haidong”, again with each letter taking a line. Run the code and see what happens.
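The snippet above relies on the Python 2 print statement and reopens 2.txt for every line of 1.txt. As a rough sketch only, here is one way the same idea might look in Python 3. The function name remove_common_lines is made up for illustration, and unlike the original it compares lines with the trailing newline stripped and rewrites the file in one go rather than using fileinput.

```python
from pathlib import Path


def remove_common_lines(target, other):
    """Rewrite `target`, dropping every line that also appears in `other`.

    Hypothetical helper for illustration; not the original 4-line script.
    """
    target, other = Path(target), Path(other)
    # Read the second file once into a set for fast membership tests.
    unwanted = set(other.read_text().splitlines())
    kept = [line for line in target.read_text().splitlines()
            if line not in unwanted]
    # Write the surviving lines back, one per line.
    target.write_text("".join(line + "\n" for line in kept))
    return kept
```

Calling remove_common_lines("1.txt", "2.txt") on the test files described above should leave 1.txt with only the letters that do not appear in “haidong”. Still no error handling!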


Quick notes for DokuWiki

Notes from tonight’s playing with DokuWiki. Initially saved it as a text file, then thought, what the hell, it might benefit somebody.

1. Got DokuWiki. Learned this from my buddy Baron, who I admire and whose recommendations I take seriously.

2. Fired up ami-23b6534a, an Amazon virtual instance with Apache pre-configured.
/var/www/html is the web root directory, fairly common.
Use this code snippet to find out the user and group that Apache httpd runs under:

<?php

if(function_exists('posix_geteuid')){
    // use posix to get current uid and gid
    $uid   = posix_geteuid();
    $usr   = posix_getpwuid($uid);
    $user  = $usr['name'];
    $gid   = posix_getegid();
    $grp   = posix_getgrgid($gid);
    $group = $grp['name'];
}else{
    // try to create a file and read its ids
    $tmp = tempnam ('/tmp', 'check');
    $uid = fileowner($tmp);
    $gid = filegroup($tmp);

    // try to run ls on it
    $out = `ls -l $tmp`;
    $lst = explode(' ',$out);
    $user  = $lst[2];
    $group = $lst[3];
    unlink($tmp);
}

echo "Your PHP process seems to run with the UID $uid ($user) and the GID $gid ($group)\n"; ?>

Or use phpinfo, which gives a bit too much data.

3. Downloaded DokuWiki and expanded it, and put it under the web root directory.

4. Permission stuff.

chmod 775 conf/
chgrp apache conf/
chgrp apache data/
chmod 775 data/
chmod 775 data/pages/
chgrp apache data/pages/
chmod 775 data/attic/
chgrp apache data/attic/
chgrp apache data/media/
chmod 775 data/media/
chmod 775 data/meta/
chgrp apache data/meta/
chgrp apache data/cache/
chmod 775 data/cache/
chmod 775 data/locks/
chgrp apache data/locks/
chgrp apache data/index/
chmod 775 data/index/
chmod 775 data/tmp/
chgrp apache data/tmp/
mv data /home/myname/
Modify conf/local.php to include this line:
$conf['savedir'] = '/home/myname/data/';
mv conf /home/myname/
Add preload.php under /var/www/html/inc with this content:

<?php
define('DOKU_CONF', '/home/myname/conf/');
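The chmod/chgrp pairs in step 4 are repetitive, so here is a rough Python sketch of looping over the same directories instead. The function name open_up_dirs and its parameters are my own invention for illustration; the directory list is taken from the commands above, and the group change is skipped unless a group name (such as apache) is passed in.

```python
import os
import shutil

# Directories from the notes above, relative to the DokuWiki root.
WRITABLE_DIRS = [
    "conf", "data", "data/pages", "data/attic", "data/media",
    "data/meta", "data/cache", "data/locks", "data/index", "data/tmp",
]


def open_up_dirs(root, group=None, mode=0o775):
    """chmod each listed directory to `mode`; chgrp it when `group` is given.

    Hypothetical helper, not part of DokuWiki.
    """
    changed = []
    for rel in WRITABLE_DIRS:
        path = os.path.join(root, rel)
        if not os.path.isdir(path):
            continue  # skip directories this install does not have
        os.chmod(path, mode)
        if group is not None:
            # Changing the group usually requires sufficient privileges.
            shutil.chown(path, group=group)
        changed.append(path)
    return changed
```

Something like open_up_dirs("/var/www/html/dokuwiki", group="apache") would then stand in for the twenty commands, assuming that is where DokuWiki was unpacked and that Apache runs under the apache group.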

To do:

1. a quick once-over of Wiki syntax.
2. Possibly search for transferring from one wiki platform to another?
3. Creating a test page. Shandong comes to mind.

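Time to wash up and go to bed. More tomorrow. Remember to move that table outside into the basement with my wife, and remember to be patient when teaching the kids Chinese!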


OmniFocus sync to iPhone through WebDav on MyDisk

I have been going back and forth in search of better ways to manage my life. Time seems to be in short supply, especially for middle-aged adults with child(ren) at home in a modern society. It is sad, in a way, that we are so goal-oriented, and there is certainly way too much noise and temptation surrounding us all. Oh heck, it is 10:30pm again, and I want to lie in bed and read for a while before I fall asleep, so let’s cut to the chase and show you what I want to share. My friends, here is the juicy scoop you’ve been waiting for:

1. OmniFocus looks like a very promising productivity and life management software;
2. Its companion iPhone application is also nice;
3. Syncing between the two seems a bit troublesome, if you don’t have MobileMe($$$) or Bonjour($$$). I’ve read people’s complaints against MobileMe. And my hosting provider does not support WebDAV on my domain. So I started searching for a free WebDAV service.

Three services caught my attention:

sharemation
swissdisk
http://mydisk.se.

I read through various comments about all of them. Here is my recommendation: go with mydisk.se. The reasons are:
1. Sharemation is only good for 5 MB, a bit small;
2. Swissdisk had a disk failure sometime in October.

If you decide to check out mydisk.se, you won’t find the 2 GB free info on its site. Only after you sign up will it show you that there is 2 GB of free WebDAV storage for you.


AWS Management Console is nice

If you want to play with Amazon cloud computing stuff, I think using AWS Management Console web interface is the best, easiest, and most intuitive approach, based on my experience so far.

My use of Amazon Web Services has been limited to EC2 up to this point. Prior to AWS Management Console, I had to set up Java, the EC2 API tools, various path and environment variables, certificates, keys, etc. It is a fairly convoluted process.

AWS Management Console is much easier. Except for downloading PuTTY and PuTTYgen on Windows or tsclient on Linux, and a private key pair, everything else is handled inside the browser. Here are a few things I learned:

  • Once you are in, create a Key Pair. The web interface will prompt you to save it. Do so, because you will need it to start instances and, depending on what type of instances you start (Windows or Linux), you will need it for shell access (Linux) or for getting Windows administrator password for remote desktop access;
  • If you are working on a Windows machine and running Linux EC2 instances, get PuTTY and PuTTYgen. Follow instructions here to generate a key that PuTTY can use. It worked for me. I got my private key pair file on Linux first, and then moved it to Windows. It even worked with the ^M characters inside the file.
  • For security groups, I found that the ones proposed by Amazon work fine. For instance, for a typical LAMP server, it proposes a webserver group that opens up SSH port 22, MySQL port 3306, and HTTP port 80, which is normally what you want.

Good luck!


AnyDbTest

I’ve been working in my spare time on a database testing tool, AnyDbTest, with my friend Wade. The program is written in C#, based on the .Net Framework 3.5.

It can be used in many scenarios: business analysts and QA can use it to confirm and validate data and to compare recordsets; database developers can use it for code refactoring and unit testing; and when all test cases are run together, it can also help with regression testing. It currently supports SQL Server, Oracle, and MySQL.

Here are some highlights:

  • Test cases are written in XML, rather than in Java/C++/C#/VB test case code
  • Many kinds of standard assertions are supported, such as StrictEqual, SetEqual, IsSupersetOf, Overlaps, RecordCountEqual, etc.
  • Allows using an Excel spreadsheet or XML as the source of the data for the tests
  • Supports a sandbox test model: if a test is run in the sandbox, all database operations are rolled back, meaning any changes are undone
  • Unique cross-database testing, which means the target and reference result sets can come from two databases, even if one is SQL Server and the other is Oracle.
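To give a feel for what result set assertions and a sandbox run might mean, here is a small Python sketch. It is not AnyDbTest code (AnyDbTest is C# and its test cases are XML); the function names merely echo the assertion names above, their semantics are my guesses from those names, and two in-memory SQLite connections with made-up sample rows stand in for two different database servers.

```python
import sqlite3


def fetch(conn, sql):
    """Run a query and return its rows as a list of tuples."""
    return conn.execute(sql).fetchall()


# Assumed semantics, guessed from the assertion names.
def strict_equal(target, reference):
    return target == reference            # same rows, same order


def set_equal(target, reference):
    return set(target) == set(reference)  # same distinct rows, any order


def is_superset_of(target, reference):
    return set(target) >= set(reference)


def overlaps(target, reference):
    return bool(set(target) & set(reference))


def record_count_equal(target, reference):
    return len(target) == len(reference)


def make_db(rows):
    """Build a throwaway in-memory database holding `rows` (sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE city (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO city VALUES (?, ?)", rows)
    conn.commit()
    return conn


# Two separate connections play the role of target and reference databases.
target_db = make_db([(1, "Jinan"), (2, "Qingdao"), (3, "Yantai")])
reference_db = make_db([(2, "Qingdao"), (1, "Jinan")])
query = "SELECT id, name FROM city ORDER BY id"

target = fetch(target_db, query)
reference = fetch(reference_db, query)

# Sandbox-style run: change data, inspect it, then roll the change back.
target_db.execute("DELETE FROM city WHERE id = 3")
sandboxed = fetch(target_db, query)
target_db.rollback()
after_rollback = fetch(target_db, query)
```

Here the target is a superset of the reference and overlaps it, but the two are not equal until the sandboxed delete removes the extra row; after the rollback the target has all three rows again.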

Currently we are working on developing a command line version of the application, which will greatly aid automation. With a console application, it is possible to interact with code and test cases stored in source code repositories, and make complete integration and regression testing possible with a simple batch file.

Here is the link to the trial download. I’d appreciate it if you could let me know your feedback and suggestions if you decide to try it.


WordPress 2.8, a step back

Update below
Reports in the blogosphere about some WordPress blogs being hacked got me alarmed. So I decided to upgrade mine from whatever version it was at (2.3, perhaps, whatever the current version was a year ago) to 2.8, the latest, but certainly not the greatest.

The upgrade process wasn’t hard, just a bit tedious. After posting a new entry and it not showing up in Google Reader two days after the fact, I felt something was not right.

So I clicked the feed link directly, http://www.haidongji.com/feed, and got the error below:

Warning: include_once(/home/xxx/public_html/wp-includes/pomo/mo.php) [function.include-once]: failed to open stream: No such file or directory in /home/xxx/public_html/wp-settings.php on line 307

Warning: include_once() [function.include]: Failed opening '/home/xxx/public_html/wp-includes/pomo/mo.php' for inclusion (include_path='.:/usr/lib/php:/usr/local/lib/php') in /home/xxx/public_html/wp-settings.php on line 307

Fatal error: Class 'Translations' not found in /home/xxx/public_html/wp-includes/l10n.php on line 407

The strange thing is that I did have mo.php in the right place, under wp-includes/pomo. So I went to Feedburner’s site, now part of Google, but found nothing useful there. My sixth sense told me there might be something fishy with the whole Google feed integration business, so I decided to restore the .htaccess file to the state prior to Feedburner integration, and see what happens. Lo, it worked! That was last night.

Today I saw the comment page link was broken. After clicking the recent comments link on the right side, there was a 404 page not found error. Once again, I invoked my psychic debugging skills and decided to revert .htaccess to the one generated by Feedburner a few years back, the one I ditched the previous night. Behold, it worked!

Now an additional gripe regarding WordPress 2.8: why do we need to make the dashboard admin page more complicated than necessary? The 2.8 dashboard looks busy and noisy, and it made it considerably harder to get to the places I wanted to go. After finally locating the settings page I was after, I needed to scroll all the way to the bottom to view it! Keep it simple, please.

Update: It turned out I claimed success too early. After I reverted to the .htaccess from Feedburner, I noticed the feed was not refreshed with the latest entry. So I went back to the original .htaccess file, where the comment page was broken. I googled around and realized the permalink settings were the culprit. So I did chmod 666 .htaccess, went to my WordPress admin page, saved the permalinks changes, then did chmod 644 .htaccess, and now I am back in business.


In search of a better dev environment setup

Now that my makeshift working area is no longer the dining table after the move, I am eager to set up a proper environment for learning and developing software at home.

My thought is to have one decent computer that powers a few virtual machines. I like the idea of virtual machines as opposed to physical ones, which are more costly and messy. And I was pretty determined to run a Linux distro as the host, because I want to live and breathe in it for a while, to bring my Linux skills to a level similar to my Windows knowledge.

I’ve learned a few things during my quest for a better computer configuration. At times, it was really frustrating.

  • SSD (Solid State Disk) is nice. It provided a tangible, fairly obvious performance boost to my system. I am no longer afraid of the previously daunting prospect of long-running installs and uninstalls, such as SQL Server’s. Better yet, when I want to test something that can potentially have a negative side effect, I now do it in a virtual machine. Before such a test starts, I take a snapshot. Prior to SSD, it would take a long time to take a snapshot of the VM. With SSD, it is much quicker. If the test doesn’t go well, I roll the virtual machine back to the state it was in when the snapshot was taken;

    Begin rant

  • So far I am disappointed with all the Linux distros that I’ve tried. Part of it is understandable, as vendors make getting their devices to work on Windows a priority. I remember clearly the frustration I had when trying to get a dial-up modem to work properly on Red Hat, and my later struggle with a wireless card. But things like these are inexcusable:

    1. Mouse freezes up for no apparent reason. When that happened, I always had to do a hard shutdown. Yes, I’ve tried various ways to restart X, but a) it didn’t work; b) even if it worked, I would not take it. Please spare me the lecture about using the keyboard all the time. This is the deal breaker for me;
    2. Sound card stopped working after some system update from Ubuntu’s repository;
    3. Wireless card is flaky. In other words, it does not work on a consistent basis;
    4. System updates broke my display driver. I was forced to use a lower display resolution than what the monitor is capable of. I think if I recompiled, I could fix it, but I was so pissed off that I didn’t bother;
    5. I couldn’t enable file sharing. It told me to resolve some dependency issues, but there were none!
    6. Ubuntu comes with Firefox 3.0, and there is no decent way of upgrading it to Firefox 3.5, other than Ubunzilla, but I cannot install Ubunzilla because of the aforementioned dependency issue! And if I use apt-get install firefox-3.5, why deliver Firefox 3.0, 3.1, and a beta version of 3.5?
    7. Could we please stop using code names like Hardy Heron, Intrepid Ibex, and Jaunty Jackalope? Using them internally is fine. Version numbers 8.04, 8.10, and 9.04 will do for the general public.

  • After my frustration with Ubuntu, I decided to try Fedora and SUSE Linux (both GNOME and KDE). Maybe I didn’t give them enough time, but I encountered mouse and window freeze issues there as well. Ditto for Kubuntu.

    End rant

So I came back to Windows. Now the host runs Windows 7 Enterprise with VMware virtual machines running Windows XP and Linux Mint 7, which looks aesthetically pleasing to me, for now.

For virtual machine software, I am using VMware Workstation 6.5. I’ve looked at and tested a few other choices: Xen, Microsoft Hyper-V, Virtual PC, and Virtual Server. VMware Workstation came out on top, in my personal opinion. I plan to give VMware 180 bucks for a license.

PS. I tested VirtualBox before and was reasonably impressed. I really hoped I could use it at home. But it will not make the cut, I am afraid. Read Emilian’s great critique here. I just want to add, in addition to snapshot issues, I had trouble getting the shared folder working.

Should I try FreeBSD as host?


Running Windows without anti-virus software

I’ve been frustrated with anti-virus software for a while. At a client last year, I had to fight with Symantec to get a proper ASP development environment up and running. At home, I’ve used AVG, avast, McAfee, and others. Far from feeling protected from the “scary world out there”, I started to view the scanning, listening, warning, and even “calling home” “features” of anti-virus software as a hindrance to my daily digital life. Granted, anti-virus software probably needs to do those kinds of things, but it needs to get out of the way!

After reading similar complaints on Stackoverflow.com, I decided to follow a few others and started running Windows XP and Windows 7 without any anti-virus software. I’ve run a virus-free and anti-virus-free Windows XP machine for close to a year now, in addition to a few virtual XP machines. The home Windows 7 machine has also been running naked for a few weeks now without issues.

Here are a few things I do or don’t do:
1. Browser: I use Firefox and Chrome. When running Firefox, I use the Adblock Plus add-on;
2. When installing software, I always pick custom installation, and I uncheck all unnecessary features;
3. Exercise good judgement.


Resolving nvidia card display issue on Ubuntu

Today I purchased an HP Pavilion p6130y desktop as my main development workstation at home. I pulled the Intel X25-M SSD out of my laptop, put it into this new box as the primary disk, and loaded Ubuntu 9.04 64-bit on it. So far, I am impressed. Things are fast and the machine is also quiet.

However, I did notice a display problem. Out of the box, the default resolution was perhaps 1024×768, certainly below the 1920×1080 that this Dell display can handle. So I went to System -> Administration -> Hardware Drivers to get the latest nvidia driver, because an nvidia graphics card (NVIDIA GeForce 9100 GS) comes with this HP on the motherboard. It looked nice initially, but whenever I tried to maximize a window, the system would freeze, forcing me to shut down the machine ungracefully by holding down the power button. Opening Firefox would cause the same thing, because my Firefox window opens in full / maximized mode.

I then spent the next few hours trying to figure this out. I opened the display setting window and tweaked the values here and there, hoping that would resolve the issue. I also uninstalled and reinstalled the driver a few more times. I ended up rebooting this machine so many times that I lost count.

Then it occurred to me that instead of googling and scanning information that looked pretty irrelevant to my issue, I should try nvidia’s site directly. Sure enough, I could download a Linux 64-bit display driver from its site! The download was a .run file. I followed the directions here, but the driver wouldn’t install because X was already running. So I booted into recovery mode, ignored the “telinit 3” (which starts X) prompt, and just followed the directions in the terminal. The process tried to download some kernel files without success, so it compiled instead. One step also asked about 32-bit compatibility, to which I answered “yes”; whether that step succeeded was not confirmed. After the driver was successfully installed, I ran “telinit 3” and the display issue went away.

Hope this helps somebody out there.

Update: to install the same driver on Fedora 11, get to the screen where you can edit the GRUB loader entry, press a, the space key, 3, then Enter. That will start Fedora without X running. Then run “sh NVidiaDriverFileName.run”, follow the instructions on screen, and you will be good to go.


Different lingoes for bookmark lookup and why bookmark lookup can be costly

In the past, when I read technical books, I tended to skim through them, looking for keywords, reading only the part that was relevant at the moment, and moving on. Sometimes I would make an attempt at finishing a whole book, but a few months or even years later, I still hadn’t finished the first 3 chapters!

I took a different approach recently. Now I’ve set up daily goals to read 50 or more pages or a key section of a technical book, and follow through. I am reading two technical books at the moment: Itzik Ben-Gan et al’s Inside Microsoft SQL Server 2005: T-SQL Querying, and Baron Schwartz et al’s High Performance MySQL. It’s nice to read database books that focus on different vendor implementations (SQL Server, MySQL, Oracle, etc.), because each one explains certain things from a slightly different angle, with slightly different language, and at times this gives you a better feel of the overall picture and clarity to certain key concepts.

Here I am talking about quality technical books, though, because the industry churns out way too much junk. Spending time and money on poorly thought-out, poorly written books is certainly a waste.

Anyway, today I went through Itzik Ben-Gan’s performance tuning chapter. I used SQL Server 2008’s Management Studio to do tests against a SQL Server 2005 instance. I noticed an interesting change in terminology: in SQL Server 2008, a bookmark lookup on a table with a clustered index is now called a Key Lookup, while on a table without a clustered index it is (still) called a RID lookup. Here are some screen shots:

Bookmark lookup in SQL Server 2005 on a table with a clustered index
[Screenshot: Sql2005BookmarkLookupCluster]

Bookmark lookup in SQL Server 2005 on a table without a clustered index
[Screenshot: Sql2005BookmarkLookupHeap]

Bookmark lookup in SQL Server 2008 on a table with a clustered index
[Screenshot: Sql2008BookmarkLookupCluster]

Bookmark lookup in SQL Server 2008 on a table without a clustered index
[Screenshot: Sql2008BookmarkLookupHeap]

To recap, here are the terms used for bookmark lookups in the 3 most recent SQL Server releases:

SQL Server 2000: bookmark lookup
SQL Server 2005: RID lookup on a heap, Clustered Index Seek on a table with clustered index
SQL Server 2008: RID lookup on a heap, Key Lookup on a table with clustered index

SQL Server’s clustered index implementation went through some interesting changes. Prior to SQL Server 7, every non-clustered index contained a pointer to the actual row(s) holding the indexed key values. This pointer (RID, Row ID) physically identifies the file, the page within that file, and the position on the page where the row lives. Starting with SQL Server 7, the implementation stays the same for tables without a clustered index (heaps). However, for tables with a clustered index, the pointer is the clustered index key.

This can potentially have a big impact on bookmark lookups on tables with a clustered index. Here is why: to do a lookup, SQL Server needs to traverse the clustered index, and that means more reads. The number of additional reads depends on the number of levels in the clustered index and how many rows the query touches. Suppose the clustered index has 3 levels (root, leaf, and one intermediate level); a single bookmark lookup will then incur 3 additional logical reads. If the query touches 2000 rows, the bookmark lookups will cause 6000 additional reads.

Note that I am not bashing clustered indexes, though. Overall, in my opinion, the benefits of a clustered index definitely outweigh its drawbacks. Now this post is getting long and I want to go back to my books, so I will stop here.
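The arithmetic is simple enough to jot down in Python. This is only a back-of-the-envelope sketch with a made-up function name, assuming every lookup walks the clustered index from root to leaf and reads one page per level.

```python
def extra_lookup_reads(index_levels, rows_touched):
    """Estimate the additional logical reads caused by bookmark lookups.

    Assumes one page read per clustered index level for each row looked up.
    """
    return index_levels * rows_touched


single_lookup = extra_lookup_reads(3, 1)      # one row, 3-level index
whole_query = extra_lookup_reads(3, 2000)     # 2000 rows, 3-level index
```

With the numbers from the paragraph above, single_lookup comes to 3 and whole_query to 6000.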
