Archive for December, 2008

Happy Holidays 2008

And Happy New Year to you!

Christmas meal with Swedish characteristics

2008 Christmas meal

Opening presents

Ben opening Christmas presents

Christmas tree

2008 Christmas tree

Playing in the snow

Ben and friends playing at school playground


Consolidating Twitter feeds with Yahoo Pipes

I have been suffering from web addiction for a long time, so when Twitter first came out, I figured it was the last thing I needed. Plus, to really take advantage of Twitter, you’ve got to have a relatively fancy phone with a data plan. A decent phone costs anywhere between 150 and 400 dollars, and a service plan runs around 80 dollars a month (about 1,000 dollars a year). That’s a lot of money.

Off on a tangent (跑题): Regarding phones, I had high hopes for T-Mobile’s GPhone, but a few friends who have GPhones were a little disappointed. The iPhone needs a competitor, but it looks like the GPhone is not there yet. Now that I am running my own business, I am thinking about upgrading to a new phone with a service plan, so you Android people need to work harder to impress me.

Anyway, to catch up on what the people I am interested in are doing, I go to Twitter every 2 months or so and just read through the messages. It is not very efficient.

I had heard of Yahoo Pipes before but had never played with it. After reading an O’Reilly book that recommended Yahoo Pipes, though, I thought: wouldn’t it be nice to consolidate the tweets I am interested in and have them delivered to my Google Reader? So that’s what I did.

Building a pipe is really simple. If you’ve worked with SQL Server DTS, it will look very familiar. The whole process is surprisingly simple yet powerful; you can even apply regular expressions as filters.

Here is the key takeaway: when defining the source, pick Fetch Site Feed. For instance, I am interested in the tweets of 冯大辉 (Feng Dahui), so I just put http://twitter.com/fenng as the address in the pipe. The beauty is that you can add more feeds and consolidate them in one pipe. I then published the pipe and subscribed to it in my Google Reader. Because tweets are short, I can just eyeball the subject lines, save anything interesting, and then click the “Mark all as Read” button.
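Yahoo Pipes is configured visually, but the job this pipe does (merge several feeds, filter with a regular expression, sort newest first) can be sketched in C#, the language used elsewhere on this blog. FeedItem and FeedMerger are hypothetical names, and the sketch assumes the feed items have already been fetched into memory; fetching and parsing the feeds is not shown.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical stand-in for one entry of an already fetched feed.
public sealed class FeedItem
{
    public FeedItem(string source, string title, DateTime published)
    {
        Source = source;
        Title = title;
        Published = published;
    }

    public string Source { get; }
    public string Title { get; }
    public DateTime Published { get; }
}

public static class FeedMerger
{
    // Combines several feeds into one list, keeps only items whose title matches
    // the regular expression (case-insensitive), and sorts the result newest first.
    public static List<FeedItem> Merge(IEnumerable<IEnumerable<FeedItem>> feeds, string titlePattern)
    {
        var filter = new Regex(titlePattern, RegexOptions.IgnoreCase);
        return feeds
            .SelectMany(feed => feed)                  // union of all feeds
            .Where(item => filter.IsMatch(item.Title)) // regex filter
            .OrderByDescending(item => item.Published) // newest first
            .ToList();
    }
}
```

Passing an empty or always-matching pattern such as ".*" turns the filter off and just merges and sorts the feeds.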


John D. Liu’s The Lessons of the Loess Plateau

At the beginning of December 2008, I listened to John D. Liu’s presentation on an ecosystem rehabilitation project in the Loess Plateau in China. At that event, only part of the documentary was shown because of time constraints.

I corresponded with John, who graciously offered me the link to the whole film, which is 52 minutes long. I watched it on Friday and, once again, really enjoyed it.

Here is the link to it. Highly recommended! Enjoy.

http://www.earthshope.org/Lessons_of_the_Loess_Plateau.html


A night at Asia Society — 夜访亚洲协会

I taught a 5-day course at the beginning of December in New York City. Thanks to information from Danwei, I bought a ticket and went to see John D. Liu’s presentation about rebuilding the ecosystem in part of the Loess Plateau (Huangtu Plateau 黄土高原), and how that experience could benefit other countries, especially in Africa. It was a really great experience.

It was my first time at the Asia Society in New York, although I’ve come across the organization many times in my reading. I am not sure whether it was hosting a special exhibition, but the exterior and the interior, as far as I could tell, were all Mao-themed: Mao suits, Mao statues, and Mao posters all over the place. Worse, on prominent display on one side of the building was Jung Chang and Jon Halliday’s horrible, dishonest “Mao: The Unknown Story”. Now, there is no question that learning about Mao is very important to understanding contemporary Chinese history (although his influence is waning, especially among people born after the 70s), but wouldn’t it be appropriate to display something else in addition to Mao, given that he died more than 32 years ago and China has undergone tremendous changes since? Especially given the Asia Society’s stated purpose of “working to strengthen relationships and promote understanding among the people, leaders, and institutions of Asia and the United States”, that one-dimensional display was a letdown.

(Obviously, if it was hosting a special exhibition, that is a totally different matter. Its web site says the organization has a lot of interesting things in its collection, so I’d like to come back for a proper visit the next time I am in town.)

John’s presentation was excellent, and the discussion was based on a documentary he shot, “The Lessons of the Loess Plateau”. He talked about how the ecosystem of the Plateau had been altered almost completely by thousands of years of human activity, how rainwater could not be retained because of the lack of vegetation, how topsoil was easily washed into the river, and the resulting soil erosion and desertification.

In 1994, the Loess Plateau Watershed Rehabilitation Project was launched, with the World Bank providing some loans. Experts, both domestic and foreign, joined hands to provide technical and scientific assistance. Trees were planted, free-range goat grazing was banned, and low-yield farmland was taken back and planted with other vegetation, along with other land rehabilitation efforts. By all accounts, it has been very successful: the silt load in the Yellow River decreased, biodiversity is gradually making a comeback, and both farmland yields and farmers’ incomes increased.

John mentioned that the key to success, and I totally concur, was that ecosystem rehabilitation and poverty alleviation were tackled together. There is a Chinese idiom, 标本兼治, meaning that both the symptoms and the root cause need to be treated together. One cannot hope for success if only one aspect of a complex issue is addressed.

The ensuing discussion was also interesting. John’s comment that corporations could game carbon dioxide trading schemes for monetary gain is very much worth pondering. As Wu Fei’s boyfriend put it, paraphrased here: “paying me for not hurting you”.

Speaking of Wu Fei, it was really nice meeting her and her boyfriend there. I bumped into them in the lobby. I thought I had met her before; it turned out that I had only met her virtually, through a Danwei interview. I enjoyed that interview because I appreciated her sense of humor. Her feelings on returning to perform in Beijing, after being in the United States for so long, also resonated with me. Anyway, we talked and reminisced a little, especially about slang from the late 80s and early 90s, like 个体户 (self-employed small business owner), 倒爷 (wheeler-dealer reselling goods for profit), and 练摊儿 (running a street stall). It’s kind of funny that we are both 个体户 in the United States nowadays. Wu Fei’s music can be found here.

Another highlight was meeting Jocelyn Ford, an NPR contributor in Beijing. Wu Fei, her boyfriend, Jocelyn, and I had dinner together at a Chinese restaurant in Chinatown. I wish the dinner could have been longer, because I am curious to learn from Jocelyn how foreign journalists generally gather news in China, whether and how their reporting is edited back at headquarters, how many of those journalists know Chinese (speaking/reading/writing), and, if they don’t, how the reporting gets done.

All in all, a great evening. I hope the rehabilitation efforts continue in the Loess Plateau and other parts of China. We need to take care of the planet, for ourselves and for generations to come.


Notes on the US Election

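After becoming a US citizen in early August, I gained the right to vote. I then went to the Cook County government to register as a voter, and took part in this year’s US presidential election. I voted for Obama for president; for both houses of Congress I voted for Democratic and Green Party members; for some Cook County offices I voted for Democrats, Greens, and Republicans. Like most voters, I knew nothing about most of the candidates on the ballot, such as some of the judges. Moreover, these candidates rarely had opponents: with only one person running, they get elected whether you vote for them or not.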

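Funnily enough, around May this year I almost went to work on a MySQL project at Obama’s campaign headquarters. The head of IT at one of my clients at the time had an inside contact there, and with a little effort it might have worked out. But I had a verbal agreement with another potential client then, so I didn’t pursue it.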

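In August, McCain picked Palin as his running mate, and at first the polls actually bounced in his favor. That vixen, winking and dropping hints, making snide remarks, stirring up the zeal of Christian fundamentalists, and egging on ignorant, shallow, self-pitying people with an inexplicable sense of superiority deep in their bones, had a bit of the air of “a hooligan who knows kung fu: unstoppable”. His Majesty (that is, yours truly) could no longer sit still. When it is time to act, act: I stepped in personally to lend young Obama a hand.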

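Since I was already a citizen, I donated a little money in my own name and then signed up to volunteer for the campaign. Applying to volunteer was interesting: campaign headquarters wanted the phone numbers and email addresses of three people to verify my information. After I provided them, I also took an online training. Since I had applied to be a database volunteer, the training was on querying the online voter database. I don’t know where the data came from, but it was very detailed, though not error-free: for most voters it had name, address, date of birth, gender, party leaning, past voting history, and so on and so forth.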

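Later they wanted database volunteers to work full time, without pay, so that didn’t work out. But I was drafted as a foot soldier for the human-wave, carpet-bombing tactics in a neighboring state: canvassing. That’s how I ended up in Wisconsin. Staff at the Milwaukee field office pulled addresses from the voter database based on predefined criteria, and we took those addresses and knocked on doors, hoping the residents would vote for Obama. Two women and I set out from my neighborhood’s Democratic Party office and drove to Milwaukee, where we knocked on doors in a mostly white suburban neighborhood. When someone was home, in my experience, most conversations were brief. Honestly, this kind of selling is a bit awkward, a lot like proselytizing. So when the Obama campaign later invited me to blitz Indiana, I declined, even though I really wanted this conservative Midwestern state to swing to Obama in the presidential race.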

A New York City subway station. The station sign includes the Chinese name “华埠” (Chinatown), because the station is in Chinatown.

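As election day drew closer, the Obama camp grew more and more confident. Like many others, I received an invitation to the election-night rally in Grant Park, for which, of course, you had to buy a ticket. I didn’t go. To this day I still frequently get emails from the Obama camp asking for money, which is getting a bit annoying.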

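Obama’s election is indeed a milestone, something to celebrate. Still, I feel that building a democratic society under the rule of law is a gradual process. Even with the framework in place, maintaining genuine democracy, or something close to it, and keeping government agencies from being manipulated by lobbyists and special interest groups through extreme scare tactics built on morality, religion, commerce, and the like, takes long and persistent effort. The experiences of the United States, Taiwan, Thailand, and others over the past few years are good cautionary examples. Personally, I think developing countries have a great deal to do in building democracy and the rule of law. Of course they should learn from other countries’ experiences and lessons, but one thing is certain: copying any single country’s model will not work. In this process, one must both cooperate with the government and stand up for one’s rights; both resist and compromise; be both cautious and bold. Writing inflammatory words alone is not enough; doing practical things for the interests of the majority of farmers and townspeople is what really matters. The road to democracy and the rule of law is our own road; after weighing the pros and cons and thinking through the approach, we have to walk it ourselves, one solid step at a time.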


Thoughts on Data Masking

Oftentimes, production data needs to be moved to other environments for testing/development purposes. However, some of that data, such as people’s names, birthdays, addresses, and account numbers, is something we don’t want testers and/or developers to see, due to privacy and regulatory concerns. Hence the need to mask it. I can certainly see this need growing over time on all database platforms. There is software out there that does this sort of task, or similar tasks such as data generation. Oracle, for instance, has had a Data Masking Pack for this purpose since 10g. Here are some of my thoughts on this topic.

One method of masking data is reshuffling, which randomly shuffles the values in the target column(s) you want to protect across different rows.
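A minimal sketch of reshuffling in C#, assuming the rows are already in memory as string arrays (for example, parsed from a CSV export). ColumnShuffler is a hypothetical name; the technique is a plain Fisher-Yates shuffle applied to one column while every other column stays in place.

```csharp
using System;
using System.Collections.Generic;

public static class ColumnShuffler
{
    // Reshuffles the values of one column across rows (Fisher-Yates).
    // Each row is a string[]; only rows[i][columnIndex] moves, so every original
    // value still exists, but it now sits in a row belonging to someone else.
    public static void ShuffleColumn(IList<string[]> rows, int columnIndex, Random rng)
    {
        for (int i = rows.Count - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1); // 0 <= j <= i
            string tmp = rows[i][columnIndex];
            rows[i][columnIndex] = rows[j][columnIndex];
            rows[j][columnIndex] = tmp;
        }
    }
}
```

Passing a seeded Random makes a masking run repeatable, which can help when comparing test results, but it also means anyone who knows the seed and the original row order can undo the shuffle.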

Another method is data generation: for the target column(s), we simply replace the values with something else.

With reshuffling, the data element obviously remains meaningful. In other words, a reshuffled account number is still a valid account number; it just belongs to a different owner now. Depending on how stringent the requirements are, this may or may not be enough.

With data generation, there is a question to consider: is the format of the generated data important to us? If yes, then some intelligence needs to be built in so that the generated data follows the format we define. For instance, a valid credit card number is typically 16 digits long, has certain prefixes and/or suffixes (the last digit is a check digit), the nth digit has a certain meaning, and so on and so forth.
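To illustrate building format intelligence into a generator, here is a sketch that produces fake card numbers which keep a chosen prefix, have a chosen length, and pass the Luhn check used by payment card numbers. CardNumberGenerator is a hypothetical name, the prefix "4" and length 16 in the test are illustrative assumptions, and System.Random is not cryptographically strong. A generated number is also not guaranteed to differ from some real card’s number.

```csharp
using System;
using System.Text;

public static class CardNumberGenerator
{
    // Luhn check digit for a payload of digits (the card number without its last digit).
    // Assumes the payload contains only the characters '0'-'9'.
    public static int LuhnCheckDigit(string payload)
    {
        int sum = 0;
        bool doubleIt = true; // rightmost payload digit gets doubled once the check digit is appended
        for (int i = payload.Length - 1; i >= 0; i--)
        {
            int d = payload[i] - '0';
            if (doubleIt)
            {
                d *= 2;
                if (d > 9) d -= 9; // same as adding the two digits of d
            }
            sum += d;
            doubleIt = !doubleIt;
        }
        return (10 - sum % 10) % 10;
    }

    // True if the last digit is the correct Luhn check digit for the rest.
    public static bool IsLuhnValid(string number) =>
        number.Length > 1 &&
        LuhnCheckDigit(number.Substring(0, number.Length - 1)) == number[number.Length - 1] - '0';

    // Keeps the prefix, fills random digits up to length - 1, then appends the check digit.
    public static string Generate(string prefix, int length, Random rng)
    {
        var sb = new StringBuilder(prefix);
        while (sb.Length < length - 1)
            sb.Append((char)('0' + rng.Next(10)));
        string payload = sb.ToString();
        return payload + LuhnCheckDigit(payload);
    }
}
```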

Another example is people’s names. Do we replace a name with random letters strung together, or do we want it to look realistic? If we want realistic names, we may have to supply a dictionary for the masking software to pull names from.
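For realistic names, a sketch of the dictionary approach: pick a first and a last name at random from lists supplied to the masking tool. NameGenerator and the sample lists in the test are hypothetical; a real dictionary would be much larger, for example loaded from a file.

```csharp
using System;
using System.Collections.Generic;

public static class NameGenerator
{
    // Builds a realistic-looking name from the supplied dictionaries.
    public static string RandomName(IReadOnlyList<string> firstNames, IReadOnlyList<string> lastNames, Random rng)
    {
        string first = firstNames[rng.Next(firstNames.Count)];
        string last = lastNames[rng.Next(lastNames.Count)];
        return first + " " + last;
    }

    // Replaces every entry of a name column in place.
    public static void MaskColumn(IList<string> names, IReadOnlyList<string> firstNames, IReadOnlyList<string> lastNames, Random rng)
    {
        for (int i = 0; i < names.Count; i++)
            names[i] = RandomName(firstNames, lastNames, rng);
    }
}
```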

In either case, we also have unique and foreign key constraints to deal with, if there are any. Where more than one schema or database is involved, the complexity increases considerably.
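One common way to keep unique and foreign keys intact when generating values is consistent masking: remember the replacement chosen for each original value, reuse it everywhere that value appears, and never hand the same replacement to two different originals. The ConsistentMasker class below is a hypothetical in-memory sketch; for millions of keys the mapping would more likely live in a database table, and the same mapping would have to be shared across every schema or database involved.

```csharp
using System;
using System.Collections.Generic;

public sealed class ConsistentMasker
{
    private readonly Dictionary<string, string> _map = new Dictionary<string, string>();
    private readonly HashSet<string> _used = new HashSet<string>();
    private readonly Func<string> _generate;

    public ConsistentMasker(Func<string> generate)
    {
        _generate = generate ?? throw new ArgumentNullException(nameof(generate));
    }

    public string Mask(string original)
    {
        string masked;
        if (_map.TryGetValue(original, out masked))
            return masked; // seen before: reuse it so parent and child rows still match

        // New value: draw until we get a replacement not handed out yet, keeping the
        // mapping one-to-one (unique keys stay unique).
        // Note: this loops forever if the generator cannot produce enough distinct values.
        do
        {
            masked = _generate();
        } while (!_used.Add(masked));

        _map[original] = masked;
        return masked;
    }
}
```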

Regardless of the method used, the performance of the masking process is important to consider. If the volume of data to be masked is small, it may not be a big deal. But if, as is often the case, you have a huge transaction table with millions and millions of rows to mask, performance is a definite concern.

One idea I am toying with for the performance issue is low-level data manipulation: in MySQL, maybe play with the rowid; in SQL Server, play around with fileid, pageid, and such.

Another way to get around it is to mask in batches. In other words, divide a big task into smaller tasks and tackle them one at a time.
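A sketch of the batching idea: split the primary key range into consecutive chunks so each chunk can be masked in its own short transaction. It assumes an integer key; BatchPlanner is a hypothetical name, and the UPDATE statement in the comment only illustrates how each range might be used.

```csharp
using System;
using System.Collections.Generic;

public static class BatchPlanner
{
    // Splits the key range [minId, maxId] into consecutive, non-overlapping batches of at
    // most batchSize keys. Each (Low, High) pair could drive one short transaction, e.g. a
    // hypothetical "UPDATE ... WHERE Id BETWEEN @low AND @high", so locks and log growth stay bounded.
    public static IEnumerable<(long Low, long High)> KeyRanges(long minId, long maxId, long batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
        for (long low = minId; low <= maxId; low += batchSize)
        {
            long high = Math.Min(low + batchSize - 1, maxId);
            yield return (low, high);
        }
    }
}
```

Gaps in the key sequence simply make some batches smaller than batchSize, which is usually acceptable.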

Personally, I like the idea of data reshuffling. On one hand, the data element stays meaningful; I know I don’t want to work with randomly generated gibberish that makes no sense to me. On the other hand, if one wants to do performance testing in a test or development environment, one would like the data distribution to be as close to production as possible. Because reshuffling only moves existing values between rows, each masked column keeps exactly the same set of values as production, so the distribution stays pretty close to that of production (what gets lost is the correlation with other columns).

In my next entry, I will share a simple C# program I wrote to reshuffle data inside a CSV file.


Random oddity

Recently I wrote a program to reshuffle data in a CSV file, and I ran into a problem with Random in C#.

Let’s look at the following program. One would think that it prints 5 integers, each randomly picked between 0 and 8. But no: in almost all cases, the program prints the same integer 5 times.

using System;
class Scaffolding
{
    static void Main(string[] args)
    {
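        for (int i = 0; i < 5; ++i)
        {
            // Anti-pattern: a new Random on every pass. On .NET Framework each instance is
            // seeded from the system clock, so instances created this close together share a seed.
            Random randomNumber = new Random();
            Console.WriteLine(randomNumber.Next(9)); // 0 through 8
        }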
    }
}

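It turns out the correct way, and according to the documentation the more efficient way, is to create a single static Random instance and use it to generate many random numbers over time, instead of repeatedly creating a new Random to generate one number. On .NET Framework, the parameterless constructor derives its seed from the system clock, which has finite resolution, so instances created in quick succession get the same seed and produce the same numbers. Like so: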

using System;
class Scaffolding
{
    // One shared instance, created once and reused for every number.
    static Random randomNumber = new Random();
    static void Main(string[] args)
    {
        for (int i = 0; i < 5; ++i)
        {
            Console.WriteLine(randomNumber.Next(9));
        }
    }
}

MSDN documentation is here.
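To see why the first program misbehaves, compare two generators created with the same explicit seed: they produce identical sequences. On .NET Framework, several parameterless instances created within the same clock tick end up in exactly this situation. (In .NET Core and later, the parameterless constructor no longer takes its seed from the clock, so the first program may not reproduce the problem there.) SeedDemo is a hypothetical helper for illustration.

```csharp
using System;

public static class SeedDemo
{
    // Draws count numbers in [0, maxExclusive) from a generator with an explicit seed.
    // Two calls with the same seed return the same array, just as two clock-seeded
    // instances created in the same tick would.
    public static int[] Draw(int seed, int count, int maxExclusive)
    {
        var rng = new Random(seed);
        var values = new int[count];
        for (int i = 0; i < count; i++)
            values[i] = rng.Next(maxExclusive);
        return values;
    }
}
```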


SQL Server database mirroring automatic failover verification

Starting with SQL Server 2005, SQL Server provides an interesting high availability option at the individual database level, called database mirroring. When it is configured in high availability mode (principal, mirror, and witness), provided that:

1. Failover Partner information is supplied in the connection string;
2. Application code knows to retry database operations;

then automatic failover will occur, and the application will follow it to the new principal.
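The “application code knows to retry” requirement can be factored into a small helper instead of a goto. This is a sketch, not part of the SqlClient API: Retry.Execute is a hypothetical name, and it retries on any exception with a fixed delay, whereas real code would probably catch SqlException only and cap the total wait to cover the failover window.

```csharp
using System;
using System.Threading;

public static class Retry
{
    // Runs operation, retrying up to maxAttempts times with a fixed delay between attempts.
    // Exceptions from earlier attempts are swallowed; the last attempt's exception propagates.
    public static T Execute<T>(Func<T> operation, int maxAttempts, TimeSpan delay)
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return operation();
            }
            catch (Exception) when (attempt < maxAttempts)
            {
                Thread.Sleep(delay); // give the mirror time to take over
            }
        }
    }
}
```

For example, Retry.Execute(() => QueryFirstName(), 60, TimeSpan.FromSeconds(1)) would make up to 60 attempts one second apart (plus whatever each attempt’s connection timeout adds), where QueryFirstName is a hypothetical method that opens a connection using the Failover Partner connection string and runs the query.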

This all sounds good, but seeing is believing. So I wrote a simple C# console program to simulate what happens when the principal database fails. The program connects to a mirrored database, grabs some data, and prints it to the console. It is a bit contrived, and I used the somewhat dreaded goto statement to retry connections, but hopefully it demonstrates the point. I tested this on VMware virtual machines, so I put in a 1-second delay.

Here is what I tested while the console program was running:

1. Manual failover through the principal’s property page. The program didn’t skip a beat. Results below:

Haidong
12/4/2008 9:21:32 PM
Haidong
12/4/2008 9:21:33 PM
Haidong
12/4/2008 9:21:34 PM
Haidong
12/4/2008 9:21:35 PM
Haidong
12/4/2008 9:21:36 PM
Haidong
12/4/2008 9:21:37 PM
Haidong
12/4/2008 9:21:38 PM

2. Stopped the SQL Server service on the principal. Similar results as above;

3. Pulled the power cable from the principal box. This time there was a noticeable delay; notice the gap of about one minute in the results below:

Haidong
12/4/2008 10:00:23 PM
Haidong
12/4/2008 10:00:24 PM
Haidong
12/4/2008 10:00:25 PM
Haidong
12/4/2008 10:01:24 PM
Haidong
12/4/2008 10:01:25 PM
Haidong
12/4/2008 10:01:26 PM
Haidong
12/4/2008 10:01:27 PM
Haidong
12/4/2008 10:01:28 PM
Haidong
12/4/2008 10:01:29 PM

All in all, I am pretty impressed. If you compile and run the program, press Ctrl+C to stop it once you’ve seen enough. Source code below.

using System;
using System.Data;
using System.Data.SqlClient;
using System.Threading;
namespace SqlDbConsole
{
    class Program
    {
        static void Main(string[] args)
        {
            // Query forever; press Ctrl+C to stop.
            while (true)
            {
                RunSimpleSql();
            }
        }
        static void RunSimpleSql()
        {
            // Failover Partner tells the client where to reconnect when the principal is gone.
            const string connectionString = "Data Source=PrincipalServer;Failover Partner=MirrorServer;Initial Catalog=MyDb;Integrated Security=SSPI;";
        PointOfRetry:
            try
            {
                // A fresh connection per attempt. The using blocks close the reader and return
                // the connection to the pool even when an exception is thrown, so a retry never
                // calls Open() on a connection that is already open.
                using (SqlConnection conn = new SqlConnection(connectionString))
                using (SqlCommand cmd = conn.CreateCommand())
                {
                    conn.Open();
                    cmd.CommandText = "SELECT TOP 1 FirstName FROM Contact";
                    cmd.CommandType = CommandType.Text;
                    using (SqlDataReader rdr = cmd.ExecuteReader())
                    {
                        while (rdr.Read())
                        {
                            Console.WriteLine((string)rdr["FirstName"]);
                            Console.WriteLine(DateTime.Now);
                        }
                    }
                }
                // 1-second pause between queries.
                Thread.Sleep(1000);
            }
            catch (Exception)
            {
                // Principal unreachable or failing over: retry until the mirror takes over.
                goto PointOfRetry;
            }
        }
    }
}


Why we fight — US government movie during World War II

I’ve read a couple of history books lately. Chinese history overall, especially the period since the mid-19th century, is fascinating to me. American history is also of great interest; I am currently working my way through a book on it.

Anyway, I noticed on Netflix a series of US military propaganda films made during World War II, called Why We Fight. One of them is The Battle of China; another, War Comes to America, was on the same DVD. I watched both a few weeks ago. There is also a documentary of the same title, produced in 2005, about the military-industrial complex. I think I will check that one out as well.

It was really interesting to look back and see how perceptions and people’s sense of history change over time, and how powerful a narrative can be when propagated through the mainstream media of superpowers: first setting the context, then being wired, translated, quoted, and commented on by media outlets all over the world. Fortunately, blogging, grassroots journalism, and the easy availability of information are changing that, probably much as printing made information widely available in Europe right before the Protestant Reformation.

Here are a few images and things I found interesting.

1. A couple of scenes in War Comes to America depict Americans reciting the Pledge of Allegiance (a daily morning routine for American schoolchildren, and a pledge commonly recited at government and court ceremonies). It was interesting to hear it without “under God”, which was added in 1954;

2. The song March of the Volunteers (义勇军进行曲), now the national anthem of the People’s Republic of China, was played a few times in the film;

3. A few enduring images were chosen to represent China. The Great Wall was among them, and so was this image of the Potala Palace in Tibet:


4. Here is a map of China, as depicted in the movie.


5. At the end of The Battle of China there was a narration. I didn’t copy down the exact sentence, but it was something to the effect of “Your Japs were the Yellow Peril and we will crush your yellow faces”. According to Wikipedia, “Yellow Peril” first referred to the Chinese, then shifted to the Japanese during Japan’s aggression and invasion of East and Southeast Asia. The fear-mongering and stereotyping of people is just incredible.

6. And finally, here is a picture of a farmer winnowing wheat or rice in the 1940s. It looked familiar and intimate, because I saw my parents and grandparents winnowing wheat like that in the 70s and early 80s. When I was growing up, cutting the wheat and carrying it to the threshing ground was all manual labor, and my mom did most of it. I did my meager share of that work too, but in my day threshing was mostly done by machines powered by electric motors. My mom told me that wheat harvesting in my hometown is mostly mechanized nowadays.



Notes on analyzing a user minidump with WinDbg

One of my students’ SQL Server 2000 SP4 instances crashed. I volunteered to see if I could get anything out of the crash by looking at the dump file with WinDbg.

A few notes:
1. The processor or Windows version that the dump file was created on does not need to match the platform on which WinDbg is run. However, you do need to provide symbol files that match the Windows edition and version the dump came from. You can have more than one symbol path; just separate them with semicolons.

Since the dump file was generated on Windows 2003 SP2, I downloaded the symbol files for it, installed them on my Windows XP SP2 laptop, and added their folder to the symbol path. So my symbols come from 2 sources: the Microsoft symbol server and the local symbols I downloaded;

2. I installed SQL Server 2000 SP4 on my laptop to match the SQL Server instance on the server the dump came from. Afterwards, I added C:\Program Files\Microsoft SQL Server\MSSQL$SQL2K\Binn to the Image File Path.

After running:
!analyze -v
~kv
lm

WinDbg points in the direction of:

INVALID_POINTER_READ_c0000005_sqlservr.exe!CIncPageMgr::FreeToMark

I’m not sure how revealing this is, since we already knew it was an access violation. I suppose that kind of information is like bread crumbs that can be helpful to Microsoft support.

Questions:

1. I thought I had all the right symbol paths, but I still got the message below in the command window. Why?
excerpt
Your debugger is not using the correct symbols
Type referenced: kernel32!pNlsUserInfo

2. It looks like viewing the memory contents at the last call could be helpful (Alt+5). Two questions related to this point:
a. Which display format is useful? To me, part of ASCII and Byte are meaningful to the naked eye.
b. How can I save the memory contents to a text file? I can analyze text much faster in VI, but I haven’t found a good way to suck the text out of the memory viewer window.

Got to retire now. I have close to 20 students to teach tomorrow. It was a fun teaching day today.

By the way, X.T.X (谢天笑 Xie Tianxiao, 冷血动物 Cold Blooded Animal), a Shandong boy like me, is not bad.

Any help regarding my WinDbg questions, dear reader?

