Archive for December, 2010

Questions on Text processing with Python

No need to waste time on proving the importance of text processing, I suppose. Here is an automation use case I had in mind when I started my search: extracting every domain\login pair from a block of text.

Yes, I can build my own regular expressions, and I have done that in the past. But another use case is log file processing: SQL Server, Apache, MySQL, and such. So an existing module that is easy to code against, read, and maintain is better than coding everything myself. In the end, regular expressions are still going to be used, but a layer of abstraction will help productivity and maintainability.

I came across 3 modules: pyparsing, SimpleParse, and NLTK. I am curious to hear your opinions on those 3 modules, or if you have suggestions other than the 3 mentioned here:

1. How easy or difficult are these modules to learn? I haven’t tried SimpleParse or NLTK yet, but I have tried pyparsing, which looks easy to pick up, and its author, Paul McGuire, is very helpful. At first glance, NLTK might be overkill for what I do.

2. What about performance? In most of my use cases, this is probably not that important, but I’ve come across comments on StackOverflow saying that pyparsing does not perform very well when text volume is large.

3. What about support and ongoing development? Like I mentioned earlier, the author behind pyparsing seems to be very active in answering questions and incorporating new ideas.

Thanks in advance for any insights and Happy New Year!

PS, here are 2 solutions to get domain\login out with pyparsing that Paul helpfully provided when I asked:

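from pyparsing import Combine, Word, alphanums

# A login is two alphanumeric words joined by a backslash, with no whitespace in between.
grammar = Combine(Word(alphanums) + "\\" + Word(alphanums))
# Raw string, so the backslashes in the sample text are taken literally.
text = r"jwfoleow fjlwowe\jfoew lwfweolwfo\jofweojw lifewijowe"

# Solution 1: searchString returns the matched tokens.
for m in grammar.searchString(text):
    print(m[0])

# Solution 2: scanString also yields the start and end offsets of each match.
for toks, start, end in grammar.scanString(text):
    print("%d:%d:%s" % (start, end, toks[0]))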
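
For comparison, here is a minimal sketch of the hand-built regular expression route mentioned at the top of this post, using only the standard re module. It assumes a login is two runs of ASCII letters or digits joined by a backslash, mirroring Word(alphanums) in the grammar above; the names DOMAIN_LOGIN and find_domain_logins are made up for illustration.

```python
import re

# Assumption: both the domain and the login consist of ASCII letters or
# digits only, like Word(alphanums) in the pyparsing grammar.
DOMAIN_LOGIN = re.compile(r"[A-Za-z0-9]+\\[A-Za-z0-9]+")


def find_domain_logins(text):
    """Return a (start, end, match) tuple for every domain\\login in text."""
    return [(m.start(), m.end(), m.group(0)) for m in DOMAIN_LOGIN.finditer(text)]


sample = r"jwfoleow fjlwowe\jfoew lwfweolwfo\jofweojw lifewijowe"
for start, end, login in find_domain_logins(sample):
    print("%d:%d:%s" % (start, end, login))
```

This works for one small pattern, but every new log format means another expression to write and maintain, which is the reason for looking at a parsing module in the first place.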


Drop a schema and all its objects in SQL Server

Via Ranjith Kumar S, a script to drop all objects in a schema and then the schema itself in SQL Server. I made very slight modifications so that creating a stored procedure is unnecessary. All you need to do is adjust the values of the @SchemaName and @WorkTest variables.

Limitations:
1. If a table has a PK with an XML or a spatial index, it won't work
(workaround: drop that table manually and rerun the script)
2. If the schema is referenced by an XML schema collection, it won't work

Thanks Ranjith and enjoy!

declare @SchemaName varchar(100) = 'MySchema'
declare @WorkTest char(1) = 't'  -- use 'w' to work and 't' to print
declare @SQL varchar(4000)
declare @msg varchar(500)
IF OBJECT_ID('tempdb..#dropcode') IS NOT NULL DROP TABLE #dropcode
CREATE TABLE #dropcode
(
   ID int identity(1,1)
  ,SQLstatement varchar(1000)
 )
-- removes all the foreign keys that reference a PK in the target schema
 SELECT @SQL =
  'select
       '' ALTER TABLE ''+SCHEMA_NAME(fk.schema_id)+''.''+OBJECT_NAME(fk.parent_object_id)+'' DROP CONSTRAINT ''+ fk.name
  FROM sys.foreign_keys fk
  join sys.tables t on t.object_id = fk.referenced_object_id
  where t.schema_id = schema_id(''' + @SchemaName+''')
    and fk.schema_id <> t.schema_id
  order by fk.name desc'
 IF @WorkTest = 't' PRINT (@SQL )
 INSERT INTO #dropcode
 EXEC (@SQL)
 -- drop all default constraints, check constraints and Foreign Keys
 SELECT @SQL =
 'SELECT
       '' ALTER TABLE ''+schema_name(t.schema_id)+''.''+OBJECT_NAME(fk.parent_object_id)+'' DROP CONSTRAINT ''+ fk.[Name]
  FROM sys.objects fk
  join sys.tables t on t.object_id = fk.parent_object_id
  where t.schema_id = schema_id(''' + @SchemaName+''')
   and fk.type IN (''D'', ''C'', ''F'')'
 IF @WorkTest = 't' PRINT (@SQL )
 INSERT INTO #dropcode
 EXEC (@SQL)
 -- drop all other objects in order
 SELECT @SQL =
 'SELECT
      CASE WHEN SO.type=''PK'' THEN '' ALTER TABLE ''+SCHEMA_NAME(SO.schema_id)+''.''+OBJECT_NAME(SO.parent_object_id)+'' DROP CONSTRAINT ''+ SO.name
           WHEN SO.type=''U'' THEN '' DROP TABLE ''+SCHEMA_NAME(SO.schema_id)+''.''+ SO.[Name]
           WHEN SO.type=''V'' THEN '' DROP VIEW  ''+SCHEMA_NAME(SO.schema_id)+''.''+ SO.[Name]
           WHEN SO.type=''P'' THEN '' DROP PROCEDURE  ''+SCHEMA_NAME(SO.schema_id)+''.''+ SO.[Name]
           WHEN SO.type=''TR'' THEN ''  DROP TRIGGER  ''+SCHEMA_NAME(SO.schema_id)+''.''+ SO.[Name]
           WHEN SO.type  IN (''FN'', ''TF'',''IF'',''FS'',''FT'') THEN '' DROP FUNCTION  ''+SCHEMA_NAME(SO.schema_id)+''.''+ SO.[Name]
       END
FROM SYS.OBJECTS SO
WHERE SO.schema_id = schema_id('''+ @SchemaName +''')
  AND SO.type IN (''PK'', ''FN'', ''TF'', ''IF'', ''FS'', ''FT'', ''TR'', ''V'', ''U'', ''P'')
ORDER BY CASE WHEN type = ''PK'' THEN 1
              WHEN type in (''FN'', ''TF'', ''P'',''IF'',''FS'',''FT'') THEN 2
              WHEN type = ''TR'' THEN 3
              WHEN type = ''V'' THEN 4
              WHEN type = ''U'' THEN 5
            ELSE 6
          END'
IF @WorkTest = 't' PRINT (@SQL )
INSERT INTO #dropcode
EXEC (@SQL)
DECLARE @ID int, @statement varchar(1000)
DECLARE statement_cursor CURSOR
FOR SELECT SQLStatement
      FROM #dropcode
  ORDER BY ID ASC
 OPEN statement_cursor
 FETCH statement_cursor INTO @statement
 WHILE (@@FETCH_STATUS = 0)
 BEGIN
 IF @WorkTest = 't' PRINT (@statement)
 ELSE
  BEGIN
    PRINT (@statement)
    EXEC(@statement)
  END
 FETCH statement_cursor INTO @statement
END
CLOSE statement_cursor
DEALLOCATE statement_cursor
IF @WorkTest = 't' PRINT ('DROP SCHEMA '+@SchemaName)
ELSE
 BEGIN
   PRINT ('DROP SCHEMA '+@SchemaName)
   EXEC ('DROP SCHEMA '+@SchemaName)
 END
PRINT '------- ALL - DONE -------'
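
The ordering rule in the third query is the part most worth understanding, so here is a minimal Python sketch of it: primary keys go first, then functions and procedures, triggers, views, and finally tables. It is only an illustration of the ORDER BY logic, not a replacement for the script; the function name drop_statements and the object names in the usage lines are hypothetical.

```python
# Drop priority copied from the ORDER BY in the third query; anything
# unknown sorts last (6), as in the script's ELSE branch.
DROP_PRIORITY = {"PK": 1, "FN": 2, "TF": 2, "P": 2, "IF": 2, "FS": 2, "FT": 2,
                 "TR": 3, "V": 4, "U": 5}

# sys.objects type code -> keyword used in the DROP statement.
DROP_KEYWORD = {"U": "TABLE", "V": "VIEW", "P": "PROCEDURE", "TR": "TRIGGER",
                "FN": "FUNCTION", "TF": "FUNCTION", "IF": "FUNCTION",
                "FS": "FUNCTION", "FT": "FUNCTION"}


def drop_statements(schema, objects):
    """Build drop statements in dependency-friendly order.

    objects is an iterable of (type_code, name, parent_table) tuples;
    parent_table is only used for primary keys.
    """
    ordered = sorted(objects, key=lambda obj: DROP_PRIORITY.get(obj[0], 6))
    statements = []
    for type_code, name, parent in ordered:
        if type_code == "PK":
            statements.append(
                "ALTER TABLE %s.%s DROP CONSTRAINT %s" % (schema, parent, name))
        elif type_code in DROP_KEYWORD:
            statements.append(
                "DROP %s %s.%s" % (DROP_KEYWORD[type_code], schema, name))
    return statements


# Hypothetical objects, listed in the "wrong" order on purpose.
example = [("U", "Orders", None), ("V", "vOrders", None),
           ("PK", "PK_Orders", "Orders"), ("P", "usp_GetOrders", None)]
for statement in drop_statements("MySchema", example):
    print(statement)
```

Tables come last because views, triggers, and constraints can all depend on them, which is the same reason the script deals with foreign keys before anything else.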


Miscellaneous notes, 2010-12-29

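In late November and early December I went to Seoul, Tokyo, and Beijing on a business deal, and I hope there will be more opportunities like that. During my first few days in Beijing I strolled around the capital with my parents, saw a show, and had roast duck and mutton hot pot to my heart's content. When I find the time I will share some of what I saw, along with photos.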

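After coming back from my hometown, I spent a few more days in Beijing on my own. At the Sanlian Bookstore, west of the Central Art Museum and north of Longfu Temple, I sat for more than six hours. I like this bookstore: neither too big nor too small, and the service is good. I read sitting on the edge of the stairs with other customers, and walked, leafed, and browsed among the stacks of books downstairs. I am rarely in the company of so many Chinese books, and from head to toe I felt comfortable, happy, at home, and relaxed. How to put it: a bit like the feeling that "an old friend has come back". I had the same feeling when tasting tea in a tea shop before buying: smelling the elegance and fragrance of jasmine, sensing the warmth and courtesy of people who truly love tea, half closing my eyes, taking a light sip, and savoring that delicate fragrance, as if all the beauty and love of our China were condensed right there.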

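I also did some market research on e-books, but the market does not feel mature yet. In particular, e-reader makers and publishers still have a lot of homework to do on getting the sales channels and logistics flowing for both old and new books; in other words, the supply of books is a big problem. If one day there is an e-reader a bit cheaper than the Kindle that can buy books in any language from any country over a wireless network, that would be fantastic, even though the feel of a paper book cannot be replaced. Speaking for myself, never mind wireless purchases: I would be thankful just to be able to buy books in my mother tongue online over a wired connection.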

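Since e-books are not mature, I bought some real paper books to bring back, heavy as they were. Once home, I quickly finished Lao She's "Er Ma" (Mr. Ma and Son), which left me moved and sighing. Lao She was truly great. I strongly recommend this book to anyone who wants to understand Eastern and Western culture and psychology. From his descriptions of London's street scenes, social customs, and conversations, you can tell his English was very good!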

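With the year coming to an end, I also found time to read some things online, mainly related to WikiLeaks, which made me even angrier about some of the things China and the United States do.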

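1. The British newspaper The Guardian published some of the leaked cables of the US State Department (the foreign ministry). Here are the ones about China. I read all of these documents and came away with many thoughts and mixed feelings.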

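2. The TV debate between Glenn Greenwald and the CNN hosts, and his commentary. By the way, I personally think that reading and listening to news from mainstream American media (CNN, VOA (correction: VOA is by no means mainstream inside the US; it is merely the US government's mouthpiece for overseas propaganda and agitation), MSNBC, AP, Reuters, Fox News) to learn English writing and to get information is not a good idea, because there is too much formulaic writing. Everyone knows how laughable, evasive, and rigid many of the mainstream propaganda outlets in the Celestial Empire currently are (of course there is some real material), but that does not mean the US, Germany, France, Japan, Taiwan, Hong Kong, and the UK do not play the same game (of course there is some real material there too). It feels as if the mainstream of the press corps, Chinese and foreign alike, has rotted beyond saving, with few good ones left. The writing of people who are not journalists seems to have more depth and substance.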

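3. Glenn Greenwald's piece on the conduct of the people involved at Wired magazine and its reporters. Wired should publish the full chat logs between Adrian Lamo and Bradley Manning. I read the part that has already been published, and I have gained more respect for Glenn Greenwald and for the young man named Manning.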


"Gu Ju Ao Ke Gao Te Ni Te Ao" (God Jul och Gott Nytt År)

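Another Christmas, and then 2011 arrives. As I have said here before, I cannot agree with the common Chinese translation 圣诞 ("holy birth"), because the character 圣 ("holy") places the holiday above Buddhism and Islam. So here I follow the usage in Taiwan and call it 耶诞 ("Jesus' birth").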

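In college I celebrated Christmas just to join in the fun. But for me personally, it was not until my first trip to Sweden in 1998 that I understood how important this holiday is to Westerners. That was the first and the last time I met my mother-in-law. She died of cancer a few months after my son was born. Christmas is to most Westerners what the Spring Festival is to most Chinese.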

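Like the Spring Festival, Christmas comes with lots of good food. Many people like nicely boxed small chocolates in assorted flavors. In Sweden, Christmas food includes pickled herring, ham, Jansson's temptation (which I render in Chinese as 烟森的诱惑, although 烟熏, "smoked", is sometimes a more fitting rendering; it is a casserole of potatoes, onions, small fish, cheese, and so on, mixed together and baked in the oven), red cabbage, meatballs, beet root, sausage, bread, and a thick rice porridge made with a good deal of milk. The rice porridge is mixed with whipped cream, sugar, and so on and eaten as dessert after the meal. At the home of my wife's grandmother I have also eaten horse meat.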

Christmas dinner 2010

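Watching my father-in-law make the ham is rather interesting. He first boils a big piece of meat, with leeks, onions, and a little garlic in the pot. When the boiling is done, he actually throws away the leeks, onions, and garlic, which quietly pains me, because to me those are the tasty parts. Later I joked about this with my wife, and she laughed out loud.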

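During the Christmas season, every Swede drinks a soft drink called julmust (which I spell out phonetically in Chinese as 有了某司特). It is sold only before Christmas, and I like it. Some years ago in Sweden I got hooked on it. Stuck in Sweden because of a visa problem, I lived with my father-in-law for two weeks, which was a little awkward and a little funny. He asked me what I wanted to drink, and I said julmust would be nice. He told me it is hard to find after the holiday, and indeed we could not find any.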

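Swedes also celebrate Christmas by watching a lot of TV galas with songs and other performances. At least in the shows I saw, Swedish TV camera operators also like the "slow fade" technique that I remember from gala shows back in China. By "slow fade", a term of my own, I mean that when the picture switches between two or more cameras, the earlier image fades out slowly, so the new and old images overlap for a moment. In the mid-to-late 1980s my family bought a 14-inch black-and-white TV; it must have been around the time I was about to start junior high, after Shandong TV had finished airing "Huo Yuanjia" and "Ikkyū-san". Adjusting those telescopic and loop-shaped VHF and UHF antennas by hand, working out the angle, the distance from the wall, where people should stand, and so on, was rather fun. It marked the beginning of China entering the wicked screen age, just like other countries. The first few times I saw this "slow fade", especially amid the play of light and shadow at a gala, where the lights formed glittering, swaying crosses, my eyes lit up and I quietly marveled!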

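Swedes celebrate the holiday on December 24, not the 25th. At around three in the afternoon, the whole country, in a thoroughly festive mood, watches an hour of Donald Duck on TV together. At four, tomten, the Swedish Santa Claus (played by a family member), knocks on the door to bring presents for the children. In Sweden, as in the US and other Western countries, Christmas presents have a profound effect on how children grow up, and on their disappointments.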

Christmas tree 2010

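It has been many years since we spent Christmas in Sweden. But to keep Swedish culture alive and pass it on to our child, we cook Swedish Christmas dishes every year, and we always attend the Christmas dinner that Swedish Americans hold at the museum in Chicago. My wife watches Swedish TV and listens to Swedish radio online to ease her homesickness, which is nice.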

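There are also many other Swedish traditions such as St. Lucia and Star boy, ginger cookies, and the sweet wine called glögg (served warm, with raisins, nuts, and some spices). I will stop here. Dear reader, if you celebrate too, I wish you a merry Christmas and a happy New Year. God Jul och Gott Nytt År!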


Packages needed for building MySQL/MariaDB/Percona

On a stock desktop install of Linux, these packages seem to be required in order to build MySQL and its MariaDB and Percona forks:

gcc
gcc-c++
automake
libtool
bison
ncurses (Thanks Justin!)

Install them with apt-get, yum, rpm, emerge, or whatever your distribution uses before running configure, make, and such. I was originally missing one that I thought had “curse” or something like that in its name; it turned out to be ncurses, which is now in the list above.
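
Before running configure, a quick check for the build tools can save a failed build. The sketch below is only an illustration: it assumes the packages above install executables named gcc, g++, automake, libtool, and bison (on some distributions the libtool package ships libtoolize instead), and it cannot see ncurses, which is a library rather than a command. The name missing_tools is made up for this example.

```python
import shutil

# Assumption: executable names matching the package list above;
# gcc-c++ is assumed to provide g++. ncurses is a library and is
# not covered by a PATH check like this one.
BUILD_TOOLS = ["gcc", "g++", "automake", "libtool", "bison"]


def missing_tools(tools=BUILD_TOOLS):
    """Return the tools from the list that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Install these before running configure: " + ", ".join(missing))
else:
    print("All build tools found on PATH.")
```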


SSH without typing password using public key

I have copied a public key (id_rsa.pub, id_dsa.pub, or identity.pub) to a remote host and appended it to the authorized_keys (or authorized_keys2) file many times in the past, but the last time I did that was a bit over 2 months ago. Today I wanted to write a Python script with paramiko that backs up my blog database and copies it to my new home Linux machine. So I thought I should put a note here on setting up an ssh connection that uses a public key instead of a typed password. Here is what I did on my machine.

Note: The machine you operate on might already have a public key generated. Look for identity.pub, id_rsa.pub, or id_dsa.pub under ~/.ssh/. In my case, my machine is fairly new and I hadn’t generated one. ssh-keygen asks for a passphrase and I didn’t provide any.

ssh-keygen
ssh-copy-id -i ~/.ssh/id_rsa.pub login@remoteHost

ssh-copy-id takes care of the whole key copying and appending business, which is nice. In my search, I also came across sshfs that can mount remote file system, which looks intriguing and useful.
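
To make the "copying and appending business" concrete, here is a minimal Python sketch of just the appending step as it would happen on the remote host. It is not how ssh-copy-id itself is implemented (that is a shell script working over ssh); the function name append_public_key is made up, and the permission bits are the commonly recommended ones for ~/.ssh and authorized_keys.

```python
import os


def append_public_key(pubkey_line, authorized_keys_path):
    """Append a public key line unless it is already present.

    Returns True if the key was added, False if it was already there.
    """
    key = pubkey_line.strip()
    content = ""
    if os.path.exists(authorized_keys_path):
        with open(authorized_keys_path) as handle:
            content = handle.read()
    if key in (line.strip() for line in content.splitlines()):
        return False  # already authorized, nothing to do

    parent = os.path.dirname(authorized_keys_path)
    if parent:
        # Create the .ssh directory if needed, readable by the owner only.
        os.makedirs(parent, mode=0o700, exist_ok=True)
    with open(authorized_keys_path, "a") as handle:
        if content and not content.endswith("\n"):
            handle.write("\n")  # keep the new key on its own line
        handle.write(key + "\n")
    os.chmod(authorized_keys_path, 0o600)
    return True
```

On the remote host this would be called with the contents of id_rsa.pub and the path to ~/.ssh/authorized_keys. Running it twice with the same key leaves the file unchanged, which is the behavior you want from a setup script.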

Additional resources:
SSH with Keys in a console window
ssh-copy-id man page.


Getting rid of “Welcome to Internet Explorer 8” screen

Chrome lacks an equivalent of Firefox's Vimperator, which keeps me from switching to it completely. Because I work in different environments, with different clients and operating systems, I have to use Internet Explorer sometimes.

And I find the “Welcome to Internet Explorer 8” screen terribly irritating. It appears every time IE starts if you did not follow Microsoft's command to configure IE the first time you started it. It has a message like this in the window: “Internet Explorer 8 helps you use the Internet even faster than before. New features like search suggestions retrieve blah blah…”. Would you please respect the end user’s intelligence, get out of the way, and leave him/her alone in peace, quiet, and solitude? Sure, one can follow the wizard and set things up, but it feels like being violated. The ability to customize things is good, but not under your dictation.

Behold, there is a way! Come and follow my way, dear reader, for it leads to enlightenment and eternal happiness:

1. Start -> Run
2. gpedit.msc
3. Navigate to User Configuration -> Administrative Templates -> Windows Components -> Internet Explorer -> Prevent performance of First Run Customize settings
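4. Double-click it, set it to Enabled, and choose to go directly to the home page. (Leaving it Disabled or Not Configured lets the first-run wizard appear.)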
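
On Windows editions that do not ship gpedit.msc, the policy from step 3 can presumably be set through the registry instead. The sketch below only generates the text of a .reg file; it assumes the policy is stored as the DWORD value DisableFirstRunCustomize under the per-user Internet Explorer policy key, with 1 meaning "skip the wizard and go to the home page". Check that assumption against Microsoft's documentation before importing anything. The function name first_run_reg_file is made up.

```python
# Assumption: registry location and value name behind the Group Policy
# setting "Prevent performance of First Run Customize settings".
POLICY_KEY = r"HKEY_CURRENT_USER\Software\Policies\Microsoft\Internet Explorer\Main"
VALUE_NAME = "DisableFirstRunCustomize"


def first_run_reg_file(value=1):
    """Return the text of a .reg file that sets the first-run policy value."""
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        "[%s]" % POLICY_KEY,
        '"%s"=dword:%08x' % (VALUE_NAME, value),
        "",
    ]
    return "\r\n".join(lines)


print(first_run_reg_file())
```

Saving the output as a .reg file and double-clicking it would import the value; nothing in the sketch touches the registry directly.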

