<?xml version="1.0" encoding="UTF-8"?> <rss
version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:wfw="http://wellformedweb.org/CommentAPI/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
> <channel><title>The Ji Village News &#187; Technology</title> <atom:link href="http://www.haidongji.com/category/technology/feed/" rel="self" type="application/rss+xml" /><link>http://www.haidongji.com</link> <description>季庄新闻--Haidong Ji's Blog</description> <lastBuildDate>Mon, 30 Jan 2012 02:41:37 +0000</lastBuildDate> <language>en</language> <sy:updatePeriod>hourly</sy:updatePeriod> <sy:updateFrequency>1</sy:updateFrequency> <generator>http://wordpress.org/?v=3.1.2</generator> <item><title>ALTER LOGIN after Windows user or group name has been changed</title><link>http://www.haidongji.com/2012/01/29/alter-login-after-windows-user-or-group-name-has-been-changed/</link> <comments>http://www.haidongji.com/2012/01/29/alter-login-after-windows-user-or-group-name-has-been-changed/#comments</comments> <pubDate>Mon, 30 Jan 2012 02:41:37 +0000</pubDate> <dc:creator>Haidong Ji</dc:creator> <category><![CDATA[SQLServer]]></category> <category><![CDATA[Technology]]></category> <guid
isPermaLink="false">http://www.haidongji.com/?p=1287</guid> <description><![CDATA[If a Windows AD group or user has been renamed, and if that group or user was granted access to SQL Server in the past, then you can use ALTER LOGIN to rename the login inside of SQL Server: ALTER LOGIN [myDomain\oldName] WITH NAME = [myDomain\newName] It is not necessary to adjust user names in [...]]]></description> <content:encoded><![CDATA[<p>If a Windows AD group or user has been renamed, and that group or user was granted access to SQL Server in the past, you can use ALTER LOGIN to rename the login inside SQL Server:</p><pre class="brush: sql">
ALTER LOGIN [myDomain\oldName] WITH NAME = [myDomain\newName];
</pre><p>It is not necessary to rename the corresponding users in the databases this login has access to, but you may want to do so for consistency. Here is the command, run in each such database:</p><pre class="brush: sql">
ALTER USER [myDomain\oldName] WITH NAME = [myDomain\newName];
</pre><p>Note that renaming an AD user or group does not change its SID, and SQL Server identifies Windows logins by SID, so the existing login keeps working; the rename above just keeps the name in sync. You can check an AD user&#8217;s or group&#8217;s SID with psgetsid, part of the very handy <a
href="http://technet.microsoft.com/en-us/sysinternals/bb842062">Sysinternals tool suite</a>.</p> ]]></content:encoded> <wfw:commentRss>http://www.haidongji.com/2012/01/29/alter-login-after-windows-user-or-group-name-has-been-changed/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>Notes from My Velocity 2011 China Trip</title><link>http://www.haidongji.com/2011/12/18/velocity-2011%e4%b8%ad%e5%9b%bd%e8%a1%8c%e9%9a%8f%e8%ae%b0/</link> <comments>http://www.haidongji.com/2011/12/18/velocity-2011%e4%b8%ad%e5%9b%bd%e8%a1%8c%e9%9a%8f%e8%ae%b0/#comments</comments> <pubDate>Sun, 18 Dec 2011 23:38:16 +0000</pubDate> <dc:creator>Haidong Ji</dc:creator> <category><![CDATA[Chinese]]></category> <category><![CDATA[Technology]]></category> <category><![CDATA[Web]]></category> <guid
isPermaLink="false">http://www.haidongji.com/?p=1275</guid> <description><![CDATA[I went to Beijing for the Velocity China conference this time, and it was a great experience. Here are some of my thoughts and impressions. For me, the most valuable part of any conference is interacting and exchanging ideas with other attendees: new developments in the industry, hands-on experience with particular technologies, online and offline resources, recommendations for good books and websites, and so on. [...]]]></description> 
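<content:encoded><![CDATA[<p>I went to Beijing for the Velocity China conference this time, and it was a great experience. Here are some of my thoughts and impressions.</p><p>For me, the most valuable part of any conference is interacting and exchanging ideas with other attendees: new developments in the industry, hands-on experience with particular technologies, online and offline resources, recommendations for good books and websites, and so on. A lot of inspiration comes from ideas sparking off each other in conversation, and some of the most useful material comes out in passing, intentionally or not. This requires the listener to keep an open mind and open ears, to really listen without interrupting (especially at key moments), and to write down these little nuggets of gold: useful tools, parameter settings, problems hit in production and how they were solved, productivity tips, very useful websites and forum posts, and so on. Once you have written them down, don&#8217;t let them slip away; follow up on them and study them. Strike while the iron is hot: only while the enthusiasm is still fresh can you turn them into something useful for yourself and your company and move up to a higher level. Sometimes coming away from a conference with just one such nugget is enough to pay for the trip, or more. So, managers, don&#8217;t be stingy about buying books for your staff and sending them to conferences. Speaking of which, if you are a manager, have you set aside funds to buy books for your employees? If so, congratulations, because I think that is something to be really proud of! Then go one step further: while keeping the finances in order, have you made the reimbursement process easier? While ensuring productivity, have you made work and life more convenient for your staff?</p><p>At the same time, you also need to give back. Only by giving back, by mentoring and helping peers and newcomers, can you build a network, and you also gain inspiration and insight along the way, because explaining and sharing something is also a way of learning it yourself. Putting an idea into words for your peers is itself a very effective way of thinking it through. Can you explain it coherently? If not, why not? Is it because you don&#8217;t understand it thoroughly enough? In the process, others, or even you yourself, may suddenly ask: why haven&#8217;t we looked at and tried this from this or that angle? Questions from peers and newcomers can also be very illuminating. In addition, and this is something I firmly believe, the best way to maximize your own interests is not to be selfish; if you don&#8217;t believe it, try it in your life and work. Thinking you are smarter than everyone, thinking you are a hotshot, looking down on others, putting on airs: that attitude only does harm and brings no benefit whatsoever. On giving back and mentoring others, I once wrote <a
href="http://www.haidongji.com/2008/02/13/in-memory-of-ken-henderson/">an article in memory of my colleague Ken Henderson</a>, quoting a short piece of his on the aging champion syndrome (which I would render in Chinese as 过气冠军症). He put it very thoroughly and incisively! English learners may want to take note: Ken&#8217;s writing is well worth studying.</p><p>All of the above is at the personal level, but the same reasoning applies to companies. Modern Internet companies need an open, interactive platform. Taking part in building such a platform, and keeping it lively and growing, benefits a company greatly: it raises the company&#8217;s profile, its employees&#8217; morale, and its reputation in the industry, and such a platform is also a source of high-quality talent.</p><p>Judged by interaction, communication, openness, and sharing, I think the organizers of this Velocity conference did a very successful job. Taobao, its employees, and O&#8217;Reilly investing money, time, and people to run Velocity is very meaningful and commendable. An open, interactive platform like this needs companies large and small to take part, creating a virtuous cycle so that together we can make it bigger and more vibrant. A rising tide lifts all boats, and many hands make a bigger fire; isn&#8217;t that the point? Taobao leads the event now, but I hope Baidu, Tencent, NetEase, Sina, Huawei, Google China, Yahoo China, Microsoft China, and other companies will join in.</p><p>I gave a basic talk on InnoDB status at the conference and attended several other talks. The English keynotes had simultaneous interpretation into Chinese; by my estimate, fewer than a third of the audience used the interpretation headsets. Unfortunately, I didn&#8217;t get a chance to try it. I did get to try the Chinese-to-English interpretation, and it didn&#8217;t seem very good. I&#8217;d love to hear what others thought of the English-to-Chinese interpretation.</p><p>Some of the tools and methods Steve Souders talked about were quite interesting. Wensong Zhang&#8217;s talk on green computing was also a highlight. I attended the <a
href="http://tengine.taobao.org/">introduction to Tengine</a>, which is based on nginx, by Shudu and Qingwu of Taobao. For me this was a highlight, because every company, large or small, needs web servers, and Apache and lighttpd both seem somewhat past their prime: neither their bulk nor their performance under heavy load is satisfying. nginx, and Tengine, which builds on and improves it, really impressed me. I joined the Tengine mailing list and sense that its popularity, especially in the Chinese-speaking community, is slowly rising. That is a welcome development, and I am very optimistic about it.</p><p>By the way, <a
href="http://code.taobao.org/">Taobao&#8217;s open source software can be found here</a>. A lot of it looks interesting, <a
href="http://code.taobao.org/p/tsar/src/">such as tsar</a>. Taobao&#8217;s blogs are also great; I subscribed to the <a
href="http://rdc.taobao.com/blog/cs/">Taobao Core Systems team blog</a> and the <a
href="http://www.tbdata.org/archives">Taobao Shared Data Platform blog</a>. Many Taobao employees&#8217; personal blogs are also excellent, especially if you work with MySQL. <a
href="http://www.orczhou.com/index.php/2011/12/how-to-split-mysqldump-file/">This Perl script by Supu</a> is very handy, and <a
href="http://www.ningoo.net/">Jiangfeng&#8217;s Flashcache</a> introduction and discussion are quite illuminating. <a
href="http://www.dbunix.com/">stronghearted&#8217;s blog</a> and <a
href="http://blog.yufeng.info/">Chuba&#8217;s blog</a> are also very good.</p><p>On Tuesday evening I had a long talk with Shudu and Feng Jinghui that ranged over everything under the sun; we had a great time. Man, Shudu and Jinghui are brilliant and hilarious. After meeting Shudu, I had to conclude from the bottom of my heart: people from Shandong are really something! Around midnight, thanks to Jinghui, we visited Innovation Works. I like and admire what Kai-Fu Lee has accomplished and the influence he has, so it was great to see Innovation Works.</p><p>Liu Hongqing of Douban also gave a MapReduce talk that was full of substance. Outside the sessions, I chatted with some editors from publishing and from online IT communities, which was very interesting, and I had more in-depth MySQL discussions with Supu and Jiangfeng.</p><p>Thanks to Supu, Taomugong, Jiangfeng, and Wu Bingxi for the invitation and their warm hospitality. On Sunday evening, having just come off a 14-hour flight and more than two hours stuck in a taxi in traffic, I was exhausted and had no appetite for the delicious Yunnan wild mushroom hot pot. But I did have a very tasty Northeastern Chinese meal with <a
href="http://www.weibo.com/u/2576948750">Leon from Virident</a>. Thanks, Leon! Talking with <a
href="http://www.weibo.com/jackbillow">jackbillow</a> and <a
href="http://www.weibo.com/hellodba">hellodba</a> and hearing about their environments and applications was also very interesting. I met many other people too, but I won&#8217;t list them all; I can&#8217;t quite remember their names, since everyone goes by a Weibo handle, and Taobao folks also have martial-arts-style nicknames.</p><p>&#8212;&#8212;</p><p>Most of what was discussed at Velocity relates to open source software. I previously worked with Liu Zhongwu in China on an open source database testing tool, <a
href="https://anydbtest.codeplex.com/">AnyDbTest</a>. Most of it was written in C# by my buddy Zhongwu. <a
href="http://www.cnblogs.com/harrychinese/">Zhongwu is a formidable programmer</a>: he is equally at home with C#, Java, Oracle, SQL Server, Python, Linux shell scripting, and more, and his thinking is considerate, meticulous, and thorough. An absolute powerhouse!</p><p>&#8212;&#8212;</p><p>I arrived on the afternoon of Sunday, December 4, when Beijing was blanketed in heavy fog and air pollution. By the time the conference started, the weather had improved and it was clear and sunny. On Wednesday, looking down from the 11th floor of my hotel, I saw a PE class in progress at a nearby school. Seeing young students running on the school track under the sun and blue sky felt really good, and the gloom left by the bad weather and fatigue of the previous couple of days vanished. On Monday I had lunch by myself at Haidilao, which was very good. Haidilao&#8217;s service is so good that I hope it has a positive ripple effect on customer service at other businesses. On Wednesday evening, after the conference ended, I had a little time, so I walked west along Yuanda Road, turned right at the Fourth Ring Road and walked north for a while, passing people walking their dogs and street vendors, and then stopped at a small roadside restaurant for a rice bowl topped with stir-fried pork and garlic scapes and a small plate of celery with peanuts. It was very good. Eating and strolling like this, watching the pedestrians and the diners in the little restaurant, reading my compatriots&#8217; expressions, gestures, joys and sorrows, listening to their standard and not-so-standard Mandarin, and imagining their lives, felt very good.</p><p>Beijing pickles were another highlight: the pickled lotus root slices and peanuts were crunchy and so satisfying. One evening, in the small square in front of the Catholic church on Wangfujing, I saw people dancing ballroom dances and the cha-cha together, which was lovely. On Sunday morning I wandered over to the Dongsi Mosque, but it was closed to visitors, which was a bit of a pity. Before flying back to the US that day, I bought a few books to read.</p><p>It was my first time on a high-speed train, and it was great, with a top speed of about 310 km/h. The trip from Beijing to Zaozhuang used to take ten-plus hours; now it takes just over two. Although the railways still have their problems, overall it&#8217;s impressive!</p> ]]></content:encoded> <wfw:commentRss>http://www.haidongji.com/2011/12/18/velocity-2011%e4%b8%ad%e5%9b%bd%e8%a1%8c%e9%9a%8f%e8%ae%b0/feed/</wfw:commentRss> <slash:comments>2</slash:comments> </item> <item><title>Generating dimension data for dates</title><link>http://www.haidongji.com/2011/07/30/generating-dimension-data-for-dates/</link> <comments>http://www.haidongji.com/2011/07/30/generating-dimension-data-for-dates/#comments</comments> <pubDate>Sun, 31 Jul 2011 04:12:06 +0000</pubDate> <dc:creator>Haidong Ji</dc:creator> <category><![CDATA[Linux]]></category> <category><![CDATA[MySQL]]></category> <category><![CDATA[Oracle]]></category> <category><![CDATA[Python]]></category> <category><![CDATA[SQLServer]]></category> <category><![CDATA[Technology]]></category> <category><![CDATA[Windows]]></category> <guid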
isPermaLink="false">http://www.haidongji.com/?p=1238</guid> <description><![CDATA[Most analytical and BI databases have date dimension table(s). One frequently needs to generate and populate such data. I present a solution below for such data generation, written in Python. Please use the appropriate database driver/module for your specific database server (MySQL, SQL Server, Oracle, etc.) to populate the table. Notes: 1. It takes 2 [...]]]></description> <content:encoded><![CDATA[<p>Most analytical and BI databases have date dimension table(s). One frequently needs to generate and populate such data. I present a solution below for such data generation, written in Python. Please use the appropriate database driver/module for your specific database server (MySQL, SQL Server, Oracle, etc.) to populate the table.</p><p>Notes:</p><p>1. It takes 2 parameters, start date and end date, in YYYYMMDD format; the range is inclusive. Extensive error checking is built in, but let me know if you have comments/suggestions;</p><p>2. For each date in the range, the script produces a Python dictionary (associative array) and prints its content;</p><p>3. The output includes dayNumber: a day&#8217;s position in the year. For example, 2011-02-01 is the 32nd day of 2011, therefore its dayNumber is 32;</p><p>4. The output includes weekNumber: a week&#8217;s position in the year, based on the ISO 8601 standard. From the Python documentation: the ISO year consists of 52 or 53 full weeks, where a week starts on a Monday and ends on a Sunday. The first week of an ISO year is the first (Gregorian) calendar week of a year containing a Thursday. This is called week number 1, and the ISO year of that Thursday is the same as its Gregorian year.</p><p>So, 2011-01-01 has weekNumber 52, because it falls on a Saturday and belongs to the last ISO week of 2010.</p><p>5. The output includes weekday information as well. 4 different variations are included:<br
/> Sunday 0, Monday 1, and so on<br
/> Sunday 1, Monday 2, and so on<br
/> Monday 0, Tuesday 1, and so on<br
/> Monday 1, Tuesday 2, and so on</p><p>6. The script requires the argparse module, which comes with Python 2.7. Python versions prior to 2.7 do not include it by default, so you need to install it separately. Note that the script is written for Python 2 (it uses the print statement).</p><pre class="brush: python">
import argparse, sys, time
from datetime import date, timedelta
parser = argparse.ArgumentParser(description=&quot;Generating date dimension data&quot;)
parser.add_argument(&#039;-s&#039;, &#039;--startDate&#039;, help=&#039;Start date in YYYYMMDD format&#039;, required=True, dest=&#039;startDate&#039;)
parser.add_argument(&#039;-e&#039;, &#039;--endDate&#039;, help=&#039;end date in YYYYMMDD format&#039;, required=True, dest=&#039;endDate&#039;)
argList = parser.parse_args()
if (((not argList.startDate.isdigit()) or (not (len(argList.startDate) == 8))) or ((not argList.endDate.isdigit()) or (not (len(argList.endDate) == 8))) or (argList.startDate &gt; argList.endDate)):
	print &quot;Input(s) must be numeric in YYYYMMDD format and end date must not be earlier than start date&quot;
	sys.exit (1)
try:
	startDate = date(int(argList.startDate[0:4]), int(argList.startDate[4:6]), int(argList.startDate[6:8]))
	endDate = date(int(argList.endDate[0:4]), int(argList.endDate[4:6]), int(argList.endDate[6:8]))
except ValueError:
	print &quot;Input(s) must be valid date value in YYYYMMDD format&quot;
	sys.exit (1)
start = time.time()
while startDate &lt;= endDate:
	dateInfo = {&#039;dateYYYYMMDD&#039;: startDate.strftime(&#039;%Y%m%d&#039;), &#039;calDate&#039;: startDate.strftime(&#039;%Y-%m-%d&#039;), &#039;calDay&#039;: startDate.day, &#039;calMonth&#039;: startDate.month, &#039;calYear&#039;: startDate.year}
	dateInfo[&#039;dayOfWeekSunday0Monday1&#039;] = startDate.isoweekday() % 7
	dateInfo[&#039;dayOfWeekSunday1Monday2&#039;] = startDate.isoweekday() % 7 + 1
	dateInfo[&#039;dayOfWeekSunday6Monday0&#039;] = startDate.weekday()
	dateInfo[&#039;dayOfWeekSunday7Monday1&#039;] = startDate.isoweekday()
	dateInfo[&#039;dayNumber&#039;] = startDate.toordinal() - date(startDate.year - 1, 12, 31).toordinal()
	dateInfo[&#039;weekNumber&#039;] = startDate.isocalendar()[1]
	print dateInfo
	startDate = startDate + timedelta(1)
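</pre><p>For readers on Python 3, here is a minimal sketch of the same per-day calculation wrapped in a generator function. It is an illustration rather than the script above: the function name date_dimension_rows is hypothetical, argument parsing and validation are left out, and dayNumber is computed with timetuple().tm_yday, which gives the same 1-based day of the year as the toordinal() arithmetic above.</p><pre class="brush: python">
from datetime import date, timedelta


def date_dimension_rows(start_date, end_date):
    """Yield one date-dimension dict per day, start_date to end_date inclusive."""
    # range() excludes its upper bound, so add 1 to include end_date.
    for offset in range((end_date - start_date).days + 1):
        d = start_date + timedelta(days=offset)
        yield {
            "dateYYYYMMDD": d.strftime("%Y%m%d"),
            "calDate": d.isoformat(),  # YYYY-MM-DD
            "calDay": d.day,
            "calMonth": d.month,
            "calYear": d.year,
            # isoweekday() is Monday 1 ... Sunday 7; "% 7" maps Sunday to 0.
            "dayOfWeekSunday0Monday1": d.isoweekday() % 7,
            "dayOfWeekSunday1Monday2": d.isoweekday() % 7 + 1,
            # weekday() is Monday 0 ... Sunday 6.
            "dayOfWeekSunday6Monday0": d.weekday(),
            "dayOfWeekSunday7Monday1": d.isoweekday(),
            # tm_yday is the 1-based day of the year.
            "dayNumber": d.timetuple().tm_yday,
            # isocalendar() returns (ISO year, ISO week number, ISO weekday).
            "weekNumber": d.isocalendar()[1],
        }


if __name__ == "__main__":
    for row in date_dimension_rows(date(2011, 1, 1), date(2011, 1, 3)):
        print(row)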
</pre>]]></content:encoded> <wfw:commentRss>http://www.haidongji.com/2011/07/30/generating-dimension-data-for-dates/feed/</wfw:commentRss> <slash:comments>1</slash:comments> </item> <item><title>Sysinternals and PAL</title><link>http://www.haidongji.com/2011/07/10/sysinternals-and-pal/</link> <comments>http://www.haidongji.com/2011/07/10/sysinternals-and-pal/#comments</comments> <pubDate>Sun, 10 Jul 2011 17:19:09 +0000</pubDate> <dc:creator>Haidong Ji</dc:creator> <category><![CDATA[SQLServer]]></category> <category><![CDATA[Technology]]></category> <category><![CDATA[Windows]]></category> <guid
isPermaLink="false">http://www.haidongji.com/?p=1229</guid> <description><![CDATA[Sysinternals and PAL (Performance Analysis of Logs) are two fantastic tools for general server information gathering and troubleshooting on Windows. Sysinternals suite is a set of tools that can be downloaded freely from Microsoft. One thing that is particularly attractive about them is that they can be run directly after downloading without special installation and [...]]]></description> <content:encoded><![CDATA[<p><a
href="http://technet.microsoft.com/en-us/sysinternals/bb842062">Sysinternals</a> and <a
href="http://pal.codeplex.com/">PAL (Performance Analysis of Logs)</a> are two fantastic tools for general server information gathering and troubleshooting on Windows.</p><p>The Sysinternals suite is a set of tools that can be downloaded for free from Microsoft. One particularly attractive thing about them is that they can be run directly after downloading, with no installation step and none of the footprints a typical installation leaves on the host machine (new directories under C:\Program Files\, registry entries, data files, and what have you). I have found them very valuable and handy.</p><p>In particular, psinfo provides a good summary of the server. For example, psinfo -s -h -d lists basic system information, installed software, installed Windows hotfixes, and disk volume information.</p><p>PAL: install PAL on your test/analysis/general-purpose machine, along with the mschart control, which is a prerequisite of PAL. Here is how I used it:</p><p>1. Produce Perfmon data-gathering template files using PAL. I exported 3 template files: overview, quick overview, and SQL Server 2005/2008;</p><p>Perfmon is the general-purpose data instrumentation tool on Windows. Through Perfmon you can gather system-wide counters for things like CPU, memory, network, and disk IO. In addition, many applications, such as SQL Server and Exchange, expose application-level instrumentation data that you can collect via Perfmon as well.</p><p>It is best to have a few handy data collection templates, hence this step.</p><p>2. On the Windows server you want to monitor, import a Perfmon counter template file produced above: open a command prompt as Administrator and execute:</p><pre class="brush: text">
logman import -n templateNameIdefine -xml pathAndName2TemplateXmlFile
</pre><p>3. Open Perfmon, find the data collector set you imported, and start collecting;</p><p>4. After collection is done, copy the log file to another machine and use PAL to analyze it. 
PAL generates a very nice, intuitive report. Please don&#8217;t run PAL on the system you are diagnosing; run it somewhere else. Be patient, as PAL takes a while to churn through the data: it took 2 hours on a Rackspace cloud server with 2 CPUs and 1 GB of RAM to process a log file of about 30 MB.</p> ]]></content:encoded> <wfw:commentRss>http://www.haidongji.com/2011/07/10/sysinternals-and-pal/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>A comparison of HandlerSocket and mysql client libraries with Python</title><link>http://www.haidongji.com/2011/06/29/a-comparison-of-handlersocket-and-mysql-client-libraries-with-python/</link> <comments>http://www.haidongji.com/2011/06/29/a-comparison-of-handlersocket-and-mysql-client-libraries-with-python/#comments</comments> <pubDate>Wed, 29 Jun 2011 21:50:58 +0000</pubDate> <dc:creator>Haidong Ji</dc:creator> <category><![CDATA[Linux]]></category> <category><![CDATA[MySQL]]></category> <category><![CDATA[Python]]></category> <category><![CDATA[Technology]]></category> <guid
isPermaLink="false">http://www.haidongji.com/?p=1223</guid> <description><![CDATA[I&#8217;ve done some benchmark testing of 2 Python modules for MySQL data retrieval: MySQLdb and pyhs. MySQLdb uses MySQL&#8217;s client libraries, whereas pyhs uses HandlerSocket, which bypasses MySQL&#8217;s client libraries and SQL layer and talks to the InnoDB storage engine directly. In my testing, HandlerSocket retrieved 82% more rows than the mysql client libraries in the same amount of time. [...]]]></description> <content:encoded><![CDATA[<p>I&#8217;ve done some benchmark testing of 2 Python modules for MySQL data retrieval: MySQLdb and pyhs. <a
href="http://www.haidongji.com/2011/04/04/install-mysqldb-module-for-python/">MySQLdb</a> uses MySQL&#8217;s client libraries, whereas <a
href="http://pypi.python.org/pypi/python-handler-socket">pyhs</a> uses <a
href="http://yoshinorimatsunobu.blogspot.com/2010/10/using-mysql-as-nosql-story-for.html">HandlerSocket</a>, which bypasses MySQL&#8217;s client libraries and SQL layer and talks to the InnoDB storage engine directly. In my testing, HandlerSocket retrieved 82% more rows than the mysql client libraries in the same amount of time (about 61 seconds per run). The tests were conducted under different conditions: right after a server start with a cold cache, with a warmed-up cache after running SELECT * FROM customer, and with the execution order of the 2 Python scripts alternated. The results were fairly consistent, all falling in the same range. Below is a sample output.</p><pre class="brush: text">
root@ubuntu:~# python hanSolo.py
Using HandlerSocket, below is a report of how many customer&#039;s name and address can be retrieved based on customer key:
Seconds elapsed:  61.0000810623
Rows retrieved:  509863
root@ubuntu:~# python mclient.py
Using mysql client libraries, below is a report of how many customer&#039;s name and address can be retrieved based on customer key:
Seconds elapsed:  61.0001530647
Rows retrieved:  280120
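</pre><p>The 82% figure follows directly from the sample output: both runs lasted about 61 seconds, so comparing rows per second gives essentially the same ratio as comparing row counts. A quick check in Python 3, with the numbers copied from the output above (the variable names are only for illustration):</p><pre class="brush: python">
# Numbers copied from the sample output above.
hs_rows, hs_seconds = 509863, 61.0000810623
client_rows, client_seconds = 280120, 61.0001530647

hs_rate = hs_rows / hs_seconds              # rows per second via HandlerSocket
client_rate = client_rows / client_seconds  # rows per second via mysql client libraries
improvement = hs_rate / client_rate - 1     # relative gain of HandlerSocket

print("HandlerSocket: %.0f rows/s" % hs_rate)
print("mysql client:  %.0f rows/s" % client_rate)
print("Improvement:   %.1f%%" % (improvement * 100))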
</pre><p>Here is my setup:</p><p>Hardware and software:<br
/> 1. Rackspace Cloud server, Ubuntu 10.04 (Lucid Lynx), 1 GB of memory, 40 GB of disk space, 64-bit<br
/> Linux ubuntu 2.6.35.4-rscloud #8 SMP Mon Sep 20 15:54:33 UTC 2010 x86_64 GNU/Linux</p><p>2. Follow the <a
href="http://www.percona.com/docs/wiki/repositories:apt">instructions here</a> to add Percona&#8217;s APT repository;</p><p>3. apt-get install percona-server-client-5.5</p><p>4. apt-get install percona-server-server-5.5</p><p>Enable the HandlerSocket plugin. HandlerSocket is bundled with Percona Server 5.5, so you don&#8217;t have to download the source and configure, make, and build it yourself:</p><pre class="brush: text">
1. mysql&gt; install plugin handlersocket soname &#039;handlersocket.so&#039;;
</pre><p>2. cp /usr/share/mysql/my-large.cnf /etc/mysql/my.cnf</p><p>3. Edit /etc/mysql/my.cnf (e.g., with vim) and add the following under the [mysqld] section:</p><pre class="brush: text">
loose_handlersocket_port = 9998
loose_handlersocket_port_wr = 9999
loose_handlersocket_threads = 16
loose_handlersocket_threads_wr = 1
open_files_limit = 65535
</pre><p>4. service mysql restart</p><p>Acquire Python&#8217;s MySQLdb and pyhs modules:</p><p>1. apt-get install libmysqlclient-dev<br
/> Necessary for building Python&#8217;s MySQLdb</p><p>2. apt-get install python-dev<br
/> Python header files needed (along with gcc, make, etc.) for building Python modules</p><p>3. wget the <a
href="http://pypi.python.org/pypi/setuptools">appropriate setuptools egg</a> for your version of Python;</p><p>4. sh eggFileDownloadedFromTheStepAbove</p><p>5. easy_install MySQL-python<br
/> MySQLdb module, which uses the mysql client libraries for MySQL access</p><p>6. easy_install python-handler-socket</p><p>Prepare testing data:</p><p>1. Follow <a
href="http://www.haidongji.com/2011/03/30/data-generation-with-tpc-hs-dbgen-for-load-testing/">instructions here</a> to get dbgen compiled;</p><p>2. While at the proper directory, run<br
/> ./dbgen -v -T c<br
/> It generates a customer.tbl file with 150,000 rows;</p><p>3. Create the customer table in the test database. Here is the DDL:</p><pre class="brush: sql">
CREATE TABLE customer ( C_CUSTKEY     INTEGER NOT NULL,
C_NAME        VARCHAR(25) NOT NULL,
C_ADDRESS     VARCHAR(40) NOT NULL,
C_NATIONKEY   INTEGER NOT NULL,
C_PHONE       CHAR(15) NOT NULL,
C_ACCTBAL     DECIMAL(15,2)   NOT NULL,
C_MKTSEGMENT  CHAR(10) NOT NULL,
C_COMMENT     VARCHAR(117) NOT NULL,
primary key (C_CUSTKEY));
</pre><p>4. load data local infile &#039;/root/dbgen/customer.tbl&#039; into table customer fields terminated by &#039;|&#039; lines terminated by &#039;\n&#039;;<br
/> Adjust the file location as necessary.</p><p>Finally, here are my throwaway Python test scripts. One highlight of mclient.py is that it demonstrates how to return results as dicts with MySQLdb (via MySQLdb.cursors.DictCursor).</p><p>hanSolo.py</p><pre class="brush: python">
import time
from pyhs import Manager
hs = Manager()
start = time.time()
i = 1
j= 0
while i &lt; 150000:
	data = hs.get(&#039;test&#039;, &#039;customer&#039;, [&#039;C_CUSTKEY&#039;, &#039;C_NAME&#039;, &#039;C_ADDRESS&#039;], &#039;%s&#039; % i)
	i=i+1
	if i == 150000:
		i = 1
	end = time.time()
	j = j + 1
	if int(end - start) &gt; 60:
		break
print &quot;Using HandlerSocket, below is a report of how many customer&#039;s name and address can be retrieved based on customer key:&quot;
print &quot;Seconds elapsed: &quot;, str(end - start)
print &quot;Rows retrieved: &quot;, str(j)
</pre><p>mclient.py</p><pre class="brush: python">
import sys, time
import MySQLdb
import MySQLdb.cursors  # provides DictCursor, used below to return rows as dicts
my_host = &quot;localhost&quot;
my_user = &quot;root&quot;
my_pass = &quot;&quot;
my_db = &quot;test&quot;
try:
    db = MySQLdb.connect(host=my_host, user=my_user, passwd=my_pass, db=my_db)
except MySQLdb.Error, e:
     print &quot;Error %d: %s&quot; % (e.args[0], e.args[1])
     sys.exit (1)
cursor = db.cursor (MySQLdb.cursors.DictCursor)
i=1
j=0
start = time.time()
while i &lt; 150000:
	sql = &quot;select c_custkey, c_name, c_address from customer where c_custkey=%s&quot; % i;
	cursor.execute(sql)
	results = cursor.fetchall()
	i=i+1
	if i==150000:
		i=1
	end = time.time()
	j=j+1
	if int(end - start) &gt; 60:
		break
print &quot;Using mysql client libraries, below is a report of how many customer&#039;s name and address can be retrieved based on customer key:&quot;
print &quot;Seconds elapsed: &quot;, str(end - start)
print &quot;Rows retrieved: &quot;, str(j)
db.close()
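</pre><p>Both scripts share the same timing loop. As a sketch only (it is not what produced the numbers above), that loop could be factored into a helper that takes the lookup as a function; run_for and fake_lookup are hypothetical names, and Python 3 is assumed. Two details of the original loops are worth noting: they stop once int(end - start) exceeds 60, which is why the elapsed times above are just over 61 seconds, and they cycle through keys 1 to 149999 only. The helper below uses an explicit deadline and cycles through every key from 1 to max_key.</p><pre class="brush: python">
import itertools
import time


def run_for(lookup, duration_seconds, max_key=150000):
    """Call lookup(key) for about duration_seconds, cycling keys 1..max_key.

    Returns (calls_made, seconds_elapsed).
    """
    keys = itertools.cycle(range(1, max_key + 1))
    start = time.perf_counter()  # monotonic, high-resolution clock
    deadline = start + duration_seconds
    calls = 0
    while True:
        lookup(next(keys))
        calls += 1
        now = time.perf_counter()
        if now >= deadline:
            return calls, now - start


if __name__ == "__main__":
    def fake_lookup(key):
        # Hypothetical stand-in for hs.get(...) or cursor.execute(...) + fetchall().
        return {"C_CUSTKEY": key}

    calls, elapsed = run_for(fake_lookup, 0.1)
    print("Seconds elapsed: ", elapsed)
    print("Rows retrieved: ", calls)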
</pre>]]></content:encoded> <wfw:commentRss>http://www.haidongji.com/2011/06/29/a-comparison-of-handlersocket-and-mysql-client-libraries-with-python/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>Installing Perl DBI and DBD-mysql on Windows 64 bit</title><link>http://www.haidongji.com/2011/06/20/installing-perl-dbi-and-dbd-mysql-on-windows-64-bit/</link> <comments>http://www.haidongji.com/2011/06/20/installing-perl-dbi-and-dbd-mysql-on-windows-64-bit/#comments</comments> <pubDate>Mon, 20 Jun 2011 21:28:33 +0000</pubDate> <dc:creator>Haidong Ji</dc:creator> <category><![CDATA[MySQL]]></category> <category><![CDATA[Perl]]></category> <category><![CDATA[Technology]]></category> <category><![CDATA[Windows]]></category> <guid
isPermaLink="false">http://www.haidongji.com/?p=1214</guid> <description><![CDATA[I had trouble getting Perl DBI and DBD-mysql working on Windows in the past. In addition, on 64-bit Windows, you sometimes see recommendations to use 32-bit Perl. Today I got to test the latest 64-bit ActiveState Perl distro for Windows, version 5.12.3.1204. I tested it on Windows 2008 R2 64-bit. I am happy to report that [...]]]></description> <content:encoded><![CDATA[<p>I had trouble getting <a
href="http://www.haidongji.com/2009/05/13/activestate-perl-510-windows-xp-and-dbd-mysql/">Perl DBI and DBD-mysql working on Windows in the past</a>. In addition, on 64-bit Windows, you sometimes see recommendations to use 32-bit Perl.</p><p>Today I got to test the latest 64-bit ActiveState Perl distribution for Windows, version 5.12.3.1204, on Windows 2008 R2 64-bit. I am happy to report that it works. I am not categorically recommending 64-bit Perl on Windows, though.</p><p>Here are the steps:<br
/> 1. Get the ActiveState Perl 64-bit package for Windows and install it, following all the default options;<br
/> 2. At a command prompt, run:<br
/> cd c:\perl64\bin<br
/> ppm install DBI<br
/> ppm install DBD-mysql</p><p>To confirm, I then tested it against both Oracle&#8217;s MySQL 5.5 Community Server and MariaDB 5.2.7 on Windows with Maatkit&#8217;s mk-table-checksum, and it worked fine:</p><pre class="brush: text">
C:\Users\Administrator\Downloads\maatkit-7540\maatkit-7540\bin&gt;c:\Perl64\bin\perl.exe mk-table-checksum --databases mysql h=localhost,u=root,p=password
</pre> ]]></content:encoded> <wfw:commentRss>http://www.haidongji.com/2011/06/20/installing-perl-dbi-and-dbd-mysql-on-windows-64-bit/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>Finding long running INNODB transactions</title><link>http://www.haidongji.com/2011/05/29/finding-long-running-innodb-transactions/</link> <comments>http://www.haidongji.com/2011/05/29/finding-long-running-innodb-transactions/#comments</comments> <pubDate>Mon, 30 May 2011 03:18:47 +0000</pubDate> <dc:creator>Haidong Ji</dc:creator> <category><![CDATA[MySQL]]></category> <category><![CDATA[Python]]></category> <category><![CDATA[Technology]]></category> <guid
isPermaLink="false">http://www.haidongji.com/?p=1203</guid> <description><![CDATA[Notes: 1. For each transaction running longer than a defined threshold (in seconds), the script prints the time elapsed since the transaction started, its MySQL thread id, and the kill statement. Just copy, paste, and then execute the kill statement if you want to terminate the long transaction(s); 2. Adjust the shellCmd variable; 3. Adjust longRunningThreshold value as [...]]]></description> <content:encoded><![CDATA[<p>Notes:<br
/> 1. For each transaction running longer than a defined threshold (in seconds), the script prints the time elapsed since the transaction started, its MySQL thread id, and the kill statement. Just copy, paste, and then execute the kill statement if you want to terminate the long transaction(s);<br
/> 2. Adjust the shellCmd variable (host name, credentials, etc.) for your environment;<br
/> 3. Adjust the longRunningThreshold value as needed. It is measured in seconds;<br
/> 4. No special libraries/modules are needed, as long as there is a working mysql client;<br
/> 5. The re module is used for regex processing, so this is a good place to find examples of regular expression search and grouping. A flag variable, targetTransactionFound, helps locate the MySQL thread id once a transaction running longer than the defined threshold is found.</p><pre class="brush: python">
import re, shlex, subprocess
def runCmd(cmd):
    proc = subprocess.Popen(shlex.split(cmd), stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=False)
    out, err = proc.communicate()
    ret = proc.returncode
    return (ret, out, err)
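# Adjust hostName (and add credentials) as needed. SHOW ENGINE INNODB STATUS is the
# current form of the deprecated SHOW INNODB STATUS.
shellCmd = &#039;mysql -h hostName -e &quot;show engine innodb status\\G&quot;&#039;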
longRunningThreshold = 600
returnCode, stdOut, stdErr = runCmd(shellCmd)
targetTransactionFound = False
for line in stdOut.split(&#039;\n&#039;):
    if targetTransactionFound:
        match = re.search(r&#039;^MySQL\sthread\sid\s(\d+),&#039;, line)
        if match:
            print &#039;MySQL thread id&#039;, match.group(1), &#039; has been running for &#039;, secondsTransactionElapsed, &#039; seconds&#039;
            print &#039;To kill it, run: kill&#039;, match.group(1)
            targetTransactionFound = False
    else:
        match = re.search(r&#039;^---TRANSACTION\s\w+,\sACTIVE\s(\d+)\ssec&#039;, line)
        if match:
            if (long(match.group(1)) &gt; longRunningThreshold):
                targetTransactionFound = True
                secondsTransactionElapsed = match.group(1)
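</pre><p>To try the parsing logic without a live server, it can be separated into a function that works on the status text. This is a minimal Python 3 sketch, not the script above: find_long_transactions is a hypothetical name, and the sample status text and thread ids are made up and heavily abbreviated (real output has more lines per transaction, and the transaction header format differs between MySQL versions). Unlike the script above, it resets its state at every transaction header, so a long-running transaction without a MySQL thread id line is not paired with the next transaction&#8217;s thread id.</p><pre class="brush: python">
import re

# Same patterns as the script above.
TRX_RE = re.compile(r"^---TRANSACTION\s\w+,\sACTIVE\s(\d+)\ssec")
THREAD_RE = re.compile(r"^MySQL\sthread\sid\s(\d+),")

# Made-up, abbreviated sample of the TRANSACTIONS section.
SAMPLE_STATUS = """\
------------
TRANSACTIONS
------------
---TRANSACTION 1E1F, ACTIVE 750 sec
2 lock struct(s), heap size 376, 1 row lock(s), undo log entries 1
MySQL thread id 42, OS thread handle 0x7f2a, query id 1001 localhost root
---TRANSACTION 1E20, ACTIVE 5 sec
MySQL thread id 43, OS thread handle 0x7f2b, query id 1002 localhost root
"""


def find_long_transactions(status_text, threshold_seconds):
    """Return (thread_id, seconds_active) pairs for transactions over the threshold."""
    results = []
    pending_seconds = None  # long transaction still waiting for its thread id line
    for line in status_text.splitlines():
        trx = TRX_RE.search(line)
        if trx:
            # A new transaction section starts; remember it only if it is long-running.
            seconds = int(trx.group(1))
            pending_seconds = seconds if seconds > threshold_seconds else None
            continue
        if pending_seconds is not None:
            thread = THREAD_RE.search(line)
            if thread:
                results.append((int(thread.group(1)), pending_seconds))
                pending_seconds = None
    return results


if __name__ == "__main__":
    for thread_id, seconds in find_long_transactions(SAMPLE_STATUS, 600):
        print("MySQL thread id %d has been running for %d seconds" % (thread_id, seconds))
        print("To kill it, run: KILL %d;" % thread_id)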
</pre>]]></content:encoded> <wfw:commentRss>http://www.haidongji.com/2011/05/29/finding-long-running-innodb-transactions/feed/</wfw:commentRss> <slash:comments>3</slash:comments> </item> <item><title>Some SQL Server 2008 page compression observations</title><link>http://www.haidongji.com/2011/05/14/some-sql-server-2008-page-compression-observations/</link> <comments>http://www.haidongji.com/2011/05/14/some-sql-server-2008-page-compression-observations/#comments</comments> <pubDate>Sun, 15 May 2011 03:03:00 +0000</pubDate> <dc:creator>Haidong Ji</dc:creator> <category><![CDATA[SQLServer]]></category> <category><![CDATA[Technology]]></category> <guid
isPermaLink="false">http://www.haidongji.com/?p=1185</guid> <description><![CDATA[A few days ago I wrote about Infobright&#8217;s column-based storage engine, and compared the sizes of raw text data file, gzipped file, MyISAM files, and Infobright files. At that time, I also wanted to compare that against data compression in SQL Server 2008, which is a new feature. But the Windows cloud server instance I [...]]]></description> <content:encoded><![CDATA[<p>A few days ago I wrote about Infobright&#8217;s <a
href="http://www.haidongji.com/2011/05/12/some-notes-and-observations-on-ice-storage-engine/">column-based storage engine, and compared the sizes of the raw text data file, the gzipped file, the MyISAM files, and the Infobright files</a>. At the time, I also wanted to compare those against data compression, a new feature in SQL Server 2008, but the Windows cloud server instance I had fired up didn&#8217;t have enough disk space, so I set that aside until today.</p><p>Once again, the test data was generated with TPC-H&#8217;s dbgen tool. In fact, I followed the same steps <a
href="http://www.haidongji.com/2011/05/12/some-notes-and-observations-on-ice-storage-engine/">outlined here</a>. The total raw text file size is around 8.8 GB. I then created 2 SQL Server heap tables (no indexes), one without compression and one with page compression. The DDL I used is based on <a
href="http://blogs.msdn.com/b/sqlcat/archive/2010/07/30/loading-data-to-sql-azure-the-fast-way.aspx">DDL listed in this post</a>, without the indexes.</p><p>I used BULK INSERT to load the data. Here is the statement:</p><pre class="brush: sql">
BULK INSERT [testDb].[dbo].[LINEITEM]
FROM &#039;C:\Users\Administrator\Documents\lineitem.tbl&#039;
WITH(CHECK_CONSTRAINTS,CODEPAGE=&#039;RAW&#039;,DATAFILETYPE=&#039;char&#039;,
FIELDTERMINATOR=&#039;|&#039;,ROWTERMINATOR=&#039;0x0a&#039;)
</pre><p>I then measured the space each table takes with the sp_spaceused stored procedure.</p><p>Here are the results in GB:</p><p>Raw Text: 8.8<br
/> Raw Text After GZIP: 2.6<br
/> Uncompressed SQL Server table data size: 9.5<br
/> Compressed SQL Server table data size: 7.4</p><p>For comparison, here are the Infobright and MyISAM results on Linux:<br
/> MyISAM File Size: 7.2<br
/> Infobright File Size: 1.5</p><p>SQL Server 2008 provides a system stored procedure, sp_estimate_data_compression_savings, that estimates how large an uncompressed table or index would be if compression were applied. The estimated size of the compressed table was 5.4 GB. Compared with the actual 7.4 GB, the estimate appears to be optimistic in this case.</p> ]]></content:encoded> <wfw:commentRss>http://www.haidongji.com/2011/05/14/some-sql-server-2008-page-compression-observations/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>Some notes and observations on ICE storage engine</title><link>http://www.haidongji.com/2011/05/12/some-notes-and-observations-on-ice-storage-engine/</link> <comments>http://www.haidongji.com/2011/05/12/some-notes-and-observations-on-ice-storage-engine/#comments</comments> <pubDate>Fri, 13 May 2011 03:15:46 +0000</pubDate> <dc:creator>Haidong Ji</dc:creator> <category><![CDATA[MySQL]]></category> <guid
isPermaLink="false">http://www.haidongji.com/?p=1169</guid> <description><![CDATA[I&#8217;ve used Vertica, a commercial column-based database storage engine, and was reasonably impressed. During the O&#8217;Reilly MySQL Conference last month, I checked out Infobright&#8217;s vendor booth and talked with some users. I became curious and wanted to test it out. Infobright has a free community version (ICE, Infobright Community Edition) of its column-based storage engine that works [...]]]></description> <content:encoded><![CDATA[<p>I&#8217;ve used Vertica, a commercial column-based database storage engine, and was reasonably impressed. During the O&#8217;Reilly MySQL Conference last month, I checked out Infobright&#8217;s vendor booth and talked with some users. I became curious and wanted to test it out. Infobright has a free community version (ICE, Infobright Community Edition) of its column-based storage engine that works with MySQL, which is what I used for my testing. I have no relationship with Infobright whatsoever; I simply think column-based storage could be a disruptive technology in the BI/DW field. I&#8217;d love to hear your comments and experiences.</p><p>Here are some noteworthy points:</p><p>1. Setup is pretty easy. You can <a
href="http://www.infobright.org/wiki/Install_Guide_for_Linux/">follow the steps here</a>. Note that the package includes most of the relevant MySQL tools; a separate install of the mysql client and server is NOT needed.</p><p>The ICE package bundles the following storage engines:<br
/> BRIGHTHOUSE<br
/> MRG_MYISAM<br
/> CSV<br
/> MyISAM<br
/> MEMORY</p><p>2. I used TPC-H&#8217;s <a
href="http://www.haidongji.com/2011/03/30/data-generation-with-tpc-hs-dbgen-for-load-testing/">dbgen tool to generate data for testing</a>. The raw text file is around 8.8 GB, about 72 million rows;</p><p>3. I used a Rackspace cloud server for testing: CentOS 5.5, 64-bit, 1 GB of memory, 35 GB of hard drive space. I created 2 databases, each with one table called lineitem: one table uses the BRIGHTHOUSE storage engine, the other uses the MyISAM storage engine. The MyISAM table initially had no index;</p><p>4. On this particular Rackspace server, here is how long it took to load that amount of data into the BRIGHTHOUSE table:</p><pre class="brush: text">
# time mysql-ib infobright &lt; load.sql
real	22m46.974s
user	0m2.320s
sys	0m16.140s
</pre><p>And here is how long it took to load the same data into the MyISAM table:</p><pre class="brush: text">
# time mysql-ib test &lt; load.sql
real	6m11.966s
user	0m1.960s
sys	0m14.420s
</pre><p>Here is what&#8217;s inside load.sql:</p><pre class="brush: text">
load data local infile &#039;/root/dbgen/lineitem.tbl&#039; into table lineitem fields terminated by &#039;|&#039; lines terminated by &#039;\n&#039;;
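</pre><p>To put the two load times side by side, here is a small Python sketch that converts the real values reported by time into seconds and derives rough rows-per-second figures. The helper name real_to_seconds is made up, and the row count is the approximate 72 million from point 2, so treat the rates as ballpark numbers.</p><pre class="brush: python">
import re

# Approximate row count from point 2; the rates below are therefore rough
ROWS_LOADED = 72000000

def real_to_seconds(real):
    """Hypothetical helper: convert a real value from time, e.g. '22m46.974s', to seconds."""
    match = re.fullmatch(r'(\d+)m([\d.]+)s', real)
    if match is None:
        raise ValueError('unexpected time format: %r' % real)
    return int(match.group(1)) * 60 + float(match.group(2))

# real timings copied from the two time runs above
loads = {'BRIGHTHOUSE': '22m46.974s', 'MyISAM': '6m11.966s'}
seconds = {engine: real_to_seconds(value) for engine, value in loads.items()}

for engine, secs in seconds.items():
    print('%s: %.0f seconds, roughly %.0f rows/sec' % (engine, secs, ROWS_LOADED / secs))
print('The BRIGHTHOUSE load took %.1f times as long as the MyISAM load'
      % (seconds['BRIGHTHOUSE'] / seconds['MyISAM']))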
</pre><p>5. Size comparison, in GB. Here you can see how well a column-based storage engine compresses data:<br
/> Raw Text: 8.8<br
/> Raw Text After GZIP: 2.6<br
/> MyISAM File Size: 7.2<br
/> Infobright File Size: 1.5</p><p>6. I did a rudimentary performance comparison. The first query runs against the Infobright table, the second against the MyISAM table:</p><pre class="brush: text">
mysql&gt; select count(*) from lineitem where l_shipdate between &#039;1993-01-01&#039; and &#039;1995-01-01&#039;;
+----------+
| count(*) |
+----------+
| 21880025 |
+----------+
1 row in set (15.06 sec)
mysql&gt; use test;
Database changed
mysql&gt; select count(*) from lineitem where l_shipdate between &#039;1993-01-01&#039; and &#039;1995-01-01&#039;;
+----------+
| count(*) |
+----------+
| 21880025 |
+----------+
1 row in set (1 min 9.23 sec)
</pre><p>I then created an index on l_shipdate for the MyISAM table, which brought the query time down to a bit more than 10 seconds.</p> ]]></content:encoded> <wfw:commentRss>http://www.haidongji.com/2011/05/12/some-notes-and-observations-on-ice-storage-engine/feed/</wfw:commentRss> <slash:comments>2</slash:comments> </item> <item><title>Setting up replication with XtraBackup</title><link>http://www.haidongji.com/2011/04/19/setting-up-replication-with-xtrabackup/</link> <comments>http://www.haidongji.com/2011/04/19/setting-up-replication-with-xtrabackup/#comments</comments> <pubDate>Tue, 19 Apr 2011 21:07:20 +0000</pubDate> <dc:creator>Haidong Ji</dc:creator> <category><![CDATA[Linux]]></category> <category><![CDATA[MySQL]]></category> <category><![CDATA[Technology]]></category> <guid
isPermaLink="false">http://www.haidongji.com/?p=1155</guid> <description><![CDATA[I attended Vadim Tkachenko&#8217;s talk on XtraBackup during MySQL conference in Santa Clara last week. Backups are obviously very important, but the use case I had in mind is this: Replicating a database that has Innodb tables in it, while keeping both master and slave on line if possible. Tangent: by the way, I love [...]]]></description> <content:encoded><![CDATA[<p>I attended <a
href="http://en.oreilly.com/mysql2011/public/schedule/detail/17116">Vadim Tkachenko&#8217;s talk on XtraBackup during the MySQL Conference</a> in Santa Clara last week. Backups are obviously very important, but the use case I had in mind is this:<br
/> Replicating a database that has InnoDB tables in it, while keeping both master and slave online if possible.</p><p>Tangent: by the way, I loved the native backup utility that was once promised in MySQL 6.0, similar to SQL Server&#8217;s way of backup. It was like running &#8220;BACKUP myDb to DISK = &#8216;/backupDirectory/myDb.bak&#8217;&#8221; in the mysql client, but I digress&#8230;</p><p>I have used mysqldump to accomplish this in the past, but I wondered how XtraBackup would fare at this task, especially after hearing Vadim&#8217;s talk and reading news about Percona&#8217;s development effort. To cut to the chase, here are my conclusions; the steps to reproduce them follow immediately afterwards.<br
/> 1. innobackupex provides a consistent database backup and reports the binlog file name and position in its output, which is nice and useful for initializing a slave;<br
/> 2. It works with both MyISAM and InnoDB tables;<br
/> 3. If MyISAM tables are all you have, just run innobackupex --apply-log /directoryWhereBackupIs, move the database directory from under /directoryWhereBackupIs to your slave&#8217;s datadir, change the group and owner of that directory and its files to mysql, and you are ready to run the &#8220;change master&#8221; command and start the slave;<br
/> 4. If the database has InnoDB tables, then in addition to step 3, you also need to stop mysql on the slave, move the ibdata1 file to the datadir, restart mysql, and then run the &#8220;change master&#8230;&#8221; and &#8220;start slave&#8221; commands. <strong>It does not matter whether you are using innodb_file_per_table or not.</strong></p><p><strong>It would be nice if I could keep the slave up and running during this step when the database has InnoDB tables in it. Did I do anything wrong? Is there a better way? What if the slave already has a database with InnoDB tables, and therefore its own ibdata1? What do you do then? Should I play with Tungsten&#8217;s replication? What are the compelling reasons to use Tungsten&#8217;s replication?</strong></p><p>In any case, based on my limited testing, I think I will use innobackupex for setting up replication in the future, as long as I can afford a mysqld restart. Overall, it feels a bit easier than the mysqldump approach I&#8217;ve used in the past.</p><p>Here are the steps to reproduce:</p><p>1. Fire up 2 Rackspace CentOS 5.5 servers. Rackspace cloud servers beat Amazon EC2 servers hands down, in my view, for development/sandboxing purposes;<br
/> 2. Install the required mysql client, server, and XtraBackup on both servers;<br
/> 3. Create /etc/my.cnf by copying the sample cnf file /usr/share/my-small.cnf. At a minimum, 3 changes were necessary: log-bin=mysql-bin, server-id=a unique number, and datadir=/var/lib/mysql. The first 2 are required for replication; the last is needed by innobackupex.</p><p>While you are at it, add read-only and skip-slave-start to the slave&#8217;s configuration if appropriate. That is best practice for a read-only slave.</p><p>4. <a
href="http://www.haidongji.com/2010/12/11/ssh-without-typing-password-using-public-key/">Add the master server&#8217;s public key to authorized_keys on the slave, so the master can ssh to the slave without a password</a>.<br
/> 5. On master, run this command:</p><pre class="brush: text">
innobackupex --databases=test --stream=tar /tmp/ --slave-info | ssh root@slave &quot;tar xfi - -C /root&quot;
</pre><p>When it finishes, you should see something like this:</p><pre class="brush: text">
110419 18:54:21  innobackupex: completed OK!
tar: Read 6656 bytes from -
</pre><p>Take note of the binlog file name and log position, which are reported about 3 lines above that message, like this:</p><pre class="brush: text">
innobackupex: MySQL binlog position: filename &#039;mysql-bin.000002&#039;, position 2515
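</pre><p>Step 8 needs the file name and position from that line. As a convenience, here is a minimal Python sketch that extracts them and builds the matching CHANGE MASTER TO statement. The helper change_master_sql is hypothetical, the pattern assumes the exact message format shown above, and the values are pasted in without escaping, so use it only with trusted input.</p><pre class="brush: python">
import re

# Assumes the message format shown above; other innobackupex versions may word it differently
POSITION_RE = re.compile(r"MySQL binlog position: filename '([^']+)', position (\d+)")

def change_master_sql(output_line, host, user, password):
    """Hypothetical helper: build CHANGE MASTER TO from innobackupex's binlog position line.

    Values are interpolated without escaping, so only use trusted input.
    """
    match = POSITION_RE.search(output_line)
    if match is None:
        raise ValueError('no binlog position found in: %r' % output_line)
    log_file, log_pos = match.group(1), int(match.group(2))
    return ("CHANGE MASTER TO MASTER_HOST='%s', MASTER_USER='%s', "
            "MASTER_PASSWORD='%s', MASTER_LOG_FILE='%s', MASTER_LOG_POS=%d;"
            % (host, user, password, log_file, log_pos))

line = "innobackupex: MySQL binlog position: filename 'mysql-bin.000002', position 2515"
print(change_master_sql(line, '50.56.121.96', 'repl', 'p@ssw0rd'))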
</pre><p>6. On slave, run this command:</p><pre class="brush: text">
innobackupex --apply-log /locationWhereBackupIs
</pre><p>then, assuming the database name is test, run the 2 commands below to change the group and owner to mysql:</p><pre class="brush: text">
chgrp -R mysql test
chown -R mysql test
</pre><p>move the directory under mysqld&#8217;s datadir:</p><pre class="brush: text">
mv test/ /mysql/datadir
</pre><p>If the test database has InnoDB tables, stop mysql on the slave, copy ibdata1 into the datadir, and restart mysql.</p><p>7. On master, open up port 3306 if it is not already open, then create the replication account:</p><pre class="brush: text">
grant replication slave, replication client on *.* to repl@&#039;50.56.121.%&#039; identified by &#039;p@ssw0rd&#039;;
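</pre><p>The host part of repl@&#8216;50.56.121.%&#8217; uses MySQL&#8217;s wildcard syntax: % matches any sequence of characters and _ matches exactly one. Here is a simplified Python sketch of that matching, handy for checking which slave addresses a grant covers; host_matches and the second address are made up for illustration, and it ignores hostname resolution, netmask notation and other details of real account matching.</p><pre class="brush: python">
import re

def host_matches(pattern, host):
    """Hypothetical, simplified check of a MySQL account host pattern.

    '%' matches any run of characters, '_' matches exactly one character,
    and everything else must match literally.
    """
    regex = ''.join('.*' if ch == '%' else '.' if ch == '_' else re.escape(ch)
                    for ch in pattern)
    return re.fullmatch(regex, host) is not None

# The second address is made up to show a non-matching slave
for slave_ip in ('50.56.121.97', '50.56.122.97'):
    print(slave_ip, host_matches('50.56.121.%', slave_ip))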
</pre><p>8. On slave, run:</p><pre class="brush: text">
change master to master_host=&#039;50.56.121.96&#039;, master_user=&#039;repl&#039;, master_password=&#039;p@ssw0rd&#039;, master_log_file=&#039;see output from innobackupex backup command on master&#039;, master_log_pos=numFrominnobackupexOutputOnMaster;
start slave;
show slave status\G
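</pre><p>To confirm replication is working, the fields to look at in that output are Slave_IO_Running, Slave_SQL_Running and Seconds_Behind_Master. Here is a minimal Python sketch that parses captured \G output and applies a simple health check; the helper names, the abbreviated sample and the 60-second lag limit are assumptions for illustration.</p><pre class="brush: python">
# Hypothetical helpers for checking captured SHOW SLAVE STATUS\G output
def parse_vertical(output):
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(':')
        if sep:  # skips the "*** 1. row ***" banner, which has no colon
            fields[key.strip()] = value.strip()
    return fields

def replication_healthy(fields, max_lag=60):
    # Both replication threads must run and the lag must be known and small
    if fields.get('Slave_IO_Running') != 'Yes' or fields.get('Slave_SQL_Running') != 'Yes':
        return False
    lag = fields.get('Seconds_Behind_Master')
    if lag is None or lag == 'NULL':
        return False
    return max_lag >= int(lag)

# Abbreviated sample; real output contains many more fields
sample = """*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 50.56.121.96
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
        Seconds_Behind_Master: 0"""

print(replication_healthy(parse_vertical(sample)))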
</pre>]]></content:encoded> <wfw:commentRss>http://www.haidongji.com/2011/04/19/setting-up-replication-with-xtrabackup/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> </channel> </rss>
