Live Space to WordPress, officially…

The breaking news for me this morning: Microsoft officially announced that it will shut down the Live Space service and let all users migrate to WordPress.com. This is good news, for human beings.

And my small tool Live Space Mover will probably have accomplished its mission after serving for 3 years. I wrote this script in 2007, while I was an intern at Bosch RTC, to move my own blog posts to WordPress. Since then I have maintained it on and off, whenever I received emails from users around the world. Most updates were about keeping up with Live Space upgrades; others were just bug fixes. Several nice people sent me donations, and those were the best moments I've ever had. The largest donation ($20) came from someone with an “@microsoft.com” email address ;)

The little script went through a few significant changes in its short life. The first was the switch from publishing posts directly to the destination blog via the MetaWeblog API to exporting an XML file that users import themselves. This leveraged existing import functionality and made things much simpler. The second concerned fetching comments that span more than one page. Live Space switched its comment paging between Ajax and non-Ajax several times. In the early versions of Live Space Mover I managed to “decode” some of Live Space's Ajax functions by watching the Ajax HTTP traffic and guessing what the parameters meant. That was fun but quite time-consuming, and it made the early Live Space Mover “fully functional”. Then Live Space switched to a non-Ajax approach, and my life was much easier for a long time. Recently Live Space went back to Ajax, but I had lost interest in wasting time on this sh*t… So the current Live Space Mover can't fetch comments beyond the first page.

According to the news, Live Space has 30 million users. Maybe 10% of them will move to WordPress.com? That would still be a large number (about 3 million). For Chinese bloggers who want a blog service that isn't blocked by the GFW, a self-hosted blog (WordPress or another blog app) is the best but most expensive choice, and blog.163.com is the best free option AFAIK.

Notes from QCon

I attended the second day of QCon. The sessions that left the strongest impression were the Facebook and Twitter talks.

1. memcache@facebook
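The speaker was Marc, a senior architect at Facebook. He evidently has a lot of hands-on experience with memcache, knows every detail of its logic, and answered questions quickly. The talk covered FB's extensive modifications and extensions to memcache, which let it scale effectively and carry extremely high traffic.
Personally, I feel the complexity of the extension logic already exceeds that of the original memcache. memcache can be seen as a good foundation for building scalable, high-load applications that make full use of memory.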

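Facebook's scale: 400M active users, on the order of a billion status updates per day, tens of thousands of servers.
The memcached servers handle 400M gets and 28M sets per second, caching more than 2 trillion items, totaling over 200 TB.
A single memcached server handles 80K gets and 2K sets per second, receiving 9.7M/s and transmitting 19M/s.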

Facebook's architecture can roughly be divided into three tiers: the DB tier, the memcached tier, and the Web tier.

For memcache, FB implemented a new serialization library that is faster and more efficient than PHP serialization.

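mcproxy: at the top of the memcache tier sits a group of mcproxy servers that dispatch requests. The memcached servers are horizontally partitioned and replicated by region, and mcproxy dispatches requests according to that layout.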
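A minimal sketch of region-aware dispatch in the spirit of mcproxy follows. The talk did not describe mcproxy's actual algorithm, so the topology format, the hypothetical `McRouter` class, and the simple modulo hashing are all assumptions for illustration.

```python
import hashlib


class McRouter:
    """Hypothetical mcproxy-style router: region -> replica pools -> servers.

    Every pool in a region is assumed to hold a full copy of that region's
    keyspace (redundancy); inside a pool, keys are split across servers
    (horizontal partitioning).
    """

    def __init__(self, topology):
        self.topology = topology  # {region: [[server, ...], [server, ...]]}

    @staticmethod
    def _hash(key):
        # Stable across processes, unlike the built-in hash() for str.
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def route(self, key, region, replica=0):
        """Pick the server that owns `key` in the given region and replica pool."""
        pools = self.topology[region]
        pool = pools[replica % len(pools)]
        return pool[self._hash(key) % len(pool)]


router = McRouter({"us-west": [["mc1", "mc2", "mc3"], ["mc4", "mc5", "mc6"]]})
primary = router.route("user:42", "us-west")            # one of mc1..mc3
backup = router.route("user:42", "us-west", replica=1)  # one of mc4..mc6
```

Modulo hashing keeps the sketch short, but it reshuffles most keys whenever a pool changes size; a production router would more likely use consistent hashing.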

Hot keys (hot spots in the system, such as celebrities' pages) are replicated to multiple memcached servers.
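One common way to spread a hot key's load, sketched here under assumptions: write N copies under suffixed keys so they hash to different servers, and have each reader pick one copy at random. The `#rN` suffix scheme, `HOT_REPLICAS = 4`, and the in-memory `FakeCluster` are hypothetical stand-ins, not Facebook's implementation.

```python
import hashlib
import random

HOT_REPLICAS = 4  # hypothetical number of copies kept for a hot key


class FakeCluster:
    """In-memory stand-in for a pool of memcached servers (illustration only)."""

    def __init__(self, n_servers):
        self.servers = [{} for _ in range(n_servers)]

    def _server_for(self, key):
        h = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)
        return self.servers[h % len(self.servers)]

    def set(self, key, value):
        self._server_for(key)[key] = value

    def get(self, key):
        return self._server_for(key).get(key)


def replica_keys(key, n=HOT_REPLICAS):
    # Distinct suffixes make each copy hash independently, so copies
    # usually land on different servers.
    return [f"{key}#r{i}" for i in range(n)]


def set_hot(cluster, key, value):
    for k in replica_keys(key):
        cluster.set(k, value)


def get_hot(cluster, key, rng=random):
    # Each read picks one copy at random, spreading load across servers.
    return cluster.get(rng.choice(replica_keys(key)))


cluster = FakeCluster(8)
set_hot(cluster, "page:celebrity", "<rendered profile>")
print(get_hot(cluster, "page:celebrity"))
```

The trade-off lands on writes: every update or invalidation of a hot key must touch all N copies.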

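When a single source issues many concurrent gets, they are grouped using a "Broad Shallow Multi gets" approach, which reduces the number of get requests and therefore network traffic. Correspondingly, the memcached servers are replicated and grouped so that each batch of gets only needs to go to one group of servers.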
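The grouping idea can be sketched as: pin each client to one server group (assumed to be a full replica of the keyspace), then bucket its keys by owning server so that each server receives a single multiget. The function names, the per-client group choice, and modulo hashing are assumptions for illustration, not the scheme presented in the talk.

```python
import hashlib
from collections import defaultdict


def stable_hash(key):
    # Deterministic across processes, unlike Python's built-in hash() for str.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def plan_multigets(keys, groups, client_id):
    """Return {server: [keys]} so each server in ONE group gets one multiget.

    groups: list of server lists; every group is assumed to hold a full copy
    of the keyspace, so a client only ever talks to its own group.
    """
    servers = groups[client_id % len(groups)]
    batches = defaultdict(list)
    for key in keys:
        batches[servers[stable_hash(key) % len(servers)]].append(key)
    return dict(batches)


# Usage: 6 keys become at most 3 requests, all sent to group 1.
plan = plan_multigets([f"user:{i}" for i in range(6)],
                      groups=[["a1", "a2", "a3"], ["b1", "b2", "b3"]],
                      client_id=1)
print(plan)
```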

A lot of work went into making both key misses and deletes scale.

The talk showed the state machine for keys after these extensions; it looks quite complex.

The testing principle is "test fast and don't break things". No test framework is used.

Why memcache works: easy, robust primitives, allow hacking.

2. Big Data in Real-time at Twitter
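The speaker was Nick Kallen: a Berkeley graduate and Twitter systems architect, with a full beard and a long earring in his right ear, soft-spoken, with a rather artsy air…
The slides are at http://www.slideshare.net/nkallen/q-con-3770885.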

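This Twitter talk focused on two themes: large data volumes and real time. Together with the afternoon session, it covered four problems and their solutions:
a) Tweets. Horizontally partitioned by time, exploiting the locality that most queries hit the most recent partition. MySQL deadlocks remain a problem, though, and creating new partitions is time-consuming and laborious; planned solutions include partitioning by primary key, Cassandra, memcached, etc.
b) Timeline. Computed offline and stored in advance. All timelines are precomputed and stored in memcache; each tweet is fanned out offline to every timeline it should appear in. The stored timelines are truncated periodically to keep their size within bounds. In summary, offline computation fits when the query pattern is fixed and the offline results can be bounded in size; the cost of rebuilding those results if they are lost should also be taken into account.
c) Social graph. Information like who follows whom and who blocks whom. In short, the solution is to store each edge in both directions, then scale through partitioning, replication, and indexing. The details are fairly involved.
d) Search index. Partitioned along two dimensions: document and time. Lucene may replace MySQL here.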
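The timeline approach in b) can be sketched as fan-out on write with a length cap. This is a toy in-memory version: the follower map, the hypothetical `TimelineStore` class, the cap of 800 entries, and truncating on every insert (via `deque(maxlen=...)`) instead of periodically are all illustrative choices. In the talk the fan-out runs offline; here it is a synchronous call for brevity.

```python
from collections import defaultdict, deque

MAX_TIMELINE = 800  # hypothetical cap on stored timeline length


class TimelineStore:
    """Toy fan-out-on-write timeline cache (stand-in for memcache)."""

    def __init__(self, followers, max_len=MAX_TIMELINE):
        self.followers = followers  # {author: set of follower ids}
        # deque(maxlen=...) drops the oldest entry once the cap is reached,
        # so every timeline stays bounded.
        self.timelines = defaultdict(lambda: deque(maxlen=max_len))

    def fanout(self, author, tweet_id):
        """Push a new tweet id onto the author's and every follower's timeline."""
        for uid in {author} | self.followers.get(author, set()):
            self.timelines[uid].appendleft(tweet_id)

    def read(self, uid, count=20):
        """Newest-first page of a precomputed timeline; no query-time joins."""
        return list(self.timelines[uid])[:count]


store = TimelineStore({"alice": {"bob", "carol"}}, max_len=3)
for tweet_id in range(5):
    store.fanout("alice", tweet_id)
print(store.read("bob"))  # [4, 3, 2]: older tweets were truncated
```

Because each timeline is bounded, losing the store means replaying recent tweets into every follower's timeline, which is the rebuild cost the talk says to account for.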

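Cassandra came up in the Twitter and FB talks and in many others. From what I gathered, FB uses Cassandra for applications such as Inbox, while Twitter considers Cassandra not yet ready for critical applications.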


A Blogbus to WordPress conversion tool

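Over Chinese New Year I wrote a Blogbus-to-WordPress conversion tool for a friend. Blogbus offers export in XML format, so converting that into WordPress's format is just grunt work.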

It reuses some code from Live Space Mover, so it's also written in Python. The code is at http://code.google.com/p/blogbus-to-wordpress/

The app is hosted on Google App Engine, so it should be easy to use. Visit

http://blogbus-to-wordpress.appspot.com/

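Upload your Blogbus backup XML file there to get the converted WordPress-format file, then import it from the WordPress admin panel.

The WordPress importer supports a whole bunch of formats; make sure to choose the WordPress type.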

Indent Cucumber Step Definitions in VIM

When writing Cucumber step definition Ruby files in VIM, I noticed they couldn't be indented correctly. Here is an example:

Given /^login as "(.+?)",\s*"(.+)"$/ do |user, password|
When "goto /system/account/login"
When "browser type 'login' #{user}"
When "browser type 'password' #{password}"
When "browser click 'text-input-password'"
  Given "user #{user.strip}'s network"
end

After some experimenting I found that the root cause is the “When” keyword used in Cucumber step definitions. In a Cucumber feature file, “When” is a keyword, while in a step definition Ruby file, “When” is a predefined method name. In VIM's Ruby indent rules (on my Mac OS X, located at /usr/share/vim/vim72/indent/ruby.vim), “when” is recognized as a Ruby keyword, but the regex matching is CASE INSENSITIVE! Ruby keywords are actually case-sensitive, so the fix is to make the regular expressions in the Vim indent file case-sensitive by appending Vim's \C atom to them.
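The effect is easy to reproduce outside Vim. Below is a rough Python analogue of the `s:ruby_deindent_keywords` pattern, with Vim's `\<`/`\>` word boundaries approximated by `\b`; `re.IGNORECASE` plays the role of Vim's 'ignorecase' setting, and a case-sensitive match corresponds to appending `\C`. The names `DEINDENT` and `starts_deindent_keyword` are hypothetical.

```python
import re

# Rough Python translation of the Vim pattern
#   '^\s*\zs\<\%(ensure\|else\|rescue\|elsif\|when\|end\)\>'
# \< and \> (Vim word boundaries) are approximated by \b.
DEINDENT = r"^\s*\b(?:ensure|else|rescue|elsif|when|end)\b"


def starts_deindent_keyword(line, ignore_case):
    """True if the line would be treated as starting with a deindent keyword."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.match(DEINDENT, line, flags) is not None


step = '  When "goto /system/account/login"'
print(starts_deindent_keyword(step, ignore_case=True))   # True: Cucumber's When is mistaken for Ruby's when
print(starts_deindent_keyword(step, ignore_case=False))  # False: case-sensitive, like adding \C
```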

The diff below shows the changes:

--- /usr/share/vim/vim72/indent/ruby.vim	2009-07-14 13:28:14.000000000 +0800
+++ indent/ruby.vim	2009-12-17 14:50:02.000000000 +0800
@@ -54,11 +54,11 @@
       \ '\|while\|until\|else\|elsif\|case\|when\|unless\|begin\|ensure' .
       \ '\|rescue\)\>' .
       \ '\|\%([*+/,=-]\|<<\|>>\|:\s\)\s*\zs' .
-      \    '\<\%(if\|for\|while\|until\|case\|unless\|begin\)\>'
+      \    '\<\%(if\|for\|while\|until\|case\|unless\|begin\)\>\C'
 
 " Regex used for words that, at the start of a line, remove a level of indent.
 let s:ruby_deindent_keywords =
-      \ '^\s*\zs\<\%(ensure\|else\|rescue\|elsif\|when\|end\)\>'
+      \ '^\s*\zs\<\%(ensure\|else\|rescue\|elsif\|when\|end\)\>\C'
 
 " Regex that defines the start-match for the 'end' keyword.
 "let s:end_start_regex = '\%(^\|[^.]\)\<\%(module\|class\|def\|if\|for\|while\|until\|case\|unless\|begin\|do\)\>'
@@ -70,7 +70,7 @@
       \ '\|\<do\>'
 
 " Regex that defines the middle-match for the 'end' keyword.
-let s:end_middle_regex = '\<\%(ensure\|else\|\%(\%(^\|;\)\s*\)\@<=\<rescue\>\|when\|elsif\)\>'
+let s:end_middle_regex = '\<\%(ensure\|else\|\%(\%(^\|;\)\s*\)\@<=\<rescue\>\|when\|elsif\)\>\C'
 
 " Regex that defines the end-match for the 'end' keyword.
 let s:end_end_regex = '\%(^\|[^.:@$]\)\@<=\<end\>'

Now my VIM indents the snippet correctly, yeah!

Given /^login as "(.+?)",\s*"(.+)"$/ do |user, password|
  When "goto /system/account/login"
  When "browser type 'login' #{user}"
  When "browser type 'password' #{password}"
  When "browser click 'text-input-password'"
  Given "user #{user.strip}'s network"
end

=======================
This has since been fixed by Tim Pope, who maintains the vim-ruby scripts on GitHub. He also pointed out that the problem only happens when VIM is globally set to “ignorecase”.

Displaying Chinese MP3 titles and album art on the BlackBerry 8900

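The BlackBerry 8900's built-in player is pretty good; apart from occasionally being slow to respond, everything is fine. After some tinkering I found that it can display not only Chinese song titles but also album art, which is nice.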

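The 8900's player reads ID3v2 MP3 tags, so to display Chinese titles you need to write them into the tag encoded as UTF-8. ID3v1 tags don't support UTF-8, so it's best to write the tags as ID3v2.4 with UTF-8 encoding. There are many tools for editing MP3 tags and converting tag encodings; I recommend MP3tag. Just configure reading, writing, and removing tags in its Options, as shown below: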

MP3TAG Config
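For the curious, here is roughly what such a tag looks like on disk: a minimal Python sketch that builds an ID3v2.4 tag whose text frames (TIT2 title, TPE1 artist, TALB album) use encoding byte 0x03, i.e. UTF-8, which ID3v2.4 permits and ID3v1 has no way to express. It only builds the tag bytes; splicing them into an MP3 that may already carry an ID3v2 tag is left to a real tag editor such as MP3tag. The function names are hypothetical.

```python
def synchsafe(n):
    """Encode n as a 4-byte synchsafe integer (7 significant bits per byte)."""
    if not 0 <= n < 1 << 28:
        raise ValueError("value does not fit in a synchsafe integer")
    return bytes([(n >> 21) & 0x7F, (n >> 14) & 0x7F, (n >> 7) & 0x7F, n & 0x7F])


def text_frame(frame_id, text):
    """Build an ID3v2.4 text frame; encoding byte 0x03 means UTF-8."""
    body = b"\x03" + text.encode("utf-8")
    # Frame header: 4-byte ID, 4-byte synchsafe size (v2.4), 2 flag bytes.
    return frame_id.encode("ascii") + synchsafe(len(body)) + b"\x00\x00" + body


def id3v24_tag(title, artist, album):
    """Build a complete ID3v2.4 tag holding title, artist and album frames."""
    frames = (text_frame("TIT2", title)
              + text_frame("TPE1", artist)
              + text_frame("TALB", album))
    # Tag header: "ID3", version 2.4.0, no flags, synchsafe size of the frames.
    return b"ID3" + b"\x04\x00" + b"\x00" + synchsafe(len(frames)) + frames


tag = id3v24_tag("中文曲名", "歌手", "专辑")
```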

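The basic procedure is then: open a batch of MP3s in the program, select them all, and press Ctrl + S to save, which writes the ID3v2.4 tags. Then run remove tag once, which strips the other interfering tags, and you're done.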
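Removing an interfering ID3v1 tag is simple in principle, because ID3v1 is a fixed 128-byte block at the very end of the file that starts with the ASCII bytes `TAG`. A minimal sketch follows; the function names are hypothetical, it ignores APE tags and the rarer extended "TAG+" block, and the `TAG` check could in theory misfire on audio data that happens to contain those bytes at that offset.

```python
ID3V1_SIZE = 128


def strip_id3v1(data):
    """Return MP3 bytes without a trailing ID3v1 tag, if one is present."""
    if len(data) >= ID3V1_SIZE and data[-ID3V1_SIZE:-ID3V1_SIZE + 3] == b"TAG":
        return data[:-ID3V1_SIZE]
    return data


def strip_id3v1_file(path):
    """Rewrite the file in place without its ID3v1 tag; return bytes removed."""
    with open(path, "rb") as f:
        data = f.read()
    stripped = strip_id3v1(data)
    if len(stripped) != len(data):
        with open(path, "wb") as f:
            f.write(stripped)
    return len(data) - len(stripped)
```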

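As for album art, after reading an article on it I initially thought the cover image had to be embedded in the tag, which seemed like too much trouble, so I didn't bother. Then I discovered by accident that you only need to name the cover image Folder.jpg in the album's folder, and the 8900's built-in player will find and display it :)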

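Also note: if the BlackBerry has already scanned these MP3s, it will have generated a BBThumbs file. After rewriting the MP3 tags or changing the album art, delete this file so the BlackBerry regenerates it.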