Live Space Mover

Microsoft has announced an official Live Space to WordPress.com migration, so the recommended way now is to use that official function. If you want to move a Live Space to a self-hosted WordPress, create a blog on WordPress.com as a bridge: export from the WordPress.com blog, then import into your self-hosted WordPress.

Thanks to everyone who cared about this script ;)

A Python script for importing blog entries from Live Space into WordPress.

With Google blog converter you may also be able to move from Live Space to a Blogger/TypePad/Movable Type/LiveJournal blog. (See the Google blog converter note under Notes if you want to use this script with it.)

Tested on Python 2.5/2.6 and Windows XP/Ubuntu Linux.

Based on the wonderful HTML parser library BeautifulSoup.

Hosted on Google Code. To check out the source code with svn:

svn checkout http://live-space-mover.googlecode.com/svn/trunk/ live-space-mover

If you have any problems using this script, feel free to contact me: weiwei9 AT gmail dot com

User Guide

  1. Install the Python runtime and Beautiful Soup. I have tested two combinations:
    1. Python Runtime 2.5.2 and Beautiful Soup 3.0.6
    2. Python Runtime 2.5.1 and Beautiful Soup 3.0.4, plus the small sgmllib.py fix described in the UnicodeDecodeError note under Notes

    Place the file BeautifulSoup.py in the same directory as live-space-mover.py, or install it into your Python runtime yourself.

  2. Download the newest release zip from the hosted page and extract it. (Older versions may no longer work because of HTML changes in Live Space.)
  3. Change your Live Space settings:
    1. Make sure it is open to anyone (not only to your contacts).
    2. Set the time zone to match your WordPress blog.
    3. Set the date format to yyyy/mm/dd or mm/dd/yyyy. The available formats probably depend on the locale setting of your system or browser; the point is to make the year appear in your dates. If the program fails and complains about date parsing, use the -t option to specify the date and time format. For example, the time on my space is shown like “9:45 PM”; if yours is shown like “9:45:15 PM”, use a command line like this:

      python live-space-mover.py -s http://yourspaceid.spaces.live.com/ -t "%m/%d/%Y %I:%M:%S %p"

      An introduction to the time format parameters is available here.

    4. Set “Blog entry date display” to “Show the blog entry date in the header”
    5. From user feedback, I noticed that Live Space themes differ slightly in structure, which may make this program fail. So please change your Live Space theme to “Journey” (the same as my test space).
  4. Run the live-space-mover.py script. On Windows, open the command line (press Win+R, type “cmd” and press Enter), change to the directory containing live-space-mover.py (use “c:” or “d:” to change drive and the “cd” command to change directory; search the web if you need help), and run a command like this:

    python live-space-mover.py -s http://yourspaceid.spaces.live.com/

    Replace the example parameter with your own. This generates an XML file named “export_xxxxx.xml” in the same directory as the script, in WordPress export file format.
  5. Use the import function in WordPress to import the XML file generated in the previous step. Remember to choose the “WordPress” type on the import page, rather than “LiveJournal” or anything else.
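
The -t option in step 3 takes a date and time pattern. As a rough illustration of how such a pattern matches (or fails to match) the text shown on a space, here is a small Python sketch using the standard datetime.strptime. It assumes the pattern is interpreted with the usual strptime directives. The sample date string, the no-seconds pattern and the helper name try_parse are made up for illustration and are not taken from the script.

```python
from datetime import datetime

# Pattern from the -t example in step 3: date, then 12-hour time with seconds.
WITH_SECONDS = "%m/%d/%Y %I:%M:%S %p"
# Hypothetical pattern for a space that shows times like "9:45 PM" (no seconds).
WITHOUT_SECONDS = "%m/%d/%Y %I:%M %p"


def try_parse(text, pattern):
    """Return a datetime if text matches pattern, otherwise None."""
    try:
        return datetime.strptime(text, pattern)
    except ValueError:
        # strptime raises ValueError when the text does not fit the pattern.
        return None


sample = "12/25/2008 9:45:15 PM"  # made-up date string for illustration
parsed_ok = try_parse(sample, WITH_SECONDS)
parsed_bad = try_parse(sample, WITHOUT_SECONDS)

print(parsed_ok)   # a datetime value when the pattern fits
print(parsed_bad)  # None when the pattern lacks the seconds field
```

If your space shows dates in another layout, you can try candidate patterns against a date copied from your own blog in the same way before passing one to -t.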

Notes

  • A known limitation: the script can’t fetch comments after the first page!
  • If you get a “UnicodeDecodeError”, that’s probably because your Live Space contains Italian or another language with non-ASCII characters. This is caused by a bug in Python 2.5, and you need to fix it. Yes, fix the Python library with your own hands :P
    If you installed Python to its default path on Windows, all you need to do is edit the file C:\Python25\Lib\sgmllib.py, where line 394
    if not 0 <= n <= 255:
    should be changed to
    if not 0 <= n <= 127:
    That’s all. I learned this from here.
  • If you want to use Google blog converter with this script, the recommended way is to open a new blog on WordPress.com or any other WordPress-powered blog service provider, import the XML generated by this script, export a new XML file with WordPress’s built-in export function, and feed that new XML to Google blog converter. I can’t guarantee that the XML exported by this script meets Google blog converter’s requirements. Also remember to set the time zone to UTC everywhere: on Live Space, on the WordPress blog, and on the machine used to run this script. Thanks to 1nm for sharing experiences with Google blog converter.
  • This mover depends heavily on some very odd and fragile patterns in the HTML and JavaScript code of Live Space, so it may stop working at any time. If that happens, please let me know.
  • As far as I can tell, the metaWeblog API in WordPress does not seem to support comments. WordPress also supports two other XML-RPC interfaces, Blogger and MovableType. The Blogger API has been updated to GData, and the old API does not appear to support comments either. The documentation of the MovableType API is so complex that I can’t understand it yet. So it may be much easier to write a mover in PHP that can handle comments.
  • This script may generate a log file and a cache file in the working directory. If you run into errors, it would be very helpful to send me the log file and the error message. Thank you.
  • You can use command

    python live-space-mover.py -help

    to check the other options of this script.
  • Since version 1.0, the suggested method is to export an XML file and then import it. Posting directly through the MetaWeblog interface has been deprecated, but it is still included in the release package for anyone who needs it.
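
For readers curious about the UnicodeDecodeError note above: the error text reported by users ('ascii' codec can't decode byte 0xb7 ... ordinal not in range(128)) is what Python raises when a byte above 127 is decoded as ASCII. The following sketch reproduces only that decoding failure on a current Python 3 interpreter. It is not the script's code or sgmllib's, and the helper name decode_as_ascii is made up for illustration.

```python
def decode_as_ascii(raw):
    """Try to decode raw bytes as ASCII; report the reason on failure."""
    try:
        return raw.decode("ascii")
    except UnicodeDecodeError as exc:
        # exc.reason holds the short explanation shown at the end of the error.
        return "failed: %s" % exc.reason


# 0xB7 is the byte named in the reported traceback; it is above 127.
single_byte = bytes([0xB7])
result = decode_as_ascii(single_byte)

# The same number treated as a Unicode code point is a valid character.
as_text = chr(0xB7)

print(result)
print(repr(as_text))
```

As I understand it, limiting the check in sgmllib.py to 127 keeps character references above 127 from being turned into single non-ASCII bytes, which is why the one-line change avoids the error. Treat that explanation as my reading of the bug rather than a confirmed description.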

Change Log

* Version 1.8
– CHG: Catch up with changes of live space

* Version 1.7.6
– CHG: Modify exported date format to be compatible with WordPress2Blogger converter.

* Version 1.7.5
– BUG: Handled the weird format of comment date box

* Version 1.7.4
– BUG: Fixed the comments order problem reported by Sun Yue

* Version 1.7.3
– BUG: Fixed the problem when comment author name contains emoticons

* Version 1.7.2
– BUG: Fixed the pubDate of post item for WP 2.7

* Version 1.7.1
– BUG: Fixed the comment author missing

* Version 1.7
– CHG: Catch up with changes of live space in Dec 2008

* Version 1.6
– CHG: Catch up with changes of Live Space
– BUG: Fix the bug “can’t scan domain name with hyphen when comments are more than 20”

* Version 1.5
– CHG: Catch up with changes of Live Space
– BUG: Escaped special chars for XML
– NEW: Improved error logging when parsing error

* Version 1.4
– BUG: Converted unicode numbers (in category name, entry title and comment author) to unicode string. The bug of duplicate categories in WP 2.3 was solved by this.

* Version 1.3
– NEW: Support category exporting by setting header field ‘User-Agent’ to Firefox

* Version 1.2
– Catch up with some changes of live space

* Version 1.1
– BUG: Error when title is empty
– NEW: Add cache and resume ability

* Version 1.0
– Use XML file and import function of WordPress, instead of MetaWeblog and post
– Change some fetching codes according to the code changes of live space
– Fixed a bug of extracting email address of comment author

* Version 0.93
– Add Donate Link

* Version 0.92
– Fix some bugs

* Version 0.9
– NEW: Support moving comments. Add file “my-wp-comments-post.php” for posting comments
– NEW: Add running modes, for only moving posts/comments, or both

* Version 0.2
– BUG: Error when reading live space in Italian or other languages. Actually it’s a bug of Python 2.5.
– BUG: Doesn’t jump out loop after moving the oldest entry.
– NEW: Support date format pattern specifying, added -t option
– NEW: Support starting from a specified entry, added -f option

* Version 0.1
– NEW: Starting, used to move my own live space

Thanks

Many thanks to Michele Nasti and Oliver Diaz Herrera, who used this script, reported bugs to me and helped me solve them. I’m not a patient guy, and I don’t have many blogs to test this script with. It’s them, the nice users, who made this script really usable.

It’s so wonderful to cooperate with guys all around the world ;-p

203 comments

  1. How can I solve the problem below?

    LINE 570 : ERROR Unexpected error
    Traceback (most recent call last):
      File "live-space-mover.py", line 568, in <module>
        main()
      File "live-space-mover.py", line 491, in main
        i=fetchEntry(permalink,datetimepattern,mode)
      File "live-space-mover.py", line 69, in fetchEntry
        soup = BeautifulSoup(page)
      File "D:\Python25\BeautifulSoup.py", line 1282, in __init__
        BeautifulStoneSoup.__init__(self, *args, **kwargs)
      File "D:\Python25\BeautifulSoup.py", line 946, in __init__
        self._feed()
      File "D:\Python25\BeautifulSoup.py", line 971, in _feed
        SGMLParser.feed(self, markup)
      File "D:\Python25\lib\sgmllib.py", line 99, in feed
        self.goahead(0)
      File "D:\Python25\lib\sgmllib.py", line 133, in goahead
        k = self.parse_starttag(i)
      File "D:\Python25\lib\sgmllib.py", line 285, in parse_starttag
        self._convert_ref, attrvalue)
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xb7 in position 0: ordinal not in range(128)

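  2. Thank you for publishing such a good tool. At first I couldn’t get it to work; it kept reporting: WARNING Can’t find date
    Later I found that you must change Live Space’s blog entry date display setting to: show the blog entry date in the header.
    I thought other people might run into the same problem, so I’m leaving this comment. ^^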

  3. Hey guys, I get this error:

    giga-nanobashvilis-macbook:~ Giga$ python /Users/Giga/Downloads/live-space-mover-1/live-space-mover.py -s http://bokuchava.spaces.live.com
    LINE 450 : INFO No more entries in cache file for loading
    LINE 225 : INFO connectiong to source blog http://bokuchava.spaces.live.com
    LINE 227 : INFO connect successfully, look for 1st Permalink
    LINE 241 : INFO Found 1st Permalink http://bokuchava.spaces.live.com/blog/cns!324D52D21278190B!359.entry
    LINE 548 : ERROR Unexpected error
    Traceback (most recent call last):
      File "/Users/Giga/Downloads/live-space-mover-1/live-space-mover.py", line 546, in <module>
        main()
      File "/Users/Giga/Downloads/live-space-mover-1/live-space-mover.py", line 469, in main
        i=fetchEntry(permalink,datetimepattern,mode)
      File "/Users/Giga/Downloads/live-space-mover-1/live-space-mover.py", line 142, in fetchEntry
        comment['email']=mailAndName['href'][len('mailto:'):]
      File "/Users/Giga/Downloads/live-space-mover-1/BeautifulSoup.py", line 430, in __getitem__
        return self._getAttrMap()[key]
    KeyError: 'href'

    can anybody help?

  4. I have some problems with the script.

    I got an “unknown url type” error, so I added a function ‘decode_htmlentities’ and changed the line
    linkNodeHref = linkNode["href"] => linkNodeHref = decode_htmlentities(linkNode["href"])

    I also got an error in all datetime.strptime and datetime.strftime calls.
    I had to rewrite them all to call time.strptime and time.strftime instead.

    Now the script works fine. Thank you.

  5. Thank you very much! You helped me a lot in migrating my space to my new WordPress blog. I would like to donate, but the button above doesn’t seem to be working. Feel free to email me.

  6. Gawd mate… at this rate, I’m going to manually transfer 3 years worth of posts… please help soon = =…. this __str__ thing is bugging me like hell

  7. Pingback: Moved « sanyam
  8. To Issa: Sorry, I’m traveling now and can’t access the web every day. If you don’t mind, please send me your Live Space address. I can do some testing and debugging after one week… ;-P

  9. Gawd mate… at this rate, I’m going to manually transfer 3 years worth of posts… please help soon = =…. this __str__ thing is bugging me like hell :(

  10. Apparently it was something to do with the post… the 11th post? so I deleted it… (making a backup of course) and it continued, I just started the tool so I’ll post if any future errors occur.

  11. Oh no… my eyes ruined the code, good I made a copy/paste, I think it’s the tags… I’m replacing the pointy tags with curly ones :)

    Hi ^^, I’m having some kind of error … it gets the recent 10 entries and then an error… my blog is in English.

    This is the error I get after it succesfully gets the 10th latest blog entry.
    Traceback {most recent call last}:
    File “blah blah\live-space-mover.py”, line 507, in {module}
    File “blah blah\live-space-mover.py”, line 436, in main
    i=fetchEntry{permalink,datetimepattern,mode>
    File “blah blah\live-space-mover.py”, line 138, in fetchEntry
    comment[‘comment’]=u” .join{map{CDAata,cmDiv.content[1].contents}}
    File “blah blah\live-space-mover.py”, line 367, in __unicode__
    return __str__{self, None}
    NameError: global name ‘__str__’ is not defined

    I hope you don’t mind I typed the error/code since I’m using Remote Desktop from my mac to pc and can’t really copy/paste (I also don’t know how to do it from CMD)

    Thank you in advance, I never thought I’d ever be able to rescue my posts from naughty Windows Live Spaces…

  12. Hi ^^, I’m having some kind of error >:
    File “blah blah\live-space-mover.py”, line 507, in
    File “blah blah\live-space-mover.py”, line 436, in main
    i=fetchEntry
    File “blah blah\live-space-mover.py”, line 138, in fetchEntry
    comment[‘comment’]=u” .join>
    File “blah blah\live-space-mover.py”, line 367, in __unicode__
    return __str__
    NameError: global name ‘__str__’ is not defined

    I hope you don’t mind I typed the error/code since I’m using Remote Desktop from my mac to pc and can’t really copy/paste (I also don’t know how to do it from CMD)

    Thank you in advance, I never though I’d ever be able to rescue my posts from naughty Windows Live Spaces… >

  13. I have searched for a long time to find a mover like this,
    but… I can’t open the BeautifulSoup.py download page.
    Can you send me a copy?

    thx a lots !

  14. You’re really cool~~ Bloggers from all over the world, like Germany and Italy, are coming to download your program!

  15. Thank you, Michele. Your site looks cool :)
    I’m working on the comments part. I hope it can be done in a few days.
