Monthly Archives: October 2008

Microsoft Word characters to ASCII

Microsoft Word uses some characters from the Windows-1252 character set, in the range 0x80-0x9F, that have no printable equivalent in the Latin-1 (extended ASCII) character set that Windows-1252 is derived from. Here’s a quick reference for converting them to their closest ASCII equivalents.

Written in Python:


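# Each entry is [Unicode character, ASCII replacement].
# The trailing comment gives the Windows-1252 byte value (decimal - hex).
conv = [
    [u'\u2018', "'"],  # 145 - \x91  left single quotation mark
    [u'\u2019', "'"],  # 146 - \x92  right single quotation mark
    [u'\u201C', '"'],  # 147 - \x93  left double quotation mark
    [u'\u201D', '"'],  # 148 - \x94  right double quotation mark
    [u'\u2013', "-"],  # 150 - \x96  en dash
    [u'\u2014', "-"],  # 151 - \x97  em dash
]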
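
One way the table might be applied is sketched below. The helper name word_to_ascii and the sample strings are made up for illustration and are not part of the original snippet. The mapping is repeated so the sketch runs on its own, and it assumes the text is already a Unicode string; raw Windows-1252 bytes would need to be decoded first with Python's cp1252 codec.

```python
# Same mapping as the reference table, repeated so this sketch is self-contained.
conv = [
    [u'\u2018', "'"],
    [u'\u2019', "'"],
    [u'\u201C', '"'],
    [u'\u201D', '"'],
    [u'\u2013', "-"],
    [u'\u2014', "-"],
]


def word_to_ascii(text):
    """Hypothetical helper: swap each Word 'smart' character for its ASCII stand-in."""
    for smart, plain in conv:
        text = text.replace(smart, plain)
    return text


# Text that is already a Unicode string.
sample = u'\u201CIt\u2019s done\u201D \u2013 she said'
print(word_to_ascii(sample))  # "It's done" - she said

# Raw bytes from a Windows-1252 source: decode first, then convert.
raw = b'\x93hi\x94'
print(word_to_ascii(raw.decode('cp1252')))  # "hi"
```

Looping over str.replace keeps the sketch close to the list-of-pairs layout above. For longer tables, building a dictionary and passing it to str.translate would be a reasonable alternative.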