Several years ago I wrote a blog post about recursively converting all files in a folder from one encoding to another. Today I decided to solve this problem more completely.
I wrapped everything into a Python egg package and uploaded it to PyPI, so anyone who wants to use it no longer needs to copy and paste code. Just install it and use it.
pip install toutf8
This ships with a shell command, so after installing, just type
to transform a single file to UTF-8 encoding, or
to transform all files in folder PATHNAME to UTF-8 encoding.
The script detects the source encoding, so whether a file is in GBK, GB2312, GB18030, CP936 or Shift-JIS, it will be converted to UTF-8.
GBK --> UTF-8
GB2312 --> UTF-8
GB18030 --> UTF-8
CP936 --> UTF-8
Shift-JIS --> UTF-8
EUC-JP --> UTF-8
Korean --> UTF-8
Vietnamese --> UTF-8
UTF-16LE --> UTF-8
UTF-16BE --> UTF-8
UTF-32 --> UTF-8
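To give a feel for what "detect, then convert" means, here is a minimal Python sketch. It is not the code inside toutf8: the names `guess_encoding` and `to_utf8_bytes` and the candidate list are made up for illustration, and the detection is a simple stand-in (check for a byte order mark, then try decoding with each candidate in turn). Trial decoding cannot reliably tell legacy multibyte encodings apart (Shift-JIS bytes often also decode as GB18030, for example), so a real detector would need some statistical analysis on top of this.

```python
import codecs

# Byte order marks, checked longest-first so the UTF-32 LE mark
# (FF FE 00 00) is not mistaken for the UTF-16 LE mark (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(data, candidates=("utf-8", "gb18030", "shift_jis", "euc_jp")):
    """Return the first encoding that fits the bytes (hypothetical helper).

    The candidate order is an assumption; earlier entries win ties.
    """
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    for name in candidates:
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate encoding fits the data")


def to_utf8_bytes(data):
    """Decode with the guessed encoding and re-encode as UTF-8 (no BOM)."""
    return data.decode(guess_encoding(data)).encode("utf-8")
```

For example, `to_utf8_bytes("中文".encode("gbk"))` returns the same bytes as `"中文".encode("utf-8")`, because the GBK bytes are not valid UTF-8 but do decode as GB18030, which is a superset of GBK.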
You can also pass a regular expression to select which files should be transformed:
toutf8 PATHNAME .*txt
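The filtering idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not how toutf8 is implemented: `select_files` is a hypothetical name, and I assume the pattern is matched against the file name (not the full path) from its start.

```python
import os
import re


def select_files(root, pattern=None):
    """Yield paths under root whose file name matches the regex.

    With no pattern, every file is yielded. Matching the base name with
    re.match is an assumption made for this sketch.
    """
    regex = re.compile(pattern) if pattern else None
    # os.walk visits root and every subfolder, which gives the recursion.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if regex is None or regex.match(name):
                yield os.path.join(dirpath, name)
```

Calling `select_files("PATHNAME", r".*txt")` would yield only the files whose names match `.*txt`. When typing a pattern on the command line, quoting it (for example '.*txt') keeps the shell from treating it as a glob.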