Next Spaceship

Driving into future…

Transform All Files to UTF-8


Several years ago I wrote a blog post about recursively transforming all files in a folder from one encoding to another. Today I decided to solve this problem more completely.

I have wrapped it all into a Python egg package and uploaded it to PyPI, so anyone who wants to use it no longer needs to copy and paste code. Just install it and use it.
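A pip-installable package usually exposes a shell command through a setuptools `console_scripts` entry point. The fragment below is a minimal sketch of that mechanism, not toutf8's actual `setup.py`; the module path `toutf8.cli:main` and the version number are assumptions.

```python
# setup.py: a minimal sketch. Assumes (hypothetically) that the package has a
# module toutf8/cli.py defining a main() function.
from setuptools import setup, find_packages

setup(
    name="toutf8",
    version="0.1.0",  # placeholder version, not the published one
    packages=find_packages(),
    entry_points={
        # On install, pip generates a "toutf8" executable that calls toutf8.cli:main()
        "console_scripts": ["toutf8 = toutf8.cli:main"],
    },
)
```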

pip install toutf8

This ships with a shell command, so after installing, just type

toutf8 FILENAME

to transform a single file to UTF-8 encoding, or

toutf8 PATHNAME

to transform all files in the folder PATHNAME, recursively, to UTF-8 encoding.
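Converting a file boils down to decoding its bytes with the source encoding and re-encoding the text as UTF-8. Here is a minimal sketch of that idea using only the standard library; `convert_file` and `convert_path` are hypothetical names, not toutf8's API, and unlike the real tool the source encoding is passed in explicitly rather than detected.

```python
import os


def convert_file(path: str, source_encoding: str) -> None:
    """Rewrite one file in place as UTF-8 (hypothetical helper)."""
    with open(path, "rb") as f:
        raw = f.read()
    # Decode before opening for writing, so a wrong source encoding raises
    # UnicodeDecodeError and leaves the file untouched.
    text = raw.decode(source_encoding)
    with open(path, "wb") as f:
        # Binary mode: no newline translation, so CRLF stays CRLF.
        f.write(text.encode("utf-8"))


def convert_path(path: str, source_encoding: str) -> list:
    """Convert a single file, or every file under a folder recursively."""
    if os.path.isfile(path):
        targets = [path]
    else:
        # os.walk descends into every subfolder of `path`.
        targets = [
            os.path.join(dirpath, name)
            for dirpath, _dirnames, filenames in os.walk(path)
            for name in filenames
        ]
    for target in targets:
        convert_file(target, source_encoding)
    return targets
```

For example, `convert_path("docs", "gbk")` would rewrite every file under `docs` from GBK to UTF-8 and return the list of paths it touched.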

The script detects the source encoding automatically, so whether a file is in GBK, GB2312, GB18030, CP936, Shift_JIS, or one of the other encodings below, it will be transformed to UTF-8.

GBK        --> UTF-8
GB2312     --> UTF-8
GB18030    --> UTF-8
CP936      --> UTF-8
Shift_JIS  --> UTF-8
EUC-JP     --> UTF-8
Korean     --> UTF-8
Vietnamese --> UTF-8
UTF-16LE   --> UTF-8
UTF-16BE   --> UTF-8
UTF-32     --> UTF-8
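The detection method toutf8 uses isn't covered here, so what follows is only a simplistic stand-in. It trusts a byte-order mark when one is present, which covers the UTF-16 and UTF-32 rows, and otherwise tries a hypothetical list of candidate codecs in order. GBK, GB2312 and CP936 are handled by Python's `gb18030` codec, since GB18030 is (with minor exceptions) a superset of them. `guess_encoding` and `CANDIDATES` are illustrative names, not part of toutf8.

```python
import codecs

# Hypothetical fallback order for files without a byte-order mark; this is
# an assumption for illustration, not the order toutf8 uses.
CANDIDATES = ("utf-8", "gb18030", "shift_jis", "euc_jp", "euc_kr", "cp1258")


def guess_encoding(data: bytes) -> str:
    """Return the name of a codec that decodes `data` without errors."""
    # A byte-order mark is unambiguous. Check UTF-32 before UTF-16 because
    # the UTF-32-LE mark (FF FE 00 00) begins with the UTF-16-LE mark (FF FE).
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # No BOM: the first candidate that decodes every byte wins.
    for name in CANDIDATES:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    raise ValueError("no candidate encoding could decode the data")
```

Trial decoding cannot reliably separate encodings whose byte ranges overlap (Shift_JIS text, for instance, often also decodes as GB18030), which is why dedicated detectors such as chardet rely on statistical models instead.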

Advanced usage

Pass a regular expression as a second argument to restrict the transformation to matching files. Quote the pattern so the shell does not expand it as a glob. For example:

toutf8 PATHNAME '.*txt'
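One plausible way to apply such a filter is to walk the folder and keep only files whose names match the pattern. In this sketch `matching_files` is a hypothetical helper, and matching against the base file name with `re.fullmatch` is an assumption; toutf8 may instead match the full path or use `re.match`.

```python
import os
import re


def matching_files(root: str, pattern: str) -> list:
    """Return files under `root` whose base name fully matches `pattern`.

    Assumption: the pattern is tested against the file name only, using
    re.fullmatch; toutf8 itself may apply it differently.
    """
    regex = re.compile(pattern)
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if regex.fullmatch(name):
                found.append(os.path.join(dirpath, name))
    return sorted(found)
```

With this interpretation, `'.*txt'` selects `notes.txt` and `sub/readme.txt` but skips `index.md`.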
