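In Python 2, conversion between encodings uses unicode as the intermediate form: you decode first, then encode again.
decode takes bytes and decodes them into unicode using the given encoding.
encode takes unicode and encodes it into bytes using the given encoding.
Data is transmitted and stored as bytes, so a program has to decode those bytes with the correct encoding before it can display them.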
decode: From bytes To Unicode
encode: From Unicode To bytes
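As a quick illustration of the decode-then-encode route, here is a minimal sketch that converts UTF-8 bytes to GBK by way of unicode. The sample bytes and variable names are my own, not from the original project; b"..." literals are used so the snippet behaves the same on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
# UTF-8 encoded bytes for the two characters U+4E2D U+6587 ("Chinese text")
raw = b"\xe4\xb8\xad\xe6\x96\x87"

# decode: bytes -> unicode, using the encoding the bytes were written in
text = raw.decode("utf-8")

# encode: unicode -> bytes, using the target encoding
gbk_bytes = text.encode("gbk")

# decoding the GBK bytes with the matching codec gives the same text back
round_trip = gbk_bytes.decode("gbk")
```

Decoding with the wrong codec (for example, treating the GBK bytes as UTF-8) is what produces garbled output or a UnicodeDecodeError.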
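In Python 2 I tried many combinations of encode and decode, but none of them fixed the problem of Chinese text showing up in escaped \uXXXX form.
In the end it turned out the problem was in how the HTTP framework serializes JSON data.
The relevant code and docstring from Python's json module are as follows: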
def dump(obj, fp, skipkeys=False, ensure_ascii=True, check_circular=True,
        allow_nan=True, cls=None, indent=None, separators=None,
        encoding='utf-8', default=None, sort_keys=False, **kw):
    """Serialize ``obj`` as a JSON formatted stream to ``fp`` (a
    ``.write()``-supporting file-like object).

    If ``skipkeys`` is true then ``dict`` keys that are not basic types
    (``str``, ``unicode``, ``int``, ``long``, ``float``, ``bool``, ``None``)
    will be skipped instead of raising a ``TypeError``.

    If ``ensure_ascii`` is true (the default), all non-ASCII characters in the
    output are escaped with ``\uXXXX`` sequences, and the result is a ``str``
    instance consisting of ASCII characters only. If ``ensure_ascii`` is
    false, some chunks written to ``fp`` may be ``unicode`` instances.
    This usually happens because the input contains unicode strings or the
    ``encoding`` parameter is used. Unless ``fp.write()`` explicitly
    understands ``unicode`` (as in ``codecs.getwriter``) this is likely to
    cause an error.
The part that matters is the description of the ensure_ascii argument.
In short: if ensure_ascii is true, every non-ASCII character is escaped into \uXXXX form.
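To see the effect in isolation, the sketch below serializes the same dict twice with the standard json module. The sample data is made up for illustration. On Python 2 the second result is a unicode object, on Python 3 a str.

```python
import json

# hypothetical sample payload containing two non-ASCII characters
data = {"name": u"\u4e2d\u6587"}

# default ensure_ascii=True: non-ASCII characters become \uXXXX escapes
escaped = json.dumps(data)

# ensure_ascii=False: the characters are written out as they are
readable = json.dumps(data, ensure_ascii=False)

# both strings describe the same JSON document, so a JSON parser
# gives back identical data either way
same = json.loads(escaped) == json.loads(readable)
```

So the escaped output is still valid JSON; it only looks wrong when the raw response body is read by a person or by a client that does not parse it.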
Now look at the write method in tornado: for dict data, write always serializes to JSON with escape.json_encode.
The code of the two methods is as follows:
def write(self, chunk):
    if self._finished:
        raise RuntimeError("Cannot write() after finish()")
    if not isinstance(chunk, (bytes, unicode_type, dict)):
        message = "write() only accepts bytes, unicode, and dict objects"
        if isinstance(chunk, list):
            message += ". Lists not accepted for security reasons; see http://www.tornadoweb.org/en/stable/web.html#tornado.web.RequestHandler.write"
        raise TypeError(message)
    if isinstance(chunk, dict):
        chunk = escape.json_encode(chunk)
        self.set_header("Content-Type", "application/json; charset=UTF-8")
    chunk = utf8(chunk)
    self._write_buffer.append(chunk)
===================================================
def json_encode(value):
    return json.dumps(value).replace("</", "<\\/")
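As you can see, json_encode calls json.dumps without passing ensure_ascii, so it takes its default value, True. In other words, every non-ASCII character in the strings being serialized is escaped into \uXXXX form.
The fix is to serialize the JSON data yourself, with ensure_ascii set to False, instead of handing the dict to write():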
json.dumps(value, ensure_ascii=False)
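One way this could look in practice is sketched below. The helper name json_encode_readable is hypothetical (it is not part of tornado); it simply mirrors the json_encode shown earlier with ensure_ascii turned off. The sample dict is made up for illustration.

```python
import json


def json_encode_readable(value):
    """Hypothetical helper, not a tornado API: same as the json_encode
    shown above, but keeps non-ASCII characters unescaped."""
    return json.dumps(value, ensure_ascii=False).replace("</", "<\\/")


# serialize by hand so the framework never sees a dict
body = json_encode_readable({"msg": u"\u4e2d\u6587"})

# the bytes that would go over the wire once the text is UTF-8 encoded
payload = body.encode("utf-8")
```

In a handler, passing this string (rather than the dict) to self.write() skips the json_encode branch shown above. That branch is also where the Content-Type header is set, so it then has to be set by hand with self.set_header("Content-Type", "application/json; charset=UTF-8").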
Nothing to be done about old projects; new projects will certainly be on Python 3.
https://www.cnblogs.com/haiton/p/18159481
Please credit the source when reposting!