python2: Chinese text in HTTP responses shows up as \uXXXX unicode escapes

Posted by 华腾海神 on 2024-04-26

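In python2, converting between encodings goes through unicode as the intermediate form: decode first, then encode.
decode takes bytes and decodes them into unicode using the given encoding.
encode takes unicode and encodes it into bytes using the given encoding.
Data is transmitted and stored as bytes, so a program has to decode those bytes with the correct encoding before displaying them.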
decode: From bytes To Unicode
encode: From Unicode To bytes
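
As a small illustration of the two directions, here is a minimal sketch. The sample bytes and the choice of GBK as the second encoding are assumptions made for the example, not something from the original problem.

```python
# -*- coding: utf-8 -*-
# UTF-8 bytes for the two characters U+4E2D U+6587 ("中文").
raw = b'\xe4\xb8\xad\xe6\x96\x87'

# decode: bytes -> unicode, using the encoding the bytes were written in.
text = raw.decode('utf-8')

# encode: unicode -> bytes, here in a different encoding (GBK).
gbk_bytes = text.encode('gbk')

# Going back through unicode restores the original UTF-8 bytes.
utf8_again = gbk_bytes.decode('gbk').encode('utf-8')
```

Decoding with the wrong encoding (for example treating the GBK bytes as UTF-8) is what typically produces garbled text or a UnicodeDecodeError.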

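I tried many encode/decode combinations in python2, and none of them stopped Chinese text from being displayed in escaped \uXXXX form.
It turned out the problem was in how the HTTP framework serializes JSON data.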

The relevant code and docstring from python's json module:

def dump(obj, fp, skipkeys=False, ensure_ascii=True, check_circular=True,
        allow_nan=True, cls=None, indent=None, separators=None,
        encoding='utf-8', default=None, sort_keys=False, **kw):
    """Serialize ``obj`` as a JSON formatted stream to ``fp`` (a
    ``.write()``-supporting file-like object).

    If ``skipkeys`` is true then ``dict`` keys that are not basic types
    (``str``, ``unicode``, ``int``, ``long``, ``float``, ``bool``, ``None``)
    will be skipped instead of raising a ``TypeError``.

    If ``ensure_ascii`` is true (the default), all non-ASCII characters in the
    output are escaped with ``\uXXXX`` sequences, and the result is a ``str``
    instance consisting of ASCII characters only.  If ``ensure_ascii`` is
    false, some chunks written to ``fp`` may be ``unicode`` instances.
    This usually happens because the input contains unicode strings or the
    ``encoding`` parameter is used. Unless ``fp.write()`` explicitly
    understands ``unicode`` (as in ``codecs.getwriter``) this is likely to
    cause an error.

The docstring includes a description of the ensure_ascii parameter.
In short: if ensure_ascii is true, every non-ASCII character is escaped as \uXXXX.
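
The effect of the parameter can be seen with a short sketch. The dictionary contents are made up for the example.

```python
import json

# A dict holding one non-ASCII string (U+4E2D U+6587).
data = {"name": u"\u4e2d\u6587"}

# Default ensure_ascii=True: non-ASCII characters become \uXXXX escapes,
# so the output contains only ASCII characters.
escaped = json.dumps(data)

# ensure_ascii=False: the characters are written into the output as-is.
readable = json.dumps(data, ensure_ascii=False)

# Both forms are valid JSON and parse back to the same value.
same = json.loads(escaped) == json.loads(readable)
```

Here `escaped` is `{"name": "\u4e2d\u6587"}` with a literal backslash, which is exactly the form that showed up in the HTTP response.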
Now look at the write method in tornado: any dict passed to write is serialized to JSON with escape.json_encode.
The two methods are:

    def write(self, chunk):
        if self._finished:
            raise RuntimeError("Cannot write() after finish()")
        if not isinstance(chunk, (bytes, unicode_type, dict)):
            message = "write() only accepts bytes, unicode, and dict objects"
            if isinstance(chunk, list):
                message += ". Lists not accepted for security reasons; see http://www.tornadoweb.org/en/stable/web.html#tornado.web.RequestHandler.write"
            raise TypeError(message)
        if isinstance(chunk, dict):
            chunk = escape.json_encode(chunk)
            self.set_header("Content-Type", "application/json; charset=UTF-8")
        chunk = utf8(chunk)
        self._write_buffer.append(chunk)
    ===================================================
    def json_encode(value):
        return json.dumps(value).replace("</", "<\\/")

As you can see, json_encode calls json.dumps without passing ensure_ascii, so it keeps its default value of True. As a result, every non-ASCII character in the serialized strings is escaped into \uXXXX form.

The fix is to serialize the JSON data yourself, with ensure_ascii set to False.

json.dumps(value, ensure_ascii=False)
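
One possible way to package this is sketched below. The helper name `dumps_utf8` is hypothetical (it is not part of tornado or the json module); it mirrors the `</` replacement from json_encode above and returns UTF-8 bytes so the result can be handed to write() directly.

```python
import json


def dumps_utf8(value):
    """Hypothetical helper: serialize value to JSON as UTF-8 bytes.

    Non-ASCII characters are kept as-is instead of being escaped.
    """
    # Same "</" replacement that tornado's json_encode applies.
    text = json.dumps(value, ensure_ascii=False).replace("</", "<\\/")
    # On python2 the result may be str (bytes) or unicode; on python3 it
    # is always str (unicode). Only encode when it is not bytes already.
    if not isinstance(text, bytes):
        text = text.encode("utf-8")
    return text


body = dumps_utf8({"msg": u"\u4e2d\u6587"})
```

Inside a handler this would be used roughly as `self.write(dumps_utf8(data))`. Because write() then receives bytes rather than a dict, it no longer sets the Content-Type header itself (see the dict branch in write above), so the handler would need to call `self.set_header("Content-Type", "application/json; charset=UTF-8")` on its own.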

Nothing else can be done for legacy projects; new projects will of course be on python3.


https://www.cnblogs.com/haiton/p/18159481
Please credit the source when reposting!
