Batch-convert file encodings to UTF-8 with BOM (UTF-8-SIG) in Python to fix garbled Chinese text on Linux

Posted by 一字千金 on 2024-12-04

Original URL: https://www.cnblogs.com/bclshuai/p/18586827

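Given a folder, the script walks through the files in it and in all of its subfolders, picks out the files with the .cpp suffix, and uses chardet to detect each file's encoding. If the encoding is not UTF-8-SIG (UTF-8 with a BOM), the file is converted to UTF-8-SIG.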
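The name UTF-8-SIG can be confusing: it is ordinary UTF-8 with a three-byte signature (the BOM, EF BB BF) in front. The minimal sketch below shows the conversion step on in-memory bytes, assuming the source file was saved as GBK; `to_utf8_sig` is a hypothetical helper, not part of the original script.

```python
import codecs

def to_utf8_sig(data: bytes, source_encoding: str) -> bytes:
    """Hypothetical helper: decode bytes from source_encoding, re-encode as UTF-8 with BOM."""
    return data.decode(source_encoding).encode("utf-8-sig")

text = "中文"
gbk_bytes = text.encode("gbk")              # how a GBK-saved source file stores the text
converted = to_utf8_sig(gbk_bytes, "gbk")   # assumption: the source encoding is GBK

# UTF-8-SIG output is the 3-byte BOM followed by ordinary UTF-8 bytes
assert converted == codecs.BOM_UTF8 + text.encode("utf-8")
print(converted.hex())                      # efbbbfe4b8ade69687

# "utf-8-sig" drops the BOM when decoding; plain "utf-8" keeps it as U+FEFF
assert converted.decode("utf-8-sig") == text
assert converted.decode("utf-8") == "\ufeff" + text
```

Because plain "utf-8" keeps the BOM as the character U+FEFF, a tool that is unaware of the BOM may show a stray invisible character at the start of the file.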

The Python script is as follows:

import os
import sys
import codecs
import chardet

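def convert(filename, out_enc="UTF-8-SIG"):
    '''Re-encode a file as out_enc (UTF-8 with BOM by default) unless it already is.'''
    try:
        # read raw bytes; "with" closes the handle, unlike a bare codecs.open()
        with open(filename, 'rb') as f:
            content = f.read()
        source_encoding = chardet.detect(content)["encoding"]
        print(source_encoding)
        if source_encoding is None:
            # chardet could not guess (e.g. an empty file); leave it untouched
            print("skip file " + filename)
            return
        # chardet reports BOM-prefixed UTF-8 as "UTF-8-SIG"
        if source_encoding != "UTF-8-SIG":
            content = content.decode(source_encoding).encode(out_enc)
            with open(filename, 'wb') as f:
                f.write(content)
            print("convert file " + filename)
    except IOError as err:
        print("I/O error:{0}".format(err))
    except (UnicodeDecodeError, LookupError) as err:
        # wrong or unknown guess (e.g. GBK-only characters in a "GB2312" file);
        # decoding fails before the write, so the file is not modified
        print("decode error in {0}: {1}".format(filename, err))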

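def removeBom(file):
    '''Strip the UTF-8 BOM (first three bytes) from a file, if present.'''
    with open(file, 'rb') as f:
        data = f.read()
    if data[:3] == codecs.BOM_UTF8:
        # write the BOM-less bytes back; slicing alone changes nothing on disk
        with open(file, 'wb') as f:
            f.write(data[3:])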


def explore(directory):
    # walk the folder and every subfolder, converting each .cpp file
    for root, dirs, files in os.walk(directory):
        for file in files:
            if os.path.splitext(file)[1] == '.cpp':
                print(file)
                path = os.path.join(root, file)
                convert(path)
                # removeBom(path)

def main():
    if len(sys.argv) < 2:
        print("usage: python ToUtf8.py <folder>")
        sys.exit(1)
    explore(sys.argv[1])

if __name__ == "__main__":
    main()

If you get an error saying chardet cannot be found, run pip install chardet in cmd to install it.
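If installing chardet is not an option, a rough stdlib-only substitute is to try a short, fixed list of candidate encodings. This is a different and cruder technique than chardet's statistical detection: `guess_encoding` and `CANDIDATES` are hypothetical names, and the list assumes the sources are either UTF-8 or a GB-family encoding (GB18030 covers GB2312 and GBK).

```python
import codecs

# Hypothetical candidate list (an assumption about the code base). Order matters:
# UTF-8 Chinese text often also decodes as GB18030 (to the wrong characters),
# while GBK text is rarely valid UTF-8, so strict UTF-8 is tried first.
CANDIDATES = ("utf-8", "gb18030")

def guess_encoding(data: bytes):
    """Return an encoding name that decodes data without errors, or None."""
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8-SIG"  # same name chardet reports for BOM-prefixed UTF-8
    for enc in CANDIDATES:
        try:
            data.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    return None

if __name__ == "__main__":
    print(guess_encoding("中文".encode("gbk")))    # gb18030
    print(guess_encoding("中文".encode("utf-8")))  # utf-8
```

In convert, guess_encoding(content) could stand in for chardet.detect(content)["encoding"]; both return "UTF-8-SIG" for files that already carry a BOM, so the existing comparison still works.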

Then run python ToUtf8.py test in cmd, where test is the name of the folder to process; this detects and converts the encoding of every .cpp file in it in one batch.
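After the run, it can help to confirm that no file was skipped (for example because of a decode error). The sketch below walks the same folder and lists the .cpp files whose first three bytes are not the UTF-8 BOM; `files_missing_bom` is a hypothetical helper, and it only checks for the BOM, not whether the text itself was decoded correctly.

```python
import codecs
import os
import sys

def files_missing_bom(folder, ext=".cpp"):
    """Return sorted paths of files with the given suffix that lack a UTF-8 BOM."""
    missing = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            if os.path.splitext(name)[1] == ext:
                path = os.path.join(root, name)
                with open(path, "rb") as f:
                    head = f.read(3)  # the UTF-8 BOM is exactly three bytes
                if head != codecs.BOM_UTF8:
                    missing.append(path)
    return sorted(missing)

if __name__ == "__main__":
    # folder name from the command line, defaulting to "test" as in the article
    folder = sys.argv[1] if len(sys.argv) > 1 else "test"
    for path in files_missing_bom(folder):
        print("no BOM:", path)
```

An empty result means every .cpp file under the folder now starts with the BOM.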
