Detecting the character encoding of a string's byte[]

Posted by 漠孤烟 on 2024-06-12

Add the dependency

        <dependency>
            <groupId>com.googlecode.juniversalchardet</groupId>
            <artifactId>juniversalchardet</artifactId>
            <version>1.0.3</version>
        </dependency>

Wrap it in a small utility class

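import org.mozilla.universalchardet.UniversalDetector;

public class CharsetUtil {

    /**
     * Detects the charset of a string given as a byte array.
     *
     * @param bytes the string's bytes
     * @return the detected charset name, or "UTF-8" if nothing could be detected
     */
    public static String getCharset(byte[] bytes) {
        String defaultCharset = "UTF-8";
        if (bytes == null || bytes.length == 0) {
            return defaultCharset;
        }
        UniversalDetector detector = new UniversalDetector(null);
        detector.handleData(bytes, 0, bytes.length);
        detector.dataEnd();
        // Read the result BEFORE reset(): reset() clears the detected charset,
        // so calling it first would make this method always return the default.
        String detectedCharset = detector.getDetectedCharset();
        detector.reset();
        return detectedCharset == null ? defaultCharset : detectedCharset;
    }

}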
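Once a charset name has been detected, the usual next step is to decode the bytes with it. The sketch below uses only the JDK; `CharsetDecodeUtil`, `decode` and `resolve` are hypothetical names, and falling back to UTF-8 for a null, malformed or unsupported name is an assumption chosen to mirror the default in `getCharset`.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: turns bytes plus a detected charset name into a String. */
final class CharsetDecodeUtil {

    private CharsetDecodeUtil() {
    }

    /** Decodes the bytes with the named charset, or with UTF-8 if that name is unusable. */
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, resolve(charsetName));
    }

    /** Maps a charset name to a Charset, falling back to UTF-8 (an assumption). */
    static Charset resolve(String charsetName) {
        if (charsetName == null) {
            return StandardCharsets.UTF_8;
        }
        try {
            // Charset names are looked up case-insensitively by the JDK
            return Charset.isSupported(charsetName)
                    ? Charset.forName(charsetName)
                    : StandardCharsets.UTF_8;
        } catch (IllegalCharsetNameException e) {
            // The name is syntactically invalid as a charset name
            return StandardCharsets.UTF_8;
        }
    }
}
```

Typical use together with the helper above: `String text = CharsetDecodeUtil.decode(bytes, CharsetUtil.getCharset(bytes));`. Checking `Charset.isSupported` first avoids an `UnsupportedCharsetException` when the detector reports a name the running JVM does not provide.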

Test

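import java.nio.charset.StandardCharsets;
import org.junit.jupiter.api.Test;

class CharsetUtilTest {
    @Test
    void getCharset() {
        String hello = "hello, world";
        System.out.println("hello charset: " + CharsetUtil.getCharset(hello.getBytes(StandardCharsets.UTF_8)));
    }
}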

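Output (with juniversalchardet 1.0.3, pure-ASCII input such as "hello, world" is not labeled by the detector, so the value printed is the UTF-8 fallback rather than a real detection result):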

hello charset: UTF-8
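For pure-ASCII text such as "hello, world", the bytes alone cannot reveal a charset: they are identical in UTF-8, ISO-8859-1 and US-ASCII, so a detector has nothing to work with and `getCharset` ends up returning its default. The JDK-only sketch below checks this; `AsciiAmbiguityDemo`, `isPureAscii` and `sameBytesInAll` are hypothetical names used only for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/** Hypothetical demo: why pure-ASCII input gives a charset detector nothing to go on. */
final class AsciiAmbiguityDemo {

    private AsciiAmbiguityDemo() {
    }

    /** True when every byte is in the 7-bit ASCII range (0x00-0x7F). */
    static boolean isPureAscii(byte[] bytes) {
        for (byte b : bytes) {
            // Java bytes are signed, so 0x80-0xFF show up as negative values
            if (b < 0) {
                return false;
            }
        }
        return true;
    }

    /** True when the text encodes to the same bytes in every charset (list must be non-empty). */
    static boolean sameBytesInAll(String text, List<Charset> charsets) {
        byte[] first = text.getBytes(charsets.get(0));
        for (Charset cs : charsets) {
            if (!Arrays.equals(first, text.getBytes(cs))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Charset> charsets = List.of(
                StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1, StandardCharsets.US_ASCII);
        System.out.println(sameBytesInAll("hello, world", charsets)); // prints true
        System.out.println(sameBytesInAll("caf\u00e9", charsets));    // prints false
    }
}
```

Only once the input contains bytes of 0x80 or above (for example the "é" in "café") do the encodings diverge, which is what gives a detector something to analyze.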
