A little utility similar to Native2Ascii
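A colleague asked about this one evening, so I put it together in a hurry. I don't know what problems it may have. Please test it in different environments and reply to this post if you find anything wrong. Thanks.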
package com.utstar.pizer.util.unicode;

import java.io.*;

/**
 * Converts between plain text and 4-hex-digit unicode escapes,
 * similar to what native2ascii does.
 *
 * <p>Copyright: Copyright (c) 2003</p>
 * @author not attributable
 * @version 1.0
 */
public class UnicodeUtil {

    public static void main(String[] args) throws Exception {
        System.out.println(Integer.toHexString((int) '你'));
        System.out.println(Integer.toHexString((int) '我'));
        String tmp = "\\u" + "4F60" + "\\u" + "6211\u0000同\\u時a\\bc\u5e87DEf_)*&^\\u^";
        //String tmp = "\u4F60\u6211\u540c\u65f6\u20e6";
        System.out.println(tmp);
        System.out.println("[" + escapeUnicode(tmp) + "]");

        // Write the escaped text to a file as UTF-8 ...
        String tmp2 = escapeUnicode(tmp);
        OutputStreamWriter osw =
            new OutputStreamWriter(new FileOutputStream("data.txt"), "UTF-8");
        osw.write(tmp2);
        osw.flush();
        osw.close();

        // ... then read it back, decode it, and time the whole thing.
        long start = System.currentTimeMillis();
        InputStreamReader isr =
            new InputStreamReader(new FileInputStream("data.txt"), "UTF-8");
        char[] inc = new char[1024 * 4];
        StringBuffer s = new StringBuffer();
        int p = 0;
        while ((p = isr.read(inc, 0, inc.length)) > 0) {
            s.append(inc, 0, p);
        }
        isr.close();
        System.out.println(unescapeUnicode(s.toString(), null));
        long end = System.currentTimeMillis();
        System.out.println("Time consumed:" + (end - start));
    }

    /*
       note: a backslash is not a general escape here. Only the pair
       (backslash, 'u') starts an escape, and the triple
       (backslash, backslash, 'u') stands for a literal backslash + 'u'.

       our unicode-escape parsing algorithm will go something like:
       1. scan for '\\u'
       2. if previous char is '\', skip all 3 of them. go to 1.
       3. if next 4 chars are legal hex, continue, else go to 1.
       4. turn into character.
          char foo = (char)Integer.parseInt(hexChars, 16);
       5. test with Character.isDefined(foo)
       6. insert into sb2, go to 1.
     */

    /**
     * Decode 4-hex-digit unicode escapes from a String.
     * Escapes are defined in
     * <a href="http://java.sun.com/docs/books/jls/second_edition/html/lexical.doc.html#100850">§3.3</a>
     * of the
     * <a href="http://java.sun.com/docs/books/jls/second_edition/html/j.title.doc.html">java language specification</a>.
     *
     * As a short example: "\u00BF" would be translated into ¿ (the inverted question mark)
     *
     * @param s the string to decode
     * @param encoding charset the result is passed through; if null or empty,
     *                 the "file.encoding" system property is used
     * @return the decoded string, or s if there were errors.
     */
    public static String unescapeUnicode(String s, String encoding)
            throws UnsupportedEncodingException {
        if (s == null) {
            return s;
        }
        String decoded = s;
        try {
            int sindex = s.indexOf("\\u");
            int osindex = 0;
            if (sindex >= 0) {
                // we can still have \\u, but we'll work it out.
                String hex4 = null;
                char tchar = '\u0000';
                // we will build up our new string in here:
                StringBuffer sb2 = new StringBuffer(s.length());
                while ((sindex >= 0) && (sindex < s.length())) {
                    // copy everything between the previous escape and this one
                    sb2.append(s.substring(osindex, sindex));
                    osindex = sindex;
                    if ((sindex > 0) && (s.charAt(sindex - 1) == '\\')) {
                        // we have a triple-esc, skip onward
                        sindex += 2; // the length of "\\u"
                        sb2.append("u");
                    } else {
                        // check for 4 hex digits following \\u
                        // make sure we _have_ 4 more chars:
                        if (sindex + 6 > s.length()) {
                            sb2.append(s.substring(sindex));
                            break;
                        }
                        hex4 = s.substring(sindex + 2, sindex + 6);
                        try {
                            // parseInt accepts a leading sign, which is not a hex digit
                            if (Character.digit(hex4.charAt(0), 16) < 0) {
                                throw new NumberFormatException(hex4);
                            }
                            tchar = (char) Integer.parseInt(hex4, 16);
                        } catch (NumberFormatException nfe) {
                            // not an escape: copy the "\\u" through and keep scanning
                            sb2.append(s.substring(sindex, sindex + 2));
                            sindex += 2;
                            osindex = sindex;
                            sindex = s.indexOf("\\u", sindex);
                            continue;
                        }
                        // step 5 (Character.isDefined) is not enforced:
                        // undefined characters are passed through as they are
                        sb2.append(tchar);
                        sindex += 6;
                    }
                    osindex = sindex;
                    sindex = s.indexOf("\\u", sindex);
                }
                if (sindex < 0) {
                    // grab the rest of the string.
                    sb2.append(s.substring(osindex));
                }
                decoded = sb2.toString();
            }
        } catch (StringIndexOutOfBoundsException e) {
            // do nothing, s will be unaffected.
            return s;
        }
        if (encoding == null || "".equals(encoding)) {
            encoding = System.getProperty("file.encoding", "ISO-8859-1");
        }
        // Encode and decode with the same charset. The original decoded with the
        // platform default, which garbles the text whenever the two differ.
        // Characters the charset cannot represent still come back as '?'.
        return new String(decoded.getBytes(encoding), encoding);
    }

    public static String escapeUnicode(String s) {
        if (s == null) {
            return s;
        }
        char[] chars = s.toCharArray();
        char c;
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < chars.length; i++) {
            c = chars[i];
            // Leave ASCII and Latin-1 characters (up to 0xff) as they are
            if (c > 0xff) {
                // zero-pad to 4 hex digits, otherwise values below 0x1000
                // produce a 3-digit escape that cannot be decoded again
                String hex = Integer.toHexString(c);
                sb.append("\\u");
                for (int k = hex.length(); k < 4; k++) {
                    sb.append('0');
                }
                sb.append(hex);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
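For anyone who wants to test the idea in isolation, here is a minimal, self-contained sketch of both directions. It is not the class above: `EscapeSketch` and its method names are made up for illustration, it assumes that every character above 0x7f should be escaped (the way native2ascii does), and it always pads to four hex digits. It also leaves out the double-backslash handling on purpose, so input that already contains a literal backslash-u followed by four hex digits will not round-trip.

```java
// Hypothetical helper for illustration only; not part of UnicodeUtil.
final class EscapeSketch {

    // Assumption: escape everything outside 7-bit ASCII.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 0x7f) {
                // %04x zero-pads, so U+00E9 becomes backslash, u, 0, 0, e, 9.
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Turns each backslash-u + 4 hex digits into one char; anything else is copied as is.
    static String unescape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            if (s.startsWith("\\u", i) && i + 6 <= s.length() && isHex4(s, i + 2)) {
                sb.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                sb.append(s.charAt(i));
                i++;
            }
        }
        return sb.toString();
    }

    // True if the four chars starting at 'from' are all hex digits.
    private static boolean isHex4(String s, int from) {
        for (int k = from; k < from + 4; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String original = "a\u00e9\u4f60\u6211";
        String escaped = escape(original);
        System.out.println(escaped);
        System.out.println(unescape(escaped).equals(original));
    }
}
```

Checking the four digits with `Character.digit` before calling `Integer.parseInt` avoids relying on `NumberFormatException` for control flow and also rejects a leading sign, which `parseInt` would otherwise accept.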