Last edited by supertoy on 2014-8-14 09:29
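String s = "哈哈";
// Encode the string as GBK: each of the two characters becomes two bytes.
byte[] bytes = s.getBytes("gbk");
// ToHex.arrayToHex is a helper class not shown here; it prints the bytes as hex.
System.out.println("GBK bytes: " + ToHex.arrayToHex(bytes));
// Wrong on purpose: decode the GBK bytes as if they were UTF-8.
s = new String(bytes, "utf-8");
System.out.println("Decoded as UTF-8: " + s);
// Encode the garbled string back to UTF-8 bytes.
bytes = s.getBytes("utf-8");
System.out.println("UTF-8 bytes: " + ToHex.arrayToHex(bytes));
-----------------
GBK bytes: [b9(B:10111001),fe(B:11111110),b9(B:10111001),fe(B:11111110)]
Decoded as UTF-8: ����
UTF-8 bytes: [ef,bf,bd,ef,bf,bd,ef,bf,bd,ef,bf,bd]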
---------------------
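When the UTF-8 decoder reads b9, fe, b9, fe, it finds no matching sequence under the three rules (0xxxxxxx), (110xxxxx 10xxxxxx), (1110xxxx 10xxxxxx 10xxxxxx), nor under the four-byte form (11110xxx followed by three 10xxxxxx bytes). b9 (10111001) is a continuation byte with no lead byte before it, and fe (11111110) is never valid in UTF-8. The decoder drops each invalid byte and substitutes the Unicode replacement character U+FFFD, which is ef, bf, bd in UTF-8.
That is why four bytes turn into twelve, and why the original GBK bytes cannot be recovered once they have been decoded this way.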
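To make the loss easier to see, here is a minimal sketch that starts from the same four bytes as in the output above. The class name RoundTripDemo and the methods toHex, roundTrip and isValidUtf8 are made up for this illustration (toHex is only a stand-in for ToHex.arrayToHex, not the real helper). It also contrasts UTF-8 with ISO-8859-1, which maps every byte value to a character and so survives the same round trip; that comparison is my addition, not part of the original test.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RoundTripDemo {
    // Hypothetical stand-in for the post's ToHex.arrayToHex helper.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) sb.append(',');
            sb.append(String.format("%02x", bytes[i] & 0xff));
        }
        return sb.append(']').toString();
    }

    // Decode the bytes with a charset, then encode the result with the same charset.
    static byte[] roundTrip(byte[] bytes, Charset cs) {
        return new String(bytes, cs).getBytes(cs);
    }

    // Strict decode: report malformed input instead of silently inserting U+FFFD.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The GBK bytes shown in the output above.
        byte[] gbk = {(byte) 0xb9, (byte) 0xfe, (byte) 0xb9, (byte) 0xfe};

        byte[] viaUtf8 = roundTrip(gbk, StandardCharsets.UTF_8);
        byte[] viaLatin1 = roundTrip(gbk, StandardCharsets.ISO_8859_1);

        System.out.println("via UTF-8:      " + toHex(viaUtf8));
        System.out.println("via ISO-8859-1: " + toHex(viaLatin1));
        System.out.println("UTF-8 round trip lossless?      " + Arrays.equals(gbk, viaUtf8));
        System.out.println("ISO-8859-1 round trip lossless? " + Arrays.equals(gbk, viaLatin1));
        System.out.println("valid UTF-8? " + isValidUtf8(gbk));
    }
}
```

If the bytes might not be UTF-8, a strict decoder like the one in isValidUtf8 lets you catch the problem before the data is overwritten with U+FFFD.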