黑马程序员 Technical Exchange Community

Title: How to fix garbled Chinese output from a Servlet

Author: 齐银春 Time: 2012-11-27 17:52
When a Servlet writes Chinese text through a byte stream, the output comes out garbled. How can this be fixed?

Author: 刘芮铭 Time: 2012-11-27 17:57
You can use a filter, or set the character encoding on the response.

Author: 黑马-王宁 Time: 2012-11-27 18:10
Original code:

Java code

protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
    PrintWriter pw = response.getWriter();                // the writer is obtained first
    response.setCharacterEncoding("utf-8");               // called after getWriter()
    response.setContentType("text/html; charset=utf-8");  // called after getWriter()
    pw.print("中文");
}

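Whether the charset in lines 3 and 4 (the setCharacterEncoding and setContentType calls) is set to gbk or utf-8, the page always displays ??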

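Out of frustration I captured the traffic with WPE and found that, whether the charset was utf-8 or gbk, the captured response was always: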

HTTP response

HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Content-Type: text/html;charset=ISO-8859-1
Content-Length: 2
Date: Thu, 08 Mar 2007 06:04:55 GMT

??

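This shows that lines 3 and 4 had no effect. After reviewing the code, I moved line 2 (the getWriter() call) after lines 3 and 4, and the garbled output was fixed.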
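The captured `Content-Length: 2` and the body `??` are what one would expect if the two characters were encoded as ISO-8859-1, which has no mapping for them. The following is a minimal sketch using only the JDK (no servlet container); the class name `Iso88591Demo` and its helper methods are made up for illustration, and the string is written with Unicode escapes so the source file's own encoding cannot interfere.

```java
import java.nio.charset.StandardCharsets;

class Iso88591Demo {
    // "\u4e2d\u6587" is the two-character string 中文.
    static final String TEXT = "\u4e2d\u6587";

    static byte[] encodeLatin1(String s) {
        // ISO-8859-1 cannot represent these characters;
        // String.getBytes substitutes '?' (0x3F) for each one.
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    static byte[] encodeUtf8(String s) {
        // UTF-8 uses three bytes for each of these two characters.
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] latin1 = encodeLatin1(TEXT);
        System.out.println(latin1.length);                                    // 2
        System.out.println(new String(latin1, StandardCharsets.ISO_8859_1)); // ??
        System.out.println(encodeUtf8(TEXT).length);                          // 6
    }
}
```

So a two-byte body of question marks suggests the text was lost on the server side during encoding, before any browser setting could matter.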

Checking the API documentation, I found the following description:

　　PrintWriter getWriter() throws IOException

　　Returns a PrintWriter object that can send character text to the client. The PrintWriter uses the character encoding returned by getCharacterEncoding(). If the response's character encoding has not been specified as described in getCharacterEncoding (i.e., the method just returns the default value ISO-8859-1), getWriter updates it to ISO-8859-1.

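From this I infer that the character encoding used by the PrintWriter returned from getWriter() is already fixed by the time the method returns. Whether that encoding is an internal property of the returned PrintWriter or is controlled at runtime, I found no evidence either way.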

Looking at the implementation of setCharacterEncoding in Tomcat, I found the following code:

Java code

public void setCharacterEncoding(String charset) {

    if (isCommitted())
        return;

    // Ignore any call from an included servlet
    if (included)
        return;

    // Ignore any call made after the getWriter has been invoked
    // The default should be used
    if (usingWriter)
        return;

    coyoteResponse.setCharacterEncoding(charset);
    isCharacterEncodingSet = true;

}

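The usingWriter flag is set in the getWriter method, so the control logic is clear: once the PrintWriter has been returned, this method no longer has any effect. There is still no further evidence for the inference above, however.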
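The effect of that guard can be reproduced with a small model built on plain java.io. This is only a sketch: `ToyResponse` is a hypothetical stand-in, not Tomcat's class, and binding the charset in the `OutputStreamWriter` constructor is an assumption about how such a writer could be built, not a claim about Tomcat's internals.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;

// Hypothetical stand-in for a servlet response; NOT Tomcat's implementation.
class ToyResponse {
    private final ByteArrayOutputStream body = new ByteArrayOutputStream();
    private String charset = "ISO-8859-1"; // default named in the getWriter() documentation
    private boolean usingWriter = false;

    void setCharacterEncoding(String cs) {
        if (usingWriter) {
            return; // same guard as the quoted method: ignored once the writer exists
        }
        charset = cs;
    }

    PrintWriter getWriter() {
        usingWriter = true;
        // In this model the charset is bound to the writer at construction time.
        return new PrintWriter(new OutputStreamWriter(body, Charset.forName(charset)));
    }

    byte[] bodyBytes() {
        return body.toByteArray();
    }

    // Writes 中文, calling getWriter() either before or after setCharacterEncoding("utf-8").
    static byte[] run(boolean writerFirst) {
        ToyResponse r = new ToyResponse();
        PrintWriter pw;
        if (writerFirst) {
            pw = r.getWriter();
            r.setCharacterEncoding("utf-8"); // too late, silently ignored
        } else {
            r.setCharacterEncoding("utf-8");
            pw = r.getWriter();
        }
        pw.print("\u4e2d\u6587");
        pw.flush();
        return r.bodyBytes();
    }

    public static void main(String[] args) {
        System.out.println(run(true).length);  // 2 (two '?' bytes)
        System.out.println(run(false).length); // 6 (UTF-8)
    }
}
```

In this model the order of the two calls alone decides whether the body is `??` or valid UTF-8, which matches the behavior observed in the packet capture.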

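We also noticed that this method checks only the usingWriter flag; there is no corresponding usingOutputStream check. My guess was that output through ServletOutputStream is not subject to this restriction, so I wrote the following code to test it.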

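Java code

ServletOutputStream out = response.getOutputStream();

// Note: the standard ServletOutputStream.print(String) accepts only ISO-8859-1 characters.
// If it throws CharConversionException here, write the bytes explicitly instead,
// e.g. out.write("中文".getBytes("utf-8"));
out.print("中文");

// Case 1: displays correctly; the browser views the page as utf-8
//response.setContentType("text/html; charset=utf-8");

// Case 2 (no content type set): the browser defaults to Simplified Chinese;
// after manually switching the view to utf-8, the page displays correctly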

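Note: with this approach it is not only unnecessary to set the character set before calling getOutputStream(); setting it even after the print call still takes effect.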
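One way to read this result: with a byte stream, the code decides which bytes go on the wire, and the Content-Type header only tells the browser how to decode them. The sketch below illustrates that with the JDK alone. `ByteStreamDemo` is a made-up name, and a `ByteArrayOutputStream` stands in for `response.getOutputStream()` (an assumption so the example runs without a servlet container). The bytes are produced explicitly with `getBytes` rather than with `print(String)`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class ByteStreamDemo {
    // "\u4e2d\u6587" is the two-character string 中文.
    static final String TEXT = "\u4e2d\u6587";

    // Stand-in for writing to response.getOutputStream(): the caller chooses the bytes.
    static byte[] writeUtf8(String s) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        OutputStream out = sink;
        out.write(s.getBytes(StandardCharsets.UTF_8));
        out.flush();
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = writeUtf8(TEXT);
        // A client that decodes the body as UTF-8 recovers the text.
        System.out.println(new String(wire, StandardCharsets.UTF_8).equals(TEXT));      // true
        // A client that decodes the same bytes with another charset does not.
        System.out.println(new String(wire, StandardCharsets.ISO_8859_1).equals(TEXT)); // false
    }
}
```

This is consistent with case 2 above: the bytes were fine all along, and the page displayed correctly once the browser was switched to the matching encoding.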

Reading the setCharacterEncoding API documentation further, I found:

　　Calling setContentType(java.lang.String) with the String of text/html and calling this method with the String of UTF-8 is equivalent with calling setContentType with the String of text/html; charset=UTF-8.

So a single response.setContentType("text/html; charset=utf-8"); call is enough, and there is no need for two calls. The documentation goes on:

　　This method can be called repeatedly to change the character encoding. ......If the character encoding has already been set by setContentType(java.lang.String) or setLocale(java.util.Locale), this method overrides it.

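So the encoding can be set repeatedly, with each call overriding the previous one. Based on this I wrote the following test code:

Java code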

// Case 1: displays correctly; the browser views the page as utf-8
response.setContentType("text/html; charset=gbk");
response.setCharacterEncoding("utf-8");

// Case 2: displays correctly; the browser views the page as Simplified Chinese
//response.setContentType("text/html; charset=utf-8");
//response.setCharacterEncoding("gbk");

PrintWriter pw = response.getWriter();
pw.print("中文");
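The "last call wins" behavior in the two cases above can be modeled in a few lines. This is a toy sketch of the documented semantics only: `ToyContentType`, its fields, and the `;charset=` header format it prints are illustrative assumptions, not the servlet container's code.

```java
import java.util.Locale;

// Hypothetical model of how the two setters could share one charset value.
class ToyContentType {
    private String mimeType;
    private String charset;

    void setContentType(String value) {
        String[] parts = value.split(";");
        mimeType = parts[0].trim();
        for (int i = 1; i < parts.length; i++) {
            String p = parts[i].trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                // A charset parameter replaces any previously set encoding.
                charset = p.substring("charset=".length()).trim();
            }
        }
    }

    void setCharacterEncoding(String cs) {
        // Overrides an encoding that setContentType set earlier.
        charset = cs;
    }

    String header() {
        return charset == null ? mimeType : mimeType + ";charset=" + charset;
    }

    // Case 1: gbk via setContentType, then utf-8 via setCharacterEncoding.
    static String caseOne() {
        ToyContentType t = new ToyContentType();
        t.setContentType("text/html; charset=gbk");
        t.setCharacterEncoding("utf-8");
        return t.header();
    }

    // Case 2: utf-8 via setContentType, then gbk via setCharacterEncoding.
    static String caseTwo() {
        ToyContentType t = new ToyContentType();
        t.setContentType("text/html; charset=utf-8");
        t.setCharacterEncoding("gbk");
        return t.header();
    }

    public static void main(String[] args) {
        System.out.println(caseOne()); // text/html;charset=utf-8
        System.out.println(caseTwo()); // text/html;charset=gbk
    }
}
```

In both cases the later call determines the charset, which is why the browser ended up in utf-8 for case 1 and Simplified Chinese for case 2.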

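Conclusions:

1. When outputting Chinese from a servlet through a PrintWriter, setContentType or setCharacterEncoding must be called before getWriter(). Output through ServletOutputStream is not subject to this restriction.

2. As far as setting the character encoding is concerned, setContentType and setCharacterEncoding have the same effect on the server, so there is no need to call both. When outputting text content, response.setContentType("text/html; charset=utf-8"); seems the more convenient choice.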

Welcome to the 黑马程序员 Technical Exchange Community (http://bbs.itheima.com/)
