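HttpClient overview and the mystery of getContentLength() returning -1
HttpClient began as a subproject of Apache Jakarta Commons and is now part of Apache HttpComponents. It provides an efficient, up-to-date, feature-rich client-side toolkit for the HTTP protocol, and it supports the latest versions and recommendations of the protocol. HttpClient is already used in many projects; for example, two other well-known open source projects, Cactus and HTMLUnit, both use it.
The latest official release at the time of writing is 4.3.6.
Download: http://hc.apache.org/downloads.cgi
Introduction to HttpClient
HTTP is probably the most widely used and most important protocol on the Internet today, and more and more Java applications need to access network resources directly over HTTP. The java.net package in the JDK already provides basic HTTP access, but for most applications what the JDK offers is not rich or flexible enough. HttpClient fills that gap.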
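Basic HttpClient usage: the GET method
Using HttpClient takes the following six steps:
1. Create an HttpClient instance.
2. Create an instance of a request method, here GetMethod, and pass the target URL to its constructor. (GetMethod is the class name from the 3.x API; the 4.x code later in this article uses HttpGet.)
3. Call the execute method of the instance created in step 1 to run the method instance created in step 2.
4. Read the response.
5. Release the connection. The connection must be released whether or not the request succeeded.
6. Process the content that was read.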
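The six steps can be tried out without any extra jars. The sketch below is not HttpClient code: it swaps in the JDK's own java.net.HttpURLConnection and follows the same create, execute, read, release, process flow. The class name GetStepsSketch, the fetch and demo helpers, and the tiny loopback server answering with "hello" are all made up for illustration, so that the example runs without Internet access.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

class GetStepsSketch {

    /** Fetches a URL with a plain GET and returns the body decoded as UTF-8. */
    static String fetch(String url) {
        HttpURLConnection conn = null;
        try {
            // Steps 1-2: create the connection object and select the GET method.
            conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestMethod("GET");

            // Step 3: execute; getResponseCode() sends the request and waits for the status line.
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("unexpected status " + status);
            }

            // Step 4: read the response body completely.
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (InputStream in = conn.getInputStream()) {
                byte[] chunk = new byte[4096];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
            }

            // Step 6: process the content (here it is only decoded). The finally
            // block below (step 5) still runs before the caller sees the result.
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            // Step 5: release the connection whether or not the request succeeded.
            if (conn != null) {
                conn.disconnect();
            }
        }
    }

    /** Starts a throwaway server on a free loopback port, fetches "/" once, then stops it. */
    static String demo() {
        HttpServer server;
        try {
            server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        server.createContext("/", exchange -> {
            byte[] body = "hello".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        try {
            return fetch("http://127.0.0.1:" + server.getAddress().getPort() + "/");
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "hello"
    }
}
```

The point to take away is the finally block: the release step sits there so that it runs on both the success path and the failure path, which is what step 5 asks for.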
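The mystery of HttpClient's getContentLength() returning -1
HttpClient is a code-level HTTP client that can be used to emulate a browser sending requests to an HTTP server. Using HttpClient also requires HttpCore, which contains the classes that model HTTP requests and responses.
Let's start with a simple HttpGet example.
The code is as follows: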
package test.ffm83.commons.httpClient;

import org.apache.commons.lang.StringUtils;
import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.util.EntityUtils;

/**
 * A simple HttpClient example.
 * Based on the 4.x API.
 * @author 范芳铭
 */
public class EasyHttpGetA {
    public static void main(String[] args) throws Exception {
        getHttpGetOld("http://www.baidu.com/");
        getHttpGetOld("http://www.apache.org/");
    }

    private static void getHttpGetOld(String url) throws Exception {
        HttpClient httpclient = new DefaultHttpClient(); // old API, not recommended
        try {
            HttpGet httpget = new HttpGet(url);
            // Center the URL in a 50-character banner padded with '='.
            System.out.println(StringUtils.center(url, 50, "="));
            HttpResponse response = httpclient.execute(httpget); // old API, not recommended
            HttpEntity entity = response.getEntity();
            System.out.println(response.getStatusLine());
            String webContent = "";
            if (entity != null) {
                // Read the whole body into a String, then compare the two lengths.
                webContent = EntityUtils.toString(entity);
                System.out.println("Response getContentLength: " + entity.getContentLength());
                System.out.println("Response toString() length: " + webContent.length());
            }
            // System.out.println(response.toString()); // print the response status line and headers
            System.out.println("----------------------------------------");
            httpget.abort();
        } finally {
            httpclient.getConnectionManager().shutdown();
        }
    }
}
The output is:
==============http://www.baidu.com/===============
HTTP/1.1 200 OK
Response getContentLength: -1
Response toString() length: 85664
----------------------------------------
==============http://www.apache.org/==============
HTTP/1.1 200 OK
Response getContentLength: 41765
Response toString() length: 41765
----------------------------------------
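So here is the question: why does baidu report Response getContentLength: -1?
Uncomment the System.out.println(response.toString()); line in the code, which prints the response status line and headers, and run it again. The output is: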
==============http://www.baidu.com/===============
HTTP/1.1 200 OK
Response getContentLength: -1
Response toString() length: 85720
HTTP/1.1 200 OK [Date: Fri, 12 Dec 2014 13:11:00 GMT, Content-Type: text/html; charset=utf-8, Transfer-Encoding: chunked, Connection: Keep-Alive, Vary: Accept-Encoding, Set-Cookie: BAIDUID=165FB4395C8BBCE8F574B740FDA16EC7:FG=1; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com, Set-Cookie: BAIDUPSID=165FB4395C8BBCE8F574B740FDA16EC7; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com, Set-Cookie: BDSVRTM=0; path=/, Set-Cookie: BD_HOME=0; path=/, Set-Cookie: H_PS_PSSID=1435_9992_10571_10503_10500_10497_10017_10510_10645_10458_10066_10219_9769_10591_10355_9093_10095_10008_10442_10461_9950_9024_10627; path=/; domain=.baidu.com, P3P: CP=" OTI DSP COR IVA OUR IND COM ", Cache-Control: private, Cxy_all: baidu+f39c7d3dc65b3b8b1ed456d4af91f779, Expires: Fri, 12 Dec 2014 13:10:59 GMT, X-Powered-By: HPHP, Server: BWS/1.1, BDPAGETYPE: 1, BDQID: 0xa337691d0000efac, BDUSERID: 0] org.apache.http.conn.BasicManagedEntity@8b819f
----------------------------------------
==============http://www.apache.org/==============
HTTP/1.1 200 OK
Response getContentLength: 41765
Response toString() length: 41765
HTTP/1.1 200 OK [Date: Fri, 12 Dec 2014 13:11:08 GMT, Server: Apache/2.4.7 (Ubuntu), Last-Modified: Fri, 12 Dec 2014 12:10:43 GMT, ETag: "a325-50a03c82c50a6", Accept-Ranges: bytes, Content-Length: 41765, Vary: Accept-Encoding, Cache-Control: max-age=3600, Expires: Fri, 12 Dec 2014 14:11:08 GMT, Keep-Alive: timeout=30, max=100, Connection: Keep-Alive, Content-Type: text/html] org.apache.http.conn.BasicManagedEntity@120a47e
----------------------------------------
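Look closely at the response headers. The response from www.apache.org contains the header Content-Length: 41765, and that is exactly the value entity.getContentLength() reports. The response from baidu has no such header. It is sent with Transfer-Encoding: chunked instead, so the body length is not announced up front, and that is why we get
Response getContentLength: -1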
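To make the rule concrete, here is a minimal sketch of the lookup, assuming the entity length simply mirrors the Content-Length header. It is a simplified model, not HttpClient's actual implementation; the class ContentLengthSketch and the method contentLength are hypothetical names, and the header values are the ones from the two responses above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class ContentLengthSketch {

    /**
     * Returns the declared body length, or -1 when the headers do not declare one
     * (for example when the body is sent with Transfer-Encoding: chunked).
     */
    static long contentLength(Map<String, String> headers) {
        // Header names are case-insensitive, so copy into a case-insensitive map.
        Map<String, String> byName = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        byName.putAll(headers);

        String value = byName.get("Content-Length");
        if (value == null) {
            return -1L; // no header: length unknown
        }
        try {
            long n = Long.parseLong(value.trim());
            return n >= 0 ? n : -1L;
        } catch (NumberFormatException e) {
            return -1L; // malformed header: treat as unknown
        }
    }

    public static void main(String[] args) {
        Map<String, String> apache = new HashMap<>();
        apache.put("Content-Length", "41765");
        apache.put("Content-Type", "text/html");

        Map<String, String> baidu = new HashMap<>();
        baidu.put("Transfer-Encoding", "chunked");
        baidu.put("Content-Type", "text/html; charset=utf-8");

        System.out.println(contentLength(apache)); // prints 41765
        System.out.println(contentLength(baidu));  // prints -1
    }
}
```

In other words, -1 is not an error code. It only says that the length was not declared, and the body can still be read to the end, as the toString() length in the output shows.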
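At this point the problem seems to be solved. But a closer look at the code shows that DefaultHttpClient is deprecated. After reading further in the API docs, CloseableHttpClient httpclient = HttpClients.createDefault(); looks like the way forward.
The code is as follows: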
package test.ffm83.commons.httpClient;

import org.apache.commons.lang.StringUtils;
import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

/**
 * A simple HttpClient example: one method uses the old API, the other the new one.
 * Based on the 4.x API.
 * @author 范芳铭
 */
public class EasyHttpGetB {
    public static void main(String[] args) throws Exception {
        _getContentLength("http://www.apache.org/");
        _getContentLengthClose("http://www.apache.org/");
    }

    private static void _getContentLength(String url) throws Exception {
        HttpClient httpclient = new DefaultHttpClient(); // old API
        try {
            HttpGet httpget = new HttpGet(url);
            System.out.println(StringUtils.center(url + " getContentLength", 50, "="));
            HttpResponse response = httpclient.execute(httpget); // old API
            HttpEntity entity = response.getEntity();
            System.out.println(response.getStatusLine());
            String webContent = "";
            if (entity != null) {
                webContent = EntityUtils.toString(entity);
                System.out.println("Response content length: " + entity.getContentLength());
                System.out.println("Response toString() length: " + webContent.length());
            }
            System.out.println(response.toString()); // print the response status line and headers
            httpget.abort();
        } finally {
            httpclient.getConnectionManager().shutdown();
        }
    }

    private static void _getContentLengthClose(String url) throws Exception {
        CloseableHttpClient httpclient = HttpClients.createDefault(); // new API
        try {
            HttpGet httpget = new HttpGet(url);
            System.out.println(StringUtils.center(url + " getContentLengthClose", 50, "="));
            CloseableHttpResponse response = httpclient.execute(httpget); // new API
            HttpEntity entity = response.getEntity();
            System.out.println(response.getStatusLine());
            String webContent = "";
            if (entity != null) {
                webContent = EntityUtils.toString(entity);
                System.out.println("Response content length: " + entity.getContentLength());
                System.out.println("Response toString() length: " + webContent.length());
            }
            System.out.println(response.toString()); // print the response status line and headers
            httpget.abort();
        } finally {
            httpclient.close();
        }
    }
}
The output is:
=====http://www.apache.org/ getContentLength======
HTTP/1.1 200 OK
Response content length: 41765
Response toString() length: 41765
HTTP/1.1 200 OK [Date: Fri, 12 Dec 2014 13:41:26 GMT, Server: Apache/2.4.7 (Ubuntu), Last-Modified: Fri, 12 Dec 2014 12:10:43 GMT, ETag: "a325-50a03c82c50a6", Accept-Ranges: bytes, Content-Length: 41765, Vary: Accept-Encoding, Cache-Control: max-age=3600, Expires: Fri, 12 Dec 2014 14:41:26 GMT, Keep-Alive: timeout=30, max=100, Connection: Keep-Alive, Content-Type: text/html] org.apache.http.conn.BasicManagedEntity@503429
===http://www.apache.org/ getContentLengthClose===
HTTP/1.1 200 OK
Response content length: -1
Response toString() length: 41765
HttpResponseProxy{HTTP/1.1 200 OK [Date: Fri, 12 Dec 2014 13:41:31 GMT, Server: Apache/2.4.7 (Ubuntu), Last-Modified: Fri, 12 Dec 2014 12:10:43 GMT, ETag: "a325-50a03c82c50a6-gzip", Accept-Ranges: bytes, Vary: Accept-Encoding, Cache-Control: max-age=3600, Expires: Fri, 12 Dec 2014 14:41:31 GMT, Keep-Alive: timeout=30, max=100, Connection: Keep-Alive, Content-Type: text/html] org.apache.http.client.entity.GzipDecompressingEntity@ae506e}
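For the same site, http://www.apache.org, the responses seen through the two APIs differ considerably: the second one has no Content-Length header, and its ETag carries a -gzip suffix.
Another difference is the entity class. In the second response it is GzipDecompressingEntity, and judging from the name it probably has something to do with compression.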
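If compression really is involved, the missing length makes sense: the number of bytes on the wire is the compressed size, and the decompressed size is only known after the whole stream has been read. The sketch below shows this with the JDK's java.util.zip classes only. It does not touch HttpClient, and the class GzipLengthSketch and its helper methods are hypothetical names used for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipLengthSketch {

    /** Builds a repetitive fake HTML page so that gzip has something to compress. */
    static byte[] samplePage() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("<p>hello</p>\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    /** Compresses the given bytes with gzip. */
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buf)) {
            out.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed here, so the gzip trailer has been written.
        return buf.toByteArray();
    }

    /** Decompresses gzip data; the result size is only known once the stream ends. */
    static byte[] gunzip(byte[] compressed) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = samplePage();
        byte[] wire = gzip(raw);
        System.out.println("bytes on the wire (compressed): " + wire.length);
        System.out.println("bytes after decompression:      " + gunzip(wire).length);
    }
}
```

A wrapper that decompresses on the fly has the same problem: a Content-Length sent by the server would describe the compressed bytes, so for the decompressed content the only honest answer is "unknown", which is -1.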
Since compression may be involved, let's tell the server not to compress, and the -1 should go away. The code is as follows:
package test.ffm83.commons.httpClient;

import org.apache.commons.lang.StringUtils;
import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

/**
 * A simple HttpClient example: page compression makes getContentLength() return -1.
 * Based on the 4.x API.
 * @author 范芳铭
 */
public class EasyHttpGetC {
    public static void main(String[] args) throws Exception {
        _getContentLengthClose("http://www.apache.org/");
    }

    private static void _getContentLengthClose(String url) throws Exception {
        System.out.println(StringUtils.center(url + " getContentLengthClose", 50, "="));
        CloseableHttpClient httpclient = HttpClients.createDefault(); // new API
        try {
            HttpGet httpget = new HttpGet(url);
            // Explicitly ask the server not to compress the response.
            httpget.setHeader("Accept-Encoding", "identity");
            CloseableHttpResponse response = httpclient.execute(httpget); // new API
            HttpEntity entity = response.getEntity();
            System.out.println(response.getStatusLine());
            String webContent = "";
            if (entity != null) {
                webContent = EntityUtils.toString(entity);
                System.out.println("Response content length: " + entity.getContentLength());
                System.out.println("Response toString() length: " + webContent.length());
            }
            httpget.abort();
        } finally {
            httpclient.close();
        }
    }
}
The output is:
===http://www.apache.org/ getContentLengthClose===
HTTP/1.1 200 OK
Response content length: 41884
Response toString() length: 41884
Great, the problem is finally solved.
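Why does the identity header help? Seen from the server's side, Accept-Encoding lists the content codings the client is willing to receive, and identity means "no encoding". The sketch below is a deliberately simplified, hypothetical server policy (gzip whenever the client accepts it, otherwise identity); a real server's negotiation is more involved. The names EncodingChoiceSketch and chooseCoding are made up, wildcards such as * are ignored, and malformed q values are not handled.

```java
import java.util.Locale;

class EncodingChoiceSketch {

    /**
     * Picks "gzip" if the Accept-Encoding value lists gzip with a non-zero
     * quality value, otherwise "identity" (an uncompressed response).
     */
    static String chooseCoding(String acceptEncoding) {
        if (acceptEncoding == null) {
            return "identity";
        }
        for (String part : acceptEncoding.split(",")) {
            String[] fields = part.trim().split(";");
            String coding = fields[0].trim().toLowerCase(Locale.ROOT);
            double q = 1.0; // default quality when no q parameter is given
            for (int i = 1; i < fields.length; i++) {
                String param = fields[i].trim();
                if (param.startsWith("q=")) {
                    q = Double.parseDouble(param.substring(2));
                }
            }
            if (coding.equals("gzip") && q > 0) {
                return "gzip";
            }
        }
        return "identity";
    }

    public static void main(String[] args) {
        System.out.println(chooseCoding("gzip,deflate")); // prints gzip
        System.out.println(chooseCoding("identity"));     // prints identity
    }
}
```

With identity the server has no reason to compress, so it can send the plain page with its Content-Length header, and getContentLength() has a real value to report again. The price is that the page travels uncompressed.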