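HttpClient Overview and the Mystery of getContentLength() Returning -1
HttpClient began as a subproject of Apache Jakarta Commons (it now lives in the Apache HttpComponents project). It provides an efficient, up-to-date, feature-rich client-side toolkit for the HTTP protocol and supports the latest HTTP versions and recommendations. HttpClient is already used in many projects; well-known open-source projects such as Cactus and HtmlUnit both rely on it.
At the time of writing, the latest stable release is 4.3.6.
Download: http://hc.apache.org/downloads.cgi
Introduction to HttpClient
HTTP is probably the most widely used and most important protocol on the Internet today, and more and more Java applications need to access network resources directly over HTTP. The java.net package in the JDK already offers basic HTTP access, but for most applications the JDK library alone is not rich or flexible enough. HttpClient is meant to fill that gap.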
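Basic Usage of HttpClient: the GET Method
Using HttpClient takes the following 6 steps:
1. Create an HttpClient instance.
2. Create an instance of a request method, here GetMethod (the HttpClient 3.x class name; the 4.x examples below use HttpGet), passing the target URL to its constructor.
3. Call the execute method of the instance created in step 1 to run the method instance created in step 2.
4. Read the response.
5. Release the connection. The connection must be released whether or not the call succeeded.
6. Process the content you received.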
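The Mystery of HttpClient's getContentLength() Returning -1
HttpClient is a code-level HTTP client: you can use it to act like a browser and send requests to an HTTP server. HttpClient also requires HttpCore, which contains the classes that model HTTP requests and HTTP responses.
Let's start with a simple HttpGet example.
The code is as follows: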
package test.ffm83.commons.httpClient;

import org.apache.commons.lang.StringUtils;
import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.util.EntityUtils;

/**
 * A simple HttpClient example.
 * Based on the 4.x API.
 * @author 范芳铭
 */
public class EasyHttpGetA {
    public static void main(String[] args) throws Exception {
        getHttpGetOld("http://www.baidu.com/");
        getHttpGetOld("http://www.apache.org/");
    }

    private static void getHttpGetOld(String url) throws Exception {
        HttpClient httpclient = new DefaultHttpClient(); // old API, deprecated
        try {
            HttpGet httpget = new HttpGet(url);
            // Print the URL centered in a 50-character banner of '='.
            System.out.println(StringUtils.center(url, 50, "="));
            HttpResponse response = httpclient.execute(httpget); // old API, deprecated
            HttpEntity entity = response.getEntity();
            System.out.println(response.getStatusLine());
            String webContent = "";
            if (entity != null) {
                // Read the whole body, then compare the declared length with what was read.
                webContent = EntityUtils.toString(entity);
                System.out.println("Response getContentLength: " + entity.getContentLength());
                System.out.println("Response toString() length: " + webContent.length());
            }
            //System.out.println(response.toString()); // print the status line and response headers
            System.out.println("----------------------------------------");
            httpget.abort();
        } finally {
            httpclient.getConnectionManager().shutdown();
        }
    }
}
The results are as follows:
==============http://www.baidu.com/===============
HTTP/1.1 200 OK
Response getContentLength: -1
Response toString() length: 85664
----------------------------------------
==============http://www.apache.org/==============
HTTP/1.1 200 OK
Response getContentLength: 41765
Response toString() length: 41765
----------------------------------------
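So here is the question: why does Baidu report Response getContentLength: -1?
Uncomment the System.out.println(response.toString()) line, which prints the status line and the response headers, and run the program again. The results are as follows: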
==============http://www.baidu.com/===============
HTTP/1.1 200 OK
Response getContentLength: -1
Response toString() length: 85720
HTTP/1.1 200 OK [Date: Fri, 12 Dec 2014 13:11:00 GMT, Content-Type: text/html; charset=utf-8, Transfer-Encoding: chunked, Connection: Keep-Alive, Vary: Accept-Encoding, Set-Cookie: BAIDUID=165FB4395C8BBCE8F574B740FDA16EC7:FG=1; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com, Set-Cookie: BAIDUPSID=165FB4395C8BBCE8F574B740FDA16EC7; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com, Set-Cookie: BDSVRTM=0; path=/, Set-Cookie: BD_HOME=0; path=/, Set-Cookie: H_PS_PSSID=1435_9992_10571_10503_10500_10497_10017_10510_10645_10458_10066_10219_9769_10591_10355_9093_10095_10008_10442_10461_9950_9024_10627; path=/; domain=.baidu.com, P3P: CP=" OTI DSP COR IVA OUR IND COM ", Cache-Control: private, Cxy_all: baidu+f39c7d3dc65b3b8b1ed456d4af91f779, Expires: Fri, 12 Dec 2014 13:10:59 GMT, X-Powered-By: HPHP, Server: BWS/1.1, BDPAGETYPE: 1, BDQID: 0xa337691d0000efac, BDUSERID: 0] org.apache.http.conn.BasicManagedEntity@8b819f
----------------------------------------
==============http://www.apache.org/==============
HTTP/1.1 200 OK
Response getContentLength: 41765
Response toString() length: 41765
HTTP/1.1 200 OK [Date: Fri, 12 Dec 2014 13:11:08 GMT, Server: Apache/2.4.7 (Ubuntu), Last-Modified: Fri, 12 Dec 2014 12:10:43 GMT, ETag: "a325-50a03c82c50a6", Accept-Ranges: bytes, Content-Length: 41765, Vary: Accept-Encoding, Cache-Control: max-age=3600, Expires: Fri, 12 Dec 2014 14:11:08 GMT, Keep-Alive: timeout=30, max=100, Connection: Keep-Alive, Content-Type: text/html] org.apache.http.conn.BasicManagedEntity@120a47e
----------------------------------------
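Look closely at the headers of the two responses. The response from www.apache.org carries the header Content-Length: 41765, and that is exactly the value entity.getContentLength() returns. Baidu's response has no such header: it is sent with Transfer-Encoding: chunked, so the total length is never announced up front. That is why we get
Response getContentLength: -1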
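The same effect can be reproduced without touching any external site. The following is only a sketch using JDK classes (com.sun.net.httpserver.HttpServer and HttpURLConnection) rather than HttpClient; the class name ContentLengthDemo, the /fixed and /chunked paths and the sample body are made up for illustration. One endpoint announces its length, the other streams the body in chunks, and the client prints the declared length next to the number of bytes it actually read.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

class ContentLengthDemo {
    // Hypothetical sample body, used by both endpoints.
    static final byte[] BODY = "hello, content length".getBytes(StandardCharsets.UTF_8);

    /** Starts a throwaway local server on a free port with two endpoints. */
    static HttpServer startServer() throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        // /fixed: the length is passed up front, so a Content-Length header is sent.
        server.createContext("/fixed", exchange -> {
            exchange.sendResponseHeaders(200, BODY.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(BODY);
            }
        });
        // /chunked: a length of 0 makes this server use chunked transfer encoding.
        server.createContext("/chunked", exchange -> {
            exchange.sendResponseHeaders(200, 0);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(BODY);
            }
        });
        server.start();
        return server;
    }

    /** Returns {length declared by the headers, number of bytes actually read}. */
    static long[] fetch(int port, String path) throws Exception {
        HttpURLConnection conn = (HttpURLConnection)
                URI.create("http://127.0.0.1:" + port + path).toURL().openConnection();
        try (InputStream in = conn.getInputStream()) {
            long declared = conn.getContentLengthLong(); // -1 when no Content-Length header
            long actual = 0;
            byte[] buf = new byte[1024];
            for (int n; (n = in.read(buf)) != -1; ) {
                actual += n;
            }
            return new long[] {declared, actual};
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws Exception {
        HttpServer server = startServer();
        try {
            int port = server.getAddress().getPort();
            long[] fixed = fetch(port, "/fixed");
            long[] chunked = fetch(port, "/chunked");
            System.out.println("fixed:   declared=" + fixed[0] + " actual=" + fixed[1]);
            System.out.println("chunked: declared=" + chunked[0] + " actual=" + chunked[1]);
        } finally {
            server.stop(0);
        }
    }
}
```

In both cases the body arrives in full; only the declared length differs. A value of -1 therefore means "not announced", not "empty".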
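At this point the problem seems to be solved. Looking at the code again, however, DefaultHttpClient is deprecated. Reading further in the documentation, CloseableHttpClient httpclient = HttpClients.createDefault(); appears to be the way forward.
The code is as follows: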
package test.ffm83.commons.httpClient;

import org.apache.commons.lang.StringUtils;
import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

/**
 * A simple HttpClient example: one method uses the old API, the other the new API.
 * Based on the 4.x API.
 * @author 范芳铭
 */
public class EasyHttpGetB {
    public static void main(String[] args) throws Exception {
        _getContentLength("http://www.apache.org/");
        _getContentLengthClose("http://www.apache.org/");
    }

    private static void _getContentLength(String url) throws Exception {
        HttpClient httpclient = new DefaultHttpClient(); // old API
        try {
            HttpGet httpget = new HttpGet(url);
            System.out.println(StringUtils.center(url + " getContentLength", 50, "="));
            HttpResponse response = httpclient.execute(httpget); // old API
            HttpEntity entity = response.getEntity();
            System.out.println(response.getStatusLine());
            String webContent = "";
            if (entity != null) {
                webContent = EntityUtils.toString(entity);
                System.out.println("Response content length: " + entity.getContentLength());
                System.out.println("Response toString() length: " + webContent.length());
            }
            System.out.println(response.toString()); // print the status line and response headers
            httpget.abort();
        } finally {
            httpclient.getConnectionManager().shutdown();
        }
    }

    private static void _getContentLengthClose(String url) throws Exception {
        CloseableHttpClient httpclient = HttpClients.createDefault(); // new API
        try {
            HttpGet httpget = new HttpGet(url);
            System.out.println(StringUtils.center(url + " getContentLengthClose", 50, "="));
            CloseableHttpResponse response = httpclient.execute(httpget); // new API
            HttpEntity entity = response.getEntity();
            System.out.println(response.getStatusLine());
            String webContent = "";
            if (entity != null) {
                webContent = EntityUtils.toString(entity);
                System.out.println("Response content length: " + entity.getContentLength());
                System.out.println("Response toString() length: " + webContent.length());
            }
            System.out.println(response.toString()); // print the status line and response headers
            httpget.abort();
        } finally {
            httpclient.close();
        }
    }
}
The results are as follows:
=====http://www.apache.org/ getContentLength======
HTTP/1.1 200 OK
Response content length: 41765
Response toString() length: 41765
HTTP/1.1 200 OK [Date: Fri, 12 Dec 2014 13:41:26 GMT, Server: Apache/2.4.7 (Ubuntu), Last-Modified: Fri, 12 Dec 2014 12:10:43 GMT, ETag: "a325-50a03c82c50a6", Accept-Ranges: bytes, Content-Length: 41765, Vary: Accept-Encoding, Cache-Control: max-age=3600, Expires: Fri, 12 Dec 2014 14:41:26 GMT, Keep-Alive: timeout=30, max=100, Connection: Keep-Alive, Content-Type: text/html] org.apache.http.conn.BasicManagedEntity@503429
===http://www.apache.org/ getContentLengthClose===
HTTP/1.1 200 OK
Response content length: -1
Response toString() length: 41765
HttpResponseProxy{HTTP/1.1 200 OK [Date: Fri, 12 Dec 2014 13:41:31 GMT, Server: Apache/2.4.7 (Ubuntu), Last-Modified: Fri, 12 Dec 2014 12:10:43 GMT, ETag: "a325-50a03c82c50a6-gzip", Accept-Ranges: bytes, Vary: Accept-Encoding, Cache-Control: max-age=3600, Expires: Fri, 12 Dec 2014 14:41:31 GMT, Keep-Alive: timeout=30, max=100, Connection: Keep-Alive, Content-Type: text/html] org.apache.http.client.entity.GzipDecompressingEntity@ae506e}
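The responses that http://www.apache.org returns to the two methods differ considerably: the second one has no Content-Length header, and its ETag ends in -gzip.
The entity class is different as well. The second response wraps its body in a GzipDecompressingEntity, and judging by the name, this probably has something to do with compression.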
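Why would compression make the length unknown? The following is a small sketch using only java.util.zip from the JDK, not HttpClient itself; the class name GzipLengthDemo and the sample markup are invented for illustration. It shows that the byte count on the wire says nothing about the decompressed size, which only becomes known once the stream has been read to the end.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipLengthDemo {
    /** Compresses data roughly as a server would for "Content-Encoding: gzip". */
    static byte[] gzip(byte[] plain) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(sink)) {
            out.write(plain);
        }
        return sink.toByteArray();
    }

    /** Decompresses while counting; the total is only known at end of stream. */
    static long countDecompressed(byte[] compressed) throws IOException {
        long total = 0;
        byte[] buf = new byte[512];
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            for (int n; (n = in.read(buf)) != -1; ) {
                total += n;
            }
        }
        return total;
    }

    /** Builds a made-up, repetitive page body for the demonstration. */
    static byte[] samplePage() {
        StringBuilder page = new StringBuilder();
        for (int i = 0; i < 500; i++) {
            page.append("<p>repetitive markup compresses well</p>\n");
        }
        return page.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = gzip(samplePage());
        System.out.println("bytes on the wire: " + wire.length);
        System.out.println("bytes after decompression: " + countDecompressed(wire));
    }
}
```

An entity that decompresses on the fly is in the same position as countDecompressed before the loop finishes, so reporting -1 for the content length is the honest answer.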
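If compression is the cause (the client built by HttpClients.createDefault() appears to request compressed content by default and decompress it transparently), then telling the server not to compress should make the -1 go away. The code is as follows: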
package test.ffm83.commons.httpClient;

import org.apache.commons.lang.StringUtils;
import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

/**
 * A simple HttpClient example: page compression makes getContentLength return -1.
 * Based on the 4.x API.
 * @author 范芳铭
 */
public class EasyHttpGetC {
    public static void main(String[] args) throws Exception {
        _getContentLengthClose("http://www.apache.org/");
    }

    private static void _getContentLengthClose(String url) throws Exception {
        System.out.println(StringUtils.center(url + " getContentLengthClose", 50, "="));
        CloseableHttpClient httpclient = HttpClients.createDefault(); // new API
        try {
            HttpGet httpget = new HttpGet(url);
            // Explicitly ask the server not to compress the response.
            httpget.setHeader("Accept-Encoding", "identity");
            CloseableHttpResponse response = httpclient.execute(httpget); // new API
            HttpEntity entity = response.getEntity();
            System.out.println(response.getStatusLine());
            String webContent = "";
            if (entity != null) {
                webContent = EntityUtils.toString(entity);
                System.out.println("Response content length: " + entity.getContentLength());
                System.out.println("Response toString() length: " + webContent.length());
            }
            httpget.abort();
        } finally {
            httpclient.close();
        }
    }
}
The results are as follows:
===http://www.apache.org/ getContentLengthClose===
HTTP/1.1 200 OK
Response content length: 41884
Response toString() length: 41884
Great, the problem is finally solved.
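One caveat about the comparison used throughout: getContentLength() counts bytes, while webContent.length() counts Java chars (UTF-16 code units). The two numbers match for the apache.org page, which suggests that page is essentially single-byte text, but they need not match for a UTF-8 page containing Chinese. A minimal sketch of the difference, with a made-up class name and sample strings:

```java
import java.nio.charset.StandardCharsets;

class CharsVersusBytes {
    /** Number of bytes the string occupies when encoded as UTF-8. */
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String ascii = "hello";
        String chinese = "\u4f60\u597d"; // two Chinese characters, written as escapes
        System.out.println("ascii:   chars=" + ascii.length() + " bytes=" + utf8Bytes(ascii));
        System.out.println("chinese: chars=" + chinese.length() + " bytes=" + utf8Bytes(chinese));
    }
}
```

So even when Content-Length is present, it only equals the string length if every character encodes to a single byte.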
