
HttpClient overview and the mystery of getContentLength() returning -1

Created: 2014-12-15

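HttpClient started as a subproject of Apache Jakarta Commons (it now lives in the Apache HttpComponents project). It provides an efficient, up-to-date, feature-rich toolkit for writing HTTP clients, with support for the latest versions and recommendations of the HTTP protocol. It is used by many projects; for example, the well-known open-source projects Cactus (Apache Jakarta) and HtmlUnit both use HttpClient.

At the time of writing (December 2014), the latest stable release is 4.3.6.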

Download: http://hc.apache.org/downloads.cgi

HttpClient overview

HTTP is probably the most widely used and most important protocol on the Internet today, and more and more Java applications need to access network resources directly over HTTP. The JDK's java.net package does provide basic HTTP support, but for most applications the functionality in the JDK alone is not rich or flexible enough.

Basic HttpClient usage: the GET method

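Using HttpClient takes the following six steps:

1. Create an HttpClient instance.

2. Create an instance of a request method, in this case GetMethod (called HttpGet in HttpClient 4.x), passing the target URL to its constructor.

3. Call execute on the client created in step 1, passing the method instance created in step 2.

4. Read the response.

5. Release the connection. This must happen whether or not the method executed successfully.

6. Process the content you retrieved.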
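The six steps are not specific to HttpClient. As a hedged, self-contained sketch that needs no third-party jar and no internet access, here they are with the JDK's own java.net.HttpURLConnection (swapped in for HttpClient) against a throwaway local server built with com.sun.net.httpserver.HttpServer, which ships with the standard JDK. The class name SixStepsSketch, the /hello path and the response text are illustrative assumptions.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: the six steps, using the JDK instead of HttpClient.
final class SixStepsSketch {

    // Starts a local server on a free port that answers /hello with a fixed body.
    static HttpServer startServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/hello", exchange -> {
            byte[] body = "hello, world".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length); // declares Content-Length
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
        return server;
    }

    static String fetch(String url) throws IOException {
        // Steps 1-2: HttpURLConnection merges "client" and "method" into one object; use GET.
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("GET");
        try {
            // Step 3: execute the request.
            int status = conn.getResponseCode();
            if (status != 200) {
                throw new IOException("unexpected status " + status);
            }
            // Step 4: read the whole response body.
            try (InputStream in = conn.getInputStream()) {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                for (int n; (n = in.read(chunk)) != -1; ) {
                    buf.write(chunk, 0, n);
                }
                return new String(buf.toByteArray(), StandardCharsets.UTF_8);
            }
        } finally {
            // Step 5: release the connection whether or not the request succeeded.
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startServer();
        try {
            String body = fetch("http://127.0.0.1:" + server.getAddress().getPort() + "/hello");
            // Step 6: process the content.
            System.out.println(body.toUpperCase());
        } finally {
            server.stop(0);
        }
    }
}
```

With HttpClient 4.x the same skeleton maps onto HttpClients.createDefault(), new HttpGet(url), execute(), EntityUtils.toString() and close().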

 

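The mystery of HttpClient's getContentLength() returning -1

HttpClient is a code-level HTTP client: you can use it to simulate a browser sending requests to an HTTP server. It also requires HttpCore, which contains the classes that model HTTP requests and responses.

Let's start with a simple HttpGet example. The code: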

package test.ffm83.commons.httpClient;

import org.apache.commons.lang.StringUtils;
import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.util.EntityUtils;
/**
 * A simple HttpClient example
 * Based on version 4.x
 * @author 范芳铭
 */
public class EasyHttpGetA {
    public final static void main(String[] args) throws Exception {
        getHttpGetOld("http://www.baidu.com/");
        getHttpGetOld("http://www.apache.org/");
    }

    private static void getHttpGetOld(String url) throws Exception {
        HttpClient httpclient = new DefaultHttpClient();  // old style, deprecated since 4.3
        try {
            HttpGet httpget = new HttpGet(url);
            System.out.println(StringUtils.center(url, 50, "="));
            HttpResponse response = httpclient.execute(httpget); // old style
            HttpEntity entity = response.getEntity();
            System.out.println(response.getStatusLine());
            String webContent = "";
            if (entity != null) {
                // read the whole body into a String (this consumes the entity)
                webContent = EntityUtils.toString(entity);
                System.out.println("Response getContentLength: " + entity.getContentLength());
                System.out.println("Response toString() length: " + webContent.length());
            }
            //System.out.println(response.toString()); // prints the response status line and headers
            System.out.println("----------------------------------------");
            httpget.abort();
        }
        finally {
            // release all connections held by this client
            httpclient.getConnectionManager().shutdown();
        }
    }
}


Output:

==============http://www.baidu.com/===============

HTTP/1.1 200 OK

Response getContentLength: -1

Response toString() length: 85664

----------------------------------------

==============http://www.apache.org/==============

HTTP/1.1 200 OK

Response getContentLength: 41765

Response toString() length: 41765

----------------------------------------
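One caveat when comparing these two numbers: Content-Length, and therefore getContentLength(), counts bytes on the wire, while String.length() counts UTF-16 chars after decoding. They happen to agree for www.apache.org here, but when a body is decoded as UTF-8 and contains non-ASCII text (Baidu's page is declared charset=utf-8 and contains Chinese), the two numbers diverge. A minimal JDK-only illustration; the class name and sample strings are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo: byte count (what Content-Length measures) vs. String.length().
final class BytesVsChars {

    // Number of bytes the text occupies when encoded as UTF-8.
    static int byteLength(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of UTF-16 code units, i.e. what webContent.length() reports.
    static int charLength(String text) {
        return text.length();
    }

    public static void main(String[] args) {
        String ascii = "<html>Apache</html>";
        // Four CJK characters, each encoded as 3 bytes in UTF-8.
        String cjk = "<html>\u767e\u5ea6\u4e00\u4e0b</html>";
        System.out.println(byteLength(ascii) + " bytes vs " + charLength(ascii) + " chars"); // 19 vs 19
        System.out.println(byteLength(cjk) + " bytes vs " + charLength(cjk) + " chars");     // 25 vs 17
    }
}
```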

 

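So here is the question: why does the Baidu response report getContentLength: -1?

Uncomment the line System.out.println(response.toString()), which prints the response status line and headers, and run the program again. The output is: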

==============http://www.baidu.com/===============

HTTP/1.1 200 OK

Response getContentLength: -1

Response toString() length: 85720

HTTP/1.1 200 OK [Date: Fri, 12 Dec 2014 13:11:00 GMT, Content-Type: text/html; charset=utf-8, Transfer-Encoding: chunked, Connection: Keep-Alive, Vary: Accept-Encoding, Set-Cookie: BAIDUID=165FB4395C8BBCE8F574B740FDA16EC7:FG=1; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com, Set-Cookie: BAIDUPSID=165FB4395C8BBCE8F574B740FDA16EC7; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com, Set-Cookie: BDSVRTM=0; path=/, Set-Cookie: BD_HOME=0; path=/, Set-Cookie: H_PS_PSSID=1435_9992_10571_10503_10500_10497_10017_10510_10645_10458_10066_10219_9769_10591_10355_9093_10095_10008_10442_10461_9950_9024_10627; path=/; domain=.baidu.com, P3P: CP=" OTI DSP COR IVA OUR IND COM ", Cache-Control: private, Cxy_all: baidu+f39c7d3dc65b3b8b1ed456d4af91f779, Expires: Fri, 12 Dec 2014 13:10:59 GMT, X-Powered-By: HPHP, Server: BWS/1.1, BDPAGETYPE: 1, BDQID: 0xa337691d0000efac, BDUSERID: 0] org.apache.http.conn.BasicManagedEntity@8b819f

----------------------------------------

==============http://www.apache.org/==============

HTTP/1.1 200 OK

Response getContentLength: 41765

Response toString() length: 41765

HTTP/1.1 200 OK [Date: Fri, 12 Dec 2014 13:11:08 GMT, Server: Apache/2.4.7 (Ubuntu), Last-Modified: Fri, 12 Dec 2014 12:10:43 GMT, ETag: "a325-50a03c82c50a6", Accept-Ranges: bytes, Content-Length: 41765, Vary: Accept-Encoding, Cache-Control: max-age=3600, Expires: Fri, 12 Dec 2014 14:11:08 GMT, Keep-Alive: timeout=30, max=100, Connection: Keep-Alive, Content-Type: text/html] org.apache.http.conn.BasicManagedEntity@120a47e

----------------------------------------

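Look closely at these responses. The response from www.apache.org contains the header Content-Length: 41765, and that is exactly the value entity.getContentLength() returns. Baidu's response has no Content-Length header at all; it is sent with Transfer-Encoding: chunked, in which the body arrives as a series of chunks and its total size is not declared up front. With no declared length, HttpClient reports -1, which is why the output shows

Response getContentLength: -1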
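The same behaviour can be reproduced offline. In this hedged, JDK-only sketch, a local com.sun.net.httpserver.HttpServer serves one path with a declared length (like www.apache.org) and one with chunked encoding (like www.baidu.com); HttpServer switches to chunked encoding when sendResponseHeaders is given a length of 0. The JDK's HttpURLConnection.getContentLengthLong() stands in for HttpClient's getContentLength(). The class name, paths and body text are illustrative assumptions.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: Content-Length present vs. Transfer-Encoding: chunked.
final class ChunkedLengthDemo {

    static final byte[] BODY = "<html>same body either way</html>".getBytes(StandardCharsets.UTF_8);

    static HttpServer startServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        // Length declared up front: the server sends a Content-Length header.
        server.createContext("/fixed", ex -> {
            ex.sendResponseHeaders(200, BODY.length);
            ex.getResponseBody().write(BODY);
            ex.close();
        });
        // Length 0: HttpServer uses Transfer-Encoding: chunked instead.
        server.createContext("/chunked", ex -> {
            ex.sendResponseHeaders(200, 0);
            ex.getResponseBody().write(BODY);
            ex.close();
        });
        server.start();
        return server;
    }

    // Returns {declared length, number of bytes actually read}.
    static long[] probe(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        try (InputStream in = conn.getInputStream()) {
            long declared = conn.getContentLengthLong(); // -1 when there is no Content-Length
            long read = 0;
            byte[] buf = new byte[4096];
            for (int n; (n = in.read(buf)) != -1; ) {
                read += n;
            }
            return new long[] {declared, read};
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startServer();
        try {
            String base = "http://127.0.0.1:" + server.getAddress().getPort();
            for (String path : new String[] {"/fixed", "/chunked"}) {
                long[] r = probe(base + path);
                System.out.println(path + ": declared=" + r[0] + ", bytes read=" + r[1]);
            }
        } finally {
            server.stop(0);
        }
    }
}
```

In both cases the full body arrives; only the up-front declaration of its size is missing in the chunked case, so the length has to be measured by reading the body.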

 

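At this point the problem seems to be solved. Looking at the code again, though, DefaultHttpClient is deprecated. Reading further in the documentation, CloseableHttpClient httpclient = HttpClients.createDefault(); appears to be the recommended replacement. The following code requests the same URL once with the old approach and once with the new one: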

package test.ffm83.commons.httpClient;

import org.apache.commons.lang.StringUtils;
import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;
/**
 * A simple HttpClient example: one method uses the old API, the other the new one
 * Based on version 4.x
 * @author 范芳铭
 */
public class EasyHttpGetB {
    public final static void main(String[] args) throws Exception {
        _getContentLength("http://www.apache.org/");
        _getContentLengthClose("http://www.apache.org/");
    }

    private static void _getContentLength(String url) throws Exception {
        HttpClient httpclient = new DefaultHttpClient();  // old style
        try {
            HttpGet httpget = new HttpGet(url);
            System.out.println(StringUtils.center(url + " getContentLength", 50, "="));
            HttpResponse response = httpclient.execute(httpget); // old style
            HttpEntity entity = response.getEntity();
            System.out.println(response.getStatusLine());
            String webContent = "";
            if (entity != null) {
                webContent = EntityUtils.toString(entity);
                System.out.println("Response content    length: " + entity.getContentLength());
                System.out.println("Response toString() length: " + webContent.length());
            }
            System.out.println(response.toString()); // print the response status line and headers
            httpget.abort();
        }
        finally {
            httpclient.getConnectionManager().shutdown();
        }
    }

    private static void _getContentLengthClose(String url) throws Exception {
        CloseableHttpClient httpclient = HttpClients.createDefault(); // new style
        try {
            HttpGet httpget = new HttpGet(url);
            System.out.println(StringUtils.center(url + " getContentLengthClose", 50, "="));
            CloseableHttpResponse response = httpclient.execute(httpget); // new style
            try {
                HttpEntity entity = response.getEntity();
                System.out.println(response.getStatusLine());
                String webContent = "";
                if (entity != null) {
                    webContent = EntityUtils.toString(entity);
                    System.out.println("Response content    length: " + entity.getContentLength());
                    System.out.println("Response toString() length: " + webContent.length());
                }
                System.out.println(response.toString()); // print the response status line and headers
            } finally {
                response.close(); // release the connection held by this response
            }
        }
        finally {
            httpclient.close();
        }
    }
}


Output:

=====http://www.apache.org/ getContentLength======

HTTP/1.1 200 OK

Response content    length: 41765

Response toString() length: 41765

HTTP/1.1 200 OK [Date: Fri, 12 Dec 2014 13:41:26 GMT, Server: Apache/2.4.7 (Ubuntu), Last-Modified: Fri, 12 Dec 2014 12:10:43 GMT, ETag: "a325-50a03c82c50a6", Accept-Ranges: bytes, Content-Length: 41765, Vary: Accept-Encoding, Cache-Control: max-age=3600, Expires: Fri, 12 Dec 2014 14:41:26 GMT, Keep-Alive: timeout=30, max=100, Connection: Keep-Alive, Content-Type: text/html] org.apache.http.conn.BasicManagedEntity@503429

===http://www.apache.org/ getContentLengthClose===

HTTP/1.1 200 OK

Response content    length: -1

Response toString() length: 41765

HttpResponseProxy{HTTP/1.1 200 OK [Date: Fri, 12 Dec 2014 13:41:31 GMT, Server: Apache/2.4.7 (Ubuntu), Last-Modified: Fri, 12 Dec 2014 12:10:43 GMT, ETag: "a325-50a03c82c50a6-gzip", Accept-Ranges: bytes, Vary: Accept-Encoding, Cache-Control: max-age=3600, Expires: Fri, 12 Dec 2014 14:41:31 GMT, Keep-Alive: timeout=30, max=100, Connection: Keep-Alive, Content-Type: text/html] org.apache.http.client.entity.GzipDecompressingEntity@ae506e}

 

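www.apache.org returns very different responses to the two approaches: with the new client the Content-Length header is gone and the ETag carries a -gzip suffix.

The entity class also differs: the second one is GzipDecompressingEntity, which, as the name suggests, has to do with compression. The client built by HttpClients.createDefault() advertises gzip/deflate support in its Accept-Encoding request header and transparently decompresses the response, whereas the old DefaultHttpClient does not. The decompressed size is not known in advance, so the wrapping GzipDecompressingEntity reports -1 from getContentLength().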
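Why does decompression hide the length? For a gzip-encoded response, any Content-Length the server sends counts the compressed bytes, not the decompressed text that EntityUtils.toString() eventually returns, and the decompressed size is only known once the whole stream has been inflated. A hedged JDK-only illustration with java.util.zip; the class name and the repetitive sample page are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo: compressed size (what a server's Content-Length would describe)
// vs. decompressed size (what the application actually reads).
final class GzipLengthDemo {

    static byte[] gzip(byte[] plain) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(plain);
        } // closing the GZIPOutputStream finishes the stream and writes the trailer
        return out.toByteArray();
    }

    // The decompressed length can only be learned by reading the stream to the end.
    static long inflatedLength(byte[] compressed) throws IOException {
        long total = 0;
        byte[] buf = new byte[4096];
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            for (int n; (n = in.read(buf)) != -1; ) {
                total += n;
            }
        }
        return total;
    }

    // A made-up, highly repetitive HTML page, so it compresses well.
    static byte[] samplePage() {
        StringBuilder sb = new StringBuilder("<html>");
        for (int i = 0; i < 500; i++) {
            sb.append("<p>The Apache Software Foundation</p>");
        }
        return sb.append("</html>").toString().getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        byte[] page = samplePage();
        byte[] wire = gzip(page);
        System.out.println("bytes on the wire (compressed): " + wire.length);
        System.out.println("bytes after decompression:      " + inflatedLength(wire));
    }
}
```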

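If compression is the cause, asking the server not to compress the response should make the -1 go away. The request below sets Accept-Encoding: identity: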

package test.ffm83.commons.httpClient;

import org.apache.commons.lang.StringUtils;
import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;
/**
 * A simple HttpClient example: page compression makes getContentLength() return -1
 * Based on version 4.x
 * @author 范芳铭
 */
public class EasyHttpGetC {
    public final static void main(String[] args) throws Exception {
        _getContentLengthClose("http://www.apache.org/");
    }

    private static void _getContentLengthClose(String url) throws Exception {
        System.out.println(StringUtils.center(url + " getContentLengthClose", 50, "="));
        CloseableHttpClient httpclient = HttpClients.createDefault(); // new style
        try {
            HttpGet httpget = new HttpGet(url);
            // explicitly ask the server not to compress the response
            httpget.setHeader("Accept-Encoding", "identity");

            CloseableHttpResponse response = httpclient.execute(httpget); // new style
            try {
                HttpEntity entity = response.getEntity();
                System.out.println(response.getStatusLine());
                String webContent = "";
                if (entity != null) {
                    webContent = EntityUtils.toString(entity);
                    System.out.println("Response content    length: " + entity.getContentLength());
                    System.out.println("Response toString() length: " + webContent.length());
                }
            } finally {
                response.close(); // release the connection held by this response
            }
        }
        finally {
            httpclient.close();
        }
    }
}


Output:

===http://www.apache.org/ getContentLengthClose===

HTTP/1.1 200 OK

Response content    length: 41884

Response toString() length: 41884

 

Great, the problem is finally solved.
