Apache HttpClient
This article analyzes the following version:
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient</artifactId>
<version>4.4.1</version>
</dependency>
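All code shown below has been trimmed to the essentials.
Request flow
When we call HttpClient's execute() to access an endpoint, the call passes through a chain of responsibility and eventually reaches MainClientExec.execute():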
public CloseableHttpResponse execute(
final HttpRoute route,
final HttpRequestWrapper request,
final HttpClientContext context,
final HttpExecutionAware execAware) throws IOException, HttpException {
//1. Lease a connection from the pool
Object userToken = context.getUserToken();
final ConnectionRequest connRequest = connManager.requestConnection(route, userToken);
final RequestConfig config = context.getRequestConfig();
final HttpClientConnection managedConn;
//The ConnectionRequestTimeout setting is used here
final int timeout = config.getConnectionRequestTimeout();
managedConn = connRequest.get(timeout > 0 ? timeout : 0, TimeUnit.MILLISECONDS);
context.setAttribute(HttpCoreContext.HTTP_CONNECTION, managedConn);
//Second setting: check whether the connection is still valid
if (config.isStaleConnectionCheckEnabled()) {
// validate connection
if (managedConn.isOpen()) {
this.log.debug("Stale connection check");
if (managedConn.isStale()) {
this.log.debug("Stale connection detected");
managedConn.close();
}
}
}
final ConnectionHolder connHolder = new ConnectionHolder(this.log, this.connManager, managedConn);
try {
HttpResponse response;
for (int execCount = 1;; execCount++) {
if (!managedConn.isOpen()) {//no socket bound yet
//The connection was leased above; now it has to be bound to a socket
this.log.debug("Opening connection " + route);
//This establishes the TCP/IP connection and binds the socket to managedConn
establishRoute(proxyAuthState, managedConn, route, request, context);
}
//Before actually talking to the server, the socket timeout (SocketTimeout) has to be set
final int timeout = config.getSocketTimeout();
if (timeout >= 0) {
managedConn.setSocketTimeout(timeout);
}
//2. Actually send the request
response = requestExecutor.execute(request, managedConn, context);
// The connection is in or can be brought to a re-usable state.
if (reuseStrategy.keepAlive(response, context)) {
final long duration = keepAliveStrategy.getKeepAliveDuration(response, context);
connHolder.setValidFor(duration, TimeUnit.MILLISECONDS);
//This affects the behavior of releaseConnection()
connHolder.markReusable();
} else {
connHolder.markNonReusable();
}
}
// check for entity, release connection if possible
final HttpEntity entity = response.getEntity();
if (entity == null || !entity.isStreaming()) {
// connection not needed and (assumed to be) in re-usable state
connHolder.releaseConnection();
return new HttpResponseProxy(response, null);
} else {
return new HttpResponseProxy(response, connHolder);
}
} catch (...) {
...
}
}
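To summarize the flow above:
- lease a connection from the connection pool (with various validity checks);
- send the request over that connection;
- set some parameters based on the response, such as keep-alive;
- release the connection (immediately if the response has no streaming entity, otherwise the ConnectionHolder travels with the response) and return the response;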
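As a rough illustration of where the settings read in MainClientExec come from, the sketch below configures the three timeouts on a pooled client. The pool sizes, the timeout values, and the http://example.com/ URL are placeholder assumptions, not recommendations; the calls are the public HttpClient 4.4+ builder API as I understand it.

```java
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;
import org.apache.http.util.EntityUtils;

public class TimeoutConfigSketch {
    public static void main(String[] args) throws Exception {
        // Pool limits: illustrative values only.
        PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
        cm.setMaxTotal(50);            // limit across all routes
        cm.setDefaultMaxPerRoute(10);  // limit per route

        RequestConfig config = RequestConfig.custom()
                .setConnectionRequestTimeout(500) // max wait for a pooled connection (ms)
                .setConnectTimeout(1000)          // max time to establish the TCP connection (ms)
                .setSocketTimeout(3000)           // max inactivity while reading the response (ms)
                .build();

        try (CloseableHttpClient client = HttpClients.custom()
                .setConnectionManager(cm)
                .setDefaultRequestConfig(config)
                .build()) {
            HttpGet get = new HttpGet("http://example.com/"); // placeholder URL
            try (CloseableHttpResponse response = client.execute(get)) {
                // Fully consuming the entity lets the connection go back to the pool.
                EntityUtils.consume(response.getEntity());
            }
        }
    }
}
```

Consuming the entity before closing the response matters here: a response with a streaming entity keeps its connection leased until the content has been read or the response is closed.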
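Leasing a connection from the pool
Most readers have used pooling before, for example thread pools or JDBC connection pools (DataSource). Let's look at how the HTTP connection pool works.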
// PoolingHttpClientConnectionManager.java
public ConnectionRequest requestConnection(
final HttpRoute route,
final Object state) {
//This is where the real work happens
final Future<CPoolEntry> future = this.pool.lease(route, state, null);
return new ConnectionRequest() {
@Override
public boolean cancel() {
return future.cancel(true);
}
@Override
public HttpClientConnection get(
final long timeout,
final TimeUnit tunit) throws InterruptedException, ExecutionException, ConnectionPoolTimeoutException {
return leaseConnection(future, timeout, tunit);
}
};
}
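lease() actually returns a custom Future whose get() calls getPoolEntryBlocking(). Before digging into the code, it helps to explain the collections it uses, as shown in the figure below:
The HTTP connection pool (call it HPool) maintains multiple route-specific pools (call each one a UPool). Each route, roughly the scheme, host, and port of a URL, maps to one UPool; connections of different colors in the figure were created for different routes. The collections mean the following:
- leased: all connections currently lent out;
- available: idle connections that can be lent out;
- connection pool: the pool that belongs to one route;
- pending: the queue of waiting threads;
In the code, the sizes of leased and available add up to allocatedCount.
A UPool has basically the same structure as HPool, but its connections are the ones actually handed out: whenever a thread asks for a connection, the lookup happens in one specific UPool. The leased, available, and pending collections kept by HPool are used for bookkeeping across all routes, such as checking the total limit;
When the number of connections reaches the limit, the current thread is put into pending and waits to be woken up;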
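To make the two-level layout concrete, here is a minimal stand-alone sketch in plain Java. The class and method names (TwoLevelPoolSketch, RoutePoolSketch, poolFor) are made up for illustration, connections are modeled as strings, and limits, expiry, and locking are left out; the real logic lives in the library's pool classes.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.Map;
import java.util.Set;

/** Hypothetical per-route pool ("UPool"); not the library's class. */
class RoutePoolSketch {
    final Set<String> leased = new HashSet<>();
    final LinkedList<String> available = new LinkedList<>();

    int allocatedCount() {
        return leased.size() + available.size();
    }
}

/** Hypothetical top-level pool ("HPool") that mirrors every move for bookkeeping. */
class TwoLevelPoolSketch {
    final Map<String, RoutePoolSketch> routeToPool = new HashMap<>();
    final Set<String> leased = new HashSet<>();
    final LinkedList<String> available = new LinkedList<>();
    private int nextId = 0;

    RoutePoolSketch poolFor(String route) {
        return routeToPool.computeIfAbsent(route, r -> new RoutePoolSketch());
    }

    /** Reuse an idle connection of this route, or create a new one (no limits here). */
    String lease(String route) {
        RoutePoolSketch pool = poolFor(route);
        String conn = pool.available.pollFirst();
        if (conn != null) {
            available.remove(conn);           // HPool: no longer idle
        } else {
            conn = route + "#" + (nextId++);  // stand-in for opening a connection
        }
        pool.leased.add(conn);
        leased.add(conn);
        return conn;
    }

    /** Move the connection from leased back to available at both levels. */
    void release(String route, String conn) {
        RoutePoolSketch pool = poolFor(route);
        if (pool.leased.remove(conn)) {
            leased.remove(conn);
            pool.available.addFirst(conn);
            available.addFirst(conn);
        }
    }
}
```

Leasing twice for the same route with a release in between returns the same connection string, while a second route gets its own RoutePoolSketch.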
With this design in mind, the code below is easy to follow.
private E getPoolEntryBlocking(
final T route, final Object state,
final long timeout, final TimeUnit tunit,
final Future<E> future) throws IOException, InterruptedException, TimeoutException {
Date deadline = null;
if (timeout > 0) {
deadline = new Date (System.currentTimeMillis() + tunit.toMillis(timeout));
}
this.lock.lock();
try {
//Locate the UPool
final RouteSpecificPool<T, C, E> pool = getPool(route);
E entry;
for (;;) {//infinite loop 1
for (;;) { //infinite loop 2: loop until an unexpired connection is obtained from the UPool
entry = pool.getFree(state);
/////////////////////////////////////////////////////////// body of getFree(), inlined for reference
public E getFree(final Object state) {
if (!this.available.isEmpty()) {//there are idle connections
if (state != null) { //state relates to authentication; ignore it for now
final Iterator<E> it = this.available.iterator();
while (it.hasNext()) {
final E entry = it.next();
if (state.equals(entry.getState())) {
it.remove();
this.leased.add(entry);
return entry;
}
}
}
final Iterator<E> it = this.available.iterator();
while (it.hasNext()) {
final E entry = it.next();
if (entry.getState() == null) {
it.remove(); //remove the connection from the UPool's available list
this.leased.add(entry);//add the connection to the UPool's leased set
return entry;
}
}
}
//Reaching here means no idle connection is available; the caller then tries to create one
return null;
}
///////////////////////////////////////////////////////////
if (entry == null) {//no connection was leased
break;
}
if (entry.isExpired(System.currentTimeMillis())) {
entry.close();
}
if (entry.isClosed()) {
//The connection is closed (the underlying socket is closed), so it is also removed from HPool's available list and freed in the UPool; the pool no longer holds it
this.available.remove(entry);
pool.free(entry, false);
} else {
break;
}
}//end of infinite loop 2
if (entry != null) {//a connection was leased above
//Update HPool accordingly for bookkeeping
this.available.remove(entry);
this.leased.add(entry);
//hook method
onReuse(entry);
return entry;
}
// Reaching here means no valid connection was obtained, so one has to be created
// Before creating it, shrink the UPool by closing idle connections to make room
final int maxPerRoute = getMax(route);
// Shrink the pool prior to allocating a new connection
final int excess = Math.max(0, pool.getAllocatedCount() + 1 - maxPerRoute);
if (excess > 0) {
for (int i = 0; i < excess; i++) {
final E lastUsed = pool.getLastUsed();
if (lastUsed == null) {
break;
}
lastUsed.close();
this.available.remove(lastUsed);
pool.remove(lastUsed);
}
}
//A new connection may be created only while the UPool is below its per-route maximum
if (pool.getAllocatedCount() < maxPerRoute) {
final int totalUsed = this.leased.size();
final int freeCapacity = Math.max(this.maxTotal - totalUsed, 0);
//HPool's total connection limit must be satisfied as well
if (freeCapacity > 0) {
final int totalAvailable = this.available.size();
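// Counting idle connections, HPool is already at maxTotal
// To let the current route create a new connection, close the least recently used idle connection (removeLast), which may belong to another route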
if (totalAvailable > freeCapacity - 1) {
if (!this.available.isEmpty()) {
final E lastUsed = this.available.removeLast();
lastUsed.close();
final RouteSpecificPool<T, C, E> otherpool = getPool(lastUsed.getRoute());
otherpool.remove(lastUsed);
}
}
//An idle connection has been closed to make room if necessary; now create the connection for the current route
final C conn = this.connFactory.create(route);
//Put it into the leased sets of both HPool and the UPool
entry = pool.add(conn);
this.leased.add(entry);
return entry;
}
}
//Reaching here means the pool is full and no new connection can be created
boolean success = false;
//Each waiting thread has its own future
try {
if (future.isCancelled()) {
throw new InterruptedException("Operation interrupted");
}
//Put it into the pending queues
pool.queue(future);
this.pending.add(future);
if (deadline != null) {
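//This is where the ConnectionRequestTimeout setting finally takes effect
//The current thread parks until the deadline
//1. The thread parks until the deadline passes: returns false;
//2. It is signalled before the deadline: returns true;
//This is a fairly positive signal that a connection may be available.
//Who calls signal? Two possibilities: a. releaseConnection(); b. the current request is cancelled via cancel()
//3. It is interrupted: awaitUntil throws InterruptedException, success stays false, and control goes straight to finally;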
success = this.condition.awaitUntil(deadline);
} else {
this.condition.await();
success = true;
}
// After being signalled or waking up at the deadline, check whether this request has been cancelled
// This cancel differs from FutureTask's: FutureTask.cancel(true) interrupts the thread directly, whereas this one only changes a flag;
if (future.isCancelled()) {
throw new InterruptedException("Operation interrupted");
}
} finally {
// In case of 'success', we were woken up by the
// connection pool and should now have a connection
// waiting for us, or else we're shutting down.
// Just continue in the loop, both cases are checked.
pool.unqueue(future);
this.pending.remove(future);
}
if (!success && (deadline != null && deadline.getTime() <= System.currentTimeMillis())) {
//The wait ran until the deadline without a signal, so no connection became available in time
break;//leave infinite loop 1
}
} //end of infinite loop 1
throw new TimeoutException("Timeout waiting for connection");
} finally {
this.lock.unlock();
}
}
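The waiting logic can be reproduced in miniature with a ReentrantLock and a Condition. The sketch below is a simplification under stated assumptions, not the library code: BlockingLeaseSketch is a hypothetical class, there is a single route, connections are plain strings, and cancellation is not modeled. It only shows how awaitUntil(deadline) turns a connection request timeout into a TimeoutException.

```java
import java.util.ArrayDeque;
import java.util.Date;
import java.util.Deque;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical single-route pool that blocks when it is full. */
class BlockingLeaseSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition condition = lock.newCondition();
    private final Deque<String> available = new ArrayDeque<>();
    private final int max;
    private int allocated = 0;

    BlockingLeaseSketch(int max) {
        this.max = max;
    }

    String lease(long timeout, TimeUnit unit) throws InterruptedException, TimeoutException {
        // A timeout <= 0 means "wait forever", as in the excerpt above.
        Date deadline = timeout > 0
                ? new Date(System.currentTimeMillis() + unit.toMillis(timeout))
                : null;
        lock.lock();
        try {
            for (;;) {
                String conn = available.pollFirst();
                if (conn != null) {
                    return conn;                  // reuse an idle connection
                }
                if (allocated < max) {
                    allocated++;
                    return "conn-" + allocated;   // stand-in for creating a connection
                }
                boolean signalled;
                if (deadline != null) {
                    signalled = condition.awaitUntil(deadline); // false once the deadline has passed
                } else {
                    condition.await();
                    signalled = true;
                }
                if (!signalled && deadline.getTime() <= System.currentTimeMillis()) {
                    break;                        // timed out without a signal
                }
            }
            throw new TimeoutException("Timeout waiting for connection");
        } finally {
            lock.unlock();
        }
    }

    void release(String conn) {
        lock.lock();
        try {
            available.addFirst(conn);
            condition.signalAll();                // wake up waiting threads
        } finally {
            lock.unlock();
        }
    }
}
```

With a pool of size 1, a second lease() with a 50 ms timeout fails with TimeoutException while the first connection is still out, and succeeds again once release() has been called.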
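Releasing a connection
As mentioned above, the connection is released before the response is returned to the caller. Let's look at the release process.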
// ConnectionHolder.java
public void releaseConnection() {
//A connection may be released only once, hence the lock
synchronized (this.managedConn) {
if (this.released) {
return;
}
this.released = true;
//As noted above, reusable affects the release process
if (this.reusable) {
//For a reusable connection, release simply moves it from leased to available in the pool
//The response header may look like this: Keep-Alive: timeout=5, max=100
//validDuration is the keep-alive time returned by the server, or -1 if the server sent none
this.manager.releaseConnection(this.managedConn,
this.state, this.validDuration, this.tunit);
} else {
try {
//This is a real close: the socket is closed as well
this.managedConn.close();
log.debug("Connection discarded");
} catch (final IOException ex) {
if (this.log.isDebugEnabled()) {
this.log.debug(ex.getMessage(), ex);
}
} finally {
this.manager.releaseConnection(
this.managedConn, null, 0, TimeUnit.MILLISECONDS);
}
}
}
}
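Releasing into the pool moves the connection from leased to available, but there is more to watch out for. A connection in available does not stay valid forever: a TCP connection is full-duplex, so whether it is still usable depends on the current state of both ends, and its lifetime has to be updated accordingly. In practice the client usually follows the server. For example, if the server's response says keep-alive is 10 s, the connection must not stay in available for more than 10 s, otherwise a request sent on it may fail.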
// PoolingHttpClientConnectionManager.java
public void releaseConnection(
final HttpClientConnection managedConn,
final Object state,
final long keepalive, final TimeUnit tunit) {
Args.notNull(managedConn, "Managed connection");
synchronized (managedConn) {
final CPoolEntry entry = CPoolProxy.detach(managedConn);
if (entry == null) {
return;
}
final ManagedHttpClientConnection conn = entry.getConnection();
try {
if (conn.isOpen()) {
final TimeUnit effectiveUnit = tunit != null ? tunit : TimeUnit.MILLISECONDS;
entry.setState(state);
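// The expiry is the moment the connection is put back into available plus keepalive;
// if that moment is later than the validity deadline fixed at creation time, the validity deadline wins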
entry.updateExpiry(keepalive, effectiveUnit);
if (this.log.isDebugEnabled()) {
final String s;
if (keepalive > 0) {
s = "for " + (double) effectiveUnit.toMillis(keepalive) / 1000 + " seconds";
} else {
s = "indefinitely";
}
this.log.debug("Connection " + format(entry) + " can be kept alive " + s);
}
}
} finally {
// This is where the connection moves from leased to available (if it is still reusable)
// Both HPool and the UPool perform the move and wake up waiting threads
this.pool.release(entry, conn.isOpen() && entry.isRouteComplete());
if (this.log.isDebugEnabled()) {
this.log.debug("Connection released: " + format(entry) + formatStats(entry.getRoute()));
}
}
}
}
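The expiry rule described in the comments above can be written down as a small stand-alone model. ExpirySketch is a hypothetical class, not the library's PoolEntry, and it takes the current time as a parameter so the arithmetic is easy to follow; treat it as a sketch of min(release time + keep-alive, validity deadline) under those assumptions.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical model of a pool entry's timestamps (all values in epoch-style millis). */
class ExpirySketch {
    final long validityDeadline; // absolute time fixed at creation: created + timeToLive
    long updated;                // creation time at first, then the last release time
    long expiry;                 // absolute time after which the entry counts as expired

    ExpirySketch(long createdMillis, long timeToLiveMillis) {
        this.updated = createdMillis;
        this.validityDeadline = timeToLiveMillis > 0
                ? createdMillis + timeToLiveMillis
                : Long.MAX_VALUE;
        this.expiry = this.validityDeadline;
    }

    /** Called on release: keepAlive <= 0 means the server gave no keep-alive hint. */
    void updateExpiry(long nowMillis, long keepAlive, TimeUnit unit) {
        this.updated = nowMillis;
        long candidate = keepAlive > 0
                ? nowMillis + unit.toMillis(keepAlive)
                : Long.MAX_VALUE;
        this.expiry = Math.min(candidate, validityDeadline);
    }

    boolean isExpired(long nowMillis) {
        return nowMillis >= expiry;
    }
}
```

For an entry created at t = 0 with a 10 minute time to live, a release at minute 2 with a 6 minute keep-alive gives an expiry at minute 8, while a later release at minute 6 with the same keep-alive is capped at minute 10 by the validity deadline.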
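Closing connections
Besides connections that cannot be reused, idle connections that have timed out also need to be closed.
// This method takes parameters, so the caller can choose values that fit its workload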
public void closeIdleConnections(final long idleTimeout, final TimeUnit tunit) {
if (this.log.isDebugEnabled()) {
this.log.debug("Closing connections idle longer than " + idleTimeout + " " + tunit);
}
this.pool.closeIdle(idleTimeout, tunit);
}
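// This method takes no parameters, so which connections count as expired?
// As the previous section showed, releasing a connection sets its expiry deadline from the server's keep-alive (when the server sends none, the entry falls back to its validity deadline);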
public void closeExpiredConnections() {
this.log.debug("Closing expired connections");
this.pool.closeExpired();
}
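idle: measured from the entry's updated timestamp, which starts as the creation time and is refreshed every time the connection is released back to the pool. For example, if a connection was last released at 10:00 and idleTimeout is 60 s, a closeIdleConnections call made after 10:01 closes it, provided it has stayed in available since;
expire: needs a deadline (expiry), which is recomputed on every release as min(release time + keepalive, validityDeadline). Note that validityDeadline is an absolute point in time fixed at creation (creation time + timeToLive), not a duration. For example, take a connection created at 10:00 with timeToLive = 10 min, so validityDeadline = 10:10. If it is released at 10:02 with keepalive = 6 min, expiry = 10:08; if it is never used again, it counts as expired after 10:08 and should be closed. If it is leased again at 10:04 and released at 10:06 with keepalive still 6 min, the candidate 10:12 is capped by validityDeadline, so expiry = 10:10;
In many cases neither the response's keep-alive nor timeToLive is set, so the deadline is Long.MAX_VALUE and only the idle timeout can close connections that are no longer needed;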
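To contrast the two sweeps, the following sketch models them over a list of idle entries. IdleVsExpiredSketch and its Entry class are hypothetical, timestamps are passed in explicitly, and the comparisons reflect my reading of the pool: the idle sweep compares the last-updated timestamp with now minus idleTimeout, the expired sweep compares now with the expiry deadline.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the idle and expired sweeps over the available list. */
class IdleVsExpiredSketch {
    static final class Entry {
        final String id;
        final long updated; // last time the entry was created or released (millis)
        final long expiry;  // absolute expiry deadline (millis), Long.MAX_VALUE if none

        Entry(String id, long updated, long expiry) {
            this.id = id;
            this.updated = updated;
            this.expiry = expiry;
        }
    }

    final List<Entry> available = new ArrayList<>();

    /** Removes entries that have not been touched for at least idleMillis. */
    int closeIdle(long nowMillis, long idleMillis) {
        long cutoff = nowMillis - idleMillis;
        int before = available.size();
        available.removeIf(e -> e.updated <= cutoff);
        return before - available.size();
    }

    /** Removes entries whose expiry deadline has passed. */
    int closeExpired(long nowMillis) {
        int before = available.size();
        available.removeIf(e -> nowMillis >= e.expiry);
        return before - available.size();
    }
}
```

An entry without any expiry (Long.MAX_VALUE) survives every closeExpired call and can only be removed by closeIdle, which is the situation the paragraph above describes.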
A few more notes on these timestamps:
// First creation of a connection
public PoolEntry(final String id, final T route, final C conn,
final long timeToLive, final TimeUnit tunit) {
super();
Args.notNull(route, "Route");
Args.notNull(conn, "Connection");
Args.notNull(tunit, "Time unit");
this.id = id;
this.route = route;
this.conn = conn;
this.created = System.currentTimeMillis();
this.updated = this.created; //initially the creation time; used for the idle check
if (timeToLive > 0) { //this value is passed in via HttpClientBuilder.setConnectionTimeToLive()
final long deadline = this.created + tunit.toMillis(timeToLive);
// If the above overflows then default to Long.MAX_VALUE
this.validityDeadline = deadline > 0 ? deadline : Long.MAX_VALUE;
} else {
this.validityDeadline = Long.MAX_VALUE;
}
this.expiry = this.validityDeadline; //default expiry deadline
}
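Both ways of closing connections are time-based: once a point in time is reached, the outdated connections are removed. Now suppose idleTimeout is set to 10 days and no expiry deadline is set, so it is Long.MAX_VALUE. What goes wrong with the pooled connections? The server will not keep a connection for 10 days; it will drop it much sooner, leaving the pooled connection half-closed, and a client may then lease this broken connection. To address this, validateAfterInactivity was introduced (5 s by default in 4.4; later releases lowered it to 2 s, so check the version in use).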
for (;;) {
final E leasedEntry = getPoolEntryBlocking(route, state, timeout, tunit, this);
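//A connection leased from the pool has to be validated
if (validateAfterInactivity > 0) {
//e.g. with validateAfterInactivity = 5000 ms, an entry last updated at 10:00:00 is validated when leased at 10:00:05 or later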
if (leasedEntry.getUpdated() + validateAfterInactivity <= System.currentTimeMillis()) {
if (!validate(leasedEntry)) {
//validate() calls the connection's isStale()
//////////////////////////////////////////////////////////////
public boolean isStale() {
if (!isOpen()) { //no socket bound, or the socket is closed
return true;
}
try {
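//A negative count means end of stream (the peer closed its side); a half-closed socket could in theory still accept writes,
//so I see this as a pessimistic choice: better to discard a good connection than to keep a bad one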
final int bytesRead = fillInputBuffer(1);
return bytesRead < 0;
} catch (final SocketTimeoutException ex) {
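//Note: a SocketTimeoutException does not mean the connection is invalid
return false; //no data arrived within the short read timeout; treated as not stale, although the connection may still be broken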
} catch (final IOException ex) {
return true;
}
}
//////////////////////////////////////////////////////////////
leasedEntry.close();
release(leasedEntry, false);
continue;
}
}
}
entryRef.set(leasedEntry);
done.set(true);
onLease(leasedEntry);
if (callback != null) {
callback.completed(leasedEntry);
}
return leasedEntry;
}
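The lease-time check boils down to one comparison plus a retry loop. In the sketch below, ValidateOnLeaseSketch and its Entry class are hypothetical stand-ins, the staleness probe is passed in as a Predicate instead of touching a real socket, and timestamps are explicit parameters.

```java
import java.util.Deque;
import java.util.function.Predicate;

/** Hypothetical model of validate-after-inactivity on lease. */
class ValidateOnLeaseSketch {
    static final class Entry {
        final String id;
        final long updated; // last time the entry was created or released (millis)

        Entry(String id, long updated) {
            this.id = id;
            this.updated = updated;
        }
    }

    /** True if the entry has been inactive long enough to require a staleness probe. */
    static boolean needsValidation(long updatedMillis, long nowMillis, int validateAfterInactivityMillis) {
        return validateAfterInactivityMillis > 0
                && updatedMillis + validateAfterInactivityMillis <= nowMillis;
    }

    /** Polls entries until one passes the check; returns null if none is left. */
    static Entry lease(Deque<Entry> available, long nowMillis,
                       int validateAfterInactivityMillis, Predicate<Entry> isStale) {
        for (;;) {
            Entry entry = available.pollFirst();
            if (entry == null) {
                return null; // the real pool would create a connection or wait here
            }
            if (needsValidation(entry.updated, nowMillis, validateAfterInactivityMillis)
                    && isStale.test(entry)) {
                continue;    // discard the stale entry and try the next one
            }
            return entry;
        }
    }
}
```

Entries that were released recently skip the probe entirely, which is the point of the setting: the cost of the stale check is paid only for connections that have sat idle longer than the threshold.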
Finally, this article is rather long; if you spot anything that is wrong, corrections are welcome.