PG 資料庫庫監聽佇列的長度問題
不論mysql 還是pg 資料庫都透過監聽某個ip/埠, 或者某個socket 來實現通訊.
這裡涉及到一個問題,就是這個監聽佇列的長度問題.
mysql 是自己實現的, 在my.cnf 裡有個配置選項 back_log 這就是設定監聽佇列的長度的.
PG 資料庫的監聽佇列的長度, 似乎沒有地方可以設定.
在做一個pgbench 的高併發壓力測試的時候,似乎出現這個問題.
命令:
pgbench -n -r -c 250 -j 250 -T 2 -f update_smallrange.sql
錯誤訊息:
Connection to database "" failed:
could not connect to server: Resource temporarily unavailable
Is the server running locally and accepting
connections on Unix domain socket "/tmp/.s.PGSQL.5432"?
但是從上面的“Resource temporarily unavailable”看不出是哪個資源出問題了。
經過調查,找到了下面一個連結
[code]
From:Andres Freund To:pgsql-hackers(at)postgresql(dot)orgSubject:PQConnectPoll, connect(2), EWOULDBLOCK and somaxconnDate:2013-06-17 14:16:22Message-ID:20130617141622.GH5875@alap2.anarazel.de (view raw, whole thread or download thread mbox)Thread: 2013-06-17 14:16:22 from Andres Freund 2013-06-26 11:22:58 from Andres Freund 2013-06-26 16:07:54 from Tom Lane 2013-06-26 18:12:00 from Andres Freund 2013-06-27 00:07:40 from Tom Lane 2013-06-27 06:17:57 from Andres Freund 2013-06-27 13:48:25 from Tom Lane 2013-06-27 16:42:47 from Tom Lane Lists:pgsql-hackersHi,
When postgres on linux receives connection on a high rate client
connections sometimes error out with:
could not send data to server: Transport endpoint is not connected
could not send startup packet: Transport endpoint is not connected
To reproduce start something like on a server with sufficiently high
max_connections:
pgbench -h /tmp -p 5440 -T 10 -c 400 -j 400 -n -f /tmp/simplequery.sql
Now that's strange since that error should happen at connect(2) time,
not when sending the startup packet. Some investigation led me to
fe-secure.c's PQConnectPoll:
if (connect(conn->sock, addr_cur->ai_addr,
addr_cur->ai_addrlen) < 0)
{
if (SOCK_ERRNO == EINPROGRESS ||
SOCK_ERRNO == EWOULDBLOCK ||
SOCK_ERRNO == EINTR ||
SOCK_ERRNO == 0)
{
/*
* This is fine - we're in non-blocking mode, and
* the connection is in progress. Tell caller to
* wait for write-ready on socket.
*/
conn->status = CONNECTION_STARTED;
return PGRES_POLLING_WRITING;
}
/* otherwise, trouble */
}
So, we're accepting EWOULDBLOCK as a valid return value for
connect(2). Which it isn't. EAGAIN in contrast is on some BSDs and on
linux. Unfortunately POSIX allows those two to share the same value...
My manpage tells me:
EAGAIN No more free local ports or insufficient entries in the routing cache. For
AF_INET see the description of
/proc/sys/net/ipv4/ip_local_port_range ip(7)
for information on how to increase the number of local
ports.
So, the problem is that we took a failed connection as having been
initially successfull but in progress.
Not accepting EWOULDBLOCK in the above if() results in:
could not connect to server: Resource temporarily unavailable
Is the server running locally and accepting
connections on Unix domain socket "/tmp/.s.PGSQL.5440"?
which makes more sense.
Trivial patch attached.
Now, the question is why we cannot complete connections on unix sockets?
Some code reading reading shows net/unix/af_unix.c:unix_stream_connect()
shows:
if (unix_recvq_full(other)) {
err = -EAGAIN;
if (!timeo)
goto out_unlock;
So, if we're in nonblocking mode - which we are - and the receive queue
is full we return EAGAIN. The receive queue for unix sockets is defined
as
static inline int unix_recvq_full(struct sock const *sk)
{
return skb_queue_len(&sk->sk_receive_queue) > sk->sk_max_ack_backlog;
}
Where sk_max_ack_backlog is whatever has been passed to the
listen(backlog) on the listening side.
Question: But postgres does listen(fd, MaxBackends * 2), how can that be
a problem?
Answer:
If the backlog argument is greater than the value in /proc/sys/net/core/somaxconn,
then it is silently truncated to that value; the default value in this file is
128. In kernels before 2.4.25, this limit was a hard coded value, SOMAXCONN, with
the value 128.
Setting somaxconn to something higher indeed makes the problem go away.
I'd guess that pretty much the same holds true for tcp connections,
although I didn't verify that which would explain some previous reports
on the lists.
TLDR: Increase /proc/sys/net/core/somaxconn
Greetings,
Andres Freund
--
Andres Freund
PostgreSQL Development, 24x7 Support, Training & Services
[/code]
原來是PG服務端的listen backlog(受核心引數somaxconn限制)不夠用了,somaxconn的預設值是128,調大後,重啟PG再測就OK了。
/proc/sys/net/core/somaxconn
This file defines a ceiling value for the backlog argument of listen(2); see the listen(2) manual page
for details.
到這裡解決方案就很明瞭了,
echo 256 > /proc/sys/net/core/somaxconn
然後重新啟動pg 繼續進行就ok 了.
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/133735/viewspace-2134427/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- Oracle資料庫監聽Oracle資料庫
- 聊聊PG資料庫的防誤刪除問題資料庫
- 啟動資料庫監聽資料庫
- 資料庫開啟監聽資料庫
- 配置資料庫監聽白名單資料庫
- Mysql資料庫監聽binlogMySql資料庫
- 資料庫監聽夯故障分析資料庫
- 【PG資料庫】PG資料庫的安裝及連線方法資料庫
- Oracle資料庫修改LISTENER的監聽埠Oracle資料庫
- 11.2資料庫監聽的改變資料庫
- 資料庫text型別的長度?資料庫型別
- Canal-監聽資料庫表的變化資料庫
- 如何降低複雜度,用資料庫做訊息佇列的儲存?複雜度資料庫佇列
- 為什麼資料庫列的字元長度為191? - Grouparoo資料庫字元
- 【資料結構】迴圈佇列的front,rear指標以及佇列滿的條件、計算佇列長度資料結構佇列指標
- 清理資料庫監聽日誌最好方法資料庫
- oracle資料庫監聽啟動不了的原因分析Oracle資料庫
- Oracle資料庫的監聽器掛起情況Oracle資料庫
- Laravel 資料庫佇列倒序執行Laravel資料庫佇列
- 某次資料庫效能監控中發現的問題資料庫
- pg_resetwal pg_resetxlog 重整 pg資料庫 wal 與pg_controldata 。 資料庫恢復。資料庫LDA
- 監聽狀態正常,但是資料庫不能連線,提示沒有監聽....資料庫
- PG 資料庫 從阿里雲pg rds 同步資料。資料庫阿里
- 關於一個資料庫列設計的問題資料庫
- oracle 12c 資料庫例項監聽無法註冊問題一例Oracle資料庫
- PG資料庫初始化資料庫
- 監控資料庫連線遇到的一個小問題資料庫
- oracle的監聽問題Oracle
- 基於PMEM的PG資料庫Memhive資料庫Hive
- jboss訪問資料庫的問題資料庫
- 11gR2 DBCA建立資料庫global database name長度及db_name長度限制8位問題資料庫Database
- 資料庫同步問題資料庫
- 離線資料推送問題(訊息佇列)佇列
- 【監聽】兩庫互配靜態監聽
- dataguard主庫停監聽後還會把資料同步到備庫
- SessionBean呼叫資料庫的問題SessionBean資料庫
- 資料庫寫入的問題資料庫
- 如何監控ORACLE資料庫表的增長量Oracle資料庫