連線池溢位以及大量查詢系統表的問題

壹頁書發表於2015-10-20
最近一個月已經出了三次當機事故了
第一次因為程式有漏洞,一個搜尋的程式,一旦快取失效,會全表掃描資料庫並排序.
由於沒有做好併發控制,一旦快取失效,會造成大量的全表掃描

第二次因為程式設計有漏洞。每個使用者的空間,都有訪問次數,每個PV都會在資料庫產生一個update +1 的SQL.
這個設計的雷,埋了很長時間了,大家心裡都清楚.就是沒有動力去改.
平時這個漏洞也有,被大量訪問,會造成幾分鐘不能響應,但是卡一會兒就好了.
直到這次,海量的update,直接把所有的連線都堵了.
甚至DBA帳號在第一時間都登入不了..

第三次也是相似的經歷,海量的連線直接撲上來,資料庫資源瞬間耗盡,迴天乏力.


這三次事故有一個共同點,就是來勢猛
(除了第一次是因為程式的bug,另外兩次也不排除是人為攻擊.因為基礎架構設計確實有些問題.)
從接到郵件報警,登入伺服器,第一時間看到的現象就是
CPU使用率從正常的15%飆升到90%以上.甚至100%.
幾千個客戶端連線,已經達到了資料庫配置的最大連線數.

讓我感到慌張的有幾點
1.大量的客戶端連線達到了資料庫配置的最大連線數,所以第一時間,我可能登入不了資料庫了.
2.好不容易登入了資料庫,非常卡頓.
3.檢視正在執行的查詢,一屏一屏的SQL,好幾千個.引起問題的SQL,往往不是執行最多的那個.
要在這種慌亂的情緒中,找到問題SQL,就有點難度了.

總結這三次事故,我發現每次都伴隨著海量的連線.但是應用配置了連線池,怎麼會動輒產生好幾千個資料庫連線呢?
回想之前我們反映的問題,Oracle監聽器日誌顯示,每小時大致有12w連線.甚至有ORA_3136 WARNING__inbound_connection_timed_out的報警.但是一直沒有給我們反饋.

我猜測是因為程式的連線池出現了洩露。
因為我們的程式碼構成很複雜.從早期Struts+JDBC+DBCP1.4連線池到後來的Spring+MyBatis
+Resin JNDI資料來源,都出現了連線溢位的問題.
我當時猜測不是連線池的問題,就是驅動程式bug的問題.
當然經過排查,確實是連線池的問題.有溢位.

我將連線池更換為c3p0 v0.9.5.1之後,大量連線Oracle的問題已經解決.終於監聽器日誌不再刷了.

隔天,AWR檢查,發現一個SQL在4個小時內居然執行196w次.佔用資料庫負載3%左右.
SELECT NULL AS table_cat,
       o.owner AS table_schem,
       o.object_name AS table_name,
       o.object_type AS table_type,
       NULL AS remarks
  FROM all_objects o
  WHERE o.owner LIKE :1 
    AND o.object_name LIKE :2 
    AND o.object_type IN ('xxx', 'TABLE')
  ORDER BY table_type, table_schem, table_name

剛開始以為是系統被SQL隱碼攻擊了..
因為之前正在排查硬解析過高的問題
後來發現是JDBC Metadata的呼叫.
但是這個呼叫的頻率,明顯異常.
最後發現,原來是我使用c3p0連線池的時候,還需要配置preferredTestQuery引數.
否則,預設測試連線的行為,就是DatabaseMetaData.getTables()這種操作.
preferredTestQuery設定為"select 1 from dual" 問題解決.

原文如下:
http://www.mchange.com/projects/c3p0/#configuring_connection_testing

Configuring Connection TestingGo To Top

c3p0 can be configured to test the Connections that it pools in a variety of ways, to minimize the likelihood that your application will see broken or "stale" Connections. Pooled Connections can go bad for a variety of reasons -- some JDBC drivers intentionally "time-out" long-lasting database Connections; back-end databases or networks sometimes go down "stranding" pooled Connections; and Connections can simply become corrupted over time and use due to resource leaks, driver bugs, or other causes.

c3p0 provides users a great deal of flexibility in testing Connections, via the following configuration parameters:

idleConnectionTestPeriodtestConnectionOnCheckout, and testConnectionOnCheckin control when Connections will be tested. automaticTestTableconnectionTesterClassName, and preferredTestQuery control how they will be tested.

When configuring Connection testing, first try to minimize the cost of each test. If you are using a JDBC driver that you are certain supports the new(ish) jdbc4 API — and if you are using c3p0-0.9.5 or higher! — let your driver handle this for you. jdbc4 Connections include a method called isValid() that should be implemented as a fast, reliable Connection test. By default, c3p0 will use that method if it is present.

However, if your driver does not support this new-ish API, c3p0's default behavior is to test Connections by calling the getTables() method on a Connection's associated DatabaseMetaData object. This has the advantage of being very robust and working with any database, regardless of the database schema. However, a call to DatabaseMetaData.getTables() is often much slower than a simple database query, and using this test may significantly impair your pool's performance.

The simplest way to speed up Connection testing under a JDBC 3 driver (or a pre-0.9.5 version of c3p0) is to define a test query with the preferredTestQuery parameter. Be careful, however. Setting preferredTestQuery will lead to errors as Connection tests fail if the query target table does not exist in your database prior to initialization of your DataSource. Depending on your database and JDBC driver, a table-independent query like SELECT 1 may (or may not) be sufficient to verify the Connection. If a table-independent query is not sufficient, instead of preferredTestQuery, you can set the parameter automaticTestTable. Using the name you provide, c3p0 will create an empty table, and make a simple query against it to test the database.

The most reliable time to test Connections is on check-out. But this is also the most costly choice from a client-performance perspective. Most applications should work quite reliably using a combination of idleConnectionTestPeriod and testConnectionOnCheckin. Both the idle test and the check-in test are performed asynchronously, which can lead to better performance, both perceived and actual.

For some applications, high performance is more important than the risk of an occasional database exception. In its default configuration, c3p0 does no Connection testing at all. Setting a fairly long idleConnectionTestPeriod, and not testing on checkout and check-in at all is an excellent, high-performance approach.

It is possible to customize how c3p0's DefaultConnectionTester tests when no preferredTestQuery or automaticTestTable are available. Please see Configuring DefaultConnectionTester.isValidTimeout and Configuring DefaultConnectionTester.QuerylessTestRunner.

Advanced users may define any kind of Connection testing they wish, by implementing a ConnectionTester and supplying the fully qualified name of the class as connectionTesterClassName. If you'd like your custom ConnectionTesters to honor and support thepreferredTestQuery and automaticTestTable parameters, implement UnifiedConnectionTester, most conveniently by extending AbstractConnectionTester. See the api docs for more information.

If you know you want to use the jdbc4 Connection.isValid() method, but you want to set a timeout, consider writing a trivial extension of IsValidConnectionTester.

package com.mchange.v2.c3p0.example; import com.mchange.v2.c3p0.util.IsValidOnlyConnectionTester; public final class IsValidOnlyConnectionTester30 extends IsValidOnlyConnectionTester { protected int getIsValidTimeout() { return 30; } }

See? These really are trivial to write.

Simple advice on Connection testing

If you don't know what to do, try this:

  1. If you know your driver supports the JDBC 4 Connection.isValid(...) method and you are using c3p0-0.9.5 or above, don't set a preferredTestQuery. If your driver does not support this method (or if you are not sure), try SELECT 1 for your preferredTestQuery, if you are running MySQL or Postgres. For other databases, look for suggestions here. Leave automatedTestTable undefined.

  2. Begin by setting testConnectionOnCheckout to true and get your application to run correctly and stably. If you are happy with your application's performance, you can stop here! This is the simplest, most reliable form of Connection-testing, but it does have a client-visible performance cost.

  3. If you'd like to improve performance by eliminating Connection testing from clients' code path:

    1. Set testConnectionOnCheckout to false

    2. Set testConnectionOnCheckin to true

    3. Set idleConnectionTestPeriod to 30, fire up you application and observe. This is a pretty robust setting, all Connections will tested on check-in and every 30 seconds thereafter while in the pool. Your application should experience broken or stale Connections only very rarely, and the pool should recover from a database shutdown and restart quickly. But there is some overhead associated with all that Connection testing.

    4. If database restarts will be rare so quick recovery is not an issue, consider reducing the frequency of tests by idleConnectionTestPeriod to, say, 300, and see whether clients are troubled by stale or broken Connections. If not, stick with 300, or try an even bigger number. Consider settingtestConnectionOnCheckin back to false to avoid unnecessary tests on checkin. Alternatively, if your application does encounter bad Connections, consider reducing idleConnectionTestPeriod and set testConnectionOnCheckin back to true. There are no correct or incorrect values for these parameters: you are trading off overhead for reliability in deciding how frequently to test. The exact numbers are not so critical. It's usually easy to find configurations that perform well. It's rarely worth spending time in pursuit of "optimal" values here.

So, when should you stick with simple and reliable (Step 2 above), and when is it worth going for better performance (Step 3)? In general, it depends on how much work clients typically do with Connections once they check them out. If clients usually make complex queries and/or perform multiple operations, adding the extra cost of one fast test per checkout will not much affect performance. But if your application typically checks out a Connection and performs one simple query with it, throwing in an additional test can really slow things down.

That's nice in theory, but often people don't really have a good sense of how much work clients perform on average. The best thing to do is usually to try Step 3, see if it helps (however you measure performance), see if it hurts (is your application troubled by broken Connections? does it recover from database restarts well enough?), and then decide. You can always go back to simple, slow, and robust. Just set testConnectionOnCheckout to truetestConnectionOnCheckin to false, and set idleConnectionTestPeriod to 0.

But do, always, be sure that your tests themselves are performant, either because your JDBC driver supports Connection.isValid(...) or because you have set an efficient preferredTestQuery !!!


來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/29254281/viewspace-1815054/,如需轉載,請註明出處,否則將追究法律責任。

相關文章