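Zhuzhu (豬豬) has recently been studying performance optimization for the database persistence layer and has built up a good deal of background knowledge. Today's post shares a well-known technical article on the topic by the author of flexy-pool, last updated on January 22, 2019. The translation comes first, followed by the English original. Wherever the translation falls short, please refer to the original.

A high-performance data access layer requires a lot of knowledge about database internals, JDBC, JPA, and Hibernate. This post summarizes some of the most important techniques you can use to optimize an enterprise application.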

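1. SQL statement logging

If you are using a framework that generates statements on your behalf, you should always validate the effectiveness and efficiency of each statement. A test-time assertion mechanism is even better, because it lets you catch N+1 query problems before you even commit your code.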

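2. Connection management

Database connections are expensive, so you should always use a connection pooling mechanism.

Because the number of connections is bounded by the capabilities of the underlying database cluster, you need to release connections as quickly as possible.

In performance tuning you always have to measure, and setting the right pool size is no different. A tool like FlexyPool can help you find the right size even after you have deployed your application to production.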

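3. JDBC batching

JDBC batching allows us to send multiple SQL statements in a single database roundtrip. The performance gain is significant on both the driver and the database side. PreparedStatements are very good candidates for batching, and some database systems (e.g. Oracle) support batching only for prepared statements.

Since JDBC defines a distinct API for batching (e.g. PreparedStatement.addBatch and PreparedStatement.executeBatch), if you are generating statements manually you should know right from the start whether you will be using batching. With Hibernate, you can switch to batching with a single configuration setting.

Hibernate 5.2 offers Session-level batching, so it is even more flexible in this regard.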

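4. Statement caching

Statement caching is one of the least-known performance optimizations that you can easily take advantage of. Depending on the underlying JDBC driver, PreparedStatements can be cached on the client side (the driver) or on the database side (either the syntax tree or even the execution plan).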

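5. Hibernate identifiers

When using Hibernate, the IDENTITY generator is not a good choice because it disables JDBC batching.

The TABLE generator is even worse because it uses a separate transaction to fetch a new identifier. This puts pressure on the underlying transaction log as well as on the connection pool, since a separate connection is required every time we need a new identifier.

SEQUENCE is the right choice, and even SQL Server has supported it since version 2012. For SEQUENCE identifiers, Hibernate has long offered optimizers such as pooled or pooled-lo, which reduce the number of database roundtrips required to fetch a new entity identifier value.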

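6. Choosing the right column types

You should always use the right column types on the database side. The more compact the column type, the more entries fit in the database working set, and the better indexes fit into memory. For this purpose, you should take advantage of database-specific types (e.g. inet for IPv4 addresses in PostgreSQL), especially since Hibernate is very flexible when it comes to implementing a new custom Type.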

7. Relationships

Hibernate comes with many relationship mapping types, but not all of them are equal in terms of efficiency.

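Unidirectional collections and @ManyToMany List(s) should be avoided. If you really need to use entity collections, bidirectional @OneToMany associations are preferred. For a @ManyToMany relationship, use Set(s), since they are more efficient in this case, or simply map the link table as an entity as well and turn the @ManyToMany relationship into two bidirectional @OneToMany associations.

However, unlike queries, collections are less flexible because they cannot easily be paginated, which means we cannot use them when the number of child associations is rather high. For this reason, you should always question whether a collection is really necessary. In many situations an entity query is the better alternative.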

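8. Inheritance

When it comes to inheritance, the impedance mismatch between object-oriented languages and relational databases becomes even more apparent. JPA offers SINGLE_TABLE, JOINED, and TABLE_PER_CLASS to deal with inheritance mapping, and each of these strategies has pluses and minuses.

SINGLE_TABLE performs best in terms of SQL statements, but we lose on the data integrity side because we cannot use NOT NULL constraints.

JOINED addresses the data integrity limitation at the cost of more complex statements. As long as you do not use polymorphic queries or @OneToMany associations against base types, this strategy is fine. Its true power comes from polymorphic @ManyToOne associations backed by a Strategy pattern on the data access layer side.

TABLE_PER_CLASS should be avoided because it does not render efficient SQL statements.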

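9. Persistence Context size

When using JPA and Hibernate, you should always mind the Persistence Context size. For this reason, you should never bloat it with tons of managed entities. By restricting the number of managed entities, we gain better memory management, and the default dirty checking mechanism becomes more efficient as well.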

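10. Fetching only what's necessary

Fetching too much data is probably the number one cause of data access layer performance issues. One issue is that entity queries are used exclusively, even for read-only projections.

DTO projections are better suited for fetching custom views, while entities should only be fetched when the business flow needs to modify them.

EAGER fetching is the worst, and you should avoid anti-patterns such as Open-Session in View.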

11. Caching

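Relational database systems use many in-memory buffer structures to avoid disk access. Database caching is very often overlooked. We can lower response time significantly by properly tuning the database engine so that the working set resides in memory and is not fetched from disk all the time.

Application-level caching is not optional for many enterprise applications. It can reduce response time while also offering a read-only secondary store for when the database is down for maintenance or because of some serious system failure.

The second-level cache is very useful for reducing read-write transaction response time, especially in Master-Slave replication architectures. Depending on application requirements, Hibernate allows you to choose between READ_ONLY, NONSTRICT_READ_WRITE, READ_WRITE, and TRANSACTIONAL.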

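12. Concurrency control

The choice of transaction isolation level is of paramount importance when it comes to performance and data integrity. For multi-request web flows, to avoid lost updates, you should use optimistic locking with detached entities or an EXTENDED Persistence Context.

To avoid optimistic locking false positives, you can use versionless optimistic concurrency control or split entities based on write-based property sets.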

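13. Unleash database query capabilities

Just because you use JPA or Hibernate does not mean that you should not use native queries. You should take advantage of Window Functions, CTE (Common Table Expressions), CONNECT BY, and PIVOT.

These constructs allow you to avoid fetching too much data just to transform it later in the application layer. If you can let the database do the processing, you fetch only the end result and therefore save a lot of disk I/O and networking overhead. To avoid overloading the Master node, you can use database replication and keep multiple Slave nodes available, so that data-intensive tasks are executed on a Slave rather than on the Master.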

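14. Scale up and scale out

Relational databases do scale very well. If Facebook, Twitter, Pinterest, or StackOverflow can scale their database systems, there is a good chance you can scale an enterprise application to its particular business requirements.

Database replication and sharding are very good ways to increase throughput, and you should take full advantage of these battle-tested architectural patterns to scale your enterprise application.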

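Conclusion

A high-performance data access layer must resonate with the underlying database system. Knowing the inner workings of a relational database and of the data access frameworks in use can make the difference between a high-performance enterprise application and one that barely crawls.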

Original:

14 High-Performance Java Persistence Tips

(Last Updated On: January 22, 2019)

Introduction

A high-performance data access layer requires a lot of knowledge about database internals, JDBC, JPA, Hibernate, and this post summarizes some of the most important techniques you can use to optimize your enterprise application.

1. SQL statement logging

If you’re using a framework that generates statements on your behalf, you should always validate each statement’s effectiveness and efficiency. A testing-time assertion mechanism is even better because you can catch N+1 query problems even before you commit your code.
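As a rough illustration of the testing-time assertion idea, the sketch below counts the statements executed during a test and fails when the number differs from what the test expects. The class name `SqlStatementCounter` and its methods are hypothetical, and how `record` gets called (a logging hook, a DataSource proxy, and so on) is left as an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: records every SQL statement a test run executes. */
class SqlStatementCounter {
    private final List<String> statements = new ArrayList<>();

    /** Assumed to be called by whatever statement-logging hook is in place. */
    void record(String sql) {
        statements.add(sql);
    }

    int count() {
        return statements.size();
    }

    /** Fails the test when the recorded count differs from the expectation. */
    void assertCount(int expected) {
        if (statements.size() != expected) {
            throw new AssertionError("Expected " + expected
                + " statements but recorded " + statements.size()
                + ": " + statements);
        }
    }
}
```

A test that loads a list of parent rows and expects a single query would call `assertCount(1)`; if the code silently issues one extra query per parent row, the assertion fails and the N+1 problem surfaces during the build rather than in production.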

2. Connection management

Database connections are expensive, therefore you should always use a connection pooling mechanism.

Because the number of connections is given by the capabilities of the underlying database cluster, you need to release connections as fast as possible.

In performance tuning, you always have to measure, and setting the right pool size is no different. A tool like FlexyPool can help you find the right size even after you have deployed your application into production.
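To make "measure before sizing" concrete, here is a minimal sketch that tracks the peak number of connections in use at the same time. It is not FlexyPool's API. `ConnectionUsageMeter` and its methods are hypothetical names, and wiring the calls into a real pool is left as an assumption.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical meter: tracks how many connections are in use at once. */
class ConnectionUsageMeter {
    private final AtomicInteger inUse = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    /** Call when a connection is borrowed from the pool. */
    void acquired() {
        int current = inUse.incrementAndGet();
        // Keep the highest concurrency level observed so far.
        peak.accumulateAndGet(current, Math::max);
    }

    /** Call when the connection is returned to the pool. */
    void released() {
        inUse.decrementAndGet();
    }

    int peakInUse() {
        return peak.get();
    }
}
```

If the observed peak stays well below the configured maximum under realistic load, the pool is probably oversized. If it sits at the maximum, requests are likely queuing for connections.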

3. JDBC batching

JDBC batching allows us to send multiple SQL statements in a single database roundtrip. The performance gain is significant both on the Driver and the database side. PreparedStatements are very good candidates for batching, and some database systems (e.g. Oracle) support batching only for prepared statements.

Since JDBC defines a distinct API for batching (e.g. PreparedStatement.addBatch and PreparedStatement.executeBatch), if you’re generating statements manually, then you should know right from the start whether you should be using batching or not. With Hibernate, you can switch to batching with a single configuration.
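The batching API mentioned above can be sketched as follows. The `post` table, its `title` column, and the `PostBatchInserter` class are hypothetical. The method only needs a `java.sql.Connection`, so any driver and pool could supply it, and whether each `executeBatch` call really maps to a single roundtrip depends on the driver.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

class PostBatchInserter {
    /**
     * Inserts all titles, sending them to the database in groups of batchSize.
     * Returns the number of executeBatch calls that were made.
     * The "post" table and "title" column are hypothetical.
     */
    static int insertAll(Connection connection, List<String> titles, int batchSize)
            throws SQLException {
        int batchExecutions = 0;
        try (PreparedStatement statement =
                 connection.prepareStatement("INSERT INTO post (title) VALUES (?)")) {
            int pending = 0;
            for (String title : titles) {
                statement.setString(1, title);
                statement.addBatch();          // queue the row, nothing is sent yet
                if (++pending == batchSize) {
                    statement.executeBatch();  // send the whole group at once
                    batchExecutions++;
                    pending = 0;
                }
            }
            if (pending > 0) {                 // flush the last, partial group
                statement.executeBatch();
                batchExecutions++;
            }
        }
        return batchExecutions;
    }
}
```

With five rows and a batch size of two, the method calls `executeBatch` three times instead of executing five separate statements.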

Hibernate 5.2 offers Session-level batching, so it’s even more flexible in this regard.

4. Statement caching

Statement caching is one of the least-known performance optimizations that you can easily take advantage of. Depending on the underlying JDBC Driver, you can cache PreparedStatements either on the client side (the Driver) or on the database side (either the syntax tree or even the execution plan).
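To show what a client-side statement cache does conceptually, the sketch below keeps prepared statements in a small LRU map keyed by SQL text. Real drivers ship their own caches with driver-specific settings. `StatementCache` is a hypothetical class, and the generic type `S` stands in for whatever statement handle is being cached.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical client-side cache keyed by SQL text, evicting the least recently used entry. */
class StatementCache<S> {
    private final Map<String, S> cache;
    private int prepareCount;

    StatementCache(final int capacity) {
        // accessOrder = true keeps entries ordered by how recently they were used.
        this.cache = new LinkedHashMap<String, S>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, S> eldest) {
                // A real cache would also close the evicted statement here.
                return size() > capacity;
            }
        };
    }

    /** Returns the cached statement for this SQL, preparing it only on a miss. */
    S get(String sql, Function<String, S> prepare) {
        S statement = cache.get(sql);
        if (statement == null) {
            statement = prepare.apply(sql);
            prepareCount++;
            cache.put(sql, statement);
        }
        return statement;
    }

    /** Number of times a statement actually had to be prepared. */
    int prepareCount() {
        return prepareCount;
    }
}
```

The benefit shows up in `prepareCount`: repeated executions of the same SQL text reuse the cached handle instead of paying the preparation cost again.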

5. Hibernate identifiers

When using Hibernate, the IDENTITY generator is not a good choice since it disables JDBC batching.

TABLE generator is even worse since it uses a separate transaction for fetching a new identifier, which can put pressure on the underlying transaction log, as well as the connection pool since a separate connection is required every time we need a new identifier.

SEQUENCE is the right choice, and even SQL Server has supported it since version 2012. For SEQUENCE identifiers, Hibernate has long been offering optimizers like pooled or pooled-lo which can reduce the number of database roundtrips required for fetching a new entity identifier value.
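The roundtrip saving of a pooled-style optimizer can be illustrated with a simplified allocator. This is not Hibernate's implementation. `PooledLoIdAllocator` is a hypothetical class, and the `LongSupplier` stands in for a sequence call whose database-side increment is assumed to equal `incrementSize`.

```java
import java.util.function.LongSupplier;

/** Simplified illustration of a pooled-lo style allocator; not Hibernate's code. */
class PooledLoIdAllocator {
    private final LongSupplier sequenceCall; // stands in for a "next sequence value" query
    private final int incrementSize;         // assumed to match the sequence increment
    private long next;
    private long upperBoundExclusive;        // starts equal to next, forcing a first fetch
    private int roundTrips;

    PooledLoIdAllocator(LongSupplier sequenceCall, int incrementSize) {
        this.sequenceCall = sequenceCall;
        this.incrementSize = incrementSize;
    }

    /** Hands out identifiers from memory and calls the sequence once per block. */
    long nextId() {
        if (next == upperBoundExclusive) {
            next = sequenceCall.getAsLong(); // the only database roundtrip
            upperBoundExclusive = next + incrementSize;
            roundTrips++;
        }
        return next++;
    }

    int roundTrips() {
        return roundTrips;
    }
}
```

With an increment size of 5, allocating twelve identifiers costs three sequence calls instead of twelve.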

6. Choosing the right column types

You should always use the right column types on the database side. The more compact the column type is, the more entries can be accommodated in the database working set, and indexes will better fit into memory. For this purpose, you should take advantage of database-specific types (e.g. inet for IPv4 addresses in PostgreSQL), especially since Hibernate is very flexible when it comes to implementing a new custom Type.

7. Relationships

Hibernate comes with many relationship mapping types, but not all of them are equal in terms of efficiency.

Unidirectional collections and @ManyToMany List(s) should be avoided. If you really need to use entity collections, then bidirectional @OneToMany associations are preferred. For the @ManyToMany relationship, use Set(s) since they are more efficient in this case or simply map the linked many-to-many table as well and turn the @ManyToMany relationship into two bidirectional @OneToMany associations.

However, unlike queries, collections are less flexible since they cannot be easily paginated, meaning that we cannot use them when the number of child associations is rather high. For this reason, you should always question if a collection is really necessary. An entity query might be a better alternative in many situations.

8. Inheritance

When it comes to inheritance, the impedance mismatch between object-oriented languages and relational databases becomes even more apparent. JPA offers SINGLE_TABLE, JOINED, and TABLE_PER_CLASS to deal with inheritance mapping, and each of these strategies has pluses and minuses.

SINGLE_TABLE performs the best in terms of SQL statements, but we lose on the data integrity side since we cannot use NOT NULL constraints.

JOINED addresses the data integrity limitation while offering more complex statements. As long as you don’t use polymorphic queries or @OneToMany associations against base types, this strategy is fine. Its true power comes from polymorphic @ManyToOne associations backed by a Strategy pattern on the data access layer side.

TABLE_PER_CLASS should be avoided since it does not render efficient SQL statements.

9. Persistence Context size

When using JPA and Hibernate, you should always mind the Persistence Context size. For this reason, you should never bloat it with tons of managed entities. By restricting the number of managed entities, we gain better memory management, and the default dirty checking mechanism is going to be more efficient as well.

10. Fetching only what’s necessary

Fetching too much data is probably the number one cause for data access layer performance issues. One issue is that entity queries are used exclusively, even for read-only projections.

DTO projections are better suited for fetching custom views, while entities should only be fetched when the business flow requires modifying them.

EAGER fetching is the worst, and you should avoid anti-patterns such as Open-Session in View.

11. Caching

Relational database systems use many in-memory buffer structures to avoid disk access. Database caching is very often overlooked. We can lower response time significantly by properly tuning the database engine so that the working set resides in memory and is not fetched from disk all the time.

Application-level caching is not optional for many enterprise applications. Application-level caching can reduce response time while offering a read-only secondary store for when the database is down for maintenance or because of some serious system failure.

The second-level cache is very useful for reducing read-write transaction response time, especially in Master-Slave replication architectures. Depending on application requirements, Hibernate allows you to choose between READ_ONLY, NONSTRICT_READ_WRITE, READ_WRITE, and TRANSACTIONAL.

12. Concurrency control

The choice of transaction isolation level is of paramount importance when it comes to performance and data integrity. For multi-request web flows, to avoid lost updates, you should use optimistic locking with detached entities or an EXTENDED Persistence Context.

To avoid optimistic locking false positives, you can use versionless optimistic concurrency control or split entities based on write-based property sets.
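The version check behind optimistic locking can be sketched with an in-memory stand-in for a table that has a version column. `VersionedStore` and its methods are hypothetical. With JPA the equivalent check is generated for you once an entity attribute is annotated with `@Version`.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory stand-in for a table with a version column (hypothetical). */
class VersionedStore {
    static final class Row {
        final String value;
        final int version;

        Row(String value, int version) {
            this.value = value;
            this.version = version;
        }
    }

    private final Map<Long, Row> rows = new HashMap<>();

    synchronized void insert(long id, String value) {
        rows.put(id, new Row(value, 0));
    }

    synchronized Row read(long id) {
        return rows.get(id);
    }

    /**
     * Mimics "UPDATE ... SET value = ?, version = version + 1
     *         WHERE id = ? AND version = ?".
     * Returns false when the row has changed since it was read.
     */
    synchronized boolean update(long id, String newValue, int expectedVersion) {
        Row current = rows.get(id);
        if (current == null || current.version != expectedVersion) {
            return false; // stale read: an ORM would raise an optimistic lock error here
        }
        rows.put(id, new Row(newValue, current.version + 1));
        return true;
    }
}
```

If two web requests read the same row at version 0, the first update succeeds and bumps the version to 1. The second update still carries version 0 and is rejected, so the first change is not silently overwritten.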

13. Unleash database query capabilities

Just because you use JPA or Hibernate, it does not mean that you should not use native queries. You should take advantage of Window Functions, CTE (Common Table Expressions), CONNECT BY, PIVOT.

These constructs allow you to avoid fetching too much data just to transform it later in the application layer. If you can let the database do the processing, you can fetch just the end result, therefore, saving lots of disk I/O and networking overhead. To avoid overloading the Master node, you can use database replication and have multiple Slave nodes available so that data-intensive tasks are executed on a Slave rather than on the Master.

14. Scale up and scale out

Relational databases do scale very well. If Facebook, Twitter, Pinterest or StackOverflow can scale their database systems, there is a good chance you can scale an enterprise application to its particular business requirements.

Database replication and sharding are very good ways to increase throughput, and you should totally take advantage of these battle-tested architectural patterns to scale your enterprise application.
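As one simple illustration of sharding, the sketch below routes a key to a shard by taking it modulo the shard count. The article does not prescribe a routing scheme, so this is an assumption, and `ShardRouter` and the shard names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical router: maps a numeric key to one of a fixed list of shards. */
class ShardRouter {
    private final List<String> shardNames;

    ShardRouter(List<String> shardNames) {
        this.shardNames = new ArrayList<>(shardNames); // defensive copy
    }

    /** The same key always lands on the same shard while the shard list is unchanged. */
    String shardFor(long key) {
        // floorMod keeps the index non-negative even for negative keys.
        int index = (int) Math.floorMod(key, (long) shardNames.size());
        return shardNames.get(index);
    }
}
```

Plain modulo routing remaps most keys whenever the number of shards changes, so schemes such as consistent hashing or a lookup table are common alternatives when shards are added over time.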
