資料庫批次插入這麼講究的麼？

秦怀杂货店發表於2022-01-04

原文網址 : https://segmentfault.com/a/1190000041218630

資料庫

最近新的專案寫了不少各種 insertBatch 的程式碼，一直有人說，批量插入比迴圈插入效率高很多，那本文就來實驗一下,到底是不是真的？

測試環境：

SpringBoot 2.5
Mysql 8
JDK 8
Docker

首先，多條資料的插入，可選的方案：

foreach迴圈插入
拼接sql，一次執行
使用批處理功能插入

搭建測試環境`

sql檔案：

drop database IF EXISTS test;
CREATE DATABASE test;
use test;
DROP TABLE IF EXISTS `user`;
CREATE TABLE `user` (
  `id` int(11) NOT NULL,
  `name` varchar(255) DEFAULT "",
  `age` int(11) DEFAULT 0,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

應用的配置檔案：

server:
  port: 8081
spring:
  #資料庫連線配置
  datasource:
    driver-class-name: com.mysql.cj.jdbc.Driver
    url: jdbc:mysql://127.0.0.1:3306/test?characterEncoding=utf-8&useSSL=false&allowPublicKeyRetrieval=true&&serverTimezone=UTC&setUnicode=true&characterEncoding=utf8&&nullCatalogMeansCurrent=true&&autoReconnect=true&&allowMultiQueries=true
    username: root
    password: 123456
#mybatis的相關配置
mybatis:
  #mapper配置檔案
  mapper-locations: classpath:mapper/*.xml
  type-aliases-package: com.aphysia.spingbootdemo.model
  #開啟駝峰命名
  configuration:
    map-underscore-to-camel-case: true
logging:
  level:
    root: error

啟動檔案，配置了Mapper檔案掃描的路徑：

import org.mybatis.spring.annotation.MapperScan;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.EnableAutoConfiguration;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
@MapperScan("com.aphysia.springdemo.mapper")
public class SpringdemoApplication {

    public static void main(String[] args) {
        SpringApplication.run(SpringdemoApplication.class, args);
    }

}

Mapper檔案一共準備了幾個方法，插入單個物件，刪除所有物件，拼接插入多個物件：

import com.aphysia.springdemo.model.User;
import org.apache.ibatis.annotations.Param;

import java.util.List;

public interface UserMapper {

    int insertUser(User user);

    int deleteAllUsers();


    int insertBatch(@Param("users") List<User>users);
}

Mapper.xml檔案如下：

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE mapper PUBLIC "-//mybatis.org//DTD Mapper 3.0//EN" "http://mybatis.org/dtd/mybatis-3-mapper.dtd" >
<mapper namespace="com.aphysia.springdemo.mapper.UserMapper">
    <insert id="insertUser" parameterType="com.aphysia.springdemo.model.User">
        insert  into user(id,age) values(#{id},#{age})
    </insert>

    <delete id="deleteAllUsers">
        delete from user where id>0;
    </delete>

    <insert id="insertBatch" parameterType="java.util.List">
        insert into user(id,age) VALUES
        <foreach collection="users" item="model" index="index" separator=",">
            (#{model.id}, #{model.age})
        </foreach>
    </insert>
</mapper>

測試的時候，每次操作我們都刪除掉所有的資料，保證測試的客觀，不受之前的資料影響。

不同的測試

1. foreach 插入

先獲取列表，然後每一條資料都執行一次資料庫操作，插入資料：

@SpringBootTest
@MapperScan("com.aphysia.springdemo.mapper")
class SpringdemoApplicationTests {

    @Autowired
    SqlSessionFactory sqlSessionFactory;

    @Resource
    UserMapper userMapper;

    static int num = 100000;

    static int id = 1;

    @Test
    void insertForEachTest() {
        List<User> users = getRandomUsers();
        long start = System.currentTimeMillis();
        for (int i = 0; i < users.size(); i++) {
            userMapper.insertUser(users.get(i));
        }
        long end = System.currentTimeMillis();
        System.out.println("time:" + (end - start));
    }
}

2. 拼接sql插入

其實就是用以下的方式插入資料：

INSERT INTO `user` (`id`, `age`) 
VALUES (1, 11),
(2, 12),
(3, 13),
(4, 14),
(5, 15);

    @Test
    void insertSplicingTest() {
        List<User> users = getRandomUsers();
        long start = System.currentTimeMillis();
        userMapper.insertBatch(users);
        long end = System.currentTimeMillis();
        System.out.println("time:" + (end - start));
    }

3. 使用Batch批量插入

將MyBatis session 的 executor type 設為 Batch ，使用sqlSessionFactory將執行方式置為批量，自動提交置為false,全部插入之後，再一次性提交：

    @Test
    public void insertBatch(){
        SqlSession sqlSession = sqlSessionFactory.openSession(ExecutorType.BATCH, false);
        UserMapper mapper = sqlSession.getMapper(UserMapper.class);
        List<User> users = getRandomUsers();
        long start = System.currentTimeMillis();
        for(int i=0;i<users.size();i++){
            mapper.insertUser(users.get(i));
        }
        sqlSession.commit();
        sqlSession.close();
        long end = System.currentTimeMillis();
        System.out.println("time:" + (end - start));
    }

4. 批量處理+分批提交

在批處理的基礎上，每1000條資料，先提交一下，也就是分批提交。

    @Test
    public void insertBatchForEachTest(){
        SqlSession sqlSession = sqlSessionFactory.openSession(ExecutorType.BATCH, false);
        UserMapper mapper = sqlSession.getMapper(UserMapper.class);
        List<User> users = getRandomUsers();
        long start = System.currentTimeMillis();
        for(int i=0;i<users.size();i++){
            mapper.insertUser(users.get(i));
            if (i % 1000 == 0 || i == num - 1) {
                sqlSession.commit();
                sqlSession.clearCache();
            }
        }
        sqlSession.close();
        long end = System.currentTimeMillis();
        System.out.println("time:" + (end - start));
    }

初次結果，明顯不對？

執行上面的程式碼，我們可以得到下面的結果，for迴圈插入的效率確實很差，拼接的sql效率相對高一點，看到有些資料說拼接sql可能會被mysql限制，但是我執行到1000w的時候，才看到堆記憶體溢位。

下面是不正確的結果！！！

插入方式	10	100	1000	1w	10w	100w	1000w
for迴圈插入	387	1150	7907	70026	635984	太久了...	太久了...
拼接sql插入	308	320	392	838	3156	24948	OutOfMemoryError: 堆記憶體溢位
批處理	392	917	5442	51647	470666	太久了...	太久了...
批處理 + 分批提交	359	893	5275	50270	472462	太久了...	太久了...

拼接sql並沒有超過記憶體

我們看一下mysql的限制:

mysql> show VARIABLES like '%max_allowed_packet%';
+---------------------------+------------+
| Variable_name             | Value      |
+---------------------------+------------+
| max_allowed_packet        | 67108864   |
| mysqlx_max_allowed_packet | 67108864   |
| slave_max_allowed_packet  | 1073741824 |
+---------------------------+------------+
3 rows in set (0.12 sec)

這67108864足足600多M，太大了，怪不得不會報錯，那我們去改改一下它吧，改完重新測試：

首先在啟動mysql的情況下，進入容器內，也可以直接在Docker桌面版直接點Cli圖示進入：

docker exec -it mysql bash

進入/etc/mysql目錄，去修改my.cnf檔案：

cd /etc/mysql

先按照vim，要不編輯不了檔案：

apt-get update
apt-get install vim

修改my.cnf

vim my.cnf

在最後一行新增max_allowed_packet=20M(按i編輯，編輯完按esc，輸入:wq退出)

 
[mysqld]
pid-file        = /var/run/mysqld/mysqld.pid
socket          = /var/run/mysqld/mysqld.sock
datadir         = /var/lib/mysql
secure-file-priv= NULL
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
 
# Custom config should go here
!includedir /etc/mysql/conf.d/
max_allowed_packet=2M

退出容器

# exit

檢視mysql容器id

docker ps -a

重啟mysql

docker restart c178e8998e68

重啟成功後檢視最大的max_allowed_pactet，發現已經修改成功：

mysql> show VARIABLES like '%max_allowed_packet%';
+---------------------------+------------+
| Variable_name             | Value      |
+---------------------------+------------+
| max_allowed_packet        | 2097152    |
| mysqlx_max_allowed_packet | 67108864   |
| slave_max_allowed_packet  | 1073741824 |
+---------------------------+------------+

我們再次執行拼接sql，發現100w的時候，sql就達到了3.6M左右，超過了我們設定的2M，成功的演示丟擲了錯誤：

org.springframework.dao.TransientDataAccessResourceException: 
### Cause: com.mysql.cj.jdbc.exceptions.PacketTooBigException: Packet for query is too large (36,788,583 > 2,097,152). You can change this value on the server by setting the 'max_allowed_packet' variable.
; Packet for query is too large (36,788,583 > 2,097,152). You can change this value on the server by setting the 'max_allowed_packet' variable.; nested exception is com.mysql.cj.jdbc.exceptions.PacketTooBigException: Packet for query is too large (36,788,583 > 2,097,152). You can change this value on the server by setting the 'max_allowed_packet' variable.

批量處理為什麼這麼慢？

但是，仔細一看就會發現，上面的方式，怎麼批處理的時候，並沒有展示出優勢了，和for迴圈沒有什麼區別？這是對的麼？

這肯定是不對的，從官方文件中，我們可以看到它會批量更新，不會每次去建立預處理語句，理論是更快的。

然後我發現我的一個最重要的問題：資料庫連線 URL 地址少了rewriteBatchedStatements=true

如果我們不寫，MySQL JDBC 驅動在預設情況下會忽視 executeBatch() 語句，我們期望批量執行的一組 sql 語句拆散，但是執行的時候是一條一條地發給 MySQL 資料庫，實際上是單條插入，直接造成較低的效能。我說怎麼效能和迴圈去插入資料差不多。

只有將 rewriteBatchedStatements 引數置為 true, 資料庫驅動才會幫我們批量執行 SQL。

正確的資料庫連線：

jdbc:mysql://127.0.0.1:3306/test?characterEncoding=utf-8&useSSL=false&allowPublicKeyRetrieval=true&&serverTimezone=UTC&setUnicode=true&characterEncoding=utf8&&nullCatalogMeansCurrent=true&&autoReconnect=true&&allowMultiQueries=true&&&rewriteBatchedStatements=true

找到問題之後，我們重新測試批量測試,最終的結果如下：

插入方式	10	100	1000	1w	10w	100w	1000w
for迴圈插入	387	1150	7907	70026	635984	太久了...	太久了...
拼接sql插入	308	320	392	838	3156	24948（很可能超過sql長度限制）	OutOfMemoryError: 堆記憶體溢位
批處理(重點)	333	323	362	636	1638	8978	OutOfMemoryError: 堆記憶體溢位
批處理 + 分批提交	359	313	394	630	2907	18631	OutOfMemoryError: 堆記憶體溢位

從上面的結果來看，確實批處理是要快很多的，當數量級太大的時候，其實都會超過記憶體溢位的，批處理加上分批提交併沒有變快，和批處理差不多，反而變慢了，提交太多次了，拼接sql的方案在數量比較少的時候其實和批處理相差不大，最差的方案就是for迴圈插入資料，這真的特別的耗時。100條的時候就已經需要1s了，不能選擇這種方案。

一開始發現批處理比較慢的時候，真的挺懷疑自己，後面發現是有一個引數，有一種撥開雲霧的感覺，知道得越多，不知道的越多。

【作者簡介】：
秦懷，公眾號【秦懷雜貨店】作者，技術之路不在一時，山高水長，縱使緩慢，馳而不息。

劍指Offer全部題解PDF

2020年我寫了什麼？

開源程式設計筆記

咦，Java拆分個字串都這麼講究
2021-12-28
Java字串
mybatis 批次插入資料
2024-12-02
MyBatis
mybatis oracle資料庫批次插入資料，忽略主鍵重複
2024-03-14
MyBatisOracle資料庫
關於PHP往mysql資料庫中批次插入資料例項教程
2019-01-22
PHPMySql資料庫
經驗分享|BI資料視覺化大屏的顏色有什麼講究？
2023-03-13
視覺化
使用JDBC向MySQL資料庫批次插入10W條資料測試效率
2020-04-04
JDBCMySql資料庫
為什麼NoSQL資料庫這麼受歡迎？
2022-12-26
SQL資料庫
資料庫-批次更新
2020-10-15
資料庫
mongodb資料庫中插入資料
2018-06-23
MongoDB資料庫
mybatis+oracle 批次插入多條資料的處理方法
2024-10-09
MyBatisOracle
python如何將資料插入資料庫
2021-09-11
Python資料庫
在MySQL資料庫中，這4種方式可以避免重複的插入資料！
2020-10-20
MySql資料庫
MySQL 資料庫表格建立、資料插入及獲取插入的 ID：Python 教程
2023-11-09
MySql資料庫Python
mybatis批次插入
2024-04-19
MyBatis
Yii 批次插入
2022-08-24
老闆今天問我為什麼公司的資料庫這麼爛，我是這樣回答的......
2018-09-14
資料庫
插入排序就這麼簡單
2018-03-21
排序
U盤的使用要講究
2022-04-27
分庫分表插入資料
2020-12-09
eazyexcel 讀取excel資料插入資料庫
2020-11-04
Excel資料庫
python資料插入連線MySQL資料庫
2020-04-23
PythonMySql資料庫
Druid資料庫連線池就這麼簡單
2019-03-03
UI資料庫
解決hive資料庫插入資料很慢的問題
2024-09-23
Hive資料庫
請問這個資料庫統計資訊是什麼，
2022-02-03
資料庫
小程式批次刪除雲資料庫裡的資料
2021-09-09
資料庫
怎麼快速插入 100 條資料，用時最短！
2020-04-06
自增長主鍵回顯實現，批次資料插入
2024-03-06
這麼強？！Erda MySQL Migrator：持續整合的資料庫版本控制
2022-07-08
MySql資料庫
.NET 資料庫大資料方案（插入、更新、刪除、查詢、插入或更新）
2023-10-08
資料庫大資料
【資料庫】簡單聊聊資料庫可以做什麼，有什麼用？
2024-01-09
資料庫
【資料庫】雲資料庫rds是什麼意思？有什麼優勢？
2022-03-04
資料庫
mybatis插入資料、批量插入資料
2021-12-08
MyBatis
Entity Framework Core 7中高效地進行批次資料插入
2022-12-01
Framework
什麼是真正的自治資料庫？
2022-12-05
資料庫
Python資料分析庫之pandas，你該這麼學！No.1
2019-05-15
Python
雲資料庫RDS是什麼？雲資料庫RDS有什麼優勢？
2021-03-29
資料庫
oracle-資料庫- insert 插入語句
2020-11-02
Oracle資料庫
Java MyBatis 插入資料庫返回主鍵
2020-11-24
JavaMyBatis資料庫