因為 int32 溢位導致的 peewee get_or_create時出現 peewee.IntegrityError錯誤事故記錄
詳細看這裡:https://github.com/coleifer/peewee/issues/2948
關於使用 peewee 的 get_or_create 時候,出現 peewee.IntegrityError 報錯
下面的是完整的程式碼日誌
error handling worker <WorkerContext [taisan_task_ocr_service.start_ocr] at 0x7efd9bd31410>: (1062, "Duplicate entry '2147483647' for key 'found_video_cover_image_ocr_count.foundedvideocoverimageocrcounttable_video_id'")
Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/peewee.py", line 7285, in get
return clone.execute(database)[0]
~~~~~~~~~~~~~~~~~~~~~~~^^^
File "/usr/local/lib/python3.11/site-packages/peewee.py", line 4581, in __getitem__
return self.row_cache[item]
~~~~~~~~~~~~~~^^^^^^
IndexError: list index out of range
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/peewee.py", line 6875, in get_or_create
return query.get(), False
^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/peewee.py", line 7288, in get
raise self.model.DoesNotExist('%s instance matching query does '
core.mysql.models.FoundedVideoCoverImageOcrCountTableDoesNotExist: <Model: FoundedVideoCoverImageOcrCountTable> instance matching query does not exist:
SQL: SELECT `t1`.`id`, `t1`.`video_id`, `t1`.`search_count`, `t1`.`created_at`, `t1`.`updated_at` FROM `found_video_cover_image_ocr_count` AS `t1` WHERE (`t1`.`video_id` = %s) LIMIT %s OFFSET %s
Params: [2397088498, 1, 0]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/nameko/containers.py", line 392, in _run_worker
result = method(*worker_ctx.args, **worker_ctx.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/code/services/taisan_task_ocr_service.py", line 25, in start_ocr
fvcisc, fvcisc_created = FoundedVideoCoverImageOcrCountTable.get_or_create(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/peewee.py", line 6877, in get_or_create
raise exc
File "/usr/local/lib/python3.11/site-packages/peewee.py", line 6872, in get_or_create
return cls.create(**kwargs), True
^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/peewee.py", line 6741, in create
inst.save(force_insert=True)
File "/usr/local/lib/python3.11/site-packages/peewee.py", line 6951, in save
pk = self.insert(**field_dict).execute()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/peewee.py", line 2036, in inner
return method(self, database, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/peewee.py", line 2107, in execute
return self._execute(database)
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/peewee.py", line 2912, in _execute
return super(Insert, self)._execute(database)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/peewee.py", line 2625, in _execute
cursor = database.execute(self)
^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/peewee.py", line 3330, in execute
return self.execute_sql(sql, params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/playhouse/shortcuts.py", line 256, in execute_sql
return self._reconnect(super(ReconnectMixin, self).execute_sql, sql, params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/playhouse/shortcuts.py", line 268, in _reconnect
raise exc
File "/usr/local/lib/python3.11/site-packages/playhouse/shortcuts.py", line 263, in _reconnect
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/peewee.py", line 3320, in execute_sql
with __exception_wrapper__:
File "/usr/local/lib/python3.11/site-packages/peewee.py", line 3088, in __exit__
reraise(new_type, new_type(exc_value, *exc_args), traceback)
File "/usr/local/lib/python3.11/site-packages/peewee.py", line 196, in reraise
raise value.with_traceback(tb)
File "/usr/local/lib/python3.11/site-packages/peewee.py", line 3322, in execute_sql
cursor.execute(sql, params or ())
File "/usr/local/lib/python3.11/site-packages/pymysql/cursors.py", line 153, in execute
result = self._query(query)
^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pymysql/cursors.py", line 322, in _query
conn.query(q)
File "/usr/local/lib/python3.11/site-packages/pymysql/connections.py", line 563, in query
self._affected_rows = self._read_query_result(unbuffered=unbuffered)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pymysql/connections.py", line 825, in _read_query_result
result.read()
File "/usr/local/lib/python3.11/site-packages/pymysql/connections.py", line 1199, in read
first_packet = self.connection._read_packet()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pymysql/connections.py", line 775, in _read_packet
packet.raise_for_error()
File "/usr/local/lib/python3.11/site-packages/pymysql/protocol.py", line 219, in raise_for_error
err.raise_mysql_exception(self._data)
File "/usr/local/lib/python3.11/site-packages/pymysql/err.py", line 150, in raise_mysql_exception
raise errorclass(errno, errval)
peewee.IntegrityError: (1062, "Duplicate entry '2147483647' for key 'found_video_cover_image_ocr_count.foundedvideocoverimageocrcounttable_video_id'")
下面是 peewee 的模型
class FoundedVideoCoverImageOcrCountTable(Model):
id = BigAutoField(primary_key=True)
video_id = IntegerField(unique=True)
search_count = IntegerField(default=0)
created_at = DateTimeField(
null=False,
constraints=[SQL('DEFAULT CURRENT_TIMESTAMP')],
help_text='使用資料庫時間'
)
updated_at = DateTimeField(
null=False,
constraints=[
SQL('DEFAULT CURRENT_TIMESTAMP'),
SQL('ON UPDATE CURRENT_TIMESTAMP'),
]
)
class Meta:
database = super_taisan_db
table_name = 'found_video_cover_image_ocr_count'
資料庫裡面的 ddl
-- super_taisan.found_video_cover_image_ocr_count definition
CREATE TABLE `found_video_cover_image_ocr_count` (
`id` bigint NOT NULL AUTO_INCREMENT,
`video_id` int NOT NULL,
`search_count` int NOT NULL,
`created_at` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
UNIQUE KEY `foundedvideocoverimageocrcounttable_video_id` (`video_id`)
) ENGINE=InnoDB AUTO_INCREMENT=76871889 DEFAULT CHARSET=utf8mb3;
執行的部分程式碼
class TaiSanTaskSubmitService:
name = 'taisan_task_ocr_service'
def start_ocr(self, parse_result_json: str):
# logger.debug(parse_result_json)
parse_result = ParseResult.model_validate_json(parse_result_json)
video_id: int = parse_result.extra['video_id']
fvcisc, fvcisc_created = FoundedVideoCoverImageOcrCountTable.get_or_create(video_id=video_id)
fvcisc: FoundedVideoCoverImageOcrCountTable
if fvcisc.search_count >= 1 and timedelta_seconds(get_utc_now_timestamp(), fvcisc.updated_at) < 4*HOUR:
logger.warning(f'當前圖片重試次數過多,拋棄 video_id: {video_id}')
fvcisc.search_count += 1
fvcisc.save()
return
很顯然問題出現在 FoundedVideoCoverImageOcrCountTable.get_or_create(video_id=video_id)
階段
讓我感到困惑的是二次讀取的時候,為什麼會出現 WHERE (
t1.
video_id = %s) LIMIT %s OFFSET %s Params: [2397088498, 1, 0]
?這和 peewee.IntegrityError: (1062, "Duplicate entry '2147483647' for key 'found_video_cover_image_ocr_count.foundedvideocoverimageocrcounttable_video_id'")
的 2147483647 對應不起來。我不知道為什麼會出現 2397088498 這個值?
我一開始覺得是 peewee 出問題了,然後提了 issues 給 peewee 的作者
在作者回復之前,我自己也開始了排查,最後發現的原因是
video_id 的長度超過 int32 ,所以出現了溢位迴圈導致的
至於為什麼會超出 int32 ,是因為 video 表的 id 用的是 int64(即 mysql 的 bigint)
但是 found_video_cover_image_ocr_count 表的 video_id 不小心設定為 int 了