An in-depth comparison of the AT and XA distributed transaction modes

Posted by 葉東富 on 2022-11-24

AT is the transaction mode primarily promoted by Seata, Alibaba's open-source distributed transaction framework. This article explains the principle of AT in detail and compares it with the XA mode.

Principle

In principle, AT's design has a lot in common with XA. XA is a two-phase commit implemented at the database level, while AT is a two-phase commit implemented at the application/driver layer. It is best to read this article after you are familiar with XA, so that you can grasp the principle and design of AT faster and more thoroughly.

Like XA, AT has three roles, but they have different names, so take care to tell them apart:

  • RM (resource manager): a business service that manages its local database; the same as the RM in XA
  • TC (transaction coordinator): the Seata server, which manages the state of global transactions and coordinates the execution of each transaction branch; equivalent to the TM in XA
  • TM (transaction manager): a business service that initiates global transactions; equivalent to the AP (application) in XA

Phase one of AT is prepare. During this phase AT does the following:

  1. On the RM side, the user opens a local transaction.
  2. On the RM side, each time the user modifies business data (take an UPDATE statement as an example), AT does the following:

    1. Using the UPDATE's conditions, query the data before modification. This is called the BeforeImage.
    2. Execute the UPDATE, then query the modified data by the primary keys in the BeforeImage. This is called the AfterImage.
    3. Save the BeforeImage and AfterImage to an undolog table.
    4. Record the table name and primary keys from the BeforeImage, called the lockKey, for later use.
  3. On the RM side, when the user commits the local transaction, AT does the following:

    1. Register all the lockKeys recorded in step 2.4 with the TC (that is, the Seata server).
    2. During the registration in step 3.1, the TC checks whether any table name + primary key pair conflicts with a lock it already holds. If there is a conflict, AT sleeps, waits, and retries. If there is no conflict, the lockKeys are saved.
    3. After step 3.1 succeeds, the local transaction is committed.
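The phase-one steps above can be sketched in Python with an in-memory SQLite database. This is a minimal illustration under several assumptions, not Seata's implementation. The `account` and `undo_log` tables, the `ToyTC` class standing in for the TC's lock table, and the `at_phase_one` function are all hypothetical. Only a single-row UPDATE by primary key is handled.

```python
import json
import sqlite3
import time


class ToyTC:
    """Hypothetical in-memory stand-in for the TC's global lock table."""

    def __init__(self):
        self.locks = {}  # lockKey -> id of the global transaction holding it

    def register_branch(self, xid, lock_keys):
        # Step 3.2: refuse if any lockKey is held by a different global transaction.
        if any(self.locks.get(k, xid) != xid for k in lock_keys):
            return False
        for k in lock_keys:
            self.locks[k] = xid
        return True


def make_db():
    # Hypothetical schema: one business table plus an undo log table.
    conn = sqlite3.connect(":memory:", isolation_level=None)  # manage transactions explicitly
    conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
    conn.execute("CREATE TABLE undo_log (xid TEXT, before_image TEXT, after_image TEXT)")
    conn.execute("INSERT INTO account VALUES (1, 100)")
    return conn


def at_phase_one(conn, tc, xid, account_id, delta, retries=3):
    """Sketch of AT phase one for: UPDATE account SET balance = balance + delta WHERE id = ?"""
    conn.execute("BEGIN")  # 1. open a local transaction
    # 2.1 BeforeImage: rows matched by the UPDATE condition, before modification
    before = conn.execute("SELECT id, balance FROM account WHERE id = ?", (account_id,)).fetchall()
    # 2.2 run the business UPDATE, then re-read the rows by the BeforeImage primary keys
    conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (delta, account_id))
    pks = [row[0] for row in before]
    after = [conn.execute("SELECT id, balance FROM account WHERE id = ?", (pk,)).fetchone() for pk in pks]
    # 2.3 store both images in the undo log, inside the same local transaction
    conn.execute("INSERT INTO undo_log VALUES (?, ?, ?)", (xid, json.dumps(before), json.dumps(after)))
    # 2.4 lockKey = table name + primary key
    lock_keys = [f"account:{pk}" for pk in pks]
    # 3.1/3.2 register the lockKeys with the TC, sleeping and retrying on conflict
    for _ in range(retries):
        if tc.register_branch(xid, lock_keys):
            conn.execute("COMMIT")  # 3.3 commit the local transaction
            return lock_keys
        time.sleep(0.01)
    conn.execute("ROLLBACK")
    raise RuntimeError(f"lock conflict on {lock_keys}")
```

Because the undo log row is written in the same local transaction as the business UPDATE, a failed lock registration rolls back both of them together.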

If every branch completes phase one without error, AT performs the phase-two commit:

  1. The TC deletes the lockKeys associated with the current global transaction.
  2. The TC notifies all business services involved in the current global transaction that the global transaction has succeeded and that the data stored in the undolog can be deleted.
  3. After receiving the notification, the RM deletes the data in the undolog.
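Here is a minimal sketch of the phase-two commit. It again uses hypothetical names: `ToyTC`, `ToyRM`, and the `undo_log` table. It shows only why commit is cheap in AT. The business data was already committed in phase one, so the TC just releases the locks and each RM deletes its undo records.

```python
import sqlite3


class ToyRM:
    """Hypothetical RM holding a connection to its local database."""

    def __init__(self, conn):
        self.conn = conn

    def branch_commit(self, xid):
        # 3. Business data was already committed in phase one; only the undo log is cleaned up.
        self.conn.execute("DELETE FROM undo_log WHERE xid = ?", (xid,))


class ToyTC:
    """Hypothetical TC state: global lock table plus the RMs enlisted in each global transaction."""

    def __init__(self):
        self.locks = {}     # lockKey -> xid
        self.branches = {}  # xid -> list of ToyRM

    def global_commit(self, xid):
        # 1. Release every lockKey held by this global transaction.
        self.locks = {k: v for k, v in self.locks.items() if v != xid}
        # 2. Notify each enlisted RM that the global transaction succeeded.
        for rm in self.branches.pop(xid, []):
            rm.branch_commit(xid)
```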

If any branch fails in phase one, AT performs the phase-two rollback:

  1. The TC notifies all business services involved in the current global transaction that the global transaction has failed and must be rolled back.
  2. After receiving the notification, the RM rolls back its changes to local data. The rollback works as follows:

    1. Read the BeforeImage and AfterImage from the undolog.
    2. If the AfterImage matches the current record in the database, overwrite the current record with the BeforeImage.
    3. If the AfterImage does not match the current record, a dirty rollback has occurred, and manual intervention is required.
  3. After all branches of the global transaction have rolled back, the TC deletes the lockKeys of this global transaction.
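The RM-side rollback check can be sketched as follows. Several details are assumptions made for illustration: the single `account` table, the JSON-encoded images, and the `DirtyRollbackError` class. Seata's real undo log format and error handling are not reproduced here.

```python
import json
import sqlite3


class DirtyRollbackError(Exception):
    """Hypothetical error: current data no longer matches the AfterImage."""


def branch_rollback(conn, xid):
    """Sketch of RM-side rollback for one global transaction (single table `account`)."""
    conn.execute("BEGIN")
    rows = conn.execute("SELECT before_image, after_image FROM undo_log WHERE xid = ?", (xid,)).fetchall()
    for before_json, after_json in rows:
        # 2.1 load both images from the undo log
        before = [tuple(r) for r in json.loads(before_json)]
        after = [tuple(r) for r in json.loads(after_json)]
        current = [conn.execute("SELECT id, balance FROM account WHERE id = ?", (pk,)).fetchone()
                   for pk, _ in after]
        if current != after:
            # 2.3 someone else changed the rows: dirty rollback, leave it for manual intervention
            conn.execute("ROLLBACK")
            raise DirtyRollbackError(f"{xid}: expected {after}, found {current}")
        # 2.2 data is unchanged since phase one: overwrite it with the BeforeImage
        for pk, balance in before:
            conn.execute("UPDATE account SET balance = ? WHERE id = ?", (balance, pk))
    conn.execute("DELETE FROM undo_log WHERE xid = ?", (xid,))
    conn.execute("COMMIT")
```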

Problem analysis

A prominent problem with AT mode is that the dirty rollback described in step 2.3 of the rollback process cannot be ruled out. The following sequence triggers it:

  1. Global transaction g1 modifies row A1: v1 -> v2
  2. Another service, outside the global transaction, modifies row A1: v2 -> v3
  3. Global transaction g1 rolls back and finds that the current value of row A1 is v3, which does not equal v2 in the AfterImage, so the rollback fails

Once a dirty rollback occurs, the distributed transaction framework can no longer guarantee data consistency, and manual intervention is required. To avoid dirty rollbacks, every write to the affected table needs special handling (in Seata's Java client, the GlobalLock annotation must be added). This constraint is very hard to enforce in a complex system of any significant size.
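A toy in-memory model reproduces the three steps above and shows why restricting other writers helps. All names are hypothetical. The `check_global_lock` flag is only a rough stand-in for the effect of Seata's GlobalLock annotation, which is to refuse a write while the TC holds a global lock on the row. It is not Seata's API.

```python
class ToyStore:
    """Hypothetical model of one row, the AT undo log, and the TC's global locks."""

    def __init__(self):
        self.rows = {"A1": "v1"}
        self.undo = {}   # xid -> (key, BeforeImage value, AfterImage value)
        self.locks = {}  # key -> xid holding the global lock

    def at_write(self, xid, key, value):
        # Phase one: record the images, take the global lock, commit locally.
        self.undo[xid] = (key, self.rows[key], value)
        self.locks[key] = xid
        self.rows[key] = value

    def other_write(self, key, value, check_global_lock=False):
        # A write from a service outside the global transaction.
        if check_global_lock and key in self.locks:
            return False  # refused while a global lock is held (GlobalLock-like behavior)
        self.rows[key] = value
        return True

    def rollback(self, xid):
        key, before, after = self.undo[xid]
        if self.rows[key] != after:
            return "dirty"  # AfterImage mismatch: manual intervention required
        self.rows[key] = before
        del self.undo[xid], self.locks[key]
        return "ok"


s = ToyStore()
s.at_write("g1", "A1", "v2")  # step 1: v1 -> v2
s.other_write("A1", "v3")     # step 2: unguarded write, v2 -> v3
print(s.rollback("g1"))       # step 3: prints "dirty"
```

Passing `check_global_lock=True` in step 2 makes the write fail instead, and the later rollback restores v1. That is exactly the discipline every writer to the table must follow.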

AT vs XA

The dirty rollback problem above does not occur with XA transactions, because XA is implemented at the database level. When another service tries to modify row A1, the row lock blocks it, exactly as it would with an ordinary local transaction, so no problem arises.

In addition, XA does not suffer from dirty reads, whereas AT does. Consider the following steps under AT:

  1. Global transaction g1 modifies row A1: v1 -> v2
  2. Another service reads row A1 and gets v2
  3. Global transaction g1 rolls back, changing row A1 from v2 back to v1

The value read in step 2 is v2, an intermediate state. Seata's documentation describes ways to avoid this in AT mode, but they involve annotations and SQL rewriting, which is not elegant. In XA mode, xa commit has not yet been executed at step 2, so under MVCC the read still returns v1. XA therefore avoids the dirty-read trouble of AT mode.
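The visibility difference can be demonstrated with two SQLite connections to a temporary file. SQLite has no XA support, so this sketch substitutes an open, uncommitted local transaction for a prepared XA branch. AT's phase-one local commit is modeled by committing immediately. The function and table names are hypothetical, and the sketch models visibility only, not MVCC internals.

```python
import os
import sqlite3
import tempfile


def value_seen_by_other_service(mode):
    """Return what a second connection reads for row A1 while g1 is still undecided.

    mode="AT": the branch's local transaction is committed at the end of phase one.
    mode="XA": the branch stays uncommitted, standing in for a prepared XA branch.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        writer = sqlite3.connect(path, isolation_level=None)  # the service inside g1
        reader = sqlite3.connect(path, isolation_level=None)  # another service
        writer.execute("CREATE TABLE t (id TEXT PRIMARY KEY, v TEXT)")
        writer.execute("INSERT INTO t VALUES ('A1', 'v1')")

        writer.execute("BEGIN")
        writer.execute("UPDATE t SET v = 'v2' WHERE id = 'A1'")  # step 1: v1 -> v2
        if mode == "AT":
            writer.execute("COMMIT")  # AT commits locally in phase one
        seen = reader.execute("SELECT v FROM t WHERE id = 'A1'").fetchall()[0][0]  # step 2
        reader.close()

        # step 3: g1 rolls back
        if mode == "AT":
            writer.execute("UPDATE t SET v = 'v1' WHERE id = 'A1'")  # restore the BeforeImage
        else:
            writer.execute("ROLLBACK")  # discard the uncommitted change
        writer.close()
        return seen
```

Under these assumptions, the "AT" call returns v2, the dirty intermediate value, while the "XA" call returns v1.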

Performance analysis

Judging from the detailed steps above, XA transactions should outperform AT. The analysis is as follows:

In AT mode, the RM side executes the following SQL during the process described above:

  1. Open a transaction
  2. Query the BeforeImage data
  3. Execute the update
  4. Query the AfterImage data
  5. Insert the BeforeImage and AfterImage into the undolog
  6. Commit the transaction
  7. After the global transaction completes, delete the BeforeImage and AfterImage from the undolog

In XA mode, the SQL executed on the RM side is as follows:

  1. xa begin
  2. execute update
  3. xa end
  4. xa prepare
  5. xa commit

Comparing the two, both modes need to open and commit a transaction, so they differ little there. In DML, however, AT executes 3 writes and 2 reads, versus a single update under XA, so the performance gap should be large.
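Tallying the two statement sequences makes the arithmetic explicit. The SQL strings are illustrative only: the table, the columns, and the xid `'g1'` are hypothetical. Transaction-control statements are excluded from the count.

```python
# Illustrative per-branch statement sequences (table, columns and xid are hypothetical).
AT_BRANCH = [
    "BEGIN",
    "SELECT id, balance FROM account WHERE id = 1",            # BeforeImage
    "UPDATE account SET balance = balance + 10 WHERE id = 1",  # business update
    "SELECT id, balance FROM account WHERE id = 1",            # AfterImage
    "INSERT INTO undo_log (xid, before_image, after_image) VALUES (...)",
    "COMMIT",
    "DELETE FROM undo_log WHERE xid = 'g1'",                   # cleanup after phase two
]
XA_BRANCH = [
    "XA START 'g1'",
    "UPDATE account SET balance = balance + 10 WHERE id = 1",
    "XA END 'g1'",
    "XA PREPARE 'g1'",
    "XA COMMIT 'g1'",
]


def count_dml(statements):
    """Return (reads, writes), ignoring transaction-control statements."""
    verbs = [s.split()[0].upper() for s in statements]
    reads = sum(v == "SELECT" for v in verbs)
    writes = sum(v in ("INSERT", "UPDATE", "DELETE") for v in verbs)
    return reads, writes


print("AT:", count_dml(AT_BRANCH))  # AT: (2, 3)
print("XA:", count_dml(XA_BRANCH))  # XA: (0, 1)
```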

From the theoretical analysis above, XA transactions should perform significantly better than AT, and this should hold on PostgreSQL. On MySQL, however, the current version requires the original connection to be disconnected after xa prepare before another connection can commit the transaction. As a result, xa commit incurs the overhead of re-establishing a connection. For the final performance comparison, see the next section.

Performance measurement

Beyond the theoretical analysis above, I also ran performance measurements. For the detailed test procedure and result data, see xa-at bench.

dtm's XA implementation must ensure that XA transactions are cleaned up correctly even in extreme cases. To do so, it inserts a row into the sub-transaction barrier table within the business transaction, so each branch performs one more SQL write than the theoretical analysis above assumes.
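One way to picture such a barrier is as a unique key per (global transaction, branch, operation), written in the same local transaction as the business SQL. This is a simplified sketch with a hypothetical schema and function name, not dtm's actual code or table layout. It shows only where the extra write comes from and how a duplicate request is detected.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:", isolation_level=None)
    # Hypothetical, simplified barrier table keyed by (gid, branch_id, op).
    conn.execute("CREATE TABLE barrier (gid TEXT, branch_id TEXT, op TEXT, PRIMARY KEY (gid, branch_id, op))")
    conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
    conn.execute("INSERT INTO account VALUES (1, 100)")
    return conn


def branch_with_barrier(conn, gid, branch_id, op, business):
    """Run business(conn) in the same local transaction as a barrier insert.

    Returns False (and skips the business SQL) if this gid/branch/op already ran.
    """
    conn.execute("BEGIN")
    try:
        # The extra write: one barrier row per branch operation.
        conn.execute("INSERT INTO barrier VALUES (?, ?, ?)", (gid, branch_id, op))
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")  # duplicate key: this operation was already executed
        return False
    business(conn)  # the business UPDATE commits or rolls back together with the barrier row
    conn.execute("COMMIT")
    return True
```

Because the barrier row and the business change share one transaction, either both are persisted or neither is.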

The final results show that XA outperforms AT. MySQL may improve its XA implementation in the future so that another connection can commit an XA transaction without closing the current connection. If it does, XA performance could improve considerably.

The value of AT

In MySQL 5.6, the XA-related API has a bug. If the current connection is disconnected after xa prepare, the connection's unfinished transaction is automatically rolled back. Because of this bug, MySQL's XA mode cannot guarantee correctness, and various application crashes may lead to data inconsistency. AT therefore has real practical value on MySQL 5.6 and earlier.

In addition, the database platforms of some major companies prohibit XA transactions. In that specific scenario, choosing AT mode is reasonable.

For other scenarios, it is recommended to prioritize XA transactions.

Summary

The author has not read the complete source code of AT mode. The principles described above are based on the relevant materials and on the source code of seata-golang. If anything in this article is inaccurate, I hope readers will point it out.

You are welcome to visit https://github.com/dtm-labs/dtm and give it a star to support us.
