Woodpecker's Miscellany (木鸟杂记)

Large-Scale Data Systems

Step by Step: Dissecting the Hardest Part of Transactions — Isolation

It’s been a long time since my last post. After spending a month on transaction-related materials and sharing, I’m using this article to wrap things up today. Hopefully it can provide some inspiration.

When it comes to database transactions, most people's first reaction is ACID. These properties differ in importance and complexity, and the hardest to understand is Isolation (the I). One important source of confusion is that, historically, the definitions and implementations of isolation levels have been tightly coupled, and different vendors' terminology and actual implementations often fail to line up. This article sorts out the common isolation levels from the perspective of locks, using deliberately loose descriptions to build an intuitive, perceptual understanding.

Author: Woodpecker’s Miscellany https://www.qtmuniao.com/2022/07/07/db-isolation Please credit the source when reprinting

Background

Databases use transaction isolation to offer applications an isolation guarantee, thereby reducing programming complexity on the application side. The strongest isolation level, Serializability, guarantees that at any moment the application may assume only one transaction is running.

When first learning, the progressive relationship among the isolation levels is often hard to grasp because a suitable vantage point hasn't been found (to repeat an old point: there are no concepts that can't be understood, only wrong ways of approaching them). My experience is that working through the isolation levels from an implementation perspective builds a consistent understanding.

The four isolation levels defined by ANSI SQL, from weakest to strongest, are: Read Uncommitted, Read Committed, Repeatable Read, and Serializable. Their progressive relationship can be understood from the perspective of implementing transactions using locks.

Through the Lens of Locks

The strongest isolation level — Serializability — can be understood as a single global mutex lock: each transaction acquires the lock at startup and releases it at the end (commit or rollback). But this isolation level undoubtedly has the worst performance. The other weaker isolation levels can be understood as improving performance by reducing lock granularity (predicate lock → object lock), shortening lock duration (released after transaction ends → released immediately after use), and lowering lock strength (mutex lock → shared lock), thereby sacrificing consistency for performance.
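
This single-global-mutex intuition can be sketched in a few lines of Python. This is a toy model, not how any real database is implemented; `GLOBAL_LOCK`, `db`, and `run_transaction` are all invented for illustration:

```python
import threading

# A toy model of Serializability as one global mutex: every transaction
# acquires the lock at start and releases it at the end, so at any moment
# at most one transaction is running.
GLOBAL_LOCK = threading.Lock()
db = {"x": 4, "y": 4}

def run_transaction(writes):
    """writes: a list of (key, value) pairs applied atomically."""
    with GLOBAL_LOCK:            # begin: acquire the one global lock
        for key, value in writes:
            db[key] = value      # all reads/writes happen under the lock
        # end of `with`: commit, release the lock
```

Since each transaction holds the one lock for its entire lifetime, any thread interleaving collapses to some serial order, at the cost of all concurrency.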

From another perspective, locks vary along three dimensions:

  1. Strength: Mutex Locks (also known as write locks or exclusive locks) vs. Shared Locks (also known as read locks).
  2. Duration: Long Period Locks (acquired at transaction start, released at transaction end) vs. Short Period Locks (acquired when accessing data, released immediately after use).
  3. Granularity: Object Locks (a Row Lock in a relational database, locking a single row) vs. Predicate Locks (locking the set of data matching a condition).

KV Model

Speaking of data sets: at the storage layer, databases can be viewed through a KV model. For example, Pages in B+ trees and Blocks in LSM-Trees are both sets of KV entries. In a relational database stored row by row, the Key of a single KV entry is usually the primary key, and the Value is a row of data. Therefore, in what follows, the data a transaction touches can be understood as:

  1. Single object. Can be understood as a single KV entry.
  2. A set of objects. For example, the expression where x > 5 and y < 6 will determine a subset of KV entries.

Therefore, the data granularity discussed below is all based on the KV model.
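
To make the two granularities concrete, here is a hypothetical row store viewed as KV entries; the store contents and the predicate are invented for illustration:

```python
# A toy row store viewed as KV entries: key = primary key, value = the row.
store = {
    1: {"x": 4, "y": 7},
    2: {"x": 9, "y": 2},
    3: {"x": 6, "y": 5},
}

# 1. A single object is addressed by its key:
row = store[1]

# 2. A predicate determines a subset of KV entries:
predicate = lambda r: r["x"] > 5 and r["y"] < 6
matching_keys = sorted(k for k, r in store.items() if predicate(r))
# matching_keys == [2, 3]
```

An object lock covers one key; a predicate lock would have to cover every row, present or future, that satisfies `predicate`.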

Weak Isolation Levels and Corresponding Problems

The isolation level with the best performance is using no locks at all. However, this leads to Dirty Read (one transaction reads another transaction’s uncommitted changes) and Dirty Write (one transaction overwrites another transaction’s uncommitted changes). What consequences can these two phenomena cause? Here’s an example:

Dirty Read:
Initial x=4
Txn1: --W(x=5)--->rollback
Txn2: ----->R(x=5)--->commit

Txn2 incorrectly read x=5, but this value should never have existed. The reason is that Txn2 read an uncommitted value from Txn1.

Dirty Write:
Initial x=4, y=4
Txn1: --W(x=5)--------W(y=5)--->commit
Txn2: ----W(x=6)--W(y=6)->commit

Final result is x=6, y=5. But if the two transactions were executed sequentially, the correct result should be that both x and y are either 5 or 6. The cause of this inconsistency is that Txn1's uncommitted change x=5 was overwritten, and Txn2's uncommitted change y=6 was also overwritten.
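
The interleaving above can be replayed deterministically on a lock-free store. This is a toy sketch: the `schedule` list simply encodes the timeline from the diagram:

```python
# Replaying the dirty-write interleaving on a lock-free store: each step
# is (transaction, key, value), and uncommitted writes overwrite freely.
db = {"x": 4, "y": 4}
schedule = [
    ("Txn1", "x", 5),   # Txn1: W(x=5)
    ("Txn2", "x", 6),   # Txn2: W(x=6)  overwrites Txn1's uncommitted x
    ("Txn2", "y", 6),   # Txn2: W(y=6)
    ("Txn1", "y", 5),   # Txn1: W(y=5)  overwrites Txn2's uncommitted y
]
for txn, key, value in schedule:
    db[key] = value
# db ends as {"x": 6, "y": 5}, which neither serial order could produce.
```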

To avoid dirty writes, we can add long-period write locks to objects to be modified, but do not lock when reading data. This isolation level is called Read Uncommitted (RU).
But dirty reads can still occur at this point:

Dirty Read:
Initial x = 4
Txn1: ----WL(x),W(x=5)-->rollback, UL(x)
Txn2: ---------------->R(x=5)--->commit

Note: RL: Read Lock; WL: Write Lock; UL: Unlock

To avoid dirty reads, we can add short-period read locks to objects being read. This isolation level is Read Committed (RC).
However, because these are short-period read locks, if a transaction reads x twice, and in between, another transaction that modifies x commits, the two reads will be inconsistent. This problem is called Non-Repeatable Read.

Initial x = 4
Txn1: -RL(x),R(x=4),UL(x)-------------------------->RL(x),R(x=5),UL(x)--->commit
Txn2: -------------------WL(x),W(x=5)-->commit,UL(x)
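
The timeline can be replayed as straight-line code. A toy sketch: the lock and unlock steps from the diagram appear only as comments, since short-period locks leave nothing held between the two reads:

```python
# Short-period read locks are released right after each read, so a writer
# can slip in and commit between Txn1's two reads of x.
db = {"x": 4}
first_read = db["x"]    # Txn1: RL(x), R(x=4), UL(x)
db["x"] = 5             # Txn2: WL(x), W(x=5), commit, UL(x)
second_read = db["x"]   # Txn1: RL(x), R(x=5), UL(x)
# first_read == 4 but second_read == 5: a non-repeatable read.
```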

To solve this problem, long-period read locks are also needed when reading data. The isolation level that eliminates non-repeatable reads is called Repeatable Read (RR).

Up to the Repeatable Read level, locks are applied to individual data items. Under this isolation level, if a transaction performs the same range query twice, say executing select count(x) where x > 5 twice, and another transaction inserts a row with x = 6 and commits between the two queries, the two results will differ. This phenomenon is called Phantom Read. To solve it, range query statements must take long-period predicate read locks. The isolation level that eliminates phantom reads is called Serializable.
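
The phantom can be replayed on a toy table as well; the rows and the predicate x > 5 are invented for illustration:

```python
# Row locks cannot cover rows that don't exist yet: Txn1's two range
# queries bracket Txn2's insert of a newly matching row.
table = [{"x": 7}, {"x": 3}]
count1 = sum(1 for row in table if row["x"] > 5)   # Txn1: first query
table.append({"x": 6})                             # Txn2: insert + commit
count2 = sum(1 for row in table if row["x"] > 5)   # Txn1: second query
# count1 == 1 but count2 == 2: a phantom.  A predicate lock held by Txn1
# on the condition x > 5 would have blocked the insert.
```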

It can be seen that at the Serializable isolation level, both read locks and write locks are held until the transaction ends. This locking approach is called Two-Phase Locking (2PL).

Two-phase locking has two phases: in the first phase (during transaction execution), locks may only be acquired; in the second phase (at transaction end), locks may only be released. Strictly speaking, releasing everything at commit as described here is Strong Strict Two-Phase Locking (SS2PL); interested readers can look up how it differs from plain 2PL.
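
Here is a minimal sketch of the strict variant in Python, under heavy simplifications: exclusive locks only, no shared locks, no deadlock detection, and the `Transaction` class and its methods are invented for illustration:

```python
import threading
from collections import defaultdict

class Transaction:
    """Minimal sketch of strict two-phase locking: the growing phase only
    acquires locks; all locks are released together at commit."""
    _locks = defaultdict(threading.Lock)   # one exclusive lock per key

    def __init__(self, db):
        self.db = db
        self.held = []

    def _lock(self, key):
        lock = Transaction._locks[key]
        if lock not in self.held:
            lock.acquire()                 # growing phase: acquire only
            self.held.append(lock)

    def read(self, key):
        self._lock(key)
        return self.db.get(key)

    def write(self, key, value):
        self._lock(key)
        self.db[key] = value

    def commit(self):
        for lock in self.held:             # shrinking phase: release only
            lock.release()
        self.held.clear()
```

Every lock is taken while the transaction runs and released together in `commit()`, matching the description above.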

Generalizing

From a more abstract perspective, each transaction during execution needs to touch a subset of data objects; for this data subset, it may read or write. If two concurrent transactions touch disjoint data subsets, no special handling is needed, and they can execute concurrently safely.

If the data subsets intersect, dependency relationships form between the transactions. Abstracting transactions as nodes and dependencies, ordered in time, as directed edges, we obtain a dependency graph; the execution is serializable precisely when this graph is acyclic, i.e., a Directed Acyclic Graph (DAG).

The ideal abstraction that transactions provide externally is: all transactions can collapse into a single point on the timeline (completed instantaneously, i.e., A in ACID, Atomicity). In this way, all transactions can be topologically sorted on the time axis, achieving serializability.

But in actual execution, transactions take some time to complete, i.e., a time segment. Transactions whose execution times overlap have various concurrency and isolation (or visibility) issues. So how do we make physically concurrent transactions look logically sequential and atomic? The answer is: just maintain certain invariants before and after transaction execution.

These invariants are the C in ACID, Consistency. From the application layer perspective, this can also be called Causality. That is, the dependencies between transactions that have intersecting data must be handled correctly. For example, if Transaction 1 reads some content to make a decision and then modifies data, the content it depends on cannot be modified by concurrent Transaction 2, otherwise the decision would be flawed. A more practical example can be found in the doctor on-call problem in Chapter 7 of DDIA.

We typically use two approaches to maintain these invariants:

  1. Pessimistic approach. Locking, so that the same data subset cannot be accessed by multiple transactions simultaneously.
  2. Optimistic approach. MVCC, where each data object stores multiple versions, and each version is immutable; modifying an object appends a new version. Each transaction can read and write confidently, and retry upon detecting conflicts at commit time.
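
The multi-version idea can be sketched as a toy store in which a logical clock hands out timestamps, every write appends an immutable version, and a reader sees the newest version no later than its snapshot. Commit-time conflict detection is omitted, and `MVCCStore` and its methods are invented for illustration:

```python
import itertools

class MVCCStore:
    """Toy multi-version store: writes append immutable (ts, value)
    versions; a reader sees the newest version <= its snapshot ts."""
    def __init__(self):
        self.versions = {}               # key -> [(ts, value), ...]
        self.clock = itertools.count(1)  # logical timestamp source

    def snapshot(self):
        return next(self.clock)          # freeze a point in logical time

    def write(self, key, value):
        ts = next(self.clock)
        self.versions.setdefault(key, []).append((ts, value))

    def read(self, key, snapshot_ts):
        visible = [(ts, v) for ts, v in self.versions.get(key, [])
                   if ts <= snapshot_ts]
        return visible[-1][1] if visible else None
```

A snapshot taken before a later write keeps reading the old value, which is how readers avoid blocking writers.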

Many details are omitted here. Those interested can check out the video I recorded.

The Last Thing

The four isolation levels above leave out another common one, Snapshot Isolation, because it belongs to the optimistic implementation family mentioned above, MVCC. Since they come from different implementation philosophies, Snapshot Isolation and Repeatable Read only have a partial order on the spectrum of isolation strength: neither is strictly stronger than the other. I'll expand on this another time.

This article originated from a summary of my sharing on Chapter 7 of DDIA this month. Due to space constraints, many concepts are not elaborated in depth. Those interested can take a look at my reading notes, which might as well be called a translation given how long they run; after all, I recorded three videos in total (accessible via the original post).


I am Qingteng Muniao (青藤木鸟), a programmer who loves photography and focuses on large-scale data systems. Welcome to follow my WeChat public account "木鸟杂记" for more articles on distributed systems, storage, and databases. After following, reply "资料" (materials) to get a set of distributed-database learning materials I compiled, or reply "优惠券" (coupon) for a 20%-off coupon for my paid column on large-scale data systems, 《系统日知录》.

We also have chat groups on distributed systems and databases; add my WeChat ID qtmuniao and I'll invite you in. When adding me, please note "分布式系统群" (distributed systems group). If you'd rather not join a group, there is also a forum on distributed systems and databases (click here); feel free to drop by.
