Shadowing Practice: Distributed Transactions Explained: 2 Phase Commit vs Saga Pattern - Learn English Speaking with YouTube

When you're first building an application, transactions are easy.

You have a database, and when a customer places an order,

you wrap the whole thing in the transaction.

Charge their card, reserve the inventory,

record a ledger entry for accounting.

If any of those writes fails,

the database rolls everything back automatically,

and it's just like nothing happened.

You probably don't even think about it much.

Your database gives you what are called ACID guarantees,

which basically means two things that matter here.

First, atomicity.

Either all three of those writes happen together,

or none of them do.

There's no world where the card gets charged,

but the inventory doesn't get reserved.

Second is isolation.

While that transaction is in progress,

no other part of your system can see the half-finished state.

Another query checking the customer's balance won't see a charge for an order that hasn't fully processed yet.

The database handles all of this behind the scenes,

and you just write your SQL and move on.
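
To make that concrete, here is a minimal sketch of the single-database case using Python's built-in sqlite3 module. The table names (charges, inventory, ledger) and the place_order helper are assumptions made up for illustration, not something from the talk. It shows all three writes inside one local transaction, so a failure in the middle undoes the charge as well.

```python
import sqlite3


def make_db():
    """Create an in-memory database with hypothetical tables for the example."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE charges (customer_id TEXT, amount INTEGER);
        CREATE TABLE inventory (item_id TEXT PRIMARY KEY, stock INTEGER);
        CREATE TABLE ledger (customer_id TEXT, amount INTEGER);
        INSERT INTO inventory VALUES ('widget', 1);
    """)
    return conn


def place_order(conn, customer_id, item_id, amount):
    """Run all three writes in one local transaction."""
    # "with conn" commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute(
            "INSERT INTO charges (customer_id, amount) VALUES (?, ?)",
            (customer_id, amount),
        )
        cur = conn.execute(
            "UPDATE inventory SET stock = stock - 1 "
            "WHERE item_id = ? AND stock > 0",
            (item_id,),
        )
        if cur.rowcount == 0:
            # Raising here rolls back the charge inserted above.
            raise RuntimeError("out of stock")
        conn.execute(
            "INSERT INTO ledger (customer_id, amount) VALUES (?, ?)",
            (customer_id, amount),
        )
```

Calling place_order twice against a stock of one succeeds the first time and raises the second time, and after the failed call the charges table still holds only the first order's row.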

But when your application grows,

you start to get more traffic, more data, more writes.

And eventually, that single database starts hitting its limits.

So you do what everyone does at this point.

You split things up.

Maybe you shard the database to spread write load across multiple machines,

or maybe you break up your monolith into microservices,

where each service now owns its own database.

The specifics can vary, but the result is the same.

Your data now lives on multiple independent machines instead of one.

And at this point, everything changes.

The payment flow that used to be one transaction against one database

is now three completely separate operations against three separate databases on three separate machines.

The card gets charged in the payment database,

inventory gets reserved in the inventory database,

and that ledger entry gets recorded in the accounting database.

You can't wrap a transaction across these independent databases because they don't know about each other.

So if the card charge commits,

but then the inventory reservation fails because the item is out of stock,

there's no database-level rollback that can undo that charge.

It's already committed in a completely different database on a completely different machine.

When you're processing thousands of transactions a second across distributed infrastructure,

partial failures like these aren't edge cases,

they actually become pretty routine.

Now this whole class of problem is what's called a distributed transaction.

A single logical operation that needs to span multiple independent databases or services,

where all the steps need to either succeed together or be cleaned up when something goes wrong.

The textbooks give us two approaches to distributed transactions,

two-phase commit and the saga pattern.

In practice, the industry has overwhelmingly chosen one over the other,

and understanding why will save you a lot of pain.

Two-phase commit is the classic academic solution to distributed transactions.

The idea is to introduce a new component called a coordinator whose entire job is to make sure

that all participants in a transaction agree on the outcome before any make their changes permanent.

It works in two phases,

which is where the name comes from of course.

In the first phase, called the prepare phase,

the coordinator sends a message to every participant asking,

can you commit this transaction?

Each participating database then does the actual work.

It processes the request, durably records the changes so that nothing is lost if it crashes,

and locks the affected rows so that no other transactions can modify them in the meantime.

Then it responds to the coordinator with either yes,

I'm ready to commit, or no, something went wrong.

If any single participant votes no,

the coordinator tells everyone to abort and release their locks.

If every participant votes yes on the other hand,

then the coordinator moves to phase 2.

It sends a commit message to everyone.

Each participant makes its changes permanent and releases those locks,

and the transaction is now complete.
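
The two phases described above can be sketched in a few lines of Python. This is an in-memory toy under stated assumptions: the Participant class, its can_commit switch, and the two_phase_commit function are hypothetical names invented for illustration, and there is no network, durable log, or real locking involved.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    name: str
    can_commit: bool = True   # hypothetical switch to simulate a "no" vote
    state: str = "idle"       # idle -> prepared -> committed, or aborted

    def prepare(self) -> bool:
        # A real participant would durably record the change and lock rows here.
        if self.can_commit:
            self.state = "prepared"
            return True
        self.state = "aborted"
        return False

    def commit(self) -> None:
        # Make the prepared change permanent and release the locks.
        self.state = "committed"

    def abort(self) -> None:
        # Discard the prepared change and release the locks.
        self.state = "aborted"


def two_phase_commit(participants) -> bool:
    # Phase 1 (prepare): ask every participant to vote.
    votes = [p.prepare() for p in participants]

    # Phase 2: commit only if every vote was yes, otherwise abort everyone.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False
```

With three participants that all vote yes, the function returns True and every participant ends in the committed state. Flip can_commit to False on any one of them and all three end up aborted.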

What this gives you is strong consistency,

the same guarantee you had with a single database.

Every participant agrees on the outcome before anything is finalized,

so there's no window where the system is in a partial or inconsistent state.

On paper, this is exactly what you want,

but the problems show up when you try to run this in production.

The fundamental problem with two-phase commit or 2PC is that it's a blocking protocol,

and blocking in a distributed system is dangerous because you're now dependent on multiple machines all staying healthy at the same time.

I want you to picture this.

The coordinator collects all three yes votes from the participants,

but then it crashes.

Right there after collecting the votes,

but before it gets a chance to send the commit decision.

Now the participants are all stuck.

They're all sitting there with locks held on the rows they prepared and they have no idea what to do next.

They can't go ahead and commit on their own,

because maybe the coordinator was about to tell them to abort.

They can't abort on their own either,

because maybe the coordinator was about to tell them to commit and the other participants already went through with it.

So they just wait.

And every other transaction in your system that needs to touch any of those locked rows is now blocked too,

waiting for the locks that nobody can release.

Crashes aren't even the only problem with 2PC.

A single slow participant holds up the entire transaction.

So if the ledger service,

for example, were to take 10 seconds to respond to the prepare message,

the card service

and the inventory service are both just sitting there with their locks held for those full 10 seconds doing nothing.

That means the entire system moves at the speed of the slowest participant.

And if a network partition means the coordinator can't reach a participant at all,

there's no safe default.

It can't tell whether the message got through or not.

This is why almost nobody uses two-phase commit across services in production.

Pat Helland wrote a really influential paper called Life Beyond Distributed Transactions.

In that paper, he argues exactly this point.

Distributed transactions across autonomous services don't work at internet scale.

The industry took this lesson to heart.

2PC does exist in production,

but only inside distributed databases like Google Spanner or YugabyteDB,

where the coordinator and the participants are tightly coupled within the same system.

This is where the database handles the complexity internally so that you as the caller don't have to.

But across independent services with different deployment schedules and different failure characteristics,

that's where it all falls apart.

So what should you do instead?

Well, when companies need to coordinate work across multiple services,

the saga pattern is what they reach for.

Uber, Netflix, Amazon, DoorDash, they all use this pattern in production.

Sagas start from a very different assumption than 2PC.

You don't actually need all-or-nothing atomicity spanning multiple services.

You just need a way to eventually get to a consistent state,

even when things go wrong along the way.

Instead of coordinating one big distributed transaction with locks held across services,

you break the work into a chain of independent local transactions.

So each service does its piece of work and commits to its own database on its own terms.

When something fails further down the chain,

there's no way to roll back the earlier steps, since they've already been committed to their own separate databases.

So instead, you run what is called a compensating action.

These are business-level undos that reverse the effects of what already happened.

So a refund instead of a rollback,

a cancellation instead of an abort.

Something needs to detect that failure and trigger those compensations,

and how that works is a key design decision we'll get into in just a moment.

But the trade-off is that instead of getting the strong consistency you get with 2PC,

sagas give you what's called eventual consistency.

The system might be temporarily in an inconsistent state while compensations are running.

A customer might briefly see a charge on their card before the refund goes through,

but it always converges to a correct state and nothing is blocked while that convergence is happening.

Other transactions can keep flowing normally that entire time.

Now there are two ways to implement sagas,

and the choice between them determines who is responsible for detecting failures and running compensations.

The first approach is called choreography,

and it's the decentralized option.

It uses a publish-subscribe pattern where each service broadcasts an event when it finishes its work,

and any interested service can pick it up and react.

So the card service charges the card and then publishes a card charged event.

The inventory service is listening for that event,

so when it arrives, it reserves the stock and publishes an inventory reserved event.

The ledger service picks that up and then records the entry.

If something fails, the failing service publishes a failure event,

and the upstream services react by running their own compensations.

This works well when you have a simple flow with just two or three steps.

But once you get to five or six services all publishing and reacting to each other's events,

figuring out the current state of any given transaction becomes really difficult.

Where exactly did it fail?

Which compensating actions have already run?

Did the refund actually go through?

Without a central place tracking all of this,

you end up digging through logs across a dozen different services,

trying to piece together what happened.

The second approach is called orchestration,

and it's what most teams end up using once they reach any serious scale.

Instead of services reacting to each other's events,

you have a dedicated orchestrator service that controls the entire flow.

It tells each service what to do one step at a time.

Card service, charge the card.

It waits for confirmation.

Inventory service, reserve the stock.

Wait for confirmation.

If something fails, the orchestrator knows exactly which step failed and can run the right compensating actions in the right order.
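
As a sketch of what that orchestrator loop might look like, the function below runs steps in order and, on the first failure, runs the compensations for the already-completed steps in reverse. The run_saga name and the (name, action, compensation) tuple shape are assumptions for illustration, and a real orchestrator would call remote services and persist its progress rather than run in-process.

```python
def run_saga(steps, ctx):
    """Run steps in order; on failure, compensate completed steps in reverse.

    steps: list of (name, action, compensation) tuples, where action and
    compensation are callables taking ctx. Returns (ok, history).
    """
    completed = []  # (name, compensation) for steps that finished
    history = []    # central record of what happened, in order

    for name, action, compensate in steps:
        try:
            action(ctx)
        except Exception:
            history.append(f"{name}:failed")
            # Undo earlier steps, most recent first.
            for done_name, done_compensate in reversed(completed):
                done_compensate(ctx)
                history.append(f"{done_name}:compensated")
            return False, history
        completed.append((name, compensate))
        history.append(f"{name}:done")

    return True, history
```

Because the history lives in one place, the questions that were hard to answer under choreography (where did it fail, which compensations ran) can be read straight off the returned list.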

Tools like Temporal, which was created by the engineers behind Uber's Cadence workflow engine,

or AWS Step Functions, are purpose-built for exactly this kind of orchestration.

It's what we use here at Hello Interview to coordinate our payment and fulfillment flows,

and it's the pattern we'd recommend that most teams use.

The important difference between a saga orchestrator and a 2PC coordinator is what happens when it crashes.

The orchestrator doesn't leave locks dangling across your system.

It's durable.

When it restarts, it reads its own state from a database and it picks up exactly where it left off.

So no rows are locked in the meantime and no other transactions are blocked during that recovery period.
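
A minimal sketch of that durability idea, assuming the orchestrator records the index of the next step in a small SQLite table after each step finishes. The saga_state table, the STEPS list, and the crash_before parameter (used only to simulate a crash) are all hypothetical. Tools like Temporal handle this for you in a far more complete way.

```python
import sqlite3

# Hypothetical step names for the order flow.
STEPS = ["charge_card", "reserve_inventory", "record_ledger"]


def init_store(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS saga_state ("
        "saga_id TEXT PRIMARY KEY, next_step INTEGER NOT NULL)"
    )
    conn.commit()


def run(conn, saga_id, execute_step, crash_before=None):
    """Run (or resume) a saga, persisting progress after every step."""
    row = conn.execute(
        "SELECT next_step FROM saga_state WHERE saga_id = ?", (saga_id,)
    ).fetchone()
    next_step = row[0] if row else 0  # resume point, or start from scratch

    for i in range(next_step, len(STEPS)):
        if crash_before == i:
            raise RuntimeError("simulated orchestrator crash")
        execute_step(STEPS[i])
        # Record progress durably. A crash between the step and this write
        # would re-run the step on restart, so steps should be idempotent.
        with conn:
            conn.execute(
                "INSERT OR REPLACE INTO saga_state VALUES (?, ?)",
                (saga_id, i + 1),
            )
```

If run is interrupted before the second step and then called again with the same saga_id, it skips the card charge and continues with the inventory reservation. Nothing is held locked while the orchestrator is down.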

Sagas solve the blocking problem that makes 2PC impractical,

but they introduce a different kind of complexity:

the compensating actions themselves.

The idea of just undo the previous step sounds clean,

but in practice, it gets pretty messy quickly.

Say the card charge went through and committed,

and then the inventory reservation fails because the item is out of stock.

The compensation is to issue a refund on the card,

but unlike a database rollback,

that refund is visible to the customer.

They see an actual charge show up on their card,

and then a few seconds later, they see a refund.

Their bank might even send them a push notification for each one.

It works correctly, but it's not the invisible cleanup that a database rollback gives you.

And some actions are genuinely hard to undo at all.

If one of the steps in your flow is to send a confirmation email,

you can't unsend that email.

If you fired a webhook to a third-party payment system,

you can send a follow-up cancellation,

but you can't guarantee that they'll process it in time or at all.

Each step in your saga needs a well-defined compensating action,

and some of those compensations are inherently imperfect.

On top of that, compensating actions can themselves fail.

What if the refund API is down when you need to issue that refund?

Now you need retry logic for your compensations.

And if you're retrying a refund,

you need the operation to be idempotent,

meaning it produces the same result whether you run it once or 10 times,

so that a retry doesn't accidentally refund the customer twice.

You end up needing the same level of reliability engineering for your failure handling as you do for your happy paths.
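
One common way to get that idempotency is to have the caller send a unique key with each logical refund, and have the refund service remember which keys it has already processed. The sketch below assumes a made-up in-memory RefundService and a refund_with_retry helper. Real payment APIs differ in the details, so treat the names and shapes here as illustrative only.

```python
class RefundService:
    """Hypothetical in-memory refund service keyed by an idempotency key."""

    def __init__(self):
        self.processed = {}      # idempotency_key -> refund record
        self.total_refunded = 0

    def refund(self, idempotency_key, amount):
        # A repeated key returns the original result without refunding again.
        if idempotency_key in self.processed:
            return self.processed[idempotency_key]
        record = {"key": idempotency_key, "amount": amount}
        self.total_refunded += amount
        self.processed[idempotency_key] = record
        return record


def refund_with_retry(service, idempotency_key, amount, attempts=3):
    """Retry on connection errors, always reusing the same idempotency key."""
    last_error = None
    for _ in range(attempts):
        try:
            return service.refund(idempotency_key, amount)
        except ConnectionError as exc:
            last_error = exc
    raise last_error
```

The key has to be derived from the logical operation (for example the order being refunded), not generated fresh on each attempt. Otherwise every retry looks like a new refund.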

Even with solid compensation logic in place,

there's one more failure mode that catches teams off guard.

When your card service finishes charging the card,

it needs to do two things.

Save the result to its own database

and publish an event to a message broker so that the next service in the chain knows it's time to proceed.

The problem is that those are two completely separate writes to two completely separate systems.

This is called the dual write problem.

If the database write succeeds but the event publish fails,

the next step in your saga never gets triggered and the whole flow stalls.

If the event publishes successfully but the database write fails,

now downstream services are reacting to something that didn't actually happen in your database.

You can fix this with something called the transactional outbox pattern.

Instead of writing to your database and publishing an event as two separate operations,

you write both your data and the outgoing event into the same database via a single local transaction.

The event goes into a special outbox table right alongside your regular data write,

so they either both commit or neither does.

Then, a separate background process watches the outbox table and publishes those events to your message broker.

That background process can use change data capture,

which means it tails the database's own transaction logs to pick up new entries,

or it can simply poll the outbox table on a regular interval.
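
Here is a small sketch of the polling variant of the outbox, using sqlite3 again. The charges and outbox table layouts, the charge_card and relay_once functions, and the publish callback are assumptions for illustration. In a real system publish would hand the event to your message broker.

```python
import json
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE charges (order_id TEXT PRIMARY KEY, amount INTEGER);
        CREATE TABLE outbox (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            event_type TEXT NOT NULL,
            payload TEXT NOT NULL,
            published INTEGER NOT NULL DEFAULT 0
        );
    """)
    return conn


def charge_card(conn, order_id, amount):
    """Write the business row and the outgoing event in ONE local transaction."""
    with conn:  # both inserts commit together or roll back together
        conn.execute("INSERT INTO charges VALUES (?, ?)", (order_id, amount))
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("card_charged", json.dumps({"order_id": order_id, "amount": amount})),
        )


def relay_once(conn, publish):
    """One polling pass: publish unpublished events, then mark them as sent."""
    rows = conn.execute(
        "SELECT id, event_type, payload FROM outbox "
        "WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, event_type, payload in rows:
        publish(event_type, json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

Note that this gives at-least-once delivery. If the relay crashes after publishing but before marking the row, the event goes out again on the next pass, which is one more reason downstream consumers need to be idempotent.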

Before you reach for either of these patterns,

the first question to ask yourself is whether you actually need a distributed transaction at all.

If you can design your service boundaries so that the data that transacts together lives in the same database, do that.

This is easier to get right up front than to try to retrofit later,

so it's worth thinking about early.

For example, move the inventory and ledger tables into the same database that has the payments table,

if they always need to be updated together.

A local database transaction is simpler,

faster, and more reliable than any distributed alternative,

and this is always the best answer when you can make it work.

If you genuinely can't avoid distributing the transaction across services,

you're going to use a saga.

That's not really a debate in the industry anymore.

The question is which flavor of saga makes sense for your situation.

Choreography is usually where teams start,

and for simple flows it works great.

If you have three or four steps,

the services are truly independent,

and you don't need centralized visibility into where each transaction stands,

choreography keeps things simple.

Something like an e-commerce notification system where an order-placed

event triggers an email or push notification independently,

this is a natural fit.

Most teams eventually outgrow it as their flows get more complex,

but there's no reason to over-engineer from day one.

For anything more complex than that,

orchestration is the way to go.

Complex flows with branching logic,

flows where you need to see exactly where a transaction is stuck,

flows where the compensation logic is tricky

and you want it defined in one clear place rather than scattered across half a dozen services.

Most teams end up here,

and tools like Temporal or AWS Step Functions make it very practical to implement.

One last note, if eventual consistency truly is not acceptable for a particular piece of your system,

consider whether that data can live in a single distributed database like Spanner

or YugabyteDB that handles that strong consistency internally for you.

That's a very different thing from trying to build 2PC yourself across independent services.

At the end of the day,

the pattern that you'll see at most companies operating at scale is saga with orchestration,

idempotent operations at every step so that retries are always safe,

and a transactional outbox to make sure events are as reliable as database writes.

It means accepting eventual consistency,

but that's a trade-off the industry has made deliberately.

And it's the architecture that Uber,

Netflix, and Amazon actually run in production today.

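What Is Shadowing? Why It Is Effective for Improving Your English

Shadowing is a language-learning method originally developed in professional interpreter training programs and widely popularized by Dr. Alexander Arguelles, who is known as a polyglot. The method is simple but highly effective: while listening to a native speaker's English, you repeat it aloud right away with a 1 to 2 second delay, following the speaker like a shadow. Unlike grammar drills or passive listening, shadowing forces your brain and mouth muscles to process and reproduce English simultaneously in real time. Research has confirmed substantial improvements in pronunciation accuracy, intonation, rhythm, linking, listening ability, and conversational fluency. It is especially recommended for anyone preparing for IELTS Speaking or aiming for natural English communication.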
