- Can your application change from pessimistic to optimistic concurrency control? Optimistic can be a problem when concurrent transactions attempt to modify the same row. If this is frequent then rollbacks will be frequent. Pessimistic handles this case by making transactions wait. As long as transactions are short then commits will eventually get done for all transactions. I am curious about techniques used with optimistic concurrency control to avoid pushing the problem back to the application programmers.
- Can your application tolerate higher latency on commit? Sync replication requires at least one round trip between replicas. Will the replicas be far apart, as in 100 milliseconds apart? Or are they in the same datacenter or geographic region?
- Are you willing to operate a system that is more complex in the steady state (sync replication) or only complex when things fail (async replication)?
Thursday, September 27, 2012
All things being equal, I prefer to avoid failover
But all things are not equal. A solution that avoids failover because synchronous replication is done between multiple masters can be awesome, but it also behaves differently and the awesomeness comes at a cost. For some users the benefit far outweighs the cost and Galera is amazing. I can't wait to read more about production deployments of it.
Subscribe to:
Post Comments (Atom)


Hi Mark, firstly, thanks your great blog.
ReplyDeleteWe plan to migrate to a Galera based production environment in the next month where two Apache/PHP/HAProxy servers will be load balancing schema/read/write database requests across 3 Percona XtraDB Cluster servers.
This is a huge environment change for us coming from several fully self contained servers and migrating to a load balanced and clustered network arrangement in an effort to achieve our high availability goals.
I must have built and killed 300+ clustered database servers over the past fortnight to understand how it all works and interconnects. We only needed to add/change about 20 lines of code in our application to manage rollbacks to handle issues caused by optimistic locking.
We're lucky in the fact that performance is not a massive issue for us as we have low traffic (serving a business application that is not publicly available). We also have the benefit of low latency with all of our servers residing in the same data centre.
Galera clustering is definitely complex. I wholeheartedly agree with you there. However I think its complexity lies in the fact that there not a lot of people currently using this stuff in production and so the collective knowledge and documentation is left a little lacking.
It's taken a long time to build all of scripts we need to automatically install, configure and tune these clustered servers. We hit soooo many issues that were not well documented (the issue or the workarounds). But having been through the process, and learning what works in our environment and what doesn't, it feels nowhere near as complex as when we started out. I plan to write a blog entry on our findings and setup techniques when I get an opportunity to share some of our new knowledge.
Personally I'd much rather complexity in a steady state system. When things fail, time is critical, and consequences are exponential, so people naturally go into emergency mode and clear thinking is often in short supply. Simplicity is crucial at these times.
We look forward to taking Galera clustering into production and no doubt will have many more lessons to learn during the process.
Keep up the good work on your blog :)
Thank you for detailed reply. One thing that makes MySQL great is that we gets many great posts from users. We need more of that from Galera users.
ReplyDeleteI read about the Paxos/Chubby/Megastore work from Google. So my limited understanding of Galera is usually mapped back to what I know of them and I am sure I know too little about Galera. Perhaps Henrik will switch from telling us that we should use it to describing how it behaves. He is a great writer -- http://openlife.cc.
The next step in the non-failover direction is to extend optimistic concurrency control to full-on async replication. I'm still hoping to show some progress on that problem with Tungsten in the near future. However, the more I work on replication the more respect I have for the problems. The Galera team deserve a lot of respect for their doggedness in getting synchronous multi-master to work.
ReplyDeleteMark,
ReplyDeleteCan you say 3-5 most important points you would like more about Galera? I will make it covered.
#1 - no failover required
ReplyDelete#2 - works with InnoDB
#3 - open source
Thanks for the compliment.
ReplyDeleteUnfortunately my work role currently is such that I'm not often involved in those pesky little details about running Galera in production - this explains the lack of blogging about such things. I think the best source for more detailed Galera knowledge currently are presentations that I post on my blog openlife.cc. There is one video from Sheeri, others are just slides. Also the webinars on Percona.TV are good - in fact I learned a few things myself from there.
When it comes to the problem of applications not being prepared to live with optimistic locking, I continue to be surprised at the lack of reported problems in this area (again, not a very good topic to blog about...) I would expect most MySQL based apps to have a problem here, but both in my own experience and reading the Codership mailing list, and talking to the Codership guys themselves, this is not a problem that anyone seems to be having. Maybe it's just that people don't notice it if one in ten thousand transactions mysteriously fail?
Your response is too vague for me. The docs I found are sparse. Maybe you could help them in that area. I think that the types of questions I have will be shared with many potential Galera customers so better documentation might help with getting more users.
ReplyDeleteThe SeveralNines FAQ was better than others (http://www.severalnines.com/clustercontrol-mysql-galera-tutorial). So I guess the answer is that when all writes are done to one master then write conflicts won't cause frequent rollbacks as normal InnoDB behavior will make all but the first transaction wait for the first. However, response time for those transactions will suffer based on the round trip times to other servers in the cluster. I assume that if roundtrip time between cluster nodes is 100ms, then there can be at most 10 commits/second to the same row.
And when conflicting writes are done concurrently on different masters then there will be rollbacks in addition to latency from network round trips.
There has been quite a lot of discussion about how to deal with multi-master conflicts lately, especially in the recent PerconaLive NY conference.
ReplyDeleteThere are three areas of development to work on: 1. Diagnose 2. Recover 3. Prevent, and we are working on each of them.
For diagnosis, we will add configuration to get log messages for each multi-master conflict. This is entry level diagnostics instrumentation, and more profound solution will probably be integrated with MySQL 5.6 performance schema.
For recovering, we have just enhanced the method of automatic retrying of multi-master conflicted autocommit query. We will move forward to enable retrying of last statement in multi-statement transaction. (general MST retrying is not safe and will probably never be implemented)
For preventing, we have some ideas in design phase about how to deal with hot-spot work loads. I expect we can publish something here before year end.
A way we've avoided the issue of database failover is to proxy all database connections and at the same time have all databases in a full-mesh, master-master replication network.
ReplyDeleteApplications are written so that each function requests a new database connection each time it wants to interact with a database. If an error occurs, it simply moves onto the next database available in the pool.
Since all databases are fully replicated between each other, if an application were to migrate between databases during its lifetime, it would be none the wiser and everything Just Works (tm).
Getting master-master to just work is the hard part. Are you willing to share your solution for conflict resolution/detection/avoidance? MySQL has an awesome solution if your workload is limited to inserts. Otherwise this is hard to do with MySQL.
ReplyDeleteConflict resolution is hard especially down at the database level so we solve this by pushing tricky bits up into the application layer.
ReplyDeleteFor example when creating a new user, the username must be globally unique. To avoid the race condition of the username being inserted by multiple masters and last insert wins, the critical section is simply a database connection to a "mutex" database (which is just an alias to a single designated database within the database pool). Any inserts into the users table after the first will fail with a primary key constraint.
The only problem with our "mutex" database alias is that it does create a Single Point of Failure. Our data and our workload aren't conflict heavy so it's mostly contained in small pieces of code. Everything else just uses a "normal" database connection for most of its workload.
If your workload is conflict heavy and you don't want a SPOT, what you want to do is create some way of performing critical sections over redundant machines. Maybe this is something where synchronous replication across a "mutex database pool" could help, or possibly some configuration of Zookeeper or HBase.
As for detecting row conflicts we're using pt-table-checksum. +1 to Percona.
One interesting point is that pessimistic locking has its own set of requirements - one of the biggest gotchas being deadlocks (the classic example of two threads modifying the same two rows in transactions, but in different orders, so each holds a lock on the row the other needs t complete and release its lock). The bar to beat shouldn't be to completly avoid pushing the problem back to developers, just not to push any more problems than the current model already does.
ReplyDeleteFor the workloads I care about, I can structure transactions to avoid those deadlocks with pessimistic locking. I cannot do that for optimistic locking.
ReplyDelete