
Consistent Concurrency

This entry is part 3 of 4 in the series Code Design

In this part of the series, we want to explore how a single set of concurrency problems affects pretty much every programmer everywhere.

Background

Search the web, and you’ll find tons of people claiming — even in scientific publications — that multi-threaded programs are hard to write and debug. There are a few reasons for that, but it’s also a fairly misleading statement to make. Threads aren’t hard at all — it’s just several things running in parallel.

What’s hard is always sharing data between threads. If you design your algorithms in the right way, then this isn’t a problem. The canonical programming model to avoid this kind of issue is MapReduce, in which — slightly simplified — you partition your data into N blocks first, then use N threads in parallel to process a single block each, producing a single output each. You’ll notice that each thread has its own input and output, sharing nothing with other threads. Afterwards, you have N outputs, which you combine into a single result with some post-processing.
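
As a minimal illustration of that model, here is a sketch in Python (using a process pool, and summing numbers as a stand-in for the real per-block work):

  from multiprocessing import Pool

  # Each worker gets its own block and produces its own output; nothing is
  # shared until the final combine step.
  def process_block(block):
    return sum(block)  # stand-in for whatever per-block work you actually do

  if __name__ == "__main__":
    data = list(range(1000))
    n = 4
    blocks = [data[i::n] for i in range(n)]  # partition the data into N blocks

    with Pool(n) as pool:
      partial_results = pool.map(process_block, blocks)  # one block per worker

    result = sum(partial_results)  # combine the N outputs into a single result
    print(result)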

Threading isn’t hard at all; it’s sharing data between concurrent processes that’s hard. And in this post, we want to demonstrate how this affects even those programmers who stay away from threads.

Threads and Sharing

There are a bunch of problems with multiple threads accessing the same data, but the literature seems to focus mostly on algorithmic problems. For example, if you have a counter that is incremented N times by one thread, and decremented N times by another, at any given time the counter may have any value in the range of -N to N, rather than 0 to N as you might expect in code that schedules the increment and decrement operations more deterministically.
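
To make the effect tangible, here is a small and deliberately unsafe Python sketch; the sleep call exists only to make the unfortunate interleaving easy to reproduce:

  import threading
  import time

  counter = 0
  N = 100

  def add(delta):
    global counter
    for _ in range(N):
      value = counter          # read the current value
      time.sleep(0)            # yield, so the other thread can interleave
      counter = value + delta  # write back, possibly overwriting its update

  inc = threading.Thread(target=add, args=(1,))
  dec = threading.Thread(target=add, args=(-1,))
  inc.start(); dec.start()
  inc.join(); dec.join()
  print(counter)  # frequently not 0; anything from -N to N is possible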

What the literature tends to treat more as a side topic is that concurrent access to data is mostly a question of consistency. The important questions are, e.g.:

  1. If I read this data, is it internally consistent?
  2. If I write this data, how does that affect readers?

When you use threads, there are two basic methods for ensuring that both reading and writing yield consistent results.

Atomic Operations

An operation on data is atomic when no other thread can observe the data in a partially updated state. To achieve this, most processors implement a compare and swap (CAS) operation, which overwrites some data if and only if it is currently equal to some given value:

  x = 1
  if x.cas(1, 2):
    # block is entered if x == 1; afterwards, x == 2
    # since before this, x was in fact 1, it is now 2
    pass
  if x.cas(1, 3):
    # block isn't entered, x remains 2
    pass
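
If you want to actually run something like the snippet above: Python has no public CAS primitive on plain integers, so the following is a lock-based emulation, purely for illustration, of what the CPU instruction does in a single step:

  import threading

  class AtomicValue:
    def __init__(self, value):
      self._value = value
      self._lock = threading.Lock()  # stands in for the hardware's atomicity

    def cas(self, expected, new):
      with self._lock:
        if self._value == expected:
          self._value = new
          return True
        return False

  x = AtomicValue(1)
  print(x.cas(1, 2))  # True: x was 1, so it is now 2
  print(x.cas(1, 3))  # False: x is 2, so nothing changes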

The main problem with CAS operations is that they only work on very small data types, such as machine-word-sized integers. Compound data types and classes are too large to be updated with a single CAS instruction.

Mutexes

The other approach, more common in many cases, is to use mutual exclusion locks, or mutexes. With them, you define a critical section: the stretch of code containing all the updates to some data, during which you want to exclude all other threads from accessing that data.

  x = {} # some larger object
  m = Mutex()
  m.enter_critical_section()
  # do stuff with x
  m.leave_critical_section()
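
In real Python code, the same pattern would typically use threading.Lock in place of the hypothetical Mutex above; a minimal sketch:

  import threading

  x = {}                # some larger object
  m = threading.Lock()  # plays the role of the Mutex above

  with m:               # enter the critical section
    x["key"] = "value"  # do stuff with x
                        # leaving the with block leaves the critical section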

The interesting thing is how the two compare. At some fundamental level — perhaps in the operating system kernel — mutexes must be implemented somehow. It turns out, using a CAS operation, you can easily create a simple mutex yourself.

  x = {} # some larger object
  m = False # our mutex is a simple boolean flag
 
  # enter critical section
  while not m.cas(False, True):
    # could not set m to True here
    pass
  # m is now True
 
  # do stuff with x
 
  # leave critical section
  m = False

What’s happening here? As long as our mutex m is equal to True, the cas() function in our loop cannot succeed. Once it is False — as in our initial program state — the operation succeeds, and blocks out all other threads by the same method.

Read-/Write Mutexes

Mutexes are a fairly brute-force method for handling concurrent access to data, and one refinement of them is the read-/write mutex. It stems from the realization that multiple threads reading the same data can never interfere with each other, but if a single thread attempts to write data, all other threads must be locked out.
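
A lock-based Python sketch of just these semantics (many readers at once, writers exclusive) might look like this; the CAS-based version is a separate exercise, as noted below:

  import threading

  class ReadWriteLock:
    def __init__(self):
      self._readers = 0
      self._readers_lock = threading.Lock()  # protects the reader count
      self._writer_lock = threading.Lock()   # held while anyone writes

    def acquire_read(self):
      with self._readers_lock:
        self._readers += 1
        if self._readers == 1:
          self._writer_lock.acquire()  # first reader locks writers out

    def release_read(self):
      with self._readers_lock:
        self._readers -= 1
        if self._readers == 0:
          self._writer_lock.release()  # last reader lets writers back in

    def acquire_write(self):
      self._writer_lock.acquire()

    def release_write(self):
      self._writer_lock.release()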

I’ll leave as an exercise to the reader how to implement a read-/write mutex using a CAS operation.

Memory and CPU Caches

So now you know how to handle concurrent access to the memory location at which our object x is stored, right? Think again.

It turns out that reading from and writing to main memory is, from the CPU’s perspective, grindingly slow, despite huge advances over the decades. Faster types of memory can be built, but they get more and more expensive the faster they are.

As a result, CPUs implement one or more layers of CPU caches. Caches are made of fast(er) memory than the memory they cache; your CPU registers get data from L1 cache, L1 cache gets updated from L2 cache, L2 from L3, etc until LN gets updated from main memory. You don’t need to understand the exact mechanism behind how caches get updated; the important part for this article is to understand how caches interfere with CAS operations.

Let’s simplify things. Assume we only have CPU registers and main memory. Then, let’s go back to our CAS-based locking code, but assume our mutex is actually stored in a CPU register. For convenience’s sake, I’ve renamed it r0, and shortened the entire example:

  r0 = False
 
  # enter critical section
  while not r0.cas(False, True):
    pass
 
  # leave critical section
  r0 = False

So far, so good. But how many registers does your average CPU contain? Well, that depends very much on the CPU, but 32 is a fairly common number, with some CPUs sporting as many as 128. There aren’t all that many, compared to the gigabytes of memory your computer probably contains. To deal with that, compilers generate code which stores data in main memory, and only loads it into a register just before it’s needed. Let’s rewrite our example a little to make it look a lot more like it really behaves:

  r0 = False
  m = r0
 
  # enter critical section
  while True:
    r0 = m
    if r0.cas(False, True):
      m = r0
      break
 
  # leave critical section
  r0 = False
  m = r0

In this rewritten example, I’ve never tried to do anything with m (our main memory location) directly, except to set it to the value of r0. All other operations I did with r0, our register.

So what’s the problem here? There’s a new concurrency problem, and it’s most easily illustrated in the while loop: in between reading the latest value of m into r0 and performing the CAS operation on r0, the memory location m could theoretically have been updated by another thread. The result? Our CAS checks the stale copy in r0 rather than the value we can actually see in m.

Of course, with several layers of caches in between main memory and registers, this kind of problem can occur whenever you update one from the other.

Luckily, this stuff isn’t quite what happens in the real world. There, CPUs and compilers contain enough cleverness to ensure consistency. This doesn’t happen magically, though; it happens through cache coherency protocols, a mechanism reminiscent of mutexes (though they’re not really mutexes). When a core’s cached copy is missing or has been invalidated, that’s a cache miss, and your CPU will stall program execution until it re-reads the memory.

Hey, wait!

Stalling program execution until we know our idea of memory contents is fresh… that sounds familiar:

  while not m.cas(False, True):
    pass

Yup, that’s fairly similar to how you might enter a critical section.

Disk Caches

These caches, and the problem of updating them, abound everywhere. Consider disk caches as an example:

  1. Hybrid SSD + physical disks improve speed by caching physical disk content in the SSD part.
  2. Any disk caches physical/SSD content in internal memory.
  3. OS Kernels cache in main memory what they get from the disks.
  4. Many applications cache file content in main memory and flush updates only occasionally to avoid heavy I/O.

Caches? They’re everywhere. And with them goes the concept of waiting until your cache is up-to-date.
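
Item 4 from the list above is easy to see from an application’s point of view; a minimal Python sketch of the layers a single write has to pass through:

  import os

  with open("example.txt", "w") as f:
    f.write("hello")      # sits in the application's own buffer
    f.flush()             # pushed into the OS kernel's page cache
    os.fsync(f.fileno())  # asks the kernel to push it down to the disk
    # ...and even then, the drive's internal cache is one more layer below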

Databases

When you write a line to a database table, chances are that a bunch of other things are happening in the background. For example, when you’re updating a table with indices — such as a primary key (who doesn’t use them?) — the RDBMS updates not just the table contents, but the index as well.

To bundle such operations, most any RDBMS will use a technique very similar to transactions. And if they don’t, then they still offer transactions for you to use when you want several changes to the database to happen as if they were one atomic operation.

A very simple transaction might be implemented by using a critical section. When the transaction starts, you enter the critical section, and leave it after committing. And on some level, that’s precisely what happens in an RDBMS — except it could potentially be optimized.
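
As an illustration, here is how several changes that must appear as one atomic operation might look from application code, using SQLite purely for convenience:

  import sqlite3

  conn = sqlite3.connect(":memory:")
  conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
  conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")

  # Both updates must succeed or fail together, so they run in one transaction.
  with conn:  # opens a transaction; commits on success, rolls back on error
    conn.execute("UPDATE accounts SET balance = balance - 10 WHERE name = 'alice'")
    conn.execute("UPDATE accounts SET balance = balance + 10 WHERE name = 'bob'")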

Remember the problem with updating registers from memory? And remember read-/write mutexes? It turns out that when you have multiple threads reading the same memory location, you only need to refresh the register once. The register value gets invalidated only if any thread writes to the memory location.

Databases could — and many do — make use of this idea.

These days it’s important to point out that NoSQL databases are no different from relational databases here. Their main difference is that in a NoSQL database, you will not spread logically grouped data across many database tables, which means a complex update can be made without the use of transactions. The downside is that instead of one or two indices per table, people will usually index a bunch of fields in a NoSQL document, leading to much the same problems in keeping everything consistent and up-to-date.

Web Applications

Finally, we’re getting to the part of the software stack that most programmers nowadays will be most familiar with: applications that are accessed through a web browser.

When people start talking about scaling a web application up, for better or worse, what they usually mean is that they have one or more machines that run a web server and their application. From those machines, they access a single database, which must broker requests from all the web servers, locking some of them out at any given time so that conflicting updates are serialized.

With enough web servers, databases effectively become the critical section that all the servers compete for.

Distributed Databases

The solution, of course, is to have more database machines! Right?

Well… no. Having more than one database machine re-introduces the consistency problem when the database machines try to decide between themselves what the current, up-to-date data is. All you’ve done is move the problem away from the interface with the database, to the interface between the database servers.

Mind you, that might be a good thing. Authors of database software are usually well aware of these consistency issues, and spend a lot of time solving them. Better than you re-inventing the wheel, right?

Eventual Consistency

The alternative to this kind of thinking is to adopt a technique from memcached and similar software. Instead of the multiple machines running memcached organizing themselves amongst each other, it’s the client that updates all instances, one after the other.

The underlying idea is a concept called eventual consistency, and it’s usually good enough for most web applications.

That is, the data stored in all the memcached servers is not consistent until the client has managed to update them all. Of course, each of these updates may still be subject to concurrency issues when multiple clients attempt to write the same data at the same time… which is why, as you might have guessed, memcached implements a CAS operation.
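
As a hedged sketch, assuming the pymemcache client library and a memcached instance on localhost, a check-and-set update loop might look like this:

  from pymemcache.client.base import Client

  client = Client(("localhost", 11211))
  client.set("counter", "0")

  while True:
    value, token = client.gets("counter")        # read the value plus a CAS token
    new_value = str(int(value) + 1)
    if client.cas("counter", new_value, token):  # succeeds only if nobody wrote in between
      break
    # someone else updated the key first; re-read and try again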

The concept can be taken a lot further, too — for example, in a social networking site, it is generally not necessary for all subscribers to a person’s post to receive the post at the same time, but perfectly fine to receive it eventually.

Sharding

There is, of course, a way to alleviate load issues that simply sidesteps the problem of having a single database enforce consistency. Sharding is a general technique for partitioning your data set. In most instances, it describes breaking your entire system into several independent shards. One common place where this is used is in MMORPGs, where the potentially millions of subscribers don’t all contact the same server, but rather connect only to “their” server, which serves a limited and capped number of subscribers.

In databases, especially in the NoSQL realm, the term is used to describe a more MapReduce-like approach. Data is distributed between several servers, and combined on the client side.
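
A hedged sketch of how a client might route keys to shards (the host names are made up for illustration):

  import hashlib

  SHARDS = ["db-0.example.com", "db-1.example.com", "db-2.example.com"]

  def shard_for(key):
    # Hash the key so that every client maps the same key to the same shard.
    digest = hashlib.sha1(key.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

  print(shard_for("user:42"))  # e.g. db-1.example.com, always the same for this key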

Sharding doesn’t precisely solve the consistency issue; it’s more that it circumvents it. Depending on the use case, it’s a great tool in the toolbox.

Sharding can also lead to a form of eventual consistency, which is why it’s mentioned here. If you distribute the users on a social networking site across multiple shards, then updates from one user will be available to other users on the same shard fairly quickly. In order to update other shards, you might use some kind of out-of-band updating mechanism, which eventually makes data available on other shards.

Master/Slave Databases

Many databases also implement eventual consistency in the form of a master/slave setup. Here, one master database exists, and this is the only database your application may write to. One or more slaves get updated from the master database, and serve read requests to the application.

This kind of setup is essentially yet another caching mechanism, which leads to eventual consistency. Your slaves do not necessarily reflect the most up-to-date data set of the master, but are likely to lag behind a little.
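
As a hedged sketch of that architecture (the connection objects and their execute() method are placeholders rather than a real driver API), application code might route queries like this:

  class RoutingDatabase:
    def __init__(self, master, replicas):
      self.master = master      # the only connection we ever write to
      self.replicas = replicas  # read-only connections that may lag behind

    def write(self, query, params=()):
      return self.master.execute(query, params)

    def read(self, query, params=()):
      # Any replica will do, but it may not yet reflect the latest writes.
      return self.replicas[0].execute(query, params)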

Conclusion

If I managed to point out one thing here in this post, it is that whether you’re writing multi-threaded code, whether you’re dealing with high amounts of disk I/O, or whether you’re writing distributed applications, one problem remains: consistency of data is a tricky beast, and the problem is consistently the same in every software project with concurrency.

The next thing you should take away is that writing multi-threaded code is great practice for understanding this issue. It goes to the heart of the problem, without having to write distributed software or having to deal with assembly language code.

Lastly, you should have been made aware that the mutual exclusion concept of ensuring consistency is always going to become a bottleneck in applications which need to share data with many parties. For that reason, in the context of web applications, it is generally a bad idea to rely on a database of any kind to spread the information around with guaranteed consistency. Far better to think in terms of eventual consistency, and design your system accordingly.

