Trial and error: A rare issue on a Rails API

How I fixed race conditions on a Rails API

  • Development
  • Ruby
  • Security
  • SQL

Ever since discovering Ruby on Rails at the first company I worked for after my high-school diploma, it has been a breeze to work with. I have never looked jealously at other frameworks like Express.js, Phoenix, or anything alike. It always warms my heart just to kick off a new project with the rails new command and scribble my way through the controllers, models (and, if it's not an API, also the views). Many people say "Ruby is dead, long live XYZ", where "XYZ" is often Node.js. However, I am also familiar with Node.js, and I am much happier to knock out an MVP with Ruby on Rails.

But just recently, in our newest project "Pupy", I ran into a rare issue I hadn't encountered before: race conditions. After googling for a while (for non-developers: yes, we do that too), I found some solutions for fixing a race condition quite easily. Happily, it really was that easy for our project; still, we are all curious, and I wanted to find out more about the topic.

What is a race condition?

Race conditions do not only happen in Ruby, nor only in Ruby on Rails. It is such a big topic that even Wikipedia has something to tell us about it. We are going to cover the "networking" part.

Wikipedia states it pretty well:

In networking, consider a distributed chat network like IRC, where a user who starts a channel automatically acquires channel-operator privileges. If two users on different servers, on different ends of the same network, try to start the same-named channel at the same time, each user's respective server will grant channel-operator privileges to each user, since neither server will yet have received the other server's signal that it has allocated that channel.

So, to sum this up: a race condition happens when two users send the same request (with some specific, normally unique, attributes) to the same server. When the server is multi-threaded, we end up with something like the following graph:

Thinking about the graph, this normally shouldn't be a problem; however, SQL is "dumb" when you are not using it correctly. ActiveRecord offers methods to deal with race conditions, but you should not use them all over your application, which is why they are not enabled by default. More about this later.

What was our race condition about?

At Pupy, we struggled to get users going. We authenticate users based on a unique ID, which is generated and saved on the user's device. Doing it this way leaves us open to a possible race condition when there are multiple requests with the same ID that want to "register". We do this registration on every request (we use find_or_create_by on the user model). Additionally, there are three to four requests going on when the app is opened.
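Roughly, the setup looked like this (a sketch with illustrative names; `device_uid` and the header name stand in for our actual attributes):

```ruby
# Hypothetical sketch of the authentication step that runs on every request.
class Api::BaseController < ActionController::API
  before_action :authenticate_device

  private

  def authenticate_device
    # Looks the user up first and only creates one if nothing is found --
    # exactly the pattern that is vulnerable to a race condition.
    @current_user = User.find_or_create_by(device_uid: request.headers["X-Device-UID"])
  end
end
```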

Look at the graph above and think of those three to four requests going to possibly three to four servers: this would result in at least three users being registered by the API. But those three users should have been a single user (because of the unique ID). This was our race condition.

Searching for a solution

Finding a solution wasn't actually as easy as "google it, and boom, you are good to go". I needed to try a few things.

Let's just fix it with a database index?

Good idea. We are going to add a unique database index, which tells the database "only accept this attribute value once, and then never again". Let's see how to achieve this.
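A minimal migration for this could look like the following (table and column names are assumptions for illustration):

```ruby
# Hypothetical migration: a unique index makes the database itself reject
# a second row with the same device_uid.
class AddUniqueIndexToUsersDeviceUid < ActiveRecord::Migration[6.0]
  def change
    add_index :users, :device_uid, unique: true
  end
end
```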

Ok, the first issue is solved: there are no longer multiple entries with the same unique ID. But now users are reporting that some endpoints are not working. Why? Because ActiveRecord raises an ActiveRecord::RecordNotUnique error when a value that is supposed to be unique is written to the database a second time.

So, time to find another option.

Result: Possible, and also needed; however, this is not the complete answer.

What about locking?

The next idea was to implement an optimistic or a pessimistic lock. A short excursion into what a pessimistic lock is: it locks the affected database records for a short period of time, until a transaction is done. This is easy to do in Rails, since ActiveRecord provides a lock class method.
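As a minimal sketch, assuming the record already exists (names are again illustrative):

```ruby
# Minimal sketch of pessimistic locking. `lock` appends "FOR UPDATE" to the
# SELECT, so any other transaction that wants this row has to wait until we commit.
User.transaction do
  user = User.lock.find_by(device_uid: device_uid)
  # ... work with the locked record; the lock is released on commit
end
```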

This would actually work if we knew which record to "lock". However, we do not know which record we want to lock, since the record does not exist yet when the first request comes in.

Result: Not feasible, because we don't know which record to lock.

Locking the whole table with pessimistic locking?

We also tried locking the whole table with pessimistic locking. This would work, but it would result in an enormous request queue for our server, because on a user's first request the whole table would be locked for all other users, too.
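On PostgreSQL, such a table-level lock could be taken roughly like this (a sketch, assuming PostgreSQL and the illustrative names from above):

```ruby
# Hypothetical sketch of a table-level lock (PostgreSQL syntax). Every other
# connection that wants to touch `users` now has to wait for our commit.
User.transaction do
  User.connection.execute("LOCK TABLE users IN ACCESS EXCLUSIVE MODE")
  User.find_or_create_by(device_uid: device_uid)
end
```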

Result: Not possible, because we would end up with a "request queue", and other users would see long loading times from the API.

The new create_or_find_by to the rescue!

When working with Rails, we should all know the good old find_or_create_by class method from ActiveRecord. For those who are new to Rails: this method checks whether a record with the specified attributes exists in the database and otherwise (if none is found) creates a new record with the given attributes.
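In a simplified sketch, it roughly boils down to two steps:

```ruby
# Simplified sketch of what find_or_create_by does under the hood:
user = User.find_by(device_uid: device_uid)   # 1. SELECT first
user ||= User.create(device_uid: device_uid)  # 2. INSERT only if nothing was found
```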

But we were super lucky. The new star on the race condition horizon is here: create_or_find_by!

What is the difference between those two?

The BigBinary blog post about the new class method describes the difference pretty straightforwardly:

create_or_find_by is an improvement over find_or_create_by because find_or_create_by first queries for the record, and then inserts it if none is found.

In practical terms: when a request reaches our servers, we no longer first try to find a user with the unique ID; instead, we try to create such a user, and if the database (or rather Rails) raises an error, we find the existing record and return it. In this case, we rely on the database to tell us (through the unique index from the first half of the article) that a record has already been created.
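With that, the authentication step from the beginning of the article becomes roughly this (still with illustrative names):

```ruby
# Hypothetical sketch of the fixed authentication step. create_or_find_by
# attempts the INSERT first; if the unique index makes the database raise
# ActiveRecord::RecordNotUnique, it falls back to finding the existing record.
def authenticate_device
  @current_user = User.create_or_find_by(device_uid: request.headers["X-Device-UID"])
end
```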

Result: Works!

Going a little bit further in our use case

At the time of writing this article, we are very pleased to have already worked with GraphQL during its rise. But since our recent project is basically a simple "CRUD" application, we decided to go with a REST API. If we had used GraphQL (in a proper way), a single query to the server could have fetched what the three to four requests do now. This would have meant:

  1. Not having to fix a race condition in the first place
  2. Not firing too many requests at app start-up
  3. Not slowing down the server with all those requests

Seeing GraphQL evolve is a very nice step forward for all of us backend developers; however, in some cases GraphQL can be a little too overwhelming for a simple project like ours.

So, do you also like working with Ruby on Rails, or would you like to realise your project with it? Then don't be shy to contact us. We are looking forward to hearing from you!

Dominik Schmidt

CTO
