TIL: SKIP LOCKED can be used to implement a queue in PostgreSQL
A queue is used as a synchronization primitive to distribute work across many workers. On one end of the queue, one or more producers will call enqueue to append work items to the queue, and on the other end workers will call dequeue to pop the first item from the queue and do some work with it [1].
Let's create a PostgreSQL table to hold our queue items:
CREATE TABLE queue (index SERIAL PRIMARY KEY, item INT);
Let's see how we can implement our basic queue methods: enqueue and dequeue. enqueue can be implemented by an INSERT query, as appending a new tuple to the table is quite analogous to appending an item to a queue:
INSERT INTO queue (item) VALUES (1);
INSERT INTO queue (item) VALUES (2);
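If we peek at the table at this point (just a sanity check, not part of the queue API), we should see something like the following on a fresh table, with the SERIAL column recording each item's position in the queue:

SELECT * FROM queue ORDER BY index;
-- index | item
-- ------+------
--     1 |    1
--     2 |    2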
Initially, dequeue can be implemented with a DELETE ... RETURNING query:
DELETE FROM queue
RETURNING item;
This implementation will consume and delete the entire queue on every query. So, our workers need to be prepared to handle multiple work items, which is not quite what we had in mind. Moreover, depending on the relation between work item production speed and the time it takes to do the work, we may end up with a lot of idle workers, and a few very overloaded workers.
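With the two items we inserted above, a single worker would drain the whole backlog in one go, something like:

DELETE FROM queue RETURNING item;
-- item
-- ------
--    1
--    2
-- (2 rows)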
Clearly, we need to select only the first item at the top of the queue. Thankfully, we have a serial primary key:
DELETE FROM queue
WHERE index = (SELECT index FROM queue ORDER BY index LIMIT 1)
RETURNING item;
This implementation complies with our queue API, but it does not scale well with the number of workers: in order to obtain a work item, the first worker's DELETE takes an exclusive row-level lock on the row it removes, and that lock is held until its transaction ends. The lock may be held while the work is being done, but at the very least it will be held until the worker receives the work item.
In our example we have inserted two work items into the queue, and we would like two concurrent workers to each get one of them and execute the work in parallel. However, after the first worker picks its row, the SELECT subquery in the WHERE clause of the second worker's DELETE ... RETURNING query still sees that same row (its deletion has not been committed yet), so the second worker targets the same index and has to wait in line until the first worker finishes.
TIL that we can use a FOR UPDATE ... SKIP LOCKED clause in our SELECT subquery to get around this problem (see the documentation). SKIP LOCKED will work with an incomplete view of the data; in our case that means only the work items that have not been picked up by a worker. Since this incomplete view doesn't include any locked rows, our query will return immediately with a work item if one is available, or with nothing if the queue is empty (or every remaining item is currently locked by another worker).
DELETE FROM queue
WHERE index = (SELECT index FROM queue ORDER BY index FOR UPDATE SKIP LOCKED LIMIT 1)
RETURNING item;
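One way to use this from a worker (a sketch, assuming we want an item to go back to the queue if its worker dies mid-task) is to keep the transaction open while the work runs:

BEGIN;

DELETE FROM queue
WHERE index = (SELECT index FROM queue ORDER BY index FOR UPDATE SKIP LOCKED LIMIT 1)
RETURNING item;

-- ... do the work for the returned item here ...

COMMIT;
-- a ROLLBACK (or a crashed worker) would undo the DELETE instead,
-- making the item visible to other workers again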
And our workers can now do their work in parallel! PostgreSQL's locking mechanism is now working for us, ensuring that no work item is handed out to more than one worker, and leaving us more time to answer that incident page caused by a different concurrency-related bug…
Footnotes
[1] For simplicity, I'm describing a single-ended queue.