Scale re-orderable lists

Creating a UI where a user can manually reorder items based on their needs is neat, but can be quite challenging to implement in the backend. One need to store the order of the items somehow. There are different approaches to doing this, but each of them has trade-offs when it comes to performance and scalability. In this post I want to explain my scenario, compare them, and choose the best implementation.

I have had to solve this problem a few times in the past already:

  • A UI that allows users to reorder menu items based so that they can easily access the most important ones.
  • A whiteboard where the user can change the drawing order of the elements on the canvas.
  • A large list of products that the user can rearrange and group according to their use.

For the first scenario, a naive approach might be to store the identifiers that refer to the menu items in a separate list and use the order in the list to display them accordingly. If the user wants to move an item, we simply change the order of the items in the list and store the full list in our storage system. The UI will probably only have a few dozen menu items, so writing the full list each time is probably fine. If you have a lot of items, the array can get quite large. It may also happen that the items in the list are out of sync with the menu items they reference, for example if a new version of the software adds new menu items or removes some.

Looking at the other scenarios, for example the product list, if we have a list of hundreds, thousands, or more items, writing a separate list for the order becomes a performance problem. It’s difficult to implement pagination because we would have to look up the items from the list. While many database technologies support joins to do this efficiently, some NoSQL databases such as Firebase Firestore don’t. In this case we need to load each document individually.

What other approaches are there? Let’s start by defining some requirements.

  • Easy to write: I want to be able to move an item by changing only the item itself, I don’t want to update any of the neighboring items to insert it. This is important for scalability, but also a cost issue when using a database like Firebase Firestore, where you pay per write.
  • Atomic operation: Moving the item should be an atomic operation, without the need to lock the list of items. I assume this is already fulfilled by restricting the write to a single item. So you should never end up reading an inconsistent state.
  • Easy to query: I want to be able to easily query the items in order from the database, especially in databases with limited query capabilities such as Firebase Firestore. Limited query capabilities will rule out some approaches such as joins, linked lists or adjacency tables. This should also work together with skip and limit or cursor-based paging. In the end this will probably come down to a single field per item that I can order by.

Let’s see what method we can come up with.

Simple integer order

At first sight, the simple approach seems to be a monotonically increasing integer. We assign each item an integer based on its position in the list (0, 1, 2, 3, …). In database queries we can easily order by the position field and get the result set in the right order. However, updating the position when moving an item is much more complicated: We need to update all items between the start and end positions of the move operation to rebalance the positions. The need to update multiple items doesn’t satisfy my need to update only a single item.

Using an integer based rank requires us to update multiple items on moves. You can drag and drop to move items around.

Integer order with gaps

To reduce the need for rebalancing operations, we can leave some gaps between the integers when generating the initial positions. For example, instead of using consecutive integers, we can use a step size of 10 when assigning the positions (0, 10, 20, 30, …). This allows us to squeeze a new item between two existing ones (0, 10, 15, 20, …). However, there is not an infinite space between two numbers, we can run into a situation where all the space is filled up (in our example, only 9 items fit). In this case, we need to perform a rebalancing operation to assign new positions by updating some of the items. Although we need to update more than one item less often, we still need to do this from time to time. Let’s see if there are better approaches.

If we use an integer-based order with gaps between the initial values, we don't need to update multiple items each time. But after some time it's necessary to perform a rebalancing operation, which still requires updating multiple items at once. For example, try moving several items between item 0 and item 1. After a few tries, the space is used up and we have duplicate ranks to resolve.

Float order

Instead of having gaps between integers, it would be perfect to use decimals, because we know that there are an infinite number of numbers between two decimals. So if we assign a decimal position to each item (0.0, 1.0, 2.0, 3.0, …), we can squeeze an infinite number of items between them. The position of the new item is just the average between the position of the previous and the next item (0.0, 1.0, 1.5, 2.0, …). We can repeat this as often as we like and only need to update a single item.

Or can we? This approach would only work if we had access to an infinite precision decimal type. But decimal types are rare in most databases; in JSON or Firebase Firestore, for example, we only have access to floating point values. Floating point values have a limited precision, which means we can’t use them to represent every possible decimal value. We can use floating point numbers for this approach, but it stops working if the numbers are smaller than what we can represent with a floating point value. In this case, either the reordering stops working or we have to do a rebalancing operation. While working on my prototype, I realized that this wasn’t just a theoretical thing, but that one could get to this state quite quickly by performing the same operation over and over again.

If we use a decimal-based order, we theoretically get an infinite space between two items that we can fill. But in practice, the floating point precision is not great enough to handle this. For example, move the second item in the list behind the third one and watch the rank converge to 3. After 52 repetitions the precision is no longer sufficient and one end up with a duplicate rank of three.

Fraction order

Choosing the average as the midpoint between two items quickly exhausts the precision of floating point values. But there are other approaches to finding a better midpoint. Joe “begriffs” Nelson presented a method based on true fractions. I won’t repeat the background of the algorithm here, but the idea is this: Instead of assigning integers, we assign a fraction as a position to each item (¹⁄₁, ²⁄₁, ³⁄₁, ⁴⁄₁, …). When inserting, we get the lowest/reduced fraction of the previous and next item. We take the sum of the nominators and of the denominators to create a new fraction (¹⁄₁, ²⁄₁, ⁵⁄₂, ³⁄₁, …). There are two ways of storing the fractions: Either we store the nominator and denominator as separate integer fields, which gives us a very high precision. Or we can store the fraction as a decimal in a float. The latter is better in our case, as it allows us to have a single field to order by.

When we try this out in the prototype, we can see that the fractions make much better use of the precision of the floating point type. I think this is a really smart approach as it’s pretty easy to implement. The most complicated part is the calculation of the reduced fraction, but that’s not very complicated either. However, since we’re still using a fixed precision floating point value for storage, I’m still worried that we’ll run out of precision after a while. Let’s see if there are any other methods.

If we work with fractions, rather than just taking the average between two numbers, we can make much better use of the precision available in floating point values. For example, move the second item in the list after the third, and repeat this over and over again. Even after repeating this a hundred times, there is still a lot of precision left compared to taking the average.

String based order

Instead of a numerical representation, it’s also possible to represent numbers as strings. Strings can also be ordered (less efficiently than numbers), and they don’t have problems with precision, since they can be arbitrarily long. Using a string based encoding for a variable length rank seems to be quite common, for example Figma for ordering elements on the canvas and Jira uses it for the ranking in the agile boards (but their lexical ranking with different buckets and rebalancing sounds quite complex).

I stumbled across lexorder, an NPM package for working with string based order keys. It uses two encodings, one is a string based base-16 (hexadecimal) encoding that is stored in each item (80, 8001, 8002, 8003, …). The encoding can automatically extend the length of the key if the precision is insufficient. A second numeric encoding is used once an item is inserted, in which case the position of the previous and next item is averaged to get the position of the inserted item (80, 8001, 800180, 8002, …). This is quite nice, as it will never run out of precision. The precision is only limited by the maximum string length that we can store. By using a more compact encoding such as base-32 or base-64, this string length can be optimized. There is no need for rebalancing, so it meets all of my requirements! In the prototype I implemented it using the lexorder package, but the implementation is quite simple and could be re-implemented if needed.

String based order implementation. If there isn't enough space to insert more items, the precision is automatically increased by extending the length without the need for rebalancing.

I’m happy to find two good solutions to my problem. Both the fraction and the string based ordering meet my needs. The order can be changed by updating only the item to be moved. Querying the items is easy: All we have to do is sort by a single field, which every database supports. And at first I thought I had also fulfilled my requirement for an atomic operation to move the item, since I only have to do a single write and you will never query an inconsistent state. However, this assumption seems to be wrong. While it’s probably true that you’ll never read an inconsistent state, it’s not safe to do parallel writes. For both methods, we need to know the order key of the item before and after the insertion point in order to calculate the order key of the inserted item. This must to be done within a transaction to ensure that we don’t end up with duplicate keys when two moves are made in parallel. Duplicate keys won’t break the list completely, but they will be inconvenient for the user, as the order won’t behave as it should in some cases. Both approaches are somehow able to recover from duplicate keys by moving the items around.

At first I thought that the fraction based approach was really clever and is probably totally sufficient - but it still has limitations when it comes to the floating point precision. The string based approach one is on the safe side.

If I wanted to use the string based approach, I would probably implement it myself. Using the lexorder package is probably a bad choice as it doesn’t seem to be maintained and has only a few downloads per week 😉.