Range queries in Firestore
I like Firestore because it’s an easy-to-use database. Especially because it’s easy to build queries that are just fast. It’s designed to work at scale - by providing queries that have a constant time, regardless of the query conditions. But if you have worked with Firestore before, you will know that this limits the types of queries you can do. This can make range queries, for example for calendar applications, a challenge.
In general, Firestore supports range queries.
For example, if you have documents that have a date field, you can filter all documents in a time range by filtering date >= rangeStart AND date <= rangeEnd
.
But what if your document doesn’t have a single date?
What if it’s an event with a start and end date?
You could change the query to startDate <= rangeEnd AND endDate >= rangeStart
, but this doesn’t actually work in Firestore, because you can’t have range queries with two different fields in a single query.
This wouldn’t be a problem with an SQL database like Postgres, but it’s a challenge with Firestore. What alternatives do we have?
Simply running two queries, one with a start filter and one with an end filter (startDate <= rangeEnd
, endDate >= rangeStart
) and combining the results on the client is not a good idea.
One of the queries would return all past events, while the other query would return all future events.
In the end, this wouldn’t scale very well, as it would just return all the documents in the collection, while some documents would even be returned twice!
Spatial or geo queries are another case where you want to filter by two fields:
Latitude and Longitude.
Let’s say you want to find all the places in a rectangular region on a map.
This would result in a query like latitude >= latitudeRangeStart && latitude <= latitudeRangeEnd && longitude >= longitudeRangeStart && longitude <= longitudeRangeEnd
, which, as we have learned, is not supported by Firestore.
However, there is a clever solution to such a query.
You can use Geohashes. A Geohash combines the two axes, latitude and longitude, into a single enumerable string representation that is stored in the documents. To query the places in an area, you first calculate the lower and upper bound Geohashes and use them to filter the documents on a single field. However, this is only a broad phase filter and will return some false positives that are not in your search area (over-fetching); you will need to filter these out using a more detailed condition on the client.
This idea is pretty neat, would it also work for our calendar events? Unfortunately not, because we don’t have a constraint range on each axis, but each axis is actually is open-ended query in one direction.
So if range queries don’t work, what else can we do? Let’s use the idea of generating an additional field from the Geohash to support our queries. We group the events in our collection into one or multiple time buckets. Time buckets can have different labels, with different sizes or resolutions. For example, we would could have daily buckets, weekly buckets, monthly buckets, quarterly buckets, or yearly buckets. This is just an example of buckets, depending on your type of queries you may not need all of them, or you may need different sizes. An event that starts on the 14.01.2024 and lasts until the 18.01.2024 will be placed into the following buckets:
- Daily:
2024-01-14
,2024-01-15
,2024-01-16
,2024-01-17
,2024-01-18
- Weekly:
2024-CW02
,2024-CW03
- Monthly:
2024-01
- Quarterly:
2024-Q1
- Yearly:
2024
We collect all buckets in an array and store them in a field of the event document.
When querying, we can use a similar logic to generate the buckets for our query range and then use them in an array-contains-any
filter on the Firestore database.
But there is a catch:
Like many queries, array-contains-any
has limits:
The number of values that array-contains-any
can filter on is limited.
It’s hard to say exactly what the limit is because it depends on your query, but it’s less than 30.
This limit isn’t so bad, it just makes it a bit more difficult to choose the right buckets for the query.
In general, you should only query buckets that are not already covered by a higher level bucket.
Some examples:
If you are querying for January to March 2024, you can filter by the buckets 2024-01
, or 2024-02
.
If you query from the 15 January 2024 to the 15 March 2024, you have several options:
Either you query for the buckets 2024-01
, or 2024-02
, or 2024-03
with the risk of false positives, or you try to combine the monthly buckets with some smaller sized ones, like weekly buckets: 2024-01
, or 2024-02
, 2024-03
, 2024-CW09
, 2024-CW10
, or 2024-CW11
.
You just have to make sure that you don’t exceed the number of filter conditions that Firestore allows for the query.
In general, false positives are probably not that bad, you can consider the query as a broad phase and do some more detailed filtering in a second step in the client.
I actually like Firestore’s limited query functionality — not just because it offers great performance and scalability guarantees. It takes you back to a time when you had to think more about how you were doing something to get a good performance. Sometimes a few clever tricks can speed up your code without adding too much complexity.
Tags: performance, database, firestore