Digging deep with async queries

Wondering how async queries work? Let's dive into why they are our recommended default query type, how to use them, and the quirks you should be aware of.

Millie Macdonald

Jul 25, 2025

Some integration use cases require quick query responses, usually for real-time workflows. Halo Cloud enables these use cases with immediate queries, which are prioritised in Halo Link’s queue and return the response directly.

However, speed requires tradeoffs. Queries can’t be too large or they will lock up the database for too long, interrupting other high-priority queries. And if everything is high priority then nothing is high priority: using only immediate queries turns the queue into a plain FIFO (first in, first out) queue.

That’s why our recommendation is to use async queries by default, unless you specifically need immediate queries or registered queries (i.e. repeated polling queries; see this blog post for more information).

How do async queries work?

Asynchronous HTTP requests aren’t a new concept, and our async queries don’t do anything particularly novel. At a high level, the process is:

An integrator sends a request to Halo Cloud to create an async query, and receives a response that includes a query ID the integrator can use to fetch the query result once it’s ready.
Halo Cloud sends the query to the relevant practice’s Halo Link, where it is queued for execution next time there’s free bandwidth.
Once the query executes, the result is paged (in roughly 1MB chunks) and each page is queued for upload to Halo Cloud.
The integrator checks the query status and how many result pages they will need to fetch using the query ID.
The integrator fetches the query result pages from Halo Cloud using the query ID and page numbers.
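From the integrator’s side, the steps above could be sketched roughly as follows. This is a minimal sketch, not the Halo Cloud API: the `AsyncQueryClient` interface, its method names, and the `complete` and `page_numbers` status keys are all hypothetical placeholders for whatever HTTP calls your integration actually makes.

```python
import time
from typing import Protocol


class AsyncQueryClient(Protocol):
    """Hypothetical client interface; these names are illustrative only."""

    def create_query(self, sql: str) -> str: ...
    def get_status(self, query_id: str) -> dict: ...
    def fetch_page(self, query_id: str, page_number: int) -> list: ...


def run_async_query(client, sql, poll_seconds=30.0, max_polls=20, sleep=time.sleep):
    """Create an async query, wait for it to complete, then fetch every page."""
    # Step 1: create the query and keep the returned query ID.
    query_id = client.create_query(sql)

    # Step 4: check the status until the query is reported as complete.
    # "complete" and "page_numbers" are assumed keys, not real response fields.
    for _ in range(max_polls):
        status = client.get_status(query_id)
        if status["complete"]:
            break
        sleep(poll_seconds)
    else:
        raise TimeoutError(f"query {query_id} not complete after {max_polls} checks")

    # Step 5: fetch each page by query ID and page number, in order.
    rows = []
    for page_number in status["page_numbers"]:
        rows.extend(client.fetch_page(query_id, page_number))
    return rows
```

Steps 2 and 3 happen inside Halo Cloud and Halo Link, so they don’t appear in integrator code. Passing in `sleep` makes the loop easy to exercise in tests without real waiting.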

Simple, right? There’s a bit of nuance that’s worth explaining though…

Query limitations

Unlike immediate queries, which have a size limit of 8MB, async queries can be as large as you need. The only limitation on async queries is a 5 minute execution timeout, and this does not include time spent queuing or uploading results.

Queue priority

One of Halo Connect's key goals is to manage and reduce load on the practice server. To do this, Halo Link gives each integrator their own query queue (so your queries aren’t competing with other integrators’ queries) and prioritises queries as follows:

Immediate queries have highest priority, and will be queued ahead of async and registered queries (but not other immediate queries).
Async queries are queued and executed in the order they are received by Halo Link.
Registered queries are queued at the frequency specified when they are created.

Using the right mix of immediate, async, and registered queries is key to optimising the performance of your integration with Halo. Queuing a heap of immediates will push any async queries down the queue, and a long-running async will delay any immediates queued after it started executing, which can cause them to time out. If you find that queries are taking a long time to execute or are often timing out, we would recommend reviewing the spread of queries your integration uses. And feel free to reach out to Halo Support if you have any questions!
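To make the ordering rules concrete, here is a toy model of a single integrator’s queue. It is only an illustration of the priorities described above, not Halo Link’s actual implementation, and the `IntegratorQueue` class and its method names are made up for this sketch.

```python
import heapq
import itertools

# Lower number = served first. Immediates jump ahead of async and registered
# queries; everything else keeps its arrival order. (Illustrative values only.)
PRIORITY = {"immediate": 0, "async": 1, "registered": 1}


class IntegratorQueue:
    """Toy model of one integrator's query queue."""

    def __init__(self):
        self._heap = []
        # Monotonic counter so queries of equal priority stay first in, first out.
        self._arrival = itertools.count()

    def enqueue(self, kind, name):
        heapq.heappush(self._heap, (PRIORITY[kind], next(self._arrival), name))

    def drain(self):
        """Return query names in the order they would be executed."""
        order = []
        while self._heap:
            _, _, name = heapq.heappop(self._heap)
            order.append(name)
        return order


queue = IntegratorQueue()
queue.enqueue("async", "nightly-export")
queue.enqueue("immediate", "lookup-1")
queue.enqueue("immediate", "lookup-2")
print(queue.drain())  # ['lookup-1', 'lookup-2', 'nightly-export']
```

Note that this model only covers queue order. It doesn’t capture the other effect mentioned above, where a query that is already executing holds up immediates that arrive after it started.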

It’s also worth noting that async query result pages are put into a communal upload queue for upload to Halo Cloud, rather than the integrator-specific query queues. This protects the practice’s internet bandwidth by avoiding spikes in the bandwidth required by all the integrations a practice uses. The result pages are, however, logically separated by requesting integrator, to ensure that integrators can only access results for their own queries.

Pagination

Async query results can be big. We’ve tested 5GB successfully, and haven’t seen any larger queries from integrators (yet). Trying to upload all of that data at the same time would lock up resources for far too long, so instead we page the query results.

Pages are intended to be 1MB or less. However, we do not split database rows. When one database row is more than 1MB, we return the whole row as one page, regardless of its size.
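If it helps to picture the paging rule, here is a minimal sketch of one way to page rows without ever splitting a row. It is not Halo Link’s implementation: the `paginate_rows` function, the exact byte threshold, and measuring row size as serialised JSON are all assumptions made for illustration.

```python
import json

# "Roughly 1MB"; the exact threshold used by Halo Link is an assumption here.
MAX_PAGE_BYTES = 1_000_000


def paginate_rows(rows, max_page_bytes=MAX_PAGE_BYTES):
    """Group rows into pages of at most max_page_bytes, never splitting a row.

    A single row larger than the limit ends up alone on its own oversized page.
    """
    pages = []
    current = []
    current_size = 0
    for row in rows:
        # Assumed size measure: the row's UTF-8 encoded JSON length.
        row_size = len(json.dumps(row).encode("utf-8"))
        # Start a new page if adding this row would exceed the limit.
        if current and current_size + row_size > max_page_bytes:
            pages.append(current)
            current, current_size = [], 0
        current.append(row)
        current_size += row_size
    if current:
        pages.append(current)
    return pages
```

The practical takeaway for integrators is that page size is a target rather than a guarantee, so client code shouldn’t assume a hard 1MB ceiling when allocating buffers for a page.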

How do you check how many pages a result is?

The Query status endpoint allows you to check whether an async query is complete and, once it is, how many pages the result has been split into. The result object for an async query response with 2 pages would look something like:

    "result": {
        "rows": 30,
        "size": 25,
        "pages": [
            {
                "pageNumber": 1,
                "status": "queued",
                "size": 0,
                "rows": {
                    "count": 10,
                    "rangeStart": 1,
                    "rangeEnd": 11
                }
            },
            {
                "pageNumber": 2,
                "status": "queued",
                "size": 0,
                "rows": {
                    "count": 10,
                    "rangeStart": 11,
                    "rangeEnd": 21
                }
            }
        ]
    }
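As a rough sketch, a result object shaped like the sample above could be summarised like this. The `summarise_pages` helper is hypothetical, and it only relies on the `pages`, `pageNumber`, and `status` fields shown in the sample. `queued` is the only status value that appears there, so check the API reference for the full set of values before branching on them.

```python
from collections import Counter


def summarise_pages(result):
    """Return the page numbers to fetch and a count of pages per status."""
    pages = result.get("pages", [])
    page_numbers = sorted(page["pageNumber"] for page in pages)
    status_counts = dict(Counter(page["status"] for page in pages))
    return page_numbers, status_counts


# Trimmed-down version of the sample result object shown above.
sample_result = {
    "pages": [
        {"pageNumber": 1, "status": "queued"},
        {"pageNumber": 2, "status": "queued"},
    ]
}

page_numbers, status_counts = summarise_pages(sample_result)
print(page_numbers)   # [1, 2]
print(status_counts)  # {'queued': 2}
```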

Result retention

Async query results are cached in Halo Cloud for roughly 24 hours.

If you’re not in a hurry to get the result, we recommend creating the query then waiting a decent amount of time before checking its status. How long a “decent amount of time” is will depend on how many queries you’re sending down and how long they take to execute. We can help you find that magic number if you like, based on our own analytics. Reach out to Halo Support if you’re interested!
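If you do check the status yourself, one common pattern is to wait an initial “decent amount of time” and then back off between checks. The sketch below is just one possible schedule: the `status_check_delays` helper and every number in it are illustrative assumptions, not recommended values from Halo.

```python
def status_check_delays(initial_wait, backoff=2.0, max_wait=600.0, attempts=5):
    """Return the seconds to wait before each status check.

    Starts at initial_wait and multiplies by backoff each time,
    never waiting longer than max_wait between checks.
    """
    delays = []
    wait = initial_wait
    for _ in range(attempts):
        delays.append(min(wait, max_wait))
        wait *= backoff
    return delays


# Example: first check after 1 minute, then doubling up to a 10 minute cap.
print(status_check_delays(60))  # [60, 120.0, 240.0, 480.0, 600.0]
```

Whatever schedule you choose, keep the total wait well inside the roughly 24 hour retention window so the result is still there when you come back for it.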

Webhook notifications on query completion

One way to know when an async query is complete is to poll the Query status endpoint, but that can be inefficient and can put unnecessary load on our systems. Our recommendation is to use webhooks instead. This will notify you as soon as the query result is ready (or if it errored), allowing you to fetch the data as quickly as possible and to avoid unnecessary API calls.

Check out this blog post about webhooks for more details, and contact Halo Support if you would like to get set up.

Fetching result pages

Once the query result is ready, there are two options for actually fetching the result pages:

Async streaming: Particularly recommended for large data, this method is also more performant and more resilient in general but may require a bit more setup. Check out this blog post for more details.
Serialised JSON: The simpler method, this endpoint returns each result page as base64-encoded JSON.

Either way, each page must be fetched individually and the data stitched back together to get the full query result.
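For the serialised JSON option, stitching might look something like the sketch below. It assumes each page’s payload is a base64 string that decodes to a JSON array of rows. That payload shape is an assumption for illustration, as are the `decode_page` and `stitch_pages` helper names, so check the API reference for the real response format.

```python
import base64
import json


def decode_page(encoded):
    """Decode one base64-encoded page into a list of rows (assumed shape)."""
    return json.loads(base64.b64decode(encoded))


def stitch_pages(encoded_pages):
    """Combine pages into one list of rows, ordered by page number.

    encoded_pages maps page number -> base64 string, so pages can be
    fetched in any order (or in parallel) and still be stitched correctly.
    """
    rows = []
    for page_number in sorted(encoded_pages):
        rows.extend(decode_page(encoded_pages[page_number]))
    return rows
```

Keying pages by page number rather than appending them as they arrive means a retried or out-of-order fetch can’t scramble the final result.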

More information

If you would like to try using async queries, you could check out:

Our documentation about the different query types
Some examples of creating asyncs, checking their status, and fetching results
The API reference section for async queries (under the SQL Passthrough heading)
