QWP client behaviour specification
This is the normative behaviour specification for QWP clients at startup, connection, failover, and store-and-forward (SF) durability — the contract every QuestDB language client is aligned to. It is for client implementers and for advanced users who need the exact contract.
It is derived from the Java reference client and is under active refinement. Code samples use the Java client for illustration, but the normative content is the behaviour and configuration tables, which apply to every client. Where a client currently diverges, this spec is the target.
Scope
This specifies client behaviour for three connection concerns — initial connect / startup, failover and reconnection, and store-and-forward (SF) durability — plus the connection pooling model that ties them together. The Quick start and Mental model sections are the minimum to configure a correct client; the Reference section is the exhaustive behaviour matrix; the Implementation appendix records non-normative reference-client internals.
Behaviours still being aligned across clients are marked ⚠ Sharp edge and listed under Known sharp edges. "Intended" items are deliberate contracts; "Candidate" items are likely defects targeted for change.
Quick start
Write-only client that tolerates the server being down at startup
Use the direct Sender API (not the QuestDB facade — see
sharp edge #4).
String cfg = "ws::addr=db-a:9000,db-b:9000;"
+ "sf_dir=/var/lib/my-app/questdb-sf;" // opt into disk durability
+ "sender_id=writer-1;" // unique per process per sf_dir
+ "initial_connect_retry=async;" // non-blocking startup
+ "reconnect_max_duration_millis=86400000;" // outage budget (24h)
+ "sf_max_total_bytes=100g;";
// For production, prefer the builder so you can install an error handler:
try (Sender sender = Sender.builder(cfg)
.errorHandler(myErrorHandler) // see "Error visibility" below
.connectionListener(myConnectionListener)
.build()) {
sender.table("telemetry").longColumn("v", 42).atNow();
sender.flush(); // persists to SF storage; wire ACK is asynchronous
}
Why each line matters:
sf_diris the only SF enable switch — there is no boolean flag.initial_connect_retry=asyncis what makesbuild()return without a live socket. Without it, startup is blocking (see Mental model).reconnect_max_duration_millisis the outage budget for both the initial connect and later reconnects. If it expires, the sender latches terminal and stops; data already insf_dirsurvives for a future sender on the same slot.
Error visibility ⚠: the simplest path (Sender.fromConfig(...) + async)
surfaces terminal async failures only later, through a producer call or at
close(). For production, use Sender.builder(...) and install a
SenderErrorHandler / SenderConnectionListener
(sharp edge #7).
Read client that only reads from replicas
String cfg = "ws::addr=replica-a:9000,replica-b:9000,replica-c:9000;"
+ "target=replica;" // without this, the client may bind a primary
+ "failover=on;"; // default; affects execute()-time recovery only
try (QuestDB db = QuestDB.connect(cfg)) {
db.executeSql("select * from telemetry limit 10", myBatchHandler);
}
Why each line matters:
target=replicais required to avoid binding a primary/standalone server. The defaulttarget=anywill accept any role.failover=onis the default. It does not affect startup; it only governs reconnect+replay after a query connection that was already established later fails duringexecute().
Mental model
Three independent "connect" models live in one client
A QuestDB facade owns an ingest pool and a query pool. They do not
share a startup model. You must hold all three in mind:
| Concern | Controlled by | Startup is... |
|---|---|---|
| Ingest sender initial connect | initial_connect_retry = off / sync / async | one-shot / blocking-retry / background-retry |
| Query client initial connect | (no mode; always synchronous) | always blocking |
Facade prewarm (how many of each connect at build()) | sender_pool_min, query_pool_min | eager if min>0, lazy if min=0 |
failover=on (query default) is not a startup setting — it only affects
query execution after a connection exists. This naming trips people up
(sharp edge #3).
Ingest initial-connect modes
initial_connect_retry | Mode | build() behavior on a down server |
|---|---|---|
off / false | OFF | one attempt on caller thread; throws immediately |
on / true / sync | SYNC | retry loop on caller thread, bounded by reconnect_max_duration_millis (blocks) |
async | ASYNC | returns immediately; I/O thread retries in background |
Default resolution ⚠: if you don't set initial_connect_retry explicitly but
you do set any reconnect_* knob, the mode becomes SYNC — so a "resilience"
knob silently turns startup into a multi-minute blocking retry. If no
reconnect_* knob is set either, the mode is OFF. Always set
initial_connect_retry explicitly to avoid this (sharp edge #1).
Facade prewarm
QuestDBBuilder.build() validates both configs (without connecting), then
eagerly creates min connections per pool. Consequences:
| Configuration | Build-time network behavior |
|---|---|
defaults (min=1 both) | creates one sender + one query client; build fails if either cannot connect — unless ingest uses initial_connect_retry=async |
sender_pool_min=0 | no sender at build; first borrowSender()/sender() creates it (then follows the ingest initial-connect mode) |
query_pool_min=0 | no query client at build; first query submit() creates it |
both mins 0 | config-only validation at build; all network work is lazy |
After prewarm, both pools grow lazily up to max on demand, and shrink back to
min when idle. Growth uses the same real connect path as prewarm. At max,
callers block up to acquire_timeout_ms then throw.
Defaults (single source of truth)
Pool (facade only)
| Key / builder | Default |
|---|---|
sender_pool_min | 1 |
sender_pool_max | 4 |
query_pool_min | 1 |
query_pool_max | 4 |
acquire_timeout_ms | 5000 |
idle_timeout_ms | 60000 (0 ⇒ infinite) |
max_lifetime_ms | 1800000 (0 ⇒ infinite) |
housekeeper_interval_ms | 5000 |
Ingest sender (SF + reconnect)
| Key | Default |
|---|---|
sender_id | default |
sf_max_bytes (segment size) | 4 MiB |
sf_max_total_bytes (SF mode) | 10 GiB |
sf_durability | MEMORY |
sf_append_deadline_millis | 30000 |
reconnect_max_duration_millis | 300000 (0 ⇒ give up immediately, not infinite ⚠) |
reconnect_initial_backoff_millis | 100 |
reconnect_max_backoff_millis | 5000 |
close_flush_timeout_millis | 60000 |
auth_timeout_ms | 15000 |
Query client
| Key | Default |
|---|---|
target | any |
failover | on |
failover_max_attempts | 8 (incl. original) |
failover_max_duration_ms | 30000 (0 disables the duration cap) |
failover_backoff_initial_ms | 50 |
failover_backoff_max_ms | 1000 |
auth_timeout_ms | 15000 |
serverInfoTimeoutMs | 5000 (builder API only — no config key ⚠) |
Note the inconsistent 0 convention: idle_timeout_ms=0/max_lifetime_ms=0
mean infinite, but reconnect_max_duration_millis=0 means give up now
(sharp edge #2).
Knob availability by surface
Three configuration surfaces exist. Not every knob is reachable from every surface — this matrix shows where each lives.
- Conn string: a
ws/wssconfig string. Works forSender.fromConfig,QwpQueryClient.fromConfig, andQuestDB.connect(...). - Sender builder:
Sender.builder(...)(LineSenderBuilder) — direct ingest only. - Facade builder:
QuestDB.builder()(QuestDBBuilder) — pool knobs only; query/ingest behavior must come from the conn string.
| Knob | Conn string | Sender builder | Facade builder |
|---|---|---|---|
addr | ✅ | ✅ address()/port() | via conn string |
username/password/token | ✅ | ✅ | via conn string |
tls_verify/tls_roots | ✅ | ✅ | via conn string |
auth_timeout_ms | ✅ | ✅ | via conn string |
initial_connect_retry | ✅ | ✅ initialConnectMode() | via conn string |
reconnect_* | ✅ | ✅ | via conn string |
sf_dir/sender_id/sf_* | ✅ | ✅ | via conn string |
request_durable_ack | ✅ | ✅ | via conn string |
close_flush_timeout_millis | ✅ | ✅ | via conn string |
SenderErrorHandler | ❌ | ✅ errorHandler() | ❌ (not reachable) |
SenderConnectionListener | ❌ | ✅ connectionListener() | ❌ (not reachable) |
target | ✅ | n/a | via conn string |
failover/failover_* | ✅ | n/a | via conn string |
serverInfoTimeoutMs | ❌ | n/a | ❌ (QwpQueryClient builder only) |
sender_pool_*/query_pool_* | ✅ | n/a | ✅ |
acquire_timeout_ms/idle_timeout_ms/max_lifetime_ms | ✅ | n/a | ✅ |
⚠ Gaps worth noting: the ingest error handler / connection listener cannot
be installed through the facade at all, and serverInfoTimeoutMs has no
config key, so a facade query client cannot tune it
(sharp edge #6).
Known sharp edges
These are behaviours still under review as clients are aligned. "Intended" means a deliberate contract that will be kept; "Candidate" means a likely ergonomic defect targeted for change. The numbered references throughout this spec point here.
| # | Sharp edge | Status |
|---|---|---|
| 1 | initial_connect_retry is implicitly promoted to SYNC when any reconnect_* knob is set — a resilience knob silently makes startup block. | Candidate |
| 2 | reconnect_max_duration_millis name implies "reconnect only" but also governs initial connect; 0 means "give up now" while sibling 0s mean "infinite"; no infinite mode exists. | Candidate |
| 3 | failover sounds like it covers startup but only affects post-connect query execute(). Queries have no async/lazy initial connect at all. | Candidate |
| 4 | No first-class write-only facade: a write-only user must still supply a query config and remember query_pool_min=0. | Candidate |
| 5 | A single endpoint returning 401/403 is treated as cluster-wide terminal and aborts the whole endpoint walk, even at startup, even if other endpoints would accept the credentials. | Intended (documented), revisit |
| 6 | Ingest errorHandler/connectionListener and query serverInfoTimeoutMs are unreachable from the facade. | Candidate |
| 7 | The simplest API (fromConfig + async) has the worst error visibility — terminal async failures surface only on later producer calls or at close(). | Candidate |
| 8 | No client-side TCP connect timeout: a black-holed host in addr blocks the endpoint walk until the OS connect timeout. | Intended (transport limitation), revisit |
Reference
Store-and-forward semantics
sf_dir=... enables SF. There is no separate boolean enable flag.
- The sender owns one slot:
<sf_dir>/<sender_id>/. Defaultsender_idisdefault. - Multiple independent senders sharing one
sf_dirmust use distinctsender_idvalues, else the second fails because the slot lock is held. - In pooled
QuestDBusage,SenderPoolderives per-slot IDs from the base:<base>-0,<base>-1, … so pooled senders never collide. - On restart, the cursor engine opens existing segment files and replays unacknowledged frames; acknowledged/truncated frames are not replayed.
flush() semantics (QWP sender):
- Encodes pending rows into the cursor engine.
- In SF mode, data is persisted to mmap-backed segment files before
flush()returns. flush()does not wait for server ACKs unless backpressure requires space. The I/O thread sends frames and trims ACKed frames asynchronously.drain(timeoutMillis)flushes and waits for the server to ACK all currently published frames, up to the timeout.close()flushes then waits up toclose_flush_timeout_millisfor ACKs, unless that timeout is<= 0.
Async initial connect (ingest)
With initial_connect_retry=async:
build()returns without a live socket;wasEverConnected()isfalse.- Producer calls and
flush()can run before the server exists; frames accumulate in the cursor engine (and on disk withsf_dir). - The I/O thread retries in the background using the same loop used after wire failure.
- If a server appears before the budget expires, buffered frames are sent/replayed and ACK-driven trimming begins.
- If the budget expires before any connection, the sender latches a terminal
SenderErrorwhose message containsnever-connected-budget-exhausted. - If it connected at least once and a later outage exhausts the budget, the
message contains
connection-lost-budget-exhausted. - Terminal async errors go to a configured
SenderErrorHandler; without one they surface on later producer calls or at close-time.
There is no infinite-retry mode. For long maintenance windows, set a large
reconnect_max_duration_millis. On budget exhaustion the current sender stops;
persisted sf_dir data remains for a future sender on the same slot.
Ingest endpoint walk (addr=a:9000,b:9000,...)
| Per-endpoint result | Sender behavior |
|---|---|
| DNS failure | transport error; try next endpoint |
| TCP connect failure | transport error; try next endpoint |
| TLS session/certificate failure | transport error; try next endpoint |
| HTTP upgrade timeout / non-auth transport error | try next endpoint |
421 with X-QuestDB-Role: REPLICA | role reject; try next endpoint |
401 / 403 auth failure | terminal; do not try later endpoints ⚠ |
| durable-ack requested but unsupported | terminal mismatch |
| successful write upgrade | bind this endpoint |
| all endpoints fail transport | throw / retry per initial/reconnect mode |
| all endpoints role-reject as replicas | QwpRoleMismatchException |
Query client initial connect
QwpQueryClient.connect() is synchronous. Per endpoint it: opens TCP/TLS,
performs the WebSocket upgrade to /read/v1, reads the initial SERVER_INFO
frame, applies the target= role filter, and starts the egress I/O thread on
the first match. If no endpoint can be used, it throws. There is no async
initial-connect mode for queries.
target= matching:
| Target | Accepted roles |
|---|---|
any | any role |
primary | PRIMARY, PRIMARY_CATCHUP, STANDALONE |
replica | REPLICA only |
Query initial-connect endpoint matrix:
| Per-endpoint result | Behavior |
|---|---|
| DNS / TCP / TLS failure | record transport error; try next endpoint |
| HTTP upgrade timeout | transport error; try next endpoint |
HTTP 401 / 403 | terminal QwpAuthFailedException; do not try later ⚠ |
HTTP 421 + role header | role reject; try next endpoint |
upgrade ok but no SERVER_INFO before timeout | transport error; try next |
SERVER_INFO role ≠ target | role reject; try next endpoint |
| endpoint matches target | bind and return success |
| all endpoints transport-fail | HttpClientException: all QWP endpoints unreachable ... |
| all endpoints role-reject | QwpRoleMismatchException |
auth_timeout_ms bounds the upgrade/auth phase after TCP connect. There is
no separate client-side TCP connect timeout, so a black-holed connect blocks
until the OS timeout before the walk advances ⚠.
Query execution-time failover
With failover=on:
- A transport/protocol terminal failure during
execute()is intercepted; the client reconnects via the host tracker and re-submits. - The handler receives
onFailoverReset(...)before replayed batches. - Bounded by
failover_max_attempts(default8, incl. original) andfailover_max_duration_ms(default30000;0disables the duration cap). - Backoff:
failover_backoff_initial_ms=50,failover_backoff_max_ms=1000. - Auth failure during failover reconnect is terminal and reported to the handler.
With failover=off, a transport failure is reported to the handler with no
reconnect/replay.
Scenario matrix
Facade startup
| Scenario | Config | Result |
|---|---|---|
Default connect, all servers down | default mins | build fails |
Default connect, first endpoint down, second works | multi-addr | build can succeed; each prewarmed client walks endpoints |
| Write-only-ish startup while down | query_pool_min=0 + sender async | build returns |
| Fully lazy startup | both mins 0 | build returns after validation only |
| Query first use after lazy startup while down | query_pool_min=0 | first submit() throws |
| Sender first use after lazy startup while down | sender_pool_min=0 | first sender creation follows ingest initial mode |
Direct sender startup
| Scenario | Config | Result |
|---|---|---|
| server down, default mode | no reconnect_*, no async | one attempt; build throws |
| server down, reconnect duration set, no mode | reconnect_max_duration_millis=... | synchronous retry; build blocks ⚠ |
| server down, async | initial_connect_retry=async | build returns; I/O thread retries |
server returns 401/403 | any mode | terminal auth failure; no endpoint continuation |
| server appears before async budget | async + budget | buffered frames sent and ACKed |
| server appears after async budget | async + exhausted | sender terminal; new sender/restart needed |
Read-replica startup (one bad endpoint, another replica works)
| Bad endpoint type | Continue to working replica? | Notes |
|---|---|---|
| DNS failure | Yes | transport error |
| TCP refused/unreachable | Yes | transport error; black-hole waits for OS timeout |
| TLS handshake failure | Yes | transport error |
| HTTP upgrade timeout | Yes | after auth_timeout_ms |
upgrades but no SERVER_INFO | Yes | after serverInfoTimeoutMs (builder only) |
primary/standalone while target=replica | Yes | role mismatch |
421 role reject | Yes | try next |
401/403 | No | auth treated as cluster-wide terminal ⚠ |
| broken shared TLS/trust store | No | every endpoint fails |
| all endpoints down | No | all QWP endpoints unreachable |
reachable but none match target | No | QwpRoleMismatchException |
Implementation appendix
Non-normative. Documents how the Java reference client implements this spec; useful while aligning other clients. Primary source areas:
io.questdb.client.QuestDB/QuestDBBuilderio.questdb.client.impl.SenderPool/QueryClientPool/PoolHousekeeperio.questdb.client.Sender.LineSenderBuilderio.questdb.client.cutlass.qwp.client.QwpWebSocketSenderio.questdb.client.cutlass.qwp.client.QwpQueryClientio.questdb.client.cutlass.qwp.client.sf.cursor.CursorSendEngineio.questdb.client.cutlass.qwp.client.sf.cursor.CursorWebSocketSendLoopio.questdb.client.cutlass.qwp.client.QwpHostHealthTrackerio.questdb.client.impl.ConfigSchema(the single key registry)
QuestDBBuilder.build() steps
- Require both ingest and query configs.
- Parse + validate both configs without connecting (runs even when mins are
0; malformed pool/ingest/query/TLS/auth/enum/range values fail here). - Resolve pool keys: explicit builder setters override conn-string keys; conflicting pool values across the two conn strings fail.
- Construct
SenderPoolandQueryClientPool. - Eagerly create
minconnections per pool. - Start the
PoolHousekeeper.
Initial-connect mode resolution (Sender.java)
if initialConnectMode set explicitly -> use it (incl. OFF + tuned budget)
else if any reconnect_* set -> SYNC
else -> OFF
Pooled SF startup recovery nuance
- Live/prewarmed sender slots recover their own unacked data via their
CursorSendEngine. - Non-live managed slots are scanned by the housekeeper startup recovery path,
so
build()does not block on stranded slots. - Recovery of non-live stranded slots is best-effort and bounded: a build/drain failure aborts that scan; data stays durable for a later attempt, but the current process does not retry the aborted scan indefinitely.
- For immediate background drain of all slots, keep enough
sender_pool_minslots warm or construct direct senders for the slots that must actively retry.
Reconnect deadline (CursorWebSocketSendLoop)
deadlineNanos = outageStartNanos + reconnect_max_duration_millis * 1e6; the
loop runs while (running && now < deadline). Hence 0 ⇒ no iterations ⇒
immediate give-up. QwpAuthFailedException / WebSocketUpgradeException inside
the loop are terminal across all endpoints.