QWP client behaviour specification

Audience

This is the normative behaviour specification for QWP clients at startup, connection, failover, and store-and-forward (SF) durability — the contract every QuestDB language client is aligned to. It is for client implementers and for advanced users who need the exact contract.

It is derived from the Java reference client and is under active refinement. Code samples use the Java client for illustration, but the normative content is the behaviour and configuration tables, which apply to every client. Where a client currently diverges, this spec is the target.

Scope

This specifies client behaviour for three connection concerns — initial connect / startup, failover and reconnection, and store-and-forward (SF) durability — plus the connection pooling model that ties them together. The Quick start and Mental model sections are the minimum to configure a correct client; the Reference section is the exhaustive behaviour matrix; the Implementation appendix records non-normative reference-client internals.

Behaviours still being aligned across clients are marked ⚠ Sharp edge and listed under Known sharp edges. "Intended" items are deliberate contracts; "Candidate" items are likely defects targeted for change.

Quick start

Write-only client that tolerates the server being down at startup

Use the direct Sender API (not the QuestDB facade — see sharp edge #4).

String cfg = "ws::addr=db-a:9000,db-b:9000;"
        + "sf_dir=/var/lib/my-app/questdb-sf;"   // opt into disk durability
        + "sender_id=writer-1;"                  // unique per process per sf_dir
        + "initial_connect_retry=async;"         // non-blocking startup
        + "reconnect_max_duration_millis=86400000;" // outage budget (24h)
        + "sf_max_total_bytes=100g;";

// For production, prefer the builder so you can install an error handler:
try (Sender sender = Sender.builder(cfg)
        .errorHandler(myErrorHandler)            // see "Error visibility" below
        .connectionListener(myConnectionListener)
        .build()) {
    sender.table("telemetry").longColumn("v", 42).atNow();
    sender.flush(); // persists to SF storage; wire ACK is asynchronous
}

Why each line matters:

sf_dir is the only SF enable switch — there is no boolean flag.
initial_connect_retry=async is what makes build() return without a live socket. Without it, startup is blocking (see Mental model).
reconnect_max_duration_millis is the outage budget for both the initial connect and later reconnects. If it expires, the sender latches terminal and stops; data already in sf_dir survives for a future sender on the same slot.

Error visibility ⚠: the simplest path (Sender.fromConfig(...) + async) surfaces terminal async failures only later, through a producer call or at close(). For production, use Sender.builder(...) and install a SenderErrorHandler / SenderConnectionListener (sharp edge #7).

Read client that only reads from replicas

String cfg = "ws::addr=replica-a:9000,replica-b:9000,replica-c:9000;"
        + "target=replica;"   // without this, the client may bind a primary
        + "failover=on;";      // default; affects execute()-time recovery only

try (QuestDB db = QuestDB.connect(cfg)) {
    db.executeSql("select * from telemetry limit 10", myBatchHandler);
}

Why each line matters:

target=replica is required to avoid binding a primary/standalone server. The default target=any will accept any role.
failover=on is the default. It does not affect startup; it only governs reconnect+replay after a query connection that was already established later fails during execute().

Mental model

Three independent "connect" models live in one client

A QuestDB facade owns an ingest pool and a query pool. They do not share a startup model. You must hold all three in mind:

Concern	Controlled by	Startup is...
Ingest sender initial connect	`initial_connect_retry` = `off` / `sync` / `async`	one-shot / blocking-retry / background-retry
Query client initial connect	(no mode; always synchronous)	always blocking
Facade prewarm (how many of each connect at `build()`)	`sender_pool_min`, `query_pool_min`	eager if `min>0`, lazy if `min=0`

failover=on (query default) is not a startup setting — it only affects query execution after a connection exists. This naming trips people up (sharp edge #3).

Ingest initial-connect modes

`initial_connect_retry`	Mode	`build()` behavior on a down server
`off` / `false`	`OFF`	one attempt on caller thread; throws immediately
`on` / `true` / `sync`	`SYNC`	retry loop on caller thread, bounded by `reconnect_max_duration_millis` (blocks)
`async`	`ASYNC`	returns immediately; I/O thread retries in background

Default resolution ⚠: if you don't set initial_connect_retry explicitly but you do set any reconnect_* knob, the mode becomes SYNC — so a "resilience" knob silently turns startup into a multi-minute blocking retry. If no reconnect_* knob is set either, the mode is OFF. Always set initial_connect_retry explicitly to avoid this (sharp edge #1).

Facade prewarm

QuestDBBuilder.build() validates both configs (without connecting), then eagerly creates min connections per pool. Consequences:

Configuration	Build-time network behavior
defaults (`min=1` both)	creates one sender + one query client; build fails if either cannot connect — unless ingest uses `initial_connect_retry=async`
`sender_pool_min=0`	no sender at build; first `borrowSender()`/`sender()` creates it (then follows the ingest initial-connect mode)
`query_pool_min=0`	no query client at build; first query `submit()` creates it
both mins `0`	config-only validation at build; all network work is lazy

After prewarm, both pools grow lazily up to max on demand, and shrink back to min when idle. Growth uses the same real connect path as prewarm. At max, callers block up to acquire_timeout_ms then throw.

Defaults (single source of truth)

Pool (facade only)

Key / builder	Default
`sender_pool_min`	`1`
`sender_pool_max`	`4`
`query_pool_min`	`1`
`query_pool_max`	`4`
`acquire_timeout_ms`	`5000`
`idle_timeout_ms`	`60000` (`0` ⇒ infinite)
`max_lifetime_ms`	`1800000` (`0` ⇒ infinite)
`housekeeper_interval_ms`	`5000`

Ingest sender (SF + reconnect)

Key	Default
`sender_id`	`default`
`sf_max_bytes` (segment size)	`4 MiB`
`sf_max_total_bytes` (SF mode)	`10 GiB`
`sf_durability`	`MEMORY`
`sf_append_deadline_millis`	`30000`
`reconnect_max_duration_millis`	`300000` (`0` ⇒ give up immediately, not infinite ⚠)
`reconnect_initial_backoff_millis`	`100`
`reconnect_max_backoff_millis`	`5000`
`close_flush_timeout_millis`	`60000`
`auth_timeout_ms`	`15000`

Query client

Key	Default
`target`	`any`
`failover`	`on`
`failover_max_attempts`	`8` (incl. original)
`failover_max_duration_ms`	`30000` (`0` disables the duration cap)
`failover_backoff_initial_ms`	`50`
`failover_backoff_max_ms`	`1000`
`auth_timeout_ms`	`15000`
`serverInfoTimeoutMs`	`5000` (builder API only — no config key ⚠)

Note the inconsistent 0 convention: idle_timeout_ms=0/max_lifetime_ms=0 mean infinite, but reconnect_max_duration_millis=0 means give up now (sharp edge #2).

Knob availability by surface

Three configuration surfaces exist. Not every knob is reachable from every surface — this matrix shows where each lives.

Conn string: a ws/wss config string. Works for Sender.fromConfig, QwpQueryClient.fromConfig, and QuestDB.connect(...).
Sender builder: Sender.builder(...) (LineSenderBuilder) — direct ingest only.
Facade builder: QuestDB.builder() (QuestDBBuilder) — pool knobs only; query/ingest behavior must come from the conn string.

Knob	Conn string	Sender builder	Facade builder
`addr`	✅	✅ `address()/port()`	via conn string
`username`/`password`/`token`	✅	✅	via conn string
`tls_verify`/`tls_roots`	✅	✅	via conn string
`auth_timeout_ms`	✅	✅	via conn string
`initial_connect_retry`	✅	✅ `initialConnectMode()`	via conn string
`reconnect_*`	✅	✅	via conn string
`sf_dir`/`sender_id`/`sf_*`	✅	✅	via conn string
`request_durable_ack`	✅	✅	via conn string
`close_flush_timeout_millis`	✅	✅	via conn string
`SenderErrorHandler`	❌	✅ `errorHandler()`	❌ (not reachable)
`SenderConnectionListener`	❌	✅ `connectionListener()`	❌ (not reachable)
`target`	✅	n/a	via conn string
`failover`/`failover_*`	✅	n/a	via conn string
`serverInfoTimeoutMs`	❌	n/a	❌ (QwpQueryClient builder only)
`sender_pool_`/`query_pool_`	✅	n/a	✅
`acquire_timeout_ms`/`idle_timeout_ms`/`max_lifetime_ms`	✅	n/a	✅

⚠ Gaps worth noting: the ingest error handler / connection listener cannot be installed through the facade at all, and serverInfoTimeoutMs has no config key, so a facade query client cannot tune it (sharp edge #6).

Known sharp edges

These are behaviours still under review as clients are aligned. "Intended" means a deliberate contract that will be kept; "Candidate" means a likely ergonomic defect targeted for change. The numbered references throughout this spec point here.

#	Sharp edge	Status
1	`initial_connect_retry` is implicitly promoted to `SYNC` when any `reconnect_*` knob is set — a resilience knob silently makes startup block.	Candidate
2	`reconnect_max_duration_millis` name implies "reconnect only" but also governs initial connect; `0` means "give up now" while sibling `0`s mean "infinite"; no infinite mode exists.	Candidate
3	`failover` sounds like it covers startup but only affects post-connect query `execute()`. Queries have no async/lazy initial connect at all.	Candidate
4	No first-class write-only facade: a write-only user must still supply a query config and remember `query_pool_min=0`.	Candidate
5	A single endpoint returning `401`/`403` is treated as cluster-wide terminal and aborts the whole endpoint walk, even at startup, even if other endpoints would accept the credentials.	Intended (documented), revisit
6	Ingest `errorHandler`/`connectionListener` and query `serverInfoTimeoutMs` are unreachable from the facade.	Candidate
7	The simplest API (`fromConfig` + async) has the worst error visibility — terminal async failures surface only on later producer calls or at `close()`.	Candidate
8	No client-side TCP connect timeout: a black-holed host in `addr` blocks the endpoint walk until the OS connect timeout.	Intended (transport limitation), revisit

Reference

Store-and-forward semantics

sf_dir=... enables SF. There is no separate boolean enable flag.

The sender owns one slot: <sf_dir>/<sender_id>/. Default sender_id is default.
Multiple independent senders sharing one sf_dir must use distinct sender_id values, else the second fails because the slot lock is held.
In pooled QuestDB usage, SenderPool derives per-slot IDs from the base: <base>-0, <base>-1, … so pooled senders never collide.
On restart, the cursor engine opens existing segment files and replays unacknowledged frames; acknowledged/truncated frames are not replayed.

flush() semantics (QWP sender):

Encodes pending rows into the cursor engine.
In SF mode, data is persisted to mmap-backed segment files before flush() returns.
flush() does not wait for server ACKs unless backpressure requires space. The I/O thread sends frames and trims ACKed frames asynchronously.
drain(timeoutMillis) flushes and waits for the server to ACK all currently published frames, up to the timeout.
close() flushes then waits up to close_flush_timeout_millis for ACKs, unless that timeout is <= 0.

Async initial connect (ingest)

With initial_connect_retry=async:

build() returns without a live socket; wasEverConnected() is false.
Producer calls and flush() can run before the server exists; frames accumulate in the cursor engine (and on disk with sf_dir).
The I/O thread retries in the background using the same loop used after wire failure.
If a server appears before the budget expires, buffered frames are sent/replayed and ACK-driven trimming begins.
If the budget expires before any connection, the sender latches a terminal SenderError whose message contains never-connected-budget-exhausted.
If it connected at least once and a later outage exhausts the budget, the message contains connection-lost-budget-exhausted.
Terminal async errors go to a configured SenderErrorHandler; without one they surface on later producer calls or at close-time.

There is no infinite-retry mode. For long maintenance windows, set a large reconnect_max_duration_millis. On budget exhaustion the current sender stops; persisted sf_dir data remains for a future sender on the same slot.

Ingest endpoint walk (`addr=a:9000,b:9000,...`)

Per-endpoint result	Sender behavior
DNS failure	transport error; try next endpoint
TCP connect failure	transport error; try next endpoint
TLS session/certificate failure	transport error; try next endpoint
HTTP upgrade timeout / non-auth transport error	try next endpoint
`421` with `X-QuestDB-Role: REPLICA`	role reject; try next endpoint
`401` / `403` auth failure	terminal; do not try later endpoints ⚠
durable-ack requested but unsupported	terminal mismatch
successful write upgrade	bind this endpoint
all endpoints fail transport	throw / retry per initial/reconnect mode
all endpoints role-reject as replicas	`QwpRoleMismatchException`

Query client initial connect

QwpQueryClient.connect() is synchronous. Per endpoint it: opens TCP/TLS, performs the WebSocket upgrade to /read/v1, reads the initial SERVER_INFO frame, applies the target= role filter, and starts the egress I/O thread on the first match. If no endpoint can be used, it throws. There is no async initial-connect mode for queries.

target= matching:

Target	Accepted roles
`any`	any role
`primary`	`PRIMARY`, `PRIMARY_CATCHUP`, `STANDALONE`
`replica`	`REPLICA` only

Query initial-connect endpoint matrix:

Per-endpoint result	Behavior
DNS / TCP / TLS failure	record transport error; try next endpoint
HTTP upgrade timeout	transport error; try next endpoint
HTTP `401` / `403`	terminal `QwpAuthFailedException`; do not try later ⚠
HTTP `421` + role header	role reject; try next endpoint
upgrade ok but no `SERVER_INFO` before timeout	transport error; try next
`SERVER_INFO` role ≠ `target`	role reject; try next endpoint
endpoint matches target	bind and return success
all endpoints transport-fail	`HttpClientException: all QWP endpoints unreachable ...`
all endpoints role-reject	`QwpRoleMismatchException`

auth_timeout_ms bounds the upgrade/auth phase after TCP connect. There is no separate client-side TCP connect timeout, so a black-holed connect blocks until the OS timeout before the walk advances ⚠.

Query execution-time failover

With failover=on:

A transport/protocol terminal failure during execute() is intercepted; the client reconnects via the host tracker and re-submits.
The handler receives onFailoverReset(...) before replayed batches.
Bounded by failover_max_attempts (default 8, incl. original) and failover_max_duration_ms (default 30000; 0 disables the duration cap).
Backoff: failover_backoff_initial_ms=50, failover_backoff_max_ms=1000.
Auth failure during failover reconnect is terminal and reported to the handler.

With failover=off, a transport failure is reported to the handler with no reconnect/replay.

Scenario matrix

Facade startup

Scenario	Config	Result
Default `connect`, all servers down	default mins	build fails
Default `connect`, first endpoint down, second works	multi-addr	build can succeed; each prewarmed client walks endpoints
Write-only-ish startup while down	`query_pool_min=0` + sender async	build returns
Fully lazy startup	both mins `0`	build returns after validation only
Query first use after lazy startup while down	`query_pool_min=0`	first `submit()` throws
Sender first use after lazy startup while down	`sender_pool_min=0`	first sender creation follows ingest initial mode

Direct sender startup

Scenario	Config	Result
server down, default mode	no `reconnect_*`, no async	one attempt; build throws
server down, reconnect duration set, no mode	`reconnect_max_duration_millis=...`	synchronous retry; build blocks ⚠
server down, async	`initial_connect_retry=async`	build returns; I/O thread retries
server returns `401`/`403`	any mode	terminal auth failure; no endpoint continuation
server appears before async budget	async + budget	buffered frames sent and ACKed
server appears after async budget	async + exhausted	sender terminal; new sender/restart needed

Read-replica startup (one bad endpoint, another replica works)

Bad endpoint type	Continue to working replica?	Notes
DNS failure	Yes	transport error
TCP refused/unreachable	Yes	transport error; black-hole waits for OS timeout
TLS handshake failure	Yes	transport error
HTTP upgrade timeout	Yes	after `auth_timeout_ms`
upgrades but no `SERVER_INFO`	Yes	after `serverInfoTimeoutMs` (builder only)
primary/standalone while `target=replica`	Yes	role mismatch
`421` role reject	Yes	try next
`401`/`403`	No	auth treated as cluster-wide terminal ⚠
broken shared TLS/trust store	No	every endpoint fails
all endpoints down	No	`all QWP endpoints unreachable`
reachable but none match `target`	No	`QwpRoleMismatchException`

Implementation appendix

Non-normative. Documents how the Java reference client implements this spec; useful while aligning other clients. Primary source areas:

io.questdb.client.QuestDB / QuestDBBuilder
io.questdb.client.impl.SenderPool / QueryClientPool / PoolHousekeeper
io.questdb.client.Sender.LineSenderBuilder
io.questdb.client.cutlass.qwp.client.QwpWebSocketSender
io.questdb.client.cutlass.qwp.client.QwpQueryClient
io.questdb.client.cutlass.qwp.client.sf.cursor.CursorSendEngine
io.questdb.client.cutlass.qwp.client.sf.cursor.CursorWebSocketSendLoop
io.questdb.client.cutlass.qwp.client.QwpHostHealthTracker
io.questdb.client.impl.ConfigSchema (the single key registry)

`QuestDBBuilder.build()` steps

Require both ingest and query configs.
Parse + validate both configs without connecting (runs even when mins are 0; malformed pool/ingest/query/TLS/auth/enum/range values fail here).
Resolve pool keys: explicit builder setters override conn-string keys; conflicting pool values across the two conn strings fail.
Construct SenderPool and QueryClientPool.
Eagerly create min connections per pool.
Start the PoolHousekeeper.

Initial-connect mode resolution (`Sender.java`)

if initialConnectMode set explicitly -> use it (incl. OFF + tuned budget)
else if any reconnect_* set          -> SYNC
else                                 -> OFF

Pooled SF startup recovery nuance

Live/prewarmed sender slots recover their own unacked data via their CursorSendEngine.
Non-live managed slots are scanned by the housekeeper startup recovery path, so build() does not block on stranded slots.
Recovery of non-live stranded slots is best-effort and bounded: a build/drain failure aborts that scan; data stays durable for a later attempt, but the current process does not retry the aborted scan indefinitely.
For immediate background drain of all slots, keep enough sender_pool_min slots warm or construct direct senders for the slots that must actively retry.

Reconnect deadline (`CursorWebSocketSendLoop`)

deadlineNanos = outageStartNanos + reconnect_max_duration_millis * 1e6; the loop runs while (running && now < deadline). Hence 0 ⇒ no iterations ⇒ immediate give-up. QwpAuthFailedException / WebSocketUpgradeException inside the loop are terminal across all endpoints.

Scope​

Quick start​

Write-only client that tolerates the server being down at startup​

Read client that only reads from replicas​

Mental model​

Three independent "connect" models live in one client​

Ingest initial-connect modes​

Facade prewarm​

Defaults (single source of truth)​

Pool (facade only)​

Ingest sender (SF + reconnect)​

Query client​

Knob availability by surface​

Known sharp edges​

Reference​

Store-and-forward semantics​

Async initial connect (ingest)​

Ingest endpoint walk (addr=a:9000,b:9000,...)​

Query client initial connect​

Query execution-time failover​

Scenario matrix​

Facade startup​

Direct sender startup​

Read-replica startup (one bad endpoint, another replica works)​

Implementation appendix​

QuestDBBuilder.build() steps​

Initial-connect mode resolution (Sender.java)​

Pooled SF startup recovery nuance​

Reconnect deadline (CursorWebSocketSendLoop)​