QWP client behaviour specification

Audience

This is the normative behaviour specification for QWP clients at startup, connection, failover, and store-and-forward (SF) durability — the contract every QuestDB language client is aligned to. It is for client implementers and for advanced users who need the exact contract.

It is derived from the Java reference client and is under active refinement. Code samples use the Java client for illustration, but the normative content is the behaviour and configuration tables, which apply to every client. Where a client currently diverges, this spec is the target.

Scope

This document specifies client behaviour for three connection concerns: initial connect / startup, failover and reconnection, and store-and-forward (SF) durability. It also covers the connection pooling model that ties them together. The Quick start and Mental model sections are the minimum needed to configure a correct client; the Reference section is the exhaustive behaviour matrix; the Implementation appendix records non-normative reference-client internals.

Behaviours still being aligned across clients are marked ⚠ Sharp edge and listed under Known sharp edges. "Intended" items are deliberate contracts; "Candidate" items are likely defects targeted for change.


Quick start

Write-only client that tolerates the server being down at startup

Use the direct Sender API (not the QuestDB facade — see sharp edge #4).

String cfg = "ws::addr=db-a:9000,db-b:9000;"
        + "sf_dir=/var/lib/my-app/questdb-sf;"        // opt into disk durability
        + "sender_id=writer-1;"                       // unique per process per sf_dir
        + "initial_connect_retry=async;"              // non-blocking startup
        + "reconnect_max_duration_millis=86400000;"   // outage budget (24h)
        + "sf_max_total_bytes=100g;";

// For production, prefer the builder so you can install an error handler:
try (Sender sender = Sender.builder(cfg)
        .errorHandler(myErrorHandler)                 // see "Error visibility" below
        .connectionListener(myConnectionListener)
        .build()) {
    sender.table("telemetry").longColumn("v", 42).atNow();
    sender.flush();                                   // persists to SF storage; wire ACK is asynchronous
}

Why each line matters:

  • sf_dir is the only SF enable switch — there is no boolean flag.
  • initial_connect_retry=async is what makes build() return without a live socket. Without it, startup is blocking (see Mental model).
  • reconnect_max_duration_millis is the outage budget for both the initial connect and later reconnects. If it expires, the sender latches terminal and stops; data already in sf_dir survives for a future sender on the same slot.

Error visibility ⚠: the simplest path (Sender.fromConfig(...) + async) surfaces terminal async failures only later, through a producer call or at close(). For production, use Sender.builder(...) and install a SenderErrorHandler / SenderConnectionListener (sharp edge #7).

Read client that only reads from replicas

String cfg = "ws::addr=replica-a:9000,replica-b:9000,replica-c:9000;"
        + "target=replica;"    // without this, the client may bind a primary
        + "failover=on;";      // default; affects execute()-time recovery only

try (QuestDB db = QuestDB.connect(cfg)) {
    db.executeSql("select * from telemetry limit 10", myBatchHandler);
}

Why each line matters:

  • target=replica is required to avoid binding a primary/standalone server. The default target=any will accept any role.
  • failover=on is the default. It does not affect startup; it only governs reconnect+replay after a query connection that was already established later fails during execute().

Mental model

Three independent "connect" models live in one client

A QuestDB facade owns an ingest pool and a query pool, and the two pools do not share a startup model. Startup is governed by three separate concerns, and you need to keep all three in mind:

| Concern | Controlled by | Startup is... |
|---|---|---|
| Ingest sender initial connect | initial_connect_retry = off / sync / async | one-shot / blocking-retry / background-retry |
| Query client initial connect | (no mode; always synchronous) | always blocking |
| Facade prewarm (how many of each connect at build()) | sender_pool_min, query_pool_min | eager if min>0, lazy if min=0 |

failover=on (query default) is not a startup setting — it only affects query execution after a connection exists. This naming trips people up (sharp edge #3).

Ingest initial-connect modes

| initial_connect_retry | Mode | build() behavior on a down server |
|---|---|---|
| off / false | OFF | one attempt on caller thread; throws immediately |
| on / true / sync | SYNC | retry loop on caller thread, bounded by reconnect_max_duration_millis (blocks) |
| async | ASYNC | returns immediately; I/O thread retries in background |

Default resolution ⚠: if you don't set initial_connect_retry explicitly but you do set any reconnect_* knob, the mode becomes SYNC — so a "resilience" knob silently turns startup into a multi-minute blocking retry. If no reconnect_* knob is set either, the mode is OFF. Always set initial_connect_retry explicitly to avoid this (sharp edge #1).

Facade prewarm

QuestDBBuilder.build() validates both configs (without connecting), then eagerly creates min connections per pool. Consequences:

| Configuration | Build-time network behavior |
|---|---|
| defaults (min=1 both) | creates one sender + one query client; build fails if either cannot connect, unless ingest uses initial_connect_retry=async |
| sender_pool_min=0 | no sender at build; first borrowSender()/sender() creates it (then follows the ingest initial-connect mode) |
| query_pool_min=0 | no query client at build; first query submit() creates it |
| both mins 0 | config-only validation at build; all network work is lazy |

After prewarm, both pools grow lazily up to max on demand, and shrink back to min when idle. Growth uses the same real connect path as prewarm. At max, callers block up to acquire_timeout_ms then throw.


Defaults (single source of truth)

Pool (facade only)

| Key / builder | Default |
|---|---|
| sender_pool_min | 1 |
| sender_pool_max | 4 |
| query_pool_min | 1 |
| query_pool_max | 4 |
| acquire_timeout_ms | 5000 |
| idle_timeout_ms | 60000 (0 ⇒ infinite) |
| max_lifetime_ms | 1800000 (0 ⇒ infinite) |
| housekeeper_interval_ms | 5000 |

Ingest sender (SF + reconnect)

| Key | Default |
|---|---|
| sender_id | default |
| sf_max_bytes (segment size) | 4 MiB |
| sf_max_total_bytes (SF mode) | 10 GiB |
| sf_durability | MEMORY |
| sf_append_deadline_millis | 30000 |
| reconnect_max_duration_millis | 300000 (0 ⇒ give up immediately, not infinite ⚠) |
| reconnect_initial_backoff_millis | 100 |
| reconnect_max_backoff_millis | 5000 |
| close_flush_timeout_millis | 60000 |
| auth_timeout_ms | 15000 |

Query client

| Key | Default |
|---|---|
| target | any |
| failover | on |
| failover_max_attempts | 8 (incl. original) |
| failover_max_duration_ms | 30000 (0 disables the duration cap) |
| failover_backoff_initial_ms | 50 |
| failover_backoff_max_ms | 1000 |
| auth_timeout_ms | 15000 |
| serverInfoTimeoutMs | 5000 (builder API only; no config key ⚠) |

Note the inconsistent 0 convention: idle_timeout_ms=0/max_lifetime_ms=0 mean infinite, but reconnect_max_duration_millis=0 means give up now (sharp edge #2).


Knob availability by surface

Three configuration surfaces exist. Not every knob is reachable from every surface — this matrix shows where each lives.

  • Conn string: a ws/wss config string. Works for Sender.fromConfig, QwpQueryClient.fromConfig, and QuestDB.connect(...).
  • Sender builder: Sender.builder(...) (LineSenderBuilder) — direct ingest only.
  • Facade builder: QuestDB.builder() (QuestDBBuilder) — pool knobs only; query/ingest behavior must come from the conn string.

| Knob | Conn string | Sender builder | Facade builder |
|---|---|---|---|
| addr | | address()/port() | via conn string |
| username/password/token | | | via conn string |
| tls_verify/tls_roots | | | via conn string |
| auth_timeout_ms | | | via conn string |
| initial_connect_retry | | initialConnectMode() | via conn string |
| reconnect_* | | | via conn string |
| sf_dir/sender_id/sf_* | | | via conn string |
| request_durable_ack | | | via conn string |
| close_flush_timeout_millis | | | via conn string |
| SenderErrorHandler | | errorHandler() | ❌ (not reachable) |
| SenderConnectionListener | | connectionListener() | ❌ (not reachable) |
| target | | n/a | via conn string |
| failover/failover_* | | n/a | via conn string |
| serverInfoTimeoutMs | | n/a | ❌ (QwpQueryClient builder only) |
| sender_pool_*/query_pool_* | | n/a | |
| acquire_timeout_ms/idle_timeout_ms/max_lifetime_ms | | n/a | |

⚠ Gaps worth noting: the ingest error handler / connection listener cannot be installed through the facade at all, and serverInfoTimeoutMs has no config key, so a facade query client cannot tune it (sharp edge #6).


Known sharp edges

These are behaviours still under review as clients are aligned. "Intended" means a deliberate contract that will be kept; "Candidate" means a likely ergonomic defect targeted for change. The numbered references throughout this spec point here.

| # | Sharp edge | Status |
|---|---|---|
| 1 | initial_connect_retry is implicitly promoted to SYNC when any reconnect_* knob is set, so a resilience knob silently makes startup block. | Candidate |
| 2 | reconnect_max_duration_millis name implies "reconnect only" but also governs initial connect; 0 means "give up now" while sibling 0s mean "infinite"; no infinite mode exists. | Candidate |
| 3 | failover sounds like it covers startup but only affects post-connect query execute(). Queries have no async/lazy initial connect at all. | Candidate |
| 4 | No first-class write-only facade: a write-only user must still supply a query config and remember query_pool_min=0. | Candidate |
| 5 | A single endpoint returning 401/403 is treated as cluster-wide terminal and aborts the whole endpoint walk, even at startup, even if other endpoints would accept the credentials. | Intended (documented), revisit |
| 6 | Ingest errorHandler/connectionListener and query serverInfoTimeoutMs are unreachable from the facade. | Candidate |
| 7 | The simplest API (fromConfig + async) has the worst error visibility: terminal async failures surface only on later producer calls or at close(). | Candidate |
| 8 | No client-side TCP connect timeout: a black-holed host in addr blocks the endpoint walk until the OS connect timeout. | Intended (transport limitation), revisit |

Reference

Store-and-forward semantics

sf_dir=... enables SF. There is no separate boolean enable flag.

  • The sender owns one slot: <sf_dir>/<sender_id>/. Default sender_id is default.
  • Multiple independent senders sharing one sf_dir must use distinct sender_id values, else the second fails because the slot lock is held.
  • In pooled QuestDB usage, SenderPool derives per-slot IDs from the base: <base>-0, <base>-1, … so pooled senders never collide.
  • On restart, the cursor engine opens existing segment files and replays unacknowledged frames; acknowledged/truncated frames are not replayed.
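The slot naming rules above can be sketched in a few lines. This is an illustrative model only, not the reference client's code: the class SfSlots and its method names are hypothetical, and treating a null sender_id as "unset" is an assumption of the sketch.

```java
// Hypothetical helper illustrating the slot layout <sf_dir>/<sender_id>/.
final class SfSlots {
    // Default sender_id when none is configured (from the defaults table).
    static final String DEFAULT_SENDER_ID = "default";

    // Resolves the slot directory for one sender.
    // Assumption: a null senderId means "not configured".
    static java.nio.file.Path slotDir(String sfDir, String senderId) {
        String id = (senderId == null) ? DEFAULT_SENDER_ID : senderId;
        return java.nio.file.Paths.get(sfDir, id);
    }

    // Pooled senders derive per-slot IDs from the base: <base>-0, <base>-1, ...
    static String pooledSenderId(String base, int poolIndex) {
        return base + "-" + poolIndex;
    }
}
```

Two direct senders that both resolve to the same slot directory would contend for the same slot lock, which is why distinct sender_id values are required.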

flush() semantics (QWP sender):

  • Encodes pending rows into the cursor engine.
  • In SF mode, data is persisted to mmap-backed segment files before flush() returns.
  • flush() does not wait for server ACKs unless backpressure requires space. The I/O thread sends frames and trims ACKed frames asynchronously.
  • drain(timeoutMillis) flushes and waits for the server to ACK all currently published frames, up to the timeout.
  • close() flushes then waits up to close_flush_timeout_millis for ACKs, unless that timeout is <= 0.

Async initial connect (ingest)

With initial_connect_retry=async:

  • build() returns without a live socket; wasEverConnected() is false.
  • Producer calls and flush() can run before the server exists; frames accumulate in the cursor engine (and on disk with sf_dir).
  • The I/O thread retries in the background using the same loop used after wire failure.
  • If a server appears before the budget expires, buffered frames are sent/replayed and ACK-driven trimming begins.
  • If the budget expires before any connection, the sender latches a terminal SenderError whose message contains never-connected-budget-exhausted.
  • If it connected at least once and a later outage exhausts the budget, the message contains connection-lost-budget-exhausted.
  • Terminal async errors go to a configured SenderErrorHandler; without one they surface on later producer calls or at close-time.

There is no infinite-retry mode. For long maintenance windows, set a large reconnect_max_duration_millis. On budget exhaustion the current sender stops; persisted sf_dir data remains for a future sender on the same slot.
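Because the two budget-exhaustion cases are distinguished by a marker in the error message, an error handler might classify them as follows. This is a minimal sketch: the class TerminalErrors and the Kind enum are hypothetical names, and only the two marker strings come from this spec.

```java
// Hypothetical classifier for terminal sender error messages.
final class TerminalErrors {
    enum Kind { NEVER_CONNECTED, CONNECTION_LOST, OTHER }

    // Maps a terminal error message to the outage kind it describes.
    static Kind classify(String message) {
        if (message == null) {
            return Kind.OTHER;
        }
        if (message.contains("never-connected-budget-exhausted")) {
            return Kind.NEVER_CONNECTED;   // budget expired before any connection
        }
        if (message.contains("connection-lost-budget-exhausted")) {
            return Kind.CONNECTION_LOST;   // connected once, later outage exhausted the budget
        }
        return Kind.OTHER;
    }
}
```

An application could use the distinction to alert differently: NEVER_CONNECTED often points at configuration or deployment ordering, while CONNECTION_LOST points at an outage longer than the budget.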

Ingest endpoint walk (addr=a:9000,b:9000,...)

| Per-endpoint result | Sender behavior |
|---|---|
| DNS failure | transport error; try next endpoint |
| TCP connect failure | transport error; try next endpoint |
| TLS session/certificate failure | transport error; try next endpoint |
| HTTP upgrade timeout / non-auth transport error | try next endpoint |
| 421 with X-QuestDB-Role: REPLICA | role reject; try next endpoint |
| 401 / 403 auth failure | terminal; do not try later endpoints ⚠ |
| durable-ack requested but unsupported | terminal mismatch |
| successful write upgrade | bind this endpoint |
| all endpoints fail transport | throw / retry per initial/reconnect mode |
| all endpoints role-reject as replicas | QwpRoleMismatchException |
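The walk in the table can be modelled as a small decision function over per-endpoint results. This is a sketch for implementers, not the reference client's code: IngestWalk, Result, and Outcome are hypothetical names. The table does not say what happens when some endpoints fail on transport and others role-reject; lumping that mixed case with the transport-failure outcome is an assumption of the sketch.

```java
// Hypothetical model of the ingest endpoint walk.
final class IngestWalk {
    enum Result { TRANSPORT_ERROR, ROLE_REJECT, AUTH_FAILURE, DURABLE_ACK_UNSUPPORTED, UPGRADED }
    enum Outcome { BOUND, TERMINAL_AUTH, TERMINAL_MISMATCH, ALL_ROLE_REJECTED, ALL_FAILED }

    // Walks endpoints in addr order and returns the overall outcome.
    static Outcome walk(java.util.List<Result> perEndpointResults) {
        int roleRejects = 0;
        for (Result r : perEndpointResults) {
            switch (r) {
                case UPGRADED:
                    return Outcome.BOUND;              // bind this endpoint, stop walking
                case AUTH_FAILURE:
                    return Outcome.TERMINAL_AUTH;      // 401/403: later endpoints are not tried
                case DURABLE_ACK_UNSUPPORTED:
                    return Outcome.TERMINAL_MISMATCH;  // terminal mismatch
                case ROLE_REJECT:
                    roleRejects++;                     // 421 replica: try next endpoint
                    break;
                case TRANSPORT_ERROR:
                    break;                             // DNS/TCP/TLS/upgrade failure: try next
            }
        }
        boolean allRoleRejected = !perEndpointResults.isEmpty()
                && roleRejects == perEndpointResults.size();
        // Assumption: a mix of transport errors and role rejects counts as ALL_FAILED.
        return allRoleRejected ? Outcome.ALL_ROLE_REJECTED : Outcome.ALL_FAILED;
    }
}
```

Note that the position of a 401/403 matters: an auth failure on the first endpoint ends the walk even if the second endpoint would have accepted the upgrade.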

Query client initial connect

QwpQueryClient.connect() is synchronous. Per endpoint it: opens TCP/TLS, performs the WebSocket upgrade to /read/v1, reads the initial SERVER_INFO frame, applies the target= role filter, and starts the egress I/O thread on the first match. If no endpoint can be used, it throws. There is no async initial-connect mode for queries.

target= matching:

| Target | Accepted roles |
|---|---|
| any | any role |
| primary | PRIMARY, PRIMARY_CATCHUP, STANDALONE |
| replica | REPLICA only |
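The role filter is simple enough to state as code. The sketch below is illustrative: TargetFilter and its enums are hypothetical names, and the Role enum lists only the roles named in this spec (a server may report others).

```java
// Hypothetical model of target= role matching.
final class TargetFilter {
    enum Role { PRIMARY, PRIMARY_CATCHUP, STANDALONE, REPLICA }
    enum Target { ANY, PRIMARY, REPLICA }

    // Returns true when an endpoint reporting `role` satisfies `target`.
    static boolean accepts(Target target, Role role) {
        switch (target) {
            case ANY:
                return true;
            case PRIMARY:
                return role == Role.PRIMARY
                        || role == Role.PRIMARY_CATCHUP
                        || role == Role.STANDALONE;
            case REPLICA:
                return role == Role.REPLICA;
            default:
                throw new IllegalArgumentException("unknown target: " + target);
        }
    }
}
```

With target=primary a STANDALONE server is accepted, while target=replica accepts nothing but REPLICA.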

Query initial-connect endpoint matrix:

| Per-endpoint result | Behavior |
|---|---|
| DNS / TCP / TLS failure | record transport error; try next endpoint |
| HTTP upgrade timeout | transport error; try next endpoint |
| HTTP 401 / 403 | terminal QwpAuthFailedException; do not try later ⚠ |
| HTTP 421 + role header | role reject; try next endpoint |
| upgrade ok but no SERVER_INFO before timeout | transport error; try next |
| SERVER_INFO role ≠ target | role reject; try next endpoint |
| endpoint matches target | bind and return success |
| all endpoints transport-fail | HttpClientException: all QWP endpoints unreachable ... |
| all endpoints role-reject | QwpRoleMismatchException |

auth_timeout_ms bounds the upgrade/auth phase after TCP connect. There is no separate client-side TCP connect timeout, so a black-holed connect blocks until the OS timeout before the walk advances ⚠.

Query execution-time failover

With failover=on:

  • A transport/protocol terminal failure during execute() is intercepted; the client reconnects via the host tracker and re-submits.
  • The handler receives onFailoverReset(...) before replayed batches.
  • Bounded by failover_max_attempts (default 8, incl. original) and failover_max_duration_ms (default 30000; 0 disables the duration cap).
  • Backoff: failover_backoff_initial_ms=50, failover_backoff_max_ms=1000.
  • Auth failure during failover reconnect is terminal and reported to the handler.

With failover=off, a transport failure is reported to the handler with no reconnect/replay.
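The two failover bounds combine as a simple predicate. The following is a sketch under stated assumptions: FailoverBudget is a hypothetical name, attemptsMade counts the original attempt (matching "incl. original"), and treating elapsed time equal to the cap as exhausted is an assumption, since this spec does not define the boundary.

```java
// Hypothetical model of the failover attempt/duration bounds.
final class FailoverBudget {
    final int maxAttempts;     // failover_max_attempts, includes the original attempt
    final long maxDurationMs;  // failover_max_duration_ms, 0 disables the duration cap

    FailoverBudget(int maxAttempts, long maxDurationMs) {
        this.maxAttempts = maxAttempts;
        this.maxDurationMs = maxDurationMs;
    }

    // True when another attempt is allowed after `attemptsMade` attempts
    // and `elapsedMs` milliseconds since the first failure.
    boolean mayAttemptAgain(int attemptsMade, long elapsedMs) {
        if (attemptsMade >= maxAttempts) {
            return false;                       // attempt cap reached
        }
        // Assumption: elapsed == cap counts as exhausted.
        return maxDurationMs == 0 || elapsedMs < maxDurationMs;
    }
}
```

With the defaults (8 attempts, 30000 ms), the original execute() may be followed by at most seven re-submissions, and fewer if 30 seconds pass first.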

Scenario matrix

Facade startup

| Scenario | Config | Result |
|---|---|---|
| Default connect, all servers down | default mins | build fails |
| Default connect, first endpoint down, second works | multi-addr | build can succeed; each prewarmed client walks endpoints |
| Write-only-ish startup while down | query_pool_min=0 + sender async | build returns |
| Fully lazy startup | both mins 0 | build returns after validation only |
| Query first use after lazy startup while down | query_pool_min=0 | first submit() throws |
| Sender first use after lazy startup while down | sender_pool_min=0 | first sender creation follows ingest initial mode |

Direct sender startup

| Scenario | Config | Result |
|---|---|---|
| server down, default mode | no reconnect_*, no async | one attempt; build throws |
| server down, reconnect duration set, no mode | reconnect_max_duration_millis=... | synchronous retry; build blocks ⚠ |
| server down, async | initial_connect_retry=async | build returns; I/O thread retries |
| server returns 401/403 | any mode | terminal auth failure; no endpoint continuation |
| server appears before async budget | async + budget | buffered frames sent and ACKed |
| server appears after async budget | async + exhausted | sender terminal; new sender/restart needed |

Read-replica startup (one bad endpoint, another replica works)

| Bad endpoint type | Continue to working replica? | Notes |
|---|---|---|
| DNS failure | Yes | transport error |
| TCP refused/unreachable | Yes | transport error; black-hole waits for OS timeout |
| TLS handshake failure | Yes | transport error |
| HTTP upgrade timeout | Yes | after auth_timeout_ms |
| upgrades but no SERVER_INFO | Yes | after serverInfoTimeoutMs (builder only) |
| primary/standalone while target=replica | Yes | role mismatch |
| 421 role reject | Yes | try next |
| 401/403 | No | auth treated as cluster-wide terminal ⚠ |
| broken shared TLS/trust store | No | every endpoint fails |
| all endpoints down | No | all QWP endpoints unreachable |
| reachable but none match target | No | QwpRoleMismatchException |

Implementation appendix

Non-normative. Documents how the Java reference client implements this spec; useful while aligning other clients. Primary source areas:

  • io.questdb.client.QuestDB / QuestDBBuilder
  • io.questdb.client.impl.SenderPool / QueryClientPool / PoolHousekeeper
  • io.questdb.client.Sender.LineSenderBuilder
  • io.questdb.client.cutlass.qwp.client.QwpWebSocketSender
  • io.questdb.client.cutlass.qwp.client.QwpQueryClient
  • io.questdb.client.cutlass.qwp.client.sf.cursor.CursorSendEngine
  • io.questdb.client.cutlass.qwp.client.sf.cursor.CursorWebSocketSendLoop
  • io.questdb.client.cutlass.qwp.client.QwpHostHealthTracker
  • io.questdb.client.impl.ConfigSchema (the single key registry)

QuestDBBuilder.build() steps

  1. Require both ingest and query configs.
  2. Parse + validate both configs without connecting (runs even when mins are 0; malformed pool/ingest/query/TLS/auth/enum/range values fail here).
  3. Resolve pool keys: explicit builder setters override conn-string keys; conflicting pool values across the two conn strings fail.
  4. Construct SenderPool and QueryClientPool.
  5. Eagerly create min connections per pool.
  6. Start the PoolHousekeeper.

Initial-connect mode resolution (Sender.java)

if initialConnectMode set explicitly -> use it (incl. OFF + tuned budget)
else if any reconnect_* set -> SYNC
else -> OFF
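The resolution rule above, written out as runnable Java. This is an illustrative restatement, not the contents of Sender.java: the class InitialConnect and its method names are hypothetical, and exact lower-case matching of the config values is an assumption of the sketch.

```java
// Hypothetical restatement of initial-connect mode resolution.
final class InitialConnect {
    enum Mode { OFF, SYNC, ASYNC }

    // Parses an explicit initial_connect_retry value.
    // Assumption: values are matched exactly, in lower case.
    static Mode parse(String value) {
        switch (value) {
            case "off":
            case "false":
                return Mode.OFF;
            case "on":
            case "true":
            case "sync":
                return Mode.SYNC;
            case "async":
                return Mode.ASYNC;
            default:
                throw new IllegalArgumentException("unknown initial_connect_retry: " + value);
        }
    }

    // explicitValue is null when initial_connect_retry was not set.
    static Mode resolve(String explicitValue, boolean anyReconnectKnobSet) {
        if (explicitValue != null) {
            return parse(explicitValue);   // explicit setting always wins
        }
        return anyReconnectKnobSet ? Mode.SYNC : Mode.OFF;
    }
}
```

The second branch is sharp edge #1: resolve(null, true) yields SYNC, so setting only a reconnect_* knob makes build() block on a down server.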

Pooled SF startup recovery nuance

  • Live/prewarmed sender slots recover their own unacked data via their CursorSendEngine.
  • Non-live managed slots are scanned by the housekeeper startup recovery path, so build() does not block on stranded slots.
  • Recovery of non-live stranded slots is best-effort and bounded: a build/drain failure aborts that scan; data stays durable for a later attempt, but the current process does not retry the aborted scan indefinitely.
  • For immediate background drain of all slots, keep enough sender_pool_min slots warm or construct direct senders for the slots that must actively retry.

Reconnect deadline (CursorWebSocketSendLoop)

The deadline is computed as deadlineNanos = outageStartNanos + reconnect_max_duration_millis * 1e6, and the loop runs while (running && now < deadline). With a budget of 0 the deadline equals the outage start, so the loop body never runs and the sender gives up immediately. QwpAuthFailedException / WebSocketUpgradeException inside the loop are terminal across all endpoints.
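The effect of a zero budget is easy to confirm with a fake clock. The sketch below is a simplified model, not CursorWebSocketSendLoop itself: ReconnectDeadline is a hypothetical name, the running flag is omitted, and each loop iteration is assumed to advance the clock by a fixed step.

```java
// Hypothetical model of the reconnect deadline check.
final class ReconnectDeadline {
    // deadlineNanos = outageStartNanos + reconnect_max_duration_millis * 1e6
    static long deadlineNanos(long outageStartNanos, long reconnectMaxDurationMillis) {
        return outageStartNanos + reconnectMaxDurationMillis * 1_000_000L;
    }

    // Counts loop iterations when every iteration takes stepNanos of fake time.
    static int iterations(long outageStartNanos, long reconnectMaxDurationMillis, long stepNanos) {
        if (stepNanos <= 0) {
            throw new IllegalArgumentException("stepNanos must be positive");
        }
        long deadline = deadlineNanos(outageStartNanos, reconnectMaxDurationMillis);
        long now = outageStartNanos;
        int count = 0;
        while (now < deadline) {   // same comparison as the loop condition above
            count++;
            now += stepNanos;
        }
        return count;
    }
}
```

A budget of 0 gives zero iterations, which is why reconnect_max_duration_millis=0 means "give up now" rather than "retry forever".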