First-class stage-inbox workflow topology¶

This document explains the implementation shape for Relayna's multi-stage workflow topology and why the library models it as a stage-inbox system rather than as explicit producer-owned outbox queues.

Why this topology exists¶

Relayna already supported:

one shared task queue plus one shared status queue/stream
routed task queues by task_type
shard-owned aggregation queues on the shared status exchange

Those shapes work well for single-hop and routed worker systems, but they do not express planner, re-planner, search, writer, and file-creation chains in a single topology object.

SharedStatusWorkflowTopology fills that gap with:

one workflow topic exchange for stage-to-stage work
one durable inbox queue per consuming stage
one shared status exchange and shared status queue/stream
named entry routes for ingress traffic such as planner and re-planner entry points

Why stage-inbox is the first-class choice¶

Stage-inbox matches RabbitMQ ownership semantics.

When the "next pod consumes it", the durable queue is naturally that next stage's inbox. That queue is where:

backlog accumulates
retry and DLQ semantics apply
autoscaling decisions are made
consumer lag is measured
operator ownership usually lives

Modeling that queue as an upstream stage's outbox hides the real owner of the work. The producer publishes to an exchange. The consumer owns the durable mailbox.

This is why the first-class topology chooses:

producer publishes with a routing key
downstream stage owns the queue
workflow queue names represent inboxes, not transitions

Explicit edge/outbox queues are deferred. They are useful in some systems, but they add extra queue objects, extra routing metadata, and extra operational surfaces before the library has proven the simpler stage-inbox abstraction.

Workflow and status stay on separate lanes¶

Relayna keeps workflow transport and user-visible progress separate on purpose.

Workflow lane:

exchange: workflow_exchange
payload: WorkflowEnvelope
purpose: move work between internal stages

Status lane:

exchange: status_exchange
payload: StatusEventEnvelope
purpose: publish progress, history, and final state for StatusHub, StreamHistoryReader, Redis history, and SSE

This separation avoids coupling internal workflow hops to user-facing history. It keeps the existing status stack unchanged and preserves current operational and API behavior.

Public model¶

relayna.topology exposes:

WorkflowStage
WorkflowEntryRoute
SharedStatusWorkflowTopology

WorkflowStage defines:

name
queue
binding_keys
publish_routing_key
queue_arguments_overrides
queue_kwargs

WorkflowEntryRoute defines:

name
routing_key
target_stage

SharedStatusWorkflowTopology defines:

one workflow topic exchange
one shared status topic exchange
one shared status queue/stream
any number of workflow stages
optional named entry routes
curated workflow queue arguments plus broker-specific escape hatches

Validation rules:

stage names must be unique
queue names must be unique
entry route names must be unique
each stage must have at least one binding key
each stage must define one default publish_routing_key
entry routes must target an existing stage
duplicate queue-argument keys across built-ins, overrides, and kwargs fail fast

Workflow message contract¶

relayna.contracts.WorkflowEnvelope is the canonical stage-to-stage transport shape:

task_id
message_id
correlation_id
stage
origin_stage
action
payload
meta

Contract semantics:

task_id is the workflow identity visible to status/history/SSE
message_id identifies a specific hop
stage is the intended destination stage
origin_stage is populated by downstream handoff helpers
correlation_id defaults to task_id

Ingress task delivery and workflow-hop delivery remain distinct concepts. Relayna therefore keeps TaskEnvelope and WorkflowEnvelope separate rather than overloading one transport shape for both.

Publishing and consuming¶

relayna.rabbitmq.RelaynaRabbitClient now supports:

publish_workflow(...)
publish_to_stage(...)
publish_to_entry(...)
publish_workflow_message(...)
ensure_workflow_queue(...)

publish_task(...) remains the legacy task-topology entry point and raises on SharedStatusWorkflowTopology so callers do not silently publish to the wrong lane.

relayna.consumer now supports:

WorkflowConsumer
WorkflowContext

WorkflowConsumer consumes a named stage inbox queue and reuses the same retry, DLQ, and status-publishing model already used by the task and aggregation consumers.

WorkflowContext supports:

publish_status(...)
publish_to_stage(...)
publish_workflow_message(...)

Downstream publish behavior:

preserve task_id
preserve correlation_id
generate a new message_id
set origin_stage from the current stage
set stage to the destination stage

Planner, re-planner, and writer all fit the same abstraction¶

Planner flow:

API publishes to planner.topic_planner.in
topic_planner consumes from its inbox queue
it publishes to planner.docsearch_planner.in
docsearch_planner consumes from its inbox queue
the chain continues stage by stage

Re-planner flow:

API publishes to a named re-planner entry route such as replanner.docsearch_planner.in
the same docsearch_planner inbox queue can bind both planner and re-planner keys
the consuming stage therefore stays the same while ingress changes

Writer flow:

API can publish into docsearcher and websearcher
both publish to the same researcher_aggregator inbox queue
fan-in stays simple because the queue belongs to the downstream stage

This is the key design win: one abstraction covers straight chains, alternate entry paths, and fan-in without introducing special queue families.

Compatibility with existing topologies¶

The workflow topology is additive.

Existing topologies remain first-class and unchanged:

SharedTasksSharedStatusTopology
SharedTasksSharedStatusShardedAggregationTopology
RoutedTasksSharedStatusTopology
RoutedTasksSharedStatusShardedAggregationTopology

The topology protocol was generalized so workflow-aware code can talk about "workflow queues" while older topologies map their single task queue into the same compatibility surface.

Retry, DLQ, and failure model¶

Workflow stage queues use the same retry infrastructure already used by Relayna's existing consumers:

retry queue suffix: .retry
dead-letter queue suffix: .dlq
retry metadata carried in x-relayna-* headers
DLQ indexing continues to store replay metadata in Redis

Failure semantics:

malformed JSON can be rejected or dead-lettered depending on retry policy
invalid WorkflowEnvelope payloads follow the same rule
handler failures either reject, retry, or dead-letter according to the configured policy
retry status publishing remains on the shared status exchange

Observability¶

Workflow-specific observation events now exist for stage activity:

WorkflowStageStarted
WorkflowMessageReceived
WorkflowMessagePublished
WorkflowStageAcked
WorkflowStageFailed

These events let operators debug stage traffic without mixing workflow transport concerns into status history.

Acceptance criteria¶

The topology counts as first-class when all of the following are true:

A user can declare the full multi-stage pipeline in one topology object.
Producers can publish to a stage or a named entry route without raw AMQP code.
Workers can consume a named stage with WorkflowConsumer.
Handlers can publish downstream work with WorkflowContext.
Status events from workflow handlers still reach StatusHub, history, and SSE unchanged.
Planner, re-planner, and writer flows are documented with concrete diagrams and code examples in the getting-started guide.

For end-user examples and Mermaid diagrams, see Getting Started.