Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Architecture

How Fungi's components fit together.

System Overview

┌─────────────────────────────────────────────┐
│                gRPC / REST API              │
├─────────────────────────────────────────────┤
│    SQL (DataFusion)  │  DataStream API      │
├─────────────────────────────────────────────┤
│         Streaming Execution Engine          │
│  ┌─────────┐ ┌──────────┐ ┌─────────────┐  │
│  │Operator │ │  State   │ │  Watermark  │  │
│  │  Graph  │ │ Manager  │ │  & Windows  │  │
│  └─────────┘ └──────────┘ └─────────────┘  │
├─────────────────────────────────────────────┤
│   Kafka  │  State Backend  │  Checkpoints   │
└─────────────────────────────────────────────┘

Crates

CratePurpose
fungi-commonShared types, errors, config
fungi-stateState backends (Memory, RocksDB)
fungi-streamingWatermarks, windows, timers
fungi-executionOperator graph, task executor, Python UDF
fungi-checkpointCheckpoint coordinator
fungi-sqlDataFusion integration
fungi-flinkFlink DataStream API
fungi-connectorKafka source/sink
fungi-servergRPC/HTTP server
fungi-cliCommand-line interface
fungi-clusterDistributed protocol
fungi-schedulerJob/task scheduling
fungi-reliabilityDLQ, retries, circuit breaker
fungi-securityAuth, RBAC, audit
fungi-tenancyMulti-tenant isolation
fungi-dashboardLeptos WASM dashboard
fungi-mcpMCP server for AI agents

Data Flow

Source → Deserialize → Timestamp → KeyBy → Window → Aggregate → Serialize → Sink
  1. Source reads from Kafka / file / API
  2. Deserialize → Arrow RecordBatch
  3. Timestamp assigns event time
  4. KeyBy partitions for parallelism
  5. Window groups by time (tumbling / sliding / session)
  6. Aggregate computes (sum / count / avg / custom)
  7. Serialize → bytes
  8. Sink writes to Kafka / DB / file

All internal data flows use Arrow RecordBatch — zero-copy, SIMD-friendly.

Execution Modes

ModeUse Case
EmbeddedSingle process, library API, CLI --local
ServerSingle node, gRPC API, dashboard
DistributedJobManager + N TaskManagers, K8s

See Concepts / Distributed for the multi-node model.