Docs/Advanced Features/Bucket Tables

    Bucket Tables

    Bucket Tables turn S3 collections into queryable tables. Manifests, indexes, and aggregations eliminate the N+1 fetch problem and enable fast queries over a flat key-value store.


    The Problem

    Without optimization, every query requires listing all object keys under a prefix and fetching each entity individually. For a collection with 1,000 entities, findAll() issues 1,001 S3 requests. At 50ms per request, that is 50 seconds.

    Bucket Tables solve this with two structures:

    • Manifests -- materialized collection summaries as a single JSON file. One GET instead of N+1.
    • Indexes -- per-field lookup tables for targeted queries without scanning.

    Collection Manifests

    A .worm.manifest.json file per collection that caches entity metadata for fast queries. This is the core optimization -- it turns O(N) fetches into O(1).

    Manifest Structure

    // org/customers/.worm.manifest.json
    {
      "version": 1,
      "model": "Customer",
      "count": 1423,
      "lastUpdated": "2026-02-24T14:30:00.000Z",
      "checksum": "sha256:abc123...",
      "entities": {
        "a1b2c3d4-...": {
          "name": "Acme Corp",
          "email": "hi@acme.com",
          "status": "active",
          "tags": ["vip", "enterprise"],
          "createdAt": "2026-01-15T09:00:00.000Z",
          "updatedAt": "2026-02-20T11:30:00.000Z"
        },
        "e5f6g7h8-...": {
          "name": "Widgets Inc",
          "email": "hello@widgets.com",
          "status": "inactive",
          "tags": [],
          "createdAt": "2026-01-20T14:00:00.000Z",
          "updatedAt": "2026-02-18T08:45:00.000Z"
        }
      }
    }
    

    Manifest Configuration

    {
      "models": {
        "Customer": {
          "manifest": {
            "enabled": true,
            "fields": ["name", "email", "status", "tags"],
            "autoUpdate": true,
            "partitionThreshold": 10000,
            "maxPartitions": 16
          }
        }
      }
    }
    
    FieldTypeDefaultDescription
    enabledbooleanfalseEnable manifest for this model
    fieldsstring[]All schema fieldsFields to include in the manifest
    autoUpdatebooleantrueUpdate manifest on save/delete
    partitionThresholdnumber10000Entity count before auto-partitioning
    maxPartitionsnumber16Maximum number of partition files

    When fields is omitted, all schema-defined fields plus id, createdAt, and updatedAt are included. Specifying fields keeps the manifest small for models with large documents.

    Manifest Location

    The manifest lives at the collection level, regardless of the storage layout:

    org/customers/.worm.manifest.json
    

    This is independent of the storage.layout setting, which controls oplog/snapshots/trash placement.

    How Queries Change

    With a manifest, findAll() becomes a single GET:

    // Without manifest: 1,001 S3 requests for 1,000 entities
    // With manifest: 1 S3 request
    
    const results = await customers.findAll({
      filter: { status: "active" },
      sort: { field: "name", order: "asc" },
      limit: 20
    });
    

    A query can be served from the manifest when:

    1. The manifest exists and is not stale
    2. All fields referenced in filter, sort, and select are present in the manifest
    3. No populate is requested

    When the query cannot be served from the manifest, it falls back to a full scan transparently.

    Partitioned Manifests

    For collections with more than 10,000 entities (configurable), the manifest is automatically split into partitions:

    org/customers/.worm.manifest/
      _meta.json            # { partitionCount: 4, partitionBy: "id_prefix", totalCount: 42000 }
      partition-0.json      # entities with IDs starting a-f
      partition-1.json      # entities with IDs starting g-l
      partition-2.json      # entities with IDs starting m-r
      partition-3.json      # entities with IDs starting s-z
    

    Partitioning is automatic and transparent. The SDK detects partitioned manifests via _meta.json and handles them identically to monolithic manifests.

    SDK API

    const customers = worm.model("Customer");
    
    // Rebuild manifest from scratch
    await customers.buildManifest();
    
    // Check manifest status
    const status = await customers.getManifestStatus();
    // { exists: true, fresh: true, count: 1423,
    //   lastUpdated: "...", sizeBytes: 847000,
    //   partitioned: false, fields: ["name", "email", "status", "tags"] }
    
    // Force rebuild on next query
    await customers.invalidateManifest();
    

    Field Indexes

    Optional per-field index files for O(1) lookups by specific field values. Indexes complement manifests: manifests scan the whole collection, indexes look up entities by a known value.

    Index Types

    TypeDescriptionEntry Shape
    standardOne-to-many: value maps to list of entity IDs"active": ["id-1", "id-2"]
    uniqueOne-to-one: value maps to a single entity ID"hi@acme.com": "id-1"
    multiValueFor array fields: each element is indexed separately"vip": ["id-1"], "enterprise": ["id-1", "id-2"]

    Index Structure

    // org/customers/.worm.index/status.json (standard)
    {
      "version": 1,
      "model": "Customer",
      "field": "status",
      "type": "standard",
      "unique": false,
      "lastUpdated": "2026-02-24T14:30:00.000Z",
      "entries": {
        "active": ["a1b2c3d4-...", "e5f6g7h8-...", "i9j0k1l2-..."],
        "inactive": ["m3n4o5p6-..."],
        "churned": ["q7r8s9t0-..."]
      }
    }
    
    // org/customers/.worm.index/email.json (unique)
    {
      "version": 1,
      "model": "Customer",
      "field": "email",
      "type": "standard",
      "unique": true,
      "lastUpdated": "2026-02-24T14:30:00.000Z",
      "entries": {
        "hi@acme.com": "a1b2c3d4-...",
        "hello@widgets.com": "e5f6g7h8-..."
      }
    }
    
    // org/customers/.worm.index/tags.json (multiValue)
    {
      "version": 1,
      "model": "Customer",
      "field": "tags",
      "type": "multiValue",
      "unique": false,
      "lastUpdated": "2026-02-24T14:30:00.000Z",
      "entries": {
        "vip": ["a1b2c3d4-..."],
        "enterprise": ["a1b2c3d4-...", "e5f6g7h8-..."],
        "startup": ["i9j0k1l2-..."]
      }
    }
    

    Schema Declaration

    {
      "models": {
        "Customer": {
          "indexes": {
            "byStatus": { "field": "status" },
            "byEmail": { "field": "email", "unique": true },
            "byTags": { "field": "tags", "type": "multiValue" }
          }
        }
      }
    }
    

    Unique Constraint Enforcement

    When an index is declared unique: true, save() checks the index before writing:

    await customers.save(customers.create({
      email: "hi@acme.com"  // already exists on another entity
    }));
    // throws: "Unique constraint violation: email='hi@acme.com'
    //          already exists on entity 'a1b2c3d4-...'"
    

    SDK API

    // Rebuild all indexes from scratch
    await customers.buildIndexes();
    
    // Rebuild a single index
    await customers.buildIndex("byStatus");
    

    Aggregations

    Compute aggregate values over a collection without loading individual entities.

    Operations

    OperationDescriptionField Type
    countCount entities (or per group)Any
    sumSum a numeric fieldnumber
    avgAverage a numeric fieldnumber
    minMinimum valuenumber, datetime, string
    maxMaximum valuenumber, datetime, string

    Examples

    const customers = worm.model("Customer");
    
    // Simple count
    const total = await customers.aggregate({ count: true });
    // { count: 1423 }
    
    // Group by status
    const byStatus = await customers.aggregate({
      groupBy: "status",
      count: true
    });
    // { groups: {
    //     active: { count: 1200 },
    //     inactive: { count: 150 },
    //     churned: { count: 73 }
    // } }
    
    // Numeric aggregations with filter
    const invoices = worm.model("Invoice");
    const revenue = await invoices.aggregate({
      groupBy: "status",
      sum: "amount",
      avg: "amount",
      filter: { currency: "USD" }
    });
    // { groups: {
    //     paid: { sum: 450000, avg: 1500 },
    //     draft: { sum: 12000, avg: 800 }
    // } }
    
    // Min/max
    const range = await invoices.aggregate({
      min: "amount",
      max: "amount"
    });
    // { min: 50, max: 25000 }
    

    Aggregate Options

    interface AggregateOptions {
      groupBy?: string;
      count?: boolean;
      sum?: string;
      avg?: string;
      min?: string;
      max?: string;
      filter?: Record<string, unknown>;
    }
    

    When a manifest is available and contains the needed fields, aggregations run entirely from manifest data. No individual entity fetches required.


    Filter Operators

    The full set of filter operators available in findAll() and aggregation queries.

    Comparison Operators

    OperatorDescriptionExample
    $eqEquals (implicit for scalar values){ status: "active" } or { status: { $eq: "active" } }
    $neNot equals{ status: { $ne: "churned" } }
    $gtGreater than{ amount: { $gt: 100 } }
    $gteGreater than or equal{ amount: { $gte: 100 } }
    $ltLess than{ amount: { $lt: 1000 } }
    $lteLess than or equal{ amount: { $lte: 1000 } }
    $inValue in array{ status: { $in: ["active", "inactive"] } }
    $ninValue not in array{ status: { $nin: ["churned"] } }
    $existsField exists (or not){ email: { $exists: true } }
    $containsArray contains value{ tags: { $contains: "vip" } }
    $startsWithString starts with{ name: { $startsWith: "Acme" } }

    Logical Operators

    OperatorDescriptionExample
    $andAll conditions must match{ $and: [{ status: "active" }, { score: { $gte: 80 } }] }
    $orAny condition must match{ $or: [{ status: "active" }, { tags: { $contains: "vip" } }] }
    $notNegate a condition{ $not: { status: "churned" } }

    Examples

    // AND is implicit for top-level conditions
    const results = await customers.findAll({
      filter: {
        status: "active",
        tags: { $in: ["vip"] }
      }
    });
    
    // Explicit $or
    const results = await customers.findAll({
      filter: {
        $or: [
          { status: "active", tags: { $contains: "vip" } },
          { status: "inactive", tags: { $contains: "enterprise" } }
        ]
      }
    });
    
    // Nested logic
    const results = await customers.findAll({
      filter: {
        $or: [
          { $and: [{ status: "active" }, { score: { $gte: 80 } }] },
          { tags: { $contains: "vip" } }
        ]
      }
    });
    
    // $not
    const results = await customers.findAll({
      filter: {
        $not: { status: "churned" }
      }
    });
    

    Query Strategy

    The SDK automatically selects the best execution plan for each query. No configuration needed.

    Strategy Selection Flow

    Query arrives
      |
      v
    [1] Can an index answer this query?
        Filter on a single indexed field with $eq or $in
        Yes --> INDEX LOOKUP: fetch index, resolve IDs, fetch matches
      |
      v
    [2] Can the manifest answer this query?
        Manifest exists, all filter/sort/select fields present
        Yes --> MANIFEST SCAN: single GET, filter/sort/paginate in memory
      |
      v
    [3] Can the manifest narrow the result set?
        Manifest exists, has filter fields, but select/populate needs full docs
        Yes --> MANIFEST + TARGETED FETCH: identify IDs from manifest, fetch only those
      |
      v
    [4] None of the above
        FULL SCAN: list keys, fetch all entities
    

    Cost Model

    StrategyS3 RequestsLatency (1K entities)Latency (10K entities)
    Index lookup (10 matches)11~0.5s~0.5s
    Manifest scan1~0.1s~0.3s
    Manifest + targeted fetch (50 matches)51~2.5s~2.5s
    Full scan1 + N~50s~500s

    Query Explain

    Inspect the strategy the SDK would use for a query:

    const plan = customers.explain({
      filter: { status: "active" },
      select: ["name", "email"]
    });
    
    // {
    //   strategy: "manifest_scan",
    //   reason: "manifest has all needed fields",
    //   estimatedRequests: 1,
    //   manifestFields: ["name", "email", "status", "tags"]
    // }
    

    Force a Strategy

    For debugging or testing, bypass automatic selection:

    const results = await customers.findAll({
      filter: { status: "active" },
      _strategy: "full_scan"   // bypass manifest/indexes
    });
    

    CLI Commands

    worm table list

    List all models and their table status.

    $ worm table list
    
    Model         Count    Manifest   Indexes              Last Updated
    ------------- -------- ---------- -------------------- ----------------------
    Customer      1,423    fresh      status, email, tags  2026-02-24T14:30:00Z
    Invoice       8,291    stale      status               2026-02-24T12:00:00Z
    OrgSettings   1        n/a        --                   2026-02-24T10:00:00Z
    

    worm table inspect

    Show detailed collection stats, manifest status, and index information.

    $ worm table inspect Customer
    
    Model: Customer
    Path:  #org/@customers/(id:uuid)
    Mode:  readwrite
    
    Collection:
      Count:        1,423
      Size:         ~2.1 MB (estimated from manifest)
      Last Updated: 2026-02-24T14:30:00Z
    
    Manifest:
      Status:       fresh (checksum verified)
      Size:         847 KB
      Fields:       name, email, status, tags, createdAt, updatedAt
      Auto-update:  enabled
    
    Indexes:
      byStatus:  status       (standard)   3 distinct values
      byEmail:   email        (unique)     1,423 entries
      byTags:    tags         (multiValue) 12 distinct values
    

    worm table query

    Query a collection from the command line.

    worm table query Customer \
      --filter status=active \
      --sort name:asc \
      --limit 10 \
      --select name,email \
      --format table
    

    CLI Filter Syntax

    SyntaxMeaning
    status=activeEquality filter
    "amount>100"Greater than
    "status=active,tags~vip"Multiple conditions (AND), ~ for contains

    Options

    FlagDescription
    --filterField filters (see syntax above)
    --sort field:orderSort by field, asc or desc
    --select field1,field2Projection (return only these fields)
    --limit NMaximum results
    --offset NSkip N results
    --formatOutput format: table (default), json, csv

    worm table rebuild

    Rebuild manifest and indexes from source entities.

    $ worm table rebuild Customer
    
    Scanning collection: org/customers/
    Found 1,423 entities
    Building manifest... done (847 KB)
    Building index: status... done (3 values)
    Building index: email... done (1,423 entries)
    Building index: tags... done (12 values)
    
    Rebuild complete in 12.3s
    

    worm table export

    Export collection data to file.

    # CSV export
    worm table export Customer --format csv --filter status=active > active-customers.csv
    
    # NDJSON export
    worm table export Invoice --format ndjson > invoices.ndjson
    
    # JSON with projection
    worm table export Customer --format json --select name,email,status