Bucket Tables

Bucket Tables turn S3 collections into queryable tables. Manifests, indexes, and aggregations eliminate the N+1 fetch problem and enable fast queries over a flat key-value store.

The Problem

Without optimization, every query requires listing all object keys under a prefix and fetching each entity individually. For a collection with 1,000 entities, findAll() issues 1,001 S3 requests. At 50ms per request, that is 50 seconds.

Bucket Tables solve this with two structures:

Manifests -- materialized collection summaries as a single JSON file. One GET instead of N+1.
Indexes -- per-field lookup tables for targeted queries without scanning.

Collection Manifests

A .worm.manifest.json file per collection that caches entity metadata for fast queries. This is the core optimization -- it turns O(N) fetches into O(1).

Manifest Structure

// org/customers/.worm.manifest.json
{
  "version": 1,
  "model": "Customer",
  "count": 1423,
  "lastUpdated": "2026-02-24T14:30:00.000Z",
  "checksum": "sha256:abc123...",
  "entities": {
    "a1b2c3d4-...": {
      "name": "Acme Corp",
      "email": "hi@acme.com",
      "status": "active",
      "tags": ["vip", "enterprise"],
      "createdAt": "2026-01-15T09:00:00.000Z",
      "updatedAt": "2026-02-20T11:30:00.000Z"
    },
    "e5f6g7h8-...": {
      "name": "Widgets Inc",
      "email": "hello@widgets.com",
      "status": "inactive",
      "tags": [],
      "createdAt": "2026-01-20T14:00:00.000Z",
      "updatedAt": "2026-02-18T08:45:00.000Z"
    }
  }
}

Manifest Configuration

{
  "models": {
    "Customer": {
      "manifest": {
        "enabled": true,
        "fields": ["name", "email", "status", "tags"],
        "autoUpdate": true,
        "partitionThreshold": 10000,
        "maxPartitions": 16
      }
    }
  }
}

Field	Type	Default	Description
`enabled`	`boolean`	`false`	Enable manifest for this model
`fields`	`string[]`	All schema fields	Fields to include in the manifest
`autoUpdate`	`boolean`	`true`	Update manifest on save/delete
`partitionThreshold`	`number`	`10000`	Entity count before auto-partitioning
`maxPartitions`	`number`	`16`	Maximum number of partition files

When fields is omitted, all schema-defined fields plus id, createdAt, and updatedAt are included. Specifying fields keeps the manifest small for models with large documents.

Manifest Location

The manifest lives at the collection level, regardless of the storage layout:

org/customers/.worm.manifest.json

This is independent of the storage.layout setting, which controls oplog/snapshots/trash placement.

How Queries Change

With a manifest, findAll() becomes a single GET:

// Without manifest: 1,001 S3 requests for 1,000 entities
// With manifest: 1 S3 request

const results = await customers.findAll({
  filter: { status: "active" },
  sort: { field: "name", order: "asc" },
  limit: 20
});

A query can be served from the manifest when:

The manifest exists and is not stale
All fields referenced in filter, sort, and select are present in the manifest
No populate is requested

When the query cannot be served from the manifest, it falls back to a full scan transparently.

Partitioned Manifests

For collections with more than 10,000 entities (configurable), the manifest is automatically split into partitions:

org/customers/.worm.manifest/
  _meta.json            # { partitionCount: 4, partitionBy: "id_prefix", totalCount: 42000 }
  partition-0.json      # entities with IDs starting a-f
  partition-1.json      # entities with IDs starting g-l
  partition-2.json      # entities with IDs starting m-r
  partition-3.json      # entities with IDs starting s-z

Partitioning is automatic and transparent. The SDK detects partitioned manifests via _meta.json and handles them identically to monolithic manifests.

SDK API

const customers = worm.model("Customer");

// Rebuild manifest from scratch
await customers.buildManifest();

// Check manifest status
const status = await customers.getManifestStatus();
// { exists: true, fresh: true, count: 1423,
//   lastUpdated: "...", sizeBytes: 847000,
//   partitioned: false, fields: ["name", "email", "status", "tags"] }

// Force rebuild on next query
await customers.invalidateManifest();

Field Indexes

Optional per-field index files for O(1) lookups by specific field values. Indexes complement manifests: manifests scan the whole collection, indexes look up entities by a known value.

Index Types

Type	Description	Entry Shape
`standard`	One-to-many: value maps to list of entity IDs	`"active": ["id-1", "id-2"]`
`unique`	One-to-one: value maps to a single entity ID	`"hi@acme.com": "id-1"`
`multiValue`	For array fields: each element is indexed separately	`"vip": ["id-1"], "enterprise": ["id-1", "id-2"]`

Index Structure

// org/customers/.worm.index/status.json (standard)
{
  "version": 1,
  "model": "Customer",
  "field": "status",
  "type": "standard",
  "unique": false,
  "lastUpdated": "2026-02-24T14:30:00.000Z",
  "entries": {
    "active": ["a1b2c3d4-...", "e5f6g7h8-...", "i9j0k1l2-..."],
    "inactive": ["m3n4o5p6-..."],
    "churned": ["q7r8s9t0-..."]
  }
}

// org/customers/.worm.index/email.json (unique)
{
  "version": 1,
  "model": "Customer",
  "field": "email",
  "type": "standard",
  "unique": true,
  "lastUpdated": "2026-02-24T14:30:00.000Z",
  "entries": {
    "hi@acme.com": "a1b2c3d4-...",
    "hello@widgets.com": "e5f6g7h8-..."
  }
}

// org/customers/.worm.index/tags.json (multiValue)
{
  "version": 1,
  "model": "Customer",
  "field": "tags",
  "type": "multiValue",
  "unique": false,
  "lastUpdated": "2026-02-24T14:30:00.000Z",
  "entries": {
    "vip": ["a1b2c3d4-..."],
    "enterprise": ["a1b2c3d4-...", "e5f6g7h8-..."],
    "startup": ["i9j0k1l2-..."]
  }
}

Schema Declaration

{
  "models": {
    "Customer": {
      "indexes": {
        "byStatus": { "field": "status" },
        "byEmail": { "field": "email", "unique": true },
        "byTags": { "field": "tags", "type": "multiValue" }
      }
    }
  }
}

Unique Constraint Enforcement

When an index is declared unique: true, save() checks the index before writing:

await customers.save(customers.create({
  email: "hi@acme.com"  // already exists on another entity
}));
// throws: "Unique constraint violation: email='hi@acme.com'
//          already exists on entity 'a1b2c3d4-...'"

SDK API

// Rebuild all indexes from scratch
await customers.buildIndexes();

// Rebuild a single index
await customers.buildIndex("byStatus");

Aggregations

Compute aggregate values over a collection without loading individual entities.

Operations

Operation	Description	Field Type
`count`	Count entities (or per group)	Any
`sum`	Sum a numeric field	`number`
`avg`	Average a numeric field	`number`
`min`	Minimum value	`number`, `datetime`, `string`
`max`	Maximum value	`number`, `datetime`, `string`

Examples

const customers = worm.model("Customer");

// Simple count
const total = await customers.aggregate({ count: true });
// { count: 1423 }

// Group by status
const byStatus = await customers.aggregate({
  groupBy: "status",
  count: true
});
// { groups: {
//     active: { count: 1200 },
//     inactive: { count: 150 },
//     churned: { count: 73 }
// } }

// Numeric aggregations with filter
const invoices = worm.model("Invoice");
const revenue = await invoices.aggregate({
  groupBy: "status",
  sum: "amount",
  avg: "amount",
  filter: { currency: "USD" }
});
// { groups: {
//     paid: { sum: 450000, avg: 1500 },
//     draft: { sum: 12000, avg: 800 }
// } }

// Min/max
const range = await invoices.aggregate({
  min: "amount",
  max: "amount"
});
// { min: 50, max: 25000 }

Aggregate Options

interface AggregateOptions {
  groupBy?: string;
  count?: boolean;
  sum?: string;
  avg?: string;
  min?: string;
  max?: string;
  filter?: Record<string, unknown>;
}

When a manifest is available and contains the needed fields, aggregations run entirely from manifest data. No individual entity fetches required.

Filter Operators

The full set of filter operators available in findAll() and aggregation queries.

Comparison Operators

Operator	Description	Example
`$eq`	Equals (implicit for scalar values)	`{ status: "active" }` or `{ status: { $eq: "active" } }`
`$ne`	Not equals	`{ status: { $ne: "churned" } }`
`$gt`	Greater than	`{ amount: { $gt: 100 } }`
`$gte`	Greater than or equal	`{ amount: { $gte: 100 } }`
`$lt`	Less than	`{ amount: { $lt: 1000 } }`
`$lte`	Less than or equal	`{ amount: { $lte: 1000 } }`
`$in`	Value in array	`{ status: { $in: ["active", "inactive"] } }`
`$nin`	Value not in array	`{ status: { $nin: ["churned"] } }`
`$exists`	Field exists (or not)	`{ email: { $exists: true } }`
`$contains`	Array contains value	`{ tags: { $contains: "vip" } }`
`$startsWith`	String starts with	`{ name: { $startsWith: "Acme" } }`

Logical Operators

Operator	Description	Example
`$and`	All conditions must match	`{ $and: [{ status: "active" }, { score: { $gte: 80 } }] }`
`$or`	Any condition must match	`{ $or: [{ status: "active" }, { tags: { $contains: "vip" } }] }`
`$not`	Negate a condition	`{ $not: { status: "churned" } }`

Examples

// AND is implicit for top-level conditions
const results = await customers.findAll({
  filter: {
    status: "active",
    tags: { $in: ["vip"] }
  }
});

// Explicit $or
const results = await customers.findAll({
  filter: {
    $or: [
      { status: "active", tags: { $contains: "vip" } },
      { status: "inactive", tags: { $contains: "enterprise" } }
    ]
  }
});

// Nested logic
const results = await customers.findAll({
  filter: {
    $or: [
      { $and: [{ status: "active" }, { score: { $gte: 80 } }] },
      { tags: { $contains: "vip" } }
    ]
  }
});

// $not
const results = await customers.findAll({
  filter: {
    $not: { status: "churned" }
  }
});

Query Strategy

The SDK automatically selects the best execution plan for each query. No configuration needed.

Strategy Selection Flow

Query arrives
  |
  v
[1] Can an index answer this query?
    Filter on a single indexed field with $eq or $in
    Yes --> INDEX LOOKUP: fetch index, resolve IDs, fetch matches
  |
  v
[2] Can the manifest answer this query?
    Manifest exists, all filter/sort/select fields present
    Yes --> MANIFEST SCAN: single GET, filter/sort/paginate in memory
  |
  v
[3] Can the manifest narrow the result set?
    Manifest exists, has filter fields, but select/populate needs full docs
    Yes --> MANIFEST + TARGETED FETCH: identify IDs from manifest, fetch only those
  |
  v
[4] None of the above
    FULL SCAN: list keys, fetch all entities

Cost Model

Strategy	S3 Requests	Latency (1K entities)	Latency (10K entities)
Index lookup (10 matches)	11	~0.5s	~0.5s
Manifest scan	1	~0.1s	~0.3s
Manifest + targeted fetch (50 matches)	51	~2.5s	~2.5s
Full scan	1 + N	~50s	~500s

Query Explain

Inspect the strategy the SDK would use for a query:

const plan = customers.explain({
  filter: { status: "active" },
  select: ["name", "email"]
});

// {
//   strategy: "manifest_scan",
//   reason: "manifest has all needed fields",
//   estimatedRequests: 1,
//   manifestFields: ["name", "email", "status", "tags"]
// }

Force a Strategy

For debugging or testing, bypass automatic selection:

const results = await customers.findAll({
  filter: { status: "active" },
  _strategy: "full_scan"   // bypass manifest/indexes
});

CLI Commands

`worm table list`

List all models and their table status.

$ worm table list

Model         Count    Manifest   Indexes              Last Updated
------------- -------- ---------- -------------------- ----------------------
Customer      1,423    fresh      status, email, tags  2026-02-24T14:30:00Z
Invoice       8,291    stale      status               2026-02-24T12:00:00Z
OrgSettings   1        n/a        --                   2026-02-24T10:00:00Z

`worm table inspect`

Show detailed collection stats, manifest status, and index information.

$ worm table inspect Customer

Model: Customer
Path:  #org/@customers/(id:uuid)
Mode:  readwrite

Collection:
  Count:        1,423
  Size:         ~2.1 MB (estimated from manifest)
  Last Updated: 2026-02-24T14:30:00Z

Manifest:
  Status:       fresh (checksum verified)
  Size:         847 KB
  Fields:       name, email, status, tags, createdAt, updatedAt
  Auto-update:  enabled

Indexes:
  byStatus:  status       (standard)   3 distinct values
  byEmail:   email        (unique)     1,423 entries
  byTags:    tags         (multiValue) 12 distinct values

`worm table query`

Query a collection from the command line.

worm table query Customer \
  --filter status=active \
  --sort name:asc \
  --limit 10 \
  --select name,email \
  --format table

CLI Filter Syntax

Syntax	Meaning
`status=active`	Equality filter
`"amount>100"`	Greater than
`"status=active,tags~vip"`	Multiple conditions (AND), `~` for contains

Options

Flag	Description
`--filter`	Field filters (see syntax above)
`--sort field:order`	Sort by field, `asc` or `desc`
`--select field1,field2`	Projection (return only these fields)
`--limit N`	Maximum results
`--offset N`	Skip N results
`--format`	Output format: `table` (default), `json`, `csv`

`worm table rebuild`

Rebuild manifest and indexes from source entities.

$ worm table rebuild Customer

Scanning collection: org/customers/
Found 1,423 entities
Building manifest... done (847 KB)
Building index: status... done (3 values)
Building index: email... done (1,423 entries)
Building index: tags... done (12 values)

Rebuild complete in 12.3s

`worm table export`

Export collection data to file.

# CSV export
worm table export Customer --format csv --filter status=active > active-customers.csv

# NDJSON export
worm table export Invoice --format ndjson > invoices.ndjson

# JSON with projection
worm table export Customer --format json --select name,email,status