Bucket Tables
Bucket Tables turn S3 collections into queryable tables. Manifests, indexes, and aggregations eliminate the N+1 fetch problem and enable fast queries over a flat key-value store.
The Problem
Without optimization, every query requires listing all object keys under a prefix and fetching each entity individually. For a collection with 1,000 entities, findAll() issues 1,001 S3 requests. At 50ms per request, that is 50 seconds.
Bucket Tables solve this with two structures:
- Manifests -- materialized collection summaries as a single JSON file. One GET instead of N+1.
- Indexes -- per-field lookup tables for targeted queries without scanning.
Collection Manifests
A .worm.manifest.json file per collection that caches entity metadata for fast queries. This is the core optimization -- it turns O(N) fetches into O(1).
Manifest Structure
// org/customers/.worm.manifest.json
{
"version": 1,
"model": "Customer",
"count": 1423,
"lastUpdated": "2026-02-24T14:30:00.000Z",
"checksum": "sha256:abc123...",
"entities": {
"a1b2c3d4-...": {
"name": "Acme Corp",
"email": "hi@acme.com",
"status": "active",
"tags": ["vip", "enterprise"],
"createdAt": "2026-01-15T09:00:00.000Z",
"updatedAt": "2026-02-20T11:30:00.000Z"
},
"e5f6g7h8-...": {
"name": "Widgets Inc",
"email": "hello@widgets.com",
"status": "inactive",
"tags": [],
"createdAt": "2026-01-20T14:00:00.000Z",
"updatedAt": "2026-02-18T08:45:00.000Z"
}
}
}
Manifest Configuration
{
"models": {
"Customer": {
"manifest": {
"enabled": true,
"fields": ["name", "email", "status", "tags"],
"autoUpdate": true,
"partitionThreshold": 10000,
"maxPartitions": 16
}
}
}
}
| Field | Type | Default | Description |
|---|---|---|---|
enabled | boolean | false | Enable manifest for this model |
fields | string[] | All schema fields | Fields to include in the manifest |
autoUpdate | boolean | true | Update manifest on save/delete |
partitionThreshold | number | 10000 | Entity count before auto-partitioning |
maxPartitions | number | 16 | Maximum number of partition files |
When fields is omitted, all schema-defined fields plus id, createdAt, and updatedAt are included. Specifying fields keeps the manifest small for models with large documents.
Manifest Location
The manifest lives at the collection level, regardless of the storage layout:
org/customers/.worm.manifest.json
This is independent of the storage.layout setting, which controls oplog/snapshots/trash placement.
How Queries Change
With a manifest, findAll() becomes a single GET:
// Without manifest: 1,001 S3 requests for 1,000 entities
// With manifest: 1 S3 request
const results = await customers.findAll({
filter: { status: "active" },
sort: { field: "name", order: "asc" },
limit: 20
});
A query can be served from the manifest when:
- The manifest exists and is not stale
- All fields referenced in
filter,sort, andselectare present in the manifest - No
populateis requested
When the query cannot be served from the manifest, it falls back to a full scan transparently.
Partitioned Manifests
For collections with more than 10,000 entities (configurable), the manifest is automatically split into partitions:
org/customers/.worm.manifest/
_meta.json # { partitionCount: 4, partitionBy: "id_prefix", totalCount: 42000 }
partition-0.json # entities with IDs starting a-f
partition-1.json # entities with IDs starting g-l
partition-2.json # entities with IDs starting m-r
partition-3.json # entities with IDs starting s-z
Partitioning is automatic and transparent. The SDK detects partitioned manifests via _meta.json and handles them identically to monolithic manifests.
SDK API
const customers = worm.model("Customer");
// Rebuild manifest from scratch
await customers.buildManifest();
// Check manifest status
const status = await customers.getManifestStatus();
// { exists: true, fresh: true, count: 1423,
// lastUpdated: "...", sizeBytes: 847000,
// partitioned: false, fields: ["name", "email", "status", "tags"] }
// Force rebuild on next query
await customers.invalidateManifest();
Field Indexes
Optional per-field index files for O(1) lookups by specific field values. Indexes complement manifests: manifests scan the whole collection, indexes look up entities by a known value.
Index Types
| Type | Description | Entry Shape |
|---|---|---|
standard | One-to-many: value maps to list of entity IDs | "active": ["id-1", "id-2"] |
unique | One-to-one: value maps to a single entity ID | "hi@acme.com": "id-1" |
multiValue | For array fields: each element is indexed separately | "vip": ["id-1"], "enterprise": ["id-1", "id-2"] |
Index Structure
// org/customers/.worm.index/status.json (standard)
{
"version": 1,
"model": "Customer",
"field": "status",
"type": "standard",
"unique": false,
"lastUpdated": "2026-02-24T14:30:00.000Z",
"entries": {
"active": ["a1b2c3d4-...", "e5f6g7h8-...", "i9j0k1l2-..."],
"inactive": ["m3n4o5p6-..."],
"churned": ["q7r8s9t0-..."]
}
}
// org/customers/.worm.index/email.json (unique)
{
"version": 1,
"model": "Customer",
"field": "email",
"type": "standard",
"unique": true,
"lastUpdated": "2026-02-24T14:30:00.000Z",
"entries": {
"hi@acme.com": "a1b2c3d4-...",
"hello@widgets.com": "e5f6g7h8-..."
}
}
// org/customers/.worm.index/tags.json (multiValue)
{
"version": 1,
"model": "Customer",
"field": "tags",
"type": "multiValue",
"unique": false,
"lastUpdated": "2026-02-24T14:30:00.000Z",
"entries": {
"vip": ["a1b2c3d4-..."],
"enterprise": ["a1b2c3d4-...", "e5f6g7h8-..."],
"startup": ["i9j0k1l2-..."]
}
}
Schema Declaration
{
"models": {
"Customer": {
"indexes": {
"byStatus": { "field": "status" },
"byEmail": { "field": "email", "unique": true },
"byTags": { "field": "tags", "type": "multiValue" }
}
}
}
}
Unique Constraint Enforcement
When an index is declared unique: true, save() checks the index before writing:
await customers.save(customers.create({
email: "hi@acme.com" // already exists on another entity
}));
// throws: "Unique constraint violation: email='hi@acme.com'
// already exists on entity 'a1b2c3d4-...'"
SDK API
// Rebuild all indexes from scratch
await customers.buildIndexes();
// Rebuild a single index
await customers.buildIndex("byStatus");
Aggregations
Compute aggregate values over a collection without loading individual entities.
Operations
| Operation | Description | Field Type |
|---|---|---|
count | Count entities (or per group) | Any |
sum | Sum a numeric field | number |
avg | Average a numeric field | number |
min | Minimum value | number, datetime, string |
max | Maximum value | number, datetime, string |
Examples
const customers = worm.model("Customer");
// Simple count
const total = await customers.aggregate({ count: true });
// { count: 1423 }
// Group by status
const byStatus = await customers.aggregate({
groupBy: "status",
count: true
});
// { groups: {
// active: { count: 1200 },
// inactive: { count: 150 },
// churned: { count: 73 }
// } }
// Numeric aggregations with filter
const invoices = worm.model("Invoice");
const revenue = await invoices.aggregate({
groupBy: "status",
sum: "amount",
avg: "amount",
filter: { currency: "USD" }
});
// { groups: {
// paid: { sum: 450000, avg: 1500 },
// draft: { sum: 12000, avg: 800 }
// } }
// Min/max
const range = await invoices.aggregate({
min: "amount",
max: "amount"
});
// { min: 50, max: 25000 }
Aggregate Options
interface AggregateOptions {
groupBy?: string;
count?: boolean;
sum?: string;
avg?: string;
min?: string;
max?: string;
filter?: Record<string, unknown>;
}
When a manifest is available and contains the needed fields, aggregations run entirely from manifest data. No individual entity fetches required.
Filter Operators
The full set of filter operators available in findAll() and aggregation queries.
Comparison Operators
| Operator | Description | Example |
|---|---|---|
$eq | Equals (implicit for scalar values) | { status: "active" } or { status: { $eq: "active" } } |
$ne | Not equals | { status: { $ne: "churned" } } |
$gt | Greater than | { amount: { $gt: 100 } } |
$gte | Greater than or equal | { amount: { $gte: 100 } } |
$lt | Less than | { amount: { $lt: 1000 } } |
$lte | Less than or equal | { amount: { $lte: 1000 } } |
$in | Value in array | { status: { $in: ["active", "inactive"] } } |
$nin | Value not in array | { status: { $nin: ["churned"] } } |
$exists | Field exists (or not) | { email: { $exists: true } } |
$contains | Array contains value | { tags: { $contains: "vip" } } |
$startsWith | String starts with | { name: { $startsWith: "Acme" } } |
Logical Operators
| Operator | Description | Example |
|---|---|---|
$and | All conditions must match | { $and: [{ status: "active" }, { score: { $gte: 80 } }] } |
$or | Any condition must match | { $or: [{ status: "active" }, { tags: { $contains: "vip" } }] } |
$not | Negate a condition | { $not: { status: "churned" } } |
Examples
// AND is implicit for top-level conditions
const results = await customers.findAll({
filter: {
status: "active",
tags: { $in: ["vip"] }
}
});
// Explicit $or
const results = await customers.findAll({
filter: {
$or: [
{ status: "active", tags: { $contains: "vip" } },
{ status: "inactive", tags: { $contains: "enterprise" } }
]
}
});
// Nested logic
const results = await customers.findAll({
filter: {
$or: [
{ $and: [{ status: "active" }, { score: { $gte: 80 } }] },
{ tags: { $contains: "vip" } }
]
}
});
// $not
const results = await customers.findAll({
filter: {
$not: { status: "churned" }
}
});
Query Strategy
The SDK automatically selects the best execution plan for each query. No configuration needed.
Strategy Selection Flow
Query arrives
|
v
[1] Can an index answer this query?
Filter on a single indexed field with $eq or $in
Yes --> INDEX LOOKUP: fetch index, resolve IDs, fetch matches
|
v
[2] Can the manifest answer this query?
Manifest exists, all filter/sort/select fields present
Yes --> MANIFEST SCAN: single GET, filter/sort/paginate in memory
|
v
[3] Can the manifest narrow the result set?
Manifest exists, has filter fields, but select/populate needs full docs
Yes --> MANIFEST + TARGETED FETCH: identify IDs from manifest, fetch only those
|
v
[4] None of the above
FULL SCAN: list keys, fetch all entities
Cost Model
| Strategy | S3 Requests | Latency (1K entities) | Latency (10K entities) |
|---|---|---|---|
| Index lookup (10 matches) | 11 | ~0.5s | ~0.5s |
| Manifest scan | 1 | ~0.1s | ~0.3s |
| Manifest + targeted fetch (50 matches) | 51 | ~2.5s | ~2.5s |
| Full scan | 1 + N | ~50s | ~500s |
Query Explain
Inspect the strategy the SDK would use for a query:
const plan = customers.explain({
filter: { status: "active" },
select: ["name", "email"]
});
// {
// strategy: "manifest_scan",
// reason: "manifest has all needed fields",
// estimatedRequests: 1,
// manifestFields: ["name", "email", "status", "tags"]
// }
Force a Strategy
For debugging or testing, bypass automatic selection:
const results = await customers.findAll({
filter: { status: "active" },
_strategy: "full_scan" // bypass manifest/indexes
});
CLI Commands
worm table list
List all models and their table status.
$ worm table list
Model Count Manifest Indexes Last Updated
------------- -------- ---------- -------------------- ----------------------
Customer 1,423 fresh status, email, tags 2026-02-24T14:30:00Z
Invoice 8,291 stale status 2026-02-24T12:00:00Z
OrgSettings 1 n/a -- 2026-02-24T10:00:00Z
worm table inspect
Show detailed collection stats, manifest status, and index information.
$ worm table inspect Customer
Model: Customer
Path: #org/@customers/(id:uuid)
Mode: readwrite
Collection:
Count: 1,423
Size: ~2.1 MB (estimated from manifest)
Last Updated: 2026-02-24T14:30:00Z
Manifest:
Status: fresh (checksum verified)
Size: 847 KB
Fields: name, email, status, tags, createdAt, updatedAt
Auto-update: enabled
Indexes:
byStatus: status (standard) 3 distinct values
byEmail: email (unique) 1,423 entries
byTags: tags (multiValue) 12 distinct values
worm table query
Query a collection from the command line.
worm table query Customer \
--filter status=active \
--sort name:asc \
--limit 10 \
--select name,email \
--format table
CLI Filter Syntax
| Syntax | Meaning |
|---|---|
status=active | Equality filter |
"amount>100" | Greater than |
"status=active,tags~vip" | Multiple conditions (AND), ~ for contains |
Options
| Flag | Description |
|---|---|
--filter | Field filters (see syntax above) |
--sort field:order | Sort by field, asc or desc |
--select field1,field2 | Projection (return only these fields) |
--limit N | Maximum results |
--offset N | Skip N results |
--format | Output format: table (default), json, csv |
worm table rebuild
Rebuild manifest and indexes from source entities.
$ worm table rebuild Customer
Scanning collection: org/customers/
Found 1,423 entities
Building manifest... done (847 KB)
Building index: status... done (3 values)
Building index: email... done (1,423 entries)
Building index: tags... done (12 values)
Rebuild complete in 12.3s
worm table export
Export collection data to file.
# CSV export
worm table export Customer --format csv --filter status=active > active-customers.csv
# NDJSON export
worm table export Invoice --format ndjson > invoices.ndjson
# JSON with projection
worm table export Customer --format json --select name,email,status