v0.15.3: Adaptive HNSW Index Capacity (7.7× Less Index Overhead)

The problem: every index was pre-allocated for a million vectors

Until v0.15.3, every HNSW index created inside Feather DB was initialized with max_elements=1,000,000. The hnswlib library pre-allocates neighbor-list storage for every slot up front. That means even an index with 100 vectors was eating memory sized for one million.

Feather DB uses a separate HNSW index per namespace per modality. An application with 19 namespaces — users, agents, topics, or any other logical boundary — would create 19 (or more) of these indexes. Each one reserved enough RAM for a million vectors. The overhead was paid unconditionally, regardless of how many vectors actually lived in each namespace.

In a real 19-namespace workload, we measured the following before and after the fix:

Metric	Before v0.15.3	After v0.15.3
HNSW index overhead	709 MB	92 MB
Improvement	—	7.7× less

The fix didn't change a single line of user-facing API. No migration required. Same file format (v8).

Why max_elements matters so much

hnswlib's max_elements parameter sizes several data structures that are allocated when the index is created. The dominant cost is layer 0. For each of the max_elements slots, hnswlib reserves space for up to M * 2 neighbor IDs (M=16 by default), plus per-slot bookkeeping such as the element's label, level, and lock. Upper-layer neighbor lists (up to M IDs per layer) are allocated per element only as vectors are inserted, so they add nothing to an empty index. At a million slots, this fixed overhead came to roughly 37 MB per index in our measurements, before a single vector was added.

With 19 namespaces, each holding a separate index, the pre-allocated overhead came to approximately 37 MB × 19 = 703 MB, which accounts for nearly all of the 709 MB measured. Most of those indexes held dozens or hundreds of vectors, not millions.
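The reservation is a fixed cost per index, so total overhead scales with the number of indexes rather than the number of vectors. The Python sketch below shows that arithmetic. It treats the 37.3 MB per-index figure from the benchmark table as a constant. The function and constant names are illustrative and are not part of Feather DB.

```python
# Fixed overhead of one index pre-allocated for 1,000,000 slots, as measured
# in this post's benchmark (about 37.3 MB per index, before any vectors).
PER_INDEX_MB_AT_1M = 37.3


def preallocated_overhead_mb(num_indexes: int,
                             per_index_mb: float = PER_INDEX_MB_AT_1M) -> float:
    """Total pre-allocated HNSW overhead; the vector count plays no role."""
    return num_indexes * per_index_mb


for n in (19, 20, 50):
    print(f"{n} indexes -> {preallocated_overhead_mb(n):,.1f} MB")
```

For 19 indexes this gives about 709 MB, and for 50 it gives about 1.87 GB. Both are close to the figures quoted elsewhere in this post.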

The solution is to start small and grow.

What changed internally

v0.15.3 introduces two constants and a resizing path:

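// C++ internals (src/db.cpp), simplified
static constexpr size_t INITIAL_MAX_ELEMENTS = 4096;

// Ensure room for `incoming` more elements, doubling capacity until they fit.
// add() passes incoming = 1; add_batch() passes the full batch size.
void DB::reserve(const std::string& ns, const std::string& modality, size_t incoming) {
    auto& idx = modality_indices_[ns][modality];
    size_t needed   = idx.hnsw->getCurrentElementCount() + incoming;
    size_t capacity = idx.hnsw->getMaxElements();

    if (needed <= capacity) return;  // enough free slots already

    size_t new_capacity = capacity;
    while (new_capacity < needed) {
        new_capacity *= 2;           // geometric growth
    }
    idx.hnsw->resizeIndex(new_capacity);  // one reallocation, even for a large batch
}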

Every index now starts at 4,096 slots instead of 1,000,000. The reserve() helper is called in two places:

Before each individual add() — checks if the index is at capacity and doubles it if so.
At the start of add_batch() — reserves ahead of time for the entire batch size, avoiding mid-batch resizes that would stall the parallel thread pool.
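The difference between the two call sites can be modeled in a few lines of Python. This is a sketch of the doubling policy described above, not Feather DB's code. `grow_capacity` is a hypothetical helper standing in for reserve().

```python
INITIAL_MAX_ELEMENTS = 4096  # starting capacity per index, as in v0.15.3


def grow_capacity(current: int, capacity: int, incoming: int) -> int:
    """Capacity needed to fit `incoming` more elements, doubling from `capacity`.

    Returns `capacity` unchanged when the elements already fit.
    """
    needed = current + incoming
    while capacity < needed:
        capacity *= 2
    return capacity


# add_batch() style: reserve once for the whole batch, so at most one resize.
batch_capacity = grow_capacity(current=0, capacity=INITIAL_MAX_ELEMENTS, incoming=10_000)

# add() style: check before every insert, resizing each time the index fills.
capacity, resizes = INITIAL_MAX_ELEMENTS, 0
for current in range(10_000):
    new_capacity = grow_capacity(current, capacity, incoming=1)
    if new_capacity != capacity:
        resizes += 1
        capacity = new_capacity

print(batch_capacity, capacity, resizes)
```

Reserving 10,000 slots up front reaches 16,384 in one reallocation. Checking before every single add() reaches the same capacity through two separate resizes (4,096 to 8,192 to 16,384).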

Resizes are geometric (doubling). An index that grows from 4,096 to 1,000,000 elements resizes ⌈log₂(1,000,000 / 4,096)⌉ = 8 times in total, so the amortized cost per insert is negligible.
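You can check the resize count, and see why the amortized cost stays low, by simulating the doubling. This sketch assumes the worst case, where every resize copies all existing slots (a realloc that cannot grow the block in place). The function name is illustrative.

```python
def doubling_resizes(start: int, target: int) -> tuple:
    """Count doublings from `start` to at least `target` slots.

    Also returns the total slots copied across those resizes. Each resize
    happens when the index is full, so it copies the old capacity (worst case).
    """
    capacity, resizes, copied = start, 0, 0
    while capacity < target:
        copied += capacity
        capacity *= 2
        resizes += 1
    return resizes, copied


resizes, copied = doubling_resizes(4096, 1_000_000)
print(resizes, copied, copied / 1_000_000)
```

It reports 8 resizes and about 1.04 million slot copies for 1,000,000 inserts. Over the index's whole growth, each element is copied roughly once on average.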

Compact integration: survivor tracking

The compact() operation removes deleted vectors from the HNSW graph and rewrites the index. Before v0.15.3, compaction left the index at max_elements=1,000,000 regardless of how many vectors survived. Now, compact() tracks the survivor count and rebuilds the index starting from a capacity appropriate to that count:

// After compaction, right-size the rebuilt index
size_t survivor_count = collect_survivors(ns, modality);
size_t new_capacity   = next_power_of_two(survivor_count);
new_capacity          = std::max(new_capacity, INITIAL_MAX_ELEMENTS);

rebuild_index(ns, modality, new_capacity);

This means a namespace that had 50,000 vectors, then compacted down to 8,000 after bulk deletions, will have an index sized for ~8,192 slots after compaction — not 1,000,000. Long-running applications that regularly compact will see sustained memory savings, not just at startup.
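The sizing rule can be reproduced in Python. The C++ excerpt does not show next_power_of_two(), so this sketch assumes it returns the smallest power of two greater than or equal to its argument. `compacted_capacity` is a hypothetical name.

```python
INITIAL_MAX_ELEMENTS = 4096


def next_power_of_two(n: int) -> int:
    """Smallest power of two >= n (returns 1 for n <= 1)."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


def compacted_capacity(survivor_count: int) -> int:
    """Capacity for a rebuilt index: next power of two, never below the initial size."""
    return max(next_power_of_two(survivor_count), INITIAL_MAX_ELEMENTS)


for survivors in (47, 8_000, 50_000):
    print(survivors, "->", compacted_capacity(survivors))
```

A namespace compacted to 8,000 survivors gets 8,192 slots. One with 47 survivors stays at the 4,096 floor, and one with 50,000 gets 65,536.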

Where the 7.7× number comes from

The benchmark that produced the headline number used a realistic multi-namespace setup: 19 namespaces with population sizes ranging from 47 to 3,800 vectors each. All namespaces used the default "text" modality. The measurement isolates HNSW index overhead by subtracting vector storage (float32 × dim × count, which is the same before and after). The table shows five of the 19 namespaces, plus the total across all of them.

Namespace	Vector count	Index overhead before	Index overhead after
ns-01	47	37.3 MB	0.24 MB
ns-02	120	37.3 MB	0.24 MB
ns-05	512	37.3 MB	0.48 MB
ns-11	1,800	37.3 MB	1.8 MB
ns-19	3,800	37.3 MB	3.7 MB
Total (19 ns)	—	709 MB	92 MB

The savings are proportionally largest for small namespaces. A namespace with 47 vectors previously claimed the same 37 MB index budget as one with 3,800. After v0.15.3, index overhead scales with actual usage.
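The subtraction used to isolate index overhead is straightforward. The sketch below assumes dim=768 purely for illustration, because the benchmark's embedding dimension is not stated here. The function names are hypothetical.

```python
FLOAT32_BYTES = 4


def vector_storage_bytes(count: int, dim: int) -> int:
    """Raw float32 vector storage, identical before and after v0.15.3."""
    return FLOAT32_BYTES * dim * count


def index_overhead_bytes(measured_bytes: int, count: int, dim: int) -> int:
    """Isolate HNSW index overhead by subtracting vector storage from a measurement."""
    return measured_bytes - vector_storage_bytes(count, dim)


# Illustration only: dim=768 is an assumption, not the benchmark's stated dimension.
print(vector_storage_bytes(3_800, 768) / 1e6, "MB of raw vectors")
```

At dim=768, a 3,800-vector namespace holds about 11.7 MB of raw float32 vectors. That amount is removed from both the before and after measurements, so it cancels out of the comparison.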

Who this helps most

Multi-namespace workloads are the primary beneficiary. The three most common patterns:

1. Per-user memory in a SaaS product

Each user gets their own namespace. A product with 50 active users in memory at once previously held 50 indexes pre-allocated for a million elements each — roughly 1.85 GB of index overhead before a single vector was added. After v0.15.3, a 50-user deployment with an average of 300 memories per user uses around 100 MB of index overhead total.

2. Per-agent memory in a multi-agent system

An orchestration layer running 10–30 specialized agents, each with its own namespace, previously hit the pre-allocation problem at startup. With adaptive capacity, spawning 25 new agent namespaces allocates 25 × 4,096-slot indexes instead of 25 × 1,000,000-slot indexes.

3. Topic-partitioned knowledge bases

Applications that create one namespace per document category, project, or topic domain often have highly uneven namespace populations. A namespace for "onboarding docs" might have 30 chunks; one for "engineering specs" might have 5,000. Previously both paid the same overhead. Now the 30-chunk namespace gets a 4,096-slot index, a small fraction of the old million-slot reservation.

Python example: before and after

The following example creates 20 namespaces with small populations — exactly the shape of workload that was penalized most before v0.15.3. The code is identical before and after the upgrade; only the memory footprint changes.

import feather_db as fdb
import numpy as np

db = fdb.DB.open("multi_tenant.feather", dim=768)

# 20 namespaces, each with a modest number of vectors
# Before v0.15.3: ~740 MB index overhead
# After  v0.15.3: ~5 MB index overhead
namespace_sizes = {
    f"user-{i:02d}": np.random.randint(50, 400)
    for i in range(20)
}

for ns, n_vectors in namespace_sizes.items():
    vecs = np.random.randn(n_vectors, 768).astype(np.float32)
    ids  = list(range(n_vectors))
    # add_batch() calls reserve() once for the full batch
    # — no mid-batch resizes
    db.add_batch(ids, vecs, namespace=ns)

db.save()

# Search works exactly as before; only the memory footprint has changed
for ns, n_vectors in namespace_sizes.items():
    results = db.search(
        np.random.randn(768).astype(np.float32),
        k=5,
        namespace=ns
    )
    print(f"{ns}: {n_vectors} vectors → {len(results)} results")

The search API, save/load behavior, and file format are unchanged. Existing .feather files from v0.15.x load correctly. On load, each namespace's index is initialized at the smaller starting size and the stored vectors are re-inserted, so the in-memory footprint after load is also smaller.

Combined with int8 RAM quantization

v0.15.0 shipped in-RAM int8 quantization via set_int8_ram(), which reduces vector storage by 1.7×. The two features compose independently: adaptive capacity reduces index overhead (neighbor lists, element counts, layer assignments), while int8 quantization reduces vector storage (the raw float32 bytes). On a memory-constrained host, combining both is the right default:

import os, feather_db as fdb

os.environ["FEATHER_LOAD_THREADS"] = "8"  # parallel load (v0.15+)

db = fdb.DB.open("multi_tenant.feather", dim=768)

# Quantize each namespace's text modality in RAM
# (adaptive capacity already applied at index init and load)
for ns in db.list_namespaces():
    db.set_int8_ram(ns, "text", max_abs=1.0)

# Memory savings stack:
# - adaptive HNSW capacity:  7.7× less index overhead
# - int8 RAM quantization:   1.7× less vector storage
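The two optimizations act on different parts of memory. Their savings therefore add up term by term rather than multiplying into a single 7.7 × 1.7 factor. Below is a rough estimator that uses the 7.7× and 1.7× ratios quoted in this post as defaults. The function name is hypothetical, and the 340 MB vector-storage figure is made up for illustration.

```python
def combined_footprint_mb(index_overhead_mb: float, vector_storage_mb: float,
                          index_reduction: float = 7.7,
                          vector_reduction: float = 1.7) -> float:
    """Estimate memory after both savings apply independently.

    Adaptive capacity shrinks the index-overhead term; int8 RAM quantization
    shrinks the vector-storage term. Neither affects the other's term.
    """
    return index_overhead_mb / index_reduction + vector_storage_mb / vector_reduction


before = 709 + 340  # index overhead + hypothetical vector storage, in MB
after = combined_footprint_mb(709, 340)
print(f"{before} MB -> {after:.1f} MB ({before / after:.1f}x overall)")
```

With these inputs the footprint drops from 1,049 MB to about 292 MB, roughly 3.6× overall. The overall factor depends on how memory splits between index overhead and vector storage.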

Resize cost

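The resizeIndex() call in hnswlib reallocates the index's per-slot storage (the layer-0 block holding neighbor lists, plus the upper-layer pointer table) to new_capacity slots. When the block cannot be grown in place, it copies the existing data across. This is O(current_capacity) in time and adds new_capacity - old_capacity slots of memory. In practice: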

For a namespace growing from 4,096 to 8,192 slots, the resize is sub-millisecond.
For a namespace that has grown to roughly 500,000 elements (524,288 slots) and doubles to 1,048,576 slots, the resize is on the order of 40–80 ms on a modern CPU. That is a one-time cost spread over the hundreds of thousands of inserts that filled the index.

Batch ingestion via add_batch() avoids mid-batch resizes by calling reserve() with the full batch size before the first insert. For real-time single-item add() calls, resizes are rare: about 8 between 4,096 and 1,000,000 elements. The early ones take less than 1 ms each, and only the last few, at hundreds of thousands of elements, reach tens of milliseconds.

No breaking changes

API: No changes. All existing add(), add_batch(), search(), compact(), and save()/open() calls work identically.
File format: Still v8. Files saved with v0.15.3 load on v0.15.0/v0.15.1/v0.15.2 and vice versa. The adaptive capacity is a runtime behavior, not a serialized property — indexes always save the current element count and re-initialize capacity on load.
Recall: Unchanged. The HNSW graph structure (M=16, ef_construction=200, ef=50) is identical. Capacity management is entirely outside the graph traversal path.

Upgrade

pip install feather-db==0.15.3

No code changes needed. The memory reduction is automatic on first run.

GitHub: github.com/feather-store/feather