# add_batch(): 3.4× Faster Bulk Ingestion in Feather DB

> Feather DB Phase 8 ships add_batch() — parallel batch ingest with the GIL released. At scale, 3.4× faster than sequential add() calls. Here's the API, the internals, and when to use it.

- **Category**: Performance
- **Read time**: 5 min read
- **Date**: June 16, 2026
- **Author**: Feather DB (Engineering)
- **URL**: https://getfeather.store/theory/feather-db-add-batch-parallel-ingestion

---

## The sequential ingestion bottleneck

Ingesting large vector corpora into Feather DB was previously sequential: a Python loop calling `db.add(id, vec)` for each document. Each call crosses the Python/C++ boundary, acquires the GIL for the pybind11 trampoline, inserts into the HNSW graph, and releases the GIL. At 100k+ documents, this loop becomes the bottleneck.

`add_batch()`, shipped in Phase 8 of Feather's optimization roadmap, builds the HNSW graph in parallel with the GIL released. The result: **~3.4× faster bulk insert** from Python, measured on a 4-core machine inserting 50k × 768-dim vectors.

## The API

```python
import feather_db as fdb
import numpy as np

db = fdb.DB.open("corpus.feather", dim=768)

# Prepare your data
ids  = list(range(10_000))
vecs = np.random.randn(10_000, 768).astype(np.float32)

# Optional: metadata per vector
metas = [fdb.Metadata(importance=0.8) for _ in range(10_000)]

# Single parallel call — GIL released during graph construction
db.add_batch(ids, vecs, metas=metas)
db.save()
```

`add_batch()` accepts:

- `ids`: list of int or 1-D int array

- `vecs`: 2-D float32 numpy array, shape (N, dim)

- `metas`: optional list of `Metadata` objects, length N

The call is equivalent to N sequential `add()` calls but uses a thread pool internally, building HNSW candidate lists in parallel before merging them into the main index.
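Because the three arguments must agree in length and shape, a small pre-flight check can catch mismatches before the parallel call starts. The sketch below is a hypothetical helper, not part of Feather DB: `check_batch` and its error messages are assumptions, and whether `add_batch()` performs similar checks itself is not covered here. It reads only `.shape` and `.dtype`, so it works with a NumPy array without importing NumPy.

```python
def check_batch(ids, vecs, dim, metas=None):
    """Hypothetical pre-flight check for add_batch() inputs.

    Returns the batch size N, or raises ValueError on a mismatch.
    """
    shape = tuple(vecs.shape)
    # vecs must be 2-D with shape (N, dim)
    if len(shape) != 2 or shape[1] != dim:
        raise ValueError(f"vecs must have shape (N, {dim}), got {shape}")
    # add_batch() is documented as taking float32; NumPy defaults to float64
    if str(vecs.dtype) != "float32":
        raise ValueError(f"vecs must be float32, got {vecs.dtype}")
    n = shape[0]
    if len(ids) != n:
        raise ValueError(f"{len(ids)} ids for {n} vectors")
    if metas is not None and len(metas) != n:
        raise ValueError(f"{len(metas)} metas for {n} vectors")
    return n
```

Calling `check_batch(ids, vecs, dim=768, metas=metas)` right before `db.add_batch(...)` turns a shape or dtype mistake into an immediate `ValueError`.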

## Benchmark numbers

On a 4-core machine, inserting 50k × 768-dim vectors:

| Method | Time | Speedup |
| --- | --- | --- |
| Sequential `add()` loop | ~34s | 1× |
| `add_batch()` | ~10s | 3.4× |

The speedup scales with core count up to the HNSW construction thread pool size (default: CPU cores - 1). On an 8-core machine, expect ~5–6× over the sequential baseline.
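The 4-core figures above can be sanity-checked with a few lines of arithmetic. The helper names `speedup` and `throughput` are illustrative and not part of Feather DB; the inputs are the approximate timings quoted in this section.

```python
def speedup(baseline_s, candidate_s):
    """How many times faster the candidate is than the baseline."""
    return baseline_s / candidate_s


def throughput(n_vectors, seconds):
    """Vectors inserted per second."""
    return n_vectors / seconds


N = 50_000                      # vectors in the benchmark
SEQ_S, BATCH_S = 34.0, 10.0     # approximate wall-clock times quoted above

print(f"speedup:    {speedup(SEQ_S, BATCH_S):.1f}x")
print(f"sequential: {throughput(N, SEQ_S):,.0f} vectors/s")
print(f"add_batch:  {throughput(N, BATCH_S):,.0f} vectors/s")
```

In throughput terms, that is roughly 1,500 vectors/s for the sequential loop against 5,000 vectors/s for the batched call.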

## Combined with parallel load

Phase 8 also ships **parallel HNSW load** via `FEATHER_LOAD_THREADS`. The full fast-startup pattern:

```python
import os, feather_db as fdb
import numpy as np

os.environ["FEATHER_LOAD_THREADS"] = "8"   # parallel cold-start load

db = fdb.DB.open("corpus.feather", dim=768)

# Ingest the whole corpus in one parallel call
vecs = np.load("corpus.npy").astype(np.float32, copy=False)   # shape: (100_000, 768)
ids  = list(range(len(vecs)))   # one ID per row, so the lengths always match
db.add_batch(ids, vecs)
db.save()
```
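Rather than hard-coding `"8"`, the thread count can be derived from the host. This is a sketch under assumptions: `load_threads` is a hypothetical helper, the cap of 8 is arbitrary, and it assumes (as in the example above) that the variable is set before `DB.open()` is called.

```python
import os


def load_threads(cap=8):
    """Pick a load-thread count: all visible cores, at least 1, at most `cap`."""
    cores = os.cpu_count() or 1   # cpu_count() can return None
    return max(1, min(cores, cap))


# setdefault keeps any value already exported in the shell
os.environ.setdefault("FEATHER_LOAD_THREADS", str(load_threads()))
```

Using `setdefault` means an operator can still override the value from the environment without touching the code.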

## When to use add_batch vs add

Use `add_batch()` whenever you're ingesting more than ~1k vectors at once:

- Corpus ingestion pipelines (PDF chunking, web crawls, document imports)

- Cold-start memory seeding (loading a user's historical data at session start)

- Batch import from CSV / Parquet / database exports

- Benchmark harnesses (LongMemEval, SIFT1M ingest phase)

Use sequential `add()` for real-time, single-item ingestion where latency per item matters more than throughput — adding a new memory immediately after a conversation turn, for example.
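The rule of thumb above can be wrapped in a small dispatcher. This is a sketch, not a Feather DB API: `ingest` and `BATCH_THRESHOLD` are hypothetical names, the 1,000 cutoff simply mirrors the ~1k guideline, and `db` is assumed to expose `add(id, vec)` and `add_batch(ids, vecs)` as described in this post.

```python
BATCH_THRESHOLD = 1_000   # mirrors the ~1k guideline; tune for your workload


def ingest(db, ids, vecs, threshold=BATCH_THRESHOLD):
    """Route to add_batch() for bulk loads and to add() for small ones.

    Returns the name of the path taken, which is handy for logging.
    """
    if len(ids) != len(vecs):
        raise ValueError(f"{len(ids)} ids for {len(vecs)} vectors")
    if len(ids) > threshold:
        db.add_batch(ids, vecs)        # one parallel call
        return "add_batch"
    for id_, vec in zip(ids, vecs):    # low per-item latency path
        db.add(id_, vec)
    return "add"
```

Keeping the threshold as a parameter makes it easy to measure where the crossover actually sits on your own hardware.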

## Metadata with add_batch

```python
import feather_db as fdb
import numpy as np

db = fdb.DB.open("corpus.feather", dim=768)

# Assign importance from an external score (e.g. engagement, spend)
scores = np.load("scores.npy")   # float array, same length as vecs

metas = []
for score in scores:
    m = fdb.Metadata(importance=float(min(1.0, score)))
    m.set_attribute("source", "batch_import")
    metas.append(m)

ids  = list(range(len(scores)))
vecs = np.load("vecs.npy")
db.add_batch(ids, vecs, metas=metas)
db.save()   # persist to disk, as in the earlier examples
```

**Important**: use `meta.set_attribute(key, value)` — not `meta.attributes[key] = value`. The latter silently does nothing due to a pybind11 copy semantics issue.
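The failure mode is easier to see in a pure-Python analogy. The class below is not Feather's implementation: `FakeMetadata` is a hypothetical stand-in whose `attributes` property returns a fresh copy on every access, which is one way a binding layer can end up with this behavior.

```python
class FakeMetadata:
    """Stand-in that mimics a getter returning a copy of its attribute map."""

    def __init__(self):
        self._attrs = {}

    @property
    def attributes(self):
        return dict(self._attrs)      # a new dict on every access

    def set_attribute(self, key, value):
        self._attrs[key] = value      # writes to the real storage


m = FakeMetadata()
m.attributes["source"] = "batch_import"   # mutates a temporary copy
print(m.attributes)                       # {}

m.set_attribute("source", "batch_import")
print(m.attributes)                       # {'source': 'batch_import'}
```

No exception is raised in the first case, which is why the mistake tends to surface only later, when a filter on the attribute returns nothing.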

## Install

`pip install feather-db`. `add_batch()` is available from v0.13 onwards.

**GitHub:** [github.com/feather-store/feather](https://github.com/feather-store/feather)

---

*This is the machine-readable mirror of the theory post at [getfeather.store/theory/feather-db-add-batch-parallel-ingestion](https://getfeather.store/theory/feather-db-add-batch-parallel-ingestion). For the full Feather DB documentation, see [getfeather.store/llms-full.txt](https://getfeather.store/llms-full.txt).*