# add_batch(): 3.4× Faster Bulk Ingestion in Feather DB

> Feather DB Phase 8 ships add_batch(): parallel batch ingest with the GIL released. At scale, it is 3.4× faster than sequential add() calls. Here's the API, the internals, and when to use it.

- **Category**: Performance
- **Read time**: 5 min
- **Date**: June 16, 2026
- **Author**: Feather DB (Engineering)
- **URL**: https://getfeather.store/theory/feather-db-add-batch-parallel-ingestion

---

## The sequential ingestion bottleneck

Ingesting large vector corpora into Feather DB used to be sequential: a Python loop calling `db.add(id, vec)` once per document. Each call crosses the Python/C++ boundary through pybind11, inserts a single node into the HNSW graph while the GIL is held, and then returns to Python. The per-call overhead is therefore paid N times, and only one core ever does graph construction. At 100k+ documents, this loop becomes the bottleneck.

`add_batch()`, shipped in Phase 8 of Feather's optimization roadmap, builds the HNSW graph in parallel with the GIL released. The result is **~3.4× faster bulk insert** than the equivalent Python loop.

## The API

```python
import feather_db as fdb
import numpy as np

N = 10_000
db = fdb.DB.open("corpus.feather", dim=768)

# Prepare your data
ids = list(range(N))
vecs = np.random.randn(N, 768).astype(np.float32)

# Optional: metadata per vector
metas = [fdb.Metadata(importance=0.8) for _ in range(N)]

# Single parallel call; the GIL is released during graph construction
db.add_batch(ids, vecs, metas=metas)
db.save()
```

`add_batch()` accepts:

- `ids`: list of int or 1-D int array
- `vecs`: 2-D float32 NumPy array, shape `(N, dim)`
- `metas`: optional list of `Metadata` objects, length N

The call is equivalent to N sequential `add()` calls, but it uses an internal thread pool: HNSW candidate lists are built in parallel and then merged into the main index.
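Because `add_batch()` expects its inputs in the shapes listed above, it can help to validate them once in Python before the call. The helper below is a minimal sketch built only from that parameter list. `prepare_batch` is a hypothetical name and is not part of `feather_db`. The sketch also assumes you would rather have a shape or length mismatch fail fast in Python than inside the C++ call.

```python
import numpy as np


def prepare_batch(ids, vecs, dim, metas=None):
    """Validate and coerce inputs to the shapes add_batch() documents.

    Hypothetical helper (not part of feather_db): ids become a list of
    Python ints, vecs a C-contiguous float32 array of shape (N, dim), and
    metas, if given, must have length N.
    """
    ids_arr = np.asarray(ids)
    if ids_arr.ndim != 1 or not np.issubdtype(ids_arr.dtype, np.integer):
        raise ValueError("ids must be a 1-D sequence of integers")

    # Converts float64 input to float32; an array that is already a
    # C-contiguous float32 array is returned as-is, without a copy.
    vecs_arr = np.ascontiguousarray(vecs, dtype=np.float32)
    if vecs_arr.ndim != 2 or vecs_arr.shape[1] != dim:
        raise ValueError(f"vecs must have shape (N, {dim}), got {vecs_arr.shape}")

    n = vecs_arr.shape[0]
    if ids_arr.shape[0] != n:
        raise ValueError(f"got {ids_arr.shape[0]} ids for {n} vectors")
    if metas is not None and len(metas) != n:
        raise ValueError(f"got {len(metas)} metas for {n} vectors")

    return ids_arr.tolist(), vecs_arr, metas
```

With `ids, vecs, metas = prepare_batch(ids, vecs, 768, metas)` placed before `db.add_batch(ids, vecs, metas=metas)`, float64 data is converted to float32 up front. This matters because float64 is the default dtype of `np.random.randn` and many loaders. An array that already conforms passes through untouched.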
## Benchmark numbers

On a 4-core machine, inserting 50k × 768-dim vectors:

| Method | Time | Speedup |
|---|---|---|
| Sequential `add()` loop | ~34 s | 1× |
| `add_batch()` | ~10 s | 3.4× |

The speedup scales with core count up to the size of the HNSW construction thread pool (default: number of CPU cores minus 1). On an 8-core machine, expect roughly 5–6× over the sequential baseline.

## Combined with parallel load

Phase 8 also ships **parallel HNSW load**, controlled by the `FEATHER_LOAD_THREADS` environment variable. The full fast-startup pattern:

```python
import os

# Parallel cold-start load. Set this before importing feather_db so the
# value is already in place whenever the library reads it.
os.environ["FEATHER_LOAD_THREADS"] = "8"

import feather_db as fdb
import numpy as np

db = fdb.DB.open("corpus.feather", dim=768)

# Ingest 100k vectors in one parallel call
vecs = np.load("corpus.npy").astype(np.float32, copy=False)  # shape: (100_000, 768)
ids = list(range(len(vecs)))
db.add_batch(ids, vecs)
db.save()
```

## When to use add_batch vs add

Use `add_batch()` whenever you're ingesting more than ~1k vectors at once:

- Corpus ingestion pipelines (PDF chunking, web crawls, document imports)
- Cold-start memory seeding (loading a user's historical data at session start)
- Batch import from CSV / Parquet / database exports
- Benchmark harnesses (LongMemEval, SIFT1M ingest phase)

Use sequential `add()` for real-time, single-item ingestion where per-item latency matters more than throughput. One example is adding a new memory immediately after a conversation turn.

## Metadata with add_batch

```python
import feather_db as fdb
import numpy as np

db = fdb.DB.open("corpus.feather", dim=768)

vecs = np.load("vecs.npy").astype(np.float32, copy=False)
# Assign importance from an external score (e.g. engagement, spend)
scores = np.load("scores.npy")  # float array, same length as vecs

metas = []
for score in scores:
    # Cap importance at 1.0
    m = fdb.Metadata(importance=float(min(1.0, score)))
    m.set_attribute("source", "batch_import")
    metas.append(m)

ids = list(range(len(vecs)))
db.add_batch(ids, vecs, metas=metas)
db.save()
```

**Important**: use `meta.set_attribute(key, value)`, not `meta.attributes[key] = value`. The latter silently does nothing because of pybind11 copy semantics. Reading `meta.attributes` hands back a copy of the underlying map, so the assignment modifies that copy, which is then discarded.
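The `set_attribute` rule follows from how pybind11 handles STL containers. With its automatic conversions, a C++ `std::map` member is turned into a brand-new Python `dict` every time it is read. Any write to that dict therefore never reaches the C++ object. The class below is a pure-Python stand-in that reproduces this behavior. It assumes that `Metadata.attributes` is bound this way, which the "copy semantics" note above suggests. `BoundMetadata` is hypothetical and is not Feather's implementation.

```python
class BoundMetadata:
    """Hypothetical pure-Python stand-in for a pybind11-bound class whose
    C++ side stores attributes in a std::map. It only mimics how the map is
    converted when read from Python."""

    def __init__(self, importance):
        self.importance = importance
        self._attrs = {}  # plays the role of the C++ std::map

    @property
    def attributes(self):
        # Like pybind11's STL conversion: each read builds a fresh dict,
        # so callers get a copy rather than a view of the real storage.
        return dict(self._attrs)

    def set_attribute(self, key, value):
        # A bound setter method writes through to the underlying storage.
        self._attrs[key] = value


m = BoundMetadata(importance=0.8)
m.attributes["source"] = "batch_import"  # mutates a temporary copy only
print(m.attributes)                       # {}
m.set_attribute("source", "batch_import")
print(m.attributes)                       # {'source': 'batch_import'}
```

Because every read returns a new object, `m.attributes is m.attributes` is `False`. This quick check is a handy way to tell whether a bound container property is a copy.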
## Install

`pip install feather-db`. `add_batch()` is available in v0.13 and later.

**GitHub:** [github.com/feather-store/feather](https://github.com/feather-store/feather)

---

*This is the machine-readable mirror of the theory post at [getfeather.store/theory/feather-db-add-batch-parallel-ingestion](https://getfeather.store/theory/feather-db-add-batch-parallel-ingestion). For the full Feather DB documentation, see [getfeather.store/llms-full.txt](https://getfeather.store/llms-full.txt).*