ZeroProximity.VectorizedContentIndexer 1.0.2

🔍 ZeroProximity.VectorizedContentIndexer

A high-performance hybrid content indexing library for .NET — combining BM25 keyword search, semantic vector search, and RRF fusion with a fully embedded ONNX model. Zero config. Works offline.

📋 Overview

ZeroProximity.VectorizedContentIndexer brings production-grade hybrid search to any .NET 9 application without requiring external services, API keys, or model downloads. It ships an embedded MiniLM-L6-v2 ONNX model (384 dimensions, 22.7M parameters) and a custom binary vector index format designed for memory-efficient, high-throughput retrieval.

Why this library?

Capability	ZeroProximity.VectorizedContentIndexer
Embedding model	Embedded MiniLM-L6-v2 — no download, no API key
Vector index format	Custom AJVI binary format — append-only, memory-mapped, Float16
Storage efficiency	~825 bytes/vector at Float16 (384 dimensions)
Search modes	Lexical (BM25), Semantic, and Hybrid (RRF)
Document model	Generic `ISearchable` / `IDocument` — works with any content type
Hierarchical docs	Built-in parent-child relationship support
GPU acceleration	DirectML support (Windows) for faster embedding generation; roughly 5–7.5x in the benchmarks below
Time-based ranking	Exponential temporal decay (configurable half-life)
Dependencies	All Apache 2.0 / MIT — fully permissive

✨ Key Features

Search Modes

Lexical Search — Lucene.NET BM25 keyword search with inverted index; fast and precise for exact term matching
Semantic Search — Dense vector search using ONNX-generated embeddings; finds conceptually similar content even without shared keywords
Hybrid Search — Reciprocal Rank Fusion (RRF) combines lexical and semantic rankings into a single, balanced result set
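
To illustrate how Reciprocal Rank Fusion can merge a lexical and a semantic ranking, here is a minimal standalone sketch. It is not the library's implementation: the `FuseRrf` helper and the way it applies the two weights are assumptions made for this example, and k = 60 simply mirrors the `RrfK` value shown in the configuration section. Each ranking contributes weight / (k + rank) for every document it contains, with ranks starting at 1.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: fuses two ranked ID lists with weighted RRF.
static List<(string Id, double Score)> FuseRrf(
    IReadOnlyList<string> lexicalRanking,
    IReadOnlyList<string> semanticRanking,
    double lexicalWeight = 0.5,
    double semanticWeight = 0.5,
    int k = 60)
{
    var scores = new Dictionary<string, double>();

    // Adds weight / (k + rank) for each document in one ranking (rank is 1-based).
    void Accumulate(IReadOnlyList<string> ranking, double weight)
    {
        for (int i = 0; i < ranking.Count; i++)
        {
            scores.TryGetValue(ranking[i], out double current);
            scores[ranking[i]] = current + weight / (k + i + 1);
        }
    }

    Accumulate(lexicalRanking, lexicalWeight);
    Accumulate(semanticRanking, semanticWeight);

    // Highest fused score first; ties broken by ID for a stable order.
    return scores
        .OrderByDescending(pair => pair.Value)
        .ThenBy(pair => pair.Key, StringComparer.Ordinal)
        .Select(pair => (Id: pair.Key, Score: pair.Value))
        .ToList();
}

string[] lexicalHits = { "doc1", "doc2", "doc3" };
string[] semanticHits = { "doc3", "doc1", "doc4" };

foreach (var (id, score) in FuseRrf(lexicalHits, semanticHits))
{
    Console.WriteLine($"{id}: {score:F5}");
}
```

Documents that appear in both rankings (doc1 and doc3 here) collect two contributions, so they rise above documents that only one engine returned.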

Document Model

ISearchable — Minimal interface: provide text and a timestamp, the library handles the rest
IDocument — Extended interface for richer content with ID, metadata, and field control
IHierarchicalDocument<TChild> — First-class support for parent-child structures (e.g., a session containing messages), enabling child-level indexing with parent-level context retrieval

Performance

Parallel lexical + semantic query execution in hybrid mode
Memory-mapped AJVI index for low-latency random reads at scale
SIMD cosine similarity (System.Numerics intrinsics)
Float16 storage cuts vector memory footprint by ~50% vs Float32
DirectML GPU acceleration for batch embedding workloads
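
The SIMD cosine similarity mentioned above can be sketched with `System.Numerics.Vector<float>`. This is an illustrative sketch, not the library's code: the `CosineSimilarity` helper is hypothetical, it works on `float` values (Float16 data would first have to be widened to `float`), and it assumes both vectors have the same length.

```csharp
using System;
using System.Numerics;

// Hypothetical helper: cosine similarity using hardware-sized SIMD lanes.
static float CosineSimilarity(ReadOnlySpan<float> left, ReadOnlySpan<float> right)
{
    if (left.Length != right.Length)
        throw new ArgumentException("Vectors must have the same length.");

    var dotAcc = Vector<float>.Zero;
    var leftAcc = Vector<float>.Zero;
    var rightAcc = Vector<float>.Zero;
    int width = Vector<float>.Count;
    int i = 0;

    // Process full SIMD-width chunks.
    for (; i <= left.Length - width; i += width)
    {
        var l = new Vector<float>(left.Slice(i, width));
        var r = new Vector<float>(right.Slice(i, width));
        dotAcc += l * r;
        leftAcc += l * l;
        rightAcc += r * r;
    }

    float dot = Vector.Sum(dotAcc);
    float leftNorm = Vector.Sum(leftAcc);
    float rightNorm = Vector.Sum(rightAcc);

    // Scalar loop for the remaining elements.
    for (; i < left.Length; i++)
    {
        dot += left[i] * right[i];
        leftNorm += left[i] * left[i];
        rightNorm += right[i] * right[i];
    }

    float denominator = MathF.Sqrt(leftNorm) * MathF.Sqrt(rightNorm);
    return denominator == 0f ? 0f : dot / denominator;
}

float[] v1 = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
float[] v2 = { 2, 4, 6, 8, 10, 12, 14, 16, 18, 20 };
Console.WriteLine(CosineSimilarity(v1, v2)); // same direction, so close to 1
```

For 384-dimensional embeddings the vectorized loop covers the whole array on common SIMD widths (4, 8, or 16 floats), and the scalar tail only runs for lengths that are not a multiple of the lane count.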

Thread Safety

Concurrent reads fully supported
Writes are serialized — safe for multi-threaded indexing pipelines

📦 Installation

dotnet add package ZeroProximity.VectorizedContentIndexer --version 1.0.2

The current release, 1.0.2, is a stable version, so the --prerelease flag is not required.

🚀 Quick Start

Basic: Hybrid Search with ISearchable

using ZeroProximity.VectorizedContentIndexer.Embeddings;
using ZeroProximity.VectorizedContentIndexer.Search;
using ZeroProximity.VectorizedContentIndexer.Models;

// Define your searchable content type
public record DocumentChunk : ISearchable
{
    public required string Id { get; init; }
    public required string Content { get; init; }
    public required DateTime CreatedAt { get; init; }

    public string GetSearchableText() => Content;
    public DateTime GetTimestamp() => CreatedAt;
}

// Initialize the embedded ONNX embedding provider (no download required)
IEmbeddingProvider embeddings = await EmbeddingProviderFactory.CreateAsync();

// Compose a hybrid search engine
ISearchEngine<DocumentChunk> searchEngine = new HybridSearcher<DocumentChunk>(
    luceneEngine: new LuceneSearchEngine<DocumentChunk>("./index/lucene"),
    vectorEngine: new VectorSearchEngine<DocumentChunk>("./index/vector", embeddings),
    lexicalWeight: 0.5,
    semanticWeight: 0.5
);

// Index a document
await searchEngine.IndexAsync(new DocumentChunk
{
    Id = "doc1",
    Content = "How to optimize async performance in C#",
    CreatedAt = DateTime.UtcNow
});

// Search — hybrid mode by default
IReadOnlyList<SearchResult<DocumentChunk>> results =
    await searchEngine.SearchAsync("async optimization", maxResults: 10);

foreach (SearchResult<DocumentChunk> result in results)
{
    Console.WriteLine($"[{result.Score:F3}] {result.Document.Content}");
}

RAG Pipeline

Retrieve semantically relevant context chunks to augment an LLM prompt:

// Retrieve the top 5 most semantically relevant chunks
IReadOnlyList<SearchResult<DocumentChunk>> chunks =
    await searchEngine.SearchAsync(userQuery, maxResults: 5, mode: SearchMode.Semantic);

string context = string.Join("\n\n", chunks.Select(r => r.Document.Content));

string prompt = $"""
    Context:
    {context}

    Question: {userQuery}

    Answer:
    """;

Hierarchical Documents (IHierarchicalDocument)

Index parent documents whose children are individually searchable, and expand context at retrieval time:

public class Message
{
    public required string Id { get; init; }
    public required string Text { get; init; }
    public required DateTime SentAt { get; init; }
}

public class ConversationSession : IHierarchicalDocument<Message>
{
    public required string Id { get; init; }
    public required IReadOnlyList<Message> Messages { get; init; }

    // Each message is indexed individually
    public IReadOnlyList<Message> GetChildren() => Messages;

    // Retrieve N messages of context before a matched message
    public IReadOnlyList<Message> GetChildrenBefore(string messageId, int count) =>
        Messages
            .TakeWhile(m => m.Id != messageId)
            .TakeLast(count)
            .ToList();
}

// The engine indexes each Message as a searchable unit,
// but returns the parent ConversationSession with surrounding context
var sessionEngine = new HybridSearcher<ConversationSession>(
    luceneEngine: new LuceneSearchEngine<ConversationSession>("./index/lucene"),
    vectorEngine: new VectorSearchEngine<ConversationSession>("./index/vector", embeddings)
);

await sessionEngine.IndexAsync(session);

IReadOnlyList<SearchResult<ConversationSession>> hits =
    await sessionEngine.SearchAsync("authentication error", maxResults: 5);
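
One detail of the `GetChildrenBefore` example is worth noting: when `messageId` is not found, `TakeWhile` consumes the whole list, so `TakeLast(count)` returns the final messages of the session rather than an empty result. The following standalone sketch shows a guarded variant of the same context-window logic. The `ContextBefore` helper is hypothetical and works on plain ID strings so that it runs without the library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: up to `count` IDs that precede `matchId`,
// or an empty list when `matchId` is not present.
static IReadOnlyList<string> ContextBefore(IReadOnlyList<string> ids, string matchId, int count)
{
    int matchIndex = -1;
    for (int i = 0; i < ids.Count; i++)
    {
        if (ids[i] == matchId)
        {
            matchIndex = i;
            break;
        }
    }

    if (matchIndex < 0 || count <= 0)
        return Array.Empty<string>();

    int start = Math.Max(0, matchIndex - count);
    return ids.Skip(start).Take(matchIndex - start).ToList();
}

string[] messageIds = { "m1", "m2", "m3", "m4", "m5" };

Console.WriteLine(string.Join(", ", ContextBefore(messageIds, "m4", 2))); // m2, m3
Console.WriteLine(ContextBefore(messageIds, "missing", 2).Count);         // 0
```

Whether an unknown ID should yield an empty context or throw is a design choice; the sketch picks the empty result so that callers can treat "no context" uniformly.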

🔍 Search Modes

Mode	Enum Value	Best For
Lexical	`SearchMode.Lexical`	Exact term matching, keyword queries, known identifiers
Semantic	`SearchMode.Semantic`	Conceptual similarity, natural language queries, paraphrase matching
Hybrid	`SearchMode.Hybrid`	General-purpose search — balances precision and recall via RRF

// Explicit mode selection
var results = await searchEngine.SearchAsync(query, maxResults: 10, mode: SearchMode.Semantic);

🏗️ Architecture

┌──────────────────────────────────────────────────────────────────┐
│                        ISearchEngine<T>                          │
└───────────────────────────┬──────────────────────────────────────┘
                            │
              ┌─────────────▼─────────────┐
              │      HybridSearcher<T>    │
              │       (RRF Fusion)        │
              └──────┬────────────┬───────┘
                     │            │
        ┌────────────▼──┐   ┌─────▼───────────────┐
        │LuceneSearch   │   │VectorSearchEngine<T>│
        │Engine<T>      │   │(Semantic Search)    │
        │(BM25 Keyword) │   └─────────┬───────────┘
        └───────┬───────┘             │
                │               ┌─────▼───────────────┐
        ┌───────▼───────┐       │   AJVI Index        │
        │  Lucene.NET   │       │   (Memory-Mapped,   │
        │  Inverted     │       │    Float16 / F32)   │
        │  Index        │       └─────────┬───────────┘
        └───────────────┘                 │
                                 ┌────────▼──────────────┐
                                 │  IEmbeddingProvider   │
                                 ├───────────────────────┤
                                 │  OnnxEmbeddingProvider│
                                 │  (MiniLM-L6-v2,       │
                                 │   384 dims, DirectML) │
                                 └───────────────────────┘

Component	Role
`ISearchable` / `IDocument`	Minimal interfaces for indexable content
`IEmbeddingProvider`	Pluggable embedding generation (ONNX, hash-based, or custom)
`LuceneSearchEngine<T>`	BM25 keyword search backed by Lucene.NET
`VectorSearchEngine<T>`	Semantic search using the AJVI custom vector index
`HybridSearcher<T>`	RRF fusion layer that merges lexical and semantic result sets
`AJVI Index`	Append-only binary vector store with memory-mapped I/O and SIMD cosine similarity
`DecayCalculator`	Exponential temporal decay (90-day half-life by default) for freshness boosting
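
Exponential decay with a half-life can be written as factor = 0.5^(age / halfLife). The sketch below shows that formula in isolation. It is an assumption about the general technique, not the internals of `DecayCalculator`: the `DecayFactor` helper is hypothetical, and how the library combines the factor with a relevance score is not specified here.

```csharp
using System;

// Hypothetical helper: multiplier in (0, 1] that halves every `halfLifeDays`.
static double DecayFactor(DateTime timestamp, DateTime asOf, double halfLifeDays = 90.0)
{
    // Timestamps in the future are treated as age zero.
    double ageDays = Math.Max(0.0, (asOf - timestamp).TotalDays);
    return Math.Pow(0.5, ageDays / halfLifeDays);
}

var today = new DateTime(2026, 1, 1, 0, 0, 0, DateTimeKind.Utc);

Console.WriteLine(DecayFactor(today, today));               // 1
Console.WriteLine(DecayFactor(today.AddDays(-90), today));  // 0.5
Console.WriteLine(DecayFactor(today.AddDays(-180), today)); // 0.25
```

With the default 90-day half-life, a document loses half of its freshness weight every 90 days; a shorter half-life favors recent content more aggressively.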

⚡ Performance

Benchmarks measured on typical developer hardware (CPU: modern x64; GPU: DirectML-compatible discrete GPU).

Indexing

Operation	CPU	GPU (DirectML)
Embed single document	~15 ms	~2 ms
Index 100 documents	~1.5 s	~300 ms
Lucene index 1,000 documents	~50 ms	—

Search

Operation	Dataset Size	Latency
Lexical (BM25)	10K documents	~20 ms
Semantic (AJVI, Float16)	100K vectors	~80 ms
Hybrid (parallel RRF)	10K documents	~30 ms

Storage

Format	Bytes per Vector (384 dims)
AJVI Float16	~825 bytes
AJVI Float32	~1,593 bytes
Lucene (per document)	~500–2,000 bytes (content-dependent)
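
The per-vector figures in the table can be reproduced with simple arithmetic. A 384-dimension vector needs 768 bytes at Float16 and 1,536 bytes at Float32, and both table values exceed the raw payload by the same 57 bytes. The sketch below treats those 57 bytes as fixed per-record overhead; that interpretation is inferred from the numbers above, not taken from the AJVI format specification, and `BytesPerVector` is a hypothetical helper.

```csharp
using System;

const int Dimensions = 384;
const int InferredOverheadBytes = 57; // 825 - 384 * 2, and also 1593 - 384 * 4

// Hypothetical helper: raw payload plus fixed per-record overhead.
static long BytesPerVector(int dimensions, int bytesPerComponent, int overheadBytes) =>
    (long)dimensions * bytesPerComponent + overheadBytes;

long float16Bytes = BytesPerVector(Dimensions, 2, InferredOverheadBytes);
long float32Bytes = BytesPerVector(Dimensions, 4, InferredOverheadBytes);

Console.WriteLine($"Float16: {float16Bytes} bytes per vector");
Console.WriteLine($"Float32: {float32Bytes} bytes per vector");

// Estimated index size for a given corpus.
const long VectorCount = 100_000;
double float16MiB = float16Bytes * VectorCount / (1024.0 * 1024.0);
double float32MiB = float32Bytes * VectorCount / (1024.0 * 1024.0);
Console.WriteLine($"100K vectors: {float16MiB:F1} MiB (Float16), {float32MiB:F1} MiB (Float32)");
```

Because the overhead is the same for both precisions, the Float16 saving is slightly under 50% per record (825 versus 1,593 bytes).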

Scalability

Under 100K vectors — Excellent performance with brute-force nearest neighbor
100K–1M vectors — Good performance; review performance tuning guidance
Over 1M vectors — Approximate nearest neighbor support is a planned enhancement
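
Brute-force nearest neighbor search scores every stored vector against the query and keeps the best k. The sketch below shows one common way to do that with a bounded min-heap. It is a generic illustration, not the library's search loop: `Cosine` and `TopK` are hypothetical helpers operating on in-memory `float[]` arrays.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: scalar cosine similarity of two equal-length vectors.
static float Cosine(float[] a, float[] b)
{
    float dot = 0f, normA = 0f, normB = 0f;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    float denominator = MathF.Sqrt(normA) * MathF.Sqrt(normB);
    return denominator == 0f ? 0f : dot / denominator;
}

// Hypothetical helper: exhaustive scan that keeps the k highest-scoring vectors.
static List<(int Index, float Score)> TopK(float[][] vectors, float[] query, int k)
{
    // PriorityQueue is a min-heap, so the weakest retained hit is always on top.
    var heap = new PriorityQueue<int, float>();
    for (int i = 0; i < vectors.Length; i++)
    {
        float similarity = Cosine(vectors[i], query);
        if (heap.Count < k)
            heap.Enqueue(i, similarity);
        else if (heap.TryPeek(out _, out float weakest) && similarity > weakest)
            heap.EnqueueDequeue(i, similarity);
    }

    // Drain in ascending order, then reverse to get best-first.
    var results = new List<(int Index, float Score)>(heap.Count);
    while (heap.TryDequeue(out int index, out float best))
        results.Add((index, best));
    results.Reverse();
    return results;
}

float[][] stored =
{
    new[] { 1f, 0f },
    new[] { 0f, 1f },
    new[] { 0.7f, 0.7f },
    new[] { -1f, 0f },
};

foreach (var (index, score) in TopK(stored, new[] { 1f, 0f }, k: 2))
{
    Console.WriteLine($"vector {index}: {score:F3}");
}
```

The scan is O(n) per query in the number of vectors, which is why exhaustive search stays practical for smaller indexes and approximate methods become attractive at larger scales.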

⚙️ Configuration

services.Configure<SearchEngineOptions>(options =>
{
    options.IndexPath         = "./index";
    options.DefaultMode       = SearchMode.Hybrid;
    options.Precision         = VectorPrecision.Float16;  // ~50% storage savings vs Float32
    options.LexicalWeight     = 0.5;
    options.SemanticWeight    = 0.5;
    options.RrfK              = 60;    // Reciprocal Rank Fusion constant
    options.DecayHalfLifeDays = 90.0;  // Exponential temporal decay half-life
});

📋 Requirements

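.NET 9.0 or later
Platforms: Windows, Linux, macOS
GPU (optional): DirectML-compatible GPU for accelerated embedding generation. DirectML is a Windows API, and only the net9.0-windows7.0 target references Microsoft.ML.OnnxRuntime.DirectML; the net9.0 target references the standard Microsoft.ML.OnnxRuntime package.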

Dependencies

Package	Version	License
Lucene.Net	4.8.0-beta00016	Apache 2.0
Microsoft.ML.OnnxRuntime (net9.0) or Microsoft.ML.OnnxRuntime.DirectML (net9.0-windows7.0)	1.20.0	MIT
Microsoft.ML.Tokenizers	1.0.0	MIT
System.Numerics.Tensors	10.0.1	MIT

📚 Documentation

Document	Description
Getting Started	Step-by-step tutorial for your first integration
API Reference	Complete API surface reference
Architecture Overview	Component design, data flow, and index internals
Hierarchical Documents	Parent-child document indexing and context expansion
Custom Field Mapping	Controlling how document fields are indexed
Performance Tuning	Scaling guidance and optimization strategies
Temporal Decay	Configuring freshness-based relevance ranking
Migration Guide	Migrating from agent-session-search-tools
Samples	Runnable RAG and Agent Session examples

🤝 Contributing

Contributions are welcome. Please read CONTRIBUTING.md before opening a pull request.

Open an issue to report bugs or request features
Start a discussion for questions and ideas

📄 License

MIT License — see LICENSE for full terms.

🙏 Acknowledgments

Embedding Model

sentence-transformers/all-MiniLM-L6-v2: 384 dimensions, 22.7M parameters. Apache 2.0 License. Trained by Nils Reimers.

Research

Reciprocal Rank Fusion (RRF): Cormack, Clarke, and Buettcher. "Reciprocal Rank Fusion Outperforms Condorcet and Individual Rank Learning Methods." SIGIR 2009.

Inspiration

Lucene.NET — The foundation of the lexical search layer
Elasticsearch — Hybrid search implementation patterns
Vespa.ai — Multi-phase retrieval architecture

Product	Compatible and additional computed target framework versions.
.NET	net9.0 is compatible. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net9.0-windows7.0 is compatible. net10.0 was computed. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed.

net9.0
- Lucene.Net (>= 4.8.0-beta00016)
- Lucene.Net.Analysis.Common (>= 4.8.0-beta00016)
- Lucene.Net.QueryParser (>= 4.8.0-beta00016)
- Microsoft.Extensions.DependencyInjection.Abstractions (>= 9.0.0)
- Microsoft.Extensions.Logging.Abstractions (>= 9.0.0)
- Microsoft.Extensions.Options (>= 9.0.0)
- Microsoft.ML.OnnxRuntime (>= 1.20.0)
- Microsoft.ML.Tokenizers (>= 1.0.0)
- System.Numerics.Tensors (>= 10.0.1)
net9.0-windows7.0
- Lucene.Net (>= 4.8.0-beta00016)
- Lucene.Net.Analysis.Common (>= 4.8.0-beta00016)
- Lucene.Net.QueryParser (>= 4.8.0-beta00016)
- Microsoft.Extensions.DependencyInjection.Abstractions (>= 9.0.0)
- Microsoft.Extensions.Logging.Abstractions (>= 9.0.0)
- Microsoft.Extensions.Options (>= 9.0.0)
- Microsoft.ML.OnnxRuntime.DirectML (>= 1.20.0)
- Microsoft.ML.Tokenizers (>= 1.0.0)
- System.Numerics.Tensors (>= 10.0.1)

Version	Downloads	Last Updated
1.0.2	112	3/12/2026
1.0.1	104	3/12/2026