ZeroProximity.VectorizedContentIndexer 1.0.2

.NET CLI:
dotnet add package ZeroProximity.VectorizedContentIndexer --version 1.0.2

Package Manager Console (Visual Studio; uses the NuGet module's version of Install-Package):
NuGet\Install-Package ZeroProximity.VectorizedContentIndexer -Version 1.0.2

PackageReference (copy this XML node into the project file):
<PackageReference Include="ZeroProximity.VectorizedContentIndexer" Version="1.0.2" />

Central Package Management (CPM): version the package in the solution's Directory.Packages.props file:
<PackageVersion Include="ZeroProximity.VectorizedContentIndexer" Version="1.0.2" />
then reference it without a version in the project file:
<PackageReference Include="ZeroProximity.VectorizedContentIndexer" />

Paket CLI:
paket add ZeroProximity.VectorizedContentIndexer --version 1.0.2

F# Interactive and Polyglot Notebooks (#r directive; copy into the interactive tool or the script source):
#r "nuget: ZeroProximity.VectorizedContentIndexer, 1.0.2"

C# file-based apps (.NET 10 preview 4 and later; place before any lines of code in the .cs file):
#:package ZeroProximity.VectorizedContentIndexer@1.0.2

Cake Addin:
#addin nuget:?package=ZeroProximity.VectorizedContentIndexer&version=1.0.2

Cake Tool:
#tool nuget:?package=ZeroProximity.VectorizedContentIndexer&version=1.0.2

🔍 ZeroProximity.VectorizedContentIndexer


A high-performance hybrid content indexing library for .NET that combines BM25 keyword search, semantic vector search, and Reciprocal Rank Fusion (RRF) with a fully embedded ONNX model. Zero config. Works offline.


📋 Overview

ZeroProximity.VectorizedContentIndexer brings production-grade hybrid search to any .NET 9 application without requiring external services, API keys, or model downloads. It ships an embedded MiniLM-L6-v2 ONNX model (384 dimensions, 22.7M parameters) and a custom binary vector index format designed for memory-efficient, high-throughput retrieval.

Why this library?

Capability | ZeroProximity.VectorizedContentIndexer
Embedding model | Embedded MiniLM-L6-v2; no download, no API key
Vector index format | Custom AJVI binary format: append-only, memory-mapped, Float16
Storage efficiency | ~825 bytes/vector at Float16 (384 dimensions)
Search modes | Lexical (BM25), Semantic, and Hybrid (RRF)
Document model | Generic ISearchable / IDocument; works with any content type
Hierarchical docs | Built-in parent-child relationship support
GPU acceleration | DirectML support for faster embedding generation (about 5–7.5x in the benchmarks below)
Time-based ranking | Exponential temporal decay (configurable half-life)
Dependencies | All Apache 2.0 / MIT, fully permissive

✨ Key Features

Search Modes

  • Lexical Search: Lucene.NET BM25 keyword search over an inverted index; fast and precise for exact term matching
  • Semantic Search: dense vector search using ONNX-generated embeddings; finds conceptually similar content even without shared keywords
  • Hybrid Search: Reciprocal Rank Fusion (RRF) combines lexical and semantic rankings into a single, balanced result set
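The lexical mode ranks documents with Okapi BM25. To show what that scoring does, here is a minimal, self-contained sketch over a toy in-memory corpus. It assumes whitespace tokenization and the common defaults k1 = 1.2 and b = 0.75; the corpus and the `Idf` / `Bm25Score` helpers are hypothetical, and Lucene.NET's actual scorer adds text analysis and other details not shown here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical toy corpus, tokenized by splitting on spaces (no Lucene analyzers here).
string[][] docs =
{
    "how to optimize async performance in c#".Split(' '),
    "async await basics in c#".Split(' '),
    "baking sourdough bread at home".Split(' '),
};

const double K1 = 1.2; // term-frequency saturation (assumed common default)
const double B = 0.75; // document-length normalization (assumed common default)
double avgDocLength = docs.Average(d => d.Length);

// BM25 inverse document frequency; the "1 +" inside the log keeps it non-negative.
double Idf(string term)
{
    int docFreq = docs.Count(d => d.Contains(term));
    return Math.Log(1 + (docs.Length - docFreq + 0.5) / (docFreq + 0.5));
}

// Okapi BM25 score of one document for a bag-of-words query.
double Bm25Score(string[] doc, string[] queryTerms)
{
    double total = 0;
    foreach (string term in queryTerms)
    {
        int tf = doc.Count(t => t == term);
        if (tf == 0) continue;
        double lengthNorm = K1 * (1 - B + B * doc.Length / avgDocLength);
        total += Idf(term) * tf * (K1 + 1) / (tf + lengthNorm);
    }
    return total;
}

string[] query = "async performance".Split(' ');
var ranked = docs
    .Select((doc, i) => (Index: i, Score: Bm25Score(doc, query)))
    .OrderByDescending(r => r.Score)
    .ToList();

foreach (var (docIndex, docScore) in ranked)
    Console.WriteLine($"doc{docIndex}: {docScore:F3}");
```

The first document wins because it matches both query terms, and "performance" is rarer (higher IDF) than "async"; the third document shares no terms and scores zero.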

Document Model

  • ISearchable: minimal interface; provide text and a timestamp and the library handles the rest
  • IDocument: extended interface for richer content, with an ID, metadata, and field control
  • IHierarchicalDocument<TChild>: first-class support for parent-child structures (e.g., a session containing messages), enabling child-level indexing with parent-level context retrieval

Performance

  • Parallel lexical + semantic query execution in hybrid mode
  • Memory-mapped AJVI index for low-latency random reads at scale
  • SIMD cosine similarity (System.Numerics intrinsics)
  • Float16 storage cuts vector memory footprint by ~50% vs Float32
  • DirectML GPU acceleration for batch embedding workloads
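SIMD cosine similarity can be illustrated with `System.Numerics.Vector<float>`, whose operations the JIT lowers to hardware vector instructions where available. This is a minimal sketch of the technique, not the library's kernel; `CosineSimilarity` is a hypothetical helper. Because the package already depends on System.Numerics.Tensors, `TensorPrimitives.CosineSimilarity` is a ready-made alternative.

```csharp
using System;
using System.Numerics;

// Hypothetical helper: cosine = dot(a, b) / (|a| * |b|), accumulated Vector<float>.Count lanes at a time.
float CosineSimilarity(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Vectors must have the same dimension.");

    var dot = Vector<float>.Zero;
    var normA = Vector<float>.Zero;
    var normB = Vector<float>.Zero;
    int width = Vector<float>.Count;
    int i = 0;

    // Vectorized main loop over full SIMD-width chunks.
    for (; i <= a.Length - width; i += width)
    {
        var va = new Vector<float>(a.Slice(i, width));
        var vb = new Vector<float>(b.Slice(i, width));
        dot += va * vb;
        normA += va * va;
        normB += vb * vb;
    }

    // Horizontal sums of the per-lane accumulators.
    float dotSum = Vector.Dot(dot, Vector<float>.One);
    float normASum = Vector.Dot(normA, Vector<float>.One);
    float normBSum = Vector.Dot(normB, Vector<float>.One);

    // Scalar tail for dimensions that do not fill a whole vector.
    for (; i < a.Length; i++)
    {
        dotSum += a[i] * b[i];
        normASum += a[i] * a[i];
        normBSum += b[i] * b[i];
    }

    return dotSum / (MathF.Sqrt(normASum) * MathF.Sqrt(normBSum));
}

// Usage: two 384-dimensional vectors (the MiniLM-L6-v2 embedding size).
var rng = new Random(42);
float[] x = new float[384];
float[] y = new float[384];
for (int d = 0; d < 384; d++)
{
    x[d] = (float)rng.NextDouble();
    y[d] = -x[d];
}

Console.WriteLine(CosineSimilarity(x, x)); // close to 1
Console.WriteLine(CosineSimilarity(x, y)); // close to -1
```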

Thread Safety

  • Concurrent reads are fully supported
  • Writes are serialized, so the index is safe to use from multi-threaded indexing pipelines
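The README does not say how writes are serialized internally. One common .NET pattern that gives the same guarantees (many concurrent readers, one writer at a time) is `ReaderWriterLockSlim`; the sketch below guards a hypothetical in-memory document map that way and only illustrates the pattern.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical shared index state (document ID -> text), not the library's storage.
var documents = new Dictionary<string, string>();
using var gate = new ReaderWriterLockSlim();

// Writers take the exclusive lock, so index mutations never interleave.
void AddDocument(string id, string text)
{
    gate.EnterWriteLock();
    try { documents[id] = text; }
    finally { gate.ExitWriteLock(); }
}

// Readers share the lock, so many searches can run at the same time.
List<string> FindDocuments(string term)
{
    gate.EnterReadLock();
    try
    {
        return documents
            .Where(kv => kv.Value.Contains(term, StringComparison.OrdinalIgnoreCase))
            .Select(kv => kv.Key)
            .ToList();
    }
    finally { gate.ExitReadLock(); }
}

// Usage: a multi-threaded indexing pipeline, then several concurrent readers.
Parallel.For(0, 1_000, i => AddDocument($"doc{i}", $"message {i} about async code"));

int[] counts = new int[8];
Parallel.For(0, counts.Length, i => counts[i] = FindDocuments("ASYNC").Count);
Console.WriteLine(string.Join(", ", counts)); // each reader sees all 1,000 documents
```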

📦 Installation

dotnet add package ZeroProximity.VectorizedContentIndexer --version 1.0.2

Version 1.0.2 is a stable release, so the --prerelease flag is not needed; it was required only for earlier prerelease builds such as 1.0.0-beta.1.


🚀 Quick Start

Basic: Hybrid Search with ISearchable

using ZeroProximity.VectorizedContentIndexer.Embeddings;
using ZeroProximity.VectorizedContentIndexer.Search;
using ZeroProximity.VectorizedContentIndexer.Models;

// Initialize the embedded ONNX embedding provider (no download required)
IEmbeddingProvider embeddings = await EmbeddingProviderFactory.CreateAsync();

// Compose a hybrid search engine
ISearchEngine<DocumentChunk> searchEngine = new HybridSearcher<DocumentChunk>(
    luceneEngine: new LuceneSearchEngine<DocumentChunk>("./index/lucene"),
    vectorEngine: new VectorSearchEngine<DocumentChunk>("./index/vector", embeddings),
    lexicalWeight: 0.5,
    semanticWeight: 0.5
);

// Index a document
await searchEngine.IndexAsync(new DocumentChunk
{
    Id = "doc1",
    Content = "How to optimize async performance in C#",
    CreatedAt = DateTime.UtcNow
});

// Search (hybrid mode by default)
IReadOnlyList<SearchResult<DocumentChunk>> results =
    await searchEngine.SearchAsync("async optimization", maxResults: 10);

foreach (SearchResult<DocumentChunk> result in results)
{
    Console.WriteLine($"[{result.Score:F3}] {result.Document.Content}");
}

// Define your searchable content type.
// In a top-level-statement program, type declarations must follow the statements.
public record DocumentChunk : ISearchable
{
    public required string Id { get; init; }
    public required string Content { get; init; }
    public required DateTime CreatedAt { get; init; }

    public string GetSearchableText() => Content;
    public DateTime GetTimestamp() => CreatedAt;
}

RAG Pipeline

Retrieve semantically relevant context chunks to augment an LLM prompt:

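// The end user's question (example value)
string userQuery = "How can I make my async C# code faster?";

// Retrieve the top 5 most semantically relevant chunks
IReadOnlyList<SearchResult<DocumentChunk>> chunks =
    await searchEngine.SearchAsync(userQuery, maxResults: 5, mode: SearchMode.Semantic);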

string context = string.Join("\n\n", chunks.Select(r => r.Document.Content));

string prompt = $"""
    Context:
    {context}

    Question: {userQuery}

    Answer:
    """;

Hierarchical Documents (IHierarchicalDocument)

Index parent documents whose children are individually searchable, and expand context at retrieval time:

public class Message
{
    public required string Id { get; init; }
    public required string Text { get; init; }
    public required DateTime SentAt { get; init; }
}

public class ConversationSession : IHierarchicalDocument<Message>
{
    public required string Id { get; init; }
    public required IReadOnlyList<Message> Messages { get; init; }

    // Each message is indexed individually
    public IReadOnlyList<Message> GetChildren() => Messages;

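    // Retrieve up to `count` messages immediately before a matched message
    // (none if the message does not belong to this session)
    public IReadOnlyList<Message> GetChildrenBefore(string messageId, int count)
    {
        int index = Messages.Select(m => m.Id).ToList().IndexOf(messageId);
        return index < 0
            ? new List<Message>()
            : Messages.Take(index).TakeLast(count).ToList();
    }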
}

// The engine indexes each Message as a searchable unit,
// but returns the parent ConversationSession with surrounding context.
// Separate index directories keep sessions apart from the DocumentChunk index.
var sessionEngine = new HybridSearcher<ConversationSession>(
    luceneEngine: new LuceneSearchEngine<ConversationSession>("./index/sessions/lucene"),
    vectorEngine: new VectorSearchEngine<ConversationSession>("./index/sessions/vector", embeddings)
);

// `session` is a ConversationSession built from your own data
await sessionEngine.IndexAsync(session);

IReadOnlyList<SearchResult<ConversationSession>> hits =
    await sessionEngine.SearchAsync("authentication error", maxResults: 5);
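How child-level hits become parent results is not spelled out here. A plausible approach, sketched below under stated assumptions, is to group matched messages by session, rank each session by its best message score (max aggregation), and attach the preceding messages as context. The session data, hit list, and `ContextBefore` helper are hypothetical, with tuples standing in for the `Message` and `ConversationSession` types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical session data: session ID -> ordered (message ID, text) pairs.
var sessions = new Dictionary<string, List<(string Id, string Text)>>
{
    ["s1"] = new()
    {
        ("m1", "Hi, I can't log in"),
        ("m2", "What error do you see?"),
        ("m3", "Authentication error 401"),
    },
    ["s2"] = new()
    {
        ("m4", "Deploy failed"),
        ("m5", "Check the authentication token"),
    },
};

// Hypothetical child-level hits, as a message-level search might return them.
var childHits = new List<(string SessionId, string MessageId, double Score)>
{
    ("s1", "m3", 0.92),
    ("s2", "m5", 0.71),
    ("s1", "m2", 0.40),
};

// Up to `count` messages immediately before the matched one (empty if it is not found).
List<string> ContextBefore(string sessionId, string messageId, int count)
{
    var messages = sessions[sessionId];
    int index = messages.FindIndex(m => m.Id == messageId);
    return index < 0
        ? new List<string>()
        : messages.Take(index).TakeLast(count).Select(m => m.Text).ToList();
}

// Collapse child hits to one result per parent, ranked by the best child score.
var parentResults = childHits
    .GroupBy(h => h.SessionId)
    .Select(g => g.OrderByDescending(h => h.Score).First())
    .OrderByDescending(best => best.Score)
    .Select(best => (best.SessionId, best.Score,
                     Context: ContextBefore(best.SessionId, best.MessageId, count: 2)))
    .ToList();

foreach (var (parentId, parentScore, contextLines) in parentResults)
    Console.WriteLine($"{parentId} [{parentScore:F2}] context: {string.Join(" | ", contextLines)}");
```

Max aggregation lets one strongly matching message surface its whole session; summing child scores instead would favor long sessions with many weak matches.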

🔍 Search Modes

Mode | Enum Value | Best For
Lexical | SearchMode.Lexical | Exact term matching, keyword queries, known identifiers
Semantic | SearchMode.Semantic | Conceptual similarity, natural language queries, paraphrase matching
Hybrid | SearchMode.Hybrid | General-purpose search; balances precision and recall via RRF

// Explicit mode selection
var results = await searchEngine.SearchAsync(query, maxResults: 10, mode: SearchMode.Semantic);
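In hybrid mode the lexical and semantic queries run in parallel. The dispatch can be sketched as follows, assuming hypothetical stand-in retrievers (a 100 ms `Task.Delay` plus fixed ID lists), a string in place of the `SearchMode` enum, and unweighted RRF with k = 60 for the merge; this is not the library's `HybridSearcher`.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in retrievers; each simulates about 100 ms of work and returns best-first IDs.
async Task<List<string>> LexicalAsync(string query)
{
    await Task.Delay(100);
    return new List<string> { "doc-a", "doc-b" };
}

async Task<List<string>> SemanticAsync(string query)
{
    await Task.Delay(100);
    return new List<string> { "doc-b", "doc-c" };
}

// Dispatch on a mode string (standing in for the SearchMode enum).
async Task<List<string>> SearchAsync(string query, string mode)
{
    switch (mode)
    {
        case "lexical":
            return await LexicalAsync(query);
        case "semantic":
            return await SemanticAsync(query);
        case "hybrid":
            // Start both queries before awaiting either, so their latencies overlap.
            Task<List<string>> lexicalTask = LexicalAsync(query);
            Task<List<string>> semanticTask = SemanticAsync(query);
            await Task.WhenAll(lexicalTask, semanticTask);

            // Merge with unweighted RRF (k = 60): sum 1 / (60 + rank) across both lists.
            return lexicalTask.Result.Select((id, i) => (id, rank: i + 1))
                .Concat(semanticTask.Result.Select((id, i) => (id, rank: i + 1)))
                .GroupBy(hit => hit.id)
                .OrderByDescending(g => g.Sum(hit => 1.0 / (60 + hit.rank)))
                .Select(g => g.Key)
                .ToList();
        default:
            throw new ArgumentOutOfRangeException(nameof(mode), mode, "Unknown search mode.");
    }
}

var stopwatch = Stopwatch.StartNew();
List<string> hybrid = await SearchAsync("async optimization", "hybrid");
stopwatch.Stop();

// The two 100 ms delays overlap, so this is typically close to 100 ms rather than 200 ms.
Console.WriteLine($"{string.Join(", ", hybrid)} in {stopwatch.ElapsedMilliseconds} ms");
```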

🏗️ Architecture

  ┌───────────────────────────────────────────────────┐
  │                 ISearchEngine<T>                  │
  └──────────────────────────┬────────────────────────┘
                             │
               ┌─────────────▼─────────────┐
               │     HybridSearcher<T>     │
               │       (RRF Fusion)        │
               └──────┬─────────────┬──────┘
                      │             │
  ┌───────────────────▼───┐   ┌─────▼─────────────────┐
  │ LuceneSearchEngine<T> │   │ VectorSearchEngine<T> │
  │ (BM25 Keyword)        │   │ (Semantic Search)     │
  └───────────┬───────────┘   └───────────┬───────────┘
              │                           │
  ┌───────────▼───────────┐   ┌───────────▼───────────┐
  │ Lucene.NET            │   │ AJVI Index            │
  │ Inverted Index        │   │ (memory-mapped,       │
  └───────────────────────┘   │  Float16 / Float32)   │
                              └───────────┬───────────┘
                                          │
                              ┌───────────▼───────────┐
                              │ IEmbeddingProvider    │
                              ├───────────────────────┤
                              │ OnnxEmbeddingProvider │
                              │ (MiniLM-L6-v2,        │
                              │  384 dims, DirectML)  │
                              └───────────────────────┘

Component | Role
ISearchable / IDocument | Minimal interfaces for indexable content
IEmbeddingProvider | Pluggable embedding generation (ONNX, hash-based, or custom)
LuceneSearchEngine<T> | BM25 keyword search backed by Lucene.NET
VectorSearchEngine<T> | Semantic search using the custom AJVI vector index
HybridSearcher<T> | RRF fusion layer that merges lexical and semantic result sets
AJVI Index | Append-only binary vector store with memory-mapped I/O and SIMD cosine similarity
DecayCalculator | Exponential temporal decay (90-day half-life by default) for freshness boosting
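With exponential decay and half-life h, a result that is t days old is multiplied by 0.5^(t / h): with the default h = 90, content from 90 days ago keeps half its weight and content from 180 days ago a quarter. The sketch below implements that factor; the `DecayFactor` name, and treating future timestamps as age zero, are assumptions, since the README does not document DecayCalculator's API or exactly where the factor is applied during ranking.

```csharp
using System;

// Hypothetical helper: weight multiplier in (0, 1], 1.0 for brand-new content and 0.5 after one half-life.
double DecayFactor(DateTime timestampUtc, DateTime nowUtc, double halfLifeDays = 90.0)
{
    // Assumption: timestamps in the future are treated as age zero rather than boosted.
    double ageDays = Math.Max(0, (nowUtc - timestampUtc).TotalDays);
    return Math.Pow(0.5, ageDays / halfLifeDays);
}

var now = new DateTime(2026, 3, 12, 0, 0, 0, DateTimeKind.Utc);
foreach (int days in new[] { 0, 45, 90, 180, 365 })
{
    double factor = DecayFactor(now.AddDays(-days), now);
    Console.WriteLine($"{days,3} days old -> x{factor:F3}");
}
```

Equivalently, the factor is exp(-ln 2 · t / h), so the half-life directly sets how aggressively older content is demoted.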

⚑ Performance

Benchmarks measured on typical developer hardware (CPU: modern x64; GPU: DirectML-compatible discrete GPU).

Indexing

Operation | CPU | GPU (DirectML)
Embed single document | ~15 ms | ~2 ms
Index 100 documents | ~1.5 s | ~300 ms
Lucene index 1,000 documents | ~50 ms | n/a

Search

Operation | Dataset Size | Latency
Lexical (BM25) | 10K documents | ~20 ms
Semantic (AJVI, Float16) | 100K vectors | ~80 ms
Hybrid (parallel RRF) | 10K documents | ~30 ms

Storage

Format | Approximate size
AJVI Float16 (per 384-dim vector) | ~825 bytes
AJVI Float32 (per 384-dim vector) | ~1,593 bytes
Lucene (per document) | ~500–2,000 bytes (content-dependent)
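The vector sizes follow from the dimension count: 384 × 2 bytes = 768 bytes of Float16 data and 384 × 4 bytes = 1,536 bytes of Float32 data, so both rows imply roughly 57 bytes of per-vector overhead, which the README does not break down. The sketch below quantizes a hypothetical random embedding with .NET's built-in `System.Half` type and measures the payload sizes and round-trip error; it illustrates Float16 storage in general, not the AJVI on-disk layout.

```csharp
using System;
using System.Runtime.InteropServices;

const int Dimensions = 384;
var rng = new Random(7);

// Hypothetical embedding with values in [-1, 1], standing in for MiniLM-L6-v2 output.
float[] embedding = new float[Dimensions];
for (int i = 0; i < Dimensions; i++)
    embedding[i] = (float)(rng.NextDouble() * 2 - 1);

// Quantize to IEEE 754 half precision.
Half[] quantized = new Half[Dimensions];
for (int i = 0; i < Dimensions; i++)
    quantized[i] = (Half)embedding[i];

// View both arrays as raw bytes to compare payload sizes.
byte[] float16Bytes = MemoryMarshal.AsBytes(quantized.AsSpan()).ToArray();
byte[] float32Bytes = MemoryMarshal.AsBytes(embedding.AsSpan()).ToArray();

// Largest absolute error introduced by the Float32 -> Float16 -> Float32 round trip.
float maxError = 0;
for (int i = 0; i < Dimensions; i++)
    maxError = MathF.Max(maxError, MathF.Abs(embedding[i] - (float)quantized[i]));

Console.WriteLine($"Float32 payload: {float32Bytes.Length} bytes"); // 1536
Console.WriteLine($"Float16 payload: {float16Bytes.Length} bytes"); // 768
Console.WriteLine($"Max round-trip error: {maxError:E2}");
```

Half precision keeps a 10-bit mantissa, so for values in [-1, 1] the rounding error stays below about 5e-4, which is usually negligible next to the differences between cosine scores of relevant and irrelevant results.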

Scalability

  • Under 100K vectors: excellent performance with brute-force nearest-neighbor search
  • 100K–1M vectors: good performance; see the Performance Tuning guide
  • Over 1M vectors: approximate nearest-neighbor (ANN) support is a planned enhancement
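Brute-force search scores every stored vector against the query, so each query costs O(n · d) for n vectors of d dimensions, which is why the guidance changes as collections grow. After scoring, only the k best results need to be kept, and a size-k min-heap does that in O(n log k). The sketch below uses .NET's `PriorityQueue<TElement, TPriority>` over precomputed similarity scores; `TopK` and the sample scores are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: keep the k highest scores in a min-heap whose root is the weakest kept score.
List<(int Id, float Score)> TopK(IReadOnlyList<float> scores, int k)
{
    var heap = new PriorityQueue<int, float>();
    for (int id = 0; id < scores.Count; id++)
    {
        if (heap.Count < k)
            heap.Enqueue(id, scores[id]);
        else
            heap.EnqueueDequeue(id, scores[id]); // push, then pop the minimum (no change if the new score does not beat it)
    }

    var best = new List<(int Id, float Score)>(heap.Count);
    while (heap.TryDequeue(out int entryId, out float entryScore))
        best.Add((entryId, entryScore));
    best.Reverse(); // the heap yields ascending scores; callers want best first
    return best;
}

// Usage: similarity scores a brute-force scan might have produced for six stored vectors.
float[] similarities = { 0.12f, 0.87f, 0.45f, 0.91f, 0.33f, 0.78f };
foreach (var (vectorId, similarity) in TopK(similarities, 3))
    Console.WriteLine($"vector {vectorId}: {similarity:F2}");
```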

⚙️ Configuration

Register and configure the search engine via IOptions<SearchEngineOptions>:

services.Configure<SearchEngineOptions>(options =>
{
    options.IndexPath         = "./index";
    options.DefaultMode       = SearchMode.Hybrid;
    options.Precision         = VectorPrecision.Float16;  // ~50% storage savings vs Float32
    options.LexicalWeight     = 0.5;
    options.SemanticWeight    = 0.5;
    options.RrfK              = 60;    // Reciprocal Rank Fusion constant
    options.DecayHalfLifeDays = 90.0;  // Exponential temporal decay half-life
});
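`RrfK`, `LexicalWeight`, and `SemanticWeight` feed Reciprocal Rank Fusion. In its common weighted form, a document's fused score is the sum over result lists of weight / (k + rank), with 1-based ranks; a larger k flattens the advantage of the very top ranks, and documents found by both modes tend to rise. The README does not document exactly how the library applies the weights, so this sketch assumes that form; `FuseRrf` and the document IDs are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: weighted Reciprocal Rank Fusion of two best-first lists of document IDs.
// Assumed form: score(d) = sum over lists of weight / (k + rank(d)), with 1-based ranks.
List<(string Id, double Score)> FuseRrf(
    IReadOnlyList<string> lexical,
    IReadOnlyList<string> semantic,
    double lexicalWeight = 0.5,
    double semanticWeight = 0.5,
    int k = 60)
{
    var fused = new Dictionary<string, double>();

    void Accumulate(IReadOnlyList<string> ranking, double weight)
    {
        for (int rank = 1; rank <= ranking.Count; rank++)
        {
            string id = ranking[rank - 1];
            fused[id] = fused.GetValueOrDefault(id) + weight / (k + rank);
        }
    }

    Accumulate(lexical, lexicalWeight);
    Accumulate(semantic, semanticWeight);

    return fused
        .Select(kv => (Id: kv.Key, Score: kv.Value))
        .OrderByDescending(r => r.Score)
        .ToList();
}

string[] lexicalHits = { "doc-a", "doc-b", "doc-c" };
string[] semanticHits = { "doc-c", "doc-a", "doc-d" };
foreach (var (docId, fusedScore) in FuseRrf(lexicalHits, semanticHits))
    Console.WriteLine($"{docId}: {fusedScore:F5}");
```

Here doc-a (ranked 1st lexically, 2nd semantically) edges out doc-c (3rd and 1st), and both outrank doc-b and doc-d, which only one mode found.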

📋 Requirements

  • .NET 9.0 or later
  • Platforms: Windows, Linux, macOS
  • GPU (optional): a DirectML-compatible GPU for accelerated embedding generation (DirectML is built on DirectX 12, so GPU acceleration applies on Windows)

Dependencies

Package | Version | License
Lucene.Net | 4.8.0-beta00016 | Apache 2.0
Microsoft.ML.OnnxRuntime.DirectML | 1.20.0 | MIT
Microsoft.ML.Tokenizers | 1.0.0 | MIT
System.Numerics.Tensors | 10.0.1 | MIT

📚 Documentation

Document | Description
Getting Started | Step-by-step tutorial for your first integration
API Reference | Complete API surface reference
Architecture Overview | Component design, data flow, and index internals
Hierarchical Documents | Parent-child document indexing and context expansion
Custom Field Mapping | Controlling how document fields are indexed
Performance Tuning | Scaling guidance and optimization strategies
Temporal Decay | Configuring freshness-based relevance ranking
Migration Guide | Migrating from agent-session-search-tools
Samples | Runnable RAG and Agent Session examples

🤝 Contributing

Contributions are welcome. Please read CONTRIBUTING.md before opening a pull request.


📄 License

MIT License: see LICENSE for full terms.


🙏 Acknowledgments

Embedding Model

sentence-transformers/all-MiniLM-L6-v2: 384 dimensions, 22.7M parameters, Apache 2.0 License. Trained by Nils Reimers.

Research

Inspiration

  • Lucene.NET: the foundation of the lexical search layer
  • Elasticsearch: hybrid search implementation patterns
  • Vespa.ai: multi-phase retrieval architecture

Target frameworks

Compatible: net9.0, net9.0-windows7.0
Additional computed target frameworks: net9.0-android, net9.0-browser, net9.0-ios, net9.0-maccatalyst, net9.0-macos, net9.0-tvos, net9.0-windows, net10.0, net10.0-android, net10.0-browser, net10.0-ios, net10.0-maccatalyst, net10.0-macos, net10.0-tvos, net10.0-windows

NuGet packages

This package is not used by any NuGet packages.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version | Downloads | Last Updated
1.0.2 | 112 | 3/12/2026
1.0.1 | 104 | 3/12/2026