Vectl - Vector Cluster Store

Source: extern/vectl/README.md Last updated: 2024-10

A high-performance vector embedding storage system with clustering support, optimized for raw block devices. It targets semantic search, retrieval-augmented generation (RAG) systems, and other vector database workloads.

Features

  • Direct block device access for optimized performance
  • K-means clustering for efficient vector similarity search
  • Python bindings for seamless integration
  • Memory-mapped I/O for high-throughput operations
  • Support for both file and block device storage
  • Command-line interface for management and testing
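As a rough illustration of the memory-mapped I/O feature, the sketch below maps a file of fixed-size float32 records and reads one vector by its byte offset. The record layout, the `DIM` value, and the helper names are assumptions made for this example. They are not Vectl's actual on-disk format or API.

```python
import mmap
import os
import struct
import tempfile

DIM = 4                # hypothetical vector dimension for this sketch
RECORD_SIZE = DIM * 4  # float32 takes 4 bytes per component


def write_vectors(path, vectors):
    """Write vectors back to back as little-endian float32 records."""
    with open(path, "wb") as f:
        for vec in vectors:
            f.write(struct.pack(f"<{DIM}f", *vec))


def read_vector(path, index):
    """Memory-map the file read-only and decode the record at `index`."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            offset = index * RECORD_SIZE
            return list(struct.unpack_from(f"<{DIM}f", mm, offset))


def demo():
    """Round-trip two small vectors through a temporary file."""
    vectors = [[0.5, 1.0, 1.5, 2.0], [2.5, 3.0, 3.5, 4.0]]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "vectors.bin")
        write_vectors(path, vectors)
        return read_vector(path, 1)
```

Because every record has the same size, locating a vector is a single multiplication, and the operating system pages in only the parts of the file that are actually touched.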

Requirements

  • Linux system (tested on Ubuntu 20.04+)
  • C++17 compatible compiler (GCC 9+ or Clang 10+)
  • CMake 3.10+
  • Python 3.6+ (for Python bindings)
  • pybind11
  • numpy
  • [Optional] Ollama for text embeddings

Building and Installation

# Clone and build
git clone https://github.com/yourusername/vector-cluster-store.git
cd vector-cluster-store
./build.sh

# For Python bindings
pip install -e .

Usage

./ollama_vector_search.py ./vector_store.bin

Raw Block Device Storage (Advanced)

sudo ./prepare_device.sh /dev/sdX   # Replace sdX with your device
sudo ./ollama_vector_search.py /dev/sdX

Python API

import vector_cluster_store_py

# Create a logger
logger = vector_cluster_store_py.Logger("vector_store.log")

# Create and initialize vector store
store = vector_cluster_store_py.VectorClusterStore(logger)
store.initialize("./vector_store.bin", "kmeans", 768, 10)

# Store a vector
vector_id = 0
vector = [0.1] * 768  # Your embedding vector; length should match the 768 passed to initialize() (presumably the vector dimension)
metadata = "Example metadata"
store.store_vector(vector_id, vector, metadata)

# Retrieve a vector
retrieved_vector = store.retrieve_vector(vector_id)

# Find similar vectors
query_vector = [0.1] * 768  # Query embedding; same length as the stored vectors
results = store.find_similar_vectors(query_vector, 5)  # Get top 5 matches
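The README does not document the `initialize` signature. Assuming its third argument (768 in the example) is the vector dimension, a small guard like the one below could catch a wrongly sized embedding before it reaches `store_vector`. `prepare_embedding` is a hypothetical helper written for this example and is not part of `vector_cluster_store_py`.

```python
def prepare_embedding(values, expected_dim):
    """Return the embedding as a list of floats, or raise if its length is wrong."""
    vector = [float(v) for v in values]
    if len(vector) != expected_dim:
        raise ValueError(
            f"expected {expected_dim} dimensions, got {len(vector)}"
        )
    return vector
```

A caller would pass the result to `store.store_vector(vector_id, prepare_embedding(raw, 768), metadata)`, using whatever dimension the store was initialized with.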

Architecture

┌───────────────────┐    ┌────────────────────┐    ┌──────────────────┐
│ LLM Application   │    │ Vector Cluster     │    │                  │
│ (Query Interface) │───►│ Storage Library    │───►│ Storage Device   │
└───────────────────┘    └────────────────────┘    └──────────────────┘
                                    ▲                        │
                                    │                        │
                         ┌──────────┴──────────┐             │
                         │ Clustering Index    │◄────────────┘
                         └─────────────────────┘
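To make the role of the clustering index in the diagram concrete, here is a minimal pure-Python sketch of cluster-pruned search: vectors are grouped with plain k-means (Lloyd's algorithm), and a query is compared only against the members of its nearest cluster. The function names, the seeding strategy, and the squared Euclidean distance are assumptions for illustration. The README does not describe Vectl's internal algorithm or distance metric.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def assign(vectors, centroids):
    """Index of the nearest centroid for each vector."""
    return [
        min(range(len(centroids)), key=lambda c: squared_distance(v, centroids[c]))
        for v in vectors
    ]


def kmeans(vectors, k, iterations=10):
    """Lloyd's algorithm, seeded with the first k vectors for determinism."""
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        assignments = assign(vectors, centroids)
        for c in range(k):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean(members)
    return centroids, assign(vectors, centroids)


def find_similar(query, vectors, centroids, assignments, top_k):
    """Search only the cluster whose centroid is closest to the query."""
    nearest = min(
        range(len(centroids)),
        key=lambda c: squared_distance(query, centroids[c]),
    )
    candidates = [i for i, a in enumerate(assignments) if a == nearest]
    candidates.sort(key=lambda i: squared_distance(query, vectors[i]))
    return candidates[:top_k]


VECTORS = [[0, 0], [10, 10], [0, 1], [1, 0], [10, 11], [11, 10]]
CENTROIDS, ASSIGNMENTS = kmeans(VECTORS, k=2)
RESULT = find_similar([0.9, 0.1], VECTORS, CENTROIDS, ASSIGNMENTS, top_k=2)
```

Searching a single cluster is approximate: a true nearest neighbor that falls just across a cluster boundary is missed. Probing several of the closest clusters trades some speed for better recall.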

Block Device Layout

┌──────────────────────────────────────────────────────────────┐
│                       Block Device                           │
├────────────┬───────────────┬──────────────┬──────────────────┤
│ Header     │ Cluster Map   │ Vector Map   │ Vector Data      │
│ (512B)     │ Region        │ Region       │ Region           │
└────────────┴───────────────┴──────────────┴──────────────────┘
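The layout above can be sketched as a fixed 512-byte header followed by three regions whose offsets are recorded in that header. Only the 512-byte header size and the region order come from the diagram. The header fields, the magic value, and the region sizes below are hypothetical and chosen purely for illustration.

```python
import struct

HEADER_SIZE = 512  # from the layout diagram; all fields below are hypothetical

# Hypothetical fields: 8-byte magic, version, vector dimension, cluster count,
# then the byte offsets of the cluster map, vector map, and vector data regions.
HEADER_FORMAT = "<8sIIIQQQ"
FIELD_NAMES = (
    "magic", "version", "dim", "num_clusters",
    "cluster_map_offset", "vector_map_offset", "vector_data_offset",
)


def build_header(dim, num_clusters, cluster_map_size, vector_map_size):
    """Pack the header and zero-pad it to exactly HEADER_SIZE bytes."""
    cluster_map_offset = HEADER_SIZE
    vector_map_offset = cluster_map_offset + cluster_map_size
    vector_data_offset = vector_map_offset + vector_map_size
    packed = struct.pack(
        HEADER_FORMAT, b"EXAMPLE!", 1, dim, num_clusters,
        cluster_map_offset, vector_map_offset, vector_data_offset,
    )
    return packed.ljust(HEADER_SIZE, b"\x00")


def parse_header(raw):
    """Decode the leading fields of a header block into a dict."""
    values = struct.unpack_from(HEADER_FORMAT, raw, 0)
    return dict(zip(FIELD_NAMES, values))


HEADER = build_header(dim=768, num_clusters=10,
                      cluster_map_size=4096, vector_map_size=8192)
PARSED = parse_header(HEADER)
```

Storing the region offsets in the header means a reader needs only the first 512 bytes of the device to locate everything else.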

Performance (128GB USB device, Raspberry Pi 4B)

Operation                    Traditional FS   Raw Block Device   Improvement
Sequential Write             30-40 MB/s       35-45 MB/s         10-15%
Random Read                  5-10 MB/s        15-25 MB/s         150-200%
Vector Search (1M vectors)   500-1000 ms      100-300 ms         70-80%
Memory Usage                 200-300 MB       50-100 MB          60-70%
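The table does not say how the Improvement column was derived. One reading that reproduces the Random Read and Vector Search rows treats throughput as a percent increase and latency or memory as a percent reduction, pairing the low ends and the high ends of each range. The sketch below works through that arithmetic. The function names are illustrative, and the interpretation itself is an assumption.

```python
def percent_increase(before, after):
    """Gain for higher-is-better metrics such as MB/s."""
    return (after - before) * 100.0 / before


def percent_reduction(before, after):
    """Saving for lower-is-better metrics such as ms or MB."""
    return (before - after) * 100.0 / before


# Random Read: 5-10 MB/s on a filesystem vs 15-25 MB/s on the raw device
random_read = (percent_increase(5, 15), percent_increase(10, 25))    # (200.0, 150.0)

# Vector Search: 500-1000 ms on a filesystem vs 100-300 ms on the raw device
vector_search = (percent_reduction(500, 100), percent_reduction(1000, 300))  # (80.0, 70.0)
```

Under the same reading, the Sequential Write and Memory Usage rows come out close to, but not exactly at, the listed ranges, so those figures are presumably rounded.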