Vectl - Vector Cluster Store
Source: extern/vectl/README.md
Last updated: 2024-10
A high-performance vector embedding storage system with clustering support, optimized for raw block devices. It is intended for semantic search, retrieval-augmented generation (RAG) systems, and other vector database applications.
Features
- Direct block device access for optimized performance
- K-means clustering for efficient vector similarity search
- Python bindings for seamless integration
- Memory-mapped I/O for high-throughput operations
- Support for both file and block device storage
- Command-line interface for management and testing
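To illustrate why k-means clustering speeds up similarity search, here is a minimal pure-Python sketch of cluster-pruned search: vectors are grouped around centroids, and a query scans only the clusters whose centroids are closest to it. This is not Vectl's implementation. The function names (`kmeans`, `find_similar`) and the `n_probe` parameter are hypothetical and chosen for illustration only.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)),
               key=lambda c: squared_distance(point, centroids[c]))

def kmeans(vectors, k, iterations=10, seed=0):
    """Plain Lloyd's k-means; returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest(v, centroids)].append(v)
        for c, members in enumerate(groups):
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    # Final assignments are computed against the final centroids.
    assignments = [nearest(v, centroids) for v in vectors]
    return centroids, assignments

def find_similar(query, vectors, centroids, assignments, top_k=5, n_probe=2):
    """Scan only the n_probe clusters whose centroids are nearest the query."""
    ranked = sorted(range(len(centroids)),
                    key=lambda c: squared_distance(query, centroids[c]))
    probe = set(ranked[:n_probe])
    candidates = [i for i, c in enumerate(assignments) if c in probe]
    candidates.sort(key=lambda i: squared_distance(query, vectors[i]))
    return candidates[:top_k]

# Toy data: 200 random 8-dimensional vectors in 4 clusters.
rng = random.Random(42)
vectors = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(200)]
centroids, assignments = kmeans(vectors, k=4)
print(find_similar(vectors[7], vectors, centroids, assignments))
```

Probing fewer clusters means fewer distance computations per query, at the cost of possibly missing a neighbor that sits in an unprobed cluster.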
Requirements
- Linux system (tested on Ubuntu 20.04+)
- C++17 compatible compiler (GCC 9+ or Clang 10+)
- CMake 3.10+
- Python 3.6+ (for Python bindings)
- pybind11
- numpy
- [Optional] Ollama for text embeddings
Building and Installation
# Clone and build
git clone https://github.com/yourusername/vector-cluster-store.git
cd vector-cluster-store
./build.sh
# For Python bindings
pip install -e .
Usage
File-based Storage (Recommended for Testing)
Raw Block Device Storage (Advanced)
sudo ./prepare_device.sh /dev/sdX # Replace sdX with your device
sudo ./ollama_vector_search.py /dev/sdX
Python API
import vector_cluster_store_py
# Create a logger
logger = vector_cluster_store_py.Logger("vector_store.log")
# Create and initialize the vector store.
# The arguments appear to be: storage path, clustering method,
# vector dimension (768), and number of clusters (10).
store = vector_cluster_store_py.VectorClusterStore(logger)
store.initialize("./vector_store.bin", "kmeans", 768, 10)
# Store a vector
vector_id = 0
vector = [0.1] * 768  # Your embedding vector; its length should match the dimension above
metadata = "Example metadata"
store.store_vector(vector_id, vector, metadata)
# Retrieve a vector
retrieved_vector = store.retrieve_vector(vector_id)
# Find similar vectors
query_vector = [0.1] * 768  # Query embedding, same dimension as the stored vectors
results = store.find_similar_vectors(query_vector, 5)  # Get top 5 matches
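The `initialize` call above passes 768, which is presumably the embedding dimension (an assumption, since the parameter is not documented here). If so, it can help to check each embedding before handing it to the store. The helper below is a hypothetical sketch and is not part of `vector_cluster_store_py`.

```python
import math

def validate_embedding(vector, dimension):
    """Return the vector as a list of floats, or raise ValueError."""
    values = [float(x) for x in vector]
    if len(values) != dimension:
        raise ValueError(f"expected {dimension} values, got {len(values)}")
    if not all(math.isfinite(x) for x in values):
        raise ValueError("embedding contains NaN or infinity")
    return values

DIMENSION = 768  # assumed meaning of the third initialize() argument
embedding = validate_embedding([0.1] * DIMENSION, DIMENSION)
```

The validated list could then be passed to `store.store_vector` or `store.find_similar_vectors` as in the example above.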
Architecture
┌───────────────────┐    ┌────────────────────┐    ┌──────────────────┐
│  LLM Application  │    │   Vector Cluster   │    │                  │
│ (Query Interface) │───►│  Storage Library   │───►│  Storage Device  │
└───────────────────┘    └────────────────────┘    └──────────────────┘
                                   ▲                        │
                                   │                        │
                        ┌──────────┴──────────┐             │
                        │  Clustering Index   │◄────────────┘
                        └─────────────────────┘
Block Device Layout
┌──────────────────────────────────────────────────────────────┐
│                         Block Device                         │
├────────────┬───────────────┬──────────────┬──────────────────┤
│   Header   │  Cluster Map  │  Vector Map  │   Vector Data    │
│   (512B)   │    Region     │    Region    │      Region      │
└────────────┴───────────────┴──────────────┴──────────────────┘
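The diagram fixes only the header size (512 bytes) and the order of the regions. As a rough sketch of how region offsets could follow from such a layout, the example below packs a header and computes where each region would start. The header fields, the magic value, and the per-entry sizes are all hypothetical; the actual on-disk format is not documented in this README.

```python
import struct

HEADER_SIZE = 512  # from the layout diagram

# Hypothetical header fields (NOT the real Vectl format):
# 8-byte magic, version, vector dimension, cluster count, max vector count.
HEADER_FORMAT = "<8sIIIQ"

def pack_header(dimension, num_clusters, max_vectors):
    """Pack the assumed fields and zero-pad to the 512-byte header size."""
    body = struct.pack(HEADER_FORMAT, b"VECTLDEM", 1,
                       dimension, num_clusters, max_vectors)
    return body.ljust(HEADER_SIZE, b"\x00")

def region_offsets(dimension, num_clusters, max_vectors):
    """Byte offset of each region, under assumed entry sizes."""
    cluster_map_size = num_clusters * dimension * 4  # one float32 centroid per cluster
    vector_map_size = max_vectors * 16               # (id, offset) as two uint64 values
    cluster_map = HEADER_SIZE
    vector_map = cluster_map + cluster_map_size
    vector_data = vector_map + vector_map_size
    return {"cluster_map": cluster_map,
            "vector_map": vector_map,
            "vector_data": vector_data}

header = pack_header(768, 10, 1000)
offsets = region_offsets(768, 10, 1000)
print(len(header), offsets)
```

Keeping the header at a fixed size that matches a common sector size (512 bytes) means the regions that follow start on a sector boundary, which suits raw block device access.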
Performance (128GB USB device, Raspberry Pi 4B)
| Operation | Traditional FS | Raw Block Device | Improvement |
|---|---|---|---|
| Sequential Write | 30-40 MB/s | 35-45 MB/s | 10-15% higher throughput |
| Random Read | 5-10 MB/s | 15-25 MB/s | 150-200% higher throughput |
| Vector Search (1M vectors) | 500-1000 ms | 100-300 ms | 70-80% lower latency |
| Memory Usage | 200-300 MB | 50-100 MB | 60-70% lower |
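The Improvement column mixes two directions: for throughput rows it is an increase, while for search latency and memory it is a reduction. The source does not say how the percentages were derived. The sketch below assumes they come from the midpoints of each range, which gives figures close to those in the table.

```python
def midpoint(low, high):
    return (low + high) / 2

def percent_change(before, after):
    """Positive means the value went up, negative means it went down."""
    return (after - before) / before * 100

# (traditional FS range, raw block device range), taken from the table
rows = {
    "Sequential Write (MB/s)": ((30, 40), (35, 45)),
    "Random Read (MB/s)": ((5, 10), (15, 25)),
    "Vector Search (ms)": ((500, 1000), (100, 300)),
    "Memory Usage (MB)": ((200, 300), (50, 100)),
}

changes = {}
for name, (fs_range, raw_range) in rows.items():
    changes[name] = percent_change(midpoint(*fs_range), midpoint(*raw_range))
    print(f"{name}: {changes[name]:+.0f}%")
# Sequential Write (MB/s): +14%
# Random Read (MB/s): +167%
# Vector Search (ms): -73%
# Memory Usage (MB): -70%
```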