Building a Vector Search Engine Using HNSW and Cosine Similarity

Ethan Steininger
3 min read · Jul 18, 2023

Hierarchical Navigable Small World (HNSW) is a graph-based algorithm for efficient approximate nearest neighbor search, and the Sentence Transformers library generates semantically meaningful sentence embeddings.

For this tutorial we will use the hnswlib Python library, which provides a fast and memory-efficient implementation of HNSW.

But first, why?

While many existing search engines and libraries provide vector search capabilities (such as Elasticsearch or Faiss), building your own HNSW index can be the better choice when you need a fast, memory-efficient, and customizable solution, especially for applications involving real-time vector search.

1. Install Libraries

First, we need to install the necessary libraries.

pip install hnswlib sentence-transformers

2. Import Libraries

After installing, import all the necessary libraries:

import hnswlib
import numpy as np
from sentence_transformers import SentenceTransformer
import time

3. Initialize Sentence Transformer Model

We will use a pre-trained Sentence Transformer model from the Hugging Face model hub. For this tutorial, we will use the all-MiniLM-L6-v2 model:

model = SentenceTransformer('all-MiniLM-L6-v2')
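
If you want a quick sanity check of the embedding size before building the index (all-MiniLM-L6-v2 produces 384-dimensional vectors), Sentence Transformers exposes it directly:

print(model.get_sentence_embedding_dimension())  # 384 for all-MiniLM-L6-v2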

4. Generate and Store Embeddings in HNSW

Now, we create a list of sentences, generate their embeddings using Sentence Transformers, and store these embeddings in an HNSW index:

sentences = ['This is an example sentence', 'This is another one']

# Create embeddings
embeddings = model.encode(sentences)

# Dimension of our vector space
dimension = embeddings.shape[1]

# Create a new index
p = hnswlib.Index(space='cosine', dim=dimension)

# Initialize the index - max_elements is the index capacity and should be known beforehand;
# ef_construction trades build time for graph quality, M controls graph connectivity
p.init_index(max_elements=10000, ef_construction=200, M=16)

# Element insertion (can be called several times)
p.add_items(embeddings)

# Controlling the recall by setting ef:
p.set_ef(50) # ef should always be > k

The hnswlib library keeps the HNSW index in an in-memory data structure: when you call init_index and add_items, the provided embeddings are stored in RAM.
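
Because the index lives in memory, it disappears when the process exits. If you need it to survive restarts, hnswlib can serialize the index to disk and reload it later. Here is a minimal sketch; the file name is just an example:

# Persist the index to disk
p.save_index("sentences_index.bin")

# Later (or in another process): recreate the index object, then reload it
p2 = hnswlib.Index(space='cosine', dim=dimension)
p2.load_index("sentences_index.bin", max_elements=10000)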

5. Query the HNSW Index

We can now query the HNSW index to find which sentence in our list is most similar to a new one:

new_sentence = "A new sentence similar to the previous ones"
new_embedding = model.encode([new_sentence])

# Fetch k neighbors
labels, distances = p.knn_query(new_embedding, k=1)

print("Most similar sentence is: ", sentences[labels[0][0]])

6. Measure Time

To measure the speed of our search, we can use Python’s time library:

start_time = time.time()

# Repeat the search process to get a fair measure
for _ in range(1000):
    labels, distances = p.knn_query(new_embedding, k=1)

end_time = time.time()
elapsed_time = end_time - start_time

print("Elapsed time for 1000 searches: ", elapsed_time)

All Together Now

import hnswlib
import numpy as np
from sentence_transformers import SentenceTransformer
import time

# Initialize Sentence Transformer Model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Sentences to embed and store
sentences = ['This is an example sentence', 'This is another one']

# Create embeddings
embeddings = model.encode(sentences)

# Dimension of our vector space
dimension = embeddings.shape[1]

# Create a new index
p = hnswlib.Index(space='cosine', dim=dimension)

# Initialize the index - max_elements is the index capacity and should be known beforehand;
# ef_construction trades build time for graph quality, M controls graph connectivity
p.init_index(max_elements=10000, ef_construction=200, M=16)

# Element insertion (can be called several times)
p.add_items(embeddings)

# Controlling the recall by setting ef:
p.set_ef(50) # ef should always be > k

# Query HNSW index for most similar sentence
new_sentence = "A new sentence similar to the previous ones"
new_embedding = model.encode([new_sentence])

# Fetch k neighbors
labels, distances = p.knn_query(new_embedding, k=1)

print("Most similar sentence is: ", sentences[labels[0][0]])

# Measure the speed
start_time = time.time()

# Repeat the search process to get a fair measure
for _ in range(1000):
    labels, distances = p.knn_query(new_embedding, k=1)

end_time = time.time()
elapsed_time = end_time - start_time

print("Elapsed time for 1000 searches: ", elapsed_time)

Results:

Elapsed time for 1000 searches:  0.0037620067596435547

That’s under 4 ms for 1,000 searches, or roughly 4 microseconds per query.

I believe that if you compare this to many off-the-shelf vector search engines, it will come out faster. So if you need vector search for highly latency-sensitive workloads where the data volume is modest, consider building your own.

Of course, if you want to conduct semantic search across your S3 bucket, Mixpeek has you covered.
