An Open-Source, Personalized Generative Model Framework

Ethan Steininger
4 min read · Mar 22


Custom, generative models are the future.

Whether you’re crafting video montages and chatbots or producing audio summaries and image montages, they all require the same data pipelines.

Engineers need to prepare and process the source of truth, then fine-tune and deploy the generative models for their specific use cases.

Note: if you came here for that free $25 Amazon gift card, scroll to the bottom.

Here’s an example pipeline an engineer used to build a custom Lex Fridman chatbot trained on his unique audio:

I’ve seen and built this pipeline often enough to know it will become a standard.

So I’m proposing the CollieConnector, an open-source framework to streamline the process of customizing accurate generative models.

The CollieConnector framework will comprise several modules, each of which will provide full support for BYO anything. Have your own embedding model? Plug it in. A different vector search engine? No problem. It’s designed to be a fully modular framework.

Let’s examine each step in detail to understand how the CollieConnector brings these components together.

The CollieConnector framework consists of seven main steps:

  1. Accept corpus
  2. Split into chunks
  3. Embed the chunks
  4. Ask a question (Q)
  5. Return top K chunks
  6. Run chunks through GPT
  7. Return results in structured form
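To make the "BYO anything" idea concrete, here is a minimal sketch of how the steps above could be wired together so that each stage is swappable. The `Pipeline` class and the `toy_*` functions are hypothetical names invented for this illustration, not part of any released CollieConnector code, and the toy stages are deliberately trivial so the sketch runs without any external service.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = Sequence[float]


@dataclass
class Pipeline:
    """Hypothetical container: every stage is a plain, swappable callable."""
    split: Callable[[str], List[str]]
    embed: Callable[[Sequence[str]], List[Vector]]
    search: Callable[[Vector, Sequence[Vector], int], List[int]]
    generate: Callable[[str], str]

    def answer(self, corpus: str, question: str, k: int = 2) -> str:
        chunks = self.split(corpus)                       # step 2
        chunk_vectors = self.embed(chunks)                # step 3
        query_vector = self.embed([question])[0]          # step 4
        top = self.search(query_vector, chunk_vectors, k) # step 5
        context = " ".join(chunks[i] for i in top)
        return self.generate(f"{question}\n\n{context}")  # step 6


# Toy stand-ins (assumptions for illustration only).
def toy_split(text: str) -> List[str]:
    return [s.strip() for s in text.split(".") if s.strip()]


def toy_embed(texts: Sequence[str]) -> List[List[float]]:
    # Vowel counts as a fake three-line "embedding model".
    return [[float(t.lower().count(ch)) for ch in "aeiou"] for t in texts]


def toy_search(query: Vector, vectors: Sequence[Vector], k: int) -> List[int]:
    def dist(v: Vector) -> float:
        return sum((a - b) ** 2 for a, b in zip(v, query))
    return sorted(range(len(vectors)), key=lambda i: dist(vectors[i]))[:k]


def toy_generate(prompt: str) -> str:
    return f"echo: {prompt}"


pipeline = Pipeline(toy_split, toy_embed, toy_search, toy_generate)
result = pipeline.answer("Cats purr. Dogs bark. Birds sing.", "Who barks?", k=1)
print(result)
```

Swapping in a real embedding model or vector engine would mean passing a different callable with the same shape; nothing else in the pipeline would need to change.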

1. Accept Corpus

The first step is to process the input data source for the generative model. This can be achieved using a function that accepts a variety of input formats, such as JSON, CSV, or plain text.

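import pandas as pd

def load_corpus(file_path, format='csv'):
    # Dispatch on the declared format; both readers return a DataFrame.
    if format == 'csv':
        corpus = pd.read_csv(file_path)
    elif format == 'json':
        corpus = pd.read_json(file_path)
    else:
        # Reject anything else explicitly instead of failing later.
        raise ValueError(f"Unsupported format: {format}")

    return corpus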
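Plain text is the one format mentioned above that the loader does not handle. Here is one way it might be covered, using only the standard library. The helper name `load_text_corpus` and the "paragraphs are separated by blank lines" rule are assumptions for this sketch.

```python
from pathlib import Path
from typing import List


def load_text_corpus(file_path: str, encoding: str = "utf-8") -> List[str]:
    """Hypothetical helper: read a plain-text file into non-empty paragraphs.

    Assumes paragraphs are separated by one or more blank lines.
    """
    raw = Path(file_path).read_text(encoding=encoding)
    paragraphs = []
    for block in raw.replace("\r\n", "\n").split("\n\n"):
        cleaned = " ".join(block.split())  # collapse internal whitespace
        if cleaned:
            paragraphs.append(cleaned)
    return paragraphs
```

Note that this returns a list of strings rather than a DataFrame, so a real framework would need to settle on one corpus representation.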

2. Split into Chunks

Next, the input data needs to be segmented into coherent and meaningful chunks. One common approach is to use a sliding window technique with a specified stride[1].

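def split_into_chunks(text, window_size=100, stride=50):
    # Slide a fixed-size window over the text, advancing by `stride` each step.
    # Note: text shorter than window_size yields no chunks.
    chunks = []
    for i in range(0, len(text) - window_size + 1, stride):
        chunks.append(text[i:i + window_size])
    return chunks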
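A tiny worked example makes the window and stride easier to picture. The variant below is a sketch, not the function above: windowing over words instead of characters and keeping the trailing remainder are both assumptions made for illustration, and `sliding_word_chunks` is a hypothetical name.

```python
from typing import List


def sliding_word_chunks(text: str, window_size: int = 100, stride: int = 50) -> List[str]:
    """Hypothetical variant: windows over words, keeping the trailing remainder."""
    if window_size <= 0 or stride <= 0:
        raise ValueError("window_size and stride must be positive")
    words = text.split()
    chunks = []
    start = 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + window_size]))
        if start + window_size >= len(words):
            break  # this window already reaches the end of the text
        start += stride
    return chunks


demo = sliding_word_chunks("a b c d e f g", window_size=4, stride=2)
print(demo)  # ['a b c d', 'c d e f', 'e f g']
```

With a stride smaller than the window, neighboring chunks overlap, so a sentence cut at one boundary still appears intact in the next chunk.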

3. Embed the Chunks

To facilitate efficient search, the chunks need to be embedded using a pre-trained language model, such as BERT or Sentence Transformers[2].

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('paraphrase-distilroberta-base-v1')

def embed_chunks(chunks):
    return model.encode(chunks)
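Why do embeddings make search easier? Once text is a vector, "relevant" becomes "close". The sketch below uses made-up three-dimensional vectors in place of real embeddings (which typically have hundreds of dimensions) to show cosine similarity, plus the fact that for unit-length vectors, ranking by L2 distance and ranking by cosine similarity agree.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity computed as the dot product of the normalized vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


# Made-up vectors standing in for real embeddings.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.3, 0.1]
invoice = [0.0, 0.2, 0.9]

print(cosine_similarity(cat, kitten) > cosine_similarity(cat, invoice))  # True
```

For unit vectors, squared L2 distance equals 2 minus twice the cosine similarity, so normalizing embeddings before indexing lets an L2 index behave like a cosine one.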

4. Ask a Question (Q)

To interact with the generative model, users can input a question or prompt. This is a simple string input that is processed by the model.

question = "What is the CollieConnector framework?"

5. Return Top K Chunks

The top K chunks most relevant to the question are retrieved using a vector search engine, such as FAISS[3].

import faiss
import numpy as np

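def search_chunks(query_embedding, chunk_embeddings, k=5):
    # Exact (brute-force) L2 index sized to the embedding dimension.
    index = faiss.IndexFlatL2(query_embedding.shape[-1])
    # FAISS expects float32 arrays; add the chunk vectors before searching.
    index.add(np.asarray(chunk_embeddings, dtype='float32'))
    _, top_k_indices = index.search(np.asarray([query_embedding], dtype='float32'), k)
    return top_k_indices[0]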
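A flat L2 index performs an exhaustive, exact search: it compares the query against every stored vector and keeps the closest ones. The pure-Python sketch below shows the same idea without FAISS, which may help when reasoning about results on a small corpus. The function names are hypothetical and the vectors are made up.

```python
import heapq
from typing import List, Sequence


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k_exact(query: Sequence[float], vectors: Sequence[Sequence[float]], k: int = 5) -> List[int]:
    """Indices of the k vectors closest to the query, nearest first."""
    return heapq.nsmallest(
        k, range(len(vectors)), key=lambda i: squared_l2(query, vectors[i])
    )


vectors = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.9, 1.2]]
print(top_k_exact([1.0, 1.0], vectors, k=3))  # [1, 3, 0]
```

This scans every vector per query, which is fine for thousands of chunks; approximate indexes trade a little accuracy for speed on much larger collections.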

6. Run Chunks through GPT

The selected chunks are fed into a GPT-based generative model, such as OpenAI’s GPT-3[4], to generate a response.

import openai

def generate_response(prompt, model='text-davinci-002', **kwargs):
    # Legacy Completions API (openai SDK versions before 1.0).
    response = openai.Completion.create(model=model, prompt=prompt, **kwargs)
    return response.choices[0].text.strip()
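How the chunks are packed into the prompt matters as much as the model call. Here is a sketch of one possible prompt builder that numbers each chunk and stops before a size budget is exceeded. The name `build_prompt`, the instruction wording, and the character-based budget are all assumptions; a real implementation would more likely count tokens.

```python
from typing import List, Sequence, Tuple


def build_prompt(
    question: str,
    indexed_chunks: Sequence[Tuple[int, str]],
    max_chars: int = 2000,
) -> str:
    """Hypothetical helper: pack numbered chunks into a prompt within a size budget."""
    lines: List[str] = []
    used = 0
    for index, chunk in indexed_chunks:
        entry = f"[{index}] {chunk}"
        if used + len(entry) > max_chars:
            break  # stop before exceeding the character budget
        lines.append(entry)
        used += len(entry)
    context = "\n".join(lines)
    return (
        "Answer the question using only the sources below, "
        "and cite them by their bracketed number.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```

Passing the chunks in relevance order means the budget cuts off the least relevant material first.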

7. Return Results in Structured Form

Finally, the generated response is returned with embedded inline citations from the source of truth.

def generate_citation(chunk_index):
    return f"[{chunk_index}]"

def generate_answer_with_citations(question, chunks, chunk_embeddings):
    query_embedding = model.encode([question])[0]
    top_k_indices = search_chunks(query_embedding, chunk_embeddings)

    # Append each retrieved chunk, tagged with its citation marker, to the question.
    prompt = question + " " + " ".join(f"{chunks[i]}{generate_citation(i)}" for i in top_k_indices)
    response = generate_response(prompt)
    return response
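The function above returns a plain string. One way to get to an actual structured result is to parse the `[n]` markers out of the response and attach the cited chunks. This is a sketch under a big assumption: that the model echoes the citation markers it was given, which is not guaranteed. `Citation`, `StructuredAnswer`, and `to_structured_answer` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Citation:
    chunk_index: int
    text: str


@dataclass
class StructuredAnswer:
    answer: str
    citations: List[Citation]


def to_structured_answer(response: str, chunks: Sequence[str]) -> StructuredAnswer:
    """Hypothetical helper: collect the chunks that the response cites as [n]."""
    seen: List[int] = []
    for match in re.finditer(r"\[(\d+)\]", response):
        index = int(match.group(1))
        # Ignore markers that point outside the corpus, and skip repeats.
        if index < len(chunks) and index not in seen:
            seen.append(index)
    return StructuredAnswer(
        answer=response,
        citations=[Citation(i, chunks[i]) for i in seen],
    )
```

Citations come back in the order they first appear in the answer, so a caller can render them as footnotes or serialize the whole object to JSON.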

Sample Final Output

🚀 Use Cases

  • Sales & Support: Provide your teams with the tools they need to effectively sell your products and assist existing users with troubleshooting.
  • Application Search: Direct your users to the most appropriate content to meet their needs quickly and easily.
  • Content Creators: Convert your existing content into a powerful marketing funnel to grow your subscription business.
  • Marketing Funnels: Offer personalized summaries and guided next steps upon a user’s return to reduce churn.

💰 Get Paid

Interested? We will give you a $25 gift card for participating in a 30-minute interview. Schedule it here:


[1] Reimers, N., & Gurevych, I. (2019). Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. arXiv preprint arXiv:1908.10084.

[2] Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805.

[3] Johnson, J., Douze, M., & Jégou, H. (2017). Billion-scale similarity search with GPUs. arXiv preprint arXiv:1702.08734.

[4] Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., … & Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877–1901. arXiv preprint arXiv:2005.14165.