Vertical Integration is Key to Winning the AI Race

Jan 26, 2023

I will live stream myself building this tech stack end-to-end.

Pre-trained ML models are now easier than ever to use, thanks to companies like HuggingFace and Replicate.

It’s come to the point where, with JUST 3 lines of code, you can leverage resources that were previously in the hands of only a few tech giants:

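from sentence_transformers import SentenceTransformer

# Load a pre-trained sentence-embedding model (downloaded on first use)
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

# Encode a sentence into a 384-dimensional vector (returned as a NumPy array)
embeddings = model.encode("convert this sentence")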

This uses an ML model that was trained on a massive volume of internet conversations. The model card below lists its top 5 training data sources. Notice the volume of training tuples (pairs of related texts, such as questions and their answers):

https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2

The answer: vertical integration

Now that anybody can leverage ML models with insane capacity, the question is: how are companies going to differentiate? What is their moat? In true capitalist fashion, the answer is always the same: vertical integration.

Owning every step of the journey has unlocked insane wealth for companies like:

Walmart: manufacturing, distribution, and retail

Amazon: warehousing, logistics, delivery, and private labeling

Apple: hardware, software, services, and retail stores

It all starts with open source

The models (like most software) begin as open source. Companies with data science, machine learning, or artificial intelligence organizations are training models on novel data, and the great ones publish those models to hubs like HuggingFace and Replicate.

Facebook, Google, and OpenAI have all published novel pre-trained ML models. The question then becomes: how do you deploy them at scale?

One-line model import

It’s one thing to have these models available; it’s another to have them native in your code. HuggingFace and Replicate provide fantastic developer experiences for embedding these models in your code. I prefer Python for my development, so I tend to use Python libraries such as SBERT’s Sentence Transformers.

For non-generative models (i.e., not models like DALL-E and ChatGPT), the output is typically a large vector of numbers, called an embedding. More on vector embeddings here: https://vectorsearch.dev/. Here’s an example of an embedding:

{
    "vector": [
        -0.0033736410550773144,
        0.07936657965183258,
        -0.06529629230499268,
        -0.007808310445398092,
        ... 
    ]
}
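A vector on its own isn’t very useful; the value comes from comparing vectors. A common way to compare them is cosine similarity, which Sentence Transformers exposes as `util.cos_sim`. Below is a minimal pure-Python sketch so the arithmetic is visible. The three-dimensional toy vectors are made up for readability; all-MiniLM-L6-v2 actually produces 384 dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings (3-D instead of 384-D for readability)
query = [0.1, 0.9, 0.0]
doc_similar = [0.2, 0.8, 0.1]
doc_unrelated = [0.9, -0.1, 0.4]

print(round(cosine_similarity(query, doc_similar), 3))
print(round(cosine_similarity(query, doc_unrelated), 3))
```

With these toy values, the first pair scores roughly 0.98 and the second scores 0, because the dot product cancels out. Ranking stored embeddings by this score against a query embedding is the basic idea behind vector search.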

How do vector embeddings get created?

Embeddings are created by running a model over your data, and a new crop of companies is strategically moving into the cloud deployment of ML models to do exactly that.

Serverless GPUs

ML models tend to be incredibly computationally expensive. Most of that computation is matrix arithmetic (linear algebra), which is exactly the kind of work GPUs are optimized for. A rough analogy: CPUs are optimized for low-latency, sequential work, while GPUs are optimized for high-throughput, parallel work. Some of these matrices are quite large, so if you want to do anything fast, you need GPUs.
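To make the "it’s just matrix arithmetic" point concrete, here is a naive matrix multiply in pure Python, plus the standard operation count: multiplying an (m x k) matrix by a (k x n) matrix takes m·k·n multiply-adds, or roughly 2·m·k·n floating-point operations (FLOPs). The layer sizes below are illustrative assumptions, not the exact dimensions of any particular model. Each of the m·n output cells can be computed independently, which is the kind of work a GPU spreads across thousands of cores.

```python
def matmul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Naive matrix multiply: each output cell is the dot product of a row of a and a column of b."""
    k = len(b)
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions must match")
    n = len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(n)] for row in a]


def matmul_flops(m: int, k: int, n: int) -> int:
    """Approximate FLOPs for (m x k) @ (k x n): one multiply and one add per term."""
    return 2 * m * k * n


print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]

# Hypothetical workload: a batch of 32 inputs x 128 tokens, each a 384-wide
# hidden state, projected into a 1536-wide layer.
tokens = 32 * 128
print(f"{matmul_flops(tokens, 384, 1536):,} FLOPs for one projection")
```

That single hypothetical projection works out to roughly 4.8 billion FLOPs, and a model runs many such layers per request, which is why GPUs matter here.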

These models therefore need to run on GPUs, and we like serverless because it scales automatically with demand, without you provisioning or managing GPU servers yourself. Remember, anybody can use the models; the challenge is operationalizing them at scale. Two companies doing this quite well are Modal and Banana.
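Serverless inference generally relies on one pattern: a stateless function that loads the model once per container (the "cold start") and reuses it for every warm request after that, so the platform can freely add or remove containers as traffic changes. The sketch below shows that pattern in plain Python. `load_model`, `handler`, and the fake word-length "model" are hypothetical stand-ins, not Modal’s or Banana’s actual APIs.

```python
_MODEL = None   # cached per container; survives between warm invocations
LOAD_COUNT = 0  # for illustration: how many cold starts this container has paid for


def load_model():
    """Hypothetical stand-in for the slow step: loading model weights onto a GPU."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    # Fake "embedding": one number per word (its length). A real model goes here.
    return lambda text: [float(len(word)) for word in text.split()]


def handler(event: dict) -> dict:
    """Stateless entry point: everything a request needs arrives in `event`."""
    global _MODEL
    if _MODEL is None:  # cold start: first request served by this container
        _MODEL = load_model()
    return {"vector": _MODEL(event["text"])}


print(handler({"text": "convert this sentence"}))
print(handler({"text": "a second warm request"}))
print("model loads:", LOAD_COUNT)
```

Because the handler keeps no per-request state, any container can serve any request; the only thing worth caching is the model itself.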

Where do these embeddings go?

Beyond the embeddings themselves, any company that wants to build applications needs a database with certain guarantees, namely ACID compliance: every write sent to the storage layer must be atomic, consistent, isolated, and durable.
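Here is what atomicity buys you in practice, sketched with Python’s built-in `sqlite3` module (SQLite is an ACID-compliant database and stands in here for whichever database you choose). A document and its embedding are written in one transaction, so a failure partway through leaves neither behind. The table layout and the JSON-encoded vector column are illustrative assumptions.

```python
import json
import sqlite3
from typing import Optional

# In-memory database for the demo; a file path would make commits durable on disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")
# Vectors stored as JSON text: a simple, illustrative choice.
conn.execute(
    "CREATE TABLE embeddings ("
    "doc_id INTEGER NOT NULL REFERENCES documents(id), "
    "vector TEXT NOT NULL)"
)


def save_document(conn: sqlite3.Connection, body: str, vector: Optional[list[float]]) -> None:
    """Insert a document and its embedding atomically: both rows commit, or neither does."""
    with conn:  # commits on success, rolls back if an exception escapes the block
        cur = conn.execute("INSERT INTO documents (body) VALUES (?)", (body,))
        if vector is None:
            raise ValueError("embedding failed")  # simulate a failure between the two writes
        conn.execute(
            "INSERT INTO embeddings (doc_id, vector) VALUES (?, ?)",
            (cur.lastrowid, json.dumps(vector)),
        )


# Values truncated from the example embedding above
save_document(conn, "convert this sentence", [-0.0034, 0.0794, -0.0653])
try:
    save_document(conn, "this one fails", None)
except ValueError:
    pass

print(conn.execute("SELECT COUNT(*) FROM documents").fetchone()[0])   # 1
print(conn.execute("SELECT COUNT(*) FROM embeddings").fetchone()[0])  # 1
```

The failed call’s document row was rolled back, so there is no orphaned document without an embedding.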

Data Storage

I worked at MongoDB for 3 years, so I would be remiss not to use it in my example, but there are tons of other ACID-compliant databases. Here are the top 5 document stores, according to DB-Engines:

https://db-engines.com/en/ranking/document+store

Vector Storage

Search has been shifting from sparse vectors to dense vectors as new, innovative data structures have recently surfaced from academia. Historically, search engines like Elasticsearch and Solr were built on the open source library Lucene, which has its own challenges. Because it’s written in Java, it is subject to JVM bottlenecks like heap management.
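For contrast with dense vectors, here is the core idea behind a Lucene-style inverted index: a map from each term to the documents that contain it. You can think of each document as a sparse vector with a non-zero entry only for the words it contains. This toy version (whitespace tokenization, no relevance scoring, no segment files) is an illustrative simplification, not how Lucene actually stores data.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each lowercase term to the set of document ids that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return dict(index)


def keyword_search(index: dict[str, set[int]], query: str) -> set[int]:
    """Return ids of documents containing every query term (boolean AND)."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    return set.intersection(*postings) if postings else set()


docs = {
    1: "vector search with dense embeddings",
    2: "keyword search with an inverted index",
    3: "dense retrieval beats keywords sometimes",
}
index = build_inverted_index(docs)
print(keyword_search(index, "search with"))  # {1, 2}
print(keyword_search(index, "dense"))        # {1, 3}
print(keyword_search(index, "semantic"))     # set()
```

Lookups are fast, but only exact terms match: a query for "semantic" finds nothing, even though documents 1 and 3 are about semantic retrieval. Closing that gap is what dense embeddings are for.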

Rather than an inverted index stored in segment files (the approach Lucene is built on), the new data structure is the hierarchical navigable small world graph (HNSW graph). HNSW graphs were built for fast, efficient approximate nearest-neighbor traversal and retrieval over vectors. In my opinion, companies like Weaviate and Pinecone that build their own search indexes (via HNSW) are going to have the highest throughput and lowest latency.
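To give a feel for why graph indexes are fast, here is a simplified, single-layer version of the search HNSW runs on each of its layers: start at an entry node, repeatedly expand the closest unexplored candidate, and keep only the best `ef` results seen so far. Real HNSW adds a hierarchy of sparser layers to pick a good entry point and uses smarter neighbor selection when inserting; `build_graph` here just links each new point to its `m` nearest existing points by brute force, an assumption made to keep the sketch short.

```python
import heapq
import math

Vector = tuple[float, ...]


def build_graph(vectors: list[Vector], m: int = 2) -> dict[int, set[int]]:
    """Insert points one by one, linking each to its m nearest already-inserted points.

    Simplification (assumption): neighbors are found by brute force, and there is
    only one layer. Real HNSW searches the graph itself and stacks sparser layers.
    """
    if not vectors:
        return {}
    graph: dict[int, set[int]] = {0: set()}
    for i in range(1, len(vectors)):
        nearest = sorted(range(i), key=lambda j: math.dist(vectors[i], vectors[j]))[:m]
        graph[i] = set(nearest)
        for j in nearest:
            graph[j].add(i)  # links are bidirectional
    return graph


def search_graph(graph: dict[int, set[int]], vectors: list[Vector], query: Vector,
                 entry: int = 0, ef: int = 1) -> list[tuple[float, int]]:
    """Best-first search from `entry`; returns up to `ef` (distance, node) pairs, closest first."""
    visited = {entry}
    d0 = math.dist(query, vectors[entry])
    candidates = [(d0, entry)]  # min-heap: closest unexpanded node first
    results = [(-d0, entry)]    # max-heap via negation: worst kept result on top
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -results[0][0]:
            break  # closest unexpanded candidate is worse than every kept result
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = math.dist(query, vectors[nb])
            if len(results) < ef or dn < -results[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the current worst result
    return sorted((-neg_d, n) for neg_d, n in results)


points = [(float(x), 0.0) for x in range(10)]  # toy 2-D points on a line
graph = build_graph(points, m=2)
print(search_graph(graph, points, (7.2, 0.0), ef=3))
```

On this toy graph the search returns nodes 7, 8, and 6, the three true nearest neighbors of the query. On large graphs, this best-first expansion typically touches only a small fraction of the nodes, which is where the speed comes from; raising `ef` trades speed for recall.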

It ends with the user…

Once we have this scalable machine learning stack, it’s time to actually deliver value to our end users. Companies that offer multimodal or enterprise search typically either use a stack like the one above or build their own.

One problem that ML/vector search aims to solve is wrangling the explosion of content, which will only get bigger. Mixpeek is an intelligent file store built to wrangle this explosion by surfacing unique, personalized insights to end users at the moment they’re most meaningful. It aims to optimize businesses’ conversions by providing empirical data and a unique A/B testing suite, all packaged in a slick Python library, JavaScript widget, and dashboard.

Disclaimer: I work at Mixpeek.

Food for thought…

Who do you think is most poised to take over the ML stack? Which incumbents will get dethroned? What are the biggest challenges to operationalizing the stack above?