Open-source software (OSS) is the backbone of cloud infrastructure and every part of the modern software stack, from operating systems and libraries to runtimes, databases, and more. It’s enabled the creation of thriving businesses like MongoDB, HashiCorp, Confluent, Elastic, and Databricks. With millions of products spanning all categories of cloud infrastructure, OSS has come a long way since its early days as a seemingly improbable way to build a viable business.

In this blog post, we’ll explore the origins and evolution of open source, emphasizing the pivotal role of community. We’ll also share some observations about the current state of OSS and reveal the next generation of providers through the Redpoint Open-source Index, trackable via a publicly available dashboard that we’re excited to debut.

Origins

In the late 80s and early 90s, a group of developers began advocating for a new approach to software development. Their vision was one where software could be freely used, modified, and distributed without any restrictions, a concept that would become known as open source software (OSS). This movement was largely a response to the monopolistic grip that software behemoths like Microsoft, IBM, and Oracle held over the market by selling closed-source, proprietary software.

The idea behind OSS was simple yet powerful: by working together, developers believed they could create better software than what could be achieved by any one company. This community-driven approach to innovation challenged the traditional notion that software development should be the sole purview of for-profit companies. With OSS, the focus shifted from profit to collaboration, with the goal of creating high-quality software that could benefit everyone.

They started by disrupting the hard stuff – operating systems. In 1991, Linus Torvalds released the Linux operating system, which rapidly rose to become the second most popular OS behind only Microsoft Windows. After Linux, developers took aim at databases [1]. MySQL challenged the Oracle Database and took much of its market share, and is now the world's most popular open-source relational database.

Linux and MySQL paved the way for numerous open-source products and companies to emerge in the 90s and early 2000s, laying the foundation for the OSS ecosystem we have today.

However, turning these projects into lucrative businesses took time. In the early days, building a for-profit company that used open source as a distribution channel was considered unlikely, even contradictory: it seemed to violate the fundamental principle of open source, which was to keep software accessible and free. But scaling open-source software for the enterprise comes with a host of problems, and solving those problems is quite lucrative.

The first generation of OSS companies sold professional services (installation, training, 24/7 technical support). It was challenging to predict renewals, and profit margins were low. The second generation, known as open-core, provided two versions of their software: (1) a "core" version that remained open source and (2) an enterprise version that utilized the core code but offered premium features such as security, integrations, etc. for a subscription. Over the past few years, OSS businesses have incorporated an additional revenue stream through cloud-hosted services. This entails allowing users to access the open-source binary via a cloud-hosted model, much like a typical SaaS offering, instead of requiring them to set up and manage the product themselves.

The public OSS businesses we see today like MongoDB, Confluent and HashiCorp all employ a hybrid open-core / cloud-hosted model with professional services as an add-on. At the same time, they devote a ton of R&D to developing and fostering the core version - you can’t lose community!

The importance of community and bottoms-up

The traditional approach to selling software targets IT as the purchasing center, requiring manager buy-in and an often long POC process. In contrast, open-source offerings allow developers to easily try out software for free. Developers are a notoriously fickle target customer segment, with perpetually-evolving tastes and a disposition towards exhaustively trying software and alternatives before committing to a purchase. An open-source product enters the org through developers, working its way up the hierarchy to the eventual budget holder (VP Eng., CTO, etc.). This fundamentally changes the GTM model by putting the real customers - the developers and end-users - in control.

As a result, building and growing a community that caters to developers is essential for the success of any open-source project. Community attracts new users, brings in contributors, and provides valuable feedback to amplify a project relative to competing offerings. Let's take a closer look at the various stages of an OSS product's journey and explore the rough signals associated with each.

Early Adopters → A project attracts early adopters - the people that like to try something new and are attracted to novelty. The most popular open source projects often stem from leading academic research, unique data sources, or founders with impressive backgrounds. These early adopters engage with a project on GitHub and participate in active Discord and Slack groups. Channels such as Twitter and Reddit are great for attracting these early adopters. 

  • Signals: ~2k GitHub stars, >5% forks / stars ratio, ~50 contributors

Stable Growth & Feedback Loop → As a project gains traction, early adopters often become technical champions, providing valuable feedback and contributing code, documentation, and blog posts. Once a project gains technical champions and users start submitting issues on GitHub, it's an excellent indication that there is potential for a business, i.e. the “crossing the chasm” moment. At this point, companies will begin to invest in a Developer Relations function to foster engagement and ownership among the project's stakeholders. This role has become increasingly critical in open-source and bottoms-up companies, as it helps bridge the gap between the company and community.

  • Signals: DevRel team, >10% forks / stars ratio, ~100 contributors

Production Deployment & Revenue Generation → Over time, users integrate the software into their work projects, catching the attention of their colleagues who also start trying out the product. As the project's value proposition becomes more apparent, it progresses from prototyping to integration/testing and finally to production. IT typically conducts a POC of the solution to ensure it meets enterprise needs, particularly those related to security and scale, before deploying it in production.

  • Signals: Open-core / managed service, >15% forks / stars ratio, ~300 contributors, 1,500+ pull requests merged in LTM
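The rough signals above can be read as a simple checklist. Here is a minimal Python sketch of that idea. The thresholds come straight from the bullets, but treating them as hard cut-offs is a simplifying assumption, the names `RepoSnapshot` and `estimate_stage` are hypothetical, and the sketch ignores the star count and the non-numeric signals (DevRel team, open-core / managed service, merged pull requests). It is an illustration, not how the dashboard classifies projects.

```python
from dataclasses import dataclass


@dataclass
class RepoSnapshot:
    stars: int
    forks: int
    contributors: int


# (stage name, minimum forks/stars ratio, minimum contributors), most mature first.
# Thresholds are the rough signals from the post, used here as hard cut-offs.
STAGES = [
    ("Production Deployment & Revenue Generation", 0.15, 300),
    ("Stable Growth & Feedback Loop", 0.10, 100),
    ("Early Adopters", 0.05, 50),
]


def estimate_stage(repo: RepoSnapshot) -> str:
    """Return the most mature stage whose rough signals the repo clears."""
    ratio = repo.forks / repo.stars if repo.stars else 0.0
    for name, min_ratio, min_contributors in STAGES:
        if ratio > min_ratio and repo.contributors >= min_contributors:
            return name
    return "Pre-traction"


# 480 / 4,000 = 12% forks / stars with 120 contributors.
print(estimate_stage(RepoSnapshot(stars=4_000, forks=480, contributors=120)))
```

In practice these signals are fuzzy, so a project sitting just under a threshold should not be read as being in a different stage.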

These signals provide a rough framework - for a more detailed breakdown of various metrics at each stage of a company’s lifecycle, our dashboard offers a live analysis under the 'Stage Analysis' section. Here’s the crux of the data looking at all cloud infrastructure private-stage startups. It’s interesting to see how each metric ramps with each subsequent round of financing.

Latest observations

Not everything needs to be open source. We often ask why a project is open source in the first place, and how being open source helps it create and capture value. Open source is great for highly modular infrastructure or compute-intensive workloads, e.g. databases and developer tools. But the further you go up the stack, the less open source matters.

There are certain instances, however, where it’s debated whether infrastructure should be open-source. The most recent example is large language models. As AI and eventual AGI become increasingly powerful, it’s ethically questionable to make models open-source. Co-founder and Chief Scientist of OpenAI Ilya Sutskever strongly believes that “in a few years it’s going to be completely obvious to everyone that open-sourcing AI is not wise.” At the same time, there’s a world that doesn’t want OpenAI to be the sole arbiter and would rather have researchers share ideas and build on each other’s work. In practice, engineers will find ways to reverse-engineer models to make them open-source. It’s already happening. One of the key missing pieces in the AI race is a comprehensive regulatory framework to address these issues.

Long-term efficiency isn’t lower for OSS. Open source is not a business model but a distribution advantage. Initial CAC may be lower, but converting these users to paid requires a separate and often expensive GTM motion. Companies need to intimately understand a user’s journey from discovery to deployment (this is why DevRel is so important).

We separated the public universe into OSS, infrastructure, and application-level SaaS businesses [2] to compare % sales and marketing (S&M) spend over time since IPO. Interestingly, OSS companies debut with the highest % S&M spend before equaling the infrastructure median of ~50%. SaaS companies spend the least, operating at ~40% S&M.

When accounting for revenue generation and looking at GM-adjusted payback [3], the story is similar: the OSS model doesn’t provide advantages in long-term efficiency. Companies selling OSS must layer on tops-down sales and not rely on a pure bottoms-up motion if they wish to target higher ACVs and IPO-scale revenue. My colleague Jordan has a great post on this.
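For readers who want to run the payback math themselves, here is a small Python sketch of the definition given in footnote 3: (Previous Q S&M) / (Net New ARR x Gross Margin) x 12. The function name, parameter names, and input figures are hypothetical and chosen only to make the arithmetic easy to follow. They are not drawn from any company's filings.

```python
def gm_adjusted_payback_months(
    prev_quarter_sm: float, net_new_arr: float, gross_margin: float
) -> float:
    """Gross-margin-burdened payback in months, per the footnote's definition.

    prev_quarter_sm: S&M spend in the previous quarter
    net_new_arr:     net new ARR added in the current quarter
    gross_margin:    gross margin as a fraction (0.75 for 75%)
    """
    if net_new_arr <= 0 or gross_margin <= 0:
        raise ValueError("payback is undefined without positive net new ARR and margin")
    return prev_quarter_sm / (net_new_arr * gross_margin) * 12


# Made-up inputs: $60M of S&M last quarter, $40M of net new ARR, 75% gross margin.
print(gm_adjusted_payback_months(60.0, 40.0, 0.75))  # 24.0
```

A lower number means S&M dollars are recovered faster. Burdening the denominator with gross margin penalizes businesses whose revenue is more expensive to deliver.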

Cloud-hosted is really just closed-source. Giving away core IP is difficult, especially if it comes at a cost to a company’s top-line. Nowadays, OSS infrastructure businesses typically provide a cloud-hosted version of their product from day 1 that is very different from the open-source alternative. This model works great for performance-oriented workloads such as databases. A year before going public, MongoDB introduced Atlas, its cloud-hosted version that has since become a monster business, accelerating overall growth and making up approximately 60% of total revenue. But the source code is proprietary and operates under a very restrictive license. These cloud-hosted versions resemble standard SaaS products, with the open-source aspect mostly used for marketing and lower CAC.

That said, with a hosted OSS version, at a minimum there is (1) an “escape hatch” that users have in case they need to run the code themselves, (2) an ability to offer up a patch to an issue, and (3) a larger community of users to collaborate with, all of which are NOT possible with a fully closed-source vendor.

Fracturing of the cloud → monetizable subcomponents. The expansion in cloud computing has made it easier to build standalone businesses around specialized services. A decade ago, who would have thought there would be public-sized companies around payments (Stripe), SMS APIs (Twilio), and secrets management (HashiCorp Vault)? The latest success stories are Vercel and Deno, building successful OSS companies by leveraging frameworks and runtimes as the core engine, which were thought impossible to monetize a few years back. These businesses have decoupled the frontend from the backend, providing value-add workflows in their cloud-hosted versions. We’ll continue to see the cloud “fracture” and OSS businesses emerge around each service.

The next generation of OSS

We’re thrilled to introduce the Cloud Infrastructure Open-source Software Index, a compilation of the top 25 most promising private-stage companies that are likely to follow in the footsteps of MongoDB, Confluent, HashiCorp, and other public OSS businesses. These names were selected from a basket of venture-backed OSS companies ranging from seed to pre-IPO stage, and ranked among the top 25 in index scores.

A project’s score is determined through a calculation that takes into account four dimensions with different weights (a more detailed calculation is available in the dashboard):

  1. Adoption: # GitHub stars, # GitHub contributors, # GitHub watchers
  2. Momentum: rate at which stars, contributors, and watchers are added
  3. Usage: rate at which a project is forked, # of issues, # of pull requests
  4. Health: % issues closed, downtime, release cadence
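To illustrate how four weighted dimensions can roll up into one score, here is a minimal Python sketch. The weights below are placeholders invented for this example (the actual weights and calculation live in the dashboard), and it assumes each dimension has already been normalized to a 0-100 scale. The project names and scores are made up.

```python
from typing import Mapping

# Hypothetical weights: placeholders only, not the weights used by the index.
WEIGHTS = {"adoption": 0.30, "momentum": 0.30, "usage": 0.25, "health": 0.15}


def index_score(dimension_scores: Mapping[str, float]) -> float:
    """Weighted sum of per-dimension scores, each assumed to be on a 0-100 scale."""
    missing = WEIGHTS.keys() - dimension_scores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    return sum(WEIGHTS[name] * dimension_scores[name] for name in WEIGHTS)


# Made-up projects and scores for illustration.
projects = {
    "project-a": {"adoption": 80, "momentum": 40, "usage": 60, "health": 90},
    "project-b": {"adoption": 50, "momentum": 95, "usage": 70, "health": 70},
}

ranked = sorted(projects, key=lambda name: index_score(projects[name]), reverse=True)
for name in ranked:
    print(name, round(index_score(projects[name]), 1))
```

With these placeholder weights, the smaller but faster-growing project-b outranks project-a, which shows how much the choice of weights shapes the final list.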

These 25 companies were chosen from a broader universe of 200+ venture-backed OSS companies, most of which are shown below. This market map and analysis is certainly not exhaustive, but provides a glimpse into the incredible innovation that is happening within OSS today. We’re sure a lot of these projects will become household names in the near future (and some already have).

The dashboard

We’re excited to release a publicly available dashboard that tracks this OSS universe. These projects have raised at least $5M in funding and operate within cloud infrastructure. We use the GitHub API to stream data from public repos on a daily cadence, and store metrics in a Postgres database which is then fed into Metabase. We’ve added filters at the top of the dashboard to enable some interesting analysis. Here are some questions the dashboard can answer:

  1. Which machine learning projects have the most momentum?
  2. A five-way comparison among all the vector databases that just raised ;)
  3. How many contributors should my security startup have before raising a Series B?
  4. A side-by-side comparison between transactional data lakes (which are the most forked projects in the entire OSS universe)
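To make the data pipeline described above a little more concrete, here is a minimal Python sketch of the step between the GitHub API and the database: flattening one repository payload into a daily metrics row. The field names (`stargazers_count`, `forks_count`, `subscribers_count`, `open_issues_count`) are those returned by GitHub's REST endpoint for a single repository. The row layout and the function name `to_metrics_row` are assumptions made for illustration, and the sample numbers are invented. It is not the code behind the dashboard, and contributor counts would need a separate API call.

```python
import json
from datetime import date

# Trimmed example of the JSON body returned by GET /repos/{owner}/{repo}.
# The repository name and numbers are made up for illustration.
SAMPLE_PAYLOAD = json.loads("""
{
  "full_name": "example-org/example-repo",
  "stargazers_count": 12000,
  "forks_count": 1500,
  "subscribers_count": 240,
  "open_issues_count": 310
}
""")


def to_metrics_row(payload: dict, snapshot_date: date) -> dict:
    """Flatten one repository payload into a row for a (hypothetical) daily metrics table."""
    return {
        "repo": payload["full_name"],
        "snapshot_date": snapshot_date.isoformat(),
        "stars": payload["stargazers_count"],
        "forks": payload["forks_count"],
        # On this endpoint, watchers are reported as subscribers_count.
        "watchers": payload["subscribers_count"],
        # open_issues_count includes open pull requests as well as issues.
        "open_issues": payload["open_issues_count"],
    }


row = to_metrics_row(SAMPLE_PAYLOAD, date(2023, 3, 17))
print(row)
```

Storing one row per repo per day is what makes month-over-month comparisons possible later.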

Let’s explore the four different types of metrics we pull from GitHub:

Adoption

Perhaps the best indicator of project success is total adoption. In our dashboard, we pull two different types of adoption metrics: total GitHub stars and total GitHub contributors. Below is the total contributors chart. Airflow, Vercel, Grafana, Hugging Face, and Clickhouse rank among the top 5.

Momentum

It’s important that a project grows its community over time, attracting technical champions and users. How quickly do OSS communities grow? We track two metrics: (1) % GitHub star growth from the previous month and (2) how many GitHub stars are added per month since the repo has existed (a more holistic view). Here are the fastest growing projects since last month. Notice how the top 6 fastest-growing projects are all machine learning related, with Chroma, GPT Index, and LangChain exploding in user growth.
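The two momentum metrics are easy to compute from star counts. The Python sketch below shows one way to do it. The function names are hypothetical, the inputs are made up, and approximating a month as 30.44 days is an assumption. The dashboard's exact calculation may differ.

```python
from datetime import date


def monthly_star_growth_pct(stars_now: int, stars_last_month: int) -> float:
    """Percent change in total stars versus the previous month's total."""
    if stars_last_month <= 0:
        raise ValueError("need a positive star count for the previous month")
    return (stars_now - stars_last_month) / stars_last_month * 100


def stars_per_month(stars_now: int, created: date, today: date) -> float:
    """Average stars added per month over the repo's lifetime."""
    # Approximate a month as 30.44 days (365.25 / 12); repos younger than
    # one month are treated as one month old to avoid inflated averages.
    months = max((today - created).days / 30.44, 1.0)
    return stars_now / months


# Made-up inputs: 5,000 -> 5,500 stars in a month; 6,000 stars over two years.
print(round(monthly_star_growth_pct(stars_now=5_500, stars_last_month=5_000), 1))
print(round(stars_per_month(6_000, date(2021, 3, 17), date(2023, 3, 17)), 1))
```

The first metric rewards young projects with small bases, while the second smooths over a repo's whole life, which is why it helps to look at both.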

Usage

Total stars and star growth are great, but are developers actually using your project? We anchor too much on total GitHub stars. In our opinion, usage is the most important metric. We measure 4 different usage metrics: (1) % total GitHub forks / total GitHub stars, (2) GitHub issues created in the last month, (3) GitHub pull requests merged in the LTM, and (4) total GitHub issues added over time. The below chart shows which projects developers are actually forking and using themselves. It’s interesting to observe that the top 5 are all data or database related, indicating maturity of the ecosystem.
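To see why the forks / stars ratio can tell a different story than raw stars, here is a small Python sketch that ranks the same repos both ways. The repo names and numbers are invented for illustration and do not come from the dashboard.

```python
# (name, stars, forks): made-up numbers for illustration.
repos = [
    ("popular-but-unused", 30_000, 900),
    ("quietly-deployed", 6_000, 1_500),
    ("steady-framework", 12_000, 1_800),
]


def fork_ratio(stars: int, forks: int) -> float:
    """Forks as a fraction of stars; 0.0 when the repo has no stars."""
    return forks / stars if stars else 0.0


# Rank once by total stars and once by forks / stars.
by_stars = [name for name, _, _ in sorted(repos, key=lambda r: r[1], reverse=True)]
by_usage = [
    name
    for name, _, _ in sorted(repos, key=lambda r: fork_ratio(r[1], r[2]), reverse=True)
]

print(by_stars)
print(by_usage)
```

The repo with the most stars lands last on the usage ranking (3% forks / stars), while the smallest repo comes first (25%).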

Health

The last dimension we care about is a project’s “health.” Are issues closed quickly? Is there a consistent release cadence or is the project deprecated? We measure health through (1) % GitHub issues closed in LTM and (2) frequency of release cadence. If you hover over the bars on the dashboard, you can see the # of open and closed issues for each project.

If you would like your project included in our dashboard and market map, please send me an email or comment below :).

Join our infrastructure & OSS group

We have an awesome advisory group of infrastructure and OSS enthusiasts at companies like Atlassian, DoorDash, PayPal, Stripe, and Twilio. We host seminars, happy hours, and virtual events throughout the year. If you own infrastructure & love open-source, please drop me a note. We would love to have you!

Big shout out to our colleague Matthew Phua for working through the GitHub API and creating much of the dashboard. And thank you JJ Fliegelman, Jaren Glover, Danel Dayan, and Clay Fisher for the edits and feedback.

1 Source

2 Source: Pitchbook, company filings. Market data as of 3/17/2023. OSS basket includes Confluent, Couchbase, Elastic, GitLab, HashiCorp, and MongoDB. Infrastructure basket includes Alteryx, Atlassian, C3.ai, Cloudflare, CrowdStrike, Datadog, Digital Ocean, Dynatrace, Fastly, JFrog, New Relic, Okta, PagerDuty, Ping Identity, SentinelOne, Snowflake, Splunk, Sumo Logic, Tenable, Twilio, UiPath, and Zscaler. Application SaaS includes the rest of the universe of publicly traded SaaS businesses.

3 Payback is gross-margin burdened. Defined as (Previous Q S&M) / (Net New ARR x Gross Margin) x 12.

Authors: Jason Warner; Sai Senthilkumar (Principal at Redpoint investing in the next generation of cloud infrastructure); Scott Raney