Today Databricks is announcing that it has raised $250 million in a Series E funding round led by Andreessen Horowitz, which has now led three of the company’s rounds, including this one. Prior investors New Enterprise Associates, Battery Ventures, and Green Bay Ventures participated in the deal. Two new investors, hedge fund Coatue Management and Microsoft, joined in as well.
The deal shows that VC interest in open source software continues apace. Just a couple of weeks ago, Apache Kafka provider Confluent raised $125 million from Sequoia Capital at a post-money valuation of over $2.5 billion.
Databricks is now valued at $2.75 billion, post-money, according to a spokesperson for the company. This valuation lines up with targets disclosed through Delaware state regulatory filings made by the company in mid-January 2018, spotted by the Wall Street Journal at the time.
The company didn’t disclose specific figures related to its business, but according to a statement provided to Crunchbase News the company hit $100 million in annual recurring revenue in 2018 and experienced “approximately 3x year-over-year growth in subscription revenue during the last quarter of 2018.” Assuming the company is generating over $100 million in ARR today, its shares were valued at between 15 and 25x revenues.
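To make the arithmetic behind that range concrete, here is a minimal sketch in Python. It uses only the two figures reported above (the $2.75 billion post-money valuation and the $100 million ARR milestone); the function names are illustrative, and the ARR levels it derives are implied by the stated multiples rather than disclosed by the company.

```python
# Back-of-the-envelope revenue multiple, using figures reported in the article.
# Function names are illustrative; derived ARR values are implied, not disclosed.

POST_MONEY_VALUATION = 2_750_000_000  # $2.75 billion, per the company spokesperson
ARR_FLOOR = 100_000_000               # $100 million ARR milestone reached in 2018


def revenue_multiple(valuation: float, arr: float) -> float:
    """Valuation expressed as a multiple of annual recurring revenue."""
    return valuation / arr


def implied_arr(valuation: float, multiple: float) -> float:
    """ARR a company would need for the valuation to equal `multiple` x ARR."""
    return valuation / multiple


# Upper bound on the multiple: ARR sitting exactly at the $100M milestone.
floor_multiple = revenue_multiple(POST_MONEY_VALUATION, ARR_FLOOR)

# ARR levels implied by the two ends of the 15x to 25x range.
arr_at_25x = implied_arr(POST_MONEY_VALUATION, 25)
arr_at_15x = implied_arr(POST_MONEY_VALUATION, 15)

print(f"Multiple at exactly $100M ARR: {floor_multiple:.1f}x")
print(f"ARR implied by a 25x multiple: ${arr_at_25x / 1e6:.0f}M")
print(f"ARR implied by a 15x multiple: ${arr_at_15x / 1e6:.0f}M")
```

At exactly $100 million in ARR the multiple works out to 27.5x, so a 15x to 25x range corresponds to current ARR of roughly $110 million to $183 million.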
The company says that over 2,000 organizations around the world use Databricks’s software in their data analytics, data science, and machine learning workflows. “What’s driving this incredible growth is the market’s massive appetite for Unified Analytics,” said cofounder and CEO Ali Ghodsi in a statement. “Organizations need to achieve success with their AI initiatives and this requires a Unified Analytics Platform that bridges the divide between big data and machine learning.”
The Unified Analytics Platform Ghodsi refers to is built atop a technology he helped create a decade ago. Now developed as an open-source project under the aegis of the Apache Software Foundation, Spark (formally Apache Spark) was developed out of the now-closed AMPLab (standing for “Algorithms Machines People”) at the University of California, Berkeley. The original paper describing and naming Spark was published in 2010.
Crunchbase News spoke with Michael Franklin, who currently serves as chair of the computer science department at the University of Chicago, to learn a little more about Spark. Franklin co-founded the AMPLab and was its director at Berkeley before coming to UChicago. He sits on several big data companies’ technical advisory boards, including Databricks.
“Spark is a platform for doing scalable data analytics and machine learning. It’s known for being very flexible, very fast, and one of its salient features is that it offers a bunch of different interfaces to interact with and operate on data,” he said. In part because of the diverse set of disciplines represented by researchers at AMPLab, the framework’s scope expanded to include ways to perform SQL-style analytics queries, ingest data from streams, manipulate graph data (like social networks), and train machine learning models.
“The real reason [Spark] took off though, was because it was a faster Hadoop,” Franklin said. Hadoop was the first open-source implementation of MapReduce, a distributed, parallelized data processing model originally developed internally at Google. (As an aside, Hadoop’s development is also facilitated by the Apache Software Foundation. Hortonworks, a for-profit company that supports Hadoop, went public in December 2014 and announced a merger with Cloudera in October 2018.) Spark could work with data that was already loaded into the Hadoop Distributed File System (HDFS), which made it easy to demonstrate the performance improvements eked out by the framework, prompting many to switch to Spark.
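For readers unfamiliar with the MapReduce model mentioned above, the following is a toy word count in plain Python. It is a single-process sketch of the map, shuffle, and reduce steps only, not Hadoop’s or Spark’s actual API; the function names and sample documents are made up for illustration. In a real cluster, the map and reduce steps would run in parallel across many machines.

```python
# Toy, single-process illustration of the MapReduce model (word count).
# This is NOT Hadoop's or Spark's API; names and sample data are hypothetical.
from collections import defaultdict


def map_phase(document: str) -> list[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs: list[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by their key."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: collapse the values for one key into a single total."""
    return key, sum(values)


documents = [
    "spark is fast",
    "hadoop is open source",
    "spark is open source",
]

# Each document could be mapped on a different machine; here it is sequential.
mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())

print(counts)
```

Because each document is mapped independently and each key is reduced independently, the work can be split across machines, which is what makes the model suitable for very large datasets.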
Performance may have netted Spark an initial following among the big data and analytics crowd, but the ecosystem and interoperability are what continue to drive broader adoption of Spark today. Because Spark is open source software, anyone with an internet connection and a system that meets the minimum specifications can freely download and run it on their own machines.
However, for enterprise clients that want a more full-service offering, Databricks developed a proprietary runtime—the aforementioned Unified Analytics Platform—that is even more efficient and offers more features than the open source package. (On its website, Databricks compared its feature set to Apache Spark’s.) Databricks offers different service tiers through partnerships with Amazon Web Services (AWS) and Microsoft’s Azure cloud compute platform.
A spokesperson for Databricks told Crunchbase News the company is “fully committed” to maintaining an open source development model for Apache Spark. “Together with the Spark community, Databricks continues to contribute heavily to the Apache Spark project, through both development and community evangelism,” they added.
Illustration: Li-Anne Dias
Databricks said it has raised $498.5 million to date. Excluding the new $250 million round, that leaves $248.5 million in earlier funding, but Crunchbase only has data for $247 million of it, starting at Series A. People browsing through prior funding rounds should keep in mind that Databricks must have raised a further $1.5 million in earlier funding that, at this time, is not listed in Crunchbase.