It’s a cliché because it’s true: the question is not whether you’re going to incorporate machine learning into your products, it’s how you’re going to do it.
But whereas AI/ML was previously in demand mostly for making business processes smarter, today’s consumers increasingly expect everything they use, from apps on their phones to appliances in their kitchens, to be “smart.” They know you’re collecting their data, and they expect a more personalized, more seamless, more intelligent product in return.
Whatever the end feature, though, those building companies and products often wrestle with whether to build or buy when it comes to AI/ML. Making this decision comes with all the usual trade-offs that tech decision-makers are used to: What kind of team do you have? What can they accomplish? What’s the opportunity cost? Is what you’re building core to your business or a key competitive differentiator?
But in 2021, the question of “build or buy” is the wrong one when it comes to machine learning. Because of course you need the speed and simplicity that a purchased solution offers and of course you need the accuracy and control that you get when you build it all yourself. That’s why taking a hybrid approach is becoming so popular in this space. Advances in AutoML and other technologies are making it possible for product teams to incorporate machine learning into their product development cycles in much the same way they incorporate any other feature. This allows them to use their own data to build custom, hosted models that can serve real-time predictions to their products via an API.
This approach outsources and abstracts away the math, the data science jargon, and the DevOps, without requiring the business to give up quality, customization, or control. The best of these tools also provide visibility into the model and its outputs, supplying the explainability teams need to iterate and experiment their way toward better and better solutions. More companies than ever are relying on this technology to extend the capacity of their developers, making predictive applications and features as accessible to them as Twilio made text messaging or Stripe made payment processing.
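To make the pattern concrete, here is a minimal sketch of what “serving real-time predictions via an API” looks like from the product team’s side. The endpoint URL, payload shape, model ID, and API key below are illustrative assumptions, not any particular vendor’s API:

```python
import json

# Hypothetical prediction endpoint for a custom hosted model
# (URL and request format are assumptions for illustration).
PREDICT_URL = "https://api.example-automl.com/v1/models/{model_id}/predict"

def build_prediction_request(model_id: str, api_key: str, features: dict) -> dict:
    """Assemble the HTTP request a product backend would send to get a
    real-time prediction from its own custom hosted model."""
    return {
        "url": PREDICT_URL.format(model_id=model_id),
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        # The feature names come from the team's own training data.
        "body": json.dumps({"instances": [features]}),
    }

request = build_prediction_request(
    model_id="churn-model-v3",
    api_key="sk-example",
    features={"plan": "pro", "days_active": 42, "support_tickets": 1},
)
print(request["url"])
```

The point is that, to the developer, the model is just another service dependency: the data science behind the endpoint is somebody else’s abstraction.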
Building ML/AI features quickly
No one needs to be sold on the advantages of speed, but one thing many companies overlook when they first endeavor to build machine learning into their products is that the traditional data science development cycle is largely at odds with how most technology companies ship software. The proliferation of machine learning tools to support data scientists has improved this situation significantly, but it still leaves a lot to be desired. Even with a fully staffed data science team, the timeline to produce models is best measured in months.
Tools that help you put machine-learning-powered products and features into production faster are obviously helpful, but that’s really only part of the issue. For companies to truly expand the universe of people on their team who can build ML/AI features, the development cycle has to be as quick as possible, so that non-data scientists can develop intuition about their data, the models it feeds, and the outputs those models produce. That means the production timeline has to be measurable in hours and minutes, not weeks and months.
Giving developers the tools they need to experiment and iterate in this way aligns well with all the skills and practices they already bring to the table. Here, again, leveraging tools that automate the data science math and engineering makes that feasible.
Building ML/AI features on your own data
Most assessments of the “build v. buy” question for ML/AI assume that a company gives up accuracy, control, or customization when it opts for buy over build. That is a lot to give up, but it’s also no longer necessary. The technology that powers AutoML platforms is as fast and effective as it’s ever been, and we have every reason to expect it will only become more so over time.
That means companies can actually leverage their own data to solve novel problems using custom, machine-built models. And as these tools become further abstracted into developer-friendly frameworks, product teams will be able to use them armed with expertise about their domain, their data, their customers, and their product, but without needing a PhD in statistics.
This might be the most powerful aspect of the hybrid approach. Because one thing “build” and “buy” have in common? Both silo the problem away from some of the most powerful resources a company can bring to machine learning products: its local expertise and data. An off-the-shelf solution completely outsources the issue, potentially starving it of the unique context of the particular problem it’s trying to solve. But even with a traditional in-house approach, companies too often “other” their data science problems to data scientists, letting the intimidating math keep the rest of the product development team at arm’s length. Context is the lifeblood of machine learning, so the further the data science sits from the data, the bigger the problem.
Table stakes or differentiator?
A common consideration when deciding whether to build or to buy a technology is whether the end it serves is core to the business or a major market differentiator. Increasingly, however, AI/ML is necessary to power products, features, and processes that are considered table stakes. Recommendations, predictions, forecasts, automated categorizations, and the like are becoming the expectation in enterprise and consumer software at a pace that far outstrips the capacity of most companies to actually meet those expectations. This means that businesses need to shift the way they assess the tools at their disposal to bring these products to market. AI features may have overwhelmingly been standout differentiators in the past, but as they become part of every tool we use every day, they shift from the forefront of a user’s mind to the background. That shift creates a new standard for an MVP, and it can’t be the case that the only companies able to meet it are those that can allocate a data science team to any particular feature.
That isn’t to say that there won’t be highly innovative utilizations of AI/ML that are core differentiators for companies. Undoubtedly there will be. But the same technology that puts machine learning within reach for companies without data teams can be used by companies with all the resources in the world looking for better ways to experiment with their data, and to bring more of their company into that process. In fact, it seems almost inevitable that the biggest innovations leveraging AI/ML will be built exactly that way. So table stakes or differentiator, it’s time to rethink the approach.
Picking the right tools
The tools a company decides on often dictate where on the build-buy spectrum they land when it comes to leveraging machine learning technology.
On one end of the spectrum you have tools like Spark ML, scikit-learn, and Databricks, which are designed to assist and even supercharge those already very experienced in data science and machine learning. These are powerful tools, and what a company is buying here is the ability to make its data science teams and machine learning experts faster and more effective.
On the other end of the spectrum you have tools like Amazon CodeGuru, an off-the-shelf offering that reviews your codebase and alerts you to potential issues. Its models are pre-built by Amazon on Amazon’s own data (along with that of thousands of open source GitHub projects), so there isn’t much room for customization or flexibility beyond the narrow intended use case. Other tools like Kairos and TextRazor similarly rely on existing datasets to build reusable models that can be plugged into various applications.
The hybrid ground between those ends of the spectrum is the future of this space, and it is what makes the classic build-versus-buy question so inapplicable here. Software like Telepath gives developers a platform they can use, as non-data scientists, to experiment with and prepare their own data, push it through an AutoML engine that builds a custom model on top of that proprietary data, and host that model so it can serve real-time machine learning API calls. DataRobot and H2O.ai similarly leverage AutoML to empower non-data scientists to build and deploy models, though some machine learning prowess is generally still advised to take full advantage of their offerings.
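To make the middle of the spectrum concrete, here is a minimal sketch of the core thing an AutoML engine automates: trying several candidate models on the team’s own data and keeping the best one. This uses scikit-learn and a synthetic dataset purely for illustration; real platforms search far larger model and hyperparameter spaces, and also handle hosting and serving:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Stand-in for a team's proprietary dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# A tiny "search space"; real AutoML engines evaluate many more model
# families and hyperparameter settings, then host the winner behind an API.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
}

# Score each candidate with 5-fold cross-validation and pick the best.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in candidates.items()
}
best_name = max(scores, key=scores.get)
print(best_name, round(scores[best_name], 3))
```

The value of the hybrid platforms is that this loop, plus the data preparation before it and the deployment after it, happens behind a developer-friendly interface rather than in a data scientist’s notebook.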
Riding the next wave of ML
Many people find machine learning and artificial intelligence intimidating, including people with significant technical chops. And the wave of demand is only just beginning to swell. The combination of those two things could create big problems for companies. But, to end as we started, with a cliché: it also means tremendous opportunity.
The burgeoning growth of cloud AI/ML developer services gives software developers the capacity to be fully engaged in leveraging machine learning for the products they’re building. This will inevitably feed the demand for those products, while also empowering companies to meet it. This decade will bring the greatest growth and innovation we’ve seen yet in machine learning, and the companies at the forefront of this revolution will be the leaders in their industries.