With twenty years of experience in software development, Dustin Boss, CEO and co-founder of Telepath and CallTime, talks about his experience building machine learning products and what the current landscape of machine learning looks like for software developers.
Below is the transcript of the conversation:
Lily. Hello Dustin. How are you?
Dustin. I'm good. How are you?
Lily. I'm good. So, the first question I would like to ask you is if you could tell me a little bit about your background in software.
Dustin. I've been a software developer for the past 20 years, primarily focused on building applications that help businesses automate processes that take them a long time: payroll systems, financial systems, things like that. At our last company, the co-founders of Telepath and I built political software that automated the process of political fundraising: tracking donors, building scores for who would be likely to give, that sort of thing. That tool was different from the tools I had built in the past because it required us to develop some sort of scoring and understanding of who was likely to take a certain action. Prior to that I had only really worked on automated, rule-based systems and on making those rules easier to work with, so it was the first time I had implemented machine learning in any product I had built. It was a very interesting process, and it was very difficult to get set up and to figure out what needed to be done, because this was an entirely different type of application development than I was used to. That was an eye-opening moment for us as a team: it was very difficult for application developers to implement machine learning with the tools that were available, because those tools expected you to go and learn an entirely different set of skills.
Lily. Yeah, that makes sense, can you tell me a little bit more about the scoring component of CallTime and what you all were hoping to get at with that?
Dustin. Sure. This is a little bit in the weeds of how fundraising works, but in the fundraising world you end up with lists of people within your network and extended network that are much bigger than you could actually contact one-on-one, so you have to prioritize who's worth reaching out to and who's not. There's an entire class of consultants in the business of helping you make those decisions, but those decisions are made on, not quite arbitrary, but definitely broad demographic strokes: this person has given this much money to someone before, this person lives in a rich neighborhood, whatever. You're making a lot of generalizations from a handful of rules, and there's not a lot of nuance in the pattern matching, like whether that person has given in the way that donors who tend to give to you also give. That level of nuance is important when you're looking at a lot of data, which is the case for a lot of these campaigns, so we were trying to find a way to speed that process up so people didn't have to rely on their gut intuition about people.
Lily. Was machine learning part of the original product vision or was it implemented later on?
Dustin. I think we knew from the beginning that machine learning was the answer to our problem. I don't think we really understood at the time what the solution looked like. It was just, oh, we're going to make a score, and that score is going to take all of these things into account. So we had the intuition that that was the solution, but we didn't know that in order to make that score we were going to need to use particular libraries or languages. That was all stuff we had to learn along the way.
Lily. Cool. Can you tell me a little bit more about the challenges that you faced as a software developer implementing machine learning in the product?
Dustin. Yeah, so right now the challenge of bringing machine learning into a product as a software developer is that there aren't any tools that are actually for software developers. All the tools that exist today are for data scientists. It's not that data scientists can't also be software developers; there are plenty of software developers who have become data scientists. It's that there is that step of having to become a data scientist. There's a learning curve on top of being a software developer that you have to climb before you can use the tools designed for data scientists. Not all data scientists are software developers, so some of those tools are designed for people who are both developers and data scientists and some are designed just for data scientists, but none of them are designed just for developers.
I'll give you an example of the difference, because some of it is really simple; some of it is just a language difference. When I think of data as a developer, I think of rows and columns, right? These are tables, and when you put them together you start having joins and relational databases. Or maybe you're talking about a non-relational database and the terminology is fields or something like that. In either case there are kind of two standards: you think of rows and columns, or you think of objects, which can then be converted into rows and columns. In machine learning, what they call a column of data is a feature. If you're in the world of machine learning or data science, you've gotten used to this term and you don't even question it anymore; you know what it means. But if you're in the world of application development, a feature is a thing that your product does. There's all sorts of talk about feature development within products, and that's about what features you're going to put on the roadmap and which features you're going to take off the roadmap and develop next.
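Editor's note: to make the vocabulary gap concrete, here is a minimal sketch in Python. The donor records, the column names, and the `to_feature_matrix` helper are all hypothetical, invented for illustration; they are not CallTime's schema or any library's API. It only shows how the rows and columns a developer already knows map onto what machine learning material calls features and labels.

```python
# Hypothetical donor records as an application developer sees them:
# each dict is a row, each key is a column.
rows = [
    {"donor_id": 1, "past_gifts": 3, "avg_gift_usd": 50.0, "gave": 1},
    {"donor_id": 2, "past_gifts": 0, "avg_gift_usd": 0.0, "gave": 0},
    {"donor_id": 3, "past_gifts": 7, "avg_gift_usd": 120.0, "gave": 1},
]

# In machine learning vocabulary, the input columns are "features"
# and the column to be predicted is the "label" (or "target").
FEATURE_COLUMNS = ["past_gifts", "avg_gift_usd"]
LABEL_COLUMN = "gave"


def to_feature_matrix(records, feature_columns, label_column):
    """Split rows into a feature matrix X and a label vector y."""
    X = [[float(r[c]) for c in feature_columns] for r in records]
    y = [r[label_column] for r in records]
    return X, y


X, y = to_feature_matrix(rows, FEATURE_COLUMNS, LABEL_COLUMN)
print(X)  # one inner list per row, one value per feature
print(y)  # one label per row
```

Nothing here is specific to machine learning: it is an ordinary column projection, which is roughly the point being made about the terminology.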
When you sit down to read a Hello World kind of tutorial about data science and you're just trying to dabble in it, you immediately come across many terms like "feature" that are treated very casually, as terms you should already know. That makes you realize right away that you're not the audience for this material. That can have several implications, but for a lot of developers it just means they put it down, and maybe they come back to it later or maybe they never do, because it suggests a very steep learning curve. And in fact there is a very steep learning curve to use the tools today, because they all sit on the other side of this expectation that you're going to be familiar with the data science world.
So, coming from my background as a developer, I just never even attempted to do machine learning or data science. I knew it was there, and I tried to know just what I needed to know about the contours of it, so that if I ever had to engage with it I would know how to approach it. But my expectation always was that if I wanted to implement it, I was going to have to hire a specialist who would basically implement it for me and keep me in touch with the pieces I needed to know. So I think that answers the question: right now it is not a space that is very approachable for people who aren't trying to become data scientists, the people who are just trying to build on top of it.
Lily. Was there a time when you yourself tried to implement a machine learning feature? I know that at CallTime you did have a data scientist and you had someone else doing the modeling but was there ever a time when you actually had a problem you felt like machine learning would solve it for a feature or an element of the product and you sort of went into the process of trying to get it done?
Dustin. Not for me. I think there are plenty of developers who do, and of course there are plenty of developers much better than me who have both the capacity and the desire to learn all these things very deeply, and they should. I think there's a huge opportunity for people who know all of these things very deeply, in the same way there's a huge opportunity for people who understand the ins and outs of how databases work or how security works. There are lots of levels of expertise you can drop down into as a developer, but for me this was not one of the places that made sense to go into, for a few reasons. One, I don't come from a math background, so it was always going to be a lot of learning, a lot of terminology, and a lot of jumping back and forth between what I was trying to apply and a dictionary somewhere; I tried to read the glossary terms alongside it. The other reason was that the programming languages are limited. At CallTime specifically we were writing everything in Node.js. Everything I wrote was in Node.js, and anything on the data science side was going to be written in Python. I've got a little experience with Python, but I'm not a Python developer, so I wasn't going to learn an entirely new programming language just to participate in something we had a specialist to do. That structure puts an undue burden on the data scientist, too, because now I have this data science co-founder at the company and he's responsible for everything data science. The problem with that is that you've lost all the collaboration you typically get on a development team.
If I'm working on something in Node.js and he also works in Node.js and has also worked as an application developer, then he can participate in the conversations around that. But on the data science side, I have to level up before I can even offer something constructive. We kept running into these sorts of problems of limited resources, and as we started thinking about Telepath it was clear that many more people than just us were experiencing the same constraints.
Lily. Yeah, and a quote I keep hearing as I talk to product managers and product people is that you either collaborate or die: any siloed workforce within a team can really stymie innovation, so making sure that you can collaborate is really important.
Dustin. Yeah, no, I totally agree.
Lily. So given your lens as a software developer and also a product leader, as a co-founder, what do you see as the future of machine learning and product development?
Dustin. I think that every company implementing data science and machine learning today started with the same problem: we have a bunch of data, or we're going to get a bunch of data. Typically the companies that are really using it today started with a bunch of data. They tend to be bigger enterprise companies that were already sitting on a lot of data. It was worth it to them to figure out how to extract insights and meaning from that data, even if it was a little bit experimental: we have to do it, because the little bit of gain we're going to get at the scale of market we're in is going to be worth it, so we'll invest in this space, and we know there's this emerging concept of machine learning out there that we could use. So everyone started with the same problem: how do we get the insights out of these mountains of data? And the answer was, what you need to do is build a machine learning model. Okay, well, who knows how to build a machine learning model? Statisticians, mathematicians. Okay, so let's go find some of those. So we brought statisticians and mathematicians into the technology world at this sort of seed level, and their job was to build these models: to take this cutting-edge technology of machine learning, some of it a little older than cutting edge, and implement it within the technology world or within these enterprise ecosystems. These people did that, and for several years we built practices around this idea of a data scientist who would collaborate with business people to understand what their needs were, collaborate with data people to understand where the data was and what it meant, and then put it all together and build these models.
And then a few years down the line we start to have models, and the question becomes, okay, what do we do with these files? How do we deploy these models in a way that we can actually use them in real time in our day-to-day life, as opposed to in these very slow batch processes? "Oh, well, I don't know how to deploy a model, I'm just the statistician." So now we need to go and get an engineer. We bring in different types of engineers, but at this point we've already polluted the space with a lot of terminology and a lot of concepts that are very mathematically specific. So the engineers who come in have to learn something extra about being a data scientist, and they probably also need to learn something extra about the ways you handle big data, data lakes, data warehouses, and things like that. You start to get this rise of data engineers or machine learning engineers, and eventually that gets to the point of trying to become more automated. Now we're at this level of what we call MLOps, which is sort of a combination of all of these things coming together: you've got the mathematicians who have the models, you've got the engineers who have constructed the ways we need to run those models, and now we need to be able to deploy them into cloud environments and host them, and we need all of this to be consistent. So it starts to build this inverted pyramid structure, with the statistician at the bottom and everything built on top being added complexity based on the step we introduced before it. That's fine if that's the only way to solve it. So it keeps getting more and more complex, larger and larger, and more and more expensive, because the companies that need it really need it, and so they're willing to support this entire ecosystem of things designed to support the data scientist. This whole world is now what we call data science.
And this is great, because we have made enormous advancements in all sorts of technologies because of it, and we're able to identify patterns in data that we would never have been able to find before because it was too hard to search. It's been incredibly valuable. But now along comes another technology, AutoML, or automated machine learning. This technology suggests that, not for all problems and maybe not even for most problems yet, but for a very large class of problems, machines can automatically determine the best type of model to build for certain types of data sets. That means the job the statistician did at the beginning, building a model for all sorts of use cases, is now eliminated, and you can have the machine do it. What doesn't seem to be asked right now in the data science world is: if that happens, if it continually gets better and better and we've eliminated the need for an individual statistician to design the models, then does everything we built on top of it still make sense? Is all of that still the right way to execute on a model we already have, or is it something different?
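Editor's note: the core idea behind "the machine picks the model" can be sketched in a few lines of Python. This is a toy under stated assumptions, not how any real AutoML product works: the two candidate models, the `auto_select` helper, and the numbers are all invented for illustration, and real systems search far larger spaces of models and settings. The sketch fits each candidate on training data, scores it on held-out data, and keeps whichever scores best.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Baseline candidate: always predict the average training target."""
    avg = mean(ys)
    return lambda x: avg


def fit_line(xs, ys):
    """Candidate: ordinary least squares with a single input column."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given data."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))


def auto_select(candidates, train, holdout):
    """Fit every candidate on train, score on holdout, return the best."""
    scores = {}
    for name, fit in candidates.items():
        model = fit(*train)
        scores[name] = mse(model, *holdout)
    best = min(scores, key=scores.get)
    return best, scores


# Made-up data that roughly follows a straight line.
train = ([1, 2, 3, 4, 5, 6], [2.1, 3.9, 6.2, 7.8, 10.1, 12.0])
holdout = ([7, 8], [14.1, 15.9])

best, scores = auto_select({"mean": fit_mean, "line": fit_line}, train, holdout)
print(best, scores)
```

The person calling `auto_select` never chooses the math; they only supply the data and a list of candidates, which is the shift in responsibility discussed in the surrounding answer.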
What we believe is that the future of data science starts to look like something different. If a data scientist is no longer the person who has to be solely in charge of thinking about which math to apply to develop the right type of model, and a computer is doing that, then you have to start thinking about what the data scientist's job is going to be in this world. I still think it's the right term, and I still think it's a role that's going to continue, but I think it becomes a lot less about math and a lot more about creativity. It becomes a lot more about thinking: I've now got a machine that, if I feed data into it, can spit models back out the other end, so what's the right type of data to feed into it? That's less about how we reconfigure the columns and re-aggregate them to build these different features, as they call them, to generate the models, and more about what sorts of data points would be interesting.
Oh, maybe it would be interesting to know how long that person waited for their Uber, or maybe how long they waited on average, which would just be a thing the model would come up with from the individual wait times, or maybe what the average tip was. You think about all the data points you can be collecting and putting into the spreadsheets that you feed into the models, and those models will start generating results. There are people within organizations who are already doing this sort of data collection, and they're very good at it, and they're nowhere near the data science space right now. Think about the things that marketers and salespeople do. They have all sorts of intuition: a coupon for this is going to do better than a coupon for that during this time of year, or if we run this particular type of promotion during this time of year, it's going to resonate with customers in this way. They're already seeing patterns in behavior. So I think in time, as we get more and more comfortable with the idea of data, they'll be able to ask, what are the ways we could capture what I believe these patterns are, represented by numbers? They'll collect those data points, and those will start getting fed into these machines.
Beyond that, the engineering side has to become a lot more straightforward. If you're dealing with things that are standardized rather than homemade algorithms, you can deploy them in more standardized ways and build a more streamlined operation on top of them.
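Editor's note: the ride-wait example can be sketched in Python as follows. The ride records, field names, and the `summarize_by_rider` helper are hypothetical and invented for illustration. The sketch shows the kind of per-person average (wait time, tip) that the answer above suggests could be derived mechanically once someone has decided the raw data point is worth collecting.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw events: one row per ride.
rides = [
    {"rider_id": "a", "wait_minutes": 4.0, "tip_usd": 2.0},
    {"rider_id": "a", "wait_minutes": 6.0, "tip_usd": 3.0},
    {"rider_id": "b", "wait_minutes": 12.0, "tip_usd": 0.0},
]


def summarize_by_rider(events):
    """Group ride rows by rider and compute per-rider averages."""
    grouped = defaultdict(list)
    for event in events:
        grouped[event["rider_id"]].append(event)
    return {
        rider: {
            "avg_wait_minutes": mean(e["wait_minutes"] for e in evs),
            "avg_tip_usd": mean(e["tip_usd"] for e in evs),
        }
        for rider, evs in grouped.items()
    }


summary = summarize_by_rider(rides)
print(summary)
```

The aggregation itself is routine; the judgment call is deciding that wait time was worth recording in the first place.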
I think that's what the future of data science looks like: it becomes more and more creative (how do we collect all the data?), more and more reliant on machines, which are going to be doing a lot of the hard part for you, and more and more standardized around the way we do the rest of application development today. Rather than continuing to build out a separate engineering ecosystem alongside our application development ecosystem, it makes much more sense to just build around the ecosystem that we already have.
Lily. Yeah, that makes sense, and I think the infusion of creativity into this space is really exciting. We've already seen machine learning used in so many different and cool ways that, by creating a system in which creative thought can flourish, I think we'll only see cooler and cooler things down the line.
Dustin. Yeah, I think so too. I'll add another piece here, and I'm sure we're going way over time. Because of these two separate systems, a system of data science that is designed for data scientists and a system for building products that is designed for products, there's a really difficult time right now integrating the two. There's not a lot of machine learning in products today. Most of the focus of machine learning today is on the analytics side: we take all this data, and mostly we're running batch jobs. Even as it gets more and more real time, it's about analyzing data that is somehow separate from how your users interact with your product. If you think about all the appliances in your house and all the applications on your phone, most of them today are not making any sort of smart decisions about what you're going to do next. Some of the applications are doing that now, but certainly my toaster's not doing it and my fridge isn't doing it. But I know that in the coming years all of those appliances are going to do it. Our expectation as users is that we're going to get more and more smart guessing about what we want from all the devices and things around us. In order to do that, we have to start pulling data science into the application realm and into the development realm.
What I don't understand, and I'm curious to hear, as you do these interviews, whether anyone from the traditional data science side has a better answer than I've been able to get, is this. Let me posit that no one disagrees with the assessment that in the future all these products are going to do these things, or at least I haven't met the people who disagree with it. So I want to know how they see this silo moving into that world. Is the expectation that over time all of the people who build products and applications are eventually going to get large enough that they can afford to participate in this? Or is it that this giant silo is eventually going to get cheap enough somehow? Or is it that data scientists are going to become so plentiful that everyone will have one? It seems like this very specialized enterprise use case, which is what's happening right now, doesn't apply to most companies in the technology space, and I don't understand how we expect that in the next 10 years everyone's going to be using machine learning and that it's going to happen by this giant silo continuing to get bigger.
I think something is going to break that dam, and we're going to see a totally different approach. It's going to have a lot more to do with the tooling coming to the developers than the developers coming to the existing tooling. From our perspective, that's what we're focused on: how do you take all the great things accomplished by the data science world and plug them into the regular ecosystem in a familiar way, so people can build models very quickly, bring them to market, iterate on them and get better, and start to build up the lean product-building mentality that we've come to count on in the development world?
Lily. Yeah, well, that's an excellent question, and I will carry it with me into future conversations. I think we've hit about the 20-minute mark, so thank you so much for taking the time to have this conversation, Dustin.
Dustin. Yeah, of course. Thanks, Lily.