Dec. 12, 2023

(Bonus) Chris Smith: Simple guidelines for AI investment sizing

Make Things That Matter

Chris Smith is a longtime engineering leader who has been in the trenches of building with AI & machine learning for years. This is a short, bonus episode to go along with our main conversation: https://pod.fo/e/20a5ef

 

Transcript

Andrew Skotzko [00:01:04]:

Considering that it's early December, it's planning season and everyone's going through that craziness. So if we have this technology that everyone knows they have to do something about, but nobody actually knows what to do with yet, how the hell do we plan for this?

Chris Smith [00:01:18]:

Yeah, carefully, as is my favorite answer on that kind of thing. Total dad joke there. No, to a certain degree, I think this is a specialization of the work that you would do if you were planning out a software project. Usually with software you're doing something you haven't done before, right? And you're trying to explore it. I think the risk-reward ratios on software are also similar to those with machine learning, in the sense that there's a lot of upfront cost and then the per-unit cost tends to be very low. Right. And that's where your profit comes in. So it's high risk, though, because of the fact that you have to pay that upfront cost in order to get there.

Chris Smith [00:01:56]:

I do think the interesting thing that has emerged over the last little while is that with AI tooling being more productized, you can take some baby steps that minimize how much you have to do up front in order to start getting into that virtuous cycle of incremental changes. You can literally test out whether an idea works with a large language model by going right now to ChatGPT, typing in a question, and seeing what answer comes back. If it comes back with a reasonably good answer, you know, hey, it's not going to be that hard for me to harness AI for this particular use case. And if you get a terrible answer, well, now you know you've got a lot more investment to do before you're going to see anything. So I think that's probably the biggest change now: the step zero, where you're doing exploratory work to figure out how big the puddle is, you can do a lot more cost effectively. Before, you kind of had to invest all this time in building out a model and testing it out, and then if it didn't work very well, you weren't sure if it was because you made a mistake somewhere along the way or whether it was just something that's not going to work. And so it tended to make the exploration much more expensive. Now, you can get a very early signal about whether something is likely to work.
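To make that step-zero probe concrete, here is a minimal sketch of the kind of quick feasibility check described above, assuming the OpenAI Python client and an API key in the environment; the model name and sample prompts are placeholders for whatever your own use case looks like.

```python
# Minimal feasibility probe: send a handful of representative prompts to a
# hosted LLM and eyeball the answers before committing to any real investment.
# Assumes the `openai` Python package and OPENAI_API_KEY in the environment;
# the model name and prompts below are placeholders for your own use case.
from openai import OpenAI

client = OpenAI()

sample_cases = [
    "Customer says the app crashes when they upload a photo. What should support ask next?",
    "Customer wants a refund on an annual plan after 11 months. Summarize our options.",
]

for case in sample_cases:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any hosted chat model works for a first probe
        messages=[{"role": "user", "content": case}],
    )
    print("PROMPT:", case)
    print("ANSWER:", response.choices[0].message.content)
    print("-" * 60)
```

If the answers already look reasonable, that is the early signal Chris describes; if they are consistently poor, you know a bigger investment is coming before you see value.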

Chris Smith [00:03:11]:

Right from the beginning, you take one of these large language models that's been built and pretrained on just a massive data set, and if you're lucky, that data set includes enough knowledge that there's context and applicability to whatever problem you're working on, and you don't have to do anything else. So I think that's the easy part: you can have a pilot project to explore that.

Andrew Skotzko [00:03:32]:

So best case scenario, OpenAI has already trained the model for you and done the hard part. And you just get to rent theirs.

Chris Smith [00:03:38]:

Exactly. At least to get to your initial proof that it's worth investing in this. Right. And then, as you know, once you get to that point, it becomes so much simpler. Right. Because then it becomes, how much is this product worth to me? Right. You can test it out in the marketplace, you can see how much it's worth, and then based on how much it's worth, you can make decisions about how much you want to invest in making it better. And that's like playing on easy mode from my perspective, because it's a lot harder when you don't know what the market value is for the work that you're doing and you're having to guesstimate whether an investment makes sense or not.

Chris Smith [00:04:13]:

That's step one. That's the easy case. Great. The next level harder is: it's okay, but you can already tell before you launch that you're going to have to do some fine-tuning before you're going to get something of value. And there's literally a process called fine-tuning that you can do on these models, which again takes comparatively small effort for the rewards. Right. And you can get pretty far with that, but you do have to be a lot more principled about it. So now you're going to have to make an investment in collecting data properly, potentially labeling data, and then feeding it into the model in a principled fashion, that kind of stuff.
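For a sense of what the productized fine-tuning path involves, here is a minimal sketch assuming the OpenAI Python client: labeled examples are written to a JSONL file, uploaded, and used to start a fine-tuning job. The file name, base model, and example content are placeholders, not recommendations.

```python
# Sketch of a hosted fine-tuning flow, assuming the OpenAI Python client.
# The training examples would come from your own properly collected,
# labeled data; the file name and base model here are placeholders.
import json
from openai import OpenAI

client = OpenAI()

examples = [
    {"messages": [
        {"role": "user", "content": "Example input from your domain"},
        {"role": "assistant", "content": "The output you want the tuned model to produce"},
    ]},
]
with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Upload the labeled data and kick off a fine-tuning job on a hosted base model.
uploaded = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(
    training_file=uploaded.id,
    model="gpt-4o-mini-2024-07-18",  # placeholder base model
)
print(job.id, job.status)
```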

Chris Smith [00:04:55]:

That means now you need a team that understands how to do all these things, right? So you've got some cost there and some headcount. That's, like, not nothing, right? And that's likely where a lot of people are. And to a certain degree, it's where you want to be. Because if it really is as easy as just going to ChatGPT and getting the answer already, you don't have much of a moat between you and your competitors. Right. They're just going to cross it as easily as you did.

Andrew Skotzko [00:05:23]:

Yeah, you have no moat.

Chris Smith [00:05:24]:

So you actually kind of want to be in that space where at least some investment needs to be done in order to reap the reward. And that's also where you can take advantage of your proprietary data.

Andrew Skotzko [00:05:34]:

Explain a little bit about that: what does this middle case look like, where you're sort of leveraging the OpenAI model, for example, and then blending it with your proprietary data to create something that is uniquely your own? What's an example that would illustrate this for people? It would also be very helpful to be a little more explicit about the kinds of signals that, as an executive who's not fluent in this technology, like, if I'm a CFO, tell me whether it's working, it's not, or we have a long way to go. What should I look for?

Chris Smith [00:06:02]:

I want to start with that, actually, because that's a really good question and super relevant. So usually you have a principled scoring mechanism where you track, almost like an NPS score, how good your outcomes were: how many times did I get a successful outcome? How many times did I get a mediocre outcome? And how many times did I get a terrible outcome? Right? And then it just becomes a matter of scoring that and paying attention to it. Depending on the kind of problem you're working on, there are much more principled statistical measures that the data science team will no doubt want to use and should use. But that high-level "is the product working well or is it not?" is what makes sense at sort of a business level. At a more detailed level, there are things like precision and recall, which are often relevant, and a bunch of other measures like that you might want to come up with. They're good for tracking progress on the model itself, but they're not necessarily good for tracking progress on the business problem. Right.

Chris Smith [00:07:01]:

Which are not necessarily the same thing. Often, if you're doing well in one, it means you're doing better in the other, but it's not a direct relationship. Sometimes you can have an excellent model and it provides no business value. Right. It's entirely possible.

Andrew Skotzko [00:07:13]:

Totally, absolutely.
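To make the scoring idea above concrete, here is a minimal sketch of that kind of scorecard: a business-level tally of good, mediocre, and terrible outcomes alongside model-level precision and recall. All of the labels and numbers are invented purely for illustration.

```python
# Business-level scorecard: count good / mediocre / terrible outcomes,
# then a model-level view with precision and recall. Data is illustrative.
from collections import Counter

outcomes = ["good", "good", "mediocre", "terrible", "good", "mediocre"]
counts = Counter(outcomes)
total = len(outcomes)
for label in ("good", "mediocre", "terrible"):
    print(f"{label:>9}: {counts[label] / total:.0%}")

# Model-level view: precision and recall against labeled ground truth.
y_true = [1, 0, 1, 1, 0, 1]  # 1 = case the model should have flagged
y_pred = [1, 0, 0, 1, 1, 1]  # what the model actually flagged
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
print("precision:", tp / (tp + fp))
print("recall:   ", tp / (tp + fn))
```

The first block is the kind of number a business audience can track over time; the second is the kind the data science team will watch, and as noted above the two do not always move together.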

Chris Smith [00:07:15]:

Anyway, so that's there. And so then, to talk about an example case: let's say that you are trying to use a large language model to support your customer service team, right? They're getting all these customer service calls, and you want to help scale them up, prioritize, identify trouble cases, all that kind of stuff. Most of the conversations that customer service folks have with your company probably look like a lot of other conversations that happen everywhere else. So these large language models actually will come in with a pretty nice understanding, at least rudimentary, of what's going on. But there's nothing like being trained on a record of what actually happened in all of your customer service interactions. Right. It's going to give them detailed context and knowledge about both your customers' context and your business's context.

Chris Smith [00:08:12]:

Right. Like you can train it, potentially, on: oh, for product X, when you get this kind of request, the right solution is to do Y. Or: jeez, this keeps happening with product X, and every time we have to deal with it, it's a really hard problem. You have all this training data that you can feed into a model to help you get far better results than you would get without that proprietary information.
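As an illustration of the kind of proprietary training data being described, here is a small sketch that turns invented support-ticket records into the chat-style training examples a hosted fine-tuning flow typically expects; the ticket fields and resolutions are hypothetical placeholders, not a real schema.

```python
# Illustrative only: map proprietary support history (invented here) into
# training examples that pair an issue with the resolution you want the
# model to learn. Field names and resolutions are made-up placeholders.
tickets = [
    {"product": "Product X", "issue": "sync fails after the 2.3 update",
     "resolution": "Ask the customer to clear the local cache, then re-link the account."},
    {"product": "Product X", "issue": "billing shows a duplicate charge",
     "resolution": "Refund the duplicate and escalate to billing if it recurs within 30 days."},
]

training_examples = [
    {"messages": [
        {"role": "user", "content": f"{t['product']}: {t['issue']}"},
        {"role": "assistant", "content": t["resolution"]},
    ]}
    for t in tickets
]
print(training_examples[0])
```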

Andrew Skotzko [00:08:35]:

Okay, let me make sure I'm tracking with you here. Almost the best case scenario, as we're describing it, is actually the worst case scenario: you can just go to ChatGPT and it's already amazing at it right now, because then it's already done and there's no advantage for your business there.

Chris Smith [00:08:48]:

There is a disadvantage with not using it. So you do have to use it at that point, but it's like it's not getting you a win, it's just preventing you from losing.

Andrew Skotzko [00:08:55]:

Excellent point. Okay, so that's not great. But then our middle case, where there might actually be some additional new value, is where we have a problem domain that I would assume is decently well represented by the data that these OpenAI models have been trained on, i.e., the public Internet. We'll come back to the dollars and cents here in a second. But it sounded like earlier there might have been a third scenario you were talking about.

Chris Smith [00:09:22]:

There's the third scenario where, you know, you kind of realize that the model has learned the wrong things, and so then you're potentially going to want to build from scratch, or maybe you're going to need to do some really aggressive hyperparameter tuning. And this is a common scenario: that's what you have to do to get that fine-tuned model that's really performing at the level you need for your solution to be viable as a business. That's where you're starting to talk big costs, and so you're going to have to budget for that. And it's going to be one of these things where you don't know how well it's going to perform until you get to the other side of it. So you're definitely looking at a much more prohibitive cost up front. But on the upside, that means you're going to have something that nobody else has on the other side of it, and now you're going to have a real advantage.
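For a rough sense of what "aggressive hyperparameter tuning" can look like in practice, here is a minimal sketch of a grid search checked against a viability threshold. The train_and_evaluate function is a hypothetical stand-in for your own training and evaluation code, and the grid values and threshold are illustrative only.

```python
# Sketch of a hyperparameter sweep: try a grid of settings, keep the best
# validation score, and check it against the quality bar the business needs.
# train_and_evaluate is a hypothetical placeholder so this sketch runs.
from itertools import product
import random

def train_and_evaluate(learning_rate: float, epochs: int) -> float:
    # Placeholder: a real version would train a model and return a validation
    # score; here it returns a deterministic pseudo-random number instead.
    random.seed(hash((learning_rate, epochs)) % 10_000)
    return random.uniform(0.5, 0.9)

VIABILITY_THRESHOLD = 0.85  # the quality bar below which there is no product

best_score, best_config = 0.0, None
for lr, epochs in product([1e-5, 3e-5, 1e-4], [1, 3, 5]):
    score = train_and_evaluate(lr, epochs)
    if score > best_score:
        best_score, best_config = score, (lr, epochs)

print("best:", best_config, round(best_score, 3),
      "viable" if best_score >= VIABILITY_THRESHOLD else "below the bar")
```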

Andrew Skotzko [00:10:08]:

I guess what I would love is if there's a way you could walk the listener through, essentially, here's how I could do some back-of-the-napkin math to get a rough approximation of what this is going to take in a given situation. So obviously there's some upfront work to figure out which of those big buckets you just laid out I'm in, and then how do I do some back-of-the-napkin math? How does that sound? Do you think we could do that?

Chris Smith [00:10:30]:

We can try.

Andrew Skotzko [00:10:31]:

Okay, let's try it.

Chris Smith [00:10:32]:

So it's always challenging to map it down to people's particular domain. But I think in the first scenario, your cost is mostly just the cost of actually doing the exploration, of having someone evaluate whether the existing models can do the job well enough and whether there's business value just in that. So that initial exploration, it's almost like a feasibility study that you would do even if it wasn't AI. I don't think it's necessarily a special challenge, or a different challenge from what people are already used to. The more challenging ones are the ones that we talked about after that. So in the fine-tuning scenario, where you're having to make adjustments and you're trying to invest in it, first of all you've got to think about the fact that you're going to need to invest in a team that knows how to manipulate these tools. Now, the bar for that has gotten a lot lower because things have been productized so much. But now it's really more about just having someone who can run experiments properly and understands how to manipulate models.

Chris Smith [00:11:31]:

And so you're talking about a small team. You can even do it with a team of one, depending on how small the exploration is. A team of three or four people can get a lot done in terms of exploring the opportunity, at least to derisk it. You don't have a finished product at the end of that, but you have identified, can I get value or can I not, or how much more am I going to have to invest in order to get value? The biggest challenge, and why I say it depends on the nature of the problem, is how easy it is to collect together the data that you need in order to perform the experiment. Sure, if it's all neatly organized in a database already and you just need to do some queries and feed it in, it might not take more than three weeks to get that done. Usually, in my experience, that's not the state of things. The data is really kind of a mess.

Andrew Skotzko [00:12:24]:

All right, so it sounds like this is something that if I'm doing kind of like order of magnitude budgeting here, it's like, okay, I could have a small handful of folks do this assessment within a quarter.

Chris Smith [00:12:36]:

Yeah, exactly. And you can get some kind of result from it. And by the end of the quarter you might even be in a position to actually be working on putting the product out there; you might have already done your exploration before the quarter is over. But, yeah, I think that's a reasonable statement. And then there's that third scenario, which you can often get to after making those first two investments: we tried it with nothing, we tried it with just a little bit of our special data mixed into it to see what that does.

Chris Smith [00:13:08]:

And, okay, now in all cases you saw enough evidence to think that a suitable model was achievable, but you haven't achieved it yet. And, you know, you have that minimum bar: you have to get above here in terms of product quality before people can really realize business value from it. And so that's where, okay, now you're going to have to bring out your checkbook. It's going to be a bit of an effort, right? Yeah. And first of all, you're going to have to plan for iterations. Right. Everything I talk about now, you're going to have to plan for the fact that you might have to do it twice, you might have to do it three times.

Chris Smith [00:13:46]:

It's just the nature of the beast. And then you're going to need what I call a data engineering team, which is going to be people who are tasked with organizing that data and getting it ready for use in a data product. And often, if you're planning to actually put this out the door, they're also going to have to build a principled data pipeline so that as new data arrives, it continues to flow into this well-structured form that you can take advantage of. The compute cost, though, is considerable.

Andrew Skotzko [00:14:17]:

Yes.

Chris Smith [00:14:17]:

And basically it ratchets up the more of a problem you have, essentially. Right, like, the harder it is to get a model to perform the way you need it to, the more you're going to have to spend. And that's kind of obvious, I guess, even to an outside person who doesn't understand the space.

Andrew Skotzko [00:14:32]:

I think the key takeaway from what you just said there is that we have these three scenarios, right? We have the thing where essentially what you have to do is just incorporate it so you don't fall behind. We have the mid scenario where you benefit greatly from the stuff that's out there, and you can customize it and tailor it to your situation with your proprietary data. And hooray. And then we have the third case where, sorry, you have to roll your own from scratch, and it's going to be much harder and more expensive. In my mind, I'm seeing these as, like, almost order of magnitude jumps.

Chris Smith [00:15:02]:

Yeah, this is definitely an order of magnitude.

Andrew Skotzko [00:15:04]:

The first one costs me ten grand, the second one costs me a hundred grand, the last one costs me a million. And scale accordingly.

Chris Smith [00:15:10]:

Yeah, that's a very fair way to look at it. I think order of magnitude is probably the right scale, although with the caveat that in the last scenario, when the first two don't work so well, it can be very hard to predict what the cost is going to be to get to success. And you need to be prepared for that. You need to basically decide up front: when is the moment I'm going to decide to cut bait and just walk away from this because I'm not getting the result that I need? Because there's always an opportunity to try to build something better, either by incorporating more data, doing more exhaustive training, building a bigger model than the one you built in the last iteration, or doing a more exhaustive hyperparameter search. All of those things are possible investments, and you could try to keep going forever, with diminishing returns each time. But again, for products in the AI space, there's very much a threshold where, if you're below that threshold, you don't have a product, and if you get just above that threshold, suddenly you have an amazing product. And so it can be very hard to make that judgment call of when am I going to stop. But you have to decide that up front, I think, and you just sort of commit yourself to it and you go: this is what I'm going to have to see after I've spent this much money, and if I don't see this kind of result, I'm going to walk away.
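As a purely illustrative back-of-the-napkin sketch of that sizing logic, the numbers below echo the ten-grand, hundred-grand, million framing from the conversation and add a hypothetical expected value and a pre-committed stop-loss; none of these figures are benchmarks.

```python
# Back-of-the-napkin sizing: three scenarios roughly an order of magnitude
# apart, checked against the value the product is expected to create and a
# walk-away budget decided up front. All figures are illustrative placeholders.
scenarios = {
    "prompt an existing model": 10_000,
    "fine-tune with proprietary data": 100_000,
    "train / heavily customize your own": 1_000_000,
}
expected_annual_value = 400_000  # your own estimate of what the product is worth
stop_loss = 250_000              # decided up front: walk away past this spend without results

for name, cost in scenarios.items():
    verdict = "worth exploring" if cost < expected_annual_value and cost <= stop_loss else "needs a harder look"
    print(f"{name:<38} ~${cost:>9,}  -> {verdict}")
```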

Andrew Skotzko [00:16:35]:

Yeah. And that goes back to something we said in the main conversation, which was also about how you need to understand, before you go down this road, the business outcomes you're really hoping for and what product outcomes are on the way to those business outcomes. Because otherwise, especially if you end up in bucket three, the really hard case, it can be extremely hard to know when to stop and when it's just not worth it anymore.

Chris Smith [00:16:59]:

Yeah, absolutely.

Andrew Skotzko [00:17:00]:

All right, so just in wrapping this up, any other kind of, hey, I've just seen this come up as a gotcha in every budgeting conversation, everyone forgets about this, anything like that where you're just like, hey, just don't forget this?

Chris Smith [00:17:12]:

Yeah, actually, and I should have said it because it is an important part of it, particularly once you get into the second and third case, but even in the first case: there is a piece about measuring your outcomes. That is an investment you need to make, and people often forget how much investment they need to make there. And what they find out at the end is either, oops, there's an extra cost I didn't think about, or you don't spend the money and then you don't get really meaningful results. So this is about stuff like making sure you're properly collecting signals about whether you got a good result or not, that you're organizing that data in a useful way, and, if you're doing any kind of A/B testing, making sure that you're doing it in a principled fashion as opposed to some sort of ad hoc fashion. And you definitely want to work with your team to identify how much investment you need to make there right from the beginning, because that's the part that people always forget about.
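To show what "principled rather than ad hoc" A/B measurement might look like, here is a minimal sketch comparing good-outcome rates between a control flow and an AI-assisted flow with a two-proportion z-test; the counts are made up for illustration.

```python
# Compare good-outcome rates between control and AI-assisted flows with a
# two-proportion z-test. The counts below are invented for illustration.
from math import sqrt, erf

def two_proportion_z(success_a, n_a, success_b, n_b):
    p_a, p_b = success_a / n_a, success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # two-sided p-value from the normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return p_a, p_b, z, p_value

p_a, p_b, z, p = two_proportion_z(success_a=420, n_a=1000, success_b=465, n_b=1000)
print(f"control: {p_a:.1%}, AI-assisted: {p_b:.1%}, z={z:.2f}, p={p:.3f}")
```

The point is the discipline, not this particular test: decide the metric, the sample sizes, and the decision rule before you look at the results.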

Andrew Skotzko [00:18:07]:

Yeah. And the one thing I'll add on to that is, with anything like this, it would be a very good time to engage your product discovery and prototyping skills. Because particularly if you're in bucket three, you don't know how much it's going to cost you, but you know it's a lot, and you need to figure out if it's actually going to deliver any business value; that is what product discovery is all about. So please do that and remember that. And I think in the main conversation we talked about this: there's this idea that not every product is going to end up being an AI product or having AI in it, and that's fine, right? You can still benefit from AI in other ways, maybe in your internal operations, things like that. Just some things to think about to help people perhaps stay a little more even-keeled amidst the pressure they're probably feeling to AI-ify all the things. It does not always make sense. And that is okay.

Andrew Skotzko [00:18:56]:

Yeah.

Chris Smith [00:18:57]:

And that's a very important part of all of these projects I talked about: you're going to measure an outcome, and one of the possible outcomes is that this doesn't work, that this is not a good application of the technology. That's table stakes. You have to have an agreement that that's a possible outcome before you start exploring the space in the first place. If you're not prepared to believe that's at least possible, then you've got to have that conversation.

Andrew Skotzko [00:19:23]:

Yeah, it's a very fruitful leadership conversation to have, really getting clear on the outcome. All right, Chris, well, thank you so much for sharing that with us. I know that for so many of the questions I asked you, the answer really is "it depends." But just getting a rough mental model, I think, is actually really useful for people, because there's just so many unknowns here. So thank you very much.

Chris Smith [00:19:43]:

Thank you very much. And you framed it really well. I think it really is about just having the mental model and being aware of the fact that there are a lot of unknowns and it really does depend, but you can certainly use that mental model to help you walk through the unknown.