Product Design with Generative AI — The HiClass Framework
How to Define Products in a World where 2+2 Can Be Anything between 0 and 100
As Generative AI goes through repeated micro-cycles of hype and disillusionment, Product Managers at all kinds of companies across the globe, Main Street enterprises and tech startups alike, have been asked to build great LLM-based products as efficiently as possible. Having done this for a while now, I have found traditional product design frameworks helpful, but not fully transferable to this new context.
Identify the Design Factors
The HiClass framework, named for the initials of its seven design factors, identifies the factors below (some of them new) as important to developing products with LLMs, on top of the factors you consider in any other product design. Each of them may mean something different in your context.
- Hallucination — LLMs will make things up. What is your strategy to minimize hallucinations or the damage they can cause?
- Inaccuracy — No AI system is perfect. How do you handle inaccurate answers? How do you know if an answer is inaccurate?
- Cost — What is the cost per transaction of your product? Is the product viable, or is there at least a path to viability? Equally important, what is the cost (and time) to get the product up and running?
- Latency — Especially if you are using a chain of prompts or a more advanced prompting scheme, the LLM may take seconds to generate output. What is your user doing in the meantime?
- Attributability — If you need to, can you explain the source and/or reasoning the LLM used to take the action it took? Can you produce the evidence to assure your user that your product is not hallucinating?
- Security — What are the possible points of leakage of your and your users’ data? What are the various risk scenarios?
- Style — For generative applications, how are you making sure that the generated content is in sync with the tone of your brand, and is generally positive about your brand (believe it or not, LLMs may end up advocating for your competitors if left alone)?
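For the Cost and Latency questions, a back-of-the-envelope estimate per transaction goes a long way. The Python sketch below assumes each step of a prompt chain runs sequentially and is billed at a flat per-token price; the step names, token counts, timings and prices are hypothetical placeholders, not real figures for any provider.

```python
from dataclasses import dataclass


@dataclass
class PromptStep:
    """One LLM call in a prompt chain. All numbers are hypothetical estimates."""
    name: str
    input_tokens: int
    output_tokens: int
    seconds: float  # estimated wall-clock time for this call


def estimate_transaction(steps, usd_per_1k_input, usd_per_1k_output):
    """Return (cost in USD, latency in seconds) for one user transaction.

    Assumes a flat per-token price for every step and that steps run one
    after another, so latency is the sum of the step times.
    """
    cost = sum(
        s.input_tokens / 1000 * usd_per_1k_input
        + s.output_tokens / 1000 * usd_per_1k_output
        for s in steps
    )
    latency = sum(s.seconds for s in steps)
    return cost, latency


# Hypothetical three-step chain and hypothetical prices.
chain = [
    PromptStep("classify intent", input_tokens=800, output_tokens=50, seconds=1.0),
    PromptStep("draft answer", input_tokens=3000, output_tokens=600, seconds=6.0),
    PromptStep("style check", input_tokens=1200, output_tokens=300, seconds=3.0),
]
cost, latency = estimate_transaction(chain, usd_per_1k_input=0.01, usd_per_1k_output=0.03)
print(f"${cost:.4f} per transaction, about {latency:.0f} s of latency")
```

Multiplying the cost by expected transaction volume gives a first read on viability, and the per-step breakdown shows which call is worth shortening, caching or moving to a cheaper model.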
Note 1: It’s important to differentiate between hallucination and inaccuracy. If a system hallucinates, i.e. makes up things that are not true, then it is definitely inaccurate; however, there can be many other sources of inaccuracy in an LLM system. As a PM, you actually do want a small amount of hallucination, because the dial that controls it is the same one that controls creativity, and creativity is what gives your bots personality.
Note 2: This list excludes access to talent as a design factor. As of February 2024 it is probably the most important factor of all, but hopefully that is only a temporary situation.
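One commonly used setting that sits on the creativity dial from Note 1 is sampling temperature. The minimal Python sketch below, using made-up scores for three candidate tokens, shows how temperature reshapes a model's next-token probabilities: low values concentrate probability on the most likely token, high values spread it toward less likely ones. It illustrates the trade-off only; lowering temperature does not by itself eliminate hallucination.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw next-token scores into probabilities, scaled by temperature.

    Lower temperature sharpens the distribution toward the top token;
    higher temperature flattens it, giving unlikely tokens more chance.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.0, 0.1]
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```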
Prioritize
Not all factors will be important to your user story. For example, if you are building an app that generates automated marketing emails, latency, attributability and security may matter less than style. If the product also keeps a human in the loop to edit the AI output, then accuracy and hallucination are not mission-critical either. Here is an indicative table to start thinking about this.
The more factors survive this stage, the harder you will have to work as a Product Manager before you can launch something. It’s a good idea to rank every surviving factor, set an ideal goal for each, and draw red lines below which the product will not be acceptable.
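The ranking, goals and red lines can be kept as a simple scorecard. A minimal Python sketch, assuming each surviving factor can be reduced to one measured number; the metric definitions (for example, a style score as the share of drafts an editor accepts without tone changes) and all thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FactorTarget:
    """A surviving HiClass factor with its rank, ideal goal and red line."""
    factor: str
    rank: int            # 1 = most important
    goal: float          # ideal value
    red_line: float      # worst acceptable value
    higher_is_better: bool

    def status(self, measured: float) -> str:
        if self.higher_is_better:
            ok, ideal = measured >= self.red_line, measured >= self.goal
        else:
            ok, ideal = measured <= self.red_line, measured <= self.goal
        if not ok:
            return "BLOCKS LAUNCH"
        return "meets goal" if ideal else "acceptable"


def review(targets, measurements):
    """Report factors in rank order; launchable only if no red line is crossed."""
    report = [(t.factor, t.status(measurements[t.factor]))
              for t in sorted(targets, key=lambda t: t.rank)]
    launchable = all(status != "BLOCKS LAUNCH" for _, status in report)
    return report, launchable


# Hypothetical targets for the marketing-email example: style first, then cost.
targets = [
    FactorTarget("style", rank=1, goal=0.90, red_line=0.75, higher_is_better=True),
    FactorTarget("cost", rank=2, goal=0.01, red_line=0.05, higher_is_better=False),
]
report, launchable = review(targets, {"style": 0.80, "cost": 0.06})
for factor, status in report:
    print(factor, status)
print("launchable:", launchable)
```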
Innovate and Trade
Now is when brilliant PMs start to shine. I have written before that creativity is generally overrated and misapplied, with a couple of exceptions, and this scenario is a big one of them. I always ask myself two questions:
One — can I innovate on the product to remove one of the constraints completely? For example, if latency is a big concern because of a chain of prompts, can we design the system not as a bot, where users expect instant-messaging responsiveness, but as a multi-page questionnaire? You can still have dynamic questions leading up to the outcome, but each page transition buys you breathing room in terms of user expectations. Perhaps you add a progress bar and a GIF showing how hard the system is working in the back end. Or you can keep the user moving through an anticipated workflow and correct course if and when an asynchronous signal tells you to. Either way, latency is no longer a huge concern, and you can trade some of it for lower cost.
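The anticipated-workflow idea can be sketched with Python's asyncio: start the slow prompt chain in the background, keep the user moving while showing progress, then correct course only if the result disagrees with the guess. `slow_prompt_chain`, the "tracks" and the budget rule are hypothetical stand-ins; in a real product the background task would wrap your actual LLM calls.

```python
import asyncio


async def slow_prompt_chain(answers):
    """Hypothetical stand-in for a multi-step LLM chain; the sleep simulates its latency."""
    await asyncio.sleep(0.3)
    return "premium" if answers.get("budget") == "high" else "basic"


async def questionnaire(answers, anticipated="basic"):
    """Move the user along an anticipated path while the chain runs in the background."""
    task = asyncio.create_task(slow_prompt_chain(answers))
    shown = [f"page 2 ({anticipated} track)"]  # user keeps going instead of waiting
    while not task.done():
        shown.append("progress...")  # e.g. tick a progress bar
        await asyncio.sleep(0.1)
    result = task.result()
    if result != anticipated:  # the async signal says the guess was wrong
        shown.append(f"switch to {result} track")
    return shown, result


shown, result = asyncio.run(questionnaire({"budget": "high"}))
print(result, shown)
```

The user never stares at a blank screen: the first page appears immediately, and the cost of a wrong guess is a single correction step rather than the full wait.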
Two — what are the trade-offs between the various dimensions of HiClass, and where is a good equilibrium? This will be iterative. Given the complexity, it’s also important to do a lot of research upfront, because the iterations may be expensive. Here is a chart to remind you how the factors of HiClass affect one another (if you improve a row, the cells indicate what happens to the columns).
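Once you have rough estimates for a few candidate designs, the search for an equilibrium can be made explicit. A minimal Python sketch, assuming every metric is lower-is-better and that the weights reflect your ranking of factors; the designs, numbers, red lines and weights below are hypothetical, not measurements.

```python
def pick_equilibrium(candidates, red_lines, weights):
    """Pick the candidate design with the lowest weighted share of its red lines.

    All metrics are lower-is-better (e.g. seconds, USD, error rate). Each
    metric is divided by its red line, so 1.0 means that budget is used up.
    Designs that exceed any red line are discarded first.
    """
    acceptable = {
        name: m for name, m in candidates.items()
        if all(m[f] <= red_lines[f] for f in red_lines)
    }
    if not acceptable:
        return None  # no equilibrium yet: go back and innovate on the design

    def score(m):
        return sum(weights[f] * m[f] / red_lines[f] for f in red_lines)

    return min(acceptable, key=lambda name: score(acceptable[name]))


# Hypothetical estimates for three designs of the same feature.
candidates = {
    "single prompt, large model": {"latency_s": 4.0, "cost_usd": 0.040, "error_rate": 0.05},
    "chain of prompts, small model": {"latency_s": 9.0, "cost_usd": 0.010, "error_rate": 0.04},
    "chain of prompts, large model": {"latency_s": 15.0, "cost_usd": 0.090, "error_rate": 0.02},
}
red_lines = {"latency_s": 12.0, "cost_usd": 0.05, "error_rate": 0.06}
weights = {"latency_s": 1.0, "cost_usd": 2.0, "error_rate": 3.0}
print(pick_equilibrium(candidates, red_lines, weights))
```

With these made-up numbers the slower but cheaper chain wins, which is the "trade latency for cost" move in concrete form. Rerunning the function as estimates improve mirrors the iterative nature of the search.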
Hope this is helpful. As a parting comment — please remember that the architecture of the rest of the system is as important for an AI product as the AI itself. Please don’t ignore it!