Nov 16, 2024
Coval Follow Up
Q: What are the most critical "edge cases" that AI agents should be tested against before they are widely adopted, especially in customer-facing industries?
A: The way I see it, the "edge cases" that AI agents should be tested against can be categorized into three main vectors – security, emotional intelligence (EQ), and human integration. In my article on My Thesis Behind Coval, I highlight an example of using autonomous agents to purchase Knicks tickets. Personally, I would feel naturally wary about providing an autonomous agent with sensitive information, such as my credit details.
There are two key dimensions to security – inadvertent and deliberate data exposure. The scenario I highlight above relates to inadvertent data exposure – for instance, the agent I am using may inadvertently share my credit card information due to inadequate data handling protocols. However, I believe we are rapidly moving towards a future where agents will become the primary interface for human computer interaction. Along with this shift will come a host of challenges, many of which are familiar from the realm of traditional computing – such as hacking, adversarial attacks, and phishing – that agents must be equipped to handle effectively.
Along with the key security risk, strong EQ and human integration capabilities are equally as important. For AI agents in customer-facing industries, managing the full spectrum of human emotions is crucial – from livid anger to nonchalant sarcasm. Regardless of the situation, the principle that "the customer is always right" remains, and agents must respond with empathy, just as a human counterpart would. Failure to do so would effectively eliminate the enterprise use case for agents.
Lastly, understanding the "edge cases" as agents integrate with humans is critical. Having an AI agent should feel like having a personal analyst or assistant, whether in a consumer or enterprise context. You'd want to provide targeted guidance, when necessary, without being overwhelmed by excessive questions. At the same time, you wouldn't want an agent making critical decisions without your approval. Understanding the "edge cases" of when to act and when to seek human guidance is necessary for AI agent adoption in customer-facing industries.
Q: How do you think Coval can stay effective if real-world complexity is inherently difficult to replicate?
A: I believe Coval's potential for success lies precisely in the fact that real-world complexity is inherently difficult to replicate. From a theoretical physics perspective, the world we live in today is the result of countless decisions that have shaped this specific outcome. There could be parallel universes where an infinitesimally different set of decisions led to slightly different outcomes. In my main article, I explore how concepts like non-determinism and the butterfly effect stem from these ideas and impact AI agent behavior.
Ultimately, the complexity of the "real world" arises from the immense computation of possible decisions and their ripple effects on outcomes. Because there are infinite possible decisions and corresponding outcomes, the only way to test an agent accurately and effectively is by running a massive number of simulations that evaluate every potential decision it might face. This is the same approach Brooke utilized at Waymo for autonomous vehicles and why I believe she will be able to replicate her success for AI agents.
Thanks for making it all the way to the end! If you have any thoughts, questions, or feedback, I'd love to hear them – your input is always valuable.