This post was originally published on LinkedIn on December 14, 2025.
Evals were everywhere at re:Invent, but the focus felt off. Rather than helping teams get to production, evals are becoming observability add-ons, and that doesn’t feel right. Here are a few takeaways from what I saw on the expo floor.
Elie Schoppik from Anthropic gave a great talk on the importance of eval-driven development, and his points about defensive evals (testing for what could go wrong) really resonated with me as something I see teams missing. Anthropic had a strong presence not only at their own booth but across everyone’s offerings. They have clearly jumped ahead as the enterprise model provider.
On the observability front, it’s always hard to beat Datadog, which had a second booth dedicated just to AI tracing (and some new socks in classic Datadog purple). That said, Braintrust really captured the conversation as the new player for evals. I didn’t find their booth, but they had de facto the most mindshare among attendees when I brought up evals. Personally, I think the Weights & Biases platform is the most comprehensive for MLE teams, pulling together everything you need from fine-tuning through to monitoring. Thanks to Nicolas Remerscheid for the demo.
While I fully understand why it makes commercial sense to focus on production monitoring, the biggest challenges I see teams dealing with are all the decisions and checkpoints needed to get to production, not to monitor it. That’s the idea behind vibe check, and it remains an opportunity I am excited about despite all these well-funded incumbents also tackling the problem from other angles.
Of course, my favorite part of re:Invent is getting to meet folks who are building something new. That is increasingly rare to find at re:Invent, but shout-out to turbopuffer for attending their first show, and it was nice meeting Antonio Bustamante from Bem. Both teams are unlocking unstructured data at scale in novel ways that maximize infrastructure and model capabilities. Neoclouds like Baseten, Lambda, CoreWeave, and Groq also have an interesting choice to make about re:Invent and AWS, but I’ll leave the trillion-dollar speculation to the pundits (what could go wrong!?).
Many thanks to James Bayer for his generosity, and great seeing Graham Siener, Austin Parker, Roko Kruze, and many others.
