Hippos, skunks and launch parties: How culture drives successful experimentation outcomes

Wed Mar 06 2024

Sid Kumar

Product Marketing, Statsig

There's finally a definitive answer on where the legendary term "HiPPO" originated.

Last month, I hosted Dylan Lewis, Experimentation Leader at Atlassian, for a virtual fireside chat on building a culture of experimentation. Dylan brings over two decades of experience in the domain and had plenty of great anecdotes to share.

A trip down memory lane

Back in 2005, when Dylan was working at Intuit-TurboTax as the first Data Analyst on their web team, they had a learning window during tax season, from January through April.

This period essentially provided them one quarter to try out ideas, learn as much as they could, and help customers.

Leadership proposed ideas each Monday morning. The team would then build and launch those experiments by Friday and review the early results the following Monday. 

The outcome at the end of the tax season was revealing:

Out of the 40 experiments they ran, 38 didn’t win. Side note: The two winning experiments came from marketers. ;)

The Highest Paid Opinion (HIPO) was not always correct. 

The customers—the ones actually using the product and experiencing the treatment variants—helped them understand what would ultimately succeed.

Dylan shared, “The term HIPO was modified to 'HIPPO'. Avinash Kaushik presented it at an Emetrics conference, and Ronny Kohavi published this.” It has since become commonplace in the vocabulary of experimentation. Dylan noted that symbols like the hippo added a lot of fun and excitement.

“We loved it, and as teams began experimenting, we sent a (stuffed) hippo to the team with a winning experiment for that week. It moved from one place to another depending on which team was winning, and they got to decorate it. By the end of tax season, the hippo would be covered in souvenirs from the teams.” 

It didn't stop with the hippos; they also introduced skunks, awarded for experiments that didn't win. Engineers would write the experiment ID on a skunk and hand it to whoever ran the losing experiment. By the end of the tax season, engineers would have collected plenty of skunks, proudly displayed on their tables in intricate dioramas!

Culture eats strategy for breakfast 

Now at Atlassian, Dylan is working to scale a mature experimentation program. Modern-day experimentation platforms have become more robust in terms of metric trustworthiness and statistical capabilities, enabling greater experimentation velocity.

Yet Dylan noted that culture remains the biggest challenge for most organizations. To show how culture can make a difference, he pointed to one of the key metrics on his dashboard: the percentage of failed or restarted experiments, a figure that should ideally be low.

One of their experimentation teams was experiencing a 40% restart rate. To address this, they organized a launch party, during which the experiment was made available to everyone in the room so they could verify that the experience worked as expected.

One of the critical factors for success here was including someone who wasn’t part of the experiment to ensure an unbiased perspective. 

The results were striking: the team's restart rate dropped from 40% to 5%.

Our conversation was filled with valuable takeaways for operationalizing a culture of experimentation, including identifying roadblocks, conducting reviews, prioritizing, and ensuring trustworthiness.

This fireside chat is one you won’t want to miss! Watch below. 👇
