Warehouse Native Year in Review

Tue Jun 25 2024

Craig Sexauer

Data Scientist, Statsig

A year ago, Statsig announced Warehouse Native, bringing our experimentation platform into customers’ data warehouses.

Since then, Warehouse Native has grown into a core product for us; we treat it with the same priority as Statsig Cloud, and have developed the two products to share core infrastructure, methodology, and design philosophies. In fact, Warehouse Native integrates directly with Statsig’s battle-tested experimentation SDK and real-time logging infrastructure - resulting in customers getting near-real-time experiment results in their own warehouse.

The team supporting Warehouse Native has grown as well, with a dedicated vertical pod to support customers, build features, and continuously optimize the jobs we run on customer warehouses. I wanted to take a moment to talk about how we got here, what we learned, and what lies ahead.

Why Warehouse Native?

One of the first features I worked on at Statsig was our data imports. These reverse ETLs allow customers on cloud to sync metrics, events, or rich metadata from their warehouses that they couldn’t log directly to Statsig. This was a solid solution for a core problem a lot of teams had, which is that they were having to recreate their “source of truth” metrics like revenue and user accounting from scratch in Statsig, and even a 1% jitter in topline values could be cause for concern.

When we talked to some of these teams, the common narrative was that they already had existing data stacks for metrics, experimentation, or both; the unlock they needed was the management of experiment analysis and the required statistics and health checks to do so at scale. The other common story was that they were in banking, healthcare, or another field where customer data is highly sensitive and often regulated.

We were noticing that as we grew and started attracting more mature customers, these source-of-truth and security questions came up more in evaluation; it became very clear that not having a solution for this would, in the long-term, hinder our ability to serve a large class of potential customers.

Statsig has a culture of escalating quickly and openly - we started to have internal debates at every level about what we should do. The crux of the debate quickly became clear - some team leads were concerned about the cost of maintaining two systems, and about unknowns in technical complexity on the data side. A few days later, we came back with a plan for how to:

  • Integrate the foreign warehouse portion of a warehouse native product with our existing statistics and console product. This would minimize duplicated work and the complexity of supporting multiple product lines. We also made it clear which portions of our stack would diverge and have to be built again from scratch.

  • Prove out the core concept quickly with a small, senior team to make sure we understood if, and how, the product would work - before investing more heavily in the full product experience.

With this, the path forward was clear - we carved out the bandwidth for a few of us to focus on an initial build-out and we started sprinting.

What Worked

The (very serious-looking) Warehouse Native team at an offsite

We were able to spin up our MVP product in a very short period of time, and got it to market shortly after. When we looked back on our process, there were a few key factors that helped us:

  • Willingness - and internal/external alignment - to ship things that were functional-but-ugly

  • Letting the development team play to their strengths

  • Treating early customers like team members

Done Is Better than Perfect

When we launched Warehouse Native, our Statistics engine was strong, and we were confident in the results. However, secondary features (non-critical metric types, infrastructure for managing warehouse storage, and validation to prevent user inputs from breaking SQL jobs) were missing.

We were transparent about this with early customers. We made the bet that if we instantly addressed issues as they came up (and I mean instantly - this doesn’t scale) in the short term, we’d get feedback on what mattered to people and have valuable insight on what to prioritize - without burning their trust. So far, this seems to have worked well - as our customer base has scaled we’ve tightened up the ship, but we were able to move at lightning speed to build the first two dozen or so meaty features on Warehouse Native.

This isn’t a one-size-fits-all solution. One of the competitive advantages we think Statsig has is “clock speed” - we just build faster than other teams building the same thing. Staying ahead of customer requests and keeping customers satisfied meant an unsustainable sprint; every morning was triage, communication, a whiteboard, and then heads-down building. We had faith that we could build far enough out in front of customer demand to get to where we are today - working hard, but in a stable way.

The flip side of this coin is the risk of building things that don’t make sense. Us ‘just building things without thinking about them’ has actually come up in sales cycles as competitor-seeded “FUD” (Fear, Uncertainty, and Doubt). But now that we’ve built out the competitive features we were missing, we’re watching competitors catch up to features we shipped ahead of them, which is our signal that we’re not off the mark on what we’re prioritizing.

Playing to Strengths

Statsig often frames people in terms of engineering archetypes; our team leads were:

  • An Engineering “Code Machine” - someone who could crank out quality code twice as fast as other engineers

  • A PM “Tech Lead” - someone who leads efforts across the company

  • A DS “Product Hybrid” - someone who bridges product and engineering to solve complex business problems

Early in the process, we spent a few hours with a whiteboard and figured out all of the pieces that needed to exist for an MVP to work. This meant identifying potential blockers on the compute side, de-risking scale as customers started running hundreds of experiments and large compute jobs, and figuring out how to marry any new infrastructure into our existing console in a maintainable way.

We didn’t figure out exact solutions or perfectly formulate long-term visions, beyond absolute hard requirements. Honestly, that wasn’t in any of our wheelhouses. We had effective expertise around what requirements existed, how our current system worked (and how we could plug into it), and what customers were looking for.

The way it panned out was that we all got to cook:

  • Our engineering team was able to build a new framework for a warehouse metric semantic layer, orchestration for warehouse jobs, and connecting warehouse results to the console from scratch in an incredibly short period of time. This unblocked the product team - our DS developers and PMs could start using and developing on top of it.

  • The data team focused on identifying the “end user” problems we needed to solve with the platform, and turning the new metric definitions into scalable SQL jobs that we could plug into the orchestrator. As the core product solidified, we started spending more time on problem “discovery” - meeting with data scientists at other companies and learning what we were missing.

  • Our PM team quickly learned what made Warehouse Native different, and worked hard to unblock us by getting customer feedback — and being the first “customer” to poke holes in the MVP version of Warehouse Native.

It felt like our clock speed was through the roof; we were integrating multiple large features a week, and on top of this base we continued to keep up a stream of meaningful features and improvements to the platform like percentile metrics, stratified sampling, meta-analysis, semantic layer syncs, and more.

This was super productive - and also super fun for everyone involved.

Welcome to Statsig!

I’ve written before about how important it is to us to have a partnership mentality when working with customers. In our early days with Warehouse Native, we bet big with this; we didn’t have all of the fluff features and quality-of-life improvements ready on day one, and we knew that — so we let prospective customers know that as well.

This worked, because the next thing we did was invite them to be partners, and then did our best to deliver. This looked like:

  • Shipping features behind gates to customers in a matter of hours or days after they were requested. We’d message them in Slack to let them know it was working, but in beta, and that their feedback was exactly what we needed to sharpen it.

  • Having frank discussion about our internal priorities, and integrating feature requests into our roadmap based on asking for their priorities. In some cases this ended up being a collaborative set of Google Docs; it felt exactly like we were two teams at the same company trying to solve a problem on our joint roadmap.

  • Co-developing; the Warehouse Native development team flew to Berlin, London, San Francisco, New York, and a few more places to sit down, whiteboard, and solution with customers. This was invaluable - a day or two of jet-lagged, coffee-fueled, in-person context dumps was the equivalent of weeks or months of asynchronous communication.

This strategy had risks - but because of the clock speed and the scrappy v-team we had, we were able to avoid overcommitting and delivered on promises in rapid order.

Challenges

No plan survives first contact with the enemy - we definitely had a fair share of challenges (expected and unexpected).

People Scale Linearly

As customers started evaluating Statsig, the initial flow was very manageable - we had a rotating cast of folks trying out the platform, having questions, and then figuring things out and happily humming along in their experimentation program. Because of this, we didn’t allot much time for thinking about how to scale support and communication.

A few months after launch, we got hit with a tidal wave of customers - for a slew of reasons, interested companies had decided to bite. I remember a day when I worked through problems with well over a dozen different customers who were in various phases of their proof of concept (PoC).

A few of us on the data team like to joke - when we know we’re doing something that might bite us later - that “hey, if it happens, it’s a good problem to have”. It definitely didn’t feel like it at the time; until we got our ducks in a row, development on the data side ground to a halt for a few weeks as we dealt with the deluge of questions.

What helped?

  • Documentation makes people super-linear. Our docs were okay, but not comprehensive; at some point I dropped everything for a day (onto Tim - sorry!) and completely rewrote our docs. Since then, any time a question has come up more than once, I’ve put it on my bug list and asked people to prioritize it accordingly. You can check out our docs here.

  • Hiring and onboarding internally; this is slow, but once we got a critical mass of folks who could answer questions, the constant context-switching cost on the core team fell off exponentially and we were able to get moving again, out of the danger zone where we couldn't find time to focus on work.

[Figure: Craig’s theory of context switching]

Looking back, I wish we’d invested in the docs/product clarity tax much earlier. In hindsight, adding a large volume of new data-oriented practitioners (Warehouse Native skews to Data Science users) to our customer base was a bandwidth sink that we hadn’t thought about when talking about hiring.

Third Party Dependencies Always Break

This is kind of cheating, because it’s also a scaling issue. Warehouse queries fail transiently, people drop tables or views by accident, and because we didn’t start with all of the quality of life features, it took us a while to build in resilience.

At the same time - back to clock speed and customer focus - we made the decision to have a side chat where every failed experiment scorecard load would send us a message. This was amazing - and we still do it - because we got ahead of frustrating situations if there was a real bug, and were able to proactively reach out to customers about solutions on our end, or theirs, to transient problems.

As time went on, our error rate went down as we built in more validations, retries, and a layer to parse what was a real issue with someone’s data setup vs. a transient or network error; however, the usage of our platform scaled multiple times faster. We had a customer drop a core view, and we got over 50 messages as their daily experiment loads had partial failures when metrics depending on that view couldn’t load.
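As a rough illustration of the idea (not Statsig’s actual implementation), a parsing layer like this can be sketched as a classifier over warehouse error messages plus a retry wrapper that only retries failures it believes are transient. The hint strings, the `FailureKind` labels, and the `run_with_retries` helper are all assumptions made up for this example; real warehouses each have their own error codes.

```python
import time
from enum import Enum


class FailureKind(Enum):
    TRANSIENT = "transient"      # timeouts, dropped connections, throttling
    DATA_SETUP = "data_setup"    # e.g. a customer dropped a table or view
    UNKNOWN = "unknown"          # possibly a real bug; needs a human


# Hypothetical message fragments, chosen only for illustration.
TRANSIENT_HINTS = ("timeout", "timed out", "connection reset", "rate limit")
DATA_SETUP_HINTS = ("does not exist", "not found", "permission denied")


def classify(message: str) -> FailureKind:
    """Guess what kind of failure an error message describes."""
    text = message.lower()
    if any(hint in text for hint in TRANSIENT_HINTS):
        return FailureKind.TRANSIENT
    if any(hint in text for hint in DATA_SETUP_HINTS):
        return FailureKind.DATA_SETUP
    return FailureKind.UNKNOWN


def run_with_retries(run_query, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call run_query(); retry with exponential backoff only on transient errors."""
    for attempt in range(1, max_attempts + 1):
        try:
            return run_query()
        except Exception as err:
            kind = classify(str(err))
            # Data-setup problems and unknown errors will not fix themselves,
            # so surface them immediately instead of retrying.
            if kind is not FailureKind.TRANSIENT or attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

The design choice here is that a dropped view fails fast and gets reported once, while a timed-out query gets a few more chances before anyone is bothered.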

What helped?

  • Extending our parsing layer saved us here; we now receive a daily summary of “transient” errors and will reach out if something stays broken, but we keep real-time alerting for something breaking in our infrastructure or if a bug gets pushed.

  • Buffing up the tools on our internal dashboards allowed us to see the digest of failures clearly, dig into the queries, and easily filter out likely low-pri errors while giving visibility into the health of the platform.
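The split between real-time alerts and a daily digest can be sketched roughly as follows. This is a minimal example under stated assumptions, not a description of our internal tooling: the `Failure` record, its `kind` labels, and the `route` and `format_digest` helpers are hypothetical names invented for illustration. The point is that failures sharing a root cause collapse into one digest line, while anything that looks like a bug still pages immediately.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Failure:
    customer: str
    experiment: str
    kind: str    # "transient", "data_setup", or "bug" (hypothetical labels)
    cause: str   # normalized root cause, e.g. "missing view core_metrics"


def route(failures):
    """Split failures into immediate alerts and a grouped daily digest."""
    # Anything that looks like a bug on our side is alerted on right away.
    alerts = [f for f in failures if f.kind == "bug"]
    # Everything else is counted per (customer, root cause) for the digest.
    digest = Counter(
        (f.customer, f.cause) for f in failures if f.kind != "bug"
    )
    return alerts, digest


def format_digest(digest):
    """One line per (customer, cause), most frequent first."""
    return [
        f"{customer}: {cause} ({count} failed loads)"
        for (customer, cause), count in digest.most_common()
    ]
```

With this grouping, 50 scorecard loads that fail because of the same dropped view produce a single digest entry rather than 50 separate messages.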

We knew this would happen - it was part of the maintenance risk we discussed when we started work on the platform. The main learning was how high the cost of getting pinged and having to dig into the alert was, and how well that could be solved with better tooling.

[Figure: Warehouse Native failure rate]

Where we’re at, and what’s next:

Statsig has two “flavors” now - “Cloud” and “Warehouse Native”. Both run the same statistics and core analysis, in different places - and can solve for different experimentation needs.

Customers are running thousands of experiment analyses in Statsig Warehouse Native - ranging from 1,000-subject product tests to 1B+ subject session-level tests. There are new experimenters measuring their feature releases for the first time, and established (and well-known) experimenters who have moved their platform to Statsig Warehouse Native so they can spend their time implementing more advanced measurement approaches.

Some large customers on cloud who were almost exclusively using our metric imports have switched to calculating in their warehouse to reduce latency and additionally enrich the data they’re using.

All of this means that we’re seeing customers onboarding quickly, and we’re excited to do the work to keep them coming.

[Figure: Statsig Warehouse Native usage]

As Marcos, our head of engineering, likes to say - we’re all in. We see a few key steps in the journey to come:

  • Continued focus on being #1 in experimentation. Continued development on tools like Stratified Sampling, Meta-Analysis, Switchback and Market Based Tests, and CUPED implementations is all aimed at keeping us in a spot where we think Statsig is a no-brainer choice for companies who are considering build vs. buy for experimentation.

  • Deeper integrations with the customer data stack. We already offer a semantic layer sync through our console API, but we want to continue building out more richness - and third-party integrations - so Statsig can easily integrate with most customer warehouses.

  • More tools in the warehouse; for example, we have a beta version of Metrics Explorer for warehouse native customers and are excited to continue developing this - sharing a metric definition language between quick metric explorations and advanced statistical calculations helps bridge the gap between exploratory work and rigorous measurement.
