Today we're announcing the Statsig Elixir Core SDK, in early Beta. Elixir Core is built around our performance-focused, Rust-based Statsig Server Core library, which we're rolling out across multiple languages and frameworks. Because Server Core underpins multiple SDKs, it receives feature updates and performance optimizations rapidly. Elixir Core is written with Rustler, and some of its operations are dirty-scheduled, so it may not work in all Elixir environments. Since this SDK is in early Beta, we'd be happy to hear your feedback in our Slack channel. Get started with Elixir Core in our docs!
Having a history of attempted experiments lets teams build institutional memory. When you create a new experiment with a hypothesis, Statsig now shows you existing, related experiments in case you want to explore what has already been done. You can always go to the knowledge base to explicitly look for this context too - but it's delightful to have it surfaced in-context without being intrusive.
If an implementation issue means you've over-exposed users to an experiment, you can retroactively apply a filter to analyze only the people who were truly exposed. This was previously available on Analyze Only experiments on Warehouse Native (to work around over-exposure from 3rd-party assignment tools). It is now available for Assign and Analyze experiments (where you're using the Statsig SDKs for assignment).
To do this, use the Filter Exposures by Qualifying Event option on the experiment's setup page (under advanced settings).
This filtering is also available in Custom Queries (under the Explore tab).
Example use case: There is an experiment on search suggestions, visible only when people click into the search box. Users are currently exposed when the search bar renders, but this causes dilution, since all users see the search bar. In this case, we'd want to filter exposures down to users who actually clicked into the search box - so we'd point to a Qualifying Event (defined in SQL) that captures that click.
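For this to work, the click has to exist somewhere as an event you can query. If you log it through Statsig (rather than your own event pipeline), a minimal sketch with the Statsig JS client might look like the following - the event name `search_box_click`, the element selector, and the initialization details are illustrative placeholders, not prescriptive:

```ts
import { StatsigClient } from '@statsig/js-client';

// Illustrative only: the client key, user object, and event name are placeholders.
const client = new StatsigClient('client-YOUR_KEY', { userID: 'user-123' });
await client.initializeAsync();

// Log the qualifying event only when the user actually clicks into the search box.
document.querySelector('#search-box')?.addEventListener('focus', () => {
  client.logEvent('search_box_click');
});
```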
The Experiment Quality Score is a metric designed to give you a sense of the quality of an experiment configured within Statsig. It helps experimenters quickly identify potential issues in experiment setup, execution, and data collection, ensuring more confident decision-making. Tracking this across a number of experiments can help you measure improvements in experimentation maturity over time.
Learn more about enabling and configuring it here. This is rolling out now.
Server Core is a full rewrite of our Server SDKs with a shared, performance-focused Rust library at the core - and bindings for each language you'd like to deploy it in. Today, we're launching Node Server Core (Node Core).
Node Core leverages the natural speed of a core written in Rust - and benefits from all of our latest optimizations in a single place. Our initial benchmarking suggests that Node Server Core can evaluate 5-10x faster than our native Node SDK. Beyond that, Node Core supports new features like Contextual Multi-Armed Bandits, and advanced bootstrapping functionality, like bootstrapping Parameter Stores to your clients. Using Node Core with our Forward Proxy has even more benefits: changes can be streamed, reducing CPU usage to roughly 1/10th.
Node Server Core is in open beta beginning today; see our docs to get started. In the coming months, we'll ship Server Core in Ruby, PHP, and more - if you're looking forward to a new language, let us know in Slack.
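To give a sense of the shape of Node Core, here's a minimal getting-started sketch. It assumes the package name and method signatures mirror our existing Node SDK conventions - treat the details as assumptions and check the docs for the exact, current interface:

```ts
import { Statsig, StatsigUser } from '@statsig/statsig-node-core';

// Illustrative only - confirm the package name and signatures in the docs.
const statsig = new Statsig('secret-YOUR_SERVER_KEY');
await statsig.initialize();

const user = new StatsigUser({ userID: 'user-123' });

// Evaluations run against the shared Rust core through native bindings.
const passes = statsig.checkGate(user, 'my_gate');
const experiment = statsig.getExperiment(user, 'my_experiment');

await statsig.shutdown();
```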
One caveat of most experimentation implementations is the latency required to get experiment values for each user. Various approaches attempt to work around this, several of which Statsig provides - local evaluation SDKs, bootstrapping, and non-blocking initialization - but each has its own tradeoff: security, speed, and making sure you have the latest values, respectively.
Today we're announcing a new feature that we believe resolves many of these concerns for experimenting at app startup: the Statsig Local Eval Adapter. With this approach, you can ship an app version or webpage with a set of config definitions that can be evaluated immediately on startup. Following that initial evaluation, values from the network take over.
While local evaluation SDKs - which download the experiment ruleset for all users - could theoretically have solved this problem by shipping that ruleset with the app, they couldn't switch into a "precomputed" mode afterwards, meaning that shipping configurations with an app meant compromising on security. With this approach, you can be selective about the information included in the Adapter, preserving security. Check out the Local Eval Adapter in our docs!
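Conceptually, the startup flow looks something like the sketch below. The adapter import, option name, and bundled-spec shape are hypothetical placeholders meant to show the pattern - the real names and wiring are in the docs:

```ts
import { StatsigClient } from '@statsig/js-client';
// Hypothetical import - consult the docs for the real adapter package and class name.
import { LocalEvalAdapter } from '@statsig/js-local-eval-adapter';
// Config definitions shipped with this build of the app or page.
import bundledSpecs from './statsig_specs.json';

// Evaluate against the bundled definitions immediately, with no network round trip.
const adapter = new LocalEvalAdapter(bundledSpecs);
const client = new StatsigClient('client-YOUR_KEY', { userID: 'user-123' }, {
  dataAdapter: adapter, // placeholder option name
});

client.initializeSync(); // values are usable right away at startup
const checkoutTest = client.getExperiment('checkout_redesign');

// Later, fresher values fetched from the network can take over.
await client.updateUserAsync({ userID: 'user-123' });
```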
The Statsig SDKs use deterministic hashing to bucket users. This means that the same user being evaluated for the same experiment will be bucketed identically - no matter where that happens. Every experiment has its own unique salt, so that each experiment's assignment is independently randomized.
For advanced use cases - e.g. a series of related experiments that need to reuse the same control and test buckets - we now expose the ability to copy and set the salts used for deterministic hashing. This is meant to be used with care, and is only available to Project Administrators, via the Overflow (...) menu in Experiments.
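To make the mechanics concrete, here's a simplified illustration of salted deterministic bucketing. This is a conceptual sketch, not the exact algorithm the SDKs implement - see the docs for that:

```ts
import { createHash } from 'node:crypto';

// Conceptual sketch: hash the salt plus the unit ID and map it into a bucket.
// Same salt + same user => same bucket, wherever the evaluation happens.
function bucket(salt: string, userID: string, numBuckets = 10000): number {
  const digest = createHash('sha256').update(`${salt}.${userID}`).digest();
  const value = digest.readBigUInt64BE(0); // first 8 bytes as an unsigned integer
  return Number(value % BigInt(numBuckets));
}

// Reusing experiment A's salt in experiment B keeps users in the same buckets;
// a fresh salt re-randomizes assignment independently.
console.log(bucket('experiment_a_salt', 'user-123') === bucket('experiment_a_salt', 'user-123')); // true
console.log(bucket('experiment_b_salt', 'user-123')); // likely a different bucket
```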
We're excited to release Max/Min metrics on Statsig Warehouse Native. Max and Min metrics let you track users' extremes during an experiment; this is especially useful for performance, score, or feedback use cases. For example, they let you:
Understand how your performance changes impacted users’ worst experiences in terms of latency
Understand if changes to your mobile game made users’ peak high scores change
Measure the count of users in your experiment who ever left a 2-star review or lower - using MIN(review_score) with a threshold setting
Mins and maxes can map directly onto users’ best and worst experiences, and now it’s just a few clicks to start measuring how they’re changing with any feature you test or release.
When you're done with your experiment, you can now choose to ship it with an experiment-specific holdback. This is helpful when you've finished the test and are shipping a test group, but still want to measure impact on a small subset of the population to understand longer-term effects.
Example use case: When ending a 50% Control vs. 50% Test experiment, you can ship Test with a 5% experiment-specific holdback. Statsig will ship the Test experience to 95% of your users - and will continue to compute lift vs. the 5% holdback. It compares this 5% holdback (who don't get the test experience) to a similarly sized group who got the test experience when you made the ship decision. You can ship to the holdback when you conclude the experiment. See docs.
Statsig also natively supports Holdouts. These are typically used across features, and aren't experiment-specific.
Server Core is a full rewrite of our Server SDKs with a shared, performance-focused Rust library at the core - and bindings for each language you'd like to deploy it in. Today, we're launching Python Server Core (Python Core).
Python Core leverages the natural speed of a core written in Rust - and benefits from all of our latest optimizations in a single place. Our initial benchmarking suggests that Python Server Core can evaluate 5-10x faster than our native Python SDK. As an added benefit, Python Core's refresh mechanism runs as a background process, meaning it never needs to take the GIL. Using Python Core with our Forward Proxy has even more benefits: changes can be streamed, reducing CPU usage to roughly 1/10th.
Python Server Core is in open beta beginning today; see our docs to get started. In the coming months, we'll ship Server Core in Node, PHP, and more - if you're looking forward to a new language, let us know in Slack.