These payloads contain everything our customers need to configure and optimize their applications—such as feature flags, experiments, and dynamic parameters—all tailored to the user making the request. This process is designed to ensure that our customers can deliver personalized, performant, and feature-rich experiences at scale.
However, as our customer base grew—and as some of our customers started operating at incredible scales—so did the complexity and volume of the data served by these payloads. Maintaining the consistency of this data throughout a request lifecycle became increasingly challenging.
We knew we needed a better approach—one that could scale with our customers without sacrificing the performance and reliability that Statsig is known for.
On each request, a versioned snapshot of a company's data is created to ensure consistency throughout the entirety of the request. Creating this snapshot each time required a fair amount of CPU, however, and as some of our customers scaled, it became a significant bottleneck in our ability to serve requests in a timely manner.
Calculating these snapshots would take upwards of 500ms for our largest customers, which required significant over-provisioning to maintain healthy latencies.
Building each of these snapshots is a multi-step process:
Fetch updates to the company’s entities
Create wrapper objects around the raw data
Create views and indexes on top of the wrapper objects
Views and indexes for entities were originally implemented in a very naive way, recalculating the full output from the base objects on every rebuild. Writing it this way gave very strong assurances that the result would be correct, but came at the price of CPU utilization.
Our original code did a full recreation of all views from the base, on each change, and looked pretty similar to this sample:
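In sketch form, the naive rebuild might look like the following. The entity shapes, grouping key, and function names are illustrative assumptions, not Statsig's actual API:

```typescript
// Illustrative types; the real entity shapes are Statsig-internal.
type Entity = { id: string; payload: unknown };

interface Snapshot {
  entities: Record<string, Entity>;
  views: Record<string, Entity[]>;
}

// Naive rebuild: every change triggers a full recreation of the
// entity map, views, and indexes from the raw base data.
function buildSnapshot(base: Entity[], changes: Entity[]): Snapshot {
  const entities: Record<string, Entity> = {};
  for (const e of base) entities[e.id] = e;
  for (const e of changes) entities[e.id] = e; // apply updates

  // Recompute every view from scratch (the grouping key here is
  // hypothetical; real views/indexes are more involved).
  const views: Record<string, Entity[]> = {};
  for (const e of Object.values(entities)) {
    const key = e.id[0] ?? "";
    (views[key] ??= []).push(e);
  }
  return { entities, views };
}
```

Even when only one entity changes, every map, view, and index is rebuilt from the full base, which is where the CPU cost comes from.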
A significant speed-up came from copying the previous snapshot's data and using only the fetched changes to mutate the new snapshot. This clone + delta strategy allowed work to be preserved across builds, yielding significant savings in object creation and garbage collection.
The resulting code looked something like this:
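A minimal sketch of the clone-and-delta rebuild, under the same illustrative types (the `Object.assign` shallow copy here is the call that later dominated profiles):

```typescript
type Entity = { id: string; payload: unknown };

interface Snapshot {
  entities: Record<string, Entity>;
}

// Clone + delta: shallow-copy all pointers from the previous
// snapshot, then mutate only the handful of changed entries.
function buildSnapshotIncremental(prev: Snapshot, changes: Entity[]): Snapshot {
  // Full pointer copy of the previous snapshot's entity map.
  const entities = Object.assign({}, prev.entities);
  // Apply only the streamed-in deltas (typically < 10 entries).
  for (const e of changes) entities[e.id] = e;
  return { entities };
}
```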
Changing to this model took processing time from ~500ms to ~50ms, a significant speed-up.
Changes are constantly streamed in, so `changes.length` is typically < 10. After benchmarking this approach, nearly all clock time was being taken up by the call to `Object.assign`; the incremental work caused by `changes` was now near zero.
Even in this paradigm, though, each store rebuild still required a full copy of the pointers to objects. 50ms is a very high flat price to pay per rebuild, so we investigated how to limit the amount of pointer copying required:
To get around the cost of copying the entire entities map, we devised a data structure that uses an `Array<Map<string, T>>` to selectively invalidate and copy small portions of memory at a time.
The design was inspired by Rust's `dashmap` library, which uses a similar hash-map sharding strategy to maximize concurrent access.
With this approach, a majority of memory is now shared between these immutable stores, representing a significant perf savings.
`clone()`: Creates a new `FastCloneMap`, shallow-copying the top-level hashing array. Since the number of buckets used is typically `sqrt(entities.length)`, the big-O for cloning drops from O(n) to O(√n).
Additionally, in testing we found that shallow-copying `Array`s is dramatically faster than shallow-copying `Map`s, which gave another sizable performance increase.
Full big-O analysis for store creation, where n = total entities for a company and d = the number of changed entities:

Previous: O(n + d)

Now: O(d·√n + √n)
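To make the difference concrete, a quick back-of-the-envelope comparison under the sqrt-bucket assumption (the entity and delta counts here are illustrative, not from the post):

```typescript
// Approximate pointer-copy counts per rebuild for a hypothetical
// company with n entities and d changed entities per rebuild.
const n = 1_000_000; // total entities (illustrative)
const d = 10;        // changed entities per rebuild (typically < 10)

// Full clone + delta: copy every pointer, then apply the deltas.
const previousCopies = n + d;

// Sharded copy-on-write: copy one sqrt(n)-sized bucket per delta,
// plus the sqrt(n)-length top-level bucket array on clone().
const bucketSize = Math.sqrt(n);
const nowCopies = d * bucketSize + Math.sqrt(n);

console.log(previousCopies); // 1000010
console.log(nowCopies);      // 11000
```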
Since making this change, the customer whose rebuild previously took 500ms, and then 50ms, now takes ~2ms for a full rebuild.
No perf win is without trade-offs:
The upfront initial store creation cost nearly doubled, as the extra `Map` and `Array` allocations are comparatively expensive.
This is fine in the majority case: our services have a warming period when spinning up, and a one-time cost on pod startup is a very small price to pay for a lifetime of cheap rebuilds. If this were a throwaway or short-lived computation, however, this data structure would likely be a poor fit.
The access speed is slower than just using a `Map`, but in testing this cost was virtually negligible for our use case. If our read/write ratio were higher, it would likely be too expensive to have the extra hash check.