These payloads contain everything our customers need to configure and optimize their applications—such as feature flags, experiments, and dynamic parameters—all tailored to the user making the request. This process is designed to ensure that our customers can deliver personalized, performant, and feature-rich experiences at scale.
However, as our customer base grew—and as some of our customers started operating at incredible scales—so did the complexity and volume of the data served by these payloads. Maintaining the consistency of this data throughout a request lifecycle became increasingly challenging.
We knew we needed a better approach—one that could scale with our customers without sacrificing the performance and reliability that Statsig is known for.
On each request, a versioned snapshot of a company's data is created to ensure consistency throughout the entirety of the request. Creating this snapshot each time required a fair amount of CPU, however, and as some of our customers scaled, it became a significant bottleneck in our ability to serve requests in a timely manner.
Calculating these snapshots would take upwards of 500ms for our largest customers, which required significant over-provisioning to maintain healthy latencies.
Building each of these snapshots is a multi-step process:
Fetch updates to the company’s entities
Create wrapper objects around the raw data
Create views and indexes on top of the wrapper objects
Views and indexes for entities were originally implemented in a very naive way, recalculating the full output from the base objects on every rebuild. Writing it this way gave very strong assurances that the result would be correct, but came at the price of CPU utilization.
Our original code did a full recreation of all views from the base, on each change, and looked pretty similar to this sample:
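In sketch form, the naive rebuild might look like the following. The entity shapes, grouping key, and function names are illustrative assumptions, not Statsig's actual API:

```typescript
// Illustrative types; the real entity shapes are Statsig-internal.
type Entity = { id: string; payload: unknown };

interface Snapshot {
  entities: Record<string, Entity>;
  views: Record<string, Entity[]>;
}

// Naive rebuild: every change triggers a full recreation of the
// entity map, views, and indexes from the raw base data.
function buildSnapshot(base: Entity[], changes: Entity[]): Snapshot {
  const entities: Record<string, Entity> = {};
  for (const e of base) entities[e.id] = e;
  for (const e of changes) entities[e.id] = e; // apply updates

  // Recompute every view from scratch (the grouping key here is
  // hypothetical; real views/indexes are more involved).
  const views: Record<string, Entity[]> = {};
  for (const e of Object.values(entities)) {
    const key = e.id[0] ?? "";
    (views[key] ??= []).push(e);
  }
  return { entities, views };
}
```

Even when only one entity changes, every map, view, and index is rebuilt from the full base, which is where the CPU cost comes from.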
A significant speed-up came from copying the previous snapshot's data and using only the fetched changes to mutate the new snapshot. This clone + delta strategy allowed work to be preserved across builds, yielding significant savings in object creation and garbage collection.
The resulting code looked something like this:
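A minimal sketch of the clone-and-delta rebuild, under the same illustrative types (the `Object.assign` shallow copy here is the call that later dominated profiles):

```typescript
type Entity = { id: string; payload: unknown };

interface Snapshot {
  entities: Record<string, Entity>;
}

// Clone + delta: shallow-copy all pointers from the previous
// snapshot, then mutate only the handful of changed entries.
function buildSnapshotIncremental(prev: Snapshot, changes: Entity[]): Snapshot {
  // Full pointer copy of the previous snapshot's entity map.
  const entities = Object.assign({}, prev.entities);
  // Apply only the streamed-in deltas (typically < 10 entries).
  for (const e of changes) entities[e.id] = e;
  return { entities };
}
```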
Changing to this model took processing time from ~500ms to ~50ms, a significant speed-up.
Changes are constantly streamed in, so `changes.length` is typically < 10. After benchmarking this approach, nearly all clock time was being taken up by the call to `Object.assign`; the incremental work caused by `changes` was now near zero.
Even in this paradigm, though, each store rebuild still required a full copy of the pointers to objects. 50ms is a very high flat price to pay per rebuild, so we investigated how to limit the amount of pointer copying required:
To get around the cost of copying the entire entities map, we devised a data structure that uses an `Array<Map<string, T>>` to selectively invalidate and copy small portions of memory at a time.
The design was inspired by Rust's `dashmap` library, which uses a similar hash-map sharding strategy to maximize concurrent access.
With this approach, a majority of memory is now shared between these immutable stores, representing a significant perf savings.
`clone()`: Creates a new `FastCloneMap`, shallow-copying the top-level hashing array. Since the number of buckets used is typically `sqrt(entities.length)`, the big-O for cloning drops from O(n) to O(√n).
Additionally, in testing we found that shallow-copying `Array`s is dramatically faster than shallow-copying `Map`s, which gave another sizable performance increase.
Full big-O analysis for store creation, where n = total entities for a company and d = the number of changed entities:

Previous: O(n + d)

Now: O(d·√n + √n)
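To make the difference concrete, a quick back-of-the-envelope comparison under the sqrt-bucket assumption (the entity and delta counts here are illustrative, not from the post):

```typescript
// Approximate pointer-copy counts per rebuild for a hypothetical
// company with n entities and d changed entities per rebuild.
const n = 1_000_000; // total entities (illustrative)
const d = 10;        // changed entities per rebuild (typically < 10)

// Full clone + delta: copy every pointer, then apply the deltas.
const previousCopies = n + d;

// Sharded copy-on-write: copy one sqrt(n)-sized bucket per delta,
// plus the sqrt(n)-length top-level bucket array on clone().
const bucketSize = Math.sqrt(n);
const nowCopies = d * bucketSize + Math.sqrt(n);

console.log(previousCopies); // 1000010
console.log(nowCopies);      // 11000
```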
Since making this change, the customer whose rebuild previously took 500ms, and then 50ms, now takes ~2ms for a full rebuild.
No perf win is without trade-offs:
The upfront initial store creation cost nearly doubled, as the extra `Map` and `Array` allocations are comparatively expensive.
This is fine in the majority case: our services have a warming period when spinning up, and a one-time cost on pod startup is a very small price to pay for a lifetime of cheap rebuilds. If this were a throwaway or short-lived computation, however, this data structure would likely be a poor fit.
The access speed is slower than just using a `Map`, but in testing this cost was virtually negligible for our use case. If our read/write ratio were higher, it would likely be too expensive to have the extra hash check.