In the past year, AI has gone from interesting to impactful.
While people had built AI applications prior to 2024, there were few that had achieved massive scale. Today, there are too many to count.
This shift is very exciting, but has also created challenges for developers. The frameworks, tools, and implementation techniques that worked for test applications don't work in production. Developers are running into problems with managing, measuring, and optimizing scaled AI applications.
People have tried many ways to solve these problems: forking open-source libraries, setting up custom dashboards, stringing together tools that weren't purpose-built for the problem at hand, or - most commonly - just shipping something that works and hoping for the best.
But what if there was a way for any company building with Azure AI to solve all these problems with one single integration?
Well today, there is: The Statsig <> Azure AI Integration.
The Statsig <> Azure AI Integration is a complete solution for configuring, measuring, and optimizing AI applications.
The Statsig Azure AI integration allows you to:
Configure your Azure AI models from a single pane of glass
Implement Azure AI models in code using a simple, lightweight framework
Automatically collect a variety of metrics on model & application performance
Run powerful A/B tests and experiments to optimize your AI application
Compute the results of all tests automatically - with no additional work required
Even better, this integration is included for every current Azure AI customer.
Companies like OpenAI, Anthropic, Notion, and Brex have been using Statsig to build and optimize their AI applications for several years. Historically, though, there wasn't a native integration for Azure customers, so companies building on top of Azure had to invest time in manual configuration.
Now, any company can access world-class data tools to optimize their AI application within their Azure project.
Let's dig into how it can help you build AI applications.
Statsig has helped thousands of companies add dynamic configuration to their application, collect metrics on product performance, and run powerful A/B tests. Now, those features are available to any Azure AI customer via a simple SDK and a powerful native integration.
Statsig’s Azure AI SDKs simplify the implementation of features like completions and embeddings in your server application, in two primary ways:
They provide a layer of abstraction from direct Azure AI API calls, letting you store API parameters in a config and change them dynamically (rather than making code changes)
They give you a simplified framework for implementing Azure AI models in code
This unlocks a very high level of flexibility for your engineering team. Swapping models in production is now something you can do in seconds - without having to reconfigure your code.
The Statsig Azure AI SDK will also automatically log metrics on model and application performance to a Statsig project, giving you a simple, out-of-the-box method for tracking model cost, latency, and performance.
Once you've begun to configure your application and collect metrics, Statsig can be used for broader use cases, including:
Targeting releases to internal users to test changes in your production environment
Running staged rollouts of new models or model configurations over time
Running powerful A/B tests and multi-variate experiments
Ready to get started? Keep reading for a deep dive into how this actually works.
When building an application using Azure AI, you first need to create a model in the Azure AI platform. Azure AI gives you access to thousands of unique models behind a single API call, giving you a powerful, flexible way to implement models from many providers.
Once you select the model, you simply need to call the model endpoint with your key (plus an additional set of parameters) to get an output.
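For instance, a direct call using the azure-ai-inference Python package looks roughly like this (the endpoint, key, and parameter values below are placeholders):

```python
# Direct call to an Azure AI model endpoint (illustrative values throughout).
from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint="https://<your-resource>.services.ai.azure.com/models",  # placeholder endpoint
    credential=AzureKeyCredential("<your-api-key>"),                  # placeholder key
)

# Every call site has to pass the model name and tuning parameters explicitly.
response = client.complete(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize this support ticket for me."}],
    temperature=0.7,
    max_tokens=256,
)
print(response.choices[0].message.content)
```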
While this is simple and intuitive, it can become complex if you’re managing many model configurations in production - particularly if you'd like to use a unique set of parameters for models in different parts of your application.
This is where the Statsig <> Azure AI SDK comes in. Statsig helps you avoid this problem by giving you the ability to create a Model Client - either through a Statsig Dynamic Config, or through hardcoded values. Here’s an example of a dynamic config:
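(The original post shows this as a screenshot of the Statsig console; as a sketch, a Dynamic Config named something like support_assistant_model might hold values along these lines - the config id and field names are entirely up to you.)

```python
# Illustrative contents of a Statsig Dynamic Config (stored as JSON in the console);
# the field names here are examples, not a required schema.
support_assistant_model = {
    "model_name": "gpt-4o-mini",
    "temperature": 0.3,
    "max_tokens": 512,
    "frequency_penalty": 0.0,
}
```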
Once you’ve created this config, calling a model in code is easy: you simply instantiate your Model Client using the id of the Dynamic Config, like this:
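(A minimal sketch - the package and function names below, statsig_azure_ai and get_model_client, are placeholders rather than the exact published API, so check the SDK docs for your language.)

```python
# Hypothetical sketch: the import and function names stand in for the Statsig Azure AI SDK.
from statsig_azure_ai import AzureAI

# Create a Model Client from the Dynamic Config above, referenced by its id.
model_client = AzureAI.get_model_client("support_assistant_model")
```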
You can then use this model client to generate text responses for specific prompts. These prompts could also be stored in dynamic configs, for additional configurability.
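For example (again a sketch - the method name and response fields are assumptions):

```python
# Hypothetical sketch: the completion method and response shape are assumptions.
response = model_client.complete(
    system_prompt="You are a helpful support assistant.",  # could itself live in a Dynamic Config
    user_prompt="How do I reset my password?",
)
print(response.output_text)
```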
Once this is implemented, adjusting your model's configuration is as simple as changing the value of your dynamic config in Statsig - the change goes live in any target application within ~10 seconds!
This type of configurability has several benefits:
It gives your team a central place to manage model configurations
It allows your team to quickly test new models or “hot swap” models if a provider is down
It provides your team with a single development framework for building AI applications, preventing friction from underlying differences between the models themselves
This becomes even more exciting when combined with automatic metric logging.
Statsig’s Azure AI SDK automatically captures relevant invocation and usage metrics from each API call and logs them to Statsig. Here’s an example event stream:
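(The original post shows the event stream as a screenshot from the Statsig console; purely as an illustration, each completion call produces an event with metadata along these lines - not the exact event names or schema.)

```python
# Purely illustrative sketch of an auto-logged completion event.
example_event = {
    "event": "invoke",
    "metadata": {
        "model_name": "gpt-4o-mini",
        "prompt_token_length": 412,
        "completion_token_length": 96,
        "total_token_length": 508,
        "latency_ms": 820,
    },
}
```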
By default, completion invocations automatically capture completion token length, prompt token length, total token length, latency, and the model name. You can also log any additional metrics you care about for model performance, latency, or cost.
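If you want to log extra events yourself, the standard Statsig server SDK exposes an event-logging call; here's a rough Python sketch (import paths and signatures may vary by SDK version):

```python
# Rough sketch using the Statsig Python server SDK; check your SDK version's docs
# for exact import paths and signatures.
from statsig import statsig
from statsig.statsig_user import StatsigUser
from statsig.statsig_event import StatsigEvent

statsig.initialize("server-secret-key")  # placeholder key

user = StatsigUser("user-123")
event = StatsigEvent(user, "completion_feedback", value=1, metadata={"model_name": "gpt-4o-mini"})
statsig.log_event(event)
```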
Once these events are in your Statsig console, you have the ability to create additional custom metrics by using filters, groups, and aggregations. For example, you can create a custom metric that captures average latency from a specific model, or average inference cost from a specific page of your application.
All of these metrics can be accessed in product analytics workflows, or surfaced in dashboards, giving your team full visibility into application performance.
Once you have metrics up and running, you’re ready to begin optimizing your application.
The easiest way to start is with a simple A/B test - such as testing one model against another. All you need to do is create two configs, add them as inputs in a Statsig experiment, then implement the experiment in code. You can see a full example here.
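As a rough sketch (the experiment name, parameter name, and config ids below are made up for illustration), the code side of such a test might look like this:

```python
# Rough sketch: experiment name, parameter name, and config ids are illustrative,
# and get_model_client is the same placeholder as above.
# (Assumes statsig.initialize(...) has already been called, as shown earlier.)
from statsig import statsig
from statsig.statsig_user import StatsigUser
from statsig_azure_ai import AzureAI  # placeholder import

user = StatsigUser("user-123")

# The experiment assigns each user a Dynamic Config id, which decides their model setup.
experiment = statsig.get_experiment(user, "support_assistant_model_test")
config_id = experiment.get("model_config_id", "support_assistant_model")

model_client = AzureAI.get_model_client(config_id)
```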
Running A/B tests allows you to see the causal effect of changing any model parameter on your actual product metrics. This allows you to do things like:
Compare the latency of different models in your production environment
Measure the impact of prompt changes on engagement, user retention, etc.
Understand the relevance of various settings like temperature, max tokens, or frequency penalty
There are many ways to configure AI experiments, and many AI experiments to run - but the important part is getting started. Online experimentation is an extremely powerful method for optimizing AI application performance.
Statsig is incredibly excited to collaborate with Azure AI on this framework. We hope it makes it far easier for developers to build with Azure AI - particularly as applications grow in complexity and scale.
If you're ready to get started, you can go to statsig.com/azureai-docs to learn more. If you have questions or suggestions, please reach out to our team - either on Slack, or via email.
Happy Building!