Your landing page absolutely needs cloud imagery, you’ve decided.
You try to explain it to your brand designer, but she doesn’t seem to understand. Your landing page is targeted, and the target audience has a very specific mindset. And that mindset involves big, puffy cumulus clouds—you’re sure of it.
Inconveniently, the brand designer has her own ideas: “I was thinking that a minimal vibe would speak to website visitors. Short text, basic aesthetic imagery, and a prominent ‘get started’ button will probably get our users clickin’,” she says.
And now you are at an impasse. Even your "please, I saw it in a dream" plea is unable to change her mind. 😖 What do you do now?
Well, it's time to talk about A/B testing.
A/B testing is the simple act of testing two different variants and seeing which one performs better. According to Nicolai Kramer Jakobsen, A/B testing has been recorded as far back as 1753, in James Lind’s Treatise of the Scurvy.
Take our “sandbox” project for a spin! In this project, we have generated real-time sample data for you to explore and experiment with.
Long story short, Lind was a doctor who performed clinical trials to treat scurvy for the Royal British Navy. Lind had 12 sailors with scurvy, so he paired them up, and gave different treatments to each pair. He discovered that the pair treated with a lemon and two oranges made a full recovery. Yes, science was really bleak back then.
Later, the term A/B/n test would be created to describe tests like this. What’s the difference between an A/B test and an A/B/n test? Well, an A/B test is a test between two things—thing A and thing B—and an A/B/n test is a test between more than two things: Things A, B, C, D, … N.
What makes A/B testing so ubiquitous (and special) is that the metrics that are used to determine success can be whatever the experimenters choose. In Lind’s case, the metric would be whether or not the scurvy symptoms went away. In the cloudy landing page example, a metric might be CTA clicks, or pageviews per session.
Pro tip: To avoid testing too many random things, we recommend forming a communicable hypothesis for each variant of an A/B/n test.
These days, marketers and frontend teams run all sorts of tests on web pages. The small things can make a big difference, and oftentimes those small things aren’t even considered by content creators.
For instance, our friends at YoYoFuMedia earned a client 28% more revenue on their Shopify site by simply changing the sitewide “complete purchase” button from grey to gold.
Buttons are probably the #1 A/B-tested website component to date. Tons and tons of studies have been done about which colors convey which emotions in marketing, and just about every brand has done its own testing of button colors.
But there’s also button text. Some buttons are no-brainers like “add to cart,” or “checkout,” but what about the lesser-frequented CTAs like submitting a support ticket? What about the “heavy lift” CTAs like demo-request buttons?
Fun fact: Our demo button didn’t always say “Get a live demo.” We actually used Autotune to test variants like “Schedule Live Demo” and “Chat with us.”
Buttons are pretty straightforward to change on just about any site, and there’s no shortage of studies promising big payoffs simply by nailing the button color/text combo. Just make sure to experiment with other aspects too!
Images are tough, and it’s hard to predict how a given audience is going to react to an image. The metrics used to determine a “winner” is usually just simple engagement metrics: Time on page, pages per session, bounce rate, etc.
What makes images difficult is that, unless it’s an infographic, the image will be subject to some qualitative, opinion-based scrutiny from viewers; there’s just no right or wrong image.
Some website visitors—who would otherwise be perfect leads—might not like a specific type of flower (or cloud) on your website, so they bounce. Bummer.
A bit of a bigger lift than images or buttons, some A/B tests experiment with shifts in entire layout and page order. Sometimes it’s simple, like swapping the position of two sections with one another. Other times it’s complicated, and involves dividing an entire web page into sections, and displaying those in varying orders.
Scroll through Intuit’s homepage as an example: As of January 2023, their products appear in the order
If you think that specific order was determined simply by guesstimating user intent, think again.
Although not particularly advisable for blogs and other long-form content, many simple A/B tests are responsibly conducted on core web pages to determine which headings are most impactful.
This is most often done on landing pages. The ol’ marketing 101 handbook dictates that two, nearly-identical landing pages (except for headings/titles) should be created, and then ads and paid traffic can be diverted equally to both of them.
Paid traffic is usually used for this type of landing page testing because it’s a relatively immediate way to get website traffic, and is independent of SEO and organic growth. In fact, having two nearly-identical web pages is a no-no by SEO standards.
Once a winning heading configuration becomes obvious, the old page is deleted, and 301’d to the new page, and ad traffic is changed to solely target the winning page.
Websites and brands often A/B test colors. This is done by brand designers, web developers, CROs, rogue marketers who want to try a new Photoshop technique, and so on.
Officially, most brands have specific color palettes to use for their dotcom presence, but even then, there’s often leeway to switch and swap simple colors and mood graphics and measure their effects.
Unlike the more tangibly-measured components such as buttons, color tests are usually done gradually, and simple metrics like bounce rate and time on page might be the leading metrics for making a decision about color.
Connect with our data scientists and engineers, and ask questions or just hang out with other cool folks that believe in an experimentation culture!
Across marketing, engineering, customer experience, and virtually every discipline, there are tons of reasons a company would run an A/B test. For marketing, testing web page and landing page variants is a common use case, whereas engineering teams might focus on validating product hypotheses or measuring the impact of new features.
Regardless of the use case, an A/B test is truly only used for one simple thing: To compare two (or more) things and determine which one is *better.
A couple of caveats, though. First, it’s not always easy to compare two things head-to-head and get identical data. A few things can throw data off, including:
Audience: Imagine A/B testing a marketing email. Naturally, one variant will be sent to one audience and one will go to another audience. However, an audience is made up of individual people who are each unique in their own ways: While two audiences can be demographically identical, the people that compose them are not identical. The disparities in demographically-similar audiences’ behaviors can usually be smoothed out by simply increasing the sample size.
Pro tip: Use a sample size calculator to determine the right size to get statistically significant results.
Timing: Simply put, users behave differently at different times. This applies to time of day, weekday, and even seasonality (like new software being purchased at the beginning of fiscal years). When testing marketing stuff like web pages or emails, make sure the tests are happening concurrently, otherwise you will likely incur bias.
Experiment issues and data pollution: This one is a bit general, as each specific website A/B test will have its own unique variables. For example purposes, let’s say we’re changing a homepage graphic and measuring its effect on requests for a demo. If users are confused by the changes and click the request a demo button anyway, or if the demo button itself isn’t configured properly, the results will be polluted and unusable.
External factors: Numerous things outside of the market norm can interfere with the results of your A/B test, including economic issues, competitors running a sale, political factors, media distractions (launching a test during the Super Bowl), website outages, and more. Some of these are unpredictable, whereas others can be accounted for with a little bit of research ahead of time.
Other user biases: Simply put, there is a myriad of seemingly-small factors that can interfere with test data. Statsig uses CUPED to deliver laser-accurate, unbiased results. We’re always happy to describe how CUPED works, and how Statsig users benefit from it.
Also, a website A/B test is a way to compare two page designs and determine which one is better, but “better” is a subjective term. It is up to the experimenter to determine what “better” actually means in the context of each experiment.
Again, common metrics for website A/B tests are things like conversion rate, bounce rate, time on page, pageviews per user, and so on. However, sometimes making changes may produce adverse effects.
For example, if your designers cook up the most compelling web page ever, and every user who lands on it instantly requests a demo of your product, that means your pageviews per user and time on page are going to decrease. It’s up to you to decide if that’s a good thing.
Related: Statsig’s guide on how to approach picking metrics for an experiment.
Good news, marketers: You can run A/B and A/B/n tests on your web pages with a Statsig account. Oh, and it’s low/no code (depending on your tech stack).
Setup takes five minutes or less. Assuming you’ve already created at least two landing pages to compare, you’ll:
Create an experiment in your Statsig console: You can leave the rest of the fields blank, just make sure to designate the ID type as Stable ID. This will ensure the experiment works for logged-out users.
Add parameters to your experiment: Create a new “string” parameter modal, and name it page_url. From there, you can set the URLs that you want to test in your experiment. Note—your A/B test will automatically split traffic equally between pages unless specified otherwise here.
Start the experiment: Simple, click the start button. Be sure to copy the experiment name, as you’ll need it later.
Copy the API key: This can be found under your Project Settings screen. You’ll need this to wire up the landing pages.
Add HTML snippet into your web pages: On your website, paste your Statsig-generated code into the <head> of each of your landing pages. This will ensure that when a user attempts to load a page, they will be diverted towards whichever variant they are allocated to.
Check out the website A/B test and experimentation setup docs, which include how to monitor and interpret results, add multiple pages, and a cool little cheat sheet for Webflow users.
You're invited to create a free Statsig account! Get started today, and ping us if you have questions. No credit card required, of course.
Thanks to our support team, our customers can feel like Statsig is a part of their org and not just a software vendor. We want our customers to know that we're here for them.
Migrating experimentation platforms is a chance to cleanse tech debt, streamline workflows, define ownership, promote democratization of testing, educate teams, and more.
Calculating the right sample size means balancing the level of precision desired, the anticipated effect size, the statistical power of the experiment, and more.
The term 'recency bias' has been all over the statistics and data analysis world, stealthily skewing our interpretation of patterns and trends.
A lot has changed in the past year. New hires, new products, and a new office (or two!) GB Lee tells the tale alongside pictures and illustrations:
A deep dive into CUPED: Why it was invented, how it works, and how to use CUPED to run experiments faster and with less bias.