Simple Tests, Significant Gains

How our partner increased revenue by 130% with small changes

By Editorial Staff On May 29, 2008

Research has shown a massive difference in conversion and return on investment between companies that test their online communications and those that don’t. So why doesn’t every online business have a regular testing program?

The fact is that many marketers are confused about where to begin, and how. Some who do test don’t do it often enough or are not sure how to interpret the results.

With so many potential areas to test, including landing pages, pay-per-click ads, and emails, marketers need to know:

Which tests provide the biggest potential return on investment,
How to structure tests for consistency and accuracy, and
How to continue improving test results and duplicate success.

We’ve developed a solid framework for basic online testing that’s enabled us to help many partners realize significant gains in conversion and revenue.

We hope this overview of the process helps marketers who have not yet implemented a formal program as well as those looking for such a framework to guide and improve their efforts.

For extensive training and professional certification in the fundamentals of online testing, we recommend interested marketers consider our Fundamentals of Online Testing course.

The audio from the Web clinic on this topic can be downloaded here:

Simple Tests, Significant Gains: How our partner increased revenue by 130% with small changes

CASE STUDY

We recently completed a comprehensive, two–year testing program for one of our research partners, Encyclopædia Britannica Online. The goal was simple: Increase net revenue per unique visitor to their reference material.

Our strategy focused on three metrics:

Increase visits,
Increase free trials, and
Increase conversion from free trials to paid subscriptions.

Ten tests of varying complexity were conducted over the two–year period, but three tests in particular illustrate the impact of testing small changes through the relatively simple process of single variable, A/B split testing.

Example Test #1

This test changed the color and copy on the Call–to–Action button on the article page and added a simple email capture field for lead generation and basket–recovery purposes:

[Images: Control version before changes; “Winning” Treatment]

The Treatment shown resulted in 30% fewer clicks due to the Friction and Anxiety factors of adding the email capture field, but the net result was 13% more free trials per unique visitor, 61% more free trials per clickthrough, and 24% more paid subscriptions per unique visitor.
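The first two figures above imply the third. As a quick arithmetic check, assuming every change is measured relative to the Control and that “clicks” and “clickthroughs” refer to the same count, the lift in free trials per clickthrough can be derived from the other two numbers:

```python
# Relative changes reported for the Treatment versus the Control.
click_change = -0.30             # 30% fewer clicks per unique visitor
trials_per_visitor_lift = 0.13   # 13% more free trials per unique visitor

# Free trials per click = (trials per visitor) / (clicks per visitor),
# so the relative level is the ratio of the two relative levels.
trials_per_click_lift = (1 + trials_per_visitor_lift) / (1 + click_change) - 1

print(f"{trials_per_click_lift:.0%}")  # prints 61%
```

Fewer visitors clicked, but those who did were far more likely to start a free trial, which is why the net result per visitor still improved.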

Example Test #2

In this test Encyclopædia Britannica Online tested the addition of a Hacker Safe logo to the offer page. It delivered a 13.2% gain in take–up of the free trial:

[Images: Control offer page; Treatment with new logo]

Example Test #3

In this test, removing a whole page from the subscription registration path gave Encyclopædia Britannica Online a 117% lift in conversion to free trials, a 61% lift in paid subscriptions, and a 130% lift in revenue.

What you need to understand: Simple A/B tests that made relatively minor changes in three key areas significantly increased subscriptions and revenue.

A structured program can help marketers easily demonstrate the value of optimization testing. Start by testing elements with the biggest potential return on investment (ROI); for example:

Calls–to–action,
Forms, and
Length of transaction process.

The Concept of Utility

At MarketingExperiments, we apply the concept of “utility” to our testing process.

What do we mean by utility? A testing process that is useful, practical, and predictive.

The utility of an online testing process is directly related to five key elements. At MarketingExperiments we express it in this patented index:

Wherein:
u = Utility
q = Research Question
t = Treatment
m = Metric System
v = Validity Factor
i = Interpretation

In other words, ask the right question, measure the right thing, perform the right test, and learn from the results.

We recommend that marketers interested in establishing a formal online testing program incorporate these elements into a testing protocol.

The Testing Protocol

#1: Start with the right research question.

In our recent Web clinic on this subject, more than 60% of the audience said that the very first step in successful online testing—asking “the right research question”—is their biggest challenge.

All single factor A/B split tests begin with “which?”

Marketers may start with the interrogative “what?” or “why?” when they are brainstorming on an informal basis, but their research question must progress to “which?” in order for it to be specific enough to be used in a formal test.

For example, “What is the best headline?” becomes “Which headline will convert best?” and “What is the best price?” becomes “Which of three price points is best for this product?”

#2: Develop the Treatment(s).

A variable is the general element you intend to test. Headline, price, and copy are all examples of variables, and there are dozens more. Again, one key to successful A/B testing is knowing which one you want to single out.

A value is the specific option (content, size/amount, color, position) of the variable you are going to test.

The test Control (“A”) will be your current page, ad, or email.

The test Treatment or Treatments (“B” or “C”) are the pages, ads, or emails containing the new values of your variable.

Values become Treatments. When a new Treatment performs better (better than the Control, better than another Treatment if there is more than one), it becomes the new Control in subsequent testing.
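One way to split traffic between the Control and a Treatment is sketched below. It is only an illustration, not the method used in the tests described here: the function name `assign_variant` and the idea of keying on a visitor ID are assumptions. Hashing the ID keeps each visitor on the same version across visits while dividing traffic roughly evenly.

```python
import hashlib

def assign_variant(visitor_id: str, variants=("A", "B")) -> str:
    """Return the same variant for a given visitor on every call."""
    # Hash the (hypothetical) visitor ID so assignment is stable and
    # does not depend on arrival time or traffic source.
    digest = hashlib.sha256(visitor_id.encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]

# Example: "A" is the Control, "B" is the Treatment.
print(assign_variant("visitor-1001"))
```

For a test with two Treatments, pass `variants=("A", "B", "C")`.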

#3: Determine the right metrics.

Your main objective will be to determine the metrics that will best answer your primary research question (the “which?”), though secondary metrics may also provide insight.

First, establish what you are trying to accomplish. Are you trying to attract more subscribers? Capture more qualified leads? Obtain a higher conversion rate? Produce a better ROI on a pay–per–click (PPC) campaign?

After establishing what you want to achieve, choose the element you wish to test in order to select the appropriate metric. There are essentially four elements you can measure:

The amount of activity on your site,
The source of that activity,
The nature of that activity, and
The results of that activity.

Use organizing questions to help you choose the appropriate metric; for example:

Question	Metric
Who visited my page?	Number of unique visitors (UV)
Where did they come from?	Referring URL, organic search, PPC ad
What did they do/view?	Page views per visitor; clicks
What did they buy or sign up for?	Number of orders, average amount, total revenue
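The organizing questions above map directly onto simple calculations. The sketch below uses an invented visit log (the field layout and the values are made up for illustration) to show how each metric could be derived:

```python
from collections import Counter

# Invented visit log: (visitor_id, source, pages_viewed, order_amount)
visits = [
    ("v1", "ppc", 3, 0.0),
    ("v2", "organic", 1, 0.0),
    ("v1", "ppc", 2, 69.95),
    ("v3", "organic", 4, 69.95),
]

# Who visited my page?
unique_visitors = len({visitor for visitor, _, _, _ in visits})

# Where did they come from?
sources = Counter(source for _, source, _, _ in visits)

# What did they do/view?
page_views_per_visitor = sum(pages for _, _, pages, _ in visits) / unique_visitors

# What did they buy or sign up for?
orders = [amount for _, _, _, amount in visits if amount > 0]
total_revenue = sum(orders)
average_order = total_revenue / len(orders)
revenue_per_unique_visitor = total_revenue / unique_visitors

print(unique_visitors, dict(sources), len(orders), round(total_revenue, 2))
```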

#4: Check validity, starting with sample size sufficiency.

For confidence in your findings, test results must be statistically meaningful. In other words, there has to have been enough difference in results to declare a clear winner.

Typically, researchers will tolerate no more than a 5% chance that the test results are due to random variation. This is referred to as a “5% significance level” or a “95% confidence level.” For example, if your test shows a difference in clickthrough rate of 0.5%, you would want to be at least 95% confident that the difference is real rather than the product of random variation.
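A common way to check the 5% threshold for two conversion rates is a two-proportion z-test. The sketch below is one standard approach, not necessarily the calculation used in the tests described here, and the visitor and conversion counts are invented:

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the difference between two conversion rates."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the assumption that A and B truly perform the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_error
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Invented example: Control converts 200 of 10,000; Treatment 250 of 10,000.
p_value = two_proportion_p_value(200, 10_000, 250, 10_000)
print(f"p-value: {p_value:.4f}")
print("Significant at the 5% level" if p_value < 0.05 else "Not significant")
```

A p-value below 0.05 corresponds to the “95% confidence level” described above.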

Key point: as sample size (n) increases, Margin of Error (E) declines.
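For a conversion rate, the usual normal-approximation formula is E = z * sqrt(p * (1 - p) / n), where p is the observed rate and z is about 1.96 at 95% confidence. The sketch below assumes that formula and an illustrative 3% rate to show how E shrinks as n grows:

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(rate, n, confidence=0.95):
    """Margin of error for an observed rate (normal approximation)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(rate * (1 - rate) / n)

# Illustrative 3% conversion rate at increasing sample sizes.
for n in (1_000, 10_000, 100_000):
    print(n, round(margin_of_error(0.03, n), 4))
```

Because n sits under a square root, cutting the margin of error in half takes roughly four times the sample.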

Check for four additional, major validity threats.

There are a number of threats to the validity of testing, but these four threats are common in online testing:

History effects,
Instrumentation effects,
Selection effects, and
Sampling distortion effects.

You must ensure that your test results have not been skewed or invalidated by these threats.

History effects. The effect on a test variable by an extraneous variable associated with the passage of time. Industry events, news events, and holidays are all examples of history effects that could skew your test results.

Instrumentation effects. The effect on a test variable by an extraneous variable associated with a change in the measuring instrument. Having a server go down in the middle of your test is one example of an instrumentation effect.

When reviewing your test protocol to exclude this threat, ask: Did anything happen to the technical environment, measurement tools, or instrumentation that could have significantly influenced the results?

Selection effects. The effect on a test variable by an extraneous variable associated with different types of subjects not being evenly distributed between experimental treatments.

Mixing traffic sources can skew results. A great landing page created for one specific channel (for example, a PPC ad with great copy tied to a specific search term) can do poorly when presented to visitors coming from a different PPC ad for a completely different product or service.

Sampling distortion effects. The effect on the test outcome caused by failing to collect a sufficient number of observations.

Depending on the amount of traffic your landing page normally gets, the number of email addresses you have to send your new email to, or the clicks you normally get on your existing PPC ad, it may take days, weeks, or even months to reach the number of observations necessary to achieve testing validity.
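A rough estimate of the required sample size and test duration can be sketched with the standard two-proportion formula. This is not the course tool mentioned below. The 80% power figure, the 3% baseline rate, the daily traffic number, and the reading of “0.5%” as an absolute difference are all assumptions made for illustration:

```python
from math import ceil
from statistics import NormalDist

def visitors_per_variant(base_rate, min_difference, confidence=0.95, power=0.80):
    """Approximate visitors needed in EACH variant to detect min_difference."""
    p1, p2 = base_rate, base_rate + min_difference
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

def days_to_complete(n_per_variant, daily_visitors, variants=2):
    """Days needed if daily traffic is split evenly across all variants."""
    return ceil(n_per_variant * variants / daily_visitors)

# Assumed inputs: 3% baseline rate, 0.5-point difference, 1,000 visitors a day.
n = visitors_per_variant(0.03, 0.005)
print(n, "visitors per variant;", days_to_complete(n, 1_000), "days")
```

Smaller differences or lower traffic lengthen the test quickly, because the required sample grows with the inverse square of the difference you want to detect.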

Editor’s note: For detailed instruction on validity and sample size, we recommend our Fundamentals of Online Testing course. A testing protocol tool that calculates sample size and estimates how long it will take to achieve it is included in the course.

#5: Correctly interpret the results.

If results are valid, determine the impact and decide what your next test should be. Remember, the “winning” Treatment becomes the Control for your next test.

If results are invalid or inconclusive, consider re–running the test, but keep in mind that even invalid or inconclusive tests can provide value when properly analyzed. Learn from your results:

Expertise is cumulative knowledge. Keep careful records of tests and results. When results are surprising, look for hidden sources of error or insight.
Lack of difference is meaningful. Knowing that the change you made to the variable(s) had little effect on performance is valuable. This may be the single most valuable insight you gain from an inconclusive test.
Try stratification. Try filtering the results, looking for patterns (days of the week, hours of the day, etc.). You may find a clear statistical “winner.”
Note alternative or secondary conclusions. Look for patterns of performance, even those that identify things that you definitely DON’T want to do.
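The stratification step above can be sketched as a simple grouping of results by segment. The records below are invented for illustration; in practice they would come from your analytics export:

```python
from collections import defaultdict

# Invented records: (weekday, variant, visitors, conversions)
records = [
    ("Mon", "A", 500, 15), ("Mon", "B", 500, 16),
    ("Sat", "A", 300, 6),  ("Sat", "B", 300, 15),
]

# Sum visitors and conversions for each (weekday, variant) segment.
totals = defaultdict(lambda: [0, 0])
for weekday, variant, visitors, conversions in records:
    totals[(weekday, variant)][0] += visitors
    totals[(weekday, variant)][1] += conversions

# Conversion rate per segment.
rates = {key: conv / vis for key, (vis, conv) in totals.items()}
for (weekday, variant), rate in sorted(rates.items()):
    print(weekday, variant, f"{rate:.1%}")
```

Each segment has a smaller sample than the full test, and checking many segments raises the odds of a chance “winner,” so treat a pattern found this way as a candidate for a follow-up test rather than a final result.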

Another option: Radical redesign

Sometimes simple tests are just not enough. If an A/B test is inconclusive, consider a radical redesign instead of testing the A/B values again. The results can help you single out areas with high ROI potential, which you can then optimize, test, and refine using the A/B format.

Let’s look at a very successful example.

We recommended a radical redesign of this Encyclopædia Britannica Online page:

To the untrained eye, this page looked very well designed, and it was if you looked at it from a purely graphic or copy–oriented standpoint.
But to our optimization experts it was a disaster: a poor eyepath, distracting use of the images, the offer’s bullet points disappearing into the right column, and more.

Here’s the radical redesign:

What you need to understand: This optimized offer page, along with other optimization steps taken in the subscription path, produced cumulative gains for our partner:

125% more free trials per unique visitor,
65% more paid subscriptions per unique visitor, and
53% increase in total revenue.

The radical redesign of the initial offer page was a critical piece of the overall subscription path tests.

Summary

Even small changes can result in a big impact. Test areas with the biggest potential return on investment (ROI) first.
Use a formal testing protocol to ensure all tests are performed consistently and accurately.
Ensure you have enough traffic and data so the test is valid and you have confidence in the results, but keep in mind that even invalid or inconclusive tests can provide value when properly analyzed.
Determine the impact of tested changes, apply your knowledge, and test again. Try radical redesign tests. Multivariable tests may produce positive results faster than changing one or two page elements and can help you identify the best elements for A/B testing.

Credits

Managing Editor: Hunter Boyle

Copy Editor: Frank Green

Writer: Peg Davis

Contributors: Flint McGlaughlin, Bob Kemper, Jimmy Ellis, Ana Diaz

HTML Designers: Cliff Rainer, Mel Harris

Email Designer: Holly Hicks

Special thanks to Joe Miller,
Encyclopædia Britannica Online