4 Threats that Make Email Testing Dangerous and How a Major Retailer Overcame Them


To test emails, you just send out two versions of the same email. The one with the most opens is the best one, right?


“There are way too many validity threats that can affect outcomes,” explained Matthew Hertzman, Senior Research Manager, MECLABS.

A validity threat is anything that can cause researchers to draw a wrong conclusion. Conducting marketing tests without taking them into account can easily result in costly marketing mistakes.

In fact, running invalid tests is far more dangerous than not testing at all.

“Those who neglect to test know the risk they’re taking and market their changes cautiously and with healthy trepidation,” explains Flint McGlaughlin, Managing Director and CEO, MECLABS, in his Online Testing Course. “Those who conduct invalid tests are blind to the risk they take and make their changes boldly and with an unhealthy sense of confidence.”

These are the validity threats that are most likely to impact marketing tests:

  • Instrumentation effects — The effect on a test variable caused by an external variable, which is associated with a change in the measurement instrument. In essence, how your software platform can skew results.
    • An example: 10,000 emails don’t get delivered because of a server malfunction.
  • History effects — The effect on a test variable made by an extraneous variable associated with the passing of time. In essence, how an event can affect test outcomes.
    • An example: There’s unexpected publicity around the product at the exact time you’re running the test.
  • Selection effects — An effect on a test variable by extraneous variables associated with the different types of subjects not being evenly distributed between treatments. In essence, there’s a fresh source of traffic that skews results.
    • An example: Another division runs a pay-per-click ad that directs traffic to your email’s landing page at the same time you’re running your test.
  • Sampling distortion effects — Failure to collect a sufficient sample size. Not enough people have participated in the test to provide a valid result. In essence, the more data you collect, the better.
    • An example: Determining that a test is valid based on 100 responses when you have a list with 100,000 contacts.
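
To make the sampling distortion point concrete, here is a minimal Python sketch of one common way to check whether a difference in open rates is larger than chance alone would explain: a two-proportion z-test. This is an illustration, not the method MECLABS used. The function name, the open counts and the 0.05 threshold are all assumptions chosen for the example.

```python
import math


def two_proportion_z_test(opens_a, sent_a, opens_b, sent_b):
    """Return (z, two-sided p-value) for the difference in open rates.

    Hypothetical helper for illustration; uses the pooled-proportion
    z-test, which assumes reasonably large, independent samples.
    """
    rate_a = opens_a / sent_a
    rate_b = opens_b / sent_b
    pooled = (opens_a + opens_b) / (sent_a + sent_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / sent_a + 1 / sent_b))
    if std_err == 0:
        raise ValueError("open rates of exactly 0% or 100% cannot be tested")
    z = (rate_a - rate_b) / std_err
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up numbers: the same 24% vs. 18% open rates at two sample sizes.
for label, sent in (("100 responses", 50), ("100,000 responses", 50_000)):
    opens_a = round(0.24 * sent)
    opens_b = round(0.18 * sent)
    z, p = two_proportion_z_test(opens_a, sent, opens_b, sent)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{label}: z = {z:.2f}, p = {p:.4f} ({verdict} at 0.05)")
```

With identical open rates, the small sample cannot separate the two versions from random noise, while the large one can. That is why picking a winner from 100 responses is risky even when one version looks clearly ahead.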


A look at a real-world example

Matthew had to mitigate these validity threats when he was charged with testing a series of holiday-themed emails for a major retailer. This meant he and his team had to complete 20 email tests in a month, with sends going to a list of more than 2 million. The short time frame would reduce the chances of history and selection effects and make it easier to identify instrumentation and sampling distortion effects.
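
A complementary safeguard against selection effects, one the case study does not describe, is to assign recipients to treatments at random so that different types of subscribers are spread evenly across versions. The Python sketch below shows one way to do that. The function name, the seed and the example addresses are hypothetical.

```python
import random


def split_into_treatments(recipients, n_treatments=2, seed=None):
    """Randomly divide recipients into n_treatments groups of near-equal size.

    Hypothetical helper for illustration. A fixed seed makes the split
    reproducible, which helps when auditing a test afterwards.
    """
    rng = random.Random(seed)
    shuffled = list(recipients)
    rng.shuffle(shuffled)
    # Deal the shuffled list out round-robin, so group sizes differ by at most one.
    return [shuffled[i::n_treatments] for i in range(n_treatments)]


# Made-up list: one group gets the countdown banner, the other does not.
addresses = [f"subscriber{i}@example.com" for i in range(10)]
with_banner, without_banner = split_into_treatments(addresses, seed=2015)
print(len(with_banner), len(without_banner))
```

Splitting a list alphabetically, by signup date or by region can put different kinds of subscribers in each treatment. Shuffling before the split avoids that.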

The goal of the tests was to find out:

  • When urgency mattered most — one set of emails had a countdown banner across the top, the other didn’t
  • Whether increasing email frequency would increase unsubscribes

“It was just incredible to see so many tests launched so quickly,” says Matthew.

He admitted it took detailed planning, strategizing and communication. Matthew’s team made sure the players who approve communication — including legal — were on board with everything, including:

  • Testing hypothesis
  • List of who would receive the emails
  • Products being promoted
  • Incentives
  • Time frame of the sends
  • Email content and designs
  • Potential insights

To achieve buy-in, they began the process several weeks in advance, and the planning paid off. Thanks to this intense effort, the retailer learned that:

  • Emails without a countdown banner performed better two weeks before the holidays, indicating when to lay off the urgent message.

“Test results indicated that prospects likely know the holidays are coming up fast, so we can replace the urgency message with something that would resonate better,” explains Hannah Morrell, Research Analyst, MECLABS, who worked on the project with Matthew.

  • The increased number of emails did not cause more people to unsubscribe from the mailing list during the most important season for retailers, but it did produce more sales.

Most importantly, because of Matthew’s efforts to ensure the highest validity by launching so many tests in a short time frame, the retailer can confidently use these insights to inform future holiday campaigns.


