As I was reading a few LinkedIn discussions about multivariate testing (MVT), I began to wonder if 2010 was going to be the year of multivariate.
1,000,000 monkeys can’t be wrong
Multivariate Testing (MVT) is starting to earn a place in the pantheon of buzzwords like cloud computing, service-oriented architecture, and synergy. But is a test the same thing as an experiment? While I am not a statistician (nor did I stay at the Holiday Inn last night), working at MarketingExperiments with the analytical likes of Bob Kemper (MBA) and Arturo Silva Nava (MBA) has helped me understand the value of a disciplined approach to experimental design.
What I see out there is that a little knowledge is indeed a dangerous thing. Good intentions behind powerful and relatively easy-to-use platforms like Omniture® Test&Target™ and Google® Website Optimizer™ have generated a misleading sense that as long as a multivariate test is large enough (several hundred or more combinations being tested), at least one of the combinations will outperform the control.
This notion has become the value proposition of a growing number of companies offering services around either the big-name or their own (simpler, and often therefore easier to set up) MVT tools. They are ostensibly betting on the technology, and not on a systematic approach to experimental design or any particular UI/UX (user interface/user experience) optimization theory.
Even though, as Bob has pointed out to me, an MVT setup with a billion combinations may still reasonably fail to yield a lift over the control, my contention is that the risk-weighted business cost of a dissatisfied customer is low. Therefore, little stops the burgeoning MVT shops from safely offering a “100% lift guarantee.” Just like the proverbial million monkeys with typewriters, somewhere among thousands of spray-and-pray treatments, their MVT tests are expected to produce one that beats the control.
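To see why that guarantee is nearly risk-free to offer, here is a rough sketch in Python. It treats each combination as an independent comparison against the control at a 5% significance level; real MVT combinations overlap, so take the numbers as an illustration, not a measurement.

```python
# Chance that at least one of n combinations "beats" the control
# purely by chance, assuming independent comparisons at alpha = 0.05.
alpha = 0.05  # per-comparison false-positive rate

for n_combinations in (1, 10, 100, 800):
    p_spurious_winner = 1 - (1 - alpha) ** n_combinations
    print(f"{n_combinations:>4} combinations -> "
          f"{p_spurious_winner:.1%} chance of a spurious 'winner'")
```

With several hundred combinations, a “winner” of some kind is all but mathematically assured, which is exactly what makes the guarantee so cheap to extend.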
1 monkey with a stick
One major difficulty with testing in general becomes painfully obvious with MVT: the more treatments, the longer the test will run. For most companies, what looks at first like a great test may require a year’s worth of traffic to get statistically valid results.
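To put rough numbers on that, here is an illustrative back-of-the-envelope calculation. The traffic and conversion figures are hypothetical, and the formula is the standard two-proportion sample-size approximation (5% significance, 80% power), not any particular tool’s math.

```python
import math

def samples_per_treatment(p_control, p_treatment, z_alpha=1.96, z_beta=0.84):
    """Approximate visitors needed per treatment to detect the given lift."""
    p_bar = (p_control + p_treatment) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p_control * (1 - p_control)
                                      + p_treatment * (1 - p_treatment))) ** 2
    return math.ceil(numerator / (p_treatment - p_control) ** 2)

# Detecting a 20% relative lift on a 3% conversion rate
n = samples_per_treatment(p_control=0.03, p_treatment=0.036)
daily_visitors = 5000  # hypothetical site traffic

for treatments in (2, 8, 64, 512):
    days = treatments * n / daily_visitors
    print(f"{treatments:>3} treatments: ~{treatments * n:,} visitors, ~{days:,.0f} days")
```

At roughly 14,000 visitors per treatment, a 512-combination test on this hypothetical site would need close to four years of traffic, while a simple A/B split finishes in under a week.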
In response, one emerging MVT service model offers a faster path to a “lift” through adaptive elimination of likely underperformers, at the cost of test results that provide little information beyond identifying the winner. Such results are not as useful as their full-factorial brethren for designing subsequent tests, because eliminating treatments along the way makes it difficult to extrapolate the psychological factors and consumer preferences responsible for the outcome. The business benefits, however, arrive sooner.
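For illustration only, here is a toy version of the adaptive idea; the elimination rule and the conversion rates are invented, and commercial tools use considerably more sophisticated bandit-style mathematics.

```python
import random

random.seed(42)
true_rates = [0.030, 0.031, 0.028, 0.035, 0.029, 0.033]  # unknown in practice
arms = {i: {"visits": 0, "conversions": 0} for i in range(len(true_rates))}

for round_no in range(4):
    for _ in range(2000):  # this round's traffic, split across survivors
        arm = random.choice(list(arms))
        arms[arm]["visits"] += 1
        arms[arm]["conversions"] += random.random() < true_rates[arm]
    if len(arms) > 1:
        # Drop the arm with the lowest observed conversion rate
        worst = min(arms, key=lambda a: arms[a]["conversions"] / max(arms[a]["visits"], 1))
        del arms[worst]
        print(f"round {round_no}: eliminated treatment {worst}")

print("surviving treatment(s):", list(arms))
```

Note that each eliminated arm carries only a few hundred observations: enough to justify dropping it, but far too little to explain why it lost, which is precisely the trade-off described above.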
So, where exactly is the problem? As marketers, are we in the business of employing the scientific method to design graceful experiments or is our fiduciary duty to get measurable results? I humbly suggest that as marketing professionals, we should neither bet on nor be satisfied with just one test, no matter how successful it is.
The bad news and the good news is that we must design an experimental plan to optimize continually, to learn from preceding test results, and to respond to changes in customer preferences, market conditions, and our ability to segment data and traffic. Expertise in experimental design and understanding how to interpret results simply cannot be replaced by set-it-and-forget-it technology (yet).
Economy of testing
That is not to say that MVT provides incorrect results. The results are mathematically valid, even if they do require a long time to obtain. At the same time, from the business point of view, investment in experimental design expertise is expensive. Understanding volumes of published research consumes valuable time. The 100% guarantee sure sounds good.
And so the “guaranteed lift” offers will appeal to budget-minded marketers who have yet to delve into the science of optimization. The critical issue in the economy of testing is whether methodical design of experiments is likely to provide greater ROI through an interpretation-driven sequence of test iterations than a successful but dead-end one-off test. Our research supports the former.
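As a purely illustrative piece of arithmetic (the lift figures below are invented), even modest iterative wins compound quickly against a single larger one-off win:

```python
one_off_lift = 0.25  # hypothetical single "guaranteed-lift" test
iterative_lifts = [0.08, 0.10, 0.07, 0.09]  # hypothetical test sequence

compounded = 1.0
for lift in iterative_lifts:
    compounded *= 1 + lift  # each test starts from the previous winner

print(f"one-off:   {one_off_lift:.0%} total lift")
print(f"iterative: {compounded - 1:.0%} total lift over {len(iterative_lifts)} cycles")
```

Four unremarkable wins in sequence outperform the one-off, and, unlike the one-off, they leave behind four rounds’ worth of interpretable results to build on.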
2010 may become the year of multivariate, but I hope that it will also quietly set the stage for an upcoming year of ROI-conscious design of experiments.
How do you use multivariate testing? Have you created an experimentation plan or do you rely on a series of one-off tests? Share your triumphs and concerns in the comments section of this post or start a conversation with your peers in the MarketingExperiments Optimization group.
We haven’t done much with MVT, but we have been experimenting quite a bit with A/B-style tests. We haven’t gotten any major ‘wins’, but we have learned a lot about how our site users react to different recipes on our pages. I think this is valuable and will be put to good use as we redesign our site.
Your note about guaranteed lift – while A/B testing is less complex than MVT, I would think the same basics apply. None of these tools are magical: if you put garbage in, you’ll get garbage out. We spend a lot of time planning our tests before launching.
@Andrea
Great point about learning how your site users behave: in the not-too-long run, this is just as valuable as an immediate lift, if you use this knowledge to develop subsequent tests.
Immediate “wins” are not a bad thing, of course. However, we have seen many times that an iterative test cycle gets a better return on the effort than independent one-off tests do. The exception is when the site has glaring problems and, for business reasons, it makes sense to fix them individually as quickly as possible before returning for less drastic improvements.
In any given test, we consider the results “good” as long as we learn something from them, not necessarily because we improved a particular metric.
I am troubled when I read a promotional message such as this ‘article’. A senior manager of ‘research’ should know that research is a tool to assist in obtaining understanding, which should lead to better decisions – there is no ‘guarantee of a lift’. Statistics can be elegant and/or complex, but they never guarantee business results. They are a set of tools that are useful in restricted circumstances. Using the wrong statistical technique is analogous to using a screwdriver to drive a nail – it doesn’t provide a good outcome. Statistics can be used (or misused) to reveal almost anything if the desired outcome is to show a predetermined result – but that is abusing what statistics are designed to do.
For myself, I am a veteran of over 25 years in business. A few years ago I completed 12 years in a university setting, earning my PhD and learning a lot about the appropriate use of research and statistics (I also ran an MBA program!). That experience taught me that business abuses statistics and that many marketers have little appreciation of the process of creating research that produces meaningful outcomes.
MVT (Multivariate Testing), or more correctly MVA (multivariate analysis), is basically using multiple variables (factors) simultaneously in deriving a result. There are many multivariate statistical techniques, each suitable for different research questions and desired answers. MVA is not directly related to volume, as implied in this article. Volume depends on the MVA technique and the number of variables, and the volume per variable depends on a number of other factors, so a direct correlation between MVA and volume is simply incorrect. MVA does, however, require larger volumes than single-variable analysis.
If this explanation is not clear to you, please ask someone who does understand. An MBA is not a statistics degree and is an incredibly inappropriate qualification to be used to justify MVT as the next best thing for business. If we follow this strategy, some ‘MVT expert’ consultants will earn a lot of money and businesses will be left holding an empty bag and someone’s career may be on the line. Worse, statistics will get a ‘black eye’ when it is an innocent victim of misuse. This article is, in my humble opinion, erroneous and in its simplicity, very misleading.
@Dr Ashley Lye
Thank you for your thoughtful comments. Having read them carefully, I found myself (and all the respective parts of my original post) in agreement with their substance.
Both you and I, for example, argue that there is no “guaranteed lift.” In fact, one of the key points of my original post is to suggest that marketers should not believe “guaranteed lift” claims. Another example is that we agree that MVT, nomenclature aside, will generally require a larger total number of samples (as you state, “MVA does require larger volumes than single variable analysis”), which results in requiring a longer time to test, given a fixed amount of traffic. We also share a concern about the misuse of statistics in marketing—in fact, I hope that would be one way to summarize my blog post in a single phrase.
At MarketingExperiments, experimental design and statistical analysis are critical to what we do, and I think the tone of my blog post should be put in the right perspective, so that it doesn’t undermine the scientific rigor of our work. The balance of my reply below is therefore aimed at the potential misunderstandings I may have created, since I am otherwise in agreement with your arguments.
The blog post was not meant as an academic research paper, nor did I think it reasonable to expect our readers to perceive it as one. It’s decidedly a brief opinion piece meant to give fellow marketing professionals some food for thought and to share our experiences.
Since we are a for-profit business entity, a division of MECLABS LLC, I suppose it’s fair to construe everything we publish as “promotional,” since it has a business purpose. In fact, I teach at workshops about using content marketing (http://en.wikipedia.org/wiki/Content_marketing) as a sophisticated, yet transparent strategy to establish credibility for services enterprises by sharing useful information.
I had expected that it wouldn’t be surprising that real research is being conducted under a commercial roof. Research, by a common definition, is simply a systematic investigation to establish facts. This is exactly what MEx is primarily occupied with, and you can typically find the results of our research published on our site (https://www.marketingexperiments.com). In addition to primary research, we do offer training and consultative services based on what we have learned.
It appears also that I should have clarified the language I was using. I was indeed talking about MVT, the marketing practice (e.g., here’s an MVT tool from Omniture®: http://www.omniture.com/en/products/conversion/testandtarget), not MVA, the statistical tool. Here’s a quick primer about MVT as marketers understand it: http://en.wikipedia.org/wiki/Multivariate_testing.
Lastly, I wanted to thank you for drawing attention to my poorly constructed reference to Bob and Arturo’s credentials. I had reduced their expertise to a single acronym as a silly joke, under the assumption that our readers were largely aware that completing an MBA program (this I know first-hand), or even holding a teaching position, is not itself proof of expertise in statistics (or in any field, for that matter). I certainly didn’t fully represent their expertise or academic backgrounds, not to mention their decades of real business successes achieved through the application of that knowledge.
Again, thank you for sharing your thoughts, and I look forward to your continued participation on our blog.
P.S. Forgive the Wikipedia references. I simply used this peer-reviewed medium to define terms as professional marketers (our primary audience) use them.