What do you get when you divide Jacksonville Beach, Fla., by Arden Hills, Minn.? I’m sure there’s a punch line in there somewhere. But if you were tracking your customers’ ZIP codes in a database, you would get 32250/55112, or 0.585.
Never mind that it makes no sense to you or me to divide one ZIP code by another; a statistical software package is happy to do exactly that for us. Most software just isn’t smart enough to realize that each ZIP code is a label with its own discrete meaning, not a quantity. It sees numbers: values that can be sorted and used in any type of calculation.
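To see the problem in miniature, here is a small Python sketch (Python is simply our choice for illustration; the post doesn’t name any particular software). It shows that ZIP codes stored as integers divide without complaint, that integers quietly drop leading zeros, and that storing the same codes as text makes the division fail outright.

```python
# ZIP codes stored as integers: Python divides them without complaint.
jacksonville_beach = 32250
arden_hills = 55112
print(round(jacksonville_beach / arden_hills, 3))  # 0.585

# Integers also silently drop leading zeros, which many ZIP codes have.
print(int("02134"))  # 2134

# Stored as text labels, the same "calculation" is refused.
zip_a, zip_b = "32250", "55112"
refused = False
try:
    zip_a / zip_b
except TypeError:
    refused = True
print("division refused:", refused)  # division refused: True
```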
That is why researchers and statistical software packages classify variables into four main types: Nominal, Ordinal, Interval and Ratio.
In this post, I’m going to describe each type of variable to help you understand how they should be used, let you know how this can help improve your data collection … and, while we’re at it, help you sound sharp the next time you’re chatting with your data analyst at the water cooler.
Nominal Variables: Used to describe categories
Variables are classified by the structure of what they represent. ZIP codes, for example, are a Nominal variable: a categorical name that simply lets us differentiate between groups.
Gender and ethnic group are other common examples of this type. Only a limited number of statistical analyses are valid for Nominal variables. We can count how many customers have each ZIP code and compare the counts to see which is most common (statisticians call this most frequent value the Mode).
We cannot “average” ZIP codes to determine a population center, or calculate a correlation between ZIP code and a customer satisfaction index, because a “higher” or “lower” ZIP code has no real numerical meaning.
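Counting and finding the Mode are the analyses that remain valid for Nominal data. A minimal Python sketch using only the standard library, with invented customer ZIP codes stored as strings because they are labels rather than quantities:

```python
from collections import Counter
from statistics import mode

# Hypothetical customer ZIP codes, kept as text because they are category labels.
customer_zips = ["32250", "55112", "32250", "10001", "32250", "55112"]

# Frequency of each category, most common first.
counts = Counter(customer_zips)
print(counts.most_common())  # [('32250', 3), ('55112', 2), ('10001', 1)]

# The Mode works on non-numeric data, so it needs no arithmetic on the labels.
print(mode(customer_zips))  # 32250
```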
If we wanted to know about geographic patterns in customer satisfaction, we would have to take the average satisfaction index for each ZIP code and compare those averages to one another. Browser type and operating system are two other common Nominal variables.
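The per-ZIP comparison described above can be sketched as a simple group-by. The ZIP codes and satisfaction scores below are made up, and the sketch assumes the satisfaction index itself is a numeric (Interval or Ratio) measure, so averaging it within each group is meaningful; the ZIP code is used only as a grouping label.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (ZIP code, satisfaction index) pairs.
responses = [
    ("32250", 8.0), ("55112", 6.5), ("32250", 7.0),
    ("55112", 7.5), ("10001", 9.0),
]

# Group the satisfaction scores by ZIP code; the ZIP is never used in arithmetic.
by_zip = defaultdict(list)
for zip_code, score in responses:
    by_zip[zip_code].append(score)

# Average within each ZIP code, then compare those averages, highest first.
avg_by_zip = {z: mean(scores) for z, scores in by_zip.items()}
for zip_code, avg in sorted(avg_by_zip.items(), key=lambda kv: kv[1], reverse=True):
    print(zip_code, round(avg, 2))
```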
Word of caution – This first one seems obvious, but it is an easy oversight: a number in a spreadsheet or database can inadvertently become part of a calculation.
Ordinal Variables: Used to rank preference
The next level of complexity is the Ordinal variable. Ordinal variables are sequential: they advance in a direction, but the increments between steps on the scale are unknown or uneven.
For example, the organizational chart of a company might show that the mailroom attendant is below the marketing analyst, and he in turn is below the vice president, who is below the president. There is a clear direction, but the relationship between ranks is not consistent.
In marketing research, consumers sometimes rank new products in order of preference. A consumer who ranks product 1 above product 2 does not necessarily like it twice as much, and the gap between products 1 and 2 need not match the gap between products 3 and 4. So when analyzing the data from such a test, a researcher can find the Mode or the middle-ranked item (the Median), but it is not valid to calculate the “average rating” given to an item. Because the distance between positions on the scale is unknown, an average has no clear meaning.
Ordinal data can be transformed, but only with order-preserving operations (such as adding a constant, or multiplying by a positive constant), and the same transformation must be applied to every item in the data set. That keeps the order of all members of the data set intact; the sizes of the gaps between ranks still carry no meaning.
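A short Python sketch of the Mode and Median for ranking data, using invented ranks. One design choice worth noting: `statistics.median` averages the two middle values when the count is even (here it would return 1.5, which is not a rank anyone gave and quietly assumes equal spacing), so the sketch uses `median_low` and `median_high`, which always return an actual observed rank.

```python
from statistics import median_high, median_low, mode

# Hypothetical ranks that six consumers gave "Product A" (1 = most preferred).
ranks_product_a = [1, 1, 1, 2, 3, 4]

print(mode(ranks_product_a))         # 1
# With an even count there are two middle ranks; report them rather than averaging.
print(median_low(ranks_product_a))   # 1
print(median_high(ranks_product_a))  # 2
```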
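The rule that a transformation must preserve order and be applied to every value can be checked directly. In this sketch the ordinal codes are invented and `recode` is a hypothetical helper: any strictly increasing function would do. Applying it to all values leaves the ordering of the items and the identity of the median item unchanged.

```python
from statistics import median_low

# Hypothetical ordinal codes for five responses (higher = more preferred).
codes = [3, 1, 4, 1, 5]

def recode(x):
    """A strictly increasing (order-preserving) recoding, applied to every value."""
    return 10 * x + 7

recoded = [recode(c) for c in codes]

# Sorting positions by value gives the same order before and after recoding.
order_before = sorted(range(len(codes)), key=lambda i: codes[i])
order_after = sorted(range(len(recoded)), key=lambda i: recoded[i])
print(order_before == order_after)  # True

# The median item is the same item, just relabeled.
print(recode(median_low(codes)) == median_low(recoded))  # True
```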
Word of caution – One common survey scale is the Likert scale, which allows respondents to rate their agreement with statements on a 5- or 7-point scale from “Strongly Agree” to “Strongly Disagree.” Because there is no way to know the difference between “Strongly Agree” and “Agree” in the mind of each respondent, or to ensure that each respondent is consistent in their judgments, these results are Ordinal data.
Many research studies treat Ordinal data as Interval data (more on that next), making the basic and sometimes flawed assumption that the scale has a consistent interval between one ranking and the next. Each individual may be relatively consistent in their own ratings, but there is no guarantee of consistency between individuals. This limits how far the results of such calculations can be generalized, although the analysis may still offer useful insights into your data. Treat the results as imprecise and interpret them broadly, rather than by comparing small differences.
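Why the equal-interval assumption matters can be shown with a small Python sketch. The Likert answers and both codings are invented for illustration: each coding keeps the five answer options in the same order, so neither is more “correct” for Ordinal data, yet they disagree about which statement has the higher mean. The comparison of medians does not change.

```python
from statistics import mean, median

# Hypothetical 5-point Likert answers, 1 = Strongly Disagree ... 5 = Strongly Agree.
item_x = [1, 5, 5]
item_y = [4, 4, 4]

# Two order-preserving codings of the same answers.
equal_spacing = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5}
stretched_top = {1: 1, 2: 2, 3: 3, 4: 4, 5: 10}  # assumes a big jump to "Strongly Agree"

def compare(stat, coding):
    """Return which item the given statistic favors under the given coding."""
    x = stat([coding[r] for r in item_x])
    y = stat([coding[r] for r in item_y])
    return "X" if x > y else "Y" if y > x else "tie"

for name, coding in [("equal", equal_spacing), ("stretched", stretched_top)]:
    print(name, "mean favors", compare(mean, coding), "| median favors", compare(median, coding))
```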
Interval and Ratio Data: Now we can get into the valuable number crunching
Both Interval and Ratio variables possess not only a sequence, but an even interval. Here’s where it gets tricky: the difference between the two types is zero. Yes, 0.
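Interval variables may have a point we designate “zero,” but that zero is an arbitrary reference point rather than the absence of the quantity, so values below it (negative numbers) are theoretically possible.
A Ratio variable has a true zero point, a point nothing can fall below, which is also why ratios between its values are meaningful: 10 page visits really is twice as many as 5.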
For example, an item’s price can be zero, or “free,” but by this reasoning price is not a Ratio variable. Why? Because a negative price, such as -$1.99, is conceptually possible. Take German government bonds: in a recent auction, the bonds yielded negative 0.0122%.
We try never to pay our customers to purchase our products, but theoretically, negative price has meaning. Therefore, price is an Interval variable.
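The practical consequence of an arbitrary versus a true zero shows up when you take ratios. This Python sketch uses temperature, a standard textbook Interval example that is not from this post, alongside “Time on Page,” a Ratio variable mentioned below; the values are invented. Converting units changes the temperature ratio, but leaves the time ratio intact.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

# Interval scale: the zero of Celsius and of Fahrenheit is arbitrary.
warm_c, cool_c = 20.0, 10.0
print(warm_c / cool_c)                  # 2.0
print(c_to_f(warm_c) / c_to_f(cool_c))  # 1.36  (the "ratio" depends on the unit)

# Ratio scale: Time on Page has a true zero, so the ratio survives a change of units.
long_visit_s, short_visit_s = 120.0, 60.0
print(long_visit_s / short_visit_s)                # 2.0
print((long_visit_s / 60) / (short_visit_s / 60))  # 2.0
```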
Many true Ratio variables are found in marketing research; “Number of Page Visits” and “Time on Page” are common examples. The good news is that almost all statistical techniques used in marketing research can be applied to both Interval and Ratio data: Mean, Median, Mode, Correlation, Standard Deviation and ANOVA are equally valid with both. The main exception is any statement that relies on ratios between values, such as saying one value is twice another, which only makes sense for Ratio data.
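A Python sketch of these summaries on two Ratio variables. The session data is invented, and the Pearson correlation is written out by hand (rather than using `statistics.correlation`, which only exists from Python 3.10) so the example runs on older versions too.

```python
from math import sqrt
from statistics import mean, median, mode, stdev

# Hypothetical data for five sessions; both variables have a true zero.
page_visits = [2, 4, 4, 5, 10]
time_on_page_s = [40, 75, 90, 80, 200]

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

print("mean visits:", mean(page_visits))
print("median visits:", median(page_visits))  # 4
print("mode visits:", mode(page_visits))      # 4
print("stdev visits:", stdev(page_visits))    # 3.0
print("r(visits, time):", round(pearson(page_visits, time_on_page_s), 3))
```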
So what does this mean for you?
When you design your experiments, think about the type of variables you will be collecting data for. Interval and Ratio variables allow the most flexibility in statistical analysis, so whenever possible, collect them rather than Ordinal or Nominal data. For example, a survey question could ask, “Which of the following tasks have you undertaken in the last 24 hours?” which produces a multiple-choice, Nominal answer.
It could also ask, “Please rank these tasks from most to least recently undertaken,” which produces Ordinal data and allows some additional analysis.
Finally, the survey could ask, “At what time and date did you last undertake these tasks?” producing concrete Interval data that lets you compare respondents and run in-depth statistical analyses.
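One reason to prefer the richest question is that the weaker answers can be derived from it, but not the other way around. In this Python sketch the task names, timestamps and survey time are all invented: from the date-and-time answers we compute elapsed hours (Interval), the most-to-least-recent ranking (Ordinal), and the “last 24 hours” set (Nominal).

```python
from datetime import datetime

# Hypothetical answers to "At what time and date did you last undertake these tasks?"
last_done = {
    "checked email": datetime(2024, 5, 2, 8, 15),
    "paid a bill": datetime(2024, 5, 1, 19, 30),
    "posted on social media": datetime(2024, 4, 29, 12, 0),
}
survey_time = datetime(2024, 5, 2, 9, 0)

# Interval view: differences between times are meaningful (hours since each task).
hours_ago = {task: (survey_time - t).total_seconds() / 3600 for task, t in last_done.items()}

# Ordinal view: rank tasks from most to least recently undertaken.
ranking = sorted(last_done, key=last_done.get, reverse=True)

# Nominal view: which tasks were undertaken in the last 24 hours.
in_last_24h = {task for task, h in hours_ago.items() if h <= 24}

print(hours_ago)
print(ranking)
print(sorted(in_last_24h))
```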
In the design phase of your marketing tests, think about the statistical data you would like to produce, and what variable types are required to calculate the results you need in order to answer your research questions. When you enter your data into a statistical software package, be careful to designate the correct variable type in the software so that the program can prevent you from dividing Florida by Minnesota.
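Statistical packages each have their own way of recording a variable’s type. As a rough stand-in, this Python sketch tags data with a level of measurement and refuses summaries that the level does not support. `Level`, `MIN_LEVEL`, `is_allowed` and `summarize` are hypothetical names invented for this example, not part of any package, and the policy simply follows the rules described in this post.

```python
from enum import IntEnum
from statistics import mean, median_low, mode

class Level(IntEnum):
    """Levels of measurement, ordered from least to most analyzable."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4

# Hypothetical policy: the lowest level at which each summary makes sense.
MIN_LEVEL = {"mode": Level.NOMINAL, "median": Level.ORDINAL, "mean": Level.INTERVAL}
SUMMARIES = {"mode": mode, "median": median_low, "mean": mean}

def is_allowed(level, stat):
    """True if the summary is valid for data at this level of measurement."""
    return level >= MIN_LEVEL[stat]

def summarize(values, level, stat):
    """Compute a summary, refusing ones the level of measurement does not support."""
    if not is_allowed(level, stat):
        raise ValueError(f"{stat} is not valid for {level.name.lower()} data")
    return SUMMARIES[stat](values)

customer_zips = ["32250", "55112", "32250"]
print(summarize(customer_zips, Level.NOMINAL, "mode"))  # 32250
print(is_allowed(Level.NOMINAL, "mean"))                # False
```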