The Basics: An Introduction to Statistics
- Define and distinguish between independent and dependent variables
- Define and distinguish between nominal, ordinal and interval/ratio variables
- Acquaint yourself with the layout of SPSS and the various functions accessible through the toolbar
Variable: the basic unit of statistics
Population: every conceivable member of the group we're studying
Sample: a small subset of our population of interest
The basic unit of statistics is the variable. Variables are properties or characteristics of some event, object, person, place or thing that are measurable and can take on more than one value, or vary. There are several different ways to conceptualize variables in the wide world of statistics, and we'll talk about a few of them below.
Independent vs. Dependent Variables
One way to think of variables is in terms of being either independent or dependent. In the social sciences, independent variables are typically thought of as being the cause, and dependent variables are often seen as being the effect. The independent variable, in other words, affects the dependent variable in some way. When the CIA used LSD in Cold War-era mind control experiments, the amount of LSD ingested by the participant was the independent variable (cause) and his or her state of mind (i.e., the intensity and duration of his or her hallucinations) was the dependent variable (effect). This example also illustrates the importance of time order. In other words, the independent variable must precede, or come before, the dependent variable. Most of us would need to take some sort of mind-altering substance before we would start tasting sounds and/or seeing purple elephants dance on our ceilings, not the other way around.
Sometimes, however, the causal relationship between two variables can be unclear. Consider the relationship between religious affiliation and attitudes toward abortion. Which is the independent variable? Let's say, for example, that members of a given faith are opposed to abortion. Can we assume that the church's teachings influenced its members' attitudes toward abortion? Or did the members choose that church because they were already opposed to abortion? When two phenomena appear to be occurring at the same time, social scientists need to come up with their own theories about how the two variables interact. This means that a variable might be either a dependent or independent variable, depending on the particular theory of the social scientist. Other variables, however, are always going to be independent. Race is one such variable, as there are no social variables (income, education, religion, etc.) that determine an individual's race. It is an ascribed status.
Continuous vs. Discrete Variables
Continuous variables can take on any value within a range, including fractions and decimals, whereas discrete variables can take on only distinct, countable values. Height is an example of a continuous variable, while number of children is an example of a discrete variable. It's possible for someone to be 1.23 meters tall, but it's impossible for someone to have 1.23 children.
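The height-versus-children contrast can be sketched in a few lines of Python. This is only an illustration: the helper names `midpoint` and `is_valid_child_count` are made up for this example and are not part of SPSS or any dataset used in this course.

```python
def midpoint(a, b):
    """Return the value halfway between a and b."""
    return (a + b) / 2


def is_valid_child_count(value):
    """A count of children must be a whole number that is zero or greater."""
    return value >= 0 and float(value).is_integer()


# Continuous: between any two heights there is always another possible height.
height_between = midpoint(1.23, 1.24)  # a height in meters between the two

# Discrete: the value halfway between 1 and 2 children is not a valid count.
children_between = midpoint(1, 2)

print(height_between, is_valid_child_count(children_between))
```

The midpoint of two heights is still a height someone could have, while the midpoint of two child counts usually is not a count anyone could have.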
Levels of Measurement
Not all variables are created equal. Some only serve to distinguish one category from another, while others are numbers that represent an exact quantity. Being able to distinguish between the three levels of measurement is an important first step in determining the best way to analyze your data.
A nominal variable is a strictly categorical variable. Nominal variables describe categories that are qualitatively different from one another, but cannot be organized in any meaningful order. Examples of nominal variables include colors. Red and green are clearly different from one another, but they cannot be ranked in any objective way (red is not higher, bigger, or more "color" than green). Other examples include political party, gender, religion and race. Nominal variables are said to be the lowest level of measurement in that they do not contain much information and lend themselves to relatively few statistical analyses.
Like nominal variables, ordinal-level variables are composed of categories. Unlike those of nominal variables, however, the categories that make up an ordinal variable can be put in a logical order. Military ranks are a good example of ordinal variables. Captain is definitely higher than private, but (and this is a characteristic shared by all ordinal variables) the distance between the two can't be determined. In other words, you can't subtract a lieutenant from a captain and get a private. Another example of an ordinal variable is the Likert Scale, which is often used in surveys. Likert Scales will often provide a statement like "I think Nancy Grace seems nice" and ask the respondent to mark a category ranging from "Strongly Agree" to "Strongly Disagree." Ordinal variables allow for more detailed analyses than do nominal variables, but the fact that you can't do any real math with them (sergeant + sergeant = ???) is still fairly limiting.
Finally, we have our interval/ratio-level variables. An interval/ratio variable, also known as a scale variable, is a variable with an exact number or quantity attached. With these variables, we can compare values not only in terms of which is larger and smaller, but also in terms of how much larger or smaller one is compared to another. Height is an example of an interval/ratio variable. Someone who is 6 feet tall is exactly six inches taller than someone who is 5'6". Generally speaking, if you can add it, subtract it, multiply it or divide it, it's an interval/ratio variable. Having said that, you will occasionally come across numbers that must be treated as nominal data. Your student ID number is one such number, as are the numbers on the backs of football jerseys. Rather than representing a certain quantity, these numbers serve only to distinguish between individuals. In other words, they don't tell you anything about the distance between two values. You cannot subtract a quarterback (#8) from a wide receiver (#80) and get an offensive lineman (#72).
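As a rough illustration of which operations make sense at each level of measurement, here is a minimal Python sketch using only the standard library. The data values, the `RANK_ORDER` list (a simplified, assumed ordering of four ranks) and the `outranks` helper are all invented for this example.

```python
from statistics import mean, mode

# Nominal: categories with no inherent order. Counting and the mode make sense.
parties = ["Democrat", "Republican", "Independent", "Democrat"]
most_common_party = mode(parties)

# Numbers can be nominal too. Jersey numbers are stored as text here so that
# nobody is tempted to average them.
jersey_numbers = ["8", "80", "72"]

# Ordinal: categories with an order but no measurable distance between them.
# RANK_ORDER is a simplified, assumed ordering from lowest to highest.
RANK_ORDER = ["Private", "Sergeant", "Lieutenant", "Captain"]


def outranks(a, b):
    """Return True if rank a is higher than rank b."""
    return RANK_ORDER.index(a) > RANK_ORDER.index(b)


ranks = ["Private", "Captain", "Sergeant", "Sergeant", "Lieutenant"]
sorted_ranks = sorted(ranks, key=RANK_ORDER.index)
middle_rank = sorted_ranks[len(sorted_ranks) // 2]  # median of an odd-sized list

# Interval/ratio: real quantities, so differences and means are meaningful.
heights_in = [72, 66]  # 6'0" and 5'6", in inches
height_gap_in = heights_in[0] - heights_in[1]
mean_height_in = mean(heights_in)

print(most_common_party, middle_rank, height_gap_in, mean_height_in)
```

Each level adds to what the one before it allows: nominal data can be counted, ordinal data can also be sorted and compared, and interval/ratio data can also be subtracted and averaged.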
A Final Word on Variables
Before we move on, it is important to note that the different types of variables we've covered are not mutually exclusive. In other words, it's possible to conduct a study in which the independent variable is both discrete and nominal while the dependent variable is both continuous and interval/ratio.
Population vs. Sample
As far as social statistics are concerned, the term "population" refers to every possible member of a group we're studying, while the term "sample" refers to a small subset of that group. Suppose I'm interested in studying Americans' mean income. One way of doing that would be to collect income data from everyone in my population of interest: all 300 million Americans. Since getting every single man, woman and child in the United States to fill out a survey about their income would be all but impossible, it would make more sense to gather income data from a much smaller group (i.e., a sample) and then use that data to estimate the mean income of all Americans (i.e., a population). The people in my sample are sometimes referred to as either "cases" or "observations." The process of using sample data to estimate the characteristics of a population is something with which you'll become very comfortable during our section on inferential statistics.
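The idea of estimating a population mean from a sample can be sketched with a small simulation. Everything below is hypothetical: the "population" is 100,000 randomly generated incomes rather than real data, and the distribution parameters, sample size and seed are arbitrary choices made for illustration.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so that reruns give the same numbers

# Hypothetical population: 100,000 made-up incomes (not real data).
population = [rng.lognormvariate(10.5, 0.8) for _ in range(100_000)]
population_mean = mean(population)

# Sample: 1,000 cases drawn at random, without replacement.
sample = rng.sample(population, 1_000)
sample_mean = mean(sample)

print(f"Population mean: {population_mean:,.0f}")
print(f"Sample mean:     {sample_mean:,.0f}")
```

In a real study the population mean is unknown, which is the whole reason for sampling. The simulation only lets us peek at it to see that a random sample of 1,000 cases lands close to the value for all 100,000.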
A variable is the basic unit of statistics. Variables must take on more than one value, and they must be measurable. Variables can be conceptualized in several different ways:
- Independent variable (cause) vs. dependent variable (effect)
- Discrete (can only take on a limited number of values) vs. continuous (can take on an infinite number of values)
- Levels of Measurement
  - Nominal (aka categorical) such as race, religion, favorite flavor of ice cream
  - Ordinal, such as military ranks and Likert scales (Strongly Agree, Agree, etc.)
  - Interval/Ratio (aka scale) such as income and GPA
- These different ways of looking at variables are NOT mutually exclusive. It is possible for a variable to be simultaneously independent, discrete and interval/ratio.
Use the NIS dataset to answer the following questions. Should you have any SPSS-related issues, please refer to the previous section, as most of the procedures necessary to answer these questions are addressed in the final video walkthrough.
- How many nominal-level variables are in the dataset? List them.
- What level of measurement is "ENGLISH"? How many categories does it have?
- How many years of education does respondent #63 have?
- Is respondent #17 male or female?