Bivariate Tables and Cross-Tabulation

Learning Objectives

  1. Construct a bivariate table
  2. Understand and distinguish between direct, indirect, spurious and conditional relationships
  3. Familiarize yourself with the concept of elaboration including its uses and limitations

Key Terms

Bivariate table: a table that illustrates the relationship between two variables by displaying the distribution of one variable across the categories of a second variable
Cross-tabulation: a technique used to explore the relationship between two variables that have been organized in a table
Column variable: a variable whose categories comprise the columns of a bivariate table
Row variable: a variable whose categories comprise the rows of a bivariate table
Cell: the intersection of a row and a column in a bivariate table
Marginals: the row and column totals in a bivariate table

Overview

Cross-tabulation allows us to look at the relationship between two variables by organizing them in a table. This is a form of bivariate analysis. The easiest, most straightforward way of conducting bivariate analysis is by constructing a bivariate table. We generally describe bivariate tables in terms of their rows and columns: a table with two rows and two columns is a 2 x 2 table. By convention, the independent variable is usually placed in the columns and the dependent variable is placed in the rows. Rows and columns intersect at cells. The row totals are found along the right side, and the column totals are found along the bottom. These totals are called marginals.

Bivariate analysis allows us to answer two questions:

  1. Is there a relationship between the two variables?
  2. If so, what is the pattern or direction of the relationship?

In the example below, we are going to see if there is a relationship between the authoritarianism of bosses and the efficiency of the workers in 44 different offices. In other words, we're going to see if there is a relationship between how big of a jerk a given boss is and how hard his or her employees work. We've broken the bosses into two categories: low authoritarianism (totally chill) and high authoritarianism (overbearing jerk). Similarly, we've broken down the workers according to efficiency (high and low).

Worker Efficiency and Workplace Authoritarianism

                | Low Authoritarianism | High Authoritarianism | Total
Low Efficiency  | 10                   | 12                    | 22
High Efficiency | 17                   | 5                     | 22
Total           | 27                   | 17                    | 44
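
For readers who would rather see the tallying done in code, here is a minimal Python sketch that rebuilds the table above from individual observations. The list of records is simply expanded from the cell counts in the table, and the names records, cells, row_totals, and col_totals are illustrative choices, not part of any particular statistics package.

```python
from collections import Counter

# One (efficiency, authoritarianism) pair per observation, expanded from
# the cell counts in the table above.
records = (
    [("Low", "Low")] * 10      # low efficiency, low authoritarianism
    + [("Low", "High")] * 12   # low efficiency, high authoritarianism
    + [("High", "Low")] * 17   # high efficiency, low authoritarianism
    + [("High", "High")] * 5   # high efficiency, high authoritarianism
)

# Cells: count each (row category, column category) combination.
cells = Counter(records)

# Marginals: tally the row variable and the column variable separately.
row_totals = Counter(efficiency for efficiency, _ in records)
col_totals = Counter(authoritarianism for _, authoritarianism in records)

print("High efficiency, low authoritarianism:", cells[("High", "Low")])
print("Row totals:", dict(row_totals))
print("Column totals:", dict(col_totals))
print("Grand total:", len(records))
```

Each cell is a count of one row category paired with one column category, and the marginals are just the counts of each variable taken on its own.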

Since the bosses' authoritarianism is our independent variable, we put that in the columns. Employee efficiency goes in the rows. The row and column totals are displayed in the respective marginals. Displaying our data as raw counts is all well and good, but the two groups differ in size (27 employees work in low authoritarianism environments compared to 17 in high authoritarianism environments), which makes direct comparison of the counts misleading. In order to make legitimate comparisons between the two groups, we need to calculate the relative frequency for each (also known as the column percentages). We always calculate percentages within the categories of the independent variable, which by convention sits in the columns.

Let's calculate column percentages for the employees in low authoritarian environments first. There are a total of 27, with 10 falling into the low efficiency category and 17 falling into the high efficiency category. In order to figure out percentages, we need to divide each cell count (10 and 17) by the column total (27).

10/27 = 0.3704; 0.3704 * 100 = 37.04 percent
17/27 = 0.6296; 0.6296 * 100 = 62.96 percent

These numbers tell us that of the 27 employees who work in low authoritarian environments, more than 60 percent of them are highly efficient workers. Now let's do the same thing for the employees in high authoritarian environments:

12/17 = 0.7059; 0.7059 * 100 = 70.59 percent
5/17 = 0.2941; 0.2941 * 100 = 29.41 percent

Worker Efficiency and Workplace Authoritarianism

                | Low Authoritarianism | High Authoritarianism | Total
Low Efficiency  | 10 (37.04%)          | 12 (70.59%)           | 22 (50%)
High Efficiency | 17 (62.96%)          | 5 (29.41%)            | 22 (50%)
Total           | 27 (100%)            | 17 (100%)             | 44 (100%)

Notice that the percentages in our columns add up to 100 percent. If that's not the case, we've done something wrong.
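
The column percentage arithmetic and the add-up-to-100 check can also be sketched in a few lines of Python. This is only an illustration using the counts from the table; the function name column_percentages and the dictionary layout are assumptions made for the example.

```python
# Cell counts from the table: rows are efficiency, columns are authoritarianism.
counts = {
    "Low Efficiency":  {"Low Authoritarianism": 10, "High Authoritarianism": 12},
    "High Efficiency": {"Low Authoritarianism": 17, "High Authoritarianism": 5},
}


def column_percentages(table):
    """Divide each cell by its column total and multiply by 100."""
    columns = list(next(iter(table.values())))
    col_totals = {c: sum(row[c] for row in table.values()) for c in columns}
    return {
        r: {c: round(100 * row[c] / col_totals[c], 2) for c in columns}
        for r, row in table.items()
    }


percentages = column_percentages(counts)

for column in ["Low Authoritarianism", "High Authoritarianism"]:
    column_sum = sum(percentages[r][column] for r in percentages)
    # Allow a small tolerance because each percentage was rounded.
    assert abs(column_sum - 100) < 0.05
    print(column, [percentages[r][column] for r in percentages])
```

The assertion inside the loop is the same sanity check described above: if a column's percentages do not sum to (roughly) 100, something went wrong in the division.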

Of course, the relationships between variables usually aren't that simple. On the rare occasions where an independent and dependent variable are the only two variables explaining a relationship, it is said that the relationship is direct. In most cases, however, there are other factors that affect the relationship between the independent and dependent variables. Such relationships are said to be indirect.

Two examples of indirect relationships are spurious relationships and intervening relationships. A spurious relationship is a relationship in which both the independent and dependent variables are affected by a third variable that explains away any apparent link between them. Think about the relationship between firefighters and property damage. If data indicated that the number of firefighters sent to a fire was positively correlated with property damage (i.e. lots of firefighters = lots of property damage), we might be tempted to conclude that firefighters cause property damage. But we know there is a third variable with which the number of firefighters and the amount of damage are both correlated: the size of the fire. This relationship is spurious because the size of the fire affects both the number of firefighters called and the property damage.

An intervening relationship occurs when a third variable comes between the independent and dependent variables and functions almost like a chain reaction. In such a scenario, the independent variable affects a mediating variable, which in turn affects the dependent variable. Consider the relationship between education and longevity. Quite a few studies have established a strong correlation between an individual's education level and how long he or she lives. What's less clear, however, is why. It's possible that the relationship between these two variables is direct, such that highly educated people make better decisions regarding their health. But it's also possible that the relationship in question is indirect. In that case, an individual's level of education could affect his or her income, which could then affect his or her health. One way to determine which of these two explanations fits better would be to control for income. If, for instance, we were to compare only individuals with the same level of income and the relationship between education and longevity were to disappear, we would have good reason to conclude that this is an example of an indirect (or intervening) relationship.

Elaboration

Elaboration is a process designed to further explore bivariate relationships by introducing additional variables called control variables. The data below come from 20 fires that the fire department was called in to put out. Ten were small fires, and 10 were large. By looking at the percentages, we might be tempted to conclude that the firefighters caused the property damage.

Number of Firefighters Called and Damage Caused (n=20)

            | Few Firefighters | Many Firefighters | Total
Low Damage  | 7 (70%)          | 3 (30%)           | 10
High Damage | 3 (30%)          | 7 (70%)           | 10
Total       | 10 (100%)        | 10 (100%)         | 20

We can elaborate on our data by controlling for the size of the fire. To accomplish this, we need to construct two partial tables based on the size of the fire. All of the small fires go in one table, while all of the large fires go in the other. Note that the independent and dependent variables remain the same throughout the elaboration process.

Small Fires (n=10)

            | Few Firefighters | Many Firefighters | Total
Low Damage  | 4 (80%)          | 5 (100%)          | 9
High Damage | 1 (20%)          | 0 (0%)            | 1
Total       | 5 (100%)         | 5 (100%)          | 10

Large Fires (n=10)

            | Few Firefighters | Many Firefighters | Total
Low Damage  | 0 (0%)           | 1 (20%)           | 1
High Damage | 5 (100%)         | 4 (80%)           | 9
Total       | 5 (100%)         | 5 (100%)          | 10

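By splitting our one large table into two smaller tables based on the size of the fire, we can see there is no direct causal relationship between the number of firefighters and property damage. Among small fires, damage is almost always low no matter how many firefighters were called; among large fires, damage is almost always high. The pattern from the original table does not hold up once fire size is held constant, because the size of the fire affects both variables.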
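
As a rough illustration of what controlling for a third variable looks like in code, the Python sketch below takes the two partial tables exactly as printed above and computes the column percentage of high-damage fires within each. The function name percent_high_damage and the nested-dictionary layout are assumptions made for this example.

```python
# Partial tables from the text, one per category of the control variable
# (fire size). Rows are damage; columns are number of firefighters.
partial_tables = {
    "Small fires": {
        "Low Damage":  {"Few": 4, "Many": 5},
        "High Damage": {"Few": 1, "Many": 0},
    },
    "Large fires": {
        "Low Damage":  {"Few": 0, "Many": 1},
        "High Damage": {"Few": 5, "Many": 4},
    },
}


def percent_high_damage(table):
    """Return the column percentage of high-damage fires for each column."""
    result = {}
    for column in ("Few", "Many"):
        column_total = sum(row[column] for row in table.values())
        result[column] = 100 * table["High Damage"][column] / column_total
    return result


high_damage = {size: percent_high_damage(t) for size, t in partial_tables.items()}

for size, pct in high_damage.items():
    print(f"{size}: few = {pct['Few']:.0f}%, many = {pct['Many']:.0f}% high damage")
```

The thing to look for is whether the percentages differ much across columns within a partial table, compared with how much they differ between the two partial tables.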

Limitations of Elaboration

Elaboration can be useful, but it also has its limitations. First, it tends to be a little bit tedious, especially if you're doing it by hand. Second, it's not the most precise form of analysis. Elaboration allows you to compare the distribution of one variable across the categories of another, but there are other measures of association that do a better job of quantifying the relationship between two variables.
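
The text does not single out a particular measure of association. As one common example for a 2 x 2 table, the Python sketch below computes the phi coefficient and the (uncorrected) chi-square statistic for the worker efficiency table from earlier in the chapter. The choice of these two measures and the variable names are assumptions made for illustration.

```python
import math

# 2 x 2 cell counts from the worker efficiency table earlier in the chapter.
a, b = 10, 12   # low efficiency:  low authoritarianism, high authoritarianism
c, d = 17, 5    # high efficiency: low authoritarianism, high authoritarianism

n = a + b + c + d
row_totals = (a + b, c + d)
col_totals = (a + c, b + d)

# Phi for a 2 x 2 table: (ad - bc) / sqrt(product of the four marginals).
phi = (a * d - b * c) / math.sqrt(
    row_totals[0] * row_totals[1] * col_totals[0] * col_totals[1]
)

# For a 2 x 2 table, the uncorrected chi-square statistic equals n * phi^2.
chi_square = n * phi ** 2

print(f"phi = {phi:.3f}, chi-square = {chi_square:.3f}")
```

The sign of phi depends on how the categories happen to be ordered in the table, so it is the magnitude that speaks to the strength of the relationship.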

Main Points

  • A bivariate table displays the distribution of one variable across the categories of another variable. The independent variable usually goes in the columns, while the dependent variable goes in the rows. Rows and columns intersect at cells. The row and column totals of a bivariate table are called marginals.
  • Bivariate relationships come in several different flavors. When the variation in the dependent variable can be attributed only to the independent variable, the relationship is said to be direct. When a third variable affects both the independent and dependent variables (think of the firefighter example), the relationship is said to be spurious. When the independent variable affects the dependent variable only by way of a mediating variable (sort of like a chain reaction), it is said to be an intervening relationship.
  • Elaboration is an effective (albeit somewhat tedious) means of weeding out spurious and intervening relationships.

Bivariate Tables in SPSS

Bivariate tables are known as crosstabs (short for cross-tabulations) in the world of SPSS. To generate one, click "Analyze," "Descriptive Statistics," and then "Crosstabs." You will need to put one variable in the "Rows" box and one in the "Columns" box. Generally speaking, the independent variable should go in the columns and the dependent variable should go in the rows. If you'd like to include percentages in your table, click on the "Cells" button, which will give you the option of choosing "Row," "Column," and "Total" percentages. I generally only choose one of the three, as clicking all three makes for a large (and rather confusing) table. To test the strength of the relationship, click "Statistics." You should be careful to choose only statistics that are appropriate for the variables' levels of measurement.

Exercises

  1. Using the World Values Survey data, make a bivariate table exploring the relationship between marital status ("MARITAL") and health ("HEALTH"). Treat marital status as the independent variable and health as the dependent variable. Calculate column percentages as well as any statistics that you think might be helpful. Interpret your findings. Does there appear to be a relationship between the two variables?
  2. Elaborate the relationship between marital status and health by gender. Does considering men and women separately change what you observed in your answer to Question #1?
  3. Using the New Immigrant Survey data, create a bivariate table exploring the relationship between religion ("RELIGION") and the region of the world from which the respondent migrated ("REGION"). Interpret your findings. Does there appear to be a relationship between the two variables?