This procedure also calculates the contingency coefficient, phi, and kappa statistics. The chi-square test is used to compare the distributions of two independent samples. It is used with data in the form of frequencies. It is a nonparametric procedure that makes no assumptions about distribution shapes, variances, or levels of measurement.
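As a rough illustration of the statistics named above, the sketch below computes the Pearson chi-square statistic, its degrees of freedom, phi, and the contingency coefficient from a table of frequencies. The counts are hypothetical, the function name `chi_square_stats` is invented for this example, and no continuity correction is applied; it is not meant to reproduce SPSS output exactly. Kappa is not covered here.

```python
import math

def chi_square_stats(table):
    """Return (chi-square, df, phi, contingency coefficient) for a frequency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: (row total * column total) / N
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    phi = math.sqrt(chi2 / n)                   # usually reported for 2x2 tables
    contingency = math.sqrt(chi2 / (chi2 + n))  # Pearson's contingency coefficient
    return chi2, df, phi, contingency

# Hypothetical 2x2 frequencies (rows and columns are two categorical variables)
table = [[30, 10],
         [20, 40]]
chi2, df, phi, contingency = chi_square_stats(table)
print(f"chi-square = {chi2:.3f}, df = {df}, phi = {phi:.3f}, C = {contingency:.3f}")
```

The statistic depends only on the cell frequencies, which is why the test can be run on counts alone.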

D Statistics: Opens the Crosstabs: Statistics window, which contains fifteen different inferential statistics for comparing categorical variables. E Cells: Opens the Crosstabs: Cell Display window, which controls which output is displayed in each cell of the crosstab. Note: in a crosstab, the cells are the inner sections of the table.

They show the number of observations for a given combination of the row and column categories. Three options in this window are useful, though optional, when performing a Chi-Square Test of Independence.

This option is enabled by default. F Format: Opens the Crosstabs: Table Format window, which specifies how the rows of the table are sorted. In the sample dataset, respondents were asked their gender and whether or not they were a cigarette smoker. There were three answer choices: Nonsmoker, Past smoker, and Current smoker.
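To make the structure of this example concrete, here is a minimal sketch of how the cell counts of a smoking-by-gender crosstab could be tallied from raw responses. The observations are made up for illustration and are not the sample dataset; `crosstab` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical (smoking status, gender) responses, one tuple per respondent
observations = [
    ("Nonsmoker", "Male"), ("Nonsmoker", "Female"), ("Past smoker", "Female"),
    ("Current smoker", "Male"), ("Nonsmoker", "Female"), ("Past smoker", "Male"),
]

def crosstab(pairs):
    """Count observations for every combination of row and column category."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # Counter returns 0 for combinations that never occur
    cells = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, cells

rows, cols, cells = crosstab(observations)
for label, row in zip(rows, cells):
    print(label, dict(zip(cols, row)))
```

Each inner list is one row of the crosstab, and each number is one cell.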

Before we test for "association", it is helpful to understand what an "association" and a "lack of association" between two categorical variables look like. One way to visualize this is with clustered bar charts. Let's look at the clustered bar chart produced by the Crosstabs procedure. This is the chart that is produced if you use Smoking as the row variable and Gender as the column variable (by running the syntax later in this example).

The "clusters" in a clustered bar chart are determined by the row variable (in this case, the smoking categories). The color of the bars is determined by the column variable (in this case, gender).

The height of each bar represents the total number of observations in that particular combination of categories. This type of chart emphasizes the differences within the categories of the row variable. Notice how, within each smoking category, the heights of the bars are similar. That is, there are approximately equal numbers of male and female nonsmokers, of male and female past smokers, and of male and female current smokers.

If there were an association between gender and smoking, we would expect these counts to differ between groups in some way. The first table is the Case Processing Summary, which tells us the number of valid cases used for analysis.

Only cases with nonmissing values for both smoking behavior and gender can be used in the test. Recall that the column percentages of the crosstab appeared to indicate that upperclassmen were less likely than underclassmen to live on campus.
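Column percentages like those mentioned above divide each cell count by its column total. The sketch below shows the arithmetic on hypothetical counts, with class rank as the row variable and living arrangement as the column variable; the numbers are invented and `column_percentages` is a hypothetical helper name.

```python
def column_percentages(table):
    """Express each cell as a percentage of its column total."""
    col_totals = [sum(col) for col in zip(*table)]
    return [
        [100.0 * cell / col_totals[j] for j, cell in enumerate(row)]
        for row in table
    ]

# Hypothetical counts. Rows: underclassman, upperclassman.
# Columns: off-campus, on-campus.
table = [[20, 80],
         [60, 40]]

for label, row in zip(["Underclassman", "Upperclassman"], column_percentages(table)):
    print(label, [f"{p:.1f}%" for p in row])
```

Within each column the percentages sum to 100, so they describe how each column group is split across the row categories.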

The clustered bar chart from the Crosstabs procedure can act as a complement to the column percentages above. Let's look at the chart produced by the Crosstabs procedure for this example. The "clusters" are formed by the row variable (in this case, class rank). This type of chart emphasizes the differences between the underclassmen and upperclassmen groups.

Here, the difference between the number of students living on campus and the number living off campus is much starker within the class rank groups. Only cases with nonmissing values for both class rank and living on campus can be used in the test. The next table is the crosstabulation. If you elected to check the boxes for Observed Count, Expected Count, and Unstandardized residuals, you should see those three values in each cell of the table.
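The three cell quantities named above are related by simple arithmetic: the expected count of a cell is its row total times its column total divided by the overall total, and the unstandardized residual is the observed count minus the expected count. A minimal sketch follows, using hypothetical counts rather than the values from this example; `expected_and_residuals` is an invented name.

```python
def expected_and_residuals(observed):
    """Return expected counts and unstandardized residuals (observed - expected)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    expected = [[r * c / n for c in col_totals] for r in row_totals]
    residuals = [
        [o - e for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(observed, expected)
    ]
    return expected, residuals

# Hypothetical observed counts (rows: class rank, columns: living arrangement)
observed = [[20, 80],
            [60, 40]]
expected, residuals = expected_and_residuals(observed)
print("Expected: ", expected)
print("Residuals:", residuals)
```

Residuals far from zero mark the cells where the observed counts depart most from what independence would predict.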

With the Expected Count values shown, we can confirm that all cells have an expected value greater than 5.

This test is also known as the Chi-Square Test of Association.

Common Uses: The Chi-Square Test of Independence is commonly used to test for statistical independence or association between two or more categorical variables.

Data Requirements: Your data must meet the following requirements: (1) two categorical variables; (2) two or more categories (groups) for each variable; (3) independence of observations, meaning that there is no relationship between the subjects in each group and that the categorical variables are not "paired" in any way; and (4) a relatively large sample size, with expected frequencies of at least 1 for each cell.
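Some of these requirements can be checked directly from the table of counts. The sketch below, under the assumption that the data are already tabulated, verifies that each variable has at least two categories and that the smallest expected frequency meets a chosen minimum. The function name, the dictionary keys, and the counts are all hypothetical; independence of observations depends on the study design and cannot be checked this way.

```python
def check_requirements(observed, min_expected=1.0):
    """Check table shape and the smallest expected frequency of a crosstab."""
    n_rows, n_cols = len(observed), len(observed[0])
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    # Expected frequency of each cell: (row total * column total) / N
    smallest = min(r * c / n for r in row_totals for c in col_totals)
    return {
        "two_or_more_categories": n_rows >= 2 and n_cols >= 2,
        "smallest_expected": smallest,
        "expected_ok": smallest >= min_expected,
    }

# Hypothetical small-sample table
observed = [[12, 3],
            [8, 2]]
print(check_requirements(observed))                  # minimum of 1
print(check_requirements(observed, min_expected=5))  # stricter threshold
```

The `min_expected` parameter makes it easy to apply a stricter threshold, such as the value of 5 used when inspecting the Expected Count output.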

