 This guide is intended to support the data analysis work that is an integral part of graduate coursework. It is essential to acquire a firm grasp of both descriptive and inferential statistics since they will be used for a wide array of analytical purposes.

Following a presentation of ways to modify data, information specific to various descriptive and inferential statistics is provided. The information presented in each section provides both context (when to use) and menu paths within SPSS to follow to execute various analyses.

• Modifying Data
• Graphs
• Validity
• Reliability
• Objectivity
• Differences - Parametric, Non-parametric
• Relationships - Parametric, Non-parametric
• Power & Sample size
• Factor Analysis

## Modifying Data

### Selecting a subset of a group prior to analysis

Frequently due to the nature of the group that measures have been obtained from, analyses on a subset of the entire group are of interest. When this is the case you first identify the subset (select cases) then proceed with the analysis.

### Selecting a subset of a group - Step Summary (be certain you are in the data file, rather than the output file, when you begin)

• Under the Data menu, click on select cases
• Select 'if condition is satisfied', click the if button
• In the box on the right identify what subgroup you want to select: e.g. gender = 1
• Click continue
• Click OK (at this point the only cases available in the data set are those you selected)
• Conduct the analysis of interest (e.g. central tendency, crosstabs, ....)
• When done, remember to return to the data menu and select all cases to make all cases available for subsequent analyses.

#### Splitting a File

Some of the analyses to be conducted may need to be repeated on all groups that make up a variable (e.g. gender: males/females). For example you may want to look at the correlation between exercise frequency and cholesterol level for men then for women. You could of course use the procedure above first for the males then repeat for females. However, the split file feature lets you do the two analyses at the same time.

#### Splitting a File - Step Summary

• Under the Data menu, select split file
• Select compare groups, in the open box identify the categorical/ordinal variable with the subgroups you want subsequent analyses to apply to.
• Click OK (at this point subsequent analyses will be carried out on each subgroup in the categorical/ordinal variable identified).
• When done, remember to return to the data menu and select analyze all cases

### Data Transformations

Regardless of the nature of the variable, it is often useful to condense information before reporting it. For example: Assume you collected information on years of education in 5 categories (< High School, High School, some college, Bachelor’s degree, > Master’s degree) but only wanted to report the proportion of people with no college work and those with at least some college work. You would not want to manipulate the original variable so you would first create a new variable then recode the new variable.

#### Recode Step Summary

• Under the transform menu select recode then select into different variable.
• Move the old variable into the box on the right.
• Give a name for the new recoded variable (in the output variable box).
• Click the change button
• Click the old and new variables button. Carefully identify the old values and what you want them recoded to and following each recode click the add button.
• When recoding complete press the continue button then click OK button. Don’t forget to give these recoded values value labels (done from the variable view of the data file).

### Combining information to create a new variable

In situations where you have component information and you need for example a total for each individual, a new variable needs to be created. This is easily done within the transform menu.

#### Step summary for combining information into a new variable

• Under the transform menu select compute.
• Name the new variable under the target variable box
• In the numeric expression box on the right, enter the formula for combining the information
• Click OK.

### Displaying Data Specifications

To obtain a listing of all variable information (e.g. labels, names) contained in the variable view:

• Open the File menu
• Select Display Data File Information
• Select Working File
• Click OK

Notice that the information produced in the output file is essentially the same as that in the variable view. The information will be displayed in two parts: the Variable Information and the Variable Values.

## Descriptive Statistics

Summarizing group information is typically the first step in the search for patterns, highlights, and meaning in a data set. Summary information can be presented both visually with the use of graphs and in the form of summary statistics. This section will focus on:

### Review of connection between measurement scales and analytical processes

This table conveys in column 2 what statistics could be used when the data is of the level of measurement listed to the left. This table does NOT convey information about what the level of measurement is for the statistics in the 2nd column. For example, percentiles are NOT interval scaled data.

| Measurement Scale | Statistics / SPSS procedures |
|---|---|
| Categorical | Percentages: Frequencies (FDT); Crossed Percentages: Crosstabs; Bar Charts: Frequencies; Correlation (dichotomous variables): Crosstabs-stats (Phi); Inferential Stats: Chi Squared |
| Ordinal | Percentages: Frequencies (FDT); Crossed Percentages: Crosstabs; Bar Charts: Frequencies; Correlation: Correlate (Kendall); Inferential Stats: Mann Whitney, Kruskal Wallis, Wilcoxon, Friedman |
| Interval | Central Tendency: Frequencies-stats; compare means (for sub-groups); Variability: Frequencies-stats; compare means (for sub-groups); Percentiles: Frequencies-stats; Histogram: Frequencies; Correlation: Correlate (PPMC); Scatterplot; Inferential Stats: t-tests, ANOVAs |
| Ratio | Central Tendency: Frequencies-stats; compare means (for sub-groups); Variability: Frequencies-stats; compare means (for sub-groups); Percentiles & Percentile Ranks: Frequencies-stats; Histogram: Frequencies; Correlation: Correlate (PPMC); Scatterplot; Inferential Stats: t-tests, ANOVAs |

## Frequency Distribution Tables for Summarizing Group Information

For categorical and ordinal data the construction of frequency distribution tables is an excellent way to summarize group information.

If you were to make a frequency distribution table by hand you would simply list each category/value observed followed by a count (also called absolute frequency) of the number of individuals in that category. An additional column called the relative frequency is often useful since it notes the percentage of the group in a particular category. For example:

| Gender | f | rf |
|---|---|---|
| Male | 28 | 48% |
| Female | 30 | 52% |

f: absolute frequency - count

rf: relative frequency - (count/N) × 100 - recorded as %
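As a by-hand check on what SPSS produces through the Frequencies dialog, the counts and percentages above can be sketched in a few lines of Python (the group sizes below mirror the example table; the function name is just for illustration):

```python
from collections import Counter

def frequency_table(values):
    """Absolute frequency (f) and relative frequency (rf, as a %) per category."""
    counts = Counter(values)
    n = len(values)
    return {cat: (f, round(100 * f / n, 1)) for cat, f in counts.items()}

gender = ["Male"] * 28 + ["Female"] * 30
table = frequency_table(gender)
# 28 of 58 males is roughly 48%, 30 of 58 females roughly 52%
```

The rounding to one decimal place is a choice here; SPSS lets you control decimal display in the output tables.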

### Frequency Distribution Tables Step Summary

To get a frequency distribution table for all cases in the data file:

• Under the analyze menu choose descriptive statistics then
• choose frequencies to get frequency distribution tables for categorical and ordinal variables. Once inside the frequencies box select the variables you are interested in then single click on the statistics or formats button to further specify what type of output you want.
• Single click on the OK button when selections complete.

To get a frequency distribution table for a subset of cases in the data file:

• Under data menu choose select cases
• Select if condition is satisfied
• Press if button and identify the subgroup needed by completing the if statement
• Press continue button
• Press OK button

With subgroup now selected:

• Under the analyze menu choose descriptive statistics then
• choose frequencies to get frequency distribution tables for categorical and ordinal variables. Once inside the frequencies box select the variables you are interested in then single click on the statistics or formats button to further specify what type of output you want.
• Single click on the OK button when selections complete.

Remember to go back through data menu to reselect all cases before starting analyses where all cases are needed.

 Note: You would not construct frequency distribution tables for continuous data when the intent is to summarize information. The reason is that such data can take on a great number of values, and since each value is listed in a frequency distribution table, little summarization may be accomplished. Measures of Central Tendency and Variability are much more useful in summarizing group information for continuous data.

### Frequency Distribution Tables for Error Checking

Following entry of data into the SPSS spreadsheet it is important to check for errors. For example, consider the variable GENDER with value labels of 1 for male and 2 for female. It is reasonable to assume that a typing error could result in entries of other than a 1 or 2. One way to detect this error is to have SPSS produce a frequency distribution table for this variable. It might look like this:

| Gender | frequency |
|---|---|
| Male | 35 |
| Female | 41 |
| 3 | 6 |
| 6 | 2 |

This table makes it clear that 8 of the entries are erroneous. For six subjects the value 3 was entered for gender and for another two subjects the value 6 was entered. With the errors detected, you would use the search feature in SPSS to find these data entry errors and correct them.
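The same screening logic can be expressed directly: flag any entry that is not a legal code for the variable. This sketch (with made-up data and a hypothetical function name) does what scanning the frequency table does by eye:

```python
def find_invalid(values, valid_codes):
    """Return (row_index, value) pairs whose value is not a legal code."""
    return [(i, v) for i, v in enumerate(values) if v not in valid_codes]

# GENDER should only contain 1 (male) or 2 (female)
gender_column = [1, 2, 2, 3, 1, 6, 2, 3]
errors = find_invalid(gender_column, valid_codes={1, 2})
# each pair tells you which row to inspect and what bad value it holds
```

In SPSS the equivalent of the returned row index is the case number you would jump to with Edit > Find.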

### Error Checking with Frequency Distribution Tables Step Summary

To get a frequency distribution table for all variables and all cases in the data file:

• Under the analyze menu choose descriptive statistics then
• choose frequencies to get frequency distribution tables for categorical and ordinal variables. Once inside the frequencies box select all variables.
• Single click on the OK button when selections complete.
• If errors detected, click inside the spreadsheet on the variable with data entry errors. Under the edit menu choose find and provide the value you are searching for and then click on the button find next if there is more than one case to correct.

When data entry errors are located but you cannot correct them, identify that number as a missing value in the variable view of the data so SPSS does not use it in any analyses. If you identify values that appear incorrect but only for select cases, then enter a blank in place of the value you deem inappropriate in the spreadsheet view of the data.

For example, consider the situation where you have obtained two heart rates: one resting and the other one minute after jogging in place. If for one of the cases the two values were 128 and 128, that is likely an error since the resting heart rate is quite high and the exercise heart rate is unlikely to be the same as the resting heart rate. If you don't have access to the original data so you can re-enter the correct values, then you need to make these values missing. But since 128 may be a legitimate value for some other cases, you can't just assign it a missing value. You need to go into the spreadsheet, find this case, and delete each 128, leaving blank cells for these two variables for this particular case.

 Note: Constructing frequency distribution tables for every variable for the purpose of error checking is important to complete prior to initiating any analytical work.

## Crosstabulation Tables for Summarizing Group Information

For categorical and ordinal data the construction of crosstabulation tables is an excellent way to cross-reference summary information for two or more variables.

If you were to make a crosstabulation table by hand you would in rows list each category/value of one variable and in columns list each category/value of a second variable. The table then would contain a count of the number of individuals in cells representing the various combinations of values for the two variables. For example, you might want to combine in one table gender (categorical) and age group (ordinal).

| Gender | Age Group 20-25 | Age Group 26-30 | Age Group 31-35 |
|---|---|---|---|
| Male | 28 | 20 | 15 |
| Female | 30 | 18 | 20 |

From this table you can see that 28 of the subjects were male and in the youngest age group, and 18 of the subjects were female and in the middle age group.
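A crosstabulation is just a count over pairs of values, which can be sketched with the standard library (the data values here are invented for illustration):

```python
from collections import Counter

def crosstab(rows, cols):
    """Cell counts for every (row value, column value) combination."""
    return Counter(zip(rows, cols))

gender = ["M", "M", "F", "F", "F"]
age_group = ["20-25", "26-30", "20-25", "20-25", "31-35"]
cells = crosstab(gender, age_group)
# cells[("F", "20-25")] is the count of females in the youngest age group
```

SPSS's Crosstabs procedure adds the row/column margins and the percentage options on top of exactly these cell counts.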

### Crosstabulation Tables for Summarizing Group Information Step Summary

• Under the analyze menu choose descriptive statistics then choose crosstabs to crosstabulate categorical information.
• Once inside the crosstabs box select the row and column variables then single click on the cells or formats buttons to further specify what type of output you want.
• Single click on the OK button when selections complete.

Step Summary to break down by a 3rd variable.

• Under the analyze menu choose descriptive statistics then choose crosstabs to crosstabulate categorical information.
• Once inside the crosstabs box select the row and column variables and place the 3rd variable in the 'Layer' box. Then single click on the cells or formats buttons to further specify what type of output you want.
• As an example, if you want to know what percent of the fall 10 transfer students are male, you would put the variable semester in the layer box, transfer status in the row box, and gender in the column box.
• Single click on the OK button when selections complete.

 Note: You would not construct crosstabulation tables for continuous data when the intent is to summarize information. The reason is that such data can take on a great number of values and each value would be listed in a crosstabulation table. Therefore little summary may be accomplished. Measures of Central Tendency and Variability are much more useful in summarizing group information for continuous variables.

### Risk Odds Ratio

Crosstabulation of two dichotomous variables where one represents the presence/absence of a disease or outcome and the other variable represents the presence/absence of a risk factor enables you to obtain the risk odds ratio statistic.

• Under the analyze menu choose descriptive statistics then choose crosstabs to crosstabulate categorical information.
• Once inside the crosstabs box select the row (outcome) and column (risk factor) variables then single click on the cells or formats buttons to further specify what type of output you want.
• Click on the statistics button, click the 'Risk' box, click continue.
• Single click on the OK button when selections complete.
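The 'Risk' output rests on simple arithmetic over the four cells of the 2x2 table. A minimal sketch, with the conventional epidemiological cell labels (a, b, c, d are assumptions about how your table is laid out):

```python
def odds_ratio(a, b, c, d):
    """2x2 table: a = exposed with disease, b = exposed without,
    c = unexposed with disease, d = unexposed without."""
    return (a * d) / (b * c)

def relative_risk(a, b, c, d):
    """Risk of disease among exposed divided by risk among unexposed."""
    return (a / (a + b)) / (c / (c + d))

# e.g. 10 of 100 exposed and 5 of 100 unexposed develop the outcome
```

With those counts the relative risk is 2.0 (exposure doubles the risk), and the odds ratio is slightly larger, as it typically is when the outcome is not rare.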

## Central Tendency & Variability

Measures of central tendency summarize data by identifying where the center of a distribution of scores is. Measures of variability summarize data by quantifying the spread or dispersion of scores around the center.

For categorical and ordinal data with few categories, the Mode (though not an optimal measure) is an acceptable measure of central tendency and the range is an appropriate measure of variability. Frequently however, such data is best summarized with a frequency distribution table.

For data at least interval scaled, the Median and Mean are appropriate measures of central tendency. If the distribution of scores is skewed the Median is the best measure of central tendency. The most common measure of variability is the standard deviation and is appropriate for use with data at least interval scaled.

In addition to being used to summarize a data set, measures of central tendency and variability are critical components of other statistical procedures.
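All of these summary measures are available in Python's standard library, which is handy for checking a value SPSS reports (the scores below are made up):

```python
import statistics as st

scores = [4, 7, 7, 8, 9, 10, 15]

mode = st.mode(scores)              # most frequent value
median = st.median(scores)          # middle value; preferred when skewed
mean = st.mean(scores)              # arithmetic average
rng = max(scores) - min(scores)     # range
sd = st.stdev(scores)               # sample standard deviation
```

Note that for this small, right-skewed example the mean is pulled above the median by the value 15, which is exactly why the guide recommends the median for skewed distributions.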

### Central Tendency & Variability Step Summary

Using the frequencies option in SPSS:

• from the analyze menu choose descriptive statistics then choose frequencies.
• Uncheck the box that says display frequency tables. Click on the statistics button. Select the variables (at least interval scaled for the mean) you are interested in.
• Under central tendency check mode, median, or mean and under dispersion check range or standard deviation then click continue button. Then click OK button.

If working with interval or ratio data and the data is normally distributed you can obtain the mean and standard deviation from the descriptives option in SPSS:

• from the analyze menu choose descriptive statistics then choose descriptives. Select the continuous (interval or ratio scaled) variables you want the mean and standard deviation for. Press the options button.
• Check the mean and standard deviation (also any other measures you would like) then select the display option you prefer:
• Ascending Means
• Alphabetic
• Descending Means
• Variable List
• Click continue button after selections made. Click OK button.

REMEMBER, you must check the shape (obtain histogram under graphs) of the distribution of scores to decide what measure of central tendency is appropriate. If the shape is skewed then you need to obtain a median.

To get measures of central tendency and variability for continuous measures on subgroups of your sample:

• from the analyze menu choose compare means then choose means.
• Select from the list of variables the interval or ratio scaled variables you want central tendency and variability for and move them to the dependent list box.
• Then select the categorical variable(s) that constitute the subgroups you’re interested in and move them to the independent list box.
• Now click the options button and move the statistics you want for each group (e.g. median, standard deviation) over to the right, then click continue
• Then click OK button.

To break the analysis down by a 2nd categorical variable:

• from the analyze menu choose compare means then choose means.
• Select from the list of variables the interval or ratio scaled variables you want central tendency and variability for and move them to the dependent list box.
• Then select the categorical variable that constitutes the first subgroups you’re interested in and move it to the independent list box.
• Click the 'next' button to place the 2nd categorical variable in the new blank independent list box. It becomes 'layer 2 of 2'.
• Now click the options button and move the statistics you want for each group (e.g. median, standard deviation) over to the right, then click continue
• Then click OK button.

REMEMBER, you must check the shape (obtain histograms under explore option) of the distribution of scores for each group to decide what measure of central tendency is appropriate. If the shape is skewed for either group then you need to obtain medians.

## Correlation

There are several types of correlation coefficients to choose from. The choice is based on the nature of the data being correlated.

| Coefficient | When to use |
|---|---|
| Pearson Product Moment Correlation | Both variables have continuous data |
| Phi | Both variables have dichotomous data |
| Kendall's Tau | Both variables have ordinal data |
| Point Biserial Correlation | One variable has continuous data and the other a true dichotomy |

### Pearson Product Moment Correlation (PPMC)

The PPMC can be used to describe the strength and direction of the linear relationship between two continuous variables. When two variables are not linearly related, the PPMC is likely to underestimate the true strength of the relationship. A graph of the x and y values can show whether or not the relationship is linear.
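The PPMC that SPSS computes can be sketched directly from its definition, the covariance of x and y scaled by their spreads (the function name is illustrative):

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two paired lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))   # co-variation
    sxx = sum((a - mx) ** 2 for a in x)                    # variation in x
    syy = sum((b - my) ** 2 for b in y)                    # variation in y
    return sxy / math.sqrt(sxx * syy)
```

A perfectly linear increasing pair of variables yields r = 1, a perfectly linear decreasing pair yields r = -1, and curved (non-linear) relationships land somewhere short of those bounds even when the association is strong, which is the underestimation warned about above.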

#### Correlation Step Summary for PPMC

• Under the analyze menu choose correlate then choose bivariate.
• Select the two continuous variables and then move them to the variables box. Then click OK button (PPMC is the default selection).

### Kendall's Tau

Kendall's Tau can be used to describe the strength and direction of the relationship between two ordinal variables. It is a rank-order correlation coefficient (unlike the PPMC, which works with the raw scores) and conveys the extent to which pairs of values (x, y) are in the same rank order.
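The pair-counting idea behind Kendall's Tau can be sketched as follows. This is the simple tau-a variant with no correction for ties; SPSS's Kendall option applies a tie correction (tau-b), so the values will differ when ties are present:

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """Tau-a: (concordant - discordant) pairs over all possible pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:        # pair ordered the same way on both variables
            concordant += 1
        elif s < 0:      # pair ordered oppositely
            discordant += 1
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)
```

When every pair is ordered the same way on both variables tau is 1; when every pair is reversed it is -1.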

#### Correlation Step Summary for Kendall's Tau

• Under the analyze menu choose correlate then choose bivariate.
• Select the two ordinal variables and then move them to the variables box.
• Check the box labeled Kendall's Tau.
• Then click OK button.

### Phi

Phi can be used to describe the strength of the relationship between two dichotomous variables. It can convey the direction of the pattern in the two by two crosstab table of the two variables.
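Phi is computed from the four cells of the 2x2 table; a minimal sketch (the cell labeling a, b, c, d is an assumption about your table layout):

```python
import math

def phi(a, b, c, d):
    """Phi coefficient from a 2x2 table laid out as:
             y=0  y=1
      x=0     a    b
      x=1     c    d
    """
    num = a * d - b * c
    den = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return num / den
```

Counts concentrated on the main diagonal (a and d) push phi toward +1; counts concentrated on the off-diagonal push it toward -1.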

#### Correlation Step Summary for Phi

• Under the analyze menu choose descriptive statistics then choose crosstabs to crosstabulate the categorical information.
• Once inside the crosstabs box select the row and column variables (each dichotomous) then single click on the cells or formats buttons to further specify what type of output you want.
• Click the continue button.
• Click on the statistics button then select 'phi and Cramer's V'.
• Click the continue button.
• Single click on the OK button when selections complete.

### Point Biserial Correlation

The Point Biserial Correlation can be used to describe the strength of the relationship between one continuous variable and one dichotomous variable. The point biserial correlation coefficient is useful in detecting a pattern in group measures (e.g. one group's scores tending to be higher than the other group's).

The computational formula for the point biserial coefficient is:

r_pb = ((X1 − X0) / Sx) × √(P0 × P1)

Where:

X0 = mean of x values for those in category 0
X1 = mean of the x values for those in category 1
Sx = standard deviation of all x values
P0 = proportion of people in category 0
P1 = proportion of people in category 1
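The by-hand computation the steps below gather numbers for can be sketched directly from those five quantities. One caveat: with Sx taken as the population (divide-by-N) standard deviation this formula reproduces the Pearson r of x against 0/1 codes exactly; with the sample SD the value differs slightly:

```python
import math
import statistics as st

def point_biserial(x, groups):
    """r_pb for continuous x and dichotomous group codes (0/1)."""
    x0 = [v for v, g in zip(x, groups) if g == 0]
    x1 = [v for v, g in zip(x, groups) if g == 1]
    p0, p1 = len(x0) / len(x), len(x1) / len(x)
    sx = st.pstdev(x)   # population SD of all x values
    return ((st.mean(x1) - st.mean(x0)) / sx) * math.sqrt(p0 * p1)
```

For example, x = [1, 2, 3, 4] with groups [0, 0, 1, 1] matches the Pearson correlation of x with the codes, about 0.894.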

#### Steps to obtain summary information in order to do point biserial by hand:

• from the analyze menu choose compare means then choose means.
• Select from the list of variables the interval or ratio scaled variable you want central tendency and variability for and move them to the dependent list box.
• Then select the categorical variable(s) that constitute the subgroups you’re interested in and move them to the independent list box.
• Then click OK button.

## Using Graphs to Summarize Data

Graphs are the visual counterparts to descriptive statistics and are very powerful mechanisms for revealing patterns in a data set. In addition, when used appropriately in a report they can highlight trends and summarize pertinent information in a way no amount of text could.

When summarizing categorical data, pie or bar charts are the most efficient and easy to interpret though line graphs may be more helpful particularly at times when trying to draw attention to trends in the data. For continuous data, histograms are a good choice, easily constructed and simple to interpret. When attempting to represent visually the relationship between two continuous variables a scattergram can be used.

### Bar Charts

To create a simple bar chart for categorical and ordinal (with few categories) data:

• Under the graphs menu choose legacy dialogs; choose bar; click define button.
• Put variable you need bar chart for in the category axis box.
• Single click on the OK button when selections complete.

### Scattergrams

To create a scattergram (two continuous variables):

• Under the graphs menu choose legacy dialogs; choose scatter/dot; click define button.
• Put x and y variables in the boxes to the right.
• Single click on the OK button when selections complete.

### Histograms

To create a histogram (continuous variable), work from the Graphs menu:

• Under the graphs menu choose legacy dialogs; choose histograms.
• Put variable you need histogram for in the variable box.
• Check 'display normal curve' box.
• Single click on the OK button when selections complete.

To create histograms for subsets of a group:

• Under the graphs menu choose legacy dialogs; choose histograms.
• Put continuous variable you need histogram for in the variable box.
• Check 'display normal curve' box.
• Put grouping variable in the rows box.
• Single click on the OK button when selections complete.

To break down by a 2nd categorical variable:

• Under the graphs menu choose legacy then choose histogram.
• Once inside the histograms box highlight the continuous variable you are interested in and move it to the variable box and check the box underneath 'display normal curve'.
• Highlight the 1st categorical/ordinal variable for your sub groups and move it to the rows box.
• Highlight the 2nd categorical/ordinal variable and move it to the columns box.
• Click the titles button and supply titles as needed then click continue.
• Single click on the OK button when selections complete.

## Validity

### Validity of Scores

Depending on the type and purpose of a test, criterion-related validity can be examined from one or more of several perspectives. The two situations covered in this class are:

#### Concurrent validity of scores

This is examined when you are interested in the extent to which a particular measure is as good as an already established criterion known to provide valid and reliable data. You determine this by correlating your scores (x is continuous) with scores or classifications from a criterion measure (y).

The process would entail:

• Gather x and y measures from a large group - y is the criterion measure
• Compute an appropriate correlation coefficient (depending on the measurement scale of x and y)
• If correlation > .80 for variables positively related or < -.80 for variables inversely related, your data (x) is said to have good concurrent validity

#### Steps for concurrent validity of scores

• When both x and y (criterion) continuous - Use PPMC
• When x continuous and y dichotomous - Use Point Biserial (use the compare means feature to obtain data needed to compute by hand)

## Reliability

The primary concern here is the accuracy of measures. Reducing sources of measurement error is the key to enhancing the reliability of the data.

Reliability is typically assessed in one of two ways:

• Internal consistency - Precision and consistency of test scores on one administration of a test.
• Stability - Precision and consistency of test scores over time. (test-retest)

To estimate reliability you need 2 or more scores (or classifications) per person.

 Note: When interpreting coefficient alpha or the intraclass R, a value > .70 reflects good reliability.

### Internal Consistency of Scores - Continuous data

If multiple cognitive, motor skill, or physiological measures are collected on one day, the estimate of reliability is referred to as internal consistency. The intraclass coefficients you can use are Cronbach's Alpha and the Intraclass R.
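The alpha that SPSS's Reliability Analysis reports can be sketched from its definitional formula: k items, the sum of the item variances, and the variance of each subject's total score (data and function name here are illustrative):

```python
import statistics as st

def cronbach_alpha(items):
    """Coefficient alpha. `items` holds one list of scores per item/trial,
    with subjects in the same order in every list."""
    k = len(items)
    item_vars = sum(st.variance(item) for item in items)
    totals = [sum(scores) for scores in zip(*items)]   # per-subject totals
    total_var = st.variance(totals)
    return (k / (k - 1)) * (1 - item_vars / total_var)

# two trials that rank subjects identically give alpha = 1.0
trial1 = [1, 2, 3, 4]
trial2 = [1, 2, 3, 4]
```

When the trials vary together the total-score variance dominates the summed item variances and alpha approaches 1; inconsistent trials drag it toward 0.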

#### Steps for coefficient alpha

• Under analyze menu choose scale then choose reliability analysis.
• Select the 2 or more measures per subject and move them to the items box.
• Click OK.

#### Steps for Intraclass R

• Under analyze menu choose scale then choose reliability analysis.
• Select the 2 or more measures per subject and move them to the items box.
• Click the statistics button. Check intraclass correlation. Then click continue button.
• Click OK.

### Stability of scores - Continuous Data

If every individual can be measured twice on the variable you're interested in then you readily have data from which reliability can be examined.

Once you have 2 scores per person the question is how consistent overall were the scores.

In many situations reliability has been estimated incorrectly using the Pearson correlation coefficient. This is not appropriate since (1) the PPMC is meant to show the relationship between two different variables - not two measures of the same variable, and (2) the PPMC is not sensitive to fluctuations in test scores. The PPMC is an interclass coefficient; what is needed is an intraclass coefficient. The most commonly used reliability coefficients are the intraclass R calculated from values in an analysis of variance table and coefficient alpha.

#### Steps for coefficient alpha

• Under analyze menu choose scale then choose reliability analysis.
• Select the 2 or more measures per subject and move them to the items box.
• Click OK.

#### Steps for Intraclass R

• Under analyze menu choose scale then choose reliability analysis.
• Select the 2 or more measures per subject and move them to the items box.
• Click the statistics button. Check intraclass correlation. Then click continue button.
• Click OK.

## Objectivity

### Objectivity of scores - Continuous Data

In motor skill performance settings it is often necessary to collect measures through observation. To examine the objectivity of these measures you look at the consistency of measures across observers (inter-rater consistency). Note: you may also videotape a group and have one person record measures on two occasions (intra-rater consistency).

To assess objectivity, your task, since the measures come from observations, is to examine the objectivity of the measures produced by observers using a rating scale. To do this, have two people observe one group of examinees and evaluate their performance using a rating scale. The measures from the two observers (you could also videotape the group and have one person evaluate the group twice) give you two scores per person to use in the coefficient alpha or intraclass R formulas. The Spearman-Brown formula is not needed in this situation since test length is not manipulated.

 Note: When interpreting coefficient alpha or the intraclass R, a value > .70 reflects good objectivity.

#### Steps for coefficient alpha

• Under analyze menu choose scale then choose reliability analysis.
• Select the 2 or more measures (scores from observers) per subject and move them to the items box.
• Click OK.

## Inferential Statistics

The branch of statistics concerned with using sample data to make an inference about a population is called inferential statistics. This is generally done through random sampling, followed by inferences made about central tendency, or any of a number of other aspects of a distribution. This section will focus on:

• Parametric Tests for Differences
• Parametric Tests for Relationships
• Non-Parametric Tests for Differences
• Non-Parametric Tests for Relationships

### Parametric tests for differences - Dependent t-test

The dependent t-test is a statistical procedure for testing H0: mean1 = mean2 when the two measures of the dependent variable are related. For example, when one group of subjects is tested twice the two scores are related.

Assumptions of the dependent t-test procedure:

• Normality - is the distribution of difference scores in the population normal? You check this assumption by obtaining a difference score for each person then examining the histogram of the difference scores for the group. The difference scores should be normally distributed.

• To get the difference scores, under transform, select compute. In the target variable box type in a variable name for the new variable you are creating that represents the difference between pairs of measures. In the numeric expression box place the first measure of your dependent variable followed by a minus sign followed by the second measure of your dependent variable, then click OK button.
• Sample randomly selected. Examine whether or not you have met this assumption by scrutinizing the sampling procedure.

• Dependent variable at least interval scaled. Examine whether or not you have met this assumption by checking to see that the dependent variable meets the definition of an interval scaled variable.

If the assumptions are met you can proceed and conduct a dependent t-test. If the distributional assumptions are not met, you should conduct a non-parametric test (Wilcoxon).
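The t statistic that the paired-samples procedure reports is the mean difference score divided by its standard error, which can be sketched by hand (data below is invented; SPSS additionally supplies the p-value from the t distribution with n - 1 degrees of freedom):

```python
import math
import statistics as st

def dependent_t(pre, post):
    """Paired t statistic for H0: mean difference = 0."""
    diffs = [b - a for a, b in zip(pre, post)]      # one score per person
    n = len(diffs)
    return st.mean(diffs) / (st.stdev(diffs) / math.sqrt(n))

pre = [10, 12, 14, 16]
post = [13, 14, 17, 18]
t = dependent_t(pre, post)   # large |t| suggests the means differ
```

Note this is the same difference-score variable used above to check the normality assumption; the test and the assumption check operate on the same quantity.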

#### Dependent t-test Step Summary

Conducting dependent t-test

• Under the analyze menu choose compare means then choose paired samples t-test.

• Select the two variables that represent the two measures of the dependent variable and then move them to the paired variable(s) box. Click OK button.

### Parametric tests for differences - Independent t-test

To examine whether or not there is a statistically significant difference in means on some dependent variable (continuous) as a function of some independent variable (categorical) you can use the t-test when you have just two levels (unrelated) of the independent variable (ex: gender).

An Independent t-test is a statistical procedure for testing H0: mean1 = mean2 when the two levels of the independent variable are not related.

#### Assumptions of the independent t-test procedure:

• Homogeneity of variance - is the variability of the dependent variable in the population similar for each level of the independent variable? You examine this assumption by comparing the two standard deviations for the groups in your sample. If they are similar (larger divided by smaller <2) you have met this assumption.

• Normality - is the distribution of scores for the dependent variable in the population normal for each level of the independent variable? You check this assumption by examining histograms for each group. The dependent variable should be normally distributed for each group in your sample.

• Sample randomly selected. Examine whether or not you have met this assumption by scrutinizing the sampling procedure.

• Dependent variable at least interval scaled. Examine whether or not you have met this assumption by checking to see that the dependent variable meets the definition of an interval scaled variable.

If the assumptions are met, you can proceed and conduct an independent t-test. If the distributional assumptions are not met, you should conduct a non-parametric test (Mann-Whitney).

#### Independent t-test Step Summary

Checking homogeneity of variance assumption

• To get the standard deviation for the dependent variable (as well as mean though it is not of interest in checking homogeneity) on the groups that constitute your independent variable, from the analyze menu choose compare means then choose means.

• Select from the list of variables the dependent variable you want standard deviations for and move it to the dependent list box.

• Then select the categorical variable that constitutes the independent variable you’re interested in and move it to the independent list box. Then click OK button.

Checking normality assumption

• Under the graphs menu choose legacy dialogs; choose histograms.
• Put continuous dependent variable you need histogram for in the variable box.
• Check 'display normal curve' box.
• Put independent variable in the rows box.
• Single click on the OK button when selections complete.

To conduct an independent t-test

• Under the analyze menu choose compare means then choose independent samples t-test.

• Select the dependent variable and move it to the test variable(s) box. Select the independent variable and move it to the grouping variable box.

• Click on the define groups button. In the Group 1 box, type the value that identifies subjects in group 1. In the Group 2 box, type the value that identifies subjects in group 2. These are the values associated with the independent variable.

• Click the continue button. Click OK button.
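For reference, the pooled-variance t statistic this procedure produces can be sketched in Python (the group scores are invented for illustration; this is a sketch, not the SPSS implementation). The standard deviation ratio used in the homogeneity check above is included:

```python
import math
from statistics import mean, stdev, variance

def independent_t(x, y):
    """Independent t-test with pooled variance, testing H0: mean1 = mean2."""
    n1, n2 = len(x), len(y)
    # pooled variance: weighted average of the two sample variances
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    t = (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

# Hypothetical scores for the two levels of a grouping variable
group1 = [5, 7, 6, 8]
group2 = [9, 10, 8, 11]

# Homogeneity check: larger sd divided by smaller sd should be < 2
sds = sorted([stdev(group1), stdev(group2)])
ratio = sds[1] / sds[0]

t, df = independent_t(group1, group2)
```

If the ratio check fails badly, the equal-variances-not-assumed (Welch) row of the SPSS output is the one to read instead.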

### Parametric tests for differences - Repeated Measures Analysis of Variance

The repeated measures ANOVA is an extension of the dependent t-test. It is a statistical procedure for testing H0: mean1 = mean2 = mean3 = ... when two or more measures of the dependent variable are related. For example, when one group of subjects is tested three times, the three scores are related.

#### Assumptions of the repeated measures ANOVA procedure:

• Sphericity - do the population variances associated with the levels of the repeated measures factor, in combination with the population correlations between pairs of levels, represent one of a set of acceptable patterns? One acceptable pattern is for all the population variances to be identical and for all bivariate correlations to be identical. You examine this assumption by applying Mauchly’s test.

• Sample randomly selected. Examine whether or not you have met this assumption by scrutinizing the sampling procedure.

• Dependent variable at least interval scaled. Examine whether or not you have met this assumption by checking to see that the dependent variable meets the definition of an interval scaled variable.

If the assumptions are met, you can proceed and conduct a repeated measures ANOVA. If the distributional assumptions are not met, you should conduct a non-parametric test (Friedman).

#### Repeated Measures Analysis of Variance Step Summary

Checking Sphericity assumption

• Under analyze menu choose general linear model then choose repeated measures.

• Once inside the repeated measures dialog box give a name to the within subjects factor - dependent variable - (by default it will be named factor1). In the number of levels box, type the number of repeated measures of the dependent variable you have. Then press the add button. Next press the define button. Highlight the variable names in the left side box that represent the repeated measures of the dependent variable and move them over to the within-subjects variable box. Then click the OK button.

• Mauchly’s test of sphericity is run automatically. If it is significant, the condition of sphericity does not exist, and an adjustment to the numerator and denominator degrees of freedom must be made in order to validate the univariate F statistic. Three estimates of this adjustment, which is called epsilon, are available in the GLM Repeated Measures procedure. Both the numerator and denominator degrees of freedom must be multiplied by epsilon, and the significance of the F ratio must be evaluated with the new degrees of freedom.

Conducting Repeated Measures Analysis of Variance

• Since the procedure for checking sphericity entails complete specification of the variables needed to conduct the repeated measures procedure, no additional steps are needed. The output produced when checking sphericity includes the information needed to check for significant differences across measures of the dependent variable.

• If Mauchly’s test is not significant, the F statistic (or p value) needed to assess significance will be found in the table labeled sphericity assumed. If Mauchly’s test is significant, the F statistic must be compared to a new critical value based on the adjusted numerator and denominator degrees of freedom.

### Parametric tests for differences - One way analysis of variance

To examine whether or not there is a statistically significant difference in means on some dependent variable (continuous) as a function of some independent variable (categorical) you can use the One way analysis of variance procedure when you have two or more levels (unrelated) of the independent variable.

A One way analysis of variance is a statistical procedure for testing H0: mean1 = mean2 = mean3 .... when the two or more levels of the independent variable are not related.

#### Assumptions of the one way ANOVA procedure:

• Homogeneity of variance - is the variability of the dependent variable in the population similar for each level of the independent variable? You examine this assumption by comparing the standard deviations for the groups in your sample. If they are similar (larger divided by smaller <2) you have met this assumption.

• Normality - is the distribution of scores for the dependent variable in the population normal for each level of the independent variable? You check this assumption by examining histograms for each group. The dependent variable should be normally distributed for each group in your sample.

• Sample randomly selected. Examine whether or not you have met this assumption by scrutinizing the sampling procedure.

• Dependent variable at least interval scaled. Examine whether or not you have met this assumption by checking to see that the dependent variable meets the definition of an interval scaled variable.

If the assumptions are met, you can proceed and conduct a one way analysis of variance. If the distributional assumptions are not met, you should conduct a non-parametric test (Kruskal-Wallis).

#### One way analysis of variance Step Summary

Checking homogeneity of variance assumption

• To get the standard deviation for the dependent variable (as well as mean though it is not of interest in checking homogeneity) on the groups that constitute your independent variable, from the analyze menu choose compare means then choose means.

• Select from the list of variables the dependent variable you want standard deviations for and move it to the dependent list box.

• Then select the categorical variable that constitutes the independent variable you’re interested in and move it to the independent list box. Then click OK button.

Checking normality assumption

• Under the graphs menu choose legacy dialogs; choose histograms.
• Put continuous dependent variable you need histogram for in the variable box.
• Check 'display normal curve' box.
• Put independent variable in the rows box.
• Single click on the OK button when selections complete.

To conduct a one way analysis of variance

• To conduct a one-way ANOVA, under the analyze menu choose compare means then choose one-way anova.

• Select the dependent variable and move it to the dependent list box. Select the independent variable and move it to the factor box.

• Click on post-hoc button if you have three or more levels of the independent variable. Check Scheffe. Click the continue button.

• Click options button. Under statistics check descriptive and homogeneity of variance.  Click the continue button. Click OK button.
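The F ratio this procedure reports can be sketched by hand. The following Python snippet (group scores invented for illustration; a sketch, not the SPSS implementation) computes F as the between-groups mean square over the within-groups mean square:

```python
from statistics import mean

def one_way_anova(groups):
    """One-way ANOVA: F = MS_between / MS_within for two or more unrelated groups."""
    scores = [x for g in groups for x in g]
    grand = mean(scores)
    k, n = len(groups), len(scores)
    # between-groups sum of squares: group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # within-groups sum of squares: scores around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical scores for three levels of an independent variable
f, df_b, df_w = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

The F value is then evaluated against the F distribution with (df_between, df_within) degrees of freedom, which is the p value SPSS prints.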

### Parametric tests for differences - Two way analysis of variance

To examine whether or not there is a statistically significant difference in means on some dependent variable (continuous) due to the influence of two independent variables (categorical) you can use the two way analysis of variance procedure when you have two or more levels (unrelated) of each independent variable.

A two way analysis of variance can be used to answer three questions: a) is there a difference in means on the dependent variable due to the 1st independent variable, b) is there a difference in means on the dependent variable due to the 2nd independent variable, and c) do the two independent variables interact to affect the dependent variable.

#### Assumptions of the two way ANOVA procedure:

• Constant variance - is the variability of the dependent variable in the population similar for each cell (combinations of levels of the independent variables)? You examine this assumption by comparing the standard deviations for each cell. If they are similar (larger divided by smaller <2) you have met this assumption. You could also look at the spread of your observations in a box-and-whiskers plot to see if the variability is markedly different in the groups.

• Normality - is the distribution of scores for the dependent variable in the population normal for each cell (combinations of levels of the independent variables)? You check this assumption by examining histograms for each cell. The dependent variable should be normally distributed within each cell.

• Sample randomly selected. Examine whether or not you have met this assumption by scrutinizing the sampling procedure.

• Dependent variable at least interval scaled. Examine whether or not you have met this assumption by checking to see that the dependent variable meets the definition of an interval scaled variable.

If the assumptions are met, you can proceed and conduct a two way analysis of variance. If the distributional assumptions are not met, you could conduct two non-parametric tests (Kruskal-Wallis) to examine the main effects, but there is no comparable non-parametric test to examine the interaction.

#### Two way analysis of variance Step Summary

Checking constant variance assumption

• To get the standard deviation for the dependent variable (as well as mean though it is not of interest in checking this assumption) for each combination of independent variables, from the analyze menu choose compare means then choose means.

• Select from the list of variables the dependent variable you want standard deviations for and move it to the dependent list box.

• Then select the categorical variable that constitutes the first independent variable you’re interested in and move it to the independent list box. Then press the next button so you can identify a second layer. Then select the categorical variable that constitutes the second independent variable you’re interested in and move it to the independent list box. With two layers identified, the mean and standard deviation for each combination of the two independent variables will be displayed in the output window. When done click OK button.

Checking normality assumption

• Under the graphs menu choose legacy dialogs; choose histograms.
• Put continuous variable you need histogram for in the variable box.
• Check 'display normal curve' box.
• Put 1st independent variable in the rows box.
• Put 2nd independent variable in the columns box
• Single click on the OK button when selections complete.

To conduct a Two way (fixed) analysis of variance

• To conduct a two way ANOVA, under the analyze menu choose general linear model then choose univariate.

• Select the dependent variable and move it to the dependent list box. Select the independent variables and move them to the fixed factor box.

• Click on the post hoc button then move variables over to the post hoc test box.  Select Scheffe (or other test) then click continue.

• Click on the options button and move variables you want means for over to the display means box.  Under display select items you need (eg descriptive statistics, effect size) then click continue.  Click OK button.

### Non-Parametric tests for differences

When the dependent variable is an ordinal variable, a non-parametric test should be used to examine group differences. The reason is that one of the assumptions associated with parametric tests is that the data are continuous (at least interval scaled).

When the parametric distributional assumptions (e.g. normality, homogeneity of variance) have been violated, a non-parametric test should be used to examine group differences even though the dependent variable may be continuous.

This excerpt from the SPSS guide to data analysis explains well the application of parametric and non-parametric tests:

"The disadvantage to nonparametric tests is that they are usually not as good at finding differences when there are differences in the population. Another way of saying this is that nonparametric tests are not as powerful as tests that assume an underlying normal distribution, the so-called parametric tests. That’s because nonparametric tests ignore some of the available information. For example, data values are replaced by ranks when using the Wilcoxon test. In general, if the assumptions of a parametric test are plausible, you should use the more powerful parametric test. Nonparametric procedures are most useful for small samples when there are serious departures from the required assumptions. They are also useful when outliers are present, since the outlying cases won’t influence the results as much as they would if you used a test based on an easily influenced statistic like the mean."

### Non-Parametric tests for differences - Wilcoxon

The Wilcoxon test is the non-parametric counterpart to the dependent t-test. It is a statistical procedure for testing the null hypothesis that two medians are equivalent when the two measures of the dependent variable are related. For example, when one group of subjects is tested twice, the two scores are related.

#### Assumptions of the Wilcoxon procedure:

• Symmetry - the differences between pairs of values should be a sample from a symmetric distribution. This is a less stringent assumption than requiring normality, since many distributions besides the normal distribution are symmetric.

• Sample randomly selected. Examine whether or not you have met this assumption by scrutinizing the sampling procedure.

• Dependent variable at least ordinally scaled. Examine whether or not you have met this assumption by checking to see that the dependent variable meets the definition of an ordinally scaled variable.

#### Wilcoxon Step Summary

Checking symmetry

• To get the difference scores, under transform, select compute. In the target variable box type in a variable name for the new variable you are creating that represents the difference between pairs of measures. In the numeric expression box place the first measure of your dependent variable followed by a minus sign followed by the second measure of your dependent variable, then click OK button.

• Next, under the graph menu select histogram. Move the variable representing the difference scores over to the variable box, then click OK button.

Conducting Wilcoxon test

• Under the analyze menu choose nonparametric tests then select legacy dialogs and then choose 2 related samples.

• Select the two variables that represent the two measures of the dependent variable and then move them to the test pairs list box. Then click OK button.
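The ranking logic behind this test can be sketched as follows (illustrative Python with invented scores, not the SPSS implementation). The statistic is the sum of the ranks of the positive differences, with zero differences dropped and tied absolute differences sharing a mid-rank:

```python
def wilcoxon_w(first, second):
    """Wilcoxon signed-rank statistic: sum of ranks of positive differences.
    Zero differences are dropped; tied |d| values share a mid-rank."""
    diffs = [a - b for a, b in zip(first, second) if a != b]
    order = sorted(diffs, key=abs)
    w_plus, i = 0.0, 0
    while i < len(order):
        j = i
        while j < len(order) and abs(order[j]) == abs(order[i]):
            j += 1
        mid_rank = (i + 1 + j) / 2          # average of ranks i+1 .. j
        w_plus += mid_rank * sum(1 for d in order[i:j] if d > 0)
        i = j
    return w_plus

# Hypothetical paired scores for four subjects
w = wilcoxon_w([5, 3, 8, 6], [4, 4, 6, 5])
```

Because only ranks of the differences enter the statistic, a single extreme difference cannot dominate the result the way it can with the dependent t-test.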

### Non-Parametric Tests for Differences - Friedman

The Friedman test is the nonparametric counterpart to the repeated measures ANOVA. To examine whether or not there is a statistically significant difference in medians across repeated measures of a dependent variable, you can use the Friedman test when you have two or more measures of the dependent variable.

#### Assumptions of the Friedman procedure:

• Symmetry - the differences between pairs of values should be a sample from a symmetric distribution. This is a less stringent assumption than requiring normality, since many distributions besides the normal distribution are symmetric.

• Sample randomly selected. Examine whether or not you have met this assumption by scrutinizing the sampling procedure.

• Dependent variable at least ordinally scaled. Examine whether or not you have met this assumption by checking to see that the dependent variable meets the definition of an ordinally scaled variable.

#### Friedman Step Summary

Checking symmetry

• To get the difference scores, under transform, select compute. In the target variable box type in a variable name for the new variable you are creating that represents the difference between pairs of measures. In the numeric expression box place the first measure of your dependent variable followed by a minus sign followed by the second measure of your dependent variable, then click OK button.

• Next, under the graph menu select histogram. Move the variable representing the difference scores over to the variable box, then click OK button.

#### To conduct a Friedman test

• To conduct a Friedman test, under the analyze menu choose nonparametric tests then select legacy dialogs and then choose k dependent samples.

• Select each measure of the dependent variable and move it to the test variable list box.

• Click the continue button. Click OK button.
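The Friedman statistic ranks each subject's measures within that subject, then compares the rank totals across conditions. A Python sketch (invented data, not the SPSS implementation):

```python
def friedman_chi2(rows):
    """Friedman chi-square: rows holds one list of k related measures per
    subject; measures are ranked within each subject (mid-ranks for ties)."""
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        order = sorted(range(k), key=lambda j: row[j])
        i = 0
        while i < k:
            j = i
            while j < k and row[order[j]] == row[order[i]]:
                j += 1
            mid = (i + 1 + j) / 2           # shared mid-rank for tied values
            for m in range(i, j):
                rank_sums[order[m]] += mid
            i = j
    # chi-square_r = 12 / (n k (k+1)) * sum(R_j^2) - 3 n (k+1)
    return 12 / (n * k * (k + 1)) * sum(r ** 2 for r in rank_sums) - 3 * n * (k + 1)

# Hypothetical data: three subjects, each measured under three conditions
chi2_r = friedman_chi2([[1, 2, 3], [2, 4, 6], [1, 3, 5]])
```

The statistic is evaluated against a chi-square distribution with k - 1 degrees of freedom, which is what the SPSS output reports.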

### Non-parametric Tests for Differences - Mann-Whitney U

The Mann-Whitney U test is the nonparametric counterpart to the independent t-test. To examine whether or not there is a statistically significant difference in medians on some dependent variable (at least ordinally scaled) as a function of some independent variable (categorical) you can use the Mann-Whitney U test when you have just two levels (unrelated) of the independent variable (ex: gender).

#### Assumptions of the Mann-Whitney U procedure:

• Sample randomly selected. Examine whether or not you have met this assumption by scrutinizing the sampling procedure.

• Dependent variable at least ordinally scaled. Examine whether or not you have met this assumption by checking to see that the dependent variable meets the definition of an ordinally scaled variable.

#### To conduct a Mann-Whitney U test

• Under the analyze menu choose nonparametric tests then select legacy dialogs and then choose 2 independent samples.

• Select the dependent variable and move it to the test variable(s) list box. Select the independent variable and move it to the grouping variable box.

• Click on the define groups button. In the Group 1 box, type the value that identifies subjects in group 1. In the Group 2 box, type the value that identifies subjects in group 2. These are the values associated with the independent variable.

• Click the continue button. Click OK button.
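The U statistic this dialog produces can be sketched by hand (illustrative Python with invented scores, not the SPSS implementation). All scores are ranked together, tied values share a mid-rank, and U is derived from the rank sum of one group:

```python
def mann_whitney_u(x, y):
    """Mann-Whitney U: rank all scores together (mid-ranks for ties),
    then U1 = n1*n2 + n1(n1+1)/2 - R1; report the smaller of U1 and U2."""
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    r1, i, n = 0.0, 0, len(combined)
    while i < n:
        j = i
        while j < n and combined[j][0] == combined[i][0]:
            j += 1
        mid = (i + 1 + j) / 2               # shared mid-rank for tied values
        r1 += mid * sum(1 for v, g in combined[i:j] if g == 0)
        i = j
    n1, n2 = len(x), len(y)
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    return min(u1, n1 * n2 - u1)

# Hypothetical scores for two unrelated groups
u = mann_whitney_u([5, 7], [6, 8])
```

A U of zero means the two groups' scores do not overlap at all; larger values indicate more intermingling of the ranks.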

### Non-Parametric Tests for Differences - Kruskal-Wallis

The Kruskal-Wallis test is the nonparametric counterpart to the one-way ANOVA. To examine whether or not there is a statistically significant difference in medians on some dependent variable (at least ordinally scaled) as a function of some independent variable (categorical), you can use the Kruskal-Wallis test when you have two or more levels (unrelated) of the independent variable.

#### Assumptions for the Kruskal-Wallis procedure:

• Sample randomly selected. Examine whether or not you have met this assumption by scrutinizing the sampling procedure.

• Dependent variable at least ordinally scaled. Examine whether or not you have met this assumption by checking to see that the dependent variable meets the definition of an ordinally scaled variable.

#### To conduct a Kruskal-Wallis test

• To conduct a Kruskal-Wallis test, under the analyze menu choose nonparametric tests then select legacy dialogs and then choose k independent samples.

• Select the dependent variable and move it to the test variable list box.

• Select the independent variable and move it to the grouping variable box. Click on the define range button. In the minimum box, type the lowest value that identifies subjects in your groups. In the maximum box, type the largest value that identifies subjects in your groups. These are the values associated with the independent variable.

• Click the continue button. Click OK button.
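The H statistic extends the Mann-Whitney rank-sum idea to two or more groups. A Python sketch (invented data, not the SPSS implementation):

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H: rank all scores together (mid-ranks for ties),
    then H = 12 / (N(N+1)) * sum(R_i^2 / n_i) - 3(N+1)."""
    combined = sorted((v, g) for g, grp in enumerate(groups) for v in grp)
    n = len(combined)
    rank_sums = [0.0] * len(groups)
    i = 0
    while i < n:
        j = i
        while j < n and combined[j][0] == combined[i][0]:
            j += 1
        mid = (i + 1 + j) / 2               # shared mid-rank for tied values
        for _, g in combined[i:j]:
            rank_sums[g] += mid
        i = j
    return 12 / (n * (n + 1)) * sum(
        r ** 2 / len(grp) for r, grp in zip(rank_sums, groups)) - 3 * (n + 1)

# Hypothetical scores for three unrelated groups
h = kruskal_wallis_h([[1, 2], [3, 4], [5, 6]])
```

H is evaluated against a chi-square distribution with (number of groups - 1) degrees of freedom, which is the p value reported in the SPSS output.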

### Parametric Tests for Relationships - Correlation

When testing for the presence of a statistically significant relationship, the null hypothesis under examination is that the correlation between your independent and dependent variable is zero.

#### Assumptions when testing for a significant relationship

• Linearity: are the two variables linearly related? This is checked by examining a plot of the two variables. If a straight line can be drawn through the points on the graph, this assumption has been met.

• Homoscedasticity: is the variability of the y values the same at each x? This is checked by examining a plot of the two variables. If the spread around the line through the points on the graph is constant, you have met the homoscedasticity assumption.

• Variables at least interval scaled. Examine whether or not you have met this assumption by checking to see that the variables meet the definition of an interval scaled variable.

If the assumptions are met, continue and test for a significant relationship. If the assumptions are not met, recode the continuous variables to categorical/ordinal data and use the chi square statistic.

#### Correlation Step Summary

Checking Linearity & Homoscedasticity Assumptions

• Under the graphs menu choose legacy dialogs then scatter/dot.

• Click define button. Select one of the two continuous variables and move it to the x axis box. Select the other continuous variable and move it to the y axis box. Click OK button.

#### Conducting correlation test

• Under the analyze menu choose correlate then choose bivariate.

• Select the independent and dependent variables and then move them to the variables box. Then click OK button.
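The Pearson coefficient SPSS reports can be sketched by hand (illustrative Python with invented measures, not the SPSS implementation): the sum of cross-products of deviations, divided by the square root of the product of the two sums of squared deviations:

```python
import math
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient from sums of squared deviations."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) *
                    sum((b - my) ** 2 for b in y))
    return num / den

# Hypothetical paired measures on one group
r = pearson_r([1, 2, 3, 4], [2, 3, 5, 6])
```

The coefficient ranges from -1 to +1; SPSS additionally reports the p value for the test of H0: correlation = 0 described above.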

### Non-parametric tests for relationships - Chi Square

The Chi Square test of independence is used to examine the statistical significance of the relationship between two categorical/ordinal variables.

#### Assumptions associated with the Chi Square test

• Expected frequencies must be greater than 5, and none less than 1.

• Variables categorical or ordinally scaled. Examine whether or not you have met this assumption by checking to see that the variables meet the definition of a categorical or ordinal variable.

• Observations are independent. This implies that an individual can appear only once in a table. It also means that the categories of a variable cannot overlap. Careful examination of the variables chosen is the place to start when checking this assumption followed by how individuals are categorized.

• Sample randomly selected. Examine whether or not you have met this assumption by scrutinizing the sampling procedure.

If the assumptions are met, continue and test for a significant relationship. If the assumptions are not met, no other statistical test is available, so report a measure of practical significance such as Phi or Cramer’s V.

#### Chi Square Step Summary

• Chi Square: Under the analyze menu choose descriptive statistics then choose crosstabs.

• Once inside the crosstabs box, select the row and column variables, then single click on the statistics button and check chi square. Click the continue button, then single click on the OK button when selections are complete.
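The statistic behind the crosstabs output can be sketched as follows (illustrative Python with an invented table, not the SPSS implementation). Each expected count is the row total times the column total over the grand total, and chi square sums (observed - expected)^2 / expected over the cells:

```python
def chi_square(table):
    """Chi square test of independence on an observed-count crosstab.
    Expected count per cell = (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    chi2 = sum((obs - row_totals[i] * col_totals[j] / grand) ** 2
               / (row_totals[i] * col_totals[j] / grand)
               for i, row in enumerate(table) for j, obs in enumerate(row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical 2 x 2 crosstab of observed frequencies
chi2, df = chi_square([[10, 20], [20, 10]])
```

Computing the expected counts this way also makes it easy to check the assumption above that no expected frequency falls below 1 and few fall below 5.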

## Determining Power & Sample Size

While SPSS does have some capacity for power estimation, software specifically designed to estimate power and determine a-priori sample size is recommended. The free software used in this course is G*Power. The directions here apply to the G*Power software.

### Determining Power for a Differences study after a study/analysis complete: Post-Hoc Power Analysis

Post-hoc power analyses are done after you or someone else conducted an experiment.

You have:

* alpha,
* N (the total sample size),
* and the effect size.

Effect size can be conceived of as a measure of the "distance" between H0 and H1. Hence, effect size refers to the underlying population rather than a specific sample. In specifying an effect size, researchers define the degree of deviation from H0 that they consider important enough to warrant attention. In other words, effects smaller than the specified effect size are considered negligible.

You want to know

* the power of a test to detect this effect.

For instance, you tried to replicate a finding that involves a difference between two treatments administered to two different groups of subjects, but failed to find the effect with your sample of 36 subjects (14 in Group 1, and 22 in Group 2).

• Choose Post-hoc as type of power analysis,
• and t-Test on means as type of test.

Suppose you expect a "medium" effect according to Cohen's effect size conventions between the two groups (d = .50), and you want alpha = .05 for a two-tailed test, you

• enter these values (and 14 for n1, plus 22 for n2) and click the "Calculate" button

to find out that your test's power to detect the specified effect is ridiculously low: 1-beta = .2954.

However, you might want to draw a graph using the Draw graph option to see how the power changes as a function of the effect size you expect, or as a function of the alpha-level you want to risk.

### Determining Sample Size for a Differences study at the outset: A-priori Power Analysis

A priori power analyses are done before you conduct an experiment.

You have:

* alpha,
* the desired power (1-beta),
* and the effect size of the effect you want to detect.

You want to know how many subjects you need:

* the total sample size.

For instance, if you want to compare the effects of two treatments administered to two different groups of subjects, you choose

• A priori as type of power analysis,
• and t-Test (means) as type of test with the "two-tailed" option selected.

Suppose you expect a "large" effect according to Cohen's effect size conventions between the two groups (d = .80), and you want to have alpha = beta = .05 (i.e., power = .95), you

enter these values and click the "Calculate" button to find out that you need N = 84 subjects.

If you think this is too much, you might want to have G*Power draw a graph for you to see how the sample size changes as a function of the power of your test, or as a function of the effect size you expect. Simply click on the Draw Graph button.
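The N = 84 above comes from G*Power's exact computation based on the noncentral t distribution. A rough normal-approximation version of the same a priori calculation can be sketched in Python (an illustrative sketch, not G*Power's algorithm; it slightly underestimates the exact answer):

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.95):
    """Approximate per-group n for a two-tailed independent t-test:
    n = 2 * ((z_{1-alpha/2} + z_{power}) / d) ** 2 (normal approximation)."""
    z = NormalDist().inv_cdf
    return math.ceil(2 * ((z(1 - alpha / 2) + z(power)) / d) ** 2)

# Large effect (d = .80), alpha = .05, power = .95, as in the example above:
# the approximation gives 41 per group (N = 82) versus G*Power's exact N = 84.
n = n_per_group(0.80)
```

The formula makes the trade-offs visible: sample size grows with the square of 1/d, so halving the expected effect size roughly quadruples the subjects needed.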

For an ANOVA you need the same information plus you need to specify the number of groups.

For a correlational analysis, the effect size is the value of the correlation coefficient. G*Power will need the same information for a correlational analysis as it did for differences: effect size, alpha, and power (to determine sample size), or effect size, alpha, and sample size (to determine power).

## Factor Analysis

This technique can be used for

• Data reduction: to ascertain the factors underlying data
• Examination of Construct Validity
• Examination of Content Validity

Steps for factor analysis

• Under analyze choose data reduction then choose factor.
• Select the item(s)/test(s) in the survey and the marker item(s)/test(s) that you need to assess construct validity for and move them to the variables box.
• Press the rotation button. Under method choose varimax. Then click continue button.
• Press the options button. Once inside options box, check sort by size. Then click continue button. Then click OK button.