Research: The systematic and replicable investigation of a question/problem.

The research process is often referred to as the scientific method. The language in which the scientific method is discussed is in need of transformation, but it can and should apply to all forms of inquiry. The scientific method takes a linear approach to problem solving and typically entails:

Define and delimit the problem

Formulate the hypothesis

Gather Data

Analyze and interpret findings

Qualitative Research: The nature of the ‘data’ is the distinguishing characteristic. In qualitative research, no summary or reduction of the data to a numerical representation is made.

Quantitative Research: With quantitative research, descriptive and/or inferential statistics are used to summarize data and infer from a sample something about the population the sample represents.

Research need not be entirely one or the other. In fact, a combination will often yield a richer and more comprehensive examination of a question.

Qualitative and Quantitative research are not polar opposites with completely different sets of techniques and approaches to inquiry. They exist along a continuum commonly framed in terms of the amount of control or manipulation present.

The advantage of a quantitative approach is that it is possible to measure the reactions of many people to a limited set of questions, thus facilitating comparison and statistical aggregation of data. A broad, generalizable set of findings results.

The advantage of a qualitative approach is that a wealth of detailed information about a specific event is produced. This increases understanding of the cases and situations studied but reduces generalizability.

Good research begins with a good and well-articulated question. This will help you decide what type of research and data you need to examine.

INDEPENDENT VARIABLE: The variable manipulated by the experimenter. A broader definition would be: any variable that is assumed to produce an effect on, or be related to, a behavior of interest.

LEVELS OF AN INDEPENDENT VARIABLE: The various values or groupings of values of an independent variable. Ex: a study is conducted to determine the effect of room temperature on performance. If the experimenter tests the subject at 70, 80, and 90 degrees, there is one independent variable - room temperature - with three levels.

DEPENDENT VARIABLE: The behavior or characteristic observed or analyzed by the researcher, generally with regard to how the independent variable(s) affected or were related to it.

TYPE OF DEPENDENT VARIABLE: In empirical research, the dependent variable is quantified in some way. Statistical analysis is carried out on the numerical values of the dependent variable. The three basic types are score data (ratio, interval), ordered data (ordinal), and frequency data (categorical).

- Interval/ratio scaled data: Generally requires relatively precise measuring instruments and an understanding of the behavior being measured. The data are considered continuous - you can measure to finer and finer degrees if you choose to. The statistical techniques (parametric) developed to analyze score data make rather stringent assumptions about the nature of the scores.
- Ordinal scaled data: Used when reliable interval/ratio scaled data cannot be (or is not) obtained, but the information can be ranked from high to low along the dimension of interest. In some cases, a researcher may convert score data to ranks because it is believed that the measuring instrument was not precise enough to trust the numerical scores, or that the assumptions underlying a statistical test for continuous data would be badly violated by the data. Statistical tests designed for use with ordered data generally do not make stringent assumptions about the nature of the underlying distributions and hence are more conservative than those designed for score data.
- Categorical (nominal) data: Each subject is classified into a particular category. The frequency of occurrence of subjects in each category typically provides the data from which statistical analysis is done.

Selection of Descriptive Statistics to Summarize Data

| Level of Measurement | Applicable Statistics |
| --- | --- |
| Nominal/Categorical | Percentages, Mode |
| Ordinal | Percentages, Mode, Median* |
| Interval | Mean, Median, Mode, Standard Deviation, Range, Percentiles, Z scores |
| Ratio | Mean, Median, Mode, Standard Deviation, Range, Percentiles, Z scores |

*Note: Use of the median for ordinal data should be applied only in situations where the underlying variable can be considered continuous and the numbers do not simply represent a few discrete categories.
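The choices in the table above can be sketched in a few lines of Python; the reaction-time and color values below are made up purely for illustration:

```python
import numpy as np

# Hypothetical interval/ratio data: reaction times in seconds
scores = np.array([1.2, 1.5, 1.1, 1.9, 1.4, 1.6, 1.3, 1.8])

mean = scores.mean()
median = np.median(scores)
sd = scores.std(ddof=1)                   # sample standard deviation
value_range = scores.max() - scores.min()
p25, p75 = np.percentile(scores, [25, 75])
z_scores = (scores - mean) / sd           # standardized (Z) scores

# Hypothetical nominal data: only percentages and the mode apply
colors = ["red", "blue", "red", "green", "red", "blue"]
mode = max(set(colors), key=colors.count)
pct_red = 100 * colors.count("red") / len(colors)
```

Note that the full set of statistics is computed only for the interval/ratio data; for the nominal data, a mean or median of category labels would be meaningless.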

Selection of Inferential Statistics to Summarize Data

| Level of Measurement | Test Needed | Applicable Statistics |
| --- | --- | --- |
| Ordinal | Differences | Mann-Whitney, Kruskal-Wallis, Friedman |
| Interval/ratio | Differences | t-tests, ANOVAs |
| Nominal/Ordinal | Relationships | Chi-squared |
| Interval/ratio | Relationships | Correlation, Regression |
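As a sketch of how these test choices map onto code using scipy (the group scores and contingency counts are simulated, not from any real study):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
group_a = rng.normal(70, 5, 30)  # simulated scores, condition A
group_b = rng.normal(75, 5, 30)  # simulated scores, condition B

# Interval/ratio data, question of differences -> t-test
t_stat, t_p = stats.ttest_ind(group_a, group_b)

# Ordinal (or assumption-violating) data, differences -> Mann-Whitney
u_stat, u_p = stats.mannwhitneyu(group_a, group_b)

# Nominal data, question of relationships -> chi-squared
counts = np.array([[20, 10],   # hypothetical yes/no counts, group 1
                   [12, 18]])  # hypothetical yes/no counts, group 2
chi2, chi_p, dof, expected = stats.chi2_contingency(counts)

# Interval/ratio data, question of relationships -> correlation
r, r_p = stats.pearsonr(group_a, group_b)
```

The point is the mapping, not the numbers: the same research question ("differences" vs. "relationships") leads to different tests depending on the level of measurement of the dependent variable.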

Participant Observation: When the behaviors of individuals are of interest, observation is an appropriate data collection method. A participant observer is one fully engaged in the activity of the group under study. Depending on the needs of the study, the observer's identity is sometimes concealed. An advantage of this is that the observer is typically better able to interact with others in a more normal fashion. In addition, concealment enhances the probability that the actions of those observed will be more natural and decreases the chance of the researcher affecting the event/activity under observation. Downsides: concealment raises ethical questions, and the researcher's lens may be clouded by participation.

Unobtrusive Observation: Often undertaken in a public/natural setting, where those being observed are unlikely to know they are part of a research project. Advantage: the researcher does not influence events. Downside: unable to collect detailed information on social circumstances, subjects' backgrounds, personal characteristics, or demographics - data are limited to what can be observed without any interaction.

Content Analysis: The study of recorded communication - text, audio, visual. Sources depend on study’s goals. Advantage: can collect info without influence; others can examine data to verify interpretation/results. Downside: difficult to draw conclusions about social issues from recorded sources. Inferences must be limited to the nature and sources of recorded data.

Historical Assessment: Similar to content analysis, however, it typically includes a broader range of data sources - interviews, examination of relics, text, artifacts, geography, archeological data.

Personal Interviews: In unstructured interviewing, general topics are identified prior to the interview; however, much of the interview is guided by respondent comments and the researcher's questions about those comments. In structured interviewing, an interview guide is prepared, pretested, and carefully followed during data collection. Structured interviewing ensures equivalent information is obtained across respondents but limits the breadth/range of responses. Unstructured interviewing can produce very rich, broad-ranging information, but it may not summarize well. Overall, the benefit of face-to-face interviewing is that questions can be clarified, and information about the respondent as well as the environment and context can be recorded (e.g., body language).

Telephone interviewing: Same as personal interviewing, but with an additional downside: individuals may be reluctant to reveal information over the telephone.

Written surveys: Self administered mail surveys permit inclusion of a greater number of respondents across a wider geographic area - and at a lower cost. Downside: response rate can be quite low. Most effective when well structured and focused.

Active interaction: This refers to the collection of performance data from individuals and typically involves exposing a group to some experimental condition, training, treatment, etc. One group may be measured twice (or more) or two groups (exposed to varying conditions, training,...) may be measured once (or more). Advantage: a great deal of ‘control’ possible which strengthens ability to draw inferences. Downside: interaction changes people.

Before enrolling participants in an experiment, the investigator should be genuinely uncertain of the outcome. In other words, a true null hypothesis should exist at the outset.

The investigator must consider how adverse events will be handled; who will provide care for a participant injured in a study and who will pay for that care are important considerations.

Government/institutions typically have definitions around misconduct. In addition, there are many activities commonly considered unethical.

Central to all design & analysis concerns are ethical considerations with respect to

- Treatment of participants. The primary concern of the investigator should be the safety of the research participant.
- IRB review.
- Institutional Review Boards for the Protection of Human Subjects (IRBs) have been established at most institutions that undertake research with humans. These committees are made up of scientists, clinical faculty, and administrators who review research according to the procedures set out in Federal Regulations.
- If your research is part of a routine educational experience, or if your participants will remain completely anonymous (with no identifying code to link them to their identity), you may apply to the IRB for a certificate of exemption.
- A study may also qualify for "expedited review" if an IRB reviewer determines that it meets assessment criteria for minimal risk, and involves only procedures that are commonly done. A study that qualifies for expedited review is still held to the same standards used in full board review, but the approval process may take less time.

- Deception.
- Occasionally exploring your area of interest fully may require misleading your participants about the subject of your study. For example, home plate strike zone study. The IRB will review any proposal that suggests using deception or misrepresentation very carefully. They will require an in-depth justification of why the deception is necessary for the study and the steps you will take to safeguard your participants.

- Informed Consent.
- Federal regulations state: "no investigator may involve a human being as a subject in research covered by these regulations unless the investigator has obtained the legally effective informed consent of the subject or the subject's legally authorized representative." For informed consent to be valid, these principles apply:
- Disclosure: The potential participant must be informed as fully as possible of the nature and purpose of the research, the procedures to be used, the expected benefits to the participant and/or society, the potential of reasonably foreseeable risks, stresses, and discomforts, and alternatives to participating in the research. Document must make clear who to contact with questions/concerns.
- Understanding: The participant must understand what has been explained and must be given the opportunity to ask questions and have them answered by one of the investigators.
- The participant's consent to participate in the research must be voluntary, free of any coercion or promises of benefits unlikely to result from participation.
- Competence: The participant must be competent to give consent. In some cases a surrogate is acceptable.
- Consent: The potential human subject must authorize his/her participation in the research study, preferably in writing

- Privacy and confidentiality for subjects are critical.

- Manipulation of data
- Adding complexity for no legitimate reason should be avoided.
- Transforming raw data to other levels of measurement is acceptable provided sound reasons apply - e.g., converting age to age groups, or collapsing categories when cells have too few cases.
- Never acceptable: fabricating, falsifying, or misrepresenting research data.

- Rigorous error checking
- Simple tools available to detect many errors
- Frequency distribution tables
- Crosstab tables
- Graphs – for outliers

- Accuracy in reporting
- Care should be taken not to compromise external validity of the research.

- Conclusions must be grounded in data.
- Design and implementation of protocols should take into consideration threats to internal validity.

Because researchers often conduct their research on narrowly defined problems, an important task in the evaluation of research is to judge whether a researcher has defined the problem too narrowly to make an important contribution to the advancement of knowledge.

Remember, all methods of observation (data collection) are flawed. There is no perfect way to observe a given variable. An evaluator must ask: to what extent is the method likely to produce valid and reliable data given the purpose/context framed by the researcher?

The most common sampling flaw is the use of a convenience sample or voluntary responses - e.g., a mailed survey. Where self-selection is an issue, the evaluator must first look for the author's acknowledgment of the problem and their perspective on its effect, and second consider whether or not the problem is great enough to invalidate or obscure the findings.

Even what seems like a straightforward analysis can be flawed. For qualitative research, the evaluator must consider the extent to which the design and data collection protocols are likely to produce data with minimal variations in interpretation. For quantitative research, the evaluator must first consider the evidence regarding the extent to which the data (dependent variable) are reliable and valid, second the match between the analysis conducted and the research question, and third the appropriateness of the analyses conducted in the context of data type, assumptions, and the points around which decisions are made (number of tests, p values, alpha).

Details are important in research articles. The evaluator should examine whether or not enough detail is present to fully understand what was said and done to participants as well as how the data was constructed.

No research provides ‘proof’ of anything. When an author writes ‘research proves...’, you are reading work from a weak or questionable source. This alone is enough to call an entire piece of work into question.

Titles should be sufficiently specific

Titles should not describe results

Titles should not pose a simple yes/no question, e.g., do boys and girls differ in upper body strength?

If two part titles are employed BOTH parts should contain specific/important information about the study

If the title is framed around the main analytical question, it is desirable to have the IV and DV in the title, e.g., The relationship between cholesterol level and exercise frequency.

If a narrowly delimited sample is used, it is desirable to include a reference to the population in the title.

Titles should not imply causality unless the analytical techniques employed are appropriate for drawing this type of inference. Words that imply causality: effect, influence, impact, ...

Titles should not use acronyms or jargon.

Purpose of the study should be stated or clearly implied.

A snapshot of the methodology should be given.

Full titles of instruments should not be used unless the purpose of the study is to evaluate the reliability and validity of data from the instrument(s)

Highlights of results should be included, but brevity should not result in a misrepresentation of findings.

References to implications or future research do not belong in the abstract.

The intro should lead in by identifying one or more problems without a lot of extraneous verbiage. Ideally, the first sentence provides a concise statement of the problem and a reference to support the statement.

The importance of the problem should be made clear. Include implications of current research. The point is - has the author made the case?

Unless chronology (e.g., historical research) is of overriding importance, the intro should be developed around topics (not references).

Key terms should be defined as they come up in the intro.

While the author's opinion may be brought into the intro, it must be clearly identified as opinion. Any factual statement requires a source for support, e.g., "incidence of injury has increased in recent years...."

The intro should lead the reader smoothly into a wrap-up paragraph with the study's purpose, research questions, and the reason the study was undertaken.

Underlying theory should be adequately described.

Structure: Depending on the nature of the end product (thesis vs. journal article), the breadth of the opening will vary. In either case, the structure is that of a funnel, and the entire section should take the reader down a logical path that ends with a restatement of the purpose, but now in the context of the foundation set by previous work.

Researchers should be selective in the literature review. Long lists suggest the work has not been scrutinized.

When results vary across studies, the author should identify for the reader which they deem more dependable and why.

Current research must be included, but not at the expense of relevant work, regardless of publication date.

Opinion, when it surfaces, should be clearly communicated as such.

Must use primary sources predominantly.

The methods section of a research paper needs to be meticulously inclusive. Someone not connected with the study should be able to replicate your work just by reading your methods section.

- Instrument development (including reliability & validity information)
- Pilot Study

- Sampling
- Consider sources of invalidity

- Data collection protocol(s)
- Pilot & Main Study
- Consider sources of invalidity

- Statistical analysis
- Descriptive Statistics
- Inferential Statistics - main question
- Descriptive/Inferential Statistics - related questions
- Review Analytical Flow Charts

Length: Technical reports and journal articles may have page limits, and within them authors should convey the critical components others would need to replicate the work. In contrast, a thesis should take as much space as needed and be as meticulously inclusive as possible. Someone not connected with the study should be able to replicate your work just by reading your methods section.

Structure/Content: The following is one recommended structure. The order may vary, however the content should be present.

Instrument development (including reliability & validity information)

- Process thorough?
- Instrument pilot tested (with validity & reliability examined appropriately)

Sampling

- Will the procedure utilized produce a representative sample that inferences can reasonably be made from?
- To what extent is bias likely to be present? Common techniques: random, pseudo-random, stratified random, proportional stratified random. If optimal sampling is not achieved, the author should explicitly identify the implications/limitations.
- Is the sample size adequate? How was the target established?

Data collection protocol(s)

- Should be thorough and ‘tight’ enough to reduce the likelihood of compromising internal validity. Where the collection and/or recording of data has a subjective component, evidence that objectivity/reliability were assessed is essential.

Research design

- If groups were formed, were they equivalent at the start? Were individuals randomly assigned?
- Where multiple tasks/events were involved, was order balanced?
- Does the design indicate every effort was made to minimize sources of invalidity?

Statistical analysis of data - must include how you will assess validity & reliability of dependent variable(s) (and independent variable(s) for relationship study).

- Descriptive Statistics.
- Appropriate for data type?
- Validity & reliability of data examined? Appropriately?
- Contributes to understanding of sample or problem?

- Inferential statistics addressing main problem.
- Appropriate for data type?
- Appropriate for question under examination?
- Should be the least complex it can be and still provide insight to the question(s) being examined.
- Assumptions checked?
- Both statistical and practical significance reported?

- Analyses pertaining to related problems.
- Appropriate for data type?
- Appropriate for question under examination?

Sampling - The selection of a sample is one of the keys to limiting the problems of internal and external validity and reliability of the research.

Identify Population

- Be sure to delimit (specify characteristics - breadth/depth) population.
- To indicate that you are sampling something other than the entire population, use descriptive words like selected, representative, typical, certain, a random sample of, --- . Say precisely what you mean.

Once the population you are interested in has been clearly defined, a strategy for drawing a sample from that population is needed. The sample should be carefully chosen so that all the characteristics present in the total population appear in the sample in the same proportions. In addition, it is important to be clear in the problem statement that you are taking a representative sample for your study.

- Researchers must use language that makes it clear in what way they have delimited the population in order to obtain a sample for study.
- Being representative is as important as being large.

Issues to consider when determining sample size:

- Sampling error is inversely related to sample size. The larger the sample size, the smaller the sampling error and the greater the likelihood that the sample is representative of the population
- Sample size should be greater when variability within the population increases; larger samples are needed when the population is heterogeneous.
- When subgroup comparisons are planned, the overall sample size needs to be large enough that the subgroups can support meaningful comparisons.
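The first point above - that sampling error is inversely related to sample size - can be demonstrated with a quick simulation. The population below is synthetic; the expected standard error of the mean is sigma/sqrt(n):

```python
import numpy as np

rng = np.random.default_rng(0)
population = rng.normal(100, 15, 100_000)  # synthetic population, sd ~ 15

def sampling_error(n, draws=2000):
    """Variability of sample means across many samples of size n."""
    samples = rng.choice(population, size=(draws, n))
    return samples.mean(axis=1).std()

se_small = sampling_error(25)   # roughly 15 / sqrt(25)  = 3.0
se_large = sampling_error(400)  # roughly 15 / sqrt(400) = 0.75
```

Quadrupling precision (cutting the standard error to a quarter) requires sixteen times the sample size, which is why "bigger" yields diminishing returns.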

Bias: Any influence, condition, or set of conditions which singly or together cause distortion of the data from what would have been obtained by pure chance. With this definition, any factor that impairs the randomness of the sample would be considered bias. Bias due to inadequate sampling impairs external validity.

Bias due to inadequate sampling can be a major problem for example in survey research. As the project is conceived the sample should be carefully chosen so that the researcher is able to see all the characteristics of the population in the sample.

Sampling Strategies: There are four major sampling techniques:

- Simple random sampling: The population is generally a homogeneous mass of individual units. Example: a quantity of flower seeds of a particular variety from which random samples are selected for testing their germination quality.
- Simple stratified sampling: The population consists of definite strata, each of which is distinctly different, but the units within a stratum are as homogeneous as possible. Example: a particular town whose total population consists of three types (strata) of citizens: Caucasian, African American, and Mexican American.
- Proportional stratified sampling: The population contains definite strata with differing characteristics, and each stratum has a proportionate ratio, in terms of number of members, to every other stratum. Example: a community in which the total population consists of individuals whose religious affiliations are Catholic (25%), Protestant (50%), Jewish (15%), and unaffiliated (10%).
- Cluster sampling: The population consists of clusters whose cluster characteristics are similar yet whose unit characteristics are as heterogeneous as possible. Example: a survey of travelers using the nation's 20 leading air terminals could be done by cluster sampling; air terminals are similar in atmosphere, purpose, design, etc., yet the passengers who use them differ widely in individual characteristics: age, gender, national origin, philosophies and beliefs, socioeconomic status, and so forth.
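The simple random and proportional stratified techniques can be sketched as follows; the 1,000-person sampling frame is invented to mirror the hypothetical community above (25/50/15/10 percent affiliations):

```python
import random

random.seed(1)

# Hypothetical sampling frame: (person_id, religious affiliation)
frame = ([(i, "Catholic") for i in range(250)] +
         [(i, "Protestant") for i in range(250, 750)] +
         [(i, "Jewish") for i in range(750, 900)] +
         [(i, "Unaffiliated") for i in range(900, 1000)])

# Simple random sampling: every unit has an equal chance
simple = random.sample(frame, 100)

# Proportional stratified sampling: random sample within each
# stratum, sized by the stratum's share of the population
strata = {}
for unit in frame:
    strata.setdefault(unit[1], []).append(unit)

n_total = 100
proportional = []
for members in strata.values():
    k = round(len(members) / len(frame) * n_total)
    proportional += random.sample(members, k)
```

With a sample of 100, the proportional stratified draw contains exactly 25 Catholic, 50 Protestant, 15 Jewish, and 10 unaffiliated units; the simple random draw will only approximate those proportions.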

One step is common to each of the four techniques above: randomization.

A random sample is a subset of observations drawn from a given population in such a way that each observation contained in the population has an equal chance of being included in the sample. In practice, samples seldom meet this criterion for randomness, but they are treated as random if no systematic bias exists that might be expected to invalidate the generalizations based on the sample.

When the focus is on the differences across the strata or subgroups, non-proportional stratified random sampling should be used to select samples of the same size in each stratum.

If it is more important to have a representative sample, then a proportional stratified sampling process should be used to select samples of sizes that are representative of those in the population.

**Random Selection & Random Assignment**

The random selection of subjects is employed to obtain a representative sample of the population. This enhances external validity (generalizing results) and internal validity (results not confounded by sources of invalidity related to bias). The reason to employ random assignment of subjects to treatment groups is to enhance the likelihood that the groups are equivalent at the start.
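Random assignment can be as simple as shuffling the enrolled participants and splitting the list; the participant IDs here are made up for illustration:

```python
import random

random.seed(7)

participants = [f"P{i:02d}" for i in range(1, 21)]  # 20 enrolled subjects

# Shuffle so each subject has an equal chance of either condition,
# then split the list into two equal groups
random.shuffle(participants)
treatment = participants[:10]
control = participants[10:]
```

Note the distinction: random *selection* happens when the sample is drawn from the population; random *assignment* happens afterward, when the selected subjects are allocated to conditions.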

Reminder: sample size is directly related to ‘power’ - the probability of correctly rejecting the null hypothesis. Therefore, it is important that researchers determine sample size from the perspective of power. Software is available to help determine sample size, so there is no reason not to.
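To give a rough sense of how power drives sample size: for a two-group comparison, one common normal-approximation formula is n per group = 2((z_{1-α/2} + z_{1-β})/d)², where d is the standardized effect size. A minimal sketch (dedicated power software or exact t-based methods will give slightly larger values):

```python
import math
from scipy.stats import norm

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided,
    two-group comparison with standardized effect size d."""
    z_alpha = norm.ppf(1 - alpha / 2)  # ~1.96 for alpha = .05
    z_beta = norm.ppf(power)           # ~0.84 for power = .80
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

n = n_per_group(0.5)  # medium effect size -> 63 per group
```

Halving the effect size you want to detect roughly quadruples the required sample, which is why the expected effect must be specified before the study, not after.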

Note on sample size: a well selected and controlled small sample is better than a poorly selected and poorly controlled large sample. Size alone is not the key to good research.

Summary Information on Sampling

The quality of the methods employed to collect data is another key to limiting the problems of internal and external validity and reliability of the research.

For the thesis format this section must be meticulously detailed. Absolutely every piece of information related to collecting the data must be included. In an article format it typically must be tighter since there will be page limits. Specificity with respect to EVERY variable you collect data on is necessary.

Reminder: The methods section of a research paper needs to be meticulously inclusive. Someone not connected with the study should be able to replicate the work just by reading the methods section.

When data is collected via survey, information on the development of the instrument (including reliability & validity information) is critical.

The data collection protocol(s) and selection of a sample are the keys to limiting the problems of internal and external validity and reliability of the research.

- Internal validity: Extent to which results can be attributed to "treatment".
- External validity: Extent to which results can be generalized. External validity is examined qualitatively by scrutinizing the sampling scheme employed.

Sources of Invalidity

- Rosenthal effect: Self-fulfilling prophecy - you get what you expect. Best to do a double-blind study when this is a potential source of invalidity.
- Halo effect: The general effect of a good or bad feeling you have about a person. In observational designs this may be a particular problem. Best to use a checklist and verify the reliability of the instrument and those collecting data.
- Demand characteristics: Allowing subjects to know what the goals are. Deception (of an ethical nature) may be needed to avoid this source of invalidity.
- Volunteer effect: Volunteers may be fundamentally different from the overall population you are trying to generalize to.
- Instrumentation effect: Changes in instruments can be mistaken for changes in subjects.
- Pre-testing effect: Subjects can be changed or learning can take place during a pre-test which could affect results.
- Time: Over a length of time, maturation may have more of an impact than the independent variable. Also, major events can affect subjects' behaviors and/or opinions.
- Hawthorne effect: When the giving of attention rather than the independent variable is the cause of observed differences/relationships.

Reliability of the research: essentially, this refers to the replicability of the research. The reliability of the research is assessed qualitatively by scrutinizing the design and methodology employed in the research.

Reliability of the research hinges on the thoroughness of the data collection protocol in addition to obtaining a representative sample.

To clarify: we are now talking about the reliability & validity of the DATA.

Concerned primarily with the dependent variable. The instrument used to quantify the dependent variable should be examined for its ability to produce valid data (the ability to truly measure what it is supposed to). Valid data is clean and relevant. If the instrument is a well-known one with work already in place establishing the validity of data produced by it, it may be enough to cite a reference where validity was examined and show that the same protocol was followed in your study on similar subjects.

Depending on the type and purpose of data collection, validity can be examined from one or more of several perspectives: content, concurrent, predictive, and construct. Validity of the dependent variable can be assessed qualitatively or quantitatively (the quantitative approaches use an interclass coefficient, i.e., correlation):

Qualitatively

- Content/logical validity - 'expert' review

Quantitatively

- Concurrent validity – correlation
- Predictive validity – correlation
- Construct validity - multi-trait/multi-method procedure (correlations); factor analysis

When measures are found to be valid for one purpose they will not necessarily be valid for another purpose. Validity also may not be generalizable across groups with varying characteristics.

Content/logical validity (assessed qualitatively) - expect authors to

1. Clearly define what was measured.

2. State all procedures used to gather measures.

3. Have had an "expert" assess whether or not the instrument/test is measuring what you think it is.

Content validity (assessed quantitatively) Ex: survey research - expect authors to

1. Pilot test the survey

2. Conduct a factor analysis of survey results

3. Revise based on analysis

4. Administer survey and conduct another factor analysis

Criterion-related validity (predictive and concurrent) - Compare measures from your dependent variable with measures from a criterion (expert, another test, etc.) of the same skill/knowledge.

Concurrent validity (assessed quantitatively) - expect authors to

1. Gather x [dependent variable] and y measures from a large group

2. Compute an appropriate correlation coefficient

3. If the correlation is > .80 for positively correlated variables or < -.80 for inversely related variables, the measure (x) is said to have good concurrent validity

Predictive validity (assessed quantitatively) - expect authors to

1. Gather measures using their instrument (x) and measures on the variable(s) they are trying to predict (y)

2. Compute an appropriate correlation coefficient

3. If the correlation is > .80 for positively correlated variables or < -.80 for inversely related variables, the measure (x) is said to have good predictive validity

4. Follow up with estimation of the SEE - a band placed around a predicted score to quantify prediction error.
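The correlation check and the SEE follow-up can be sketched like this; the instrument and criterion scores below are fabricated purely to illustrate the computation:

```python
import numpy as np

# Fabricated illustration data: instrument scores (x) vs. criterion (y)
x = np.array([12, 15, 11, 18, 14, 16, 13, 17, 10, 19], dtype=float)
y = np.array([25, 30, 24, 37, 29, 33, 26, 35, 22, 38], dtype=float)

r = np.corrcoef(x, y)[0, 1]
good_validity = r > 0.80   # the > .80 rule of thumb from the text

# Standard error of estimate: the band placed around a predicted
# score to quantify prediction error
see = y.std(ddof=1) * np.sqrt(1 - r ** 2)
```

The higher the correlation, the narrower the SEE band; with r near 1 the prediction error approaches zero, and with r near 0 it approaches the full standard deviation of the criterion.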

Construct validity (assessed quantitatively)

A construct is an intangible characteristic. When you want to measure a construct such as anxiety, competitiveness, etc., you have no direct means to do so. Therefore indirect methods need to be employed. To then estimate the validity of the indirect measures (as reflections of the construct you're interested in) you record a pattern of correlations between the indirect measure(s) and other similar and dissimilar measures. Your hope is that the pattern reveals high correlations with similar measures (convergent validity) and low correlations with different measures (divergent/discriminant validity).

Expect authors to employ one of two techniques used to quantitatively assess construct validity

- Multi-trait multi-method matrix

- factor analysis.

Reliability is concerned primarily with the dependent variable. The instrument used to quantify the dependent variable should be examined for its ability to produce reliable data (accuracy of measures reflected in consistency).

Reliability of the dependent variable can be assessed quantitatively using an intraclass coefficient:

1. Coefficient alpha

2. Intraclass R

Reliability of Scores (Norm-referenced Reliability)

Data is reliable when there is little or no measurement error (when scores are accurate). So the key to reliability is minimizing measurement error (highly unlikely to ever eliminate).

When analyzing research, look for sources of measurement error that may have a negative impact on reliability:

- Measuring device/test
- Test administrator
- Temporary effects (warm-up, practice)
- Test length
- Factors that represent sources of invalidity in the research

The reliability of measures is typically assessed in one of two ways:

- Internal consistency - By examining precision and consistency of test scores throughout one administration of a test.
- Stability - By examining precision and consistency of test scores over time. (test-retest)

An intraclass coefficient is needed to examine the reliability of data. The two common statistics used are the intraclass R and coefficient alpha.
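Coefficient alpha can be computed directly from item-level data. A minimal plain-Python sketch with hypothetical survey responses:

```python
# Hypothetical item scores: rows = respondents, columns = test items.
scores = [
    [4, 5, 4, 5],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [2, 3, 2, 3],
    [4, 4, 5, 5],
]

def variance(values):
    # Sample variance (n - 1 in the denominator).
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

k = len(scores[0])  # number of items
item_vars = [variance([row[i] for row in scores]) for i in range(k)]
total_var = variance([sum(row) for row in scores])

# Coefficient (Cronbach's) alpha: higher values indicate more
# internally consistent (reliable) scores.
alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(f"alpha = {alpha:.3f}")
```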

It is possible to have reliable data that is invalid. Data/information that is valid, on the other hand, should also be reliable. So, reliability does not ensure validity.

You are at all times interested in the reliability & validity of both the research and the data when analyzing the quality of research.

Examining the reliability & validity of the research is done by scrutinizing the design, sampling, and data collection protocols. As a reader, you should not assume that, because no mention is made by the author(s), no threats to internal/external validity or reliability existed. Conversely, when such threats are mentioned, that should not necessarily cause you to question the quality of the research.

Examining the reliability & validity of the data is done by scrutinizing the data collection process and statistics used to assess validity and reliability of the data representing the dependent variable.

Reliability and validity of the data (examined statistically) should be reported in a research paper.

- Descriptive Statistics
- Inferential Statistics - main question
- Descriptive/Inferential Statistics - related questions

Measurement Issues

The place to start is with how to classify data - the scale the data appropriately belongs on will affect analysis decisions.

Measurement Scales

Categorical/nominal scale: Used to measure discrete variables that can be classified by two or more mutually exclusive categories.

Ex: Gender is a categorically scaled variable with two categories: male & female. The scale scores (0, 1) have no numerical meaning.

Data at this level of measurement can be summarized by:

Frequency distribution tables

Crosstabulation tables

Charts/graphs

Ordinal scale: Used to measure discrete variables that are categorical in nature and can be ordered (meaningfully).

Ex: Undergraduate class is an ordinally scaled variable with four meaningfully ordered categories: freshman, sophomore, junior, senior. The scale scores (1, 2, 3, 4) have meaning in that juniors have completed more units than sophomores, who have completed more than freshmen . . .

Another example is Likert-scaled items: e.g., strongly agree ---- strongly disagree

Data at this level of measurement can be summarized by:

Frequency distribution tables

Crosstabulation tables

Charts/graphs

There is a tendency to want to jump to the presentation of central tendency and variability at this level of measurement. You should not: the data is not yet continuous (measured to finer degrees).

In survey research, some make the argument that the underlying scale is continuous, however the data is clearly ordinal.

The reasonable exception is when you generate a factor score from several Likert-scaled items. The combined set of several items approaches a continuum, and it is now more meaningful and less misleading to summarize factor scores with measures of central tendency and variability.

Interval scale: Used to measure continuous variables that are ordinal in nature and result in values that represent actual and equal differences in the variable measured.

Ex: Temperature is an interval scaled variable with meaningfully ordered categories (hot, cold) that can be measured (scale has a constant unit of measurement) to finer and finer degrees given appropriate instrumentation.

Data at this level of measurement can be summarized by:

Charts/graphs

Central Tendency & Variability

Correlation

Data is now considered continuous and measures of central tendency and variability are an excellent way to summarize descriptive information on subjects’ characteristics at this level of measurement.

Ratio scale: Used to measure continuous variables that have a true zero, implying total lack of the attribute/property being measured.

Ex: Weight is a ratio scaled variable with meaningfully ordered categories (heavy, light) that can be measured to finer and finer degrees that also has a true rather than arbitrary zero.

Data at this level of measurement can be summarized by:

Charts/graphs

Central Tendency & Variability

Correlation

Depending on level of measurement summary information should be provided on

Participant Demographics

Participant Demographics by subgroup

All other variables relevant to the question under study

All other variables relevant to the question under study by subgroup

Descriptive Statistics

- Frequency Distribution Tables - Percentages
- Crosstabulation Tables - Percentages
- Central Tendency - Mean, Median, Mode
- Variability - Standard Deviation, Range
- Correlation

Category | Frequency | Percent |

High | 15 | 17% |

Medium | 30 | 33% |

Low | 45 | 50% |

When reporting percentages, authors should report the underlying frequencies, because percentages alone can be misleading.

| College A | College B |

Number of Students | 150 | 350 |

Sport Philosophy Students | 12 (8%) | 15 (4%) |

Crosstabulation Tables

For example, if you have data on dominant hand and gender and want to know what percentage of females in a group are left handed, you could crosstabulate the two:

| Left Handed | Right Handed |

Male | 10 (33%) | 20 (67%) |

Female | 15 (33%) | 30 (67%) |

Author should make sure the direction for the total matches the text explanation.
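A sketch of how such a crosstabulation with row percentages might be computed, using hypothetical handedness observations that match the table above:

```python
from collections import Counter

# Hypothetical (gender, handedness) observations for a 2x2 crosstab.
data = ([("M", "Left")] * 10 + [("M", "Right")] * 20 +
        [("F", "Left")] * 15 + [("F", "Right")] * 30)

counts = Counter(data)
for gender in ("M", "F"):
    row_total = counts[(gender, "Left")] + counts[(gender, "Right")]
    for hand in ("Left", "Right"):
        pct = 100 * counts[(gender, hand)] / row_total
        print(f"{gender} {hand}: {counts[(gender, hand)]} ({pct:.0f}%)")
```

Note that the percentages are computed across rows here; this direction is exactly what the text explanation must match.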

Central Tendency

Provides a measure of where scores tend to center. Most commonly reported is the mean; however, it is NOT a representation of the center when the distribution is skewed. The median should be reported in that instance.

Data may be severely misrepresented when an inappropriate measure of central tendency is reported.

Data should be at least interval scaled when using the median or mean.

Responses to individual Likert scaled items are not interval scaled.

Variability

The companion to central tendency. Provides a measure of the spread of scores. Should always be reported with measures of central tendency.

Correlation

Provides a measure of the strength of the relationship between two variables. Selection of a correlation coefficient depends on the variable type:

Two continuous: Pearson Product Moment Correlation

Two true dichotomous: Phi

Two ordinal: Kendall's Tau

One continuous; one true dichotomous: Point Biserial

General Interpretation:

Negative | Interpretation | Positive |

-.8 to -1.0 | High/strong | +.8 to 1.0 |

-.6 to -.79 | Moderately high | +.6 to .79 |

-.4 to -.59 | Moderate | +.4 to .59 |

-.2 to -.39 | Low | +.2 to .39 |

0 to -.19 | Little/no relationship | 0 to +.19 |
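The general interpretation scale can be expressed as a small helper function. The labels follow the table above; the handling of exact boundary values is an assumption:

```python
def interpret_r(r):
    # Map the absolute value of a correlation coefficient onto the
    # general interpretation scale.
    a = abs(r)
    if a >= 0.8:
        return "high/strong"
    if a >= 0.6:
        return "moderately high"
    if a >= 0.4:
        return "moderate"
    if a >= 0.2:
        return "low"
    return "little or no relationship"

print(interpret_r(0.85), interpret_r(-0.45), interpret_r(0.1))
```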

When interested in differences or change over time for one group or between groups a number of designs are applicable. The most frequently used designs can be collapsed into two broad types: true experimental and quasi-experimental.

True experimental designs: these designs all have in common the fact that the groups are randomly formed. The advantage associated with this feature is that it permits the assumption that the groups were equivalent at the beginning of the research, which provides control over sources of invalidity based on non-equivalency of groups.

The control is of course not inherent in the design. The researcher must still work with the groups in such a way that nothing happens to one group (other than the treatment) that does not happen to the other, that scores on the dependent measure do not vary as a result of instrumentation problems, and that the loss of subjects is not different between the groups.

This design requires the formation of at least two groups. One group will receive the ‘experimental treatment’; the other will not. The group not receiving the treatment is commonly referred to as the control group.

This design allows the researcher to test for significant differences between the control and experimental group after the experimental group has received the treatment. An independent t-test or one-way analysis of variance (ANOVA) may be used to statistically test the null hypothesis that

H0: µ1 = µ2.

In this design there is one independent variable and one dependent variable. When there are 2 levels of the independent variable either a t-test or one-way ANOVA can be used. When there are 3 or more levels of the independent variable then the one-way ANOVA must be used. For example, when there are 3 levels of the independent variable the null hypothesis is:

H0: µ1 = µ2 = µ3.

In this expanded design there is still one independent variable (now with 3 levels) and one dependent variable. The independent variable is still groups or treatment condition and the dependent variable is again the variable under study.
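A sketch of testing H0: µ1 = µ2 = µ3 with a one-way ANOVA, assuming Python with SciPy and hypothetical scores for three randomly formed groups:

```python
from scipy import stats

# Hypothetical scores for three groups (3 levels of the IV).
g1 = [24, 26, 23, 25, 27]
g2 = [30, 31, 29, 32, 30]
g3 = [24, 25, 26, 24, 25]

# One-way ANOVA testing H0: mu1 = mu2 = mu3.
f_stat, p_value = stats.f_oneway(g1, g2, g3)

alpha = 0.05
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```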

Essentially an extension of the randomized-groups design, this design has more than one independent variable and just one dependent variable. This design requires the formation of a group for every combination (of every level) of the two or more independent variables.

This design allows the researcher to test for significant differences as a function of each independent variable separately (main effects) and in combination (interaction). A two-way ANOVA would be used to statistically test the null hypothesis that H0: µ1 = µ2 = µ3 = ... for the first independent variable, that µ1 = µ2 = µ3 = ... for the second independent variable, and that the interaction is not significant.

The ‘jargon’ commonly associated with a factorial design looks like:

a 2X2 ANOVA .....

This is communicating that there are two levels of the first independent variable and two levels of the 2nd independent variable. The language used to talk about the results would be the main effect for the first IV, the main effect for the 2nd IV, and the interaction.

Variation of factorial design: When one or more of the independent variables is a categorical variable, such as gender, where individuals cannot be randomly assigned to the levels, you have a factorial design that no longer qualifies completely as a true experimental design. It is, however, used quite frequently and is quite appropriate when the topic under study calls for the examination of characteristics that people cannot be ‘assigned’ to.

In its simplest form, this design requires the formation of two groups. One group will receive the ‘experimental treatment’; the other will not. The group not receiving the treatment is still referred to as the control group.

Consider a dietary seminar intended to change eating habits particularly with respect to consumption of fat.

Group 1 | Pre Test | Seminar | Post Test |

Group 2 | Pre Test | (no seminar) | Post Test |

In this example there are two independent variables and one dependent variable. In the situation depicted above there are two levels of each independent variable. The first independent variable is group or treatment condition (two levels - experimental/group 1 & control/group 2). The second independent variable is test (two levels - pretest & posttest). The dependent variable is grams of fat consumed.

The repeated measures design is a variation of the completely randomized design though not considered a true experimental design. Instead of using different groups of subjects, only one group of subjects is formed and all subjects are measured/tested multiple times. There is no control group.

This design allows the researcher to test for significant differences produced by the treatment - are the means across repeated measures different. A repeated measures ANOVA is the recommended analytical procedure. With this approach you have one independent variable and one dependent variable.

As an example, assume that a researcher wants to know whether or not mean scores on a measure of exercise satisfaction change depending on the environment runners exercise in. To answer this, the researcher obtains measures of exercise satisfaction from subjects after they run in an urban setting, the countryside, an indoor track, and an outdoor track. The dependent variable is exercise satisfaction and the independent variable is exercise environment.

The major advantage of this design over the completely randomized design is that fewer subjects are required. In addition, very often increased statistical power is gained because the random variability of a single subject from one measure to the next is usually much less than the variability introduced by measuring and comparing different subjects. The major disadvantage is that there may be carry-over effects from one treatment/testing to the next. In addition, subjects might become progressively more proficient at performing the criterion task and show an improvement in performance more attributable to learning than the treatment.

Regardless of the design, tests of significance should be followed by an examination of practical significance.

When interested in the relationship between/among variables, there are no design designations like ‘factorial’. The design in this situation is equated with the analytical technique to be employed. Even without design names, good researchers communicate clearly what the independent and dependent variables were and how the strength of the relationship was tested. In addition, an examination of practical significance is essential.

The null hypothesis under examination with a relationship question is:

H0: ρ = 0

To examine whether or not there is a statistically significant difference in means on some dependent variable (continuous) as a function of some independent variable (categorical) you can use the t-test when you have just two levels of the independent variable (ex: gender)

Independent t-test Statistical Procedure for testing H0: µ1 = µ2 when the two levels of the independent variable are not related.

Dependent t-test Statistical Procedure for testing H0: µ1 = µ2 when the two measures of the dependent variable are related. For example, when one group of subjects is tested twice, the two scores are related.

There are distributional assumptions associated with parametric statistics such as the t-test and ANOVAs. The most basic are:

- Homogeneity of Variance: Are the spread of scores associated with each mean similar
- Normality: Is the shape of the distribution of scores around each mean normal.

Authors should convey to readers the results of checking assumptions. If the assumptions are violated, then the non-parametric equivalent should be used.

Assessing statistical significance. Following analyses using a t-test, you could compare the t statistic to an appropriate table of critical values. Information needed is alpha and df:

n1+n2-2 (independent)

N –1 (dependent)

If the t statistic > critical value, you can reject your null hypothesis. Most frequently, however, authors have used software to give them a p value to compare to the alpha they’ve chosen. If the p value < the alpha, you can reject the null hypothesis. REMEMBER: if multiple tests are done, alpha should be adjusted before the comparison is done.

Note: the p value can be considered the probability that the findings are due to chance (sampling error).
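Both routes to the decision (critical value and p value) can be sketched for an independent t-test, assuming SciPy and hypothetical data:

```python
from scipy import stats

# Hypothetical scores for two independent groups.
g1 = [12, 14, 11, 13, 15, 12]
g2 = [16, 18, 17, 19, 16, 17]

t_stat, p_value = stats.ttest_ind(g1, g2)
df = len(g1) + len(g2) - 2                  # n1 + n2 - 2 (independent)
alpha = 0.05
critical = stats.t.ppf(1 - alpha / 2, df)   # two-tailed critical value

print(f"t = {t_stat:.2f}, df = {df}, critical = {critical:.2f}, p = {p_value:.4f}")
# Both routes lead to the same decision:
print("Reject H0" if abs(t_stat) > critical else "Fail to reject H0")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```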

Assessing practical significance. Remember, the above ‘test’ tells you whether there's a statistically significant difference, not whether the difference is of any practical importance. Therefore, it's important for authors to take the next step and examine practical significance by calculating a statistic such as omega squared - the proportion of total variance that can be explained by the independent variable. Another useful measure is an effect size.

Effect Size. Infrequently reported, but this statistic is very valuable when it comes to interpreting results. It conveys the size of the effect observed in a way that permits interpretation of the practical significance of the results.

For a differences study:

Interpretation:

.30 Small effect

.50 Moderate effect

.80 Large effect
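One common effect size for a differences study is the standardized mean difference (Cohen's d: mean difference divided by the pooled standard deviation). The notes give only the interpretation thresholds, so the formula and the summary statistics below are supplied here as an illustration:

```python
import math

# Hypothetical summary statistics for two groups.
m1, s1, n1 = 72.0, 8.0, 30
m2, s2, n2 = 66.0, 9.0, 30

# Pooled standard deviation, then the standardized mean difference.
sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
d = (m1 - m2) / sp
print(f"d = {d:.2f}")  # by the scale above, a moderate-to-large effect
```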

To examine whether or not there is a statistically significant difference in means on some dependent variable (continuous) as a function of some independent variable (categorical) you can use the F test from an ANOVA table when you have two or more levels of the independent variable (ex: 3 training protocols)

Statistical Procedure for testing H0: µ1 = µ2 = ... when the two or more levels of the independent variable are not related.

There are distributional assumptions associated with parametric statistics such as the t-test and ANOVAs. The most basic are:

- Homogeneity of Variance: Are the spread of scores associated with each mean similar
- Normality: Is the shape of the distribution of scores around each mean normal.

Authors should convey to readers the results of checking assumptions. If the assumptions are violated, then the non-parametric equivalent should be used.

Assessing statistical significance. Following analyses using an F test, you could compare the F statistic to an appropriate table of critical values. Information needed is alpha and df:

K-1; N-K

If your F statistic > critical value, you can reject your null hypothesis. Most frequently, however, authors have used software to give them a p value to compare to the alpha they’ve chosen. If the p value < the alpha, you can reject the null hypothesis. REMEMBER: if multiple tests are done, alpha should be adjusted before the comparison is done.

Assessing practical significance. Remember, the above ‘test’ tells you whether there's a statistically significant difference, not whether the difference is of any practical importance. Therefore, it's important for authors to take the next step and examine practical significance by calculating a statistic such as eta squared - the proportion of total variance that can be explained by the independent variable. Another useful measure is an effect size.
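Eta squared is the ratio of the between-groups sum of squares to the total sum of squares. A plain-Python sketch with hypothetical data:

```python
# Hypothetical scores for three treatment groups.
groups = [
    [24, 26, 23, 25, 27],
    [30, 31, 29, 32, 30],
    [24, 25, 26, 24, 25],
]

all_scores = [s for g in groups for s in g]
grand_mean = sum(all_scores) / len(all_scores)

# Eta squared = SS_between / SS_total: the proportion of total variance
# explained by the independent variable (group membership).
ss_total = sum((s - grand_mean) ** 2 for s in all_scores)
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

eta_sq = ss_between / ss_total
print(f"eta squared = {eta_sq:.3f}")
```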

Effect Size. Infrequently reported, but this statistic is very valuable when it comes to interpreting results. It conveys the size of the effect observed in a way that permits interpretation of the practical significance of the results.

For a differences study:

You now have two independent variables and one dependent variable. The two-way ANOVA provides information on three null hypotheses:

A difference in the dependent variable due to the 1st independent variable

A difference in the dependent variable due to the 2nd independent variable.

A difference in the dependent variable due to the interaction of the two independent variables.

Assumptions - Homogeneity of Variance, Normality

Assessing Statistical Significance

Take a look at the p values for each of the main effects and the interaction. If the p value < the alpha, you can reject the null hypothesis. REMEMBER: since multiple tests are done, alpha should be divided by 3 before the comparison is done.
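The alpha adjustment can be sketched as follows (the p values are hypothetical):

```python
# Three hypothetical p values: two main effects and the interaction.
alpha = 0.05
p_values = {"main effect IV1": 0.010, "main effect IV2": 0.030, "interaction": 0.200}

# Divide alpha by the number of tests before comparing.
adjusted_alpha = alpha / len(p_values)
for test, p in p_values.items():
    decision = "reject H0" if p < adjusted_alpha else "fail to reject H0"
    print(f"{test}: p = {p:.3f} vs {adjusted_alpha:.4f} -> {decision}")
```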

Assessing practical significance. Remember, the above ‘test’ tells you whether there's a statistically significant difference, not whether the difference is of any practical importance. Therefore, it's important for authors to take the next step and examine practical significance by calculating a statistic such as eta squared - the proportion of total variance that can be explained by the independent variable.

Statistical Procedure for testing H0: µ1 = µ2 = ... when the two or more measures of the dependent variable are related. For example, when one group of subjects is tested two or more times, the two scores are related.

Assumptions:

Repeated Measures at least interval scaled

Sphericity

Assessing statistical significance. Following analyses using an F test, you could compare the F statistic to an appropriate table of critical values. Information needed is alpha and df:

K-1; (K-1)(N-1)

If your F statistic > critical value, you can reject your null hypothesis. Most frequently, however, authors have used software to give them a p value to compare to the alpha they’ve chosen. If the p value < the alpha, you can reject the null hypothesis. REMEMBER: if multiple tests are done, alpha should be adjusted before the comparison is done.

Assessing practical significance. Remember, the above ‘test’ tells you whether there's a statistically significant difference, not whether the difference is of any practical importance. Therefore, it's important for authors to take the next step and examine practical significance by calculating a statistic such as eta squared - the proportion of total variance that can be explained by the independent variable.

Mann Whitney: This statistic is the non-parametric equivalent to the independent t-test. There are no distributional assumptions to meet. This statistic tests for a difference in two medians and should be used when the underlying distribution can be considered continuous.

Wilcoxon: This statistic is the non-parametric equivalent to the dependent t-test. There are no distributional assumptions to meet. This statistic tests for a difference in two related medians and should be used when the underlying distribution can be considered continuous.

Kruskal-Wallis: This statistic is the non-parametric equivalent to the one-way ANOVA. There are no distributional assumptions to meet. This statistic tests for a difference in two or more medians and should be used when the underlying distribution can be considered continuous.
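Usage of the three non-parametric equivalents, assuming SciPy and hypothetical ratings (for the Wilcoxon call, g1 and g2 are reused as related pre/post scores from one group):

```python
from scipy import stats

# Hypothetical skewed ratings.
g1 = [3, 5, 4, 6, 2, 5]
g2 = [7, 8, 6, 9, 7, 8]
g3 = [4, 5, 3, 6, 4, 5]

u_stat, p_mw = stats.mannwhitneyu(g1, g2)   # equivalent of independent t-test
h_stat, p_kw = stats.kruskal(g1, g2, g3)    # equivalent of one-way ANOVA
w_stat, p_w = stats.wilcoxon(g1, g2)        # equivalent of dependent t-test

print(f"Mann-Whitney p = {p_mw:.4f}")
print(f"Kruskal-Wallis p = {p_kw:.4f}")
print(f"Wilcoxon p = {p_w:.4f}")
```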

Assumptions for non-parametric (differences) tests

1. Samples were drawn at random from the population under consideration.

2. Variable(s) under study have underlying continuity.

Statistical Significance: Correlation

Practical Significance: Coefficient of Determination

Pearson Product Moment Correlation. When examining the null hypothesis ρ = 0, it is important to remember that the reliability of the research should be considered. In this setting, this is a matter of considering the reliability of the correlation coefficient. Said another way, the question becomes: if the study is repeated, would the coefficient be similar? The answer rests in an examination of the sample size and the variability of scores.

A restriction in the range of scores (sampling; subgroups) can drastically affect the correlation coefficient. Interpretation must take into consideration the variability of the scores.

Assumptions:

Linearity: a straight line can be drawn through the points on a scatterplot

Data for both x and y at least interval scaled

Assessing statistical significance. Following analyses using a PPMC, you could compare the PPMC statistic to an appropriate table of critical values. Information needed is alpha and df:

N-2

If the PPMC statistic > critical value, you can reject your null hypothesis. Most frequently, however, authors have used software to give them a p value to compare to the alpha they’ve chosen. If the p value < the alpha, you can reject the null hypothesis. REMEMBER: if multiple tests are done, alpha should be adjusted before the comparison is done.

Assessing practical significance. Remember, the above ‘test’ tells you whether there's a statistically significant relationship, not whether the relationship is of any practical importance. Therefore, it's important for authors to take the next step and examine practical significance by calculating a statistic such as the coefficient of determination - r2 - the proportion of total variance that can be explained by the independent variable.

This is the most common approach to prediction problems when you have one dependent variable and multiple independent variables.

When used as a data reduction tool, the process can be viewed as a step by step consideration of which variables in combination with each other are most strongly correlated with the dependent variable.

Assumptions - Regression

Linearity: a straight line can be drawn through the points on a scatterplot

Homoscedasticity: Y values at each x similar in variability

Dependent variable at least interval scaled

Multicollinearity: relationship among the independent variables

Note: The distributional assumptions are likely to be violated when:

1. N small

2. Growth is present. Variance tends to increase with age.

3. Observations/trials truncated or insufficient practice given. Pattern may be curvilinear.

Hypothesis testing for significant regression

H0: b = 0

Assessing statistical significance. Following analyses based on the analysis of variance procedure, you could compare the F statistic to an appropriate table of critical values. Information needed is alpha and df:

K; N-K-1

If the F statistic > critical value, you can reject your null hypothesis. Most frequently, however, authors have used software to give them a p value to compare to the alpha they’ve chosen. If the p value < the alpha, you can reject the null hypothesis. REMEMBER: if multiple tests are done, alpha should be adjusted before the comparison is done.

Assessing practical significance

Remember, the above ‘test’ tells you whether there's a statistically significant relationship, not whether the relationship is of any practical importance. Therefore, it's important for authors to take the next step and examine practical significance by calculating a statistic such as the coefficient of determination - r2 - the proportion of total variance that can be explained by the independent variable(s). Another useful measure is the effect size.
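For the simple one-predictor case, testing H0: b = 0 and reporting r2 can be sketched with SciPy's linregress (hypothetical data; multiple regression would require a different routine):

```python
from scipy import stats

# Hypothetical predictor (x) and dependent variable (y).
x = [2, 4, 6, 8, 10, 12, 14, 16]
y = [5, 9, 11, 16, 19, 24, 26, 31]

result = stats.linregress(x, y)
r_squared = result.rvalue ** 2   # coefficient of determination

alpha = 0.05
print(f"b = {result.slope:.2f}, p = {result.pvalue:.4f}, r^2 = {r_squared:.3f}")
print("Reject H0: b = 0" if result.pvalue < alpha else "Fail to reject H0")
```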

Relationships - Non-parametric

Statistical Significance: Chi-squared

Practical Significance: Cramer’s V & Phi

The statistic that will test for the presence of a relationship between two categorical variables (though it can also be used on ordinal data) is the chi-square statistic. The null hypothesis under examination is:

ρxy = 0

This is read as: the correlation between x and y is zero. Another way to say this is that the variables x and y are independent. In fact the χ2 statistic is commonly referred to as the chi square test of independence.

Assumptions

The expected frequency in all cells is at least 5.

Data must be random samples from multinomial distributions.

Assessing statistical significance. Following analyses using the χ2 statistic, you could compare the χ2 statistic to an appropriate table of critical values. Information needed is alpha and df:

df = (R-1)(C-1)

Where R = # of rows, and C = # of columns in a cross-tabulation table.

If the χ2 statistic > critical value, you can reject your null hypothesis. Most frequently, however, authors have used software to give them a p value to compare to the alpha they’ve chosen. If the p value < the alpha, you can reject the null hypothesis. REMEMBER: if multiple tests are done, alpha should be adjusted before the comparison is done.

Assessing practical significance. Remember, the above ‘test’ tells you whether there's a statistically significant relationship, not whether the relationship is of any practical importance. Therefore, it's important for authors to take the next step and examine practical significance by calculating an effect size statistic such as Phi or Cramer’s V.

For a relationship study, the effect size is the correlation coefficient (Phi, Cramer’s V). These statistics convey the strength of the relationship between the two categorical variables. Interpretation:

.30 Small

.50 Moderate

.80 Large
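The chi-square test of independence followed by Cramer's V can be sketched with SciPy (the 2x2 table is hypothetical; for a 2x2 table Cramer's V equals Phi):

```python
import math
from scipy.stats import chi2_contingency

# Hypothetical 2x2 cross-tabulation of two categorical variables.
table = [[30, 10],
         [15, 35]]

chi2, p, dof, expected = chi2_contingency(table)

# Cramer's V: effect size for the strength of the relationship
# between the two categorical variables.
n = sum(sum(row) for row in table)
k = min(len(table), len(table[0]))
cramers_v = math.sqrt(chi2 / (n * (k - 1)))

print(f"chi2 = {chi2:.2f}, df = {dof}, p = {p:.4f}, V = {cramers_v:.2f}")
```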

Best to use descriptive statistics to examine related questions so as not to diminish power.

Selection of Descriptive Statistics to Summarize Data

Level of Measurement | Applicable Statistics |

Nominal/Categorical | Percentages, Mode |

Ordinal | Percentages, Mode, Median* |

Interval | Mean, Median, Mode, Standard Deviation, Range, Percentiles, Z scores, Correlation |

Ratio | Mean, Median, Mode, Standard Deviation, Range, Percentiles, Z scores, Correlation |

*Note: Use of the median for ordinal data should be applied only in situations where the underlying variable can be considered continuous and the numbers do not simply represent a few discrete categories.

Summarizing Data

- Descriptive Statistics
- Frequency Distribution Tables - Percentages
- Crosstabulation Tables - Percentages
- Central Tendency - Mean, Median, Mode
- Variability - Standard Deviation, Range
- Correlation

- Graphs
- Continuous data: Frequency polygon or Histogram

- Discrete data: Bar Chart; Pie Chart

General Principles

3/4 Rule

Label axes for correct interpretation

Begin the vertical axis with the value zero.

To show trend, several points along the way have to be depicted if interpretation is to be sound

Only depict one aspect of a problem

Should not employ cumulative charts

- Inferential Statistics
- Differences - parametric
- t-test - Independent & Dependent; omega squared
- ANOVA - one-way, two-way, repeated measures; eta squared
- Assumptions - Homogeneity of Variance, Normality

- Differences - Non-parametric
- Mann Whitney
- Kruskal-Wallis
- Wilcoxon
- No distributional Assumptions

- Relationships - Parametric
- Correlation; Coefficient of Determination
- Regression; Coefficient of Determination
- Assumptions - homoscedasticity, linearity; multicollinearity

- Relationships - Non-parametric
- Chi-squared; Cramer’s V & Phi
- No distributional Assumptions

Practical Significance

- Effect Size
- Coefficient of Determination
- Omega Squared
- Eta Squared

Distinction to reinforce: Continuous variables are ones at least interval scaled - call for use of parametric statistics. Discrete variables are ones that are categorical or ordinal in nature and call for use of non-parametric statistics.

Depending on level of measurement, statistical testing is an appropriate process for examining the main research question.

Null Hypothesis for differences: H0: µ1 = µ2

Null Hypothesis for Relationships: H0: ρ = 0

The decision as to whether or not a parametric test is appropriate is tied to whether or not the assumptions are met.

Work is not complete until an assessment of practical significance is done (minimally, an effect size).

Relationships: Coefficient of Determination

Differences: Eta squared (or omega squared)

Structure/Content: Entire text should be cohesive and follow a logical path that generates confidence in the findings. The following is one recommended structure. The order may vary, however the content should be present and should match the information conveyed in the analysis portion(s) of the methods section.

- Descriptive information [Note: Where relevant, tables should be used to summarize large volumes of data and text should highlight important elements]
- On sample
- Should summarize the personal and demographic information that helps the reader understand the nature of the research participants

- On relevant variables
- Should summarize pertinent variables that show interesting patterns and/or provide insight into the main question.

- On variables by subgroup(s)
- Should convey additional insights that contribute to an understanding of the nature of the research participants and/or the problem under examination.

- Psychometric properties of data
- As appropriate, results of examinations of objectivity, reliability, and validity of data should be provided.

- Analyses related to main problem
- Should provide concise reporting of results from check of assumptions
- Should provide clear results of hypothesis testing including:
- Statistic
- Degrees of freedom
- P value
- Table (e.g. ANOVA) when appropriate

- Examination of practical significance
- Analyses connected with related problems
- Statistical examination - not recommended
- Note: Remember that each statistical test requires that you divide your alpha to adjust for the increased chance of making a type I error. Therefore, it is wise to limit the number of statistical tests conducted.
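The alpha-splitting in the note above is the Bonferroni adjustment. A minimal illustration (the five-test count is arbitrary), including the family-wise type I error rate you would face with no adjustment:

```python
# Bonferroni: divide alpha by the number of tests so the family-wise
# type I error rate stays at roughly the original alpha.
alpha = 0.05
n_tests = 5

alpha_per_test = alpha / n_tests

# Family-wise error rate with NO adjustment, assuming independent tests:
# the chance of at least one false rejection across all five tests.
familywise_unadjusted = 1 - (1 - alpha) ** n_tests
```

With five unadjusted tests the chance of at least one type I error is already over 20%, which is why limiting the number of tests (or adjusting alpha) matters.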

- Descriptive Statistics
- Should directly provide information pertaining to related question
- Statistics reported must be appropriate for the data type summarized.

A tight summary of purpose and results should be included.

Methodological limitations should be clarified.

Results should be cast in light of the literature cited in the introduction/review of literature.

New literature should not be brought into the discussion.

Implications and/or recommended action in light of findings should be drawn out in this section.

Next steps likely to extend or clarify research presented should be suggested.

Any speculations must be clearly identified as such. There should be no doubt as to whether the discussion of results is data based or conjecture.

Hypothesis testing involves examination of a statistically expressed hypothesis. The statistical expression is referred to as the null hypothesis. It is called null because the expression when completed implies no difference or relationship depending on the problem being examined.

You can think of hypothesis testing as trying to see if your results are unusual enough so that they would not even be expected by chance.

Key Elements

- Type I error = Incorrectly deciding to reject the null hypothesis; that is, rejecting a true null hypothesis.
- Type II error = Incorrectly deciding not to reject the null hypothesis; that is, failing to reject a false null hypothesis.
- α = The level of risk an experimenter is willing to take of rejecting a true null hypothesis. Often called the level of significance, this value is used in establishing a critical value around which decisions (reject or not reject null) are made. It is also common to define alpha as the probability of incorrectly rejecting a true null hypothesis or the probability of making a type I error.
- β = The level of risk (not under direct control of an experimenter) of failing to reject a false null hypothesis. It is also common to define beta as the probability of making a type II error. Power = 1 - β. The probability of correctly rejecting a false null hypothesis.

- Effect Size = Magnitude of the effect of an independent variable on a dependent variable. For a differences study this is conveyed as the difference between means in standard deviation units. For a relationships study, the effect size is the correlation coefficient.
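The standardized mean difference described above is commonly computed as Cohen's d. A sketch assuming NumPy, with made-up scores:

```python
import numpy as np

def cohens_d(a, b):
    """Difference between means in pooled standard deviation units."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    n1, n2 = len(a), len(b)
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# Hypothetical scores for two groups.
group1 = [52, 55, 58, 61, 49, 54]
group2 = [48, 50, 47, 52, 45, 51]
d = cohens_d(group1, group2)
```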

Interpretation guidelines for a standardized mean difference (Cohen's conventions):

.20 small

.50 moderate

.80 large

It is important to recognize that the term effect size can refer to many different measures, each with its own interpretation guidelines. So, in written work always specify the value reported (e.g. eta squared) rather than just 'effect size'.

In practice you will never know whether or not you've made a poor decision (a type I or type II error), but you can (a) set the probability of a type I error when you select your alpha, and (b) determine beta (by estimating power) to estimate the probability of a type II error. Note: since sample size is directly related to power (and so tied to beta), studies with small samples will fail to find statistically significant results even when true differences or relationships exist.
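The fixed type I error rate can be seen by simulation: draw both groups from the same population (so the null is true) and count how often the test rejects anyway. A sketch assuming SciPy; sample sizes, the seed, and the simulation count are arbitrary:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha = 0.05
n_sims = 2000

# Simulate many studies in which the null hypothesis is true
# (both samples come from the same population).
false_rejections = 0
for _ in range(n_sims):
    a = rng.normal(size=25)
    b = rng.normal(size=25)
    _, p = stats.ttest_ind(a, b)
    if p < alpha:
        false_rejections += 1

# Proportion of wrongful rejections; should land near alpha.
type_i_rate = false_rejections / n_sims
```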

Power - is the probability of correctly rejecting a false null hypothesis.

Ideally, power should be considered when planning a study, not after it is over. Knowing what you would like power to be you can determine (using software or power charts) what your sample size should be.

If power is not considered at the start of a study it should be estimated at the end, particularly when non-significant results arise.

Sample size is closely tied to power. True differences/relationships go unnoticed without enough subjects. On the other hand, trivial differences/relationships can be statistically significant with large sample sizes.

Another factor affecting power is measurement precision. As precision increases, power increases.

- If you decrease alpha (more stringent), power will decrease, so beta will increase.

- As you increase sample size, power increases, so beta decreases.

- As you enhance measurement precision, error variance shrinks, so beta decreases and power increases (alpha stays wherever you set it).

- As effect size increases, beta decreases, so power increases.
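The link between sample size and power can be verified with a small Monte Carlo estimate. A sketch assuming SciPy; the effect size, group sizes, seed, and simulation count are illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

def estimate_power(n, effect=0.5, alpha=0.05, sims=1000):
    """Monte Carlo power estimate for a two-sample t test:
    proportion of simulated studies that correctly reject a false null."""
    rejections = 0
    for _ in range(sims):
        a = rng.normal(0.0, 1.0, n)
        b = rng.normal(effect, 1.0, n)  # true difference exists
        if stats.ttest_ind(a, b).pvalue < alpha:
            rejections += 1
    return rejections / sims

power_small = estimate_power(n=20)
power_large = estimate_power(n=80)  # larger n -> higher power
```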

[Figures omitted: null and alternative distributions for a two-tailed test under four conditions - alpha = .05 (baseline); alpha = .05 with increased effect size; alpha = .01 (more stringent); alpha = .05 with a large sample.]

To determine sample size in the context of power, or to determine power at the completion of a study, use the G*Power software.
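If G*Power is not at hand, an approximate per-group sample size for a two-group comparison can be computed from the normal approximation. A sketch assuming SciPy; exact tools such as G*Power give slightly different numbers because they use the noncentral t distribution:

```python
from scipy.stats import norm

def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sample, two-tailed t test:
    n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    return 2 * ((z_alpha + z_beta) / effect_size) ** 2

# Medium standardized effect (d = .50), alpha = .05, power = .80:
n_needed = sample_size_per_group(0.5)  # roughly 63 per group after rounding up
```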

**Hypothesis testing: Assessing Statistical Significance**

To determine whether or not you have a statistically significant finding, you either:

- Compare the test statistic to a critical value, or

- Compare the p value (the probability of obtaining a result at least this extreme by chance if the null is true) to the alpha set by the study.
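The two decision rules are equivalent: the test statistic exceeds the critical value exactly when the p value falls below alpha. A sketch assuming SciPy, with a hypothetical observed t statistic and degrees of freedom:

```python
from scipy import stats

alpha = 0.05
df = 58                  # e.g. two groups of 30: n1 + n2 - 2
t_observed = 2.30        # hypothetical test statistic

# Rule 1: compare the statistic to the two-tailed critical value.
t_critical = stats.t.ppf(1 - alpha / 2, df)
reject_by_critical = abs(t_observed) > t_critical

# Rule 2: compare the p value to alpha.
p_value = 2 * stats.t.sf(abs(t_observed), df)
reject_by_p = p_value < alpha

# Both rules always reach the same decision.
```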

A common misconception is that failure to reject the null is evidence that the null is true. This is simply not the case - hypothesis testing provides no statistical evidence in support of the null. In fact, even with power = .80, beta is .20, which means a 20% chance of making a type II error.