
[Beijing Normal University Psychological Statistics Courseware] 2 Chapter 2: Data Cleaning


Missing data

Andrews’ Fourier transformation; Chernoff’s faces

Missing data: Valid values on one or more variables are not available for analysis. The researcher’s primary concern is to identify the patterns and relationships underlying the missing data in order to maintain, as closely as possible, the original distribution of values when any remedy is applied.
Missing data: A Simple Example

The Impact of Missing Data


From a practical standpoint, the impact of missing data is the reduction of the sample size available for analysis. From a substantive perspective, any statistical results based on data with a nonrandom missing data process could be biased.
2010-3-12
Learning Objectives
Multivariate Data Analysis Chapter 2 – Examining Your Data




Select the appropriate graphical method to examine the characteristics of the data or relationships of interest; Assess the type and potential impact of missing data; Understand the different types of missing data processes; Explain the advantages and disadvantages of the approaches available for dealing with missing data.
Step 2: Determine the Extent of Missing Data

Step 2: Rules of Thumb 2-1

How much missing data is too much?

Assessing the extent and patterns of missing data
Chapter 2:Graphical Examination of the data

Univariate Profiling: Examining the Shape of the Distribution

Histogram Stem and leaf diagram
X6 - Product Quality Stem-and-Leaf Plot
Frequency   Stem & Leaf
 3.00        5 . 012
10.00        5 . 5567777899
10.00        6 . 0112344444
10.00        6 . 5567777999
 5.00        7 . 01144
11.00        7 . 55666777899
 9.00        8 . 000122234
14.00        8 . 55556667777778
18.00        9 . 001111222333333444
 8.00        9 . 56699999
 2.00       10 . 00
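A stem-and-leaf display of this kind can be sketched in a few lines. The example below is only an illustration, assuming a stem width of 1.0, leaves recorded in tenths, and each stem split into a low half (leaves 0-4) and a high half (leaves 5-9). The function name `stem_and_leaf` is hypothetical, and the sample values are read off the first two rows of the plot under that assumption.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return (frequency, stem, leaves) rows, splitting each stem into two halves."""
    rows = defaultdict(list)
    for x in sorted(values):
        tenths = round(x * 10)           # work in integer tenths to avoid float noise
        stem, leaf = divmod(tenths, 10)  # e.g. 57 -> stem 5, leaf 7
        rows[(stem, leaf >= 5)].append(str(leaf))
    return [(len(leaves), stem, "".join(leaves))
            for (stem, _), leaves in sorted(rows.items())]


# Values read off the first two rows of the plot (assumed reading: leaves are tenths).
sample = [5.0, 5.1, 5.2,
          5.5, 5.5, 5.6, 5.7, 5.7, 5.7, 5.7, 5.8, 5.9, 5.9]

result = stem_and_leaf(sample)
for freq, stem, leaves in result:
    print(f"{freq:5.2f}  {stem:3d} . {leaves}")
```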
X7 - E-Commerce Activities
Source      SS       df    MS      F       Sig.
Between       .864    2    .432    .878    .419
Within      47.718   97    .492
Total       48.582   99
Chapter 2:Multivariate Profiles

Researcher attains a basic understanding of the data and relationships between variables; Researcher ensures that the data underlying the analysis meet all of the requirements for multivariate analysis


Missing data are expected and part of the research design: sampling rather than the population; the specific design of the data collection process; censored data.

Substantive


Missing Data

Determine the Type of Missing Data

Four-Step Process for Identifying Missing Data and Applying Remedies

Ignorable Missing Data

Missing data under 10% for an individual case or observation can generally be ignored, except when the missing data occur in a specific nonrandom fashion (e.g., concentration in a specific set of questions, attrition at the end of the questionnaire). The number of cases with no missing data must be sufficient for the selected analysis technique if replacement values will not be substituted (imputed) for the missing data.
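The two checks in this rule of thumb can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the function names, the toy data, and the `required_n` argument are hypothetical, `None` marks a missing value, and the sketch does not test whether the missing data are nonrandom.

```python
def cases_at_or_over_threshold(cases, threshold=0.10):
    """Return indices of cases whose share of missing values is at or above threshold."""
    flagged = []
    for i, row in enumerate(cases):
        share = sum(v is None for v in row) / len(row)
        if share >= threshold:
            flagged.append(i)
    return flagged


def enough_complete_cases(cases, required_n):
    """Check whether the number of fully observed cases reaches required_n."""
    complete = sum(all(v is not None for v in row) for row in cases)
    return complete >= required_n


# Hypothetical data: 3 cases measured on 20 variables.
cases = [
    [1.0] * 20,               # 0% missing
    [None] + [1.0] * 19,      # 5% missing, under the 10% guideline
    [None] * 3 + [1.0] * 17,  # 15% missing, needs a closer look
]

flagged = cases_at_or_over_threshold(cases)
print(flagged)
print(enough_complete_cases(cases, required_n=1))
```

How large `required_n` must be depends on the analysis technique chosen; the slides do not give a number.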
Simple Example for Missing Data

Practical Standpoint

Only 5 cases have no missing data. After eliminating V3, 12 cases have no missing data. The missing data pattern of V4 is related to the value of V2: Mean(V2) = 7.8 vs. Mean(V2) = 8.4.
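One way to see such a pattern is to split the cases by whether V4 is missing and compare the mean of V2 in the two groups. The sketch below uses made-up numbers, not the data from the slide, and assumes V2 itself is fully observed; `None` marks a missing V4.

```python
from statistics import mean

# Hypothetical (V2, V4) pairs; these are NOT the values behind the slide.
records = [
    (7.0, 2.0),
    (8.0, 3.0),
    (9.0, None),
    (8.5, None),
    (7.5, 1.0),
    (8.6, None),
]

v2_when_v4_missing = [v2 for v2, v4 in records if v4 is None]
v2_when_v4_present = [v2 for v2, v4 in records if v4 is not None]

mean_missing = mean(v2_when_v4_missing)
mean_present = mean(v2_when_v4_present)

# A clear gap between the two means suggests that missingness on V4
# is related to V2, i.e. the missing data process may not be random.
print(f"Mean(V2) when V4 is missing: {mean_missing:.2f}")
print(f"Mean(V2) when V4 is present: {mean_present:.2f}")
```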



The percentage of variables with missing data for each case; the number of cases with missing data for each variable; the number of cases with no missing data on any of the variables.
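These three summaries are simple counts over a cases-by-variables table. The sketch below is one possible way to compute them in plain Python; the variable names and values are hypothetical, and `None` marks a missing value.

```python
# Toy data: rows are cases, columns are variables (all values hypothetical).
variables = ["V1", "V2", "V3"]
cases = [
    [1.3, 9.9, None],
    [4.1, None, None],
    [2.0, 5.7, 3.0],
    [None, 8.6, 2.1],
]

# Percentage of variables with missing data for each case.
pct_missing_per_case = [
    100.0 * sum(v is None for v in row) / len(row) for row in cases
]

# Number of cases with missing data for each variable.
n_missing_per_variable = {
    name: sum(row[j] is None for row in cases)
    for j, name in enumerate(variables)
}

# Number of cases with no missing data on any of the variables.
n_complete_cases = sum(all(v is not None for v in row) for row in cases)

print(pct_missing_per_case)
print(n_missing_per_variable)
print(n_complete_cases)
```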

Determine whether the extent or amount of missing data is low enough to not affect the results, even if it operates in a nonrandom manner.
What is low enough?
Chapter 2:Graphical Examination of the data


Bivariate Profiling: Examining the Relationship Between Variables Scatterplot
Stem width: 1.0
Each leaf: 1 case(s)


Step 1: Determine the Type of Missing Data
Step 2: Determine the Extent of Missing Data
Step 3: Diagnose the Randomness of the Missing Data Processes
Step 4: Select the Imputation Method
Chapter 2:Graphical Examination of the data

Bivariate Profiling: Examining Group Differences

Boxplot
ANOVA
X6 - Product Quality
Source      SS        df    MS       F        Sig.
Between      83.078    2    41.539   36.652   .000
Within      109.932   97     1.133
Total       193.010   99
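The mean squares and F ratios in the two ANOVA tables follow from the sums of squares and degrees of freedom: MS = SS / df and F = MS between / MS within. The short check below recomputes them from the reported figures. The helper name `anova_f` is hypothetical, and small differences in the last digit are expected because the reported SS values are already rounded.

```python
def anova_f(ss_between, ss_within, df_between, df_within):
    """Return (MS between, MS within, F) for a one-way ANOVA table."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within


# X6 - Product Quality (SS and df as reported in the table)
x6_ms_b, x6_ms_w, x6_f = anova_f(83.078, 109.932, 2, 97)

# X7 - E-Commerce Activities
x7_ms_b, x7_ms_w, x7_f = anova_f(0.864, 47.718, 2, 97)

print(f"X6: MS between={x6_ms_b:.3f}, MS within={x6_ms_w:.3f}, F={x6_f:.2f}")
print(f"X7: MS between={x7_ms_b:.3f}, MS within={x7_ms_w:.3f}, F={x7_f:.2f}")
```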

Deleting individual cases with excessive levels of missing data. Deleting individual variables with excessive levels of missing data.
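Both kinds of deletion can be sketched as filters on a cases-by-variables table. The example below is illustrative only: the function names, the 50% cut-offs, the toy data, and the choice to drop variables before cases are all assumptions, since the slides do not state what counts as "excessive". `None` marks a missing value.

```python
def drop_variables(cases, names, max_share):
    """Drop variables whose share of missing values exceeds max_share."""
    n = len(cases)
    keep = [
        j for j in range(len(names))
        if sum(row[j] is None for row in cases) / n <= max_share
    ]
    return [[row[j] for j in keep] for row in cases], [names[j] for j in keep]


def drop_cases(cases, max_share):
    """Drop cases whose share of missing values exceeds max_share."""
    return [
        row for row in cases
        if sum(v is None for v in row) / len(row) <= max_share
    ]


# Hypothetical data: V3 is missing for 3 of 4 cases.
names = ["V1", "V2", "V3", "V4"]
cases = [
    [1.0, 2.0, None, 4.0],
    [1.5, None, None, 3.0],
    [2.0, 2.5, None, 3.5],
    [None, None, 7.0, 1.0],
]

# Assumed order: remove sparse variables first, then sparse cases.
reduced, kept_names = drop_variables(cases, names, max_share=0.5)
kept_cases = drop_cases(reduced, max_share=0.5)

print(kept_names)
print(kept_cases)
```

Dropping V3 first keeps more cases here than dropping cases first would, which mirrors the simple example where eliminating one variable raises the number of complete cases.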
Step 2: Determine the Extent of Missing Data

Step 2: Deletions Based on Missing Data