Data Management in Large-Scale Education Research
Crystal Lewis
Contents
About the Author…………………………………………………………………………………….xiii
Acknowledgments…………………………………………………………………………………..xiv
Introduction…………………………………………………………………………………………1
1.1 Why This Book?………………………………………………………………………….1
1.1.1 Lack of Training, Resources, and Standards……………………..2
1.1.2 Consequences…………………………………………………………………..3
1.2 About This Book………………………………………………………………………….3
1.2.1 What This Book Will Cover………………………………………………4
1.2.2 What This Book Will Not Cover……………………………………….4
1.3 Who This Book Is For………………………………………………………………….5
1.4 Final Note……………………………………………………………………………………5
Research Data Management Overview……………………………………………….6
2.1 What Is Research Data Management?………………………………………….6
2.2 Data Management Standards………………………………………………………6
2.3 Why Care about Research Data Management?……………………………7
2.3.1 External Reasons………………………………………………………………7
2.3.2 Personal Reasons……………………………………………………………..8
2.4 Existing Frameworks…………………………………………………………………10
2.4.1 FAIR……………………………………………………………………………….10
2.4.2 SEER………………………………………………………………………………11
2.4.3 Open Science………………………………………………………………….11
2.5 Terminology………………………………………………………………………………12
2.6 The Research Life Cycle…………………………………………………………….12
Data Organization……………………………………………………………………………..16
3.1 Basics of a Dataset……………………………………………………………………..16
3.1.1 Columns…………………………………………………………………………16
3.1.2 Rows………………………………………………………………………………18
3.1.3 Cells……………………………………………………………………………….18
3.2 Dataset Organization Rules……………………………………………………….18
3.3 Linking Data……………………………………………………………………………..22
3.3.1 Database Design…………………………………………………………….22
3.3.2 Data Structure………………………………………………………………..27
Notes ………………………………………………………………………………………………….29
Human Subjects Data………………………………………………………………………..30
4.1 Identifiability of a Dataset………………………………………………………….30
4.2 Data Classification…………………………………………………………………….31
4.3 Human Subjects Data Oversight………………………………………………..32
4.3.1 Regulations and Laws…………………………………………………….32
4.3.2 Institutions and Departments…………………………………………33
4.3.3 External Permission………………………………………………………..34
4.3.4 Agreements…………………………………………………………………….34
4.3.5 Funders…………………………………………………………………………..34
4.4 Protecting Human Subjects Data……………………………………………….35
Notes ………………………………………………………………………………………………….35
Data Management Plan……………………………………………………………………..36
5.1 History and Purpose………………………………………………………………….36
5.1.1 Why Are DMPs Important?……………………………………………38
5.2 What Is It?…………………………………………………………………………………38
5.2.1 What to Include?…………………………………………………………….39
5.3 Creating a Data Sources Catalog………………………………………………..41
5.4 Getting Help……………………………………………………………………………..43
5.5 Budgeting………………………………………………………………………………….44
Notes ………………………………………………………………………………………………….44
Planning Data Management………………………………………………………………46
6.1 Why Spend Time on Planning?………………………………………………….46
6.2 Goals of Planning………………………………………………………………………48
6.3 Planning Checklists…………………………………………………………………..48
6.3.1 Decision-Making Process………………………………………………..49
6.3.2 Checklist Considerations………………………………………………..50
6.4 Data Management Workflow…………………………………………………….51
6.4.1 Benefits to Visualizing a Workflow…………………………………57
6.4.2 Workflow Considerations……………………………………………….57
6.5 Task Management Systems………………………………………………………..57
Project Roles and Responsibilities…………………………………………………….59
7.1 Research Project Roles……………………………………………………………….59
7.1.1 Investigators…………………………………………………………………..59
7.1.2 Project Coordinator………………………………………………………..60
7.1.3 Data Manager…………………………………………………………………60
7.1.4 Project Team Members……………………………………………………60
7.1.5 Other Roles…………………………………………………………………….62
7.2 Assigning Roles and Responsibilities………………………………………..62
7.3 Documenting Roles and Responsibilities…………………………………..64
Documentation…………………………………………………………………………………..68
8.1 Team-Level………………………………………………………………………………..69
8.1.1 Lab Manual…………………………………………………………………….69
8.1.2 Wiki………………………………………………………………………………..72
8.1.3 Onboarding and Offboarding…………………………………………73
8.1.4 Data Inventory……………………………………………………………….74
8.1.5 Team Data Security Policy………………………………………………75
8.1.6 Style Guide…………………………………………………………………….76
8.2 Project-Level……………………………………………………………………………..77
8.2.1 Data Management Plan………………………………………………….77
8.2.2 Data Sources Catalog……………………………………………………..77
8.2.3 Checklists and Meeting Notes………………………………………..77
8.2.4 Roles and Responsibilities Document…………………………….78
8.2.5 Research Protocol……………………………………………………………79
8.2.6 Supplemental Documents………………………………………………80
8.2.7 Standard Operating Procedures……………………………………..84
8.3 Dataset-Level…………………………………………………………………………….86
8.3.1 Readme…………………………………………………………………………..86
8.3.2 Changelog………………………………………………………………………89
8.3.3 Data Cleaning Plan…………………………………………………………90
8.4 Variable-Level……………………………………………………………………………91
8.4.1 Data Dictionary………………………………………………………………92
8.4.2 Codebook……………………………………………………………………….99
8.5 Repository Metadata……………………………………………………………….100
8.5.1 Metadata Standards………………………………………………………103
8.6 Wrapping It Up………………………………………………………………………..105
Notes ………………………………………………………………………………………………..105
Style Guide………………………………………………………………………………………107
9.1 General Good Practices……………………………………………………………109
9.2 Directory Structure………………………………………………………………….110
9.3 File Naming…………………………………………………………………………….113
9.4 Variable Naming……………………………………………………………………..117
9.4.1 Time……………………………………………………………………………..120
9.5 Value Coding…………………………………………………………………………..122
9.5.1 Missing Value Coding…………………………………………………..124
9.6 Coding…………………………………………………………………………………….125
Note ………………………………………………………………………………………………….127
Data Tracking……………………………………………………………………………………128
10.1 Benefits……………………………………………………………………………………130
10.2 Building Your Database……………………………………………………………131
10.2.1 Comparing Database Types………………………………………….132
10.2.2 Designing the Database………………………………………………..136
10.2.3 Choosing Fields…………………………………………………………….139
10.2.4 Choosing a Tool…………………………………………………………….141
10.3 Entering Data…………………………………………………………………………..142
10.3.1 Entering Data in a Tabular View…………………………………..142
10.3.2 Entering Data in a Form……………………………………………….143
10.4 Creating Unique Identifiers……………………………………………………..144
10.5 Summary…………………………………………………………………………………147
Notes ………………………………………………………………………………………………..148
Data Collection…………………………………………………………………………………149
11.1 Quality Assurance and Control……………………………………………….149
11.2 Quality Assurance……………………………………………………………………151
11.2.1 Questionnaire Design……………………………………………………152
11.2.2 Pilot the Instrument………………………………………………………155
11.2.3 Choose Quality Data Collection Tools…………………………..157
11.2.4 Build with the End in Mind………………………………………….159
11.2.5 Ensure Compliance………………………………………………………169
11.3 Quality Control………………………………………………………………………..171
11.3.1 Field Data Management……………………………………………….171
11.3.2 Ongoing Data Checks…………………………………………………..173
11.3.3 Tracking Data Collection………………………………………………174
11.3.4 Collecting Data Consistently………………………………………..175
11.4 Bot Detection……………………………………………………………………………176
11.5 Review…………………………………………………………………………………….177
Notes ………………………………………………………………………………………………..178
Data Capture…………………………………………………………………………………….180
12.1 Electronic Data Capture…………………………………………………………..180
12.1.1 Documenting Electronic Data Capture…………………………186
12.2 Paper Data Capture…………………………………………………………………187
12.2.1 Choose a Quality Data Entry Tool…………………………………188
12.2.2 Build with the End in Mind………………………………………….188
12.2.3 Develop a Data Entry Procedure…………………………………..189
12.2.4 Documenting Paper Data Capture………………………………..194
12.2.5 Scanning Forms…………………………………………………………….194
12.3 Extant Data………………………………………………………………………………194
12.3.1 Non-Public Data Sources………………………………………………194
12.3.2 Public Data Sources………………………………………………………199
12.3.3 Documenting External Data Capture……………………………200
Note ………………………………………………………………………………………………….200
Data Storage and Security………………………………………………………………..201
13.1 Planning Short-Term Data Storage…………………………………………..201
13.1.1 Electronic Data……………………………………………………………..203
13.1.2 Paper Data……………………………………………………………………206
13.1.3 Oversight……………………………………………………………………..206
13.2 Documentation and Dissemination………………………………………….207
Data Cleaning…………………………………………………………………………………..208
14.1 Data Cleaning for Data Sharing……………………………………………….208
14.2 Data Quality Criteria……………………………………………………………….211
14.3 Data Cleaning Checklist…………………………………………………………..213
14.3.1 Checklist Steps……………………………………………………………..214
14.4 Data Cleaning Workflow…………………………………………………………..226
14.4.1 Preliminary Steps………………………………………………………….226
14.4.2 Cleaning Data Using Code……………………………………………227
14.4.3 Cleaning Data Manually……………………………………………….230
14.4.4 Data Versioning Practices……………………………………………..231
Note ………………………………………………………………………………………………….231
Data Archiving…………………………………………………………………………………232
15.1 Long-Term Storage…………………………………………………………………..232
15.1.1 Paper Data……………………………………………………………………232
15.1.2 Electronic Data……………………………………………………………..234
15.1.3 Oversight and Documentation……………………………………..235
15.2 Internal Data Use…………………………………………………………………….235
15.3 Using a Repository…………………………………………………………………..237
Notes ………………………………………………………………………………………………..238
Data Sharing…………………………………………………………………………………….239
16.1 Why Share Your Data?……………………………………………………………..239
16.2 Data Sharing Flow Chart…………………………………………………………241
16.2.1 Are You Able to Share?…………………………………………………242
16.2.2 Where to Share?……………………………………………………………243
16.2.3 What Data to Share……………………………………………………….249
16.2.4 What Documentation to Share………………………………………257
16.2.5 When to Share………………………………………………………………260
16.3 Repository File Structure…………………………………………………………261
16.4 Roles and Responsibilities……………………………………………………….263
16.5 Revisions…………………………………………………………………………………264
Notes ………………………………………………………………………………………………..265
Additional Considerations……………………………………………………………….266
17.1 Multi-Site Collaborations…………………………………………………………266
17.2 Multi-Project Teams…………………………………………………………………267
17.3 Summary…………………………………………………………………………………268
Glossary………………………………………………………………………………………………….270
Appendix………………………………………………………………………………………………..276
References……………………………………………………………………………………………….278
Index……………………………………………………………………………………………………….293