Data Contents Overview

The study has conducted six waves of data collection, spanning 1998 through 2017. The publicly available data from the interviews at each wave, as well as from the home visit activities and interviewer observations in Year 3 through Year 15, are described below. This page also gives a brief overview of the data available through the restricted-use contract process and of a publicly available file constructed specifically for the Fragile Families Challenge.

Public Data

The Baseline wave of data collection took place from 1998 to 2000 and includes mother and father core interviews at the birth of the study's "focal child." These interviews were conducted primarily in the hospital shortly after the focal child's birth.

At Baseline and the subsequent five waves, the core phone interviews collected data on parental relationships, parenting, health and health behaviors, family and social support, demographics, housing, use of social programs, and education and employment.   

The Year 1 follow-up wave of data collection took place from 1999 to 2001 and includes mother and father core interviews around the time of the focal child's first birthday.   

The Year 3 (2001-2003) and Year 5 (2003-2006) follow-up waves included mother and father core interviews, as well as primary caregiver interviews and home visits around the time of the focal child's third and fifth birthdays. The primary caregiver interviews included questions on home life and routines, health and health care, and parenting. During the home visits, interviewers administered assessments such as the Peabody Picture Vocabulary Test (PPVT) and took direct height and weight measurements. Interviewers also observed the home environment (surrounding neighborhood, interior and exterior of the house or apartment) and recorded additional information about the parent's and child's affect during the home visit.

The Year 9 (2007-2010) follow-up wave included mother and father core interviews, as well as primary caregiver interviews, home visits, and interviewer observations, similar to the previous two waves. We also conducted a short interview with each focal child around their ninth birthday, collecting information on their relationships with parents and siblings, school connectedness, task completion, self-concept, and home routines.

The Year 15 follow-up wave of data collection took place from 2014 to 2017. Activities included primary caregiver and teen interviews (mostly conducted by phone), as well as home visit activities and interviewer observations conducted with a subset of approximately 1,000 teens. In addition to re-collecting data on the topics covered in the previous five waves, the phone interviews included new measures on focal children's education and school experiences, risky behaviors such as sexual activity and substance use, peer interactions, and pro-social behaviors. The home visits included height, weight, and waist circumference measurements, as well as interviewer observations of the home environment.

Data files containing measures from the interviews, home visits, and observations are available for download through the OPR Data Archive.

Restricted-Use Contract Data

Additional files are available through our restricted-use data contract, including:

A geographic file (focal child's birth city, mother's and father's state of residence at each interview, and stratum and primary sampling unit, or PSU) and a set of contextual characteristics for the family's Census tract of residence are available for each wave.

Medical records data for mothers and children from the birth hospitalization record (Baseline).

We also release a school characteristics file (for the focal child's school at Grade 1, and at the Year 9 and Year 15 follow-up waves) based on National Center for Education Statistics data.

A labor market and macroeconomic file with data on local employment and national consumer confidence is available from the Baseline through Year 9 waves.

We also distribute an FFCWS genetic data appendage based on the saliva samples collected from mothers and children at the Year 9 wave.

FF Challenge Files

The FF Challenge data files are associated with the predictive modeling stage of the FF Challenge competition, held in Summer 2017. These files are now being provided so that other data users may replicate and extend what participants did in the Challenge.

The Challenge files include:
-    readme.txt - a text file with descriptions of the remaining files
-    background.csv - birth-Year 9 data, as a .csv
-    background.dta - birth-Year 9 data, as a Stata .dta file
-    codebook_FFChallenge.txt - merged codebook for all waves
-    prediction.csv - an example submission that predicts the mean of the training data for all outcomes
-    train.csv - outcomes for training observations (half the sample)
-    test.csv - outcomes for test observations
-    leaderboard.csv - outcomes for observations in the leaderboard set, with missing outcomes imputed (as provided via Codalab)
-    leaderboardUnfilled.csv - outcomes for observations in the leaderboard set (not imputed)
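
As a rough illustration of what the example submission in prediction.csv represents, the sketch below computes the mean of each outcome over the training observations, skipping missing values. It is not the Challenge organizers' code. The ID column name ("challengeID"), the outcome names ("gpa", "grit"), and the use of "NA" or an empty cell to mark a missing outcome are assumptions made for this example; check readme.txt and the codebook for the actual layout of train.csv.

```python
import csv
import io
from statistics import mean


def mean_baseline(train_file, id_column="challengeID", missing=("", "NA")):
    """Return {outcome: mean of its non-missing training values}.

    id_column and the missing-value codes are assumptions for this sketch,
    not documented properties of the real train.csv.
    """
    reader = csv.DictReader(train_file)
    outcomes = [name for name in reader.fieldnames if name != id_column]
    values = {name: [] for name in outcomes}
    for row in reader:
        for name in outcomes:
            cell = row[name]
            # Skip cells that are absent or carry a missing-value code.
            if cell is None or cell in missing:
                continue
            values[name].append(float(cell))
    # Outcomes with no observed values are left out of the result.
    return {name: mean(v) for name, v in values.items() if v}


# A tiny made-up stand-in for train.csv (hypothetical columns and values).
sample = io.StringIO(
    "challengeID,gpa,grit\n"
    "1,3.0,NA\n"
    "2,2.0,3.5\n"
    "3,NA,2.5\n"
)
baseline = mean_baseline(sample)
print(baseline)
```

To run the same function on the real file, pass an open file handle instead of the in-memory sample, for example `with open("train.csv", newline="") as f: baseline = mean_baseline(f)`. Assigning these means to every observation would give a constant prediction for each outcome, which is the kind of baseline that more elaborate models are usually compared against.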

Data files from the FF Challenge project are available for download through the OPR Data Archive. For more detail on what's available, please visit the Fragile Families Challenge blog.