Currently, baseline, Year 1, Year 3, Year 5, and Year 9 core follow-up data are available to the public through the Office of Population Research data archive. Year 3, Year 5, and Year 9 in-home data are also available via the archive for a subset of core respondents. Medical records data, contextual, macroeconomic, geographic identifiers (sample city, state of residence, stratum/psu), Year 9 school characteristics, and genetic data are available to the public via a restricted use contract.
There is a two-step process to access the data: (1) register as a user of Princeton University's Office of Population Research data archive, and (2) sign up for access to the Fragile Families and Child Wellbeing Study within the data archive. Registering as a user of the archive is immediate and automated. Signing up for the Fragile Families data submits a request which is usually reviewed for approval within 1 business day. Please note that after logging in, the OPR website occasionally times out and a Fragile Families data request that was submitted by the user is not actually processed. After logging in to the archive, completing the data request within 10 minutes usually results in a successful submission. If you don't hear back about your approval status within 1 business day, email us at firstname.lastname@example.org.
Geographic identifiers are only available through a restricted use contractual agreement. See the Fragile Families contract data page for more information.
If you want to review frequencies before downloading the data, please review the codebooks available by wave on the documentation page. Frequencies for variables are presented in the same order as the questions were asked in the survey instrument with constructed variables following the appropriate sections.
Please email all questions about the data to email@example.com.
We ask that all users personally register in order to access the data files.
The Fragile Families Study receives funding from a number of different sources. We want to be able to provide our funders with information about data usage, such as the number of data users and what the data are being used for. Your contact information will not be used unless you ask to receive mailings about the data, study, etc.
We request that users cite the substantial funding from the Eunice Kennedy Shriver National Institute of Child Health & Human Development in their publications with the following statement: “Research reported in this publication was supported by the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) of the National Institutes of Health under award numbers R01HD36916, R01HD39135, and R01HD40421, as well as a consortium of private foundations. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.”
The Fragile Families & Child Wellbeing Study has been supported by a number of foundations and agencies. Click here to view the list of those who funded the core study.
Our questionnaire maps show many of the general topics covered in FFCWS, which waves included those topics, and where to find the relevant questions in the questionnaires. Maps are available for all waves of mother and father interviews (the “Core Interviews”) and for the Year 3 and Year 5 In-home interviews. If you do not find what you are looking for in the questionnaire maps, we suggest you look at the questionnaires directly and use Ctrl+F to search for your terms of interest. Questionnaires are available by wave on our Data and Documentation page.
Most data users learn about the data by using the documentation available on our website. When you have any questions, you’re welcome to email us at firstname.lastname@example.org. For more formal training, we host an annual summer data workshop at Columbia University for advanced doctoral students and early career scholars. Applications are usually due in March and are accepted on a competitive basis. Finally, we also bring an exhibit booth to a few research conferences each year where data users can drop by and ask questions. Some of the conferences we’ve attended in recent years include the annual meetings of Population Association of America, Association for Public Policy Analysis and Management, Society for Social Work and Research, and American Public Health Association.
Follow our Twitter (@FFCWS) and sign up for our newsletter for regular updates about the data. We also post data alerts to the website when we release new information. For more specific questions, email us at email@example.com.
Study Design Questions
A detailed description of our sample design is contained in Reichman et al. (2001), "The Fragile Families and Child Wellbeing Study: Sample and Design," Children and Youth Services Review, Vol. 23, No. 4/5. A brief summary and additional details on data collection and hospital protocols are included in the "Introduction to the Fragile Families Public Use Data".
Sampling Mothers - Mothers of new babies were sampled at each hospital from maternity ward lists. Once sampled, mothers were asked to complete a screening instrument to determine marital status and eligibility for participation in the study. Quotas were set at each hospital for number of unmarried and married births, based on sample cities’ 1996/1997 unmarried birth rates. If a mother was determined to be above the set quota for a given marital status, the case was coded “over quota” and the mother was not interviewed. Mothers’ eligibility was determined based on the analytic goals, logistical restraints and design of the study, including the need to interview both a mother and father of a child who would be residing with at least one of those parents. Thus, for instance, mothers whose babies would be adopted were considered “ineligible” and were not interviewed.
Sampling Fathers - Once a mother had been determined to be eligible, and had given her signed consent for participation, the baby’s father was also asked to participate in the study.
See the Guide to the Public Use Files (section VII) and the sample design paper.
National weights make the data of 16 of the 20 cities representative of births in the 77 U.S. cities with populations over 200,000. See the weights documentation, Sample Design paper, and Introduction to the Fragile Families Public Use Data for extensive discussions of the weights and samples.
The mother is considered the primary caregiver if she lives with the child at least half of the time, which applies to the majority of families. If she does not, however, the PCG interview would have been conducted with the father or other adult who lives with the child at least half of the time. See the Guides to the Public Use Files for information regarding constructed variables which indicate the specific relationship of the PCG to the child for each family/interview.
Data Structure Questions
The data are structured as one record per child. Mother and father data are in separate files. There are records for all 4,898 mothers and fathers at each wave, regardless of whether they were interviewed. Mother and father data can be merged using the IDNUM variable. Flag variables (e.g. CF1FINT, CM2MINT, CF2FINT) indicate whether or not a mother/father was interviewed at a given wave (all mothers were interviewed at baseline so there is no CM1MINT variable). Cases not interviewed are coded as -9 "Not in wave" on all other variables. There are also flag variables (e.g. CM1FINT, CM2FINT and CF2MINT) on the mothers' and fathers' records that indicate whether the corresponding mother or father was interviewed at the time of the follow-up.
Questionnaires are available by wave on the documentation section of the website. There are also questionnaire maps, which indicate when a concept was measured in the core surveys and in-home activities across the first five waves.
On each of the baseline files (mothers' and fathers') there are two variables you should use to find out when the respondent was interviewed. M1INTMON / F1INTMON represent the month of interview, and M1INTYR / F1INTYR represent the year of interview. There is also a constructed variable (CM1TDIFF) that is found in the mothers' files and can be used to check the time gap between parent interviews. There are corresponding variables at all waves.
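As a rough illustration of working with the month and year variables, the sketch below computes the number of calendar months between the mother and father baseline interviews. It assumes the two files have already been merged, that the year is stored as a four-digit number, and that negative values are missing codes; all three are assumptions to check against the codebook. It is not a description of how CM1TDIFF was constructed.

```python
def months_between(year_a, month_a, year_b, month_b):
    """Calendar months from interview A to interview B, or None if any part is missing."""
    parts = (year_a, month_a, year_b, month_b)
    # Assumption: negative values are missing codes rather than real dates.
    if any(p is None or p < 0 for p in parts):
        return None
    return (year_b - year_a) * 12 + (month_b - month_a)


# Hypothetical merged record; the variable names come from the documentation,
# the values are made up for illustration.
row = {"M1INTMON": 11, "M1INTYR": 1999, "F1INTMON": 2, "F1INTYR": 2000}

gap = months_between(row["M1INTYR"], row["M1INTMON"], row["F1INTYR"], row["F1INTMON"])
print(gap)  # 3: November 1999 to February 2000
```

A positive result means the father was interviewed after the mother; a record with a missing month or year returns None instead of a misleading number.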
The main identifier on the file for merging and sorting is the IDNUM, a 4-character string variable. IDNUM can be found on all of the public data files.
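A minimal sketch of a one-to-one merge on IDNUM, using only the Python standard library. The two CSV snippets are hypothetical stand-ins for extracts of a mother file and a father file; the values are made up. IDNUM is kept as a string throughout so that any leading zeros survive the merge.

```python
import csv
import io

# Hypothetical extracts. IDNUM and the flag names come from the documentation;
# the rows themselves are invented for illustration.
MOTHER_CSV = """IDNUM,CM2MINT
0001,1
0002,0
0003,1
"""
FATHER_CSV = """IDNUM,CF2FINT
0001,1
0002,1
0003,0
"""


def read_by_idnum(text):
    """Index rows by IDNUM, keeping IDNUM as a string."""
    rows = {}
    for row in csv.DictReader(io.StringIO(text)):
        idnum = row["IDNUM"]
        if idnum in rows:
            raise ValueError(f"duplicate IDNUM {idnum!r}")
        rows[idnum] = row
    return rows


def merge_on_idnum(left, right):
    """One-to-one merge; fails loudly if the two files cover different cases."""
    if left.keys() != right.keys():
        raise ValueError("IDNUM sets differ between files")
    return {idnum: {**left[idnum], **right[idnum]} for idnum in sorted(left)}


merged = merge_on_idnum(read_by_idnum(MOTHER_CSV), read_by_idnum(FATHER_CSV))
print(merged["0002"])
```

Because each file holds one record per case, a mismatch in the IDNUM sets usually signals that one file was subset or read incorrectly, which is why the sketch raises an error rather than silently dropping rows.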
Flag variables (e.g. CF1FINT, CM2MINT, CF2FINT) indicate whether or not a mother/father was interviewed at a given wave (all mothers were interviewed at baseline so there is no CM1MINT variable). Cases not interviewed are coded as -9 "not in wave" on all other variables. Flag variables (e.g. CM1FINT, CM2FINT and CF2MINT) on the mothers' and fathers' records indicate whether the corresponding mother or father was interviewed at the time of the follow-up. Additionally, variables with the "SAMP" root in the name (e.g. CM2SAMP and CF2SAMP) provide information about the status of the case at each follow-up. Information such as mother/father/child death between waves, nonresponse and changes in eligibility are coded in these variables.
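The interview flags can be used to restrict an analysis to interviewed cases. The sketch below assumes the flags are coded 1 for interviewed and 0 for not interviewed; that coding is an assumption and should be confirmed in the codebook. The records are hypothetical.

```python
# Assumed flag coding (verify against the codebook): 1 = interviewed.
INTERVIEWED = 1

# Hypothetical merged records; only the variable names come from the documentation.
records = [
    {"IDNUM": "0001", "CM2MINT": 1, "CF2FINT": 1},
    {"IDNUM": "0002", "CM2MINT": 1, "CF2FINT": 0},
    {"IDNUM": "0003", "CM2MINT": 0, "CF2FINT": 1},
    {"IDNUM": "0004", "CM2MINT": 1, "CF2FINT": 1},
]


def interviewed(rows, *flags):
    """Keep the rows in which every listed flag marks a completed interview."""
    return [r for r in rows if all(r[f] == INTERVIEWED for f in flags)]


mothers_y1 = interviewed(records, "CM2MINT")
couples_y1 = interviewed(records, "CM2MINT", "CF2FINT")

print([r["IDNUM"] for r in mothers_y1])  # ['0001', '0002', '0004']
print([r["IDNUM"] for r in couples_y1])  # ['0001', '0004']
```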
"-5" in the data file means the person was not asked a given question because that question was not on the version of the questionnaire used at the time of the interview. "-6" means the respondent was skipped from a question that wasn't appropriate for them to answer. For more help with skipped questions, see “How do I figure out why a participant skipped a particular survey question?”
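Before computing statistics, it usually helps to separate substantive answers from these negative codes. The sketch below handles only the three codes described in this FAQ (-5, -6, and -9); any other value is passed through unchanged, so consult the codebook for additional codes. The function name and sample answers are hypothetical.

```python
# Codes documented in this FAQ. Anything else is treated as a substantive value here.
MISSING_LABELS = {
    -5: "not asked in this questionnaire version",
    -6: "skipped",
    -9: "not in wave",
}


def split_value(value):
    """Return (value, None) for an answer, or (None, reason) for a documented missing code."""
    if value in MISSING_LABELS:
        return None, MISSING_LABELS[value]
    return value, None


# Hypothetical answers to one survey item.
answers = [3, -6, -9, 1, -5]

valid = [v for v, reason in map(split_value, answers) if reason is None]
reasons = [reason for _, reason in map(split_value, answers) if reason is not None]

print(valid)    # [3, 1]
print(reasons)  # ['skipped', 'not in wave', 'not asked in this questionnaire version']
```

Keeping the reason alongside the value makes it possible to report how many cases were skipped versus not in the wave, rather than lumping them together as missing.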
There are many skip patterns in the data and they can be complex. In order to understand a skip pattern, the best place to start is by 1) going to the applicable questionnaire, 2) finding the variable you are examining, and 3) working backwards from the variable of interest until you find the skip instruction(s) which initiates the skip pattern. Please note that some of the more complex interview segments may contain more than one skip pattern in a particular section so if the first one you see does not account for all of the cases that skip the variable of interest, you may need to look back further for an additional skip command. Questionnaires are available by wave on our Data and Documentation page.
The FF Challenge data files are associated with the predictive modeling stage of the FF Challenge competition, held in Summer 2017 [http://www.fragilefamilieschallenge.org/]. These files are now being provided through Princeton’s OPR Data Archive [http://opr.princeton.edu/archive/ff/] so that other data users will be able to replicate and extend what participants did in the Challenge. If you are trying to identify which file you have downloaded, the Challenge files can most easily be identified in contrast to the traditional FFCWS data files by the unique ids (integer values between 1 and 4,242) and number of observations (4,242) - the FFCWS data files have an idnum with four digits and 4,898 observations.
The Challenge files include:
- readme.txt – a text file with descriptions
- background.csv - birth-Year 9 data, as a .csv
- background.dta - birth-Year 9 data, as a Stata .dta file
- codebook_FFChallenge.txt - merged codebook for all waves
- prediction.csv - an example submission that predicts the mean of the training data for all outcomes
- train.csv - outcomes for training observations (half the sample)
- test.csv - outcomes for test observations
- leaderboard.csv - outcomes for observations in the leaderboard set, with missing outcomes imputed (as provided via Codalab)
- leaderboardUnfilled.csv - outcomes for observations in the leaderboard set (not imputed)
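The identification rule described above (number of observations plus the form of the ids) can be sketched as a small check. The function name is hypothetical, and the test for FFCWS files assumes only what is stated above: 4,898 records and a four-character idnum.

```python
CHALLENGE_ROWS = 4242  # observations in the Challenge files
FFCWS_ROWS = 4898      # observations in the traditional FFCWS files


def classify_file(ids):
    """Guess the file family from the id column alone; returns 'unknown' if neither rule fits."""
    ids = [str(i).strip() for i in ids]
    n = len(ids)
    if n == CHALLENGE_ROWS and all(
        i.isdigit() and 1 <= int(i) <= CHALLENGE_ROWS for i in ids
    ):
        return "FF Challenge"
    if n == FFCWS_ROWS and all(len(i) == 4 for i in ids):
        return "FFCWS"
    return "unknown"


# Synthetic id columns with the documented shapes.
print(classify_file(range(1, 4243)))                         # FF Challenge
print(classify_file([f"{i:04d}" for i in range(1, 4899)]))   # FFCWS
```

A subset or filtered extract will come back as "unknown", since the rule relies on the full row count.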
There are no mother or father interviews at Year 15. However, the Year 15 primary caregiver (PCG) interview incorporates many of the questions and topics from the mother and father interviews of previous waves, as well as some of the questions and topics from the PCG interviews of previous waves.
The Year 15 scales documentation is included in the Year 15 Guide to the Public Use Files, at the end of that document.
If you are having trouble downloading the files simply by clicking on them (please select "Save" and not "Open"), try right-clicking on the file and selecting “Save Target As.” We recommend using the WinZip Classic interface to open the zip files you downloaded. You may also want to check with your IT department to make sure you have an up-to-date copy of WinZip. Click here to download the most recent version of WinZip.
The data are available in SAS, SPSS, and Stata (for Windows) format. If users need data in other formats, we suggest using a file transfer program such as StatTransfer or DBMS/Copy.
Please use the SAS code included in the zip files to read the formats. The formats are permanently attached to the variables in each data set. Alternatively, users can set the NOFMTERR option when reading in the data.
Data files can be merged using the IDNUM variable. Providing instruction or advice regarding the use of various statistical software packages is beyond the objectives of the Fragile Families and Child Wellbeing Study. However, data users may find the following online resources helpful in merging data files. This is not a comprehensive list, and there may be many other online resources of equal or greater benefit to data users. The following links are to external organizations that are not affiliated with FFCWS. Data users should use them at their own discretion.
- STATA 14 Help
- SAS 9.3 Statement Help
- SPSS Programming and Data Management 3rd Edition
- Princeton University Data & Statistical Services
- UCLA Statistical Computing
- UNC Carolina Population Center Data Analysis Tools
The weights were constructed to adjust for sample design (probability of selection), non-response at baseline, and attrition on observed characteristics over the waves. For a brief introduction to using the weights, please read Fragile Families & Child Wellbeing Study: A Brief Guide to Using the Mother, Father, and Couple Weights for Core Telephone Surveys Waves 1-4. For a detailed account of how the weights were constructed, please read Fragile Families & Child Wellbeing Study: Methodology for Constructing Mother, Father, and Couple Weights for Core Telephone Surveys Waves 1-4.
There are valid weights for 1) interviewed cases and 2) cases in which we determined that the parent or child had died or that the child had been adopted or is living with neither parent. Because the cases for adoptions/living with neither parent have little or no interview data, they are coded as "no" in the national sample flags (and interview flags). Data users can, however, estimate the proportion of children/parents who died, etc., by applying the weights to the interview sample flags.
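A minimal sketch of a weighted proportion, the kind of point estimate described above. The variable names DIED and WEIGHT are hypothetical placeholders, not names from the data files, and the sketch assumes that cases without a valid weight carry a non-positive value. It produces a point estimate only; standard errors require the design information covered in the weights documentation.

```python
def weighted_proportion(rows, flag, weight):
    """Share of total weight held by rows where `flag` equals 1, using valid weights only."""
    # Assumption: a non-positive weight marks a case with no valid weight.
    usable = [r for r in rows if r[weight] > 0]
    total = sum(r[weight] for r in usable)
    if total == 0:
        raise ValueError("no cases with a valid weight")
    hits = sum(r[weight] for r in usable if r[flag] == 1)
    return hits / total


# Hypothetical records; DIED and WEIGHT are placeholder names for illustration.
rows = [
    {"DIED": 1, "WEIGHT": 100},
    {"DIED": 0, "WEIGHT": 300},
    {"DIED": 0, "WEIGHT": 600},
    {"DIED": 1, "WEIGHT": -9},  # no valid weight, excluded from both sums
]

estimate = weighted_proportion(rows, "DIED", "WEIGHT")
print(estimate)  # 0.1
```

The unweighted share in this toy example would be 2 of 4 cases; the weighted estimate differs because each case counts in proportion to its weight and the case without a valid weight is excluded.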