Study Design Questions
Data Structure Questions
- How do I merge the public data files?
Metadata Explorer Questions
Publishing Your FFCWS Research
Currently, Baseline, Year 1, Year 3, Year 5, Year 9, and Year 15 survey data are available to the public through the Office of Population Research data archive. Year 3, Year 5, Year 9, and Year 15 in-home data are also included in these files, as well as child care provider (Year 3) and teacher surveys (Years 5 and 9), for a subset of core respondents. Additional files including Geographic identifiers, Census tract measures, Labor market and macroeconomic, Opportunity Insights, Gun Violence, Uniform Crime Reports, NCES, SEDA, CRDC, Medical records, Candidate genes, and Genotype array are available to the public via a Restricted Use Contract.
The data collection for Year 22 began in the Fall of 2020 and will continue through late 2022. A release date is not yet set. If you would like to be informed when new data are released, please sign up for our newsletter. We are not able to share the Year 22 surveys publicly at this time, but they will be posted on our Public Data Documentation page when they are available.
There is a two-step process to access the data: (1) register as a user of Princeton University's Office of Population Research data archive, and (2) sign up for access to the Fragile Families and Child Wellbeing Study within the data archive. Registering as a user of the archive is immediate and automated. Signing up for the Fragile Families data submits a request which is usually reviewed for approval within 1 business day.
Please note that after logging in, the OPR website occasionally times out and a Fragile Families data request that was submitted by the user is not actually processed. After logging in to the archive, completing the data request within 10 minutes usually results in a successful submission. If you don't hear back about your approval status within 1 business day, email us at firstname.lastname@example.org.
Please email all questions about the data and documentation to email@example.com.
We ask that all users personally register in order to access the data files.
The FFCWS receives funding from a number of different sources. We ask each data user to register separately because we want to be able to provide our funders with information about data usage, such as the number of data users and what the data are being used for. Your contact information will not be used unless you ask to receive mailings about the data, study, etc.
The Fragile Families & Child Wellbeing Study has been supported by a number of foundations and agencies. Click here to view the list of those who funded the core study.
You can use our Metadata Explorer to browse or search our database of the FFCWS variables by a variety of topics. You can use the Advanced Variable Search option to complete a text search using your own terms. It may also be helpful to look at the questionnaires directly, so you can see the questions that were asked in the same section as those you discover in the Metadata Explorer. Questionnaires are available by wave on our Public Data Documentation page. You can also see what other researchers have published on a variety of topics with FFCWS data in our Publication Archive.
Most data users learn about the data by using the documentation available on our website. You may also use our Metadata Explorer to browse or search the database of the FFCWS Public Data variables. When you have questions, you’re welcome to email us at firstname.lastname@example.org. We often bring an exhibit booth to a few research conferences each year where data users can drop by and ask questions. Some of the conferences we’ve attended in recent years include the annual meetings of the Population Association of America and the American Sociological Association. Visit our homepage for information regarding upcoming conferences and other events.
Follow our Twitter (@FFCWS) and sign up for our newsletter for regular updates about the data. We also post data alerts to the website when we release new information. Visit our homepage for information regarding upcoming conferences and other events. For more specific questions, email us at email@example.com.
Study Design Questions
A detailed description of our sample design is contained in Reichman et al. (2001), "The Fragile Families and Child Wellbeing Study: Sample and Design," Children and Youth Services Review, Vol. 23, No. 4/5. A brief summary and additional details on data collection and hospital protocols are included in the User Guide for each wave of data, which can be found on our Public Data Documentation page.
Sampling Mothers - Mothers of new babies were sampled at each hospital from maternity ward lists. Once sampled, mothers were asked to complete a screening instrument to determine marital status and eligibility for participation in the study. Quotas were set at each hospital for the number of unmarried and married births, based on sample cities’ 1996/1997 unmarried birth rates. If a mother was determined to be above the set quota for a given marital status, the case was coded “over quota” and the mother was not interviewed. Mothers’ eligibility was determined based on the analytic goals, logistical constraints, and design of the study, including the need to interview both the mother and father of a child who would be residing with at least one of those parents. Thus, for instance, mothers whose babies would be adopted were considered “ineligible” and were not interviewed.
Sampling Fathers - Once a mother had been determined to be eligible, and had given her signed consent for participation, the baby’s father was also asked to participate in the study.
See the Sample and Design Paper.
National weights are available to make the data of 16 of the 20 cities representative of births in the 77 U.S. cities with populations over 200,000. See the weights documentation and Sample Design paper for extensive discussions of the weights and samples.
See the User Guide for each wave, located on the Public Data Documentation page.
The mother was considered the primary caregiver if she lived with the child at least half of the time, which applies to the majority of families. If she did not, however, the PCG interview would have been conducted with the father or other adult who lived with the child at least half of the time. See the User Guides for information regarding constructed variables which indicate the specific relationship of the PCG to the child for each family/interview.
Data Structure Questions
You may download all of the public data in one data file or you may download separate files for each wave of data collection. The data are structured as one record per child/family. There are records for all 4,898 families at each wave, regardless of whether they were interviewed. Data from each wave (if downloaded separately) can be merged using the idnum variable. Flag variables (e.g. cf1fint, cm2mint, cf2fint) indicate whether or not a mother/father was interviewed at a given wave (all mothers were interviewed at baseline so there is no cm1mint variable). Cases not interviewed are coded as -9 "Not in wave" on all other variables.
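The role of the interview flags can be illustrated with a minimal Python sketch. It uses three made-up records in place of the real file. The flag name cm2mint comes from the text above; the coding 1 = interviewed, the idnum values, and the variable m2_example are assumptions for illustration only and should be checked against the User Guide.

```python
# Toy records standing in for rows of a public data file (values are made up).
# Assumption: the flag is coded 1 = interviewed, 0 = not interviewed.
# "m2_example" is a hypothetical substantive variable, not a real FFCWS name.
NOT_IN_WAVE = -9

records = [
    {"idnum": "0001", "cm2mint": 1, "m2_example": 3},
    {"idnum": "0002", "cm2mint": 0, "m2_example": NOT_IN_WAVE},
    {"idnum": "0003", "cm2mint": 1, "m2_example": 1},
]


def interviewed(rows, flag):
    """Keep only the families whose interview flag equals 1."""
    return [row for row in rows if row[flag] == 1]


def not_interviewed(rows, flag):
    """Return the families whose interview flag is anything other than 1."""
    return [row for row in rows if row[flag] != 1]


year1_mothers = interviewed(records, "cm2mint")
missing_wave = not_interviewed(records, "cm2mint")

print([row["idnum"] for row in year1_mothers])
print([row["m2_example"] for row in missing_wave])
```

Because every family has a record at every wave, filtering on the flag first keeps -9 values out of tabulations for that wave.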
Questionnaires are available by wave on the Public Data Documentation page. You can also look up questions by topic, text search, or variable name in the Metadata Explorer. When you click on a specific variable in the Metadata Explorer, a list of similar variables in other waves and surveys will be provided.
In the baseline file there are two variables you should use to find out when the respondent was interviewed. m1intmon / f1intmon represent the month of interview, and m1intyr / f1intyr represent the year of interview. There is also a constructed variable (cm1tdiff) that can be used to check the time gap between parent interviews. There are corresponding variables at all waves.
The main identifier for merging and sorting is idnum, a 4-character string variable. idnum can be found on all of the public data files and all Restricted Use Contract files, with the exception of the Psych Array file (which uses a different ID number).
Flag variables (e.g. cf1fint, cm2mint, cf2fint) indicate whether or not a given respondent was interviewed at a given wave (all mothers were interviewed at baseline so there is no cm1mint variable). Cases not interviewed are coded as -9 "not in wave" on all other variables. Flag variables (e.g. cm1fint, cm2fint and cf2mint) on the mothers' and fathers' records indicate whether the corresponding mother or father was interviewed at the time of the follow-up. Additionally, variables with the "samp" root in the name (e.g. cm2samp and cf2samp) provide information about the status of the case at each follow-up. Information such as mother/father/child death between waves, nonresponse, and changes in eligibility are coded in these variables.
"-5" in the data file means the person was not asked a given question because that question was not on the version of the questionnaire used at the time of the interview. "-6" means the respondent was skipped for a question that wasn't appropriate for them to answer. For more help with skipped questions, see “How do I figure out why a participant skipped a particular survey question?” For more information on negative codes, see the User Guides.
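As a rough sketch of how these codes might be handled before analysis, the Python snippet below separates valid responses from negative codes and counts the reasons. Only the three codes described in this FAQ are labeled; the full list is in the User Guides. The sketch assumes the variable has no legitimate negative values, which will not hold for every variable, and the response values are made up.

```python
# Labels for the negative codes described in this FAQ only.
# The User Guides document the complete set of negative codes.
NEGATIVE_CODES = {
    -5: "not asked (question not on this questionnaire version)",
    -6: "skipped (question not appropriate for respondent)",
    -9: "not in wave",
}
OTHER = "other negative code (see User Guide)"


def split_valid(values):
    """Return (valid_values, counts_by_reason) for one variable.

    Assumption: every negative value is a missing-data code.
    """
    valid = []
    reasons = {}
    for value in values:
        if value < 0:
            label = NEGATIVE_CODES.get(value, OTHER)
            reasons[label] = reasons.get(label, 0) + 1
        else:
            valid.append(value)
    return valid, reasons


responses = [2, -6, 1, -9, -5, 3, -6]  # made-up values
valid, reasons = split_valid(responses)
print(valid)
for label, count in reasons.items():
    print(count, label)
```

Counting the reasons rather than discarding them makes it easier to tell a skip pattern (-6) from a questionnaire-version difference (-5).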
There are many skip patterns in the data and they can be complex. To understand a skip pattern, the best place to start is by 1) going to the applicable questionnaire, 2) finding the variable you are examining, and 3) working backwards from the variable of interest until you find the skip instruction(s) that initiate the skip pattern. Please note that some of the more complex interview segments may contain more than one skip pattern in a particular section, so if the first one you see does not account for all of the cases that skip the variable of interest, you may need to look back further for an additional skip command. Questionnaires are available on our Public Data Documentation page.
The FF Challenge data files are associated with the predictive modeling stage of the FF Challenge competition, held in Summer 2017 [https://www.pnas.org/content/117/15/8398]. These files are now being provided through Princeton’s OPR Data Archive so that other data users will be able to replicate and extend what participants did in the Challenge. If you are trying to identify which file you have downloaded, the Challenge files can most easily be identified in contrast to the traditional FFCWS data files by the unique IDs (integer values between 1 and 4,242) and number of observations (4,242) - the FFCWS data files have an idnum with four digits and 4,898 observations.
The Challenge files include:
- readme.txt - a text file with descriptions
- background.csv - birth-Year 9 data, as a .csv
- background.dta - birth-Year 9 data, as a Stata .dta file
- codebook_FFChallenge.txt - merged codebook for all waves
- prediction.csv - an example submission that predicts the mean of the training data for all outcomes
- train.csv - outcomes for training observations (half the sample)
- test.csv - outcomes for test observations
- leaderboard.csv - outcomes for observations in the leaderboard set, with missing outcomes imputed (as provided via Codalab)
- leaderboardUnfilled.csv - outcomes for observations in the leaderboard set (not imputed)
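The observation-count rule for telling Challenge files apart from traditional FFCWS files can be sketched in Python as follows. The two counts (4,242 and 4,898) come from the text above; the helper names are hypothetical, and the sketch assumes a CSV with one header row and one row per observation.

```python
import csv
import io

CHALLENGE_ROWS = 4242  # observations in the FF Challenge files
FFCWS_ROWS = 4898      # observations in the traditional FFCWS files


def count_observations(handle):
    """Count data rows in an open CSV file, excluding the header row."""
    reader = csv.reader(handle)
    next(reader, None)  # skip the header row if there is one
    return sum(1 for _ in reader)


def classify_by_row_count(n_rows):
    """Guess the file family from the number of observations."""
    if n_rows == CHALLENGE_ROWS:
        return "FF Challenge file"
    if n_rows == FFCWS_ROWS:
        return "FFCWS data file"
    return "unknown"


# Tiny in-memory CSV used in place of a real download (made-up contents).
toy_file = io.StringIO("id,outcome\n1,0.5\n2,0.7\n3,0.1\n")
toy_rows = count_observations(toy_file)
print(toy_rows, classify_by_row_count(toy_rows))
print(classify_by_row_count(4242))
print(classify_by_row_count(4898))
```

With a real download, the same function could be applied to a file opened with `open(path, newline="")`. Checking the ID format as well (integers from 1 to 4,242 versus a four-digit idnum) gives a second confirmation.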
There are no mother or father interviews at Year 15; however, the Year 15 primary caregiver (PCG) interview incorporates many of the questions and topics included in the mother and father interviews of previous waves, as well as some of the questions and topics included in the PCG interviews of previous waves.
The Scales and Concepts Documentation page shows a table with the scales and concepts included in each wave of FFCWS data. Each “x” in the table links to more detailed documentation within the User Guide for that particular wave, including source information, the full variable list for the scale or concept, modifications, and scoring instructions (if applicable). You can also filter variables by scale in the Metadata Explorer.
If you are having trouble downloading the files simply by clicking on them (please select "Save" and not "Open"), try right-clicking on the file and selecting “Save Target As.” We recommend using the WinZip Classic interface to open the zip files you downloaded. Users may also want to check with their IT department to make sure they have an up-to-date copy of WinZip.
The data are available in SAS, SPSS, and Stata (for Windows) formats. If users need data in other formats, we suggest using a file conversion program such as Stat/Transfer or DBMS/Copy. R users may use the .dta (Stata) files as well.
Please use the SAS code included in zip files to read the formats. The formats are permanently attached to the variables in each data set. Or, users can select the NOFMTERR option when reading in data.
Data files can be merged using the idnum variable.
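A minimal sketch of such a merge, written in Python with only the standard library, is shown below. The two in-memory CSV extracts, the idnum values, and the variable m1_example are made up for illustration; cm2mint is the Year 1 mother-interview flag named elsewhere in this FAQ. The sketch assumes a one-to-one merge, which fits the one-record-per-family structure of the files.

```python
import csv
import io

# Made-up extracts from two waves; real files would be read from disk.
# "m1_example" is a hypothetical variable name used only for illustration.
baseline_csv = "idnum,m1_example\n0001,24\n0002,31\n0003,19\n"
year1_csv = "idnum,cm2mint\n0001,1\n0002,0\n0003,1\n"


def read_by_idnum(text):
    """Index the rows of a CSV by idnum.

    The csv module returns every field as a string, so idnum keeps
    its 4-character form, including any leading zeros.
    """
    return {row["idnum"]: row for row in csv.DictReader(io.StringIO(text))}


def merge_on_idnum(left, right):
    """One-to-one merge of two idnum-indexed tables."""
    if left.keys() != right.keys():
        raise ValueError("files do not contain the same idnum values")
    return {idnum: {**left[idnum], **right[idnum]} for idnum in left}


merged = merge_on_idnum(read_by_idnum(baseline_csv), read_by_idnum(year1_csv))
for idnum in sorted(merged):
    print(merged[idnum])
```

Keeping idnum as a string on both sides avoids mismatches that can arise when one file stores it as text and the other as a number.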
The weights were constructed to adjust for sample design (probability of selection), non-response at baseline, and attrition on observed characteristics over the waves. For a brief introduction to using the weights, please read Fragile Families & Child Wellbeing Study: A Brief Guide to Using the Weights for Waves 1-6. For a detailed account of how the weights were constructed, please read the Constructing the Weights documents available on our Public Data Documentation page.
There are valid weights for 1) interviewed cases and 2) cases in which we determined that the parent or child had died, that the child had been adopted, or that the child was living with neither parent. Because the cases for adoptions/living with neither parent have little or no interview data, they are coded as "no" in the national sample flags (and interview flags). Data users can, however, estimate the proportion of children/parents who died, etc. by applying the weights to the interview sample flags.
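The idea of applying weights to a status indicator can be sketched in Python as a weighted proportion. This is only an illustration of the arithmetic for a point estimate: the field names status_flag and natl_weight and all values are hypothetical, and the choice of weight variable and the treatment of standard errors should follow the weights documentation.

```python
# Made-up cases with valid weights. "status_flag" is a hypothetical 0/1
# indicator (for example, 1 = status of interest) and "natl_weight" is a
# hypothetical weight variable; neither is a real FFCWS variable name.
cases = [
    {"idnum": "0001", "status_flag": 1, "natl_weight": 10},
    {"idnum": "0002", "status_flag": 0, "natl_weight": 30},
    {"idnum": "0003", "status_flag": 1, "natl_weight": 60},
]


def weighted_proportion(rows, indicator, weight):
    """Share of the total weight carried by rows where indicator == 1."""
    total = sum(row[weight] for row in rows)
    if total == 0:
        raise ValueError("weights sum to zero")
    hits = sum(row[weight] for row in rows if row[indicator] == 1)
    return hits / total


unweighted = sum(row["status_flag"] for row in cases) / len(cases)
weighted = weighted_proportion(cases, "status_flag", "natl_weight")
print(round(unweighted, 3), round(weighted, 3))
```

In this toy example the unweighted share is two of three cases, while the weighted share is 70 of 100 weight units, which shows how the weights can shift an estimate.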
Metadata Explorer Questions
You can download the metadata as a CSV by going to the Metadata Explorer and clicking “Download full metadata” in the menu bar.
R and Python packages are available. You can access them by going to the Metadata Explorer homepage and scrolling down to “Other API resources to help you use the metadata.”
Using the Advanced Variable Search option, you can type your own search terms into the “Search results” box. This type of search can be done on its own or in combination with other parameters that you specify in the advanced search query builder. If you choose this second option, first specify your query rules and click “search”, and then enter your text search in the box below.
You can look up questions by topic, text search, or variable name on the Metadata Explorer. When you click on a specific variable in the Metadata Explorer, a list of similar variables in other waves and surveys will be provided. Questionnaires are also available by wave on the Public Data Documentation page.
You can filter variables by scale in the Browse or Advanced Variable Search features of the Metadata Explorer. Please also visit the Scales and Concepts Documentation page to view specific documentation for that scale and wave within the User Guide including source information, variable list, modifications, and scoring (if applicable).
Publishing Your FFCWS Research
We request that users cite the substantial funding from the Eunice Kennedy Shriver National Institute of Child Health & Human Development in their publications with the following statement: “Research reported in this publication was supported by the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) of the National Institutes of Health under award numbers R01HD036916, R01HD039135, and R01HD040421, as well as a consortium of private foundations. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.”
Send an email to firstname.lastname@example.org with the authors and title of your working paper. Please attach a Word or PDF document including either the full text of your paper or an abstract, whichever you would like to be posted.
Send an email to email@example.com with the title and a link to your publication online.
FFCWS data users are encouraged, but not required, to submit their publications to PubMed Central. If you have attended one of our Columbia University Summer Data Workshops, you are required to submit your publications to PubMed Central. Attendees of the 2012 Workshop should cite R25HD072818. Attendees of the 2013-2018 Workshops should cite R25HD074544.