This chapter discusses the potential for creating a new tier of access to the Survey of Income and Program Participation (SIPP) data that would be intermediate between a public-use file and access through Federal Statistical Research Data Centers (FSRDCs). Such a tier would offer access to more data than the public-use file and less than at FSRDCs, and it would have different mechanisms for determining eligibility for data access and for maintaining data security. A commonly used term for such a tier is a virtual data enclave (VDE). However, the Census Bureau already makes SIPP data available through two mechanisms that might be considered VDEs. To avoid confusion with these other tiers, the panel instead uses the term secure online data access (SODA) to refer to what it proposes.
An important context to this suggested change is that systems like SODAs are already used widely to provide access to confidential data and have a record of providing appropriate security and protection for human subject confidentiality. They have been embraced by the scientific community. Such approaches are in use by other federal agencies in a way that can serve as a model for the Census Bureau processes for protecting SIPP confidential data. For example, the Bureau of Labor Statistics (BLS) asks people to sign a BLS Agent Agreement to gain access to its confidential data accessed using a VDE. The Letter of Agreement establishes the data user as a temporary BLS agent, and no additional work (such as Special Sworn Status) is needed. (The Census Bureau is subject to Title 13, which requires such temporary staff to swear to observe the limitations imposed by Title 13 but does not specify the nature of how the agent is sworn in.) Additionally, a SODA-like approach is used by federal agencies including the National Center for Education Statistics, the Bureau of Justice Statistics, the United States Department of Agriculture, and the Millennium Challenge Corporation. The Food and Drug Administration and National Institute on Drug Abuse operate a VDE to provide access to the restricted use files from the Population Assessment of Tobacco and Health. NORC at the University of Chicago, Westat, and the Inter-university Consortium for Political and Social Research are several examples of organizations contracted to operate VDE infrastructure for various data and partners and to provide access to confidential data to authorized users.
SODAs also have an important role in fulfilling the requirements of the Foundations for Evidence-Based Policymaking Act of 2018 (Evidence Act), which formalizes a goal of increasing access to data, and the Information Quality Act, which makes tiered access a tool for supporting that move.
Currently, SIPP data are primarily offered through two vehicles: a public-use file and FSRDCs. Data access through FSRDCs originally required the researcher’s physical presence at an FSRDC, but starting with the COVID-19 pandemic, and later as a more permanent change, access to Title 13 data at FSRDCs (including SIPP) became available online as well, though with the same approval process as for physical attendance.1 Thus, data access has now been expanded to three tiers. An additional tier, though much more limited in the amount of SIPP data available, is the SIPP Synthetic Beta;2 in this tier, SIPP synthetic data can be accessed online, with a somewhat different (and easier) process for gaining approval than for FSRDC access.
The reason for considering a new tier of access is that, given the risks of high data granularity discussed in Chapter 3, it seems likely that the public-use file in its current format will soon be found to be too disclosive, either based on current data availability or the general trend toward more types of data becoming available. Though a reduced public-use file could still be useful to many researchers, the FSRDC process is too onerous, costly, and time-consuming to be practical for those researchers whose data needs would not be met. SODA can provide a solution, but its success will depend both on data users’ motivations (e.g., whether the public-use file will meet their needs and what other databases are available as alternatives) and on the particular conditions and tools that the Census Bureau provides for accessing the data through SODA. If SODA access requirements are largely the same as FSRDC access requirements while the public-use file becomes less useful, then SIPP may face a precipitous loss of users. It will be important to monitor user access and the level of burden imposed upon users, and to design a system that supports access while providing data security. Multiple factors are important, including the initial process of approval for access, the mechanisms available for access, the presence of tools to simplify analysis, and the disclosure review process before data can be released.
___________________
1 Such access may not be available when other data are comingled with SIPP data, depending on the requirements set by the agencies providing the data.
2 At the time of this writing, access to SIPP Synthetic Beta has been suspended until a new host site is established.
The panel notes that a long-standing concern of the Census Bureau has been to maintain the confidentiality requirements specified in Title 13. In this particular case, where the move would be to make some data more secure (moving from a public-use file to a more secure tier) rather than making FSRDC data less secure, it is not clear whether any changes to Title 13 procedures are required. However, part of Chapter 9 also discusses how the policy interpretation of Title 13 is actually much stronger than what is required by Title 13 itself. Thus, there is also a possibility that Title 13 could be reinterpreted in a way that would recognize the federal government’s move toward increasing tiers of access and promoting the provision of data for evidence building.
A SODA would be very valuable for providing access to Title 13 data for several reasons. First, a SODA places a less onerous burden on the data requestor than an FSRDC does, because becoming authorized for data access involves less credentialing work. Researchers seeking to use data in an FSRDC currently must obtain Special Sworn Status, which requires applicants to meet U.S. citizenship and residency requirements and to pass a background check administered by the Department of Homeland Security. Applicants must also submit fingerprints; complete online training; submit residential, travel, employment, and education histories; provide references; and complete an interview.
SODA user credentialing can incorporate a system and protocols to vet users and monitor the terms of their controlled access. Users generally submit applications that demonstrate their need for restricted data and the scope of their analysis plans. The approval process may also consider applicants’ statistical training, educational background, institutional affiliations, and titles, and it may require a supervisor as a co-applicant (a faculty advisor in the case of student applicants). Requiring Institutional Review Board approval or exemption and a Data Use Agreement signed by an institutional representative adds layers of security and accountability, though these requirements can become administrative barriers to nonacademic researchers. The Association for the Accreditation of Human Research Protection Programs, Inc., offers an accreditation process for organizations, including independent Institutional Review Boards, which can offer ethical oversight for those wishing to access SIPP data in SODAs who are not at academic institutions.
It is also possible to augment the application process with additional user training, which would be valuable for fostering researcher behaviors that keep SIPP data from being disclosed. There are other difficulties inherent in FSRDCs, such as (a) a limited number of seats; (b) limitations in computing power, thus slowing responses; and (c) high cost to the applicants. A SODA stands to provide a much broader set of researchers with quicker access to SIPP confidential data while maintaining Census Bureau authorization of users and disclosure review controls applied to any output. Expanding access through SODA will alleviate the additional administrative and cost burden of applying to multiple physical research data centers for multisite research teams and will ease the transition if researchers change institutions during the project. A SODA is not limited by physical computing space (e.g., using secured cloud computing) and can more easily expand its computing capacity in response to shifting researcher demand, though SODA users will still be subject to disclosure risk review on their analysis output. Thus, the SODA approach would be more equitable across different user and research communities than the FSRDC approach, with its associated cost and experience barriers. The process of gaining access would necessarily be more involved than simply downloading a dataset, but institutions such as the Inter-university Consortium for Political and Social Research have worked to make data both secure and accessible.
There is another equity issue other than access but related to disclosure avoidance. Minority groups of all types tend to be at greater risk of disclosure simply because of their smaller numbers and thus their greater probability of having unique combinations of characteristics. If one were to apply disclosure avoidance tools differentially to those groups in order to protect their confidentiality, then one might also be weakening statistical analyses concerning these groups. SODA helps to address this issue as well by providing a secure way of accessing the data without the data being perturbed.
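The link between small group size and uniqueness can be illustrated with a minimal sketch. The records, field names, and the function `share_unique_by_group` below are hypothetical and invented for illustration; they are not SIPP variables or a Census Bureau method. The sketch counts how often each combination of characteristics occurs and reports, for each group, the share of records whose combination appears only once.

```python
from collections import Counter, defaultdict

def share_unique_by_group(records, quasi_identifiers, group_key):
    """For each group, return the share of records whose combination
    of quasi-identifier values occurs exactly once in the data."""
    def combo(r):
        return tuple(r[k] for k in quasi_identifiers)

    combo_counts = Counter(combo(r) for r in records)
    totals = defaultdict(int)
    uniques = defaultdict(int)
    for r in records:
        g = r[group_key]
        totals[g] += 1
        if combo_counts[combo(r)] == 1:
            uniques[g] += 1
    return {g: uniques[g] / totals[g] for g in totals}

# Entirely made-up toy records (hypothetical field names).
records = [
    {"group": "larger", "age_band": "30-39", "state": "A", "hh_size": 2},
    {"group": "larger", "age_band": "30-39", "state": "A", "hh_size": 2},
    {"group": "larger", "age_band": "40-49", "state": "A", "hh_size": 3},
    {"group": "larger", "age_band": "40-49", "state": "A", "hh_size": 3},
    {"group": "smaller", "age_band": "30-39", "state": "A", "hh_size": 5},
    {"group": "smaller", "age_band": "40-49", "state": "A", "hh_size": 3},
]

shares = share_unique_by_group(
    records, ["group", "age_band", "state", "hh_size"], "group"
)
```

In this toy example every record in the smaller group is unique while none in the larger group is, which is the pattern that would lead differential disclosure avoidance to fall more heavily on smaller groups.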
A SODA environment simulates a non-internet-connected desktop computer and can be configured to meet security requirements for most confidential data. The SODA would ensure secure access to SIPP data files that could be reinterpreted as controlled unclassified information. The SODA would therefore need to adhere to the Confidential Information Protection and Statistical Efficiency Act and to the National Institute of Standards and Technology (NIST) 800-53 (or NIST 800-171) confidentiality and security requirements for controlled unclassified information in a federal system. SIPP data accessed through a SODA would still be restricted from general dissemination, protecting the confidentiality of the individuals and organizations represented in the data. From a technological perspective, a SODA could be similar to a VDE provided through an FSRDC; the real difference would be in the application process and the disclosure review process.
A SODA provides authorized users with a mechanism with technical controls to securely access confidential data from their local desktops. It is a virtual environment launched from the researcher’s desktop—operating on a remote server meeting Federal Information Security Modernization Act of 2014 (FISMA) security controls or in the FedRAMP Cloud-based environment. For the user, launching the SODA is akin to remotely logging into another physical computer or secure network, something that many researchers are accustomed to doing. The virtual environment is isolated from the users’ physical desktop computers, restricting users from emailing, printing, copying, or otherwise moving files outside the secure environment. Therefore, the user never has access to a physical copy of the data; the data remain within the SODA, giving the organization managing the environment the ability to monitor data access and terminate access within minutes. A SODA can also be configured to provide access at specified locations, such as through an approved IP address. In a SODA, desktop security and data protection are centralized and applications can be quickly added, deleted, upgraded, and patched. The SODA is both useful and cost-effective for providing a wide array of desktop editing and statistical software.
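As one illustration of location-based access, the following minimal sketch checks whether a connecting address falls within an approved network. The network ranges shown are reserved documentation ranges, and the names `APPROVED_NETWORKS` and `is_approved_location` are hypothetical; an actual SODA would rely on whatever controls its security authorization specifies.

```python
import ipaddress

# Hypothetical approved networks; these are reserved documentation
# ranges, not real institutional addresses.
APPROVED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]

def is_approved_location(client_ip, approved=APPROVED_NETWORKS):
    """Return True if client_ip belongs to any approved network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in network for network in approved)

print(is_approved_location("192.0.2.17"))    # inside the first range
print(is_approved_location("198.51.100.5"))  # outside both ranges
```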
A SODA can be configured to restrict researchers’ access to the datasets they are approved to use and can support combination with other publicly approved datasets and other restricted federal statistical agency datasets. The virtual environment can be deleted when the researcher logs off of the SODA, and a new virtual environment is created each time the researcher logs back on to the SODA. A SODA can be configured to offer a secure workspace for each approved project, one that is isolated from others’ access. Thus, authorized users retain access to any files previously saved in the SODA but not to the workspaces. The SODA also facilitates collaborative research with multiple individuals within and across institutions being able to access a shared folder for their work.
A SODA also allows an agency to manage legal and social controls for accessing the data. SODA access is supplied to authorized users only after Census Bureau user approval is in place (such as through a Census Bureau Agent Agreement). A SODA can be configured to provide agreement tracking with notifications for expiring agreements to manage compliance. Data access can be suspended quickly. Authorized users submit any output and files that they want to receive outside the SODA for review against a set of disclosure review guidelines (either by a Census Bureau Disclosure Review Board or supported with automated checking). Census Bureau staff or a designee conducts a disclosure review of the output. It is possible to add disclosure protections to the output, including a differential privacy process. The cleared output will have no confidential information. Finally, SODA access can automatically terminate at the end of the agreement period (typically one year) unless the SODA user applies and is approved for an extension.
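Agreement tracking with expiration notices and automatic termination might be sketched as follows. The `Agreement` record, the status labels, and the 30-day notice window are assumptions made for illustration only; they do not describe an existing Census Bureau system.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Agreement:
    """Hypothetical record of one user's data access agreement."""
    user: str
    start: date
    end: date

def access_status(agreement, today, notice_days=30):
    """Classify access on a given day; notice_days is an assumed window."""
    if today > agreement.end:
        return "terminated"            # access ends automatically
    if today < agreement.start:
        return "not_yet_active"
    if agreement.end - today <= timedelta(days=notice_days):
        return "active_expiring_soon"  # would trigger a notification
    return "active"

# A one-year agreement, matching the typical period noted above.
agreement = Agreement("researcher_a", date(2024, 1, 1), date(2024, 12, 31))
print(access_status(agreement, date(2024, 6, 1)))
print(access_status(agreement, date(2024, 12, 15)))
print(access_status(agreement, date(2025, 1, 1)))
```

An approved extension would be represented here simply by replacing the agreement's end date.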
The Census Bureau’s infrastructure solution ideally would be connected to government efforts to streamline access to its data. The Evidence Act, enacted in January 2019, which has the goals of advancing data and evidence-building functions in the federal government, has already made it easier to access government data in ResearchDataGov.org, a single application portal for discovering and requesting access to restricted microdata from federal statistical agencies. Another outcome of the Evidence Act is accelerated planning for a National Secure Data Service (NSDS) hosting a secure infrastructure where researchers and other users (such as program evaluators) would be able to (a) submit proposed projects for approval; (b) link and access data for research and analyses; and (c) have project results privacy-protected, then prepared for public dissemination (Potok & Hart, 2022).
The panel notes that creating and maintaining SODA will require some cost and effort on the part of the Census Bureau. This includes creating a mechanism for approving people for access (even if it is simpler than at FSRDCs), creating and maintaining the hardware and software infrastructure, and reviewing statistical findings for potential release. The panel has not attempted to estimate those costs, though the timeline presented in this report makes allowance for the development time needed. Possibly the NSDS will be helpful in this regard. To the extent that SODA allows access to the actual data rather than to synthetic data, SODA has the advantage of not requiring a separate verification and validation process, thus avoiding one source of burden on the Census Bureau; still, the results will need to go through disclosure review. Ideally, some parts of the disclosure review process might be automated, though doing so for complex analysis would require a long-term development process. The Census Bureau might also consult with other agencies (including through NSDS) and external organizations to find ways of lessening the burden.
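To make the idea of partially automated disclosure review concrete, the sketch below flags table cells whose counts fall below a minimum size so that a human reviewer can focus on them. The threshold of 10, the function name `flag_small_cells`, and the table contents are illustrative assumptions; they are not Census Bureau disclosure rules, and a check of this kind covers only simple tabular output, not complex analyses.

```python
def flag_small_cells(table, min_count=10):
    """Return the labels of cells whose count is below min_count.

    `table` maps a cell label (a tuple of category values) to a count.
    The threshold is an assumed, illustrative value.
    """
    return sorted(label for label, n in table.items() if n < min_count)

# Made-up requested output: counts by employment status and state.
requested_output = {
    ("employed", "state_A"): 120,
    ("employed", "state_B"): 4,
    ("unemployed", "state_A"): 35,
    ("unemployed", "state_B"): 2,
}

flagged = flag_small_cells(requested_output)
print(flagged)
```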
The Census Bureau might also individually implement security and confidentiality requirements for its own SODA infrastructure using a contractor as a way of reducing the burden on the Census Bureau and potentially making use of systems that are already in place. The Census Bureau would require any organization under contract to select, train, and maintain rigorous security processes for employees who have access to SIPP confidential data (documenting employee training in confidentiality and security procedures) and have the ability to destroy data at the end of any contract period. An organization contracted to manage the SODA infrastructure would also be required to maintain a secure worksite and system to prevent unauthorized access. A contractor would implement and document security controls for any system holding SIPP confidential data, obtaining a FISMA moderate Authority to Operate.3 The Census Bureau may use security inspections to ensure that the system and worksite where the data are stored are in compliance and secured. Sometimes the Census Bureau’s use of private contractors has gone poorly, especially when the contractors were working outside of their areas of expertise or lacked sufficient infrastructure for the large-scale operations that are sometimes needed (Carless, 2020; Ruggles & Magnuson, 2020). If the Census Bureau makes use of a private contractor, it is important to verify the contractor’s experience and capabilities for maintaining a SODA. If a contractor is given responsibility for disclosure review, then it may be that certain types of decisions could be delegated completely to the contractor, while others require consultation with the Census Bureau; even in the latter case, the contractor may be able to organize the request in a way that lessens the work for the Census Bureau.
Conclusion 5-1: Secure online data access is a viable method for protecting confidentiality for access to a more comprehensive set of Survey of Income and Program Participation data than would be available in the public-use file.
Conclusion 5-2: There are elements of Survey of Income and Program Participation synthetic and virtual data access (provided through Federal Statistical Research Data Centers) approaches that could be used to develop new modes of access to protect confidentiality and limit disclosure.
Recommendation 5-1: If disclosure risk assessment studies find that the current public-use file does not provide adequate disclosure avoidance, the Census Bureau should consider secure online data access as a mode likely to support both access and security.
___________________
3 FISMA scores are used to grade the security of a company’s or agency’s internal systems, with three levels of security: low, moderate, and high. The standards are set by the National Institute of Standards and Technology as part of its statutory responsibilities under FISMA of 2014. See https://doi.org/10.6028/NIST.SP.800-53r5