Toolkit FAQ: Web Based Data Dissemination

Latest post 06-15-2007 1:20 PM by administrator. 0 replies.

    What are the requirements for, and barriers to, development of Web-based data query systems?

    What are some central concerns relating to the risk of disclosure in Web-based data dissemination?

    While small cell size is a concern for most public health statistical publications, it is more acute in web-based data dissemination systems for several reasons.  First, such systems are particularly desirable for immediate answers to questions about the public’s health, and their users are generally interested in data for small geographical areas and other small groups of individuals.  Second, the information reaches a much broader audience than a paper publication, often including individuals without statistical or epidemiologic training.  Third, web-based systems generally provide less documentation on how to interpret the results than do paper publications, which usually provide extensive bibliographies, appendices, footnotes, caveats, and so forth; web-based systems are also likely to provide less basic information on certain conditions.

    How can the risk of disclosure be minimized in Web-based data dissemination? What are standards used by various state agencies?

    Broader use and dissemination of public health data sets serves a public good, highlights public health’s important role in health and health improvement, and places additional value on public health data assets.  Yet, there is more risk of personal disclosure of sensitive information when it is displayed on the Internet; the Internet is an impersonal access tool that increases the velocity of interactions and as a result allows for rapid use and dissemination to others who may or may not have a good understanding of appropriate data use.

    What constitutes disclosure?  “Disclosure relates to inappropriate attribution of information to a data subject, whether an individual or an organization. Disclosure occurs when a data subject is identified from a released file (identity disclosure), [or] sensitive information about a data subject is revealed through the released file (attribute disclosure).”

    Approaches to prevent disclosure can be divided into two broad categories: “management and institutional controls” and “data modification and alteration methods.”

    Multiple protective layers to assure anonymity and confidentiality should include the management and technical controls and the data modification and alteration methods described below:

    (i) Data protection agreements:  Data protection agreements are used both to inform users and to control use of the data. In web-based data query systems (WBQSs), the effectiveness of this approach has not yet been tested.

    (ii) Limited data set:  A subset of the full data set is created for public use, dropping identifiable data elements.

    (iii) On-line query system:  Users are not allowed to download or obtain copies of raw data files.  Instead, the data reside on the host machine, often protected by a firewall.  Users conduct their own analyses by submitting queries and obtaining aggregated results.

    (iv) User authentication and access validation:  Password protection for CD-ROMs and public use files, and for access to web query systems.

    (v) Education and training of public use file users:  The data-providing agency educates users about the disclosure risk of microdata, the types of analyses that are considered breaches of confidentiality, and the legal issues associated with disclosure.

    (vi) Making pre-constructed tables and pivot tables available:  Pre-constructed tables allow review of results in a form that prevents disclosure of personal identity and health information.  Pivot tables allow some alternative displays of the data for the end user, while still controlling the level of drill-down to prevent disclosure.

    (vii) Anonymizing/de-identifying data:  A microdata file is anonymized by removing information such as names, addresses, policy numbers, etc.

    (viii) Cross-tabulations and micro-aggregation:  The display is fixed in terms of the number of rows and columns, and/or the data are aggregated to avoid disclosure.

    (ix) Restriction of geographic detail:  Rare events or events occurring in small geographic areas are removed or altered to avoid disclosure.

    (x) Recoding into intervals and rounding:  Values of continuous data elements are grouped into broader categories to increase the cell size and prevent disclosure.

    (xi) Cell suppression:  Data values below pre-determined cell sizes are removed, and rules are applied regarding the display of margins.
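
    As a rough illustration of how items (x) and (xi) might combine inside a query system that returns only aggregated results, the Python sketch below recodes exact ages into intervals, tabulates counts, and suppresses small cells. The threshold of 5, the age bands, the field names, and the margin rule (hide a row total whenever any cell in that row is suppressed) are all assumptions made for this example; they are not standards taken from any agency.

```python
from collections import Counter

# Assumed threshold: nonzero counts below this value are suppressed.
MIN_CELL_SIZE = 5
# Assumed intervals for recoding exact ages.
AGE_BANDS = [(0, 17), (18, 44), (45, 64), (65, 120)]


def age_band(age):
    """Recode an exact age into a broad interval label such as '18-44'."""
    for low, high in AGE_BANDS:
        if low <= age <= high:
            return f"{low}-{high}"
    raise ValueError(f"age out of range: {age}")


def tabulate(records):
    """Count records per (county, age band); only counts are returned."""
    return Counter((r["county"], age_band(r["age"])) for r in records)


def suppress_small_cells(counts, threshold=MIN_CELL_SIZE):
    """Replace nonzero counts below the threshold with None."""
    return {cell: (n if n == 0 or n >= threshold else None)
            for cell, n in counts.items()}


def row_total(table, county):
    """Return a county total only if no cell in that row is suppressed.

    Publishing the total next to a single suppressed cell would let a
    reader recover the hidden value by subtraction.
    """
    cells = [n for (c, _), n in table.items() if c == county]
    return None if any(n is None for n in cells) else sum(cells)


# Made-up records for demonstration only.
records = (
    [{"county": "A", "age": 30}] * 12
    + [{"county": "A", "age": 70}] * 3
    + [{"county": "B", "age": 50}] * 8
)
table = suppress_small_cells(tabulate(records))
```

    Here the county A cell for ages 65-120 holds only 3 records, so it is suppressed and the county A total is withheld as well, while county B is displayed in full. Production systems typically use more careful complementary suppression than this single margin rule.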

     

    In addition to the approaches listed above, several new software packages are available that provide technical support for protecting public health data.  The National Center for Health Statistics (NCHS) sponsored the development of disclosure limitation software for two-way tables by OptTek Systems, Inc. (OptTek, 2002).  The OptTek software includes the following functionality:

    • cell suppression
    • controlled rounding (minimum-distance controlled rounding)
    • unbiased controlled rounding
    • controlled rounding subject to subtotal constraints
    • synthetic substitution (controlled tabular adjustment)
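
    To give a feel for the idea behind unbiased rounding, the Python sketch below rounds each count up or down to a multiple of a base, with probabilities chosen so that the expected rounded value equals the original count. This is a simplified per-cell illustration only: it does not preserve additivity to row and column totals, which is what controlled rounding adds, and it is not the OptTek implementation. The function name and the base of 5 are assumptions for the example.

```python
import random


def unbiased_round(count, base=5, rng=random):
    """Round count to an adjacent multiple of base without bias.

    With remainder r = count % base, round up with probability r / base
    and down otherwise, so the expected result equals count.
    """
    remainder = count % base
    if remainder == 0:
        return count  # already a multiple of the base
    lower = count - remainder
    return lower + base if rng.random() < remainder / base else lower


# Made-up cell counts; a seeded generator keeps the demo repeatable.
rng = random.Random(0)
cells = [3, 7, 10, 22]
rounded = [unbiased_round(c, rng=rng) for c in cells]
```

    Every value in `rounded` is a multiple of 5 and lies within 4 of the original count, so a reader cannot tell whether a displayed 5 came from a true count of 3 or 7.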
     

    For approaches to cell suppression, refer to the question: What, if any, are the standard rules for suppression of data reported to the public?

    Ryley Fogg NAHDO IT
