Differential privacy

From Wikipedia, the free encyclopedia

Differential privacy aims to maximize the accuracy of responses to database queries while minimizing the probability of identifying the individual records used to produce the response. The term belongs to the field of secure, privacy-preserving publication of sensitive information. Mechanisms that satisfy differential privacy prevent attackers from distinguishing whether or not a particular person is contained in a database.

Situation

Differential privacy is used in the context of publishing sensitive information. One would like to make general statistical information accessible while at the same time protecting the privacy of individual participants.

A conceivable scenario is the publication of patient data from a hospital in order to enable researchers to uncover statistical relationships. It is necessary to process the data in such a way that the individual patients cannot be identified. For research purposes, it is sufficient that the statistical distributions in the published data correspond to those of the real data. It does not matter whether individual properties, such as the names of the patients, have been changed.

Motivation

With the advent of online outsourcing and cloud computing, awareness of data protection in this context is also increasing. The cost reductions and performance gains expected from these new technologies go hand in hand with a loss of control over the data.

While encryption is an established instrument that can effectively guarantee data protection, it also restricts the processing of the data. Performing operations on encrypted data requires considerably more complex algorithms, with a corresponding cost in time. In addition, the use of encryption prevents unrestricted publication.

Methods are therefore sought that allow information to be published in a privacy-preserving way without encrypting the data.

Differential privacy follows the approach of adding noise to the data in order to make definitive statements about particular properties of individual records impossible.

ε-differential privacy

The original formulation of differential privacy uses a central parameter ε as a measure of privacy protection.

Definition

A randomized function K provides ε-differential privacy if, for all data sets D1 and D2 that differ in at most one entry, and for all S ⊆ Range(K):

Pr[K(D1) ∈ S] ≤ exp(ε) · Pr[K(D2) ∈ S].
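A classic mechanism satisfying this definition is randomized response, which answers a sensitive yes/no question truthfully only with a calibrated probability. The following is a minimal sketch (the function name and parameters are illustrative, not taken from the article):

```python
import math
import random

def randomized_response(true_bit, epsilon):
    """Report a sensitive bit under epsilon-differential privacy.

    The true bit is reported with probability exp(eps) / (exp(eps) + 1)
    and flipped otherwise. The ratio between the probabilities of any
    output under the two possible inputs is then at most exp(eps),
    which is exactly the bound the definition requires.
    """
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1)
    if random.random() < p_truth:
        return true_bit
    return 1 - true_bit
```

A small ε (strong privacy) pushes the truth probability toward 1/2, so single answers reveal almost nothing, while aggregates over many respondents remain estimable.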

Mechanisms

There are several approaches to implementing the requirements of differential privacy in mechanisms. By adding noise to a given dataset, the desired properties can be obtained. Noise can also be introduced by generating new entries. These new entries, called dummies, must be indistinguishable from the original data in order to meet the requirements of differential privacy.
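The most widely used noise-based approach is the Laplace mechanism of Dwork, McSherry, Nissim, and Smith (cited in the literature below): Laplace noise scaled to the query's sensitivity is added to the true answer. A sketch, assuming a counting query with sensitivity 1 (the function name is illustrative):

```python
import random

def laplace_mechanism(true_value, sensitivity, epsilon):
    """Answer a numeric query with epsilon-differential privacy.

    Adds zero-mean Laplace noise with scale sensitivity/epsilon. Since
    changing one database entry shifts the true answer by at most
    `sensitivity`, the likelihood ratio of any output is bounded by
    exp(epsilon).
    """
    scale = sensitivity / epsilon
    # The difference of two independent exponential variables with the
    # same rate is Laplace-distributed.
    noise = random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)
    return true_value + noise

# Example: publish a patient count of 42 with privacy budget epsilon = 0.5.
noisy_count = laplace_mechanism(42, sensitivity=1, epsilon=0.5)
```

Smaller ε means a larger noise scale, making the trade-off between privacy and accuracy explicit.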

Weaknesses

ε-differential privacy places high demands on mechanisms, which can make the results considerably less useful. If too much noise is generated and it deviates too far from the original data, the information content is very limited.

(ε, δ)-differential privacy

Since ε-differential privacy imposes certain restrictions on applicability, the extension (ε, δ)-differential privacy was introduced. This notion allows the conditions to remain unfulfilled to a certain, bounded extent.

Definition

For a given randomized mechanism M, one says that M fulfills (ε, δ)-differential privacy if, for all data sets D1 and D2 that differ in at most one entry, and for all S ⊆ Range(M):

Pr[M(D1) ∈ S] ≤ exp(ε) · Pr[M(D2) ∈ S] + δ.
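The relaxation admits mechanisms that pure ε-differential privacy rules out, most notably Gaussian noise. The following sketch uses the classical calibration from the differential-privacy literature, valid for ε < 1 (the function name and the specific constant 1.25 are from that standard analysis, not from this article):

```python
import math
import random

def gaussian_mechanism(true_value, sensitivity, epsilon, delta):
    """Answer a numeric query with (epsilon, delta)-differential privacy.

    Uses the classical calibration
        sigma = sqrt(2 * ln(1.25 / delta)) * sensitivity / epsilon,
    which satisfies the definition for 0 < epsilon < 1. The delta term
    absorbs the probability mass in the Gaussian tails, where the
    exp(epsilon) likelihood-ratio bound alone would fail.
    """
    sigma = math.sqrt(2 * math.log(1.25 / delta)) * sensitivity / epsilon
    return true_value + random.gauss(0.0, sigma)
```

Because Gaussian tails decay faster than Laplace tails, this mechanism often adds less effective noise for multi-query workloads, at the price of the small failure probability δ.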

Probabilistic definition

For a given randomized mechanism M and constants ε and δ, one says that M satisfies (ε, δ)-probabilistic differential privacy if for every data set D1 the range of M can be divided into two sets Ω1 and Ω2 such that Pr[M(D1) ∈ Ω2] ≤ δ, and for all D2 that differ from D1 in at most one entry and for all S ⊆ Ω1:

exp(−ε) · Pr[M(D2) ∈ S] ≤ Pr[M(D1) ∈ S] ≤ exp(ε) · Pr[M(D2) ∈ S].

Differences from ε-differential privacy

ε-Differential Privacy was extended by the parameter δ. This allows the mechanism to violate the requirements to a certain extent, which is determined by δ.

Composability

Differential privacy has the property of composability. If a series of queries is answered by differentially private mechanisms with parameters ε1, …, εn, and each mechanism uses an independent random source, then the combined result satisfies (ε1 + … + εn)-differential privacy.
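In practice, this composition rule is often managed by tracking a total "privacy budget" that successive queries draw from. A hypothetical sketch (the class and method names are illustrative):

```python
class PrivacyBudget:
    """Track the total epsilon consumed by a sequence of independent queries.

    Under basic sequential composition, queries answered with parameters
    eps_1, ..., eps_n together satisfy (eps_1 + ... + eps_n)-differential
    privacy, so the sum must stay within the total budget.
    """

    def __init__(self, total_epsilon):
        self.total = total_epsilon
        self.spent = 0.0

    def spend(self, epsilon):
        """Reserve epsilon for one query; refuse if the budget would overrun."""
        if self.spent + epsilon > self.total:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon
        return epsilon
```

Once the budget is exhausted, no further queries can be answered without weakening the overall privacy guarantee.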

Computational Differential Privacy

Differential privacy is a statistical notion that imposes no formal restrictions on the computational power of attackers. For practical use, however, it makes sense to impose such restrictions. For example, the use of encryption assumes that an attacker cannot recover the private key by simple trial and error.

There are essentially two ways of achieving computational differential privacy (CDP). On the one hand, it suffices to extend the existing definition of differential privacy with a restriction on the attacker's computational power; this approach is called IND-CDP. On the other hand, there is the notion SIM-CDP, which is based on simulating the attacker's view.

Literature

  • Cynthia Dwork: Differential Privacy. In: 33rd International Colloquium on Automata, Languages and Programming, part II (ICALP 2006). Springer, July 2006, pp. 1–12 (accessed October 27, 2013).
  • Cynthia Dwork, Frank McSherry, Kobbi Nissim, Adam Smith: Calibrating Noise to Sensitivity in Private Data Analysis. In: Shai Halevi, Tal Rabin (eds.): Theory of Cryptography. Springer, 2006, ISBN 978-3-540-32731-8, pp. 265–284, doi:10.1007/11681878_14 (accessed October 27, 2013).
  • Ilya Mironov, Omkant Pandey, Omer Reingold, Salil Vadhan: Computational Differential Privacy. In: Advances in Cryptology – CRYPTO 2009. Springer, August 2009, doi:10.1007/978-3-642-03356-8_8 (accessed October 27, 2013).