Privacy-preserving data sharing via probabilistic modeling

dc.contributor.author Jalko, Joonas
dc.contributor.author Lagerspetz, Eemil
dc.contributor.author Haukka, Jari
dc.contributor.author Tarkoma, Sasu
dc.contributor.author Honkela, Antti
dc.contributor.author Kaski, Samuel
dc.date.accessioned 2021-08-12T07:17:01Z
dc.date.available 2021-08-12T07:17:01Z
dc.date.issued 2021-07-09
dc.identifier.citation Jalko, J, Lagerspetz, E, Haukka, J, Tarkoma, S, Honkela, A & Kaski, S 2021, 'Privacy-preserving data sharing via probabilistic modeling', Patterns, vol. 2, no. 7, 100271. https://doi.org/10.1016/j.patter.2021.100271
dc.identifier.other PURE: 167428876
dc.identifier.other PURE UUID: ba8a853f-a971-4c75-9450-adc0c5134703
dc.identifier.other WOS: 000672159300011
dc.identifier.other ORCID: /0000-0001-9193-8093/work/98417495
dc.identifier.other ORCID: /0000-0003-3875-8135/work/98418356
dc.identifier.uri http://hdl.handle.net/10138/333081
dc.description.abstract Differential privacy allows quantifying the privacy loss that results from access to sensitive personal data. Repeated accesses to the underlying data incur increasing loss. Releasing the data as privacy-preserving synthetic data would avoid this limitation but would leave open the problem of how to design the synthetic data. We propose formulating the problem of private data release through probabilistic modeling. This approach transforms the problem of designing the synthetic data into choosing a model for the data, and also allows the inclusion of prior knowledge, which improves the quality of the synthetic data. We demonstrate empirically, in an epidemiological study, that statistical discoveries can be reliably reproduced from the synthetic data. We expect the method to have broad use in creating high-quality anonymized data twins of key datasets for research. en
dc.format.extent 10
dc.language.iso eng
dc.relation.ispartof Patterns
dc.rights cc_by
dc.rights.uri info:eu-repo/semantics/openAccess
dc.subject NOISE
dc.subject 113 Computer and information sciences
dc.title Privacy-preserving data sharing via probabilistic modeling en
dc.type Article
dc.contributor.organization Department of Mathematics and Statistics
dc.contributor.organization Department of Computer Science
dc.contributor.organization HUS Abdominal Center
dc.contributor.organization Department of Public Health
dc.contributor.organization Content-Centric Structures and Networking research group / Sasu Tarkoma
dc.contributor.organization Helsinki Institute for Information Technology
dc.contributor.organization Probabilistic Mechanistic Models for Genomics research group / Antti Honkela
dc.description.reviewstatus Peer reviewed
dc.relation.doi https://doi.org/10.1016/j.patter.2021.100271
dc.relation.issn 2666-3899
dc.rights.accesslevel openAccess
dc.type.version publishedVersion

Files in this item

Files Size Format View
1_s2.0_S2666389921000970_main.pdf 1.476Mb PDF View/Open
