Explore the complexities of genomic data in personalized medicine, from speed and scalability challenges to privacy and security solutions using cloud computing and cryptographic protocols.
High dimensional genomic data, identifiability, and query-response
Haixu Tang
School of Informatics and Computing, Indiana University, Bloomington
“Big Data” in Personal Genomics
• Genomics is a key component of personalized medicine
• Massive
  • Large research-oriented projects: from 1000 genomes to 10^6
  • Genome sequencing for all newborns?
  • Open data projects, e.g., the Personal Genome Project (PGP)
• Heterogeneous
  • Genomic sequence (variations)
  • Constant, dynamic monitoring
  • Transcriptomics, proteomics, metabolomics, microbial communities, etc. (as demonstrated by iPOP)
Challenges in Personal Genomics
Challenges: Speed, Storage, Scalability, Security
Solution: cloud, hybrid cloud, bring computing to the data!
Privacy Enhancing Technologies
Database security approaches: access control, query auditing, differential privacy
Cryptographic protocols: secure multiparty computation (SMC), homomorphic computation, functional encryption
Ethics studies, informed consent, policy
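As an illustration of the differential privacy approach listed above, the following is a minimal sketch of the Laplace mechanism applied to a minor-allele count query. It is not taken from the slides: the function names, the 0/1/2 genotype encoding, and the choice of sensitivity 2 (one diploid individual changes the count by at most 2) are assumptions made for the example.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_allele_count(genotypes, epsilon, rng=None):
    """Return a noisy minor-allele count at one SNP.

    genotypes: list of 0/1/2 minor-allele counts, one per individual
               (hypothetical encoding chosen for this sketch).
    epsilon:   privacy budget; smaller values add more noise.
    """
    rng = rng or random.Random()
    true_count = sum(genotypes)
    sensitivity = 2  # assumption: adding/removing one person shifts the count by <= 2
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    cohort = [0, 1, 2, 1, 0, 2]
    print(private_allele_count(cohort, epsilon=1.0))
```

The noise scale grows as epsilon shrinks, which is one concrete form of the trade-off between data security and utility.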
What is specific to genomic data?
• Challenges
  • Genome technologies evolve very fast!
  • Genomic data are extremely high dimensional
    • Millions of SNPs, easily identifiable
  • Balance between data security and utility
  • Not only the data, but also analysis results need to be protected
    • Allele frequencies or test statistics (e.g., Homer’s attack)
• Special properties
  • Different dimensions are NOT independent
    • Genetic structures (e.g., linkage disequilibrium)
  • Specific genomic research focuses on a small number of dimensions (e.g., disease-associated SNPs)
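To make concrete why published allele frequencies need protection, here is a rough sketch of a distance-based membership statistic in the style of Homer's attack. For each SNP it compares how far an individual's allele frequency (0, 0.5, or 1) lies from a reference population versus from the published study pool, then summarizes the per-SNP differences with a one-sample t statistic. The function name and the input layout are assumptions for this example, and it treats SNPs as independent, which linkage disequilibrium violates in real data.

```python
import math


def homer_statistic(individual, pool_freqs, ref_freqs):
    """One-sample t statistic over per-SNP distance differences.

    individual: per-SNP allele frequency of the target (0, 0.5, or 1)
    pool_freqs: per-SNP allele frequencies published for the study pool
    ref_freqs:  per-SNP allele frequencies of a reference population
    A large positive value suggests the target is closer to the pool
    than to the reference, i.e. possibly a pool member.
    """
    d = [abs(y - r) - abs(y - m)
         for y, m, r in zip(individual, pool_freqs, ref_freqs)]
    n = len(d)
    if n < 2:
        raise ValueError("need at least two SNPs")
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)
    if var == 0:
        raise ValueError("per-SNP differences have zero variance")
    return mean / math.sqrt(var / n)


if __name__ == "__main__":
    # Toy values, made up for illustration only.
    target = [1.0, 0.0, 1.0, 0.5]
    pool = [0.8, 0.1, 0.7, 0.5]
    reference = [0.5, 0.4, 0.5, 0.4]
    print(homer_statistic(target, pool, reference))
```

With millions of SNPs, even a tiny per-SNP shift toward the pool accumulates into a large statistic, which is why high dimensionality makes individuals easily identifiable.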