"OMICS" (Korean: 체학) Revolution & research paradigm shift in biology

  1. "OMICS" (Korean: 체학) Revolution & research paradigm shift in biology. Jong Bhak (박종화), Personal Genomics Institute, Genome Research Foundation. http://personalgenomicsinstitute.org

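  2. Acknowledgement and Disclaimer • National Cancer Center, Dr. Lee Yeon-su (이연수) • The many researchers who produce and share genomic data • TheraGen Etex (CEOs Koh Jin-up (고진업) and Cho Hyung-seok (조형석)) • The many people who support the Genome Research Foundation (http://genomefoundation.kr) • Colleagues with a sincere attitude and passion • All materials are free (under BioLicense: http://biolicense.org).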

  3. What is Omics? • A term that arose as biology became process-driven and industrialized. • Bio "engineering" and bio "industry" emerged from around 2008. • http://omics.org

  4. Ome BiO Matrix. [Figure; recoverable labels: Genome, Transcript, Proteome, Interactome, Functome, Textome; Sequence, Structure, Expression, Pathway, Regulation, Network, BioDiversity; resource, file, Resource, Materials, materials bank; analysis, Pipeline, BioEngine; DB, 2ndary DB, 2ndary Info, Portal; Info Type.]

  5. Personal genomics? • Core tech: popularized personal genomics • What is it? • Doctors diagnose, prescribe, and advise according to each individual's genotype. • The general public can use genome sequencers and analyzers in one way or another. • http://PersonalGenomics.org

  6. What is Genomics?

  7. Genomic T: the genomic "T" shape. 6 billion persons; 6 billion bases. Jong Bhak, under BioLicense

  8. The two main axes of genomics: human population diversity and the individual genome. 6 billion people; 50,000 bases for PASNP; Dr. Kim Seong-jin. Jong Bhak, under BioLicense

  9. DNA sequencing and DNA (gene) typing • Sequencing reads all of a cell's DNA/RNA → sequencer: Hellogenome™ • Genotyping determines the type of a subset of a cell's DNA/RNA → biochip: Hellogene™

  10. History of personal genome sequencing • NCBI reference genome, pooled DNAs, Caucasian • Craig Venter, Caucasian (publicly available) • James Watson, Caucasian (publicly available) • Nigerian (anonymous), African (HapMap) • YH, Han Chinese, publicly available (BGI) • Seong Jin Kim, publicly available (TheraGen team) • AK1, Korean, publicly available • Rosalynn Gill, Caucasian (publicly available), the first publicly released female genome, PGP9 (TheraGen)

  11. Past, present, and future of genome sequencing • HGP: 13 years, $2.7 billion (3.5 trillion won, 14X), 2004 • Craig Venter: 4 years, $100 million (130 billion won), 2007 • James Watson: 2 years, $2 million (2.6 billion won, 7.5X), 2008 • Dr. Kim Seong-jin: 6 months, $0.17 million (220 million won, 29X), 2009 • 2010: 1 month, $20,000 (26 million won, 30X) • 2015(?): <1 week, $1,000 (1.3 million won)
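
The dollar and won figures on slide 11 can be cross-checked with a little arithmetic. The sketch below assumes an exchange rate of 1,300 won per US dollar; that rate is not stated on the slide but is the one its won figures imply. The names `MILESTONES`, `to_won`, and `fold_reductions` are illustrative only. The ratios are computed between consecutive rows and are not meant to reproduce the "X" factors printed on the slide, whose basis is not stated.

```python
# Cost milestones as listed on slide 11: (label, cost in US dollars).
MILESTONES = [
    ("HGP", 2_700_000_000),
    ("Craig Venter", 100_000_000),
    ("James Watson", 2_000_000),
    ("Dr. Kim Seong-jin", 170_000),
    ("2010", 20_000),
    ("2015(?)", 1_000),
]

# Assumption: 1,300 won per dollar, inferred from the slide's own won figures.
WON_PER_USD = 1_300


def to_won(usd):
    """Convert a dollar amount to won at the assumed rate."""
    return usd * WON_PER_USD


def fold_reductions(milestones):
    """Return (earlier, later, cost ratio) for each consecutive pair of rows."""
    return [
        (earlier[0], later[0], earlier[1] / later[1])
        for earlier, later in zip(milestones, milestones[1:])
    ]


if __name__ == "__main__":
    for label, usd in MILESTONES:
        print(f"{label}: ${usd:,} = {to_won(usd):,} won")
    for earlier, later, ratio in fold_reductions(MILESTONES):
        print(f"{earlier} -> {later}: {ratio:.1f}x cheaper")
```

Under that assumed rate, every won figure on the slide is reproduced to the precision shown there.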

  12. What is at the center of the omics revolution? • Large-volume, accurate data: • Sequencing: sequencing tech/cost • Ultra-fast, automated analysis: • Sequence analysis: computing tech/cost

  13. Ome and Omics graph (the relationship between ome and omics). [Figure: cost ($) versus year, 2003 to 2016; labeled values $3,000,000,000, $50,000 per person, and $0; ome and omics balance point at about 2010.] Jong Bhak, under BioLicense

  14. A real example: the Y axis of genomics

  15. The first Korean genome sequencing paper: Dr. Kim Seong-jin's genome. Data release date: December 2008, ftp://bioftp.org

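  16. The second Korean genome: AK1 (Seoul National University College of Medicine; anonymous) • Sequenced in 2009 by Seoul National University College of Medicine and Macrogen • Data release date: December 2009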

  17. The first Korean genome (SJK) • First analyzed by Gacheon medical school LCDI and KOBIC, KRIBB in 2008 (a joint effort among LCDI, KOBIC, and the National Center for Standard Reference Data) • First annotated and made public on 4 Dec. 2008 (via web and FTP) • SNPs, CNVs, and indels were analyzed • An automated phenotypic association study was done • Non-synonymous (non-syn.) analysis • A phylogenetic study of mtDNA, the Y chromosome, and the autosomes showed the Korean relationship to Chinese and Japanese • First intra-Asian genome comparison (Chinese and Korean) • Analyzed at 7.8-, 17.3-, 23.5-, and 28-fold coverage • By Jan., 23.5-fold coverage had been sequenced and analyzed • Openly and freely available from: http://koreagenome.org

  18. The karyogram of the donor DNA: no obvious chromosomal abnormalities!

  19. Classification and number of intragenic SNPs. [Figure label: not represented in dbSNP.]

  20. Comparison of individual SNPs • SJK shared 56% with Yoruba (Korean vs. African: 56%) • SJK shared 60% with Chinese (Korean vs. Chinese: 60%) • SJK shared 50% with Venter and 53% with Watson (Korean vs. Caucasians: 52%)
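
Slide 20 reports sharing percentages without saying how they were computed. One plausible reading, shown here as a minimal sketch, is the share of SJK's SNPs that also appear in the other genome. The choice of SJK's SNP count as the denominator, the `(chromosome, position, allele)` representation, and the function name `percent_shared` are all assumptions, not taken from the slides.

```python
def percent_shared(query_snps, other_snps):
    """Percentage of SNPs in `query_snps` that also occur in `other_snps`.

    Each SNP is a hashable key such as (chromosome, position, allele).
    Assumption: the query genome's SNP count is the denominator.
    """
    if not query_snps:
        raise ValueError("query_snps must not be empty")
    shared = query_snps & other_snps  # set intersection
    return 100.0 * len(shared) / len(query_snps)
```

Because the denominator is one genome's SNP set, the measure is not symmetric: `percent_shared(a, b)` and `percent_shared(b, a)` generally differ.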

  21. Korean Genome Variation Browser. [Screenshot: the "NOC2L" gene with tracks for SJK's SNPs, HapMap, Watson's SNPs, YH's SNPs, and Venter's SNPs.] http://koreagenome.org/cgi-bin/gbrowse/kgenome/

  22. SJK's genetic lineage. [Figure panels: autosomal phylogenetic tree, with SJK marked; Y-chromosome haplogroup lineage; mtDNA ethno-geographic lineage.]

  23. Size distribution and classification of short indels found in SJK • Using MAQ, we identified 342,965 short indels. → Of these, only 247 (0.1%) were validated, 113,287 (33.0%) were non-validated, and 229,431 (66.9%) were not found in dbSNP.
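
The indel counts on slide 23 can be checked against each other: the three categories should add up to the total, and each percentage should follow from its count. A small sketch, with category labels paraphrased from the slide and variable names chosen only for illustration:

```python
# Counts as given on slide 23.
TOTAL_INDELS = 342_965
INDEL_COUNTS = {
    "validated": 247,
    "non-validated": 113_287,
    "not found in dbSNP": 229_431,
}

# Percentage of the total for each category, rounded to one decimal place.
INDEL_PERCENTS = {
    label: round(100 * count / TOTAL_INDELS, 1)
    for label, count in INDEL_COUNTS.items()
}

# True when the three categories account for every indel.
COUNTS_ADD_UP = sum(INDEL_COUNTS.values()) == TOTAL_INDELS
```

The three counts sum exactly to 342,965, and the rounded percentages match the 0.1%, 33.0%, and 66.9% quoted on the slide.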

  24. Indels in SJK genic regions

  25. Comparison of individual indels • Comparison of the SJK indels (< 4 bp) that overlapped with those of the YH, HuRef (Venter), Watson, and NA18507 (Yoruba) genomes. • This discrepancy seems to result from the method used rather than from ethnic similarity between SJK and NA18507 (paired-end sequencing was used for both SJK and NA18507). • This may partially explain why HuRef and Watson, which are Caucasian like the NCBI reference, have lower levels (86.2% and 87.8%) of common indels with SJK.

  26. Homo- and heterozygous deletions in the SJK genome: (A) homozygous 2.3 kb genomic deletion and (B) heterozygous 5 kb genomic deletion.

  27. Detection and identification of structural variants • We found structural variants (SVs) by using paired-end reads. • 2,920 deletions (100 bp to 100 kb) • 415 inversions (100 bp to 100 kb) • 963 insertions (175 bp to 250 bp) • We found deletion SVs in 21 coding genes. → All were heterozygous deletions.
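
Slide 27 says only that paired-end reads were used; it does not describe the detection method. A common general approach, sketched here as an assumption rather than as the authors' pipeline, is to flag "discordant" read pairs whose mapped span or strand orientation differs from what the library's insert size predicts. The `ReadPair` type, the `classify_pair` function, and the threshold parameters are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    left_pos: int      # leftmost reference coordinate covered by the pair
    right_pos: int     # rightmost reference coordinate covered by the pair
    left_strand: str   # '+' or '-'
    right_strand: str  # '+' or '-'


def classify_pair(pair, expected_insert, tolerance):
    """Label one mapped read pair as concordant or as an SV candidate.

    Assumes a forward/reverse library, so mates normally map to
    opposite strands about `expected_insert` bases apart.
    """
    span = pair.right_pos - pair.left_pos
    if pair.left_strand == pair.right_strand:
        # Both mates on the same strand: one end was flipped.
        return "inversion candidate"
    if span > expected_insert + tolerance:
        # Mates map too far apart: sequence is missing in the sample.
        return "deletion candidate"
    if span < expected_insert - tolerance:
        # Mates map too close together: the sample has extra sequence.
        return "insertion candidate"
    return "concordant"
```

In practice a single discordant pair is weak evidence; callers usually require a cluster of pairs supporting the same event before reporting a variant.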

  28. Repeat composition in SJK deletion variants. [Figure labels: Long Interspersed Nuclear Elements (LINE); Short Interspersed Nuclear Elements (SINE).]

  29. A real example: the X axis of genomics

  30. The population diversity in Pan Asia • PASNP consortium • http://pasnpi.org 6 billion people

  31. The PASNP project • An international consortium for the study of Pan-Asian SNPs. • Research on Asian populations: 1. Population diversity mapping (phylogenetic study) 2. Functional annotation study 3. CNV study • Basic services: 1. SNP data repository and basic information services via the web for Pan-Asian populations

  32. Sampling from 11 Pan-Asian countries • 1. Number of samples: 1,833 • 2. Ethnic groups: 76 • 3. Countries: 11 • 4. Number of SNP markers: 58,960 (Affymetrix 56K Xba SNP genotyping chip)

  33. Genotyped 76 ethnic groups over 11 countries

  34. How many recognizable human groups are there in the world? • In the figure at right alone, there are six recognizable population groups. • Once we consider human migration, isolation, admixture, and more ethnic groups, this is not a simple question. [Figure: maximum likelihood tree of 29 populations, based on data from 19,934 SNPs; bootstrap values based on 100 replicates.]

  35. Admixture, migration, and isolation events make population grouping more complicated • In this study, we found: 1. Genetic ancestry is strongly correlated with linguistic affiliations as well as geography. 2. Most populations show relatedness within ethnic/linguistic groups despite prevalent gene flow amongst populations.

  36. PCA results • Finding 1: Genetic ancestry is strongly correlated with linguistic affiliations as well as geography. • Finding 2: Most populations show relatedness within ethnic/linguistic groups despite prevalent gene flow amongst populations.

  37. Phylogenetic and population structure analysis results • Finding 1: Genetic ancestry is strongly correlated with linguistic affiliations as well as geography. • Finding 2: Most populations show relatedness within ethnic/linguistic groups despite prevalent gene flow amongst populations. • Phylogenetic tree: neighbor-joining (NJ) tree based on the Da distance • Population stratification: STRUCTURE

  38. NJ tree of individuals based on allele sharing distance (ASD) • Finding 1: Genetic ancestry is strongly correlated with linguistic affiliations as well as geography. • Finding 2: Most populations show relatedness within ethnic/linguistic groups despite prevalent gene flow amongst populations.
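
Slide 38 names an ASD-based tree without defining the distance. ASD is read here as allele sharing distance, and the sketch below uses one common definition: per SNP, one minus half the number of alleles two individuals share, averaged over SNPs genotyped in both. Whether the study used exactly this variant is an assumption, as are the 0/1/2 genotype coding and the function names.

```python
def allele_sharing_distance(geno_a, geno_b):
    """Mean per-SNP allele sharing distance between two individuals.

    Genotypes are coded 0, 1, or 2 (copies of one allele); None marks a
    missing call. Per SNP the distance is |a - b| / 2, which equals
    1 - (alleles shared) / 2.
    """
    total = 0.0
    compared = 0
    for a, b in zip(geno_a, geno_b):
        if a is None or b is None:
            continue  # skip SNPs missing in either individual
        total += abs(a - b) / 2
        compared += 1
    if compared == 0:
        raise ValueError("no SNPs genotyped in both individuals")
    return total / compared


def distance_matrix(genotypes):
    """Symmetric matrix of pairwise distances for a list of individuals."""
    n = len(genotypes)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = allele_sharing_distance(genotypes[i], genotypes[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix
```

A matrix like this is the usual input to a neighbor-joining implementation; tree building itself is left to a phylogenetics package and is not shown.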

  39. Eight population outliers whose linguistic and genetic affinities are inconsistent • AX-ME/Melanesian, MY-JH/Jehai (Negrito), MY-KS/Kensiu (Negrito), TH-MO/Mon, TH-KA/Karen, CN-JN/Jinuo, IN-TB/Spiti, and CN-UG/Uyghur • These linguistic outliers tend to cluster with their geographic neighbors or to occupy an intermediate position between their geographic neighbors and the more distant members of their linguistic group. • These patterns are consistent with substantial recent admixture among the populations, a history of language replacement, or uncertainties in the linguistic classifications themselves.

  40. Considerable gene flow among Asian populations was observed • Considerable gene flow was observed amongst sub-populations in clusters, including those groups believed to practice endogamy based on linguistic, cultural and ethnic information • STRUCTURE reveals that the six Han Chinese population samples show varying degrees of admixture between a northern ‘Altaic’ cluster and a ‘Sino-Tibetan/Tai-Kadai’ cluster which is most frequent in the ethnic groups sampled from southern China and northern Thailand

  41. Peopling of Asia: one-wave vs. two-wave hypothesis • Two-wave hypothesis • Cavalli-Sforza et al. (Nat Genet, 2003) proposed that anatomically modern humans (also called Homo sapiens sapiens) spread into Asia through two routes. • The first was a southern route, perhaps along the coast to South and Southeast Asia, from where it bifurcated north and south. In the south, these modern humans reached Oceania between 60 and 40 kya, whereas the northern expansion later reached China, Japan, and eventually America. • The second was a central route through the Middle East, Arabia, or Persia to central Asia, from where migration occurred in all directions, reaching Europe and east and northeast Asia about 40 kya, after which the first and principal migration to America suggested by Greenberg occurred not later than 15 kya. • One-wave hypothesis • All modern East Asian and Southeast Asian populations were derived from a single initial entry of modern humans into the subcontinent. • Which model better explains our observations? Our current observations are shown in the figures at right.

  42. Forward-time simulation. Peopling of Asia: one-wave versus two-wave hypothesis • Based on the hypothesis of Cavalli-Sforza et al. and our observations on Asian Negritos, we constructed three testable models. • Our current observation resembles Model 3. • If Model 1 or 2 can turn into our current observation under some known conditions, the "two-wave" models will be acceptable. • Conditions: the allele frequency spectrum of the MRCA was taken from YRI; 10,000 SNPs were simulated; one generation was counted as 20 years; the gene flow proportion was set to different levels (M = 0.005 to 0.95). • Hypothetical models of the peopling of Asia: Model 1 and Model 2 represent the "two-wave" hypothesis (Cavalli-Sforza's hypothesis), and Model 3 represents the "one-wave" hypothesis. • AF: African; NG: Negrito; AS: Asian; EU: European.
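
Slide 42 lists the simulation conditions but not the simulator itself. The sketch below is one minimal way to run a forward-time simulation, assuming a Wright-Fisher model in which a target population receives one-way gene flow from a source population whose allele frequencies stay fixed. Only the 20-year generation time comes from the slide; the model choice, applying `m` once per generation, the population size `n_chrom`, and all function names are assumptions.

```python
import random

YEARS_PER_GENERATION = 20  # from slide 42


def binomial(rng, n, p):
    """Number of successes in n Bernoulli(p) draws (stdlib only)."""
    return sum(1 for _ in range(n) if rng.random() < p)


def next_generation(target, source, m, n_chrom, rng):
    """Advance every SNP in the target population by one generation.

    target, source: per-SNP allele frequencies in [0, 1].
    m: proportion of the target gene pool replaced by migrants.
    n_chrom: number of chromosomes sampled (2N for N diploids).
    """
    new_freqs = []
    for p_target, p_source in zip(target, source):
        # Gene flow: mix migrant alleles into the target gene pool.
        p_mixed = (1 - m) * p_target + m * p_source
        # Genetic drift: resample a finite number of chromosomes.
        new_freqs.append(binomial(rng, n_chrom, p_mixed) / n_chrom)
    return new_freqs


def simulate(target, source, m, n_chrom, years, seed=0):
    """Run the model forward for the given number of years."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    freqs = list(target)
    for _ in range(years // YEARS_PER_GENERATION):
        freqs = next_generation(freqs, source, m, n_chrom, rng)
    return freqs
```

A fuller comparison of Models 1 to 3 would need a tree of diverging populations and a statistic to compare against the observed data; this sketch shows only the drift-plus-migration step that such models share.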

  43. Results and conclusion: forward-time simulation. Peopling of Asia: one-wave versus two-wave hypothesis • Our simulation results indicate that Model 1 is not compatible with the empirical data. • Model 2 is compatible only if gene flow from other Asian populations to the Negritos has been fairly extreme, with more than 50% of Negrito chromosomes coming from other Asian populations without dramatically affecting the Negrito phenotype. • Thus Models 1 and 2 do not explain the current observation. No extreme gene flow! [Photo captions: Negrito, the Semang people of the Malay Peninsula; people of Thailand.]

  44. Finding 3: Haplotype diversity was strongly correlated with latitude, with diversity decreasing from south to north. Haplotype diversity versus latitude • Haplotype diversity was strongly correlated with latitude (R² = 0.91, P < 0.0001), with diversity decreasing from south to north. • This is consistent with a loss of diversity as populations moved to higher latitudes. • Figure key: ① Indonesian; ② Malay; ③ Philippine; ④ Thai; ⑤ South Chinese minorities; ⑥ Southern Han Chinese; ⑦ Japanese & Korean; ⑧ Northern Han Chinese; ⑨ Northern Chinese minorities; ⑩ Yakut.
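
Slide 44 quotes R² = 0.91 for haplotype diversity against latitude. For a straight-line fit with one predictor, R² equals the squared Pearson correlation, which the sketch below computes from paired values. The function name `r_squared` is illustrative, and no values from the study are used, since the per-population numbers are not in the transcript.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when a variable is constant")
    return (sxy * sxy) / (sxx * syy)
```

Because the correlation is squared, R² alone does not show the direction of the trend; the slide's statement that diversity decreases northward supplies the sign.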

  45. Finding 4: Southeast Asia holds most of the Asian gene pool. Group private haplotype sharing analysis (haplotype type only; frequency was not considered) • 90% of the haplotypes in East Asian (EA) populations were found in Southeast Asian (SEA) and Central-South Asian (CSA) populations. • Of these, about 50% were found in SEA and EA only, and 5% were found in CSA only. • These observations suggest that the geographic source(s) contributing to EA populations were mainly SEA populations. • Legend: proportion of haplotypes in population A that can also be found in population B (HSa) • HSa: CSA private • HSb: EA private • HSc: shared by all groups • HSd: SE private • YKT: Yakut; N-CM: Northern Chinese minorities; N-HAN: Northern Han Chinese; JP-KR: Japanese and Korean; S-HAN: Southern Han Chinese; S-CM: Southern Chinese minorities; EA: East Asian
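
Slide 45 describes a sharing analysis that counts haplotype types and ignores their frequencies. A minimal sketch of that idea, treating each group's haplotypes as a set and splitting the EA haplotypes by where else they occur, is given below. The category labels and the function name `sharing_breakdown` are illustrative and do not correspond one-to-one to the slide's HSa to HSd legend.

```python
def sharing_breakdown(ea, sea, csa):
    """Split EA haplotype types by the other groups that also carry them.

    ea, sea, csa: sets of haplotype identifiers (type only, no frequency).
    Returns the fraction of EA haplotypes falling in each category.
    """
    if not ea:
        raise ValueError("ea must contain at least one haplotype")
    in_sea = ea & sea
    in_csa = ea & csa
    counts = {
        "SEA only": len(in_sea - csa),
        "CSA only": len(in_csa - sea),
        "SEA and CSA": len(in_sea & in_csa),
        "EA private": len(ea - sea - csa),
    }
    return {label: count / len(ea) for label, count in counts.items()}
```

The four categories are mutually exclusive and cover every EA haplotype, so the returned fractions always sum to 1.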

  46. Koreans and their neighbors • Northeastern Asians, including Japanese, Northern Chinese, and Koreans, have high autosomal genetic similarity compared to others. [Figure S28: maximum likelihood tree of 126 population samples (PASNP + HGDP), with clusters labeled American Indians, Northern Asians, Northeastern Asians, and Chinese and SEAs. Bootstrap values based on 100 replicates are shown. Language families are indicated with colors as shown in the legend.] AX: Affymetrix; CN: China; ID: Indonesia; IN: India; JP: Japan; KR: Korea; MY: Malaysia; PI: the Philippines; SG: Singapore; TH: Thailand; TW: Taiwan.

  47. CPUs used for PASNP: 600 CPUs for about one year, running the STRUCTURE program, on only 56K SNP chip data (2,000 samples)

  48. Exome sequencing

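  49. Targeted resequencing overview • A portion of the genomic DNA is isolated and analyzed by next-generation sequencing. • Need: to accomplish the key genome analyses even with currently commercialized next-generation sequencing, the idea of a service that looks at only 1% of the whole genome has emerged. • The core technology is target capture, which isolates a portion of the genomic DNA (synonyms: exon capture, target selection). • Types: in-solution capture (Agilent) and on-microarray capture (Roche, Agilent, Febit). [Figure: targeted resequencing process; labels: capture design and manufacture, target capture, genomic DNA preparation, sequencing, data analysis.]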
