
Data Analysis Methodology


Presentation Transcript


  1. The 2nd High Energy Physics Summer School: Data Analysis Methodology. June 25, 2002. Center for High Energy Physics, Kyungpook National University. 조기현

  2. Contents • High Energy Physics • Data Processing Methodology • Fitting • Conclusion

  3. High Energy Physics

  4. Goal: to understand the origin of the universe by studying the ultimate structure of matter and the interactions among its constituents. The direction of high energy physics.

  5. What is the World Made of? • Atom • Electron • Nucleus • Proton, neutron • Quarks

  6. How do we know any of this? (Testing Theory)

  7. How to detect?

  8. How do we experiment with tiny particles? (Accelerators) • Accelerators solve two problems: • High energy gives a small wavelength, which is needed to probe small particles. • High energy creates the massive particles that physicists want to study.

  9. World-wide High Energy Physics Experiments • Europe • In 2007, the LHC will be completed at CERN • Two big experiments (ATLAS, CMS) in collaboration with HEP institutes and physicists all over the world • CERN, IN2P3 (France), and INFN (Italy) are preparing an HEP Grid for it. • USA • The BaBar experiment at SLAC • Run II of the Tevatron at Fermilab (CDF and D0) • CLEO at Cornell • The LHC experiments at CERN (ATLAS, CMS) • The RHIC experiments at BNL • Super-K in Japan • The HEP Grid in the ESNET program • Japan • Belle at KEK • Super-K at Kamioka • The LHC at CERN (ATLAS) • RHIC at BNL (USA) • They are now working on it as well. Korea participates in these international collaborations. • Korea • We take part in most of these world-wide experimental programs…

  10. Research activities (map of participating laboratories): DESY (Germany), the Space Station (ISS), FNAL (US), BNL (US), IHEP (China), CERN (Europe), KEK (Japan), CHEP (Korea)

  11. Where is Fermilab? • 20 miles west of Chicago, U.S.A.

  12. Overview of Fermilab (accelerator complex diagram): Booster, antiproton source, Main Injector and Recycler, CDF, D0, fixed-target experiments

  13. Fermi National Accelerator Laboratory • Highest-energy accelerator in the world • Energy frontier (CDF, D0): search for new physics (Higgs, SUSY, quark composites, …) • Precision frontier: charm, kaon, neutrino physics (FOCUS, KTeV, NuMI/MINOS, BooNE, …) • Connection to cosmology: Sloan Digital Sky Survey, Pierre Auger, … • Largest HEP laboratory in the USA: 2200 employees, 2300 users (researchers from universities), budget > $300 million

  14. Data Processing Methodology

  15. Why do we do experiments? • Parameter determination • To set the numerical values of some physical quantities • Ex) To measure the velocity of light • Hypothesis testing • To test whether a particular theory is consistent with our data • Ex) To check whether the velocity of light has suddenly increased by several percent since the beginning of this year

  16. Types of Data • Real data (on-site) • Raw data: detector information • Reconstructed data: physics information • Stream (skim) data: events selected for the physics of interest • Simulated data (on-site or off-site) • Physics generation: pythia, QQ, bgenerator, … • Detector simulation: Fastsim, GEANT, …

  17. Research method (workflow diagram) • On-site (experimental sites): real data, data reduction • Remote sites (CHEP + participating institutions): HEP knowledge, reaction simulation (= event generation), detector simulation, simulated data, data analysis

  18. Error • Error: the difference between a measured (or calculated) value and the true value • True value: usually unknown • Statistical error: error due to statistical fluctuations of the data • Systematic error: error due to imperfect calibration of the apparatus or bias of the observer • A measurement is quoted as: measured value ± statistical error ± systematic error • Example) m(top) = 175.9 ± 4.8 ± 5.3 GeV/c2 (CDF, 1998)

  19. Why estimate errors? • To know how accurate the measurement is • Example • Current value of the speed of light: c = 2.998 × 10^8 m/sec • New measurement: c = (3.09 ± 0.15) × 10^8 m/sec • Case 1. If the error is ± 0.15, the result is consistent. • Conventional physics is in good shape. • 3.09 ± 0.15 is consistent with 2.998 × 10^8 m/sec • Case 2. If the error is ± 0.01, the result is not consistent. • 3.09 ± 0.01 would be a world-shattering discovery. • Case 3. If the error is ± 2, the result is consistent. • However, the accuracy of 3.09 ± 2 is too low: a useless measurement. • Whenever you determine a parameter, estimate the error, or your experiment is useless.

  20. How to reduce errors? • Statistical error • Repeat the same measurement. • N: the expected number of observations • σ = sqrt(N): the spread • Systematic error • No exact formula • Ideal case: all such effects should be absent. • Real world: an attempt must be made to reduce them.

  21. How to handle systematic errors? • Use constraint conditions • Ex) The angles of a triangle • Calibrations • Energy and momentum conservation • E(after) - E(before) = 0 • |P(after)| - |P(before)| = 0 • How small should the systematic error be? • Systematic errors should be smaller than statistical errors.

  22. The meaning of σ (error) • Distributions: x -> n(x) • Discrete • Ex) the number of times n(x) you met a girl at age x • Continuous • Ex) hours of sleep each night (x) vs. the number of people sleeping for that time • For an ever larger number of observations and a small bin size, the histogram approaches a continuous distribution. • Mean and variance • Gaussian distribution • Applies when the amount of data is large • Important in error calculations

  23. Ksp+p- Tracking Performance Hit Resolution ~200mm Goal : 180mm Residual distance (cm) COT tracks L  p-p

  24. Mean and Variance • In practice, the true value is unknown in most cases.

  25. Mean and Variance • Mean • For N data points with values (x1, x2, x3, … xN) • Variance • Since the true value is unknown, it is computed about the mean (see the formulas below)
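
The slide's formulas did not survive extraction. These are the standard definitions its bullets describe, with the variance computed about the sample mean because the true value is unknown (the 1/(N-1) factor is the usual unbiased choice; the original slide may use 1/N):

    \bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i,
    \qquad
    s^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2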

  26. Accuracy (σ) • Indicates the accuracy of the measurement

  27. Gaussian Distribution • Applies to most experiments, when the amount of data is large • The Gaussian distribution is fundamental in error treatment.

  28. Gaussian Distribution (cont'd) • The normalized function (written out below) • Mean (μ) • Width (σ) • The smaller the width (σ), the narrower the distribution. • Properties
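
The normalized Gaussian and the coverage properties the slide presumably lists under "Properties" (a reconstruction, since the equations were lost in extraction):

    f(x; \mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}} \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right),
    \qquad \int_{-\infty}^{\infty} f(x)\, dx = 1

    P(|x - \mu| < \sigma) \approx 68.3\%, \quad
    P(|x - \mu| < 2\sigma) \approx 95.4\%, \quad
    P(|x - \mu| < 3\sigma) \approx 99.7\%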

  29. Gaussian Distribution (cont'd) • The means (μ) are all zero. • However, the widths (σ) are different.

  30. CDF Secondary Vertex Trigger • New for Run 2: the Level 2 impact-parameter trigger • Provides access to hadronic B decays • Data from the commissioning run (impact-parameter plot, d in cm): the COT defines the track at Level 1 and the SVX measures the impact parameter (no alignment or calibrations); σ ~ 87 μm

  31. Gaussian fitting using Mn_fit (fit plot; axis labels lost in extraction)
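
Mn_fit is an older PAW-era fitting package. As a minimal modern sketch of the same kind of Gaussian fit, assuming a histogram of measurements and using scipy.optimize.curve_fit (the data and numbers below are illustrative, not from the slide):

    import numpy as np
    from scipy.optimize import curve_fit

    def gauss(x, norm, mu, sigma):
        # Gaussian with total area 'norm' (norm is roughly N_events * bin width when fitting bin counts).
        return norm * np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

    # Toy data: 10,000 values around an assumed mean of 0.4976 with width 0.005.
    rng = np.random.default_rng(0)
    data = rng.normal(0.4976, 0.005, size=10_000)

    # Histogram the data and fit the bin contents.
    counts, edges = np.histogram(data, bins=100)
    centers = 0.5 * (edges[:-1] + edges[1:])
    p0 = [counts.sum() * (edges[1] - edges[0]), 0.5, 0.01]
    popt, pcov = curve_fit(gauss, centers, counts, p0=p0)
    perr = np.sqrt(np.diag(pcov))  # parameter errors from the covariance matrix

    print(f"mean  = {popt[1]:.5f} +/- {perr[1]:.5f}")
    print(f"sigma = {popt[2]:.5f} +/- {perr[2]:.5f}")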

  32. Significant Figures • A measured value is meaningful only within its experimental uncertainty • Significant figures • Include digits up to the first uncertain digit • All digits from the MSD (most significant digit) to the LSD (least significant digit) • LSD • Without a decimal point: the rightmost nonzero digit, ex) 23000 • With a decimal point: the rightmost digit, ex) 0.2300 • MSD: the leftmost nonzero digit

  33. Significant Figures (Example) • Four significant figures: 1234, 123400, 123.4, 1000. • Four significant figures: 10.10, 0.0001010, 100.0, 1.010 × 10^3 • Three significant figures: 1010; cf) 1010. (four significant figures)

  34. Arithmetic with Significant Figures • Addition and subtraction • The result keeps as many decimal places as the term with the fewest decimal places • Example) 123 + 5.35 = 128.35, which rounds to 128 under this rule • 1.0001 (5 significant figures) + 0.0003 (1 significant figure) = 1.0004 (5 significant figures)

  35. Arithmetic with Significant Figures (cont'd) • Multiplication and division • The result keeps as many significant figures as the factor with the fewest • Example) 16.3 × 4.5 = 73.35 => 73 (see the rounding sketch below)
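
These rules are easy to apply by hand; purely as an illustration (the helper name and approach are mine, not from the slides), a small Python function that rounds a result to a given number of significant figures:

    import math

    def round_sig(value, sig_figs):
        # Round 'value' to 'sig_figs' significant figures.
        if value == 0:
            return 0.0
        exponent = math.floor(math.log10(abs(value)))
        return round(value, sig_figs - 1 - exponent)

    # Multiplication keeps the smaller number of significant figures:
    # 16.3 (3 sig figs) x 4.5 (2 sig figs) = 73.35 -> 73
    print(round_sig(16.3 * 4.5, 2))   # 73.0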

  36. Propagation of Errors I • For a function F(x1, x2, …) of two or more random variables, the standard deviation can be written as shown below • Valid when there is no correlation between the variables
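
The propagation formula the slide refers to, reconstructed in the standard form for uncorrelated variables:

    \sigma_F^2 = \sum_i \left( \frac{\partial F}{\partial x_i} \right)^2 \sigma_{x_i}^2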

  37. Propagation of Errors II • For a function F(x1, x2, …) of two or more random variables, the standard deviation can be written as shown below • Valid when the variables are correlated • From here on, only the uncorrelated case is considered
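
The general form including correlations, again reconstructed in standard notation (rho_ij is the correlation coefficient between x_i and x_j):

    \sigma_F^2 = \sum_{i,j} \frac{\partial F}{\partial x_i} \frac{\partial F}{\partial x_j}\, \mathrm{cov}(x_i, x_j)
               = \sum_i \left( \frac{\partial F}{\partial x_i} \right)^2 \sigma_i^2
               + 2 \sum_{i<j} \frac{\partial F}{\partial x_i} \frac{\partial F}{\partial x_j}\, \rho_{ij}\, \sigma_i \sigma_j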

  38. Combining Errors • Addition or subtraction (F = x1 + x2 or F = x1 - x2): the absolute errors add in quadrature • Example) x1 = 100. ± 10., x2 = 400. ± 20. -> F = 500. ± 22. • Example) the error on a measured value

  39. Combining Errors (cont'd) • F = ax (where a is a constant): the error scales by |a| • Example) x = 100. ± 10., a = 5 -> F = 500. ± 50.

  40. Combining Errors (cont'd) • Multiplication (F = x1·x2): the relative errors add in quadrature • Example) x1 = 100. ± 10., x2 = 400. ± 20. -> F = (400. ± 45.) × 10^2

  41. Combining Errors (cont'd) • Division (F = x1 / x2): the relative errors add in quadrature • Example) x1 = 100. ± 10., x2 = 400. ± 20. -> F = 0.250 ± 0.028 • (The worked examples of slides 38-41 are checked numerically below.)
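
A minimal numerical check of the four worked examples above, using the quadrature rules just stated (the function names are mine, for illustration only):

    import math

    def err_sum(s1, s2):
        # Absolute errors add in quadrature for F = x1 + x2 or F = x1 - x2.
        return math.hypot(s1, s2)

    def err_product_ratio(f, x1, s1, x2, s2):
        # Relative errors add in quadrature for F = x1 * x2 or F = x1 / x2.
        return abs(f) * math.hypot(s1 / x1, s2 / x2)

    # Slide 38: (100 +/- 10) + (400 +/- 20) = 500 +/- 22
    print(100 + 400, err_sum(10, 20))                  # 500, ~22.4

    # Slide 39: F = a*x with a = 5, x = 100 +/- 10 -> 500 +/- 50
    print(5 * 100, 5 * 10)

    # Slide 40: (100 +/- 10) * (400 +/- 20) = (4.00 +/- 0.45) x 10^4
    f = 100 * 400
    print(f, err_product_ratio(f, 100, 10, 400, 20))   # 40000, ~4472

    # Slide 41: (100 +/- 10) / (400 +/- 20) = 0.250 +/- 0.028
    f = 100 / 400
    print(f, err_product_ratio(f, 100, 10, 400, 20))   # 0.25, ~0.028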

  42. Combining Results Using Weighting Factors • Cases • Measurements with different detection efficiencies • Measurements with different parts of the apparatus • Measurements from different experiments

  43. Combining Results Using Weighting Factors (cont'd) • Mean • For N measurements (x1, x2, … xk, … xN), where xk has error σk, the weighted mean, the weighting factor wk, and the error on the mean are given below
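
The weighted-average formulas the slide refers to, reconstructed in the standard inverse-variance form:

    \bar{x} = \frac{\sum_k w_k x_k}{\sum_k w_k},
    \qquad w_k = \frac{1}{\sigma_k^2},
    \qquad \sigma_{\bar{x}} = \left( \sum_k w_k \right)^{-1/2}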

  44. Ex) World average of sin(2β)

  45. Ex) B0 lifetime summary

  46. Ex) CDF Bd Mixing

  47. Upper Limit • Measurement (B = Bm ± σ) • Observation (Bm > 5σ) • The signal is greater than 5 sigma. • Evidence (3σ < Bm < 5σ) • The signal is greater than 3 sigma but less than 5 sigma. • Upper limit (Bm < 3σ) • The signal is less than 3 sigma.

  48. Upper Limit Bl (cont'd) • Method I. General case • Measurement B = Bm ± σ • Bl < Bm + 1.28σ (90% CL), Bm + 1.64σ (95% CL), Bm + 2.33σ (99% CL) • Ex) Bm = (3 ± 5) × 10^-9 -> Bl < (3 + 1.28 × 5) × 10^-9 at 90% CL
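
The coefficients 1.28, 1.64, and 2.33 are the one-sided Gaussian quantiles for the stated confidence levels; written out explicitly (a reconstruction of the relation the slide uses):

    B_l = B_m + z_{\mathrm{CL}}\,\sigma,
    \qquad \int_{-\infty}^{z_{\mathrm{CL}}} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt = \mathrm{CL},
    \qquad z_{0.90} \approx 1.28,\; z_{0.95} \approx 1.64,\; z_{0.99} \approx 2.33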

  49. Upper Limit Bl (cont'd) • Method 2. Negative Bm • Background subtracted • Example) • Bm = (-1 ± 1) × 10^-9 • Bm = (0 ± 1) × 10^-9 • Upper limit at the 90% CL • g is a Gaussian (mean Bm, width σ); a numerical sketch follows
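
The slide's formula did not survive extraction. The usual prescription for this method restricts the Gaussian g to the physical region B >= 0 and quotes the value Bl below which 90% of the remaining area lies; the sketch below assumes that prescription (the scipy calls are real, the reading of the slide is mine):

    from scipy.stats import norm
    from scipy.optimize import brentq

    def upper_limit(b_measured, sigma, cl=0.90):
        # Fraction of the Gaussian g(B; b_measured, sigma) lying in the physical region B >= 0.
        physical_area = 1.0 - norm.cdf(0.0, loc=b_measured, scale=sigma)

        def deficit(bl):
            # Area of g between 0 and bl, as a fraction of the physical area, minus the target CL.
            area = norm.cdf(bl, loc=b_measured, scale=sigma) - norm.cdf(0.0, loc=b_measured, scale=sigma)
            return area / physical_area - cl

        # The root of 'deficit' is the upper limit Bl.
        return brentq(deficit, 0.0, b_measured + 10.0 * sigma)

    # Slide 49 examples, in units of 10^-9 with sigma = 1:
    print(upper_limit(-1.0, 1.0))   # ~1.15
    print(upper_limit(0.0, 1.0))    # ~1.64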

  50. Comparison of Upper Limits (90% CL) • Assume σ = 1
