Uncertainties of Parton Distribution Functions

Daniel Stump
Michigan State University & CTEQ
Phystat

High energy particles interact through their quark and gluon constituents, the partons.
Asymptotic freedom: the parton cross sections can be approximated by perturbation theory.
Factorization theorem: parton distribution functions in the nucleon are the link between the PQCD theory and measurements on nucleons.

Parton distribution functions are important.

The goals of QCD global analysis are
• to find accurate PDF’s;
• to know the uncertainties of the PDF’s;
• to enable predictions, including uncertainties.

The systematic study of uncertainties of PDF’s developed slowly. Pioneers:
J. Collins and D. Soper, CTEQ Note 94/01, hep-ph/9411214.
C. Pascaud and F. Zomer, LAL-95-05.
M. Botje, Eur. Phys. J. C 14, 285 (2000).
Today many groups and individuals are involved in this research.

Current research on PDF uncertainties
CTEQ group at Michigan State (J. Pumplin, D. Stump, W. K. Tung, H. L. Lai, P. Nadolsky, J. Huston, R. Brock) and others (J. Collins, S. Kuhlmann, F. Olness, J. Owens)
MRST group (A. Martin, R. Roberts, J. Stirling, R. Thorne)
Fermilab group (W. Giele, S. Keller, D. Kosower)
S. I. Alekhin
V. Barone, C. Pascaud, F. Zomer; add B. Portheault
HERA collaborations
ZEUS: S. Chekanov et al.; A. Cooper-Sarkar
H1: C. Adloff et al.

Outline of this talk (focusing on CTEQ results)
• General comments; CTEQ6
• Our treatment of experimental systematic errors
• Compatibility of data sets
• Uncertainty analysis
• 2 case studies
  • inclusive jet production in ppbar or pp
  • strangeness asymmetry

Global Analysis
… of short-distance processes using perturbative QCD (NLO)
The challenge of Global Analysis is to construct a set of PDF’s with good agreement between data and theory, for many disparate experiments.

The program of Global Analysis is not a routine statistical analysis, because of systematic differences between experiments. We must sometimes use physics judgement in this complex real-world problem.

Parametrization
At low Q0, of order 1 GeV; P(x) has a few more parameters for increased flexibility.
~20 free shape parameters.
The Q dependence of f(x,Q) is obtained by solving the QCD evolution equations (DGLAP).

CTEQ6: table of experimental data sets
H1 (a): 96/97 low-x e+p data
ZEUS: 96/97 e+p data
H1 (b): 98/99 high-Q e-p data
D0: d²σ/dη dpT

Global Analysis data from many disparate experiments

The Parton Distribution Functions

Different ways to plot the parton distributions: linear and logarithmic.
Q² = 10 (solid) and 1000 (dashed) GeV²

In order to show the large and small x regions simultaneously, we plot 3 x^(5/3) f(x) versus x^(1/3). The integral of the plotted curve is the momentum fraction.
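The remark about the integral can be checked with a change of variable. This is a short worked step, not part of the original slide:

```latex
\int_0^1 3\,x^{5/3} f(x)\; d\!\left(x^{1/3}\right)
  = \int_0^1 3\,x^{5/3} f(x)\,\tfrac{1}{3}\,x^{-2/3}\,dx
  = \int_0^1 x\,f(x)\,dx .
```

The right-hand side is the momentum fraction carried by that parton, so areas under the plotted curves can be compared directly.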

Comparison of CTEQ6 and MRST2002
blue curves: CTEQ6M
black dots: MRST2002
gluon and u quark at Q² = 10 GeV²

Our treatment of systematic errors

What is a systematic error?
“This is why people are so frightened of systematic errors, and most other textbooks avoid the subject altogether. You never know whether you have got them and can never be sure that you have not – like an insidious disease… The good news, however, is that despite popular prejudices and superstitions, once you know what your systematic errors are, they can be handled with standard statistical methods.”
R. J. Barlow, Statistics

Imagine that two experimental groups have measured a quantity θ, with the results shown. OK, what is the value of θ?
This is very analogous to what happens in global analysis of PDF’s. But in the case of PDF’s the systematic differences are only visible through the PDF’s.

We use χ² minimization with fitting of systematic errors.
For statistical errors define
$\chi^2 = \sum_i (D_i - T_i)^2 / \sigma_i^2$ (σ_i: S.D. of D_i),
where $T_i = T_i(a_1, a_2, \ldots, a_d)$ is a function of d theory parameters.
Minimize χ² w.r.t. {a_m} → optimal parameter values {a_0m}.
All this would be based on the assumption that $D_i = T_i(a_0) + \sigma_i r_i$
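As a minimal sketch of χ² minimization with statistical errors only, the toy below fits a straight line T_i = a1 + a2 x_i. The linear model, the function names, and the numbers are illustrative assumptions; the real theory T_i(a) is a perturbative QCD calculation.

```python
def chi2(data, sigma, theory):
    """Sum over points of ((D_i - T_i) / sigma_i)^2."""
    return sum(((d - t) / s) ** 2 for d, s, t in zip(data, sigma, theory))


def fit_line(x, data, sigma):
    """Minimize chi2 for the toy model T_i = a1 + a2 * x_i (weighted least squares)."""
    w = [1.0 / s ** 2 for s in sigma]
    sw = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sy = sum(wi * di for wi, di in zip(w, data))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxy = sum(wi * xi * di for wi, xi, di in zip(w, x, data))
    det = sw * sxx - sx * sx
    a1 = (sxx * sy - sx * sxy) / det
    a2 = (sw * sxy - sx * sy) / det
    return a1, a2


# Toy data, invented for illustration only.
x = [0.0, 1.0, 2.0, 3.0]
data = [1.1, 2.9, 5.2, 6.8]
sigma = [0.2, 0.2, 0.2, 0.2]

a1, a2 = fit_line(x, data, sigma)
theory = [a1 + a2 * xi for xi in x]
chi2_min = chi2(data, sigma, theory)
```

Because the toy model is linear in its parameters, the minimum has a closed form. A global PDF fit has no such shortcut and relies on numerical minimization.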

Treatment of the normalization error
In scattering experiments there is an overall normalization uncertainty from uncertainty of the luminosity.
We define a χ² that includes f_N, the overall normalization factor, and minimize it w.r.t. both {a_m} and f_N.
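The slide's own formula did not survive extraction. A sketch under a common assumption is χ² = Σ_i (f_N D_i − T_i)²/σ_i² + ((1 − f_N)/σ_N)², where σ_N is the quoted normalization uncertainty. With that assumed form, χ² is quadratic in f_N, so the best f_N at fixed theory is available in closed form. All names and numbers below are hypothetical.

```python
def chi2_with_norm(f_n, data, sigma, theory, sigma_n):
    """Assumed form: sum_i ((f_N*D_i - T_i)/sigma_i)^2 + ((1 - f_N)/sigma_N)^2."""
    fit_term = sum(((f_n * d - t) / s) ** 2 for d, s, t in zip(data, sigma, theory))
    penalty = ((1.0 - f_n) / sigma_n) ** 2
    return fit_term + penalty


def best_norm(data, sigma, theory, sigma_n):
    """f_N minimizing chi2_with_norm at fixed theory (chi2 is quadratic in f_N)."""
    inv_var_n = 1.0 / sigma_n ** 2
    num = sum(d * t / s ** 2 for d, s, t in zip(data, sigma, theory)) + inv_var_n
    den = sum(d * d / s ** 2 for d, s in zip(data, sigma)) + inv_var_n
    return num / den


# Toy inputs, invented for illustration only.
theory = [10.0, 8.0, 5.0, 2.0]
data = [t / 1.03 for t in theory]  # data reported 3% low
sigma = [0.3, 0.3, 0.2, 0.1]
sigma_n = 0.02                     # hypothetical 2% luminosity uncertainty

f_best = best_norm(data, sigma, theory, sigma_n)
```

The penalty term pulls f_N back toward 1, so the fitted factor lands between 1 and the value that would match the data exactly.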

A method for general systematic errors
α_i: statistical error of D_i
β_ij: set of systematic errors (j = 1…K) of D_i
Define
$\chi^2 = \sum_i \frac{(D_i - \sum_j \beta_{ij} s_j - T_i)^2}{\alpha_i^2} + \sum_j s_j^2$,
where the last sum is a quadratic penalty term.
Minimize χ² with respect to both the shape parameters {a_m} and the systematic shifts {s_j}.

Because c2 depends quadratically on {sj} we can solve for the systematic shifts analytically, ss0(a). Then let, and minimize w.r.t {am}. The systematic shifts {sj} are continually optimized [ ss0(a) ] Phystat

So, we have accounted for …
• Statistical errors
• Overall normalization uncertainty (by fitting {f_N,e})
• Other systematic errors (analytically)
We may make further refinements of the fit with weighting factors. Default: w_e and w_N,e = 1.
The spirit of global analysis is compromise: the PDF’s should fit all data sets satisfactorily. If the default leaves some experiments unsatisfied, we may be willing to reduce the quality of fit to some experiments in order to fit another experiment better. (We use this sparingly!)
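A minimal sketch of how such weighting factors might enter, assuming the global figure of merit is a sum over experiments e of w_e χ²_e plus w_N,e times that experiment's normalization penalty. The slide's formula was not recovered, so this form, the function name, and the numbers are assumptions.

```python
def chi2_global(experiments, weights=None):
    """Sum over experiments of w_e * chi2_e + w_N,e * (normalization penalty)."""
    weights = weights or {}
    total = 0.0
    for name, (chi2_e, norm_penalty_e) in experiments.items():
        w_e, w_norm_e = weights.get(name, (1.0, 1.0))  # default: all weights are 1
        total += w_e * chi2_e + w_norm_e * norm_penalty_e
    return total


# Toy numbers, invented for illustration only (not from any fit).
experiments = {"exp_A": (105.0, 0.4), "exp_B": (230.0, 1.1)}

default_total = chi2_global(experiments)
emphasized_total = chi2_global(experiments, {"exp_A": (5.0, 1.0)})
```

Raising w_e for one experiment makes the minimizer favor that data set, which is how the reweighting studies later in the talk probe compatibility.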

Quality
How well does this fitting procedure work?

Comparison of the CTEQ6M fit to the H1 data in separate x bins. The data points include optimized shifts for systematic errors. The error bars are statistical only.

Comparison of the CTEQ6M fit to the inclusive jet data. (a) D0 cross section versus pT for 5 rapidity bins; (b) CDF cross section for central rapidity.

How large are the optimized normalization factors?

We must always check that the systematic shifts are not unreasonably large.
10 systematic shifts: NMC data
11 systematic shifts: ZEUS data

Comparison to NMC F2 without systematic shifts

A study of compatibility

Table of Data Sets (columns: N, χ², χ²/N)
The PDF’s are not exactly CTEQ6 but very close: a no-name generic set of PDF’s for illustration purposes.
N_tot = 2291, χ²_global = 2368.

The effect of setting all normalization constants to 1.
χ²(opt. norm) = 2368
χ²(norm 1) = 2742
Δχ² = 374.0

By applying weighting factors in the fitting function, we can test the “compatibility” of disparate data sets.
Example 1. The effect of giving the CCFR F2 data set a heavy weight.
Δχ²(CCFR) = -19.7
Δχ²(other) = +63.3
Giving a single data set a large weight is tantamount to determining the PDF’s from that data set alone. The result is a significant improvement for that data set, but PDF’s that do not fit the others.

Example 1b. The effect of giving the CCFR F2 data weight 0, i.e., removing the data set from the global analysis.
Δχ²(CCFR) = +40.0
Δχ²(other) = -17.4
Imagine starting with the other data sets, not including CCFR. The result of adding CCFR is that χ²_global of the other sets increases by 17.4; this must be an acceptable increase of χ².

Example 5. Giving heavy weight to H1 and BCDMS
Δχ² for all data sets:
Δχ²(H & B) = -38.7
Δχ²(other) = +149.9

Lessons from these reweighting studies
• Global analysis requires compromises: the PDF model that gives the best fit to one set of data does not give the best fit to others. This is not surprising because there are systematic differences between the experiments.
• The scale of acceptable changes of χ² must be large. Adding a new data set and refitting may increase the χ²’s of other data sets by amounts >> 1.

Clever ways to test the compatibility of disparate data sets
• Plot χ² versus χ²: J. Collins and J. Pumplin (hep-ph/0201195)
• The Bootstrap Method: Efron and Tibshirani, Introduction to the Bootstrap (Chapman & Hall); Chernick, Bootstrap Methods (Wiley)

Uncertainty Analysis
(I) Methods

We continue to use χ²_global as figure of merit. Explore the variation of χ²_global in the neighborhood of the minimum.
[Figure: the (a_1, a_2) parameter plane, showing the standard fit at the minimum of χ²; nearby points are also acceptable.]
The Hessian method (m, n = 1, 2, 3, …, d)

“Master Formula”
Classical error formula for a variable X(a).
Obtain better convergence using eigenvectors of H_mn.
S_m(+) and S_m(−) denote PDF sets displaced from the standard set, along the ± directions of the mth eigenvector, by distance T = √(Δχ²) in parameter space.
(Available in the LHAPDF format: 2d alternate sets.)
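The slide's equation was not recovered. A sketch of the symmetric form commonly used with such eigenvector sets is ΔX = ½ [Σ_m (X(S_m(+)) − X(S_m(−)))²]^(1/2), taken here as an assumption. The function name and the numbers are hypothetical; in practice each X value would come from evaluating the observable with one of the 2d alternate PDF sets.

```python
import math


def pdf_uncertainty(x_plus, x_minus):
    """Assumed symmetric form: 0.5 * sqrt(sum_m (X(S_m+) - X(S_m-))^2)."""
    if len(x_plus) != len(x_minus):
        raise ValueError("need one (+) and one (-) value per eigenvector")
    return 0.5 * math.sqrt(sum((p - m) ** 2 for p, m in zip(x_plus, x_minus)))


# Toy values of an observable X for d = 2 eigenvector directions (invented).
x_plus = [10.3, 10.4]
x_minus = [9.7, 9.6]

delta_x = pdf_uncertainty(x_plus, x_minus)
```

The result scales with the tolerance T used to build the displaced sets, since each S_m(±) sits at distance T from the standard set.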

The Lagrange Multiplier Method
… for analyzing the uncertainty of PDF-dependent predictions.
The fitting function F for constrained fits combines χ²_global with the variable X through a Lagrange multiplier λ.
Minimization of F w.r.t. {a_m} gives the best fit for the value X(a_min) of the variable X, which is controlled by the parameter λ.
Hence we obtain a curve of χ²_global versus X.
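A toy sketch of the procedure, assuming the standard Lagrange form F(λ, a) = χ²(a) + λ X(a). To keep the minimization exact, χ² is taken as a diagonal quadratic, χ² = χ²_0 + Σ_m h_m (a_m − a0_m)², and X as linear, X = Σ_m c_m a_m. These model choices and all names are illustrative assumptions.

```python
def constrained_fit(lam, a0, h, c):
    """Minimize F = chi2(a) + lam * X(a) for the toy model.

    Setting dF/da_m = 0 gives a_m = a0_m - lam * c_m / (2 * h_m).
    Returns (X at the constrained minimum, increase of chi2 above its minimum).
    """
    a = [a0m - lam * cm / (2.0 * hm) for a0m, hm, cm in zip(a0, h, c)]
    x_val = sum(cm * am for cm, am in zip(c, a))
    d_chi2 = sum(hm * (am - a0m) ** 2 for hm, am, a0m in zip(h, a, a0))
    return x_val, d_chi2


# Toy model, invented for illustration only.
a0 = [1.0, -0.5]  # best-fit parameters
h = [4.0, 1.0]    # curvatures of chi2 along each parameter
c = [2.0, 1.0]    # X = 2*a_1 + 1*a_2

# Each value of lambda yields one point (X, delta chi2) on the curve.
curve = [constrained_fit(lam, a0, h, c) for lam in (-2.0, -1.0, 0.0, 1.0, 2.0)]
```

Reading off where the resulting curve crosses a chosen Δχ² gives the allowed range of X. Unlike the Hessian approach, the method itself needs no quadratic approximation; the quadratic toy is only used here so that the answer can be checked by hand.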

The question of tolerance
X: any variable that depends on PDF’s
X0: the prediction in the standard set
χ²(X): curve of constrained fits
For the specified tolerance (Δχ² = T²) there is a corresponding range of uncertainty, ±ΔX.
What should we use for T?

Estimation of parameters in Gaussian error analysis would have T = 1.
We do not use this criterion.

( = s / N ) Aside: The familiar ideal example Consider N measurements {i} of a quantity q with normal errors {si} Estimate q by minimization of c2, The mean of qcombined is qtrue , the SD is and The proof of this theorem is straightforward. It does not apply to our problem because of systematic errors. Phystat

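Add a systematic error to the ideal model… (for simplicity suppose β_i = β)
Estimate θ by minimization of χ² (s: systematic shift, θ: observable),
$\chi^2(\theta, s) = \sum_i (\theta_i - \theta - \beta s)^2 / \sigma_i^2 + s^2$.
The mean of θ_combined is again θ_true, and the variance is $\sigma_{\rm combined}^2 = \left(\sum_i 1/\sigma_i^2\right)^{-1} + \beta^2$ (= σ²/N + β² for equal errors).
Then, letting s → s0(θ), again Δχ² = 1 at θ = θ_combined ± σ_combined.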
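A numerical sketch of this statement, assuming χ²(θ, s) = Σ_i (θ_i − θ − β s)²/σ² + s² with equal errors σ. The best shift at fixed θ is s0(θ) = β N (mean − θ)/(σ² + N β²). With s re-optimized, Δχ² = 1 corresponds to a displacement of √(σ²/N + β²). With s held fixed, the same displacement costs much more. Names and numbers are illustrative.

```python
import math


def chi2(theta, s, values, sigma, beta):
    """Assumed form: sum_i ((theta_i - theta - beta*s)/sigma)^2 + s^2."""
    return sum(((v - theta - beta * s) / sigma) ** 2 for v in values) + s * s


def best_shift(theta, values, sigma, beta):
    """Shift s0(theta) that minimizes chi2 at fixed theta."""
    n = len(values)
    mean = sum(values) / n
    return beta * n * (mean - theta) / (sigma ** 2 + n * beta ** 2)


def profiled_chi2(theta, values, sigma, beta):
    """chi2 with the systematic shift re-optimized at this theta."""
    return chi2(theta, best_shift(theta, values, sigma, beta), values, sigma, beta)


# Toy inputs, invented for illustration only.
values = [4.9, 5.2, 5.0, 5.3]
sigma, beta = 0.2, 0.15

theta_c = sum(values) / len(values)
sd = math.sqrt(sigma ** 2 / len(values) + beta ** 2)

delta_profiled = (profiled_chi2(theta_c + sd, values, sigma, beta)
                  - profiled_chi2(theta_c, values, sigma, beta))
delta_fixed = (chi2(theta_c + sd, 0.0, values, sigma, beta)
               - chi2(theta_c, 0.0, values, sigma, beta))
```

In this toy, `delta_profiled` comes out at 1 while `delta_fixed` is larger, which illustrates why a Δχ² = 1 criterion presupposes that the systematic shifts are re-optimized at every point.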

Still we do not apply the criterion Δχ² = 1!
Reasons:
• We keep the normalization factors fixed as we vary the point in parameter space. The criterion Δχ² = 1 requires that the systematic shifts be continually optimized versus {a_m}.
• Systematic errors may be non-Gaussian.
• The published “standard deviations” β_ij may be inaccurate.
• We trust our physics judgement instead.

To judge the PDF uncertainty, we return to the individual experiments.
Lumping all the data together in one variable, Δχ²_global, is too constraining.
Global analysis is a compromise. All data sets should be fit reasonably well; that is what we check. As we vary {a_m}, does any experiment rule out the displacement from the standard set?

In testing the goodness of fit, we keep the normalization factors (i.e., optimized luminosity shifts) fixed as we vary the shape parameters.
End result: a tolerance of, e.g., Δχ² ~ 100 for ~2000 data points.
This does not contradict the Δχ² = 1 criterion used by other groups, because that refers to a different χ² in which the normalization factors are continually optimized as the {a_m} vary.

Some groups do use the criterion of Δχ² = 1 for PDF error analysis. Often they are using limited data sets, e.g., an experimental group using only their own data. Then the Δχ² = 1 criterion may underestimate the uncertainty implied by systematic differences between experiments.
An interesting compendium of methods, by R. Thorne
