Bias-Free Estimation in Multicomponent Maximum Likelihood Fits with Component-Dependent Templates
Pierluigi Catastini (I.N.F.N. - Pisa and Siena University), Giovanni Punzi (S.N.S. and I.N.F.N. - Pisa)
Problem
• Suppose we have a sample of particles produced by a certain physics process and recorded by our experiment.
• Suppose we know that the sample is a mixture of different particle types, for example pions, protons and kaons, but the proportion of each particle type is completely unknown.
• Of course, our experiment is also equipped with some kind of Particle IDentification (PID) device, which measures a quantity related to the particle type.
• We want to measure the fraction of each particle type: $f_\pi$, $f_p$, $f_K$.
A “Real Life” Problem*
• The mean of the PID observable strongly depends on the particle momentum, an additional observable known event by event: the templates are component dependent!
[Figure: PID observable versus momentum; legend: electrons, muons, protons, kaons, pions]
• Measuring particle-type fractions is common in particle physics: e.g. understanding the particles produced in the fragmentation that forms B mesons (flavor tagging), separating different particle decays, …
• PID information is usually provided by the energy loss of charged particles in gas (dE/dx), time-of-flight measurements, Cherenkov light, …
• The solution is obtained by performing an unbinned maximum likelihood fit. But remember…
* At least for a physicist…
Please write the Likelihood!
• Unfortunately, the likelihood is not simply
$\mathcal{L}(f_\pi, f_p, f_K) = \prod_i \sum_j f_j\, P(\mathrm{pid}_i \mid \mathrm{Mom}_i, \mathrm{type}_j)$   (WRONG!)
• Using the above expression, you may get strongly biased results if the additional observables have different distributions for the different particle types [1].
• The reason for the failure is, quoting from [1]:
“Whenever the templates used in a multi-component fit depend on additional observables, one should always use the correct, complete Likelihood expression, including the explicit distributions of all observables for all classes of events.”
• In our problem, this means that we need to include the momentum distribution of each particle type (they are almost always different in practice).
[1] G. Punzi, PHYSTAT03, physics/0401045.
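A short derivation (standard conditional probability, added for clarity) of why the momentum distributions cannot be dropped. The conditional density of pid at a given momentum involves the fraction of each type *at that momentum*, which is the overall fraction $f_j$ only when all $P(\mathrm{Mom} \mid \mathrm{type}_j)$ are the same function $g(\mathrm{Mom})$:

```latex
\begin{align*}
P(\mathrm{pid}_i \mid \mathrm{Mom}_i)
  &= \sum_j P(\mathrm{type}_j \mid \mathrm{Mom}_i)\, P(\mathrm{pid}_i \mid \mathrm{Mom}_i, \mathrm{type}_j),
  \qquad
  P(\mathrm{type}_j \mid \mathrm{Mom}_i)
  = \frac{f_j\, P(\mathrm{Mom}_i \mid \mathrm{type}_j)}{\sum_k f_k\, P(\mathrm{Mom}_i \mid \mathrm{type}_k)},\\
P(\mathrm{Mom} \mid \mathrm{type}_j) = g(\mathrm{Mom})\ \forall j
  &\;\Rightarrow\;
  \prod_i \sum_j f_j\, P(\mathrm{pid}_i \mid \mathrm{Mom}_i, \mathrm{type}_j)\, g(\mathrm{Mom}_i)
  = \Big[\prod_i g(\mathrm{Mom}_i)\Big] \prod_i \sum_j f_j\, P(\mathrm{pid}_i \mid \mathrm{Mom}_i, \mathrm{type}_j).
\end{align*}
```

The bracketed factor does not depend on the $f_j$, so in that special case the incomplete expression gives the same estimates. When the momentum distributions differ, replacing $P(\mathrm{type}_j \mid \mathrm{Mom}_i)$ by a constant $f_j$ is exactly the approximation that produces the bias.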
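Writing the Likelihood…
Particle IDentification information is represented by an observable called pid; we then write the likelihood as
$\mathcal{L}(f_\pi, f_p, f_K) = \prod_i \left[ f_\pi\, P(\mathrm{pid}_i, \mathrm{Mom}_i \mid \pi) + f_p\, P(\mathrm{pid}_i, \mathrm{Mom}_i \mid p) + (1 - f_\pi - f_p)\, P(\mathrm{pid}_i, \mathrm{Mom}_i \mid K) \right] = \prod_i \sum_j f_j\, P(\mathrm{pid}_i \mid \mathrm{Mom}_i, \mathrm{type}_j)\, P(\mathrm{Mom}_i \mid \mathrm{type}_j)$
given:
• $f_\pi + f_p + f_K = 1$
• $j$ = pion, proton, kaon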
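Once the templates are fixed, maximizing this likelihood over the fractions alone is a concave problem. One simple way to do it is the expectation-maximization (EM) update for mixture weights; this is our choice for illustration, not necessarily what the authors used. The sketch assumes `dens[i, j]` already holds $P(\mathrm{pid}_i \mid \mathrm{Mom}_i, \mathrm{type}_j)\, P(\mathrm{Mom}_i \mid \mathrm{type}_j)$ for every event; `fit_fractions` and `log_likelihood` are hypothetical helper names.

```python
import numpy as np


def fit_fractions(dens, n_iter=2000, tol=1e-12):
    """Maximize sum_i log(sum_j f_j * dens[i, j]) over f_j >= 0, sum_j f_j = 1.

    dens[i, j] is the fixed density of event i under component j, e.g.
    P(pid_i | Mom_i, type_j) * P(Mom_i | type_j) for the complete likelihood.
    Each EM step never decreases the likelihood.
    """
    dens = np.asarray(dens, dtype=float)
    n_comp = dens.shape[1]
    f = np.full(n_comp, 1.0 / n_comp)  # start from equal fractions
    for _ in range(n_iter):
        weighted = dens * f  # f_j * dens_ij, shape (n_events, n_comp)
        post = weighted / weighted.sum(axis=1, keepdims=True)  # P(type_j | event i)
        f_new = post.mean(axis=0)  # M-step: average posterior weight per type
        if np.max(np.abs(f_new - f)) < tol:
            return f_new
        f = f_new
    return f


def log_likelihood(f, dens):
    """ln L = sum_i ln(sum_j f_j * dens[i, j])."""
    return float(np.sum(np.log(np.asarray(dens, dtype=float) @ np.asarray(f, dtype=float))))


if __name__ == "__main__":
    # Perfectly separated components: the fractions are just the counts / N.
    dens = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
    print(fit_fractions(dens))  # approximately [2/3, 1/3]
```

EM is convenient here because the M-step for the fractions is closed form and automatically keeps them on the simplex.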
A toy study of the “Real Life” Problem
We generate a sample with a known particle-type composition as follows:
• the PID variable is distributed according to a typical resolution function (i.e. the template used in the fit), defined on PIDmes - PIDexp(Mom); this gives $P(\mathrm{pid}_i \mid \mathrm{Mom}_i, \mathrm{type}_j)$;
• momenta are distributed according to a Gaussian $N(\mu, \sigma)$; this gives $P(\mathrm{Mom}_i \mid \mathrm{type}_j)$.
Toy parameters:
• $\mu_\pi$ = 1.00 GeV/c, $\mu_p$ = 1.25 GeV/c, $\mu_K$ = 1.50 GeV/c
• $\sigma_\pi$ = $\sigma_p$ = $\sigma_K$ = 0.25 GeV/c
• $f_\pi$ = 0.50, $f_p$ = 0.15, $f_K$ = 0.35
[Figure: generated momentum distributions, Momentum (GeV/c)]
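A sketch of the toy generation with the parameters listed above. The slides do not give the resolution function or PIDexp(Mom), so the PID model here is HYPOTHETICAL: the expected PID value of each type falls like 1/Mom² (scale factors in `PID_SCALE` are made up), and PIDmes - PIDexp(Mom) is a unit Gaussian. All function and constant names are ours.

```python
import numpy as np

# Toy parameters quoted on the slide (momenta in GeV/c).
FRACTIONS = {"pion": 0.50, "proton": 0.15, "kaon": 0.35}
MOM_MEAN = {"pion": 1.00, "proton": 1.25, "kaon": 1.50}
MOM_SIGMA = {"pion": 0.25, "proton": 0.25, "kaon": 0.25}

# HYPOTHETICAL PID model (not from the slides): PIDexp falls like 1/Mom^2
# and the residual PIDmes - PIDexp(Mom) is a unit-width Gaussian.
PID_SCALE = {"pion": 0.0, "proton": 4.0, "kaon": 2.0}


def pid_expected(mom, species):
    """Hypothetical expected PID value for a given species at momentum mom."""
    return PID_SCALE[species] / mom**2


def generate_toy(n_events, seed=0):
    """Return (labels, mom, pid, species) for a toy mixture of particle types."""
    rng = np.random.default_rng(seed)
    species = list(FRACTIONS)
    probs = np.array([FRACTIONS[s] for s in species])
    labels = rng.choice(len(species), size=n_events, p=probs)
    mom = np.empty(n_events)
    pid = np.empty(n_events)
    for j, s in enumerate(species):
        sel = labels == j
        n_j = int(sel.sum())
        # P(Mom | type_j): Gaussian N(mu_j, sigma_j)
        mom[sel] = rng.normal(MOM_MEAN[s], MOM_SIGMA[s], n_j)
        # P(pid | Mom, type_j): expected value plus a residual drawn from the template
        pid[sel] = pid_expected(mom[sel], s) + rng.normal(0.0, 1.0, n_j)
    return labels, mom, pid, species
```

Because the momentum distributions differ by type, this sample reproduces the situation in which the incomplete likelihood is expected to be biased.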
Result of the Fits
[Figure: fitted pion and proton fractions with the complete likelihood: OK!]
If we do not take the momentum distributions into account…
[Figure: fitted pion and proton fractions with the incomplete likelihood: Bias!]
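A self-contained sketch of the effect shown in these fits, under simplifying assumptions: two components only (pions and kaons, with the slide's momentum distributions N(1.00, 0.25) and N(1.50, 0.25) GeV/c and equal fractions), and a HYPOTHETICAL PID template in which the kaon mean, 2/Mom² in units of the resolution, shrinks with momentum. It fits the pion fraction once with the incomplete expression and once with the complete one, by bisection on the score (the log-likelihood is concave in f). The helper names are ours.

```python
import numpy as np


def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


def fit_fraction(a, b):
    """ML estimate of f in sum_i ln(f*a_i + (1-f)*b_i), by bisection on the score.

    The score sum_i (a_i - b_i) / (f*a_i + (1-f)*b_i) is non-increasing in f.
    """
    def score(f):
        return np.sum((a - b) / (f * a + (1.0 - f) * b))

    lo, hi = 1e-9, 1.0 - 1e-9
    if score(lo) <= 0.0:
        return lo
    if score(hi) >= 0.0:
        return hi
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid  # maximum lies above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def run_toy(n_events=20000, f_pion=0.5, seed=1):
    rng = np.random.default_rng(seed)
    is_pion = rng.random(n_events) < f_pion
    # Slide's momentum distributions (GeV/c)
    mom = np.where(is_pion, rng.normal(1.00, 0.25, n_events), rng.normal(1.50, 0.25, n_events))
    # HYPOTHETICAL template: pid ~ N(0, 1) for pions, N(2/Mom^2, 1) for kaons
    pid_mean_kaon = 2.0 / mom**2
    pid = rng.normal(np.where(is_pion, 0.0, pid_mean_kaon), 1.0)

    t_pion = gauss(pid, 0.0, 1.0)            # P(pid | Mom, pion)
    t_kaon = gauss(pid, pid_mean_kaon, 1.0)  # P(pid | Mom, kaon)
    m_pion = gauss(mom, 1.00, 0.25)          # P(Mom | pion)
    m_kaon = gauss(mom, 1.50, 0.25)          # P(Mom | kaon)

    f_incomplete = fit_fraction(t_pion, t_kaon)
    f_complete = fit_fraction(t_pion * m_pion, t_kaon * m_kaon)
    return f_incomplete, f_complete
```

Under these assumptions one expects the incomplete fit to pull the pion fraction well above 0.5: low-momentum events, which are mostly pions, carry most of the PID separating power, while the complete fit stays close to the true value.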
Often in “Real Life”…
• Writing the complete likelihood with the distributions of all observables is almost straightforward, provided you can easily obtain a parameterization of those distributions…
• Often we have poor information about those distributions (barely acceptable, after very hard work!); sometimes they are even completely unknown.
• For example, suppose the goal of the particle-type fit of the previous slides is to estimate the fractions of particles produced in heavy-quark fragmentation… Great! We have no idea about the functional form of each particle type’s momentum distribution. How can we write the correct likelihood?
A solution
• No functional form is known with which to parameterize the missing P(Mom | type).
• Use a general functional form, a series expansion: $P(\mathrm{Mom} \mid \mathrm{type}_j) = \sum_m a_{mj} F_m(\mathrm{Mom})$, with the $a_{mj}$ free parameters of the fit. We used the terms from the 0th to the 6th.
• We decided to use orthogonal polynomials, among them:
  • Legendre polynomials $P_i$, on [-1, 1]
  • Chebyshev polynomials of the first kind $T_i$, on [-1, 1]
  • Chebyshev polynomials of the second kind $U_i$, on [-1, 1]
  • Laguerre polynomials $L_i$, on [0, +∞)
  • Hermite polynomials $H_i$, on (-∞, +∞)
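A minimal sketch of the series parameterization with Legendre polynomials (one of the families listed above), assuming the momenta lie in a known interval [lo, hi] that is mapped linearly onto [-1, 1]. Fixing the 0th coefficient to 1/2 so that the density is normalized is our choice, and positivity is not guaranteed, so the fit has to enforce it. The hypothetical helper `projection_coeffs` gives quick starting values from a pure sample by the standard orthogonal-series rule $a_m = \frac{2m+1}{2}\langle P_m(x)\rangle$; on the slide, the coefficients are instead free parameters of the likelihood fit.

```python
import numpy as np
from numpy.polynomial import legendre


def to_unit_interval(mom, lo, hi):
    """Map momenta in [lo, hi] linearly onto x in [-1, 1]."""
    return 2.0 * (np.asarray(mom, dtype=float) - lo) / (hi - lo) - 1.0


def series_density(mom, coeffs, lo, hi):
    """P(Mom | type) = sum_m a_m P_m(x(Mom)), expressed as a density in Mom.

    coeffs[0] must be 1/2 so that the density integrates to 1 over [lo, hi];
    the Jacobian dx/dMom = 2 / (hi - lo) converts from x to Mom.
    The result can go negative for some coefficient values.
    """
    coeffs = np.asarray(coeffs, dtype=float)
    if not np.isclose(coeffs[0], 0.5):
        raise ValueError("coeffs[0] must be 1/2 for a normalized density")
    x = to_unit_interval(mom, lo, hi)
    return legendre.legval(x, coeffs) * 2.0 / (hi - lo)


def projection_coeffs(mom, n_terms, lo, hi):
    """Orthogonal-series estimate a_m = (2m+1)/2 * mean(P_m(x_i)) from a pure sample."""
    x = to_unit_interval(mom, lo, hi)
    basis = legendre.legvander(x, n_terms - 1)  # shape (n_events, n_terms)
    m = np.arange(n_terms)
    return (2 * m + 1) / 2.0 * basis.mean(axis=0)
```

With 7 terms, as on the slide, the series reproduces the sample's first six moments on the chosen interval exactly.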
Our toy
• Replacing the exact distribution $N(\mu, \sigma)$ with $\sum_m a_{mj} F_m(\mathrm{Mom})$ for each particle type, we fitted our toy sample again:
[Figure: fitted pion and proton fractions: OK!]
• Once again, the bias is corrected!
Some Comments
[Figure: projections of $P(\mathrm{Mom} \mid \mathrm{type}_j) = \sum_m a_{mj} F_m(\mathrm{Mom})$ for pions, protons and kaons versus momentum (GeV/c)]
• Of course, we are happy: although we did not know P(Mom | type) a priori, we have been able to avoid the bias.
• Please notice that the resolution on the fitted parameters is not degraded much!
• Just 7 terms of the series expansion were used: not so many.
Another Complication
• Suppose our PID information is obtained from a Time-Of-Flight (TOF) measurement.
• The expected TOF is a function of two observables:
$\mathrm{TOF}_{\mathrm{exp}} = \frac{\mathrm{arclength}}{c}\sqrt{1 + m_j^2/\mathrm{Mom}^2}$
• This means that our pdf is (after having verified that the correlation between arclength and momentum is almost zero):
$P(\mathrm{Mom}, \mathrm{Arc}, \mathrm{pid} \mid \mathrm{type}_j) \simeq P(\mathrm{pid} \mid \mathrm{Mom}, \mathrm{Arc}, \mathrm{type}_j)\, P(\mathrm{Mom} \mid \mathrm{type}_j)\, P(\mathrm{Arc} \mid \mathrm{type}_j)$
where both $P(\mathrm{Mom} \mid \mathrm{type}_j)$ and $P(\mathrm{Arc} \mid \mathrm{type}_j)$ are unknown!
• We want to apply the same series-expansion technique to both momentum and arclength!
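A sketch of the expected-TOF formula above and of a component-dependent TOF template, assuming arclength in cm, momentum in GeV/c and masses in GeV/c² (rounded PDG values). The Gaussian shape of the residual and its width `sigma_ns` are HYPOTHETICAL; the function names are ours.

```python
import numpy as np

C_CM_PER_NS = 29.9792458  # speed of light in cm/ns
# Charged-hadron masses in GeV/c^2 (rounded PDG values)
MASS = {"pion": 0.13957, "kaon": 0.49368, "proton": 0.93827}


def tof_expected(arclength_cm, mom_gev, mass_gev):
    """TOF_exp = arclength / c * sqrt(1 + m^2 / p^2), returned in ns."""
    arclength_cm = np.asarray(arclength_cm, dtype=float)
    mom_gev = np.asarray(mom_gev, dtype=float)
    return arclength_cm / C_CM_PER_NS * np.sqrt(1.0 + (mass_gev / mom_gev) ** 2)


def tof_template(tof_meas, arclength_cm, mom_gev, mass_gev, sigma_ns):
    """P(pid | Mom, Arc, type_j): Gaussian in TOF_meas - TOF_exp (HYPOTHETICAL width)."""
    resid = np.asarray(tof_meas, dtype=float) - tof_expected(arclength_cm, mom_gev, mass_gev)
    return np.exp(-0.5 * (resid / sigma_ns) ** 2) / (sigma_ns * np.sqrt(2.0 * np.pi))
```

For example, over 150 cm at 1.5 GeV/c the expected kaon-pion TOF difference is about 0.24 ns, and it shrinks roughly like 1/Mom² at higher momentum, which is why the template depends on both momentum and arclength.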
Back to our toy
• Fractions, pid and momentum variables are generated as before.
• Arclength is distributed according to a Gaussian $N(\mu, \sigma)$, with the same distribution for all particle types (but in principle you don’t know that!): $\mu_\pi = \mu_K = \mu_p$, $\sigma_\pi = \sigma_K = \sigma_p$.
$\mathcal{L}(f_j, a_{mj}, b_{lj}) = \prod_i \sum_j f_j\, P(\mathrm{pid}_i \mid \mathrm{Mom}_i, \mathrm{Arc}_i, \mathrm{type}_j) \left(\sum_m a_{mj} F_m(\mathrm{Mom}_i)\right) \left(\sum_l b_{lj} F_l(\mathrm{Arc}_i)\right)$
• Again we used 7 terms for the momentum series expansion.
• We used 3 terms for the arclength series expansion.
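A sketch of the likelihood above as a function to be minimized, assuming Legendre series on fixed momentum and arclength intervals, with the 0th coefficient of each series fixed to 1/2 for normalization (our choice), and PID template values precomputed per event in `pid_templates[i, j]`. Function names and argument layout are hypothetical; in practice such a function would be handed to a general-purpose minimizer such as MINUIT or `scipy.optimize.minimize`. Returning infinity where a series goes negative is one simple way, assumed here and not taken from the slides, to keep the fit in the physical region.

```python
import numpy as np
from numpy.polynomial import legendre


def series_density(v, coeffs, lo, hi):
    """Legendre series sum_m c_m P_m(x(v)) as a density on [lo, hi].

    coeffs[0] = 1/2 makes it integrate to 1; positivity is not guaranteed.
    """
    x = 2.0 * (np.asarray(v, dtype=float) - lo) / (hi - lo) - 1.0
    return legendre.legval(x, np.asarray(coeffs, dtype=float)) * 2.0 / (hi - lo)


def neg_log_likelihood(fractions, mom_coeffs, arc_coeffs, pid_templates,
                       mom, arc, mom_range, arc_range):
    """-ln L(f_j, a_mj, b_lj) for the factorized model

    L = prod_i sum_j f_j * P(pid_i | Mom_i, Arc_i, type_j)
                          * sum_m a_mj F_m(Mom_i) * sum_l b_lj F_l(Arc_i)

    mom_coeffs[j]: momentum series of type j (7 terms on the slide).
    arc_coeffs[j]: arclength series of type j (3 terms on the slide).
    """
    total = np.zeros(len(mom))
    for j, f_j in enumerate(fractions):
        p_mom = series_density(mom, mom_coeffs[j], *mom_range)
        p_arc = series_density(arc, arc_coeffs[j], *arc_range)
        total += f_j * pid_templates[:, j] * p_mom * p_arc
    if np.any(total <= 0.0):
        return np.inf  # a series went negative: reject this parameter point
    return -np.sum(np.log(total))
```

Since each one-dimensional series is normalized separately, their product is a normalized density in (Mom, Arc) for every type, so the fractions keep their meaning whatever values the coefficients take.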
Results
[Figure: projections of $P(\mathrm{Arc} \mid \mathrm{type}_j) = \sum_l b_{lj} F_l(\mathrm{Arc})$ for pions, protons and kaons]
[Figure: fitted pion and proton fractions: OK!]
Conclusions
• We faced a common particle-physics problem where an incomplete likelihood expression causes a detectable bias. We have cured it!
• The problem also has the complication of missing information about the distribution of an observable!
• We solved the problem, removing the bias from the fit results, by using series expansions to parameterize the unknown distributions (the coefficients are free parameters determined by the fit).
• We also faced the case where two observables have unknown distributions. Again we used two separate series expansions to parameterize those distributions and avoid the bias.