Models for Count Outcomes


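Models for Count Outcomes
• Count variables indicate how many times something has happened. Examples:
• the number of bills vetoed by a U.S. president
• the number of articles published by a professor
• the number of coups in African countries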

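Applying the linear regression model (LRM) to count outcomes can yield estimates that are inefficient, inconsistent, and biased:
• Functional form: a linear specification can misrepresent how the expected count depends on the regressors
• Nonsensical predictions: the fitted line can predict negative counts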
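To make the "nonsensical predictions" point concrete, here is a minimal sketch that fits an ordinary least squares line to a small set of hypothetical counts (the data and the helper name `ols_line` are illustrative, not from the couart2 example). Because the counts are skewed and include zeros, the fitted line dips below zero at the low end of the regressor.

```python
def ols_line(x, y):
    """Closed-form simple OLS: slope = Sxy / Sxx, intercept = ybar - slope * xbar."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope

x = [1, 2, 3, 4, 5]   # hypothetical regressor values
y = [0, 0, 1, 3, 6]   # hypothetical counts (skewed, with zeros)
intercept, slope = ols_line(x, y)
for xi in x:
    # the prediction at x = 1 is a negative "count"
    print(xi, intercept + slope * xi)
```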

A frequently adopted remedy within the linear regression model is to take the natural logarithm of the dependent variable, which yields a log-linear model.
• Because zero is one of the observed values (and $\ln 0$ is undefined), a constant $c$ is often added to the dependent variable $Y_i$, i.e., $\ln(Y_i + c)$.
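A minimal sketch of the $\ln(Y_i + c)$ transformation, assuming hypothetical counts and two common choices of $c$ (1 and 0.5). The function name `log_transform` is illustrative. Note that the transformed values, and hence any regression fitted to them, depend on the arbitrary choice of $c$.

```python
import math

def log_transform(counts, c=1.0):
    """Return ln(y + c) for each count y; c > 0 keeps the log defined at y = 0."""
    if c <= 0:
        raise ValueError("c must be positive so that ln(0 + c) is defined")
    return [math.log(y + c) for y in counts]

counts = [0, 1, 2, 5]                  # hypothetical article counts
print(log_transform(counts, c=1.0))    # y = 0 maps to ln(1) = 0
print(log_transform(counts, c=0.5))    # a different c gives different values
```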

Example: Article Counts (file name: couart2). The data give the number of publications produced by Ph.D. biochemists.

Count Models
• Poisson Regression Model (PRM)
• Negative Binomial Regression Model (NBRM)

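Poisson Distribution
• If the dependent variable $y$ counts how many times an event of interest occurs within a given period, its values are nonnegative integers (including 0) with no theoretical upper bound. Such a variable is modeled with the Poisson distribution:
$\Pr(y \mid \mu) = \dfrac{e^{-\mu}\mu^{y}}{y!}, \qquad y = 0, 1, 2, \ldots$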

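A key feature of the Poisson distribution is that its expected value is $E(y) = \mu$ and its variance is also $\operatorname{Var}(y) = \mu$.
• The link function for the Poisson distribution is the log link, $\ln(\mu)$.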

Because the variance of the Poisson distribution is determined by the size of its mean, this property is often called equidispersion (the variance equals the expected value).
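Equidispersion can be checked numerically. The sketch below evaluates the Poisson probability function on the log scale and sums it to obtain the mean and variance for a few illustrative values of $\mu$; truncating the sum at `y_max = 200` is an assumption that is harmless for the small means used here.

```python
import math

def poisson_pmf(y, mu):
    """Pr(Y = y | mu) = exp(-mu) * mu**y / y!, computed on the log scale."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))

def poisson_moments(mu, y_max=200):
    """Mean and variance by summing the pmf over y = 0..y_max."""
    probs = [poisson_pmf(y, mu) for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var

for mu in (0.8, 1.7, 4.0):
    mean, var = poisson_moments(mu)
    print(f"mu={mu}: mean={mean:.4f}, variance={var:.4f}")
```

Each line reports a mean and a variance that both equal $\mu$ (up to rounding).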

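Poisson Regression Model (PRM): set the systematic component of the GLM to a linear combination of the independent variables and substitute it into the link function:
$\ln(\mu_i) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_K x_{iK} = \mathbf{x}_i\boldsymbol{\beta}, \qquad \mu_i = E(y_i \mid \mathbf{x}_i) = \exp(\mathbf{x}_i\boldsymbol{\beta})$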
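As a rough illustration of how the PRM is estimated, the sketch below maximizes the Poisson log likelihood for $\ln(\mu_i) = \mathbf{x}_i\boldsymbol{\beta}$ by Newton-Raphson, using the score $\sum_i (y_i - \mu_i)\mathbf{x}_i$ and information $\sum_i \mu_i \mathbf{x}_i\mathbf{x}_i'$. It is a teaching sketch on hypothetical data, not Stata's implementation; the helper names `solve` and `fit_poisson` are illustrative.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_poisson(X, y, iters=50, tol=1e-10):
    """ML estimates for ln(mu_i) = x_i'beta; each row of X starts with 1 (intercept)."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        mu = [math.exp(sum(b * xj for b, xj in zip(beta, row))) for row in X]
        # score vector: sum_i (y_i - mu_i) x_i
        score = [sum((yi - mi) * row[j] for row, yi, mi in zip(X, y, mu)) for j in range(k)]
        # information matrix: sum_i mu_i x_i x_i'
        info = [[sum(mi * row[j] * row[l] for row, mi in zip(X, mu)) for l in range(k)]
                for j in range(k)]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

# Hypothetical data: intercept plus one binary regressor
X = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
y = [0, 1, 2, 2, 3, 4]
beta = fit_poisson(X, y)
print(beta)   # exp(beta) reproduces the group means 1 and 3
```

With one binary regressor, the fitted $\exp(\beta_0)$ and $\exp(\beta_0 + \beta_1)$ equal the two group means, which makes the result easy to verify by hand.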

Interpretation of PRM
• The expected value of the count variable (rate of occurrence): listcoef, prchange
• The probability of counts: prvalue
• Predicted counts: prtab

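Interpretation of PRM
1. Change in $\mu$ for changes in the independent variables
• Factor (or percent) change in the expected count using listcoef
• Holding all other variables constant, the mean number of articles for female scientists is 0.8 times that of male scientists (i.e., 20% lower).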

• Holding all other variables constant, a one-standard-deviation increase in the mentor's productivity increases a scientist's mean productivity by 27 percent.
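The factor changes above follow from $\mu = \exp(\mathbf{x}\boldsymbol{\beta})$: increasing $x_k$ by $\delta$ multiplies $\mu$ by $\exp(\beta_k\delta)$, and the percent change is $100[\exp(\beta_k\delta) - 1]$. Our understanding is that listcoef reports these quantities for $\delta = 1$ and $\delta = s_k$ (one standard deviation). The coefficients below are hypothetical values chosen only to reproduce factors of 0.8 and about 1.27; they are not the couart2 estimates.

```python
import math

def factor_change(b, delta=1.0):
    """Factor change in the expected count when x_k rises by delta: exp(b_k * delta)."""
    return math.exp(b * delta)

def percent_change(b, delta=1.0):
    """Percent change in the expected count: 100 * (exp(b_k * delta) - 1)."""
    return 100.0 * (factor_change(b, delta) - 1.0)

b_female = math.log(0.8)       # hypothetical coefficient for a 0/1 regressor
b_mentor, sd_mentor = 0.02, 12.0   # hypothetical coefficient and standard deviation

print(factor_change(b_female), percent_change(b_female))     # about 0.8 and -20%
print(factor_change(b_mentor, sd_mentor), percent_change(b_mentor, sd_mentor))
```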

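Marginal and discrete change in $\mu$ (the predicted rate) using prchange
• For a typical case (all other variables held at their means), female scientists are expected to publish 0.36 fewer articles than male scientists.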
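In the PRM the marginal change is $\partial\mu/\partial x_k = \beta_k\mu$, and the discrete change for a 0/1 regressor is $\mu(x_k = 1) - \mu(x_k = 0)$ with the other variables fixed. A minimal sketch, assuming hypothetical coefficients and hypothetical means for two regressors (a female dummy and the mentor's article count); prchange's exact output format is not reproduced here.

```python
import math

def expected_count(beta, x):
    """mu = exp(b0 + b1*x1 + ...); beta[0] is the intercept, x excludes the constant."""
    return math.exp(beta[0] + sum(b * xk for b, xk in zip(beta[1:], x)))

def marginal_change(beta, x, k):
    """d mu / d x_k = b_k * mu at x (k indexes x, so the coefficient is beta[k + 1])."""
    return beta[k + 1] * expected_count(beta, x)

def discrete_change(beta, x, k, start=0.0, end=1.0):
    """mu(x_k = end) - mu(x_k = start), other regressors held at their values in x."""
    x0, x1 = list(x), list(x)
    x0[k], x1[k] = start, end
    return expected_count(beta, x1) - expected_count(beta, x0)

beta = [0.3, -0.2, 0.03]   # hypothetical: intercept, female, mentor's articles
x_mean = [0.5, 10.0]       # hypothetical means of female and mentor's articles

print(marginal_change(beta, x_mean, 0))   # marginal change for female
print(discrete_change(beta, x_mean, 0))   # change from male (0) to female (1)
```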

2. Creating ideal types with prvalue and prtab:
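An "ideal type" fixes every regressor at chosen values, computes $\mu = \exp(\mathbf{x}\boldsymbol{\beta})$, and then evaluates $\Pr(y = k \mid \mathbf{x})$ from the Poisson formula. A minimal sketch, assuming hypothetical coefficients and two hypothetical ideal types that differ only in gender; it mimics the kind of quantities prvalue reports, not its output.

```python
import math

def poisson_probs(mu, y_max):
    """Predicted Pr(y = k | x) = exp(-mu) * mu**k / k! for k = 0..y_max."""
    return [math.exp(-mu) * mu ** k / math.factorial(k) for k in range(y_max + 1)]

beta = [0.3, -0.2, 0.03]   # hypothetical: intercept, female (0/1), mentor's articles

def mu_for(female, mentor_articles):
    """Expected count for an ideal type (hypothetical model)."""
    return math.exp(beta[0] + beta[1] * female + beta[2] * mentor_articles)

for label, female in (("male", 0), ("female", 1)):
    mu = mu_for(female, mentor_articles=20)
    probs = poisson_probs(mu, y_max=5)
    print(label, round(mu, 3), [round(p, 3) for p in probs])
```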

Negative Binomial Model
• The problem of overdispersion
• The Poisson regression model assumes that the variance equals the expected value.

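In practice, the variance in empirical data is often larger than the model predicts, i.e., $\operatorname{Var}(y \mid \mathbf{x}) > E(y \mid \mathbf{x})$; this is called overdispersion.
• If it is not corrected, the standard errors of the coefficients are underestimated, so tests reach statistical significance more easily than they should, leading to mistaken inferences.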
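A crude first look at overdispersion is the ratio of the sample variance to the sample mean, which is about 1 under equidispersion. This sketch uses hypothetical counts; note that it ignores the regressors, whereas overdispersion in a regression model concerns the variance conditional on $\mathbf{x}$, so the formal check is the LR test of $\alpha = 0$ discussed with nbreg.

```python
from statistics import mean, variance

def dispersion_ratio(counts):
    """Sample variance / sample mean: near 1 under equidispersion, > 1 suggests overdispersion."""
    return variance(counts) / mean(counts)

counts = [0, 0, 0, 1, 1, 2, 3, 5, 8, 12]   # hypothetical counts with a long right tail
print(dispersion_ratio(counts))
```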

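One of many causes of overdispersion is that the event rate $\mu_i$ is affected not only by the observed independent variables but also by factors the researcher does not observe, known as unobserved heterogeneity.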

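There are two remedies:
• Instead of using the standard errors from the Poisson regression itself, compute a variance-covariance matrix of the estimator (VCE) that is not underestimated, and use it to obtain robust standard errors.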

• Treat the event rate itself as a random variable following a gamma distribution; substituting it back into the Poisson distribution combines the two into a new probability model, the negative binomial.
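The gamma-Poisson construction can be simulated directly. In this sketch the multiplicative heterogeneity term $\nu$ is drawn from a gamma distribution with mean 1 and variance $\alpha$, and then $y \mid \nu \sim \text{Poisson}(\mu\nu)$; the resulting counts have mean $\mu$ but variance close to $\mu + \alpha\mu^2$. The values $\mu = 2$, $\alpha = 1$, the seed, and the sample size are arbitrary assumptions, and the Poisson draw uses Knuth's multiplication method because the standard library has no Poisson sampler.

```python
import math
import random
from statistics import mean, pvariance

def poisson_draw(rng, lam):
    """Knuth's multiplication method; adequate for small lam."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def simulate_gamma_poisson(mu, alpha, n, seed=1):
    """Draw y_i ~ Poisson(mu * nu_i) with nu_i ~ Gamma(shape=1/alpha, scale=alpha)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        nu = rng.gammavariate(1.0 / alpha, alpha)   # mean 1, variance alpha
        draws.append(poisson_draw(rng, mu * nu))
    return draws

draws = simulate_gamma_poisson(mu=2.0, alpha=1.0, n=20000)
print(mean(draws), pvariance(draws))   # roughly 2 and 2 + 1 * 2**2 = 6
```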

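Re-estimating the Poisson regression with robust standard errors
• In Stata, add the vce(robust) option to the poisson command to estimate robust standard errors for the coefficients:
poisson y x1 x2 x3, vce(robust)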
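The idea behind robust standard errors can be sketched with the sandwich formula $A^{-1}BA^{-1}$, where $A = \sum_i \mu_i\mathbf{x}_i\mathbf{x}_i'$ is the information matrix and $B = \sum_i (y_i - \mu_i)^2\mathbf{x}_i\mathbf{x}_i'$ uses squared residuals. This sketch handles one regressor plus an intercept on hypothetical overdispersed data and applies no finite-sample correction, so its numbers may differ slightly from what Stata reports; the function names are illustrative.

```python
import math

def outer_sum(weights, x):
    """Sum_i w_i z_i z_i' for z_i = (1, x_i); entry (j, k) is sum_i w_i * x_i**(j + k)."""
    return [[sum(w * xi ** (j + k) for w, xi in zip(weights, x)) for k in range(2)]
            for j in range(2)]

def inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul2(p, q):
    """Product of two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def poisson_with_robust_se(x, y, iters=100):
    """Fit ln(mu_i) = b0 + b1*x_i by Newton-Raphson; return (beta, model SE, robust SE)."""
    beta = [0.0, 0.0]
    for _ in range(iters):
        mu = [math.exp(beta[0] + beta[1] * xi) for xi in x]
        score = [sum((yi - mi) * xi ** j for xi, yi, mi in zip(x, y, mu)) for j in range(2)]
        a_inv = inv2(outer_sum(mu, x))
        beta = [beta[j] + sum(a_inv[j][k] * score[k] for k in range(2)) for j in range(2)]
    mu = [math.exp(beta[0] + beta[1] * xi) for xi in x]
    a_inv = inv2(outer_sum(mu, x))                                # model-based VCE
    b = outer_sum([(yi - mi) ** 2 for yi, mi in zip(y, mu)], x)   # squared residuals
    v_robust = matmul2(matmul2(a_inv, b), a_inv)                  # sandwich A^-1 B A^-1
    model_se = [math.sqrt(a_inv[j][j]) for j in range(2)]
    robust_se = [math.sqrt(v_robust[j][j]) for j in range(2)]
    return beta, model_se, robust_se

x = [0, 0, 0, 0, 1, 1, 1, 1]    # hypothetical binary regressor
y = [0, 0, 1, 7, 0, 2, 3, 11]   # hypothetical overdispersed counts
beta, model_se, robust_se = poisson_with_robust_se(x, y)
print(beta, model_se, robust_se)   # the robust SEs are larger than the model-based ones
```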

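Two negative binomial regression models
• NB2 (Negbin 2): $E(y_i \mid \mathbf{x}_i) = \mu_i$, $\operatorname{Var}(y_i \mid \mathbf{x}_i) = \mu_i + \alpha\mu_i^2$. The conditional mean of the negative binomial distribution is the same as in the Poisson regression model, but the conditional variance differs.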

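• NB1 (Negbin 1): $E(y_i \mid \mathbf{x}_i) = \mu_i$, $\operatorname{Var}(y_i \mid \mathbf{x}_i) = \mu_i + \alpha\mu_i = (1 + \alpha)\mu_i$. Again the conditional mean equals that of the Poisson model, but the conditional variance differs.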
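The two variance functions can be compared side by side, and the NB2 probability function can be checked numerically. The sketch below writes the NB2 pmf with $r = 1/\alpha$ as $\Pr(y) = \frac{\Gamma(y + r)}{y!\,\Gamma(r)}\left(\frac{r}{r + \mu}\right)^{r}\left(\frac{\mu}{r + \mu}\right)^{y}$ and confirms that its mean is $\mu$ and its variance is $\mu + \alpha\mu^2$. The values $\mu = 3$, $\alpha = 0.5$ and the truncation point are assumptions for illustration.

```python
import math

def var_nb2(mu, alpha):
    """NB2 conditional variance: mu + alpha * mu**2 (quadratic in mu)."""
    return mu + alpha * mu ** 2

def var_nb1(mu, alpha):
    """NB1 conditional variance: mu + alpha * mu = (1 + alpha) * mu (linear in mu)."""
    return mu + alpha * mu

def nb2_pmf(y, mu, alpha):
    """NB2 probability of count y, computed on the log scale."""
    r = 1.0 / alpha
    log_p = (math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
             + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))
    return math.exp(log_p)

def nb2_moments(mu, alpha, y_max=500):
    """Mean and variance of the NB2 distribution by summing its pmf."""
    probs = [nb2_pmf(y, mu, alpha) for y in range(y_max + 1)]
    m = sum(y * p for y, p in enumerate(probs))
    v = sum((y - m) ** 2 * p for y, p in enumerate(probs))
    return m, v

for mu in (0.5, 1.0, 3.0, 6.0):
    print(mu, var_nb1(mu, 0.5), var_nb2(mu, 0.5))
print(nb2_moments(3.0, 0.5))   # mean 3, variance 3 + 0.5 * 9 = 7.5
```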

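Testing:
• When $\alpha = 0$, the variance of the negative binomial distribution equals the Poisson variance, so the Poisson model is appropriate.
• As long as $\alpha > 0$, the negative binomial variance exceeds the Poisson variance (overdispersion), so the negative binomial model is appropriate.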

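Stata's built-in negative binomial regression command:
nbreg y x1 x2 x3
• Below the coefficient table, the output reports the estimate of the dispersion parameter (alpha) and the likelihood-ratio (LR) test of $H_0\colon \alpha = 0$. If $H_0$ is rejected, the variance is significantly larger than the expected value, so negative binomial regression should be used.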
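The LR statistic compares the two log likelihoods, $LR = 2(\ln L_{NB} - \ln L_{PRM})$. Because $\alpha = 0$ lies on the boundary of the parameter space, the p-value is half of the $\chi^2(1)$ upper-tail probability (Stata labels this statistic chibar2(01)). A minimal sketch using $\Pr(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$; the log-likelihood values are hypothetical.

```python
import math

def lr_test_alpha_zero(ll_poisson, ll_nbreg):
    """LR statistic and boundary-corrected p-value 0.5 * Pr(chi2(1) > LR)."""
    lr = max(0.0, 2.0 * (ll_nbreg - ll_poisson))
    p = 0.5 * math.erfc(math.sqrt(lr / 2.0))
    return lr, p

lr, p = lr_test_alpha_zero(ll_poisson=-100.0, ll_nbreg=-98.647)   # hypothetical values
print(lr, p)   # LR = 2.706, p close to 0.05
```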

Stata's nbreg command fits the NB2 model by default. To fit the NB1 model, add the dispersion(constant) option, e.g., nbreg y x1 x2 x3, dispersion(constant).

Interpretation of NBM
• The expected value of the count variable (rate of occurrence): listcoef, prchange
• The probability of counts: prvalue
• Predicted counts: prtab

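Interpretation of NBM
1. Change in $\mu$ for changes in the independent variables
• Factor (or percent) change in the expected count using listcoef
• Holding all other variables constant, the mean number of articles for female scientists is 0.8 times that of male scientists (i.e., 20% lower).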

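Marginal and discrete change in $\mu$ (the predicted rate) using prchange
• For a typical case (all other variables held at their means), female scientists are expected to publish 0.34 fewer articles than male scientists.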
