SJTU CMGPD 2012 Methodological Lecture, Day 3: Position and Status Variables

Variables for position
• The basic and analytic files include a variety of indicator variables for whether a male holds a position
• These are based on the statuses recorded in the registers
• A file with hanyu pinyin for raw occupations has been released as DS 6
• Occupations with original Chinese characters are released as a PDF
  • It turned out to be difficult to include Chinese characters in the released data

Variables for position
• In the original data, entries included the official positions held by males.
• Coders assigned a numeric code to each new position and entered the code into the dataset.
  • Codes started again for each new dataset
  • Coders transcribed the original Chinese into a codebook
• DATASET and POSITION_CODE can be used to look up the original Chinese in the appendix to the Analytic release codebook
• DS 6 allows merging of the hanyu pinyin for each code, if you want to create your own position variables from the originals.

Position variables
• We have provided a variety of flag variables identifying different kinds of position
• We have a separate file that specifies the hanyu pinyin and Chinese characters for each combination of dataset and numeric position code.
• This file provides flag and other variables describing characteristics of positions.
• These flags are merged back into the main file to provide variables for analysis.

Created Position Variables
• HAS_POSITION
  • Any salaried official position or purchased title
  • Doesn’t include miding, piding, etc. Those were statuses, not salaried official positions
• ESTIMATED_INCOME
  • Imputed income based on stipends associated with the position(s) held by an individual
• RANK
  • Bureaucratic rank, based on the specification of pin in the position

Position variables
• BI_TIE_SHI, ZHI_SHI_REN, and flags for specific positions
• JUAN, DING_DAI, etc. for the presence of modifiers
• EXAMINATION for any examination-related title
• NO_STATUS indicates that no status at all was recorded for a male, even though we would have expected one.

Name variables
• HAS_SURNAME
• DIMINUTIVE_NAME
• RUSTIC_NAME
• NON_HAN_NAME
• NUMBER_NAME

Creating New Variables
• DS 6 contains pinyin for positions
• DATASET and POSITION_CODE are the basis of a merge back to the data files
• POSITION_PINYIN is the ‘raw’ position, as transcribed by the coders
• POSITION_CORE is a stripped down version that includes modifiers
• Chinese characters are in an appendix to the Analytic File codebook

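The merge described above might look roughly like the following sketch. The two file names are hypothetical placeholders, not the names of the released files, and the m:1 form assumes DS 6 holds exactly one row per combination of DATASET and POSITION_CODE.

```stata
* Sketch only: both file names are hypothetical; substitute your own copies.
use "analytic_file.dta", clear

* Attach the pinyin variables from DS 6 to each record by dataset and code.
* keep(master match) retains every record in the main file and drops
* DS 6 rows that match nothing.
merge m:1 DATASET POSITION_CODE using "ds6_positions.dta", keep(master match)

* Check how many records received pinyin from DS 6
tabulate _merge
```

If Stata reports that the key variables do not uniquely identify observations in the using file, the one-row-per-code assumption does not hold and the file needs to be inspected before merging.
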
Creating new variables
• Stata lets you search strings for particular values and return an indicator if the string is found.
• Can use this for occupations of special interest
• For example,
  • generate artisan = index(POSITION_PINYIN, "jiang") > 0
  • generate juanna = index(POSITION_PINYIN, "juanna") > 0
• Can code positions manually using the Chinese characters in the appendix of the Analytic File codebook

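A substring search flags every string that contains the pattern anywhere, so it is worth checking what a new flag actually picks up. The sketch below uses three made-up pinyin strings (for illustration only, not values taken from DS 6) and strpos(), the documented name of the function that index() refers to.

```stata
* Toy data: these three strings are invented for illustration
clear
input str20 POSITION_PINYIN
"mu jiang"
"jiangjun"
"juanna"
end

* strpos() returns the position of the first match, or 0 if there is none
generate byte artisan = strpos(POSITION_PINYIN, "jiang") > 0
list
```

Here both "mu jiang" and "jiangjun" are flagged, although only the first is the kind of entry the artisan flag is meant to capture. On the real data, something like tabulate POSITION_PINYIN if artisan gives a quick view of everything a flag has matched.
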
Studying attainment
• We have mainly used event-history analysis
  • Determinants of chances of attaining a position by the next register
  • Allows for consideration of time-varying characteristics
    • Characteristics of kin
• An alternative would be to look at determinants of attaining a position by a specific age, with one observation per person

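The one-observation-per-person alternative could be set up along the following lines. This is a minimal sketch, not the authors' procedure: the cutoff of 30 sui is an arbitrary choice, position_by_30 is a hypothetical variable name, and the code ignores the fact that men last observed before the cutoff are censored.

```stata
* Sketch: one record per male, flagging whether a position was ever
* recorded at or before age 30 sui (the cutoff is an arbitrary assumption).
keep if SEX == 2

* egen max() of a true/false expression is 1 if it holds in any record
bysort PERSON_ID: egen byte position_by_30 = max(HAS_POSITION == 1 & AGE_IN_SUI <= 30)

* Reduce to the earliest record for each person
bysort PERSON_ID (YEAR): keep if _n == 1

tabulate position_by_30
```

Because keep discards records, this is best run on a copy of the data or between preserve and restore.
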
Creating variables to identify attainment of position by next register
* At risk: males (SEX == 2) who are present, flagged by NEXT_3, and hold no position
generate at_risk_position = SEX == 2 & PRESENT & NEXT_3 & HAS_POSITION == 0
* Flag at-risk males who hold a position in their next record
bysort PERSON_ID (YEAR): generate next_position = at_risk_position & HAS_POSITION[_n+1]
* Totals of at-risk records and of attainments at each age
bysort AGE_IN_SUI: egen total_at_risk_position = total(at_risk_position)
bysort AGE_IN_SUI: egen total_next_position = total(next_position)
generate p_next_position = total_next_position/total_at_risk_position
* Mark one record per age so that each age is plotted once
bysort AGE_IN_SUI: generate first_in_age = _n == 1
twoway line p_next_position AGE_IN_SUI if AGE_IN_SUI >= 1 & AGE_IN_SUI <= 80 & first_in_age, ///
    ytitle("Proportion attaining position by next register") scheme(s1mono)

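The same two variables can also feed a simple discrete-time event-history model. The sketch below is one possible setup rather than the specification used by the authors: the quadratic in age and the clustering on PERSON_ID are illustrative assumptions.

```stata
* Discrete-time event-history sketch on at-risk records only.
* Assumes at_risk_position and next_position were created as above.
* c.AGE_IN_SUI##c.AGE_IN_SUI enters age and age squared.
logit next_position c.AGE_IN_SUI##c.AGE_IN_SUI if at_risk_position == 1, vce(cluster PERSON_ID)
```

Time-varying characteristics, such as characteristics of kin, would enter as additional covariates measured in the current register.
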
bysort
• bysort groups the records in the dataset according to the values of the specified variables.
• Each set of records defined by a unique value of the specified variables is treated as a distinct block of records when the command is executed.
• If a variable is in parentheses, the data is sorted on that variable, but not divided according to the unique values of that variable.
• [ ] allows access to values from other observations in the same block. [1] says to draw the value of a variable from the first record in the block, [_N] from the last record, [_n+1] from the next record, and so forth
• _n refers to the location of the current record within the block

x    y
1    3
1    7
1    8
1   12
2   15
2   21
2   22
2   -5
3  -10
3   10
4    8
4    2

• Create a variable with the record number within x:
  • bysort x (y): generate a = _n
• Create a flag identifying the first record within x:
  • bysort x (y): generate b = _n == 1
• Create a flag identifying the last record within x:
  • bysort x (y): generate c = _N == _n
• Create a variable with the total number of records with that unique value of x:
  • bysort x (y): generate d = _N
• Create a variable with the y from the next record within x:
  • bysort x (y): generate e = y[_n+1]

Results
x    y   a   b   c   d    e
1    3   1   1   0   4    7
1    7   2   0   0   4    8
1    8   3   0   0   4   12
1   12   4   0   1   4    .
2   -5   1   1   0   4   15
2   15   2   0   0   4   21
2   21   3   0   0   4   22
2   22   4   0   1   4    .
3  -10   1   1   0   2   10
3   10   2   0   1   2    .
4    2   1   1   0   2    8
4    8   2   0   1   2    .

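To reproduce these results, the example data can be typed in directly and the five commands run in sequence. The last two variables, first_y and last_y, are additions that are not part of the original example. They illustrate the [1] and [_N] subscripts described earlier.

```stata
* Enter the twelve example records
clear
input x y
1 3
1 7
1 8
1 12
2 15
2 21
2 22
2 -5
3 -10
3 10
4 8
4 2
end

* The five variables from the example
bysort x (y): generate a = _n
bysort x (y): generate b = _n == 1
bysort x (y): generate c = _N == _n
bysort x (y): generate d = _N
bysort x (y): generate e = y[_n+1]

* Additions for illustration: y from the first and last record within x
bysort x (y): generate first_y = y[1]
bysort x (y): generate last_y = y[_N]

* Show the records with a separator line between blocks of x
list, sepby(x) noobs
```

In the last record of each block, e is missing because there is no next record within that value of x.
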