An Introduction into Stata I

Prof. Dr. Herbert Brücker, University of Bamberg
Seminar “Migration and the Labour Market”, Session 3, June 9, 2011

Contents
• Introduction to the work plan
• Introduction to the dataset
• Introduction to STATA I
  - Overview of working with STATA
  - Menus and editors
    - General editor
    - Data editor
    - Do-file editor
  - The grammar of STATA
    - commands
    - loading data
    - describing data
    - graphs
  - Working with do-files

1 Work plan
• Forming four teams of 4-5 students each
• Introduction and outline of the research question
• Review of the literature on the labour market effects of migration (3-5 pages)
• Description of the dataset
  - Data sources and caveats
  - Descriptive statistics and graphs
• Presenting the empirical model
• Presenting and discussing the regression results
• Conclusions
• Presenting the papers in class

2 The dataset: general information
• The IAB employment sample (IABS)
• 2% random sample of all employees obliged to pay social security contributions and of recipients of unemployment benefits (e.g. SGB II and III)
• Precise information on wages and unemployment spells
• Information on education and work experience
• Period: 1974-2004 (since extended to 2008)
• Here we use 1980-2004, since information at the beginning of the sample period is less reliable
• Focus on Western Germany excl. (West) Berlin due to unification

2 The dataset: Caveats I
• Identification of foreigners by nationality
  - We use the nationality of the first spell to control for naturalisations
• Problem: identifying the immigration of ethnic Germans (Spätaussiedler)
  - We try to identify them via programme participation
• No civil servants (“Beamte”) and no self-employed
  - There is nothing we can do about this.
• Wages are censored at the legal pension threshold level (66,000 euros)
  - We impute wages above the threshold level

2 The dataset: Caveats II
• Missing education information (17%, about 35% of foreigners)
  - We impute education information
• We have only daily wages (not hourly wages)
  - We exclude all part-time workers
• See Brücker/Jahn (2011), Data Section, and the FDZ at IAB for a description of the dataset

2 The dataset: Organisation
• We distinguish 25 years (1980-2004)
• We distinguish 64 labour market cells by education (4), work experience (8) and nationality (2)
  - 4 x 8 x 2 = 64
• We use the following indexes:
  - h = native (German)
  - f = foreigner
  - q = education
  - k = work experience
  - t = time
• Note that the dataset also contains aggregates (e.g. wt, wqt, wqkt and not only whqkt, wfqkt)

General overview of STATA
• The STATA desktop is divided into four parts:
  - Review: shows the executed commands
  - Results: shows the results of your commands
  - Variables: shows the current list of variables in the dataset
  - Command: the commands have to be typed in here

Review window: Lists your previous commands

Results window: Shows the outcome of your current command

Variables window: Shows the variables of your dataset

Command window: Here you can type your commands

STATA has the following menus/editors you can work with:
• The desktop menu: you can run all commands here.
• The data editor: here you can edit the data you have loaded.
• The data browser: here you can browse the data you have loaded, but not edit them.
• The do-file editor: the do-file is a file in which you can edit and execute all types of commands. It is very useful for replication and for keeping a record of what you have done. We come back to this.

The Data Editor. You can change each cell by hand. The Data Browser looks similar, but you can't edit the data.

The Do-File Editor. You can type your commands and execute them there. (Text marked with a star at the beginning of the line is not treated as a command, e.g. * Note that … *.)

The Grammar of STATA
General structure of a STATA command:
[prefix :] command [varlist] [if] [in] [weight] [, options]


General structure of STATA
We will concentrate on: [prefix :] command [varlist] [if] [in] [weight] [, options]
command: What do you want to do?

[prefix :] command [varlist] [if] [in] [weight] [, options]
• First step: how to load data
  > use "Filename", clear
• Practice:
  > use "C:\EigeneDateien\Stata\data1.dta", clear
• Another way to load data:
  File -> Open -> choose your data file
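The loading step can be tried without access to the seminar data. The following is only a minimal sketch: the two rows of numbers are invented, and the temporary file stands in for a real path such as the one above.

```stata
* Build a tiny hypothetical dataset (values are made up for illustration)
clear
input year whqkt wfqkt
1980 60 50
1981 62 51
end

* Save it to a temporary file so the example does not depend on a local path
tempfile demo
save `demo'

* Load it again; the option "clear" drops whatever data are currently in memory
use `demo', clear
count
```

Without the option clear, STATA refuses to load a new file if the data in memory have unsaved changes, which is why the slides add it to every use command.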

General structure of STATA
There are two types of variables (data):
• numerical variables, e.g. 0, 1, 501, 0.5, -12 etc.
• string variables, e.g. no voc train, male, female etc.
How to deal with the data types:
• Numerical variables: you can do all mathematical operations, e.g. var1 + var2, var1/var2, var1*var2 etc.
• String variables: you have to put their values in quotation marks, e.g. replace var1 = 1 if sex == "female"

The black variables are numerical variables. The red variables are string variables.

[prefix :] command [varlist] [if] [in] [weight] [, options]
Now that you have loaded the data: how do you get an overview of your data?
> describe
“describe” gives general information about the data, such as the number of observations, the number of variables, and the names and labels of the variables.

[prefix :] command [varlist] [if] [in] [weight] [, options]
How to get an overview of your data?
> list
“list” lists the data of every single cell (e.g. persons, groups, classes) in the dataset. Attention: your data might be really large! “-more-” indicates that more output is available; either press any key to continue or press “q” to quit.

General structure of STATA
We will concentrate on: [prefix :] command [varlist] [if] [in] [weight] [, options]
varlist: Which variables are concerned?

[prefix :] command [varlist] [if] [in] [weight] [, options]
[varlist] stands for either a list of variables or a single variable to which the command is applied. [varlist] is set in brackets because it is an optional specification; if no [varlist] is specified, STATA executes the command for all variables.
Practice: in order to get information only about education and wages in the dataset:
> list ed whqkt
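The effect of a varlist is easiest to see on a small example. This is a sketch with made-up values; only the variable names ed, ex, year, whqkt and wfqkt are taken from the slides, and their exact layout in the real dataset is an assumption.

```stata
* Hypothetical toy data: one string variable (ed) and four numerical variables
clear
input str20 ed ex year whqkt wfqkt
"no voc train" 1 1980 60 50
"no voc train" 1 1981 62 51
"voc train"    1 1980 80 70
"voc train"    1 1981 83 72
end

* Without a varlist the command covers all variables
describe
list

* With a varlist it covers only the variables named
describe ed whqkt
list ed whqkt
```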

[prefix :] command [varlist] [if] [in] [weight] [, options]
Further commands to describe the dataset I:
> tabstat
gives a table with the mean of the variable(s)
> codebook
shows how the variables are coded, with information on the data type, range, units, unique values, missing values, mean, standard deviation and percentiles
In practice:
> tabstat whqkt wfqkt
> codebook
> tabstat whqkt

[prefix :] command [varlist] [if] [in] [weight] [, options]
Further commands to describe the dataset II:
> summarize
gives the number of observations, the mean, the standard deviation, the minimum and the maximum of a variable
> tabulate
shows a table with the absolute and relative frequencies of a variable (with two variables, it produces a two-way table)
In practice:
> sum whqkt wfqkt
> tab whqkt wfqkt

[prefix :] command [varlist] [if] [in] [weight] [, options]
• Practice:
  - how many observations
  - mean earnings or unemployment rate
  - standard deviation of earnings and of the unemployment rate
  - range of observations (minimum and maximum wage and unemployment rate)
• Note that the descriptive statistics already provide interesting information about the data; they help to check for outliers and measurement error and to interpret the regression results (most results refer to the sample mean)
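The practice questions above can be answered with a few commands. The sketch below uses invented wage values; the statistics() option of tabstat is not covered on the slides and is shown here only as one way to request several statistics at once.

```stata
* Hypothetical toy data (values are made up for illustration)
clear
input str20 ed ex year whqkt wfqkt
"no voc train" 1 1980 60 50
"no voc train" 1 1981 62 51
"voc train"    1 1980 80 70
"voc train"    1 1981 83 72
end

* Observations, mean, standard deviation, minimum and maximum
summarize whqkt wfqkt

* The same statistics arranged as a table
tabstat whqkt wfqkt, statistics(n mean sd min max)

* Absolute and relative frequencies of the education groups
tabulate ed

* summarize stores its results, so they can be reused afterwards
summarize whqkt
display "mean native wage: " r(mean)
```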

General structure of STATA
We will concentrate on: [prefix :] command [varlist] [if] [in] [weight] [, options]
if: Under which condition?

[prefix :] command [varlist] [if] [in] [weight] [, options]
• With [if] you can set a condition or impose restrictions.
• E.g. in order to obtain only the average income of migrants with the lowest education (no vocational training):
  > summarize wfqkt if ed == "no voc train"
• "no voc train" is a value of the string variable ed (therefore the quotation marks) and indicates that an individual has no vocational training.
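A restriction with [if] can be checked on toy data. In this sketch the wage values are invented and the second education label, "voc train", is a hypothetical placeholder.

```stata
* Hypothetical toy data (values are made up for illustration)
clear
input str20 ed ex year whqkt wfqkt
"no voc train" 1 1980 60 50
"no voc train" 1 1981 62 51
"voc train"    1 1980 80 70
"voc train"    1 1981 83 72
end

* Mean foreign wage over all observations
summarize wfqkt

* Mean foreign wage only for the group without vocational training
summarize wfqkt if ed == "no voc train"

* Conditions can be combined with & (and) or | (or)
summarize wfqkt if ed == "no voc train" & year == 1981
```

Note the double equals sign: == tests for equality, whereas a single = assigns a value.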

[prefix :] command [varlist] [if] [in] [weight] [, options]
How to create dummies?
What is a dummy variable? A dummy variable takes the value 0 or 1.
With STATA you can also create new variables from the data. To do so you need the commands “generate” and “replace”:
> gen ed1 = 0
> replace ed1 = 1 if ed == "no voc train"
Another example:
> gen ex1 = 0
> replace ex1 = 1 if ex == 1
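The two-step dummy construction can be tested on a few invented rows. The one-line alternative at the end is not on the slides; it relies on the fact that a logical expression evaluates to 1 when true and 0 when false.

```stata
* Hypothetical toy data (values are made up for illustration)
clear
input str20 ed ex year whqkt wfqkt
"no voc train" 1 1980 60 50
"no voc train" 2 1980 66 55
"voc train"    1 1980 80 70
"voc train"    2 1980 88 77
end

* Two steps as on the slide: start at 0, then switch to 1 where the condition holds
generate ed1 = 0
replace ed1 = 1 if ed == "no voc train"

generate ex1 = 0
replace ex1 = 1 if ex == 1

* One-line alternative (an addition, not from the slides)
generate ed1_alt = (ed == "no voc train")

* Check the result
tabulate ed1
```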

[prefix :] command [varlist] [if] [in] [weight] [, options]
How to calculate and transform numerical variables:
> generate newvar = var1 - var2
STATA knows the usual mathematical operations (+, -, /, logs, etc.)
Practice: create the log wage:
> generate ln_whqkt = ln(whqkt)

[prefix :] command [varlist] [if] [in] [weight] [, options]
How to modify variables/dummies?
> replace var = (var1 - var2)/2
STATA knows the usual mathematical operations (+, -, /, log, etc.)
Practice: replace the wage by the log wage only for the low-skilled:
> replace ln_whqkt = ln(whqkt) if ed == "no voc train"
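The difference between generate and replace is that generate creates a new variable, while replace changes one that already exists. A small sketch with invented values; the variable name ln_wfqkt is hypothetical and chosen only for this illustration.

```stata
* Hypothetical toy data (values are made up for illustration)
clear
input str20 ed ex year whqkt wfqkt
"no voc train" 1 1980 60 50
"no voc train" 1 1981 62 51
"voc train"    1 1980 80 70
"voc train"    1 1981 83 72
end

* New variable: log of the native wage for all observations
generate ln_whqkt = ln(whqkt)

* New variable that starts out missing (.) everywhere ...
generate ln_wfqkt = .

* ... and is filled in only for the group without vocational training
replace ln_wfqkt = ln(wfqkt) if ed == "no voc train"

list ed whqkt ln_whqkt wfqkt ln_wfqkt
```

Observations that do not satisfy the [if] condition keep their previous value, here the missing value.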

[prefix :] command [varlist] [if] [in] [weight] [, options]
How to create graphics?
> graph twoway line var1 year [if] [in]
STATA produces two-dimensional graphs with lines, bars, dots, scatter plots etc. with the “graph twoway” command; the type of graph is specified after it, e.g. “line”.
Practice: graph the development of native and foreign wages over the years in our sample for a given education and experience group.
> graph twoway line whqkt wfqkt year if ed == "no voc train" & ex == 1
> graph twoway scatter whqkt wfqkt if ed == "no voc train" & ex == 1
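The two graph commands can be tried on a handful of invented observations. This is a sketch only: the values are made up, and the sort option is an addition that makes sure the line is drawn in the order of the years.

```stata
* Hypothetical toy data (values are made up for illustration)
clear
input str20 ed ex year whqkt wfqkt
"no voc train" 1 1980 60 50
"no voc train" 1 1981 62 51
"no voc train" 1 1982 63 53
"voc train"    1 1980 80 70
"voc train"    1 1981 83 72
"voc train"    1 1982 85 74
end

* Native and foreign wages over time for one education-experience group
graph twoway line whqkt wfqkt year if ed == "no voc train" & ex == 1, sort

* Native wage plotted against foreign wage for the same group
graph twoway scatter whqkt wfqkt if ed == "no voc train" & ex == 1
```

In the line plot the last variable in the list (year) goes on the horizontal axis and all variables before it are drawn as separate lines.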

The do-file
• STATA also provides a do-file editor (= text editor), into which the commands can be written.
  - The do-file editor can be opened with the command “doedit”, by pressing “Ctrl + 8”, or by clicking the do-file button in the toolbar.
• How to execute commands in a do-file?
  - You write the commands into the text editor, then mark the text and press “Ctrl + D”.
  - If no text is marked, the whole do-file is executed. That can cause trouble if there is a mistake somewhere in your list of commands. (That happens in most cases.)

The do-file
• Reasons to use a do-file:
  - your work is documented and reproducible!
  - you can include comments in your work by putting a “*” at the very beginning of the line (they are automatically shown in green), e.g.
    > *load data
    > use "C:\User\...data1.dta", clear
    > *get an overview
    > describe
  - you can save your do-file: File -> Save
  - you can also open do-files: File -> Open
  - do-files have the extension “.do”
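A complete do-file along these lines might look as follows. It is a minimal sketch, not the file used in the seminar: the data are invented and typed in directly so that the file runs on its own, whereas a real do-file would load the dataset with use.

```stata
* demo do-file (hypothetical example)

* do not pause the output with -more-
set more off

* load data (toy values typed in here instead of a use command)
clear
input str20 ed ex year whqkt wfqkt
"no voc train" 1 1980 60 50
"no voc train" 1 1981 62 51
"voc train"    1 1980 80 70
"voc train"    1 1981 83 72
end

* get an overview
describe
summarize whqkt wfqkt

* generate some variables
generate ln_whqkt = ln(whqkt)
generate ed1 = 0
replace ed1 = 1 if ed == "no voc train"
```

Once saved, for example under the hypothetical name demo.do, the whole file can also be run from the command window by typing do followed by the file name.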

This is an example of a do-file. First I “set more off” and load the data. Second I use a command for panel regressions. Third I generate some variables. The remarks in stars explain what I'm doing.

Now I mark the lines that contain the commands I want to execute. Then I press the execute button.

Next Meeting: June 30, Room RZ 1.03!
