ECM607A: Microeconometrics 1
SET EXERCISE INFORMATION
Credit level: 7
Credits: 10
Module convenor: Shulin Deng (shulin.deng@reading.ac.uk)
This exercise is worth 50% of the final module marks for ECM607A
Exercise Background
This project is intended to give you practice in: using a panel dataset and Stata; selecting a
suitable empirical strategy (econometric estimators) to answer the set research question and
to test hypotheses; and interpreting output and drawing conclusions in relation to the research
question and hypotheses. The provided dataset is an extract from the Understanding Society
survey (USS), a UK survey which follows households over time (roughly 40,000 households
and their members) and collects data on a range of personal, socio-economic and attitude
variables. The extract contains data from the first nine waves (2009-2018), and includes the
general population and Northern Ireland sample, excluding ethnic minority/immigration
boost and BHPS samples. The USS is intended to be representative of the UK, and the
extract contains a selection of the variables available in the full USS dataset. The
data only includes individuals who completed a full interview in a given wave and, like other
longitudinal surveys, suffers from attrition; the sample you will use for your
estimations will therefore not be fully representative. You are not expected to deal with issues of a non-random sample in this project but should be aware of any potential issues.
The dataset has been transformed into panel (long) form, ordered by individual with a
separate row for each wave in which they have a full interview; the dataset is, therefore, not a balanced
panel. There may be various reasons individuals are missing from a wave. Firstly, they may
not have been part of the target sample at that point e.g. because they were under 16 or were
not living in the sample household; secondly, they may not have provided full (or any)
information (you may also find individuals cannot be used in a given wave if they have
information missing on variables of interest). In total there are 240,860 person-years in the
provided dataset. The following documents have been uploaded to Blackboard under the set
exercise area:
• US_data9W.dta – the Stata data file containing the Understanding Society survey extract
from the first nine waves (as mentioned, this is already in panel long form). To access
this dataset you need to agree to the data access agreement. You will
find this agreement under the Assignment content area; once you have followed the steps
you will be able to download the data.
• 6614_wave_1_to_9_ user_guide – a user guide for Understanding Society, which
reports information about the design of the survey (which is quite complex). This
guide does not provide much information about the variables, but you can find more
details about the variables provided in the extract in the on-line dataset documentation:
https://www.understandingsociety.ac.uk/documentation/mainstage/dataset-documentation. You
can also find the questionnaires for each wave in the online documentation which, where
relevant, will tell you the source of the questions. The selection of variables you have
been provided with is taken from the individual interview (data file indresp), the household
variables from the household interview (data file hhresp), and the fixed wave variables
from the stable characteristics of individuals file (data file xwavedat).
• US_varlist_202021.xlsx – an Excel file with the list of variables provided in the data (sorted
alphabetically and by theme; labels and the data file each variable comes from are provided on
the alphabetical tab), and the waves each variable is in. The handlers of the USS have
provided the data in Stata form, with labelled variables and variable values (where
relevant); again, for further details about variables see the on-line dataset documentation
(to find out the value labels of a variable you can also use the labellist command):
https://www.understandingsociety.ac.uk/documentation/mainstage/dataset-documentation.
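As a rough sketch of a first session with the extract, the commands below load the data, declare the panel structure and summarise who appears in which waves. The identifier names pidp (person) and wave are assumptions for illustration; check the variable list Excel file for the names actually used in the extract.

```stata
* Load the extract (assumes the file is in the current working directory)
use "US_data9W.dta", clear

* Overview of the variables and their labels
describe

* Declare the panel structure; pidp and wave are ASSUMED names for the
* person identifier and the wave indicator
xtset pidp wave

* Pattern of participation across waves (shows that the panel is unbalanced)
xtdescribe
```

Running xtdescribe early on is a quick way to see how many individuals are observed in all nine waves and how many drop in and out.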
Submission
All coursework has to be submitted on-line as a PDF file through the ‘Assignment Submission’
tab in the main menu on the left of the module Blackboard page. It is important that you are
officially registered to take the module for credit (as opposed to auditing it). You must submit
your work as only one file. Name your file using your University username. Instructions on
electronic submission of coursework can be found under the ‘Assignment Submission’ tab on
Blackboard.
You can write your work in any software. Please make sure that your work is anonymised.
Do not put your own name in the header or footer or the front page of your work. Blackboard
will identify your work by your username, so your coursework will not get lost in the system.
Deadline
Monday 11th January 2021, before 12 Noon
Marking Criteria
The following 4 marking criteria will be used when marking the projects.
1. 35% – Development of hypotheses and clear outline of methodology:
Clear and well developed hypotheses, with strong links to economic theory and/or
previous evidence/literature. Chosen methodology is fully developed, described, fully
understood and appropriate, with effective use of equations. The justification of
methodology makes critical reference to relevant literature. Limitations to the data
and chosen methods are well understood and discussed.
2. 35% – Application and analysis:
Effectively and correctly interpreted results, both in terms of economic and statistical
significance. Critical analysis of produced evidence and findings which has both
depth and breadth. Discussion of produced results rigorously linked to the research
question and hypothesis tests, and economic theory/evidence from the literature.
Construction of logical/convincing argument with conclusions supported by the
project findings; with implications discussed.
3. 20% – Use of Stata and the dataset:
Effective and correct use of the data and Stata commands to manage and clean data, and to
produce statistics, regression results and other results. Use of Stata beyond that
taught is desirable.
4. 10% – Presentation and Style:
Very well organised and structured work. Written expression is eloquent and flows
flawlessly. Results, tables and other figures very well and clearly presented with
detailed and concise information. References are all listed and cited correctly and
consistently.
1. Exercise Topic: Life Satisfaction
Life satisfaction – the dataset includes satisfaction with life overall but also includes other
satisfaction variables, such as satisfaction with income and health. These are measured on a scale
from Completely dissatisfied (1) to Completely satisfied (7).
Note the dataset also contains an alternative measure of subjective well-being
(GHQ).
2. Research Question
How do parenthood and marriage affect life satisfaction?
3. Exercise Structure
Undertake the following tasks included in each section (you are not limited to the suggested
section headings and you can use sub sections to help with the flow of your project):
Introduction
From the set research question (How do parenthood and marriage affect life satisfaction?),
list two hypotheses which you can test with the provided data. You can number your chosen
hypotheses so that you can easily refer back to them. You should focus on one or two
important factors about which you can make hypotheses and predictions, and then include
other factors as a set of controls. You need to set up any prior expectations in your
hypotheses. Following each of your hypotheses, you should clearly justify what you are
doing referring to the example journal articles provided in this exercise. Although a formal
literature review is not required, do make reference to any literature, economic theory and
other sources you used to develop your resulting hypotheses.
Data
Use married and non-married individuals as the population of interest. You should discuss the
dependent variable and your chosen key explanatory variables (e.g. what your dependent and
independent variables are, and why you select them). You should provide relevant descriptive
statistics (which may include graphics if you wish) of your dependent and independent
variables (such as means, standard deviations, max, min) as this may aid your interpretation
(especially if results are unexpected). Demonstrate your understanding of the data and
variables through the use of descriptive statistics.
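A minimal sketch of how such descriptive statistics might be produced is given below. The names lifesat, married and nkids are hypothetical stand-ins for life satisfaction, a marriage dummy and number of children; substitute the variables you actually choose.

```stata
* lifesat, married and nkids are HYPOTHETICAL variable names
* Distribution of the 1-7 satisfaction scale
tabulate lifesat
histogram lifesat, discrete percent

* Means, standard deviations, min, max and sample sizes in one table
tabstat lifesat married nkids age, statistics(mean sd min max n) columns(statistics)

* Mean life satisfaction by marital status
tabulate married, summarize(lifesat)
```

The output of these commands is raw Stata output, so it would still need tidying into a formatted table before it goes into the main text.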
Methodology
Choose one estimation method (estimator) from four methods: pooled OLS, random effects,
fixed effects, and ordered probit model. Discuss why you are using this estimation method
(e.g. do tests for fixed versus random effects), how the estimator can be used to test your
hypotheses, and how your chosen variables are measured. You should use regression
equations to demonstrate the specification of your population model (i.e. include variables in
your regression specification). Reference should be made to any literature which helped make
these decisions.
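To illustrate what a population model written as a regression equation might look like, one possible linear specification with an individual effect is sketched below. The regressors (a marriage dummy, a parenthood dummy and a vector of controls) are placeholders rather than a recommended specification.

```latex
LS_{it} = \beta_0 + \beta_1\,\mathit{Married}_{it} + \beta_2\,\mathit{Parent}_{it}
          + \mathbf{x}_{it}'\boldsymbol{\gamma} + \alpha_i + \varepsilon_{it}
```

Here $LS_{it}$ is the life satisfaction of individual $i$ in wave $t$, $\mathbf{x}_{it}$ is a vector of controls, $\alpha_i$ is a time-invariant unobserved individual effect and $\varepsilon_{it}$ is an idiosyncratic error. Pooled OLS and random effects require $\alpha_i$ to be uncorrelated with the regressors, whereas fixed effects allows them to be correlated. A hypothesis such as "marriage raises life satisfaction" would then correspond to testing $H_0: \beta_1 = 0$ against $H_1: \beta_1 > 0$.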
It is important to demonstrate your understanding of the estimators (estimation methods)
applied and justify their use. I don’t expect to see full derivations of the estimator used but
expect to see some regression equations, especially for the specification of your population
model(s) and the application of the estimator. You should provide enough information about
the chosen methods so that the reader could replicate your analysis.
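As a sketch of how the choice between fixed and random effects could be tested, the commands below estimate both models and run a Hausman test. All variable names (pidp, wave, lifesat, married, parent, age, agesq) are hypothetical and the regressor list is purely illustrative.

```stata
* Variable names below are HYPOTHETICAL; replace with those in the extract
xtset pidp wave

* Fixed effects (within) estimator
xtreg lifesat married parent age agesq, fe
estimates store fe

* Random effects estimator
xtreg lifesat married parent age agesq, re
estimates store re

* Hausman test: under H0 the random effects estimator is consistent and
* efficient; a rejection favours fixed effects
hausman fe re
```

Note that hausman needs the models estimated with default (non-robust) standard errors, so any clustered or robust standard errors you want to report would have to come from a separate estimation.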
Results and discussion
You should provide well-presented tables of your statistics and regression output; I don’t
want to see raw Stata output. Tips for formatting your project, especially tables, are given in
the appendix.
Interpret and discuss your results (including any specification tests) to help answer the
research question and provide the results of your hypothesis tests. Remember the aim is to
use an appropriate estimator to help answer the research question and test your hypotheses, so
this is a very important section. It is important to discuss both the statistical and economic
significance of the results – i.e. look at the magnitude, does the size of the effect matter, are
the results meaningful, what do they mean in reality, are they important, would policy makers
care about the results?
A key part of research is to be able to critically analyse your results. Therefore, do link results
back to the research question and hypotheses, and more widely discuss them in relation to
economic theory (do they support or contradict what economic theory would predict?),
findings from past literature (do they back up, contradict or develop further what others have
found?) and potentially the intuition behind the results (do they make sense, do they fit with
your prior expectations?). Given you are using a sample to make inferences about a
population of interest you also need to convince the reader that your results are good
estimates of the true population parameters. Therefore, you should discuss any specification
tests, assumptions and any measures to judge the goodness of fit. It is also common to
undertake a number of sensitivity/robustness checks to check your results are unchanged –
this may involve changing the specification of your model, using a different measure of your
variables of interest or even using a different estimator (e.g. you may show there is little
difference between pooled OLS and ordered probit model but much more substantial
differences between, say, random and fixed effects).
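One way such a robustness check could be set up is sketched below, comparing pooled OLS with a pooled ordered probit. The variable names are again hypothetical.

```stata
* Pooled OLS, treating satisfaction as continuous; standard errors are
* clustered by person because the same individuals appear in several waves
regress lifesat married parent age agesq, vce(cluster pidp)
estimates store pols

* Pooled ordered probit, treating satisfaction as ordinal
oprobit lifesat married parent age agesq, vce(cluster pidp)
estimates store oprob

* Side-by-side comparison of coefficients with significance stars
estimates table pols oprob, b(%9.3f) star stats(N)
```

The OLS and ordered probit coefficients are on different scales, so the comparison is best made in terms of signs, statistical significance and the relative size of coefficients rather than their raw magnitudes.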
Conclusions
The conclusions summarise main results and provide answers to the research question and the
results of your hypothesis tests. You can discuss any policy implications on the research
question. You should discuss any caveats to your results e.g. potential problems with the
estimators used, data limitations – these may appear in the conclusions or be discussed at
other relevant points (such as in the discussion of the methods or in the results section). You
can comment on potential avenues for future research.
References and appendices
Your reference list usually comes after the conclusions and before any appendices. You need
to ensure you have cited any references within the text correctly and provided all references
in your reference list. I do not mind which referencing system is used (e.g. the Harvard
system) so long as you are consistent. Please see the library guide on citing references
http://libguides.reading.ac.uk/citing-references. In particular these resources provide advice
on different reference systems and how to reference a variety of sources such as journal
articles, books, websites etc. There is also some advice on avoiding accidental plagiarism:
if you are directly quoting someone else’s work, or paraphrasing their ideas (which need to be written in
your own words), as supporting evidence, you need to acknowledge the source properly.
You can provide additional information in an appendix. You can report some results in the
appendix to demonstrate you have done them if you don’t want to report them (fully) in the
main body of the text; all appendices should be referred to at some point in the main text.
You must paste a copy of your do-file into the appendix (you do not need to refer to it in the main text), as
discussed below.
Stata commands
You do not need to provide details of the Stata commands used within your text, but you must
include a copy of your do-file in your appendix (you can copy and paste it into your
document) to show what commands you used to produce your output and
to help assess whether you implemented these commands correctly. It is recommended you leave in any comments
you have added so it is clear which commands produced which of the reported output.
Excluding references, appendices and your do-file the recommended word count for the
exercise is around 2,500 words. The Introduction to Stata course, Stata booklet and the PDF
help files, along with the class exercises, should contain all the information you need if you
are not familiar with certain Stata commands you want to use.
4. Example journal articles
You should select hypotheses which you can test with the provided data using some of the
econometric methods that you have learnt in the module. On the on-line reading list you will
find a few articles to get you thinking:
Two review articles on life satisfaction (relatively old but these should give you an idea of
some of the factors that have been looked at in the life satisfaction literature):
Dolan, P., Peasgood, T. and White, M. (2008) Do we really know what makes us happy? A
review of the economic literature on the factors associated with subjective well-being.
Journal of Economic Psychology, 29: 94–122.
MacKerron, G. (2012) Happiness economics from 35,000 feet. Journal of Economic Surveys,
26(4), 705-735
And an example journal article that used the BHPS (the predecessor to the USS; the
BHPS has been subsumed into the USS dataset, but these respondents are not included in your data):
Della Giusta, M., Jewell, S., and Kambhampati, U. (2011) Gender and Life Satisfaction in
the UK, Feminist Economics, 17 (3), 1-34
And a few more example journal articles on life satisfaction:
Helliwell, J. F. (2003). How’s life? Combining individual and national variables to explain
subjective well-being. Economic modelling, 20(2), 331-360
Margolis, R., & Myrskylä, M. (2012). Happiness: Before and After the Kids. MPIDR Working
Paper WP 2012-013
Nomaguchi, K. M. (2012). Parenthood and psychological well-being: Clarifying the role of
child age and parent– child relationship quality. Social science research, 41(2), 489-498
The above journal articles are just a few examples; you should not simply try to replicate
the work these and other papers have done, but come up with your own
hypotheses. You can search for papers that have used the USS data using their search
facility: https://www.understandingsociety.ac.uk/research/publications
5. Further Project Considerations
Things to consider (also refer to the steps for empirical research in the Introduction to
Econometrics for Research document):
How to Exploit the Panel Nature of the data?
How can you exploit the fact you have panel data? Note not all variables are available in all
waves so this may limit the waves you can use or you may decide that a variable can be
treated as fixed over time (common with personality variables, say). You have learnt how to
estimate both linear and non-linear (discrete) models in the module, that allow for unobserved
heterogeneity using panel data (linear models are much easier to estimate).
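Because fixed effects estimates rely on changes within individuals over time, it can be worth checking how much such variation your key variables actually have. A minimal sketch, assuming hypothetical variable names (pidp, wave, lifesat, married):

```stata
* Variable names are HYPOTHETICAL
xtset pidp wave

* Overall, between-person and within-person variation
xtsum lifesat married

* How many individuals are ever observed in each marital state
xttab married

* Transition probabilities between states from one wave to the next
xttrans married
```

If very few respondents change marital status across waves, a fixed effects coefficient on marriage would be identified from a small group of individuals, which is worth acknowledging as a limitation.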
Should you treat satisfaction as an ordered or continuous variable?
Do you want to treat satisfaction as an ordered or continuous variable – or compare methods
treating the variable as ordered and continuous? The following article is commonly cited as it
argues that whether you treat life satisfaction as a cardinal or ordinal variable (i.e. continuous
versus an ordered/discrete variable) does not tend to make much difference to results; rather, it
is more important to allow for unobserved heterogeneity.
Ferrer-i-Carbonell, A., & Frijters, P. (2004). How important is methodology for the estimates
of the determinants of happiness? The Economic Journal, 114, 641–659
Several studies have cited this article to justify their treatment of satisfaction as a continuous
dependent variable (strictly speaking, satisfaction is a discrete (ordered) variable). It is up to
you to determine how you want to treat your dependent variable and also what methods you
use (you do not necessarily have to follow what past studies have done, although past studies
may help guide you).
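If you want to compare the two treatments while still allowing for unobserved heterogeneity, one possible sketch is to set a random effects ordered probit against its linear counterpart. The variable names are hypothetical, and xtoprobit can be slow on a panel of this size.

```stata
* Variable names are HYPOTHETICAL
xtset pidp wave

* Random effects ordered probit (satisfaction treated as ordinal)
xtoprobit lifesat married parent age agesq

* Linear random effects model (satisfaction treated as continuous)
xtreg lifesat married parent age agesq, re
```

This is only one of several possible comparisons and is not a substitute for justifying your own choice of estimator.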
Data Tips
• The USS uses negative numbers to denote missing values, with the following codes: -9 –
missing; -8 – inapplicable; -2 – refused; -1 – don’t know. You can use the
mvdecode command to recode these values to Stata’s missing value (“.”).
• Some questions are only asked conditional on response to another question, for example,
questions relating to the labour market are only asked to respondents currently
participating in the labour market (the on-line dataset documentation provides an idea of
who is asked the question) – with the rest coded as inapplicable (-8) – so it is important
you understand which respondents are asked a question. I have tried to provide a brief
overview of who the question is asked to in the variable list Excel file; note you may find
occasionally some discrepancies between who was supposed to be asked the question and
who was, possibly as a result of interviewer/coder error.
• Because some of the income variables have imputed values, some income values
(for personal and household income) may be negative, in which case you
may want to exclude them (some researchers set negative values to zero, but this would
pose an issue if you wanted to take the log of income).
• Note that hourly wage (hourpay) is estimated by dividing usual pay by usual hours
(taking into account that pay is measured on a monthly basis and hours on a weekly basis),
using the following formula: (paygu_dv/(jbhrs+jbotpd))*(12/52). Hourly pay is only
derived for current employees (as the self-employed and those not in the labour market do not
report all the required information).
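The tips above could be put into practice along the following lines in a do-file. This is a sketch only: lifesat and hhinc are hypothetical variable names, and the hourly pay line reads the formula as monthly pay divided by total weekly hours (usual plus paid overtime), rescaled by 12/52. It may not reproduce the supplied hourpay variable exactly, for example if overtime hours were handled differently when it was derived.

```stata
* Recode the USS negative codes to Stata missing values for selected
* variables; lifesat is a HYPOTHETICAL name, the others appear in the text
mvdecode lifesat hiqual_dv jbhrs jbotpd, mv(-9 -8 -2 -1)

* Alternative: keep the reasons distinct using extended missing values
* mvdecode lifesat, mv(-9=.a \ -8=.b \ -2=.c \ -1=.d)

* Hourly pay from monthly pay and weekly hours, for checking purposes
generate hourpay_check = (paygu_dv / (jbhrs + jbotpd)) * (12/52) ///
    if paygu_dv > 0 & (jbhrs + jbotpd) > 0 & !missing(paygu_dv, jbhrs, jbotpd)

* Log income only for strictly positive values; hhinc is a HYPOTHETICAL name
generate lnhhinc = ln(hhinc) if hhinc > 0 & !missing(hhinc)
```

Applying mvdecode to a named list of variables, rather than to every variable at once, avoids accidentally recoding legitimately negative values such as imputed incomes.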
Students are reminded of the University’s penalty for late submission of work:
• where the piece of work is submitted after the original deadline (or any formally
agreed extension to the deadline): 10% of the total marks available for that piece of
work will be deducted from the mark for each working day (or part thereof) following
the deadline up to a total of five working days;
• where the piece of work is submitted more than five working days after the original
deadline (or any formally agreed extension to the deadline): a mark of zero will be
recorded.
Extensions and penalty remissions must be applied for on the Extenuating
Circumstances form.
Appendix: Project Tips
Finding Literature and Using Literature
You will likely undertake further reading. The majority of your references should be
academic sources. To help find journal articles and other academic sources refer to the
University library guide on resources in economics https://libguides.reading.ac.uk/economics
(in particular see the journal articles and E-resources tabs). You can search literature
databases (e.g. EconLit, a database specifically for economics articles) or source a reference
from the reference list of another paper (how to do these is explained in the link above).
Another commonly used tool is Google Scholar (https://scholar.google.com/), which returns a
wider range of search results than standard databases but, as the name suggests, includes only
scholarly sources (unlike a basic Google search, which is not recommended).
A formal literature review is not required but do get into the habit of critically
analysing/reviewing everything you read. More on critical analysis is provided below. The
following library guide https://libguides.reading.ac.uk/literaturereview gives an idea of how
to use and critically analyse literature in a research project, since the project contains elements
of a literature review. In particular you should make critical reference to relevant literature in
the introduction to help motivate your topic and research question (in your later research),
and develop your hypotheses; in the methodology to justify the chosen methods; and
critically link your discussion of results (and potentially the conclusions) and explanations of
your results to past literature. It is important to cite sources properly and provide a reference
list.
Guide to Academic Writing
Your project should be written in an academic way; if you are unsure how to do this please
refer to the library guides (especially if English is not your first language):
https://libguides.reading.ac.uk/?b=g&d=a. In particular look at the academic integrity toolkit
guide: https://libguides.reading.ac.uk/academicintegrity and the academic writing guide:
https://libguides.reading.ac.uk/writing. Academic writing should be critical, i.e. demonstrate
critical analysis and critical thinking, both of which are important parts of the marking criteria for this
project. See the academic writing guide link above for information on writing critically, as
well as http://www.phrasebank.manchester.ac.uk/being-critical/
When you are reading others’ research do not immediately take the findings, the arguments
etc. at face value. You should question what you read, you may agree with it, but you should
consider: Are there any issues with the sample, data or methods chosen, the arguments made?
Are there limitations to the work? Have other researchers criticised the work? What
contribution does the work make? You should then be able to state the reasons why you agree
or disagree with the findings or arguments of past studies. This should help you to make use
of and reference to past literature throughout your project.
More broadly critical writing within this project should involve making critical reference to
your own findings and other appropriate evidence (such as findings of past studies); in
particular demonstrating how you reached a given conclusion and providing implications and
explanations for your findings, arguments and ideas, backed with appropriate evidence.
Critical writing gets you to go beyond just providing a description of something (say a
finding or idea) and to consider, why? how? why is something important? The key thing is to
avoid being overly descriptive.
Project Formatting Tips
We have seen some useful commands such as outreg2 to export regression results in the
introductory computer class (a video on how to export results can be found under the ‘Stata’
tab). Your work should be formatted in a professional way, so I do not want to see raw Stata
output (unless in an appendix). You should present results succinctly and any results
presented should add value to your project, and be presented in a way the reader can
understand what the table is reporting. If you are not sure what output and statistics to include
in tables, or how to present results, have a look at some example journal articles that use the
econometric techniques you are using. You can put some results into an appendix if you want
to demonstrate you have done them but don’t want to report them (fully) in the main body; but
all appendices should be referred to at some point in the main text (with the exception of your
do-file). Make sure you report the statistics for any specification tests you perform – these may be
entered directly in tables or added to the text/a footnote. Here are some tips on how you can
present results.
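As a reminder of the outreg2 workflow mentioned above, a minimal sketch of exporting two regressions side by side is given here. The variable names lnwage, agesq and married, the output file name and the coding of sex (1 for men, 2 for women) are assumptions for illustration.

```stata
* outreg2 is user-written; install once with: ssc install outreg2
* lnwage, agesq and married are HYPOTHETICAL names; sex == 1 / 2 for
* men / women is an ASSUMED coding
regress lnwage i.hiqual_dv age agesq married if sex == 1
outreg2 using wage_results.doc, replace ctitle(Men) dec(3) label

regress lnwage i.hiqual_dv age agesq married if sex == 2
outreg2 using wage_results.doc, append ctitle(Women) dec(3) label
```

The first call creates the file and the second appends a new column, so the exported table already has one column per model before you tidy it up.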
1. Variable descriptions
If you have a lot of variables (control variables) you may not want to describe each in detail,
in which case including a table of variable descriptions can be a useful way to present this
information succinctly. There are several ways to do this and I have given a couple of
examples below. Firstly, you may want to separate variables into your dependent variable(s),
variables of interest and control variables. If you have categorical variables you may simply
want to list the categories or you may prefer to list each dummy variable you will include
(including the base category which you can indicate is the base category). If you use
shortened names elsewhere, here is a good place to define them. If you have generated any
new variables or made any adjustments to variables you can also include notes on how they
were generated/adjusted here if you wish.
| Variable name | Variable description | Notes/categories |
| Dependent variable | | |
| Logwage | Log of hourly pay | Only includes values between 0-99 |
| Variables of interest | | |
| Hqual | Highest qualification | Categories of: Higher degree, First degree, Other higher education qualification, A-levels or equivalent, GCSE or equivalent, Other qualifications, No qualifications |
| OR | | |
| Hqual1 | =1 if highest qualification is a higher degree, 0 otherwise | |
| Hqual2 | =1 if highest qualification is a first degree, 0 otherwise | |
| …. | | |
| Hqual7 | =1 if highest qualification is no qualifications, 0 otherwise | |
| …. | | |
| Control variables | | |
| ….. | | |
If you have quite a bit of detail in this table, it is probably best to put it in the appendix (if
you want to put it in the main text, it is best placed in the data or methodology section) but
refer readers to the table in the text for more information. Some researchers like to add basic
statistics to such a table such as observations, mean, standard deviation and potentially
maximum and minimum values. So something like:
| Variable (you may use shortened variable names if defined elsewhere) | Variable notes/details (if relevant) | N (if for a regression sample N may be the same for all variables) | Mean | Standard deviation | Maximum | Minimum |
However, if this creates a cluttered and large table you may wish to have a separate table for
the statistics (so the second and third column would not appear). You will see in the next
section how to obtain basic statistics for your regression variables.
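One common way to make sure the statistics in such a table refer to exactly the observations used in a regression is to restrict them to the estimation sample. A sketch, with hypothetical variable names:

```stata
* Variable names are HYPOTHETICAL
xtset pidp wave
xtreg lifesat married parent age agesq, fe

* Summary statistics restricted to the observations used in the regression
summarize lifesat married parent age if e(sample)

* Flag the estimation sample so it can be reused for later tables
generate byte insample = e(sample)
```

Because e(sample) is overwritten by the next estimation command, storing it in a variable such as insample keeps the same sample available for later descriptive tables.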
2. Formatting Descriptive Statistics
The descriptive statistics reported by Stata are not well formatted plus you may want to
combine statistics from different output provided by different commands. Therefore, I
recommend copying tables into Excel where you can re-arrange statistics and format tables
nicely. When creating tables ensure the reader has all the information needed to understand
the table, so include an informative table title, relevant information within the table (such as
column/row headings) and add any relevant notes to the bottom of the table. As an example,
let’s assume we want to present a table that includes qualification distribution by gender,
average earnings by gender and qualification, and tests of statistical significance. I have
pasted in some example raw Stata output (but not the individual t-tests for each qualification)
using the USS extract in the MWP_example supporting document:
. tab hiqual_dv sex if age>=16 & age<=64, col chi2 nofreq

            highest |     Sex, derived
      qualification |      Male     Female |     Total
--------------------+----------------------+----------
      Higher degree |     11.92      10.39 |     11.05
       First degree |     14.96      16.63 |     15.91
other higher degree |     11.26      14.18 |     12.92
        A-level etc |     25.56      20.79 |     22.85
           GCSE etc |     21.71      23.80 |     22.90
Other qualification |      8.28       6.84 |      7.46
   No qualification |      6.32       7.36 |      6.91
--------------------+----------------------+----------
              Total |    100.00     100.00 |    100.00

          Pearson chi2(6) = 134.9596   Pr = 0.000

. tab hiqual_dv sex if age>=16 & age<=64, su(hourpay) nofreq nostandard

       Means of Estimated Hourly Wage

   highest |
qualificat |     Sex, derived
       ion |      Male     Female |     Total
-----------+----------------------+----------
 Higher de | 21.440336  17.663467 | 19.400917
 First deg | 19.312635  15.543042 |  17.06648
 other hig |  15.25116  11.856449 | 13.146183
 A-level e |  12.08916   9.310293 | 10.627779
  GCSE etc | 10.888576  8.7987461 | 9.6723824
 Other qua | 10.975256  8.6415385 |  9.806944
 No qualif | 8.4026171  8.1975929 | 8.2883194
-----------+----------------------+----------
     Total | 14.555828  11.772552 | 12.981553

Based on pasting the above output into Excel, I have then created the following table:
Table X: Qualification Distribution and Average Wage, by Gender
| | % Distribution: Men | % Distribution: Women | Mean Pay (£): Men | Mean Pay (£): Women |
| Higher degree | 11.92 | 10.39 | 21.44 | 17.66 |
| First degree | 14.96 | 16.63 | 19.31 | 15.54 |
| Other higher qualification | 11.26 | 14.18 | 15.25 | 11.86 |
| A-level | 25.56 | 20.79 | 12.09 | 9.31 |
| GCSE | 21.71 | 23.80 | 10.89 | 8.80 |
| Other qualification | 8.28 | 6.84 | 10.98 | 8.64 |
| No qualification | 6.32 | 7.36 | 8.40 | 8.20 |
| Total | 100.00 | 100.00 | 14.56 | 11.77 |
Source: own estimates using wave 4 of the Understanding Society Survey.
Sample includes those of working age (16 to 64) and includes 8,918 men and 11,724 women, with 5,587 men and 7,275 women reporting a wage.
* Qualification distribution is statistically significantly different by gender (chi2: 134.96; Pr = 0.000).
* Average earnings are statistically significantly different by gender at the 1% level, and statistically significantly different by gender for each qualification at the 1% level, except for no qualification, where the difference is insignificant.
A few things to note about the above example table (there’s a fine balance between providing
enough information and overloading a table so as to make it difficult to read); you do not
need to do things exactly as I have done (so long as you heed the advice):
• Titles should be informative and clear but not too long, you can add additional notes at
the bottom of the table
• Numbers should be rounded to make the table easier to read and for numbers to fit! I have
used 2 decimal places; you should typically round to 1 or 2 decimal places in tables
presenting descriptive statistics. However, for regression output commonly 3 decimal
places are used.
• You should provide the source of data in notes to the table.
• You should provide additional information about the sample: in this case the sample used
were of working age– the information may be contained in notes to tables if detailed or
even in the title if short. Where the sample size is not included in the table (say because
you are using percentages across several variables) indicate this in notes to the table. In
this case I listed the number of men and women underlying the estimates in the table, as
all estimates were separate by gender. Note the number of observations was different for
the % distribution and the average wages.
• Ensure the reader understands the numbers in the table e.g. if they are percentages (make
sure it is clear whether and how they add up to 100), frequencies or in other units (such as
£ for average wage) and what is contained in a given cell, row or column.
• If you have undertaken any statistical tests you can add any test statistics to the (notes to
the) table or if this is too messy (as is the case for the individual t-tests in this example) add
a note to say differences are statistically significantly different (and at what level: usually
people report at the 10, 5 or 1% level – in this case all significant differences are
significant at the 1% level): you’ll see a succinct way to do this in the regression output.
You can also add any test statistics to footnotes if you want to discuss test results within
the text but not report them in a table.
• Note I often paste in tables and Stata output into documents in quite small font to make
them fit and to save space. You should make sure for your projects that the font is a
reasonable size for the reader and, if need be, present results in several tables. At least 10
is a good font size and I would not go below a font size of 8.
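The descriptive statistics and tests behind a table like the one above could be produced
along the following lines. This is only a minimal sketch: it uses Stata's built-in nlsw88
example dataset rather than the project data, and collgrad, union and wage are that dataset's
variables, standing in here for qualification, gender and earnings.

```stata
* Illustrative only: built-in example data, not the Understanding Society extract
sysuse nlsw88, clear

* Column percentages of one categorical variable by group, with a chi-squared test
tabulate collgrad union, column chi2

* Group means, standard deviations and sample sizes, displayed to 2 decimal places
tabstat wage, by(union) statistics(mean sd n) format(%9.2f)

* Two-sample t-test of mean wages across the two groups
ttest wage, by(union)
```

The chi-squared statistic and the t-test results are the kind of figures you would summarise
in the notes to the table rather than report in full.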
3. Formatting Regression Output
You will have already seen outreg2 as a means of exporting regression output. As an example, I
exported the results by gender that we obtained in the MWP_example supporting document and
tidied up the table:
Table X: Wage Equations by Gender
| | Men | Women |
| Highest qualification (ref: No qualification) | | |
| Higher degree | 0.410*** | 0.480*** |
| | [0.025] | [0.022] |
| First degree | 0.312*** | 0.392*** |
| | [0.024] | [0.019] |
| Other higher degree | 0.119*** | 0.134*** |
| | [0.023] | [0.019] |
| A-level or equivalent | -0.149*** | -0.128*** |
| | [0.020] | [0.017] |
| GCSE or Equivalent | -0.248*** | -0.223*** |
| | [0.032] | [0.025] |
| Other qualification | -0.400*** | -0.291*** |
| | [0.032] | [0.034] |
| Age | 0.084*** | 0.076*** |
| | [0.005] | [0.004] |
| Age Squared | -0.001*** | -0.001*** |
| | [0.000] | [0.000] |
| Married | 0.110*** | 0.033*** |
| | [0.018] | [0.013] |
| Age of the Youngest Child (ref: no children under 16) | | |
| Aged 0-2 | 0.060*** | -0.001 |
| | [0.022] | [0.025] |
| Aged 3-4 | 0.012 | -0.003 |
| | [0.030] | [0.029] |
| Aged 5-11 | 0.076*** | -0.082*** |
| | [0.023] | [0.018] |
| Aged 12-15 | 0.015 | -0.112*** |
| | [0.033] | [0.021] |
| White | 0.261*** | 0.048* |
| | [0.027] | [0.026] |
| Region (ref: South East) | | |
| North East | -0.192*** | -0.120*** |
| | [0.033] | [0.030] |
| North West | -0.156*** | -0.068*** |
| | [0.028] | [0.026] |
| Yorkshire and the Humber | -0.177*** | -0.096*** |
| | [0.034] | [0.025] |
| East Midlands | -0.137*** | -0.109*** |
| | [0.029] | [0.024] |
| West Midlands | -0.085*** | -0.070*** |
| | [0.029] | [0.027] |
| East of England | -0.063** | -0.026 |
| | [0.028] | [0.025] |
| London | 0.084** | 0.086*** |
| | [0.035] | [0.031] |
| South West | -0.188*** | -0.073*** |
| | [0.032] | [0.025] |
| Wales | -0.289*** | -0.093*** |
| | [0.039] | [0.029] |
| Scotland | -0.135*** | -0.056** |
| | [0.030] | [0.025] |
| Northern Ireland | -0.226*** | -0.061** |
| | [0.033] | [0.030] |
| Constant | 0.373*** | 0.554*** |
| | [0.089] | [0.073] |
| Observations | 5,518 | 7,188 |
| R-squared | 0.336 | 0.299 |
| F test | 114.3 | 120 |
| Prob > F | 0 | 0 |
| Own estimates using wave 4 of the Understanding Society Survey | | |
| Robust standard errors in brackets | | |
| *** p<0.01, ** p<0.05, * p<0.1 | | |
Tips on formatting:
• Again, make sure table titles and any column headings are informative (additional
information can be added to the notes). It is common for the dependent variable to be
mentioned in the title or, where there are different dependent variables, for these to be
mentioned in column headings (in this case the column headings indicate gender,
since we were running separate models by gender). You can add any further
information about the dependent variable(s) to the table notes.
• Variable names should be clear. You can always have a table in the appendix that
provides further information on variables (similar to that shown earlier in this
document), such as full names if you use shortened ones, details of how the variables
were constructed and any other details about them; you can refer the reader to this table
in any notes to the table.
• Base categories for categorical variables should be clear. I put them in brackets within
the table, but some prefer to have a list of base categories in the table notes.
• Notes are useful for providing information on samples and for helping the reader to
interpret the output in the table. Note that outreg2 automatically adds stars for
significance levels and reports what is included in the brackets (in this example we
have robust standard errors in brackets).
• The stars for statistical significance are incredibly useful, as they save the reader from
having to work out the level of statistical significance and make for easier reading
(as p-values or t-statistics do not need to be reported).
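A table in the style of the one above could be exported roughly as follows. This is a sketch
under assumptions: it uses Stata's built-in nlsw88 dataset, splits the sample by union
membership purely as a stand-in for gender, and the file name wage_table.doc is made up.
outreg2 is a user-written command and must be installed first (ssc install outreg2).

```stata
* Illustrative only: built-in example data; requires outreg2 (ssc install outreg2)
sysuse nlsw88, clear
generate lnwage = ln(wage)

* First column: replace any existing file, give the column a heading, use 3 decimal
* places, put standard errors in square brackets, show variable labels not names
regress lnwage i.collgrad c.age##c.age i.married if union == 1, vce(robust)
outreg2 using wage_table.doc, replace ctitle(Union) dec(3) bracket label

* Second column: append the other group's estimates to the same table
regress lnwage i.collgrad c.age##c.age i.married if union == 0, vce(robust)
outreg2 using wage_table.doc, append ctitle(Non-union) dec(3) bracket label
```

The exported file will still need tidying by hand, for example to add the title, the data
source and the reference categories.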
In the above example I have reported all results in the table, but you may want to report only
selected key results, and there are a number of ways of doing this. With the above table you
could have excluded the constant (quite common, as researchers rarely interpret it) and any
variables you wanted to control for but not show, and then added a note to the table to
say additional controls were included for….
If, for example, you wanted to show how coefficients change as you add further controls
without reporting the controls themselves, one way of doing this is as follows. I have omitted
the regression output for space and made up some additional controls for illustration. At the
bottom I have included a list of groups of controls, with a Y if they are included in a
particular estimation and an N if they are not.
Table X: TITLE
| | 1 | 2 | 3 |
| REGRESSION OUTPUT | | | |
| Region | Y | Y | Y |
| Occupation | N | Y | Y |
| Personality | N | N | Y |
ADD STATS AND NOTES
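One possible way to build such a table with outreg2 is sketched below, using its keep(),
nocons and addtext() options. The data are again Stata's built-in nlsw88 dataset, the choice
of south and smsa as "region" controls and i.occupation as "occupation" controls is purely
illustrative, and the file name controls_table.doc is made up.

```stata
* Illustrative only: built-in example data; requires outreg2 (ssc install outreg2)
* Run from a do-file: /// continues a command onto the next line
sysuse nlsw88, clear
generate lnwage = ln(wage)

* Model 1: variables of interest only
regress lnwage collgrad tenure, vce(robust)
outreg2 using controls_table.doc, replace ctitle(1) dec(3) bracket ///
    keep(collgrad tenure) nocons addtext(Region, N, Occupation, N)

* Model 2: add region controls but do not report their coefficients
regress lnwage collgrad tenure south smsa, vce(robust)
outreg2 using controls_table.doc, append ctitle(2) dec(3) bracket ///
    keep(collgrad tenure) nocons addtext(Region, Y, Occupation, N)

* Model 3: add occupation dummies as well
regress lnwage collgrad tenure south smsa i.occupation, vce(robust)
outreg2 using controls_table.doc, append ctitle(3) dec(3) bracket ///
    keep(collgrad tenure) nocons addtext(Region, Y, Occupation, Y)
```

Here keep() restricts the reported rows to the variables of interest, nocons drops the
constant, and addtext() adds the Y/N rows at the bottom of each column.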
If you only want to report some results for brevity in the main text but want the reader to
have access to the full results you can put full results in the appendix and refer the reader to
the appendix should they wish to see the full results.
Note that if you include interaction terms using the factor-variable syntax, outreg2 exports
the base categories, which are reported as 0; you should exclude any row that contains just 0s
from your final regression output table.
4. Other Formatting Tips
Numbering of Sections
It is important to number sections (e.g. 1. Introduction, 2. Methodology and data, etc.), as this
helps with the organisation and flow of your work; it also means you can refer easily to past
or future sections. You may also want to use sub-sections (e.g. 1.1, 1.2, etc.) or sub-headings
within sections/sub-sections to help organise your sections and guide the reader.
Numbering of tables and figures
Make sure to number any tables or figures so they can be easily referred to. All tables and
figures should be referred to within the text. These can be numbered in the order they appear
throughout the document (e.g. Table 1, Table 2…, Figure 1, Figure 2…), or you may number
them according to which section they are in: if they appear in section 1 use Table 1.1,
Table 1.2…; if they appear in section 2 use Table 2.1, Table 2.2, etc.
Numbering of appendices
You should also number any sections in your appendices and number tables/figures (e.g.
Table A1, Table A2, etc.), so you can refer to them in the main body of your text.
Contents page and List of Tables/Figures
You may include a contents page and List of Tables/Figures at the start of your project.
5. Further Tips on discussing results:
When writing up results and using them to help answer the research question, you do not
have to go into detail about every variable or, as mentioned, report all variables. You should
focus more on the variables of interest (particularly those linked to the research question) and
need only comment briefly on any control variables (variables that affect your dependent
variable but are not your main interest) to discuss whether they make sense. If the signs,
significance and size of the coefficients of your control variables are consistent with past
literature, this gives you some confidence in your estimated model. In some cases, you may
simply state that you controlled for certain variables. If the coefficients for a variable do not
make intuitive sense or fit with past studies, you should first check that you have specified your
variables correctly and that there are no highly correlated variables. Commands for correlating
variables are correlate and pwcorr (the latter command allows you to test the
significance of the correlations). However, if you cannot identify any specification issues
then you should consider alternative explanations for why you have counterintuitive results
(this is part of the critical analysis element that is crucial to your projects).
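As a brief illustration of the two correlation commands, the sketch below again uses Stata's
built-in nlsw88 dataset; the variables chosen are simply examples from that dataset and not
the variables in the project extract.

```stata
* Illustrative only: built-in example data, not the Understanding Society extract
sysuse nlsw88, clear

* Correlation matrix using only observations with no missing values on any listed variable
correlate wage tenure ttl_exp age

* Pairwise correlations with p-values, a star for significance at the 5% level,
* and the number of observations used for each pair
pwcorr wage tenure ttl_exp age, sig star(0.05) obs
```

A very high correlation between two explanatory variables would be one reason to revisit the
specification before looking for other explanations.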