BEAM079 Coding Analytics for Accounting and Finance Assignment | UOE

Category: Assignment
Subject: Accounting
University: University of Exeter
Module Title: BEAM079: Coding Analytics for Accounting and Finance

Assignment Option (A) – Social Network Analysis

Requirement

This assignment investigates Social Network Analysis and requires you to evaluate a network of company directors.

Using data from BoardEx (available via WRDS), you are tasked with compiling a network of company directors using your knowledge of Social Network Theory. The network should consist of nodes (the directors) and edges. Two nodes should be connected by an edge only if the directors sit on the same company board in the same year.
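
The edge rule can be expressed as a short Python sketch. It assumes the BoardEx extract has already been reduced to `(director_id, company_id, year)` rows. The real WRDS field names will differ and are not assumed here. Directors are grouped by board-year, and every pair within a group becomes one undirected edge. The `rows` sample and its IDs are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_director_edges(memberships):
    """Return undirected director-director edges from board memberships.

    memberships: iterable of (director_id, company_id, year) tuples
    (a hypothetical, pre-cleaned layout, not the raw BoardEx schema).
    Two directors are linked only if they sit on the same board in the same year.
    """
    boards = defaultdict(set)  # (company_id, year) -> directors on that board
    for director, company, year in memberships:
        boards[(company, year)].add(director)

    edges = set()
    for directors in boards.values():
        # sorting gives each pair one canonical orientation, so (a, b) and (b, a) are not both stored
        for a, b in combinations(sorted(directors), 2):
            edges.add((a, b))
    return edges


rows = [
    ("D1", "AcmeCo", 2023), ("D2", "AcmeCo", 2023), ("D3", "AcmeCo", 2023),
    ("D3", "BetaPlc", 2023), ("D4", "BetaPlc", 2023),
    ("D4", "AcmeCo", 2022),  # different year, so D4 is not linked to AcmeCo's 2023 board
]
print(sorted(build_director_edges(rows)))
# [('D1', 'D2'), ('D1', 'D3'), ('D2', 'D3'), ('D3', 'D4')]
```

If you work with `networkx`, you can load the edge set with `G = nx.Graph()` followed by `G.add_edges_from(edges)`.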

Your network should span at least one whole year and include the latest full (calendar) year of data from BoardEx (currently 2023). BoardEx has three databases of companies and their directors (US, UK and Europe); you may use any one of these for your analysis.

Your analysis should include a statistical investigation into the network you create. Who are the key, central and influential players in the network? Why are they so influential? Do they often represent a particular type of company or companies, or have particular characteristics?
To answer these questions, you should describe the network using the centrality measures covered in class: Degree, Betweenness, Closeness and Eigenvector centrality.
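
`networkx` can compute all four measures (`degree_centrality`, `betweenness_centrality`, `closeness_centrality`, `eigenvector_centrality`). To make the definitions explicit, the sketch below implements them in plain Python instead. It assumes an unweighted, undirected graph stored as a dictionary of neighbour sets, which is a hypothetical layout. The normalisations follow common conventions:

- degree is divided by n - 1;
- betweenness uses Brandes' algorithm scaled by the number of node pairs;
- closeness is scaled by the reachable share of the graph (Wasserman-Faust);
- eigenvector centrality comes from power iteration on A + I and is scaled to unit length.

These conventions are meant to be close to networkx's defaults, but you should check them against the library you use. Eigenvector centrality is best computed on the largest connected component.

```python
import math
from collections import deque


def centralities(adj):
    """Degree, closeness, betweenness and eigenvector centrality.

    adj: dict mapping each node to a set of neighbours (undirected, unweighted).
    Returns four dicts keyed by node.
    """
    nodes = list(adj)
    n = len(nodes)

    # Degree: share of the other n - 1 directors each director is linked to
    degree = {v: len(adj[v]) / (n - 1) for v in nodes}

    closeness = {}
    betweenness = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # Breadth-first search from s, recording shortest-path counts (Brandes, 2001)
        dist, sigma = {s: 0}, dict.fromkeys(nodes, 0)
        sigma[s] = 1
        preds = {v: [] for v in nodes}
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)

        # Closeness, scaled by the reachable share of the graph (Wasserman-Faust)
        reach, total = len(dist) - 1, sum(dist.values())
        closeness[s] = (reach / total) * (reach / (n - 1)) if total else 0.0

        # Accumulate pair dependencies in reverse BFS order
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                betweenness[w] += delta[w]

    # Each unordered pair was counted from both ends; divide by 2 * C(n-1, 2)
    scale = 1 / ((n - 1) * (n - 2)) if n > 2 else 0.0
    betweenness = {v: b * scale for v, b in betweenness.items()}

    # Eigenvector: power iteration on (A + I); the shift prevents oscillation on bipartite graphs
    x = dict.fromkeys(nodes, 1.0 / n)
    for _ in range(1000):
        new = {v: x[v] + sum(x[u] for u in adj[v]) for v in nodes}
        norm = math.sqrt(sum(val * val for val in new.values()))
        new = {v: val / norm for v, val in new.items()}
        converged = sum(abs(new[v] - x[v]) for v in nodes) < n * 1e-10
        x = new
        if converged:
            break
    return degree, closeness, betweenness, x


deg, clo, bet, eig = centralities({"a": {"b"}, "b": {"a", "c"}, "c": {"b"}})
print(bet)  # {'a': 0.0, 'b': 1.0, 'c': 0.0}
```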

Using the Python skills acquired within this module (and elsewhere), your report should provide (but is not limited to) the following:

An introduction – A general discussion of Social Network Analysis, the potential application to corporate director networks, what your research goals are, and what you hope to achieve through your analysis.

A literature review – You have one main strand of literature to assess: Social Network Theory and, in particular, its use in the study of business power and influence. Remember that a literature review should be a cohesive discussion of extant literature and how it relates to your research; it is the main indicator of whether you understand the subject area.

Methodology – What statistical tests/models are you going to perform and why?

Data – A description of the data and its origin, including summary statistics of the key variables where appropriate, as well as the size of the network and other pertinent information.
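
The basic size statistics can be computed with a short standard-library sketch. It uses the same hypothetical `{node: set_of_neighbours}` layout as above. Density is the share of all possible director pairs that are actually linked. A corporate director network usually splits into several connected components, so the size of the largest component is worth reporting alongside the totals. The `networkx` equivalents are `G.number_of_nodes()`, `G.number_of_edges()`, `nx.density(G)` and `nx.number_connected_components(G)`.

```python
def network_summary(adj):
    """Size statistics for an undirected graph stored as {node: set_of_neighbours}."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2  # each edge appears in two neighbour sets
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0

    # Connected components by iterative depth-first search
    seen, sizes = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            v = stack.pop()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        sizes.append(size)

    return {
        "nodes": n,
        "edges": m,
        "density": density,
        "mean_degree": 2 * m / n if n else 0.0,
        "components": len(sizes),
        "largest_component": max(sizes, default=0),
    }
```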

Analysis – You are required to analyse the network as a whole and discuss the most influential directors across the network. The four centrality measures may produce different rankings, so you may wish to formulate an “overall score” that combines all four measures.
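
One possible way to build such a score is shown below. This is an assumption, not a prescribed method. Each measure is converted into a percentile rank (0 = least central, 1 = most central, with tied values sharing their average rank), and the ranks are averaged with equal weights. Ranking sidesteps the fact that the four measures are on different scales. Other weightings are equally defensible, and the example values are hypothetical.

```python
def percentile_ranks(scores):
    """Map {director: value} to ranks scaled to [0, 1]; tied values share their average rank."""
    ordered = sorted(scores, key=scores.get)
    denom = max(len(ordered) - 1, 1)
    ranks, i = {}, 0
    while i < len(ordered):
        j = i
        # extend j over the run of tied values starting at position i
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        for d in ordered[i:j + 1]:
            ranks[d] = (i + j) / 2 / denom
        i = j + 1
    return ranks


def overall_score(measures):
    """Equal-weighted mean percentile rank across centrality measures.

    measures: {measure_name: {director: value}}, every inner dict covering the same directors.
    """
    ranked = [percentile_ranks(scores) for scores in measures.values()]
    directors = list(next(iter(measures.values())))
    return {d: sum(r[d] for r in ranked) / len(ranked) for d in directors}


measures = {
    "degree": {"A": 3, "B": 1, "C": 2},
    "betweenness": {"A": 0.5, "B": 0.0, "C": 0.0},
}
print(overall_score(measures))  # {'A': 1.0, 'B': 0.125, 'C': 0.375}
```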

Visualisation – Visualise your network. This can be done in any way you see fit. Make sure to explain your visualisation and how it was created. You may wish to colour code the network by community, or by industry, for extra visual appeal. The examples and links given to you in class will help with this.

Conclusion – What are your key findings and what are the implications?

Python code – A code (*.py) file which documents each stage of your analysis (uploaded as a separate file).

You may expand or explore this topic in any way you feel appropriate – provided that the key areas highlighted above are covered.

Assignment Option (B) – Fraud Detection

Requirement

This assignment concerns fraud detection and asks you to investigate the application of Benford's Law as a tool for detecting financial reporting fraud.

A list of 33 companies has been provided for you. These companies have been determined by Audit Analytics not only to have misrepresented their accounts, but to have done so in a fraudulent manner. The details of each fraud and the affected account-years are available on ELE within the Assignment section. Note that only the last 10 full years (2014-2023) are to be used.

Using financial data, your task is to provide empirical analysis and establish whether these fraudulent accounts can be detected by Benford's Law.

In particular, you should analyse these fraudulent reports and compare the results to companies that are deemed to be clean. The clean companies can be any of your choosing. The clean sample can either be comparable to those which are fraudulent (e.g. in terms of size and industry), or simply a larger population of remaining firms.

Your analysis should include a comparison of the mean absolute deviation (MAD), Kolmogorov-Smirnov (KS) and chi-squared tests for both groups, including tests of difference where possible.

You are also required to perform separate analyses, not only on the financial data as a whole, but also by type of information, i.e. balance sheet, income statement and cash flow items. Comparisons should be made with the findings of Amiram et al. (2015).
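
The sketch below computes the three first-digit conformity statistics. It groups the data so that the same function serves the whole sample and the balance sheet, income statement and cash flow subsets. Each observation is assumed to be already tagged with a hypothetical firm-year identifier and a statement label (`"BS"`, `"IS"`, `"CF"`). Deciding which WRDS variables belong to which statement is left to you. In the spirit of Amiram et al. (2015), who score conformity at the firm-year level, the grouping key is (firm-year, statement), with an extra `"ALL"` group for the combined data. The statistics are defined as follows:

- MAD is the mean absolute gap between observed and expected digit proportions.
- KS is the largest gap between the cumulative distributions.
- The chi-squared statistic has 8 degrees of freedom (9 digits minus 1).

```python
import math
from collections import defaultdict

# Expected first-digit proportions under Benford's Law: P(d) = log10(1 + 1/d)
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(x):
    """Leading significant digit of a number, or None for zero or NaN."""
    if x == 0 or math.isnan(x):
        return None
    return int(f"{abs(x):e}"[0])  # scientific notation puts the leading digit first


def benford_stats(values):
    """MAD, KS and chi-squared conformity statistics for a collection of numbers."""
    digits = [d for d in map(first_digit, values) if d is not None]
    n = len(digits)
    if n == 0:
        raise ValueError("no non-zero values")
    observed = {d: digits.count(d) / n for d in range(1, 10)}

    mad = sum(abs(observed[d] - BENFORD[d]) for d in range(1, 10)) / 9
    ks, cum_obs, cum_exp = 0.0, 0.0, 0.0
    for d in range(1, 10):
        cum_obs += observed[d]
        cum_exp += BENFORD[d]
        ks = max(ks, abs(cum_obs - cum_exp))
    chi2 = n * sum((observed[d] - BENFORD[d]) ** 2 / BENFORD[d] for d in range(1, 10))
    return {"n": n, "mad": mad, "ks": ks, "chi2": chi2}


def stats_by_statement(items):
    """items: iterable of (firm_year, statement, value) with statement in {"BS", "IS", "CF"}."""
    groups = defaultdict(list)
    for firm_year, statement, value in items:
        groups[(firm_year, statement)].append(value)
        groups[(firm_year, "ALL")].append(value)  # combined financial data for the firm-year
    return {key: benford_stats(vals) for key, vals in groups.items()}
```

A firm-year with only a handful of non-zero line items gives a noisy digit distribution. You may therefore want to set a minimum `n` before comparing groups.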

You may also wish to take the type of fraud into account when performing your analysis. The column named “RES_FRAUD_RES_CAT_FKE_LIS” details the type of fraud. For example, [6] “revenue recognition issues” would suggest that the fraud is more likely to be detected in the income statement than in the other statements. You may wish to investigate whether this is true.

No financial data is provided – you will need to manually collect this yourselves through, for example, WRDS.

An introduction – Discussion of Benford's Law, its potential application to fraudulent account detection, what your research goals are, and what you hope to achieve through your analysis.

A literature review – You have two strands of literature to discuss, namely Benford's Law and fraud detection. Remember that a literature review should be a cohesive discussion of extant literature and how it relates to your research; it is the main indicator of whether you understand the subject area.

Methodology – What statistical tests/models are you going to perform and why?

Data – A description of the data and its origin, including summary statistics of the key variables.

Univariate analysis – The inclusion of (but not limited to) tests of difference between the two groups of firms in terms of the financial data acquired, along with separate analyses of balance sheet, income statement and cash flow statement items. You would expect to see larger deviations from Benford's Law within the fraudulent accounts.
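
For the tests of difference, a Welch two-sample t-test on a conformity measure (e.g. firm-year MAD) avoids assuming equal variances in the fraudulent and clean groups. The sketch computes the statistic and the Welch-Satterthwaite degrees of freedom using only the standard library. `scipy.stats.ttest_ind(a, b, equal_var=False)` returns the same statistic together with a p-value. `scipy.stats.mannwhitneyu` is a non-parametric alternative if the conformity measures are skewed. The sample values are hypothetical.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for samples a and b."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a) / na, statistics.variance(b) / nb  # squared standard errors
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


fraud_mad = [0.021, 0.018, 0.025, 0.030]  # hypothetical firm-year MAD values
clean_mad = [0.012, 0.010, 0.015, 0.011, 0.013]
t, df = welch_t(fraud_mad, clean_mad)
print(f"t = {t:.2f}, df = {df:.1f}")
```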

Multivariate analysis – No multivariate analysis is required, but you may wish to investigate Amiram et al.'s (2015) proposition that misstatements are more likely to occur in smaller, younger, more volatile, growing firms. This may be done by regressing your Benford conformity measures (the dependent variable) on key firm characteristics (the independent variables).
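
The optional regression can be sketched with ordinary least squares in NumPy. The sketch assumes you have already built a firm-year table containing a conformity measure (e.g. MAD) and proxies for size, age, volatility and growth. The variable names and the synthetic data are hypothetical. Classical (non-robust) standard errors are shown for brevity. `statsmodels` (`sm.OLS(y, sm.add_constant(X)).fit(cov_type="HC1")`) provides heteroskedasticity-robust errors and a full summary table.

```python
import numpy as np


def ols(y, X, names):
    """OLS of y on X plus a constant; returns {name: (coefficient, classical standard error)}."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)  # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return dict(zip(["const"] + list(names), zip(beta, se)))


# Hypothetical firm-year data: MAD regressed on log size, age, volatility and sales growth
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([
    rng.normal(7, 1.5, n),     # log total assets
    rng.integers(1, 60, n),    # firm age in years
    rng.gamma(2.0, 0.05, n),   # earnings volatility
    rng.normal(0.05, 0.2, n),  # sales growth
])
mad = 0.02 - 0.001 * X[:, 0] + rng.normal(0, 0.003, n)
for name, (b, se) in ols(mad, X, ["log_size", "age", "volatility", "growth"]).items():
    print(f"{name:>10}: {b: .5f} ({se:.5f})")
```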

Conclusion – What are your key findings and what are the implications?

Python code – A code (*.py) file which documents each stage of your analysis (uploaded as a separate file).

You may expand or explore this topic in any way you feel appropriate – provided that the key areas highlighted above are covered.

Assignment Option (C) – Bankruptcy Prediction with Text Analysis

Requirement

This assignment incorporates several elements of the BEAM079 module and requires that you perform an empirical investigation and discussion of the following scenario – building upon the work you may have carried out in BEAM078/BEFM022 or other modules which have introduced Bankruptcy Prediction as a topic.

Using financial and textual data, your task is to provide empirical analysis to establish whether several high-profile US corporate bankruptcies (which occurred in 2023) could have been predicted. You are tasked with building a series of models and assessing how accurately they predict bankruptcies in the US.

Your data comes in two forms. Firstly, the list below details the names and tickers of 10 bankrupt companies and 20 non-bankrupt companies.

This is your sample and ONLY this sample should be used. A number of financial variables are also provided for you to create financial ratios. Should you require accounting data beyond what is provided, please download it from WRDS.

Secondly, you have been provided with one annual report for each of the companies listed above. Each annual report is from 2022 (one year prior to bankruptcy) and can be found on ELE in the Assignment section. You are to use these annual reports to add textual information to your bankruptcy prediction models.
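
The textual variables can be sketched with two standard measures. The first is a dictionary-based tone score, (positive - negative) / (positive + negative). The second is the Gunning Fog readability index, 0.4 × (words per sentence + 100 × share of words with three or more syllables). The word lists below are tiny hypothetical placeholders; a real study would load a full finance-specific dictionary such as Loughran and McDonald's. The syllable counter is a crude vowel-group heuristic. The sketch also assumes the annual report text has already been extracted from its file and is non-empty.

```python
import re

# Hypothetical placeholder word lists; a real study would load a full finance-specific
# dictionary such as Loughran and McDonald's instead.
NEGATIVE = {"loss", "losses", "impairment", "default", "decline", "adverse"}
POSITIVE = {"growth", "improved", "strong", "profitable", "gain"}


def syllables(word):
    """Crude syllable estimate: number of vowel groups (y counted as a vowel), at least one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def text_features(text):
    """Word count, net tone, negative-word share and Gunning Fog readability for one report."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    lower = [w.lower() for w in words]
    pos = sum(w in POSITIVE for w in lower)
    neg = sum(w in NEGATIVE for w in lower)
    complex_words = sum(syllables(w) >= 3 for w in words)
    return {
        "words": len(words),
        "tone": (pos - neg) / (pos + neg) if pos + neg else 0.0,
        "negative_share": neg / len(words),
        # Gunning Fog = 0.4 * (words per sentence + 100 * share of words with 3+ syllables)
        "fog": 0.4 * (len(words) / len(sentences) + 100 * complex_words / len(words)),
    }


print(text_features("Revenue declined. The company reported a loss and an impairment."))
```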

Using the Python skills acquired within this module (and elsewhere), the empirical analysis in your assignment should include (but is not limited to) the following models:

• A logit model predicting bankruptcy one year before failure (2022) using only financial ratios of your choosing. These ratios should be selected based on prior literature, and the model's accuracy should be assessed using an ROC curve.

• Does adding textual information from the annual reports (e.g. sentiment and readability scores) improve the accuracy of the logit model?

• Does methodology matter? How are the two models above affected if they are estimated using a neural network instead of a logit model?
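
The first model and its ROC evaluation can be sketched in NumPy. The logit is fitted by Newton-Raphson maximum likelihood. The ROC curve is traced by lowering the classification threshold through every predicted probability. The AUC is computed as the probability that a randomly chosen bankrupt firm scores higher than a randomly chosen non-bankrupt firm, with ties counting one half; this equals the area under the ROC curve. With only 10 bankrupt firms, perfect separation is a real risk. In that case the maximum-likelihood estimates do not exist and the Newton steps fail to converge, which signals that you should use fewer ratios or a penalised model. `statsmodels.api.Logit` and `sklearn.metrics.roc_curve` / `roc_auc_score` are the usual library equivalents.

```python
import numpy as np


def fit_logit(X, y, max_iter=50, tol=1e-10):
    """Logit coefficients (constant first) by Newton-Raphson maximum likelihood."""
    X = np.column_stack([np.ones(len(y)), X])
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1 / (1 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                       # score vector
        hess = X.T @ (X * (p * (1 - p))[:, None])  # information matrix
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def roc_and_auc(y, scores):
    """ROC points (FPR, TPR) from the strictest to the loosest threshold, plus the AUC."""
    pos = [float(s) for s, t in zip(scores, y) if t == 1]
    neg = [float(s) for s, t in zip(scores, y) if t == 0]
    points = [(0.0, 0.0)]
    for thr in sorted(set(pos + neg), reverse=True):
        tpr = sum(s >= thr for s in pos) / len(pos)
        fpr = sum(s >= thr for s in neg) / len(neg)
        points.append((fpr, tpr))
    # AUC as P(bankrupt score > non-bankrupt score), counting ties as one half
    auc = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg) / (len(pos) * len(neg))
    return points, auc


points, auc = roc_and_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
print(auc)  # 0.75
```

To compare the financial-only and financial-plus-text models, fit both, pass each model's predicted probabilities to `roc_and_auc`, and compare the two AUCs.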

In addition, your assignment should follow the structure below; some further suggestions are provided for each section:

An introduction – Discussion of the task at hand, highlighting your research goals and objectives whilst setting the scene on which everything else is based.

A literature review – You have 2 main strands of literature to discuss namely: Bankruptcy prediction literature and sentiment/textual analysis literature. Your study will add to the small number of papers which have joined the two strands together. Remember that a literature review should be a cohesive discussion of extant literature, how it relates to your research, and is often seen as the main indicator of whether you understand the subject area or not.

Methodology – What statistical tests/models are you going to perform and why? How have they been used in the past? How successful have they been?

Data – A description of the data and its origin, including summary statistics of the key variables.

Univariate analysis – The inclusion of (but not limited to) tests of difference (t-tests) between the two groups of firms in terms of both financial and textual data; correlation analysis to ensure (and demonstrate) that there are no extreme correlations that will affect your multivariate logit model; and a comparison of bankrupt and non-bankrupt word clouds, which may be interesting if a sufficient list of stopwords is used.
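
The correlation check can be sketched with the standard library. The sketch assumes the candidate financial ratios and text scores are stored in equal-length lists keyed by variable name; the names and values are hypothetical. Pairs whose Pearson correlation exceeds a chosen threshold in absolute value are flagged. The 0.8 default is a common rule of thumb, not a fixed standard. `pandas.DataFrame.corr()` produces the full matrix for the report.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def high_correlations(columns, threshold=0.8):
    """Return (var_a, var_b, r) for every pair with |r| >= threshold."""
    flagged = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            flagged.append((a, b, r))
    return flagged


columns = {  # hypothetical values for four firms
    "wc_ta": [0.10, 0.25, 0.31, 0.42],
    "re_ta": [0.05, 0.20, 0.33, 0.41],
    "fog": [18.2, 21.5, 17.9, 20.1],
}
for a, b, r in high_correlations(columns):
    print(f"{a} vs {b}: r = {r:.2f}")
```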

Multivariate analysis – Combine your individual variables into multivariate models. As described above, your main task is to determine whether a bankruptcy prediction model performs better with textual data than with financial variables alone. Does the textual data add any value to the model(s)?

How much more accurate (if at all) is a neural network than a logit model? Due to the small number of bankrupt firms in 2022, you do not need to validate your models on a separate sample; you may base your results on the training sample alone.
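
The neural-network alternative can be sketched in NumPy so that every step is visible. The network has a single hidden layer of tanh units and a sigmoid output, and it is trained by full-batch gradient descent on cross-entropy loss. The architecture, learning rate and epoch count are arbitrary assumptions. `sklearn.neural_network.MLPClassifier(hidden_layer_sizes=(8,))` is the usual library route. Inputs should be standardised first. With 30 firms, a network of this size can fit the training sample almost perfectly, so keep that in mind when comparing its training-sample AUC with the logit's.

```python
import numpy as np


def train_mlp(X, y, hidden=8, lr=0.1, epochs=2000, seed=0):
    """One-hidden-layer network (tanh -> sigmoid) trained by full-batch gradient descent.

    Returns a predict(X) function giving bankruptcy probabilities and the loss history.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    W1, b1 = rng.normal(0, 0.5, (k, hidden)), np.zeros(hidden)
    W2, b2 = rng.normal(0, 0.5, hidden), 0.0
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)              # hidden activations
        p = 1 / (1 + np.exp(-(H @ W2 + b2)))  # output probabilities
        losses.append(-np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12)))
        d_out = (p - y) / n                   # gradient of mean cross-entropy w.r.t. output logits
        dH = np.outer(d_out, W2) * (1 - H ** 2)  # back-propagate through tanh (uses old W2)
        W2, b2 = W2 - lr * (H.T @ d_out), b2 - lr * d_out.sum()
        W1, b1 = W1 - lr * (X.T @ dH), b1 - lr * dH.sum(axis=0)

    def predict(X_new):
        return 1 / (1 + np.exp(-(np.tanh(X_new @ W1 + b1) @ W2 + b2)))

    return predict, losses


rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))  # hypothetical standardised ratios and text scores
y = (X[:, 0] - X[:, 1] + rng.normal(0, 0.5, 30) > 0.8).astype(float)
predict, losses = train_mlp(X, y)
print(f"loss {losses[0]:.3f} -> {losses[-1]:.3f}")
```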

Conclusion – What are your key findings and what are the implications?

Python code – A code (*.py) file which documents each stage of your analysis (uploaded as a separate file).