
Evolutionary Informatics

Life621/Life721. Evolutionary Informatics – Estimating sites
undergoing positive selection
Seth Barribeau (sethb@liv.ac.uk)
There are a variety of software packages that look for evidence of selection in orthologous
gene sequences. We are going to use the Datamonkey web interface
(http://www.datamonkey.org), which runs the HyPhy package (HyPhy can also be downloaded for
standalone use, but installation is nontrivial).
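These tools can run a number of different analyses to detect how a sequence is evolving. Most
of them rest on comparing the rate of nonsynonymous substitution (dN) with the rate of
synonymous substitution (dS), expressed as the ratio dN/dS or ω (omega), and on comparing
models that allow this ratio to vary. A ratio above 1 suggests positive selection, a ratio
below 1 suggests purifying selection, and a ratio close to 1 is consistent with neutral evolution.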
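To make the ratio concrete, here is a minimal sketch of a counting-style estimate of ω for a single pair of sequences. It is not how HyPhy works (HyPhy fits codon models by maximum likelihood); it only illustrates what the ratio measures. The function names and all of the counts are hypothetical, and the sketch assumes that the numbers of synonymous and nonsynonymous sites and differences have already been counted.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor correction: convert a proportion of differences to a distance."""
    if not 0.0 <= p < 0.75:
        raise ValueError("proportion of differences must be in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def omega(nonsyn_diffs: float, nonsyn_sites: float,
          syn_diffs: float, syn_sites: float) -> float:
    """Return dN/dS from counts of differences and of available sites."""
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)  # nonsynonymous distance
    ds = jukes_cantor(syn_diffs / syn_sites)        # synonymous distance
    if ds == 0.0:
        raise ValueError("dS is zero, so dN/dS is undefined")
    return dn / ds


def interpret(w: float) -> str:
    """Give the textbook reading of a dN/dS value."""
    if w > 1.0:
        return "positive (diversifying) selection"
    if w < 1.0:
        return "purifying (negative) selection"
    return "neutral evolution"


# Hypothetical counts for illustration only (not taken from the flu alignment).
w = omega(nonsyn_diffs=5, nonsyn_sites=700, syn_diffs=20, syn_sites=300)
print(f"omega = {w:.3f}: consistent with {interpret(w)}")
```

A single gene-wide value like this averages over all codons, which is why the analyses in this practical fit models that let ω vary between sites or branches.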
The rationale is likelihood based: we compare the likelihood of the observed pattern of evolution
under a model in which codons are not allowed to vary in their evolutionary properties (the null
hypothesis: no variation in dN/dS between codons) with its likelihood under a model in which
codons are allowed to have different dN/dS ratios. If the latter model produces a significantly
larger (less negative) log-likelihood value, then there is evidence of heterogeneity in dN/dS
within the gene, and the programme will identify those codons under positive selection and
those under purifying selection.
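The model comparison described above is a likelihood ratio test. The sketch below shows the arithmetic under simple assumptions: the two models are nested, and twice the difference in log-likelihood is compared against a chi-square distribution whose degrees of freedom equal the number of extra parameters. The log-likelihood values and the choice of degrees of freedom are hypothetical, and individual HyPhy methods may use a different reference distribution.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability, using closed forms for df = 1 or 2."""
    if x <= 0.0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int):
    """Return (test statistic, p-value) for two nested models."""
    stat = 2.0 * (lnl_alt - lnl_null)  # larger when the alternative fits better
    return stat, chi2_sf(stat, df)


# Hypothetical log-likelihoods: the alternative model is less negative.
stat, p_value = likelihood_ratio_test(lnl_null=-1000.0, lnl_alt=-995.0, df=2)
print(f"LRT statistic = {stat:.2f}, p = {p_value:.4f}")
```

A small p-value means the model that lets dN/dS vary explains the alignment significantly better than the null model.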
In more complex analyses, one can also ask if the properties of codons vary over a phylogeny
in association with a known biological difference. For instance, one can ask: are there fewer
sites under positive selection in reproductive genes from monogamous species than in those
from polyandrous species? This would test the hypothesis that polyandry drives positive
selection. Again, the test is performed through a model comparison: is the model allowing
variation in dN/dS significantly better than the model that does not?
Datamonkey webserver
Sergei L. Kosakovsky Pond and Simon D. W. Frost (2005)
Datamonkey: rapid detection of selective pressure on individual sites of codon
alignments
Bioinformatics 2005 21(10):2531-2533
HyPhy package
Sergei L. Kosakovsky Pond, Simon D. W. Frost and Spencer V. Muse (2005)
HyPhy: hypothesis testing using phylogenies
Bioinformatics 2005 21(5):676-679
Exercise: Influenza H5N1 haemagglutinin evolution and HIV
evolution.
We are going to adapt a tutorial from the Datamonkey homepage. This can be downloaded
from the Vital folder for this session. The current version of Datamonkey has changed quite a
lot since this original tutorial, but the information in the introduction is a useful primer. Go
through the following steps with the flu alignment and try to understand what each analysis
tells you. Work in groups of 3-4 (with one computer); the instructors will tell you which step
to do first. We need to split up the analyses because if too many users submit jobs to the
Datamonkey website at once, it will slow to a crawl.
Do at least the recombination analysis (step 4) and one analysis from each of steps 6-8. For
steps 6-7, record the codons that show significant positive or purifying selection; for step 8,
note any lineages under selection.
1- First, download our data file from Vital: the Influenza H5N1 sequence
alignment (also used in the tutorial file from Datamonkey)
2- Go to the front page of http://www.datamonkey.org
3- Click on ‘get started’
4- First, let’s test for recombination. When it asks what kind of evolutionary process
we’d like to look at, select ‘Recombination’, click ‘GARD’ and follow the instructions
on screen with the flu data set. NB THIS MAY FAIL, PROBLEM WITH WEB
SERVER
a. Why is recombination important to test for before doing these
analyses?
5- Next, we can start looking at rates of evolution.
6- Gene wide evolution:
a. BUSTED (Selection/Gene/) – Choose the part of the tree you’re interested in
(here we can click the top node to select the whole tree)
7- Site specific (download the results table at each step and copy the codons that
are under purifying selection into an Excel sheet):
a. SLAC (Selection/Sites/Pervasive/Large/Counting)
b. FEL (Selection/Sites/Pervasive/Small)
c. FUBAR (Selection/Sites/Pervasive/Large/Bayesian)
d. MEME **takes a while** (Selection/Sites/Episodic)
i. Enter the codons under negative/purifying selection into
http://bioinfogp.cnb.csic.es/tools/venny/ to see how they agree
with each other.
8- Branches:
a. aBSREL (Selection/Branches/Episodic)
i. Which, if any, lineages are under selection?
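For step 7, the agreement between methods can also be checked locally instead of through Venny. The sketch below is one possible way to do it with Python sets. The codon numbers are made up for illustration and are not results from the flu alignment, and the function names are hypothetical; replace the sets with the codons you copied from each results table.

```python
def shared_by_all(calls: dict) -> set:
    """Codons reported by every method."""
    return set.intersection(*calls.values())


def support_counts(calls: dict) -> dict:
    """Map each codon to the number of methods that reported it."""
    counts = {}
    for sites in calls.values():
        for codon in sites:
            counts[codon] = counts.get(codon, 0) + 1
    return counts


# Hypothetical codon positions under purifying selection for each method.
calls = {
    "SLAC": {12, 45, 101, 230},
    "FEL": {12, 45, 88, 101, 230},
    "FUBAR": {12, 45, 88, 101, 154, 230},
    "MEME": {12, 101},
}

print("Reported by all methods:", sorted(shared_by_all(calls)))
for codon, n in sorted(support_counts(calls).items()):
    print(f"codon {codon}: reported by {n} of {len(calls)} methods")
```

Codons reported by several methods are better supported than codons reported by only one.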
Assessed Work. Deadline Dec 15th.
Via e-submission
Picking the paper appropriate to your birth month (below), write a short account
highlighting how phylogenetic/adaptive evolution methods
(HyPhy/Datamonkey/PAML/etc) were used to examine the evolutionary history of the
genes under study.
In your account, which should be no more than one single-spaced A4 side with 1.5 cm
margins in Arial 11 point font, relate:
a) The biological problem under study, e.g. past information on the gene and its function,
and why a test of positive selection was being made.
b) The hypothesis being tested in the paper
c) The package(s) used, the analysis completed, and what was discovered using
phylogenetic methods in terms of past patterns of molecular evolution.
d) How this result relates to the findings from other methods utilized in this paper, or in
previously/subsequently published work.
You may place the references cited on a second side of paper if needed.

Month of birth   Paper to read
Jan-Mar          Santibáñez-López CE, et al. (2018). Integration of phylogenomics and molecular modeling reveals lineage-specific diversification of toxins in scorpions. PeerJ 6:e5902 https://doi.org/10.7717/peerj.5902
Apr-Jun          Roy, C et al (2020) Trends of mutation accumulation across global SARS-CoV-2 genomes: Implications for the evolution of the novel coronavirus. Genomics https://doi.org/10.1016/j.ygeno.2020.11.003
Jul-Sep          Lhee, D., Ha, J., Kim, S. et al. Evolutionary dynamics of the chromatophore genome in three photosynthetic Paulinella species. Sci Rep 9, 2560 (2019) doi:10.1038/s41598-019-38621-8
Oct-Dec          Zhang, CY et al. (2006) Adaptive evolution of the spike gene of SARS coronavirus: changes in positively selected sites in different epidemic groups. BMC Microbiology 6:88 doi:10.1186/1471-2180-6-88

Please remain aware of the SoLS rules on collusion and plagiarism (See SoLS VITAL
page, Collusion and Plagiarism). In particular, note:
• Copying blocks of text from the source paper is major plagiarism, and you will be
awarded no marks.
• Failing to attribute sources is likewise plagiarism.
Three hints to avoid plagiarism:
• Be aware of the SoLS policy on what constitutes plagiarism and collusion.
• When completing your account, first read the paper (twice) and then set it aside while
you write your account, so you cannot accidentally copy from it.
• Check the TURNITIN report – only actual references cited should be reported as
having been found on the wider internet.
