Crocodiles are imaginary animals

Question One
Crocodiles are imaginary animals that live in the jungles of Borneo. Four species of crocodile
are believed to exist; we will refer to these as “A”, “B”, “C” and “D”. Copy the following code
into R and run it. Replace the 047 in set.seed(047) on the first line with the last three digits
of your ID number. This will provide you with your own unique set of data. (You will need to
install the MASS package if you have not installed it previously.)
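set.seed(047)
library(MASS)
# total sample size: a multiple of 4, so each species gets n/4 animals
n<-4*ceiling(runif(1,25,200))
# mean weight of each of the four species
w<-rnorm(4,12,1.5)
w1<-w[1]
w2<-w[2]
w3<-w[3]
w4<-w[4]
species<-c(rep("A",n/4),rep("B",n/4),rep("C",n/4),rep("D",n/4))
# weight (pounds): normal with sd 2 around each species mean
weight<-round(c(rnorm(n/4,w1,2),rnorm(n/4,w2,2),
rnorm(n/4,w3,2),rnorm(n/4,w4,2)),1)
# overall length (inches): centred on 3 times the species mean weight, sd 3
olength<-round(c(rnorm(n/4,w1*3,3),rnorm(n/4,w2*3,3),rnorm(n/4,w3*3,3),
rnorm(n/4,w4*3,3)),1)
# leg length (inches): species A and B share one mean, species C and D another
leglength<-round(c(rnorm(n/2,9.6,1.9),rnorm(n/2,15.8,2.2)),1)
CROCODILE<-data.frame(species,weight,olength,leglength)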
CROCODILE is a data frame with four variables:
• species: the species of crocodile
• weight: the weight of the animal (in pounds)
• olength: the overall “nose to tail” length of the crocodile (in inches).
• leglength: the leg length of the crocodile (in inches).
i) Produce the first six lines of your CROCODILE dataset by using head(CROCODILE), and
copy them into your answer. (This is so that I can check that you are using the correct
data.) You should obtain something that looks like this, but with different numbers.
  species weight olength leglength
1       A   11.9    33.1      11.5
2       A   12.9    36.8      10.3
3       A   14.0    37.2       8.7
4       A   15.6    31.9       7.8
5       A   10.7    32.3       7.6
6       A    9.7    44.3       9.6

ii) Produce a partial plot of olength versus leglength (see Figure 15 of the lecture notes
for an example of this type of plot). Your plot should distinguish between the four
species and include a title and a legend box. Based upon this plot, what may be said
about classifying the species of a crocodile from its overall length and its
leg length?
iii) Perform a linear discriminant analysis of species versus the three other variables.
(a) What are the mean weight, overall length and leg length of a species D crocodile?
(b) Write down the value of the first linear discriminant function for a crocodile that
weighs 10 pounds, has overall length 32 inches and leg length 16 inches.
(c) Produce a classification table of the observed versus the predicted classifications.
(See the table of Section 13.3.3 for an example of such a table; note that here there
are four species.)
(d) What proportion of crocodiles are misclassified? Is there any “pattern” to the
misclassifications?
iv) Data are obtained for two more crocodiles, as shown below.

crocodile weight olength leglength
        1     12    36.3      12.2
        2    8.2    30.3       6.5

(a) What are the predicted classifications of crocodiles 1 and 2?
(b) To three decimal places, what is the probability that each crocodile is of species
B?
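The steps of Question One can be sketched in R as follows. This is a minimal illustration, not a model answer. It uses a small hypothetical `toy` data frame built to mimic CROCODILE rather than your own data, and the names `toy`, `cols`, `fit`, `one`, `conf` and `new_crocs` are illustrative only. It assumes that a colour-coded scatter plot is an acceptable stand-in for the "partial plot" of the lecture notes, and that `MASS::lda()` is the intended LDA routine. To apply it to your data, substitute CROCODILE for `toy`.

```r
library(MASS)  # provides lda() and predict() for lda fits

# Hypothetical toy data with the same columns as CROCODILE (NOT the assignment data)
set.seed(1)
k  <- 40                                    # animals per species (assumed)
mu <- rep(c(10, 11.5, 13, 14.5), each = k)  # assumed species mean weights
toy <- data.frame(
  species   = factor(rep(c("A", "B", "C", "D"), each = k)),
  weight    = round(rnorm(4 * k, mu, 2), 1),
  olength   = round(rnorm(4 * k, 3 * mu, 3), 1),
  leglength = round(rnorm(4 * k, rep(c(9.6, 15.8), each = 2 * k), 2), 1)
)

# ii) olength against leglength, one colour and plotting symbol per species
cols <- c("red", "blue", "darkgreen", "orange")
grp  <- as.integer(toy$species)
plot(toy$leglength, toy$olength, col = cols[grp], pch = grp,
     xlab = "Leg length (inches)", ylab = "Overall length (inches)",
     main = "Overall length versus leg length by species")
legend("topleft", legend = levels(toy$species), col = cols,
       pch = seq_along(cols), title = "Species")

# iii) Linear discriminant analysis of species on the three measurements
fit <- lda(species ~ weight + olength + leglength, data = toy)
fit$means    # (a) group means: one row per species
fit$scaling  # coefficients of LD1, LD2 and LD3

# (b) LD scores from predict() are centred on the prior-weighted mean of the group means
one <- data.frame(weight = 10, olength = 32, leglength = 16)
predict(fit, newdata = one)$x[, "LD1"]

# (c), (d) Observed versus predicted species and the overall misclassification rate
conf <- table(Observed = toy$species, Predicted = predict(fit)$class)
conf
misclass <- 1 - sum(diag(conf)) / sum(conf)
round(misclass, 3)

# iv) Classify the two new animals and give the posterior probability of species B
new_crocs <- data.frame(weight = c(12, 8.2), olength = c(36.3, 30.3),
                        leglength = c(12.2, 6.5))
post <- predict(fit, newdata = new_crocs)
post$class
round(post$posterior[, "B"], 3)
```

If the lecture notes define the first discriminant function as the uncentred linear combination, `sum(fit$scaling[, "LD1"] * c(10, 32, 16))` gives that value instead; the coefficients follow the order of the formula (weight, olength, leglength).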
Question Two
Part One
Please run the following code
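set.seed(047)
# m sets the number of observations drawn for each of the two groups
m<-runif(1,17,87)
# group means of the four variables: mu1 for group 1, mu2 for group 2
mu1<-sample(20:40,4,TRUE)
mu2<-sample(20:40,4,TRUE)
# standard deviations: sigma[1:4] for group 1, sigma[5:8] for group 2
sigma<-sample(5:18,8,TRUE)
# each variable stacks the group 1 draws on top of the group 2 draws
i<-1
V1<-c(rnorm(m,mu1[i],sigma[i]),rnorm(m,mu2[i],sigma[i+4]))
i<-2
V2<-c(rnorm(m,mu1[i],sigma[i]),rnorm(m,mu2[i],sigma[i+4]))
i<-3
V3<-c(rnorm(m,mu1[i],sigma[i]),rnorm(m,mu2[i],sigma[i+4]))
i<-4
V4<-c(rnorm(m,mu1[i],sigma[i]),rnorm(m,mu2[i],sigma[i+4]))
DD<-round(cbind(V1,V2,V3,V4),0)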
i) Create a silhouette plot for the data. What is the optimal number of clusters indicated
by this plot?
ii) Create an elbow plot for the data. What is the optimal number of clusters indicated by
this plot?
iii) Run a k-means cluster analysis of the data, where k is the optimal number of clusters
as indicated by the silhouette plot.
(a) How many observations are in each cluster?
(b) What are the coordinates of the centroids of the clusters? (Please round to one
decimal place.)
iv) Create a plot of V3 versus V4 where the clusters are distinguished by colour.
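A possible R sketch of Part One is given below, again on hypothetical `toy` data (two groups in four variables) rather than your own DD. It assumes "silhouette plot" means the average silhouette width plotted against k, the form produced by, for example, `factoextra::fviz_nbclust(DD, kmeans, method = "silhouette")`. Here it is computed directly with `cluster::silhouette()` and base `kmeans()`. The range `ks <- 2:8` and `nstart = 25` are arbitrary choices, not values taken from the course.

```r
library(cluster)  # silhouette(); 'cluster' is a recommended package shipped with R

# Hypothetical toy data: two groups measured on four variables (NOT the assignment's DD)
set.seed(2)
toy <- round(rbind(matrix(rnorm(200, 25, 5), ncol = 4),
                   matrix(rnorm(200, 38, 5), ncol = 4)), 0)
colnames(toy) <- paste0("V", 1:4)

# i), ii) Average silhouette width and total within-cluster SS for each candidate k
ks  <- 2:8
d   <- dist(toy)
avg_sil <- numeric(length(ks))
wss     <- numeric(length(ks))
for (j in seq_along(ks)) {
  km <- kmeans(toy, centers = ks[j], nstart = 25)
  avg_sil[j] <- mean(silhouette(km$cluster, d)[, "sil_width"])
  wss[j]     <- km$tot.withinss
}
plot(ks, avg_sil, type = "b", xlab = "Number of clusters k",
     ylab = "Average silhouette width", main = "Silhouette plot")
plot(ks, wss, type = "b", xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares", main = "Elbow plot")

# iii) k-means with the k that maximises the average silhouette width
k_best <- ks[which.max(avg_sil)]
km <- kmeans(toy, centers = k_best, nstart = 25)
km$size                # (a) number of observations in each cluster
round(km$centers, 1)   # (b) centroid coordinates, one row per cluster

# iv) V3 versus V4, with points coloured by cluster
plot(toy[, "V4"], toy[, "V3"], col = km$cluster, pch = 19,
     xlab = "V4", ylab = "V3", main = "V3 versus V4 by k-means cluster")
legend("topleft", legend = paste("Cluster", seq_len(k_best)),
       col = seq_len(k_best), pch = 19)
```

Because k-means starts from random centres, `nstart` reruns it several times and keeps the best solution, which makes the silhouette and elbow curves less noisy.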
Part Two
Please run the following code
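set.seed(047)
# n: number of eels in each of the four groups
n<-sample(60:180,1)
# four groups: lengths centred at 500 or 526, circumferences centred at 51 or 30
Length<-c(rnorm(n,500,5),rnorm(n,500,5),rnorm(n,526,5),rnorm(n,526,5))
Circumference<-c(rnorm(n,51,2),rnorm(n,30,2),rnorm(n,51,2),rnorm(n,30,2))
EEL<-data.frame(Length,Circumference)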
This will produce a dataset unique to you; we take the data to represent the length and
circumference of a sample of eels.
Produce two plots of Circumference (x-axis) versus Length (y-axis): the first where a
2-means clustering has been performed on the unscaled data, and the second where a 2-means
clustering has been performed on the scaled data. The points of the plots should be coloured
so as to distinguish the clusters.
• Do the clusters differ between the two plots? If yes, describe the difference in the
clustering and explain why this difference may have occurred. If there is no difference,
explain why not.
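The comparison in Part Two can be sketched as follows. The sketch uses hypothetical `eel` data generated the same way as EEL but with a fixed `n <- 100`, so it is not your own dataset. `scale()` standardises each column to mean 0 and standard deviation 1 before `kmeans()` is run.

```r
# Hypothetical toy data generated like EEL, with a fixed n (NOT your own data)
set.seed(3)
n <- 100
Length        <- c(rnorm(n, 500, 5), rnorm(n, 500, 5), rnorm(n, 526, 5), rnorm(n, 526, 5))
Circumference <- c(rnorm(n, 51, 2),  rnorm(n, 30, 2),  rnorm(n, 51, 2),  rnorm(n, 30, 2))
eel <- data.frame(Length, Circumference)

# 2-means on the raw measurements and on the standardised measurements
km_raw    <- kmeans(eel, centers = 2, nstart = 25)
km_scaled <- kmeans(scale(eel), centers = 2, nstart = 25)

plot(eel$Circumference, eel$Length, col = km_raw$cluster, pch = 19,
     xlab = "Circumference", ylab = "Length", main = "2-means on unscaled data")
plot(eel$Circumference, eel$Length, col = km_scaled$cluster, pch = 19,
     xlab = "Circumference", ylab = "Length", main = "2-means on scaled data")

# Cross-tabulate the two clusterings: if they agree, each row has one non-zero cell
table(Unscaled = km_raw$cluster, Scaled = km_scaled$cluster)
```

Since k-means minimises squared Euclidean distance, a variable with a larger spread in its raw units has more influence on where the partition falls. Standardising puts both variables on a common footing, so the cross-table isolates the effect of measurement scale.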
