You are hired by Blue Moon Consulting (BMC) to conduct a data mining project to improve the targeting of a new financial planning service. The service has been quite successful so far, being marketed over the last 6 months only via a very inexpensive word-of-mouth campaign, and BMC has already garnered a large customer base without any targeting. However, the CIO of BMC
believes that very accurate targeting might cost-effectively expand the service's audience to consumers
that word-of-mouth would not reach.
You are given an abridged version of BMC’s data science proposal (found below). Identify 3
things that you believe are weaknesses/flaws in their plan and explain why (3-5 sentences each).
I encourage you to refer to the questions associated with the CRISP-DM process and the
Appendices in the back of the book if you are struggling to come up with feedback.
“We will build a logistic regression (LR) model to predict service uptake for a consumer using
data on BMC’s existing customers, including their demographics and their past usage of the
service. We believe that logistic regression is the best choice of method because it is a tried-and-
true statistical technique, and we can check if the results make sense. If they do make sense, then
we can have confidence that the model will be accurate in predicting service uptake. We will
then apply the model to BMC’s large database of existing customers and target those whom the
LR model predicts to be the most likely to use the service.”
Problem 2 (5 points)
In a departure from your usual work, you are contracted by none other than Disney to investigate
fan-inspired accusations that Marvel superheroes and stories are becoming too “predictable.”
Specifically, that “good” and “bad” heroes share too many similar traits and that more creative
diversity is needed to keep audiences engaged.
You are provided access to a data set that contains a few attributes on a selection of the most
popular superheroes within the Marvel universe. The target variable (alignment) indicates whether
the hero is considered “good” or “bad” in the Marvel universe. There are also (when applicable)
the following attributes on each hero: gender, race (Human/Non-Human), eye color, hair color,
height (in inches), and weight (in pounds).
Upload the Superheroes.csv data to BigML and use it to run a logistic regression to predict the
probability that a hero’s alignment is “good” or “bad.” Make sure you set the target variable to
“alignment.” After you run the model, you’ll want to use the “predict” option in BigML to
examine what trends there might be.
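For readers who want to see what the logistic regression step computes outside of BigML, the following is a minimal sketch in plain Python. It is not BigML's implementation and does not read Superheroes.csv: the toy rows, the feature scaling, and every function name (`fit`, `predict_proba`, `log_loss`) are assumptions made up purely for illustration, so the fitted numbers say nothing about the real data.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(weights, bias, features):
    # Probability that alignment is "good" for one hero.
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def log_loss(weights, bias, rows, labels):
    # Average negative log-likelihood over the rows.
    eps = 1e-12
    total = 0.0
    for x, y in zip(rows, labels):
        p = min(max(predict_proba(weights, bias, x), eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(rows)


def fit(rows, labels, lr=0.1, epochs=2000):
    # Plain batch gradient descent on the log loss.
    n = len(rows)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = predict_proba(weights, bias, x) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Made-up toy heroes (NOT from Superheroes.csv):
# (is_human, (height_in - 70) / 10, (weight_lb - 200) / 100)
TOY_ROWS = [
    [1.0, 0.0, -0.3], [1.0, -0.2, -0.5], [0.0, 0.5, 0.3], [1.0, 0.2, -0.1],
    [0.0, 1.0, 2.0], [0.0, 0.8, 1.5], [1.0, 0.4, 0.2], [0.0, 1.4, 3.0],
]
TOY_LABELS = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = "good", 0 = "bad"

weights, bias = fit(TOY_ROWS, TOY_LABELS)

# Predicted probability for a hypothetical human hero, 70 in and 200 lb.
p_good = predict_proba(weights, bias, [1.0, 0.0, 0.0])
p_bad = 1.0 - p_good
print(f"P(good) = {p_good:.3f}, P(bad) = {p_bad:.3f}")
```

The single predicted probability printed at the end plays the same role as one use of the "predict" option in BigML: change one attribute at a time and watch how the probability of "good" moves.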
In about 8-12 sentences, report the key findings of your analysis. You should address:
- Any general trends you see in the target variable based on the attributes.
- A discussion of at least one predicted probability using your model and its decision-making importance.
- If and how Disney should address hero alignment being too “predictable” within the Marvel universe based on your analysis.
- What the use of the model could be in the future and the potential limits of this analysis.
NB: The data provided below should be used only to answer the question above (the 8-12 sentence report of the key findings of your analysis).