Problem 1
You are hired by Blue Moon Consulting (BMC) to conduct a data mining project to improve the targeting of a new financial planning service. The service has been quite successful so far, being marketed over the last 6 months only via a very inexpensive word-of-mouth campaign, and BMC has already garnered a large customer base without any targeting. However, the CIO of BMC believes that very accurate targeting might cost-effectively expand the service's audience to consumers that word-of-mouth would not reach.
You are given an abridged version of BMC’s data science proposal (found below). Identify 3 things that you believe are weaknesses/flaws in their plan and explain why (3-5 sentences each). I encourage you to refer to the questions associated with the CRISP-DM process and the Appendices in the back of the book (attached as well) if you are struggling to come up with feedback.
“We will build a logistic regression (LR) model to predict service uptake for a consumer using data on BMC’s existing customers, including their demographics and their past usage of the service. We believe that logistic regression is the best choice of method because it is a tried-and-true statistical technique, and we can check if the results make sense. If they do make sense, then we can have confidence that the model will be accurate in predicting service uptake. We will then apply the model to BMC’s large database of existing customers and target those whom the LR model predicts to be the most likely to use the service.”
Problem 2
In a departure from your usual work, you are contracted by none other than Disney to investigate fan-inspired accusations that Marvel superheroes and stories are becoming too “predictable.” Specifically, fans claim that “good” and “bad” heroes share too many similar traits and that more creative diversity is needed to keep audiences engaged.
You are provided access to a data set that contains a few attributes on a selection of the most popular superheroes within the Marvel universe. The target variable (alignment) indicates whether the hero is considered “good” or “bad” in the Marvel universe. There are also (when applicable) the following attributes on each hero: gender, race (Human/Non-Human), eye color, hair color, height (in inches), and weight (in pounds).
Run a logistic regression to predict the probability that a hero’s alignment is “good” or “bad.”
Here is the link for the model:
https://bigml.com/shared/logisticregression/gq1lboPQYVy39YYmKm3EMVrrzmH
Use the Predict option in the upper-right corner of the model page (see the screenshot below).
Once you have selected Predict, you should see the following:
[Screenshot: prediction interface]
In about 8-12 sentences, report the key findings of your analysis based on the prediction above. You should address:
· Any general trends you see in the target variable based on the attributes.
· A discussion of at least one predicted probability using your model and its decision-making importance.
· If and how Disney should address hero alignment being too “predictable” within the Marvel universe based on your analysis.
· How the model could be used in the future and the potential limitations of this analysis.
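When discussing a predicted probability, it may help to recall how a logistic regression turns attribute values into a probability: it computes a weighted sum of the inputs (the log-odds) and passes it through the logistic function. The sketch below illustrates this in Python. The intercept, the coefficients, the encoded attribute names, and the example hero are all hypothetical values chosen for illustration; they are not taken from the shared BigML model, whose actual encoding and coefficients may differ.

```python
import math

def logistic(z):
    """Map a log-odds score z to a probability strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical intercept and coefficients, for illustration only.
# These are NOT the values from the shared BigML model.
INTERCEPT = 0.4
COEFFICIENTS = {
    "gender_female": 0.6,   # 1 if female, 0 otherwise (assumed encoding)
    "race_human": 0.3,      # 1 if Human, 0 if Non-Human (assumed encoding)
    "height_in": -0.01,     # height in inches
    "weight_lb": -0.002,    # weight in pounds
}

def predict_good(hero):
    """Return the modeled probability that a hero's alignment is "good"."""
    z = INTERCEPT + sum(
        COEFFICIENTS[name] * hero.get(name, 0.0) for name in COEFFICIENTS
    )
    return logistic(z)

# A made-up hero, not a row from the assignment's data set.
hero = {"gender_female": 1, "race_human": 1, "height_in": 68, "weight_lb": 140}
p_good = predict_good(hero)
print(f"P(good) = {p_good:.2f}, P(bad) = {1 - p_good:.2f}")
```

A probability close to 0.5 means the attributes say little about alignment for that hero, while a probability near 0 or 1 means the attributes alone nearly determine it, which is the sense of “predictable” at issue in this problem.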