Note: All data files for this coursework can be downloaded from Blackboard. This is an individual coursework.
You are employed as an information analyst in the Management Information section of a national charity. The charity is planning a new campaign to raise funds. The campaign material will be posted to a subset of supporters of past campaigns. You have been asked to give advice on the selection of the subset.
A trial mailing was made to about 3000 randomly chosen supporters. Data on the responses to this mailing, and information about these supporters, are available in the file datacw1.sav. A list of fields and their descriptions is given in the following table.
Response to trial campaign (0=No & 1=Yes)
Follow the six stages of the CRISP-DM data mining process. In the modelling stage of your investigation, use neural network models to predict who will respond positively. Define criteria for testing and selecting a model. Consider the results for at least 3 different neural networks employing various training methods. Also develop an appropriate association or rule induction model to predict responders to the mailshot.
Consider using only some of the available fields. Choose the neural network node that you consider to be the best and compare it to your best association or rule induction model. Pick whichever model you prefer and justify your choice.
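As an illustration of what "criteria for testing and selecting a model" might look like, the sketch below compares two models on the same held-out records using accuracy, precision and recall. It is written in plain Python rather than IBM-Modeler, and the function names and the tiny example data are hypothetical, not taken from datacw1.sav. If responders are a minority, accuracy alone can flatter a model that rarely predicts "Yes", so precision and recall are reported alongside it.

```python
from typing import Dict, List, Tuple


def confusion_counts(y_true: List[int], y_pred: List[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for 0/1 labels, where 1 means 'responded'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def summarise(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Compute accuracy, precision and recall, guarding against empty denominators."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical held-out responses and predictions from two candidate models.
actual = [1, 0, 0, 1, 0, 0, 0, 1]
model_a = [1, 0, 0, 0, 0, 0, 0, 0]
model_b = [1, 1, 0, 1, 0, 0, 0, 1]

summary_a = summarise(actual, model_a)
summary_b = summarise(actual, model_b)
```

In this made-up example, model A never produces a false positive but misses two of the three responders, whereas model B finds all of them at the cost of one false positive. Which trade-off is preferable depends on the criteria you define for the campaign.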
Information on a subset of 150 supporters who were not part of the trial mailing is also available in the file datacw1-test.sav. Use your preferred model to select the 25 supporters whom you believe are most likely to respond.
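Selecting the 25 most likely responders amounts to ranking the scored supporters by predicted probability of response and keeping the top of the list. The sketch below shows that step in plain Python, assuming the scores have already been exported from your preferred model. The supporter IDs, scores and the name `select_top_k` are hypothetical.

```python
from typing import Dict, List


def select_top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k supporter IDs with the highest predicted response probability."""
    # Sort (id, score) pairs from highest to lowest score.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [supporter_id for supporter_id, _ in ranked[:k]]


# Hypothetical scores for four supporters; the coursework file has 150.
example_scores = {"S001": 0.12, "S002": 0.81, "S003": 0.45, "S004": 0.67}
top_two = select_top_k(example_scores, 2)
```

For the coursework, `k` would be 25 and the dictionary would hold one score per supporter in the test file.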
1 A short report for the Head of the Management Information section. This should briefly explain your preferred model and justify its use. The Head of the Management Information section who will receive this summary has little knowledge of data mining and wants to know nothing about IBM-Modeler. This report should be no more than 2 sides of A4 including graphs, tables, etc. (20%).
2 A report on the various stages of the data mining process, documenting in detail what you have done and what you have found. Include appropriate SPSS (if used) and IBM-Modeler outputs in this report. This report should be no more than 6 sides of A4 including the main graphs, tables, etc. It should cover:
the data cleaning process
a description of the data, with summary statistics as appropriate
the factors associated with a positive response
the use of neural network and other models for prediction
an evaluation of the selected models using a gains chart, an analysis node, or a comparison of errors
The report should be expressed in plain English, using short sentences and short paragraphs. Supporting graphs, screenshots of models, etc. can be included in the appendix (80%).
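For the evaluation step, a gains chart plots the share of all responders captured against the share of supporters contacted, with supporters ordered from highest to lowest model score. IBM-Modeler produces this chart directly; the sketch below only illustrates the underlying calculation in plain Python, using made-up labels and scores and a hypothetical function name `cumulative_gains`.

```python
from typing import List, Tuple


def cumulative_gains(
    y_true: List[int], y_score: List[float], n_bins: int = 10
) -> List[Tuple[float, float]]:
    """Return (fraction contacted, fraction of responders captured) per bin."""
    n = len(y_true)
    total_responders = sum(y_true)
    # Indices of records ordered from highest to lowest score.
    order = sorted(range(n), key=lambda i: y_score[i], reverse=True)
    gains = []
    for b in range(1, n_bins + 1):
        cutoff = round(b * n / n_bins)  # number of records contacted so far
        captured = sum(y_true[i] for i in order[:cutoff])
        share = captured / total_responders if total_responders else 0.0
        gains.append((cutoff / n, share))
    return gains


# Hypothetical data: 10 supporters, 3 of whom responded (label 1).
labels = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
gains = cumulative_gains(labels, scores, n_bins=5)
```

A model that ranks supporters no better than chance would capture responders roughly in proportion to the fraction contacted, so the further a model's curve sits above that diagonal, the more useful it is for targeting.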
The report on the data mining stages should cover all six CRISP-DM phases, weighted as follows.
Business Understanding 5%
Data Understanding 15%
Data Preparation 10%
Modelling 30%
Evaluation 15%
Deployment 5%
Total 80%