Following Neil's discussion of the Data Mining Process and Predictive Analytics (Part I of this presentation), he now turns to Data Extraction, Transformation and Loading (ETL), a critical step in the data mining process. He opens with an amusing quote about data: "Raw data is horrible, is messy, is noisy; it is a bit like other people's children."

In this Part II (of three), Neil discusses how to reduce the noise in the data by asking questions such as: What do we actually need out of all this data? What are the key events we need to understand from it? The process is essentially a data reduction exercise. Neil also walks through a customer segmentation case study in which he contrasts two ways of applying behavioral segmentation:
- Deterministic Segmentation: we define the rules by which visitors are segmented.
- Discovery-Based Segmentation: the data do the talking; we look for patterns, associations, and correlations that we may not have thought of.
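To make the contrast between the two approaches concrete, here is a minimal Python sketch. It is not the case study from the talk: the `Visitor` fields (`visits`, `pages_per_visit`), the rule thresholds, and the sample visitors are all hypothetical, and the discovery side uses plain k-means clustering as one possible pattern-finding method. In the deterministic function the analyst writes the rules; in the discovery function the algorithm proposes the groups and the analyst interprets them afterwards.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Visitor:
    visitor_id: str
    visits: int             # hypothetical: sessions in the period
    pages_per_visit: float  # hypothetical: average pages viewed per session


def deterministic_segment(v: Visitor) -> str:
    """Deterministic segmentation: analyst-defined rules (illustrative thresholds)."""
    if v.visits >= 10 and v.pages_per_visit >= 5:
        return "engaged regular"
    if v.visits >= 10:
        return "quick returner"
    if v.pages_per_visit >= 5:
        return "deep browser"
    return "casual"


def discovery_segments(visitors, k=2, iterations=20, seed=0):
    """Discovery-based segmentation sketched as plain k-means.

    Features are left unscaled for brevity; real data would normally be
    standardized first so one feature does not dominate the distances.
    """
    points = [(v.visits, v.pages_per_visit) for v in visitors]
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen visitors
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each visitor to the nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Move each centroid to the mean of the visitors assigned to it.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return labels, centroids


# Hypothetical sample data for illustration only.
visitors = [
    Visitor("a", 12, 6.0), Visitor("b", 11, 5.5), Visitor("c", 2, 1.5),
    Visitor("d", 1, 2.0), Visitor("e", 14, 1.0),
]
for v in visitors:
    print(v.visitor_id, deterministic_segment(v))
labels, centroids = discovery_segments(visitors, k=2)
print(labels)
```

Note the design difference: the rule-based version labels visitor "e" a "quick returner" because someone chose that threshold, while k-means simply groups "e" with the other frequent visitors and returns unnamed cluster numbers that still need a human interpretation.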