2017-06-13 Azure-Machine-Learning

Basics Microsoft Machine Learning

Data rich, information poor

4 Pillars of analytics#

Descriptive (what happened, who are my customers)
Diagnostic (why things happened, the drivers behind them)
Predictive (what will happen in the future, the probability of an outcome)
Prescriptive (what should we do)

As you move down the list, both difficulty and value increase

Why is Big Data so Big?#

Data is a competitive advantage
New insights for smarter decisions
Traditional BI gives backward-looking insights
More data every day
More computing power

Data Science Process#

Define a business problem
Acquire and prepare the data
Develop model
Deploy model
Monitor model performance

Common Data Science Techniques#

Classification

- Supervised learning - logistic regression, decision trees, boosted decision trees, multimodal neural networks

Clustering

- Unsupervised learning - outcomes unknown - k-means algorithm: set the number of clusters you want with the k variable - self-organising maps, ART (adaptive resonance theory)

Regression - predict numerical outcomes

- linear regression, decision trees, neural networks, boosted decision tree regression

Simulation - testing scenarios

- Markov chain analysis

Content Analysis - mine text files, image and video

- pattern recognition, neural networks: multilayered perceptron, ART network

Recommenders

- Collaborative filtering - similarity or ratings etc - Analysing selected content - Naïve Bayes, Microsoft Association rules
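
To make the k-means item in the clustering notes above more concrete, here is a minimal sketch in plain Python. It is not how Azure implements the algorithm: the function name `k_means_1d`, the one-dimensional data, and the way the starting centroids are picked are all assumptions made for illustration.

```python
def k_means_1d(points, k, iterations=10):
    """Cluster 1-D points into k groups with plain k-means (illustrative only)."""
    # Assumption: seed the centroids with the k smallest distinct values.
    centroids = sorted(set(points))[:k]
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Made-up data with two obvious groups; k is the number of clusters we ask for.
centroids, clusters = k_means_1d([1.0, 1.2, 0.8, 9.0, 9.5, 10.1], k=2)
```

The only thing you choose up front is k. The groups themselves come out of the data, which is why the outcomes are "unknown" before you run it.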

Azure Algorithms#

Algorithms are baked into the modules

Difficult part is choosing which algorithm to apply in different scenarios

Azure Studio#

Experiments - experiments saved as drafts
Web Services - exposed by AML
Notebooks - visualise data
Trained models - completed models

Module or dataset view#

RHS: properties
LHS: datasets and modules

Components of an experiment#

Creating a model creates an experiment
Experiment: dataset + modules

Four step model creation#

Get Data
Clean Data (Preparation usually takes the longest)
Choose and apply learning model
Predict over new data
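
The four steps above can be sketched outside the Studio in a few lines of plain Python. Everything here is an assumption for illustration: the hours-studied and exam-score numbers are made up, missing values are handled by dropping rows, and the "learning model" is an ordinary least-squares line, standing in for whichever algorithm module you would pick.

```python
# Step 1: get data (hypothetical hours-studied -> exam-score pairs; None marks a missing value).
raw = [(1.0, 52.0), (2.0, 54.0), (None, 60.0), (3.0, 56.0), (4.0, 58.0)]

# Step 2: clean data by dropping rows with missing values.
clean = [(x, y) for x, y in raw if x is not None and y is not None]

# Step 3: choose and apply a learning model (least-squares line y = slope * x + intercept).
n = len(clean)
mean_x = sum(x for x, _ in clean) / n
mean_y = sum(y for _, y in clean) / n
slope = (
    sum((x - mean_x) * (y - mean_y) for x, y in clean)
    / sum((x - mean_x) ** 2 for x, _ in clean)
)
intercept = mean_y - slope * mean_x


# Step 4: predict over new data.
def predict(x):
    return slope * x + intercept


prediction = predict(5.0)
```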

Confusion matrix#

A table used to describe the performance of a classification model on data where the true values are known

True positive: we predict yes, and they do have the disease
True negative: we predict no, and they don’t have the disease
False positive: we predict yes, but they don’t have the disease
False negative: we predict no, but they do have the disease

accuracy - how often classifier is correct
precision - when yes, how often is it correct
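
As a rough sketch of how these two numbers fall out of the four confusion-matrix cells, the snippet below counts the cells from a list of actual and predicted labels. The function names and the sample labels are made up for illustration.

```python
def confusion_counts(actual, predicted):
    """Return (tp, tn, fp, fn) for two equal-length lists of booleans."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p and a:
            tp += 1  # predicted yes, actually yes
        elif not p and not a:
            tn += 1  # predicted no, actually no
        elif p and not a:
            fp += 1  # predicted yes, actually no
        else:
            fn += 1  # predicted no, actually yes
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    # How often the classifier is correct overall.
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp, fp):
    # When the classifier says yes, how often it is right.
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical test set of 8 patients (True = has the disease).
actual = [True, True, True, False, False, False, False, False]
predicted = [True, True, False, True, False, False, False, False]

tp, tn, fp, fn = confusion_counts(actual, predicted)
acc = accuracy(tp, tn, fp, fn)   # 6 correct out of 8
prec = precision(tp, fp)         # 2 correct out of 3 "yes" predictions
```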

Machine learning#

A class of algorithms that is data driven
The data defines what a good answer is

Supervised - examples are labelled
Unsupervised - examples are unlabelled (it clusters data into groups)

Anomaly detection#

Example: detecting credit card fraud, where the data has a huge number of legitimate transactions and very few fraudulent ones.
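
One consequence of that imbalance, sketched with made-up numbers: a "classifier" that labels every transaction as legitimate still scores a very high accuracy while catching no fraud at all. The counts below (10 fraudulent out of 10,000) are an assumption for illustration.

```python
# Hypothetical data: True = fraudulent, False = legitimate.
labels = [True] * 10 + [False] * 9990

# A useless model that always predicts "legitimate".
predictions = [False] * len(labels)

correct = sum(1 for a, p in zip(labels, predictions) if a == p)
accuracy = correct / len(labels)

# How many of the actual fraud cases were flagged.
fraud_caught = sum(1 for a, p in zip(labels, predictions) if a and p)
fraud_caught_ratio = fraud_caught / sum(labels)
```

Accuracy alone looks excellent here, so for rare-event problems it is worth checking how many of the rare cases were actually found.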

Classification#

Supervised learning

Predicting whether a client will buy a product from us

Classification categorises into buckets; regression predicts values on a continuum.

Classifier types:
Two-class classifier - two options
Multi-class classifier - three or more categories

Binary Classification#

Simplest form of machine learning

Azure Machine Learning#

You can click the little dot under a block and visualise the data

Missing Values scrubber makes sure there are no missing values
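
As a rough idea of what scrubbing missing values can mean, the sketch below fills each gap in a numeric column with the mean of the values that are present. This is only one possible strategy and is an assumption here, not a description of what the module does by default; the function name is made up.

```python
def fill_missing_with_mean(values):
    """Replace None entries with the mean of the non-missing values."""
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]


# Hypothetical column with one missing entry.
filled = fill_missing_with_mean([2.0, None, 4.0])
```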

Adding and removing columns used to be called projecting columns; the module is now called Select Columns in Dataset

Sometimes you can’t visualise data until you have run the experiment

Split Data is used to create two sets of data: one that is used to train the model and one that the model has not seen, kept back for testing.
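
The same idea in plain Python, as a minimal sketch: shuffle the rows, then cut them into a training part and a held-back part. The function name, the 75/25 ratio, and the fixed seed are assumptions for illustration, not the module's settings.

```python
import random


def split_data(rows, train_fraction=0.75, seed=0):
    """Shuffle rows and split them into (train, test) lists."""
    shuffled = list(rows)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical dataset of 8 row ids.
train, test = split_data(range(8))
```

The model only ever learns from `train`. `test` is kept aside so you can later check predictions against rows the model has never seen.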

Train Model is an important module: basically you tell the algorithm which column you are trying to predict

Score Model and Evaluate Model are modules that let you visualise how well the model works

Top Tips#

When uploading a CSV that is semicolon-separated (;), you need to convert it to a comma-separated (,), American-style CSV first, otherwise Azure raises issues.
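
One way to do that conversion before uploading is with Python's standard `csv` module. This is a minimal sketch: the function name and the sample text are made up, and it works on a string in memory, so for a real file you would read and write it yourself.

```python
import csv
import io


def semicolon_to_comma(text):
    """Convert semicolon-separated CSV text to comma-separated CSV text."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=";")
    out = io.StringIO(newline="")
    writer = csv.writer(out, delimiter=",", lineterminator="\n")
    # Fields that contain a comma are quoted automatically by the writer.
    writer.writerows(reader)
    return out.getvalue()


# Hypothetical European-style export with a decimal comma in the price.
converted = semicolon_to_comma("name;price\nwidget;1,50\n")
```

Note that a decimal comma such as `1,50` survives only as quoted text. If the column should be numeric you would also need to change it to `1.50`.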