# CM 3 : Classification

- MNIST dataset = 70,000 images (28×28) of handwritten digits, each with its label

## Binary Classifier

Goal: decide whether an image is a 5 or 'Not 5'.

SGDClassifier:
Score of 3-fold cross-validation = \[0.95035, 0.96035, 0.9604\]

DummyClassifier:
Score of 3-fold cross-validation = \[0.90965, 0.90965, 0.90965\]

-> Skewed dataset: 90% of instances are 'Not 5', so always answering 'No' already gives ~90% accuracy
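
The accuracy trap can be reproduced without MNIST; a minimal sketch on a synthetic skewed dataset (~10% positives, a hypothetical stand-in for '5 vs not-5') with scikit-learn's `DummyClassifier`:

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical skewed dataset standing in for '5 vs not-5': ~10% positives
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 5))
y = (rng.random(1000) < 0.1).astype(int)

# 'most_frequent' always predicts the majority class ('not 5')
dummy = DummyClassifier(strategy="most_frequent")
scores = cross_val_score(dummy, X, y, cv=3, scoring="accuracy")
print(scores)  # ~0.90 per fold: high accuracy without learning anything
```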

### Confusion Matrices

| TN | FP |
| -------------- | --------------- |
| FN | TP |

Where:

- TN = model predicts negative, label is negative (OK)
- TP = model predicts positive, label is positive (OK)
- FN = model predicts negative, label is positive (KO)
- FP = model predicts positive, label is negative (KO)
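
scikit-learn's `confusion_matrix` follows exactly this layout (row = actual class, column = predicted class); a quick check on toy labels:

```python
from sklearn.metrics import confusion_matrix

# Toy labels: 0 = negative ('not 5'), 1 = positive ('5')
y_true = [0, 0, 0, 1, 1, 1]
y_pred = [0, 1, 0, 1, 0, 1]

cm = confusion_matrix(y_true, y_pred)
# Layout: [[TN, FP],
#          [FN, TP]]
print(cm)  # here TN=2, FP=1, FN=1, TP=2
```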

### Precision/Recall formulas

Precision (the banker): the model only assigns the positive class when it is sure about the prediction;
$$ Precision = \frac{TP}{TP+FP} $$

Recall (the doctor): when in doubt, the model classifies the instance as positive;
$$ Recall = \frac{TP}{TP+FN} $$

### F score

Combines precision and recall into a single metric.

#### F1 score

The harmonic mean (which gives more weight to low values) of precision and recall:

$$ F_1 = 2 \times \frac{Precision \times Recall}{Precision + Recall} = \frac{TP}{TP + \frac{FN + FP}{2}} $$
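
A quick numeric check of these formulas with scikit-learn, on toy labels where TP=2, FP=1, FN=1:

```python
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [0, 0, 0, 1, 1, 1]
y_pred = [0, 1, 0, 1, 0, 1]  # TP=2, FP=1, FN=1

p = precision_score(y_true, y_pred)   # TP / (TP+FP) = 2/3
r = recall_score(y_true, y_pred)      # TP / (TP+FN) = 2/3
f1 = f1_score(y_true, y_pred)         # harmonic mean, also 2/3 here
print(p, r, f1)
```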

### Decision function

If this score is greater than a threshold, the model assigns the instance to the positive class; otherwise it assigns it to the negative class. (Ex: SGD classifier)

(see the curve in the slides)
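
A small sketch of the idea on toy 2-D data (not the slides' example): for a binary `SGDClassifier`, `predict()` is equivalent to thresholding `decision_function` at 0, and raising the threshold trades recall for precision.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Toy separable data, a hypothetical stand-in for the MNIST features
rng = np.random.default_rng(0)
X = np.r_[rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))]
y = np.r_[np.zeros(50), np.ones(50)]

clf = SGDClassifier(random_state=0).fit(X, y)
scores = clf.decision_function(X)

# predict() thresholds the decision scores at 0
pred_default = (scores > 0).astype(int)
# a higher threshold favours precision over recall
pred_strict = (scores > 3.0).astype(int)
print((pred_default == clf.predict(X)).all())
```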

### Precision/Recall curve

Precision (Y axis) plotted against recall (X axis) => makes it easy to build a classifier with a desired precision
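
A sketch with `precision_recall_curve` (toy scores taken from the scikit-learn docs), picking the lowest threshold that reaches a target precision:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])  # e.g. decision-function outputs

precisions, recalls, thresholds = precision_recall_curve(y_true, scores)
# first index whose precision reaches >= 90%
idx = (precisions[:-1] >= 0.90).argmax()
print(thresholds[idx], precisions[idx])  # threshold 0.8 gives precision 1.0
```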

### ROC Curve

- ROC = Receiver Operating Characteristic: a common tool used with binary classifiers
  - very similar to the precision/recall curve
  - plots the TP rate (recall) against the FP rate (also called fall-out)
  - FPR = ratio of negative instances that are incorrectly classified as positive
    $$ FPR = 1 - TNR $$
  - TNR = ratio of negative instances that are correctly classified as negative, also called specificity
- The ROC curve thus plots sensitivity (recall) versus 1 - specificity
- Once again, there is a trade-off
- One way to compare classifiers is to measure the area under the curve (ROC AUC)
- A perfect classifier has a ROC AUC equal to 1
- A purely random classifier: ROC AUC = 0.5
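
A minimal check with `roc_curve` and `roc_auc_score` (toy scores from the scikit-learn docs):

```python
from sklearn.metrics import roc_curve, roc_auc_score

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

fpr, tpr, thresholds = roc_curve(y_true, scores)
auc = roc_auc_score(y_true, scores)
print(auc)  # 0.75: better than random (0.5), far from perfect (1.0)
```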

### ROC curve or PR curve?

- Prefer the PR curve when:
  - the positive class is rare
  - you care more about the false positives than the false negatives
- Otherwise use the ROC curve
- Example: looking at the previous ROC curve you may think the classifier is really good, but this is mostly because there are few positives (5s) compared to negatives (non-5s). In contrast, the PR curve makes it clear that the classifier has room for improvement.

## Multiclass Classification

Distinguishes between more than two classes; such models are also called multinomial classifiers.

- Some classifiers handle multiple classes natively (logistic regression, random forest, Gaussian NB, ...)
- Others are strictly binary classifiers (SGD, SVC, ...)

### How to perform multiclass classification with multiple binary classifiers

#### One-versus-all (OvA/OvR)

Build a system that classifies instances into k classes by training k binary classifiers.
To classify a new instance:

- compute the decision score of each classifier
- select the class whose classifier outputs the highest score

On MNIST: 10 binary classifiers, one per digit: 0-detector, 1-detector, ...
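
A sketch on scikit-learn's small bundled digits dataset (a stand-in for MNIST), wrapping a binary `SGDClassifier` in `OneVsRestClassifier`:

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.multiclass import OneVsRestClassifier

X, y = load_digits(return_X_y=True)  # 8x8 digits, a small MNIST stand-in

ovr = OneVsRestClassifier(SGDClassifier(random_state=0)).fit(X, y)
print(len(ovr.estimators_))  # 10: one binary detector per digit
```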

#### One-versus-One (OvO)

Train a binary classifier for every pair of labels:
one to distinguish 0s and 1s, another to distinguish 0s and 2s, ...
For N classes => needs N×(N-1)/2 classifiers.

To classify an image:

- run the image through all the classifiers
- see which class wins the most duels

Each classifier only needs to be trained on the part of the training set containing the two classes it must distinguish.

Some algorithms (e.g., SVM) scale poorly with the size of the training set ⇒ OvO is preferred because it is faster to train many classifiers on small training sets than a few classifiers on large training sets.
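
The pair count can be verified with `OneVsOneClassifier` on the bundled digits dataset (sketch):

```python
from sklearn.datasets import load_digits
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# 10 classes => 10 * 9 / 2 = 45 pairwise classifiers
ovo = OneVsOneClassifier(SVC()).fit(X, y)
print(len(ovo.estimators_))  # 45
```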

### Error Analysis

Use ConfusionMatrixDisplay from the sklearn.metrics module.

### Data augmentation

Create new instances by taking existing instances and tweaking them a bit.

E.g.: shifting the digits in some direction in the MNIST dataset.
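
A sketch of the shift with plain NumPy (`np.roll` plus zero-filling the vacated border; `scipy.ndimage.shift` would also work):

```python
import numpy as np

def shift_image(image, dx, dy):
    """Shift a 2-D image by dx columns and dy rows, filling vacated pixels with 0."""
    shifted = np.roll(image, shift=(dy, dx), axis=(0, 1))
    if dy > 0:
        shifted[:dy, :] = 0
    elif dy < 0:
        shifted[dy:, :] = 0
    if dx > 0:
        shifted[:, :dx] = 0
    elif dx < 0:
        shifted[:, dx:] = 0
    return shifted

img = np.zeros((28, 28))
img[10, 10] = 1.0
print(shift_image(img, 1, 0)[10, 11])  # 1.0: the pixel moved one column right
```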

## Multilabel Classification

In some cases, you may want your classifier to output multiple classes for each instance.

- Face recognition: several people in the same picture.
- News: an article may have several topics (e.g., diplomacy, sport, politics, business).

=> A system that outputs multiple binary tags is called a multilabel classification system.

To go further: have a look at ClassifierChain to capture dependencies between labels.
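
A sketch with `KNeighborsClassifier`, which supports multilabel targets natively (synthetic data, hypothetical tags):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
# two binary tags per instance (hypothetical labels)
y = np.c_[X[:, 0] > 0, X[:, 1] > 0].astype(int)

knn = KNeighborsClassifier().fit(X, y)
pred = knn.predict(X[:3])
print(pred.shape)  # (3, 2): one row per instance, one column per tag
```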

## Multioutput Classification

Also called multioutput-multiclass classification, or simply multioutput classification.

A generalization of multilabel classification where each label can be multiclass (i.e., can take more than two possible values).

Example: image denoising.
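
A denoising sketch on the bundled digits dataset: corrupt the inputs with noise and use the clean pixels as a multioutput target, so each of the 64 outputs is a multiclass label:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)  # pixel intensities 0..16

rng = np.random.default_rng(0)
X_noisy = X + rng.integers(0, 5, X.shape)  # corrupted input

# target = the clean image: 64 outputs, each a multiclass label (0..16)
knn = KNeighborsClassifier().fit(X_noisy, X.astype(int))
clean = knn.predict(X_noisy[:1])
print(clean.shape)  # (1, 64): one denoised pixel value per output
```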