Jakub Lemiesz - Homepage


Results

Practical Exercises Results - 9.12.2020

Lecture Materials

Slides and files connected to the lectures will be available on MS Teams, in the group for this lecture - MS Teams link

What is data mining - a map
Supervised, unsupervised and semi-supervised learning
Inference vs. prediction
Statistics vs. machine learning

Regression vs. classification problems
Building a model: training, validation and test data
Model flexibility, overfitting, bias/variance decomposition
Optimal (Bayes) model, naive Bayes model
K-nearest neighbours model
Curse of dimensionality
Parametric vs. non-parametric models

Linear regression, model assumptions, scatter plots
Point and interval estimation of model parameters
Hypothesis testing, p-value, T-statistic, F-statistic
R-squared, adjusted R-squared
Collinearity, interaction terms

Dealing with data
Categorical vs. continuous variables
One-hot encoding
Outliers and high-leverage points
Resampling methods: cross-validation and bootstrap
Data representation, creating a data pipeline

Classification algorithms overview
Binomial and multinomial logistic regression
Decision Trees + entropy and Gini index
Random Forests + model ensembling technique
No free lunch theorem + decision boundaries visualization
Loss function: cross-entropy, Kullback–Leibler divergence
Model hyperparameters and grid-search

Neural networks and deep learning
Automatic feature extraction
Stochastic gradient descent
Backpropagation algorithm
Standard layers and activation functions
Multi-input and multi-output networks

Convolutional neural networks
Local connectivity and parameter sharing
Convolution and MaxPooling layers
Constraints on kernel size, strides and padding
1x1 convolution, Inception model and its evolution

Regularization techniques
L1 and L2 regularization
Dropout, data augmentation
Transfer learning and fine-tuning

Unsupervised learning
Clustering, anomaly detection
K-means and hierarchical clustering
Visualization techniques for multidimensional data