Lectures
For now, the lectures are held in person.
- What is data mining - a map
- Supervised, unsupervised and semi-supervised learning
- Inference vs. prediction
- Statistics vs. machine learning
- Regression vs. classification problem
- Building a model: training, validation and test data
- Model flexibility, overfitting, bias/variance decomposition (written out after this list)
- Optimal (Bayes) model, naive Bayes model
- K-nearest neighbours model
- Curse of dimensionality
- Parametric vs. non-parametric models
- Linear regression, model assumptions, scatter plots
- Point and interval estimation of model parameters
- Hypothesis testing, p-value, T-statistic, F-statistic
- R-squared and adjusted R-squared
- Collinearity, interaction terms
- Dealing with data
- Categorical vs. continuous variables
- One-hot encoding
- Outliers and high-leverage points
- Resampling methods: cross-validation and bootstrap
- Data representation, creating a data pipeline
- Classification algorithms overview
- Binomial and multinomial logistic regression
- Decision Trees + entropy and Gini index
- Random Forests + model ensembling technique
- No free lunch theorem + decision boundaries visualization
- Loss function: cross-entropy, Kullback–Leibler divergence
- Model hyperparameters and grid-search (see the first code sketch after this list)
- Neural networks and deep learning
- Automatic feature extraction
- Stochastic gradient descent
- Backpropagation algorithm
- Standard layers and activation functions
- Multi-input and multi-output networks
- Convolutional neural networks (see the second code sketch after this list)
- Local connectivity and parameter sharing
- Convolution and MaxPooling layers
- Constraints on kernel size, strides and padding
- 1x1 convolution, Inception model and its evolution
- Regularization techniques
- L1 and L2 regularization
- Dropout, data augmentation
- Transfer learning and fine-tuning
- Unsupervised learning
- Clustering, anomaly detection
- K-means and hierarchical clustering
- Visualization techniques for multidimensional data
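
The notes below illustrate a few of the topics above. First, the bias/variance decomposition: assuming data generated as $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$ and $\operatorname{Var}(\varepsilon) = \sigma^2$, the expected squared prediction error of a fitted model $\hat f$ at a point $x$ splits into

$$
\mathbb{E}\big[(y - \hat f(x))^2\big]
= \underbrace{\big(\mathbb{E}[\hat f(x)] - f(x)\big)^2}_{\text{bias}^2}
+ \underbrace{\operatorname{Var}\big(\hat f(x)\big)}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{irreducible error}}.
$$

More flexible models tend to trade lower bias for higher variance, which is why overfitting shows up as low training error but high test error.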
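
Next, a minimal Python sketch tying together the training/test split, k-nearest neighbours, cross-validation and grid-search items, assuming scikit-learn is available; the synthetic dataset and the candidate values of k are purely illustrative.

```python
# Hold-out split + k-NN classifier + grid search over k with 5-fold cross-validation.
# A minimal sketch: the synthetic data and the list of k values are illustrative only.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data standing in for a real classification problem.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Keep a test set aside; cross-validation runs only on the training part.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Grid search over the hyperparameter k (number of neighbours).
grid = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": [1, 3, 5, 11, 21]}, cv=5)
grid.fit(X_train, y_train)

print("best k:", grid.best_params_["n_neighbors"])
print("test accuracy:", grid.best_estimator_.score(X_test, y_test))
```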
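
Finally, a small convolutional network touching the Conv/MaxPooling, dropout, cross-entropy and SGD items, written against tf.keras; the input shape, layer sizes and the 10-class output are illustrative placeholders rather than part of the lecture material.

```python
# A minimal convolutional network sketch (tf.keras assumed; all sizes are illustrative).
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),                        # e.g. 28x28 grayscale images
    layers.Conv2D(32, kernel_size=3, padding="same", activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Conv2D(64, kernel_size=3, padding="same", activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),
    layers.Dropout(0.5),                                   # dropout as regularization
    layers.Dense(10, activation="softmax"),                # 10-class output
])

# Cross-entropy loss minimized with plain stochastic gradient descent.
model.compile(optimizer="sgd", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()
```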