
2018/19: Probability

This is a lecture on Probability for students of Big Data Analytics at the Faculty of Fundamental Problems of Technology. It takes place on Thursdays, 17:05 - 18:45, in room A2/L-1.

Rules for passing the course

The course ends with an exam. The final mark for the course will be calculated using the formula: $$ \text{if } Exam = 2 \text{ then } 2, \text{ else } \max\left(Exam, \frac{Exam+Exercises}{2}\right) $$ where $Exam$ is the mark from the examination and $Exercises$ is the final mark obtained from the exercise classes.

  1. First term: 24.06.2019; room 320a/A1; time: 11:00 - 13:00
  2. Second term: 01.07.2019; room 320a/A1; time: 11:00 - 13:00

Sample tasks for the exam

  1. Let $X$ and $Y$ be two independent random variables with uniform distribution on $\{1,\ldots,n\}$. Let $Z = \min(X,Y)$. Calculate $E(Z)$ and $var(Z)$.
  2. The expected value of IQ is 100 and its standard deviation is 15. Estimate the probability that a randomly met person on the street has an IQ greater than 150 using
    • Markov inequality
    • Chebyshev inequality
    • the knowledge that IQ is well approximated by a normal distribution with the above parameters
  3. Let $X:\Omega\to\{0,1,2,\ldots\}$ be a random variable with a generating function $\phi$. Let $Y = a X + b$ where $a, b$ are two fixed positive natural numbers. Express the generating function of $Y$ in terms of the generating function of $X$.
  4. Let us consider $[0,1]^2$ as a probability space where probability is the Lebesgue measure. Let $X,Y:\Omega\to\mathbb{R}$ be given by $X(x,y)=x+y$ and $Y(x,y)=x-y$. Are the random variables $X$, $Y$ independent?

First term

Task 1. Let $a >0$ and let $X$ be a random variable such that $X \geq a$ and $E[X]=2\cdot a$. Show that $\Pr[X\geq 3\cdot a] \leq \frac12$.
Solution. Let $Y=X-a$. Then $Y\geq 0$ and $(X\geq 3\cdot a) \equiv (Y\geq 2\cdot a)$, so, from Markov inequality, we have $$ \Pr[X\geq 3\cdot a] = \Pr[Y\geq 2\cdot a] \leq \frac{E(Y)}{2 a} = \frac{E(X-a)}{2a} = \frac{E(X)-a}{2a} = \frac{a}{2a} = \frac12~. $$
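The bound can also be checked empirically. Below is a minimal Monte Carlo sketch (an illustration only, not part of the required solution); the concrete choice $X = a + \mathrm{Exp}$ with mean $a$ is just one example of a variable satisfying the assumptions.

    import random

    # Monte Carlo sanity check of Task 1 (illustration only): take one concrete
    # X with X >= a and E[X] = 2a, here X = a + Exp with mean a, and verify that
    # the empirical Pr[X >= 3a] stays below 1/2.
    a = 1.0
    trials = 100_000
    hits = sum(a + random.expovariate(1 / a) >= 3 * a for _ in range(trials))
    print(hits / trials)   # about exp(-2) ~ 0.135, well below the bound 1/2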

Task 2. We choose points $(x_n,y_n) \in [0,1]^2$ independently according to the uniform distribution on $[0,1]^2$. Let $$ X_n = \begin{cases} 1 &: \text{if }y_n \leq (x_n)^2\\0 &: \text{otherwise}\end{cases}~. $$ Let $S_n = \frac{X_1+\ldots+X_n}{n}$. Estimate the following probability for large values of $n$: $$ \Pr\left[ \frac13 - \sqrt{\frac{2}{n}} \leq S_n \leq \frac13 + \sqrt{\frac{2}{n}}\right]~. $$ Solution. The area of the figure $\{(x,y)\in [0,1]^2: y \leq x^2\}$ is $\frac13$. Therefore $X_n \sim Ber(\frac13)$, so $E[X_n] = \frac13$ and $var(X_n) = \frac29$. From the Central Limit Theorem we deduce that the random variables $$ Z_n = \frac{(X_1+\ldots+X_n) - \frac13 \cdot n}{\sqrt{n} \sqrt{\frac29}} $$ converge in distribution to $\mathcal{N}(0,1)$. Therefore $\Pr[-3 \leq Z_n \leq 3] \approx 0.997$. But the following statements are equivalent: $$ -3 \leq Z_n \leq 3$$ $$ -3 \sqrt{n}\sqrt{\frac29} \leq (X_1+\ldots + X_n) - \frac13 n \leq 3 \sqrt{n}\sqrt{\frac29}$$ $$ -\sqrt{\frac{2}{n}} \leq S_n - \frac13 \leq \sqrt{\frac{2}{n}} $$ $$ \frac13 - \sqrt{\frac{2}{n}} \leq S_n \leq \frac13 + \sqrt{\frac{2}{n}} $$ Therefore the number 0.997 is the required approximation for large $n$.
Comment. Solutions that applied Chernoff bounds instead were also given full credit.
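For completeness, here is a small simulation of Task 2 (an illustration, not required at the exam); the values $n = 10000$ and $500$ repetitions are arbitrary choices.

    import math
    import random

    # Monte Carlo illustration of Task 2: estimate the probability that S_n
    # falls within sqrt(2/n) of 1/3 and compare it with the predicted 0.997.
    def indicator():
        x, y = random.random(), random.random()
        return y <= x * x        # X_i = 1 iff the point lies below y = x^2

    n, trials = 10_000, 500
    eps = math.sqrt(2 / n)
    inside = sum(
        1 / 3 - eps <= sum(indicator() for _ in range(n)) / n <= 1 / 3 + eps
        for _ in range(trials))
    print(inside / trials)       # should be close to 0.997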

Task 3. Assume that $X_0,X_1,\ldots$ are independent random variables such that $X_n \sim \text{Exp}\left(\left(\frac12\right)^n\right)$ for each $n\geq 0$. Let $Y_n = \min\{X_0,\ldots,X_n\}$. Calculate $\lim_{n\to\infty} E(Y_n)$.
Solution. Observe that if $Z_i \sim\text{Exp}(\lambda_i)$ (where $i=0,\ldots,n$) are independent and $U_n = \min\{Z_0,\ldots,Z_n\}$ then $$\Pr[U_n>x] = \Pr[Z_0>x \land Z_1>x \land \ldots \land Z_n>x] = $$ $$ \exp(-\lambda_0 x)\cdot\ldots\cdot\exp(-\lambda_n x) = \exp(-(\lambda_0+\ldots+\lambda_n)x)~,$$ so $U_n \sim \text{Exp}(\lambda_0+\ldots+\lambda_n)$. From this property of exponential distributions we deduce that $$ Y_n \sim \text{Exp}\left(1+\frac12+\ldots+ \frac{1}{2^n}\right) = \text{Exp}\left(2\left(1-\frac{1}{2^{n+1}}\right)\right)~, $$ so $$ E(Y_n) = \frac{1}{2\left(1-\frac{1}{2^{n+1}}\right)} \to_{n\to\infty} \frac12~. $$
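A quick simulation of Task 3 (an illustration; the choice $n = 20$ and $100000$ repetitions is arbitrary) confirms the limit $\frac12$.

    import random

    # Simulation of Task 3: X_k ~ Exp((1/2)^k) for k = 0,...,n and
    # Y_n = min(X_0,...,X_n); the empirical mean of Y_n should be close to
    # 1 / (2 (1 - 2^{-(n+1)})), which tends to 1/2.
    n, trials = 20, 100_000
    total = sum(min(random.expovariate(0.5 ** k) for k in range(n + 1))
                for _ in range(trials))
    print(total / trials)        # about 0.5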

Bibliography

$ \def\RR{\mathbb{R}} \def\QQ{\mathbb{Q}} \def\ZZ{\mathbb{Z}} \def\CC{\mathbb{C}} \def\NN{\mathbb{N}} \newcommand{\span}[1]{\mathrm{span}(#1)} \newcommand{\IS}[2]{\langle\,#1,#2\rangle} \newcommand{\sgn}[1]{\mathrm{sgn}(#1)} \newcommand{\Borel}[1]{\mathcal{Bor}(#1)} \newcommand{\E}[1]{\mathrm{E}[#1]} \newcommand{\VAR}[1]{\mathrm{var}[#1]} $

Topics discussed during lectures

28.02.2019: Introduction

  1. Probability space: $(\Omega,\mathcal{S},p)$
  2. $\sigma$-fields generated by a finite family of sets
    1. Take a sequence $\mathcal{A}=(A_k)_{k=1,\ldots,n}$ of subsets of $\Omega$
    2. For each sequence $\eta$ from $\{0,1\}^n$ define "components" $$ A_\eta = \bigcap_{k=1}^{n} A_k^{\eta_k} $$ where $X^{0} = X^{c}$ and $X^{1} = X$.
    3. For each $T\subseteq \{0,1\}^n$ put $A_T = \bigcup_{\eta\in T}A_\eta$
    4. The family $\sigma(\mathcal{A}) = \{A_T:T\subseteq\{0,1\}^n\}$ is the $\sigma$-field of subsets of $\Omega$ generated by $\mathcal{A}$ (a small computational sketch of this construction follows this list).
  3. Thm. For every family $\mathcal{A}$ of subsets of $\Omega$ there exists the least $\sigma$-field $\sigma(\mathcal{A})$ of subsets of $\Omega$ containing $\mathcal{A}$.
  4. Open and closed subsets of $\RR^n$
  5. Def. $\text{Borel}(\RR^n) = \sigma(\text{OPEN}(\RR^n))$
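A small computational sketch of the construction from item 2 (my own illustration, assuming a finite $\Omega$ given as a Python set): it enumerates the components $A_\eta$ and returns the family of all unions $A_T$.

    from itertools import product

    # Sketch of the construction from item 2: for a finite Omega and a finite
    # family of subsets, build the components A_eta and return all unions A_T.
    def generated_field(omega, family):
        omega = frozenset(omega)
        family = [frozenset(A) for A in family]
        # components A_eta for eta in {0,1}^n (empty intersections are dropped)
        atoms = set()
        for eta in product((0, 1), repeat=len(family)):
            component = omega
            for A, bit in zip(family, eta):
                component &= A if bit else (omega - A)
            if component:
                atoms.add(component)
        atoms = list(atoms)
        # the generated sigma-field consists of all unions A_T of components
        return {frozenset().union(*(a for a, keep in zip(atoms, bits) if keep))
                for bits in product((0, 1), repeat=len(atoms))}

    # two sets give 4 non-empty components, hence a field with 2^4 = 16 members
    print(len(generated_field({1, 2, 3, 4}, [{1, 2}, {2, 3}])))   # 16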

07.03.2019: Probability

  1. Combinatorial probability space: we have a finite nonempty set $X$; we put $\mathcal{S} = \mathcal{P}(X)$ and $$ \Pr[A] = \frac{|A|}{|X|} $$
  2. Discrete probability space: we have a finite (or countable) set $\Omega=\{\omega_1,\ldots,\omega_n\}$, we put $\mathcal{S} = \mathcal{P}(\Omega)$, we have a sequence of non-negative numbers $(p_1,\ldots,p_n)$ summing to $1$, and we put $$ \Pr[A] = \sum_{i:\omega_i\in A} p_i $$
  3. Thm. $\Pr[A\cup B] = \Pr[A]+\Pr[B] - \Pr[A\cap B]$
  4. Thm (generalization of the previous theorem). For every sequence of events $(A_k)_{k=1,\ldots,n}$ we have $$ \Pr\left[\bigcup_{k=1}^{n} A_k\right] = \sum_{k=1}^{n}(-1)^{k+1} \sum_{T\in [n]^k} \Pr\left[\bigcap_{i\in T}A_i\right] $$
  5. Example. Let $S_n$ denote the set of all permutations of $\{1,\ldots,n\}$. Let $$ F_n = \{\pi\in S_n: (\exists i\in\{1,\ldots,n\})(\pi(i) = i)\} $$ (permutations with fixed points). We consider $S_n$ as a combinatorial probability space. Then $$ \Pr[F_n] = \frac{1}{1!} - \frac{1}{2!} + \frac{1}{3!} - \ldots + (-1)^{n+1}\frac{1}{n!} $$ so $\Pr[F_n] \approx 1 - \frac1e$.
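The last example can be verified by brute force for small $n$ (an illustration; the range $n \leq 7$ keeps the enumeration cheap).

    from itertools import permutations
    from math import e

    # Brute-force check of the example: for small n, the fraction of
    # permutations with at least one fixed point is compared with 1 - 1/e.
    for n in range(2, 8):
        perms = list(permutations(range(n)))
        with_fixed_point = sum(any(pi[i] == i for i in range(n)) for pi in perms)
        print(n, with_fixed_point / len(perms), 1 - 1 / e)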

14.03.2019: Independence

  1. Def. Events $(A_1,\ldots,A_n)$ are independent if for any $T\subseteq \{1,\ldots,n\}$ we have $$ \Pr\left[\bigcap_{i\in T} A_i\right] = \prod_{i\in T}\Pr[A_i] $$
  2. Example: $\Omega = [0,1]^2$, $\mathcal{S} = \Borel{\Omega}$, $P[A] =\lambda(A)$. Consider $A,B\in \Borel{[0,1]}$. Then the sets $A\times [0,1]$ and $[0,1]\times B$ are independent.
  3. Example (Kołmogorov dice): $\Omega = \{\{R\},\{G\},\{B\},\{R,G,B\}\}$ with uniform probability. We put $A_x = \{\omega: x\in\omega\}$. Then events $A_R, A_G, A_B$ are pairwise independent but not independent.
  4. Conditional probability: If $\Pr[B]>0$ then $$\Pr[A|B] = \frac{\Pr[A\cap B]}{\Pr[B]}$$
  5. Observation: If $A$ and $B$ are independent then $\Pr[A|B] =\Pr[A]$.
  6. Thm. Suppose that $(B_k)_{k=1,\ldots,n}$ is a partition of $\Omega$ into events. Then $$\Pr[A] = \sum_{k=1}^{n} \Pr[A|B_k]\Pr[B_k]~.$$
  7. Consider the following algorithm observing a stream of data
    INIT: P = 0; data = NIL; n = 0;
    onGet(x) {
      n = n+1
      if (random() <= 1/n) then {
        P = n
        data = x
      }
    }
    
    Theorem: After reading $n$ elements from a stream we have $\Pr[P = i] =\frac1n$ for each $i\in\{1,\ldots,n\}$ (hence $P$ is uniformly distributed on $\{1,\ldots,n\}$).
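A runnable Python version of the algorithm from item 7 (a sketch; the function name sample_position and the empirical check are my own additions), together with a test of the uniformity claim.

    import random
    from collections import Counter

    # Runnable sketch of the stream-sampling algorithm from item 7
    # (a reservoir of size 1): returns the position P of the kept element.
    def sample_position(stream):
        p, data, n = 0, None, 0
        for x in stream:
            n += 1
            if random.random() <= 1 / n:   # keep the n-th element with prob. 1/n
                p, data = n, x
        return p

    # empirical check of the theorem: P is uniformly distributed on {1,...,n}
    counts = Counter(sample_position(range(5)) for _ in range(50_000))
    print(sorted(counts.items()))          # each position occurs ~10_000 times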

21.03.2019: Expected value

  1. Birthday paradox - I: let $B_{n,k}$ be the event that $k$ birthdays chosen independently and uniformly from a set of $n$ days are all distinct; then $$ \Pr[B_{n,k}] = \prod_{a=1}^{k-1}\left(1-\frac{a}{n}\right) $$ and the upper bound (we used the inequality $1+x\leq e^x$): $$ \Pr[B_{n,k}] \leq \exp \left( -\frac{(k-1)k}{2 n}\right) $$ so, if $k\approx \sqrt{2 n}$, then $\Pr[B_{n,k}] \leq \frac1e$ (a numerical illustration follows this list).
  2. Random variables: $(\forall a)(X^{-1}((a,\infty))\in\mathcal{S})$
  3. Thm. The following sentences are equivalent
    1. $X$ is a random variable
    2. $(\forall B\in\Borel{\RR})(X^{-1}(B)\in\mathcal{S})$
  4. Expected value on discrete probability space: $$ E[X] = \sum_{\omega\in\Omega}X(\omega)\Pr[\{\omega\}] $$
  5. Corollary (for discrete probability space): $E[X] = \sum_{a\in rng(X)} a \Pr[X=a]$
  6. Geometric distribution: $X\sim Geo(p)$ if $(\forall k\geq 1)(\Pr[X=k] = p(1-p)^{k-1})$
  7. Thm. If $X\sim Geo(p)$ then $E[X] = \frac1p$.
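Numerical illustration of item 1 (my own check; the choice $n = 365$ is the usual birthday setting): the exact product is compared with the upper bound.

    from math import ceil, exp, prod, sqrt

    # Birthday paradox, item 1: exact probability that k birthdays drawn
    # uniformly from n days are all distinct versus the bound exp(-(k-1)k/(2n)).
    n = 365
    k = ceil(sqrt(2 * n))                       # k = 28
    exact = prod(1 - a / n for a in range(1, k))
    bound = exp(-(k - 1) * k / (2 * n))
    print(exact, bound, exp(-1))                # exact <= bound, roughly 1/e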

28.03.2019: Expected value

  1. Def. Cumulative distribution function (CDF) of random variable $X$: $$ F_X (t) = \Pr[X\leq t]$$
  2. Def. $f_X$ is a density of a random variable $X$ if for any $a\leq b$ we have $$ \Pr[a\leq X \leq b] = \int_{a}^{b} f_X(x)\, dx $$
  3. If $f_X$ is a density and is continuous at $t$ then $f_X(t) = F_X'(t)$
  4. Thm: If $f_X$ is a density of $X$ then $\E{X} = \int_{\RR} x f_X(x) dx$
  5. Curse of dimensionality - example:
    1. The $n$-dimensional volume of the $n$-dimensional ball $B_n(r)$ of radius $r$ is given by the formula $V_n(r) = C_n r^n$
    2. We consider $B_n(1)$ as a probability space with the uniform probability
    3. We draw a random point from $B_n(1)$ and let $L_n$ be its distance from the center
    4. $\Pr[L_n\leq r] = (C_n r^n)/(C_n 1^n) = r^n$ (for $0\leq r\leq 1$)
    5. $f_{L_n}(r) = (r^n)' = n r^{n-1}$
    6. Therefore $$ \E{L_n} = \int_{0}^{1} x \cdot n \cdot x^{n-1} dx = \frac{n}{n+1} = 1 - \frac{1}{n+1} $$
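A Monte Carlo illustration of the last example (my own sketch; dimension $5$ and rejection sampling from the cube are arbitrary choices): the mean distance from the center should be close to $\frac{n}{n+1}$.

    import random
    from math import sqrt

    # Curse of dimensionality: sample uniform points in the 5-dimensional unit
    # ball by rejection from the cube [-1,1]^5 and check that the mean distance
    # from the centre is close to n/(n+1) = 5/6.
    n, wanted = 5, 20_000
    total, accepted = 0.0, 0
    while accepted < wanted:
        point = [random.uniform(-1, 1) for _ in range(n)]
        r = sqrt(sum(c * c for c in point))
        if r <= 1:                  # accept only points inside the unit ball
            total += r
            accepted += 1
    print(total / accepted)         # about 5/6 ~ 0.833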

04.04.2019: Independent random variables

  1. Thm. If $(A_n)$ is a partition into events then $$ \E{X} = \sum_n \E{X|A_n} \Pr[A_n]$$
  2. Morris probabilistic counter:
    INIT: C = 0;
    
    onTick() { 
      if (random() <= (1/2)^C)   // increment C with probability 2^(-C)
        C++;
    }
    
    Theorem. If $C_n$ denotes the value of the counter $C$ after $n$ calls of onTick, then $\E{2^{C_n}} = n+1$ (a simulation sketch follows this list).
  3. Independence of random variables
  4. Example: If $X_1,\ldots,X_n$ are independent and have $Ber(p)$ distribution then $$ X_1+\ldots+X_n \sim Bin(n,p) $$ So if $Y \sim Bin(n,p)$ then $\E{Y} = n p$
  5. Thm. If $X,Y$ are independent then $\E{X\cdot Y} = \E{X}\cdot\E{Y}$
  6. Def. $\VAR{X} = \E{(X-\E{X})^2}$
  7. Thm. $\VAR{X} = \E{X^2} - (\E{X})^2$
  8. Thm. If $X_1,\ldots,X_n$ are pairwise independent and $Y=X_1+\ldots+X_n$ then $$ \VAR{Y} = \sum_{k=1}^{n} \VAR{X_k} $$
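A simulation sketch of the Morris counter from item 2 (an illustration; the parameters $n = 500$ ticks and $40000$ runs are arbitrary).

    import random

    # Morris counter: average 2^C over many independent runs of n ticks and
    # compare the result with the theorem E[2^{C_n}] = n + 1.
    def morris(n_ticks):
        c = 0
        for _ in range(n_ticks):
            if random.random() <= 0.5 ** c:   # increment with probability 2^{-C}
                c += 1
        return c

    n, runs = 500, 40_000
    print(sum(2 ** morris(n) for _ in range(runs)) / runs)   # roughly n + 1 = 501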

11.04.2019: Basic inequalities

  1. Thm [Markov inequality] If $X\geq 0$ and $a>0$ then $\Pr[X\geq a] \leq \frac{\E{X}}{a}$
  2. Thm [Chebyshev's inequality]. If $a>0$ then $\Pr[|X-\E{X}|\geq a] \leq \frac{\VAR{X}}{a^2}$
  3. Example. Suppose that $X_1,\ldots, X_n$ are pairwise independent, $\E{X_i} = \mu$, $\VAR{X_i} = \sigma^2$. Let $Y_n = X_1+\ldots+X_n$. Then
    1. $\E{Y_n} = n \cdot\mu$
    2. $\VAR{Y_n} = n \cdot\sigma^2$
    3. If $\epsilon\gt 0$ then $\Pr[|Y_n- n\cdot\mu| \geq n^{\frac12+\epsilon}\sigma] \leq \frac{1}{n^{2\epsilon}} \to_{n\to\infty} 0$.
  4. Reformulation of previous example. Suppose that $X_1,\ldots, X_n$ are pairwise independent, $\E{X_i} = \mu$, $\VAR{X_i} = \sigma^2$. Let $$S_n = \frac{X_1+\ldots+X_n}{n}$$ Then
    1. $\E{S_n} = \mu$
    2. $\VAR{S_n} = \frac{\sigma^2}{n}$
    3. For every $\epsilon\gt 0$ we have $$ \Pr[|S_n-\mu| \geq \frac{n^\epsilon}{\sqrt{n}}\sigma] \leq \frac{1}{n^{2\epsilon}} \to_{n\to\infty} 0 $$
  5. Def. If $X:\Omega\to\NN$ then its generating function is defined by formula $$ \phi_X(x) = \sum_{n} x^n \Pr[X=n]~. $$
  6. Example: if $X\sim\text{Bin}(n,p)$ then $\phi_X(x) = (q+px)^n$, where $q = 1-p$
  7. Example: if $X\sim\text{Geo}(p)$ then $\phi_X(x) = (px)/(1-qx)$
  8. Thm. If $X:\Omega\to\NN$ then $\displaystyle\E{X} = \lim_{x\to 1^-} \phi_X'(x)$ and $\displaystyle\lim_{x\to 1^-} \phi_X''(x) = \E{X^2}-\E{X}$.
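A symbolic check of items 6-8 (an illustration assuming the sympy library is available; the geometric distribution is used as the example): the two limits give $\E{X} = \frac1p$ and $\E{X^2}-\E{X} = \frac{2q}{p^2}$.

    import sympy as sp

    # For X ~ Geo(p) with phi_X(x) = p x / (1 - q x), q = 1 - p, the limits of
    # phi' and phi'' at x -> 1^- give E[X] and E[X^2] - E[X] respectively.
    x, p = sp.symbols('x p', positive=True)
    q = 1 - p
    phi = p * x / (1 - q * x)
    print(sp.simplify(sp.limit(sp.diff(phi, x), x, 1, dir='-')))      # 1/p
    print(sp.simplify(sp.limit(sp.diff(phi, x, 2), x, 1, dir='-')))   # 2*(1-p)/p**2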

15.04.2019: Generating functions

  1. Thm. If $X_1,\ldots,X_n$ are independent and $Y = X_1+\ldots+X_n$ then $$ \phi_Y = \prod_{i=1}^{n} \phi_{X_i}~. $$
  2. Example: if $X_1,\ldots,X_n$ are independent and $X_i \sim \text{Ber}(p)$, then $$\phi_{X_1+\ldots+X_n} (x) = \left(\phi_{X_1}(x)\right)^n = (q+px)^n$$.
  3. Thm. Let $N,X_1,X_2,\ldots$ be independent random variables with values in $\NN$, with $X_i \sim X$ for all $i$, and let $$ S= \sum_{i=1}^{N} X_i~.$$ Then $\phi_{S} = \phi_N(\phi_X)$.
    From this we get $\E{S} = \E{N}\cdot \E{X}$.
  4. Branching process: controlled by random variable $X:\Omega\to \NN$.
    1. THEOREM. Let $X_n$ be the size of the population at the $n$-th step. Then $$ \phi_{X_{n+1}}(x) = \phi_{X_{n}}(\phi_X(x)) $$
    2. THEOREM. Let $d_n = \Pr[X_n=0]$. Then $0=d_0\leq d_1\leq d_2 \leq \ldots\leq 1$ (so this sequence is convergent) and $$ d_{n+1} = \phi_X(d_n)~. $$
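A small numerical sketch of the last theorem (an illustration; the offspring distribution $\Pr[X=0]=\frac14$, $\Pr[X=1]=\frac14$, $\Pr[X=2]=\frac12$ is an assumed example): iterating $d_{n+1}=\phi_X(d_n)$ gives the extinction probability.

    # Branching process: iterate d_{n+1} = phi_X(d_n) starting from d_0 = 0;
    # the limit is the extinction probability (the smallest fixed point of phi).
    def phi(x):                    # generating function of the offspring law
        return 0.25 + 0.25 * x + 0.5 * x * x

    d = 0.0
    for _ in range(100):
        d = phi(d)
    print(d)                       # converges to 0.5 for this offspring law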

02.05.2019: Chernoff bounds

We introduced the notion of moment generating function $$ M_X(t) = \E{e^{t X}} $$ and we proved its basic properties.
We proved the following theorem: if $X_1,\ldots,X_n \sim \text{Ber}(p)$ are independent, $Y=X_1+\ldots+X_n$, $\mu = n\cdot p$ then for every $\delta>0$ we have
$$ \Pr[Y>(1+\delta)\cdot \mu] \leq e^{-\frac{\mu\delta^2}{2+\delta}} $$
and for any $\delta \in (0,1)$ we have
$$ \Pr[Y\lt(1-\delta)\cdot \mu] \leq e^{-\frac{\mu\delta^2}{2}} $$
Read the notes: Chernoff bounds, and some applications.
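An empirical comparison with the first bound above (my own illustration; the parameters $n=200$, $p=0.3$, $\delta=0.3$ are arbitrary choices):

    import random
    from math import exp

    # Y ~ Bin(n, p), mu = n p: the empirical tail Pr[Y > (1+delta) mu] should
    # lie below the Chernoff bound exp(-mu delta^2 / (2 + delta)).
    n, p, delta, trials = 200, 0.3, 0.3, 20_000
    mu = n * p
    threshold = (1 + delta) * mu
    tail = sum(sum(random.random() < p for _ in range(n)) > threshold
               for _ in range(trials)) / trials
    print(tail, exp(-mu * delta ** 2 / (2 + delta)))   # e.g. ~0.003 vs ~0.096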

09.05.2019: Bloom filters

Read the classical paper: Network Applications of Bloom Filters: A Survey
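Below is a minimal Bloom filter sketch (my own illustration, not taken from the lecture or the survey; the parameters $m=1000$, $k=4$ and the use of Python's built-in hash are arbitrary simplifications).

    # Minimal Bloom filter sketch: m bits, k hash functions simulated by
    # hashing (i, item); membership queries may give false positives but
    # never false negatives.
    class BloomFilter:
        def __init__(self, m, k):
            self.m, self.k = m, k
            self.bits = [False] * m

        def _positions(self, item):
            return [hash((i, item)) % self.m for i in range(self.k)]

        def add(self, item):
            for pos in self._positions(item):
                self.bits[pos] = True

        def __contains__(self, item):
            return all(self.bits[pos] for pos in self._positions(item))

    bf = BloomFilter(m=1_000, k=4)
    for word in ["alpha", "beta", "gamma"]:
        bf.add(word)
    print("beta" in bf, "delta" in bf)   # True, and almost surely False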

16.05.2019: Normal distributions - I

  1. Basic integral:
    $$ \int\limits_{\RR} e^{-x^2} \; dx = \sqrt{\pi} $$
    Beginning of the proof: put $C = \int_{\RR} e^{-x^2} \! dx$. Then $$ C^2 = \left(\int\limits_{\RR} e^{-x^2} \! dx\right) \cdot \left(\int\limits_{\RR} e^{-y^2} \! dy\right) = \iint\limits_{\RR^2} e^{-(x^2+y^2)} \;dx\; dy = \ldots $$ Lord Kelvin said of this trick: “A mathematician is one to whom that is as obvious as that twice two makes four is to you. Liouville was a mathematician.”
  2. Second integral: $\int_{\RR} x e^{-x^2} \!dx = 0$
  3. Third integral: $$ \int\limits_{\RR} x^2 e^{-x^2} \!dx = \int\limits_{\RR} x \cdot (-\frac12) (e^{-x^2})' \! dx = (-\frac12) \int\limits_{\RR} x (e^{-x^2})' \!dx = \ldots = \frac12 \sqrt{\pi} $$Hint: use integration by parts $\int f\cdot g' = f\cdot g - \int f'\cdot g$.
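A quick numerical check of the three integrals (an illustration; the crude midpoint rule below is my own choice and is accurate here because $e^{-x^2}$ decays very fast).

    from math import exp, pi, sqrt

    # Midpoint-rule check of the three integrals on [-10, 10].
    def integral(f, a=-10.0, b=10.0, steps=200_000):
        h = (b - a) / steps
        return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

    print(integral(lambda x: exp(-x * x)), sqrt(pi))              # both ~1.77245
    print(integral(lambda x: x * exp(-x * x)))                    # ~0
    print(integral(lambda x: x * x * exp(-x * x)), sqrt(pi) / 2)  # both ~0.88623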