# Bayes' theorem

## Basics

### Conditional probability

Conditional probability is the probability of an event given that another event has already happened. It is the joint probability divided by the marginal probability.

$$ P(A|B) = \frac{P(AB)}{P(B)} $$

The `|` means “given” ^{1}.
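
As a quick sketch of this definition, here is a small Python example; the two events and their counts are made up purely for illustration.

```python
# A minimal sketch of conditional probability from joint counts.
# The events and counts below are hypothetical.

n_total = 100      # total number of trials
n_B = 50           # trials where B happened (the marginal)
n_A_and_B = 20     # trials where both A and B happened (the joint)

p_joint = n_A_and_B / n_total    # P(AB)
p_B = n_B / n_total              # P(B)
p_A_given_B = p_joint / p_B      # P(A|B) = P(AB) / P(B)

print(p_A_given_B)  # 0.4
```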

### Law of total probability

$$ P(Y) = P(XY) + P(!XY) = P(Y|X)P(X) + P(Y|!X)P(!X) $$
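
A quick numeric sanity check of this law in Python, with made-up values for $P(X)$, $P(Y|X)$, and $P(Y|!X)$:

```python
# Numeric check of the law of total probability with assumed values.

p_X = 0.3             # P(X)
p_Y_given_X = 0.8     # P(Y|X)
p_Y_given_notX = 0.1  # P(Y|!X)

# P(Y) = P(Y|X)P(X) + P(Y|!X)P(!X)
p_Y = p_Y_given_X * p_X + p_Y_given_notX * (1 - p_X)
print(p_Y)  # 0.31
```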

### Bayes' theorem

Clearly $P(A|B)$ has some relationship with $P(B|A)$, because they share the common part $P(AB)$.

$$ P(A|B) = \frac{P(AB)}{P(B)} $$

$$P(B|A) = \frac{P(BA)}{P(A)} $$

$$ P(BA) = P(AB) = P(B|A)P(A) = P(A|B)P(B) $$

Therefore,

$$ P(A|B) = \frac{P(B|A)P(A)}{P(B)} \propto P(B|A)P(A) $$

If we expand the denominator using the law of total probability, it becomes

$$ P(A|B) = \frac{P(B|A)P(A)}{P(B|A)P(A) + P(B|!A)P(!A)} $$

This equation is called Bayes' theorem. It means that if you know something like $P(B|A)$, you can find its reverse $P(A|B)$ (this is the idea behind Bayesian inference, see the next section).
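
For a concrete (and entirely made-up) worked example, consider a hypothetical diagnostic test: $A$ is "has the disease" and $B$ is "tests positive". The numbers below (prevalence, sensitivity, false-positive rate) are assumptions for illustration only.

```python
# A hedged worked example of Bayes' theorem with assumed numbers.

p_disease = 0.01            # P(A): prior probability of disease (prevalence)
p_pos_given_disease = 0.95  # P(B|A): probability of a positive test if diseased
p_pos_given_healthy = 0.05  # P(B|!A): false-positive rate

# Denominator P(B) via the law of total probability
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes' theorem: P(A|B) = P(B|A)P(A) / P(B)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))  # ~0.161
```

Even with a fairly accurate test, the posterior is only about 16% here, because the prior $P(A)$ is small.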

## Bayesian inference

### Hypothesis test

Based on the theorem, if we set the given event $B$ to be our collected data and $A$ to be the thing we want to test (a hypothesis $H_i$), the equation becomes:

$$ \color{blue}{Pr(H_i|data)} = \frac{\color{pink}{Pr(data|H_i)}\color{steelblue}{Pr(H_i)}}{\sum_{j=1}^n \color{pink}{Pr(data|H_j)}\color{steelblue}{Pr(H_j)}} $$

$\color{pink}{Pr(data|H_i)}$ is called the $\color{pink}{likelihood}$, $\color{steelblue}{Pr(H_i)}$ the $\color{steelblue}{prior}$ probability, and $\color{blue}{Pr(H_i|data)}$ the $\color{blue}{posterior}$ probability.
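
A minimal sketch of this update in Python: three hypothetical hypotheses about a coin's bias, a uniform prior, and an assumed data set of 7 heads in 10 flips.

```python
from math import comb

# Hypothetical hypotheses: each H_i is a possible probability of heads
hypotheses = {"fair": 0.5, "biased_heads": 0.7, "biased_tails": 0.3}
priors = {h: 1 / 3 for h in hypotheses}          # Pr(H_i), uniform prior

heads, flips = 7, 10                             # assumed observed data

# Likelihood Pr(data|H_i): binomial probability of the observed heads
likelihoods = {h: comb(flips, heads) * p**heads * (1 - p)**(flips - heads)
               for h, p in hypotheses.items()}

# Denominator: sum over j of Pr(data|H_j) Pr(H_j)
evidence = sum(likelihoods[h] * priors[h] for h in hypotheses)

# Posterior Pr(H_i|data)
posteriors = {h: likelihoods[h] * priors[h] / evidence for h in hypotheses}
print(posteriors)  # "biased_heads" gets the largest posterior
```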

### Parameter estimation

To estimate a parameter, replace the hypotheses with the parameter; the only difference is that a parameter is continuous. For a discrete probability we write $Pr()$; for a continuous one we write $P()$ for the probability density function (PDF), and the sum in the denominator becomes an integral.

$$ \color{blue}{P(\theta|data)} = \frac{\color{pink}{P(data|\theta)}\color{steelblue}P(\theta)}{\int \color{pink}{P(data|\theta)}\color{steelblue}P(\theta)d\theta} $$

The $\theta$ here refers to the single parameter of our PDF. If there are two or more parameters, $\theta$ becomes a vector and the integral is taken over all of them.
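
A sketch of this in Python using grid approximation: the parameter is the heads probability $\theta$ of a coin, with a flat prior and an assumed data set of 7 heads in 10 flips; the integral in the denominator is replaced by a sum over a fine grid.

```python
import numpy as np

theta = np.linspace(0.001, 0.999, 999)   # grid over the parameter theta
prior = np.ones_like(theta)              # flat prior P(theta)

heads, flips = 7, 10                     # assumed observed data

# Likelihood P(data|theta) for a binomial model (constant binomial factor
# omitted; it cancels between numerator and denominator)
likelihood = theta**heads * (1 - theta)**(flips - heads)

# Denominator: integral of P(data|theta) P(theta) d(theta), approximated numerically
d_theta = theta[1] - theta[0]
evidence = np.sum(likelihood * prior) * d_theta

posterior = likelihood * prior / evidence   # P(theta|data), a density over theta
print(theta[np.argmax(posterior)])          # posterior mode, close to 0.7
```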

### What is science?

In conclusion, Bayesian inference works like this: `Initial belief + New data = Updated belief`.

This is exactly the philosophy of science: science is a system for acquiring knowledge and updating our understanding. The scientific method consists of *induction* and *deduction* (see the figure below).

Visual intuition: within the circle of B, the probability of the part that overlaps with A. ↩︎