# PRML Chapter 1

## Polynomial Curve Fitting

 y(x, vec w) = sum_(j=0)^M w_j x^j = w_0 + w_1 x + w_2 x^2 + … + w_M x^M

 hat E(vec w) = E(vec w) + lambda/2 ||vec w||^2

Regularisation terms:

•  L0 = lambda * size(vec w) = lambda M
•  L1 = lambda sum_(i=1)^M |w_i|
•  L2 = lambda sum_(i=1)^M w_i^2
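The regularised least-squares fit has a closed-form solution, w = (Phi^T Phi + lambda I)^-1 Phi^T t, where Phi is the polynomial design matrix. A minimal sketch with NumPy, assuming toy data (noisy samples of sin(2*pi*x), as in PRML's running example) and illustrative values for M and lambda:

```python
import numpy as np

# Assumed toy data: noisy samples of sin(2*pi*x)
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(10)

M = 9        # polynomial order (assumed for illustration)
lam = 1e-3   # L2 regularisation strength lambda

# Design matrix Phi with Phi[n, j] = x_n^j
Phi = np.vander(x, M + 1, increasing=True)

# Regularised least squares: w = (Phi^T Phi + lambda*I)^-1 Phi^T t
w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(M + 1), Phi.T @ t)

y = Phi @ w  # fitted values at the training inputs
```

Increasing `lam` shrinks the weights and smooths the fit; setting it to zero recovers ordinary (unregularised) least squares.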

## Probability Theory

sum rule p(X) = sum_Y p(X, Y )

product rule p(X,Y) = p(Y | X) p(X) = p(X | Y) p(Y)

 p(Y |X) = (p(X|Y )p(Y )) / (p(X))

• p(B) is called the prior probability because it is the probability available before we observe the identity of the fruit.
• p(B|F) is called the posterior probability because it is the probability obtained after we have observed F.
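The prior-to-posterior update can be worked through numerically. A sketch of PRML's box-and-fruit example in plain Python; the specific probabilities (box priors and fruit counts) are assumed here for illustration:

```python
# Assumed prior over the two boxes
p_red, p_blue = 0.4, 0.6

# Assumed likelihood of drawing an orange from each box
p_orange_given_red = 6 / 8
p_orange_given_blue = 1 / 4

# Sum rule: marginal probability of drawing an orange
p_orange = p_orange_given_red * p_red + p_orange_given_blue * p_blue

# Bayes' theorem: posterior probability the box was red, given an orange
p_red_given_orange = p_orange_given_red * p_red / p_orange
```

Observing an orange raises the probability of the red box from the prior 0.4 to the posterior 2/3, because oranges are more likely under the red box.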

### Probability densities

The probability that x falls in the interval (a, b) is:  p(x in (a, b)) = int_a^b p(x) dx

If the probability of a real-valued variable x falling in the interval (x, x + δx) is given by p(x)δx for δx → 0, then p(x) is called the probability density over x.

sum rule:  p(x) = int p(x, y) dy

product rule:  p(x, y) = p(y|x)p(x)

The probability that x lies in the interval (- oo, z) is given by the cumulative distribution function defined by P(z) = int_(- oo)^z p(x) dx, which satisfies P′(x) = p(x). If x is a discrete variable, then p(x) is sometimes called a probability mass function because it can be regarded as a set of ‘probability masses’ concentrated at the allowed values of x.
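The relation between a density p(x) and its cumulative distribution P(z) can be checked numerically. A sketch assuming a standard Gaussian as the concrete density, with the improper lower limit replaced by -8 (where the Gaussian is negligible) and a simple midpoint-rule sum standing in for the integral:

```python
import math

# Assumed density: standard Gaussian p(x)
def p(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Cumulative distribution P(z) = int_{-inf}^{z} p(x) dx,
# approximated by a midpoint Riemann sum starting at -8
def P(z, n=10000):
    a = -8.0
    h = (z - a) / n
    return sum(p(a + (i + 0.5) * h) for i in range(n)) * h
```

By symmetry P(0) comes out near 0.5, and P(8) near 1, consistent with p(x) integrating to one.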

### Expectations and covariances

Discrete:  bbb E[f] = sum_x p(x) f(x)

Continuous:  bbb E[f] = int p(x) f(x) dx

conditional expectation:  bbb E_x[f|y] = sum_x p(x|y) f(x)

The variance of f(x) is defined by  var[f] = bbb E[(f(x) - bbb E[f(x)])^2] = bbb E[f(x)^2] - bbb E[f(x)]^2

covariance expresses the extent to which x and y vary together

• single variables: cov[x, y] = bbb E_(x,y)[{x - bbb E[x]}{y - bbb E[y]}] = bbb E_(x,y)[xy] - bbb E[x] bbb E[y]
• vectors: cov[vec x, vec y] = bbb E_(x,y)[{vec x - bbb E[vec x]}{vec y^T - bbb E[vec y^T]}] = bbb E_(x,y)[vec x vec y^T] - bbb E[vec x] bbb E[vec y^T]
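For a small discrete joint distribution, the expectation, variance, and covariance identities above can be evaluated directly. A sketch with an assumed toy joint p(x, y) over binary x and y:

```python
# Assumed toy joint distribution p(x, y); probabilities sum to 1
pxy = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.4, (1, 1): 0.2}

# Expectations under the joint distribution
Ex  = sum(p * x for (x, y), p in pxy.items())
Ey  = sum(p * y for (x, y), p in pxy.items())
Exy = sum(p * x * y for (x, y), p in pxy.items())
Ex2 = sum(p * x * x for (x, y), p in pxy.items())

var_x  = Ex2 - Ex ** 2   # var[x]  = E[x^2] - E[x]^2
cov_xy = Exy - Ex * Ey   # cov[x,y] = E[xy]  - E[x]E[y]
```

Here cov_xy comes out negative, meaning large x tends to go with small y under this particular joint distribution.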

### Bayesian probabilities

Bayes' theorem for the parameters:  p(vec w|D) = (p(D|vec w) p(vec w)) / (p(D))

The quantity p(D|w) on the right-hand side of Bayes’ theorem is evaluated for the observed data set D and can be viewed as a function of the parameter vector vec w, in which case it is called the likelihood function. It expresses how probable the observed data set is for different settings of the parameter vector vec w.

A widely used frequentist estimator is maximum likelihood, in which w is set to the value that maximizes the likelihood function p(D|w).

### The Gaussian distribution

Normal/Gaussian distribution
 cc N(x|mu, sigma^2) = 1/sqrt(2 pi sigma^2) e^(-(x-mu)^2 / (2 sigma^2))
mu is called the mean and sigma^2 the variance; beta = 1/sigma^2 is called the precision.

• int_(- oo)^(oo) cc N(x|mu, sigma^2) dx = 1
• bbb E[x] = int_(- oo)^(oo) cc N(x|mu, sigma^2) x dx = mu
• bbb E[x^2] = int_(- oo)^(oo) cc N(x|mu, sigma^2) x^2 dx = mu^2 + sigma^2
• var[x] = bbb E[x^2] - bbb E[x]^2 = sigma^2
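These three integrals can be verified numerically. A sketch assuming mu = 1 and sigma = 2, with the infinite limits truncated to [-30, 30] (where the Gaussian is negligible) and a midpoint-rule sum standing in for the integral:

```python
import math

mu, sigma = 1.0, 2.0  # assumed parameters for illustration

def N(x):
    # Gaussian density N(x | mu, sigma^2)
    return (math.exp(-(x - mu) ** 2 / (2 * sigma ** 2))
            / math.sqrt(2 * math.pi * sigma ** 2))

def integrate(f, a=-30.0, b=30.0, n=20000):
    # Midpoint-rule approximation of int_a^b f(x) dx
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

total = integrate(lambda x: N(x))          # normalisation: ~ 1
mean  = integrate(lambda x: x * N(x))      # first moment:  ~ mu
m2    = integrate(lambda x: x * x * N(x))  # second moment: ~ mu^2 + sigma^2
```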

likelihood function for the Gaussian (i.i.d. data):  p(bb x | mu, sigma^2) = prod_(n=1)^N cc N(x_n | mu, sigma^2)
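Maximising the log of this likelihood with respect to mu and sigma^2 gives the familiar closed-form estimates: the sample mean, and the (biased) mean squared deviation. A sketch with an assumed small sample:

```python
import math

# Assumed sample, treated as i.i.d. draws from some Gaussian
data = [2.1, 1.9, 2.4, 2.0, 1.6]
N = len(data)

# Maximum-likelihood estimates from the log-likelihood
mu_ml = sum(data) / N
sigma2_ml = sum((x - mu_ml) ** 2 for x in data) / N  # biased ML estimate

# Log-likelihood at the maximum, for reference
ll = sum(-0.5 * math.log(2 * math.pi * sigma2_ml)
         - (x - mu_ml) ** 2 / (2 * sigma2_ml) for x in data)
```

Note that sigma2_ml divides by N rather than N - 1, so it systematically underestimates the true variance; PRML revisits this bias when discussing Bayesian treatments.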

### Information Theory

H[x] = - sum_x p(x) log p(x)

The entropy H[x] is the average amount of information needed to specify the state of the random variable x.
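A direct way to see this is to compare a uniform distribution with a sharply peaked one over the same states. A sketch in plain Python, using base-2 logarithms so that entropy is measured in bits (the distributions are assumed for illustration):

```python
import math

def entropy(p, base=2):
    # H[x] = -sum_x p(x) log p(x); terms with p(x) = 0 contribute 0
    return -sum(pi * math.log(pi, base) for pi in p if pi > 0)

H_uniform = entropy([0.25] * 4)              # maximal for 4 states: 2 bits
H_peaked  = entropy([0.97, 0.01, 0.01, 0.01])  # much lower: nearly certain
```

The uniform distribution attains the maximum of log2(4) = 2 bits, while the peaked one needs far less information on average because the outcome is nearly certain.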

Kullback-Leibler divergence:

 KL (p || q) = - int p(x) ln ((q(x))/(p(x))) dx
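For discrete distributions the integral becomes a sum, KL(p || q) = sum_x p(x) ln(p(x)/q(x)). A sketch with two assumed distributions, illustrating that the divergence is non-negative, zero only when p = q, and not symmetric:

```python
import math

def kl(p, q):
    # KL(p || q) = sum_x p(x) ln(p(x)/q(x)); terms with p(x) = 0 contribute 0
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Assumed toy distributions over three states
p = [0.8, 0.1, 0.1]
q = [1 / 3, 1 / 3, 1 / 3]

d_pq = kl(p, q)
d_qp = kl(q, p)
```

Because KL(p || q) != KL(q || p) in general, it is a divergence rather than a true distance between distributions.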

mutual information:

 I[x, y] = KL(p(x, y) || p(x) p(y)) = - int int p(x, y) ln ((p(x) p(y)) / (p(x, y))) dx dy

It is zero if and only if x and y are independent.

Original post: http://blog.josephjctang.com/2015-04/prml-chapter-1/
