Flyman's Knowledge Base: Introduction to Machine Learning, Chapter 2

Chapter 2: Supervised Learning

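Reading this chapter made me realize that data mining textbooks and machine learning textbooks differ somewhat in how they present things. The machine learning book lets me look at classification from a broader angle. I had already seen most of the material in a data mining textbook, but I still picked up a few things.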

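1. Classification means finding the boundary between positive and negative examples. The most general hypothesis is the largest region that contains all the positive examples and none of the negative ones. Its opposite, the most specific hypothesis, is the tightest region that contains all the positives. Any hypothesis between the two is consistent with the training set, and a supervised learning algorithm looks for one in this range. This in-between range is called the version space.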
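
To make the idea concrete, here is a minimal sketch that assumes the hypothesis class is axis-aligned rectangles in two dimensions. The toy points and the function names are made up for illustration. It computes the most specific hypothesis and checks whether a given rectangle is consistent with the training set, which is what membership in the version space means.

```python
# Hypothetical toy data: each example is a point (x1, x2).
positives = [(2.0, 3.0), (3.0, 5.0), (4.0, 4.0)]
negatives = [(0.5, 1.0), (6.0, 6.0), (3.0, 8.0)]


def most_specific_rectangle(points):
    """Tightest axis-aligned rectangle (x_min, x_max, y_min, y_max) around the points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), max(xs), min(ys), max(ys)


def contains(rect, point):
    """True if the point lies inside the rectangle (borders included)."""
    x_min, x_max, y_min, y_max = rect
    return x_min <= point[0] <= x_max and y_min <= point[1] <= y_max


def is_consistent(rect, positives, negatives):
    """A rectangle is consistent if it covers every positive and no negative."""
    covers_all_positives = all(contains(rect, p) for p in positives)
    covers_a_negative = any(contains(rect, n) for n in negatives)
    return covers_all_positives and not covers_a_negative


S = most_specific_rectangle(positives)
print(S)                                      # (2.0, 4.0, 3.0, 5.0)
print(is_consistent(S, positives, negatives))  # True

# A somewhat larger rectangle is still consistent, so it is also in the version space.
print(is_consistent((1.0, 5.0, 2.0, 7.0), positives, negatives))  # True

# A rectangle that is too general swallows a negative example.
print(is_consistent((0.0, 7.0, 0.0, 9.0), positives, negatives))  # False
```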

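2. The chapter introduces the Vapnik-Chervonenkis (VC) dimension, which measures the capacity of a hypothesis class: the largest number of points that the class can shatter, that is, label correctly under every possible assignment of positive and negative labels. For axis-aligned rectangles, for example, the VC dimension is 4.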
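
Shattering can be checked by brute force on small point sets. The sketch below again assumes axis-aligned rectangles, and the point sets and function names are my own. Note that it only tests the specific sets given: finding one set of 4 points that is shattered shows the VC dimension is at least 4, but one failing set of 5 points is not by itself a proof that no set of 5 works.

```python
from itertools import product


def rectangle_can_realize(points, labels):
    """True if some axis-aligned rectangle contains exactly the points labeled 1."""
    pos = [p for p, y in zip(points, labels) if y == 1]
    if not pos:
        return True  # an empty rectangle labels every point negative
    x_min, x_max = min(p[0] for p in pos), max(p[0] for p in pos)
    y_min, y_max = min(p[1] for p in pos), max(p[1] for p in pos)
    # The tightest rectangle around the positives is the only candidate we need:
    # if it captures a negative point, every larger rectangle does too.
    return not any(
        x_min <= p[0] <= x_max and y_min <= p[1] <= y_max
        for p, y in zip(points, labels)
        if y == 0
    )


def is_shattered(points):
    """True if rectangles can realize all 2^N labelings of the given points."""
    return all(
        rectangle_can_realize(points, labels)
        for labels in product([0, 1], repeat=len(points))
    )


diamond = [(0, 1), (1, 0), (2, 1), (1, 2)]
with_center = diamond + [(1, 1)]

print(is_shattered(diamond))      # True
print(is_shattered(with_center))  # False: the center cannot be excluded alone
```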

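3. Probably Approximately Correct (PAC) Learning
This shows that if we take the most specific hypothesis (the tightest rectangle in the book's example) as our hypothesis, we can estimate how many examples N are needed so that, with probability at least 1-δ, the error is at most ε: N >= (4/ε) log(4/δ), where log is the natural logarithm.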
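
As a quick sanity check on the bound, here is a small sketch that plugs numbers into N >= (4/ε) log(4/δ). The function name and the example values of ε and δ are my own choices.

```python
import math


def pac_sample_size(epsilon, delta):
    """Smallest integer N satisfying N >= (4 / epsilon) * ln(4 / delta)."""
    return math.ceil((4.0 / epsilon) * math.log(4.0 / delta))


# Error at most 0.1 with probability at least 0.95 (delta = 0.05).
print(pac_sample_size(0.1, 0.05))   # 176

# Halving the allowed error roughly doubles the number of examples needed.
print(pac_sample_size(0.05, 0.05))  # 351
```

The bound grows linearly in 1/ε but only logarithmically in 1/δ, so asking for more confidence is much cheaper than asking for a smaller error.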

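4. Multiple classes: a K-class problem can be treated as K two-class problems, with one classifier per class that separates that class from all the others.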
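
A rough sketch of the one-classifier-per-class idea, assuming each class gets its own tightest axis-aligned rectangle. The data, labels and function names are invented for illustration. A point is rejected when zero or several rectangles claim it.

```python
def fit_rectangle(points):
    """Tightest axis-aligned rectangle (x_min, x_max, y_min, y_max) around the points."""
    xs, ys = zip(*points)
    return min(xs), max(xs), min(ys), max(ys)


def inside(rect, p):
    """True if point p lies inside the rectangle (borders included)."""
    return rect[0] <= p[0] <= rect[1] and rect[2] <= p[1] <= rect[3]


def fit_one_per_class(data):
    """data maps a class label to its list of (x1, x2) points; returns one rectangle per class."""
    return {label: fit_rectangle(points) for label, points in data.items()}


def predict(hypotheses, p):
    """Return the single class whose rectangle contains p, or None to reject."""
    hits = [label for label, rect in hypotheses.items() if inside(rect, p)]
    return hits[0] if len(hits) == 1 else None


# Hypothetical toy data for a 3-class problem.
data = {
    "family car": [(2, 3), (3, 4), (4, 3)],
    "sports car": [(7, 8), (8, 9), (9, 8)],
    "luxury sedan": [(7, 2), (8, 3), (9, 2)],
}
hypotheses = fit_one_per_class(data)

print(predict(hypotheses, (3, 3.5)))  # family car
print(predict(hypotheses, (8, 8.5)))  # sports car
print(predict(hypotheses, (5, 5)))    # None (no rectangle claims this point)
```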

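5. The effect of noise: a more complex model is not necessarily better. A complex hypothesis that fits every training example may just be fitting the noise.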

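6. Model Selection and Generalization
Because we can almost never obtain every possible training example, our algorithm has to make some assumptions, called the inductive bias. Examples are assuming that a rectangle marks out the class region or, in linear regression, assuming a linear function.
Choosing the right bias is model selection. How well the chosen model then predicts new examples is generalization, and here we have to watch the function complexity, or we may end up overfitting or underfitting. The book mentions some remedies, such as checking the error on a held-out validation set (cross-validation).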
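
The trade-off can be seen in a small experiment. This is only a sketch under my own assumptions: synthetic data from a noisy sine curve, polynomial models of increasing degree, a single train/validation split, and NumPy's `polyfit` for the fitting. None of these specifics come from the book.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: a smooth nonlinear function plus Gaussian noise.
x = rng.uniform(-1.0, 1.0, size=60)
y = np.sin(3.0 * x) + rng.normal(scale=0.1, size=x.size)

# Hold out part of the sample as a validation set.
x_train, y_train = x[:40], y[:40]
x_val, y_val = x[40:], y[40:]


def mean_squared_error(coeffs, xs, ys):
    """Average squared difference between the polynomial's predictions and ys."""
    return float(np.mean((np.polyval(coeffs, xs) - ys) ** 2))


train_errors = {}
val_errors = {}
for degree in range(1, 10):
    coeffs = np.polyfit(x_train, y_train, degree)  # least-squares polynomial fit
    train_errors[degree] = mean_squared_error(coeffs, x_train, y_train)
    val_errors[degree] = mean_squared_error(coeffs, x_val, y_val)

# Model selection: keep the complexity with the lowest validation error.
best_degree = min(val_errors, key=val_errors.get)

for degree in sorted(val_errors):
    print(degree, round(train_errors[degree], 4), round(val_errors[degree], 4))
print("selected degree:", best_degree)
```

The training error can only go down as the degree grows, so it cannot be used to pick the complexity. The validation error is what exposes a model that is too simple (a straight line underfits the sine curve) or one that has started to chase the noise.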

7. Assuming the sample is iid, a supervised learning algorithm has to settle three things:

(1) The model and its parameters θ, which together define the hypothesis
(2) A loss function for computing the approximation error
(3) An optimization procedure: find the θ that minimizes the approximation error
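
The three parts map directly onto code. Below is a minimal sketch assuming a linear model, squared-error loss and plain gradient descent. These choices, the learning rate, the step count and the toy data are all my own. The book's framework allows any model, loss and optimizer.

```python
import numpy as np


# (1) Model: g(x | theta) = theta[0] + theta[1] * x
def model(x, theta):
    return theta[0] + theta[1] * x


# (2) Loss: squared error, averaged over the sample (the approximation error)
def approximation_error(theta, x, y):
    return float(np.mean((y - model(x, theta)) ** 2))


# (3) Optimization: gradient descent on the approximation error
def fit(x, y, learning_rate=0.1, steps=2000):
    theta = np.zeros(2)
    for _ in range(steps):
        residual = model(x, theta) - y
        # Gradient of the mean squared error with respect to theta[0] and theta[1].
        gradient = np.array([2.0 * residual.mean(), 2.0 * (residual * x).mean()])
        theta -= learning_rate * gradient
    return theta


# Hypothetical noise-free data generated from y = 1 + 2x.
x = np.linspace(0.0, 1.0, 20)
y = 1.0 + 2.0 * x

theta = fit(x, y)
print(theta)                             # close to [1. 2.]
print(approximation_error(theta, x, y))  # close to 0
```

For this particular model and loss, the minimum also has a closed-form least-squares solution. Gradient descent is used here only because it shows the optimization step as a separate, swappable part.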

Monday, February 12, 2007