- Document Preprocessing
Output: plain-text documents after the following preprocessing steps
(1) Remove script tags
(2) Remove HTML tags
(3) Remove stopwords
(4) Stemming
(5) Term frequency limit
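
A rough sketch of how steps (1) to (5) could be chained, using only the Python standard library. Everything named here is an assumption made for illustration: the tiny `STOPWORDS` set, the crude `naive_stem` suffix stripper (a stand-in for a real stemmer such as Porter's), and the `min_tf` / `max_tf` parameters, which read "term frequency limit" as "drop terms whose count within the document falls outside a range". The regex-based tag removal is also a simplification and will not cope with every malformed page.

```python
import html
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real list is much longer).
STOPWORDS = {"a", "an", "and", "the", "is", "are", "of", "to", "in", "on"}

SCRIPT_RE = re.compile(r"<script\b.*?</script\s*>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")
TOKEN_RE = re.compile(r"[a-z]+")


def naive_stem(word):
    """Crude suffix stripping; only a placeholder for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(raw_html, min_tf=1, max_tf=None):
    """Return the list of processed terms for one document."""
    text = SCRIPT_RE.sub(" ", raw_html)   # (1) drop <script>...</script> blocks
    text = TAG_RE.sub(" ", text)          # (2) drop the remaining HTML tags
    text = html.unescape(text)            # decode entities such as &amp;
    tokens = TOKEN_RE.findall(text.lower())
    tokens = [t for t in tokens if t not in STOPWORDS]  # (3) remove stopwords
    stems = [naive_stem(t) for t in tokens]             # (4) stemming
    counts = Counter(stems)
    # (5) keep only terms whose count in this document is within the limits
    kept = {
        term
        for term, count in counts.items()
        if count >= min_tf and (max_tf is None or count <= max_tf)
    }
    return [t for t in stems if t in kept]
```

Calling `preprocess(page_source)` on each downloaded page would give the term lists that the indexing step takes as input; raising `min_tf` filters out terms that appear only rarely in a document.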
- IR System basic functionality
Input: many preprocessed text files
Output: a basic IR system with indexing
(1) Compute commonly used features such as TF, IDF, ... for building the IR model.
(2) Design an easy way to combine the above features, so that you can construct your own weighting scheme equation in the future
※ The program should be easy to use, so don't worry too much about performance (memory space)
※ Reference: "Information Retrieval: Algorithms and Heuristics", 2e, 2004
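
One possible shape for items (1) and (2), sketched in Python under my own assumptions: the class name `SimpleIndex`, its method names, and the idea of passing the weighting scheme in as a plain function over a feature dictionary are all hypothetical, not taken from the reference book. IDF is computed here as log(N / df), which is only one of several common variants. In line with the note above, everything is kept in memory for simplicity.

```python
import math
from collections import Counter, defaultdict


class SimpleIndex:
    """In-memory inverted index; favors ease of use over memory efficiency."""

    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {doc_id: raw term count}
        self.doc_lengths = {}              # doc_id -> number of tokens

    def add_document(self, doc_id, tokens):
        self.doc_lengths[doc_id] = len(tokens)
        for term, count in Counter(tokens).items():
            self.postings[term][doc_id] = count

    def tf(self, term, doc_id):
        return self.postings.get(term, {}).get(doc_id, 0)

    def df(self, term):
        return len(self.postings.get(term, {}))

    def idf(self, term):
        df = self.df(term)
        return math.log(len(self.doc_lengths) / df) if df else 0.0

    def weight(self, term, doc_id, scheme):
        """Collect the basic features and let `scheme` combine them."""
        features = {
            "tf": self.tf(term, doc_id),
            "df": self.df(term),
            "idf": self.idf(term),
            "doc_len": self.doc_lengths.get(doc_id, 0),
            "n_docs": len(self.doc_lengths),
        }
        return scheme(features)


# Two example weighting schemes; new ones are just new functions.
def tf_idf(f):
    return f["tf"] * f["idf"]


def log_tf_idf(f):
    return (1 + math.log(f["tf"])) * f["idf"] if f["tf"] > 0 else 0.0
```

With this layout, trying a new weighting equation later means writing one more small function and passing it to `index.weight(term, doc_id, my_scheme)`; the index itself does not need to change.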
2. Project
- Reading a statistical report
"Pennies from eBay: the Determinants of Price in Online Auctions", May 2006 - Correct my paper
- Reading "Introduction to Machine Learning", 2004 - chapter 1
4. Interesting Reading
- 知識管理的第一本書 ("The First Book of Knowledge Management") (I suppose there's no deadline for this one…)