Tuesday, January 23, 2007

1/22~1/28 Weekly Work Plan

1. Program
  • Document Preprocessing
Input: a URL (an HTML file served over HTTP), a text file, a directory containing many text files, or a database string field
Output: a plain-text document with the following preprocessing applied
(1) Eliminate script tags
(2) Eliminate HTML tags
(3) Remove stopwords
(4) Stemming
(5) Apply a term frequency limit
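
The five steps above could be sketched roughly as follows. This is only a minimal illustration using the Python standard library, not a finished design: the stopword list is a tiny hypothetical sample, `crude_stem` is a naive suffix stripper standing in for a real stemmer (it is not the Porter algorithm), and the "term frequency limit" is assumed to mean keeping only terms whose count in the document lies between `min_tf` and `max_tf`. All function and parameter names are made up for this example.

```python
import html
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "are", "is", "of", "the", "to", "in"}


def strip_scripts(text):
    """Step (1): drop <script>...</script> blocks, including their contents."""
    return re.sub(r"<script\b.*?</script\s*>", " ", text, flags=re.I | re.S)


def strip_tags(text):
    """Step (2): drop the remaining HTML tags, then decode entities such as &amp;."""
    return html.unescape(re.sub(r"<[^>]+>", " ", text))


def crude_stem(word):
    """Step (4): naive suffix stripping; a stand-in, NOT the Porter algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(raw, min_tf=1, max_tf=None):
    """Run steps (1)-(5) on one raw document and return its list of terms."""
    text = strip_tags(strip_scripts(raw))
    tokens = re.findall(r"[a-z]+", text.lower())
    tokens = [t for t in tokens if t not in STOPWORDS]  # step (3)
    tokens = [crude_stem(t) for t in tokens]            # step (4)
    counts = Counter(tokens)
    # Step (5), under the assumption stated above: keep min_tf <= tf <= max_tf.
    return [
        t for t in tokens
        if counts[t] >= min_tf and (max_tf is None or counts[t] <= max_tf)
    ]


if __name__ == "__main__":
    sample = "<html><script>var x = 1;</script><p>The cats are chasing the dogs</p></html>"
    print(preprocess(sample))
```

Reading from a URL, a directory, or a database field would only change how `raw` is obtained; the steps themselves stay the same.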
  • Basic IR system functionality

Input: many preprocessed text files
Output: a basic IR system with indexing

(1) Compute commonly used features, such as TF-IDF, for building the IR model.
(2) Design an easy way to combine the above features, so that you can construct your own weighting scheme equation in the future.
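
One possible way to approach both points is sketched below, assuming each document is already a list of preprocessed terms. Every feature is a small function with the same signature, and a hypothetical `combine` helper multiplies any number of them into one weight function. The names (`build_index`, `tf`, `idf`, `combine`, `search`) and the choice of `log(N / df)` for IDF are assumptions made for this example; other weighting variants would plug in the same way.

```python
import math
from collections import Counter


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    index = {}
    for doc_id, tokens in docs.items():
        for term, count in Counter(tokens).items():
            index.setdefault(term, {})[doc_id] = count
    return index


# Each feature takes the same arguments so that features can be combined freely.
def tf(index, n_docs, term, doc_id):
    """Raw term frequency of `term` in document `doc_id`."""
    return index.get(term, {}).get(doc_id, 0)


def idf(index, n_docs, term, doc_id):
    """Inverse document frequency, here log(N / df); 0.0 for unseen terms."""
    df = len(index.get(term, {}))
    return math.log(n_docs / df) if df else 0.0


def combine(*features):
    """Multiply several features into a single weight function."""
    def weight(index, n_docs, term, doc_id):
        return math.prod(f(index, n_docs, term, doc_id) for f in features)
    return weight


tfidf = combine(tf, idf)


def search(index, n_docs, query_terms, weight=tfidf):
    """Score documents by summing the weights of the query terms they contain."""
    scores = Counter()
    for term in query_terms:
        for doc_id in index.get(term, {}):
            scores[doc_id] += weight(index, n_docs, term, doc_id)
    return scores.most_common()


if __name__ == "__main__":
    docs = {"d1": ["cat", "dog", "cat"], "d2": ["dog", "bird"], "d3": ["fish"]}
    index = build_index(docs)
    print(search(index, len(docs), ["dog", "cat"]))
```

A new weighting scheme is then just another call such as `combine(tf)` or `combine(tf, idf, some_new_feature)`, passed to `search` through the `weight` argument. Plain dictionaries keep the sketch easy to use, at some cost in memory.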

※ Write the program for ease of use; don't worry much about performance (memory space).
※ Reference: "Information Retrieval: Algorithms and Heuristics", 2e, 2004

2. Project

  • Reading statistical report
    "Pennies from eBay: the Determinants of Price in Online Auctions", May 2006
  • Correct my paper
3. Learning

  • Reading "Introduction to Machine Learning", 2004, Chapter 1

4. Interesting Reading

  • 知識管理的第一本書 (I suppose there's no fixed schedule for this one…)
This text editor is really hard to use; I can't press the Tab key to indent.
