Hao-Chin Chang Department of Computer Science & Information Engineering

Selected New Training Documents to Update User Profile Abdulmohsen Algarni and Yuefeng Li and Yue Xu CIKM 2010 Hao-Chin Chang Department of Computer Science & Information Engineering National Taiwan Normal University 2011/06/07

Outline • Introduction • Pattern-based model • Adaptive information filtering • Experiment • Conclusion

Introduction • A filtering task indicates to the user which documents might be of interest to them • Determining which ones are really relevant is left entirely to the user • The aim of an Information Filtering (IF) model is to re-rank the incoming set of documents based on a profile of the user's topic • Major studies in this area fall into two main groups • The first is concerned with extracting knowledge from user feedback to build the user's profile • The second deals with how effectively and efficiently a user profile can be updated with new feedback in order to follow changes in the user's interests and improve the quality of the filtering system

Introduction • Relevance Feature Discovery (RFD) • High-level features (patterns) and low-level features (terms) are extracted from the initial training documents • The high-level features include both positive and negative patterns • The low-level term weights are evaluated according to both their specificity and their distribution in the high-level features

Introduction • To deal with adaptive issues in an IF model, there are two main areas of focus • The first involves updating the user's profile with new information to follow changes in the user's interests • The second involves updating the user profile to solve nonmonotonic problems • Example: training documents about “Agent” • IF systems may return information objects such as “Intelligent Agent”, “Property Agent”, “Software Agent” • Previous matching decisions (e.g. considering “Property Agent” as relevant) may contradict the user's actual information need (e.g. the user is only interested in “Software Agent”, so “Property Agent” is non-relevant) • How to solve nonmonotonic problems • The first issue is how to select documents that contain new knowledge that the system does not yet have • The second issue is how to evaluate and update the base knowledge with the new knowledge in an efficient way

Introduction • Adaptive Relevance Features Discovery (ARFD) • First, the ability of the IF system to extract different knowledge for different users on different topics of interest • Second, the ability to update and review the weights of features in the hypothesis space model when new feedback is received

Pattern-based model • We used a pattern-based model to extract features from relevance feedback • This is different from the usual definition where a pattern consists of distinct terms and duplicate terms are removed. • coverset({t3,t4,t6}, d) = {dp2, dp3, dp4} • sup_a({t3,t4,t6}, d) = 3 (absolute support) • sup_r({t3,t4,t6}, d) = 3/6 = 0.5 (relative support) • Closed patterns: <t3,t4,t6>, <t1,t2>, <t6>
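The covering-set and support figures on this slide can be reproduced with a short sketch. The paragraph contents below are an assumption: the slide's table is not available in this text, so `PARAGRAPHS` is illustrative data chosen to be consistent with the numbers quoted above, and the function names are hypothetical.

```python
from typing import Dict, Set

# ASSUMED data: the terms of each paragraph dp1..dp6 of document d.
# Chosen only to agree with the slide's figures; not taken from the slide.
PARAGRAPHS: Dict[str, Set[str]] = {
    "dp1": {"t1", "t2"},
    "dp2": {"t3", "t4", "t6"},
    "dp3": {"t3", "t4", "t5", "t6"},
    "dp4": {"t3", "t4", "t5", "t6"},
    "dp5": {"t1", "t2", "t6", "t7"},
    "dp6": {"t1", "t2", "t6", "t7"},
}


def coverset(pattern: Set[str], paragraphs: Dict[str, Set[str]]) -> Set[str]:
    """Names of the paragraphs that contain every term of the pattern."""
    return {name for name, terms in paragraphs.items() if pattern <= terms}


def sup_a(pattern: Set[str], paragraphs: Dict[str, Set[str]]) -> int:
    """Absolute support: the number of covering paragraphs."""
    return len(coverset(pattern, paragraphs))


def sup_r(pattern: Set[str], paragraphs: Dict[str, Set[str]]) -> float:
    """Relative support: the fraction of paragraphs that cover the pattern."""
    return sup_a(pattern, paragraphs) / len(paragraphs)


pattern = {"t3", "t4", "t6"}
print(sorted(coverset(pattern, PARAGRAPHS)))
print(sup_a(pattern, PARAGRAPHS), sup_r(pattern, PARAGRAPHS))
```

With a minimum relative support of 0.5, a pattern is frequent when `sup_r` is at least 0.5; the closed patterns listed on the slide are the frequent patterns that have no super-pattern with the same support.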

Pattern-based model • The weight of a given term t in the patterns discovered in positive text documents • The specificity of a given term t in the training set D = D+ ∪ D- • Finally, the initial weights of the terms are revised

Adaptive information filtering • Document Selection • New feedback can be divided into two main categories • The first is documents that give more explanation of what the user needs on the same topic • The second is documents that cover a new area or topic, which indicates that the user has changed their topic of interest; this is outside our scope • The system uses the following ranking function.

Adaptive information filtering • Knowledge Extraction and Merging of Adaptive RFD model (ARFD) (1) Mining features and functions from the initial (or base) training set Db (2) SNTSelect selects target documents Ds from a new training set Dn in order to remove redundant documents (3) Mining features and functions from the target documents Ds (4) Merging the features and functions discovered from both the initial training set and the selected target documents
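The four steps above can be sketched as a small pipeline. This is only an outline under stated assumptions, not the ARFD algorithms themselves: `mine_weights` is a simple stand-in for pattern mining (it counts terms), `select_targets` assumes that a new document carries new knowledge when the current profile ranks it on the wrong side, and `merge` assumes that weights are combined by addition. All function names and the toy documents are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Doc = Tuple[List[str], bool]  # (tokens, is_relevant)


def mine_weights(docs: List[Doc]) -> Dict[str, float]:
    """Stand-in for feature mining (ASSUMPTION): terms gain weight from
    relevant documents and lose weight from non-relevant ones."""
    weights: Counter = Counter()
    for tokens, relevant in docs:
        for term in set(tokens):
            weights[term] += 1.0 if relevant else -1.0
    return dict(weights)


def rank(tokens: List[str], weights: Dict[str, float]) -> float:
    """ASSUMED ranking function: sum of the profile weights of a document's terms."""
    return sum(weights.get(term, 0.0) for term in set(tokens))


def select_targets(new_docs: List[Doc], weights: Dict[str, float]) -> List[Doc]:
    """Stand-in for SNTSelect (ASSUMPTION): keep only new documents that the
    current profile misjudges; correctly ranked ones are treated as redundant."""
    selected = []
    for tokens, relevant in new_docs:
        score = rank(tokens, weights)
        if (relevant and score <= 0) or (not relevant and score > 0):
            selected.append((tokens, relevant))
    return selected


def merge(base: Dict[str, float], new: Dict[str, float]) -> Dict[str, float]:
    """ASSUMED merge rule: add the new weights to the base weights."""
    merged = dict(base)
    for term, weight in new.items():
        merged[term] = merged.get(term, 0.0) + weight
    return merged


def update_profile(base_docs: List[Doc], new_docs: List[Doc]) -> Dict[str, float]:
    base_weights = mine_weights(base_docs)            # step (1): mine Db
    targets = select_targets(new_docs, base_weights)  # step (2): pick Ds from Dn
    target_weights = mine_weights(targets)            # step (3): mine Ds
    return merge(base_weights, target_weights)        # step (4): merge


base_docs = [(["software", "agent"], True), (["intelligent", "agent"], True)]
new_docs = [(["property", "agent"], False), (["software", "agent", "design"], True)]
profile = update_profile(base_docs, new_docs)
print(rank(["property", "agent"], profile), rank(["software", "agent"], profile))
```

In this toy run only the non-relevant “property agent” document is selected, because the base profile would have ranked it as relevant; after merging, it scores lower than “software agent”, which is the kind of correction the nonmonotonic problem calls for.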

Experiment

Conclusion • We proposed an adaptive information filtering system called Adaptive Relevance Features Discovery (ARFD) • The main aim of this method is the efficient revision and updating of the weights of extracted features in vector space using new training documents to solve the nonmonotonic problem • The combined knowledge will be tested to ensure that it helps to solve the nonmonotonic problem

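P-value • By definition, the P-value is the smallest significance level at which the null hypothesis H0 can be rejected given the current sample data • The significance level is the upper bound on the Type I error probability that we tolerate when performing a test. Hence, the smaller the significance level, the smaller the rejection region. So, if H0 can be rejected at a particular significance level given the current data, the significance level can be lowered; but if it is lowered too far, the current data point may be pushed out of the rejection region, i.e. H0 can no longer be rejected. The P-value is the significance level reached when it has been relaxed far enough to reject H0 and then shrunk as far as possible, to the point where H0 can only just be rejected.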
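The definition above can be checked numerically. The sketch below assumes a one-sided (upper-tail) z-test with a hypothetical observed statistic of 2.0; neither the test nor the value comes from the slides. It shows that H0 is rejected exactly when the significance level is at least the P-value.

```python
import math


def p_value_upper_tail(z: float) -> float:
    """P(Z >= z) for a standard normal Z, i.e. the one-sided P-value."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def rejects(z: float, alpha: float) -> bool:
    """H0 is rejected at level alpha when the P-value does not exceed alpha."""
    return p_value_upper_tail(z) <= alpha


z_observed = 2.0  # hypothetical test statistic, for illustration only
p = p_value_upper_tail(z_observed)

print(round(p, 4))
print(rejects(z_observed, 0.05))  # alpha is above the P-value
print(rejects(z_observed, 0.01))  # alpha has been lowered too far
```

Lowering alpha from 0.05 towards the P-value keeps the observed statistic inside the rejection region; once alpha drops below the P-value, the statistic falls outside it and H0 can no longer be rejected.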

SPMining

HLFmining

NRevision

SNTSelect
