
Classification based on online learning


Presentation Transcript


  1. Classification based on online learning Keywords: reinforcement learning and inverted indexes

  2. Classification learning at different levels • Supervised • Given appropriate “targets” for the data, learn to map from data to target • Unsupervised • Given data, identify “targets” embedded in the data without hints or clues • Reinforcement • Given data and a policy, guess a target, receive a reward or punishment, and adjust the policy

  3. A motivating example: SPAM • Imagine you wish to create a system to reduce spam • You cannot always predict who sends you mail; only by considering the content, e.g., the subject line, can you determine whether a mail is spam or not

  4. SPAM detection using a classifier • Hence, the goal is to “learn” to distinguish good subjects from bad ones • This is a classic classification problem

  5. Learning a mapping function • The mapping assigns each item in the “Mailbox” a relevance value in the range [0, 1]

  6. (Relevance) Learning based on RL • [Diagram: agent-user interaction loop over situations St, St+1, actions At, and rewards Rt, Rt+1] • The agent in situation St chooses action At and receives a reward Rt+1 from the user in situation St+1 • The goal is to learn the policy: πt(S, A) = Pr{At = a | St = S} • At time t, given the situation S, the policy π provides the probability that the agent’s action will be A • Credit: Based on a model by Gillian Hayes
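
To make the policy definition concrete, here is a minimal Python sketch of a tabular policy π(s, a); the situation and action names are invented for illustration and are not taken from the slides.

```python
# A minimal sketch of a tabular stochastic policy pi(s, a) = Pr{At = a | St = s}.
# The situation and action names below are illustrative assumptions.
policy = {
    "inbox_with_new_mail": {"rank_as_relevant": 0.7, "rank_as_spam": 0.3},
    "inbox_mostly_spam":   {"rank_as_relevant": 0.2, "rank_as_spam": 0.8},
}

def pi(state: str, action: str) -> float:
    """Probability that the agent takes `action` when in `state`."""
    return policy[state][action]

print(pi("inbox_with_new_mail", "rank_as_spam"))  # 0.3
```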

  7. SPAM prediction • Given that at time t there are n emails (situation S), predict each email’s relevance (i.e., the action, a value between 0 and 1) so as to maximize the reward R from the user • Given a reward (a thumbs up or down, so to speak), adjust the policy and the expectation about future rewards

  8. Several Key Aspects of RL • The policy π(s, a) helps deduce actions • The value function predicts future or expected rewards • What we actually need is a rank-ordered relevance vector over the items (i.e., emails) • The learning step λ: the rate of updating the policy (associated with the policy function)

  9. A High-Level RL Algorithm for spam classification • Initialize the policy and reward representations • Look at what emails have arrived • Predict the relevance of each email based on the policy function • Collect rewards from the user • Check whether the policy has converged; if not, update the policy at rate λ • Update the expected reward
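
A minimal Python sketch of this loop, under simplifying assumptions: the user’s reward is simulated as a random thumbs up or down, the policy is followed greedily, and the names high_level_rl_loop and learning_rate are placeholders rather than anything from the slides.

```python
import random

def high_level_rl_loop(n_rounds: int, n_classes: int, learning_rate: float = 0.1) -> list[float]:
    """Skeleton of the slide's loop: predict, collect a reward, update the policy."""
    policy = [1.0 / n_classes] * n_classes   # uniform action probabilities
    expected_reward = [0.0] * n_classes      # estimated reward per class
    counts = [0] * n_classes                 # how often each class was chosen

    for _ in range(n_rounds):
        # Predict the most relevant class from the current policy (greedy for brevity).
        chosen = max(range(n_classes), key=lambda i: policy[i])

        # Collect the user's reward -- simulated here as a random thumbs up / down.
        reward = random.choice([0.0, 1.0])

        # Update the expected reward for the chosen class as a running average.
        counts[chosen] += 1
        expected_reward[chosen] += (reward - expected_reward[chosen]) / counts[chosen]

        # Update the policy at rate lambda toward the class with the best estimate.
        best = max(range(n_classes), key=lambda i: expected_reward[i])
        target = [1.0 if i == best else 0.0 for i in range(n_classes)]
        policy = [p + learning_rate * (t - p) for p, t in zip(policy, target)]

    return policy

print(high_level_rl_loop(n_rounds=50, n_classes=3))
```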

  10. An Implementation of RL: The Actual RL Environment in Spam Classification • [Diagram: the mapping function decomposed as F1 → F2 → R] • Basically, the mapping function is decomposed into two functions: F1, semantic classification, and F2, relevance classification
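
A toy sketch of the F1/F2 decomposition; both functions are hand-written stand-ins (keyword rules and fixed relevance values) rather than learned components.

```python
# F1: semantic classification, F2: relevance classification (toy stand-ins).
def f1_semantic_class(subject: str) -> str:
    """F1: assign an email, by its subject line, to a semantic class."""
    return "promotions" if "free" in subject.lower() else "personal"

def f2_relevance(semantic_class: str) -> float:
    """F2: map the semantic class to a relevance value R in [0, 1]."""
    assumed_relevance = {"promotions": 0.1, "personal": 0.9}  # illustrative values
    return assumed_relevance.get(semantic_class, 0.5)

def mapping(subject: str) -> float:
    """The decomposed mapping: subject -> semantic class -> relevance."""
    return f2_relevance(f1_semantic_class(subject))

print(mapping("FREE vacation offer"))  # 0.1, i.e. likely spam
```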

  11. RL Implementation • From the agent’s perspective the environment is stochastic • The agent has to choose an action, i.e., classify the best class and present it to the user, based on its policy and reward representations

  12. The Policy and Reward Functions • The policy (action prediction) function is a vector of action probabilities over the set of classes Ci, where the ith element is the probability that class Ci should be selected as the best (first) class • The reward function is based on an estimated relevance vector over the set of classes Ci, where the ith element is the probability that the user will provide the maximum reward for class Ci

  13. Details on the key RL functions I • The reward, or estimated relevance, probability vector is Ŷi (i = 1, …, n), where there are n classes • In the absence of knowledge about expected rewards, all elements of Ŷ are initialized to zero • Recall that each element represents the probability that the corresponding class Ci will receive the maximum reward

  14. Details on the key RL functions II • The policy, or action probability, vector is Pi (i = 1, …, n), where n is again the (maximum) number of classes • In the absence of information, all elements of P are initially set equal to each other (i.e., 1/n) • Pi is the probability that the corresponding class Ci will be selected as the best (first) class
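
A minimal sketch of the two initializations described on slides 13 and 14, assuming n = 4 classes for illustration.

```python
n = 4  # number of classes (assumed for illustration)

# Slide 13: the estimated relevance / reward vector Y-hat starts at all zeros.
y_hat = [0.0] * n

# Slide 14: the policy / action probability vector P starts uniform, i.e. 1/n each.
p = [1.0 / n] * n

print(y_hat)  # [0.0, 0.0, 0.0, 0.0]
print(p)      # [0.25, 0.25, 0.25, 0.25]
```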

  15. The RL Spam Detection Algorithm • Classify incoming emails into classes Ci (i = 1, …, n) • Select the best class (most likely not spam) by sampling the distribution in the policy representation P • Generate a pseudo-random number (PRN) and select from P accordingly (explained next) • Create a ranked list of classes Ci based on the policy and the expected-reward representations (explained later) • Collect the rewards from the user and update the reward representation Ŷ • Update the policy vector P

  16. How to select an action? • By sampling the action probability distribution (policy function) • Assume a simple P representation of [0.2, 0.4, 0.1, 0.3] • A PRN between 0 and 1 is generated; class 1 is selected if the PRN lies between 0 and 0.2, class 2 is top if the PRN is between 0.2 and 0.6, class 3 is first when the PRN is between 0.6 and 0.7, and class 4 otherwise • As the learning progresses, the action probability vector converges to a unit vector, so a single class emerges as the dominant one
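
A short Python sketch of this sampling step, using the slide’s example vector P = [0.2, 0.4, 0.1, 0.3]; the function name select_class is an assumption.

```python
import random

def select_class(p: list[float]) -> int:
    """Sample a class index by comparing one PRN against cumulative intervals of P."""
    prn = random.random()      # pseudo-random number in [0, 1)
    cumulative = 0.0
    for i, prob in enumerate(p):
        cumulative += prob
        if prn < cumulative:   # PRN falls in this class's interval
            return i
    return len(p) - 1          # guard against floating-point round-off

# The slide's example: class 2 (index 1) is chosen whenever the PRN is in [0.2, 0.6).
print(select_class([0.2, 0.4, 0.1, 0.3]))
```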

  17. Generating a ranked list of relevance values • Choose the class with the highest value in the action probability vector as the top class α(k) at instance k • Then generate a vector q(k) as follows: qi(k) = 1 + Ŷi(k) if α(k) = Ci, and qi(k) = Ŷi(k) otherwise, where Ŷi(k) is the ith element of vector Ŷ at instance k • Since Ŷi(k) is bounded by 1, the top element of q(k) always corresponds to the class α(k); the remaining elements of q(k) follow the order in the estimated relevance probability vector Ŷ(k)
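
A sketch of the q(k) construction and the resulting ranking; the example P and Ŷ values below are made up for illustration.

```python
def ranked_classes(p: list[float], y_hat: list[float]) -> list[int]:
    """Build q(k) as on the slide and return class indices ranked by it."""
    alpha = max(range(len(p)), key=lambda i: p[i])   # top class from the policy vector
    # q_i(k) = 1 + Yhat_i(k) if C_i is the top class, else Yhat_i(k);
    # since Yhat_i(k) <= 1, the top class always ranks first.
    q = [y + 1.0 if i == alpha else y for i, y in enumerate(y_hat)]
    return sorted(range(len(q)), key=lambda i: q[i], reverse=True)

print(ranked_classes(p=[0.2, 0.4, 0.1, 0.3], y_hat=[0.3, 0.1, 0.6, 0.2]))
# [1, 2, 0, 3]: class 2 leads because the policy picked it; the rest follow Y-hat.
```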

  18. Updating estimated relevance / reward and the action / policy vectors • After collecting the rewards (a value between 0 and 1), each element of the Ŷ vector (representing the corresponding class Ci) is updated as a running average of the reward received for that class • Then another vector E(k) is generated at instance k such that all elements of E are set to zero except the element corresponding to the top class in the freshly updated Ŷ vector • The P vector is subsequently updated as follows: P(k+1) = P(k) + λ(E(k) – P(k)), where 0 < λ < 1 is the learning step • Hence, an optimal unit vector (actions) is computed at every instance based on the current estimates of the relevance probability vector (rewards)
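
A sketch of one update step under these rules; the running-average bookkeeping (the count argument) and the value λ = 0.2 are assumptions made for illustration.

```python
def update(y_hat: list[float], p: list[float], chosen: int, reward: float,
           count: int, lam: float = 0.2) -> tuple[list[float], list[float]]:
    """One update: running-average reward for the chosen class, then
    P(k+1) = P(k) + lambda * (E(k) - P(k)), with E a unit vector at the best class."""
    y_hat = list(y_hat)
    y_hat[chosen] += (reward - y_hat[chosen]) / count   # running average of rewards

    best = max(range(len(y_hat)), key=lambda i: y_hat[i])
    e = [1.0 if i == best else 0.0 for i in range(len(p))]
    p = [p_i + lam * (e_i - p_i) for p_i, e_i in zip(p, e)]
    return y_hat, p

y_hat, p = update(y_hat=[0.0, 0.0, 0.0], p=[1/3, 1/3, 1/3], chosen=1, reward=1.0, count=1)
print(y_hat, p)  # Y-hat moves toward the observed reward; P moves toward class 2
```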

  19. Some known limitations of RL • Estimating λ, the learning step • Convergence that is too fast or too slow (lazy) • Initial latency of learning • Scarcity of rewards • Radical shifts in reward patterns • Re-learning may be slow

  20. RL in action (Note: latency and convergence)

  21. Acknowledgement • Snehasis Mukhopadhyay – IUPUI, Indianapolis • Gillian Hayes – Univ. of Edinburgh
