Learning from Labeled and Unlabeled Data using Graph Mincuts

Avrim Blum and Shuchi Chawla. May 24, 2001.

Presentation Transcript


  1. Learning from Labeled and Unlabeled Data using Graph Mincuts Avrim Blum and Shuchi Chawla May 24, 2001

  2. Utilizing unlabeled data • Cheap and available in large amounts • Gives no obvious information about classification • Gives information about distribution of examples • Useful with a prior • Our prior: ‘close’ examples have a similar classification

  3. Classification using Graph Mincut [Figure: graph of examples with positive (+) and negative (-) labeled nodes, separated by the mincut]
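The idea on this slide can be sketched in code. The following is a minimal illustration, not the authors' implementation: it assumes the examples are nodes `0..n-1`, that similarity edges are already given as `(u, v, weight)` triples, and that the positive and negative labeled sets are disjoint. The function name `mincut_classify` and the plain Edmonds-Karp max-flow routine are choices made here for illustration.

```python
from collections import defaultdict, deque


def mincut_classify(n, edges, positives, negatives):
    """Label nodes 0..n-1 using a minimum cut between the labeled sets.

    edges: iterable of (u, v, weight) undirected similarity edges.
    Returns (labels, cut_value) where labels[i] is +1 or -1.
    """
    inf = float("inf")
    source, sink = n, n + 1  # two extra nodes, one per class
    cap = defaultdict(lambda: defaultdict(float))

    def add_undirected(u, v, w):
        cap[u][v] += w
        cap[v][u] += w

    for u, v, w in edges:
        add_undirected(u, v, w)
    # Infinite weights keep labeled examples on their own side of the cut.
    for p in positives:
        add_undirected(source, p, inf)
    for q in negatives:
        add_undirected(q, sink, inf)

    def reachable_from_source():
        """BFS over edges with remaining capacity; returns a parent map."""
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    cut_value = 0.0
    while True:
        parent = reachable_from_source()
        if sink not in parent:
            break  # no augmenting path left: the flow is maximal
        path = []
        v = sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cap[a][b] for a, b in path)
        for a, b in path:
            cap[a][b] -= bottleneck
            cap[b][a] += bottleneck
        cut_value += bottleneck

    # Nodes still reachable from the source are on the positive side.
    # Nodes connected to no labeled example default to -1 in this sketch.
    labels = [1 if i in parent else -1 for i in range(n)]
    return labels, cut_value
```

For example, on a chain `0-1-2-3-4` with one weak edge between nodes 2 and 3, labeling node 0 positive and node 4 negative cuts the weak edge, so nodes 0 to 2 come out positive and nodes 3 and 4 negative.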

  4. Why not nearest neighbor?

  5. Why not nearest neighbor? Classification by 1-nearest neighbor

  6. Why not nearest neighbor? Classification by Graph Mincut

  7. Self-consistent classification • Mincut minimizes the leave-one-out cross-validation error of nearest neighbor • May not be the best classification • But, theoretically interesting!

  8. Assigning edge weights • Several approaches: • Decreasing function of distance, e.g. exponential decrease with an appropriate slope • Unit weights, but connect only ‘nearby’ nodes (how near is ‘near’?) • Connect every node to its k nearest nodes (what is a good value of k?) • Need an appropriate distance metric
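The three weighting schemes listed on this slide might look as follows. This is a sketch under assumptions: points are numeric tuples, Euclidean distance stands in for the "appropriate distance metric", and the `scale` parameter is a hypothetical knob for the slope of the exponential decay. None of these names come from the slides.

```python
import math


def euclidean(a, b):
    """Euclidean distance; an assumed stand-in for the distance metric."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exponential_edges(points, scale=1.0, dist=euclidean):
    """Complete graph whose weights decay exponentially with distance."""
    n = len(points)
    return [
        (i, j, math.exp(-dist(points[i], points[j]) / scale))
        for i in range(n)
        for j in range(i + 1, n)
    ]


def threshold_edges(points, delta, dist=euclidean):
    """Unit-weight edges between all pairs at distance <= delta."""
    n = len(points)
    return [
        (i, j, 1.0)
        for i in range(n)
        for j in range(i + 1, n)
        if dist(points[i], points[j]) <= delta
    ]


def knn_edges(points, k, dist=euclidean):
    """Unit-weight edges from every node to its k nearest nodes.

    The result is symmetrized: an edge exists if either endpoint
    lists the other among its k nearest nodes.
    """
    n = len(points)
    edges = set()
    for i in range(n):
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: dist(points[i], points[j]))
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return [(u, v, 1.0) for u, v in sorted(edges)]
```

Each function returns `(u, v, weight)` triples, so the choice of scheme can be swapped without touching whatever mincut routine consumes the graph.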

  9. How near is ‘near’? • All pairs within distance δ are connected • Need a method of finding a ‘good’ δ • As δ increases, cut value increases • Cut value = 0 ⇒ supposedly no-error situation (Mincut-δ0)
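One reading of the zero-cut criterion on this slide is to pick the largest distance threshold δ at which the cut value is still 0. With unit-weight edges, that is the largest threshold at which no connected component contains both a positive and a negative labeled example. The sketch below assumes this reading, Euclidean distance, and a label encoding of `+1`, `-1`, and `0` for unlabeled; the function name `mincut_delta0` is hypothetical.

```python
import math
from itertools import combinations


def mincut_delta0(points, labels):
    """Largest pairwise distance usable as delta with cut value 0.

    labels[i] is +1, -1, or 0 (unlabeled). Returns 0.0 if even the
    closest pair already links the two classes.
    """
    n = len(points)
    parent = list(range(n))
    has_pos = [lab == 1 for lab in labels]
    has_neg = [lab == -1 for lab in labels]

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    pairs = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    below = 0.0      # largest distance strictly smaller than `current`
    current = None   # distance of the pair being processed
    for d, i, j in pairs:
        if current is None or d > current:
            if current is not None:
                below = current
            current = d
        ri, rj = find(i), find(j)
        if ri == rj:
            continue
        if (has_pos[ri] or has_pos[rj]) and (has_neg[ri] or has_neg[rj]):
            # Connecting at distance d would join the two classes,
            # so the threshold must stay strictly below d.
            return below
        parent[ri] = rj
        has_pos[rj] = has_pos[rj] or has_pos[ri]
        has_neg[rj] = has_neg[rj] or has_neg[ri]
    # The classes never meet: every threshold gives cut value 0.
    return current if current is not None else 0.0
```

Processing pairs in increasing order of distance with a union-find structure avoids rebuilding the graph and rerunning a mincut for every candidate threshold.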

  10. Mincut- 0 does not allow for noise in the dataset • Allow longer distance dependencies • Grow  till the graph becomes sufficiently well connected • Growing till the largest component contains half the nodes seems to work well (Mincut- ½ )

  11. Other ‘hacks’ • Weigh edges to labeled and unlabeled examples differently • Weigh different attributes differently, e.g. use information gain as in decision trees • Weigh edges to positive and negative examples differently, for a more balanced cut • Use mincut value as an indicator of performance
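As an illustration of the attribute-weighting bullet, the sketch below computes information gain per attribute and uses it to weight a distance. It assumes discrete attributes, that the gain is estimated from the labeled examples only, and a weighted Hamming distance. The slides do not specify any of these details, and all function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (c / total) * math.log2(c / total) for c in Counter(labels).values()
    )


def information_gain(rows, labels, attr):
    """Reduction in label entropy from splitting on attribute `attr`."""
    groups = {}
    for row, lab in zip(rows, labels):
        groups.setdefault(row[attr], []).append(lab)
    total = len(rows)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def weighted_distance(a, b, weights):
    """Weighted Hamming distance: sum the weights of differing attributes."""
    return sum(w for x, y, w in zip(a, b, weights) if x != y)
```

With labeled rows `[(0, 0), (0, 1), (1, 0), (1, 1)]` and labels `[1, 1, -1, -1]`, the first attribute determines the class and gets a gain of 1 bit, while the second gets 0, so only differences in the first attribute contribute to the distance.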

  12. Some results
