
Relaxed Transfer of Different Classes via Spectral Partition






Presentation Transcript


  1. Relaxed Transfer of Different Classes via Spectral Partition • Unsupervised • Can use data with different classes to help. How so? Xiaoxiao Shi¹, Wei Fan², Qiang Yang³, Jiangtao Ren⁴ (¹University of Illinois at Chicago; ²IBM T. J. Watson Research Center; ³Hong Kong University of Science and Technology; ⁴Sun Yat-sen University)

  2. What is Transfer Learning? Standard supervised learning: a classifier is trained on labeled data and applied to unlabeled test data from the same source (New York Times to New York Times), reaching 85.5% accuracy.

  3. What is Transfer Learning? In reality, labeled training data are insufficient: with too few labeled New York Times examples, accuracy on the unlabeled New York Times test set drops to 47.3%. How can we improve the performance?

  4. What is Transfer Learning? Transfer learning: use a labeled source domain to help classify an unlabeled target domain (e.g., New York Times and Reuters), raising accuracy to 82.6%. The source is not necessarily from the same domain as the target and need not follow the same distribution.

  5. Transfer across Different Class Labels Since the labeled source domain (e.g., New York Times: World, U.S., Fashion & Style, Travel, ...) and the unlabeled target domain (e.g., Reuters: Markets, Politics, Entertainment, Blogs, ...) come from different domains, they may have different class labels, both in number and in meaning. How do we transfer when the class labels are different?

  6. Two Main Categories of Transfer Learning • Unsupervised transfer learning: no labeled data from the target domain; use the source domain to help learning. Question: is it better than clustering? • Supervised transfer learning: a limited number of labeled examples from the target domain. Question: is it better than not using any source data at all?

  7. Transfer across Different Class Labels • Two sub-problems: • (1) What and how to transfer, since we cannot explicitly use P(x|y) or P(y|x) to build similarity among tasks (the class labels 'y' have different meanings)? • (2) How to avoid negative transfer, since the tasks may come from very different domains? Negative transfer: when the tasks are too different, transfer learning may hurt learning accuracy.

  8. The proposed solution • (1) What and how to transfer? Transfer the eigenspace. When a dataset exhibits complex cluster shapes, k-means performs very poorly in the input space due to its bias toward dense spherical clusters. In the eigenspace (the space spanned by a set of eigenvectors), the clusters become trivial to separate; this is the idea behind spectral clustering, as sketched below.
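
To make the eigenspace idea concrete, here is a minimal, self-contained sketch (not the paper's code; the ring-shaped toy data, the RBF bandwidth, and all names are illustrative) showing that two concentric rings, which defeat k-means' spherical bias, separate easily in the eigenspace of a normalized graph Laplacian:

```python
# Toy demonstration: clusters with complex shapes become trivially
# separable in the eigenspace of the graph Laplacian.
import numpy as np

rng = np.random.default_rng(0)

def make_rings(n=200):
    """Two noisy concentric rings, 100 points each (illustrative data)."""
    t = rng.uniform(0.0, 2 * np.pi, n)
    r = np.repeat([1.0, 3.0], n // 2)            # inner and outer radius
    x = np.c_[r * np.cos(t), r * np.sin(t)]
    return x + rng.normal(scale=0.05, size=x.shape)

X = make_rings()

# RBF affinity matrix W and symmetric normalized Laplacian L.
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-sq / (2 * 0.3 ** 2))
d = W.sum(axis=1)
L = np.eye(len(X)) - W / np.sqrt(np.outer(d, d))

# The eigenvectors of the smallest eigenvalues define the eigenspace.
vals, vecs = np.linalg.eigh(L)
embedding = vecs[:, :2]

# In this embedding even a trivial threshold separates the two rings,
# which k-means in the original 2-D space cannot recover.
labels = (embedding[:, 1] > np.median(embedding[:, 1])).astype(int)
print("cluster sizes:", np.bincount(labels))
```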

  9. The proposed solution • (2) How to avoid negative transfer? A new clustering-based KL divergence reflects the distribution differences: if the distributions are too different (the KL divergence is large), the effect of the source domain is automatically decreased. The traditional KL divergence, KL(P||Q) = Σ_x P(x) log(P(x)/Q(x)), needs the densities P(x) and Q(x) for every x, which are normally difficult to obtain. • To get the clustering-based KL divergence: • Perform clustering on the combined dataset. • Calculate the KL divergence from basic statistical properties of the clusters (see the example on the next slide).

  10. An Example The combined dataset has 15 examples: 8 from P and 7 from Q, so E(P) = 8/15 and E(Q) = 7/15. Clustering splits the combined data into two clusters, C1 and C2, with P'(C1) = 3/15, Q'(C1) = 3/15, P'(C2) = 5/15, Q'(C2) = 4/15. S(P', C) denotes the portion of the examples in cluster C that come from P, so S(P', C1) = 0.5, S(Q', C1) = 0.5, S(P', C2) = 5/9, S(Q', C2) = 4/9. These statistics give KL = 0.0309.
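
The mechanics can be sketched in a few lines. This is a hedged illustration only: it computes a plain KL divergence between the two per-cluster membership distributions, whereas the paper's exact weighting over the cluster statistics E(·) and S(·) may differ, so it does not reproduce the 0.0309 figure above:

```python
# Illustrative clustering-based divergence: cluster the combined data,
# then compare how P's and Q's examples distribute over the clusters.
import numpy as np

def clustering_kl(cluster_ids_p, cluster_ids_q, n_clusters, eps=1e-12):
    """KL divergence between the per-cluster membership distributions
    of source data P and target data Q (simplified form, see lead-in)."""
    p = np.bincount(cluster_ids_p, minlength=n_clusters).astype(float)
    q = np.bincount(cluster_ids_q, minlength=n_clusters).astype(float)
    p /= p.sum()   # fraction of P's examples falling in each cluster
    q /= q.sum()   # fraction of Q's examples falling in each cluster
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# Slide 10's cluster statistics: P has 3 examples in C1 and 5 in C2;
# Q has 3 in C1 and 4 in C2.
p_ids = np.array([0] * 3 + [1] * 5)
q_ids = np.array([0] * 3 + [1] * 4)
print(clustering_kl(p_ids, q_ids, n_clusters=2))
```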

  11. Objective Function • Objective: find an eigenspace that separates the target data well. • Intuition: if the source data is similar to the target data, make good use of the source eigenspace; otherwise, keep the original structure of the target data. The objective combines a traditional normalized-cut term with a penalty term balanced by R(L; U): the more similar the distributions, the smaller R(L; U) is, and the more the objective relies on the source eigenspace (through the constraint TL); otherwise it prefers the target's original structure.
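
As an illustration of this balancing intuition only (the functional form below is an assumption for exposition, not the paper's definition of R(L; U) or its objective), a KL-based weight could gate how strongly the optimization is pulled toward the source eigenspace:

```python
# Illustrative gate: small KL (similar distributions) -> strong pull
# toward the source eigenspace; large KL -> the target's own
# normalized-cut structure dominates. Names are hypothetical.
import math

def combined_objective(ncut_target, deviation_from_source, kl):
    reliance_on_source = math.exp(-kl)   # in (0, 1]; 1 when KL == 0
    return ncut_target + reliance_on_source * deviation_from_source

# Usage: with very different distributions the source term nearly vanishes.
print(combined_objective(1.0, 0.5, kl=0.01))  # source term ~0.495
print(combined_objective(1.0, 0.5, kl=5.0))   # source term ~0.003
```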

  12. How to construct constraints TL and TU? • Principle: • To construct TL: it is directly derived from the "must-link" constraint (examples with the same label should be together). • To construct TU: (1) perform standard spectral clustering (e.g., Ncut) on U; (2) examples in the same cluster should be together. [Figure: for TL, examples 1, 2, 4 should be together (blue) and 3, 5, 6 should be together (red); for TU, examples 1, 2, 3 should be together and 4, 5, 6 should be together.]

  13. How to construct constraints TL and TU? • Construct the constraint matrix M = [m1, m2, ..., mr]^T, with one row per must-link pair. For example: ML = [ 1 -1 0 0 0 0 ; 1 0 0 -1 0 0 ; 0 0 1 0 -1 0 ; ... ], where the first row links examples 1 and 2, the second links 1 and 4, and the third links 3 and 5.
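
A minimal sketch of this construction (the function name and layout are illustrative), reproducing the ML shown on the slide from the must-link pairs (1,2), (1,4), (3,5):

```python
# Build a must-link constraint matrix: each row carries +1 and -1 in the
# columns of a pair that should stay together, so M @ v = 0 forces the
# paired entries of an indicator vector v to be equal.
import numpy as np

def constraint_matrix(pairs, n_examples):
    m = np.zeros((len(pairs), n_examples))
    for row, (i, j) in enumerate(pairs):
        m[row, i] = 1.0    # +1 for the first example of the pair
        m[row, j] = -1.0   # -1 for the second example of the pair
    return m

# The slide's pairs (1,2), (1,4), (3,5) over 6 examples, 0-based indices.
ML = constraint_matrix([(0, 1), (0, 3), (2, 4)], n_examples=6)
print(ML)
```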

  14. Experiment: Data Sets

  15. Experiment: Data Sets (continued)

  16. Text Classification • Target task: Comp1 vs Rec1; sources: (1) Comp2 vs Rec2; (2) 4 classes (Graphics, etc.); (3) 3 classes (crypt, etc.). • Target task: Org1 vs People1; sources: (1) Org2 vs People2; (2) 3 classes (Places, etc.); (3) 3 classes (crypt, etc.).

  17. Image Classification • Target task: Homer vs Real Bear; sources: (1) Superman vs Teddy; (2) 3 classes (Cartman, etc.); (3) 4 classes (laptop, etc.). • Target task: Cartman vs Fern; sources: (1) Superman vs Bonsai; (2) 3 classes (Homer, etc.); (3) 4 classes (laptop, etc.).

  18. Parameter Sensitivity

  19. Conclusions • Problem: transfer across tasks with different class labels. • Two sub-problems: • (1) What and how to transfer? Transfer the eigenspace. • (2) How to avoid negative transfer? Propose an effective clustering-based KL divergence; if the KL divergence is large (the distributions are too different), decrease the effect of the source domain.

  20. Thanks! Datasets and code: http://www.cs.columbia.edu/~wfan/software.htm

  21. # Clusters? Condition for Lemma 1 to be valid: in each cluster, the expected values of the target and source data are about the same, i.e., their difference is below a small threshold ε (where ε is close to 0). Adaptively control the number of clusters to keep Lemma 1 valid: stop bisecting clustering when a cluster contains only target or only source data, or when the condition is satisfied, as sketched below.
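
A hedged sketch of this stopping rule (the threshold test is reconstructed from the slide's wording, whose inequality symbols were lost in the transcript, and the helper below is hypothetical):

```python
# Stopping rule for bisecting clustering, per slide 21: stop splitting a
# cluster when it is pure (one domain only) or when source and target
# means inside it are already approximately equal (Lemma 1's condition).
import numpy as np

def should_stop(cluster_src, cluster_tgt, eps=0.05):
    """cluster_src, cluster_tgt: (n, d) arrays of the source and target
    examples that fell into this cluster; eps: small threshold near 0."""
    if len(cluster_src) == 0 or len(cluster_tgt) == 0:
        return True            # only target/source data left in cluster
    gap = np.linalg.norm(cluster_src.mean(axis=0) - cluster_tgt.mean(axis=0))
    return gap <= eps          # expected values approximately the same

# Example: source and target points nearly coincide -> stop splitting.
src = np.array([[0.0, 0.0], [0.1, 0.0]])
tgt = np.array([[0.05, 0.01]])
print(should_stop(src, tgt))   # True
```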

  22. Optimization [The substitution steps and the resulting algorithm flow were given as equations on the slide and were not preserved in this transcript.]
