Week 6 Progress Update: kNN Analysis and Clustering Techniques

DESCRIPTION

In Week 6, after emailing Enrique, I learned that my kNN and threshold graphs were wrong and redid them, trying k-values from 5 to 500. The best k-value was found to be 475, although the results are still too noisy. I then looked at clustering with PICS, a parameter-free method for mining attributed graphs that can reveal useful insight into large datasets such as Twitter and YouTube; the slides include figures generated from PICS. Several other scripts have not yielded good results yet, and I will continue working on them in the coming week.

Presentation Transcript


  1. Week 6 Shelby Thompson

  2. This week… • Emailed Enrique; my kNN/Threshold graphs were wrong • Redid them and experimented with many k-values; results are still too noisy • k-values ranged from 5 to 500 • The larger the k-value, the closer the threshold graph was to the p-distance graph
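One possible way to compare k-values is to score each kNN graph against the ground-truth match matrix. The sketch below is only an illustration on hypothetical toy data (`trainIds`, `testIds`, and `dist` are made up, and precision/recall is an assumed scoring choice, not necessarily the criterion used for the slides):

```matlab
% Hypothetical toy data: 3 test samples, 6 train samples, 3 identities.
trainIds = [1 1 2 2 3 3];
testIds  = [1 2 3];
dist = [0.1 0.4 0.3 0.8 0.9 0.7;
        0.6 0.9 0.2 0.3 0.8 0.7;
        0.9 0.8 0.7 0.6 0.5 0.1];

% Ground truth: true where the test and train identities agree.
truth = bsxfun(@eq, testIds(:), trainIds(:)');

% Sort each row once; the first k columns of 'order' are the k nearest.
[~, order] = sort(dist, 2, 'ascend');

kValues   = [1 2 3 6];
precision = zeros(size(kValues));
recall    = zeros(size(kValues));
for t = 1:numel(kValues)
    k = kValues(t);
    knnInd = false(size(dist));
    for i = 1:size(dist, 1)
        knnInd(i, order(i, 1:k)) = true;   % mark the k nearest train samples
    end
    tp = nnz(knnInd & truth);              % edges that are true matches
    precision(t) = tp / nnz(knnInd);       % fraction of edges that are correct
    recall(t)    = tp / nnz(truth);        % fraction of true matches recovered
end
disp([kValues; precision; recall]);
```

As k approaches the number of train samples, every pair becomes a neighbor, so recall reaches 1 while precision falls to the base rate of true matches.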

  3. Best Threshold Graph (non-neighbors 0, threshold graph on right) • Best k-value was found to be 475

  4. Best Threshold Graph (non-neighbors Inf, threshold graph on right) • Best k-value was found to be 475
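The two variants above presumably differ in how non-neighbor entries are filled. A minimal sketch of that difference, assuming a distance matrix and a single global cutoff (`dist` and `thresh` here are hypothetical toy values, not the ones used for the slides):

```matlab
% Hypothetical toy distance matrix and cutoff.
dist = [0.2 0.9 0.5;
        0.7 0.1 0.4];
thresh = 0.5;

isNeighbor = dist <= thresh;       % logical mask of pairs kept as neighbors

% Variant 1: non-neighbors set to 0.
threshZero = dist;
threshZero(~isNeighbor) = 0;

% Variant 2: non-neighbors set to Inf.
threshInf = dist;
threshInf(~isNeighbor) = Inf;

disp(threshZero);
disp(threshInf);
```

With a 0 fill, non-neighbors are indistinguishable from zero-distance pairs when plotted with `imagesc`; with an Inf fill they stay at the far end of the scale.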

  5. Clustering • Next looked at clustering • Used a paper Mahdi suggested: “PICS: Parameter-free Identification of Cohesive Subgroups in Large Attributed Graphs” by Leman Akoglu, Hanghang Tong, Brendan Meeder, and Christos Faloutsos • Paper proposed PICS method of clustering

  6. PICS • Method for mining attributed graphs • Requires no user input/parameters • Running time scales linearly with total graph and attribute size • PICS can reveal useful insight into datasets such as Twitter and YouTube • The above datasets have tens of thousands of nodes

  7. Images generated from PICS (Figure 1)

  8. Images generated from PICS (Figure 1) • Figure 1 shows all of the nodes, separated, before any operation is performed on them

  9. Images generated from PICS (Figure 2)

  10. Images generated from PICS (Figure 2) • Figure 2 shows the node groups in Figure 1, divided based on the average location of the group and the number of nodes in the group, before the operations are performed

  11. Images generated from PICS (Figure 3)

  12. Images generated from PICS (Figure 3) • Figure 3 shows the node groups in Figure 1, divided based on the average location of the group and the number of nodes in the group, after the operations are performed

  13. Images generated from PICS (Figure 4)

  14. Images generated from PICS (Figure 4) • Figure 4 shows the major node groups after the clustering operations are performed

  15. Other work this week… • Worked with a number of scripts • None have yielded good results yet • Will continue to work on them this coming week

  16. kNN Code:

      Part 1:

          load('pf83_gabor_lbp_hog_2048.mat')
          % Pairwise cosine distances: rows are test images, columns are train images
          dist = pdist2(fbgTestImgs', fbgTrainImgs', 'cosine');
          figure; imagesc(dist);
          [rows, cols] = size(dist);
          % Ground-truth match matrix: 1 where the test and train IDs agree
          zeroMatrix = zeros(rows, cols);
          for i = 1:numel(fbgTestIds)
              for j = 1:numel(fbgTrainIds)
                  if fbgTestIds(i) == fbgTrainIds(j)
                      zeroMatrix(i,j) = 1;
                  end
              end
          end

      Part 2:

          % kNN graph: mark the knn nearest train images for each test image
          knn = 100;
          knnIndZero = zeros(length(fbgTestIds), length(fbgTrainIds));
          for i = 1:length(fbgTestIds)
              [vals, ind] = sort(dist(i,:), 'ascend');
              knnIndZero(i, ind(1:knn)) = 1;
          end
          % Threshold graph: uses dist(i,i) as the per-row cutoff, which
          % requires i <= number of train images
          threshIndZero = zeros(length(fbgTestIds), length(fbgTrainIds));
          for i = 1:length(fbgTestIds)
              ind = dist(i,:) <= dist(i,i);
              threshIndZero(i, ind) = 1;
          end
          figure; imagesc(zeroMatrix)
          figure; imagesc(knnIndZero)
          figure; imagesc(threshIndZero)

  17. Clustering Code: (Runs fine but no good output)

      Part 1:

          load('data/A_call.mat')
          load('data/F_call.mat')
          load('pf83_gabor_lbp_hog_2048.mat')
          xlabels = {'prof','grad','grad-1','ugrad','ugrad-1','staff','sloan'};
          % Match matrix: 1 where the test and train IDs agree
          groundTruthLabel = zeros(numel(fbgTestIds), numel(fbgTrainIds));
          for i = 1:numel(fbgTestIds)
              for j = 1:numel(fbgTrainIds)
                  if fbgTestIds(i) == fbgTrainIds(j)
                      groundTruthLabel(i,j) = 1;
                  end
              end
          end

      Part 2:

          clust = test_reality('call', 1, inf);
          lengthClust = length(clust);
          cHist = zeros(lengthClust, 83);
          for c = 1:lengthClust
              ind = clust == c;
              for i = 1:83
                  % NOTE: groundTruthLabel is a 0/1 match matrix, so only i == 1
                  % can match here; a per-node label vector is likely what this
                  % histogram needs.
                  cHist(c,i) = sum(groundTruthLabel(ind) == i);
              end
          end
          figure; imagesc(cHist)
          figure; imagesc(groundTruthLabel)

  18. K-Means: (Still in progress)

          X = [fbgTrainImgs'; fbgTestImgs'];
          k = 100;
          opts = statset('MaxIter', 10);
          [idx, ctrs] = kmeans(X, k, 'Replicates', 1, 'Options', opts);
          classes = unique(fbgTrainIds);
          nTrain = length(fbgTrainIds);
          % idx holds the cluster index of each row of X; ctrs holds the centroids,
          % so the per-sample assignments come from idx, not ctrs
          trnCtrs = idx(1:nTrain);
          % trainHist(c,i): number of train samples of class i in cluster c
          trainHist = zeros(k, length(classes));
          for c = 1:k
              ind = trnCtrs == c;
              for i = 1:length(classes)
                  trainHist(c,i) = sum(ind(:) & fbgTrainIds(:) == classes(i));
              end
          end
          tstCtrs = idx(nTrain+1:end);
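The cluster-by-class histograms built in slides 17 and 18 can also be computed without nested loops. The sketch below is one way to do it, assuming a cluster index and an identity label per sample; `clusterIdx` and `classIds` are hypothetical toy vectors, and the purity score is an added illustration rather than something from the slides:

```matlab
% Hypothetical toy data: cluster assignment and identity label per sample.
clusterIdx = [1 1 2 2 2 3];
classIds   = [7 7 7 9 9 4];

% Map arbitrary class labels to consecutive indices 1..numel(classes).
[classes, ~, classIdx] = unique(classIds);

% hist2d(c, j) counts the samples in cluster c that belong to classes(j).
k = max(clusterIdx);
hist2d = accumarray([clusterIdx(:), classIdx(:)], 1, [k, numel(classes)]);

% Purity: fraction of samples that fall in their cluster's majority class.
purity = sum(max(hist2d, [], 2)) / numel(clusterIdx);

disp(hist2d);
disp(purity);
```

A purity near 1 means each cluster is dominated by a single identity; a value near the largest class frequency suggests the clusters carry little identity information.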

  19. End of Week 6
