
Graphical Representations of Knowledge and Its Distribution

Graphical Representations of Knowledge and Its Distribution. Cliff Behrens Information Analysis Applied Research Telcordia Technologies, Inc 973.829.5198 cliff@research.telcordia.com. Workshop on Statistical Inference, Computing and Visualization for Graphs


Presentation Transcript


  1. Graphical Representations of Knowledge and Its Distribution
     Cliff Behrens
     Information Analysis Applied Research, Telcordia Technologies, Inc.
     973.829.5198, cliff@research.telcordia.com
     Workshop on Statistical Inference, Computing and Visualization for Graphs
     Stanford University, August 1-2, 2003

  2. Knowledge, Consensus and Information Sharing
     [Diagram labels: Cultural Knowledge Derived from Consensus; Consensus → Knowledge; Individual Knowledge; Information Sharing Among Individuals in a Single COI]

  3. Schemer Knowledge Validation Services
     • Issues with CSCW technology
     • Focus of CSCW research on new tools, less on motivating their use
     • Collaborative model building often lacks scientific rigor and quality control
     • Schemer: Web-based technology that derives knowledge from consensus among Subject Matter Experts (SMEs)
     • Knowledge-based collaboration reveals distribution of domain expertise among panelists
     • Metrics for qualifying panelists and validating the models they produce
       • validates saliency of domain to SMEs
       • estimates competency of SMEs
       • yields best answers based on responses of SMEs weighted by their respective competencies
     • Generic service, but first tried on SIAM® influence networks

  4. SIAM® Influence Net Example

  5. Mathematics of Consensus Analysis (Romney et al. 1986)
     • Formal model consists of a data matrix $X$ containing the responses $X_{ik}$ of SMEs $i = 1, \dots, N$ on items $k = 1, \dots, M$
       • from this matrix a symmetric matrix $M^*$ is estimated; its off-diagonal elements hold the empirical point estimates $M^*_{ij}$, the proportion of matching responses on all items between SMEs $i$ and $j$, corrected for guessing (if appropriate)
     • Obtain an approximate solution yielding estimates of the individual SME competencies (the $D^*_i$) by applying Maximum Likelihood Factor Analysis to fit the equation below and solve for the main diagonal values
       $$M^* = D^* D^{*\prime}$$
       • relative magnitude of eigenvalues ($\lambda_1 > 3\lambda_2$) implies a single-factor solution
       • the $D^*_i$ are the loadings for SMEs on the first factor
       $$D^*_i = v_{1i}\sqrt{\lambda_1}$$
     • Estimated competency values ($D^*_i$) and the profile of responses for item $k$ ($X_{ik,l}$, equal to 1 if SME $i$ gave answer $l$ to item $k$ and 0 otherwise, with $L$ the number of possible answers) are used to compute Bayesian a posteriori probabilities for each possible answer. The formula for the probability that an answer is the best or "correct" one follows:
       $$\Pr\left(\langle X_{ik}\rangle_{i=1}^{N} \mid Z_k = l\right) = \prod_{i=1}^{N} \left[D^*_i + \frac{1 - D^*_i}{L}\right]^{X_{ik,l}} \left[\frac{(1 - D^*_i)(L - 1)}{L}\right]^{1 - X_{ik,l}}$$
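
     The consensus-analysis steps on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Schemer implementation: the response data are invented, the function names are hypothetical, a simple least-squares single-factor fit stands in for maximum likelihood factor analysis, and a uniform prior over answers is assumed when normalizing the likelihoods.

```python
import math

def agreement_matrix(X, L):
    """Off-diagonal M*[i][j]: share of items on which SMEs i and j match,
    corrected for guessing among L equally likely answers."""
    n, m = len(X), len(X[0])
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                raw = sum(a == b for a, b in zip(X[i], X[j])) / m
                M[i][j] = (L * raw - 1) / (L - 1)
    return M

def competencies(M, sweeps=200):
    """Least-squares fit of M*[i][j] ~ D[i] * D[j] over i != j.
    Assumption: a simple stand-in for maximum likelihood factor analysis."""
    n = len(M)
    D = [0.5] * n
    for _ in range(sweeps):
        for i in range(n):
            num = sum(M[i][j] * D[j] for j in range(n) if j != i)
            den = sum(D[j] ** 2 for j in range(n) if j != i)
            D[i] = min(max(num / den, 0.01), 0.99)  # keep probabilities well defined
    return D

def answer_posterior(X, D, L, k):
    """Posterior over answers l = 0..L-1 for item k (uniform prior assumed)."""
    log_like = []
    for l in range(L):
        total = 0.0
        for i, d in enumerate(D):
            hit = d + (1 - d) / L            # bracket raised to X_ik,l
            miss = (1 - d) * (L - 1) / L     # bracket raised to 1 - X_ik,l
            total += math.log(hit if X[i][k] == l else miss)
        log_like.append(total)
    top = max(log_like)
    weights = [math.exp(v - top) for v in log_like]
    norm = sum(weights)
    return [w / norm for w in weights]

# Invented responses: 5 SMEs (rows) answering 8 items with L = 4 choices each.
L = 4
X = [
    [0, 1, 2, 3, 0, 1, 2, 3],
    [0, 1, 2, 3, 0, 1, 2, 3],
    [0, 1, 2, 3, 0, 1, 2, 0],
    [0, 1, 2, 3, 0, 1, 1, 1],
    [3, 1, 0, 2, 1, 0, 2, 2],
]
M_star = agreement_matrix(X, L)
D = competencies(M_star)

answer_key = []
for k in range(len(X[0])):
    post = answer_posterior(X, D, L, k)
    answer_key.append(post.index(max(post)))

print([round(d, 2) for d in D])
print(answer_key)
```

     In this toy panel the SME who rarely agrees with the others receives a competency near the lower bound, so that SME's answers carry little weight when the answer key is chosen.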

  6. Schemer Knowledge Validation Services

  7. Knowledge-Based Communications Interface
     • SME Contact Data
       • Email services
       • Meeting services
       • Other plug-ins
     • Structured Collaboration and Advice Network
       • User's relation to other SMEs
         • Most similar point-of-view
         • Most different point-of-view
         • Someone a bit more knowledgeable
         • Gurus
         • Novel thinkers
     • Information Routing
       • Supports/challenges one's point-of-view
       • Supports/challenges the consensus point-of-view
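
     One way the advice-network relations listed on this slide might be derived from consensus-analysis outputs is sketched below. Everything here is an assumption for illustration: the slide does not say how Schemer computes these relations, and the function name, the 0.9 "guru" cutoff, the panel names and the numbers are all hypothetical. "Novel thinkers" are left out because the slide gives no hint of how they are identified.

```python
def advice_network(user, names, agreement, competence, guru_cutoff=0.9):
    """Relate one SME to the rest of the panel.

    agreement[i][j] is a pairwise agreement score between SMEs i and j;
    competence[i] is an estimated competency. All rules are assumptions.
    """
    others = [j for j in range(len(names)) if j != user]
    similar = max(others, key=lambda j: agreement[user][j])
    different = min(others, key=lambda j: agreement[user][j])
    # "A bit more knowledgeable": the least competent SME who still ranks above the user.
    ahead = [j for j in others if competence[j] > competence[user]]
    mentor = min(ahead, key=lambda j: competence[j]) if ahead else None
    return {
        "most_similar": names[similar],
        "most_different": names[different],
        "a_bit_more_knowledgeable": names[mentor] if mentor is not None else None,
        "gurus": [names[j] for j in others if competence[j] >= guru_cutoff],
    }

# Hypothetical four-person panel.
names = ["Ana", "Ben", "Chen", "Dee"]
agreement = [
    [0.0, 0.6, 0.5, 0.2],
    [0.6, 0.0, 0.7, 0.3],
    [0.5, 0.7, 0.0, 0.4],
    [0.2, 0.3, 0.4, 0.0],
]
competence = [0.55, 0.70, 0.95, 0.60]

links = advice_network(0, names, agreement, competence)
print(links)
```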

  8. Latent Semantic Indexing (LSI): What is it?
     [Figure, two panels. Standard Vector Space Model (ndims = nterms): Doc 1, Doc 2 and Doc 3 plotted on the term axes "chip", "memory" and "computer". Reduced LSI Vector Space Model (ndims << nterms): the same documents and terms plotted on the axes LSI Dimension 1 and LSI Dimension 2.]

  9. LSI: How Does It Work?
     • Analyze training collection of documents
       • throw out stop words and markup
       • count frequencies of words in each document
     • Compute term × document matrix
       • store word counts as entries in a matrix
       • apply appropriate weighting, e.g., log-entropy, to entries
     • Compute LSI vector space
       • reduce term × document matrix with Singular Value Decomposition
     • Fold new documents into LSI vector space
       • document vector computed from weighted sum of its term vectors
     • Compute vector for query ("pseudo-document")
       • query vector computed from weighted sum of its term vectors
     • Search vector space for semantically close term/document vectors
       • compute cosine of angle between query and other vectors
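
     The pipeline on this slide can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions rather than the presenter's system: the five toy documents, the stop-word list, the choice of k = 2 dimensions and all function names are invented for illustration, and the log-entropy weighting uses the common form log(1 + count) times one plus the normalized entropy term.

```python
import numpy as np

STOP_WORDS = {"the", "a", "and", "of", "for"}   # toy stop list (assumption)
docs = [
    "the computer chip and memory",
    "design of a memory chip",
    "memory upgrade for the computer",
    "a potato chip recipe",
    "recipe for a corn chip snack",
]

def tokens(text):
    """Lower-case, split on whitespace, drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

vocab = sorted({w for d in docs for w in tokens(d)})
row = {w: i for i, w in enumerate(vocab)}

def count_vector(text):
    """Raw term counts over the training vocabulary."""
    v = np.zeros(len(vocab))
    for w in tokens(text):
        if w in row:
            v[row[w]] += 1
    return v

# Term x document matrix of raw counts.
counts = np.column_stack([count_vector(d) for d in docs])

# Log-entropy weighting: local log(1 + count), global 1 + sum(p log p) / log(n_docs).
p = counts / counts.sum(axis=1, keepdims=True)
plogp = np.zeros_like(p)
plogp[p > 0] = p[p > 0] * np.log(p[p > 0])
global_w = 1.0 + plogp.sum(axis=1) / np.log(len(docs))
A = np.log1p(counts) * global_w[:, None]

# Reduce with a truncated Singular Value Decomposition.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
U_k = U[:, :k]                  # term vectors (one row per term)
doc_vecs = Vt[:k].T * s[:k]     # document vectors in the k-dimensional space

def fold_in(text):
    """Project a new document or query: weighted sum of its term vectors."""
    return (np.log1p(count_vector(text)) * global_w) @ U_k

def cosines(q, vectors):
    """Cosine of the angle between q and each row of vectors."""
    return vectors @ q / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(q))

scores = cosines(fold_in("computer memory"), doc_vecs)
for j in np.argsort(-scores):
    print(round(float(scores[j]), 3), docs[j])
```

     Folding a training document back in with `fold_in` reproduces its row of `doc_vecs`, which is a quick consistency check on the projection.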

  10. Scalability: Large Document Collections and Polysemy
      [Figure: "Many Undifferentiated Conceptual Domains/COIs". The terms potato, corn, chip, sugar, silicon, wafer, valley and copper are plotted against Dimension 1 and Dimension 2, with "chip" and "wafer" marked.]

  11. LSI: Ongoing Work
      • Distributed LSI
      • Needed for LSI to scale to massive document collections
      • Adopts "divide and conquer" approach
      • Sort documents by conceptual domain
        • recognizes documents created for different COIs
        • create more semantically homogeneous subcollections
        • apply cluster analysis, e.g., bisecting K-means
      • Compute independent LSI vector spaces for each subcollection
        • more parsimonious representations of concept domains or contexts
      • Compute similarity measures between spaces
        • construct graphs from terms shared by two vector spaces
        • compute similarity between these two graphs
      • Discover appropriate search vector spaces for a query
        • cosine calculations (as before)
        • relevance feedback (as before)
        • query expansion
      • Visualizations to explore semantic context for a query in different LSI vector spaces
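
      The slide does not say how the term graphs are built or compared, so the following is only one plausible reading, sketched under explicit assumptions: each space yields a weighted graph over the shared terms whose edge weights are cosines between term vectors, and two spaces are compared by the Pearson correlation of their edge weights. The function names and the toy 2-D term vectors are hypothetical.

```python
import math
from itertools import combinations

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def term_graph(space, terms):
    """Weighted graph over terms: edge weight = cosine between term vectors (assumption)."""
    return {(s, t): cosine(space[s], space[t]) for s, t in combinations(terms, 2)}

def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def space_similarity(space_a, space_b):
    """Correlate the edge weights of the two graphs built on the shared terms."""
    shared = sorted(set(space_a) & set(space_b))
    graph_a, graph_b = term_graph(space_a, shared), term_graph(space_b, shared)
    edges = list(graph_a)
    return pearson([graph_a[e] for e in edges], [graph_b[e] for e in edges])

# Toy 2-D term vectors for two hypothetical LSI spaces.
geology = {"earth": (1.0, 0.1), "center": (0.9, 0.3),
           "travel": (0.1, 1.0), "research": (0.8, 0.4)}
movies = {"earth": (0.2, 1.0), "center": (1.0, 0.1),
          "travel": (0.3, 0.9), "alien": (0.1, 1.0)}
# The geology space rotated by 90 degrees: same term relationships, different axes.
rotated = {t: (-y, x) for t, (x, y) in geology.items()}

sim_rotated = space_similarity(geology, rotated)
sim_movies = space_similarity(geology, movies)
print(sim_rotated, sim_movies)
```

      Comparing graphs of term-to-term cosines rather than raw coordinates has the advantage that independently computed spaces need not share axes: the rotated copy of the geology space scores as essentially identical to the original.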

  12. DLSI: Experiments with NSF-Movie Review Corpus

      Vector Space       Dimensions   Non-stop Terms   Documents
      NSF-Geology               298           25,963       3,255
      NSF-Engineering           229           30,247       3,057
      NSF-Biology               224           38,176       3,645
      Movie Reviews             239           70,411       3,557
      All Documents             282          122,685      13,514

  13. DLSI: The Context of Term Meaning
      [Figure: three term graphs. Node labels appearing in the graphs: university, center/center's, cooperative, earth, center, reports, travel, research, earth, science-fiction/sci-fi, travel, alien, earth.]
      • Graph of semantic relationships between top five terms retrieved for the query {travel, center, earth} from the vector space containing only NSF geology abstracts.
      • Graph of semantic relationships between top five terms retrieved for the query {travel, center, earth} from the vector space containing only Ebert movie reviews.
      • Graph of semantic relationships between top five terms retrieved for the query {travel, center, earth} from the vector space containing all documents.
