Exploring Sociologist Consensus on Policy Perspectives: Kass and Option Y

Similarity relation defined for the domain opinion Query: which sociologists are in considerable agreement with Kass concerning policy Y? • Fuzzy Relational Data Base: Buckles, Petry • Elements of the tuples contained in the relations may be subsets of the domain universal set. • A similarity relation is defined on each domain universal set.

Fuzzy Data Base • (Project (select Assessment where Name = Kass and Option = Y) over Opinion) giving R1 Retrieve the opinion of Kass concerning option Y • (Project (select Expert where Field = Sociologist) over Name) giving R2 Select all sociologists from the table of experts

(Project (select (Join R2 and Assessment over Name) where Opinion = Y) over Name, Opinion) giving R3 List the opinions of the sociologists • (Join R3 and R1 over Opinion) with THRES (Opinion) ≧ 0.75 and THRES (Name) ≧ 0

Information retrieval based on fuzzy associations • Introduction • Three components in information retrieval • Fuzziness in a thesaurus: first component • Fuzziness in retrieval: second component • Fuzziness on output: third component • Classification of output • Conclusion

Three components in information retrieval • D = {d1,d2,…,dn} be a finite set of documents for retrieval • W = {w1,w2,…,wm} denote a set of descriptors • T : D ─>[0,1]w．T(d): a subset of descriptors in W indexed to the document d. • U(U = T-1)．U(w): documents have keyword w. Information retrieval based on fuzzy associations r’ F U P q r

Fuzziness in a thesaurus: first component • Three type thesaurus (represented as binary relation) • RT: related terms • NT: narrower terms • BT: broader terms • B(v,w) = N(w,v) R(v,w) = R(w,v) • Method of automatic generation of thesauri: • Typical:counting frequencies of simultaneous occurrences of pairs of keywords in a set of documents. • Fuzzy set model: • C = {c1,c2,…,cp} be a finite set of concepts where each ci, i=1,…p represents a unit of concept • H:W ─>[0,1]p a fuzzy set valued function which maps each keyword to it’s corresponding concepts as a fuzzy set in C. is concept of the word w.

Even by present computers, it’s difficult to calculate values of the fuzzy relation above using array in straightforward way, since the numbers of elements in W and D are very large(103ｘ105). Although techniques to handle sparse matrices may be applied, there is another method for generation R and N based on manipulation of sequential files. The principle tool for this is sorting. • (a,b,c) means a record in which field are a,b and c.{(a,b,c)} means a set of records such as (a,b,c). • Input: a set D of documents, Each document d ∈ D has a number of keywords in W.A keyword may occur twice or more in a document. The frequency of occurrence of wi in dk is denoted by hik. • Output: a set of records {(wi,wj,R(wi,wj)]} for all pairs R(wi,wj)<>0

Algorithm GFT (generation of a fuzzy thesaurus). // Find pairs of keywords in every document.// For all dk∈D do find all keywords wi∈W and calculate hjk for all (wi,wj),wi<wj, that are found in dk do make record (wi,wj, min(hik,hjk)) output (wi,wj,min(hik,hjk) to WORK1 repeat for all wi that are found in dk do make record (wi, hjk) output (wi,hjk) to WORK2 repeat repeat //sort WORK1 and WORK2.// sort WORK1 into increasing order of the key (wi,wj) sort WORK2 into increasing order of the key wi

//Calculate R.Scan WORK1 and WORK2.// for all (wi,wj) in WORK1 do find all record for (wi,wj) in WORK1 and all records for wi, and wj in WORK2 R (wi,wj)←∑k min(hik,hjk)/(∑k hik+ ∑k hjk- ∑kmin(hik,hjk)) output (wi,wj,R(wi,wj)) to an output file repeat end-of algorithm GFT In a foregoing paper an experimental calculation on three thousand documents and thirty thousand keywords was carried out using GFT based on sorting shows a reasonable amount of 800 sec of CPU time.

//record (di,pi)// //before another record (di,pi) satisfies either di<dj or// //di = dj, pi > pj// Take the first record (d1,p1) in work (D,P)<-(d1,p1) for all dj in WORK do //the dj’s are sequentially examined.// if D <> dj then output (D,P) to to an output file OUT (D,P)<-(di,pj) endif repeat output(D,P) to OUT //OUT contains exactly those records that represent P=Uf(d,w) define by above//

//Third step: if necessary sort again.// sort OUT into the decreasing order of the key p and print OUT

Fuzziness in retrieval: second component • For the crisp case a retrieval through a thesaurus given a keyword w is as follows. • Examine the thesaurus F and find all associated terms v11,v12,…,v1p. • Find subsets U(v11),U(v12),…,U(v1p)． • Establish the retrieved set of documents as the union of U(v11),U(v12),…,U(v1p): ∪1≦i≦p U(v1i) • Uf(d,w) = 1 iff d∈U(v1) for some v1 such that F(v1,w) = 1, 0 otherwise.

When the thesaurus F is fuzzy and U is crisp Uf(d,w) = max v∈W min [U(d,v),F(v,w)]. This equation is valid also for a fuzzy relation U(d,v). Algorithm FR(Fuzzy Retrieval). //First step: Find all records.// for all v such that F(v,w) <> 0 in FT do for all d∈U(v) do p(d,v)<-min[U(d,v),F(v,w)] output record (d,p(d,v)) to a work file WORK repeat repeat //second step: Find values of Uf.// sort WORK into increasing order of the first key d and into decreasing order of the second key p //the above sorting means that in the resulting sequence,a//

End-of FR • Fuzziness on output: third component • Fuzzy filter. EX: • (a) Find recent documents that have keyword w. • (b) Find documents that have keywords w and are • relevant to one’s field of interest • r = r’ ∩ g • Classification of output • Decreasing of membership • Divide into layers • Conclusion • Problem for further studies • Discussion of crisp techniques of advanced indexing and retrieval using a fuzzy set model,

Studies of efficient algorithms for large scale database. In particular, development of hardware for information retrieval should be taken into account. • Application of methods in fuzzy information retrieval to related areas.

Exploring Sociologist Consensus on Policy Perspectives: Kass and Option Y

Exploring Sociologist Consensus on Policy Perspectives: Kass and Option Y

Presentation Transcript

5. The Intentionality Model Diagrammed and Defined in Relation to the Duplex Pyramids

Seeds for Similarity Search

Domain Restriction on Relation

The Problem Defined:

The Opinion Paragraph

Data-driven Visual Similarity for Cross-domain Image Matching

The Hero Defined

The study domain for this

GENIA-GR: a Grammatical Relation Corpus for Parser Evaluation in the Biomedical Domain

The Relation Ontology

Similarity

The Opinion Essay

The Opinion Essay

DNSSEC for the Domain

The Opinion Essay

A Study of Hybrid Similarity Measures for Semantic Relation Extraction

What is the domain of the following relation? (use correct notation)

Similarity

Operators for Similarity Search

Similarity

The Need for Domain Parking