Strategies for Model-Oriented Information Organization
Robert B. ALLEN ロバート アレン
Research Center for Knowledge Communities, University of Tsukuba, Tsukuba, Japan
rba@boballen.info
“Big Data” Problem of Organization and Access for Cultural Heritage Materials
• Behavior-based models go beyond ontologies and traditional approaches to knowledge representation.
• Rather than developing indexes, perhaps we can model communities and eventually cultures.
• Those models can provide structure to support organization, context, and access.
The original motivation is an information science view based on indexing historical newspapers, but the approach is potentially broader than information science.
Causal Relationships in Timelines of Events
Events can be threaded into narratives, and into other types of discourse as well. Here is an interactive, curated timeline visualization that shows threaded relationships among events.
• Allen, R.B., Visualization, Causation, and History, iConference, 2011.
Explicit Modeling of Texts and Communities
• Many types of cultural texts.
• Many communities are relatively closed systems. This makes them more tractable than indexing cities.
• Detailed knowledge about the community allows synergies among:
  • People
  • Locations
  • Processes
• Earlier work developed an “interactive community directory”.
Allen, R.B., Toward an Interactive Directory for Norfolk, Nebraska: 1899-1900, IFLA Newspaper and Genealogy Section Meeting, Singapore, Aug 2013. arXiv:1308.5395
Behavior-Based Models
• Timing is right for more explicit cultural modeling.
• Big data needs to be organized (e.g., animations from keynote)
• Need to represent process and functionality
• Fragments of knowledge are unified by models across collections and databases. Our models have conceptual units.
• Behavior-based models go beyond ontologies and traditional approaches to knowledge representation.
• Based on software engineering. Full-fledged programming languages, specifically object-oriented programming languages, are useful for representing those descriptions. Here we explore how to use Java for modeling communities and events in communities.
  • Entities with state and behavior.
  • Abstraction and instantiation
  • Executable models which unfold as they are run.
• We need to be cautious about complex, large-scale models
  • Some efforts at complex modeling, such as Cyc, have not fulfilled their promise
  • Other efforts at large-scale indexing have done better
    • UMLS (Unified Medical Language System)
• We focus on a conservative approach to modeling
  • We will not do too much automatic inference.
• Allen, R.B., Model-Oriented Information Organization: Part 1, The Entity-Event Fabric, D-Lib Magazine, July 2013.
• Allen, R.B., Model-Oriented Information Organization: Part 2, Discourse Relationships, D-Lib Magazine, July 2013.
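The three ideas in this slide (entities with state and behavior, abstraction and instantiation, and executable models) can be illustrated with a minimal Java sketch. This is not code from the talk: the names Resident, CommunityEvent, ChangeOccupation, and CommunityModel, and the sample data, are all assumptions made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical entity: state lives in fields, behavior in methods.
class Resident {
    final String name;
    String occupation; // state that events may change

    Resident(String name, String occupation) {
        this.name = name;
        this.occupation = occupation;
    }
}

// Abstraction: any event can apply itself to the model and describe itself.
interface CommunityEvent {
    void apply();
    String describe();
}

// Instantiation: one object is created per occurrence of this kind of event.
class ChangeOccupation implements CommunityEvent {
    private final Resident who;
    private final String newOccupation;

    ChangeOccupation(Resident who, String newOccupation) {
        this.who = who;
        this.newOccupation = newOccupation;
    }

    public void apply() { who.occupation = newOccupation; }

    public String describe() { return who.name + " becomes " + newOccupation; }
}

// Executable model: running it unfolds the scheduled events in order.
class CommunityModel {
    private final List<CommunityEvent> timeline = new ArrayList<CommunityEvent>();

    void schedule(CommunityEvent event) { timeline.add(event); }

    List<String> run() {
        List<String> log = new ArrayList<String>();
        for (CommunityEvent event : timeline) {
            event.apply();
            log.add(event.describe());
        }
        return log;
    }
}

class CommunityModelDemo {
    public static void main(String[] args) {
        Resident resident = new Resident("Example Resident", "clerk"); // made-up data
        CommunityModel model = new CommunityModel();
        model.schedule(new ChangeOccupation(resident, "editor"));
        for (String line : model.run()) {
            System.out.println(line);
        }
    }
}
```

In keeping with the conservative approach above, the sketch performs no inference: the model only replays events that were explicitly scheduled.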
Modeling Text Descriptions with FrameNet
• We have lots of rich text descriptions from cultural descriptions. Could we use them? After all, the text descriptions are representations.
• One approach to modeling.
• FrameNet ( https://framenet.icsi.berkeley.edu/fndrupal/ )
  • Essential concepts in natural language described with frames. Connected semantic roles.
  • Based on cognitive principles, but we can use it as a language resource for our modeling.
• We are particularly interested in verb frames because they describe transitions in attributes.
  • About 700 verb frames.
• Frame: “Releasing”
  A Captor ends the captivity or inhibition of the motion of a Theme from the Location_of_confinement. The release is in accord with the plans of the Captor.
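A frame's connected semantic roles can be sketched as a small data structure that binds role names to fillers. The class FrameInstance and its methods are hypothetical, not part of FrameNet or of the talk. Only the frame name and the role names Captor, Theme, and Location_of_confinement come from the slide; the fillers are placeholders.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical container: a frame name plus role -> filler bindings.
class FrameInstance {
    final String frameName;
    private final Map<String, String> roles = new LinkedHashMap<String, String>();

    FrameInstance(String frameName) {
        this.frameName = frameName;
    }

    // Binds a role (frame element) to a filler and returns this for chaining.
    FrameInstance bind(String role, String filler) {
        roles.put(role, filler);
        return this;
    }

    // Returns the filler for a role, or null if the text left the role unfilled.
    String filler(String role) {
        return roles.get(role);
    }

    boolean isBound(String role) {
        return roles.containsKey(role);
    }
}

class FrameInstanceDemo {
    public static void main(String[] args) {
        FrameInstance releasing = new FrameInstance("Releasing")
                .bind("Captor", "a captor (placeholder)")
                .bind("Theme", "a captive (placeholder)");
        // Location_of_confinement is deliberately left unbound.
        System.out.println(releasing.isBound("Location_of_confinement"));
    }
}
```

Leaving roles unbound is a deliberate choice in this sketch, since a text often fills only some of a frame's roles.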
Modeling Text Descriptions (continued)
• Limitations of frames
  • Not always a perfect match
• Other types of supplemental knowledge.
  • Conceptual relationships.
    • Some of these from FrameNet
    • Classification (inheritance) hierarchies
    • Partonomies
      • Grouping like-objects
      • Hierarchical parts of a system
  • World knowledge from many sources
    • Newspapers, Census, Books, Diaries
• Separate the entity-event fabric from discourse.
• Can we simplify the frame-based models with models of the underlying mechanisms?
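The two kinds of conceptual relationship named above map onto two familiar object-oriented constructs: classification onto inheritance, and partonomy onto composition. The sketch below is one possible reading under that assumption. The classes Business, Bakery, and Component are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Classification (inheritance): a Bakery is a kind of Business.
class Business {
    final String name;

    Business(String name) { this.name = name; }

    String kind() { return "business"; }
}

class Bakery extends Business {
    Bakery(String name) { super(name); }

    @Override
    String kind() { return "bakery"; }
}

// Partonomy (part-whole): a component holds its sub-parts.
class Component {
    final String label;
    private final List<Component> parts = new ArrayList<Component>();

    Component(String label) { this.label = label; }

    // Adds a direct part and returns this component for chaining.
    Component add(Component part) {
        parts.add(part);
        return this;
    }

    // Counts all parts below this component, at any depth.
    int countParts() {
        int total = 0;
        for (Component part : parts) {
            total += 1 + part.countParts();
        }
        return total;
    }
}
```

For example, `new Component("town").add(new Component("district").add(new Component("street")))` describes a town with two parts below it, while `new Bakery("Example Bakery")` can be used anywhere a Business is expected.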
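Code Fragment for Verb Frame “Release” as a Java Class

// Minimal Person entity assumed by the fragment; it is not shown on the slide.
class Person {
    boolean isPrisoner;
}

class V_Release {
    // A Captor ends the captivity or inhibition of the motion
    // of a Theme from the Location_of_confinement. The
    // release is in accord with the plans of the Captor.
    // Note: a state of confinement would be better than a boolean flag.
    // Note: the captive could also be a group.
    // Parameter names mirror the frame's role names.
    public V_Release(Person Captor, Person Captive) {
        Captive.isPrisoner = false;
    }
}

Note: this fragment is too simplistic.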
Example Text
• We used textbook or Wikipedia-level texts
  • These are relatively straightforward, with simple past tense
  • By comparison, primary sources have many difficulties. They are full of slang, complex constructions, ungrammatical usage, and often incorrect statements.
  • Some massaging is still required
• Early history (1750-1820) of Minneapolis, Minnesota from Wikipedia
French explorer Daniel Greysolon, Sieur du Lhut explored the Minnesota area in 1680 on a mission to extend French dominance over the area. While exploring the St. Croix River area, he got word that some other explorers had been held captive. He arranged for their release.
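The last two sentences of the passage can be encoded with an explicit confinement state that covers a group of captives. The sketch below shows one way to do that and is not the author's code. Agent and Confinement are hypothetical names. Because the passage neither names nor counts the captives or the captor, a single placeholder stands in for the group and the captor is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical participant in an event.
class Agent {
    final String name;

    Agent(String name) { this.name = name; }
}

// Hypothetical state of confinement shared by a group of captives.
class Confinement {
    private final List<Agent> captives;
    private boolean active = true;
    private Agent arrangedBy; // null until a release happens

    Confinement(List<Agent> captives) {
        this.captives = new ArrayList<Agent>(captives);
    }

    // Ends the confinement and records who arranged the release,
    // which need not be the captor.
    void release(Agent arranger) {
        active = false;
        arrangedBy = arranger;
    }

    boolean isActive() { return active; }

    Agent arrangedBy() { return arrangedBy; }

    int captiveCount() { return captives.size(); }
}

class ReleaseFromTextDemo {
    public static void main(String[] args) {
        Agent duLhut = new Agent("Daniel Greysolon, Sieur du Lhut");
        // Placeholder: the passage says only "some other explorers".
        Agent others = new Agent("other explorers (unnamed in the passage)");
        Confinement captivity = new Confinement(Arrays.asList(others));
        captivity.release(duLhut);
        System.out.println(captivity.arrangedBy().name);
    }
}
```

The passage says du Lhut arranged the release, which does not fit the Captor role cleanly. That is why the sketch records an arranger separately, an instance of a frame that is not a perfect match.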
Representing Processes (Flows)
• More than individual events. Composite structures.
• Extending verbs to be processes.
  • Baking as a narrow process of applying heat versus a complex activity of using a recipe.
• Abstractions
• Typology of processes
  • Deterministic sequences of events, scripts
  • Non-deterministic
  • Error conditions and work-arounds
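A deterministic script with error conditions and work-arounds can be sketched as an ordered list of steps, each with an optional fallback. This is an illustrative sketch only: the names Step, FixedStep, and Script are assumptions, and non-deterministic processes are not covered.

```java
import java.util.ArrayList;
import java.util.List;

// One step of a script; perform() returns false to signal an error condition.
interface Step {
    String name();
    boolean perform();
}

// Hypothetical helper: a step whose outcome is fixed in advance.
class FixedStep implements Step {
    private final String name;
    private final boolean succeeds;

    FixedStep(String name, boolean succeeds) {
        this.name = name;
        this.succeeds = succeeds;
    }

    public String name() { return name; }

    public boolean perform() { return succeeds; }
}

// A script: a deterministic sequence of steps with optional work-arounds.
class Script {
    private final List<Step> steps = new ArrayList<Step>();
    private final List<Step> workarounds = new ArrayList<Step>(); // null entry = none

    Script then(Step step) { return then(step, null); }

    Script then(Step step, Step workaround) {
        steps.add(step);
        workarounds.add(workaround);
        return this;
    }

    // Runs the steps in order and returns a trace; stops at an unrecovered failure.
    List<String> run() {
        List<String> trace = new ArrayList<String>();
        for (int i = 0; i < steps.size(); i++) {
            Step step = steps.get(i);
            if (step.perform()) {
                trace.add(step.name());
                continue;
            }
            Step workaround = workarounds.get(i);
            if (workaround != null && workaround.perform()) {
                trace.add(step.name() + " failed; work-around: " + workaround.name());
                continue;
            }
            trace.add(step.name() + " failed; script aborted");
            break;
        }
        return trace;
    }
}
```

A baking script, for instance, could chain `then(new FixedStep("mix dough", true))` with a heating step that fails and a work-around step that succeeds. The step names here are made up.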
Representing Information Resources in the Community
• Representing the context in which information resources are generated and accessed.
• We need to couple the model of the information resource with flexible models of the community and of the reader.
• This is somewhat analogous to the insight about the records continuum model from archives.
Challenges for Coding Natural Language
• Processes
• Information resources
• Discourse versus events: Narrative, explanation, argumentation
• Future events, goals
• Inexact descriptions (“some”, “sometimes”)
• Rich representations about people
• Representations for mental events
• Culture
• Abstraction
Status
• We have shown first steps toward developing community models.
• More gaps than known events. Need to develop frameworks for adding constraints to high-level descriptions.
• FrameNet frames generally work well, but they need to be extended and there are difficulties with some constructions.
• Many composites in natural language such as “bake-baker-baking-bakery”. How can these be represented generally, given nuances such as baking at home vs. baking in a bakery?
Future Directions
• Extending community models
  • Multi-family genealogies
  • Modeling cities
  • Linking community and city models together (national models)
  • Wire in additional procedures (e.g., laws).
• Support User Interaction
  • Better support for discourse such as argumentation by authors
  • Better support for authoring model-oriented descriptions
  • Interactive interfaces for working with community histories
  • Supporting a scholarly workbench
  • Tutorial-like descriptions of histories.
  • Interactive historical re-enactors, games, and cyber-dramas
• Broader effort to develop model-oriented information organization
  • Application of model-oriented information organization to museum objects and informatics
  • Relationship to cognitive modeling
  • Frames as a protocol for agents in multi-agent systems
  • Standards
Strategies for Model-Oriented Information Organization
Robert B. ALLEN ロバートアレン
Research Center for Knowledge Communities, University of Tsukuba, Tsukuba, Japan
As of March 1, 2014: Department of Library and Information Science, Yonsei University, Seoul, Korea
For more information see: http://boballen.info/
Contact: rba@boballen.info