Mapping Open Source Potential in Pakistan

Exploiting Semantic Web for Open Source Software Development: Opportunities and Challenges • O. Univ. Prof. Dr.techn. A Min Tjoa • Director of the Institute of Software Technology and Interactive Systems • Vienna University of Technology – Austria • http://www.ifs.tuwien.ac.at • December 29, 2006

Outline • Overview of typical Open Source environments • Benefits of OS for Pakistan – An external view • Critical factors • Semantic web – Promise and mechanism • SemanticLIFE – A case study (open source project) • Conclusions

Overview of Typical OS Environments (1/2) • Open source development initiatives • May start as an individual effort (e.g., Linux kernel) • Current trend is collective effort (e.g., Apache) • Development within Community of Practice (CoP) • Harmony and tolerance despite differences and plenty of freedom within CoP • Indeed, differences are welcome • generate new sub-projects and new CoPs • the sub-projects are naturally created (not arbitrarily) • provides natural division of expertise in proper pool

Overview of Typical OS Environments (2/2) • High level of trust within CoP • Run on a self-management basis – no corporate directives • Behind successful OS projects there is still a hierarchy and a methodology • Multiple and flexible ways to contribute (email, wiki pages, CVS updates, blogs, issue tracking systems,...) • Contribution rights over resources are granted step by step, based upon technical merit • Normally a voluntary process; however, the contributions are widely publicized

Benefits of OS for Pakistan (1/2) • Collaboration: Depending upon your skills, collaboration in international projects is possible • Learning: Provides instant exposure to an international working environment and technology • Economy: Fewer monetary resources are required for OS software projects • Independence: A shift from reliance on a few proprietary vendors to many OS vendors for your needs (because one can't afford many proprietary vendors)

Benefits of OS for Pakistan (2/2) • Software piracy: Can be reduced considerably by adopting OS culture, especially in educational institutions • A change in thinking: A new thinking and working style with a proven track record, not only for academia and research but also for industry • Revenue: The benefits above naturally raise productivity => revenue increase

Software Piracy Situation – 2004 • Courtesy of 2nd Annual BSA and IDC Global Software Piracy Study, 2005

Software Piracy Situation – 2005 • Courtesy of 3rd Annual BSA and IDC Global Software Piracy Study, May 2006

Critical Factors (1/4) • How to accommodate linguistic diversity vis-à-vis terminological differences. Problem severity increases when • OS projects move more towards end user applications • Application domains evolve dynamically • Collaborations cross boundaries (geographical, cultural, corporate) • Interaction and number of involved agencies increase (producer, income tax, certification,...)

Critical Factors (2/4) • How to gain from personal workbench • Benefit from user's intellectual resources • Troubleshooting on personal workbench • Traceability of work patterns • Traceability of project coordination • Benefit from user's infrastructure • computing resources open to Grid environment • sharing of communication resources for coordination • sharing of digital resources (libraries, testbeds, ...)

Critical Factors (3/4) • How to balance the privacy protection and access control over your information • Secure flow of shared information within CoPs • Automatic management of access control over your workbench resources

Critical Factors (4/4) • How to make it a viable work culture in your environment. Adaptations may be required for exploitation of: • Local / corporate work culture • Available infrastructure • Prevalent regulations of interacting agencies • The Biggest challenge -- Building evolutionary CoPs

Semantic Web – Promises • Ability to integrate heterogeneous data sources using ontologies • Ability to formally describe the information as you perceive it while making it sharable at the same time • Formal data description makes it understandable and thus processable by software agents • Automatic reasoning is possible due to formal description • Abundance of OS tools from modeling, storage, annotation, reasoning, & query to user interfaces

Semantic Web – Architecture

Semantic Web - The URI Way (1/2) • Everything has a URI • Don’t say “Mina Bazar” • Say http://pakistan.org/culture/festival/MinaBazar • The URI is located within a namespace • http://pakistan.org/culture/festival/

Semantic Web - The URI Way (2/2) • Each CoP may have their own interpretation under a different namespace • http://pakistan.gov.pk/balochistan/MinaBazar
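The naming idea on the two slides above can be sketched in a few lines of Python. This is only an illustration using the standard library: the `Namespace` helper class is hypothetical and not part of any Semantic Web toolkit, while the two namespace URIs are the ones shown on the slides.

```python
class Namespace:
    """Hypothetical helper: a base URI under which local names are minted."""

    def __init__(self, base: str) -> None:
        # Normalise so the base always ends with exactly one slash.
        self.base = base.rstrip("/") + "/"

    def term(self, local_name: str) -> str:
        """Return the full URI for a local name inside this namespace."""
        return self.base + local_name

    def contains(self, uri: str) -> bool:
        """True if the URI lives under this namespace."""
        return uri.startswith(self.base)


# Two communities of practice, each with its own namespace (URIs from the slides).
culture = Namespace("http://pakistan.org/culture/festival/")
balochistan = Namespace("http://pakistan.gov.pk/balochistan/")

mina_bazar_culture = culture.term("MinaBazar")
mina_bazar_gov = balochistan.term("MinaBazar")

# Same local name, two distinct resources: the namespace disambiguates them.
print(mina_bazar_culture)
print(mina_bazar_gov)
print(mina_bazar_culture == mina_bazar_gov)
```

The point of the sketch is that two CoPs can both say "MinaBazar" without colliding, because each full URI carries its namespace with it.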

Semantic Web – Resource Description (1/4) • Every Thing can be thought of as an Entity • The Entity is described as a Resource • The Resource may range from literal to concept map, and has formal Description • The Resource is identified by a URI • The Description has the consensus within CoP, at the minimum • The Description is asserted in terms of triples of the form <Subject Predicate Object>, where each of them is a Resource, e.g., • <Organization leadBy Director> • <:KICS leadBy :WaqarMahmood>
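As a rough illustration of the triple form described above, the following Python sketch stores a few <Subject Predicate Object> assertions and matches patterns against them. It assumes a made-up namespace `http://example.org/` and a hypothetical `type` predicate; the `leadBy` predicate and the KICS example come from the slide. A real project would normally use an RDF library rather than this toy store.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical namespace for the example resources.
EX = "http://example.org/"

store = {
    Triple(EX + "KICS", EX + "leadBy", EX + "WaqarMahmood"),
    Triple(EX + "KICS", EX + "type", EX + "Organization"),
    Triple(EX + "WaqarMahmood", EX + "type", EX + "Director"),
}


def match(s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t.subject == s)
        and (p is None or t.predicate == p)
        and (o is None or t.obj == o)
    )


# Who leads KICS?
leaders = [t.obj for t in match(s=EX + "KICS", p=EX + "leadBy")]
print(leaders)
```

Because subject, predicate, and object are all URIs, the same pattern-matching function answers questions in any direction, for example "everything of type Director" via `match(p=EX + "type", o=EX + "Director")`.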

Semantic Web – Resource Description (2/4) • RDF vs. Relational Model • Can be encoded in XML • Simplicity and mathematical consistency • Not just tables: Trees

Semantic Web – Resource Description (3/4) In principle, every “piece of information” • can be conceptualized in terms of inter-relation of entities – Ontological (Philosophical) Level, AND • has schema (ontology) described in terms of chains of triples as well as its instances (individuals) – Developer or Content Author Level, AND • has implementation for those chains of triples based upon strong theoretical formalism of Description Logic – Implementation Level • Allows Reasoners to make inference • Deduce relationships such as containment, symmetrical, transitive, inverse relations
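To make the inference bullet concrete, here is a deliberately naive Python sketch that derives new triples from transitive, symmetric, and inverse relations. It is not a Description Logic reasoner, only a fixpoint loop over three rule types; the predicates `locatedIn`, `neighbourOf`, and `leads` and the sample facts are assumptions chosen for illustration (only `leadBy` and KICS appear in the slides).

```python
def infer(triples, transitive=(), symmetric=(), inverse=None):
    """Forward-chain until no new triple can be derived (naive fixpoint)."""
    inverse = inverse or {}
    closure = set(triples)
    while True:
        new = set()
        for s, p, o in closure:
            if p in symmetric:
                new.add((o, p, s))
            if p in inverse:
                # One-directional: derives the inverse statement only.
                new.add((o, inverse[p], s))
            if p in transitive:
                for s2, p2, o2 in closure:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        if new <= closure:
            return closure
        closure |= new


# Illustrative facts; the predicate names are hypothetical.
facts = {
    ("Lahore", "locatedIn", "Punjab"),
    ("Punjab", "locatedIn", "Pakistan"),
    ("KICS", "leadBy", "WaqarMahmood"),
    ("Pakistan", "neighbourOf", "Iran"),
}

closure = infer(
    facts,
    transitive={"locatedIn"},
    symmetric={"neighbourOf"},
    inverse={"leadBy": "leads"},
)

for triple in sorted(closure - facts):
    print(triple)
```

The loop reruns the rules until nothing new appears, which is enough for small examples; production reasoners use far more efficient strategies and support many more constructs.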

Semantic Web – Resource Description (4/4) • So, the Semantic Web provides a platform from Conceptual Modeling to Implementation with added inference capabilities • It is not necessary to implement all of it to get started: “A little semantics goes a long way” (James Hendler)

Semantics in Action – Example (1/9)

Semantics in Action – Full Example

Semantics in Action – Another View (1/3)

SemanticLIFE – A Case study • Memories of Life • SemanticLIFE architecture • Feeding artifacts and activities • Semantic annotations, visualization • Lessons learned / issues exposed • Research dissemination

Memories of Life – The vision • Vannevar Bush's 1945 Memex vision (Ref: As We May Think, by Vannevar Bush, The Atlantic Monthly, 176(1), July 1945, 101-108.) • “A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory.” • “Associative indexing, the process of tying two items together, is the essential feature of the memex.” => persistent trails of information

Memories of Life – Grand Challenge • Memories for life: managing information over a human lifetime (Andrew Fitzgibbon and Ehud Reiter) – GC3 in UK Grand Challenges for Computing Research 2004 • 2006 - still a grand challenge (M4L Working Group) “M4L is not a single phenomenon, implying a single research objective, but a collection of inter-related challenges.” => Interdisciplinary research challenges • IST Framework 6 research project • Personal vs. Organizational Memories

Memories of Life – Projects • Haystack – MIT CS & AI Labs • MyLifeBits Project – Microsoft BARC Media Presence Group • SemanticLIFE - IFS TU Wien • OpenIris (Semantic Desktop) – SRI AI Center CA • NEPOMUK (Social Semantic Desktop) – 16 EU Partners

Essential Aspects in MoL • Capture of user activities • Interaction with multiple interaction devices • Integration of heterogeneous data sources and data formats • Ability to store lifetime data • Assertion of association of thoughts • Use of appropriate information model • Adaptability towards evolutions

SemanticLIFE – Architecture Requirements • Architecture must be flexible for future data types and software libraries => more freedom in OS • Ability to connect with external data sources (such as legacy databases) • Scalability is a critical issue due to massive amount of lifetime data of different types • Due to heterogeneity of interaction devices, data types, and varying user preferences the software architecture should follow plug-n-play style => more freedom in OS • Functional components perceived as Plug-ins => more freedom in OS • Inter-component communication must be asynchronous
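The requirement that plug-ins communicate asynchronously can be illustrated with a small in-process message bus. The sketch below uses Python's `asyncio` purely for illustration; SemanticLIFE itself is Java and Eclipse based, and the `MessageBus` class, the `artifact` topic, and the message fields are all hypothetical names, not the project's actual API.

```python
import asyncio
from collections import defaultdict
from typing import Optional


class MessageBus:
    """Hypothetical in-process bus: plug-ins subscribe to topics by name."""

    def __init__(self) -> None:
        self._queues = defaultdict(list)

    def subscribe(self, topic: str) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self._queues[topic].append(queue)
        return queue

    async def publish(self, topic: str, message: Optional[dict]) -> None:
        # The publisher only enqueues; it does not wait for processing.
        for queue in self._queues[topic]:
            await queue.put(message)


async def storage_plugin(inbox: asyncio.Queue, stored: list) -> None:
    """Consumes artifacts until it receives the None shutdown sentinel."""
    while True:
        message = await inbox.get()
        if message is None:
            break
        stored.append(message["name"])


async def main() -> list:
    bus = MessageBus()
    stored: list = []
    consumer = asyncio.create_task(
        storage_plugin(bus.subscribe("artifact"), stored)
    )
    # A data-feed plug-in publishes without knowing who consumes.
    await bus.publish("artifact", {"type": "email", "name": "inbox/42"})
    await bus.publish("artifact", {"type": "file", "name": "notes.txt"})
    await bus.publish("artifact", None)  # shutdown sentinel
    await consumer
    return stored


stored_names = asyncio.run(main())
print(stored_names)
```

Decoupling producers from consumers through queues is what lets new plug-ins be added without changing existing ones, which is the plug-n-play property the slide asks for.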

Tools/Technologies Used • Open source Java based under free license • Eclipse as IDE and application workbench • SVN for revision control • Web based technologies

Component Architecture

Data Feed Plugins • Using Google Desktop – Google Desktop API is extended for data transformation and retrieval of GD items

Data Feed Plugins • Other Data Feeds - the following plug-ins are developed by us as open source modules • Email – IMAP folders from email servers • Contacts – From Microsoft Outlook • Calendar – From Microsoft Outlook • File – Any type of file on your computer • Web Pages – web pages browsed with Mozilla Firefox • Process Monitoring (Windows, Linux) – data on the processes running on your computer
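A data feed plug-in can be pictured as a small component that turns some source into a stream of artifact records. The following Python sketch shows that shape for the File feed only. The `DataFeed` contract, the `FileFeed` class, and the record fields are assumptions made for this example; they do not reproduce the actual SemanticLIFE plug-in interfaces.

```python
import os
import tempfile
from abc import ABC, abstractmethod
from typing import Dict, Iterator


class DataFeed(ABC):
    """Hypothetical plug-in contract: each feed yields artifact records."""

    name: str

    @abstractmethod
    def items(self) -> Iterator[Dict[str, object]]:
        ...


class FileFeed(DataFeed):
    name = "file"

    def __init__(self, root: str) -> None:
        self.root = root

    def items(self) -> Iterator[Dict[str, object]]:
        # Walk the directory tree and describe every file found.
        for folder, _dirs, files in os.walk(self.root):
            for filename in sorted(files):
                path = os.path.join(folder, filename)
                yield {
                    "feed": self.name,
                    "path": path,
                    "size": os.path.getsize(path),
                }


# Usage: index a throwaway directory containing a single file.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "notes.txt"), "w") as handle:
        handle.write("hello")
    records = list(FileFeed(root).items())

print(records[0]["feed"], records[0]["size"])
```

Other feeds (email, calendar, browsed pages) would implement the same `items()` method, so the rest of the system can treat all sources uniformly.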

Data Feed – Screenshots

MessageBus Plugin (1/2) – Service Oriented Pipeline Architecture (SOPA) • The following components are realized as plug-ins: • Service Bus: provides a level of abstraction between the application and system-wide services, including pipelines, other system plug-ins, and external web services • Pipeline: enables SOPA systems to realize scenarios based on the basic services and the pipelines. The pipeline idea was inspired by Apache Cocoon • Diagram labels: Services Bus Plug-in, External Web Services, Pipelines Plug-in, Services • SOPA Among Top 10

MessageBus Plugin (2/2) – Sample Pipeline

<pipeline name="checkWeather" serialization="xswt">
  <parameters>
    <parameter name="startDate" rdf:datatype="xsd:date"/>
    <parameter name="endDate" rdf:datatype="xsd:date"/>
  </parameters>
  <call id="cities" service="at.slife.webservice" operation="listCities"/>
  <xsl:for-each select="/result/cities/city">
    <call id="city-weather" service="at.slife.weatherservice" operation="getWeather">
      <pipe:attribute name="city">{xpath:cityName}</pipe:attribute>
      <parameter>{xpath:cityName}</parameter>
      <parameter>{startDate}</parameter>
      <parameter>{endDate}</parameter>
    </call>
  </xsl:for-each>
  <call id="my-destinations" service="at.slife.profile" operation="rankData">
    <parameter>{xpath:/result/city-weather}</parameter>
  </call>
  <transform stylesheet="weather.xsl"/>
</pipeline>

Features • Plug-n-Play business services in Eclipse environment • Service call transparency • Service orchestration • Scenario-oriented design • XSL processing on Web Service calls • Results transformation in multiple target languages
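The control flow of the sample pipeline (list the cities, fetch the weather for each, rank the results, transform the output) can be mimicked in plain Python to show what service orchestration and call transparency mean in practice. The service ids and operation names are taken from the pipeline; everything else, including the stub implementations, the dummy scores, and the text rendering that stands in for the XSL transform, is a placeholder invented for this sketch.

```python
from datetime import date


# Stub services. The implementations and return values are placeholders.
def list_cities():
    return ["Vienna", "Islamabad"]


def get_weather(city, start, end):
    # Placeholder: a deterministic dummy score instead of real weather data.
    return {"city": city, "score": len(city), "from": start, "to": end}


def rank_data(reports):
    return sorted(reports, key=lambda r: r["score"], reverse=True)


# Registry keyed by (service id, operation), as named in the sample pipeline.
SERVICES = {
    ("at.slife.webservice", "listCities"): list_cities,
    ("at.slife.weatherservice", "getWeather"): get_weather,
    ("at.slife.profile", "rankData"): rank_data,
}


def call(service, operation, *args):
    """Look the operation up by name, so callers never import the implementation."""
    return SERVICES[(service, operation)](*args)


def check_weather(start_date, end_date):
    cities = call("at.slife.webservice", "listCities")
    reports = [
        call("at.slife.weatherservice", "getWeather", city, start_date, end_date)
        for city in cities
    ]
    ranked = call("at.slife.profile", "rankData", reports)
    # Stand-in for the final <transform> step: render plain text lines.
    return [f"{r['city']}: {r['score']}" for r in ranked]


lines = check_weather(date(2006, 12, 29), date(2006, 12, 31))
print(lines)
```

Because every step goes through `call`, a service can be replaced (for example by a remote web service) by changing only the registry entry, which is the kind of transparency the feature list refers to.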

Information Architecture • Comprehensibility in a lifetime knowledge box is an issue • Little semantics goes a long way • Following the LATCH principle • Location • Agent • Time • Categorization • Hierarchy • Collections and Tagging • Automatic extraction of named entities from Text
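One way to picture the facets and tagging listed above is a record that carries one field per facet plus a free set of tags. The Python sketch below follows the facet names exactly as they appear on the slide; the `Annotation` class, its field types, and the sample items are hypothetical and are not taken from the SemanticLIFE data model.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Sequence, Set, Tuple


@dataclass
class Annotation:
    """Hypothetical record holding the five facets plus free tags."""
    item: str
    location: str
    agent: str
    time: datetime
    category: str
    hierarchy: Tuple[str, ...]  # path from root to leaf
    tags: Set[str] = field(default_factory=set)


def with_tag(annotations: List[Annotation], tag: str) -> List[str]:
    """Items carrying the given tag."""
    return [a.item for a in annotations if tag in a.tags]


def under(annotations: List[Annotation], prefix: Sequence[str]) -> List[str]:
    """Items whose hierarchy path starts with the given prefix."""
    prefix = tuple(prefix)
    return [a.item for a in annotations if a.hierarchy[:len(prefix)] == prefix]


# Made-up sample items for illustration only.
annotations = [
    Annotation("photo-001.jpg", "Vienna", "alice", datetime(2006, 12, 29),
               "photo", ("media", "photos", "2006"), {"conference", "slides"}),
    Annotation("minutes.txt", "Lahore", "bob", datetime(2006, 12, 30),
               "document", ("work", "meetings"), {"conference"}),
]

print(with_tag(annotations, "conference"))
print(under(annotations, ["media"]))
```

Even this small amount of structure supports useful retrieval by tag, place, time, or position in a hierarchy, in line with the "little semantics goes a long way" bullet.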

Photo Annotation

Photo Annotation (Region Semantics)

Visualization – Lifetime Trends
