Department of Computer Science and Engineering

A Framework for Community-based, Distributed and Semantically Annotated Courseware Development, Sharing and Quality Management for Higher Technical Education over a Publish/Subscribe P2P Overlay • Department of Computer Science and Engineering, Motilal Nehru National Institute of Technology Allahabad • Applied Artificial Intelligence Group, Centre for Development of Advanced Computing, Pune

Higher Technical Education: Observations • Engineering Institutions : 2,500 approx • Annual output: 400,000 approx • Computer Science graduates : 300,000 approx • Growth rate: 20% expected (NASSCOM) • Employable Output: 25% only (McKinsey Global)

Higher Technical Education: Observations • M.Tech. Output: 20,000 • Ph.D. Output • Engineering: less than 1000 • Basic Sciences: around 5,000.

Higher Technical Education: Observations Number of researchers (2007-08) • India: about 154,800 • China: 1,423,000 • US: 1,571,000

Higher Technical Education: Observations Needs: Order-of-magnitude growth in quantity and quality • Rapid and large-scale growth of • Student enrollment • Institutes/universities • Research scholars • Total quality management of • Outputs: Publications, Patents, Personnel • Resources: Courseware, Training material, Labs and Evaluation • Services

Impact of Internet • Highly scalable, anywhere/anytime access • Very large volume of: • Courseware • Research papers • Training materials • No positive impact on quality of education. • Points to a disconnect between needs and availability

Possible Reasons for Disconnects • Resources are targeted at specific groups • May not be suitable for academically, linguistically and culturally different groups of users • Disproportionately large effort required to search • Lack of semantic annotation • Lack of quality assessment and indicators

Learning Methodologies • Traditional classroom teaching with/without ICT • Face-to-face interaction with teacher and peers • Valuable learning experience • Peer interaction dominant • E-learning: • Unsupervised: No interaction, learners work in isolation • Supervised: Limited interaction • Static resources: • Very limited support for evolving heterogeneous needs of learners.

E-Learning Infrastructure • Content Delivery : Client/Server Mode • Dedicated Servers in LAN Environment • Through Portals on WWW • Communication Paradigm • Request/Reply • Synchronous • Coupled • Scalability: Limited • Fault Tolerance: Limited

Latent Knowledge Resources • Every institution has a large number of hosts. • Each host contains valuable knowledge resources. • Latent: search engines cannot index them • Reasons: • Hosts do not have public IP addresses • Hosts are not servers • Hosts are hidden behind Proxy/NAT

Sharing Latent Knowledge Resources • Interest based cooperative sharing is desirable • Difficulties: • Heterogeneity of interest • Dynamic interest evolution • Rendezvous of availability and interest • Hosts are widely distributed

Sharing Latent Knowledge Resources • Visibility of interests and contents • Resource owners declare availability • Interested users submit their interests • Dynamic evolution of interest-based communities

Our Vision • Decentralized and autonomous middleware • Highly Scalable • Fault-tolerant • Minimal management and maintenance overhead • Support dynamic evolution of interest based communities for • Collaborative generation of: • Content • Meta-data • Domain ontology • Seamless sharing of resources • Peer interaction

Our Vision • Semantic searching based on • Meta-data • Domain ontology • Quality assessment of resources by community • Behavioral Mining

Challenges • Heterogeneity • Users: Interest and content • Host: uptime, memory, CPU, bandwidth • Scalability and interoperability • Hosts without Public IP • Management of dynamics • content, user group and their behaviors • Absence of domain ontology and meta-data

Requirements • Communication paradigm to support scalability • Decoupling: Time, space and synchronization • Anonymity • Network Infrastructure to support • Peer-to-peer interaction • Dynamic evolution of interest based communities • Interoperability • Seamless dynamic leaving and joining of nodes

Decoupling : • Between providers and consumers • Increase scalability • No dependencies • No coordination & synchronization. • Create highly dynamic, decentralized systems

Dimensions Of Decoupling: • Three dimensions • Space - No need to hold references or even know each other • Time - No need to be available at the same time • Synchronization (flow) - Control flow is not blocked by the interaction
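
The three dimensions can be illustrated with a toy in-memory event service. This is only a sketch under stated assumptions: the `EventService` class, its per-subscriber mailboxes, and the identifiers `learner-1` and `compilers` are hypothetical and are not part of the proposed framework.

```python
from collections import defaultdict, deque


class EventService:
    """Toy in-memory event service; all names here are illustrative."""

    def __init__(self):
        self.subscribers = defaultdict(set)   # topic -> subscriber ids
        self.mailboxes = defaultdict(deque)   # subscriber id -> pending events

    def subscribe(self, subscriber_id, topic):
        self.subscribers[topic].add(subscriber_id)

    def publish(self, topic, payload):
        # Space decoupling: the publisher names a topic, never a subscriber.
        # Synchronization decoupling: publish only enqueues and returns.
        for subscriber_id in self.subscribers[topic]:
            self.mailboxes[subscriber_id].append((topic, payload))

    def fetch(self, subscriber_id):
        # Time decoupling: events wait here until the subscriber comes online.
        pending = list(self.mailboxes[subscriber_id])
        self.mailboxes[subscriber_id].clear()
        return pending


service = EventService()
service.subscribe("learner-1", "compilers")
service.publish("compilers", "lecture-notes-week-1")  # learner-1 is offline
received = service.fetch("learner-1")                 # delivered later
```

The publisher never holds a reference to `learner-1`, and the learner collects the event only when it next calls `fetch`.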

Publish/Subscribe • Paradigm for scalable distributed applications • Provides • Decoupling • Anonymity • Asynchrony

Publish/Subscribe: High Level View

Publish/subscribe: Subscription Model • Topic (subject) -based • Content-based • Type based
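
A minimal sketch of how the three subscription models differ at matching time. The `Event` and `LectureEvent` classes, the attribute names, and the three matching functions are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Event:
    topic: str
    attributes: dict


@dataclass
class LectureEvent(Event):
    """Hypothetical event subtype used for type-based matching."""


def topic_match(subscribed_topic, event):
    # Topic-based: compare the event's topic label only.
    return event.topic == subscribed_topic


def content_match(constraints, event):
    # Content-based: constraints maps an attribute name to a predicate
    # over that attribute's value; every constraint must hold.
    return all(
        name in event.attributes and predicate(event.attributes[name])
        for name, predicate in constraints.items()
    )


def type_match(event_type, event):
    # Type-based: match on the event's class, including subclasses.
    return isinstance(event, event_type)


event = LectureEvent("operating-systems", {"level": "undergraduate", "pages": 12})
```

Topic-based matching is the cheapest of the three, while content-based matching lets a learner filter on any annotated attribute at the cost of evaluating predicates per event.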

Implementation of Event Service • Centralized Implementation • Event matching is easy • No Scalability • No fault Tolerance • Distributed Implementation • Set of nodes designated as Brokers • Improved Scalability and fault tolerance • Routing and matching of events is difficult

Implementation of Event Service • Role based Implementation • Every node can take any role based on context • Broker • Publisher • Subscriber • Highly scalable and fault tolerant

Role based Implementation: Challenges • Management of scalability and fault-tolerance • Application Layer Overlay Hierarchy • Informed/Un-informed leaving • Routing of Publications and subscriptions • Location of rendezvous • Life span

Role based Implementation: Challenges • Role assignment • Designated (fixed role) • Dynamic • Matching • Content based • Type based • Notification • Service guarantee (at least once, at most once, etc.)

Current Network Infrastructure

Current Network Infrastructure Within Institute/Organization: • Nodes are assigned Private IPs • Grouped in IP based subnets • Physically connected with each other through layer-2 and layer-3 switches. • Not visible to outside world • Connect to outside world through NAT/Proxy

Our Network Architecture Within LAN of Institute/Organization • Nodes having the same interest: • Are not aware of each other • May be physically distant • Some virtualization is required • Formation of interest-based virtual rings • Ring links are formed using virtual (e.g., TCP) connections • The virtual ring is termed an overlay.
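
Ring formation inside one LAN could look roughly like the sketch below. Ordering ring members by node identifier is an assumption made here for simplicity; the function name `build_interest_rings` and the sample addresses are hypothetical.

```python
from collections import defaultdict


def build_interest_rings(node_interests):
    """Return {interest: {node: successor}} for each interest group."""
    members = defaultdict(list)
    for node, interests in node_interests.items():
        for interest in interests:
            members[interest].append(node)

    rings = {}
    for interest, nodes in members.items():
        ordered = sorted(nodes)
        # Each node links to the next one in sorted order; the last wraps
        # around to the first, closing the ring.
        rings[interest] = {
            node: ordered[(i + 1) % len(ordered)]
            for i, node in enumerate(ordered)
        }
    return rings


rings = build_interest_rings({
    "10.0.1.5": {"databases"},
    "10.0.2.9": {"databases", "networks"},
    "10.0.7.3": {"databases"},
})
```

A node with several interests, such as `10.0.2.9` above, simply appears in several rings; a ring with one member points to itself until another node joins.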

Our Network Architecture Within LAN of Institute/Organization

Our Network Architecture Node visibility • Nodes hidden behind Proxy/NAT • Virtual rings of same interest may be behind different proxy/NAT • Isolated rings • Resource sharing not possible: Invisibility • Have to come under one umbrella

Our Network Architecture • Virtual Ring of Proxies too. • This makes it a 2-tier Overlay
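
One way to picture the second tier: a proxy walks the proxy ring once to discover which other sites host a ring for a given interest. This is a sketch under assumptions, not the project's protocol; `find_remote_rings`, the proxy names, and the per-proxy interest sets are hypothetical.

```python
def find_remote_rings(proxy_ring, local_interests, start_proxy, interest):
    """Walk the proxy ring once and list the proxies hosting `interest`."""
    hosts = []
    current = start_proxy
    while True:
        if interest in local_interests[current]:
            hosts.append(current)
        current = proxy_ring[current]      # follow the successor link
        if current == start_proxy:         # back at the start: full circle
            return hosts


# Tier 2: ring of proxies (proxy -> successor).
proxy_ring = {"proxy-a": "proxy-b", "proxy-b": "proxy-c", "proxy-c": "proxy-a"}
# Tier 1: interests for which each proxy has a local ring behind it.
local_interests = {
    "proxy-a": {"databases"},
    "proxy-b": {"networks"},
    "proxy-c": {"databases", "compilers"},
}
database_sites = find_remote_rings(proxy_ring, local_interests, "proxy-a", "databases")
```

A full walk costs one message per proxy, which is acceptable only while the number of proxies stays small; a DHT lookup over the proxy tier would avoid the linear walk.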

Our Network Architecture Dynamic Community Evolution • Abstraction over the 2-tier overlay • Isolated rings form communities • Virtual Interest based proximity: Physically nodes may be far apart

Our Overall Network Architecture

Pub/Sub on our Network Architecture: • Every node acts as: • Publisher, Subscriber, Broker • Rendezvous-point-based matching • Distributed Hash Table (DHT) • Nodes: • Majority are short-lived and have minimal capabilities • A small percentage: • Remain up for long periods • Have relatively better storage, bandwidth and memory • Are termed super nodes.
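
Rendezvous-point matching over a DHT can be sketched with consistent hashing: a publication and a subscription on the same topic hash to the same node, which then performs the matching. The use of SHA-1, the 32-bit identifier space, and the names `ring_position` and `RendezvousRing` are assumptions for illustration.

```python
import hashlib
from bisect import bisect_left


def ring_position(key, bits=32):
    """Map a string key to a position on a 2**bits identifier ring."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


class RendezvousRing:
    def __init__(self, node_ids):
        self.points = sorted((ring_position(n), n) for n in node_ids)

    def rendezvous_for(self, topic):
        # First node at or after the topic's position, wrapping around.
        positions = [p for p, _ in self.points]
        index = bisect_left(positions, ring_position(topic)) % len(self.points)
        return self.points[index][1]


nodes = ["node-a", "node-b", "node-c", "node-d"]
ring = RendezvousRing(nodes)
rendezvous = ring.rendezvous_for("compilers")
```

Because the mapping depends only on the hash, publishers and subscribers agree on the rendezvous node without knowing each other, and a node leaving the ring moves only the topics it was responsible for.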

Super Nodes • Candidate Super Nodes: • May get elected dynamically • Proxy Nodes • GARUDA nodes/ NKN nodes • May act as Brokers for • Popular content (temporal locality) • Hot contents are automatically cached
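
Dynamic election of super nodes might rank candidates by observed uptime and capacity. The sketch below is an assumption, not the project's election rule: the `NodeStats` fields, the lexicographic score, and the default 10% fraction are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeStats:
    node_id: str
    uptime_hours: float
    bandwidth_mbps: float
    storage_gb: float


def elect_super_nodes(nodes, fraction=0.1):
    """Pick the top `fraction` of nodes (at least one) by a capacity score."""
    def score(n):
        # Assumed weighting: uptime dominates, capacity breaks ties.
        return (n.uptime_hours, n.bandwidth_mbps, n.storage_gb)

    count = max(1, int(len(nodes) * fraction))
    ranked = sorted(nodes, key=score, reverse=True)
    return [n.node_id for n in ranked[:count]]


candidates = [
    NodeStats("lab-pc", uptime_hours=2, bandwidth_mbps=10, storage_gb=50),
    NodeStats("proxy-1", uptime_hours=700, bandwidth_mbps=100, storage_gb=500),
    NodeStats("hostel-laptop", uptime_hours=5, bandwidth_mbps=5, storage_gb=20),
]
super_nodes = elect_super_nodes(candidates)
```

With these sample figures the long-lived proxy is elected, which is consistent with proxies being listed as natural candidates.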

Finding Content • Push/Pull Model • Subscription Instead of Searching • Learner need not make search effort • Learner subscribes for content • System provides matching Publication

Finding Content • Semantic Support • Publication with/without meta-data • Subscription with/without meta-data • Knowledge Resources enriched with meta-data • Use of domain specific ontology
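
A domain ontology lets a subscription on a broad concept match publications annotated with narrower ones. The sketch below assumes the ontology is a simple child-to-parent map; the concept names and the functions `ancestors` and `semantic_match` are hypothetical.

```python
def ancestors(concept, parent_of):
    """Yield `concept` and every broader concept above it."""
    seen = set()
    while concept is not None and concept not in seen:
        seen.add(concept)
        yield concept
        concept = parent_of.get(concept)


def semantic_match(subscribed_concept, publication_concepts, parent_of):
    # A publication matches if any of its annotations is the subscribed
    # concept itself or a narrower concept beneath it.
    return any(
        subscribed_concept in ancestors(c, parent_of)
        for c in publication_concepts
    )


# Hypothetical fragment of a domain ontology (narrower -> broader).
ontology = {
    "quicksort": "sorting",
    "sorting": "algorithms",
    "b-tree": "data-structures",
}
matched = semantic_match("algorithms", ["quicksort"], ontology)
```

A learner subscribed to "algorithms" therefore receives a resource tagged only "quicksort", while a subscription to "quicksort" does not receive every resource tagged "sorting".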

Meta-data • Meta-data can be created in a distributed manner by: • Content creator • A designated meta-data expert from the community • Automatic or semi-automatic tools • Meta-data is published/subscribed, stored and retrieved like any other knowledge resource.

Ontology • Distributed ontology creation by • Experts from the community • The ontology is published/subscribed, stored and retrieved like any other knowledge resource.

Our Universal Client • Every node will run a generic client application • Universal client provides an interface for: • Joining, Leaving: virtual ring maintenance • Fault tolerance: replication, caching • Publishing, Subscribing content • Event Brokering • Meta-data creation • Ontology creation • Behavior mining and Quality assessment

Our Software Architecture

Layer 1: Distributed and Federated Database It Contains: • Meta-data base • Ontology base • Knowledge Resource base • Access log • Base for user profiles

Layer 1: Distributed and Federated Database It also contains: • Publication base • Subscription base • Base for event brokering

Layer 2: Publish/Subscribe, Overlay Layer It has three sub-layers: • Sub-layer 1 : Overlay sub-layer • Sub-layer 2 : Community Management sub-layer • Sub-layer 3 : Publish/Subscribe sub-layer

Layer 3: Service Layer Provides Services for • Distributed Ontology Creation • Metadata Harvesting • Inference Engine • Multilingual Subscription/Publication Support

An Example Demonstration • Layer 3 of our Software Architecture • Presentation by C-DAC

Design Challenges and Trade-offs • Overlay Architecture: Structured/Unstructured/Hybrid • Unstructured • Stateless, minimum maintenance cost • Flooding instead of routing, bandwidth wastage • Structured • Stateful, maintenance required • No flooding, saves bandwidth
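
The bandwidth side of this trade-off can be made concrete by counting messages. The sketch assumes duplicate-suppressed flooding for the unstructured case and Chord-style finger links on a fully populated ring for the structured case; both models and the function names are illustrative assumptions.

```python
from collections import deque


def flooding_messages(neighbors, origin):
    """Count messages when every node forwards a query once to all neighbors
    except the one it first heard it from."""
    seen = {origin}
    queue = deque([(origin, None)])
    messages = 0
    while queue:
        node, sender = queue.popleft()
        for peer in neighbors[node]:
            if peer == sender:
                continue
            messages += 1
            if peer not in seen:
                seen.add(peer)
                queue.append((peer, node))
    return messages


def structured_hops(source, target, bits):
    """Greedy routing on a full ring of 2**bits nodes where node i keeps
    links to i + 2**j (a Chord-style finger table, assumed here)."""
    size = 1 << bits
    hops = 0
    current = source
    while current != target:
        distance = (target - current) % size
        step = 1 << (distance.bit_length() - 1)  # largest finger not overshooting
        current = (current + step) % size
        hops += 1
    return hops


# Eight nodes; unstructured links to both ring neighbors and the opposite node.
neighbors = {i: [(i - 1) % 8, (i + 1) % 8, (i + 4) % 8] for i in range(8)}
flood_cost = flooding_messages(neighbors, origin=0)
route_cost = structured_hops(source=0, target=7, bits=3)
```

On this eight-node example flooding sends a message over almost every link in both directions, while structured routing needs at most three hops, at the price of each node maintaining its finger links.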

Design Challenges and Trade-offs • Implementation of event service • Purely Distributed • Every node can be broker • High scalability • Higher cost of event management, routing and matching • Partially Distributed • Only Proxies as brokers • Scalability is reduced • Lower cost of event management, routing and matching

Simulation • To evaluate design alternatives: • Role: • Assignment vs. acquisition • Static vs. dynamic • Utilization of skew in subscriptions • Replication of hot content • Service guarantee • Life span of knowledge resources • Informed and uninformed leaving
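
A first simulation of subscription skew and hot-content replication might look like the sketch below. The Zipf-like popularity model (weight 1/rank), the seed, and the function names are assumptions; the slides do not specify a workload model.

```python
import random
from collections import Counter


def simulate_subscriptions(num_topics, num_subscriptions, seed=0):
    """Draw topic ids with Zipf-like popularity (weight 1/rank)."""
    rng = random.Random(seed)
    topics = list(range(num_topics))
    weights = [1.0 / (rank + 1) for rank in topics]
    return rng.choices(topics, weights=weights, k=num_subscriptions)


def replicated_share(subscriptions, replicas):
    """Fraction of subscriptions that hit the `replicas` most popular topics."""
    counts = Counter(subscriptions)
    hot = counts.most_common(replicas)
    return sum(count for _, count in hot) / len(subscriptions)


subscriptions = simulate_subscriptions(num_topics=100, num_subscriptions=10_000)
share_top_10 = replicated_share(subscriptions, replicas=10)
```

Under this assumed workload, replicating a small set of hot topics on super nodes serves a disproportionately large share of subscriptions, which is the effect the simulation is meant to quantify for real traces.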

Strengths: MNNIT • Implicit Invocation Systems and Semantic Web • Group of faculty members and research scholars (PhD, MTech) engaged in: • Large-scale Publish/Subscribe for dynamic topologies • Automatic meta-data extraction and generation. • Networking and Distributed Computing • Group of faculty members and research scholars (PhD, MTech) engaged in: • Peer-to-Peer computing • Cloud Computing
