580 likes | 716 Vues
Knowledge out there on the Web. Serge Abiteboul. Knowledge out there on the Web: Video of the talk at the Royal S ociety. Personal knowledge o ut there. Knowledge out there on the Web. Organization. The context The personal information management system
E N D
Knowledge out there on the Web Serge Abiteboul Abiteboul - EDBT keynote, Athenes
Knowledge out there on the Web: Video of the talk at the Royal Society Personal knowledge out there Knowledge out there on the Web Abiteboul - EDBT keynote, Athenes
Organization • The context • The personal information management system • The concept of Pims • Pims are coming • Advantages • From information to knowledge • The Webdamlog language • The language in brief • Probabilities • Access control • Conclusion: some research issues Abiteboul - EDBT keynote, Athenes
1. The context Abiteboul - EDBT keynote, Athenes
Data explosion • data: pictures, music, movies, reports, email, tweets, contacts, schedules… • social interactions: opinions, annotations, recommendation… • metadata: on photos, documents, music… • ontologies: Alice’s ontology and mapping with other ontologies • web localizations: friends account on FB, twitter, lists of blogs… • security: credentials on various systems • data in various organizations • jobs, schools, insurances, banks, taxes, medical, retirement… • data in various vendors • amazon, retailers, netflix, applestore… • data that software or hardware sensors capture • with or without our knowledge • web navigation, phone use, geolocation, "quantified self" measurements, contactless card readings, surveillance camera pictures, • … Abiteboul - EDBT keynote, Athenes
Data dispersion • Laptop, desktop, smartphone, tablet, car computer • Residential boxes (tvbox), NAS, electronic vaults… • Mail, address book, agenda, todo-lists • Facebook, LinkedIn, Picasa, YouTube, Tweeter • Svn, Google docs, Dropbox • Government services • Business services • Also machine and systems from • family, friends, associations, work • Systems even unknown to the user • third party cookies Abiteboul - EDBT keynote, Athenes
Data heterogeneity Type: text, relational, HTML, XML, pdf… Terminology/structure/ontology Systems: MS, Linux, IOS, Android Distribution Security protocols Quality: incomplete / inconsistent information Abiteboul - EDBT keynote, Athenes
Bad news • Limited functionalities because of the silos • Difficult to do global search, synchronization, task sequencing over distinct systems… • Loss of control over the data • Difficult to control privacy • Leaks of private information • Loss of freedom • Vendor lock-in Abiteboul - EDBT keynote, Athenes
Growing resentment • Against companies • Intrusive marketing, cryptic personalization and business decisions (e.g., on pricing), and automated customer service with no real channel for customers' voices • Creepy "big data" inferences • Against governments • NSA and its European counterparts • Dissymmetry between what these systems know about a person, and what the person actually knows Abiteboul - EDBT keynote, Athenes
Future alternatives (for normal people) • Continue with this increasing mess • Use a shrink to overcome frustration • Regroup all your data on the same platform • Google, Apple, Facebook, …, a new comer • Use a shrink to overcome resentment • Study 2 years to become a geek • Geeks know how to manage their information • Use a shrink to survive the experience • And, of course, there is the Pims’ way Abiteboul - EDBT keynote, Athenes
2. The personal information management system 2.1 Introduction Abiteboul - EDBT keynote, Athenes
The Pims • Personal information management system • What is a successful Web service today • Some great software • Some machines on which it runs (and a business model) • Separate the two facets • Some company provides the software • It runs on your machine with another business model Abiteboul - EDBT keynote, Athenes
The Pims (1) • The Pims runs software • The user chooses the code to deploy on the server. • The software is open source, a requirement for security. • With the user's data • All the user’s personal information • 0n the user’s server(s) • The user owns it or pays for a hosted server • The server may be a physical or a virtual machine • It may be physically located at the user’s home (e.g., a tvbox) or not • It may run on a single machine or be distributed among several machines • The server is in the cloud, i.e., it can be reached from everywhere - personal cloud Abiteboul - EDBT keynote, Athenes
The Pims: the 2 main issues • Security • Enforced by the Pims: guaranteed by the contract the user has with the Pims • Reasonably small piece of code; possible to verify it • Enforced by the services running on it: open source so that we don’t need to trust the providers of these systems • A higher level of security than now • The management • Should be epsilon-work • Should require little competence • A company can be paid to do it (in the cloud) Abiteboul - EDBT keynote, Athenes
2. The personal information management system 2.2 This is arriving Abiteboul - EDBT keynote, Athenes
It is becoming possible • System administration is easier • Abstraction technologies for servers • Virtualization and configuration management tools. • Open source is very active • Open source technology more and more available • Price of machines is going down • A hosted-low cost server is as cheap as 5€/month • Paying is no longer a barrier for a majority of people Indeed I am sure you have friends already doing it Abiteboul - EDBT keynote, Athenes
Many people are working on it • Many systems & projects • Lifestreams, Stuff-I’ve-Seen, Haystack, MyLifeBits, Connections, Seetrieve, Personal Dataspaces, or deskWeb. • YounoHost, Amahi, ArkOS, OwnCloud or Cozy Cloud • Some on particular aspects • Mailpile for mail • Lima for a Dropbox-like service, but at home. • Personal NAS (network-connected storage) e.g. Synologie • Personal data store SAMI of Samsung... • Many more Abiteboul - EDBT keynote, Athenes
Data disclosure movement • Smart Disclosure in the US • MiDatain the UK • MesInfosin France Several large companies (network operators, banks, retailers, insurers…) have agreed to share with a panel of customers the personal data that they have about them Abiteboul - EDBT keynote, Athenes
Big companies are interested (1) Pre-digital companies • E.g., hotels or banks • Disintermediated from their customers by pure Internet players such as Google, Amazon, Booking.com, Mint. • In Pims, they can rebuild direct interaction • The playing field is neutral • Unlike on the Internet where they have less data • They can offer new services without compromising privacy Abiteboul - EDBT keynote, Athenes
Big companies are interested (2) Home appliances companies • Many boxes deployed at home or in datacenters • Internet access provider "boxes”, NAS servers, "smart" meters provided by energy vendors, home automation systems, "digital lockers”… • Personal data spaces dedicated to specific usage • Could evolve to become more generic • Control of private Internet of objects Abiteboul - EDBT keynote, Athenes
2. The personal information management system 2.3 Advantages Abiteboul - EDBT keynote, Athenes
Advantages • User control over their data • Who has access to what, under what rules, to do what • User empowerment • They choose freely services & they can leave a service • Participation to a more “neutral” Web • With the "network effects", the main platforms are accumulating data/customers and distorting competition • The Pims bring back fairness on the Web • Good practices are encouraged, e.g., interoperability, portability Abiteboul - EDBT keynote, Athenes
Advantages – New functionalities • Single identity/login • Semantic global search with (personal) ontology • Synchronization/backups across services • Access control management across services • Task sequencing across services • Exchange of information between “friends” • Connected objects control, a hub for the IoT • Personal big data analysis Abiteboul - EDBT keynote, Athenes
3. From information to knowledge (aka let’s move a tad more technical) Abiteboul - EDBT keynote, Athenes
Machines prefer knowledge • Integration of data & information sources • It is easier to integrate knowledge than information • Collaboration between services& devices • It is easier for services to collaborate using knowledge than with information • Problem solving based on knowledge inference Abiteboul - EDBT keynote, Athenes
Humans as well • The users of the system are human beings • They want support for managing information • But they are not geeks • They don’t want to program • To facilitate the interactions between humans and machines, We should use declarative languages ! Abiteboul - EDBT keynote, Athenes
It all started with datalog • Popular in the 90’s • Some followers in 00’s • A., Afrati, Atzeni, Cali, Greco, Gotloeb, Milo, Sacca, Ullman… • Recent revival • 2010 Oege de Moor’s workshop @oxford • Datalog 2.0 • 2010 Joe Hellerstein’s keynote @pods • Datalog Redux: Experience and Conjecture • 2014 Frank Neven’s keynote @icdt • Remaining CALM in declarative networking • Now featuring: Webdamlog Abiteboul - EDBT keynote, Athenes
Requirement 1: Distribution • Different machines • Different users • We use the notion of principal here • family@alice(Bob) • agenda@Alice-iPhone(…) • friends@Alice-FaceBook(…) • A principal comes with identity and privileges Abiteboul - EDBT keynote, Athenes
Requirement 2: Privacy • Control of who sees what in a distributed environment • Access control • Should be clear from the first part of the talk this is a most important issue Tutorial on privacy by Nicolas Anciaux, Benjamin Nguyen, IulianSanduPopa • Today at 2:00 Abiteboul - EDBT keynote, Athenes
The more I see, the less I know for sure. John Lennon Requirement 3: Probabilities • We have to deal with negation • Elvis was not French • With negations, come contradictions • Elvis Presley died in 1977; The King is alive • There are different points of view • Elvis’s music is the best; it stinks • Measure uncertainty with probabilities Abiteboul - EDBT keynote, Athenes
So, what is the goal • A datalog-style language with distribution access control probabilities We are lucky, there is such a language: Webdamlog Abiteboul - EDBT keynote, Athenes
4. The Webdamlog language (aka let’s be serious) 4.1 Webdamlog in brief Abiteboul - EDBT keynote, Athenes
Facts and rules Facts are of the form R@p(a1,…,an) • p is a principal, i.e., Serge, Serge’s-iPhone, Facebook/Serge, s.ab@gmail.com Rules are of the form $R@$P($U) :- $R1@$P1($U1), ..., $Rn@$Pn($Un) • $R, $Ri are relationterms • $P, $Pi are peer terms • $U, $Ui are tuples of terms • Safety condition • Also negations: ignored here Abiteboul - EDBT keynote, Athenes
The semantics of rules Classification based on locality and nature of head predicates (intentional or extensional) • Local rule at my-laptop: all predicates in the body of the rules are from my-laptop Local with local intentional head datalog Local with local extensional head database update Local with non-local extensional head messaging between peers Local with non-local intentional head view definition Non-local general delegation Abiteboul - EDBT keynote, Athenes
Local rules with localhead Intensional local head – datalog [at my-iphone] fof@my-iphone($x, $y) :- friend@my-iphone($x,$y) fof@my-iphone($x,$y) :- friend@my-iphone($x,$z), fof@my-iphone($z,$y) Extensional local head – database updates [at my-iphone] believe@my-iphone(“Alice”, $loc) :- tell@my-iphone($p,”Alice”, $loc), friend@my-iphone($p) Abiteboul - EDBT keynote, Athenes
Local rules & non-local extensional head Messaging between peers $message@$peer($name, “Happy birthday!”) :- today@my-iphone($date), birthday@my-iphone($name, $message, $peer, $date) Example • today@my-iphone(3/25) • birthday@my-iphone("Manon”, “sendmail”, “gmail.com”, 3/25) • sendmail@gmail.com("Manon”, “Happy birthday”) Abiteboul - EDBT keynote, Athenes
Local rules & non-local intentional head View definition boyMeetsGirl@gossip-site($girl, $boy) :- girls@my-iphone($girl, $event), boys@my-iphone($boy, $event) • Semantics of boyMeetGirl@gossip-site is a join of relations girls and boys from my-iphone • Defines a view at some other peer Abiteboul - EDBT keynote, Athenes
Non-local rules General delegation (at my-iphone): boyMeetsGirl@gossip-site($girl, $boy) :- girls@my-iphone($girl, $event), boys@alice-iphone($boy, $event) Example: girls@my-iphone(“Alice”, “Julia's birthday”) • my-iphone installs the following rule at alice-iphone boyMeetsGirl@gossip-site(“Alice”, $boy) :- boys@alice-iphone($boy, “Julia's birthday”) Useful to distribute work and exchange knowledge Abiteboul - EDBT keynote, Athenes
The thesis The Web should turn into a distributed knowledge base where peers share facts and rules, and collaborate The language Webdamlog is a first step towards that goal Missing • Probabilities • Access control Abiteboul - EDBT keynote, Athenes
4. The Webdamlog language 4.2 Probabilities Abiteboul - EDBT keynote, Athenes
Advertisement Deduction with Contradictions in Datalog S.A., Daniel Deutch and Victor Vianu • Tomorrow, 11:00 Abiteboul - EDBT keynote, Athenes
4. The Webdamlog language 4.3 Access control Abiteboul - EDBT keynote, Athenes
Requirements Data access Users would like to control who can read and modify their information Data dissemination Users would like to control how their data are transferred from one participant to another Application control Users would like to control which applications can run on their behalf, and what information these applications can access. Abiteboul - EDBT keynote, Athenes
The general picture • Coarse grain for extensional relations • read access to the relation • Fine grain for intensional relations • read access to tuple t requires read access to the tuples that lead to deriving t • Delegation controlled in a sandbox • Focus on read privilege here Abiteboul - EDBT keynote, Athenes
Read: default • Extensional relations • if you have read privilege to the relation • Intensional relations • if you have read privilege to the relation & • if you can read all the tuples that have been used to create this fact – provenance of the fact Abiteboul - EDBT keynote, Athenes
Fine grain access control [at Bob] album@Alice($p,$f) :- photo@Bob($p,$f) [at Sue] album@Alice($p,$f) :- photo@Sue($p,$f) • album@Alice is intensional • Both Bob and Sue contribute to it • Peter who has read privilege to album@Alice and photo@Bob only does not see the photos of Sue Abiteboul - EDBT keynote, Athenes
Paranoiac access control [at Bob] album@Alice($p,$f) :- photo@Bob($p,$f), friends@Bob($f) • Issue: you can read Bob’s photos only if you have read privilege on friends@Bob that Bob wants to keep private Abiteboul - EDBT keynote, Athenes
Declassification [at Bob] photo@Alice($p,$f) :- photo@Bob($p,$f), [ hide friends@Bob($f) ] • Hide: blocks the provenance from friends@Bob • Bob declassify this data just for the evaluation of this rule • You can declassify only tuples you own ↦ grant privilege Abiteboul - EDBT keynote, Athenes
Issues with non local rules [at Bob] message@Sue(“I hate you”) :- date@Alice(d) aliceSecret@Bob(x) :-date@Alice(d), secret@Alice(x) Ignoring access rights, by delegation, this results in running [at Alice] message@Sue(“I hate you”) :- date@Alice(d) aliceSecret@Bob(x) :- date@Alice(d), secret@Alice(x) Abiteboul - EDBT keynote, Athenes
Default solution: sand box We run the rule at Alice in a Sandbox • We use the access rights of Bob So the second rule does not succeed in sending secrets • The message specifies that this is done at Bob’s request So requires authentication/signatures Alternative: delegation without sandbox Abiteboul - EDBT keynote, Athenes