LIS901N: URI Thomas Krichel 2003-01-??
URIs (background) • URI: “uniform resource identifier” • Originally, a generalization of: • URL (uniform resource locator), • URN (uniform resource name), • URC (uniform resource citation), • and potentially others, • but mainly, URL and URN
The difference (in theory) between URL and URN: • a URL is bound to a location • when the resource moves, the URL changes • a URN is a name • thus location-independent and, in theory, persistent (whatever “persistent” means)
The Other View • Distinction between URL and URN is artificial • Both terms should be abolished and replaced by “URI” • thus all identifier “schemes” would be URI schemes (even “http”) and no prefix would be necessary (URL, URN, or even URI).
Reasoning • Original URI philosophy: • URLs were a short-term solution and URNs a long-term one. • URLs would be a temporary identification mechanism until a location-independent, persistent identifier, the URN, was developed. • Now it seems: • URNs won’t be any more persistent than URLs. • Persistence is a social problem, not a technical problem.
URI vs URL • The term ‘URL’ (“Uniform Resource Locator”) is no longer used in standards. Informally, it denotes a URI that contains a domain name, but the term is historical only. • This presentation uses the term URI exclusively. • The term ‘URL’ is still adequate to convey the meaning, but it should not be used when precision is necessary.
What does a URI identify? • A URI identifies a Resource. • A URI only comes into existence when it is bound to a Resource. • A Resource is defined as anything that is identified by a URI. • Resources only come into existence when a URI is bound to them. • A URI cannot exist without a Resource. • A Resource cannot exist without a URI.
it all comes from Plato • The “URI identifies an abstract Resource” formalism assumes the Platonic concept of “form”. • A Resource, once bound to a URI and brought into existence, is only the abstract ‘essence’ of the ‘real world’ thing we perceive. • Any physical or digital version of that Resource is only one of many possible representations of that Resource. • For example, http://openlib.org/home/krichel is a URI for a homepage. Using language and content negotiation, it is possible to request that page in many languages and formats. Which version is the Resource? • Answer: none of them. Each is only a representation. It is even possible to assign URIs to the representations themselves. Even so, each Resource is only the abstraction of the physical or digital thing, not the thing itself.
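To make the representation idea concrete: in HTTP, a client asks for a particular representation of one URI through request headers such as Accept-Language and Accept. The Python sketch below only builds the requests with the standard library's urllib.request.Request and does not send them. Which languages and formats the server actually offers is an assumption, and the helper name representation_request is hypothetical.

```python
from urllib.request import Request

URI = "http://openlib.org/home/krichel"

def representation_request(uri: str, language: str, media_type: str) -> Request:
    """Hypothetical helper: build (but do not send) a GET request for one
    representation of `uri`, negotiated by language and media type."""
    return Request(uri, headers={"Accept-Language": language, "Accept": media_type})

reqs = [
    representation_request(URI, "en", "text/html"),
    representation_request(URI, "de", "text/html"),        # assumed to be offered
    representation_request(URI, "en", "application/pdf"),  # assumed to be offered
]

for r in reqs:
    # The identifier never changes; only the requested representation does.
    # urllib stores header names via str.capitalize(), hence "Accept-language".
    print(r.get_method(), r.full_url, r.get_header("Accept-language"), r.get_header("Accept"))
```

Whatever comes back for each request is one representation; what the three requests share is the URI, i.e. the Resource.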
What is ‘resolution’? • ‘Resolution’ means accessing some representation of the Resource that a URI identifies. • For ‘http://foo.com/’ it means accessing the homepage of ‘foo.com’. • For ‘mailto:krichel@openlib.org’ it can mean sending an email message to that address. • For URIs that contain network location information, resolution is simply a matter of contacting that location and performing some operation there. For example, ‘foo.com’ is the network host that serves the web page.
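A rough sketch of scheme-dependent resolution in Python, using urllib.parse.urlsplit to pull out the parts a resolver would need. The function describe_resolution is hypothetical; it only describes the action and performs no network access or mail sending.

```python
from urllib.parse import urlsplit

def describe_resolution(uri: str) -> str:
    """Hypothetical helper: say what resolving `uri` would involve."""
    parts = urlsplit(uri)
    if parts.scheme in ("http", "https"):
        # The host is named in the URI itself, so we know whom to contact.
        path = parts.path or "/"
        return f"GET {path} from host {parts.hostname}"
    if parts.scheme == "mailto":
        # For mailto, the "path" is the mailbox address.
        return f"send a message to {parts.path}"
    # No network location in the URI: some other mechanism is needed.
    return f"no resolver known for scheme {parts.scheme!r}"

for u in ("http://foo.com/", "mailto:krichel@openlib.org", "urn:isbn:0451450523"):
    print(u, "->", describe_resolution(u))
```

A URN such as urn:isbn:0451450523 falls into the last branch: it carries no host to contact, which is exactly the location independence URNs aim for.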
The history • Tim Berners-Lee came to the IETF in 1992 to develop the World Wide Web standards. At the time, URIs were known as Universal Resource Locators. • RFC 1738, “Uniform Resource Locators (URL)”, was published in 1994. • RFC 1738 was updated by RFC 1808, RFC 2368 and RFC 2396. • RFC 2396, “Uniform Resource Identifiers (URI): Generic Syntax”, is the current standard. • RFC 2396 may be updated to reflect developments in internationalization, terminology, and registration procedures.
Confusion… • Due to misunderstandings, and to the formation of the W3C separately from the IETF, there was a long-term disagreement on certain aspects of URIs, especially Uniform Resource Names (URNs). • A joint IETF/W3C URI Interest Group was formed in 2000 to investigate the work that needed to be done on URIs in general. • That group published “URIs, URLs, and URNs: Clarifications and Recommendations”, a report from the joint W3C/IETF URI Planning Interest Group (draft-mealling-uri-ig-01.txt), which begins to clarify the problems and proposes solutions.
URN • Uniform Resource Names (URNs) are defined by RFC 2141 as a particular URI scheme with these characteristics: • Permanent – Once a URN is assigned to some Resource, it can never be re-assigned to something else. • Location Independent – The URN itself should not contain any network location information such as domain names, IP addresses, file path names, etc.
RFC2396 • Tim Berners-Lee, Roy T. Fielding and Larry Masinter (1998), “Uniform Resource Identifiers (URI): Generic Syntax”, RFC 2396. • A Uniform Resource Identifier (URI) is a compact string of characters for identifying an abstract or physical resource. • URIs provide a simple and extensible means for identifying a resource.
operations on a URI • There is a set of operations that can be applied to URIs. For example, a URL can be used to access the resource it identifies. • To determine whether a given URI instance is valid, we have to study the operations that are applied to URIs.
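Two generic operations that RFC 2396 defines for every URI, independent of scheme, are splitting a URI into its components and resolving a relative reference against a base URI. Python's urllib.parse implements both (following the later RFC 3986 rules, which agree with RFC 2396 on simple cases like these). The query, fragment and relative paths below are made-up examples.

```python
from urllib.parse import urljoin, urlsplit

BASE = "http://openlib.org/home/krichel/"

# Operation 1: split a URI into its generic components.
parts = urlsplit("http://openlib.org/home/krichel?lang=de#top")
print(parts.scheme, parts.netloc, parts.path, parts.query, parts.fragment)

# Operation 2: resolve relative references against a base URI.
print(urljoin(BASE, "papers.html"))                 # a document inside /home/krichel/
print(urljoin(BASE, "../"))                         # one level up
print(urljoin(BASE, "mailto:krichel@openlib.org"))  # already absolute: returned unchanged
```

Neither operation needs to know anything about HTTP or mail; that is the point of a generic syntax.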
benefits of uniformity • It allows different types of resource identifiers to be used in the same context, even when the mechanisms used to access those resources may differ. • It allows uniform semantic interpretation of common syntactic conventions across different types of resource identifiers. • It allows introduction of new types of resource identifiers without interfering with the way that existing identifiers are used. • It allows the identifiers to be reused in many different contexts, thus permitting new applications or protocols to leverage a pre-existing, large, and widely-used set of resource identifiers.
Resources and Identity in the RFC • A resource can be anything that has identity. Not all resources are network “retrievable”. The resource is the conceptual mapping to an entity or set of entities, not necessarily the entity which corresponds to that mapping at any particular instance in time. • An identifier is an object that can act as a reference to something that has identity. In the case of URI, the object is a sequence of characters with a restricted syntax.
URI, URL, & URN in the RFC • A URI can be further classified as a locator, a name, or both. The term “Uniform Resource Locator” (URL) refers to the subset of URI that identify resources via a representation of their primary access mechanism (e.g., their network “location”), rather than identifying the resource by name or by some other attribute(s) of that resource. • The term “Uniform Resource Name” (URN) refers to the subset of URI that are required to remain globally unique and persistent even when the resource ceases to exist or becomes unavailable.
URN in the RFC • A URN differs from a URL in that its primary purpose is persistent labeling of a resource with an identifier. That identifier is drawn from one of a set of defined namespaces, each of which has its own set name structure and assignment procedures. The “urn” scheme has been reserved to establish the requirements for a standardized URN namespace, as defined in “URN Syntax”, RFC 2141, and its related specifications.
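RFC 2141 gives URNs the form urn:<NID>:<NSS>, where the NID names the namespace and the NSS is the namespace-specific string. The Python sketch below checks that shape with a regular expression. It is simplified: it rejects unescaped '/', '?' and '#', which RFC 2141 reserves for future use, and it does not check any namespace's own rules. The function name parse_urn is hypothetical.

```python
import re
from typing import Optional, Tuple

# Simplified RFC 2141 grammar:
#   NID: a letter or digit, then up to 31 letters, digits or hyphens
#   NSS: letters, digits, ( ) + , - . : = @ ; $ _ ! * ' and %hh escapes
_URN_RE = re.compile(
    r"^urn:"
    r"(?P<nid>[A-Za-z0-9][A-Za-z0-9-]{0,31}):"
    r"(?P<nss>(?:[A-Za-z0-9()+,\-.:=@;$_!*']|%[0-9A-Fa-f]{2})+)$",
    re.IGNORECASE,  # the "urn:" prefix and the NID are case-insensitive
)

def parse_urn(s: str) -> Optional[Tuple[str, str]]:
    """Return (nid, nss) if `s` looks like a URN, else None."""
    m = _URN_RE.match(s)
    if m is None:
        return None
    nid = m.group("nid")
    if nid.lower() == "urn":  # RFC 2141 reserves the NID "urn"
        return None
    return nid.lower(), m.group("nss")

print(parse_urn("urn:isbn:0451450523"))
print(parse_urn("urn:ietf:rfc:2141"))
```

Note that nothing in a URN's syntax says where the resource lives; turning a URN into a location requires a separate resolution service.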
transcribability • The URI syntax was designed with global transcribability as one of its main concerns. A URI is a sequence of characters from a very limited set, i.e. the letters of the basic Latin alphabet, digits, and a few special characters. A URI may be represented in a variety of ways, e.g. ink on paper, pixels on a screen, or a sequence of octets in a coded character set.
consequences of transcribability • A URI is a sequence of characters, which is not always represented as a sequence of octets. • A URI may be transcribed from a non-network source, and thus should consist of characters that are most likely to be able to be typed into a computer, within the constraints imposed by keyboards (and related input devices) across languages and locales. • A URI often needs to be remembered by people, and it is easier for people to remember a URI when it consists of meaningful components.
URI characters • A URI consists of a restricted set of characters, not a sequence of octets. The allowable characters were chosen primarily to aid transcribability and usability, both in computer systems and in non-computer communications. Characters conventionally used as delimiters around URIs are excluded. • In the simplest case, the original character sequence contains only characters that are defined in US-ASCII, and the two levels of mapping are simple and easily invertible: each 'original character' is represented as the octet for its US-ASCII code, which is, in turn, represented either as the US-ASCII character or as the '%' escape sequence for that octet.
reserved characters • Many URIs include components consisting of, or delimited by, certain special characters. These characters are called “reserved”, since their usage within the URI component is limited to their reserved purpose. If the data for a URI component would conflict with the reserved purpose, then the conflicting data must be escaped before forming the URI. • They are ; / ? : @ & = + $ , • They are allowed within a URI, but may not be allowed within a particular component of the generic URI syntax.
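As an illustration of escaping data that collides with a reserved purpose, here is a query component whose value contains '&', '=' and '/'. Python's urllib.parse.quote with safe="" escapes every reserved character. Note that quote follows the newer RFC 3986 character classes, so it also escapes ! * ' ( ), which RFC 2396 counts as unreserved. The field name "author" is a made-up example.

```python
from urllib.parse import quote, unquote

value = "Krichel & Lassila = a/b"
# Escape everything except letters, digits and "_.-~", so that '&', '=' and '/'
# in the data cannot be mistaken for delimiters.
escaped = quote(value, safe="")
query = "author=" + escaped    # "author" is a hypothetical field name
print(query)                   # author=Krichel%20%26%20Lassila%20%3D%20a%2Fb

# Unescaping restores the original data exactly.
assert unquote(escaped) == value
```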
unreserved & excluded characters • Unreserved characters are allowed in a URI and never take on any special meaning. They are • the upper- and lowercase letters a to z and A to Z • the decimal digits 0 to 9 • the following: - _ . ! ~ * ' ( ) • All characters that are neither reserved nor unreserved are excluded: • < > # % " { } | \ ^ [ ] ` • as well as the blank (space) and the control characters. Excluded characters have to be escaped.
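The three classes can be written down directly as sets. This small Python sketch transcribes the RFC 2396 lists into a hypothetical classify function; it handles single characters only.

```python
import string

# Character classes as listed in RFC 2396, section 2.
RESERVED = set(";/?:@&=+$,")
UNRESERVED = set(string.ascii_letters + string.digits + "-_.!~*'()")

def classify(ch: str) -> str:
    """Return 'reserved', 'unreserved' or 'excluded' for a single character."""
    if ch in RESERVED:
        return "reserved"
    if ch in UNRESERVED:
        return "unreserved"
    # Everything else (space, control characters, < > # % " { } | \ ^ [ ] `,
    # and any non-ASCII character) must be escaped when it occurs in data.
    # '#' and '%' still have roles: fragment delimiter and escape indicator.
    return "excluded"

for ch in "a~&# {":
    print(repr(ch), classify(ch))
```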
escaping • When you want to use an excluded character in a URI, or a reserved character outside its reserved purpose, you have to escape it. This is done by writing a construction of the form • % hex hex • where hex is a digit or one of the letters a to f (uppercase or lowercase). The two hex characters give the value of the octet that encodes the character; for US-ASCII characters this is the character's ASCII code. For example, %7e is the character ~.
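A hand-written sketch of the % hex hex mechanism in Python. The names escape and unescape are hypothetical helpers. The sketch keeps only RFC 2396 unreserved characters literal (plus any caller-supplied ones) and, as an assumption that RFC 2396 itself does not make, encodes non-ASCII text as UTF-8 before escaping its octets.

```python
import string

UNRESERVED = set(string.ascii_letters + string.digits + "-_.!~*'()")

def escape(text: str, keep: str = "") -> str:
    """Replace every octet that is not unreserved (or listed in `keep`)
    with '%' followed by two hex digits."""
    out = []
    # Assumption: non-ASCII characters are first encoded as UTF-8 octets.
    for octet in text.encode("utf-8"):
        ch = chr(octet)
        if octet < 128 and (ch in UNRESERVED or ch in keep):
            out.append(ch)
        else:
            out.append("%{:02X}".format(octet))
    return "".join(out)

def unescape(text: str) -> str:
    """Turn each '%hh' (hex digits in either case) back into its octet."""
    octets = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "%":
            hh = text[i + 1:i + 3]
            if len(hh) != 2 or not all(c in string.hexdigits for c in hh):
                raise ValueError(f"malformed escape at position {i}")
            octets.append(int(hh, 16))
            i += 3
        else:
            octets.extend(text[i].encode("utf-8"))
            i += 1
    return octets.decode("utf-8")

print(unescape("%7e"))          # ~
print(escape("100% <sure>"))    # 100%25%20%3Csure%3E
```

Escaping '%' itself (as %25) is what keeps the mapping invertible: an unescaped '%' in the output always starts an escape.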
The Semantic Web • The W3C has been developing a new architecture that applies knowledge representation technology to the WWW. • Using the Resource Description Framework (RDF), statements are made using a subject, a predicate and an object (very similar to Lisp and other predicate-based languages). • The subject and predicate of an RDF statement are Resources in the URI sense and are identified by URIs, written in RDF/XML with the help of XML Namespaces; the object is either such a Resource or a literal value.
example • This statement says that the Resource identified by the URI ‘http://openlib.org/home/krichel’ was created by the person ‘Thomas Krichel’:
<?xml version="1.0"?>
<RDF xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <Description about="http://openlib.org/home/krichel">
    <Creator xmlns="http://description.org/schema/">Thomas Krichel</Creator>
  </Description>
</RDF>
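To see how the parts of the statement are tied to URIs, the Python sketch below reads a copy of the example (with the creator given as Thomas Krichel, as the prose states) using the standard library's xml.etree.ElementTree and prints the resulting triple. It understands only this simple pattern (Description elements with an about attribute and literal-valued properties); a real application would use a full RDF parser. The predicate URI is formed by concatenating the namespace URI and the element's local name, which is the RDF/XML convention. The function name triples is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """<?xml version="1.0"?>
<RDF xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <Description about="http://openlib.org/home/krichel">
    <Creator xmlns="http://description.org/schema/">Thomas Krichel</Creator>
  </Description>
</RDF>"""

def triples(xml_text: str):
    """Hypothetical helper: return (subject, predicate, object) tuples."""
    root = ET.fromstring(xml_text)
    out = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get("about")  # the subject's URI
        for prop in desc:
            # ElementTree writes tags as "{namespace}local"; join them into a URI.
            ns, _, local = prop.tag[1:].partition("}")
            out.append((subject, ns + local, (prop.text or "").strip()))
    return out

for t in triples(RDF_XML):
    print(t)
```

The subject and the predicate come out as URIs, while the object here is a literal string.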
The Semantic Web • The combination of Web Services and the Semantic Web should give the Web the ability to turn any existing Web Resource into a full node in a purposefully built knowledge representation system with a functional component that allows that knowledge to be acted on. • And both are based on the simple Uniform Resource Identifier.
http://openlib.org/home/krichel Thank you for your attention!