Network Working Group                                     T. Berners-Lee
Request for Comments: 2396                                       MIT/LCS
Updates: 1808, 1738                                          R. Fielding
Category: Standards Track                                    U.C. Irvine
                                                             L. Masinter
                                                       Xerox Corporation
                                                             August 1998


           Uniform Resource Identifiers (URI): Generic Syntax

Status of this Memo

   This document specifies an Internet standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements. Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol. Distribution of this memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (1998). All Rights Reserved.

IESG Note

   This paper describes a "superset" of operations that can be applied
   to URI. It consists of both a grammar and a description of basic
   functionality for URI. To understand what is a valid URI, both the
   grammar and the associated description have to be studied. Some of
   the functionality described is not applicable to all URI schemes, and
   some operations are only possible when certain media types are
   retrieved using the URI, regardless of the scheme used.

Abstract

   A Uniform Resource Identifier (URI) is a compact string of characters
   for identifying an abstract or physical resource. This document
   defines the generic syntax of URI, including both absolute and
   relative forms, and guidelines for their use; it revises and replaces
   the generic definitions in RFC 1738 and RFC 1808.

   This document defines a grammar that is a superset of all valid URI,
   such that an implementation can parse the common components of a URI
   reference without knowing the scheme-specific requirements of every
   possible identifier type.
   This document does not define a generative grammar for URI; that
   task will be performed by the individual specifications of each URI
   scheme.

1. Introduction

   Uniform Resource Identifiers (URI) provide a simple and extensible
   means for identifying a resource. This specification of URI syntax
   and semantics is derived from concepts introduced by the World Wide
   Web global information initiative, whose use of such objects dates
   from 1990 and is described in "Universal Resource Identifiers in WWW"
   [RFC1630]. The specification of URI is designed to meet the
   recommendations laid out in "Functional Recommendations for Internet
   Resource Locators" [RFC1736] and "Functional Requirements for Uniform
   Resource Names" [RFC1737].

   This document updates and merges "Uniform Resource Locators"
   [RFC1738] and "Relative Uniform Resource Locators" [RFC1808] in order
   to define a single, generic syntax for all URI. It excludes those
   portions of RFC 1738 that defined the specific syntax of individual
   URL schemes; those portions will be updated as separate documents, as
   will the process for registration of new URI schemes. This document
   does not discuss the issues and recommendations for dealing with
   characters outside of the US-ASCII character set [ASCII]; those
   recommendations are discussed in a separate document.

   All significant changes from the prior RFCs are noted in Appendix G.

1.1. Overview of URI

   URI are characterized by the following definitions:

   Uniform
      Uniformity provides several benefits: it allows different types
      of resource identifiers to be used in the same context, even
      when the mechanisms used to access those resources may differ;
      it allows uniform semantic interpretation of common syntactic
      conventions across different types of resource identifiers; it
      allows introduction of new types of resource identifiers
      without interfering with the way that existing identifiers are
      used; and, it allows the identifiers to be reused in many
      different contexts, thus permitting new applications or
      protocols to leverage a pre-existing, large, and widely-used
      set of resource identifiers.

   Resource
      A resource can be anything that has identity. Familiar
      examples include an electronic document, an image, a service
      (e.g., "today's weather report for Los Angeles"), and a
      collection of other resources. Not all resources are network
      "retrievable"; e.g., human beings, corporations, and bound
      books in a library can also be considered resources.

      The resource is the conceptual mapping to an entity or set of
      entities, not necessarily the entity which corresponds to that
      mapping at any particular instance in time. Thus, a resource
      can remain constant even when its content---the entities to
      which it currently corresponds---changes over time, provided
      that the conceptual mapping is not changed in the process.

   Identifier
      An identifier is an object that can act as a reference to
      something that has identity. In the case of URI, the object is
      a sequence of characters with a restricted syntax.

   Having identified a resource, a system may perform a variety of
   operations on the resource, as might be characterized by such words
   as `access', `update', `replace', or `find attributes'.

1.2. URI, URL, and URN

   A URI can be further classified as a locator, a name, or both. The
   term "Uniform Resource Locator" (URL) refers to the subset of URI
   that identify resources via a representation of their primary access
   mechanism (e.g., their network "location"), rather than identifying
   the resource by name or by some other attribute(s) of that resource.
   The term "Uniform Resource Name" (URN) refers to the subset of URI
   that are required to remain globally unique and persistent even when
   the resource ceases to exist or becomes unavailable.

   The URI scheme (Section 3.1) defines the namespace of the URI, and
   thus may further restrict the syntax and semantics of identifiers
   using that scheme. This specification defines those elements of the
   URI syntax that are either required of all URI schemes or are common
   to many URI schemes. It thus defines the syntax and semantics that
   are needed to implement a scheme-independent parsing mechanism for
   URI references, such that the scheme-dependent handling of a URI can
   be postponed until the scheme-dependent semantics are needed. We use
   the term URL below when describing syntax or semantics that only
   apply to locators.

   Although many URL schemes are named after protocols, this does not
   imply that the only way to access the URL's resource is via the named
   protocol.
   Gateways, proxies, caches, and name resolution services
   might be used to access some resources, independent of the protocol
   of their origin, and the resolution of some URL may require the use
   of more than one protocol (e.g., both DNS and HTTP are typically used
   to access an "http" URL's resource when it can't be found in a local
   cache).

   A URN differs from a URL in that its primary purpose is persistent
   labeling of a resource with an identifier. That identifier is drawn
   from one of a set of defined namespaces, each of which has its own
   set name structure and assignment procedures. The "urn" scheme has
   been reserved to establish the requirements for a standardized URN
   namespace, as defined in "URN Syntax" [RFC2141] and its related
   specifications.

   Most of the examples in this specification demonstrate URL, since
   they allow the most varied use of the syntax and often have a
   hierarchical namespace. A parser of the URI syntax is capable of
   parsing both URL and URN references as a generic URI; once the scheme
   is determined, the scheme-specific parsing can be performed on the
   generic URI components. In other words, the URI syntax is a superset
   of the syntax of all URI schemes.

1.3. Example URI

   The following examples illustrate URI that are in common use.

   ftp://ftp.is.co.za/rfc/rfc1808.txt
      -- ftp scheme for File Transfer Protocol services

   gopher://spinaltap.micro.umn.edu/00/Weather/California/Los%20Angeles
      -- gopher scheme for Gopher and Gopher+ Protocol services

   http://www.math.uio.no/faq/compression-faq/part1.html
      -- http scheme for Hypertext Transfer Protocol services

   mailto:mduerst@ifi.unizh.ch
      -- mailto scheme for electronic mail addresses

   news:comp.infosystems.www.servers.unix
      -- news scheme for USENET news groups and articles

   telnet://melvyl.ucop.edu/
      -- telnet scheme for interactive services via the TELNET Protocol

1.4. Hierarchical URI and Relative Forms

   An absolute identifier refers to a resource independent of the
   context in which the identifier is used. In contrast, a relative
   identifier refers to a resource by describing the difference within a
   hierarchical namespace between the current context and an absolute
   identifier of the resource.

   Some URI schemes support a hierarchical naming system, where the
   hierarchy of the name is denoted by a "/" delimiter separating the
   components in the scheme. This document defines a scheme-independent
   `relative' form of URI reference that can be used in conjunction with
   a `base' URI (of a hierarchical scheme) to produce another URI. The
   syntax of hierarchical URI is described in Section 3; the relative
   URI calculation is described in Section 5.

1.5. URI Transcribability

   The URI syntax was designed with global transcribability as one of
   its main concerns. A URI is a sequence of characters from a very
   limited set, i.e. the letters of the basic Latin alphabet, digits,
   and a few special characters.
   A URI may be represented in a variety
   of ways: e.g., ink on paper, pixels on a screen, or a sequence of
   octets in a coded character set. The interpretation of a URI depends
   only on the characters used and not how those characters are
   represented in a network protocol.

   The goal of transcribability can be described by a simple scenario.
   Imagine two colleagues, Sam and Kim, sitting in a pub at an
   international conference and exchanging research ideas. Sam asks Kim
   for a location to get more information, so Kim writes the URI for the
   research site on a napkin. Upon returning home, Sam takes out the
   napkin and types the URI into a computer, which then retrieves the
   information to which Kim referred.

   There are several design concerns revealed by the scenario:

   o A URI is a sequence of characters, which is not always
     represented as a sequence of octets.

   o A URI may be transcribed from a non-network source, and thus
     should consist of characters that are most likely to be able to
     be typed into a computer, within the constraints imposed by
     keyboards (and related input devices) across languages and
     locales.

   o A URI often needs to be remembered by people, and it is easier
     for people to remember a URI when it consists of meaningful
     components.

   These design concerns are not always in alignment. For example, it
   is often the case that the most meaningful name for a URI component
   would require characters that cannot be typed into some systems. The
   ability to transcribe the resource identifier from one medium to
   another was considered more important than having its URI consist of
   the most meaningful of components.
   In local and regional contexts
   and with improving technology, users might benefit from being able to
   use a wider range of characters; such use is not defined in this
   document.

1.6. Syntax Notation and Common Elements

   This document uses two conventions to describe and define the syntax
   for URI. The first, called the layout form, is a general description
   of the order of components and component separators, as in

      <first>/<second>;<third>?<fourth>

   The component names are enclosed in angle-brackets and any characters
   outside angle-brackets are literal separators. Whitespace should be
   ignored. These descriptions are used informally and do not define
   the syntax requirements.

   The second convention is a BNF-like grammar, used to define the
   formal URI syntax. The grammar is that of [RFC822], except that "|"
   is used to designate alternatives. Briefly, rules are separated from
   definitions by an equal "=", indentation is used to continue a rule
   definition over more than one line, literals are quoted with "",
   parentheses "(" and ")" are used to group elements, optional elements
   are enclosed in "[" and "]" brackets, and elements may be preceded
   with <n>* to designate n or more repetitions of the following
   element; n defaults to 0.

   Unlike many specifications that use a BNF-like grammar to define the
   bytes (octets) allowed by a protocol, the URI grammar is defined in
   terms of characters. Each literal in the grammar corresponds to the
   character it represents, rather than to the octet encoding of that
   character in any particular coded character set. How a URI is
   represented in terms of bits and bytes on the wire is dependent upon
   the character encoding of the protocol used to transport it, or the
   charset of the document which contains it.
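
   The character/octet distinction above can be illustrated with a
   small Python sketch. It takes one of the example URI from Section
   1.3 and encodes the same character sequence under two coded
   character sets. The two encodings are chosen only for illustration
   and are not prescribed by this document.

```python
# One URI character sequence, two different octet representations.
uri = "http://www.math.uio.no/faq/compression-faq/part1.html"

# Assumption for the example: a transport using US-ASCII and another
# using big-endian UTF-16. Neither choice comes from the specification.
ascii_octets = uri.encode("ascii")
utf16_octets = uri.encode("utf-16-be")

# The octets on the wire differ between the two transports...
octets_differ = ascii_octets != utf16_octets

# ...but decoding either one recovers the identical URI characters.
same_characters = (
    ascii_octets.decode("ascii") == utf16_octets.decode("utf-16-be") == uri
)

print(octets_differ, same_characters)
```

   The grammar in this document describes only the character
   sequence; neither octet sequence is "the" URI.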

   The following definitions are common to many elements:

      alpha    = lowalpha | upalpha

      lowalpha = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" |
                 "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" |
                 "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"

      upalpha  = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
                 "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" |
                 "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"

      digit    = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
                 "8" | "9"

      alphanum = alpha | digit

   The complete URI syntax is collected in Appendix A.

2. URI Characters and Escape Sequences

   URI consist of a restricted set of characters, primarily chosen to
   aid transcribability and usability both in computer systems and in
   non-computer communications. Characters used conventionally as
   delimiters around URI were excluded. The restricted set of
   characters consists of digits, letters, and a few graphic symbols,
   chosen from those common to most of the character encodings and
   input facilities available to Internet users.

      uric     = reserved | unreserved | escaped

   Within a URI, characters are either used as delimiters, or to
   represent strings of data (octets) within the delimited portions.
   Octets are either represented directly by a character (using the
   US-ASCII character for that octet [ASCII]) or by an escape encoding.
   This representation is elaborated below.

2.1. URI and non-ASCII characters

   The relationship between URI and characters has been a source of
   confusion for characters that are not part of US-ASCII. To describe
   the relationship, it is useful to distinguish between a "character"
   (as a distinguishable semantic entity) and an "octet" (an 8-bit
   byte).
   There are two mappings, one from URI characters to octets, and
   a second from octets to original characters:

   URI character sequence->octet sequence->original character sequence

   A URI is represented as a sequence of characters, not as a sequence
   of octets. That is because URI might be "transported" by means that
   are not through a computer network, e.g., printed on paper, read over
   the radio, etc.

   A URI scheme may define a mapping from URI characters to octets;
   whether this is done depends on the scheme. Commonly, within a
   delimited component of a URI, a sequence of characters may be used to
   represent a sequence of octets. For example, the character "a"
   represents the octet 97 (decimal), while the character sequence "%",
   "0", "a" represents the octet 10 (decimal).

   There is a second translation for some resources: the sequence of
   octets defined by a component of the URI is subsequently used to
   represent a sequence of characters. A 'charset' defines this mapping.
   There are many charsets in use in Internet protocols. For example,
   UTF-8 [UTF-8] defines a mapping from sequences of octets to sequences
   of characters in the repertoire of ISO 10646.

   In the simplest case, the original character sequence contains only
   characters that are defined in US-ASCII, and the two levels of
   mapping are simple and easily invertible: each 'original character'
   is represented as the octet for the US-ASCII code for it, which is,
   in turn, represented as either the US-ASCII character, or else the
   "%" escape sequence for that octet.

   For original character sequences that contain non-ASCII characters,
   however, the situation is more difficult.
   Internet protocols that
   transmit octet sequences intended to represent character sequences
   are expected to provide some way of identifying the charset used, if
   there might be more than one [RFC2277]. However, there is currently
   no provision within the generic URI syntax to accomplish this
   identification. An individual URI scheme may require a single
   charset, define a default charset, or provide a way to indicate the
   charset used.

   It is expected that a systematic treatment of character encoding
   within URI will be developed as a future modification of this
   specification.

2.2. Reserved Characters

   Many URI include components consisting of, or delimited by, certain
   special characters. These characters are called "reserved", since
   their usage within the URI component is limited to their reserved
   purpose. If the data for a URI component would conflict with the
   reserved purpose, then the conflicting data must be escaped before
   forming the URI.

      reserved    = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
                    "$" | ","

   The "reserved" syntax class above refers to those characters that are
   allowed within a URI, but which may not be allowed within a
   particular component of the generic URI syntax; they are used as
   delimiters of the components described in Section 3.

   Characters in the "reserved" set are not reserved in all contexts.
   The set of characters actually reserved within any given URI
   component is defined by that component. In general, a character is
   reserved if the semantics of the URI changes if the character is
   replaced with its escaped US-ASCII encoding.

2.3. Unreserved Characters

   Data characters that are allowed in a URI but do not have a reserved
   purpose are called unreserved.
   These include upper and lower case
   letters, decimal digits, and a limited set of punctuation marks and
   symbols.

      unreserved  = alphanum | mark

      mark        = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"

   Unreserved characters can be escaped without changing the semantics
   of the URI, but this should not be done unless the URI is being used
   in a context that does not allow the unescaped character to appear.

2.4. Escape Sequences

   Data must be escaped if it does not have a representation using an
   unreserved character; this includes data that does not correspond to
   a printable character of the US-ASCII coded character set, or that
   corresponds to any US-ASCII character that is disallowed, as
   explained below.

2.4.1. Escaped Encoding

   An escaped octet is encoded as a character triplet, consisting of the
   percent character "%" followed by the two hexadecimal digits
   representing the octet code. For example, "%20" is the escaped
   encoding for the US-ASCII space character.

      escaped     = "%" hex hex
      hex         = digit | "A" | "B" | "C" | "D" | "E" | "F" |
                            "a" | "b" | "c" | "d" | "e" | "f"

2.4.2. When to Escape and Unescape

   A URI is always in an "escaped" form, since escaping or unescaping a
   completed URI might change its semantics. Normally, the only time
   escape encodings can safely be made is when the URI is being created
   from its component parts; each component may have its own set of
   characters that are reserved, so only the mechanism responsible for
   generating or interpreting that component can determine whether or
   not escaping a character will change its semantics. Likewise, a URI
   must be separated into its components before the escaped characters
   within those components can be safely decoded.
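
   The ordering described above (escape while assembling components,
   split into components before unescaping) can be sketched in Python
   with the standard urllib.parse module. The helper names build_path
   and split_path and the sample segments are hypothetical, chosen
   only for this illustration.

```python
from urllib.parse import quote, unquote

def build_path(segments):
    """Escape each path segment on its own, then join with "/"."""
    # safe="" makes quote() escape "/" too, so a slash that is data
    # inside a segment cannot be mistaken for the path delimiter.
    return "/" + "/".join(quote(seg, safe="") for seg in segments)

def split_path(path):
    """Split on the "/" delimiter first, then unescape each segment."""
    return [unquote(seg) for seg in path.split("/")[1:]]

# Hypothetical data: one segment contains "/" and another contains "%".
segments = ["docs", "a/b", "100%"]
path = build_path(segments)

# Correct order: separate the components, then decode them.
round_trip = split_path(path)

# Wrong order: decoding the whole path first turns the escaped "/"
# into a delimiter and changes the number of segments.
decoded_too_early = unquote(path).split("/")[1:]

print(path)
print(round_trip)
print(decoded_too_early)
```

   Note that quote() also escapes some of the "mark" characters that
   Section 2.3 lists as unreserved (for example "!" and "*"). As
   stated in Section 2.3, escaping unreserved characters does not
   change the semantics of the URI.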

   In some cases, data that could be represented by an unreserved
   character may appear escaped; for example, some of the unreserved
   "mark" characters are automatically escaped by some systems. If the
   given URI scheme defines a canonicalization algorithm, then
   unreserved characters may be unescaped according to that algorithm.
   For example, "%7e" is sometimes used instead of "~" in an http URL
   path, but the two are equivalent for an http URL.

   Because the percent "%" character always has the reserved purpose of
   being the escape indicator, it must be escaped as "%25" in order to
   be used as data within a URI. Implementers should be careful not to
   escape or unescape the same string more than once, since unescaping
   an already unescaped string might lead to misinterpreting a percent
   data character as another escaped character, or vice versa in the
   case of escaping an already escaped string.

2.4.3. Excluded US-ASCII Characters

   Although they are disallowed within the URI syntax, we include here a
   description of those US-ASCII characters that have been excluded and
   the reasons for their exclusion.

   The control characters in the US-ASCII coded character set are not
   used within a URI, both because they are non-printable and because
   they are likely to be misinterpreted by some control mechanisms.

      control     = <US-ASCII coded characters 00-1F and 7F hexadecimal>

   The space character is excluded because significant spaces may
   disappear and insignificant spaces may be introduced when URI are
   transcribed or typeset or subjected to the treatment of
   word-processing programs. Whitespace is also used to delimit URI in
   many contexts.

      space       = <US-ASCII coded character 20 hexadecimal>

   The angle-bracket "<" and ">" and double-quote (") characters are
   excluded because they are often used as the delimiters around URI in
   text documents and protocol fields. The character "#" is excluded
   because it is used to delimit a URI from a fragment identifier in URI
   references (Section 4). The percent character "%" is excluded because
   it is used for the encoding of escaped characters.

      delims      = "<" | ">" | "#" | "%" | <">

   Other characters are excluded because gateways and other transport
   agents are known to sometimes modify such characters, or they are
   used as delimiters.

      unwise      = "{" | "}" | "|" | "\" | "^" | "[" | "]" | "`"

   Data corresponding to excluded characters must be escaped in order to
   be properly represented within a URI.

3. URI Syntactic Components

   The URI syntax is dependent upon the scheme. In general, absolute
   URI are written as follows:

      <scheme>:<scheme-specific-part>

   An absolute URI contains the name of the scheme being used (<scheme>)
   followed by a colon (":") and then a string (the
   <scheme-specific-part>) whose interpretation depends on the scheme.

   The URI syntax does not require that the scheme-specific-part have
   any general structure or set of semantics which is common among all
   URI. However, a subset of URI do share a common syntax for
   representing hierarchical relationships within the namespace. This
   "generic URI" syntax consists of a sequence of four main components:

      <scheme>://<authority><path>?<query>

   each of which, except <scheme>, may be absent from a particular URI.
   For example, some URI schemes do not allow an <authority> component,
   and others do not use a <query> component.

      absoluteURI   = scheme ":" ( hier_part | opaque_part )

   URI that are hierarchical in nature use the slash "/" character for
   separating hierarchical components.
   For some file systems, a "/"
   character (used to denote the hierarchical structure of a URI) is the
   delimiter used to construct a file name hierarchy, and thus the URI
   path will look similar to a file pathname. This does NOT imply that
   the resource is a file or that the URI maps to an actual filesystem
   pathname.

      hier_part     = ( net_path | abs_path ) [ "?" query ]

      net_path      = "//" authority [ abs_path ]

      abs_path      = "/" path_segments

   URI that do not make use of the slash "/" character for separating
   hierarchical components are considered opaque by the generic URI
   parser.

      opaque_part   = uric_no_slash *uric

      uric_no_slash = unreserved | escaped | ";" | "?" | ":" | "@" |
                      "&" | "=" | "+" | "$" | ","

   We use the term <path> to refer to both the <abs_path> and
   <opaque_part> constructs, since they are mutually exclusive for any
   given URI and can be parsed as a single component.

3.1. Scheme Component

   Just as there are many different methods of access to resources,
   there are a variety of schemes for identifying such resources. The
   URI syntax consists of a sequence of components separated by reserved
   characters, with the first component defining the semantics for the
   remainder of the URI string.

   Scheme names consist of a sequence of characters beginning with a
   lower case letter and followed by any combination of lower case
   letters, digits, plus ("+"), period ("."), or hyphen ("-"). For
   resiliency, programs interpreting URI should treat upper case letters
   as equivalent to lower case in scheme names (e.g., allow "HTTP" as
   well as "http").

      scheme        = alpha *( alpha | digit | "+" | "-" | "." )

   Relative URI references are distinguished from absolute URI in that
   they do not begin with a scheme name.
   Instead, the scheme is
   inherited from the base URI, as described in Section 5.2.

3.2. Authority Component

   Many URI schemes include a top hierarchical element for a naming
   authority, such that the namespace defined by the remainder of the
   URI is governed by that authority. This authority component is
   typically defined by an Internet-based server or a scheme-specific
   registry of naming authorities.

      authority     = server | reg_name

   The authority component is preceded by a double slash "//" and is
   terminated by the next slash "/", question-mark "?", or by the end of
   the URI. Within the authority component, the characters ";", ":",
   "@", "?", and "/" are reserved.

   An authority component is not required for a URI scheme to make use
   of relative references. A base URI without an authority component
   implies that any relative reference will also be without an authority
   component.

3.2.1. Registry-based Naming Authority

   The structure of a registry-based naming authority is specific to the
   URI scheme, but constrained to the allowed characters for an
   authority component.

      reg_name      = 1*( unreserved | escaped | "$" | "," |
                          ";" | ":" | "@" | "&" | "=" | "+" )

3.2.2. Server-based Naming Authority

   URL schemes that involve the direct use of an IP-based protocol to a
   specified server on the Internet use a common syntax for the server
   component of the URI's scheme-specific data:

      <userinfo>@<host>:<port>

   where <userinfo> may consist of a user name and, optionally,
   scheme-specific information about how to gain authorization to access
   the server. The parts "<userinfo>@" and ":<port>" may be omitted.

      server        = [ [ userinfo "@" ] hostport ]

   The user information, if present, is followed by a commercial at-sign
   "@".

      userinfo      = *( unreserved | escaped |
                         ";" | ":" | "&" | "=" | "+" | "$" | "," )

   Some URL schemes use the format "user:password" in the userinfo
   field. This practice is NOT RECOMMENDED, because the passing of
   authentication information in clear text (such as URI) has proven to
   be a security risk in almost every case where it has been used.

   The host is a domain name of a network host, or its IPv4 address as a
   set of four decimal digit groups separated by ".". Literal IPv6
   addresses are not supported.

      hostport      = host [ ":" port ]
      host          = hostname | IPv4address
      hostname      = *( domainlabel "." ) toplabel [ "." ]
      domainlabel   = alphanum | alphanum *( alphanum | "-" ) alphanum
      toplabel      = alpha | alpha *( alphanum | "-" ) alphanum
      IPv4address   = 1*digit "." 1*digit "." 1*digit "." 1*digit
      port          = *digit

   Hostnames take the form described in Section 3 of [RFC1034] and
   Section 2.1 of [RFC1123]: a sequence of domain labels separated by
   ".", each domain label starting and ending with an alphanumeric
   character and possibly also containing "-" characters. The rightmost
   domain label of a fully qualified domain name will never start with a
   digit, thus syntactically distinguishing domain names from IPv4
   addresses, and may be followed by a single "." if it is necessary to
   distinguish between the complete domain name and any local domain.
   To actually be "Uniform" as a resource locator, a URL hostname should
   be a fully qualified domain name. In practice, however, the host
   component may be a local domain literal.

      Note: A suitable representation for including a literal IPv6
      address as the host part of a URL is desired, but has not yet been
      determined or implemented in practice.

   The port is the network port number for the server.
   Most schemes
   designate protocols that have a default port number. Another port
   number may optionally be supplied, in decimal, separated from the
   host by a colon. If the port is omitted, the default port number is
   assumed.

3.3. Path Component

   The path component contains data, specific to the authority (or the
   scheme if there is no authority component), identifying the resource
   within the scope of that scheme and authority.

      path          = [ abs_path | opaque_part ]

      path_segments = segment *( "/" segment )
      segment       = *pchar *( ";" param )
      param         = *pchar

      pchar         = unreserved | escaped |
                      ":" | "@" | "&" | "=" | "+" | "$" | ","

   The path may consist of a sequence of path segments separated by a
   single slash "/" character. Within a path segment, the characters
   "/", ";", "=", and "?" are reserved. Each path segment may include a
   sequence of parameters, indicated by the semicolon ";" character.
   The parameters are not significant to the parsing of relative
   references.

3.4. Query Component

   The query component is a string of information to be interpreted by
   the resource.

      query         = *uric

   Within a query component, the characters ";", "/", "?", ":", "@",
   "&", "=", "+", ",", and "$" are reserved.

4. URI References

   The term "URI-reference" is used here to denote the common usage of a
   resource identifier. A URI reference may be absolute or relative,
   and may have additional information attached in the form of a
   fragment identifier. However, "the URI" that results from such a
   reference includes only the absolute URI after the fragment
   identifier (if any) is removed and after any relative URI is resolved
   to its absolute form.
   Although it is possible to limit the discussion of URI syntax and
   semantics to that of the absolute result, most usage of URI is within
   general URI references, and it is impossible to obtain the URI from
   such a reference without also parsing the fragment and resolving the
   relative form.

      URI-reference = [ absoluteURI | relativeURI ] [ "#" fragment ]

   The syntax for relative URI is a shortened form of that for absolute
   URI, where some prefix of the URI is missing and certain path
   components ("." and "..") have a special meaning when, and only when,
   interpreting a relative path. The relative URI syntax is defined in
   Section 5.

4.1. Fragment Identifier

   When a URI reference is used to perform a retrieval action on the
   identified resource, the optional fragment identifier, separated from
   the URI by a crosshatch ("#") character, consists of additional
   reference information to be interpreted by the user agent after the
   retrieval action has been successfully completed. As such, it is not
   part of a URI, but is often used in conjunction with a URI.

      fragment      = *uric

   The semantics of a fragment identifier is a property of the data
   resulting from a retrieval action, regardless of the type of URI used
   in the reference. Therefore, the format and interpretation of
   fragment identifiers is dependent on the media type [RFC2046] of the
   retrieval result. The character restrictions described in Section 2
   for URI also apply to the fragment in a URI-reference. Individual
   media types may define additional restrictions or structure within
   the fragment for specifying different types of "partial views" that
   can be identified within that media type.
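
   The separation of a fragment identifier from the rest of a URI
   reference can be illustrated with a minimal Python sketch. The
   function name split_fragment is hypothetical and not part of this
   specification. The sketch relies only on the fact that "#" cannot
   occur inside a fragment (fragment = *uric), and it returns None for
   a fragment that is absent, so that an absent fragment stays distinct
   from an empty one.

```python
def split_fragment(reference):
    """Split a URI reference at the first "#" character.

    Returns (uri_part, fragment). The fragment is None when no "#" is
    present, and "" when "#" is present but nothing follows it.
    split_fragment is an illustrative name, not defined by the RFC.
    """
    uri_part, separator, fragment = reference.partition("#")
    return uri_part, (fragment if separator else None)


if __name__ == "__main__":
    for ref in ("http://a/b/c/g#s", "#s", "http://a/b/c/g", "g#"):
        print(repr(ref), "->", split_fragment(ref))
```

   A reference consisting only of "#s" yields an empty URI part, which
   corresponds to a reference to the current document.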

   A fragment identifier is only meaningful when a URI reference is
   intended for retrieval and the result of that retrieval is a document
   for which the identified fragment is consistently defined.

4.2. Same-document References

   A URI reference that does not contain a URI is a reference to the
   current document. In other words, an empty URI reference within a
   document is interpreted as a reference to the start of that document,
   and a reference containing only a fragment identifier is a reference
   to the identified fragment of that document. Traversal of such a
   reference should not result in an additional retrieval action.
   However, if the URI reference occurs in a context that is always
   intended to result in a new request, as in the case of HTML's FORM
   element, then an empty URI reference represents the base URI of the
   current document and should be replaced by that URI when transformed
   into a request.

4.3. Parsing a URI Reference

   A URI reference is typically parsed according to the four main
   components and fragment identifier in order to determine what
   components are present and whether the reference is relative or
   absolute. The individual components are then parsed for their
   subparts and, if not opaque, to verify their validity.

   Although the BNF defines what is allowed in each component, it is
   ambiguous in terms of differentiating between an authority component
   and a path component that begins with two slash characters. The
   greedy algorithm is used for disambiguation: the left-most matching
   rule soaks up as much of the URI reference string as it is capable of
   matching. In other words, the authority component wins.

   Readers familiar with regular expressions should see Appendix B for a
   concrete parsing example and test oracle.
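
   As an illustration of the greedy parse, the following Python sketch
   applies the regular expression given in Appendix B. The names
   URI_REFERENCE and split_uri_reference are hypothetical. Python's re
   module reports an unmatched group as None, which is used here to
   stand for a component that is not present; this is a sketch of one
   possible approach, not a required implementation.

```python
import re

# The regular expression from Appendix B, unchanged.
URI_REFERENCE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_uri_reference(reference):
    """Split a URI reference into its four components and fragment.

    A value of None means the component is not present; "" means it is
    present but empty. The path is never None, though it may be empty.
    """
    # Every part of the expression is optional, so match() always succeeds.
    match = URI_REFERENCE.match(reference)
    return {
        "scheme": match.group(2),
        "authority": match.group(4),
        "path": match.group(5),
        "query": match.group(7),
        "fragment": match.group(9),
    }


if __name__ == "__main__":
    print(split_uri_reference("http://www.ics.uci.edu/pub/ietf/uri/#Related"))
    # Two leading slashes: the authority component wins over the path.
    print(split_uri_reference("//g/h"))
```

   For the reference "//g/h", the authority is "g" and the path is
   "/h", which shows the authority component taking precedence over a
   path that begins with two slashes.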

5. Relative URI References

   It is often the case that a group or "tree" of documents has been
   constructed to serve a common purpose; the vast majority of URI in
   these documents point to resources within the tree rather than
   outside of it. Similarly, documents located at a particular site are
   much more likely to refer to other resources at that site than to
   resources at remote sites.

   Relative addressing of URI allows document trees to be partially
   independent of their location and access scheme. For instance, it is
   possible for a single set of hypertext documents to be simultaneously
   accessible and traversable via each of the "file", "http", and "ftp"
   schemes if the documents refer to each other using relative URI.
   Furthermore, such document trees can be moved, as a whole, without
   changing any of the relative references. Experience within the WWW
   has demonstrated that the ability to perform relative referencing is
   necessary for the long-term usability of embedded URI.

   The syntax for relative URI takes advantage of the <hier_part> syntax
   of <absoluteURI> (Section 3) in order to express a reference that is
   relative to the namespace of another hierarchical URI.

      relativeURI   = ( net_path | abs_path | rel_path ) [ "?" query ]

   A relative reference beginning with two slash characters is termed a
   network-path reference, as defined by <net_path> in Section 3. Such
   references are rarely used.

   A relative reference beginning with a single slash character is
   termed an absolute-path reference, as defined by <abs_path> in
   Section 3.

   A relative reference that does not begin with a scheme name or a
   slash character is termed a relative-path reference.

      rel_path      = rel_segment [ abs_path ]

      rel_segment   = 1*( unreserved | escaped |
                          ";" | "@" | "&" | "=" | "+" | "$" | "," )

   Within a relative-path reference, the complete path segments "." and
   ".." have special meanings: "the current hierarchy level" and "the
   level above this hierarchy level", respectively. Although this is
   very similar to their use within Unix-based filesystems to indicate
   directory levels, these path components are only considered special
   when resolving a relative-path reference to its absolute form
   (Section 5.2).

   Authors should be aware that a path segment which contains a colon
   character cannot be used as the first segment of a relative URI path
   (e.g., "this:that"), because it would be mistaken for a scheme name.
   It is therefore necessary to precede such segments with other
   segments (e.g., "./this:that") in order for them to be referenced as
   a relative path.

   It is not necessary for all URI within a given scheme to be
   restricted to the <hier_part> syntax, since the hierarchical
   properties of that syntax are only necessary when relative URI are
   used within a particular document. Documents can only make use of
   relative URI when their base URI fits within the <hier_part> syntax.
   It is assumed that any document which contains a relative reference
   will also have a base URI that obeys the syntax. In other words,
   relative URI cannot be used within a document that has an unsuitable
   base URI.

   Some URI schemes do not allow a hierarchical syntax matching the
   <hier_part> syntax, and thus cannot use relative references.

5.1. Establishing a Base URI

   The term "relative URI" implies that there exists some absolute "base
   URI" against which the relative reference is applied.
   Indeed, the base URI is necessary to define the semantics of any
   relative URI reference; without it, a relative reference is
   meaningless. In order for relative URI to be usable within a
   document, the base URI of that document must be known to the parser.

   The base URI of a document can be established in one of four ways,
   listed below in order of precedence. The order of precedence can be
   thought of in terms of layers, where the innermost defined base URI
   has the highest precedence. This can be visualized graphically as:

      .----------------------------------------------------------.
      |  .----------------------------------------------------.  |
      |  |  .----------------------------------------------.  |  |
      |  |  |  .----------------------------------------.  |  |  |
      |  |  |  |  .----------------------------------.  |  |  |  |
      |  |  |  |  |                                  |  |  |  |  |
      |  |  |  |  `----------------------------------'  |  |  |  |
      |  |  |  | (5.1.1) Base URI embedded in the       |  |  |  |
      |  |  |  |         document's content             |  |  |  |
      |  |  |  `----------------------------------------'  |  |  |
      |  |  | (5.1.2) Base URI of the encapsulating entity |  |  |
      |  |  |         (message, document, or none).        |  |  |
      |  |  `----------------------------------------------'  |  |
      |  | (5.1.3) URI used to retrieve the entity            |  |
      |  `----------------------------------------------------'  |
      | (5.1.4) Default Base URI is application-dependent        |
      `----------------------------------------------------------'

5.1.1. Base URI within Document Content

   Within certain document media types, the base URI of the document can
   be embedded within the content itself such that it can be readily
   obtained by a parser.
   This can be useful for descriptive documents, such as tables of
   content, which may be transmitted to others through protocols other
   than their usual retrieval context (e.g., E-Mail or USENET news).

   It is beyond the scope of this document to specify how, for each
   media type, the base URI can be embedded. It is assumed that user
   agents manipulating such media types will be able to obtain the
   appropriate syntax from that media type's specification. An example
   of how the base URI can be embedded in the Hypertext Markup Language
   (HTML) [RFC1866] is provided in Appendix D.

   A mechanism for embedding the base URI within MIME container types
   (e.g., the message and multipart types) is defined by MHTML
   [RFC2110]. Protocols that do not use the MIME message header syntax,
   but which do allow some form of tagged metainformation to be included
   within messages, may define their own syntax for defining the base
   URI as part of a message.

5.1.2. Base URI from the Encapsulating Entity

   If no base URI is embedded, the base URI of a document is defined by
   the document's retrieval context. For a document that is enclosed
   within another entity (such as a message or another document), the
   retrieval context is that entity; thus, the default base URI of the
   document is the base URI of the entity in which the document is
   encapsulated.

5.1.3. Base URI from the Retrieval URI

   If no base URI is embedded and the document is not encapsulated
   within some other entity (e.g., the top level of a composite entity),
   then, if a URI was used to retrieve the base document, that URI shall
   be considered the base URI.
   Note that if the retrieval was the result of a redirected request,
   the last URI used (i.e., that which resulted in the actual retrieval
   of the document) is the base URI.

5.1.4. Default Base URI

   If none of the conditions described in Sections 5.1.1--5.1.3 apply,
   then the base URI is defined by the context of the application.
   Since this definition is necessarily application-dependent, failing
   to define the base URI using one of the other methods may result in
   the same content being interpreted differently by different types of
   application.

   It is the responsibility of the distributor(s) of a document
   containing relative URI to ensure that the base URI for that document
   can be established. It must be emphasized that relative URI cannot
   be used reliably in situations where the document's base URI is not
   well-defined.

5.2. Resolving Relative References to Absolute Form

   This section describes an example algorithm for resolving URI
   references that might be relative to a given base URI.

   The base URI is established according to the rules of Section 5.1 and
   parsed into the four main components as described in Section 3. Note
   that only the scheme component is required to be present in the base
   URI; the other components may be empty or undefined. A component is
   undefined if its preceding separator does not appear in the URI
   reference; the path component is never undefined, though it may be
   empty. The base URI's query component is not used by the resolution
   algorithm and may be discarded.

   For each URI reference, the following steps are performed in order:

   1) The URI reference is parsed into the potential four components and
      fragment identifier, as described in Section 4.3.

   2) If the path component is empty and the scheme, authority, and
      query components are undefined, then it is a reference to the
      current document and we are done. Otherwise, the reference URI's
      query and fragment components are defined as found (or not found)
      within the URI reference and not inherited from the base URI.

   3) If the scheme component is defined, indicating that the reference
      starts with a scheme name, then the reference is interpreted as an
      absolute URI and we are done. Otherwise, the reference URI's
      scheme is inherited from the base URI's scheme component.

      Due to a loophole in prior specifications [RFC1630], some parsers
      allow the scheme name to be present in a relative URI if it is the
      same as the base URI scheme. Unfortunately, this can conflict
      with the correct parsing of non-hierarchical URI. For backwards
      compatibility, an implementation may work around such references
      by removing the scheme if it matches that of the base URI and the
      scheme is known to always use the <hier_part> syntax. The parser
      can then continue with the steps below for the remainder of the
      reference components. Validating parsers should mark such a
      malformed relative reference as an error.

   4) If the authority component is defined, then the reference is a
      network-path and we skip to step 7. Otherwise, the reference
      URI's authority is inherited from the base URI's authority
      component, which will also be undefined if the URI scheme does not
      use an authority component.

   5) If the path component begins with a slash character ("/"), then
      the reference is an absolute-path and we skip to step 7.

   6) If this step is reached, then we are resolving a relative-path
      reference. The relative path needs to be merged with the base
      URI's path.
      Although there are many ways to do this, we will describe a simple
      method using a separate string buffer.

      a) All but the last segment of the base URI's path component is
         copied to the buffer. In other words, any characters after the
         last (right-most) slash character, if any, are excluded.

      b) The reference's path component is appended to the buffer
         string.

      c) All occurrences of "./", where "." is a complete path segment,
         are removed from the buffer string.

      d) If the buffer string ends with "." as a complete path segment,
         that "." is removed.

      e) All occurrences of "<segment>/../", where <segment> is a
         complete path segment not equal to "..", are removed from the
         buffer string. Removal of these path segments is performed
         iteratively, removing the leftmost matching pattern on each
         iteration, until no matching pattern remains.

      f) If the buffer string ends with "<segment>/..", where <segment>
         is a complete path segment not equal to "..", that
         "<segment>/.." is removed.

      g) If the resulting buffer string still begins with one or more
         complete path segments of "..", then the reference is
         considered to be in error. Implementations may handle this
         error by retaining these components in the resolved path (i.e.,
         treating them as part of the final URI), by removing them from
         the resolved path (i.e., discarding relative levels above the
         root), or by avoiding traversal of the reference.

      h) The remaining buffer string is the reference URI's new path
         component.

   7) The resulting URI components, including any inherited from the
      base URI, are recombined to give the absolute form of the URI
      reference.
      Using pseudocode, this would be

         result = ""

         if scheme is defined then
             append scheme to result
             append ":" to result

         if authority is defined then
             append "//" to result
             append authority to result

         append path to result

         if query is defined then
             append "?" to result
             append query to result

         if fragment is defined then
             append "#" to result
             append fragment to result

         return result

      Note that we must be careful to preserve the distinction between a
      component that is undefined, meaning that its separator was not
      present in the reference, and a component that is empty, meaning
      that the separator was present and was immediately followed by the
      next component separator or the end of the reference.

   The above algorithm is intended to provide an example by which the
   output of implementations can be tested -- implementation of the
   algorithm itself is not required. For example, some systems may find
   it more efficient to implement step 6 as a pair of segment stacks
   being merged, rather than as a series of string pattern replacements.

      Note: Some WWW client applications will fail to separate the
      reference's query component from its path component before merging
      the base and reference paths in step 6 above. This may result in
      a loss of information if the query component contains the strings
      "/../" or "/./".

   Resolution examples are provided in Appendix C.

6. URI Normalization and Equivalence

   In many cases, different URI strings may actually identify the
   identical resource. For example, the host names used in URL are
   actually case insensitive, and the URL is equivalent to .
   In general, the rules for equivalence and definition of a normal
   form, if any, are scheme dependent. When a scheme uses elements of
   the common syntax, it will also use the common syntax equivalence
   rules, namely that the scheme and hostname are case insensitive and a
   URL with an explicit ":port", where the port is the default for the
   scheme, is equivalent to one where the port is elided.

7. Security Considerations

   A URI does not in itself pose a security threat. Users should beware
   that there is no general guarantee that a URL, which at one time
   located a given resource, will continue to do so. Nor is there any
   guarantee that a URL will not locate a different resource at some
   later point in time, due to the lack of any constraint on how a given
   authority apportions its namespace. Such a guarantee can only be
   obtained from the person(s) controlling that namespace and the
   resource in question. A specific URI scheme may include additional
   semantics, such as name persistence, if those semantics are required
   of all naming authorities for that scheme.

   It is sometimes possible to construct a URL such that an attempt to
   perform a seemingly harmless, idempotent operation, such as the
   retrieval of an entity associated with the resource, will in fact
   cause a possibly damaging remote operation to occur. The unsafe URL
   is typically constructed by specifying a port number other than that
   reserved for the network protocol in question. The client
   unwittingly contacts a site that is in fact running a different
   protocol. The content of the URL contains instructions that, when
   interpreted according to this other protocol, cause an unexpected
   operation. An example has been the use of a gopher URL to cause an
   unintended or impersonating message to be sent via a SMTP server.
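
   One way a client might notice such a URL is to compare any explicit
   port against the default for the scheme. The Python sketch below is
   an illustration only: the DEFAULT_PORTS table is a small assumed
   subset rather than a registry, the function name has_unexpected_port
   is hypothetical, and the host example.org is a placeholder. It uses
   the standard library's urllib.parse.urlsplit to extract the scheme
   and port.

```python
from urllib.parse import urlsplit

# Assumed, illustrative subset of scheme defaults; not a complete registry.
DEFAULT_PORTS = {"http": 80, "ftp": 21, "gopher": 70, "telnet": 23}


def has_unexpected_port(url):
    """Return True if the URL names an explicit port other than the
    default known for its scheme.

    A scheme missing from DEFAULT_PORTS is flagged whenever it carries
    an explicit port, since no default is known for comparison.
    """
    parts = urlsplit(url)
    port = parts.port  # None when the URL does not specify a port
    if port is None:
        return False
    return DEFAULT_PORTS.get(parts.scheme) != port


if __name__ == "__main__":
    # A gopher URL aimed at the SMTP port, as in the example above.
    print(has_unexpected_port("gopher://example.org:25/"))
    print(has_unexpected_port("http://example.org:80/"))
```

   A check of this kind only flags a URL for closer inspection; whether
   to refuse or to warn is left to the application.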

   Caution should be used when using any URL that specifies a port
   number other than the default for the protocol, especially when it is
   a number within the reserved space.

   Care should be taken when a URL contains escaped delimiters for a
   given protocol (for example, CR and LF characters for telnet
   protocols) that these are not unescaped before transmission. This
   might violate the protocol, but avoids the potential for such
   characters to be used to simulate an extra operation or parameter in
   that protocol, which might lead to an unexpected and possibly harmful
   remote operation to be performed.

   It is clearly unwise to use a URL that contains a password which is
   intended to be secret. In particular, the use of a password within
   the 'userinfo' component of a URL is strongly disrecommended except
   in those rare cases where the 'password' parameter is intended to be
   public.

8. Acknowledgements

   This document was derived from RFC 1738 [RFC1738] and RFC 1808
   [RFC1808]; the acknowledgements in those specifications still apply.
   In addition, contributions by Gisle Aas, Martin Beet, Martin Duerst,
   Jim Gettys, Martijn Koster, Dave Kristol, Daniel LaLiberte, Foteos
   Macrides, James Marshall, Ryan Moats, Keith Moore, and Lauren Wood
   are gratefully acknowledged.

9. References

   [RFC2277] Alvestrand, H., "IETF Policy on Character Sets and
             Languages", BCP 18, RFC 2277, January 1998.

   [RFC1630] Berners-Lee, T., "Universal Resource Identifiers in WWW: A
             Unifying Syntax for the Expression of Names and Addresses
             of Objects on the Network as used in the World-Wide Web",
             RFC 1630, June 1994.

   [RFC1738] Berners-Lee, T., Masinter, L., and M. McCahill, Editors,
             "Uniform Resource Locators (URL)", RFC 1738, December 1994.

   [RFC1866] Berners-Lee T., and D. Connolly, "HyperText Markup Language
             Specification -- 2.0", RFC 1866, November 1995.

   [RFC1123] Braden, R., Editor, "Requirements for Internet Hosts --
             Application and Support", STD 3, RFC 1123, October 1989.

   [RFC822]  Crocker, D., "Standard for the Format of ARPA Internet Text
             Messages", STD 11, RFC 822, August 1982.

   [RFC1808] Fielding, R., "Relative Uniform Resource Locators", RFC
             1808, June 1995.

   [RFC2046] Freed, N., and N. Borenstein, "Multipurpose Internet Mail
             Extensions (MIME) Part Two: Media Types", RFC 2046,
             November 1996.

   [RFC1736] Kunze, J., "Functional Recommendations for Internet
             Resource Locators", RFC 1736, February 1995.

   [RFC2141] Moats, R., "URN Syntax", RFC 2141, May 1997.

   [RFC1034] Mockapetris, P., "Domain Names - Concepts and Facilities",
             STD 13, RFC 1034, November 1987.

   [RFC2110] Palme, J., and A. Hopmann, "MIME E-mail Encapsulation of
             Aggregate Documents, such as HTML (MHTML)", RFC 2110, March
             1997.

   [RFC1737] Sollins, K., and L. Masinter, "Functional Requirements for
             Uniform Resource Names", RFC 1737, December 1994.

   [ASCII]   US-ASCII. "Coded Character Set -- 7-bit American Standard
             Code for Information Interchange", ANSI X3.4-1986.

   [UTF-8]   Yergeau, F., "UTF-8, a transformation format of ISO 10646",
             RFC 2279, January 1998.

10. Authors' Addresses

   Tim Berners-Lee
   World Wide Web Consortium
   MIT Laboratory for Computer Science, NE43-356
   545 Technology Square
   Cambridge, MA 02139

   Fax: +1(617)258-8682
   EMail: timbl@w3.org


   Roy T. Fielding
   Department of Information and Computer Science
   University of California, Irvine
   Irvine, CA 92697-3425

   Fax: +1(949)824-1715
   EMail: fielding@ics.uci.edu


   Larry Masinter
   Xerox PARC
   3333 Coyote Hill Road
   Palo Alto, CA 94034

   Fax: +1(415)812-4333
   EMail: masinter@parc.xerox.com

A. Collected BNF for URI

      URI-reference = [ absoluteURI | relativeURI ] [ "#" fragment ]
      absoluteURI   = scheme ":" ( hier_part | opaque_part )
      relativeURI   = ( net_path | abs_path | rel_path ) [ "?" query ]

      hier_part     = ( net_path | abs_path ) [ "?" query ]
      opaque_part   = uric_no_slash *uric

      uric_no_slash = unreserved | escaped | ";" | "?" | ":" | "@" |
                      "&" | "=" | "+" | "$" | ","

      net_path      = "//" authority [ abs_path ]
      abs_path      = "/"  path_segments
      rel_path      = rel_segment [ abs_path ]

      rel_segment   = 1*( unreserved | escaped |
                          ";" | "@" | "&" | "=" | "+" | "$" | "," )

      scheme        = alpha *( alpha | digit | "+" | "-" | "." )

      authority     = server | reg_name

      reg_name      = 1*( unreserved | escaped | "$" | "," |
                          ";" | ":" | "@" | "&" | "=" | "+" )

      server        = [ [ userinfo "@" ] hostport ]
      userinfo      = *( unreserved | escaped |
                         ";" | ":" | "&" | "=" | "+" | "$" | "," )

      hostport      = host [ ":" port ]
      host          = hostname | IPv4address
      hostname      = *( domainlabel "." ) toplabel [ "." ]
      domainlabel   = alphanum | alphanum *( alphanum | "-" ) alphanum
      toplabel      = alpha | alpha *( alphanum | "-" ) alphanum
      IPv4address   = 1*digit "." 1*digit "." 1*digit "." 1*digit
      port          = *digit

      path          = [ abs_path | opaque_part ]
      path_segments = segment *( "/" segment )
      segment       = *pchar *( ";" param )
      param         = *pchar
      pchar         = unreserved | escaped |
                      ":" | "@" | "&" | "=" | "+" | "$" | ","

      query         = *uric

      fragment      = *uric

      uric          = reserved | unreserved | escaped
      reserved      = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
                      "$" | ","
      unreserved    = alphanum | mark
      mark          = "-" | "_" | "." | "!" | "~" | "*" | "'" |
                      "(" | ")"

      escaped       = "%" hex hex
      hex           = digit | "A" | "B" | "C" | "D" | "E" | "F" |
                              "a" | "b" | "c" | "d" | "e" | "f"

      alphanum      = alpha | digit
      alpha         = lowalpha | upalpha

      lowalpha = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" |
                 "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" |
                 "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"
      upalpha  = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
                 "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" |
                 "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"
      digit    = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
                 "8" | "9"

B. Parsing a URI Reference with a Regular Expression

   As described in Section 4.3, the generic URI syntax is not sufficient
   to disambiguate the components of some forms of URI.
   Since the "greedy algorithm" described in that section is identical
   to the disambiguation method used by POSIX regular expressions, it is
   natural and commonplace to use a regular expression for parsing the
   potential four components and fragment identifier of a URI reference.

   The following line is the regular expression for breaking-down a URI
   reference into its components.

      ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
       12            3  4          5       6  7        8 9

   The numbers in the second line above are only to assist readability;
   they indicate the reference points for each subexpression (i.e., each
   paired parenthesis). We refer to the value matched for subexpression
   <n> as $<n>. For example, matching the above expression to

      http://www.ics.uci.edu/pub/ietf/uri/#Related

   results in the following subexpression matches:

      $1 = http:
      $2 = http
      $3 = //www.ics.uci.edu
      $4 = www.ics.uci.edu
      $5 = /pub/ietf/uri/
      $6 = <undefined>
      $7 = <undefined>
      $8 = #Related
      $9 = Related

   where <undefined> indicates that the component is not present, as is
   the case for the query component in the above example. Therefore, we
   can determine the value of the four components and fragment as

      scheme    = $2
      authority = $4
      path      = $5
      query     = $7
      fragment  = $9

   and, going in the opposite direction, we can recreate a URI reference
   from its components using the algorithm in step 7 of Section 5.2.

C. Examples of Resolving Relative URI References

   Within an object with a well-defined base URI of

      http://a/b/c/d;p?q

   the relative URI would be resolved as follows:

C.1. Normal Examples

      g:h           =  g:h
      g             =  http://a/b/c/g
      ./g           =  http://a/b/c/g
      g/            =  http://a/b/c/g/
      /g            =  http://a/g
      //g           =  http://g
      ?y            =  http://a/b/c/?y
      g?y           =  http://a/b/c/g?y
      #s            =  (current document)#s
      g#s           =  http://a/b/c/g#s
      g?y#s         =  http://a/b/c/g?y#s
      ;x            =  http://a/b/c/;x
      g;x           =  http://a/b/c/g;x
      g;x?y#s       =  http://a/b/c/g;x?y#s
      .             =  http://a/b/c/
      ./            =  http://a/b/c/
      ..            =  http://a/b/
      ../           =  http://a/b/
      ../g          =  http://a/b/g
      ../..         =  http://a/
      ../../        =  http://a/
      ../../g       =  http://a/g

C.2. Abnormal Examples

   Although the following abnormal examples are unlikely to occur in
   normal practice, all URI parsers should be capable of resolving them
   consistently. Each example uses the same base as above.

   An empty reference refers to the start of the current document.

      <>            =  (current document)

   Parsers must be careful in handling the case where there are more
   relative path ".." segments than there are hierarchical levels in the
   base URI's path. Note that the ".." syntax cannot be used to change
   the authority component of a URI.

      ../../../g    =  http://a/../g
      ../../../../g =  http://a/../../g

   In practice, some implementations strip leading relative symbolic
   elements (".", "..") after applying a relative URI calculation, based
   on the theory that compensating for obvious author errors is better
   than allowing the request to fail. Thus, the above two references
   will be interpreted as "http://a/g" by some implementations.

   Similarly, parsers must avoid treating "." and ".." as special when
   they are not complete components of a relative path.

      /./g          =  http://a/./g
      /../g         =  http://a/../g
      g.            =  http://a/b/c/g.
      .g            =  http://a/b/c/.g
      g..           =  http://a/b/c/g..
      ..g           =  http://a/b/c/..g

   Less likely are cases where the relative URI uses unnecessary or
   nonsensical forms of the "." and ".." complete path segments.

      ./../g        =  http://a/b/g
      ./g/.         =  http://a/b/c/g/
      g/./h         =  http://a/b/c/g/h
      g/../h        =  http://a/b/c/h
      g;x=1/./y     =  http://a/b/c/g;x=1/y
      g;x=1/../y    =  http://a/b/c/y

   All client applications remove the query component from the base URI
   before resolving relative URI. However, some applications fail to
   separate the reference's query and/or fragment components from a
   relative path before merging it with the base path. This error is
   rarely noticed, since typical usage of a fragment never includes the
   hierarchy ("/") character, and the query component is not normally
   used within relative references.

      g?y/./x       =  http://a/b/c/g?y/./x
      g?y/../x      =  http://a/b/c/g?y/../x
      g#s/./x       =  http://a/b/c/g#s/./x
      g#s/../x      =  http://a/b/c/g#s/../x

   Some parsers allow the scheme name to be present in a relative URI if
   it is the same as the base URI scheme. This is considered to be a
   loophole in prior specifications of partial URI [RFC1630]. Its use
   should be avoided.

      http:g        =  http:g           ; for validating parsers
                    |  http://a/b/c/g   ; for backwards compatibility

D. Embedding the Base URI in HTML documents

   It is useful to consider an example of how the base URI of a document
   can be embedded within the document's content.
   In this appendix, we describe how documents written in the Hypertext
   Markup Language (HTML) [RFC1866] can include an embedded base URI.
   This appendix does not form a part of the URI specification and
   should not be considered as anything more than a descriptive example.

   HTML defines a special element "BASE" which, when present in the
   "HEAD" portion of a document, signals that the parser should use the
   BASE element's "HREF" attribute as the base URI for resolving any
   relative URI.  The "HREF" attribute must be an absolute URI.  Note
   that, in HTML, element and attribute names are case-insensitive.  For
   example:

      <!doctype html public "-//IETF//DTD HTML//EN">
      <HTML><HEAD>
      <TITLE>An example HTML document</TITLE>
      <BASE href="http://www.ics.uci.edu/Test/a/b/c">
      </HEAD><BODY>
      ... <A href="../x">a hypertext anchor</A> ...
      </BODY></HTML>

   A parser reading the example document should interpret the given
   relative URI "../x" as representing the absolute URI

      <http://www.ics.uci.edu/Test/a/x>

   regardless of the context in which the example document was obtained.

E. Recommendations for Delimiting URI in Context

   URI are often transmitted through formats that do not provide a clear
   context for their interpretation.  For example, there are many
   occasions when URI are included in plain text; examples include text
   sent in electronic mail, USENET news messages, and, most importantly,
   printed on paper.  In such cases, it is important to be able to
   delimit the URI from the rest of the text, and in particular from
   punctuation marks that might be mistaken for part of the URI.

   In practice, URI are delimited in a variety of ways, but usually
   within double-quotes "http://test.com/", angle brackets
   <http://test.com/>, or just using whitespace

      http://test.com/

   These wrappers do not form part of the URI.
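
   The wrappers described above can be removed mechanically.  The
   following Python sketch is illustrative only and is not part of this
   specification; the function name strip_wrappers is an assumption
   made for the example, and it removes at most one layer of
   double-quotes or angle brackets after trimming surrounding
   whitespace.

```python
def strip_wrappers(text: str) -> str:
    """Remove surrounding whitespace and one layer of "..." or <...>.

    Hypothetical helper for illustration; the name and behavior are
    assumptions, not something defined by the specification.
    """
    s = text.strip()
    # Each pair is (opening delimiter, closing delimiter).
    for opener, closer in (('"', '"'), ("<", ">")):
        if len(s) >= 2 and s.startswith(opener) and s.endswith(closer):
            return s[1:-1]
    return s


if __name__ == "__main__":
    for sample in ('"http://test.com/"', "<http://test.com/>", "  http://test.com/  "):
        print(strip_wrappers(sample))
```

   All three sample inputs yield the same string, since none of the
   wrappers is part of the URI itself.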

   In the case where a fragment identifier is associated with a URI
   reference, the fragment would be placed within the brackets as well
   (separated from the URI with a "#" character).

   In some cases, extra whitespace (spaces, linebreaks, tabs, etc.) may
   need to be added to break long URI across lines.  The whitespace
   should be ignored when extracting the URI.

   No whitespace should be introduced after a hyphen ("-") character.
   Because some typesetters and printers may (erroneously) introduce a
   hyphen at the end of line when breaking a line, the interpreter of a
   URI containing a line break immediately after a hyphen should ignore
   all unescaped whitespace around the line break, and should be aware
   that the hyphen may or may not actually be part of the URI.

   Using <> angle brackets around each URI is especially recommended as
   a delimiting style for URI that contain whitespace.

   The prefix "URL:" (with or without a trailing space) was recommended
   as a way to help distinguish a URL from other bracketed designators,
   although this is not common in practice.

   For robustness, software that accepts user-typed URI should attempt
   to recognize and strip both delimiters and embedded whitespace.

   For example, the text:

      Yes, Jim, I found it under "http://www.w3.org/Addressing/",
      but you can probably pick it up from <ftp://ds.internic.net/rfc/>.
      Note the warning in <http://www.ics.uci.edu/pub/ietf/uri/historical.html#WARNING>.

   contains the URI references

      http://www.w3.org/Addressing/
      ftp://ds.internic.net/rfc/
      http://www.ics.uci.edu/pub/ietf/uri/historical.html#WARNING

F. Abbreviated URLs

   The URL syntax was designed for unambiguous reference to network
   resources and extensibility via the URL scheme.  However, as URL
   identification and usage have become commonplace, traditional media
   (television, radio, newspapers, billboards, etc.) have increasingly
   used abbreviated URL references.  That is, a reference consisting of
   only the authority and path portions of the identified resource, such
   as

      www.w3.org/Addressing/

   or simply the DNS hostname on its own.  Such references are primarily
   intended for human interpretation rather than machine, with the
   assumption that context-based heuristics are sufficient to complete
   the URL (e.g., most hostnames beginning with "www" are likely to have
   a URL prefix of "http://").  Although there is no standard set of
   heuristics for disambiguating abbreviated URL references, many client
   implementations allow them to be entered by the user and
   heuristically resolved.  It should be noted that such heuristics may
   change over time, particularly when new URL schemes are introduced.

   Since an abbreviated URL has the same syntax as a relative URL path,
   abbreviated URL references cannot be used in contexts where relative
   URLs are expected.  This limits the use of abbreviated URLs to places
   where there is no defined base URL, such as dialog boxes and off-line
   advertisements.
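
   As an illustration of the kind of heuristic mentioned above, the
   following Python sketch completes an abbreviated URL from a small
   prefix table.  It is not a standard algorithm, since none exists.
   Only the "www" to "http://" pairing comes from this appendix; the
   "ftp." entry, the table name ASSUMED_PREFIXES, and the function name
   complete_abbreviated are assumptions made for the example.

```python
from typing import Optional

# Hypothetical heuristic table: (hostname prefix, URL prefix to prepend).
# Only the "www" pairing is mentioned in the text; "ftp." is an assumption.
ASSUMED_PREFIXES = (
    ("www.", "http://"),
    ("ftp.", "ftp://"),
)


def complete_abbreviated(ref: str) -> Optional[str]:
    """Guess a full URL for an abbreviated reference, or return None.

    References that already contain "://" are returned unchanged.
    """
    if "://" in ref:
        return ref
    lowered = ref.lower()
    for host_prefix, url_prefix in ASSUMED_PREFIXES:
        if lowered.startswith(host_prefix):
            return url_prefix + ref
    # No heuristic applies; a real client might ask the user instead.
    return None


if __name__ == "__main__":
    print(complete_abbreviated("www.w3.org/Addressing/"))
```

   Such a function is only meaningful where no base URL is defined; in
   any other context the same input would have to be treated as a
   relative path.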

G. Summary of Non-editorial Changes

G.1. Additions

   Section 4 (URI References) was added to stem the confusion regarding
   "what is a URI" and how to describe fragment identifiers given that
   they are not part of the URI, but are part of the URI syntax and
   parsing concerns.  In addition, it provides a reference definition
   for use by other IETF specifications (HTML, HTTP, etc.) that have
   previously attempted to redefine the URI syntax in order to account
   for the presence of fragment identifiers in URI references.

   Section 2.4 was rewritten to clarify a number of misinterpretations
   and to leave room for fully internationalized URI.

   Appendix F on abbreviated URLs was added to describe the shortened
   references often seen on television and magazine advertisements and
   explain why they are not used in other contexts.

G.2. Modifications from both RFC 1738 and RFC 1808

   Changed to URI syntax instead of just URL.

   Confusion regarding the terms "character encoding", the URI
   "character set", and the escaping of characters with %<hex><hex>
   equivalents has (hopefully) been reduced.  Many of the BNF rule names
   regarding the character sets have been changed to more accurately
   describe their purpose and to encompass all "characters" rather than
   just US-ASCII octets.  Unless otherwise noted here, these
   modifications do not affect the URI syntax.

   Both RFC 1738 and RFC 1808 refer to the "reserved" set of characters
   as if URI-interpreting software were limited to a single set of
   characters with a reserved purpose (i.e., as meaning something other
   than the data to which the characters correspond), and that this set
   was fixed by the URI scheme.  However, this has not been true in
   practice; any character that is interpreted differently when it is
   escaped is, in effect, reserved.  Furthermore, the interpreting
   engine on an HTTP server is often dependent on the resource, not just
   the URI scheme.  The description of reserved characters has been
   changed accordingly.

   The plus "+", dollar "$", and comma "," characters have been added to
   those in the "reserved" set, since they are treated as reserved
   within the query component.

   The tilde "~" character was added to those in the "unreserved" set,
   since it is extensively used on the Internet in spite of the
   difficulty of transcribing it with some keyboards.

   The syntax for URI scheme has been changed to require that all
   schemes begin with an alpha character.

   The "user:password" form in the previous BNF was changed to a
   "userinfo" token, and the possibility that it might be
   "user:password" made scheme specific.  In particular, the use of
   passwords in the clear is not even suggested by the syntax.

   The question-mark "?" character was removed from the set of allowed
   characters for the userinfo in the authority component, since testing
   showed that many applications treat it as reserved for separating the
   query component from the rest of the URI.
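
   The effect of treating "?" as the query delimiter can be seen by
   applying the regular expression from Appendix B.  The following
   Python sketch is illustrative only; the function name
   split_uri_reference and the sample reference "ftp://user?name@host/path"
   are assumptions made for the example.  Components that are not
   present are reported as None.

```python
import re

# The regular expression given in Appendix B, unchanged.
URI_REFERENCE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_uri_reference(ref: str) -> dict:
    """Split a URI reference into the five parts named in Appendix B.

    Every part of the pattern is optional, so the match always succeeds.
    """
    m = URI_REFERENCE.match(ref)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


if __name__ == "__main__":
    # The "?" ends the authority, so "name@host/path" becomes the query.
    print(split_uri_reference("ftp://user?name@host/path"))
    print(split_uri_reference("http://www.ics.uci.edu/pub/ietf/uri/#Related"))
```

   For the first reference the authority is only "user", the path is
   empty, and everything after the "?" is taken as the query, which is
   the behavior that motivated removing "?" from userinfo.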

   The semicolon ";" character was added to those stated as being
   reserved within the authority component, since several new schemes
   are using it as a separator within userinfo to indicate the type of
   user authentication.

   RFC 1738 specified that the path was separated from the authority
   portion of a URI by a slash.  RFC 1808 followed suit, but with a
   fudge of carrying around the separator as a "prefix" in order to
   describe the parsing algorithm.  RFC 1630 never had this problem,
   since it considered the slash to be part of the path.  In writing
   this specification, it was found to be impossible to accurately
   describe and retain the difference between the two URI
      <foo:/bar>   and   <foo:bar>
   without either considering the slash to be part of the path (as
   corresponds to actual practice) or creating a separate component just
   to hold that slash.  We chose the former.

G.3. Modifications from RFC 1738

   The definition of specific URL schemes and their scheme-specific
   syntax and semantics has been moved to separate documents.

   The URL host was defined as a fully-qualified domain name.  However,
   many URLs are used without fully-qualified domain names (in contexts
   for which the full qualification is not necessary), without any host
   (as in some file URLs), or with a host of "localhost".

   The URL port is now *digit instead of 1*digit, since systems are
   expected to handle the case where the ":" separator between host and
   port is supplied without a port.

   The recommendations for delimiting URI in context (Appendix E) have
   been adjusted to reflect current practice.

G.4. Modifications from RFC 1808

   RFC 1808 (Section 4) defined an empty URL reference (a reference
   containing nothing aside from the fragment identifier) as being a
   reference to the base URL.  Unfortunately, that definition could be
   interpreted, upon selection of such a reference, as a new retrieval
   action on that resource.  Since the normal intent of such references
   is for the user agent to change its view of the current document to
   the beginning of the specified fragment within that document, not to
   make an additional request of the resource, a description of how to
   correctly interpret an empty reference has been added in Section 4.

   The description of the mythical Base header field has been replaced
   with a reference to the Content-Location header field defined by
   MHTML [RFC2110].

   RFC 1808 described various schemes as either having or not having the
   properties of the generic URI syntax.  However, the only requirement
   is that the particular document containing the relative references
   have a base URI that abides by the generic URI syntax, regardless of
   the URI scheme, so the associated description has been updated to
   reflect that.

   The BNF term <net_loc> has been replaced with <authority>, since the
   latter more accurately describes its use and purpose.  Likewise, the
   authority is no longer restricted to the IP server syntax.

   Extensive testing of current client applications demonstrated that
   the majority of deployed systems do not use the ";" character to
   indicate trailing parameter information, and that the presence of a
   semicolon in a path segment does not affect the relative parsing of
   that segment.  Therefore, parameters have been removed as a separate
   component and may now appear in any path segment.
   Their influence has been removed from the algorithm for resolving a
   relative URI reference.  The resolution examples in Appendix C have
   been modified to reflect this change.

   Implementations are now allowed to work around malformed relative
   references that are prefixed by the same scheme as the base URI, but
   only for schemes known to use the <hier_part> syntax.

H. Full Copyright Statement

   Copyright (C) The Internet Society (1998).  All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works.  However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.