Uniform Resource Identifier (URI)

What is a Uniform Resource Identifier (URI)?

A Uniform Resource Identifier (URI) is a character sequence that identifies a logical (abstract) or physical resource, which is usually, but not always, connected to the internet. A URI distinguishes one resource from another.

URIs let internet protocols mediate interactions between these resources. A URI is built from components, such as a scheme name and a path, that together serve as the identifier.

In the URI, the file path may be empty.

A Uniform Resource Locator (URL), or web address, is the most common form of URI. It is used for unambiguously identifying and locating websites or other web-connected resources.


How Uniform Resource Identifiers work

A URI provides a simple, extensible way to identify internet resources. Thanks to the uniformity that URIs provide, different types of resource identifiers can be used in the same context, regardless of the mechanisms used to access those resources.

The resource identifiers can also be reused in different contexts.

URIs can identify different types of resources, including:

  • electronic documents
  • webpages
  • images
  • information sources with a consistent purpose

Every URL is also a Uniform Resource Identifier, but not every URI is a Uniform Resource Locator.

URIs and their generic syntax are defined in the Internet Engineering Task Force (IETF) Request for Comments (RFC) 3986. According to this specification, the identified resources do not have to be accessible on the internet.

They are also summarized and extended in a W3C document for the W3C’s “World Wide Web project,” authored by Tim Berners-Lee.

Uniform Resource Identifier syntax

The generic form of a URI is

scheme:[//[user:password@]host[:port]][/]path[?query][#fragment]
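
As an illustration of the generic form, the following minimal sketch splits a URI into its five generic components with Python’s standard urllib.parse module. The example URI and the helper name split_uri are assumptions made for this sketch, not part of any specification.

```python
from urllib.parse import urlsplit, urlunsplit


def split_uri(uri: str) -> dict:
    """Split a URI into the five generic components."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,  # user info, host and port together
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


# Hypothetical URI chosen so that every component is present.
example = "https://user:secret@example.com:8443/docs/index.html?lang=en#intro"
components = split_uri(example)

# Reassembling the split parts gives back the original string for this input.
rebuilt = urlunsplit(urlsplit(example))

for name, value in components.items():
    print(f"{name:>9}: {value}")
```

Components that are absent from a URI come back as empty strings, which matches the optional brackets in the generic form.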

A URI may consist of the following elements:

Scheme

Within the URI, the first element is the scheme name. Schemes are case-insensitive and separated from the rest of the URI by a colon. The scheme establishes the concrete syntax and associated protocols for the URI.

Ideally, URI schemes should be registered with the Internet Assigned Numbers Authority (IANA), although nonregistered schemes can also be used.

Example

If the URI is telnet://192.0.2.16:80, the scheme name is “telnet.”
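
A minimal sketch of extracting the scheme, assuming Python’s standard urllib.parse module. The helper name scheme_of is hypothetical. Because schemes are case-insensitive, the parser returns the scheme in lowercase.

```python
from urllib.parse import urlsplit


def scheme_of(uri: str) -> str:
    """Return the scheme in lowercase, or an empty string if there is none."""
    return urlsplit(uri).scheme


lower = scheme_of("telnet://192.0.2.16:80")
upper = scheme_of("TELNET://192.0.2.16:80")

# Both spellings name the same scheme.
print(lower, upper, lower == upper)
```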

Authority

The URI’s authority component is made up of multiple parts: a host consisting of either a registered name or an IP address, an optional authentication section and an optional port number.

The authentication section contains the username and password, separated by a colon and followed by an at sign (@). After the @ comes the host, optionally followed by a colon and a port number. IPv4 addresses are written in dot-decimal notation. IPv6 addresses are written in hexadecimal form and must be enclosed in brackets.

An example of the different segments of an IPv6 address.

Path

The path is a sequence of segments separated by slashes, which imply a hierarchical structure. If an authority is present, the path must either be empty or begin with a single slash. If no authority is present, the path cannot begin with a double slash. The path may closely resemble a file system path, but it does not necessarily map to one.

In the previous URI example (telnet://192.0.2.16:80), the scheme name is “telnet.” The characters after the double slash constitute the authority: the host 192.0.2.16 and the port 80. Because nothing follows the authority, the path is empty.
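
The authority and path can be inspected programmatically. The sketch below assumes Python’s standard urllib.parse module. The helper name authority_parts and the FTP URI with an IPv6 host are hypothetical examples.

```python
from urllib.parse import urlsplit


def authority_parts(uri: str) -> dict:
    """Break the authority into its subparts and return the path alongside."""
    parts = urlsplit(uri)
    return {
        "username": parts.username,  # None when no user info is given
        "password": parts.password,
        "host": parts.hostname,      # brackets around IPv6 hosts are removed
        "port": parts.port,          # int, or None when no port is given
        "path": parts.path,
    }


# The telnet example from the text: no user info and an empty path.
ipv4 = authority_parts("telnet://192.0.2.16:80")

# Hypothetical URI with user info, a bracketed IPv6 host and a path.
ipv6 = authority_parts("ftp://user:secret@[2001:db8::1]:2121/pub/readme.txt")

print(ipv4)
print(ipv6)
```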

Query (optional)

The query contains a string of non-hierarchical data. It is often a sequence of attribute-value pairs separated by a delimiter, such as an ampersand (&) or semicolon. A question mark separates the query from the part that comes before it.

The string represents some operation applied to a “queryable” object by the URI.

Example

In the URI

foo://techtarget.com:8042/over/there?name=parrot#beak

the query is name=parrot. The fragment, beak, follows the hash sign and is not part of the query.

However, because this part of the syntax is optional, it may not always be present.
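
The following sketch, which assumes Python’s standard urllib.parse module, shows where the query ends and the fragment begins in the example URI, and how attribute-value pairs can be decoded. The variable names are illustrative.

```python
from urllib.parse import parse_qs, urlsplit

uri = "foo://techtarget.com:8042/over/there?name=parrot#beak"
parts = urlsplit(uri)

# The query runs from the question mark up to, but not including, the hash.
query_string = parts.query

# parse_qs decodes attribute-value pairs; each key maps to a list of values.
pairs = parse_qs(query_string)

# A URI without a question mark has an empty query.
no_query = urlsplit("foo://techtarget.com:8042/over/there").query

print(query_string, pairs, repr(no_query))
```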

Fragment (optional)

The fragment contains an identifier that provides direction to a secondary resource. It is separated from the preceding part of the URI by a hash (#).

If the primary resource is an HTML document or article, the fragment may be an ID attribute of a specific element of that resource. In this case, a web browser will scroll this particular element into view.

However, if the fragment identifier is empty, the URI refers to the whole resource. In this case, the hash sign may be omitted.
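
A small sketch of separating the fragment from the rest of a URI, assuming Python’s standard urllib.parse module. The example.com addresses are placeholders.

```python
from urllib.parse import urldefrag

# A fragment that points at one element of the primary resource.
with_fragment = urldefrag("https://example.com/article.html#syntax")

# An empty fragment: the URI refers to the whole resource.
empty_fragment = urldefrag("https://example.com/article.html#")

print(with_fragment.url, with_fragment.fragment)
print(empty_fragment.url, repr(empty_fragment.fragment))
```

In both cases the url field holds the primary resource without the hash sign.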

Types of Uniform Resource Identifiers

Uniform Resource Locators (URLs) and Uniform Resource Names (URNs) are two types of URI.

Uniform Resource Locator (URL)

A URL is used to identify and locate webpages.

A URI identifies a resource but does not imply or guarantee access to it. A URL, however, not only identifies the resource, but also specifies how it can be accessed or where it is located. This is why a URL contains unique components, such as the protocol, domain and/or subdomain, in addition to other URI components.

A URL is a subset of URIs. This means all URLs are URIs.

However, not all URIs are URLs.

A URL begins by stating the protocol that should be used to access and locate the logical or physical resource on a particular network.

Therefore:

  • If the resource is a webpage, the URL starts with the protocol HTTP or HTTPS.
  • If the resource is a file on an FTP server, the URL starts with the protocol FTP.
  • For an email address, the URL starts with the scheme “mailto.”
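
The list above can be sketched as a lookup keyed on the scheme. This is only an illustration: the mapping ACCESS_BY_SCHEME, the helper describe and the example.com addresses are assumptions for this sketch and cover only the schemes named in the list.

```python
from urllib.parse import urlsplit

# Illustrative mapping; real applications handle many more schemes.
ACCESS_BY_SCHEME = {
    "http": "webpage over HTTP",
    "https": "webpage over HTTPS",
    "ftp": "file over FTP",
    "mailto": "email address",
}


def describe(url: str) -> str:
    """Describe how a URL is accessed, based only on its scheme."""
    scheme = urlsplit(url).scheme
    return ACCESS_BY_SCHEME.get(scheme, "unknown scheme")


for url in (
    "https://www.w3.org/Addressing/URL/uri-spec.html",
    "ftp://ftp.example.com/pub/file.txt",   # hypothetical FTP server
    "mailto:someone@example.com",           # hypothetical address
):
    print(url, "->", describe(url))
```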

A URL is a location-dependent URI that may or may not be persistent. This means that if the resource’s location changes, the URL also changes to reflect and point to the new location.

URL examples:

https://whatis.techtarget.com/definition/URI-Uniform-Resource-Identifier

https://datatracker.ietf.org/doc/html/rfc3986

https://www.w3.org/Addressing/URL/uri-spec.html

Uniform Resource Name (URN)

Like a URL, a URN identifies a resource. However, unlike a URL, a URN is location-independent and persistent, meaning that it always identifies the same resource over time. A URN continues to persist even when the resource no longer exists or becomes unavailable.

A URN does not state which protocol should be used to locate and access the resource. Instead, it labels the resource with a persistent, location-independent and unique identifier.

A URN has three components:

  • The label “urn”
  • A colon
  • A character string as the unique identifier

URN examples (provided by IETF RFC 3986):

  • urn:oasis:names:specification:docbook:dtd:xml:4.1.2
  • urn:example:animal:ferret:nose
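
The three-part structure described above can be checked with a few lines of Python. The helper parse_urn is a hypothetical name, and the sketch only separates the label from the identifier string. It does not validate the identifier against any URN namespace rules.

```python
def parse_urn(urn: str):
    """Split a URN into its "urn" label and its identifier string."""
    label, colon, identifier = urn.partition(":")
    # The label must be "urn" (in any letter case) and an identifier must follow.
    if label.lower() != "urn" or not colon or not identifier:
        raise ValueError(f"not a URN: {urn!r}")
    return label.lower(), identifier


label, identifier = parse_urn("urn:example:animal:ferret:nose")
print(label, identifier)
```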

URI vs. URL

Although often used interchangeably, the terms “URI” and “URL” are different. A URI is an identifier of a specific resource, while a URL is a special type of identifier that both identifies a resource and specifies how it can be accessed.

The analogy of a person’s name and address can explain this difference. In this case, the name is the URI because it identifies the person. However, it doesn’t explain how the person can be found or where they live. For this, the address or URL is required.

Moreover, a URI can be used to identify and differentiate various types of files and resources, including HTML and XML, from each other. URLs, by contrast, identify only resources that can be located, such as webpages and other network-accessible resources. If an identifier states or implies a protocol, such as FTP or HTTPS, for reaching a domain, it is called a URL, even though it is also a URI.

Uniform Resource Identifier resolution and references

Two additional aspects of Uniform Resource Identifiers are resolution and references.

URI resolution is one of a few common operations performed on URIs that are also URLs. It involves determining the proper data access method and parameters needed to locate and retrieve the resource that the URI represents.

A URI reference denotes the most common usage of a resource identifier. It may take the form of a full URI, part of a URI or an empty string. If a fragment identifier is present, it identifies part of the resource referred to by the rest of the URI.

A URI reference is either a URI or a relative reference. If the reference’s prefix does not match the syntax of a scheme followed by its colon separator, it is a relative reference. A reference is typically parsed first into the generic URI components to determine which are present and whether the reference is relative. Each component is then parsed for its subparts and validated.
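
To make relative references concrete, the sketch below resolves a few references against a base URI with Python’s standard urllib.parse module. The base address and the helper names is_relative and resolve are assumptions for this example. Note that this kind of resolution only turns a reference into a full URI; it does not retrieve the resource.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical base URI against which references are resolved.
BASE = "https://example.com/docs/guide/intro.html"


def is_relative(reference: str) -> bool:
    """A reference is relative when it does not start with a scheme and colon."""
    return urlsplit(reference).scheme == ""


def resolve(reference: str, base: str = BASE) -> str:
    """Turn a URI reference into a full URI using the base."""
    return urljoin(base, reference)


for ref in ("../api/index.html", "#syntax", "", "https://www.w3.org/"):
    print(repr(ref), is_relative(ref), resolve(ref))
```

An empty reference resolves to the base itself, and a reference that is already a full URI is returned unchanged.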
