Namespaces and Prefixes

Namespaces are beautiful but also have a dark side. Let me explain.

Distribution of responsibility

The most important benefit of using namespaces is the distribution of responsibility for how the names within them are used. Many protocols combine a computer name or domain name with some extra arguments, to separate the part which is globally organized from the part which is free to arrange.

For instance, in the URL https://meshy.space/explain/namespaces/, the part https://meshy.space (RFC2616, scheme, host, and port) is understood by all applications world-wide as how-to-connect-to-what. The part /explain/namespaces/ (the path) is used by the contacted remote server to do something. The server decides independently what it means: it has responsibility over what happens inside its namespace.
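
As a small illustration of this split, the standard Python library can take the URL apart. This is only a sketch: the comments about who interprets which part follow the reasoning above, they are not something the library enforces.

```python
from urllib.parse import urlsplit

parts = urlsplit("https://meshy.space/explain/namespaces/")

# Globally understood part: how to connect, and to what.
print(parts.scheme)    # https
print(parts.hostname)  # meshy.space
print(parts.port)      # None: no explicit port, the scheme implies the default

# Locally interpreted part: only the contacted server knows what it means.
print(parts.path)      # /explain/namespaces/
```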

Another example is the use of namespaces in XML (see also our pages about namespace in XML and its problematic design). In XML, the namespace is used to avoid collisions of names defined by different specifications.

The generic need for distributed labeling of things has led to URNs (RFC8141, Uniform Resource Names). These labels contain a multi-leveled namespace organization, where each next component makes rules for the remaining part of the string. For instance, in urn:ietf:rfc:2648, the IETF organization describes that rfc contains a number, but may add other sub-namespaces with different rules. These are nested namespaces.
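
The nesting can be sketched as a chain of handlers, where each level strips one component and hands the remainder to whoever is responsible for it. The function names and the returned dictionary are made up for this illustration; only the string urn:ietf:rfc:2648 comes from the text above.

```python
def parse_ietf_rfc(rest: str) -> dict:
    # Inside "rfc", the rule is: the remainder is a number.
    return {"series": "rfc", "number": int(rest)}


def parse_ietf(rest: str) -> dict:
    # The "ietf" namespace decides which sub-namespaces exist.
    sub, _, remainder = rest.partition(":")
    handlers = {"rfc": parse_ietf_rfc}  # other sub-namespaces could be added here
    if sub not in handlers:
        raise ValueError(f"unknown ietf sub-namespace: {sub!r}")
    return {"namespace": "ietf", **handlers[sub](remainder)}


def parse_urn(urn: str) -> dict:
    # The top level only knows "urn:" and the list of registered namespaces.
    scheme, _, rest = urn.partition(":")
    if scheme.lower() != "urn":
        raise ValueError("not a URN")
    nid, _, remainder = rest.partition(":")
    handlers = {"ietf": parse_ietf}
    if nid.lower() not in handlers:
        raise ValueError(f"unknown URN namespace: {nid!r}")
    return handlers[nid.lower()](remainder)


result = parse_urn("urn:ietf:rfc:2648")
print(result)
```

No level needs to know the rules of the levels below it: the top-level parser never looks at "rfc" or at the number.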

Distributed responsibility also causes problems: outsiders may need to guess how the server has organized things. For instance, a search engine collects web-pages. Is path /abc/ the same as /abc/index.html or /abc/index.cgi? (There are more possibilities.) Especially the use of query parameters in HTTP URLs causes many problems for web-crawlers which source web-pages for search engines. Workarounds emerged in the form of sitemaps and the "canonical" reference inside HTML.

Accidental collisions in names

Without namespaces, you can easily get into the situation that two unrelated things bump into each other. The first photos from two different cameras (of the same brand) are both named PIC00000.jpg. Better not write them in the same folder.

Some applications change the name of one of the photos, for instance by adding "(2)" when the name already exists. With such ad-hoc tricks, you may avoid the use of namespaces, but these tricks produce inconsistent behavior.

When you have a single naming source, it is possible to enforce uniqueness without namespaces and without tricks. Our letters use postal codes. Their implicit "namespace" indicator is the expected location in an address on the envelope, or the word "Postal-code" on a form. The government assigns everyone a unique identification number, in a namespace called "Social Security Number", "Personal Identifier", or whatever your country names its set. IANA assigns identifying mime-types, to be able to interpret rows of octets. But these all require a tedious central registration process. In the computer space, you want to avoid central orchestration whenever possible.

There is a wide variety of ways to avoid accidental collisions via namespaces. We use folders (directories), which can each contain a file with the same name without conflict. In computer software, people use packages or objects to restrict the visibility of names, which reduces the chance of accidents. And so on.
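
The photo collision and its cure can be mimicked in a few lines. This is only an in-memory sketch: the dictionaries stand in for folders, and the camera names are invented for the example.

```python
# One flat "folder": the second photo silently replaces the first.
photos_flat = {}
photos_flat["PIC00000.jpg"] = "first photo from camera A"
photos_flat["PIC00000.jpg"] = "first photo from camera B"

# One "folder" per camera: each folder acts as a namespace for its own names.
photos_by_camera = {"camera-a": {}, "camera-b": {}}
photos_by_camera["camera-a"]["PIC00000.jpg"] = "first photo from camera A"
photos_by_camera["camera-b"]["PIC00000.jpg"] = "first photo from camera B"

kept_flat = len(photos_flat)
kept_namespaced = sum(len(folder) for folder in photos_by_camera.values())
print(kept_flat, kept_namespaced)  # 1 2
```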

Summarized: when you need to avoid collisions in names, have no wish for centralized orchestration, and you do not want to implement tricks with inconsistent results, then you should introduce namespaces.

Meaningless and Meaningful Namespaces

These are opposites: http URLs explicitly tell everyone which server must be contacted to ask for data, whereas URNs do not express anything other than uniqueness. However, the urn:isbn: namespace is meaningful in a library context.

In XML, namespaces are (ab)used for multiple things at the same time. A namespace SHOULD contain a domain-name you own, it usually contains the name of the service or purpose, and it often contains a version number of the specification: http://www.w3.org/2001/XMLSchema.

The domain-name makes uniqueness simple, although it is not enforced nor verifiable. The version number in the namespace is a horrible mistake, blocking interface improvements. More about XML mistakes.

In the early years, the XML namespace could also be used to pull the schema file which described the namespace. Sometimes it only led to a page with documentation. At the moment, this practice is rarely followed: organizations change servers, change their name, have websites which do not support it, and so on... but changing a namespace is hard.

Prefixes

Where namespaces are really useful, they are usually quite long. When your document refers to the same namespace hundreds of times, it gets uselessly large. As a kind of compression, protocols often support the use of prefixes for namespaces: short abbreviations. The same namespace may be referred to by multiple abbreviations.

Prefixes are limited to a context, to avoid collisions of abbreviations. They are totally free to pick because they have no meaning by themselves. Therefore, keeping abbreviations unique in a context is not too hard.

Example with XML: the message contains

<x:home xmlns:x="https://example.com/addresses">
   <x:postal-code>6815BH</x:postal-code>
</x:home>

In the example, the namespace is https://example.com/addresses. The namespace refers to a loaded 'schema', a file which describes home and postal-code. Fortunately, we could declare (xmlns) prefix x as abbreviation for the namespace, which otherwise had to be repeated in full four times.

Internally in applications, the prefix is immediately converted into the namespace. They SHOULD NOT continue to work with the prefix, because different prefixes may exist for the same namespace. The example above would describe exactly the same when prefix "y" was used consistently.
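
Python's standard xml.etree.ElementTree parser behaves this way, which makes it a convenient illustration: after parsing, each tag is stored as {namespace}name and the prefix no longer plays a role. The second document below, using prefix "y", is added for the comparison and is not part of the message above.

```python
import xml.etree.ElementTree as ET

doc_x = """<x:home xmlns:x="https://example.com/addresses">
   <x:postal-code>6815BH</x:postal-code>
</x:home>"""

# Same content, different prefix.
doc_y = """<y:home xmlns:y="https://example.com/addresses">
   <y:postal-code>6815BH</y:postal-code>
</y:home>"""

root_x = ET.fromstring(doc_x)
root_y = ET.fromstring(doc_y)

# The parser has replaced the prefix by the full namespace.
print(root_x.tag)      # {https://example.com/addresses}home
print(root_x[0].tag)   # {https://example.com/addresses}postal-code

# Both documents describe exactly the same elements.
same = (root_x.tag == root_y.tag) and (root_x[0].tag == root_y[0].tag)
print(same)            # True
```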

Namespaces in Meshy Space

As you could have guessed by now: Meshy Space Namespaces are a bit different. A Namespace is used for uniquely identifying and contacting the source, but does not reflect versions.

The Namespace string refers to a Unit, which describes how the namespace is organized. It contains access points to the daemons which manage the Collections within the namespace. It also refers to the Rules of the root Collection.

Connecting to a Namespace (a server) starts negotiations: where do the capabilities of client and server match, and what does the server provide? This includes versions of schemas, compression and signature algorithms. It may include authentication keys.

Meshy Space Namespaces support redirection, like symbolic links. This offers an easy way to move a Collection.

The path to download a Unit is (simplified) like this:

take the prefix from the element;
look up the prefix in the contextual table which translates it into a namespace string;
when there is no cached version of the Unit which belongs to the namespace string, then use it as URL to contact the server to provide the Namespace Unit;
when the Namespace Unit is a redirect (namespace rewrite), follow that namespace string;
the Namespace Unit describes the Resources: where copies of the Collection can be found, like a mirror list on ftp-servers. Pick the fastest one nearby, if that can be guessed;
use the Resource to get to the Collection Rules, which will lead you to the element.
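
The first five steps above can be sketched roughly as follows. Everything in this sketch is an assumption made for illustration: the Unit class, its fields, the REMOTE table which stands in for the network, the latency numbers and the redirect limit are all hypothetical and do not describe the actual Meshy Space data model. The last step, using the Collection Rules, is left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Unit:
    # Hypothetical Namespace Unit: either a redirect or a list of Resources.
    redirect: Optional[str] = None
    resources: list = field(default_factory=list)  # (location, estimated latency in ms)


# Stand-in for the servers which would be contacted over the network.
REMOTE = {
    "https://old.example/ns": Unit(redirect="https://new.example/ns"),
    "https://new.example/ns": Unit(resources=[
        ("https://mirror1.example/collection", 80.0),
        ("https://mirror2.example/collection", 20.0),
    ]),
}

cache = {}    # namespace string -> Unit
fetched = []  # records which namespaces really needed a "download"


def get_unit(namespace: str) -> Unit:
    # Only contact the server when there is no cached Unit.
    if namespace not in cache:
        fetched.append(namespace)
        cache[namespace] = REMOTE[namespace]
    return cache[namespace]


def resolve(prefix: str, prefixes: dict, max_redirects: int = 5) -> str:
    namespace = prefixes[prefix]          # prefix -> namespace string
    unit = get_unit(namespace)
    hops = 0
    while unit.redirect is not None:      # follow namespace rewrites
        hops += 1
        if hops > max_redirects:
            raise RuntimeError("too many namespace redirects")
        unit = get_unit(unit.redirect)
    # Pick the Resource which is expected to be the fastest.
    location, _latency = min(unit.resources, key=lambda r: r[1])
    return location


location = resolve("x", {"x": "https://old.example/ns"})
print(location)  # https://mirror2.example/collection
```

A second call to resolve() with the same prefix table is answered from the cache without any new entry in fetched.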

Strong caching of Units will make this work fast enough.

mark@overmeer.net Web-pages generated on 2023-02-03