Namespaces and Prefixes

Namespaces are beautiful but also have a dark side. Let me explain.

What are namespaces?

Many protocols use a computer name or domain name, plus some extra arguments, to separate the part which is globally organized from the part which is free to arrange. For instance, in the URL https://meshy.space/explain/namespaces/, the part https://meshy.space (scheme, host, and port; see RFC2616) is understood by all applications world-wide as how-to-connect-to-what. The part /explain/namespaces/ (the path) is interpreted by the contacted remote server. The server decides independently what it means: it has responsibility over what happens inside its namespace.
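A minimal Python sketch of this split, using the standard urllib.parse.urlsplit. Treating scheme plus netloc (host and optional port) as the globally organized part follows the paragraph above; the function name split_url is only illustrative.

```python
from urllib.parse import urlsplit


def split_url(url: str) -> tuple[str, str]:
    """Split a URL into the globally understood part and the server's own namespace."""
    parts = urlsplit(url)
    # scheme and netloc (host plus optional port): how-to-connect-to-what, for everyone
    global_part = f"{parts.scheme}://{parts.netloc}"
    # the path: only the contacted server decides what it means
    return global_part, parts.path


print(split_url("https://meshy.space/explain/namespaces/"))
```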

Another example is the use of namespaces in XML (see also our pages about namespaces in XML and their problematic design). In XML, namespaces are used to avoid collisions between names defined by different specifications.

The generic need for distributed labeling of things has led to URNs (RFC8141, Uniform Resource Names). These labels have a multi-level namespace organization, where each component makes the rules for the remaining part of the string. For instance, in urn:ietf:rfc:2648, the IETF defines that what follows rfc is a number, but it may add other sub-namespaces with different rules. Nested namespaces.
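A small sketch of this nesting, assuming every level separates itself from the rest with a colon. RFC8141 itself only fixes the first level (the namespace identifier, here ietf); how the remainder is split is up to each namespace, so this holds for urn:ietf:rfc:2648 but not for every URN. The function urn_levels is hypothetical.

```python
def urn_levels(urn: str) -> list[tuple[str, str]]:
    """List (namespace so far, remainder) pairs, outermost first, assuming
    every level separates itself from the rest with a colon."""
    scheme, _, rest = urn.partition(":")
    if scheme.lower() != "urn":
        raise ValueError(f"not a URN: {urn!r}")
    levels = []
    prefix = scheme
    while ":" in rest:
        component, _, rest = rest.partition(":")
        prefix = f"{prefix}:{component}"
        # from here on, `rest` is interpreted under the rules of `prefix`
        levels.append((prefix, rest))
    return levels


for namespace, remainder in urn_levels("urn:ietf:rfc:2648"):
    print(f"{namespace} decides what {remainder!r} means")
```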

Distribution of responsibility

The most important benefit of namespaces is the distribution of responsibility for how the names within them are used. With Meshy Space, it is not only about names: it is also about activity, about who can do what and where.

Distributed responsibility also causes problems: outsiders may need to guess how the server has organized things. For instance, a search engine collects web-pages. Is path /abc/ the same as /abc/index.html, or as /abc/index.cgi? (There are even more possibilities.) Especially query parameters in HTTP URLs cause many problems for the web-crawlers which collect web-pages for search engines. Workarounds grew in the form of sitemaps and the "canonical" reference inside HTML: ad-hoc tricks.
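To show why this is guesswork, here is a hypothetical normalizer a crawler might apply. The index file names and the ignorable query parameters are assumptions; any given server may treat them differently, which is exactly the problem.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Guesses, not knowledge: the server alone decides what its paths mean.
GUESSED_INDEX_NAMES = ("index.html", "index.htm", "index.cgi")
GUESSED_IGNORABLE_PARAMS = {"utm_source", "utm_medium", "sessionid"}


def guess_canonical(url: str) -> str:
    """Heuristically normalize a URL the way a crawler might; the server may disagree."""
    parts = urlsplit(url)
    path = parts.path or "/"
    for name in GUESSED_INDEX_NAMES:
        if path.endswith("/" + name):
            path = path[: -len(name)]          # /abc/index.html -> /abc/
            break
    # keep the parameters we guess are meaningful, in a stable order
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in GUESSED_IGNORABLE_PARAMS
    )
    # host names are case-insensitive; the fragment never reaches the server
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, urlencode(query), ""))


print(guess_canonical("https://Example.com/abc/index.html"))
print(guess_canonical("https://example.com/abc/?utm_source=x&b=2&a=1"))
```

Every rule here may be wrong for some server (sorting the parameters, for instance, silently assumes their order is irrelevant), which is why sitemaps and canonical references came into use.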

Accidental collisions in names

Without namespaces, you can easily get into the situation where two unrelated things bump into each other. The first photos from two different cameras (of the same brand) are both named PIC00001.jpg. Better not write them into the same directory/map/folder.

Some applications rename one of the photos when the name already exists, for instance by adding "(2)". With such ad-hoc tricks, you may avoid the use of namespaces, but these tricks produce inconsistent and often unpredictable behavior.
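A sketch of one common variant of the renaming trick (the exact suffix format differs between applications), showing that the name each photo ends up with depends on the order in which the cameras are imported. The camera labels are hypothetical.

```python
import os.path


def store_with_rename(folder: set[str], name: str) -> str:
    """Ad-hoc trick: add " (2)", " (3)", ... until the name is free in the folder."""
    candidate = name
    stem, ext = os.path.splitext(name)
    n = 2
    while candidate in folder:
        candidate = f"{stem} ({n}){ext}"
        n += 1
    folder.add(candidate)
    return candidate


def import_photos(order: list[str]) -> dict[str, str]:
    """Both (hypothetical) cameras deliver a photo called PIC00001.jpg."""
    folder: set[str] = set()
    return {camera: store_with_rename(folder, "PIC00001.jpg") for camera in order}


print(import_photos(["camera-A", "camera-B"]))
print(import_photos(["camera-B", "camera-A"]))
```

The same photo from camera-A is called PIC00001.jpg in one run and PIC00001 (2).jpg in the other: the name no longer identifies the photo.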

When you have a single naming authority, it is possible to enforce uniqueness without namespaces and without tricks. Our letters carry postal codes, which are assigned uniquely to addresses. The implicit "namespace" indicator of a postal code is its expected location in the address on the envelope, or the label "Postal-code" on a form.

Governments assign everyone a unique identification number, in a namespace called "Social Security Number", "Personal Identifier", or whatever your country names its set.

IANA registers MIME types, which identify how to interpret a sequence of octets. But all of these require a tedious central registration process. In the computer space, you want to avoid central orchestration whenever possible.

There is a wide variety of ways to avoid accidental collisions via namespaces. We use maps (folders, directories), each of which can contain a file with the same name without conflict. In computer software, people use packages or objects to restrict the visibility of names, which reduces the chance of accidents. And so on.

Summarized: when you need to avoid collisions in names, have no wish for centralized orchestration, and you do not want to implement tricks with inconsistent results, then you should introduce namespaces based on a unique prefix component.
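The prefix-based alternative, sketched with a hypothetical camera serial number as the unique prefix component: the same file name no longer collides, and the result does not depend on arrival order. That the prefix itself is unique (here: because a manufacturer hands out serial numbers) is an assumption.

```python
from pathlib import PurePosixPath


def namespaced_path(namespace: str, name: str) -> PurePosixPath:
    """Store `name` under its own namespace: a unique prefix component."""
    if "/" in namespace or namespace in ("", ".", ".."):
        raise ValueError("the namespace must be exactly one path component")
    return PurePosixPath(namespace) / name


# Hypothetical namespaces: each camera's serial number.
a = namespaced_path("camera-SN1234", "PIC00001.jpg")
b = namespaced_path("camera-SN5678", "PIC00001.jpg")
print(a, b, a != b)
```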

Meaningless and Meaningful Namespaces

At opposite ends: http URLs explicitly tell everyone which server must be contacted to ask for data, whereas URNs do not express anything other than uniqueness. However, the urn:isbn: namespace is meaningful in a library context.

In XML, namespaces are (ab)used for multiple things at the same time. A namespace SHOULD contain a domain-name you own, usually contains the name of the service or purpose, and often contains the version number of the specification. Example: http://www.w3.org/2001/XMLSchema.

The domain-name makes uniqueness simple, although this is neither enforced nor verifiable in XML technology. The version number in the namespace is a horrible mistake, blocking interface improvements. More about XML mistakes.

In the early years, the namespace of an XML document could also be used to fetch the schema file which described the namespace. Sometimes it only led to a page with documentation. Nowadays, this practice is rarely followed: organizations change servers, change their name, have websites which do not support it, and so on... but changing a namespace is hard.

Prefixes

Where namespaces are really useful, they are usually impractically long. When your document refers to the same namespace hundreds of times, it grows uselessly large. As a kind of compression, protocols often support the use of prefixes for namespaces: short abbreviations.

The same namespace may be referred to by multiple abbreviations: the prefix itself is just a helper, lacking any meaning. Only the namespace has a meaning.

Prefix-to-namespace mappings are limited to a context, to avoid collisions between these abbreviations. You are totally free to pick the prefix string, because prefixes have no meaning by themselves. However, some prefixes are often preferred, purely for human readability.
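A sketch of contextual prefix mappings using Python's collections.ChainMap, where an inner context can shadow the outer one, roughly as nested XML elements do. The second namespace URL is made up, and default (unprefixed) namespaces are left out.

```python
from collections import ChainMap


def resolve(scope: ChainMap, qname: str) -> tuple[str, str]:
    """Turn 'prefix:local' into a (namespace, local) pair within the given context."""
    prefix, sep, local = qname.partition(":")
    if not sep:
        raise ValueError(f"no prefix in {qname!r}")
    try:
        return scope[prefix], local
    except KeyError:
        raise ValueError(f"undeclared prefix {prefix!r}") from None


# Outer context: x abbreviates the addresses namespace.
outer = ChainMap({"x": "https://example.com/addresses"})
# Inner context: x is re-declared for another (made-up) namespace; a is a new alias.
inner = outer.new_child({"x": "https://example.com/other",
                         "a": "https://example.com/addresses"})

print(resolve(outer, "x:home"))
print(resolve(inner, "x:home"))
print(resolve(inner, "a:home") == resolve(outer, "x:home"))
```

The same prefix means different things in different contexts, while two different prefixes can mean the same thing: only the resolved namespace can be compared safely.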

Example with XML: the message contains

<x:home xmlns:x="https://example.com/addresses">
   <x:postal-code>6815BH</x:postal-code>
</x:home>

In the example, the namespace is https://example.com/addresses. The namespace refers to a separately loaded schema file which describes the home and postal-code elements. We declare (xmlns) the prefix x as an abbreviation for the namespace, which otherwise would have to be repeated in full four times.

Internally in applications, the prefix is immediately converted into the namespace. Programs should not work with the prefix internally, because different prefixes MAY exist for the same namespace. Internally, programs often use "(namespace, name)" pairs, or the "{namespace}name" notation.
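Python's standard xml.etree.ElementTree behaves this way: it replaces the prefix by the namespace while parsing, and reports element names in {namespace}name notation. The second document below, using the prefix addr, is a hypothetical variant of the example above.

```python
import xml.etree.ElementTree as ET

NS = "https://example.com/addresses"

doc_x = f"""<x:home xmlns:x="{NS}">
   <x:postal-code>6815BH</x:postal-code>
</x:home>"""

# Same content, different (hypothetical) choice of prefix.
doc_addr = f"""<addr:home xmlns:addr="{NS}">
   <addr:postal-code>6815BH</addr:postal-code>
</addr:home>"""

root_x = ET.fromstring(doc_x)
root_addr = ET.fromstring(doc_addr)

print(root_x.tag)                    # the prefix is gone: {namespace}name
print(root_x.tag == root_addr.tag)   # the prefix choice does not matter

# A query may use its own prefix, as long as it maps to the same namespace.
code = root_addr.find("q:postal-code", {"q": NS})
print(code.text)
```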

Namespaces in Meshy Space

As you could have guessed by now: Meshy Space namespaces work a bit differently. They are used for uniquely identifying and contacting the source, but the namespace does not reflect versions.

The namespace string refers to a Namespace Unit, which describes how the namespace is organized. It contains the access points (the Resources) to the daemons which manage the Collections within the namespace. It also refers to the Rules of the root Collection.

First contact

Connecting to a Namespace (a server) starts negotiations: where do the capabilities of client and server match, and what does the server provide? This includes versions of schemas, and compression and signature algorithms. It may include authentication keys.
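The Meshy Space negotiation protocol is not described on this page, so the following is only a guess at its general shape: each side lists its options per capability, in order of preference, and for every category the client's most preferred option which the server also offers is chosen. Category and option names are hypothetical.

```python
def negotiate(client: dict[str, list[str]],
              server: dict[str, list[str]]) -> dict[str, str]:
    """Per capability category, pick the client's most preferred option that
    the server also offers; fail when a category has nothing in common."""
    agreed: dict[str, str] = {}
    for category, client_options in client.items():
        offered = set(server.get(category, []))
        choice = next((opt for opt in client_options if opt in offered), None)
        if choice is None:
            raise ValueError(f"no common {category!r} between client and server")
        agreed[category] = choice
    return agreed


# Hypothetical capability lists, most preferred first.
client = {
    "schema": ["v3", "v2"],
    "compression": ["zstd", "gzip"],
    "signature": ["ed25519"],
}
server = {
    "schema": ["v1", "v2"],
    "compression": ["gzip", "none"],
    "signature": ["ed25519", "rsa-sha256"],
}
print(negotiate(client, server))
```

Letting the client's preference order win is a policy choice in this sketch; a server could equally impose its own order.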

Meshy Space Namespaces support redirection, like symbolic links. This offers an easy way to move a Collection.

Downloading a Unit works (simplified) like this:

take the prefix from the element;
look up the prefix in the contextual table, which translates it into a namespace string;
look in the cache whether you already have the Namespace object which belongs to the namespace;
when you do not have it yet, use the namespace string as URL to contact the server, which provides the Namespace Unit;
when the Namespace Unit is a redirect (namespace rewrite), follow the returned namespace string;
the Namespace Unit describes the Resources: where copies of the Collection can be found, like a mirror list of ftp-servers. Pick the fastest nearby one, if that can be guessed;
use the ideal Resource to get to the Collection Rules, which will lead you to the requested Unit.
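The steps above can be sketched in Python. Everything here is hypothetical: the NamespaceUnit fields, the fetch callback standing in for the network, and the per-mirror latency guesses. It only illustrates prefix lookup, caching, redirect following, and mirror selection, and stops where the Collection Rules would be fetched.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class NamespaceUnit:
    """Hypothetical shape of a Namespace Unit; the real format may differ."""
    namespace: str
    redirect: Optional[str] = None   # set when the namespace was rewritten (moved)
    resources: dict[str, float] = field(default_factory=dict)  # mirror -> guessed latency (s)
    rules: Optional[str] = None      # reference to the Rules of the root Collection


class Resolver:
    def __init__(self, fetch: Callable[[str], NamespaceUnit], max_redirects: int = 5):
        self.fetch = fetch                          # contacts the server behind a namespace
        self.cache: dict[str, NamespaceUnit] = {}   # namespace string -> Namespace Unit
        self.max_redirects = max_redirects

    def namespace_unit(self, namespace: str) -> NamespaceUnit:
        seen: set[str] = set()
        for _ in range(self.max_redirects + 1):
            if namespace in seen:
                raise RuntimeError(f"redirect loop at {namespace}")
            seen.add(namespace)
            unit = self.cache.get(namespace)
            if unit is None:                        # not cached yet: ask the server
                unit = self.fetch(namespace)
                self.cache[namespace] = unit
            if unit.redirect is None:
                return unit
            namespace = unit.redirect               # follow the namespace rewrite
        raise RuntimeError("too many redirects")

    def resolve(self, prefixes: dict[str, str], element: str) -> tuple[str, Optional[str], str]:
        prefix, _, local = element.partition(":")   # take the prefix from the element
        namespace = prefixes[prefix]                # contextual prefix -> namespace table
        unit = self.namespace_unit(namespace)
        if not unit.resources:
            raise RuntimeError(f"no Resources listed for {namespace}")
        # pick the Resource with the lowest guessed latency
        mirror = min(unit.resources, key=lambda url: unit.resources[url])
        # the caller would now use `mirror` to fetch the Collection Rules (`rules`)
        return mirror, unit.rules, local


# Offline stand-in for the network, with one moved namespace (all URLs made up).
UNITS = {
    "https://old.example.org/ns": NamespaceUnit(
        "https://old.example.org/ns", redirect="https://new.example.org/ns"),
    "https://new.example.org/ns": NamespaceUnit(
        "https://new.example.org/ns",
        resources={"https://eu.mirror.example/ns": 0.02, "https://us.mirror.example/ns": 0.15},
        rules="root-rules"),
}
fetched: list[str] = []


def fake_fetch(namespace: str) -> NamespaceUnit:
    fetched.append(namespace)
    return UNITS[namespace]


resolver = Resolver(fake_fetch)
print(resolver.resolve({"m": "https://old.example.org/ns"}, "m:photos"))
print(resolver.resolve({"m": "https://old.example.org/ns"}, "m:photos"))
print(fetched)   # each namespace was fetched only once, thanks to the cache
```

This sketch caches forever; a real client would need some form of expiry, otherwise a later move of a Collection (a new redirect) would go unnoticed.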

Heavy caching of Units makes this fast enough.

mark@overmeer.net Web-pages generated on 2023-12-19