XSD Design Rules for Meshy Space Interfaces
Even though XML Schemas (XSDs) are a strict formalism, there are many
quite different ways to apply them. For instance, there are four different
design patterns, many naming conventions, constructs which originate from
tooling, and so forth. This page describes the choices made for Meshy Space
Interfaces (MSI).
Design patterns
The literature distinguishes four ways to set up a schema:
- Russian Doll
- Salami Slice
- Venetian Blind
- Garden of Eden
Comparisons of these approaches can be found HERE and HERE.
The "Russian Doll" and "Venetian Blind" patterns require a single
global root element, behind which everything else is hidden. This does
not work in combination with substitutionGroups, because the head and the
members of a substitution group must all be global elements. These patterns
are therefore unusable for most modern schemas.
"Salami Slice" declares all elements globally, but not the type
definitions. Even sub-sub-sub-elements have their own declaration at
the schema's top level. We need some of the types, and do not want to
export all of the elements.
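Why substitutionGroups clash with a single global root can be shown in a minimal sketch, assuming a hypothetical target namespace and hypothetical vehicle, bike, car and garage elements: the head of the group and each of its members are declared directly under schema, so the schema necessarily has several global elements.

```xml
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:me="http://example.com/hypothetical/vehicles"
        targetNamespace="http://example.com/hypothetical/vehicles"
        elementFormDefault="qualified">

  <!-- abstract head: never appears in a message by itself -->
  <element name="vehicle" type="me:vehicleType" abstract="true" />
  <complexType name="vehicleType">
    <sequence>
      <element name="wheels" type="byte" />
    </sequence>
  </complexType>

  <!-- members: global elements whose types derive from the head's type -->
  <element name="bike" type="me:vehicleType" substitutionGroup="me:vehicle" />
  <element name="car" substitutionGroup="me:vehicle">
    <complexType>
      <complexContent>
        <extension base="me:vehicleType">
          <sequence>
            <element name="doors" type="byte" />
          </sequence>
        </extension>
      </complexContent>
    </complexType>
  </element>

  <!-- wherever me:vehicle is referenced, me:bike or me:car may appear -->
  <element name="garage">
    <complexType>
      <sequence>
        <element ref="me:vehicle" maxOccurs="unbounded" />
      </sequence>
    </complexType>
  </element>
</schema>
```

With only one global root element, none of the vehicle, bike and car declarations could exist at the top level, so the group could not be formed.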
With "Garden of Eden", all types and all elements are global: since you
never know what extension schemas created later will need, everything is
exposed. There is no encapsulation at all, which is also not what we want.
For MSI, we have designed a flexible schema version upgrade mechanism,
so there is no need to open up everything in preparation for the future.
Schema updates are made like software releases. We therefore try to keep
each specification simple and small, and rework the schema (without
changing the structure of the messages) when we need to.
An example of a "we hid too much, let's rewrite to open up" action.
The original is simple; we only need bike elements:
<element name="bike">
<complexType>
<sequence>
<element name="wheels" type="byte" />
</sequence>
</complexType>
</element>
But then, for some reason (for instance, conversion to a
substitutionGroup element), someone needs access to
the bike's type. The next minor version of the schema contains:
<element name="bike" type="me:bikeType" />
<complexType name="bikeType">
<sequence>
<element name="wheels" type="byte" />
</sequence>
</complexType>
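The named type is what makes a substitutionGroup possible, because a member's type must be derived from the type of the head element. A sketch of what a later version could then add, assuming the me prefix is bound to the target namespace as in the fragment above; tandem and seats are hypothetical names:

```xml
<!-- a hypothetical tandem may appear wherever me:bike is expected -->
<element name="tandem" substitutionGroup="me:bike">
  <complexType>
    <complexContent>
      <!-- only possible because bikeType now has a name -->
      <extension base="me:bikeType">
        <sequence>
          <element name="seats" type="byte" />
        </sequence>
      </extension>
    </complexContent>
  </complexType>
</element>
```

Existing bike messages stay exactly the same: only the schema was reorganized.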
So we can afford to use a fifth approach: "KISS" (Keep It Simple, Stupid).
No types to represent relations
Some schemas are generated by design tools, which results in schemas
that are far more complex than needed. Please write your schemas with
a full view of what you will distribute.
Schema-generating tools typically produce things like this:
<element name="bike" type="me:bikeType" />
<complexType name="bikeType">
<sequence>
<element ref="me:owners" />
</sequence>
</complexType>
<element name="owners" type="me:ownersType" />
<complexType name="ownersType">
<sequence>
<element ref="me:owner" minOccurs="0" maxOccurs="unbounded" />
</sequence>
</complexType>
<element name="owner" type="me:ownerType" />
Although this is correct, please use the simpler design:
<element name="bike">
<complexType>
<sequence>
<element name="owner" type="me:ownerType"
minOccurs="0" maxOccurs="unbounded" />
</sequence>
</complexType>
</element>
For core MSI components, you SHOULD use the simpler version. For third-party
schemas, the simpler setup is RECOMMENDED.
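The difference is visible in the messages. A sketch of the same bike with two owners in both designs, assuming the me prefix is bound to the target namespace and elementFormDefault="qualified"; the content of ownerType is not specified on this page, so it is left out as "...":

```xml
<!-- generated design: an extra owners wrapper around the owner elements -->
<me:bike>
  <me:owners>
    <me:owner>...</me:owner>
    <me:owner>...</me:owner>
  </me:owners>
</me:bike>

<!-- simpler design: the owner elements sit directly inside bike -->
<me:bike>
  <me:owner>...</me:owner>
  <me:owner>...</me:owner>
</me:bike>
```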
Use of nillable and optional
There are three ways to represent "missing" in XSDs:
- When the element declaration contains nillable="true", then the
element in the message may carry the xsi:nil="true" attribute.
- When the element declaration contains minOccurs="0", it may be
missing from the message.
- When the element declaration contains a default="value", it also
may be missing from the message.
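The three options as declarations, in a sketch with a hypothetical bikeType and hypothetical owner, color and wheels children. Be aware that a plain XSD validator applies an element's default only when the element is present but empty; to allow leaving the element out entirely, the declaration also needs minOccurs="0", and the program reading the message then has to fill in the default itself.

```xml
<complexType name="bikeType">
  <sequence>
    <!-- nillable: must be present, but may carry xsi:nil="true" -->
    <element name="owner" type="string" nillable="true" />
    <!-- optional: may be left out of the message -->
    <element name="color" type="string" minOccurs="0" />
    <!-- default: combined with minOccurs="0", so that it may be left out -->
    <element name="wheels" type="byte" minOccurs="0" default="2" />
  </sequence>
</complexType>
```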
With MSI, a message with xsi:nil means: I may know more
about this element, but for some reason I will not include it here.
With simple elements, try to use the default as often as
possible: it clarifies what will happen when you leave the value out.
Also, it may reduce the size of a message a bit.
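A sketch of a message following these rules, assuming a hypothetical bike element whose owner child is nillable, whose color child has minOccurs="0", and whose wheels child has minOccurs="0" and default="2"; the namespace URI is hypothetical as well:

```xml
<me:bike xmlns:me="http://example.com/hypothetical/bikes"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <!-- "I may know the owner, but I will not include it here" -->
  <me:owner xsi:nil="true" />
  <!-- color and wheels are left out; for wheels, the declared default 2 is meant -->
</me:bike>
```

Leaving out wheels keeps the message a little smaller, while the declared default documents what its absence means.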
No use of any and any*Type
The any schema construct is not needed: the need for it is mostly
avoided by the use of substitutionGroups. Other uses are replaced by
MSI's strategy of moving schema versions.
The any*Type types (such as anyType and anySimpleType) do not work for
programs. They are base constructs for the types which you can implement
in your programs, but are not usable by themselves.
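A sketch of the contrast, with a hypothetical garage element, assuming me:vehicle is a global (possibly abstract) head element of a substitutionGroup declared elsewhere in the schema. The two garage declarations are alternatives, not parts of one schema:

```xml
<!-- avoid: any element at all may appear, so programs cannot know what to expect -->
<element name="garage">
  <complexType>
    <sequence>
      <any namespace="##any" processContents="lax" maxOccurs="unbounded" />
    </sequence>
  </complexType>
</element>

<!-- prefer: only members of the me:vehicle substitutionGroup may appear -->
<element name="garage">
  <complexType>
    <sequence>
      <element ref="me:vehicle" maxOccurs="unbounded" />
    </sequence>
  </complexType>
</element>
```

With the second form, the set of element types that can appear is known to programs; a new kind of vehicle means adding a member to the group in a later schema version.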
Avoid huge data types decimal and *integer
Implementations which need to support these types must
be prepared to handle values of unlimited precision. This is
hard, so avoid decimal, integer,
nonNegativeInteger, positiveInteger,
nonPositiveInteger, and negativeInteger.
Whenever a long (from -2^63 to 2^63-1, about ±9.2×10^18) or an
unsignedLong (from 0 to 2^64-1, about 1.8×10^19) is not large enough,
you may use one of these types, but then you MUST add size-restricting facets.
The long type is 64 bit, the int type is 32 bit.
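A sketch of such size-restricting facets, with hypothetical type names: totalDigits caps the number of digits, which makes the value space finite again.

```xml
<!-- hypothetical: a count that may exceed unsignedLong, capped at 25 digits -->
<simpleType name="hugeCount">
  <restriction base="nonNegativeInteger">
    <totalDigits value="25" />
  </restriction>
</simpleType>

<!-- hypothetical: an amount with at most 18 digits, 2 of them after the decimal point -->
<simpleType name="amount">
  <restriction base="decimal">
    <totalDigits value="18" />
    <fractionDigits value="2" />
  </restriction>
</simpleType>
```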
Use enumerations when possible
Using numbers to represent a state or flag looks efficient, but that
is an outdated peephole optimization at the expense of code which is
harder to debug.
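A sketch of the difference, with hypothetical type names for the state of a lamp:

```xml
<!-- avoid: a message containing 3 tells the reader nothing by itself -->
<simpleType name="lampStateCode">
  <restriction base="unsignedByte">
    <maxInclusive value="3" />
  </restriction>
</simpleType>

<!-- prefer: the state is readable in every message and log -->
<simpleType name="lampState">
  <restriction base="string">
    <enumeration value="off" />
    <enumeration value="on" />
    <enumeration value="blinking" />
    <enumeration value="broken" />
  </restriction>
</simpleType>
```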
TODO: explain that QName can be useful for deprecation cycle.
Rules, summary
- You SHALL NOT use "xsi:type".
- You SHALL NOT use "any" and "anyType".
- You SHALL NOT use "mixed" content: we use data XML,
not page fragments.
- You MUST put boundaries on types which are infinite in size, but
it is RECOMMENDED to avoid them altogether.
- It is RECOMMENDED to use string enumerations rather than numbers
to reflect states.
- You SHALL NOT use namespace-less schema components.
- You MUST use namespaces on elements and also attributes.
- You SHOULD follow the "as simple as possible" practice.
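One way to satisfy the namespace rules, sketched with a hypothetical namespace URI: a targetNamespace keeps every component out of the no-namespace, and elementFormDefault and attributeFormDefault set to "qualified" put local elements and attributes into that namespace as well.

```xml
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://example.com/hypothetical/bikes"
        elementFormDefault="qualified"
        attributeFormDefault="qualified">

  <!-- local elements and attributes are qualified with the target namespace -->
  <element name="bike">
    <complexType>
      <sequence>
        <element name="wheels" type="byte" />
      </sequence>
      <attribute name="id" type="token" use="required" />
    </complexType>
  </element>

  <!-- a matching message:
       <me:bike xmlns:me="http://example.com/hypothetical/bikes" me:id="b1">
         <me:wheels>2</me:wheels>
       </me:bike>
  -->
</schema>
```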
mark@overmeer.net
Web-pages generated on 2023-12-19