XSD Design Rules for Meshy Space Interfaces

Even though XML Schemas (XSDs) are a strict formalism, there are many very different ways to apply them: four common design patterns, many naming conventions, constructs that originate from tooling, and so forth. This page describes the choices made for MSI.

Design patterns

The literature distinguishes four ways to set up a schema:

  1. Russian Doll
  2. Salami Slice
  3. Venetian Blind
  4. Garden of Eden

Some comparisons of these approaches can be found HERE and HERE.

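The "Russian Doll" and "Venetian Blind" patterns require a single global root element, behind which all other element declarations are hidden as local declarations. A substitutionGroup can only refer to a global element, so these patterns cannot be combined with substitutionGroups, which makes them unusable for most modern schemas.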
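
A minimal sketch of the problem, using hypothetical garage and vehicle elements and the conventions of the other examples on this page (the XSD namespace as default, me: as the prefix of the schema's own namespace). In a Russian Doll schema the would-be head of a substitution group is a local declaration, and XSD only lets substitutionGroup refer to a global element declaration:

   <!-- hypothetical names; Russian Doll: "garage" is the only global element -->
   <element name="garage">
     <complexType>
       <sequence>
         <!-- local declaration: cannot serve as a substitutionGroup head -->
         <element name="vehicle" type="string" maxOccurs="unbounded" />
       </sequence>
     </complexType>
   </element>

   <!-- INVALID: substitutionGroup must name a global element declaration,
        and me:vehicle is only declared locally inside garage -->
   <element name="bike" type="string" substitutionGroup="me:vehicle" />

The same applies to Venetian Blind: its types are global, but its element declarations are still local.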

"Salami Slice" opens up all elements, but not the type definitions: even sub-sub-sub-elements get their own declaration at the schema's top level. We need some of the types, and do not want to export all of the elements.

With "Garden of Eden", all types and all elements are global. The idea is that you never know what extension schemas written later will need, but the result is no encapsulation at all. That is also not what we want.
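
For comparison, sketches of a small bike schema in these two styles, using the page's usual conventions (XSD namespace as default, me: for the schema's own namespace). In Salami Slice style every element is global and the types stay anonymous:

   <!-- Salami Slice: all elements global, all types anonymous -->
   <element name="bike">
     <complexType>
       <sequence>
         <element ref="me:wheels" />
       </sequence>
     </complexType>
   </element>
   <element name="wheels" type="byte" />

In Garden of Eden style every element and every type is global:

   <!-- Garden of Eden: all elements and all types global;
        wheelsType is a hypothetical name -->
   <element name="bike" type="me:bikeType" />
   <complexType name="bikeType">
     <sequence>
       <element ref="me:wheels" />
     </sequence>
   </complexType>
   <element name="wheels" type="me:wheelsType" />
   <simpleType name="wheelsType">
     <restriction base="byte" />
   </simpleType>

Both describe the same messages; they differ only in which components other schemas can reuse.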

 
For MSI, we have designed a flexible schema version upgrade mechanism, so there is no need to open up everything in preparation for the future. Schema updates are made like software releases. We therefore try to keep specifications simple and small, and rework the schema (without changing the structure of the messages) when we need to.

Here is an example of a "we hid too much, let's rewrite to open up" action. The original is simple, because we only need bike elements:

   <element name="bike">
     <complexType>
       <sequence>
         <element name="wheels" type="byte" />
       </sequence>
     </complexType>
   </element>

But then, for some reason (for instance, converting it into a substitutionGroup element), someone needs access to the bike's type. The next minor version of the schema contains:

   <element name="bike" type="me:bikeType" />
   <complexType name="bikeType">
     <sequence>
       <element name="wheels" type="byte" />
     </sequence>
   </complexType>
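
Both versions validate exactly the same message, which is why this rewrite does not change the structure of the messages. A sketch of such a message, assuming a hypothetical namespace URI for the me prefix and elementFormDefault="qualified" in both schema versions:

   <!-- http://example.com/msi/me is a hypothetical namespace URI -->
   <me:bike xmlns:me="http://example.com/msi/me">
     <me:wheels>2</me:wheels>
   </me:bike>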

So we can afford to use a fifth approach: "KISS" (Keep It Simple, Stupid).

No types to represent relations

Some schemas are generated by design tools, which results in schemas that are far more complex than needed. Please write your schemas with a full view of what you will distribute.

Schema generating tools typically produce things like this:

   <element name="bike" type="me:bikeType" />
   <complexType name="bikeType">
     <sequence>
       <element ref="me:owners" />
     </sequence>
   </complexType>
   <element name="owners" type="me:ownersType" />
   <complexType name="ownersType">
     <sequence>
       <element ref="me:owner" minOccurs="0" maxOccurs="unbounded" />
     </sequence>
   </complexType>
   <element name="owner" type="me:ownerType" />

Although this is perfectly correct, please use the simpler design:

   <element name="bike">
     <complexType>
       <sequence>
         <element name="owner" type="me:ownerType"
            minOccurs="0" maxOccurs="unbounded" />
       </sequence>
     </complexType>
   </element>

For core MSI components, you SHOULD use the simpler version. For third-party schemas, the simpler set-up is RECOMMENDED.
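
The difference is visible in the messages. A sketch of a bike with one owner, assuming a hypothetical namespace URI and elementFormDefault="qualified"; the owner content is left as a comment because ownerType is not defined on this page. The tool-generated schema requires an extra owners wrapper:

   <!-- http://example.com/msi/me is a hypothetical namespace URI -->
   <me:bike xmlns:me="http://example.com/msi/me">
     <me:owners>
       <me:owner><!-- ownerType content --></me:owner>
     </me:owners>
   </me:bike>

With the simpler design, the owner elements appear directly inside the bike:

   <me:bike xmlns:me="http://example.com/msi/me">
     <me:owner><!-- ownerType content --></me:owner>
   </me:bike>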

Use of nillable and optional

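There are three ways to represent "missing" in XSDs:

  1. nillable: when the element declaration contains nillable="true", the element in the message may carry the xsi:nil="true" attribute, and must then have no content.
  2. When the element declaration contains minOccurs="0", the element may be missing from the message.
  3. When the declaration contains default="value", the value may be left out and the default is used instead. For an attribute this means it may be missing from the message; for an element, XSD applies the default only when the element is present but empty.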
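
A sketch of the three constructs on a bike element; colour, nickname, the wheels default, and gears are hypothetical illustrations, not MSI definitions. The comments show what a message may contain:

   <!-- hypothetical example, not part of MSI -->
   <element name="bike">
     <complexType>
       <sequence>
         <!-- may be sent as <me:colour xsi:nil="true"/> -->
         <element name="colour" type="string" nillable="true" />
         <!-- may be left out of the message entirely -->
         <element name="nickname" type="string" minOccurs="0" />
         <!-- an empty <me:wheels/> is read as 2 -->
         <element name="wheels" type="unsignedByte" default="2" />
       </sequence>
       <!-- an absent gears attribute is read as 1 -->
       <attribute name="gears" type="unsignedByte" default="1" />
     </complexType>
   </element>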

In MSI, an element with xsi:nil means: I may know more about this element, but for some reason I will not include it here.

For simple elements, use a default as often as possible: it documents what happens when you leave the value out, and it may reduce the size of a message a bit.

No use of any and any*Type

The any schema construct is not needed: substitutionGroups cover most of its uses, and MSI's strategy of moving schema versions covers the rest.
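
A sketch of how a substitutionGroup can take the place of any, using hypothetical vehicle and garageType names. Instead of an open slot such as <any namespace="##other" maxOccurs="unbounded"/>, the schema declares an abstract head element that other global elements, including ones from later extension schemas, can substitute for:

   <!-- hypothetical names; an abstract head element and its (empty) type -->
   <element name="vehicle" type="me:vehicleType" abstract="true" />
   <complexType name="vehicleType" />

   <!-- a member of the group: its type must derive from vehicleType -->
   <element name="bike" type="me:bikeType" substitutionGroup="me:vehicle" />
   <complexType name="bikeType">
     <complexContent>
       <extension base="me:vehicleType">
         <sequence>
           <element name="wheels" type="byte" />
         </sequence>
       </extension>
     </complexContent>
   </complexType>

   <!-- where any would have been used: accepts any member of the group -->
   <complexType name="garageType">
     <sequence>
       <element ref="me:vehicle" minOccurs="0" maxOccurs="unbounded" />
     </sequence>
   </complexType>

Unlike any, a later schema can only add content here by declaring a new global element in the me:vehicle substitution group with a type derived from vehicleType, so every message stays fully validated.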

The any*Type types (such as anyType and anySimpleType) are not usable in programs: they are the base constructs from which the types you implement in your programs are derived, not types to use by themselves.

Avoid the unbounded data types decimal and *integer

Implementations that need to support these types must be prepared to handle values of unlimited precision. This is hard, so avoid decimal, integer, nonNegativeInteger, positiveInteger, nonPositiveInteger, and negativeInteger.

Whenever a long (roughly -9.2 × 10¹⁸ to +9.2 × 10¹⁸) or an unsignedLong (0 to roughly 1.8 × 10¹⁹) is not large enough, you MUST add size-restricting facets to the larger type you use instead. Note that the long type is 64 bits and the int type is 32 bits.
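
A sketch of such restrictions; amountType and bigCounterType are hypothetical names. totalDigits limits the total number of digits and fractionDigits the digits after the decimal point, so an implementation knows how much precision it must handle:

   <!-- hypothetical: a money amount with at most 15 digits, 2 of them decimals -->
   <simpleType name="amountType">
     <restriction base="decimal">
       <totalDigits value="15" />
       <fractionDigits value="2" />
     </restriction>
   </simpleType>

   <!-- hypothetical: a counter that may exceed unsignedLong, capped at 30 digits -->
   <simpleType name="bigCounterType">
     <restriction base="nonNegativeInteger">
       <totalDigits value="30" />
     </restriction>
   </simpleType>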

Use enumerations when possible

Using numbers to represent a state or flag looks efficient, but that is an outdated peephole optimization which makes code harder to debug.
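
A sketch of a state type built from string enumeration values; lockStateType and its values are hypothetical. A message with an element of this type (say, a hypothetical lock element) then contains <me:lock>locked</me:lock> instead of a number whose meaning has to be looked up:

   <!-- hypothetical state type -->
   <simpleType name="lockStateType">
     <restriction base="string">
       <enumeration value="locked" />
       <enumeration value="unlocked" />
       <enumeration value="broken" />
     </restriction>
   </simpleType>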

TODO: explain that QName can be useful for deprecation cycle.

Rules, summary

  • You SHALL NOT use "xsi:type".
  • You SHALL NOT use "any" or "anyType".
  • You SHALL NOT use "mixed" content: we use data XML, not page fragments.
  • You MUST put bounds on types that are unbounded in size, but it is RECOMMENDED to avoid such types altogether.
  • It is RECOMMENDED to represent states with string enumerations rather than numbers.
  • You SHALL NOT use namespace-less schema components.
  • You MUST use namespaces on both elements and attributes.
  • You SHOULD follow the "as simple as possible" practice.
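
One way to satisfy the two namespace rules, as a sketch with a hypothetical target namespace URI; the me prefix and the XSD namespace as default match the examples on this page. targetNamespace keeps every global component out of the "no namespace", and setting elementFormDefault and attributeFormDefault to "qualified" puts local elements and attributes in that namespace as well:

   <!-- http://example.com/msi/me is a hypothetical namespace URI -->
   <schema xmlns="http://www.w3.org/2001/XMLSchema"
           xmlns:me="http://example.com/msi/me"
           targetNamespace="http://example.com/msi/me"
           elementFormDefault="qualified"
           attributeFormDefault="qualified">
     <!-- element, attribute, and type declarations go here -->
   </schema>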

mark@overmeer.net      Web-pages generated on 2023-12-19