About annotations in SBML Level 2 and Level 3

The following is an attempt to clarify some issues in the SBML annotation scheme defined since SBML Level 2. The fact that the scheme uses RDF, but not all RDF, has been a source of confusion. The use of a very restricted subset of RDF is intended to let software developers avoid having to implement full-on RDF parsing, but at the same time, understanding what can and cannot be put into SBML Level 2 merits elaboration. We discuss these issues here and provide guidelines for how to encode certain kinds of information that SBML developers have requested.

What is allowed inside rdf:Description in SBML Level 2 and Level 3?

In Section 6.3, the SBML specification states that RDF content in an <annotation> element, when not attached to the model element, essentially must take the following form (and here we’ve shortened the list of XML namespace declarations to a typical minimum set):

<SBML_ELEMENT ... metaid="SBML_META_ID" ...> 
  ... 
  <annotation> 
    ... 
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
             xmlns:bqbiol="https://biomodels.net/biology-qualifiers/" 
             xmlns:bqmodel="https://biomodels.net/model-qualifiers/"  > 
      <rdf:Description rdf:about="#SBML_META_ID" > 
        <QUALIFIER> 
          <rdf:Bag> 
            <rdf:li rdf:resource="MIRIAM_URI" /> 
            ... 
            [NESTED_CONTENT] 
          </rdf:Bag> 
        </QUALIFIER> 
        ... 
      </rdf:Description> 
      ... 
    </rdf:RDF> 
    ... 
  </annotation> 
  ... 
</SBML_ELEMENT> 

The different elements have the following meanings:

Thing Description
SBML_ELEMENT The SBML element being annotated. Can be any SBML element such as species, reaction, etc.
SBML_META_ID The metaid attribute value of the SBML element being annotated. Each SBML element being annotated must have a value for the metaid so that the annotations can refer to the element. SBML’s metaid attributes have data type XML ID and must be unique across the entire model file.
QUALIFIER A BioModels qualifier expressing the nature of the relationship between the SBML object being annotated and the target of the annotation (i.e., the thing pointed to by the MIRIAM_URI). The qualifiers begin with the character strings bqbiol: and bqmodel:. In all cases using the biology qualifiers, the object of the relation is the biological or biochemical object represented by the enclosing model element. In the cases of the model qualifiers, the object of the relation is the model element itself.
[NESTED_CONTENT] The [NESTED_CONTENT] has the same format and restrictions as QUALIFIER, and is used to describe a clarification or another annotation about the QUALIFIER in which it appears, using such elements as bqbiol:isDescribedBy. Nested content permits, for example, describing protein modifications on species, or adding evidence codes for an annotation. Nested content relates to its containing QUALIFIER, not the other way around. It qualifies, but does not change, its meaning. Ignoring a [NESTED_CONTENT] does not affect the information provided by the containing QUALIFIER. These elements may be nested arbitrarily deep, with each successive layer describing or clarifying the annotation it is embedded within.
MIRIAM_URI The URI of the (external) resource being referenced, following the MIRIAM URI specification.
... A placeholder for “more of the same”, meaning, zero or more elements of the same form as the immediately preceding element. In the template above, there are 2 places where this appears: inside the rdf:Bag element to indicate that more rdf:li elements can appear there, and after the QUALIFIER element, to indicate that more QUALIFIER elements can appear there.
... A placeholder for either no content, or content that is other than the content defined by this Level 2 annotation scheme

There are some important restrictions to keep in mind regarding the SBML Level 2 scheme. The first restriction (stipulated in Section 3.2.4 of the SBML Level 2 Version 4 specification) is that there can be only one <rdf:RDF> element inside a given <annotation> element. Consequently, all RDF content associated with a given SBML object must be contained in this single <rdf:RDF> element. The second restriction (stipulated in Section 6.3 of the SBML Level 2 Version 4 specification) is that SBML controlled annotations are only recognized when they are in the first rdf:Description element within the <rdf:RDF> element. (This is shown in red type in the fragment above.) There can be other content inside the <rdf:RDF> element, but as far as the SBML MIRIAM annotation scheme is concerned, such other RDF content is private to a given model or software application and falls outside the simple scheme defined by SBML Level 2. (These restrictions are an effort to reduce the burden on software developers—it is presumed that the highly restricted scheme means an application does not need a full-blown RDF parser/writer in order to parse or write the SBML controlled annotations.)

A third restriction is that the structure of the rdf:Description element (shown in red type above) is actually limited to only what is shown here, and the structure of the QUALIFIER element is likewise limited to just an rdf:Bag containing only rdf:li elements that only use rdf:resource attributes. For better or worse, this is what SBML Level 2 defines as the scheme for annotations. While an application could write other RDF content inside the first rdf:Description element, it would not be compliant with SBML Level 2. Applications should put other RDF content outside, where indicated above by the ... after the first rdf:Description element. The sections below provide give examples of doing this.

Can the MIRIAM_URI refer to something inside the model?

No. Although this is perhaps not clear in the SBML Level 2 specification, the restriction on the <rdf:li rdf:resource="MIRIAM_URI”> elements is that the value must be an Identifiers.org URI (or the former MIRIAM URN)a MIRIAM URN. Although this simplifies reading model definitions, it unfortunately limits what can be put within this annotation framework.

An application or model author is free to put other RDF content elsewhere (for example, after the first rdf:Description element, as mentioned above), but then it becomes a proprietary annotation, and there is no guarantee that another software program will be prepared to interpret it in the intended way.

How do I put evidence codes in annotations?

There are at least two ways to do this. One approach puts the annotations entirely within the first rdf:Description element, making it SBML MIRIAM compliant, but it comes with a limitation that may or may not be significant in a particular situation. The second approach puts the annotation outside the first rdf:Description element, and while it is correct RDF and provides more power and flexibility, it is not compliant with the restricted SBML MIRIAM scheme of SBML Level 2 (but may nevertheless be understood by software that can parse full RDF). We describe each approach in turn.

Approach #1

MIRIAM Resources defines a URI for the ECO evidence code ontology accessed via the Ontology Lookup Service. To use it in the SBML Level 2 annotation scheme, use the isDescribedBy biological qualifier from the BioModels qualifiers list. The following is an example of annotating a reaction with the ECO code ECO:0000183.

<reaction metaid="metaid_0000052" id="vPFK" name="Phosphofructokinase">
  <annotation>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"  
             xmlns:bqbiol="https://biomodels.net/biology-qualifiers/"  >

      <rdf:Description rdf:about="#metaid_0000052">
  
        <bqbiol:isDescribedBy>  
          <rdf:Bag>
            <rdf:li rdf:resource="http://identifiers.org/eco/ECO:0000183"/>
          </rdf:Bag>
        </bqbiol:isDescribedBy>
  
        <bqbiol:is>
          <rdf:Bag>
            <rdf:li rdf:resource="http://identifiers.org/kegg.reaction/R00756"/>
            <rdf:li rdf:resource="http://identifiers.org/reactome/REACT_736"/>
          </rdf:Bag>
        </bqbiol:is>
  
      </rdf:Description>

    </rdf:RDF>
  </annotation>
  ...  
</reaction>

The above says that the reaction “is described by” the given evidence code ECO:0000183 (“Inferred from experiment/inferred from cell-free assay”), which is to be interpreted as the kind of evidence that supports the presence of the reaction in the model. (Since the evidence code is attached to the reaction and refers to it, this is the only possible interpretation.) This approach is simple and fits entirely within the scheme defined in SBML Level 2.

Depending on a modeler’s goals, this evidence association may be just what is sought. However, sometimes a modeler wants to describe the evidence associated with a particular statement about an SBML model entity. An example is associating an evidence code with a claim that a species occurs in a particular compartment: it is not a statement about the species, but rather about a relationship involving the species. The SBML Level 2 annotation scheme does not permit this kind of annotation-referring-to-another-annotation structure. We therefore describe an alternative (approach #2) below.

Approach #2

The second approach uses other RDF syntax (specifically, [http://www.w3.org/TR/rdf-primer/#reification RDF Reification]) placed outside of the first rdf:Description element controlled by the SBML MIRIAM annotation scheme. RDF reification provides a vocabulary for describing RDF statements using RDF, which is exactly the situation we face when we want to attach evidence codes to particular SBML annotations. RDF reification uses a rdf:Statement element and three RDF property elements for the subject, object, and predicate involved in the relationship being stated.

The following examples illustrates the syntax and approach:

<species metaid="metaid_0000052" id="S" compartment="ly"> 
  <annotation> 
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:bqbiol="https://biomodels.net/biology-qualifiers/" > 
 
      <rdf:Description rdf:about="#metaid_0000052"> 
        <bqbiol:occursIn> 
          <rdf:Bag> 
            <rdf:li rdf:resource="http://identifiers.org/go/GO:0005764"/> 
          </rdf:Bag> 
        </bqbiol:occursIn> 
      </rdf:Description> 
 
      <rdf:Statement> 
        <rdf:subject rdf:resource="#metaid_0000052"/> 
        <rdf:predicate rdf:resource="https://biomodels.net/biology-qualifiers/occursIn"/> 
        <rdf:object rdf:resource="http://identifiers.org/go/GO:0005764"/> 
        <bqbiol:isDescribedBy> 
          <rdf:Bag> 
            <rdf:li rdf:resource="http://identifiers.org/eco/ECO:0000004"/> 
            <rdf:li rdf:resource="http://identifiers.org/pubmed/7017716"/> 
          </rdf:Bag> 
        </bqbiol:isDescribedBy> 
      </rdf:Statement> 
 
    </rdf:RDF> 
  </annotation> 
  ...
</species>

The rdf:Description element above says that the species S is located in a lysosome (a cellular structure identified by GO term GO:0005764). That much conforms to the SBML MIRIAM annotation scheme defined in SBML Level 2.

After this, the annotation above contains a second RDF element, the rdf:Statement. This element states that the relationship “S occurs in GO:0005764” is described by two things simultaneously: the Pubmed publication number 7017716, and the ECO evidence code ECO:0000004 (“inferred from cell fractionation assay”). The first part of the statement uses the properties rdf:subject, rdf:predicate and rdf:object, respectively to identify the subject (here, the element with XML ID value metaid_0000052, which is the element for species S), the predicate (here, “occurs in”, defined by a BioModels.net qualifier), and the target/object (here, the lysosome as identified by GO term GO:0005764) of the relationship. The second part of the statement records the fact that this relationship is described by the given publication and evidence code.

It may seem as though there is duplication of information between the rdf:Description and rdf:Statement elements, but there is not. The RDF reification statement in the second element is not an assertion that species S is located in a lysosome; it is a statement about the assertion that species S is located in a lysosome. As the RDF Primer puts it, “… when someone says that John said something about the weight of a tent, they are not making a statement about the weight of a tent themselves, they are making a statement about something John said”.

Sample code for approaches #1 and #2

Sarah Keating wrote sample code demonstrating how to add the annotations above using libSBML. The programs are included in the libSBML distribution under the examples/c++ directory. Here are direct links to the files:

How do I put InChI strings in annotations?

Update: InChIs now form a collection in the MIRIAM Registry. Therefore, one can use InChIs using MIRIAM URIs as described above.

Certain modeling efforts want to attach InChI strings to SBML species elements, as a way of putting self-contained unique identifiers on the species. The limitations on SBML Level 2’s RDF annotation content described above prevents putting the InChI content directly inside the rdf:Description elements as defined by the SBML MIRIAM scheme. Consequently, in SBML Level 2, the annotations must be entered as “proprietary” annotations, outside the Level 2-defined scheme. We give an example of an approach to doing this here.

<species metaid="metaid_M_8" id="M_8" name="1-Methylnicotinamide" compartment="C_1"> 
  <annotation> 
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"  
             xmlns:bqbiol="https://biomodels.net/biology-qualifiers/"  >

      <rdf:Description rdf:about="#metaid_M_8"> 
        <bqbiol:is> 
          <rdf:Bag> 
            <rdf:li rdf:resource="http://identifiers.org/chebi/CHEBI:16797"/> 
          </rdf:Bag> 
        </bqbiol:is> 
      </rdf:Description> 

      <rdf:Description rdf:about="#metaid_M_8"> 
        <in:inchi xmlns:in="https://biomodels.net/inchi"> 
           InChI=1/C7H8N2O/c1-9-4-2-3-6(5-9)7(8)10/h2-5H,1H3,(H-,8,10)/p+1 
        </in:inchi> 
      </rdf:Description> 

    </rdf:RDF> 
  </annotation> 
</species>

The essential point of the scheme above is that it puts the InChI string as the content of an element other than the first rdf:Description element inside the rdf:RDF element. The SBML Level 2 specification only constrains the content of the first rdf:Description element, and permits valid XML RDF content beyond it, so the above is valid RDF, but simply outside the scope of the SBML Level 2 annotation scheme. SBML-compliant software tools may not attempt to understand it, but tools with extended support for RDF may be able to.

One last note about the use of the namespace URI "https://biomodels.net/inchi". The use of a namespace-qualified element implies that the entity controlling the namespace has defined (or can define) the content that does into the element. It may seem reasonable to use "http://iupac.org/inchi" for the namespace string, but it turns out that IUPAC, the organization behind InChI, does not define this namespace (or any namespace, as far as we can determine). This introduces the question of just exactly what to use in this situation. Attempting to use "http://iupac.org/inchi" would risk encountering a conflict in the future if IUPAC ever ‘‘does’’ define something using the "http://iupac.org/inchi" namespace. Consequently, a different namespace must be used. The BioModels.net consortium will release a simple schema for this purpose, using the namespace URI "https://biomodels.net/inchi".