Identifiers & CURIEs
GA4GH Recommendation
GA4GH recommends using CURIEs as (external) identifiers.
General Use of Identifiers in GA4GH Standards
CURIEs
CURIEs ("Compact URIs") are namespace-scoped identifiers which can be expanded to Internationalized Resource Identifiers (IRIs). A CURIE consists of two components, a prefix and a reference, separated by a colon (:). CURIEs are case sensitive, although for prefixes this is not followed consistently in practice.
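To illustrate the prefix/reference structure and the expansion to an IRI, here is a minimal sketch in Python. The function names and the `PREFIX_MAP` dictionary are hypothetical and chosen for illustration only; a real deployment would take its prefix-to-IRI mappings from a registry or resolver rather than a hard-coded table.

```python
# Hypothetical prefix map for illustration; the HP entry uses the OBO PURL form.
PREFIX_MAP = {
    "HP": "http://purl.obolibrary.org/obo/HP_",
}


def split_curie(curie: str) -> tuple[str, str]:
    """Split a CURIE into (prefix, reference) at the first colon."""
    prefix, sep, reference = curie.partition(":")
    if not sep or not prefix or not reference:
        raise ValueError(f"not a CURIE: {curie!r}")
    return prefix, reference


def expand_curie(curie: str, prefix_map: dict[str, str] = PREFIX_MAP) -> str:
    """Expand a CURIE to an IRI; the lookup is case sensitive."""
    prefix, reference = split_curie(curie)
    if prefix not in prefix_map:
        raise ValueError(f"unknown prefix: {prefix!r}")
    return prefix_map[prefix] + reference


print(expand_curie("HP:0003621"))
```

Because the lookup is case sensitive, `hp:0003621` would be rejected by this sketch unless a separate `hp` entry were added to the map.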
The GA4GH recommendations are:
- use only a single prefix
- for newly generated identifiers, and specifically for the new ga4gh namespace, avoid the underscore (_) character in the private part of an identifier
    - the reason is that the colon (:) separator is sometimes replaced by an underscore (_) in computing environments where the colon may be problematic
    - exceptions are underscore characters in computed identifiers
- a reasonable separation character for structural elements of the private identifier part ("internal prefix") is the dot (.) character
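The recommendations above can be sketched as a small checker. This is an illustrative interpretation, not an official GA4GH validator: the function name, the `computed` flag, and the reading of "single prefix" as "no second colon-separated prefix" are assumptions made for this example.

```python
def check_new_identifier(curie: str, computed: bool = False) -> list[str]:
    """Return the recommendations a newly minted identifier violates."""
    prefix, sep, private = curie.partition(":")
    if not sep or not prefix or not private:
        return ["not of the form prefix:reference"]
    problems = []
    # Assumed reading of "use only a single prefix": no nested prefixes.
    if ":" in private:
        problems.append("more than one prefix")
    # Underscores are tolerated only in computed identifiers.
    if "_" in private and not computed:
        problems.append("underscore in private part")
    return problems


def colon_to_underscore(curie: str) -> str:
    """Mimic environments that replace the colon separator with '_'."""
    return curie.replace(":", "_", 1)


print(check_new_identifier("ga4gh:VA.abc123"))
print(check_new_identifier("ga4gh:my_set.1"))
print(colon_to_underscore("ga4gh:my_set.1"))
```

The last call shows why underscores in the private part are discouraged: after the replacement, `ga4gh_my_set.1` no longer reveals whether the prefix was `ga4gh` or `ga4gh_my`.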
Example use of CURIEs in GA4GH
In GA4GH schemas, CURIEs are the recommended syntax for referencing ontology classes or external references. Usually, a CURIE as id is combined with a label for the text representation, as in the OntologyClass object prototype:
    "onset": {
        "label": "Juvenile onset",
        "id": "HP:0003621"
    },
    "external_references": [
        {
            "id": "cellosaurus:CVCL_0312",
            "label": "HOS"
        }
    ]
The underscore in the Cellosaurus id cellosaurus:CVCL_0312 should usually not be problematic as long as the id is properly prefixed; however, de novo identifier designs may want to avoid such a syntax.
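As a rough sketch of how such an id/label pair might be handled in code, the following Python dataclass checks that the id looks like a CURIE. The class is a simplified stand-in for the OntologyClass prototype, and the regular expression is an assumed, permissive pattern rather than the one defined in the schema.

```python
import re
from dataclasses import dataclass

# Assumed, permissive CURIE pattern: a prefix, one colon, a non-empty reference.
CURIE_PATTERN = re.compile(r"^\w[^:]*:.+$")


@dataclass(frozen=True)
class OntologyClass:
    """Simplified stand-in for an id/label pair."""
    id: str
    label: str = ""

    def __post_init__(self) -> None:
        if not CURIE_PATTERN.match(self.id):
            raise ValueError(f"id is not a CURIE: {self.id!r}")

    @property
    def prefix(self) -> str:
        return self.id.partition(":")[0]

    @property
    def reference(self) -> str:
        return self.id.partition(":")[2]


onset = OntologyClass(id="HP:0003621", label="Juvenile onset")
cell_line = OntologyClass(id="cellosaurus:CVCL_0312", label="HOS")
print(onset.prefix, onset.reference)
print(cell_line.prefix, cell_line.reference)
```

Splitting at the first colon keeps the underscore of `CVCL_0312` inside the reference, which is why the properly prefixed Cellosaurus id remains unambiguous.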
Contributors
- Chris Mungall (@cmungall)
- Julie McMurry (@jmcmurry)
- Melissa Haendel (@mellybelly)
- Michael Baudis (@mbaudis)
- Reece Hart (@reece)
- cross GA4GH alignment discussions
Further Information
- W3C CURIE syntax page
- W3C IRI documentation
- IETF IRI specification (e.g. allowed characters)
- SchemaBlocks OntologyClass class documentation
- SchemaBlocks Curie class documentation
- N2T resolver documentation
Please also see a previous discussion on GitHub, and the links from there.
The ga4gh Namespace
ga4gh Prefix [1]
In a "GA4GH Namespace Discussion" telecon on 2019-08-22, initiated by GKS and with the participation of different work stream and project leads, it was agreed that newly generated identifiers created and maintained in the "GA4GH ecosystem" should use a general ga4gh prefix rather than scoped prefixes. Details and implementation of this general concept are currently being evaluated.
Extensive discussion of this can be found in the GA4GH TASC space and the VRS specification.
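As an illustration of a single ga4gh prefix combined with a dot-separated internal prefix, the sketch below builds a computed identifier from a truncated SHA-512 digest, in the style described in the VRS specification. The function `ga4gh_identifier` and the example payload are hypothetical; real VRS identifiers are computed over a canonical serialization of the object, which is not reproduced here.

```python
import base64
import hashlib


def sha512t24u(blob: bytes) -> str:
    """Base64url-encode the first 24 bytes of the SHA-512 digest of blob."""
    digest = hashlib.sha512(blob).digest()
    return base64.urlsafe_b64encode(digest[:24]).decode("ascii")


def ga4gh_identifier(type_prefix: str, blob: bytes) -> str:
    """Compose ga4gh:<type_prefix>.<digest> (hypothetical helper)."""
    return f"ga4gh:{type_prefix}.{sha512t24u(blob)}"


# The payload is a placeholder, not a real serialized object.
print(ga4gh_identifier("VA", b"example payload"))
```

Since base64url output may contain underscores, identifiers of this kind fall under the exception for computed identifiers noted in the recommendations.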
[1] Image by Alex Wagner, from