The aim of this document is to provide a brief reference for the namespaces and the aspects of BEL syntax that are relevant for the BEL Track of the BioCreative challenge.
BEL is a complex and powerful knowledge representation formalism. BEL statements are typically written by domain experts on the basis of information found in the literature. The aim of this track is to verify how far BEL statements can be constructed by automated tools.
Given the complexity of the BEL language, we will select only statements that make use of a reduced set of namespaces and syntactic constructs. The specific constraints adopted for the BEL track are documented below. Note, however, that some choices are not yet finalized, and we welcome feedback from potential participants.
For full documentation of BEL and all supported namespaces, please see the links below:
BEL adopts a concept of namespaces to disambiguate references for biological entities. Generally, the user can associate a namespace prefix with an external vocabulary and refer to elements of the vocabulary within the namespace.
Currently OpenBEL offers 22 different namespaces. To reduce the complexity of the task we decided to focus on a selected set of namespaces.
The BEL Framework manages equivalencing between namespace values through specification and use of BEL namespace equivalence documents. When equivalence documents are specified as inputs to the knowledge assembly process, the BEL Compiler can integrate knowledge expressed using terms based on multiple namespaces.
Overview of Namespaces used in BEL Track Dataset
BEL statements can make use of a relatively large number of namespaces. However, for the sake of the challenge only those listed below will be considered. An additional difficulty is that in order to disambiguate a gene or protein name, it is necessary to know the organism to which it belongs. In the context of a full paper or abstract it is possible (in most cases) to find sufficient information to perform such disambiguation.
We are aware that organism disambiguation is not feasible on the basis of evidence sentences alone. We therefore accept results expressed in HGNC, MGI or EGID as long as they are equivalent: a statement receives the same full score if any one of its equivalent forms is in the gold standard. In the training data only the original namespaces are given. For example:
p(HGNC:MAPK14) increases act(p(HGNC:STAT1)) is equivalent to
p(EGID:1432) increases act(p(EGID:6772)) and
p(MGI:Mapk14) increases act(p(MGI:Stat1)) and
p(EGID:26416) increases act(p(EGID:20846))
All objects in BEL have a unique internal identifier that can be found using the "BEL Namespace Equivalence Documents" (*.beleq) listed below. These internal identifiers can be used to map across the three databases (HGNC, MGI, EGID).
The "BEL Orthology resource", a BEL document containing orthologous relationships (HGNC (human) / MGI (mouse), HGNC (human) / RGD (rat), MGI (mouse) / RGD (rat)), might also be useful.
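The equivalence described above can be sketched in a few lines of Python. This is only an illustration: the `GROUPS` table is written by hand from the MAPK14/STAT1 example, the group keys `G1` and `G2` are made-up placeholders rather than real internal identifiers, and in practice such a table would have to be derived from the .beleq files and the orthology resource.

```python
import re

# Hand-written table for the example above (assumption: a real table would be
# built from the .beleq files and the orthology resource). "G1"/"G2" are
# hypothetical group keys, not actual BEL internal identifiers.
GROUPS = {
    "HGNC:MAPK14": "G1", "EGID:1432": "G1", "MGI:Mapk14": "G1", "EGID:26416": "G1",
    "HGNC:STAT1": "G2", "EGID:6772": "G2", "MGI:Stat1": "G2", "EGID:20846": "G2",
}

# Matches a gene identifier in one of the three accepted gene namespaces.
GENE_ID = re.compile(r"(?:HGNC|MGI|EGID):\w+")


def canonical(statement: str) -> str:
    """Replace every known gene identifier with its group key."""
    return GENE_ID.sub(lambda m: GROUPS.get(m.group(0), m.group(0)), statement)


def equivalent(a: str, b: str) -> bool:
    """Two statements are treated as equal if their canonical forms match."""
    return canonical(a) == canonical(b)


human = "p(HGNC:MAPK14) increases act(p(HGNC:STAT1))"
mouse = "p(MGI:Mapk14) increases act(p(MGI:Stat1))"
print(equivalent(human, mouse))
```

Identifiers that are not in the table are left untouched, so statements about unrelated genes still compare as different.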
HGNC: HUGO Gene Nomenclature Committee
- Link to Resource: http://www.genenames.org/
- Short Description: HGNC is responsible for approving unique symbols and names for human loci, including protein coding genes, ncRNA genes and pseudogenes, to allow unambiguous scientific communication.
- Download: Homo_sapiens.gene_info
- BEL Namespace Documents: hgnc-human-genes.belns
- BEL Namespace Equivalence Documents: hgnc-human-genes.beleq
MGI: Mouse Genome Informatics
- Link to Resource: http://www.informatics.jax.org/
- Short Description: MGI is the international database resource for the laboratory mouse, providing integrated genetic, genomic, and biological data to facilitate the study of human health and disease.
- Download: Mus_musculus.gene_info
- BEL Namespace Documents: mgi-mouse-genes.belns
- BEL Namespace Equivalence Documents: mgi-mouse-genes.beleq
EGID: Entrez Gene IDs
- Link to Resource: http://www.ncbi.nlm.nih.gov/gene
- Short Description: Entrez Gene integrates information from a wide range of species. A record may include nomenclature, Reference Sequences (RefSeqs), maps, pathways, variations, phenotypes, and links to genome-, phenotype-, and locus-specific resources worldwide.
- Download: gene_info
- BEL Namespace Documents: entrez-gene-ids.belns
- BEL Namespace Equivalence Documents: entrez-gene-ids.beleq
- Comment: We will accept mouse hits for human hits and vice versa.
GOBP: The Gene Ontology Biological Processes
- Link to Resource: http://www.geneontology.org/, http://obofoundry.org/cgi-bin/detail.cgi?id=biological_process
- Short Description: The Gene Ontology (GO) project is a collaborative effort to address the need for consistent descriptions of gene products across databases. The GO project has developed three structured, controlled vocabularies (ontologies) that describe gene products in terms of their associated biological processes, cellular components and molecular functions in a species-independent manner.
- Download: GO obo-xml file
- BEL Namespace Documents: go-biological-process.belns
- BEL Namespace Equivalence Documents: go-biological-process.beleq
- Comment: We will focus on a restricted set of frequently occurring biological processes.
Only the GO processes included in this list will be considered.
MESHD: MeSH Diseases
- Link to Resource: http://www.nlm.nih.gov/mesh/
- Short Description: MeSH is the National Library of Medicine's controlled vocabulary thesaurus. It consists of sets of terms naming descriptors in a hierarchical structure that permits searching at various levels of specificity.
- Download: MeSH Download Page
- BEL Namespace Documents: mesh-diseases.belns
- BEL Namespace Equivalence Documents: mesh-diseases.beleq
CHEBI: Chemical Entities of Biological Interest (ChEBI)
- Link to Resource: http://www.ebi.ac.uk/chebi/
- Short Description: Chemical Entities of Biological Interest (ChEBI) is a freely available dictionary of molecular entities focused on ‘small’ chemical compounds. ChEBI ontology is provided in the W3C standard Web Ontology Language (OWL) and OBO formats.
- Download: Chebi Download Page
- BEL Namespace Documents: chebi.belns
- BEL Namespace Equivalence Documents: chebi.beleq
GOCCID: GO Cellular Components
- These occur in the gold standard in the context of the pmod() function but are NOT EXPECTED as part of the submissions and will NOT BE EVALUATED.
BEL Statements: Functions and Relationships
BEL Statements are expressions that represent knowledge of the existence of biological entities and relationships between them.
Most BEL Statements represent relationships between one BEL Term and another BEL Term or BEL Statement. This type of BEL Statement encodes a semantic triple (subject, relationship type, object), which represents an assertion of a relationship between the subject and object.
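As a rough illustration of the triple structure, the sketch below splits a statement into subject, relationship and object. It is not an official BEL parser: it only looks for a relationship token at parenthesis depth 0 and outside quoted names, and the token list (`increases`, `decreases`, `->`, `-|`) is an assumption limited to the forms used in this document plus the long form of `-|`.

```python
# Relationship tokens recognised by this sketch (assumption: not exhaustive).
RELATIONS = ("increases", "decreases", "->", "-|")


def split_statement(statement: str):
    """Return (subject, relation, object) for a simple BEL statement.

    The scan tracks parenthesis depth and quoted names so that spaces inside
    terms such as a(CHEBI:"brefeldin A") are never mistaken for separators.
    """
    depth = 0
    in_quotes = False
    for i, ch in enumerate(statement):
        if ch == '"':
            in_quotes = not in_quotes
        elif in_quotes:
            continue
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == " " and depth == 0:
            for rel in RELATIONS:
                token = " " + rel + " "
                if statement.startswith(token, i):
                    subject = statement[:i].strip()
                    obj = statement[i + len(token):].strip()
                    return subject, rel, obj
    raise ValueError("no top-level relationship found: " + statement)


print(split_statement('a(CHEBI:"brefeldin A") -| p(HGNC:SCOC)'))
```

Because the scan stops at the first top-level relationship, an object that is itself a parenthesised statement is returned as a single string and can be split again if needed.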
In the data set we provide the full BEL statements as given by the authors, but we expect participants to generate only simplified statements. In the evaluation process we will additionally generate scores for partial statements (see Evaluation Details of BioCreative BEL Task 1 for more details).
Functions associated to Namespaces
In BEL, each namespace is associated with particular abundance or process functions:
- For genes (HGNC, MGI, EGID) we use only the protein abundance function p() in the simplified version
- For CHEBI the abundance function a() is used
- For GOBP the function for biological processes bp() is used
- For MESHD the pathology function path() is used
BEL Terms are formed by using these BEL functions together with entity definitions referenced by identifiers of the associated BEL Namespace. Each BEL Term represents either a biological process or the abundance of an entity.
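The namespace-to-function mapping listed above can be written down as a small lookup table. The following is a minimal sketch for the simplified statements only; the helper name `bel_term` and the quoting rule (quote any value that is not purely alphanumeric, as in `CHEBI:"nitric oxide"`) are assumptions made for illustration.

```python
# Function used for each namespace in the simplified statements.
FUNCTION_FOR_NAMESPACE = {
    "HGNC": "p",
    "MGI": "p",
    "EGID": "p",
    "CHEBI": "a",
    "GOBP": "bp",
    "MESHD": "path",
}


def bel_term(namespace: str, value: str) -> str:
    """Build a BEL term such as p(HGNC:MAPK14) from a namespace and a value."""
    function = FUNCTION_FOR_NAMESPACE[namespace]  # KeyError for unsupported namespaces
    # Assumed quoting rule: values containing spaces or punctuation are quoted.
    if not value.replace("_", "").isalnum():
        value = '"' + value + '"'
    return f"{function}({namespace}:{value})"


print(bel_term("HGNC", "MAPK14"))
print(bel_term("CHEBI", "nitric oxide"))
print(bel_term("GOBP", "cell adhesion"))
```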
|Namespace||Functions||BEL Term Example|
|HGNC||p(), g(), r(), m()||p(HGNC:MAPK14)|
|MGI||p(), g(), r(), m()||p(MGI:Mapk14)|
|EGID||p(), g(), r()||p(EGID:1432)|
Note: Only p() will be used in the simplified statements and in the evaluation.
Abundance functions: complexAbundance
Denotes the abundance of a molecular complex.
complex(p(MGI:Itga8),p(MGI:Itgb1)) -> bp(GOBP:"cell adhesion")
Modification functions: proteinModification
Used as an argument within a p() function to indicate covalent modification of the specified protein.
p(MGI:Cav1,pmod(P)) -> a(CHEBI:"nitric oxide")
Transformation functions: degradation
Denotes the frequency or abundance of events where an argument is degraded.
p(MGI:Lyve1) -> deg(a(CHEBI:"hyaluronic acid"))
Transformation functions: translocation
Denotes the frequency or abundance of events where an argument changes location.
a(CHEBI:"brefeldin A") -> tloc(p(MGI:Stk16))
Activity functions: molecularActivity
Denotes the frequency or abundance of events where an argument acts as a causal agent at the molecular scale.
complex(p(MGI:Cckbr),p(MGI:Gast)) -> act(p(MGI:Prkd1))
These functions will be scored separately. A relation that is found receives a basic score even when these functions are not included in the submitted BEL statement. Details of the evaluation can be found in Evaluation Details of BioCreative BEL Task 1, and a full overview of how BEL functions are evaluated is given in All Functions Evaluation Overview.
The last function, act(), describes the activity of a protein. This information is sometimes stated directly in the sentence but is often inferred by the annotator from background knowledge. The original statements use different specific activity functions, such as cat(), tscript(), kin() and gtp(); act() will be accepted in place of any of them.
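One way to map the specific activity functions onto act() is a simple textual substitution, sketched below. The function name `generalize_activities` and the sample statement are illustrative assumptions; the list of activity functions is limited to the four named above, and the regular expression does not protect quoted names that happen to contain text such as `kin(`.

```python
import re

# Specific activity functions named in this document; others may exist.
SPECIFIC_ACTIVITIES = ("cat", "tscript", "kin", "gtp")

# Matches any of the specific activity functions followed by "(".
_ACTIVITY = re.compile(r"\b(?:" + "|".join(SPECIFIC_ACTIVITIES) + r")\(")


def generalize_activities(statement: str) -> str:
    """Rewrite cat(), tscript(), kin() and gtp() as the generic act()."""
    return _ACTIVITY.sub("act(", statement)


# Hypothetical full statement, used only to demonstrate the rewrite.
full = "kin(p(HGNC:MAPK14)) increases tscript(p(HGNC:STAT1))"
print(generalize_activities(full))
```

Statements that already use act(), or that contain no activity function at all, pass through unchanged.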
In the BEL Track dataset, there are four accepted relationships:
a(CHEBI:"brefeldin A") -| p(HGNC:SCOC)
p(HGNC:BMP4) increases complex(p(HGNC:MTOR), p(HGNC:STAT3))
These relationships will be expected and evaluated.
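The examples in this document mix short forms (`->`, `-|`) and long forms (`increases`), so a submission pipeline may want to normalize them before comparison. The sketch below uses the short forms defined in the BEL 1.0 language; treating `directlyIncreases` and `directlyDecreases` as the remaining two accepted relationships is an assumption, since only two examples are shown above, and `normalize_relation` is a hypothetical helper name.

```python
# BEL 1.0 short forms and their long forms. Which four relationships the
# track accepts is assumed here, not stated explicitly in this document.
LONG_FORM = {
    "->": "increases",
    "-|": "decreases",
    "=>": "directlyIncreases",
    "=|": "directlyDecreases",
}


def normalize_relation(relation: str) -> str:
    """Return the long form of a relationship, accepting either form."""
    relation = relation.strip()
    if relation in LONG_FORM:
        return LONG_FORM[relation]
    if relation in LONG_FORM.values():
        return relation
    raise ValueError("unsupported relationship: " + relation)


print(normalize_relation("-|"))
print(normalize_relation("increases"))
```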