
This document is a brief reference for the namespaces and aspects of BEL syntax that are relevant to the BEL track of the BioCreative challenge.

BEL is a complex and powerful knowledge representation formalism. BEL statements are typically written by domain experts based on information found in the literature. The aim of this track is to assess how well automated tools can construct BEL statements.

Given the complexity of the BEL language, we will select only statements that use a reduced set of namespaces and syntactic constructs. The specific constraints adopted for the BEL track are documented below. Note, however, that some choices are not yet finalized, and we welcome feedback from potential participants.

For full documentation of BEL and all supported namespaces, please see the links below:

BEL Namespaces

BEL adopts a concept of namespaces to disambiguate references for biological entities. Generally, the user can associate a namespace prefix with an external vocabulary and refer to elements of the vocabulary within the namespace.

OpenBEL currently offers 22 different namespaces. To reduce the complexity of the task, we decided to focus on a selected set of namespaces.

The BEL Framework manages equivalencing between namespace values through specification and use of BEL namespace equivalence documents. When equivalence documents are specified as inputs to the knowledge assembly process, the BEL Compiler can integrate knowledge expressed using terms based on multiple namespaces.

Overview of Namespaces used in BEL Track Dataset

BEL statements can use a relatively large number of namespaces. However, for the purposes of the challenge, only those listed below will be considered. An additional difficulty is that disambiguating a gene or protein name requires knowing the organism it belongs to. In the context of a full paper or abstract, it is possible in most cases to find enough information to perform this disambiguation.

We are aware that organism disambiguation is not feasible based on the evidence sentences alone. We therefore accept results expressed in HGNC, MGI or EGID as long as they are equivalent: a statement receives the same full score whichever of these namespaces it uses, provided an equivalent statement is in the gold standard. The training data give only the namespaces used in the original statements.

Example: the following four statements are treated as equivalent:

p(HGNC:MAPK14) increases act(p(HGNC:STAT1))
p(EGID:1432) increases act(p(EGID:6772))
p(MGI:Mapk14) increases act(p(MGI:Stat1))
p(EGID:26416) increases act(p(EGID:20846))

All objects in BEL have a unique internal identifier, which can be found in the BEL Namespace Equivalence Documents (*.beleq) listed below. These internal identifiers can be used to map across the three databases (HGNC, MGI, EGID).
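The mapping via internal identifiers can be pictured as a lookup table built from the equivalence documents. The sketch below assumes that a .beleq file contains a [Values] section with one value|identifier pair per line; the identifiers eq-h1 and eq-h2 are placeholders, not the real BEL identifiers, and the helper names parse_beleq and normalize are hypothetical. Only human identifiers are shown. Two statements count as equivalent here if they become identical once every namespace reference is replaced by its internal identifier.

```python
import re

def parse_beleq(text):
    """Return {value: equivalence_id} from the [Values] section of a .beleq document."""
    mapping = {}
    in_values = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # Section headers look like [Values]; only that section holds mappings.
            in_values = line == "[Values]"
            continue
        if in_values and "|" in line:
            value, eq_id = line.split("|", 1)
            mapping[value] = eq_id
    return mapping

# Hypothetical excerpts: "eq-h1"/"eq-h2" stand in for the real internal identifiers.
EQUIV = {
    "HGNC": parse_beleq("[Values]\nMAPK14|eq-h1\nSTAT1|eq-h2\n"),
    "EGID": parse_beleq("[Values]\n1432|eq-h1\n6772|eq-h2\n"),
}

# NS:value, where value is a quoted string or a bare token.
NS_REF = re.compile(r'([A-Z]+):("[^"]*"|[A-Za-z0-9_.\-]+)')

def normalize(statement):
    """Replace each NS:value reference that has a known equivalence ID with EQ:<id>."""
    def repl(match):
        namespace, value = match.group(1), match.group(2).strip('"')
        eq_id = EQUIV.get(namespace, {}).get(value)
        return f"EQ:{eq_id}" if eq_id else match.group(0)
    return NS_REF.sub(repl, statement)

human_hgnc = "p(HGNC:MAPK14) increases act(p(HGNC:STAT1))"
human_egid = "p(EGID:1432) increases act(p(EGID:6772))"
print(normalize(human_hgnc) == normalize(human_egid))  # True
```

References without an entry in the table (for example CHEBI terms) are left unchanged, so the comparison still works for statements that mix gene and chemical namespaces.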

The "BEL Orthology resource", a BEL document containing orthologous relationships (HGNC (human) / MGI (mouse), HGNC (human) / RGD (rat), MGI (mouse) / RGD (rat)), might also be useful.
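Orthology pairs can be merged into groups so that, for example, a human gene and its mouse ortholog are looked up as one unit. This is a minimal union-find sketch; it assumes the orthology document states each pair as g(NS:value) orthologousTo g(NS:value), and the three input lines are a hypothetical excerpt, not copied from the resource.

```python
import re

# Assumed statement shape in the orthology document:
#   g(HGNC:MAPK14) orthologousTo g(MGI:Mapk14)
ORTHOLOGY = re.compile(r'g\((\w+:[^)]+)\)\s+orthologousTo\s+g\((\w+:[^)]+)\)')

def build_ortholog_groups(lines):
    """Group gene references connected by orthologousTo (union-find); return a lookup."""
    parent = {}

    def find(ref):
        parent.setdefault(ref, ref)
        while parent[ref] != ref:
            parent[ref] = parent[parent[ref]]  # path halving keeps trees shallow
            ref = parent[ref]
        return ref

    for line in lines:
        match = ORTHOLOGY.search(line)
        if match:
            # Attach the root of the first reference's group to the second's.
            parent[find(match.group(1))] = find(match.group(2))
    return find

# Hypothetical excerpt of the orthology document.
orthology_doc = [
    "g(HGNC:MAPK14) orthologousTo g(MGI:Mapk14)",
    "g(HGNC:STAT1) orthologousTo g(MGI:Stat1)",
    "g(MGI:Stat1) orthologousTo g(RGD:Stat1)",
]
group_of = build_ortholog_groups(orthology_doc)
print(group_of("HGNC:STAT1") == group_of("RGD:Stat1"))  # True
```

Grouping transitively means a human/rat pair is linked even when the document only relates each of them to the mouse gene.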

HGNC: HUGO Gene Nomenclature Committee

MGI: Mouse Genome Informatics

EGID: Entrez Gene Identifier

  • Link to Resource: http://www.ncbi.nlm.nih.gov/gene
  • Short Description: Entrez Gene integrates information from a wide range of species. A record may include nomenclature, Reference Sequences (RefSeqs), maps, pathways, variations, phenotypes, and links to genome-, phenotype-, and locus-specific resources worldwide.
  • Download: gene_info
  • BEL Namespace Documents: entrez-gene-ids.belns
  • BEL Namespace Equivalence Documents: entrez-gene-ids.beleq
  • Comment: We will accept mouse hits for human hits and vice versa.

GOBP: The Gene Ontology Biological Processes


MESHD: MeSH Diseases

CHEBI: Chemical Entities of Biological Interest (ChEBI)

  • Link to Resource: http://www.ebi.ac.uk/chebi/
  • Short Description: Chemical Entities of Biological Interest (ChEBI) is a freely available dictionary of molecular entities focused on ‘small’ chemical compounds. The ChEBI ontology is provided in the W3C standard Web Ontology Language (OWL) and in OBO format.
  • Download: Chebi Download Page
  • BEL Namespace Documents: chebi.belns
  • BEL Namespace Equivalence Documents: chebi.beleq

GOCCID: GO Cellular Components

  • These occur in the gold standard in the context of the pmod() function but are NOT EXPECTED as part of the submissions and will NOT BE EVALUATED.

BEL Statements: Functions and Relationships

 

BEL Statements are expressions that represent knowledge of the existence of biological entities and relationships between them.

Most BEL Statements represent relationships between one BEL Term and another BEL Term or BEL Statement. This type of BEL Statement encodes a semantic triple (subject, relationship type, object), which represents an assertion of a relationship between the subject and object. 
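A simplified statement can be split into its (subject, relationship type, object) triple by reading the first balanced term, then the relationship token, then the remainder. The sketch below does this; BelTriple, split_first_term and parse_statement are hypothetical names, and the object is returned as raw text, so a nested statement in the object position is not parsed further.

```python
from dataclasses import dataclass

@dataclass
class BelTriple:
    subject: str
    relation: str
    obj: str

def split_first_term(text):
    """Split text after the closing parenthesis of its first balanced term."""
    depth = 0
    in_quotes = False
    for i, ch in enumerate(text):
        if ch == '"':
            in_quotes = not in_quotes  # parentheses inside quoted names do not count
        elif not in_quotes and ch == "(":
            depth += 1
        elif not in_quotes and ch == ")":
            depth -= 1
            if depth == 0:
                return text[: i + 1], text[i + 1 :]
    raise ValueError(f"unbalanced term: {text!r}")

def parse_statement(statement):
    """Parse 'subject relation object' into a BelTriple; the object stays raw text."""
    subject, rest = split_first_term(statement.strip())
    relation, obj = rest.split(maxsplit=1)
    return BelTriple(subject, relation, obj.strip())

print(parse_statement('a(CHEBI:"brefeldin A") -| p(HGNC:SCOC)'))
# BelTriple(subject='a(CHEBI:"brefeldin A")', relation='-|', obj='p(HGNC:SCOC)')
```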

In the data set, we provide the full BEL statements as given by the authors, but we expect participants to generate only simplified statements. In addition, the evaluation process will also score partial statements (see Evaluation Details of Biocreative BEL Task 1 for more details).

Functions associated to Namespaces

In BEL, each namespace is used with particular abundance or process functions:

  • For genes (HGNC, MGI, EGID), we use only the protein abundance function p() in the simplified version
  • For CHEBI the abundance function a() is used
  • For GOBP the function for biological processes bp() is used
  • For MESHD the pathology function path() is used
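The namespace-to-function assignment above is a simple lookup. The following sketch builds a simplified BEL term from a namespace and a value; make_term is a hypothetical helper, and quoting every value that is not purely letters, digits and underscores is an assumption chosen to be safe, not a rule taken from this page.

```python
# Function used for each namespace in the simplified statements, per the list above.
NAMESPACE_FUNCTION = {
    "HGNC": "p", "MGI": "p", "EGID": "p",
    "CHEBI": "a",
    "GOBP": "bp",
    "MESHD": "path",
}

def make_term(namespace, value):
    """Build a BEL term such as p(HGNC:MAPK14) or bp(GOBP:"cell proliferation")."""
    function = NAMESPACE_FUNCTION[namespace]
    # Assumption: quote any value that is not purely letters, digits and underscores.
    if not value.replace("_", "").isalnum():
        value = f'"{value}"'
    return f"{function}({namespace}:{value})"

print(make_term("GOBP", "cell proliferation"))  # bp(GOBP:"cell proliferation")
```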


BEL Terms are formed by using these BEL functions together with entity definitions referenced by identifiers of the associated BEL Namespace. Each BEL Term represents either a biological process or the abundance of an entity.

Overview:

Namespace | Functions              | BEL Term Example
HGNC      | p(), g(), r(), m() [1] | p(HGNC:MAPK14)
MGI       | p(), g(), r(), m() [1] | p(MGI:Mapk14)
EGID      | p(), g(), r() [1]      | p(EGID:1432)
GOBP      | bp()                   | bp(GOBP:"cell proliferation")
MESHD     | path()                 | path(MESHD:Hyperoxia)
CHEBI     | a()                    | a(CHEBI:lipopolysaccharide)

[1] Only p() will be used in the simplified statement and the evaluation
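Because only p() is used in the simplified statements, gene, RNA and microRNA abundances of gene-namespace entities can be rewritten as protein abundances. A minimal regex sketch, assuming the argument is a plain NS:value reference without nested parentheses; to_simplified is a hypothetical name.

```python
import re

# g(), r() or m() applied directly to a gene-namespace reference, e.g. r(HGNC:STAT1).
GENE_ABUNDANCE = re.compile(r"\b[grm]\(((?:HGNC|MGI|EGID):[^()]*)\)")

def to_simplified(statement):
    """Rewrite gene, RNA and microRNA abundances as protein abundances p()."""
    return GENE_ABUNDANCE.sub(r"p(\1)", statement)

print(to_simplified('r(HGNC:STAT1) increases bp(GOBP:"cell proliferation")'))
# p(HGNC:STAT1) increases bp(GOBP:"cell proliferation")
```

The word boundary before the function letter keeps the pattern from matching the tail of longer function names such as deg().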

Other Functions

complex()
  Abundance functions: complexAbundance
  Denotes the abundance of a molecular complex.
  Example: complex(p(MGI:Itga8),p(MGI:Itgb1)) -> bp(GOBP:"cell adhesion")

pmod()
  Modifications: proteinModification
  Used as an argument within a p() function to indicate covalent modification of the specified protein.
  Example: p(MGI:Cav1,pmod(P)) -> a(CHEBI:"nitric oxide")

deg()
  Transformations: degradation
  Denotes the frequency or abundance of events where an argument is degraded.
  Example: p(MGI:Lyve1) -> deg(a(CHEBI:"hyaluronic acid"))

tloc()
  Transformations: translocation
  Denotes the frequency or abundance of events where an argument changes location.
  Example: a(CHEBI:"brefeldin A") -> tloc(p(MGI:Stk16))

act()
  Activities: molecularActivity
  Denotes the frequency or abundance of events where an argument acts as a causal agent at the molecular scale.
  Example: complex(p(MGI:Cckbr),p(MGI:Gast)) -> act(p(MGI:Prkd1))

These functions will be scored separately: a relation you find receives a basic score even if your BEL statement does not include these functions. Details of the evaluation can be found here: Evaluation Details of Biocreative BEL Task 1. A full overview of how BEL functions are evaluated can be found here: All Functions Evaluation Overview.
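To illustrate what a basic match without these functions could look like, the sketch below strips outer act(), deg() and tloc() wrappers from subject and object before comparing triples. This is a hypothetical helper, not the official scorer described in the evaluation details; it assumes each wrapper has a single argument (no location arguments for tloc()) and it does not handle complex() or pmod().

```python
# Hypothetical helper, not the official scorer: wrappers whose absence should
# not block a basic match. Assumes single-argument wrappers.
WRAPPERS = ("act", "deg", "tloc")

def strip_wrappers(term):
    """Remove outer act()/deg()/tloc() calls: act(p(MGI:Prkd1)) -> p(MGI:Prkd1)."""
    term = term.strip()
    for name in WRAPPERS:
        prefix = name + "("
        if term.startswith(prefix) and term.endswith(")"):
            return strip_wrappers(term[len(prefix):-1])
    return term

def basic_match(predicted, gold):
    """Compare (subject, relation, object) triples after stripping the wrappers."""
    (ps, pr, po), (gs, gr, go) = predicted, gold
    return (strip_wrappers(ps), pr, strip_wrappers(po)) == (
        strip_wrappers(gs), gr, strip_wrappers(go))

gold = ("complex(p(MGI:Cckbr),p(MGI:Gast))", "increases", "act(p(MGI:Prkd1))")
pred = ("complex(p(MGI:Cckbr),p(MGI:Gast))", "increases", "p(MGI:Prkd1)")
print(basic_match(pred, gold))  # True
```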

The act() function describes the activity of a protein. The curator sometimes takes this information directly from the sentence, but often infers it from background knowledge. The original statements use specific activity functions such as cat(), tscript(), kin() and gtp(); act() will be accepted in place of any of them.
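When comparing against the original statements, the specific activity functions can be mapped to act() with a single substitution. The page names cat(), tscript(), kin() and gtp(); the sketch also includes the other BEL 1.0 activity functions (chap(), pep(), phos(), ribo(), tport()) on the assumption that they are treated the same way. normalize_activity is a hypothetical name.

```python
import re

# Specific BEL 1.0 activity functions. cat, tscript, kin and gtp are named in the
# text; the others are included on the assumption that they are accepted as act() too.
SPECIFIC_ACTIVITY = re.compile(r"\b(?:cat|chap|gtp|kin|pep|phos|ribo|tscript|tport)\(")

def normalize_activity(statement):
    """Replace each specific activity function with the generic act()."""
    return SPECIFIC_ACTIVITY.sub("act(", statement)

print(normalize_activity("p(HGNC:MAPK14) increases kin(p(HGNC:STAT1))"))
# p(HGNC:MAPK14) increases act(p(HGNC:STAT1))
```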

BEL Relationships

BEL defines an intrinsic set of relationship types used to represent the type of relationship observed. We decided to focus on causal relationships.

In the BEL Track dataset, there are four accepted relationships:

  • decreases (alternative symbol: -|)
    • Example: a(CHEBI:"brefeldin A") -| p(HGNC:SCOC)
  • directlyDecreases (alternative symbol: =|)
    • Comment: we will accept decreases instead of directlyDecreases
  • increases (alternative symbol: ->)
    • Example: p(HGNC:BMP4) increases complex(p(HGNC:MTOR), p(HGNC:STAT3))
  • directlyIncreases (alternative symbol: =>)
    • Comment: we will accept increases instead of directlyIncreases


Only these relationships are expected in submissions and will be evaluated.
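The acceptance rules for the four relationships can be expressed as a small comparison function: symbols are first mapped to their long names, and a plain increases or decreases is accepted where the gold standard has the corresponding direct form. The sketch assumes the rule is one-directional (predicting directlyIncreases when the gold standard has increases does not match); relation_matches is a hypothetical name, and the official behaviour is given in the evaluation details.

```python
# Long names for the alternative symbols listed above.
SYMBOLS = {"-|": "decreases", "=|": "directlyDecreases",
           "->": "increases", "=>": "directlyIncreases"}
# Gold-standard relationship -> weaker relationship also accepted in its place.
ALSO_ACCEPTED = {"directlyDecreases": "decreases", "directlyIncreases": "increases"}

def long_name(relation):
    """Map a symbol such as '->' to its long name; long names pass through."""
    return SYMBOLS.get(relation, relation)

def relation_matches(predicted, gold):
    """True if predicted equals gold, or gold is direct* and predicted is its plain form."""
    p, g = long_name(predicted), long_name(gold)
    return p == g or ALSO_ACCEPTED.get(g) == p

print(relation_matches("->", "directlyIncreases"))  # True
```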
