BEL Task

Extraction of causal network information using the Biological Expression Language (BEL)

Overview: Automatic extraction of biological network information is one of the most desired and most complex tasks in biological and medical text mining. In BioCreative V, we tackled this complexity by extracting causal relationships represented in the Biological Expression Language (BEL, www.openbel.org). BEL is an advanced knowledge representation format designed to be both human readable and machine processable. Its smallest unit is a BEL statement (or BEL nanopub), which expresses a single causal relationship. In BioCreative V Track 4 (BEL Task), participants had only limited time to train on the data and, in addition, the evaluation environment only became available for the test phase. Furthermore, for the second subtask, sentence classification, no training data was available. We therefore decided to present the same task on new test data. This time, training data for both subtasks is available, and the evaluation environment can be used during the training period. As before, the challenge is organized into two tasks that evaluate complementary aspects of the problem:

Task 1

  • Short description: Given selected textual evidence, construct the corresponding BEL statement (an illustrative example follows this list).

  • Training data: A significant number of relationships systematically selected from the curated networks, with their evidence and the full BEL statement.

  • Test data: A smaller number of relationships from the same dataset. We provide only the evidence sentence and the participants have to generate the BEL statement.

  • Evaluation: We will compare a list of the n best BEL statements generated by the participant's system to the corresponding human-generated BEL statements. If a participant notices inconsistencies after the fully automated evaluation step, specific BEL statements can be re-submitted for manual evaluation, provided their syntax has been verified.
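
To illustrate the input/output relationship, consider a constructed example (the sentence and entities below are invented for illustration and are not taken from the task data). Given an evidence sentence such as "IL-6 stimulation leads to phosphorylation of STAT3", a system could produce the BEL statement

  p(HGNC:IL6) increases p(HGNC:STAT3, pmod(P))

where p() denotes a protein abundance, HGNC is the gene-symbol namespace, pmod(P) marks a phosphorylation modification, and increases expresses the causal relationship.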

Task 2

  • Short description: Given a BEL statement, retrieve all available textual evidence (an illustrative example follows this list).

  • Training data: Same data as for Task 1

  • Test data: BEL statements WITHOUT evidence. The participants have to provide at most 10 sentences (ranked by confidence) with different PMIDs that offer evidence for each BEL statement.

  • Evaluation: A team of experts will manually assess all evidence statements provided by the participants and classify them as correct or incorrect. We will then score each participant's contribution using a ranking metric, such as TAP-k.
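
As a sketch (again with invented content, not actual task data): given a BEL statement such as p(HGNC:TNF) increases bp(GOBP:"apoptotic process"), a submission would list up to ten candidate evidence sentences, each from a different abstract and ranked by confidence, for instance:

  rank 1, PMID <id>: "TNF treatment markedly enhanced apoptosis in ..."
  rank 2, PMID <id>: "Tumor necrosis factor induced programmed cell death in ..."

The PMIDs and sentences above are placeholders, shown only to illustrate the expected ranked output.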

Biological Expression Language

The Biological Expression Language (BEL) is a language for representing scientific findings in the life sciences in a computable form. BEL represents scientific findings by capturing causal and correlative relationships in a given context. This context includes information about the biological system and experimental conditions. Supporting evidence is captured and linked to publication references. BEL is specifically designed to adopt external vocabularies and ontologies, and therefore represents life-science knowledge in a language and schema known to the community.
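
For example (a constructed statement, not taken from the task corpora), the finding that the kinase activity of AKT1 increases the abundance of MDM2 protein could be encoded as

  kin(p(HGNC:AKT1)) increases p(HGNC:MDM2)

BEL terms combine entity functions such as p() (protein abundance), a() (chemical abundance), and bp() (biological process) with namespace-prefixed identifiers (e.g. HGNC, CHEBI, GOBP); relationship types such as increases, decreases, and directlyIncreases connect a subject term to an object term to form a statement.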

Datasets

More information about dataset files and formats

Sample Data

In BioCreative V Track 4, version 1 of the sample data was used; it can be found here (Datasets -> Sample Corpus).

Training Data

The structure of the training data is identical to the structure of the sample data described above.

In BioCreative V Track 4, version 1 of the training data was used; it can be found here (Datasets -> Training Corpus).

Test Data

The test data for BioCreative VI Track 3 (BEL task) will be released on July 11, 2017.

Submission Limits

TASK 1: 

We will accept 3 runs per participant/team. In each run, for each input sentence, up to 10 BEL statements or fragments will be considered.

Although you are free to use the runs as you prefer, in principle each run is meant to represent a different configuration of your system. 

TASK 2: 

We will accept one run per participant/team, containing at most ten (independent) sentences ranked in order of relevance. Since these sentences will be manually evaluated by a team of experts, we can only guarantee to evaluate the top five sentences for each submission. If resources allow, more sentences might be considered (depending on the number of submissions).

Important Dates

Release training data (Training-2015 + Sample-2015 + Test-2015): Already available at the Datasets page.

Evaluation website: coming soon

Release test data: Jul 11, 2017 (Tue)

Submission of results (by participants) deadline: Jul 12, 2017 (Wed)

Release of gold standard entities: Jul 13, 2017 (Thu)

Second submission deadline: Jul 14, 2017 (Fri) (optional delivery of revised results of task 1 including gold standard entities)

Notification of results to participants: Aug 4, 2017 (Fri) (results of task 1 might be notified earlier)

Submission of the system description papers: Aug 20, 2017 (Sun)  

Feedback on the papers: Sep 15, 2017 (Fri)

Camera-ready papers: Oct 1, 2017 (Sun)

Workshop: Oct 18-20, 2017 (Wed-Fri)

Task organizing committee

  • Dr. Juliane Fluck (Fraunhofer SCAI Institute, Germany)
  • Sumit Madan (Fraunhofer SCAI Institute, Germany)
  • Dr. Justyna Szostak (Philip Morris International: PMI, Switzerland)
  • Prof. Dr. Martin Hofmann-Apitius (OpenBEL Consortium and Fraunhofer SCAI Institute, Germany)