The BEL resource data pipeline (resource-generator repository) assembles biological identifiers (genes, proteins, disease concepts, etc.) into an RDF model. The model is cohesive: each identifier is described and, where possible, linked to its equivalent identifiers.
The project would benefit from a streamlined design, so it is well suited to a knowledgeable Python programmer. One area to explore might be a map-reduce approach; that is, split up the dataset processing and assemble the RDF model in the last stage.
- New namespaces to support BEL version 2.0.
- Incorporate Parent-Child identifiers from datasets.
Enhancements to the BEL data pipeline that streamline it, making it easier to expand and maintain.
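
The map-reduce idea mentioned above could be sketched roughly as follows. This is only an illustration, not the resource-generator's actual design: the function names (`map_dataset`, `reduce_triples`), the record layout, the predicate strings, and the sample records are all assumptions, and plain tuples stand in for real RDF triples so the example needs nothing beyond the standard library.

```python
from functools import reduce

# Hypothetical predicate names, used only for illustration.
PREF_LABEL = "skos:prefLabel"
EXACT_MATCH = "skos:exactMatch"


def map_dataset(dataset):
    """Map stage: turn one dataset's records into a set of triples."""
    namespace, records = dataset
    triples = set()
    for record in records:
        subject = f"{namespace}:{record['id']}"
        triples.add((subject, PREF_LABEL, record["label"]))
        # Link the identifier to its equivalents in other namespaces.
        for equivalent in record.get("equivalents", []):
            triples.add((subject, EXACT_MATCH, equivalent))
    return triples


def reduce_triples(partials):
    """Reduce stage: merge the per-dataset triple sets into one model."""
    return reduce(set.union, partials, set())


# Illustrative sample input; the record layout is an assumption.
datasets = [
    ("HGNC", [{"id": "391", "label": "AKT1", "equivalents": ["EGID:207"]}]),
    ("EGID", [{"id": "207", "label": "AKT1", "equivalents": ["HGNC:391"]}]),
]

# Each dataset is processed independently; the model is assembled last.
model = reduce_triples(map(map_dataset, datasets))
```

Because each `map_dataset` call depends only on its own input, the map stage could be run in parallel or re-run for a single dataset, while adding a new namespace would only require supplying another dataset entry.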