Motivation

Because names for biomedical entities can change (e.g., HGNC gene symbols are revised frequently), it's much safer to reference them using stable identifiers. However, humans prefer names to identifiers when interacting with data and knowledge, so there needs to be a fast, unified way to resolve identifiers to names.

Ontologies like the Gene Ontology can be accessed through one of several unified lookup services such as the OLS, AberOWL, OntoBee, and BioPortal. However, these services can only be used for biomedical entities appearing in ontologies, and not for other important nomenclatures such as HGNC, UniProt, or PubChem. Alternatively, small databases like SwissProt (i.e., the reviewed portion of UniProt entries) can be exported and wrapped in small packages like the protmapper that provide easy lookup for names based on identifiers. Larger databases like PubChem Compound and dbSNP have to be accessed through a programmatic API because they can't be easily exported or quickly loaded into memory.

The Biolookup Service is a unified platform that covers biomedical entities not only from ontologies, but also from both small and large databases.

Database Generation

The first set of resources ingested in the Biolookup Service comprises those listed in the Bioregistry as having either an OWL or OBO ontology file. This mostly covers the OBO Foundry as well as additional resources like Cellosaurus. They are parsed with a combination of the obonet and pronto Python packages. Unfortunately, many ontologies listed in the OBO Foundry that only appear with an OWL build artifact have issues that make them impossible to parse. Because the resource list is externally maintained, the Biolookup Service automatically benefits from any improvements to the upstream data source. Unlike ontologies in the OBO Foundry, ontologies in the BioPortal are not automatically listed in the Bioregistry because they lack quality control.
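To illustrate what this ingestion step extracts, here is a minimal sketch of pulling identifier-to-name pairs out of OBO-formatted text. It is a hand-rolled, standard-library parser written for illustration only; it is not how obonet or pronto work internally. It assumes that stanzas are separated by blank lines and ignores most of the OBO format (trailing comments, escaping, synonyms). The function name parse_obo_names and the EX: identifiers in the sample are invented for this example.

```python
from typing import Dict


def parse_obo_names(text: str) -> Dict[str, str]:
    """Map each non-obsolete [Term] stanza's id to its name.

    Simplification: assumes stanzas are separated by blank lines.
    """
    names: Dict[str, str] = {}
    for stanza in text.strip().split("\n\n"):
        lines = [ln.strip() for ln in stanza.splitlines() if ln.strip()]
        # Skip the file header and non-term stanzas such as [Typedef]
        if not lines or lines[0] != "[Term]":
            continue
        fields: Dict[str, str] = {}
        for line in lines[1:]:
            key, sep, value = line.partition(": ")
            if sep:
                # Keep only the first value seen for each tag
                fields.setdefault(key, value)
        if fields.get("is_obsolete") == "true":
            continue
        if "id" in fields and "name" in fields:
            names[fields["id"]] = fields["name"]
    return names


# Toy input with invented identifiers (not taken from a real ontology)
SAMPLE_OBO = """\
format-version: 1.2
ontology: ex

[Term]
id: EX:0000001
name: example process

[Term]
id: EX:0000002
name: example function

[Term]
id: EX:0000003
name: obsolete example
is_obsolete: true

[Typedef]
id: part_of
name: part of
"""

id_to_name = parse_obo_names(SAMPLE_OBO)
```

The resulting dictionary is the kind of identifier-to-name table a lookup service can serve directly. Dropping obsolete terms here is a design choice made for this sketch, not a documented behavior of the Biolookup Service.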

The second set of resources ingested in the Biolookup Service consists of any resources (ontologies, databases, etc.) available through the pyobo Python package. Additional resources can be suggested for inclusion in the Biolookup Service via the PyOBO issue tracker.

Database Coverage

This two-sided comparison shows the overlap between resources in the Biolookup Service and the Bioregistry. All resources included in the Biolookup Service, both ontologies and databases, are normalized to use Bioregistry prefixes.

Figure: Bioregistry Coverage
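Prefix normalization can be sketched as follows. This is an illustrative approximation, not the Bioregistry's actual implementation: the synonym table is a small hand-written stand-in for the real registry, and the function names to_canonical_prefix and normalize_curie are hypothetical.

```python
from typing import Optional, Tuple

# Hand-written stand-in for the registry: lowercased spelling -> canonical prefix.
# Illustrative only; the real Bioregistry is far larger.
ILLUSTRATIVE_SYNONYMS = {
    "go": "go",
    "hgnc": "hgnc",
    "uniprot": "uniprot",
    "uniprotkb": "uniprot",
    "pubchem.compound": "pubchem.compound",
}


def to_canonical_prefix(prefix: str) -> Optional[str]:
    """Return the canonical prefix, or None if the prefix is unknown."""
    return ILLUSTRATIVE_SYNONYMS.get(prefix.strip().lower())


def normalize_curie(curie: str) -> Optional[Tuple[str, str]]:
    """Split a CURIE into (canonical prefix, identifier), or return None."""
    prefix, sep, identifier = curie.partition(":")
    if not sep or not identifier:
        return None  # not of the form prefix:identifier
    canonical = to_canonical_prefix(prefix)
    if canonical is None:
        return None  # prefix is not in the table
    return canonical, identifier
```

With this in place, both "UniProtKB:P04637" and "uniprot:P04637" resolve to the same (prefix, identifier) pair, which is what lets a single lookup table serve differently spelled references to the same resource.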