ID Mapping Tutorial

This tutorial demonstrates how to use UniProtMapper to map between different types of identifiers.

Basic Mapping

Here’s a simple example of mapping UniProt accession IDs to Ensembl IDs:

from UniProtMapper import ProtMapper

mapper = ProtMapper()

result, failed = mapper.get(
    ids=["P30542", "Q16678", "Q02880"],
    from_db="UniProtKB_AC-ID",
    to_db="Ensembl"
)

The result is a pandas.DataFrame containing the query and mapped IDs (column names From and To, respectively), while failed is a list of IDs that couldn’t be mapped.
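A single query ID may map to more than one target ID, in which case the From column contains repeated values. The sketch below groups such rows into a dictionary. It works on plain (From, To) pairs so that it runs without network access. The helper name group_mappings and the target IDs are invented for illustration; with a real result, the pairs could be obtained with zip(result["From"], result["To"]).

```python
from collections import defaultdict


def group_mappings(pairs):
    """Group (query_id, mapped_id) pairs into {query_id: [mapped_id, ...]}."""
    grouped = defaultdict(list)
    for query_id, mapped_id in pairs:
        grouped[query_id].append(mapped_id)
    return dict(grouped)


# Placeholder pairs standing in for zip(result["From"], result["To"]).
# The target IDs are invented for illustration, not real Ensembl IDs.
example_pairs = [
    ("P30542", "TARGET_A"),
    ("P30542", "TARGET_B"),
    ("Q16678", "TARGET_C"),
]

grouped = group_mappings(example_pairs)
print(grouped)
```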

Mapping Through Cross-Referenced Fields

Ensembl is also cross-referenced in UniProt entries. To retrieve all cross-referenced Ensembl IDs for a given UniProt entry, pass the corresponding field name through the fields argument:

from UniProtMapper import ProtMapper

mapper = ProtMapper()

fields = ["xref_ensembl"]
result, failed = mapper.get(
    ids=["P30542", "Q16678", "Q02880"],
    fields=fields,
)

Note

For a full list of the supported fields, check the Supported fields section of the docs. Here, result is again a pandas.DataFrame containing the query and mapped IDs (column names From and Ensembl, following the label column in the reference table).
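Cross-reference fields can pack several IDs into one cell. Assuming the cell is a semicolon-delimited string (the convention in UniProt's tabular exports, which is worth verifying against your own result), a small helper can turn it into a list. The function name split_xrefs is hypothetical and the sample cell is invented.

```python
def split_xrefs(cell):
    """Split a semicolon-delimited cross-reference cell into a list of entries."""
    if not isinstance(cell, str):
        # Missing values (e.g. None or NaN) yield an empty list.
        return []
    return [part.strip() for part in cell.split(";") if part.strip()]


# Invented placeholder cell; real cells contain actual Ensembl IDs.
example_cell = "XREF_1; XREF_2;XREF_3;"
print(split_xrefs(example_cell))
```

With a real result, the helper could be applied to the whole column, for example result["Ensembl"].apply(split_xrefs).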

Available Databases

UniProtMapper supports mapping between numerous databases. You can view the complete list of supported databases in ProtMapper()._supported_dbs or check UniProt’s documentation.
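A misspelled database name is easy to catch before sending a request. The sketch below checks a name against a list of supported names and suggests close matches using the standard library's difflib module. The function suggest_db is hypothetical, and the short list is a stand-in containing only the two names used in this tutorial. In practice you would pass ProtMapper()._supported_dbs, assuming it is an iterable of name strings.

```python
import difflib


def suggest_db(name, supported):
    """Return (is_supported, suggestions) for a database name."""
    supported = list(supported)
    if name in supported:
        return True, []
    # Compare case-insensitively so that "ensembl" still suggests "Ensembl".
    lowered = {s.lower(): s for s in supported}
    close = difflib.get_close_matches(name.lower(), list(lowered), n=3, cutoff=0.6)
    return False, [lowered[c] for c in close]


# Stand-in for the real list of supported databases.
supported_example = ["UniProtKB_AC-ID", "Ensembl"]

print(suggest_db("Ensembl", supported_example))
print(suggest_db("ensemble", supported_example))
```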

Handling Failed Mappings

Some IDs might fail to map if the identifier you’re working with is not listed among the cross-references of a UniProt entry. For this reason, the ProtMapper.get method always returns two values: the result, and a list of the IDs that could not be mapped (an empty list if every ID was mapped successfully). Here’s how to handle failed mappings:

# Check if there were any failed mappings
# (failed is the second value returned by mapper.get)
if failed:
    print(f"Failed to map {len(failed)} IDs:")
    print(f"- {' '.join(failed)}")
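It can also be useful to know which of the submitted IDs did succeed, or to collect the failures for a later retry. A minimal sketch, using a hypothetical helper partition_ids and a made-up failed list:

```python
def partition_ids(ids, failed):
    """Split the submitted IDs into (mapped, unmapped), preserving input order."""
    failed_set = set(failed)
    mapped = [i for i in ids if i not in failed_set]
    unmapped = [i for i in ids if i in failed_set]
    return mapped, unmapped


submitted = ["P30542", "Q16678", "Q02880"]
# Made-up outcome for illustration: pretend the last ID could not be mapped.
failed_example = ["Q02880"]

mapped, unmapped = partition_ids(submitted, failed_example)
print(f"Mapped {len(mapped)} of {len(submitted)} IDs")
for identifier in unmapped:
    print(f"- {identifier}")
```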

Batch Processing

For large sets of IDs, UniProtMapper automatically handles batch processing:

# Large list of IDs (the ... is a placeholder for the remaining IDs)
ids = ["P30542", "Q16678", ..., "Q02880"]

# UniProtMapper will handle batching automatically
result, failed = mapper.get(
    ids=ids,
    from_db="UniProtKB_AC-ID",
    to_db="Ensembl"
)
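To illustrate what batching means in general, the sketch below removes duplicate IDs and splits a list into fixed-size chunks. This is not UniProtMapper's internal code, and you do not need to replicate it when calling mapper.get. The function name chunked and the batch size of 2 are arbitrary choices for the example.

```python
def chunked(ids, batch_size):
    """Yield successive batches of at most batch_size unique IDs."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    # Drop duplicates while keeping the first-seen order.
    unique_ids = list(dict.fromkeys(ids))
    for start in range(0, len(unique_ids), batch_size):
        yield unique_ids[start:start + batch_size]


batches = list(chunked(["P30542", "Q16678", "P30542", "Q02880"], batch_size=2))
print(batches)
```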