deepfold.common.residue_constants.sequence_to_onehot

deepfold.common.residue_constants.sequence_to_onehot(sequence: str, mapping: Mapping[str, int], map_unknown_to_x: bool = False) ndarray[source]

Maps the given sequence into a one-hot encoded matrix.

Parameters:
  • sequence – An amino acid sequence.

  • mapping – A dictionary mapping amino acids to integers.

  • map_unknown_to_x – If True, any amino acid that is not in the mapping will be mapped to the unknown amino acid ‘X’. If the mapping doesn’t contain amino acid ‘X’, an error will be thrown. If False, any amino acid not in the mapping will throw an error.

Returns:

A numpy array of shape (seq_len, num_unique_aas) with one-hot encoding of the sequence.

Raises:

ValueError – If the mapping doesn’t contain values from 0 to num_unique_aas - 1 without any gaps.