deepfold.common package

Submodules

deepfold.common.protein module

Protein data type.

class deepfold.common.protein.Protein(atom_positions: ndarray, aatype: ndarray, atom_mask: ndarray, residue_index: ndarray, b_factors: ndarray, chain_index: ndarray | None = None, remark: str | None = None, parents: Sequence[str] | None = None, parents_chain_index: Sequence[int] | None = None)

Bases: object

Protein structure representation.

aatype: ndarray
atom_mask: ndarray
atom_positions: ndarray
b_factors: ndarray
chain_index: ndarray | None = None
parents: Sequence[str] | None = None
parents_chain_index: Sequence[int] | None = None
remark: str | None = None
residue_index: ndarray
deepfold.common.protein.add_pdb_headers(prot: Protein, pdb_str: str) str

Add PDB headers to an existing PDB string. Useful during multi-chain recycling.

deepfold.common.protein.from_pdb_string(pdb_str: str, chain_id: str | None = None) Protein

Takes a PDB string and constructs a Protein object.

WARNING: All non-standard residue types will be converted into UNK. All non-standard atoms will be ignored.

Parameters:
  • pdb_str – The contents of the pdb file

  • chain_id – If None, then the whole pdb file is parsed. If chain_id is specified (e.g. A), then only that chain is parsed.

Returns:

A new Protein parsed from the pdb contents.

deepfold.common.protein.from_prediction(processed_features: Mapping[str, ndarray], result: Mapping[str, Any], b_factors: ndarray | None = None, remove_leading_feature_dimension: bool = False, is_trajectory: bool = False, remark: str | None = None, parents: Sequence[str] | None = None, parents_chain_index: Sequence[int] | None = None) Protein | List[Protein]

Assembles a protein from a prediction.

Parameters:
  • processed_features – Dictionary holding model inputs.

  • result – Dictionary holding model outputs.

  • b_factors – (Optional) B-factors to use for the protein.

  • remove_leading_feature_dimension – Whether to remove the leading dimension of the feature values.

  • remark – (Optional) Remark about the prediction.

  • parents – (Optional) List of template names.

Returns:

A protein instance.

deepfold.common.protein.from_relaxation(relaxed_pdb_str: str, residue_index: ndarray | None = None, chain_index: ndarray | None = None, b_factors: ndarray | None = None) Protein

The Amber relaxation procedure renumbers residue indices starting from 1. Since we may have cropped domains, we must restore the correct residue indices.

Parameters:
  • relaxed_pdb_str – PDB string of the relaxed protein.

  • residue_index – Residue indices.

Returns:

A Protein with the corrected residue indices.
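The renumbering problem described above can be sketched in a few lines. This is an illustration only, not the deepfold implementation: the helper name `restore_residue_indices` is hypothetical, and it assumes the relaxed structure numbers residues consecutively from 1 in the same order as the original `residue_index`.

```python
def restore_residue_indices(relaxed_numbers, residue_index):
    """Map 1-based relaxed residue numbers back to the original indices.

    relaxed_numbers: residue number of each atom after relaxation (1-based).
    residue_index: original index of each residue, in sequence order.
    """
    return [residue_index[number - 1] for number in relaxed_numbers]


# A cropped domain whose original numbering has a gap (50, 51, then 60).
original_residue_index = [50, 51, 60]
# Per-atom residue numbers as written after relaxation.
relaxed_atom_numbers = [1, 1, 2, 2, 3]

restored = restore_residue_indices(relaxed_atom_numbers, original_residue_index)
```

Here each atom recovers the index of the residue it belonged to before relaxation, so gaps introduced by cropping are preserved.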

deepfold.common.protein.get_pdb_headers(prot: Protein, chain_id: int = 0) Sequence[str]
deepfold.common.protein.ideal_atom_mask(prot: Protein) ndarray

Computes an ideal atom mask.

Protein.atom_mask typically is defined according to the atoms that are reported in the PDB. This function computes a mask according to heavy atoms that should be present in the given sequence of amino acids.

Parameters:

prot – Protein whose fields are numpy.ndarray objects.

Returns:

An ideal atom mask.
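The difference between a reported mask and an ideal mask can be illustrated with a small stand-alone sketch. The tables below are hypothetical and truncated to two residue types and five atom names. The real module covers every residue type and a longer atom ordering, and it returns a numpy.ndarray rather than nested lists.

```python
# Hypothetical, truncated tables for illustration only.
ATOM_ORDER = ["N", "CA", "C", "CB", "O"]
HEAVY_ATOMS = {
    "GLY": ["N", "CA", "C", "O"],        # glycine has no CB
    "ALA": ["N", "CA", "C", "CB", "O"],
}


def ideal_atom_mask_sketch(resnames):
    """Return one 0/1 row per residue marking the heavy atoms its chemistry implies."""
    return [
        [1 if atom in HEAVY_ATOMS[resname] else 0 for atom in ATOM_ORDER]
        for resname in resnames
    ]


mask = ideal_atom_mask_sketch(["GLY", "ALA"])
```

The mask depends only on the sequence, so an atom that is missing from the PDB file is still marked as present if the residue type should contain it.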

deepfold.common.protein.to_modelcif(prot: Protein) str

Converts a Protein instance to a ModelCIF string. Chains with identical modelled coordinates will be treated as the same polymer entity. Note, however, that if chains differ in their modelled regions, no attempt is made to identify them as a single polymer entity.

Parameters:

prot – The protein to convert to ModelCIF.

Returns:

ModelCIF string.

deepfold.common.protein.to_pdb(prots: Protein | Iterable[Protein]) str

Convert Protein instances to a PDB string.

deepfold.common.residue_constants module

Constants used in AlphaFold.

class deepfold.common.residue_constants.Bond(atom1_name, atom2_name, length, stddev)

Bases: tuple

atom1_name

Alias for field number 0

atom2_name

Alias for field number 1

length

Alias for field number 2

stddev

Alias for field number 3

class deepfold.common.residue_constants.BondAngle(atom1_name, atom2_name, atom3name, angle_rad, stddev)

Bases: tuple

angle_rad

Alias for field number 3

atom1_name

Alias for field number 0

atom2_name

Alias for field number 1

atom3name

Alias for field number 2

stddev

Alias for field number 4
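Because Bond and BondAngle derive from tuple and expose named aliases for numbered fields, they behave like named tuples. The sketch below rebuilds stand-ins with the documented field order instead of importing deepfold, and the numeric values are illustrative rather than taken from stereo_chemical_props.txt.

```python
import collections

# Stand-ins that mirror the documented field order; not imported from deepfold.
Bond = collections.namedtuple(
    "Bond", ["atom1_name", "atom2_name", "length", "stddev"]
)
BondAngle = collections.namedtuple(
    "BondAngle", ["atom1_name", "atom2_name", "atom3name", "angle_rad", "stddev"]
)

# Illustrative values only.
bond = Bond("N", "CA", 1.46, 0.01)
angle = BondAngle("N", "CA", "C", 1.94, 0.05)

# Named access and positional access refer to the same field.
same_length = bond.length == bond[2]
same_angle = angle.angle_rad == angle[3]

# Tuples also unpack in field order.
atom1, atom2, length, stddev = bond
```

Note that the third field of BondAngle is spelled atom3name, without the underscore used in atom1_name and atom2_name.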

deepfold.common.residue_constants.aatype_to_str_sequence(aatype)

Return the sequence as a string of one-letter residue codes, with unknown residue types written as X.

deepfold.common.residue_constants.chi_angle_atom(atom_index: int) ndarray

Define chi-angle rigid groups via one-hot representations.

deepfold.common.residue_constants.load_stereo_chemical_props() Tuple[Mapping[str, List[Bond]], Mapping[str, List[Bond]], Mapping[str, List[BondAngle]]]

Load stereo_chemical_props.txt into a nice structure.

Load literature values for bond lengths and bond angles and translate bond angles into the length of the opposite edge of the triangle (“residue_virtual_bonds”).

Returns:
  • residue_bonds – Dict that maps resname -> list of Bond tuples.

  • residue_virtual_bonds – Dict that maps resname -> list of Bond tuples.

  • residue_bond_angles – Dict that maps resname -> list of BondAngle tuples.
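Translating a bond angle into the length of the opposite edge of the triangle is an application of the law of cosines. The sketch below assumes that standard formula. The function name `virtual_bond_length` is hypothetical, and the code is not taken from the module.

```python
import math


def virtual_bond_length(length_1: float, length_2: float, angle_rad: float) -> float:
    """Distance between the two outer atoms of a bond angle.

    length_1 and length_2 are the two bond lengths that meet at the central
    atom, and angle_rad is the angle between them in radians.
    """
    return math.sqrt(
        length_1**2 + length_2**2 - 2.0 * length_1 * length_2 * math.cos(angle_rad)
    )


# A right angle between bonds of length 3 and 4 gives the familiar 3-4-5 triangle.
right_angle_case = virtual_bond_length(3.0, 4.0, math.pi / 2)
# A straight angle places the outer atoms at the sum of the two bond lengths.
straight_case = virtual_bond_length(1.5, 1.5, math.pi)
```

Expressing an angle as a distance lets angle restraints be checked with the same distance-based machinery used for ordinary bonds.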

deepfold.common.residue_constants.make_atom14_dists_bounds(overlap_tolerance=1.5, bond_length_tolerance_factor=15)

Compute upper and lower bounds for bonds to assess violations.

deepfold.common.residue_constants.map_structure_with_atom_order(in_list: list, first_call: bool = True) list
deepfold.common.residue_constants.sequence_to_onehot(sequence: str, mapping: Mapping[str, int], map_unknown_to_x: bool = False) ndarray

Maps the given sequence into a one-hot encoded matrix.

Parameters:
  • sequence – An amino acid sequence.

  • mapping – A dictionary mapping amino acids to integers.

  • map_unknown_to_x – If True, any amino acid that is not in the mapping will be mapped to the unknown amino acid ‘X’. If the mapping doesn’t contain amino acid ‘X’, an error will be thrown. If False, any amino acid not in the mapping will throw an error.

Returns:

A numpy array of shape (seq_len, num_unique_aas) with one-hot encoding of the sequence.

Raises:

ValueError – If the mapping doesn’t contain values from 0 to num_unique_aas - 1 without any gaps.
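The documented behaviour can be sketched without importing deepfold. The function below is an illustrative reimplementation, not the library code: it returns nested lists instead of a numpy array to stay dependency-free, and raising ValueError for an unknown residue is an assumption, since the documentation only says an error is thrown.

```python
from typing import List, Mapping


def sequence_to_onehot_sketch(
    sequence: str, mapping: Mapping[str, int], map_unknown_to_x: bool = False
) -> List[List[int]]:
    """One-hot encode a sequence as rows of length num_unique_aas."""
    num_entries = max(mapping.values()) + 1
    # The mapping must cover 0 .. num_entries - 1 without gaps.
    if sorted(set(mapping.values())) != list(range(num_entries)):
        raise ValueError("mapping must contain values 0..N-1 without gaps")

    rows = []
    for aa in sequence:
        if aa in mapping:
            index = mapping[aa]
        elif map_unknown_to_x and "X" in mapping:
            index = mapping["X"]
        else:
            # Assumption: the error type for this case is not documented.
            raise ValueError(f"cannot encode residue {aa!r}")
        row = [0] * num_entries
        row[index] = 1
        rows.append(row)
    return rows


toy_mapping = {"A": 0, "C": 1, "X": 2}  # toy alphabet for illustration
encoded = sequence_to_onehot_sketch("ACA", toy_mapping)
encoded_unknown = sequence_to_onehot_sketch("AB", toy_mapping, map_unknown_to_x=True)
```

With the toy mapping, the residue B is not in the alphabet, so it is encoded in the X column when map_unknown_to_x is True and rejected otherwise.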

Module contents