deepfold.common package¶
Submodules¶
deepfold.common.protein module¶
Protein data type.
- class deepfold.common.protein.Protein(atom_positions: ndarray, aatype: ndarray, atom_mask: ndarray, residue_index: ndarray, b_factors: ndarray, chain_index: ndarray | None = None, remark: str | None = None, parents: Sequence[str] | None = None, parents_chain_index: Sequence[int] | None = None)¶
Bases:
object
Protein structure representation.
- aatype: ndarray¶
- atom_mask: ndarray¶
- atom_positions: ndarray¶
- b_factors: ndarray¶
- chain_index: ndarray | None = None¶
- parents: Sequence[str] | None = None¶
- parents_chain_index: Sequence[int] | None = None¶
- remark: str | None = None¶
- residue_index: ndarray¶
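As a rough illustration of the fields listed above, the sketch below defines a stand-in dataclass with the same field names and defaults and builds a two-residue instance. The name `ProteinSketch` and the array shapes (37 atom slots per residue) are assumptions made for this example and are not stated on this page; the real class is deepfold.common.protein.Protein.

```python
import dataclasses
from typing import Optional, Sequence

import numpy as np


@dataclasses.dataclass
class ProteinSketch:
    """Hypothetical stand-in that mirrors the documented Protein fields."""

    atom_positions: np.ndarray  # assumed shape: [num_res, num_atom_type, 3]
    aatype: np.ndarray  # assumed shape: [num_res]
    atom_mask: np.ndarray  # assumed shape: [num_res, num_atom_type]
    residue_index: np.ndarray  # assumed shape: [num_res]
    b_factors: np.ndarray  # assumed shape: [num_res, num_atom_type]
    chain_index: Optional[np.ndarray] = None
    remark: Optional[str] = None
    parents: Optional[Sequence[str]] = None
    parents_chain_index: Optional[Sequence[int]] = None


num_res, num_atom_type = 2, 37  # 37 atom slots per residue is an assumption
example = ProteinSketch(
    atom_positions=np.zeros((num_res, num_atom_type, 3)),
    aatype=np.array([0, 1]),
    atom_mask=np.zeros((num_res, num_atom_type)),
    residue_index=np.array([1, 2]),
    b_factors=np.zeros((num_res, num_atom_type)),
)
```

The four optional fields default to None, so only the five array fields have to be supplied.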
- deepfold.common.protein.add_pdb_headers(prot: Protein, pdb_str: str) str ¶
Add PDB headers to an existing PDB string. Useful during multi-chain recycling.
- deepfold.common.protein.from_pdb_string(pdb_str: str, chain_id: str | None = None) Protein ¶
Takes a PDB string and constructs a Protein object.
- WARNING: All non-standard residue types will be converted into UNK. All non-standard atoms will be ignored.
- Parameters:
pdb_str – The contents of the PDB file.
chain_id – If None, the whole PDB file is parsed. If chain_id is specified (e.g. A), only that chain is parsed.
- Returns:
A new Protein parsed from the PDB contents.
- deepfold.common.protein.from_prediction(processed_features: Mapping[str, ndarray], result: Mapping[str, Any], b_factors: ndarray | None = None, remove_leading_feature_dimension: bool = False, is_trajectory: bool = False, remark: str | None = None, parents: Sequence[str] | None = None, parents_chain_index: Sequence[int] | None = None) Protein | List[Protein] ¶
Assembles a protein from a prediction.
- Parameters:
processed_features – Dictionary holding model inputs.
result – Dictionary holding model outputs.
b_factors – (Optional) B-factors to use for the protein.
remove_leading_feature_dimension – Whether to remove the leading dimension of the feature values.
remark – (Optional) Remark about the prediction.
parents – (Optional) List of template names.
- Returns:
A protein instance.
- deepfold.common.protein.from_relaxation(relaxed_pdb_str: str, residue_index: ndarray | None = None, chain_index: ndarray | None = None, b_factors: ndarray | None = None) Protein ¶
The Amber relaxation procedure renumbers residue indices starting from 1. Since we may have cropped domains, the residue indices must be replaced with the correct ones.
- Parameters:
relaxed_pdb_str – PDB string of the relaxed protein.
residue_index – Residue indices.
- Returns:
A Protein with the corrected residue indices.
- deepfold.common.protein.ideal_atom_mask(prot: Protein) ndarray ¶
Computes an ideal atom mask.
Protein.atom_mask typically is defined according to the atoms that are reported in the PDB. This function computes a mask according to heavy atoms that should be present in the given sequence of amino acids.
- Parameters:
prot – Protein whose fields are numpy.ndarray objects.
- Returns:
An ideal atom mask.
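The difference between a reported mask and an ideal mask can be shown with a toy table. The sketch below is not the library's implementation: it assumes a five-atom ordering and a two-entry residue table (glycine has no CB atom, alanine does), whereas the real function works from the full tables in deepfold.common.residue_constants. The names `ATOM_ORDER`, `HEAVY_ATOMS` and `ideal_mask_sketch` are hypothetical.

```python
import numpy as np

# Toy atom ordering and heavy-atom table, for illustration only.
ATOM_ORDER = ["N", "CA", "C", "CB", "O"]
HEAVY_ATOMS = {
    "GLY": ["N", "CA", "C", "O"],
    "ALA": ["N", "CA", "C", "CB", "O"],
}


def ideal_mask_sketch(resnames):
    """Mark every heavy atom that the residue type should contain."""
    mask = np.zeros((len(resnames), len(ATOM_ORDER)), dtype=np.float32)
    for row, resname in enumerate(resnames):
        for atom_name in HEAVY_ATOMS[resname]:
            mask[row, ATOM_ORDER.index(atom_name)] = 1.0
    return mask


mask = ideal_mask_sketch(["GLY", "ALA"])
```

The mask depends only on the sequence, so an atom missing from the PDB file is still marked as expected.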
- deepfold.common.protein.to_modelcif(prot: Protein) str ¶
Converts a Protein instance to a ModelCIF string. Chains with identical modelled coordinates will be treated as the same polymer entity. Note, however, that if chains differ in their modelled regions, no attempt is made to identify them as a single polymer entity.
- Parameters:
prot – The protein to convert to ModelCIF.
- Returns:
ModelCIF string.
deepfold.common.residue_constants module¶
Constants used in AlphaFold.
- class deepfold.common.residue_constants.Bond(atom1_name, atom2_name, length, stddev)¶
Bases:
tuple
- atom1_name¶
Alias for field number 0
- atom2_name¶
Alias for field number 1
- length¶
Alias for field number 2
- stddev¶
Alias for field number 3
- class deepfold.common.residue_constants.BondAngle(atom1_name, atom2_name, atom3name, angle_rad, stddev)¶
Bases:
tuple
- angle_rad¶
Alias for field number 3
- atom1_name¶
Alias for field number 0
- atom2_name¶
Alias for field number 1
- atom3name¶
Alias for field number 2
- stddev¶
Alias for field number 4
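Both classes derive from tuple and expose their fields by position, which is the shape that collections.namedtuple produces. The sketch below recreates them locally under that assumption rather than importing them; the numeric values are placeholders chosen to show field access, not literature values.

```python
import collections

# Local re-creations that follow the documented field order.
Bond = collections.namedtuple(
    "Bond", ["atom1_name", "atom2_name", "length", "stddev"]
)
BondAngle = collections.namedtuple(
    "BondAngle", ["atom1_name", "atom2_name", "atom3name", "angle_rad", "stddev"]
)

# Placeholder numbers, used only to demonstrate access patterns.
bond = Bond("N", "CA", 1.46, 0.02)
angle = BondAngle("N", "CA", "C", 1.94, 0.05)

# Each field is reachable by name or by its documented position.
assert bond.length == bond[2]
assert angle.angle_rad == angle[3]
```

Note that the third field of BondAngle is spelled atom3name, without the underscore used in atom1_name and atom2_name.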
- deepfold.common.residue_constants.aatype_to_str_sequence(aatype)¶
Return all residue types with X.
- deepfold.common.residue_constants.chi_angle_atom(atom_index: int) ndarray ¶
Define chi-angle rigid groups via one-hot representations.
- deepfold.common.residue_constants.load_stereo_chemical_props() Tuple[Mapping[str, List[Bond]], Mapping[str, List[Bond]], Mapping[str, List[BondAngle]]] ¶
Load stereo_chemical_props.txt into a nice structure.
Load literature values for bond lengths and bond angles and translate bond angles into the length of the opposite edge of the triangle (“residue_virtual_bonds”).
- Returns:
residue_bonds – Dict that maps resname -> list of Bond tuples.
residue_virtual_bonds – Dict that maps resname -> list of Bond tuples.
residue_bond_angles – Dict that maps resname -> list of BondAngle tuples.
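Translating a bond angle into the length of the opposite edge is an application of the law of cosines: for two bonds of lengths a and b that meet at angle gamma, the opposite edge has length sqrt(a^2 + b^2 - 2ab cos(gamma)). A minimal sketch follows; the function name and the input numbers are illustrative assumptions, and how the library derives the standard deviation of a virtual bond is not covered here.

```python
import math


def virtual_bond_length(bond1_length: float, bond2_length: float, angle_rad: float) -> float:
    """Length of the edge opposite angle_rad, by the law of cosines."""
    return math.sqrt(
        bond1_length**2
        + bond2_length**2
        - 2.0 * bond1_length * bond2_length * math.cos(angle_rad)
    )


# Illustrative numbers only: two bonds of length 1.5 meeting at 120 degrees.
length = virtual_bond_length(1.5, 1.5, math.radians(120.0))
```

Storing the angle as a distance lets angle constraints be checked with the same machinery as bond-length constraints.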
- deepfold.common.residue_constants.make_atom14_dists_bounds(overlap_tolerance=1.5, bond_length_tolerance_factor=15)¶
Compute upper and lower bounds for bonds to assess violations.
- deepfold.common.residue_constants.map_structure_with_atom_order(in_list: list, first_call: bool = True) list ¶
- deepfold.common.residue_constants.sequence_to_onehot(sequence: str, mapping: Mapping[str, int], map_unknown_to_x: bool = False) ndarray ¶
Maps the given sequence into a one-hot encoded matrix.
- Parameters:
sequence – An amino acid sequence.
mapping – A dictionary mapping amino acids to integers.
map_unknown_to_x – If True, any amino acid that is not in the mapping will be mapped to the unknown amino acid ‘X’. If the mapping doesn’t contain amino acid ‘X’, an error will be raised. If False, any amino acid not in the mapping will raise an error.
- Returns:
A numpy array of shape (seq_len, num_unique_aas) with one-hot encoding of the sequence.
- Raises:
ValueError – If the mapping doesn’t contain values from 0 to num_unique_aas - 1 without any gaps.
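The behaviour documented above can be sketched as a small re-implementation. This is an illustration written from the description on this page, not the library's source: the function name `sequence_to_onehot_sketch`, the toy mapping, the error messages and the int32 dtype are all assumptions.

```python
from typing import Mapping

import numpy as np


def sequence_to_onehot_sketch(
    sequence: str, mapping: Mapping[str, int], map_unknown_to_x: bool = False
) -> np.ndarray:
    """Illustrative re-implementation of the documented behaviour."""
    num_entries = max(mapping.values()) + 1
    # The mapping must use every integer from 0 to num_entries - 1.
    if sorted(set(mapping.values())) != list(range(num_entries)):
        raise ValueError("Mapping values must run from 0 upward without gaps.")

    one_hot = np.zeros((len(sequence), num_entries), dtype=np.int32)
    for position, amino_acid in enumerate(sequence):
        if amino_acid in mapping:
            index = mapping[amino_acid]
        elif map_unknown_to_x and "X" in mapping:
            index = mapping["X"]
        else:
            raise ValueError(f"Invalid character in the sequence: {amino_acid}")
        one_hot[position, index] = 1
    return one_hot


# Toy three-letter alphabet; 'B' is absent and falls back to 'X'.
toy_mapping = {"A": 0, "G": 1, "X": 2}
encoded = sequence_to_onehot_sketch("AGB", toy_mapping, map_unknown_to_x=True)
```

Each row of the result contains exactly one 1, placed in the column that the mapping assigns to that residue.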