deepfold.data.multimer package¶

Submodules¶

deepfold.data.multimer.input_features module¶

class deepfold.data.multimer.input_features.ComplexInfo(descriptions: List[str] = <factory>, num_units: List[int] = <factory>)¶

Bases: object

descriptions: List[str]¶

num_units: List[int]¶

deepfold.data.multimer.input_features.add_assembly_features(all_chain_features: MutableMapping[str, dict]) → MutableMapping[str, dict]¶

Add features to distinguish between chains.

Parameters:

all_chain_features – A dictionary which maps chain_id to a dictionary of features for each chain.

Returns:

A dictionary that maps strings of the form <seq_id>_<sym_id> to the corresponding chain features. For example, two chains from a homodimer would have keys A_1 and A_2, while two chains from a heterodimer would have keys A_1 and B_1.

Return type:

MutableMapping[str, dict]
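The key scheme described above can be illustrated with a small standalone sketch. It does not call deepfold: `assembly_keys_sketch` is a hypothetical helper, and it assumes that chains are grouped by identical sequence strings and that there are at most 26 distinct sequences.

```python
from collections import defaultdict
from typing import Dict


def assembly_keys_sketch(chain_sequences: Dict[str, str]) -> Dict[str, str]:
    """Map each chain_id to a '<seq_id>_<sym_id>' key (illustrative only)."""
    seq_to_entity: Dict[str, int] = {}   # unique sequence -> 0-based entity index
    copies = defaultdict(int)            # entity index -> copies seen so far
    keys: Dict[str, str] = {}
    for chain_id, sequence in chain_sequences.items():
        if sequence not in seq_to_entity:
            seq_to_entity[sequence] = len(seq_to_entity)
        entity = seq_to_entity[sequence]
        copies[entity] += 1
        # Assumption: fewer than 27 unique sequences, so one letter suffices.
        keys[chain_id] = f"{chr(ord('A') + entity)}_{copies[entity]}"
    return keys


homodimer = assembly_keys_sketch({"chain_1": "MKV", "chain_2": "MKV"})
heterodimer = assembly_keys_sketch({"chain_1": "MKV", "chain_2": "GGS"})
```

Here the homodimer chains receive the keys A_1 and A_2, and the heterodimer chains receive A_1 and B_1, matching the examples in the description.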

deepfold.data.multimer.input_features.convert_monomer_features(monomer_features: dict) → dict¶

Reshapes and modifies monomer features for multimer models.

deepfold.data.multimer.input_features.create_all_seq_msa_features(all_seq_features: dict) → dict¶

deepfold.data.multimer.input_features.create_all_seq_msa_features_from_a3m(a3m_strings: Sequence[str], sequence: str | None = None) → dict¶

Get MSA features for pairing.

deepfold.data.multimer.input_features.create_multimer_features(paired_a3m_strings: List[str], sequence: str | None = None) → dict¶

Create multimer features from paired MSA strings.

deepfold.data.multimer.input_features.crop_chains(chains_list: List[dict], msa_crop_size: int, pair_msa_sequences: bool, max_templates: int) → List[dict]¶

Crops the MSAs for a set of chains.

Parameters:

chains_list – A list of chains to be cropped.
msa_crop_size – The total number of sequences to crop from the MSA.
pair_msa_sequences – Whether we are operating in sequence-pairing mode.
max_templates – The maximum number of templates to use per chain.

Returns:

The cropped chains.

deepfold.data.multimer.input_features.int_id_to_str_id(num: int) → str¶

Encodes a number as a string, using reverse spreadsheet style naming.

Parameters:

num – A positive integer.

Returns:

A string that encodes the positive integer using reverse spreadsheet style naming, e.g. 1 = A, 2 = B, …, 27 = AA, 28 = BA, 29 = CA, … This is the usual way to encode chain IDs in mmCIF files.
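The encoding can be reproduced with a short standalone sketch. `int_id_to_str_id_sketch` is a hypothetical re-implementation written from the description above, not the library's code; it emits the least-significant letter first, which is what makes the naming "reverse".

```python
def int_id_to_str_id_sketch(num: int) -> str:
    """Encode a positive integer in reverse spreadsheet style (illustrative)."""
    if num <= 0:
        raise ValueError(f"Only positive integers are allowed, got {num}.")
    num -= 1  # switch to 0-based so that 1 maps to "A"
    letters = []
    while num >= 0:
        # Emit the least-significant letter first.
        letters.append(chr(num % 26 + ord("A")))
        num = num // 26 - 1
    return "".join(letters)


examples = {n: int_id_to_str_id_sketch(n) for n in (1, 2, 26, 27, 28, 29)}
```

With this sketch, 26 maps to Z and 27 to AA, and the second letter advances only after the first has cycled through the alphabet.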

deepfold.data.multimer.input_features.pad_msa(example: dict, min_num_cluster) → dict¶

deepfold.data.multimer.input_features.pair_and_merge(all_chain_features: MutableMapping[str, dict]) → dict¶

Runs processing on features to augment, pair and merge.

Parameters:

all_chain_features – A MutableMapping of dictionaries of features for each chain.

Returns:

A dictionary of features.

deepfold.data.multimer.input_features.process_final(np_example: dict) → dict¶

Final processing steps in data pipeline, after merging and pairing.

deepfold.data.multimer.input_features.process_multimer_features(complex: ComplexInfo, all_monomer_features: Mapping[str, dict], pair_with_identifier: bool = False, a3m_strings_with_identifiers: Mapping[str, str] | None = None, paired_a3m_strings: Mapping[str, str] = {}, max_num_clusters: int = 508) → dict¶

Create multimer input features.

deepfold.data.multimer.input_features.process_single_chain(chain_features: dict, is_homomer_or_monomer: bool, a3m_strings_for_paring: Sequence[str] | None = None, use_identifier: bool = False) → dict¶

Process the features of a single chain.

deepfold.data.multimer.input_features.process_unmerged_features(all_chain_features: MutableMapping[str, dict])¶

Postprocessing stage for per-chain features before merging.

deepfold.data.multimer.msa_pairing module¶

Pairing logic for multimer data pipeline.

deepfold.data.multimer.msa_pairing.block_diag(*arrs: ndarray, pad_value: float = 0.0) → ndarray¶

Like scipy.linalg.block_diag but with an optional padding value.
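The idea of a block-diagonal layout with a configurable fill value can be sketched in plain NumPy. `block_diag_sketch` is a hypothetical stand-in, not the deepfold implementation; it assumes 1-D or 2-D inputs and always returns a float array.

```python
import numpy as np


def block_diag_sketch(*arrs: np.ndarray, pad_value: float = 0.0) -> np.ndarray:
    """Place arrays along the diagonal and fill the rest with pad_value."""
    blocks = [np.atleast_2d(a) for a in arrs]
    n_rows = sum(b.shape[0] for b in blocks)
    n_cols = sum(b.shape[1] for b in blocks)
    # Start from an array filled entirely with the padding value.
    out = np.full((n_rows, n_cols), pad_value, dtype=float)
    row = col = 0
    for b in blocks:
        out[row:row + b.shape[0], col:col + b.shape[1]] = b
        row += b.shape[0]
        col += b.shape[1]
    return out


merged = block_diag_sketch(np.ones((1, 2)), 2 * np.ones((2, 1)), pad_value=-1.0)
```

In this example `merged` has shape (3, 3): the two blocks sit on the diagonal and every off-diagonal entry is -1.0 rather than zero.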

deepfold.data.multimer.msa_pairing.create_paired_features(chains: Iterable[Mapping[str, ndarray]]) → List[Mapping[str, ndarray]]¶

Returns the original chains with paired NUM_SEQ features.

Parameters:

chains – A list of feature dictionaries for each chain.

Returns:

A list of feature dictionaries with sequence features including only rows to be paired.

deepfold.data.multimer.msa_pairing.deduplicate_unpaired_sequences(np_chains: List[Mapping[str, ndarray]]) → List[Mapping[str, ndarray]]¶

Removes unpaired sequences which duplicate a paired sequence.
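The deduplication step can be illustrated on two integer-encoded MSA arrays. This is a minimal sketch: `drop_duplicated_unpaired_rows` is a hypothetical helper that works on bare arrays, whereas the documented function operates on per-chain feature dictionaries whose key names are not given here.

```python
import numpy as np


def drop_duplicated_unpaired_rows(unpaired_msa: np.ndarray,
                                  paired_msa: np.ndarray) -> np.ndarray:
    """Keep only the unpaired rows that do not also occur in the paired MSA."""
    # Hashable view of every paired row for fast membership tests.
    seen = {tuple(row.tolist()) for row in paired_msa}
    keep = np.array(
        [tuple(row.tolist()) not in seen for row in unpaired_msa], dtype=bool
    )
    return unpaired_msa[keep]


paired = np.array([[1, 2, 3], [4, 5, 6]])
unpaired = np.array([[1, 2, 3], [7, 8, 9], [4, 5, 6], [0, 0, 0]])
deduplicated = drop_duplicated_unpaired_rows(unpaired, paired)
```

Only the rows [7, 8, 9] and [0, 0, 0] remain, in their original order.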

deepfold.data.multimer.msa_pairing.merge_chain_features(np_chains_list: List[Mapping[str, ndarray]], pair_msa_sequences: bool, max_templates: int) → Mapping[str, ndarray]¶

Merges features for multiple chains into a single FeatureDict.

Parameters:

np_chains_list – List of FeatureDicts for each chain.
pair_msa_sequences – Whether to merge paired MSAs.
max_templates – The maximum number of templates to include.

Returns:

A single FeatureDict for the entire complex.

deepfold.data.multimer.msa_pairing.pad_features(feature: ndarray, feature_name: str) → ndarray¶

Add a ‘padding’ row at the end of the features list.

In a partial alignment, the padding row is selected as the ‘paired’ row for any chain that has no paired alignment.

Parameters:

feature – The feature to be padded.
feature_name – The name of the feature to be padded.

Returns:

The feature with an additional padding row.
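Appending a padding row amounts to a single concatenation along the sequence axis. The sketch below is illustrative only: `append_padding_row` and the explicit `pad_value` argument are hypothetical, since the documented function selects the padding from the feature name and that mapping is not shown here.

```python
import numpy as np


def append_padding_row(feature: np.ndarray, pad_value) -> np.ndarray:
    """Return the feature with one extra row filled with pad_value."""
    # The padding row matches the trailing dimensions and dtype of the feature.
    padding = np.full((1,) + feature.shape[1:], pad_value, dtype=feature.dtype)
    return np.concatenate([feature, padding], axis=0)


msa = np.array([[3, 4], [5, 6]])
padded = append_padding_row(msa, pad_value=21)  # 21 is an arbitrary example value
```

The padded array has one more row than the input, and the new last row can then be referenced by index wherever a chain lacks a paired sequence.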

deepfold.data.multimer.msa_pairing.pair_sequences(examples: List[Mapping[str, ndarray]]) → Dict[int, ndarray]¶

Returns indices for paired MSA sequences across chains.

deepfold.data.multimer.msa_pairing.reorder_paired_rows(all_paired_msa_rows_dict: Dict[int, ndarray]) → ndarray¶

Creates a list of indices of paired MSA rows across chains.

Parameters:

all_paired_msa_rows_dict – a mapping from the number of paired chains to the paired indices.

Returns:

A list of lists, each containing indices of paired MSA rows across chains. The paired-index lists are ordered by:

1. the number of chains in the paired alignment, i.e., all-chain pairings come first;
2. e-values.
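The two-level ordering can be sketched as follows. This is not the deepfold implementation, and several details are assumptions made for illustration: each dictionary value is taken to be a 2-D array with one column per chain, -1 marks a chain that is absent from a partial pairing, and e-values are approximated by the absolute product of the row indices (on the premise that each chain's MSA is already sorted by e-value, so smaller indices mean better hits).

```python
from typing import Dict

import numpy as np


def reorder_paired_rows_sketch(paired_rows_by_count: Dict[int, np.ndarray]) -> np.ndarray:
    """Order paired row indices by chain count, then by an e-value proxy."""
    ordered = []
    # Pairings that span more chains come first.
    for num_chains in sorted(paired_rows_by_count, reverse=True):
        rows = np.asarray(paired_rows_by_count[num_chains])
        # Assumed proxy for e-values: absolute product of the row indices.
        proxy = np.abs(np.prod(rows, axis=1))
        ordered.extend(rows[np.argsort(proxy, kind="stable")])
    return np.array(ordered)


paired = {
    3: np.array([[0, 0, 0], [2, 1, 3], [1, 1, 1]]),
    2: np.array([[4, -1, 2], [1, 2, -1]]),
}
reordered = reorder_paired_rows_sketch(paired)
```

In this example the three-chain pairings precede the two-chain ones, and within each group the rows with the smaller proxy value come first.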
