deepfold.data.multimer package

Submodules

deepfold.data.multimer.input_features module

class deepfold.data.multimer.input_features.ComplexInfo(descriptions: List[str] = <factory>, num_units: List[int] = <factory>)

Bases: object

descriptions: List[str]
num_units: List[int]
deepfold.data.multimer.input_features.add_assembly_features(all_chain_features: MutableMapping[str, dict]) MutableMapping[str, dict]

Add features to distinguish between chains.

Parameters:

all_chain_features – A dictionary which maps chain_id to a dictionary of features for each chain.

Returns:

A dictionary which maps strings of the form <seq_id>_<sym_id> to the corresponding chain features. For example, two chains from a homodimer would have keys A_1 and A_2, and two chains from a heterodimer would have keys A_1 and B_1.

Return type:

all_chain_features
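The key scheme described above can be illustrated with a minimal sketch. It is not the deepfold implementation: `assembly_keys` is a hypothetical helper that takes plain sequence strings instead of feature dictionaries, assumes chains with identical sequences share a seq_id, and only handles up to 26 distinct sequences.

```python
from collections import defaultdict
from typing import Dict, List


def assembly_keys(chain_sequences: Dict[str, str]) -> List[str]:
    """Hypothetical helper: build <seq_id>_<sym_id> keys from chain sequences."""
    seq_to_entity: Dict[str, int] = {}  # distinct sequence -> entity index
    copies = defaultdict(int)           # entity index -> copies seen so far
    keys = []
    for _chain_id, seq in chain_sequences.items():
        if seq not in seq_to_entity:
            seq_to_entity[seq] = len(seq_to_entity)
        entity = seq_to_entity[seq]
        copies[entity] += 1
        # seq_id is a letter per distinct sequence, sym_id counts its copies.
        keys.append(f"{chr(ord('A') + entity)}_{copies[entity]}")
    return keys


homodimer = assembly_keys({"chain1": "MKV", "chain2": "MKV"})
heterodimer = assembly_keys({"chain1": "MKV", "chain2": "GSH"})
```

Here `homodimer` holds the keys A_1 and A_2, and `heterodimer` holds A_1 and B_1, matching the examples in the description.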

deepfold.data.multimer.input_features.convert_monomer_features(monomer_features: dict) dict

Reshapes and modifies monomer features for multimer models.

deepfold.data.multimer.input_features.create_all_seq_msa_features(all_seq_features: dict) dict
deepfold.data.multimer.input_features.create_all_seq_msa_features_from_a3m(a3m_strings: Sequence[str], sequence: str | None = None) dict

Get MSA features for pairing.

deepfold.data.multimer.input_features.create_multimer_features(paired_a3m_strings: List[str], sequence: str | None = None) dict

Create multimer features from paired MSA strings.

deepfold.data.multimer.input_features.crop_chains(chains_list: List[dict], msa_crop_size: int, pair_msa_sequences: bool, max_templates: int) List[dict]

Crops the MSAs for a set of chains.

Parameters:
  • chains_list – A list of chains to be cropped.

  • msa_crop_size – The total number of sequences to crop from the MSA.

  • pair_msa_sequences – Whether we are operating in sequence-pairing mode.

  • max_templates – The maximum number of templates to use per chain.

Returns:

The chains cropped.

deepfold.data.multimer.input_features.int_id_to_str_id(num: int) str

Encodes a number as a string, using reverse spreadsheet style naming.

Parameters:

num – A positive integer.

Returns:

A string that encodes the positive integer using reverse spreadsheet style naming, e.g. 1 = A, 2 = B, …, 27 = AA, 28 = BA, 29 = CA, … This is the usual way to encode chain IDs in mmCIF files.
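The encoding can be sketched as follows. This is an illustrative reimplementation under the hypothetical name `reverse_spreadsheet_id`, written to reproduce the examples listed above. It is not necessarily the code in deepfold.

```python
def reverse_spreadsheet_id(num: int) -> str:
    """Hypothetical sketch: encode a positive integer as a reverse spreadsheet ID."""
    if num <= 0:
        raise ValueError(f"Only positive integers are allowed, got {num}.")
    num -= 1  # switch to 0-based indexing
    letters = []
    while num >= 0:
        # The least significant letter is emitted first, hence "reverse".
        letters.append(chr(num % 26 + ord("A")))
        num = num // 26 - 1
    return "".join(letters)


ids = [reverse_spreadsheet_id(n) for n in (1, 2, 26, 27, 28, 29)]
```

The fastest-changing letter comes first, so 28 encodes as BA, whereas a spreadsheet would label its 28th column AB.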

deepfold.data.multimer.input_features.pad_msa(example: dict, min_num_cluster) dict
deepfold.data.multimer.input_features.pair_and_merge(all_chain_features: MutableMapping[str, dict]) dict

Runs processing on features to augment, pair and merge.

Parameters:

all_chain_features – A MutableMapping of dictionaries of features for each chain.

Returns:

A dictionary of features.

deepfold.data.multimer.input_features.process_final(np_example: dict) dict

Final processing steps in data pipeline, after merging and pairing.

deepfold.data.multimer.input_features.process_multimer_features(complex: ComplexInfo, all_monomer_features: Mapping[str, dict], pair_with_identifier: bool = False, a3m_strings_with_identifiers: Mapping[str, str] | None = None, paired_a3m_strings: Mapping[str, str] = {}, max_num_clusters: int = 508) dict

Create multimer input features.

deepfold.data.multimer.input_features.process_single_chain(chain_features: dict, is_homomer_or_monomer: bool, a3m_strings_for_paring: Sequence[str] | None = None, use_identifier: bool = False) dict

Process the features of a single chain.

deepfold.data.multimer.input_features.process_unmerged_features(all_chain_features: MutableMapping[str, dict])

Postprocessing stage for per-chain features before merging.

deepfold.data.multimer.msa_pairing module

Pairing logic for multimer data pipeline.

deepfold.data.multimer.msa_pairing.block_diag(*arrs: ndarray, pad_value: float = 0.0) ndarray

Like scipy.linalg.block_diag but with an optional padding value.
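To make the effect of the padding value concrete, here is a minimal NumPy-only sketch. The name `block_diag_padded` is hypothetical, the sketch assumes every input is a 2-D array, and it may differ from how deepfold builds the result internally.

```python
import numpy as np


def block_diag_padded(*arrs: np.ndarray, pad_value: float = 0.0) -> np.ndarray:
    """Hypothetical sketch: block-diagonal matrix with off-diagonal entries set to pad_value."""
    rows = sum(a.shape[0] for a in arrs)
    cols = sum(a.shape[1] for a in arrs)
    # Start from a matrix filled with the padding value.
    out = np.full((rows, cols), pad_value, dtype=np.result_type(*arrs))
    r = c = 0
    for a in arrs:
        # Copy each block onto the diagonal, then step past it.
        out[r:r + a.shape[0], c:c + a.shape[1]] = a
        r += a.shape[0]
        c += a.shape[1]
    return out


a = np.ones((1, 2))
b = 2.0 * np.ones((2, 1))
merged = block_diag_padded(a, b, pad_value=-1.0)
```

With `pad_value=0.0` this matches the layout produced by scipy.linalg.block_diag; any other value marks the off-diagonal positions explicitly.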

deepfold.data.multimer.msa_pairing.create_paired_features(chains: Iterable[Mapping[str, ndarray]]) List[Mapping[str, ndarray]]

Returns the original chains with paired NUM_SEQ features.

Parameters:

chains – A list of feature dictionaries for each chain.

Returns:

A list of feature dictionaries with sequence features including only rows to be paired.

deepfold.data.multimer.msa_pairing.deduplicate_unpaired_sequences(np_chains: List[Mapping[str, ndarray]]) List[Mapping[str, ndarray]]

Removes unpaired sequences which duplicate a paired sequence.
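The idea can be sketched for a single chain. The helper name `unpaired_rows_to_keep` and the use of plain sequence strings are assumptions made for illustration; the real function operates on per-chain feature dictionaries of arrays.

```python
from typing import List, Sequence


def unpaired_rows_to_keep(paired_msa: Sequence[str], unpaired_msa: Sequence[str]) -> List[int]:
    """Hypothetical sketch: indices of unpaired rows that do not repeat a paired row."""
    paired = {tuple(row) for row in paired_msa}  # hashable copies for fast lookup
    return [i for i, row in enumerate(unpaired_msa) if tuple(row) not in paired]


keep = unpaired_rows_to_keep(
    paired_msa=["MKV", "MRV"],
    unpaired_msa=["MKV", "MAV", "MRV", "MGV"],
)
```

Only the rows at indices 1 and 3 survive, because the other two already appear among the paired sequences.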

deepfold.data.multimer.msa_pairing.merge_chain_features(np_chains_list: List[Mapping[str, ndarray]], pair_msa_sequences: bool, max_templates: int) Mapping[str, ndarray]

Merges features for multiple chains into a single FeatureDict.

Parameters:
  • np_chains_list – List of FeatureDicts for each chain.

  • pair_msa_sequences – Whether to merge paired MSAs.

  • max_templates – The maximum number of templates to include.

Returns:

Single FeatureDict for entire complex.

deepfold.data.multimer.msa_pairing.pad_features(feature: ndarray, feature_name: str) ndarray

Add a ‘padding’ row at the end of the features list.

In the case of partial alignment, the padding row will be selected as the ‘paired’ row for any chain that has no paired alignment.

Parameters:
  • feature – The feature to be padded.

  • feature_name – The name of the feature to be padded.

Returns:

The feature with an additional padding row.
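A minimal sketch of appending such a row with NumPy follows. The helper `append_padding_row` and the fill value 21 are illustrative assumptions; the padding value that deepfold uses for each feature name is not stated here.

```python
import numpy as np


def append_padding_row(feature: np.ndarray, pad_value=0) -> np.ndarray:
    """Hypothetical sketch: append one row filled with pad_value along axis 0."""
    padding = np.full((1,) + feature.shape[1:], pad_value, dtype=feature.dtype)
    return np.concatenate([feature, padding], axis=0)


msa = np.array([[1, 2, 3], [4, 5, 6]])
padded = append_padding_row(msa, pad_value=21)  # 21 is an arbitrary example value
```

The result has one more row than the input, so an index pointing at the last row can stand in for a chain that has no partner sequence.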

deepfold.data.multimer.msa_pairing.pair_sequences(examples: List[Mapping[str, ndarray]]) Dict[int, ndarray]

Returns indices for paired MSA sequences across chains.

deepfold.data.multimer.msa_pairing.reorder_paired_rows(all_paired_msa_rows_dict: Dict[int, ndarray]) ndarray

Creates a list of indices of paired MSA rows across chains.

Parameters:

all_paired_msa_rows_dict – A mapping from the number of paired chains to the paired indices.

Returns:

A list of lists, each containing indices of paired MSA rows across chains. The paired-index lists are ordered by:

  1. the number of chains in the paired alignment, i.e., all-chain pairings will come first.

  2. e-values
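The two-level ordering can be sketched as below. Several things here are assumptions rather than documented behaviour: the helper name `order_paired_rows`, the convention that -1 marks a chain with no partner, and the use of the absolute product of row indices as a stand-in for e-value (which presumes each chain's MSA is already sorted from best to worst hit).

```python
from math import prod
from typing import Dict, List


def order_paired_rows(paired_rows_by_count: Dict[int, List[List[int]]]) -> List[List[int]]:
    """Hypothetical sketch: order paired rows by chain count, then by a score proxy."""
    ordered: List[List[int]] = []
    # 1. Pairings that cover more chains come first.
    for num_chains in sorted(paired_rows_by_count, reverse=True):
        rows = paired_rows_by_count[num_chains]
        # 2. Within a group, a smaller index product is treated as a better pairing.
        ordered.extend(sorted(rows, key=lambda r: abs(prod(r))))
    return ordered


ordered_rows = order_paired_rows({
    2: [[3, -1, 2], [1, -1, 1]],  # pairings covering two of three chains
    3: [[2, 2, 2], [1, 1, 1]],    # pairings covering all three chains
})
```

The all-chain pairings are listed first, and within each group the rows are sorted by the proxy score.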

Module contents