deepfold.data.multimer package
Submodules
deepfold.data.multimer.input_features module
- class deepfold.data.multimer.input_features.ComplexInfo(descriptions: List[str] = <factory>, num_units: List[int] = <factory>)
Bases: object
- descriptions: List[str]
- num_units: List[int]
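The `<factory>` defaults in the signature suggest a dataclass whose list fields use `default_factory`. A minimal sketch under that assumption; the class name `ComplexInfoSketch` and the example values are hypothetical and not taken from the library:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ComplexInfoSketch:
    """Hypothetical stand-in mirroring the documented field names and types."""

    # default_factory gives every instance its own fresh list
    # (assumption: the real class uses `list` as the factory).
    descriptions: List[str] = field(default_factory=list)
    num_units: List[int] = field(default_factory=list)


first = ComplexInfoSketch()
second = ComplexInfoSketch(descriptions=["chain one"], num_units=[2])
first.descriptions.append("added later")  # does not touch `second`
```

Because each instance owns its lists, mutating one `ComplexInfoSketch` never leaks into another.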
- deepfold.data.multimer.input_features.add_assembly_features(all_chain_features: MutableMapping[str, dict]) → MutableMapping[str, dict]
Add features to distinguish between chains.
- Parameters:
all_chain_features – A dictionary which maps chain_id to a dictionary of features for each chain.
- Returns:
A dictionary (all_chain_features) which maps strings of the form <seq_id>_<sym_id> to the corresponding chain features. E.g. two chains from a homodimer would have keys A_1 and A_2. Two chains from a heterodimer would have keys A_1 and B_1.
- Return type:
MutableMapping[str, dict]
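The key scheme can be illustrated with a short sketch. It assumes that chains with identical sequences share a `<seq_id>` letter and that `<sym_id>` counts copies starting from 1; the helper `assembly_keys` and its plain-string input are hypothetical and much simpler than the real feature dictionaries:

```python
from typing import Dict


def assembly_keys(chain_sequences: Dict[str, str]) -> Dict[str, str]:
    """Map each chain_id to a hypothetical '<seq_id>_<sym_id>' key."""
    seq_letter: Dict[str, str] = {}   # unique sequence -> letter (A, B, ...)
    copies_seen: Dict[str, int] = {}  # unique sequence -> copies seen so far
    keys: Dict[str, str] = {}
    for chain_id, sequence in chain_sequences.items():
        if sequence not in seq_letter:
            # Sketch only: supports at most 26 distinct sequences.
            seq_letter[sequence] = chr(ord("A") + len(seq_letter))
        copies_seen[sequence] = copies_seen.get(sequence, 0) + 1
        keys[chain_id] = f"{seq_letter[sequence]}_{copies_seen[sequence]}"
    return keys


homodimer = assembly_keys({"chain_1": "MKV", "chain_2": "MKV"})
heterodimer = assembly_keys({"chain_1": "MKV", "chain_2": "GSH"})
```

Here `homodimer` maps the two chains to `A_1` and `A_2`, while `heterodimer` maps them to `A_1` and `B_1`, matching the examples in the description.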
- deepfold.data.multimer.input_features.convert_monomer_features(monomer_features: dict) → dict
Reshapes and modifies monomer features for multimer models.
- deepfold.data.multimer.input_features.create_all_seq_msa_features(all_seq_features: dict) → dict
- deepfold.data.multimer.input_features.create_all_seq_msa_features_from_a3m(a3m_strings: Sequence[str], sequence: str | None = None) → dict
Get MSA features for pairing.
- deepfold.data.multimer.input_features.create_multimer_features(paired_a3m_strings: List[str], sequence: str | None = None) → dict
Create multimer features from paired MSA strings.
- deepfold.data.multimer.input_features.crop_chains(chains_list: List[dict], msa_crop_size: int, pair_msa_sequences: bool, max_templates: int) → List[dict]
Crops the MSAs for a set of chains.
- Parameters:
chains_list – A list of chains to be cropped.
msa_crop_size – The total number of sequences to crop from the MSA.
pair_msa_sequences – Whether we are operating in sequence-pairing mode.
max_templates – The maximum number of templates to use per chain.
- Returns:
The cropped chains.
- deepfold.data.multimer.input_features.int_id_to_str_id(num: int) → str
Encodes a number as a string, using reverse spreadsheet style naming.
- Parameters:
num – A positive integer.
- Returns:
A string that encodes the positive integer using reverse spreadsheet-style naming, e.g. 1 = A, 2 = B, …, 27 = AA, 28 = BA, 29 = CA, … This is the usual way to encode chain IDs in mmCIF files.
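The encoding described above can be sketched as follows. This is an illustrative reimplementation, not necessarily the library's own code; the name `int_id_to_str_id_sketch` is hypothetical:

```python
def int_id_to_str_id_sketch(num: int) -> str:
    """Encode a positive integer in reverse spreadsheet style."""
    if num <= 0:
        raise ValueError(f"Only positive integers allowed, got {num}.")
    num -= 1  # switch to 0-based so that 1 maps to 'A'
    letters = []
    while num >= 0:
        # Least significant letter first, which yields the "reverse" order.
        letters.append(chr(num % 26 + ord("A")))
        num = num // 26 - 1
    return "".join(letters)


examples = {n: int_id_to_str_id_sketch(n) for n in (1, 2, 26, 27, 28, 29)}
```

The least significant letter is emitted first, which is why 28 becomes `BA` rather than the `AB` a spreadsheet column would use.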
- deepfold.data.multimer.input_features.pad_msa(example: dict, min_num_cluster) → dict
- deepfold.data.multimer.input_features.pair_and_merge(all_chain_features: MutableMapping[str, dict]) → dict
Runs processing on features to augment, pair and merge.
- Parameters:
all_chain_features – A MutableMapping from chain ID to the dictionary of features for that chain.
- Returns:
A dictionary of features.
- deepfold.data.multimer.input_features.process_final(np_example: dict) → dict
Final processing steps in data pipeline, after merging and pairing.
- deepfold.data.multimer.input_features.process_multimer_features(complex: ComplexInfo, all_monomer_features: Mapping[str, dict], pair_with_identifier: bool = False, a3m_strings_with_identifiers: Mapping[str, str] | None = None, paired_a3m_strings: Mapping[str, str] = {}, max_num_clusters: int = 508) → dict
Create multimer input features.
- deepfold.data.multimer.input_features.process_single_chain(chain_features: dict, is_homomer_or_monomer: bool, a3m_strings_for_paring: Sequence[str] | None = None, use_identifier: bool = False) → dict
Process the features of a single chain.
- deepfold.data.multimer.input_features.process_unmerged_features(all_chain_features: MutableMapping[str, dict])
Postprocessing stage for per-chain features before merging.
deepfold.data.multimer.msa_pairing module
Pairing logic for multimer data pipeline.
- deepfold.data.multimer.msa_pairing.block_diag(*arrs: ndarray, pad_value: float = 0.0) → ndarray
Like scipy.linalg.block_diag but with an optional padding value.
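A NumPy-only sketch of a block-diagonal layout with a configurable fill value. It is not the library's implementation; `block_diag_padded` is a hypothetical name, and the sketch always returns a float array for simplicity:

```python
import numpy as np


def block_diag_padded(*arrs: np.ndarray, pad_value: float = 0.0) -> np.ndarray:
    """Place the inputs along the diagonal; fill everything else with pad_value."""
    blocks = [np.atleast_2d(a) for a in arrs]
    n_rows = sum(b.shape[0] for b in blocks)
    n_cols = sum(b.shape[1] for b in blocks)
    out = np.full((n_rows, n_cols), pad_value, dtype=float)
    row = col = 0
    for b in blocks:
        # Copy each block into its own diagonal slot.
        out[row:row + b.shape[0], col:col + b.shape[1]] = b
        row += b.shape[0]
        col += b.shape[1]
    return out


result = block_diag_padded(np.ones((2, 2)), 2 * np.ones((1, 3)), pad_value=-1.0)
```

With `pad_value=0.0` the layout matches what `scipy.linalg.block_diag` produces; a non-zero value only changes the off-diagonal entries.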
- deepfold.data.multimer.msa_pairing.create_paired_features(chains: Iterable[Mapping[str, ndarray]]) → List[Mapping[str, ndarray]]
Returns the original chains with paired NUM_SEQ features.
- Parameters:
chains – A list of feature dictionaries for each chain.
- Returns:
A list of feature dictionaries with sequence features including only rows to be paired.
- deepfold.data.multimer.msa_pairing.deduplicate_unpaired_sequences(np_chains: List[Mapping[str, ndarray]]) → List[Mapping[str, ndarray]]
Removes unpaired sequences which duplicate a paired sequence.
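The idea can be sketched on two plain integer-encoded MSA arrays. The helper `drop_unpaired_duplicates` and its two-array interface are hypothetical; the documented function operates on per-chain feature dictionaries instead:

```python
import numpy as np


def drop_unpaired_duplicates(paired_msa: np.ndarray, unpaired_msa: np.ndarray) -> np.ndarray:
    """Keep only unpaired rows that do not also occur in the paired MSA."""
    paired_rows = {tuple(row) for row in paired_msa}
    keep = [i for i, row in enumerate(unpaired_msa) if tuple(row) not in paired_rows]
    # An explicit integer index array also handles the "nothing kept" case.
    return unpaired_msa[np.array(keep, dtype=int)]


paired = np.array([[1, 2, 3], [4, 5, 6]])
unpaired = np.array([[1, 2, 3], [7, 8, 9], [4, 5, 6], [0, 0, 0]])
deduplicated = drop_unpaired_duplicates(paired, unpaired)
```

In a full pipeline the same row selection would presumably be applied to every per-row MSA feature so that all of them stay aligned.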
- deepfold.data.multimer.msa_pairing.merge_chain_features(np_chains_list: List[Mapping[str, ndarray]], pair_msa_sequences: bool, max_templates: int) → Mapping[str, ndarray]
Merges features for multiple chains into a single FeatureDict.
- Parameters:
np_chains_list – List of FeatureDicts for each chain.
pair_msa_sequences – Whether to merge paired MSAs.
max_templates – The maximum number of templates to include.
- Returns:
Single FeatureDict for entire complex.
- deepfold.data.multimer.msa_pairing.pad_features(feature: ndarray, feature_name: str) → ndarray
Add a ‘padding’ row at the end of the features list.
When an alignment is only partial, the padding row is selected as the ‘paired’ row for any chain that has no paired alignment.
- Parameters:
feature – The feature to be padded.
feature_name – The name of the feature to be padded.
- Returns:
The feature with an additional padding row.
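A minimal sketch of appending one padding row to a two-dimensional MSA-like feature. The helper `append_padding_row` and the pad value 21 are hypothetical; the documented function picks its behaviour from `feature_name`, which this sketch does not model:

```python
import numpy as np


def append_padding_row(feature: np.ndarray, pad_value: int) -> np.ndarray:
    """Return `feature` with one extra row filled with `pad_value`."""
    # The padding row matches the feature's trailing shape and dtype.
    padding = np.full((1,) + feature.shape[1:], pad_value, dtype=feature.dtype)
    return np.concatenate([feature, padding], axis=0)


msa = np.arange(6).reshape(2, 3)
padded = append_padding_row(msa, pad_value=21)  # 21 is an illustrative value
```

The original rows keep their indices, so the new last row can serve as a stand-in wherever a chain has no partner sequence.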
- deepfold.data.multimer.msa_pairing.pair_sequences(examples: List[Mapping[str, ndarray]]) → Dict[int, ndarray]
Returns indices for paired MSA sequences across chains.
- deepfold.data.multimer.msa_pairing.reorder_paired_rows(all_paired_msa_rows_dict: Dict[int, ndarray]) → ndarray
Creates a list of indices of paired MSA rows across chains.
- Parameters:
all_paired_msa_rows_dict – A mapping from the number of paired chains to the paired indices.
- Returns:
A list of lists, each containing indices of paired MSA rows across chains. The paired-index lists are ordered first by the number of chains in the paired alignment (i.e. all-chain pairings come first), then by e-value.
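The two-level ordering can be sketched as below. Several assumptions are made that the text does not confirm: each chain's MSA rows are already sorted by e-value, so a smaller row index means a better hit; `-1` marks a chain with no partner in that pairing; and the absolute product of the row indices serves as a stand-in for the e-value ranking. The name `reorder_paired_rows_sketch` is hypothetical:

```python
from typing import Dict

import numpy as np


def reorder_paired_rows_sketch(rows_by_num_chains: Dict[int, np.ndarray]) -> np.ndarray:
    """Order paired rows by chain count (descending), then by a rank proxy."""
    ordered = []
    for num_chains in sorted(rows_by_num_chains, reverse=True):
        rows = np.asarray(rows_by_num_chains[num_chains])
        # Assumed proxy for e-value order: smaller index product ranks first.
        rank_proxy = np.abs(np.prod(rows, axis=1))
        ordered.extend(rows[np.argsort(rank_proxy, kind="stable")])
    return np.array(ordered)


pairings = {
    2: np.array([[1, 3, -1], [-1, 1, 2]]),  # pairings covering two of three chains
    3: np.array([[2, 2, 2], [1, 1, 1]]),    # pairings covering all three chains
}
reordered = reorder_paired_rows_sketch(pairings)
```

In this example the two all-chain pairings come first, followed by the two-chain pairings, each group sorted by the proxy.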