miniworld.feature package

Submodules

miniworld.feature.MiniWorld_featuring_species module

miniworld.feature.MiniWorld_featuring_species.MSA_block_deletion(msa, insertion, nb=5)

Down-samples the given MSA by randomly deleting blocks of sequences.

Input: MSA/insertion with shape (N, L). Output: new MSA/insertion after block deletion, with shape (N', L).
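The idea of block deletion can be sketched in plain Python as below. This is an illustrative sketch, not the package's implementation: the name block_deletion_sketch, the block_frac parameter (block size as a fraction of N), the rng argument, and the choice to always keep row 0 (the query) are assumptions, and nested lists stand in for tensors.

```python
import random


def block_deletion_sketch(msa, insertion, nb=5, block_frac=0.3, rng=None):
    """Drop up to `nb` contiguous blocks of rows from an (N, L) MSA.

    Hypothetical helper: `block_frac` and keeping row 0 are assumptions.
    """
    rng = rng or random.Random()
    n_seq = len(msa)
    if n_seq <= 1:
        return list(msa), list(insertion)

    block_size = max(int(n_seq * block_frac), 1)
    to_delete = set()
    for _ in range(nb):
        # Start index is drawn from 1..N-1, so row 0 is never deleted.
        start = rng.randint(1, n_seq - 1)
        to_delete.update(range(start, min(start + block_size, n_seq)))

    keep = [i for i in range(n_seq) if i not in to_delete]
    # The same rows are kept in both arrays, so they stay aligned.
    return [msa[i] for i in keep], [insertion[i] for i in keep]
```

Blocks may overlap, so the number of removed rows N - N' varies between calls; the sequence length L is unchanged.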

miniworld.feature.MiniWorld_featuring_species.MSA_featurize_wo_statistics(msa, insertion, chain_to_idx_dict, params)

I modified the RF2 version (only the variable names were changed).

Input: full MSA information (after block deletion, if necessary) and full insertion information. Output: seed MSA features and extra sequences.

Parameters:
  • msa (torch.LongTensor) – Full MSA tensor.

  • insertion (torch.LongTensor) – Full insertion tensor.

  • chain_to_idx_dict (dict) – Dictionary mapping chain ID to residue indices.

  • params (dict) – Dictionary of parameters.

Seed MSA features:

  • aatype of seed sequence (20 regular aa + 1 gap + 1 unknown + 1 mask)

  • profile of clustered sequences (23) => removed

  • insertion statistics (2) => removed statistics, only use insertion_clust

  • N-term or C-term? (2)

Extra sequence features:

  • aatype of extra sequence (23)

  • insertion info (1)

  • N-term or C-term? (2)
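The extra sequence layout listed above adds up to 23 + 1 + 2 = 26 channels per residue. A minimal pure-Python sketch of that layout follows. The function name, the token indexing, and the arctan squashing of insertion counts are assumptions for illustration and are not taken from the miniworld source.

```python
import math

NUM_AATYPE = 23  # 20 regular aa + 1 gap + 1 unknown + 1 mask, as listed above


def extra_sequence_features(tokens, insertions):
    """Build one 26-dim row per residue: aatype (23) + insertion (1) + termini (2).

    Hypothetical helper; `tokens` are integer class indices in [0, 23).
    """
    length = len(tokens)
    rows = []
    for pos, (tok, ins) in enumerate(zip(tokens, insertions)):
        one_hot = [0.0] * NUM_AATYPE
        one_hot[tok] = 1.0
        # Assumption: map the raw insertion count into [0, 1).
        ins_feat = (2.0 / math.pi) * math.atan(ins / 3.0)
        n_term = 1.0 if pos == 0 else 0.0
        c_term = 1.0 if pos == length - 1 else 0.0
        rows.append(one_hot + [ins_feat, n_term, c_term])
    return rows
```

Each returned row has length 26; stacking the rows of all extra sequences would give an array of shape (N_extra, L, 26).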

miniworld.feature.MiniWorld_featuring_species.MSA_featurize_wo_statistics_by_chain(msa, insertion, N_clust, params)

I modified the RF2 version (only the variable names were changed).

Input: full MSA information (after block deletion, if necessary) and full insertion information. Output: seed MSA features and extra sequences.

Parameters:
  • msa (torch.LongTensor) – Shape (N, L).

  • ins (torch.LongTensor) – Shape (N, L).

  • params – List of parameters.

  • p_mask – Probability of masking.

  • eps – Small number to avoid division by zero.

  • chain_break (dict) – Dictionary of chain indices, {chain_id: (start, end)}.

Seed MSA features:
  • aatype of seed sequence (20 regular aa + 1 gap + 1 unknown + 1 mask)

  • profile of clustered sequences (23) => removed

  • insertion statistics (2) => removed statistics, only use insertion_clust

  • N-term or C-term? (2)

Extra sequence features:
  • aatype of extra sequence (23)

  • insertion info (1)

  • N-term or C-term? (2)
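For a multi-chain input, the two terminus channels could be set once per chain from a chain_break dictionary of the form {chain_id: (start, end)}. The sketch below illustrates that idea only. Whether the function actually sets the flags per chain, and whether end is inclusive, are assumptions, and per_chain_term_flags is a hypothetical name.

```python
def per_chain_term_flags(chain_break, total_length):
    """Return [is_N_term, is_C_term] for each residue of a concatenated complex.

    Hypothetical helper; assumes `end` is the inclusive index of the last residue.
    """
    flags = [[0.0, 0.0] for _ in range(total_length)]
    for _chain_id, (start, end) in chain_break.items():
        flags[start][0] = 1.0  # first residue of this chain
        flags[end][1] = 1.0    # last residue of this chain (inclusive end assumed)
    return flags
```

With chain_break = {"A": (0, 2), "B": (3, 4)} and a total length of 5, residues 0 and 3 are marked as N-termini and residues 2 and 4 as C-termini.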

miniworld.feature.MiniWorld_featuring_species.center_and_realign_missing(xyz, mask_t)
miniworld.feature.MiniWorld_featuring_species.chain_break_cropping(chain_break, crop_idx)
miniworld.feature.MiniWorld_featuring_species.cluster_sum(data, assignment, N_seq, N_res)
miniworld.feature.MiniWorld_featuring_species.cutoff_chain_num(sel, xyz, chain_break, params, query_chain_idx)
miniworld.feature.MiniWorld_featuring_species.display_top(snapshot, key_type='lineno', limit=10)
miniworld.feature.MiniWorld_featuring_species.find_chain_combinations(pairs)
miniworld.feature.MiniWorld_featuring_species.generate_combinations(lst, pairs)
miniworld.feature.MiniWorld_featuring_species.get_STRING_crop(len_s, mask, device, params)
miniworld.feature.MiniWorld_featuring_species.get_complex_crop(len_s, mask, device, params)
miniworld.feature.MiniWorld_featuring_species.get_crop(chain_start, chain_end, mask, device, params, unclamp=False, ID=None)
miniworld.feature.MiniWorld_featuring_species.get_same_crop_idx(xyz_full, crop_idx, chain_break, same_chain_info, cutoff=10.0)
miniworld.feature.MiniWorld_featuring_species.get_spatial_crop(xyz, mask, pivot_chain_idx, chain_break, len_s, params, protein_ID, cutoff=10.0, eps=1e-06)
miniworld.feature.MiniWorld_featuring_species.getsize(obj_0)

Recursively iterate to sum size of object & members.
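A common way to write such a recursive size sum uses sys.getsizeof and a set of already visited object ids. The sketch below shows that pattern under the assumption that getsize works along similar lines; getsize_sketch is a hypothetical name and this is not the package's code.

```python
import sys


def getsize_sketch(obj, _seen=None):
    """Recursively sum sys.getsizeof over an object and its members."""
    seen = set() if _seen is None else _seen
    if id(obj) in seen:
        return 0  # shared or cyclic reference: count each object once
    seen.add(id(obj))

    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(
            getsize_sketch(k, seen) + getsize_sketch(v, seen)
            for k, v in obj.items()
        )
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(getsize_sketch(item, seen) for item in obj)
    elif hasattr(obj, "__dict__"):
        # Plain instances: descend into their attribute dictionary.
        size += getsize_sketch(vars(obj), seen)
    return size
```

Note that sys.getsizeof reports only the memory of the Python object itself, so memory held outside it (for example tensor storage) is not included in this sum.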

miniworld.feature.MiniWorld_featuring_species.permute_label(protein_list, crop_idx, out_of_sequence_idxs, chain_break, same_chain_info)
miniworld.feature.MiniWorld_featuring_species.random_split(n, k, min_split)
miniworld.feature.MiniWorld_featuring_species.template_featurize(input_template_dict, params)

I modified the RF2 version.

In MSA_featurize, I changed only the variable names and a small part of the code, because the shapes of the inputs (msa, insertion) are almost the same as in RF2. In contrast, I completely restructured the template data, so this function differs substantially from the RF2 version.

Note

Processes template information for a single chain.

Parameters:
  • input_template_dict (dict) –

    A dictionary containing template information. It should have the following keys:

    • 'xyz': torch.Tensor of shape (N_template, L_chain, 27, 3)

    • 'mask': torch.Tensor of shape (N_template, L_chain, 27)

    • 'sequence': torch.Tensor of shape (N_template, L_chain, NUM_CLASSES)

    • 'f0d': torch.Tensor of shape (N_template)

    • 'f1d': torch.Tensor of shape (N_template, L_chain)

  • params (dict) – Dictionary of parameters.

Returns:

A dictionary with processed template features:

  • 'xyz': torch.Tensor of shape (npick_global, L_query, 27, 3)

  • 'template_1D': torch.Tensor of shape (npick_global, L_query, 23 + 1)

  • 'template_atom_mask': torch.Tensor of shape (npick_global, L_query, 27)

Return type:

dict
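Before calling template_featurize, it can help to confirm that the input dictionary matches the documented shapes. The sketch below is a hypothetical pre-flight check and is not part of miniworld. It works on plain shape tuples (for example tuple(tensor.shape)), and num_classes is passed in because the value of NUM_CLASSES is not stated here.

```python
def check_template_shapes(shapes, num_classes):
    """Compare shape tuples against the documented template input layout.

    Hypothetical helper; returns a list of problems (empty if all shapes match).
    """
    if "xyz" not in shapes or len(shapes["xyz"]) != 4:
        return ["'xyz' must be present with 4 dimensions"]

    # N_template and L_chain are read from 'xyz' and must agree everywhere else.
    n_template, l_chain = shapes["xyz"][:2]
    expected = {
        "xyz": (n_template, l_chain, 27, 3),
        "mask": (n_template, l_chain, 27),
        "sequence": (n_template, l_chain, num_classes),
        "f0d": (n_template,),
        "f1d": (n_template, l_chain),
    }

    problems = []
    for key, want in expected.items():
        got = shapes.get(key)
        if got is None:
            problems.append(f"missing key {key!r}")
        elif tuple(got) != want:
            problems.append(f"{key!r}: expected {want}, got {tuple(got)}")
    return problems
```

A typical call would be check_template_shapes({k: tuple(v.shape) for k, v in input_template_dict.items()}, num_classes), where num_classes is whatever value the package uses for NUM_CLASSES.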

Module contents