miniworld.utils package¶
Submodules¶
miniworld.utils.DataClass module¶
miniworld.utils.My_mistake module¶
miniworld.utils.ProteinClass module¶
- miniworld.utils.ProteinClass.PDB_parsing(PDB_file, IS_LABEL=True, return_lddt=False)[source]¶
Input : PDB file path, IS_LABEL (bool)
Output : Protein object
This function parses a PDB file and returns an object of the Protein class. It handles multi-model PDB files as well as single-model PDB files, and it also handles multiple chains. Model indices start from 0 in this function, but PDB model indices start from 1, so I subtract 1 from the PDB model index. I assume that the chain breaks are the same in every model.
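The index shift described above can be illustrated with a small sketch. This is not the package's parser: `split_models` is a hypothetical helper that only groups ATOM records by MODEL record, and it assumes that a file without any MODEL record is a single model with index 0.

```python
from collections import OrderedDict


def split_models(pdb_text):
    """Group ATOM lines by 0-based model index (hypothetical helper)."""
    models = OrderedDict()
    current = 0  # assumption: no MODEL record means a single model 0
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "MODEL":
            # PDB model serial numbers start at 1; shift to a 0-based index.
            current = int(line.split()[1]) - 1
        elif record == "ATOM":
            models.setdefault(current, []).append(line)
    return models


example = "\n".join([
    "MODEL        1",
    "ATOM      1  N   MET A   1      11.242   1.210  20.525  1.00  4.07           N",
    "ENDMDL",
    "MODEL        2",
    "ATOM      1  N   MET A   1      11.300   1.150  20.480  1.00  4.07           N",
    "ENDMDL",
])
models = split_models(example)
print(list(models))  # [0, 1]
```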
- class miniworld.utils.ProteinClass.Protein(sequence, structure, occupancy=None, symmetry_related_info=None, ID='----', IS_LABEL=False, source='')[source]¶
Bases: object
- class miniworld.utils.ProteinClass.ProteinMSA(msa_ID=None, query_sequence=None, msa_tensor=None, insertion_tensor=None, a3m_file_path=None)[source]¶
Bases: object
Represents the Multiple Sequence Alignment (MSA) of a Protein:
msa_ID : str
query_sequence : str or torch.tensor of shape (1, L) or (L,)
msa(_tensor) : torch.tensor of shape (N, L); N is the MSA depth
insertion(_tensor) : torch.tensor of shape (N, L)
a3m_file_path : str, used to load this object from an a3m file
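How an a3m file relates to the (N, L) msa and insertion arrays can be sketched as follows. This is an illustration, not the class's loader: `parse_a3m_rows` is a hypothetical helper, and it assumes the common a3m convention that lowercase letters are insertions relative to the query, counted at the next aligned column. Trailing insertions are dropped in this sketch.

```python
def parse_a3m_rows(a3m_text):
    """Return (aligned_rows, insertion_counts) from a3m text."""
    sequences = []
    for line in a3m_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            sequences.append("")       # header line starts a new record
        elif sequences:
            sequences[-1] += line      # a sequence may span several lines

    aligned, insertions = [], []
    for seq in sequences:
        row, counts, pending = [], [], 0
        for ch in seq:
            if ch.islower():
                pending += 1           # insertion: gets no column of its own
            else:
                row.append(ch)         # aligned residue or gap '-'
                counts.append(pending)
                pending = 0
        aligned.append("".join(row))
        insertions.append(counts)
    return aligned, insertions


a3m = ">query\nMKTA\n>hit1\nMK-A\n>hit2\nMabKTA\n"
rows, ins = parse_a3m_rows(a3m)
print(rows)    # every row has the query length L
print(ins[2])  # two insertions counted before the second column
```

Converting the rows to integer tensors requires a residue-to-index mapping, which this sketch leaves out.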
- class miniworld.utils.ProteinClass.ProteinSequence(sequence, masked_sequence=None, chain_break=None)[source]¶
Bases: object
This class is used to store the protein sequence information:
M : Number of models
L : Number of residues
sequence : (M, L) torch.Tensor or str. I strongly recommend using torch.Tensor.
masked_sequence : (M, L) torch.Tensor.
chain_break : (M, ) list of dictionaries or OrderedDicts. Ex) M=1, chain_break[0] = {A: (0,197), B:(198,356), … }
sequence = Full sequence without None
masked_sequence = Masked sequence with None
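The chain_break example above suggests 0-based (start, end) pairs with an inclusive end, since chain B starts at 198 right after chain A ends at 197. Under that assumption, a sketch of building and using such a dictionary could look like this. `build_chain_break` and `chain_sequence` are hypothetical helpers, not part of the package.

```python
from collections import OrderedDict


def build_chain_break(chain_lengths):
    """Map chain ID -> (start, end), 0-based with an inclusive end."""
    chain_break = OrderedDict()
    start = 0
    for chain_id, length in chain_lengths.items():
        chain_break[chain_id] = (start, start + length - 1)
        start += length
    return chain_break


def chain_sequence(sequence, chain_break, chain_id):
    """Slice one chain out of the concatenated sequence."""
    start, end = chain_break[chain_id]
    return sequence[start:end + 1]  # +1 because `end` is inclusive


cb = build_chain_break(OrderedDict([("A", 198), ("B", 159)]))
print(dict(cb))  # {'A': (0, 197), 'B': (198, 356)}
```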
- class miniworld.utils.ProteinClass.ProteinStructure(xyz, chain_break, atom_mask, position_mask=None, has_multiple_chains=False, has_multiple_models=False)[source]¶
Bases: object
This class is used to store the protein structure information:
M : Number of models
L : Number of residues
xyz : (M, L, 14, 3) np.array or list or tensor. 14 is the number of heavy atoms and # … (omitted) …
position_mask : (M, L) np.array or list or tensor, 1 for a confident position and 0 for a missing position.
- class miniworld.utils.ProteinClass.ProteinTemplate(sequence, structure, position=None, template_ID=None, f0d=None, f1d=None)[source]¶
Bases: Protein
position : (M, L_template) np.array or list or tensor. M is the number of models and L_template is the number of residues in the template. It has to be less than L.
miniworld.utils.arguments_MiniWorld module¶
miniworld.utils.chemical module¶
miniworld.utils.data_refactoring module¶
This file only contains code for generating test data.
- miniworld.utils.data_refactoring.AF_Results_filtering(file_path='/home/psk6950/practice/string_MSA/monomer_colabfold_0_15/direct/0/', saving_directory='data/STRING/', residue_plddt_filtering=0.0, whole_plddt_filtering=0.7, rank_num=5, IS_DIRECT=True)[source]¶
Input : AF results folder path
Output : None
This function filters AF results and saves them as .pt files using torch.
- miniworld.utils.data_refactoring.AF_Results_json_filtering(json_file_path, residue_plddt_filtering=0.95)[source]¶
Input : AF results json path, plddt_filtering (float)
Output : plddt_filtered (np.array), the indices of the filtered sequence positions
This function filters AF results and returns plddt_filtered.
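A per-residue pLDDT filter of this kind can be sketched as below. Several details are assumptions rather than documented behavior: the JSON is taken to have a top-level "plddt" list, scores on a 0-100 scale are rescaled to 0-1 before comparison with the threshold, and the returned indices are the residues that pass the threshold. `confident_residue_indices` is a hypothetical name, and it takes the JSON text directly instead of a file path.

```python
import json


def confident_residue_indices(scores_json, threshold=0.95):
    """Return indices of residues whose pLDDT is at least `threshold`."""
    plddt = json.loads(scores_json)["plddt"]  # assumed key name
    if plddt and max(plddt) > 1.0:
        # Assumed 0-100 scale; rescale so it is comparable with the threshold.
        plddt = [value / 100.0 for value in plddt]
    return [i for i, value in enumerate(plddt) if value >= threshold]


scores = '{"plddt": [97.1, 62.4, 96.0, 99.3]}'
print(confident_residue_indices(scores))  # [0, 2, 3]
```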
- miniworld.utils.data_refactoring.AF_Results_json_viewer(file_path)[source]¶
Input : AF results json path
Output : None
This function is used to view an AF results json file.
- miniworld.utils.data_refactoring.PDB_parsing_from_pt(PDB_file, IS_LABEL=True, source='')[source]¶
Input : PDB file path, IS_LABEL (bool)
Output : Protein object
This function is for .pt files. The .pt file is a dictionary with the keys seq, xyz, mask, bfac, occ.
- miniworld.utils.data_refactoring.a3m_pickling(a3m_folder, saving_directory='data/pickle_data/a3m/')[source]¶
Input : a3m folder path
Output : None
This function parses a3m files and saves them as .pkl files using pickle. In /public_data/ml/RF2_train/PDB-2021AUG02/a3m/, there are many folders with 3-letter names.
- miniworld.utils.data_refactoring.get_ID_to_source_dict(sources_path_list, saving_path='data/ID_to_source_dict.txt')[source]¶
Input : sources directory path, saving directory path (.txt)
Output : ID_to_source_dict (dictionary)
PDB ID = mmcif ID
STRING ID = pkl ID
- miniworld.utils.data_refactoring.get_hash_dict(msa_folder_path, saving_path='data/hash_dict.txt')[source]¶
- miniworld.utils.data_refactoring.gzip_files()[source]¶
It’s just for convenience. For large data, use the gzip library rather than this function.
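For the large-data case mentioned above, a streaming compression sketch with the standard gzip module might look like this. `gzip_file` is a hypothetical helper, and the output naming (appending ".gz") is an assumption.

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(src_path, chunk_size=1 << 20):
    """Compress `src_path` to `<src_path>.gz` without loading it into memory."""
    src = Path(src_path)
    dst = src.with_name(src.name + ".gz")
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout, chunk_size)  # copies in fixed-size chunks
    return dst
```

Because the file is copied chunk by chunk, memory use stays bounded by `chunk_size` regardless of the file size.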
- miniworld.utils.data_refactoring.hhr_dir_rerefactoring(hhr_dir='data/hhr_refactoring', saving_dir='data/hhr_rerefactoring')[source]¶
- miniworld.utils.data_refactoring.hhr_refactoring(hhr_folder, saving_directory='data/hhr_refactoring/')[source]¶
Input : hhr folder path
Output : None
This function parses hhr files and saves them as .pt files using torch. In /public_data/ml/RF2_train/PDB-2021AUG02/torch/hhr/, there are many folders with 3-letter names.
- miniworld.utils.data_refactoring.mmcif_Full_sequence_parsing(mmcif_file, output='String')[source]¶
Input : mmcif file path
Output : Full sequence
I initially missed that the sequence in an mmcif file is not the full sequence, so this function parses the full sequence.
- miniworld.utils.data_refactoring.mmcif_full_sequence(mmcif_folder, saving_file_path='data/PDB_ID_full_sequence.txt')[source]¶
Input : mmcif folder path
Output : None
This function parses mmcif files and saves them as .pkl files using pickle. In public_data/rcsb/cif/, there are many folders with 2-letter names.
- miniworld.utils.data_refactoring.mmcif_line_parser(line, loop_=None)[source]¶
Parses a single line from an mmCIF file:
Input : mmcif line
Output : dictionary or list
loop_ : Optional, Example) [_atom_site.group_PDB, _atom_site.id, …]
Example :
ATOM line -> ATOM 1 N N . MET A 1 1 ? 11.242 1.210 20.525 1.00 4.07 ? 1 MET A N 1
Chem line -> ALA 'L-peptide linking' y ALANINE ? 'C3 H7 N O2' 89.093
These lines are separated by spaces, but some lines contain quoted values with spaces inside them (such as 'L-peptide linking'). So I use an alternative split function.
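One way to write such an alternative split is a regular expression that treats a quoted value as a single token. The sketch below is an illustration under that assumption, not the module's implementation. `parse_mmcif_line` and `loop_keys` are hypothetical names, and multi-line (semicolon-delimited) values are not handled.

```python
import re

# A quoted value starts with a quote and ends at a matching quote that is
# followed by whitespace or end of line; anything else is a bare token.
_TOKEN = re.compile(
    r"""'((?:[^']|'(?!\s|$))*)'(?=\s|$)"""      # single-quoted value
    r'''|"((?:[^"]|"(?!\s|$))*)"(?=\s|$)'''     # double-quoted value
    r"""|(\S+)"""                               # bare token
)


def parse_mmcif_line(line, loop_keys=None):
    """Split one mmCIF data line; return a dict when column names are given."""
    values = [
        next(group for group in match.groups() if group is not None)
        for match in _TOKEN.finditer(line)
    ]
    if loop_keys is None:
        return values
    return dict(zip(loop_keys, values))


chem = "ALA 'L-peptide linking' y ALANINE ? 'C3 H7 N O2' 89.093"
print(parse_mmcif_line(chem))
print(parse_mmcif_line("ATOM 1 N", ["_atom_site.group_PDB", "_atom_site.id"]))
```

A plain `str.split()` would break 'L-peptide linking' into two tokens, while an atom name such as O5' stays a single bare token here because its quote does not start the token.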
- miniworld.utils.data_refactoring.mmcif_loop_parser(lines_split_by_sharp, first_key, IS_ATOM=False)[source]¶
CAUTION !!! mmcif is not a well-structured file format, so it is not easy to parse. Therefore I use this function to parse mmcif files.
- miniworld.utils.data_refactoring.mmcif_parsing(mmcif_file, IS_LABEL=True)[source]¶
Input : mmcif file path (~.cif or ~.cif.gz)
Output : Protein object
In this project, I exclude nucleic acids, so they are not considered. Also, I only consider ATOM lines, which may cause problems…
- miniworld.utils.data_refactoring.mmcif_pickling(mmcif_folder, saving_directory='data/pickle_data/mmcif/')[source]¶
Input : mmcif folder path
Output : None
This function parses mmcif files and saves them as .pkl files using pickle. In public_data/rcsb/cif/, there are many folders with 2-letter names.
- miniworld.utils.data_refactoring.pdb_pickling(pdb_folder, saving_directory='data/pickle_data/pdb/')[source]¶
Input : pdb folder path
Output : None
This function parses pdb files and saves them as .pkl files using pickle.
- miniworld.utils.data_refactoring.print_hhr_pt(file_path='data/test_data/PDB_2021Aug02/001128.pt')[source]¶
- miniworld.utils.data_refactoring.protein_to_template(protein, position=None, f0d=None, f1d=None, IS_LABEL=False)[source]¶
Input : Protein object, position (torch.Tensor)
Output : ProteinTemplate object
This function is used when an AlphaFold output is used as a template. protein is an object of the Protein class and position holds the confident residue positions. (It is filtered by another function.)
- miniworld.utils.data_refactoring.refactoring_hhr_file(original_hhr_file, saving_directory=None)[source]¶
Input : original hhr file path
Do : Refactor the hhr file and save it
Output : None
The original hhr file has a qmap, which holds information about the length and position of each template. I think this information is unnecessary to store in the hhr file and hard to understand, so I remove it and save the result as a list of torch tensors.
miniworld.utils.ffindex module¶
Created on Apr 30, 2014
@author: meiermark
miniworld.utils.hhpred_parser module¶
miniworld.utils.kalign_mapping module¶
miniworld.utils.kinematics module¶
- miniworld.utils.kinematics.avgQ(Qs)[source]¶
average a set of quaternions
input dims: Qs - (B,N,R,4)
averages across the ‘N’ dimension
- miniworld.utils.kinematics.c6d_to_bins(c6d, params={'ABINS': 36, 'DBINS': 36, 'DMAX': 20.0, 'DMIN': 2.0})[source]¶
bin 2d distance and orientation maps
- miniworld.utils.kinematics.c6d_to_bins2(c6d, same_chain, ignore_interchain=False, params={'ABINS': 36, 'DBINS': 36, 'DMAX': 20.0, 'DMIN': 2.0})[source]¶
bin 2d distance and orientation maps
- miniworld.utils.kinematics.dist_to_bins(dist, params={'ABINS': 36, 'DBINS': 36, 'DMAX': 20.0, 'DMIN': 2.0})[source]¶
bin 2d distance maps
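With the default params, the distance range from DMIN = 2.0 to DMAX = 20.0 is divided into DBINS = 36 bins, which gives a bin width of 0.5. The sketch below shows one way to map a distance to a bin index. It assumes equal-width bins, and it assumes that a distance equal to an edge falls into the lower bin and that everything beyond DMAX shares one overflow index. `distance_bin_edges` and `distance_to_bin` are hypothetical helpers working on single floats rather than tensors.

```python
from bisect import bisect_left

PARAMS = {"DMIN": 2.0, "DMAX": 20.0, "DBINS": 36, "ABINS": 36}


def distance_bin_edges(params=PARAMS):
    """Upper edges of DBINS equal-width bins between DMIN and DMAX."""
    step = (params["DMAX"] - params["DMIN"]) / params["DBINS"]
    return [params["DMIN"] + step * (i + 1) for i in range(params["DBINS"])]


def distance_to_bin(distance, params=PARAMS):
    """Index of the first edge >= distance; DBINS means beyond DMAX."""
    return bisect_left(distance_bin_edges(params), distance)


for d in (1.0, 2.6, 20.0, 25.0):
    print(d, distance_to_bin(d))
```

Under these assumptions the indices run from 0 to DBINS, that is 37 classes in total for the default params.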
- miniworld.utils.kinematics.dist_to_onehot(dist, params={'ABINS': 36, 'DBINS': 36, 'DMAX': 20.0, 'DMIN': 2.0})[source]¶
- miniworld.utils.kinematics.get_ang(a, b, c)[source]¶
calculate planar angles for all consecutive triples (a[i],b[i],c[i]) from Cartesian coordinates of three sets of atoms a,b,c
- Parameters:
a (pytorch tensors of shape [batch,nres,3]) – store Cartesian coordinates of three sets of atoms
b (pytorch tensors of shape [batch,nres,3]) – store Cartesian coordinates of three sets of atoms
c (pytorch tensors of shape [batch,nres,3]) – store Cartesian coordinates of three sets of atoms
- Returns:
ang – stores resulting planar angles
- Return type:
pytorch tensor of shape [batch,nres]
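For a single triple (a, b, c), the planar angle can be sketched as the angle at vertex b between the vectors from b to a and from b to c. The documented function applies this to whole [batch,nres,3] tensors. `planar_angle` is a hypothetical scalar helper, and the choice of b as the vertex is an assumption based on the argument order.

```python
import math


def planar_angle(a, b, c):
    """Angle at vertex b, in radians, between the vectors b->a and b->c."""
    v = [ai - bi for ai, bi in zip(a, b)]
    w = [ci - bi for ci, bi in zip(c, b)]
    norm_v = math.sqrt(sum(x * x for x in v))
    norm_w = math.sqrt(sum(x * x for x in w))
    cosine = sum(vi * wi for vi, wi in zip(v, w)) / (norm_v * norm_w)
    return math.acos(max(-1.0, min(1.0, cosine)))  # clamp against rounding error


print(math.degrees(planar_angle((1, 0, 0), (0, 0, 0), (0, 1, 0))))  # 90.0
```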
- miniworld.utils.kinematics.get_dih(a, b, c, d)[source]¶
calculate dihedral angles for all consecutive quadruples (a[i],b[i],c[i],d[i]) given Cartesian coordinates of four sets of atoms a,b,c,d
- Parameters:
a (pytorch tensors of shape [batch,nres,3]) – store Cartesian coordinates of four sets of atoms
b (pytorch tensors of shape [batch,nres,3]) – store Cartesian coordinates of four sets of atoms
c (pytorch tensors of shape [batch,nres,3]) – store Cartesian coordinates of four sets of atoms
d (pytorch tensors of shape [batch,nres,3]) – store Cartesian coordinates of four sets of atoms
- Returns:
dih – stores resulting dihedrals
- Return type:
pytorch tensor of shape [batch,nres]
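A dihedral angle for a single quadruple (a, b, c, d) can be sketched by projecting the two outer bonds onto the plane perpendicular to the b-c axis and taking a signed angle with atan2. This is a scalar illustration of the standard construction, not the module's tensor code. `dihedral` is a hypothetical helper and the sign convention shown is an assumption.

```python
import math


def _sub(u, v):
    return [ui - vi for ui, vi in zip(u, v)]


def _dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def _cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


def dihedral(a, b, c, d):
    """Signed dihedral angle in radians about the b-c axis."""
    b0, b1, b2 = _sub(a, b), _sub(c, b), _sub(d, c)
    norm = math.sqrt(_dot(b1, b1))
    axis = [x / norm for x in b1]
    # Remove the components along the axis so v and w are perpendicular to it.
    v = _sub(b0, [_dot(b0, axis) * x for x in axis])
    w = _sub(b2, [_dot(b2, axis) * x for x in axis])
    return math.atan2(_dot(_cross(axis, v), w), _dot(v, w))


angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(math.degrees(angle))  # 90.0
```

Using atan2 instead of acos keeps the sign of the angle and avoids loss of precision near 0 and 180 degrees.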
- miniworld.utils.kinematics.get_pair_dist(a, b)[source]¶
calculate pair distances between two sets of points
- Parameters:
a (pytorch tensors of shape [batch,nres,3]) – store Cartesian coordinates of two sets of atoms
b (pytorch tensors of shape [batch,nres,3]) – store Cartesian coordinates of two sets of atoms
- Returns:
dist – stores pairwise distances between atoms in a and b
- Return type:
pytorch tensor of shape [batch,nres,nres]
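The [batch,nres,nres] return shape follows from comparing every point in a with every point in b. For one batch element this can be sketched with nested lists. `pair_distances` is a hypothetical helper and uses plain Python lists in place of tensors.

```python
import math


def pair_distances(a, b):
    """Matrix D with D[i][j] = Euclidean distance between a[i] and b[j]."""
    return [
        [math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q))) for q in b]
        for p in a
    ]


a = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
b = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
D = pair_distances(a, b)
print(D[1][0])  # 5.0
```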
- miniworld.utils.kinematics.xyz_to_bbtor(xyz, params={'ABINS': 36, 'DBINS': 36, 'DMAX': 20.0, 'DMIN': 2.0})[source]¶
- miniworld.utils.kinematics.xyz_to_c6d(xyz, params={'ABINS': 36, 'DBINS': 36, 'DMAX': 20.0, 'DMIN': 2.0})[source]¶
convert cartesian coordinates into 2d distance and orientation maps
- Parameters:
xyz (pytorch tensor of shape [batch,nres,3,3]) – stores Cartesian coordinates of backbone N,Ca,C atoms
- Returns:
c6d – stores stacked dist,omega,theta,phi 2D maps
- Return type:
pytorch tensor of shape [batch,nres,nres,4]
- miniworld.utils.kinematics.xyz_to_chi1(xyz_t)[source]¶
convert template cartesian coordinates into chi1 angles
- Parameters:
xyz_t (pytorch tensor of shape [batch, templ, nres, 14, 3]) – stores Cartesian coordinates of template atoms. For missing atoms, it should be NaN
- Returns:
chi1 – stores cos and sin chi1 angle
- Return type:
pytorch tensor of shape [batch, templ, nres, 2]
- miniworld.utils.kinematics.xyz_to_t2d(xyz_t, mask, params={'ABINS': 36, 'DBINS': 36, 'DMAX': 20.0, 'DMIN': 2.0})[source]¶
convert template cartesian coordinates into 2d distance and orientation maps
- Parameters:
xyz_t (pytorch tensor of shape [batch,templ,nres,natm,3]) – stores Cartesian coordinates of template backbone N,Ca,C atoms
mask (pytorch tensor of shape [batch,templ,nres,nres]) – indicates whether each residue pair is valid or not
- Returns:
t2d – stores stacked dist,omega,theta,phi 2D maps
- Return type:
pytorch tensor of shape [batch,nres,nres,37+6+1]
miniworld.utils.output_visualize module¶
- miniworld.utils.output_visualize.atom_to_pdb_line(atom, atom_xyz, residue_idx, residue_stirng, atom_mask)[source]¶
- miniworld.utils.output_visualize.output_to_pdb(output_dict, save_dir='model_output_visualize/')[source]¶
- miniworld.utils.output_visualize.protein_to_pdb(ID, protein, crop_idx=None, save_dir='model_output_visualize/')[source]¶
- miniworld.utils.output_visualize.tensor_to_pdb(ID, xyz, sequence, atom_mask, crop_idx=None, save_dir='model_output_visualize/')[source]¶
- miniworld.utils.output_visualize.visualize_2D_heatmap(heatmap, file_name='heatmap', heatmap_dir='opt_visualize/')[source]¶
miniworld.utils.parser_util module¶
miniworld.utils.template_parser module¶
miniworld.utils.util module¶
- miniworld.utils.util.generate_initial_xyz(seq, chain_break, random_noise=20.0, use_all_model=False, use_random_model=True)[source]¶
- miniworld.utils.util.generate_random_xyz(seq, random_noise=20.0, use_all_model=False, use_random_model=True)[source]¶
- miniworld.utils.util.generate_symmetric_xyz(seq, chain_break, random_noise=20.0, use_random_model=True, symmetry='C4')[source]¶
- miniworld.utils.util.get_train_valid_data_dir_dictionary(data_dir_dictionary)[source]¶
Describes the structure of the data directory dictionary:
data_dir_dictionary = {
    'PDB_hhr_dir' : '/home/psk6950/practice/MiniWorld/data/hhr_rerefactoring/',
    'pickle_data' : {
        'PDB_msa_dir' : '/home/psk6950/practice/MiniWorld/data/pickle_data/a3m/',
        'PDB_mmcif_dir' : '/home/psk6950/practice/MiniWorld/data/pickle_data/mmcif/'
    },
    'STRING_msa_dir' : '/home/psk6950/practice/MiniWorld/data/STRING/MSA',
    'STRING_template_dir' : '/home/psk6950/practice/MiniWorld/data/STRING/Template',
    'STRING_ID_Seq_path' : '/home/psk6950/practice/MiniWorld/data/STRING_ID_dict.txt',
    'ID_to_source_dir' : '/home/psk6950/practice/MiniWorld/data/new/filter_final_v00/new_ID_to_source_dict.txt',
    'PDB_Monomer_ID_to_info_dir' : '/home/psk6950/practice/MiniWorld/data/valid_PDB_Monomer_ID_list.txt',
    'PDB_Complex_ID_to_info_dir' : '/home/psk6950/practice/MiniWorld/data/valid_PDB_Complex_ID_list.txt',
    'PDB_ID_to_info_dir' : '/home/psk6950/practice/MiniWorld/data/new/filter_final_v00/list_PSK_v04.csv',
    'PDB_ID_full_sequence' : '/home/psk6950/practice/MiniWorld/data/new/filter_final_v00/new_PDB_ID_full_sequence.txt',
    'train_ID_list' : '/home/psk6950/practice/MiniWorld/data/train_ID_list.txt',
    'valid_PDB_ID_list' : '/home/psk6950/practice/MiniWorld/data/valid_PDB_ID_list.txt',
    'valid_STRING_ID_list' : '/home/psk6950/practice/MiniWorld/data/valid_STRING_ID_list.txt',
}
miniworld.utils.util_module module¶
- class miniworld.utils.util_module.ComputeAllAtomCoords[source]¶
Bases: Module
- forward(seq, xyz, alphas, non_ideal=False, use_H=True)[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class miniworld.utils.util_module.Dropout(broadcast_dim=None, p_drop=0.15)[source]¶
Bases: Module
- forward(x)[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.