bis_protein_structure package¶

Submodules¶

bis_protein_structure.CROSSLINK module¶

bis_protein_structure.CROSSLINK.calculate_calpha_distogram(chain_id, chain, residue_dict)[source]¶

Calculates the C-alpha distogram for a protein chain.

Parameters:

chain_id (str) – Chain identifier.
chain (Bio.PDB.Chain) – Chain object from the parsed structure.
residue_dict (dict) – Dictionary mapping chain IDs to residues and their positions.

Returns:

residue_length (int) – Number of residues in the chain.
distogram (numpy.ndarray) – A 2D matrix representing the pairwise C-alpha distances.
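
As an illustration of what a C-alpha distogram contains, the following is a minimal NumPy sketch, not the package's implementation. The helper name calpha_distogram and the toy coordinates are hypothetical; the real function takes a Bio.PDB chain and a residue dictionary rather than a raw coordinate array.

```python
import numpy as np

def calpha_distogram(ca_coords):
    """Return the (N, N) matrix of Euclidean distances between C-alpha atoms."""
    ca_coords = np.asarray(ca_coords, dtype=float)
    # Broadcast to (N, N, 3) coordinate differences, then reduce over x, y, z.
    diff = ca_coords[:, None, :] - ca_coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Three made-up C-alpha positions (in angstroms), chosen for easy arithmetic.
coords = [[0.0, 0.0, 0.0], [3.0, 4.0, 0.0], [3.0, 4.0, 12.0]]
distogram = calpha_distogram(coords)
```

The result is symmetric with a zero diagonal, so only the upper triangle carries independent information.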

bis_protein_structure.CROSSLINK.calculate_lys_leu_map(chain_id, chain, residue_dict)[source]¶

Generates a binary map indicating the presence of Lysine (LYS) or Leucine (LEU) residues.

Parameters:

chain_id (str) – Chain identifier.
chain (Bio.PDB.Chain) – Chain object from the parsed structure.
residue_dict (dict) – Dictionary mapping chain IDs to residues and their positions.

Returns:

residue_length (int) – Number of residues in the chain.
lys_leu_map (numpy.ndarray) – A 2D boolean array where True indicates the presence of LYS or LEU at corresponding residue positions.

bis_protein_structure.CROSSLINK.calculate_sa_map(chain_id, chain, residue_dict, solvent_raw_datas)[source]¶

Calculates the solvent accessibility (SA) map for a protein chain.

Parameters:

chain_id (str) – Chain identifier.
chain (Bio.PDB.Chain) – Chain object from the parsed structure.
residue_dict (dict) – Dictionary mapping chain IDs to residues and their positions.
solvent_raw_datas (list) – List containing raw solvent accessibility data for each residue.

Returns:

residue_length (int) – Number of residues in the chain.
sa_map (numpy.ndarray) – A 2D array representing the solvent accessibility map, with values between 0 and 1.

bis_protein_structure.CROSSLINK.calculate_tryptic_map(chain_id, chain, residue_dict)[source]¶

Generates a tryptic map based on the presence of Lysine (LYS) or Arginine (ARG).

Parameters:

chain_id (str) – Chain identifier.
chain (Bio.PDB.Chain) – Chain object from the parsed structure.
residue_dict (dict) – Dictionary mapping chain IDs to residues and their positions.

Returns:

residue_length (int) – Number of residues in the chain.
tryptic_map (numpy.ndarray) – A 2D boolean array where True indicates tryptic cleavage points between residues.

bis_protein_structure.CROSSLINK.load_list_from_file(file_path)[source]¶

Loads a list from a pickle file.

Parameters: file_path (str) – Path to the pickle file.
Returns: data_list – List of data loaded from the pickle file.
Return type: list
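
A loader of this kind can be sketched with the standard library alone. The helper name load_list and the sample values below are hypothetical and only illustrate the round trip; pickle files should be loaded only from trusted sources, because unpickling can execute arbitrary code.

```python
import os
import pickle
import tempfile

def load_list(file_path):
    """Load and return whatever object was pickled at file_path."""
    with open(file_path, "rb") as handle:
        return pickle.load(handle)

# Round trip through a temporary file so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.pkl")
    with open(path, "wb") as handle:
        pickle.dump([0.12, 0.87, 0.45], handle)
    loaded = load_list(path)
```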

bis_protein_structure.CROSSLINK.plot_cross_link(residue_length, distogram, lys_leu_map, sa_map, tryptic_map)[source]¶

Plots cross-linking analysis based on distance, LYS_LEU map, solvent accessibility, and tryptic map.

Parameters:

residue_length (int) – Number of residues in the chain.
distogram (numpy.ndarray) – Pairwise C-alpha distance matrix.
lys_leu_map (numpy.ndarray) – LYS and LEU residue map.
sa_map (numpy.ndarray) – Solvent accessibility map.
tryptic_map (numpy.ndarray) – Tryptic cleavage map.

bis_protein_structure.CROSSLINK.readMMCIF(mmcif_path)[source]¶

Reads an MMCIF file and extracts chain and residue information.

Parameters:

mmcif_path (str) – Path to the MMCIF file.

Returns:

model (Bio.PDB.Model) – The first model from the parsed MMCIF file.
chains (list of str) – List of chain identifiers.
residue_dict (dict) – Dictionary mapping chain IDs to residues and their positions.

bis_protein_structure.CROSSLINK.readPDB(pdb_dir)[source]¶

Reads a PDB file and extracts chain and residue information.

Parameters:

pdb_dir (str) – Path to the PDB file.

Returns:

model (Bio.PDB.Model) – The first model from the parsed PDB file.
chains (list of str) – List of chain identifiers.
residue_dict (dict) – Dictionary mapping chain IDs to residues and their positions.

bis_protein_structure.DATAGEN module¶

class bis_protein_structure.DATAGEN.AllResiduesSelector(target_chain_id)[source]¶

Bases: Select

Selector class for PDBIO to select all residues in a specific chain.

Parameters: target_chain_id (str) – The ID of the target chain to select residues from.

accept_residue(residue)[source]¶: Returns True if the residue belongs to the target chain.

bis_protein_structure.DATAGEN.create_structure_from_feature(sequence, all_atom_positions, all_atom_mask, structure_id='pred', model_id=0, chain_id='A')[source]¶

Creates a structure from sequence and atomic position information.

Parameters:

sequence (str) – Amino acid sequence of the protein.
all_atom_positions (numpy.ndarray) – Array of atomic positions for the protein.
all_atom_mask (numpy.ndarray) – Mask indicating valid atoms in the structure.
structure_id (str, optional) – Identifier for the structure (default is ‘pred’).
model_id (int, optional) – Model ID for the structure (default is 0).
chain_id (str, optional) – Chain ID for the structure (default is ‘A’).

Returns:

structure – Generated structure object containing atomic coordinates.

Return type:

Bio.PDB.Structure.Structure

bis_protein_structure.DATAGEN.generate_feature_dict(tag, seq, fasta_path, alignment_dir)[source]¶

Generates a feature dictionary from the given sequence and alignment.

Parameters:

tag (str) – Sequence identifier tag.
seq (str) – The amino acid sequence.
fasta_path (str) – Path to the FASTA file.
alignment_dir (str) – Directory where alignments are stored.

Returns:

feature_dict – Dictionary of sequence features.

Return type:

dict

bis_protein_structure.DATAGEN.mmcif_to_pdbs(input_mmcif_file, output_pdb_root)[source]¶

Converts an MMCIF file to separate PDB files for each chain.

Parameters:

input_mmcif_file (str) – Path to the input MMCIF file.
output_pdb_root (str) – Directory where the output PDB files will be stored.

Return type:

None

bis_protein_structure.DATAGEN.parallel_processing(mmcifs, mmcif_root, output_root)[source]¶

Processes multiple MMCIF files in parallel and converts them to PDB files.

Parameters:

mmcifs (list of str) – List of MMCIF file names to process.
mmcif_root (str) – Directory where the MMCIF files are stored.
output_root (str) – Directory where the PDB files will be saved.

Return type:

None

bis_protein_structure.DATAGEN.parse_fasta(data)[source]¶

Parses the contents of a FASTA file and extracts sequence tags and sequences.

Parameters:

data (str) – String content of the FASTA file.

Returns:

tags (list of str) – List of sequence tags.
seqs (list of str) – List of sequences corresponding to the tags.
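
The tag and sequence extraction can be sketched in plain Python as below. This is an assumption about the behaviour, not the package's code: the helper name parse_fasta_text is hypothetical, the tag is taken to be the first whitespace-delimited token after ">", and wrapped sequence lines are concatenated.

```python
def parse_fasta_text(data):
    """Split FASTA text into parallel lists of tags and sequences."""
    tags, seqs = [], []
    for line in data.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].split()
            tags.append(header[0] if header else "")
            seqs.append("")
        elif tags:
            seqs[-1] += line  # continuation of the current record
    return tags, seqs

example = ">1abc_A mol:protein\nMKV\nLLA\n>1abc_B\nGG\n"
tags, seqs = parse_fasta_text(example)
```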

bis_protein_structure.DATAGEN.parse_mmcif(path, file_id, chain_id, alignment_dir)[source]¶

Parses an MMCIF file and processes it into structured data.

Parameters:

path (str) – Path to the MMCIF file.
file_id (str) – Unique identifier for the MMCIF file.
chain_id (str) – Chain ID to process from the MMCIF file.
alignment_dir (str) – Directory where the alignments are stored.

Returns:

data – Processed data from the MMCIF file.

Return type:

dict

Raises:

Exception – If an error occurs during parsing or if the MMCIF object is None.

bis_protein_structure.DATAGEN.process_mmcif_to_pdbs(mmcif, mmcif_root, output_root)[source]¶

Processes an MMCIF file and converts it to PDB format.

Parameters:

mmcif (str) – Name of the MMCIF file.
mmcif_root (str) – Directory where the MMCIF files are stored.
output_root (str) – Directory where the PDB files will be saved.

Return type:

None

bis_protein_structure.DATAGEN.read_fasta(file_path)[source]¶

Reads a FASTA file and prints out each sequence ID and sequence.

Parameters: file_path (str) – Path to the FASTA file.
Return type: None

bis_protein_structure.DISULFIDE module¶

bis_protein_structure.DISULFIDE.get_SS_ver5(chain_id, chain, res_length, residue_idx_edit, criteria)[source]¶

Extracts the spatial coordinates of the alpha carbon (CA) or sulfur gamma (SG) atoms of residues in a protein chain and computes pairwise distances.

Parameters:

chain_id (str) – Chain ID to process.
chain (Bio.PDB.Chain) – The chain object containing residues to process.
res_length (int) – The total number of residues in the chain.
residue_idx_edit (dict) – A dictionary mapping residue numbers to indices for the chain.
criteria (str) – The atom type to use for distance calculations. Valid options are ‘CA’ (alpha carbon) and ‘SG’ (sulfur gamma).

Returns:

filter_disto (numpy.ndarray) – A filtered distogram (distance matrix) that contains distances between Cysteine residues that meet the distance criteria. The shape is (res_length, res_length, 1).
pair_list (numpy.ndarray) – List of residue index pairs where the distance between Cysteine residues satisfies the specified distance criteria.

Notes

The function computes the pairwise Euclidean distance between alpha carbons (CA) or sulfur gamma atoms (SG) in the given chain.
For the ‘CA’ criteria, the distance threshold is set between 3.0 and 7.5 Å, while for ‘SG’ criteria, it is set between 2.0 and 3.0 Å.
Only Cysteine residues are considered for the distance calculation in the case of ‘SG’ criteria.
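
The thresholding described in these notes can be sketched as follows. This is an illustration under stated assumptions, not the function's actual code: the helper name filter_cys_pairs, the boolean is_cys mask, and the inclusive treatment of both bounds are hypothetical choices, and the sketch restricts both criteria to Cysteine pairs.

```python
import numpy as np

# Distance windows in angstroms, as quoted in the notes above.
THRESHOLDS = {"CA": (3.0, 7.5), "SG": (2.0, 3.0)}

def filter_cys_pairs(dist, is_cys, criteria):
    """Return index pairs (i < j) of Cysteines whose distance is in range."""
    low, high = THRESHOLDS[criteria]
    dist = np.asarray(dist, dtype=float)
    is_cys = np.asarray(is_cys, dtype=bool)
    cys_pair = is_cys[:, None] & is_cys[None, :]
    in_range = (dist >= low) & (dist <= high)
    # Keep the strict upper triangle so each pair is reported once.
    upper = np.triu(np.ones_like(cys_pair), k=1)
    return np.argwhere(cys_pair & in_range & upper)

# Toy 3-residue chain: residues 0 and 2 are Cysteines, 5.5 angstroms apart.
dist = [[0.0, 3.8, 5.5], [3.8, 0.0, 3.8], [5.5, 3.8, 0.0]]
is_cys = [True, False, True]
pairs_ca = filter_cys_pairs(dist, is_cys, "CA")
pairs_sg = filter_cys_pairs(dist, is_cys, "SG")
```

With these toy distances the pair (0, 2) passes the CA window but not the tighter SG window.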

bis_protein_structure.DISULFIDE.readMMCIF_label(mmcif_path)[source]¶

Reads an MMCIF file and extracts chain and residue information.

Parameters:

mmcif_path (str) – Path to the MMCIF file.

Returns:

model (Bio.PDB.Model) – The first model from the parsed structure.
chains (list of str) – List of chain IDs present in the structure.
residue_dict (dict) – Dictionary mapping each chain to a dictionary of residue numbers and residue names. Format: {chain_id: {res_num: res_name}}.
residue_idx_edit (dict) – Dictionary that maps chain IDs to dictionaries which map residue numbers to indices based on their sequence order. Format: {chain_id: {res_num: index}}.

Notes

The function parses the MMCIF file, extracts chain and residue information, and then aligns the residue numbering with the label sequence identifiers provided in the MMCIF file. Only standard amino acids are included, and the sequence numbering is adjusted to fit a zero-based index for further processing.

bis_protein_structure.MMEVAL module¶

bis_protein_structure.MMEVAL.convert_string(s)[source]¶

Converts a string of alphabets into a corresponding string of numbers based on alphabet position.

Parameters: s (str) – Input string consisting of alphabets.
Returns: result – String where each letter is replaced by its corresponding zero-based position (A=0, B=1, …, Z=25).
Return type: str
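
A zero-based letter-to-position conversion might look like the sketch below. The helper name letters_to_positions is hypothetical, and two details are assumptions: input is upper-cased first, and the positions are concatenated without a separator.

```python
def letters_to_positions(s):
    """Replace each letter with its zero-based alphabet index, as a string."""
    return "".join(str(ord(ch) - ord("A")) for ch in s.upper())

chain_code = letters_to_positions("ABC")
last_letter = letters_to_positions("Z")
```

With zero-based indexing, Z comes out as 25. Concatenating without a separator is ambiguous for multi-letter input (for example, "BA" and "K" both yield "10"), so a caller that needs to invert the mapping would want a delimiter.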

bis_protein_structure.MMEVAL.eval_interface(native_pdb_path, pred_pdb_path, show=False, interface='all', print=False)[source]¶

Evaluates the similarity between the native and predicted interfaces based on contact maps.

Parameters:

native_pdb_path (str) – Path to the native PDB file.
pred_pdb_path (str) – Path to the predicted PDB file.
show (bool, optional) – Whether to display the contact maps (default is False).
interface (str, optional) – Specifies the chains for which the contact map should be evaluated (default is ‘all’).
print (bool, optional) – Whether to print the evaluation results (default is False).

Returns:

ICS (float) – The Interface Similarity Score (ICS).
IPS (float) – The Interface Patch Score (IPS).

bis_protein_structure.MMEVAL.get_ICS(native_contact_map, pred_contact_map)[source]¶

Computes the Interface Similarity Score (ICS) between two contact maps.

Parameters:

native_contact_map (numpy.ndarray) – Native contact map.
pred_contact_map (numpy.ndarray) – Predicted contact map.

Returns:

interface_similarity_score – The interface similarity score, based on the F1 score.

Return type:

float
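
An F1-based similarity between two contact maps can be sketched as follows. This is a generic F1 computation under assumptions, not necessarily how get_ICS is implemented: the helper name interface_f1 is hypothetical, any value greater than zero is treated as a contact, and the score is defined as 0.0 when there are no shared contacts.

```python
import numpy as np

def interface_f1(native_contact_map, pred_contact_map):
    """F1 score of predicted contacts against native contacts."""
    native = np.asarray(native_contact_map) > 0
    pred = np.asarray(pred_contact_map) > 0
    true_pos = np.sum(native & pred)
    if true_pos == 0:
        return 0.0
    precision = true_pos / pred.sum()
    recall = true_pos / native.sum()
    return float(2 * precision * recall / (precision + recall))

native = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
pred = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
score = interface_f1(native, pred)
```

In this toy case two of the four predicted contacts are native and two of the four native contacts are recovered, so precision and recall are both 0.5.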

bis_protein_structure.MMEVAL.get_IPS(native_contact_map, pred_contact_map)[source]¶

Computes the Interface Patch Score (IPS) between two contact maps.

Parameters:

native_contact_map (numpy.ndarray) – Native contact map.
pred_contact_map (numpy.ndarray) – Predicted contact map.

Returns:

interface_patch_similarity – The interface patch similarity score, calculated as the ratio of the intersection to the union of the patches.

Return type:

float
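
The intersection-over-union ratio can be sketched as below. The definition of a "patch" here is an assumption: a residue is counted as part of the interface patch if its row in the contact map has any contact. The helper name interface_patch_jaccard is hypothetical.

```python
import numpy as np

def interface_patch_jaccard(native_contact_map, pred_contact_map):
    """Jaccard index of the residue sets that take part in any contact."""
    native_patch = (np.asarray(native_contact_map) > 0).any(axis=1)
    pred_patch = (np.asarray(pred_contact_map) > 0).any(axis=1)
    union = np.sum(native_patch | pred_patch)
    if union == 0:
        return 0.0  # neither map has an interface
    return float(np.sum(native_patch & pred_patch) / union)

# Native contact between residues 0 and 1; predicted contact between 1 and 2.
native = np.zeros((4, 4), dtype=int)
native[0, 1] = native[1, 0] = 1
pred = np.zeros((4, 4), dtype=int)
pred[1, 2] = pred[2, 1] = 1
ips = interface_patch_jaccard(native, pred)
```

Here the patches are {0, 1} and {1, 2}, which share one residue out of three in the union.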

bis_protein_structure.MMEVAL.get_contact(pdb_path, residue_dict=None, coord_masks=None, interface='all', show=True)[source]¶

Calculates the contact map between residues in a PDB file.

Parameters:

pdb_path (str) – Path to the PDB file.
residue_dict (dict, optional) – Predefined residue dictionary (default is None).
coord_masks (numpy.ndarray, optional) – Mask of residue coordinates (default is None).
interface (str, optional) – Specifies the chains for which the contact map should be calculated. If ‘all’, contacts across all chains are calculated (default is ‘all’).
show (bool, optional) – Whether to display the contact map (default is True).

Returns:

contact_map (numpy.ndarray) – The contact map where values represent the number of contacting atoms.
residue_dict (dict) – Residue dictionary mapping chain IDs to residue names.
coord_masks (numpy.ndarray) – Residue coordinate masks.

bis_protein_structure.MMEVAL.readPDB(pdb_dir)[source]¶

Reads a PDB file and returns the structure, chains, and residue dictionary.

Parameters:

pdb_dir (str) – Path to the PDB file.

Returns:

model (Bio.PDB.Model.Model) – The PDB model object.
chains (list of str) – List of chain identifiers.
residue_dict (dict) – Dictionary where keys are chain IDs and values are dictionaries mapping residue numbers to residue names.

bis_protein_structure.MMEVAL.restype_refer_atoms(restype)[source]¶

Returns the list of atom indices for a given residue type.

Parameters: restype (str) – The three-letter code for the residue (e.g., ‘ALA’, ‘ARG’).
Returns: atoms – List of atom indices corresponding to the residue type.
Return type: list of int

bis_protein_structure.SOLVENTACC module¶

bis_protein_structure.SOLVENTACC.contains_non_numeric(input_string)[source]¶

Check if the input string contains non-numeric characters.

Parameters: input_string (str) – The string to be checked.
Returns: True if the string contains non-numeric characters, False otherwise.
Return type: bool
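
A check of this kind can be written in one line. The sketch below is an assumption about the behaviour (the helper name has_non_digit is hypothetical): it treats any character that fails str.isdigit() as non-numeric, so signs and decimal points count as non-numeric, and an empty string returns False.

```python
def has_non_digit(input_string):
    """Return True if any character in the string is not a digit."""
    return any(not ch.isdigit() for ch in input_string)

plain_number = has_non_digit("128")
with_letter = has_non_digit("128A")
```

A check like this is useful, for example, for spotting residue numbers that carry an insertion code.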

bis_protein_structure.SOLVENTACC.convert_mmcif_to_dssp(mmcif_path, dssp_root)[source]¶

Convert a MMCIF file to DSSP format.

Parameters:

mmcif_path (str) – Path to the MMCIF file.
dssp_root (str) – Directory where the DSSP file will be saved.

bis_protein_structure.SOLVENTACC.convert_mmcif_to_dssp_parallel(mmcif_paths, dssp_root, max_workers=4)[source]¶

Convert multiple MMCIF files to DSSP format in parallel.

Parameters:

mmcif_paths (list of str) – List of paths to the MMCIF files.
dssp_root (str) – Directory where the DSSP files will be saved.
max_workers (int, optional) – Maximum number of workers for parallel processing (default is 4).

bis_protein_structure.SOLVENTACC.convert_pdb_to_dssp(pdb_path, dssp_root)[source]¶

Convert a PDB file to DSSP format.

Parameters:

pdb_path (str) – Path to the PDB file.
dssp_root (str) – Directory where the DSSP file will be saved.

bis_protein_structure.SOLVENTACC.create_structure_from_feature(sequence, all_atom_positions, all_atom_mask, structure_id='pred', model_id=0, chain_id='A')[source]¶

Create a Biopython Structure object from sequence and atomic features.

Parameters:

sequence (str) – The amino acid sequence.
all_atom_positions (np.ndarray) – Array of shape (n_residues, n_atoms, 3) containing atom coordinates.
all_atom_mask (np.ndarray) – Array of shape (n_residues, n_atoms) indicating which atoms are present.
structure_id (str, optional) – The ID of the structure (default is “pred”).
model_id (int, optional) – The model ID (default is 0).
chain_id (str, optional) – The chain ID (default is “A”).

Returns:

A Biopython Structure object representing the protein.

Return type:

Structure

bis_protein_structure.SOLVENTACC.extract_coords_from_pdb(pdb_path)[source]¶

Extract atomic coordinates from a PDB file.

Parameters: pdb_path (str) – Path to the PDB file.
Returns: A tuple containing the sequence and an array of atomic coordinates.
Return type: tuple

bis_protein_structure.SOLVENTACC.getTMscore(pdb_path1, pdb_path2)[source]¶

Calculate the TM-score between two PDB files.

Parameters:

pdb_path1 (str) – Path to the first PDB file.
pdb_path2 (str) – Path to the second PDB file.

Returns:

The TM-score between the two structures.

Return type:

float

bis_protein_structure.SOLVENTACC.load_pickle_file(file_path)[source]¶

Load data from a pickle file.

Parameters: file_path (str) – The path to the pickle file.
Returns: The data loaded from the pickle file.
Return type: data

bis_protein_structure.TORSION2 module¶

bis_protein_structure.TORSION2.angles_to_sincos(tor_angles)[source]¶

Convert angles to sine and cosine values.

Parameters: tor_angles (torch.Tensor) – A tensor of shape (n, m) containing angles in degrees.
Returns: A tensor of shape (n, m, 2) where the last dimension contains the sine and cosine of the angles.
Return type: torch.Tensor
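
The shape transformation can be illustrated with NumPy standing in for PyTorch, which keeps the sketch dependency-light. The helper name angles_to_sincos_np is hypothetical, and the ordering (sine first, then cosine) is assumed from the wording of the return description.

```python
import numpy as np

def angles_to_sincos_np(tor_angles_deg):
    """Map an (n, m) array of angles in degrees to (n, m, 2) as [sin, cos]."""
    radians = np.deg2rad(np.asarray(tor_angles_deg, dtype=float))
    return np.stack([np.sin(radians), np.cos(radians)], axis=-1)

angles = [[0.0, 90.0], [180.0, -90.0]]
sincos = angles_to_sincos_np(angles)
```

Encoding angles as (sin, cos) pairs avoids the discontinuity at ±180 degrees, which is why the representation is common in torsion losses.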

bis_protein_structure.TORSION2.getDistogram(residues, atom_pos, atom_mask)[source]¶

Calculates the pairwise distance matrix for atom positions.

Parameters:

residues (dict) – A dictionary mapping residue IDs to residue names.
atom_pos (np.ndarray) – An array containing the coordinates of the atoms.
atom_mask (np.ndarray) – An array indicating which atoms have valid coordinates.

Returns:

pairwise_dist – A square matrix of shape (n_atoms, n_atoms) containing pairwise distances.

Return type:

np.ndarray

bis_protein_structure.TORSION2.getTorsion_acc(target_residues, tor_masks, native_angles, target_angles, target_alter_angles, thres=10, chi_dependent=True)[source]¶

Calculates the accuracy of torsion angles by comparing native angles with target angles and optional alternative angles.

Parameters:

target_residues (dict) – A dictionary containing the target residues indexed by their residue numbers.
tor_masks (np.ndarray) – A boolean array indicating the availability of torsion angles for each residue.
native_angles (np.ndarray) – An array of native torsion angles.
target_angles (np.ndarray) – An array of target torsion angles for comparison.
target_alter_angles (np.ndarray) – An array of alternative target torsion angles for comparison.
thres (int, optional) – The threshold for considering an angle correct. Default is 10 degrees.
chi_dependent (bool, optional) – If True, ensures that only the first valid chi angle is counted as correct. Default is True.

Returns:

A dictionary containing the total, correct counts, and accuracy for each angle type (backbone and sidechain).

Return type:

dict
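
The core of a thresholded torsion accuracy is a periodic angle difference. The sketch below shows that piece only and is not the package's implementation: the helper names are hypothetical, the comparison is assumed to be inclusive (difference <= thres), and the alternative angles and chi-dependency logic are omitted.

```python
import numpy as np

def angular_difference(a_deg, b_deg):
    """Smallest absolute difference between angles in degrees, in [0, 180]."""
    diff = np.abs(np.asarray(a_deg, dtype=float) - np.asarray(b_deg, dtype=float)) % 360.0
    return np.minimum(diff, 360.0 - diff)

def torsion_accuracy(native_deg, target_deg, mask, thres=10):
    """Count masked angles whose periodic difference is within thres degrees."""
    mask = np.asarray(mask, dtype=bool)
    diff = angular_difference(native_deg, target_deg)
    correct = int(np.sum((diff <= thres) & mask))
    total = int(mask.sum())
    accuracy = correct / total if total else 0.0
    return {"total": total, "correct": correct, "accuracy": accuracy}

# 179 vs -178 degrees differ by only 3 degrees once wrap-around is handled.
result = torsion_accuracy([179.0, -60.0, 65.0], [-178.0, -75.0, 60.0], [True, True, True])
```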

bis_protein_structure.TORSION2.get_bondangle(p)[source]¶

Calculate the bond angle formed by three points.

Parameters: p (np.ndarray) – An array of shape (3, 3) representing the coordinates of the three points.
Returns: The bond angle in degrees formed by the three points.
Return type: float
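
A bond angle from three points can be computed from the dot product of the two bond vectors. The sketch below assumes the middle point is the vertex of the angle; the helper name bond_angle_deg is hypothetical.

```python
import numpy as np

def bond_angle_deg(p):
    """Angle in degrees at p[1] between the vectors to p[0] and p[2]."""
    p = np.asarray(p, dtype=float)
    v1 = p[0] - p[1]
    v2 = p[2] - p[1]
    cos_theta = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    # Clip to guard against rounding slightly outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))

right_angle = bond_angle_deg([[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
```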

bis_protein_structure.TORSION2.get_coordinates(final_residue, residues, chain)[source]¶

Retrieves the coordinates of atoms from the residues in a chain.

Parameters:

final_residue (int) – The total number of residues to be considered.
residues (dict) – A dictionary mapping residue IDs to residue names.
chain (Chain) – The chain object containing the residues.

Returns:

coord (np.ndarray) – An array of shape (final_residue, 37, 3) containing the coordinates of the atoms.
coord_mask (np.ndarray) – An array of shape (final_residue, 37, 1) indicating which atoms have valid coordinates.
unexpected_atoms (dict) – A dictionary mapping residue IDs to unexpected atom IDs found during processing.

bis_protein_structure.TORSION2.get_refer_atoms(restype, angletype)[source]¶

Get reference atom indices based on residue type and angle type.

Parameters:

restype (str) – The residue type (e.g., ‘ARG’, ‘GLY’).
angletype (int) – The angle type index (0-6).

Returns:

A list of indices representing reference atoms for the given residue and angle types.

Return type:

list

bis_protein_structure.TORSION2.get_torsion(atom_mask, atom_pos, residues, as_tensor=False)[source]¶

Computes the torsion angles for a set of residues.

Parameters:

atom_mask (np.ndarray) – An array indicating which atoms have valid coordinates.
atom_pos (np.ndarray) – An array containing the coordinates of the atoms.
residues (dict) – A dictionary mapping residue IDs to residue names.
as_tensor (bool, optional) – If True, returns the angles and masks as PyTorch tensors (default is False).

Returns:

tor_masks (np.ndarray) – A boolean array of shape (n_residues, 7) indicating valid torsion angles.
tor_angles (np.ndarray) – An array of shape (n_residues, 7) containing the torsion angle values.

bis_protein_structure.TORSION2.new_dihedral(p)[source]¶

Calculate the dihedral angle between four points.

Parameters: p (np.ndarray) – An array of shape (4, 3) representing the coordinates of the four points.
Returns: The dihedral angle in degrees between the planes formed by the points.
Return type: float
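
A common way to compute a signed dihedral is to project the two outer bond vectors onto the plane perpendicular to the central bond and take the angle between the projections. The sketch below uses that approach; whether new_dihedral uses the same formula or sign convention is an assumption, and the helper name dihedral_deg is hypothetical.

```python
import numpy as np

def dihedral_deg(p):
    """Signed dihedral angle in degrees defined by four points."""
    p = np.asarray(p, dtype=float)
    b0 = p[0] - p[1]
    b1 = p[2] - p[1]
    b2 = p[3] - p[2]
    b1 = b1 / np.linalg.norm(b1)  # unit vector along the central bond
    # Components of the outer bonds perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))

# Central bond along z; the fourth point is rotated a quarter turn about it.
cis = dihedral_deg([[1, 0, 0], [0, 0, 0], [0, 0, 1], [1, 0, 1]])
quarter = dihedral_deg([[1, 0, 0], [0, 0, 0], [0, 0, 1], [0, 1, 1]])
```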

bis_protein_structure.TORSION2.readPDB(pdb_path)[source]¶

Reads a PDB file and extracts residue information.

Parameters:

pdb_path (str) – The file path to the PDB file to be parsed.

Returns:

residues (dict) – A dictionary mapping residue IDs to residue names.
chain (Chain) – The chain object from the parsed structure.

bis_protein_structure.TORSION2.restype_refer_atoms(restype)[source]¶

Get reference atom indices for a given residue type.

Parameters: restype (str) – The residue type (e.g., ‘ALA’, ‘ARG’).
Returns: A list of indices representing the atoms associated with the residue type.
Return type: list

bis_protein_structure.TORSION2.sidechain_sym_angle(target_residues, tor_masks, native_angles, target_angles, target_alter_angles)[source]¶

Adjusts the target angles of sidechain torsions based on their differences with native angles and alternative angles.

Parameters:

target_residues (dict) – A dictionary containing the target residues indexed by their residue numbers.
tor_masks (np.ndarray) – A boolean array indicating the availability of torsion angles for each residue.
native_angles (np.ndarray) – An array of native torsion angles.
target_angles (np.ndarray) – An array of target torsion angles to be modified.
target_alter_angles (np.ndarray) – An array of alternative target torsion angles for comparison.

Returns:

The modified target angles after adjusting based on comparisons with native and alternative angles.

Return type:

np.ndarray

bis_protein_structure.TORSION2.torsion_angle_loss(a, a_gt, tor_masks)[source]¶

Computes the loss for torsion angles based on the difference between predicted angles and ground truth angles, including penalties for angle normalization.

Parameters:

a (torch.Tensor) – The predicted torsion angles, shape [*, N, 7, 2].
a_gt (torch.Tensor) – The ground truth torsion angles, shape [*, N, 7, 2].
tor_masks (torch.Tensor) – A boolean tensor indicating the validity of torsion angles.

Returns:

A dictionary containing the total loss, backbone loss, and sidechain loss.

Return type:

dict

bis_protein_structure.residue_constants module¶

Constants used in AlphaFold.

class bis_protein_structure.residue_constants.Bond(atom1_name, atom2_name, length, stddev)¶

Bases: tuple

atom1_name¶: Alias for field number 0

atom2_name¶: Alias for field number 1

length¶: Alias for field number 2

stddev¶: Alias for field number 3

class bis_protein_structure.residue_constants.BondAngle(atom1_name, atom2_name, atom3name, angle_rad, stddev)¶

Bases: tuple

angle_rad¶: Alias for field number 3

atom1_name¶: Alias for field number 0

atom2_name¶: Alias for field number 1

atom3name¶: Alias for field number 2

stddev¶: Alias for field number 4

bis_protein_structure.residue_constants.chi_angle_atom(atom_index: int) → ndarray[source]¶: Define chi-angle rigid groups via one-hot representations.

bis_protein_structure.residue_constants.load_stereo_chemical_props() → Tuple[Mapping[str, List[Bond]], Mapping[str, List[Bond]], Mapping[str, List[BondAngle]]][source]¶

Load stereo_chemical_props.txt into a nice structure.

Load literature values for bond lengths and bond angles and translate bond angles into the length of the opposite edge of the triangle (“residue_virtual_bonds”).

Returns:

residue_bonds – Dict that maps resname –> list of Bond tuples.
residue_virtual_bonds – Dict that maps resname –> list of Bond tuples.
residue_bond_angles – Dict that maps resname –> list of BondAngle tuples.
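
Translating a bond angle into the length of the opposite edge of the triangle is an application of the law of cosines. The sketch below shows only that step; the helper name opposite_edge_length is hypothetical, and the 3-4-5 example values are chosen for easy checking rather than taken from stereo_chemical_props.txt.

```python
import math

def opposite_edge_length(bond1_length, bond2_length, angle_rad):
    """Distance between the two outer atoms of a bond angle (law of cosines)."""
    return math.sqrt(
        bond1_length ** 2
        + bond2_length ** 2
        - 2.0 * bond1_length * bond2_length * math.cos(angle_rad)
    )

# Two bonds of length 3 and 4 meeting at a right angle span a distance of 5.
virtual_bond = opposite_edge_length(3.0, 4.0, math.pi / 2)
```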

bis_protein_structure.residue_constants.make_atom14_dists_bounds(overlap_tolerance=1.5, bond_length_tolerance_factor=15)[source]¶: Compute upper and lower bounds for bonds to assess violations.

bis_protein_structure.residue_constants.sequence_to_onehot(sequence: str, mapping: Mapping[str, int], map_unknown_to_x: bool = False) → ndarray[source]¶

Maps the given sequence into a one-hot encoded matrix.

Parameters:

sequence – An amino acid sequence.
mapping – A dictionary mapping amino acids to integers.
map_unknown_to_x – If True, any amino acid that is not in the mapping will be mapped to the unknown amino acid ‘X’. If the mapping doesn’t contain amino acid ‘X’, an error will be thrown. If False, any amino acid not in the mapping will throw an error.

Returns:

A numpy array of shape (seq_len, num_unique_aas) with one-hot encoding of the sequence.

Raises:

ValueError – If the mapping doesn’t contain values from 0 to num_unique_aas - 1 without any gaps.
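
The documented behaviour can be sketched as follows. This is a simplified illustration, not the module's code: the name sequence_to_onehot_sketch and the three-letter toy mapping are hypothetical, and a missing ‘X’ entry surfaces here as a KeyError.

```python
import numpy as np

def sequence_to_onehot_sketch(sequence, mapping, map_unknown_to_x=False):
    """One-hot encode a sequence into shape (seq_len, num_unique_aas)."""
    num_entries = max(mapping.values()) + 1
    # The mapping must cover 0 .. num_entries - 1 with no gaps.
    if sorted(set(mapping.values())) != list(range(num_entries)):
        raise ValueError("mapping must cover 0..N-1 without gaps")
    one_hot = np.zeros((len(sequence), num_entries), dtype=np.int32)
    for index, aa in enumerate(sequence):
        if map_unknown_to_x:
            aa_id = mapping.get(aa, mapping["X"])
        else:
            aa_id = mapping[aa]  # unknown residues raise KeyError here
        one_hot[index, aa_id] = 1
    return one_hot

toy_mapping = {"A": 0, "C": 1, "X": 2}
encoded = sequence_to_onehot_sketch("ACB", toy_mapping, map_unknown_to_x=True)
```

In this toy run the unknown residue "B" falls back to the ‘X’ column.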

bis_protein_structure.split_pdb module¶

bis_protein_structure.split_pdb.extract(structure, chain_id, start, end, filename)[source]¶

Write out selected portion of a structure to a file.

Parameters:

structure (Bio.PDB.Structure) – The structure object containing the protein data.
chain_id (str) – The identifier for the chain to extract.
start (int) – The starting residue index for extraction.
end (int) – The ending residue index for extraction.
filename (str) – The path to the output file where the selected portion will be saved.

Return type:

None

bis_protein_structure.split_pdb.split_the_pdb(cif_path, pdb, chain)[source]¶

Split a PDB file into separate CIF files for a specified chain.

Parameters:

cif_path (str) – The path to the input CIF file.
pdb (str) – The name of the PDB structure.
chain (str) – The chain identifier to extract from the PDB structure.

Return type:

None

Module contents¶