deepfold.data.tools package¶
Submodules¶
deepfold.data.tools.hhblits module¶
Library to run HHblits from Python.
- class deepfold.data.tools.hhblits.HHBlits(*, binary_path: str, databases: Sequence[str], n_cpu: int = 4, n_iter: int = 3, e_value: float = 0.001, maxseq: int = 1000000, realign_max: int = 100000, maxfilt: int = 100000, min_prefilter_hits: int = 1000, all_seqs: bool = False, alt: int | None = None, p: int = 20, z: int = 500)¶
Bases:
object
Python wrapper of the HHblits binary.
- query(input_fasta_path: str) → List[Mapping[str, Any]]¶
Queries the database using HHblits.
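The constructor parameters above mirror HHblits command-line options. A minimal sketch of how such a wrapper might turn them into an argument list is shown below. The helper name `build_hhblits_command` and the `output_a3m_path` argument are hypothetical and not part of deepfold; the flags are standard HHblits options, but the exact set that deepfold passes is an assumption.

```python
from typing import List, Optional, Sequence


def build_hhblits_command(
    binary_path: str,
    databases: Sequence[str],
    input_fasta_path: str,
    output_a3m_path: str,  # hypothetical: where the A3M result would be written
    n_cpu: int = 4,
    n_iter: int = 3,
    e_value: float = 0.001,
    maxseq: int = 1_000_000,
    all_seqs: bool = False,
    alt: Optional[int] = None,
) -> List[str]:
    """Hypothetical helper: assemble an HHblits argument list (not deepfold code)."""
    cmd = [
        binary_path,
        "-i", input_fasta_path,     # query sequence
        "-cpu", str(n_cpu),         # number of CPU threads
        "-oa3m", output_a3m_path,   # write the resulting MSA in A3M format
        "-n", str(n_iter),          # number of search iterations
        "-e", str(e_value),         # E-value cutoff for inclusion
        "-maxseq", str(maxseq),     # maximum number of input rows
    ]
    if all_seqs:
        cmd.append("-all")          # keep all sequences in the output MSA
    if alt is not None:
        cmd += ["-alt", str(alt)]   # number of alternative alignments to show
    for database in databases:
        cmd += ["-d", database]     # one -d flag per database
    return cmd


cmd = build_hhblits_command(
    "hhblits", ["db_a", "db_b"], "query.fasta", "out.a3m"
)
print(" ".join(cmd))
```

Such a list could be handed to `subprocess.run`; keeping the arguments as a list rather than a shell string avoids quoting problems with database paths.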
deepfold.data.tools.hhsearch module¶
Library to run HHsearch from Python.
- class deepfold.data.tools.hhsearch.HHSearch(*, binary_path: str, databases: Sequence[str], n_cpu: int = 2, maxseq: int = 1000000)¶
Bases:
object
Python wrapper of the HHsearch binary.
- static get_template_hits(output_string: str, input_sequence: str) → Sequence[TemplateHit]¶
Gets parsed template hits from the raw string output by the tool.
- property input_format: str¶
- property output_format: str¶
- query(a3m: str, output_dir: str | None = None) → str¶
Queries the database with HHsearch, using the given A3M alignment as input.
deepfold.data.tools.hmmbuild module¶
A Python wrapper for hmmbuild - construct HMM profiles from MSA.
- class deepfold.data.tools.hmmbuild.Hmmbuild(*, binary_path: str, singlemx: bool = False)¶
Bases:
object
Python wrapper of the hmmbuild binary.
- build_profile_from_a3m(a3m: str) → str¶
Builds an HMM for the aligned sequences given as an A3M string.
- Parameters:
a3m – A string with the aligned sequences in the A3M format.
- Returns:
A string with the profile in the HMM format.
- Raises:
RuntimeError – If hmmbuild fails.
- build_profile_from_sto(sto: str, model_construction='fast') → str¶
Builds an HMM for the aligned sequences given as a Stockholm string.
- Parameters:
sto – A string with the aligned sequences in the Stockholm format.
model_construction – Whether to use the reference annotation in the MSA to determine consensus columns (‘hand’) or the default method (‘fast’).
- Returns:
A string with the profile in the HMM format.
- Raises:
RuntimeError – If hmmbuild fails.
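The `model_construction` and `singlemx` options correspond to hmmbuild command-line flags. The sketch below shows one way a wrapper might map them onto an argument list. The helper `build_hmmbuild_command` and its path arguments are hypothetical; `--hand`, `--singlemx`, and `--amino` are real hmmbuild options, but whether deepfold passes exactly this set is an assumption.

```python
from typing import List


def build_hmmbuild_command(
    binary_path: str,
    output_hmm_path: str,   # hypothetical: file the profile is written to
    input_msa_path: str,    # hypothetical: Stockholm MSA written to disk
    model_construction: str = "fast",
    singlemx: bool = False,
) -> List[str]:
    """Hypothetical helper: assemble an hmmbuild argument list (not deepfold code)."""
    if model_construction not in {"fast", "hand"}:
        raise ValueError(
            f"Invalid model_construction {model_construction!r}; "
            "expected 'fast' or 'hand'."
        )
    cmd = [binary_path]
    # 'fast' is hmmbuild's default, so only 'hand' needs an explicit flag.
    if model_construction == "hand":
        cmd.append("--hand")
    if singlemx:
        cmd.append("--singlemx")
    # hmmbuild expects: [options] <hmmfile_out> <msafile>
    cmd += ["--amino", output_hmm_path, input_msa_path]
    return cmd


print(build_hmmbuild_command("hmmbuild", "out.hmm", "query.sto", "hand"))
```

Validating `model_construction` before launching the subprocess gives the caller a clear error rather than an opaque hmmbuild failure.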
deepfold.data.tools.hmmsearch module¶
A Python wrapper for hmmsearch - search profile against a sequence db.
- class deepfold.data.tools.hmmsearch.Hmmsearch(*, binary_path: str, hmmbuild_binary_path: str, database_path: str, flags: Sequence[str] | None = None)¶
Bases:
object
Python wrapper of the hmmsearch binary.
- static get_template_hits(output_string: str, input_sequence: str) → Sequence[TemplateHit]¶
Gets parsed template hits from the raw string output by the tool.
- property input_format: str¶
- property output_format: str¶
- query(msa_sto: str, output_dir: str | None = None) → str¶
Queries the database with hmmsearch, using the given Stockholm MSA as input.
- query_with_hmm(hmm: str, output_dir: str | None = None) → str¶
Queries the database with hmmsearch, using the given HMM as input.
deepfold.data.tools.jackhmmer module¶
Library to run Jackhmmer from Python.
- class deepfold.data.tools.jackhmmer.Jackhmmer(*, binary_path: str, database_path: str, n_cpu: int = 8, n_iter: int = 1, e_value: float = 0.0001, z_value: int | None = None, get_tblout: bool = False, filter_f1: float = 0.0005, filter_f2: float = 5e-05, filter_f3: float = 5e-07, incdom_e: float | None = None, dom_e: float | None = None, num_streamed_chunks: int | None = None, streaming_callback: Callable[[int], None] | None = None)¶
Bases:
object
Python wrapper of the Jackhmmer binary.
- query(input_fasta_path: str, max_sequences: int | None = None) → Sequence[Sequence[Mapping[str, Any]]]¶
- query_multiple(input_fasta_paths: Sequence[str], max_sequences: int | None = None) → Sequence[Sequence[Mapping[str, Any]]]¶
Queries the database using Jackhmmer.
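Several of the `Jackhmmer` constructor parameters are optional (`z_value`, `dom_e`, `incdom_e`), which suggests the corresponding flags are only passed when a value is set. The sketch below illustrates that pattern. The helper `build_jackhmmer_command` and the `output_sto_path` argument are hypothetical; the flags are real jackhmmer options, but the mapping to deepfold's parameters is an assumption.

```python
from typing import List, Optional


def build_jackhmmer_command(
    binary_path: str,
    database_path: str,
    input_fasta_path: str,
    output_sto_path: str,  # hypothetical: Stockholm alignment output file
    n_cpu: int = 8,
    n_iter: int = 1,
    e_value: float = 0.0001,
    z_value: Optional[int] = None,
    filter_f1: float = 0.0005,
    filter_f2: float = 5e-05,
    filter_f3: float = 5e-07,
    dom_e: Optional[float] = None,
    incdom_e: Optional[float] = None,
) -> List[str]:
    """Hypothetical helper: assemble a jackhmmer argument list (not deepfold code)."""
    cmd = [
        binary_path,
        "-A", output_sto_path,     # save the final alignment in Stockholm format
        "--F1", str(filter_f1),    # MSV filter threshold
        "--F2", str(filter_f2),    # Viterbi filter threshold
        "--F3", str(filter_f3),    # Forward filter threshold
        "-E", str(e_value),        # reporting E-value threshold
        "--cpu", str(n_cpu),
        "-N", str(n_iter),         # maximum number of iterations
    ]
    # Optional flags are added only when the caller supplied a value.
    if z_value is not None:
        cmd += ["-Z", str(z_value)]
    if dom_e is not None:
        cmd += ["--domE", str(dom_e)]
    if incdom_e is not None:
        cmd += ["--incdomE", str(incdom_e)]
    # jackhmmer expects: [options] <seqfile> <seqdb>
    cmd += [input_fasta_path, database_path]
    return cmd


print(build_jackhmmer_command("jackhmmer", "db.fasta", "query.fasta", "out.sto"))
```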
deepfold.data.tools.kalign module¶
A Python wrapper for Kalign.
- class deepfold.data.tools.kalign.Kalign(binary_path: str, verbose: bool = False)¶
Bases:
object
Python wrapper of the Kalign binary.
- align(sequences: List[str]) → str¶
Aligns the sequences and returns the alignment as an A3M string.
- Parameters:
sequences – A list of query sequence strings. Each sequence must be at least 6 residues long (Kalign requires this). Note that the order in which the sequences are given may alter the output slightly, because a different alignment tree may be constructed.
- Returns:
A string with the alignment in a3m format.
- Raises:
RuntimeError – If Kalign fails.
ValueError – If any of the sequences is shorter than 6 residues.
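The length requirement on `sequences` can be checked before the Kalign binary is ever invoked. The following is a minimal sketch of such a check; the function `check_kalign_sequences` and the constant `MIN_KALIGN_LENGTH` are hypothetical names, and the actual validation inside `Kalign.align` may differ.

```python
from typing import List

MIN_KALIGN_LENGTH = 6  # minimum sequence length stated in the documentation above


def check_kalign_sequences(sequences: List[str]) -> None:
    """Hypothetical pre-flight check: reject sequences that are too short."""
    for index, sequence in enumerate(sequences):
        if len(sequence) < MIN_KALIGN_LENGTH:
            raise ValueError(
                f"Sequence {index} has {len(sequence)} residues; "
                f"Kalign requires at least {MIN_KALIGN_LENGTH}."
            )


check_kalign_sequences(["MKTAYIAKQR", "MKTAYI"])  # both long enough, no error

try:
    check_kalign_sequences(["MKTAYIAKQR", "MKT"])
except ValueError as error:
    print(error)
```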
deepfold.data.tools.utils module¶
Common utilities for data pipeline tools.
- deepfold.data.tools.utils.timing(msg: str)¶
- deepfold.data.tools.utils.tmpdir_manager(base_dir: str | None = None)¶
Context manager that deletes a temporary directory on exit.
- deepfold.data.tools.utils.to_date(s: str)¶
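A context manager that deletes a temporary directory on exit, as `tmpdir_manager` is described above, can be sketched with the standard library alone. The name `tmpdir_sketch` is hypothetical, and this is not the deepfold implementation; it only illustrates the create, yield, and clean-up pattern.

```python
import contextlib
import os
import shutil
import tempfile
from typing import Iterator, Optional


@contextlib.contextmanager
def tmpdir_sketch(base_dir: Optional[str] = None) -> Iterator[str]:
    """Hypothetical stand-in: yield a fresh temporary directory, then remove it."""
    tmpdir = tempfile.mkdtemp(dir=base_dir)
    try:
        yield tmpdir
    finally:
        # Runs even if the body raises, so no directories are left behind.
        shutil.rmtree(tmpdir, ignore_errors=True)


with tmpdir_sketch() as workdir:
    query_path = os.path.join(workdir, "query.fasta")
    with open(query_path, "w") as handle:
        handle.write(">query\nMKTAYIAKQR\n")
    existed_inside = os.path.exists(query_path)

removed_after = not os.path.exists(workdir)
```

Tool wrappers like the ones on this page typically write their inputs and outputs into such a directory, so that intermediate files do not accumulate between runs.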