deepfold.data.tools package

Submodules

deepfold.data.tools.hhblits module

Library to run HHblits from Python.

class deepfold.data.tools.hhblits.HHBlits(*, binary_path: str, databases: Sequence[str], n_cpu: int = 4, n_iter: int = 3, e_value: float = 0.001, maxseq: int = 1000000, realign_max: int = 100000, maxfilt: int = 100000, min_prefilter_hits: int = 1000, all_seqs: bool = False, alt: int | None = None, p: int = 20, z: int = 500)

Bases: object

Python wrapper of the HHblits binary.

query(input_fasta_path: str) List[Mapping[str, Any]]

Queries the database using HHblits.
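As a usage illustration, the following sketch wraps a single search in a helper. The helper name run_hhblits and its parameters are hypothetical; only the HHBlits constructor arguments and the query() call are taken from the signatures above. The import is deferred so the snippet can be loaded on a machine without deepfold or the HHblits binary installed.

```python
from typing import Any, List, Mapping, Sequence


def run_hhblits(
    fasta_path: str,
    binary_path: str,
    databases: Sequence[str],
) -> List[Mapping[str, Any]]:
    """Run one HHblits search (hypothetical helper, not part of deepfold)."""
    # Imported lazily so this sketch loads even where deepfold is absent.
    from deepfold.data.tools.hhblits import HHBlits

    # The constructor is keyword-only (note the leading `*` in its signature),
    # so every argument has to be passed by name.
    runner = HHBlits(
        binary_path=binary_path,
        databases=databases,
        n_cpu=4,        # documented default
        n_iter=3,       # documented default
        e_value=0.001,  # documented default
    )
    return runner.query(fasta_path)
```

Passing the defaults explicitly is optional; they are repeated here only to show which search settings a caller would typically tune.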

deepfold.data.tools.hhsearch module

Library to run HHsearch from Python.

class deepfold.data.tools.hhsearch.HHSearch(*, binary_path: str, databases: Sequence[str], n_cpu: int = 2, maxseq: int = 1000000)

Bases: object

Python wrapper of the HHsearch binary.

static get_template_hits(output_string: str, input_sequence: str) Sequence[TemplateHit]

Gets parsed template hits from the raw string output by the tool.

property input_format: str
property output_format: str
query(a3m: str, output_dir: str | None = None) str

Queries the database with HHsearch using the given A3M alignment.

deepfold.data.tools.hmmbuild module

A Python wrapper for hmmbuild - construct HMM profiles from MSA.

class deepfold.data.tools.hmmbuild.Hmmbuild(*, binary_path: str, singlemx: bool = False)

Bases: object

Python wrapper of the hmmbuild binary.

build_profile_from_a3m(a3m: str) str

Builds an HMM for the aligned sequences given as an A3M string.

Parameters:

a3m – A string with the aligned sequences in the A3M format.

Returns:

A string with the profile in the HMM format.

Raises:

RuntimeError – If hmmbuild fails.

build_profile_from_sto(sto: str, model_construction='fast') str

Builds an HMM for the aligned sequences given as a Stockholm string.

Parameters:
  • sto – A string with the aligned sequences in the Stockholm format.

  • model_construction – Whether to use the reference annotation in the MSA to determine consensus columns (‘hand’) or the default (‘fast’).

Returns:

A string with the profile in the HMM format.

Raises:

RuntimeError – If hmmbuild fails.
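To make the model_construction and singlemx options concrete, here is a minimal sketch of how a wrapper of this kind could assemble the hmmbuild command line. The function hmmbuild_command and its path parameters are hypothetical and are not taken from the deepfold source; the flags --fast, --hand, --singlemx and --amino are standard hmmbuild options.

```python
from typing import List


def hmmbuild_command(
    binary_path: str,
    hmm_output_path: str,
    msa_input_path: str,
    model_construction: str = "fast",
    singlemx: bool = False,
) -> List[str]:
    """Assemble an hmmbuild argument list (illustrative, not deepfold's code)."""
    # Only the two model construction strategies named above are accepted.
    if model_construction not in ("fast", "hand"):
        raise ValueError(
            f"Invalid model_construction {model_construction!r}; "
            "expected 'fast' or 'hand'."
        )
    cmd = [binary_path, f"--{model_construction}"]
    if singlemx:
        # Use the substitution score matrix even for single-sequence input.
        cmd.append("--singlemx")
    # hmmbuild expects: [options] <hmmfile_out> <msafile>
    cmd.extend(["--amino", hmm_output_path, msa_input_path])
    return cmd


print(hmmbuild_command("hmmbuild", "out.hmm", "query.sto", "hand"))
```

A list built this way can be handed to subprocess.run without shell quoting concerns.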

deepfold.data.tools.hmmsearch module

A Python wrapper for hmmsearch - search profile against a sequence db.

class deepfold.data.tools.hmmsearch.Hmmsearch(*, binary_path: str, hmmbuild_binary_path: str, database_path: str, flags: Sequence[str] | None = None)

Bases: object

Python wrapper of the hmmsearch binary.

static get_template_hits(output_string: str, input_sequence: str) Sequence[TemplateHit]

Gets parsed template hits from the raw string output by the tool.

property input_format: str
property output_format: str
query(msa_sto: str, output_dir: str | None = None) str

Queries the database with hmmsearch using the given Stockholm MSA.

query_with_hmm(hmm: str, output_dir: str | None = None) str

Queries the database with hmmsearch using the given HMM.

deepfold.data.tools.jackhmmer module

Library to run Jackhmmer from Python.

class deepfold.data.tools.jackhmmer.Jackhmmer(*, binary_path: str, database_path: str, n_cpu: int = 8, n_iter: int = 1, e_value: float = 0.0001, z_value: int | None = None, get_tblout: bool = False, filter_f1: float = 0.0005, filter_f2: float = 5e-05, filter_f3: float = 5e-07, incdom_e: float | None = None, dom_e: float | None = None, num_streamed_chunks: int | None = None, streaming_callback: Callable[[int], None] | None = None)

Bases: object

Python wrapper of the Jackhmmer binary.

query(input_fasta_path: str, max_sequences: int | None = None) Sequence[Sequence[Mapping[str, Any]]]
query_multiple(input_fasta_paths: Sequence[str], max_sequences: int | None = None) Sequence[Sequence[Mapping[str, Any]]]

Queries the database using Jackhmmer.

deepfold.data.tools.kalign module

A Python wrapper for Kalign.

class deepfold.data.tools.kalign.Kalign(binary_path: str, verbose: bool = False)

Bases: object

Python wrapper of the Kalign binary.

align(sequences: List[str]) str

Aligns the sequences and returns the alignment as an A3M string.

Parameters:

sequences – A list of query sequence strings. Each sequence must be at least 6 residues long (a Kalign requirement). Note that the order of the sequences may alter the output slightly, because a different alignment tree may be constructed.

Returns:

A string with the alignment in the A3M format.

Raises:
  • RuntimeError – If Kalign fails.

  • ValueError – If any of the sequences is shorter than 6 residues.
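The length requirement can also be checked before align() is called, so that short sequences are reported together with their position in the input. The sketch below is an assumption about how a caller might do this; check_min_length is a hypothetical helper and not part of deepfold.

```python
from typing import Sequence

# Minimum sequence length stated in the align() documentation above.
KALIGN_MIN_LENGTH = 6


def check_min_length(
    sequences: Sequence[str], min_length: int = KALIGN_MIN_LENGTH
) -> None:
    """Raise ValueError if any sequence is shorter than min_length residues."""
    for index, sequence in enumerate(sequences):
        if len(sequence) < min_length:
            raise ValueError(
                f"Sequence at index {index} has {len(sequence)} residues; "
                f"at least {min_length} are required."
            )


# Passes silently: both sequences have at least 6 residues.
check_min_length(["ACDEFG", "ACDEFGHIKL"])
```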

deepfold.data.tools.utils module

Common utilities for data pipeline tools.

deepfold.data.tools.utils.timing(msg: str)
deepfold.data.tools.utils.tmpdir_manager(base_dir: str | None = None)

Context manager that deletes a temporary directory on exit.
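The behaviour described above can be reproduced with the standard library alone. The following is a minimal stand-in under that assumption, not the deepfold implementation; the name tmpdir_sketch is hypothetical, and base_dir is assumed to be the parent directory in which the temporary directory is created.

```python
import contextlib
import os
import shutil
import tempfile
from typing import Iterator, Optional


@contextlib.contextmanager
def tmpdir_sketch(base_dir: Optional[str] = None) -> Iterator[str]:
    """Yield a fresh temporary directory and delete it on exit."""
    tmpdir = tempfile.mkdtemp(dir=base_dir)
    try:
        yield tmpdir
    finally:
        # Runs on normal exit and when the body raises.
        shutil.rmtree(tmpdir, ignore_errors=True)


with tmpdir_sketch() as workdir:
    query_path = os.path.join(workdir, "query.fasta")
    with open(query_path, "w") as handle:
        handle.write(">query\nACDEFGHIKL\n")
```

Deleting the directory in a finally clause means intermediate files written for an external tool are cleaned up even if the tool fails.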

deepfold.data.tools.utils.to_date(s: str)

Module contents