snakemake
snakemake documentation
- Version in MegaLinter: 7.25.0
- Visit Official Web Site
Configuration in MegaLinter
- Enable snakemake by adding
SNAKEMAKE_LINT
in ENABLE_LINTERS variable - Disable snakemake by adding
SNAKEMAKE_LINT
in DISABLE_LINTERS variable
Variable | Description | Default value |
---|---|---|
SNAKEMAKE_LINT_ARGUMENTS | User custom arguments to add in linter CLI call Ex: -s --foo "bar" |
|
SNAKEMAKE_LINT_FILTER_REGEX_INCLUDE | Custom regex including filter Ex: (src\|lib) |
Include every file |
SNAKEMAKE_LINT_FILTER_REGEX_EXCLUDE | Custom regex excluding filter Ex: (test\|examples) |
Exclude no file |
SNAKEMAKE_LINT_CLI_LINT_MODE | Override default CLI lint mode - file : Calls the linter for each file- project : Call the linter from the root of the project |
file |
SNAKEMAKE_LINT_FILE_EXTENSIONS | Allowed file extensions. "*" matches any extension, "" matches empty extension. Empty list excludes all filesEx: [".py", ""] |
[".smk"] |
SNAKEMAKE_LINT_FILE_NAMES_REGEX | File name regex filters. Regular expression list for filtering files by their base names using regex full match. Empty list includes all files Ex: ["Dockerfile(-.+)?", "Jenkinsfile"] |
["Snakefile"] |
SNAKEMAKE_LINT_PRE_COMMANDS | List of bash commands to run before the linter | None |
SNAKEMAKE_LINT_POST_COMMANDS | List of bash commands to run after the linter | None |
SNAKEMAKE_LINT_DISABLE_ERRORS | Run linter but consider errors as warnings | false |
SNAKEMAKE_LINT_DISABLE_ERRORS_IF_LESS_THAN | Maximum number of errors allowed | 0 |
IDE Integration
Use snakemake in your favorite IDE to catch errors before MegaLinter !
IDE | Extension Name | Install | |
---|---|---|---|
IDEA | SnakeCharm | ||
Visual Studio Code | VSCode SnakeMake |
MegaLinter Flavours
This linter is available in the following flavours
Flavor | Description | Embedded linters | Info | |
---|---|---|---|---|
all | Default MegaLinter Flavor | 114 | ||
documentation | MegaLinter for documentation projects | 48 | ||
dotnet | Optimized for C, C++, C# or VB based projects | 60 | ||
go | Optimized for GO based projects | 50 | ||
java | Optimized for JAVA based projects | 51 | ||
javascript | Optimized for JAVASCRIPT or TYPESCRIPT based projects | 57 | ||
php | Optimized for PHP based projects | 51 | ||
python | Optimized for PYTHON based projects | 59 | ||
ruby | Optimized for RUBY based projects | 48 | ||
rust | Optimized for RUST based projects | 48 | ||
salesforce | Optimized for Salesforce based projects | 51 | ||
swift | Optimized for SWIFT based projects | 48 | ||
terraform | Optimized for TERRAFORM based projects | 53 |
Behind the scenes
How are identified applicable files
- File extensions:
.smk
- File names (regex):
Snakefile
How the linting is performed
- snakemake is called one time by identified file (
file
CLI lint mode)
Example calls
snakemake --lint -s Snakefile
Help content
usage: snakemake [-h] [--dry-run] [--profile PROFILE] [--cache [RULE ...]]
[--snakefile FILE] [--cores [N]] [--jobs [N]]
[--local-cores N] [--resources [NAME=INT ...]]
[--set-threads RULE=THREADS [RULE=THREADS ...]]
[--max-threads MAX_THREADS]
[--set-resources RULE:RESOURCE=VALUE [RULE:RESOURCE=VALUE ...]]
[--set-scatter NAME=SCATTERITEMS [NAME=SCATTERITEMS ...]]
[--set-resource-scopes RESOURCE=[global|local]
[RESOURCE=[global|local] ...]]
[--default-resources [NAME=INT ...]]
[--preemption-default PREEMPTION_DEFAULT]
[--preemptible-rules PREEMPTIBLE_RULES [PREEMPTIBLE_RULES ...]]
[--config [KEY=VALUE ...]] [--configfile FILE [FILE ...]]
[--envvars VARNAME [VARNAME ...]] [--directory DIR] [--touch]
[--keep-going]
[--rerun-triggers {mtime,params,input,software-env,code} [{mtime,params,input,software-env,code} ...]]
[--force] [--forceall] [--forcerun [TARGET ...]]
[--prioritize TARGET [TARGET ...]]
[--batch RULE=BATCH/BATCHES] [--until TARGET [TARGET ...]]
[--omit-from TARGET [TARGET ...]] [--rerun-incomplete]
[--shadow-prefix DIR] [--scheduler [{ilp,greedy}]]
[--wms-monitor [WMS_MONITOR]]
[--wms-monitor-arg [NAME=VALUE ...]]
[--scheduler-ilp-solver {PULP_CBC_CMD}]
[--scheduler-solver-path SCHEDULER_SOLVER_PATH]
[--conda-base-path CONDA_BASE_PATH] [--no-subworkflows]
[--groups GROUPS [GROUPS ...]]
[--group-components GROUP_COMPONENTS [GROUP_COMPONENTS ...]]
[--report [FILE]] [--report-stylesheet CSSFILE]
[--draft-notebook TARGET] [--edit-notebook TARGET]
[--notebook-listen IP:PORT] [--lint [{text,json}]]
[--generate-unit-tests [TESTPATH]] [--containerize]
[--export-cwl FILE] [--list] [--list-target-rules] [--dag]
[--rulegraph] [--filegraph] [--d3dag] [--summary]
[--detailed-summary] [--archive FILE]
[--cleanup-metadata FILE [FILE ...]] [--cleanup-shadow]
[--skip-script-cleanup] [--unlock] [--list-version-changes]
[--list-code-changes] [--list-input-changes]
[--list-params-changes] [--list-untracked]
[--delete-all-output] [--delete-temp-output]
[--bash-completion] [--keep-incomplete] [--drop-metadata]
[--version] [--reason] [--gui [PORT]] [--printshellcmds]
[--debug-dag] [--stats FILE] [--nocolor]
[--quiet [{progress,rules,all} ...]] [--print-compilation]
[--verbose] [--force-use-threads] [--allow-ambiguity]
[--nolock] [--ignore-incomplete]
[--max-inventory-time SECONDS] [--latency-wait SECONDS]
[--wait-for-files [FILE ...]] [--wait-for-files-file FILE]
[--notemp] [--all-temp] [--keep-remote] [--keep-target-files]
[--allowed-rules ALLOWED_RULES [ALLOWED_RULES ...]]
[--target-jobs TARGET_JOBS [TARGET_JOBS ...]]
[--local-groupid LOCAL_GROUPID]
[--max-jobs-per-second MAX_JOBS_PER_SECOND]
[--max-status-checks-per-second MAX_STATUS_CHECKS_PER_SECOND]
[-T RETRIES] [--attempt ATTEMPT]
[--wrapper-prefix WRAPPER_PREFIX]
[--default-remote-provider {S3,GS,FTP,SFTP,S3Mocked,gfal,gridftp,iRODS,AzBlob,XRootD}]
[--default-remote-prefix DEFAULT_REMOTE_PREFIX]
[--no-shared-fs] [--greediness GREEDINESS] [--no-hooks]
[--overwrite-shellcmd OVERWRITE_SHELLCMD] [--debug]
[--runtime-profile FILE] [--mode {0,1,2}]
[--show-failed-logs] [--log-handler-script FILE]
[--log-service {none,slack,wms}] [--slurm]
[--cluster CMD | --cluster-sync CMD | --drmaa [ARGS]]
[--cluster-config FILE] [--immediate-submit]
[--jobscript SCRIPT] [--jobname NAME]
[--cluster-status CLUSTER_STATUS]
[--cluster-cancel CLUSTER_CANCEL]
[--cluster-cancel-nargs CLUSTER_CANCEL_NARGS]
[--cluster-sidecar CLUSTER_SIDECAR] [--drmaa-log-dir DIR]
[--kubernetes [NAMESPACE]] [--container-image IMAGE]
[--k8s-cpu-scalar FLOAT] [--tibanna]
[--tibanna-sfn TIBANNA_SFN] [--precommand PRECOMMAND]
[--tibanna-config TIBANNA_CONFIG [TIBANNA_CONFIG ...]]
[--google-lifesciences]
[--google-lifesciences-regions GOOGLE_LIFESCIENCES_REGIONS [GOOGLE_LIFESCIENCES_REGIONS ...]]
[--google-lifesciences-location GOOGLE_LIFESCIENCES_LOCATION]
[--google-lifesciences-keep-cache] [--flux] [--tes URL]
[--use-conda] [--conda-not-block-search-path-envvars]
[--list-conda-envs] [--conda-prefix DIR]
[--conda-cleanup-envs]
[--conda-cleanup-pkgs [{tarballs,cache}]]
[--conda-create-envs-only] [--conda-frontend {conda,mamba}]
[--use-singularity] [--singularity-prefix DIR]
[--singularity-args ARGS] [--cleanup-containers]
[--use-envmodules]
[target ...]
Snakemake is a Python based language and execution environment for GNU Make-
like workflows.
options:
-h, --help show this help message and exit
EXECUTION:
target Targets to build. May be rules or files. (default:
None)
--dry-run, --dryrun, -n
Do not execute anything, and display what would be
done. If you have a very large workflow, use --dry-run
--quiet to just print a summary of the DAG of jobs.
(default: False)
--profile PROFILE Name of profile to use for configuring Snakemake.
Snakemake will search for a corresponding folder in
/etc/xdg/snakemake and /root/.config/snakemake.
Alternatively, this can be an absolute or relative
path. The profile folder has to contain a file
'config.yaml'. This file can be used to set default
values for command line options in YAML format. For
example, '--cluster qsub' becomes 'cluster: qsub' in
the YAML file. Profiles can be obtained from
https://github.com/snakemake-profiles. The profile can
also be set via the environment variable
$SNAKEMAKE_PROFILE. [env var: SNAKEMAKE_PROFILE]
(default: None)
--cache [RULE ...] Store output files of given rules in a central cache
given by the environment variable
$SNAKEMAKE_OUTPUT_CACHE. Likewise, retrieve output
files of the given rules from this cache if they have
been created before (by anybody writing to the same
cache), instead of actually executing the rules.
Output files are identified by hashing all steps,
parameters and software stack (conda envs or
containers) needed to create them. (default: None)
--snakefile FILE, -s FILE
The workflow definition in form of a
snakefile.Usually, you should not need to specify
this. By default, Snakemake will search for
'Snakefile', 'snakefile', 'workflow/Snakefile',
'workflow/snakefile' beneath the current working
directory, in this order. Only if you definitely want
a different layout, you need to use this parameter.
(default: None)
--cores [N], -c [N] Use at most N CPU cores/jobs in parallel. If N is
omitted or 'all', the limit is set to the number of
available CPU cores. In case of cluster/cloud
execution, this argument sets the maximum number of
cores requested from the cluster or cloud scheduler.
(See https://snakemake.readthedocs.io/en/stable/snakef
iles/rules.html#resources-remote-execution for more
info)This number is available to rules via
workflow.cores. (default: None)
--jobs [N], -j [N] Use at most N CPU cluster/cloud jobs in parallel. For
local execution this is an alias for --cores. Note:
Set to 'unlimited' in case, this does not play a role.
(default: None)
--local-cores N In cluster/cloud mode, use at most N cores of the host
machine in parallel (default: number of CPU cores of
the host). The cores are used to execute local rules.
This option is ignored when not in cluster/cloud mode.
(default: 2)
--resources [NAME=INT ...], --res [NAME=INT ...]
Define additional resources that shall constrain the
scheduling analogously to --cores (see above). A
resource is defined as a name and an integer value.
E.g. --resources mem_mb=1000. Rules can use resources
by defining the resource keyword, e.g. resources:
mem_mb=600. If now two rules require 600 of the
resource 'mem_mb' they won't be run in parallel by the
scheduler. In cluster/cloud mode, this argument will
also constrain the amount of resources requested from
the server. (See https://snakemake.readthedocs.io/en/s
table/snakefiles/rules.html#resources-remote-execution
for more info) (default: None)
--set-threads RULE=THREADS [RULE=THREADS ...]
Overwrite thread usage of rules. This allows to fine-
tune workflow parallelization. In particular, this is
helpful to target certain cluster nodes by e.g.
shifting a rule to use more, or less threads than
defined in the workflow. Thereby, THREADS has to be a
positive integer, and RULE has to be the name of the
rule. (default: None)
--max-threads MAX_THREADS
Define a global maximum number of threads available to
any rule. Rules requesting more threads (via the
threads keyword) will have their values reduced to the
maximum. This can be useful when you want to restrict
the maximum number of threads without modifying the
workflow definition or overwriting rules individually
with --set-threads. (default: None)
--set-resources RULE:RESOURCE=VALUE [RULE:RESOURCE=VALUE ...]
Overwrite resource usage of rules. This allows to
fine-tune workflow resources. In particular, this is
helpful to target certain cluster nodes by e.g.
defining a certain partition for a rule, or overriding
a temporary directory. Thereby, VALUE has to be a
positive integer or a string, RULE has to be the name
of the rule, and RESOURCE has to be the name of the
resource. (default: None)
--set-scatter NAME=SCATTERITEMS [NAME=SCATTERITEMS ...]
Overwrite number of scatter items of scattergather
processes. This allows to fine-tune workflow
parallelization. Thereby, SCATTERITEMS has to be a
positive integer, and NAME has to be the name of the
scattergather process defined via a scattergather
directive in the workflow. (default: None)
--set-resource-scopes RESOURCE=[global|local] [RESOURCE=[global|local] ...]
Overwrite resource scopes. A scope determines how a
constraint is reckoned in cluster execution. With
RESOURCE=local, a constraint applied to RESOURCE using
--resources will be considered the limit for each
group submission. With RESOURCE=global, the constraint
will apply across all groups cumulatively. By default,
only `mem_mb` and `disk_mb` are considered local, all
other resources are global. This may be modified in
the snakefile using the `resource_scopes:` directive.
Note that number of threads, specified via --cores, is
always considered local. (See https://snakemake.readth
edocs.io/en/stable/snakefiles/rules.html#resources-
remote-execution for more info) (default: None)
--default-resources [NAME=INT ...], --default-res [NAME=INT ...]
Define default values of resources for rules that do
not define their own values. In addition to plain
integers, python expressions over inputsize are
allowed (e.g. '2*input.size_mb'). The inputsize is the
sum of the sizes of all input files of a rule. By
default, Snakemake assumes a default for mem_mb,
disk_mb, and tmpdir (see below). This option allows to
add further defaults (e.g. account and partition for
slurm) or to overwrite these default values. The
defaults are 'mem_mb=max(2*input.size_mb, 1000)',
'disk_mb=max(2*input.size_mb, 1000)' (i.e., default
disk and mem usage is twice the input file size but at
least 1GB), and the system temporary directory (as
given by $TMPDIR, $TEMP, or $TMP) is used for the
tmpdir resource. The tmpdir resource is automatically
used by shell commands, scripts and wrappers to store
temporary data (as it is mirrored into $TMPDIR, $TEMP,
and $TMP for the executed subprocesses). If this
argument is not specified at all, Snakemake just uses
the tmpdir resource as outlined above. (default: None)
--preemption-default PREEMPTION_DEFAULT
A preemptible instance can be requested when using the
Google Life Sciences API. If you set a --preemption-
default,all rules will be subject to the default.
Specifically, this integer is the number of restart
attempts that will be made given that the instance is
killed unexpectedly. Note that preemptible instances
have a maximum running time of 24 hours. If you want
to set preemptible instances for only a subset of
rules, use --preemptible-rules instead. (default:
None)
--preemptible-rules PREEMPTIBLE_RULES [PREEMPTIBLE_RULES ...]
A preemptible instance can be requested when using the
Google Life Sciences API. If you want to use these
instances for a subset of your rules, you can use
--preemptible-rules and then specify a list of rule
and integer pairs, where each integer indicates the
number of restarts to use for the rule's instance in
the case that the instance is terminated unexpectedly.
--preemptible-rules can be used in combination with
--preemption-default, and will take priority. Note
that preemptible instances have a maximum running time
of 24. If you want to apply a consistent number of
retries across all your rules, use --preemption-
default instead. Example: snakemake --preemption-
default 10 --preemptible-rules map_reads=3
call_variants=0 (default: None)
--config [KEY=VALUE ...], -C [KEY=VALUE ...]
Set or overwrite values in the workflow config object.
The workflow config object is accessible as variable
config inside the workflow. Default values can be set
by providing a JSON file (see Documentation).
(default: None)
--configfile FILE [FILE ...], --configfiles FILE [FILE ...]
Specify or overwrite the config file of the workflow
(see the docs). Values specified in JSON or YAML
format are available in the global config dictionary
inside the workflow. Multiple files overwrite each
other in the given order. Thereby missing keys in
previous config files are extended by following
configfiles. Note that this order also includes a
config file defined in the workflow definition itself
(which will come first). (default: None)
--envvars VARNAME [VARNAME ...]
Environment variables to pass to cloud jobs. (default:
None)
--directory DIR, -d DIR
Specify working directory (relative paths in the
snakefile will use this as their origin). (default:
None)
--touch, -t Touch output files (mark them up to date without
really changing them) instead of running their
commands. This is used to pretend that the rules were
executed, in order to fool future invocations of
snakemake. Fails if a file does not yet exist. Note
that this will only touch files that would otherwise
be recreated by Snakemake (e.g. because their input
files are newer). For enforcing a touch, combine this
with --force, --forceall, or --forcerun. Note however
that you lose the provenance information when the
files have been created in reality. Hence, this should
be used only as a last resort. (default: False)
--keep-going, -k Go on with independent jobs if a job fails. (default:
False)
--rerun-triggers {mtime,params,input,software-env,code} [{mtime,params,input,software-env,code} ...]
Define what triggers the rerunning of a job. By
default, all triggers are used, which guarantees that
results are consistent with the workflow code and
configuration. If you rather prefer the traditional
way of just considering file modification dates, use '
--rerun-trigger mtime'. (default: ['mtime', 'params',
'input', 'software-env', 'code'])
--force, -f Force the execution of the selected target or the
first rule regardless of already created output.
(default: False)
--forceall, -F Force the execution of the selected (or the first)
rule and all rules it is dependent on regardless of
already created output. (default: False)
--forcerun [TARGET ...], -R [TARGET ...]
Force the re-execution or creation of the given rules
or files. Use this option if you changed a rule and
want to have all its output in your workflow updated.
(default: None)
--prioritize TARGET [TARGET ...], -P TARGET [TARGET ...]
Tell the scheduler to assign creation of given targets
(and all their dependencies) highest priority.
(EXPERIMENTAL) (default: None)
--batch RULE=BATCH/BATCHES
Only create the given BATCH of the input files of the
given RULE. This can be used to iteratively run parts
of very large workflows. Only the execution plan of
the relevant part of the workflow has to be
calculated, thereby speeding up DAG computation. It is
recommended to provide the most suitable rule for
batching when documenting a workflow. It should be
some aggregating rule that would be executed only
once, and has a large number of input files. For
example, it can be a rule that aggregates over
samples. (default: None)
--until TARGET [TARGET ...], -U TARGET [TARGET ...]
Runs the pipeline until it reaches the specified rules
or files. Only runs jobs that are dependencies of the
specified rule or files, does not run sibling DAGs.
(default: None)
--omit-from TARGET [TARGET ...], -O TARGET [TARGET ...]
Prevent the execution or creation of the given rules
or files as well as any rules or files that are
downstream of these targets in the DAG. Also runs jobs
in sibling DAGs that are independent of the rules or
files specified here. (default: None)
--rerun-incomplete, --ri
Re-run all jobs the output of which is recognized as
incomplete. (default: False)
--shadow-prefix DIR Specify a directory in which the 'shadow' directory is
created. If not supplied, the value is set to the
'.snakemake' directory relative to the working
directory. (default: None)
--scheduler [{ilp,greedy}]
Specifies if jobs are selected by a greedy algorithm
or by solving an ilp. The ilp scheduler aims to reduce
runtime and hdd usage by best possible use of
resources. (default: greedy)
--wms-monitor [WMS_MONITOR]
IP and port of workflow management system to monitor
the execution of snakemake (e.g.
http://127.0.0.1:5000) Note that if your service
requires an authorization token, you must export
WMS_MONITOR_TOKEN in the environment. (default: None)
--wms-monitor-arg [NAME=VALUE ...]
If the workflow management service accepts extra
arguments, provide. them in key value pairs with
--wms-monitor-arg. For example, to run an existing
workflow using a wms monitor, you can provide the pair
id=12345 and the arguments will be provided to the
endpoint to first interact with the workflow (default:
None)
--scheduler-ilp-solver {PULP_CBC_CMD}
Specifies solver to be utilized when selecting ilp-
scheduler. (default: COIN_CMD)
--scheduler-solver-path SCHEDULER_SOLVER_PATH
Set the PATH to search for scheduler solver binaries
(internal use only). (default: None)
--conda-base-path CONDA_BASE_PATH
Path of conda base installation (home of conda, mamba,
activate) (internal use only). (default: None)
--no-subworkflows, --nosw
Do not evaluate or execute subworkflows. (default:
False)
GROUPING:
--groups GROUPS [GROUPS ...]
Assign rules to groups (this overwrites any group
definitions from the workflow). (default: None)
--group-components GROUP_COMPONENTS [GROUP_COMPONENTS ...]
Set the number of connected components a group is
allowed to span. By default, this is 1, but this flag
allows to extend this. This can be used to run e.g. 3
jobs of the same rule in the same group, although they
are not connected. It can be helpful for putting
together many small jobs or benefitting of shared
memory setups. (default: None)
REPORTS:
--report [FILE] Create an HTML report with results and statistics.
This can be either a .html file or a .zip file. In the
former case, all results are embedded into the .html
(this only works for small data). In the latter case,
results are stored along with a file report.html in
the zip archive. If no filename is given, an embedded
report.html is the default. (default: None)
--report-stylesheet CSSFILE
Custom stylesheet to use for report. In particular,
this can be used for branding the report with e.g. a
custom logo, see docs. (default: None)
NOTEBOOKS:
--draft-notebook TARGET
Draft a skeleton notebook for the rule used to
generate the given target file. This notebook can then
be opened in a jupyter server, executed and
implemented until ready. After saving, it will
automatically be reused in non-interactive mode by
Snakemake for subsequent jobs. (default: None)
--edit-notebook TARGET
Interactively edit the notebook associated with the
rule used to generate the given target file. This will
start a local jupyter notebook server. Any changes to
the notebook should be saved, and the server has to be
stopped by closing the notebook and hitting the 'Quit'
button on the jupyter dashboard. Afterwards, the
updated notebook will be automatically stored in the
path defined in the rule. If the notebook is not yet
present, this will create an empty draft. (default:
None)
--notebook-listen IP:PORT
The IP address and PORT the notebook server used for
editing the notebook (--edit-notebook) will listen on.
(default: localhost:8888)
UTILITIES:
--lint [{text,json}] Perform linting on the given workflow. This will print
snakemake specific suggestions to improve code quality
(work in progress, more lints to be added in the
future). If no argument is provided, plain text output
is used. (default: None)
--generate-unit-tests [TESTPATH]
Automatically generate unit tests for each workflow
rule. This assumes that all input files of each job
are already present. Rules without a job with present
input files will be skipped (a warning will be
issued). For each rule, one test case will be created
in the specified test folder (.tests/unit by default).
After successful execution, tests can be run with
'pytest TESTPATH'. (default: None)
--containerize Print a Dockerfile that provides an execution
environment for the workflow, including all conda
environments. (default: False)
--export-cwl FILE Compile workflow to CWL and store it in given FILE.
(default: None)
--list, -l Show available rules in given Snakefile. (default:
False)
--list-target-rules, --lt
Show available target rules in given Snakefile.
(default: False)
--dag Do not execute anything and print the directed acyclic
graph of jobs in the dot language. Recommended use on
Unix systems: snakemake --dag | dot | display. Note
print statements in your Snakefile may interfere with
visualization. (default: False)
--rulegraph Do not execute anything and print the dependency graph
of rules in the dot language. This will be less
crowded than above DAG of jobs, but also show less
information. Note that each rule is displayed once,
hence the displayed graph will be cyclic if a rule
appears in several steps of the workflow. Use this if
above option leads to a DAG that is too large.
Recommended use on Unix systems: snakemake --rulegraph
| dot | display. Note print statements in your
Snakefile may interfere with visualization. (default:
False)
--filegraph Do not execute anything and print the dependency graph
of rules with their input and output files in the dot
language. This is an intermediate solution between
above DAG of jobs and the rule graph. Note that each
rule is displayed once, hence the displayed graph will
be cyclic if a rule appears in several steps of the
workflow. Use this if above option leads to a DAG that
is too large. Recommended use on Unix systems:
snakemake --filegraph | dot | display. Note print
statements in your Snakefile may interfere with
visualization. (default: False)
--d3dag Print the DAG in D3.js compatible JSON format.
(default: False)
--summary, -S Print a summary of all files created by the workflow.
The has the following columns: filename, modification
time, rule version, status, plan. Thereby rule version
contains the versionthe file was created with (see the
version keyword of rules), and status denotes whether
the file is missing, its input files are newer or if
version or implementation of the rule changed since
file creation. Finally the last column denotes whether
the file will be updated or created during the next
workflow execution. (default: False)
--detailed-summary, -D
Print a summary of all files created by the workflow.
The has the following columns: filename, modification
time, rule version, input file(s), shell command,
status, plan. Thereby rule version contains the
version the file was created with (see the version
keyword of rules), and status denotes whether the file
is missing, its input files are newer or if version or
implementation of the rule changed since file
creation. The input file and shell command columns are
self explanatory. Finally the last column denotes
whether the file will be updated or created during the
next workflow execution. (default: False)
--archive FILE Archive the workflow into the given tar archive FILE.
The archive will be created such that the workflow can
be re-executed on a vanilla system. The function needs
conda and git to be installed. It will archive every
file that is under git version control. Note that it
is best practice to have the Snakefile, config files,
and scripts under version control. Hence, they will be
included in the archive. Further, it will add input
files that are not generated by by the workflow itself
and conda environments. Note that symlinks are
dereferenced. Supported formats are .tar, .tar.gz,
.tar.bz2 and .tar.xz. (default: None)
--cleanup-metadata FILE [FILE ...], --cm FILE [FILE ...]
Cleanup the metadata of given files. That means that
snakemake removes any tracked version info, and any
marks that files are incomplete. (default: None)
--cleanup-shadow Cleanup old shadow directories which have not been
deleted due to failures or power loss. (default:
False)
--skip-script-cleanup
Don't delete wrapper scripts used for execution
(default: False)
--unlock Remove a lock on the working directory. (default:
False)
--list-version-changes, --lv
List all output files that have been created with a
different version (as determined by the version
keyword). (default: False)
--list-code-changes, --lc
List all output files for which the rule body (run or
shell) have changed in the Snakefile. (default: False)
--list-input-changes, --li
List all output files for which the defined input
files have changed in the Snakefile (e.g. new input
files were added in the rule definition or files were
renamed). For listing input file modification in the
filesystem, use --summary. (default: False)
--list-params-changes, --lp
List all output files for which the defined params
have changed in the Snakefile. (default: False)
--list-untracked, --lu
List all files in the working directory that are not
used in the workflow. This can be used e.g. for
identifying leftover files. Hidden files and
directories are ignored. (default: False)
--delete-all-output Remove all files generated by the workflow. Use
together with --dry-run to list files without actually
deleting anything. Note that this will not recurse
into subworkflows. Write-protected files are not
removed. Nevertheless, use with care! (default: False)
--delete-temp-output Remove all temporary files generated by the workflow.
Use together with --dry-run to list files without
actually deleting anything. Note that this will not
recurse into subworkflows. (default: False)
--bash-completion Output code to register bash completion for snakemake.
Put the following in your .bashrc (including the
accents): `snakemake --bash-completion` or issue it in
an open terminal session. (default: False)
--keep-incomplete Do not remove incomplete output files by failed jobs.
(default: False)
--drop-metadata Drop metadata file tracking information after job
finishes. Provenance-information based reports (e.g.
--report and the --list_x_changes functions) will be
empty or incomplete. (default: False)
--version, -v show program's version number and exit
OUTPUT:
--reason, -r Print the reason for each executed rule (deprecated,
always true now). (default: False)
--gui [PORT] Serve an HTML based user interface to the given
network and port e.g. 168.129.10.15:8000. By default
Snakemake is only available in the local network
(default port: 8000). To make Snakemake listen to all
ip addresses add the special host address 0.0.0.0 to
the url (0.0.0.0:8000). This is important if Snakemake
is used in a virtualised environment like Docker. If
possible, a browser window is opened. (default: None)
--printshellcmds, -p Print out the shell commands that will be executed.
(default: False)
--debug-dag Print candidate and selected jobs (including their
wildcards) while inferring DAG. This can help to debug
unexpected DAG topology or errors. (default: False)
--stats FILE Write stats about Snakefile execution in JSON format
to the given file. (default: None)
--nocolor Do not use a colored output. (default: False)
--quiet [{progress,rules,all} ...], -q [{progress,rules,all} ...]
Do not output certain information. If used without
arguments, do not output any progress or rule
information. Defining 'all' results in no information
being printed at all. (default: None)
--print-compilation Print the python representation of the workflow.
(default: False)
--verbose Print debugging output. (default: False)
BEHAVIOR:
--force-use-threads Force threads rather than processes. Helpful if shared
memory (/dev/shm) is full or unavailable. (default:
False)
--allow-ambiguity, -a
Don't check for ambiguous rules and simply use the
first if several can produce the same file. This
allows the user to prioritize rules by their order in
the snakefile. (default: False)
--nolock Do not lock the working directory (default: False)
--ignore-incomplete, --ii
Do not check for incomplete output files. (default:
False)
--max-inventory-time SECONDS
Spend at most SECONDS seconds to create a file
inventory for the working directory. The inventory
vastly speeds up file modification and existence
checks when computing which jobs need to be executed.
However, creating the inventory itself can be slow,
e.g. on network file systems. Hence, we do not spend
more than a given amount of time and fall back to
individual checks for the rest. (default: 20)
--latency-wait SECONDS, --output-wait SECONDS, -w SECONDS
Wait given seconds if an output file of a job is not
present after the job finished. This helps if your
filesystem suffers from latency (default 5). (default:
5)
--wait-for-files [FILE ...]
Wait --latency-wait seconds for these files to be
present before executing the workflow. This option is
used internally to handle filesystem latency in
cluster environments. (default: None)
--wait-for-files-file FILE
Same behaviour as --wait-for-files, but file list is
stored in file instead of being passed on the
commandline. This is useful when the list of files is
too long to be passed on the commandline. (default:
None)
--notemp, --nt Ignore temp() declarations. This is useful when
running only a part of the workflow, since temp()
would lead to deletion of probably needed files by
other parts of the workflow. (default: False)
--all-temp Mark all output files as temp files. This can be
useful for CI testing, in order to save space.
(default: False)
--keep-remote Keep local copies of remote input files. (default:
False)
--keep-target-files Do not adjust the paths of given target files relative
to the working directory. (default: False)
--allowed-rules ALLOWED_RULES [ALLOWED_RULES ...]
Only consider given rules. If omitted, all rules in
Snakefile are used. Note that this is intended
primarily for internal use and may lead to unexpected
results otherwise. (default: None)
--target-jobs TARGET_JOBS [TARGET_JOBS ...]
Target particular jobs by
RULE:WILDCARD1=VALUE,WILDCARD2=VALUE,... This is meant
for internal use by Snakemake itself only. (default:
None)
--local-groupid LOCAL_GROUPID
Name for local groupid, meant for internal use only.
(default: local)
--max-jobs-per-second MAX_JOBS_PER_SECOND
Maximal number of cluster/drmaa jobs per second,
default is 10, fractions allowed. (default: 10)
--max-status-checks-per-second MAX_STATUS_CHECKS_PER_SECOND
Maximal number of job status checks per second,
default is 10, fractions allowed. (default: 10)
-T RETRIES, --retries RETRIES, --restart-times RETRIES
Number of times to restart failing jobs (defaults to
0). (default: 0)
--attempt ATTEMPT Internal use only: define the initial value of the
attempt parameter (default: 1). (default: 1)
--wrapper-prefix WRAPPER_PREFIX
Prefix for URL created from wrapper directive
(default: https://github.com/snakemake/snakemake-
wrappers/raw/). Set this to a different URL to use
your fork or a local clone of the repository, e.g.,
use a git URL like
'git+file://path/to/your/local/clone@'. (default:
https://github.com/snakemake/snakemake-wrappers/raw/)
--default-remote-provider {S3,GS,FTP,SFTP,S3Mocked,gfal,gridftp,iRODS,AzBlob,XRootD}
Specify default remote provider to be used for all
input and output files that don't yet specify one.
(default: None)
--default-remote-prefix DEFAULT_REMOTE_PREFIX
Specify prefix for default remote provider. E.g. a
bucket name. (default: )
--no-shared-fs Do not assume that jobs share a common file system.
When this flag is activated, Snakemake will assume
that the filesystem on a cluster node is not shared
with other nodes. For example, this will lead to
downloading remote files on each cluster node
separately. Further, it won't take special measures to
deal with filesystem latency issues. This option will
in most cases only make sense in combination with
--default-remote-provider. Further, when using
--cluster you will have to also provide --cluster-
status. Only activate this if you know what you are
doing. (default: False)
--greediness GREEDINESS
Set the greediness of scheduling. This value between 0
and 1 determines how careful jobs are selected for
execution. The default value (1.0) provides the best
speed and still acceptable scheduling quality.
(default: None)
--no-hooks Do not invoke onstart, onsuccess or onerror hooks
after execution. (default: False)
--overwrite-shellcmd OVERWRITE_SHELLCMD
Provide a shell command that shall be executed instead
of those given in the workflow. This is for debugging
purposes only. (default: None)
--debug Allow to debug rules with e.g. PDB. This flag allows
to set breakpoints in run blocks. (default: False)
--runtime-profile FILE
Profile Snakemake and write the output to FILE. This
requires yappi to be installed. (default: None)
--mode {0,1,2} Set execution mode of Snakemake (internal use only).
(default: 0)
--show-failed-logs Automatically display logs of failed jobs. (default:
False)
--log-handler-script FILE
Provide a custom script containing a function 'def
log_handler(msg):'. Snakemake will call this function
for every logging output (given as a dictionary
msg)allowing to e.g. send notifications in the form of
e.g. slack messages or emails. (default: None)
--log-service {none,slack,wms}
Set a specific messaging service for logging
output.Snakemake will notify the service on errors and
completed execution.Currently slack and workflow
management system (wms) are supported. (default: None)
SLURM:
--slurm Execute snakemake rules as SLURM batch jobs according
to their 'resources' definition. SLURM resources as
'partition', 'ntasks', 'cpus', etc. need to be defined
per rule within the 'resources' definition. Note, that
memory can only be defined as 'mem_mb' or
'mem_mb_per_cpu' as analogous to the SLURM 'mem' and
'mem-per-cpu' flags to sbatch, respectively. Here, the
unit is always 'MiB'. In addition '--
default_resources' should contain the SLURM account.
(default: False)
CLUSTER:
--cluster CMD Execute snakemake rules with the given submit command,
e.g. qsub. Snakemake compiles jobs into scripts that
are submitted to the cluster with the given command,
once all input files for a particular job are present.
The submit command can be decorated to make it aware
of certain job properties (name, rulename, input,
output, params, wildcards, log, threads and
dependencies (see the argument below)), e.g.: $
snakemake --cluster 'qsub -pe threaded {threads}'.
(default: None)
--cluster-sync CMD cluster submission command will block, returning the
remote exitstatus upon remote termination (for
example, this should be usedif the cluster command is
'qsub -sync y' (SGE) (default: None)
--drmaa [ARGS] Execute snakemake on a cluster accessed via DRMAA,
Snakemake compiles jobs into scripts that are
submitted to the cluster with the given command, once
all input files for a particular job are present. ARGS
can be used to specify options of the underlying
cluster system, thereby using the job properties name,
rulename, input, output, params, wildcards, log,
threads and dependencies, e.g.: --drmaa ' -pe threaded
{threads}'. Note that ARGS must be given in quotes and
with a leading whitespace. (default: None)
--cluster-config FILE, -u FILE
A JSON or YAML file that defines the wildcards used in
'cluster'for specific rules, instead of having them
specified in the Snakefile. For example, for rule
'job' you may define: { 'job' : { 'time' : '24:00:00'
} } to specify the time for rule 'job'. You can
specify more than one file. The configuration files
are merged with later values overriding earlier ones.
This option is deprecated in favor of using --profile,
see docs. (default: [])
--immediate-submit, --is
Immediately submit all jobs to the cluster instead of
waiting for present input files. This will fail,
unless you make the cluster aware of job dependencies,
e.g. via: $ snakemake --cluster 'sbatch --dependency
{dependencies}. Assuming that your submit script (here
sbatch) outputs the generated job id to the first
stdout line, {dependencies} will be filled with space
separated job ids this job depends on. Does not work
for workflows that contain checkpoint rules. (default:
False)
--jobscript SCRIPT, --js SCRIPT
Provide a custom job script for submission to the
cluster. The default script resides as 'jobscript.sh'
in the installation directory. (default: None)
--jobname NAME, --jn NAME
Provide a custom name for the jobscript that is
submitted to the cluster (see --cluster). NAME is
"snakejob.{name}.{jobid}.sh" per default. The wildcard
{jobid} has to be present in the name. (default:
snakejob.{name}.{jobid}.sh)
--cluster-status CLUSTER_STATUS
Status command for cluster execution. This is only
considered in combination with the --cluster flag. If
provided, Snakemake will use the status command to
determine if a job has finished successfully or
failed. For this it is necessary that the submit
command provided to --cluster returns the cluster job
id. Then, the status command will be invoked with the
job id. Snakemake expects it to return 'success' if
the job was successful, 'failed' if the job failed and
'running' if the job still runs. (default: None)
--cluster-cancel CLUSTER_CANCEL
Specify a command that allows to stop currently
running jobs. The command will be passed a single
argument, the job id. (default: None)
--cluster-cancel-nargs CLUSTER_CANCEL_NARGS
Specify maximal number of job ids to pass to
--cluster-cancel command, defaults to 1000. (default:
1000)
--cluster-sidecar CLUSTER_SIDECAR
Optional command to start a sidecar process during
cluster execution. Only active when --cluster is given
as well. (default: None)
--drmaa-log-dir DIR Specify a directory in which stdout and stderr files
of DRMAA jobs will be written. The value may be given
as a relative path, in which case Snakemake will use
the current invocation directory as the origin. If
given, this will override any given '-o' and/or '-e'
native specification. If not given, all DRMAA stdout
and stderr files are written to the current working
directory. (default: None)
FLUX:
--flux Execute your workflow on a flux cluster. Flux can work
with both a shared network filesystem (like NFS) or
without. If you don't have a shared filesystem,
additionally specify --no-shared-fs. (default: False)
KUBERNETES:
--kubernetes [NAMESPACE]
Execute workflow in a kubernetes cluster (in the
cloud). NAMESPACE is the namespace you want to use for
your job (if nothing specified: 'default'). Usually,
this requires --default-remote-provider and --default-
remote-prefix to be set to a S3 or GS bucket where
your . data shall be stored. It is further advisable
to activate conda integration via --use-conda.
(default: None)
--container-image IMAGE
Docker image to use, e.g., when submitting jobs to
kubernetes Defaults to
'https://hub.docker.com/r/snakemake/snakemake', tagged
with the same version as the currently running
Snakemake instance. Note that overwriting this value
is up to your responsibility. Any used image has to
contain a working snakemake installation that is
compatible with (or ideally the same as) the currently
running version. (default: None)
--k8s-cpu-scalar FLOAT
K8s reserves some proportion of available CPUs for its
own use. So, where an underlying node may have 8 CPUs,
only e.g. 7600 milliCPUs are allocatable to k8s pods
(i.e. snakemake jobs). As 8 > 7.6, k8s can't find a
node with enough CPU resource to run such jobs. This
argument acts as a global scalar on each job's CPU
request, so that e.g. a job whose rule definition asks
for 8 CPUs will request 7600m CPUs from k8s, allowing
it to utilise one entire node. N.B: the job itself
would still see the original value, i.e. as the value
substituted in {threads}. (default: 0.95)
TIBANNA:
--tibanna Execute workflow on AWS cloud using Tibanna. This
requires --default-remote-prefix to be set to S3
bucket name and prefix (e.g.
'bucketname/subdirectory') where input is already
stored and output will be sent to. Using --tibanna
implies --default-resources is set as default.
Optionally, use --precommand to specify any
preparation command to run before snakemake command on
the cloud (inside snakemake container on Tibanna VM).
Also, --use-conda, --use-singularity, --config,
--configfile are supported and will be carried over.
(default: False)
--tibanna-sfn TIBANNA_SFN
Name of Tibanna Unicorn step function (e.g.
tibanna_unicorn_monty).This works as serverless
scheduler/resource allocator and must be deployed
first using tibanna cli. (e.g. tibanna deploy_unicorn
--usergroup=monty --buckets=bucketname) (default:
None)
--precommand PRECOMMAND
Any command to execute before snakemake command on AWS
cloud such as wget, git clone, unzip, etc. This is
used with --tibanna.Do not include input/output
download/upload commands - file transfer between S3
bucket and the run environment (container) is
automatically handled by Tibanna. (default: None)
--tibanna-config TIBANNA_CONFIG [TIBANNA_CONFIG ...]
Additional tibanna config e.g. --tibanna-config
spot_instance=true subnet=<subnet_id> security
group=<security_group_id> (default: None)
GOOGLE_LIFE_SCIENCE:
--google-lifesciences
Execute workflow on Google Cloud cloud using the
Google Life. Science API. This requires default
application credentials (json) to be created and
export to the environment to use Google Cloud Storage,
Compute Engine, and Life Sciences. The credential file
should be exported as GOOGLE_APPLICATION_CREDENTIALS
for snakemake to discover. Also, --use-conda, --use-
singularity, --config, --configfile are supported and
will be carried over. (default: False)
--google-lifesciences-regions GOOGLE_LIFESCIENCES_REGIONS [GOOGLE_LIFESCIENCES_REGIONS ...]
Specify one or more valid instance regions (defaults
to US) (default: ['us-east1', 'us-west1', 'us-
central1'])
--google-lifesciences-location GOOGLE_LIFESCIENCES_LOCATION
The Life Sciences API service used to schedule the
jobs. E.g., us-centra1 (Iowa) and europe-west2
(London) Watch the terminal output to see all options
found to be available. If not specified, defaults to
the first found with a matching prefix from regions
specified with --google-lifesciences-regions.
(default: None)
--google-lifesciences-keep-cache
Cache workflows in your Google Cloud Storage Bucket
specified by --default-remote-prefix/{source}/{cache}.
Each workflow working directory is compressed to a
.tar.gz, named by the hash of the contents, and kept
in Google Cloud Storage. By default, the caches are
deleted at the shutdown step of the workflow.
(default: False)
TES:
--tes URL Send workflow tasks to GA4GH TES server specified by
url. (default: None)
CONDA:
--use-conda If defined in the rule, run job in a conda
environment. If this flag is not set, the conda
directive is ignored. (default: False)
--conda-not-block-search-path-envvars
Do not block environment variables that modify the
search path (R_LIBS, PYTHONPATH, PERL5LIB, PERLLIB)
when using conda environments. (default: False)
--list-conda-envs List all conda environments and their location on
disk. (default: False)
--conda-prefix DIR Specify a directory in which the 'conda' and 'conda-
archive' directories are created. These are used to
store conda environments and their archives,
respectively. If not supplied, the value is set to the
'.snakemake' directory relative to the invocation
directory. If supplied, the `--use-conda` flag must
also be set. The value may be given as a relative
path, which will be extrapolated to the invocation
directory, or as an absolute path. The value can also
be provided via the environment variable
$SNAKEMAKE_CONDA_PREFIX. (default: None)
--conda-cleanup-envs Cleanup unused conda environments. (default: False)
--conda-cleanup-pkgs [{tarballs,cache}]
Cleanup conda packages after creating environments. In
case of 'tarballs' mode, will clean up all downloaded
package tarballs. In case of 'cache' mode, will
additionally clean up unused package caches. If mode
is omitted, will default to only cleaning up the
tarballs. (default: None)
--conda-create-envs-only
If specified, only creates the job-specific conda
environments then exits. The `--use-conda` flag must
also be set. (default: False)
--conda-frontend {conda,mamba}
Choose the conda frontend for installing environments.
Mamba is much faster and highly recommended. (default:
mamba)
SINGULARITY:
--use-singularity If defined in the rule, run job within a singularity
container. If this flag is not set, the singularity
directive is ignored. (default: False)
--singularity-prefix DIR
Specify a directory in which singularity images will
be stored.If not supplied, the value is set to the
'.snakemake' directory relative to the invocation
directory. If supplied, the `--use-singularity` flag
must also be set. The value may be given as a relative
path, which will be extrapolated to the invocation
directory, or as an absolute path. (default: None)
--singularity-args ARGS
Pass additional args to singularity. (default: )
--cleanup-containers Remove unused (singularity) containers (default:
False)
ENVIRONMENT MODULES:
--use-envmodules If defined in the rule, run job within the given
environment modules, loaded in the given order. This
can be combined with --use-conda and --use-
singularity, which will then be only used as a
fallback for rules which don't define environment
modules. (default: False)
If an arg is specified in more than one place, then commandline values
override environment variables which override defaults.
Installation on mega-linter Docker image
- PIP packages (Python):