Below are some additional Scikick commands that can help throughout project development.
sk status -v can be used to view the full Scikick analysis configuration.
Dependencies for each file are indented, mirroring how they are formatted in scikick.yml.
Out-of-date files are marked with a three-symbol code that shows the reason they will be updated on the next sk run.
sk mv can be used when rearranging files in the project, so that the workflow definition is adjusted along with the file moves.
mkdir code
sk mv hw.Rmd code/hw.Rmd
If you are using git, use sk mv -g to run git mv during this process.
Both individual files and directories can be moved with sk mv.
sk del is the counterpart to sk add. For example, we can remove hw.Rmd from our analysis with
sk del hw.Rmd
Unlike sk add, if the -d flag is used (with a dependency specified), only that dependency is removed.
sk del never deletes files. To remove a notebook entirely, first drop it from the workflow with sk del, then delete the notebook file using standard methods.
To make our project tidier, we can create some dedicated directories with
sk init --dirs
# creates:
# report/ - output directory for scikick
# output/ - directory for outputs from scripts
# code/ - directory containing scripts (Rmd and others)
# input/ - input data directory
If git is used for the project, it is not recommended to track the report, output, and input directories.
They can be added to .gitignore with
sk init --git
so that git ignores the contents of these directories.
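To make the effect of these two commands concrete, here is a minimal Python sketch of a roughly equivalent setup done by hand. It is not how Scikick implements sk init: the helper names setup_project_dirs and add_to_gitignore are hypothetical, and the "name/" pattern format written to .gitignore is an assumption (Scikick may write different entries).

```python
from pathlib import Path
from typing import List

# Directories listed for sk init --dirs in this guide.
PROJECT_DIRS = ["report", "output", "code", "input"]
# Directories the guide recommends keeping out of git.
UNTRACKED_DIRS = ["report", "output", "input"]


def setup_project_dirs(root: Path) -> None:
    """Create the project layout (hypothetical stand-in for sk init --dirs)."""
    for name in PROJECT_DIRS:
        (root / name).mkdir(parents=True, exist_ok=True)


def add_to_gitignore(root: Path) -> List[str]:
    """Append missing "<dir>/" patterns to .gitignore and return what was added.

    Hypothetical stand-in for sk init --git; the pattern format is an assumption.
    """
    gitignore = root / ".gitignore"
    text = gitignore.read_text() if gitignore.exists() else ""
    existing = set(text.splitlines())
    missing = [f"{name}/" for name in UNTRACKED_DIRS if f"{name}/" not in existing]
    if missing:
        # Make sure the new entries start on their own line.
        prefix = "\n" if text and not text.endswith("\n") else ""
        gitignore.write_text(text + prefix + "\n".join(missing) + "\n")
    return missing
```

Calling setup_project_dirs(Path(".")) and then add_to_gitignore(Path(".")) from the project root would create the layout and return the patterns that were appended; running it a second time appends nothing, because existing entries are skipped.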
A short template readme snippet is provided to inform readers that the project uses Scikick.
sk layout can be used to configure the order of the menus and menu items in the final report.
Start by running the command without arguments
sk layout
1: hw
2: greets
3: dummy1
4: dummy2
This returns the current ordered list of tab indices and their names.
The order can be changed by specifying the new order of tab indices, e.g.
# to reverse the tab order:
sk layout 4 3 2 1
# the list does not have to include all of the indices (1 to 4 in this case):
sk layout 4 # move tab 4 to the front
# the incomplete list '4' is interpreted as '4 1 2 3'
Output after running sk layout 4:
1: dummy2
2: hw
3: greets
4: dummy1
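The rule for incomplete index lists can be illustrated with a short Python sketch that reproduces the output above. It is not Scikick's implementation: complete_order and reorder are hypothetical names, and rejecting duplicate or out-of-range indices is an assumption rather than documented sk layout behavior.

```python
def complete_order(requested, n_tabs):
    """Expand a possibly partial list of 1-based tab indices into a full order.

    Explicitly requested indices come first; the remaining tabs follow in
    their current order (e.g. [4] with 4 tabs -> [4, 1, 2, 3]).
    """
    if len(set(requested)) != len(requested):
        raise ValueError("duplicate tab index")
    if any(i < 1 or i > n_tabs for i in requested):
        raise ValueError("tab index out of range")
    rest = [i for i in range(1, n_tabs + 1) if i not in requested]
    return list(requested) + rest


def reorder(tabs, requested):
    """Return tab names in the new order given 1-based indices."""
    return [tabs[i - 1] for i in complete_order(requested, len(tabs))]


tabs = ["hw", "greets", "dummy1", "dummy2"]
for position, name in enumerate(reorder(tabs, [4]), start=1):
    print(f"{position}: {name}")
```

Explicitly listed indices keep the order they are given in, and every omitted tab keeps its current relative position, which is why 4 expands to 4 1 2 3.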
Items within menus can be rearranged similarly with:
sk layout -s <menu name>
Data pipelines benefit from workflow execution tools such as Snakemake, Bpipe, and Nextflow; however, ad hoc data analysis projects often do not use these tools.
Users can quickly configure reports to take advantage of the Snakemake backend by passing Snakemake arguments with sk run -v -s <snakemake arguments>.
Snakemake is responsible for parallel execution, for example with
sk run -s -j <number of cores>
where Scikick assumes each page uses just a single core, and for cluster execution (e.g. via Snakemake's --cluster or --profile arguments).
sk config is used to add additional configuration to a project.
To run all Rmds inside a Singularity image, specify the image with sk config and pass Snakemake's Singularity flag to sk run.
# specify a singularity image
sk config --singularity docker://rocker/tidyverse
# run the project within a singularity container
# by passing '--use-singularity' argument to Snakemake
sk run -v -s --use-singularity
Scripts will be run inside the singularity container. The container must have at least the R dependencies installed (most R-based containers have these packages installed).
Similar steps are used to execute projects in a conda environment. In this case, the config should point to a conda environment YAML file.
# create an env.yml file from the current conda environment
conda env export > env.yml
# specify that this file is the conda environment file
sk config --conda env.yml
# run
sk run -v -s --use-conda
These methods help ensure that executions have all the required software.
Together with other Snakemake features, they make it more feasible to configure projects for automated re-execution on a remote server.
Additional workflows written in Snakemake should integrate smoothly with the Scikick workflow. By default, a Snakefile at the project root is included in the sk run execution (the Scikick workflow uses Snakemake's include: directive).
These jobs can be added at the beginning, middle, or end of the Scikick-related tasks:
- Beginning: sk add first_step.rmd -d pipeline_donefile (where pipeline_donefile is the last file generated by the Snakefile).
- Middle: use report/out_md/first_step.md as the input to the first job of the Snakefile, then sk add second_step.rmd -d pipeline_donefile.
- End: use report/out_md/last_step.md as the input to the first job of the Snakefile.
Built-in Scikick rules can also be overridden by rules that use the same inputs and outputs. Snakemake's ruleorder directive can then prioritize these rules over the Scikick rules.
It can be useful to explore the Scikick configuration interactively from Python. Below, the Scikick configuration is read into Python and the project map is generated for viewing. This is the same project map shown on each page of the report site, where each node represents a page and links to its contents.
import scikick
import scikick.graph
# read the Scikick configuration into Python and generate the project map for viewing
scikick.graph.make_dag(scikick.ScikickConfig())