Below are some additional Scikick commands that should enhance usage throughout project development.

sk status -v

sk status -v can be used to view the full Scikick analysis configuration. Dependencies for each file are indented just as they are formatted in scikick.yml. Out-of-date files are marked with a three-symbol code that shows why they will be updated on the next sk run.

sk mv

sk mv can be used while rearranging files in the project to adjust the workflow definition in tandem with the file moves.

mkdir code
sk mv hw.Rmd code/hw.Rmd

If you are using git, use sk mv -g to use git mv during this process. Both individual files and directories can be moved with sk mv.

sk del

sk del is the counterpart to sk add. For example, we can remove hw.Rmd from our analysis with

sk del hw.Rmd

Unlike sk add, when the ‘-d’ flag is used (with a dependency specified), sk del removes only the dependency; the notebook itself stays in the workflow.

sk del never deletes files from disk. To remove a notebook entirely, first remove it from the workflow with sk del and then delete the file using standard methods.

Project Templates with sk init Flags

Directories

To keep the project tidy, we can create some dedicated directories with

sk init --dirs
# creates:
# report/ - output directory for scikick
# output/ - directory for outputs from scripts
# code/ - directory containing scripts (Rmd and others)
# input/ - input data directory

Version Control

If git is in use for the project, tracking the report, output, and input directories is not recommended. They can be added to .gitignore with

sk init --git

and git will know to ignore the contents of these directories.
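
For projects where the ignore file is maintained by hand, the same idea can be sketched in Python. This is only an illustration: the exact patterns that sk init --git writes are not shown here, so the trailing-slash entries and the helper names missing_ignore_entries and update_gitignore are assumptions.

```python
from pathlib import Path

# Directories the text above recommends leaving untracked.
# The trailing-slash pattern form is an assumption, not necessarily
# what Scikick itself writes.
UNTRACKED_DIRS = ["report/", "output/", "input/"]


def missing_ignore_entries(existing_lines, wanted=UNTRACKED_DIRS):
    """Return the wanted patterns that are not already listed."""
    present = {line.strip() for line in existing_lines}
    return [pattern for pattern in wanted if pattern not in present]


def update_gitignore(path=".gitignore"):
    """Append any missing patterns to the ignore file; return what was added."""
    ignore_file = Path(path)
    text = ignore_file.read_text() if ignore_file.exists() else ""
    added = missing_ignore_entries(text.splitlines())
    if added:
        # Start on a fresh line if the file does not end with a newline.
        prefix = "" if text == "" or text.endswith("\n") else "\n"
        ignore_file.write_text(text + prefix + "\n".join(added) + "\n")
    return added
```

Calling update_gitignore() from the project root appends only the entries that are not already present, so it is safe to run more than once.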

README

A short template readme snippet is provided to inform readers that the project uses Scikick.

sk layout

sk layout can be used to configure the order of the menus and menu items in the final report.

Start by running the command without arguments:

sk layout

1:  hw
2:  greets
3:  dummy1
4:  dummy2

This returns the current ordered list of tab indices and their names.

The order can be changed by specifying the new order of tab indices, e.g.

# to reverse the tab order:
sk layout 4 3 2 1
# the list does not have to include all of the indices (1 to 4 in this case):
sk layout 4 # move tab 4 to the front
# the incomplete list '4' is interpreted as '4 1 2 3'

Output after running sk layout 4:

1:  dummy2
2:  hw
3:  greets
4:  dummy1
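
The completion rule for an incomplete list can be sketched in Python. This illustrates the behavior described above and is not Scikick's actual implementation; the function names complete_order and apply_layout are hypothetical.

```python
def complete_order(partial, n_tabs):
    """Expand a partial list of 1-based tab indices into a full ordering.

    Indices named in `partial` come first, in the order given; the
    remaining tabs follow in their current relative order.
    """
    if len(set(partial)) != len(partial):
        raise ValueError("tab indices must not repeat")
    if any(i < 1 or i > n_tabs for i in partial):
        raise ValueError("tab index out of range")
    rest = [i for i in range(1, n_tabs + 1) if i not in partial]
    return list(partial) + rest


def apply_layout(tabs, partial):
    """Return the tab names rearranged according to `partial`."""
    return [tabs[i - 1] for i in complete_order(partial, len(tabs))]


tabs = ["hw", "greets", "dummy1", "dummy2"]
print(complete_order([4], len(tabs)))     # [4, 1, 2, 3]
print(apply_layout(tabs, [4]))            # ['dummy2', 'hw', 'greets', 'dummy1']
print(apply_layout(tabs, [4, 3, 2, 1]))   # ['dummy2', 'dummy1', 'greets', 'hw']
```

The second printed list matches the output shown after running sk layout 4.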

Items within menus can be rearranged similarly with:

sk layout -s <menu name>

Snakemake Backend

Data pipelines benefit from improved workflow execution tools (Snakemake, Bpipe, Nextflow); however, ad hoc data analysis projects often do not use them. Users can quickly configure reports to take advantage of the Snakemake backend and pass Snakemake arguments with sk run -v -s <snakemake arguments>. Snakemake is responsible for:

- Basic dependency management (i.e. Make-like execution)
- Parallelization: sk run -s -j <number of cores> where scikick assumes each page uses just a single core.
- Distribution of tasks on compute clusters (Using snakemake’s --cluster or --profile arguments)
- Software virtualization with: Singularity, Docker, Conda
- Other snakemake functionality (via passed arguments)

sk config

sk config is used to add additional configurations to projects.

Singularity

In order to run all Rmds in a singularity image, specify the singularity image and use the singularity snakemake flag.

# specify a singularity image
sk config --singularity docker://rocker/tidyverse
# run the project within a singularity container
# by passing '--use-singularity' argument to Snakemake
sk run -v -s --use-singularity

Scripts will be run inside the singularity container. The container must have at least the R dependencies installed (most R-based containers have these packages installed).

Conda

Similar steps are used to execute projects in a conda environment. In this case, the config should point to a conda environment YAML file.

# create an env.yml file from the current conda environment
conda env export > env.yml
# specify that this file is the conda environment file
sk config --conda env.yml
# run
sk run -v -s --use-conda

Use of these methods can ensure executions have all required software.

Automated Re-Execution

These and other Snakemake features can make it more feasible to configure projects for automated re-execution on a remote server.

Interoperability with Other Workflows

Additional workflows written in Snakemake should play nicely with the Scikick workflow. By default, a Snakefile at the project root will be included in the sk run execution (the Scikick workflow uses the include: directive).

These jobs can be added to the beginning, middle, or end of Scikick related tasks:

Beginning
- sk add first_step.rmd -d pipeline_donefile (where pipeline_donefile is the last file generated by the Snakefile)
Middle
- Add report/out_md/first_step.md as the input to the first job of the Snakefile.
- sk add second_step.rmd -d pipeline_donefile
End
- Add report/out_md/last_step.md as the input to the first job of the Snakefile.
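
The "Middle" arrangement can be sketched as a small dependency graph to see the execution order it implies. This is an illustration using Python's standard graphlib module, not how Snakemake resolves jobs internally. The Snakefile's own jobs are collapsed into a single edge, and the output path report/out_md/second_step.md is an assumption that follows the naming pattern of the paths above.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key is built from the set of files it depends on.
# The Snakefile's own jobs are collapsed into the single edge
# first_step.md -> pipeline_donefile for brevity.
dependencies = {
    "report/out_md/first_step.md": {"first_step.rmd"},
    "pipeline_donefile": {"report/out_md/first_step.md"},
    # Assumed output path, following the report/out_md/ pattern.
    "report/out_md/second_step.md": {"second_step.rmd", "pipeline_donefile"},
}

# static_order() yields every file after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
for target in order:
    print(target)
```

Whatever order the independent source files are listed in, the first notebook's output always precedes pipeline_donefile, which in turn precedes the second notebook's output.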

Further, built-in Scikick methods can be overridden by custom rules that use the same inputs and outputs. The Snakemake ruleorder directive can then be used to prioritize the custom rules over the Scikick rules.

Python Usage

It can be useful to explore the Scikick configuration interactively in Python. Below, the configuration is read into Python and the project map is generated for viewing. This project map can be found on each page of the report site, where each node represents a page and links to its contents.

import scikick
import scikick.graph
scikick.graph.make_dag(scikick.ScikickConfig())

Advanced Usage Commands

21 July 2023