Introduction
The whirl package provides functionalities for executing scripts in batch while simultaneously getting logs from the individual executions. In the following sections, we will go through how to use the package.
Ways to call the run()
function
Single and multiple files, and wild cards
The input
argument in the run()
function
can in the most simple case point to a single file for which an
execution and log-generation is required.
The input
argument can also point to multiple files in a
directory using wildcard (or globbing) patterns. In this case, all
scripts in the directory will be executed in parallel and a log will be
generated for each script. The number of workers can be specified
through the n_workers
argument (default is set to one). A
summary log file will also be provided as a fast way to get an overview
of the execution. The location of the summary file can be controlled
with the summary_file
argument.
# Execution of all R files in a specific directory
run(
input = "path/to/directory/*.R",
n_workers = 4,
summary_file = "path/to/summary"
)
# Execution of all R files starting with "mk200" in a specific directory
run(
input = "path/to/directory/mk200*.R",
n_workers = 8,
summary_file = "path/to/summary"
)
More information on how the wildcards are interpreted see
Sys.glob()
.
It is also possible to provide a character vector of several paths
(either single files or glob expression) that should be executed. Note
that whenever the input
argument in is supplied with a
character vector
(e.g. c("path/to/script1.R", "path/to/script2.R")
) it
assumes that these can be executed independently and in parallel. If the
elements needs to be executed sequentially this can be achieved by using
a list()
instead (see below).
Using list()
as input
If the scripts have to be executed in a specific order, the
input
argument can be supplied as a list. The scripts will
then be executed in the order they are listed in the list, with scripts
listed in the same element being executed in parallel (if
n_workers
> 1).
# In the below example, script1.R and script2.R will be executed in parallel
run(
input = c("path/to/script1.R",
"path/to/script2.R"),
n_workers = 2
)
# In the below example, script1.R and script2.R will be executed in parallel,
# and all R files in the directory will subsequently be executed in parallel
run(
input = list(
c("path/to/script1.R", "path/to/script2.R"),
"path/to/directory/*.R"
),
n_workers = 2
)
# In the below example, script1.R and script2.R will be executed in parallel, and subsequently
# script3.R and script4.R will be executed in parallel
run(
input = list(
c("path/to/script1.R", "path/to/script2.R"),
c("path/to/script3.R", "path/to/script4.R")
),
n_workers = 2
)
The list can also be supplied with names list elements. This can be useful during execution as some of these ‘name’ will be printed to the console.
E.g.
Using a configuration file as input
The execution order can also be pre-specified in a configuration file (config file for short). The config file could have the following structure.
steps:
- name: "Step 1"
paths:
- "path/to/script1.R"
- "path/to/script2.R"
- name: "Step 2"
paths:
- "path/to/script3.R"
- "path/to/script4.R"
In this case, the input
argument in the
run()
function should point to the config file. Assuming
the config file is called config.yaml
, the execution can be
initiated as follows:
run(input = "path/to/config.yaml", n_workers = 4)
Each steps in the config file will be executed independently while
scrips within each step will be executed in parallel using the number of
workers specified with the n_workers
argument.
Adjusting the log directory
How to use the log_dir
argument to specify where to
store the logs
When executing run()
the default is to store logs in the
directory where the individual scripts are located. For example, if we
apply run()
to a a vector of scripts with the following
paths c(path/to/dir1/script1.R, path/to/dir2/script2.R)
,
the log of script1.R and script2.R will be stored in
path/to/dir1 and path/to/dir2,
respectively.
If the logs should be stored in a different directory, the
log_dir
argument can be used. This argument can be supplied
with a character string or a function. Note that in either case the
directory that log_dir
points to must exist before the
execution is initiated.
If the log_dir
is supplied with a character pointing to
a specific path the call could look like:
run(input = "path/to/script.R", log_dir = "path/to/logs")
In this example the log of script.R will be stored in
path/to/logs.
Note that if multiple scripts are executed and log_dir
is a
character to a path, then every log will be redirected to the same
directory - in this case path/to/logs.
If a more dynamic approach is needed the log_dir
argument can also be supplied with a function that will be applied to
the individual path of every script.
For example, if multiple script are executed and the logs needs to be
stored in a sub-folder within the script directories this could be
achieved by:
run(input = c("path/to/dir1/script1.R", "path/to/dir2/script2.R"),
log_dir = function(x) {paste0(dirname(x), "/logs")})
In this exampel the log of script1.R will be stored in path/to/dir1/logs and the log of script2.R will be stored in path/to/dir2/logs.
Note that x refer to the path of the script that is being executed.