
ADR: mighty.toolbox integration
Aksel Thomsen
Ryszard Szymanski
2026-03-11
Status
| Field | Value |
|---|---|
| Package | mighty.metadata |
| Status | Approved |
| Version | 0.2.1 |
| Description | ADR for how mighty.metadata should integrate with mighty.toolbox |
Success criteria
- mighty.toolbox is able to generate a define.xml using metadata created with mighty.metadata
- We know how the mighty.metadata metadata is provided to mighty.toolbox, e.g. a path to a folder with YAML files or an R object
- We know whether things fall into mighty.metadata scope or mighty.toolbox scope
- An outline of the programming interface exists, e.g. user-facing functions
Context
- mighty.metadata::mighty_study() would provide the potential R object
- Planned utility function to create metadata datasets similar to existing sheets in the CST file and “md” datasets
- We want to make it easy to map mighty.metadata metadata into a list of data.frames (a structure already used by mighty.toolbox)
- mighty.toolbox::generate_define_xml() has the metadata_path argument, which currently accepts a path to an .xlsx file or a folder with YAML files
- Current schema of one metadata domain
- YAML metadata is planned to be version controlled, so it will always be saved to disk
- When dealing with pooled metadata, one YAML file might define the same domain in multiple studies, and a user would subset the metadata manually before using it
- Metadata can be in a “populated” state or not
- mighty.metadata::populate_sparse() and mighty.metadata::populate_core() are used to populate predecessor columns along with inherited metadata; these operations are assumed to be idempotent
- mighty.metadata is missing some fields required by mighty.toolbox:
  - mighty.metadata drops whereclauses that don’t use EQ as a comparator in source_values
  - xmlcodelist information does not get included in column_metadata in YAML files
  - Origin and Origindescription are missing in value metadata
  - include_in_submission/include_in_trial columns get ignored
  - Equivalent of the ‘Mapping’ sheet from the CST file
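A minimal sketch of mapping nested metadata into the list of data.frames structure mentioned above. The input is a plain R list shaped roughly like what yaml::read_yaml() would return for the parameters YAML shown under Implementation Details; the helper metadata_to_frames() and the output table names are hypothetical and not part of mighty.metadata or mighty.toolbox.

```r
# Hypothetical helper: flatten nested parameter metadata (shaped like a
# parsed YAML file) into a named list of data.frames.
metadata_to_frames <- function(metadata) {
  params <- metadata$parameters

  # One row per parameter
  parameter_df <- data.frame(
    id    = vapply(params, function(p) p$id, character(1)),
    label = vapply(params, function(p) p$label, character(1)),
    stringsAsFactors = FALSE
  )

  # One row per (parameter, column) pair
  column_rows <- lapply(params, function(p) {
    data.frame(
      parameter_id = p$id,
      column_id = vapply(p$columns, function(cl) cl$id, character(1)),
      origin    = vapply(p$columns, function(cl) cl$origin, character(1)),
      method    = vapply(p$columns, function(cl) cl$method, character(1)),
      stringsAsFactors = FALSE
    )
  })
  column_df <- do.call(rbind, column_rows)

  list(parameters = parameter_df, columns = column_df)
}

# Example input mirroring the two-parameter YAML example
metadata <- list(
  parameters = list(
    list(id = "A", label = "A",
         columns = list(list(id = "AVAL", origin = "Derived",
                             method = "Derivation method A"))),
    list(id = "B", label = "B",
         columns = list(list(id = "AVAL", origin = "Derived",
                             method = "Derivation method B")))
  )
)

frames <- metadata_to_frames(metadata)
frames$columns
```

Keeping one data.frame per metadata level (parameters, columns) is what makes the result easy to join on ids, which is the shape mighty.toolbox already works with.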
Decisions
mighty.toolbox needs mighty.metadata::mighty_study() for the integration
Pass a mighty.metadata::mighty_study() object
```r
mighty.metadata::mighty_study("path/to/folder") |>
  mighty.toolbox::generate_define_xml()
```

Later, information from the object would be retrieved through @ properties.
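A minimal sketch of an entry point that accepts either a path or a ready-made study object and then reads information through @ properties, as suggested in the Cons below. The real mighty_study class and its properties are defined in mighty.metadata and are not reproduced here: the S4 class study_stub, its slots, read_study_stub() (standing in for mighty.metadata::mighty_study()) and resolve_metadata() are all hypothetical.

```r
library(methods)

# Hypothetical stand-in for a mighty_study object; the real class and its
# properties live in mighty.metadata and may differ.
setClass("study_stub", representation(study_id = "character", domains = "list"))

# Hypothetical stand-in for mighty.metadata::mighty_study(path)
read_study_stub <- function(path) {
  new("study_stub", study_id = basename(path), domains = list())
}

# Sketch: accept either a single path or an already built study object
resolve_metadata <- function(metadata, reader = read_study_stub) {
  if (is.character(metadata) && length(metadata) == 1L) {
    # A path: read it first, as metadata_path is handled today
    metadata <- reader(metadata)
  }
  if (!is(metadata, "study_stub")) {
    stop("`metadata` must be a path or a study object", call. = FALSE)
  }
  metadata
}

from_path   <- resolve_metadata("path/to/folder")
from_object <- resolve_metadata(read_study_stub("path/to/folder"))

# Information is then retrieved through @ properties
from_path@study_id
```

Accepting both forms keeps the current path-based usage working while letting users read (and subset) the metadata as a separate step.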
Pros:
- Simple and user-friendly interface
- Supports pooled metadata, as users can subset the metadata before passing it to mighty.toolbox::generate_define_xml()
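A minimal sketch of the user-side subsetting step for pooled metadata. A plain data.frame stands in for the pooled metadata; the study and domain columns, their values and the subset_study() helper are hypothetical, and how a real mighty_study object exposes its studies is not specified in this ADR.

```r
# Hypothetical pooled metadata: the same domain (ADVS) defined for two studies
pooled <- data.frame(
  study  = c("STUDY1", "STUDY1", "STUDY2"),
  domain = c("ADSL", "ADVS", "ADVS"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only the rows that belong to one study
subset_study <- function(metadata, study_id) {
  metadata[metadata$study == study_id, , drop = FALSE]
}

# The subset, not the pooled metadata, is what gets passed on
study2 <- subset_study(pooled, "STUDY2")
study2
```

Because the object is built before it reaches mighty.toolbox, this step stays entirely in the user's hands and generate_define_xml() does not need to know about pooling.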
Cons:
- Requires mighty.toolbox to change the name of the input parameter, e.g. to metadata or x (users have already sometimes asked about reading metadata as a separate step; alternatively, the parameter could accept either a path or an object). A parameter name change could be avoided if mighty.metadata::mighty_study() is called internally.
- Need to add mighty.metadata to Suggests (as we need to create an object for testing). This is acceptable because both are internally developed packages and need to be aligned anyway.
- Validating the correctness of a mighty_study object can be tricky, e.g. making sure it is fully populated (Mitigation: as population is an idempotent operation, we can simply redo it, or mighty.metadata could export a validation function)
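A minimal sketch of the mitigation: a populate step that only fills missing values is idempotent, so re-running it on already populated metadata is harmless, and a simple check can confirm required fields are filled. populate_stub() and is_fully_populated() are hypothetical stand-ins, not mighty.metadata::populate_sparse(), mighty.metadata::populate_core() or an existing validation function.

```r
# Hypothetical stand-in for a populate step: fill missing column-level
# fields with values inherited from a parent level. Filling only NA cells
# makes the operation idempotent.
populate_stub <- function(columns, inherited) {
  for (field in names(inherited)) {
    missing <- is.na(columns[[field]])
    columns[[field]][missing] <- inherited[[field]]
  }
  columns
}

# Hypothetical validation: every required field is filled in
is_fully_populated <- function(columns, required) {
  all(vapply(required, function(f) !anyNA(columns[[f]]), logical(1)))
}

columns <- data.frame(
  id     = c("AVAL", "AVALC"),
  origin = c("Derived", NA),
  stringsAsFactors = FALSE
)
inherited <- list(origin = "Assigned")

once  <- populate_stub(columns, inherited)
twice <- populate_stub(once, inherited)

identical(once, twice)                 # TRUE: redoing the step changes nothing
is_fully_populated(columns, "origin")  # FALSE before populating
is_fully_populated(once, "origin")     # TRUE afterwards
```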
Alternatives Considered
How to pass mighty.metadata metadata to mighty.toolbox
Pass the folder with YAML files to mighty.toolbox::generate_define_xml() and use mighty.metadata::mighty_study() under the hood
```r
# mighty.metadata::mighty_study() is used under the hood
mighty.toolbox::generate_define_xml(metadata_path = "folder_with_yaml_metadata")
```

Pros:
- That’s how it currently works, so we avoid a breaking parameter change
- We keep control over how the files are read. For example, in the CST file, empty columns are read as logical by openxlsx2, which can cause issues with joins when the same column is not empty in another sheet, so giving users too much flexibility might be problematic
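A minimal base R sketch of the column-type problem described above and one way to normalise it: an all-empty column is logical, while the same column filled in another sheet is character. The sheets are built directly rather than read with openxlsx2, and coercing all-NA logical columns to character is an assumed fix for illustration, not necessarily how mighty.toolbox handles it; normalise_empty_columns() is hypothetical.

```r
# Hypothetical sheets: ORIGIN is empty in one and filled in the other
sheet_a <- data.frame(VARIABLE = c("AVAL", "AVALC"), ORIGIN = c(NA, NA),
                      stringsAsFactors = FALSE)
sheet_b <- data.frame(VARIABLE = c("AVAL", "AVALC"),
                      ORIGIN = c("Derived", "Assigned"),
                      stringsAsFactors = FALSE)

class(sheet_a$ORIGIN)  # "logical": an all-empty column carries no text type

# Convert every all-NA logical column to character so column types agree
normalise_empty_columns <- function(df) {
  df[] <- lapply(df, function(col) {
    if (is.logical(col) && all(is.na(col))) as.character(col) else col
  })
  df
}

sheet_a <- normalise_empty_columns(sheet_a)
class(sheet_a$ORIGIN)  # "character"

# Both sheets now share column types before they are combined or joined
combined <- rbind(sheet_a, sheet_b)
```

Doing this inside the reading step is exactly the kind of control that is lost when users are free to read the files however they like.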
Cons:
- Need to add mighty.metadata to Imports, since mighty.metadata::mighty_study() would be called directly by mighty.toolbox rather than only when creating objects for testing. This is acceptable because both are internally developed packages and need to be aligned anyway.
- Metadata subsetting would need to happen in mighty.toolbox::generate_define_xml(), as users cannot provide the subsetted object themselves
Create a CST file based on the YAML files and pass it to mighty.toolbox
Pros:
- No changes needed to mighty.toolbox, as it already accepts CST files
Cons:
- Having to maintain the CST format (which we don’t want to use directly anymore)
- Having to maintain the conversion to the CST format
- If define.xml requirements change, the CST format might be affected as well, so we would be updating two formats at once (deal-breaker)
Implementation Details
See Decisions for the general interface.
On parameters and whereclauses
- mighty.metadata will support only ‘EQ’-like whereclauses
- All logic supporting different comparators will be removed
- Parameters are identified only via id
- If multiple conditions apply, they all need to be listed as EQ clauses
- The variable for comparison is always PARAMCD
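The rules above can be sketched as a small base R function that turns parameter ids into EQ whereclauses on PARAMCD. The list input mirrors the parameters YAML in the examples that follow (roughly what yaml::read_yaml() would return); whereclauses_from_parameters() is a hypothetical name and not the mighty.toolbox implementation.

```r
# Hypothetical sketch: each parameter id becomes its own EQ whereclause on
# PARAMCD; no other comparators or comparison variables are generated.
whereclauses_from_parameters <- function(parameters) {
  ids <- vapply(parameters, function(p) p$id, character(1))
  sprintf("PARAMCD EQ '%s'", ids)
}

parameters <- list(
  list(id = "A", label = "A"),
  list(id = "B", label = "B")
)

whereclauses_from_parameters(parameters)
# [1] "PARAMCD EQ 'A'" "PARAMCD EQ 'B'"
```

Because the id is the only input, the comparator and the variable are fixed by construction, which is what allows the other comparator logic to be removed.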
Given the following example:
```yaml
parameters:
  - id: BMIDER
    label: "Body mass index"
    columns:
      - id: AVAL
        origin: Derived
        method: "Derivation Method"
```

mighty.toolbox will interpret id: BMIDER as the whereclause PARAMCD EQ 'BMIDER'. Multiple whereclauses are listed like this:
```yaml
parameters:
  - id: A
    label: "A"
    columns:
      - id: AVAL
        origin: Derived
        method: "Derivation method A"
  - id: B
    label: "B"
    columns:
      - id: AVAL
        origin: Derived
        method: "Derivation method B"
```

Due to the restrictions above, logical AND is not supported: a whereclause of the form PARAMCD EQ 'A' AND PARAMCD EQ 'B' would always resolve to FALSE, because a single record has exactly one PARAMCD value and cannot equal both 'A' and 'B'.
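A short base R illustration of why the AND form can never match. The data.frame adam and its values are hypothetical; each row carries exactly one PARAMCD value, as in an analysis dataset.

```r
# Hypothetical records: each row carries exactly one PARAMCD value
adam <- data.frame(PARAMCD = c("A", "B", "A"), AVAL = c(1, 2, 3),
                   stringsAsFactors = FALSE)

# Separate EQ clauses each select their own records
sum(adam$PARAMCD == "A")  # 2
sum(adam$PARAMCD == "B")  # 1

# Combining them with AND can never match: no row is both "A" and "B"
any(adam$PARAMCD == "A" & adam$PARAMCD == "B")  # FALSE
```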
Testing Strategy
- Test in mighty.toolbox using mighty.metadata metadata
- (If possible) CI in mighty.metadata reporting changes that break mighty.toolbox
Compliance Considerations
- All development on GitHub using Pull Requests for merges to main branch, and standard ATMOS branch protection rules.
- R CMD Check is required to pass on all relevant platforms before a PR is approved.
References
- mighty.metadata
- mighty.toolbox (internal package)
- r.workflows