Validation workflows • nfportalutils

This vignette demonstrates two approaches for programmatic metadata validation workflows using the utilities in this package. Validating metadata via the UI is covered elsewhere.

First set up as usual:

library(nfportalutils)
syn_login(Sys.getenv("SYNAPSE_AUTH_TOKEN"))

Option 1: Using Synapse validation service

The Synapse validation service will be the main validation service starting in 2026. Before validating with Synapse’s native backend capabilities, make sure a JSON schema is “bound” to the folder. Our example folder contains several files: two directly under the folder and one in a subfolder.

my_dataset <- "syn51106460"

Bind the schema that we know to be the appropriate one:

# Preferred: Bind explicit version
bind_schema(id = my_dataset, schema_id = "org.synapse.nf-generalmeasuredatatemplate-10.2.0")

# Use carefully: Bind the "latest" (last registered) version
# bind_schema(id = my_dataset, schema_id = "org.synapse.nf-generalmeasuredatatemplate")
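Because an unversioned schema ID resolves to whichever version was registered last, it can help to check the ID before binding. The following is a minimal sketch, assuming that a pinned ID ends in a `-MAJOR.MINOR.PATCH` suffix as in the example above; `has_pinned_version` is a hypothetical helper and not part of nfportalutils.

```r
# Hypothetical helper (not part of nfportalutils).
# Assumption: a pinned schema ID ends with "-<major>.<minor>.<patch>".
has_pinned_version <- function(schema_id) {
  grepl("-[0-9]+\\.[0-9]+\\.[0-9]+$", schema_id)
}

has_pinned_version("org.synapse.nf-generalmeasuredatatemplate-10.2.0")
has_pinned_version("org.synapse.nf-generalmeasuredatatemplate")
```

The first call reports a pinned version and the second does not, so a script could stop or warn before binding the unversioned ID.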

Validate all items in the folder. This requires a fileview whose scope includes the folder. The results are returned as the “raw” response; if you want a summary, write a function that processes the results as desired.

validate_dataset_folder(my_dataset, fileview = "syn64364141")
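As an illustration of the kind of summary function mentioned above, here is a sketch that runs on mock data. It assumes the raw response can be reshaped into a list of per-entity records with the fields `id`, `is_valid`, and `messages`; those field names, the placeholder IDs, and the helper `summarize_validation` are all assumptions, so inspect the actual response and adapt accordingly.

```r
# Mock stand-in for the raw response; the real structure and
# field names returned by the service may differ.
raw_results <- list(
  list(id = "syn_a", is_valid = TRUE,  messages = character(0)),
  list(id = "syn_b", is_valid = FALSE,
       messages = c("assay is required", "species is not a valid value")),
  list(id = "syn_c", is_valid = FALSE, messages = "assay is required")
)

# Hypothetical helper: one row per entity, with messages collapsed
# into a single semicolon-separated string.
summarize_validation <- function(results) {
  data.frame(
    id = vapply(results, function(x) x$id, character(1)),
    is_valid = vapply(results, function(x) isTRUE(x$is_valid), logical(1)),
    n_messages = vapply(results, function(x) length(x$messages), integer(1)),
    messages = vapply(results, function(x) paste(x$messages, collapse = "; "),
                      character(1)),
    stringsAsFactors = FALSE
  )
}

validation_summary <- summarize_validation(raw_results)

# Show only the entities that failed validation
validation_summary[!validation_summary$is_valid, ]
```

A flat data frame like this is easy to filter, count, or write out as a report for data contributors.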

Option 2 (will be DEPRECATED in 2026): Using Schematic API service

Most validation will use the schematic service until 2026. Note that the schematic API currently works only with dataset folders. An example dataset folder is given below:

my_dataset <- "syn25386362"

To validate metadata, a manifest must be reconstituted, i.e., a CSV is created from the annotations currently on the files. Type ?manifest_generate to read the docs.
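To make “reconstituting” a manifest concrete, the sketch below builds a small table from made-up file annotations and writes it as a CSV. It only illustrates the idea in base R and is not how manifest_generate works internally; the annotation keys, values, and placeholder IDs are invented for the example.

```r
# Made-up annotations for two files; the keys differ on purpose.
file_annotations <- list(
  list(entityId = "syn_file_1", assay = "rnaSeq", species = "Homo sapiens"),
  list(entityId = "syn_file_2", assay = "rnaSeq", tumorType = "Plexiform Neurofibroma")
)

# The manifest columns are the union of all annotation keys.
all_keys <- unique(unlist(lapply(file_annotations, names)))

# One row per file; keys missing on a file become NA.
rows <- lapply(file_annotations, function(ann) {
  values <- vapply(all_keys, function(key) {
    if (is.null(ann[[key]])) NA_character_ else as.character(ann[[key]])
  }, character(1))
  as.data.frame(as.list(values), stringsAsFactors = FALSE)
})
manifest <- do.call(rbind, rows)

# Write the manifest as a CSV, leaving missing values as empty cells.
csv_path <- tempfile(fileext = ".csv")
write.csv(manifest, csv_path, row.names = FALSE, na = "")
```

Each file becomes one row, and an annotation that is absent on a file shows up as an empty cell, which is typically where validation flags missing required values.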

You need to know the data_type to validate against. The data_type is the same as the “Component” value in the schematic data model. If you are feeling lucky, try infer_data_type.

inferred <- infer_data_type(my_dataset)
inferred

Once data_type is in hand, here’s the general invocation:

data_type <- inferred$data_type # Or set manually otherwise

manifest_generate(data_type, 
                  dataset_id = my_dataset,
                  schema_url = "https://raw.githubusercontent.com/nf-osi/nf-metadata-dictionary/main/NF.jsonld",
                  output_format = "google_sheet") # otherwise excel

Open the Google Sheet and download it as .csv. If Excel was chosen, open the file in a spreadsheet editor and resave it as .csv. Then validate:

manifest_validate(data_type = data_type, 
                  file_name = "GenomicsAssayTemplate - Sheet1.csv")

Make corrections in the .csv according to the issues listed in the validation results.
Submit the corrected manifest via DCA (or via annotate_with_manifest).