Containerized nctime for time-axis checks of CMIP6 data on tetralith
December 28, 2020
Need
SMHI produces a large volume of CMIP6 data, which is then published on the ESGF data node esg-dn1.nsc.liu.se. Before publishing, CMIP6 data node administrators are required to run a number of tests, including checks for time axis squareness and time coverage continuity, both of which require nctime to be installed. The container described here is also used by the automated ESGF Publishing Tool scripts at NSC, as part of the data acceptance checks performed prior to ESGF data publication.
The container
- A Singularity container with nctime preinstalled has been created and made available on tetralith.
- Available as
/proj/s-cmip/singularity/nctcckenv.simg
- SHA256SUM:
077f51c12ed1e3ba4fd403a726d9ec5b15eda6b7da81a06ffe1df00768341a5f
- Wrapper scripts for nctcck and nctxck are included, to allow non-interactive execution with singularity exec
- Gitlab repo for nctcckenv
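Before using the image, its digest can be compared with the SHA256SUM listed above. The following is a minimal sketch, assuming sha256sum (GNU coreutils) and awk are available on the node; verify_sha256 is a hypothetical helper name, not part of the container or its wrapper scripts.

```shell
# Compare a file's SHA-256 digest with an expected value.
# Returns 0 on match, 1 on mismatch, 2 if the file cannot be read.
verify_sha256() {
    local file="$1" expected="$2" actual
    [ -r "$file" ] || { echo "cannot read: $file" >&2; return 2; }
    # sha256sum prints "<digest>  <filename>"; keep only the digest.
    actual="$(sha256sum "$file" | awk '{print $1}')"
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (got $actual)" >&2
        return 1
    fi
}

# Example call (digest copied from the listing above):
#   verify_sha256 /proj/s-cmip/singularity/nctcckenv.simg \
#       077f51c12ed1e3ba4fd403a726d9ec5b15eda6b7da81a06ffe1df00768341a5f
```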
General instructions
- Singularity is available only on the compute nodes of tetralith, so make sure you are on a compute node before using the container.
- For faster execution without affecting other users, it is recommended to obtain a single node in fully dedicated mode, and run nctcck/nctxck with the
--max-processes -1
option, which uses as many concurrent processes as possible.
- You need to mount the directory containing the data to be scanned (given as an absolute path), so that the container can see it.
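The points above can be checked before starting the container. This is a sketch only: have_singularity and bind_arg_for are hypothetical helper names, and /mnt is used as the mount point because that is what the examples in this document use.

```shell
# Succeeds only when the singularity command is on PATH,
# which on tetralith should mean you are on a compute node.
have_singularity() {
    command -v singularity >/dev/null 2>&1
}

# Print the value for --bind ("<data_dir>:/mnt"), or fail with a message
# if the data directory is not an absolute path to an existing directory.
bind_arg_for() {
    local data_dir="$1"
    case "$data_dir" in
        /*) ;;  # absolute path, as required
        *) echo "data directory must be an absolute path: $data_dir" >&2
           return 1 ;;
    esac
    [ -d "$data_dir" ] || { echo "not a directory: $data_dir" >&2; return 1; }
    printf '%s:/mnt\n' "$data_dir"
}

# Example (not executed here):
#   have_singularity || echo "singularity not found: are you on a compute node?" >&2
#   bind_arg_for /proj/s-cmip/users/lu/x_larni/cmorized/GD01
```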
Usage instructions (interactive)
- On the compute node, run the following:
singularity run --bind <path_to_data_directory>:/mnt /proj/s-cmip/singularity/nctcckenv.simg
- Example:
singularity run --bind /proj/s-cmip/users/lu/x_larni/cmorized/GD01:/mnt /proj/s-cmip/singularity/nctcckenv.simg
Singularity> source /opt/scripts/venv/bin/activate
(venv) Singularity> ls /mnt
CMIP6 QC sha256sums
(venv) Singularity> nctcck --max-processes -1 /mnt/CMIP6
Process netCDF file(s): 100% | 4960/4960 files
Number of node(s): 4960
Number of dataset(s): 62
Number of dataset(s) with overlap(s): 0
Number of dataset(s) with broken time series: 0
Number of file(s) scanned: 4960
Number of file(s) skipped: 0
(venv) Singularity> nctxck --max-processes -1 /mnt/CMIP6
Process netCDF file(s): 100% | 4960/4960 files
Number of file(s) with error(s): 0
Number of file(s) scanned: 4960
Number of file(s) skipped: 0
Usage instructions (non-interactive)
- nctcck (time coverage continuity)
[pchengi@n629 ~]$ singularity exec --bind /proj/s-cmip/users/lu/x_larni/cmorized/GD01:/mnt /proj/s-cmip/singularity/nctcckenv.simg bash /opt/scripts/nctcckwrapper.sh -1 /mnt CMIP6 /home/pchengi/nctcck_output.txt
working in '/mnt'
Here, the arguments are:
/mnt: the parent directory to the data directory
CMIP6: the data directory, directly beneath /mnt
/home/pchengi/nctcck_output.txt: the log file. Always use an absolute path to the log file.
[pchengi@n629 ~]$ cat nctcck_output.txt
Process netCDF file(s): 100% | 4960/4960 files
Number of node(s): 4960
Number of dataset(s): 62
Number of dataset(s) with overlap(s): 0
Number of dataset(s) with broken time series: 0
Number of file(s) scanned: 4960
Number of file(s) skipped: 0
- nctxck (time axis squareness check)
[pchengi@n629 ~]$ singularity exec --bind /proj/s-cmip/users/lu/x_larni/cmorized/GD01:/mnt /proj/s-cmip/singularity/nctcckenv.simg bash /opt/scripts/nctxckwrapper.sh -1 /mnt CMIP6 '--ignore-errors 005' /home/pchengi/nctxck_output.txt
working in '/mnt'
Here, the arguments are:
/mnt: the parent directory to the data directory
CMIP6: the data directory, directly beneath /mnt
--ignore-errors 005: ignore error code 005 (for the full list of error codes, see the section `Listing of error codes for nctxck` below)
/home/pchengi/nctxck_output.txt: the log file. Always use an absolute path to the log file.
[pchengi@n629 ~]$ cat nctxck_output.txt
Process netCDF file(s): 100% | 4960/4960 files
Number of file(s) with error(s): 0
Number of file(s) scanned: 4960
Number of file(s) skipped: 0
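Because the non-interactive runs write their summaries to log files, a script can decide whether the checks passed by reading the counters shown above. The following is a minimal sketch, assuming the log lines have exactly the "label: number" form shown in this document; all function names are hypothetical, and treating any non-zero counter as a failure is a policy choice rather than something the wrappers enforce.

```shell
# Print the number that follows "<label>:" in a log file (last occurrence).
# Prints nothing when the label is not found.
summary_count() {
    local label="$1" log="$2"
    grep -F "$label:" "$log" | tail -n 1 | sed 's/.*: *//; s/[[:space:]]*$//'
}

# Succeed only if every listed counter is present in the log and equal to 0.
# Usage: counters_are_zero LOGFILE LABEL...
counters_are_zero() {
    local log="$1" label count
    shift
    for label in "$@"; do
        count="$(summary_count "$label" "$log")"
        if [ "$count" != "0" ]; then
            echo "$log: $label = ${count:-missing}" >&2
            return 1
        fi
    done
}

# nctcck log: no overlaps and no broken time series.
nctcck_log_clean() {
    counters_are_zero "$1" \
        "Number of dataset(s) with overlap(s)" \
        "Number of dataset(s) with broken time series"
}

# nctxck log: no files with errors.
nctxck_log_clean() {
    counters_are_zero "$1" "Number of file(s) with error(s)"
}

# Example (not executed here):
#   nctcck_log_clean /home/pchengi/nctcck_output.txt &&
#   nctxck_log_clean /home/pchengi/nctxck_output.txt &&
#   echo "time axis checks passed"
```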
Listing of error codes for nctxck
000: Time axis seems OK
001: Incorrect time axis over one or several time steps
002: Time units must be unchanged for the same dataset
003: Last date is earlier than end date from filename
004: An instantaneous time axis should not embed time boundaries
005: An averaged time axis should embed time boundaries
006: Incorrect time bounds over one or several time steps
007: Calendar must be unchanged for the same dataset
008: Last date is later than end date of filename
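When deciding which codes to pass to --ignore-errors, it can help to echo what a code means before using it. The sketch below simply mirrors the listing above as a lookup; nctxck_error_description is a hypothetical helper name and is not provided by nctime or the container.

```shell
# Print the description of an nctxck error code, as given in the listing above.
# Returns 1 for a code that is not in the listing.
nctxck_error_description() {
    case "$1" in
        000) echo "Time axis seems OK" ;;
        001) echo "Incorrect time axis over one or several time steps" ;;
        002) echo "Time units must be unchanged for the same dataset" ;;
        003) echo "Last date is earlier than end date from filename" ;;
        004) echo "An instantaneous time axis should not embed time boundaries" ;;
        005) echo "An averaged time axis should embed time boundaries" ;;
        006) echo "Incorrect time bounds over one or several time steps" ;;
        007) echo "Calendar must be unchanged for the same dataset" ;;
        008) echo "Last date is later than end date of filename" ;;
        *)   echo "unknown nctxck error code: $1" >&2; return 1 ;;
    esac
}

# Example (not executed here):
#   nctxck_error_description 005
```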