Berzelius Common Datasets

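To avoid duplicate copies of the same data and to save disk space, we provide shared access to a selection of public datasets frequently used in AI/ML research. The datasets are located under COMMON_DATASETS=/proj/common-datasets. The Access column below shows which datasets are ready to use and which must first be requested via SUPR or support, or obtained separately.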
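A minimal Python sketch of how a job script might locate a dataset from the table below. It assumes that COMMON_DATASETS may or may not be exported as an environment variable in your session (falling back to the literal path /proj/common-datasets), and that each value in the Directory column is a subdirectory of that root. The helper names common_datasets_root and dataset_dir are hypothetical.

```python
import os
from pathlib import Path

# Documented location of the shared dataset area. Whether COMMON_DATASETS
# is exported in your environment is an assumption, so fall back to this.
DEFAULT_ROOT = "/proj/common-datasets"


def common_datasets_root() -> Path:
    """Return the root of the shared dataset area."""
    return Path(os.environ.get("COMMON_DATASETS", DEFAULT_ROOT))


def dataset_dir(directory: str) -> Path:
    """Join a value from the table's Directory column onto the root.

    Assumes each Directory entry is a plain subdirectory of the root,
    e.g. "AlphaFold" -> /proj/common-datasets/AlphaFold.
    """
    parts = Path(directory).parts
    if not directory or Path(directory).is_absolute() or ".." in parts:
        raise ValueError(f"expected a relative directory name, got {directory!r}")
    return common_datasets_root() / directory


if __name__ == "__main__":
    # Directory names copied from the table.
    for name in ("AlphaFold", "COCO", "MNIST"):
        print(name, "->", dataset_dir(name))
```

Reading the root from the environment, with the documented path as a fallback, lets the same script run whether or not the variable is set.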

Users are encouraged to contact us to request corrections, updates, or the addition of new datasets.

Resource | Version | Directory | Access | License / Terms

Protein structure and bioinformatics
AlphaFold
BFD | Not versioned | AlphaFold | Ready-to-use | CC BY 4.0
MGnify | 2022_05, 2018_12 | AlphaFold | Ready-to-use | CC0
PDB70 | from_mmcif_200401 | AlphaFold | Ready-to-use | CC BY 4.0
PDB | Not versioned | AlphaFold | Ready-to-use | CC0
PDB seqres | Not versioned | AlphaFold | Ready-to-use | CC0
UniRef30 | 2021_03 | AlphaFold | Ready-to-use | CC BY-SA 4.0
UniProt | 2022_05 | AlphaFold | Ready-to-use | CC BY 4.0
UniRef90 | 2022_05 | AlphaFold | Ready-to-use | CC BY 4.0
Parameters | 2022-12-06 | AlphaFold | Ready-to-use | Apache 2.0
AlphaFold 3
BFD small | Not versioned | AlphaFold3 | Ready-to-use | CC BY 4.0
MGnify | 2022_05 | AlphaFold3 | Ready-to-use | CC0
PDB | 2022_09_28 | AlphaFold3 | Ready-to-use | CC0
PDB seqres | 2022_09_28 | AlphaFold3 | Ready-to-use | CC0
UniProt | 2021_04 | AlphaFold3 | Ready-to-use | CC BY 4.0
UniRef90 | 2022_05 | AlphaFold3 | Ready-to-use | CC BY 4.0
NT | 2023_02_23 | AlphaFold3 | Ready-to-use | Not specified
Rfam | 14_9 | AlphaFold3 | Ready-to-use | CC
RNAcentral | Not versioned | AlphaFold3 | Ready-to-use | CC0
Model parameters | Not hosted | Not provided on Berzelius | Obtain separately | Terms
Foldseek
afdb | 2025-10-08 | Foldseek | Ready-to-use | CC BY 4.0
afdb50 | 2025-10-07 | Foldseek | Ready-to-use | CC BY 4.0
bfvd | 2025-09-12 | Foldseek | Ready-to-use | CC BY 4.0
OpenFold
Trained parameters | Not versioned | OpenFold | Ready-to-use | Apache 2.0
SoloSeq trained parameters | Not versioned | OpenFold | Ready-to-use | Apache 2.0
ColabFold's environmental database | 202108 | OpenFold | Ready-to-use | MIT
Alignments | Not versioned | OpenFold | Ready-to-use | CC0
Alignment DBs | Not versioned | OpenFold | Ready-to-use | CC0
Data caches | Not versioned | OpenFold | Ready-to-use | Apache 2.0

Computer vision and image classification
CIFAR-10/100 | Not versioned | CIFAR | Ready-to-use | Not specified
COCO | Not versioned | COCO | Ready-to-use | CC BY 4.0
DomainNet | Not versioned | DomainNet | Ready-to-use | Terms
Fashion-MNIST | Not versioned | Fashion-MNIST | Ready-to-use | MIT
ImageNet | Not versioned | ImageNet | Request via SUPR | Terms
Imagenette | Not versioned | Imagenette | Request via SUPR | Terms
MNIST | Not versioned | MNIST | Ready-to-use | CC BY-SA 3.0
Places365 | Not versioned | Places | Request via support | Terms

Autonomous driving and robotics
Other autonomous driving datasets
Argoverse | v1.1, v2.0 | Argoverse | Request via SUPR | Terms
KITTI | Not versioned | KITTI | Ready-to-use | Not specified
KITTI-360 | Not versioned | KITTI-360 | Ready-to-use | Not specified
MAN-TruckScenes | v1.0 | MAN-TruckScenes | Ready-to-use | CC BY-NC-SA 4.0
nuImages | v1.0 | nuImages | Request via SUPR | Terms
nuPlan | v1.1 | nuPlan | Request via SUPR | Terms
Zenseact Open Dataset | Not versioned | Zenseact-Open-Dataset | Request via SUPR | Terms
nuScenes
Panoptic | v1.0 | nuScenes | Request via SUPR | Terms
Lidarseg | v1.0 | nuScenes | Request via SUPR | Terms
CAN bus expansion | v1.0 | nuScenes | Request via SUPR | Terms
Map expansion | v1.3 | nuScenes | Request via SUPR | Terms
Full dataset | v1.0 | nuScenes | Request via SUPR | Terms
Waymo Open Dataset
Motion Dataset | 1.2.1, 1.3.0 | Waymo | Request via support | Terms
Perception Dataset | 1.4.3, 2.0.1 | Waymo | Request via support | Terms

Marine imaging and plankton datasets
SMHI IFCB Plankton | version 2 | SMHI-IFCB-Plankton | Ready-to-use | CC BY 4.0
SYKE-plankton_IFCB_2022 | 20220201 | SYKE-plankton_IFCB_2022 | Ready-to-use | CC BY 4.0
SYKE-plankton_IFCB_Utö_2021 | 20220428 | SYKE-plankton_IFCB_Utö_2021 | Ready-to-use | CC BY 4.0
WHOI-Plankton | Not versioned | WHOI-Plankton | Ready-to-use | MIT
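Datasets whose Access entry is "Request via SUPR" or "Request via support" cannot be used until access has been granted. The following Python sketch checks, before a job starts, whether the current account can already read a dataset directory. It assumes that restricted datasets are enforced through ordinary file permissions and that each Directory value names a subdirectory of /proj/common-datasets. The helper can_read_dataset is hypothetical, and os.access only reflects permission bits for the current process.

```python
import os
from pathlib import Path
from typing import Union


def can_read_dataset(path: Union[str, os.PathLike]) -> bool:
    """Return True if the current user can list and enter the directory `path`.

    Listing a directory needs read permission and entering it needs
    execute (search) permission. Missing paths, regular files, or paths
    whose parent cannot be searched return False instead of raising.
    """
    # os.path.isdir returns False on any OSError, e.g. permission denied on a parent.
    return os.path.isdir(path) and os.access(path, os.R_OK | os.X_OK)


if __name__ == "__main__":
    # Assumption: the table's Directory values live directly under this root.
    root = Path(os.environ.get("COMMON_DATASETS", "/proj/common-datasets"))
    # MNIST is listed as Ready-to-use; ImageNet and nuScenes as Request via SUPR.
    for name in ("MNIST", "ImageNet", "nuScenes"):
        ok = can_read_dataset(root / name)
        status = "readable" if ok else "not readable; request access or check the name"
        print(f"{name}: {status}")
```

Using os.path.isdir rather than Path.is_dir avoids an exception when a parent directory cannot be searched, so the check can run safely over every entry in the table.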
