For neural networks to expand further into traditional supercomputing, especially in domains where image data is key, high image fidelity and accuracy are critical.
It is not uncommon to reduce image size, and to add the burden of extensive labeling, to make convolutional neural networks tractable in image-rich research areas, but doing so sacrifices that all-important resolution and affects the accuracy of results. Keeping images at full resolution, on the other hand, takes a great deal of memory, both on single processing elements and across the system, particularly during training. In short, training on massive image-based datasets is not a cheap or simple proposition.
The thing is, as many areas of scientific computing start to use image-based datasets to fill in gaps, replace or “suggest” parts of simulations, or do cosmic clean-up work, as we’ll discuss below, an architecture primed for large-scale image processing that can scale without sacrificing resolution and depth is rare, especially one that can integrate with existing supercomputers.
Most national labs and universities are training and inferencing on the dominant accelerator in HPC—the GPU. However, as Argonne National Laboratory has found, the memory limitations of GPUs for certain convolutional neural networks can be overcome with other architectures. In their case it was with SambaNova’s DataScale system.
In recent cosmology work, Argonne found that by modifying ResNet to optimize for cosmic background removal from liquid argon time projection chamber images, the team was able to get around downsampling high-resolution input images and keep the original input image size of 1260×2048 with all three channels preserved. Even a single-image batch ran out of memory on an Nvidia V100 GPU.
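To see why a single full-resolution image can exhaust GPU memory, consider a back-of-the-envelope sketch of activation sizes. The layer widths below are illustrative ResNet-style assumptions, not details of the actual Argonne model:

```python
# Back-of-the-envelope activation memory for a full-resolution input.
# Layer widths are illustrative ResNet-style assumptions, not the
# actual Argonne model.
def tensor_mb(channels, height, width, bytes_per_elem=4):
    """Size of one FP32 activation tensor in megabytes."""
    return channels * height * width * bytes_per_elem / 1024**2

# Original input: 3 channels at 1260 x 2048 (FP32)
input_mb = tensor_mb(3, 1260, 2048)

# Early ResNet stages keep spatial resolution high while widening channels,
# so activations dwarf the input itself (e.g. 64 channels at half resolution).
stage1_mb = tensor_mb(64, 630, 1024)

print(f"input:  {input_mb:.1f} MB")   # ~30 MB
print(f"stage1: {stage1_mb:.1f} MB")  # ~158 MB, for just one layer's output
```

Multiply that single-layer figure across dozens of layers, plus the matching gradients and optimizer state, and a 16 GB or 32 GB V100 fills quickly even at batch size one.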
While an Nvidia A100 GPU might have fared better than the V100, the Argonne team’s conclusion was that, with minimal changes to the original code, SambaNova’s DataScale system provides a way to train deep CNN models with gigapixel images efficiently, and that other computer vision tasks, such as classification and image superpixel resolution, would benefit greatly from the ability to train models without losing any information.
As the authors of the cosmic background removal work add, “In contrast [to the V100 GPU], the reconfigurable dataflow architecture of SambaNova’s DataScale system does not have these problems. The Argonne and SambaNova team were able to train CNNs with images beyond 50k×50k resolution. We used the same model, configuration, and hyperparameters, except we were able to use images with their original sizes without downsampling.”
To overcome the issues of training on GPUs, the authors had previously downsampled their input images to 50% resolution and trained the model with inputs containing 3×640×1024 pixels. However, this results in a loss of information that is crucial to this problem and to many other sensitive domains such as medical imaging and astronomy.
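The downsampling step can be illustrated with a small sketch. This is not the Argonne pipeline; it simply halves resolution by 2×2 average pooling (one common approach) on a stand-in array to show why the lost detail is unrecoverable:

```python
import numpy as np

# Illustrative sketch (not the Argonne pipeline): halving resolution by
# 2x2 average pooling, a stand-in for the 50% downsampling step.
def downsample_half(img):
    """Average-pool a (C, H, W) array down to (C, H//2, W//2)."""
    c, h, w = img.shape
    return img.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

rng = np.random.default_rng(0)
full = rng.random((3, 1280, 2048)).astype(np.float32)  # full-resolution stand-in
half = downsample_half(full)                           # 3 x 640 x 1024

# Four input pixels collapse into one: fine detail, such as a thin particle
# track in a detector image, is averaged away and no upsampling can restore it.
print(full.shape, "->", half.shape)
```

Each output pixel averages four inputs, so any feature narrower than two pixels is smeared into its neighbors before the network ever sees it.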
As SambaNova’s VP of Product, Marshall Choy, tells The Next Platform, “the reconfigurable dataflow architecture eliminates downsampling high-resolution input images to low resolution for training and inference, preserving all information in the original image. The software stack further eliminates data loss when tiling the image as is required to fit memory constraints of other architectures. The result is the ability to train CNNs with input images of 50k x 50k resolution, higher overall model quality by achieving state of the art accuracy levels exceeding 90% and eliminating additional image labeling.”
Even though the model on DataScale’s Reconfigurable DataFlow Unit (RDU) is trained at a lower precision (bfloat16) compared with the GPUs’ FP32, Choy says the teams were able to ensure stable convergence and achieve better results. “Certain loss functions such as focal loss are inferior when using a lower batch size per replica. While GPUs (A100) can fit only one image per replica at full image sizes, RDUs let you train with up to 32 samples per replica and further improve accuracy.”
More information was just released following publication of this story here.