DapuStor accelerates AI model training using its R5101 and H5100 NVMe SSDs with the Big Accelerator Memory (BaM) framework proposed by NVIDIA and its partners, with tests showing training to be tens of times faster and more cost effective.
DapuStor, a leading provider of advanced storage solutions, has showcased remarkable benchmark results in AI model training by leveraging the Big Accelerator Memory (BaM) framework. Recent tests conducted with DapuStor's R5101 (PCIe Gen 4) and H5100 (PCIe Gen 5) NVMe SSDs demonstrate dramatic reductions in AI training time, positioning DapuStor as a key player in advancing AI infrastructure efficiency.
AI model training with BaM
As AI models grow larger and more complex, training datasets can reach tens of terabytes, far exceeding the memory capacity of GPUs. Traditional methods of managing data for AI model training - either CPU orchestration or expanding host/GPU memory - are inefficient and costly. BaM offers a novel solution by enabling direct data transfer from NVMe SSDs to GPUs, bypassing CPU intervention entirely.
BaM was developed by a collaboration of industry and academic leaders, including NVIDIA, IBM, the University of Illinois Urbana-Champaign, and the University at Buffalo. The innovative approach maximises the parallelism of GPU threads and leverages a user-space NVMe driver, ensuring that data is delivered to GPUs on demand with minimal latency. The result is significantly reduced overhead in data synchronisation between the CPU and GPU, optimising both cost and performance.
Impressive test results
In a rigorous test of the BaM framework, Graph Neural Network (GNN) training was evaluated on large heterogeneous datasets. The test system processed 1100 iterations, with DapuStor's R5101 and H5100 SSDs delivering a remarkable 25X performance increase over traditional methods and significantly reducing end-to-end execution time.
The feature aggregation phase, which is most impacted by I/O performance, saw substantial improvements thanks to the high throughput and low latency of the NVMe SSDs. With BaM, end-to-end execution time dropped from 250s to less than 10s, while the share of time spent on feature aggregation fell from 99% in the baseline tests to 77%. Adding further SSDs enhanced the system's ability to handle data in parallel, cutting feature aggregation time by an additional 40%.
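The reported figures are internally consistent: a drop from 250s to 10s corresponds exactly to the quoted 25X speedup. A quick sanity check of the arithmetic, using only the numbers stated above:

```python
# Sanity check of the benchmark figures reported in the article
# (1100-iteration GNN training run). All inputs come from the text.
baseline_total_s = 250.0   # end-to-end time without BaM
bam_total_s = 10.0         # end-to-end time with BaM ("less than 10s")

speedup = baseline_total_s / bam_total_s
print(f"Speedup: {speedup:.0f}x")  # 25x, matching the reported "25X"

# Share of runtime spent on feature aggregation, before and after
baseline_agg_frac = 0.99
bam_agg_frac = 0.77
bam_agg_s = bam_total_s * bam_agg_frac
print(f"Feature aggregation with BaM: ~{bam_agg_s:.1f}s of {bam_total_s:.0f}s")
```

Even with the aggregation share reduced to 77%, feature aggregation remains the dominant cost, which is why adding SSDs for extra parallelism yields a further 40% reduction in that phase.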
The power of PCIe Gen 5
DapuStor’s H5100, with PCIe Gen 5, proved particularly effective in handling demanding workloads. When the batch size was increased to 4096, the H5100 achieved 18% faster feature aggregation than the R5101 (PCIe Gen 4), highlighting the benefits of the Gen 5 interface in high-IOPS scenarios. The H5100 reached an estimated 2 million IOPS in the test, exceeding the maximum of PCIe Gen 4 SSD products on the market while remaining within the H5100's rated specification, demonstrating its ability to capitalise on the capabilities of PCIe Gen 5.
“As AI models scale rapidly, ensuring efficient utilisation of GPU resources has become a critical challenge,” says DapuStor Solutions Architect Grant Li. “With the BaM framework and DapuStor’s high-performance NVMe SSDs, AI model training times can be drastically reduced, leading to cost savings and more efficient use of infrastructure.
“DapuStor’s R5101 and H5100 SSDs demonstrate the potential for tens of times faster model training,” he adds. “And even greater performance is achievable through the use of PCIe Gen 5 technology.”
For more information on how DapuStor's storage solutions can accelerate your AI workloads, visit www.dapustor.com.
About DapuStor
DapuStor Corporation (DapuStor), founded in April 2016, is a leading expert in advanced enterprise solid-state drives (SSDs), SoCs, and edge-computing products. With world-class R&D strength and more than 400 team members, it has comprehensive capabilities spanning chip design, product development, and mass production. Its products are widely used in servers, by telecom operators, and in data centres.