Training is the single most time-consuming and costly step in working with neural networks and deep learning applications. A single training run of an advanced language model can easily cost around ten million dollars.
Torsten Hoefler, a professor at ETH Zurich and head of the ETH Scalable Parallel Computing Lab, together with two computer scientists from his team, has developed software to run on one of the most powerful supercomputers currently being installed.
Their new software, called NoPFS (Near-optimal Pre-Fetching System), speeds up training through clairvoyance: it exploits a pseudo-random, and therefore predictable, process in training more effectively than other tools available to date, cutting training time in half.
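To illustrate why a pseudo-random process is predictable, here is a minimal Python sketch. It is not the NoPFS implementation. The function names, the per-epoch seeding scheme, and the dataset size are all assumptions made for illustration. It shows only the underlying idea: if the shuffling of training samples is driven by a known seed, a prefetcher can compute the exact access order in advance and load data before the trainer asks for it.

```python
import random


def epoch_access_order(num_samples, seed, epoch):
    """Return the shuffled sample order for one epoch.

    The per-epoch seeding below is a hypothetical scheme chosen for this
    sketch; real training frameworks derive their shuffles differently.
    """
    rng = random.Random(seed * 1_000_003 + epoch)
    order = list(range(num_samples))
    rng.shuffle(order)
    return order


def prefetch_schedule(num_samples, seed, num_epochs):
    """Compute, before training starts, the first step at which each
    sample will be requested. A prefetcher could use this to decide
    what to load next."""
    first_use = {}
    step = 0
    for epoch in range(num_epochs):
        for sample in epoch_access_order(num_samples, seed, epoch):
            first_use.setdefault(sample, step)
            step += 1
    return first_use


# The trainer and the prefetcher derive the same order independently,
# because both start from the same seed.
trainer_order = epoch_access_order(8, seed=42, epoch=0)
prefetcher_order = epoch_access_order(8, seed=42, epoch=0)
assert trainer_order == prefetcher_order
```

Because the order is a pure function of the seed and the epoch number, nothing needs to be communicated between the training loop and the component that fetches data. Both can reconstruct the full access sequence on their own.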