Earlier in May, during The Next AI Platform event in San Jose, we conducted live, technical interviews with a broad range of experts across deep learning hardware. That included the expected conversations about chips and accelerators, but also discussions about storage and I/O and how different teams think about building balanced architectures for training and inference (almost exclusively as separate clusters).
The chip and accelerator segment of the day was packed with live individual interviews that will be presented in the near future. To wrap up that section of the day, we put a few heavyweights on stage to talk about various aspects of accelerators and where they fit into training and inference systems. Topics include the role of the interconnect in various types of systems, generalizing software frameworks for diverse AI workloads and systems, and how the various companies will seek to build full platforms for AI versus offload accelerators.
The video below recaps that panel session from the sold-out Next AI Platform event. The panel is hosted by analyst and Next Platform contributor Paul Teich and features Nigel Toon, CEO and co-founder of Graphcore; Jin Kim, Chief Data Science Officer at Wave Computing; Mike Henry, CEO of inference/edge-focused startup Mythic; and Gaurav Singh of Xilinx.
Much of the conversation focuses on efficiency and how systems are built differently for training versus inference. “For inference it is different [than training]. It’s mission-critical, it’s 24/7, and it has to seamlessly integrate with three different pipelines: the data stream with live data coming in; the model ingest pipeline, since models are never static; and the inference output, which feeds application servers or backend servers. It’s a challenging problem, and on the training side this is also an issue,” Wave’s Jin Kim tells the audience.
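To make those three pipelines a bit more concrete, here is a minimal, hypothetical Python sketch of an inference service with a live request stream, a model-ingest path that swaps in refreshed models, and an output queue feeding downstream servers. This is purely illustrative and does not describe Wave Computing’s (or any vendor’s) actual stack; all names and structure are assumptions.

```python
# Hypothetical sketch only: the three pipelines of an inference service as Kim
# frames them -- live data in, model ingest, and inference output.
import queue
import threading

requests = queue.Queue()   # pipeline 1: live data stream coming in
results = queue.Queue()    # pipeline 3: inference output for app/backend servers
model_lock = threading.Lock()
model = {"version": 1}     # stand-in for the currently deployed model

def ingest_model(new_model):
    # Pipeline 2: model ingest -- models are never static, so swap atomically.
    global model
    with model_lock:
        model = new_model

def serve():
    # Pull live requests, run them through the current model, emit results.
    while True:
        item = requests.get()
        if item is None:
            break
        with model_lock:
            version = model["version"]
        results.put({"input": item, "served_by_model_version": version})

threading.Thread(target=serve, daemon=True).start()
requests.put("sample-request")
ingest_model({"version": 2})     # a refreshed model arrives mid-stream
requests.put("another-request")
requests.put(None)
print(results.get(), results.get())
```

Even in this toy form, the point of the quote comes through: the serving loop, the model refresh path, and the output path all have to run continuously and coordinate without stalling one another.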
This trend toward heterogeneity, with a CPU working in conjunction with an accelerator, will continue in both training and inference, says Gaurav Singh of Xilinx. “Whether that is machine learning or some other offloaded problem, we have to come up with an application paradigm that lets people program accelerators.”
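As a loose illustration of that host-plus-accelerator pattern (generic JAX, not anything specific to Xilinx’s tooling), the short sketch below has the CPU stage data onto whatever device is available and launch a compiled kernel there, falling back to the CPU when no accelerator is present.

```python
# Illustrative only: the CPU orchestrates, an accelerator (if present) runs
# the compiled kernel. Generic JAX, not Xilinx's programming model.
import jax
import jax.numpy as jnp

device = jax.devices()[0]          # GPU/TPU if available, otherwise the CPU

@jax.jit
def dense_layer(weights, activations):
    # The offloaded kernel: a matrix multiply plus a nonlinearity.
    return jax.nn.relu(weights @ activations)

# Host-side code stages the operands onto the device and launches the kernel.
w = jax.device_put(jnp.ones((1024, 1024)), device)
x = jax.device_put(jnp.ones((1024, 256)), device)
print(dense_layer(w, x).shape)     # (1024, 256)
```

The appeal of this kind of paradigm is that the host code looks the same whether the kernel lands on a CPU, a GPU, or some other accelerator, which is exactly the programming-model problem Singh is pointing at.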
The panel also touched on efficiency and performance per watt. There was little doubt that GPUs have dominated in training, but that big-compute emphasis is not well suited to inference. This opens the market for lower-power, inference-optimized alternatives that get the software interfaces and integrations right, and it opens possibilities for startups trying to wrangle efficient training and inference on the same device with optimal power consumption and performance.
That is the holy grail, but for now we are still in a world of specialized accelerators (separate devices for training and inference).
For those interested in the inference side of building AI systems, we will be hosting a second event dedicated specifically to that workload. It will cover the ecosystem for datacenter-rooted inference, how some of those workloads might shift to the edge, and how others take approaches modeled after hyperscale systems to deliver fast results from trained models at scale.
More on this soon, but you heard it here first: save the date, October 3rd in San Jose.