Back in March, we introduced Rex Computing, a chip upstart taking aim at the efficiency of future exascale systems. The young company is now armed with $1.25 million to hire a few more engineers and move the Neo chips from concept to production. It also has a sizable DARPA contract to match the early interest it found with select national labs in the U.S.
For background on the architecture and, to a lesser extent, the company’s founder (who is not yet twenty), see our initial overview of Rex Computing. Since that piece, founder Thomas Sohmers and his small team have been locking down the architecture, with the goal of having final verified RTL by the end of this year. Rex Computing expects to sample its first chips in the middle of next year and move to full production silicon in mid-2017 using TSMC’s 28 nanometer process.
In addition to its first round of financing, Rex Computing has secured close to $100,000 in DARPA funds. The full description can be found midway down this DARPA document under “Programming New Computers.” According to Sohmers, the funding has been instrumental as the team starts down the verification and early tape-out process for the Neo chips. It targets the automatic scratch pad memory tools. According to Sohmers, the “difficult part and where this approach might succeed where others have failed is the static compilation analysis technology at runtime.”
As a reminder, this automated step is a key differentiator in the Neo design. From the user's perspective, using Neo will be similar to using a cache-based system, but without all the area and power overhead. Rex’s goal is to remove unnecessary complexity from the on-processor memory system and put it into the compiler instead. All of this happens at compile time, so it does not add complexity to the program itself either. The compiler works out where data will need to be at different points in the program and places it there ahead of time. The alternative is to leave the data in DRAM and let the chip’s memory management units fetch it on demand in one inefficient big handful, including data that likely will not be used anyway.
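To make the "big handful" point concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions: the 64-byte cache line, the function names, and the access patterns are hypothetical choices for illustration, and nothing here reflects Rex Computing's actual compiler or the Neo memory system. It compares the bytes pulled from DRAM when each miss fetches a whole line against a plan that copies only the words the program is known to use.

```python
# Toy model only: not Rex Computing's compiler or the Neo memory system.
LINE_BYTES = 64  # assumed cache line size (hypothetical, typical of many CPUs)
WORD_BYTES = 8   # one 64-bit value


def strided_addresses(count, stride_words):
    """Byte addresses of `count` 64-bit words, `stride_words` words apart."""
    return [i * stride_words * WORD_BYTES for i in range(count)]


def cache_traffic(addresses):
    """Bytes fetched from DRAM if every touched line is loaded whole.

    Assumes an idealized cache with no evictions, so each line is
    fetched exactly once.
    """
    lines = {addr // LINE_BYTES for addr in addresses}
    return len(lines) * LINE_BYTES


def scratchpad_traffic(addresses):
    """Bytes moved if a compile-time plan copies only the words used."""
    words = {addr // WORD_BYTES for addr in addresses}
    return len(words) * WORD_BYTES


if __name__ == "__main__":
    for stride in (1, 8):
        addrs = strided_addresses(count=1024, stride_words=stride)
        print(stride, cache_traffic(addrs), scratchpad_traffic(addrs))
```

In this model a contiguous walk costs the same either way, while a stride of eight words makes the line-based scheme move eight times as many bytes as the program uses. Real workloads and real hardware fall somewhere in between, and the hard part, as Sohmers notes, is the static analysis that produces the plan in the first place.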
“It takes 4200 picojoules to move 64 bits from DRAM to registers while it only takes 100 picojoules to do a double-precision floating point operation. It’s over 40x more energy to move the data than to actually operate on it. What most people would assume is that most of that 4200 picojoules is being used in going off-chip, but in reality, about 60% of that energy usage is being consumed by the on-chip cache hierarchy because of all of the extra gates and wires on the chip that the electrons go through. We are removing that 60%.”
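The arithmetic behind those figures can be checked in a few lines. The three input numbers come from the quote above. The "ratio without cache" is an idealized assumption made here for illustration: it supposes the full 60% disappears and that the scratch pad itself costs nothing, which the source does not claim.

```python
# Figures quoted in the article; everything derived from them is plain arithmetic.
DRAM_TO_REGISTER_PJ = 4200  # moving 64 bits from DRAM to registers
DP_FLOP_PJ = 100            # one double-precision floating point operation
CACHE_SHARE_PERCENT = 60    # "about 60%" attributed to the on-chip cache hierarchy


def energy_breakdown():
    """Split the quoted data-movement energy into cache and non-cache parts."""
    cache_pj = DRAM_TO_REGISTER_PJ * CACHE_SHARE_PERCENT // 100
    remaining_pj = DRAM_TO_REGISTER_PJ - cache_pj
    return {
        "move_to_compute_ratio": DRAM_TO_REGISTER_PJ / DP_FLOP_PJ,
        "cache_pj": cache_pj,
        "remaining_pj": remaining_pj,
        # Idealized assumption: the cache share vanishes entirely.
        "ratio_without_cache": remaining_pj / DP_FLOP_PJ,
    }


if __name__ == "__main__":
    for name, value in energy_breakdown().items():
        print(f"{name}: {value}")
```

On these numbers, moving the data costs 42 times the operation itself, and roughly 2,520 of the 4,200 picojoules go to the cache hierarchy. Even in the idealized case, the remaining 1,680 picojoules still outweigh the floating point operation several times over.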
Rex Computing is working under a tight schedule with a very small team. Sohmers tells The Next Platform that they are hiring engineers to bring the company to a total of seven people. They have already created the instruction set architecture and the basic core design. As the year moves on, they will push ahead with functional verification to ensure that the ISA is optimized for the applications they are targeting and free from logical inconsistencies and other potential problems.
At the same time, they are taking the functional model of their architecture and implementing it in actual hardware via the UC Berkeley-developed Chisel hardware description language. “There is a traditional flow then where we have RTL engineers writing the RTL based on what our functional model is, we will then take that, hand it over to the VLSI to do the physical part using EDA tools and start placing the gates and components of the chip on the physical space.” He says the software tools available for this part of the process are abysmal: “it’s very time consuming, even if they can do some of the things automatically.” From this point, the small team will pass the design to a verification engineer to run on an FPGA or on their C++ functional simulator, then put the physical design through a circuit simulator such as SPICE or tools from Cadence and Synopsys.
This is all very ambitious. “When Intel does this, they have 300 or more people on many teams over 18 months. We’re doing it with five on a tight schedule,” Sohmers said. The comparison needs qualifying, since Haswell chips, for instance, are far larger and functionally in a different league from Neo, but the point stands: the small team is in for a sleepless 2015. Still, when you’re young, driven, and have a potential market for a product, what’s a little lost sleep?
Sohmers says Rex Computing has had to change its public-facing approach since we spoke earlier in the year to include markets beyond supercomputing. He does have interest from national labs, including Sandia, where Jim Ang, who runs the new architectures group, is working with Sohmers and his small team on modeling tools for the Neo architecture. Even so, Sohmers has to be more public about potential telco, embedded, and other use cases.
“One thing we’ve had to do to get funding is to pivot our public face by not using the word supercomputing or HPC so much. It’s basically a cursed word in the Silicon Valley investor community.”
He says that while plenty of investors do understand the value of high performance computing technologies, the term HPC is still problematic. The investor community wants companies that can target much larger markets, even when it understands that HPC has something to offer, which is worth noting for young companies looking for funds. “In this age of social networks and messaging apps being the big thing in Silicon Valley, it’s almost impossible to get funded if you’re pitching something for the big iron systems,” he explains.
Even with funding, this is a risky venture. “The cost for us going to TSMC and getting 100 chips back is, after you include the packaging and just getting the dies to our door, around $250,000. They sell them in blocks with shared costs of the mask among other companies, which is how we’re getting our first prototypes made.” Before that come the costs of EDA tools. A single seat of Cadence or Synopsys software runs to several hundred thousand dollars, even for a startup, he says.