Planting the Seed of a Multi-Core Heterogeneous RISC-V SoC in the "Chip Yard" (Chipyard)
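2 Chipyard Components
Chipyard is an open-source framework for the agile development of Chisel-based systems-on-chip (SoCs). It lets you combine Chisel HDL, the Rocket Chip SoC generator, and other Berkeley projects to produce RISC-V SoCs with everything from MMIO-mapped peripherals to custom accelerators. Chipyard includes:
- processor cores (Rocket, BOOM, Ariane);
- accelerators (Hwacha, Gemmini, NVDLA);
- memory systems, plus other peripherals and tools that help build a fully featured SoC.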
2.1 Rocket
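Rocket is a standard five-stage, in-order, scalar processor core that supports the RV64GC RISC-V ISA and is written in Chisel. The figure below shows a typical dual-core implementation.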
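To make "five-stage, in-order" concrete, here is a minimal textbook timing model in C. It is an illustration under stated assumptions, not Rocket's actual hazard logic: an ideal N-stage pipeline finishes K instructions in K + N - 1 cycles, every bubble adds one cycle, and the only bubble modeled is a one-cycle load-use stall (full forwarding is assumed otherwise). `pipeline_cycles`, `load_use_stalls`, and `struct inst` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified instruction record (hypothetical). Register 0 means "none",
 * since x0 is hardwired to zero and never causes a hazard. */
struct inst {
    int is_load;
    unsigned rd, rs1, rs2;
};

/* Cycles for `insts` instructions on an ideal in-order pipeline with
 * `stages` stages (assumed >= 1) plus `stalls` bubble cycles: the first
 * instruction takes `stages` cycles, each later one retires one cycle after
 * the previous, and each bubble delays everything behind it by one cycle. */
static uint64_t pipeline_cycles(uint64_t insts, unsigned stages, uint64_t stalls)
{
    if (insts == 0)
        return 0;
    return insts + (uint64_t)(stages - 1) + stalls;
}

/* Count textbook load-use hazards: a load immediately followed by an
 * instruction that reads the loaded register costs one bubble. */
static uint64_t load_use_stalls(const struct inst *prog, size_t n)
{
    uint64_t stalls = 0;
    for (size_t i = 1; i < n; i++) {
        const struct inst *prev = &prog[i - 1];
        if (prev->is_load && prev->rd != 0 &&
            (prog[i].rs1 == prev->rd || prog[i].rs2 == prev->rd))
            stalls++;
    }
    return stalls;
}
```

For example, `ld x5, 0(x10)` followed by `add x6, x5, x7` and `sub x8, x6, x9` has one load-use stall, so this model predicts 3 + 4 + 1 = 8 cycles.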
2.2 BOOM
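BOOM, short for Berkeley Out-of-Order Machine, is, as the name suggests, an out-of-order core. It has a seven-stage pipeline, supports the RV64GC RISC-V ISA, and is written in Chisel. Its detailed pipeline structure is shown below.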
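A key mechanism in out-of-order cores such as BOOM is register renaming: each architectural destination register is mapped to a fresh physical register, which removes false (WAR/WAW) dependences. The C sketch below is a deliberately simplified model of a rename map table plus free list; the sizes, the names (`struct renamer`, `rename_inst`), and the absence of branch snapshots and commit-time freeing are all assumptions for illustration, not BOOM's implementation.

```c
#include <stdint.h>

#define ARCH_REGS 32
#define PHYS_REGS 64 /* illustrative size, not BOOM's real configuration */

/* Rename state: map table (architectural -> physical) and a free list. */
struct renamer {
    uint8_t map[ARCH_REGS];
    uint8_t free_list[PHYS_REGS];
    int free_count;
};

/* Result of renaming one instruction. `stale` is the physical register that
 * previously held rd (a real core frees it when this instruction commits);
 * it is -1 when rd is x0, which is never renamed. */
struct renamed {
    int prd, prs1, prs2, stale;
};

/* Start with x_i -> p_i; physical registers ARCH_REGS..PHYS_REGS-1 are free. */
static void renamer_init(struct renamer *r)
{
    for (int i = 0; i < ARCH_REGS; i++)
        r->map[i] = (uint8_t)i;
    r->free_count = 0;
    for (int p = PHYS_REGS - 1; p >= ARCH_REGS; p--)
        r->free_list[r->free_count++] = (uint8_t)p;
}

/* Rename "rd <- rs1 op rs2". Sources are looked up BEFORE rd is remapped,
 * so "add x1, x1, x2" still reads the old x1. Returns 0 on success, or -1
 * if no physical register is free (the front end would have to stall). */
static int rename_inst(struct renamer *r, int rd, int rs1, int rs2,
                       struct renamed *out)
{
    out->prs1 = r->map[rs1];
    out->prs2 = r->map[rs2];
    if (rd == 0) {
        out->prd = 0;
        out->stale = -1;
        return 0;
    }
    if (r->free_count == 0)
        return -1;
    out->stale = r->map[rd];
    out->prd = r->free_list[--r->free_count];
    r->map[rd] = (uint8_t)out->prd;
    return 0;
}
```

Two back-to-back writes to `x1` receive different physical registers, so they no longer have to complete in program order.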
2.3 Ariane
2.4 Gemmini
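Gemmini is a generator, still under development, for matrix-multiplication units based on systolic arrays. It produces a coprocessor that integrates with RISC-V Rocket/BOOM cores through the RoCC interface.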
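To show what "systolic array" means in practice, the C sketch below simulates a small output-stationary array cycle by cycle: rows of A flow in from the left and columns of B from the top, each skewed by one cycle per row or column, and every processing element (PE) multiplies the two operands passing through it and accumulates one element of C = A * B. The array size `SA_DIM`, the output-stationary dataflow, and the function name are assumptions for illustration; Gemmini itself is a Chisel generator with configurable dataflows.

```c
#include <string.h>

#define SA_DIM 4 /* hypothetical 4 x 4 array */

/* Simulate an SA_DIM x SA_DIM output-stationary systolic array computing
 * C = A * B for square matrices. Returns the number of cycles simulated. */
static int systolic_matmul(int A[SA_DIM][SA_DIM], int B[SA_DIM][SA_DIM],
                           int C[SA_DIM][SA_DIM])
{
    int a_reg[SA_DIM][SA_DIM] = {{0}}, b_reg[SA_DIM][SA_DIM] = {{0}};
    int a_next[SA_DIM][SA_DIM], b_next[SA_DIM][SA_DIM];
    memset(C, 0, sizeof(int) * SA_DIM * SA_DIM);

    /* The last operand pair meets in PE (N-1, N-1) at cycle 3N-3. */
    int cycles = 3 * SA_DIM - 2;
    for (int t = 0; t < cycles; t++) {
        for (int i = 0; i < SA_DIM; i++) {
            for (int j = 0; j < SA_DIM; j++) {
                /* Edge inputs are skewed: row i of A starts i cycles late,
                 * column j of B starts j cycles late; zeros pad the rest. */
                int ka = t - i, kb = t - j;
                int a_in = (j == 0)
                    ? ((ka >= 0 && ka < SA_DIM) ? A[i][ka] : 0)
                    : a_reg[i][j - 1];
                int b_in = (i == 0)
                    ? ((kb >= 0 && kb < SA_DIM) ? B[kb][j] : 0)
                    : b_reg[i - 1][j];
                C[i][j] += a_in * b_in; /* multiply-accumulate in place */
                a_next[i][j] = a_in;    /* A moves right next cycle */
                b_next[i][j] = b_in;    /* B moves down next cycle */
            }
        }
        memcpy(a_reg, a_next, sizeof a_reg);
        memcpy(b_reg, b_next, sizeof b_reg);
    }
    return cycles;
}
```

In this model A[i][k] and B[k][j] meet in PE (i, j) at cycle i + j + k, which is why the skew is needed and why one product takes 3N - 2 cycles.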
2.5 NVDLA
NVDLA is an open-source deep learning accelerator developed by NVIDIA. It can be attached to a Rocket Chip SoC over the TileLink bus.
2.6 SHA3 RoCC Accelerator
A coprocessor that integrates with RISC-V Rocket/BOOM cores through the RoCC interface and is dedicated to accelerating SHA3 hashing.
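RoCC accelerators such as the SHA3 unit are driven by custom RISC-V instructions. A RoCC instruction uses the R-type layout with one of the four custom opcodes, and bits 14..12 hold the `xd`, `xs1`, and `xs2` flags that say whether the core expects a result in `rd` and sends the values of `rs1`/`rs2`. The hypothetical helper `rocc_encode` below packs these fields into a 32-bit word; which opcode and `funct7` values the SHA3 accelerator uses are not covered here.

```c
#include <stdint.h>

/* The RISC-V custom opcodes that RoCC accelerators can use. */
enum rocc_opcode {
    ROCC_CUSTOM0 = 0x0B,
    ROCC_CUSTOM1 = 0x2B,
    ROCC_CUSTOM2 = 0x5B,
    ROCC_CUSTOM3 = 0x7B,
};

/* Pack a RoCC instruction word:
 * [31:25] funct7 | [24:20] rs2 | [19:15] rs1 | [14] xd | [13] xs1 |
 * [12] xs2 | [11:7] rd | [6:0] opcode */
static uint32_t rocc_encode(enum rocc_opcode opcode, unsigned funct7,
                            unsigned rd, unsigned rs1, unsigned rs2,
                            int xd, int xs1, int xs2)
{
    return ((uint32_t)(funct7 & 0x7F) << 25) |
           ((uint32_t)(rs2 & 0x1F) << 20) |
           ((uint32_t)(rs1 & 0x1F) << 15) |
           ((uint32_t)(xd ? 1 : 0) << 14) |
           ((uint32_t)(xs1 ? 1 : 0) << 13) |
           ((uint32_t)(xs2 ? 1 : 0) << 12) |
           ((uint32_t)(rd & 0x1F) << 7) |
           ((uint32_t)opcode & 0x7F);
}
```

For instance, `rocc_encode(ROCC_CUSTOM0, 1, 10, 11, 12, 1, 1, 1)` describes an instruction that sends the values of x11 and x12 to the accelerator and waits for a result in x10; what `funct7` selects is defined by each accelerator.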
3 Setting Up the Environment
set -ex
sudo apt-get install -y build-essential bison flex
sudo apt-get install -y libgmp-dev libmpfr-dev libmpc-dev zlib1g-dev vim git default-jdk default-jre
# install sbt: https://www.scala-sbt.org/release/docs/Installing-sbt-on-Linux.html
echo "deb https://repo.scala-sbt.org/scalasbt/debian all main" | sudo tee -a /etc/apt/sources.list.d/sbt.list
curl -sL "https://keyserver.ubuntu.com/pks/lookup?op=get&search=0x2EE0EA64E40A89B84B2DF73499E82A75642AC823" | sudo apt-key add
sudo apt-get update
sudo apt-get install -y sbt
sudo apt-get install -y texinfo gengetopt
sudo apt-get install -y libexpat1-dev libusb-dev libncurses5-dev cmake
# deps for poky
sudo apt-get install -y python3.6 patch diffstat texi2html texinfo subversion chrpath git wget
# deps for qemu
sudo apt-get install -y libgtk-3-dev gettext
# deps for firemarshal
sudo apt-get install -y python3-pip python3.6-dev rsync libguestfs-tools expat ctags
# install DTC
sudo apt-get install -y device-tree-compiler
# install verilator
git clone https://github.com/verilator/verilator
cd verilator
git checkout v4.034
autoconf && ./configure && make -j30 && sudo make install
# go back up so Chipyard is not cloned inside the verilator checkout
cd ..
git clone https://github.com/ucb-bar/chipyard.git
cd chipyard
# initialize all submodules except the RISC-V toolchains
./scripts/init-submodules-no-riscv-tools.sh
# riscv-tools: if set, builds the riscv toolchain (this is also the default)
# esp-tools: if set, builds esp-tools toolchain used for the hwacha vector accelerator
# ec2fast: if set, pulls in a pre-compiled RISC-V toolchain for an EC2 manager instance
export MAKEFLAGS=-j30
./scripts/build-toolchains.sh riscv-tools # for a normal risc-v toolchain
source ./env.sh
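If fetching the submodules or building the toolchain fails because of network problems, there are two workarounds:
- use a proxy or a VPN;
- mirror the original repositories on Gitee, download them one by one in the background, and finally re-run the submodule initialization and ./scripts/build-toolchains.sh riscv-tools.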
4 Examples
4.1 Rocket
class RocketConfig extends Config(
  new chipyard.iobinders.WithUARTAdapter ++ // display UART with a SimUARTAdapter
  new chipyard.iobinders.WithTieOffInterrupts ++ // tie off top-level interrupts
  new chipyard.iobinders.WithBlackBoxSimMem ++ // drive the master AXI4 memory with a blackbox DRAMSim model
  new chipyard.iobinders.WithTiedOffDebug ++ // tie off debug (since we are using SimSerial for testing)
  new chipyard.iobinders.WithSimSerial ++ // drive TSI with SimSerial for testing
  new testchipip.WithTSI ++ // use testchipip serial offchip link
  new chipyard.config.WithBootROM ++ // use default bootrom
  new chipyard.config.WithUART ++ // add a UART
  new chipyard.config.WithL2TLBs(1024) ++ // use L2 TLBs
  new freechips.rocketchip.subsystem.WithNoMMIOPort ++ // no top-level MMIO master port (overrides default set in rocketchip)
  new freechips.rocketchip.subsystem.WithNoSlavePort ++ // no top-level MMIO slave port (overrides default set in rocketchip)
  new freechips.rocketchip.subsystem.WithInclusiveCache ++ // use Sifive L2 cache
  new freechips.rocketchip.subsystem.WithNExtTopInterrupts(0) ++ // no external interrupts
  new freechips.rocketchip.subsystem.WithNBigCores(1) ++ // single rocket-core
  new freechips.rocketchip.subsystem.WithCoherentBusTopology ++ // hierarchical buses including mbus+l2
  new freechips.rocketchip.system.BaseConfig) // "base" rocketchip system
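Each `new X ++` line in the config above is a config fragment, and `++` chains the fragments so that those further to the left take precedence over those to their right; that is how `WithNoMMIOPort` overrides the default that `BaseConfig` sets. The C sketch below models only this first-match-wins lookup order, using hypothetical string keys and names (`struct fragment`, `config_lookup`); Rocket Chip's real `Config` mechanism (Scala, with `site`/`here`/`up` views) is considerably richer.

```c
#include <stddef.h>
#include <string.h>

/* One key/value setting provided by a (hypothetical) config fragment. */
struct setting {
    const char *key;
    int value;
};

/* A fragment: a name plus the settings it overrides. */
struct fragment {
    const char *name;
    const struct setting *settings;
    size_t count;
};

/* Look up `key` in a chain ordered left to right as written with `++`:
 * the first (leftmost) fragment that defines the key wins.
 * Returns 1 and stores the value if found, 0 otherwise. */
static int config_lookup(const struct fragment *chain, size_t n,
                         const char *key, int *value)
{
    for (size_t f = 0; f < n; f++) {
        for (size_t s = 0; s < chain[f].count; s++) {
            if (strcmp(chain[f].settings[s].key, key) == 0) {
                *value = chain[f].settings[s].value;
                return 1;
            }
        }
    }
    return 0;
}
```

Modeling `WithNoMMIOPort ++ WithNBigCores(1) ++ BaseConfig` with a made-up `mmio_port` key, the lookup returns the value from `WithNoMMIOPort` even though `BaseConfig` also defines it.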
cd sims/verilator
make CONFIG=RocketConfig -j
cd $RISCV/riscv64-unknown-elf/share/riscv-tests/benchmarks/
make -j
cd $RISCV/../sims/verilator
./simulator-chipyard-RocketConfig $RISCV/riscv64-unknown-elf/share/riscv-tests/benchmarks/dhrystone.riscv
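The riscv-tests benchmarks typically report the cycle and retired-instruction counters (`mcycle` and `minstret`) when they finish. Dividing one by the other gives IPC (instructions per cycle) or its reciprocal CPI, which compare cores independently of clock frequency. A small helper, assuming you copy the two counter values from the simulator output by hand (`ipc` and `cpi` are hypothetical names):

```c
#include <stdint.h>

/* Instructions per cycle from the mcycle/minstret counters (0 if no cycles). */
static double ipc(uint64_t minstret, uint64_t mcycle)
{
    return mcycle ? (double)minstret / (double)mcycle : 0.0;
}

/* Cycles per instruction: the reciprocal of IPC (0 if nothing retired). */
static double cpi(uint64_t minstret, uint64_t mcycle)
{
    return minstret ? (double)mcycle / (double)minstret : 0.0;
}
```

Because a scalar in-order core retires at most one instruction per cycle, its IPC stays at or below 1.0, whereas a superscalar out-of-order core can exceed that.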
4.2 BOOM
Next, let's look at a Small BOOM configuration:
// generators/chipyard/src/main/scala/config/BoomConfigs.scala
class SmallBoomConfig extends Config(
  new chipyard.iobinders.WithUARTAdapter ++ // display UART with a SimUARTAdapter
  new chipyard.iobinders.WithTieOffInterrupts ++ // tie off top-level interrupts
  new chipyard.iobinders.WithBlackBoxSimMem ++ // drive the master AXI4 memory with a blackbox DRAMSim model
  new chipyard.iobinders.WithTiedOffDebug ++ // tie off debug (since we are using SimSerial for testing)
  new chipyard.iobinders.WithSimSerial ++ // drive TSI with SimSerial for testing
  new testchipip.WithTSI ++ // use testchipip serial offchip link
  new chipyard.config.WithBootROM ++ // use default bootrom
  new chipyard.config.WithUART ++ // add a UART
  new chipyard.config.WithL2TLBs(1024) ++ // use L2 TLBs
  new freechips.rocketchip.subsystem.WithNoMMIOPort ++ // no top-level MMIO master port (overrides default set in rocketchip)
  new freechips.rocketchip.subsystem.WithNoSlavePort ++ // no top-level MMIO slave port (overrides default set in rocketchip)
  new freechips.rocketchip.subsystem.WithInclusiveCache ++ // use Sifive L2 cache
  new freechips.rocketchip.subsystem.WithNExtTopInterrupts(0) ++ // no external interrupts
  new boom.common.WithSmallBooms ++ // small boom config
  new boom.common.WithNBoomCores(1) ++ // single-core boom
  new freechips.rocketchip.subsystem.WithCoherentBusTopology ++ // hierarchical buses including mbus+l2
  new freechips.rocketchip.system.BaseConfig) // "base" rocketchip system
cd sims/verilator
make CONFIG=SmallBoomConfig -j
cd $RISCV/riscv64-unknown-elf/share/riscv-tests/benchmarks/
make -j
cd $RISCV/../sims/verilator
./simulator-chipyard-SmallBoomConfig $RISCV/riscv64-unknown-elf/share/riscv-tests/benchmarks/dhrystone.riscv
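Judging from the benchmark scores, the out-of-order Small BOOM core performs slightly better than the in-order Rocket core (assuming both run at the same clock frequency).
The scores of a Large BOOM, in turn, show more than a twofold performance improvement.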
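Since every core here runs in the same cycle-level simulator, "assuming the same clock frequency" boils down to comparing cycle counts for the same program: speedup = cycles of the slower core / cycles of the faster core. Dhrystone results are also commonly normalized as DMIPS/MHz, i.e. Dhrystone iterations per second at 1 MHz divided by 1757 (the VAX 11/780 score that defines 1 DMIPS). A sketch under those assumptions (`speedup` and `dmips_per_mhz` are hypothetical names):

```c
#include <stdint.h>

/* Speedup of the core that needed `cycles_new` over the one that needed
 * `cycles_base`, for the same program at the same clock frequency. */
static double speedup(uint64_t cycles_base, uint64_t cycles_new)
{
    return cycles_new ? (double)cycles_base / (double)cycles_new : 0.0;
}

/* DMIPS/MHz: Dhrystone iterations per second at 1 MHz, divided by 1757. */
static double dmips_per_mhz(uint64_t runs, uint64_t cycles)
{
    if (cycles == 0)
        return 0.0;
    double runs_per_sec_at_1mhz = (double)runs * 1e6 / (double)cycles;
    return runs_per_sec_at_1mhz / 1757.0;
}
```

A "more than twofold" gain corresponds to `speedup(...)` returning a value above 2.0 for the two measured cycle counts.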
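4.3 A First Look at an SoC with a Custom Hardware Accelerator
Next, let's look at a Rocket SoC with an FIR hardware accelerator. Its configuration is: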
class StreamingFIRRocketConfig extends Config(
  new chipyard.example.WithStreamingFIR ++ // use top with tilelink-controlled streaming FIR
  new chipyard.iobinders.WithUARTAdapter ++
  new chipyard.iobinders.WithTieOffInterrupts ++
  new chipyard.iobinders.WithBlackBoxSimMem ++
  new chipyard.iobinders.WithTiedOffDebug ++
  new chipyard.iobinders.WithSimSerial ++
  new testchipip.WithTSI ++
  new chipyard.config.WithBootROM ++
  new chipyard.config.WithUART ++
  new chipyard.config.WithL2TLBs(1024) ++
  new freechips.rocketchip.subsystem.WithNoMMIOPort ++
  new freechips.rocketchip.subsystem.WithNoSlavePort ++
  new freechips.rocketchip.subsystem.WithInclusiveCache ++
  new freechips.rocketchip.subsystem.WithNExtTopInterrupts(0) ++
  new freechips.rocketchip.subsystem.WithNBigCores(1) ++
  new freechips.rocketchip.subsystem.WithCoherentBusTopology ++
  new freechips.rocketchip.system.BaseConfig)
cd tests/
make -j
cd ../sims/verilator
make CONFIG=StreamingFIRRocketConfig -j BINARY=../../tests/streaming-fir.riscv run-binary
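The test program below includes `mmio.h` for `reg_write64` and `reg_read32`. Helpers like these are thin wrappers around `volatile` pointer accesses, so the compiler emits exactly one load or store per call and never caches or drops a device-register access. A minimal sketch of what they might look like (the bodies are an assumption, not a copy of Chipyard's header):

```c
#include <stdint.h>

/* Store a 64-bit value to a memory-mapped register. `volatile` forces the
 * store to happen exactly as written. */
static inline void reg_write64(uintptr_t addr, uint64_t data)
{
    volatile uint64_t *ptr = (volatile uint64_t *)addr;
    *ptr = data;
}

/* Load a 32-bit value from a memory-mapped register. */
static inline uint32_t reg_read32(uintptr_t addr)
{
    volatile uint32_t *ptr = (volatile uint32_t *)addr;
    return *ptr;
}
```

On the SoC, `addr` is a device address such as `PASSTHROUGH_WRITE` (0x2000); on a host machine these helpers are only safe on ordinary memory.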
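#define PASSTHROUGH_WRITE 0x2000
#define PASSTHROUGH_WRITE_COUNT 0x2008 // assumed: write-queue count register, one 64-bit word after the data register
#define PASSTHROUGH_READ 0x2100
#define PASSTHROUGH_READ_COUNT 0x2108 // assumed: read-queue count register, one 64-bit word after the data register
#define BP 3 // number of fractional bits in the fixed-point format
#define BP_SCALE ((double)(1 << BP))

#include "mmio.h"

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

// Round to the nearest integer (halves away from zero) and return its
// two's-complement bit pattern; going through int64_t avoids the undefined
// behavior of converting a negative double directly to uint64_t.
uint64_t roundi(double x)
{
  if (x < 0.0) {
    return (uint64_t)(int64_t)(x - 0.5);
  } else {
    return (uint64_t)(x + 0.5);
  }
}

int main(void)
{
  // 13 explicit samples; the remaining 2 of the 15 entries are zero-initialized
  double test_vector[15] = {1.0, 2.0, 3.0, 4.0, 5.0, 4.0, 3.0, 2.0, 1.0, 0.5, 0.25, 0.125, 0.125};
  uint32_t num_tests = sizeof(test_vector) / sizeof(double);
  printf("Starting writing %u inputs\n", num_tests);

  // Push each sample, converted to fixed point, into the FIR input queue
  for (uint32_t i = 0; i < num_tests; i++) {
    reg_write64(PASSTHROUGH_WRITE, roundi(test_vector[i] * BP_SCALE));
  }
  printf("Done writing\n");

  uint32_t rcnt = reg_read32(PASSTHROUGH_READ_COUNT);
  printf("Write count: %u\n", reg_read32(PASSTHROUGH_WRITE_COUNT));
  printf("Read count: %u\n", rcnt);

  int failed = 0;
  if (rcnt != 0) {
    for (uint32_t i = 0; i < num_tests - 3; i++) {
      uint32_t res = reg_read32(PASSTHROUGH_READ);
      // double res = ((double)reg_read32(PASSTHROUGH_READ)) / BP_SCALE;
      // Reference: 3*x[i] + 2*x[i+1] + x[i+2] in fixed point, low 8 bits only
      double expected_double = 3*test_vector[i] + 2*test_vector[i+1] + test_vector[i+2];
      uint32_t expected = ((uint32_t)(expected_double * BP_SCALE + 0.5)) & 0xFF;
      if (res == expected) {
        printf("\n\nPass: Got %u Expected %u\n\n", res, expected);
      } else {
        failed = 1;
        printf("\n\nFail: Got %u Expected %u\n\n", res, expected);
      }
    }
  } else {
    failed = 1; // the accelerator produced no output at all
  }

  if (failed) {
    printf("\n\nSome tests failed\n\n");
  } else {
    printf("\n\nAll tests passed\n\n");
  }
  return 0;
}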
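Before running on the simulator, it can help to have a host-side golden model that produces the same expected values as the test. The sketch below reads its coefficients {3, 2, 1} off the test's `expected_double` formula (they are not taken from the hardware source), converts inputs to fixed point with 3 fractional bits like `BP`, and keeps only the low 8 bits like the test's `& 0xFF`. `FIR_BP`, `to_fixed`, and `fir_golden` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define FIR_BP 3 /* fractional bits, mirroring BP in the test */

/* Round a real value to fixed point with FIR_BP fractional bits
 * (halves rounded away from zero). */
static int32_t to_fixed(double x)
{
    double scaled = x * (double)(1 << FIR_BP);
    return (int32_t)(scaled < 0.0 ? scaled - 0.5 : scaled + 0.5);
}

/* Golden model: for each full window, out[i] = 3*x[i] + 2*x[i+1] + x[i+2],
 * computed on fixed-point inputs and truncated to 8 bits.
 * Returns the number of outputs written (n - 2 when n >= 3, else 0). */
static size_t fir_golden(const double *x, size_t n, uint32_t *out)
{
    static const int32_t coeff[3] = {3, 2, 1};
    size_t produced = 0;
    for (size_t i = 0; i + 2 < n; i++) {
        int32_t acc = 0;
        for (size_t k = 0; k < 3; k++)
            acc += coeff[k] * to_fixed(x[i + k]);
        out[produced++] = (uint32_t)acc & 0xFFu;
    }
    return produced;
}
```

Because the coefficients are integers and every test input is a multiple of 0.125, the fixed-point sum here equals `expected_double * BP_SCALE` exactly, so the two computations agree.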
4.4 Building a Multi-Core Heterogeneous SoC
class LargeBoomAndHwachaRocketConfig extends Config(
  new chipyard.iobinders.WithUARTAdapter ++
  new chipyard.iobinders.WithTieOffInterrupts ++
  new chipyard.iobinders.WithBlackBoxSimMem ++
  new chipyard.iobinders.WithTiedOffDebug ++
  new chipyard.iobinders.WithSimSerial ++
  new testchipip.WithTSI ++
  new chipyard.config.WithBootROM ++
  new chipyard.config.WithUART ++
  new chipyard.config.WithMultiRoCC ++ // support heterogeneous rocc
  new chipyard.config.WithMultiRoCCHwacha(1) ++ // put hwacha on hart ID 1 (rocket)
  new chipyard.config.WithL2TLBs(1024) ++
  new chipyard.config.WithRenumberHarts ++ // avoid hart ID overlap between the BOOM and Rocket tiles
  new boom.common.WithLargeBooms ++ // large BOOM parameters
  new boom.common.WithNBoomCores(1) ++ // one BOOM core
  new freechips.rocketchip.subsystem.WithNoMMIOPort ++
  new freechips.rocketchip.subsystem.WithNoSlavePort ++
  new freechips.rocketchip.subsystem.WithInclusiveCache ++
  new freechips.rocketchip.subsystem.WithNExtTopInterrupts(0) ++
  new freechips.rocketchip.subsystem.WithNBigCores(1) ++ // one Rocket core
  new freechips.rocketchip.subsystem.WithCoherentBusTopology ++
  new freechips.rocketchip.system.BaseConfig)
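In this configuration only one of the two harts has a Hwacha attached, so bare-metal software has to decide, per hart, whether it may run vector code. A common pattern is to branch on the hart ID from the `mhartid` CSR. The sketch below keeps that decision in a pure function so it can be tested on a host; `HWACHA_HART` is an assumption taken from the `WithMultiRoCCHwacha(1)` argument, and `work_for_hart` and `enum work_kind` are hypothetical names.

```c
/* Assumption: the hart ID that owns the Hwacha, mirroring WithMultiRoCCHwacha(1). */
#define HWACHA_HART 1UL

enum work_kind { WORK_SCALAR, WORK_VECTOR };

/* Only the hart with Hwacha attached should run Hwacha (RoCC) kernels;
 * every other hart takes scalar work.
 *
 * On the RISC-V target, the caller would obtain the hart ID with
 *   unsigned long id; __asm__ volatile ("csrr %0, mhartid" : "=r"(id));
 * which is left out here so the sketch also builds on a host. */
static enum work_kind work_for_hart(unsigned long hartid)
{
    return hartid == HWACHA_HART ? WORK_VECTOR : WORK_SCALAR;
}
```

Keeping the hart-to-role mapping in one place means only `HWACHA_HART` has to change if the accelerator is moved to a different hart in the config.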
