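Cultivating a Multi-Core Heterogeneous RISC-V SoC Seed in the "Chip Yard"
1 Article Overview
This article gives a brief tour of the official Chipyard manual, points out a few things to watch for when setting up the development environment, and finally runs several simple official demos. I hope it is helpful to anyone interested in RISC-V. The official documentation is at https://chipyard.readthedocs.io/en/latest/
Note: most of the code in this article is copied and tidied up from the official manual.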
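2 Chipyard Components
Chipyard is an open-source framework for agile development of Chisel-based systems-on-chip. It lets you leverage the Chisel HDL, the Rocket Chip SoC generator, and other Berkeley projects to produce a RISC-V SoC with everything from MMIO-mapped peripherals to custom accelerators. Chipyard contains:
- processor cores (Rocket, BOOM, Ariane);
- accelerators (Hwacha, Gemmini, NVDLA);
- memory systems, plus additional peripherals and tooling to help build a full-featured SoC.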
2.1 Rocket
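The Rocket core is a standard 5-stage, in-order, scalar processor that supports the RV64GC RISC-V instruction set and is implemented in Chisel.
[Figure: a typical dual-core Rocket implementation]
[Figure: Rocket pipeline structure]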
2.2 BOOM
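BOOM stands for Berkeley Out-of-Order Machine. As the name suggests, it is an out-of-order core; it has a 7-stage pipeline, supports the RV64GC RISC-V instruction set, and is implemented in Chisel.
[Figure: detailed BOOM pipeline structure]
[Figure: simplified BOOM pipeline structure]
[Table: BOOM feature summary]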
2.3 Ariane
Ariane is a 6-stage, in-order, scalar core implemented in SystemVerilog.
[Figure: Ariane pipeline structure]
2.4 Gemmini
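The Gemmini project is developing a generator for matrix multiplication units based on systolic arrays. The generated unit is a coprocessor that integrates with RISC-V Rocket / BOOM processors through the RoCC interface.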
2.5 NVDLA
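NVDLA is an open-source deep learning accelerator developed by NVIDIA. It can be attached to a Rocket Chip SoC over the TileLink bus.
2.6 SHA3 RoCC Accelerator
A coprocessor that integrates with RISC-V Rocket / BOOM processors through the RoCC interface and is dedicated to accelerating SHA3 hashing.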
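3 Setting Up the Environment
Note: Linux only!
The steps below use Ubuntu as an example; for other distributions, refer to the official documentation.
First, install the required dependencies: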
#!/bin/bash
set -ex
sudo apt-get install -y build-essential bison flex
sudo apt-get install -y libgmp-dev libmpfr-dev libmpc-dev zlib1g-dev vim git default-jdk default-jre
# install sbt: https://www.scala-sbt.org/release/docs/Installing-sbt-on-Linux.html
echo "deb https://dl.bintray.com/sbt/debian /" | sudo tee -a /etc/apt/sources.list.d/sbt.list
curl -sL "https://keyserver.ubuntu.com/pks/lookup?op=get&search=0x2EE0EA64E40A89B84B2DF73499E82A75642AC823" | sudo apt-key add
sudo apt-get update
sudo apt-get install -y sbt
sudo apt-get install -y texinfo gengetopt
sudo apt-get install -y libexpat1-dev libusb-dev libncurses5-dev cmake
# deps for poky
sudo apt-get install -y python3.6 patch diffstat texi2html texinfo subversion chrpath git wget
# deps for qemu
sudo apt-get install -y libgtk-3-dev gettext
# deps for firemarshal
sudo apt-get install -y python3-pip python3.6-dev rsync libguestfs-tools expat ctags
# install DTC
sudo apt-get install -y device-tree-compiler
# install verilator
git clone http://git.veripool.org/git/verilator
cd verilator
git checkout v4.034
autoconf && ./configure && make -j30 && sudo make install
Next, use git to download Chipyard together with all of its submodules.
git clone https://github.com/ucb-bar/chipyard.git
cd chipyard
./scripts/init-submodules-no-riscv-tools.sh
Finally, build the required toolchain:
# riscv-tools: if set, builds the riscv toolchain (this is also the default)
# esp-tools: if set, builds esp-tools toolchain used for the hwacha vector accelerator
# ec2fast: if set, pulls in a pre-compiled RISC-V toolchain for an EC2 manager instance
export MAKEFLAGS=-j30
./scripts/build-toolchains.sh riscv-tools # for a normal risc-v toolchain
source ./env.sh
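If the steps above have not finished after most of a day, or fail because of network problems, there are two possible workarounds (better ideas are welcome):
- use a proxy or a VPN;
- mirror the original repositories on Gitee, download them one by one in the background, and then re-run ./scripts/init-submodules-no-riscv-tools.sh and ./scripts/build-toolchains.sh riscv-tools until the toolchain build finally completes.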
4 Examples
4.1 Rocket
We start with a typical Rocket configuration; many more interesting configurations can be found directly in the source file.
// generators/chipyard/src/main/scala/config/RocketConfigs.scala
class RocketConfig extends Config(
  new chipyard.iobinders.WithUARTAdapter ++                     // display UART with a SimUARTAdapter
  new chipyard.iobinders.WithTieOffInterrupts ++                // tie off top-level interrupts
  new chipyard.iobinders.WithBlackBoxSimMem ++                  // drive the master AXI4 memory with a blackbox DRAMSim model
  new chipyard.iobinders.WithTiedOffDebug ++                    // tie off debug (since we are using SimSerial for testing)
  new chipyard.iobinders.WithSimSerial ++                       // drive TSI with SimSerial for testing
  new testchipip.WithTSI ++                                     // use testchipip serial offchip link
  new chipyard.config.WithBootROM ++                            // use default bootrom
  new chipyard.config.WithUART ++                               // add a UART
  new chipyard.config.WithL2TLBs(1024) ++                       // use L2 TLBs
  new freechips.rocketchip.subsystem.WithNoMMIOPort ++          // no top-level MMIO master port (overrides default set in rocketchip)
  new freechips.rocketchip.subsystem.WithNoSlavePort ++         // no top-level MMIO slave port (overrides default set in rocketchip)
  new freechips.rocketchip.subsystem.WithInclusiveCache ++      // use SiFive L2 cache
  new freechips.rocketchip.subsystem.WithNExtTopInterrupts(0) ++ // no external interrupts
  new freechips.rocketchip.subsystem.WithNBigCores(1) ++        // single rocket-core
  new freechips.rocketchip.subsystem.WithCoherentBusTopology ++ // hierarchical buses including mbus+l2
  new freechips.rocketchip.system.BaseConfig)                   // "base" rocketchip system
Build the core:
cd sims/verilator
make CONFIG=RocketConfig -j
[Figure: excerpt of the device tree log, which corresponds to the configuration above]
Then run a benchmark to see how it performs:
cd $RISCV/riscv64-unknown-elf/share/riscv-tests/benchmarks/
make -j
cd $RISCV/../sims/verilator
./simulator-chipyard-RocketConfig $RISCV/riscv64-unknown-elf/share/riscv-tests/benchmarks/dhrystone.riscv
4.2 BOOM
Next, let's look at a Small BOOM configuration.
// generators/chipyard/src/main/scala/config/BoomConfigs.scala
class SmallBoomConfig extends Config(
  new chipyard.iobinders.WithUARTAdapter ++                     // display UART with a SimUARTAdapter
  new chipyard.iobinders.WithTieOffInterrupts ++                // tie off top-level interrupts
  new chipyard.iobinders.WithBlackBoxSimMem ++                  // drive the master AXI4 memory with a blackbox DRAMSim model
  new chipyard.iobinders.WithTiedOffDebug ++                    // tie off debug (since we are using SimSerial for testing)
  new chipyard.iobinders.WithSimSerial ++                       // drive TSI with SimSerial for testing
  new testchipip.WithTSI ++                                     // use testchipip serial offchip link
  new chipyard.config.WithBootROM ++                            // use default bootrom
  new chipyard.config.WithUART ++                               // add a UART
  new chipyard.config.WithL2TLBs(1024) ++                       // use L2 TLBs
  new freechips.rocketchip.subsystem.WithNoMMIOPort ++          // no top-level MMIO master port (overrides default set in rocketchip)
  new freechips.rocketchip.subsystem.WithNoSlavePort ++         // no top-level MMIO slave port (overrides default set in rocketchip)
  new freechips.rocketchip.subsystem.WithInclusiveCache ++      // use SiFive L2 cache
  new freechips.rocketchip.subsystem.WithNExtTopInterrupts(0) ++ // no external interrupts
  new boom.common.WithSmallBooms ++                             // small boom config
  new boom.common.WithNBoomCores(1) ++                          // single-core boom
  new freechips.rocketchip.subsystem.WithCoherentBusTopology ++ // hierarchical buses including mbus+l2
  new freechips.rocketchip.system.BaseConfig)                   // "base" rocketchip system
Run the following commands to build the core:
cd sims/verilator
make CONFIG=SmallBoomConfig -j
[Figure: excerpt of the device tree log, which corresponds to the configuration above]
Then run a benchmark to see how it performs:
cd $RISCV/riscv64-unknown-elf/share/riscv-tests/benchmarks/
make -j
cd $RISCV/../sims/verilator
./simulator-chipyard-SmallBoomConfig $RISCV/riscv64-unknown-elf/share/riscv-tests/benchmarks/dhrystone.riscv
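The benchmark results show that the out-of-order Small BOOM core performs slightly better than the in-order Rocket core (assuming the same core frequency).
A Large BOOM run, in turn, delivers more than twice the performance.
Note: a more rigorous comparison, including against other processors, requires converting the scores to DMIPS/MHz; that is not covered in depth here.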
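4.3 A First Look at an SoC with a Custom Hardware Accelerator
Finally, let's look at a Rocket SoC with an FIR hardware accelerator. Its configuration is:
// generators/chipyard/src/main/scala/config/RocketConfigs.scala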
class StreamingFIRRocketConfig extends Config (
  new chipyard.example.WithStreamingFIR ++ // use top with tilelink-controlled streaming FIR
  new chipyard.iobinders.WithUARTAdapter ++
  new chipyard.iobinders.WithTieOffInterrupts ++
  new chipyard.iobinders.WithBlackBoxSimMem ++
  new chipyard.iobinders.WithTiedOffDebug ++
  new chipyard.iobinders.WithSimSerial ++
  new testchipip.WithTSI ++
  new chipyard.config.WithBootROM ++
  new chipyard.config.WithUART ++
  new chipyard.config.WithL2TLBs(1024) ++
  new freechips.rocketchip.subsystem.WithNoMMIOPort ++
  new freechips.rocketchip.subsystem.WithNoSlavePort ++
  new freechips.rocketchip.subsystem.WithInclusiveCache ++
  new freechips.rocketchip.subsystem.WithNExtTopInterrupts(0) ++
  new freechips.rocketchip.subsystem.WithNBigCores(1) ++
  new freechips.rocketchip.subsystem.WithCoherentBusTopology ++
  new freechips.rocketchip.system.BaseConfig)
cd tests/
make -j
cd ../sims/verilator
make CONFIG=StreamingFIRRocketConfig -j BINARY=../../tests/streaming-fir.riscv run-binary
The log shows that the hardware accelerator occupies a region of the memory map; the test program controls and accesses it through MMIO.
The test code is as follows:
#define PASSTHROUGH_WRITE 0x2000
#define PASSTHROUGH_WRITE_COUNT 0x2008
#define PASSTHROUGH_READ 0x2100
#define PASSTHROUGH_READ_COUNT 0x2108
#define BP 3 // number of fractional bits in the fixed-point format
#define BP_SCALE ((double)(1 << BP))
#include "mmio.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
// Round to the nearest integer; the test vector only contains non-negative values.
uint64_t roundi(double x)
{
  if (x < 0.0) {
    return (uint64_t)(x - 0.5);
  } else {
    return (uint64_t)(x + 0.5);
  }
}
int main(void)
{
  // 13 explicit samples; the remaining 2 of the 15 elements are zero-initialized.
  double test_vector[15] = {1.0, 2.0, 3.0, 4.0, 5.0, 4.0, 3.0, 2.0, 1.0, 0.5, 0.25, 0.125, 0.125};
  uint32_t num_tests = sizeof(test_vector) / sizeof(double);
  printf("Starting writing %u inputs\n", num_tests);
  // Feed the samples to the accelerator as fixed-point values.
  for (uint32_t i = 0; i < num_tests; i++) {
    reg_write64(PASSTHROUGH_WRITE, roundi(test_vector[i] * BP_SCALE));
  }
  printf("Done writing\n");
  uint32_t rcnt = reg_read32(PASSTHROUGH_READ_COUNT);
  printf("Write count: %u\n", reg_read32(PASSTHROUGH_WRITE_COUNT));
  printf("Read count: %u\n", rcnt);
  int failed = 0;
  if (rcnt != 0) {
    // Compare each output with a software model of the 3-tap filter (coefficients 3, 2, 1).
    for (uint32_t i = 0; i < num_tests - 3; i++) {
      uint32_t res = reg_read32(PASSTHROUGH_READ);
      // double res = ((double)reg_read32(PASSTHROUGH_READ)) / BP_SCALE;
      double expected_double = 3*test_vector[i] + 2*test_vector[i+1] + test_vector[i+2];
      // Keep only the low 8 bits of the rounded fixed-point value.
      uint32_t expected = ((uint32_t)(expected_double * BP_SCALE + 0.5)) & 0xFF;
      if (res == expected) {
        printf("\n\nPass: Got %u Expected %u\n\n", res, expected);
      } else {
        failed = 1;
        printf("\n\nFail: Got %u Expected %u\n\n", res, expected);
      }
    }
  } else {
    // Nothing to read back counts as a failure.
    failed = 1;
  }
  if (failed) {
    printf("\n\nSome tests failed\n\n");
  } else {
    printf("\n\nAll tests passed\n\n");
  }
  return 0;
}
[Figure: test results]
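4.4 Building a Multi-Core Heterogeneous SoC
A typical configuration combines a single-core BOOM, a single-core Rocket, and the other necessary components into a heterogeneous SoC: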
class LargeBoomAndHwachaRocketConfig extends Config(
  new chipyard.iobinders.WithUARTAdapter ++
  new chipyard.iobinders.WithTieOffInterrupts ++
  new chipyard.iobinders.WithBlackBoxSimMem ++
  new chipyard.iobinders.WithTiedOffDebug ++
  new chipyard.iobinders.WithSimSerial ++
  new testchipip.WithTSI ++
  new chipyard.config.WithBootROM ++
  new chipyard.config.WithUART ++
  new chipyard.config.WithMultiRoCC ++          // support heterogeneous rocc
  new chipyard.config.WithMultiRoCCHwacha(1) ++ // put hwacha on hart-1 (rocket)
  new chipyard.config.WithL2TLBs(1024) ++
  new chipyard.config.WithRenumberHarts ++
  new boom.common.WithLargeBooms ++
  new boom.common.WithNBoomCores(1) ++
  new freechips.rocketchip.subsystem.WithNoMMIOPort ++
  new freechips.rocketchip.subsystem.WithNoSlavePort ++
  new freechips.rocketchip.subsystem.WithInclusiveCache ++
  new freechips.rocketchip.subsystem.WithNExtTopInterrupts(0) ++
  new freechips.rocketchip.subsystem.WithNBigCores(1) ++
  new freechips.rocketchip.subsystem.WithCoherentBusTopology ++
  new freechips.rocketchip.system.BaseConfig)
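For more details, refer directly to the official documentation and to the follow-ups to this article (if the opportunity arises, they may cover porting a core to an FPGA and porting the Linux operating system).
Compiling this material took effort; plagiarism is strictly prohibited!
You are welcome to follow the WeChat official account I created: 小白倉庫
It shares original experience and material on hardware and software development, including but not limited to FPGA, ARM, RISC-V, Linux, and LabVIEW, along with anecdotes and reflections from everyday life. The goal is a platform for recording what I have learned and sharing what I find useful, so that like-minded readers can exchange ideas and improve together.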