Cultivating a Multi-Core Heterogeneous RISC-V SoC Seed in the "Chip Yard"

Posted by sazc on 2020-12-22

1 Overview

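This article is a brief tour of the official Chipyard manual, together with a few points to watch when setting up the development environment. It ends by running several simple official demos. I hope it proves helpful to readers interested in RISC-V. The official documentation is at https://chipyard.readthedocs.io/en/latest/

Note: most of the code in this article is copied, pasted, and tidied up from the official manual.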

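2 Chipyard Components

Chipyard is an open-source framework for agile development of Chisel-based systems-on-chip. It lets you leverage the Chisel HDL, the Rocket Chip SoC generator, and other Berkeley projects to produce a RISC-V SoC with everything from MMIO-mapped peripherals to custom accelerators. Chipyard contains:

  • processor cores (Rocket, BOOM, Ariane);
  • accelerators (Hwacha, Gemmini, NVDLA);
  • memory systems plus additional peripherals and tooling that help create a full-featured SoC.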

2.1 Rocket

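The Rocket core is a standard 5-stage, in-order, scalar processor. It supports the RV64GC RISC-V instruction set and is implemented in Chisel. A typical dual-core implementation is shown below.
[Figure: typical dual-core Rocket implementation]

Its pipeline structure is as follows.
[Figure: Rocket pipeline structure]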

2.2 BOOM

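BOOM stands for Berkeley Out-of-Order Machine. As the name suggests, it is an out-of-order core. It has a 7-stage pipeline, supports the RV64GC RISC-V instruction set, and is implemented in Chisel. The detailed pipeline structure is shown below.
[Figure: detailed BOOM pipeline structure]
This is the simplified pipeline structure.
[Figure: simplified BOOM pipeline structure]

Its features are summarized in the table below.
[Figure: BOOM feature summary table]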

2.3 Ariane

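Ariane is a 6-stage, in-order, scalar core implemented in SystemVerilog. Its pipeline structure is shown below.
[Figure: Ariane pipeline structure]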

2.4 Gemmini

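The Gemmini project is developing a generator for matrix multiplication units based on systolic arrays. The generated unit is a coprocessor that integrates with RISC-V Rocket / BOOM processors through the RoCC interface.
[Figure: Gemmini architecture]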

2.5 NVDLA

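NVDLA is an open-source deep learning accelerator developed by NVIDIA. It can be attached to a Rocket Chip SoC over the TileLink bus.
[Figure: NVDLA attached to a Rocket Chip SoC]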

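2.6 SHA3 RoCC Accelerator

A coprocessor that integrates with RISC-V Rocket / BOOM processors through the RoCC interface and is dedicated to accelerating SHA3 hashing.
[Figure: SHA3 RoCC accelerator]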

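3 Setting Up the Environment

Note: Linux only!

The steps below use Ubuntu as the example. For other distributions, refer to the official documentation.

First, install the required dependencies: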

#!/bin/bash

set -ex

sudo apt-get install -y build-essential bison flex
sudo apt-get install -y libgmp-dev libmpfr-dev libmpc-dev zlib1g-dev vim git default-jdk default-jre
# install sbt: https://www.scala-sbt.org/release/docs/Installing-sbt-on-Linux.html
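# dl.bintray.com was shut down in 2021; sbt packages are now served from repo.scala-sbt.org
echo "deb https://repo.scala-sbt.org/scalasbt/debian all main" | sudo tee -a /etc/apt/sources.list.d/sbt.list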
curl -sL "https://keyserver.ubuntu.com/pks/lookup?op=get&search=0x2EE0EA64E40A89B84B2DF73499E82A75642AC823" | sudo apt-key add
sudo apt-get update
sudo apt-get install -y sbt
sudo apt-get install -y texinfo gengetopt
sudo apt-get install -y libexpat1-dev libusb-dev libncurses5-dev cmake
# deps for poky
sudo apt-get install -y python3.6 patch diffstat texi2html texinfo subversion chrpath git wget
# deps for qemu
sudo apt-get install -y libgtk-3-dev gettext
# deps for firemarshal
sudo apt-get install -y python3-pip python3.6-dev rsync libguestfs-tools expat ctags
# install DTC
sudo apt-get install -y device-tree-compiler

# install verilator
git clone http://git.veripool.org/git/verilator
cd verilator
git checkout v4.034
autoconf && ./configure && make -j30 && sudo make install
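
As a quick sanity check after the installation, the sketch below lists any expected command that is still missing from PATH. The command names (`git`, `java`, `sbt`, `dtc`, `verilator`) are assumptions about what the packages above provide, and `missing_tools` is a hypothetical helper, not part of Chipyard.

```shell
#!/bin/bash

# Print every command from the argument list that cannot be found on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Assumed command names for the dependencies installed above.
missing=$(missing_tools git java sbt dtc verilator)
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing
else
  echo "All required tools found"
fi
```

If anything is reported as missing, rerun the corresponding install step before cloning Chipyard.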

Next, use git to download Chipyard together with all of its submodules.

git clone https://github.com/ucb-bar/chipyard.git
cd chipyard
./scripts/init-submodules-no-riscv-tools.sh

Finally, build the required toolchain:

# riscv-tools: if set, builds the riscv toolchain (this is also the default)
# esp-tools: if set, builds esp-tools toolchain used for the hwacha vector accelerator
# ec2fast: if set, pulls in a pre-compiled RISC-V toolchain for an EC2 manager instance
export MAKEFLAGS=-j30
./scripts/build-toolchains.sh riscv-tools # for a normal risc-v toolchain
source ./env.sh
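
The later commands rely on the `RISCV` variable that `source ./env.sh` is expected to set. The following sketch checks that the variable is set and that a cross compiler exists under it. `check_riscv_env` is a hypothetical helper, and the compiler name `riscv64-unknown-elf-gcc` is an assumption inferred from the `riscv64-unknown-elf` directory used later in this article.

```shell
#!/bin/bash

# Succeed only if RISCV is set and points at a directory with the cross compiler.
check_riscv_env() {
  if [ -z "${RISCV:-}" ]; then
    echo "RISCV is not set; run 'source ./env.sh' in the chipyard directory" >&2
    return 1
  fi
  # Assumed compiler name and location.
  if [ ! -x "$RISCV/bin/riscv64-unknown-elf-gcc" ]; then
    echo "no riscv64-unknown-elf-gcc under $RISCV/bin; the toolchain build may be incomplete" >&2
    return 1
  fi
  echo "RISC-V toolchain found in $RISCV"
}

check_riscv_env || echo "environment is not ready yet"
```

Running this in every new terminal is a cheap way to notice a forgotten `source ./env.sh` before a long simulation build fails.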

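If the steps above have not finished after many hours, or fail outright because of network problems, there are two workarounds (better suggestions are welcome):

  • use a proxy or a VPN;
  • mirror the original repositories on Gitee, download them one by one in the background, then rerun ./scripts/init-submodules-no-riscv-tools.sh and ./scripts/build-toolchains.sh riscv-tools until the toolchain build finally completes.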
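
The "rerun until it completes" approach can be automated with a small retry loop. This is a minimal sketch: `retry` is a hypothetical helper, and the attempt count and delay are arbitrary values chosen for illustration.

```shell
#!/bin/bash

# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
retry() {
  local max_attempts=$1 delay=$2 attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed, retrying in ${delay}s: $*" >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}
```

From the chipyard directory, this could be used as `retry 10 30 ./scripts/init-submodules-no-riscv-tools.sh`, followed by `retry 10 30 ./scripts/build-toolchains.sh riscv-tools`. This assumes, as the workaround above does, that both scripts can safely be run again after a failure.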

4 Examples

4.1 Rocket

Start with a typical Rocket configuration. More interesting configurations can be found directly in the source file.

//generators/chipyard/src/main/scala/config/RocketConfigs.scala
class RocketConfig extends Config(
  new chipyard.iobinders.WithUARTAdapter ++                      // display UART with a SimUARTAdapter
  new chipyard.iobinders.WithTieOffInterrupts ++                 // tie off top-level interrupts
  new chipyard.iobinders.WithBlackBoxSimMem ++                   // drive the master AXI4 memory with a blackbox DRAMSim model
  new chipyard.iobinders.WithTiedOffDebug ++                     // tie off debug (since we are using SimSerial for testing)
  new chipyard.iobinders.WithSimSerial ++                        // drive TSI with SimSerial for testing
  new testchipip.WithTSI ++                                      // use testchipip serial offchip link
  new chipyard.config.WithBootROM ++                             // use default bootrom
  new chipyard.config.WithUART ++                                // add a UART
  new chipyard.config.WithL2TLBs(1024) ++                        // use L2 TLBs
  new freechips.rocketchip.subsystem.WithNoMMIOPort ++           // no top-level MMIO master port (overrides default set in rocketchip)
  new freechips.rocketchip.subsystem.WithNoSlavePort ++          // no top-level MMIO slave port (overrides default set in rocketchip)
  new freechips.rocketchip.subsystem.WithInclusiveCache ++       // use Sifive L2 cache
  new freechips.rocketchip.subsystem.WithNExtTopInterrupts(0) ++ // no external interrupts
  new freechips.rocketchip.subsystem.WithNBigCores(1) ++         // single rocket-core
  new freechips.rocketchip.subsystem.WithCoherentBusTopology ++  // hierarchical buses including mbus+l2
  new freechips.rocketchip.system.BaseConfig)                    // "base" rocketchip system

Build the core:

cd sims/verilator
make CONFIG=RocketConfig -j

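The device-tree excerpt from the build log below corresponds to the configuration above.
[Figure: device-tree excerpt from the RocketConfig build log]

Then run a benchmark to check the performance: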

cd $RISCV/riscv64-unknown-elf/share/riscv-tests/benchmarks/
make -j
cd $RISCV/../sims/verilator
./simulator-chipyard-RocketConfig $RISCV/riscv64-unknown-elf/share/riscv-tests/benchmarks/dhrystone.riscv

[Figure: Dhrystone output for RocketConfig]

4.2 BOOM

Next, look at a Small BOOM configuration:

// generators/chipyard/src/main/scala/config/BoomConfigs.scala
class SmallBoomConfig extends Config(
  new chipyard.iobinders.WithUARTAdapter ++                      // display UART with a SimUARTAdapter
  new chipyard.iobinders.WithTieOffInterrupts ++                 // tie off top-level interrupts
  new chipyard.iobinders.WithBlackBoxSimMem ++                   // drive the master AXI4 memory with a blackbox DRAMSim model
  new chipyard.iobinders.WithTiedOffDebug ++                     // tie off debug (since we are using SimSerial for testing)
  new chipyard.iobinders.WithSimSerial ++                        // drive TSI with SimSerial for testing
  new testchipip.WithTSI ++                                      // use testchipip serial offchip link
  new chipyard.config.WithBootROM ++                             // use default bootrom
  new chipyard.config.WithUART ++                                // add a UART
  new chipyard.config.WithL2TLBs(1024) ++                        // use L2 TLBs
  new freechips.rocketchip.subsystem.WithNoMMIOPort ++           // no top-level MMIO master port (overrides default set in rocketchip)
  new freechips.rocketchip.subsystem.WithNoSlavePort ++          // no top-level MMIO slave port (overrides default set in rocketchip)
  new freechips.rocketchip.subsystem.WithInclusiveCache ++       // use Sifive L2 cache
  new freechips.rocketchip.subsystem.WithNExtTopInterrupts(0) ++ // no external interrupts
  new boom.common.WithSmallBooms ++                              // small boom config
  new boom.common.WithNBoomCores(1) ++                           // single-core boom
  new freechips.rocketchip.subsystem.WithCoherentBusTopology ++  // hierarchical buses including mbus+l2
  new freechips.rocketchip.system.BaseConfig)                    // "base" rocketchip system

Run the following commands to build the core:

cd sims/verilator
make CONFIG=SmallBoomConfig -j

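The device-tree excerpt from the build log below corresponds to the configuration above.
[Figure: device-tree excerpt from the SmallBoomConfig build log]

Then run a benchmark to check the performance: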

cd $RISCV/riscv64-unknown-elf/share/riscv-tests/benchmarks/
make -j
cd $RISCV/../sims/verilator
./simulator-chipyard-SmallBoomConfig $RISCV/riscv64-unknown-elf/share/riscv-tests/benchmarks/dhrystone.riscv

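[Figure: Dhrystone output for SmallBoomConfig]
The scores show that the out-of-order Small BOOM core is slightly faster than the in-order Rocket core (assuming the same core frequency).

A Large BOOM core, by comparison, delivers more than a twofold performance improvement.
[Figure: Dhrystone output for a Large BOOM configuration]
Note: a more rigorous comparison would convert the scores to DMIPS/MHz and compare them against other processors. That is not covered here.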
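
The DMIPS/MHz conversion mentioned in the note is simple arithmetic: divide the Dhrystones-per-second score by 1757 (the score of the VAX 11/780 reference machine, defined as 1 DMIPS), then divide by the clock frequency in MHz. The sketch below assumes you have read the Dhrystones-per-second figure from the benchmark output and know the clock frequency it was computed for. `dmips_per_mhz` is a hypothetical helper, and the input numbers are made up.

```shell
#!/bin/bash

# dmips_per_mhz DHRYSTONES_PER_SECOND CLOCK_MHZ
# DMIPS/MHz = (Dhrystones per second / 1757) / clock frequency in MHz
dmips_per_mhz() {
  awk -v dps="$1" -v mhz="$2" 'BEGIN { printf "%.2f\n", dps / 1757 / mhz }'
}

# Hypothetical result: 3514 Dhrystones per second at 1 MHz.
dmips_per_mhz 3514 1   # prints 2.00
```

Because the result is normalized by frequency, it can be compared across cores that run at different clock rates.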

4.3 A First Look at a Custom Hardware Accelerator SoC

Finally, consider a Rocket SoC with a FIR hardware accelerator. Its configuration is:

//generators/chipyard/src/main/scala/config/RocketConfigs.scala
class StreamingFIRRocketConfig extends Config (
  new chipyard.example.WithStreamingFIR ++ // use top with tilelink-controlled streaming FIR
  new chipyard.iobinders.WithUARTAdapter ++
  new chipyard.iobinders.WithTieOffInterrupts ++
  new chipyard.iobinders.WithBlackBoxSimMem ++
  new chipyard.iobinders.WithTiedOffDebug ++
  new chipyard.iobinders.WithSimSerial ++
  new testchipip.WithTSI ++
  new chipyard.config.WithBootROM ++
  new chipyard.config.WithUART ++
  new chipyard.config.WithL2TLBs(1024) ++
  new freechips.rocketchip.subsystem.WithNoMMIOPort ++
  new freechips.rocketchip.subsystem.WithNoSlavePort ++
  new freechips.rocketchip.subsystem.WithInclusiveCache ++
  new freechips.rocketchip.subsystem.WithNExtTopInterrupts(0) ++
  new freechips.rocketchip.subsystem.WithNBigCores(1) ++
  new freechips.rocketchip.subsystem.WithCoherentBusTopology ++
  new freechips.rocketchip.system.BaseConfig)

Build the test programs, then run the FIR test in the Verilator simulation:

cd tests/
make -j
cd ../sims/verilator
make CONFIG=StreamingFIRRocketConfig -j BINARY=../../tests/streaming-fir.riscv run-binary

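The log shows that the hardware accelerator has been given its own region in the memory map. The test program then controls it through MMIO accesses.
[Figure: memory-map excerpt from the log showing the FIR accelerator]
The test code is as follows.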

#define PASSTHROUGH_WRITE 0x2000
#define PASSTHROUGH_WRITE_COUNT 0x2008
#define PASSTHROUGH_READ 0x2100
#define PASSTHROUGH_READ_COUNT 0x2108

#define BP 3
#define BP_SCALE ((double)(1 << BP))

#include "mmio.h"

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

uint64_t roundi(double x)
{
  if (x < 0.0) {
    return (uint64_t)(x - 0.5);
  } else {
    return (uint64_t)(x + 0.5);
  }
}

int main(void)
{
  double test_vector[15] = {1.0, 2.0, 3.0, 4.0, 5.0, 4.0, 3.0, 2.0, 1.0, 0.5, 0.25, 0.125, 0.125};
  uint32_t num_tests = sizeof(test_vector) / sizeof(double);
  printf("Starting writing %d inputs\n", num_tests);

  for (int i = 0; i < num_tests; i++) {
    reg_write64(PASSTHROUGH_WRITE, roundi(test_vector[i] * BP_SCALE));
  }

  printf("Done writing\n");
  uint32_t rcnt = reg_read32(PASSTHROUGH_READ_COUNT);
  printf("Write count: %d\n", reg_read32(PASSTHROUGH_WRITE_COUNT));
  printf("Read count: %d\n", rcnt);

  int failed = 0;
  if (rcnt != 0) {
    for (int i = 0; i < num_tests - 3; i++) {
      uint32_t res = reg_read32(PASSTHROUGH_READ);
      // double res = ((double)reg_read32(PASSTHROUGH_READ)) / BP_SCALE;
      double expected_double = 3*test_vector[i] + 2*test_vector[i+1] + test_vector[i+2];
      uint32_t expected = ((uint32_t)(expected_double * BP_SCALE + 0.5)) & 0xFF;
      if (res == expected) {
        printf("\n\nPass: Got %u Expected %u\n\n", res, expected);
      } else {
        failed = 1;
        printf("\n\nFail: Got %u Expected %u\n\n", res, expected);
      }
    }
  } else {
    failed = 1;
  }

  if (failed) {
    printf("\n\nSome tests failed\n\n");
  } else {
    printf("\n\nAll tests passed\n\n");
  }
  
  return 0;
}

The test results are as follows.
[Figure: streaming FIR test output]
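
The expected values checked by the test program can be reproduced on the host. The sketch below applies the same arithmetic as the C code: a 3-tap filter with coefficients 3, 2, 1, scaling by 2^BP with BP = 3, rounding by adding 0.5, and masking to 8 bits. `fir_expected` is a hypothetical helper, and unlike the test program it emits one value for every full 3-sample window of its input.

```shell
#!/bin/bash

# fir_expected SAMPLE...
# Prints the expected 8-bit fixed-point output for each 3-sample window.
fir_expected() {
  printf '%s\n' "$@" | awk -v bp=3 '
    { x[NR] = $1 }                       # collect the input samples
    END {
      scale = 2 ^ bp                     # BP_SCALE in the C test
      for (i = 1; i + 2 <= NR; i++) {
        e = 3 * x[i] + 2 * x[i + 1] + x[i + 2]
        printf "%d\n", int(e * scale + 0.5) % 256   # round, keep the low 8 bits
      }
    }'
}

# First five samples of the test vector: prints 80, 128 and 176, one per line.
fir_expected 1.0 2.0 3.0 4.0 5.0
```

Comparing these numbers with the "Got ... Expected ..." lines of the simulation output is a quick way to confirm that a mismatch comes from the hardware and not from the reference calculation.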

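4.4 Building a Multi-Core Heterogeneous SoC

A typical configuration combines a single BOOM core and a single Rocket core with the other necessary components into a heterogeneous SoC. In the configuration below, the Hwacha vector accelerator is attached to the Rocket core.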

class LargeBoomAndHwachaRocketConfig extends Config(
  new chipyard.iobinders.WithUARTAdapter ++
  new chipyard.iobinders.WithTieOffInterrupts ++
  new chipyard.iobinders.WithBlackBoxSimMem ++
  new chipyard.iobinders.WithTiedOffDebug ++
  new chipyard.iobinders.WithSimSerial ++
  new testchipip.WithTSI ++
  new chipyard.config.WithBootROM ++
  new chipyard.config.WithUART ++
  new chipyard.config.WithMultiRoCC ++                                  // support heterogeneous rocc
  new chipyard.config.WithMultiRoCCHwacha(1) ++                         // put hwacha on hart-1 (rocket)
  new chipyard.config.WithL2TLBs(1024) ++
  new chipyard.config.WithRenumberHarts ++
  new boom.common.WithLargeBooms ++
  new boom.common.WithNBoomCores(1) ++
  new freechips.rocketchip.subsystem.WithNoMMIOPort ++
  new freechips.rocketchip.subsystem.WithNoSlavePort ++
  new freechips.rocketchip.subsystem.WithInclusiveCache ++
  new freechips.rocketchip.subsystem.WithNExtTopInterrupts(0) ++
  new freechips.rocketchip.subsystem.WithNBigCores(1) ++
  new freechips.rocketchip.subsystem.WithCoherentBusTopology ++
  new freechips.rocketchip.system.BaseConfig)

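For more detail, consult the official documentation directly, and watch for follow-up articles (if the opportunity arises, these may cover porting a core to an FPGA and porting the Linux operating system).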



Compiling this article took real effort; plagiarism is strictly prohibited!
