Why systolic architectures?
- Simple and regular design
- nonrecurring cost (design): a simple, regular hardware architecture; Google was able to complete the chip's design and implementation in a very short time.
- recurring cost (parts)
- Concurrency and communication
- Balancing computation with I/O
“(Semi-) systolic convolution arrays with global data communication”
broadcast inputs, move results, weights stay
broadcast inputs, move weights, results stay
fan-in results, move inputs, weights stay
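As a rough illustration of the first design in this group (broadcast inputs, move results, weights stay), here is a minimal Python sketch. It assumes the convolution is defined as y[i] = w[0]*x[i] + w[1]*x[i+1] + ... + w[k-1]*x[i+k-1]; the function names and the register model are hypothetical and only meant to show the data movement, not any particular hardware.

```python
def direct_convolve(weights, xs):
    """Reference result: y[i] = sum_j weights[j] * xs[i + j]."""
    k = len(weights)
    return [sum(weights[j] * xs[i + j] for j in range(k))
            for i in range(len(xs) - k + 1)]


def semi_systolic_convolve(weights, xs):
    """Cycle-by-cycle model: weights stay, inputs are broadcast, results move.

    Cell j permanently holds weights[j]. In each cycle one input is broadcast
    to every cell (the global communication), and every partial result moves
    one cell to the right while picking up one more product.
    """
    k = len(weights)
    regs = [0] * k          # regs[j]: partial result leaving cell j
    ys = []
    for t, x in enumerate(xs):
        # All cells update in lockstep from the previous cycle's registers.
        regs = [(regs[j - 1] if j > 0 else 0) + weights[j] * x
                for j in range(k)]
        # The first k - 1 values leaving the last cell are incomplete sums.
        if t >= k - 1:
            ys.append(regs[-1])
    return ys


if __name__ == "__main__":
    w, xs = [1, 2, 3], [1, 2, 3, 4, 5]
    print(semi_systolic_convolve(w, xs))  # [14, 20, 26]
    assert semi_systolic_convolve(w, xs) == direct_convolve(w, xs)
```

Only the partial results travel between neighbouring cells here; the broadcast of `x` to every cell is what makes the design semi-systolic rather than purely systolic.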
“(Pure-) systolic convolution arrays without global data communication”
results stay, inputs and weights move in opposite directions
results stay, inputs and weights move in the same direction but at different speeds
weights stay, inputs and results move in opposite directions
weights stay, inputs and results move in the same direction but at different speeds
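The last design in this group (weights stay, inputs and results move in the same direction but at different speeds) can be sketched in Python as follows. This is a sketch under stated assumptions: the convolution is taken to be y[i] = w[0]*x[i] + ... + w[k-1]*x[i+k-1], each input is assumed to spend two cycles per cell while each result spends one, and the weights are assumed to be stored in reverse order so that the faster results catch up with the right inputs. The function name and register names are hypothetical.

```python
def pure_systolic_convolve(weights, xs):
    """Weights stay; x and y both move right, with x at half the speed of y.

    Every cell reads only from its left neighbour, so there is no broadcast
    or other global data communication.
    """
    k, n = len(weights), len(xs)
    cell_w = weights[::-1]   # assumption: cell c holds weights[k - 1 - c]
    xa = [0] * k             # first x register of each cell
    xb = [0] * k             # second x register (the extra cycle of delay)
    y = [0] * k              # partial result register of each cell
    out = []
    # Feed zeros after the inputs run out so the last results can drain.
    for t in range(n + k - 1):
        x_feed = xs[t] if t < n else 0
        new_xa, new_xb, new_y = [0] * k, [0] * k, [0] * k
        for c in range(k):
            x_in = x_feed if c == 0 else xb[c - 1]   # x from the left neighbour
            y_in = 0 if c == 0 else y[c - 1]         # y from the left neighbour
            new_y[c] = y_in + cell_w[c] * x_in
            new_xa[c] = x_in
            new_xb[c] = xa[c]                        # x waits one extra cycle
        xa, xb, y = new_xa, new_xb, new_y
        # Complete results start leaving the last cell after 2 * (k - 1) cycles.
        if t >= 2 * (k - 1):
            out.append(y[-1])
    return out


if __name__ == "__main__":
    print(pure_systolic_convolve([1, 2, 3], [1, 2, 3, 4, 5]))  # [14, 20, 26]
```

Compared with a broadcast design, the price of purely local communication in this sketch is latency: the first result appears after 2 * (k - 1) cycles instead of k - 1.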
Apply some reformatting to the original matrix
from: 深入理解Google TPU的脈動陣列架構 (An In-Depth Look at the Systolic Array Architecture of Google's TPU)
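The note about reformatting the original matrix does not say what the reformatting is. One common form in systolic matrix multipliers is skewing: row i of the input is delayed by i cycles so that each element reaches its cell at the right time. The Python sketch below assumes that this is the reformatting meant; the function name `skew_rows` and the zero padding are hypothetical.

```python
def skew_rows(matrix, fill=0):
    """Delay row i by i time steps, padding with `fill`.

    Each output row is the schedule of values fed into one edge of the array,
    one column per cycle. This is an assumed interpretation of "reformatting".
    """
    rows = len(matrix)
    return [[fill] * i + list(row) + [fill] * (rows - 1 - i)
            for i, row in enumerate(matrix)]


if __name__ == "__main__":
    for schedule in skew_rows([[1, 2, 3],
                               [4, 5, 6],
                               [7, 8, 9]]):
        print(schedule)
    # [1, 2, 3, 0, 0]
    # [0, 4, 5, 6, 0]
    # [0, 0, 7, 8, 9]
```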