HDU4920 Matrix multiplication (how the CPU cache affects program performance)

Posted by bigbigship on 2014-08-06
Problem Description
Given two matrices A and B of size n×n, find the product of them.

bobo hates big integers. So you are only asked to find the result modulo 3.
 

Input
The input consists of several tests. For each test:

The first line contains n (1 ≤ n ≤ 800). Each of the following n lines contains n integers: the description of the matrix A. The j-th integer in the i-th line equals A_ij. The next n lines describe the matrix B in the same format (0 ≤ A_ij, B_ij ≤ 10^9).

Output
For each test:

Print n lines. Each of them contains n integers: the matrix A×B in the same format.

Sample Input
1
0
1
2
0 1
2 3
4 5
6 7

Sample Output
0
0 1
2 1
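For the second test in the sample, the product works out as follows (the first test is simply 0·1 = 0):

```latex
A=\begin{pmatrix}0&1\\2&3\end{pmatrix},\quad
B=\begin{pmatrix}4&5\\6&7\end{pmatrix},\quad
AB=\begin{pmatrix}0\cdot4+1\cdot6 & 0\cdot5+1\cdot7\\ 2\cdot4+3\cdot6 & 2\cdot5+3\cdot7\end{pmatrix}
=\begin{pmatrix}6&7\\26&31\end{pmatrix}
\equiv\begin{pmatrix}0&1\\2&1\end{pmatrix}\pmod{3}
```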


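The classic matrix multiplication puts k in the third (innermost) loop, so b[k][j] walks through b column by column. A two-dimensional array is stored contiguously row by row (row-major order), so every step down a column jumps a full row: maxn*sizeof(int) bytes, i.e. 810*4 = 3240 bytes for the arrays in the code below, assuming a 4-byte int. Each such access therefore lands on a different cache line. With n up to 800, the lines touched while walking one column generally do not all fit in the L1 cache, so under the cache's replacement policy they tend to be evicted before the following values of j can reuse them, and the result is a large number of cache misses.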

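The improved version (square2.cpp in the linked example) therefore swaps the j and k loops. With j innermost, the statement

c[i][j] += a[i][k] * b[k][j];

accesses memory contiguously: as j increases, c[i][j] and b[k][j] both move along a row while a[i][k] stays fixed. This raises the cache hit rate and greatly speeds up the program.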

For a detailed example, see: http://blog.csdn.net/a775700879/article/details/11750703

The code is as follows:

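#include <cstdio>
using namespace std;

const int maxn = 810;  // n <= 800, plus a little slack

// Global arrays: too large for the default stack, and zero-initialized.
int a[maxn][maxn], b[maxn][maxn], c[maxn][maxn];

int n;

int main()
{
    // Read test cases until EOF (scanf returns EOF == -1, and ~(-1) == 0).
    while(~scanf("%d",&n)){
        int i,j,k;
        // Read A, reducing each entry mod 3 right away, and clear C.
        for(i=0;i<n;i++){
            for(j=0;j<n;j++){
                scanf("%d",&a[i][j]);
                a[i][j]%=3;
                c[i][j]=0;
            }
        }
        // Read B, also reduced mod 3.
        for(i=0;i<n;i++)
            for(j=0;j<n;j++){
                scanf("%d",&b[i][j]);
                b[i][j]%=3;
            }
        // i-k-j order: the innermost j loop walks along row k of b and
        // row i of c, so both are accessed contiguously.
        for(i=0;i<n;i++)
            for(k=0;k<n;k++)
                for(j=0;j<n;j++)
                    c[i][j]=c[i][j]+a[i][k]*b[k][j];
        // Each c[i][j] is at most n*2*2 <= 3200, so reducing once here is enough.
        for(i=0;i<n;i++){
            for(j=0;j<n-1;j++)
                printf("%d ",c[i][j]%3);
            printf("%d\n",c[i][n-1]%3);
        }
    }
    return 0;
}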
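The solution reduces every input entry modulo 3 as soon as it is read and reduces the accumulated sums only when printing. The identity and the bound behind that choice, using the problem's limits n ≤ 800 and 0 ≤ A_ij, B_ij ≤ 10^9:

```latex
(AB)_{ij} \bmod 3
  = \Big(\sum_{k=1}^{n} A_{ik} B_{kj}\Big) \bmod 3
  = \Big(\sum_{k=1}^{n} (A_{ik} \bmod 3)\,(B_{kj} \bmod 3)\Big) \bmod 3,
\qquad
0 \le \sum_{k=1}^{n} (A_{ik} \bmod 3)\,(B_{kj} \bmod 3) \le 2 \cdot 2 \cdot n \le 3200 .
```

So each c[i][j] stays at most 3200, far from overflowing an int, and the raw inputs fit in a 32-bit int because 10^9 < 2^31 - 1 = 2147483647.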

