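Today's topic is how division is implemented under the hood in ARM assembly. The assembly listing itself is long, so it is placed at the end of this post for reference.
Below is the ARM division routine translated from assembly into C, followed by a walkthrough.
To keep the post short, only unsigned integer division is covered here; for the differences between unsigned and signed division, see the previous post.
The code is fairly long and reads better on a desktop screen. If you are short on patience, skip ahead to the explanation: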
#include <stdio.h>

/* Count the zero bits above the highest set bit of a 32-bit value (32 for 0). */
unsigned int count_leading_zeros(unsigned int num)
{
    unsigned int cnt = 0;
    while (!(num & 0x80000000) && cnt < 32) {
        cnt++;
        num <<= 1;
    }
    return cnt;
}

unsigned int div_unsigned(unsigned int dividend, unsigned int divisor)
{
    unsigned int answer = 0;
    int cc;
    unsigned int divisor_lz = 0, dividend_lz = 0;

    /* Trivial divisors: x / 1 == x; division by zero returns all ones. */
    if (divisor == 1) {
        return dividend;
    } else if (divisor < 1) {
        return 0xFFFFFFFFu;
    }
    /* dividend <= divisor: the quotient is 1 or 0. */
    if (divisor == dividend) {
        return 1;
    } else if (dividend < divisor) {
        return 0;
    }
    /* Power-of-two divisor: a single right shift replaces the division. */
    if ((divisor & (divisor - 1)) == 0) {
        return dividend >> (31 - count_leading_zeros(divisor));
    }
    divisor_lz = count_leading_zeros(divisor);
    dividend_lz = count_leading_zeros(dividend);
    printf("dividend[0x%x], dividend_lz[%u], divisor[0x%x], divisor_lz[%u]\n", dividend, dividend_lz, divisor, divisor_lz);
    /* Shift-and-subtract: one quotient bit per iteration, from high to low. */
    cc = divisor_lz - dividend_lz;
    while (cc >= 0) {
        answer <<= 1;
        if (dividend >= (divisor << cc)) {
            answer += 1;
            dividend -= (divisor << cc);
        }
        cc--;
    }
    return answer;
}

int main(void)
{
    unsigned int a = 0x80000000 / 3;
    unsigned int b = div_unsigned(0x80000000, 3);
    printf("[0x%x][0x%x]\n", a, b);
    return 0;
}
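Powers of two and shift operations
In the code above we finally see how shifting optimizes division:
When the divisor is a power of two (2^N), the division can be replaced by shifting the dividend right by N bits. For example, a divisor of 2 (2 to the power 1) needs a right shift by one bit, and a divisor of 8 needs a right shift by three bits.
Checking whether a number is a power of two takes a single line of code:
if ((divisor & (divisor - 1)) == 0){
This one line is also very nearly the answer to LeetCode problem 231, Power of Two: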
2^x | n | n - 1 | n & (n - 1) |
---|---|---|---|
2^0 | 0001 | 0000 | (0001) & (0000) == 0 |
2^1 | 0010 | 0001 | (0010) & (0001) == 0 |
2^2 | 0100 | 0011 | (0100) & (0011) == 0 |
2^3 | 1000 | 0111 | (1000) & (0111) == 0 |
If anything is unclear, see the LeetCode solution write-up: https://leetcode-cn.com/problems/power-of-two/solution/power-of-two-er-jin-zhi-ji-jian-by-jyd/
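The table can be condensed into a small standalone helper. This is only a sketch: the name is_power_of_two is an assumption and does not appear in the listing above. It also shows one detail the one-line check hides: 0 & (0 - 1) is 0 as well, so a general-purpose test needs an explicit zero guard. div_unsigned gets away without it only because a divisor of 0 has already returned earlier.

```c
#include <stdbool.h>

/* Hypothetical helper (not part of the article's listing): true only when
 * n has exactly one bit set. The n != 0 guard is needed because
 * 0 & (0 - 1) is also 0, yet 0 is not a power of two. */
static bool is_power_of_two(unsigned int n)
{
    return n != 0 && (n & (n - 1)) == 0;
}
```

Called with 1, 2, 4 or 8 the helper returns true, matching the rows of the table; called with 0 or 6 it returns false.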
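Computing the N in 2^N also takes only one expression:
(31 - count_leading_zeros(divisor))
count_leading_zeros is the number of consecutive 0 bits, counted from the most significant bit downward, when a number is written as 32 binary digits.
For example, the number 2 in binary is: 0000 0000 0000 0000 0000 0000 0000 0010
There are 30 zeros before the first 1 bit, so N = 31 - 30 = 1.
Testing for a power of two, computing N, and doing the right shift together take only the following three lines.
if ((divisor & (divisor - 1)) == 0){
    return dividend >> (31 - count_leading_zeros(divisor));
}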
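Why use count_leading_zeros? I defined it as a function in the code above, but in ARM assembly it is a single instruction, clz. Computing N and performing the right shift takes just three instructions, which is very efficient:
clz r2, r1 // count the leading zeros of the divisor
rsb r2, r2, #31 // 31 - count_leading_zeros(divisor)
lsr.w r0, r0, r2 // shift the dividend right by that amount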
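The three instructions can be mirrored in C roughly as follows. This is a sketch under stated assumptions: unsigned int is 32 bits wide, and the names clz32 and div_pow2 are made up for illustration. On GCC and Clang it uses the __builtin_clz builtin, which compilers can usually turn into a single clz on ARM cores that have the instruction; elsewhere it falls back to the same loop as count_leading_zeros.

```c
/* Hypothetical wrapper: leading-zero count of a 32-bit value, 32 for 0.
 * __builtin_clz is undefined for 0, hence the explicit check. */
static unsigned int clz32(unsigned int x)
{
#if defined(__GNUC__)
    return x ? (unsigned int)__builtin_clz(x) : 32u;
#else
    unsigned int n = 0;
    while (n < 32 && !(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
#endif
}

/* Hypothetical helper: divide by a power of two with one shift.
 * Assumes divisor is a nonzero power of two, i.e. divisor == 1u << n. */
static unsigned int div_pow2(unsigned int dividend, unsigned int divisor)
{
    unsigned int n = 31u - clz32(divisor);  /* same as rsb r2, r2, #31 */
    return dividend >> n;                   /* same as lsr.w r0, r0, r2 */
}
```

For instance, div_pow2(100, 8) shifts right by 31 - 28 = 3 bits and yields 12, the same result as 100 / 8.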
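Binary division explained
More often than not, the divisor is not a power of two. If the divisor is 3, a proper long division is still required.
I drew a figure comparing 8/3 worked out in decimal and in binary.
In binary, no digit can be greater than 1. So when two numbers have the same number of leading zeros, both lie in the same range [2^k, 2^(k+1)), and the dividend is less than twice the divisor. The divisor therefore fits into the dividend either once or not at all.
Binary division first shifts the divisor left until it is "aligned" with the dividend (that is, both have the same number of leading zeros). If the dividend is greater than or equal to the shifted divisor, the corresponding quotient bit is set to 1 and the shifted divisor is subtracted from the dividend; otherwise the bit is 0. The shifted divisor is then moved right by one bit and compared with the dividend again, until the divisor is back at its original value (which is when the last step runs). The code below shows this: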
cc = divisor_lz - dividend_lz;
while(cc >= 0){
    answer <<= 1;
    if (dividend >= (divisor << cc)){
        answer += 1;
        dividend -= (divisor << cc);
    }
    cc--;
}
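A side effect of this loop is that whatever is left in dividend when it finishes is the remainder. The sketch below wraps the same loop in a function that returns both values. The names clz32 and divmod_unsigned are hypothetical, and the code assumes a 32-bit unsigned int and a nonzero divisor; it is an illustration of the shift-and-subtract idea, not a transcription of __udivsi3.

```c
/* Hypothetical helper: leading-zero count of a 32-bit value, 32 for 0. */
static unsigned int clz32(unsigned int x)
{
    unsigned int n = 0;
    while (n < 32 && !(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Shift-and-subtract division that also reports the remainder.
 * Assumes divisor != 0. */
static unsigned int divmod_unsigned(unsigned int dividend,
                                    unsigned int divisor,
                                    unsigned int *remainder)
{
    unsigned int answer = 0;
    /* Number of left shifts needed to align the divisor with the dividend.
     * Negative when the dividend is shorter, in which case the loop is
     * skipped and the quotient stays 0. */
    int cc = (int)clz32(divisor) - (int)clz32(dividend);

    while (cc >= 0) {
        answer <<= 1;                        /* make room for the next bit */
        if (dividend >= (divisor << cc)) {   /* shifted divisor fits once */
            answer += 1;
            dividend -= (divisor << cc);
        }
        cc--;                                /* move the divisor right by one */
    }
    *remainder = dividend;                   /* what could not be subtracted */
    return answer;
}
```

Tracing 8 / 3 by hand: 8 is 1000 and 3 is 11, so cc starts at 2. With cc = 2 the shifted divisor is 12, which is larger than 8, so the bit is 0. With cc = 1 it is 6, which fits, so the bit is 1 and 2 is left over. With cc = 0 it is 3, which is larger than 2, so the bit is 0. The quotient is binary 010, that is 2, and the remainder is 2.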
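So in the world of binary integer division, subtraction and shifts are all that is needed. Only at the end did I realize that binary division is this simple, simpler even than decimal division.
That is the end of the post; reference material follows.
ARM instruction set references: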
http://users.ece.utexas.edu/~valvano/Volume1/QuickReferenceCard.pdf
https://iitd-plos.github.io/col718/ref/arm-instructionset.pdf
The assembly for unsigned integer division (__udivsi3) is as follows:
00010490 <__udivsi3>:
10490: 1e4a subs r2, r1, #1
10492: bf08 it eq
10494: 4770 bxeq lr
10496: f0c0 8124 bcc.w 106e2 <__udivsi3+0x252>
1049a: 4288 cmp r0, r1
1049c: f240 8116 bls.w 106cc <__udivsi3+0x23c>
104a0: 4211 tst r1, r2
104a2: f000 8117 beq.w 106d4 <__udivsi3+0x244>
104a6: fab0 f380 clz r3, r0
104aa: fab1 f281 clz r2, r1
104ae: eba2 0303 sub.w r3, r2, r3
104b2: f1c3 031f rsb r3, r3, #31
104b6: a204 add r2, pc, #16 ; (adr r2, 104c8 <__udivsi3+0x38>)
104b8: eb02 1303 add.w r3, r2, r3, lsl #4
104bc: f04f 0200 mov.w r2, #0
104c0: 469f mov pc, r3
104c2: bf00 nop
104c4: f3af 8000 nop.w
104c8: ebb0 7fc1 cmp.w r0, r1, lsl #31
104cc: bf00 nop
104ce: eb42 0202 adc.w r2, r2, r2
104d2: bf28 it cs
104d4: eba0 70c1 subcs.w r0, r0, r1, lsl #31
104d8: ebb0 7f81 cmp.w r0, r1, lsl #30
104dc: bf00 nop
104de: eb42 0202 adc.w r2, r2, r2
104e2: bf28 it cs
104e4: eba0 7081 subcs.w r0, r0, r1, lsl #30
104e8: ebb0 7f41 cmp.w r0, r1, lsl #29
104ec: bf00 nop
104ee: eb42 0202 adc.w r2, r2, r2
104f2: bf28 it cs
104f4: eba0 7041 subcs.w r0, r0, r1, lsl #29
104f8: ebb0 7f01 cmp.w r0, r1, lsl #28
104fc: bf00 nop
104fe: eb42 0202 adc.w r2, r2, r2
10502: bf28 it cs
10504: eba0 7001 subcs.w r0, r0, r1, lsl #28
10508: ebb0 6fc1 cmp.w r0, r1, lsl #27
1050c: bf00 nop
1050e: eb42 0202 adc.w r2, r2, r2
10512: bf28 it cs
10514: eba0 60c1 subcs.w r0, r0, r1, lsl #27
10518: ebb0 6f81 cmp.w r0, r1, lsl #26
1051c: bf00 nop
1051e: eb42 0202 adc.w r2, r2, r2
10522: bf28 it cs
10524: eba0 6081 subcs.w r0, r0, r1, lsl #26
10528: ebb0 6f41 cmp.w r0, r1, lsl #25
1052c: bf00 nop
1052e: eb42 0202 adc.w r2, r2, r2
10532: bf28 it cs
10534: eba0 6041 subcs.w r0, r0, r1, lsl #25
10538: ebb0 6f01 cmp.w r0, r1, lsl #24
1053c: bf00 nop
1053e: eb42 0202 adc.w r2, r2, r2
10542: bf28 it cs
10544: eba0 6001 subcs.w r0, r0, r1, lsl #24
10548: ebb0 5fc1 cmp.w r0, r1, lsl #23
1054c: bf00 nop
1054e: eb42 0202 adc.w r2, r2, r2
10552: bf28 it cs
10554: eba0 50c1 subcs.w r0, r0, r1, lsl #23
10558: ebb0 5f81 cmp.w r0, r1, lsl #22
1055c: bf00 nop
1055e: eb42 0202 adc.w r2, r2, r2
10562: bf28 it cs
10564: eba0 5081 subcs.w r0, r0, r1, lsl #22
10568: ebb0 5f41 cmp.w r0, r1, lsl #21
1056c: bf00 nop
1056e: eb42 0202 adc.w r2, r2, r2
10572: bf28 it cs
10574: eba0 5041 subcs.w r0, r0, r1, lsl #21
10578: ebb0 5f01 cmp.w r0, r1, lsl #20
1057c: bf00 nop
1057e: eb42 0202 adc.w r2, r2, r2
10582: bf28 it cs
10584: eba0 5001 subcs.w r0, r0, r1, lsl #20
10588: ebb0 4fc1 cmp.w r0, r1, lsl #19
1058c: bf00 nop
1058e: eb42 0202 adc.w r2, r2, r2
10592: bf28 it cs
10594: eba0 40c1 subcs.w r0, r0, r1, lsl #19
10598: ebb0 4f81 cmp.w r0, r1, lsl #18
1059c: bf00 nop
1059e: eb42 0202 adc.w r2, r2, r2
105a2: bf28 it cs
105a4: eba0 4081 subcs.w r0, r0, r1, lsl #18
105a8: ebb0 4f41 cmp.w r0, r1, lsl #17
105ac: bf00 nop
105ae: eb42 0202 adc.w r2, r2, r2
105b2: bf28 it cs
105b4: eba0 4041 subcs.w r0, r0, r1, lsl #17
105b8: ebb0 4f01 cmp.w r0, r1, lsl #16
105bc: bf00 nop
105be: eb42 0202 adc.w r2, r2, r2
105c2: bf28 it cs
105c4: eba0 4001 subcs.w r0, r0, r1, lsl #16
105c8: ebb0 3fc1 cmp.w r0, r1, lsl #15
105cc: bf00 nop
105ce: eb42 0202 adc.w r2, r2, r2
105d2: bf28 it cs
105d4: eba0 30c1 subcs.w r0, r0, r1, lsl #15
105d8: ebb0 3f81 cmp.w r0, r1, lsl #14
105dc: bf00 nop
105de: eb42 0202 adc.w r2, r2, r2
105e2: bf28 it cs
105e4: eba0 3081 subcs.w r0, r0, r1, lsl #14
105e8: ebb0 3f41 cmp.w r0, r1, lsl #13
105ec: bf00 nop
105ee: eb42 0202 adc.w r2, r2, r2
105f2: bf28 it cs
105f4: eba0 3041 subcs.w r0, r0, r1, lsl #13
105f8: ebb0 3f01 cmp.w r0, r1, lsl #12
105fc: bf00 nop
105fe: eb42 0202 adc.w r2, r2, r2
10602: bf28 it cs
10604: eba0 3001 subcs.w r0, r0, r1, lsl #12
10608: ebb0 2fc1 cmp.w r0, r1, lsl #11
1060c: bf00 nop
1060e: eb42 0202 adc.w r2, r2, r2
10612: bf28 it cs
10614: eba0 20c1 subcs.w r0, r0, r1, lsl #11
10618: ebb0 2f81 cmp.w r0, r1, lsl #10
1061c: bf00 nop
1061e: eb42 0202 adc.w r2, r2, r2
10622: bf28 it cs
10624: eba0 2081 subcs.w r0, r0, r1, lsl #10
10628: ebb0 2f41 cmp.w r0, r1, lsl #9
1062c: bf00 nop
1062e: eb42 0202 adc.w r2, r2, r2
10632: bf28 it cs
10634: eba0 2041 subcs.w r0, r0, r1, lsl #9
10638: ebb0 2f01 cmp.w r0, r1, lsl #8
1063c: bf00 nop
1063e: eb42 0202 adc.w r2, r2, r2
10642: bf28 it cs
10644: eba0 2001 subcs.w r0, r0, r1, lsl #8
10648: ebb0 1fc1 cmp.w r0, r1, lsl #7
1064c: bf00 nop
1064e: eb42 0202 adc.w r2, r2, r2
10652: bf28 it cs
10654: eba0 10c1 subcs.w r0, r0, r1, lsl #7
10658: ebb0 1f81 cmp.w r0, r1, lsl #6
1065c: bf00 nop
1065e: eb42 0202 adc.w r2, r2, r2
10662: bf28 it cs
10664: eba0 1081 subcs.w r0, r0, r1, lsl #6
10668: ebb0 1f41 cmp.w r0, r1, lsl #5
1066c: bf00 nop
1066e: eb42 0202 adc.w r2, r2, r2
10672: bf28 it cs
10674: eba0 1041 subcs.w r0, r0, r1, lsl #5
10678: ebb0 1f01 cmp.w r0, r1, lsl #4
1067c: bf00 nop
1067e: eb42 0202 adc.w r2, r2, r2
10682: bf28 it cs
10684: eba0 1001 subcs.w r0, r0, r1, lsl #4
10688: ebb0 0fc1 cmp.w r0, r1, lsl #3
1068c: bf00 nop
1068e: eb42 0202 adc.w r2, r2, r2
10692: bf28 it cs
10694: eba0 00c1 subcs.w r0, r0, r1, lsl #3
10698: ebb0 0f81 cmp.w r0, r1, lsl #2
1069c: bf00 nop
1069e: eb42 0202 adc.w r2, r2, r2
106a2: bf28 it cs
106a4: eba0 0081 subcs.w r0, r0, r1, lsl #2
106a8: ebb0 0f41 cmp.w r0, r1, lsl #1
106ac: bf00 nop
106ae: eb42 0202 adc.w r2, r2, r2
106b2: bf28 it cs
106b4: eba0 0041 subcs.w r0, r0, r1, lsl #1
106b8: ebb0 0f01 cmp.w r0, r1
106bc: bf00 nop
106be: eb42 0202 adc.w r2, r2, r2
106c2: bf28 it cs
106c4: eba0 0001 subcs.w r0, r0, r1
106c8: 4610 mov r0, r2
106ca: 4770 bx lr
106cc: bf0c ite eq
106ce: 2001 moveq r0, #1
106d0: 2000 movne r0, #0
106d2: 4770 bx lr
106d4: fab1 f281 clz r2, r1
106d8: f1c2 021f rsb r2, r2, #31
106dc: fa20 f002 lsr.w r0, r0, r2
106e0: 4770 bx lr
106e2: b108 cbz r0, 106e8 <__udivsi3+0x258>
106e4: f04f 30ff mov.w r0, #4294967295 ; 0xffffffff
106e8: f000 b966 b.w 109b8 <__aeabi_idiv0>