ITPUB SQL大賽第二期(二)

yangtingkun發表於2011-03-29

貼一下第二期的第二道題。

第二期題目參考:http://www.itpub.net/thread-1403356-1-1.html

版主newkid點評參考:http://www.itpub.net/thread-1411980-1-1.html

 

 

簡單描述一下問題:

問題2(40)

在一個N*N的棋盤中,每行放置M個球,每列、每個45度的斜線上最多放置M個球,其中5<=N<=6,1<=M<=(N-1),現要求出每個M,N組合中最多擺放球的不同的擺法的個數(包括重複和不重複的,分別輸出)。請用最多兩條SQL語句得到以下結果:(以M=2, N=5為例)

兩條SQL的輸出格式:

SQL1:

M      N         AllCnt

2      5          92

 

SQL2:

M      N         NoReptCnt

2      5          xx

 

 

一條SQL的輸出格式:

M      N         AllCnt           NoReptCnt

2      5          92                xx

 

Oracle變數定義如下(以M=2, N=5為例):

var m number;

exec :m:=2;

var n number;

exec :n:=5;

 

這道題其實就是在第一題的基礎上要求了擴充套件性,雖然題目要求5<=N<=6,但是真正擴充套件性好的解題應該是2<=N,當然由於演算法和機器效能的限制,可能N最多也就支援到7左右。

SQL> VAR N NUMBER
SQL> VAR M NUMBER
SQL> EXEC :N := 5

PL/SQL 過程已成功完成。

SQL> EXEC :M := 2

PL/SQL 過程已成功完成。

SQL> SET PAGES 100 LINES 120 TIMING ON

SQL> with i as
  2  (select rownum i from dual connect by rownum <= :n),
  3  j as
  4  (select rownum j from dual connect by rownum <= :n),
  5  position as
  6  (select i, j
  7  from i, j),
  8  b as
  9  (select rownum - 1 b from dual connect by rownum <= 2),
 10  b_line as
 11  (select replace(sys_connect_by_path(b, ','), ',', '') line
 12  from b
 13  where level = :n
 14  connect by level <= :n),
 15  lines as
 16  (select rownum, line from b_line
 17  where instr(line, 1, 1, :m) > 0
 18  and instr(line, 0, 1, :n - :m) > 0),
 19  lines_result as
 20  (select replace(sys_connect_by_path(line, ','), ',', '') result
 21  from lines
 22  where level = :n
 23  connect by level <= :n)
 24  select :m M, :n N, count(*) AllCnt, count(distinct greatest(res1, res2, res3, res4, res5, res6, res7, res8)) NoReptCnt
 25  from (
 26  select result res1, reverse(result) res2,
 27  (select replace(wmsys.wm_concat(substr(result, (i-1)*:n + 1, :n)), ',', '') from (select i from i order by i desc)) res3,
 28  (select replace(wmsys.wm_concat(reverse(substr(result, (i-1)*:n + 1, :n))), ',', '') from (select i from i order by i)) res4,
 29  (select replace(wmsys.wm_concat(substr(result, (i-1)*:n + j, 1)), ',' ,'') from (select i, j from position order by j desc, i)) res5,
 30  (select replace(wmsys.wm_concat(substr(result, (i-1)*:n + j, 1)), ',' ,'') from (select i, j from position order by j, i)) res6,
 31  (select replace(wmsys.wm_concat(substr(result, (i-1)*:n + j, 1)), ',' ,'') from (select i, j from position order by j, i desc)) res7,
 32  (select replace(wmsys.wm_concat(substr(result, (i-1)*:n + j, 1)), ',' ,'') from (select i, j from position order by j desc, i desc)) res8
 33  from (
 34  select result, j col, (j-i) l, (j+i) r, sum(substr(result, (i-1)*:n + j, 1)) c
 35  from lines_result, position
 36  group by grouping sets ((result, j), (result, (j-i)), (result, (j+i))))
 37  group by result
 38  having max(case when l is null and r is null then c end) = :m
 39  and max(case when col is null and l is null then c end) <= :m
 40  and max(case when col is null and r is null then c end) <= :m
 41  );

         M          N     ALLCNT  NOREPTCNT
---------- ---------- ---------- ----------
         2          5         92         14

已用時間:  00: 00: 22.43

這是最初的SQL,以實現為主要目的。主要的思路仍然是仿照第一題的似乎,只不過這裡不能確定多少個表進行連線,因此構造的工作由樹形查詢來完成。

首先仍然是確定每行兩個球,然後將符合條件的行構造陣列。將資料結果和位置進行關聯,分別對jj+ij-i進行聚集,聚集的結果分別對應列、左對角線和右對角線,然後利用having語句過濾不滿足條件的列,最終的結果就是符合條件的陣列。

在符合條件的陣列上,分別進行7種變形,然後最終透過和第一題一樣的count distinct greatest方式,獲取不衝突的值。

雖然結果滿足要求,但是本來滿足條件的陣列就不少,還多次和N*N的陣列進行笛卡爾積,肯定效能不會太好。另外這種方式大量利用了wmsys.wm_concat,效能不但較低,而且這個函式不屬於Oracle的標準函式,不一定在所有的資料庫中都會存在。

當前這個SQLM=2N=5需要將近23秒,跑M=2N=6基本上就出不來結果了。

SQL> with i as
  2  (select rownum i from dual connect by rownum <= :n),
  3  j as
  4  (select rownum j from dual connect by rownum <= :n),
  5  position as
  6  (select i, j from i, j),
  7  trim as
  8  (select sys_connect_by_path(level -1, ' ') str
  9  from dual
 10  where level = :m + 1
 11  connect by level <= :m + 1),
 12  b as
 13  (select rownum - 1 b from dual connect by rownum <= 2),
 14  b_line as
 15  (select sys_connect_by_path(b, ',') line
 16  from b
 17  where level = :n
 18  connect by level <= :n),
 19  lines as
 20  (select rownum, replace(line, ',', '') line from b_line
 21  where instr(line, 1, 1, :m) > 0
 22  and instr(line, 0, 1, :n - :m) > 0),
 23  lines_result as
 24  (select replace(sys_connect_by_path(line, ','), ',', '') result
 25  from lines
 26  where level = :n
 27  connect by level <= :n)
 28  select :m M, :n N, count(*) AllCnt, count(distinct greatest(res1, reverse(res1), res2, reverse(res2), res3, reverse(res3), res4, reverse(res4))) NoReptCnt
 29  from (
 30  select result res1,
 31  (select replace(sys_connect_by_path(reverse(substr(result, (level-1)*:n + 1, :n)), ','), ',', '') from dual where level = :n connect by level <= :n) res2,
 32  (select replace(wmsys.wm_concat(substr(result, (i-1)*:n + j, 1)), ',' ,'') from (select i, j from position order by j desc, i)) res3,
 33  (select replace(wmsys.wm_concat(substr(result, (i-1)*:n + j, 1)), ',' ,'') from (select i, j from position order by j, i)) res4
 34  from lines_result, i
 35  group by result
 36  having sum(substr(result, (i-1)*:n + 1, :n)) = to_number(lpad(:m, :n, :m))
 37  and ltrim(sum(substr(result, (i-1)*:n + 1, :n)*power(10, i-1)), (select str from trim)) is null
 38  and ltrim(sum(substr(result, (i-1)*:n + 1, :n)*power(10, :n-i)), (select str from trim)) is null
 39  );

         M          N     ALLCNT  NOREPTCNT
---------- ---------- ---------- ----------
         2          5         92         14

已用時間:  00: 00: 03.48
SQL> exec :n := 6

PL/SQL 過程已成功完成。

已用時間:  00: 00: 00.00
SQL> /

         M          N     ALLCNT  NOREPTCNT
---------- ---------- ---------- ----------
         2          6       1097        155

已用時間:  00: 08: 23.27

這是第一次大規模最佳化後的結果,跑M=2N=5需要3秒多,是原始SQL1/7,而M=2,N=6也可以得到結果,但是用了8分鐘之久,速度仍然太慢。

這裡最佳化主要在兩個方面,首先去掉了對陣列和位置的笛卡爾積,而變成了與行的笛卡爾積,這使得最終結果小了N倍,同時把GROUPING SET聚集方式又修改為ltrim方式。然後最佳化了RESULT的變形,除了本身結果外,只需要在獲得3種變形,然後利用REVERSE獲得另外的4種變形。

當前的效率仍然很低,主要原因在於使用樹形查詢沒有辦法在執行的中間將不符合條件的結果過濾掉,為了解決這個問題,只能使用11.2的新特性遞迴WITH語句。

遞迴WITH11.2剛出來的時候,作為新特性研究了一下,不過並不深入。現在別說遞迴WITH,就是WITH語句,甚至SELECT語句,寫的都少得多了。所以基本上仍然處於現學現賣的過程,經過若干次小的最佳化之後,提交的最終SQL如下:

SQL> with trim as
  2  (select sys_connect_by_path(level - 1, ' ') str
  3  from dual
  4  where level = :m + 1
  5  connect by level <= :m + 1),
  6  line_count as
  7  (select (select round(power(2, sum(log(2, rownum)))) from dual connect by rownum <= :n)
  8     / (select round(power(2, sum(log(2, rownum)))) from dual connect by rownum <= :m)
  9     / (select round(power(2, sum(log(2, rownum)))) from dual connect by rownum <= :n - :m)
 10  from dual),
 11  b as
 12  (select rownum - 1 b from dual connect by rownum <= 2),
 13  b_line (line, cnt) as
 14  (select b || '', b from b
 15  union all
 16  select b || line, b + cnt
 17  from b, b_line
 18  where cnt <= :m)
 19  cycle line set dup_line to 'y' default 'n',
 20  lines as
 21  (select line
 22  from b_line
 23  where length(line) = :n
 24  and cnt = :m
 25  and rownum <= (select * from line_count)),
 26  line_result (result, cnt, r, l) as
 27  (select line || '', to_number(line), to_number(line), to_number(line)
 28  from lines
 29  union all
 30  select line || result, to_number(line) + cnt, to_number(line) + 10*r, to_number(line) + l/10
 31  from lines, line_result
 32  where ltrim(cnt, (select str from trim)) is null
 33  and ltrim(r, (select str from trim)) is null
 34  and ltrim(l, (select str || '.' from trim)) is null
 35  and instr(case when length(result) >= :n*(:n-:m+1) then to_char(cnt, rpad('0', :n, '9')) else '9' end, '0', 1) = 0
 36  and instr(case when :n-:m+2 < :n and length(result) >= :n*(:n-:m+2) then to_char(cnt, rpad('0', :n, '9')) else '9' end, '1', 1) = 0)
 37  cycle result set dup_result to 'y' default 'n',
 38  result as
 39  (select result, cnt, r, l from line_result
 40  where rownum <= power((select * from line_count), :n))
 41  select :m M, :n N, count(*) AllCnt, count(distinct greatest(res1, reverse(res1), res2, reverse(res2), res3, reverse(res3), res4, reverse(res4))) NoReptCnt
 42  from (
 43  select result res1,
 44  (select replace(sys_connect_by_path(reverse(substr(result, (level-1)*:n + 1, :n)), ','), ',', '') from dual where level = :n connect by level <= :n) res2,
 45  (select replace(sys_connect_by_path(substr(result, mod(level-1,:n)*:n + ceil(level/:n), 1), ','), ',', '') from dual where level = :n*:n connect by level <= :n*:n) res3,
 46  (select replace(sys_connect_by_path(substr(result, mod(level-1,:n)*:n + :n-ceil(level/:n)+1, 1), ','), ',', '') from dual where level = :n*:n connect by level <= :n*:n) res4
 47  from result
 48  where length(result) = :n*:n
 49  and cnt = to_number(lpad(:m, :n, :m))
 50  and ltrim(r, (select str from trim)) is null
 51  and ltrim(l, (select str || '.' from trim)) is null
 52  );

         M          N     ALLCNT  NOREPTCNT
---------- ---------- ---------- ----------
         2          6       1097        155

已用時間:  00: 00: 06.26
SQL> exec :m := 3

PL/SQL 過程已成功完成。

已用時間:  00: 00: 00.00
SQL> /

         M          N     ALLCNT  NOREPTCNT
---------- ---------- ---------- ----------
         3          6      14412       1811

已用時間:  00: 03: 25.54

採用遞迴WITH的最大好處就是可以在每層迭代的同時過濾掉不滿足條件的資料,使得查詢速度有了質的飛躍,執行M=2N=6現在只需要6秒多點,不過最耗時的M=3N=6還需要3分半鐘,不是很令人滿意。

簡單介紹一下思路:trim字句是獲取ltrim的字串;line_count用於計算Cnm的值;兩層遞迴WITH查詢分別構造行記錄和陣列記錄,其中第二個構造陣列結果的遞迴WITH,還同時求出了列和兩個對角線的和,在遞迴的過程中可以用LTRIM函式過濾列和對角線來清除不滿足條件的記錄。

對獲得的結果,利用樹形查詢替代了wmsys.wm_concat對結果進行轉換,獲取另外三種變形,然後利用REVERSE獲取其他四種變形,最終利用COUNT DISTINCT GREATEST來獲得不重複的記錄。

由於最近工作上的事情很多,導致第二題沒有足夠的時間進行思考和最佳化,因此雖然明知道效能不是很滿意,也沒有辦法只能提交了。

不過在做第三題的時候,發現第二題還有很多可以最佳化的地方,只需要做些簡單的修改,在遞迴的時候就可以過濾更多的無用資料:

SQL> with trim as
  2  (select sys_connect_by_path(level - 1, ' ') str
  3  from dual
  4  where level = :m + 1
  5  connect by level <= :m + 1),
  6  b as
  7  (select rownum - 1 b from dual connect by rownum <= 2),
  8  b_line (n, line, cnt) as
  9  (select 1 n, to_char(b), b from b
 10  union all
 11  select n + 1, line || b, cnt + b
 12  from b, b_line
 13  where n < :n
 14  and cnt <= :m),
 15  lines as
 16  (select line
 17  from b_line
 18  where cnt = :m
 19  and length(line) = :n),
 20  line_result (n, result, cnt, r, l) as
 21  (select 1 n, line, to_number(line), to_number(line), to_number(line)
 22  from lines
 23  union all
 24  select n + 1, result || line, to_number(line) + cnt, to_number(line) + 10*r, to_number(line) + l/10
 25  from lines, line_result
 26  where n < :n
 27  and ltrim(to_number(line) + cnt, (select str from trim)) is null
 28  and ltrim(to_number(line) + 10*r, (select str from trim)) is null
 29  and ltrim(to_number(line) + l/10, (select str || '.' from trim)) is null
 30  and instr(case when length(result) >= :n*(:n-:m+1) then to_char(cnt, rpad('0', :n, '9')) else '9' end, '0', 1) = 0
 31  and instr(case when :n-:m+2 < :n and length(result) >= :n*(:n-:m+2) then to_char(cnt, rpad('0', :n, '9')) else '9' end, '1', 1) = 0)
 32  select :m M, :n N, count(*) AllCnt, count(distinct greatest(res1, reverse(res1), res2, reverse(res2), res3, reverse(res3), res4, reverse(res4))) NoReptCnt
 33  from (
 34  select result res1,
 35  (select replace(sys_connect_by_path(reverse(substr(result, (level-1)*:n + 1, :n)), ','), ',', '') from dual where level = :n connect by level <= :n) res2,
 36  (select replace(sys_connect_by_path(substr(result, mod(level-1,:n)*:n + ceil(level/:n), 1), ','), ',', '') from dual where level = :n*:n connect by level <= :n*:n) res3,
 37  (select replace(sys_connect_by_path(substr(result, mod(level-1,:n)*:n + :n-ceil(level/:n)+1, 1), ','), ',', '') from dual where level = :n*:n connect by level <= :n*:n) res4
 38  from line_result
 39  where length(result) = :n*:n
 40  and cnt = to_number(lpad(:m, :n, :m))
 41  and ltrim(r, (select str from trim)) is null
 42  and ltrim(l, (select str || '.' from trim)) is null
 43  );

         M          N     ALLCNT  NOREPTCNT
---------- ---------- ---------- ----------
         2          6       1097        155

已用時間:  00: 00: 01.37
SQL> exec :m := 3

PL/SQL 過程已成功完成。

已用時間:  00: 00: 00.00
SQL> /

         M          N     ALLCNT  NOREPTCNT
---------- ---------- ---------- ----------
         3          6      14412       1811

已用時間:  00: 00: 25.14

速度又提高了將近8倍。對於M=3N=6而言,26秒的結果應該算是比較滿意了,而且程式碼長度減少了1/5

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/4227/viewspace-691264/,如需轉載,請註明出處,否則將追究法律責任。

相關文章