POJ 3415-Common Substrings (suffix array + monotonic stack: lengths of common substrings)

Posted by kewlgrl on 2017-04-22

Common Substrings

Time Limit: 5000MS		Memory Limit: 65536K
Total Submissions: 10850		Accepted: 3587

Description

A substring of a string T is defined as:

T(i, k) = T_i T_{i+1} ... T_{i+k-1}, 1 ≤ i ≤ i+k-1 ≤ |T|.

Given two strings A, B and one integer K, we define S, a set of triples (i, j, k):

S = {(i, j, k) | k≥K, A(i, k)=B(j, k)}.

You are to give the value of |S| for specific A, B and K.

Input

The input file contains several blocks of data. For each block, the first line contains one integer K, followed by two lines containing strings A and B, respectively. The input file is ended by K=0.

1 ≤ |A|, |B| ≤ 10⁵
1 ≤ K ≤ min{|A|, |B|}
Characters of A and B are all Latin letters.

Output

For each case, output an integer |S|.

Sample Input

2
aababaa
abaabaa
1
xx
xx
0

Sample Output

22
5

Source

POJ Monthly--2007.10.06, wintokk

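Problem summary:

Given two strings, count all of their common substrings of length at least K. Occurrences at different positions are counted separately.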
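
For small inputs, the definition of |S| can be checked directly. The sketch below is not the method used in this post. It is a quadratic brute force, and the name `bruteForceCount` is made up for illustration. It fills a table of common-prefix lengths for every pair of start positions and adds one triple for each admissible length k.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Counts triples (i, j, k) with k >= K and A(i, k) == B(j, k) straight from
// the definition. lcp[i][j] is the common-prefix length of a[i..] and b[j..];
// a pair (i, j) contributes one triple for every k in [K, lcp[i][j]].
long long bruteForceCount(const std::string& a, const std::string& b, int K) {
    const int n = a.size(), m = b.size();
    std::vector<std::vector<int>> lcp(n + 1, std::vector<int>(m + 1, 0));
    long long total = 0;
    for (int i = n - 1; i >= 0; --i) {
        for (int j = m - 1; j >= 0; --j) {
            if (a[i] == b[j]) lcp[i][j] = lcp[i + 1][j + 1] + 1;
            total += std::max(0, lcp[i][j] - K + 1);
        }
    }
    return total;
}
```

On the two sample cases, `bruteForceCount("aababaa", "abaabaa", 2)` gives 22 and `bruteForceCount("xx", "xx", 1)` gives 5. It needs O(|A|·|B|) time and memory, so it is only useful for cross-checking a faster solution on short strings.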

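Approach:

My first idea was a suffix array on its own, which is a sure TLE if every pair of suffixes is checked. The experts' write-ups say to add a monotonic stack. The monotonic stack itself is not hard, but while reading I just could not see how it is used for the optimization… (*゜ー゜*)

As in the earlier longest-common-substring problem, join the two strings with a '$' and compute the height (LCP) array. Then scan it twice: once matching suffixes of B against suffixes of A, and once matching A against B. The idea is the same in both scans, so I use the B-against-A scan as the example.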
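
To make the joined string and its height array concrete, here is a minimal sketch that builds both naively. It sorts the suffixes by direct comparison, which is far slower than the doubling method in the full solution and is meant only for small examples. The names `SuffixInfo` and `buildNaive` are made up for illustration. Unlike the solution code, this sketch does not include the empty suffix. It also assumes neither string contains '$'.

```cpp
#include <algorithm>
#include <numeric>
#include <string>
#include <vector>

struct SuffixInfo {
    std::vector<int> sa;      // start positions of the suffixes, in sorted order
    std::vector<int> height;  // height[i] = LCP of suffixes sa[i] and sa[i + 1]
};

SuffixInfo buildNaive(const std::string& a, const std::string& b) {
    const std::string s = a + '$' + b;  // '$' sorts before every Latin letter
    const int n = s.size();
    SuffixInfo info;
    info.sa.resize(n);
    std::iota(info.sa.begin(), info.sa.end(), 0);
    // Sort start positions by comparing the suffixes character by character.
    std::sort(info.sa.begin(), info.sa.end(), [&](int x, int y) {
        return s.compare(x, std::string::npos, s, y, std::string::npos) < 0;
    });
    info.height.assign(n - 1, 0);
    for (int i = 0; i + 1 < n; ++i) {
        const int x = info.sa[i], y = info.sa[i + 1];
        int h = 0;
        while (x + h < n && y + h < n && s[x + h] == s[y + h]) ++h;
        info.height[i] = h;
    }
    return info;
}
```

For A = B = "xx" the joined string is "xx$xx", the sorted suffixes are "$xx", "x", "x$xx", "xx", "xx$xx", so `sa` is [2, 4, 1, 3, 0] and `height` is [0, 1, 1, 2]. A start position below |A| belongs to A and one above |A| belongs to B.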

『

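_( ﾟДﾟ)ﾉ Two conditions must hold first: ① the longest common prefix of the two suffixes is at least K; ② the two suffixes come from different strings, one from A and one from B.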

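When the longest common prefix of an A suffix and a B suffix is at least K and the current position belongs to A, add lcp[i]-K+1 to the running count. Every length in [K, lcp[i]] satisfies the problem, and that interval contains lcp[i]-K+1 lengths.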

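The monotonic stack holds pairs of a height (an LCP value no smaller than K) and the number of A suffixes that height stands for. Heights grow from the bottom of the stack to the top. Whenever the current LCP is not larger than the height on top, the stack must be adjusted: pop the entry, remove the surplus that was counted for it, and add its count to the number of qualifying A suffixes carried by the new entry.

The reason is that the popped entries can share no more than the current, smaller LCP with any later suffix. The new entry therefore absorbs their suffixes, and the part of the "number of common substrings" that was counted with the larger height, count × (height − lcp[i]), is subtracted during the adjustment.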

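Finally, check whether the adjacent suffix sa[i+1] belongs to B. If it does, the pair comes from two different strings, A and B, and the running count is added to the answer.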

』
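
The scan described above can be sketched in isolation. This is an illustration under stated assumptions and not the submitted solution: the suffix array is built naively by sorting, the empty suffix is left out, the inputs are assumed to contain only Latin letters, and the names `scanOnce` and `countCommon` are made up. The flag `countA` selects which string's suffixes go on the stack; suffixes of the other string collect the running sum.

```cpp
#include <algorithm>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// One pass over the sorted suffixes. height[i] is the LCP of sa[i] and sa[i + 1].
long long scanOnce(const std::vector<int>& sa, const std::vector<int>& height,
                   int lenA, int K, bool countA) {
    std::vector<std::pair<int, long long>> st;  // (height, suffixes merged into it)
    long long running = 0, total = 0;
    for (std::size_t i = 0; i + 1 < sa.size(); ++i) {
        const int h = height[i];
        if (h < K) {  // chain broken: earlier suffixes cannot match later ones
            st.clear();
            running = 0;
            continue;
        }
        long long cnt = 0;
        if ((sa[i] < lenA) == countA) {  // sa[i] belongs to the counted string
            cnt = 1;
            running += h - K + 1;
        }
        while (!st.empty() && st.back().first >= h) {
            running -= st.back().second * (st.back().first - h);  // drop the surplus
            cnt += st.back().second;
            st.pop_back();
        }
        st.push_back({h, cnt});
        if ((sa[i + 1] < lenA) != countA) total += running;  // other string collects
    }
    return total;
}

long long countCommon(const std::string& a, const std::string& b, int K) {
    const std::string s = a + '$' + b;
    const int n = s.size();
    std::vector<int> sa(n), height(n - 1);
    std::iota(sa.begin(), sa.end(), 0);
    std::sort(sa.begin(), sa.end(), [&](int x, int y) {  // naive suffix sort
        return s.compare(x, std::string::npos, s, y, std::string::npos) < 0;
    });
    for (int i = 0; i + 1 < n; ++i) {
        int h = 0;
        while (sa[i] + h < n && sa[i + 1] + h < n && s[sa[i] + h] == s[sa[i + 1] + h]) ++h;
        height[i] = h;
    }
    const int lenA = a.size();
    return scanOnce(sa, height, lenA, K, true) + scanOnce(sa, height, lenA, K, false);
}
```

For the second sample (A = B = "xx", K = 1), the scan with `countA = true` contributes 1 and the scan with `countA = false` contributes 4, for a total of 5. `countCommon("aababaa", "abaabaa", 2)` gives 22, matching the first sample.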

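The A-against-B scan works the same way. The only difference is that it first tests for a B suffix and then for an A suffix to decide that the pair comes from different strings. The results of the two scans are added together.

Note: I carefully consulted several experts' blogs and the widely circulated suffix array paper.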

#include <iostream>
#include <cstdio>
#include <cstring>
#include <string>
#include <vector>
#include <queue>
#include <algorithm>
using namespace std;
typedef long long ll;
#define MAXN 200100
int n,k,m,lens;
long long ans=0;
string s,t;
int sa[MAXN],lcp[MAXN];
int rnk[MAXN*2],tmp[MAXN*2];//named rnk, not rank, so it cannot clash with std::rank in C++11 and later
//Build the height array: lcp[i] is the LCP of suffixes sa[i] and sa[i+1]
void construct_lcp(const string &s,int sa[],int lcp[])
{
    int n=s.length();
    for(int i=0; i<=n; ++i)
        rnk[sa[i]]=i;
    int h=0;
    lcp[0]=0;
    for(int i=0; i<n; ++i)
    {
        int j=sa[rnk[i]-1];//suffix just before suffix i in sorted order
        if(h>0) --h;
        for(; j+h<n&&i+h<n; ++h)
            if(s[j+h]!=s[i+h]) break;
        lcp[rnk[i]-1]=h;
    }
}
bool compare_sa(int i,int j)//doubling method: compare the pairs (rnk[i],rnk[i+k]) and (rnk[j],rnk[j+k])
{
    if(rnk[i]!=rnk[j]) return rnk[i]<rnk[j];
    else
    {
        int ri=i+k<=n?rnk[i+k] :-1;
        int rj=j+k<=n?rnk[j+k] :-1;
        return ri<rj;
    }
}

void construct_sa(const string &s,int sa[])//compute the suffix array of s
{
    for(int i=0; i<=n; ++i)//initial length is 1, rank is the character code
    {
        sa[i]=i;
        rnk[i]=i<n?s[i] :-1;
    }
    for(k=1; k<=n; k*=2)//doubling: sort by the first 2k characters
    {
        sort(sa,sa+n+1,compare_sa);
        tmp[sa[0]]=0;
        for(int i=1; i<=n; ++i)
            tmp[sa[i]]=tmp[sa[i-1]]+(compare_sa(sa[i-1],sa[i])?1:0);
        for(int i=0; i<=n; ++i)
            rnk[i]=tmp[i];
    }
}
bool contain(string s,int sa[],string t)
{
    int a=0,b=s.length();
    while(b-a>1)
    {
        int c=(a+b)/2;
        if(s.compare(sa[c],t.length(),t)<0) a=c;
        else b=c;
    }
    return s.compare(sa[b],t.length(),t)==0;
}
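void solve()
{
    static int dull[MAXN][2];//monotonic stack, heights grow from bottom to top; dull[i][0] is an lcp value, dull[i][1] the number of suffixes merged into it
    long long temp=0,top=0;//temp: number of common substrings the suffixes on the stack form with the next suffix
    ans=0;
    //First scan: each suffix of B is matched against the suffixes of A ranked before it
    for(int i=0; i<n; i++) //for every B suffix, count the common substrings of length at least m it forms with the preceding A suffixes
    {
        if (lcp[i]<m) top=temp=0;
        else//the LCP of the A suffix and the B suffix satisfies the length limit
        {
            int res=0;//number of qualifying A suffixes
            if(sa[i]<lens)//in the first string
            {
                ++res;
                temp+=lcp[i]-m+1;//every length in [m, lcp[i]] qualifies
            }
            while(top>0&&lcp[i]<=dull[top-1][0])//adjust the stack: current LCP is not larger than the top
            {
                --top;
                temp-=(long long)dull[top][1]*(dull[top][0]-lcp[i]);//remove the surplus counted for the popped entry (64-bit product)
                res+=dull[top][1];
            }
            dull[top][0]=lcp[i];
            dull[top++][1]=res;
            if(sa[i+1]>lens)//next suffix is in the second string, so the pair spans both strings
                ans+=temp;
        }
    }
    //Second scan: each suffix of A is matched against the suffixes of B ranked before it
    for(int i=0; i<n; i++)
    {
        if (lcp[i]<m) top=temp=0;
        else
        {
            int res=0;//number of qualifying B suffixes
            if(sa[i]>lens)//in the second string
            {
                ++res;
                temp+=lcp[i]-m+1;
            }
            while (top>0&&lcp[i]<=dull[top-1][0])
            {
                --top;
                temp-=(long long)dull[top][1]*(dull[top][0]-lcp[i]);
                res+=dull[top][1];
            }
            dull[top][0]=lcp[i];
            dull[top++][1]=res;
            if(sa[i+1]<lens)//next suffix is in the first string, so the pair spans both strings
                ans+=temp;
        }
    }
}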
int main()
{
#ifdef ONLINE_JUDGE
#else
    freopen("G:/cbx/read.txt","r",stdin);
    //freopen("G:/cbx/out.txt","w",stdout);
#endif
    ios::sync_with_stdio(false);
    cin.tie(0);
    while(cin>>m)
    {
        if(m==0) break;
        cin>>s;
        cin>>t;
        lens=s.length();
        s+='$'+t;//join the two strings with '$'
        n=s.length();
        construct_sa(s,sa);
        construct_lcp(s,sa,lcp);
        solve();
        cout<<ans<<endl;
    }
    return 0;
}
