匹配_KMP模式匹配算法：二-Aet

Bingliaolong 2020-10-09 11:53 Aet | 隐藏边栏 | 抢沙发 | 3 0

文章评分 1 次，平均分 5.0 ：

预处理操作

参数

iterP_begin:模式序列P的起始迭代器
iterP_end:模式序列P的终止迭代器

解析

初始化
pai[1] = 0,k = 0

遍历(q从：2->m)
从2开始，因为Pk必须是Pm的真子集。
条件：k > 0 && p[k+1] != p[q]
终止条件：k = pai[k],因为p[k+1]=p[q]说明找到了pk是pm的真子集
p[k+1] == p[q],则,k = k+1 && pai[q] = k

template<typename IteratorP>

std::vector<int> get_pai(const IteratorP iterP_begin,

const IteratorP iterP_end)

{

auto lenP = std::distance(iterP_begin,iterP_end);

if (lenP <= 0)

throw std::invalid_argument("get_pai error:iterP_begin must < iterP_end");

std::vector<int> pai(lenP,0);

int k = 0;

for (int q = 1; q < lenP; q++)

{

//右移直到P[k+1]==P[m]，这里从0计数，所以用*(iterP_begin+k)

while( K>0 && *(iterP_begin+k)!=*(iterP_begin+q))

{

k = pai[k];

}

//确实发生P[k+1]==P[m]，这里从0计数，所以用*(iterP_begin+k)

if (*(iterP_begin+k)==*(iterP_begin+q))

k++;

pai[q] = k;

}

return pai;

}

KMP字符串匹配算法

参数

iterT_begin : 被文本序列T的起始迭代器
iterT_end: 文本序列T的终止迭代器
iterP_begin : 模式序列P的起始迭代器
iterP_end: 模式序列P的终止迭代器

字符串匹配

定义如下：假设文本是一个长度为n的数组 T[1...n]，而模式是一个长度为m的数组P[1...m]，
其中m<=n。进一步假设P和T的元素都是来自一个有限字母集合M的字符。如M={0,1}或者M={a,b,c,...z}。
字符数组P和T通常称为字符串。

原理

模式的前缀函数pai 包含了模式与它自身偏移进行匹配的信息。假设模式字符P[1...q]与文本字符T[s+1,...s+q]匹配，s'是某个偏移量，s'>s。则对于某些k<q，满足：P[1...k]=T[s'+1,...s'+k]的最小s'>s，其中s'+k=s+q?

----------------------------------------------------------

T | 1 | 2 | 3 |.....|s+1|............| s+q |..............| n | T[s+q]

----------------------------------------------------------

|<-----长度为q---->|

--------------------------------

P | 1 | 2 |.......| q |....| m | :Pq=P[1...q]

--------------------------------

----------------------------------------------------------

T | 1 | 2 | 3 |.....|s+1|..|s'+1|..........| s+q |..............| n | T[s+q]

----------------------------------------------------------

|<-----长度为k---->|

------------------------------------

P | 1 | 2 |........| k |..|q|..| m | :Pk=P[1...k]

------------------------------------

换句话说，已知Pq是T[s+q]的后缀，我们希望Pq的真前缀 Pk 也是T[s+q]的后缀。我们把在P前缀长度
范围内的差值 q-k 加到s上即可得到新的偏移 s'=s+(q-k)。

可以用模式与它自身的比较来预先计算出这些必要的信息。前述可知Pk是T[s+q]的后缀，它也是Pq的真前缀，因此要求出 Pk是Pq的后缀的最大的k<q。于是这个新的偏移s'=s+(q-k)就是下一个可能的有效偏移。
之所有求最大的k，就是为了是(q-k)尽可能小，从而不会漏过任何的可能的有效偏移。

模式P的前缀函数就是函数 pai:{1,2,...,m}--> {0,1,2,...,m-1}，满足：
pai[q]=max{k:k<q 且 Pk是Pq的后缀}。即pai[q]是Pq的真后缀P的最长前缀长度。

辅助函数

KMP 算法用到了辅助函数 pai，它在O(m)时间内根据模式预先计算出pai并且存放在数组pai[1...m]中。
数组pai能够使我们按照需要即时计算出转移函数。
计算出pai数组之后，KMP算法从左到右扫描文本序列T，并从pai中获取转移函数。当状态结果为 m时，当前偏移为有效偏移点。

步骤

预处理算法
匹配算法

初始化
q = 0

遍历(i从：1->n)
条件：q > 0 && p[q+1] != T[i];在循环中执行q=pai[q]
如果：P[q+1]==T[i]则q=q+1
如果：q==m,则找到了有效偏移点。将结果保存，然后q=pai[q]

性能

计算前缀函数的运行时间为 O(m)，匹配时间为O(n)，总运行时间为 O(n).

template<typename IteratorT,typename IteratorP>

std::vector<int> kmp_match(const IteratorT iterT_begin,

const IteratorT iterT_end,

const IteratorP iterP_begin,

const IteratorP iterP_end)

{

typedef typename std::iterator_traits<IteratorT>::value_type T1;

typedef typename std::iterator_traits<IteratorP>::value_type T2;

static_assert(std::is_same<T1,T2>::value,"finite_automaton_match error:sequence T and P must contain same type value!");

auto lenT = std::distance(iterT_begin,iterT_end);

auto lenP = std::distance(iterP_begin,iterP_end);

if(lenT < 0)

throw std::invalid_argument("finite_automaton_match error:iterT_begin must <= iterT_end");

if(lenP <= 0)

throw std::invalid_argument("finite_automaton_match error:iterP_begin must < iterP_end");

std::vector<int> result;

if (lenT < lenP)

return result;

//预处理

auto pai = get_pai(iterP_begin,iterP_end);

//匹配

int q = 0;

for(int i = 0;i < lenT;i++)

{

//右移直到P[q+1]==T[i]，这里从0计数，所以用*(iterP_begin+q)

while(q>0 && *(iterP_begin+q)!=*(iterT_begin+i))

q = pai[q];

//确实发生P[q+1]==T[i]，这里从0计数，所以用*(iterP_begin+q)

if (*(iterP_begin+q)==*(iterT_begin+i))

q++;

if (q == lenP)//找到有效偏移点

{

result.push_back(i-lenP+1);//i左侧lenP的位置

q = pai[lenP-1];//防止出现P[lenP+1]的情况出现，这里用pai[lenP-1]

}

return result;

}

预处理操作
KMP字符串匹配算法
字符串匹配
原理
辅助函数
步骤
性能

声明：本文为原创文章，版权归Aet所有，欢迎分享本文，转载请保留出处！

Bingliaolong 关注：0 粉丝：0 最后编辑于：2023-06-03

Everything will be better.

一	二	三	四	五	六	日
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29	30

预处理操作

参数

解析

KMP字符串匹配算法

参数

字符串匹配

原理

辅助函数

步骤

性能

你可能也喜欢

热评文章

发表评论

标签云集