哈希及其应用

最新推荐文章于 2025-05-20 16:23:00 发布

原创最新推荐文章于 2025-05-20 16:23:00 发布 · 341 阅读

0 ·

本内容遵循CC 4.0 BY-SA版权协议

标签

#哈希算法 #散列表 #数据结构

哈希算法：通过哈希函数将数据（字符串，较大的数）转化为可以用来作为数组下标的数，从而实现快速查找和匹配

一·字符串哈希

1.探究问题

寻找长度为n的字符串(s1)中长度为m的字符串(s2)出现的位置（次数）

2.解决方法

①枚举所有起始位置，O(m*n)

②字符串HASH：比较s1中的子串s3的Hash值与s2的Hash值，即比较H(s2)与H(s3)

思考：如果我们用枚举每一个起始位置的方法计算s3的Hash值，时间复杂度并没有改观

优化1：滚动哈希

取两个互质常数b,M(b<h)，现有字符串T[n]，其每一项为t1,t2,t3,...,tn

定义哈希函数H(T)=(T1* $b^{n-1}$ +T2* $b^{n-2}$ +...+Tn* $b^{0}$ )modM

聪明的你发现了吗：哈希函数事实上就是把字符串看成b进制数

设H(T,k)为字符串T中前k个字符构成字符串的Hash值

那么子串s3= $T_{k+1}...T_{k+m}$ 的Hash值H(s3)=H(T,k+m)-H(T,k)* $b^{m}$

至此，时间复杂度变为O(n+m)

eg.

H(1)=(T1*1)modM

H(2)=(T1*b+T2)modM

例题

【模板】字符串哈希 - 洛谷

#include<cstdio>
#include<cstring>
#include<iostream>
#include<algorithm>
using namespace std;
typedef unsigned long long ull;
ull b=131;
ull mod1=1e9+7;
ull mod2=1e9+9;
const int N=1e5+5; 
struct data
{
	ull x;
	ull y;
}a[N];
bool cmp(data m,data n)
{
	return m.x<n.x;
}
string str;
//双哈希 
ull hash1(string s)
{
	ull l=s.size(),ans=0;
	for(int i=0;i<l;i++)
		ans=(ans*b+(ull)s[i])%mod1;
	return ans;
}
ull hash2(string s)
{
	ull l=s.size(),ans=0;
	for(int i=0;i<l;i++)
		ans=(ans*b+(ull)s[i])%mod2;
	return ans;
}
int main()
{
	int n,i,cnt=1;
	cin>>n;
	for(i=1;i<=n;i++)
	{
		cin>>str;
		a[i].x=hash1(str);
		a[i].y=hash2(str);
	}
	sort(a+1,a+1+n,cmp);
	for(i=2;i<=n;i++)
	{
		if(a[i].x!=a[i-1].x||a[i].y!=a[i-1].y)
			cnt++;
	}
	cout<<cnt<<endl;
	return 0;
}

3461 -- Oulipo

#include<cstdio>
#include<cstring>
#include<iostream>
#include<algorithm>
using namespace std;
typedef unsigned long long ull;
const int N=1000005;
const int b=131;
ull p[N];
ull h[N]; 
char s1[N],s2[N];
int main()
{
	freopen("poj3461.in","r",stdin);
	freopen("poj3461.out","w",stdout);
	int i,l1,l2;
	ull h1;
	p[0]=1;
	for(i=1;i<=10001;i++)
		p[i]=p[i-1]*b;
	int t;
	scanf("%d",&t);
	while(t--)
	{
		scanf("%s%s",s1,s2);
		//s1子串 s2母串 
		l1=strlen(s1);
		l2=strlen(s2);
		//h1子串Hash值
		//注意因为string这里下标是从0开始的
		h[0]=s2[0];
		for(i=1;i<l2;i++)
			h[i]=h[i-1]*b+(ull)s2[i];
		h1=s1[0];
		for(i=1;i<l1;i++)
			h1=h1*b+(ull)s1[i];
		int ans=0;
		if(h1==h[l1-1])ans++;
		for(i=1;i<l2-l1+1;i++)
		{
			if(h1==h[i+l1-1]-h[i-1]*p[l1])
				ans++;
		}
		cout<<ans<<endl;
	}
	return 0;
}

注意这道题用string甚至都超时

单词背诵 - 洛谷

尺取法（ygp的双指针）其实就是维护一个队列包含我们想要的所有单词