LeetCode 30. Substring with Concatenation of All Words

Latest recommended article published on 2020-06-29 16:37:31

This content is licensed under CC 4.0 BY-SA.

Tags

#strings #matching #algorithms #sliding-window

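This post looks at the problem of finding every concatenation of a given list of words inside a string, and speeds up the search with a sliding-window algorithm.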

1. Problem

You are given a string, s, and a list of words, words, that are all of the same length. Find all starting indices of substring(s) in s that is a concatenation of each word in words exactly once and without any intervening characters.

Example 1:

Input:
  s = "barfoothefoobarman",
  words = ["foo","bar"]
Output: [0,9]
Explanation: Substrings starting at index 0 and 9 are "barfoo" and "foobar" respectively.
The output order does not matter, returning [9,0] is fine too.

Example 2:

Input:
  s = "wordgoodgoodgoodbestword",
  words = ["word","good","best","word"]
Output: []

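2. Analysis

In short: given a string and a list of words, find every position in the string where all the words appear back to back in some order. Note that each word in the list must be used exactly once.

This problem is rated hard. For hard problems, a plain brute-force loop usually runs into TLE (Time Limit Exceeded).

The word list can contain duplicates: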

"wordgoodgoodgoodbestword"
["word","good","best","good"]

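So my first idea was: since every word has the same fixed length, step through the string one word length at a time. With len=3 the visited positions look roughly like [0, 3, 6, 9, ...].

But a match does not have to sit on those aligned positions: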

"lingmindraboofooowingdingbarrwingmonkeypoundcake"
["fooo","barr","wing","ding","wing"]

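So one pass like the above is not enough; we also have to shift the start letter by letter and iterate again. Shifting by one character gives 1, 4, 7, 10, ..., shifting by one more gives 2, 5, 8, 11, ..., and together these passes cover every case.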
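The offset passes described above can be sketched as follows. This is only an illustration: the class name `OffsetDemo` and the method `positionsByOffset` are hypothetical names, not part of the solution. It lists, for each offset from 0 to the word length minus 1, the positions at which a whole word still fits.

```java
import java.util.ArrayList;
import java.util.List;

class OffsetDemo {
    // For each offset 0..wordLen-1, collect offset, offset+wordLen, offset+2*wordLen, ...
    // as long as a full word of length wordLen fits into a string of length n.
    static List<List<Integer>> positionsByOffset(int n, int wordLen) {
        List<List<Integer>> groups = new ArrayList<>();
        for (int offset = 0; offset < wordLen; offset++) {
            List<Integer> group = new ArrayList<>();
            for (int j = offset; j + wordLen <= n; j += wordLen) {
                group.add(j);
            }
            groups.add(group);
        }
        return groups;
    }

    public static void main(String[] args) {
        // String length 12, word length 3
        System.out.println(positionsByOffset(12, 3));
        // prints [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
    }
}
```

Every word start from 0 to n - wordLen shows up in exactly one group, which is why wordLen passes are enough.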

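Because of the duplicates, we use a hash map (map) to record every word in words together with how many times it occurs. Then we read the string one word at a time. If the current word t exists in map, we add it to a second hash table, curmap. If its count in curmap is less than or equal to its count in map, we increment count. If it is greater, the word has been used too often, the run is no longer valid, and we break.

If count equals the number of words, the substring consists only of words from words and uses each of them exactly as often as required, so we add the current position i to the result res.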
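As a minimal sketch of the frequency-table idea above, the check for a single start index can also be written by building the window's own word counts and comparing the two maps. This is an alternative formulation for illustration, not the code used below; `WordCountDemo`, `countWords`, and `matchesAt` are hypothetical names. It uses `Map.merge`, which replaces the containsKey/put pair.

```java
import java.util.HashMap;
import java.util.Map;

class WordCountDemo {
    // Frequency table: word -> number of occurrences in words.
    static Map<String, Integer> countWords(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    // True if the words.length chunks starting at start use exactly the words in words.
    static boolean matchesAt(String s, int start, String[] words) {
        int len = words[0].length();
        if (start < 0 || start + len * words.length > s.length()) {
            return false;
        }
        Map<String, Integer> window = new HashMap<>();
        for (int j = 0; j < words.length; j++) {
            int from = start + j * len;
            window.merge(s.substring(from, from + len), 1, Integer::sum);
        }
        return window.equals(countWords(words));
    }

    public static void main(String[] args) {
        String[] words = {"word", "good", "best", "good"};
        System.out.println(matchesAt("wordgoodgoodgoodbestword", 8, words)); // prints true
        System.out.println(matchesAt("wordgoodgoodgoodbestword", 0, words)); // prints false
    }
}
```

Comparing whole maps is simpler to read but cannot stop early, whereas the count-based check breaks as soon as a word is over-used.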

public static void main(String[] args) {
		
		
		String[] words4 = {"bar","foo","the"};
		List<Integer> res4 =	findSubstring( "barfoofoobarthefoobarman",words4);		
		System.out.println(JSON.toJSON(res4));

		String[] words6 = {"fooo","barr","wing","ding","wing"};
		List<Integer> res6 =	findSubstring( "lingmindraboofooowingdingbarrwingmonkeypoundcake",words6);		
		System.out.println(JSON.toJSON(res6));
		
	
		
		String[] words5 = {"word","good","best","good"};
		List<Integer> res5 =	findSubstring( "wordgoodgoodgoodbestword",words5);
		System.out.println(JSON.toJSON(res5));
		
		String[] words = {"foo","bar"};
		List<Integer> res =	findSubstring( "barfoothefoobarman",words);
		System.out.println(JSON.toJSON(res));
		
		String[] words1 = {"aa","aa"};
		List<Integer> res1 =	findSubstring( "aaa",words1);
		System.out.println(JSON.toJSON(res1));
		

		
		String[] words3 = {"ba","ab","ab"};
		List<Integer> res3 =	findSubstring( "abaababbaba",words3);		
		System.out.println(JSON.toJSON(res3));
	
		String[] words2 = {"word","good","best","word"};
		List<Integer> res2 =	findSubstring( "wordgoodgoodgoodbestword",words2);		
		System.out.println(JSON.toJSON(res2));
		

		

	}
    // Brute force: check a fixed-length window at every start index
	public static List<Integer> findSubstring(String s, String[] words) {
		
		List<Integer> res = new ArrayList<>();
		HashMap<String,Integer> map = new HashMap<>();
		// corner case
		if(s.equals("")||words.length==0){
			return res;
		}
		
		for(String word:words){
			if(map.containsKey(word)){
				map.put(word, map.get(word)+1);
			}
			else{
				map.put(word, 1);
			}
		}
		//length
		int l = words[0].length();		
	
		int w= words.length;
		
		// try every start index, character by character
		for (int i = 0; i <= s.length() - w * l; i++) {
			int count = 0;
			HashMap<String, Integer> curMap = new HashMap();
			// read w consecutive words starting at i
			for (int j = 0; j < w; j++) {
				String tmp = s.substring(i + j * l, i + (j + 1) * l);

				if (map.containsKey(tmp)) {

					if (curMap.containsKey(tmp)) {
						curMap.put(tmp, curMap.get(tmp) + 1);
					} else {
						curMap.put(tmp, 1);
					}

					if (curMap.get(tmp) <= map.get(tmp)) {
						count++;
					} else {
						break;					
					}
				} else {
					curMap.clear();
					count = 0;
					break;
				}
			}
			if (count == w) {
				res.add(i);
			}
		}
		return res;
	}

Runtime: 96 ms, faster than 31.44% of Java online submissions for Substring with Concatenation of All Words.

Memory Usage: 41 MB, less than 45.24% of Java online submissions for Substring with Concatenation of All Words.

That is still fairly slow. I also came across a sliding-window solution online.

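2.2 Sliding window

Added on 9.12: coming up with this algorithm is currently beyond me, so I read an expert's article.

The setup is the same as above. What changes is that left records the position of the window's left boundary, and count is the number of words matched so far.

Special case: for example wordgoodgoodgoodbestword with words = {"word","good","best","good"}.

Here, when good shows up for the third time, we have to loop from the left boundary: for every word cut off the left side, decrement its entry in curmap, and if that entry drops below the value in map, decrement count as well. Why does this need a loop?

At first I thought: wouldn't it be faster to simply keep one good and carry on? That held until I ran into the case abababab with {"a","b","a"}.

After aba has matched, the next window is bab. When b appears the second time, it exceeds its single occurrence in map. This is awkward to handle directly, because we do not know where in the window the duplicated word sits. So we loop, starting at left, dropping one word at a time and updating the counts as we go. I think this is the real difficulty of the problem.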
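The shrinking step described above can be sketched in isolation. This is a simplified illustration under stated assumptions: the window is kept as an explicit deque of words instead of a left index, and `ShrinkDemo` and `shrinkLeft` are hypothetical names. It replays the wordgoodgoodgoodbestword case at the moment the third good arrives.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

class ShrinkDemo {
    // Drops words from the left of the window until word no longer exceeds
    // its allowed count in need. Returns how many words were dropped.
    static int shrinkLeft(Deque<String> window, Map<String, Integer> cur,
                          Map<String, Integer> need, String word) {
        int dropped = 0;
        while (cur.get(word) > need.get(word)) {
            String leftmost = window.pollFirst();
            cur.merge(leftmost, -1, Integer::sum);
            dropped++;
        }
        return dropped;
    }

    public static void main(String[] args) {
        Map<String, Integer> need = new HashMap<>();
        need.put("word", 1);
        need.put("good", 2);
        need.put("best", 1);

        // Window state right after the third "good" has been added.
        Deque<String> window = new ArrayDeque<>(Arrays.asList("word", "good", "good", "good"));
        Map<String, Integer> cur = new HashMap<>();
        cur.put("word", 1);
        cur.put("good", 3);

        int dropped = shrinkLeft(window, cur, need, "good");
        System.out.println(dropped + " " + window); // prints 2 [good, good]
    }
}
```

The over-used word is not the leftmost one here: "word" has to go first, then one "good". That is why a single removal is not enough and the loop is needed.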

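If at some point count equals w, we have found a match: store the current left boundary left in res, then drop the leftmost word, decrement count by 1, move the left boundary right by l, and keep matching. If we read a word that is not in map, the run is cut off from everything before it: clear curmap, reset count to 0, and move left to j+l.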

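import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class Solution {
    public List<Integer> findSubstring(String s, String[] words) {
       Set<Integer> res = new HashSet<>();
		HashMap<String,Integer> map = new HashMap<>();
		// corner case
		if(s.equals("")||words.length==0){
			return new ArrayList<>(res);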
		}
		
		for(String word:words){
			if(map.containsKey(word)){
				map.put(word, map.get(word)+1);
			}
			else{
				map.put(word, 1);
			}
		}
		//length
		int l = words[0].length();		
	
		int w= words.length;
		
		// one pass per character offset 0..l-1
		for (int i = 0; i < l; i++) {
			
			 int left = i, count = 0;
			HashMap<String, Integer> curMap = new HashMap();
			// advance one word at a time
			for (int j = i; j <= s.length()-l; j=j+l) {
				String tmp = s.substring(j , j + l);

				if (map.containsKey(tmp)) {

					if (curMap.containsKey(tmp)) {
						curMap.put(tmp, curMap.get(tmp) + 1);
					} else {
						curMap.put(tmp, 1);
					}

					if (curMap.get(tmp) <= map.get(tmp)) {
						count++;
					} else {
						while(curMap.get(tmp)>map.get(tmp)){
							String t1 = s.substring(left,left+l);
							if(curMap .containsKey(t1)){
								curMap.put(t1, curMap.get(t1)-1);
								if(curMap.get(t1)<map.get(t1) ){
									count --;
								}
							}
							left = left+l;
						}				
					}
					if (count == w) {
						res.add(left);
						count --;
						String t2 =  s.substring(left,left+l);
						if(curMap.containsKey(t2))
						curMap.put(t2, curMap.get(t2)-1);
						// move the left boundary right by one word
						left = left+l;
					}
					
				} else {// run broken: word not in map
					curMap.clear();
					count = 0;
					left = j+l;
				}
			}
			
		}
		return new ArrayList<>(res);          
    }
}

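Runtime: 12 ms, faster than 78.90% of Java online submissions for Substring with Concatenation of All Words.

Memory Usage: 39.4 MB, less than 92.86% of Java online submissions for Substring with Concatenation of All Words.

Counted in word reads, the time complexity is O(N): each of the l offsets scans about N/l words, and every word enters and leaves the window at most once. Each read still costs O(l) for the substring and its hash.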
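To make the difference concrete, here is a rough sketch that computes upper bounds on the number of substring calls for both approaches. These are worst-case bounds derived from the loop ranges, not measurements, and `ReadCountDemo` and its methods are hypothetical names.

```java
class ReadCountDemo {
    // Brute force: up to w reads for each of the (n - w*l + 1) start indices.
    static long bruteForceReads(int n, int l, int w) {
        long starts = Math.max(0L, (long) n - (long) w * l + 1);
        return starts * w;
    }

    // Sliding window: each of the (n - l + 1) word positions is read once on the
    // right edge and dropped at most once on the left edge.
    static long slidingWindowReads(int n, int l) {
        long positions = Math.max(0L, (long) n - l + 1);
        return 2 * positions;
    }

    public static void main(String[] args) {
        // n = 10000 characters, words of length 2, 1000 words
        System.out.println(bruteForceReads(10000, 2, 1000));  // prints 8001000
        System.out.println(slidingWindowReads(10000, 2));     // prints 19998
    }
}
```

The brute-force bound grows with the number of words w, while the sliding-window bound does not depend on w at all.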

Reference:

https://blog.csdn.net/linhuanmars/article/details/20342851

I still need to keep practicing.