LeetCode 30. 串联所有单词的子串（难）

这道题可以用滑动窗口的方式来解决，窗口的大小是words数组里面所有单词的总和。因为字符串s如果有子串能完全匹配words，那肯定是words中的每个单词都要出现若干次的(words里面的单词可能有重复)。这样一来，我们就不用排列组合words中的一个一个字符串了。

题目

给定一个字符串 s 和一些长度相同的单词 words。找出 s 中恰好可以由 words 中所有单词串联形成的子串的起始位置。注意子串要与 words 中的单词完全匹配，中间不能有其他字符，但不需要考虑 words 中单词串联的顺序。

示例

示例 1：

输入：
  s = "barfoothefoobarman",
  words = ["foo","bar"]
输出：[0,9]

解释：从索引 0 和 9 开始的子串分别是 "barfoo" 和 "foobar" 。输出的顺序不重要, [9,0] 也是有效答案。

示例 2：

输入：
  s = "wordgoodgoodgoodbestword",
  words = ["word","good","best","word"]
输出：[]

考察知识点

字符串匹配、哈希表、双指针、字符串、滑动窗口

核心思想

方法一、先确定可能的组合形式，再一个一个的匹配。

思路简单，但是时间复杂度太大。

方法二、滑动窗口

这道题可以用滑动窗口的方式来解决，窗口的大小是words数组里面所有单词的总和。因为字符串s如果有子串能完全匹配words，那肯定是words中的每个单词都要出现若干次的(words里面的单词可能有重复)。这样一来，我们就不用排列组合words中的一个一个字符串了。以barfoothefoobarman为例，words=["foo","bar"]，那么窗口大小就是6(foo+bar的长度)。

我们从字符串s的开始位置起，每次截取6位长度记做tmp，跟words的单词比较。words中的单词都放到一个map中，key是单词，value是这个单词出现的次数。我们比较tmp和map是否完全匹配，如果是就找到了一个索引位置。我们按照这种方式迭代完整个数组，就可以求出解了。

代码概述：

1、几种特判情况，s为空，words为空。直接返回[]。
2、利用dict构造hash map，将每个单词以及出现的频率记录到字典中。
3、先获取单个word的长度，再计算所有word组成的字符串的长度。
4、开始循环，s中在最后all_words_size个字符，长度都不够all_words_size，不可能满足条件，所以循环终点在 len(s) - all_words_size + 1。
5、每次取tmp_str中长为all_words_size的子串，将子串和临时字典进行比较，如果是就将临时字典记录的频率-1，如果不在就跳出循环。当这个word记录频率为0时，删除该word。
6、当内层循环遍历完后，如果临时字典为空则表示全部匹配上了，则记录数组的下标。
7、循环结束。

参考连接

Python版本

方法一的实现

148 / 173 个通过测试用例。

class Solution:

    def strStrs(self, haystack: str, needle: str) -> list:
        res = []
        # count = 0
        if len(haystack) == 0:
            return res
        last_index = 0
        is_first = True
        while haystack:
            tmp_index = self.strStr(haystack, needle)
            if tmp_index != -1:  # 如果找到了就保存到res里面
                if is_first:  # 第一次
                    res.append(tmp_index)
                    last_index = tmp_index
                    is_first = False
                else:
                    res.append(tmp_index + last_index + 1)
                    last_index = tmp_index + last_index + 1
                haystack = haystack[tmp_index+1: ]
                # count += 1
            else:  # 如果一次都找不到，就直接返回[]
                return res
        return res

    def strStr(self, haystack: str, needle: str) -> int:
        # Func: 计算偏移表
        def calShiftMat(st):
            dic = {}
            for i in range(len(st)-1,-1,-1):  # 从后往前计算
                if not dic.get(st[i]):
                    dic[st[i]] = len(st)-i
            dic["ot"] = len(st)+1
            return dic

        needle_len = len(needle)
        haystack_len = len(haystack)

        # 其他情况判断
        if needle_len > haystack_len:return -1
        if needle=="": return 0
       
        # 偏移表预处理
        dic = calShiftMat(needle)
        idx = 0

        while idx+needle_len <= haystack_len:
            # 待匹配字符串
            str_cut = haystack[idx:idx+needle_len]
            # 判断是否匹配，如果匹配了就把答案填在res里面，并递归调用，直到再也找不到为止。
            if str_cut==needle:
                return idx
            else:
                # 边界处理
                if idx+needle_len >= haystack_len:
                    return -1
                # 不匹配情况下，根据下一个字符的偏移，移动idx
                cur_c = haystack[idx+needle_len]
                if dic.get(cur_c):
                    idx += dic[cur_c]
                else:
                    idx += dic["ot"]

        return -1 if idx+needle_len >= haystack_len else idx

    def permutations(self, n):
        if n == 0:
            return []
        if n == 1:
            return [[0]]
        res = []
        indices = list(range(n))
        # print(tuple(indices))
        res.append(tuple(indices))
        cycles = list(range(n, 0, -1))
        while n:
            for i in reversed(range(n)):
                cycles[i] -= 1
                if cycles[i] == 0:
                    indices[i:] = indices[i+1:] + indices[i:i+1]
                    cycles[i] = n - i
                else:
                    j = cycles[i]
                    indices[i], indices[-j] = indices[-j], indices[i]
                    # print(tuple(indices))
                    res.append(tuple(indices))
                    break
            else:
                return res

    def findSubstring(self, s: str, words: list) -> list:

        prs = self.permutations(len(words))

        words_combinations = []
        for permutation in prs:
            tmp_str = ""
            for i in permutation:
                tmp_str += words[i]
            words_combinations.append(tmp_str)
        
        words_combinations = set(words_combinations)  # 去一下重复

        res = []
        for word in words_combinations:
            _index = self.strStrs(s, word)
            if len(_index):
                res += _index

        return res


print("leet code accept!!!")
Input = ["abababab", "a", "", "foobarfoobar", "barfoothefoobarman", "wordgoodgoodgoodbestword"]
Input1 = [["a","b"], ["a"], ["a"], ["foo","bar"], ["foo", "bar"], ["word","good","best","word"]]
Answer = [[0,1,2,3,4,5,6], [0], [], [0, 3, 6], [0,9], []]

if __name__ == "__main__":
    solution = Solution()
    for i in range(len(Input)):
        print("-"*50)
        result = solution.findSubstring(Input[i], Input1[i])
        result.sort()
        print(result)
        print(Answer[i])
        print(result==Answer[i])

方法二的实现

# 滑动窗口
class Solution2:
    def findSubstring(self, s: str, words: list) -> list:
        # 几种特判情况，s为空，words为空。直接返回[]。
        if not s or len(s)==0 or not words or len(words)==0:
            return []
        # 利用dict构造hash map，将每个单词以及出现的频率记录到字典中。
        words_map,res = dict(),[]
        for i in words:
            if i not in words_map:
                words_map[i] = 1
            else:
                words_map[i] += 1
        one_word_size = len(words[0])  # 获取单个word的长度
        all_words_size = len(words)*one_word_size  # 计算所有word组成的字符串的长度
        s_len = len(s)
        for i in range(s_len-all_words_size+1):  # s中在最后all_words_size个字符，长度都不够all_words_size，不可能满足条件，所以循环终点在 len(s) - all_words_size + 1。
            # 每次取 all_words_size长度的子串
            tmp_str, d = s[i:i+all_words_size], dict(words_map)
            # 将子串和临时字典进行比较
            for j in range(0, len(tmp_str), one_word_size):  # one_word_size个one_word_size个地从子串tmp_str中取出one_word_size长度的子串，看是否出现在临时字典中
                # 如果是就将临时字典记录的频率-1，如果不在就跳出循环
                key = tmp_str[j:j+one_word_size]
                if key in d:
                    d[key] -= 1
                    if d[key]==0:
                        del d[key]
                else:
                    break
            if not d:  # 当内层循环遍历完后，如果临时字典为空则表示全部匹配上了，记录数组的下标。
                res.append(i)
        return res  # 返回结果

时间复杂度\(O(N*M)\)

有效语法糖

1、注意Python中的赋值，可变对象（list、set、dict）和不可变对象（int、float、string）。

1
2
3

words_map = {"a": 1, "b": 2}
d = words_map  # 这样赋值，d变化，words_map也会变化。
d = dict(words_map)  # 这样赋值，d变化，words_map不会变化。

LeetCode

数组与指针

本博客所有文章除特别声明外，均采用 CC BY-SA 4.0 协议，转载请注明出处！

LeetCode 31. 下一个排列（中）上一篇

LeetCode 29. 两数相除（中）下一篇