05 August 2008
Design a search autocomplete system for a search engine. Users may input a sentence (at least one word and end with a special character
). For each character they type except ‘#‘, you need to return the top 3 historical hot sentences that have prefix the same as the part of sentence already typed. Here are the specific rules:1
'#'
Your job is to implement the following functions:
The constructor function:
This is the constructor. The input is historical data. 1
AutocompleteSystem(String[] sentences, int[] times):
is a string array consists of previously typed sentences. 1
Sentences
is the corresponding times a sentence has been typed. Your system should record these historical data.1
Times
Now, the user wants to input a new sentence. The following function will provide the next character the user types:
The input 1
List<String> input(char c):
is the next character typed by the user. The character will only be lower-case letters (1
c
to 1
'a'
), blank space (1
'z'
) or a special character (1
' '
). Also, the previously typed sentence should be recorded in your system. The output will be the top 3 historical hot sentences that have prefix the same as the part of sentence already typed.1
'#'
Example:
Operation: AutocompleteSystem([“i love you”, “island”,”ironman”, “i love leetcode”], [5,3,2,2])
The system have already tracked down the following sentences and their corresponding times:
: 1
"i love you"
times1
5
: 1
"island"
times1
3
: 1
"ironman"
times1
2
: 1
"i love leetcode"
times1
2
Now, the user begins another search:
Operation: input(‘i’)
Output: [“i love you”, “island”,”i love leetcode”]
Explanation:
There are four sentences that have prefix
. Among them, “ironman” and “i love leetcode” have same hot degree. Since 1
"i"
has ASCII code 32 and 1
' '
has ASCII code 114, “i love leetcode” should be in front of “ironman”. Also we only need to output top 3 hot sentences, so “ironman” will be ignored.1
'r'
Operation: input(’ ‘)
Output: [“i love you”,”i love leetcode”]
Explanation:
There are only two sentences that have prefix
.1
"i "
Operation: input(‘a’)
Output: []
Explanation:
There are no sentences that have prefix
.1
"i a"
Operation: input(’#’)
Output: []
Explanation:
The user finished the input, the sentence
should be saved as a historical sentence in system. And the following input will be counted as a new search.1
"i a"
Note:
大体意思就是设计一个搜索的自动补全系统,它需要包含如下两个方法:
1) 暴力解法的思路(TLE):
2) Trie
前缀树包含节点和边,可以像树一样存储字符串。每个节点最多有多个子节点和边(根据存储的字符范围,比如全小写26个,全字符128或256个),边用于连接父节点和子节点,每条边对应 1 个存储的字符。
根据字符串中每个位置出现的字符,从上到下存储字符串。第一层存储字符串中出现的所有首字母,第二层存储字符串中第二个位置出现的所有字母。
前缀树常用于存储字典中的所有单词。数的每一个层表示字符串对应位置上的字符。从词根开始到一个叶节点,就是一个单词。
这道题需要修改普通前缀树的结构,给每个单词增加访问次数标记 times,它存储在对应单词的叶节点。
在方法 AutoComplete 中,将 sentences 数组中所有句子插入到单词查找树中,并在叶节点上添加每个句子出现的次数 times。
下图句子 “A”,”to”, “tea”, “ted”, “ten”, “i”, “in”,每个句子出现次数为 15, 7, 3, 4, 12, 11, 5, 9 的Trie。
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
/**
将整个句子都视作单词加入字典树。前缀匹配回溯较多,修改字典树结点结构,使得每个结点维护其后计数最大的三个字符串,加速匹配。
*/
class AutocompleteSystem {
// 构建Trie
class Trie {
// 确定路径上的某个字符是否是某个单词(句子)的结尾字符
boolean isEnding;
int count;
String s;
Trie[] children = new Trie[27]; // 26个字符加‘ ’
// 小顶堆保存最大的k个
int k;
PriorityQueue<Trie> queue;
Trie(int k) {
this.k = k;
// min Heap
this.queue = new PriorityQueue<>((a, b) -> a.count == b.count ? b.s.compareTo(a.s) : a.count - b.count);
// 字典树的一条路径的最后一位存空格
this.children = new Trie[27];
}
private void insert(String word, int val) {
if (word.isEmpty()) {
return;
}
Trie temp = this;
// 记录路径上的节点
List<Trie> path = new LinkedList<>();
for (char c : word.toCharArray()) {
int index = (c == ' ') ? 26 : (c - 'a');
if (temp.children[index] == null) {
temp.children[index] = new Trie(k);
}
temp = temp.children[index];
path.add(temp);
}
// 结尾的字符特殊标记,并进行整个路径的计数更新
temp.isEnding = true;
temp.count += val;
temp.s = word;
// 关联终止节点到路径上的每个节点
for (Trie cur : path) {
// remove old value
if (cur.queue.contains(temp)) {
cur.queue.remove(temp);
}
// update new value
cur.queue.add(temp);
// 大于k个,将最小的弹出
while (cur.queue.size() > k) {
cur.queue.poll();
}
}
}
private void search(List<String> results) {
List<Trie> temp = new ArrayList<>();
// 加入堆中元素
while (!queue.isEmpty()) {
Trie trie = queue.poll();
temp.add(trie);
results.add(trie.s);
}
queue.addAll(temp);
Collections.reverse(results);
}
}
// 字典树
private final Trie root;
// 记录前一个节点
private Trie pre;
//记录历史字符串
private StringBuilder sb;
public AutocompleteSystem(String[] sentences, int[] times) {
this.root = new Trie(3); // 每条路径最多三个单词
this.pre = root;
sb = new StringBuilder();
for (int i = 0; i < sentences.length; i++) {
root.insert(sentences[i], times[i]);
}
}
public List<String> input(char c) {
List<String> results = new LinkedList<>();
// 遇到‘#’终止
if (c == '#') {
// 更新历史记录
saveHistory(sb.toString());
// 清空状态
pre = root;
sb = new StringBuilder();
return results;
}
// 尚未终止
// 记录历史
sb.append(c);
// 更新节点
if (pre != null) {
pre = (c == ' ') ? pre.children[26] : pre.children[c - 'a'];
}
// 搜索其后的所有值
if (pre != null) {
pre.search(results);
}
return results;
}
private void saveHistory(String s) {
root.insert(s, 1);
}
}
/**
* Your AutocompleteSystem object will be instantiated and called as such:
* AutocompleteSystem obj = new AutocompleteSystem(sentences, times);
* List<String> param_1 = obj.input(c);
*/