05 August 2008
Design an algorithm to encode a list of strings to a string. The encoded string is then sent over the network and is decoded back to the original list of strings.
Machine 1 (sender) has the function:
1
2
3
4
string encode(vector<string> strs) {
// ... your code
return encoded_string;
}
Machine 2 (receiver) has the function:
1
2
3
4
vector<string> decode(string s) {
//... your code
return strs;
}
So Machine 1 does:
1
string encoded_string = encode(strs);
and Machine 2 does:
1
vector<string> strs2 = decode(encoded_string);
strs2 in Machine 2 should be the same as strs in Machine 1.
Implement the encode and decode methods.
You are not allowed to solve the problem using any serialize methods (such as eval).
Example 1:
1
2
3
4
5
6
7
8
9
10
11
Input: dummy_input = ["Hello","World"]
Output: ["Hello","World"]
Explanation:
Machine 1:
Codec encoder = new Codec();
String msg = encoder.encode(strs);
Machine 1 ---msg---> Machine 2
Machine 2:
Codec decoder = new Codec();
String[] strs = decoder.decode(msg);
Example 2:
1
2
Input: dummy_input = [""]
Output: [""]
Constraints:
1
1 <= strs.length <= 200
1
0 <= strs[i].length <= 200
1
strs[i]
contains any possible characters out of 1
256
valid ASCII characters.Follow up: Could you write a generalized algorithm to work on any possible set of characters?
Approach 1: Non-ASCII Delimiter
Intuition
Naive solution here is to join strings using delimiters.
What to use as a delimiter? Each string may contain any possible characters out of 256 valid ascii characters.
Seems like one has to use non-ASCII unichar character, for example
in Python and 1
unichr(257)
in Java (it’s character 1
Character.toString((char)257)
).1
ā
Here it’s convenient to use two different non-ASCII characters, to distinguish between situations of “empty array” and of “array of empty strings”.
Implementation
Use
in Java with a second argument 1
split
to make it work as 1
-1
in Python.1
split
Complexity Analysis
Approach 2: Chunked Transfer Encoding
Pay attention to this approach because last year Google likes to ask that sort of low-level optimisation. Serialize and deserialize BST problem is a similar example.
This approach is based on the encoding used in HTTP v1.1. It doesn’t depend on the set of input characters, and hence is more versatile and effective than Approach 1.
Data stream is divided into chunks. Each chunk is preceded by its size in bytes.
Encoding Algorithm
Decoding Algorithm
1
i
initiated as 0. While 1
i < n
:
1
s[i: i + 4]
. It’s chunk size in bytes. Convert this 4-bytes string to integer 1
length
.1
i += 4
.1
s[i: i + length]
.1
length
bytes 1
i += length
.Complexity Analysis
1.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
public class Codec {
// Encodes a list of strings to a single string.
public String encode(List<String> strs) {
if (strs.size() == 0) return Character.toString((char)258);
String d = Character.toString((char)257);
StringBuilder sb = new StringBuilder();
for(String s: strs) {
sb.append(s);
sb.append(d);
}
sb.deleteCharAt(sb.length() - 1);
return sb.toString();
}
// Decodes a single string to a list of strings.
public List<String> decode(String s) {
String d = Character.toString((char)258);
if (s.equals(d)) return new ArrayList();
d = Character.toString((char)257);
return Arrays.asList(s.split(d, -1));
}
}
// Your Codec object will be instantiated and called as such:
// Codec codec = new Codec();
// codec.decode(codec.encode(strs));
2.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
public class Codec {
// Encodes string length to bytes string
public String intToString(String s) {
int x = s.length();
char[] bytes = new char[4];
for(int i = 3; i > -1; --i) {
bytes[3 - i] = (char) (x >> (i * 8) & 0xff);
}
return new String(bytes);
}
// Encodes a list of strings to a single string.
public String encode(List<String> strs) {
StringBuilder sb = new StringBuilder();
for(String s: strs) {
sb.append(intToString(s));
sb.append(s);
}
return sb.toString();
}
// Decodes bytes string to integer
public int stringToInt(String bytesStr) {
int result = 0;
for(char b : bytesStr.toCharArray())
result = (result << 8) + (int)b;
return result;
}
// Decodes a single string to a list of strings.
public List<String> decode(String s) {
int i = 0, n = s.length();
List<String> output = new ArrayList();
while (i < n) {
int length = stringToInt(s.substring(i, i + 4));
i += 4;
output.add(s.substring(i, i + length));
i += length;
}
return output;
}
}