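A small pitfall with Java's DataInputStream.readUTF()

Notes on a small pitfall with DataInputStream.readUTF() that I ran into today.

Problem description

I have been looking at Flutter and Dart recently. Dart seemed fairly easy to pick up, so I set out to write Dart code (a Dart Socket) to call our app's backend API. The real backend, however, speaks a private protocol, and its data transformations and encryption/decryption would add unnecessary work and slow down a quick prototype. So I wrote a simple Java Server myself (Java is what I know best, so it is quick to write) to test the Dart Client against. It is a Server in name only: it simply receives the string sent by the Client, converts it to upper case, and sends it back to the Client.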

import java.io.Closeable;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SimpleServer {

    public static void main(String[] args) throws Exception {

        final ExecutorService e = Executors.newFixedThreadPool(10);

        ServerSocket ss = new ServerSocket();
        ss.bind(new InetSocketAddress("127.0.0.1", 6760));

        System.out.println("Server started at port 6760");
        while (true) {
            // Hand every accepted connection to a worker thread
            e.execute(new ClientHandler(ss.accept()));
        }
    }

    private static class ClientHandler implements Runnable {

        private final Socket socket;

        private final DataInputStream input;
        private final DataOutputStream output;

        private ClientHandler(Socket socket) throws IOException {
            this.socket = socket;
            input = new DataInputStream(socket.getInputStream());
            output = new DataOutputStream(socket.getOutputStream());
        }

        @Override
        public void run() {
            System.out.println("A client coming " + socket.getInetAddress() + ":" + socket.getPort());

            try {
                // Read one string, then echo it back in upper case
                String echo = input.readUTF();
                System.out.println("Client said: " + echo);
                output.writeUTF(echo.toUpperCase());
                output.flush();
                System.out.println("Server said: " + echo.toUpperCase());
            } catch (IOException e) {
                e.printStackTrace();
            } finally {
                close(input);
                close(output);
                close(socket);
            }

            System.out.println("This client has left " + socket.getInetAddress() + ":" + socket.getPort());
        }
    }

    private static void close(Closeable closeable) {
        if (closeable != null) {
            try {
                closeable.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}

Next, a Client written in Dart (adapted from Stack Overflow). The code is as follows:

import 'dart:async';
import 'dart:convert';
import 'dart:io';

main() async {
  Socket socket = await Socket.connect('127.0.0.1', 6760);
  print('connected');

  // listen to the received data event stream
  socket.listen((List<int> event) {
    print(event);
    print(utf8.decode(event));
  });

  // send hello
  socket.add(utf8.encode('hello'));

  // wait 5 seconds
  await Future.delayed(Duration(seconds: 5));

  // .. and close the socket
  socket.close();
}

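It does not look hard, but debugging turned up a strange problem: the Server could see the Client connect, yet it never received any data and stayed blocked at String echo = input.readUTF().

With no idea where to start, I wrote a similar Client in Java to check. Between the Java Server and the Java Client, sending and receiving worked perfectly.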

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.Closeable;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class SimpleClient {

    public static void main(String[] args) throws IOException {
        Socket s = new Socket();

        s.connect(new InetSocketAddress("127.0.0.1", 6760));

        DataInputStream input = new DataInputStream(new BufferedInputStream(s.getInputStream()));
        DataOutputStream output = new DataOutputStream(new BufferedOutputStream(s.getOutputStream()));

        // writeUTF() on this side pairs with readUTF() on the Server
        output.writeUTF("hello");
        output.flush();

        System.out.println("received: " + input.readUTF());

        close(input);
        close(output);
        close(s);
    }

    private static void close(Closeable closeable) {
        if (closeable != null) {
            try {
                closeable.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}

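After a lot of trial and error I happened to notice another symptom: when the Dart Client printed the data sent by the Java Server, there was an extra garbage character in front of it.

The Java Server sends data with DataOutputStream.writeUTF():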

output.writeUTF("HELLO");
output.flush();

The Dart Client receives data with socket.listen() and prints it:

socket.listen((List<int> event) {
  print(utf8.decode(event)); // a garbage character appears before HELLO
  print(event); // [0, 5, 72, 69, 76, 76, 79]
});


event is a List<int>. The trailing numbers are easy to understand: they are the ASCII codes of "HELLO". But what are the leading 0 and 5?

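readUTF() and writeUTF()

Based on these two observations,

  • the Java Client and the Java Server communicate fine, but the Dart Client does not
  • data sent with DataOutputStream.writeUTF() carries some extra bytes

I went through the documentation and source code of DataOutputStream.writeUTF(). (In fact, reading the documentation a little more carefully is enough to see the problem.)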

/**
 * Writes a string to the underlying output stream using
 * <a href="DataInput.html#modified-utf-8">modified UTF-8</a>
 * encoding in a machine-independent manner.
 * <p>
 * First, two bytes are written to the output stream as if by the
 * <code>writeShort</code> method giving the number of bytes to
 * follow. This value is the number of bytes actually written out,
 * not the length of the string. Following the length, each character
 * of the string is output, in sequence, using the modified UTF-8 encoding
 * for the character. If no exception is thrown, the counter
 * <code>written</code> is incremented by the total number of
 * bytes written to the output stream. This will be at least two
 * plus the length of <code>str</code>, and at most two plus
 * thrice the length of <code>str</code>.
 *
 * @param str a string to be written.
 * @exception IOException if an I/O error occurs.
 */
public final void writeUTF(String str) throws IOException {
    writeUTF(str, this);
}

static int writeUTF(String str, DataOutput out) throws IOException {
    int strlen = str.length();
    int utflen = 0;
    int c, count = 0;

    ...
    out.write(bytearr, 0, utflen+2);
    return utflen + 2;
}

/**
 * See the general contract of the <code>readUTF</code>
 * method of <code>DataInput</code>.
 * <p>
 * Bytes
 * for this operation are read from the contained
 * input stream.
 *
 * @return a Unicode string.
 * @exception EOFException if this input stream reaches the end before
 * reading all the bytes.
 * @exception IOException the stream has been closed and the contained
 * input stream does not support reading after close, or
 * another I/O error occurs.
 * @exception UTFDataFormatException if the bytes do not represent a valid
 * modified UTF-8 encoding of a string.
 * @see java.io.DataInputStream#readUTF(java.io.DataInput)
 */
public final String readUTF() throws IOException {
    return readUTF(this);
}

/**
 * Reads from the
 * stream <code>in</code> a representation
 * of a Unicode character string encoded in
 * <a href="DataInput.html#modified-utf-8">modified UTF-8</a> format;
 * this string of characters is then returned as a <code>String</code>.
 * The details of the modified UTF-8 representation
 * are exactly the same as for the <code>readUTF</code>
 * method of <code>DataInput</code>.
 *
 * @param in a data input stream.
 * @return a Unicode string.
 * @exception EOFException if the input stream reaches the end
 * before all the bytes.
 * @exception IOException the stream has been closed and the contained
 * input stream does not support reading after close, or
 * another I/O error occurs.
 * @exception UTFDataFormatException if the bytes do not represent a
 * valid modified UTF-8 encoding of a Unicode string.
 * @see java.io.DataInputStream#readUnsignedShort()
 */
public final static String readUTF(DataInput in) throws IOException {
    int utflen = in.readUnsignedShort();
    ...
}

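The documentation states it explicitly:

  • DataOutputStream.writeUTF() and DataInputStream.readUTF() do not send/receive standard UTF-8 data but a modified UTF-8 (see the "modified UTF-8" section of the DataInput documentation)
  • Worse, before sending the encoded bytes, writeUTF() first sends their length as an unsigned short (as if by writeShort), which takes two bytes

The second point has several consequences:

  • The value returned by the static helper writeUTF(str, out) is not the length of the encoded data but that length plus 2 (the public writeUTF(str) itself returns void)
  • readUTF() and writeUTF() work fine together, because readUTF() first calls readUnsignedShort() to read the encoded length (call it utflen) and then reads utflen bytes
  • If you read the data with anything other than readUTF(), be careful. With other programming languages, be even more careful!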
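
The framing described above can be checked without a network. The following is a minimal sketch and not part of the original programs: the class name WriteUtfBytes and the helper writeUtfBytes are made up for illustration. It writes into an in-memory ByteArrayOutputStream and returns the raw bytes that writeUTF() produced.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical demo class, not taken from the original post.
final class WriteUtfBytes {

    // Returns exactly the bytes that DataOutputStream.writeUTF() emits for s.
    static byte[] writeUtfBytes(String s) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(sink)) {
            out.writeUTF(s);
        } catch (IOException e) {
            // Rethrown unchecked so callers of this demo need no throws clause.
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        // Two length bytes (0, 5) followed by the five ASCII bytes of "HELLO":
        // prints [0, 5, 72, 69, 76, 76, 79]
        System.out.println(Arrays.toString(writeUtfBytes("HELLO")));

        // Modified UTF-8 encodes U+0000 as the two bytes 0xC0 0x80
        // rather than as a single zero byte, so the length prefix is 2 here.
        System.out.println(Arrays.toString(writeUtfBytes("\u0000")));
    }
}
```

The second call is there only to show that the payload is not always byte-for-byte standard UTF-8; for plain ASCII text such as "HELLO" the two encodings coincide.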

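With that, it is easy to see where 0 and 5 come from. Briefly:

  • Java Server calls DataOutputStream.writeUTF() to send "HELLO"
  • The UTF-8 form of "HELLO" is [72, 69, 76, 76, 79]
  • What is actually sent also carries the encoded length, so the bytes on the wire are [0, 5, 72, 69, 76, 76, 79]
  • Dart Client receives [0, 5, 72, 69, 76, 76, 79]
  • When decoding, Dart Client treats 0 and 5 as characters too (the control characters U+0000 and U+0005), which show up as garbage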
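
A receiver that does not use readUTF() has to strip the two length bytes itself. The sketch below illustrates that step in Java only; the class WriteUtfFrameDecoder and the method decodeFrame are hypothetical names. It assumes the whole frame has already arrived in one array, and that the text contains no U+0000 and no supplementary characters, so that modified UTF-8 and standard UTF-8 produce the same bytes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not taken from the original post.
final class WriteUtfFrameDecoder {

    // Decodes one complete frame as produced by writeUTF():
    // a 2-byte big-endian length followed by that many payload bytes.
    static String decodeFrame(byte[] frame) {
        if (frame.length < 2) {
            throw new IllegalArgumentException("frame is shorter than the 2-byte length prefix");
        }
        // Same arithmetic as readUnsignedShort(): high byte first.
        int utflen = ((frame[0] & 0xFF) << 8) | (frame[1] & 0xFF);
        if (frame.length - 2 < utflen) {
            throw new IllegalArgumentException("frame is incomplete: expected " + utflen + " payload bytes");
        }
        // Assumption: the payload is also valid standard UTF-8 (see the note above).
        return new String(frame, 2, utflen, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] received = {0, 5, 72, 69, 76, 76, 79};
        System.out.println(decodeFrame(received)); // prints HELLO
    }
}
```

The same two steps (read a big-endian 16-bit length, then decode that many bytes) are what a client in any other language would have to perform on the received byte list.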

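The other problem was that the Server could see the Client connect but never received the data, blocking at String echo = input.readUTF(). The cause is as follows, taking "HELLO" as the payload (the Dart Client above actually sends lowercase 'hello', i.e. [104, 101, 108, 108, 111], which gives 26725 instead of 18501; the reasoning is the same):

  • Dart Client sends "HELLO"
  • The UTF-8 form of "HELLO" is [72, 69, 76, 76, 79]
  • Java Server calls DataInputStream.readUTF() to receive "HELLO"
  • Java Server first calls readUnsignedShort(); [72, 69] read as an unsigned short is 18501
  • readUTF() then tries to read 18501 bytes, but only 3 more bytes ever arrive, so it stays blocked waiting for the rest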

import java.nio.ByteBuffer;
import java.nio.ShortBuffer;

public class ShortDemo {
    public static void main(String[] args) {
        ByteBuffer bb = ByteBuffer.wrap(new byte[] {72, 69});
        ShortBuffer sb = bb.asShortBuffer();
        System.out.println(sb.get(0)); // [72, 69] read as a big-endian short is 18501
    }
}
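
The misread can also be reproduced with the real readUTF() by feeding it the unprefixed bytes from memory. This is a sketch under stated assumptions: the class ReadUtfMisread and its two helpers are hypothetical names, and an in-memory ByteArrayInputStream stands in for the socket. Because an in-memory stream has a definite end, readUTF() fails with EOFException here, whereas on an open socket it keeps waiting for the missing bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class, not taken from the original post.
final class ReadUtfMisread {

    // Returns what readUTF() would take as the length: the first two bytes as an unsigned short.
    static int misreadLength(byte[] payload) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(payload))) {
            return in.readUnsignedShort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns true if readUTF() runs out of data before it has read the announced number of bytes.
    static boolean readUtfHitsEof(byte[] payload) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(payload))) {
            in.readUTF();
            return false;
        } catch (EOFException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] rawHello = {72, 69, 76, 76, 79}; // "HELLO" without any length prefix
        System.out.println(misreadLength(rawHello));  // prints 18501
        System.out.println(readUtfHitsEof(rawHello)); // prints true
    }
}
```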

Summary

To fix the Java Server, its code avoids calling writeUTF() and readUTF() and uses write() and read() instead.

The fixed Java Server code is as follows:

import java.io.Closeable;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SimpleServer {

    public static void main(String[] args) throws Exception {

        final ExecutorService e = Executors.newFixedThreadPool(10);

        ServerSocket ss = new ServerSocket();
        ss.bind(new InetSocketAddress("127.0.0.1", 6760));

        System.out.println("Server started at port 6760");
        while (true) {
            // Hand every accepted connection to a worker thread
            e.execute(new ClientHandler(ss.accept()));
        }
    }

    private static class ClientHandler implements Runnable {

        private final Socket socket;

        private final DataInputStream input;
        private final DataOutputStream output;

        private final byte[] buf;

        private ClientHandler(Socket socket) throws IOException {
            this.socket = socket;
            input = new DataInputStream(socket.getInputStream());
            output = new DataOutputStream(socket.getOutputStream());
            buf = new byte[1024];
        }

        @Override
        public void run() {
            System.out.println("A client coming " + socket.getInetAddress() + ":" + socket.getPort());

            try {
                // A single read() returns whatever bytes have arrived so far,
                // which is enough for the short message in this demo
                int hasRead = input.read(buf);
                if (hasRead > 0) {
                    // Decode and encode explicitly as UTF-8 to match the Dart Client
                    String echo = new String(buf, 0, hasRead, StandardCharsets.UTF_8);
                    System.out.println("Client said: " + echo);
                    output.write(echo.toUpperCase().getBytes(StandardCharsets.UTF_8));
                    output.flush();
                    System.out.println("Server said: " + echo.toUpperCase());
                } else {
                    throw new IOException("len is " + hasRead);
                }

            } catch (IOException e) {
                e.printStackTrace();
            } finally {
                close(input);
                close(output);
                close(socket);
            }

            System.out.println("This client has left " + socket.getInetAddress() + ":" + socket.getPort());
        }
    }

    private static void close(Closeable closeable) {
        if (closeable != null) {
            try {
                closeable.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}

The Dart Client after the fix (its code is the same as before):

import 'dart:async';
import 'dart:convert';
import 'dart:io';

main() async {
  Socket socket = await Socket.connect('127.0.0.1', 6760);
  print('connected');

  // listen to the received data event stream
  socket.listen((List<int> event) {
    print(event);
    print(utf8.decode(event));
  });

  // send hello
  socket.add(utf8.encode('hello'));

  // wait 5 seconds
  await Future.delayed(Duration(seconds: 5));

  // .. and close the socket
  socket.close();
}
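
One caveat about replacing readUTF() with a bare read(): TCP delivers a byte stream, so a single read() is not guaranteed to return exactly one whole message once messages get longer or arrive back to back. A possible next step is to keep a length prefix but define it explicitly, so that any language can implement it. The following is only a sketch of that idea: the class LengthPrefixedFrames, its methods, the 4-byte prefix and the MAX_FRAME_BYTES limit are all assumptions made for illustration, and the Dart side (not shown) would have to send and parse the same prefix.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical framing helper, not taken from the original post.
final class LengthPrefixedFrames {

    // Arbitrary upper bound so that a corrupt prefix cannot trigger a huge allocation.
    static final int MAX_FRAME_BYTES = 64 * 1024;

    // Writes a 4-byte big-endian length followed by the standard UTF-8 bytes of text.
    static void writeFrame(DataOutputStream out, String text) throws IOException {
        byte[] payload = text.getBytes(StandardCharsets.UTF_8);
        out.writeInt(payload.length);
        out.write(payload);
        out.flush();
    }

    // Reads one frame written by writeFrame(); readFully() waits until all bytes have arrived.
    static String readFrame(DataInputStream in) throws IOException {
        int length = in.readInt();
        if (length < 0 || length > MAX_FRAME_BYTES) {
            throw new IOException("unexpected frame length: " + length);
        }
        byte[] payload = new byte[length];
        in.readFully(payload);
        return new String(payload, StandardCharsets.UTF_8);
    }

    // In-memory round trip used only to exercise the two methods above.
    static String roundTrip(String text) {
        try {
            ByteArrayOutputStream wire = new ByteArrayOutputStream();
            writeFrame(new DataOutputStream(wire), text);
            return readFrame(new DataInputStream(new ByteArrayInputStream(wire.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello")); // prints hello
    }
}
```

Unlike writeUTF(), this format uses standard UTF-8 and a documented prefix width, so the two sides do not depend on Java-specific behavior.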