关于 Postgresql zone 变成 0 时区的问题排查
背景
使用 Postgresql Jdbc Driver 读取大量数据
1
2
3
4
5<dependency>
<groupId>org.postgresql</groupId>
<artifactId>postgresql</artifactId>
<version>42.6.0</version>
</dependency>读取的数据中包含 timestamp with time zone 与 time with time zone
Pg Server 是 +08:00 时区
问题描述
1 | public class PostgresqlReadZoneDemo { |
- 写入 50 条数据,其中包含 int(主键), time with time zone, timestamp with time zone
set time zone '+08'
- 分批次(batchSize = 5)读取数据结果的 timezone 信息分为了两批
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60SELECT id, time_with_time_zone, timestamp_with_time_zone FROM test_schema.test_time_zone WHERE id >= 0 ORDER BY id LIMIT 5, id = 0, limit = 5
1, 20:56:13.176+08, 2024-01-14 20:56:13.176+08
2, 21:56:13.176+08, 2024-01-15 20:56:13.176+08
3, 22:56:13.176+08, 2024-01-16 20:56:13.176+08
4, 23:56:13.176+08, 2024-01-17 20:56:13.176+08
5, 00:56:13.176+08, 2024-01-18 20:56:13.176+08
SELECT id, time_with_time_zone, timestamp_with_time_zone FROM test_schema.test_time_zone WHERE id >= 6 ORDER BY id LIMIT 5, id = 6, limit = 5
6, 01:56:13.176+08, 2024-01-19 20:56:13.176+08
7, 02:56:13.176+08, 2024-01-20 20:56:13.176+08
8, 03:56:13.176+08, 2024-01-21 20:56:13.176+08
9, 04:56:13.176+08, 2024-01-22 20:56:13.176+08
10, 05:56:13.176+08, 2024-01-23 20:56:13.176+08
SELECT id, time_with_time_zone, timestamp_with_time_zone FROM test_schema.test_time_zone WHERE id >= 11 ORDER BY id LIMIT 5, id = 11, limit = 5
11, 06:56:13.176+08, 2024-01-24 20:56:13.176+08
12, 07:56:13.176+08, 2024-01-25 20:56:13.176+08
13, 08:56:13.176+08, 2024-01-26 20:56:13.176+08
14, 09:56:13.176+08, 2024-01-27 20:56:13.176+08
15, 10:56:13.176+08, 2024-01-28 20:56:13.176+08
SELECT id, time_with_time_zone, timestamp_with_time_zone FROM test_schema.test_time_zone WHERE id >= 16 ORDER BY id LIMIT 5, id = 16, limit = 5
16, 11:56:13.176+08, 2024-01-29 20:56:13.176+08
17, 12:56:13.176+08, 2024-01-30 20:56:13.176+08
18, 13:56:13.176+08, 2024-01-31 20:56:13.176+08
19, 14:56:13.176+08, 2024-02-01 20:56:13.176+08
20, 15:56:13.176+08, 2024-02-02 20:56:13.176+08
SELECT id, time_with_time_zone, timestamp_with_time_zone FROM test_schema.test_time_zone WHERE id >= 21 ORDER BY id LIMIT 5, id = 21, limit = 5
21, 16:56:13.176+08, 2024-02-03 20:56:13.176+08
22, 17:56:13.176+08, 2024-02-04 20:56:13.176+08
23, 18:56:13.176+08, 2024-02-05 20:56:13.176+08
24, 19:56:13.176+08, 2024-02-06 20:56:13.176+08
25, 20:56:13.176+08, 2024-02-07 20:56:13.176+08
SELECT id, time_with_time_zone, timestamp_with_time_zone FROM test_schema.test_time_zone WHERE id >= 26 ORDER BY id LIMIT 5, id = 26, limit = 5
26, 13:56:13.176+00, 2024-02-08 12:56:13.176+00
27, 14:56:13.176+00, 2024-02-09 12:56:13.176+00
28, 15:56:13.176+00, 2024-02-10 12:56:13.176+00
29, 16:56:13.176+00, 2024-02-11 12:56:13.176+00
30, 17:56:13.176+00, 2024-02-12 12:56:13.176+00
SELECT id, time_with_time_zone, timestamp_with_time_zone FROM test_schema.test_time_zone WHERE id >= 31 ORDER BY id LIMIT 5, id = 31, limit = 5
31, 18:56:13.176+00, 2024-02-13 12:56:13.176+00
32, 19:56:13.176+00, 2024-02-14 12:56:13.176+00
33, 20:56:13.176+00, 2024-02-15 12:56:13.176+00
34, 21:56:13.176+00, 2024-02-16 12:56:13.176+00
35, 22:56:13.176+00, 2024-02-17 12:56:13.176+00
SELECT id, time_with_time_zone, timestamp_with_time_zone FROM test_schema.test_time_zone WHERE id >= 36 ORDER BY id LIMIT 5, id = 36, limit = 5
36, 23:56:13.176+00, 2024-02-18 12:56:13.176+00
37, 00:56:13.176+00, 2024-02-19 12:56:13.176+00
38, 01:56:13.176+00, 2024-02-20 12:56:13.176+00
39, 02:56:13.176+00, 2024-02-21 12:56:13.176+00
40, 03:56:13.176+00, 2024-02-22 12:56:13.176+00
SELECT id, time_with_time_zone, timestamp_with_time_zone FROM test_schema.test_time_zone WHERE id >= 41 ORDER BY id LIMIT 5, id = 41, limit = 5
41, 04:56:13.176+00, 2024-02-23 12:56:13.176+00
42, 05:56:13.176+00, 2024-02-24 12:56:13.176+00
43, 06:56:13.176+00, 2024-02-25 12:56:13.176+00
44, 07:56:13.176+00, 2024-02-26 12:56:13.176+00
45, 08:56:13.176+00, 2024-02-27 12:56:13.176+00
SELECT id, time_with_time_zone, timestamp_with_time_zone FROM test_schema.test_time_zone WHERE id >= 46 ORDER BY id LIMIT 5, id = 46, limit = 5
46, 09:56:13.176+00, 2024-02-28 12:56:13.176+00
47, 10:56:13.176+00, 2024-02-29 12:56:13.176+00
48, 11:56:13.176+00, 2024-03-01 12:56:13.176+00
49, 12:56:13.176+00, 2024-03-02 12:56:13.176+00
50, 13:56:13.176+00, 2024-03-03 12:56:13.176+00id <= 25
数据是正确的 time zone =+08:00
id > 25
数据是也是正确的,但是 time zone =+00:00
排查问题思路
为什么会是 0 时区?
其实这已经是一个查询到中间,已经定位到是 driver 返回返回数据问题时候的场景了。
这个时候剩余的工作就是抓包
1 | sudo tcpdump -i any -n -w postgres_traffic.pcap 'port 5432' |
抓到的包在 postgres_traffic.pcap
中扔到 wireshark 中分析,在比较正确与错误数据的时候可以分析出来点问题
可以从上面解析结果得到(16 进制 byte[] 直接 new String() 是 pg driver 中的文本协议解析逻辑)
1 | 20 |
而在错误 timezone 中可以看到要求返回的 result format 格式已经变为 Binary 了
结果也是 binary
如果将这个结果带入 pg driver 的解析逻辑中,会发现 server 传递的实际上是 UTC timestamp,而 0 时区则是我们本地设置的时区
这个时候问题的推断就到了为什么我们本地是 0 时区的问题上来了
通过 PgResultSet#getString
中代码可以追下去
时区在 pgDriver 中是在 QueryExecutorImpl#receiveParameterStatus
中接收到 param 的 TimeZone
时候设置的,默认是以本地 JVM 时区为准,然后 set timezone to '+08'
会通过 TimestampUtils.parseBackendTimeZone
设置上
问题就出在 set timezone to '+08'
上
在这里面就会发现实际正确的格式应该是 set timezone to 'GMT-08'
,如果不带 GMT
会解析出错,导致最终转换时候成 0 时区数据,而 -08 是 PG 独有的参考 http://www.postgres.cn/docs/14/datetime-posix-timezone-specs.html
时区与实际我们认识到的正好相反,东八区是 -08 时区
为什么不是所有的数据都出错?
在上述抓包过程中我们可以看到,实际上如果是使用文本协议来传输的数据是没有问题的,而使用二进制传输的话数据会存在问题。
问题就在影响二进制协议或文本协议的参数在 pg 中的描述二进制协议是默认开启的
不断的 Debug 之后在协议交换的时候发现了一些端倪在 preferQueryMode
这个参数中
当 preferQueryMode=extended
(默认) 的时候,pg driver 会在执行到五次(默认)相同的 prepareStatement 之后尝试缓存 ps,并将 ps 协议换为 binary 协议与 server 端交互传输,这就导致了在执行五次 SQL 之后返回的结果均是 binary 的协议
最后怎么解决?
set timezone to 'GMT-8'
- 或者可以将协议完全更改为文本协议
binaryTransfer=false
总结
抓包结合 driver 代码是解决多数复杂数据库问题的良方