关于 Postgresql zone 变成 0 时区的问题排查
背景
- 使用 Postgresql Jdbc Driver 读取大量数据 - 1 
 2
 3
 4
 5- <dependency> 
 <groupId>org.postgresql</groupId>
 <artifactId>postgresql</artifactId>
 <version>42.6.0</version>
 </dependency>
- 读取的数据中包含 timestamp with time zone 与 time with time zone 
- Pg Server 是 +08:00 时区 
问题描述
| 1 | public class PostgresqlReadZoneDemo { | 
- 写入 50 条数据,其中包含 int(主键), time with time zone, timestamp with time zone
- set time zone '+08'
- 分批次(batchSize = 5)读取数据结果的 timezone 信息分为了两批1 
 2
 3
 4
 5
 6
 7
 8
 9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60SELECT id, time_with_time_zone, timestamp_with_time_zone FROM test_schema.test_time_zone WHERE id >= 0 ORDER BY id LIMIT 5, id = 0, limit = 5 
 1, 20:56:13.176+08, 2024-01-14 20:56:13.176+08
 2, 21:56:13.176+08, 2024-01-15 20:56:13.176+08
 3, 22:56:13.176+08, 2024-01-16 20:56:13.176+08
 4, 23:56:13.176+08, 2024-01-17 20:56:13.176+08
 5, 00:56:13.176+08, 2024-01-18 20:56:13.176+08
 SELECT id, time_with_time_zone, timestamp_with_time_zone FROM test_schema.test_time_zone WHERE id >= 6 ORDER BY id LIMIT 5, id = 6, limit = 5
 6, 01:56:13.176+08, 2024-01-19 20:56:13.176+08
 7, 02:56:13.176+08, 2024-01-20 20:56:13.176+08
 8, 03:56:13.176+08, 2024-01-21 20:56:13.176+08
 9, 04:56:13.176+08, 2024-01-22 20:56:13.176+08
 10, 05:56:13.176+08, 2024-01-23 20:56:13.176+08
 SELECT id, time_with_time_zone, timestamp_with_time_zone FROM test_schema.test_time_zone WHERE id >= 11 ORDER BY id LIMIT 5, id = 11, limit = 5
 11, 06:56:13.176+08, 2024-01-24 20:56:13.176+08
 12, 07:56:13.176+08, 2024-01-25 20:56:13.176+08
 13, 08:56:13.176+08, 2024-01-26 20:56:13.176+08
 14, 09:56:13.176+08, 2024-01-27 20:56:13.176+08
 15, 10:56:13.176+08, 2024-01-28 20:56:13.176+08
 SELECT id, time_with_time_zone, timestamp_with_time_zone FROM test_schema.test_time_zone WHERE id >= 16 ORDER BY id LIMIT 5, id = 16, limit = 5
 16, 11:56:13.176+08, 2024-01-29 20:56:13.176+08
 17, 12:56:13.176+08, 2024-01-30 20:56:13.176+08
 18, 13:56:13.176+08, 2024-01-31 20:56:13.176+08
 19, 14:56:13.176+08, 2024-02-01 20:56:13.176+08
 20, 15:56:13.176+08, 2024-02-02 20:56:13.176+08
 SELECT id, time_with_time_zone, timestamp_with_time_zone FROM test_schema.test_time_zone WHERE id >= 21 ORDER BY id LIMIT 5, id = 21, limit = 5
 21, 16:56:13.176+08, 2024-02-03 20:56:13.176+08
 22, 17:56:13.176+08, 2024-02-04 20:56:13.176+08
 23, 18:56:13.176+08, 2024-02-05 20:56:13.176+08
 24, 19:56:13.176+08, 2024-02-06 20:56:13.176+08
 25, 20:56:13.176+08, 2024-02-07 20:56:13.176+08
 SELECT id, time_with_time_zone, timestamp_with_time_zone FROM test_schema.test_time_zone WHERE id >= 26 ORDER BY id LIMIT 5, id = 26, limit = 5
 26, 13:56:13.176+00, 2024-02-08 12:56:13.176+00
 27, 14:56:13.176+00, 2024-02-09 12:56:13.176+00
 28, 15:56:13.176+00, 2024-02-10 12:56:13.176+00
 29, 16:56:13.176+00, 2024-02-11 12:56:13.176+00
 30, 17:56:13.176+00, 2024-02-12 12:56:13.176+00
 SELECT id, time_with_time_zone, timestamp_with_time_zone FROM test_schema.test_time_zone WHERE id >= 31 ORDER BY id LIMIT 5, id = 31, limit = 5
 31, 18:56:13.176+00, 2024-02-13 12:56:13.176+00
 32, 19:56:13.176+00, 2024-02-14 12:56:13.176+00
 33, 20:56:13.176+00, 2024-02-15 12:56:13.176+00
 34, 21:56:13.176+00, 2024-02-16 12:56:13.176+00
 35, 22:56:13.176+00, 2024-02-17 12:56:13.176+00
 SELECT id, time_with_time_zone, timestamp_with_time_zone FROM test_schema.test_time_zone WHERE id >= 36 ORDER BY id LIMIT 5, id = 36, limit = 5
 36, 23:56:13.176+00, 2024-02-18 12:56:13.176+00
 37, 00:56:13.176+00, 2024-02-19 12:56:13.176+00
 38, 01:56:13.176+00, 2024-02-20 12:56:13.176+00
 39, 02:56:13.176+00, 2024-02-21 12:56:13.176+00
 40, 03:56:13.176+00, 2024-02-22 12:56:13.176+00
 SELECT id, time_with_time_zone, timestamp_with_time_zone FROM test_schema.test_time_zone WHERE id >= 41 ORDER BY id LIMIT 5, id = 41, limit = 5
 41, 04:56:13.176+00, 2024-02-23 12:56:13.176+00
 42, 05:56:13.176+00, 2024-02-24 12:56:13.176+00
 43, 06:56:13.176+00, 2024-02-25 12:56:13.176+00
 44, 07:56:13.176+00, 2024-02-26 12:56:13.176+00
 45, 08:56:13.176+00, 2024-02-27 12:56:13.176+00
 SELECT id, time_with_time_zone, timestamp_with_time_zone FROM test_schema.test_time_zone WHERE id >= 46 ORDER BY id LIMIT 5, id = 46, limit = 5
 46, 09:56:13.176+00, 2024-02-28 12:56:13.176+00
 47, 10:56:13.176+00, 2024-02-29 12:56:13.176+00
 48, 11:56:13.176+00, 2024-03-01 12:56:13.176+00
 49, 12:56:13.176+00, 2024-03-02 12:56:13.176+00
 50, 13:56:13.176+00, 2024-03-03 12:56:13.176+00id <= 25数据是正确的 time zone =+08:00id > 25数据是也是正确的,但是 time zone =+00:00
排查问题思路
为什么会是 0 时区?
其实这已经是一个查询到中间,已经定位到是 driver 返回返回数据问题时候的场景了。
这个时候剩余的工作就是抓包
| 1 | sudo tcpdump -i any -n -w postgres_traffic.pcap 'port 5432' | 
抓到的包在 postgres_traffic.pcap中扔到 wireshark 中分析,在比较正确与错误数据的时候可以分析出来点问题
可以从上面解析结果得到(16 进制 byte[] 直接 new String() 是 pg driver 中的文本协议解析逻辑)
| 1 | 20 | 
而在错误 timezone 中可以看到要求返回的 result format 格式已经变为 Binary 了
结果也是 binary
如果将这个结果带入 pg driver 的解析逻辑中,会发现 server 传递的实际上是 UTC timestamp,而 0 时区则是我们本地设置的时区
这个时候问题的推断就到了为什么我们本地是 0 时区的问题上来了
通过 PgResultSet#getString中代码可以追下去
时区在 pgDriver 中是在 QueryExecutorImpl#receiveParameterStatus中接收到 param 的 TimeZone时候设置的,默认是以本地 JVM 时区为准,然后 set timezone to '+08'会通过 TimestampUtils.parseBackendTimeZone 设置上
问题就出在 set timezone to '+08'上
在这里面就会发现实际正确的格式应该是 set timezone to 'GMT-08',如果不带 GMT会解析出错,导致最终转换时候成 0 时区数据,而 -08 是 PG 独有的参考 http://www.postgres.cn/docs/14/datetime-posix-timezone-specs.html
时区与实际我们认识到的正好相反,东八区是 -08 时区
为什么不是所有的数据都出错?
在上述抓包过程中我们可以看到,实际上如果是使用文本协议来传输的数据是没有问题的,而使用二进制传输的话数据会存在问题。
问题就在影响二进制协议或文本协议的参数在 pg 中的描述二进制协议是默认开启的
不断的 Debug 之后在协议交换的时候发现了一些端倪在 preferQueryMode 这个参数中
当 preferQueryMode=extended(默认) 的时候,pg driver 会在执行到五次(默认)相同的 prepareStatement 之后尝试缓存 ps,并将 ps 协议换为 binary 协议与 server 端交互传输,这就导致了在执行五次 SQL 之后返回的结果均是 binary 的协议
最后怎么解决?
- set timezone to 'GMT-8'
- 或者可以将协议完全更改为文本协议 binaryTransfer=false
总结
抓包结合 driver 代码是解决多数复杂数据库问题的良方