摘要
关键词
大对象、大对象的插入
插入数据过程中内存上下文、虚拟文件描述符VFD、高速缓存和缓存池管理BufferDesc等模块之间的交互流程。
目标:结合任务 "gdb调试文本类大对象的插入流程" 串联存储相关的各模块交互流程
1 体系结构

2 读写元组的过程

3 gdb调式大对象的插入流程
3.1 工作环境介绍
os : Red Hat 4.8.5-11
gdb : GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-94.el7
pg : 8.4.1
3.2 基本调试流程

# 1, start pg
[z00423965@localhost pgsql]$ pg_ctl start -D ./data
server starting
LOG: database system was shut down at 2019-06-27 00:50:55 CST
LOG: database system is ready to accept connections
LOG: autovacuum launcher started
[z00423965@localhost pgsql]$ ps -ef | grep postg
z004239+ 1049 1 0 00:51 pts/7 00:00:00 /home/z00423965/pgsql/build/bin/postgres -D ./data # postmaster
z004239+ 1051 1049 0 00:51 ? 00:00:00 postgres: writer process # bg_writer
z004239+ 1052 1049 0 00:51 ? 00:00:00 postgres: wal writer process # wal_writer
z004239+ 1053 1049 0 00:51 ? 00:00:00 postgres: autovacuum launcher process # auto_vacuum
z004239+ 1054 1049 0 00:51 ? 00:00:00 postgres: stats collector process # pg_stat
# 2, start backend process for this pg_session
# two ssh terminals needed, said T1, T2
[z00423965@localhost pgsql]$ createdb test #@T1
[z00423965@localhost pgsql]$ psql test #@T1
test=# create table t_introduction(item char(256), introduction oid);
[z00423965@localhost pgsql]$ ps -ef | grep postg #@T2
... ...
z004239+ 1757 1049 0 01:01 ? 00:00:00 postgres: z00423965 test [local] idle # backend process for this pg_session
# 3, start gdb
[z00423965@localhost pgsql]$ gdb #@T2
(gdb) layout src #@T2
(gdb) attach 1757 #@T2
(gdb) b lo_import #@T2
(gdb) c #@T2
test=# insert into t_introduction values('pg', lo_import('/home/z00423965/pgsql/pg_introduction.txt')); #@T1
3.3 各模块的调用过程
3.3.1 大对象
大对象采用线外存储,表中的元组处存储 oid,在query及排序时,仅需要加载较少的数据,提升i/o性能,存储在 pg_largeobject 中,相应的处理函数为 inv_create inv_open inv_write inv_read
(gdb) b lo_import #@T2
Breakpoint 1 at 0x5ac062: file be-fsstubs.c, line 324.
(gdb) c
Continuing.
Breakpoint 1, lo_import (fcinfo=0x7fffd07c1980) at be-fsstubs.c:324
324 text *filename = PG_GETARG_TEXT_PP(0);
(gdb) ... ...
lo_import_internal (filename=0x1aced70, lobjOid=0) at be-fsstubs.c:354
350 LargeObjectDesc *lobj;
361 CreateFSContext(); # 创建 filesystem 内存上下文
367 fd = PathNameOpenFile(fnamebuf, O_RDONLY | PG_BINARY, 0666); # 打开 fd 涉及 vfd 机制
375 // create an inversion object
377 oid = inv_create(lobjOid); # 创建大对象 book. P75
380 // read in from the filesystem and write to the inversion object
382 lobj = inv_open(oid, INV_WRITE, fscxt); # 打开大对象 book. P75
384 while ((nbytes = FileRead(fd, buf, BUFSIZE)) > 0)
386 tmp = inv_write(lobj, buf, nbytes); # 写入大对象 book. P75
396 inv_close(lobj);
397 FileClose(fd);
(gdb) p *lobj
$2 = {id = 17090, snapshot = 0xb348a0 <SnapshotNowData>, subid = 1, offset = 488, flags = 3}
(gdb)
1673 ExecInsert(slot, tupleid, planSlot, dest, estate); # 继续执行 insert
test=# select * from pg_largeobject;
loid | pageno | data
17090 | 0 | xxxxxxxxxxxxxxxxxx
3.3.2 内存上下文 MemoryContext
内存上下文,C++中allocator的概念,每次内存申请都记录到对应的内存上下文中,调用结束后,可以通过释放上下文中的所有内存来避免内存泄漏。

# /*
# * LO "FD"s are indexes into the cookies array.
# * A non-null entry is a pointer to a LargeObjectDesc allocated in the
# * LO private memory context "fscxt". The cookies array itself is also
# * dynamically allocated in that context. Its current allocated size is # 大对象使用私有的内存上下文 fscxt 来分配空间存储大对象的 cookie 索引
# * cookies_len entries, of which any unused entries will be NULL.
# */
(gdb) b lo_import
361 CreateFSContext(); # 创建 filesystem 内存上下文
# static MemoryContext fscxt = NULL;
# #define CreateFSContext() # 目前仅有 AllocSetContext 一种类型的分配器
# fscxt = AllocSetContextCreate(TopMemoryContext, "Filesystem", ...
382 lobj = inv_open(oid, INV_WRITE, fscxt);
236 retval = (LargeObjectDesc *) MemoryContextAlloc(mcxt, sizeof(LargeObjectDesc)); # mcxt=fscxt,在 fscxt 中管理大对象 entry
# TODO 内存上下文的销毁
3.3.3 虚拟文件描述符 VFD机制
虚拟文件描述符,实际能打开的物理句柄数存在上限(ulimit -n 1024),不能满足需求,需要文件操作的都持有虚拟文件描述符,其中记录文件路径、打开模式、偏移量等信息,当有读写操作时,才尝试打开物理文件句柄,如句柄已满,使用 LRU 机制关闭掉最少使用的虚拟文件描述符的物理句柄,并未当前虚拟文件打开物理句柄,并设置偏移到上次操作结束位置。
(gdb) b lo_import
367 fd = PathNameOpenFile(fnamebuf, O_RDONLY | PG_BINARY, 0666);
839 file = AllocateVfd(); # 分配描述符
840 vfdP = &VfdCache[file]; # 插入VfdCache
842 while (nfile + numAllocatedDescs >= max_safe_fds) # 文件描述符不够时
844 if (!ReleaseLruFile()) break; # 淘汰暂不使用的文件
848 vfdP->fd = BasicOpenFile(fileName, fileFlags, fileMode); # 打开物理句柄
856 ++nfile;
860 Insert(file); # 插入 lru 池
384 while ((nbytes = FileRead(fd, buf, BUFSIZE)) > 0)
1108 returnCode = FileAccess(file);
1109 if (returnCode < 0)
1110 return returnCode;
1113 returnCode = read(VfdCache[file].fd, buffer, amount);
1115 if (returnCode >= 0)
1116 VfdCache[file].seekPos += returnCode;
}
3.3.4 高速缓存 SysCache & RelCache
1) 基本概念
- SysCache 数据库访问表时,需要表的模式信息(表的列属性、OID、统计信息等),而表的模式信息存放在系统表(pg_class) 中,因而每次访问普通表都要先访问系统表元组;
- RelCache 存放了最近访问表的模式信息,以 RelationData 形式存储
test=# d
List of relations
Schema | Name | Type | Owner |
--------+----------------+-------+-----------
public | t_introduction | table | z00423965 |
(1 row) ^
test=# select * from pg_class where relname='t_introduction';
- SysCache 数据库的一些功能如函数等操作,记录在系统表中,需要通过系统表调用;
test=# select * from pg_proc where proname='lo_import'; # 上面使用到的 lo_import 功能
- book. P86 每个系统表使用一个CatCache跟踪,共54个组成数组SysCache,CatCache提供最多4个关键字对缓存的元组进行索引,采用 hash 索引,hash 桶中使用双向链表记录hash相同的元素,如下图:

2) gdb 调试
(gdb) b SearchCatCache
(gdb) bt
# 访问 SysCache 获取 Table 信息
#0 SearchCatCache (cache=0x20738c0, v1=34188608, v2=11, v3=0, v4=0) at catcache.c:1118
#1 SearchSysCache (cacheId=37, key1=34188608, key2=11, key3=0, key4=0)
#9 setTargetTable (pstate=0x209b218, relation=0x209ad60, inh=0 '000',
#12 parse_analyze (parseTree=0x209b178,
sourceText=0x209a490 "insert into t_introduction values('pg', lo_import('/home/z00423965/pgsql
/pg_introduction.txt'));", paramTypes=0x0, numParams=0) at analyze.c:91
#14 exec_simple_query (
query_string=0x209a490 "insert into t_introduction values('pg', lo_import('/home/z00423965/pgs
ql/pg_introduction.txt'));") at postgres.c:917
#15 PostgresMain (argc=4, argv=0x1ffa918, username=0x1ffa8e0 "z00423965")
(gdb) n ...
# SearchCatCache 过程
# // initialize the search key information
# 1128 memcpy(cur_skey, cache->cc_skey, sizeof(cur_skey));
# // find the hash bucket in which to look for the tuple
# 1137 hashValue = CatalogCacheComputeHashValue(cache, cache->cc_nkeys, cur_skey);
# 1138 hashIndex = HASH_INDEX(hashValue, cache->cc_nbuckets);
# // scan the hash bucket until we find a match or exhaust our tuples
# // Tuple was not found in cache, so we have to try to retrieve it directly
# // from the relation. If found, we will add it to the cache; if not
# // found, we will add a negative cache entry instead.
(gdb) c
# 访问 SysCache 获取 function 信息
#0 SearchCatCache (cache=0x2088270, v1=705, v2=0, v3=0, v4=0) at catcache.c:1118
#4 0x0000000000503c16 in transformFuncCall (pstate=0x209b218, fn=0x209afa8) at parse_expr.c:1106
#5 0x0000000000501ebe in transformExpr (pstate=0x209b218, expr=0x209afa8) at parse_expr.c:228
#9 0x00000000004d9f1e in parse_analyze (parseTree=0x209b178,
sourceText=0x209a490 "insert into t_introduction values('pg', lo_import('/home/z00423965/pgsql
/pg_introduction.txt'));", paramTypes=0x0, numParams=0) at analyze.c:91
#11 0x0000000000664c3a in exec_simple_query (
query_string=0x209a490 "insert into t_introduction values('pg', lo_import('/home/z00423965/pgs
ql/pg_introduction.txt'));") at postgres.c:917
#12 0x0000000000668abd in PostgresMain (argc=4, argv=0x1ffa918, username=0x1ffa8e0 "z00423965")
at postgres.c:3614
# 访问 RelCache 获取 relation 信息
(gdb) hash_search
#0 hash_search (hashp=0x20228f0, keyPtr=0x7ffe3250038c, action=HASH_FIND, foundPtr=0x0)
#1 0x000000000073efa1 in RelationIdGetRelation (relationId=17098) at relcache.c:1524
#2 0x0000000000467cb6 in relation_open (relationId=17098, lockmode=3) at heapam.c:898
#6 0x00000000004fe4d7 in setTargetTable (pstate=0x209b218, relation=0x209ad60, inh=0 '000',
alsoSource=0 '000', requiredPerms=1) at parse_clause.c:178
#7 0x00000000004da500 in transformInsertStmt (pstate=0x209b218, stmt=0x209b178) at analyze.c:387
#9 0x00000000004d9f1e in parse_analyze (parseTree=0x209b178,
sourceText=0x209a490 "insert into t_introduction values('pg', lo_import('/home/z00423965/pgsql
/pg_introduction.txt'));", paramTypes=0x0, numParams=0) at analyze.c:91
#11 0x0000000000664c3a in exec_simple_query (
query_string=0x209a490 "insert into t_introduction values('pg', lo_import('/home/z00423965/pgs
ql/pg_introduction.txt'));") at postgres.c:917
3.3.5 缓存池管理 BufferDesc
(gdb) ReadBuffer
########### 6, 加载到缓冲区
#0 ReadBuffer (reln=0x7f8379fb9fa0, blockNum=0) at bufmgr.c:180
########### 5, B-Tree 索引查找并加载表文件
#4 0x000000000047e5e3 in _bt_first (scan=0x209acb0, dir=ForwardScanDirection) at nbtsearch.c:837
########### 4, 未找到,加载表文件继续查询
#5 0x000000000047c5f5 in btgettuple (fcinfo=0x7ffe324ffc40) at nbtree.c:281
########### 3, 查找高速缓存
#10 0x000000000074367a in SearchSysCache (cacheId=29, key1=140202663657656, key2=0, key3=0,
key4=0) at syscache.c:848
########### 2, 分析查询
#21 0x00000000004d9f1e in parse_analyze (parseTree=0x209b178,
sourceText=0x209a490 "insert into t_introduction values('pg', lo_import('/home/z00423965/pgsql
/pg_introduction.txt'));", paramTypes=0x0, numParams=0) at analyze.c:91
########### 1, 执行查询
#23 0x0000000000664c3a in exec_simple_query (
query_string=0x209a490 "insert into t_introduction values('pg', lo_import('/home/z00423965/pgs
ql/pg_introduction.txt'));") at postgres.c:917
(gdb) ReadBuffer
3.3.6 其他
暂未涉及到的概念 ( FSM, VM & IPC )
参考文献
本文详细介绍了在PostgreSQL中,如何利用gdb调试大对象(LOB)的插入过程,包括大对象的存储结构、内存上下文MemoryContext、虚拟文件描述符VFD机制、高速缓存SysCache与RelCache的交互。通过具体的调试步骤,揭示了从创建大对象到写入数据时,系统内部各个模块如何协同工作的细节。

1662

被折叠的 条评论
为什么被折叠?



