pg 事务 存储过程_PostgreSQL源码剖析_插入大对象时各存储相关模块交互流程

本文详细介绍了在PostgreSQL中,如何利用gdb调试大对象(LOB)的插入过程,包括大对象的存储结构、内存上下文MemoryContext、虚拟文件描述符VFD机制、高速缓存SysCache与RelCache的交互。通过具体的调试步骤,揭示了从创建大对象到写入数据时,系统内部各个模块如何协同工作的细节。

摘要

关键词

大对象、大对象的插入

插入数据过程中内存上下文、虚拟文件描述符VFD、高速缓存和缓存池管理BufferDesc等模块之间的交互流程。

目标:结合任务 "gdb调试文本类大对象的插入流程" 串联存储相关的各模块交互流程

1 体系结构

f790badddd1b9b14a2dfc6f1b46cb5dd.png

2 读写元组的过程

5759f8d80e411f433504858fd0b11480.png

3 gdb调式大对象的插入流程

3.1 工作环境介绍

os : Red Hat 4.8.5-11

gdb : GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-94.el7

pg : 8.4.1

3.2 基本调试流程

a59d36ec5ff771c71e3d2799654ae0ca.png

# 1, start pg

[z00423965@localhost pgsql]$ pg_ctl start -D ./data

server starting

LOG: database system was shut down at 2019-06-27 00:50:55 CST

LOG: database system is ready to accept connections

LOG: autovacuum launcher started

[z00423965@localhost pgsql]$ ps -ef | grep postg

z004239+ 1049 1 0 00:51 pts/7 00:00:00 /home/z00423965/pgsql/build/bin/postgres -D ./data # postmaster

z004239+ 1051 1049 0 00:51 ? 00:00:00 postgres: writer process # bg_writer

z004239+ 1052 1049 0 00:51 ? 00:00:00 postgres: wal writer process # wal_writer

z004239+ 1053 1049 0 00:51 ? 00:00:00 postgres: autovacuum launcher process # auto_vacuum

z004239+ 1054 1049 0 00:51 ? 00:00:00 postgres: stats collector process # pg_stat

# 2, start backend process for this pg_session

# two ssh terminals needed, said T1, T2

[z00423965@localhost pgsql]$ createdb test #@T1

[z00423965@localhost pgsql]$ psql test #@T1

test=# create table t_introduction(item char(256), introduction oid);

[z00423965@localhost pgsql]$ ps -ef | grep postg #@T2

... ...

z004239+ 1757 1049 0 01:01 ? 00:00:00 postgres: z00423965 test [local] idle # backend process for this pg_session

# 3, start gdb

[z00423965@localhost pgsql]$ gdb #@T2

(gdb) layout src #@T2

(gdb) attach 1757 #@T2

(gdb) b lo_import #@T2

(gdb) c #@T2

test=# insert into t_introduction values('pg', lo_import('/home/z00423965/pgsql/pg_introduction.txt')); #@T1

3.3 各模块的调用过程

3.3.1 大对象

大对象采用线外存储,表中的元组处存储 oid,在query及排序时,仅需要加载较少的数据,提升i/o性能,存储在 pg_largeobject 中,相应的处理函数为 inv_create inv_open inv_write inv_read

(gdb) b lo_import #@T2

Breakpoint 1 at 0x5ac062: file be-fsstubs.c, line 324.

(gdb) c

Continuing.

Breakpoint 1, lo_import (fcinfo=0x7fffd07c1980) at be-fsstubs.c:324

324 text *filename = PG_GETARG_TEXT_PP(0);

(gdb) ... ...

lo_import_internal (filename=0x1aced70, lobjOid=0) at be-fsstubs.c:354

350 LargeObjectDesc *lobj;

361 CreateFSContext(); # 创建 filesystem 内存上下文

367 fd = PathNameOpenFile(fnamebuf, O_RDONLY | PG_BINARY, 0666); # 打开 fd 涉及 vfd 机制

375 // create an inversion object

377 oid = inv_create(lobjOid); # 创建大对象 book. P75

380 // read in from the filesystem and write to the inversion object

382 lobj = inv_open(oid, INV_WRITE, fscxt); # 打开大对象 book. P75

384 while ((nbytes = FileRead(fd, buf, BUFSIZE)) > 0)

386 tmp = inv_write(lobj, buf, nbytes); # 写入大对象 book. P75

396 inv_close(lobj);

397 FileClose(fd);

(gdb) p *lobj

$2 = {id = 17090, snapshot = 0xb348a0 <SnapshotNowData>, subid = 1, offset = 488, flags = 3}

(gdb)

1673 ExecInsert(slot, tupleid, planSlot, dest, estate); # 继续执行 insert

test=# select * from pg_largeobject;

loid | pageno | data

17090 | 0 | xxxxxxxxxxxxxxxxxx

3.3.2 内存上下文 MemoryContext

内存上下文,C++中allocator的概念,每次内存申请都记录到对应的内存上下文中,调用结束后,可以通过释放上下文中的所有内存来避免内存泄漏。

c3fa23cb5589b0a563288b63630935fa.png

# /*

# * LO "FD"s are indexes into the cookies array.

# * A non-null entry is a pointer to a LargeObjectDesc allocated in the

# * LO private memory context "fscxt". The cookies array itself is also

# * dynamically allocated in that context. Its current allocated size is # 大对象使用私有的内存上下文 fscxt 来分配空间存储大对象的 cookie 索引

# * cookies_len entries, of which any unused entries will be NULL.

# */

(gdb) b lo_import

361 CreateFSContext(); # 创建 filesystem 内存上下文

# static MemoryContext fscxt = NULL;

# #define CreateFSContext() # 目前仅有 AllocSetContext 一种类型的分配器

# fscxt = AllocSetContextCreate(TopMemoryContext, "Filesystem", ...

382 lobj = inv_open(oid, INV_WRITE, fscxt);

236 retval = (LargeObjectDesc *) MemoryContextAlloc(mcxt, sizeof(LargeObjectDesc)); # mcxt=fscxt,在 fscxt 中管理大对象 entry

# TODO 内存上下文的销毁

3.3.3 虚拟文件描述符 VFD机制

虚拟文件描述符,实际能打开的物理句柄数存在上限(ulimit -n 1024),不能满足需求,需要文件操作的都持有虚拟文件描述符,其中记录文件路径、打开模式、偏移量等信息,当有读写操作时,才尝试打开物理文件句柄,如句柄已满,使用 LRU 机制关闭掉最少使用的虚拟文件描述符的物理句柄,并未当前虚拟文件打开物理句柄,并设置偏移到上次操作结束位置。

(gdb) b lo_import

367 fd = PathNameOpenFile(fnamebuf, O_RDONLY | PG_BINARY, 0666);

839 file = AllocateVfd(); # 分配描述符

840 vfdP = &VfdCache[file]; # 插入VfdCache

842 while (nfile + numAllocatedDescs >= max_safe_fds) # 文件描述符不够时

844 if (!ReleaseLruFile()) break; # 淘汰暂不使用的文件

848 vfdP->fd = BasicOpenFile(fileName, fileFlags, fileMode); # 打开物理句柄

856 ++nfile;

860 Insert(file); # 插入 lru 池

384 while ((nbytes = FileRead(fd, buf, BUFSIZE)) > 0)

1108 returnCode = FileAccess(file);

1109 if (returnCode < 0)

1110 return returnCode;

1113 returnCode = read(VfdCache[file].fd, buffer, amount);

1115 if (returnCode >= 0)

1116 VfdCache[file].seekPos += returnCode;

}

3.3.4 高速缓存 SysCache & RelCache

1) 基本概念

  • SysCache 数据库访问表时,需要表的模式信息(表的列属性、OID、统计信息等),而表的模式信息存放在系统表(pg_class) 中,因而每次访问普通表都要先访问系统表元组;
  • RelCache 存放了最近访问表的模式信息,以 RelationData 形式存储

test=# d

List of relations

Schema | Name | Type | Owner |

--------+----------------+-------+-----------

public | t_introduction | table | z00423965 |

(1 row) ^

test=# select * from pg_class where relname='t_introduction';

  • SysCache 数据库的一些功能如函数等操作,记录在系统表中,需要通过系统表调用;

test=# select * from pg_proc where proname='lo_import'; # 上面使用到的 lo_import 功能

  • book. P86 每个系统表使用一个CatCache跟踪,共54个组成数组SysCache,CatCache提供最多4个关键字对缓存的元组进行索引,采用 hash 索引,hash 桶中使用双向链表记录hash相同的元素,如下图:

0c3efe744c99677327c86215d9e56fae.png

2) gdb 调试

(gdb) b SearchCatCache

(gdb) bt

# 访问 SysCache 获取 Table 信息

#0 SearchCatCache (cache=0x20738c0, v1=34188608, v2=11, v3=0, v4=0) at catcache.c:1118

#1 SearchSysCache (cacheId=37, key1=34188608, key2=11, key3=0, key4=0)

#9 setTargetTable (pstate=0x209b218, relation=0x209ad60, inh=0 '000',

#12 parse_analyze (parseTree=0x209b178,

sourceText=0x209a490 "insert into t_introduction values('pg', lo_import('/home/z00423965/pgsql

/pg_introduction.txt'));", paramTypes=0x0, numParams=0) at analyze.c:91

#14 exec_simple_query (

query_string=0x209a490 "insert into t_introduction values('pg', lo_import('/home/z00423965/pgs

ql/pg_introduction.txt'));") at postgres.c:917

#15 PostgresMain (argc=4, argv=0x1ffa918, username=0x1ffa8e0 "z00423965")

(gdb) n ...

# SearchCatCache 过程

# // initialize the search key information

# 1128 memcpy(cur_skey, cache->cc_skey, sizeof(cur_skey));

# // find the hash bucket in which to look for the tuple

# 1137 hashValue = CatalogCacheComputeHashValue(cache, cache->cc_nkeys, cur_skey);

# 1138 hashIndex = HASH_INDEX(hashValue, cache->cc_nbuckets);

# // scan the hash bucket until we find a match or exhaust our tuples

# // Tuple was not found in cache, so we have to try to retrieve it directly

# // from the relation. If found, we will add it to the cache; if not

# // found, we will add a negative cache entry instead.

(gdb) c

# 访问 SysCache 获取 function 信息

#0 SearchCatCache (cache=0x2088270, v1=705, v2=0, v3=0, v4=0) at catcache.c:1118

#4 0x0000000000503c16 in transformFuncCall (pstate=0x209b218, fn=0x209afa8) at parse_expr.c:1106

#5 0x0000000000501ebe in transformExpr (pstate=0x209b218, expr=0x209afa8) at parse_expr.c:228

#9 0x00000000004d9f1e in parse_analyze (parseTree=0x209b178,

sourceText=0x209a490 "insert into t_introduction values('pg', lo_import('/home/z00423965/pgsql

/pg_introduction.txt'));", paramTypes=0x0, numParams=0) at analyze.c:91

#11 0x0000000000664c3a in exec_simple_query (

query_string=0x209a490 "insert into t_introduction values('pg', lo_import('/home/z00423965/pgs

ql/pg_introduction.txt'));") at postgres.c:917

#12 0x0000000000668abd in PostgresMain (argc=4, argv=0x1ffa918, username=0x1ffa8e0 "z00423965")

at postgres.c:3614

# 访问 RelCache 获取 relation 信息

(gdb) hash_search

#0 hash_search (hashp=0x20228f0, keyPtr=0x7ffe3250038c, action=HASH_FIND, foundPtr=0x0)

#1 0x000000000073efa1 in RelationIdGetRelation (relationId=17098) at relcache.c:1524

#2 0x0000000000467cb6 in relation_open (relationId=17098, lockmode=3) at heapam.c:898

#6 0x00000000004fe4d7 in setTargetTable (pstate=0x209b218, relation=0x209ad60, inh=0 '000',

alsoSource=0 '000', requiredPerms=1) at parse_clause.c:178

#7 0x00000000004da500 in transformInsertStmt (pstate=0x209b218, stmt=0x209b178) at analyze.c:387

#9 0x00000000004d9f1e in parse_analyze (parseTree=0x209b178,

sourceText=0x209a490 "insert into t_introduction values('pg', lo_import('/home/z00423965/pgsql

/pg_introduction.txt'));", paramTypes=0x0, numParams=0) at analyze.c:91

#11 0x0000000000664c3a in exec_simple_query (

query_string=0x209a490 "insert into t_introduction values('pg', lo_import('/home/z00423965/pgs

ql/pg_introduction.txt'));") at postgres.c:917

3.3.5 缓存池管理 BufferDesc

(gdb) ReadBuffer

########### 6, 加载到缓冲区

#0 ReadBuffer (reln=0x7f8379fb9fa0, blockNum=0) at bufmgr.c:180

########### 5, B-Tree 索引查找并加载表文件

#4 0x000000000047e5e3 in _bt_first (scan=0x209acb0, dir=ForwardScanDirection) at nbtsearch.c:837

########### 4, 未找到,加载表文件继续查询

#5 0x000000000047c5f5 in btgettuple (fcinfo=0x7ffe324ffc40) at nbtree.c:281

########### 3, 查找高速缓存

#10 0x000000000074367a in SearchSysCache (cacheId=29, key1=140202663657656, key2=0, key3=0,

key4=0) at syscache.c:848

########### 2, 分析查询

#21 0x00000000004d9f1e in parse_analyze (parseTree=0x209b178,

sourceText=0x209a490 "insert into t_introduction values('pg', lo_import('/home/z00423965/pgsql

/pg_introduction.txt'));", paramTypes=0x0, numParams=0) at analyze.c:91

########### 1, 执行查询

#23 0x0000000000664c3a in exec_simple_query (

query_string=0x209a490 "insert into t_introduction values('pg', lo_import('/home/z00423965/pgs

ql/pg_introduction.txt'));") at postgres.c:917

(gdb) ReadBuffer

3.3.6 其他

暂未涉及到的概念 ( FSM, VM & IPC )

参考文献

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值