技术分享 | 一次新上的SQL引发的CPU飙升(Oracle)

开发板推荐:天空星STM32F407VET6开发板

超高性价比 STM32主控 | 超高主频 | 一板兼容百芯 | 比赛神器 | 沉金彩色丝印

本文为墨天轮数据库管理服务团队第133期技术分享,内容原创,作者为技术顾问李宁,如需转载请联系小墨(VX:modb666)并注明来源。如需查看更多文章可关注【墨天轮】公众号。

版本架构:Oracle 19c 双节点RAC+双节点DG

摘要:记录一次TOP SQL由于统计信息缺失引发的AAS和CPU突刺,优化器只能基于统计信息来判断最优执行路径。如果统计信息缺失或过期,系统随时可能出现严重性能问题。

一、问题背景

2024年11月29号早上,生产营业系统突然收到CPU占用率超过阈值的告警。通过监控发现,数据库侧的 AAS(Active Session Average)飙升到95,有一条新上线的SQL消耗异常,导致CPU使用率直线拉升。

图片

二、问题定位

1、Top SQL定位

从监控(上面图片)看,左面有一条TOP SQL占比31%,大概率是元凶,继续分析分线,该SQL第一次出现时间在 07:15:29,说明是新上线的SQL,从监控中快速定位:

SQL_ID        4cwfmj3fkzrh8
PLAN_HASH_VALUE   2962550836
FIRST_LOAD_TIME   2024-11-29/07:15:29
LAST_LOAD_TIME    2024-11-29 10:35:25

-- sql信息
SELECT SUBSID, CITY, POORDERNUMBER, BIZTYPE, GOODSID, ORDERID,
CUSTOMERPHONE, SERVICECODE,  SERVICESTATUS, RECEIVETIME, EFFTIME,
EXPTIME, PROVINCERELATIONID, SERVICECODETYPE, OPPOORDERSOURCE,
INFORMSOURCE, INTIME, DATASOURCE, ORDERNUMBER, APICODE, USERTYPE,
SELLERMEMO FROM OJ_RES_GOOD_SYN
 WHERE SUBSID = :1
   AND EXPTIME IS NOT NULL
   AND EXPTIME >= SYSDATE
   AND EXPTIME <= :2
   AND CITY = 1117

–查看sql的第一次载入时间
select FIRST\_LOAD\_TIME from v$sql where sql\_id=&1;

2、分析原因

在监控(上面图片)右侧显示的实时的等待事件,此时占比最高的是 ON CPU / runqueue,这个等待事件不是select distinct WAIT\_CLASS from v$event\_name;13个类别里面的,一般可以通过V$session status='ACTIVE' and state!='WAITING'、或者V$ASH中event列为NULL判断,也是说如果没有等待事件此时SQL正在使用或等待CPU。到这里可以断定问题大概率发生在sql执行流程中的执行阶段,可能占用CPU较高的是sql的排序操作、复杂算子的计算、函数调用等

接着分析SQL的执行计划

SQL_ID  4cwfmj3fkzrh8, child number 0
-------------------------------------
SELECT SUBSID, CITY, POORDERNUMBER, BIZTYPE, GOODSID, ORDERID,
CUSTOMERPHONE, SERVICECODE,  SERVICESTATUS, RECEIVETIME, EFFTIME,
EXPTIME, PROVINCERELATIONID, SERVICECODETYPE, OPPOORDERSOURCE,
INFORMSOURCE, INTIME, DATASOURCE, ORDERNUMBER, APICODE, USERTYPE,
SELLERMEMO FROM OJ_RES_GOOD_SYN  WHERE SUBSID = :1  AND
EXPTIME IS NOT NULL AND EXPTIME >= SYSDATE AND EXPTIME <= :2  AND
CITY = 1117

Plan hash value: 881293612

------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name                      | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                            |                           |        |       |  1470 (100)|          |       |       |
|*  1 |  FILTER                                     |                           |        |       |            |          |       |       |
|*  2 |   TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED| OJ_RES_GOOD_SYN                     |     15 | 13680 |  1470   (0)| 00:00:01 | ROWID | ROWID |
|*  3 |    INDEX RANGE SCAN                         | IDX_RIGHTGOOD_SERVICEGOOD |    239K|       |    28   (0)| 00:00:01 |       |       |
------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(:2>=SYSDATE@!)
   2 - filter(("EXPTIME"<=:2 AND "SUBSID"=TO_NUMBER(:1) AND "EXPTIME">=SYSDATE@!))
   3 - access("CITY"=1117)

索引信息

SYS@PANDAe1>@ind %.OJ_RES_GOOD_SYN
Display indexes where table or index name matches %%.OJ_RES_GOOD_SYN%...


TABLE_OWNER          TABLE_NAME                     INDEX_NAME                     POS# COLUMN_NAME                    DSC
-------------------- ------------------------------ ------------------------------ ---- ------------------------------ ----
PANDA                 OJ_RES_GOOD_SYN                          IDX_CS_REC_SUBS_RIGHT_SUBSID      1 SUBSID        <<<< 正确的
                                                    IDX_RIGHTGOOD_CUSTOMERPHONE       1 CUSTOMERPHONE
                                                    IDX_RIGHTGOOD_SERVICEGOOD         1 CITY        <<<< 错误的
                                                                                      2 GOODSID
                                                                                      3 SERVICECODE


INDEX_OWNER          TABLE_NAME                     INDEX_NAME                     IDXTYPE    UNIQ STATUS   PART TEMP  H     LFBLKS           NDK   NUM_ROWS       CLUF LAST_ANALYZED       DEGREE VISIBILIT
-------------------- ------------------------------ ------------------------------ ---------- ---- -------- ---- ---- -- ---------- ------------- ---------- ---------- ------------------- ------ ---------
PANDA                OJ_RES_GOOD_SYN                          IDX_CS_REC_SUBS_RIGHT_SUBSID               NORMAL     NO   N/A      YES  N     3     129071       7690240   34905836   34226961 2024-11-29 10:35:23 1      VISIBLE  <<<<<<<<<刚才这个索引是没有统计信息的
                     OJ_RES_GOOD_SYN                          IDX_RIGHTGOOD_CUSTOMERPHONE                NORMAL     NO   N/A      YES  N     3     148744       6300672   34905829   34425431 2024-11-29 10:35:01 1      VISIBLE
                     OJ_RES_GOOD_SYN

执行计划使用了索引 IDX\_RIGHTGOOD\_SERVICEGOOD走了了INDEX RANGE SCAN并且谓词中只通过CITY字段进行了过滤,这个字段是一个分区键,选择性很差。在where条件后面有一个SUBSID选择新很好并且存在索引IDX\_CS\_REC\_SUBS\_RIGHT\_SUBSID

select * from (select CITY,count(*) from PANDA.OJ_RES_GOOD_SYN group by CITY order by count(*) desc)where rownum<=10;
    CITY   COUNT(*)
---------- ----------
       1116   24489838
       1117   11353009
       1118       7236

3 rows selected.

select * from (select SUBSID,count(*) from PANDA.OJ_RES_GOOD_SYN group by SUBSID order by count(*) desc)where rownum<=10;

    SUBSID   COUNT(*)
---------- ----------
3.1639E+12       2944
3.1641E+12       2855
3.1639E+12       2476
3.1639E+12       2204
3.1421E+12       2194
3.1640E+12       2143
3.1621E+12       2099
3.1641E+12       2015
3.1640E+12       1939
3.1640E+12       1909

10 rows selected.

3、检查统计信息

执行计划异常,有好的索引不走,检查涉及表的统计信息,发现统计信息过期 (STALE\_STATS=YES/NULL忘记了),这意味着优化器可能生成了不合理的执行计划。

SELECT owner, table_name, partition_name, stale_stats, last_analyzed
FROM   dba_tab_statistics
WHERE  table_name   = UPPER('OJ_RES_GOOD_SYN')
  AND  owner        = 'PANDA';

OWNER      TABLE_NAME           PARTITION_NAME       STALE LAST_ANALYZED
---------- -------------------- -------------------- ----- ----------------
OWNER       TABLE_NAME                       PARTITION_NAME                  29-AUG-20

三、处理过程

之前OJ\_RES\_GOOD\_SYN没有统计信息,优化器无法正确评估成本。收集表统计信息

exec DBMS_STATS.GATHER_TABLE_STATS (
    ownname => 'PANDA', 
    tabname => 'OJ_RES_GOOD_SYN',
    cascade => true,
    estimate_percent => dbms_stats.auto_sample_size,
    method_opt => 'FOR TABLE FOR ALL COLUMNS SIZE REPEAT',
    degree => 8,
    no_invalidate => false
);

收集后,执行计划重新生成,正确走上IDX\_CS\_REC\_SUBS\_RIGHT\_SUBSID索引,也走了CITY的分区裁剪

SQL_ID  4cwfmj3fkzrh8, child number 3
-------------------------------------
SELECT SUBSID, CITY, POORDERNUMBER, BIZTYPE, GOODSID, ORDERID, 
CUSTOMERPHONE, SERVICECODE,  SERVICESTATUS, RECEIVETIME, EFFTIME, 
EXPTIME, PROVINCERELATIONID, SERVICECODETYPE, OPPOORDERSOURCE,  
INFORMSOURCE, INTIME, DATASOURCE, ORDERNUMBER, APICODE, USERTYPE, 
SELLERMEMO FROM OJ_RES_GOOD_SYN  WHERE SUBSID = :1  AND 
EXPTIME IS NOT NULL AND EXPTIME >= SYSDATE AND EXPTIME <= :2  AND 
CITY = 1117

Plan hash value: 2962550836

--------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name                         | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                            |                              |       |       |   134 (100)|          |       |       |
|*  1 |  FILTER                                     |                              |       |       |            |          |       |       |
|   2 |   PARTITION RANGE ITERATOR                  |                              |     1 |   912 |   134   (0)| 00:00:01 |   131 |   196 |
|*  3 |    TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| OJ_RES_GOOD_SYN                        |     1 |   912 |   134   (0)| 00:00:01 |   131 |   196 |
|*  4 |     INDEX RANGE SCAN                        | IDX_CS_REC_SUBS_RIGHT_SUBSID |    32 |       |   133   (0)| 00:00:01 |   131 |   196 |
--------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(:2>=SYSDATE@!)
   3 - filter(("CITY"=1117 AND "EXPTIME"<=:2 AND "EXPTIME">=SYSDATE@!))
   4 - access("SUBSID"=TO_NUMBER(:1))

四、效果验证

统计信息收集后,执行计划恢复正确,CPU瞬间回落,系统恢复正常。

SQL性能指标对比:

计划LIO/Exec(块次)CPU/Exec(secs)Elapsed/Exec(secs)
坏计划 8812936122,112,709.9838.8443.73
好计划 2962550836173.730.000.01
                                                                                              Avg                 Avg
Snapshot                                          Avg LIO             Avg PIO          CPU (secs)      Elapsed (secs)
Time         INSTANCE_NUMBER    Execs            Per Exec            Per Exec            Per Exec            Per Exec
------------ --------------- -------- ------------------- ------------------- ------------------- -------------------
29-NOV 07:30               1        6        2,465,321.50          108,749.83               18.59               49.14
29-NOV 08:00               1      257        8,457,635.70            1,496.14               34.31               36.93
29-NOV 08:30               1      814                0.00                0.00               36.69               36.74
29-NOV 09:00               1    1,329        2,255,022.27                0.00               39.74               39.84
29-NOV 09:30               1    1,706                0.00                0.00               42.84               42.99
29-NOV 10:00               1    1,757        1,434,805.84                0.00               45.29               45.58
29-NOV 10:30               1      102          176,184.53                7.97               54.41               54.88
29-NOV 10:30               1    1,669              173.73                5.38                0.00                0.01
                             -------- ------------------- ------------------- ------------------- -------------------
avg                                          1,848,642.95           13,782.42               33.98               38.26
sum                             7,640

                                                             Per-Plan Execution Statistics Over Time
                                                                                                                             Avg                 Avg
      Plan Snapshot                                         Avg Rows             Avg LIO             Avg PIO          CPU (secs)      Elapsed (secs)
Hash Value Time         INSTANCE_NUMBER    Execs            Per Exec            Per Exec            Per Exec            Per Exec            Per Exec
---------- ------------ --------------- -------- ------------------- ------------------- ------------------- ------------------- -------------------
 881293612 29-NOV 07:30               1        6                0.00        2,465,321.50          108,749.83               18.59               49.14
           29-NOV 08:00               1      257                0.00        8,457,635.70            1,496.14               34.31               36.93
           29-NOV 08:30               1      814                0.01                0.00                0.00               36.69               36.74
           29-NOV 09:00               1    1,329                0.02        2,255,022.27                0.00               39.74               39.84
           29-NOV 09:30               1    1,706                0.01                0.00                0.00               42.84               42.99
           29-NOV 10:00               1    1,757                0.01        1,434,805.84                0.00               45.29               45.58
           29-NOV 10:30               1      102                0.02          176,184.53                7.97               54.41               54.88
**********                              -------- ------------------- ------------------- ------------------- ------------------- -------------------
avg                                                             0.01        2,112,709.98           15,750.56               38.84               43.73
sum                                        5,971

2962550836 29-NOV 10:30               1    1,669                0.01              173.73                5.38                0.00                0.01
**********                              -------- ------------------- ------------------- ------------------- ------------------- -------------------
avg                                                             0.01              173.73                5.38                0.00                0.01
sum 

墨天轮从乐知乐享的数据库技术社区蓄势出发,全面升级,提供多类型数据库管理服务。墨天轮数据库管理服务旨在为用户构建信赖可托付的数据库环境,并为数据库厂商提供中立的生态支持。
墨天轮数据库服务官网:https://www.modb.pro/service

开发板推荐:天空星STM32F407VET6开发板

超高性价比 STM32主控 | 超高主频 | 一板兼容百芯 | 比赛神器 | 沉金彩色丝印

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值