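1.1 Instance stuck in CLEANING state
1.1.1 Symptoms
While upgrading Oracle 11.2.0.3.0 to 11.2.0.3.6, applying the PSU on node 1 went smoothly and reported no errors. However, at the final step, when CRS was started, the instance on node 1 failed to start.
1.1.2 Cluster status
The instance on node 1 was in the UNKNOWN state, and srvctl could neither start nor stop it.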
RAC02:/home/grid> crsctl stat res -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ASM_ARCH.dg
ONLINE ONLINE rac01
ONLINE ONLINE rac02
ora.ASM_CRS.dg
ONLINE ONLINE rac01
ONLINE ONLINE rac02
ora.ASM_DATA01.dg
ONLINE ONLINE rac01
ONLINE ONLINE rac02
ora.ASM_DATA02.dg
ONLINE ONLINE rac01
ONLINE ONLINE rac02
ora.ASM_DATA03.dg
ONLINE ONLINE rac01
ONLINE ONLINE rac02
ora.LISTENER.lsnr
ONLINE ONLINE rac01
ONLINE ONLINE rac02
ora.asm
ONLINE ONLINE rac01 Started
ONLINE ONLINE rac02 Started
ora.gsd
OFFLINE OFFLINE rac01
OFFLINE OFFLINE rac02
ora.net1.network
ONLINE ONLINE rac01
ONLINE ONLINE rac02
ora.ons
ONLINE ONLINE rac01
ONLINE ONLINE rac02
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE rac01
ora.cvu
1 ONLINE ONLINE rac01
ora.racdb.db
1 ONLINE UNKNOWN rac01
2 ONLINE ONLINE rac02 Open
ora.rac01.vip
1 ONLINE ONLINE rac01
ora.rac02.vip
1 ONLINE ONLINE rac02
ora.oc4j
1 ONLINE ONLINE rac01
ora.scan1.vip
1 ONLINE ONLINE rac01
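Spotting the one bad line in a long crsctl listing is easy to get wrong by eye. The following is a minimal sketch, not part of the original procedure, that filters `crsctl stat res -t` style text and prints only the entries whose STATE differs from their TARGET. The function name `find_mismatched_resources` is hypothetical, and the parsing assumes the column layout shown above.

```shell
# Print "resource target state" for every entry whose STATE differs from TARGET.
# Reads `crsctl stat res -t` style text on stdin (layout assumed as shown above).
find_mismatched_resources() {
    awk '
        /^-+$/ { next }                              # separator lines
        $1 == "NAME" { next }                        # column header
        $1 == "Local" || $1 == "Cluster" { next }    # section titles
        $1 ~ /^ora\./ { res = $1; next }             # resource name line
        NF >= 2 {
            # Cluster resources carry a leading instance number, local ones do not.
            if ($1 ~ /^[0-9]+$/) { target = $2; state = $3 }
            else                 { target = $1; state = $2 }
            if (target != state) print res, target, state
        }
    '
}

# Possible usage on a cluster node:
#   crsctl stat res -t | find_mismatched_resources
```

Fed the listing above, this sketch would report only the `ora.racdb.db` entry (TARGET ONLINE, STATE UNKNOWN).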
1.1.3 Stopping the instance manually
An attempt to stop the node 1 instance manually left it stuck in the CLEANING state; it would not stop normally:
RAC01:/home/grid> crsctl stat res -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ASM_ARCH.dg
ONLINE ONLINE rac01
ONLINE ONLINE rac02
ora.ASM_CRS.dg
ONLINE ONLINE rac01
ONLINE ONLINE rac02
ora.ASM_DATA01.dg
ONLINE ONLINE rac01
ONLINE ONLINE rac02
ora.ASM_DATA02.dg
ONLINE ONLINE rac01
ONLINE ONLINE rac02
ora.ASM_DATA03.dg
ONLINE ONLINE rac01
ONLINE ONLINE rac02
ora.LISTENER.lsnr
ONLINE ONLINE rac01
ONLINE ONLINE rac02
ora.asm
ONLINE ONLINE rac01 Started
ONLINE ONLINE rac02 Started
ora.gsd
OFFLINE OFFLINE rac01
OFFLINE OFFLINE rac02
ora.net1.network
ONLINE ONLINE rac01
ONLINE ONLINE rac02
ora.ons
ONLINE ONLINE rac01
ONLINE ONLINE rac02
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE rac02
ora.cvu
1 ONLINE ONLINE rac02
ora.racdb.db
1 ONLINE OFFLINE CLEANING
2 ONLINE ONLINE rac02 Open
ora.rac01.vip
1 ONLINE ONLINE rac01
ora.rac02.vip
1 ONLINE ONLINE rac02
ora.oc4j
1 ONLINE ONLINE rac01
ora.scan1.vip
1 ONLINE ONLINE rac02
1.1.4 Checking the logs
Check crsd.log:
2013-11-27 13:22:43.287: [ A**][41] {1:37211:819} Starting the agent: /oracle/grid/11.2.0/grid_1/bin/oraagent with user id: oracle and incarnation:15
2013-11-27 13:22:43.287: [UiServer][47] {1:37211:819} Container [ Name: ORDER
MESSAGE:
TextMessage[CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'rac01']
MSGTYPE:
TextMessage[3]
OBJID:
TextMessage[rac01]
WAIT:
TextMessage[0]
]
2013-11-27 13:22:43.288: [UiServer][47] {1:37211:819} Container [ Name: ORDER
MESSAGE:
TextMessage[CRS-2679: Attempting to clean 'ora.racdb.db' on 'rac01']
MSGTYPE:
TextMessage[3]
OBJID:
TextMessage[ora.racdb.db 1 1]
WAIT:
TextMessage[0]
]
2013-11-27 13:22:43.345: [ A**][41] {1:37211:819} Starting the HB [Interval = 30000, misscount = 6kill allowed=1] for agent: /oracle/grid/11.2.0/grid_1/bin/oraagent_oracle
2013-11-27 13:22:43.347: [ A**][41] {1:37211:819} Could not forward message [RESOURCE_CLEAN[ora.racdb.db 1 1] ID 4100:3131] to agent. /oracle/grid/11.2.0/grid_1/bin/oraagent_oracle is not running
2013-11-27 13:22:43.348: [ A**][41] {1:37211:819} Starting of the agent: /oracle/grid/11.2.0/grid_1/bin/oraagent with user id oracle is already in progress.
2013-11-27 13:22:57.461: [ A**][44] {2:6722:146} Created alert : (:CRSAGF00130:) : Failed to start the agent /oracle/grid/11.2.0/grid_1/bin/oraagent_oracle
2013-11-27 13:22:57.462: [ A**][44] {2:6722:146} A** Proxy Server sending the last reply to PE for message:RESOURCE_CLEAN[ora.racdb.db 1 1] ID 4100:2464
2013-11-27 13:22:57.462: [ A**][44] {2:6722:146} Can not stop the agent: /oracle/grid/11.2.0/grid_1/bin/oraagent_oracle because pid is not initialized
2013-11-27 13:22:57.462: [ CRSPE][49] {2:6722:146} Received reply to action [Clean] message ID: 2464
2013-11-27 13:22:57.462: [ CRSPE][49] {2:6722:146} RI [ora.racdb.db 1 1] new internal state: [STABLE] old value: [CLEANING]
2013-11-27 13:22:57.462: [ CRSPE][49] {2:6722:146} Fatal Error from A** Proxy: Unable to start the agent process
2013-11-27 13:22:57.463: [ CRSPE][49] {2:6722:146} CRS-2680: Clean of 'ora.racdb.db' on 'rac01' failed
2013-11-27 13:22:57.465: [ CRSPE][49] {2:6722:146} Sequencer for [ora.racdb.db 1 1] has completed with error: CRS-5802: Unable to start the agent process
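The log excerpt is verbose, and the messages that matter are the ones carrying a CRS-NNNN code. As a rough sketch, the helper below (the name `crs_codes` is hypothetical) counts each distinct CRS code found in text read from stdin, which gives a quick summary before reading the surrounding lines in detail.

```shell
# Print each distinct CRS-NNNN code found on stdin followed by its count.
crs_codes() {
    awk '
        {
            line = $0
            # A single log line may mention more than one code.
            while (match(line, /CRS-[0-9]+/)) {
                count[substr(line, RSTART, RLENGTH)]++
                line = substr(line, RSTART + RLENGTH)
            }
        }
        END { for (code in count) print code, count[code] }
    ' | sort
}

# Possible usage (the log path is site-specific and not given in this article):
#   crs_codes < crsd.log
```

For the excerpt above, the codes that point at the root cause are CRS-2680 (clean failed) and CRS-5802 (unable to start the agent process).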
1.1.5 Checking the oraagent processes
Because the log says that oraagent_oracle cannot be started, compare the oraagent processes on the two nodes:
Node 2:
RAC02:/home/grid> ps -ef | grep oraagent
grid 3211 1 0 12:04:45 ? 0:50 /oracle/grid/11.2.0/grid_1/bin/oraagent.bin
grid 3830 1 0 12:05:50 ? 0:24 /oracle/grid/11.2.0/grid_1/bin/oraagent.bin
oracle 6046 1 0 12:12:42 ? 0:57 /oracle/grid/11.2.0/grid_1/bin/oraagent.bin
grid 3204 26634 1 13:39:21 pts/4 0:00 grep oraagent
Node 1:
RAC01:/home/grid> ps -ef | grep oraagent
grid 5077 1 0 11:46:17 ? 1:05 /oracle/grid/11.2.0/grid_1/bin/oraagent.bin
grid 5938 1 0 11:47:41 ? 0:35 /oracle/grid/11.2.0/grid_1/bin/oraagent.bin
grid 29140 19598 0 13:39:15 pts/2 0:00 grep oraagent
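The comparison between the two nodes comes down to which OS users own an oraagent.bin process. A small sketch of that check follows; the function name `oraagent_owners` is hypothetical, and it assumes the `ps -ef` column layout shown above, with the command path as the last field.

```shell
# Print "user count" for each OS user that owns an oraagent.bin process.
# Reads `ps -ef` style text on stdin; the grep command line itself is ignored
# because its last field does not end in /oraagent.bin.
oraagent_owners() {
    awk '
        $NF ~ /\/oraagent\.bin$/ { owners[$1]++ }
        END { for (u in owners) print u, owners[u] }
    ' | sort
}

# Possible usage, run on each node and compared by hand:
#   ps -ef | oraagent_owners
```

With the listings above, node 2 would show entries for both grid and oracle, while node 1 would show grid only.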
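1.1.6 Starting the oraagent process manually
From the output above, node 1 is running one fewer oraagent.bin process than node 2: the one owned by the oracle user is missing. So, on node 1, switch to the oracle user and try to start the process by hand: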
RAC01:/home/oracle> /oracle/grid/11.2.0/grid_1/bin/oraagent.bin
sh: /oracle/grid/11.2.0/grid_1/bin/oraagent.bin: Execute permission denied.
RAC01:/home/oracle> ll /oracle/grid/11.2.0/grid_1/bin/oraagent.bin
-rwx------ 1 grid oinstall 40262952 Nov 26 19:46 /oracle/grid/11.2.0/grid_1/bin/oraagent.bin
RAC01:/home/oracle> id
uid=1101(oracle) gid=1000(oinstall) groups=1031(dba)
RAC01:/home/oracle> exit
logout
RAC01:/> id
uid=0(root) gid=3(sys) groups=0(root),1(other),2(bin),4(adm),5(daemon),6(mail),7(lp),20(users)
RAC01:/> chmod 777 /oracle/grid/11.2.0/grid_1/bin/oraagent.bin
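The listing explains the failure: the file is owned by grid with mode `-rwx------`, so the oracle user, which only shares the oinstall group, has no execute permission. The sketch below makes that reading of the mode string explicit. The function name `mode_allows_exec` is hypothetical, and it only inspects the textual mode from `ls -l`; it does not consider ACLs.

```shell
# Return 0 if the `ls -l` mode string grants execute to the given class
# (owner, group or other), 1 if it does not, 2 on an unknown class.
mode_allows_exec() {
    mode=$1
    class=$2
    case $class in
        owner) pos=4 ;;     # characters 2-4 are the owner bits
        group) pos=7 ;;     # characters 5-7 are the group bits
        other) pos=10 ;;    # characters 8-10 are the other bits
        *) return 2 ;;
    esac
    bit=$(printf '%s' "$mode" | cut -c "$pos")
    case $bit in
        x|s|t) return 0 ;;  # lowercase s/t also imply execute
        *) return 1 ;;
    esac
}

# Example: oracle reaches the file through the oinstall group.
#   mode_allows_exec '-rwx------' group || echo "group cannot execute"
```

The original fix above uses `chmod 777`. The source does not state what mode the file normally has, so checking the same path on node 2 with `ll` would be one way to find a narrower value to apply.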
1.1.7 Stopping the instance again
On node 1, stop the instance with srvctl before starting it again:
RAC01:/oracle/grid> srvctl stop instance -d racdb -n rac01 -f
This time the stop completed without trouble. Checking the status shows that the node 1 instance is now in a normal shutdown state:
RAC01:/oracle/grid> crsctl stat res -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ASM_ARCH.dg
ONLINE ONLINE rac01
ONLINE ONLINE rac02
ora.ASM_CRS.dg
ONLINE ONLINE rac01
ONLINE ONLINE rac02
ora.ASM_DATA01.dg
ONLINE ONLINE rac01
ONLINE ONLINE rac02
ora.ASM_DATA02.dg
ONLINE ONLINE rac01
ONLINE ONLINE rac02
ora.ASM_DATA03.dg
ONLINE ONLINE rac01
ONLINE ONLINE rac02
ora.LISTENER.lsnr
ONLINE ONLINE rac01
ONLINE ONLINE rac02
ora.asm
ONLINE ONLINE rac01 Started
ONLINE ONLINE rac02 Started
ora.gsd
OFFLINE OFFLINE rac01
OFFLINE OFFLINE rac02
ora.net1.network
ONLINE ONLINE rac01
ONLINE ONLINE rac02
ora.ons
ONLINE ONLINE rac01
ONLINE ONLINE rac02
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE rac01
ora.cvu
1 ONLINE ONLINE rac01
ora.racdb.db
1 OFFLINE OFFLINE Instance Shutdown
2 ONLINE ONLINE rac02 Open
ora.rac01.vip
1 ONLINE ONLINE rac01
ora.rac02.vip
1 ONLINE ONLINE rac02
ora.oc4j
1 ONLINE ONLINE rac01
ora.scan1.vip
1 ONLINE ONLINE rac01
1.1.8 Starting the instance again with srvctl
RAC01:/oracle/grid> srvctl start instance -d racdb -n rac01
The start command completed successfully.
Checking the status shows that the instance on node 1 is now up:
RAC01:/oracle/grid> crsctl stat res -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ASM_ARCH.dg
ONLINE ONLINE rac01
ONLINE ONLINE rac02
ora.ASM_CRS.dg
ONLINE ONLINE rac01
ONLINE ONLINE rac02
ora.ASM_DATA01.dg
ONLINE ONLINE rac01
ONLINE ONLINE rac02
ora.ASM_DATA02.dg
ONLINE ONLINE rac01
ONLINE ONLINE rac02
ora.ASM_DATA03.dg
ONLINE ONLINE rac01
ONLINE ONLINE rac02
ora.LISTENER.lsnr
ONLINE ONLINE rac01
ONLINE ONLINE rac02
ora.asm
ONLINE ONLINE rac01 Started
ONLINE ONLINE rac02 Started
ora.gsd
OFFLINE OFFLINE rac01
OFFLINE OFFLINE rac02
ora.net1.network
ONLINE ONLINE rac01
ONLINE ONLINE rac02
ora.ons
ONLINE ONLINE rac01
ONLINE ONLINE rac02
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE rac01
ora.cvu
1 ONLINE ONLINE rac01
ora.racdb.db
1 ONLINE ONLINE rac01 Open
2 ONLINE ONLINE rac02 Open
ora.rac01.vip
1 ONLINE ONLINE rac01
ora.rac02.vip
1 ONLINE ONLINE rac02
ora.oc4j
1 ONLINE ONLINE rac01
ora.scan1.vip
1 ONLINE ONLINE rac01
At this point the Oracle instance on node 1 is running, the node 1 upgrade is complete, and the upgrade of node 2 can begin.
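Before patching node 2, it may be worth recording file modes in the Grid home's bin directory so that any change can be spotted afterwards. This is only a sketch under an assumption the article does not confirm, namely that the patch step is what altered the mode of oraagent.bin. The function name `snapshot_perms` is hypothetical, and the parsing assumes file names without spaces.

```shell
# Print "mode owner group name" for each entry in directory $1.
# Timestamps and sizes are left out so that two snapshots can be compared directly.
snapshot_perms() {
    # The first line of `ls -l` output is the "total" line, so skip it.
    ls -l "$1" | awk 'NR > 1 { print $1, $3, $4, $NF }'
}

# Possible usage around a patch run (the /tmp file names are placeholders):
#   snapshot_perms /oracle/grid/11.2.0/grid_1/bin > /tmp/perms.before
#   ... apply the PSU ...
#   snapshot_perms /oracle/grid/11.2.0/grid_1/bin > /tmp/perms.after
#   diff /tmp/perms.before /tmp/perms.after
```

Any line reported by `diff` marks a file whose mode, owner, or group changed and may need the same kind of check as in section 1.1.6.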
Source: ITPUB blog, http://blog.itpub.net/751371/viewspace-1061243/. Please credit the source when reposting; otherwise legal liability will be pursued.