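Solution
- One of the sessions holding the dismounted disk discovery enqueue "DD-00000000-00000000" in exclusive mode is waiting indefinitely on "kfk: async disk IO".
That process prevents RBAL from acquiring the same enqueue (the dismounted disk discovery enqueue) for the new devices added on the other nodes of the RAC environment.
This is why the following message is repeated in the RBAL trace.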
kfgbTryFn: failed to acquire DD.0.0 in 6 for kfgbDiscoverNow (of group 7/0x259d8ac6)
The enqueue holder can be identified with the following SQL script:
set linesize 200
set pagesize 1000
column username format a10
column mod format a20
column blocker format a7
column waiter format a7
column lmode format 9999
column request format 9999
column I format 99
column sid format 9999
col username format a6
col osuser format a8
col s# format 99999
col CS_pid format a13
col pname format a10
col program format a20
col waitsec format 999,999,999
col pid format 9999
--col p1 format 9999
col p2 format a20
col sql format a20

spool locking_information

prompt ########################
prompt # Blocking Information #
prompt ########################

select b.inst_id||'/'||b.sid blocker,
       -- s.module,
       w.inst_id||'/'||w.sid waiter,
       b.type, b.id1, b.id2, b.lmode, w.request
from gv$lock b,
     ( select inst_id, sid, type, id1, id2, lmode, request
       from gv$lock
       where request > 0 ) w
     -- gv$session s
where b.lmode > 0
  and ( b.id1 = w.id1 and b.id2 = w.id2 and b.type = w.type )
  --and ( b.sid = s.sid and b.inst_id = s.inst_id )
order by b.inst_id, b.sid
/

prompt ##########################
prompt # Rebalance Information #
prompt ##########################

select * from gv$asm_operation
/

prompt ########################
prompt # Locking Information #
prompt ########################

select a.type, a.id1, a.id2, a.lmode, a.request,
       a.inst_id inst, a.sid,
       case when a.type='DD' and a.id1=0 and a.id2=0 and a.lmode=6
            then '<<<<<<------------------'
       end "Dismounted DD enq holder"
from gv$lock a
order by a.type, a.id1, a.id2, a.lmode
/

prompt ########################
prompt # Session Information #
prompt ########################

select s.inst_id I, s.sid, s.serial# s#, p.pid, s.username,
       s.process||'/'||spid CS_pid,
       p.pname, --> p.program in 10g_11gR1
       s.status, s.module program, s.osuser,
       substr(w.event, 1, 30) wait_event,
       w.seconds_in_wait waitsec,
       w.p1,
       case when w.event='DFS lock handle' and w.p2=38 then 'ASM diskgroup discovery wait'
            when w.event='DFS lock handle' and w.p2=39 then 'ASM diskgroup release'
            when w.event='DFS lock handle' and w.p2=40 then 'ASM push DB updates'
            when w.event='DFS lock handle' and w.p2=41 then 'ASM add ACD chunk'
            when w.event='DFS lock handle' and w.p2=42 then 'ASM map resize message'
            when w.event='DFS lock handle' and w.p2=43 then 'ASM map lock message'
            when w.event='DFS lock handle' and w.p2=44 then 'ASM map unlock message (phase 1)'
            when w.event='DFS lock handle' and w.p2=45 then 'ASM map unlock message (phase 2)'
            when w.event='DFS lock handle' and w.p2=46 then 'ASM generate add disk redo marker'
            when w.event='DFS lock handle' and w.p2=47 then 'ASM check of PST validity'
            when w.event='DFS lock handle' and w.p2=48 then 'ASM offline disk CIC'
            when w.event='DFS lock handle' and w.p2=52 then 'ASM F1X0 relocation'
            when w.event='DFS lock handle' and w.p2=55 then 'ASM disk operation message'
            when w.event='DFS lock handle' and w.p2=56 then 'ASM I/O error emulation'
            when w.event='DFS lock handle' and w.p2=60 then 'ASM Pre-Existing Extent Lock wait'
            when w.event='DFS lock handle' and w.p2=61 then 'Perform a ksk action through DBWR'
            when w.event='DFS lock handle' and w.p2=62 then 'ASM diskgroup refresh wait'
            else to_char(w.p2)
       end p2,
       substr(q.sql_text, 1, 100) sql
from gv$session s,
     gv$process p,
     gv$session_wait w,
     gv$sqlarea q
where ( s.paddr = p.addr and s.inst_id = p.inst_id )
  and ( s.inst_id = w.inst_id and s.sid = w.sid )
  and ( s.inst_id = q.inst_id(+) and s.sql_address = q.address(+) )
order by s.inst_id, s.sid --, s.audsid
/

spool off
exit
Sample output:
------------------------------------------------------------------------------------------------------------------------
DD 0 0 6 0 2 182 <<<<<<------------------
( Inst# 2, SID 182 is an exclusive holder process for DD-00000000-00000000 )
Note that ID1 and ID2 are both "0" (that is, DD-00000000-00000000) and that LMODE is "6", which is exclusive mode.
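- One of the devices added to the affected diskgroup shows close to 100% utilization.
For example, in the following output of "iostat -xt 2", xvdev1 is one of the added devices.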
Device:  rrqm/s  wrqm/s  r/s   w/s   rsec/s  wsec/s  avgrq-sz  avgqu-sz  await  svctm  %util
xvdev    0.00    0.00    0.00  0.00  0.00    0.00    0.00      8.00      0.00   0.00   100.00   <<<<<------- Utilization shows 100%
xvdev1   0.00    0.00    0.00  0.00  0.00    0.00    0.00      3.00      0.00   0.00   100.00
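When many devices are listed, the saturated ones can be picked out of captured iostat output with a short script. The following is a minimal Python sketch, not part of the original note: the function name saturated_devices, the 95% threshold, and the xvda row in the sample are assumptions made for illustration. It locates the %util column by its header name, because the column layout differs between sysstat versions.

```python
def saturated_devices(iostat_text, threshold=95.0):
    """Return (device, %util) pairs at or above the threshold.

    The threshold of 95.0 is an arbitrary illustrative value.
    """
    util_col = None
    hits = []
    for line in iostat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("Device"):
            # Find %util by header name instead of assuming a fixed position.
            util_col = fields.index("%util") if "%util" in fields else None
            continue
        if util_col is None or len(fields) <= util_col:
            # Timestamp and avg-cpu lines are too short and are skipped here.
            continue
        try:
            util = float(fields[util_col])
        except ValueError:
            continue
        if util >= threshold:
            hits.append((fields[0], util))
    return hits


# The first two device rows come from the sample above; xvda is a
# hypothetical healthy device added only for contrast.
sample = """\
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
xvdev 0.00 0.00 0.00 0.00 0.00 0.00 0.00 8.00 0.00 0.00 100.00
xvdev1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 3.00 0.00 0.00 100.00
xvda 0.00 1.50 0.00 2.00 0.00 28.00 14.00 0.01 2.50 1.25 0.25
"""

for device, util in saturated_devices(sample):
    print(device, util)
```

A device that reports 100% utilization while its r/s and w/s stay at zero and avgqu-sz is non-zero, as in the sample, suggests that queued requests are not completing, which matches the indefinite "kfk: async disk IO" wait.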
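Resolving the issue:
Fix the underlying problem, at the operating system or storage level, that causes the device to show close to 100% utilization.
After the problematic device has been fixed, try to reproduce the issue by creating a dummy diskgroup on the new devices, as described in Note 557348.1.
Then run asm_blocking.sql to check whether any process holds "DD-00000000-00000000" for a long time.
If the new DUMMY diskgroup can be created without problems, the issue no longer occurs. If the rebalance does not start automatically after the storage problem has been fixed at the operating system level, restart the rebalance of the diskgroup manually.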
SQL> alter diskgroup DATA rebalance power 6;
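Problem
In a RAC environment, multiple disks were added to an existing diskgroup. The sqlplus session that issued the add operation never returned control and had to be disconnected manually.
- v$asm_operation shows no rebalance in progress: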
SQL> select * from gv$asm_operation;
no rows selected
- "disk validation pending" messages appear on the other nodes, but the ASM alert.log contains no "SUCCESS: refreshed membership" message:
Tue Aug 27 23:32:36 2013
NOTE: disk validation pending for group 2/0x75fe02b8 (DATA)
Wed Aug 28 05:28:52 2013
- The RBAL trace repeatedly shows the following message.
kfgbTryFn: failed to acquire DD.0.0 in 6 for kfgbDiscoverNow (of group 7/0x259d8ac6)
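Note: "DD.0.0" denotes the dismounted disk discovery enqueue, and "6" denotes exclusive mode.
- Queries against v$asm_disk and v$asm_diskgroup hang, but queries against the v$asm_disk_stat and v$asm_diskgroup_stat views work.
Sample v$asm_disk_stat output for the new devices.
Note the "ADDING" state: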
GN  DN  m_status  h_status  mo_status  state   dname
 2  10  OPENED    MEMBER    SYNCING    ADDING  DATA_0010
 2  11  OPENED    MEMBER    SYNCING    ADDING  DATA_0011
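To judge how often RBAL fails to acquire the enqueue, the repeated kfgbTryFn message in the RBAL trace can be counted per enqueue, mode, and group. The following is a minimal Python sketch under stated assumptions: the regular expression is inferred from the single sample line shown in this article, so other releases may format the message differently, and the function name count_failed_acquires and the group name hexid are hypothetical.

```python
import re
from collections import Counter

# Pattern inferred from one sample line only; treat it as an assumption.
PATTERN = re.compile(
    r"kfgbTryFn: failed to acquire (?P<enq>\S+) in (?P<mode>\d+) "
    r"for (?P<caller>\w+) \(of group (?P<group>\d+)/(?P<hexid>0x[0-9a-fA-F]+)\)"
)


def count_failed_acquires(trace_text):
    """Count failed acquisitions keyed by (enqueue, mode, group number)."""
    counts = Counter()
    for match in PATTERN.finditer(trace_text):
        key = (match["enq"], int(match["mode"]), int(match["group"]))
        counts[key] += 1
    return counts


# The sample line from the RBAL trace, repeated three times for illustration.
line = ("kfgbTryFn: failed to acquire DD.0.0 in 6 "
        "for kfgbDiscoverNow (of group 7/0x259d8ac6)\n")
sample = line * 3

for (enq, mode, group), n in count_failed_acquires(sample).items():
    print(f"enqueue={enq} mode={mode} group={group} failures={n}")
```

A count that keeps growing for ("DD.0.0", 6, ...) is consistent with another session holding the dismounted disk discovery enqueue in exclusive mode.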
Date: 2020-09-17 00:11:19  Source: oir  Author: oir