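Common Causes of Oracle Deadlocks and How to Resolve Them

I. Deadlock between a delete and an update

A deadlock arises when multiple threads or processes contend for the same resource or depend on resources held by one another.
The example below reproduces contention for the same resource.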
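Environment: Oracle 10g, PL/SQL version 9.2

    CREATE TABLE testLock( ID NUMBER, test VARCHAR(100) );
    COMMIT;
    INSERT INTO testLock VALUES(1,'test1');
    INSERT INTO testLock VALUES(2,'test2');
    COMMIT;
    SELECT * FROM testLock;

            ID TEST
    ---------- ----------------------------------
             1 test1
             2 test2

Reproducing the problem:

1) In a SQL window, run:

    SELECT * FROM testLock FOR UPDATE; -- takes row-level locks; edit the data but do not commit

2) Open a separate command window and run:

    delete from testLock WHERE ID=1;

The second session now hangs. (A separate window is required; otherwise the tool prompts "POST THE CHANGE RECORD TO THE DATABASE", and clicking Yes forces a commit.) Strictly speaking, this is a lock wait rather than a deadlock cycle: the second session simply blocks until the first one commits or rolls back, and Oracle does not raise ORA-00060. The diagnostic steps below apply to both situations.

3) Inspecting the locks:

    SQL> select s.username, l.object_id, l.session_id, s.serial#, s.lockwait, s.status, s.machine, s.program
         from v$session s, v$locked_object l
         where s.sid = l.session_id;

Output (as captured in the source; the OBJECT_ID column is not shown):

    USERNAME   SESSION_ID    SERIAL# LOCKWAIT STATUS   MACHINE                PROGRAM
    ---------- ---------- ---------- -------- -------- ---------------------- ------------
    SYS               146        104          INACTIVE WORKGROUP\J-THINK      PLSQLDev.exe
    SYS               144        145 20834474 ACTIVE   WORKGROUP\J-THINK      PLSQLDev.exe

Field descriptions:

Username: the database user that issued the locked statement.
SID: session identifier. A session is the context shared by the two communicating parties from the start of the communication to its end.
SERIAL#: a SID can be reused, but each time the same SID is reused its SERIAL# increases, so the pair never repeats.
Lockwait: identifies the lock the session is currently waiting for; it can be used to look up details of that lock.
Status: the session state.
    Active: currently executing a SQL statement.
    Inactive: waiting for work.
    Killed: marked to be killed.
Machine: the machine from which the locked statement was issued.
Program: the application from which the locked statement originated.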
4) Finding the statement involved in the lock:

    SQL> select sql_text from v$sql
         where hash_value in (select sql_hash_value from v$session
                              where sid in (select session_id from v$locked_object));

    SQL_TEXT
    ------------------------------------------------------------
    delete from testLock where ID = 1

5) Resolving the lock:

    SQL> alter system kill session '144,145';

    System altered

    Executed in 1.061 seconds

The window that was running the delete statement now shows:

    SQL> delete from testLock where ID = 1;

    delete from testLock where ID = 1

    ORA-00028: your session has been killed

Checking the locks again shows that no record with status ACTIVE remains:

    SQL> select s.username, l.session_id, s.serial#, s.lockwait, s.status, s.machine, s.program
         from v$session s, v$locked_object l
         where s.sid = l.session_id;

    USERNAME      SESSION_ID    SERIAL# LOCKWAIT STATUS   MACHINE                     PROGRAM
    ------------- ---------- ---------- -------- -------- --------------------------- ----------------
    SYS                  146        104          INACTIVE WORKGROUP\J-THINK           PLSQLDev.exe

    Executed in 0.032 seconds

The blocked statement has been terminated.
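In the listing above, session 144 is the one with a LOCKWAIT value, so it is the waiter and session 146 is the lock holder. Whether a set of such waits forms a cycle (which Oracle detects and reports as ORA-00060) or only a one-way blocking chain can be checked with a small wait-for graph. The following Python sketch is illustrative only: the function name find_wait_cycle and the dictionary input are assumptions, and the waiter-to-blocker pairs are assumed to have been collected beforehand (for example from the BLOCKING_SESSION column of V$SESSION, which should be available from 10g onward).

```python
from typing import Dict, List, Optional


def find_wait_cycle(waits_for: Dict[int, int]) -> Optional[List[int]]:
    """Return the sessions forming a wait cycle, or None if there is none.

    waits_for maps a waiting session id to the session id that blocks it.
    """
    for start in waits_for:
        path: List[int] = []
        seen = set()
        current = start
        # Follow blocker links until the chain ends or revisits a session.
        while current in waits_for and current not in seen:
            seen.add(current)
            path.append(current)
            current = waits_for[current]
        if current in seen:
            # Drop any lead-in sessions so only the cycle itself is returned.
            return path[path.index(current):]
    return None


# Scenario from the listing: session 144 waits for 146; 146 waits for nobody.
blocking_only = {144: 146}
# Hypothetical cycle for contrast: two sessions each waiting on the other.
mutual_wait = {146: 148, 148: 146}

print(find_wait_cycle(blocking_only))  # None
print(find_wait_cycle(mutual_wait))    # [146, 148]
```

A result of None means the waiter will proceed as soon as the holder commits or rolls back; a returned cycle means no session in it can make progress until one of them is rolled back or killed.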
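II. Deadlock caused by a missing index on a foreign key

A customer's 10.2.0.4 RAC environment on AIX frequently hit ORA-60 deadlocks, which kept the application from running smoothly.
After a series of diagnostic steps, the root cause turned out to be a missing index on a foreign key. The application deletes data from both the parent and the child table, and without the index the row-level lock is escalated to a table-level lock, which led to large numbers of lock waits and deadlocks.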
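The following example gives a simple simulation of the problem:

    SQL> create table t_p (id number primary key, name varchar2(30));
    Table created.
    SQL> create table t_f (fid number, f_name varchar2(30), foreign key (fid) references t_p);
    Table created.
    SQL> insert into t_p values (1, 'a');
    1 row created.
    SQL> insert into t_f values (1, 'a');
    1 row created.
    SQL> insert into t_p values (2, 'b');
    1 row created.
    SQL> insert into t_f values (2, 'c');
    1 row created.
    SQL> commit;
    Commit complete.
    SQL> delete t_f where fid = 2;
    1 row deleted.

Session 2 now also deletes from the child table:

    SQL2> delete t_f where fid = 1;
    1 row deleted.

Back in session 1, delete from the parent table:

    SQL> delete t_p where id = 2;

The session blocks. Back in session 2, delete from the parent table:

    SQL2> delete t_p where id = 1;

This session blocks as well. At this point the statement in session 1 is rolled back with an ORA-60 deadlock error:

    delete t_p where id = 2
    *
    ERROR at line 1:
    ORA-00060: deadlock detected while waiting for resource

    SQL> rollback;
    Rollback complete.

Session 1 has rolled back its work. Session 2's blocked delete then completes; roll it back too and create an index on the foreign key column:

    1 row deleted.
    SQL2> rollback;
    Rollback complete.
    SQL2> create index ind_t_f_fid on t_f(fid);
    Index created.

Now repeat the steps above. Session 1 deletes the child row:

    SQL> delete t_f where fid = 2;
    1 row deleted.

Session 2 deletes the child row:

    SQL2> delete t_f where fid = 1;
    1 row deleted.

Session 1 deletes the parent row:

    SQL> delete t_p where id = 2;
    1 row deleted.

Session 2 deletes the parent row:

    SQL2> delete t_p where id = 1;
    1 row deleted.

All of the deletes now succeed. The difference in lock information between the two cases is not analyzed in depth here; the key point is to create an index on the foreign key column.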
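Some articles note that under certain conditions the index on a foreign key column can be omitted. My view has always been that once you have created a foreign key, one more index is not worth worrying about: the cost of the index is negligible compared with the problems its absence can cause.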
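The rule "every foreign key should have an index" can be checked mechanically. The Python sketch below is a hedged illustration, not a tool from the original article: it assumes the foreign key and index column lists have already been read from the data dictionary (for example from views such as USER_CONS_COLUMNS and USER_IND_COLUMNS), and the names unindexed_foreign_keys, FK_T_F_FID and PK_T_P are hypothetical. It uses a deliberately conservative test, treating a foreign key as covered only when its columns appear, in the same order, as the leading columns of some index on the same table.

```python
from typing import Dict, List, Tuple

ColumnList = Tuple[str, ...]


def unindexed_foreign_keys(
    foreign_keys: Dict[str, Tuple[str, ColumnList]],
    indexes: Dict[str, Tuple[str, ColumnList]],
) -> List[str]:
    """Return names of foreign keys that no index covers with its leading columns.

    foreign_keys: constraint name -> (table name, FK columns in order)
    indexes:      index name      -> (table name, indexed columns in order)
    """
    missing = []
    for fk_name, (fk_table, fk_cols) in foreign_keys.items():
        covered = any(
            idx_table == fk_table and idx_cols[:len(fk_cols)] == fk_cols
            for idx_table, idx_cols in indexes.values()
        )
        if not covered:
            missing.append(fk_name)
    return sorted(missing)


# Metadata modeled on the t_p / t_f example (constraint names are hypothetical).
fks = {"FK_T_F_FID": ("T_F", ("FID",))}
indexes_before = {"PK_T_P": ("T_P", ("ID",))}
indexes_after = {**indexes_before, "IND_T_F_FID": ("T_F", ("FID",))}

before = unindexed_foreign_keys(fks, indexes_before)
after = unindexed_foreign_keys(fks, indexes_after)
print(before)  # ['FK_T_F_FID']
print(after)   # []
```

Running such a check before deployment flags the situation from the example, where t_f.fid had no index until ind_t_f_fid was created.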
[Supplement] Differences between Oracle 10g and Oracle 9i trc log contents

The main difference is that Oracle 10g reports both SQL statements that were waiting for the resource, whereas Oracle 9i shows only the SQL statement for which the deadlock was detected.

Oracle 10g 10.2.0.3.0:

    DEADLOCK DETECTED ( ORA-00060 )
    [Transaction Deadlock]
    The following deadlock is not an ORACLE error. It is a
    deadlock due to user error in the design of an application
    or from issuing incorrect ad-hoc SQL. The following
    information may aid in determining the deadlock:
    Deadlock graph:
                           ---------Blocker(s)--------  ---------Waiter(s)---------
    Resource Name          process session holds waits  process session holds waits
    TM-0000dd55-00000000        16     146    SX   SSX       17     148    SX   SSX
    (second graph row truncated in the source; only "SX SSX" survives)
    session 146: DID 0001-0010-00000008     session 148: DID 0001-0011-00000006
    session 148: DID 0001-0011-00000006     session 146: DID 0001-0010-00000008
    Rows waited on:
    Session 148: no row
    Session 146: no row
    Information on the OTHER waiting sessions:
    Session 148:
      pid=17 serial=39 audsid=540046 user: 54/SCOTT
      O/S info: user: SKYHOME\sky, term: SKYHOME, ospid: 3028:7000, machine: WORKGROUP\SKYHOME
                program: plsqldev.exe
      application name: PL/SQL Developer, hash value=1190136663
      action name: Command Window - New, hash value=254318129
      Current SQL Statement:
      delete t_p where id = 1
    End of information on OTHER waiting sessions.
    Current SQL statement for this session:
    delete t_p where id = 2

Oracle 9i 9.2.0.7.0:

    DEADLOCK DETECTED
    Current SQL statement for this session:
    delete t_p where id = 2
    The following deadlock is not an ORACLE error. It is a
    deadlock due to user error in the design of an application
    or from issuing incorrect ad-hoc SQL. The following
    information may aid in determining the deadlock:
    Deadlock graph:
                           ---------Blocker(s)--------  ---------Waiter(s)---------
    Resource Name          process session holds waits  process session holds waits
    (one graph row truncated in the source; only "SX SSX" survives)
    TM-0000260e-00000000        23      20    SX   SSX       21      51    SX   SSX
    session 51: DID 0001-0015-0000043D      session 20: DID 0001-0017-00000397
    session 20: DID 0001-0017-00000397      session 51: DID 0001-0015-0000043D
    Rows waited on:
    Session 20: no row
    Session 51: no row
    Information on the OTHER waiting sessions:
    Session 20:
      pid=23 serial=53179 audsid=197296 user: 87/scott
      O/S info: user: sky, term: SKYHOME, ospid: 5540:4984, machine: WORKGROUP\SKYHOME
                program: plsqldev.exe
      client info: 127.0.0.1
      application name: PL/SQL Developer, hash value=1190136663
      action name: Command Window - New, hash value=254318129
      Current SQL Statement:
      delete t_p where id = 1
    End of information on OTHER waiting sessions.

III. Deadlock caused by two sessions updating two tables in opposite order

Deadlock in Oracle:

Note: the four update statements execute in the order of their position in the figure, from top to bottom. In the figure, the session on the left is interrupted (at this point it neither rolls back nor commits, and waits for the user to decide), while the session on the right is blocked, waiting for the left session to release its lock on table a.
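A common way to avoid this kind of opposite-order deadlock is to have every transaction touch the shared tables in one agreed order. The Python sketch below only illustrates that idea with in-process locks; it is not Oracle code and not taken from the original article. The two threading.Lock objects are hypothetical stand-ins for the row locks on table a and a second table (called b here as an assumption), and the names update_both and run_sessions are invented for the example.

```python
import threading

# Hypothetical stand-ins for the row locks on tables a and b.
lock_a = threading.Lock()
lock_b = threading.Lock()
# Agreed global order: every session locks a first, then b.
LOCK_ORDER = [("a", lock_a), ("b", lock_b)]

log = []
log_guard = threading.Lock()


def update_both(session: str) -> None:
    """Acquire both locks in the agreed order, record each step, then release."""
    held = []
    try:
        for name, lock in LOCK_ORDER:
            lock.acquire()
            held.append(lock)
            with log_guard:
                log.append((session, name))
    finally:
        # Release in reverse order of acquisition.
        for lock in reversed(held):
            lock.release()


def run_sessions() -> list:
    """Run a 'left' and a 'right' session concurrently; report which are stuck."""
    threads = [
        threading.Thread(target=update_both, args=(name,))
        for name in ("left", "right")
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=5)
    return [t.is_alive() for t in threads]


stuck = run_sessions()
print(stuck)  # [False, False]
```

Because both sessions request a before b, one of them may wait briefly for the other, but neither can end up holding one lock while waiting for the lock the other holds, so no cycle forms.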