Wednesday, 8 June 2011

A Right Pig's Ear of a Circular Reference

If you have ever used a self referencing table within Oracle to store hierarchical data (e.g. an organisations structure), you will have undoubtedly used CONNECT BY PRIOR to build your results tree. This is something we use on pretty much every project as the organisation is very hierarchy based.

Recently, the support cell sent the details of a recent call they received asking me to take a look. Looking down the call, I noticed that the following Oracle Error Message was logged:

"ORA-01436: CONNECT BY loop in user data"

A quick look at the explanation of -01436 and it was clear that there was a circular reference in the organisation table i.e. ORG_UNIT1 was the PARENT of ORG_UNIT2 and ORG_UNIT2 was the PARENT of ORG_UNIT1. In this example, both ORG_UNITS were the child and parent of each other. Clearly this was an issue which was quickly resolved by the addition of a application and server side validation to prevent this from re-occurring.

The outcome of this fix was a useful script that I keep to identify if there are any circular references within a self referencing table. The example below shows this script in action:
CREATE TABLE ORGANISATIONS (ORG_ID NUMBER
, PARENT_ORG_ID NUMBER
, NAME VARCHAR2(100));

INSERT INTO ORGANISATIONS VALUES (1, NULL, 'HQ');
INSERT INTO ORGANISATIONS VALUES (2, 1, 'SALES');
INSERT INTO ORGANISATIONS VALUES (3, 1, 'RESEARCH');
INSERT INTO ORGANISATIONS VALUES (4, 1, 'IT');
INSERT INTO ORGANISATIONS VALUES (5, 2, 'EUROPE');
INSERT INTO ORGANISATIONS VALUES (6, 2, 'ASIA');
INSERT INTO ORGANISATIONS VALUES (7, 2, 'AMERICAS');

COMMIT;

A quick tree walk query shows the visual representation of the hierarchy with no errors:
SELECT LPAD ('*', LEVEL, '*') || name tree
,LEVEL lev
FROM organisations
START WITH org_id = 1
CONNECT BY PRIOR org_id = parent_org_id;

TREE LEV
---------------------
*HQ 1
**SALES 2
***EUROPE 3
***ASIA 3
***AMERICAS 3
**RESEARCH 2
**IT 2

Now lets create a circular reference so that EUROPE is the parent of SALES:
UPDATE ORGANISATIONS SET parent_org_id = 5 WHERE NAME = 'SALES';
COMMIT;

Re-running the query from the very top of the tree completes but gives an incorrect
and incomplete result set:
TREE         LEV
---------------------
*HQ 1
**RESEARCH 2
**IT 2

If you running Oracle Database 9i and backwards, this Experts Exchange article provides a nice procedural solution and a quick mod to the PL/SQL gave me exactly the information I needed:
SET serveroutput on size 20000

DECLARE
l_n NUMBER;
BEGIN
FOR rec IN (SELECT org_id FROM organisations)
LOOP
BEGIN
SELECT COUNT ( * )
INTO l_n
FROM ( SELECT LPAD ('*', LEVEL, '*') || name tree
, LEVEL lev
FROM organisations
START WITH org_id = rec.org_id
CONNECT BY PRIOR org_id = parent_org_id);
EXCEPTION
WHEN OTHERS
THEN
IF SQLCODE = -1436
THEN
DBMS_OUTPUT.put_line
(
rec.org_id
|| ' is part of a Circular Reference'
);
END IF;
END;
END LOOP;
END;
/
As Buzz Killington pointed out in the comments section, Oracle Database 10g onwards introduces CONNECT BY NOCYCLE which will instruct Oracle to return rows even if it is involved in a self referencing loop. When used with the CONNECT_BY_ISCYCLE pseudocolumn, you can easily identify erroneous relationships via SQL without the need to switch to PL/SQL. An example of this can be seen by executing the following query:

WITH
my_data
AS
( SELECT ORG.PARENT_ORG_ID
,ORG.ORG_ID
,ORG.NAME org_name
FROM organisations org
)
SELECT
SYS_CONNECT_BY_PATH (org_name,'/') tree
,PARENT_ORG_ID
,ORG_ID
,CONNECT_BY_ISCYCLE err
FROM
my_data
CONNECT BY NOCYCLE PRIOR org_id = parent_org_id
ORDER BY 4 desc;

TREE PARENT_ORG_ID ORG_ID ERR
------------------------------------------------
/SALES/EUROPE 2 5 1
/EUROPE/SALES 5 2 1
/EUROPE 2 5 0
/EUROPE/SALES/ASIA 2 6 0
/EUROPE/SALES/AMERICAS 2 7 0
/ASIA 2 6 0
/AMERICAS 2 7 0
/SALES 5 2 0
/SALES/ASIA 2 6 0
/SALES/AMERICAS 2 7 0
/HQ 1 0
/HQ/RESEARCH 1 3 0
/HQ/IT 1 4 0
/RESEARCH 1 3 0
/IT 1 4 0

Any value > 0 in the ERR column indicates that the row is involved in a self referencing join. This is a much neater and more performant way to achieve to desired result. Thanks to Buzz for pointing out a much better way to original PL/SQL routine.

3 comments:

Buzz Killington said...

Uhh don't you just want to use the nocycle param?

connect by nocycle

That will avoid errors - you can also use the CONNECT_BY_ISCYCLE pseudocolumn to show you which of these has bad data.

Buzz Killington said...

Secondly if you want to trap errors, you should use a pragma exception_init to define your custom error and catch it. Even in an anonymous block when others is just lazy.

Duncan Mein said...

Buzz. Agreed to both points. I have updated the post to include the use of connect by nocycle for folks running 10g onwards.

Thanks for bringing this to my attention. It is a far more elegant way to achieve the desired result.

Cheers.
Duncs