Connect by is an Oracle specific clause that is used to fetch the Hierarchical data present in the DB in a more efficient way..

The article describes some of my thought process when i was unaware that such a clause existed.

Lets take a common and almost omnipresent example.

Suppose you have a table Employee, in which you have employeeId and managerId and the table structure looks like this

“create table employee

( id integer primary key,

name varchar2(10) unique,

mgr_id integer

);”

Now, suppose you insert the data in the table in such a way that it represents the hierarchy of the employee as follows.

A

B        C

D   E

F G

The correspoding insert scripts will be

“insert into employee values (1, ‘A’, null);”

“insert into employee values (2, ‘B’, 1);”

“insert into employee values (3, ‘C’, 1);”

“insert into employee values (4, ‘D’, 2);”

“insert into employee values (5, ‘E’, 2);”

“insert into employee values (6, ‘F’, 4);”

“insert into employee values (7, ‘G’, 4);”

Now, suppose you have to do the following in a single query for each.

1) Find out all direct reporters to ‘A’  — Expected Result : B, C,

2) Find out all direct and indirect reporters to ‘A’ — Expected result  B, C, D, E, F, G

The first goal can be easily achieved by a simple query using a Self Join like

“Select e2.name from employee e1, employee e2 where e2.mgr_id = e1.id and e1.name = ‘A'”

The second goal requires some brainstorming..

It can be seen that by using a simple Self Join as above, we can go to one level below and fetch the employees.

However, using the query like

“Select e3.name from employee e1, employee e2, employee e3 where e3.mgr_id = e2.id and e2.mgr_id = e1.id and e1.name = ‘A’;”

will give us the employees on the second level below A.

So, a reasonable query like

“Select e2.name from employee e1, employee e2 where e2.mgr_id = e1.id and e1.name = ‘A’

UNION

Select e3.name from employee e1, employee e2, employee e3 where e3.mgr_id = e2.id and e2.mgr_id = e1.id and e1.name = ‘A’

UNION

Select e4.name from employee e1, employee e2, employee e3, employee e4 where e4.mgr_id = e3.id and e3.mgr_id = e2.id and e2.mgr_id = e1.id and e1.name = ‘A’;”

will get the desired result.

I am still looking for any other query in ANSI SQL which does not use UNION.(considering the number of levels are known)

and I am not very sure if there is a query in ANSI SQL which can get the required result if the number of levels in the tree is unknown.

However, their is a glaring limitation in such a solution.

You should know the number of levels to which the hierarchy goes down so that you can have related no of Unions to the query.

Also, suppose you were to fetch the total reporters(direct and indirect) to ‘B’ (who infact, has no reporters).

Now, suppose you know that the maximum no of levels in the tree is four so that you execute the above query(using appropriate no of unions)

and get the result as 0.

Note that a number of unions were required although there were no reporters to B which is a huge performance bottleneck if number of levels is more.

Alternate Strategies to the problem do exist which include keeping the levelNo and maxNoOfLevels as columns in the table which

could be used to find out about the number of Self Joins that need to be done, but it has its big drawbacks of maintaining these

levelNos as when the data is inserted or updated.

One approach which one could think of is using a stored procedure (or a function).

You could pass the parameters to the function and the function will get the reporters and then dynamically decide if there

are any more nested levels to go to and stop if there are none.

This is where ‘Connect by’ clause comes riding on a white horse and saves the day.

A query like

“Select e.name from Employee e start with e.name = ‘A’ connect by prior id = mgr_id”

returns the tree structure starting from ‘A’.

And we need a little manipulation to get only the children of ‘A’ using.

“Select e.name from Employee e start with e.name = ‘A’ connect by prior id = mgr_id

Minus

Select ‘A’ from Dual;”

Not surprising, it is reckoned that the implementation of the Connect By clause is done using a stored procedure with a dynamic

recursive loop(almost like we thought before)

Further, the Connect By Clause is Oracle Specific and clause may differ for different databases, if it exists for them.

Some databases like DB2, do not have any such clause, however, there are ways to fetch such data in a query using their own syntax(using the With Clause)

And needless to say, Hibernate supports neither Connect By nor Union.

So, making a named query will be a resort in such a case, but it will have to differ for different DBs.

There are various flavours to the “Connect By” clause which are used to get the hierarchical data using some more conditions like

from one level to another.