3.23.2015

Join Methods

To join each pair of row sources, Oracle Database must decide how to do it. The "how" is the join method.
 
The possible join methods are - 
  •  Nested Loop Join
  •  Hash Join
  •  Sort Merge Join
A Cartesian join requires one of the preceding join methods.
Each join method has specific situations in which it is more suitable than the others.


NESTED LOOP

Suppose somebody gave you a telephone book and a list of 20 names to look up, and then asked you to write down each person's name and corresponding telephone number. You would probably go down the list of names, looking up each one in the telephone book one at a time. This task would be pretty easy because the telephone book is alphabetized by name. Moreover, somebody looking over your shoulder could start calling to the first few numbers you write down, while you are still looking up the rest. This scene describes a NESTED LOOPS join.

Q. What are the Nested Loop processing Steps ?

Step-1:
Oracle reads the first row from the first row source,

Step-2:
and then checks the second row source for matches.

Step-3:
All matches are then placed in the result set and,

Step-4:
Oracle goes on to the next row from the first row source.

Step-n:
This continues until all rows in the first row source have been processed.
The first row source is often called the outer or driving table, whereas the second row source is called the inner table.

Using a NESTED LOOPS join is one of the fastest methods of receiving the first records back from a join.

Q: When Oracle Nested Loop Joins Are Ideal ?


NESTED LOOPS joins are ideal when the driving row source (the records you are looking for) is small and the joined columns of the inner row source are uniquely indexed.

NESTED LOOPS joins have an advantage over other join methods in that they can quickly retrieve the first few rows of the result set without having to wait for the entire result set to be determined. This situation is ideal for query screens where an end user can read the first few records retrieved while the rest are being fetched.

However, NESTED LOOPS joins can be very inefficient if the inner row source (second table accessed) does not have an index on the joined columns or if the index is not highly selective. If the driving row source (the records retrieved from the driving table) is quite large, other join methods may be more efficient.

Video link for Nested loop and Nested loop with Index -
https://www.youtube.com/watch?v=o1dMJ6-CKzU

Reference - 
http://logicalread.solarwinds.com/oracle-11g-when-nested-loop-joins-are-ideal-mc02/#.VQ5kx0l5OPI 

HASH JOIN

A method for joining large data sets.
 
The database uses the smaller of two data sets to build a hash table on the join key in memory. It then scans the larger data set, probing the hash table to find the joined rows.
In cases where a very small table is being joined to a large table, the oracle HASH JOIN will often dramatically speed-up the query.

Q. When optimizer considers a hash join ?

In general, the optimizer considers a hash join when the following conditions are true:

- A relatively large amount of data must be joined, or a large fraction of a small table must be joined.
- The join is an equijoin, that mean, you cannot use hash joins unless there is an equality condition.

This method is best when the smaller table fits in available memory.

Q. What are the Hash Join processing Steps ?


Step-1:
Oracle select the smaller of two data sets.

Step-2:
The database performs a full scan of the selected smaller data set, and then applies a Hash Function to the join key in each row to build a Hash Table in the PGA (memory).
In this step, oracle perform a nested loop to build the hash table, from the smaller table.
 

For example -



Step-3:
The hast table then scans the larger data set, to find the joined rows.

---
The HASH join is similar to a NESTED LOOPS join in the sense that there is a nested loop that occurs when Oracle first builds a hash table to facilitate the operation and then loops through the hash table.

When using an ORDERED hint, the first table in the FROM clause is the table used to build the hash table.

HASH joins can be effective when the lack of a useful index renders NESTED LOOPS joins inefficient. The HASH join might be faster than a SORT-MERGE join, in this case, because only one row source needs to be sorted, and it could possibly be faster than a NESTED LOOPS join because probing a hash table in memory can be faster than traversing a b-tree index.

As with SORT-MERGE joins and CLUSTER joins, HASH joins work only on equijoins. As with SORT-MERGE joins, HASH joins use memory resources and can drive up I/O in the temporary tablespace if the sort memory is not sufficient (which can cause this join method to be extremely slow).

Finally, HASH joins are available only when cost-based optimization is used (which should be 100 percent of the time for your application running on Oracle 11g).

Because the hash table is in the PGA, Oracle Database can access rows without latching them. This technique reduces logical I/O by avoiding the necessity of repeatedly latching and reading blocks in the database buffer cache.

If the data sets do not fit in memory, then the database partitions the row sources, and the join proceeds partition by partition. This can use a lot of sort area memory, and I/O to the temporary tablespace. This method can still be the most cost effective, especially when parallel query servers are used.

Q. What is a hash table ?

An in-memory data structure that associates join keys with rows in a hash join.
For example, in a join of the employees and departments tables, the join key might be the department ID. A hash function uses the join key to generate a hash value. This hash value is an index in an array, which is the hash table.

Q. What is in-memory scan ?


A table scan that retrieves rows from the In-Memory Column Store.

Q. How to size your PGA for Hash Join ?

The rule is quite different depending on your system. You need to focus on HASH_AREA_SIZE or PGA_AGGREGATE_TARGET parameter.

Oracle uses a HASH join efficiently only if the parameter PGA_AGGREGATE_TARGET is set to a large enough value. If MEMORY_TARGET is used, the PGA_AGGREGATE_TARGET is included in the MEMORY_TARGET, but you may still want to set a minimum.

If you set the SGA_TARGET, you must set the PGA_AGGREGATE_TARGET as the SGA_TARGET does not include the PGA (unless you use MEMORY_TARGET as just described).

And, Oracle places limits of the amount of RAM available for HASH JOIN so that no single session will "Hog" all the RAM.

Unfortunately, the oracle Hash Join is more memory intensive than Nested Loop join Join.
For large tables, hash joins requires lots of RAM so we must set the HASH_AREA_SIZE large enough to hold the entire hash table in memory.

In a production database with very large tables, it is not always easy to get your database to invoke hash joins without increasing the RAM regions that control hash joins. 
For large tables, hash joins requires lots of RAM.

Q. How to use Hash Join Hints ?


Apply the USE_HASH hint to instruct the optimizer to use a hash join when joining two tables together.

More -
http://logicalread.solarwinds.com/oracle-11g-hash-joins-mc02/#.VQ6D1Ul5OPI

Still I have a big confusion about Hash Join!!
 

The most fundamental difference between hash and nested loop joins -
http://blog.tanelpoder.com/2010/10/06/a-the-most-fundamental-difference-between-hash-and-nested-loop-joins/
SORT MARGE JOIN

Sort merge joins can join rows from two independent sources.

Hash joins generally perform better than sort merge joins. However, sort merge 
joins can perform better than hash joins if both of the following conditions exist:
- The row sources are sorted already.
- A sort operation does not have to be done.

However, if a sort merge join involves choosing a slower access method (an index scan as opposed to a full table scan), then the benefit of using a sort merge might be lost.

Sort merge joins are useful when the join condition between two tables is an inequality condition such as <, <=, >, or >=.
 

Sort merge joins perform better than nested loop joins for large data sets.
You cannot use hash joins unless there is an equality condition.

Q. What are the Sort Marge Join processing Steps ?


In a merge join, there is no concept of a driving table. The join consists of two steps:
- Sort join operation: Both the inputs are sorted on the join key.
- Merge join operation: The sorted lists are merged together.

If the input is sorted by the join column, then a sort join operation is not performed for that row source.
 

However, a sort merge join always creates a positionable sort buffer for the right side of the join so that it can seek back to the last match in the case where duplicate join key values come out of the left side of the join.

Q. When the Optimizer Uses Sort Merge Joins ?

The optimizer can choose a sort merge join over a hash join for joining large amounts of data if any of the following conditions are true:

- The join condition between two tables is not an equijoin.
- Because of sorts required by other operations, the optimizer finds it is cheaper to use a sort merge than a hash join.


Q. How to use Sort Merge Join Hints ?

To instruct the optimizer to use a sort merge join, apply the USE_MERGE hint. You might also need to give hints to force an access path.

There are situations where it makes sense to override the optimizer with the USE_MERGE hint.
For example, the optimizer can choose a full scan on a table and avoid a sort operation in a query.
 

However, there is an increased cost because a large table is accessed through an index and single block reads, as opposed to faster access through a full table scan.

More -
http://docs.oracle.com/cd/E29505_01/server.1111/e16638/optimops.htm#autoId42
https://docs.oracle.com/database/121/TGSQL/tgsql_join.htm#TGSQL94690

 

Cartesian Joins

The database uses a Cartesian join when one or more of the tables does not have any join conditions to any other tables in the statement.
 
The optimizer joins every row from one data source with every row from the other data source, creating the Cartesian product of the two sets.

Q. When the Optimizer Uses Cartesian Joins ?


The optimizer uses Cartesian joins when it is asked to join two tables with no join conditions.
 

In some cases, a common filter condition between the two tables could be picked up by the optimizer as a possible join condition.
 

In other cases, the optimizer may decide to generate a Cartesian product of two very small tables that are both joined to the same large table.

Q. How to use Cartesian Join Hints ?

Applying the ORDERED hint, instructs the optimizer to use a Cartesian join.
By specifying a table before its join table is specified, the optimizer does a Cartesian join.