Search This Blog

Thursday, August 2, 2012

Understanding Oracle QUERY PLAN - Part2 (Exploring SQL Joins)

This is the second part of the article Understanding Oracle Query Plan. In this part we will deal with SQL Joins.
For the first part of this article, click here
This time we will explore and try to understand query plan for joins. Let’s take on joining of two tables and let’s find out how Oracle query plan changes. We will start with two tables as following,
Product Table
- Stores 15000 products. Each product has unique numeric id.
Buyer Table
- Stores 15,000,00 buyers who buy the above products. This table has unique id field as well as a prodid (product id) field that links back to the product table.
Before we start, please note, we do not have any index or table statistics present for these tables.

SORT MERGE JOIN

  
SQL> explain plan for SELECT *
  2  FROM PRODUCT, BUYER
  3  WHERE PRODUCT.ID = BUYER.PRODID;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
---------------------------------------------

---------------------------------------
| Id  | Operation           | Name    |
---------------------------------------
|   0 | SELECT STATEMENT    |         |
|   1 |  MERGE JOIN         |         |
|   2 |   SORT JOIN         |         |
|   3 |    TABLE ACCESS FULL| BUYER   |
|*  4 |   SORT JOIN         |         |
|   5 |    TABLE ACCESS FULL| PRODUCT |
---------------------------------------

Above plan tells us that CBO is opting for a Sort Merge join. In this type of joins, both tables are read individually and then sorted based on the join predicate and after that sorted results are merged together (joined).
  
Read Product ---> Sort by product id ------|
                                           |---> join
Read Buyer   ---> Sort by product id ------|
Joins are always a serial operation even though individual table access can be parallel.
Now let’s create some statistics for these tables and let’s check if CBO does something else than SORT MERGE join.

HASH JOIN

SQL> analyze table product compute statistics;

Table analyzed.

SQL> analyze table buyer compute statistics;

Table analyzed.

SQL> explain plan for SELECT *
  2  FROM PRODUCT, BUYER
  3  WHERE PRODUCT.ID = BUYER.PRODID;

Explained.
 
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------
Plan hash value: 2830850455

------------------------------------------------------
| Id  | Operation          | Name    | Rows  | Bytes |
------------------------------------------------------
|   0 | SELECT STATEMENT   |         | 25369 |  2279K|
|*  1 |  HASH JOIN         |         | 25369 |  2279K|
|   2 |   TABLE ACCESS FULL| PRODUCT | 15856 |  1254K|
|   3 |   TABLE ACCESS FULL| BUYER   |   159K|  1718K|
------------------------------------------------------
  
CBO chooses to use Hash join instead of SMJ once the tables are analyzed and CBO has enough statistics. Hash join is a comparatively new join algorithm which is theoretically more efficient than other types of joins. In hash join, Oracle chooses the smaller table to create an intermediate hash table and a bitmap. Then the second row source is hashed and checked against the intermediate hash table for matching joins. The bitmap is used to quickly check if the rows are present in hash table. The bitmap is especially handy if the hash table is too huge. Remember only cost based optimizer uses hash join.
Also notice the FTS operation in the above example. This may be avoided if we create some index on both the tables. Watch this,
SQL> create index idx_prod_id on product (id);

Index created.

SQL> create index idx_buyer_prodid on buyer (prodid);

Index created.

SQL>  explain plan for select product.id
  2  FROM PRODUCT, BUYER
  3  WHERE PRODUCT.ID = BUYER.PRODID;

Explained.

SQL>  select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------

------------------------------------------------------------------
| Id  | Operation             | Name             | Rows  | Bytes |
------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                  | 25369 |   198K|
|*  1 |  HASH JOIN            |                  | 25369 |   198K|
|   2 |   INDEX FAST FULL SCAN| IDX_PROD_ID      | 15856 | 63424 |
|   3 |   INDEX FAST FULL SCAN| IDX_BUYER_PRODID |   159K|   624K|
------------------------------------------------------------------

NESTED LOOP JOIN

There is yet another kind of joins called Nested Loop Join. In this kind of joins, each record from one source is probed against all the records of the other source. The performance of nested loop join depends heavily on the number of records returned from the first source. If the first source returns more record, that means there will be more probing on the second table. If the first source returns less record, that means, there will be less probing on the second table.
To show a nested loop, let’s introduce one more table. We will just copy the product table into a new table, product_new. All these tables will have index.
Now I write a simple query below,
  
select  *
from buyer, product, product_new
where buyer.prodid=product.id
and buyer.prodid = product_new.id;
And then I checked the plan. But the plan shows a HASH JOIN condition and not a NESTED LOOP. This is, in fact, expected because as discussed earlier hash-join is more efficient compared to other joins. But remember hash join is only used for cost based optimizer. So if I force Oracle to use rule based optimizer, I might be able to see nested joins. I can do that by using a query hint. Watch this,
SQL> explain plan for
  2   select /*+ RULE */ *
  3  from buyer, product, product_new
  4  where buyer.prodid=product.id
  5  and buyer.prodid = product_new.id;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
-----------------------------------------------------------
Plan hash value: 3711554028

-----------------------------------------------------------
| Id  | Operation                      | Name             |
-----------------------------------------------------------
|   0 | SELECT STATEMENT               |                  |
|   1 |  TABLE ACCESS BY INDEX ROWID   | PRODUCT          |
|   2 |   NESTED LOOPS                 |                  |
|   3 |    NESTED LOOPS                |                  |
|   4 |     TABLE ACCESS FULL          | PRODUCT_NEW      |
|   5 |     TABLE ACCESS BY INDEX ROWID| BUYER            |
|*  6 |      INDEX RANGE SCAN          | IDX_BUYER_PRODID |
|*  7 |    INDEX RANGE SCAN            | IDX_PROD_ID      |
-----------------------------------------------------------
Voila! I got nested loops! As you see, this time I have forced Oracle to use rule based optimizer by providing /*+ RULE */ hint. So Oracle has now no option but to use nested loops. As apparent from the plan, Oracle performs a full scan of product_new and index scans for other tables. First it joins buyer with product_new by feeding each row of buyer to product_new and then it sends the result set to probe against product.
Ok, with this I will conclude this article. The main purpose of this article and the earlier one was to make you familiar on Oracle query execution plans. Please keep all these ideas in mind because in my next article I will show how we can use this knowledge to better tune our SQL Queries. Stay tuned.