NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL

Which method is best to select values present in one table but missing in another one?

This:
SELECT l.* FROM t_left l LEFT JOIN t_right r ON r.value = l.value WHERE r.value IS NULL

,this:
SELECT l.* FROM t_left l WHERE l.value NOT IN ( SELECT value FROM t_right r )

or this:
SELECT l.* FROM t_left l WHERE NOT EXISTS ( SELECT NULL FROM t_right r WHERE r.value = l.value )

Differences between the methods

These methods are quite different.

First of all, LEFT JOIN / IS NULL and NOT EXISTS are semantically equivalent, while NOT IN is not. These method differ in how they handle NULL values in t_right

LEFT JOIN is guaranteed to return every row from t_left, and then filtering is applied to the values returned from t_right. If for some row in t_left there is no corresponding row in t_right (which means no row with that exact value is present in t_right), the row from t_left will be returned once, and the NULL values will be substituted instead of t_right‘s actual values.

Since NULL values can never satisfy an equality JOIN condition, the NULL values returned by the query are guaranteed to be substituted by the LEFT JOIN, not fetched out of the actual t_right‘s row. This means that LEFT JOIN / IS NULL is guaranteed to return at most one row from t_left, and these row’s value is not equal to one of those in t_right.

The same holds for NOT EXISTS. Since it’s a predicate, not a JOIN condition, the rows from t_left can only be returned at most once too. EXISTS always returns TRUE or FALSE and it will return TRUE as soon as it finds only a single matching row in t_right, or FALSE, if it find none.

NOT EXISTS, therefore, will return TRUE only if no row satisfying the equality condition is found in t_right (same as for LEFT JOIN / IS NULL).

Note that NULL values do not safisfy the equality conditions, so both LEFT JOIN / IS NULL and NOT EXISTS will always return rows from t_left that have value set to NULL, even is there are rows with value IS NULL in t_right.

NOT IN, however, behaves differently.

IN predicate (unlike EXISTS) is trivalent, i. e. it can return TRUE, FALSE or NULL:

  • TRUE is returned when the non-NULL value in question is found in the list
  • FALSE is returned when the non-NULL value is not found in the list and the list does not contain NULL values
  • NULL is returned when the value is NULL, or the non-NULL value is not found in the list and the list contains at least one NULL
  • value

IN predicate does not give a definitive answer to whether or not the expression is contained in the list as long as there are NULL values on either side of the expression, returning NULL instead.

This of course makes no difference when using the positive form of NULL: predicates returning NULL are filtered out by the WHERE clause as well as those returning FALSE.

However, NOT IN is different, since negation of NULL is NULL as well.

That’s why NOT IN condition will never hold for any list with a NULL value in it.

  • If a row is found in the list, IN will return TRUE and NOT IN, therefore, will return FALSE
  • If a row is not found in the list, IN will return NULL, and NOT IN on its turn will also return NULL

Both conditions will of course be filtered out by the WHERE clause.

Summary

In SQL Server, NOT EXISTS and NOT IN predicates are the best way to search for missing values, as long as both columns in question are NOT NULL. They produce the safe efficient plans with some kind of an Anti Join.

LEFT JOIN / IS NULL is less efficient, since it makes no attempt to skip the already matched values in the right table, returning all results and filtering them out instead.



The original posting URL: http://explainextended.com/2009/09/15/not-in-vs-not-exists-vs-left-join-is-null-sql-server/

标签: none

添加新评论