Subqueries and JOINs
A subquery can quite often, but not in all cases, be rewritten as a JOIN.
Rewriting Subqueries as JOINS
A subquery using IN
can be rewritten with the DISTINCT
keyword, for example:
SELECT * FROM table1 WHERE col1 IN (SELECT col1 FROM table2);
can be rewritten as:
SELECT DISTINCT table1.* FROM table1, table2 WHERE table1.col1=table2.col1;
NOT IN
or NOT EXISTS
queries can also be rewritten. For example, these two queries returns the same result:
SELECT * FROM table1 WHERE col1 NOT IN (SELECT col1 FROM table2); SELECT * FROM table1 WHERE NOT EXISTS (SELECT col1 FROM table2 WHERE table1.col1=table2.col1);
and both can be rewritten as:
SELECT table1.* FROM table1 LEFT JOIN table2 ON table1.id=table2.id WHERE table2.id IS NULL;
Subqueries that can be rewritten as a LEFT JOIN are sometimes more efficient.
Using Subqueries instead of JOINS
There are some scenarios, though, which call for subqueries rather than joins:
- When you want duplicates, but not false duplicates. Suppose
Table_1
has three rows — {1
,1
,2
} — andTable_2
has two rows — {1
,2
,2
}. If you need to list the rows inTable_1
which are also inTable_2
, only this subquery-basedSELECT
statement will give the right answer (1
,1
,2
):
SELECT Table_1.column_1 FROM Table_1 WHERE Table_1.column_1 IN (SELECT Table_2.column_1 FROM Table_2);
- This SQL statement won't work:
SELECT Table_1.column_1 FROM Table_1,Table_2 WHERE Table_1.column_1 = Table_2.column_1;
- because the result will be {
1
,1
,2
,2
} — and the duplication of 2 is an error. This SQL statement won't work either:
SELECT DISTINCT Table_1.column_1 FROM Table_1,Table_2 WHERE Table_1.column_1 = Table_2.column_1;
- because the result will be {
1
,2
} — and the removal of the duplicated 1 is an error too.
- When the outermost statement is not a query. The SQL statement:
UPDATE Table_1 SET column_1 = (SELECT column_1 FROM Table_2);
- can't be expressed using a join unless some rare SQL3 features are used.
- When the join is over an expression. The SQL statement:
SELECT * FROM Table_1 WHERE column_1 + 5 = (SELECT MAX(column_1) FROM Table_2);
- is hard to express with a join. In fact, the only way we can think of is this SQL statement:
SELECT Table_1.* FROM Table_1, (SELECT MAX(column_1) AS max_column_1 FROM Table_2) AS Table_2 WHERE Table_1.column_1 + 5 = Table_2.max_column_1;
- which still involves a parenthesized query, so nothing is gained from the transformation.
- When you want to see the exception. For example, suppose the question is: what books are longer than Das Kapital? These two queries are effectively almost the same:
SELECT DISTINCT Bookcolumn_1.* FROM Books AS Bookcolumn_1 JOIN Books AS Bookcolumn_2 USING(page_count) WHERE title = 'Das Kapital'; SELECT DISTINCT Bookcolumn_1.* FROM Books AS Bookcolumn_1 WHERE Bookcolumn_1.page_count > (SELECT DISTINCT page_count FROM Books AS Bookcolumn_2 WHERE title = 'Das Kapital');
- The difference is between these two SQL statements is, if there are two
editions of Das Kapital (with different page counts), then the self-join
example will return the books which are longer than the shortest edition
of Das Kapital. That might be the wrong answer, since the original
question didn't ask for "... longer than
ANY
book named Das Kapital" (it seems to contain a false assumption that there's only one edition).
Content reproduced on this site is the property of its respective owners,
and this content is not reviewed in advance by MariaDB. The views, information and opinions
expressed by this content do not necessarily represent those of MariaDB or any other party.