Again, a relation is made up of a schema and an instance, which can be conceptualized as a mathematical set
A query language is a set of instructions to manipulate and retrieve data. It is not a full programming language and is not designed for complex combinations of operations
- Relational algebra is a more operational way of representing execution plans
- Relational calculus lets users describe what they want, non-procedurally
Relational Algebra
There are five operators,
- Projection selects a subset of columns in a relation
- Selection retains only wanted rows from a relation
- Cross-product allows us to combine two relations
- Set-difference finds tuples only found on the LHS
- Union finds tuples found on the LHS and/or the RHS (requires compatible relations)
In a theoretical sense, all operations work on sets and cannot output multiple of the same value. In reality, systems do not generally work this way.
Here’s some example notation:
Cross-product is interesting but not useful for now. Instead , the natural join, is very useful. It is not listed before because it is actually a compound operator. Natural join basically outputs a new relation with the columns from both and the rows that have some shared field (generally an ID) in both input relations.
Another operator, “Theta join” is like natural join but with a specific matching condition. For example,
A note about efficiency with joins: In terms of relational algebra, there are many ways to write out a query, and some would theoretically be faster by reducing the amount of rows going into a join. However, SQL is very clever and will automatically optimize queries.
Another useful compound operator is “division” . finds the rows of that contain the rows of .
SQL
SQL lets you express relational algebra quite easily.
SELECT [DISTINCT] <fields>
FROM <table> <alias>, ...
WHERE alias.<field> !=/=/</>/<=/>=/AS/LIKE/IN <value> AND/OR ...SQL also supports string comparison operations <, >, and LIKE which matches single characters, ’_’ as any character, and ’%’ as multiple characters.
Essentially, FROM computes the cross-product, WHERE checks conditions, SELECT chooses fields, and DISTINCT eliminates duplicate rows. Note, this is only a conceptual way of looking at it, in reality SQL chooses an optimal ordering of operations.
SQL supports subqueries that can be used with set operators, like UNION, INTERSECT, and EXCEPT. Additionally, WHERE clause can contain a subquery. Comparisons can also be used with ANY/SOME
A subquery is correlated if it references something from the full query’s context.
Aggregate operators take in sets as input and output single numbers
COUNT(*)
COUNT([DISTINCT] A)
SUM(A)
AVG(A)
MAX(A)
MIN(A)MAX only returns a single value. Therefore, to fetch entries with the max, you’d need something like,
SELECT S.name, S.age
FROM Sailors S
WHERE S.age = (SELECT MAX(S2.age) FROM Sailors S2)The output of a query can also be sorted and/or limited
SELECT *
FROM <table>
ORDER BY <field> ASC/DESC
LIMIT <num>Of course, SQL supports joins, and many variations. The default is INNER.
SELECT <fields>
FROM <table> [LEFT/RIGHT/FULL] [INNER/NATURAL/OUTER] JOIN <table>
ON <condition>
SELECT <fields>
FROM <table> NATURAL JOIN <table>NATURAL performs the join on all columns with the same name and datatype, without using an ON clause
LEFT OUTER returns all matched rows as well as all unmatched rows from the table on the left of the join clause (RIGHT does that with the rows of the right table, FULL does both)
GROUP BY and HAVING add additional ways of processing queries.
SELECT [DISTINCT] <fields>
FROM <tables>
WHERE <conditions>
GROUP BY <fields>
[HAVING <set-condition>]This will output a tuple for each set grouped according to the GROUP BY fields list that matches an aggregate condition.