How to think in SQL?

Have “SELECT * WHERE a=b FROM c” or “SELECT WHERE a=b FROM c ON *” ?

If you are like me, you will agree: SQL is one of those things that at first glance seem to be easy (read like English!), but for some reason have to Google every simple query to find the correct syntax.


And then start joiny, aggregation, subqueries, and it turns out quite rubbish. Like this:


the
SELECT members.firstname || '' || members.lastname
AS in "Full Name"
FROM borrowings
INNER JOIN members
ON members.memberid=borrowings.memberid
INNER JOIN books
ON books.bookid=borrowings.bookid
WHERE borrowings.bookid IN (SELECT bookid
FROM books
WHERE stock>(SELECT avg(stock)
FROM books))
GROUP BY members.firstname, members.lastname;

BUE! This will scare away any beginner, or even intermediate level developer, if he sees SQL for the first time. But not so bad.


Easy to remember what is intuitive, and with the help of this guide I hope to lower the entry threshold for newcomers to SQL and experienced to offer a new look at the SQL.


despite the fact that the SQL syntax is almost the same in different databases, this article is used to query PostgreSQL. Some examples will work in MySQL and other databases.


the

1. Three magic words


In many SQL keywords, but SELECT, FROM and WHERE are present in almost every query. Later you will realize that these three words represent the most fundamental aspects of building queries to the database, and other more complex queries are just add-ons over them.


the

2. Our database


Let's look at the database that we will use as an example in this article:





(the reference to the table)


we Have books and people. There is also a spreadsheet for recording books issued.


the
    the
  • In the table "books" stores information about title, author, date of publication and availability of the book. It's simple.
  • the
  • In the table “members” — the names of all students enrolled in the library.
  • the
  • In the table “borrowings” is stored information taken from library books. Column bookid refers to the identity taken in the books table “books”, and the column memberid refers to the appropriate person from the table “members”. We also have date of issue and the date when the book must be returned.

the

3. Simple request


Let's start with a simple request: we need names and identifiers (id) of all the books written by the author “Dan Brown”


Query is:


the
SELECT bookid AS id, title
FROM books
WHERE author='Dan Brown';

And the result is this:


the the the
id title
2 The Lost Symbol
4 Inferno

Pretty simple. Let's look at the query to understand what is happening.


the

3.1 FROM — where do you get the data


Now this may seem obvious, but FROM will be very important later when we get to the joins and subqueries.


FROM indicates the table for which you want to query. This can be an existing table (as in the example above), or a table that is created on the fly via the connection or subqueries.


the

3.2 WHERE — what data display


WHERE simply behaves like a filter on lines that we want to withdraw. In our case, we want to see only those rows where the value in column author is “Dan Brown”.


the

3.3 SELECT as display data


Now that we have all the needed columns from the desired table, you need to decide how to show these data. In our case we need only the names and IDs of the books, so that's what we choose with SELECT. At the same time you can rename a column using AS.



the

4. Connection (Jouni)


Now we want to see names (not necessarily unique) of all the books of Dan brown, which were taken from the library, and when these books need to be returned -


the
SELECT books.title AS Title, borrowings.returndate AS "Return Date"
Borrowings FROM JOIN books ON borrowings.bookid=books.bookid
WHERE books.author='Dan Brown';

Result:


the the the the
Title Return Date
The Lost Symbol 2016-03-23 00:00:00
Inferno 2016-04-13 00:00:00
The Lost Symbol 2016-04-19 00:00:00

most of the request is similar to the previous except section of the FROM. This means that we request data from another table. We do not apply any to the table “books” nor to the table “borrowings”. Instead, we appeal to the new table, which was created by the joining of these two tables.


JOIN books ON borrowings borrowings.bookid=books.bookid is count, a new table, which was formed by combining all the records from the tables "books" and "borrowings" in which the values of bookid are the same. The result of this merge will be:



And then we make the query for this table in the same way as in the example above. This means that when you join tables, you only have to worry about how to hold the connection. And then the query becomes as clear as in the case of "simple request" from paragraph 3.


Let's try a slightly more complicated connection with two tables.


Now we want to get the names of the people who took out library books by the author “Dan Brown”.


this time let's go from bottom to top:


Step 1 — how do you take data? To get the desired result, you need to join tables “member” and “books” with the table “borrowings”. Section of the JOIN would look like this:


the
borrowings
JOIN books ON borrowings.bookid=books.bookid
JOIN members ON members.memberid=borrowings.memberid

the Result of the join can see the link.


Step 2 — what data is displayed? We are interested in only the data where author is “Dan Brown”


the
WHERE books.author='Dan Brown'

Step 3 — how display the data? Now, when data is received, it is necessary simply to withdraw the name and surname of those who borrowed books:


the
SELECT
members.firstname AS "First Name",
members.lastname AS "Last Name"

Super! It remains only to combine the three components and we need to make query:


the
SELECT
members.firstname AS "First Name",
members.lastname AS "Last Name"
FROM borrowings
JOIN books ON borrowings.bookid=books.bookid
JOIN members ON members.memberid=borrowings.memberid
WHERE books.author='Dan Brown';

Which will give us:


the the the the
First Name Last Name
Mike Willis
Ellen Horton
Ellen Horton

Excellent! But the names are repeated (they are not unique). We will soon fix that.


the

5. Aggregation


Roughly speaking, aggregation need to convert multiple strings into one. Thus, during aggregation for different columns used different logic.


Let's continue with our example, in which there are duplicate names. It is seen that Ellen Horton took more than one book, but this is not the best way to show this information. Can I make another request:


the
SELECT
members.firstname AS "First Name",
members.lastname AS "Last Name",
count(*) AS "Number of books borrowed"
FROM borrowings
JOIN books ON borrowings.bookid=books.bookid
JOIN members ON members.memberid=borrowings.memberid
WHERE books.author='Dan Brown'
GROUP BY members.firstname, members.lastname;

What will give us the desired result:


the the the
First Name Last Name Number of books borrowed
Mike Willis 1
Ellen Horton 2

Almost all aggregation go along with the expression GROUP BY. This thing turns the table, which could be obtained by request, in the table group. Each group corresponds to a unique value (or set of values) column, we have indicated in the GROUP BY. In our example, we convert the result from the last exercise in the group of rows. We also aggregation with count, which converts multiple rows in an integer value (in our case the number of rows). Then this value is attributed to each group.



you Can come to the logical conclusion that all fields in the result must be either specified in the GROUP BY, or it should be aggregate. Because all the other fields can differ from each other in different rows, and if you choose their SELECT'th, it is unclear which of the possible values, you have to take.


In the example above, the function count processed all rows (since we counted the number of rows). Other functions like sum or max process only the specified rows. For example, if we want to know the number of books written by each author, you need this query:


the
SELECT author, sum(stock)
FROM books
GROUP BY author;

Result:


the the the the the
author sum
Robin Sharma 4
Dan Brown 6
John Green 3
Amish Tripathi 2

the function sum handles only column stock and considers the sum of all values in each group.


the

6. Subqueries



the Subquery is an ordinary SQL queries embedded in larger queries. They are divided into three types according to the type of the returned result.


the

6.1 two-Dimensional table


There are queries that return multiple columns. A good example is the query from the last exercise in aggregation. As a subquery, it will just return another table, in which you can do the new requests. Continuing the previous exercise, if we want to know the number of books written by the author of “Robin Sharma”, then one possible way is to use subqueries:


the
SELECT *
FROM (
SELECT author, sum(stock)
FROM books
GROUP BY author
) AS results
WHERE author='Robin Sharma';

Result:


the the
author sum
Robin Sharma 4

the

6.2 one-Dimensional array


Queries that return multiple rows one column, you can use not only as two-dimensional tables, but also as arrays.


Suppose we want to know the names and IDs of all books written by a specific author, but only if the library such books more than three. Break that down into two steps:


1. Get a list of authors with the number of books is greater than 3. Complementing our previous example:


the
SELECT author
FROM (
SELECT author, sum(stock)
FROM books
GROUP BY author
) AS results
WHERE sum > 3;

Result:


the the the
author
Robin Sharma
Dan Brown

Can be written as: ['Robin Sharma', 'Dan Brown']


2. Now we use this result in a new query:


the
SELECT title, bookid
FROM books
WHERE author IN (
SELECT author
FROM (
SELECT author, sum(stock)
FROM books
GROUP BY author
) AS results
WHERE sum > 3);

Result:


the the the the
title bookid
The Lost Symbol 2
Who Will Cry When You Die? 3
Inferno 4

This is the same as:


the
SELECT title, bookid
FROM books
WHERE author IN ('Robin Sharma', 'Dan Brown');

the

6.3 Individual values


there Are queries, which are only one row and one column. They can be treated as constant values and can be used wherever values are used, for example, in the comparison operators. You can also use them as two-dimensional tables or arrays of one element.


take, for example, get information about all the books, which number in the library is higher than the average value at this time.


the Average number can be obtained in this way:


the
select avg(stock) from books;

Which gives us:


the the
avg
3.000

And it can be used as a scalar value 3.


Now, finally, you can write the whole query:


the
SELECT *
FROM books
WHERE stock>(SELECT avg(stock) FROM books);

This is the same as:


the
SELECT *
FROM books
WHERE stock>3.000

the the
bookid title author published stock
3 Who Will Cry When You Die? Robin Sharma 2006-06-15 00:00:00 4

the

7. Write operation


Most write operations in a database is fairly simple when compared with more complex read operations.


the

7.1 Update


the syntax of the query UPDATE is semantically identical with the read request. The only difference is that instead of selecting columns, SELECT'th, we set znaeniya SET'th.


If all books Dan brown lost, you need to reset the count value. The query for this is:


the
UPDATE books
SET stock=0
WHERE author='Dan Brown';

WHERE does the same thing as before: selects the row. Instead of SELECT used when reading, we now use SET. However, you now need to specify not only the column name, but the new value for this column in the selected rows.


img


the

7.2 Delete


Request DELETE is simply a query SELECT or UPDATE with no column names. Seriously. As in the case of SELECT and UPDATE, WHERE remains the same: it selects the rows that you want to delete. The delete operation destroys the entire string, so does not make sense to specify a separate column. So, if we decide not to reset the number of books Dan brown's, and generally to delete all the records, it is possible to make such a request:


the
DELETE FROM books
WHERE author='Dan Brown';

the

7.3 Insert


Perhaps the only thing that differs from other types of queries, INSERT. The format is:


the
INSERT INTO x
(a,b,c)
VALUES
(x, y, z);

Where a, b, c is the column name and x, y and z are values that should be inserted in these columns, in the same order. That is, in principle, that's all.


take a Look at a concrete example. Here is a query with INSERT, which populates the whole table "books":


the
INSERT INTO books
(bookid,title,author,published,stock)
VALUES
(1,'Scion of Ikshvaku','Amish Tripathi','06-22-2015',2),
(2,'The Lost Symbol', Dan Brown','07-22-2010',3),
(3,'Who Will Cry When You Die?','Robin Sharma','06-15-2006',4),
(4,'Inferno','Dan Brown','05-05-2014',3),
(5,'The Fault in our Stars','John Green','01-03-2015',3);

the

8. Check


We approached the end, I propose a little test. Look at the query in the beginning of the article. Unable to understand it? Try to break it down into sections SELECT, FROM, WHERE, GROUP BY, and consider the individual components of the subqueries.


Here it is in more readable form:


the
SELECT members.firstname || '' || members.lastname AS "Full Name"

FROM borrowings
INNER JOIN members
ON members.memberid=borrowings.memberid
INNER JOIN books
ON books.bookid=borrowings.bookid

WHERE borrowings.bookid IN (SELECT bookid FROM books WHERE stock> (SELECT avg(stock) FROM books) )

GROUP BY members.firstname, members.lastname;

This query displays a list of people who took from the library a book whose total number is above average.


Result:


the the
Full Name
Lida Tyler

I Hope you were able to understand without problems. But if not, I will be glad to your comments and feedback so I can improve this post.

Article based on information from habrahabr.ru

Комментарии

Популярные сообщения из этого блога

March Habrameeting in Kiev

PostgreSQL load testing using JMeter, Yandex.Tank and Overload

Monitoring PostgreSQL with Zabbix