Intro
Recently, I took a deep dive into analytics to finish some case studies I started as part of my coursework for the Google Data Analytics Certificate. Part of that deep dive meant reviewing SQL queries, but it had been a while since I worked with them so I put together a cheat sheet of the basics.
In this post, you’ll find a list of the basic SQL queries, what they mean, and example code snippets. For these examples, I used Google’s BigQuery to practice. Depending on the SQL database you use the naming conventions of the columns and tables may be different, but the basic setup of the queries will be the same.
You can find the markdown version of this cheat sheet in my LearnInPublic repo on GitHub.
Reserved Words
SELECT
What it does:
Specifies or “selects” the column(s) you want to query
Can use the asterisk
*
if you want to select all the columns in the databasedatabaseName.columnName
→ for specifying which DB and column, useful when joining datasets
FROM
What it does:
Specifies which table to pull data from
At this stage, you can introduce an alias for column names by adding it to the end of the
FROM
clause. This is useful for making them easier to remember. For example,FROM Player_Attributes attributes
Example:
SELECT
*
FROM
`bigquery-public-data.usa_contagious_disease.project_tycho_reports`
Here we will get all of the columns from the table Project Tycho Reports of the USA Contagious Disease dataset.
As you can see there are 759,467 rows in the results. That’s a lot and this is where adding other clauses will help narrow down the results. Alternatively, we also have the option to export the data which then can be used with other analysis software or programming language (e.g., Python) to better understand it.
Instead of using the asterisk *
to select all the columns we could also select a single column name (or multiple column names separated by commas ,
):
SELECT
disease
FROM
`bigquery-public-data.usa_contagious_disease.project_tycho_reports`
WHERE
What it does:
Specifies the condition(s) for the data you want to see (e.g., all values greater than 100)
The percent symbol
%
can be used for searching stringsThe greater than
>
and less than<
symbols are used for looking for numerical valuesThe underscore
_
is used for searching for charactersThe
in
keyword performs an exact match while thelike
keyword performs a looser matchThere is also
between
,and
,or
,is null
,is not null
Example:
SELECT
*
FROM
`bigquery-public-data.usa_contagious_disease.project_tycho_reports`
WHERE
disease="MUMPS"
Here we can select all of the columns from the table, but the WHERE
clause specifies we only want the entries where the disease is MUMPS. This means other diseases are not included.
Here’s another example where we are looking for a numerical value. In this case, when the number of incidences of a disease per 100,000 people is greater than 1.
SELECT
*
FROM
`bigquery-public-data.usa_contagious_disease.project_tycho_reports`
WHERE
incidence_per_100000>1
We can even combine these statements with the and
keyword and look for the disease “MUMPS” where the incidence per 100,000 people is greater than 1.
SELECT
*
FROM
`bigquery-public-data.usa_contagious_disease.project_tycho_reports`
WHERE
disease="MUMPS" and incidence_per_100000>1
GROUP BY
What it does:
- We can use the fields or column names to group things together
Example:
SELECT
disease,
AVG(incidence_per_100000) as avg_incidence
FROM
`bigquery-public-data.usa_contagious_disease.project_tycho_reports`
GROUP BY
disease
Here we have a list of the average incidence per 100,000 people grouped by disease.
There are two things to note in this example:
I used the keyword
as
to give an alias to theavg_incidence
. This can be used when doing a calculation or with a column name that might be particularly long. This helps us make the column names more human-friendly.There is no ordering to this list, but we can fix that with our next example when we use the
ORDER BY
clause.
ORDER BY
What it does:
Here we can specify a column name to order the results by
By default, it’s in ascending order but we can add
DESC
to the end for descending order
Example:
SELECT
disease,
AVG(incidence_per_100000) as avg_incidence
FROM
`bigquery-public-data.usa_contagious_disease.project_tycho_reports`
GROUP BY
disease
ORDER BY
avg_incidence
Here we have a list of the average incidence per 100,000 people grouped by disease and ordered based on the incidence.
By default, it is listed from lowest to highest (ascending order). But we can also add the keyword desc
to the ORDER BY
clause in order to list it from highest to lowest (descending). See the second example below.
SELECT
disease,
AVG(incidence_per_100000) as avg_incidence
FROM
`bigquery-public-data.usa_contagious_disease.project_tycho_reports`
GROUP BY
disease
ORDER BY
avg_incidence desc
Conclusion
Those are the basic SQL queries you need to get started with querying a database. At the very least, you need to know how to use SELECT
and WHERE
so that you can pull data out. However, it’s helpful to know other clauses like GROUP BY
and ORDER BY
to narrow down or filter your data. This is particularly helpful with large datasets.
We also got a preview of how we can use functions, like AVG()
, to perform calculations and return the results in a column. This helps us get a glance at the shape of the data which might lead us to areas for deeper exploration.
This is just a quick start guide. Like all things technology, there’s a lot more you can do with SQL and I’m not an expert by any means. But I do enjoy learning and sharing what I learn as I go. If you’d like to learn more about SQL, check out the ‘References & Resources’ section. I’ve included a list of resources that were helpful to me when learning SQL and writing this post. Happy coding!
References & Resources
If you want to take a deeper dive into SQL queries, I found the following resource helpful while writing this post.
Learn Basic SQL in 15 Minutes (YouTube Tutorial by Adam Finer)
SQL Basics Cheat Sheet by Datacamp
SQL Cheat Sheet by Dataquest
SQL Style Guide by Simon Holywell
Thanks for reading
Thanks for reading this post. I hope you found it useful. If you have a favorite resource for studying SQL, share it in the comments. Hearing your thoughts makes this more of a conversation and helps us all learn.