Introduction to SQL

SQL is the most popular language for database applications on the planet. It is one of those must-know languages. Here is a very brief introduction. It doesn't cover everything, but it does get you started.

Overview

These notes cover only the basics of SQL programming. They do not cover the whole language. They do not cover database design.

First up is a brief history of the language, taken from Wikipedia:

1970 — Ted Codd writes a paper in the ACM describing how data can be described as a set of relations
Soon after — IBM develops a relational database system called System R, with a query language called SEQUEL (Structured English Query Language)
A little later — Language name shortened to SQL (pronouced es-kyu-el) since "SEQUEL" was trademarked.
1986 — SQL-86 standardized by ANSI
1987 — ISO ratifies ANSI SQL-86 and calls it SQL-87
1992 — SQL-92 (a.k.a SQL2) becomes a standard
1999 — SQL:1999 (a.k.a. SQL3) becomes a standard
2003 — SQL:2003 becomes a standard
2006 — new standard SQL:2006
2008 — mew standard SQL:2008
2011 — new standard SQL:2011

Although there's an official standard, it's likely no one implements it 100%; all vendors have tons of extensions and tend to omit quite a bit of the standard they don't like. This situation might change as time goes on.

Tables

SQL is a programming language for relational databases. The most important, if not the only, data structure, is the table (sometimes called a relation):

-- Here is a short SQL script to create a few tables.
create table Employee (
    id integer not null primary key,
    name varchar(40) not null,
    departmentName varchar(20),
    floor integer,
    managerId integer,
    salary decimal);

create table Department (
    name varchar(20) not null primary key,
    managerId integer,
    foreign key (managerId) references Employee(id));

alter table Employee
    add foreign key (departmentName) references Department(name);

create table Project (
    id integer not null primary key,
    name varchar(30),
    budget decimal);

create table Assignment (
    employeeId integer not null,
    projectId integer not null,
    role varchar(35),
    primary key (employeeId, projectId),
    foreign key (employeeId) references Employee(id),
    foreign key (projectId) references Project(id));

create table Report (
    id integer not null primary key,
    title varchar(128),
    departmentName varchar(20),
    foreign key (departmentName) references Department(name));

You can visualize the tables as two-dimensional data structures:

Project
id	name	budget

Here Project is the name of the table; id, name, and budget are called the attributes of the table. Each row in the table represents an instance of some particular project.

Inserting into Tables

Add rows to a table with the insert command:

insert into Project
values (10, 'Longhorn', 250000.00);

Sometimes a particular attribute may be unknown or unnecessary for a particular row:

insert into Department
values ('Planning', null);

insert into Employee
values (36, 'Milner', 'Planning', 2, 16, 43911.26);

insert into Project
values (42299, 'Surface Analysis', null);

insert into Employee (name, floor, id, salary)
values ('Coquand', 10, 33, 92188.00);

Null values often make things pretty ugly. A thorough treatement of nulls is beyond the scope of this introduction. But we will study them later.

Queries

SQL is a declarative language, as opposed to an imperative language. That is, the programmer states what she wants, not how to get it. Here are some sample queries:

select id, name
from Employee
where departmentName = 'Planning'
and floor = 1;

select name
from Project
where budget between 100000.00 and 5311882.88;

select *
from Assignment
where employeeId = 21;

Sometimes you need to nest queries because information is contained in more than one table. Here is one way to find the names of all employees that work on project #121:

select name
from Employee
where id in (
    select employeeId
    from Assignment
    where projectId = 121);

Here's another way:

select name
from Employee, Assignment
where Employee.id = Assignment.employeeId
and projectId = 121;

or, perhaps better,

select name
from Employee
join Assignment
on Employee.id = Assignment.employeeId
and projectId = 121;

Sometimes it is nice to be able to abbreviate the table names, as in

select p.budget
from Project p, Assignment a
where p.id = a.projectId
and (a.Role = 'Electrician' or a.employeeId != 24);

This is actually necessary when phrasing a query that refers to distinct occurence of a single table. Which employees make more than their own manager?

select id, name, salary
from Employee e, Employee m
where e.managerId = m.id
and e.salary > m.salary;

You can have your result sorted on any column.

select *
from Project
where budget < 3200000.22
order by name;

Sometimes when an English query has words like "every" and "all" in it, the SQL command may become messy. To find out which employees work on every project you write:

select id, name
from Employee e
where not exists (
    select *
    from Project p
    where not exists (
        select *
        from Assignment a
        where e.id = a.employeeId and a.projectId = P.id));

There are five built-in functions in SQL that operate on columns. They are min, max, avg, sum and count. Here is how to find the minimum, maximum and average budget of all projects:

select min(budget), avg(budget), max(budget)
from Project;

How many managers are there?

select count(distinct managerId)
from Employee;

What is the highest floor that an employee of the 'Mars' project works on?

select max(Floor)
from Employee e, Project p, Assignment a
where e.id = a.employeeId
and p.id = a.projectId
and p.name = 'Mars';

One nice feature of SQL is that it is easy to partition the rows of a table into groups and apply these functions to a column within a group. Let's get the average salary of all workers for each department:

select departmentName, avg(Salary)
from Employee e
group by departmentName;

Suppose you wanted the number of reports produced by each department:

select departmentName, count(*)
from Report
group by departmentName;

Remember that the where clause filters out rows. Suppose we wanted to get the average salary of employees for each department except the Sales department. We write

select departmentName, avg(salary)
from Employee e
where departmentName != 'Sales'
group by departmentName;

The having clause filters groups. Lets do the previous query again but only print those departments with an average salary above 50000.

select departmentName, avg(salary)
from Employee e
where departmentName != 'Sales'
group by departmentName
having avg(Salary) > 50000.00;

Don't forget: the where clause filters rows and the having clause filters groups.

Here is something you should memorize. The order in which the clauses of a query are executed is

FROM — chooses the desired tables
WHERE — filters the rows you want
GROUP BY — partitions the rows
HAVING — filters the groups you want
SELECT — filters the columns you want
ORDER BY — sorts

Only the FROM and SELECT clauses are required. You can only have a HAVING if you have a GROUP BY (that one should be obvious!). The ORDER BY can not be used in a subquery.

Here is a query that uses all six cluases. Find for each department the number of employees that work in that department and the sum of their salaries. Do not consider the planning department nor any department managed by Wirth nor any department whose total salary expense is below 100000. Print the results in reverse alphabetical order by department name.

select departmentName, count(*), sum(salary)
from Employee
where departmentName != 'Planning' and departmentName not in (
    select departmentName
    from Department d, Employee e
    where d.name = E.departmentName and e.name = 'Wirth')
group by departmentName
having sum(Salary) > 100000.00
order by departmentName desc;

If we wanted to sort the previous result by decreasing salary totals then the last clause would be:

order by 3 desc

where the 3 refers to the third column mentioned in the select clause. Positional specification of columns is pretty brittle, since adding a column in your output might cause you to change the order by clause. It is preferable to name the columns in the select clause:

select departmentName, count(*) as numEmployees, sum(salary) as totalSalary

which will allow you to write

order by totalSalary desc;

Updating and Deleting

Here are the basics of updating and deleting:

delete
from Employee
where Floor = 3;

update Project
set budget = budget * 1.05
where name = 'Engineering';

That last command effectively gave the Engineering department a 5% budget increase.

More...

These notes have barely scratched the surface of the SQL language, so do enjoy the next steps in your study.

When you get good, try out these interview questions