These notes cover only the basics of SQL programming. They do not cover the whole language. They do not cover database design.
First up is a brief history of the language, taken from Wikipedia:
Although there's an official standard, it's likely no one implements it 100%; all vendors have tons of extensions and tend to omit quite a bit of the standard they don't like. This situation might change as time goes on.
SQL is a programming language for relational databases. The most important, if not the only, data structure, is the table (sometimes called a relation):
-- Here is a short SQL script to create a few tables. create table Employee ( id integer not null primary key, name varchar(40) not null, departmentName varchar(20), floor integer, managerId integer, salary decimal); create table Department ( name varchar(20) not null primary key, managerId integer, foreign key (managerId) references Employee(id)); alter table Employee add foreign key (departmentName) references Department(name); create table Project ( id integer not null primary key, name varchar(30), budget decimal); create table Assignment ( employeeId integer not null, projectId integer not null, role varchar(35), primary key (employeeId, projectId), foreign key (employeeId) references Employee(id), foreign key (projectId) references Project(id)); create table Report ( id integer not null primary key, title varchar(128), departmentName varchar(20), foreign key (departmentName) references Department(name));
You can visualize the tables as two-dimensional data structures:
Project | ||
---|---|---|
id | name | budget |
|
Here Project
is the name of the table; id
, name
,
and budget
are
called the attributes of the table. Each row in the table represents
an instance of some particular project.
Add rows to a table with the insert
command:
insert into Project values (10, 'Longhorn', 250000.00);
Sometimes a particular attribute may be unknown or unnecessary for a particular row:
insert into Department values ('Planning', null); insert into Employee values (36, 'Milner', 'Planning', 2, 16, 43911.26); insert into Project values (42299, 'Surface Analysis', null); insert into Employee (name, floor, id, salary) values ('Coquand', 10, 33, 92188.00);
Null values often make things pretty ugly. A thorough treatement of nulls is beyond the scope of this introduction. But we will study them later.
SQL is a declarative language, as opposed to an imperative language. That is, the programmer states what she wants, not how to get it. Here are some sample queries:
select id, name from Employee where departmentName = 'Planning' and floor = 1; select name from Project where budget between 100000.00 and 5311882.88; select * from Assignment where employeeId = 21;
Sometimes you need to nest queries because information is contained in more than one table. Here is one way to find the names of all employees that work on project #121:
select name from Employee where id in ( select employeeId from Assignment where projectId = 121);
Here's another way:
select name from Employee, Assignment where Employee.id = Assignment.employeeId and projectId = 121;
or, perhaps better,
select name from Employee join Assignment on Employee.id = Assignment.employeeId and projectId = 121;
Sometimes it is nice to be able to abbreviate the table names, as in
select p.budget from Project p, Assignment a where p.id = a.projectId and (a.Role = 'Electrician' or a.employeeId != 24);
This is actually necessary when phrasing a query that refers to distinct occurence of a single table. Which employees make more than their own manager?
select id, name, salary from Employee e, Employee m where e.managerId = m.id and e.salary > m.salary;
You can have your result sorted on any column.
select * from Project where budget < 3200000.22 order by name;
Sometimes when an English query has words like "every" and "all" in it, the SQL command may become messy. To find out which employees work on every project you write:
select id, name from Employee e where not exists ( select * from Project p where not exists ( select * from Assignment a where e.id = a.employeeId and a.projectId = P.id));
There are five built-in functions in SQL that operate on
columns. They are min
, max
, avg
,
sum
and count
.
Here is how to find the minimum, maximum and average budget
of all projects:
select min(budget), avg(budget), max(budget) from Project;
How many managers are there?
select count(distinct managerId) from Employee;
What is the highest floor that an employee of the 'Mars' project works on?
select max(Floor) from Employee e, Project p, Assignment a where e.id = a.employeeId and p.id = a.projectId and p.name = 'Mars';
One nice feature of SQL is that it is easy to partition the rows of a table into groups and apply these functions to a column within a group. Let's get the average salary of all workers for each department:
select departmentName, avg(Salary) from Employee e group by departmentName;
Suppose you wanted the number of reports produced by each department:
select departmentName, count(*) from Report group by departmentName;
Remember that the where
clause filters out rows. Suppose we
wanted to get the average salary of employees for each department
except the Sales department. We write
select departmentName, avg(salary) from Employee e where departmentName != 'Sales' group by departmentName;
The having
clause filters groups. Lets do the previous
query again but only print those departments
with an average salary above 50000.
select departmentName, avg(salary) from Employee e where departmentName != 'Sales' group by departmentName having avg(Salary) > 50000.00;
Don't forget: the where
clause filters rows and
the having
clause filters groups.
Here is something you should memorize. The order in which the clauses of a query are executed is
Only the FROM and SELECT clauses are required. You can only have a HAVING if you have a GROUP BY (that one should be obvious!). The ORDER BY can not be used in a subquery.
Here is a query that uses all six cluases. Find for each department the number of employees that work in that department and the sum of their salaries. Do not consider the planning department nor any department managed by Wirth nor any department whose total salary expense is below 100000. Print the results in reverse alphabetical order by department name.
select departmentName, count(*), sum(salary) from Employee where departmentName != 'Planning' and departmentName not in ( select departmentName from Department d, Employee e where d.name = E.departmentName and e.name = 'Wirth') group by departmentName having sum(Salary) > 100000.00 order by departmentName desc;
If we wanted to sort the previous result by decreasing salary totals then the last clause would be:
order by 3 desc
where the 3 refers to the third column mentioned in
the select clause. Positional specification of columns is pretty brittle, since
adding a column in your output might cause you to change the order by
clause.
It is preferable to name the columns in the select clause:
select departmentName, count(*) as numEmployees, sum(salary) as totalSalary
which will allow you to write
order by totalSalary desc;
Here are the basics of updating and deleting:
delete from Employee where Floor = 3;
update Project set budget = budget * 1.05 where name = 'Engineering';
That last command effectively gave the Engineering department a 5% budget increase.
These notes have barely scratched the surface of the SQL language, so do enjoy the next steps in your study.
When you get good, try out these interview questions