Database Models

There was a time when people said “database” they meant “relational database.” Those days are gone.

Key-Value Stores

Key-value databases are mappings of keys to values. For example:

testingfalse
domain"lmu.edu"
max_content_length2097152
commands["configure", "make", "make install"]
sales:2016:1189432
sales:2016:1291123
sales:2017:019078
sales:2017:028819
available_colors{"blue", "turquoise", "violet", "cyan", "seagreen"}
employees_from{"CA": 293, "NV": 11, "RI": 12, "HI": 17}
leaderboard{marko(872), kat(711), meixian(733), alex(687), sean(599)}

Keys are generally strings. Values can be numbers, booleans, strings, lists, sets, sorted sets, or hashes.

You generally can’t query on partial keys, e.g., you can’t easily ask for all sales in 2016.

Popular Key-Value Stores:

Document Stores

Document databases are generally made up of collections, each holding documents. Documents are generally free-form (nested) objects. For example, here are some documents in a movies collection:

{
  "_id": 4647782, "type": "Feature Film", "status": "Released",
  "people": [{"person_id": 42297, "role": "Director", "group": "Director"}],
  "countries": ["United States", "Jamaica"],
  "titles": [{"main": true, "name": "Still Moving: Patti Smith"}]
}
{
  "_id": 4647783, "type": "Feature Film", "status": "Released",
  "people": [{"person_id": 36075, "role": "Actor", "group": "Cast"}],
  "countries": ["Italy"],
  "titles": [{"main": true, "name": "Ultimatum"}]
}
{
  "_id": 4648539, "type": "Feature Film", "status": "Released",
  "ratings": [{"source": "MPAA", "value": "PG"}],
  "countries": ["United States"],
  "box_office": 2071091.0,
  "people": [
    {"person_id": 38316, "role": "Actor", "group": "Cast"},
    {"person_id": 31894, "role": "Actor", "group": "Cast", "credit": "cameo"},
    {"person_id": 38193, "role": "Actor", "group": "Cast", "credit": "cameo"},
    {"person_id": 40836, "role": "Actor", "group": "Cast"},
    {"person_id": 41626, "role": "Director", "group": "Director"},
    {"person_id": 32279, "role": "Executive Producer", "group": "Production"},
    {"person_id": 34607, "role": "Producer", "group": "Production"},
    {"person_id": 34547, "role": "Associate Producer", "group": "Production"},
    {"person_id": 41626, "role": "Screenplay", "group": "Writing"},
    {"person_id": 34547, "role": "Screenplay", "group": "Writing"},
    {"person_id": 39554, "role": "Cinematographer", "group": "Camera, Film, Tape"},
    {"person_id": 39954, "role": "Music", "group": "Music"},
    {"person_id": 34477, "role": "Music", "group": "Music"},
    {"person_id": 44020, "role": "Music Producer", "group": "Music"}
  ],
  "companies": [
    {"role": "Studio", "group": "Studio", "company_id": 46232},
    {"role": "Domestic Theatrical Distributor", "group": "Distribution", "company_id": 46208},
    {"role": "Domestic Video Distributor", "group": "Distribution", "company_id": 45788}
  ],
  "locations": [
    "Costa Rica", "South Africa", "Australia", "New Zealand", "Fiji", "Tahiti,
    French Polynesia", "Java, Indonesia", "Bali, Indonesia", "Hawaii, USA"
  ],
  "titles": [
    {"main": true, "type": "name": "Endless Summer II"},
    {"name": "Bruce Brown's The Endless Summer 2"}
  ]
}

Queries might look like

films.find({"ratings.value":"G"}, {titles:{$elemMatch:{main:true}}}).limit(5)

When collections are independent of each other, everything is great. Otherwise, you have to (1) repeat nested objects or (2) use references. Querying across collections is generally not well supported by the native query language; you have to do such things programmatically. If you are doing too many of these joins, you probably don’t want to use a document store!

Popular Document Stores:

Relational Stores

Relational databases are generally made up of tables, each holding rows. Rows represent objects. Unlike document databases, where documents can contain anything, each row in a table has only the properties of that table. We say relational datbases have a fixed schema while document databases have a flexible schema. (Of course, nothing stops you from making a table with columns holding JSON....)

Here is the database we saw earlier:

Author
idnamecountry
1J.D. SalingerUSA
2F. Scott. FitzgeraldUSA
3Jane AustenUK
4Scott HanselmanUSA
5Jason N. GaylordUSA
6Pranav RastogiIndia
7Todd MirandaUSA
8Christian WenzUSA
Book
idtitle
1The Catcher in the Rye
2Nine Stories
3Franny and Zooey
4The Great Gatsby
5Tender id the Night
6Pride and Prejudice
7Professional ASP.NET 4.5 in C# and VB
BookAuthor
bookIdauthorId
11
21
31
42
52
63
74
75
76
77
78

Expect lots of joins, for example:

select a.name
from author a
join bookauthor ba
on a.id = ba.authorId
join book b
on b.id = ba.bookId
where b.title like 'Professional ASP.NET%';

Popular Relational Stores:

Graph Stores

A graph store just might be the thing to use if you have highly connected data. (In some sense, these are the real “relational” databases). In a graph database, you have nodes (that can have labels and properties) which are connected with relations (which can themselves have labels and properties). You can connect any node to any other.

example-graph-database.png

Queries look pretty cool:

match (p)-[a:ACTED_IN]->(:Movie {title: "A League of Their Own"})
return p.name, a.roles

Popular Graph Stores:

Column Stores

These are like relational databases, except they are stored in column groups, not row by row. This makes a difference in terms of what kinds of things you can do efficiently.

Popular Column Stores:

Object Stores

Object databases have been around for a long time. There were books about OODBs back in the 1980s and 1990s, including Kim and Hughes.

Read about them at Wikipedia.

Popular Object Stores:

There are “hybrid” Object-relational databases, which are really relational databases with some object features (generally mostly in the query language...), e.g.,

select b.author.name from book b where b.title = 'The Catcher in the Rye'

Triple Stores

A triple store is optimized for querying triples of the form (subject, predicate, object). Example:

Example

Looks a bit like a subset of a graph database.

Some triple stores can be queried with SPARQL.

Popular Triple Stores: