HTML is a markup language used in:
HTML documents are rendered by both visual and aural user agents. Make sure to author documents that work properly on both kinds of agents. For example, don’t author documents in which color alone carries semantic meaning. And make sure to mark up abbreviations, language changes, and stress so that aural agents can read them properly.
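
As a small sketch of this advice (the sentence content is made up for illustration), the fragment below marks up an abbreviation with abbr, a language change with the lang attribute, and stress with em, and it signals a warning with strong instead of relying on color alone:

```html
<p lang="en">
  <!-- Importance is carried by the element, not by a color. -->
  <strong>Warning:</strong> the
  <abbr title="World Wide Web Consortium">W3C</abbr> validator
  <em>must</em> be run before publishing.
  <!-- The lang attribute tells aural agents that the language changes here. -->
  As they say in France, <i lang="fr">c'est la vie</i>.
</p>
```

An aural agent can use the title of the abbr element and the lang value to decide how to pronounce each piece; how it actually does so depends on the agent.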
These notes will only cover HTML5, but believe it or not there are documents on the WWW that use older versions. It’s fun to know what came before.
Version | Date | Notes | Docs |
---|---|---|---|
HTML 1.0 | 1991 | Early version, used before most people took note. | — |
HTML 2.0 | 1994 | First version to get an official spec. | W3C • RFC |
HTML 3.0 | — | Very ambitious. Never actually implemented. | — |
HTML 3.2 | 1997 | A scaling back of 3.0. | W3C |
HTML 4 | 1998 (4.0), 1999 (4.01) | More multimedia support, better accessibility, better internationalization. | W3C |
HTML 5 | 2008-present (Still evolving) | The modern version. | Living Standard |
If you like history, check out this HTML History covering 1989-1998 by Dave Raggett.
Differences between HTML5 and earlier versions
There was once a time when most documents on the web were written in HTML 4 or earlier, but that time has passed, and there is really no reason to learn the old versions anymore.
Here is a basic HTML document:
    <!doctype html>
    <html>
      <head>
        <meta charset="utf-8">
        <title>Hello</title>
      </head>
      <body>
        <h1>Welcome</h1>
        <p>Hello, world, this is my <a href="fido.html">dog</a>.
          <img src="fido.jpg" alt="fido">
          <!-- That wasn't too bad -->
        </p>
      </body>
    </html>
The first line is a doctype that identifies the document as HTML5. There are other doctypes that tell the browser to use a different version of HTML. If you omit the doctype, the browser will resort to quirks mode — a rendering mode that tries to make old web pages written for hacked and buggy browsers back in the day render sort of the way they were intended.
Never use quirks mode. It’s gross, but a necessary evil, for now. Always use `<!doctype html>`.
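
One way to check which mode a page ended up in is the document.compatMode property, which browsers set to "CSS1Compat" in standards mode and "BackCompat" in quirks mode. The page below is a minimal sketch; the element id `mode` is just a name chosen for this example:

```html
<!doctype html>
<html>
  <head>
    <meta charset="utf-8">
    <title>Mode check</title>
  </head>
  <body>
    <p>Rendering mode: <span id="mode"></span></p>
    <script>
      // "CSS1Compat" means standards mode; "BackCompat" means quirks mode.
      document.getElementById("mode").textContent = document.compatMode;
    </script>
  </body>
</html>
```

Deleting the first line and reloading should flip the reported value, which makes the effect of the doctype easy to see.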
After the doctype come the elements, defining a tree with root element `html`. Elements are written with start tags and end tags. Sometimes the end tag can be omitted and sometimes the start tag can be omitted, too! Elements may have attributes. The content of an element may include other elements, as well as comments, text, and a few other things.
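
To illustrate tag omission, here is a sketch of a complete document in which the start and end tags of html, head, and body are all left out, along with the end tags of li and p. The text content is invented for the example:

```html
<!doctype html>
<meta charset="utf-8">
<title>Omitted tags</title>
<h1 id="top" class="banner">Welcome</h1>
<ul>
  <li>First item
  <li>Second item
</ul>
<p>First paragraph
<p>Second paragraph, with a <a href="fido.html">link</a>
```

The parser still builds html, head, and body elements for this document; they are simply implied. Many authors write the tags out anyway for readability.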
The internal representation of an HTML document is called the DOM, which is short for “document object model.” The DOM is a tree data structure made up of the following types of nodes: element, attribute, text, CDATA section, entity reference, entity, processing instruction, comment, document, document type, document fragment, and notation. The HTML document above is represented with the following DOM (here we use the ⏎ character to represent newlines and the ‿ character to represent spaces):
    document
      doctype html
      element html
        text ⏎‿‿
        element head
          text ⏎‿‿‿‿
          element meta (charset=utf-8)
          text ⏎‿‿‿‿
          element title
            text Hello
          text ⏎‿‿
        text ⏎‿‿
        element body
          text ⏎‿‿‿‿
          element h1
            text Welcome
          text ⏎‿‿‿‿
          element p
            text Hello,‿world,‿this‿is‿my‿
            element a (href=fido.html)
              text dog
            text .⏎‿‿‿‿‿‿
            element img (src=fido.jpg alt=fido)
            text ⏎‿‿‿‿‿‿
            comment ‿That‿wasn't‿too‿bad‿
            text ⏎‿‿‿‿
          text ⏎‿‿
        text ⏎
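
A tree like this can also be inspected in a browser. The page below is a rough sketch, assuming a hypothetical helper named `dump` and an output element with id `out`; it walks childNodes recursively and prints each node's name, quoting the value of text nodes so that whitespace is visible:

```html
<!doctype html>
<html>
  <head>
    <meta charset="utf-8">
    <title>DOM walk</title>
  </head>
  <body>
    <p>Hello, <a href="fido.html">dog</a>.</p>
    <pre id="out"></pre>
    <script>
      // Return one line per node, indented by depth in the tree.
      function dump(node, depth) {
        let label = "  ".repeat(depth) + node.nodeName;
        if (node.nodeType === Node.TEXT_NODE) {
          // JSON.stringify makes newlines and spaces visible as \n and " ".
          label += " " + JSON.stringify(node.nodeValue);
        }
        let lines = [label];
        for (const child of node.childNodes) {
          lines = lines.concat(dump(child, depth + 1));
        }
        return lines;
      }
      document.getElementById("out").textContent = dump(document, 0).join("\n");
    </script>
  </body>
</html>
```

Because the script runs while the page is still being parsed, it only sees the nodes created up to that point. The tree a real parser builds may also differ in small ways from a hand-drawn one, particularly in where whitespace text nodes end up.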
It’s fine to draw DOM trees without the inter-element whitespace (the empty text nodes and the text nodes containing only whitespace that is not part of elements that can contain significant text). So the following picture would be more common:
The in-memory representation (DOM HTML) is what ultimately matters, and there are an infinite number of concrete syntaxes that can represent it. However, there are two official concrete syntaxes: the HTML syntax and the XML syntax.
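
For comparison, here is a sketch of a similar document written in the XML syntax. It would need to be served with an XML media type such as application/xhtml+xml to be parsed this way. Note the namespace declaration on the root element and the self-closing img tag, both of which XML requires:

```html
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Hello</title>
  </head>
  <body>
    <h1>Welcome</h1>
    <p>Hello, world, this is my <a href="fido.html">dog</a>.
      <img src="fido.jpg" alt="fido"/>
    </p>
  </body>
</html>
```

In the XML syntax no tags may be omitted, and a well-formedness error stops processing instead of being repaired by the parser.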
Here are big picture topics to keep in mind when learning HTML:
We’ll be going through a few code-alongs introducing the elements and attributes in the context of creating fun little web applications.
Quick reference time.
Here are the elements, grouped by category. (For a cooler list, linked to the HTML5 spec, see the Periodic Table of the Elements. And for an even cooler list with awesome popup code snippets, see HTML5 Doctor’s Element Index.)
ROOT | |
---|---|
html | The document root |
METADATA | |
head | The metadata container |
title | The document’s title |
base | The URL to use for resolving relative URLs |
link | A link from this document to another resource |
meta | Metadata not specifiable via title, base, link, style, or script |
style | Embedded styling information |
SCRIPTING | |
script | A script, either embedded or external |
noscript | Content activated only if scripting is disabled |
SECTIONING | |
body | Main document content |
section | Thematic grouping of content, typically with a heading, such as chapters in a book, or a web page’s introduction, news, and contact info sections. |
nav | A (major) section that contains navigation links |
article | Self-contained composition in a document, independently distributable (syndicatable) |
aside | Tangentially related content, such as would appear in a sidebar |
h1 | A “Level 1” section heading |
h2 | A “Level 2” section heading |
h3 | A “Level 3” section heading |
h4 | A “Level 4” section heading |
h5 | A “Level 5” section heading |
h6 | A “Level 6” section heading |
hgroup | A section heading that can contain multiple levels (e.g. headings and subheadings) |
header | A group of introductory or navigational aids for a document or section, such as a wrapper for a table of contents, logo, company name, and search form |
footer | A footer for its document or section, perhaps containing copyright info, author info, license agreements, etc. |
address | Contact info for nearest article or body ancestor |
GROUPING | |
p | Paragraph |
hr | Paragraph-level thematic break, such as a scene change in a story, or a transition to another topic within a section. |
pre | Preformatted text |
blockquote | A section quoted from another source |
ol | Ordered list |
ul | Unordered list |
li | List item |
dl | Description list (name/value groups), such as for questions/answers and terms/definitions |
dt | A name part of a name/value group in a description list |
dd | A value part of a name/value group in a description list |
figure | Content (such as illustrations, diagrams, photos, code listings) optionally with a caption, that is self-contained and is typically referenced as a single unit from the main flow of the document |
figcaption | A caption or legend for the rest of the contents of the figcaption element’s parent figure element, if any. |
div | Generic wrapper for a group of consecutive elements (should only be used as a last resort, when no existing element is suitable) |
TEXT-LEVEL SEMANTICS | |
a | A hyperlink, or a placeholder for a hyperlink |
em | Stress emphasis |
strong | Importance |
small | Side comments (e.g., fine print) |
s | No longer accurate or relevant |
cite | Title of a work, such as a book, paper, essay, poem, score, song, script, film, game, painting, play, musical, exhibition, or similar |
q | Content quoted from another source |
dfn | The defining instance of a term |
abbr | Abbreviation or acronym |
data | Content tagged with a machine-readable format (in the value attribute) |
time | A date, time, or datetime (human readable in content, machine readable in datetime attribute) |
code | A fragment of computer code |
var | A variable |
samp | (Sample) output from a computer program or system |
kbd | User input, typically keyboard input, but could be voice or other kind of input |
sub | Subscript |
sup | Superscript |
i | Text in an alternate voice or mood, e.g., foreign words, technical terms, terms from a taxonomy, ship names, stage directions in a script, thoughts, hand-written notes in a document, voice-overs. (Do not use if some other element such as em, strong, dfn, var, cite, q applies.) |
b | Text to which attention is being drawn for utilitarian purposes without conveying any extra importance and with no implication of an alternate voice or mood, such as key words in a document abstract, product names in a review, actionable words in interactive text-driven software, or an article lede |
u | Text with an unarticulated, though explicitly rendered, non-textual annotation, such as labeling the text as being a proper name in Chinese text (a Chinese proper name mark), or labeling the text as being misspelt |
mark | Text marked or highlighted for reference purposes, due to its relevance in another context |
ruby | Text spans containing ruby markup |
rt | The ruby text component of a ruby annotation |
rp | Parentheses around a ruby text component of a ruby annotation, to be shown by user agents that don’t support ruby annotations |
bdi | Text to be isolated from its surroundings for the purposes of bidirectional text formatting |
bdo | Bidirectional override |
span | Generic phrase-level wrapper |
br | Line break (Only used when the break is part of the content, as in a poem or address) |
wbr | Line break opportunity (usually inside of a very long word or source code line) |
EDITS | |
ins | An addition to the document |
del | A removal from the document |
EMBEDDING | |
img | An image |
iframe | A nested browsing context |
embed | An integration point for an external (typically non-HTML) application or interactive content |
object | An external resource, which, depending on the type of the resource, will either be treated as an image, as a nested browsing context, or as an external resource to be processed by a plugin |
param | A parameter for plugins invoked by object elements |
video | A video |
audio | A sound or audio stream |
source | Alternative media resource for a media element |
track | Explicit external timed text track for media elements |
canvas | A resolution-dependent bitmap canvas, which can be used for rendering graphs, game graphics, or other visual images on the fly |
map | An image map |
area | A hyperlink with some text and a corresponding area on an image map, or a dead area on an image map. |
TABULAR | |
table | A table |
caption | The title of a table |
colgroup | A group of table columns |
col | A column in a colgroup |
tbody | A block of rows making up the main content of a table |
thead | A block of rows making up the column labels (headers) of a table |
tfoot | A block of rows making up the column summaries (footers) of a table |
tr | Table row |
td | Table cell |
th | Header cell in a table |
FORMS | |
form | A collection of form-associated elements, some of which can represent editable values that can be submitted to a server for processing |
fieldset | A group of form controls optionally grouped under a common name |
legend | A caption for a fieldset |
label | A caption in a user interface, generally for a specific form control |
input | A typed data field, usually with a form control. The types include hidden, text, search, tel, url, email, password, datetime, date, month, week, time, datetime-local, number, range, color, checkbox, radio, file, submit, image, reset, button |
button | A button |
select | A control for selecting among a set of options |
datalist | A set of option elements that represent predefined options for other controls |
optgroup | A group of option elements with a common label |
option | An option in a select element or as part of a list of suggestions in a datalist element |
textarea | A multiline plain text edit control |
keygen | A key pair generator control; when the control’s form is submitted, the private key is stored in the local keystore, and the public key is packaged and sent to the server (this element has since been removed from the standard) |
output | The result of a calculation |
progress | The completion progress of a task |
meter | A scalar measurement within a known range, or a fractional value; for example disk usage, the relevance of a query result, or the fraction of a voting population to have selected a particular candidate |
INTERACTIVE | |
details | A disclosure widget from which the user can obtain additional information or controls |
summary | A summary, caption, or legend for the rest of the contents of the summary element’s parent details element, if any |
command | A command that the user can invoke |
menu | A list of commands |
dialog | Part of an application that a user interacts with (e.g., dialog box, inspector, window) |
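
To show how several of these elements fit together, here is a sketch of a small page. The file name `tricks.html`, the form action `/subscribe`, the date, and all of the text are placeholders invented for the example:

```html
<!doctype html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>Fido's Page</title>
  </head>
  <body>
    <header>
      <h1>Fido's Page</h1>
      <nav>
        <ul>
          <li><a href="fido.html">Home</a></li>
          <li><a href="tricks.html">Tricks</a></li>
        </ul>
      </nav>
    </header>
    <article>
      <h2>A day at the park</h2>
      <p>Posted <time datetime="2024-05-01">May 1, 2024</time>.</p>
      <figure>
        <img src="fido.jpg" alt="Fido catching a ball">
        <figcaption>Fido in mid-air</figcaption>
      </figure>
    </article>
    <aside>
      <p>Fido also enjoys <q>long walks on the beach</q>.</p>
    </aside>
    <form action="/subscribe" method="post">
      <fieldset>
        <legend>Subscribe</legend>
        <label for="email">Email</label>
        <input type="email" id="email" name="email" required>
        <button type="submit">Sign up</button>
      </fieldset>
    </form>
    <footer>
      <small>Copyright info goes here.</small>
    </footer>
  </body>
</html>
```

Each element is chosen for what the content means, not for how it looks; appearance is left to a stylesheet.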
You can’t just put any element inside any other. The rules are sometimes very complicated; see the HTML spec for details. The following image will give you an idea of some of the relationships, but not all:
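
As a small illustration of these content rules, the fragment below shows two nestings that are allowed and, inside comments, two that are not:

```html
<!-- Fine: ul contains only li children; each li may hold flow content. -->
<ul>
  <li><p>First point</p></li>
  <li><div>Second point</div></li>
</ul>

<!-- Fine: p contains only phrasing content such as a and em. -->
<p>See the <a href="fido.html"><em>dog</em></a>.</p>

<!-- Not allowed: div is not phrasing content, so it cannot go inside p.
     A parser ends the paragraph before the div instead of nesting it:
     <p>Intro <div>box</div></p>
-->

<!-- Not allowed: text and p directly inside ul; wrap them in li instead:
     <ul>Loose text <p>stray paragraph</p></ul>
-->
```

A validator will report the two commented-out cases as errors, even though browsers will still render something for them.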
Which attributes are allowed for which elements? Some, called the global attributes, are allowed on all elements. Some are allowed only on specific elements.
Element | Allowed Attributes |
---|---|
(ALL) | accesskey class contenteditable contextmenu dir draggable dropzone hidden id lang spellcheck style tabindex title onabort onblur oncanplay oncanplaythrough onchange onclick oncontextmenu oncuechange ondblclick ondrag ondragend ondragenter ondragleave ondragover ondragstart ondrop ondurationchange onemptied onended onerror onfocus oninput oninvalid onkeydown onkeypress onkeyup onload onloadeddata onloadedmetadata onloadstart onmousedown onmousemove onmouseout onmouseover onmouseup onmousewheel onpause onplay onplaying onprogress onratechange onreset onscroll onseeked onseeking onselect onshow onstalled onsubmit onsuspend ontimeupdate onvolumechange onwaiting |
html | manifest |
base | href target |
link | rel href media hreflang type sizes title |
meta | name http-equiv content charset |
style | media type scoped title |
script | src async defer type charset |
body | onafterprint onbeforeprint onbeforeunload onblur onerror onfocus onhashchange onload onmessage onoffline ononline onpagehide onpageshow onpopstate onredo onresize onscroll onstorage onundo onunload |
blockquote | cite |
ol | reversed start type |
li | value |
a | href target download ping rel media hreflang type |
q | cite |
data | value |
time | datetime |
ins | cite datetime |
del | cite datetime |
img | alt src srcset crossorigin usemap ismap width height |
iframe | src srcdoc name sandbox seamless width height |
embed | src type width height |
object | data type typemustmatch name usemap form width height |
param | name value |
video | src crossorigin poster preload autoplay mediagroup loop muted controls width height |
audio | src crossorigin preload autoplay mediagroup loop muted controls |
source | src type media |
track | kind src srclang label default |
canvas | width height |
map | name |
area | alt coords shape href target download ping rel media hreflang type |
colgroup | span |
col | span |
td | colspan rowspan headers |
th | colspan rowspan headers scope |
form | accept-charset action autocomplete enctype method name novalidate target |
fieldset | disabled form name |
label | form for |
input | accept alt autocomplete autofocus checked dirname disabled form formaction formenctype formmethod formnovalidate formtarget height inputmode list max maxlength min multiple name pattern placeholder readonly required size src step type value width |
button | autofocus disabled form formaction formenctype formmethod formnovalidate formtarget name type value |
select | autofocus disabled form multiple name required size |
optgroup | disabled label |
option | disabled label selected value |
textarea | autocomplete autofocus cols dirname disabled form inputmode maxlength name placeholder readonly required rows wrap |
keygen | autofocus challenge disabled form keytype name |
output | for form name |
progress | value max |
meter | value min max low high optimum |
details | open |
command | type label icon disabled checked radiogroup command |
menu | type label |
dialog | open |
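
The fragment below is a brief sketch of the difference between global and element-specific attributes. The ids, class names, URL, and text are made up for illustration:

```html
<!-- Global attributes (id, class, lang, title, hidden) work on any element. -->
<p id="intro" class="lead" lang="en" title="Opening paragraph">Hello</p>
<p hidden>Not rendered.</p>

<!-- Element-specific attributes, taken from the table above. -->
<ol start="3" type="a">
  <li>Labeled c</li>
  <li>Labeled d</li>
</ol>
<a href="fido.jpg" download>Download the photo</a>
<blockquote cite="https://example.com/source">Quoted text</blockquote>
<label for="qty">Quantity</label>
<input id="qty" name="qty" type="number" min="1" max="10" step="1" value="1" required>
<meter value="0.6" min="0" max="1" low="0.25" high="0.75" optimum="0.5">60%</meter>
```

Putting an element-specific attribute on the wrong element, such as href on a p, is an authoring error that a validator will flag; browsers just ignore the attribute.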
As HTML has evolved, elements have come and gone. Yes, some have gone. You might still see some of these in browsers. If you do, well, yikes. Just make sure YOU don’t ever use them. If you are copying and pasting code from someone else, be on the lookout for these obsolete features and replace them.
The elements that have been removed include acronym, applet, bgsound, dir, frame, frameset, noframes, isindex, keygen, listing, menuitem, nextid, noembed, plaintext, rb, rtc, strike, xmp, basefont, big, blink, center, font, marquee, multicol, nobr, spacer, tt.
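
When cleaning up pasted code, a few of these have straightforward substitutes. The sketch below shows some common replacements; the right choice depends on what the old markup was meant to convey, so treat these as starting points and not as a fixed mapping:

```html
<!-- Obsolete: <center><font color="red">Sale!</font></center> -->
<p style="text-align: center; color: red">Sale!</p>

<!-- Obsolete: <acronym title="...">HTML</acronym> -->
<abbr title="Hypertext Markup Language">HTML</abbr>

<!-- Obsolete: <strike>$20</strike> -->
<s>$20</s>

<!-- Obsolete: <tt>ls -l</tt> -->
<code>ls -l</code>

<!-- Obsolete: <big>Note</big> -->
<strong>Note</strong>
```

Purely presentational elements such as center and font are generally replaced with CSS, while elements that carried some meaning map to a current element with similar semantics.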
There are quite a few attributes that should no longer be used, too.
Here is the complete list of obsolete features.
We’ve covered: