HTML is a markup language used in:
HTML documents are rendered by both visual and aural user agents. Make sure to author documents that work properly on both kinds of agents. For example, don’t author documents in which color alone carries semantic meaning. And make sure to mark up abbreviations, language changes, and stress so that aural agents can read them properly.
Read Dave Raggett’s A history of HTML, covering the period 1989-1998.
HTML has undergone a few revisions since its inception. The interesting versions are:
Version | Date | Notes | Docs |
---|---|---|---|
HTML 1.0 | 1991 | Early version, used before most people took note. | — |
HTML 2.0 | 1994 | First version to get an official spec. | W3C • RFC |
HTML 3.0 | — | Very ambitious. Never actually implemented. | — |
HTML 3.2 | 1997 | A scaling back of 3.0. | W3C |
HTML 4 | 1998 (4.0), 1999 (4.01) | More multimedia support, better accessibility, better internationalization. | W3C |
HTML5 | 2008-present (still evolving) | The modern version. | Living Standard |
There was once a time when most documents on the web were written in HTML 4 or earlier, but that time has long passed, and there is really no reason to learn the old versions anymore.
HTML5 is current. You should not author any new documents with earlier versions. You should be aware that they exist, though.
The official specification is known as the HTML Living Standard.
It is maintained by the WHATWG (Web Hypertext Application Technology Working Group).
Since you’ve already written your own simple web apps, let’s look at HTML through a computer science (language) lens, beginning with the question “what is the basic structure of an HTML document?”
The surface syntax of an HTML document, like that of any computer-friendly language, is a sequence of characters collected into tokens and assembled into phrases according to some rules. Here is an example document:
```html
<!doctype html>
<html>
  <head>
    <meta charset="utf-8">
    <title>Hello</title>
  </head>
  <body>
    <h1>Welcome</h1>
    <p>Hello, world, this is my <a href="fido.html">dog</a>.
      <img src="fido.jpg" alt="fido">
      <!-- That wasn't too bad -->
    </p>
  </body>
</html>
```
The first line is a doctype that identifies the document as HTML5. Other doctypes exist, left over from earlier versions of HTML and XHTML. Browsers do not switch HTML versions based on them, but they do use the doctype to choose a rendering mode. If you omit the doctype, the browser will resort to quirks mode. This rendering mode tries to make old web pages, written back in the day for hacked and buggy browsers, render roughly the way their authors intended.
Never use quirks mode. (It’s gross, but a necessary evil, for now.)
Always use `<!doctype html>`.
After the doctype come the elements, which form a tree whose root element is html. Elements are written with start tags and end tags. Sometimes the end tag can be omitted, and sometimes the start tag can be omitted, too! Elements may have attributes. The content of an element may include other elements, as well as comments, text, and a few other things.
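You can watch tokens like these go by with Python’s standard-library `html.parser` module. The sketch below is minimal. `html.parser` works at the tokenizer level: it reports doctypes, start tags with their attributes, end tags, text, and comments in the order it meets them. It does not build a tree, and it does not apply HTML5’s rules for omitted tags. The class name `TokenLister` and the tuple format are made up for this illustration.

```python
from html.parser import HTMLParser


class TokenLister(HTMLParser):
    """Records each token html.parser reports, as a small tuple."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_decl(self, decl):
        # For "<!doctype html>", decl is the text between "<!" and ">".
        self.tokens.append(("doctype", decl))

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; attrs is a list of (name, value) pairs.
        self.tokens.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.tokens.append(("end", tag))

    def handle_data(self, data):
        if data.strip():  # skip whitespace-only text to keep the listing short
            self.tokens.append(("text", data))

    def handle_comment(self, data):
        self.tokens.append(("comment", data))


def tokenize(source):
    lister = TokenLister()
    lister.feed(source)
    lister.close()
    return lister.tokens


sample = '<!doctype html><p>Hi <a href="fido.html">dog</a><!-- nice --></p>'
for token in tokenize(sample):
    print(token)
```

A browser’s parser goes one step further. It feeds tokens like these into a separate tree-construction stage, and that stage is where omitted start and end tags get filled in.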
There are multiple definitions of the surface syntax. (It turns out that there are different syntaxes in which to write HTML documents!)
The two official syntaxes are the HTML syntax and the XML syntax.
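One visible difference between the two syntaxes is strictness. The HTML syntax lets a void element such as br go unclosed, while the XML syntax requires every element to be closed, here written as `<br/>`. This minimal sketch uses Python’s `xml.etree.ElementTree` as a stand-in XML parser. It checks XML well-formedness only. It is not a full conformance check for HTML’s XML syntax, which also involves the XHTML namespace.

```python
import xml.etree.ElementTree as ET

# The same fragment written in each syntax.
html_syntax = "<p>Roses are red,<br>violets are blue</p>"   # br left unclosed
xml_syntax = "<p>Roses are red,<br/>violets are blue</p>"   # br self-closed


def is_well_formed_xml(source):
    """Return True if an XML parser accepts source, False otherwise."""
    try:
        ET.fromstring(source)
        return True
    except ET.ParseError:
        return False


print(is_well_formed_xml(html_syntax))  # False: </p> arrives while <br> is still open
print(is_well_formed_xml(xml_syntax))   # True
```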
The internal representation of an HTML document is called the DOM, which is short for “Document Object Model.” The DOM is a tree data structure made up of the following types of nodes: element, attribute, text, cdata, entity reference, entity, processing instruction, comment, document, document type, document fragment, and notation. (The entity reference, entity, and notation types are now legacy.) The HTML document above is represented with the following DOM (here we use the ⏎ character to represent newlines and the ‿ character to represent spaces):
```
document
  doctype html
  element html
    text ⏎‿‿
    element head
      text ⏎‿‿‿‿
      element meta (charset=utf-8)
      text ⏎‿‿‿‿
      element title
        text Hello
      text ⏎‿‿
    text ⏎‿‿
    element body
      text ⏎‿‿‿‿
      element h1
        text Welcome
      text ⏎‿‿‿‿
      element p
        text Hello,‿world,‿this‿is‿my‿
        element a (href=fido.html)
          text dog
        text .⏎‿‿‿‿‿‿
        element img (src=fido.jpg alt=fido)
        text ⏎‿‿‿‿‿‿
        comment ‿That‿wasn't‿too‿bad‿
        text ⏎‿‿‿‿
      text ⏎‿‿
    text ⏎
```
It’s fine, and more common, to draw DOM trees without the inter-element whitespace. That means leaving out the empty text nodes, and also the whitespace-only text nodes that sit outside elements that can contain significant text.
The in-memory representation (the DOM), not the surface syntax, is what ultimately matters.
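To see where whitespace-only text nodes come from, here is a sketch that builds a simplified DOM-like tree on top of Python’s `html.parser`. It prints the tree either with every text node or with the whitespace-only ones dropped. The `Node`, `TreeBuilder`, `build`, and `render` names are hypothetical, and so is the partial `VOID` set. A browser’s HTML5 parser follows more detailed tree-construction rules: it infers omitted tags and relocates some whitespace. This sketch’s output can therefore differ from a real DOM.

```python
from html.parser import HTMLParser

# Void elements have no end tag, so they never become parents.
# Partial list, enough for small examples.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class Node:
    def __init__(self, kind, label, parent=None, tag=None, raw=""):
        self.kind, self.label, self.tag, self.raw = kind, label, tag, raw
        self.children = []
        if parent is not None:
            parent.children.append(self)


class TreeBuilder(HTMLParser):
    """Builds a simplified DOM-like tree; no HTML5 tree-construction rules."""

    def __init__(self):
        super().__init__()
        self.root = Node("document", "")
        self.stack = [self.root]  # currently open nodes, innermost last

    def handle_decl(self, decl):
        Node("doctype", decl.split(None, 1)[-1], self.stack[-1])

    def handle_starttag(self, tag, attrs):
        pairs = [name if value is None else f"{name}={value}" for name, value in attrs]
        label = f"{tag} ({' '.join(pairs)})" if pairs else tag
        node = Node("element", label, self.stack[-1], tag=tag)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        shown = data.replace("\n", "⏎").replace(" ", "‿")
        Node("text", shown, self.stack[-1], raw=data)

    def handle_comment(self, data):
        Node("comment", data.replace(" ", "‿"), self.stack[-1])


def build(source):
    builder = TreeBuilder()
    builder.feed(source)
    builder.close()
    return builder.root


def render(node, skip_whitespace=False, depth=0):
    """Return indented lines; optionally drop whitespace-only text nodes."""
    lines = [("  " * depth + f"{node.kind} {node.label}").rstrip()]
    for child in node.children:
        if skip_whitespace and child.kind == "text" and not child.raw.strip():
            continue
        lines.extend(render(child, skip_whitespace, depth + 1))
    return lines


doc = '<!doctype html>\n<html>\n  <body>\n    <p>Hi <a href="fido.html">dog</a></p>\n  </body>\n</html>\n'
print("\n".join(render(build(doc))))                        # every text node
print("\n".join(render(build(doc), skip_whitespace=True)))  # the usual picture
```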
Here are big picture topics to keep in mind when learning HTML:
Quick reference time. Let’s face it. The elements are the central thing. So why not see the big list?
Here are the elements, grouped by category. I’ve tried to be up-to-date, but might have missed some. You can always find the complete list by going to the Elements section of the HTML Living Standard.
There are actually some cooler lists you might want to browse. One is the Periodic Table of the Elements. HTML5 Doctor’s Element Index is really useful too. And of course, there is MDN’s HTML Elements Reference.
ROOT | |
---|---|
html | The document root element |
METADATA | |
head | The metadata container |
title | The document’s title |
base | The URL to use for resolving relative URLs |
link | A link from this document to another resource |
meta | Metadata not specifiable via title, base, link, style, or script |
style | Embedded styling information |
SECTIONING | |
body | Main document content |
article | Self-contained composition in a document, independently distributable (syndicatable) |
section | Thematic grouping of content, typically with a heading, such as chapters in a book, or a web page’s introduction, news, and contact info sections. |
nav | A (major) section that contains navigation links |
aside | Tangentially related content, such as would appear in a sidebar |
h1 | A “Level 1” section heading |
h2 | A “Level 2” section heading |
h3 | A “Level 3” section heading |
h4 | A “Level 4” section heading |
h5 | A “Level 5” section heading |
h6 | A “Level 6” section heading |
hgroup | A heading grouped with related content, such as a subheading, alternative title, or tagline (in p elements) |
header | A group of introductory or navigational aids for a document or section, such as a wrapper for a table of contents, logo, company name, and search form |
footer | A footer for its document or section, perhaps containing copyright info, author info, license agreements, etc. |
address | Contact info for nearest article or body ancestor |
GROUPING | |
p | Paragraph |
hr | Paragraph-level thematic break, such as a scene change in a story, or a transition to another topic within a section. |
pre | Preformatted text |
blockquote | A section quoted from another source |
ol | Ordered list |
ul | Unordered list |
menu | A semantic alternative to ul to express an unordered list of commands (a "toolbar"). |
li | List item |
dl | Description list (name/value groups), such as for questions/answers and terms/definitions |
dt | A name part of a name/value group in a description list |
dd | A value part of a name/value group in a description list |
figure | Content (such as illustrations, diagrams, photos, code listings) optionally with a caption, that is self-contained and is typically referenced as a single unit from the main flow of the document |
figcaption | A caption or legend for the rest of the contents of the figcaption element’s parent figure element, if any |
main | The dominant contents of the document |
search | Container for a set of form controls or other content related to performing a search or filtering operation |
div | Generic wrapper for a group of consecutive elements (should only be used as a last resort, when no existing element is suitable) |
TEXT-LEVEL SEMANTICS | |
a | A hyperlink, or a placeholder for a hyperlink |
em | Stress emphasis |
strong | Importance |
small | Side comments (e.g., fine print) |
s | No longer accurate or relevant |
cite | Title of a work, such as a book, paper, essay, poem, score, song, script, film, game, painting, play, musical, exhibition, or similar |
q | Content quoted from another source |
dfn | The defining instance of a term |
abbr | Abbreviation or acronym |
ruby | Text spans containing ruby markup |
rt | The ruby text component of a ruby annotation |
rp | Parentheses around a ruby text component of a ruby annotation, to be shown by user agents that don’t support ruby annotations |
data | Content paired with a machine-readable equivalent (in the value attribute) |
time | A date, time, or datetime (human-readable in the content, machine-readable in the datetime attribute) |
code | A fragment of computer code |
var | A variable |
samp | (Sample) output from a computer program or system |
kbd | User input, typically keyboard input, but could be voice or other kind of input |
sub | Subscript |
sup | Superscript |
i | Text in an alternate voice or mood, e.g., foreign words, technical terms, terms from a taxonomy, ship names, stage directions in a script, thoughts, hand-written notes in a document, voice-overs. (Do not use if some other element such as em, strong, dfn, var, cite, q applies.) |
b | Text to which attention is being drawn for utilitarian purposes without conveying any extra importance and with no implication of an alternate voice or mood, such as key words in a document abstract, product names in a review, actionable words in interactive text-driven software, or an article lede |
u | Text with an unarticulated, though explicitly rendered, non-textual annotation, such as labeling the text as being a proper name in Chinese text (a Chinese proper name mark), or labeling the text as being misspelt |
mark | Marked or highlighted for reference purposes, due to its relevance in another context |
bdi | Text to be isolated from its surroundings for the purposes of bidirectional text formatting |
bdo | Bidirectional override |
span | Generic phrase-level wrapper |
br | Line break (Only used when the break is part of the content, as in a poem or address) |
wbr | Line break opportunity (usually inside of a very long word or source code line) |
EDITS | |
ins | An addition to the document |
del | A removal from the document |
EMBEDDING | |
picture | Container that provides multiple sources to its contained img element |
source | Alternative media resource for a media element |
img | An image |
iframe | A nested browsing context |
embed | An integration point for an external (typically non-HTML) application or interactive content |
object | An external resource, which, depending on the type of the resource, will either be treated as an image, as a nested browsing context, or as an external resource to be processed by a plugin. |
param | A parameter for plugins invoked by object elements |
video | A video |
audio | A sound or audio stream |
track | Explicit external timed text track for media elements |
map | An image map |
area | A hyperlink with some text and a corresponding area on an image map, or a dead area on an image map. |
TABULAR | |
table | A table |
caption | The title of a table |
colgroup | A group of table columns |
col | A column in a colgroup |
tbody | A block of rows making up the main content of a table |
thead | A block of rows making up the column labels (headers) of a table |
tfoot | A block of rows making up the column summaries (footers) of a table |
tr | Table row |
td | Table cell |
th | Header cell in a table |
FORMS | |
form | A collection of form-associated elements, some of which can represent editable values that can be submitted to a server for processing |
label | A caption in a user interface, generally for a specific form control |
input | A typed data field, usually with a form control. The types include hidden, text, search, tel, url, email, password, date, month, week, time, datetime-local, number, range, color, checkbox, radio, file, submit, image, reset, button |
button | A button |
select | A control for selecting among a set of options |
datalist | A set of option elements that represent predefined options for other controls |
optgroup | A group of option elements with a common label |
option | An option in a select element or as part of a list of suggestions in a datalist element |
textarea | A multiline plain text edit control |
output | The result of a calculation |
progress | The completion progress of a task |
meter | A scalar measurement within a known range, or a fractional value; for example disk usage, the relevance of a query result, or the fraction of a voting population to have selected a particular candidate |
fieldset | A group of form controls optionally grouped under a common name |
legend | A caption for a fieldset |
INTERACTIVE | |
details | A disclosure widget from which the user can obtain additional information or controls |
summary | A summary, caption, or legend for the rest of the contents of the summary element’s parent details element, if any |
dialog | Part of an application that a user interacts with (e.g., dialog box, inspector, window) |
SCRIPTING | |
script | A script, either embedded or external |
noscript | Content activated only if scripting is disabled |
template | Fragments of HTML that can be cloned and inserted in the document by script. |
slot | A slot in a shadow tree |
canvas | A resolution-dependent bitmap canvas, which can be used for rendering graphs, game graphics, or other visual images on the fly |
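The data and time elements pair human-readable content with a machine-readable attribute. Here is a hypothetical sketch, again using Python’s `html.parser`, of how a program might pull out both parts. The class name `MachineValues` and the sample snippet are invented for this example. When the attribute is missing, the sketch simply records None. The spec has its own fallback rules for a time element without a datetime attribute, and this sketch ignores them.

```python
from html.parser import HTMLParser

# Which attribute carries the machine-readable value for each element.
MACHINE_ATTR = {"time": "datetime", "data": "value"}


class MachineValues(HTMLParser):
    """Collects (tag, machine value, human text) for each time and data element."""

    def __init__(self):
        super().__init__()
        self.found = []
        self.current = None  # [tag, machine_value, text_parts] while inside one

    def handle_starttag(self, tag, attrs):
        if tag in MACHINE_ATTR:
            self.current = [tag, dict(attrs).get(MACHINE_ATTR[tag]), []]

    def handle_data(self, text):
        if self.current is not None:
            self.current[2].append(text)

    def handle_endtag(self, tag):
        if self.current is not None and tag == self.current[0]:
            name, value, parts = self.current
            self.found.append((name, value, "".join(parts)))
            self.current = None


snippet = ('<p>Sale ends <time datetime="2025-01-31">January 31st</time>; '
           'only <data value="8">eight</data> left.</p>')
parser = MachineValues()
parser.feed(snippet)
parser.close()
print(parser.found)
```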
You can’t just put any element inside any other. The rules, called content models, are sometimes very complicated; see the HTML spec for details.
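Validators enforce these rules by comparing each element’s children with its content model. The sketch below shows the idea with a deliberately tiny, hand-written table. The `PHRASING` and `ALLOWED_CHILDREN` sets are illustrative assumptions that cover only a few elements, not the spec’s full rules. To keep the example short, documents are given as nested (tag, children) tuples rather than parsed from markup.

```python
# Illustrative subset only: p holds phrasing content, lists hold li, rows hold cells.
PHRASING = {"a", "abbr", "b", "br", "cite", "code", "em", "i", "img",
            "kbd", "q", "small", "span", "strong", "sub", "sup", "time"}

ALLOWED_CHILDREN = {
    "p": PHRASING,
    "ul": {"li", "script", "template"},
    "ol": {"li", "script", "template"},
    "tr": {"td", "th", "script", "template"},
}


def violations(tag, children, path=""):
    """Yield a message for each child the table forbids.

    Elements missing from ALLOWED_CHILDREN are not checked at all.
    """
    here = f"{path}/{tag}"
    allowed = ALLOWED_CHILDREN.get(tag)
    for child_tag, grandchildren in children:
        if allowed is not None and child_tag not in allowed:
            yield f"{child_tag} is not allowed inside {here}"
        yield from violations(child_tag, grandchildren, here)


tree = ("body", [
    ("p", [("em", []), ("div", [])]),   # div is not phrasing content
    ("ul", [("li", []), ("p", [])]),    # p cannot be a direct child of ul
])
for message in violations(*tree):
    print(message)
```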
Which attributes are allowed for which elements? Some, called the global attributes, are allowed on all elements. Some are allowed only on specific elements.
Element | Allowed Attributes |
---|---|
(ALL) | accesskey class contenteditable contextmenu dir draggable dropzone hidden id lang spellcheck style tabindex title onabort onblur oncanplay oncanplaythrough onchange onclick oncontextmenu oncuechange ondblclick ondrag ondragend ondragenter ondragleave ondragover ondragstart ondrop ondurationchange onemptied onended onerror onfocus oninput oninvalid onkeydown onkeypress onkeyup onload onloadeddata onloadedmetadata onloadstart onmousedown onmousemove onmouseout onmouseover onmouseup onmousewheel onpause onplay onplaying onprogress onratechange onreset onscroll onseeked onseeking onselect onshow onstalled onsubmit onsuspend ontimeupdate onvolumechange onwaiting |
html | manifest |
base | href target |
link | rel href media hreflang type sizes title |
meta | name http-equiv content charset |
style | media type scoped title |
script | src async defer type charset |
body | onafterprint onbeforeprint onbeforeunload onblur onerror onfocus onhashchange onload onmessage onoffline ononline onpagehide onpageshow onpopstate onredo onresize onscroll onstorage onundo onunload |
blockquote | cite |
ol | reversed start type |
li | value |
a | href target download ping rel media hreflang type |
q | cite |
data | value |
time | datetime |
ins | cite datetime |
del | cite datetime |
img | alt src srcset crossorigin usemap ismap width height |
iframe | src srcdoc name sandbox seamless width height |
embed | src type width height |
object | data type typemustmatch name usemap form width height |
param | name value |
video | src crossorigin poster preload autoplay mediagroup loop muted controls width height |
audio | src crossorigin preload autoplay mediagroup loop muted controls |
source | src type media |
track | kind src srclang label default |
canvas | width height |
map | name |
area | alt coords shape href target download ping rel media hreflang type |
colgroup | span |
col | span |
td | colspan rowspan headers |
th | colspan rowspan headers scope |
form | accept-charset action autocomplete enctype method name novalidate target |
fieldset | disabled form name |
label | form for |
input | accept alt autocomplete autofocus checked dirname disabled form formaction formenctype formmethod formnovalidate formtarget height inputmode list max maxlength min multiple name pattern placeholder readonly required size src step type value width |
button | autofocus disabled form formaction formenctype formmethod formnovalidate formtarget name type value |
select | autofocus disabled form multiple name required size |
optgroup | disabled label |
option | disabled label selected value |
textarea | autocomplete autofocus cols dirname disabled form inputmode maxlength name placeholder readonly required rows wrap |
output | for form name |
progress | value max |
meter | value min max low high optimum |
details | open |
menu | type label |
dialog | open |
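The rule above (global attributes plus element-specific ones) amounts to a set union. This is a minimal, hypothetical checker built on that idea. `GLOBAL` is a trimmed copy of the (ALL) row, and `SPECIFIC` copies just two element rows. As a simplification, any name starting with “on” is accepted as an event handler attribute.

```python
# Trimmed copy of the (ALL) row; event handlers are handled by the prefix check below.
GLOBAL = {"accesskey", "class", "contenteditable", "dir", "draggable",
          "hidden", "id", "lang", "spellcheck", "style", "tabindex", "title"}

# Two element-specific rows, copied from the table.
SPECIFIC = {
    "a": {"href", "target", "download", "ping", "rel", "media", "hreflang", "type"},
    "img": {"alt", "src", "srcset", "crossorigin", "usemap", "ismap", "width", "height"},
}


def unknown_attributes(tag, names):
    """Return the attribute names allowed neither globally nor specifically on tag."""
    allowed = GLOBAL | SPECIFIC.get(tag, set())
    # Simplification: treat every on* name as a global event handler attribute.
    return [name for name in names
            if name not in allowed and not name.startswith("on")]


print(unknown_attributes("img", ["src", "alt", "class", "onclick", "href"]))  # ['href']
print(unknown_attributes("a", ["href", "lang", "alt"]))                       # ['alt']
```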
As HTML has evolved, elements have come and gone. Yes, some have gone. You might still see some of these in browsers. If you do, well, yikes. Just make sure YOU don’t ever use them. If you are copying and pasting code from someone else, be on the lookout for these obsolete features and replace them.
The elements that have been removed include acronym, applet, bgsound, dir, frame, frameset, noframes, isindex, keygen, listing, menuitem, nextid, noembed, plaintext, rb, rtc, strike, xmp, basefont, big, blink, center, font, marquee, multicol, nobr, spacer, tt.
There are quite a few attributes that should no longer be used, too.
Here is the complete list of obsolete features.
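Spotting obsolete elements in copied code is easy to automate. This minimal sketch uses Python’s `html.parser`, and `OBSOLETE` is typed in from the list of removed elements above. The class `ObsoleteFinder` and the function `find_obsolete` are hypothetical names. The sketch reports elements only. Obsolete attributes would need a separate table.

```python
from html.parser import HTMLParser

# The removed elements listed above.
OBSOLETE = {"acronym", "applet", "bgsound", "dir", "frame", "frameset",
            "noframes", "isindex", "keygen", "listing", "menuitem", "nextid",
            "noembed", "plaintext", "rb", "rtc", "strike", "xmp", "basefont",
            "big", "blink", "center", "font", "marquee", "multicol", "nobr",
            "spacer", "tt"}


class ObsoleteFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.hits = []  # (line, column, tag) for each obsolete start tag

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names, so <CENTER> arrives as "center".
        if tag in OBSOLETE:
            line, column = self.getpos()  # where this tag starts; column counts from 0
            self.hits.append((line, column, tag))


def find_obsolete(source):
    finder = ObsoleteFinder()
    finder.feed(source)
    finder.close()
    return finder.hits


page = '<CENTER>\n  <font color="red">Hi</font> <b>there</b>\n</CENTER>'
for line, column, tag in find_obsolete(page):
    print(f"line {line}, column {column}: <{tag}> is obsolete")
```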
Here are some questions useful for your spaced repetition learning. Many of the answers are not found on this page. Some will have popped up in lecture. Others will require you to do your own research.
We’ve covered: