Lexical structure v16

There are several aspects to the lexical structure of SQL:

  • SQL input consists of a sequence of commands.

  • A command is composed of a sequence of tokens, terminated by a semicolon (;). The end of the input stream also terminates a command.

  • The valid tokens depend on the syntax of the command.

  • A token can be a key word, an identifier, a quoted identifier, a literal (or constant), or a special character symbol. Tokens are normally separated by whitespace (space, tab, new line) but don't need to be if there's no ambiguity (which is generally the case only if a special character is adjacent to some other token type).

  • Comments can occur in SQL input. They aren't tokens; they are equivalent to whitespace.

For example, the following is syntactically valid SQL input:

SELECT * FROM MY_TABLE;
UPDATE MY_TABLE SET A = 5;
INSERT INTO MY_TABLE VALUES (3, 'hi there');

This is a sequence of three commands, one per line, although that format isn't required. You can enter more than one command on a line, and a command can usually be split across lines.
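As a rough sketch of these layout rules, the following input rearranges the same commands. It assumes, purely for illustration, that MY_TABLE has an integer column A and a text column B. The column B and the CREATE TABLE definition aren't part of the original example.

```sql
-- Assumed table definition, for illustration only.
CREATE TABLE MY_TABLE (A integer, B text);

-- Two commands on one line, each terminated by a semicolon.
SELECT * FROM MY_TABLE; UPDATE MY_TABLE SET A = 5;

-- One command split across two lines. The comment inside it
-- is treated like whitespace between tokens.
INSERT INTO MY_TABLE
    VALUES (3, /* inline comment */ 'hi there');

-- Whitespace can be dropped next to symbols such as parentheses
-- and commas, because the token boundaries stay unambiguous.
INSERT INTO MY_TABLE VALUES(3,'hi there');
```

Each of these statements should parse the same way as its one-per-line counterpart. The layout affects readability only.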

SQL syntax isn't very consistent about which tokens identify a command and which are operands or parameters. The first few tokens are generally the command name, so the example contains a SELECT, an UPDATE, and an INSERT command. However, the UPDATE command always requires a SET token to appear in a certain position, and this variation of INSERT also requires a VALUES token to be complete. The precise syntax rules for each command are described in the SQL reference.
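To illustrate the role of these required tokens, here is a minimal sketch. It again assumes a table MY_TABLE with an integer column A and a text column B, which is an assumption made for this example. The invalid forms are commented out so the input still runs as written.

```sql
-- The command name comes first (UPDATE), and the SET key word
-- must follow the table name.
UPDATE MY_TABLE SET A = 5;

-- The command name comes first (INSERT INTO), and this form
-- needs the VALUES key word before the row.
INSERT INTO MY_TABLE VALUES (3, 'hi there');

-- Not valid: SET is missing, so the parser reports a syntax error.
-- UPDATE MY_TABLE A = 5;

-- Not valid: VALUES is missing from this form of INSERT.
-- INSERT INTO MY_TABLE (3, 'hi there');
```

Removing the comment markers from either invalid command should cause it to be rejected during parsing, before any data is touched.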