Reference
Preface
In keeping with the DRY principle, this reference does not repeat information already covered in the quick tour.
Lexical rules
Source files are UTF-8 encoded. The UTF-8 byte order mark, if present, is ignored.
Literals, punctuation tokens, keywords and identifiers can't be split by a line break.
Line breaks terminate single line comments.
Apart from these cases, line breaks are ignored by the lexer and by the syntax parser.
The lexer detects the following syntactic elements:
- comments
- literals
- symbols
- keywords and punctuation
​
comments
​
Comments take one of these forms:

// single line comment
​
/* multiline comments
/* can be nested !! */
*/
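The nesting rule means a lexer cannot simply stop at the first */ it meets. The following is a minimal Python sketch of one way to skip a comment, not the Sing compiler's actual code; the function name skip_comment is hypothetical.

```python
def skip_comment(src: str, pos: int) -> int:
    """Return the index just past the comment that starts at src[pos].

    Hypothetical helper: assumes src[pos:pos + 2] is '//' or '/*'.
    """
    if src.startswith("//", pos):
        end = src.find("\n", pos)
        # The line break (or the end of the source) terminates the comment.
        return len(src) if end == -1 else end
    if not src.startswith("/*", pos):
        raise ValueError("not a comment")
    depth = 0
    i = pos
    while i < len(src):
        if src.startswith("/*", i):
            depth += 1          # an inner comment opens
            i += 2
        elif src.startswith("*/", i):
            depth -= 1          # the innermost open comment closes
            i += 2
            if depth == 0:
                return i
        else:
            i += 1
    raise ValueError("unterminated multiline comment")
```

A depth counter is enough here because comment delimiters are the only tokens that matter inside a comment.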
string literals
String literals are enclosed in double quotes:

"a string"

Two consecutive strings are automatically concatenated, even if they are on different lines:
"first part " "second part"

A number of escape sequences are used to encode special characters:
​
sequence     character        codepoint

\"           "                0x22
\\           \                0x5c
\'           '                0x27
\?           ?                0x3f
\a           <bell>           0x07
\b           <backspace>      0x08
\f           <form feed>      0x0c
\n           <line feed>      0x0a
\r           <return>         0x0d
\t           <tab>            0x09
\v           <vertical tab>   0x0b
\xnn         <any value>      0..0xff
\unnnn       <any unicode>    0..0xffff
\Unnnnnnnn   <any unicode>    0..0x001fffff
​
n indicates a hex digit in one of the ranges 0..9, a..f, A..F.
Since Sing source files are UTF-8, strings can contain any UTF-8 character verbatim (not escaped).
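As an illustration of the escape rules, here is a minimal Python sketch that decodes the body of a string literal. It is not the compiler's implementation: decode_escapes is a hypothetical name, and treating \xnn as a code point rather than a raw byte is an assumption made to keep the sketch short.

```python
import string

# Single-character escapes (backslash is U+005C).
SIMPLE_ESCAPES = {
    '"': '"', "\\": "\\", "'": "'", "?": "?",
    "a": "\a", "b": "\b", "f": "\f", "n": "\n",
    "r": "\r", "t": "\t", "v": "\v",
}
# Number of hex digits that must follow \x, \u and \U.
HEX_ESCAPE_DIGITS = {"x": 2, "u": 4, "U": 8}


def decode_escapes(body: str) -> str:
    """Decode the escape sequences in the text between the double quotes."""
    out = []
    i = 0
    while i < len(body):
        if body[i] != "\\":
            out.append(body[i])  # verbatim character
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash")
        kind = body[i + 1]
        if kind in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[kind])
            i += 2
        elif kind in HEX_ESCAPE_DIGITS:
            count = HEX_ESCAPE_DIGITS[kind]
            digits = body[i + 2:i + 2 + count]
            if len(digits) != count or any(d not in string.hexdigits for d in digits):
                raise ValueError(f"bad \\{kind} escape")
            # Assumption: the value is a code point. chr() itself raises
            # ValueError for values above 0x10ffff.
            out.append(chr(int(digits, 16)))
            i += 2 + count
        else:
            raise ValueError(f"unknown escape \\{kind}")
    return "".join(out)
```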
unsigned integer literals
All integer literals begin with a 0..9 digit.
All are separated from the next token by a blank, a tab or punctuation.
Hex values must start with 0x and can contain hex digits and underscores.
Underscores must be preceded and followed by a digit to be legal.
Examples:
​
0xfa9
0xfa_9b
​
The maximum legal value is 0xffff_ffff_ffff_ffff (16 digits)
​
Decimal integer values can contain only decimal digits (0..9) and underscores.
Underscores must be preceded and followed by a digit to be legal.
Example:
​
1_000_000
​
The maximum value is 18_446_744_073_709_551_615
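The integer literal rules can be summarized with two regular expressions. The Python sketch below is only an illustration under stated assumptions: parse_uint is a hypothetical name, a "digit" around an underscore in a hex literal is taken to mean a hex digit, and the limit is the hex maximum quoted above.

```python
import re

# Digits separated by single underscores; an underscore must sit between digits.
DEC_LITERAL = re.compile(r"[0-9]+(_[0-9]+)*\Z")
HEX_LITERAL = re.compile(r"0x[0-9a-fA-F]+(_[0-9a-fA-F]+)*\Z")
MAX_VALUE = 0xFFFF_FFFF_FFFF_FFFF


def parse_uint(text: str) -> int:
    """Return the value of an unsigned integer literal or raise ValueError."""
    if HEX_LITERAL.match(text):
        value = int(text[2:].replace("_", ""), 16)
    elif DEC_LITERAL.match(text):
        value = int(text.replace("_", ""), 10)
    else:
        raise ValueError(f"malformed integer literal: {text!r}")
    if value > MAX_VALUE:
        raise ValueError(f"integer literal too large: {text!r}")
    return value
```

For instance, parse_uint("0xfa_9b") and parse_uint("1_000_000") succeed, while "1__0", "1_" and "0x_1" are rejected because an underscore is not surrounded by digits.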
float literals
All float literals begin with a 0..9 digit.
All are separated from the next token by a blank, a tab or punctuation.
The general form of a float is:
​
<integer part>.<fraction>e<sign><exponent>
​
<fraction> and e<sign><exponent> are optional, but at least one of them must be present for the literal to be a float.
<sign> is optional.
e can be uppercase.
Underscores can be inserted and are ignored, but each must be preceded and followed by a digit to be legal.
Examples:
10.3
1.0
12e-3
12_099E+9
​
The maximum allowed value is 1.7976931348623158e308
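The general form above maps naturally onto a regular expression. This Python sketch is an illustration only: the names are hypothetical, and it assumes that both the integer part and the fraction need at least one digit (so "1." and ".5" are rejected), which the text does not state explicitly.

```python
import re

# One or more digits, with single underscores allowed only between digits.
DIGITS = r"[0-9]+(?:_[0-9]+)*"
EXPONENT = rf"[eE][+-]?{DIGITS}"
# Either a fraction (optionally followed by an exponent) or an exponent alone.
FLOAT_LITERAL = re.compile(rf"{DIGITS}(?:\.{DIGITS}(?:{EXPONENT})?|{EXPONENT})\Z")


def is_float_literal(text: str) -> bool:
    """True if text has the shape of a float literal."""
    return FLOAT_LITERAL.match(text) is not None


def float_value(text: str) -> float:
    """Value of a float literal; underscores are ignored."""
    if not is_float_literal(text):
        raise ValueError(f"not a float literal: {text!r}")
    return float(text.replace("_", ""))
```

A plain "12" does not match, since without a fraction or an exponent it is an integer literal.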
​
other literals
An imaginary literal is just an integer or float literal followed by the suffix i or I.
To the lexer, the bool literals (true and false) and null are keywords, not literals.
Symbols
Symbols can contain letters (a..z and A..Z), decimal digits (0..9) and underscores.
A symbol ends at the first character that is not a letter, an underscore or a digit.
A symbol can't start with a digit or with _ followed by an uppercase letter.
Two consecutive underscores are forbidden.
​
OK:
_value
a_symbol_
KO:
_Value
0times
too__many
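The symbol rules can be checked mechanically. Below is a small Python sketch of such a check, offered as an illustration rather than as the compiler's logic; is_valid_symbol is a hypothetical name, and the check deliberately ignores the keyword table.

```python
import re

# Letters, digits and underscores only, not starting with a digit.
SYMBOL_SHAPE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")


def is_valid_symbol(name: str) -> bool:
    """Apply the lexical rules for symbols (keywords are not considered)."""
    if not SYMBOL_SHAPE.match(name):
        return False                     # bad character or leading digit
    if "__" in name:
        return False                     # two consecutive underscores
    if len(name) > 1 and name[0] == "_" and name[1].isupper():
        return False                     # _ followed by an uppercase letter
    return True
```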
keywords and punctuation
Any symbol that matches one of the following sequences is passed from the lexer to the parser as a keyword token.
Additionally, arbitrary sequences of characters starting with a punctuation character are checked against this table.
If the match is ambiguous, the longest matching sequence is recognized and returned.
For example, if the input stream contains ++, it is interpreted as a single ++ token, not as two + tokens.
The list of Sing keywords is quite long because it includes all the C++ keywords.
This prevents the Sing programmer from using symbols which, after conversion to C++, would be interpreted as keywords.
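The longest-match rule for punctuation can be sketched in a few lines of Python. This is an illustrative model only: the table is a small subset of the real one, and the function names are hypothetical.

```python
# A small subset of the punctuation table, for illustration.
PUNCTUATION = ["+", "++", "+=", "-", "--", "-=", ">", ">>", ">=", ">>=", "=", "=="]


def next_punctuation(src: str, pos: int) -> str:
    """Return the longest table entry that matches src at pos."""
    best = None
    for token in PUNCTUATION:
        if src.startswith(token, pos) and (best is None or len(token) > len(best)):
            best = token
    if best is None:
        raise ValueError(f"no punctuation token at position {pos}")
    return best


def tokenize(src: str) -> list:
    """Split a string made only of punctuation and blanks into tokens."""
    tokens = []
    pos = 0
    while pos < len(src):
        if src[pos].isspace():
            pos += 1
            continue
        token = next_punctuation(src, pos)
        tokens.append(token)
        pos += len(token)
    return tokens
```

With this model "+++" is split into ++ followed by +, and a blank is the only way to obtain two separate + tokens.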
​
Sing Keywords:
null true false void
mut requires namespace var
const type map weak
i8 i16 i32 i64
u8 u16 u32 u64
f32 f64 c64 c128
let string bool fn
pure in out io
.. ... if else
while for return break
continue sizeof ^ case
typeswitch switch default public
private enum class this
interface by step min
max swap ( )
[ ] { }
< > , =
++ -- . +
- * / %
>> << ~ &
| >= <= !=
== ! && ||
: ; += -=
*= /= ^= %=
>>= <<= &= |=
alignas alignof and and_eq
asm atomic_cancel atomic_commit atomic_noexcept
auto bitand bitor catch
char char8_t char16_t char32_t
compl concept consteval constexpr
constinit const_cast co_await co_return
co_yield decltype delete do
double dynamic_cast explicit export
extern float friend goto
inline int long mutable
new noexcept not not_eq
nullptr operator or or_eq
protected reflexpr
register reinterpret_cast short signed
static static_assert static_cast struct
synchronized template thread_local throw
try typedef typeid typename
union unsigned using virtual
volatile wchar_t xor xor_eq
int8_t int16_t int32_t int64_t
uint8_t uint16_t uint32_t uint64_t
About Symbols
sing namespaces
Sing has a single namespace for each file plus an additional namespace for the members of each class.
The namespace includes the aliases of the 'requires' directives and the declared names (one per declaration), including those inside function blocks.
Sing doesn't support function overloading: symbols must be unique.
​
Referencing private symbols
Public declarations can't refer to private ones, but a public function's body can refer to any private symbol.
​
Circular dependencies between compilation units.
​
A circular dependency exists if a set of files refer to each other in a circle through the "requires" directive.
The circle is broken if any of the files doesn't refer to at least one public symbol of the next file in a public declaration outside a function block
(i.e. if that requires is needed only to access symbols in private declarations or inside function blocks).
Circular dependencies are forbidden.
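Detecting such a circle is a standard graph problem. The Python sketch below shows one way to do it with a depth-first search; it is not taken from the Sing compiler. The graph is assumed to contain only the requires that count for the rule above (those used by a public declaration outside a function block), and the file names in the usage note are hypothetical.

```python
from typing import Dict, List, Optional


def find_cycle(public_requires: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency circle as a list of files, or None if there is none."""
    visiting = set()   # files on the current search path
    done = set()       # files fully explored without finding a circle
    path = []

    def visit(node: str) -> Optional[List[str]]:
        visiting.add(node)
        path.append(node)
        for dep in public_requires.get(node, []):
            if dep in visiting:
                # dep is already on the path: the circle closes here.
                return path[path.index(dep):] + [dep]
            if dep not in done:
                found = visit(dep)
                if found is not None:
                    return found
        path.pop()
        visiting.discard(node)
        done.add(node)
        return None

    for start in public_requires:
        if start not in done:
            cycle = visit(start)
            if cycle is not None:
                return cycle
    return None
```

For example, with {"a": ["b"], "b": ["c"], "c": ["a"]} the function reports the circle a, b, c, a. If c needs a only inside function bodies, that edge is left out of the graph and no circle is reported.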
​
Forward references
Forward references are allowed:
- from a function body to any symbol.
- from a class declaration to another class, if the reference occurs inside a member pointer declaration or as a function argument.
Forward references are otherwise forbidden.
Example:
​
fn doFwdRef() void
{
    var xx AnExample;   // function body can forward ref
}

class AnExample {
    fn getAClone() *AnExample;                  // reference to itself (AnExample)
    fn doStuff(in the_arg DeclaredLater) void;  // forward reference to another class in the argument.
}
​
Constant Expressions
Constant expressions (CTC)
​
Some expressions are required to be compile time constants (CTCs, i.e. constants whose value is known at compile time).
Compile time constant expressions are required in:
- member initializers
- argument defaults
CTC operators can include:
- no postfix operator
- prefix +, -, !, ~
- all binop operators
- any numeric type conversion
CTC operands can be:
- all literals, including enum literals
- constants inited with a "Strictly Constant Expression"
​
Strictly Constant Expression (SC)

Some expressions are required to be strictly constant (SC), which means they are constant even in their plain C version.
SC expressions are required:
- in array size declarations
- as case labels in switches
- as operands in CTC expressions
SC operators can include:
- no postfix operator
- prefix +, -
- binop operators *, /, +, -, %, >>, <<, &, |, ^
- integer type conversions
SC operands can be:
- all integer literals, including enum literals
- constants inited with an SC
​
Note that an SC expression is also a CTC expression.
​
Compile time checks
The compiler is required to check the value of any CTC expression and to emit an error in case of overflow or if the value is used inappropriately (e.g. a negative subscript).
Operators Details
The usual conversions:
Before any operation (including unary operations), 8 bit and 16 bit signed and unsigned integers are converted to i32.
Before a binary operation (an operation with 2 operands), if one of the operands is complex and the other is a float of the same precision, the float operand is converted to complex.
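The two conversion rules can be modelled on type names alone. The following Python sketch is an illustration, not the compiler's code: the function names are hypothetical, and pairing f32 with c64 and f64 with c128 as "same precision" is an assumption.

```python
SMALL_INTS = {"i8", "u8", "i16", "u16"}
# Assumption: a complex type has the precision of the float type listed here.
COMPLEX_OF = {"f32": "c64", "f64": "c128"}


def promote(type_name: str) -> str:
    """Integer promotion: 8 and 16 bit integers become i32."""
    return "i32" if type_name in SMALL_INTS else type_name


def usual_conversions(left: str, right: str) -> tuple:
    """Return the operand types of a binary operation after the usual conversions."""
    left, right = promote(left), promote(right)
    if COMPLEX_OF.get(right) == left:
        right = left      # float on the right, matching complex on the left
    elif COMPLEX_OF.get(left) == right:
        left = right      # float on the left, matching complex on the right
    return left, right
```

Note that usual_conversions("c128", "f32") leaves both types unchanged, since the precisions differ. Such a pair would then fail the "same type" requirement of the arithmetic operators.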
[] (subscript)
Applies to a vector and returns its element. The subscript must be an integer value.
​
() (function call)
Applies to a function. The returned type is the return type of the function.
​
. (member access)
If applied to a required file alias, it returns an extern symbol.
If applied to a class instance or class instance pointer, it returns a class member.
If applied to an enum type, it returns an enum constant of the same type.
​
* (dereference)
Applies to any non weak pointer and returns the pointed object.
​
& (get address)
Applies to any local variable and returns its address (the address is of type 'pointer to <local variable type>').
​
+ unary operator
Applies to any number and returns the type it receives (after integer promotion).

- unary operator
Applies to any number except unsigned types and returns the type it receives (after integer promotion).

~ (bitwise negation)
Applies to any integer type and returns the type it receives (after integer promotion).
​
! (bool negation)
Applies to a bool and returns a bool.
​
** * / + - (power, multiply, divide, add, subtract)
Apply to any number (the + operator applies to strings too).
After the usual conversions both operands must be of the same type, which is also the type of the result.
​
% & | ^ (modulus, bitwise and, bitwise or, exclusive or)
Apply to any integer. After the usual conversions both operands must be of the same type, which is also the type of the result.
​
>> << (shift)
Apply to any integers. The result type is the same as that of the left operand (after integer promotion).
​
< <= > >= (comparisons - not for equality)
Apply to any 2 scalars, even of different types; the comparison is always value preserving. The result is a bool.
​
== != (equality comparison)
Apply to any 2 numbers, even of different types. The result is a bool.
Additionally, they apply to any 2 values of identical type, except maps, weak pointers, classes and interfaces.
Vectors can be compared if their elements can.
​
&& ||
Apply to bool and return bool.
They perform short-circuit evaluation, meaning that:
- if the left term of && is false, the right term is not evaluated.
- if the left term of || is true, the right term is not evaluated.
​
Assignment and parameter passing rules
​
Assignments
The following applies equally to:
- assigning a variable (or updating with a +=, -=... operator)
- returning a value
- using a value as a parameter default
- using a value for initialization
​
An assignment is possible:
- if the destination and source expressions have the same type.
- if the destination and source expressions are pointers that differ only in weakness.
- if the destination is a const pointer and the source is a non-const pointer of the same type (ignoring weakness).
- if the destination is a pointer and the source is null.
- if the destination is a pointer and the source is the address of a local variable of the type pointed to by the pointer.
- if the destination is a number and the assignment can happen without loss of range or precision, as detailed below.
- if the destination is an interface pointer and the source is a pointer to, or the address of, a class implementing the interface.
- if the destination is a dynamic vector and the source is an array with elements of the same type.
​
The assignment of a number is said to be "without loss of range or precision" if:
- the source is a compile time constant expression (CTC, with the limitations stated above), it fits the target type and can be converted without loss of precision.
- any possible value of the source type can be converted to the target type without loss of range or precision.
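To make the second condition concrete, here is a Python sketch of a type-level check. It is a simplified model built on assumptions, not the compiler's rule set: the names are hypothetical, floats are modelled as IEEE 754 values with 24 (f32) and 53 (f64) significant bits, and complex types are left out.

```python
# (min, max) of every integer type.
INT_RANGES = {}
for bits in (8, 16, 32, 64):
    INT_RANGES[f"i{bits}"] = (-2 ** (bits - 1), 2 ** (bits - 1) - 1)
    INT_RANGES[f"u{bits}"] = (0, 2 ** bits - 1)

# Assumption: IEEE 754 significand widths.
FLOAT_BITS = {"f32": 24, "f64": 53}


def type_fits(src: str, dst: str) -> bool:
    """True if every value of type src converts to dst without loss."""
    if src == dst:
        return True
    if src in INT_RANGES and dst in INT_RANGES:
        (lo, hi), (dlo, dhi) = INT_RANGES[src], INT_RANGES[dst]
        return dlo <= lo and hi <= dhi
    if src in INT_RANGES and dst in FLOAT_BITS:
        lo, hi = INT_RANGES[src]
        exact = 2 ** FLOAT_BITS[dst]   # integers up to this magnitude are exact
        return -exact <= lo and hi <= exact
    if src in FLOAT_BITS and dst in FLOAT_BITS:
        return FLOAT_BITS[src] <= FLOAT_BITS[dst]
    return False


def constant_fits(value: int, dst: str) -> bool:
    """True if an integer compile time constant fits the integer type dst."""
    lo, hi = INT_RANGES[dst]
    return lo <= value <= hi
```

Under this model u8 to i16 and i32 to f64 are accepted, while i16 to u32, i32 to f32 and i64 to f64 are rejected. A constant such as 200 fits u8 but not i8, which mirrors the first condition.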
​
Note that:
- if the type is a pointer, the pointed types must be identical (none of the above conversions apply).
- if the type is a container, the elements contained (or the keys of the map) must be of identical type.
​
Passing Values
​
As outputs
A value can be passed as a function output argument if:
- it has the same type as the parameter.
- it is a const pointer and the parameter is a non-const pointer of the same type.
- the destination is an interface and the source is a class implementing the interface.
As inputs
You can pass a value to an input argument if you can assign the value to the argument type,
except if the destination is a dynamic vector and the source is an array with elements of the same type,
and additionally if the destination is an interface and the source is a class implementing the interface.
​
Limitations
​
Initialization lists and type inference.
An initialization list is not a typed value, it is just a collection of typed values. As such it cannot be used to infer the type of a newly declared var.
​
type declarations
You can't have a type declaration whose only purpose is to rename an existing user defined named type, like a class or an enumeration.
When you make a type declaration you create an alias for an existing type (typically because it is unnamed). If the type has already a name this may just lead to confusion.
​
member functions
If a member function doesn't access any member, it shouldn't be a member function (so that it can be called without having to instantiate an object). In this case the compiler emits an error.
​
Map key types
Map key types must support the == operator (see above).
​
weak pointers
Weak pointers are allowed only as non aggregated class members.
The purpose is to limit as much as possible the use of weak pointers.
Since pointers always point to dynamically allocated objects and dynamically allocated objects are typically classes...
​
It is not allowed to use the update operators (e.g. +=, /=) if the left term is an 8 or 16 bit integer.
The reason is that expanding the update operator into x = x + ...; shows that, because of integer promotion, the assignment would be a narrowing conversion.
​
switch default case
A switch or typeswitch is not allowed to just have the default case.
​
switch of enumeration completeness
If the switch expression is of enum type, all the enum cases must be present in the case clauses or the switch must have a default case.
​
About typeswitch
The typeswitch expression can NOT include function calls.
This is because typeswitch is compiled into an if..else if..else if..else C++ construct: the expression is evaluated multiple times and, if it is not idempotent, the behavior is unpredictable.
Sing expressions are guaranteed to be idempotent if they don't include function calls.
The typeswitch expression can't be a weak pointer.
If the typeswitch expression is of type const*, the case labels must be as well.
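The hazard of evaluating the switched expression once per branch can be reproduced in any language. The Python sketch below imitates the shape of the generated if..else if chain; all class and function names are hypothetical, and the code only illustrates the problem, not the actual C++ output of the compiler.

```python
class Shape:
    pass


class Circle(Shape):
    pass


class Square(Shape):
    pass


class Source:
    """Returns a different object on each call, so calls are not repeatable."""

    def __init__(self):
        self.calls = 0

    def next_shape(self) -> Shape:
        self.calls += 1
        return Circle() if self.calls % 2 else Square()


def describe(src: Source) -> str:
    # The expression is re-evaluated for every test, as in an if..else if chain.
    if isinstance(src.next_shape(), Square):
        return "square"
    elif isinstance(src.next_shape(), Circle):
        return "circle"
    else:
        return "unknown"
```

Here every object returned by next_shape() is either a Circle or a Square, yet describe() falls through to "unknown" because each test sees a different object. Forbidding function calls in the typeswitch expression rules this situation out.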
​
sing syntax
​
notation
- syntax elements are enclosed in angular brackets: < syntax element >
- literals are in plain text, but can be enclosed in quotes to avoid ambiguity or to make them more visible. For example "." is a literal point, "[" is a literal bracket and """ is a literal quote.
- optional elements or literals are enclosed in square brackets: [ < optional > ]
- alternate options are indicated by the | operator
- zero_or_more() is used to insert zero or more occurrences of what is in the brackets
- one_or_more() is used to insert one or more occurrences of what is in the brackets
​
​
terminal elements
Some elements are defined as terminal because they are easier to describe in words than formally.
< name >
< integer literal >
< string literal >
< float literal >
< imaginary literal >
​
​
General File Organization
< compilation unit > ::=
[< namespace declaration >]
zero_or_more(< required package >)
zero_or_more([public] < declaration >)
​
< namespace declaration > ::= namespace < qualified name > ;
​
< qualified name > ::= < name > | < name > "." < qualified name >

< required package > ::= requires """ < pkg path > """ [, <name> ]
​
< pkg path > ::= < name > | < name > / < pkg path >
​
< declaration > ::=
< var decl > | < const decl > | < type decl > |
< func definition > | < enum decl > | < class decl > | < interface decl >
Variables, Constants, Types
< var decl > ::= var < name > [< type specification >] [= < initer >] ;
​
< const decl > ::= let < name > [< type specification >] = < initer > ;
​
< type decl > ::= type < name > < type specification > ;
​
< type specification > ::=
< base type > |
< qualified name > |
map ( < type specification > ) < type specification > |
"[" [< array size >] "]" < type specification > |
[const] [weak] * < type specification > |
< function type >
​
< array size > ::= * | < expression >
​
< base type > ::=
i8 | u8 | i16 | u16 | i32 | u32 | i64 | u64 |
f32 | f64 | c64 | c128 |
string | bool
​
< function type > ::= [pure] ( [< argsdef >] ) < return type >
​
< argsdef > ::= < single argdef > | < single argdef > , < argsdef >
< single argdef > ::= [< direction >] <name> < type specification > [= < initer >]
​
< direction > ::= out | io | in
​
< return type > ::= void | < type specification >
​
< initer > ::= < expression > | { < initer list > }
​
< initer list > ::= < initer > | <initer> , < initer list >
​
Functions, Blocks, Statements
< func definition > ::= fn < func fullname > < function type > < block >
​
< func fullname > ::= < func name > | < class name > "." < func name >
< class name > ::= < name >
< func name > ::= < name >
​
< block > ::= { zero_or_more( < block item > ) }
​
< block item > ::= < var decl > | < const decl > | < statement >
​
< statement > ::=
< block > |
while ( < expression > ) < block > |
for ( [< name > , ] < name > in < for range > ) < block > |
< if statement > |
< switch statement > |
< typeswitch statement > |
break ; |
continue ; |
return ( < expression > ) ; |
< prefix expression > ++ ; |
< prefix expression > -- ; |
++ < prefix expression > ; |
-- < prefix expression > ; |
swap ( < prefix expression > , < prefix expression > ) ; |
< prefix expression > < update op > < prefix expression > ; |
< function call > ;
​
< if statement > ::=
if ( < expression > ) < block >
zero_or_more( else if ( < expression > ) < block > )
[ else < block > ]
​
< for range > ::= < expression > [ ":" < expression > [ step < expression > ]]
​
< switch statement > ::= switch ( < expression > ) { one_or_more( < single case > ) [ < default case > ] }
​
< single case > ::= one_or_more( case < expression > ":" ) [< statement >]
​
< default case > ::= default ":" [< statement >]
​
< typeswitch statement > ::=
typeswitch ( <name> = < expression > ) {
one_or_more( < single type case > )
[ < default case > ] }
​
< single type case > ::= one_or_more( case < qualified name > ":" ) [< statement >]
​
< update op > ::=
"=" | "+=" | "-=" | "*=" | "/=" |
"^=" | "%=" | "&=" | "|=" | ">>=" | "<<="
Expressions
< expression > ::= < prefix expression > | < expression > < binop > < expression >
​
< binop > ::=
"+" | "-" | "*" | "/" | "^" | "%" |
"&" | "|" | ">>" | "<<" |
"<" | "<=" | ">" | ">=" | "==" | "!=" |
"**" | "&&" | "||"
​
< prefix expression > ::= < unop > < prefix expression > | < postfix expression >
​
< unop > ::= "-" | "+" | "!" | "~" | "&" | "*"
​
< postfix expression > ::=
< expression term > |
< postfix expression > "[" <index> "]" |
< postfix expression > "." < name > |
< function call >
​
< function call > ::= < postfix expression > ( < arguments > )
​
< expression term > ::=
null | false | true | this | <name> |
< integer literal > | < string literal > | < float literal > | < imaginary literal > |
< base type > ( < expression > ) |
( < expression > ) |
< builtin op > ( < expression > , < expression > )
​
< builtin op > ::= min | max
​
< arguments > ::= < single argument > | < single argument > , < arguments >
​
< single argument > ::= <expression> [ ":" <name> ]
Enums, Interfaces, Classes
< enum decl > ::= enum <name> { < enum elements > }
​
< enum elements > ::= < enum single element > | < enum single element > , < enum elements >
< enum single element > ::= < name > | < name > = < expression >
< interface decl > ::= interface <name> [ ":" < base interfaces > ] { zero_or_more( < interface element > ) }
​
< base interfaces > ::= < qualified name > | < qualified name > , < base interfaces >
​
< interface element > ::= fn [mut] <name> < function type > ;
< class decl > ::= class <name> [ ":" < base class interfaces > ] { zero_or_more( < class element > ) }
​
< base class interfaces > ::= < qualified name > [ by <name>] | < qualified name > , < base class interfaces >
​
< class element > ::=
public ":" |
private ":" |
< var decl > |
fn [mut] <name> < function type > ; |
fn <name> by <name> ;