Reference

Preface
 

In compliance with the DRY rule, information already present in the quick tour is not repeated here.

Lexical rules
 

Sources are in UTF-8 format. The UTF-8 marker (byte order mark), if present, is ignored.
Literals, punctuation tokens, keywords and identifiers can't be split by a line break.
Line breaks terminate single line comments.
Apart from the above, line breaks are ignored by the lexer and by the syntax parser.

The lexer detects the following syntactic elements:

  • comments

  • literals

  • symbols

  • keywords and punctuation

comments

Comments take these forms:

// single line comment

/* multiline comments
/* can be nested !! */
*/


string literals

 

String literals are enclosed in double quotes:

"a string"

Two consecutive strings are automatically concatenated, even if they are on different lines:


"first part " "second part"

The following escape sequences encode special characters:

sequence    character       codepoint

\"          "               0x22
\\          \               0x5c
\'          '               0x27
\?          ?               0x3f
\a          <bell>          0x07
\b          <backspace>     0x08
\f          <form feed>     0x0c
\n          <line feed>     0x0a
\r          <return>        0x0d
\t          <tab>           0x09
\v          <vertical tab>  0x0b
\xnn        <any value>     0..0xff
\unnnn      <any unicode>   0..0xffff
\Unnnnnnnn  <any unicode>   0..0x001fffff

n indicates a hex digit in one of the ranges 0..9, a..f, A..F

Since Sing sources are UTF-8 files, strings can contain any UTF-8 character verbatim (not escaped).
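The escape table above can be read as a small decoding routine. The following Python sketch is only an illustration, not the Sing lexer: the helper name decode_string_body is hypothetical, \xnn is assumed to denote a code point (the table only says "any value"), and Python itself rejects code points above 0x10ffff.

```python
# Simple escapes taken from the table above: escape letter -> decoded character.
SIMPLE_ESCAPES = {
    '"': '"', '\\': '\\', "'": "'", '?': '?',
    'a': '\a', 'b': '\b', 'f': '\f', 'n': '\n',
    'r': '\r', 't': '\t', 'v': '\v',
}
# Numeric escapes: introducer -> number of hex digits that must follow.
HEX_ESCAPES = {'x': 2, 'u': 4, 'U': 8}
HEX_DIGITS = "0123456789abcdefABCDEF"


def decode_string_body(body):
    """Decode the text found between the quotes of a string literal (hypothetical helper)."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != '\\':
            out.append(ch)              # verbatim characters pass through unchanged
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash")
        kind = body[i + 1]
        if kind in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[kind])
            i += 2
        elif kind in HEX_ESCAPES:
            width = HEX_ESCAPES[kind]
            digits = body[i + 2:i + 2 + width]
            if len(digits) != width or any(d not in HEX_DIGITS for d in digits):
                raise ValueError("bad hex escape")
            # Assumption: the value is a code point; chr() fails above 0x10ffff.
            out.append(chr(int(digits, 16)))
            i += 2 + width
        else:
            raise ValueError("unknown escape: " + kind)
    return "".join(out)
```

For instance, decode_string_body(r'a\tb') yields the three characters a, tab, b.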

 

unsigned integer literals

 

All integer literals begin with a 0..9 digit.
Each is separated from the next token by a blank/tab or by punctuation.

Hex values must start with 0x and can contain hex digits and underscores.
Underscores must be preceded and followed by a digit to be legal.

e.g.:

0xfa9
0xfa_9b

The maximum legal value is 0xffff_ffff_ffff_ffff (16 digits)

Decimal integer values can contain only decimal digits (0..9) and underscores.
Underscores must be preceded and followed by a digit to be legal.

e.g.:

1_000_000

The maximum value is 18_446_744_073_709_551_615
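The rules for unsigned integer literals can be summarized with two regular expressions plus a range check. This is a hedged Python sketch, not the actual lexer: parse_unsigned is a hypothetical name, the upper bound is taken from the hex limit 0xffff_ffff_ffff_ffff stated above, and for hex values "digit" is assumed to mean "hex digit".

```python
import re

# A digit, then more digits, each optionally preceded by ONE underscore.
# This enforces "underscores must be preceded and followed by a digit".
DEC_RE = re.compile(r"[0-9](?:_?[0-9])*")
HEX_RE = re.compile(r"0x[0-9a-fA-F](?:_?[0-9a-fA-F])*")
MAX_U64 = 0xFFFF_FFFF_FFFF_FFFF


def parse_unsigned(text):
    """Return the value of an unsigned integer literal or raise ValueError."""
    if HEX_RE.fullmatch(text):
        value = int(text[2:].replace("_", ""), 16)
    elif DEC_RE.fullmatch(text):
        value = int(text.replace("_", ""), 10)
    else:
        raise ValueError("malformed integer literal: " + repr(text))
    if value > MAX_U64:
        raise ValueError("integer literal out of range: " + repr(text))
    return value
```

With these rules 1_000_000 and 0xfa_9b are accepted, while 1__0, 1_ and 0x_ff are rejected.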

 

float literals

 

All float literals begin with a 0..9 digit.
Each is separated from the next token by a blank/tab or by punctuation.

The general form of a float is:

<integer part>.<fraction>e<sign><exponent>

<fraction> and e<sign><exponent> are optional but one of them must be present for the literal to be float.
<sign> is optional.
e can be uppercase.
Underscores can be inserted and are ignored but must be preceded and followed by a digit to be legal.

e.g.:

 

10.3
1.0
12e-3
12_099E+9

The maximum allowed value is 1.7976931348623158e308
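The general form of a float literal can be checked mechanically. The Python sketch below is an illustration under assumptions: parse_float is a hypothetical name, the fraction (when present) is assumed to need at least one digit, and the range limit is assumed to be the largest finite IEEE 754 double.

```python
import math
import re

# Digits with single underscores allowed only between two digits.
DIGITS = r"[0-9](?:_?[0-9])*"
# <integer part>, then an optional fraction and an optional exponent.
FLOAT_RE = re.compile(
    rf"(?P<int>{DIGITS})(?:\.(?P<frac>{DIGITS}))?(?:[eE](?P<exp>[+-]?{DIGITS}))?"
)


def parse_float(text):
    """Return the value of a float literal or raise ValueError."""
    m = FLOAT_RE.fullmatch(text)
    # Without a fraction or an exponent the token is an integer, not a float.
    if m is None or (m.group("frac") is None and m.group("exp") is None):
        raise ValueError("not a float literal: " + repr(text))
    value = float(text.replace("_", ""))    # underscores are ignored
    if math.isinf(value):
        raise ValueError("float literal out of range: " + repr(text))
    return value
```

All four examples above (10.3, 1.0, 12e-3, 12_099E+9) pass this check, while 12 alone is rejected because it has neither a fraction nor an exponent.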

other literals

 

An imaginary literal is just an integer or float literal postfixed with i or I.
For the lexer, bool literals (true and false) and null are not literals but keywords.

 

Symbols

 

Symbols can contain letters (a..z and A..Z), decimal digits (0..9) and underscores.
A symbol ends where the first character that is not a letter, an underscore or a digit is found.
A symbol can't start with a digit or with _ followed by an uppercase letter.
Two consecutive underscores are forbidden.

OK:

_value
a_symbol_


Not OK:

_Value
0times
too__many
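The symbol rules translate directly into a handful of checks. A minimal Python sketch follows; is_valid_symbol is a hypothetical helper, it does not check whether the name clashes with a keyword, and it treats a lone underscore as legal because the rules above do not exclude it.

```python
import string

ALLOWED = set(string.ascii_letters + string.digits + "_")


def is_valid_symbol(name):
    """Apply the lexical rules for symbols listed above (keywords are not checked)."""
    if not name or any(c not in ALLOWED for c in name):
        return False
    if name[0] in string.digits:
        return False                      # can't start with a digit
    if name[0] == "_" and len(name) > 1 and name[1] in string.ascii_uppercase:
        return False                      # can't start with _ plus an uppercase letter
    return "__" not in name               # no two consecutive underscores
```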


keywords and punctuation


Any symbol which matches one of the following sequences is passed from the lexer to the parser as a keyword token.
Additionally, arbitrary sequences of characters starting with a punctuation character are checked against this table.
If the match is ambiguous, the longer matching sequence is recognized and returned.

For example, if the input stream contains ++, it is interpreted as a single ++ token, not as two + tokens.
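The "longer match wins" rule is often called maximal munch. The Python sketch below illustrates it on a small subset of the punctuation table; split_punctuation is a hypothetical helper and makes no claim about how the real lexer is organized.

```python
# A subset of the punctuation table, enough for the demonstration.
PUNCTUATION = ["+", "++", "+=", "-", "--", "-=", ">", ">>", ">=", ">>=", "=", "=="]


def split_punctuation(text):
    """Split a run of punctuation characters, always taking the longest match."""
    by_length = sorted(PUNCTUATION, key=len, reverse=True)
    tokens = []
    pos = 0
    while pos < len(text):
        for candidate in by_length:       # longest candidates are tried first
            if text.startswith(candidate, pos):
                tokens.append(candidate)
                pos += len(candidate)
                break
        else:
            raise ValueError("no punctuation token at offset %d" % pos)
    return tokens
```

With this table, "+++" splits into ++ followed by +, and ">>=" is a single token.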

The list of Sing keywords is pretty long because it includes all the C++ keywords.
This is done to prevent the Sing programmer from using symbols which, after conversion to C++, would be interpreted as keywords.

Sing Keywords:
     
null            true            false           void
mut             requires        namespace       var
const           type            map             weak
i8              i16             i32             i64
u8              u16             u32             u64
f32             f64             c64             c128
let             string          bool            fn
pure            in              out             io
..              ...             if              else
while           for             return          break
continue        sizeof          ^               case
typeswitch      switch          default         public
private         enum            class           this
interface       by              step            min
max             swap            (               )
[               ]               {               }
<               >               ,               =
++              --              .               +
-               *               /               %
>>              <<              ~               &
|               >=              <=              !=
==              !               &&              ||
:               ;               +=              -=
*=              /=              ^=              %=
>>=             <<=             &=              |=
alignas         alignof         and             and_eq
asm             atomic_cancel   atomic_commit   atomic_noexcept
auto            bitand          bitor           catch
char            char8_t         char16_t        char32_t
compl           concept         consteval       constexpr
constinit       const_cast      co_await        co_return
co_yield        decltype        delete          do
double          dynamic_cast    explicit        export
extern          float           friend          goto
inline          int             long            mutable
new             noexcept        not             not_eq
nullptr         operator        or              or_eq
protected       reflexpr        register        reinterpret_cast
short           signed          static          static_assert
static_cast     struct          synchronized    template
thread_local    throw           try             typedef
typeid          typename        union           unsigned
using           virtual         volatile        wchar_t
xor             xor_eq
int8_t          int16_t         int32_t         int64_t
uint8_t         uint16_t        uint32_t        uint64_t

About Symbols
 

sing namespaces

 

Sing has a single namespace for each file plus an additional namespace for the members of each class.
The namespace includes the aliases of the 'requires' directives and the declared names (one per declaration), including the ones declared in inner function blocks.


Sing doesn't support function overloading: symbols must be unique.

Referencing private symbols

 

Public declarations can't refer to private ones, but a public function's body can refer to any private symbol.

Circular dependencies between compilation units.

You have a circular dependency if a set of files refer to each other in a circle through the "requires" directive.
The circle is broken if at least one of the files doesn't refer to any public symbol of the following one in a public declaration outside a function block
(i.e. if its requires is needed only to access symbols from private declarations or from inside function blocks).
Circular dependencies are forbidden.
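Detecting such a circle is a plain cycle search on a directed graph. In the Python sketch below, the dictionary public_requires and the function find_cycle are hypothetical: each file maps to the files whose public symbols it uses in a public declaration outside a function block, so requires that only serve private declarations or function bodies are simply left out of the graph.

```python
def find_cycle(public_requires):
    """Return one cycle as a list of files (first == last), or [] if there is none."""
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on the current path, done
    color = {f: WHITE for f in public_requires}
    path = []

    def visit(f):
        color[f] = GREY
        path.append(f)
        for dep in public_requires.get(f, ()):
            state = color.get(dep, WHITE)
            if state == GREY:             # dep is already on the path: a circle
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        color[f] = BLACK
        return []

    for f in list(public_requires):
        if color[f] == WHITE:
            found = visit(f)
            if found:
                return found
    return []
```

For example, {"a": ["b"], "b": ["c"], "c": ["a"]} contains a circle, while dropping the edge from c to a (because c needs a only inside function bodies) breaks it.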

Forward references

 

Forward references are allowed:

  • from a function body to any symbol.

  • from a class declaration to another class, if it occurs inside a member pointer declaration or as a function argument.

Forward references are otherwise forbidden.

e.g.:

fn doFwdRef() void
{
    var xx AnExample;          // a function body can forward reference
}

class AnExample {

    fn getAClone() *AnExample;              // reference to itself (AnExample)
    fn doStuff(in the_arg DeclaredLater);   // forward reference to another class in the argument.
}

Constant Expressions
 

Compile time constant expressions (CTC)

Some expressions are required to be compile time constants (CTCs - i.e. constants whose value is known at compile time).

Compile time constant expressions are required in:

  • member initializers

  • argument defaults

 

CTC operators can include:

  • no postfix operator

  • prefix +, -, !, ~

  • all binop operators.

  • any numeric type conversion.

 

CTC operands can be:

  • all literals including enum literals

  • constants inited with a "Strictly Constant Expression"

Strictly Constant Expression (SC)

Some expressions are required to be strictly constant (SC), which means they are constant even in their plain C version.
SC expressions are required:

  • in array sizes declarations

  • as case labels in switches

  • as operands in CTC expressions

 

SC operators can include:

  • no postfix operator

  • prefix +, -

  • binop operators *, /, +, -, %, >>, <<, &, |, ^

  • integer types conversions.

 

SC operands can be:

  • all integer literals including enum literals

  • constants inited with an SC

Note that an SC expression is also a CTC expression.

Compile time checks

 

The compiler is required to check the value of any CTC expression and to emit an error in case of overflow and in case the value is used inappropriately (e.g. a negative subscript).
 

Operators Details
 

The usual conversions:

Before any operation (including unary operations), 8 bit and 16 bit signed and unsigned integers are converted to i32.
Before a binary operation (an operation with 2 operands), if one of the operands is complex and the other is a float of the same precision, the float operand is converted to complex.
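The usual conversions can be modeled as a small function on type names. This Python sketch is only a model: promote and usual_conversions are hypothetical names, and it assumes that "same precision" pairs c64 with f32 and c128 with f64.

```python
SMALL_INTS = {"i8", "u8", "i16", "u16"}
# Assumption: a c64 has f32 components and a c128 has f64 components.
COMPLEX_OF = {"f32": "c64", "f64": "c128"}


def promote(type_name):
    """Integer promotion: 8 and 16 bit integers become i32."""
    return "i32" if type_name in SMALL_INTS else type_name


def usual_conversions(left, right):
    """Operand types of a binary operation after the usual conversions."""
    left, right = promote(left), promote(right)
    if COMPLEX_OF.get(right) == left:     # complex op float of the same precision
        right = left
    elif COMPLEX_OF.get(left) == right:   # float op complex of the same precision
        left = right
    return left, right
```

Under these assumptions an i8 combined with a u16 gives two i32 operands, and a c64 combined with an f32 gives two c64 operands. A c64 combined with an f64 is left unchanged, so an operator that needs operands of the same type would reject it.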


[] (subscript)

Applies to a vector and returns its element. The subscript must be an integer value.

() (function call)

Applies to a function. The returned type is the return type of the function.

. (member access)

If applied to a required file alias it returns an extern symbol.
If applied to a class instance or class instance pointer it returns a class member.
If applied to an enum type returns an enum constant of same type.

* (dereference)

Applies to any non weak pointer and returns the pointed object.

& (get address)

Applies to any local variable and returns its address (the address is of type 'pointer to <local variable type>').

+ unary operator

Applies to any number and returns the same type it receives (after integer promotion).

- unary operator

Applies to any number except unsigned types and returns the same type it receives (after integer promotion).

~ (bitwise negation)

Applies to any integer type and returns the same type it receives (after integer promotion).

! (bool negation)

Applies to a bool and returns a bool.

** * / + - (power, multiply, divide, add, subtract)

Apply to any number (the + operator applies to strings too). 
After the usual conversions both operands must be of the same type, which is also the type of the result.

% & | ^ (modulus, bitwise and, bitwise or, exclusive or)

Apply to any integer. After the usual conversions both operands must be of the same type, which is also the type of the result.

>> << (shift)

Apply to any integers. The result type is the same as that of the left operand (after integer promotion).

< <= > >= (comparisons - not for equality)

Apply to any 2 scalars, even if of different types; the comparison is always value preserving. The result is a bool.

== != (equality comparison)

Apply to any 2 numbers, even if of different types. The result is a bool.
Additionally, they apply to any 2 operands of identical type, except maps, weak pointers, classes and interfaces.
Vectors can be compared if their elements can.

&& ||

Apply to bool and return bool.
They perform short-circuit evaluation, meaning that:
if the left term of && is false, the right term is not evaluated.
if the left term of || is true, the right term is not evaluated.

Assignment and parameter passing rules

Assignments

 

The following applies equally to:

  • assigning a variable (or updating with a +=, -=... operator)

  • returning a value

  • using a value as a parameter default

  • using a value for initialization

An assignment is possible:

  • if the destination and source expressions have the same type.

  • if the destination and source expressions are pointers which differ only in weakness.

  • if the destination is a const pointer and the source is a non-const pointer of the same type (ignoring weakness).

  • if the destination is a pointer and the source is null.

  • if the destination is a pointer and the source is the address of a local variable of the same type pointed to by the pointer.

  • if the destination is a number and the assignment can happen without loss of range or precision, as detailed below.

  • if the destination is an interface pointer and the source is a pointer or the address of a class implementing the interface.

  • if the destination is a dynamic vector and the source is an array with elements of same type.

The assignment of a number is said to be "without loss of range or precision" if:

  • the source is a compile time constant expression (CTC, with the limitation stated above), it fits the target type and can be converted without loss of precision.

  • Any possible value of the source type can be converted to the target type without loss of range or precision.
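The second condition ("any possible value of the source type can be converted") depends only on the two types, so it can be tabulated. The following Python sketch is one possible reading of the rule, not the compiler's implementation: converts_losslessly is a hypothetical name, f32 and f64 are assumed to be IEEE 754 binary32 and binary64, and complex types are left out.

```python
INT_BITS = {"i8": 8, "i16": 16, "i32": 32, "i64": 64,
            "u8": 8, "u16": 16, "u32": 32, "u64": 64}
# Significand precision in bits (assumption: IEEE 754 binary32 / binary64).
FLOAT_PRECISION = {"f32": 24, "f64": 53}


def int_range(type_name):
    """Smallest and largest value of an integer type."""
    bits = INT_BITS[type_name]
    if type_name.startswith("u"):
        return 0, 2**bits - 1
    return -(2**(bits - 1)), 2**(bits - 1) - 1


def converts_losslessly(source, target):
    """True if every value of `source` is exactly representable in `target`."""
    if source == target:
        return True
    if source in INT_BITS and target in INT_BITS:
        s_lo, s_hi = int_range(source)
        t_lo, t_hi = int_range(target)
        return t_lo <= s_lo and s_hi <= t_hi
    if source in INT_BITS and target in FLOAT_PRECISION:
        # Integers of magnitude up to 2**p are exact with a p-bit significand.
        s_lo, s_hi = int_range(source)
        limit = 2 ** FLOAT_PRECISION[target]
        return -limit <= s_lo and s_hi <= limit
    if source in FLOAT_PRECISION and target in FLOAT_PRECISION:
        return FLOAT_PRECISION[source] <= FLOAT_PRECISION[target]
    return False                          # float to integer, complex: not modeled
```

Under this reading u8 to i16 and i32 to f64 are accepted, while i16 to u16 (loss of range) and i32 to f32 (loss of precision) are not, unless the source is a CTC whose value happens to fit.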

Note that:

  • if the type is a pointer, the pointed types must be identical (none of the above conversions apply)

  • if the type is a container, the elements contained (or the key of the map) must be of identical type.

Passing Values

As outputs

A value can be passed as a function output argument if:

  • it has the same type as the parameter.

  • it is a const pointer and the parameter is a non-const pointer of the same type.

  • the destination is an interface and the source is a class implementing the interface.

 

As inputs

You can pass a value to an input argument if you can assign the value to the argument type, with one exception and one addition:

  • exception: it is not allowed if the destination is a dynamic vector and the source is an array with elements of the same type.

  • addition: it is allowed if the destination is an interface and the source is a class implementing the interface.

Limitations

Initialization lists and type inference.

An initialization list is not a typed value, it is just a collection of typed values. As such it cannot be used to infer the type of a newly declared var.

type declarations

You can't have a type declaration whose only purpose is to rename an existing user defined named type, like a class or an enumeration.
When you make a type declaration you create an alias for an existing type (typically because it is unnamed). If the type already has a name this may just lead to confusion.

member functions

If a member function doesn't access any member it shouldn't be a member function (so that it can be called without having to instantiate an object). In this case the compiler emits an error.

Map key types

Map key types must support the == operator (see above).

weak pointers

Weak pointers are allowed only as non aggregated class members.
The purpose is to limit as much as possible the use of weak pointers.
Since pointers always point to dynamically allocated objects and dynamically allocated objects are typically classes...

update operators on 8 and 16 bit integers

It is not allowed to use the update operators (e.g. +=, /=...) if the left term is an 8 or 16 bit integer.
If you expand the update operator (x = x + ...;), integer promotion turns the right-hand side into an i32, so the assignment would be a narrowing conversion.

switch default case

A switch or typeswitch is not allowed to just have the default case.

switch of enumeration completeness

If the switch expression is of enum type, all the enum cases must be present in the case clauses or the switch must have a default case.
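The two switch rules above (no default-only switch, and completeness for enums) can be expressed as a short check. This is a hedged Python sketch with hypothetical names; it only models the rules as stated here and the error texts are invented for illustration.

```python
def check_enum_switch(enum_cases, handled_cases, has_default):
    """Return a list of rule violations (empty if the switch is acceptable)."""
    errors = []
    if has_default and not handled_cases:
        errors.append("switch has only the default case")
    missing = [c for c in enum_cases if c not in handled_cases]
    if missing and not has_default:
        errors.append("unhandled enum cases: " + ", ".join(missing))
    return errors
```

For an enum with the hypothetical cases red and green, handling only red is accepted when a default case is present and rejected otherwise.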

About typeswitch

The typeswitch expression can NOT include function calls.

This is due to the fact that typeswitch is compiled into an if..else if..else if..else C++ construct. The expression is evaluated multiple times and if it is not idempotent you step into unpredictable behavior.

Sing expressions are guaranteed to be idempotent if they don't include function calls.
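The problem with re-evaluation can be reproduced in a few lines. The Python model below is not the generated C++: lowered_typeswitch, make_alternating_source and the two classes are hypothetical, and the "function call" is deliberately written so that successive calls return objects of different types.

```python
class Circle:
    pass


class Square:
    pass


def lowered_typeswitch(expression, cases):
    """Mimic an if / else-if chain that re-evaluates `expression` for every test."""
    for case_type, label in cases:
        if isinstance(expression(), case_type):   # evaluated once per test
            return label
    return "default"


def make_alternating_source():
    """Build an impure call: it returns a Circle, then a Square, then a Circle..."""
    shapes = [Circle(), Square()]
    state = {"calls": 0}

    def next_shape():
        shape = shapes[state["calls"] % 2]
        state["calls"] += 1
        return shape

    return next_shape, state
```

With cases [(Square, "square"), (Circle, "circle")] the impure source falls through to "default" even though both types are handled: the first test sees a Circle and the second sees a Square. An expression that always yields the same object selects the expected case.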

The typeswitch expression can't be a weak pointer.

If the typeswitch expression is of type const*, the case labels must be as well.

 
 
 
 
 
 

sing syntax

notation
 

  • syntax elements are enclosed in angular brackets :  < syntax element >

  • literals are in plain text, but can be enclosed in quotes to avoid ambiguity or to make them more visible. For example "." is a literal point, "[" is a literal bracket and """ is a literal quote.

  • optional elements or literals are enclosed in square brackets [ < optional > ]

  • alternate options are indicated by the | operator

  • zero_or_more() is used to insert zero or more occurrences of what is in the brackets

  • one_or_more() is used to insert one or more occurrences of what is in the brackets

terminal elements


Some elements are defined as terminal because they are easier to describe in words than formally.

< name >
< integer literal >
< string literal >
< float literal >

< imaginary literal >

General File Organization
 

< compilation unit > ::=

[< namespace declaration >]

zero_or_more(< required package >)

zero_or_more([public] < declaration >)

< namespace declaration > ::= namespace < qualified name > ;

< qualified name > ::= < name > | < name > "." < qualified name >

< required package > ::= requires """ < pkg path > """ [, <name> ]

< pkg path > ::= < name > | < name > / < pkg path >

< declaration > ::=

< var decl > | < const decl > | < type decl > | 

< func definition > | < enum decl > | < class decl > | < interface decl >

 


Variables, Constants, Types


< var decl > ::= var < name > [< type specification >] [= < initer >] ;

< const decl > ::= let < name > [< type specification >] = < initer > ;

< type decl > ::= type < name > < type specification > ;

< type specification > ::=

< base type > |

< qualified name > |
map ( < type specification > ) < type specification >  |
"[" [< array size >] "]" < type specification > |
[const] [weak] * < type specification > |
< function type >

< array size > ::= * | < expression >

< base type > ::=

i8 | u8 | i16 | u16 | i32 | u32 | i64 | u64 |
f32 | f64 | c64 | c128 | 
string | bool

< function type > ::= [pure] ( [< argsdef >] ) < return type >

< argsdef > ::=  < single argdef > | < single argdef > , < argsdef >
    
< single argdef > ::= [< direction >] <name> < type specification > [= < initer >]

< direction > ::= out | io | in

< return type > ::= void | < type specification >

< initer > ::= < expression > | { < initer list > }

< initer list > ::= < initer > | <initer> , < initer list >

 

Functions, Blocks, Statements


< func definition > ::= fn < func fullname > < function type > < block >

< func fullname > ::= < func name > | < class name > "." < func name >
< class name > ::= < name >
< func name > ::= < name >

< block > ::= { zero_or_more( < block item > ) }

< block item > ::= < var decl > | < const decl > | < statement >

< statement > ::=  
    < block > |
    while ( < expression > ) < block > | 
    for ( [< name > , ] < name > in < for range > ) < block > | 
    < if statement > |
    < switch statement > |
    < typeswitch statement > |
    break ; | 
    continue ; |
    return ( < expression > ) ; |
    < prefix expression > ++ ; |
    < prefix expression > -- ; |
    ++ < prefix expression > ; |
    -- < prefix expression > ; |
    swap ( < prefix expression > , < prefix expression > ) ; |
    < prefix expression > < update op > < prefix expression > ; |
    < function call > ;

< if statement > ::= 
    if ( < expression > ) < block > 
    zero_or_more( else if ( < expression > ) < block > ) 
    [ else < block > ]

< for range > ::= < expression > [ ":" < expression > [ step < expression > ]]

< switch statement > ::= switch ( < expression > ) { one_or_more( < single case > ) [ < default case > ] }

< single case > ::= one_or_more( case < expression > ":" ) [< statement >]

< default case > ::= default ":" [< statement >]

< typeswitch statement > ::= 
    typeswitch ( <name> = < expression > ) { 
    one_or_more( < single type case > ) 
    [ < default case > ] }

< single type case > ::= one_or_more( case < qualified name > ":" ) [< statement >]

< update op > ::= 
    "=" | "+=" | "-=" | "*=" | "/=" | 
    "^=" | "%=" | "&=" | "|=" | ">>=" | "<<="

 

Expressions


< expression > ::= < prefix expression > | < expression > < binop > < expression >

< binop > ::=  
    "+" | "-" | "*" | "/" | "^" | "%" | 
    "&" | "|" | ">>" | "<<" |
    "<" | "<=" | ">" | ">=" | "==" | "!=" | 
    "**" | "&&" | "||" 

< prefix expression > ::=  < unop > < prefix expression > | < postfix expression >

< unop > ::= "-" | "+" | "!" | "~" | "&" | "*"

< postfix expression > ::= 
    < expression term > | 
    < postfix expression > "[" <index> "]" | 
    < postfix expression > "." < name > | 
    < function call >

< function call > ::= < postfix expression > ( < arguments > )

< expression term > ::= 
    null | false | true | this | <name> |
    < integer literal > | < string literal > | < float literal > | < imaginary literal > | 
    < base type > ( < expression > ) | 
    ( < expression > ) |  
    < builtin op > ( < expression > , < expression > )

< builtin op > ::= min | max

< arguments > ::= < single argument > | < single argument > , < arguments >

< single argument > ::= <expression> [ ":" <name> ]

 

 

Enums, Interfaces, Classes


< enum decl > ::= enum <name> { < enum elements > } 

< enum elements > ::= < enum single element > | < enum single element > , < enum elements >


< enum single element > ::= < name > | < name > = < expression >
 


< interface decl > ::= interface <name> [ ":" < base interfaces > ] { zero_or_more( < interface element > ) }

< base interfaces > ::= < qualified name > | < qualified name > , < base interfaces >

< interface element > ::= fn [mut] <name> < function type > ;

 


< class decl > ::= class <name> [ ":" < base class interfaces > ] { zero_or_more( < class element > ) }

< base class interfaces > ::= < qualified name > [ by <name>] | < qualified name > , < base class interfaces >

< class element > ::= 
    public ":" |
    private ":" |
    < var decl > |
    fn [mut] <name> < function type > ; |
    fn <name> by <name> ;