Boost.Spirit

Boost.Spirit is a framework for writing parsers. In a sense, it's a replacement for the Lex / Flex and Yacc / Bison tools. One of the advantages of Boost.Spirit is that it does this in plain C++ source, removing the dependency on additional tools, their individual syntax and build steps.

Boost.Spirit consists of three parts:

  • Qi, for writing parsers (i.e., reading source and calling functions as appropriate).
  • Karma, for writing generators (i.e., turning data structures into byte sequences).
  • Lex, for writing lexical analyzers (i.e., tokenizing source); an optional helper for Qi parsers.

Qi

Parsers

Expression            Attribute        Description
attr( arg )           typeof( arg )    takes no input, always successful, exposes arg as result
eoi                   unused           matches end-of-input
eol                   unused           matches CR, LF, or CR followed by LF
eps                   unused           takes no input, always successful
eps( arg )            unused           takes no input, successful if arg evaluates to true
symbols<>                              see Symbol Tables below
lit("…")              unused           wraps a literal so that two literals can be combined, e.g. lit("keyword") >> '=' >> …
lexeme[ a ]           A                suppresses skip parsing
skip[ a ]             A                enables skip parsing
skip(p)[ a ]          A                enables skip parsing, using p as skip parser
omit[ a ]             unused           parses without exposing any attribute
raw[ a ]              [first, last)    parses, exposing the iterator range of the match
repeat(x)[ a ]        vector<A>        matches exactly x occurrences of a
repeat(x, inf)[ a ]   vector<A>        matches at least x occurrences of a
repeat(x, y)[ a ]     vector<A>        matches at least x, at most y occurrences of a

Operators

Operator  Meaning                 Arity    Notes
-         0..1 (optional)         prefix   attribute is boost::optional<A>
*         0..n                    prefix   attribute is std::vector<A>
+         1..n                    prefix   attribute is std::vector<A>
!         negate (not-predicate)  prefix   fails if the parser succeeds; does not consume input
&         and (and-predicate)     prefix   fails if the parser fails; does not consume input
>>        followed by             n-ary    attribute is fusion::vector< A, B, C >
>         expecting               n-ary    as >>, but throws qi::expectation_failure if a later operand fails, instead of backtracking
||        sequential or           n-ary    matches a, a followed by b, or just b; more efficient than a >> -b | b
|         alternative             n-ary    tries a, then b; attribute is boost::variant< A, B >
-         difference              binary   parses a, but only where b does not match; attribute is that of a, b is ignored
%         list of, separated by   binary   a % ',' is shorthand for a >> *(',' >> a)
^         permutation             n-ary    matches a and/or b in any order, 0..1 times each
=         assignment              binary   assigns the RHS parser to the LHS rule / grammar
%=        assignment              binary   as =, but also propagates the RHS parser's attribute to the LHS rule / grammar; requires compatible attribute types

Semantic Actions

Indicated by a postfix [] after a parser. The handler is called when the parser matches and receives the parser's attribute (e.g. a double for double_). Optionally, it also receives the parser context and a reference to a boolean “hit” parameter; setting the latter to false makes the match fail.

Handlers can be:

Plain Functions

void handle( double const & d );
 
// Direct
parse( first, last, double_[ &handle ] );
// boost::bind
parse( first, last, double_[ boost::bind( &handle, _1 ) ] );

Member Functions

struct handler
{
    void handle( double const & d ) const;
};
handler h;
 
parse( first, last, double_[ boost::bind( &handler::handle, &h, _1 ) ] );

Function Object

struct handler
{
    // Using placeholders for parser context and "hit" parameter
    void operator()( double const & d, boost::spirit::qi::unused_type, boost::spirit::qi::unused_type ) const;
};
 
parse( first, last, double_[handler()] );

Lambda

parse( first, last, double_[ std::cout << _1 << '\n' ] );

Note on Phoenix

The _1 placeholder is used by Boost.Bind, Boost.Lambda, and Phoenix.

  • Boost.Bind placeholders are e.g. ::_1
  • Boost.Lambda placeholders are e.g. boost::lambda::_1
  • Boost.Phoenix placeholders are e.g. boost::spirit::_1 or boost::spirit::qi::_1

Make sure you do not mix & mingle those, as they are not compatible. Phoenix is recommended.

  • ref( variable ) wraps a variable from parser scope as a mutable reference, so that a semantic action can modify it.
  • _val refers to a rule's synthesized attribute.

For pushing elements into a vector, Phoenix offers push_back( vector, element ). The vector, too, must be wrapped in ref().

Functions

boost::spirit::qi::parse( begin_iterator, end_iterator, grammar_parser );
boost::spirit::qi::phrase_parse( begin_iterator, end_iterator, grammar_parser, skip_parser );
boost::spirit::qi::phrase_parse( begin_iterator, end_iterator, grammar_parser, skip_parser, parser_attribute );

The last call can be used in combination with parser % ',' to put the parsed sequence directly into a vector, instead of going through individual push_back() calls.

A true return value indicates a match (partial or complete). The begin_iterator is passed by reference, and advanced to the first character for which no match was possible. A complete match is indicated by begin_iterator == end_iterator after the call.

Symbol Tables

You can derive from qi::symbols to define symbol-value pairs that can then be used in a parser definition. The class is templated on the input character type and on the value type associated with each symbol.

struct tictac_ : qi::symbols< char, unsigned >
{
    tictac_()
    {
        add
            ("X"    , 1)
            ("O"    , 0)
        ;
    }
} tictac;


Rules

Parser expressions can be assigned to “rules”, for modularizing a grammar.

rule< Iterator >
rule< Iterator, Skipper >
rule< Iterator, Signature >
rule< Iterator, Signature, Skipper >
rule< Iterator, Skipper, Signature >

Only the versions including a Skipper type can be used with phrase_parse(); a rule without a Skipper is limited to the non-skipping parse().

Signature specifies the attributes of the rule, written like a function type: result_type( type1, type2, ... ). The result_type is the rule's own (synthesized) attribute and can be void; the parameter types declare inherited attributes, which can be referred to in the rule as _r1, _r2 and so on (courtesy of Boost.Phoenix).

Rules can be given a name (for error handling), through the .name( std::string ) member function.

Grammars

A grammar encapsulates one or more rules, and assembles them for use.

  • Derive from grammar (giving the same template parameters as for “rules”)
  • Declare any rules used as member variables
  • Initialize the base class constructor with the rule to be called first when parsing, and give the grammar a name (for error handling)
  • Define the rules in the constructor

template < typename Iterator >
struct tictactoe : boost::spirit::qi::grammar< Iterator, unsigned() >
{
    boost::spirit::qi::rule< Iterator, unsigned() > r;
 
    // If tictactoe were not a template, we could use just base_type(r).
    tictactoe() : tictactoe::base_type( r, "tictactoe" )
    {
        r = eps[ _val = 0 ]
            >> tictac[_val += _1]
        ;
    }
};

Calling a parser with inherited attributes then looks very much like a function call – parser( parameter ). From the Boost.Spirit example, an XML parser:

qi::rule< Iterator, std::string(), ascii::space_type >         start_tag;
qi::rule< Iterator, void( std::string ), ascii::space_type >   end_tag;
 
// ...
 
start_tag.name( "start_tag" );
end_tag.name( "end_tag" );
 
start_tag =
        '<'
    >>  !char_('/')
    >> lexeme[ +(char_ - '>')[_val += _1] ]
    >> '>'
;
 
end_tag =
        "</"
    >>  string(_r1)
    >> '>'
;
 
// ...
 
xml =
        start_tag[ at_c<0>( _val ) = _1 ]
    >>  *node    [ push_back( at_c<1>( _val ), _1 ) ]
    >>  end_tag( at_c<0>( _val ) )
;

Instead of the Phoenix at_c construct, you can use rule locals. (See the Boost example, “One More Take” / “Local Variables”.)

Error Handling

Error handlers are attached to a rule via the on_error< action >( rule, handler ) function. A handler is invoked when an expectation point (operator >) inside the rule fails.

The action parameter is the action to take:

Action    Effect
fail      return no_match
retry     try to match again
accept    adjust iterator, return match
rethrow   rethrow the exception

The rule parameter is the rule to which the error handler should be attached.

The handler parameter is the function to call if an error is caught. It takes 4 arguments:

Argument    Description
first       position of the iterator when the rule was entered
last        end of input
error-pos   position of the iterator when the error occurred
what        a description of what was expected (a streamable boost::spirit::info object)

In a Phoenix handler, these four arguments are available as the placeholders _1 to _4:

on_error< fail >
(
    xml
  , std::cerr
        << val( "Error, expecting " )
        << _4
        << val( " here: \"" )
        << construct< std::string >( _3, _2 )
        << val( "\"" )
        << std::endl
);
software/spirit.txt · Last modified: 2018/09/10 16:21 (external edit)