Boost.Spirit
Boost.Spirit is a framework for writing parsers. In a sense, it's a replacement for the Lex / Flex and Yacc / Bison tools. One of the advantages of Boost.Spirit is that it does this in plain C++ source, removing the dependency on additional tools, their individual syntax and build steps.
Boost.Spirit consists of three parts:
- Qi, for writing parsers (i.e., reading source and calling functions as appropriate).
- Karma, for writing generators (i.e., turning data structures into byte sequences).
- Lex, for writing lexical analyzers (i.e., tokenizing source), an optional auxiliary to Qi parsers.
Qi
Parsers
attr( arg ) | typeof( attr ) | takes no input, always successful, exposes arg as result |
eoi | unused | matches end-of-input |
eol | unused | matches CR, LF, or combinations thereof |
eps | unused | takes no input, always successful |
eps( arg ) | unused | takes no input, successful if arg evaluates to true |
symbols<> | see Symbol Tables below | |
lit("…") | unused | matches a literal, to allow lit("keyword") >> '=' >> … |
lexeme[ … ] | suppress skip parsing | |
skip[ … ] | enable skip parsing | |
skip(p)[ … ] | enable skip parsing, using p as skip parser | |
omit[ … ] | unused | parses without exposing any attribute |
raw[ … ] | [first, last) | parses, exposing the iterator range of the match |
repeat(x)[ a ] | vector<A> | matches exactly x occurrences of a |
repeat(x, inf)[ a ] | vector<A> | matches at least x occurrences of a |
repeat(x, y)[ a ] | vector<A> | matches at least x , at most y occurrences of a |
Operators
- | 0..1 | prefix; attribute is boost::optional<A> |
* | 0..n | prefix; attribute is std::vector<A> |
+ | 1..n | prefix; attribute is std::vector<A> |
! | negate | prefix; fails if the parser succeeds; does not consume input |
& | and | prefix; fails if the parser fails; does not consume input |
>> | followed by | nary; attribute is fusion::vector< A, B, C > |
> | expecting | nary; as >> but with error on fail instead of backtracking |
|| | sequential or | nary; matches a, a followed by b, or just b; shorthand for (a >> -b) | b |
| | alternative | nary; try a, then try b; attribute is boost::variant< A, B > |
- | difference | binary; parses a, but not b; attribute is just a, b is ignored |
% | list of, separated by | binary; a % ',' is shorthand for a >> *(',' >> a) |
^ | permutation | nary; matches a or b in any order, 0..1 times each |
= | assignment | binary; assigns the RHS parser to the LHS rule / grammar |
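%= | auto assignment | binary; as =, but the RHS parser's attribute is propagated to the LHS rule / grammar automatically, even with semantic actions present (the types must be compatible) |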
Semantic Actions
Indicated by postfix [] after a parser. Calls the indicated handler, passing it the parser's attribute (e.g. a double for double_). Optionally also passes the parser context and a reference to a boolean “hit” parameter.
Handlers can be:
Plain Functions
    void handle( double const & d );

    // Direct
    parse( first, last, double_[ &handle ] );

    // boost::bind
    parse( first, last, double_[ boost::bind( &handle, _1 ) ] );
Member Functions
    struct handler
    {
        void handle( double const & d ) const;
    };

    handler h;
    parse( first, last, double_[ boost::bind( &handler::handle, &h, _1 ) ] );
Function Object
    struct handler
    {
        // Using placeholders for parser context and "hit" parameter
        void operator()( double const & d,
                         boost::spirit::qi::unused_type,
                         boost::spirit::qi::unused_type ) const;
    };

    parse( first, last, double_[ handler() ] );
Lambda
parse( first, last, double_[ std::cout << _1 << '\n' ] );
Note on Phoenix
The _1 placeholder is used by Boost.Bind, Boost.Lambda, and Phoenix.
- Boost.Bind placeholders are e.g. ::_1
- Boost.Lambda placeholders are e.g. boost::lambda::_1
- Boost.Phoenix placeholders are e.g. boost::spirit::_1 or boost::spirit::qi::_1
Make sure you do not mix these, as they are not compatible. Phoenix is recommended.
- ref() indicates that a variable name used in a semantic action is a variable at parser scope, i.e. a mutable reference.
- _val refers to a rule's synthesized attribute.
If setting a variable in parser scope via a Phoenix placeholder, you need to put the variable name inside ref(), to indicate that it is a mutable reference.
For pushing elements into a vector, Phoenix offers push_back( vector, element ). Note that the vector must be inside ref() again.
Functions
    boost::spirit::qi::parse( begin_iterator, end_iterator, grammar_parser );
    boost::spirit::qi::phrase_parse( begin_iterator, end_iterator, grammar_parser, skip_parser );
    boost::spirit::qi::phrase_parse( begin_iterator, end_iterator, grammar_parser, skip_parser, parser_attribute );
The last call can be used in combination with parser % ',' to put the parsed sequence directly into a vector, instead of going through individual push_back() calls.
A true return value indicates a match (partial or complete). The begin_iterator is passed by reference, and advanced to the first character for which no match was possible. A complete match is indicated by begin_iterator == end_iterator after the call.
Symbol Tables
You can derive from qi::symbols to define symbol-value pairs that can then be used in a parser definition. The class is templated on the input character type and on the value type to be associated with each symbol.
    struct tictac_ : qi::symbols< char, unsigned >
    {
        tictac_()
        {
            add
                ( "X", 1 )
                ( "O", 0 )
            ;
        }
    } tictac;
Rules
Parser expressions can be assigned to “rules”, for modularizing a grammar.
rule< Iterator > |
rule< Iterator, Skipper > |
rule< Iterator, Signature > |
rule< Iterator, Signature, Skipper > |
rule< Iterator, Skipper, Signature > |
Only the versions including a Skipper type can be used with phrase_parse() (not having a Skipper limits you to the non-skipping parse()).
Signature specifies the attributes of the rule. The Signature can also declare inherited attributes in addition to its own result_type (which can be void): result_type( type1, type2, type3 ). Such inherited attributes can be referred to in the rule as _r1, _r2 and so on (courtesy of Boost.Phoenix).
Rules can be given a name (for error handling), through the .name( std::string ) member function.
Grammars
A grammar encapsulates one or more rules, and assembles them for use.
- Derive from grammar (giving the same template parameters as for “rules”)
- Declare any rules used as member variables
- Initialize the base class constructor with the rule to be called first when parsing, and give the grammar a name (for error handling)
- Define the rules in the constructor
    template < typename Iterator >
    struct tictactoe : boost::spirit::qi::grammar< Iterator, unsigned() >
    {
        boost::spirit::qi::rule< Iterator, unsigned() > r;

        // If tictactoe were not a template, we could use just base_type(r).
        tictactoe() : tictactoe::base_type( r, "tictactoe" )
        {
            r = eps[ _val = 0 ] >> tictac[ _val += _1 ];
        }
    };
Calling a parser with inherited attributes then looks very much like a function call: parser( parameter ). From the Boost.Spirit example, an XML parser:
    qi::rule< Iterator, std::string(), ascii::space_type > start_tag;
    qi::rule< Iterator, void( std::string ), ascii::space_type > end_tag;
    // ...
    start_tag.name( "start_tag" );
    end_tag.name( "end_tag" );

    start_tag = '<' >> !char_('/') >> lexeme[ +(char_ - '>')[_val += _1] ] >> '>' ;
    end_tag = "</" >> lit(_r1) >> '>' ;
    // ...
    xml = start_tag[ at_c<0>( _val ) = _1 ]
        >> *node [ push_back( at_c<1>( _val ), _1 ) ]
        >> end_tag( at_c<0>( _val ) )
        ;
Instead of using the Phoenix at_c construct, you can use locals. (See Boost example, “One More Take” / “Local Variables”.)
Error Handling
Error handlers can be declared via the on_error< action >( rule, handler ) function.
The action parameter is the action to take:
fail | return no_match |
retry | try to match again |
accept | adjust iterator, return match |
rethrow | rethrow the exception |
The rule parameter is the rule to which the error handler should be attached.
The handler parameter is the function to call if an error is caught. It takes 4 arguments:
first | position of iterator when the rule was entered |
last | end of input |
error-pos | position of iterator when error occurred |
what | a string describing the failure |
This can be handled via Phoenix placeholders as well:
    on_error< fail >
    (
        xml,
        std::cerr << val( "Error, expecting " )
                  << _4
                  << val( " here: \"" )
                  << construct< std::string >( _3, _2 )
                  << val( "\"" )
                  << std::endl
    );