====== Boost.Spirit ======
[[http://www.boost.org/doc/libs/release/libs/spirit/doc/html/index.html | Boost.Spirit]] is a framework for writing parsers. In a sense, it's a replacement for the Lex / Flex and Yacc / Bison tools. One of the advantages of Boost.Spirit is that it does this in plain C++ source, removing the dependency on additional tools, their individual syntax and build steps.
Boost.Spirit consists of three parts:
* Qi, for writing //parsers// (i.e., reading source and calling functions as appropriate).
* Karma, for writing //generators// (i.e., turning data structures into byte sequences).
* Lex, for writing //lexical analyzers// (i.e., tokenizing source), an optional, auxiliary component for Qi parsers.
===== Qi =====
==== Parsers ====
| ''attr( arg )'' | ''typeof( arg )'' | takes no input, always successful, exposes ''arg'' as result |
| ''eoi'' | unused | matches end-of-input |
| ''eol'' | unused | matches CR, LF, or combinations thereof |
| ''eps'' | unused | takes no input, always successful |
| ''eps( arg )'' | unused | takes no input, successful if ''arg'' evaluates to ''true'' |
| ''symbols<>'' | value type of the table | see Symbol Tables below |
| ''lit("...")'' | unused | literal, to allow ''lit("keyword") >> '=' >> ...'' |
| ''lexeme[ ... ]'' | as subject | suppress skip parsing |
| ''skip[ ... ]'' | as subject | enable skip parsing |
| ''skip(p)[ ... ]'' | as subject | enable skip parsing, using ''p'' as skip parser |
| ''omit[ ... ]'' | unused | parses without exposing any attribute |
| ''raw[ ... ]'' | ''[first, last)'' | parses, exposing the iterator range of the match |
| ''repeat(x)[ a ]'' | ''vector<A>'' | matches exactly ''x'' occurrences of ''a'' |
| ''repeat(x, inf)[ a ]'' | ''vector<A>'' | matches at least ''x'' occurrences of ''a'' |
| ''repeat(x, y)[ a ]'' | ''vector<A>'' | matches at least ''x'', at most ''y'' occurrences of ''a'' |
* [[http://www.boost.org/doc/libs/release/libs/spirit/doc/html/spirit/qi/quick_reference/qi_parsers/char.html | Character Parsers]]
* [[http://www.boost.org/doc/libs/release/libs/spirit/doc/html/spirit/qi/quick_reference/qi_parsers/numeric.html | Numeric Parsers]]
* [[http://www.boost.org/doc/libs/release/libs/spirit/doc/html/spirit/qi/quick_reference/qi_parsers/string.html | String Parsers]]
==== Operators ====
| ''-'' | 0..1 | prefix; attribute is ''boost::optional<A>'' |
| ''*'' | 0..n | prefix; attribute is ''std::vector<A>'' |
| ''+'' | 1..n | prefix; attribute is ''std::vector<A>'' |
| ''!'' | negate | prefix; fails if the parser succeeds; does not consume input |
| ''&'' | and | prefix; fails if the parser fails; does not consume input |
| ''>>'' | followed by | nary; attribute is ''fusion::vector< A, B, C >'' |
| ''>'' | expecting | nary; as ''>>'', but once the first operand has matched, a failing operand throws ''expectation_failure'' instead of backtracking |
| ''||'' | sequential or | nary; matches a, a followed by b, or just b; a quicker equivalent of ''a >> -b | b'' |
| ''|'' | alternative | nary; try a, then try b; attribute is ''boost::variant< A, B >'' |
| ''-'' | difference | binary; parses a, but not b; attribute is just a, b is ignored |
| ''%'' | list of, separated by | binary; ''a % b'' matches like ''a >> *(b >> a)''; attribute is ''std::vector<A>'' |
| ''^'' | permutation | nary; matches a or b in any order, 0..1 times each |
| ''='' | assignment | binary; assigns the RHS parser to the LHS rule / grammar |
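| ''%='' | assignment | binary; as ''='', but also propagates the attribute of the RHS parser to the LHS rule / grammar; requires the attribute type of the RHS parser to match that of the LHS |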
==== Semantic Actions ====
Indicated by postfix ''[]'' after a parser. The handler is called whenever the parser matches, and receives the parser's attribute (e.g. ''double'' for ''double_''). Optionally, it also receives the parser context and a reference to a boolean "hit" parameter; setting the latter to ''false'' makes the match fail.
Handlers can be:
=== Plain Functions ===
void handle( double const & d );
// Direct
parse( first, last, double_[ &handle ] );
// boost::bind
parse( first, last, double_[ boost::bind( &handle, _1 ) ] );
=== Member Functions ===
struct handler
{
void handle( double const & d ) const;
};
handler h;
parse( first, last, double_[ boost::bind( &handler::handle, &h, _1 ) ] );
=== Function Object ===
struct handler
{
// Using placeholders for parser context and "hit" parameter
void operator()( double const & d, boost::spirit::qi::unused_type, boost::spirit::qi::unused_type ) const;
};
parse( first, last, double_[handler()] );
=== Lambda ===
parse( first, last, double_[ std::cout << _1 << '\n' ] );
==== Note on Phoenix ====
The ''_1'' placeholder is used by Boost.Bind, Boost.Lambda, and Phoenix.
* Boost.Bind placeholders are e.g. ''::_1''
* Boost.Lambda placeholders are e.g. ''boost::lambda::_1''
* Boost.Phoenix placeholders are e.g. ''boost::spirit::_1'' or ''boost::spirit::qi::_1''
Make sure you do not mix them, as they are not compatible with each other. Phoenix is recommended.
* ''ref()'' marks a name used in a semantic action as a variable at parser scope, i.e. a mutable reference.
* ''_val'' refers to a rule's synthesized attribute.
If setting a variable in parser scope via a Phoenix placeholder, you need to put the variable name inside ''ref()'', to indicate that it is a mutable reference.
For pushing elements into a vector, Phoenix offers ''push_back( vector, element )''. Note that the vector must be inside ''ref()'' again.
==== Functions ====
boost::spirit::qi::parse( begin_iterator, end_iterator, grammar_parser );
boost::spirit::qi::phrase_parse( begin_iterator, end_iterator, grammar_parser, skip_parser );
boost::spirit::qi::phrase_parse( begin_iterator, end_iterator, grammar_parser, skip_parser, parser_attribute );
The last call can be used in combination with ''parser % ',''' to put the parsed sequence directly into a vector, instead of going through individual ''push_back()'' calls.
A ''true'' return value indicates a match (partial or complete). The ''begin_iterator'' is passed by reference, and advanced to the first character for which no match was possible. A complete match is indicated by ''begin_iterator == end_iterator'' after the call.
==== Symbol Tables ====
You can derive from ''qi::symbols'' to define symbol-value pairs that can then be used in a parser definition. The class is templated on the input character type and on the value type associated with each symbol.
struct tictac_ : qi::symbols< char, unsigned >
{
tictac_()
{
add
("X" , 1)
("O" , 0)
;
}
} tictac;
[[http://www.boost.org/doc/libs/release/libs/spirit/doc/html/spirit/qi/reference/string/symbols.html |Documentation]]
==== Rules ====
Parser expressions can be assigned to "rules", for modularizing a grammar.
| rule< Iterator > |
| rule< Iterator, Skipper > |
| rule< Iterator, Signature > |
| rule< Iterator, Signature, Skipper > |
| rule< Iterator, Skipper, Signature > |
Only the versions including a ''Skipper'' type skip between their elements when used with ''phrase_parse()''; a rule declared without a Skipper matches its input without skipping, as if it were wrapped in ''lexeme[]''.
''Signature'' specifies the attributes of the rule. The Signature can also declare //inherited// attributes in addition to its own ''result_type'' (which can be ''void''): ''result_type( typeN, typeN, typeN )''. Such inherited attributes can be referred to in the rule as ''_r1'', ''_r2'' and so on (courtesy of Boost.Phoenix).
Rules can be given a name (for error handling), through the ''.name( std::string )'' member function.
==== Grammars ====
A grammar encapsulates one or more rules and assembles them for use.
* Derive from ''grammar'' (giving the same template parameters as for "rules")
* Declare any rules used as member variables
* Initialize the base class constructor with the rule to be called first when parsing, and give the grammar a name (for error handling)
* Define the rules in the constructor
template < typename Iterator >
struct tictactoe : boost::spirit::qi::grammar< Iterator, unsigned() >
{
boost::spirit::qi::rule< Iterator, unsigned() > r;
// If tictactoe were not a template, we could use just base_type(r).
tictactoe() : tictactoe::base_type( r, "tictactoe" )
{
r = eps[ _val = 0 ]
>> tictac[_val += _1]
;
}
};
Calling a parser with inherited attributes then looks very much like a function call -- ''parser( parameter )''. From the Boost.Spirit example, an XML parser:
qi::rule< Iterator, std::string(), ascii::space_type > start_tag;
qi::rule< Iterator, void( std::string ), ascii::space_type > end_tag;
// ...
start_tag.name( "start_tag" );
end_tag.name( "end_tag" );
start_tag =
'<'
>> !char_('/')
>> lexeme[ +(char_ - '>')[_val += _1] ]
>> '>'
;
end_tag =
"</"
>> lit(_r1)
>> '>'
;
// ...
xml =
start_tag[ at_c<0>( _val ) = _1 ]
>> *node [ push_back( at_c<1>( _val ), _1 ) ]
>> end_tag( at_c<0>( _val ) )
;
Instead of the Phoenix ''at_c'' construct, you can use **locals**. (See [[http://www.boost.org/doc/libs/release/libs/spirit/doc/html/spirit/qi/tutorials/mini_xml___asts_.html | Boost example]], "One More Take" / "Local Variables".)
==== Error Handling ====
Error handlers can be declared via the ''on_error< action >( rule, handler )'' function.
The ''action'' parameter is the action to take:
| ''fail'' | return no_match |
| ''retry'' | try to match again |
| ''accept'' | adjust iterator, return match |
| ''rethrow'' | rethrow |
The ''rule'' parameter is the rule to which the error handler should be attached.
The ''handler'' parameter is the function to call if an error is caught. It takes 4 arguments:
| ''first'' | position of iterator when the rule was entered |
| ''last'' | end of input |
| ''error-pos'' | position of iterator when error occurred |
| ''what'' | a string describing the failure |
This can be handled via Phoenix placeholders as well:
on_error< fail >
(
xml
, std::cerr
<< val( "Error, expecting " )
<< _4
<< val( " here: \"" )
<< construct< std::string >( _3, _2 )
<< val( "\"" )
<< std::endl
);