====== Boost.Spirit ======

[[http://www.boost.org/doc/libs/release/libs/spirit/doc/html/index.html | Boost.Spirit]] is a framework for writing parsers. In a sense, it is a replacement for the Lex / Flex and Yacc / Bison tools. One of its advantages is that grammars are written in plain C++ source, which removes the dependency on additional tools with their own syntax and build steps.

Boost.Spirit consists of three parts:

  * Qi, for writing //parsers// (i.e., reading source and calling functions as appropriate).
  * Karma, for writing //generators// (i.e., turning data structures into byte sequences).
  * Lex, for writing //lexical analyzers// (i.e., tokenizing source); optional, it can serve as a front end feeding tokens to Qi parsers.

===== Qi =====

==== Parsers ====

^ Parser ^ Attribute ^ Description ^
| ''attr( arg )'' | type of ''arg'' | takes no input, always succeeds, exposes ''arg'' as its attribute |
| ''eoi'' | unused | matches end of input |
| ''eol'' | unused | matches end of line (CR, LF, or CR LF) |
| ''eps'' | unused | takes no input, always succeeds |
| ''eps( arg )'' | unused | takes no input, succeeds if ''arg'' evaluates to ''true'' |
| ''symbols<>'' | value type | see Symbol Tables below |
| ''lit( "..." )'' | unused | matches a literal without exposing an attribute; needed where no operand is a Spirit parser yet, as in ''lit( "keyword" ) >> '=' >> ...'' |
| ''lexeme[ a ]'' | ''A'' | disables skipping inside ''a'' (after an initial pre-skip) |
| ''skip[ a ]'' | ''A'' | re-enables skipping inside ''a'' |
| ''skip( p )[ a ]'' | ''A'' | enables skipping inside ''a'', using ''p'' as skip parser |
| ''omit[ a ]'' | unused | parses ''a'' without exposing its attribute |
| ''raw[ a ]'' | ''boost::iterator_range<Iterator>'' | parses ''a'', exposing the iterator range ''[first, last)'' of the match |
| ''repeat( x )[ a ]'' | ''vector<A>'' | matches exactly ''x'' occurrences of ''a'' |
| ''repeat( x, inf )[ a ]'' | ''vector<A>'' | matches at least ''x'' occurrences of ''a'' |
| ''repeat( x, y )[ a ]'' | ''vector<A>'' | matches at least ''x'', at most ''y'' occurrences of ''a'' |

  * [[http://www.boost.org/doc/libs/release/libs/spirit/doc/html/spirit/qi/quick_reference/qi_parsers/char.html | Character Parsers]]
  * [[http://www.boost.org/doc/libs/release/libs/spirit/doc/html/spirit/qi/quick_reference/qi_parsers/numeric.html | Numeric Parsers]]
  * [[http://www.boost.org/doc/libs/release/libs/spirit/doc/html/spirit/qi/quick_reference/qi_parsers/string.html | String Parsers]]
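A minimal sketch of two of the directives above, ''repeat'' and ''lexeme''. The helper names ''area_code'' and ''first_word'' are hypothetical; the code assumes ''std::string'' input and the standard Qi headers.

<code cpp>
#include <boost/spirit/include/qi.hpp>
#include <string>

namespace qi    = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;

// Hypothetical helper. repeat( 3 )[ ... ]: exactly three digits between
// parentheses, e.g. "(555)" -> "555". The '(' and ')' literals expose no
// attribute, so only the digits end up in 'digits'.
bool area_code( std::string const & input, std::string & digits )
{
    std::string::const_iterator first = input.begin();
    bool ok = qi::parse( first, input.end(),
                         qi::lit( '(' ) >> qi::repeat( 3 )[ qi::digit ] >> ')', digits );
    return ok && first == input.end();
}

// Hypothetical helper. lexeme[ ... ]: the skipper runs before the word, but
// not between its letters. Without lexeme[], "  hello world" would yield
// "helloworld".
std::string first_word( std::string const & input )
{
    std::string word;
    std::string::const_iterator first = input.begin();
    qi::phrase_parse( first, input.end(), qi::lexeme[ +ascii::alpha ], ascii::space, word );
    return word;
}
</code>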

==== Operators ====

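^ Operator ^ Name ^ Description ^
| ''-a'' | optional | prefix; matches ''a'' 0..1 times; attribute is ''boost::optional<A>'' |
| ''*a'' | Kleene star | prefix; matches ''a'' 0..n times; attribute is ''std::vector<A>'' |
| ''+a'' | plus | prefix; matches ''a'' 1..n times; attribute is ''std::vector<A>'' |
| ''!a'' | not-predicate | prefix; fails if ''a'' succeeds; does not consume input |
| ''&a'' | and-predicate | prefix; fails if ''a'' fails; does not consume input |
| ''a >> b'' | sequence ("followed by") | n-ary; attribute is ''fusion::vector< A, B >'' |
| ''a > b'' | expectation | n-ary; like ''>>'', but throws ''expectation_failure'' instead of backtracking if an operand after the first fails |
| ''%%a || b%%'' | sequential or | n-ary; matches ''a'', ''a'' followed by ''b'', or just ''b''; quicker than ''%%a >> -b | b%%''; attribute is ''fusion::vector< optional<A>, optional<B> >'' |
| ''%%a | b%%'' | alternative | n-ary; tries ''a'', then ''b''; attribute is ''boost::variant< A, B >'' |
| ''a - b'' | difference | binary; matches ''a'' but not ''b''; attribute is ''A'', ''b'' is ignored |
| ''a % b'' | list | binary; ''a'' separated by ''b'', shorthand for ''a >> *( b >> a )''; attribute is ''std::vector<A>'' |
| ''a ^ b'' | permutation | n-ary; matches ''a'' and ''b'' in any order, each 0..1 times, at least one of them; attribute is ''fusion::vector< optional<A>, optional<B> >'' |
| ''r = p'' | rule definition | binary; assigns the parser expression ''p'' to the rule ''r''; attributes propagate automatically only if ''p'' has no semantic actions |
| ''r %= p'' | auto-rule definition | binary; like ''='', but the attribute of ''p'' is propagated to ''r'' even if ''p'' has semantic actions (the attributes must be compatible) |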
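A small sketch of how the operator attributes come out in practice, assuming ''std::string'' input: ''-a'' yields a ''boost::optional'', and ''*( a - b )'' collects characters into a string. The helpers ''bracketed'' and ''head'' are hypothetical names.

<code cpp>
#include <boost/optional.hpp>
#include <boost/spirit/include/qi.hpp>
#include <string>

namespace qi = boost::spirit::qi;

// Hypothetical helper. Optional (-a): "[42]" yields 42,
// "[]" yields an empty boost::optional<int>.
boost::optional< int > bracketed( std::string const & input )
{
    boost::optional< int > n;
    std::string::const_iterator first = input.begin();
    qi::parse( first, input.end(), '[' >> -qi::int_ >> ']', n );
    return n;
}

// Hypothetical helper. Difference (a - b) inside a Kleene star (*a):
// every character up to, but not including, the first ','.
std::string head( std::string const & input )
{
    std::string s;
    std::string::const_iterator first = input.begin();
    qi::parse( first, input.end(), *( qi::char_ - ',' ), s );
    return s;
}
</code>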

==== Semantic Actions ====

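A semantic action is attached to a parser with a postfix ''[]'', as in ''p[ f ]''. Whenever ''p'' matches, ''f'' is called with the attribute of ''p'' (e.g. ''double'' for ''double_''). Optionally, ''f'' also receives the parser context and a reference to a boolean "hit" parameter; setting the latter to ''false'' makes the match fail.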

Handlers can be:

=== Plain Functions ===

<code cpp>
void handle( double const & d );

// Direct
parse( first, last, double_[ &handle ] );
// boost::bind
parse( first, last, double_[ boost::bind( &handle, _1 ) ] );
</code>

=== Member Functions ===

<code cpp>
struct handler
{
    void handle( double const & d ) const;
};
handler h;

parse( first, last, double_[ boost::bind( &handler::handle, &h, _1 ) ] );
</code>

=== Function Object ===

<code cpp>
struct handler
{
    // Using placeholders for parser context and "hit" parameter
    void operator()( double const & d, boost::spirit::qi::unused_type, boost::spirit::qi::unused_type ) const;
};

parse( first, last, double_[handler()] );
</code>
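A sketch of a function object that uses the third ("hit") parameter to reject a match after the fact, assuming ''std::string'' input. ''non_negative'' and ''parse_non_negative'' are hypothetical names.

<code cpp>
#include <boost/spirit/include/qi.hpp>
#include <string>

namespace qi = boost::spirit::qi;

// Hypothetical function object: clearing the "hit" flag makes the match fail,
// so double_[ non_negative() ] rejects negative numbers.
struct non_negative
{
    void operator()( double const & d, qi::unused_type, bool & hit ) const
    {
        hit = ( d >= 0.0 );
    }
};

// Hypothetical helper: true only for a complete, non-negative number.
bool parse_non_negative( std::string const & input, double & value )
{
    std::string::const_iterator first = input.begin();
    bool ok = qi::parse( first, input.end(), qi::double_[ non_negative() ], value );
    return ok && first == input.end();
}
</code>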

=== Lambda ===

<code cpp>
// _1 is a Phoenix (or Boost.Lambda) placeholder here, not a Boost.Bind one
parse( first, last, double_[ std::cout << _1 << '\n' ] );
</code>

==== Note on Phoenix ====

The ''_1'' placeholder name is used by Boost.Bind, Boost.Lambda, and Phoenix:

  * Boost.Bind placeholders are ''boost::placeholders::_1'' (or the global ''::_1'' from ''<boost/bind.hpp>'')
  * Boost.Lambda placeholders are e.g. ''boost::lambda::_1''
  * Spirit's Phoenix placeholders are e.g. ''boost::spirit::_1'' or ''boost::spirit::qi::_1''

Do not mix them; they are not compatible with each other. Phoenix is recommended.

  * ''ref()'' marks a variable from the scope surrounding the parser call as a mutable reference inside a semantic action.
  * ''_val'' refers to the enclosing rule's synthesized attribute.

To set a variable in parser scope from a Phoenix semantic action, wrap the variable name in ''ref()'' (e.g. ''int_[ ref( n ) = _1 ]''); the assignment then happens each time the action runs.

For appending elements to a vector, Phoenix offers ''push_back( vector, element )'' (from ''boost/phoenix/stl.hpp''). Again, the vector must be wrapped in ''ref()''.
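A minimal sketch of ''ref()'' and ''push_back()'' in semantic actions, assuming comma-separated integers in a ''std::string''. The helpers ''sum_list'' and ''collect_list'' are hypothetical.

<code cpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <boost/spirit/include/phoenix_stl.hpp>
#include <string>
#include <vector>

namespace qi  = boost::spirit::qi;
namespace phx = boost::phoenix;

// Hypothetical helper: sums a list such as "1,2,3".
// phx::ref( sum ) makes the local variable a mutable Phoenix reference;
// qi::_1 is the int matched by int_.
bool sum_list( std::string const & input, int & sum )
{
    sum = 0;
    std::string::const_iterator first = input.begin();
    bool ok = qi::parse( first, input.end(), qi::int_[ phx::ref( sum ) += qi::_1 ] % ',' );
    return ok && first == input.end();
}

// Hypothetical helper: appends every int of the list to 'values'.
bool collect_list( std::string const & input, std::vector< int > & values )
{
    std::string::const_iterator first = input.begin();
    bool ok = qi::parse( first, input.end(),
                         qi::int_[ phx::push_back( phx::ref( values ), qi::_1 ) ] % ',' );
    return ok && first == input.end();
}
</code>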

==== Functions ====

<code cpp>
boost::spirit::qi::parse( begin_iterator, end_iterator, grammar_parser );
boost::spirit::qi::phrase_parse( begin_iterator, end_iterator, grammar_parser, skip_parser );
boost::spirit::qi::phrase_parse( begin_iterator, end_iterator, grammar_parser, skip_parser, parser_attribute );
</code>

The last form, combined with e.g. ''%%double_ % ','%%'', puts the parsed sequence directly into a ''std::vector<double>'' instead of going through individual ''push_back()'' calls.

A ''true'' return value indicates a match, which may be partial. ''begin_iterator'' is taken by reference and advanced past the consumed input (for ''phrase_parse()'', including trailing skipped input). A complete match is indicated by ''begin_iterator == end_iterator'' after the call.
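A sketch of the attribute form of ''phrase_parse()'' together with the full-match check, assuming a ''std::string'' input and ''ascii::space'' as skipper. ''parse_list'' is a hypothetical name.

<code cpp>
#include <boost/spirit/include/qi.hpp>
#include <string>
#include <vector>

namespace qi    = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;

// Hypothetical helper: parses e.g. "1.5, 2, 3" directly into 'out'.
bool parse_list( std::string const & input, std::vector< double > & out )
{
    std::string::const_iterator first = input.begin();   // advanced by phrase_parse
    std::string::const_iterator last  = input.end();

    bool matched = qi::phrase_parse( first, last, qi::double_ % ',', ascii::space, out );

    // 'matched' is also true for a partial match, so compare the iterators too
    return matched && first == last;
}
</code>

For ''"1, 2, x"'', ''phrase_parse()'' still returns ''true'' after matching ''1'' and ''2'', but ''first'' stops before '','', so the helper reports failure.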

==== Symbol Tables ====

You can derive from ''qi::symbols'' to define symbol/value pairs that can then be used in a parser definition. The class template takes the input character type and the value type associated with each symbol; that value is the attribute exposed when a symbol matches.

<code cpp>
struct tictac_ : qi::symbols< char, unsigned >
{
    tictac_()
    {
        add
            ("X"    , 1)
            ("O"    , 0)
        ;
    }
} tictac;
</code>

[[http://www.boost.org/doc/libs/release/libs/spirit/doc/html/spirit/qi/reference/string/symbols.html | Documentation]]
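A sketch of using the symbol table from above directly as a parser; its attribute is the associated ''unsigned'' value. ''symbol_value'' is a hypothetical helper.

<code cpp>
#include <boost/spirit/include/qi.hpp>
#include <string>

namespace qi = boost::spirit::qi;

// The symbol table from above: "X" -> 1, "O" -> 0
struct tictac_ : qi::symbols< char, unsigned >
{
    tictac_()
    {
        add
            ("X"    , 1)
            ("O"    , 0)
        ;
    }
} tictac;

// Hypothetical helper: the value of a single symbol, or -1 if 'input' is not one.
int symbol_value( std::string const & input )
{
    unsigned value = 0;
    std::string::const_iterator first = input.begin();
    bool ok = qi::parse( first, input.end(), tictac, value );
    return ( ok && first == input.end() ) ? static_cast< int >( value ) : -1;
}
</code>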

==== Rules ====

Parser expressions can be assigned to "rules", for modularizing a grammar.

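| ''rule< Iterator >'' |
| ''rule< Iterator, Skipper >'' |
| ''rule< Iterator, Signature >'' |
| ''rule< Iterator, Signature, Skipper >'' |
| ''rule< Iterator, Signature, Skipper, Locals >'' |

All template parameters after ''Iterator'' are optional and may be given in any order, so ''rule< Iterator, Skipper, Signature >'' is equivalent to ''rule< Iterator, Signature, Skipper >''. ''Locals'' (''qi::locals< ... >'') declares rule-local variables.

A rule declared with a ''Skipper'' can only be invoked with a skipper, i.e. through ''phrase_parse()''; using it with ''parse()'' does not compile. A rule without a ''Skipper'' works with both, but does no skipping inside (it behaves like an implicit ''lexeme[]'').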

''Signature'' declares the attributes of the rule in function-type syntax, ''Synthesized( Inherited1, Inherited2, ... )'', e.g. ''double()''. The return type is the rule's //synthesized// attribute (it can be ''void''), referred to in the rule as ''_val''. The parameter types declare optional //inherited// attributes, referred to in the rule as ''_r1'', ''_r2'' and so on (courtesy of Boost.Phoenix).

Rules can be given a name (for error handling), through the ''.name( std::string )'' member function.
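A sketch of a rule with both a ''Signature'' (one inherited ''int'') and a ''Skipper''. The rule ''scaled'' and the helper ''parse_scaled'' are hypothetical; the rule multiplies the parsed number by the value passed in as ''_r1''.

<code cpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <string>

namespace qi    = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;

// Hypothetical helper: returns the parsed int times 'factor', or -1 on failure.
int parse_scaled( std::string const & input, int factor )
{
    typedef std::string::const_iterator iterator;

    // Signature int( int ): synthesized int (_val), one inherited int (_r1)
    qi::rule< iterator, int( int ), ascii::space_type > scaled;
    scaled = qi::int_[ qi::_val = qi::_1 * qi::_r1 ];
    scaled.name( "scaled" );

    int result = 0;
    iterator first = input.begin();
    // scaled( factor ) passes 'factor' as the inherited attribute _r1
    bool ok = qi::phrase_parse( first, input.end(), scaled( factor ), ascii::space, result );
    return ( ok && first == input.end() ) ? result : -1;
}
</code>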

==== Grammars ====

A grammar encapsulates one or more rules and assembles them for use.

  * Derive from ''grammar'' (with the same template parameters as for rules).
  * Declare the rules it uses as member variables.
  * Pass the start rule (the rule called first when parsing) and a name (for error handling) to the base class constructor.
  * Define the rules in the constructor.

<code cpp>
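// Needs <boost/spirit/include/qi.hpp> and <boost/spirit/include/phoenix_operator.hpp>
template < typename Iterator >
struct tictactoe : boost::spirit::qi::grammar< Iterator, unsigned() >
{
    boost::spirit::qi::rule< Iterator, unsigned() > r;

    // Inside a class template, base_type is a dependent name and must be
    // qualified; if tictactoe were not a template, plain base_type( r ) would do.
    tictactoe() : tictactoe::base_type( r, "tictactoe" )
    {
        using boost::spirit::qi::eps;
        using boost::spirit::qi::_val;
        using boost::spirit::qi::_1;

        // Start from 0, then add the value of the symbol matched by tictac
        r = eps[ _val = 0 ]
            >> tictac[ _val += _1 ]
        ;
    }
};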
</code>
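A self-contained sketch of defining and invoking such a grammar with ''parse()''. The grammar ''row_grammar'' and the helper ''count_marks'' are hypothetical; unlike the example above, the rule uses ''+tictac'' so that it sums a whole row of marks.

<code cpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <string>

namespace qi = boost::spirit::qi;

// Symbol table as above: "X" -> 1, "O" -> 0
struct tictac_ : qi::symbols< char, unsigned >
{
    tictac_() { add ("X", 1) ("O", 0); }
} tictac;

// Hypothetical grammar: counts the X's in a row of X/O marks
template < typename Iterator >
struct row_grammar : qi::grammar< Iterator, unsigned() >
{
    qi::rule< Iterator, unsigned() > row;

    row_grammar() : row_grammar::base_type( row, "row" )
    {
        using qi::eps;
        using qi::_val;
        using qi::_1;
        row = eps[ _val = 0 ] >> +tictac[ _val += _1 ];
    }
};

// Hypothetical helper: number of X's, or -1 if 'input' is not a complete row
int count_marks( std::string const & input )
{
    typedef std::string::const_iterator iterator;
    row_grammar< iterator > g;
    unsigned result = 0;
    iterator first = input.begin();
    bool ok = qi::parse( first, input.end(), g, result );
    return ( ok && first == input.end() ) ? static_cast< int >( result ) : -1;
}
</code>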

Calling a rule with inherited attributes looks very much like a function call: ''rule( arguments )''. From the Boost.Spirit mini XML example, where ''end_tag'' receives the expected tag name:

<code cpp>
// Synthesized attribute: the tag name
qi::rule< Iterator, std::string(), ascii::space_type >        start_tag;
// No synthesized attribute; one inherited attribute (the expected tag name, _r1)
qi::rule< Iterator, void( std::string ), ascii::space_type >  end_tag;

// ...

start_tag.name( "start_tag" );
end_tag.name( "end_tag" );

start_tag =
        '<'
    >>  !char_('/')
    >>  lexeme[ +(char_ - '>')[_val += _1] ]
    >>  '>'
;

end_tag =
        "</"
    >>  lit(_r1)    // must match the name passed in by the caller
    >>  '>'
;

// ...

xml =
        start_tag[ at_c<0>( _val ) = _1 ]
    >>  *node    [ push_back( at_c<1>( _val ), _1 ) ]
    >>  end_tag( at_c<0>( _val ) )    // pass the start tag's name as _r1
;
</code>

Instead of going through the Phoenix ''at_c'' construct, you can use **locals**, i.e. rule-local variables ''_a'', ''_b'', ... declared via ''qi::locals< ... >''. (See [[http://www.boost.org/doc/libs/release/libs/spirit/doc/html/spirit/qi/tutorials/mini_xml___asts_.html | Boost example]], "One More Take" / "Local Variables".)
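A minimal sketch of a rule local: the opening quote is stored in ''_a'' and the closing quote must match it. The helper ''is_quoted'' is hypothetical and assumes ''std::string'' input.

<code cpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <string>

namespace qi = boost::spirit::qi;

// Hypothetical helper: true if 'input' is quoted with ' or ", where the
// closing quote is the same character as the opening one.
bool is_quoted( std::string const & input )
{
    typedef std::string::const_iterator iterator;
    using qi::char_;
    using namespace qi::labels;   // _1, _a

    // No signature; one local variable of type char (_a)
    qi::rule< iterator, qi::locals< char > > quoted;
    quoted = char_( "'\"" )[ _a = _1 ] >> *( char_ - char_( _a ) ) >> char_( _a );

    iterator first = input.begin();
    return qi::parse( first, input.end(), quoted ) && first == input.end();
}
</code>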

==== Error Handling ====

Error handlers are attached to a rule with ''on_error< action >( rule, handler )''. They are invoked when an expectation failure (e.g. from the ''>'' operator) occurs inside that rule.

The ''action'' parameter is the action to take:

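^ Action ^ Effect ^
| ''fail'' | quit and fail (return a no-match) |
| ''retry'' | attempt error recovery, possibly after moving the iterator |
| ''accept'' | force success, moving the iterator appropriately |
| ''rethrow'' | rethrow the error |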

The ''rule'' parameter is the rule to which the error handler should be attached.

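The ''handler'' parameter is the function (typically a Phoenix expression) to call when an error is caught. It receives four arguments:

^ Argument ^ Meaning ^
| ''first'' | position of the iterator when the rule was entered |
| ''last'' | end of input |
| ''error-pos'' | position of the iterator where the error occurred |
| ''what'' | a ''boost::spirit::info'' describing what was expected (printable with ''<<'') |

In a Phoenix handler, these are ''_1'' to ''_4''. From the mini XML example (''val'' and ''construct'' are ''boost::phoenix::val'' and ''boost::phoenix::construct''):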

<code cpp>
on_error< fail >
(
    xml
  , std::cerr
        << val( "Error, expecting " )
        << _4                                  // what was expected
        << val( " here: \"" )
        << construct< std::string >( _3, _2 )  // input from error position to end
        << val( "\"" )
        << std::endl
);
</code>
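A self-contained sketch of ''on_error< fail >'' with the expectation operator, assuming ''std::string'' input. The rule ''int_pair'' and the helper ''parse_pair'' are hypothetical; the handler writes its message into a string stream instead of ''std::cerr''.

<code cpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <boost/spirit/include/phoenix_object.hpp>
#include <sstream>
#include <string>

namespace qi  = boost::spirit::qi;
namespace phx = boost::phoenix;

// Hypothetical helper: parses "<int>,<int>". If anything after the first int
// is missing, '>' throws an expectation failure, the handler formats a
// message into 'message', and the rule fails.
bool parse_pair( std::string const & input, std::string & message )
{
    typedef std::string::const_iterator iterator;
    using namespace qi::labels;   // _1 .. _4 for the handler arguments

    std::ostringstream err;
    qi::rule< iterator > int_pair;
    int_pair = qi::int_ > ',' > qi::int_;
    int_pair.name( "int_pair" );

    qi::on_error< qi::fail >
    (
        int_pair
      , phx::ref( err )
            << phx::val( "expecting " ) << _4             // what was expected
            << phx::val( " here: \"" )
            << phx::construct< std::string >( _3, _2 )    // error position to end
            << phx::val( "\"" )
    );

    iterator first = input.begin();
    bool ok = qi::parse( first, input.end(), int_pair );
    message = err.str();
    return ok && first == input.end();
}
</code>

If the //first// operand already fails (e.g. input ''"x"''), no exception is thrown, so the handler is not called and ''message'' stays empty.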