====== Boost.Spirit ======

[[http://www.boost.org/doc/libs/release/libs/spirit/doc/html/index.html | Boost.Spirit]] is a framework for writing parsers. In a sense, it's a replacement for the Lex / Flex and Yacc / Bison tools. One of the advantages of Boost.Spirit is that grammars are written in plain C++ source, removing the dependency on additional tools, each with its own syntax and build steps.

Boost.Spirit consists of three parts:

  * Qi, for writing //parsers// (i.e., reading input and calling functions as appropriate).
  * Karma, for writing //generators// (i.e., turning data structures into byte sequences).
  * Lex, for writing //lexical analyzers// (i.e., tokenizing input), an optional component that can serve as a front end to Qi parsers.

===== Qi =====

==== Parsers ====

In the tables below, ''a'' is a parser and ''A'' is its attribute type.

^ Parser ^ Attribute ^ Description ^
| ''attr( arg )'' | type of ''arg'' | takes no input, always succeeds, exposes ''arg'' as its attribute |
| ''eoi'' | unused | matches end-of-input |
| ''eol'' | unused | matches CR LF, CR, or LF |
| ''eps'' | unused | takes no input, always succeeds |
| ''eps( arg )'' | unused | takes no input, succeeds if ''arg'' evaluates to ''true'' |
| ''symbols<>'' | ''T'' (the value type) | see Symbol Tables below |
| ''lit("...")'' | unused | matches a literal; needed where neither operand of an operator is a Spirit parser, as in ''lit("keyword") >> '=' >> ...'' |
| ''lexeme[ a ]'' | ''A'' | disables skipping inside the brackets |
| ''skip[ a ]'' | ''A'' | enables skipping |
| ''skip( p )[ a ]'' | ''A'' | enables skipping, using ''p'' as the skip parser |
| ''omit[ a ]'' | unused | parses without exposing any attribute |
| ''raw[ a ]'' | iterator range ''[first, last)'' | parses, exposing the iterator range of the match |
| ''repeat( x )[ a ]'' | ''std::vector<A>'' | matches exactly ''x'' occurrences of ''a'' |
| ''repeat( x, inf )[ a ]'' | ''std::vector<A>'' | matches at least ''x'' occurrences of ''a'' |
| ''repeat( x, y )[ a ]'' | ''std::vector<A>'' | matches at least ''x'', at most ''y'' occurrences of ''a'' |

  * [[http://www.boost.org/doc/libs/release/libs/spirit/doc/html/spirit/qi/quick_reference/qi_parsers/char.html | Character Parsers]]
  * [[http://www.boost.org/doc/libs/release/libs/spirit/doc/html/spirit/qi/quick_reference/qi_parsers/numeric.html | Numeric Parsers]]
  * [[http://www.boost.org/doc/libs/release/libs/spirit/doc/html/spirit/qi/quick_reference/qi_parsers/string.html | String Parsers]]

==== Operators ====

^ Operator ^ Meaning ^ Notes ^
| ''-a'' | optional (0..1) | unary prefix; attribute is ''boost::optional<A>'' |
| ''*a'' | Kleene star (0..n) | unary prefix; attribute is ''std::vector<A>'' |
| ''+a'' | plus (1..n) | unary prefix; attribute is ''std::vector<A>'' |
| ''!a'' | not-predicate | unary prefix; fails if ''a'' succeeds; does not consume input |
| ''&a'' | and-predicate | unary prefix; fails if ''a'' fails; does not consume input |
| ''a >> b'' | sequence (followed by) | n-ary; attribute is ''fusion::vector< A, B, ... >'' |
| ''a > b'' | expectation | n-ary; like ''>>'', but throws ''expectation_failure'' instead of backtracking if an element after the first fails |
| ''%%a || b%%'' | sequential-or | n-ary; matches ''a'', ''a'' followed by ''b'', or just ''b''; quicker than ''%%a >> -b | b%%'' |
| ''%%a | b%%'' | alternative | n-ary; tries ''a'', then ''b''; attribute is ''boost::variant< A, B >'' |
| ''a - b'' | difference | binary; parses ''a'', but not ''b''; attribute is that of ''a'', ''b'' is ignored |
| ''a % b'' | list, separated by | binary; shorthand for ''a >> *( b >> a )'', e.g. ''%%double_ % ','%%''; attribute is ''std::vector<A>'' |
| ''a ^ b'' | permutation | n-ary; matches ''a'' or ''b'' (or both) in any order, 0..1 times each |
| ''r = p'' | assignment | binary; assigns the RHS parser expression to the LHS rule |
| ''r %= p'' | auto-assignment | binary; as ''='', but propagates the RHS parser's attribute to the rule's attribute even if the RHS contains semantic actions; the attribute types must be compatible |

==== Semantic Actions ====

A semantic action is attached with a postfix ''[ f ]'' after a parser. When the parser succeeds, ''f'' is called with the parser's attribute (e.g. a ''double'' for ''double_''). Optionally, it is also passed the parser context and a reference to a boolean "hit" parameter; setting the latter to ''false'' makes the match fail. Handlers can be:

=== Plain Functions ===

  void handle( double const & d );
  
  // Direct
  parse( first, last, double_[ &handle ] );
  
  // boost::bind
  parse( first, last, double_[ boost::bind( &handle, _1 ) ] );

=== Member Functions ===

  struct handler
  {
      void handle( double const & d ) const;
  };
  
  handler h;
  parse( first, last, double_[ boost::bind( &handler::handle, &h, _1 ) ] );

=== Function Object ===

  struct handler
  {
      // unused_type accepts (and ignores) the parser context and the "hit" parameter
      void operator()( double const & d,
                       boost::spirit::qi::unused_type,
                       boost::spirit::qi::unused_type ) const;
  };
  
  parse( first, last, double_[ handler() ] );

=== Lambda ===

  // Phoenix expression; _1 is boost::spirit::qi::_1 here
  parse( first, last, double_[ std::cout << _1 << '\n' ] );

==== Note on Phoenix ====

The ''_1'' placeholder is used by Boost.Bind, Boost.Lambda, and Boost.Phoenix:

  * Boost.Bind placeholders are e.g. ''::_1''
  * Boost.Lambda placeholders are e.g. ''boost::lambda::_1''
  * Boost.Phoenix placeholders are e.g. ''boost::spirit::_1'' or ''boost::spirit::qi::_1''

Do not mix them, as they are not compatible. Phoenix is recommended. Other Phoenix constructs commonly used in semantic actions:

  * ''ref( x )'' marks a variable ''x'' in parser scope as a mutable reference. You need it whenever a semantic action modifies such a variable.
  * ''_val'' refers to a rule's synthesized attribute.
  * ''push_back( container, element )'' appends to a container; the container must again be wrapped in ''ref()''.
==== Functions ====

  boost::spirit::qi::parse( begin_iterator, end_iterator, grammar_parser );
  boost::spirit::qi::phrase_parse( begin_iterator, end_iterator, grammar_parser, skip_parser );
  boost::spirit::qi::phrase_parse( begin_iterator, end_iterator, grammar_parser, skip_parser, parser_attribute );

The last form can be combined with ''%%parser % ','%%'' to put the parsed sequence directly into a vector, instead of going through individual ''push_back()'' calls.

A ''true'' return value indicates a match (partial or complete). ''begin_iterator'' is passed by reference and advanced past the matched input. A complete match is indicated by ''begin_iterator == end_iterator'' after the call.

==== Symbol Tables ====

You can derive from ''qi::symbols'' to define symbol/value pairs that can then be used in a parser definition. The class template takes the input character type and the value type to be associated with each symbol; on a match, that value is exposed as the parser's attribute.

  struct tictac_ : qi::symbols< char, unsigned >
  {
      tictac_()
      {
          add
              ( "X", 1 )
              ( "O", 0 )
          ;
      }
  } tictac;

[[http://www.boost.org/doc/libs/release/libs/spirit/doc/html/spirit/qi/reference/string/symbols.html | Documentation]]

==== Rules ====

Parser expressions can be assigned to "rules", to modularize a grammar. A rule is declared in one of these forms:

^ Rule declaration ^
| ''rule< Iterator >'' |
| ''rule< Iterator, Skipper >'' |
| ''rule< Iterator, Signature >'' |
| ''rule< Iterator, Signature, Skipper >'' |
| ''rule< Iterator, Skipper, Signature >'' |

A rule declared with a ''Skipper'' type must be invoked with a skipper, i.e. through ''phrase_parse()''; the non-skipping ''parse()'' can only drive rules without a ''Skipper''.

''Signature'' is a function type declaring the attributes of the rule: ''Synthesized( Inherited1, Inherited2, ... )''. The synthesized attribute (the rule's own result) can be ''void''. Inherited attributes can be referred to inside the rule as ''_r1'', ''_r2'', and so on (courtesy of Boost.Phoenix).
Rules can be given a name (used in error reporting) through the ''.name( std::string )'' member function.

==== Grammars ====

A grammar encapsulates one or more rules and assembles them for use:

  * Derive from ''grammar'' (giving the same template parameters as for rules).
  * Declare the rules used as member variables.
  * Pass the rule to be called first when parsing to the base class constructor, and give the grammar a name (for error handling).
  * Define the rules in the constructor.

  // Assumes <boost/spirit/include/qi.hpp>, <boost/spirit/include/phoenix_operator.hpp>
  // and the tictac symbol table from the Symbol Tables section.
  template < typename Iterator >
  struct tictactoe : boost::spirit::qi::grammar< Iterator, unsigned() >
  {
      boost::spirit::qi::rule< Iterator, unsigned() > r;
  
      // If tictactoe were not a template, we could use just base_type( r ).
      tictactoe() : tictactoe::base_type( r, "tictactoe" )
      {
          using boost::spirit::qi::eps;
          using boost::spirit::qi::_val;
          using boost::spirit::qi::_1;
  
          r = eps[ _val = 0 ] >> tictac[ _val += _1 ];
      }
  };

Calling a rule with inherited attributes then looks very much like a function call: ''rule( parameter )''. From the Boost.Spirit mini XML example:

  qi::rule< Iterator, std::string(), ascii::space_type > start_tag;
  qi::rule< Iterator, void( std::string ), ascii::space_type > end_tag;
  
  // ...
  
  start_tag.name( "start_tag" );
  end_tag.name( "end_tag" );
  
  start_tag =
          '<'
      >>  !char_( '/' )
      >>  lexeme[ +( char_ - '>' )[ _val += _1 ] ]
      >>  '>'
  ;
  
  end_tag =
          "</"
      >>  lit( _r1 )
      >>  '>'
  ;
  
  // ...
  
  xml =
          start_tag[ at_c<0>( _val ) = _1 ]
      >>  *node[ push_back( at_c<1>( _val ), _1 ) ]
      >>  end_tag( at_c<0>( _val ) )
  ;

Instead of the Phoenix ''at_c'' construct, you can also use **locals** (see the [[http://www.boost.org/doc/libs/release/libs/spirit/doc/html/spirit/qi/tutorials/mini_xml___asts_.html | Boost example]], sections "One More Take" and "Local Variables").

==== Error Handling ====

Error handlers are attached via the ''on_error< action >( rule, handler )'' function. They are invoked when an ''expectation_failure'' (see the ''>'' operator) is thrown while parsing the rule.
The ''action'' parameter is the action to take:

^ Action ^ Effect ^
| ''fail'' | quit and fail, returning no_match |
| ''retry'' | attempt error recovery (possibly moving the iterator) and try to match again |
| ''accept'' | adjust the iterator and return a match |
| ''rethrow'' | rethrow the error |

The ''rule'' parameter is the rule to which the error handler should be attached. The ''handler'' parameter is the function to call when an error is caught. It takes 4 arguments:

^ Argument ^ Meaning ^
| ''first'' | position of the iterator when the rule was entered |
| ''last'' | end of input |
| ''error-pos'' | position of the iterator when the error occurred |
| ''what'' | a ''boost::spirit::info'' object describing what was expected (printable via ''<<'') |

These arguments are available as the Phoenix placeholders ''_1'' to ''_4'' as well (''val()'' below is Phoenix's value wrapper):

  on_error< fail >
  (
      xml
    , std::cerr
          << val( "Error, expecting " )
          << _4
          << val( " here: \"" )
          << construct< std::string >( _3, _2 )
          << val( "\"" )
          << std::endl
  );