I am writing a hand-rolled recursive-descent parser for a small language. In the lexer I have:
trait Token { def position: Int }
trait Keyword extends Token
trait Operator extends Token
case class Identifier(position: Int, txt: String) extends Token
case class If(position: Int) extends Keyword
case class Plus(position: Int) extends Operator
/* etcetera; one case class per token type */
My parser works well, and now I want to incorporate error recovery: replacing, inserting or discarding tokens until a synchronization point.
For that, it would be handy to have a function that, in (invalid) Scala, looks like this:
def scanFor(tokenSet: Set[TokenClass], lookahead: Int) = {
  lexer.upcomingTokens.take(lookahead).find { token =>
    tokenSet.exists(tokenClass => token.isInstanceOf[tokenClass])
  }
}
which I would call, for example: scanFor(Set(Plus, Minus, Times, DividedBy), 4)
However, TokenClass is, of course, not a valid type, and I don't know how to create the previous set either.
As alternatives I have considered:
- I could create a new trait and make the token classes in each set I want to check against extend that trait, and then isInstanceOf-check against the trait. However, I may have several of these sets, which would make them hard to name, and the code hard to maintain later on.
- I could create isXxx: Token => Boolean functions and make sets of those (sketched below), but that seems inelegant.
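For illustration, a minimal sketch of that second alternative, reusing the lexer.upcomingTokens stream from the snippet above (all names here are only placeholders for the idea):

// One predicate per token type, grouped into sets that scanFor can consult.
val isPlus: Token => Boolean = _.isInstanceOf[Plus]
val isMinus: Token => Boolean = _.isInstanceOf[Minus]
val isTimes: Token => Boolean = _.isInstanceOf[Times]
val isDividedBy: Token => Boolean = _.isInstanceOf[DividedBy]

val arithmeticOps: Set[Token => Boolean] = Set(isPlus, isMinus, isTimes, isDividedBy)

// scanFor then asks whether any predicate in the set accepts the token.
def scanFor(predicates: Set[Token => Boolean], lookahead: Int): Option[Token] =
  lexer.upcomingTokens.take(lookahead).find(token => predicates.exists(p => p(token)))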
Any suggestions?
I recommend, if there are only a handful of such combinations, using an additional trait. It's easy to write and understand, and it's fast at runtime. It's not that bad to say
case class Plus(position: Int) extends Operator with Arithmetic with Precedence7 with Unary
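For example, a rough sketch of how that plays out, assuming marker traits like Arithmetic and the upcomingTokens stream from the question:

// Marker traits grouping token types; each token class mixes in the groups it belongs to.
trait Arithmetic
trait Precedence7
trait Unary

// Scanning for "any arithmetic operator" then reduces to a plain isInstanceOf check.
def scanForArithmetic(lookahead: Int): Option[Token] =
  lexer.upcomingTokens.take(lookahead).find(_.isInstanceOf[Arithmetic])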
But there is a wide range of alternatives.
If you don't mind a finicky manual maintenance process and you need this to be fast, defining an ID number (which you must manually keep distinct) for each token type will let you use a Set[Int], a BitSet, or a Long to select the classes you like. You can do set operations (union, intersection) to build these selectors from each other. It's also not hard to write unit tests to make the finicky bit a little more reliable, if you can at least manage to list the types:
val everyone = Seq(Plus, Times, If /* etc. */)
assert(everyone.length == everyone.map(_.id).toSet.size)
So you shouldn't be alarmed by this approach if you decide that performance and composability are essential.
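A minimal sketch of what those ID-based selectors could look like with a BitSet; it assumes each token type carries a manually assigned id on its companion object (as in the test above) and that every token instance can report its own id:

import scala.collection.immutable.BitSet

// Assumed setup, e.g.:
//   object Plus { val id = 3 }
//   case class Plus(position: Int) extends Operator { def id = Plus.id }
// with Token exposing def id: Int.

val arithmetic = BitSet(Plus.id, Minus.id, Times.id, DividedBy.id)
val additive = BitSet(Plus.id, Minus.id)
val multiplicative = arithmetic &~ additive   // set difference
val allArithmetic = additive | multiplicative // union rebuilds the full selector

// Membership in the selector is a single bit test per token.
def scanFor(selector: BitSet, lookahead: Int): Option[Token] =
  lexer.upcomingTokens.take(lookahead).find(token => selector(token.id))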
You can also write custom extractors that can (more slowly) pull out the right subset of tokens by pattern matching. For example,
object ArithOp {
  def unapply(t: Token): Option[Operator] = t match {
    case o: Operator => o match {
      case _: Plus | _: Minus | _: Times | _: DividedBy => Some(o)
      case _ => None
    }
    case _ => None
  }
}
will give None if it's not the right type of operation. (In this case, I'm assuming there's no parent other than Operator.)
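Used from a scanFor-style helper, the extractor might be applied like this (a sketch reusing the upcomingTokens stream from the question):

// collectFirst tries the extractor on each token within the lookahead window
// and returns the first arithmetic operator it finds, if any.
def scanForArithOp(lookahead: Int): Option[Operator] =
  lexer.upcomingTokens.take(lookahead).collectFirst { case ArithOp(op) => op }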
Finally, you could express the types as unions and HLists and pick them out that way using Shapeless, but I don't have any experience doing that for a parser, so I'm not sure what difficulties you might encounter.