Code archives/Algorithms/Generic parsing and scanning framework
This code has been declared by its author to be Public Domain code.
This is a port-and-expansion of my previous generic lexer and parser libraries for B3D, to BlitzMax. BlitzMax's enhancements open up a whole array of ways to improve the syntax of such tools, and I think we're more or less at the maximum: the TMetaParser class can now accept input in a syntax that roughly approximates BNF, making this about as lightweight as it's possible to get. (There's also a "simple" TParser class that accepts plain combinators in a similar fashion to the B3D version - lightweight compared to other libraries, positively cumbersome compared to TMetaParser.) It's called MetaParser because the implementation of the BNF layer is more or less metacircular.

A minimal example (demonstrating both TMetaParser and TLexer) accompanied the original entry; subtraction, division, functions etc. were omitted to keep it short. See the original parser example for a fuller grammar.

Usage is directly derived from its predecessor libraries. A lexer is built out of rules, each consisting of a regex, an action on match, a result, and optionally a mode constraint. Lexers can scan strings or files, and return an array of TTokens. See the B3D original for more information.

A parser is created by extending TParser or TMetaParser and listing rules. TMetaParser is the more lightweight of the two: a field named "grammar" is added to the class, and its metadata contains the grammar for the parser in BNF-style syntax. (Additions: a "!" can be added to the middle of a rule to establish an "error point", and a rule can optionally end with a colon followed by a filter string. Both of these additions are optional and are explained in the B3D original entry. Terminal symbols are represented as a % or %! followed by their lexer symbol name.)
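The code box for the minimal example did not survive here, so below is a hedged reconstruction rather than the author's original. The token names (number, add, mul, shl, lparen, rparen), the rule names (SumExpr etc.), the filter strings, and the multi-line metadata layout are all inferred from the sample output posted in the comments further down, and may differ from the real example:

```blitzmax
' Hedged reconstruction of the missing minimal example. Rule names, token
' names and filter strings are inferred from the comments, not copied
' from the author's original code. Build together with the library source.
SuperStrict
Import BaH.Regex

' Lexer: each rule is (regex, action on match, result type). The R()
' helper is private to the library, so user code calls TLexRule.Create.
Global CalcLexer:TLexer = TLexer.withRules([..
	TLexRule.Create("[0-9]+", TLexAction.Store, "number"),..
	TLexRule.Create("\+", TLexAction.Store, "add"),..
	TLexRule.Create("\*", TLexAction.Store, "mul"),..
	TLexRule.Create("<<", TLexAction.Store, "shl"),..
	TLexRule.Create("\(", TLexAction.Store, "lparen"),..
	TLexRule.Create("\)", TLexAction.Store, "rparen")..
])

' Parser: the whole grammar lives in the field's {metadata} block. The
' first rule is the root; %name matches a token type, %!name demands it,
' ! sets an error point, and ": @L @R ^" is a result filter string.
Type TCalcParser Extends TMetaParser
	Field grammar:TMap {..
		SumExpr="ShiftExpr (%add ShiftExpr)* : @L @R ^"..
		ShiftExpr="MulExpr (%shl MulExpr)* : @L @R ^"..
		MulExpr="Atom (%mul Atom)* : @L @R ^"..
		Atom="%number | %lparen ! SumExpr %!rparen"..
	}
End Type

Local p:TCalcParser = New TCalcParser
Print p.Parse(CalcLexer.ScanString("1 + 2 * (3 + 1) << 4 + 5")).ToString()
```

If the reconstruction is faithful, this should print a parse tree similar to the one shown in the comments below.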
The only major advantage of the combinator-based TParser over TMetaParser is that filters can also be applied to sub-rules; but if you do need to filter a sub-rule (and filters are largely an optional convenience anyway), consider factoring the nested expression out into its own rule. (TParser is also, obviously, used in TMetaParser's internal compiler; you can see it in use there.)

To use, instantiate your TMetaParser subtype and call its Parse method on a TToken array produced by the relevant lexer (lexer and parser terminal types need to match, so make sure the two are designed together). The first rule in the list will be treated as the "root", and used to match the whole document.

This should provide a route to rapid language development in BlitzMax. It's finally concise enough to look nice on the page. Enjoy! (This really ought to be at least three separate files; the natural splitting points are marked. It works OK as one, though.) Requires BaH.Regex.
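For comparison, here is a rough sketch of the combinator-based TParser style mentioned above (the same style PMeta itself uses internally). The grammar and token names ("number", "comma") are assumptions for illustration, not from the archive:

```blitzmax
' Hedged sketch of a combinator-based TParser: a comma-separated list of
' numbers, with the sub-rule filter "- @ ^" dropping each comma.
' Token names are assumed; build together with the library source.
SuperStrict
Import BaH.Regex

Type TListParser Extends TParser
	Field root:TParseRule
	Method startRule:TParseRule()
		Return root
	End Method
	Method New()
		' number (comma number)* -- the inner Cat's filter removes commas
		root = Cat([ CheckTT("number"),..
			Rep([ Cat([ CheckTT("comma"), CheckTT("number") ]).f("- @ ^") ]) ])
	End Method
End Type
```

This is the "positively cumbersome" form: every rule is built by hand from Cat/Alt/Rep/Opt/Plus/CheckTT calls, where TMetaParser would express the same thing in one grammar line.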
' Generic parsing and lexing system, with BNF-style lightweight input
'=====================================================================
SuperStrict

Import BaH.Regex
Import BRL.Retro
Import BRL.LinkedList
Import BRL.Map
Import BRL.Reflection
Public
' TMetaParser
'=============
Type TMetaParser Extends TParser Abstract
Field _rules:TMap, _start:String, _linked:Int
Global pp:PMeta
Method Parse:TParseNode(toks:TToken[])
If Not _linked Then Link()
Return Super.Parse(toks)
End Method
Method Link()
Local metaData:TToken[] = MetaLexer.ScanString(TTypeId.ForObject(Self).FindField("grammar")._meta)
_rules = CreateMap()
If pp = Null Then pp = New PMeta 'Not safe to initialize globally
For Local i:Int = 0 Until metaData.Length Step 2
Local val:String = metaData[i + 1].value[1..]
AddRule(metaData[i].value, val[..val.Length - 1])
Next
_start = metaData[0].value
_linked = True
End Method
Method AddRule(key:String, rs:String)
Local rp:String[] = rs.Split(":")
Local rule:TParseRule = CompileRule(pp.Parse(BNFLexer.ScanString(rp[0])))
If rp.Length > 1 Then rule.f(rp[1].Trim())
_rules.Insert(key, rule)
End Method
Method CompileRule:TParseRule(r:TParseNode)
If r.term
If r.term.tType = "err" Then Return Commit()
If r.term.value[0..2] = "%!" Then Return ExpectTT(r.term.value[2..])
If r.term.value[0] = "%"[0] Then Return CheckTT(r.term.value[1..])
Return Self.R(r.term.value)
EndIf
Select r.rule
Case "ralt"
Local fst:TParseRule = CompileRule(r.elem[0])
If r.elem[1].elem[0].term
r.elem[1] = r.elem[1].elem[1]
Return Alt([ fst, CompileRule(r.elem[1]) ])
Else
For Local i:Int = 0 Until r.elem[1].elem.Length
r.elem[1].elem[i] = r.elem[1].elem[i].elem[1]
Next
Local rst:TParseRule[] = MapCompile(r.elem[1].elem)
rst = rst[..rst.Length + 1]
For Local i:Int = rst.Length - 1 To 1 Step -1
rst[i] = rst[i - 1]
Next
rst[0] = fst
Return Alt(rst)
EndIf
Case "rcat"
Return Cat(MapCompile(r.elem))
Case "unar"
Local arg:TParseRule = CompileRule(r.elem[0])
Select r.elem[1].term.tType
Case "plus" ; Return Plus([arg])
Case "opt" ; Return Opt([arg])
Case "star" ; Return Rep([arg])
End Select
End Select
End Method
Method MapCompile:TParseRule[](a:TParseNode[])
Local ret:TParseRule[] = New TParseRule[a.Length]
For Local i:Int = 0 Until a.Length
ret[i] = CompileRule(a[i])
Next
Return ret
End Method
Method startRule:TParseRule()
Return TParseRule(_rules.ValueForKey(_start))
End Method
Method GetNamedRule:TParseRule(r:String)
Return TParseRule(_rules.ValueForKey(r))
End Method
End Type
Private
Global MetaLexer:TLexer = TLexer.withRules([..
R("[a-z_][a-z0-9_]*", TLexAction.Store, "key"),..
R("~q[^~q]*~q", TLexAction.Store, "value")..
])
Global BNFLexer:TLexer = TLexer.withRules([..
R("[a-z_][a-z0-9_]*", TLexAction.Store, "name"),..
R("%[a-z_][a-z0-9_]*", TLexAction.Store, "terminal"),..
R("%![a-z_][a-z0-9_]*", TLexAction.Store, "terminal"),..
..
R("\+", TLexAction.Store, "plus"),..
R("\|", TLexAction.Store, "or"),..
R("\?", TLexAction.Store, "opt"),..
R("\*", TLexAction.Store, "star"),..
R("=", TLexAction.Store, "eql"),..
R("!", TLexAction.Store, "err"),..
R("@", TLexAction.Store, "any"),..
R("\(", TLexAction.Store, "lparen"),..
R("\)", TLexAction.Store, "rparen")..
])
Function R:TLexRule(r:String, a(l:TLexer), res:String = "", m:String = "")
Return TLexRule.Create(r, a, res, m)
End Function
Type PMeta Extends TParser
Field root:TParseRule, rcat:TParseRule, ralt:TParseRule, unar:TParseRule, atom:TParseRule, nest:TParseRule
Method startRule:TParseRule()
Return root
End Method
Method New()
root = R("ralt")
ralt = Cat([ R("rcat"), Rep([ CheckTT("or"), R("rcat") ]) ])
rcat = Plus([ Alt([ CheckTT("err"), CheckTT("any"), R("unar") ]) ])
unar = Cat([ R("atom"), Opt([ Alt([ CheckTT("plus"), CheckTT("opt"), CheckTT("star") ]) ]) ])
atom = Alt([ CheckTT("name"), CheckTT("terminal"), R("nest") ])
nest = Cat([ CheckTT("lparen"), R("ralt"), CheckTT("rparen") ]).f("- @ - ^")
End Method
End Type
Public
' TParser
'=========
Type TParseNode
Field name:String, rule:String, term:TToken
Field elem:TParseNode[]
Method n:TParseNode(s:String)
name = s ; Return Self
End Method
Method ToString:String()
Function show:String(n:TParseNode, pad:Int)
Local id:String = LSet("", pad)
If n.name = "#Nil" Then Return id + "[NIL]~n"
If n.term Then Return id + n.name + ": { '" + n.term.value + "' : " + n.term.tType + " }~n"
Local s:String = id + n.name + ": " + n.rule + "~n"
For Local e:TParseNode = EachIn n.elem
s :+ show(e, pad + 2)
Next
Return s
End Function
Return show(Self, 0)
End Method
Method GetElem:TParseNode(name:String)
For Local e:Int = 0 Until elem.Length
If elem[e].name = name Then Return elem[e]
Next
Return Null
End Method
Function Leaf:TParseNode(t:ttoken)
Local n:TParseNode = New Self ; n.term = t ; Return n
End Function
Function Node:TParseNode(b:TParseNode[])
Local n:TParseNode = New Self ; n.elem = b ; Return n
End Function
Function Nil:TParseNode()
Local n:TParseNode = New Self ; n.name = "#Nil" ; Return n
End Function
End Type
Type TParser Abstract
Method Parse:TParseNode(toks:TToken[])
Self.toks = toks ; ct = 0 ; epstk = Null ; edstk = Null ; rdepth = 0
Local ret:TParseNode = startRule().run(Self)
If ret Then ret.name = "@root"
Return ret
End Method
Method Cat:TParseRule(rs:TParseRule[]) Final
Return TParseRule.Make("Cat", Self, "", rs)
End Method
Method Alt:TParseRule(rs:TParseRule[]) Final
Return TParseRule.Make("Alt", Self, "", rs)
End Method
Method Opt:TParseRule(rs:TParseRule[]) Final
Return TParseRule.Make("Opt", Self, "", rs)
End Method
Method Rep:TParseRule(rs:TParseRule[]) Final
Return TParseRule.Make("Rep", Self, "", rs)
End Method
Method Err:TParseRule(msg:String) Final
Return TParseRule.Make("Err", Self, msg, Null)
End Method
Method Plus:TParseRule(rs:TParseRule[]) Final
Return TParseRule.Make("Plus", Self, "", rs)
End Method
Method CheckTT:TParseRule(t:String) Final
Return TParseRule.Make("CTT", Self, t, Null)
End Method
Method CheckVal:TParseRule(v:String) Final
Return TParseRule.Make("CV", Self, v, Null)
End Method
Method ExpectTT:TParseRule(t:String) Final
Return TParseRule.Make("ETT", Self, t, Null)
End Method
Method Commit:TParseRule() Final
Return TParseRule.Make("Commit", Self, "", Null)
End Method
Method Any:TParseRule() Final
Return TParseRule.Make("Any", Self, "", Null)
End Method
Method R:TParseRule(n:String) Final
Return TParseRule.Make("Named", Self, n, Null)
End Method
Field toks:TToken[], ct:Int, epstk:Int[], edstk:Int[], rdepth:Int
Method _ctok:TToken() Final
If ct < toks.Length Then Return toks[ct] Else Return Null
End Method
Method _incr:TParseNode() Final
Local tok:TToken = toks[ct] ; ct :+ 1
Return TParseNode.Leaf(tok)
End Method
Method _back(pos:Int) Final
If epstk
Local ep:Int = epstk[epstk.Length - 1]
If pos < ep Then Throw ParseError.Make(toks[ep], "error trying to complete '" + toks[ep - 1].value + "'")
EndIf
ct = pos
End Method
Method _popErrs() Final
Local c:Int = 0
For Local p:Int = edstk.Length - 1 To 0 Step -1
If edstk[p] = rdepth Then c :+ 1 Else Exit
Next
If c
edstk = edstk[.. edstk.Length - c]
epstk = epstk[.. edstk.Length]
EndIf
rdepth :- 1
End Method
Method _pushErr() Final
edstk :+ [rdepth] ; epstk :+ [ct]
End Method
Method GetNamedRule:TParseRule(r:String)
Return TParseRule(TTypeId.ForObject(Self).FindField(r).Get(Self))
End Method
Method startRule:TParseRule() Abstract
End Type
Type TParseRule Abstract
Field r:String, args:TParseRule[], _f:String[]
Method run:TParseNode(p:TParser) Abstract
Method f:TParseRule(filt:String)
If args.Length = 1 And CatRule(args[0]) <> Null Then args[0].f filt
Return Self 'Return Self so filter application can be chained, as in CatRule.f
End Method
Function Make:TParseRule(subt:String, p:TParser, r:String, args:TParseRule[])
Local t:TParseRule = TParseRule(TTypeId.ForName(subt + "Rule").NewObject())
If args And args.Length > 1 And subt <> "Cat" And subt <> "Alt"
args = [p.Cat(args)]
EndIf
t.r = r ; t.args = args
Return t
End Function
End Type
Private
Type CatRule Extends TParseRule
Method run:TParseNode(p:TParser)
Local pos:Int = p.ct, el:TParseNode[] = New TParseNode[args.Length], cct:Int = 0
p.rdepth :+ 1
For Local d:Int = 0 Until args.Length
el[d - cct] = args[d].run(p)
If el[d - cct] = Null
p._back(pos) ; p._popErrs ; Return Null
ElseIf el[d - cct] = CommitRule.Nil
cct :+ 1
EndIf
Next
If cct Then el = el[0..(el.Length - cct)]
p._popErrs
Return filter(el, p)
End Method
Method filter:TParseNode(el:TParseNode[], p:TParser)
Local ls:TList = CreateList(), ct:Int = 0, noFold:Int = False
If _f
If _f.Length <= el.Length Then Throw ParseError.Make(p._ctok(), "filter pattern is wrong length for rule")
For Local e:Int = 0 Until el.Length
If el[e] <> OptRule.Nil
If _f[e][0] = "@"[0]
If _f[e].Length > 1 Then el[e].name = _f[e][1..] Else el[e].name = "@" + ct
ls.AddLast el[e] ; ct :+ 1
ElseIf _f[e][0] = "<"[0]
Local prev:TParseNode = el[e - 1]
If prev <> OptRule.Nil
If prev.elem
el[e].name = "@" + prev.elem.Length
prev.elem = prev.elem + [el[e]]
Else
el[e].name = "@1"
prev = TParseNode.Node([prev, el[e]])
prev.name = prev.elem[0].name ; prev.elem[0].name = "@0"
ls.RemoveLast() ; ls.AddLast prev
EndIf
Else
el[e].name = "@0"
prev = TParseNode.Node([el[e]])
If _f[e - 1].Length > 1 Then prev.name = _f[e - 1][1..] Else prev.name = "@" + ct
ls.AddLast prev ; ct :+ 1
EndIf
EndIf
EndIf
Next
If _f[el.Length] <> "^" Then noFold = True
Else
For Local e:Int = 0 Until el.Length
If el[e] <> OptRule.Nil
ls.AddLast(el[e])
el[e].name = "@" + ct ; ct :+ 1
EndIf
Next
EndIf
el = TParseNode[](ls.ToArray())
If el = Null Then Return OptRule.Nil
If (el.Length > 1) Or noFold Then Return TParseNode.Node(el) Else Return el[0]
End Method
Method f:TParseRule(filt:String)
_f = filt.Split(" ")
If _f[_f.Length - 1] <> "^"
_f = _f[.._f.Length + 1] ; _f[_f.Length - 1] = ""
EndIf
Return Self
End Method
End Type
Type AltRule Extends TParseRule
Method run:TParseNode(p:TParser)
Local pos:Int = p.ct
For Local d:TParseRule = EachIn args
Local n:TParseNode = d.run(p) ; If n Then Return n
p._back(pos)
Next
Return Null
End Method
Method f:TParseRule(filt:String)
RuntimeError "Cannot apply a filter string to an Alt rule"
End Method
End Type
Type OptRule Extends TParseRule
Global Nil:TParseNode = TParseNode.Nil()
Method run:TParseNode(p:TParser)
Local n:TParseNode = args[0].run(p)
If n Then Return n Else Return Nil
End Method
End Type
Type RepRule Extends TParseRule
Method run:TParseNode(p:TParser)
Return RunF(Self, p)
End Method
Function RunF:TParseNode(s:TParseRule, p:TParser)
Local match:TParseNode, ls:TList
Repeat
Local pos:Int = p.ct
match = s.args[0].run(p)
If match = Null Or match = OptRule.Nil
p._back(pos) ; Exit
Else
If ls = Null Then ls = CreateList()
ls.AddLast(match)
EndIf
Forever
If ls = Null
Return OptRule.Nil
Else
Local ret:TParseNode[] = TParseNode[](ls.ToArray())
' If ret.Length > 1
For Local i:Int = 0 Until ret.Length
ret[i].name = "@" + i
Next
Return TParseNode.Node(ret)
' Else
' Return ret[0]
' EndIf
EndIf
End Function
End Type
Type ErrRule Extends TParseRule
Method run:TParseNode(p:TParser)
Throw ParseError.Make(p._ctok(), r)
End Method
End Type
Type PlusRule Extends TParseRule
Method run:TParseNode(p:TParser)
Local ret:TParseNode = RepRule.RunF(Self, p)
If ret = OptRule.Nil Then Return Null Else Return ret
End Method
End Type
Type CTTRule Extends TParseRule
Method run:TParseNode(p:TParser)
If p._ctok() And p._ctok().tType = r Then Return p._incr() Else Return Null
End Method
End Type
Type CVRule Extends TParseRule
Method run:TParseNode(p:TParser)
If p._ctok() And p._ctok().value = r Then Return p._incr() Else Return Null
End Method
End Type
Type ETTRule Extends TParseRule
Method run:TParseNode(p:TParser)
If p._ctok() And p._ctok().tType = r Then Return p._incr() Else Throw ParseError.Gen(p._ctok(), r)
End Method
End Type
Type CommitRule Extends TParseRule
Global Nil:TParseNode = TParseNode.Nil()
Method run:TParseNode(p:TParser)
p._pushErr ; Return Nil
End Method
End Type
Type AnyRule Extends TParseRule
Method run:TParseNode(p:TParser)
If p._ctok() Then Return p._incr() Else Return Null
End Method
End Type
Type NamedRule Extends TParseRule
Method run:TParseNode(p:TParser)
Local ret:TParseNode = p.GetNamedRule(r).run(p)
If ret And ret.rule = "" Then ret.rule = r
Return ret
End Method
End Type
Type ParseError
Field msg:String
Method ToString:String()
Return "TParser: " + msg
End Method
Function Make:ParseError(t:TToken, msg:String = "")
Local e:ParseError = New Self
e.msg = "error in " + t.file + " at line " + t.l + ", col " + t.c + ": " + msg
Return e
End Function
Function Gen:ParseError(t:TToken, ex:String)
Return Make(t, "expecting {" + ex + "} but found {" + t.tType + "}")
End Function
End Type
Public
' TLexer
'========
Type TToken
Field value:String, tType:String
Field file:String, l:Int, c:Int
Function Make:TToken(v:String, ty:String, f:String, l:Int, c:Int)
Local t:TToken = New TToken
t.value = v ; t.tType = ty ; t.file = f ; t.l = l ; t.c = c
Return t
End Function
End Type
Type TLexer
Field rules:TLexRule[]
Field cFile:LexFile
Field out:TList, matchR:TLexRule, matchS:String
Field csMode:Int, guardMode:Int, mode:String
Field istk:TList, prev:TList
Function withRules:TLexer(r:TLexRule[])
Local l:TLexer = New TLexer
l.rules = r
l.Reset
Return l
End Function
Method SetCaseSensitivity(cs:Int)
csMode = cs
End Method
Method SetGuardMode(gm:Int)
guardMode = gm
End Method
Method Reset()
istk = CreateList() ; prev = CreateList() ; out = Null ; cFile = Null ; mode = ""
End Method
Method ScanFile:TToken[](name:String)
Return TToken[](ScanLFile(Self, LexFile.fromFile(name)).ToArray())
End Method
Method ScanString:TToken[](s:String)
Return TToken[](ScanLFile(Self, LexFile.fromString(s)).ToArray())
End Method
Method GuardFileName(f:LexFile)
If guardMode Then prev.AddLast(f.dir + f.name)
End Method
End Type
Type TLexRule
Field rule:TRegEx, pattern:String
Field action(l:TLexer), result:String, mode:String
Function Create:TLexRule(rs:String, act(l:TLexer), result:String = "", mode:String = "")
Local r:TLexRule = New TLexRule
r.rule = TRegEx.Create("\G" + rs, Null) ; r.pattern = rs
If result = "" And act = TLexAction.Store Then result = rs
r.action = act ; r.result = result ; r.mode = mode
Return r
End Function
End Type
Type TLexAction
Function Store(l:TLexer)
Local fname:String = l.cFile.name ; If fname <> "<string>" Then fname :+ " [" + l.cFile.dir + fname + "]"
l.out.AddLast TToken.Make(l.matchS, l.matchR.result, fname, l.cFile.cLine, l.cFile.cCol)
End Function
Function Mode(l:TLexer)
l.mode = l.matchR.result
End Function
Function Error(l:TLexer)
Throw LexError.Make(l, l.matchR.result)
End Function
Function Discard(l:TLexer)
End Function
Function Incl(l:TLexer)
l.matchS = FilterIncludeString(l.matchS, l.matchR.result) 'Shorten the token to just the file path
TryIncludeFile(l, l.matchS)
End Function
End Type
Private
Type LexFile
Field dir:String, name:String
Field stream:String, sLen:Int, cPtr:Int, cLine:Int, cCol:Int
Function fromFile:LexFile(name:String)
name = RealPath(name)
Local f:LexFile = New LexFile
f.name = StripDir(name) ; f.dir = ExtractDir(name) + "/"
f.stream = LoadText(name) ; f.sLen = Len(f.stream) ; f.cLine = 1 ; f.cCol = 1
Return f
End Function
Function fromString:LexFile(s:String)
Local f:LexFile = New LexFile
f.name = "<string>" ; f.dir = ""
f.stream = s ; f.sLen = Len(s) ; f.cLine = 1 ; f.cCol = 1
Return f
End Function
Method Increment(count:Int)
For Local c:Int = 1 To count
If stream[cPtr] < 32 'Only count printable characters in the column field
If stream[cPtr] = 10 Then cLine :+ 1 ; cCol = 1
Else
cCol :+ 1
EndIf
cPtr :+ 1
Next
End Method
End Type
Type LexError
Field msg:String
Function Make:LexError(l:TLexer, msg:String)
Local e:LexError = New LexError, fname:String = l.cFile.name ; If fname <> "<string>" Then fname :+ " [" + l.cFile.dir + fname + "]"
e.msg = "error in " + fname + " at line " + l.cFile.cLine + ", col " + l.cFile.cCol + ": "
If msg <> "" Then e.msg :+ msg Else e.msg :+ "unexpected character '" + l.cFile.stream[l.cFile.cPtr] + "'"
Return e
End Function
Method ToString:String()
Return "TLexer: " + msg
End Method
End Type
Function ScanLFile:TList(l:TLexer, f:LexFile)
l.cFile = f
l.istk.AddLast f
l.GuardFileName f
TRegEx.options = New TRegExOptions
TRegEx.options.targetIsMultiline = False
TRegEx.options.caseSensitive = l.csMode
l.mode = ""
l.out = CreateList()
Repeat
Local token:String, rule:TLexRule
While l.cFile.cPtr < l.cFile.sLen
Local cf:LexFile = l.cFile
token = "" ; rule = Null
For Local r:TLexRule = EachIn l.rules
If l.mode = r.mode
Local cMatch:TRegExMatch = r.rule.Find(cf.stream, cf.cPtr)
If cMatch
If Len(cMatch.SubExp()) > Len(token) Then token = cMatch.SubExp() ; rule = r
EndIf
EndIf
Next
If rule 'Something matched successfully!
l.matchS = token ; l.matchR = rule
rule.action(l)
cf.Increment Len(token)
Else
cf.Increment 1
EndIf
l.matchR = Null ; l.matchS = ""
Wend
If l.cFile <> f 'Pop back to the previous file in the Include
l.istk.RemoveLast
l.cFile = LexFile(l.istk.Last())
Else
Exit 'If it's f, we're done
EndIf
Forever
Local out:TList = l.out
l.Reset
Return out
End Function
' Use a simple set of filter chars to chop the path out of an Include directive
Function FilterIncludeString:String(inc:String, filter:String)
For Local i:Int = 1 To Len(filter)
Local p:Int = Instr(inc, Mid(filter, i, 1))
If p Then inc = Mid(inc, p + 1) ; Exit
Next
For Local i:Int = 1 To Len(filter)
Local p:Int = Instr(inc, Mid(filter, i, 1))
If p Then inc = Left(inc, p - 1) ; Exit
Next
Return inc
End Function
' Try to include a source file, guards and recursion checks permitting
Function TryIncludeFile(l:TLexer, file:String)
file = RealPath(file)
If l.guardMode 'Auto-guarded includes: check if it's been used already, if so ignore it
For Local name:String = EachIn l.prev
If name = file Then Return '...return without actually changing it
Next
EndIf
For Local f:LexFile = EachIn l.istk 'Check against the currently-open files
If f.dir + f.name = file
Throw LexError.Make(l, "Cannot recursively include '" + file + "'")
EndIf
Next
l.cFile = LexFile.fromFile(file)
l.istk.AddLast l.cFile ; l.GuardFileName l.cFile
End Function
Public
Comments
Any reason grammar is of type TMap? It doesn't appear to get initialized.
Example seems to work fine here with no changes needed. Executing: Example

1 +2 *(3 + 1)<< 4 + 5

@root: SumExpr
  L: { '1' : number }
  R:
    @0:
      @0: { '+' : add }
      @1: ShiftExpr
        L: MulExpr
          L: { '2' : number }
          R:
            @0: { '*' : mul }
            @1: Atom
              @0: { '(' : lparen }
              @1: SumExpr
                L: { '3' : number }
                R:
                  @0: { '+' : add }
                  @1: { '1' : number }
              @2: { ')' : rparen }
        R:
          @0: { '<<' : shl }
          @1: { '4' : number }
    @1:
      @0: { '+' : add }
      @1: { '5' : number }

Process complete

I would try Import BRL.Map.
Yeah, there doesn't seem to be much reason... the field has to be called "grammar" so that the constructors know where to look for the metadata block containing the language definition, but it isn't actually used for anything at runtime. Literally all it does is carry the metadata definition of the grammar to load up. I guess TMap makes more sense to look at than e.g. Int, since it kind-of represents a keyed association data type? But since the field is unused, its actual type is not really that important (reflection doesn't care about variable types).
@munch, I never had any problems with it running, I just assumed Parse might have used it to store data. After noticing it was Null in the debugger and reading through the code more closely, only its metadata is used.

I'm having trouble with Rep() ignoring filters:

Cat([ R("Expr"), Rep([ CheckTT("comma"), R("Expr") ]).f("- @") ])

If I wrap the contents with Cat(), I can use the filter on that:

Cat([ R("Expr"), Rep([ Cat([ CheckTT("comma"), R("Expr") ]).f("- @") ]) ])

Is there also an example of using "<"? None of the examples seem to use it.
Ah, this version is slightly simplified from the Blitz3D one and doesn't actually run filters on rules other than Cat. Sorry about that. Largely this is because all of the other rules can return a list of variable length, or nothing, which can't always match a static filter string. And because, TBH, I wasn't expecting anyone to use the non-Meta parser form anyway! Although you're welcome to if you prefer it. (Also: I figured that since filters are a convenience - you can always ignore them and process the tree manually later - full support isn't terribly important. In practice, for clarity they should really only go on toplevel named rules anyway.)

Should note that the "drop" operator in filter strings was originally supposed to be a tilde, not a minus - the minus happens to work only because of lazy coding not checking for it (it currently just tests "is it a @name? is it a <? else drop"). I... won't correct this for now, since you're using -, but feel like I should mention that background.

The < operator is used to shove a match onto the end of a preceding list. It's potentially useful for situations like "(Expr %comma)* Expr" - it's supposed to shove that last odd element onto the list with all of the others (assuming you filtered out the commas) to make the list completely homogeneous, which presumably matches its intent in the language you're parsing (e.g. the list of arguments to a function should conceptually be "the list", not "most of the list and also that last one").
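As an illustration of the factoring suggested here (a hedged sketch; the rule and token names are invented, not from the archive): moving the comma pair into its own rule lets that rule's filter drop the commas, after which "<" in the outer filter appends the trailing element onto the repeated list:

```blitzmax
' Hedged sketch: arg_pair's filter "@ - ^" keeps the %name and drops the
' comma; in arg_list, "@args <" names the repeated list and then pushes
' the final trailing %name onto it, yielding one homogeneous list.
Type TArgsParser Extends TMetaParser
	Field grammar:TMap {..
		arg_list="arg_pair* %name : @args < ^"..
		arg_pair="%name %comma : @ - ^"..
	}
End Type
```

Even when the repetition matches zero pairs, "<" still produces a one-element list named "args", which is the behavior described in the follow-up note below about always delivering a list.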
The reason for using TParser is the sub-rule filter; in the example I'm using it to remove commas in a list. As for the minus drop operator, the code for PMeta gave the impression it was minus. I figured that since ~ (tilde) is BlitzMax's escape character in string literals (but not metadata), the use of minus was deliberate. Thanks for updating and explaining more of the parser.
Yep, you're absolutely right on both counts - guess I forgot about that not being an oversight at all. Presumably the tilde only works at all because some language quirk is leaving my metadata strings unescaped. I should move away from using it too!
OK, I think I have updated it so that filters can now be applied to the Opt/Rep/Plus rules as well, where appropriate. Applying filters to Alt is also now explicitly disallowed, because that can't work properly.

A note on the < operator: using it will always make the preceding Rep or Plus match into a list item, even if it only contains one element (which would naturally have to be the one appended by <) and would normally be eligible for upfolding. This is intentional, and based on the idea that if you're using it, you'll be expecting a list to iterate over in that slot, so the slot should always deliver a list.
Code Archives Forum