Something along the lines of a lexical analyzer
Well, I have come up with something along the lines of a lexical analyzer, as the title states. It works on Linux (Ubuntu 11.10, 64-bit). For those of you who do not know what a lexical analyzer is, it breaks code up into chunks (i.e. tokens) and hands them to a parser. A good example is Flex (which generates lexical analyzers) paired with Bison (which generates parsers); they work together flawlessly, but I personally HATE C. I hope my code doesn't suck too badly for 2AM on a weeknight (yes, I didn't comment it, I just wanted to get it done)...
Const CNULL:Int = -1
Const CSTATEMENT:Int = 0
Const CINTEGER:Int = 1
Const CADD:Int = 2
Const CSUB:Int = 3
Type Token
    Field typ:Int
    Field txt:String

    Function Create:Token(typ1:Int,txt1:String)
        Local toret:Token = New Token
        toret.typ = typ1
        toret.txt = txt1
        Return toret
    End Function
End Type
Function Lex:Token(statement:String)
    DebugLog "Lex"
    Local tokens:Token[] = New Token[3]
    For Local x:Int = 0 To 2 Step 1
        tokens[x] = Token.Create(CNULL,"")
    Next
    Tokenize22(statement,tokens)
    For Local x:Int = 0 To 2 Step 1
        DebugLog tokens[x].typ
        DebugLog tokens[x].txt
    Next
    Local c:Int = 0
    If tokens[0].typ = CNULL Then
        llerror("Lexical Analyzer Error (Internal)!")
    EndIf
    If tokens[0].typ <> CINTEGER Then
        llerror("Invalid Token!")
    EndIf
    If tokens[1].typ = CNULL Then
        Return tokens[0]
    EndIf
    If tokens[2].typ = CNULL Then
        llerror("Bad Number Of Tokens!")
    EndIf
    If tokens[2].typ = CSTATEMENT Then
        Local ttok:Token = Lex(tokens[2].txt)
        tokens[2].typ = CINTEGER
        tokens[2].txt = ttok.txt
        Return Token.Create(CINTEGER,Parse(tokens[0].typ,tokens[0].txt,tokens[1].typ,tokens[2].txt))
    EndIf
End Function
Function Parse:String(typ:Int,txt1:String,modifier:Int,txt2:String)
    Local toret:String
    Select typ
        Case CINTEGER
            Local txt1_1:Int = Int(txt1)
            Local txt2_1:Int = Int(txt2)
            Select modifier
                Case CADD
                    toret = String(txt1_1 + txt2_1)
                Case CSUB
                    toret = String(txt1_1 - txt2_1)
                Default
                    llerror("Unknown Type of Modifier For Stuff To Parse!")
            End Select
        Default
            llerror("Unknown Type of Stuff To Parse!")
    End Select
    Return toret
End Function
Function Tokenize22(statement:String,tokens:Token[] Var)
    Local mytokens:Byte = 0
    Local inint:Byte = 0
    For Local x:Int = 1 To Len(statement) Step 1
        If inint = 1 Then
            If Mid(statement,x,1) <> " " Then
                tokens[mytokens].txt :+ Mid(statement,x,1)
                Continue
            Else
                inint = 0
                mytokens :+ 1
                Continue
            EndIf
        EndIf
        If mytokens = 2 Then
            tokens[2].typ = CSTATEMENT
            tokens[2].txt = Mid(statement,x)
            Return
        EndIf
        Select Mid(statement,x,1)
            Case " "
                Continue
            Case "+"
                tokens[mytokens].typ = CADD
                mytokens :+ 1
            Case "-"
                tokens[mytokens].typ = CSUB
                mytokens :+ 1
            Case "1","2","3","4","5","6","7","8","9","0"
                inint = 1
                tokens[mytokens].typ = CINTEGER
                tokens[mytokens].txt = Mid(statement,x,1)
            Default
                llerror("Unknown Character!")
        End Select
    Next
End Function
Function llerror(text:String)
    RuntimeError text
End Function
I am open to any suggestions!
Looks like good fun. Have you thought about how to extend it into a more generic framework? (I'm too lazy to write efficient lexer code by hand, so my own scanners invariably end up making heavy use of regular expressions.) What would be really cool is some kind of parser framework (think Spirit, rather than a parser generator like Bison) for BlitzMax. While I have a few ideas, the lack of much in the way of metaprogramming options makes it hard to come up with something that's both suitably expressive and fast (I wonder if using something like C macros counts as "cheating" when writing BlitzMax code?).
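As a very rough sketch of the rule-table idea in plain BlitzMax (no regex module), reusing the Token type and llerror from the post above; TScanRule, TDigitsRule, TLiteralRule and ScanAll are hypothetical names, not an existing API:

' Hypothetical sketch only: each rule reports how many characters it matches
' at a given position, and the scanner itself is just a loop over a rule table.
Type TScanRule Abstract
    Field tokenType:Int    ' stamped onto the emitted Token, e.g. CINTEGER or CADD
    Method MatchLen:Int(source:String, pos:Int) Abstract
End Type

Type TDigitsRule Extends TScanRule
    Method MatchLen:Int(source:String, pos:Int)
        Local n:Int = 0
        While pos + n <= Len(source) And Mid(source, pos + n, 1) >= "0" And Mid(source, pos + n, 1) <= "9"
            n :+ 1
        Wend
        Return n
    End Method
End Type

Type TLiteralRule Extends TScanRule
    Field lit:String
    Method MatchLen:Int(source:String, pos:Int)
        If Mid(source, pos, Len(lit)) = lit Then Return Len(lit)
        Return 0
    End Method
End Type

' Emit a Token for the first rule that matches at each position;
' spaces are skipped, anything unmatched is an error.
Function ScanAll:TList(rules:TScanRule[], source:String)
    Local out:TList = CreateList()
    Local pos:Int = 1
    While pos <= Len(source)
        If Mid(source, pos, 1) = " " Then
            pos :+ 1
            Continue
        EndIf
        Local matched:Int = False
        For Local r:TScanRule = EachIn rules
            Local n:Int = r.MatchLen(source, pos)
            If n > 0 Then
                out.AddLast(Token.Create(r.tokenType, Mid(source, pos, n)))
                pos :+ n
                matched = True
                Exit
            EndIf
        Next
        If Not matched Then llerror("Unknown Character!")
    Wend
    Return out
End Function

The per-language part then shrinks to a table of rule instances plus the token constants, which is most of what a regex-based scanner buys; a Spirit-style combinator layer on top of that is the harder problem, for exactly the metaprogramming reasons above.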