Regular expression help
| ||
Hi, I'm trying to get the character positions of search results but noticed some strangeness. The following example illustrates the problems.

    Strict
    Import BaH.RegEx

    Local str$ = "1/5" + Chr($2013) + "6, 2/4"
    Local regexPrefix:TRegEx = TRegEx.Create("([0-9]+/)+")
    Local match:TRegExMatch

    For Local l = 1 To 2
        Print "STRING: " + str$
        match = regexPrefix.Find(str$)
        Repeat
            Print match.SubExp() + ": " + match.subStart() + ", " + match.subEnd()
            match = regexPrefix.Find()
        Until match = Null Or match.SubExp() = ""
        str$ = "1/5-6, 2/4"
    Next

Problem 1: subStart() is returning 1 less than the actual position. Is this a bug, or is it how regex is meant to work? It's easy enough to fix but I thought it worth mentioning.

Problem 2: This is the one that's caused me a headache. The values returned by subStart() and subEnd() are different when using Chr($2013) or the standard hyphen (both strings have the same number of characters). I'm assuming this is UTF-8 related but I can't work out how to account for it. Would really appreciate some help with this.

I am using the latest version of the module (1.04). |
| ||
Hello, what results are you expecting? I get:

    1/: 0, 2
    2/: 7, 9

for both. |
| ||
I'm using the SVN version (1.05), which was modified to work on BlitzMax's native string size instead (UTF-16 or thereabouts). Or you can get it from GitHub, if you prefer. Since Google dropped support for downloads on Google Code, I'm still sorting out a replacement downloads area. |
| ||
Both strings are the same length, so I would expect them both to return the same, i.e.

    1/: 0, 2
    2/: 7, 9

But I get this:

    1/: 0, 2
    2/: 9, 11
    1/: 0, 2
    2/: 7, 9

It's the en dash that causes the problem, I think because the character is more than 1 byte. Didn't realise there was a newer version elsewhere. Will give it a try and report back :) |
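The 7 versus 9 difference above is what you would see if the offsets were counted in UTF-8 bytes rather than characters, since the en dash (U+2013) takes three bytes in UTF-8. The following is a minimal Python sketch of that effect, not the module's own code; the helper names are made up for illustration, and the assumption that version 1.04 reported byte offsets is inferred from the thread rather than confirmed.

```python
import re

# Same text as in the thread: one string with an en dash (U+2013),
# one with a plain hyphen. Both are 10 characters long.
with_dash = "1/5\u20136, 2/4"
with_hyphen = "1/5-6, 2/4"

PATTERN = r"([0-9]+/)+"


def char_offsets(text):
    """Return (start, end) of each match, counted in characters."""
    return [m.span() for m in re.finditer(PATTERN, text)]


def utf8_byte_offsets(text):
    """Return (start, end) of each match, counted in UTF-8 bytes."""
    data = text.encode("utf-8")
    return [m.span() for m in re.finditer(PATTERN.encode("ascii"), data)]


def byte_to_char_offset(text, byte_offset):
    """Convert a UTF-8 byte offset back into a character offset.

    Assumes byte_offset falls on a character boundary.
    """
    return len(text.encode("utf-8")[:byte_offset].decode("utf-8"))


print(char_offsets(with_dash))       # offsets in characters
print(utf8_byte_offsets(with_dash))  # offsets in bytes
print(byte_to_char_offset(with_dash, 9))
```

If a regex engine only hands back byte offsets, a conversion like `byte_to_char_offset` is one way to get back to character positions, at the cost of re-decoding the prefix of the string for each lookup.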
| ||
Awesome, V1.05 is returning the same for both now. Thanks :) subStart() is still returning 1 less than the actual position though. |
| ||
I did not check it, but subStart() may return the "array position" (zero-based)? bye Ron |
| ||
In your example, subStart() returns 0 and 7 respectively. Based on your string, that looks fine:

    "1/5-6, 2/4"
     ^      ^
     0      7      string position

BlitzMax is zero-indexed in strings, arrays, etc. Or am I not understanding your point?

<edit> Derron thinks so too :-p |
| ||
I thought that at first, but my example uses the following string "1/5-6, 2/4" and extracts the "1/". It is returning 0 for subStart and 2 for subEnd. |
| ||
I'll get back to you ;-) |
| ||
I've committed an update which fixes the value returned from subEnd(), and also updates to the latest version of PCRE. :o) |
| ||
Nice one, thanks Brucey :) I'll check it out later. |
| ||
Great! The new regex version solved the issue I had: http://www.blitzbasic.com/Community/posts.php?topic=101635 Now I can use the oxygen engine. Looks way smoother! (although I still wonder why wxwidgets 3 uses gtk2-engines. Whatever...) Big thanks Brucey :) |
| ||
I see the problem was subEnd() not subStart(). I was expecting it to be 1-based like Blitz's string functions, not 0-based like arrays, hence some of the confusion in our posts above :p I've just downloaded from GitHub but it's still V1.05. |
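The 0-based versus 1-based mix-up can be sketched in a few lines. This Python example assumes a span given as a zero-based start and an end that points one past the last character, like the (0, 2) pair quoted earlier for "1/"; the thread does not state what subEnd() returns after the fix, so treat that convention as an assumption. `to_one_based` and `mid` are hypothetical helpers, with `mid` standing in for a BASIC-style 1-based substring function.

```python
text = "1/5-6, 2/4"


def to_one_based(start, end):
    """Convert a zero-based, end-exclusive span to a 1-based (position, length)."""
    return start + 1, end - start


def mid(s, position, length):
    """Hypothetical 1-based substring helper, in the style of BASIC's Mid."""
    return s[position - 1:position - 1 + length]


# Zero-based, end-exclusive spans can be used to slice directly.
print(text[0:2])
print(text[7:9])

# The same spans expressed for a 1-based substring function.
position, length = to_one_based(7, 9)
print(position, length, mid(text, position, length))
```

Either convention works as long as the caller knows which one it is getting; the off-by-one only appears when a zero-based offset is passed straight into a 1-based function.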
| ||
"I've just downloaded from github but its still V1.05."

I bump versions with a "release". Stuff in source control shows the version that the new release will have when it is released. Releases are somewhat of an issue at the moment, as my usual place (Google Code) has stopped allowing new downloads (Thanks Google!). I have a second option, hosted on my own site, which is still in testing. |
| ||
I have to admit I don't really know much about source control as I never use it. I just downloaded the files and installed them, but couldn't see any difference when I tested it. |
| ||
Just think of those svn/git projects as a backup of Brucey's current work. Releases are then specific backups that Brucey accepts as "ready for other users". To see what differs between releases, just compare them using diff tools (available for the desktop, and often provided by the hosting platforms). bye Ron |