Regular expression help

Ghost Dancer(Posted 2014) [#1]
Hi

I'm trying to get character positions of search results but noticed some strangeness. The following example illustrates the problems.

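Strict
Import BaH.RegEx

' Chr($2013) is an en dash, which takes more than one byte in UTF-8.
Local str$ = "1/5" + Chr($2013) + "6, 2/4"
Local regexPrefix:TRegEx = TRegEx.Create("([0-9]+/)+")
Local match:TRegExMatch

' Pass 1 searches the en dash string, pass 2 the plain hyphen string.
For Local l = 1 To 2
	Print "STRING: " + str$
	match = regexPrefix.Find(str$)
	
	' Print each match with its reported start and end positions.
	Repeat
		Print match.SubExp() + ": " + match.subStart() + ", " + match.subEnd()
		match = regexPrefix.Find()
	Until match = Null Or match.SubExp() = ""
	
	str$ = "1/5-6, 2/4"
Next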


Problem 1: subStart() is returning 1 less than the actual position. Is this a bug or how regex is meant to work? It's easy enough to work around, but I thought it worth mentioning.

Problem 2: This is the one that's caused me a headache. The values returned by subStart() and subEnd() differ depending on whether the string contains Chr($2013) or the standard hyphen, even though both strings have the same number of characters. I'm assuming this is UTF-8 related, but I can't work out how to account for it. I'd really appreciate some help with this.

I am using the latest version of the module (1.04).


Brucey(Posted 2014) [#2]
Hallo,

What results are you expecting?

I get :
1/: 0, 2
2/: 7, 9

for both.


Brucey(Posted 2014) [#3]
I'm using the SVN version (1.05), which was modified to work on BlitzMax's native string size instead (UTF-16 or thereabouts).

Or you can get it from GitHub, if you prefer.

Since Google dropped support for downloads on googlecode, I'm still sorting out a replacement downloads area.


Ghost Dancer(Posted 2014) [#4]
Both strings are the same length, so I would expect both to return the same results, i.e.

1/: 0, 2
2/: 7, 9

But I get this:

1/: 0, 2
2/: 9, 11

1/: 0, 2
2/: 7, 9

It's the en dash that causes the problem, I think because the character is more than 1 byte.
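
A minimal sketch of why the en dash would shift the offsets, assuming the 1.04 module reported positions as UTF-8 byte offsets (an assumption; nothing in this thread confirms it). U+2013 takes three bytes in UTF-8, so every position after it moves up by two, which matches 7 becoming 9. Utf8Bytes and ByteToCharOffset are hypothetical helpers written for this illustration and are not part of BaH.RegEx.

```blitzmax
Strict

' Hypothetical helper: UTF-8 bytes needed for one 16-bit character code.
' Surrogate pairs are ignored to keep the sketch short.
Function Utf8Bytes:Int(code:Int)
	If code < $80 Then Return 1
	If code < $800 Then Return 2
	Return 3
End Function

' Hypothetical helper: map a UTF-8 byte offset back to a character offset.
Function ByteToCharOffset:Int(s:String, byteOffset:Int)
	Local bytes:Int = 0
	For Local i:Int = 0 Until s.Length
		If bytes >= byteOffset Then Return i
		bytes :+ Utf8Bytes(s[i])
	Next
	Return s.Length
End Function

Local s:String = "1/5" + Chr($2013) + "6, 2/4"
Print ByteToCharOffset(s, 9)   ' prints 7, the character offset of "2/"
```

With the plain hyphen string, every character is one byte, so byte offsets and character offsets are the same.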

Didn't realise there was a newer version elsewhere. Will give it a try and report back :)


Ghost Dancer(Posted 2014) [#5]
Awesome, V1.05 is returning the same for both now. Thanks :)

subStart() is still returning 1 less than the actual position though.


Derron(Posted 2014) [#6]
Did not check it -- but subStart() may return the "array position" (zero based)?


bye
Ron


Brucey(Posted 2014) [#7]
In your example, subStart() returns 0 and 7 respectively. Based on your string, that looks fine :
                "1/5-6, 2/4"
                 ^      ^
string position  0      7

BlitzMax is zero-indexed - in strings, arrays, etc.
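
As a rough illustration of the zero-based convention, here is a short sketch using only the built-in string features (no regex module involved). The positions 7 and 9 are taken from the example string above.

```blitzmax
Strict

Local s:String = "1/5-6, 2/4"

Print s.Find("2/")   ' 7: the Find method counts from 0
Print Chr(s[7])      ' 2: indexing into a string is 0-based as well
Print s[7..9]        ' 2/: a slice includes the start index and excludes the end index
```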

Or am I not understanding your point?

<edit> Derron thinks so too :-p


Ghost Dancer(Posted 2014) [#8]
I thought that at first, but my example uses the following string "1/5-6, 2/4" and extracts the "1/". It is returning 0 for subStart and 2 for subEnd.


Brucey(Posted 2014) [#9]
I'll get back to you ;-)


Brucey(Posted 2014) [#10]
I've committed an update which fixes the value returned from subEnd(), and also updates to the latest version of PCRE.

:o)


Ghost Dancer(Posted 2014) [#11]
Nice one, thanks Brucey :) I'll check it out later.


UNZ(Posted 2014) [#12]
Great!

The new regex version solved the issue I had:
http://www.blitzbasic.com/Community/posts.php?topic=101635

Now I can use the oxygen engine. Looks way smoother!
(although I still wonder why wxwidgets 3 uses gtk2-engines. Whatever...)

Big thanks Brucey :)


Ghost Dancer(Posted 2014) [#13]
I see the problem was subEnd, not subStart. I was expecting it to be 1-based like Blitz's string functions, not 0-based like arrays, hence some of the confusion in our posts above :p

I've just downloaded from github but it's still V1.05.
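
For anyone else mixing the two conventions, a small sketch of the conversion, assuming a 0-based match offset and a match length are already in hand. start0 and length are hypothetical stand-ins, not values read from the module.

```blitzmax
Strict

Local s:String = "1/5-6, 2/4"

' Hypothetical stand-ins for a 0-based match offset and a match length.
Local start0:Int = 7
Local length:Int = 2

Print Instr(s, "2/")              ' 8: Instr counts from 1
Print Mid(s, start0 + 1, length)  ' 2/: add 1 when passing a 0-based offset to Mid
```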


Brucey(Posted 2014) [#14]
"I've just downloaded from github but it's still V1.05."

I bump versions with a "release". Stuff in source control shows the version that the new release will have when it is released.

Releases are somewhat of an issue at the moment, as my usual place (googlecode) has stopped allowing new downloads (Thanks Google!). I have a second option which is still in testing - hosted on my own site.


Ghost Dancer(Posted 2014) [#15]
I have to admit I don't really know much about source control as I never use it - I just downloaded the files and installed them, but couldn't see any difference when I tested it.


Derron(Posted 2014) [#16]
Just think of those svn/git projects as a backup of Brucey's current work.
Releases are then specific backups Brucey accepts as "ready for other users".


To see what differs between releases, just compare them using diff tools (available for the desktop, and often provided by the hosting platforms).




bye
Ron