string comparison
BlitzMax Forums/BlitzMax Programming/string comparison
| ||
does blitzmax check the length of the 2 strings before comparing them? (if they are different lengths they can be counted as not matching) |
| ||
I've had a dig through the source code, and no, it doesn't appear to do that check before comparing them. If you want to check, look for the keyword "bbStringCompare" in C:\Program Files\BlitzMax\mod\brl.mod\blitz.mod\blitz_string.c So not unless it does it in BlitzMax code somewhere, but there doesn't appear to be any additional BlitzMax code for string equality. |
| ||
ok thanks |
| ||
Is that an optimization that could be made? |
| ||
If your game/app compares tens of thousands of strings every frame, and this is causing your game/app to perform badly, I would assume so. |
| ||
How is the length calculated? If it works like TList.Count(), it's probably best not to bother. |
| ||
How is the length calculated? If it works like TList.Count(), it's probably best not to bother. A BlitzMax string is not a linked list. It knows its own size, as well. You could easily write your own compare-string function in C for length-vs-length optimization.Currently bbStringCompare in brl.blitz does not do this. I assume that is because the Compare method is supposed to return whether a is greater than, equal to, or less than b, even though bbStringCompare doesn't follow that rule (it only returns equality). Last edited 2010 |
| ||
what's wrong with simple size comparison:if str1.length < str2.length I am sure this is as fast as its going to get. |
| ||
even though bbStringCompare doesn't follow that rule (it only returns equality) No, it does return character order. That's why we can easily sort strings within array or TList. strict local array$[] = ["ba", "cc", "bb", "a"] k.Sort() for local n$ = eachin array print n next |
| ||
this is what blitz string does for the comparison: it does a for next loop using the shortest length string in which it compares every character until it finds the first unequal character then return the none 0 difference of the two characters if in the whole for next loop it doesn't find a character that is difference then it returns the difference in the length of the two. -n, 0 or +n. |
| ||
No, it does return character order. That's why we can easily sort strings within array or TList Huh. I was always getting true or false when I tested string comparison. The fact that it does obey the Compare method rules, though, is proof that bbStringCompare can't be optimized in such a way (I'd make your own comparison function in C). |
| ||
It returns True/False when using '=', it returns -1,0,1 when using Compare()SuperStrict Local String1:String = "abcde" Local String2:String = "bcdef" Local String3:String = "abcde" Print "String1 = String2 "+(String1 = String2) Print "String2 = String1 "+(String2 = String1) Print "String1 = String3 "+(String1 = String3) Print "~nString1.Compare(String2) "+(String1.Compare(String2)) Print "String2.Compare(String1) "+(String2.Compare(String1)) Print "String1.Compare(String3) "+(String1.Compare(String3)) |
| ||
It returns True/False when using '=', it returns -1,0,1 when using Compare() It always calls bbStringCompare, you can find that in generated asm code. |
| ||
Faster compare:SuperStrict Const Iterations:Long = 10000000 Local Variable1:String = "Hello world" Local Variable2:String = "this is another string" Local comparison:Long Local m:Int = MilliSecs() comparison = 0 For Local i:Int = 0 To Iterations If (Variable1 = Variable2) Then comparison:+1 End If Next Print "Regular compare:" + (MilliSecs() - m) comparison = 0 m = MilliSecs() For Local i:Int = 0 To Iterations If (FastStringComp(Variable1, Variable2)) Then comparison:+1 End If Next Print "Optimized compare:" + (MilliSecs() - m) Function FastStringComp:Int(str1:String, str2:String) If str1.length <> str2.length Return False Else Return str1 = str2 End Function speed benefit only in release mode. In debug mode it is slower. If you're going to compare strings of a given length, it's a bit slower than regular cmoparer (but only a very little bit), and for common string comparison wich you don't know the length of them, it should be a bit a lot faster. (almost twice faster in the small sample) Last edited 2010 Last edited 2010 |
| ||
here the Blitz_string.c function:int bbStringCompare( BBString *x,BBString *y ){ int k,n,sz; sz=x->length<y->length ? x->length : y->length; for( k=0;k<sz;++k ) if( n=x->buf[k]-y->buf[k] ) return n; return x->length-y->length; } apparently it does a final sgn function call. |
| ||
Local strOne:String = "Hello" Local strTwo:String = "Hellos" Print Compare(strOne,strTwo) Function Compare:int(x:String , y:String) If Object(x) = Object(y) Then Return true Return false End Function Any good? Dabz |
| ||
Local strOne:String = "Hello" Local strTwo:String = "Hellos" strTwo = strTwo[..5] Print Compare(strOne,strTwo) Print strOne=strTwo Function Compare:Int(x:String , y:String) If Object(x) = Object(y) Then Return True Return False End Function I don't know how object comparison works because I haven't looked it up, but it doesn't behave the same way as string comparison. |
| ||
Mmmm, just thought it was worth a punt! :) Seems to graft until you slice it, wonder why!?! Dabz |
| ||
As you say, it seems to work fine until you slice it. You can reassign it multiple times, give it longer or shorter strings and then reassign the matching value and it still works fine. As soon as you slice it, it gets lost. |
| ||
What does this mean? A bug? |
| ||
What does this mean? A bug? Not sure, I tend not to dig that deep when it comes to BlitzMax, I used that once years ago in another language, cannot recall the reason now, but as soon as I saw the string comparison thing, that just sprung to mind. Would be nice to know why the slice buggers it if its not a bug, just for future reference! Dabz |
| ||
I haven't a clue. I assumed that strings are either immutable or they're not. Since they are in BlitzMax, I believed that it would either work or not work. What slicing does that other attempts to modify a string don't do, I really couldn't say. BTW, has anyone actually tried my "breaking" test up there besides me? I haven't updated BlitzMax in ages, so it could very easily work fine on an updated version for all I know. |
| ||
If you mean post #17, then yeah, it borks here, Mac Max 1.41!!! Dabz |
| ||
"Object = Object" only compares their memory addresses. |
| ||
If you mean post #17, then yeah, it borks here, Mac Max 1.41!!! That's the one. Well if it's that way in 1.41 as well, then I really don't know what slicing does different. "Object = Object" only compares their memory addresses. No, it doesn't. If it did, this wouldn't work. Local strOne:String = "Hello" Local strTwo:String = "Hello" Print Compare(strOne,strTwo) Function Compare:Int(x:String , y:String) If Object(x) = Object(y) Then Return True Return False End Function |
| ||
No, it doesn't. If it did, this wouldn't work. Yes it does, but it is using the same mem allocation for equal literals (creating a single mem addres for them in compile time). This shows it: Strict Local strOne:String = "Hello" Local strTwo:String = "Hellos" strTwo = Left(strTwo, 5) Local StrTree:String = "Hello" Print Compare(strOne, strTwo) Print Compare(strOne, strTree) Function Compare:Int(x:String , y:String) If Object(x) = Object(y) Then Return True Return False End Function |
| ||
Ah, I see... Well, that explains it then! :) Dabz |
| ||
To be honest, that was my first thought when I saw the results, but I dismissed it as being too ludicrous to be true. I'm still not sure I've changed my mind. I suppose, since strings are immutable, there's no real harm in it, but it still feels wrong. |
| ||
To be honest, that was my first thought when I saw the results, but I dismissed it as being too ludicrous to be true. I'm still not sure I've changed my mind. I suppose, since strings are immutable, there's no real harm in it, but it still feels wrong. This is very common in compilers. When you're writing the intermediate representation of the syntax tree, you add references to the literal resources (strings). While the AST is being built, it's common to add a literal reference ONLY if it is new, so generated executable is smaller and has a better handling of redundant resources. But as none of us has the bcc source code, this can be different. Anyway, taking a look to the generated assembler seems to confirm both strings are sharing the same literal representation. |