Regarding variables and stuff

Blitz3D Forums/Blitz3D Beginners Area/Regarding variables and stuff

JBR(Posted 2003) [#1]
Hi, I have a few questions about variables in general.

1) Are there any speed benefits to using global variables over local ones? I tend to use locals in functions, but wouldn't that require creating and deleting them on every call, which will slow things down?

2) I have found that 'Abs' is slower than doing a simple check myself ( if a%<0 then a%=-a% ). I'm using integers and just wonder whether using floating-point variables would be faster. I understand an integer operation and a float operation can take place at the same time?

I'm just trying to make my inner loop as fast as possible, so any advice would be appreciated.

Thanks
Marg


robleong(Posted 2003) [#2]
You might be splitting hairs there. A quicker way of speeding up your inner loop would be to get rid of it altogether by unrolling it. Sure, the program will be longer and less elegant, but it will be faster! I don't understand question 2; as far as I'm aware, floating-point operations are slower than integer ones. If you don't need floating point, don't use it. Good luck.
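Reading "get rid of it altogether" as loop unrolling, here is a minimal sketch of the idea in C rather than Blitz (the function names are made up for illustration). Both functions sum absolute values using the same If test as the original question; the second handles four elements per pass, so the loop counter is tested about a quarter as often, at the cost of longer code and a cleanup loop for leftovers.

```c
#include <assert.h>
#include <stddef.h>

/* Plain loop: one loop-control compare-and-branch per element.
   Assumes no element equals INT_MIN, whose negation overflows. */
long sum_abs_rolled(const int *v, size_t n)
{
    assert(v != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        int a = v[i];
        if (a < 0) a = -a;           /* same idea as "If a%<0 Then a%=-a%" */
        total += a;
    }
    return total;
}

/* Unrolled by four: a quarter of the loop-control branches, longer code. */
long sum_abs_unrolled(const int *v, size_t n)
{
    assert(v != NULL || n == 0);
    long total = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        int a0 = v[i], a1 = v[i + 1], a2 = v[i + 2], a3 = v[i + 3];
        if (a0 < 0) a0 = -a0;
        if (a1 < 0) a1 = -a1;
        if (a2 < 0) a2 = -a2;
        if (a3 < 0) a3 = -a3;
        total += (long)a0 + a1 + a2 + a3;
    }
    for (; i < n; i++) {             /* leftovers when n is not a multiple of 4 */
        int a = v[i];
        if (a < 0) a = -a;
        total += a;
    }
    return total;
}
```

For the array `{3, -7, 0, 12, -1, 5, -9}` both functions return 37. Whether the unrolled version is measurably faster depends on the compiler and CPU, and an optimizing C compiler may unroll the first loop on its own.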


Curtastic(Posted 2003) [#3]
Wow, the If is twice as fast as Abs().
These things can make a huge difference in the speed of some programs, like ones that do it for every pixel drawn and redraw the whole screen every loop.

Ints are faster than floats in Blitz.
Using globals and locals is the same speed in Blitz, whether in a function or not, but allocating locals takes time.


JazzieB(Posted 2003) [#4]
It's worth noting that variables are Local by default, even those outside of any Function, which are known only to the main program. If anything needs to be known inside a Function from outside of it, then the variable needs to be declared Global at the beginning of your code.

The only time I use Local is when defining a Blitz array for use within a Function.


BobR(Posted 2003) [#5]
Has anyone ever compiled a list of these speed comparisons..?

It's good to know about ABS() being slower than an explicit comparison.

How about others like that..?

For example, I've read in the forums that Functions are marginally slower than Subroutines.

Might not matter in a game of "Hangman", but it could just possibly make the difference between playable and unplayably slow.

Has anyone ever done any testing on anything else like that..?

Would this be a good subject for a new Topic..?


Andy(Posted 2003) [#6]
>Has anyone ever compiled a list of these speed
>comparisons..?

That would be rather pointless, since much of it would be dependent on the CPU.

>For example, I've read in the forums that Functions are
>marginally slower than Subroutines.

True.

>Might not matter in a game of "Hangman", but it could just
>possibly make the difference between playable and
>unplayably slow.

I don't think so... If that were the case, you would need a major overhaul of the logic.

Andy


Shambler(Posted 2003) [#7]
I suggest before spending lots of time speeding certain program areas up you first write the program.

Then and only then should you spend time speeding it up IF it requires speeding up at all!

Optimizing is all very well, just make sure you aren't wasting your time on something that's not going to be a problem anyway.


BobR(Posted 2003) [#8]
>>Has anyone ever compiled a list of these speed
>>comparisons..?

>That would be rather pointless, since much of it would be
>dependent on the CPU.


Not at all- if one function runs twice as fast as another function on a 500 MHz CPU, it will STILL run twice as fast on a 2 GHz CPU.

Coorae reported the ABS() function took twice as long as an explicit comparison. If your program required many such tests, you'd be far better off using the faster method, regardless of what speed the CPU might be.

Too many programs assume the user is going to have the baddest CPU on the block, making optimization unnecessary. But that's a BAD assumption to make, and limits your eventual market for the final program.

>>For example, I've read in the forums that Functions are
>>marginally slower than Subroutines.

>True.


Well, then if you have a time-critical program, why not use the method that's going to run faster..?

The original question in this thread was asking about speeding up a time-critical loop. If it needed to call several functions, then maybe the increased speed of calling them as subroutines instead would make the difference.


>>Might not matter in a game of "Hangman", but it could just
>>possibly make the difference between playable and
>>unplayably slow.

>I don't think so... If that were the case, you would need
>a major overhaul of the logic.


Which is why it would be nice to have a compilation of speed comparisons BEFORE you code the logic.

If you knew, for example, that ABS() was slower than an IF/THEN comparison, you could code for that in the first place. No "overhauls" needed.


It's just that you hear every now and then about some function or statement being faster than some other one, but as far as I've ever seen there is no "central" resource that lists all of these.

I for one would like to know the most advantageous approach before I end up HAVING to waste time overhauling the code.


Andy(Posted 2003) [#9]
>Not at all- if one function runs twice as fast as another
>function on a 500Mhz CPU it will STILL run twice as fast
>on a 2Ghz CPU.

That is certainly not the case... Core version, CPU speed, FSB speed, cache and even brand have a great influence on the results. The results on a 2GHz P4 are quite different from those of a 2GHz Athlon... And how about the P3 1.4GHz, which will compete against the 2GHz P4 in some tests and won't in others, and then there is the VIA C3... And what about drivers...

Andy


BobR(Posted 2003) [#10]
Sorry... you're still comparing apples and oranges.

You're confusing hardware speed with software execution speed.

If function "A" takes 8ms to execute on a 500Mhz CPU, and function "B" takes 16ms to execute on a 500Mhz CPU, then Function A will run 2x as fast as Function B, no matter what speed CPU they're running on.


The ACTUAL speeds will vary depending on the CPU, but the RELATIVE execution speeds will be the same.


So... running on a 2Ghz CPU, Function A will take approximately 2ms to complete and Function B will take approximately 4ms to complete.

They run faster because the CPU is 4x faster, but Function A still is 2x as fast as Function B.
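BobR's argument rests on one assumption: moving to the faster machine divides every timing by the same factor $k$. Here is a worked version with his own numbers (500 MHz to 2 GHz, so $k = 4$), followed by the general case where the two functions speed up by different factors $k_A \neq k_B$, which is the situation Andy describes:

```latex
\begin{aligned}
\text{Same factor } (k = 4):\quad
  & \frac{t_A'}{t_B'} = \frac{8\,\text{ms}/4}{16\,\text{ms}/4}
    = \frac{2\,\text{ms}}{4\,\text{ms}} = \frac{1}{2} \\[4pt]
\text{Different factors:}\quad
  & \frac{t_A'}{t_B'} = \frac{t_A/k_A}{t_B/k_B}
    = \frac{t_A}{t_B}\cdot\frac{k_B}{k_A}
\end{aligned}
```

The ratio stays at 1/2 only when $k_A = k_B$. For a purely hypothetical example, if A sped up 4x but B sped up 8x, both would take 2 ms and the advantage would vanish.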


Any hardware differences between CPU brands or core technology only affect the overall processing speed of the hardware.


The relative speed differences between the software functions would remain constant.


So- if IF/THEN is twice as fast as ABS() on one computer, it will be twice as fast as ABS() on ANY computer.


The point is, BOTH methods will achieve the same result.

Why not use the method that's faster..?


Ross C(Posted 2003) [#11]
The only thing is, this will not make the whole loop twice as fast, but it would be interesting to see a comparison. I believe most of the time is consumed by texturing and displaying polys. Though I could be wrong; I have been in the past :S


Andy(Posted 2003) [#12]
>If function "A" takes 8ms to execute on a 500Mhz CPU, and
>function "B" takes 16ms to execute on a 500Mhz CPU, then
>
>Function A will run 2x as fast as Function B, no matter
>what speed CPU they're running on.

No they won't... You are arguing that all parts of the CPU (ALU, FPU, BPU, etc.) are equally fast on different CPUs, regardless of the core, the number of stages, the number of units, cache size and speed, and bus speed...

Andy


WolRon(Posted 2003) [#13]
Get a load of this optimization:
A ONE-pixel-wide RECTangle (basically a line) will draw 70% faster than an orthogonal LINE of the same length, even though the rectangle is actually drawing 4 lines.
Note that RECTs can't draw angled lines, but if you were going to draw a straight up/down or left/right line anyway, why not draw a rectangle?


soja(Posted 2003) [#14]
WolRon:
Very interesting. You would think that they would code special cases into LINE so that it would be optimized for straight lines, but on second thought, perhaps that was left out because, on average, it would take longer to make the "is this line straight?" check. On the other hand, it's taken for granted that a RECT always consists of straight lines. (And perhaps there's an optimization in RECT that detects 1-pixel-wide rectangles and only draws one side.) That's what I'm guessing, anyway.
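Soja's guess can be illustrated with a tiny software renderer in C. This is only a sketch of the general idea, assuming a hypothetical 8-bit framebuffer; it is not how Blitz3D's Line or Rect are actually implemented. A general line routine has to make a decision for every pixel, while a horizontal run is a contiguous block of memory that can be written in one go.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define FB_W 64
#define FB_H 48

/* Hypothetical 8-bit software framebuffer, stored row by row. */
typedef struct { unsigned char px[FB_H][FB_W]; } Framebuffer;

static inline int in_bounds(int x, int y)
{
    return x >= 0 && x < FB_W && y >= 0 && y < FB_H;
}

/* General Bresenham line: works in any direction, but updates an
   error term and does a separate store for every pixel. */
void line_general(Framebuffer *fb, int x0, int y0, int x1, int y1, unsigned char c)
{
    assert(in_bounds(x0, y0) && in_bounds(x1, y1));
    int dx = abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
    int dy = -abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
    int err = dx + dy;
    for (;;) {
        fb->px[y0][x0] = c;
        if (x0 == x1 && y0 == y1) break;
        int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; }
        if (e2 <= dx) { err += dx; y0 += sy; }
    }
}

/* Special case for a horizontal run (what a 1-pixel-high filled rectangle
   amounts to): the pixels are contiguous, so one memset writes them all. */
void hline_fill(Framebuffer *fb, int x0, int x1, int y, unsigned char c)
{
    assert(in_bounds(x0, y) && in_bounds(x1, y));
    if (x0 > x1) { int t = x0; x0 = x1; x1 = t; }
    memset(&fb->px[y][x0], c, (size_t)(x1 - x0 + 1));
}
```

Drawing from x = 5 to x = 40 on row 10 with either function leaves identical framebuffers; the difference is only in how much work each does per pixel.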


BobR(Posted 2003) [#15]
Cool..!

That's the kind of thing I'm interested in finding out, and perhaps compiling into a list of comparative features like that.

If a game requires a lot of orthogonal lines, the time savings from actually using RECTs might be considerable.

Many of the differences are pretty intuitive- for example, ImagesOverlap() is faster than ImagesCollide() because it's obvious the first doesn't compare ALL the pixels in the images.

So if you don't need perfect collision checking you can use the faster function.

But using a RECT instead of a LINE for speed is something nobody is ever going to think to do- unless they can read about it somewhere... like a list of speed comparisons.


Andy- I'm sorry you're not understanding what I'm saying. None of what you said has any bearing on whether or not one function runs faster than another function on ANY computer.


soja(Posted 2003) [#16]
>Andy- I'm sorry you're not understanding what I'm saying. None of what you said has any bearing on whether or not one function runs faster than another function on ANY computer.

Allow me to interject. I think Andy understands exactly what you're saying. OK, one function may indeed run faster than another function. Yes. His point is that the difference between the two execution speeds does not scale 1:1 with CPU frequency. You do have to take into account how the CPU works. Remember that Hz (MHz, GHz) measures clock frequency, not speed, though many treat it as such.

I do not understand what you say here:

>Any hardware differences between CPU brands or core technology only affects the overall processing speed of the hardware.

>The relative speed differences between the software functions would remain constant.


There's no such thing as a "software" function, really. Software runs on hardware, and it is calculated by SOME processor somewhere. In this case, either the GPU or CPU. So what he said seems to apply. The only place where I see that something "runs in software" is where a differentiation is being made for graphics functions being computed by the CPU ("software") as opposed to the GPU ("hardware-accelerated").


WolRon(Posted 2003) [#17]
>But using a RECT instead of a LINE for speed is something nobody is ever going to think to do- unless they can read about it somewhere... like a list of speed comparisons

Umm..., I did.


WolRon(Posted 2003) [#18]
Posted by Andy:
>>Has anyone ever compiled a list of these speed
>>comparisons..?
>
>That would be rather pointless, since much of it would be dependent on the CPU.

So, basically my RECT for LINE optimization is hardware dependent?

Yeah right.


BobR(Posted 2003) [#19]
OK, first- anything external to the CPU will have NO effect on the relative execution speed of two software functions like ABS() and IF/THEN comparisons.


There MAY be a slight effect on processing speed if a specific CPU core implements the way it processes a certain operation differently than some other CPU. Not likely, and any effect would be small relative to the overall difference in speeds between two software functions.

For example, if the Athlon Thoroughbred core processes bitwise comparisons more efficiently than a PIII/Celeron, then MAYBE the relative speed between ABS() and IF/THEN MIGHT be affected. Slightly. Not likely.


If Blitz Basic could run on an 80286 CPU at 10Mhz, it would run very slowly-

But- ABS() would STILL run more slowly than IF/THEN.



>There's no such thing as a "software" function, really.


Umm.. not sure how to respond to that.


Hardware by itself can't do anything. With the exception of micro-coded low-level activity, hardware is "brain dead" without software instructions to tell it what to do.


What I'm trying to say here is that the same string of software commands (a software "function") will execute approximately the same no matter what CPU you run it on.

A faster CPU will execute them more quickly, but the operations will be pretty much the same.

(We're assuming the CPUs are compatible here. We're not comparing Apples and IBMs for example.)


So two software functions will run at approximately the same relative ratio on one CPU as they do on any different CPU.

The actual execution speeds may vary, but the relative ratio of the execution time will remain constant.


If ABS() runs slower than IF/THEN on one CPU, it will run slower on ANY CPU.


Thus, if I was writing a game that I wanted to sell to the largest possible market, I would use IF/THEN instead of ABS(). That would help guarantee it would run at the best possible speed on ANY computer.

I would also use Subroutines instead of Functions and ImagesOverlap instead of ImagesCollide (as much as possible), along with any other similar speed comparison "tricks" we can come up with, like RECTs instead of orthogonal LINEs.

The point is not that ONE such "trick" would make much of a difference, but that ALL the "tricks" we can find MIGHT make enough of a difference to matter.


BobR(Posted 2003) [#20]
WolRon - you mean you just tried RECT instead of using LINE one day..?

That's pretty good..! I know I never would have thought of trying that.

That's the kind of thing I think we need to compile into a reference on speed comparisons of different ways of accomplishing the same things.

As people discover stuff like this, it would be good to keep it all together so we don't all have to keep re-inventing the wheel.


Andy(Posted 2003) [#21]
>Andy- I'm sorry you're not understanding what I'm saying.
>None of what you said has any bearing on whether or not
>one function runs faster than another function on ANY
>computer.

I do understand, and from what you have said 3 times, you're wrong. I guarantee you'll look at things differently if you try the same hardware except for the CPU... Use an Intel P3 900MHz and a VIA C3 900MHz, and do the calcs.

soja actually has a point with the graphics being handled by the GPU... I only dealt with the logic and number-crunching done by the CPU previously, but the GPU adds more variables.

>So, basically my RECT for LINE optimization is hardware
>dependent?
>Yeah right.

What did you expect it to be dependent on? Voodoo? Black Magick? Witchcraft? Government spending? Astrology?

Yes, the speed of the individual commands in B3D can vary, and their speed in relation to other commands certainly varies across different hardware.

Andy


BobR(Posted 2003) [#22]
>Yes, the speed of the individual commands in B3D can vary
>and their speed in relation to other commands certainly
>vary across different hardware.


Not the things we were talking about earlier.

A P3 vs VIA C3 would make NO discernible difference to the ratio between execution times of two internal Blitz functions.

GPU, FSB, Cache, drivers, none of that external stuff has ANY bearing on the execution speed ratio between two functions like we're discussing here.


Things that draw to the screen probably WOULD be affected by the video card's hardware, so possibly a video card that can handle primitives like lines and rectangles MIGHT have an effect on the speed of those particular commands.


But for the kinds of things that started this thread, Blitz commands that run entirely inside the CPU, there should be no effect on the relative efficiency of the commands on different CPUs.


Perhaps you can explain how a video card or driver can affect how fast ABS() runs..?


Foppy(Posted 2003) [#23]
>Umm..., I did.
I did that too (using rect instead of line)! It was in an attempt at a Pole Position style racing game. I had to draw the road horizontal line by horizontal line (decreasing the length of the line as it got closer to the horizon to get the 3D effect). First I used lines. Then I switched to rects. Apart from the better performance I did this because now I could choose a different "resolution" (for even more speed): suppose I use rects of height 2, that means I have to draw only half the number of rects! The road side gets chunkier but who cares, it's the speed that counts. Anyway I gave up when objects started to float into the air due to a mistake somewhere in my pseudo-3D code. ;)


soja(Posted 2003) [#24]
Hi Bob,

Call me confused. It sounds to me as if you're saying there are calculations ("software functions") which do not rely on any processing unit (like the CPU) to be computed. (?!)

All software, including the OS, must run on hardware. All functions (even the simplest, including ABS) must be computed by some processor (usually the CPU). So tell me why or how any function's computation wouldn't rely on the hardware that was computing it...?

One interesting point of comparison is the Athlons and Pentiums of the last couple of years. In general, the Athlon has run at a much lower clock speed than a comparable Pentium and yet performs at approximately the same level (e.g. a 1533 MHz Athlon vs. a 2000 MHz Pentium). This alone should reveal that many factors besides clock speed affect the overall processing performance of a CPU. For those interested, there are some nice CPU pipeline tutorials to be found on Google (search for "CPU Pipeline").

WolRon:
It's pretty creative to imagine that Rect might work faster than Line for certain cases. It's counter-intuitive at first, but begins to make sense after some thought. It makes one want to think creatively about other functions too. Cool.


BobR(Posted 2003) [#25]
OK- I'm going to make up some numbers to illustrate what I've been saying. Don't take them as anything but illustrations. I haven't had a chance to code up any "benchmarks" to do actual timing, but this should help explain things:


Suppose you were to write a loop that takes a Random number and does nothing but do ABS(number) on it. Run the loop 10,000 times and time how long it takes to run.


Then write a loop that takes a Random number and does

IF number <0 THEN number = -number ; (Does the same thing as ABS() does)

and run it 10,000 times, timing how long THAT takes to run.


When you run the loops, you'll find out how long it takes to run ABS() 10,000 times, compared to running the IF/THEN line 10,000 times, on WHATEVER computer you're running them on.

The loop that takes less time to run will show which method of finding the Absolute Value of a number is faster.
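The experiment described above translates directly into C. This is a sketch of the measuring technique, not a Blitz benchmark: a C compiler may turn both versions into identical machine code, so its numbers say nothing about Blitz's Abs(). One deliberate change from the description: the random numbers are generated before the clock starts, because otherwise the loop would mostly be timing the random-number generator. The `repeats` parameter is an addition so that the total run time is well above the resolution of `clock()`.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Absolute value via the library call. */
static int abs_lib(int x) { return abs(x); }

/* Absolute value via an explicit test, like "If a%<0 Then a%=-a%". */
static int abs_if(int x) { if (x < 0) x = -x; return x; }

typedef struct {
    double seconds_lib, seconds_if;  /* CPU time for each version */
    long long sum_lib, sum_if;       /* results are used, so the loops can't be skipped */
} AbsTiming;

/* Times both versions over the same pre-generated random numbers.
   Values lie in [-RAND_MAX/2, RAND_MAX - RAND_MAX/2], so INT_MIN never occurs. */
AbsTiming time_abs_versions(int n, int repeats, unsigned seed)
{
    assert(n > 0 && repeats > 0);
    AbsTiming t = {0};
    int *data = malloc((size_t)n * sizeof *data);
    if (data == NULL) return t;      /* out of memory: report zero timings */

    srand(seed);
    for (int i = 0; i < n; i++)      /* random numbers made before timing starts */
        data[i] = rand() - RAND_MAX / 2;

    clock_t start = clock();
    for (int r = 0; r < repeats; r++)
        for (int i = 0; i < n; i++)
            t.sum_lib += abs_lib(data[i]);
    t.seconds_lib = (double)(clock() - start) / CLOCKS_PER_SEC;

    start = clock();
    for (int r = 0; r < repeats; r++)
        for (int i = 0; i < n; i++)
            t.sum_if += abs_if(data[i]);
    t.seconds_if = (double)(clock() - start) / CLOCKS_PER_SEC;

    free(data);
    return t;
}
```

Call it as, say, `time_abs_versions(10000, 1000, 42u)` and compare `seconds_lib` with `seconds_if`. The two sums must match, which confirms both versions computed the same thing.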


Now take the program to a different computer. One with a far different CPU, a different video card, different RAM, completely different. (Still IBM compatible though.)


Now run the loops again. The time it takes to run 10,000 ABS() on the new computer WILL be different.


This is what you've been saying. A different computer WILL run the code faster or slower than the first one. No problem.


But- what I've been saying is that the second computer will run the OTHER loop, the IF/THEN loop at a different speed too... but they will BOTH be in the SAME RELATIONSHIP as they were on the first computer.


Let me make up some numbers to illustrate this:

Suppose the first computer was a PIII with a so-so video card and 256MB of RAM. Nothing special.


Loop 1 (the ABS function) takes 8 seconds to run 10,000 times.
Loop 2 (the IF/THEN statement) takes 4 seconds to run 10,000 times.


So the first loop takes twice as long as the second loop.

(That means that doing an IF/THEN test is twice as fast as doing an ABS() to get the Absolute Value of a number. Both get exactly the same results, but one method is faster.)


Now go to your friend's "super bleeding edge" game computer: an Athlon XP 2800+ with a top-end Radeon video card, a gig of RAM and a 333 MHz FSB. Run both loops on that computer.


Loop 1 takes 3 seconds to run on this computer.
Loop 2 takes 1.5 seconds to run on this computer.


You see, you're right- the other computer runs the loops faster.


But notice that Loop 2 is STILL TWICE as fast as Loop 1.


Now take the two loops to your grandmother's house and run them on her old P150 computer that she uses to send e-mail with. No special video card and 32MB of RAM. Pretty pathetic.


Loop 1 takes 20 seconds to run on this computer.
Loop 2 takes 10 seconds to run on this computer.


Again, you're right- the slower computer runs the loops slower.

But again- notice that Loop 2 is STILL TWICE as fast as Loop 1.


This is what I've been trying to say all along. YES, different hardware runs code at different speeds. YES, the hardware makes a difference how fast code will run.


BUT- code which runs slower than other code on one machine will STILL run slower than that other code on a different machine if it runs without using any external devices.

Even though BOTH sets of code run faster than when they were on the other computer.

The KIND of hardware won't change that.

Loop 2 will ALWAYS be faster than Loop 1 no matter WHAT hardware you run them both on.


The whole point of this whole thing is that if we can find out more of these speed comparisons between equivalent code, then we could use the faster code to make programs run faster on ANY computer than they would with the other (slower) code.



To answer your question:

The code that does both the ABS() and the IF/THEN runs in the CPU chip of the computer.

These are probably good choices of code to use for this discussion because NOTHING ELSE will affect how fast they run on any particular computer.


The machine code that performs the ABS() runs entirely inside the CPU. It doesn't use the video card and it doesn't need to use external RAM or the FrontSide Bus. Nothing outside the CPU will affect how fast it completes its task.


The only real factors affecting it are the speed of the CPU and the efficiency of the coding. (That's up to Mark Sibly in the case of Blitz Basic.)


So when you execute ABS() and then the IF/THEN statement, the speed of the CPU chip will be the same (since you're running them both on the same machine), and it won't affect the difference between the two methods of taking the Absolute Value of a number.

The ONLY difference between how fast they run will be the efficiency of the code Mark Sibly wrote.


THAT'S what we're actually trying to get to here- what functions or commands are coded "better" than other equivalent functions or commands.



Now- the LINE versus RECT issue is different.


In that case more of what Andy has said comes into play.

The CPU will have to do calculations on where the points are located, but it will have to involve the video card into actually displaying the line or rectangle.


IF the video card can draw rectangles or lines by itself, then all the CPU has to do is tell it where the starting and ending points are, and the video card hardware will do the rest.

This is very fast and lets the CPU go on and calculate other things while the video card is drawing and displaying the lines and rectangles.


BUT- if the video card can't draw the lines and rectangles by itself (older video cards didn't have as much functionality as newer video cards do these days), then the CPU will have to calculate EVERY point on the line or rectangle and send it to the video card, pixel by pixel.

All the video card would do in this case would be to display the pixels on the screen after the CPU has told it where they need to be to form a picture of a line or rectangle.


So in this case, Andy is right- the different hardware may treat LINE and RECT differently, so maybe one video card would draw lines faster, and another might draw rectangles faster, and we wouldn't be able to know which one to use to make our programs run faster on whatever video card the user owns.


The only way to tell for sure is to actually run tests like I talked about back at the top of this, on a lot of different computers and list the results for people to read and try on their own computers.


And maybe Shambler is right- maybe it would turn out that the differences between equivalent methods of doing the same things aren't enough to bother worrying about.

But it would be nice to know for certain.


BobR(Posted 2003) [#26]
Oh- before anyone says anything... none of this is intended to be critical of Mark, or his programming of Blitz Basic.

EVERY computer language has things like this, and programmers have always taken advantage of them to make their programs just a little bit faster or better or whatever.

It's called "Undocumented Features" and it's sort of "beyond the fringe".


One other thing-

Using "features" like this can be dangerous too.

There's no guarantee that a "trick" like using IF/THEN instead of ABS() will be faster in the NEXT version of Blitz.

Mark might make a change which doesn't affect the functionality of the language, but DOES make ABS() every bit as fast, or even faster than any alternatives.

If that happened, a program which relied on this particular "Undocumented Feature" would end up actually being SLOWER than it could have been.

So you have to be careful about what "tricks" you use...


FlameDuck(Posted 2003) [#27]
>It's good to know about ABS() being slower than an explicit comparison.
Yep. Except it isn't. It depends on what kind of numbers you have, and what CPU you have. Andy is correct.

Consider the following: your ABS() statement is compiled into one, maybe two, calls to the ALU/FPU. The IF statement is compiled into at least five instructions, probably more. Now, the conditional branch (IF) will only affect some of the numbers, namely the negative ones, so depending on your numbers this will result in executing a fast compare (one that normally fails). Let's say, for argument's sake, that the cost is 100 clock cycles for the low-cost IF path, 300 for the high-cost IF path and 200 for the ABS. Obviously executing the low-cost IF path 10K times is going to be faster than ABS, but ABS will still be faster than the high-cost IF path. Now run the same program on a different CPU with updated microcode that greatly reduces the cost of ABS() so that it takes only 50 clock cycles, and the ratio will be all wrong. You cannot assume that instructions will execute at the same relative speed on different CPUs, even compatible ones.
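FlameDuck's for-argument's-sake cycle counts can be turned into a break-even calculation. Let $p$ be the fraction of inputs that are negative (the ones that take the expensive IF path). With 100 and 300 cycles for the two IF paths, 200 cycles for ABS on the first CPU and 50 cycles for ABS on the second:

```latex
\begin{aligned}
C_{\text{If}}(p) &= 100\,(1-p) + 300\,p = 100 + 200\,p \\
C_{\text{If}}(p) < C_{\text{Abs}} = 200 &\iff p < \tfrac{1}{2} \\
C_{\text{If}}(p) \ge 100 > 50 = C_{\text{Abs}}' &\quad \text{for every } p \in [0, 1]
\end{aligned}
```

So with these made-up costs, the IF version wins only when fewer than half the numbers are negative, and once ABS drops to 50 cycles it never wins. A test on random signs ($p \approx 1/2$) would show the two as roughly equal on the first CPU.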

Also, tests like the "10000 loops" are seriously flawed: they are prone to memory alignment issues, cache sizes (a small test loop will easily fit into a cache, your main game loop probably won't) and memory hits. Because of the genius decision to only use 4 CPU registers, Intel and compatible processors must access memory quite frequently. This means that memory speed and bandwidth quickly become an issue. While your test loop will probably never have to wait for data (as it doesn't use more variables than can comfortably fit in 4 registers), your main game loop will, and your previous optimization may be cancelled out by the memory bottleneck. This is also why you should write logic first, and optimize later.
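The cache point can be probed with a variation of the test loop: do the same per-element work the same number of times, once over a small array that fits in cache and once over one that does not. A sketch in C; the array sizes are up to the caller, and suitable values depend on the CPU's cache sizes (a few thousand ints versus tens of millions is a reasonable starting assumption).

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Visits `total` elements by sweeping repeatedly over data[0..len-1],
   taking each absolute value with an explicit test. Returns nanoseconds
   of CPU time per element; the sum goes to *checksum so the work is kept. */
double ns_per_element(const int *data, size_t len, size_t total, long long *checksum)
{
    assert(data != NULL && len > 0 && total > 0 && checksum != NULL);
    long long sum = 0;
    size_t done = 0;
    clock_t start = clock();
    while (done < total) {
        size_t chunk = total - done < len ? total - done : len;
        for (size_t i = 0; i < chunk; i++) {
            int a = data[i];
            if (a < 0) a = -a;
            sum += a;
        }
        done += chunk;
    }
    double secs = (double)(clock() - start) / CLOCKS_PER_SEC;
    *checksum = sum;
    return secs * 1e9 / (double)total;
}
```

Call it with the small and the large array and the same `total`. The large array usually costs more per element because its data has to come from main memory, but hardware prefetching can hide much of that for a simple front-to-back sweep, so the size of the gap varies from CPU to CPU.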
>So, basically my RECT for LINE optimization is hardware dependent?
Of course it is. Rect uses the hardware-accelerated functionality of DirectX, whereas Line, AFAIK, uses an assembly-optimized software function. If Rect is faster on your machine, then it's because your GPU is more powerful than your CPU. Try it on a laptop or server configuration instead. Of course filling a rectangle will by default be faster, because you can assume block copies, but if you have a particularly bad gfx card, the difference will be negligible.

And that's really the problem with optimization these days. Unless you're calling a function 1000 times or more a frame, there really isn't much point in optimizing it.
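To put "1000 times a frame" in perspective, here is a rough budget calculation, assuming 60 frames per second and a purely hypothetical saving of 50 ns per call. The per-pixel case is the one Curtastic mentions earlier in the thread:

```latex
\begin{aligned}
\text{Frame budget at 60 fps:}\quad & \tfrac{1000\,\text{ms}}{60} \approx 16.7\,\text{ms} \\
1000 \text{ calls} \times 50\,\text{ns} &= 0.05\,\text{ms} \approx 0.3\% \text{ of the frame} \\
640 \times 480 = 307{,}200 \text{ calls} \times 50\,\text{ns} &\approx 15.4\,\text{ms} \approx 92\% \text{ of the frame}
\end{aligned}
```

A saving that is invisible at a thousand calls per frame can decide whether a per-pixel loop fits in the frame at all.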