Whats so slow here? (Triangle fill routine)
Blitz3D Forums/Blitz3D Programming/Whats so slow here? (Triangle fill routine)
| ||
hi! i made a triangle fill routine here. i've seen so many fill routines which were so fast... why is mine so slow? is there any visible mistake? i mean - i've seen this sample below running at 100 fps, and now it's only 20 fps. where is the giant speed leak? i know, everything i'm programming is slow :(... thank you for any help! |
| ||
Well, if you loaded the texture, you could load it into VRAM instead. |
| ||
vram? this is a software-rendered polygon. without textures, it is just high-end performance. the only difference with textures is that i have to interpolate UV coordinates AND XY coordinates... |
| ||
Well, I would use the VRAM to do these operations, simply because VRAM is usually about 10X faster than regular RAM. |
| ||
what? vram? but i don't have a Graphics3D mode on! this is 2d. how would <you> use vram in this example? |
| ||
The code doesn't even run here. I get an array out of bounds error in debug mode. Also, I advise you to specify the destination buffer with WritePixelFast, otherwise I don't think it'll work on my machine. |
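A minimal sketch of the advice above, assuming a plain 2D `Graphics` mode: the target buffer is locked once, stored in a variable, and passed explicitly as the last argument of `WritePixelFast`. The window size, coordinates, and colour are made up for illustration and are not taken from the original code.

```blitzbasic
; Sketch only: explicit destination buffer for WritePixelFast.
Graphics 640, 480, 32, 2
SetBuffer BackBuffer()

; Keep the buffer handle in a plain variable (hypothetical name).
buf = BackBuffer()

; WritePixelFast requires the buffer to be locked first.
LockBuffer buf
For y = 100 To 199
	For x = 100 To 199
		; No clipping is done here, so x and y must stay inside the buffer.
		WritePixelFast x, y, $FFFF8000, buf
	Next
Next
UnlockBuffer buf

Flip
WaitKey
End
```

Since `WritePixelFast` performs no bounds checking, keeping the coordinates on screen is the caller's job.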
| ||
Instead of using a Dim, work with a bank. Try to avoid code that looks like "reference(reference(reference))" in big loops. If possible, resolve the reference once and keep it in a simple reference% variable, such as the ImageBuffer(img), etc. That alone should yield some speed boost.

This piece of code will take a nice chunk of CPU:

    For x = p1x To p2x
        u = vgliIntp(x, p1x, p2x, p1u#, p2u#)
        v = vgliIntp(x, p1x, p2x, p1v#, p2v#)
        WritePixelFast x, y, Tex(u, v)
    Next

If you use a bank for your u v map, it will be faster. Use poke to fill it in the beginning, and peek to read the u v entry. Access the position of the u v data using a displacement value, calculated with (v * 16 + u) * 4. Multiply by 4 because each entry is a 4-byte Int.

    disp% = (v * 16 + u) * 4
    texture_RGBA% = PeekInt (uv_table%, disp%)

or, in your case, a more fancy code:

    disp% = (vgliIntp(x, p1x, p2x, p1v#, p2v#) * 16 + vgliIntp(x, p1x, p2x, p1u#, p2u#) ) * 4
    texture_RGBA% = PeekInt (uv_table%, disp%)

Note that multiplying by 16 is actually for jumping to the next line of your texture. If the texture is 32 pixels wide, then this value should be 32, or a variable containing the texture width.

Try to have a version of vgliIntp that returns an Int value. Avoid divisions, as they are slower than multiplications. It is preferable to multiply by 0.1 than to divide by 10 for gaining speed in calculating.

*** NOTE: And, while I'm thinking about your case, why not just peek at the imagebuffer?

Cheers. |
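The bank idea can be sketched roughly as follows. This is an illustration under assumptions, not the original poster's code: the 16x16 texture is generated on the fly instead of loaded, and the names `texBank`, `texBuf`, `TEX_W` and `TEX_H` are hypothetical. It copies the image into a bank once with `ReadPixelFast` and `PokeInt`, then reads texels back with `PeekInt` at the displacement `(v * width + u) * 4`.

```blitzbasic
; Sketch only: cache a small texture in a bank and read it with PeekInt.
Const TEX_W = 16
Const TEX_H = 16

Graphics 640, 480, 32, 2

; Hypothetical stand-in texture (the thread used its own image).
img = CreateImage(TEX_W, TEX_H)
SetBuffer ImageBuffer(img)
Color 255, 128, 0
Rect 0, 0, TEX_W, TEX_H, True
Color 0, 64, 255
Rect 0, 0, 8, 8, True

; Resolve the buffer handle once instead of calling ImageBuffer(img) per pixel.
texBuf = ImageBuffer(img)
texBank = CreateBank(TEX_W * TEX_H * 4)	; 4 bytes (one Int) per texel

LockBuffer texBuf
For v = 0 To TEX_H - 1
	For u = 0 To TEX_W - 1
		PokeInt texBank, (v * TEX_W + u) * 4, ReadPixelFast(u, v, texBuf)
	Next
Next
UnlockBuffer texBuf

; Draw the texture 4x enlarged, reading texels from the bank.
SetBuffer BackBuffer()
backBuf = BackBuffer()
LockBuffer backBuf
For y = 0 To 63
	For x = 0 To 63
		disp = ((y / 4) * TEX_W + (x / 4)) * 4
		WritePixelFast 100 + x, 100 + y, PeekInt(texBank, disp), backBuf
	Next
Next
UnlockBuffer backBuf

Flip
WaitKey
End
```

Whether this beats a `Dim` array depends on the rest of the loop; the next reply in the thread reports no gain from banks alone.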
| ||
hi! thanks for the fast reply! i replaced it using banks, but it ain't faster. it is as slow as before. :( |
| ||
But usually these types of functions are preferably done in assembly language. |
| ||
I think you'll have to write a DLL for this, Devils Child. *kotz* I've known this since the last slowdown on my own project :/ |
| ||
thank you for your help, i used shr and shl instead of floats and replaced the calls to speed-critical functions with their bodies (inlined them). thanks for the help :) |
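What "shr and shl instead of floats" might look like, as a hedged sketch: the texture coordinates are kept in 16.16 fixed point (value shifted left by 16), so the per-pixel interpolation needs only integer multiplies and shifts. The updated code is not preserved in the thread, so the function name `TexSpan`, its parameter list, and the checkerboard texture are assumptions. `Sar` is used instead of `Shr` so that a negative intermediate value would still shift correctly; for non-negative values the two give the same result.

```blitzbasic
; Sketch only: 16.16 fixed-point texture span, interpolation inlined.
Const TEX_W = 16
Const TEX_H = 16

Graphics 640, 480, 32, 2
SetBuffer BackBuffer()

; Hypothetical checkerboard texture, one Int (ARGB) per texel.
tex = CreateBank(TEX_W * TEX_H * 4)
For v = 0 To TEX_H - 1
	For u = 0 To TEX_W - 1
		If ((u Xor v) And 4) Then argb = $FFFFFFFF Else argb = $FF2040C0
		PokeInt tex, (v * TEX_W + u) * 4, argb
	Next
Next

buf = BackBuffer()
LockBuffer buf
For y = 100 To 163
	; u runs 0..15 across the span, v is constant per row in this demo.
	TexSpan(buf, tex, y, 100, 355, 0, 15, (y - 100) / 4, (y - 100) / 4)
Next
UnlockBuffer buf
Flip
WaitKey
End

Function TexSpan(buf, tex, y, p1x, p2x, p1u, p2u, p1v, p2v)
	Local x, u, v
	Local dx = p2x - p1x
	If dx < 1 Then dx = 1	; avoid division by zero on one-pixel spans

	; One integer division per span instead of float maths per pixel.
	Local iu = ((p2u - p1u) Shl 16) / dx
	Local iv = ((p2v - p1v) Shl 16) / dx
	Local fu = p1u Shl 16
	Local fv = p1v Shl 16

	For x = p1x To p2x
		u = (fu + (x - p1x) * iu) Sar 16
		v = (fv + (x - p1x) * iv) Sar 16
		WritePixelFast x, y, PeekInt(tex, (v * TEX_W + u) * 4), buf
	Next
End Function
```

The 16 fractional bits leave 15 bits for the whole part of a signed 32-bit Int, which is plenty for small textures like this one.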
| ||
So, is it faster? ;P

EDIT: tested on my gear. You went from 20-30 FPS to 85-130 FPS when fully extended. Good! It's my belief that a good / tight / efficient programming scheme in any language can beat a poor, inefficient programming scheme in Assembly / machine language.

Oh, and you could also do this, but it becomes quite unreadable, for a gain of about 8 fps (on my system):

    WritePixelFast x, y, PeekInt(Tex, (((p1v + (x - p1x) * iv) Shr 16) * 16 + ((p1u + (x - p1x) * iu) Shr 16)) * 4)

Cheers. |
| ||
I was playing around with the array-based approach and trying to minimise work in the WritePixelFast loop. The indexing is not really working properly yet, so I have increased the array size to stop it throwing hissy fits in debug mode. It gives a few extra fps anyway. |
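One common way to reduce work inside the pixel loop, shown here as an assumption rather than as this poster's method (their code is not preserved): replace the per-pixel multiply `(x - p1x) * iu` with running accumulators that are advanced by one addition per pixel. The function name `TexSpanStep` and its parameters are hypothetical; `tex` is assumed to be a bank holding one Int per texel, `texW` the texture width, and `buf` an already locked buffer.

```blitzbasic
; Sketch only: incremental 16.16 stepping, no multiply by (x - p1x) per pixel.
Function TexSpanStep(buf, tex, texW, y, p1x, p2x, p1u, p2u, p1v, p2v)
	Local x
	Local dx = p2x - p1x
	If dx < 1 Then dx = 1	; avoid division by zero on one-pixel spans

	; Per-pixel steps in 16.16 fixed point, computed once per span.
	Local iu = ((p2u - p1u) Shl 16) / dx
	Local iv = ((p2v - p1v) Shl 16) / dx

	; Running texture coordinates, also 16.16.
	Local fu = p1u Shl 16
	Local fv = p1v Shl 16

	For x = p1x To p2x
		WritePixelFast x, y, PeekInt(tex, ((fv Sar 16) * texW + (fu Sar 16)) * 4), buf
		fu = fu + iu
		fv = fv + iv
	Next
End Function
```

A call might look like `TexSpanStep(buf, tex, 16, y, x1, x2, 0, 15, 0, 15)` between `LockBuffer buf` and `UnlockBuffer buf`. Any speed gain would need measuring on the target machine.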
| ||
hey nice speedup, thanks a lot :) |
| ||
Yep, this bumps the speed up by a solid 20% from the previous version (considering the same coordinates).