Can someone help me understand particle systems?
| ||
Is a particle system any more than drawing a lot of sprites with some form of randomized, yet controlled movement? Or is there more to it, such as the drawing being done differently so as not to affect rendering times? Or is the math applied to the movement of the particles done in a way that makes it very fast? |
| ||
I'll give it a shot. Particles are basically objects in a collection. The collection is the particle emitter, and the objects are the individual particles. Each particle knows things like its own direction, speed, gravity etc. The emitter creates these objects and sends them on their way. Each cycle of your game tells the particles how much time has passed and that they need to update their positions. Then, when your graphics draw routine runs, it calls the emitter, which in turn calls the Draw command of each of its particles. As particles exceed their boundaries or die off, they remove themselves from the emitter's collection. The emitter can also use randomization when sending the particles on their way, which creates the random-looking effect. I'm sure I've missed something, which others can fill in, but that's the gist. |
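The emitter/particle relationship described above can be sketched roughly like this. It is a minimal Python illustration, not Monkey code; the class names, the velocity and lifetime ranges, and the `draw_sprite` callback are all assumptions made up for the example.

```python
import random


class Particle:
    """One particle: position, velocity, and remaining lifetime in seconds."""

    def __init__(self, x, y, vx, vy, life):
        self.x, self.y = x, y
        self.vx, self.vy = vx, vy
        self.life = life

    def update(self, dt, gravity):
        # Advance the particle by dt seconds under constant gravity.
        self.vy += gravity * dt
        self.x += self.vx * dt
        self.y += self.vy * dt
        self.life -= dt

    @property
    def alive(self):
        return self.life > 0


class Emitter:
    """Owns a collection of particles and drives their update and draw calls."""

    def __init__(self, x, y, gravity=98.0, seed=None):
        self.x, self.y = x, y
        self.gravity = gravity
        self.rng = random.Random(seed)
        self.particles = []

    def emit(self, count):
        # Randomize velocity and lifetime within fixed (arbitrary) ranges.
        for _ in range(count):
            vx = self.rng.uniform(-50.0, 50.0)
            vy = self.rng.uniform(-120.0, -60.0)
            life = self.rng.uniform(0.5, 1.5)
            self.particles.append(Particle(self.x, self.y, vx, vy, life))

    def update(self, dt):
        for p in self.particles:
            p.update(dt, self.gravity)
        # Drop particles whose lifetime has run out.
        self.particles = [p for p in self.particles if p.alive]

    def draw(self, draw_sprite):
        # draw_sprite is a hypothetical callback standing in for the real renderer.
        for p in self.particles:
            draw_sprite(p.x, p.y)
```

In a game loop you would call `emit` when the effect starts, then `update(dt)` and `draw(...)` once per frame.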
| ||
Essentially, it is just a system that draws a lot of sprites with randomized yet controlled movement. However, there's more to it. First of all, like any system, it needs some sort of management. Particles are moved by a set of rules, and particles that share similar rules form a group, aka a particle system. As for rendering/updating, there are a few things that can be done:

1. Rendering. The first thing you can do is share a single texture for all your particles. That way, you can draw them all in one batch. Another thing you can do (that works for most, but not all particles) is to render them as point sprites. Point sprites are a feature of OpenGL (GLES too, and pretty sure DirectX too) that takes a single point sent from the application (a draw call with GL_POINTS) and turns it into a rectangle that you can then texture. Point sprites may not be supported by all devices, and for a small number of particles they may even be a little slower. They are, however, faster for lots of particles (thousands).

2. Updating. There's not much to optimize here at a high level. You pretty much have to compute everything for every particle. The only thing you can do is make sure you use arrays for your data (that way it's cache-friendly and you gain some speed). Maybe even interleaved arrays: instead of an array for positions, one for angles, etc., use one array for all the data, like this: [particle 1 pos, particle 1 color], [particle 2 pos, particle 2 color], etc.

Low-level optimization. If you really want fast particles, you can use SSE (PC) or NEON (iPhone) to do SIMD operations. SIMD = Single Instruction Multiple Data, and what it does is compute 4 floats per instruction instead of 1. That way you may gain up to 3 times the performance (4 times minus the overhead of using SIMD) on computation, but not on rendering.

From my experience, computation is nothing. Rendering is going to kill you. Many small particles = huge bandwidth traffic. Few large particles = huge pixel fillrate. Gotta find the balance between those.

Hope the wall of text helps :) |
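To make the interleaved layout concrete, here is a small Python sketch. The field set (x, y, vx, vy), the `STRIDE` constant, and the function names are assumptions for illustration only. Python won't show the cache benefit, but the indexing scheme is the same one you would use in a lower-level language.

```python
from array import array

# Hypothetical record layout: x, y, vx, vy stored back to back per particle.
STRIDE = 4
X, Y, VX, VY = range(STRIDE)


def make_particles(count):
    """Allocate one flat float array holding `count` interleaved records."""
    return array("f", [0.0] * (count * STRIDE))


def set_particle(data, i, x, y, vx, vy):
    """Write all fields of particle i into its record."""
    base = i * STRIDE
    data[base + X] = x
    data[base + Y] = y
    data[base + VX] = vx
    data[base + VY] = vy


def update(data, count, dt):
    """Advance every particle by dt seconds in a single pass over one array."""
    for i in range(count):
        base = i * STRIDE
        data[base + X] += data[base + VX] * dt
        data[base + Y] += data[base + VY] * dt
```

All the fields of particle `i` sit next to each other starting at index `i * STRIDE`, which is the point of interleaving.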
| ||
Awesome, thanks @Carman and @JIM !!! |
| ||
At Monkey's level of abstraction the major performance improvement from having a structured particle system is likely to be found in ensuring re-use of the particle objects rather than constant construction and GC activity. |
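One common way to get that kind of re-use is a fixed-size pool. The following is a rough Python sketch under that assumption; `PooledParticle`, `ParticlePool`, and the "skip the spawn when the pool is empty" policy are illustrative choices, not anything from Monkey itself.

```python
class PooledParticle:
    """Plain particle record; instances are created once and then recycled."""

    __slots__ = ("x", "y", "vx", "vy", "life")

    def __init__(self):
        self.x = self.y = self.vx = self.vy = 0.0
        self.life = 0.0


class ParticlePool:
    """Preallocates particles up front and recycles them instead of constructing new ones."""

    def __init__(self, capacity):
        self.free = [PooledParticle() for _ in range(capacity)]
        self.active = []

    def spawn(self, x, y, vx, vy, life):
        if not self.free:
            return None  # Pool exhausted: skip the spawn rather than allocate.
        p = self.free.pop()
        p.x, p.y, p.vx, p.vy, p.life = x, y, vx, vy, life
        self.active.append(p)
        return p

    def update(self, dt):
        i = 0
        while i < len(self.active):
            p = self.active[i]
            p.x += p.vx * dt
            p.y += p.vy * dt
            p.life -= dt
            if p.life <= 0.0:
                # Swap the dead particle with the last active one and recycle it.
                self.active[i] = self.active[-1]
                self.active.pop()
                self.free.append(p)
            else:
                i += 1
```

After the initial allocation, spawning and killing particles only moves references between the two lists, so no particle objects are constructed or left for the GC during play.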
| ||
"...ensuring re-use of the particle objects..."
This is why the Diddy particle system uses parallel arrays instead of particle objects. 20 arrays of 10000 elements will have less memory and performance overhead than 10000 objects with 20 fields. |
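For anyone who hasn't seen the parallel-array approach, here is a stripped-down Python sketch of the idea. This is not Diddy's actual code; the class name, the five fields, and the swap-with-last removal are assumptions chosen to keep the example short.

```python
from array import array


class ParticleArrays:
    """Parallel arrays: one array per field, indexed by particle slot."""

    def __init__(self, capacity):
        zeros = [0.0] * capacity
        self.x = array("f", zeros)
        self.y = array("f", zeros)
        self.vx = array("f", zeros)
        self.vy = array("f", zeros)
        self.life = array("f", zeros)
        self.capacity = capacity
        self.count = 0  # Slots [0, count) hold live particles.

    def spawn(self, x, y, vx, vy, life):
        if self.count == self.capacity:
            return False  # No free slot left.
        i = self.count
        self.x[i], self.y[i] = x, y
        self.vx[i], self.vy[i] = vx, vy
        self.life[i] = life
        self.count += 1
        return True

    def update(self, dt):
        i = 0
        while i < self.count:
            self.x[i] += self.vx[i] * dt
            self.y[i] += self.vy[i] * dt
            self.life[i] -= dt
            if self.life[i] <= 0.0:
                # Overwrite the dead slot with the last live particle.
                last = self.count - 1
                for field in (self.x, self.y, self.vx, self.vy, self.life):
                    field[i] = field[last]
                self.count -= 1
            else:
                i += 1
```

The arrays are allocated once at the chosen capacity, and a "particle" is just an index into them, so there are no per-particle objects at all.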
| ||
@Samah Out of curiosity, did you try interleaving the particle data? Having separate arrays means that for particle X you have position at an address, speed at address+1000, angle at address+2000 and so on. That is assuming memory allocation happened in the best possible way (one array right after the other).

Interleaved means that even with a smaller cache, a memory read would still fetch more meaningful data. Assuming the arrays hold floats, 20 x 1000 x 4 (4 bytes in a float) gives 80000 bytes for the particles. With a 64k L1 cache on the CPU, you will get a miss for every particle. With interleaved you get 2 reads for the entire system.

That 64k is a very large cache (desktop CPUs). On iPad/iPhone you get around 32-64 bytes per cache line. In the example above, you get precisely 20 cache misses per particle for parallel arrays and only 2-3 (depending on CPU) for interleaved. That's roughly 10x faster (on the memory fetching alone). If you actually manage to get your particle to fit in 64 bytes (reduce the 20 arrays to 16 = win) you get another 30% boost on the memory fetching part. |
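The arithmetic in this post can be checked with a few lines of Python. The function names are made up for the example, and the model is deliberately simple: it counts cache lines for one particle read in isolation, assumes each interleaved record starts on a line boundary, and assumes the parallel arrays are far enough apart that no two fields share a line.

```python
def lines_touched_interleaved(fields, bytes_per_field, line_size):
    """Cache lines spanned by one interleaved record starting on a line boundary."""
    record_bytes = fields * bytes_per_field
    return -(-record_bytes // line_size)  # Ceiling division.


def lines_touched_parallel(fields):
    """One cache line per field when a single particle is read in isolation."""
    return fields


# Total footprint from the post: 20 arrays x 1000 particles x 4-byte floats.
total_bytes = 20 * 1000 * 4

# 20 floats = 80 bytes per record.
lines_64 = lines_touched_interleaved(20, 4, 64)  # 64-byte cache lines
lines_32 = lines_touched_interleaved(20, 4, 32)  # 32-byte cache lines

# Trimming to 16 floats = 64 bytes makes a record fit in one 64-byte line.
lines_trimmed = lines_touched_interleaved(16, 4, 64)
```

This gives 2 lines per particle at 64 bytes and 3 at 32 bytes, versus 20 for the parallel layout, and 1 line once the record is trimmed to 16 floats. Keep in mind that a sequential sweep over parallel arrays reuses each fetched line for neighbouring particles, so the gap measured in a real update loop may be smaller than this per-particle count suggests.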
| ||
Updates in Diddy's particle system are plenty fast tbh - it's the rendering that's slow. That's not something I have control over anyway. |
| ||
Hi, Another approach is to do things parametrically - ie: recalc x/y's based only on a 'time' variable. Doesn't work well if you want the emitter to move (coz the particles all move with it) or if you want the particles to interact with anything (coz they can't really) but it can be useful in some situations. Quick example: |
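The example code from this post is not preserved here. As a stand-in, this is a minimal Python sketch of the parametric idea (not the original Monkey code); the function names, the speed range, and the gravity value are assumptions for illustration.

```python
import math
import random


def make_particle(rng):
    """Fix a particle's launch velocity once; it never changes afterwards."""
    angle = rng.uniform(0.0, 2.0 * math.pi)
    speed = rng.uniform(50.0, 150.0)
    return (math.cos(angle) * speed, math.sin(angle) * speed)


def position(origin, velocity, gravity, t):
    """Closed-form position at time t: p = p0 + v*t + 0.5*g*t^2."""
    ox, oy = origin
    vx, vy = velocity
    return (ox + vx * t, oy + vy * t + 0.5 * gravity * t * t)


rng = random.Random(42)
velocities = [make_particle(rng) for _ in range(100)]


def positions_at(t, origin=(0.0, 0.0), gravity=200.0):
    # No per-particle state is mutated: every frame is recomputed from t alone.
    return [position(origin, v, gravity, t) for v in velocities]
```

Because each position is a pure function of `t` and the emitter origin, there is no update step at all. It also shows the limitation mentioned above: change `origin` and every particle shifts with it.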
| ||
I don't even... |
| ||
@marksibly Whoa. Thanks for the code, it gives me exactly what I need to start experimenting. @Samah Yeah. I know what you mean. |
| ||
Interleaving arrays: I would have thought that, given an array of particle class instances, a compiler (or the Java JIT) would be smart enough to pack them together when allocating memory? Nonetheless, it looks like interleaving is the way to go, although it isn't mentioned here: http://developer.android.com/guide/practices/design/performance.html
"An array of ints is a much better than an array of Integers, but this also generalizes to the fact that two parallel arrays of ints are also a lot more efficient than an array of (int,int) objects. The same goes for any combination of primitive types." |
| ||
"I would think that creating an array of particle classes, that a compiler (or Java JIT) be smart enough to pack the array together when allocating memory?"
10000 instances of a Particle class means 10000 more objects on the heap. Using parallel arrays (or even one serialised array) means no mass pre-instantiation or heap/GC overhead. Also, an array of objects is only an array of heap references. The actual objects are stored elsewhere in memory and would not benefit from L1 cache. |