Crash on Mac and Windows in memcpy with GC?
BlitzMax Forums/BlitzMax Programming/Crash on mac and windows in memcpy with GC?
| ||
| There's been a gremlin in my code for a long time that I can't ignore any longer. When a memcpy occurs (most notably when converting a pixmap's format), I sometimes get a crash. This seems to happen only with a multithreaded compile: before I moved my project to MT I didn't have this problem, and nothing related to texture creation has changed. However, it doesn't just happen in threads; the crash happens on the primary thread as well. Here's a snippet of one example crash log:
Thread 0 Crashed:  Dispatch queue: com.apple.main-thread
0   libSystem.B.dylib   0xffff1250 __longcopy + 80
1   libSystem.B.dylib   0xffff0876 __memcpy + 214
2   libGLImage.dylib    0x93dcfc93 glgProcessPixelsWithProcessor + 725
3   GLEngine            0x1368cd0a gleTextureImagePut + 1433
4   GLEngine            0x1368a490 glTexImage2D_Exec + 1427
5   libGL.dylib         0x914c245f glTexImage2D + 87
...
That crash occurred while a texture was being generated from a pixmap. Similar crashes occur when converting a pixmap's format. It seems largely connected to the garbage collector: if I put a GCCollect right before the copy, it crashes more often. When I wrap the function that does the copy in GCSuspend and GCResume, it happens less often, but it doesn't stop completely (perhaps a collect is already running when the suspend is called, and it doesn't get interrupted?). I tried switching the garbage collector to manual, but then I started getting hangs... I'm rather confused and pretty much out of ideas. Any thoughts or suggestions? This also seems to happen most with pixmaps I get from Brucey's FreeImage mod, but I can't confirm that it's only those pixmaps (and once they're BlitzMax pixmaps, the source shouldn't matter anyway...) |
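Whether the parenthetical guess above is right depends on how suspending works. It is an assumption that GCSuspend only flips a flag; the thread never says how it is implemented. If it does work that way, a collection that is already running keeps going. If suspending instead takes the same lock the collector holds for a whole collection, the suspend has to wait for that collection to finish. Below is a minimal Python sketch of the lock-based version. It is a toy model, not BlitzMax code, and `ToyCollector` is a hypothetical class, not any real GC API:

```python
import threading
import time

class ToyCollector:
    """Hypothetical model of a collector; not the BlitzMax GC."""

    def __init__(self):
        self._lock = threading.Lock()    # held for the whole of a collection
        self.started = threading.Event()
        self.log = []

    def collect(self, duration=0.2):
        with self._lock:
            self.log.append("collect start")
            self.started.set()           # signal that a collection is under way
            time.sleep(duration)         # stand-in for a long mark/sweep
            self.log.append("collect end")

    def suspend(self):
        self._lock.acquire()             # blocks until any running collect finishes
        self.log.append("suspended")

    def resume(self):
        self.log.append("resumed")
        self._lock.release()

collector = ToyCollector()
worker = threading.Thread(target=collector.collect)
worker.start()
collector.started.wait()   # make sure the collection is already running
collector.suspend()        # returns only after "collect end" is logged
try:
    pass                   # the pixmap copy would go here
finally:
    collector.resume()     # always paired with suspend, even if the copy raises
worker.join()
print(collector.log)  # ['collect start', 'collect end', 'suspended', 'resumed']
```

The `try`/`finally` also guarantees that every suspend is matched by a resume, which matters for any suspend/resume pair.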
| ||
| Are you doing the memcpy in the main thread? (Just an idea; I don't know if it really affects anything.) |
| ||
| It doesn't matter where it happens, main or child. The crash above is specifically in the main thread (I figured things would be easier to manage there). |
| ||
| I would revert the project to single-threaded if possible... It seems like a rare MT bug. It would be best if you could post simplified code that reproduces it. |
| ||
| I'm working on finding the time to put together a simplified example, but I haven't managed it yet. It's especially difficult because it's not an every-time bug; it only happens when the stars align and the memory doesn't... Doing more testing on the PC, I can confirm it's exactly the same crash: specifically, it's in ConvertPixelsToStdFormat, called from ConvertPixels, called from Convert on a pixmap. Interestingly, when I run it on the Mac with the MT debugger in the new 1.40 release, I get an array out-of-bounds exception when it crashes. Combined with where it crashes (pixel.bmx, line 107), this appears to confirm my suspicion that under some circumstances the garbage collector (or something) moves a memory block while it's being copied. That throws the array out of whack and, boom, crash. Once again, this happens on the primary thread as well as on child threads in an MT app. I can't revert the project to single-threaded, as some things just aren't practical in a single thread and they're critical to my program. Specifically, that's background loading of pictures: a single large picture can take a long time to load, and I need to churn through LOTS of them while doing other things... I still suspect the garbage collector, since it's the most likely thing to be shuffling a block of memory around... I'll try to put together a simplified example and post it in the bug reports, but until then, if anyone has any ideas, I'd love to give them a shot... |
| ||
| I would also suspect the GC... is it also possible that some gfx memory is being GC'd causing the GL memcopy to crash on occasion? I vaguely remember there were some issues with the GC and OGL in places... can't remember if they were fixed - or if a particular fix has a knock-on effect. |
| ||
| I think the OGL connection is likely just random, as I get the same crash with a straight TPixmap conversion or copy. In the posted crash it just happens to be copying the memory to OpenGL rather than to another pixmap. That said, I'd be interested in the GC/OGL connection, as perhaps there's something to be gleaned from it... |
| ||
Here's a little sample, it's not exactly the same crash I'm seeing, but I think it's probably the same root cause... This is crashing on my mac as soon as I launch it.
SuperStrict
Function ConvertPicture:Object(in:Object) ' function to be spawned in a child thread
Local pixm:TPixmap = LoadPixmap("sample.jpg") ' load a pixmap, the larger the picture the better
If(pixm.format = PF_RGBA8888) Then Print "already PF_RGBA8888"
Local anotherpixm:tpixmap = pixm.Convert(PF_RGBA8888) ' do the format conversion, crash could happen in here...
Local yetanotherpixm:TPixmap = anotherpixm.copy() ' do a copy, this could also crash. This uses up more memory for yet more cleanup
Return yetanotherpixm ' return value to let it be stored in ram for a bit
End Function
Local onConversion:Int = 1
Local convertThread:TThread = CreateThread(ConvertPicture, Null)
Print "Starting first conversion"
' loop forever; there is no exit condition, so stop the program manually
While(True) ' repeat forever
Local aPixm:object = ConvertPicture(Null) ' do a copy on the main thread as well for some memory retention and more ram thrashing
Print GCMemAlloced() + " collected " + GCCollect() ' thrash the garbage collector to try to provoke a crash
If(Not ThreadRunning(convertThread)) ' if the thread is done
convertThread = CreateThread(ConvertPicture, Null) ' start it again
onConversion:+1
Print "conversion " + onConversion
End if
Wend
Specifically, it crashes when the main thread also loads the picture; without that call it ran for a while without incident. I'll comment it out and let it run longer to see if I can get the exact same crash. |
| ||
| I had a power failure, which set the testing back a bit. After recovering, if I run with the main thread's ConvertPicture and GCCollect calls removed, it crashes right away in debug mode... The main thread is doing a GCResume for some reason, and the child is creating a new pixmap... However, in non-debug mode it seems to run just fine... Still very confusing. Update: if you call GCCollect too fast, a mutex that blocks simultaneous GCCollect calls seems to get stuck, and the app just idles... There's definitely something wacky going on with the garbage collector in MT. |
| ||
| With the debugger enabled, I get a recursive GC collect that seems to lock up the memory system. It doesn't happen with debug off... There are definitely some issues with the MT garbage collector. |
| ||
| I've opened a bug report thread at http://www.blitzbasic.com/Community/posts.php?topic=91117. I'm hoping it gets some exposure to someone more intimately familiar with the threading and GC systems, because from my perspective, at least, they're turning into quite a rat's nest as I dig in. I'm still desperate for any ideas or suggestions of things to try. I'm also curious: can anyone else reproduce the crashing or hanging with the sample, in debug or regular mode? At this point I just want to know whether I've gone totally insane or just partially. |
| ||
| I'm successfully using MT in my applications and may be able to help. To me, the sample you provided seems oddly formed and not a very good real-world example. For example, your "thread" is continually called like a function, which doesn't really give you much advantage from using a thread. I also find it odd that both your thread and your main thread are constantly calling the same block of code, which again is not a very realistic scenario. I'd be interested in seeing a better example that more closely resembles what is happening in your real application. On a side note, I have noticed some odd crashes with MT in cases where the thread's lifetime was very short, or where a mutex was held for an extremely short time. Maybe try putting a small delay of 20ms or so at the end of the thread function and see if it improves. |
| ||
| The example is merely meant to demonstrate that there's an underlying problem, not to illustrate my usage. The same block is called from the child thread and the main thread simply to abuse memory faster (and I didn't want to write two functions). I've done a lot of playing with the example as well. For instance, I've moved the load outside the child thread and only done converts, made the child thread loop converts forever so it isn't constantly being relaunched, and removed the main thread's function call. Sometimes things work, and then I'll run the same example with debug on and it will crash. If you move the GCCollect call around, you also get different results. There's a fundamental problem, since various more or less appropriate uses of multithreading will trigger it.

Regarding the delay: I can get crashes when the main and child threads have both been running for extended periods. However, under some circumstances I can create a hang when two things appear to be racing to free at the same time. I believe this is related to the garbage collector taking an application lock, perhaps while the application is already busy locking for a free... This is why I started the support thread: there's a lot of locking of various things in the core of the GC, and it's all tangled up. On top of that, I think there's a problem with locking/unlocking too fast, as you mentioned.

The real-world scenario (I haven't made a simplified example yet, as it's VERY embedded in my program's flow) is that a display starts, and a child thread is spawned to load pictures for use in the display. It uses FreeImage, to be precise, so no, it's not related to the graphics system only being accessible from the main thread. Sometimes everything works flawlessly. Sometimes it will crash right away, sometimes after processing 50 pictures, and so on. It's very random... Thank you for the feedback; I'll try peppering some things with delays and see if that has any effect. |
| ||
| I'm at work right now, but now that I think about it, I also have a piece of code involving some pixmap manipulation that I can get to run great or crash randomly, depending on where I lock and unlock a mutex. I'll look at that code tonight when I get home and see if we have some similarities. |
| ||
| I'll be in your debt just for looking, Jon. I've got a serious case of the crazies from this, and it's pretty vital I get it sorted out... Here's a process sample from when I get what I suspect is the double lock. I sent a different one to Brucey the other day to have a look at, and I believe there are some differences between the two (which again would imply that, randomly, too many/too fast locks = problems).
Call graph:
2435 Thread_100498 DispatchQueue_1: com.apple.main-thread (serial)
2435 start
2435 _start
2435 main
2435 -[NSApplication run]
2435 -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:]
2435 _DPSNextEvent
2435 AEProcessAppleEvent
2435 aeProcessAppleEvent
2435 dispatchEventAndSendReply(AEDesc const*, AEDesc*)
2435 aeDispatchAppleEvent(AEDesc const*, AEDesc*, unsigned long, unsigned char*)
2435 _NSAppleEventManagerGenericHandler
2435 -[NSAppleEventManager dispatchRawAppleEvent:withRawReply:handlerRefCon:]
2435 -[NSApplication(NSAppleEventHandling) _handleCoreEvent:withReplyEvent:]
2435 -[NSApplication(NSAppleEventHandling) _handleAEOpen:]
2435 -[NSApplication _sendFinishLaunchingNotification]
2435 -[NSApplication _postDidFinishNotification]
2435 -[NSNotificationCenter postNotificationName:object:]
2435 -[NSNotificationCenter postNotificationName:object:userInfo:]
2435 _CFXNotificationPostNotification
2435 __CFXNotificationPost
2435 _nsnote_callback
2435 run
2435 4
2435 415
2435 802
2435 639
2435 132
2435 54
2435 _brl_system_TMacOSSystemDriver_Poll
2435 updateEvents
2435 -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:]
2435 _DPSNextEvent
2435 BlockUntilNextEventMatchingListInMode
2435 ReceiveNextEventCommon
2435 RunCurrentEventLoopInMode
2435 CFRunLoopRunInMode
2435 CFRunLoopRunSpecific
2435 __CFRunLoopRun
2435 __CFRunLoopDoObservers
2435 CFQSortArray
2435 CFSortIndexes
2435 malloc_zone_memalign
2435 szone_memalign
2435 szone_malloc_should_clear
2435 tiny_malloc_from_free_list
2435 tiny_free_list_add_ptr
2435 _sigtramp
2435 semaphore_wait_trap
2435 Thread_100499 DispatchQueue_2: com.apple.libdispatch-manager (serial)
2435 start_wqthread
2435 _pthread_wqthread
2435 _dispatch_worker_thread2
2435 _dispatch_queue_invoke
2435 _dispatch_mgr_invoke
2435 kevent
2435 Thread_100503
2435 thread_start
2435 _pthread_start
2435 threadProc
2435 _brl_threads_TThread__EntryStub
2435 bb_ThreadedPrepareElements
2435 191
2435 532
2435 141
2435 bbGCCollect
2435 collectMem
2435 343
2435 842
2435 bmx_freeimage_delete
2435 free
2435 __spin_lock
Thread 1 seems to be handling the event queue, and locking and freeing junk as a result of mucking about. Thread 2 you always get in threaded apps; as best I can tell, it seems to be the thread manager... Thread 3 is my child thread (note: just one child thread at this point), trying to clean up after it's done with a FreeImage image. The FreeImage object is in its Delete method, which calls free on its allocated memory block. That free is halting (I assume) to wait for the main thread to finish freeing things... which it won't, because (again, I assume) it's been confused by the child thread trying to free things. And yet again, just for the record, this is just one manifestation in one program. |
| ||
| I literally COVERED the suspected problem areas with Delay(20)'s and it seems to not hang (usual disclaimer with randomish crashes etc.)... I think you're very much on to something with the high speed lock/unlock causing problems, and that feeds back to my theory that the GC problem could actually be a thread control issue (i.e. the threads locking/unlocking)... Hope! there is hope! |
| ||
It was the same in a project of mine. I deliberately had to make my Lock/Unlock span take longer than it should. If I remember right, here is what I did (pseudocode):
lockMutex(imageMutex)
thisPixmap=GetAPixmap() 'external function
unlockMutex(imageMutex)
thisImage=LockPixmap(thisPixmap)
'The above code would randomly crash anywhere from 30 seconds to 2 minutes into running
Then, to make the time between LockMutex and UnlockMutex longer, I simply kept the mutex locked until thisImage was created...
lockMutex(imageMutex)
thisPixmap=GetAPixmap() 'external function
thisImage=LockPixmap(thisPixmap)
unlockMutex(imageMutex)
'This time, the above code works crash-free (I've even let it run overnight),
'and the only difference is the location of unlockMutex
Anyway, the above is how I got my code to run absolutely crash-free. |
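The change above widens the critical section, so the object fetched under the mutex cannot be touched by another thread until the conversion is finished. Here is a minimal Python sketch of that idea; it is not BlitzMax code. It assumes a hypothetical shared buffer that a writer thread rewrites in place, and `shared`, `writer` and `fetch_and_convert` are illustrative names only:

```python
import threading

lock = threading.Lock()
shared = bytearray(64)   # hypothetical buffer that another thread rewrites in place

def writer(rounds):
    for i in range(rounds):
        with lock:
            for j in range(len(shared)):   # byte-by-byte rewrite, only ever under the lock
                shared[j] = i % 256

def fetch_and_convert():
    # Fetch and conversion share one critical section, like keeping imageMutex
    # locked until thisImage exists.
    with lock:
        buf = shared          # like GetAPixmap(): a reference, not a copy
        return bytes(buf)     # like the conversion: copies while no writer can run

t = threading.Thread(target=writer, args=(2000,))
t.start()
snapshots = [fetch_and_convert() for _ in range(200)]
t.join()
print(all(len(set(s)) == 1 for s in snapshots))  # True: every snapshot is uniform
```

In general, moving the copy outside the `with` block would let the writer run in the middle of it. The wider lock rules that out, at the cost of holding the mutex longer.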
| ||
| Thanks! So far so good with a Delay(20) added before a manual GCCollect() call, which I placed after resuming the garbage collector. I had problems with the collector running during some of the copies, specifically in child threads. I think this also prevents too many lock/unlock cycles on some mutexes... I'll need more poking and testing to verify, but this is the first positive progress I've seen on this problem in a long time, so I'm quite optimistic! |
| ||
Another sample:
SuperStrict
Global theMutex:TMutex = CreateMutex()
Global counter:Int = 0
Function tfunc:Object(in:Object)
	While(True)
		LockMutex(theMutex)
		counter:+1
		Local pixm:TPixmap = CreatePixmap(2048, 2048, PF_RGBA8888)
		UnlockMutex(theMutex)
	Wend
End Function
CreateThread(tfunc, Null)
Print "starting"
While(True)
	LockMutex(theMutex)
	counter:+1
	UnlockMutex(theMutex)
	If(counter >= 10000000)
		Print MilliSecs()
		counter = 0
	End If
Wend
I threw that together on my PC while trying some things. It crashes right away on the CreatePixmap in the child thread, with an access violation while trying to allocate the memory. |
| ||
Compiled on Linux, your example above also crashes with a segmentation fault. But to further prove a point, add a simple delay in the thread and presto!
SuperStrict
Global theMutex:TMutex = CreateMutex()
Global counter:Int = 0
Function tfunc:Object(in:Object)
	While(True)
		LockMutex(theMutex)
		counter=counter+1
		Local pixm:TPixmap = CreatePixmap(2048, 2048, PF_RGBA8888)
		UnlockMutex(theMutex)
		Delay(100)
	Wend
End Function
CreateThread(tfunc, Null)
Print "starting"
While(True)
	LockMutex(theMutex)
	counter=counter+1
	UnlockMutex(theMutex)
	If(counter >= 10000000)
		Print MilliSecs()
		counter = 0
	End If
Wend
|
| ||
| I'm having great success with a bunch of delays peppered around. No more hangs and no crashes; however, it does make the application leak like a sieve... It did this at other times too, when I was messing around with auto vs. manual GC... I'm not sure where the leak comes from, but it's related. Without delays the memory is totally fine, but the app will either crash or hang sooner or later. With delays there's no crash or hang, but it leaks and leaks until it chokes... At this point I'll take the leaks over the crashing, but it's still something to get worked out... Still grinding. |
| ||
| I have no problems with auto GC in my threaded applications. I remember you mentioned that you modified the GC code and now run it manually. Now that you have injected some delays in your thread, you may find that restoring the original GC code works just fine for you and gets rid of your memory leak. |
| ||
| I restored the GC code before starting with the delays (on the theory that by that point I'm sure I'd broken something). I've noticed the leaking in the past under certain circumstances. I think I may try modifying the GC again to see if that cleans up some of the leaking. |
| ||
| You aren't by chance using MaxGUI in your thread, are you? I only mention this because you could create a memory leak by not calling FreeGadget()... |
| ||
| MaxGUI is used earlier in my program, but not in any child threads, and is totally shut down by the time I get to the part that runs for a while and leaks. I'm going to look back over my code and see if I can narrow down what object(s) are leaking, maybe there's a free that's getting missed somewhere due to my structure. |
| ||
| Hi, I've found one issue to do with allocating lots of large un-GCed memory - eg: the way pixmap does. Can you give this a try - it at least fixes the above! http://www.blitzbasic.com/tmp/blitz.mod.zip Replace your existing mod/brl.mod/blitz.mod folder with this 'un. |
| ||
| I've been making lots of workarounds; I'll pull out as many as I can and give this a go right now. Thanks, Mark! |
| ||
| So far so good on Mac and PC. I am noticing an occasional slight delay (half a second or so) right about when I would expect a large free to happen, such as when my program releases all references to a large pixmap. Is this likely a result of the new changes, or just my imagination? It's not a deal breaker (I am dealing with LARGE chunks of memory, so I should expect some things to take a little time); I'm just curious whether it's a sign of the new code kicking in. |
| ||
| It seems better than before; however, it will still crash or hang if two allocs happen at the same time and possibly one of them triggers the collector...

Related: I've been toying with turning off the auto collector so I can control when collects happen (so I know an alloc isn't taking place). I lock a mutex whenever allocs will happen, and in my main loop I call GCCollect() whenever that mutex isn't locked. This seems to work from a stability standpoint (as long as my mutex lock doesn't miss any allocs), but it creates a pause that grows the longer my program runs (especially on PC, but on Mac as well). I then changed it to run GCCollect() at most once per second in the main loop, and only if the mutex wasn't locked. It was perfectly smooth on the PC at first, but when I came back about 20 minutes later there was a pause of about 1/4 second once per second...

[Update] Here's a sample of my application locking up because of two allocs at the same time. The main thread is trying to alloc an object, which triggers a GCCollect; that collect tries to alloc an object during the collection process and ends in a spin lock. Thread 2 is trying to alloc an object, which makes the GC try to lock the collector mutex, and it waits.
Call graph:
2367 Thread_179469 DispatchQueue_1: com.apple.main-thread (serial)
2367 start
2367 _start
2367 main
2367 -[NSApplication run]
2367 -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:]
2367 _DPSNextEvent
2367 AEProcessAppleEvent
2367 aeProcessAppleEvent
2367 dispatchEventAndSendReply(AEDesc const*, AEDesc*)
2367 aeDispatchAppleEvent(AEDesc const*, AEDesc*, unsigned long, unsigned char*)
2367 _NSAppleEventManagerGenericHandler
2367 -[NSAppleEventManager dispatchRawAppleEvent:withRawReply:handlerRefCon:]
2367 -[NSApplication(NSAppleEventHandling) _handleCoreEvent:withReplyEvent:]
2367 -[NSApplication(NSAppleEventHandling) _handleAEOpen:]
2367 -[NSApplication _sendFinishLaunchingNotification]
2367 -[NSApplication _postDidFinishNotification]
2367 -[NSNotificationCenter postNotificationName:object:]
2367 -[NSNotificationCenter postNotificationName:object:userInfo:]
2367 _CFXNotificationPostNotification
2367 __CFXNotificationPost
2367 _nsnote_callback
2367 run
2367 4
2367 1322
2367 2422
2367 77
2367 666
2367 809
2367 278
2367 _sidesign_minib3d_TEntity_MoveEntity
2367 bbObjectNew
2367 bbGCAllocObject
2367 allocMem
2367 collectMem
2367 353
2367 876
2367 _bah_freeimage_TBPHolder_Create
2367 bbObjectNew
2367 bbGCAllocObject
2367 __spin_lock
2367 Thread_179470 DispatchQueue_2: com.apple.libdispatch-manager (serial)
2367 start_wqthread
2367 _pthread_wqthread
2367 _dispatch_worker_thread2
2367 _dispatch_queue_invoke
2367 _dispatch_mgr_invoke
2367 kevent
2367 Thread_179481
2367 thread_start
2367 _pthread_start
2367 threadProc
2367 _brl_threads_TThread__EntryStub
2367 bb_ThreadedPrepareElements
2367 190
2367 539
2367 brl_filesystem_StripDir
2367 bbStringSlice
2367 bbStringNew
2367 bbGCAllocObject
2367 pthread_mutex_lock
2367 new_sem_from_pool
2367 _sigtramp
2367 semaphore_wait_trap
|
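One possible reading of the main-thread trace above, not a confirmed diagnosis: an allocation (`bbGCAllocObject`) triggers a collection (`collectMem`). Something inside that collection (the unsymbolicated frames, then `_bah_freeimage_TBPHolder_Create`) allocates again, and that second allocation waits on a lock the same thread may already hold. Below is a minimal Python sketch of such a self-wait. It is not BlitzMax code, `alloc_that_triggers_collect` is a hypothetical name, and the timeout is only there so the demo reports the problem instead of hanging:

```python
import threading

def alloc_that_triggers_collect(alloc_lock):
    """Hypothetical model: an allocation whose collection allocates again."""
    with alloc_lock:                              # outer allocation holds the allocator lock
        # ...a collection starts, and something it calls allocates:
        got_it = alloc_lock.acquire(timeout=0.2)  # inner allocation wants the same lock
        if got_it:
            alloc_lock.release()
        return got_it

print(alloc_that_triggers_collect(threading.Lock()))   # False: the thread waits on itself
print(alloc_that_triggers_collect(threading.RLock()))  # True: a reentrant lock lets it through
```

The thread doesn't establish whether a reentrant lock, or deferring allocations until the collection has finished, would be the right fix for the BlitzMax collector. The sketch only shows why nested locking on one thread can stall.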
| ||
| I'm a bit confused by this one... It seems to be the last lingering problem with my current structure. The garbage collector is in mode 2 (manual). The main thread has used TryLockMutex() to lock a mutex that controls whether the garbage collector may be called. Since that succeeded, it calls GCCollect() (which translates to bbGCCollect). That calls collectMem, then something else, then pthread_detach, which calls pthread_join, and then a spin lock... The child thread is waiting for the garbage collector mutex to unlock so it can continue with its task, and it seems to be waiting patiently, as it should... What's up with the detach and join?
Call graph:
2315 Thread_323551 DispatchQueue_1: com.apple.main-thread (serial)
2315 start
2315 _start
2315 main
2315 -[NSApplication run]
2315 -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:]
2315 _DPSNextEvent
2315 AEProcessAppleEvent
2315 aeProcessAppleEvent
2315 dispatchEventAndSendReply(AEDesc const*, AEDesc*)
2315 aeDispatchAppleEvent(AEDesc const*, AEDesc*, unsigned long, unsigned char*)
2315 _NSAppleEventManagerGenericHandler
2315 -[NSAppleEventManager dispatchRawAppleEvent:withRawReply:handlerRefCon:]
2315 -[NSApplication(NSAppleEventHandling) _handleCoreEvent:withReplyEvent:]
2315 -[NSApplication(NSAppleEventHandling) _handleAEOpen:]
2315 -[NSApplication _sendFinishLaunchingNotification]
2315 -[NSApplication _postDidFinishNotification]
2315 -[NSNotificationCenter postNotificationName:object:]
2315 -[NSNotificationCenter postNotificationName:object:userInfo:]
2315 _CFXNotificationPostNotification
2315 __CFXNotificationPost
2315 _nsnote_callback
2315 run
2315 4
2315 1322
2315 2422
2315 77
2315 bbGCCollect
2315 collectMem
2315 244
2315 pthread_detach
2315 pthread_join$NOCANCEL$UNIX2003
2315 __spin_lock
2315 Thread_323552 DispatchQueue_2: com.apple.libdispatch-manager (serial)
2315 start_wqthread
2315 _pthread_wqthread
2315 _dispatch_worker_thread2
2315 _dispatch_queue_invoke
2315 _dispatch_mgr_invoke
2315 kevent
2315 Thread_323610
2315 thread_start
2315 _pthread_start
2315 threadProc
2315 _brl_threads_TThread__EntryStub
2315 bb_ThreadedPrepareElements
2315 183
2315 549
2315 _bb_TElement_init
2315 135
2315 brl_threads_LockMutex
2315 _brl_threads_TMutex_Lock
2315 pthread_mutex_lock
2315 new_sem_from_pool
2315 _sigtramp
2315 semaphore_wait_trap
|
| ||
| Hi, unless you post some more runnable code, I'm afraid there's not much I can do. Stack traces aren't particularly useful in these cases, because with threading the problem may have occurred long before the crash. Have you tried running the app with plain old auto-GC enabled? If you've disabled GC and the app needs to allocate memory but can't, there's a chance it'll just fail and BANG. That's especially likely with large allocations, which I suspect your app is using. |
| ||
| Auto GC causes many, many more crashes, as it fires quite often while something is allocating, and then the app dies. I've switched back to manual GC because it lets me control when the collect happens, and therefore be sure (through the use of a mutex) that no child threads are busy allocating anything. I'm still working on an example, but without much success. Even in my sprawling project it doesn't happen reliably, so it's very hard to narrow down what/where/when/how/why something is going wrong. The only commonality I notice (as illustrated by the traces) is that problems always occur within an alloc or free. They are much, much more prevalent when memory is being handled in two places at once, such as an alloc in the main and child threads at the same time. I was also having some problems with semaphores a while ago, which made me abandon them as a way of restricting simultaneous access. I'll see if I can re-create that problem with some sample code, as perhaps that will be easier than my current flow.

I don't think there's an allocation-space issue. If I disable the collector altogether (just to see), it runs up to around 1GB allocated before anything bad starts to happen. With manual collection it usually runs at around 60-260MB, and if I put it on auto it sometimes spikes to about 400MB before collecting. So there should be plenty of headroom. I collect roughly every tenth of a second (assuming nothing is blocking the collect), so the pool never climbs. It should collect after every large alloc/free; that's not guaranteed because of timing, but it should never let more than two large allocs/frees pass. It runs in a loop with the same content, usually for hours (6+) without any problems, and sometimes it will choke and die within minutes.
Will try to get more sample code for you, just particularly curious what the "2315 pthread_join$NOCANCEL$UNIX2003" trace meant, and also why it's detaching/joining in the collect cycle. |
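The manual-collection scheme described here can be modelled roughly as shown below. An earlier post in the thread describes the same idea: calling GCCollect() at most once per second, and only when the allocation mutex isn't locked. This is a minimal Python sketch under stated assumptions, not BlitzMax code. `alloc_gate`, `worker_allocate` and `maybe_collect` are hypothetical names, Python's `gc.collect()` stands in for `GCCollect()`, and a non-blocking `acquire` plays the role of `TryLockMutex()`:

```python
import gc
import threading
import time

alloc_gate = threading.Lock()   # hypothetical: held by any thread while it allocates

def worker_allocate(n_items):
    with alloc_gate:            # no collection can start while this runs
        return [bytearray(1024) for _ in range(n_items)]

def maybe_collect(last_collect, interval):
    """Collect only if `interval` seconds have passed and nobody holds the gate.

    Returns the time of the most recent collection."""
    now = time.monotonic()
    if now - last_collect < interval:
        return last_collect                  # too soon: throttle
    if not alloc_gate.acquire(blocking=False):
        return last_collect                  # someone is allocating: try again next frame
    try:
        gc.collect()
    finally:
        alloc_gate.release()
    return now

# Main-loop usage: call once per frame, collecting at most every 0.1 s.
last = float("-inf")
data = worker_allocate(100)
last = maybe_collect(last, 0.1)
```

As the posts note, this only holds if every allocation really goes through the gate. An allocation that bypasses it can still overlap a collection.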
| ||
| I am also still having problems in my threaded app that also deals with pixmaps. It will randomly hang (not a full crash per se). I have tried the modified blitz.mod posted by Mark, but I'm still having problems. |
| ||
Here's an interesting dump I got from a tester. Still no code I know, still working on that...
Call graph:
2882 Thread_1175 DispatchQueue_1: com.apple.main-thread (serial)
2882 start
2882 main
2882 launchd_runtime
2882 mach_msg
2882 mach_msg_trap
2882 Thread_1176
2882 thread_start
2882 _pthread_start
2882 kqueue_demand_loop
2882 select$DARWIN_EXTSN
Total number in stack (recursive counted multiple, when >=5):
Sort by top of stack, same collapsed (when >= 5):
mach_msg_trap 2882
select$DARWIN_EXTSN 2882
Sample analysis of process 217 written to file /dev/stdout
This time thread 1 (not my thread; the one BlitzMax runs, I assume, to trap events) seems to have found something more interesting to occupy its time... I'll keep trying to get a good example of some form of this hanging/crashing. It keeps manifesting in such different ways; it's quite annoying. |
| ||
| Just a follow-up: I'm now running BMX v1.41, and my MT code is now rock solid, though not entirely because of 1.41. In my case, it came back to the fact that OGL isn't 100% thread-safe. My random crashes appear to have come from locking/unlocking mutexes around Max2D commands (mainly DrawImage, which turned out to be the biggest culprit).
My BEFORE code (pseudo) that would crash. Notice that I'm locking a mutex around an external C function, and around DrawImage. The Update() method runs in its own thread, and the Draw() method runs in the main thread.
Type TWebCam
	Field image:TImage
	Field pixmap:TPixmap
	... ...
	Method Update()
		LockMutex(pixmapMutex)
		Self.pixmap.pixels=grab_frame() 'grab_frame is an external C function
		UnlockMutex(pixmapMutex)
		LockMutex(imageMutex)
		Self.image=LoadImage(Self.pixmap)
		UnlockMutex(imageMutex)
	End Method
	Method Draw(x:Int,y:Int)
		LockMutex(imageMutex)
		DrawImage(Self.image,x,y)
		UnlockMutex(imageMutex)
	End Method
End Type
AFTER: Since the webcam image is returned as a pixmap, and I only need a TImage when it's drawn, I make one on the fly in my Draw method. Also notice that I no longer lock a mutex around the external C function or around the Max2D DrawImage() function...
Type TWebCam
	Field pixmap:TPixmap
	... ...
	Method Update()
		Local grabbedPixmap:TPixmap=CreatePixmap(640,480)
		grabbedPixmap.pixels=grab_frame() 'grab_frame is an external C function
		LockMutex(pixmapMutex)
		Self.pixmap=grabbedPixmap
		UnlockMutex(pixmapMutex)
	End Method
	Method Draw(x:Int,y:Int)
		Local thisImage:TImage
		LockMutex(pixmapMutex)
		thisImage=LoadImage(Self.pixmap)
		UnlockMutex(pixmapMutex)
		DrawImage(thisImage,x,y)
	End Method
End Type
These simple changes have made my application 100% stable. Ima747: look for similar things in your MT code, and find a way around locking/unlocking mutexes around Max2D functions and external C functions. Then you will either fix your problem, or eliminate the possibility that something you are threading isn't really thread-safe... |
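The AFTER version follows a common pattern: do the slow work (grabbing the frame, drawing) outside the lock, and hold the mutex only long enough to swap or read a reference. Here is a minimal Python sketch of that pattern; it is not BlitzMax code. `FrameSlot` is a hypothetical stand-in for TWebCam, and immutable `bytes` objects stand in for freshly created pixmaps:

```python
import threading

class FrameSlot:
    """Hypothetical stand-in for TWebCam: holds the most recent frame."""

    def __init__(self):
        self._lock = threading.Lock()
        self._frame = None

    def update(self, make_frame):
        new_frame = make_frame()       # slow work (like grab_frame) outside the lock
        with self._lock:
            self._frame = new_frame    # only the reference swap is locked

    def latest(self):
        # Only reading the reference is locked; the caller uses the frame
        # (like LoadImage/DrawImage) after the lock is released.
        with self._lock:
            return self._frame

slot = FrameSlot()

def producer():
    for i in range(100):
        slot.update(lambda i=i: bytes([i]) * 16)   # a brand-new frame every time

t = threading.Thread(target=producer)
t.start()
seen = [slot.latest() for _ in range(1000)]
t.join()
print(slot.latest() == bytes([99]) * 16)  # True
```

The swap is only safe because each update publishes a new object and never modifies one that a reader might already hold. That matches the AFTER code, which creates a new pixmap in every Update().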