Details
-
Bug
-
Resolution: Done
-
P1: Critical
-
5.14.2
-
None
-
-
6533d1a47309956e8acda90eb4c41d245e817c93 (qt/qtbase/dev)
c6f0236892c0002b11512683754f2b22ae979eec (qt/qtbase/5.15)
a6699dd0df0be39fc98a54642ac320e1d7a611d2 (qt/qtbase/5.12)
Description
I've come across a nasty race in QFseventsFileSystemWatcherEngine during destruction. The race goes like this:
1) We destroy the QFseventsFileSystemWatcherEngine, which calls FSEventStreamStop (https://github.com/qt/qtbase/blob/dev/src/corelib/io/qfilesystemwatcher_fsevents.mm#L316) and then FSEventStreamInvalidate/FSEventStreamRelease (https://github.com/qt/qtbase/blob/dev/src/corelib/io/qfilesystemwatcher_fsevents.mm#L550-L551).
2) Those FSEvent* calls run on the thread that invokes the destructor, which is not the thread on which the FSEvents callbacks are delivered.
3) So the callback can be in the middle of processing an event here: https://github.com/qt/qtbase/blob/dev/src/corelib/io/qfilesystemwatcher_fsevents.mm#L74 , while the QFseventsFileSystemWatcherEngine has already been destroyed.
The attached project reproduces this: when run, it occasionally terminates as shown below. I build it with TSan in the hope of catching the race more easily, but my Qt is not compiled with TSan enabled. If it were, I'd expect TSan to catch this reliably.
$ ./FilesystemWatcherCrash
Start
ThreadSanitizer:DEADLYSIGNAL
==15744==ERROR: ThreadSanitizer: SEGV on unknown address (pc 0x00010e6ce5eb bp 0x7e800007e2d0 sp 0x7e800007e2d0 T64543391)
==15744==The signal is caused by a READ memory access.
==15744==Hint: this fault was caused by a dereference of a high value address (see registers below). Dissassemble the provided pc to learn which register value was used.
Finish
#0 operator==(QString const&, QString const&) qstring.cpp:3386 (QtCore:x86_64+0xbc5ea)
#1 QFseventsFileSystemWatcherEngine::checkDir(QHash<QString, QFseventsFileSystemWatcherEngine::DirInfo>::iterator&) qfilesystemwatcher_fsevents.mm:119 (QtCore:x86_64+0x2d5b30)
#2 QFseventsFileSystemWatcherEngine::processEvent(__FSEventStream const*, unsigned long, char*, unsigned int const, unsigned long long const*) qfilesystemwatcher_fsevents.mm:249 (QtCore:x86_64+0x2d6b17)
#3 callBackFunction(__FSEventStream const*, void*, unsigned long, void*, unsigned int const*, unsigned long long const*) qfilesystemwatcher_fsevents.mm:75 (QtCore:x86_64+0x2d99d9)
#4 implementation_callback_rpc <null>:514896 (FSEvents:x86_64+0x2992)
#5 _Xcallback_rpc <null>:514896 (FSEvents:x86_64+0x1dfd)
#6 FSEventsD2F_server <null>:514896 (FSEvents:x86_64+0x1cf6)
#7 __create_d2f_port_source_block_invoke <null>:514896 (FSEvents:x86_64+0x1c60)
#8 __tsan::dispatch_callback_wrap(void*) <null>:514896 (libclang_rt.tsan_osx_dynamic.dylib:x86_64h+0x72061)
#9 _dispatch_client_callout <null>:514896 (libdispatch.dylib:x86_64+0x2657)
#10 _dispatch_continuation_pop <null>:514896 (libdispatch.dylib:x86_64+0x4817)
#11 _dispatch_source_invoke <null>:514896 (libdispatch.dylib:x86_64+0x144bd)
#12 _dispatch_lane_serial_drain <null>:514896 (libdispatch.dylib:x86_64+0x7af5)
#13 _dispatch_lane_invoke <null>:514896 (libdispatch.dylib:x86_64+0x85d5)
#14 _dispatch_workloop_worker_thread <null>:514896 (libdispatch.dylib:x86_64+0x11c08)
#15 _pthread_wqthread <null>:514896 (libsystem_pthread.dylib:x86_64+0x2a3c)
#16 start_wqthread <null>:514896 (libsystem_pthread.dylib:x86_64+0x1b76)
==15744==Register values:
rax = 0x00007b3c00000798 rbx = 0x00000000b28f674d rcx = 0x0000f65c000046c0 rdx = 0x0000000000000008
rdi = 0x0000000000000000 rsi = 0x0000000000000000 rbp = 0x00007e800007e2d0 rsp = 0x00007e800007e2d0
r8 = 0x0000000000000100 r9 = 0x0000000000000000 r10 = 0x0000000000000064 r11 = 0x0000000000000000
r12 = 0x00007e800007e4f8 r13 = 0x00007e800007e448 r14 = 0x00007b0c00001920 r15 = 0x00007b24000012d0
ThreadSanitizer can not provide additional info.
SUMMARY: ThreadSanitizer: SEGV qstring.cpp:3386 in operator==(QString const&, QString const&)
==15744==ABORTING
[1] 15744 abort ./FilesystemWatcherCrash
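My suggested fix is to dispatch all FSEvent* calls made in QFseventsFileSystemWatcherEngine onto the dispatch queue we create for event processing, so they are serialized with the callbacks. During destruction the dispatch will have to be synchronous (`dispatch_sync`), so the destructor does not return while a callback may still be running.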
Clang had a similar issue and solved it the same way I'm suggesting: https://reviews.llvm.org/D74371
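To illustrate the suggested fix, here is a minimal, portable sketch of the idea rather than the actual Qt patch. It assumes a hand-rolled `SerialQueue` as a stand-in for a serial libdispatch queue (`async` and `sync` roughly mirror `dispatch_async` and `dispatch_sync`), and a hypothetical `Watcher` class in place of the real engine. None of these names exist in Qt or FSEvents. The point it shows is that running the teardown synchronously on the callback queue acts as a barrier: every event queued earlier finishes before the destructor returns.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a serial dispatch queue: a single worker thread
// runs submitted tasks strictly in FIFO order.
class SerialQueue {
public:
    SerialQueue() : worker_([this] { run(); }) {}
    ~SerialQueue() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        cv_.notify_one();
        worker_.join();
    }
    // Rough analogue of dispatch_async: enqueue and return immediately.
    void async(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push_back(std::move(task));
        }
        cv_.notify_one();
    }
    // Rough analogue of dispatch_sync: enqueue and block until the task ran.
    // Must not be called from the worker thread itself (it would deadlock).
    void sync(std::function<void()> task) {
        auto finished = std::make_shared<std::promise<void>>();
        std::future<void> ready = finished->get_future();
        async([task = std::move(task), finished] { task(); finished->set_value(); });
        ready.wait();
    }
private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return done_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // shutdown requested, queue drained
                task = std::move(tasks_.front());
                tasks_.pop_front();
            }
            task();
        }
    }
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> tasks_;
    bool done_ = false;
    std::thread worker_;  // declared last so the other members exist before run()
};

// Hypothetical watcher; paths_ plays the role of the engine state that the
// callback reads in the crash above.
class Watcher {
public:
    Watcher(SerialQueue &queue, int &processed) : queue_(queue), processed_(processed) {}
    ~Watcher() {
        // Teardown (standing in for the FSEventStreamStop/Invalidate/Release
        // calls) runs on the callback queue, and we block until it is done.
        queue_.sync([this] { streamRunning_ = false; });
    }
    // Simulates the system delivering an event on the callback queue.
    void deliverEvent(const std::string &path) {
        queue_.async([this, path] {
            assert(streamRunning_);  // a callback never sees a torn-down watcher
            paths_.push_back(path);
            ++processed_;
        });
    }
private:
    SerialQueue &queue_;
    int &processed_;
    std::vector<std::string> paths_;
    bool streamRunning_ = true;
};

// Queues eventCount events, destroys the watcher, and reports how many
// callbacks completed by the time the destructor returned.
int processEventsThenDestroy(int eventCount) {
    SerialQueue queue;
    int processed = 0;
    {
        Watcher watcher(queue, processed);
        for (int i = 0; i < eventCount; ++i)
            watcher.deliverEvent("/tmp/file" + std::to_string(i));
    }  // ~Watcher blocks here until the queued events and the teardown have run
    return processed;
}
```

In this sketch, `processEventsThenDestroy(100)` returns 100 because the teardown task is queued behind all pending callbacks and the destructor waits for it. Without the synchronous dispatch, the watcher's members could be destroyed while a callback was still touching them, which is the shape of the crash in the trace above.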