The most likely culprit is a race condition: a bug whose appearance depends not only on the code but also on timing. Race conditions are the worst kind of problem because you can test for hours without seeing one, yet after you deploy, they always seem to appear.
Race conditions can also occur on single-core systems. There, they can be prevented by running all of the competing threads under FIFO scheduling. It is also sometimes possible to prevent them by adjusting thread priorities, though this requires careful verification of the code. Some specific race conditions can be prevented by careful coding alone, but that requires an understanding of the code the compiler generates.
These three ways of dealing with race conditions have the advantage that they incur no additional overhead. That is tempting, but I personally discourage them.
Mutexes, semaphores, read-write locks, and condition variables are the usual ways to prevent race conditions. If you are not running in a thread context where you can make a blocking system call, such as in an interrupt handler, a spin lock will sometimes solve the problem.
The basic idea behind a race condition is that two pieces of code depend on the same data. If your code does not prevent both pieces from using the data at the same time, you can get a race condition. On a single-processor system, one thread can be in the middle of an update when rescheduling occurs, giving the second thread access to half-updated data. FIFO scheduling prevents this. Assigning different priorities can prevent one thread from being preempted, but not the other; this is sometimes sufficient. On a system with more than one processor, none of this works: two threads at different priorities can be executing the same code at the same time.
Programmers new to these ideas are often surprised when an extremely unlikely event actually occurs. I was. Unfortunately, Murphy's law is a fact of life here.