Remember when I mentioned we had ported our #fire propagation #cellularAutomaton from #Python to #Julia, gaining performance and the ability to parallelize more easily and efficiently?
A couple of days ago we had to run another big batch of simulations and while things progressed well at the beginning, we saw the parallel threads apparently hanging one by one until the whole process sat there doing who know what.
Our initial suspicion was that we had come across some weird #JuliaLang issue with #multithreading, which seemed to be confirmed by some posts we found on the Julia forums. We tried the workarounds suggested there, to no avail. We tried a different number of threads, and this led to the hang occurring after a different percent completion. We tried restarting the simulations skipping the ones already done. It always got stuck at the same place (for the same number of threads).
So, what was the problem?
1/n