
So, would it be correct to say that CAS loop is safe to use because it is just a zero-risk bet to expect CPU not to do context switch so fast?

It is important to distinguish between "safe" in the sense that behavior is well defined under the relevant circumstances, and "safe" in the sense that the program can be relied upon to exhibit satisfactory characteristics. Atomic CAS is safe in the former sense, but algorithms using atomic CAS are not necessarily safe in the latter sense. For instance, they don't necessarily guarantee that all threads involved make progress in finite time.

Your example is not safe in the second sense. The use of atomic CAS does not avoid the possibility of one or more threads getting stuck in the loop, except in a probabilistic sense.
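
Your code is not reproduced here, but a typical CAS loop has roughly the following shape (a hypothetical shared-counter increment using C11 <stdatomic.h>). A failed compare-exchange means some other thread modified the value in the meantime, and nothing bounds how many times that can happen:

    #include <stdatomic.h>

    /* Hypothetical example: increment a shared counter with a CAS loop.
     * Each failed compare-exchange means another thread won the race,
     * and this thread must retry; nothing bounds the number of retries
     * under contention. */
    void increment(atomic_int *counter) {
        int expected = atomic_load(counter);
        while (!atomic_compare_exchange_weak(counter, &expected, expected + 1)) {
            /* 'expected' now holds the current value; loop and retry. */
        }
    }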

In that case,

What exactly makes Compare-and-swap (CAS) loop a better choice in highly concurrent environment?

Individual atomic operations are typically much faster than taking out a lock, performing several operations, and then releasing the lock again. That can make lock-free approaches based on atomic operations consume less wall time and / or accommodate higher concurrency than lock-based approaches, at the cost of possibly consuming more CPU time. Such lock-free approaches often depend on atomic CAS.
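
For contrast, here is a minimal sketch of a lock-based counterpart to the hypothetical counter above (POSIX threads assumed). Every increment pays for a lock acquisition and release, whereas the CAS version pays only for the atomic read-modify-write in the uncontended case:

    #include <pthread.h>

    /* Hypothetical lock-based counterpart: each increment takes and
     * releases a mutex, which typically costs more per operation than
     * a single successful CAS. The lock must be initialized first,
     * e.g. with PTHREAD_MUTEX_INITIALIZER. */
    typedef struct {
        pthread_mutex_t lock;
        int value;
    } locked_counter;

    void locked_increment(locked_counter *c) {
        pthread_mutex_lock(&c->lock);
        c->value += 1;
        pthread_mutex_unlock(&c->lock);
    }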

But good, correct lock-free algorithms are tricky to develop. Neither lock-free approaches in general nor atomic CAS in particular is a silver bullet. And although they may provide for shorter completion times when used correctly, they do not magically give you more resources to work with, and they do not automatically use your resources more efficiently than lock-based approaches do.

How to cultivate that intuition of how many operations roughly it usually takes before CPU performs a context switch?

Don't. At least not while writing userspace code. Such intuition is not anywhere among the top things you need to bear in mind to write good concurrent code, with or without locks. Relying on such assumptions is likely to make your code brittle.

what makes us believe the loop will eventually terminate and when?

To the limited extent that we do believe that, it is based on an assumption that either the timing will eventually work out, or that the other threads contending for the same resources will eventually run out of work to do. How reliable those assumptions are depends on many factors, among them the nature of the tasks, their number, and the number of concurrent execution units available to run them.

For example, what would be the upper bound of the number of retries of the loop or are there some underlying (os-level) guarantees regarding that?

What makes you think there would be any such thing in the general case?
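
If you want an empirical feel for retry counts on a particular system, one option is to instrument the loop yourself; the result has no a priori bound and varies with contention. A minimal sketch, building on the hypothetical counter above:

    #include <stdatomic.h>

    /* Hypothetical instrumented variant: returns the number of failed
     * CAS attempts for one increment. There is no a priori upper bound
     * on this value; it depends entirely on runtime contention. */
    unsigned increment_counting_retries(atomic_int *counter) {
        unsigned retries = 0;
        int expected = atomic_load(counter);
        while (!atomic_compare_exchange_weak(counter, &expected, expected + 1)) {
            retries++;
        }
        return retries;
    }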

Posted by: John Bollinger