Concurrent-versations: A Roundtable Discussion of Concurrency Models

“Golang, Kotlin, NodeJS, Python and Rust meet at a bar for a couple of drinks and eventually start flexing about their concurrency models. Let’s see how that goes!”

Golang: It's like I have a cape, but instead of flying, I can run millions of tasks at once.

Kotlin: Yeah, but I'm like the ninja of concurrency. Silent, efficient, and deadly when I need to be. Coroutine is the new Goroutine!

NodeJS: Okay, but can either of you handle the high-pressure job of managing all I/O operations as I do?

Python: And I'm like the wizard of concurrency. I may not have brawn, but I have plenty of brains to get the job done.

Golang: Shut up, Python, you’re just like Node with an event loop and a weird whitespace obsession.

Meanwhile, Rust is just sitting on the side, sipping on some beer and telling itself: “Let them ‘rust’ in this stupid conversation. Also, I should try standup comedy.”

This is usually the conversation that goes through my mind when I am contemplating which model I should apply to a given problem statement. What scares me more during this journey is that if “change is the only constant”, then more models like these are inevitable, and this whole conundrum is only going to get worse down the line.

So there is a strong need to understand all of these at a foundational level, and to understand the X factor each one introduces that makes it great at what it does.

We need to build a framework/mental model that can describe any of these models and then help us map it to the use case at hand.

Going back to Day 0

Concurrency had to be introduced to the world for many reasons:

Concurrency enables programs to handle multiple requests or users simultaneously, making it crucial for applications that require real-time data processing or fast response times, such as web servers, gaming platforms, or financial trading systems.
Concurrency can help programmers make better use of multi-core processors, which are increasingly common in modern computers and mobile devices.

If, after reading the second point, you had the urge to add the comment

“Concurrency is not the same as parallelism, we don’t need multiple cores to achieve concurrency”.

Then hold onto your pants, I already agree with you. For readability's sake in some places parallelism and concurrency might be used interchangeably.

At the core of it, running programs has more or less remained the same. A program is still just a series of instructions that we ask the OS to execute on a core and then give us the output. To make that happen, the operating system uses the concept of a Thread.

It’s the job of the Thread to account for and sequentially execute the set of instructions it is assigned. Threads can create more Threads. All these different Threads run independently of each other and scheduling decisions are made at the Thread level.
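
As a minimal sketch of the description above, here is what that looks like with Python's `threading` module. The function names (`run_instructions`, `parent_thread`, `thread_demo`) are made up for illustration; each thread runs its own instructions in order, and one thread creates another.

```python
import threading


def run_instructions(name, log):
    """Each thread executes its own instructions sequentially."""
    for step in range(3):
        log.append((name, step))


def parent_thread(log):
    """A thread can itself create another thread and wait for it."""
    child = threading.Thread(target=run_instructions, args=("child", log))
    child.start()
    run_instructions("parent", log)
    child.join()


def thread_demo():
    log = []  # list.append is safe to call from several threads in CPython
    t = threading.Thread(target=parent_thread, args=(log,))
    t.start()
    t.join()
    return log


if __name__ == "__main__":
    print(thread_demo())
```

Within one thread the steps always appear in order. How the two threads interleave with each other is up to the scheduler, so that part of the output can differ between runs.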

Why did we not just stop at Threads?

Threads seem to have answered the problem of handling traffic and doing computations efficiently. So can we not just throw more threads at the problem and call it a day? The short answer is: yes, we can! But have you ever noticed everyone at your company saying, “Threads are heavy! Don't just go out there and create a lot of them!”

No, we are not trying to fat-shame Threads here.

So, woke audience on Twitter, you hold onto your pants as well.

Yes, the creation of Threads is a ~~heavy~~ complex process and we cannot take them for granted. This is why you see every Senior Engineer going around saying “Add Thread Pooling”. That's just another way of saying: control the number of Threads you create.
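
To make “Add Thread Pooling” concrete, here is a small sketch using Python's standard `concurrent.futures.ThreadPoolExecutor`. The names `handle_request` and `run_with_pool`, and the numbers 20 and 4, are assumptions picked for the example. The point is only that 20 units of work get served by at most 4 threads.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def handle_request(request_id):
    """Pretend to serve a request; report which pooled thread did it."""
    return request_id, threading.current_thread().name


def run_with_pool(num_requests, max_threads):
    # The pool caps how many threads exist, no matter how much work arrives.
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        outcomes = list(pool.map(handle_request, range(num_requests)))
    request_ids = [rid for rid, _ in outcomes]
    threads_used = {name for _, name in outcomes}
    return request_ids, threads_used


if __name__ == "__main__":
    ids, threads = run_with_pool(num_requests=20, max_threads=4)
    print(len(ids), "requests served by", len(threads), "thread(s)")
```

The right pool size depends on the workload, so treat the cap as something to benchmark rather than a magic number.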

In OS Scheduler we believe!

If you are using Linux, Mac or Windows, your operating system has a preemptive scheduler. This implies that the scheduler can interrupt and switch between threads at any moment, based on various factors like thread priorities and events, such as network data reception. This makes it difficult to predict which threads will run at any particular time, and in what order.

Hence Threads move between different states →

Waiting: Threads in "Waiting" are stopped and waiting for hardware, system calls or synchronization calls, causing performance issues.
Runnable: Threads in "Runnable" want to execute their assigned instructions but face competition and scheduling latency.
Executing: This means the Thread has been placed on a core and is executing its machine instructions. The work related to the application is getting done. This is what everyone wants.

The act of swapping Threads on a core is called a Context Switch. It occurs when the scheduler pulls an executing Thread off a core and replaces it with a Runnable Thread. The Thread that was pulled goes back to Runnable if it can still do work, or to Waiting if it was pulled because it is blocked on something.

Context switches are like a bad break-up - they're expensive, take time, and can leave your program feeling lost and confused. It's like your program is trying to date multiple Threads at once, but every time it goes on a date, it has to take a break and see someone else. This switcheroo can take up to 1500 nanoseconds, which is a lot of time in the world of nanoseconds. To put it into perspective, assuming a core gets through roughly 12 instructions per nanosecond, that's like losing the ability to execute up to 18k instructions. It's like your program is constantly stuck in a love triangle, unable to fully commit to any one Thread.
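
The arithmetic behind those numbers can be sketched in a few lines. The 1500 ns figure is from the text above. The 12 instructions per nanosecond is an assumption (it is simply the rate that turns 1500 ns into 18k instructions), and the 10,000 switches per second is a made-up load for illustration.

```python
CONTEXT_SWITCH_NS = 1500      # upper bound quoted in the text
INSTRUCTIONS_PER_NS = 12      # assumption: implied by 18k / 1500 ns


def instructions_lost(switch_ns, instructions_per_ns=INSTRUCTIONS_PER_NS):
    """Instructions a core could have executed during one context switch."""
    return switch_ns * instructions_per_ns


def fraction_of_core_lost(switches_per_second, switch_ns=CONTEXT_SWITCH_NS):
    """Share of each second a core spends switching instead of working."""
    return switches_per_second * switch_ns / 1_000_000_000


if __name__ == "__main__":
    print(instructions_lost(CONTEXT_SWITCH_NS))   # 1500 * 12 = 18000
    print(fraction_of_core_lost(10_000))          # hypothetical load
```

Real costs vary by hardware and workload, so these are order-of-magnitude figures and not measurements.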

Thus,

The primary reason why we cannot just throw Threads at the problem, and had to come up with different strategies, is to incur fewer of these context switches.

This is not the only problem with having many threads. There is the cache-coherence problem as well, which eventually leads to the application thrashing through memory and horrible performance. To top it all off, we will have no understanding of why.

So which strategy/model should I go with?

Like all situations in life, it depends! Being able to answer the above, and reason about it, in any situation will help anyone become a better Engineer.

At the end of the day, choosing between concurrency strategies boils down to the factors below.

This is not exhaustive. Let me know in the comments if you feel there are other relevant factors as well.

How the underlying concurrency model pleases the OS scheduler: the fewer context switches we cause, the better the performance of the program.
What kind of load you are dealing with: IO-Bound or CPU-Bound.
1. CPU-Bound: This is work that never creates a situation where the Thread may be placed in a Waiting state. This is work that is constantly making calculations. E.g., a Thread calculating Pi to the Nth digit would be CPU-Bound.
2. IO-Bound: This is work that causes Threads to enter Waiting states. It consists of requesting access to a resource over the network or making system calls into the operating system. A Thread that needs to access a database would be IO-Bound. I would include synchronization events (mutexes, atomics) that cause the Thread to wait as part of this category.
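
One rough way to tell the two kinds of load apart is to compare wall-clock time with CPU time: CPU-bound work burns CPU for most of its wall time, while IO-bound work spends most of it waiting. The sketch below assumes two stand-in workloads, `count_primes` (pure computation) and `fake_db_call` (a `time.sleep` pretending to be a database round trip). Both names are hypothetical.

```python
import time


def measure(fn, *args):
    """Return (result, wall_seconds, cpu_seconds) for one call of fn."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = fn(*args)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, wall, cpu


def count_primes(limit):
    """CPU-bound stand-in: constant calculation, never waits on anything."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def fake_db_call(delay_seconds):
    """IO-bound stand-in: the thread sits in a waiting state, using no CPU."""
    time.sleep(delay_seconds)
    return "row"


if __name__ == "__main__":
    for fn, arg in ((count_primes, 50_000), (fake_db_call, 0.1)):
        _, wall, cpu = measure(fn, arg)
        print(f"{fn.__name__}: wall={wall:.3f}s cpu={cpu:.3f}s")
```

For the sleep-based call the CPU time should come out far below the wall time, which is the signature of IO-bound work.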

Case Study #1 - Golang and Kotlin

Golang and Kotlin employ Goroutines and Coroutines respectively. While Golang has concurrency out of the box, in Kotlin we need to add an extension library (kotlinx.coroutines).

Under the hood, the mechanics used to achieve the performance gain might be different, but logically both models extend the same philosophy:

“Abstract out the thread management for the developer using coroutines.”

They both have their own scheduler, which maps and context switches the coroutines on top of OS-level Threads. By default, both models create just one Thread per virtual core.

These schedulers are also cooperative by nature. To function as a cooperative scheduler, the scheduler relies on well-defined events in user space that occur at safe points within the code, which it can then use to make informed scheduling decisions.

In Go, these events include using the go keyword before a function call, garbage collection, etc.

In Kotlin, invoking a suspend function is generally one such event.
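
The idea of “scheduling only at well-defined points” can be sketched in Python, where `await` plays the role that suspend functions play in Kotlin. This is an analogy using `asyncio`, not a model of the Go or Kotlin schedulers. The names `worker`, `run_workers` and `schedule_order` are made up for the example.

```python
import asyncio


async def worker(name, steps, log, cooperative):
    for i in range(steps):
        log.append(f"{name}{i}")
        if cooperative:
            # A well-defined point where the scheduler may switch tasks.
            await asyncio.sleep(0)


async def run_workers(cooperative):
    log = []
    await asyncio.gather(
        worker("a", 2, log, cooperative),
        worker("b", 2, log, cooperative),
    )
    return log


def schedule_order(cooperative):
    return asyncio.run(run_workers(cooperative))


if __name__ == "__main__":
    print(schedule_order(cooperative=True))   # ['a0', 'b0', 'a1', 'b1']
    print(schedule_order(cooperative=False))  # ['a0', 'a1', 'b0', 'b1']
```

Without a yield point, worker `a` runs to completion before `b` ever gets a turn, which is exactly why a cooperative scheduler depends on those events showing up in the code.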

How do they please the underlying OS Scheduler?

It’s simple: turn IO-bound work into CPU-bound work at the OS level. When this happens, the underlying OS Threads never need to context switch because of IO, as they are always executing instructions. This is possible for networking apps and other apps that don’t need system calls that block OS Threads. Voila! Just by lifting the abstraction to the coroutine level, both schedulers take things into their own hands and deliver performance to the developer.
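
Here is a hedged illustration of that trick using Python's `asyncio` as a stand-in (Go and Kotlin spread coroutines over several threads, while this sketch uses just one). Fifty pretend network calls, simulated with `asyncio.sleep`, all wait at the same time, yet only a single OS thread is ever involved. `fake_io_request`, `serve` and `run_requests` are hypothetical names.

```python
import asyncio
import threading


async def fake_io_request(i, seen_threads):
    # Stands in for a non-blocking network call: the coroutine is parked,
    # but the OS thread stays busy running other coroutines.
    await asyncio.sleep(0.01)
    seen_threads.add(threading.get_ident())
    return i * 2


async def serve(num_requests):
    seen_threads = set()
    results = await asyncio.gather(
        *(fake_io_request(i, seen_threads) for i in range(num_requests))
    )
    return results, seen_threads


def run_requests(num_requests):
    return asyncio.run(serve(num_requests))


if __name__ == "__main__":
    results, threads = run_requests(50)
    print(len(results), "requests handled on", len(threads), "OS thread(s)")
```

From the OS scheduler's point of view that one thread never blocks on IO, so there is no IO-driven reason to switch it out.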

Nature of the load → IO Bound or CPU Bound?

Both of them can handle both kinds of load, and handle them pretty decently for the developer. But in some use cases they can shoot you in the foot. If your use case is primarily IO bound, adding more threads may not improve performance, because the I/O operations are typically asynchronous and already handled by the operating system or hardware. Sometimes the added complexity even degrades performance due to management overhead.

Case Study #2 - NodeJS and Python

Both of the above run their own version of an event loop, so we can keep them in the same category.

Both follow the same basic model of running a continuous loop that monitors a queue of events and schedules their execution. When an event is ready to be executed, the event loop will dispatch it to the appropriate callback function.
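
The basic model fits in a few lines. `ToyEventLoop` below is a deliberately tiny, made-up class, not how Node or Python's asyncio are actually implemented: real loops also sleep until the OS reports that something is ready, whereas this one simply stops when the queue is empty.

```python
from collections import deque


class ToyEventLoop:
    """A toy loop: a queue of events plus one callback per event name."""

    def __init__(self):
        self._queue = deque()
        self._handlers = {}

    def on(self, event_name, callback):
        self._handlers[event_name] = callback

    def emit(self, event_name, payload):
        self._queue.append((event_name, payload))

    def run(self):
        # Take the next event and dispatch it to its callback, one at a time.
        while self._queue:
            event_name, payload = self._queue.popleft()
            self._handlers[event_name](payload)


def toy_loop_demo():
    log = []
    loop = ToyEventLoop()

    def on_request(body):
        log.append(f"got {body}")
        loop.emit("response", body.upper())  # callbacks may queue more events

    def on_response(body):
        log.append(f"sent {body}")

    loop.on("request", on_request)
    loop.on("response", on_response)
    loop.emit("request", "a")
    loop.emit("request", "b")
    loop.run()
    return log


if __name__ == "__main__":
    print(toy_loop_demo())  # ['got a', 'got b', 'sent A', 'sent B']
```

Every callback runs to completion before the next event is picked up, which is why a slow callback holds up everything queued behind it.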

How do they please the underlying OS Scheduler?

These loops are generally single-threaded, hence no context switch. Honestly, that’s a very badass way of pleasing the OS scheduler: just cut the problem at the root. The thread keeps running, monitoring a queue of events and executing them. One downside of this approach is that, since it is single-threaded, any CPU-bound work introduced in a callback will block further execution of the event loop, which degrades performance. We’ll address how to tackle it as part of the next section.

Nature of the load → IO Bound or CPU Bound?

In a single-threaded event loop, all I/O operations are handled by a single thread, which makes it easy to write asynchronous code that doesn't block the main thread. However, there are some cases where a single-threaded event loop may not be sufficient, such as when you need to perform CPU-bound tasks or when you need to handle a very large number of connections. Both of them provide a way of handing this CPU-bound work off to a separate OS-level thread, hence unblocking the event loop.
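
On the Python side, one way to do this is `loop.run_in_executor`, which runs a plain function on a thread from the loop's default pool while the loop keeps serving other tasks. The sketch below is only an illustration: `cpu_heavy`, `heartbeat` and `offload_demo` are made-up names, and the heartbeat task exists purely to show that the loop is still alive while the heavy work runs.

```python
import asyncio


def cpu_heavy(n):
    """CPU-bound stand-in: would freeze the loop if called directly."""
    return sum(i * i for i in range(n))


async def heartbeat(state):
    """Ticks only while the event loop is free to run it."""
    while not state["done"]:
        state["ticks"] += 1
        await asyncio.sleep(0.005)


async def offload(n):
    state = {"ticks": 0, "done": False}
    beat = asyncio.create_task(heartbeat(state))
    loop = asyncio.get_running_loop()
    # None selects the loop's default thread pool executor.
    result = await loop.run_in_executor(None, cpu_heavy, n)
    state["done"] = True
    await beat
    return result, state["ticks"]


def offload_demo(n):
    return asyncio.run(offload(n))


if __name__ == "__main__":
    result, ticks = offload_demo(2_000_000)
    print("result:", result, "| loop ticks while working:", ticks)
```

One caveat worth knowing: in CPython the global interpreter lock means a thread keeps the loop responsive but does not make pure-Python computation faster, so heavy number crunching is often sent to a process pool instead.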

Conclusion

We can take the same framework and apply it to other languages to see if they're up to snuff. Let's not forget that all these fancy languages have tricks up their sleeves to make our lives easier when it comes to writing concurrent code, but at a broader level this framework can help you build a foundational overview.

Now, here's the kicker: With great power comes great temptation! Sure, we can make concurrency seem like a piece of cake, but that doesn't mean you should just start throwing it into your code. You gotta do your research, benchmark your code, and understand what kind of work your application is doing before you start slapping on the concurrency. Otherwise, you'll end up shooting yourself in the foot and your app's performance will go down the toilet faster than you can say "context switch"!

References

Want to understand in depth how Go does its magic? Check this out
Kotlin Coroutines? Check this out
Some Good points for Node Eventloop - Check this out

Just a Startup Guy