Goals
Modern CPUs
- Not only x64. ARMs are coming!
- More cores, same speed, sad Moore :(
- Optimizations
Modern CPUs
- registers
- caches (L1, L2, L3)
- prefetcher
- branch predictor
- pipeline
- write buffer
Atomicity
- Do it in one step
- Lock'n'roll
- Claim and work.
Atomicity - Interlocked
- Atomic operations on CPU level
- They frequently compile down to a single machine instruction (e.g. LOCK XADD or LOCK CMPXCHG on x64)
Atomicity - Interlocked adding
- Interlocked.Add(ref counter, value)
- Interlocked.Increment(ref counter)
- Interlocked.Decrement(ref counter)
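A minimal sketch of the three calls above, assuming a shared `long` counter. The names `CounterDemo`, `CountAtomically` and `AddThenDecrement` are made up for illustration. Each `Interlocked` call returns the updated value, and no increment is lost however the parallel loops interleave.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class CounterDemo
{
    // Runs `workers` parallel loops; each increments the shared counter `perWorker` times.
    public static long CountAtomically(int workers, int perWorker)
    {
        long counter = 0;
        Parallel.For(0, workers, _ =>
        {
            for (int i = 0; i < perWorker; i++)
                Interlocked.Increment(ref counter); // atomic read-modify-write
        });
        return Interlocked.Read(ref counter);
    }

    // Add and Decrement both return the NEW value of the counter.
    public static long AddThenDecrement(long start, long delta)
    {
        long counter = start;
        Interlocked.Add(ref counter, delta);       // counter == start + delta
        return Interlocked.Decrement(ref counter); // returns start + delta - 1
    }
}
```

Replacing `Interlocked.Increment(ref counter)` with a plain `counter++` would likely make `CountAtomically` return less than `workers * perWorker`, because `++` is a separate load, add and store.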
Atomicity - Interlocked swapping
- Interlocked.Exchange(ref location, newValue)
- Atomically stores newValue in the location
- Returns the old value
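One way to read "claim and work" in terms of `Interlocked.Exchange` is a one-shot flag: every thread writes 1, but only the thread that gets 0 back has claimed the work. This is a sketch under that assumption; `OneShot` and `TryClaim` are hypothetical names.

```csharp
using System.Threading;

public sealed class OneShot
{
    private int _claimed; // 0 = free, 1 = claimed

    // Returns true for exactly one caller, however many threads race here:
    // Exchange hands the old value 0 to a single winner and 1 to everyone else.
    public bool TryClaim() => Interlocked.Exchange(ref _claimed, 1) == 0;
}
```

Usage is typically `if (oneShot.TryClaim()) { /* do the work once */ }`.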
Atomicity - Interlocked conditional swapping
- Interlocked.CompareExchange(ref location, newValue, valueToCompare)
- Stores newValue in the location only if the current value equals valueToCompare
- Always returns the old value; old value == valueToCompare means the swap happened
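`CompareExchange` is usually wrapped in a retry loop: read the current value, compute the new one, and publish it only if nobody changed the location in between. The sketch below uses that loop for an atomic "maximum" update; `AtomicMath.Max` is an illustrative helper, not a framework API.

```csharp
using System.Threading;

public static class AtomicMath
{
    // Raises `location` to `candidate` if candidate is larger; returns the resulting maximum.
    public static int Max(ref int location, int candidate)
    {
        int seen = Volatile.Read(ref location);
        while (candidate > seen)
        {
            // Swap only if the location still holds the value we based our decision on.
            int original = Interlocked.CompareExchange(ref location, candidate, seen);
            if (original == seen)
                return candidate; // our swap won
            seen = original;      // somebody else changed it; retry against the fresh value
        }
        return seen;
    }
}
```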
Memory models
- Consider two virtual operations: LOAD & STORE
- Four possible pairs of consecutive operations (first-second in program order), any of which the compiler or CPU may reorder:
  - LOAD-LOAD
  - LOAD-STORE
  - STORE-STORE
  - STORE-LOAD
Memory models - barriers
- Inserted between two LOAD/STORE operations
- A memory barrier is a mechanism that forbids some reorderings across it
- A full memory barrier prohibits ANY reordering across it
- A full memory barrier is emitted by:
  - lock (obj)
  - Interlocked methods (Add, Exchange, ...)
  - high-level constructs like WaitHandle
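The classic store-then-load litmus test is one way to see what a full barrier buys. Two threads each store to their own variable and then load the other's. Without a barrier, the STORE may be delayed past the LOAD, so both threads can read 0. With a full barrier between the two operations, that outcome is ruled out. This is a sketch: `StoreLoadLitmus` and its members are hypothetical, and `Thread.MemoryBarrier()` is used as an explicit full fence.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class StoreLoadLitmus
{
    private static int _x, _y, _r1, _r2;

    // Counts iterations in which BOTH threads read 0, which requires
    // a STORE to be reordered after the LOAD that follows it.
    public static int CountBothZero(int iterations, bool useFullBarrier)
    {
        int bothZero = 0;
        for (int i = 0; i < iterations; i++)
        {
            _x = 0;
            _y = 0;
            Parallel.Invoke(
                () =>
                {
                    _x = 1;                                     // STORE x
                    if (useFullBarrier) Thread.MemoryBarrier(); // forbids STORE-LOAD
                    _r1 = _y;                                   // LOAD y
                },
                () =>
                {
                    _y = 1;                                     // STORE y
                    if (useFullBarrier) Thread.MemoryBarrier();
                    _r2 = _x;                                   // LOAD x
                });
            if (_r1 == 0 && _r2 == 0) bothZero++;
        }
        return bothZero;
    }
}
```

With `useFullBarrier: true` the count stays at zero. With `false` it may or may not be non-zero on a given run, since the reordering depends on timing and hardware.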
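Memory models - barriers
- Acquire fence
  - Volatile.Read
  - Prohibits LOAD-LOAD
  - Prohibits LOAD-STORE
  - Later loads and stores cannot move before the read
- Release fence
  - Volatile.Write
  - Prohibits LOAD-STORE
  - Prohibits STORE-STORE
  - Earlier loads and stores cannot move after the write; STORE-LOAD still needs a full barrier
Memory models - allowed reorderings
| Name | LOAD-LOAD | LOAD-STORE | STORE-STORE | STORE-LOAD |
|---|---|---|---|---|
| Full barrier (lock) | | | | |
| Acquire fence | | | ✔ | ✔ |
| Release fence | ✔ | | | ✔ |
| No fence | ✔ | ✔ | ✔ | ✔ |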
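Acquire and release fences are normally used as a pair: the writer publishes data with `Volatile.Write`, the reader checks the flag with `Volatile.Read`. The sketch below assumes a single writer and uses hypothetical names (`Mailbox`, `Publish`, `TryRead`).

```csharp
using System.Threading;

public sealed class Mailbox
{
    private int _payload;
    private bool _ready;

    public void Publish(int value)
    {
        _payload = value;                 // plain STORE
        Volatile.Write(ref _ready, true); // release: the payload store cannot move below this write
    }

    public bool TryRead(out int value)
    {
        if (Volatile.Read(ref _ready))    // acquire: the payload load cannot move above this read
        {
            value = _payload;
            return true;
        }
        value = 0;
        return false;
    }
}
```

A reader that sees `_ready == true` is therefore guaranteed to see the payload written before it. No full barrier is needed here because the pattern never relies on a STORE-LOAD ordering.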
Structures & implementation
Summary
- You can write ultra performant code in .NET
- You can go low level with Volatile, Interlocked
- It's all there waiting for YOUR app to be performant!