Multiplayer Locks
Summary
The game thread and the network thread access shared data structures without proper synchronization. The resulting race conditions can cause operations to disappear, which in turn causes the game to desync.
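As a rough illustration of how an operation can disappear, the following sketch replays one possible interleaving step by step. The list and function names are hypothetical and the data layout is a simplified model, not P3's actual structures.

```python
# Hypothetical model of a lost operation; names are illustrative, not taken from P3.
current_operations = []   # shared between the "game" and "network" code paths
executed = []


def game_read():
    """Game thread, step 1: take the operations it is about to execute."""
    return list(current_operations)


def game_finish(batch):
    """Game thread, step 2: execute the batch, then clear the shared list."""
    executed.extend(batch)
    current_operations.clear()


def network_receive(op):
    """Network thread: append a newly received operation."""
    current_operations.append(op)


# One interleaving that is possible when nothing keeps the threads apart:
network_receive("op-1")
batch = game_read()        # the game thread sees only op-1
network_receive("op-2")    # the network thread slips in between the two steps
game_finish(batch)         # clears the list and discards op-2

print(executed)            # ['op-1']
print(current_operations)  # []
```

The second operation is never executed on this machine. If the other players do execute it, the game states diverge.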
Details
Although locks and even lock-free concurrent data structures were well understood in the mid-1980s, P3 does not use correct locks anywhere.
Execute Current Operations
The execute_operations function at 0x00546870 is not locking the current operations properly.
The basic block at 0x005468B3 attempts to lock the current operations:
This is not how locks work.
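To show why a flag that is read and written with ordinary stores cannot serve as a lock, here is a minimal sketch. It assumes the "lock" is a plain flag that is checked and set in separate steps. The FlagLock class is hypothetical and the interleaving is replayed by hand, so this is a model of the failure mode, not a transcription of the disassembly.

```python
import threading


class FlagLock:
    """Hypothetical non-atomic 'lock': a plain flag checked and set in two steps."""

    def __init__(self):
        self.locked = False

    def is_free(self):
        return not self.locked

    def mark_locked(self):
        self.locked = True


flag = FlagLock()

# Both threads run the check before either of them runs the store.
game_sees_free = flag.is_free()
network_sees_free = flag.is_free()
flag.mark_locked()   # game thread
flag.mark_locked()   # network thread

# Both threads now believe they own the data: there is no mutual exclusion.
both_entered = game_sees_free and network_sees_free

# A real lock performs the check and the store as one atomic step.
real = threading.Lock()
first = real.acquire(blocking=False)    # succeeds
second = real.acquire(blocking=False)   # fails, because the lock is already held
real.release()

print(both_entered, first, second)      # True True False
```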
Insert Pending Operations
The insert_into_pending_operations_warpper function at 0x0054AA70 is not locking the pending operations properly.
The basic blocks at 0x0054AA79 attempt to lock the pending operations:
This is not how locks work.
Client Ingress Queue
The function at 0x0054B080, which moves operations from the ingress queue and the socket into the current operations, is not locking the current operations properly at two locations.
The basic block at 0x0054B13F attempts to try-lock the current operations:
This is not how locks work.
The basic block at 0x0054B200 attempts to try-lock the current operations:
This is not how locks work.
Client Pending Operations
The function at 0x0054AFA0, which sends operations from the pending operations to the host, is not locking the pending operations properly:
This is certainly not how locks work.
Host Egress Queue
The function at 0x0054B670, which moves operations from the host's pending operations and the client sockets into the egress queue, is not locking the pending operations properly:
This is not how locks work.
Host Ingress Queue
The function at 0x0054B960, which moves operations from the host's egress and ingress queues into the current operations, is not locking the current operations properly:
This is not how locks work.
Fix
All bugs are fixed by the fix-multiplayer-locks mod.
Execute Current Operations
To fix the problem at 0x005468B3, the following changes have to be made:
- The "locking" basic block at 0x005468B3 must correctly lock the current operations. This can be achieved by inserting a call to a proper lock function.
- The "unlocking" basic block at 0x00547254 must correctly unlock the current operations. This can be achieved by inserting a call to a proper unlock function.
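The lock and unlock pairing described above can be sketched as follows. The names are hypothetical, and threading.Lock stands in for whatever lock primitive the patched code actually calls.

```python
import threading

current_operations_lock = threading.Lock()   # hypothetical name
current_operations = ["op-1", "op-2"]
executed = []


def execute_operations():
    """Run and clear the current operations while holding the lock."""
    current_operations_lock.acquire()        # proper lock call on entry
    try:
        executed.extend(current_operations)
        current_operations.clear()
    finally:
        current_operations_lock.release()    # proper unlock call on exit


def insert_operation(op):
    """Network side: append an operation under the same lock."""
    with current_operations_lock:
        current_operations.append(op)


network = threading.Thread(target=insert_operation, args=("op-3",))
network.start()
execute_operations()
network.join()
execute_operations()   # picks up op-3 if the first pass ran before the insert

print(executed)        # ['op-1', 'op-2', 'op-3']
```

Whichever thread acquires the lock first, the inserted operation is either part of the first batch or left intact for the next one. It is never dropped.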
Insert Pending Operations
To fix the problem at 0x0054AA79, the following changes have to be made:
- The "locking" basic blocks at 0x0054AA79 must correctly lock the pending operations. This can be achieved by inserting a call to a proper lock function.
- The "unlocking" basic block at 0x0054AAC2 must correctly unlock the pending operations. This can be achieved by inserting a call to a proper unlock function.
Client Ingress Queue
To fix the problem at 0x0054B13F, the following changes have to be made:
- The "try-locking" basic block at 0x0054B13F must correctly try-lock the current operations, and continue into the basic block at 0x0054B14F only if the lock was acquired. This can be achieved by replacing the two mov instructions with a call instruction to a proper try-lock function which returns the result.
- The "unlocking" basic block at 0x0054B198 must unlock the current operations only if they were locked by the basic block at 0x0054B13F. This can be achieved by replacing the mov instruction with a call instruction to a proper unlock function and making the jnz instruction target the next instruction after the call. The cmp instruction above it must be moved below it to ensure it always happens, so jnz must point to the moved cmp.
To fix the problem at 0x0054B200, the following changes have to be made:
- The "try-locking" basic block at 0x0054B200 must correctly try-lock the current operations, and continue into the basic block at 0x0054B210 only if the lock was acquired. This can be achieved by replacing the two mov instructions with a call instruction to a proper try-lock function which returns the result.
- The "unlocking" basic block at 0x0054B21C must unlock the current operations only if they were locked by the basic block at 0x0054B200. This can be achieved by replacing the mov instruction with a call instruction to a proper unlock function and making the jnz instruction target the next instruction after the call.
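The control flow required of both try-lock sites (enter the guarded block only if the lock was acquired, and unlock only in that case) can be sketched as follows. The names are hypothetical, and a non-blocking threading.Lock.acquire plays the role of the try-lock function that returns its result.

```python
import threading

current_operations_lock = threading.Lock()   # hypothetical name
current_operations = []
ingress_queue = ["op-1", "op-2"]


def drain_ingress_queue():
    """Move queued operations into current_operations if the lock is free.

    Returns True if the operations were moved, False if the lock was busy.
    """
    acquired = current_operations_lock.acquire(blocking=False)   # try-lock
    if not acquired:
        return False    # skip both the guarded block and the unlock
    try:
        current_operations.extend(ingress_queue)
        ingress_queue.clear()
    finally:
        current_operations_lock.release()   # reached only while holding the lock
    return True


# Case 1: the lock is held elsewhere, so nothing is moved and nothing is unlocked.
current_operations_lock.acquire()
busy_result = drain_ingress_queue()
still_held = current_operations_lock.locked()
current_operations_lock.release()

# Case 2: the lock is free, so the queue is drained under the lock.
free_result = drain_ingress_queue()

print(busy_result, still_held, free_result)   # False True True
```

Skipping the unlock on the failed path matters as much as taking the lock. Releasing a lock that another thread holds would let both threads into the guarded block.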
Client Pending Operations
To fix the problem at 0x0054AFB7, the following changes have to be made:
- The "locking" basic block at 0x0054AFB7 must correctly lock the pending operations. This can be achieved by replacing the entire block and its successor with a call instruction to a proper lock function.
- The two "unlocking" branches at 0x0054B049 and 0x0054B063 must unlock the pending operations. This can be achieved by replacing the respective mov instruction with a call instruction to a proper unlock function.
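Because this function leaves the locked region through two branches, each branch needs its own unlock. A minimal sketch of that shape follows. The names are hypothetical and the conditions that select each branch are assumed, since they are not described here.

```python
import threading

pending_operations_lock = threading.Lock()   # hypothetical name
pending_operations = []
sent_to_host = []


def send_pending_operations():
    """Send all pending operations to the host; return how many were sent."""
    pending_operations_lock.acquire()
    if not pending_operations:
        pending_operations_lock.release()    # first exit branch (assumed: nothing to send)
        return 0
    count = len(pending_operations)
    sent_to_host.extend(pending_operations)
    pending_operations.clear()
    pending_operations_lock.release()        # second exit branch (assumed: after sending)
    return count


empty_result = send_pending_operations()
unlocked_after_empty = not pending_operations_lock.locked()

pending_operations.extend(["op-1", "op-2"])
sent_result = send_pending_operations()
unlocked_after_send = not pending_operations_lock.locked()

print(empty_result, sent_result)   # 0 2
```

If either branch skipped its unlock, the next caller would block forever on the lock.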
Host Egress Queue
To fix the problem at 0x0054B90D, the following changes have to be made:
- The "locking" basic block at 0x0054B90D must correctly lock the pending operations. This can be achieved by inserting a call to a proper lock function.
- The "unlocking" instruction at 0x0054B949 must correctly unlock the pending operations. This can be achieved by inserting a call to a proper unlock function.
Host Ingress Queue
To fix the problem at 0x0054BCCB, the following changes have to be made:
- The "try-locking" basic block at 0x0054BCCB must correctly try-lock or lock the current operations, and continue into the basic block at 0x0054BCD9 only if the lock was acquired. This can be achieved by inserting a call to a proper lock function.
- The "unlocking" instruction at 0x0054BD2C must correctly unlock the current operations if they were locked. If the "try-lock" was replaced with a lock, this can be achieved by inserting a call to a proper unlock function.