Multiplayer Locks
Summary
The game thread and the network thread access shared data structures without proper synchronization. The resulting race conditions can cause operations to disappear, which desyncs the game.
Details
Despite locks and even lock-free concurrent data structures being well understood in the mid-1980s, P3 does not use correct locks anywhere.
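For contrast, the following is a minimal sketch of what correct synchronization between two threads looks like. It is not taken from P3 or from the mod; `Operations` and `run_both_threads` are hypothetical names, and the operations are reduced to plain integers. Two threads append to a shared list through a mutex, and no operation is lost.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a shared operations list.
type Operations = Arc<Mutex<Vec<u32>>>;

/// Spawns two threads (think "game" and "network") that each append
/// `per_thread` operations, and returns how many operations arrived.
fn run_both_threads(per_thread: u32) -> usize {
    let operations: Operations = Arc::new(Mutex::new(Vec::new()));

    let handles: Vec<_> = (0..2)
        .map(|thread_id| {
            let operations = Arc::clone(&operations);
            thread::spawn(move || {
                for i in 0..per_thread {
                    // The mutex is held for the duration of each push, so the
                    // two threads can never modify the list at the same time.
                    operations.lock().unwrap().push(thread_id * per_thread + i);
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let count = operations.lock().unwrap().len();
    count
}
```

With a correct lock the returned count always equals twice `per_thread`. Without one, concurrent appends can overwrite each other, which is the kind of disappearing operation described in the summary.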
Execute Current Operations
The execute_operations function at 0x00546870 is not locking the current operations properly.
The basic block at 0x005468B3 attempts to lock the current operations:
This is not how locks work.
Insert Pending Operations
The insert_into_pending_operations_warpper function at 0x0054AA70 is not locking the pending operations properly.
The basic blocks at 0x0054AA79 attempt to lock the pending operations:
This is not how locks work.
Client Ingress Queue
The function at 0x0054B080 which moves operations from the ingress queue and the socket into the current operations is not locking the current operations properly at two locations.
The basic block at 0x0054B13F attempts to try-lock the current operations:
This is not how locks work.
The basic block at 0x0054B200 attempts to try-lock the current operations:
This is not how locks work.
Client Pending Operations
The function at 0x0054AFA0 which sends operations from the pending operations to the host is not locking the pending operations properly:
This is certainly not how locks work.
Host Egress Queue
The function at 0x0054B670 which moves operations from the host's pending operations and the client sockets into the egress queue is not locking the pending operations properly:
This is not how locks work.
Host Ingress Queue
The function at 0x0054B960 which moves operations from the host's egress and ingress queues into the current operations is not locking the current operations properly:
This is not how locks work.
Fix
All bugs are fixed by the fix-multiplayer-locks mod.
Execute Current Operations
To fix the problem at 0x005468B3 the following changes have to be made:
- The "locking" basic blocks at 0x005468B3 must correctly lock the current operations. This can be achieved by inserting a call to a proper lock function.
- The "unlocking" basic block at 0x00547254 must correctly unlock the current operations. This can be achieved by inserting a call to a proper unlock function.
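As an illustration of what a "proper" lock and unlock function pair could look like, here is a minimal spin lock sketch. It assumes the lock state is a single flag; `OperationsLock` and its methods are hypothetical names and do not describe the functions used by the fix-multiplayer-locks mod.

```rust
use std::hint;
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical lock flag guarding a shared operations structure.
pub struct OperationsLock {
    locked: AtomicBool,
}

impl OperationsLock {
    pub const fn new() -> Self {
        Self { locked: AtomicBool::new(false) }
    }

    /// Spins until the lock has been acquired by this caller.
    pub fn lock(&self) {
        // The compare-exchange tests and sets the flag in one atomic step,
        // so two threads can never both observe "unlocked" and proceed.
        while self
            .locked
            .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            hint::spin_loop();
        }
    }

    /// Releases the lock. Must only be called by the current holder.
    pub fn unlock(&self) {
        self.locked.store(false, Ordering::Release);
    }

    /// Reports the flag state, for inspection only.
    pub fn is_locked(&self) -> bool {
        self.locked.load(Ordering::Relaxed)
    }
}
```

The important property is that checking and setting the flag happen as one indivisible operation. A separate read followed by a separate write leaves a window in which the other thread can slip in.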
Insert Pending Operations
To fix the problem at 0x0054AA79 the following changes have to be made:
- The "locking" basic blocks at 0x0054AA79 must correctly lock the pending operations. This can be achieved by inserting a call to a proper lock function.
- The "unlocking" basic block at 0x0054AAC2 must correctly unlock the pending operations. This can be achieved by inserting a call to a proper unlock function.
Client Ingress Queue
To fix the problem at 0x0054B13F the following changes have to be made:
- The "try-locking" basic block at 0x0054B13F must correctly try-lock the current operations, and continue into the basic block at 0x0054B14F only if the lock was acquired. This can be achieved by replacing the two mov instructions with a call instruction to a proper try-lock function which returns the result.
- The "unlocking" basic block at 0x0054B198 must unlock the current operations only if they were locked by the basic block at 0x0054B13F. This can be achieved by replacing the mov instruction with a call instruction to a proper unlock function and making the jnz instruction target the next instruction after the call. The cmp instruction above it must be moved below it to ensure it always happens, so jnz must point to the moved cmp.
To fix the problem at 0x0054B200 the following changes have to be made:
- The "try-locking" basic block at 0x0054B200 must correctly try-lock the current operations, and continue into the basic block at 0x0054B210 only if the lock was acquired. This can be achieved by replacing the two mov instructions with a call instruction to a proper try-lock function which returns the result.
- The "unlocking" basic block at 0x0054B21C must unlock the current operations only if they were locked by the basic block at 0x0054B200. This can be achieved by replacing the mov instruction with a call instruction to a proper unlock function and making the jnz instruction target the next instruction after the call.
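The try-lock contract described above (proceed only if the lock was acquired, unlock only if this caller acquired it) can be sketched as follows. This is a hedged illustration, not the mod's code: `try_lock`, `unlock`, and `drain_if_unlocked` are hypothetical names, and the queues are modeled as plain vectors of integers.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Returns true only if this call changed the flag from unlocked to locked.
fn try_lock(flag: &AtomicBool) -> bool {
    flag.compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
        .is_ok()
}

/// Releases the flag. Must only be called after a successful `try_lock`.
fn unlock(flag: &AtomicBool) {
    flag.store(false, Ordering::Release);
}

/// Hypothetical processing step: moves queued operations into the current
/// operations, but only when the lock could be acquired.
fn drain_if_unlocked(flag: &AtomicBool, ingress: &mut Vec<u32>, current: &mut Vec<u32>) -> bool {
    if !try_lock(flag) {
        // Someone else holds the lock: skip the work and, crucially,
        // do not release a lock this caller never acquired.
        return false;
    }
    current.append(ingress);
    unlock(flag);
    true
}
```

Note that the skip path returns without touching the flag. Unconditionally clearing the flag on every exit would release a lock held by the other thread.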
Client Pending Operations
To fix the problem at 0x0054AFB7 the following changes have to be made:
- The "locking" basic block at 0x0054AFB7 must correctly lock the pending operations. This can be achieved by replacing the entire block and its successor with a call instruction to a proper lock function.
- The two "unlocking" branches at 0x0054B049 and 0x0054B063 must unlock the pending operations. This can be achieved by replacing the respective mov instruction with a call instruction to a proper unlock function.
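When a function has more than one exit path, every path has to release the lock. In a binary patch that means one unlock call per branch, as described above. In source code the same requirement is often expressed with a scope guard. The sketch below uses Rust's standard `Mutex` for that purpose; `send_pending` is a hypothetical function and the operations are plain integers.

```rust
use std::sync::Mutex;

/// Hypothetical send step with two exit paths. Both paths release the lock,
/// because the guard unlocks the mutex when it goes out of scope.
fn send_pending(pending: &Mutex<Vec<u32>>, sent: &mut Vec<u32>) -> bool {
    let mut guard = pending.lock().unwrap();

    if guard.is_empty() {
        // First exit path: nothing to send. The guard is dropped here.
        return false;
    }

    // Move every pending operation into the outgoing list while the lock is held.
    sent.extend(guard.drain(..));

    // Second exit path: the guard is dropped at the end of the function.
    true
}
```

After either return, the mutex is free again and another thread can acquire it.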
Host Egress Queue
To fix the problem at 0x0054B90D the following changes have to be made:
- The "locking" basic block at 0x0054B90D must correctly lock the pending operations. This can be achieved by inserting a call to a proper lock function.
- The "unlocking" instruction at 0x0054B949 must correctly unlock the pending operations. This can be achieved by inserting a call to a proper unlock function.
Host Ingress Queue
To fix the problem at 0x0054BCCB the following changes have to be made:
- The "try-locking" basic block at 0x0054BCCB must correctly try-lock or lock the current operations, and continue into the basic block at 0x0054BCD9 only if the lock was acquired. This can be achieved by inserting a call to a proper lock function.
- The "unlocking" instruction at 0x0054BD2C must correctly unlock the current operations if they were locked. If the "try-lock" was replaced with a lock, this can be achieved by inserting a call to a proper unlock function.