Since all Nuitka/Cython programs rely on python3x.dll (Windows) or libpython3.x.so (Linux), they keep ordinary low-level Python object structures (PyObject *) in memory, which means they can easily be injected into with standard DLL injection tools.
In addition, every Python object exposes its behavior through magic methods (e.g., `__add__`). So if I create a proxy object that wraps a target object, then whenever any magic method of the proxy is called, the proxy logs the call, invokes the same method on the target object, and returns the result recursively wrapped in another proxy, so the caller still sees a normal-looking object. Raw Python code can then be generated from these call logs.
This is the principle behind my `pyobject.objproxy` library and its high-level wrapper `pymodhook`, which can be installed via `pip install pymodhook`.
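The recording idea can be illustrated with a minimal sketch that uses only the standard library. This is not `pyobject`'s implementation: `CallLog`, `Proxy`, `_wrap`, `_unwrap`, `_render` and `_binary` are hypothetical names, and only attribute access, calls, `__str__` and three operators (`+`, `*`, `**`) are intercepted.

```python
import math
import operator


class CallLog:
    """Collects one line of generated Python source per recorded operation."""

    def __init__(self, header=()):
        self.lines = list(header)
        self._counter = 0

    def new_var(self):
        name = f"var{self._counter}"
        self._counter += 1
        return name


def _unwrap(value):
    # Pass the real object, not the proxy, to the wrapped code
    return value._target if isinstance(value, Proxy) else value


def _render(value):
    # Proxies appear in the log by variable name, plain values by repr()
    return value._name if isinstance(value, Proxy) else repr(value)


class Proxy:
    def __init__(self, target, name, log):
        self._target = target  # the real object
        self._name = name      # how the generated code refers to it
        self._log = log

    def _wrap(self, value, expr):
        # Log "varN = <expr>" and return the result wrapped in a new proxy
        var = self._log.new_var()
        self._log.lines.append(f"{var} = {expr}")
        return Proxy(value, var, self._log)

    def __getattr__(self, attr):
        # Only reached for attributes Proxy itself lacks, e.g. math.sqrt
        if attr in ("_target", "_name", "_log"):
            raise AttributeError(attr)
        return self._wrap(getattr(self._target, attr), f"{self._name}.{attr}")

    def __call__(self, *args, **kwargs):
        result = self._target(*[_unwrap(a) for a in args],
                              **{k: _unwrap(v) for k, v in kwargs.items()})
        src = [_render(a) for a in args]
        src += [f"{k}={_render(v)}" for k, v in kwargs.items()]
        return self._wrap(result, f"{self._name}({', '.join(src)})")

    def __str__(self):
        # __str__ must return a real str, so the value leaves the proxy here
        self._log.lines.append(f"ex_{self._log.new_var()} = str({self._name})")
        return str(self._target)


def _binary(symbol, op):
    def method(self, other):
        result = op(self._target, _unwrap(other))
        return self._wrap(result, f"{self._name} {symbol} {_render(other)}")
    return method


# Implicit special-method lookup skips __getattr__, so each operator
# has to be installed on the class explicitly.
for _dunder, _symbol, _op in [("__add__", "+", operator.add),
                              ("__mul__", "*", operator.mul),
                              ("__pow__", "**", operator.pow)]:
    setattr(Proxy, _dunder, _binary(_symbol, _op))


log = CallLog(header=["import math"])
m = Proxy(math, "math", log)
x = m.sqrt(16)   # logs: var0 = math.sqrt / var1 = var0(16)
y = x * 2 + 1    # logs: var2 = var1 * 2 / var3 = var2 + 1
shown = str(y)   # logs: ex_var4 = str(var3), returns "9.0"
print("\n".join(log.lines))
```

The design point worth noting is the loop at the end: because Python looks up operators on the type rather than going through `__getattr__`, a complete proxy has to enumerate every magic method it wants to record, which is also why the log above contains calls like `iter(...)` and `str(...)`.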
An example hooking `numpy` and `matplotlib`:
```python
from pyobject import ObjChain
chain = ObjChain(export_attrs=["__array_struct__"])
np = chain.new_object("import numpy as np", "np")
plt = chain.new_object("import matplotlib.pyplot as plt", "plt",
                       export_funcs=["show"])
# Testing the pseudo numpy and matplotlib modules
arr = np.array(range(1, 11))
arr_squared = arr ** 2
print(np.mean(arr))
plt.plot(arr, arr_squared)
plt.show()
# Display the auto-generated code calling numpy and matplotlib libraries
print(f"Code:\n{chain.get_code()}\n")
print(f"Optimized:\n{chain.get_optimized_code()}")
```
The output:
```
Code: # Unoptimized code that contains all detailed access records for objects
import numpy as np
import matplotlib.pyplot as plt
var0 = np.array
var1 = var0(range(1, 11))
var2 = var1 ** 2
var3 = np.mean
var4 = var3(var1)
var5 = var1.mean
var6 = var5(axis=None, dtype=None, out=None)
ex_var7 = str(var4)
var8 = plt.plot
var9 = var8(var1, var2)
var10 = var1.to_numpy
var11 = var1.values
var12 = var1.shape
var13 = var1.ndim
...
var81 = var67.__array_struct__
ex_var82 = iter(var70)
ex_var83 = iter(var70)
var84 = var70.mask
var85 = var70.__array_struct__
var86 = plt.show
var87 = var86()

Optimized: # Optimized code
import numpy as np
import matplotlib.pyplot as plt
var1 = np.array(range(1, 11))
plt.plot(var1, var1 ** 2)
plt.show()
```
Although the code generated from the raw call logs is cluttered, much like the output of IDA Pro, it can be simplified with a DAG-based optimization algorithm (see README.md for details).
Note, however, that programs hooked by the current version of my `pyobject.objproxy` library run 40x slower than normal (as measured by `python -m pyobject.tests.test_objproxy_perf`).
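Much of this kind of overhead presumably comes from every forwarded operation running Python-level code (logging plus re-wrapping the result) instead of a single C-level operation. A rough way to observe the effect with only the standard library is sketched below; `AddLogger`, `plain` and `proxied` are hypothetical, and the ratio it prints depends on the machine and is not the figure from `pyobject`'s own test.

```python
import timeit


class AddLogger:
    """Hypothetical one-operator proxy: logs each + and re-wraps the result."""

    __slots__ = ("target", "log")

    def __init__(self, target, log):
        self.target = target
        self.log = log

    def __add__(self, other):
        self.log.append(f"{self.target!r} + {other!r}")
        return AddLogger(self.target + other, self.log)


def plain(n=1000):
    total = 0
    for i in range(n):
        total = total + i
    return total


def proxied(n=1000):
    total = AddLogger(0, [])
    for i in range(n):
        total = total + i  # dispatches to AddLogger.__add__
    return total.target


if __name__ == "__main__":
    t_plain = timeit.timeit(plain, number=200)
    t_proxied = timeit.timeit(proxied, number=200)
    print(f"proxied / plain: {t_proxied / t_plain:.1f}x")
```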
For DLL injection, the injected DLL first looks for an already-loaded Python DLL, trying names from `python31.dll` to `python332.dll` (e.g. `python313.dll`), and then calls `PyImport_ImportModule("__hook__")`. This requires `__hook__.py` and any other modules it needs to be placed in the same directory as the EXE beforehand.
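The injected DLL itself is native code. As a hedged illustration only, the sketch below mimics its two steps from Python with `ctypes`: it builds the candidate names `python31.dll` to `python332.dll`, asks Windows (`GetModuleHandleW`) which one is already loaded in the current process, and calls `PyImport_ImportModule` through `ctypes.pythonapi`. The function names are hypothetical, and the demo imports `json` rather than `__hook__`, which would have to exist on the import path.

```python
import ctypes
import sys


def candidate_dll_names():
    """python31.dll ... python332.dll, i.e. minor versions 1 through 32."""
    return [f"python3{minor}.dll" for minor in range(1, 33)]


def find_loaded_python_dll():
    """Return the first candidate DLL already loaded in this process (Windows only)."""
    if sys.platform != "win32":
        return None
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetModuleHandleW.restype = ctypes.c_void_p
    kernel32.GetModuleHandleW.argtypes = [ctypes.c_wchar_p]
    for name in candidate_dll_names():
        # GetModuleHandleW returns a handle only for modules that are already loaded
        if kernel32.GetModuleHandleW(name):
            return name
    return None


def import_via_c_api(module_name):
    """Call PyImport_ImportModule the way native code would.

    ctypes.pythonapi keeps the GIL held during the call; a native thread
    created by an injector would need PyGILState_Ensure() first.
    """
    func = ctypes.pythonapi.PyImport_ImportModule
    func.argtypes = [ctypes.c_char_p]
    func.restype = ctypes.py_object
    return func(module_name.encode())


if __name__ == "__main__":
    print("loaded Python DLL:", find_loaded_python_dll())
    print(import_via_c_api("json"))  # an injector would pass "__hook__" here
```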
For usage instructions for this toolchain, see the README.md of the `pymodhook` library.
Note: I am the developer of this reverse engineering toolchain.