
Because every Nuitka- or Cython-compiled program still depends on python3x.dll (Windows) or libpython3.x.so (Linux), it keeps ordinary low-level Python object structures (PyObject *) in memory. This makes such programs easy targets for DLL injection tools.

On the other hand, all Python objects expose magic methods (e.g., __add__). So I can create a proxy object that wraps a target object. Whenever a magic method of the proxy is called, the proxy logs the call, invokes the same method on the target object, and returns the result wrapped in a new proxy, so results are proxied recursively while still behaving like normal objects. Raw Python code can then be generated from these call logs.

This is the principle behind my pyobject.objproxy library and its high-level wrapper pymodhook, which can be installed via pip install pymodhook.
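To make the proxy idea described above concrete, here is a minimal sketch of a logging proxy. It is not the pyobject.objproxy implementation: MiniChain and MiniProxy are hypothetical names invented for this illustration, and only attribute access, calls, and __add__ are covered.

```python
import itertools
import math


class MiniChain:
    """Hypothetical helper: collects generated source lines and hands out names."""

    def __init__(self):
        self.lines = []
        self._ids = itertools.count()

    def new_name(self):
        return f"var{next(self._ids)}"

    def wrap(self, target, name):
        return MiniProxy(self, target, name)


def _unwrap(value):
    # Pass the real object to the target, never the proxy.
    return value._target if isinstance(value, MiniProxy) else value


def _source(value):
    # How a value is spelled in the generated code.
    return value._name if isinstance(value, MiniProxy) else repr(value)


class MiniProxy:
    def __init__(self, chain, target, name):
        self._chain, self._target, self._name = chain, target, name

    def _record(self, expr, result):
        # Log the operation, then proxy the result so later use is logged too.
        name = self._chain.new_name()
        self._chain.lines.append(f"{name} = {expr}")
        return MiniProxy(self._chain, result, name)

    def __getattr__(self, attr):
        # Only reached for names that are not found on the proxy itself.
        if attr.startswith("_"):
            raise AttributeError(attr)
        return self._record(f"{self._name}.{attr}", getattr(self._target, attr))

    def __call__(self, *args, **kwargs):
        parts = [_source(a) for a in args]
        parts += [f"{k}={_source(v)}" for k, v in kwargs.items()]
        result = self._target(*[_unwrap(a) for a in args],
                              **{k: _unwrap(v) for k, v in kwargs.items()})
        return self._record(f"{self._name}({', '.join(parts)})", result)

    def __add__(self, other):
        return self._record(f"{self._name} + {_source(other)}",
                            self._target + _unwrap(other))


chain = MiniChain()
chain.lines.append("import math")
m = chain.wrap(math, "math")
result = m.sqrt(16) + 1          # every step is logged as one statement
print("\n".join(chain.lines))
```

The printed lines are themselves valid Python that replays the same operations on the real math module. A real implementation has to cover the full set of magic methods.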

An example hooking numpy and matplotlib:

from pyobject import ObjChain

chain = ObjChain(export_attrs=["__array_struct__"])
np = chain.new_object("import numpy as np", "np")
plt = chain.new_object("import matplotlib.pyplot as plt", "plt",
                        export_funcs=["show"])

# Testing the pseudo numpy and matplotlib modules
arr = np.array(range(1, 11))
arr_squared = arr ** 2
print(np.mean(arr))

plt.plot(arr, arr_squared)
plt.show()

# Display the auto-generated code calling numpy and matplotlib libraries
print(f"Code:\n{chain.get_code()}\n")
print(f"Optimized:\n{chain.get_optimized_code()}")

The output:

Code: # Unoptimized code that contains all detailed access records for objects
import numpy as np
import matplotlib.pyplot as plt
var0 = np.array
var1 = var0(range(1, 11))
var2 = var1 ** 2
var3 = np.mean
var4 = var3(var1)
var5 = var1.mean
var6 = var5(axis=None, dtype=None, out=None)
ex_var7 = str(var4)
var8 = plt.plot
var9 = var8(var1, var2)
var10 = var1.to_numpy
var11 = var1.values
var12 = var1.shape
var13 = var1.ndim
...
var81 = var67.__array_struct__
ex_var82 = iter(var70)
ex_var83 = iter(var70)
var84 = var70.mask
var85 = var70.__array_struct__
var86 = plt.show
var87 = var86()

Optimized: # Optimized code
import numpy as np
import matplotlib.pyplot as plt
var1 = np.array(range(1, 11))
plt.plot(var1, var1 ** 2)
plt.show()

Although the code generated from the raw call logs is cluttered, much like decompiler output from IDA Pro, it can be optimized with a DAG-based algorithm (see README.md for details).
Note also that programs hooked by the current pyobject.objproxy library run about 40x slower than normal (as measured by python -m pyobject.tests.test_objproxy_perf).
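As a rough illustration of why a dependency graph helps here, the sketch below prunes a call log down to the statements that a chosen "root" statement actually depends on. This is a simplified stand-in, not the algorithm from README.md: the prune function, the (target, expression) record format, and the hand-picked root are all assumptions made for this example, and no inlining of single-use variables is attempted.

```python
import re

# Matches generated variable names such as var12 or ex_var7.
VAR = re.compile(r"\b(?:ex_)?var\d+\b")


def prune(statements, roots):
    """Keep only statements reachable from `roots` in the dependency graph.

    statements: list of (target, expression) pairs in execution order.
    roots: target names whose statements must be kept (e.g. calls with
           visible side effects); choosing them is up to the caller.
    """
    # Edge target -> variables its expression reads.
    deps = {target: set(VAR.findall(expr)) for target, expr in statements}
    keep, stack = set(), list(roots)
    while stack:
        name = stack.pop()
        if name in keep:
            continue
        keep.add(name)
        stack.extend(deps.get(name, ()))
    # Emit survivors in their original order so the code still runs.
    return [f"{t} = {e}" for t, e in statements if t in keep]


# A few statements taken from the unoptimized output above.
log = [
    ("var0", "np.array"),
    ("var1", "var0(range(1, 11))"),
    ("var2", "var1 ** 2"),
    ("var3", "np.mean"),
    ("var4", "var3(var1)"),
    ("var8", "plt.plot"),
    ("var9", "var8(var1, var2)"),
    ("var12", "var1.shape"),
]
pruned = prune(log, roots=["var9"])   # treat the plot call as the root
print("\n".join(pruned))
```

With the plot call as the only root, the np.mean lookup and the var1.shape access drop out because nothing that is kept reads them.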

For DLL injection, the injected DLL first looks for an already loaded Python DLL, trying names from python31.dll to python332.dll (e.g. python313.dll), and then calls PyImport_ImportModule("__hook__"). For this to work, __hook__.py and the modules it needs must be placed in the same directory as the EXE beforehand.
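The lookup order described above can be sketched in Python as follows. The real injected DLL is native code, so this only illustrates the name range and the probing step. candidate_dll_names and find_loaded_python_dll are hypothetical helpers, the 1..32 range is read directly from the sentence above, and using GetModuleHandleW for the probe is an assumption about how such a check could be done.

```python
import ctypes
import sys


def candidate_dll_names(first_minor=1, last_minor=32):
    """Names to probe: python31.dll, python32.dll, ..., python332.dll."""
    return [f"python3{minor}.dll" for minor in range(first_minor, last_minor + 1)]


def find_loaded_python_dll():
    """Return the first candidate already loaded in this process, else None.

    Only meaningful on Windows; other platforms always get None.
    """
    if sys.platform != "win32":
        return None
    get_handle = ctypes.windll.kernel32.GetModuleHandleW
    get_handle.argtypes = [ctypes.c_wchar_p]
    get_handle.restype = ctypes.c_void_p   # avoid truncating 64-bit handles
    for name in candidate_dll_names():
        if get_handle(name):               # NULL handle means "not loaded"
            return name
    return None


names = candidate_dll_names()
print(names[0], "...", names[-1])
print("loaded:", find_loaded_python_dll())
```

Once the module handle is known, the native code can resolve PyImport_ImportModule from it and import __hook__.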

For usage instructions for this toolchain, see the README.md of the pymodhook library.
Note: I am the developer of this reverse engineering toolchain.
