Cython is a great language/tool for optimizing and speeding up Python code. But managing Cython projects can be a bit tedious, as Cython code needs to be compiled ahead-of-time. It is possible to set up a workflow for Cython projects with Pyximport so that each Cython module is automatically compiled before being imported. But Pyximport can be finicky. I have created a quick script that patches Cython's Pyximport for more robust auto-compilation of Cython modules (see the GitHub repository).
Cython is a language that extends Python for creating compiled, optimized Python extension modules. In many circumstances, Cython can make existing Python code run almost as fast as C. Cython users achieve this by adding type information to variables and compiling the code ahead-of-time. Because type information is optional, optimizations can be applied incrementally (and only where needed) to existing Python code, rather than rewriting that code entirely in another language such as C++.
The standard way of compiling Cython code is to create and run a "setup.py" file. But this method can be tedious for large and/or fast-moving projects. Instead, Cython users can use Pyximport to have their Cython modules compiled automatically, as needed, right before those modules are imported by other modules at run-time. Pyximport is best suited to Cython projects that are still in development: it makes modifying and importing Cython modules as easy as modifying and importing Python modules. However, Pyximport can fail to recompile certain files because of how Cython decides when a file needs recompiling. This can leave your project running stale code, with no visible warning that this is happening. My patch for Pyximport makes auto-compilation more robust mainly by doing two things: (1) storing compiled extension modules inside each project's directory, so that modules with the same name in different projects no longer overwrite each other, and (2) recording the size and modification time of every module source file and its dependencies, so that any change to those files forces a recompilation.
A pitfall of Cython's Pyximport is that one Cython module can prevent the compilation of another module with the same name, leading to unexpected behavior. This can be explained with the following example. Consider a setup like this:
|-- projectA
|   |-- __main__.py
|   |-- my_module.pyx
|-- projectB
|   |-- __main__.py
|   |-- my_module.pyx
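Why can one project interfere with the other? By default, Pyximport writes its build output to a single ".pyxbld" directory in the user's home directory, shared by every project. The sketch below is a deliberately simplified model, not Pyximport's real path logic: it assumes that a module's build output is located only by the build directory and the module name (the hypothetical `output_path` helper), which is enough to show two projects' "my_module" builds landing on the same file.

```python
from pathlib import PurePosixPath

# Hypothetical, simplified model (an assumption, not Pyximport's code):
# build output is located only by the build directory and the module name,
# so the project a module came from plays no part in the path.
def output_path(build_dir: str, module_name: str) -> PurePosixPath:
    return PurePosixPath(build_dir) / f"{module_name}.c"

shared_build_dir = "/home/user/.pyxbld"  # one directory for every project
build_outputs = {}  # path -> which source file produced the output there

for source in ("projectA/my_module.pyx", "projectB/my_module.pyx"):
    path = output_path(shared_build_dir, "my_module")
    build_outputs[path] = source  # the second build overwrites the first

print(len(build_outputs), build_outputs[output_path(shared_build_dir, "my_module")])
# 1 projectB/my_module.pyx
```

Under this assumption, whichever project is built last owns the shared output, and a later timestamp check for the other project compares against a file generated from the wrong source.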
The contents of both "__main__.py" files are:
The contents of project A's "my_module.pyx" are:
And the contents of project B's "my_module.pyx" are:
Then we can see problems with Cython's Pyximport with this procedure:
My patch for Pyximport stores the compiled extension modules inside the project directory, so that compiling extension modules for one project will not overwrite the extension modules of other projects. The simplest way to use my patch is to copy the patch file into the project directory. For our examples, we would end up with a setup like this:
|-- projectA
|   |-- __main__.py
|   |-- my_module.pyx
|   |-- patched_pyximport.py
|-- projectB
|   |-- __main__.py
|   |-- my_module.pyx
|   |-- patched_pyximport.py
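For comparison, stock Pyximport already lets you move its build output: `pyximport.install()` accepts a `build_dir` argument (by default, builds go to ".pyxbld" in the user's home directory). The sketch below is not the author's patch; it is a minimal alternative, assuming each project's entry point sits in the project directory, that points `build_dir` at a hypothetical ".pyxbld" folder inside that directory. It gives each project its own build output, but it does nothing about the modification-time problem discussed below.

```python
import os

def project_build_dir(entry_file: str) -> str:
    """Return a per-project build directory next to the entry-point file.

    The ".pyxbld" folder name is a hypothetical choice for this sketch.
    """
    project_dir = os.path.dirname(os.path.abspath(entry_file))
    return os.path.join(project_dir, ".pyxbld")

def install_per_project(entry_file: str) -> None:
    """Install Pyximport with a build directory private to one project."""
    import pyximport  # imported lazily so this sketch loads without Cython

    pyximport.install(build_dir=project_build_dir(entry_file))
```

In "projectA/__main__.py", one would call `install_per_project(__file__)` before `import my_module`; "projectB" would then build into its own folder instead of the shared one.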
The "patched_pyximport.py" file can be found in the GitHub repository. Now we just edit both "__main__.py" files to use the new patched_pyximport:
Following the previous procedure, my patch for Pyximport fixes the name collision problem:
Cython's Pyximport uses "cythonize", which by default only recompiles a module if its ".pyx" file or any of its dependencies has changed. However, it decides whether a file has changed by looking only at the file's modification time: if the ".pyx" file's modification time is later than that of the resulting ".c" file, Cython recompiles the module. But what if a ".pyx" file has changed, yet its modification time is not later than the current ".c" file's modification time? Then Cython will not recompile the ".pyx" file, and will instead import the previous version of the extension module, leading to unexpected behavior.
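The timestamp rule can be illustrated in a few lines. The sketch below is a simplified model of that decision, not Cython's actual implementation: the helper name `would_recompile` is hypothetical, dependencies are ignored, and explicit timestamps are set with `os.utime` instead of waiting between edits. It reproduces the problematic case of a ".pyx" file whose contents changed but whose timestamp is older than the ".c" file.

```python
import os
import tempfile

T0 = 1_700_000_000  # arbitrary base timestamp (seconds since the epoch)

def would_recompile(pyx_path: str, c_path: str) -> bool:
    """Simplified timestamp rule: rebuild only if the .pyx is newer than the .c."""
    if not os.path.exists(c_path):
        return True  # nothing has been generated yet
    return os.path.getmtime(pyx_path) > os.path.getmtime(c_path)

with tempfile.TemporaryDirectory() as tmp:
    pyx = os.path.join(tmp, "my_module.pyx")
    c_file = os.path.join(tmp, "my_module.c")
    with open(pyx, "w") as f:
        f.write('print("This project is great")\n')
    with open(c_file, "w") as f:
        f.write("/* generated */\n")

    # Normal edit: the .pyx (T0 + 200) is newer than the .c (T0 + 100).
    os.utime(c_file, (T0 + 100, T0 + 100))
    os.utime(pyx, (T0 + 200, T0 + 200))
    normal_edit = would_recompile(pyx, c_file)

    # New contents, but an older timestamp (T0), e.g. after moving a file in.
    with open(pyx, "w") as f:
        f.write('print("This project is good")\n')
    os.utime(pyx, (T0, T0))
    stale_edit = would_recompile(pyx, c_file)

print(normal_edit, stale_edit)  # True False
```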
This situation is not hard to run into, for example when moving files around. According to this StackOverflow answer, "If you are moving a file or directory within a filesystem, the timestamps on that file or directory won't change." So imagine a project set up like this:
|-- projectC
|   |-- __main__.py
|   |-- my_module.pyx
|-- old_projectC
The "__main__.py" and "my_module.pyx" files are similar (but not identical) to those in the name-collision example above:
Now the procedure is to:
1. Copy the current version of "projectC/my_module.pyx" into the "old_projectC" directory. Now the file structure looks like this:
|-- projectC
|   |-- __main__.py
|   |-- my_module.pyx
|-- old_projectC
|   |-- my_module.pyx
2. Run "projectC/__main__.py"
This will compile and import "projectC/my_module.pyx".
Desired Output: This project is great
Resulting Output: This project is great
3. Change "projectC/my_module.pyx" like so:
4. Now rerun "projectC/__main__.py"
Since "projectC/my_module.pyx" was changed, Cython will recompile and import it.
Desired Output: This project is good
Resulting Output: This project is good
5. Now restore the old version of the project. Delete "projectC/my_module.pyx" and then move "old_projectC/my_module.pyx" into "projectC". Depending on your system, the moved "my_module.pyx" file will keep its old modification time (from step 1).
6. Now rerun "projectC/__main__.py".
The "projectC/my_module.pyx" file was "changed", but Cython will not recompile it, since its modification time is still older than that of the ".c" file generated in step 4. Instead, the last version of the extension module will be imported. This is not what we want.
Desired Output: This project is great
Resulting Output: This project is good
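Steps 1 to 6 can be replayed at the filesystem level, without Cython, to see the timestamps involved. This sketch makes several assumptions: explicit timestamps are pinned with `os.utime` instead of waiting between steps, an empty placeholder named "my_module.c" stands in for Cython's generated C file, and the move in step 5 is done with `os.replace`, which keeps the file's modification time when source and destination are on the same filesystem.

```python
import os
import tempfile

T0 = 1_700_000_000  # arbitrary base timestamp (seconds since the epoch)

def write(path: str, text: str, mtime: float) -> None:
    """Write a file, then pin its modification time for the demo."""
    with open(path, "w") as f:
        f.write(text)
    os.utime(path, (mtime, mtime))

with tempfile.TemporaryDirectory() as root:
    project = os.path.join(root, "projectC")
    backup = os.path.join(root, "old_projectC")
    os.makedirs(project)
    os.makedirs(backup)
    pyx = os.path.join(project, "my_module.pyx")
    old_pyx = os.path.join(backup, "my_module.pyx")
    c_file = os.path.join(project, "my_module.c")  # stand-in for generated C

    # Step 1: the "great" version exists in both directories (t = T0).
    write(pyx, "great\n", T0)
    write(old_pyx, "great\n", T0)
    # Step 2: compiling produces a newer C file.
    write(c_file, "/* great */\n", T0 + 100)
    # Steps 3-4: edit to the "good" version, then recompile.
    write(pyx, "good\n", T0 + 200)
    write(c_file, "/* good */\n", T0 + 300)
    # Step 5: delete the current source and move the backup into place.
    os.remove(pyx)
    os.replace(old_pyx, pyx)

    restored_mtime = os.path.getmtime(pyx)
    with open(pyx) as f:
        restored_text = f.read().strip()
    # Step 6: the timestamp rule sees no reason to rebuild.
    rebuild = restored_mtime > os.path.getmtime(c_file)

print(restored_text, rebuild)  # great False
```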
My patch stores the file size and modification time of every module source file and its dependencies whenever a module is compiled or recompiled. If either the modification time or the file size has changed, my patch ensures that the file gets recompiled, regardless of whether the modification time moved forward. It does this by "touching" any file whose current stats differ from the stored stats for that file. Touching a file updates its modification time to the current time, so Cython sees it as a newly modified file and is forced to recompile it.
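The stat-tracking idea can be sketched as follows. This is not the code from "patched_pyximport.py": the function names `record_stats` and `touch_changed`, and the in-memory dictionary used as the stats store, are hypothetical (a real tool would persist the stats between runs). The sketch records each file's size and modification time after a build and, before the next build, touches any file whose stats no longer match.

```python
import os
import tempfile
from typing import Dict, List, Tuple

Stats = Dict[str, Tuple[int, int]]  # path -> (size in bytes, mtime in ns)

def record_stats(paths: List[str]) -> Stats:
    """Snapshot size and modification time, e.g. right after a (re)compile."""
    stats: Stats = {}
    for path in paths:
        st = os.stat(path)
        stats[path] = (st.st_size, st.st_mtime_ns)
    return stats

def touch_changed(stored: Stats) -> List[str]:
    """Touch every file whose size or mtime differs from the snapshot.

    Touching sets the mtime to the current time, so a timestamp-based
    check will treat the file as newer than previously generated output.
    """
    touched = []
    for path, (size, mtime_ns) in stored.items():
        st = os.stat(path)
        if (st.st_size, st.st_mtime_ns) != (size, mtime_ns):
            os.utime(path)  # no times given: set atime and mtime to "now"
            touched.append(path)
    return touched

with tempfile.TemporaryDirectory() as tmp:
    pyx = os.path.join(tmp, "my_module.pyx")
    with open(pyx, "w") as f:
        f.write('print("This project is good")\n')
    snapshot = record_stats([pyx])  # taken after the last compile

    # Restore an older version: different contents, older timestamp.
    with open(pyx, "w") as f:
        f.write('print("This project is great")\n')
    old_time = 1_600_000_000
    os.utime(pyx, (old_time, old_time))

    touched = touch_changed(snapshot)
    new_mtime = os.path.getmtime(pyx)

print(len(touched), new_mtime > old_time)  # 1 True
```

Comparing both size and modification time (rather than only checking whether the timestamp moved forward) is what catches a restored file with an older timestamp; after the forced rebuild, a fresh snapshot would be recorded.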
Again, to use the patched Pyximport, we add the patched_pyximport.py file to the original file structure:
|-- projectC
|   |-- __main__.py
|   |-- my_module.pyx
|   |-- patched_pyximport.py
|-- old_projectC
And we update the "projectC/__main__.py" to use the patched Pyximport:
Then, if we follow all of the previous steps up to step 6, we get the desired behavior:
1.-5. Follow the steps shown above.
6. Rerun "projectC/__main__.py".
The "projectC/my_module.pyx" file was "changed", but its modification time has not advanced. The patch detects this and touches the file, so Cython is forced to recompile the module.
Desired Output: This project is great
Resulting Output: This project is great
Pyximport is a great tool for automatically compiling extension modules with Cython. But Pyximport can be finicky. It can fail to recompile modules in certain circumstances, causing the program to import the wrong extension module instead. This can be caused by 1) name collisions between modules that share a name but live in different projects, or 2) failure to detect that a file needs to be recompiled when its modification time has not advanced. I have created a quick script that patches Cython's Pyximport for more robust auto-compilation of Cython modules (see the GitHub repository) that addresses these two issues.
Author
I am Golden Rockefeller, a Ph.D. Robotics student at Oregon State University. I will post updates on personal projects that I am working on. My interests include music synthesis, cooking, game design, and intelligent autonomous systems.