An imputil replacement
[This is part of Installer release 5. It can also be downloaded separately.]
Module iu grows out of the pioneering work that Greg Stein did with imputil (actually, it includes some verbatim imputil code, but since Greg didn't copyright it, I won't mention it). Both modules can take over Python's builtin import and ease writing of at least certain kinds of import hooks.
iu differs from imputil:
- faster
- better emulation of builtin import
- more manageable
There is an ImportManager which provides the replacement for builtin import and hides all the semantic complexities of a Python import request from its delegates.
ImportManager
ImportManager formalizes the concept of a metapath. This concept implicitly exists in native Python in that builtins and frozen modules are searched before sys.path (on Windows there's also a search of the registry, while on Mac, resources may be searched). This metapath is a list populated with ImportDirector instances. There are ImportDirector subclasses for builtins, frozen modules, (on Windows) modules found through the registry, and a PathImportDirector for handling sys.path. For a top-level import (that is, not an import of a module in a package), ImportManager tries each director on its metapath until one succeeds.
ImportManager hides the semantic complexity of an import from the directors. It's up to the ImportManager to decide if an import is relative or absolute; to see if the module has already been imported; to keep sys.modules up to date; to handle the fromlist and return the correct module object.
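The metapath search described above can be sketched roughly as follows. This is an illustration only, not iu's actual code: the names DictDirector, MiniManager and import_top are hypothetical, the directors serve modules from in-memory source strings, and only top-level imports are handled.

```python
import types


class DictDirector:
    """Hypothetical director: serves modules from a dict of source strings."""

    def __init__(self, sources):
        self.sources = sources

    def getmod(self, name):
        source = self.sources.get(name)
        if source is None:
            return None                      # not ours; the next director gets a try
        mod = types.ModuleType(name)
        exec(source, mod.__dict__)
        return mod


class MiniManager:
    """Hypothetical stand-in for ImportManager (top-level imports only)."""

    def __init__(self, metapath):
        self.metapath = metapath
        self.modules = {}                    # plays the role of sys.modules

    def import_top(self, name):
        if name in self.modules:             # already imported
            return self.modules[name]
        for director in self.metapath:       # the first director to succeed wins
            mod = director.getmod(name)
            if mod is not None:
                self.modules[name] = mod
                return mod
        raise ImportError("No module named %s" % name)


first = DictDirector({"spam": "x = 1"})
second = DictDirector({"spam": "x = 2", "eggs": "y = 3"})
manager = MiniManager([first, second])
spam = manager.import_top("spam")            # served by the first director
eggs = manager.import_top("eggs")            # falls through to the second
```

Note that the directors never touch the module cache; all of the bookkeeping stays in the manager.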
ImportDirectors
An ImportDirector just needs to respond to getmod(name) by returning a module object or None. As you will see, an ImportDirector can consider name to be atomic - it has no need to examine name to see if it is dotted.
To see how this works, we need to examine the PathImportDirector.
PathImportDirector
The PathImportDirector subclass manages a list of names - most notably, sys.path. To do so, it maintains a shadowpath - a dictionary mapping the names on its pathlist (e.g., sys.path) to their associated Owners. (It could do this directly, but the assumption that sys.path is occupied solely by strings seems ineradicable.) Owners of the appropriate kind are created as needed (if all your imports are satisfied by the first two elements of sys.path, the PathImportDirector's shadowpath will only have two entries).
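A minimal sketch of the lazily populated shadowpath, under stated assumptions: MiniPathDirector and this DirOwner are hypothetical simplifications (plain .py files only, no packages, no bytecode), not the classes shipped in iu.

```python
import os
import tempfile
import types


class DirOwner:
    """Hypothetical owner of one directory; rejects anything else."""

    def __init__(self, path):
        if not os.path.isdir(path):
            raise ValueError("%s is not a directory" % path)
        self.path = path

    def getmod(self, name):
        filename = os.path.join(self.path, name + ".py")
        if not os.path.exists(filename):
            return None
        with open(filename) as f:
            source = f.read()
        mod = types.ModuleType(name)
        mod.__file__ = filename
        exec(compile(source, filename, "exec"), mod.__dict__)
        return mod


class MiniPathDirector:
    def __init__(self, pathlist, owner_types=(DirOwner,)):
        self.path = pathlist
        self.owner_types = owner_types
        self.shadowpath = {}     # path entry -> Owner (None if no owner accepts it)

    def _owner_for(self, entry):
        for owner_type in self.owner_types:
            try:
                return owner_type(entry)
            except ValueError:
                continue         # this kind of owner does not want the entry
        return None

    def getmod(self, name):
        for entry in self.path:
            if entry not in self.shadowpath:         # owners are created on demand
                self.shadowpath[entry] = self._owner_for(entry)
            owner = self.shadowpath[entry]
            if owner is not None:
                mod = owner.getmod(name)
                if mod is not None:
                    return mod
        return None


with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
    with open(os.path.join(first, "spam.py"), "w") as f:
        f.write("x = 1\n")
    director = MiniPathDirector([first, second])
    spam = director.getmod("spam")                   # found in the first entry
    entries_after_hit = len(director.shadowpath)
    missing = director.getmod("nope")                # forces a look at every entry
    entries_after_miss = len(director.shadowpath)
```

Because the first lookup is satisfied by the first path entry, no owner is ever built for the second entry until a later lookup has to search that far.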
Owners
An Owner is much like an ImportDirector but manages a much more concrete piece of turf. For example, a DirOwner manages one directory. Since there are no other officially recognized filesystem-like namespaces for importing, that's all that's included in iu, but it's easy to imagine Owners for zip files (and I have one for my own .pyz archive format) or even URLs.
As with ImportDirectors, an Owner just needs to respond to getmod(name) by returning a module object or None, and it can consider name to be atomic.
So structurally, we have a tree, rooted at the ImportManager. At the next level, we have a set of ImportDirectors. At least one of those directors, the PathImportDirector in charge of sys.path, has another level beneath it, consisting of Owners. This much of the tree covers the entire top-level import namespace.
The rest of the import namespace is covered by treelets, each rooted in a package module (an __init__.py).
Packages
To make this work, Owners need to recognize when a module is a package. For a DirOwner, this means that name is a subdirectory which contains an __init__.py. The __init__ module is loaded and its __path__ is initialized with the subdirectory. Then, a PathImportDirector is created to manage this __path__. Finally the new PathImportDirector's getmod is assigned to the package's __importsub__ function.
When a module within the package is imported, the request is routed (by the ImportManager) directly to the package's __importsub__. In a hierarchical namespace (like a filesystem), this means that __importsub__ (which is really the bound getmod method of a PathImportDirector instance) needs only the module name, not the package name or the fully qualified name. And that's exactly what it gets. (In a flat namespace - like most archives - it is perfectly easy to route the request back up the package tree to the archive Owner, qualifying the name at each step.)
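The routing of a dotted import can be sketched like this. It is a simplified illustration rather than iu's implementation: import_dotted, top_getmod and leaf_getmod are hypothetical names, the modules are empty in-memory objects, and relative imports and the fromlist are ignored.

```python
import types


def import_dotted(fqname, top_getmod, modules):
    """Resolve a dotted name one part at a time, passing only short names down."""
    parent = None
    sofar = ""
    for part in fqname.split("."):
        sofar = part if parent is None else sofar + "." + part
        if sofar in modules:                 # already imported
            parent = modules[sofar]
            continue
        if parent is None:
            getmod = top_getmod              # top level: ask the metapath
        else:
            getmod = getattr(parent, "__importsub__", None)
            if getmod is None:
                raise ImportError("%s is not a package" % parent.__name__)
        mod = getmod(part)                   # the callee never sees the dotted name
        if mod is None:
            raise ImportError("No module named %s" % sofar)
        mod.__name__ = sofar                 # the caller does the qualifying
        modules[sofar] = mod
        parent = mod
    return parent


seen = []                                    # records what each getmod was asked for


def leaf_getmod(name):
    seen.append(name)
    return types.ModuleType(name) if name == "tools" else None


def top_getmod(name):
    seen.append(name)
    if name != "pkg":
        return None
    pkg = types.ModuleType(name)
    pkg.__importsub__ = leaf_getmod          # the package's own sub-importer
    return pkg


modules = {}
tools = import_dotted("pkg.tools", top_getmod, modules)
```

After the call, seen holds only the short names "pkg" and "tools", while the cache is keyed by the fully qualified names.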
Possibilities
Let's say we want to import from .zip files. So, we subclass Owner. The __init__ method should take a filename, and raise a ValueError if the file is not an acceptable .zip file (when a new name is encountered on sys.path or a package's __path__, registered Owners are tried until one accepts the name). The getmod method would check the .zip file's contents and return None if the name is not found. Otherwise, it would extract the marshalled code object from the .zip, create a new module object and perform a bit of initialization (12 lines of code all told for my own archive format, including initializing a package with its __importsub__).
Once the new Owner class is registered with iu4, you can put a .zip file on sys.path. A package could even put a .zip file on its __path__.
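A rough sketch of such an Owner, with several assumptions: ZipOwner is a hypothetical standalone class (it does not subclass iu's Owner and is not registered with anything), it compiles plain .py source stored in the archive instead of unmarshalling a code object, and it does not handle packages.

```python
import os
import tempfile
import types
import zipfile


class ZipOwner:
    """Hypothetical owner of one .zip file holding top-level .py sources."""

    def __init__(self, path):
        if not zipfile.is_zipfile(path):
            raise ValueError("%s is not a zip file" % path)
        self.path = path

    def getmod(self, name):
        with zipfile.ZipFile(self.path) as archive:
            try:
                source = archive.read(name + ".py")
            except KeyError:
                return None                  # not in this archive
        filename = "%s/%s.py" % (self.path, name)
        mod = types.ModuleType(name)
        mod.__file__ = filename
        exec(compile(source, filename, "exec"), mod.__dict__)
        return mod


with tempfile.TemporaryDirectory() as tmp:
    archive_path = os.path.join(tmp, "lib.zip")
    with zipfile.ZipFile(archive_path, "w") as archive:
        archive.writestr("spam.py", "x = 42\n")
    text_path = os.path.join(tmp, "notes.txt")
    with open(text_path, "w") as f:
        f.write("not an archive\n")

    owner = ZipOwner(archive_path)
    spam = owner.getmod("spam")
    missing = owner.getmod("nope")
    try:
        ZipOwner(text_path)                  # not a zip file, so it is refused
        rejected = False
    except ValueError:
        rejected = True
```

Raising ValueError from __init__ is what lets the caller move on to the next registered kind of Owner for that path entry.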
Compatibility
This code has been tested with the PyXML, mxBase and Win32 packages, covering over a dozen import hacks from manipulations of __path__ to replacing a module in sys.modules with a different one. Emulation of Python's native import is nearly exact, including the names recorded in sys.modules and module attributes (packages imported through iu have an extra attribute - __importsub__).
Performance
In most cases, iu is slower than builtin import (by 15 to 20%) but faster than imputil (by 15 to 20%). By inserting archives at the front of sys.path containing the standard lib and the package being tested, this can be reduced to 5 to 10% slower (or, on my 1.52 box, 10% faster!) than builtin import. A bit more can be shaved off by manipulating the ImportManager's metapath.
Limitations
This module makes no attempt to facilitate policy import hacks. It is easy to implement certain kinds of policies within a particular domain, but fundamentally iu works by dividing up the import namespace into independent domains.
Quite simply, I think cross-domain import hacks are a very bad idea. As author of Installer, I have been dealing with import hacks for three years now. Many of them are highly fragile; they often rely on undocumented (maybe even accidental) features of implementation. A cross-domain import hack is not likely to work with PyXML, for example.
That rant aside, you can modify ImportManager to implement different policies. For example, I have a version that implements three import primitives: absolute import, relative import and recursive-relative import. I have no idea what the Python syntax for those should be, but __aimport__, __rimport__ and __rrimport__ were easy to implement.
Usage
Here's a simple example of using iu as a builtin import replacement.
>>> import iu
>>> iu.ImportManager().install()
>>>
>>> import DateTime
>>> DateTime.__importsub__
<method PathImportDirector.getmod of PathImportDirector instance at 825900>
>>>