Article
Aug 20, 2024

The trouble with __all__

According to PEP 8, the official Python style guide:

"To better support introspection, modules should explicitly declare the names in their public API using the __all__ attribute."

This sounds great! Let’s give it a go. To define a public API for a module, simply specify it in __all__ like so:

# core.py

class PublicAPI:
    def get(self) -> str:
        return "public"

class _PrivateAPI:
    def get(self) -> str:
        return "shhh"

__all__ = ["PublicAPI"]

The single underscore for _PrivateAPI is also recommended in the same PEP.

While the above does define the public API for core, it unfortunately doesn’t enforce it:

>>> from core import _PrivateAPI
>>> _PrivateAPI().get()
'shhh'

Python will even bring _PrivateAPI along for the ride if you just import core:

>>> import core
>>> core._PrivateAPI().get()
'shhh'

So what does __all__ actually do? At runtime, the only place __all__ has any impact is wildcard imports:

>>> from core import *
>>> PublicAPI().get()
'public'
>>> _PrivateAPI().get()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name '_PrivateAPI' is not defined

import * isn’t exactly the best interface to go through when utilizing Python modules.
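To see the wildcard rule in isolation, here’s a minimal sketch that builds two throwaway modules in memory and checks which names a wildcard import binds. The module names (demo_without_all, demo_with_all) and the helpers make_module and star_import are made up for this illustration.

```python
import sys
import types


def make_module(name: str, source: str) -> types.ModuleType:
    """Build a module from source text and register it so it can be imported."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    sys.modules[name] = module
    return module


def star_import(name: str) -> set:
    """Return the names that `from <name> import *` binds in a fresh namespace."""
    namespace: dict = {}
    exec(f"from {name} import *", namespace)
    # exec() adds __builtins__ to the namespace; it is not part of the import.
    return {key for key in namespace if key != "__builtins__"}


SOURCE = "class PublicAPI: ...\nclass Helper: ...\nclass _PrivateAPI: ...\n"

# Hypothetical demo modules: identical except for the __all__ declaration.
make_module("demo_without_all", SOURCE)
make_module("demo_with_all", SOURCE + '__all__ = ["PublicAPI"]\n')

without_all = star_import("demo_without_all")
with_all = star_import("demo_with_all")

print(sorted(without_all))  # every name that doesn't start with an underscore
print(sorted(with_all))     # only the names listed in __all__
```

Without __all__, the wildcard import falls back to every name that doesn’t start with an underscore, so Helper leaks out as well. With __all__, only the listed names come through.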

__all__ breaks from the convention of other languages: Java and C++ have public, TypeScript has export, and Rust has pub, all of which help define and enforce the public interface.

In my time as a Python developer, I’ve rarely seen __all__ used. In fact, most developers don’t bother signifying whether a function or class should be private or public, simply because Python doesn’t offer a good way to do so.

Over time, this results in modules that were intended to be separate getting tightly coupled together, and domain boundaries breaking down.

Above all, this is dangerous to teams developing on the same repository.

Taken to the extreme, this can cost companies millions of dollars. I’ve seen a unicorn startup dump their existing codebase and start over because the modules within it became so tightly coupled that it was impossible to develop within it effectively or to break the modules apart.

So what can we do? Enter this fun little code snippet (tweaked a bit for our use case):

from types import ModuleType


class ModuleWrapper(ModuleType):
    def __init__(self, module: ModuleType):
        self._module = module
        super().__init__(module.__name__, module.__doc__)

    def __getattr__(self, name: str):
        # Only called when normal lookup on the wrapper itself fails,
        # so names defined in the wrapped module end up here.
        module = object.__getattribute__(self, "_module")
        if "__all__" in module.__dict__:
            if name in module.__all__:
                return getattr(module, name)
            raise AttributeError(
                f"Module '{module.__name__}' does not export '{name}'"
            )
        # No __all__ declared: fall back to normal attribute access.
        return getattr(module, name)

Okay, now we have a way to define a public interface which actually works!

>>> import sys
>>> import core
>>> sys.modules['core'] = core.ModuleWrapper(core)
>>> import core  # re-import so the name core is bound to the wrapper
>>> core._PrivateAPI()
AttributeError: Module 'core' does not export '_PrivateAPI'
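If you’d like to try this without creating files, the sketch below rebuilds the same idea as a single script. It uses a hypothetical in-memory module called demo_core as a stand-in for core.py, and a slightly condensed copy of the wrapper, so treat it as an illustration rather than the exact code above.

```python
import sys
import types


class ModuleWrapper(types.ModuleType):
    """Condensed copy of the wrapper: only names listed in __all__ resolve."""

    def __init__(self, module: types.ModuleType):
        super().__init__(module.__name__, module.__doc__)
        self._module = module

    def __getattr__(self, name: str):
        module = object.__getattribute__(self, "_module")
        exported = module.__dict__.get("__all__")
        if exported is not None and name not in exported:
            raise AttributeError(
                f"Module '{module.__name__}' does not export '{name}'"
            )
        return getattr(module, name)


# Hypothetical stand-in for core.py, built in memory.
demo_core = types.ModuleType("demo_core")
exec(
    "class PublicAPI:\n"
    "    def get(self): return 'public'\n"
    "class _PrivateAPI:\n"
    "    def get(self): return 'shhh'\n"
    "__all__ = ['PublicAPI']\n",
    demo_core.__dict__,
)
sys.modules["demo_core"] = ModuleWrapper(demo_core)

# The import statement reads sys.modules, so this name is bound to the wrapper.
import demo_core as wrapped

public_value = wrapped.PublicAPI().get()
try:
    wrapped._PrivateAPI
    blocked = False
except AttributeError:
    blocked = True

# The variable demo_core still points at the original, unwrapped module,
# so any reference taken before the swap can still reach private names.
leaked_value = demo_core._PrivateAPI().get()

print(public_value, blocked, leaked_value)
```

The last few lines show why a reference taken before the swap isn’t protected: replacing the entry in sys.modules only affects imports that run afterwards.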

This isn’t the prettiest solution, nor the most scalable. The loophole is still open for every module we haven’t wrapped by hand. To close it, we can override the built-in __import__ with a hook that applies ModuleWrapper automatically:

# hook.py
import builtins
import sys
from types import ModuleType


class ModuleWrapper(ModuleType): ...  # defined as shown earlier


class WrapperFinder:
    def __init__(self):
        # Keep a reference to the real import function before replacing it.
        self.original_import = builtins.__import__

    def strict_import(self, fullname: str, *args, **kwargs):
        # Modules that are already imported are returned as they are.
        if fullname in sys.modules:
            return sys.modules[fullname]
        module = self.original_import(fullname, *args, **kwargs)
        wrapped = ModuleWrapper(module)
        sys.modules[fullname] = wrapped
        return wrapped


def enable_strict_imports():
    finder = WrapperFinder()
    # Assign through the builtins module: __builtins__ is only a dict
    # outside of __main__, so indexing it is not portable.
    builtins.__import__ = finder.strict_import

Finally, the behavior we want! enable_strict_imports will impact any imports invoked after it’s called.

>>> from hook import enable_strict_imports
>>> enable_strict_imports()
>>> import core
>>> core.PublicAPI().get()
'public'
>>> core._PrivateAPI().get()
AttributeError: Module 'core' does not export '_PrivateAPI'
>>> from core import _PrivateAPI
ImportError: cannot import name '_PrivateAPI' from 'core'

If you find this useful, all the code is here!

A Better Solution

While we now have a way to enforce imports going through __all__, there are a few downsides:

  • There’s a requirement to run enable_strict_imports before any other imports
  • There’s runtime impact
  • There’s no concept of scope

Enter Tach, a tool we built to help resolve these issues. With Tach, you can declare each module and define a strict interface through __all__. It has no runtime impact because it’s enforced through static analysis. You also get more fine-grained control over which modules can see each other.

Let’s take a look at doing that with our little sample codebase! In this case, we’ll move from the Python shell to main.py to run our imports.

# main.py

from core import _PrivateAPI  # bad!!

First off, pip install tach.

Then tach mod lets us mark core and main as modules.

tach sync creates the following configuration for our project:

# tach.yml
modules:
  - path: core
    depends_on: []
  - path: main
    depends_on:
      - path: core

With one line (strict: true), we can enforce strict imports on core:

modules:
  - path: core
    strict: true  # enforce imports of core go through __all__
    depends_on: []
  - path: main
    depends_on:
      - path: core

And with tach check, we can ensure that core is used correctly by main!

Let’s see what we get:

> tach check
❌ main.py[L1]: Module 'core' is in strict mode. Only imports from the public interface of this module are allowed. The import 'core._PrivateAPI' (in module 'main') is not included in __all__.

Tach caught it! If we update main.py to the correct usage:

# main.py

from core import PublicAPI  # good!!

tach check gives us:

> tach check
✅ All module dependencies validated!

The ergonomics are different: instead of a runtime failure, we have a CLI command that we can run in CI, from a file watcher, or as a pre-commit hook. That being said, we’ve fixed all the downsides of the runtime approach:

  • There’s no requirement to import and execute code for it to work
  • There’s no runtime impact
  • We can gain more control of the dependency graph! (In the above example, core can’t import main)
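To give a feel for what checking this statically can look like, here’s a small sketch built on Python’s ast module. It is not how Tach is implemented, just a simplified illustration of the idea: read __all__ out of one file’s source and flag from-imports in another file that reach past it, all without executing either file. The function names exported_names and private_imports are made up for this example.

```python
import ast
from typing import Optional


def exported_names(module_source: str) -> Optional[set]:
    """Return the names in a literal `__all__ = [...]`, or None if absent."""
    for node in ast.parse(module_source).body:
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id == "__all__":
                    # literal_eval accepts an AST node and never runs code.
                    return set(ast.literal_eval(node.value))
    return None


def private_imports(client_source: str, module_name: str, exported: set) -> list:
    """List (line, name) pairs for `from module_name import name` outside `exported`."""
    violations = []
    for node in ast.walk(ast.parse(client_source)):
        if (
            isinstance(node, ast.ImportFrom)
            and node.module == module_name
            and node.level == 0
        ):
            for alias in node.names:
                if alias.name != "*" and alias.name not in exported:
                    violations.append((node.lineno, alias.name))
    return violations


# Source text standing in for core.py and main.py.
CORE_SOURCE = '''
class PublicAPI: ...
class _PrivateAPI: ...
__all__ = ["PublicAPI"]
'''

MAIN_SOURCE = "from core import PublicAPI\nfrom core import _PrivateAPI\n"

exported = exported_names(CORE_SOURCE)
violations = (
    private_imports(MAIN_SOURCE, "core", exported) if exported is not None else []
)

for line, name in violations:
    print(f"line {line}: '{name}' is not in core.__all__")
```

This sketch only handles a literal list assigned to __all__ and plain from-imports. Attribute access such as core._PrivateAPI after import core would need more work, which is part of why a dedicated tool is worth having.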

If you’ve dealt with this problem before, I’d love to hear your solution! All the above code is open source and can be found here and here.

At Gauge, we’re working to solve the monolith/microservices dilemma. If that sounds like something you’re dealing with, we’d love to chat!