Aug 13, 2024

Find your third-party dependencies in Python


Does this error look familiar?

python3: No module named [insert module here]

It’s likely because you don’t have the correct set of dependencies installed. This often occurs when there’s a mismatch between the defined dependency set in pyproject.toml, requirements.txt, or setup.py, and the application code that is being run.
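A quick way to tell whether a module is actually available is to ask the interpreter to locate it without importing it. The sketch below uses the standard library's `importlib.util.find_spec`. The helper name `is_importable` and the module names being checked are illustrative only. Keep in mind that the import name can differ from the package name you install (for example, `yaml` is provided by the `pyyaml` distribution).

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if the current interpreter can locate a top-level module.

    find_spec only searches the import path; it does not execute the module.
    """
    return importlib.util.find_spec(module_name) is not None


# A stdlib module and a (hypothetical) module that is not installed.
for name in ["json", "definitely_not_installed_pkg"]:
    status = "found" if is_importable(name) else "missing"
    print(f"{name}: {status}")
```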

Conversely, sometimes you might have too many packages installed, making your bundle size bigger, slowing down your deployments, and costing you more time and money. This often happens when carving out services, removing dead code, or migrating to a new package.

In both these cases, we need to inspect our code and ensure that the set of installed dependencies == used dependencies. In the best case, we can also enforce that this state continues to exist in perpetuity.
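In set terms, the check comes down to two differences. Anything used but not installed produces the error above, and anything installed but never used is bloat. A minimal sketch, with made-up package names standing in for real data (the function name `dependency_drift` is hypothetical):

```python
from __future__ import annotations


def dependency_drift(installed: set[str], used: set[str]) -> tuple[set[str], set[str]]:
    """Split the mismatch between installed and used dependencies.

    Returns (missing, unused):
      missing: imported by the code but not installed (fails at import time)
      unused:  installed but never imported (extra weight in the bundle)
    """
    missing = used - installed
    unused = installed - used
    return missing, unused


installed = {"requests", "rich", "pandas"}
used = {"requests", "rich", "pyyaml"}
missing, unused = dependency_drift(installed, used)
print("missing:", sorted(missing))  # → missing: ['pyyaml']
print("unused:", sorted(unused))    # → unused: ['pandas']
```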

Discovery

Assuming our installed dependencies are already declared, how do we figure out which dependencies the code actually uses? tach has a built-in command that shows the external dependencies of a given Python module. Let’s dogfood it on the tach repository itself to make sure our own dependency set is correct!
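The core idea behind discovering used dependencies is to find what the code imports. The snippet below is not tach's implementation. It is a minimal illustration using Python's `ast` module that collects the top-level names of absolute imports from a source string, and the function name `top_level_imports` is hypothetical.

```python
from __future__ import annotations

import ast


def top_level_imports(source: str) -> set[str]:
    """Collect the top-level module names imported by a piece of Python source.

    Relative imports ("from . import x") are skipped because they refer to
    the project itself, not to a third-party package.
    """
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import yaml.constructor" -> "yaml"
                names.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # level > 0 marks a relative import; module is None for "from . import x"
            if node.level == 0 and node.module:
                names.add(node.module.split(".")[0])
    return names


example = """
import yaml
import os.path
from rich.console import Console
from . import sibling
"""
print(sorted(top_level_imports(example)))  # → ['os', 'rich', 'yaml']
```

The result still includes standard-library modules such as `os`. On Python 3.10+ those can be filtered out with `sys.stdlib_module_names`. Older versions need an external list, such as the one the stdlib-list package provides.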

First - install tach:

> pip install tach

Next, let’s set up tach so that it can understand our project. We do this by telling tach where our Python source root is with tach mod:

> tach mod

In this case, our source root is the python directory at the root of the repository. tach mod writes this project configuration to tach.toml.

Next, let’s run the built-in command report-external to surface the dependencies used within tach. All of the Python code lives in python/tach:

> tach report-external --raw python/tach
pydantic
pyyaml
gitpython
rich
prompt-toolkit
tomli-w
networkx
stdlib-list
importlib-metadata

Note that we use --raw to print each dependency only once, in PEP 508 format. Without this flag, tach outputs every usage of a third-party dependency.
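The names in this output are lowercase with hyphens (`gitpython`, `importlib-metadata`), while the pyproject.toml listed next spells them `GitPython` and `importlib_metadata`. Package names compare equal under the normalization rule from PEP 503: lowercase, with runs of `-`, `_`, and `.` collapsed to a single `-`. The sketch below applies that rule and deduplicates a list of per-usage names, roughly what "print each dependency once" means. It illustrates the rule and is not tach's code. The `usages` list is made up.

```python
import re


def normalize(name: str) -> str:
    """Normalize a distribution name per PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


# Hypothetical per-usage names, one entry per place a dependency is used.
usages = ["GitPython", "gitpython", "importlib_metadata", "PyYAML", "pyyaml"]

unique = sorted({normalize(name) for name in usages})
print(unique)  # → ['gitpython', 'importlib-metadata', 'pyyaml']
```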

Here we can see that tach uses 9 external dependencies, a fact we can validate by checking it against pyproject.toml:

dependencies = [
    "pyyaml~=6.0", # pyyaml
    "tomli-w~=1.0", # tomli-w
    "pydantic~=2.0", # pydantic
    "rich~=13.0", # rich
    "prompt-toolkit~=3.0", # prompt-toolkit
    "GitPython~=3.1", # gitpython
    "networkx~=3.0; python_version > '3.7'", # networkx
    "networkx>=2.6,<4.0; python_version == '3.7'",
    "pydot~=2.0", # ???
    "stdlib-list>=0.10.0; python_version < '3.10'", # stdlib-list
    "importlib_metadata>=6.5; python_version == '3.7'", # importlib-metadata
]
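Eyeballing the comparison works for a short list, but it's also easy to script. The sketch below takes the leading project name from each requirement string using a simplified regex. Real code would use a full PEP 508 parser such as the `Requirement` class from the `packaging` library. The sketch then normalizes the names and diffs them against the names reported above. The helper `requirement_name` is hypothetical.

```python
import re

# Dependency strings copied from the pyproject.toml above.
declared_specs = [
    "pyyaml~=6.0",
    "tomli-w~=1.0",
    "pydantic~=2.0",
    "rich~=13.0",
    "prompt-toolkit~=3.0",
    "GitPython~=3.1",
    "networkx~=3.0; python_version > '3.7'",
    "networkx>=2.6,<4.0; python_version == '3.7'",
    "pydot~=2.0",
    "stdlib-list>=0.10.0; python_version < '3.10'",
    "importlib_metadata>=6.5; python_version == '3.7'",
]

# Output of `tach report-external --raw python/tach` shown earlier.
reported = {
    "pydantic", "pyyaml", "gitpython", "rich", "prompt-toolkit",
    "tomli-w", "networkx", "stdlib-list", "importlib-metadata",
}


def requirement_name(spec: str) -> str:
    """Return the normalized project name at the start of a requirement string.

    Simplified: matches only the leading name and ignores extras, versions and markers.
    """
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", spec)
    if match is None:
        raise ValueError(f"not a requirement: {spec!r}")
    # PEP 503 normalization so "GitPython" matches "gitpython".
    return re.sub(r"[-_.]+", "-", match.group(1)).lower()


declared = {requirement_name(spec) for spec in declared_specs}
print("declared but not reported:", sorted(declared - reported))  # → declared but not reported: ['pydot']
print("reported but not declared:", sorted(reported - declared))  # → reported but not declared: []
```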

Interesting - here we see that most of the dependencies are covered, but pydot isn’t reported! What's going on?

If we inspect the code, we see:

# tach/show.py
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    ...
    import pydot

Because pydot is imported inside a type-checking block, it isn’t needed at runtime and won’t block execution. tach ignores these imports by default. To surface them, we can set the ignore_type_checking_imports flag to false in our tach.toml config:

# tach.toml
...
ignore_type_checking_imports = false
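Why doesn't pydot block execution? `typing.TYPE_CHECKING` is `False` at runtime, so the body of an `if TYPE_CHECKING:` block never runs, and only static type checkers follow those imports. The sketch below shows one way to split imports into runtime and type-checking-only groups with `ast`. It is a simplified illustration that only recognizes the `TYPE_CHECKING` and `typing.TYPE_CHECKING` spellings. It is not how tach implements `ignore_type_checking_imports`, and the function names are hypothetical.

```python
from __future__ import annotations

import ast


def _is_type_checking_test(test: ast.expr) -> bool:
    """True for `if TYPE_CHECKING:` or `if typing.TYPE_CHECKING:`."""
    if isinstance(test, ast.Name):
        return test.id == "TYPE_CHECKING"
    if isinstance(test, ast.Attribute):
        return test.attr == "TYPE_CHECKING"
    return False


def split_imports(source: str) -> tuple[set[str], set[str]]:
    """Return (runtime, type_checking_only) top-level imported module names."""
    runtime: set[str] = set()
    type_only: set[str] = set()

    def visit(nodes: list, guarded: bool) -> None:
        for node in nodes:
            target = type_only if guarded else runtime
            if isinstance(node, ast.Import):
                target.update(alias.name.split(".")[0] for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
                target.add(node.module.split(".")[0])
            elif isinstance(node, ast.If) and _is_type_checking_test(node.test):
                visit(node.body, guarded=True)
                # The else branch runs at runtime, so it keeps the outer status.
                visit(node.orelse, guarded)
            else:
                # Recurse into other compound statements (def, class, try, if, with, ...).
                for field in ("body", "orelse", "finalbody", "handlers"):
                    visit(getattr(node, field, []), guarded)

    visit(ast.parse(source).body, guarded=False)
    return runtime, type_only


example = """
from typing import TYPE_CHECKING
import rich

if TYPE_CHECKING:
    import pydot
"""
runtime, type_only = split_imports(example)
print(sorted(runtime))    # → ['rich', 'typing']
print(sorted(type_only))  # → ['pydot']
```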

Now if we run tach report-external --raw python/tach, we see pydot!

> tach report-external --raw python/tach
gitpython
networkx
pydantic
pyyaml
prompt-toolkit
stdlib-list
rich
pydot
tomli-w
importlib-metadata

Awesome - our used dependencies match our declared dependencies!

Enforcement

Generally, when someone removes a required dependency, something in the existing toolchain fails, such as an import error in tests or at runtime.

However, an unused dependency causes no such failure, so this kind of bloat is hard to spot. This is where tach check-external comes in.

check-external validates that the code in the source roots defined in your tach.toml imports and uses only the third-party dependencies listed in your pyproject.toml. This even works for multiple source roots! We can run this command in CI to verify that our third-party dependencies are an exact match:

> tach check-external
✅ All external dependencies validated!

With tach, we can make sure we never ship more or less than we need!