Managing Python Dependencies With Git Hooks

When developing Python applications, it’s good practice to “pin” your dependencies in a requirements.txt file with an explicit version number (==) for each library.

This enables repeatable deployments: source control always explicitly defines a single set of dependency versions. Otherwise, if requirements.txt lists just Flask or Flask>=0.10.0, a deploy could install a new release of Flask that no one has actually tested locally.

The standard way to do this is to use a virtualenv and do:

pip install Flask
pip freeze > requirements.txt

Resulting in something like:

Flask==0.10.1
Jinja2==2.7.1
MarkupSafe==0.18
Werkzeug==0.9.4
argparse==1.2.1
itsdangerous==0.23
wsgiref==0.1.2

Unfortunately, that leaves a bunch of information about Flask’s dependencies in our requirements.txt.

Flask specifies its dependencies as loosely as possible (Jinja2>=2.4), and so does Jinja2 (MarkupSafe). If we freeze the whole tree into requirements.txt like this, we’ll soon start missing out on new versions of libraries in our dependency tree, ones that might have better performance, new features, or notices of pending deprecations. It’s also just a lot of noise.

Fortunately, with pip we can make a requirements.txt that only contains Flask and then do

pip freeze -r requirements.txt > requirements-pinned.txt

Resulting in:

Flask==0.10.1
## The following requirements were added by pip --freeze:
Jinja2==2.7.1
MarkupSafe==0.18
Werkzeug==0.9.4
argparse==1.2.1
itsdangerous==0.23
wsgiref==0.1.2
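
Since pip lists everything that came from requirements.txt above the "added by pip" comment, the top-level pins can be read back out of the pinned file by cutting at that line. Here is a minimal sketch, assuming the marker text shown above; top_level_pins is a hypothetical helper name, not part of pip:

```shell
# Hypothetical helper: print the pins listed above pip's
# "added by pip" marker comment in a frozen requirements file.
top_level_pins() {
    # Delete from the marker line through the end of the file.
    sed '/^## The following requirements were added by pip/,$d' "$1"
}

# Demo on a small sample shaped like the output above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Flask==0.10.1
## The following requirements were added by pip --freeze:
Jinja2==2.7.1
MarkupSafe==0.18
EOF
top_level_pins "$sample"
rm -f "$sample"
```

If the marker is missing, the whole file is printed unchanged.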

We can store both requirements files in version control. requirements.txt says what versions we support for top-level dependencies and requirements-pinned.txt has an explicit version for every dependency in the entire dependency tree, to be used in our deployment script.
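
Because the deployment script relies on every line of requirements-pinned.txt carrying an explicit version, it can be worth checking that before installing. This is only a rough sketch, assuming one requirement per line and that anything without == counts as unpinned; unpinned_requirements is a made-up name:

```shell
# Hypothetical helper: print requirement lines that lack an "==" pin.
# Blank lines and comment lines are skipped.
unpinned_requirements() {
    awk '!/^[ \t]*(#|$)/ && !/==/ { print }' "$1"
}

# Demo on a small sample file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Flask==0.10.1
# a comment
Jinja2>=2.4
MarkupSafe
EOF
# Lists the two lines without an exact pin.
unpinned_requirements "$sample"
rm -f "$sample"
```

A deploy script could refuse to continue whenever this prints anything.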

With git hooks, we can automate all of this.

This pre-commit hook ensures that locally-installed dependencies are pinned before every commit:

#!/bin/bash

if [[ $(git diff --name-only --cached | grep "requirements-pinned.txt") ]]; then
    echo "Modify requirements.txt instead of requirements-pinned.txt";
    exit 1;
fi

git stash -q --keep-index

# in case someone edited requirements.txt without actually installing the package yet
echo "Installing latest requirements from requirements.txt"
pip install -q -r requirements.txt

echo "Pinning dependencies to requirements-pinned.txt"
pip freeze -r requirements.txt > requirements-pinned.txt

git add requirements.txt requirements-pinned.txt

git stash pop -q

This post-merge and post-checkout hook ensures that locally-installed dependencies always correspond to the pinned dependencies in the currently checked-out version of the code.

#!/bin/bash
echo "Updating requirements from requirements-pinned.txt"
pip freeze -r requirements.txt | comm --nocheck-order -23 requirements-pinned.txt - | xargs pip install
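
The comm -23 step in that hook keeps only the lines unique to its first input, that is, pins in requirements-pinned.txt that the local pip freeze output does not contain. Below is a small sketch of that comparison on sample data. It sorts both inputs rather than relying on GNU's --nocheck-order (my assumption, for portability), and missing_pins is a hypothetical name:

```shell
# Hypothetical helper: print lines of the first file that are absent
# from the second file.
missing_pins() {
    pinned_sorted=$(mktemp)
    installed_sorted=$(mktemp)
    # comm expects sorted input, so sort both sides first.
    sort "$1" > "$pinned_sorted"
    sort "$2" > "$installed_sorted"
    # -2 drops lines only in the second file, -3 drops lines common to both.
    comm -23 "$pinned_sorted" "$installed_sorted"
    rm -f "$pinned_sorted" "$installed_sorted"
}

# Demo: the checkout pins Jinja2 2.7.1 but 2.7.0 is installed locally.
pinned=$(mktemp)
installed=$(mktemp)
printf 'Flask==0.10.1\nJinja2==2.7.1\nWerkzeug==0.9.4\n' > "$pinned"
printf 'Flask==0.10.1\nJinja2==2.7.0\nWerkzeug==0.9.4\n' > "$installed"
missing_pins "$pinned" "$installed"
rm -f "$pinned" "$installed"
```

The demo prints Jinja2==2.7.1, which is the one line the hook would hand to pip install.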

git-hooks is a nice way to manage your git hooks. Just put them in a git_hooks directory in your repo, and you’re good to go with:

cd .git/ && ln -s ../git_hooks git_hooks
git hooks install

(Be sure to make them executable.)

This also enables developers to explicitly run the pre-commit hook:

git hooks run pre-commit

git commit --no-verify can be used to avoid triggering the pre-commit hook.

This setup ensures that all installations of the codebase from version control are always running on top of an explicit set of versions defined in version control for all dependencies in the dependency tree, while taking no extra time for developers.

At the same time, it lets us specify flexible version ranges for our top-level dependencies, which makes dependency management easier for installations not from version control (i.e., anyone depending on us via PyPI), and we’re not forced to take any immediate action when one of our dependencies releases a new version. (There was a service that notified you of this whose name I don’t remember, but I think it’s now defunct.)

Bash History With Expanded Aliases

A Hacker Schooler recently released an awesome tool to visualize your git workflow based on your command-line history using graphviz.

I wanted to use this with my history without it being littered with the aliases from my ultimate alias setup, so I wrote a Python script that looks up your bash and git aliases and then outputs an edited version of your bash history with both types of aliases expanded to their full commands.
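
Roughly, an expansion like this comes down to looking at the first word of each history line and, if it names an alias, swapping in the full command. The real script is Python and handles both bash and git aliases. What follows is only a shell sketch of the first-word lookup, with a hand-written ALIAS_TABLE and a made-up function name standing in for the actual alias lookup:

```shell
# Hypothetical alias table, one name=expansion pair per line.
ALIAS_TABLE='s=git status -s
gco=git checkout'

# Hypothetical helper: expand the first word of a command line if it
# matches an alias name in ALIAS_TABLE.
expand_first_word() {
    line=$1
    first=${line%% *}        # text before the first space
    rest=${line#"$first"}    # remainder, including the leading space
    while IFS='=' read -r name expansion; do
        if [ "$name" = "$first" ]; then
            printf '%s%s\n' "$expansion" "$rest"
            return 0
        fi
    done <<EOF
$ALIAS_TABLE
EOF
    # No alias matched: print the line unchanged.
    printf '%s\n' "$line"
}

expand_first_word "gco -b feature"
expand_first_word "ls -la"
```

The first call prints git checkout -b feature, and the second prints ls -la unchanged because ls is not in the table.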

Here’s my workflow. I guess I should put git status in my prompt.

The Ultimate Git Alias Setup

If you use git on the command-line, you’ll eventually find yourself wanting aliases for your most commonly-used commands. It’s incredibly useful to be able to explore your repos with only a few keystrokes that eventually get hardcoded into muscle memory.

Some people don’t add aliases because they don’t want to have to adjust to not having them when they’re doing something on a remote server. Personally, I find that having aliases doesn’t mean that I forget the underlying commands, and aliases improve my workflow so much that I can’t imagine typing the full commands out every time.

The simplest way to add an alias for a specific git command is to use a standard bash alias.

# .bashrc

alias s="git status -s"

The disadvantage of this is that it isn’t integrated with git’s own alias system, which lets you define git commands or external shell commands that you call with git <alias>. This has some nice advantages:

  • integration with git’s default bash completion for subcommand arguments
  • ability to store your git aliases separately from your bash aliases
  • ability to see all your aliases and their corresponding commands using git config

If you add the following code to your .bashrc on a system with the default git bash completion scripts installed, it will automatically create completion-aware g<alias> bash aliases for each of your git aliases.

if [ -f /etc/bash_completion ] && ! shopt -oq posix; then
    . /etc/bash_completion
fi

function_exists() {
    declare -f -F $1 > /dev/null
    return $?
}

for al in `__git_aliases`; do
    alias g$al="git $al"

    complete_func=_git_$(__git_aliased_command $al)
    function_exists $complete_func && __git_complete g$al $complete_func
done

The main downside to this approach is that it will make your terminal take a little longer to load. (On my machine, it adds about a second.)

My aliases

Here are the aliases I use constantly in my workflow. I’m lazy about memorizing the many other aliases I’ve decided I should be using, and this setup is great for that because I can always list them all with gla.

[alias]
    # one-line log
    l = log --pretty=format:"%C(yellow)%h\\ %ad%Cred%d\\ %Creset%s%Cblue\\ [%cn]" --decorate --date=short

    a = add
    ap = add -p
    c = commit --verbose
    ca = commit -a --verbose
    cm = commit -m
    cam = commit -a -m
    m = commit --amend --verbose

    d = diff
    ds = diff --stat
    dc = diff --cached

    s = status -s
    co = checkout
    cob = checkout -b
    # list branches sorted by last modified
    b = "!git for-each-ref --sort='-authordate' --format='%(authordate)%09%(objectname:short)%09%(refname)' refs/heads | sed -e 's-refs/heads/--'"

    # list aliases
    la = "!git config -l | grep alias | cut -c 7-"

See Must Have Git Aliases for more.