Goalist Developers Blog

Which Python Package Manager Should You Use?

[Image. Source: https://realpython.com/]

Nowadays Python is everywhere: academia, data science, machine learning, enterprise applications, web applications, scripting... you name it. Whatever you do, Python is there, either to help you or to give you a headache.

Let's say you have learned Python and are ready to use it to develop applications. That sounds great, so you jump into writing Python scripts and eventually start installing Python packages. From there, it is easy to wander down a dangerous path into a developer's nightmare.

[Image: xkcd comic. Source: https://xkcd.com/1987/]

Installing packages can cause incompatibility issues or even break other applications. You may also discover that your code fails on some machines while it works flawlessly on your local machine. Why? Because of the Python environment: the interpreter version and the set of installed packages differ from machine to machine.

To save yourself from incompatibility issues, create a separate virtual Python environment for each project.

A virtual environment is a self-contained directory that holds a Python interpreter (or a link to one), its own site-packages directory, and a few activation scripts. With a virtual environment, each Python project can have its own dependencies, independent of other projects and of the system Python.
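To make the isolation idea concrete, here is a minimal sketch that asks the running interpreter whether it lives inside a virtual environment. It relies on `sys.prefix` and `sys.base_prefix` from the standard library; the `real_prefix` check is only there for older virtualenv releases. The function name `in_virtual_environment` is made up for this example.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases mark the environment by setting sys.real_prefix.
    if getattr(sys, "real_prefix", None) is not None:
        return True
    # In a venv (or a recent virtualenv), sys.prefix points at the environment
    # directory while sys.base_prefix points at the underlying installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("inside a virtual environment:", in_virtual_environment())
```

Run it once with your system Python and once after activating an environment, and the second line should flip.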

In this blog post, I would like to share my environment for working with data and doing machine learning. You definitely do not need to copy anyone's setup; use whatever fits you best.

Every programmer has different preferences when it comes to their programming environment: vim versus emacs, tabs versus spaces, virtualenv versus Anaconda.

To start with, we need to talk about pip. Any Python person knows that pip is Python's package manager. It has shipped with Python since versions 2.7.9 and 3.4, so if you have Python, you likely have pip already.

pip installs packages like tensorflow, numpy, pandas, and jupyter, along with their dependencies. Many Python resources are delivered as pip packages. Sometimes you will see a file called requirements.txt in someone's folder of Python scripts. Typically, that file lists all of the pip packages the project uses, so you can install everything it needs with:

pip install -r requirements.txt
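If you are curious what such a file contains, the sketch below reads the simplest kind of requirements.txt, one with `name==version` pins. This is only an illustration: the helper `parse_pins` and the sample contents (including the version numbers) are made up for this post, and the real format that pip accepts is richer (options, URLs, version ranges), which this sketch simply skips.

```python
from typing import Dict


def parse_pins(text: str) -> Dict[str, str]:
    """Map package name -> pinned version for lines of the form name==version."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or "==" not in line:
            continue  # skip blank lines and anything that is not an exact pin
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


# Hypothetical file contents, for illustration only.
sample = """\
# requirements.txt
numpy==1.16.4
pandas==0.24.2  # pinned for reproducibility
jupyter
"""

print(parse_pins(sample))  # {'numpy': '1.16.4', 'pandas': '0.24.2'}
```

Note that `jupyter` is left out of the result because it has no exact pin, so pip would be free to pick any version for it.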

As part of this ecosystem, there's a whole world of version numbers and dependencies. You sometimes need to use different versions of a given library for different projects that you are working on.

So you need a way to organize groups of packages into different isolated environments. Otherwise, the version errors will make you want to bang your head against the wall.

There are currently two popular options for managing your different packages: virtualenv and Anaconda.
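A quick way to see which versions the current environment actually holds is to ask the standard library. This sketch uses `importlib.metadata` (available from Python 3.8); the helper name `installed_version` and the list of packages are just examples.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Example package names; swap in whatever your project depends on.
    for name in ("numpy", "pandas", "tensorflow"):
        print(name, "->", installed_version(name) or "not installed")
```

Running it in two different environments makes the difference between them visible at a glance.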

1) Virtualenv

Virtualenv is a package that allows you to create named virtual environments where you can install pip packages in an isolated manner. This tool is great if you want to have detailed control over which packages you install for each environment you create.

For example, you could create an environment for web development with one set of libraries, and a different environment for data science. This way, you won't need to have unrelated libraries interacting with each other, and it allows you to create environments dedicated to specific purposes.

# install
pip install virtualenv

# create environment
virtualenv venv 

# activate environment
source venv/bin/activate
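
The same idea can be scripted from Python. Note that this sketch swaps in the standard library's `venv` module rather than the virtualenv package used above; the two are close relatives but not identical. The helper name `create_env` is made up, and `with_pip=False` is chosen only to keep the example fast and offline.

```python
import tempfile
import venv
from pathlib import Path


def create_env(env_dir: Path) -> Path:
    """Create a bare virtual environment and return the path to its pyvenv.cfg."""
    # with_pip=False skips bootstrapping pip; pass True for a usable project env.
    venv.create(env_dir, with_pip=False)
    return env_dir / "pyvenv.cfg"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg = create_env(Path(tmp) / "venv")
        # pyvenv.cfg records which base interpreter the environment points at.
        print(cfg.read_text())
```

The `pyvenv.cfg` file it prints is what ties the environment back to the Python installation it was created from.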

2) Anaconda

Now, if you're primarily doing data science work, Anaconda is also a great option. Anaconda is a Python distribution created by Anaconda, Inc. (formerly Continuum Analytics) that comes preinstalled with lots of useful Python libraries for data science. It is popular because it brings many of the tools used in data science and machine learning with just one install, so it's great for a short and simple setup.

Like virtualenv, Anaconda uses environments to isolate different libraries and versions. It also introduces its own package manager, conda, which you use to install libraries.

Additionally, Anaconda works alongside pip, so you can install any libraries that are not available through conda.

Follow the official instructions to download and install Anaconda.

# create environment
conda create --name test-env

# activate environment
conda activate test-env

# install additional packages
conda install tensorflow
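
From inside Python you can check which conda environment is active, since `conda activate` exports the `CONDA_DEFAULT_ENV` environment variable. The helper name `active_conda_env` is made up for this sketch, and it takes the environment mapping as a parameter only so that it is easy to try out.

```python
import os
from typing import Mapping, Optional


def active_conda_env(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the active conda environment as reported by conda, or None."""
    # `conda activate` sets CONDA_DEFAULT_ENV; it is missing outside conda.
    return environ.get("CONDA_DEFAULT_ENV") or None


print(active_conda_env({"CONDA_DEFAULT_ENV": "test-env"}))  # test-env
print(active_conda_env({}))  # None
```

Calling `active_conda_env()` with no arguments reads your real shell environment instead of the sample dictionaries.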

On top of that, Anaconda ships with a graphical UI (Anaconda Navigator) for managing your projects and environments.

So... which one to use, virtualenv or anaconda?

Well, it's nice to try out different libraries on both virtualenv and anaconda, but sometimes those two package managers don't necessarily play nicely with each other on one system.

In my case, I have opted to use both, but I manage the whole thing using a tool called pyenv.

Conceptually, pyenv sits on top of both virtualenv and Anaconda. It controls not only which virtualenv or Anaconda environment is in use, but also whether I'm running Python 2 or Python 3. (The `pyenv activate` command used here comes from the pyenv-virtualenv plugin.)

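# Python 2 project: pin the interpreter for this directory, then activate its environment
pyenv local 2.7.10
pyenv activate py27_tf12

# Python 3 project: same steps with a different interpreter and environment
pyenv local 3.5.2
pyenv activate py35_tf12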

One final feature of pyenv is its ability to set a default environment for a given directory. The desired environment is then activated automatically when you enter that directory.

~/code $ cd myproject
(py35_tf12) ~/code/myproject $
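
Under the hood, `pyenv local` records its choice in a `.python-version` file in the directory, and pyenv looks for that file in the current directory and then in each parent. The sketch below imitates that lookup; it is not pyenv's own code. The helper `find_local_version` is made up, and the example assumes the file holds the environment name `py35_tf12`.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_local_version(start: Path) -> Optional[str]:
    """Return the first entry of the nearest .python-version at or above start."""
    start = start.resolve()
    for directory in (start, *start.parents):
        candidate = directory / ".python-version"
        if candidate.is_file():
            entries = candidate.read_text().split()
            return entries[0] if entries else None
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp) / "myproject"
        (project / "src").mkdir(parents=True)
        # Assumed file contents: the environment name used in this post.
        (project / ".python-version").write_text("py35_tf12\n")
        # The lookup from a subdirectory still finds the project-level file.
        print(find_local_version(project / "src"))  # py35_tf12
```

This is why the environment follows you into subdirectories of the project as well.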

I find this to be way easier than trying to remember which environment I want to use every time I work on a project.

So which package manager do you use?

It really comes down to your workflow and preferences. If you typically just use the core data science tools and are not concerned with having some extra libraries installed that you don't use, Anaconda can be a great choice since it leads to a simpler workflow.

But if you are someone who loves to customize your environment and make it exactly how you want it, then perhaps something like virtualenv or even pyenv may be more to your liking.

There's no one right way to manage Python libraries, and there's certainly more out there than the options that I just presented.

As different tools come and go, it's important to remember that everyone has different needs and preferences, so choose the one that best fits yours.

That's it for this post. My name is Vivek Amilkanthwar. See you soon with another one; until then, Happy Learning :)