Why you need Python environments and how to manage them with Conda

I have more than two decades of professional experience as a developer, I'm familiar with a wide range of frameworks and programming languages, and one of my favorites is Python. I've been teaching it for some time now, and in my experience, setting up Python environments is a challenging topic.

So my main motivation for writing this article was to help current and prospective Python users better understand how to manage such environments.

If you've opened this article, chances are you already know what Python is and why it's a great tool, and you even have Python installed on your computer.

So why exactly do you need Python environments? You might ask: shouldn't I just install the latest Python version?

Why you need multiple Python environments

When you start learning Python, a good starting point is to install the latest Python version with the latest versions of the packages you need or want to play with. Then you'll probably immerse yourself in this world and download Python applications from GitHub, Kaggle or other sources. These applications may need other versions of Python or packages than the ones you are currently using.

In this case, you need to create different, so-called environments.

Apart from this situation, there are several other use cases where multiple environments can be useful:

  • You have an application (developed by you or by someone else) that once worked beautifully. But now you've tried to run it, and it doesn't work. Perhaps one of the packages is no longer compatible with the other parts of your program (because of so-called breaking changes). A possible solution is to create a new environment for your application that contains the Python version and the packages that are fully compatible with it.
  • You collaborate with someone else, and you want to make sure that your application works on your team member's computer and vice versa, so you can also create an environment for your colleague's application(s).
  • You deliver an application to your client, and again, you want to make sure that it works smoothly on your client's computer.

An environment consists of a certain Python version and some packages. Therefore, if you want to develop or use applications with different Python or package version requirements, you need to set up different environments.
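To see these two ingredients for the environment you are currently in, a minimal Python sketch (the helper name describe_environment is just an illustration) can report the interpreter's version and sys.prefix, the root directory of the environment that owns the running interpreter:

```python
import sys
from pathlib import Path


def describe_environment() -> dict:
    """Summarize the running environment: its Python version and its root directory."""
    major, minor, micro = sys.version_info[:3]
    return {
        "python_version": f"{major}.{minor}.{micro}",
        # sys.prefix is the root directory of the environment this interpreter
        # belongs to (for a Conda environment, something like ...\envs\mynewenv).
        "prefix": str(Path(sys.prefix)),
        "executable": sys.executable,
    }


info = describe_environment()
print(f"Python {info['python_version']} in {info['prefix']}")
```

Running it once in each environment is a quick way to confirm that they really use different Python versions.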

Now that we've discussed why environments are useful, let's dive in and talk about some of the most important aspects of managing them.

Package and environment managers

The two most popular tools for setting up environments are:

  • PIP (a Python package manager; funnily enough, it stands for "Pip Installs Packages") with virtualenv (a tool for creating isolated environments)
  • Conda (a package and environment manager)

In this article, I cover how to use Conda. I prefer it because:

  1. Clear structure: Its directory structure is easy to understand
  2. Transparent file management: It doesn't install files outside its directory
  3. Flexibility: It contains a lot of packages (PIP packages can also be installed into Conda environments)
  4. Multipurpose: It's not just for managing Python environments and packages; you can also use it for R (a programming language for statistical computing)

At the time of writing this article, I'm using the 4.3.x versions of Conda, but the new 4.4.x versions are also available.

In the case of Conda 4.4, there have recently been changes that affect Linux / Mac OS X users. They are described in this changelog entry.

How to choose an appropriate Conda download option

Installing your Conda system is a bit more complicated than downloading a nice picture from Unsplash or buying a new e-book. Why is that?

1. Installer

Currently, there are 3 different installers:

  • Anaconda (free)
  • Miniconda (free)
  • Anaconda Enterprise Platform (a commercial product that enables organizations to use Python and R in enterprise environments)

Let's take a closer look at the free tools, Anaconda and Miniconda. What are the main differences between these two?

What do they have in common? They both set up the following on your computer:

  • Conda (the package & environment management system) and
  • the so-called “root environment” (more on that a bit later).

As for the main differences, Miniconda requires about 400 MB of disk space, and it contains only a few basic packages.

The Anaconda installer requires about 3 GB of disk space, and it installs over 150 scientific packages (for example, packages for statistics and machine learning). It also installs Anaconda Navigator, a GUI tool that helps you manage Conda environments and packages.

I prefer Miniconda, since I have never used most of the packages that Anaconda includes by default. Another reason is that using Miniconda makes duplicating an environment smoother (for example, if I also want to use it on another computer), since I only install the packages my apps require on both computers.

From now on, I'll describe how Miniconda works (if you use Anaconda, the process is almost the same).

2–3. Platform (operating system and bit count)

In addition to these 3 different installers, there are also subtypes based on bit count: 32- and 64-bit installers. And of course, these also have subtypes for the different operating systems: Windows, Linux and Mac OS X (except that the Mac OS X version is 64-bit only).

In this article, I focus on the Windows version (the Linux and Mac OS X versions are only slightly different; for example, the path of the installation directories and some command line commands differ).

So 32-bit or 64-bit?

If you have a 64-bit operating system (OS) with 4 GB of RAM or more, you should install the 64-bit version. In addition, you may need a 64-bit installer if the packages you plan to use require 64-bit versions of Python. For example, if you want to use TensorFlow (more precisely, its official so-called binaries), you need a 64-bit OS and a 64-bit Python version.

If you have a 32-bit operating system, or if you plan to use packages that only have 32-bit versions, the 32-bit version is the right choice for you.
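Once Conda is installed, you can confirm which kind of interpreter you actually got. In the minimal Python sketch below (the helper name is an illustration), the size of a C pointer reveals whether the running Python build is 32- or 64-bit. Keep in mind that a 32-bit Python can run on a 64-bit OS, so this reports the interpreter, not the operating system:

```python
import platform
import struct


def interpreter_bits() -> int:
    """Return 32 or 64, the bit count of the running Python build."""
    # "P" is a C pointer: 4 bytes on 32-bit builds, 8 bytes on 64-bit builds.
    return struct.calcsize("P") * 8


print(f"Python build: {interpreter_bits()}-bit")
# Raw machine architecture string reported by the platform module, e.g. 'AMD64' or 'x86_64'.
print(f"Machine: {platform.machine()}")
```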

4. Python version (for the root environment)

If these 3 dimensions aren’t enough (installers, 32/64-bit, and operating systems), there is a 4th one based on the different Python versions (included in the installer — and consequently, in the root environment)!

So let’s talk a bit about the different available Python versions.

Currently, your options are version 2.7 or version 3.x (at the time of writing this article, it’s 3.6) for the Python that is inside the root environment. For the additional environments, you can choose any version — ultimately, this is why you create environments in the first place: to easily switch between the different environments and versions.

So 2.7 or 3.x version Python for my root environment?

Let me help you decide it really quickly:

Since 3.x is newer, this should be your default choice. (Version 2.7 is a legacy version: it was released in 2010, and there won't be any new 2.x feature releases, only bug-fix releases of 2.7.)

However, if

  • you have mostly 2.7 code (you made or utilize applications using the 2.7 versions) or
  • you need to use packages that don’t have Python 3.x versions,

you should install a Python 2.7-based root environment.

You might ask: why don't I just create two environments, one based on 2.7 and one based on 3.x? I'm glad you asked. The reason is that your root environment is the one created during the installation process, and it's activated by default.

I’ll explain in one of the following sections how you can activate an environment, but basically it means that the root environment is the more easily accessible one, so carefully selecting your root environment will make your workflow more efficient.

Throughout the installation process, Miniconda will let you change some options set by default (for example you can check/uncheck some checkboxes). When you install Conda for the first time, I recommend that you leave these options intact (except for the path of the installation directory).

I’d like to mention one more thing here. While you can have multiple environments that contain different versions of Python at the same time on the same computer, you can’t set up 32- and 64-bit environments using the same Conda management system. It is possible to mix them somehow, but it is not that easy, so I’m going to devote a separate article to this topic.

Python environments: root and additional

So now you’ve picked an appropriate installer for yourself, well done! Now let’s take a look at the different types of environments and how they are created.

Miniconda sets up two things for you: Conda and the root environment.

The process looks like this: the installer installs Conda first, which is — as I already mentioned — the package and environment management tool. Then, Conda creates a root environment that contains two things:

  • a certain version of Python and
  • some basic packages.

Next to the root environment, you can create as many additional environments as you want. And the whole point is that these additional environments can contain different versions of Python and other packages. This means, for example, that if your precious little application no longer works in the newest, state-of-the-art environment you've just set up, you can always go “back” and use other versions of some packages (including Python; Python itself is a package, more on that later).

As I already summarized at the beginning of the article, the main use cases of applying an additional environment are these:

  • You develop applications with different Python or package version requirements
  • You use applications with different Python or package version requirements
  • You collaborate with other developers
  • You create Python applications for clients

Before diving into the basics of environment management, let’s take a look at your Conda system’s directory structure.

Directory structure

As I mentioned above, the Conda system is installed into a single directory. In my example this directory is: D:\Miniconda3-64\. It contains the root environment and two important directories (the other directories are irrelevant for now):

  • \pkgs (it contains the cached packages in compressed and uncompressed formats)
  • \envs (it contains the environments — except for the root environment — in separate subdirectories)

The most significant executable files and directories inside a Conda environment (placed in the \envs\environmentname directory) are:

  • \python.exe — the Python executable for command line applications. So for instance, if you are in the directory of the Example App, you can execute it by: python.exe exampleapp.py
  • \pythonw.exe — the Python executable for GUI applications, or completely UI-less applications
  • \Scripts — executables that are parts of the installed packages. Upon activation of an environment, this directory is added to the system path, so the executables become available without their full path
  • \Scripts\activate.exe — activates the environment

And if you’ve installed Jupyter, this is also an important file:

  • \Scripts\jupyter-notebook.exe: Jupyter Notebook launcher (part of the jupyter package). In short, Jupyter Notebook creates so-called notebook documents that contain executable parts (for example Python) and human-readable parts as well. It'd take another article to get into it in more detail.
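The layout above can be expressed as a small, hypothetical helper. It uses pathlib.PureWindowsPath so it runs on any operating system, assumes the root environment is called "root" (as in Conda 4.3), and simply mirrors the file names listed above; D:\Miniconda3-64 is the example directory from this section:

```python
from pathlib import PureWindowsPath


def env_layout(conda_root: str, env_name: str) -> dict:
    """Build the Windows paths of the key files of a Conda environment.

    The root environment lives directly in conda_root; every other
    environment lives in its own subdirectory of the envs directory.
    """
    root = PureWindowsPath(conda_root)
    prefix = root if env_name == "root" else root / "envs" / env_name
    return {
        "python": prefix / "python.exe",  # command line applications
        "pythonw": prefix / "pythonw.exe",  # GUI or UI-less applications
        "scripts": prefix / "Scripts",  # added to the path on activation
        "activate": prefix / "Scripts" / "activate.exe",
        "jupyter": prefix / "Scripts" / "jupyter-notebook.exe",
        "pkg_cache": root / "pkgs",  # shared cache of downloaded packages
    }


for name, path in env_layout(r"D:\Miniconda3-64", "mynewenv").items():
    print(f"{name:10} {path}")
```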

So now you should have at least one Python environment successfully installed on your computer. But how can you start utilizing it? Let’s take a closer look.

GUI vs. Command line (Terminal)

As I mentioned above, the Anaconda installer also installs a graphical user interface (GUI) tool called Anaconda Navigator. I also pointed out that I prefer using Miniconda, which does not install a GUI for you, so you need to use text-based interfaces (for example, command line tools or the Terminal).

In this article, I focus on the command line tools (Windows). And while I concentrate on the Windows version, these examples can be applied to Linux and Mac OS X as well, only the path of the installation folders and some command line commands differ.

To open the command line, select “Anaconda 32-bit” or “Anaconda 64-bit” (depending on your installation) in the Windows Start menu, then choose “Anaconda Prompt”.

I recommend reading through the official Conda cheat sheet (pdf), as it contains the command differences between Windows and Mac OS X/Linux, too.

In the following sections, I’m going to give you some examples of the basic commands, indicating their results as well. Hopefully these will help you better manage your new environment.

Managing environments

Adding a new environment

To create a new environment named, for instance, mynewenv (you can name it whatever you like) that includes, let's say, Python version 3.4, run:

conda create --name mynewenv python=3.4
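If you create environments from a script rather than by hand, a small Python sketch like the following can assemble the same command. The helper names are hypothetical, the --yes flag (which skips conda's confirmation prompt) is an addition, and create_env assumes the conda executable is on PATH, for example inside the Anaconda Prompt:

```python
import subprocess


def conda_create_args(env_name: str, python_version: str) -> list:
    """Assemble the `conda create` arguments; nothing is executed here."""
    return ["conda", "create", "--name", env_name, f"python={python_version}"]


def create_env(env_name: str, python_version: str) -> None:
    """Run `conda create` non-interactively (requires conda on PATH)."""
    # --yes answers the "Proceed ([y]/n)?" prompt automatically.
    subprocess.run(conda_create_args(env_name, python_version) + ["--yes"], check=True)


# Prints the same command as above, without running it.
print(" ".join(conda_create_args("mynewenv", "3.4")))
```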

You can change an environment’s Python version by using the package management commands I describe in the next section.

Activating and leaving (deactivating) an environment

Inside a new Conda installation, the root environment is activated by default, so you can use it without activation.

In other cases, if you want to use an environment (for instance manage packages, or run Python scripts inside it) you need to first activate it.

Here is a step-by-step guide to the activation process:

First, open the command line (or the Terminal on Linux/Mac OS X). To activate the mynewenv environment, use the following commands depending on the operating system you have:

  • On Windows:
activate mynewenv
  • On Linux or Mac OS X:
source activate mynewenv

The command prompt changes upon the environment's activation. It becomes, for example, (mynewenv) C:\> or (root) D:\>, so as a result of the activation, it now contains the active environment's name.

The directories of the active environment’s executable files are added to the system path (this means that you can now access them more easily). You can leave an environment with this command:

deactivate

On Linux or Mac OS X, use this one:

source deactivate

According to the official Conda documentation, in Windows it is a good practice to deactivate an environment before activating another.

It needs to be mentioned that upon deactivating an environment, the root environment becomes active automatically.
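Scripts sometimes need to know which environment they are running in. The minimal sketch below assumes the CONDA_DEFAULT_ENV environment variable that conda's activation scripts set (depending on the Conda version, it may hold the environment's name or its full path); the helper name is hypothetical. sys.prefix is a useful cross-check, since it always points at the environment that owns the running interpreter:

```python
import os
import sys


def active_conda_env(environ=None):
    """Return the active Conda environment as recorded by activation, or None."""
    environ = os.environ if environ is None else environ
    # Set by conda's activate scripts; absent if no environment was activated.
    return environ.get("CONDA_DEFAULT_ENV")


print("Active Conda environment:", active_conda_env())
print("This interpreter belongs to:", sys.prefix)
```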

To list out the available environments in a Conda installation, run:

conda env list 

Example result:

# conda environments:
#
mynewenv                 D:\Miniconda\envs\mynewenv
tensorflow-cpu           D:\Miniconda\envs\tensorflow-cpu
root                  *  D:\Miniconda

Thanks to this command, you can list out all your environments (the root and all the additional ones). The active environment is marked with an asterisk (at each given moment, there can be only one active environment).
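If you want to use this list in a script, the plain-text output is easy to parse. The sketch below assumes whitespace-separated columns and no spaces in the paths; the function name and the sample text (copied from the example above) are illustrative:

```python
SAMPLE_OUTPUT = r"""# conda environments:
#
mynewenv                 D:\Miniconda\envs\mynewenv
tensorflow-cpu           D:\Miniconda\envs\tensorflow-cpu
root                  *  D:\Miniconda
"""


def parse_env_list(text: str):
    """Parse `conda env list` output into ({name: path}, active_name)."""
    envs, active = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip the header and blank lines
        parts = line.split()
        name, path = parts[0], parts[-1]
        if "*" in parts[1:-1]:
            active = name  # the asterisk marks the active environment
        envs[name] = path
    return envs, active


envs, active = parse_env_list(SAMPLE_OUTPUT)
print(f"{len(envs)} environments, active: {active}")
```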

How do you learn the version of your Conda?

It can be useful to check which version of Conda you are using, and what the other parameters of your environment are. I'm going to show you below how to easily list out this information.

To get the version of your Conda installation, run this command:

conda --version

Example result:

conda 4.3.33

To get a detailed list of information about the environment, for instance:

  • Conda version,
  • platform (operating system and bit count — 32- or 64-bit),
  • Python version,
  • environment directories,

run this command:

conda info

Example result:

Current conda install:

               platform : win-64
          conda version : 4.3.33
       conda is private : False
      conda-env version : 4.3.33
    conda-build version : not installed
         python version : 3.6.3.final.0
       requests version : 2.18.4
       root environment : D:\Miniconda (writable)
    default environment : D:\Miniconda\envs\tensorflow-cpu
       envs directories : D:\Miniconda\envs
                          C:\Users\sg\AppData\Local\conda\conda\envs
                          C:\Users\sg\.conda\envs
          package cache : D:\Miniconda\pkgs
                          C:\Users\sg\AppData\Local\conda\conda\pkgs
           channel URLs : //repo.continuum.io/pkgs/main/win-64
                          //repo.continuum.io/pkgs/main/noarch
                          //repo.continuum.io/pkgs/free/win-64
                          //repo.continuum.io/pkgs/free/noarch
                          //repo.continuum.io/pkgs/r/win-64
                          //repo.continuum.io/pkgs/r/noarch
                          //repo.continuum.io/pkgs/pro/win-64
                          //repo.continuum.io/pkgs/pro/noarch
            config file : C:\Users\sg\.condarc
             netrc file : None
           offline mode : False
             user-agent : conda/4.3.33 requests/2.18.4 CPython/3.6.3 Windows/10 Windows/10.0.15063
          administrator : False
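The same idea works for conda info. This hypothetical parser treats each "name : value" line as a field and appends indented continuation lines (such as the extra envs directories) to the previous field. It assumes the plain-text layout shown above, which may differ between Conda versions; the sample is an excerpt of that output:

```python
SAMPLE_INFO = r"""Current conda install:

               platform : win-64
          conda version : 4.3.33
         python version : 3.6.3.final.0
       root environment : D:\Miniconda (writable)
       envs directories : D:\Miniconda\envs
                          C:\Users\sg\.conda\envs
"""


def parse_conda_info(text: str) -> dict:
    """Parse plain-text `conda info` output into {field: [values]}."""
    info, key = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if " : " in line:
            key, _, value = line.partition(" : ")
            key = key.strip()
            info[key] = [value.strip()]
        elif key is not None:
            # Indented continuation line: another value for the previous field.
            info[key].append(line)
        # Lines before the first field (the header) are ignored.
    return info


info = parse_conda_info(SAMPLE_INFO)
print(info["platform"][0], info["conda version"][0])
```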

Now you know some basic commands for managing your environment. Let’s take a look at managing the packages inside the environment.

Managing packages

Depending on the installer you chose, you’re going to end up with some basic (in case of using Miniconda) or a lot of (in case of using Anaconda) packages to start with. But what happens if you need

  • a new package or
  • another version of an already installed package?

Conda — your environment and package management tool — will come to the rescue. Let’s look at this in more detail.

Package channels

Channels are the locations of the repositories (on the illustration I call them storages) where Conda looks for packages. Upon Conda’s installation, Continuum’s (Conda’s developer) channels are set by default, so without any further modification, these are the locations where your Conda will start searching for packages.

Channels exist in a hierarchical order. The channel with the highest priority is the first one that Conda checks, looking for the package you asked for. You can change this order, and also add channels to it (and set their priority as well).

It is a good practice to add a channel to the channel list as the lowest priority item. That way, you can include “special” packages that are not part of the ones set by default (that is, Continuum's channels). As a result, you'll end up with all the default packages, without the risk of a lower priority channel overwriting them, AND the “special” one you need.

To install a certain package that cannot be found in these default channels, you can search for that “special” package on this website. Not all packages are available on all platforms (operating system and bit count, for example 64-bit Windows); however, you can narrow down your search to a specific platform. If you find a channel that contains the package you're looking for, you can append it to your channel list.

To add a channel (named for instance newchannel) with the lowest priority, run:

conda config --append channels newchannel

To add a channel (named newchannel) with the highest priority, run:

conda config --prepend channels newchannel

It needs to be mentioned that in practice you’ll most likely set channels with the lowest priority. For a beginner, adding a channel with the highest priority is an edge case.

To list out the active channels and their priorities, use the following command:

conda config --get channels

Example result:

--add channels 'conda-forge'   # lowest priority
--add channels 'rdonnelly'
--add channels 'defaults'   # highest priority

There is one more aspect that I'd like to summarize here. If multiple channels contain a package, and one channel contains a newer version than the other one, the channels' hierarchical order determines which of these two versions is going to be installed, even if the higher priority channel contains the older version.
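These priority rules can be modeled in a few lines of Python. This is a simplified, hypothetical model (the function names are made up, and real conda solves dependencies across all packages at once); it only mirrors the order-based behavior described in this section, with channels listed from highest to lowest priority:

```python
def append_channel(channels: list, name: str) -> list:
    """Mimic `conda config --append channels NAME`: lowest priority.

    If the channel is already listed, it is moved rather than duplicated.
    """
    return [c for c in channels if c != name] + [name]


def prepend_channel(channels: list, name: str) -> list:
    """Mimic `conda config --prepend channels NAME`: highest priority."""
    return [name] + [c for c in channels if c != name]


def version_key(version: str) -> tuple:
    return tuple(int(part) for part in version.split("."))


def pick_channel(channels: list, available: dict, package: str):
    """Return (channel, version): the first channel that has the package wins,
    even if a lower priority channel offers a newer version."""
    for channel in channels:
        versions = available.get(channel, {}).get(package)
        if versions:
            return channel, max(versions, key=version_key)
    return None


channels = append_channel(append_channel(["defaults"], "rdonnelly"), "conda-forge")
available = {
    "defaults": {"seaborn": ["0.7.0", "0.7.1"]},
    "conda-forge": {"seaborn": ["0.8.1"]},
}
print(channels)
print(pick_channel(channels, available, "seaborn"))
```

The resulting list has the same order as the conda config --get channels example above: defaults first (highest priority) and conda-forge last.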

Searching, installing and removing packages

To list out all the installed packages in the currently active environment, run:

conda list

The command outputs the installed packages with their names, versions, and build strings:

# packages in environment at D:\Miniconda:
#
asn1crypto                0.22.0           py36h8e79faa_1
bleach                    1.5.0
ca-certificates           2017.08.26       h94faf87_0
...
wheel                     0.29.0           py36h6ce6cde_1
win_inet_pton             1.0.1            py36he67d7fd_1
wincertstore              0.2              py36h7fe50ca_0
yaml                      0.1.7            vc14hb31d195_1  [vc14]
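From inside Python, you can get a rough equivalent of conda list using the standard library's importlib.metadata (Python 3.8 or later). This is a sketch with a made-up helper name, and it only sees packages that ship Python metadata, so non-Python Conda packages such as ca-certificates will not appear:

```python
from importlib.metadata import distributions


def installed_python_packages() -> dict:
    """Map distribution name -> version for the running environment."""
    packages = {}
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries with broken metadata
            packages[name.lower()] = dist.version
    return dict(sorted(packages.items()))


for name, version in installed_python_packages().items():
    print(f"{name:25} {version}")
```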

To search for all the available versions of a certain package, you can use the search command. For instance, to list out all the versions of the seaborn package (it is a tool for data visualization), run:

conda search -f seaborn

Similarly to the conda list command, this one outputs a list of the matching package names, versions, and channels:

Fetching package metadata .................
seaborn                      0.7.1                    py27_0  conda-forge
                             0.7.1                    py34_0  conda-forge
                             0.7.1                    py35_0  conda-forge
...
                             0.8.1            py27hab56d54_0  defaults
                             0.8.1            py35hc73483e_0  defaults
                             0.8.1            py36h9b69545_0  defaults

To install a package (for instance seaborn) from a channel that is on your channel list, run this command (if you don't specify which version you want, it'll automatically install the latest available version from the highest priority channel):

conda install seaborn

You can also specify the package’s version:

conda install seaborn=0.7.0

To install a package (for example yaml, which is a YAML parser and emitter) from a channel that is not on your channel list (for instance a channel named conda-forge), run:

conda install -c conda-forge yaml

To update all the installed packages (it only affects the active environment), use this command:

conda update --all

To update one specific package, for example the seaborn package, run:

conda update seaborn

To remove the seaborn package, run:

conda remove seaborn

There is one more aspect of managing packages that I’d like to cover in this article. If you don’t want to deal with compatibility issues (breaking changes) caused by a new version of one of the packages you use, you can prevent that package from updating. As I mentioned above, if you run the conda update command, all of your installed packages are going to be updated, so basically it is about creating an “exception list”. So how can you do this?

Prevent packages from updating (pinning)

Create a file named pinned in the environment's conda-meta directory, and add the packages that you don't want to be updated to it. For example, to restrict the seaborn package to the 0.7.x branch and lock the yaml package to version 0.1.7, add the following lines to the pinned file:

seaborn 0.7.*
yaml ==0.1.7
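To make these two lines concrete, here is a simplified, hypothetical checker for just the two spec forms used above (0.7.* for a branch, ==0.1.7 for an exact version). Conda's own matcher supports many more forms, so treat this as an illustration. The file location follows the description above (the conda-meta directory of the environment), taking sys.prefix as the environment root:

```python
import sys
from pathlib import Path


def pinned_file(prefix: str = sys.prefix) -> Path:
    """Location of the pinned file inside an environment."""
    return Path(prefix) / "conda-meta" / "pinned"


def parse_pinned(text: str) -> dict:
    """Read the lines of a pinned file into {package: spec}."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            name, _, spec = line.partition(" ")
            pins[name] = spec.strip()
    return pins


def satisfies(version: str, spec: str) -> bool:
    """Check a version against the two spec forms above (simplified)."""
    if spec.startswith("=="):
        return version == spec[2:]  # exact version
    if spec.endswith(".*"):
        return version.startswith(spec[:-1])  # "0.7.*" -> prefix "0.7."
    return version == spec


pins = parse_pinned("seaborn 0.7.*\nyaml ==0.1.7")
print(pinned_file())
print(pins, satisfies("0.8.1", pins["seaborn"]))
```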

Changing an environment’s Python version

And how can you change the Python version of an environment?

Python is also a package. Why is that relevant for you? Because you replace the currently installed version of Python with another version using the same command you use to replace any other package with another version of that same package.

First, you should list out the available Python versions:

conda search -f python

Example result (the list contains the available versions and channels):

Fetching package metadata .................
python                       2.7.12                        0  conda-forge
                             2.7.12                        1  conda-forge
                             2.7.12                        2  conda-forge
...
                             3.6.3                h3b118a2_4  defaults
                             3.6.4                h6538335_0  defaults
                             3.6.4                h6538335_1  defaults

To replace the current Python version with, for example, 3.4.2, run:

conda install python=3.4.2

To update the Python version to the latest version of its branch (for instance updating the 3.4.2 to the 3.4.5 from the 3.4 branch), run:

conda update python
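The branch rule described above can be illustrated with a short sketch: keep only the available versions that share the current major.minor branch and take the newest. The function names and the version list (a few versions from the search output above, plus 3.4.2 and 3.4.5 from the example) are illustrative, and conda's real solver also takes dependencies into account:

```python
def version_key(version: str) -> tuple:
    """Turn "3.4.5" into (3, 4, 5) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def latest_in_branch(current: str, available: list) -> str:
    """Newest available version in current's major.minor branch."""
    branch = version_key(current)[:2]
    candidates = [v for v in available if version_key(v)[:2] == branch]
    # Stay on the current version if nothing in its branch is available.
    return max(candidates, key=version_key, default=current)


available = ["2.7.12", "3.4.2", "3.4.5", "3.6.3", "3.6.4"]
print(latest_in_branch("3.4.2", available))  # 3.4.5
```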

Adding PIP packages

Towards the beginning of this article, I recommended using Conda as your package and environment manager (and not PIP). And as I mentioned above, PIP packages are also installable into Conda environments.

Therefore, if a package is unavailable through the Conda channels, you can try to install it from the PyPI package index. You can do this by using the pip command (the Conda installer makes this command available by default, so you can use it in any active environment). For instance, if you want to install the lightgbm package (a gradient boosting framework), run:

pip install lightgbm
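When you install PyPI packages from a Python script instead of the command line, a common safeguard is to call pip through the running interpreter (python -m pip), so the package lands in the environment that owns that interpreter rather than in whichever pip comes first on PATH. A minimal sketch with hypothetical helper names; pip_install needs network access and pip in the environment:

```python
import subprocess
import sys


def pip_install_args(package: str) -> list:
    """Build a pip command bound to the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", package]


def pip_install(package: str) -> None:
    """Install a package from PyPI into the current environment."""
    subprocess.run(pip_install_args(package), check=True)


print(pip_install_args("lightgbm"))
```

Calling pip_install("lightgbm") then has the same effect as running pip install lightgbm in the activated environment.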

Summary

So let’s wrap this up. I know that it seems quite complicated — and it is, in fact, complicated. However, utilizing environments will save you a lot of trouble.

In this article, I’ve summarized how you can:

  • choose an appropriate Conda installer for yourself
  • create additional environments (next to the root environment)
  • add or replace packages (I also explained how channels work)
  • manage your Python version(s)

There are many more aspects in the area of Python environment management, so please let me know what aspects you find most challenging. Also let me know if you have some good practices that I don’t mention here. I’m curious about your workflow, so please feel free to share in the response section below if you have any suggestions!

Recommended Articles

If you’re interested in this topic, I encourage you to check out these articles as well. Thanks for these great resources Michael Galarnyk, Dries Cronje, Ryan Abernathey, Sanyam Bhutani, Jason Brownlee and Jake Vanderplas.

  • Python Environment Management with Conda (Python 2 + 3, Using Multiple Versions of Python) (towardsdatascience.com)
  • Setup your Windows 10 machine for Machine Learning (becominghuman.ai)
  • Custom Conda Environments for Data Science on HPC Clusters (medium.com)
  • Basic Tutorials Part 3, on Conda (medium.com)
  • How to Setup a Python Environment for Machine Learning and Deep Learning with Anaconda (machinelearningmastery.com)
  • Conda: Myths and Misconceptions (jakevdp.github.)

Using Docker

A little side note based on one of my readers' questions (thanks for bringing this up, Vikram Durai!):

If your application

  • uses a server (for example a database server with preloaded data), AND
  • you want to distribute this server and its data together with your application and its Python environment to others (for instance to a fellow developer or to a client),

you can “containerize” the whole thing with Docker.

In this case, all these components will be encapsulated in a Docker container:

  • The application itself,
  • The Conda environment that can run your application (so a compatible Python version and packages),
  • The local server or service (for example: a database server and a web server) required to run the application

You can read more about how Anaconda and Docker work together in this article by Kristopher Overholt:

Anaconda and Docker - Better Together for Reproducible Data Science (www.anaconda.com)

Some more articles about Docker containers (by Preethi Kasireddy and Alexander Ryabtsev):

  • A Beginner-Friendly Introduction to Containers, VMs and Docker
  • What is Docker and How to Use it With Python (Tutorial) (djangostars.com)

Please let me know in the response section if you have any suggestions or questions!

Thanks for reading!

And thanks to my wife, Krisztina Szerovay, who helped me make this article more understandable and created the illustrations. If you're interested in UX design (and if you're a developer, you should be :)), check out her UX Knowledge Base sketches here:

UX Knowledge Base Sketch (uxknowledgebase.com)