Installation

Python

Trafilatura runs on Python, currently one of the most widely used programming languages.

This library is tested on Linux, macOS and Windows. It is compatible with Python 3 (3.4 upwards).
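To confirm that the interpreter you are using meets this requirement, you can compare `sys.version_info` against the minimum version. The following is a minimal sketch: the helper name `meets_minimum` is hypothetical, and the `(3, 4)` minimum is simply taken from the sentence above.

```python
import sys

# Minimum version stated above (Python 3.4 upwards)
MIN_VERSION = (3, 4)

def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the given (major, minor, ...) version is at least `minimum`."""
    # Only (major, minor) are compared; tuple comparison is lexicographic
    return tuple(version_info[:2]) >= minimum

if __name__ == "__main__":
    print("Python %d.%d" % sys.version_info[:2],
          "OK" if meets_minimum() else "is too old")
```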

Trafilatura package

Trafilatura is packaged as a software library available from the Python Package Index (PyPI). As such, it can be installed with pip or pipenv.

Basics

Please refer to this section for an introduction to command-line usage.

$ pip install trafilatura # use pip3 on systems where both Python 2 and 3 are installed

This project is under active development; please keep it up to date to benefit from the latest improvements:

$ pip install -U trafilatura # to make sure you have the latest version
$ pip install -U git+https://github.com/adbar/trafilatura.git # latest available code (see build status above)
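To check which version pip actually installed, for instance after an update, you can query the package metadata from Python. This sketch relies on the standard library's `importlib.metadata`, which assumes Python 3.8 or newer (on older interpreters, `pip show trafilatura` gives the same information); the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version  # Python 3.8+

def installed_version(package="trafilatura"):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment
        return None

if __name__ == "__main__":
    print(installed_version() or "trafilatura is not installed")
```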

Command-line tool

If trafilatura was installed locally or for the current user only and cannot be run from the command line, please refer to the official Python documentation and this page on finding executables from the command line.

Additional functionality

A few additional libraries can be installed for extended functionality and faster processing: language detection, and faster handling of downloads with cchardet (which may not work on all systems).

$ pip install "trafilatura[all]" # all additional functionality (quoted for shells such as zsh)

For extended date extraction, you can install htmldate with its optional dependencies:

$ pip install "htmldate[all]"

You can also install or update relevant packages separately; trafilatura will detect which ones are present on your system and opt for the best available combination.
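Detection of this kind typically follows a common Python pattern: check whether an optional module can be imported and fall back to an alternative otherwise. The following is a generic sketch of that pattern, not trafilatura's actual code; the function names are hypothetical, and using `chardet` as the fallback for `cchardet` is an assumption made for illustration.

```python
import importlib.util

def is_available(module_name):
    """Return True if `module_name` can be imported in this environment."""
    # find_spec() locates the module without importing it
    return importlib.util.find_spec(module_name) is not None

def pick_charset_detector():
    """Prefer the faster cchardet if present, else chardet, else None."""
    for candidate in ("cchardet", "chardet"):
        if is_available(candidate):
            return candidate
    return None

if __name__ == "__main__":
    print(pick_charset_detector() or "no charset detector installed")
```

Checking with `find_spec()` rather than a bare `import` avoids the cost of importing every candidate just to test for its presence.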

For more information on dependency management of Python packages, see this discussion thread.