Tutorials
=========

Learn through practical examples. The following tutorials cover various scenarios, from text embedding for vector search to building custom web corpora and generating word frequency lists.

.. toctree::
   :maxdepth: 2

   tutorial0
   tutorial1
   tutorial2
   tutorial-epsilla
   tutorial-dwds


Blog posts
^^^^^^^^^^

- `Extracting the main text content from web pages using Python `_
- `Validating TEI-XML documents with Python `_
- `Evaluating scraping and text extraction tools for Python `_
- `Filtering links to gather texts on the web `_
- `Using sitemaps to crawl websites on the command-line `_
- `Using RSS and Atom feeds to collect web pages with Python `_
- `Web scraping with R: Text and metadata extraction `_
- `Web scraping with Trafilatura just got faster `_


Videos
^^^^^^

YouTube playlist with video tutorials in several languages: `Web scraping how-tos and tutorials `_.


External resources
^^^^^^^^^^^^^^^^^^

- `GLAM-Workbench `_
- `Harvesting collections of text from archived web pages `_
- `Compare two versions of an archived web page `_
- `User Ethics & Legal Concerns `_
- `Download von Web-Daten `_ & `Daten aufbereiten und verwalten `_ (Tutorials in German by Noah Bubenhofer)