Core functions
==============

.. contents:: Table of contents
   :depth: 2
   :local:
   :backlinks: none


Extraction
----------

``extract()``
~~~~~~~~~~~~~

.. autofunction:: trafilatura.extract

``bare_extraction()``
~~~~~~~~~~~~~~~~~~~~~

.. autofunction:: trafilatura.bare_extraction

``baseline()``
~~~~~~~~~~~~~~

.. autofunction:: trafilatura.baseline

``html2txt()``
~~~~~~~~~~~~~~

.. autofunction:: trafilatura.html2txt

``try_readability()``
~~~~~~~~~~~~~~~~~~~~~

.. autofunction:: trafilatura.external.try_readability

``try_justext()``
~~~~~~~~~~~~~~~~~

.. autofunction:: trafilatura.external.try_justext

``extract_metadata()``
~~~~~~~~~~~~~~~~~~~~~~

.. autofunction:: trafilatura.extract_metadata

``extract_comments()``
~~~~~~~~~~~~~~~~~~~~~~

.. autofunction:: trafilatura.core.extract_comments


Link discovery
--------------

``sitemap_search()``
~~~~~~~~~~~~~~~~~~~~

.. autofunction:: trafilatura.sitemaps.sitemap_search

``find_feed_urls()``
~~~~~~~~~~~~~~~~~~~~

.. autofunction:: trafilatura.feeds.find_feed_urls

``focused_crawler()``
~~~~~~~~~~~~~~~~~~~~~

.. autofunction:: trafilatura.spider.focused_crawler


Helpers
-------

``fetch_url()``
~~~~~~~~~~~~~~~

.. autofunction:: trafilatura.fetch_url

``fetch_response()``
~~~~~~~~~~~~~~~~~~~~

.. autofunction:: trafilatura.fetch_response

``decode_file()``
~~~~~~~~~~~~~~~~~

.. autofunction:: trafilatura.utils.decode_file

``load_html()``
~~~~~~~~~~~~~~~

.. autofunction:: trafilatura.load_html

``sanitize()``
~~~~~~~~~~~~~~

.. autofunction:: trafilatura.utils.sanitize

``trim()``
~~~~~~~~~~

.. autofunction:: trafilatura.utils.trim


XML processing
--------------

``xmltotxt()``
~~~~~~~~~~~~~~

.. autofunction:: trafilatura.xml.xmltotxt

``validate_tei()``
~~~~~~~~~~~~~~~~~~

.. autofunction:: trafilatura.xml.validate_tei