trafilatura

Package | Scraping | Python 3.7+ | Intermediate

Extract main text content from web pages; robust article extraction

Quick Info

Documentation: Official Docs
Python Version: 3.7+
Dependencies: lxml, urllib3, certifi, courlan, htmldate, justext
Install: pip install trafilatura

Quick Example

python
# Install: pip install trafilatura
import trafilatura

# Confirm that the package imports correctly
print("Using trafilatura")
# See the official documentation for detailed examples

trafilatura is a third-party package that extracts the main text content from web pages, with an emphasis on robust article extraction. Install it with: pip install trafilatura

Tags

package, web-scraping, parsing, data-extraction