textract
Package — Documents | Python 3.7+ | Intermediate
Extract text from any document format (PDF, DOCX, PPTX, etc.)
Quick Info
- Documentation: Official Docs
- Python Version: 3.7+
- Dependencies: chardet, argcomplete, beautifulsoup4, xlrd, six, SpeechRecognition, pdfminer.six, docx2txt, python-pptx, EbookLib
- Install: pip install textract
Quick Example
python
# Install: pip install textract
import textract

# Basic textract usage
print("Using textract")
# See documentation for detailed examples
textract is a third-party package that extracts text from a wide range of document formats (PDF, DOCX, PPTX, etc.). Install it with: pip install textract
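A slightly fuller sketch of typical usage, assuming textract is installed and that `textract.process` returns the extracted text as UTF-8 encoded bytes (its documented default). The helper name `extract_text` and the file name in the usage comment are hypothetical and chosen only for illustration.

```python
from pathlib import Path


def extract_text(path, encoding="utf-8"):
    """Return the text content of a document as a str.

    Hypothetical helper: wraps textract.process, which picks a parser
    based on the file extension and returns bytes.
    """
    path = Path(path)
    if not path.is_file():
        # Fail early with a clear error before handing off to textract.
        raise FileNotFoundError(path)

    # Imported here so the missing-file check works even if textract
    # (or one of its system-level dependencies) is not installed.
    import textract

    raw = textract.process(str(path))  # bytes
    # Assumption: the output is UTF-8; undecodable bytes are replaced.
    return raw.decode(encoding, errors="replace")


# Example usage (the file name is a placeholder):
# text = extract_text("report.pdf")
# print(text[:200])
```

Some formats rely on external command-line tools in addition to the Python dependencies listed above, so extraction may fail for a given file type until those tools are installed.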
Tags
package, documents, file-format, office