- Enable text_extraction in default config
- Implement extract_text_data to collect text from Page objects
- Improve PDF grammar to handle the text-showing operators (TJ, Tj, ', ") and the spacing values they carry
- Add logic for newline insertion based on Td/TD/Tm operators
- Add heuristic for space insertion based on negative offsets in TJ arrays (a negative value widens the gap before the next glyph)
- Support common ligatures for StandardEncoding and MacRomanEncoding
- Support FlateDecode and ASCIIHexDecode filters
- Update rspamadm mime to support raw PDF extraction (-r flag) and improve content-type detection
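The newline (Td/TD/Tm) and space (TJ offset) heuristics listed above can be illustrated with a small Python sketch. This is not the rspamd code. It assumes operators arrive as already-tokenized `(name, operands)` tuples with decoded strings. The function name `render_text` and the `SPACE_THRESHOLD` value of -200 (about 0.2 em) are hypothetical choices made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical threshold in thousandths of a text space unit. A TJ offset
# more negative than this (a gap wider than ~0.2 em) is read as a space.
SPACE_THRESHOLD = -200

# (operator name, operand list); strings are assumed to be decoded already.
Op = Tuple[str, list]


def render_text(ops: List[Op]) -> str:
    """Join strings from text-showing operators into plain text."""
    out: List[str] = []
    last_tm_f: Optional[float] = None

    def newline() -> None:
        # Only break after some text, and never emit two breaks in a row.
        if out and not out[-1].endswith("\n"):
            out.append("\n")

    for name, operands in ops:
        if name in ("Td", "TD"):
            _tx, ty = operands
            if ty != 0:  # vertical move: text continues on a new line
                newline()
        elif name == "Tm":
            f = operands[5]  # vertical translation of the new text matrix
            if last_tm_f is not None and f != last_tm_f:
                newline()
            last_tm_f = f
        elif name == "Tj":
            out.append(operands[0])
        elif name == "'":
            # string '  : move to the next line, then show the string
            newline()
            out.append(operands[0])
        elif name == '"':
            # aw ac string "  : set spacing, next line, show the string
            newline()
            out.append(operands[2])
        elif name == "TJ":
            for item in operands[0]:
                if isinstance(item, str):
                    out.append(item)
                elif item < SPACE_THRESHOLD:
                    # Negative offsets push the next glyph right; a large
                    # one most likely separates two words.
                    out.append(" ")
    return "".join(out)


if __name__ == "__main__":
    ops = [
        ("Tm", [1, 0, 0, 1, 72, 700]),
        ("TJ", [["Hel", "lo", -250, "world"]]),
        ("Td", [0, -14]),
        ("Tj", ["Next line"]),
    ]
    print(render_text(ops))  # prints "Hello world" then "Next line"
```

Small offsets (for example `-50`) are treated as ordinary kerning and produce no space. Where exactly to put the threshold is a tuning decision that this sketch does not settle.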
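For the ligature bullet, here is a sketch of expanding single-byte ligature codes while decoding a PDF string. The byte values come from the PDF encoding tables: fi/fl sit at octal 256/257 in StandardEncoding and at octal 336/337 in MacRomanEncoding. The description does not say which ligatures are covered, so restricting the table to fi and fl is an assumption. So is the hypothetical name `decode_with_ligatures`, as well as the Latin-1 fallback for other StandardEncoding bytes, which is a simplification.

```python
from typing import Dict

# Assumed ligature set: only fi and fl, expanded to plain ASCII letters.
LIGATURES: Dict[str, Dict[int, str]] = {
    "StandardEncoding": {0xAE: "fi", 0xAF: "fl"},  # octal 256, 257
    "MacRomanEncoding": {0xDE: "fi", 0xDF: "fl"},  # octal 336, 337
}


def decode_with_ligatures(raw: bytes, encoding: str) -> str:
    """Decode a PDF string, expanding fi/fl ligature bytes to ASCII."""
    table = LIGATURES.get(encoding, {})
    out = []
    for b in raw:
        if b in table:
            out.append(table[b])
        elif encoding == "MacRomanEncoding":
            out.append(bytes([b]).decode("mac_roman"))
        else:
            # Simplification: other bytes are treated as Latin-1, which
            # matches StandardEncoding only for most of the ASCII range.
            out.append(chr(b))
    return "".join(out)
```

Expanding to "fi"/"fl" rather than U+FB01/U+FB02 keeps the extracted words matchable by plain-text rules.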
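The two supported filters can be sketched in Python as follows. This version applies a stream's `/Filter` list in order. The names `ascii_hex_decode`, `flate_decode` and `decode_stream` are hypothetical, and `/DecodeParms` predictors are deliberately ignored. ASCIIHexDecode follows the PDF rules: whitespace is skipped, `>` ends the data, and an odd final digit is treated as if followed by `0`.

```python
import zlib
from typing import Callable, Dict, List

PDF_WHITESPACE = b"\x00\t\n\x0c\r "


def ascii_hex_decode(data: bytes) -> bytes:
    digits = bytearray()
    for b in data:
        if b == ord(">"):
            break  # end-of-data marker
        if b in PDF_WHITESPACE:
            continue
        digits.append(b)
    if len(digits) % 2:
        digits.append(ord("0"))  # odd count: pad the last digit with 0
    # Raises ValueError on non-hex characters.
    return bytes.fromhex(digits.decode("ascii"))


def flate_decode(data: bytes) -> bytes:
    # A decompress object returns whatever it can from a truncated stream
    # instead of raising, which suits damaged attachments.
    return zlib.decompressobj().decompress(data)


FILTERS: Dict[str, Callable[[bytes], bytes]] = {
    "ASCIIHexDecode": ascii_hex_decode,
    "FlateDecode": flate_decode,
}


def decode_stream(data: bytes, filters: List[str]) -> bytes:
    """Apply filters in the order they appear in the /Filter array."""
    for name in filters:
        fn = FILTERS.get(name)
        if fn is None:
            raise ValueError(f"unsupported filter: {name}")
        data = fn(data)
    return data
```

For a stream declared with `/Filter [/ASCIIHexDecode /FlateDecode]`, calling `decode_stream(data, ["ASCIIHexDecode", "FlateDecode"])` first converts the hex back to binary and then inflates it.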