Details
- Bug
- Resolution: Unresolved
- P3: Somewhat important
- None
- 6.3.2
- None
Description
QPdfDocument::getAllText does not return all of the text on a page; some characters are missing. Please check the attached out.pdf. The attached out.txt was generated with PyMuPDF (the fitz module):
python3 -m fitz gettext -pages 1 out.pdf
That output is correct. The result from QPdfDocument::getAllText, however, is missing some characters; I saved it in getAllText.txt. Here is the diff:
getAllText: 是一个共享 ,供 个 系 统 (如在计算 机之
PyMuPDF: 接口是一个共享框架,供 两 个 系 统 (如在计算机和打印机之间
As the diff shows, many characters are missing. I suspect PDFium returns the wrong result, but Chrome handles this PDF correctly (copying text works fine, as it does in other PDF viewers). Could this be related to the Chromium version that Qt uses?
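To make the comparison easier to reproduce, here is a minimal sketch that lists the characters present in the PyMuPDF output but absent from the getAllText output. It uses only Python's standard difflib module. The helper name missing_chars is hypothetical and not part of Qt or PyMuPDF, and the two strings are the sample lines from the diff above. Whitespace is ignored on the assumption that only dropped glyphs matter.

```python
import difflib


def missing_chars(partial: str, reference: str) -> str:
    """Return the characters of `reference` that have no match in `partial`.

    Hypothetical helper for this report; not part of Qt or PyMuPDF.
    """
    # Strip all whitespace so spacing differences are not reported.
    a = "".join(partial.split())
    b = "".join(reference.split())
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    dropped = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "insert" and "replace" mark text that exists only in the reference.
        if tag in ("insert", "replace"):
            dropped.append(b[j1:j2])
    return "".join(dropped)


# Sample lines copied from the diff in this report.
qt_line = "是一个共享 ,供 个 系 统 (如在计算 机之"
mupdf_line = "接口是一个共享框架,供 两 个 系 统 (如在计算机和打印机之间"

print(missing_chars(qt_line, mupdf_line))
```

The same function can be run line by line over getAllText.txt and out.txt to get a complete list of the dropped characters for the whole page.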