Details
Bug
Resolution: Done
P2: Important
6.2.1
None
7450eda927436a59f34f1a1455a6d6a9515d8156 (qt/qt5compat/dev)
Description
I'm testing a simple XML document whose encoding declaration specifies something other than UTF-8:
<?xml version="1.0" encoding="iso-8859-1"?>
<child><t>Hällo, world</t></child>
However, the declared encoding is not honored. Instead, I receive garbage in place of the umlaut character.
After debugging the issue, I think the problem is here: qt5compat/src/core5/sax/qxml.cpp, line 1348++ in method QXmlInputSource::fromRawData:
...
bool needMoreText;
QByteArray encoding = extractEncodingDecl(d->encodingDeclChars, &needMoreText).toLatin1();
if (!encoding.isEmpty()) {
    auto e = QStringDecoder::encodingForData(encoding);
    if (e && *e != QStringDecoder::Utf8) {
        d->toUnicode = QStringDecoder(*e);
        ...
"extractEncodingDecl" correctly reads the encoding as "iso-8859-1", but "QStringDecoder::encodingForData" does not seem to look up the decoder for that name. Instead, it tries to guess the encoding from the content of the string it is given, which here is the encoding name itself.
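To make the distinction concrete, here is a small Qt-free sketch of the two lookup strategies. All names in it (Encoding, guessFromContent, lookupByName, latin1ToUtf8) are hypothetical and are not Qt API. It assumes that content-based detection looks for a byte order mark, which may not match Qt's implementation exactly.

```cpp
#include <cctype>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical stand-in for an encoding enum; not Qt's API.
enum class Encoding { Utf8, Utf16LE, Utf16BE, Latin1 };

// Guess an encoding from the bytes themselves by checking for a byte order mark.
// Assumption: this is roughly what a content-based detector can do.
std::optional<Encoding> guessFromContent(std::string_view data) {
    if (data.size() >= 3 && data.substr(0, 3) == "\xEF\xBB\xBF") return Encoding::Utf8;
    if (data.size() >= 2 && data.substr(0, 2) == "\xFF\xFE") return Encoding::Utf16LE;
    if (data.size() >= 2 && data.substr(0, 2) == "\xFE\xFF") return Encoding::Utf16BE;
    return std::nullopt;  // No BOM: the encoding cannot be determined from content.
}

// Look up an encoding by its declared name, ignoring case, '-' and '_'.
std::optional<Encoding> lookupByName(std::string_view name) {
    std::string key;
    for (unsigned char c : name) {
        if (c != '-' && c != '_') key += static_cast<char>(std::tolower(c));
    }
    if (key == "utf8") return Encoding::Utf8;
    if (key == "iso88591" || key == "latin1") return Encoding::Latin1;
    return std::nullopt;  // Unknown name.
}

// Convert ISO-8859-1 bytes to UTF-8: each byte maps to the code point of the same value.
std::string latin1ToUtf8(std::string_view in) {
    std::string out;
    for (unsigned char c : in) {
        if (c < 0x80) {
            out += static_cast<char>(c);                  // ASCII passes through unchanged.
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));    // Lead byte of a 2-byte sequence.
            out += static_cast<char>(0x80 | (c & 0x3F));  // Continuation byte.
        }
    }
    return out;
}
```

Passing the declared name "iso-8859-1" to guessFromContent yields no result, because the name carries no byte order mark, so a caller would fall back to its default decoder. Passing the same name to lookupByName selects Latin-1, after which the byte 0xE4 decodes to "ä" as intended.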
Qt 5 versions used "QTextCodec::codecForName" here, which yields the desired result:
...
bool needMoreText;
QString encoding = extractEncodingDecl(d->encodingDeclChars, &needMoreText);
if (!encoding.isEmpty()) {
    if (QTextCodec *codec = QTextCodec::codecForName(std::move(encoding).toLatin1())) {
        /* If the encoding is the same, we don't have to do toUnicode() all over again. */
        if (codec->mibEnum() != mib) {
            delete d->encMapper;
            ...