  1. Qt
  2. QTBUG-84020

QUrl upcases "percent-encoded" parts of the url

Details

    • Bug
    • Resolution: Invalid
    • Not Evaluated
    • None
    • 5.12.1, 5.14.2, 5.15.0 Beta4
    • Core: URL Handling
    • None
    • Windows 10 PC
    • Windows

    Description

      It seems to be impossible to open some specific urls using QUrl, because Qt modifies them.

      Consider the following example url:
      https://www.scopus.com/inward/record.uri?eid=2-s2.0-85063442214&doi=10.1117%2f12.2522991&partnerID=40&md5=945b4b60e9b05578fd98c79bfba20c9a .

      I use this real-world url, provided by a well-known and respectable company, to demonstrate that such urls do exist in practice.

       

      Firstly, please make sure this url works for you if you click on it, or if you copy-paste it to the url field of your browser.

      It works for me now, but I cannot guarantee Scopus won't break it in a while.

      • When it works, it shows you the info about the "Large scale..." paper by Chukalina, M. and others (note "DOI: 10.1117/12.2522991").
      • When something is wrong (e.g. if you remove a couple of characters from the very end of the url), you usually end up on a page that asks you to sign in.

       

      Secondly, note that this url contains "%2f". This part looks much like a standard RFC3986 (sec 2.1) percent-encoding, but:

      • it uses lower-case "f", while the standard recommends upper-case "F"
      • if you replace "%2f" with "%2F", the url breaks, i.e. the server treats it as case-sensitive, while the standard treats upper-case and lower-case hex digits in a percent-encoding as equivalent.
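
      For illustration, the case normalization that RFC3986 recommends for percent-encodings can be sketched without Qt. This is only a minimal sketch: upcasePercentTriplets is a hypothetical helper name, not a Qt or standard-library function, and it is not meant to mirror QUrl's actual implementation.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not part of Qt): upper-cases the two hex digits of
// every well-formed "%XX" triplet and leaves all other bytes untouched.
std::string upcasePercentTriplets(const std::string &in)
{
    std::string out = in;
    for (std::size_t i = 0; i + 2 < out.size(); ++i) {
        const unsigned char h1 = static_cast<unsigned char>(out[i + 1]);
        const unsigned char h2 = static_cast<unsigned char>(out[i + 2]);
        if (out[i] == '%' && std::isxdigit(h1) && std::isxdigit(h2)) {
            out[i + 1] = static_cast<char>(std::toupper(h1));
            out[i + 2] = static_cast<char>(std::toupper(h2));
            i += 2; // skip the two digits that were just processed
        }
    }
    return out;
}
```

      Applied to the query fragment "doi=10.1117%2f12.2522991", this returns "doi=10.1117%2F12.2522991". The two strings are equivalent under the RFC, yet the Scopus server answers them differently.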

       

      Thirdly, let's try to make the Qt desktop app open this url: 

      #include <QApplication>
      #include <QDesktopServices>
      #include <QUrl>
      #include <QPushButton>
      int main(int argc, char *argv[])
      {
          QApplication a(argc, argv);
          // Raw string literal, so the lower-case "%2f" reaches QUrl unchanged.
          QString link = R"(https://www.scopus.com/inward/record.uri?eid=2-s2.0-85063442214&doi=10.1117%2f12.2522991&partnerID=40&md5=945b4b60e9b05578fd98c79bfba20c9a)";
      
          // Button 1: parse the link in TolerantMode and open it on click.
          QPushButton p1("Open TolerantMode");
          QUrl url1 = QUrl(link, QUrl::TolerantMode); // default
          QObject::connect(&p1, &QPushButton::clicked, [&url1]()
              { QDesktopServices::openUrl(url1); }
          );
          p1.show();
          p1.move(100, 100);
      
          // Button 2: same link, parsed in StrictMode.
          QPushButton p2("Open StrictMode");
          QUrl url2 = QUrl(link, QUrl::StrictMode);
          QObject::connect(&p2, &QPushButton::clicked, [&url2]()
              { QDesktopServices::openUrl(url2); }
          );
          p2.show();
          p2.move(100, 170);
          return a.exec();
      }
      

      And... Both "Open StrictMode" and "Open TolerantMode" fail for me - they both replace "%2f" with "%2F".

      • this is especially unexpected for the StrictMode - it turns out that it's not that strict.
      • the TolerantMode documentation lists the corrections it performs, but this replacement does not seem to be listed there.
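
      To confirm that the hex-digit case is the only thing that changed between the url passed in and the url that ends up in the browser, the two strings can be compared with a small Qt-free check. This is a sketch under assumptions: differOnlyInPercentCase is a hypothetical diagnostic helper, and the "rewritten" string has to be copied by hand from the browser's address bar.

```cpp
#include <cctype>
#include <string>

// Hypothetical diagnostic helper: returns true if `a` and `b` are identical
// except for the letter case of hex digits inside "%XX" triplets.
// Returns false for identical strings, since they do not differ at all.
bool differOnlyInPercentCase(const std::string &a, const std::string &b)
{
    if (a.size() != b.size())
        return false;

    std::size_t hexLeft = 0;    // hex digits still expected after a '%'
    bool sawDifference = false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const unsigned char ca = static_cast<unsigned char>(a[i]);
        const unsigned char cb = static_cast<unsigned char>(b[i]);
        if (hexLeft > 0) {
            --hexLeft;
            if (ca != cb) {
                // Inside a triplet only a case change of a hex digit is allowed.
                if (!std::isxdigit(ca) || !std::isxdigit(cb)
                    || std::tolower(ca) != std::tolower(cb))
                    return false;
                sawDifference = true;
            }
            continue;
        }
        if (ca != cb)
            return false;       // any change outside a triplet is a real change
        if (ca == '%')
            hexLeft = 2;
    }
    return sawDifference;
}
```

      With the original link as `a` and the url observed in the browser as `b`, a result of true means that nothing but the case of the percent-encoded digits was altered.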

      One real-world result is that a Qt-based pdf viewer is unable to open a url that works perfectly in Adobe Acrobat and other pdf viewers, see this GitHub issue.

       

      Lastly, even though Scopus's url syntax practice is somewhat questionable (and I'm going to report this to their support shortly after we deal with this ticket), RFC3986 (sec 2.4) also says that, in general, one should not assume anything about the "data" in the url. Thus it is not safe to replace "%2f" with "%2F" (and, afaiu, there's no reason to do this): according to RFC3986 itself, it is not even safe to assume that this is an RFC3986 percent-encoding (and thus case-insensitive). It might be just some "arbitrary data":

      Under normal circumstances, the only time when octets within a URI are percent-encoded is during the process of producing the URI from its component parts. This is when an implementation determines which of the reserved characters are to be used as subcomponent delimiters and which can be safely used as data.

          People

            thiago Thiago Macieira
            i3v Igor Varfolomeev