public inbox for development@lists.ipfire.org
 help / color / mirror / Atom feed
From: Justin Luth <jluth@mail.com>
To: development@lists.ipfire.org
Subject: [PATCH] Fix bug 12252: updxlrator - handle \ in filename
Date: Thu, 05 Dec 2019 16:07:06 +0300	[thread overview]
Message-ID: <9fdfa42e-cb04-be00-3605-1284d294f5c5@mail.com> (raw)
In-Reply-To: <a345f1bf-67d7-9ec6-bba2-ac84ea02a272@mail.com>

[-- Attachment #1: Type: text/plain, Size: 4654 bytes --]

Recently some Microsoft patches are alternating between using
\ and / as directory separators, leaving files behind in
/var/updatecache/download that can only be removed in a terminal.

What was happening was that the lock file was created using the
URL name containing %5c while wget was actually writing to a filename
with a real backslash. The zero-sized lockfile was moved into the uuid
folder, and the actual .cab file was orphaned in the download folder,
indicated in the GUI as a stranded file that couldn't be deleted.

An alternate way to fix the problem could be to force wget to
use --restrict-file-names=windows - in which case \ | : ? " * < >
would all be saved as escaped codes. That would seem to better
match the URL mangling that apparently is happening.
Cons:
-removing sourceurl mangling means re-downloading existing caches.
-less human-readable URLs / filesnames
-conceptual alternative: I didn't prove/test this theory since it
means re-downloading all oddly-named files.

P.S. I didn't change perl's utime command to use updatefile_shellEscaped
since the filepath is in quotes - the space didn't actually need to be
escaped.

---

  config/updxlrator/download   | 20 ++++++++++++++++----
  config/updxlrator/updxlrator |  2 ++
  2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/config/updxlrator/download b/config/updxlrator/download
index afa6e6cb9..df4a58725 100644
--- a/config/updxlrator/download
+++ b/config/updxlrator/download
@@ -26,6 +26,7 @@ my @http_header=();
  my $remote_size=0;
  my $remote_mtime=0;
  my $updatefile='';
+my $updatefile_shellEscaped='';  #double escaped for re-processing by
shell/system calls
  my $unique=0;
  my $mirror=1;

@@ -38,11 +39,24 @@ my $restartdl = defined($ARGV[3]) ? $ARGV[3] : 0;

  umask(0002);

+# Hopefully correct documentation: based on debugging a "\" problem...
+# WGET doesn't use the filename from the URL, but gets the filename
from the server info.
+#   -mangles that name according to --restrict-file-names.
+#   -for unix, only mangles "/", so +, ~, \ etc. are not escaped/mangled.
+#      -(which suggests the entry for %2f is incorrect BTW)
+# The URL passed to this module is mangled (correct? - perhaps by
http://search.cpan.org/dist/URI/lib/URI/Escape.pm)
+#   -but $updatefile needs to match wget's download name, regardless of
the URL's encoding.
+#       -achievable either by mangling the URL and/or the filename.
+#       -*make sure "updxlrator" has the same adjustments*
  $sourceurl =~ s@\%2b(a)+@ig;
  $sourceurl =~ s@\%2f@/@ig;
  $sourceurl =~ s@\%7e@~@ig;
  $updatefile = substr($sourceurl,rindex($sourceurl,"/")+1);
+$updatefile_shellEscaped = $updatefile;
+$updatefile_shellEscaped =~ s@\%20@\\ @ig;
+$updatefile_shellEscaped =~ s@\%5c@\\\\@ig;
  $updatefile =~ s@\%20@ @ig;
+$updatefile =~ s@\%5c@\\@ig;
  $vendorid =~ tr/A-Z/a-z/;

  unless (-d "$repository/download/$vendorid")
@@ -58,7 +72,7 @@ if($restartdl == 0)

      # hinder multiple downloads from starting simultaneously. Create
empty "lock" file.
      # TODO: Another thread may sneak in between these two commands -
so not fool-proof, but good enough?
-    system("touch $repository/download/$vendorid/$updatefile");
+    system("touch
$repository/download/$vendorid/$updatefile_shellEscaped");

  }
  else
@@ -169,11 +183,9 @@ if ($_ == 0)
      }

      &writelog("Moving file to the cache directory: $vendorid/$uuid");
-    $updatefile =~ s@ @\\ @ig;
-    system("mv $repository/download/$vendorid/$updatefile
$repository/$vendorid/$uuid");
+    system("mv $repository/download/$vendorid/$updatefile_shellEscaped
$repository/$vendorid/$uuid");
      # Workaround for IPCop's mv bug:
      utime time,$remote_mtime,"$repository/$vendorid/$uuid/$updatefile";
-    $updatefile =~ s@\\ @ @ig;

&UPDXLT::setcachestatus("$repository/$vendorid/$uuid/source.url",$sourceurl);
&UPDXLT::setcachestatus("$repository/$vendorid/$uuid/status",$UPDXLT::sfOk);
diff --git a/config/updxlrator/updxlrator b/config/updxlrator/updxlrator
index cdc7eeb50..1febae51e 100644
--- a/config/updxlrator/updxlrator
+++ b/config/updxlrator/updxlrator
@@ -338,11 +338,13 @@ sub check_cache
      my $sourceurl=$_[0];
      my $cfmirror=$_[4];

+    # Any changes made here must be mirrored in "download"
      $sourceurl =~ s@\%2b(a)+@ig;
      $sourceurl =~ s@\%2f@/@ig;
      $sourceurl =~ s@\%7e@~@ig;
      $updfile = substr($sourceurl,rindex($sourceurl,"/")+1);
      $updfile =~ s@\%20@ @ig;
+    $updfile =~ s@\%5c@\\@ig;

      if ($cfmirror)
      {

--
2.17.1


       reply	other threads:[~2019-12-05 13:07 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <a345f1bf-67d7-9ec6-bba2-ac84ea02a272@mail.com>
2019-12-05 13:07 ` Justin Luth [this message]
2019-12-06  6:55   ` Michael Tremer
2019-12-09 17:57     ` Justin Luth
2019-12-13 22:45       ` Michael Tremer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9fdfa42e-cb04-be00-3605-1284d294f5c5@mail.com \
    --to=jluth@mail.com \
    --cc=development@lists.ipfire.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox