My minimally updated patch is re-included at the end of this message, and comments are answered inline with the original context. I only tweaked some comment messages and replaced a system call I had recently introduced with a perl-native implementation.
On 12/6/19 9:55 AM, Michael Tremer wrote:
This seems to be a tricky one.
+# Hopefully correct documentation: based on debugging a "\" problem...
+# WGET doesn't use the filename from the URL, but gets the filename from the server info.
+# -mangles that name according to --restrict-file-names.
+# -for unix, only mangles "/", so +, ~, \ etc. are not escaped/mangled.
+# -(which suggests the entry for %2f is incorrect BTW)
+# The URL passed to this module is mangled (correct? - perhaps by http://search.cpan.org/dist/URI/lib/URI/Escape.pm)
+# -but $updatefile needs to match wget's download name, regardless of the URL's encoding.
+# -achievable either by mangling the URL and/or the filename.
+# -*make sure "updxlrator" has the same adjustments*
 $sourceurl =~ s@%2b@+@ig;
 $sourceurl =~ s@%2f@/@ig;
 $sourceurl =~ s@%7e@~@ig;
 $updatefile = substr($sourceurl,rindex($sourceurl,"/")+1);
+$updatefile_shellEscaped = $updatefile;
+$updatefile_shellEscaped =~ s@%20@\ @ig;
+$updatefile_shellEscaped =~ s@%5c@\\@ig;
 $updatefile =~ s@%20@ @ig;
+$updatefile =~ s@%5c@\\@ig;
 $vendorid =~ tr/A-Z/a-z/;
It is quite hard to figure out what was actually supposed to happen here. The replacements look more like ad-hoc fixes than a design that thinks ahead and decodes all special characters.
Maybe, but most of the escaped characters are really weird to use in filenames, so the approach was likely: instead of decoding every possibility, let's just decode the ones that are actually in use. And in 7 years I haven't noticed backslashes being a problem - but suddenly a new patch URL has popped up (wrongly) requesting a backslash. So now we need to deal with that situation or else watch our ipfire cache look increasingly messy.
 unless (-d "$repository/download/$vendorid")
@@ -58,7 +72,7 @@ if($restartdl == 0)
 	# hinder multiple downloads from starting simultaneously. Create empty "lock" file.
 	# TODO: Another thread may sneak in between these two commands - so not fool-proof, but good enough?
-	system("touch $repository/download/$vendorid/$updatefile");
+	system("touch $repository/download/$vendorid/$updatefile_shellEscaped");
However, this is all needed because we are calling shell commands here to move a file.
Not ALL of this is needed because of shell commands. Only the duplicate _shellEscaped variable is needed for the shell commands. The addition of
$updatefile =~ s@%5c@\\@ig;
is required regardless of using shell commands or not.
If we used Perl's move() function, we could avoid escaping the characters here. They should all be allowed in the file system:
https://perldoc.perl.org/File/Copy.html#move
Would you like to try that option?
Not really, since I'm just a bugfixer and not a perl programmer. But I did try it, and it (silently) failed to work. I also tried to use make_path instead of mkdir -p and that also silently failed. So I abandoned the attempt to remove all system calls. I'm also not eager to introduce regressions, and that is exactly what would happen if I started playing with things that I don't understand and don't know how to debug.
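For what it's worth, both perl-native calls report failures through return values rather than dying, which would explain the "silent" failures. A hedged sketch of how the errors could be surfaced - the paths reuse the script's variable names, and the warn-based handling is mine, not part of the patch:

```perl
# Sketch only: checking the perl-native alternatives for errors.
use File::Copy qw(move);
use File::Path qw(make_path);

# make_path() reports problems via its 'error' option instead of dying.
make_path("$repository/$vendorid/$uuid", { error => \my $errors });
warn "make_path failed" if @$errors;

# move() returns false on failure and leaves the reason in $!.
move("$repository/download/$vendorid/$updatefile",
     "$repository/$vendorid/$uuid")
    or warn "move failed: $!";

# A third option needing no escaping at all: the list form of
# system() bypasses the shell entirely, so spaces and backslashes
# in the filename are harmless.
system("mv", "$repository/download/$vendorid/$updatefile",
             "$repository/$vendorid/$uuid") == 0
    or warn "mv failed: $?";
```

Any of these would make a failure visible in the logs instead of quietly leaving the cache inconsistent.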
-Michael
Recently, some Microsoft patch URLs have been alternating between \ and / as directory separators, leaving files behind in updatecache/download that can only be removed from a terminal.
What was happening was that the lock file was created using the URL name containing %5c while wget was actually writing to a filename with a real backslash. The zero-sized lockfile was moved into the uuid folder, and the actual .cab file was orphaned in the download folder, indicated in the GUI as a stranded file that couldn't be deleted.
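The mismatch is easy to reproduce in isolation. A small illustration (the URL tail is made up; the substitution is the same style the script already uses):

```perl
# Sketch only: why the lock file and wget's download name never matched.
my $tail = "foo%5cbar.cab";              # filename part of the URL

my $lockname = $tail;                    # old code: %5c left as-is
(my $wgetname = $tail) =~ s@%5c@\\@ig;   # wget wrote a real backslash

print "lock: $lockname\n";               # foo%5cbar.cab
print "wget: $wgetname\n";               # foo\bar.cab
print "match: ", ($lockname eq $wgetname ? "yes" : "no"), "\n";   # no
```

Since the two names never compare equal, the zero-sized lock file gets moved while the real .cab stays stranded - exactly the behaviour seen in the GUI.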
An alternate way to fix the problem could be to force wget to use --restrict-file-names=windows, in which case \ | : ? " * < > would all be saved as escaped codes. That would seem to better match the URL mangling that apparently is happening. Cons:
-removing the sourceurl mangling means re-downloading existing caches.
-less human-readable URLs / filenames.
-this is only a conceptual alternative: I didn't prove/test the theory, since doing so means re-downloading all oddly-named files.
P.S. I didn't change perl's utime command to use updatefile_shellEscaped since the filepath is in quotes - the space didn't actually need to be escaped.
---
 config/updxlrator/download   | 21 +++++++++++++++++----
 config/updxlrator/updxlrator |  2 ++
 2 files changed, 19 insertions(+), 4 deletions(-)
diff --git a/config/updxlrator/download b/config/updxlrator/download
index afa6e6cb9..ab89171e4 100644
--- a/config/updxlrator/download
+++ b/config/updxlrator/download
@@ -26,6 +26,7 @@
 my @http_header=();
 my $remote_size=0;
 my $remote_mtime=0;
 my $updatefile='';
+my $updatefile_shellEscaped=''; #double escaped for re-processing by shell/system calls
 my $unique=0;
 my $mirror=1;
@@ -38,11 +39,24 @@ my $restartdl = defined($ARGV[3]) ? $ARGV[3] : 0;
umask(0002);
+# Hopefully correct documentation: based on debugging a "\" problem...
+# WGET doesn't use the filename from the URL, but gets the filename from the server info.
+# -mangles that name according to --restrict-file-names.
+# -for unix, only mangles "/", so +, ~, \ etc. are not escaped/mangled.
+# -(which suggests the entry for %2f is incorrect BTW)
+# The URL passed to this module is mangled (correct? - perhaps by http://search.cpan.org/dist/URI/lib/URI/Escape.pm)
+# -but $updatefile needs to match wget's download name, regardless of the URL's encoding.
+# -achievable either by mangling the URL and/or the filename.
+# -*make sure "updxlrator" has the same adjustments*
 $sourceurl =~ s@%2b@+@ig;
 $sourceurl =~ s@%2f@/@ig;
 $sourceurl =~ s@%7e@~@ig;
 $updatefile = substr($sourceurl,rindex($sourceurl,"/")+1);
+$updatefile_shellEscaped = $updatefile;
+$updatefile_shellEscaped =~ s@%20@\ @ig;
+$updatefile_shellEscaped =~ s@%5c@\\@ig;
 $updatefile =~ s@%20@ @ig;
+$updatefile =~ s@%5c@\\@ig; #added to fix ipfire bug 12252
 $vendorid =~ tr/A-Z/a-z/;
 unless (-d "$repository/download/$vendorid")
@@ -58,8 +72,9 @@ if($restartdl == 0)
 	# hinder multiple downloads from starting simultaneously. Create empty "lock" file.
 	# TODO: Another thread may sneak in between these two commands - so not fool-proof, but good enough?
-	system("touch $repository/download/$vendorid/$updatefile");
+	#use a native perl implementation of: system("touch $repository/download/$vendorid/$updatefile_shellEscaped");
+	{ open my $lock_file, ">>", "$repository/download/$vendorid/$updatefile" }
 }
 else
 {
@@ -169,11 +184,9 @@ if ($_ == 0)
 	}
 	&writelog("Moving file to the cache directory: $vendorid/$uuid");
-	$updatefile =~ s@ @\ @ig;
-	system("mv $repository/download/$vendorid/$updatefile $repository/$vendorid/$uuid");
+	system("mv $repository/download/$vendorid/$updatefile_shellEscaped $repository/$vendorid/$uuid");
 	# Workaround for IPCop's mv bug:
 	utime time,$remote_mtime,"$repository/$vendorid/$uuid/$updatefile";
-	$updatefile =~ s@\ @ @ig;
 	&UPDXLT::setcachestatus("$repository/$vendorid/$uuid/source.url",$sourceurl);
 	&UPDXLT::setcachestatus("$repository/$vendorid/$uuid/status",$UPDXLT::sfOk);
diff --git a/config/updxlrator/updxlrator b/config/updxlrator/updxlrator
index cdc7eeb50..1febae51e 100644
--- a/config/updxlrator/updxlrator
+++ b/config/updxlrator/updxlrator
@@ -338,11 +338,13 @@ sub check_cache
 	my $sourceurl=$_[0];
 	my $cfmirror=$_[4];
+	# Any changes made here must be mirrored in "download"
 	$sourceurl =~ s@%2b@+@ig;
 	$sourceurl =~ s@%2f@/@ig;
 	$sourceurl =~ s@%7e@~@ig;
 	$updfile = substr($sourceurl,rindex($sourceurl,"/")+1);
 	$updfile =~ s@%20@ @ig;
+	$updfile =~ s@%5c@\\@ig;
 	if ($cfmirror)
 	{
--
2.17.1