Hi, Sorry for not replying earlier. It is the run-up to Christmas and I am quite busy with a thousand little things. So stay tuned if you do not hear from me very swiftly. > On 9 Dec 2019, at 17:57, Justin Luth wrote: > > My minimally updated patch is re-included at the end of the message, and > comments are responded to inline with the original context. I just > tweaked some comment messages, and replaced a system call that I > introduced recently with a perl-native implementation. > > On 12/6/19 9:55 AM, Michael Tremer wrote: >> This seems to be a tricky one. >> >> +# Hopefully correct documentation: based on debugging a "\" problem... >> +# WGET doesn't use the filename from the URL, but gets the filename >> from the server info. >> +# -mangles that name according to --restrict-file-names. >> +# -for unix, only mangles "/", so +, ~, \ etc. are not escaped/mangled. >> +# -(which suggests the entry for %2f is incorrect BTW) >> +# The URL passed to this module is mangled (correct? - perhaps by >> http://search.cpan.org/dist/URI/lib/URI/Escape.pm) >> +# -but $updatefile needs to match wget's download name, regardless of >> the URL's encoding. >> +# -achievable either by mangling the URL and/or the filename. >> +# -*make sure "updxlrator" has the same adjustments* >> $sourceurl =~ s@\%2b(a)+@ig; >> $sourceurl =~ s@\%2f@/@ig; >> $sourceurl =~ s@\%7e@~@ig; >> $updatefile = substr($sourceurl,rindex($sourceurl,"/")+1); >> +$updatefile_shellEscaped = $updatefile; >> +$updatefile_shellEscaped =~ s@\%20@\\ @ig; >> +$updatefile_shellEscaped =~ s@\%5c@\\\\@ig; >> $updatefile =~ s@\%20@ @ig; >> +$updatefile =~ s@\%5c@\\@ig; >> $vendorid =~ tr/A-Z/a-z/; >> >> It is quite hard to figure out what was actually supposed to happen here. The replacements look more or less like fixes instead of thinking ahead and decoding any special characters. > Maybe, but most of the escaped characters are really weird for use in > filenames, so likely the approach was "instead of decoding every > possibility, lets just do the ones that are actually in use." And in 7 > years I haven't noticed backslashes being a problem - but suddenly a new > patch URL has popped up (wrongly) requesting a backslash. So now we need > to deal with that situation or else have our ipfire cache look > increasingly messy. Yeah it looks like it, but try and error isn’t always the best approach for software development. >> >>> unless (-d "$repository/download/$vendorid") >>> @@ -58,7 +72,7 @@ if($restartdl == 0) >>> >>> # hinder multiple downloads from starting simultaneously. Create >>> empty "lock" file. >>> # TODO: Another thread may sneak in between these two commands - >>> so not fool-proof, but good enough? >>> - system("touch $repository/download/$vendorid/$updatefile"); >>> + system("touch >>> $repository/download/$vendorid/$updatefile_shellEscaped"); >> However, this is all needed because we are calling shell commands here to move a file. > Not ALL of this is needed because of shell commands. Only the duplicate > _shellEscaped variable is needed for the shell commands. The addition of > > $updatefile =~ s@\%5c@\\@ig; > > is required regardless of using shell commands or not. >> If we would use perl’s move() function, then we could avoid escaping the characters here. They should all be allowed to be used in the file system: >> >> https://perldoc.perl.org/File/Copy.html#move >> >> Would you like to try that option? > Not really, since I'm just a bugfixer and not a perl programmer. But I > did try it, and it (silently) failed to work. I also tried to use > make_path instead of mkdir -p and that also silently failed. So I > abandoned the attempt to remove all system calls. I'm also not eager to > introduce regressions, and that is exactly what would happen if I > started playing with things that I don't understand and don't know how > to debug. LOL, nobody is eager to introduce any regressions :) If you have IO operations like mkdir() you can catch any errors with the “die” keyword: mkdir(“$path/file”) or die “Could not create directory: $!”; That will cause the script to abort and log the message. >> >> -Michael > > Recently some Microsoft patches are alternating between using > \ and / as directory separators, leaving files behind in > updatecache/download that can only be removed in a terminal. > > What was happening was that the lock file was created using the > URL name containing %5c while wget was actually writing to a filename > with a real backslash. The zero-sized lockfile was moved into the uuid > folder, and the actual .cab file was orphaned in the download folder, > indicated in the GUI as a stranded file that couldn't be deleted. > > An alternate way to fix the problem could be to force wget to > use --restrict-file-names=windows - in which case \ | : ? " * < > > would all be saved as escaped codes. That would seem to better > match the URL mangling that apparently is happening. > Cons: > -removing sourceurl mangling means re-downloading existing caches. > -less human-readable URLs / filenames > -conceptual alternative: I didn't prove/test this theory since it > means re-downloading all oddly-named files. I would consider that that is a change that has more implications for the rest of the code which is not robust enough against it. Best, -Michael > > P.S. I didn't change perl's utime command to use updatefile_shellEscaped > since the filepath is in quotes - the space didn't actually need to be > escaped. > --- > config/updxlrator/download | 21 +++++++++++++++++---- > config/updxlrator/updxlrator | 2 ++ > 2 files changed, 19 insertions(+), 4 deletions(-) > > diff --git a/config/updxlrator/download b/config/updxlrator/download > index afa6e6cb9..ab89171e4 100644 > --- a/config/updxlrator/download > +++ b/config/updxlrator/download > @@ -26,6 +26,7 @@ my @http_header=(); > my $remote_size=0; > my $remote_mtime=0; > my $updatefile=''; > +my $updatefile_shellEscaped=''; #double escaped for re-processing by shell/system calls > my $unique=0; > my $mirror=1; > > @@ -38,11 +39,24 @@ my $restartdl = defined($ARGV[3]) ? $ARGV[3] : 0; > > umask(0002); > > +# Hopefully correct documentation: based on debugging a "\" problem... > +# WGET doesn't use the filename from the URL, but gets the filename from the server info. > +# -mangles that name according to --restrict-file-names. > +# -for unix, only mangles "/", so +, ~, \ etc. are not escaped/mangled. > +# -(which suggests the entry for %2f is incorrect BTW) > +# The URL passed to this module is mangled (correct? - perhaps by http://search.cpan.org/dist/URI/lib/URI/Escape.pm) > +# -but $updatefile needs to match wget's download name, regardless of the URL's encoding. > +# -achievable either by mangling the URL and/or the filename. > +# -*make sure "updxlrator" has the same adjustments* > $sourceurl =~ s@\%2b(a)+@ig; > $sourceurl =~ s@\%2f@/@ig; > $sourceurl =~ s@\%7e@~@ig; > $updatefile = substr($sourceurl,rindex($sourceurl,"/")+1); > +$updatefile_shellEscaped = $updatefile; > +$updatefile_shellEscaped =~ s@\%20@\\ @ig; > +$updatefile_shellEscaped =~ s@\%5c@\\\\@ig; > $updatefile =~ s@\%20@ @ig; > +$updatefile =~ s@\%5c@\\@ig; #added to fix ipfire bug 12252 > $vendorid =~ tr/A-Z/a-z/; > > unless (-d "$repository/download/$vendorid") > @@ -58,8 +72,9 @@ if($restartdl == 0) > > # hinder multiple downloads from starting simultaneously. Create empty "lock" file. > # TODO: Another thread may sneak in between these two commands - so not fool-proof, but good enough? > - system("touch $repository/download/$vendorid/$updatefile"); > > + #use a native perl implementation of: system("touch $repository/download/$vendorid/$updatefile_shellEscaped"); > + { open my $lock_file, ">>", "$repository/download/$vendorid/$updatefile" } > } > else > { > @@ -169,11 +184,9 @@ if ($_ == 0) > } > > &writelog("Moving file to the cache directory: $vendorid/$uuid"); > - $updatefile =~ s@ @\\ @ig; > - system("mv $repository/download/$vendorid/$updatefile $repository/$vendorid/$uuid"); > + system("mv $repository/download/$vendorid/$updatefile_shellEscaped $repository/$vendorid/$uuid"); > # Workaround for IPCop's mv bug: > utime time,$remote_mtime,"$repository/$vendorid/$uuid/$updatefile"; > - $updatefile =~ s@\\ @ @ig; > > &UPDXLT::setcachestatus("$repository/$vendorid/$uuid/source.url",$sourceurl); > &UPDXLT::setcachestatus("$repository/$vendorid/$uuid/status",$UPDXLT::sfOk); > diff --git a/config/updxlrator/updxlrator b/config/updxlrator/updxlrator > index cdc7eeb50..1febae51e 100644 > --- a/config/updxlrator/updxlrator > +++ b/config/updxlrator/updxlrator > @@ -338,11 +338,13 @@ sub check_cache > my $sourceurl=$_[0]; > my $cfmirror=$_[4]; > > + # Any changes made here must be mirrored in "download" > $sourceurl =~ s@\%2b(a)+@ig; > $sourceurl =~ s@\%2f@/@ig; > $sourceurl =~ s@\%7e@~@ig; > $updfile = substr($sourceurl,rindex($sourceurl,"/")+1); > $updfile =~ s@\%20@ @ig; > + $updfile =~ s@\%5c@\\@ig; > > if ($cfmirror) > { > -- > 2.17.1 >