From mboxrd@z Thu Jan 1 00:00:00 1970 From: Michael Tremer To: development@lists.ipfire.org Subject: Re: [PATCH] Fix bug 12252: updxlrator - handle \ in filename Date: Fri, 13 Dec 2019 22:45:29 +0000 Message-ID: <55B66DB1-FD7E-4AA0-B333-3813C19E4094@ipfire.org> In-Reply-To: <478e0c2b-a656-e5c2-6051-e10f1b3596fb@mail.com> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="===============2427243766983400828==" List-Id: --===============2427243766983400828== Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Hi, Sorry for not replying earlier. It is the run-up to Christmas and I am quite busy with a thousand little thin= gs. So stay tuned if you do not hear from me very swiftly. > On 9 Dec 2019, at 17:57, Justin Luth wrote: >=20 > My minimally updated patch is re-included at the end of the message, and > comments are responded to inline with the original context. I just > tweaked some comment messages, and replaced a system call that I > introduced recently with a perl-native implementation. >=20 > On 12/6/19 9:55 AM, Michael Tremer wrote: >> This seems to be a tricky one. >>=20 >> +# Hopefully correct documentation: based on debugging a "\" problem... >> +# WGET doesn't use the filename from the URL, but gets the filename >> from the server info. >> +# -mangles that name according to --restrict-file-names. >> +# -for unix, only mangles "/", so +, ~, \ etc. are not escaped/mangled. >> +# -(which suggests the entry for %2f is incorrect BTW) >> +# The URL passed to this module is mangled (correct? - perhaps by >> http://search.cpan.org/dist/URI/lib/URI/Escape.pm) >> +# -but $updatefile needs to match wget's download name, regardless of >> the URL's encoding. >> +# -achievable either by mangling the URL and/or the filename. >> +# -*make sure "updxlrator" has the same adjustments* >> $sourceurl =3D~ s@\%2b(a)+@ig; >> $sourceurl =3D~ s@\%2f@/@ig; >> $sourceurl =3D~ s@\%7e@~@ig; >> $updatefile =3D substr($sourceurl,rindex($sourceurl,"/")+1); >> +$updatefile_shellEscaped =3D $updatefile; >> +$updatefile_shellEscaped =3D~ s@\%20@\\ @ig; >> +$updatefile_shellEscaped =3D~ s@\%5c@\\\\@ig; >> $updatefile =3D~ s@\%20@ @ig; >> +$updatefile =3D~ s@\%5c@\\@ig; >> $vendorid =3D~ tr/A-Z/a-z/; >>=20 >> It is quite hard to figure out what was actually supposed to happen here. = The replacements look more or less like fixes instead of thinking ahead and d= ecoding any special characters. > Maybe, but most of the escaped characters are really weird for use in > filenames, so likely the approach was "instead of decoding every > possibility, lets just do the ones that are actually in use." And in 7 > years I haven't noticed backslashes being a problem - but suddenly a new > patch URL has popped up (wrongly) requesting a backslash. So now we need > to deal with that situation or else have our ipfire cache look > increasingly messy. Yeah it looks like it, but try and error isn=E2=80=99t always the best approa= ch for software development. >>=20 >>> unless (-d "$repository/download/$vendorid") >>> @@ -58,7 +72,7 @@ if($restartdl =3D=3D 0) >>>=20 >>> # hinder multiple downloads from starting simultaneously. Create >>> empty "lock" file. >>> # TODO: Another thread may sneak in between these two commands - >>> so not fool-proof, but good enough? >>> - system("touch $repository/download/$vendorid/$updatefile"); >>> + system("touch >>> $repository/download/$vendorid/$updatefile_shellEscaped"); >> However, this is all needed because we are calling shell commands here to = move a file. > Not ALL of this is needed because of shell commands. Only the duplicate > _shellEscaped variable is needed for the shell commands. The addition of >=20 > $updatefile =3D~ s@\%5c@\\@ig; >=20 > is required regardless of using shell commands or not. >> If we would use perl=E2=80=99s move() function, then we could avoid escapi= ng the characters here. They should all be allowed to be used in the file sys= tem: >>=20 >> https://perldoc.perl.org/File/Copy.html#move >>=20 >> Would you like to try that option? > Not really, since I'm just a bugfixer and not a perl programmer. But I > did try it, and it (silently) failed to work. I also tried to use > make_path instead of mkdir -p and that also silently failed. So I > abandoned the attempt to remove all system calls. I'm also not eager to > introduce regressions, and that is exactly what would happen if I > started playing with things that I don't understand and don't know how > to debug. LOL, nobody is eager to introduce any regressions :) If you have IO operations like mkdir() you can catch any errors with the =E2= =80=9Cdie=E2=80=9D keyword: mkdir(=E2=80=9C$path/file=E2=80=9D) or die =E2=80=9CCould not create direct= ory: $!=E2=80=9D; That will cause the script to abort and log the message. >>=20 >> -Michael >=20 > Recently some Microsoft patches are alternating between using > \ and / as directory separators, leaving files behind in > updatecache/download that can only be removed in a terminal. >=20 > What was happening was that the lock file was created using the > URL name containing %5c while wget was actually writing to a filename > with a real backslash. The zero-sized lockfile was moved into the uuid > folder, and the actual .cab file was orphaned in the download folder, > indicated in the GUI as a stranded file that couldn't be deleted. >=20 > An alternate way to fix the problem could be to force wget to > use --restrict-file-names=3Dwindows - in which case \ | : ? " * < > > would all be saved as escaped codes. That would seem to better > match the URL mangling that apparently is happening. > Cons: > -removing sourceurl mangling means re-downloading existing caches. > -less human-readable URLs / filenames > -conceptual alternative: I didn't prove/test this theory since it > means re-downloading all oddly-named files. I would consider that that is a change that has more implications for the res= t of the code which is not robust enough against it. Best, -Michael >=20 > P.S. I didn't change perl's utime command to use updatefile_shellEscaped > since the filepath is in quotes - the space didn't actually need to be > escaped. > --- > config/updxlrator/download | 21 +++++++++++++++++---- > config/updxlrator/updxlrator | 2 ++ > 2 files changed, 19 insertions(+), 4 deletions(-) >=20 > diff --git a/config/updxlrator/download b/config/updxlrator/download > index afa6e6cb9..ab89171e4 100644 > --- a/config/updxlrator/download > +++ b/config/updxlrator/download > @@ -26,6 +26,7 @@ my @http_header=3D(); > my $remote_size=3D0; > my $remote_mtime=3D0; > my $updatefile=3D''; > +my $updatefile_shellEscaped=3D''; #double escaped for re-processing by sh= ell/system calls > my $unique=3D0; > my $mirror=3D1; >=20 > @@ -38,11 +39,24 @@ my $restartdl =3D defined($ARGV[3]) ? $ARGV[3] : 0; >=20 > umask(0002); >=20 > +# Hopefully correct documentation: based on debugging a "\" problem... > +# WGET doesn't use the filename from the URL, but gets the filename from t= he server info. > +# -mangles that name according to --restrict-file-names. > +# -for unix, only mangles "/", so +, ~, \ etc. are not escaped/mangled. > +# -(which suggests the entry for %2f is incorrect BTW) > +# The URL passed to this module is mangled (correct? - perhaps by http://s= earch.cpan.org/dist/URI/lib/URI/Escape.pm) > +# -but $updatefile needs to match wget's download name, regardless of th= e URL's encoding. > +# -achievable either by mangling the URL and/or the filename. > +# -*make sure "updxlrator" has the same adjustments* > $sourceurl =3D~ s@\%2b(a)+@ig; > $sourceurl =3D~ s@\%2f@/@ig; > $sourceurl =3D~ s@\%7e@~@ig; > $updatefile =3D substr($sourceurl,rindex($sourceurl,"/")+1); > +$updatefile_shellEscaped =3D $updatefile; > +$updatefile_shellEscaped =3D~ s@\%20@\\ @ig; > +$updatefile_shellEscaped =3D~ s@\%5c@\\\\@ig; > $updatefile =3D~ s@\%20@ @ig; > +$updatefile =3D~ s@\%5c@\\@ig; #added to fix ipfire bug 12252 > $vendorid =3D~ tr/A-Z/a-z/; >=20 > unless (-d "$repository/download/$vendorid") > @@ -58,8 +72,9 @@ if($restartdl =3D=3D 0) >=20 > # hinder multiple downloads from starting simultaneously. Create empty "lo= ck" file. > # TODO: Another thread may sneak in between these two commands - so not fo= ol-proof, but good enough? > - system("touch $repository/download/$vendorid/$updatefile"); >=20 > + #use a native perl implementation of: system("touch $repository/download= /$vendorid/$updatefile_shellEscaped"); > + { open my $lock_file, ">>", "$repository/download/$vendorid/$updatefile" } > } > else > { > @@ -169,11 +184,9 @@ if ($_ =3D=3D 0) > } >=20 > &writelog("Moving file to the cache directory: $vendorid/$uuid"); > - $updatefile =3D~ s@ @\\ @ig; > - system("mv $repository/download/$vendorid/$updatefile $repository/$vendor= id/$uuid"); > + system("mv $repository/download/$vendorid/$updatefile_shellEscaped $repos= itory/$vendorid/$uuid"); > # Workaround for IPCop's mv bug: > utime time,$remote_mtime,"$repository/$vendorid/$uuid/$updatefile"; > - $updatefile =3D~ s@\\ @ @ig; >=20 > &UPDXLT::setcachestatus("$repository/$vendorid/$uuid/source.url",$sourceur= l); > &UPDXLT::setcachestatus("$repository/$vendorid/$uuid/status",$UPDXLT::sfOk= ); > diff --git a/config/updxlrator/updxlrator b/config/updxlrator/updxlrator > index cdc7eeb50..1febae51e 100644 > --- a/config/updxlrator/updxlrator > +++ b/config/updxlrator/updxlrator > @@ -338,11 +338,13 @@ sub check_cache > my $sourceurl=3D$_[0]; > my $cfmirror=3D$_[4]; >=20 > + # Any changes made here must be mirrored in "download" > $sourceurl =3D~ s@\%2b(a)+@ig; > $sourceurl =3D~ s@\%2f@/@ig; > $sourceurl =3D~ s@\%7e@~@ig; > $updfile =3D substr($sourceurl,rindex($sourceurl,"/")+1); > $updfile =3D~ s@\%20@ @ig; > + $updfile =3D~ s@\%5c@\\@ig; >=20 > if ($cfmirror) > { > -- > 2.17.1 >=20 --===============2427243766983400828==--