From mboxrd@z Thu Jan 1 00:00:00 1970 From: Michael Tremer To: development@lists.ipfire.org Subject: Re: [PATCH] Fix bug 12252: updxlrator - handle \ in filename Date: Fri, 06 Dec 2019 06:55:00 +0000 Message-ID: <97DB63BD-8A53-4EA8-89C7-6B7A6D4224C5@ipfire.org> In-Reply-To: <9fdfa42e-cb04-be00-3605-1284d294f5c5@mail.com> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="===============5057088576914056511==" List-Id: --===============5057088576914056511== Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Hi, Thank you very much for looking into this. This seems to be a tricky one. > On 5 Dec 2019, at 13:07, Justin Luth wrote: >=20 > Recently some Microsoft patches are alternating between using > \ and / as directory separators, leaving files behind in > /var/updatecache/download that can only be removed in a terminal. >=20 > What was happening was that the lock file was created using the > URL name containing %5c while wget was actually writing to a filename > with a real backslash. The zero-sized lockfile was moved into the uuid > folder, and the actual .cab file was orphaned in the download folder, > indicated in the GUI as a stranded file that couldn't be deleted. >=20 > An alternate way to fix the problem could be to force wget to > use --restrict-file-names=3Dwindows - in which case \ | : ? " * < > > would all be saved as escaped codes. That would seem to better > match the URL mangling that apparently is happening. > Cons: > -removing sourceurl mangling means re-downloading existing caches. > -less human-readable URLs / filesnames > -conceptual alternative: I didn't prove/test this theory since it > means re-downloading all oddly-named files. >=20 > P.S. I didn't change perl's utime command to use updatefile_shellEscaped > since the filepath is in quotes - the space didn't actually need to be > escaped. >=20 > --- >=20 > config/updxlrator/download | 20 ++++++++++++++++---- > config/updxlrator/updxlrator | 2 ++ > 2 files changed, 18 insertions(+), 4 deletions(-) >=20 > diff --git a/config/updxlrator/download b/config/updxlrator/download > index afa6e6cb9..df4a58725 100644 > --- a/config/updxlrator/download > +++ b/config/updxlrator/download > @@ -26,6 +26,7 @@ my @http_header=3D(); > my $remote_size=3D0; > my $remote_mtime=3D0; > my $updatefile=3D''; > +my $updatefile_shellEscaped=3D''; #double escaped for re-processing by > shell/system calls > my $unique=3D0; > my $mirror=3D1; >=20 > @@ -38,11 +39,24 @@ my $restartdl =3D defined($ARGV[3]) ? $ARGV[3] : 0; >=20 > umask(0002); >=20 > +# Hopefully correct documentation: based on debugging a "\" problem... > +# WGET doesn't use the filename from the URL, but gets the filename > from the server info. > +# -mangles that name according to --restrict-file-names. > +# -for unix, only mangles "/", so +, ~, \ etc. are not escaped/mangled. > +# -(which suggests the entry for %2f is incorrect BTW) > +# The URL passed to this module is mangled (correct? - perhaps by > http://search.cpan.org/dist/URI/lib/URI/Escape.pm) > +# -but $updatefile needs to match wget's download name, regardless of > the URL's encoding. > +# -achievable either by mangling the URL and/or the filename. > +# -*make sure "updxlrator" has the same adjustments* > $sourceurl =3D~ s@\%2b(a)+@ig; > $sourceurl =3D~ s@\%2f@/@ig; > $sourceurl =3D~ s@\%7e@~@ig; > $updatefile =3D substr($sourceurl,rindex($sourceurl,"/")+1); > +$updatefile_shellEscaped =3D $updatefile; > +$updatefile_shellEscaped =3D~ s@\%20@\\ @ig; > +$updatefile_shellEscaped =3D~ s@\%5c@\\\\@ig; > $updatefile =3D~ s@\%20@ @ig; > +$updatefile =3D~ s@\%5c@\\@ig; > $vendorid =3D~ tr/A-Z/a-z/; This is really messy code. Not yours. What has been here before. It is quite hard to figure out what was actually supposed to happen here. The= replacements look more or less like fixes instead of thinking ahead and deco= ding any special characters. > unless (-d "$repository/download/$vendorid") > @@ -58,7 +72,7 @@ if($restartdl =3D=3D 0) >=20 > # hinder multiple downloads from starting simultaneously. Create > empty "lock" file. > # TODO: Another thread may sneak in between these two commands - > so not fool-proof, but good enough? > - system("touch $repository/download/$vendorid/$updatefile"); > + system("touch > $repository/download/$vendorid/$updatefile_shellEscaped"); However, this is all needed because we are calling shell commands here to mov= e a file. I suppose that that is quite dangerous because any forged filenames could be = used to run any shell commands. If we would use perl=E2=80=99s move() function, then we could avoid escaping = the characters here. They should all be allowed to be used in the file system: https://perldoc.perl.org/File/Copy.html#move Would you like to try that option? >=20 > } > else > @@ -169,11 +183,9 @@ if ($_ =3D=3D 0) > } >=20 > &writelog("Moving file to the cache directory: $vendorid/$uuid"); > - $updatefile =3D~ s@ @\\ @ig; > - system("mv $repository/download/$vendorid/$updatefile > $repository/$vendorid/$uuid"); > + system("mv $repository/download/$vendorid/$updatefile_shellEscaped > $repository/$vendorid/$uuid"); Same here. > # Workaround for IPCop's mv bug: > utime time,$remote_mtime,"$repository/$vendorid/$uuid/$updatefile"; > - $updatefile =3D~ s@\\ @ @ig; >=20 > &UPDXLT::setcachestatus("$repository/$vendorid/$uuid/source.url",$sourceurl= ); > &UPDXLT::setcachestatus("$repository/$vendorid/$uuid/status",$UPDXLT::sfOk); > diff --git a/config/updxlrator/updxlrator b/config/updxlrator/updxlrator > index cdc7eeb50..1febae51e 100644 > --- a/config/updxlrator/updxlrator > +++ b/config/updxlrator/updxlrator > @@ -338,11 +338,13 @@ sub check_cache > my $sourceurl=3D$_[0]; > my $cfmirror=3D$_[4]; >=20 > + # Any changes made here must be mirrored in "download" > $sourceurl =3D~ s@\%2b(a)+@ig; > $sourceurl =3D~ s@\%2f@/@ig; > $sourceurl =3D~ s@\%7e@~@ig; > $updfile =3D substr($sourceurl,rindex($sourceurl,"/")+1); > $updfile =3D~ s@\%20@ @ig; > + $updfile =3D~ s@\%5c@\\@ig; I think we will still need this one then, don=E2=80=99t we? >=20 > if ($cfmirror) > { >=20 > =E2=80=94 > 2.17.1 >=20 -Michael --===============5057088576914056511==--