From: Michael Tremer <michael.tremer@ipfire.org>
To: development@lists.ipfire.org
Subject: Re: [PATCH] Fix bug 12252: updxlrator - handle \ in filename
Date: Fri, 13 Dec 2019 22:45:29 +0000 [thread overview]
Message-ID: <55B66DB1-FD7E-4AA0-B333-3813C19E4094@ipfire.org> (raw)
In-Reply-To: <478e0c2b-a656-e5c2-6051-e10f1b3596fb@mail.com>
[-- Attachment #1: Type: text/plain, Size: 9359 bytes --]
Hi,
Sorry for not replying earlier.
It is the run-up to Christmas and I am quite busy with a thousand little things. So stay tuned if you do not hear from me very swiftly.
> On 9 Dec 2019, at 17:57, Justin Luth <jluth(a)mail.com> wrote:
>
> My minimally updated patch is re-included at the end of the message, and
> comments are responded to inline with the original context. I just
> tweaked some comment messages, and replaced a system call that I
> introduced recently with a perl-native implementation.
>
> On 12/6/19 9:55 AM, Michael Tremer wrote:
>> This seems to be a tricky one.
>>
>> +# Hopefully correct documentation: based on debugging a "\" problem...
>> +# WGET doesn't use the filename from the URL, but gets the filename
>> from the server info.
>> +# -mangles that name according to --restrict-file-names.
>> +# -for unix, only mangles "/", so +, ~, \ etc. are not escaped/mangled.
>> +# -(which suggests the entry for %2f is incorrect BTW)
>> +# The URL passed to this module is mangled (correct? - perhaps by
>> http://search.cpan.org/dist/URI/lib/URI/Escape.pm)
>> +# -but $updatefile needs to match wget's download name, regardless of
>> the URL's encoding.
>> +# -achievable either by mangling the URL and/or the filename.
>> +# -*make sure "updxlrator" has the same adjustments*
>> $sourceurl =~ s@\%2b(a)+@ig;
>> $sourceurl =~ s@\%2f@/@ig;
>> $sourceurl =~ s@\%7e@~@ig;
>> $updatefile = substr($sourceurl,rindex($sourceurl,"/")+1);
>> +$updatefile_shellEscaped = $updatefile;
>> +$updatefile_shellEscaped =~ s@\%20@\\ @ig;
>> +$updatefile_shellEscaped =~ s@\%5c@\\\\@ig;
>> $updatefile =~ s@\%20@ @ig;
>> +$updatefile =~ s@\%5c@\\@ig;
>> $vendorid =~ tr/A-Z/a-z/;
>>
>> It is quite hard to figure out what was actually supposed to happen here. The replacements look more or less like fixes instead of thinking ahead and decoding any special characters.
> Maybe, but most of the escaped characters are really weird for use in
> filenames, so likely the approach was "instead of decoding every
> possibility, lets just do the ones that are actually in use." And in 7
> years I haven't noticed backslashes being a problem - but suddenly a new
> patch URL has popped up (wrongly) requesting a backslash. So now we need
> to deal with that situation or else have our ipfire cache look
> increasingly messy.
Yeah it looks like it, but try and error isn’t always the best approach for software development.
>>
>>> unless (-d "$repository/download/$vendorid")
>>> @@ -58,7 +72,7 @@ if($restartdl == 0)
>>>
>>> # hinder multiple downloads from starting simultaneously. Create
>>> empty "lock" file.
>>> # TODO: Another thread may sneak in between these two commands -
>>> so not fool-proof, but good enough?
>>> - system("touch $repository/download/$vendorid/$updatefile");
>>> + system("touch
>>> $repository/download/$vendorid/$updatefile_shellEscaped");
>> However, this is all needed because we are calling shell commands here to move a file.
> Not ALL of this is needed because of shell commands. Only the duplicate
> _shellEscaped variable is needed for the shell commands. The addition of
>
> $updatefile =~ s@\%5c@\\@ig;
>
> is required regardless of using shell commands or not.
>> If we would use perl’s move() function, then we could avoid escaping the characters here. They should all be allowed to be used in the file system:
>>
>> https://perldoc.perl.org/File/Copy.html#move
>>
>> Would you like to try that option?
> Not really, since I'm just a bugfixer and not a perl programmer. But I
> did try it, and it (silently) failed to work. I also tried to use
> make_path instead of mkdir -p and that also silently failed. So I
> abandoned the attempt to remove all system calls. I'm also not eager to
> introduce regressions, and that is exactly what would happen if I
> started playing with things that I don't understand and don't know how
> to debug.
LOL, nobody is eager to introduce any regressions :)
If you have IO operations like mkdir() you can catch any errors with the “die” keyword:
mkdir(“$path/file”) or die “Could not create directory: $!”;
That will cause the script to abort and log the message.
>>
>> -Michael
>
> Recently some Microsoft patches are alternating between using
> \ and / as directory separators, leaving files behind in
> updatecache/download that can only be removed in a terminal.
>
> What was happening was that the lock file was created using the
> URL name containing %5c while wget was actually writing to a filename
> with a real backslash. The zero-sized lockfile was moved into the uuid
> folder, and the actual .cab file was orphaned in the download folder,
> indicated in the GUI as a stranded file that couldn't be deleted.
>
> An alternate way to fix the problem could be to force wget to
> use --restrict-file-names=windows - in which case \ | : ? " * < >
> would all be saved as escaped codes. That would seem to better
> match the URL mangling that apparently is happening.
> Cons:
> -removing sourceurl mangling means re-downloading existing caches.
> -less human-readable URLs / filenames
> -conceptual alternative: I didn't prove/test this theory since it
> means re-downloading all oddly-named files.
I would consider that that is a change that has more implications for the rest of the code which is not robust enough against it.
Best,
-Michael
>
> P.S. I didn't change perl's utime command to use updatefile_shellEscaped
> since the filepath is in quotes - the space didn't actually need to be
> escaped.
> ---
> config/updxlrator/download | 21 +++++++++++++++++----
> config/updxlrator/updxlrator | 2 ++
> 2 files changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/config/updxlrator/download b/config/updxlrator/download
> index afa6e6cb9..ab89171e4 100644
> --- a/config/updxlrator/download
> +++ b/config/updxlrator/download
> @@ -26,6 +26,7 @@ my @http_header=();
> my $remote_size=0;
> my $remote_mtime=0;
> my $updatefile='';
> +my $updatefile_shellEscaped=''; #double escaped for re-processing by shell/system calls
> my $unique=0;
> my $mirror=1;
>
> @@ -38,11 +39,24 @@ my $restartdl = defined($ARGV[3]) ? $ARGV[3] : 0;
>
> umask(0002);
>
> +# Hopefully correct documentation: based on debugging a "\" problem...
> +# WGET doesn't use the filename from the URL, but gets the filename from the server info.
> +# -mangles that name according to --restrict-file-names.
> +# -for unix, only mangles "/", so +, ~, \ etc. are not escaped/mangled.
> +# -(which suggests the entry for %2f is incorrect BTW)
> +# The URL passed to this module is mangled (correct? - perhaps by http://search.cpan.org/dist/URI/lib/URI/Escape.pm)
> +# -but $updatefile needs to match wget's download name, regardless of the URL's encoding.
> +# -achievable either by mangling the URL and/or the filename.
> +# -*make sure "updxlrator" has the same adjustments*
> $sourceurl =~ s@\%2b(a)+@ig;
> $sourceurl =~ s@\%2f@/@ig;
> $sourceurl =~ s@\%7e@~@ig;
> $updatefile = substr($sourceurl,rindex($sourceurl,"/")+1);
> +$updatefile_shellEscaped = $updatefile;
> +$updatefile_shellEscaped =~ s@\%20@\\ @ig;
> +$updatefile_shellEscaped =~ s@\%5c@\\\\@ig;
> $updatefile =~ s@\%20@ @ig;
> +$updatefile =~ s@\%5c@\\@ig; #added to fix ipfire bug 12252
> $vendorid =~ tr/A-Z/a-z/;
>
> unless (-d "$repository/download/$vendorid")
> @@ -58,8 +72,9 @@ if($restartdl == 0)
>
> # hinder multiple downloads from starting simultaneously. Create empty "lock" file.
> # TODO: Another thread may sneak in between these two commands - so not fool-proof, but good enough?
> - system("touch $repository/download/$vendorid/$updatefile");
>
> + #use a native perl implementation of: system("touch $repository/download/$vendorid/$updatefile_shellEscaped");
> + { open my $lock_file, ">>", "$repository/download/$vendorid/$updatefile" }
> }
> else
> {
> @@ -169,11 +184,9 @@ if ($_ == 0)
> }
>
> &writelog("Moving file to the cache directory: $vendorid/$uuid");
> - $updatefile =~ s@ @\\ @ig;
> - system("mv $repository/download/$vendorid/$updatefile $repository/$vendorid/$uuid");
> + system("mv $repository/download/$vendorid/$updatefile_shellEscaped $repository/$vendorid/$uuid");
> # Workaround for IPCop's mv bug:
> utime time,$remote_mtime,"$repository/$vendorid/$uuid/$updatefile";
> - $updatefile =~ s@\\ @ @ig;
>
> &UPDXLT::setcachestatus("$repository/$vendorid/$uuid/source.url",$sourceurl);
> &UPDXLT::setcachestatus("$repository/$vendorid/$uuid/status",$UPDXLT::sfOk);
> diff --git a/config/updxlrator/updxlrator b/config/updxlrator/updxlrator
> index cdc7eeb50..1febae51e 100644
> --- a/config/updxlrator/updxlrator
> +++ b/config/updxlrator/updxlrator
> @@ -338,11 +338,13 @@ sub check_cache
> my $sourceurl=$_[0];
> my $cfmirror=$_[4];
>
> + # Any changes made here must be mirrored in "download"
> $sourceurl =~ s@\%2b(a)+@ig;
> $sourceurl =~ s@\%2f@/@ig;
> $sourceurl =~ s@\%7e@~@ig;
> $updfile = substr($sourceurl,rindex($sourceurl,"/")+1);
> $updfile =~ s@\%20@ @ig;
> + $updfile =~ s@\%5c@\\@ig;
>
> if ($cfmirror)
> {
> --
> 2.17.1
>
prev parent reply other threads:[~2019-12-13 22:45 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <a345f1bf-67d7-9ec6-bba2-ac84ea02a272@mail.com>
2019-12-05 13:07 ` Justin Luth
2019-12-06 6:55 ` Michael Tremer
2019-12-09 17:57 ` Justin Luth
2019-12-13 22:45 ` Michael Tremer [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=55B66DB1-FD7E-4AA0-B333-3813C19E4094@ipfire.org \
--to=michael.tremer@ipfire.org \
--cc=development@lists.ipfire.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox