public inbox for development@lists.ipfire.org
 help / color / mirror / Atom feed
From: Michael Tremer <michael.tremer@ipfire.org>
To: development@lists.ipfire.org
Subject: Re: [PATCH] Fix bug 12252: updxlrator - handle \ in filename
Date: Fri, 13 Dec 2019 22:45:29 +0000	[thread overview]
Message-ID: <55B66DB1-FD7E-4AA0-B333-3813C19E4094@ipfire.org> (raw)
In-Reply-To: <478e0c2b-a656-e5c2-6051-e10f1b3596fb@mail.com>

[-- Attachment #1: Type: text/plain, Size: 9359 bytes --]

Hi,

Sorry for not replying earlier.

It is the run-up to Christmas and I am quite busy with a thousand little things. So stay tuned if you do not hear from me very swiftly.

> On 9 Dec 2019, at 17:57, Justin Luth <jluth(a)mail.com> wrote:
> 
> My minimally updated patch is re-included at the end of the message, and
> comments are responded to inline with the original context. I just
> tweaked some comment messages, and replaced a system call that I
> introduced recently with a perl-native implementation.
> 
> On 12/6/19 9:55 AM, Michael Tremer wrote:
>> This seems to be a tricky one.
>> 
>> +# Hopefully correct documentation: based on debugging a "\" problem...
>> +# WGET doesn't use the filename from the URL, but gets the filename
>> from the server info.
>> +#   -mangles that name according to --restrict-file-names.
>> +#   -for unix, only mangles "/", so +, ~, \ etc. are not escaped/mangled.
>> +#      -(which suggests the entry for %2f is incorrect BTW)
>> +# The URL passed to this module is mangled (correct? - perhaps by
>> http://search.cpan.org/dist/URI/lib/URI/Escape.pm)
>> +#   -but $updatefile needs to match wget's download name, regardless of
>> the URL's encoding.
>> +#       -achievable either by mangling the URL and/or the filename.
>> +#       -*make sure "updxlrator" has the same adjustments*
>>  $sourceurl =~ s@\%2b(a)+@ig;
>>  $sourceurl =~ s@\%2f@/@ig;
>>  $sourceurl =~ s@\%7e@~@ig;
>>  $updatefile = substr($sourceurl,rindex($sourceurl,"/")+1);
>> +$updatefile_shellEscaped = $updatefile;
>> +$updatefile_shellEscaped =~ s@\%20@\\ @ig;
>> +$updatefile_shellEscaped =~ s@\%5c@\\\\@ig;
>>  $updatefile =~ s@\%20@ @ig;
>> +$updatefile =~ s@\%5c@\\@ig;
>>  $vendorid =~ tr/A-Z/a-z/;
>> 
>> It is quite hard to figure out what was actually supposed to happen here. The replacements look more or less like fixes instead of thinking ahead and decoding any special characters.
> Maybe, but most of the escaped characters are really weird for use in
> filenames, so likely the approach was "instead of decoding every
> possibility, lets just do the ones that are actually in use."  And in 7
> years I haven't noticed backslashes being a problem - but suddenly a new
> patch URL has popped up (wrongly) requesting a backslash. So now we need
> to deal with that situation or else have our ipfire cache look
> increasingly messy.

Yeah it looks like it, but try and error isn’t always the best approach for software development.

>> 
>>>  unless (-d "$repository/download/$vendorid")
>>> @@ -58,7 +72,7 @@ if($restartdl == 0)
>>> 
>>>      # hinder multiple downloads from starting simultaneously. Create
>>> empty "lock" file.
>>>      # TODO: Another thread may sneak in between these two commands -
>>> so not fool-proof, but good enough?
>>> -    system("touch $repository/download/$vendorid/$updatefile");
>>> +    system("touch
>>> $repository/download/$vendorid/$updatefile_shellEscaped");
>> However, this is all needed because we are calling shell commands here to move a file.
> Not ALL of this is needed because of shell commands.  Only the duplicate
> _shellEscaped variable is needed for the shell commands. The addition of
> 
> $updatefile =~ s@\%5c@\\@ig;
> 
> is required regardless of using shell commands or not.
>> If we would use perl’s move() function, then we could avoid escaping the characters here. They should all be allowed to be used in the file system:
>> 
>>   https://perldoc.perl.org/File/Copy.html#move
>> 
>> Would you like to try that option?
> Not really, since I'm just a bugfixer and not a perl programmer. But I
> did try it, and it (silently) failed to work.  I also tried to use
> make_path instead of mkdir -p and that also silently failed. So I
> abandoned the attempt to remove all system calls. I'm also not eager to
> introduce regressions, and that is exactly what would happen if I
> started playing with things that I don't understand and don't know how
> to debug.

LOL, nobody is eager to introduce any regressions :)

If you have IO operations like mkdir() you can catch any errors with the “die” keyword:

  mkdir(“$path/file”) or die “Could not create directory: $!”;

That will cause the script to abort and log the message.

>> 
>> -Michael
> 
> Recently some Microsoft patches are alternating between using
> \ and / as directory separators, leaving files behind in
> updatecache/download that can only be removed in a terminal.
> 
> What was happening was that the lock file was created using the
> URL name containing %5c while wget was actually writing to a filename
> with a real backslash. The zero-sized lockfile was moved into the uuid
> folder, and the actual .cab file was orphaned in the download folder,
> indicated in the GUI as a stranded file that couldn't be deleted.
> 
> An alternate way to fix the problem could be to force wget to
> use --restrict-file-names=windows - in which case \ | : ? " * < >
> would all be saved as escaped codes. That would seem to better
> match the URL mangling that apparently is happening.
> Cons:
> -removing sourceurl mangling means re-downloading existing caches.
> -less human-readable URLs / filenames
> -conceptual alternative: I didn't prove/test this theory since it
> means re-downloading all oddly-named files.

I would consider that that is a change that has more implications for the rest of the code which is not robust enough against it.

Best,
-Michael

> 
> P.S. I didn't change perl's utime command to use updatefile_shellEscaped
> since the filepath is in quotes - the space didn't actually need to be
> escaped.
> ---
> config/updxlrator/download   | 21 +++++++++++++++++----
> config/updxlrator/updxlrator |  2 ++
> 2 files changed, 19 insertions(+), 4 deletions(-)
> 
> diff --git a/config/updxlrator/download b/config/updxlrator/download
> index afa6e6cb9..ab89171e4 100644
> --- a/config/updxlrator/download
> +++ b/config/updxlrator/download
> @@ -26,6 +26,7 @@ my @http_header=();
> my $remote_size=0;
> my $remote_mtime=0;
> my $updatefile='';
> +my $updatefile_shellEscaped='';  #double escaped for re-processing by shell/system calls
> my $unique=0;
> my $mirror=1;
> 
> @@ -38,11 +39,24 @@ my $restartdl = defined($ARGV[3]) ? $ARGV[3] : 0;
> 
> umask(0002);
> 
> +# Hopefully correct documentation: based on debugging a "\" problem...
> +# WGET doesn't use the filename from the URL, but gets the filename from the server info.
> +#   -mangles that name according to --restrict-file-names.
> +#   -for unix, only mangles "/", so +, ~, \ etc. are not escaped/mangled.
> +#      -(which suggests the entry for %2f is incorrect BTW)
> +# The URL passed to this module is mangled (correct? - perhaps by http://search.cpan.org/dist/URI/lib/URI/Escape.pm)
> +#   -but $updatefile needs to match wget's download name, regardless of the URL's encoding.
> +#   	-achievable either by mangling the URL and/or the filename.
> +#   	-*make sure "updxlrator" has the same adjustments*
> $sourceurl =~ s@\%2b(a)+@ig;
> $sourceurl =~ s@\%2f@/@ig;
> $sourceurl =~ s@\%7e@~@ig;
> $updatefile = substr($sourceurl,rindex($sourceurl,"/")+1);
> +$updatefile_shellEscaped = $updatefile;
> +$updatefile_shellEscaped =~ s@\%20@\\ @ig;
> +$updatefile_shellEscaped =~ s@\%5c@\\\\@ig;
> $updatefile =~ s@\%20@ @ig;
> +$updatefile =~ s@\%5c@\\@ig;    #added to fix ipfire bug 12252
> $vendorid =~ tr/A-Z/a-z/;
> 
> unless (-d "$repository/download/$vendorid")
> @@ -58,8 +72,9 @@ if($restartdl == 0)
> 
> 	# hinder multiple downloads from starting simultaneously. Create empty "lock" file.
> 	# TODO: Another thread may sneak in between these two commands - so not fool-proof, but good enough?
> -	system("touch $repository/download/$vendorid/$updatefile");
> 
> +	#use a native perl implementation of:  system("touch $repository/download/$vendorid/$updatefile_shellEscaped");
> +	{ open my $lock_file, ">>", "$repository/download/$vendorid/$updatefile" }
> }
> else
> {
> @@ -169,11 +184,9 @@ if ($_ == 0)
> 	}
> 
> 	&writelog("Moving file to the cache directory: $vendorid/$uuid");
> -	$updatefile =~ s@ @\\ @ig;
> -	system("mv $repository/download/$vendorid/$updatefile $repository/$vendorid/$uuid");
> +	system("mv $repository/download/$vendorid/$updatefile_shellEscaped $repository/$vendorid/$uuid");
> 	# Workaround for IPCop's mv bug:
> 	utime time,$remote_mtime,"$repository/$vendorid/$uuid/$updatefile";
> -	$updatefile =~ s@\\ @ @ig;
> 
> 	&UPDXLT::setcachestatus("$repository/$vendorid/$uuid/source.url",$sourceurl);
> 	&UPDXLT::setcachestatus("$repository/$vendorid/$uuid/status",$UPDXLT::sfOk);
> diff --git a/config/updxlrator/updxlrator b/config/updxlrator/updxlrator
> index cdc7eeb50..1febae51e 100644
> --- a/config/updxlrator/updxlrator
> +++ b/config/updxlrator/updxlrator
> @@ -338,11 +338,13 @@ sub check_cache
> 	my $sourceurl=$_[0];
> 	my $cfmirror=$_[4];
> 
> +	# Any changes made here must be mirrored in "download"
> 	$sourceurl =~ s@\%2b(a)+@ig;
> 	$sourceurl =~ s@\%2f@/@ig;
> 	$sourceurl =~ s@\%7e@~@ig;
> 	$updfile = substr($sourceurl,rindex($sourceurl,"/")+1);
> 	$updfile =~ s@\%20@ @ig;
> +	$updfile =~ s@\%5c@\\@ig;
> 
> 	if ($cfmirror)
> 	{
> --
> 2.17.1
> 


      reply	other threads:[~2019-12-13 22:45 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <a345f1bf-67d7-9ec6-bba2-ac84ea02a272@mail.com>
2019-12-05 13:07 ` Justin Luth
2019-12-06  6:55   ` Michael Tremer
2019-12-09 17:57     ` Justin Luth
2019-12-13 22:45       ` Michael Tremer [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=55B66DB1-FD7E-4AA0-B333-3813C19E4094@ipfire.org \
    --to=michael.tremer@ipfire.org \
    --cc=development@lists.ipfire.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox