From: Stefan Schantl
To: Michael Tremer
Date: Wed, 26 Mar 2025 13:47:25 +0100
Message-ID: <195d27e4348.27e7.cac9d3ffac9e24d09d20af05166fd73b@ipfire.org>
References: <20250322145724.4593-1-stefan.schantl@ipfire.org> <31565D56-B0C2-4BA0-BF75-7902C376B35E@ipfire.org>
Subject: Re: [PATCH 1/3] general-functions.pl: Add LWP-based flexible downloader function.

On 24 March 2025 at 11:02:08, Michael Tremer wrote:

> Hello,
>
>> On 23 Mar 2025, at 19:00, Stefan Schantl wrote:
>>
>> Hello Michael,
>>
>> thanks for the review and your detailed feedback.
>>
>> You are right, this patch mainly moves the downloader logic from the
>> ipblocklist feature into its own easy-to-use general function.
>>
>>> Morning Stefan,
>>>
>>> Thank you for taking the time and getting this out of your git
>>> repository and onto this list :)
>>>
>>> As we are already shipping this code and this patch only moves it
>>> around, there is probably not much standing in the way of seeing it
>>> merged. However, I have a couple of questions that we might want to
>>> address in a follow-up patch.
>>>
>>>> On 22 Mar 2025, at 14:57, Stefan Schantl wrote:
>>>>
>>>> This function can be used to grab content and/or store it into
>>>> files.
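For illustration, a hypothetical caller of the new function might look like the sketch below. It is based only on the option keys and string return codes documented in the patch; the require path, list URL, and file locations are illustrative assumptions, not taken from this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Load the IPFire general functions the way the CGI scripts usually do.
# The path is an assumption for this sketch.
require '/var/ipfire/general-functions.pl';

my $ret = &General::downloader(
	URL       => "https://example.org/blocklist.txt",   # REQUIRED
	FILE      => "/var/tmp/blocklist.txt",              # store content on disk
	ETAGSFILE => "/var/tmp/blocklist.etags",            # cache Etags here
	MAXSIZE   => 10_485_760,                            # abort beyond 10 MB
);

# Per the patch documentation: nothing is returned on success,
# strings are used as error/status codes.
if (! defined $ret) {
	print "Download finished successfully.\n";
} elsif ($ret eq "not modified") {
	print "Remote file unchanged, keeping the local copy.\n";
} else {
	print "Download failed: $ret\n";
}
```

On a repeated run, the If-Modified-Since and If-None-Match headers generated from the stored file and Etags cache would make the "not modified" branch the common case.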
>>>>
>>>> Signed-off-by: Stefan Schantl
>>>> ---
>>>> config/cfgroot/general-functions.pl | 262 ++++++++++++++++++++++++++++
>>>> 1 file changed, 262 insertions(+)
>>>>
>>>> diff --git a/config/cfgroot/general-functions.pl b/config/cfgroot/general-functions.pl
>>>> index 8ba6e3f79..cb8df69c6 100644
>>>> --- a/config/cfgroot/general-functions.pl
>>>> +++ b/config/cfgroot/general-functions.pl
>>>> @@ -20,6 +20,23 @@ use IO::Socket;
>>>> use Net::SSLeay;
>>>> use Net::IPv4Addr qw(:all);
>>>>
>>>> +# Load module to move files.
>>>> +use File::Copy;
>>>> +
>>>> +# Load module to get file stats.
>>>> +use File::stat;
>>>> +
>>>> +# Load module to deal with temporary files.
>>>> +use File::Temp;
>>>> +
>>>> +# Load module to deal with the date formats used by the HTTP protocol.
>>>> +use HTTP::Date;
>>>> +
>>>> +# Load the libwww-perl User Agent module.
>>>> +use LWP::UserAgent;
>>>
>>> Recently (let's say a year ago) I spent some time on making the
>>> web UI faster. Especially on some hardware, it is not very snappy.
>>> The reason for the slowness, however, was not that we are doing much
>>> stuff in our code; it was rather that loading modules was taking a
>>> long time in Perl. So I removed what was no longer needed and moved
>>> some other stuff around, which helped slightly (saved around 100ms on
>>> the older IPFire Mini Appliance).
>>>
>>> Since the downloading functionality is not used *that* much here,
>>> should it maybe live in a separate place so that we only include it
>>> when it is needed?
>>
>> Sounds good, I'm totally fine with this suggestion. This can be part
>> of a following patch(set).
>
> Well, moving it should not be a separate patch set, because we should not
> move it twice.

Hello Michael,

this absolutely makes sense. Any ideas for the filename? I think
downloader.pl could be misinterpreted by certain users - maybe
web-function.pl?

> But any of the other fixes should be implemented in a separate patch each.

Let's do it this way.

>
>>>
>>>> +
>>>> +$|=1; # line buffering
>>>
>>> Why is this needed?
>>
>> This line does not belong to the patch and is already part of the
>> general-functions.pl file.
>>
>> The following can be found in the Perl special variables manual about $|:
>>
>> $| - If set to nonzero, forces an fflush(3) after every write or print
>> on the currently selected output channel.
>>
>> I'm not sure if this line is required in general.
>
> I know what it does. I just wanted to know why it has been enabled here.

Short answer: no clue.

>
>>>
>>>> +
>>>> $General::version = 'VERSION';
>>>> $General::swroot = 'CONFIG_ROOT';
>>>> $General::noipprefix = 'noipg-';
>>>> @@ -1346,6 +1363,251 @@ sub generateProtoTransHash () {
>>>> return %protocols;
>>>> }
>>>>
>>>> +# Function to grab the content of a given URL or to download and store it on disk.
>>>> +#
>>>> +# The function requires a configuration hash to be passed.
>>>> +#
>>>> +# The following options (hash keys) are supported:
>>>> +#
>>>> +# URL -> The URL to the content or file. REQUIRED!
>>>> +# FILE -> The filename, as full path, where the content/file should be stored on disk.
>>>> +# ETAGSFILE -> A filename, again as full path, where Etags should be stored and read.
>>>> +# ETAGPREFIX -> In case a custom Etag name should be used; otherwise it defaults to the given URL.
>>>> +# MAXSIZE -> The maximum size in bytes before the downloader aborts. (example: 10_485_760 for 10MB)
>>>> +#
>>>> +# If a file is given, an If-Modified-Since header will be generated from the last modified timestamp
>>>> +# of an already stored file. In case an Etag file is specified, an If-None-Match header will be added to
>>>> +# the request - both can be used at the same time.
>>>> +#
>>>> +# In case no FILE option has been passed to the function, the content of the requested URL will be returned.
>>>> +#
>>>> +# Return codes (if FILE is used):
>>>> +#
>>>> +# nothing - On success.
>>>> +# no url - If no URL has been specified.
>>>> +# not modified - In case the server responds with "Not Modified" (304).
>>>> +# dl_error - If the requested URL cannot be accessed.
>>>> +# incomplete download - In case the size of the local file does not match the remote content_length.
>>>> +#
>>>> +sub downloader (%) {
>>>> + my (%args) = @_;
>>>> +
>>>> + # Remap args hash and convert all keys into upper case format.
>>>> + %args = map { uc $_ => $args{$_} } keys %args;
>>>> +
>>>> + # The number of download attempts before giving up and
>>>> + # logging an error.
>>>> + my $max_dl_attempts = 3;
>>>> +
>>>> + # Temporary directory to download the files.
>>>> + my $tmp_dl_directory = "/var/tmp";
>>>> +
>>>> + # Assign hash values.
>>>> + my $url = $args{"URL"} if (exists($args{"URL"}));
>>>> + my $file = $args{"FILE"} if (exists($args{"FILE"}));
>>>> + my $etags_file = $args{"ETAGSFILE"} if (exists($args{"ETAGSFILE"}));
>>>> + my $etagprefix = $url;
>>>> + $etagprefix = $args{"ETAGPREFIX"} if (exists($args{"ETAGPREFIX"}));
>>>> + my $max_size = $args{"MAXSIZE"} if (exists($args{"MAXSIZE"}));
>>>
>>> I consider this really ugly, but I suppose this is what Perl wants.
>>>
>>>> + # Abort with error "no url", if no URL has been given.
>>>> + return "no url" unless ($url);
>>>
>>> Could we rather die here with an error? Just returning a string here
>>> seems to be too easy to ignore in case something above went wrong.
>>
>> Technically you are right, but let me explain the reason for this kind
>> of error handling a bit further down.
>>
>>>
>>>> +
>>>> + my %etags = ();
>>>> + my $tmpfile;
>>>> +
>>>> + # Read-in proxysettings.
>>>> + my %proxysettings = ();
>>>> + &readhash("${General::swroot}/proxy/settings", \%proxysettings);
>>>> +
>>>> + # Create a user agent instance.
>>>> + #
>>>> + # Request SSL hostname verification and specify the path
>>>> + # to the CA file.
>>>> + my $ua = LWP::UserAgent->new(
>>>> + ssl_opts => {
>>>> + SSL_ca_file => '/etc/ssl/cert.pem',
>>>> + verify_hostname => 1,
>>>> + },
>>>> + );
>>>> +
>>>> + # Set timeout to 10 seconds.
>>>> + $ua->timeout(10);
>>>
>>> A 10 second timeout also sounds rather optimistic considering some
>>> slower, overloaded mirrors or those that are quite far away. For any
>>> tasks like downloading data in the background we could afford longer.
>>
>> Accepted, what would be a good default value here?
>
> 60 seconds? I think it makes sense to make this optionally configurable so
> that for the IP check that has been changed here, we will only wait around
> 10 seconds as it is a simple API request.

I'll implement it as an optional, configurable feature. The default, if
not set, would be 60 seconds.

>
>>>
>>>> +
>>>> + # Assign maximum download size if set.
>>>> + $ua->max_size($max_size) if ($max_size);
>>>> +
>>>> + # Generate UserAgent.
>>>> + my $agent = &MakeUserAgent();
>>>> +
>>>> + # Set the generated UserAgent.
>>>> + $ua->agent($agent);
>>>> +
>>>> + # Check if an upstream proxy is configured.
>>>> + if ($proxysettings{'UPSTREAM_PROXY'}) {
>>>> + my $proxy_url;
>>>> +
>>>> + $proxy_url = "http://";
>>>> +
>>>> + # Check if the proxy requires authentication.
>>>> + if (($proxysettings{'UPSTREAM_USER'}) && ($proxysettings{'UPSTREAM_PASSWORD'})) {
>>>> + $proxy_url .= "$proxysettings{'UPSTREAM_USER'}\:$proxysettings{'UPSTREAM_PASSWORD'}\@";
>>>> + }
>>>> +
>>>> + # Add proxy server address and port.
>>>> + $proxy_url .= $proxysettings{'UPSTREAM_PROXY'};
>>>> +
>>>> + # Append proxy settings.
>>>> + $ua->proxy(['http', 'https'], $proxy_url);
>>>> + }
>>>> +
>>>> + # Create a HTTP request element and pass the given URL to it.
>>>> + my $request = HTTP::Request->new(GET => $url);
>>>> +
>>>> + # Check if a file to store the output has been provided.
>>>> + if ($file) {
>>>> + # Check if the given file already exists, because it has been downloaded in the past.
>>>> + #
>>>> + # In this case we ask the server whether the remote file has been changed or not.
>>>> + # This will be done by sending the modification time in a special HTTP header.
>>>> + if (-f $file) {
>>>> + # Call stat on the file.
>>>> + my $stat = stat($file);
>>>> +
>>>> + # Obtain the mtime of the existing file.
>>>> + my $mtime = $stat->mtime;
>>>> +
>>>> + # Convert the timestamp into the right format.
>>>> + my $http_date = time2str($mtime);
>>>> +
>>>> + # Add the If-Modified-Since header to the request to ask the server if the
>>>> + # file has been modified.
>>>> + $request->header( 'If-Modified-Since' => "$http_date" );
>>>> + }
>>>> +
>>>> + # Generate a temporary file name, located in the temporary download directory and with a suffix of ".tmp".
>>>> + # The downloaded file will be stored there until some sanity checks are performed.
>>>> + my $tmp = File::Temp->new( SUFFIX => ".tmp", DIR => "$tmp_dl_directory/", UNLINK => 0 );
>>>> + $tmpfile = $tmp->filename();
>>>> + }
>>>> +
>>>> + # Check if a file for Etags has been given.
>>>> + if ($etags_file) {
>>>> + # Read-in the Etags file for known Etags if the file is present.
>>>> + &readhash("$etags_file", \%etags) if (-f $etags_file);
>>>> +
>>>> + # Check if an Etag for the current provider is stored.
>>>> + if ($etags{$etagprefix}) {
>>>> + # Grab the stored tag.
>>>> + my $etag = $etags{$etagprefix};
>>>> +
>>>> + # Add an "If-None-Match" header to the request to ask the server if the
>>>> + # file has been modified.
>>>> + $request->header( 'If-None-Match' => $etag );
>>>> + }
>>>> + }
>>>> +
>>>> + my $dl_attempt = 1;
>>>> + my $response;
>>>> +
>>>> + # Download and retry on failure.
>>>> + while ($dl_attempt <= $max_dl_attempts) {
>>>> + # Perform the request and save the output into the tmpfile if requested.
>>>> + $response = $ua->request($request, $tmpfile);
>>>> +
>>>> + # Check if the download was successful.
>>>> + if ($response->is_success) {
>>>> + # Break loop.
>>>> + last;
>>>> +
>>>> + # Check if the server responds with 304 (Not Modified).
>>>> + } elsif ($response->code == 304) {
>>>> + # Remove the temporary file, if one exists.
>>>> + unlink("$tmpfile") if (-e "$tmpfile");
>>>> +
>>>> + # Return "not modified".
>>>> + return "not modified";
>>>
>>> Same here. It feels like strings are not the most ideal return code.
>>
>> As promised above, I'm going to explain why I decided to return various
>> strings in case of different errors. Calling die() immediately
>> terminates the Perl (main) script.
>
> This is correct, and I never really liked this behaviour. However, Perl does
> not have exceptions and so we cannot gracefully throw an error.
>
> But not passing a URL is a programming error and that should get some
> attention.

Yeah, you have to take care of this kind of Perl behaviour and I'm also
not in favour of it. That was the reason why I decided to do it this way.

I see the reason why you are recommending to die in case no URL has been
passed - I'll adjust the code.

Best regards,

-Stefan

>
> The "not modified" case should of course not call die.
>
>> So for example when processing a list of downloads and such an
>> event/exception/error appeared, in my opinion the main script, which
>> called the function, should decide how to proceed.
>>
>> This is very similar to C, where you call the function, check the
>> return code and decide what to do next.
>
> Usually in C we return an integer or even better an enum.
>
>> So in this case I have to disagree with you and strongly recommend
>> keeping the error processing as it is.
>>
>> Best regards,
>>
>> -Stefan
>>
>>>
>>>> +
>>>> + # Check if we ran out of download retries.
>>>> + } elsif ($dl_attempt == $max_dl_attempts) {
>>>> + # Obtain the error.
>>>> + my $error = $response->content;
>>>> +
>>>> + # Remove the temporary file, if one exists.
>>>> + unlink("$tmpfile") if (-e "$tmpfile");
>>>> +
>>>> + # Return the error message from the response.
>>>> + return "$error";
>>>> + }
>>>> +
>>>> + # Remove the temporary file, if one exists.
>>>> + unlink("$tmpfile") if (-e "$tmpfile");
>>>> +
>>>> + # Increase the download attempt counter.
>>>> + $dl_attempt++;
>>>> + }
>>>> +
>>>> + # Obtain the connection headers.
>>>> + my $headers = $response->headers;
>>>> +
>>>> + # Check if an Etag file has been provided.
>>>> + if ($etags_file) {
>>>> + # Grab the Etag from the response if the server provides one.
>>>> + if ($response->header('Etag')) {
>>>> + # Add the provided Etag to the hash of tags.
>>>> + $etags{$etagprefix} = $response->header('Etag');
>>>> +
>>>> + # Write the Etags file.
>>>> + &writehash($etags_file, \%etags);
>>>> + }
>>>> + }
>>>> +
>>>> + # Check if the response should be stored on disk.
>>>> + if ($file) {
>>>> + # Get the remote size of the content.
>>>> + my $remote_size = $response->header('Content-Length');
>>>> +
>>>> + # Perform a stat on the temporary file.
>>>> + my $stat = stat($tmpfile);
>>>> +
>>>> + # Grab the size of the stored temporary file.
>>>> + my $local_size = $stat->size;
>>>> +
>>>> + # Check if both sizes are equal.
>>>> + if (($remote_size) && ($remote_size != $local_size)) {
>>>> + # Delete the temporary file.
>>>> + unlink("$tmpfile");
>>>> +
>>>> + # Abort and return "incomplete download" as error.
>>>> + return "incomplete download";
>>>> + }
>>>> +
>>>> + # Move the temporary file to the desired file, overwriting any
>>>> + # existing one.
>>>> + move("$tmpfile", "$file");
>>>> +
>>>> + # Obtain the timestamp from the response header, when the file has
>>>> + # been modified the last time.
>>>> + my $last_modified = $headers->last_modified;
>>>> +
>>>> + # Check if we got a last-modified value from the server.
>>>> + if ($last_modified) {
>>>> + # Assign the last-modified timestamp as mtime to the
>>>> + # stored file.
>>>> + utime(time(), "$last_modified", "$file");
>>>> + }
>>>> +
>>>> + # Delete the temporary file.
>>>> + unlink("$tmpfile");
>>>> +
>>>> + # If we got here, everything worked fine. Return nothing.
>>>> + return;
>>>> + } else {
>>>> + # Decode the response content and return it.
>>>> + return $response->decoded_content;
>>>> + }
>>>> +}
>>>> +
>>>> # Cloud Stuff
>>>>
>>>> sub running_in_cloud() {
>>>> --
>>>> 2.47.2
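The compromise the thread converges on - die() for the missing-URL programming error, string codes for expected runtime conditions such as "not modified" - can be sketched in a few self-contained lines. The function name, the SIMULATE_304 key, and the messages below are illustrative, not part of the patch:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch of the discussed error-handling split (all names hypothetical):
# die() for programming errors, string return codes for runtime conditions.
sub downloader_sketch {
	my (%args) = @_;

	# Same key-uppercasing remap as in the patch.
	%args = map { uc $_ => $args{$_} } keys %args;

	# A missing URL is a caller bug - abort loudly.
	die "downloader: no URL given\n" unless ($args{"URL"});

	# An expected runtime condition is reported as a return code.
	return "not modified" if ($args{"SIMULATE_304"});

	# Success: return nothing, as the real function does.
	return;
}

# A runtime condition is handled via the return value.
my $result = downloader_sketch(URL => "https://example.org", SIMULATE_304 => 1);
print "got: ", (defined $result ? $result : "success"), "\n";   # got: not modified

# A programming error is caught by the caller with eval {}.
eval { downloader_sketch() };
print "caught: $@" if ($@);   # caught: downloader: no URL given
```

Wrapping the call in eval {} gives the caller the choice Michael describes: a forgotten URL fails loudly during development, while a list-processing loop can still trap the die() and move on to the next download.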