On 24 March 2025 at 11:02:08, Michael Tremer <michael.tremer@ipfire.org> wrote:

Hello,

On 23 Mar 2025, at 19:00, Stefan Schantl <stefan.schantl@ipfire.org> wrote:

Hello Michael,

thanks for the review and your detailed feedback.

You are right, this patch mainly moves the downloader logic from the
ipblocklist feature into its own, easy-to-use general function.

Morning Stefan,

Thank you for taking the time and getting this out of your git
repository and onto this list :)

As we are already shipping this code and this patch only moves it
around, there is probably not much standing in the way of seeing this
being merged. I have however a couple of questions that we might want
to address in a follow-up patch.

On 22 Mar 2025, at 14:57, Stefan Schantl
<stefan.schantl@ipfire.org> wrote:

This function can be used to grab content and/or store it into
files.

Signed-off-by: Stefan Schantl <stefan.schantl@ipfire.org>
---
config/cfgroot/general-functions.pl | 262 ++++++++++++++++++++++++++++
1 file changed, 262 insertions(+)

diff --git a/config/cfgroot/general-functions.pl b/config/cfgroot/general-functions.pl
index 8ba6e3f79..cb8df69c6 100644
--- a/config/cfgroot/general-functions.pl
+++ b/config/cfgroot/general-functions.pl
@@ -20,6 +20,23 @@ use IO::Socket;
use Net::SSLeay;
use Net::IPv4Addr qw(:all);

+# Load module to move files.
+use File::Copy;
+
+# Load module to get file stats.
+use File::stat;
+
+# Load module to deal with temporary files.
+use File::Temp;
+
+# Load module to deal with the date formats used by the HTTP protocol.
+use HTTP::Date;
+
+# Load the libwwwperl User Agent module.
+use LWP::UserAgent;

Recently (let’s say a year ago) I have spent some time on making the
web UI faster. Especially on some hardware, it is not very snappy.
The reason for the slowness however was not that we are doing much
stuff in our code; it was rather that loading modules was taking a
long time in Perl. So I removed what was no longer needed and moved
some other stuff around, which helped slightly (saved around 100ms on
the older IPFire Mini Appliance).

Since the downloading functionality is not used *that* much here,
should this maybe live in a separate place so that we only include it
when it is needed?

Sounds good, so I'm totally fine with this suggestion. This can be part
of a follow-up patch(set).

Well, moving it should not be a separate patch set, because we should not move it twice.

Hello Michael,

this absolutely makes sense. Any ideas for the filename?

I think downloader.pl could be misinterpreted by certain users - maybe web-function.pl?


But any of the other fixes should be implemented in a separate patch each.

Let's do it this way.



+
+$|=1; # line buffering

Why is this needed?

This line does not belong to the patch and is already part of the
general-functions.pl file.

The following can be found in the perl special variables manual about
$|: 

$| - If set to nonzero, forces an fflush(3) after every write or print
on the currently selected output channel.

I'm not sure if this line is required in general.

I know what it does. I just wanted to know why this has been enabled here.

Short answer: No clue.
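For reference, the effect of `$|` can also be scoped instead of enabled globally, which is usually the safer choice; a small sketch:

```perl
use strict;
use warnings;

# $| controls autoflush on the currently selected output handle:
# when non-zero, every print triggers a flush instead of waiting
# for the buffer to fill. Enabling it globally (as in
# general-functions.pl) trades throughput for immediacy.
{
	# Scope the change with local so it resets after the block.
	local $| = 1;
	print "streamed immediately\n";
}

# Here $| is back to its previous value; output is buffered again.
print "buffered as usual\n";
```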




+
$General::version = 'VERSION';
$General::swroot = 'CONFIG_ROOT';
$General::noipprefix = 'noipg-';
@@ -1346,6 +1363,251 @@ sub generateProtoTransHash () {
return %protocols;
}

+# Function to grab a given URL content or to download and store it on disk.
+#
+# The function requires a configuration hash to be passed.
+#
+# The following options (hash keys) are supported:
+#
+# URL -> The URL to the content or file. REQUIRED!
+# FILE -> The filename as fullpath where the content/file should be stored on disk.
+# ETAGSFILE -> A filename, again as fullpath, where Etags should be stored and read.
+# ETAGPREFIX -> In case a custom etag name should be used, otherwise it defaults to the given URL.
+# MAXSIZE -> In bytes until the downloader will abort downloading. (example: 10_485_760 for 10MB)
+#
+# If a file is given, an If-Modified-Since header will be generated from the last modified timestamp
+# of an already stored file. In case an Etag file is specified, an If-None-Match header will be added to
+# the request - both can be used at the same time.
+#
+# In case no FILE option has been passed to the function, the content of the requested URL will be returned.
+#
+# Return codes (if FILE is used):
+#
+# nothing - On success
+# no url - If no URL has been specified.
+# not modified - In case the server responds with "Not modified" (304)
+# dl_error - If the requested URL cannot be accessed.
+# incomplete download - In case the size of the local file does not match the remote content_length.
+#
+sub downloader (%) {
+ my (%args) = @_;
+
+ # Remap args hash and convert all keys into upper case format.
+ %args = map { uc $_ => $args{$_} } keys %args;
+
+ # The amount of download attempts before giving up and
+ # logging an error.
+ my $max_dl_attempts = 3;
+
+ # Temporary directory to download the files.
+ my $tmp_dl_directory = "/var/tmp";
+
+ # Assign hash values.
+ my $url = $args{"URL"} if (exists($args{"URL"}));
+ my $file = $args{"FILE"} if (exists($args{"FILE"}));
+ my $etags_file = $args{"ETAGSFILE"} if (exists($args{"ETAGSFILE"}));
+ my $etagprefix = $url;
+ $etagprefix = $args{"ETAGPREFIX"} if (exists($args{"ETAGPREFIX"}));
+ my $max_size = $args{"MAXSIZE"} if (exists($args{"MAXSIZE"}));

I consider this really ugly, but I suppose this is what Perl wants.

+ # Abort with error "no url", if no URL has been given.
+ return "no url" unless ($url);

Could we rather die here with an error? Just returning a string here
seems to be too easy to ignore in case something above went wrong.

Technically you are right, but let me explain the reason for this kind
of error handling a bit further down.


+
+ my %etags = ();
+ my $tmpfile;
+
+ # Read-in proxysettings.
+ my %proxysettings=();
+ &readhash("${General::swroot}/proxy/settings", \%proxysettings);
+
+ # Create a user agent instance.
+ #
+ # Request SSL hostname verification and specify path
+ # to the CA file.
+ my $ua = LWP::UserAgent->new(
+ ssl_opts => {
+ SSL_ca_file     => '/etc/ssl/cert.pem',
+ verify_hostname => 1,
+ },
+ );
+
+ # Set timeout to 10 seconds.
+ $ua->timeout(10);

A 10 second timeout also sounds rather optimistic considering some
slower, overloaded mirrors or those that are quite far away. For any
tasks like downloading data in the background we could afford longer.

Accepted, what would be a good default value here?

60 seconds? I think it makes sense to make this optionally configurable so that for the IP check that has been changed here, we will only wait around 10 seconds as it is a simple API request.

I'll implement it as an optional, configurable feature. The default, if not set, would be 60 seconds.
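Resolving the optional timeout could follow the same pattern as the other hash keys; a minimal sketch, assuming a hypothetical TIMEOUT key with a 60-second fallback:

```perl
use strict;
use warnings;

# Sketch of the optional timeout discussed above: a hypothetical
# "TIMEOUT" hash key, falling back to 60 seconds when unset.
sub resolve_timeout {
	my (%args) = @_;

	# Same normalisation as in downloader(): upper-case all keys.
	%args = map { uc $_ => $args{$_} } keys %args;

	# Background downloads default to 60 seconds; quick API
	# requests can pass e.g. 10.
	return $args{"TIMEOUT"} // 60;
}

print resolve_timeout(), "\n";                 # 60
print resolve_timeout(timeout => 10), "\n";    # 10
```

Inside downloader() this would replace the hard-coded `$ua->timeout(10);` with something like `$ua->timeout(&resolve_timeout(%args));`.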




+
+ # Assign maximum download size if set.
+ $ua->max_size($max_size) if ($max_size);
+
+ # Generate UserAgent.
+ my $agent = &MakeUserAgent();
+
+ # Set the generated UserAgent.
+ $ua->agent($agent);
+
+ # Check if an upstream proxy is configured.
+ if ($proxysettings{'UPSTREAM_PROXY'}) {
+ my $proxy_url;
+
+ $proxy_url = "http://";
+
+ # Check if the proxy requires authentication.
+ if (($proxysettings{'UPSTREAM_USER'}) && ($proxysettings{'UPSTREAM_PASSWORD'})) {
+ $proxy_url .= "$proxysettings{'UPSTREAM_USER'}\:$proxysettings{'UPSTREAM_PASSWORD'}\@";
+ }
+
+ # Add proxy server address and port.
+ $proxy_url .= $proxysettings{'UPSTREAM_PROXY'};
+
+ # Append proxy settings.
+ $ua->proxy(['http', 'https'], $proxy_url);
+ }
+
+ # Create a HTTP request element and pass the given URL to it.
+ my $request = HTTP::Request->new(GET => $url);
+
+ # Check if a file to store the output has been provided.
+ if ($file) {
+ # Check if the given file already exists, because it has been downloaded in the past.
+ #
+ # In this case we are asking the server whether the remote file has been changed or not.
+ # This will be done by sending the modification time in a special HTTP header.
+ if (-f $file) {
+ # Call stat on the file.
+ my $stat = stat($file);
+
+ # Obtain the mtime of the existing file.
+ my $mtime = $stat->mtime;
+
+ # Convert the timestamp into right format.
+ my $http_date = time2str($mtime);
+
+ # Add the If-Modified-Since header to the request to ask the server if the file has been modified.
+ $request->header( 'If-Modified-Since' => "$http_date" );
+ }
+
+ # Generate a temporary file name, located in the temporary download directory and with a suffix of ".tmp".
+ # The downloaded file will be stored there until some sanity checks are performed.
+ my $tmp = File::Temp->new( SUFFIX => ".tmp", DIR => "$tmp_dl_directory/", UNLINK => 0 );
+ $tmpfile = $tmp->filename();
+ }
+
+ # Check if a file for etags has been given.
+ if ($etags_file) {
+ # Read-in Etags file for known Etags if the file is present.
+ &readhash("$etags_file", \%etags) if (-f $etags_file);
+
+ # Check if an Etag for the current provider is stored.
+ if ($etags{$etagprefix}) {
+ # Grab the stored tag.
+ my $etag = $etags{$etagprefix};
+
+ # Add an "If-None-Match" header to the request to ask the server if the file has been modified.
+ $request->header( 'If-None-Match' => $etag );
+ }
+ }
+
+ my $dl_attempt = 1;
+ my $response;
+
+ # Download and retry on failure.
+ while ($dl_attempt <= $max_dl_attempts) {
+ # Perform the request and save the output into the tmpfile if requested.
+ $response = $ua->request($request, $tmpfile);
+
+ # Check if the download was successful.
+ if ($response->is_success) {
+ # Break loop.
+ last;
+
+ # Check if the server responds with 304 (Not Modified).
+ } elsif ($response->code == 304) {
+ # Remove temporary file, if one exists.
+ unlink("$tmpfile") if (-e "$tmpfile");
+
+ # Return "not modified".
+ return "not modified";

Same here. It feels like strings are not the most ideal return code.

As promised above, I'm going to explain why I decided to return various
strings in case of different errors. Calling die() immediately
terminates the Perl (main) script.

This is correct, and I never really liked this behaviour. However, Perl does not know exceptions, so we cannot gracefully throw an error.

But, not passing a URL is a programming error and that should get some attention.

Yeah, you have to take care of this kind of Perl behaviour, and I'm also not in favour of it. That was the reason why I decided to do it this way.

I see the reason why you are recommending to die in case no URL has been passed - I'll adjust the code.

Best regards,

-Stefan


The “not modified” case should of course not call die.

So, for example, when processing a list of downloads and such an
event/exception/error appears, in my opinion the main script which
called the function should decide how to proceed.

This is very similar to C, where you call the function, check the
return code and decide what to do next.

Usually in C we return an integer or even better an enum.

So in this case I have to disagree with you and strongly recommend to
keep the error processing as it is.
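To illustrate the argument: with string return codes, a caller iterating over several downloads can react per item, much like checking an errno-style return value in C. A minimal sketch, with a stub standing in for the real downloader():

```perl
use strict;
use warnings;

# Stub standing in for the real downloader(): it returns nothing on
# success and a descriptive string on failure, as in the patch.
sub downloader {
	my (%args) = @_;
	return "no url" unless $args{"URL"};
	return;
}

# The calling script decides per download how to continue.
my @results;
foreach my $url ("https://example.org/list.txt", undef) {
	my $error = &downloader(URL => $url);

	if (!defined $error) {
		push @results, "ok";
	} elsif ($error eq "not modified") {
		push @results, "skipped";
	} else {
		push @results, "failed: $error";
	}
}

print join(", ", @results), "\n";    # ok, failed: no url
```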

Best regards,

-Stefan


+
+ # Check if we ran out of download retries.
+ } elsif ($dl_attempt eq $max_dl_attempts) {
+ # Obtain error.
+ my $error = $response->content;
+
+ # Remove temporary file, if one exists.
+ unlink("$tmpfile") if (-e "$tmpfile");
+
+ # Return the error message from the response.
+ return "$error";
+ }
+
+ # Remove temporary file, if one exists.
+ unlink("$tmpfile") if (-e "$tmpfile");
+
+ # Increase download attempt counter.
+ $dl_attempt++;
+ }
+
+ # Obtain the connection headers.
+ my $headers = $response->headers;
+
+ # Check if an Etag file has been provided.
+ if ($etags_file) {
+ # Grab the Etag from the response if the server provides one.
+ if ($response->header('Etag')) {
+ # Add the provided Etag to the hash of tags.
+ $etags{$etagprefix} = $response->header('Etag');
+
+ # Write the etags file.
+ &writehash($etags_file, \%etags);
+ }
+ }
+
+ # Check if the response should be stored on disk.
+ if ($file) {
+ # Get the remote size of the content.
+ my $remote_size = $response->header('Content-Length');
+
+ # Perform a stat on the temporary file.
+ my $stat = stat($tmpfile);
+
+ # Grab the size of the stored temporary file.
+ my $local_size = $stat->size;
+
+ # Check if both sizes are equal.
+ if(($remote_size) && ($remote_size ne $local_size)) {
+ # Delete the temporary file.
+ unlink("$tmpfile");
+
+ # Abort and return "incomplete download" as error.
+ return "incomplete download";
+ }
+
+ # Move the temporary file to the desired file, overwriting any
+ # existing one.
+ move("$tmpfile", "$file");
+
+ # Obtain the timestamp from the response header of when the file has last been modified.
+ my $last_modified = $headers->last_modified;
+
+ # Check if we got a last-modified value from the server.
+ if ($last_modified) {
+ # Assign the last-modified timestamp as mtime to the
+ # stored file.
+ utime(time(), "$last_modified", "$file");
+ }
+
+ # Delete temporary file.
+ unlink("$tmpfile");
+
+ # If we got here, everything worked fine. Return nothing.
+ return;
+ } else {
+ # Decode the response content and return it.
+ return $response->decoded_content;
+ }
+}
+
# Cloud Stuff

sub running_in_cloud() {
-- 
2.47.2