Hello,

On 23 Mar 2025, at 19:00, Stefan Schantl <stefan.schantl@ipfire.org> wrote:

Hello Michael,

thanks for the review and your detailed feedback.

You are right, this patch mainly moves the downloader logic from the ipblocklist feature into its own, easy-to-use general function.

Morning Stefan,

Thank you for taking the time and getting this out of your git repository and onto this list :)

As we are already shipping this code and this patch only moves it around, there is probably not much standing in the way of seeing this being merged. I have however a couple of questions that we might want to address in a follow-up patch.

On 22 Mar 2025, at 14:57, Stefan Schantl <stefan.schantl@ipfire.org> wrote:

This function can be used to grab content and/or store it into files.

Signed-off-by: Stefan Schantl <stefan.schantl@ipfire.org>
---
 config/cfgroot/general-functions.pl | 262 ++++++++++++++++++++++++++++
 1 file changed, 262 insertions(+)

diff --git a/config/cfgroot/general-functions.pl b/config/cfgroot/general-functions.pl
index 8ba6e3f79..cb8df69c6 100644
--- a/config/cfgroot/general-functions.pl
+++ b/config/cfgroot/general-functions.pl
@@ -20,6 +20,23 @@ use IO::Socket;
 use Net::SSLeay;
 use Net::IPv4Addr qw(:all);
+# Load module to move files.
+use File::Copy;
+
+# Load module to get file stats.
+use File::stat;
+
+# Load module to deal with temporary files.
+use File::Temp;
+
+# Load module to deal with the date formats used by the HTTP protocol.
+use HTTP::Date;
+
+# Load the libwww-perl User Agent module.
+use LWP::UserAgent;

Recently (let’s say a year ago) I have spent some time on making the web UI faster. Especially on some hardware, it is not very snappy. The reason for the slowness however was not that we are doing much stuff in our code; it was rather that loading modules was taking a long time in Perl. So I removed what was no longer needed and moved some other stuff around, which slightly helped (saved around 100ms on the older IPFire Mini Appliance).

Since the downloading functionality is not used *that* much here, should this maybe not live in a separate place so that we only include it when it is needed?

Sounds good, so I'm totally fine with this suggestion. This can be part of a following patch(set).

Well, moving it should not be a separate patch set, because we should not move it twice.
But any of the other fixes should be implemented in a separate patch each.
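Just to sketch what I have in mind for the split (only an illustration; the file name general-downloader.pl and the package name Downloader are placeholders I made up): the heavy modules and the downloader() function could live in their own file, and only the scripts that actually download something would pull it in at runtime:

	# general-downloader.pl - hypothetical new home for the downloader and its modules.
	package Downloader;

	use strict;
	use warnings;

	use File::Copy;
	use File::stat;
	use File::Temp;
	use HTTP::Date;
	use LWP::UserAgent;

	sub downloader (%) {
		# ... body of the function from this patch ...
		return;
	}

	1;

	# Only scripts that actually need the downloader pay the module loading cost:
	require "${General::swroot}/general-downloader.pl";

	my $error = &Downloader::downloader(
		URL  => "https://example.com/list.txt",
		FILE => "/var/tmp/list.txt",
	);

That way general-functions.pl stays cheap to load for all the pages that never download anything.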
+
+
 $|=1; # line buffering

Why is this needed?

This line does not belong to the patch and is already part of the general-functions.pl file.

The following can be found in the perl special variables manual about $|:

$| - If set to nonzero, forces an fflush(3) after every write or print on the currently selected output channel.

I'm not sure if this line is required in general.

I know what it does. I just wanted to know why this has been enabled here.
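(As an aside, and not something this patch needs to address: if the intention is simply unbuffered output on STDOUT, the same effect can be spelled out more explicitly, which would at least document the "why":

	use IO::Handle;

	# Equivalent to $| = 1 for the currently selected handle (normally STDOUT).
	STDOUT->autoflush(1);

This is only a suggestion for a later cleanup.)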
+
 $General::version = 'VERSION';
 $General::swroot = 'CONFIG_ROOT';
 $General::noipprefix = 'noipg-';
@@ -1346,6 +1363,251 @@ sub generateProtoTransHash () {
 	return %protocols;
 }
+# Function to grab the content of a given URL or to download and store it on disk.
+#
+# The function requires a configuration hash to be passed.
+#
+# The following options (hash keys) are supported:
+#
+# URL -> The URL to the content or file. REQUIRED!
+# FILE -> The filename as full path where the content/file should be stored on disk.
+# ETAGSFILE -> A filename, again as full path, where Etags should be stored and read.
+# ETAGPREFIX -> In case a custom etag name should be used, otherwise it defaults to the given URL.
+# MAXSIZE -> Maximum size in bytes before the downloader will abort downloading. (example: 10_485_760 for 10MB)
+#
+# If a file is given, an If-Modified-Since header will be generated from the last-modified timestamp
+# of an already stored file. In case an Etag file is specified, an If-None-Match header will be added to
+# the request - both can be used at the same time.
+#
+# In case no FILE option has been passed to the function, the content of the requested URL will be returned.
+#
+# Return codes (if FILE is used):
+#
+# nothing - On success
+# no url - If no URL has been specified.
+# not modified - In case the server responds with "Not Modified" (304).
+# dl_error - If the requested URL cannot be accessed.
+# incomplete download - In case the size of the local file does not match the remote content_length.
+#
+sub downloader (%) {
+	my (%args) = @_;
+
+	# Remap args hash and convert all keys into upper case format.
+	%args = map { uc $_ => $args{$_} } keys %args;
+
+	# The amount of download attempts before giving up and
+	# logging an error.
+	my $max_dl_attempts = 3;
+
+	# Temporary directory to download the files.
+	my $tmp_dl_directory = "/var/tmp";
+
+	# Assign hash values.
+	my $url = $args{"URL"} if (exists($args{"URL"}));
+	my $file = $args{"FILE"} if (exists($args{"FILE"}));
+	my $etags_file = $args{"ETAGSFILE"} if (exists($args{"ETAGSFILE"}));
+	my $etagprefix = $url;
+	$etagprefix = $args{"ETAGPREFIX"} if (exists($args{"ETAGPREFIX"}));
+	my $max_size = $args{"MAXSIZE"} if (exists($args{"MAXSIZE"}));

I consider this really ugly, but I suppose this is what Perl wants.

+	# Abort with error "no url", if no URL has been given.
+	return "no url" unless ($url);

Could we rather die here with an error? Just returning a string here seems to be too easy to ignore in case something above went wrong.

Technically you are right, but let me explain the reason for this kind of error handling a bit further down.

+
+	my %etags = ();
+	my $tmpfile;
+
+	# Read-in proxysettings.
+	my %proxysettings=();
+	&readhash("${General::swroot}/proxy/settings", \%proxysettings);
+
+	# Create a user agent instance.
+	#
+	# Request SSL hostname verification and specify path
+	# to the CA file.
+	my $ua = LWP::UserAgent->new(
+		ssl_opts => {
+			SSL_ca_file => '/etc/ssl/cert.pem',
+			verify_hostname => 1,
+		},
+	);
+
+	# Set timeout to 10 seconds.
+	$ua->timeout(10);

A 10 second timeout also sounds rather optimistic considering some slower, overloaded mirrors or those that are quite far away. For any tasks like downloading data in the background we could afford longer.

Accepted, what would be a good default value here?

60 seconds? I think it makes sense to make this optionally configurable so that for the IP check that has been changed here, we will only wait around 10 seconds as it is a simple API request.
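Something along these lines is what I mean - only a sketch, the TIMEOUT key does not exist in this patch, the URLs are placeholders, and I am assuming the usual &General:: calling convention:

	# Use a generous default for background downloads, but let the caller
	# override it, e.g. for quick API requests.
	my $timeout = 60;
	$timeout = $args{"TIMEOUT"} if (exists($args{"TIMEOUT"}));

	# Apply the timeout to the user agent.
	$ua->timeout($timeout);

	# Simple API request that should fail fast:
	my $content = &General::downloader(URL => "https://example.com/api/myip", TIMEOUT => 10);

	# Background list download that may take a while (uses the 60 second default):
	my $error = &General::downloader(
		URL  => "https://example.com/blocklist.txt",
		FILE => "/var/tmp/blocklist.txt",
	);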
+
+	# Assign maximum download size if set.
+	$ua->max_size($max_size) if ($max_size);
+
+	# Generate UserAgent.
+	my $agent = &MakeUserAgent();
+
+	# Set the generated UserAgent.
+	$ua->agent($agent);
+
+	# Check if an upstream proxy is configured.
+	if ($proxysettings{'UPSTREAM_PROXY'}) {
+		my $proxy_url;
+
+		$proxy_url = "http://";
+
+		# Check if the proxy requires authentication.
+		if (($proxysettings{'UPSTREAM_USER'}) && ($proxysettings{'UPSTREAM_PASSWORD'})) {
+			$proxy_url .= "$proxysettings{'UPSTREAM_USER'}\:$proxysettings{'UPSTREAM_PASSWORD'}\@";
+		}
+
+		# Add proxy server address and port.
+		$proxy_url .= $proxysettings{'UPSTREAM_PROXY'};
+
+		# Append proxy settings.
+		$ua->proxy(['http', 'https'], $proxy_url);
+	}
+
+	# Create a HTTP request element and pass the given URL to it.
+	my $request = HTTP::Request->new(GET => $url);
+
+	# Check if a file to store the output has been provided.
+	if ($file) {
+		# Check if the given file already exists, because it has been downloaded in the past.
+		#
+		# In this case we are asking the server whether the remote file has been changed or not.
+		# This will be done by sending the modification time in a special HTTP header.
+		if (-f $file) {
+			# Call stat on the file.
+			my $stat = stat($file);
+
+			# Obtain the mtime of the existing file.
+			my $mtime = $stat->mtime;
+
+			# Convert the timestamp into the right format.
+			my $http_date = time2str($mtime);
+
+			# Add the If-Modified-Since header to the request to ask the server if the
+			# file has been modified.
+			$request->header( 'If-Modified-Since' => "$http_date" );
+		}
+
+		# Generate a temporary file name, located in the temporary download directory and with a suffix of ".tmp".
+		# The downloaded file will be stored there until some sanity checks are performed.
+		my $tmp = File::Temp->new( SUFFIX => ".tmp", DIR => "$tmp_dl_directory/", UNLINK => 0 );
+		$tmpfile = $tmp->filename();
+	}
+
+	# Check if a file for etags has been given.
+	if ($etags_file) {
+		# Read-in the Etags file for known Etags if the file is present.
+		&readhash("$etags_file", \%etags) if (-f $etags_file);
+
+		# Check if an Etag for the current provider is stored.
+		if ($etags{$etagprefix}) {
+			# Grab the stored tag.
+			my $etag = $etags{$etagprefix};
+
+			# Add an "If-None-Match" header to the request to ask the server if the
+			# file has been modified.
+			$request->header( 'If-None-Match' => $etag );
+		}
+	}
+
+	my $dl_attempt = 1;
+	my $response;
+
+	# Download and retry on failure.
+	while ($dl_attempt <= $max_dl_attempts) {
+		# Perform the request and save the output into the tmpfile if requested.
+		$response = $ua->request($request, $tmpfile);
+
+		# Check if the download was successful.
+		if($response->is_success) {
+			# Break loop.
+			last;
+
+		# Check if the server responds with 304 (Not Modified).
+		} elsif ($response->code == 304) {
+			# Remove temporary file, if one exists.
+			unlink("$tmpfile") if (-e "$tmpfile");
+
+			# Return "not modified".
+			return "not modified";

Same here. It feels like strings are not the most ideal return code.

As promised above, I'm going to explain why I decided to return various strings in case of different errors. Calling die() immediately terminates the perl (main) script.

This is correct, and I never really liked this behaviour. However, Perl does not know exceptions and so we cannot gracefully throw an error.

But, not passing a URL is a programming error and that should get some attention.
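To illustrate what I mean by that (a sketch only, not a request to rewrite the whole error handling): the missing URL could die, because only a bug in the caller can trigger it, and a caller that really does not want to terminate can still trap it with eval:

	# Inside downloader(): a missing URL is a bug in the caller, so fail loudly.
	die "downloader: no URL has been given" unless ($url);

	# A caller that prefers not to terminate can still catch it:
	my $content = eval { &General::downloader(URL => $url) };
	if ($@) {
		print STDERR "download failed: $@";
	}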
The “not modified” case should of course not call die.

So for example when processing a list of downloads and such an event/exception/error appears, in my opinion the main script which called the function should decide how to go further. This is very similar to C where you call the function, take care of the return code and decide what to do next.

Usually in C we return an integer or even better an enum.

So in this case I have to disagree with you and strongly recommend keeping the error processing as it is.

Best regards,

-Stefan

+
+		# Check if we ran out of download re-tries.
+		} elsif ($dl_attempt eq $max_dl_attempts) {
+			# Obtain the error.
+			my $error = $response->content;
+
+			# Remove temporary file, if one exists.
+			unlink("$tmpfile") if (-e "$tmpfile");
+
+			# Return the error message from the response.
+			return "$error";
+		}
+
+		# Remove temporary file, if one exists.
+		unlink("$tmpfile") if (-e "$tmpfile");
+
+		# Increase download attempt counter.
+		$dl_attempt++;
+	}
+
+	# Obtain the connection headers.
+	my $headers = $response->headers;
+
+	# Check if an Etag file has been provided.
+	if ($etags_file) {
+		# Grab the Etag from the response if the server provides one.
+		if ($response->header('Etag')) {
+			# Add the provided Etag to the hash of tags.
+			$etags{$etagprefix} = $response->header('Etag');
+
+			# Write the etags file.
+			&writehash($etags_file, \%etags);
+		}
+	}
+
+	# Check if the response should be stored on disk.
+	if ($file) {
+		# Get the remote size of the content.
+		my $remote_size = $response->header('Content-Length');
+
+		# Perform a stat on the temporary file.
+		my $stat = stat($tmpfile);
+
+		# Grab the size of the stored temporary file.
+		my $local_size = $stat->size;
+
+		# Check if both sizes are equal.
+		if(($remote_size) && ($remote_size ne $local_size)) {
+			# Delete the temporary file.
+			unlink("$tmpfile");
+
+			# Abort and return "incomplete download" as error.
+			return "incomplete download";
+		}
+
+		# Move the temporary file to the desired file, overwriting a possibly
+		# existing one.
+		move("$tmpfile", "$file");
+
+		# Obtain the timestamp from the response header that tells when the file
+		# has been modified the last time.
+		my $last_modified = $headers->last_modified;
+
+		# Check if we got a last-modified value from the server.
+		if ($last_modified) {
+			# Assign the last-modified timestamp as mtime to the
+			# stored file.
+			utime(time(), "$last_modified", "$file");
+		}
+
+		# Delete the temporary file.
+		unlink("$tmpfile");
+
+		# If we got here, everything worked fine. Return nothing.
+		return;
+	} else {
+		# Decode the response content and return it.
+		return $response->decoded_content;
+	}
+}
+
 # Cloud Stuff
 sub running_in_cloud() {
-- 
2.47.2
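Coming back to the return codes one more time: keeping the decision at the caller is fine with me, but instead of free-form strings we could mimic a C enum with constants, so a typo in a comparison cannot slip through silently. A rough sketch (none of these constant names exist yet, and it assumes downloader() would then return DL_OK on success instead of nothing):

	# Rough Perl equivalent of a C enum for the download results.
	use constant {
		DL_OK                  => 0,
		DL_NOT_MODIFIED        => 1,
		DL_ERROR               => 2,
		DL_INCOMPLETE_DOWNLOAD => 3,
	};

	# Inside downloader(), e.g.:
	#   return DL_NOT_MODIFIED;   instead of   return "not modified";

	# The caller compares against the constants instead of strings:
	my $ret = &General::downloader(URL => $url, FILE => $file);
	if ($ret == DL_NOT_MODIFIED) {
		# The stored file is still current, nothing to do.
	}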