From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail02.haj.ipfire.org (localhost [127.0.0.1]) by mail02.haj.ipfire.org (Postfix) with ESMTP id 4ZNBmD11rmz32gJ for ; Wed, 26 Mar 2025 16:18:20 +0000 (UTC) Received: from mail01.ipfire.org (mail01.haj.ipfire.org [172.28.1.202]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) client-signature RSA-PSS (4096 bits)) (Client CN "mail01.haj.ipfire.org", Issuer "R10" (verified OK)) by mail02.haj.ipfire.org (Postfix) with ESMTPS id 4ZNBm83zRBz309B for ; Wed, 26 Mar 2025 16:18:16 +0000 (UTC) Received: from [127.0.0.1] (localhost [127.0.0.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail01.ipfire.org (Postfix) with ESMTPSA id 4ZNBm768Zyz1XS; Wed, 26 Mar 2025 16:18:15 +0000 (UTC) DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=ipfire.org; s=202003ed25519; t=1743005896; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=EEpR5Xv0t9LeXcr8EWqPf57ywXcDNWqGz0jOIHtKsr0=; b=C0P4t3Hg285V/4ydKl7wy8SIdLvf5NmwG5tdhabeJcGAIozpEhp5Z1BMDWqzLA8C4/RI54 3olyif9M/CwazOAw== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ipfire.org; s=202003rsa; t=1743005896; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=EEpR5Xv0t9LeXcr8EWqPf57ywXcDNWqGz0jOIHtKsr0=; b=ES2ICRd0KG7O2toDdsTwtqyAPEyHVbUn32dUWRaXF2RVQuXLR5oXzB97/uozHOtw2ljlQ5 xcrThhuy0v/9YMpk40G5pAvW0RvRhfQ2CFsgoCOxefP0p4QofuJQqdScPKW90mx9vySmlC 5xmpI2xsU9v3woCfgAZCgU3Smss/BW/Aa5ST+RwvDsQPCZqpEQR92k5M9qMePTbTuJJrV6 iL7mvkAX9MSW19WAJS0aSPqOwaL7NUrwNGwVzLNKcdrJjCyGOC4iUy7dhKjcaxFwT2Wa/H XU4meBxnRsdDlzz/m28LSCfxn2dK7nDZphzR7Siza8yUhrC5i1Y1TkG8cKwh9g== Content-Type: text/plain; charset=utf-8 Precedence: list List-Id: List-Subscribe: , List-Unsubscribe: , List-Post: List-Help: Sender: Mail-Followup-To: Mime-Version: 1.0 Subject: Re: [PATCH 1/3] general-functions.pl: Add LWP-based flexible downloader function. From: Michael Tremer In-Reply-To: <195d27e4348.27e7.cac9d3ffac9e24d09d20af05166fd73b@ipfire.org> Date: Wed, 26 Mar 2025 16:18:15 +0000 Cc: development@lists.ipfire.org Content-Transfer-Encoding: quoted-printable Message-Id: References: <20250322145724.4593-1-stefan.schantl@ipfire.org> <31565D56-B0C2-4BA0-BF75-7902C376B35E@ipfire.org> <195d27e4348.27e7.cac9d3ffac9e24d09d20af05166fd73b@ipfire.org> To: Stefan Schantl Hello Stefan, > On 26 Mar 2025, at 12:47, Stefan Schantl = wrote: >=20 >=20 >=20 > Am 24. M=C3=A4rz 2025 11:02:08 schrieb Michael Tremer = : >=20 >> Hello, >>=20 >>> On 23 Mar 2025, at 19:00, Stefan Schantl = wrote: >>>=20 >>> Hello Michael, >>>=20 >>> thanks for the review and your detailed feedback. >>>=20 >>> You are right this patch mainly moves the downloader logic from the >>> ipblocklist feature into an own easy to use general function. >>>=20 >>>> Morning Stefan, >>>>=20 >>>> Thank you for taking the time and getting this out of your git >>>> repository and onto this list :) >>>>=20 >>>> As we are already shipping this code and this patch only moves it >>>> around, there is probably not much standing in the way of seeing = this >>>> being merged. I have however a couple of questions that we might = want >>>> to address in a follow-up patch. >>>>=20 >>>>> On 22 Mar 2025, at 14:57, Stefan Schantl >>>>> wrote: >>>>>=20 >>>>> This function can be used to grab content and/or store it into >>>>> files. >>>>>=20 >>>>> Signed-off-by: Stefan Schantl >>>>> --- >>>>> config/cfgroot/general-functions.pl | 262 >>>>> ++++++++++++++++++++++++++++ >>>>> 1 file changed, 262 insertions(+) >>>>>=20 >>>>> diff --git a/config/cfgroot/general-functions.pl >>>>> b/config/cfgroot/general-functions.pl >>>>> index 8ba6e3f79..cb8df69c6 100644 >>>>> --- a/config/cfgroot/general-functions.pl >>>>> +++ b/config/cfgroot/general-functions.pl >>>>> @@ -20,6 +20,23 @@ use IO::Socket; >>>>> use Net::SSLeay; >>>>> use Net::IPv4Addr qw(:all); >>>>>=20 >>>>> +# Load module to move files. >>>>> +use File::Copy; >>>>> + >>>>> +# Load module to get file stats. >>>>> +use File::stat; >>>>> + >>>>> +# Load module to deal with temporary files. >>>>> +use File::Temp; >>>>> + >>>>> +# Load module to deal with the date formats used by the HTTP >>>>> protocol. >>>>> +use HTTP::Date; >>>>> + >>>>> +# Load the libwwwperl User Agent module. >>>>> +use LWP::UserAgent; >>>>=20 >>>> Recently (let=E2=80=99s say a year ago) I have spent some time on = making the >>>> web UI faster. Especially on some hardware, is it not very snappy. >>>> The reason for the slowness however was not that we are doing much >>>> stuff in our code; it was rather that loading modules was taking a >>>> long time in Perl. So I removed what was no longer needed and moved >>>> some other stuff around which slightly helped (saved around 100ms = on >>>> the older IPFire Mini Appliance). >>>>=20 >>>> Since the downloading functionality is not used *that* much here. >>>> Should this maybe not live in a separate place and we only include = it >>>> when it is needed? >>>=20 >>> Sounds good, so I'm totally fine with this suggestion. This can be = part >>> of a following patch(set). >>=20 >> Well, moving it should not be a separate patch set, because we should = not move it twice. >=20 >=20 > Hello Michael, >=20 > this absolutely makes sense. Any ideas for the filename? >=20 > I think downloader.pl could be misinterpreted by a certain users - = maybe web-function.pl? download-functions.pl? >=20 >>=20 >> But any of the other fixes should be implemented in a separate patch = each. >=20 >=20 > Let's do it this way. >=20 >>=20 >>>>=20 >>>>> + >>>>> +$|=3D1; # line buffering >>>>=20 >>>> Why is this needed? >>>=20 >>> This line does not belong to the patch and is already part of the >>> general-functions.pl file. >>>=20 >>> The following can be found in the perl special variables manual = about >>> $|:=20 >>>=20 >>> $| - If set to nonzero, forces an fflush(3) after every write or = print >>> on the currently selected output channel. >>>=20 >>> I'm not sure, if this line is required in general. >>=20 >> I know what it does. I just wanted to know why this has been enabled = here. >=20 >=20 > Short answer: No clue. Ooookay. >=20 >>=20 >>>=20 >>>>=20 >>>>> + >>>>> $General::version =3D 'VERSION'; >>>>> $General::swroot =3D 'CONFIG_ROOT'; >>>>> $General::noipprefix =3D 'noipg-'; >>>>> @@ -1346,6 +1363,251 @@ sub generateProtoTransHash () { >>>>> return %protocols; >>>>> } >>>>>=20 >>>>> +# Function to grab a given URL content or to download and store = it >>>>> on disk. >>>>> +# >>>>> +# The function requires a configuration hash to be passed. >>>>> +# >>>>> +# The following options (hash keys) are supported: >>>>> +# >>>>> +# URL -> The URL to the content or file. REQUIRED! >>>>> +# FILE -> The filename as fullpath where the content/file should >>>>> be stored on disk. >>>>> +# ETAGSFILE -> A filename again as fullpath where Etags should be >>>>> stored and read. >>>>> +# ETAGPREFIX -> In case a custom etag name should be used, >>>>> otherwise it defaults to the given URL. >>>>> +# MAXSIZE -> In bytes until the downloader will abort = downloading. >>>>> (example: 10_485_760 for 10MB) >>>>> +# >>>>> +# If a file is given an If-Modified-Since header will be = generated >>>>> from the last modified timestamp >>>>> +# of an already stored file. In case an Etag file is specified an >>>>> If-None-Match header will be added to >>>>> +# the request - Both can be used at the same time. >>>>> +# >>>>> +# In case no FILE option has been passed to the function, the >>>>> content of the requested URL will be returned. >>>>> +# >>>>> +# Return codes (if FILE is used): >>>>> +# >>>>> +# nothing - On success >>>>> +# no url - If no URL has been specified. >>>>> +# not_modified - In case the servers responds with "Not modified" >>>>> (304) >>>>> +# dl_error - If the requested URL cannot be accessed. >>>>> +# incomplete download - In case the size of the local file does >>>>> not match the remote content_lenght. >>>>> +# >>>>> +sub downloader (%) { >>>>> + my (%args) =3D @_; >>>>> + >>>>> + # Remap args hash and convert all keys into upper case format. >>>>> + %args =3D map { uc $_ =3D> $args{$_} } keys %args; >>>>> + >>>>> + # The amount of download attempts before giving up and >>>>> + # logging an error. >>>>> + my $max_dl_attempts =3D 3; >>>>> + >>>>> + # Temporary directory to download the files. >>>>> + my $tmp_dl_directory =3D "/var/tmp"; >>>>> + >>>>> + # Assign hash values. >>>>> + my $url =3D $args{"URL"} if (exists($args{"URL"})); >>>>> + my $file =3D $args{"FILE"} if (exists($args{"FILE"})); >>>>> + my $etags_file =3D $args{"ETAGSFILE"} if >>>>> (exists($args{"ETAGSFILE"})); >>>>> + my $etagprefix =3D $url; >>>>> + $etagprefix =3D $args{"ETAGPREFIX"} if >>>>> (exists($args{"ETAGPREFIX"})); >>>>> + my $max_size =3D $args{"MAXSIZE"} if (exists($args{"MAXSIZE"})); >>>>=20 >>>> I consider this really ugly, but I suppose this is what Perl wants. >>>>=20 >>>>> + # Abort with error "no url", if no URL has been given. >>>>> + return "no url" unless ($url); >>>>=20 >>>> Could we rather die here with an error? Just returning a string = here >>>> seems to be too easy to ignore in case something above went wrong. >>>=20 >>> Technically you are right, but let me explain the reason for this = kind >>> of error handling a bit further down. >>>=20 >>>>=20 >>>>> + >>>>> + my %etags =3D (); >>>>> + my $tmpfile; >>>>> + >>>>> + # Read-in proxysettings. >>>>> + my %proxysettings=3D(); >>>>> + &readhash("${General::swroot}/proxy/settings", \%proxysettings); >>>>> + >>>>> + # Create a user agent instance. >>>>> + # >>>>> + # Request SSL hostname verification and specify path >>>>> + # to the CA file. >>>>> + my $ua =3D LWP::UserAgent->new( >>>>> + ssl_opts =3D> { >>>>> + SSL_ca_file =3D> '/etc/ssl/cert.pem', >>>>> + verify_hostname =3D> 1, >>>>> + }, >>>>> + ); >>>>> + >>>>> + # Set timeout to 10 seconds. >>>>> + $ua->timeout(10); >>>>=20 >>>> A 10 second timeout also sounds rather optimistic considering some >>>> slower, overloaded mirrors or those that are quite far away. For = any >>>> tasks like downloading data in the background we could afford = longer. >>>=20 >>> Accepted, what would be a good default value here? >>=20 >> 60 seconds? I think it makes sense to make this optionally = configurable so that for the IP check that has been changed here, we = will only wait around 10 seconds as it is a simple API request. >=20 >=20 > I'll implement it as optional and config-able feature. Default if not = set would be 60 seconds. Agreed. >>=20 >>>=20 >>>>=20 >>>>> + >>>>> + # Assign maximum download size if set. >>>>> + $ua->max_size($max_size) if ($max_size); >>>>> + >>>>> + # Generate UserAgent. >>>>> + my $agent =3D &MakeUserAgent(); >>>>> + >>>>> + # Set the generated UserAgent. >>>>> + $ua->agent($agent); >>>>> + >>>>> + # Check if an upstream proxy is configured. >>>>> + if ($proxysettings{'UPSTREAM_PROXY'}) { >>>>> + my $proxy_url; >>>>> + >>>>> + $proxy_url =3D "http://"; >>>>> + >>>>> + # Check if the proxy requires authentication. >>>>> + if (($proxysettings{'UPSTREAM_USER'}) && >>>>> ($proxysettings{'UPSTREAM_PASSWORD'})) { >>>>> + $proxy_url .=3D >>>>> = "$proxysettings{'UPSTREAM_USER'}\:$proxysettings{'UPSTREAM_PASSWORD >>>>> '}\@"; >>>>> + } >>>>> + >>>>> + # Add proxy server address and port. >>>>> + $proxy_url .=3D $proxysettings{'UPSTREAM_PROXY'}; >>>>> + >>>>> + # Append proxy settings. >>>>> + $ua->proxy(['http', 'https'], $proxy_url); >>>>> + } >>>>> + >>>>> + # Create a HTTP request element and pass the given URL to it. >>>>> + my $request =3D HTTP::Request->new(GET =3D> $url); >>>>> + >>>>> + # Check if a file to store the output has been provided. >>>>> + if ($file) { >>>>> + # Check if the given file already exits, because it has been >>>>> downloaded in the past. >>>>> + # >>>>> + # In this case we are requesting the server if the remote file >>>>> has been changed or not. >>>>> + # This will be done by sending the modification time in a = special >>>>> HTTP header. >>>>> + if (-f $file) { >>>>> + # Call stat on the file. >>>>> + my $stat =3D stat($file); >>>>> + >>>>> + # Omit the mtime of the existing file. >>>>> + my $mtime =3D $stat->mtime; >>>>> + >>>>> + # Convert the timestamp into right format. >>>>> + my $http_date =3D time2str($mtime); >>>>> + >>>>> + # Add the If-Modified-Since header to the request to ask the >>>>> server if the >>>>> + # file has been modified. >>>>> + $request->header( 'If-Modified-Since' =3D> "$http_date" ); >>>>> + } >>>>> + >>>>> + # Generate temporary file name, located in the tempoary download >>>>> directory and with a suffix of ".tmp". >>>>> + # The downloaded file will be stored there until some sanity >>>>> checks are performed. >>>>> + my $tmp =3D File::Temp->new( SUFFIX =3D> ".tmp", DIR =3D> >>>>> "$tmp_dl_directory/", UNLINK =3D> 0 ); >>>>> + $tmpfile =3D $tmp->filename(); >>>>> + } >>>>> + >>>>> + # Check if an file for etags has been given. >>>>> + if ($etags_file) { >>>>> + # Read-in Etags file for known Etags if the file is present. >>>>> + &readhash("$etags_file", \%etags) if (-f $etags_file); >>>>> + >>>>> + # Check if an Etag for the current provider is stored. >>>>> + if ($etags{$etagprefix}) { >>>>> + # Grab the stored tag. >>>>> + my $etag =3D $etags{$etagprefix}; >>>>> + >>>>> + # Add an "If-None-Match header to the request to ask the server >>>>> if the >>>>> + # file has been modified. >>>>> + $request->header( 'If-None-Match' =3D> $etag ); >>>>> + } >>>>> + } >>>>> + >>>>> + my $dl_attempt =3D 1; >>>>> + my $response; >>>>> + >>>>> + # Download and retry on failure. >>>>> + while ($dl_attempt <=3D $max_dl_attempts) { >>>>> + # Perform the request and save the output into the tmpfile if >>>>> requested. >>>>> + $response =3D $ua->request($request, $tmpfile); >>>>> + >>>>> + # Check if the download was successfull. >>>>> + if($response->is_success) { >>>>> + # Break loop. >>>>> + last; >>>>> + >>>>> + # Check if the server responds with 304 (Not Modified). >>>>> + } elsif ($response->code =3D=3D 304) { >>>>> + # Remove temporary file, if one exists. >>>>> + unlink("$tmpfile") if (-e "$tmpfile"); >>>>> + >>>>> + # Return "not modified". >>>>> + return "not modified"; >>>>=20 >>>> Same here. It feels like strings are not the most ideal return = code. >>>=20 >>> As promised up, I'm going to explain why I decided to return various >>> strings in case of different errors. Calling die() immediately >>> terminates the perl (main) script. >>=20 >> This is correct, and I never really like this behaviour. However, = Perl does not know exceptions and so we cannot gracefully throw an = error. >>=20 >> But, not passing a URL is a programming error and that should get = some attention. >=20 >=20 > Yeah, you have take care of this kind of perl behaviour and I'm also = not in favour of it. That was the reason why I decided to do it in this = way. >=20 > I see the reason why you are recommending to die in case no url has = passed - I'll adjust the code. >=20 > Best regards, >=20 > -Stefan >=20 >>=20 >> The =E2=80=9Cnot modified=E2=80=9D case should of course not call = die. >>=20 >>> So for example when processing a list of downloads and such an >>> event/exception/error appeared in my opinion the main script, which >>> called the function should decide how to go further. >>>=20 >>> This is very similar to C where you call the function and take care = of >>> the return code and decide what to do next. >>=20 >> Usually in C we return an integer or even better an enum. >>=20 >>> So in this case I have to disagree with you and strongly recommend = to >>> keep the error processing at it is. >>>=20 >>> Best regards, >>>=20 >>> -Stefan >>>=20 >>>>=20 >>>>> + >>>>> + # Check if we ran out of download re-tries. >>>>> + } elsif ($dl_attempt eq $max_dl_attempts) { >>>>> + # Obtain error. >>>>> + my $error =3D $response->content; >>>>> + >>>>> + # Remove temporary file, if one exists. >>>>> + unlink("$tmpfile") if (-e "$tmpfile"); >>>>> + >>>>> + # Return the error message from response.. >>>>> + return "$error"; >>>>> + } >>>>> + >>>>> + # Remove temporary file, if one exists. >>>>> + unlink("$tmpfile") if (-e "$tmpfile"); >>>>> + >>>>> + # Increase download attempt counter. >>>>> + $dl_attempt++; >>>>> + } >>>>> + >>>>> + # Obtain the connection headers. >>>>> + my $headers =3D $response->headers; >>>>> + >>>>> + # Check if an Etag file has been provided. >>>>> + if ($etags_file) { >>>>> + # Grab the Etag from the response if the server provides one. >>>>> + if ($response->header('Etag')) { >>>>> + # Add the provided Etag to the hash of tags. >>>>> + $etags{$etagprefix} =3D $response->header('Etag'); >>>>> + >>>>> + # Write the etags file. >>>>> + &writehash($etags_file, \%etags); >>>>> + } >>>>> + } >>>>> + >>>>> + # Check if the response should be stored on disk. >>>>> + if ($file) { >>>>> + # Get the remote size of the content. >>>>> + my $remote_size =3D $response->header('Content-Length'); >>>>> + >>>>> + # Perform a stat on the temporary file. >>>>> + my $stat =3D stat($tmpfile); >>>>> + >>>>> + # Grab the size of the stored temporary file. >>>>> + my $local_size =3D $stat->size; >>>>> + >>>>> + # Check if both sizes are equal. >>>>> + if(($remote_size) && ($remote_size ne $local_size)) { >>>>> + # Delete the temporary file. >>>>> + unlink("$tmpfile"); >>>>> + >>>>> + # Abort and return "incomplete download" as error. >>>>> + return "incomplete download"; >>>>> + } >>>>> + >>>>> + # Move the temporaray file to the desired file by overwriting a >>>>> may >>>>> + # existing one. >>>>> + move("$tmpfile", "$file"); >>>>> + >>>>> + # Omit the timestamp from response header, when the file has = been >>>>> modified the >>>>> + # last time. >>>>> + my $last_modified =3D $headers->last_modified; >>>>> + >>>>> + # Check if we got a last-modified value from the server. >>>>> + if ($last_modified) { >>>>> + # Assign the last-modified timestamp as mtime to the >>>>> + # stored file. >>>>> + utime(time(), "$last_modified", "$file"); >>>>> + } >>>>> + >>>>> + # Delete temporary file. >>>>> + unlink("$tmpfile"); >>>>> + >>>>> + # If we got here, everything worked fine. Return nothing. >>>>> + return; >>>>> + } else { >>>>> + # Decode the response content and return it. >>>>> + return $response->decoded_content; >>>>> + } >>>>> +} >>>>> + >>>>> # Cloud Stuff >>>>>=20 >>>>> sub running_in_cloud() { >>>>> --=20 >>>>> 2.47.2 >=20