From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail02.haj.ipfire.org (localhost [127.0.0.1]) by mail02.haj.ipfire.org (Postfix) with ESMTP id 4ZLpW71kFHz2y97 for ; Mon, 24 Mar 2025 10:02:11 +0000 (UTC) Received: from mail01.ipfire.org (mail01.haj.ipfire.org [172.28.1.202]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384 client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "mail01.haj.ipfire.org", Issuer "R10" (verified OK)) by mail02.haj.ipfire.org (Postfix) with ESMTPS id 4ZLpW34Mhnz2xS5 for ; Mon, 24 Mar 2025 10:02:07 +0000 (UTC) Received: from [127.0.0.1] (localhost [127.0.0.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail01.ipfire.org (Postfix) with ESMTPSA id 4ZLpW223pLz97; Mon, 24 Mar 2025 10:02:06 +0000 (UTC) DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=ipfire.org; s=202003ed25519; t=1742810526; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=rQ/ecMkuC+b9Dv2UKZWxojwzMs04i9sNpDIa68M06hw=; b=2vb9X8pAUk73l+qOvu+ENWt+Q8/jw9e7vlRqzYmfCBh52tytzqVmdeTq5PcFwREZV/XRh/ EkNblxAxdEtZRXCA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ipfire.org; s=202003rsa; t=1742810526; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=rQ/ecMkuC+b9Dv2UKZWxojwzMs04i9sNpDIa68M06hw=; b=GPF3kHUWpeOIQK5IYqqcmsTC3pyL56sqAfajXxb6oIWhqufOL3Zkwv90cteYahLpDyS+8S KZAWOw3T4fTfhcvBWqe2edMPRoWFwSeaVZu3k72oT4nt/ATDs6er3jo8wSlTH1wTiIqdoR yz3a7YJwxbHIzDnph/iWCaX1AN2xfA6fNtAkmKAQOTIU/KRaWtM1lOihOpuCkQN50HxmZN pkhE/jUyw5W8tiuYaxeAmlPOubOVfImnnM0DviIWzXoeS8ggecCd7TUwdLjYOmKGVJ8nio E9Ei9UJZYspgSnqQtvgCy+upAFOHBcPVDwNbwpMHnZdO552QiZOpbJt6Fgh/YQ== Content-Type: text/plain; charset=utf-8 Precedence: list List-Id: List-Subscribe: , List-Unsubscribe: , List-Post: List-Help: Sender: Mail-Followup-To: Mime-Version: 1.0 Subject: Re: [PATCH 1/3] general-functions.pl: Add LWP-based flexible downloader function. From: Michael Tremer In-Reply-To: Date: Mon, 24 Mar 2025 10:02:05 +0000 Cc: development@lists.ipfire.org Content-Transfer-Encoding: quoted-printable Message-Id: References: <20250322145724.4593-1-stefan.schantl@ipfire.org> <31565D56-B0C2-4BA0-BF75-7902C376B35E@ipfire.org> To: Stefan Schantl Hello, > On 23 Mar 2025, at 19:00, Stefan Schantl = wrote: >=20 > Hello Michael, >=20 > thanks for the review and your detailed feedback. >=20 > You are right this patch mainly moves the downloader logic from the > ipblocklist feature into an own easy to use general function. >=20 >> Morning Stefan, >>=20 >> Thank you for taking the time and getting this out of your git >> repository and onto this list :) >>=20 >> As we are already shipping this code and this patch only moves it >> around, there is probably not much standing in the way of seeing this >> being merged. I have however a couple of questions that we might want >> to address in a follow-up patch. >>=20 >>> On 22 Mar 2025, at 14:57, Stefan Schantl >>> wrote: >>>=20 >>> This function can be used to grab content and/or store it into >>> files. >>>=20 >>> Signed-off-by: Stefan Schantl >>> --- >>> config/cfgroot/general-functions.pl | 262 >>> ++++++++++++++++++++++++++++ >>> 1 file changed, 262 insertions(+) >>>=20 >>> diff --git a/config/cfgroot/general-functions.pl >>> b/config/cfgroot/general-functions.pl >>> index 8ba6e3f79..cb8df69c6 100644 >>> --- a/config/cfgroot/general-functions.pl >>> +++ b/config/cfgroot/general-functions.pl >>> @@ -20,6 +20,23 @@ use IO::Socket; >>> use Net::SSLeay; >>> use Net::IPv4Addr qw(:all); >>>=20 >>> +# Load module to move files. >>> +use File::Copy; >>> + >>> +# Load module to get file stats. >>> +use File::stat; >>> + >>> +# Load module to deal with temporary files. >>> +use File::Temp; >>> + >>> +# Load module to deal with the date formats used by the HTTP >>> protocol. >>> +use HTTP::Date; >>> + >>> +# Load the libwwwperl User Agent module. >>> +use LWP::UserAgent; >>=20 >> Recently (let=E2=80=99s say a year ago) I have spent some time on = making the >> web UI faster. Especially on some hardware, is it not very snappy. >> The reason for the slowness however was not that we are doing much >> stuff in our code; it was rather that loading modules was taking a >> long time in Perl. So I removed what was no longer needed and moved >> some other stuff around which slightly helped (saved around 100ms on >> the older IPFire Mini Appliance). >>=20 >> Since the downloading functionality is not used *that* much here. >> Should this maybe not live in a separate place and we only include it >> when it is needed? >=20 > Sounds good, so I'm totally fine with this suggestion. This can be = part > of a following patch(set). Well, moving it should not be a separate patch set, because we should = not move it twice. But any of the other fixes should be implemented in a separate patch = each. >>=20 >>> + >>> +$|=3D1; # line buffering >>=20 >> Why is this needed? >=20 > This line does not belong to the patch and is already part of the > general-functions.pl file. >=20 > The following can be found in the perl special variables manual about > $|:=20 >=20 > $| - If set to nonzero, forces an fflush(3) after every write or print > on the currently selected output channel. >=20 > I'm not sure, if this line is required in general. I know what it does. I just wanted to know why this has been enabled = here. >=20 >>=20 >>> + >>> $General::version =3D 'VERSION'; >>> $General::swroot =3D 'CONFIG_ROOT'; >>> $General::noipprefix =3D 'noipg-'; >>> @@ -1346,6 +1363,251 @@ sub generateProtoTransHash () { >>> return %protocols; >>> } >>>=20 >>> +# Function to grab a given URL content or to download and store it >>> on disk. >>> +# >>> +# The function requires a configuration hash to be passed. >>> +# >>> +# The following options (hash keys) are supported: >>> +# >>> +# URL -> The URL to the content or file. REQUIRED! >>> +# FILE -> The filename as fullpath where the content/file should >>> be stored on disk. >>> +# ETAGSFILE -> A filename again as fullpath where Etags should be >>> stored and read. >>> +# ETAGPREFIX -> In case a custom etag name should be used, >>> otherwise it defaults to the given URL. >>> +# MAXSIZE -> In bytes until the downloader will abort downloading. >>> (example: 10_485_760 for 10MB) >>> +# >>> +# If a file is given an If-Modified-Since header will be generated >>> from the last modified timestamp >>> +# of an already stored file. In case an Etag file is specified an >>> If-None-Match header will be added to >>> +# the request - Both can be used at the same time. >>> +# >>> +# In case no FILE option has been passed to the function, the >>> content of the requested URL will be returned. >>> +# >>> +# Return codes (if FILE is used): >>> +# >>> +# nothing - On success >>> +# no url - If no URL has been specified. >>> +# not_modified - In case the servers responds with "Not modified" >>> (304) >>> +# dl_error - If the requested URL cannot be accessed. >>> +# incomplete download - In case the size of the local file does >>> not match the remote content_lenght. >>> +# >>> +sub downloader (%) { >>> + my (%args) =3D @_; >>> + >>> + # Remap args hash and convert all keys into upper case format. >>> + %args =3D map { uc $_ =3D> $args{$_} } keys %args; >>> + >>> + # The amount of download attempts before giving up and >>> + # logging an error. >>> + my $max_dl_attempts =3D 3; >>> + >>> + # Temporary directory to download the files. >>> + my $tmp_dl_directory =3D "/var/tmp"; >>> + >>> + # Assign hash values. >>> + my $url =3D $args{"URL"} if (exists($args{"URL"})); >>> + my $file =3D $args{"FILE"} if (exists($args{"FILE"})); >>> + my $etags_file =3D $args{"ETAGSFILE"} if >>> (exists($args{"ETAGSFILE"})); >>> + my $etagprefix =3D $url; >>> + $etagprefix =3D $args{"ETAGPREFIX"} if >>> (exists($args{"ETAGPREFIX"})); >>> + my $max_size =3D $args{"MAXSIZE"} if (exists($args{"MAXSIZE"})); >>=20 >> I consider this really ugly, but I suppose this is what Perl wants. >>=20 >>> + # Abort with error "no url", if no URL has been given. >>> + return "no url" unless ($url); >>=20 >> Could we rather die here with an error? Just returning a string here >> seems to be too easy to ignore in case something above went wrong. >=20 > Technically you are right, but let me explain the reason for this kind > of error handling a bit further down. >=20 >>=20 >>> + >>> + my %etags =3D (); >>> + my $tmpfile; >>> + >>> + # Read-in proxysettings. >>> + my %proxysettings=3D(); >>> + &readhash("${General::swroot}/proxy/settings", \%proxysettings); >>> + >>> + # Create a user agent instance. >>> + # >>> + # Request SSL hostname verification and specify path >>> + # to the CA file. >>> + my $ua =3D LWP::UserAgent->new( >>> + ssl_opts =3D> { >>> + SSL_ca_file =3D> '/etc/ssl/cert.pem', >>> + verify_hostname =3D> 1, >>> + }, >>> + ); >>> + >>> + # Set timeout to 10 seconds. >>> + $ua->timeout(10); >>=20 >> A 10 second timeout also sounds rather optimistic considering some >> slower, overloaded mirrors or those that are quite far away. For any >> tasks like downloading data in the background we could afford longer. >=20 > Accepted, what would be a good default value here? 60 seconds? I think it makes sense to make this optionally configurable = so that for the IP check that has been changed here, we will only wait = around 10 seconds as it is a simple API request. >=20 >>=20 >>> + >>> + # Assign maximum download size if set. >>> + $ua->max_size($max_size) if ($max_size); >>> + >>> + # Generate UserAgent. >>> + my $agent =3D &MakeUserAgent(); >>> + >>> + # Set the generated UserAgent. >>> + $ua->agent($agent); >>> + >>> + # Check if an upstream proxy is configured. >>> + if ($proxysettings{'UPSTREAM_PROXY'}) { >>> + my $proxy_url; >>> + >>> + $proxy_url =3D "http://"; >>> + >>> + # Check if the proxy requires authentication. >>> + if (($proxysettings{'UPSTREAM_USER'}) && >>> ($proxysettings{'UPSTREAM_PASSWORD'})) { >>> + $proxy_url .=3D >>> "$proxysettings{'UPSTREAM_USER'}\:$proxysettings{'UPSTREAM_PASSWORD >>> '}\@"; >>> + } >>> + >>> + # Add proxy server address and port. >>> + $proxy_url .=3D $proxysettings{'UPSTREAM_PROXY'}; >>> + >>> + # Append proxy settings. >>> + $ua->proxy(['http', 'https'], $proxy_url); >>> + } >>> + >>> + # Create a HTTP request element and pass the given URL to it. >>> + my $request =3D HTTP::Request->new(GET =3D> $url); >>> + >>> + # Check if a file to store the output has been provided. >>> + if ($file) { >>> + # Check if the given file already exits, because it has been >>> downloaded in the past. >>> + # >>> + # In this case we are requesting the server if the remote file >>> has been changed or not. >>> + # This will be done by sending the modification time in a special >>> HTTP header. >>> + if (-f $file) { >>> + # Call stat on the file. >>> + my $stat =3D stat($file); >>> + >>> + # Omit the mtime of the existing file. >>> + my $mtime =3D $stat->mtime; >>> + >>> + # Convert the timestamp into right format. >>> + my $http_date =3D time2str($mtime); >>> + >>> + # Add the If-Modified-Since header to the request to ask the >>> server if the >>> + # file has been modified. >>> + $request->header( 'If-Modified-Since' =3D> "$http_date" ); >>> + } >>> + >>> + # Generate temporary file name, located in the tempoary download >>> directory and with a suffix of ".tmp". >>> + # The downloaded file will be stored there until some sanity >>> checks are performed. >>> + my $tmp =3D File::Temp->new( SUFFIX =3D> ".tmp", DIR =3D> >>> "$tmp_dl_directory/", UNLINK =3D> 0 ); >>> + $tmpfile =3D $tmp->filename(); >>> + } >>> + >>> + # Check if an file for etags has been given. >>> + if ($etags_file) { >>> + # Read-in Etags file for known Etags if the file is present. >>> + &readhash("$etags_file", \%etags) if (-f $etags_file); >>> + >>> + # Check if an Etag for the current provider is stored. >>> + if ($etags{$etagprefix}) { >>> + # Grab the stored tag. >>> + my $etag =3D $etags{$etagprefix}; >>> + >>> + # Add an "If-None-Match header to the request to ask the server >>> if the >>> + # file has been modified. >>> + $request->header( 'If-None-Match' =3D> $etag ); >>> + } >>> + } >>> + >>> + my $dl_attempt =3D 1; >>> + my $response; >>> + >>> + # Download and retry on failure. >>> + while ($dl_attempt <=3D $max_dl_attempts) { >>> + # Perform the request and save the output into the tmpfile if >>> requested. >>> + $response =3D $ua->request($request, $tmpfile); >>> + >>> + # Check if the download was successfull. >>> + if($response->is_success) { >>> + # Break loop. >>> + last; >>> + >>> + # Check if the server responds with 304 (Not Modified). >>> + } elsif ($response->code =3D=3D 304) { >>> + # Remove temporary file, if one exists. >>> + unlink("$tmpfile") if (-e "$tmpfile"); >>> + >>> + # Return "not modified". >>> + return "not modified"; >>=20 >> Same here. It feels like strings are not the most ideal return code. >=20 > As promised up, I'm going to explain why I decided to return various > strings in case of different errors. Calling die() immediately > terminates the perl (main) script. This is correct, and I never really like this behaviour. However, Perl = does not know exceptions and so we cannot gracefully throw an error. But, not passing a URL is a programming error and that should get some = attention. The =E2=80=9Cnot modified=E2=80=9D case should of course not call die. > So for example when processing a list of downloads and such an > event/exception/error appeared in my opinion the main script, which > called the function should decide how to go further. >=20 > This is very similar to C where you call the function and take care of > the return code and decide what to do next. Usually in C we return an integer or even better an enum. > So in this case I have to disagree with you and strongly recommend to > keep the error processing at it is. >=20 > Best regards, >=20 > -Stefan >=20 >>=20 >>> + >>> + # Check if we ran out of download re-tries. >>> + } elsif ($dl_attempt eq $max_dl_attempts) { >>> + # Obtain error. >>> + my $error =3D $response->content; >>> + >>> + # Remove temporary file, if one exists. >>> + unlink("$tmpfile") if (-e "$tmpfile"); >>> + >>> + # Return the error message from response.. >>> + return "$error"; >>> + } >>> + >>> + # Remove temporary file, if one exists. >>> + unlink("$tmpfile") if (-e "$tmpfile"); >>> + >>> + # Increase download attempt counter. >>> + $dl_attempt++; >>> + } >>> + >>> + # Obtain the connection headers. >>> + my $headers =3D $response->headers; >>> + >>> + # Check if an Etag file has been provided. >>> + if ($etags_file) { >>> + # Grab the Etag from the response if the server provides one. >>> + if ($response->header('Etag')) { >>> + # Add the provided Etag to the hash of tags. >>> + $etags{$etagprefix} =3D $response->header('Etag'); >>> + >>> + # Write the etags file. >>> + &writehash($etags_file, \%etags); >>> + } >>> + } >>> + >>> + # Check if the response should be stored on disk. >>> + if ($file) { >>> + # Get the remote size of the content. >>> + my $remote_size =3D $response->header('Content-Length'); >>> + >>> + # Perform a stat on the temporary file. >>> + my $stat =3D stat($tmpfile); >>> + >>> + # Grab the size of the stored temporary file. >>> + my $local_size =3D $stat->size; >>> + >>> + # Check if both sizes are equal. >>> + if(($remote_size) && ($remote_size ne $local_size)) { >>> + # Delete the temporary file. >>> + unlink("$tmpfile"); >>> + >>> + # Abort and return "incomplete download" as error. >>> + return "incomplete download"; >>> + } >>> + >>> + # Move the temporaray file to the desired file by overwriting a >>> may >>> + # existing one. >>> + move("$tmpfile", "$file"); >>> + >>> + # Omit the timestamp from response header, when the file has been >>> modified the >>> + # last time. >>> + my $last_modified =3D $headers->last_modified; >>> + >>> + # Check if we got a last-modified value from the server. >>> + if ($last_modified) { >>> + # Assign the last-modified timestamp as mtime to the >>> + # stored file. >>> + utime(time(), "$last_modified", "$file"); >>> + } >>> + >>> + # Delete temporary file. >>> + unlink("$tmpfile"); >>> + >>> + # If we got here, everything worked fine. Return nothing. >>> + return; >>> + } else { >>> + # Decode the response content and return it. >>> + return $response->decoded_content; >>> + } >>> +} >>> + >>> # Cloud Stuff >>>=20 >>> sub running_in_cloud() { >>> --=20 >>> 2.47.2