From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail02.haj.ipfire.org (localhost [127.0.0.1]) by mail02.haj.ipfire.org (Postfix) with ESMTP id 4ZLFLy0kSMz331L for ; Sun, 23 Mar 2025 12:08:10 +0000 (UTC) Received: from mail01.ipfire.org (mail01.haj.ipfire.org [172.28.1.202]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) client-signature RSA-PSS (4096 bits)) (Client CN "mail01.haj.ipfire.org", Issuer "R10" (verified OK)) by mail02.haj.ipfire.org (Postfix) with ESMTPS id 4ZLFLt3YJQz32vy for ; Sun, 23 Mar 2025 12:08:06 +0000 (UTC) Received: from [127.0.0.1] (localhost [127.0.0.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail01.ipfire.org (Postfix) with ESMTPSA id 4ZLFLt0GBmz8xW; Sun, 23 Mar 2025 12:08:05 +0000 (UTC) DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=ipfire.org; s=202003ed25519; t=1742731686; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=7Gz7BV4NraklEAx+p2dttL3f+LdLBHDxlBbks1DlD6Q=; b=ffwYT36grPia4hUkxZCv4NuFVme/H1qtwN/4T5VluxrI5iHYf01DSJmBup26iE+3XMScmV HUlQaboRv/h8YuDA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ipfire.org; s=202003rsa; t=1742731686; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=7Gz7BV4NraklEAx+p2dttL3f+LdLBHDxlBbks1DlD6Q=; b=XzwWiW/LecC8Id7HMisXnr1A8UZLMnoMa9oWGyUHf0Htw1a644H3kJaDVK3U3agjk1Taip +5O02QmB3cMDGjrU2gy2lm+C3j6C2vy8qdNnBlv1Vx7y7s6UrduwYCyTDmi32tZQUFJGg2 LhHjoAM8AtLS1jP0dU+vnH4fi0qrGtsdd9sms48C0lQKQl9t7uAsVlRAa0JXA07L7FUnw8 uwlSh7QPpR87rfizHGPMtJGdOKIm4UZzsCc+ShbCtzePZ2J6Jv80do94MPwV34MPitHiok 1WBmurX5Uz/Jtq0M9jNSd3Ru3eL2OtudstEvLXIjcA/TfQ2dhY0NeZD0cTya5Q== Content-Type: text/plain; charset=utf-8 Precedence: list List-Id: List-Subscribe: , List-Unsubscribe: , List-Post: List-Help: Sender: Mail-Followup-To: Mime-Version: 1.0 Subject: Re: [PATCH 1/3] general-functions.pl: Add LWP-based flexible downloader function. From: Michael Tremer In-Reply-To: <20250322145724.4593-1-stefan.schantl@ipfire.org> Date: Sun, 23 Mar 2025 12:08:05 +0000 Cc: development@lists.ipfire.org Content-Transfer-Encoding: quoted-printable Message-Id: <31565D56-B0C2-4BA0-BF75-7902C376B35E@ipfire.org> References: <20250322145724.4593-1-stefan.schantl@ipfire.org> To: Stefan Schantl Morning Stefan, Thank you for taking the time and getting this out of your git = repository and onto this list :) As we are already shipping this code and this patch only moves it = around, there is probably not much standing in the way of seeing this = being merged. I have however a couple of questions that we might want to = address in a follow-up patch. > On 22 Mar 2025, at 14:57, Stefan Schantl = wrote: >=20 > This function can be used to grab content and/or store it into files. >=20 > Signed-off-by: Stefan Schantl > --- > config/cfgroot/general-functions.pl | 262 ++++++++++++++++++++++++++++ > 1 file changed, 262 insertions(+) >=20 > diff --git a/config/cfgroot/general-functions.pl = b/config/cfgroot/general-functions.pl > index 8ba6e3f79..cb8df69c6 100644 > --- a/config/cfgroot/general-functions.pl > +++ b/config/cfgroot/general-functions.pl > @@ -20,6 +20,23 @@ use IO::Socket; > use Net::SSLeay; > use Net::IPv4Addr qw(:all); >=20 > +# Load module to move files. > +use File::Copy; > + > +# Load module to get file stats. > +use File::stat; > + > +# Load module to deal with temporary files. > +use File::Temp; > + > +# Load module to deal with the date formats used by the HTTP = protocol. > +use HTTP::Date; > + > +# Load the libwwwperl User Agent module. > +use LWP::UserAgent; Recently (let=E2=80=99s say a year ago) I have spent some time on making = the web UI faster. Especially on some hardware, is it not very snappy. = The reason for the slowness however was not that we are doing much stuff = in our code; it was rather that loading modules was taking a long time = in Perl. So I removed what was no longer needed and moved some other = stuff around which slightly helped (saved around 100ms on the older = IPFire Mini Appliance). Since the downloading functionality is not used *that* much here. Should = this maybe not live in a separate place and we only include it when it = is needed? > + > +$|=3D1; # line buffering Why is this needed? > + > $General::version =3D 'VERSION'; > $General::swroot =3D 'CONFIG_ROOT'; > $General::noipprefix =3D 'noipg-'; > @@ -1346,6 +1363,251 @@ sub generateProtoTransHash () { > return %protocols; > } >=20 > +# Function to grab a given URL content or to download and store it on = disk. > +# > +# The function requires a configuration hash to be passed. > +# > +# The following options (hash keys) are supported: > +# > +# URL -> The URL to the content or file. REQUIRED! > +# FILE -> The filename as fullpath where the content/file should be = stored on disk. > +# ETAGSFILE -> A filename again as fullpath where Etags should be = stored and read. > +# ETAGPREFIX -> In case a custom etag name should be used, otherwise = it defaults to the given URL. > +# MAXSIZE -> In bytes until the downloader will abort downloading. = (example: 10_485_760 for 10MB) > +# > +# If a file is given an If-Modified-Since header will be generated = from the last modified timestamp > +# of an already stored file. In case an Etag file is specified an = If-None-Match header will be added to > +# the request - Both can be used at the same time. > +# > +# In case no FILE option has been passed to the function, the content = of the requested URL will be returned. > +# > +# Return codes (if FILE is used): > +# > +# nothing - On success > +# no url - If no URL has been specified. > +# not_modified - In case the servers responds with "Not modified" = (304) > +# dl_error - If the requested URL cannot be accessed. > +# incomplete download - In case the size of the local file does not = match the remote content_lenght. > +# > +sub downloader (%) { > + my (%args) =3D @_; > + > + # Remap args hash and convert all keys into upper case format. > + %args =3D map { uc $_ =3D> $args{$_} } keys %args; > + > + # The amount of download attempts before giving up and > + # logging an error. > + my $max_dl_attempts =3D 3; > + > + # Temporary directory to download the files. > + my $tmp_dl_directory =3D "/var/tmp"; > + > + # Assign hash values. > + my $url =3D $args{"URL"} if (exists($args{"URL"})); > + my $file =3D $args{"FILE"} if (exists($args{"FILE"})); > + my $etags_file =3D $args{"ETAGSFILE"} if = (exists($args{"ETAGSFILE"})); > + my $etagprefix =3D $url; > + $etagprefix =3D $args{"ETAGPREFIX"} if = (exists($args{"ETAGPREFIX"})); > + my $max_size =3D $args{"MAXSIZE"} if (exists($args{"MAXSIZE"})); I consider this really ugly, but I suppose this is what Perl wants. > + # Abort with error "no url", if no URL has been given. > + return "no url" unless ($url); Could we rather die here with an error? Just returning a string here = seems to be too easy to ignore in case something above went wrong. > + > + my %etags =3D (); > + my $tmpfile; > + > + # Read-in proxysettings. > + my %proxysettings=3D(); > + &readhash("${General::swroot}/proxy/settings", \%proxysettings); > + > + # Create a user agent instance. > + # > + # Request SSL hostname verification and specify path > + # to the CA file. > + my $ua =3D LWP::UserAgent->new( > + ssl_opts =3D> { > + SSL_ca_file =3D> '/etc/ssl/cert.pem', > + verify_hostname =3D> 1, > + }, > + ); > + > + # Set timeout to 10 seconds. > + $ua->timeout(10); A 10 second timeout also sounds rather optimistic considering some = slower, overloaded mirrors or those that are quite far away. For any = tasks like downloading data in the background we could afford longer. > + > + # Assign maximum download size if set. > + $ua->max_size($max_size) if ($max_size); > + > + # Generate UserAgent. > + my $agent =3D &MakeUserAgent(); > + > + # Set the generated UserAgent. > + $ua->agent($agent); > + > + # Check if an upstream proxy is configured. > + if ($proxysettings{'UPSTREAM_PROXY'}) { > + my $proxy_url; > + > + $proxy_url =3D "http://"; > + > + # Check if the proxy requires authentication. > + if (($proxysettings{'UPSTREAM_USER'}) && = ($proxysettings{'UPSTREAM_PASSWORD'})) { > + $proxy_url .=3D = "$proxysettings{'UPSTREAM_USER'}\:$proxysettings{'UPSTREAM_PASSWORD'}\@"; > + } > + > + # Add proxy server address and port. > + $proxy_url .=3D $proxysettings{'UPSTREAM_PROXY'}; > + > + # Append proxy settings. > + $ua->proxy(['http', 'https'], $proxy_url); > + } > + > + # Create a HTTP request element and pass the given URL to it. > + my $request =3D HTTP::Request->new(GET =3D> $url); > + > + # Check if a file to store the output has been provided. > + if ($file) { > + # Check if the given file already exits, because it has been = downloaded in the past. > + # > + # In this case we are requesting the server if the remote file has = been changed or not. > + # This will be done by sending the modification time in a special = HTTP header. > + if (-f $file) { > + # Call stat on the file. > + my $stat =3D stat($file); > + > + # Omit the mtime of the existing file. > + my $mtime =3D $stat->mtime; > + > + # Convert the timestamp into right format. > + my $http_date =3D time2str($mtime); > + > + # Add the If-Modified-Since header to the request to ask the server = if the > + # file has been modified. > + $request->header( 'If-Modified-Since' =3D> "$http_date" ); > + } > + > + # Generate temporary file name, located in the tempoary download = directory and with a suffix of ".tmp". > + # The downloaded file will be stored there until some sanity checks = are performed. > + my $tmp =3D File::Temp->new( SUFFIX =3D> ".tmp", DIR =3D> = "$tmp_dl_directory/", UNLINK =3D> 0 ); > + $tmpfile =3D $tmp->filename(); > + } > + > + # Check if an file for etags has been given. > + if ($etags_file) { > + # Read-in Etags file for known Etags if the file is present. > + &readhash("$etags_file", \%etags) if (-f $etags_file); > + > + # Check if an Etag for the current provider is stored. > + if ($etags{$etagprefix}) { > + # Grab the stored tag. > + my $etag =3D $etags{$etagprefix}; > + > + # Add an "If-None-Match header to the request to ask the server if = the > + # file has been modified. > + $request->header( 'If-None-Match' =3D> $etag ); > + } > + } > + > + my $dl_attempt =3D 1; > + my $response; > + > + # Download and retry on failure. > + while ($dl_attempt <=3D $max_dl_attempts) { > + # Perform the request and save the output into the tmpfile if = requested. > + $response =3D $ua->request($request, $tmpfile); > + > + # Check if the download was successfull. > + if($response->is_success) { > + # Break loop. > + last; > + > + # Check if the server responds with 304 (Not Modified). > + } elsif ($response->code =3D=3D 304) { > + # Remove temporary file, if one exists. > + unlink("$tmpfile") if (-e "$tmpfile"); > + > + # Return "not modified". > + return "not modified"; Same here. It feels like strings are not the most ideal return code. > + > + # Check if we ran out of download re-tries. > + } elsif ($dl_attempt eq $max_dl_attempts) { > + # Obtain error. > + my $error =3D $response->content; > + > + # Remove temporary file, if one exists. > + unlink("$tmpfile") if (-e "$tmpfile"); > + > + # Return the error message from response.. > + return "$error"; > + } > + > + # Remove temporary file, if one exists. > + unlink("$tmpfile") if (-e "$tmpfile"); > + > + # Increase download attempt counter. > + $dl_attempt++; > + } > + > + # Obtain the connection headers. > + my $headers =3D $response->headers; > + > + # Check if an Etag file has been provided. > + if ($etags_file) { > + # Grab the Etag from the response if the server provides one. > + if ($response->header('Etag')) { > + # Add the provided Etag to the hash of tags. > + $etags{$etagprefix} =3D $response->header('Etag'); > + > + # Write the etags file. > + &writehash($etags_file, \%etags); > + } > + } > + > + # Check if the response should be stored on disk. > + if ($file) { > + # Get the remote size of the content. > + my $remote_size =3D $response->header('Content-Length'); > + > + # Perform a stat on the temporary file. > + my $stat =3D stat($tmpfile); > + > + # Grab the size of the stored temporary file. > + my $local_size =3D $stat->size; > + > + # Check if both sizes are equal. > + if(($remote_size) && ($remote_size ne $local_size)) { > + # Delete the temporary file. > + unlink("$tmpfile"); > + > + # Abort and return "incomplete download" as error. > + return "incomplete download"; > + } > + > + # Move the temporaray file to the desired file by overwriting a may > + # existing one. > + move("$tmpfile", "$file"); > + > + # Omit the timestamp from response header, when the file has been = modified the > + # last time. > + my $last_modified =3D $headers->last_modified; > + > + # Check if we got a last-modified value from the server. > + if ($last_modified) { > + # Assign the last-modified timestamp as mtime to the > + # stored file. > + utime(time(), "$last_modified", "$file"); > + } > + > + # Delete temporary file. > + unlink("$tmpfile"); > + > + # If we got here, everything worked fine. Return nothing. > + return; > + } else { > + # Decode the response content and return it. > + return $response->decoded_content; > + } > +} > + > # Cloud Stuff >=20 > sub running_in_cloud() { > --=20 > 2.47.2 >=20 >=20