public inbox for development@lists.ipfire.org
 help / color / mirror / Atom feed
From: Michael Tremer <michael.tremer@ipfire.org>
To: development@lists.ipfire.org
Subject: Re: [PATCH 4/5] ids-functions.pl: Use If-Modified-Since header to reduce file downloads.
Date: Mon, 28 Mar 2022 16:15:49 +0100	[thread overview]
Message-ID: <026F49A8-7D3B-4374-855A-740B75209BD1@ipfire.org> (raw)
In-Reply-To: <1576548dfa8d2c950e321c766dc85b8967eeb20f.camel@ipfire.org>

[-- Attachment #1: Type: text/plain, Size: 6815 bytes --]

Hello,

> On 24 Mar 2022, at 18:50, Stefan Schantl <stefan.schantl(a)ipfire.org> wrote:
> 
> Hello Michael,
> 
> thanks for your thoughts and feedback.
> 
>> Hello,
>> 
>> I like this mechanism. It is quite powerful since it will save a lot
>> of unnecessary downloads and CPU cycles. However, I am not sure if
>> all providers will support this because some use crappy CDNs which do
>> not give anything about conforming to any RFC.
> 
> Currently I'm only aware that content which is hosted on github.com
> cannot be checked with this header flag. They are using Etags which
> also can be used for cache management, but works in a completely
> different way.

Yes, but we can still use them.

You can just store the Etag and send it with your next request.

> There are plenty more providers, some which requires a subscription
> which I do not own - so they currently remains untested. 
> 
>> 
>>> On 23 Mar 2022, at 04:04, Stefan Schantl
>>> <stefan.schantl(a)ipfire.org> wrote:
>>> 
>>> When using the "If-Modified-Since" header, the server can be
>>> requested
>>> if a modified version of the file can be served.
>>> 
>>> In case that is true, the file will be sent and stored by the
>>> downloader
>>> function. If the file has not been touched since the last time, the
>>> server will respond with the code "304" (Not modified).
>>> 
>>> This tells us, that the current stored file is the latest one
>>> (still up-to-date)
>>> and we safely can skip the download attempt for this provider.
>>> 
>>> Signed-off-by: Stefan Schantl <stefan.schantl(a)ipfire.org>
>>> ---
>>> config/cfgroot/ids-functions.pl | 38
>>> ++++++++++++++++++++++++++++++---
>>> 1 file changed, 35 insertions(+), 3 deletions(-)
>>> 
>>> diff --git a/config/cfgroot/ids-functions.pl b/config/cfgroot/ids-
>>> functions.pl
>>> index dfbeb1a7d..d7df41dd1 100644
>>> --- a/config/cfgroot/ids-functions.pl
>>> +++ b/config/cfgroot/ids-functions.pl
>>> @@ -365,9 +365,25 @@ sub downloadruleset ($) {
>>> my $tmp = File::Temp->new( SUFFIX => ".tmp", DIR =>
>>> "/var/tmp/", UNLINK => 0 );
>>> my $tmpfile = $tmp->filename();
>>> 
>>> +  # Genarate and assign file name and path to store
>>> the downloaded rules file.
>>> +  my $dl_rulesfile = &_get_dl_rulesfile($provider);
>>> +
>>> +  # Load perl module to deal with file atributes.
>>> +  use File::stat;
>> 
>> I don’t like modules in the middle of a script. This makes testing a
>> new perl update more difficult. If I can load the script, I expect it
>> to have all dependencies it needs. This does not work when modules
>> are being loaded later on. I would say that we need to cover those
>> things by a module test then.
>> 
>> I would prefer reliability over performance (I assume that this your
>> argument why to load it only when you need it).
> 
> I'll move the load of the modules to the top of the script. This is
> also the way all other (library) scripts are doing this.

Thank you.

> This may will lead to some more resource consumption, but will help us
> to execute the script in a safe way - even when anything got updated.
> 
> Module tests generally are a good way - I'm thinking about implementing
> some at a later time (after some important changes are finished) for
> the "ids-functions.pl".
> 
>> 
>>> +
>>> +  # Get the mtime of the rulesfile if it exists.
>>> +  my $mtime = (stat($dl_rulesfile)->mtime) if (-f
>>> $dl_rulesfile);
>> 
>> I suggest making this a little bit simpler:
>> 
>> Create the request first, then stat the file (you don’t need to check
>> if it exists first because stat will just tell you).
>> 
>> Then add the header if you have a timestamp; if not, then don’t add a
>> useless header. This is a lot more straight forward :)
>> 
>>> +
>>> +  # Convert the mtime into gmtime format.
>>> +  my $gmtime = gmtime($mtime || 0);
>>> +
>>> # Pass the requested url to the downloader.
>>> my $request = HTTP::Request->new(GET => $url);
>>> 
>>> +  # Add the If-Modified-Since header to the request,
>>> containing the omited and converted
>>> +  # mtime of the downloaded rules file, if one is
>>> present.
>>> +  $request->header( 'If-Modified-Since' => "$gmtime"
>>> );
>> 
>> Will this survive any redirects? I know that some redirect to AWS S3
>> and you will need to make sure that the header is being sent when you
>> follow the redirect - assuming S3 supports this which I think they
>> would.
> 
> If the servers in between are not entirely broken or do silly stuff,
> they should.

This is in the control of the client. It should not matter what the server responds with.

>> 
>>> +
>>> my $dl_attempt = 1;
>>> my $response;
>>> 
>>> @@ -381,6 +397,14 @@ sub downloadruleset ($) {
>>> # Break loop.
>>> last;
>>> 
>>> +  # Check if the server responds with 304
>>> (Not Modified).
>>> +  } elsif ($response->code == 304) {
>>> +  # Log to syslog.
>>> +  &_log_to_syslog("Ruleset is up-to-
>>> date, no update required.");
>>> +
>>> +  # Nothing to be done, the ruleset
>>> is up-to-date.
>>> +  return;
>>> +
>>> # Check if we ran out of download re-tries.
>>> } elsif ($dl_attempt eq $max_dl_attempts) {
>>> # Obtain error.
>>> @@ -406,6 +430,10 @@ sub downloadruleset ($) {
>>> # Get the remote size of the downloaded file.
>>> my $remote_filesize = $headers->content_length;
>>> 
>>> +  # Get the timestamp from header, when the file has
>>> been modified the
>>> +  # last time.
>>> +  my $last_modified = $headers->last_modified;
>>> +
>>> # Load perl stat module.
>>> use File::stat;
>> 
>> You are loading the module again.
> 
> Thanks, this is an additional point why, loading all required modules
> on top are a good idea. You do not have to think about which modules
> are already loaded or may be reloaded etc.
>> 
>>> 
>>> @@ -428,9 +456,6 @@ sub downloadruleset ($) {
>>> return 1;
>>> }
>>> 
>>> -  # Genarate and assign file name and path to store
>>> the downloaded rules file.
>>> -  my $dl_rulesfile = &_get_dl_rulesfile($provider);
>>> -
>>> # Check if a file name could be obtained.
>>> unless ($dl_rulesfile) {
>>> # Log error message.
>>> @@ -449,6 +474,13 @@ sub downloadruleset ($) {
>>> # Overwrite the may existing rulefile or tarball
>>> with the downloaded one.
>>> move("$tmpfile", "$dl_rulesfile");
>>> 
>>> +  # Check if the server respond contained a
>>> last_modified value.
>>> +  if ($last_modified) {
>>> +  # Assign the last modified timestamp from
>>> server as mtime to the
>>> +  # rules file.
>>> +  utime(time(), "$last_modified",
>>> "$dl_rulesfile");
>>> +  }
>>> +
>>> # Delete temporary file.
>>> unlink("$tmpfile");
>>> 
>>> -- 
>>> 2.30.2
>>> 
>> 
>> -Michael
> 
> -Stefan


  reply	other threads:[~2022-03-28 15:15 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-23  4:04 [PATCH 1/5] ids-functions.pl: Drop downloader code for sourcefire based ruleset Stefan Schantl
2022-03-23  4:04 ` [PATCH 2/5] ids-functions.pl: Allow "5" download attempts for each provider before fail Stefan Schantl
2022-03-23  9:28   ` Michael Tremer
2022-03-24 18:23     ` Stefan Schantl
2022-03-28 15:16       ` Michael Tremer
2022-03-23  4:04 ` [PATCH 3/5] ids-functions.pl: Remove temporary file, if the download failed Stefan Schantl
2022-03-23  4:04 ` [PATCH 4/5] ids-functions.pl: Use If-Modified-Since header to reduce file downloads Stefan Schantl
2022-03-23  9:34   ` Michael Tremer
2022-03-24 18:50     ` Stefan Schantl
2022-03-28 15:15       ` Michael Tremer [this message]
2022-03-23  4:04 ` [PATCH 5/5] ids-functions.pl: Do not longer call any log message as "ERROR" Stefan Schantl
2022-03-23  9:37 ` [PATCH 1/5] ids-functions.pl: Drop downloader code for sourcefire based ruleset Michael Tremer
  -- strict thread matches above, loose matches on Subject: below --
2022-03-22 19:40 Stefan Schantl
2022-03-22 19:40 ` [PATCH 4/5] ids-functions.pl: Use If-Modified-Since header to reduce file downloads Stefan Schantl

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=026F49A8-7D3B-4374-855A-740B75209BD1@ipfire.org \
    --to=michael.tremer@ipfire.org \
    --cc=development@lists.ipfire.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox