From: Michael Tremer <michael.tremer@ipfire.org>
To: development@lists.ipfire.org
Subject: Re: [PATCH 4/5] ids-functions.pl: Use If-Modified-Since header to reduce file downloads.
Date: Mon, 28 Mar 2022 16:15:49 +0100 [thread overview]
Message-ID: <026F49A8-7D3B-4374-855A-740B75209BD1@ipfire.org> (raw)
In-Reply-To: <1576548dfa8d2c950e321c766dc85b8967eeb20f.camel@ipfire.org>
[-- Attachment #1: Type: text/plain, Size: 6815 bytes --]
Hello,
> On 24 Mar 2022, at 18:50, Stefan Schantl <stefan.schantl(a)ipfire.org> wrote:
>
> Hello Michael,
>
> thanks for your thoughts and feedback.
>
>> Hello,
>>
>> I like this mechanism. It is quite powerful since it will save a lot
>> of unnecessary downloads and CPU cycles. However, I am not sure if
>> all providers will support this because some use crappy CDNs which do
>> not give anything about conforming to any RFC.
>
> Currently I'm only aware that content which is hosted on github.com
> cannot be checked with this header flag. They are using Etags which
> also can be used for cache management, but works in a completely
> different way.
Yes, but we can still use them.
You can just store the Etag and send it with your next request.
> There are plenty more providers, some which requires a subscription
> which I do not own - so they currently remains untested.
>
>>
>>> On 23 Mar 2022, at 04:04, Stefan Schantl
>>> <stefan.schantl(a)ipfire.org> wrote:
>>>
>>> When using the "If-Modified-Since" header, the server can be
>>> requested
>>> if a modified version of the file can be served.
>>>
>>> In case that is true, the file will be sent and stored by the
>>> downloader
>>> function. If the file has not been touched since the last time, the
>>> server will respond with the code "304" (Not modified).
>>>
>>> This tells us, that the current stored file is the latest one
>>> (still up-to-date)
>>> and we safely can skip the download attempt for this provider.
>>>
>>> Signed-off-by: Stefan Schantl <stefan.schantl(a)ipfire.org>
>>> ---
>>> config/cfgroot/ids-functions.pl | 38
>>> ++++++++++++++++++++++++++++++---
>>> 1 file changed, 35 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/config/cfgroot/ids-functions.pl b/config/cfgroot/ids-
>>> functions.pl
>>> index dfbeb1a7d..d7df41dd1 100644
>>> --- a/config/cfgroot/ids-functions.pl
>>> +++ b/config/cfgroot/ids-functions.pl
>>> @@ -365,9 +365,25 @@ sub downloadruleset ($) {
>>> my $tmp = File::Temp->new( SUFFIX => ".tmp", DIR =>
>>> "/var/tmp/", UNLINK => 0 );
>>> my $tmpfile = $tmp->filename();
>>>
>>> + # Genarate and assign file name and path to store
>>> the downloaded rules file.
>>> + my $dl_rulesfile = &_get_dl_rulesfile($provider);
>>> +
>>> + # Load perl module to deal with file atributes.
>>> + use File::stat;
>>
>> I don’t like modules in the middle of a script. This makes testing a
>> new perl update more difficult. If I can load the script, I expect it
>> to have all dependencies it needs. This does not work when modules
>> are being loaded later on. I would say that we need to cover those
>> things by a module test then.
>>
>> I would prefer reliability over performance (I assume that this your
>> argument why to load it only when you need it).
>
> I'll move the load of the modules to the top of the script. This is
> also the way all other (library) scripts are doing this.
Thank you.
> This may will lead to some more resource consumption, but will help us
> to execute the script in a safe way - even when anything got updated.
>
> Module tests generally are a good way - I'm thinking about implementing
> some at a later time (after some important changes are finished) for
> the "ids-functions.pl".
>
>>
>>> +
>>> + # Get the mtime of the rulesfile if it exists.
>>> + my $mtime = (stat($dl_rulesfile)->mtime) if (-f
>>> $dl_rulesfile);
>>
>> I suggest making this a little bit simpler:
>>
>> Create the request first, then stat the file (you don’t need to check
>> if it exists first because stat will just tell you).
>>
>> Then add the header if you have a timestamp; if not, then don’t add a
>> useless header. This is a lot more straight forward :)
>>
>>> +
>>> + # Convert the mtime into gmtime format.
>>> + my $gmtime = gmtime($mtime || 0);
>>> +
>>> # Pass the requested url to the downloader.
>>> my $request = HTTP::Request->new(GET => $url);
>>>
>>> + # Add the If-Modified-Since header to the request,
>>> containing the omited and converted
>>> + # mtime of the downloaded rules file, if one is
>>> present.
>>> + $request->header( 'If-Modified-Since' => "$gmtime"
>>> );
>>
>> Will this survive any redirects? I know that some redirect to AWS S3
>> and you will need to make sure that the header is being sent when you
>> follow the redirect - assuming S3 supports this which I think they
>> would.
>
> If the servers in between are not entirely broken or do silly stuff,
> they should.
This is in the control of the client. It should not matter what the server responds with.
>>
>>> +
>>> my $dl_attempt = 1;
>>> my $response;
>>>
>>> @@ -381,6 +397,14 @@ sub downloadruleset ($) {
>>> # Break loop.
>>> last;
>>>
>>> + # Check if the server responds with 304
>>> (Not Modified).
>>> + } elsif ($response->code == 304) {
>>> + # Log to syslog.
>>> + &_log_to_syslog("Ruleset is up-to-
>>> date, no update required.");
>>> +
>>> + # Nothing to be done, the ruleset
>>> is up-to-date.
>>> + return;
>>> +
>>> # Check if we ran out of download re-tries.
>>> } elsif ($dl_attempt eq $max_dl_attempts) {
>>> # Obtain error.
>>> @@ -406,6 +430,10 @@ sub downloadruleset ($) {
>>> # Get the remote size of the downloaded file.
>>> my $remote_filesize = $headers->content_length;
>>>
>>> + # Get the timestamp from header, when the file has
>>> been modified the
>>> + # last time.
>>> + my $last_modified = $headers->last_modified;
>>> +
>>> # Load perl stat module.
>>> use File::stat;
>>
>> You are loading the module again.
>
> Thanks, this is an additional point why, loading all required modules
> on top are a good idea. You do not have to think about which modules
> are already loaded or may be reloaded etc.
>>
>>>
>>> @@ -428,9 +456,6 @@ sub downloadruleset ($) {
>>> return 1;
>>> }
>>>
>>> - # Genarate and assign file name and path to store
>>> the downloaded rules file.
>>> - my $dl_rulesfile = &_get_dl_rulesfile($provider);
>>> -
>>> # Check if a file name could be obtained.
>>> unless ($dl_rulesfile) {
>>> # Log error message.
>>> @@ -449,6 +474,13 @@ sub downloadruleset ($) {
>>> # Overwrite the may existing rulefile or tarball
>>> with the downloaded one.
>>> move("$tmpfile", "$dl_rulesfile");
>>>
>>> + # Check if the server respond contained a
>>> last_modified value.
>>> + if ($last_modified) {
>>> + # Assign the last modified timestamp from
>>> server as mtime to the
>>> + # rules file.
>>> + utime(time(), "$last_modified",
>>> "$dl_rulesfile");
>>> + }
>>> +
>>> # Delete temporary file.
>>> unlink("$tmpfile");
>>>
>>> --
>>> 2.30.2
>>>
>>
>> -Michael
>
> -Stefan
next prev parent reply other threads:[~2022-03-28 15:15 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-03-23 4:04 [PATCH 1/5] ids-functions.pl: Drop downloader code for sourcefire based ruleset Stefan Schantl
2022-03-23 4:04 ` [PATCH 2/5] ids-functions.pl: Allow "5" download attempts for each provider before fail Stefan Schantl
2022-03-23 9:28 ` Michael Tremer
2022-03-24 18:23 ` Stefan Schantl
2022-03-28 15:16 ` Michael Tremer
2022-03-23 4:04 ` [PATCH 3/5] ids-functions.pl: Remove temporary file, if the download failed Stefan Schantl
2022-03-23 4:04 ` [PATCH 4/5] ids-functions.pl: Use If-Modified-Since header to reduce file downloads Stefan Schantl
2022-03-23 9:34 ` Michael Tremer
2022-03-24 18:50 ` Stefan Schantl
2022-03-28 15:15 ` Michael Tremer [this message]
2022-03-23 4:04 ` [PATCH 5/5] ids-functions.pl: Do not longer call any log message as "ERROR" Stefan Schantl
2022-03-23 9:37 ` [PATCH 1/5] ids-functions.pl: Drop downloader code for sourcefire based ruleset Michael Tremer
-- strict thread matches above, loose matches on Subject: below --
2022-03-22 19:40 Stefan Schantl
2022-03-22 19:40 ` [PATCH 4/5] ids-functions.pl: Use If-Modified-Since header to reduce file downloads Stefan Schantl
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=026F49A8-7D3B-4374-855A-740B75209BD1@ipfire.org \
--to=michael.tremer@ipfire.org \
--cc=development@lists.ipfire.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox