Hello,
On 24 Mar 2022, at 18:50, Stefan Schantl stefan.schantl@ipfire.org wrote:
Hello Michael,
thanks for your thoughts and feedback.
Hello,
I like this mechanism. It is quite powerful since it will save a lot of unnecessary downloads and CPU cycles. However, I am not sure if all providers will support this because some use crappy CDNs which do not give anything about conforming to any RFC.
Currently I'm only aware that content which is hosted on github.com cannot be checked with this header flag. They are using Etags which also can be used for cache management, but works in a completely different way.
Yes, but we can still use them.
You can just store the Etag and send it with your next request.
There are plenty more providers, some which requires a subscription which I do not own - so they currently remains untested.
On 23 Mar 2022, at 04:04, Stefan Schantl stefan.schantl@ipfire.org wrote:
When using the "If-Modified-Since" header, the server can be requested if a modified version of the file can be served.
In case that is true, the file will be sent and stored by the downloader function. If the file has not been touched since the last time, the server will respond with the code "304" (Not modified).
This tells us, that the current stored file is the latest one (still up-to-date) and we safely can skip the download attempt for this provider.
Signed-off-by: Stefan Schantl stefan.schantl@ipfire.org
config/cfgroot/ids-functions.pl | 38 ++++++++++++++++++++++++++++++--- 1 file changed, 35 insertions(+), 3 deletions(-)
diff --git a/config/cfgroot/ids-functions.pl b/config/cfgroot/ids- functions.pl index dfbeb1a7d..d7df41dd1 100644 --- a/config/cfgroot/ids-functions.pl +++ b/config/cfgroot/ids-functions.pl @@ -365,9 +365,25 @@ sub downloadruleset ($) { my $tmp = File::Temp->new( SUFFIX => ".tmp", DIR => "/var/tmp/", UNLINK => 0 ); my $tmpfile = $tmp->filename();
- # Genarate and assign file name and path to store
the downloaded rules file.
- my $dl_rulesfile = &_get_dl_rulesfile($provider);
- # Load perl module to deal with file atributes.
- use File::stat;
I don’t like modules in the middle of a script. This makes testing a new perl update more difficult. If I can load the script, I expect it to have all dependencies it needs. This does not work when modules are being loaded later on. I would say that we need to cover those things by a module test then.
I would prefer reliability over performance (I assume that this your argument why to load it only when you need it).
I'll move the load of the modules to the top of the script. This is also the way all other (library) scripts are doing this.
Thank you.
This may will lead to some more resource consumption, but will help us to execute the script in a safe way - even when anything got updated.
Module tests generally are a good way - I'm thinking about implementing some at a later time (after some important changes are finished) for the "ids-functions.pl".
- # Get the mtime of the rulesfile if it exists.
- my $mtime = (stat($dl_rulesfile)->mtime) if (-f
$dl_rulesfile);
I suggest making this a little bit simpler:
Create the request first, then stat the file (you don’t need to check if it exists first because stat will just tell you).
Then add the header if you have a timestamp; if not, then don’t add a useless header. This is a lot more straight forward :)
- # Convert the mtime into gmtime format.
- my $gmtime = gmtime($mtime || 0);
# Pass the requested url to the downloader. my $request = HTTP::Request->new(GET => $url);
- # Add the If-Modified-Since header to the request,
containing the omited and converted
- # mtime of the downloaded rules file, if one is
present.
- $request->header( 'If-Modified-Since' => "$gmtime"
);
Will this survive any redirects? I know that some redirect to AWS S3 and you will need to make sure that the header is being sent when you follow the redirect - assuming S3 supports this which I think they would.
If the servers in between are not entirely broken or do silly stuff, they should.
This is in the control of the client. It should not matter what the server responds with.
my $dl_attempt = 1; my $response;
@@ -381,6 +397,14 @@ sub downloadruleset ($) { # Break loop. last;
- # Check if the server responds with 304
(Not Modified).
- } elsif ($response->code == 304) {
- # Log to syslog.
- &_log_to_syslog("Ruleset is up-to-
date, no update required.");
- # Nothing to be done, the ruleset
is up-to-date.
- return;
# Check if we ran out of download re-tries. } elsif ($dl_attempt eq $max_dl_attempts) { # Obtain error. @@ -406,6 +430,10 @@ sub downloadruleset ($) { # Get the remote size of the downloaded file. my $remote_filesize = $headers->content_length;
- # Get the timestamp from header, when the file has
been modified the
- # last time.
- my $last_modified = $headers->last_modified;
# Load perl stat module. use File::stat;
You are loading the module again.
Thanks, this is an additional point why, loading all required modules on top are a good idea. You do not have to think about which modules are already loaded or may be reloaded etc.
@@ -428,9 +456,6 @@ sub downloadruleset ($) { return 1; }
- # Genarate and assign file name and path to store
the downloaded rules file.
- my $dl_rulesfile = &_get_dl_rulesfile($provider);
# Check if a file name could be obtained. unless ($dl_rulesfile) { # Log error message. @@ -449,6 +474,13 @@ sub downloadruleset ($) { # Overwrite the may existing rulefile or tarball with the downloaded one. move("$tmpfile", "$dl_rulesfile");
- # Check if the server respond contained a
last_modified value.
- if ($last_modified) {
- # Assign the last modified timestamp from
server as mtime to the
- # rules file.
- utime(time(), "$last_modified",
"$dl_rulesfile");
- }
# Delete temporary file. unlink("$tmpfile");
-- 2.30.2
-Michael
-Stefan