Hello, > On 24 Mar 2022, at 18:50, Stefan Schantl wrote: > > Hello Michael, > > thanks for your thoughts and feedback. > >> Hello, >> >> I like this mechanism. It is quite powerful since it will save a lot >> of unnecessary downloads and CPU cycles. However, I am not sure if >> all providers will support this because some use crappy CDNs which do >> not give anything about conforming to any RFC. > > Currently I'm only aware that content which is hosted on github.com > cannot be checked with this header flag. They are using Etags which > also can be used for cache management, but works in a completely > different way. Yes, but we can still use them. You can just store the Etag and send it with your next request. > There are plenty more providers, some which requires a subscription > which I do not own - so they currently remains untested. > >> >>> On 23 Mar 2022, at 04:04, Stefan Schantl >>> wrote: >>> >>> When using the "If-Modified-Since" header, the server can be >>> requested >>> if a modified version of the file can be served. >>> >>> In case that is true, the file will be sent and stored by the >>> downloader >>> function. If the file has not been touched since the last time, the >>> server will respond with the code "304" (Not modified). >>> >>> This tells us, that the current stored file is the latest one >>> (still up-to-date) >>> and we safely can skip the download attempt for this provider. >>> >>> Signed-off-by: Stefan Schantl >>> --- >>> config/cfgroot/ids-functions.pl | 38 >>> ++++++++++++++++++++++++++++++--- >>> 1 file changed, 35 insertions(+), 3 deletions(-) >>> >>> diff --git a/config/cfgroot/ids-functions.pl b/config/cfgroot/ids- >>> functions.pl >>> index dfbeb1a7d..d7df41dd1 100644 >>> --- a/config/cfgroot/ids-functions.pl >>> +++ b/config/cfgroot/ids-functions.pl >>> @@ -365,9 +365,25 @@ sub downloadruleset ($) { >>> my $tmp = File::Temp->new( SUFFIX => ".tmp", DIR => >>> "/var/tmp/", UNLINK => 0 ); >>> my $tmpfile = $tmp->filename(); >>> >>> + # Genarate and assign file name and path to store >>> the downloaded rules file. >>> + my $dl_rulesfile = &_get_dl_rulesfile($provider); >>> + >>> + # Load perl module to deal with file atributes. >>> + use File::stat; >> >> I don’t like modules in the middle of a script. This makes testing a >> new perl update more difficult. If I can load the script, I expect it >> to have all dependencies it needs. This does not work when modules >> are being loaded later on. I would say that we need to cover those >> things by a module test then. >> >> I would prefer reliability over performance (I assume that this your >> argument why to load it only when you need it). > > I'll move the load of the modules to the top of the script. This is > also the way all other (library) scripts are doing this. Thank you. > This may will lead to some more resource consumption, but will help us > to execute the script in a safe way - even when anything got updated. > > Module tests generally are a good way - I'm thinking about implementing > some at a later time (after some important changes are finished) for > the "ids-functions.pl". > >> >>> + >>> + # Get the mtime of the rulesfile if it exists. >>> + my $mtime = (stat($dl_rulesfile)->mtime) if (-f >>> $dl_rulesfile); >> >> I suggest making this a little bit simpler: >> >> Create the request first, then stat the file (you don’t need to check >> if it exists first because stat will just tell you). >> >> Then add the header if you have a timestamp; if not, then don’t add a >> useless header. This is a lot more straight forward :) >> >>> + >>> + # Convert the mtime into gmtime format. >>> + my $gmtime = gmtime($mtime || 0); >>> + >>> # Pass the requested url to the downloader. >>> my $request = HTTP::Request->new(GET => $url); >>> >>> + # Add the If-Modified-Since header to the request, >>> containing the omited and converted >>> + # mtime of the downloaded rules file, if one is >>> present. >>> + $request->header( 'If-Modified-Since' => "$gmtime" >>> ); >> >> Will this survive any redirects? I know that some redirect to AWS S3 >> and you will need to make sure that the header is being sent when you >> follow the redirect - assuming S3 supports this which I think they >> would. > > If the servers in between are not entirely broken or do silly stuff, > they should. This is in the control of the client. It should not matter what the server responds with. >> >>> + >>> my $dl_attempt = 1; >>> my $response; >>> >>> @@ -381,6 +397,14 @@ sub downloadruleset ($) { >>> # Break loop. >>> last; >>> >>> + # Check if the server responds with 304 >>> (Not Modified). >>> + } elsif ($response->code == 304) { >>> + # Log to syslog. >>> + &_log_to_syslog("Ruleset is up-to- >>> date, no update required."); >>> + >>> + # Nothing to be done, the ruleset >>> is up-to-date. >>> + return; >>> + >>> # Check if we ran out of download re-tries. >>> } elsif ($dl_attempt eq $max_dl_attempts) { >>> # Obtain error. >>> @@ -406,6 +430,10 @@ sub downloadruleset ($) { >>> # Get the remote size of the downloaded file. >>> my $remote_filesize = $headers->content_length; >>> >>> + # Get the timestamp from header, when the file has >>> been modified the >>> + # last time. >>> + my $last_modified = $headers->last_modified; >>> + >>> # Load perl stat module. >>> use File::stat; >> >> You are loading the module again. > > Thanks, this is an additional point why, loading all required modules > on top are a good idea. You do not have to think about which modules > are already loaded or may be reloaded etc. >> >>> >>> @@ -428,9 +456,6 @@ sub downloadruleset ($) { >>> return 1; >>> } >>> >>> - # Genarate and assign file name and path to store >>> the downloaded rules file. >>> - my $dl_rulesfile = &_get_dl_rulesfile($provider); >>> - >>> # Check if a file name could be obtained. >>> unless ($dl_rulesfile) { >>> # Log error message. >>> @@ -449,6 +474,13 @@ sub downloadruleset ($) { >>> # Overwrite the may existing rulefile or tarball >>> with the downloaded one. >>> move("$tmpfile", "$dl_rulesfile"); >>> >>> + # Check if the server respond contained a >>> last_modified value. >>> + if ($last_modified) { >>> + # Assign the last modified timestamp from >>> server as mtime to the >>> + # rules file. >>> + utime(time(), "$last_modified", >>> "$dl_rulesfile"); >>> + } >>> + >>> # Delete temporary file. >>> unlink("$tmpfile"); >>> >>> -- >>> 2.30.2 >>> >> >> -Michael > > -Stefan