Most Microsoft updates now contain an SHA1 hash in the filename. Since these files are uniquely identifiable by name, cache them in mirror mode, which hashes just the filename instead of the entire URL. (But first check the URL cache, in case the file has already been downloaded and cached under its URL.)
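To make the difference between the two cache modes concrete, here is a minimal Perl sketch. It assumes the cache key is an MD5 digest of either the full URL (unique mode) or the bare filename (mirror mode). The helper `cache_key` and the use of `Digest::MD5` are illustrative assumptions; updxlrator's real key derivation may differ. The `http://` scheme is added to the test URL for completeness.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper (not updxlrator's real code): derive a cache key.
# 'unique' mode hashes the whole URL; 'mirror' mode hashes only the filename.
sub cache_key {
    my ($url, $mode) = @_;
    if ($mode eq 'mirror') {
        # Last path component, without any query string.
        my ($file) = $url =~ m{([^/?]+)(?:\?.*)?\z};
        return md5_hex(defined $file ? $file : $url);
    }
    return md5_hex($url);
}

# Two mirrors serving the same SHA1-named file (the test URL from this patch).
my $path = 'download.windowsupdate.com/d/msdownload/update/others/2015/03/'
         . '16743052_f84687743a71a750edef8ffedd978602a2592000.cab';
my @mirrors = ("http://7.au.$path", "http://au.$path");

for my $url (@mirrors) {
    printf "%-8s %s\n", 'unique', cache_key($url, 'unique');
    printf "%-8s %s\n", 'mirror', cache_key($url, 'mirror');
}
```

The two unique-mode keys differ while the two mirror-mode keys are identical, which is why keying on the filename lets one cached copy serve every mirror.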
This fix is HUGELY needed. Windows 10 updates are 5+ GB per month, and we lose several days' worth of bandwidth downloading duplicates of the same files from different mirrors. Sometimes a single client will even request the same patch from multiple mirrors. This patch will save a lot of bandwidth and a lot of disk space.
The patch limits the SHA1 test to Microsoft only, but it could easily be extended to other vendors if needed.
Signed-off-by: Justin Luth <jluth@mail.com>
---
This is a slight hack, because the fix is tucked away in a somewhat obscure function. Someone could completely redesign this and write more modular functions that create the hash, check whether the file exists, etc. But this patch is neatly contained in one section of the code and doesn't modify anything else, so I think its simplicity and elegance outweigh the hackiness.
Because the fix is tucked away in the check_cache function, I added a comment in the Microsoft section of the main loop to clearly alert future programmers to the change. Originally I had put my SHA1 test there, but doing so required pre-processing the existing caches and renaming their hash identifiers. This patch avoids that ugly business.
This patch works well because it never downloads anything extra. If a file is already cached under its URL, it won't be re-downloaded under its filename. If a client now hits a different mirror, the file is downloaded one more time (as before), and after that, requests to every other mirror are served from the cache.
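The lookup order described above can be sketched in Perl as follows. This is a simplified model under stated assumptions, not the patch itself: a hash `%cache` stands in for the on-disk cache, and the helpers `url_key`, `file_key`, `has_sha1_name`, `lookup` and `store` are hypothetical names chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the on-disk cache: key => 1 when the file is cached.
my %cache;

sub url_key { return "url:$_[0]" }

sub file_key {
    my ($url) = @_;
    my ($file) = $url =~ m{([^/?]+)(?:\?.*)?\z};
    return 'file:' . (defined $file ? $file : $url);
}

# True when the filename has 40 hex digits right before the extension.
sub has_sha1_name {
    my ($url) = @_;
    return $url =~ m{[0-9a-f]{40}\.[^./?]+(?:\?.*)?\z}i ? 1 : 0;
}

# 1. Try the existing URL-keyed entry (files cached before the patch).
# 2. For SHA1-named files only, fall back to the filename-keyed entry.
sub lookup {
    my ($url) = @_;
    return 'url-hit'  if exists $cache{ url_key($url) };
    return 'file-hit' if has_sha1_name($url) && exists $cache{ file_key($url) };
    return 'miss';
}

# After a miss, an SHA1-named file is stored under its filename key.
sub store {
    my ($url) = @_;
    my $key = has_sha1_name($url) ? file_key($url) : url_key($url);
    $cache{$key} = 1;
}

my $path = '/d/msdownload/update/others/2015/03/16743052_f84687743a71a750edef8ffedd978602a2592000.cab';
my ($old, $other, $third) = map { "http://${_}$path" }
    qw(7.au.download.windowsupdate.com au.download.windowsupdate.com download.windowsupdate.com);

$cache{ url_key($old) } = 1;   # cached before the patch, under its full URL
print lookup($old), "\n";      # prints "url-hit": no re-download
print lookup($other), "\n";    # prints "miss": a different mirror, downloaded one more time
store($other);
print lookup($third), "\n";    # prints "file-hit": other mirrors are now served from the cache
```

The copy cached under the first mirror's URL is not reused for the second mirror; that single extra download is the one the rename script in the bug report would avoid.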
The bug report includes a script that can be adapted to RENAME the URL hashes to filename hashes, for any site that really wants to avoid re-downloading files it already has. But since I haven't seen anyone else complain about this problem, I doubt anyone will need it.
A good test URL (a small file, not 1+ GB) is:
7.au.download.windowsupdate.com/d/msdownload/update/others/2015/03/16743052_f84687743a71a750edef8ffedd978602a2592000.cab
To reach different mirrors of the same file, replace the 7 with another number, remove the "7.", or remove the "7.au.".
---
 config/updxlrator/updxlrator | 13 +++++++++++++
 1 file changed, 13 insertions(+)
diff --git a/config/updxlrator/updxlrator b/config/updxlrator/updxlrator
index 5baaaae58..ff23b3a95 100644
--- a/config/updxlrator/updxlrator
+++ b/config/updxlrator/updxlrator
@@ -86,6 +86,8 @@ while (<>) {
         && ($source_url !~ m@&@)
       ) {
+        # NOTE: check_cache will change to $mirror instead of $unique if the filename contains an SHA1 hash
+        # and the URL is not found in cache!
         $xlrator_url = &check_cache($source_url,$hostaddr,$username,"Microsoft",$unique);
     }
@@ -400,6 +402,17 @@ sub check_cache
         &debuglog("Retrieving file from cache ($updsource) for $hostaddr");
         &setcachestatus("$updcachedir/$vendorid/$uuid/access.log",time);
         $cacheurl="http://$netsettings{'GREEN_ADDRESS'}:$http_port/updatecache/$vendori...";
+    }
+    elsif (
+        ($cfmirror == $unique) &&
+        (lc($vendorid) eq "microsoft") &&
+        ($source_url =~ m@[0-9a-f]{40}\.[^./]+@i)
+    )
+    {
+        # Most Microsoft updates now have an SHA1 hash in the name. These should be treated as unique files.
+        # Since it wasn't found in the URL cache, switch to mirror mode and try again using just the filename.
+        &debuglog("SHA1: $vendorid $uuid not cached. Reprocessing as mirror $source_url");
+        $cacheurl = &check_cache($source_url,$hostaddr,$username,$vendorid,$mirror);
     }
     else
     {
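The three conditions of the new `elsif` can be checked in isolation. The sketch below is a standalone Perl predicate; `retry_as_mirror` is a hypothetical name, and `$unique`/`$mirror` are assumed to be numeric mode constants (0 and 1 here are placeholder values). It compares the vendor with `eq`, because `==` converts both strings to numbers, any non-numeric string becomes 0, and so `"adobe" == "microsoft"` would be true. It also escapes the dot before the extension, so the pattern requires a literal `.` rather than any character.

```perl
use strict;
use warnings;

# Placeholder values for the mode constants used by check_cache.
my ($unique, $mirror) = (0, 1);

# True when a URL-keyed cache miss should be retried in mirror mode.
sub retry_as_mirror {
    my ($cfmirror, $vendorid, $source_url) = @_;
    my $retry = $cfmirror == $unique                      # only from unique mode, so the recursive call cannot loop
             && lc($vendorid) eq 'microsoft'              # string comparison: eq, not ==
             && $source_url =~ m{[0-9a-f]{40}\.[^./]+}i;  # 40 hex digits, a literal dot, an extension
    return $retry ? 1 : 0;
}

# Why == is wrong here: both non-numeric strings numify to 0.
{
    no warnings 'numeric';
    print "'adobe' == 'microsoft' is true\n" if 'adobe' == 'microsoft';
}

my $url = 'http://au.download.windowsupdate.com/d/msdownload/update/others/2015/03/'
        . '16743052_f84687743a71a750edef8ffedd978602a2592000.cab';
print retry_as_mirror($unique, 'Microsoft', $url), "\n";   # prints 1
print retry_as_mirror($mirror, 'Microsoft', $url), "\n";   # prints 0: already in mirror mode
```

Using `lc` makes the check independent of whether the caller passes "Microsoft" or "microsoft".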