---------- Forwarded message ----------
From: Jan Behrens <jan.behrensx@googlemail.com>
Date: 2013/3/6
Subject: Re: Update-Accelerator 3.0
To: Michael Tremer <michael.tremer@ipfire.org>
Hi,
Okay, dynamic is not always really dynamic. YouTube videos are also dynamic content to squid, and everything with a "?" in the URL is treated as dynamic by squid.
And that is why squid behaves the way it does:
refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
Of course there is no sense in caching "really" dynamic content; as you say, it is generated per request or per client. But for some big names it makes sense to cache some pseudo-dynamic content while excluding the really dynamic parts.
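As an illustration only, a more specific refresh_pattern placed before that default rule could allow such pseudo-dynamic objects to be cached; the host name and lifetimes below are made-up placeholders, not a tested recommendation:

    # Sketch: cache objects from one specific CDN for up to a day, even
    # though their URLs look "dynamic". The first matching refresh_pattern
    # wins, so this line must come before the default dynamic-content rule.
    refresh_pattern -i ^http://profile\.ak\.fbcdn\.net/ 1440 50% 1440 ignore-reload
    refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
    refresh_pattern . 0 20% 4320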
Well, it would be possible to cache HTTPS content with a "man-in-the-middle" squid. The client makes an HTTPS connection to squid, and squid makes its own connection to the original destination server. At this point squid can serve and save content.
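Roughly, such an SslBump setup would be configured like the sketch below; the exact directive names and options differ between squid versions, so treat this as an untested illustration rather than a recipe:

    # Sketch only: squid terminates the client's TLS connection using a
    # certificate signed by a local CA that every client has to trust,
    # then opens its own TLS connection to the origin server. This is the
    # man-in-the-middle setup discussed (and rejected) later in the thread.
    http_port 3128 ssl-bump cert=/etc/squid/proxy-ca.pem
    ssl_bump server-first all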
2013/3/6 Michael Tremer michael.tremer@ipfire.org
Hi,
it is great that you take part in the discussion.
On Wed, 2013-03-06 at 20:13 +0100, Jan Behrens wrote:
I agree with Fajar's intention! We need a far more capable way of caching dynamic content.
No, I think that is exactly the wrong way. Dynamic content should not be cached because it is _dynamic_. It's very unlikely that someone else will get the same response to a request.
The only thing that makes sense is to cache _static_ content like avatars, videos and (big) pictures. But actually this is the proxy's task, which apparently does not work very well for some things.
I think the way to go is a purpose-written add-on for squid which does the needed work. Here is a place to start reading: http://wiki.squid-cache.org/Features/AddonHelpers#HTTP_Redirection
What work? Re-implementing an extra cache is not a good idea. At least not for small files.
In the company I work for it is as follows: most of the traffic is secured (HTTPS). Just think about hosting providers like Dropbox, Google Drive, and so on. There are probably files which are needed by many people and downloaded by each of them, and you can't legally cache them. That's a shame.
Files transferred over HTTPS cannot be cached by the proxy. This is not for legal reasons; it is simply technically impossible.
2013/3/6 Michael Tremer <michael.tremer@ipfire.org>:

Hello,
On Wed, 2013-03-06 at 16:57 +0800, Fajar Ramadhan wrote:
> Hello there, replying inline
>
> > > Any other ideas?
>
> Hyper Cache

That's a possibility. I didn't know that anyone is still using the word hyper :D

> > >> Michael from IPFire.org told me that you may have some requirements or
> > >> ideas for an improved update accelerator
>
> Firstly, this idea is not part of the update accelerator thingie.

Well, we are thinking about a rewrite, so every idea is welcome. Nobody promises that it will be implemented, but in the process of searching for the real purpose of the update accelerator, feel free to write anything that is on your mind if you think it is worth considering.

> > >> cause we plan to extend the current version (at this point it looks
> > >> like a complete rewrite o_O)
>
> complete rewrite, maybe :) ?
>
> My idea basically comes from squid 2.7's ability to cache dynamic
> contents using the built-in storeurl feature.
> http://www.squid-cache.org/Doc/config/storeurl_rewrite_program/

As we are looking into the (far) future, we cannot possibly stick to an old version of squid. Even version 3.1, which is currently running in IPFire 2, is "old" right now. Maybe it is also a good idea to design this without considering squid as the default thing to work with. It should be possible to drop squid and use another proxy server - although I really don't have plans for that right now, because squid is the best proxy server one can have.

> Wiki example for how to use storeurl:
> http://wiki.squid-cache.org/Features/StoreUrlRewrite
>
> We already know that squid 2.7 is obsolete - but this feature was
> extremely useful for slow internet users (just like me in Indonesia,
> where bandwidth is expensive). The built-in storeurl feature in squid
> 2.7 has the ability to cache, or manipulate caching for, dynamically
> hosted contents (dynamic contents). Example for Facebook's CDN:

It is interesting that this has not been ported to squid 3.x. Apparently, the reason is that the implementation was poorly written and so people thought about replacing it entirely. It also looks like there are not many users of this feature.

> If squid has already cached one of these pictures then all identical
> pictures from hprofile-ak-prn1, hprofile-ak-prn2, hprofile-ak-prn3 .....
> hprofile-ak-prnX will result in a cache hit - squid does not need to
> fetch the same content from different CDN URLs, since it is already in
> the cache and the request got rewritten by storeurl. All contents from
> Facebook such as javascript, css, images, even sound and videos will
> have a very high chance of getting hits from squid.

Looking at the user data you provided further below, the important stuff to cache is big files. That's not only video and all sorts of downloads. Nowadays the javascript code of sites like Facebook is of the size of one or two megabytes*.

* Didn't check. Read this somewhere, some time ago.

What I get from this is that we should design the rewrite to literally cache anything. A technical question from me: why can we not use the internal cache of squid to do so, instead of coding our own caching proxy that is then queried by the real caching proxy? I think even with a very fast implementation, squid will always be much faster.

> This method works on almost all sites serving dynamic contents to
> their visitors: Youtube videos (all resolutions), blogger.com contents,
> online game patch files, google maps, ads, imeem, etc. This is
> something that cannot be done with squid 3.x.

This cannot be done with squid 3 AT THE MOMENT.
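To make the storeurl idea concrete, here is a minimal sketch of such a rewriter helper in Python. The exact helper line format and the fake canonical hostname are assumptions for illustration only; the real protocol is documented on the squid wiki page linked above.

    #!/usr/bin/env python
    # Minimal sketch of a storeurl-style rewriter helper. Assumptions:
    # squid passes one request per line with the URL as the first
    # whitespace-separated field, and expects the rewritten store URL
    # (or an empty line for "no change") on stdout. Check the squid
    # documentation for the exact protocol of your version.
    import re
    import sys

    # Collapse the numbered Facebook CDN hosts onto one made-up canonical
    # name so that identical objects fetched from hprofile-ak-prn1..prnN
    # share a single cache entry. The pattern is only an illustration.
    FBCDN = re.compile(r'^http://hprofile-ak-prn\d+\.fbcdn\.net/')

    def store_url(url):
        if FBCDN.match(url):
            return FBCDN.sub('http://hprofile-ak-prn.fbcdn.net.squidinternal/', url)
        return ''  # empty answer: keep the original URL as the store key

    for line in sys.stdin:
        fields = line.split()
        if not fields:
            continue
        sys.stdout.write(store_url(fields[0]) + '\n')
        sys.stdout.flush()  # squid reads one answer per request, so flush each line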
> Another approach to make it work on squid 3 is using ICAP - I'm not
> familiar with this one since I never used it. You can see some
> reference about ICAP to cache dynamic contents here (for me it seems
> difficult to do it):
> http://www.squid-cache.org/mail-archive/squid-users/201206/0074.html
As pointed out earlier, I like ICAP. The protocol has a lot of advantages and makes us independent from squid (not to replace it, but to not be dependent on a certain version - they all talk ICAP). Can someone find out whether somebody has already implemented this kind of thing?

Terima kasih,
-Michael
On Wed, 2013-03-06 at 22:23 +0100, Bernhard Bitsch wrote:
I don't think we should base the new solution on squid's caching either.
The existing Update Accelerator is written as a rewriter module for squid.
This model is strong enough to realize the function "caching of frequent file requests".
If we are jumping right ahead to discussing technical details, then I would like someone to check whether we can easily control squid's cache to store our files, so that we don't have to manage our own.
My first idea for a redesign of the accelerator was to generalize the conditions for caching.
At the moment all conditions can be described by the pattern:

    if URI match (set of sample URIs and REs)_1 & URI !match (set of sample URIs and REs)_2 then
        check(URI)
    fi
This can be enhanced if the sets of URIs and REs are condensed to two regular expressions for each caching class, currently called "vendor".
Then the check for caching is just a loop over all classes.
A second enhancement can be achieved if the most frequently requested checks are made first; the loop terminates at the first match.
The latest version of PCRE comes with a fast JIT compiler for regular expressions. We should take advantage of that instead of running through loops.
I agree that all URLs should be configurable.
-Michael
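As a rough illustration of the matching loop Bernhard describes above (one include and one exclude expression per "vendor" class, the most frequently requested classes checked first, first match wins), here is a sketch in Python; the vendor names and patterns are invented placeholders:

    # Sketch of the per-vendor matching loop. The patterns are invented,
    # not the accelerator's real rule set.
    import re

    VENDOR_CLASSES = [
        # (name, include RE, optional exclude RE) - most requested first
        ("microsoft", re.compile(r'\.windowsupdate\.com/.+\.(cab|exe|psf)$'),
                      re.compile(r'/selfupdate/')),
        ("adobe",     re.compile(r'download\.adobe\.com/.+\.(exe|msi)$'),
                      None),
    ]

    def classify(uri):
        """Return the first matching vendor class, or None."""
        for name, include, exclude in VENDOR_CLASSES:
            if include.search(uri) and not (exclude and exclude.search(uri)):
                return name
        return None

    def check(uri):
        # Placeholder for the existing cache lookup / download logic.
        print("would check the cache for", uri)

    uri = "http://download.windowsupdate.com/msdownload/update/foo.cab"
    if classify(uri):
        check(uri)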
I agree about the privacy violation problem of man-in-the-middle caching... That's why I am not allowed to use it; that is the initial problem. There should be a way to cache things like files which are hosted at some filehoster on the internet. But okay, at the moment this is just not possible (at least not legally).
If we are jumping right ahead to discussing technical details, then I would like someone to check whether we can easily control squid's cache to store our files, so that we don't have to manage our own.
You could give squid a separate cache_dir for update things. Did you check the new "rock" storage type of squid? It is really fast. Also, the newer versions of squid are multi-threaded, or at least can run as multiple instances.
So I recommend using squid's cache. It has been well established over many years and is very fast. There just has to be a mapping for the files in order to query them from the cache storage. Squid has an internal mapping; if it were possible to hook into that API, the problem would be small.
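For reference, a dedicated cache_dir of the rock type as Jan suggests would look roughly like the sketch below in squid.conf; the paths and sizes are placeholders, and early rock implementations only store small objects, so this is an illustration, not a recommendation:

    # Sketch: a separate rock store next to the normal aufs cache.
    # rock keeps many small objects in one database file and works with
    # SMP workers; note that older rock versions limit the object size.
    cache_dir aufs /var/cache/squid 4096 16 256
    cache_dir rock /var/cache/squid-updates 8192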
Sent: Wednesday, 6 March 2013, 23:08
From: "Michael Tremer" <michael.tremer@ipfire.org>
To: "Bernhard Bitsch" <Bernhard.Bitsch@gmx.de>
Cc: "development@lists.ipfire.org" <development@lists.ipfire.org>
Subject: Re: Aw: Fwd: Update-Accelerator 3.0
On Wed, 2013-03-06 at 22:23 +0100, Bernhard Bitsch wrote:
I don't think we should base the new solution on squid's caching either.
The existing Update Accelerator is written as a rewriter module for squid.
This model is strong enough to realize the function "caching of frequent file requests".
If we are jumping right ahead to discussing technical details, then I would like someone to check whether we can easily control squid's cache to store our files, so that we don't have to manage our own.
No problem. But this solution must give us the possibility to manage the file store from the WUI. I don't want to miss this feature.
My first idea for a redesign of the accelerator was to generalize the conditions for caching.
At the moment all conditions can be described by the pattern:

    if URI match (set of sample URIs and REs)_1 & URI !match (set of sample URIs and REs)_2 then
        check(URI)
    fi
This can be enhanced if the sets of URIs and REs are condensed to two regular expressions for each caching class, currently called "vendor".
Then the check for caching is just a loop over all classes.
A second enhancement can be achieved if the most frequently requested checks are made first; the loop terminates at the first match.
The latest version of PCRE comes with a fast JIT compiler for regular expressions. We should take advantage of that instead of running through loops.
A JIT compiler does not make the loops avoidable (Perl does this too). The storage application must loop over the various categories. At a short look at PCRE I could not find a way to efficiently assemble several single REs/URIs into one. This is necessary if we want the user to be able to extend the rule set. A main problem in the current implementation is extending it by adding a new alternative.
- Bernhard
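One simple way to assemble several single expressions into one - whether it is efficient enough for user-extended rule sets is exactly the open question above - is plain alternation with named groups. A Python sketch with made-up patterns:

    # Sketch: combine per-vendor patterns into one compiled regex using
    # named groups, so a single search tells us which class matched.
    import re

    vendor_patterns = {
        "microsoft": r'\.windowsupdate\.com/.+\.cab$',
        "adobe":     r'download\.adobe\.com/.+\.exe$',
    }

    combined = re.compile("|".join(
        "(?P<%s>%s)" % (name, pattern)
        for name, pattern in vendor_patterns.items()
    ))

    def match_vendor(uri):
        m = combined.search(uri)
        return m.lastgroup if m else None

    print(match_vendor("http://download.windowsupdate.com/foo.cab"))  # microsoft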
Hi, I would also prefer the current approach of a separate repository. Maybe it is easy, but I don't know whether it is possible to move squid's cache, or parts of it, to other locations, or to maintain it via the cache manager.
So the current way of storing it separately seems the best way to me, with a UI to manage the sources and files, but with a less IO-heavy way to store and access the metadata, and proper handling of the (update) files while downloading and on deletion.
Having the possibility to set debug options and to move the update cache to another location via the WebUI would also be a good feature.
I wrote some points about what is missing and what could be implemented on the wiki page.
But I would generally refuse to implement any function which breaks current security features, like SSL MITM, or which tries to re-implement squid functionality that squid itself already handles better.
Because of the nature of dynamic content - it is dynamic - it is handled better inside squid and its logic. Implementing features which try to cache social content like the ones Fajar mentioned is a bad way, unless the content can be identified and stays valid for a longer time; otherwise we would just be implementing a second LRU cache beside squid, and that already exists.
Caching FB content over a longer time is almost pointless, because they frequently change their interfaces, paths and naming to prevent e.g. hackers from misusing their platform. With that in mind, there is only a small part of the content which can be cached, and it currently is (by squid).
Kind regards,
Ingo
Hey,
please guys, don't let us drop the ball on this.
I have seen that some discussed this topic on the wiki pages, which is ... interesting, but please let's keep this in an orderly fashion.
What I want you to do is to write a list on the wiki with all the features we want and which features from the current version we don't want or need anymore. Maybe Jörn-Ingo can make a list of the current features.
After that we will agree on all proposed features and see how we can implement them. We are not going into too much technical detail until we have reached this point.
And once even that is finished, we will start planning who implements what, and so on. Details when we get there.
-Michael
Michael,
basically you're right, but there is development going on right at this moment, and the intermediate results aren't too bad. Hence my more technical details.
And yes, we should collect the proposals for features; mainly the topic "source and kind of cached material" is relevant for the decision about the technical implementation.
-Bernhard
On Wed, 2013-03-06 at 21:02 +0100, Jan Behrens wrote:
Okay, dynamic is not always really dynamic. YouTube videos are also dynamic content to squid, and everything with a "?" in the URL is treated as dynamic by squid.
That's a bad decision, if it is true, because having a query string does not necessarily make content dynamic. There are things like the Expires: and Pragma: headers in HTTP to control what may be cached and for how long.
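For example, a response carrying headers like the following (the values are purely illustrative; Cache-Control is the related modern header) tells any cache that the object may be stored and reused for a day, regardless of whether the URL contains a query string:

    HTTP/1.1 200 OK
    Date: Wed, 06 Mar 2013 21:30:00 GMT
    Expires: Thu, 07 Mar 2013 21:30:00 GMT
    Cache-Control: public, max-age=86400
    Content-Type: image/jpeg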
Well, it would be possible to cache HTTPS content with a "man-in-the-middle" squid. The client makes an HTTPS connection to squid, and squid makes its own connection to the original destination server. At this point squid can serve and save content.
Not with me. This heavily violates the concept of secure communication. In my opinion, I cannot trust the proxy to correctly verify the server's certificate for example. This is only one among a whole bunch of security issues.
-Michael