No subject


Tue Jan 17 15:36:28 EST 2012


Dynamic Mirror
  Description:
  Assume there are nice webpages on remote hosts that we want to bring
into our namespace. For FTP servers we would use the mirror program,
which actually maintains an explicit up-to-date copy of the remote data
on the local machine. For a webserver we could use the program webcopy,
which works similarly over HTTP. But both techniques have one major
drawback: the local copy is only as up to date as the last time we ran
the program. It would be much better if the mirror were not a static
one that we have to establish explicitly. Instead we want a dynamic
mirror whose data gets updated automatically when there is a need
(i.e. when the data on the remote host changes).

  Solution:
  To provide this feature we map the remote webpage, or even the
complete remote web area, into our namespace using the Proxy Throughput
feature (flag [P]):

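# Requires mod_proxy (plus mod_proxy_http on Apache 2.x).
# RewriteBase is only valid in per-directory context, e.g. an .htaccess
# file in ~quux's web directory.
RewriteEngine  on
RewriteBase    /~quux/
RewriteRule    ^hotsheet/(.*)$  http://www.tstimpreso.com/hotsheet/$1  [P]

RewriteEngine  on
RewriteBase    /~quux/
RewriteRule    ^usa-news\.html$  http://www.quux-corp.com/news/index.html  [P]
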
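To make the mapping in the first rule concrete, here is a rough Python model of what it does. It is a sketch only. The names map_to_remote and proxy_fetch are hypothetical, and the code says nothing about how Apache implements [P]. It shows how the captured (.*) becomes $1 in the remote URL, and that the page is fetched from the remote host on each request rather than copied ahead of time.

```python
import re
from typing import Optional
from urllib.request import urlopen

# Hypothetical Python model of the first rule above:
#   RewriteRule ^hotsheet/(.*)$ http://www.tstimpreso.com/hotsheet/$1 [P]
# In per-directory context (.htaccess) the pattern is matched against the
# path with the directory prefix stripped, e.g. "hotsheet/index.html".
HOTSHEET_RULE = re.compile(r"^hotsheet/(.*)$")
REMOTE_PREFIX = "http://www.tstimpreso.com/hotsheet/"


def map_to_remote(local_path: str) -> Optional[str]:
    """Return the remote URL a local path is proxied to, or None if the rule does not match."""
    m = HOTSHEET_RULE.match(local_path)
    if m is None:
        return None
    # $1 in the substitution corresponds to group(1) of the pattern.
    return REMOTE_PREFIX + m.group(1)


def proxy_fetch(local_path: str) -> bytes:
    """Fetch the remote page on every request; nothing is stored locally."""
    remote = map_to_remote(local_path)
    if remote is None:
        raise LookupError(f"no proxy rule matches {local_path!r}")
    with urlopen(remote) as resp:
        return resp.read()
```

Because nothing is stored locally (unless a cache module is configured), the "mirror" is always exactly as fresh as the remote site.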
Reverse Dynamic Mirror
  Description:
  ...
  Solution:

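# Applies to proxy requests for http://www.remotesite.com/...
# The RewriteRule pattern is matched first, so $1 is available to RewriteCond.
# -U checks, via an internal subrequest, that the local URL is valid.
RewriteEngine on
RewriteCond   /mirror/of/remotesite/$1           -U
RewriteRule   ^http://www\.remotesite\.com/(.*)$ /mirror/of/remotesite/$1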
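The decision these two lines make can be sketched in Python as follows. This assumes the server also acts as a proxy for www.remotesite.com, so an unmatched or unmirrored request simply goes on to the remote host. The name route is hypothetical, and url_is_valid is a stand-in for the -U subrequest check.

```python
import re
from typing import Callable

# Hypothetical Python model of the Reverse Dynamic Mirror rules above.
REMOTE_RULE = re.compile(r"^http://www\.remotesite\.com/(.*)$")
MIRROR_PREFIX = "/mirror/of/remotesite/"


def route(request_url: str, url_is_valid: Callable[[str], bool]) -> str:
    """Return what the server should serve for a (proxy) request URL.

    url_is_valid stands in for mod_rewrite's -U test, which checks the
    local URL via an internal subrequest.
    """
    m = REMOTE_RULE.match(request_url)
    if m is None:
        return request_url  # the RewriteRule pattern does not match
    # mod_rewrite matches the rule pattern first, so $1 is available
    # to the RewriteCond test string.
    local_url = MIRROR_PREFIX + m.group(1)
    if url_is_valid(local_url):
        return local_url  # serve the local mirror copy
    return request_url  # fall through to the remote host
```

In other words, the local mirror answers whatever it already has, and everything else still comes from the real site.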
Confused yet? While I've got you all in a spin, I'll casually slip in
some extra information that isn't useful for this problem, but I'd like
to mention it to ^yob anyway... :-P

On-the-fly Content-Regeneration
  Description:
  Here comes a really esoteric feature: dynamically generated but
statically served pages. That is, pages should be delivered as pure
static pages (read from the filesystem and just passed through), but
they have to be generated dynamically by the webserver if they are
missing. This way you can have CGI-generated pages which are served
statically unless someone (or a cronjob) removes the static content.
The content then gets regenerated on the next request.

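  Solution:
  This is done via the following ruleset:

# !-s: true if page.html is missing or has zero size
RewriteCond %{REQUEST_FILENAME}   !-s
# Run page.cgi instead; the legacy MIME type application/x-httpd-cgi
# makes Apache treat it as a CGI script.
RewriteRule ^page\.html$          page.cgi   [T=application/x-httpd-cgi,L]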
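mod_rewrite's -s test is true for an existing regular file whose size is greater than zero, so !-s matches a missing or empty page.html. Roughly the same check in Python looks like this. It is a sketch, and needs_regeneration is a hypothetical name.

```python
import os


def needs_regeneration(path: str) -> bool:
    """Rough Python equivalent of `RewriteCond %{REQUEST_FILENAME} !-s`.

    mod_rewrite's -s is true for an existing regular file whose size is
    greater than zero; negating it gives "missing, not a regular file, or empty".
    """
    return not (os.path.isfile(path) and os.path.getsize(path) > 0)
```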
Here a request to page.html leads to an internal run of the
corresponding page.cgi if page.html is still missing or has zero size.
The trick is that page.cgi is an ordinary CGI script which, in addition
to writing to STDOUT, writes its output to the file page.html. Once it
has run, subsequent requests are served straight from page.html. When
the webmaster wants to force a refresh of the contents, he just removes
page.html (usually done by a cronjob).
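A page.cgi along these lines could be written in Python roughly as follows. This is a sketch under stated assumptions. The script, render_page and regenerate are hypothetical, render_page is a placeholder for the real generation, and the script assumes it runs with its own directory as the working directory. The write-then-rename step is a choice made for this sketch, not part of the original recipe: a half-written but non-empty page.html would already pass the !-s test and be served as if complete.

```python
#!/usr/bin/env python3
"""Hypothetical page.cgi: prints the page to STDOUT and also saves it as page.html."""
import os
import sys
import tempfile
import time

# Assumption: page.html sits next to page.cgi and the CGI runs with that
# directory as its working directory.
STATIC_FILE = "page.html"


def render_page() -> str:
    """Placeholder for the real (possibly expensive) page generation."""
    return "<html><body><p>Generated at %s</p></body></html>\n" % time.ctime()


def regenerate(static_path: str) -> str:
    """Generate the page, save it to static_path and return the HTML.

    The page goes to a uniquely named temporary file first and is then
    renamed into place, so concurrent requests never see a partial file.
    """
    html = render_page()
    directory = os.path.dirname(os.path.abspath(static_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(html)
    # mkstemp creates the file with mode 0600; make it readable so the
    # server can later serve it as a static file.
    os.chmod(tmp_path, 0o644)
    os.replace(tmp_path, static_path)  # atomic on POSIX within one filesystem
    return html


def main() -> None:
    html = regenerate(STATIC_FILE)
    # CGI response: header lines, a blank line, then the body.
    sys.stdout.write("Content-Type: text/html; charset=utf-8\n\n")
    sys.stdout.write(html)


if __name__ == "__main__":
    main()
```

A cronjob that deletes page.html is then all it takes to have the next request rebuild it.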

Sneeze asked:
> I'm wanting to set up some mirrors of Internet websites on a PC. Can
> anyone point me towards some good resources on how to do this and what
> programs I might need? I've had a quick look at wget and noticed some
> mirroring options, but I'm not sure what to do.

First thing, there are 2 basic approaches to this:
1. Take a snapshot of a site and present the snapshot content on your
   mirror.
2. Use proxy software to dynamically present the other website.

Using 'wget' you will end up with a snapshot of the website.

> Also, is it at all possible to mirror a site that's completely
> scripted or PHP? Would the mirroring process create static HTML pages
> in place of the original pages?

Proxy software will be able to store static versions of pages served
using 'GET /somefile.php HTTP/1.x' for re-distribution when the cache
control mechanisms allow it. This means that for the majority of PHP
scripted pages, the answer is NO. Unless the PHP coder has been
thoughtful enough to specify proxy-friendly cache control parameters
for pages that are non-dynamic, the proxy software will not be able to
serve cached pages from the site.

It may be possible to tell proxy software to disallow access to
non-cacheable content, and hence you would have an interesting mix of
the two: dynamic mirror building of static content. See the following
for details relating to the Apache 2.0 web server (which can do proxy
serving):
http://httpd.apache.org/docs-2.0/mod/mod_proxy.html
http://httpd.apache.org/docs-2.0/mod/mod_disk_cache.html
http://httpd.apache.org/docs-2.0/mod/mod_mem_cache.html
http://httpd.apache.org/docs-2.0/mod/mod_rewrite.html#Solutions

The rewrite recipes quoted above are taken from
http://httpd.apache.org/docs-2.0/misc/rewriteguide.html

Cheers,
Tyson





More information about the Melbwireless mailing list