263 lines
7.4 KiB
Plaintext
263 lines
7.4 KiB
Plaintext
NAME:
|
|
|
|
Snoopy - the PHP net client v1.2.4
|
|
|
|
SYNOPSIS:
|
|
|
|
include "Snoopy.class.php";
|
|
$snoopy = new Snoopy;
|
|
|
|
$snoopy->fetchtext("http://www.php.net/");
|
|
print $snoopy->results;
|
|
|
|
$snoopy->fetchlinks("http://www.phpbuilder.com/");
|
|
print $snoopy->results;
|
|
|
|
$submit_url = "http://lnk.ispi.net/texis/scripts/msearch/netsearch.html";
|
|
|
|
$submit_vars["q"] = "amiga";
|
|
$submit_vars["submit"] = "Search!";
|
|
$submit_vars["searchhost"] = "Altavista";
|
|
|
|
$snoopy->submit($submit_url,$submit_vars);
|
|
print $snoopy->results;
|
|
|
|
$snoopy->maxframes=5;
|
|
$snoopy->fetch("http://www.ispi.net/");
|
|
echo "<PRE>\n";
|
|
echo htmlentities($snoopy->results[0]);
|
|
echo htmlentities($snoopy->results[1]);
|
|
echo htmlentities($snoopy->results[2]);
|
|
echo "</PRE>\n";
|
|
|
|
$snoopy->fetchform("http://www.altavista.com");
|
|
print $snoopy->results;
|
|
|
|
DESCRIPTION:
|
|
|
|
What is Snoopy?
|
|
|
|
Snoopy is a PHP class that simulates a web browser. It automates the
|
|
task of retrieving web page content and posting forms, for example.
|
|
|
|
Some of Snoopy's features:
|
|
|
|
* easily fetch the contents of a web page
|
|
* easily fetch the text from a web page (strip html tags)
|
|
* easily fetch the the links from a web page
|
|
* supports proxy hosts
|
|
* supports basic user/pass authentication
|
|
* supports setting user_agent, referer, cookies and header content
|
|
* supports browser redirects, and controlled depth of redirects
|
|
* expands fetched links to fully qualified URLs (default)
|
|
* easily submit form data and retrieve the results
|
|
* supports following html frames (added v0.92)
|
|
* supports passing cookies on redirects (added v0.92)
|
|
|
|
|
|
REQUIREMENTS:
|
|
|
|
Snoopy requires PHP with PCRE (Perl Compatible Regular Expressions),
|
|
which should be PHP 3.0.9 and up. For read timeout support, it requires
|
|
PHP 4 Beta 4 or later. Snoopy was developed and tested with PHP 3.0.12.
|
|
|
|
CLASS METHODS:
|
|
|
|
fetch($URI)
|
|
-----------
|
|
|
|
This is the method used for fetching the contents of a web page.
|
|
$URI is the fully qualified URL of the page to fetch.
|
|
The results of the fetch are stored in $this->results.
|
|
If you are fetching frames, then $this->results
|
|
contains each frame fetched in an array.
|
|
|
|
fetchtext($URI)
|
|
---------------
|
|
|
|
This behaves exactly like fetch() except that it only returns
|
|
the text from the page, stripping out html tags and other
|
|
irrelevant data.
|
|
|
|
fetchform($URI)
|
|
---------------
|
|
|
|
This behaves exactly like fetch() except that it only returns
|
|
the form elements from the page, stripping out html tags and other
|
|
irrelevant data.
|
|
|
|
fetchlinks($URI)
|
|
----------------
|
|
|
|
This behaves exactly like fetch() except that it only returns
|
|
the links from the page. By default, relative links are
|
|
converted to their fully qualified URL form.
|
|
|
|
submit($URI,$formvars)
|
|
----------------------
|
|
|
|
This submits a form to the specified $URI. $formvars is an
|
|
array of the form variables to pass.
|
|
|
|
|
|
submittext($URI,$formvars)
|
|
--------------------------
|
|
|
|
This behaves exactly like submit() except that it only returns
|
|
the text from the page, stripping out html tags and other
|
|
irrelevant data.
|
|
|
|
submitlinks($URI)
|
|
----------------
|
|
|
|
This behaves exactly like submit() except that it only returns
|
|
the links from the page. By default, relative links are
|
|
converted to their fully qualified URL form.
|
|
|
|
|
|
CLASS VARIABLES: (default value in parenthesis)
|
|
|
|
$host the host to connect to
|
|
$port the port to connect to
|
|
$proxy_host the proxy host to use, if any
|
|
$proxy_port the proxy port to use, if any
|
|
$agent the user agent to masqerade as (Snoopy v0.1)
|
|
$referer referer information to pass, if any
|
|
$cookies cookies to pass if any
|
|
$rawheaders other header info to pass, if any
|
|
$maxredirs maximum redirects to allow. 0=none allowed. (5)
|
|
$offsiteok whether or not to allow redirects off-site. (true)
|
|
$expandlinks whether or not to expand links to fully qualified URLs (true)
|
|
$user authentication username, if any
|
|
$pass authentication password, if any
|
|
$accept http accept types (image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*)
|
|
$error where errors are sent, if any
|
|
$response_code responde code returned from server
|
|
$headers headers returned from server
|
|
$maxlength max return data length
|
|
$read_timeout timeout on read operations (requires PHP 4 Beta 4+)
|
|
set to 0 to disallow timeouts
|
|
$timed_out true if a read operation timed out (requires PHP 4 Beta 4+)
|
|
$maxframes number of frames we will follow
|
|
$status http status of fetch
|
|
$temp_dir temp directory that the webserver can write to. (/tmp)
|
|
$curl_path system path to cURL binary, set to false if none
|
|
|
|
|
|
EXAMPLES:
|
|
|
|
Example: fetch a web page and display the return headers and
|
|
the contents of the page (html-escaped):
|
|
|
|
include "Snoopy.class.php";
|
|
$snoopy = new Snoopy;
|
|
|
|
$snoopy->user = "joe";
|
|
$snoopy->pass = "bloe";
|
|
|
|
if($snoopy->fetch("http://www.slashdot.org/"))
|
|
{
|
|
echo "response code: ".$snoopy->response_code."<br>\n";
|
|
while(list($key,$val) = each($snoopy->headers))
|
|
echo $key.": ".$val."<br>\n";
|
|
echo "<p>\n";
|
|
|
|
echo "<PRE>".htmlspecialchars($snoopy->results)."</PRE>\n";
|
|
}
|
|
else
|
|
echo "error fetching document: ".$snoopy->error."\n";
|
|
|
|
|
|
|
|
Example: submit a form and print out the result headers
|
|
and html-escaped page:
|
|
|
|
include "Snoopy.class.php";
|
|
$snoopy = new Snoopy;
|
|
|
|
$submit_url = "http://lnk.ispi.net/texis/scripts/msearch/netsearch.html";
|
|
|
|
$submit_vars["q"] = "amiga";
|
|
$submit_vars["submit"] = "Search!";
|
|
$submit_vars["searchhost"] = "Altavista";
|
|
|
|
|
|
if($snoopy->submit($submit_url,$submit_vars))
|
|
{
|
|
while(list($key,$val) = each($snoopy->headers))
|
|
echo $key.": ".$val."<br>\n";
|
|
echo "<p>\n";
|
|
|
|
echo "<PRE>".htmlspecialchars($snoopy->results)."</PRE>\n";
|
|
}
|
|
else
|
|
echo "error fetching document: ".$snoopy->error."\n";
|
|
|
|
|
|
|
|
Example: showing functionality of all the variables:
|
|
|
|
|
|
include "Snoopy.class.php";
|
|
$snoopy = new Snoopy;
|
|
|
|
$snoopy->proxy_host = "my.proxy.host";
|
|
$snoopy->proxy_port = "8080";
|
|
|
|
$snoopy->agent = "(compatible; MSIE 4.01; MSN 2.5; AOL 4.0; Windows 98)";
|
|
$snoopy->referer = "http://www.microsnot.com/";
|
|
|
|
$snoopy->cookies["SessionID"] = 238472834723489l;
|
|
$snoopy->cookies["favoriteColor"] = "RED";
|
|
|
|
$snoopy->rawheaders["Pragma"] = "no-cache";
|
|
|
|
$snoopy->maxredirs = 2;
|
|
$snoopy->offsiteok = false;
|
|
$snoopy->expandlinks = false;
|
|
|
|
$snoopy->user = "joe";
|
|
$snoopy->pass = "bloe";
|
|
|
|
if($snoopy->fetchtext("http://www.phpbuilder.com"))
|
|
{
|
|
while(list($key,$val) = each($snoopy->headers))
|
|
echo $key.": ".$val."<br>\n";
|
|
echo "<p>\n";
|
|
|
|
echo "<PRE>".htmlspecialchars($snoopy->results)."</PRE>\n";
|
|
}
|
|
else
|
|
echo "error fetching document: ".$snoopy->error."\n";
|
|
|
|
|
|
Example: fetched framed content and display the results
|
|
|
|
include "Snoopy.class.php";
|
|
$snoopy = new Snoopy;
|
|
|
|
$snoopy->maxframes = 5;
|
|
|
|
if($snoopy->fetch("http://www.ispi.net/"))
|
|
{
|
|
echo "<PRE>".htmlspecialchars($snoopy->results[0])."</PRE>\n";
|
|
echo "<PRE>".htmlspecialchars($snoopy->results[1])."</PRE>\n";
|
|
echo "<PRE>".htmlspecialchars($snoopy->results[2])."</PRE>\n";
|
|
}
|
|
else
|
|
echo "error fetching document: ".$snoopy->error."\n";
|
|
|
|
|
|
COPYRIGHT:
|
|
Copyright(c) 1999,2000 ispi. All rights reserved.
|
|
This software is released under the GNU General Public License.
|
|
Please read the disclaimer at the top of the Snoopy.class.php file.
|
|
|
|
|
|
THANKS:
|
|
Special Thanks to:
|
|
Peter Sorger <sorgo@cool.sk> help fixing a redirect bug
|
|
Andrei Zmievski <andrei@ispi.net> implementing time out functionality
|
|
Patric Sandelin <patric@kajen.com> help with fetchform debugging
|
|
Carmelo <carmelo@meltingsoft.com> misc bug fixes with frames
|