Parsing HTML tags with PHP Simple HTML DOM Parser
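I tried it for parsing HTML pages and it works quite well. It builds a DOM tree, which makes it easy to get at the content inside the HTML, and it is handy for scraping.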
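Below is an example taken from an English article. You can also download the package from SourceForge and look at the examples included in it: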
PHP Simple HTML DOM Parser, written in PHP 5+, lets you manipulate HTML very easily. Because it supports invalid HTML, this parser is better than other PHP scripts that use complicated regexes to extract information from web pages.
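The following small sketch makes the regex point concrete. It assumes PHP 7+ and compares a naive regex with PHP's bundled DOMDocument class. DOMDocument is used here only because it ships with PHP and needs no download. It is not the Simple HTML DOM API, and the sample markup is made up for illustration.

```php
<?php
// Sloppy but common markup: unclosed <p> tags and mixed quote styles.
$html = "<p>One <a href='/a'>A</a><p>Two <a href=\"/b\">B</a>";

// Naive regex: only understands double-quoted href values,
// so it silently misses the single-quoted link.
preg_match_all('/<a href="([^"]*)"/', $html, $matches);
$regexLinks = $matches[1];

// DOM approach (PHP's built-in DOMDocument, for comparison only):
// the parser builds a tree and exposes attributes regardless of quoting.
$previous = libxml_use_internal_errors(true); // collect parse warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$domLinks = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    $domLinks[] = $a->getAttribute('href');
}

echo implode(', ', $regexLinks), "\n"; // /b
echo implode(', ', $domLinks), "\n";   // /a, /b
```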
Before extracting any information, you need to create a DOM from either a URL or a file. The following script extracts links and images from a website:
// Create DOM from URL or file
$html = file_get_html("http://www.microsoft.com/");
// Extract links
foreach($html->find("a") as $element)
echo $element->href . "<br>";
// Extract images
foreach($html->find("img") as $element)
echo $element->src . "<br>";
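For comparison, the same extraction can be sketched with PHP's built-in DOM extension (DOMDocument and DOMXPath) instead of Simple HTML DOM. The helper name extractLinksAndImages is hypothetical, and the sketch assumes PHP 7+. It parses a string rather than fetching a URL, so it runs without network access. It also skips a and img elements that have no href or src attribute.

```php
<?php
/**
 * Hypothetical helper: collect link targets and image sources from an HTML
 * string using PHP's built-in DOM extension (not the Simple HTML DOM API).
 */
function extractLinksAndImages(string $html): array
{
    $previous = libxml_use_internal_errors(true); // tolerate invalid HTML quietly
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Only <a> elements that actually carry an href attribute, in document order.
    $links = [];
    foreach ($xpath->query('//a[@href]') as $a) {
        $links[] = $a->getAttribute('href');
    }

    // Only <img> elements that carry a src attribute.
    $images = [];
    foreach ($xpath->query('//img[@src]') as $img) {
        $images[] = $img->getAttribute('src');
    }

    return ['links' => $links, 'images' => $images];
}

$page = '<a href="/home">Home</a><a name="top">no href</a>'
    . '<img src="logo.png"><a href="/about">About</a>';
$result = extractLinksAndImages($page);

foreach ($result['links'] as $href) {
    echo $href . "<br>\n";
}
foreach ($result['images'] as $src) {
    echo $src . "<br>\n";
}
```

To mirror the original script against a live site, you could pass file_get_contents($url) to the helper, assuming allow_url_fopen is enabled.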
The parser can also be used to modify HTML elements:
// Create DOM from string
$html = str_get_html('<div id="simple">Simple</div><div id="parser">Parser</div>');
$html->find("div", 1)->class = "bar";
$html->find("div[id=simple]", 0)->innertext = "Foo";
// Output: <div id="simple">Foo</div><div id="parser" class="bar">Parser</div>
echo $html;
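For comparison, here is roughly the same edit made with PHP's built-in DOMDocument (not the Simple HTML DOM API), assuming PHP 7+. libxml wraps an HTML fragment in html and body elements, so this sketch serializes only the two div elements to reproduce the output shown above.

```php
<?php
$previous = libxml_use_internal_errors(true); // tolerate invalid HTML quietly
$doc = new DOMDocument();
$doc->loadHTML('<div id="simple">Simple</div><div id="parser">Parser</div>');
libxml_clear_errors();
libxml_use_internal_errors($previous);

$divs = $doc->getElementsByTagName('div');

// Counterpart of find("div", 1)->class = "bar": the second <div> (zero-based index).
$divs->item(1)->setAttribute('class', 'bar');

// Counterpart of setting innertext on div#simple:
// remove its children, then append a new text node.
$xpath = new DOMXPath($doc);
$simple = $xpath->query('//div[@id="simple"]')->item(0);
while ($simple->firstChild !== null) {
    $simple->removeChild($simple->firstChild);
}
$simple->appendChild($doc->createTextNode('Foo'));

// Serialize just the two <div>s, skipping the implied <html><body> wrapper.
$output = '';
foreach ($divs as $div) {
    $output .= $doc->saveHTML($div);
}
echo $output, "\n";
```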
Do you want to retrieve the content without any tags? Read the plaintext property:
echo file_get_html("http://www.yahoo.com/")->plaintext;
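The following is a rough counterpart of plaintext built with PHP's built-in DOM extension, assuming PHP 7+. The function name htmlToText is hypothetical. Its whitespace handling is an assumption and may not match Simple HTML DOM exactly. It removes script and style elements first because textContent would otherwise include their code.

```php
<?php
// Hypothetical helper: approximate ->plaintext with PHP's built-in DOM extension.
function htmlToText(string $html): string
{
    $previous = libxml_use_internal_errors(true); // tolerate invalid HTML quietly
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Remove <script> and <style> elements; textContent would include their code.
    $xpath = new DOMXPath($doc);
    foreach (iterator_to_array($xpath->query('//script | //style')) as $node) {
        $node->parentNode->removeChild($node);
    }

    // Concatenate all remaining text and collapse runs of whitespace.
    return trim(preg_replace('/\s+/', ' ', $doc->documentElement->textContent));
}

echo htmlToText('<div>Hello <b>world</b><script>var x = 1;</script>, bye</div>'), "\n";
```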
The package files of this parser (http://simplehtmldom.sourceforge.net/) include some scraping examples for Digg, IMDb and Slashdot. Let’s create one that extracts the first 10 results (titles only) for the keyword “php” from Google:
$url = "http://www.google.com/search?hl=en&q=php&btnG=Search";
// Create DOM from URL
$html = file_get_html($url);
// Match all "a" tags whose class attribute equals "l"
foreach($html->find("a[class=l]") as $key => $info)
{
    echo ($key + 1).". ".$info->plaintext."<br />\n";
}
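Google's result markup has very likely changed since this was written, so the a[class=l] selector may find nothing today. The following sketch shows the same idea with PHP's built-in DOMXPath, assuming PHP 7+. It treats class as a whitespace-separated token, so class="l r" also matches. It also stops explicitly after $limit results instead of relying on the page holding exactly 10. The helper firstTitlesByClass and the sample markup are hypothetical.

```php
<?php
// Hypothetical helper: text of the first $limit <a> elements whose class list
// contains the token $class. Assumes $class contains no quote characters,
// since it is interpolated directly into the XPath expression.
function firstTitlesByClass(string $html, string $class, int $limit = 10): array
{
    $previous = libxml_use_internal_errors(true); // tolerate invalid HTML quietly
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Token match: class="l r" also counts, unlike an exact @class="l" test.
    $expr = sprintf(
        "//a[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $class
    );

    $titles = [];
    foreach ((new DOMXPath($doc))->query($expr) as $a) {
        if (count($titles) >= $limit) {
            break; // keep only the first $limit results
        }
        $titles[] = trim($a->textContent);
    }
    return $titles;
}

// Made-up markup standing in for a downloaded results page.
$sample = '<a class="l">PHP: Hypertext Preprocessor</a>'
    . '<a class="x">not a result</a>'
    . '<a class="l r">PHP - Wikipedia</a>';

foreach (firstTitlesByClass($sample, 'l') as $key => $title) {
    echo ($key + 1) . '. ' . $title . "<br />\n";
}
```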
NOTE: Make sure to include the parser before using any of its functions:
include "simple_html_dom.php";
For more information on using this parser, see the PHP Simple HTML DOM Parser manual. To download the package files, use the following URL:
[url]