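When I use PHP's file_get_contents(URL) to scrape the main text of a web page, I convert the scraped UTF-8 content to another encoding and print it, and normally the scraped content shows up. Sometimes, though, the output is completely blank. Take the following code: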

$url = "http://www.nbrlw.com/wtrl/2/index.html";
$string = file_get_contents($url);             // fetch the page content
echo iconv('utf-8','gbk',$string);             // convert the encoding and print
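The following is a minimal sketch for confirming the cause. It assumes PHP 7 or later with the iconv extension. According to the PHP manual, iconv() returns false when it meets a character it cannot convert. Because echo prints false as an empty string, this matches the blank output above. The helper name convert_checked() is hypothetical and is not part of PHP.

```php
<?php
// Sketch: detect a failed conversion instead of silently printing nothing.
// convert_checked() is a hypothetical helper used only for illustration.
function convert_checked(string $text, string $from, string $to)
{
    // @ hides the notice iconv raises; we inspect the return value instead.
    $out = @iconv($from, $to, $text);
    if ($out === false) {
        // error_get_last() still records the suppressed notice.
        $err = error_get_last();
        error_log('iconv failed: ' . ($err['message'] ?? 'unknown error'));
    }
    return $out; // converted string, or false on failure
}

// Usage with the original snippet (needs network access, so left commented):
// $string = file_get_contents("http://www.nbrlw.com/wtrl/2/index.html");
// $gbk = convert_checked($string, 'utf-8', 'gbk');
// if ($gbk !== false) { echo $gbk; }
```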

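The output is blank. After a lot of trial and error, I found that the conversion has to skip the characters it cannot convert. That is, change:

echo iconv('utf-8','gbk',$string);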

to:

echo iconv('utf-8','gbk//IGNORE',$string);

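The problem with converting UTF-8 straight to GBK is that when iconv hits a character GBK cannot represent, the conversion breaks off at that point and the content comes out incomplete. In current PHP versions iconv() returns false in this case, and echo prints false as nothing. When I went back to the manual, I found that the target charset passed to iconv accepts two optional suffixes, //TRANSLIT and //IGNORE. With //IGNORE, characters that cannot be converted are skipped. With //TRANSLIT, they are approximated by one or more similar-looking characters.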
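Below is a sketch of a more defensive conversion. It assumes PHP 7 or later with the iconv and PCRE extensions. How //IGNORE behaves depends on the iconv library PHP is built against, and some builds may still report failure even with the suffix. For that reason, the hypothetical helper utf8_to_gbk_ignore() first tries //IGNORE. If that fails, it converts one character at a time and drops any character that fails.

```php
<?php
// Sketch: UTF-8 -> GBK conversion that drops characters GBK cannot represent.
// utf8_to_gbk_ignore() is a hypothetical helper, not a PHP built-in.
function utf8_to_gbk_ignore(string $text)
{
    // First try iconv's //IGNORE suffix (@ hides any notice it raises).
    $out = @iconv('UTF-8', 'GBK//IGNORE', $text);
    if ($out !== false) {
        return $out;
    }
    // Fallback: split into UTF-8 characters and convert each one separately.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return false; // the input is not valid UTF-8
    }
    $result = '';
    foreach ($chars as $ch) {
        $converted = @iconv('UTF-8', 'GBK', $ch);
        if ($converted !== false) {
            $result .= $converted; // keep characters GBK can represent
        }
    }
    return $result;
}
```

To use it in the original snippet, replace the iconv() call with echo utf8_to_gbk_ignore($string);. The per-character fallback is slower, but it only runs when the //IGNORE call fails.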
