Regular expression for matching HTML tag attributes
Goal:
1. Remove every HTML attribute except class, src and href.
For example:
1)
<a href="http://51js.com" title="这是标题" class="a">标题</a>
After removing the attributes:
<a href="http://51js.com" class="a">标题</a>
2)
<td style="color:red" class="b" rowspan="3" colspan="5"> </td>
After removing the attributes:
<td class="b" rowspan="3" colspan="5"> </td>
I'm looking for a regular expression that does this. Thanks.
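For comparison, the request as literally stated (keep a fixed set of attributes, drop everything else) can be sketched in JavaScript with a replace callback instead of one big pattern. This is a minimal sketch, not code from the thread: `keepOnlyAttributes` is a hypothetical helper, and it assumes attribute values are double-quoted, single-quoted, or unquoted without spaces, and that no value contains a `>` character.

```javascript
// Hypothetical helper (not from the thread): keep only the attributes named in `keep`.
// Assumes values are "double-quoted", 'single-quoted', or unquoted without spaces,
// and that no attribute value contains ">".
function keepOnlyAttributes(html, keep) {
  const allowed = new Set(keep.map((name) => name.toLowerCase()));
  // Visit each opening tag; closing tags such as </a> are skipped because "/" is not a letter.
  return html.replace(/<([a-zA-Z][\w-]*)([^>]*)>/g, (_match, tagName, attrs) => {
    // Inside the tag, visit each attribute (with its optional value) and drop it unless allowed.
    const kept = attrs.replace(
      /\s+([^\s=\/>]+)(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+))?/g,
      (attr, name) => (allowed.has(name.toLowerCase()) ? attr : "")
    );
    return `<${tagName}${kept}>`;
  });
}

console.log(keepOnlyAttributes('<a href="http://51js.com" title="这是标题" class="a">标题</a>', ["class", "src", "href"]));
// → <a href="http://51js.com" class="a">标题</a>
console.log(keepOnlyAttributes('<td style="color:red" class="b" rowspan="3" colspan="5"> </td>', ["class", "rowspan", "colspan"]));
// → <td class="b" rowspan="3" colspan="5"> </td>
```

The two examples keep different sets of attributes, which is exactly why the keep list has to be a parameter here.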
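OP, I've only just seen this. In example 1 you want to "remove every attribute except class, src and href", and in example 2 you want to remove every attribute except class, rowspan and colspan. Taking 1 and 2 together, what you actually mean is removing title, style and the like. So I take it your real use case is that most tag attributes should be kept and only a few removed. Then it occurred to me: there are so many possible tag attributes that a pattern listing the ones to keep would inevitably be very long, so why not enumerate the few to remove instead? For example: (?=title|style)[^\s]+=["']?[^"']*["']?(?=\s|>). Below, this regex is built up and refined step by step.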
// A template literal (backticks) lets the test string span lines and contain double quotes.
var str = `
<a href="http://51js.com" title=这是标题 class="a">标题</a>
<td style=color:red class="b" rowspan="3" colspan="5"></td>
<td style="border-right: #d4d0c8; padding-right: 0.75pt; border-top: #d4d0c8;
padding-left: 0.75pt; padding-bottom: 0cm; border-left: windowtext 0.5pt solid;
width: 62pt; padding-top: 0.75pt; border-bottom: black 0.5pt solid; height: 18.75pt;
background-color: transparent" width=83 rowspan="4">
<a href="fdsafd" class="ddd" rowspan="fdsd">
`;
// Remove title=... and style=... attributes, with quoted or unquoted values.
str = str.replace(/(?=title|style)[^\s]+=["']?[^"']*["']?(?=\s|>)/gi, "");
alert(str)
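Next, a fragment of the parent page's source is used as a real-world test of HTML attribute filtering. To illustrate the point, the attributes "class" and "alt" are filtered out as well, on top of title and style.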
<textarea id="txt" style="width:500px;height:500px">
<div class="maintable"><br><div class="subtable nav" style="width:100%">
<span id="forumlist" onmouseover="showMenu(this.id)"><a href="index.php">无忧脚本</a></span>
» <a href="forumdisplay.php?fid=1">JavaScript & VBScript & DHTML 脚本技术讨论版</a> » 求以匹配获取HTML标签属性的正则 表达式</div><br></div>
<div class="maintable">
<table width="100%" cellspacing="0" cellpadding="0" align="center" style="clear: both;">
<tr><td valign="bottom">
<div style="margin-bottom: 4px">
<a href="redirect.php?fid=1&tid=88672&goto=nextoldset" style="font-weight: normal"> ‹‹ 上一主题</a> | <a href="redirect.php?fid=1&tid=88672&goto=nextnewset" style="font-weight: normal">下一主题 ››</a><br>
</div>
</td><td width="40%" align="right" valign="bottom">
<div class="right"> <a href="post.php?action=reply&fid=1&tid=88672&extra="><img src="images/default/reply.gif" border="0" alt="" /></a></div>
<div id="newspecialheader" class="right" onmouseover="showMenu(this.id)"><a
href="post.php?action=newthread&fid=1&extra="
><img src="images/default/newtopic.gif" border="0" alt="" /></a><a href="###"><img src="images/default/newspecial.gif" border="0" alt="" /></a></div>
<div class="popupmenu_popup newspecialmenu" id="newspecialheader_menu" style="display: none">
<table cellpadding="4" cellspacing="0" border="0" width="100%">
<tr><td class="popupmenu_option"><div class="newspecial"><a href="post.php?action=newthread&fid=1&extra=&poll=yes">投票</a></div></td></tr>
<div class="maintable">
</textarea>
<script>
var str = document.getElementById("txt").value;
str = str.replace(/(?=title|style|class|alt)[^\s]+=["']?[^"']*["']?(?=\s|>)/gi, "");
alert(str)
</script>
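Since the only change in this step is a longer list of attribute names, the pattern can also be assembled from an array. The sketch below is an illustrative variant, not the thread's exact regex: `removeAttributes` is a hypothetical helper, and instead of the `(?=title|...)` lookahead it requires whitespace before the name and `=` after it, so a listed name is only matched as a whole attribute name (for example, `subtitle=` is left alone when `title` is listed), and the consumed whitespace means no double spaces are left behind. It assumes values are double-quoted, single-quoted, or unquoted without spaces.

```javascript
// Hypothetical helper (not from the thread): remove the named attributes from HTML markup.
// Assumes values are "double-quoted", 'single-quoted', or unquoted without spaces.
function removeAttributes(html, names) {
  // Escape regex metacharacters so each name is matched literally.
  const list = names.map((n) => n.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")).join("|");
  // Whitespace + listed name + "=" + value. Requiring the leading whitespace and the "="
  // means only whole attribute names match; consuming the whitespace avoids double spaces.
  const re = new RegExp(`\\s(?:${list})\\s*=\\s*(?:"[^"]*"|'[^']*'|[^\\s>]*)`, "gi");
  return html.replace(re, "");
}

const line = '<div class="right"> <a href="post.php?action=reply&fid=1&tid=88672&extra="><img src="images/default/reply.gif" border="0" alt="" /></a></div>';
console.log(removeAttributes(line, ["title", "style", "class", "alt"]));
// → <div> <a href="post.php?action=reply&fid=1&tid=88672&extra="><img src="images/default/reply.gif" border="0" /></a></div>
```

Like the thread's regex at this stage, this version does not yet distinguish tag attributes from text between tags.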
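Now refine the expression above so that when the text between tags contains something like <...>title="..."</...>, that title="..." is not deleted; in other words, only attributes inside HTML tags are filtered.
This is done by adding the negative lookahead (?![^>]*(?=<)), which excludes the innerText between a pair of tags from the matching range: if a < appears before the next >, the current position is in text content rather than inside a tag. The result is /(?![^>]*(?=<))(?=title|style)[^\s]+=["']?[^"']*["']?(?=\s|>)/gi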
<script>
var str = '<a href="http://51js.com" title=这是标题 class="a">标题 title=这是标题 style=color:red</a><td style=color:red class="b" rowspan="3" colspan="5">style=color:red title=这是标题</td>';
str = str.replace(/(?![^>]*(?=<))(?=title|style)[^\s]+=["']?[^"']*["']?(?=\s|>)/gi, "");
alert(str)
</script>
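The innerText guard can be combined with a list of attribute names in the same way. This is a minimal sketch, not the thread's code: `removeTagAttributes` is a hypothetical helper, and it writes the guard as `(?![^>]*<)`, which is equivalent to the thread's `(?![^>]*(?=<))` because inside a negative lookahead it makes no difference whether the `<` is consumed. Like the previous variants, it assumes quoted or space-free unquoted values.

```javascript
// Hypothetical helper (not from the thread): remove the listed attributes, but only inside tags.
function removeTagAttributes(html, names) {
  const list = names.map((n) => n.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")).join("|");
  // (?![^>]*<)  skip any position where a "<" appears before the next ">",
  //             i.e. positions in the text between tags rather than inside a tag.
  // \s(?:...)=  a listed attribute name preceded by whitespace, followed by its value.
  const re = new RegExp(
    `(?![^>]*<)\\s(?:${list})\\s*=\\s*(?:"[^"]*"|'[^']*'|[^\\s>]*)`,
    "gi"
  );
  return html.replace(re, "");
}

const sample =
  '<a href="http://51js.com" title=这是标题 class="a">标题 title=这是标题 style=color:red</a>' +
  '<td style=color:red class="b" rowspan="3" colspan="5">style=color:red title=这是标题</td>';
console.log(removeTagAttributes(sample, ["title", "style"]));
// → <a href="http://51js.com" class="a">标题 title=这是标题 style=color:red</a><td class="b" rowspan="3" colspan="5">style=color:red title=this是标题</td>
```

The guard only looks for the next `<` or `>`, so an attribute value that itself contains `>` (for example `title="a>b"`) can still confuse it.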