igor (ico) wrote,

Parsing a large XML file in Perl

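Significantly faster than XML::Parser, XML::Simple, XML::LibXML, and XML::DOM: the script reads the file one </content>-terminated record at a time and pulls out the <ip> and <ipSubnet> values with regular expressions instead of building a parse tree.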

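#!perl
use strict;
use warnings;

my $xmlfile = shift;
die "Usage: $0 file.xml\n" unless defined $xmlfile;
die "Cannot find file \"$xmlfile\"\n" unless -f $xmlfile;
open(my $xml, "<", $xmlfile) or die "Cannot open \"$xmlfile\": $!\n";
{
    # Read one record per iteration: each read ends at a closing </content> tag.
    local $/ = '</content>';
    while (<$xml>) {
        # (?:\s[^>]*)? allows optional attributes on the tag.
        # The original <ip*> only made the letter "p" repeatable.
        while (/<ip(?:\s[^>]*)?>(.+?)<\/ip>/ig) {
            print "$1\n";
        }
        while (/<ipSubnet(?:\s[^>]*)?>(.+?)<\/ipSubnet>/ig) {
            print "$1\n";
        }

        # Blank line between records.
        print "\n";
    }
}
close($xml);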
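
The record-at-a-time reading can be tried without a real file. The sketch below is only an illustration: it assumes two made-up <content> records held in a string (the attribute, domains, and addresses are hypothetical) and reads them through an in-memory filehandle. The helper name extract_records is not from the script above.

```perl
#!perl
use strict;
use warnings;

# Hypothetical sample data: two <content> records, not taken from a real file.
my $sample = <<'XML';
<content id="1"><domain>example.org</domain><ip>192.0.2.1</ip><ip>192.0.2.2</ip></content>
<content id="2"><domain>example.net</domain><ipSubnet>198.51.100.0/24</ipSubnet></content>
XML

# Return one array ref of extracted values per record read from $fh.
sub extract_records {
    my ($fh) = @_;
    my @records;
    local $/ = '</content>';    # each read now ends at a closing tag
    while (my $chunk = <$fh>) {
        # Same order as the script: all <ip> values, then all <ipSubnet> values.
        my @found = (
            $chunk =~ /<ip(?:\s[^>]*)?>(.+?)<\/ip>/ig,
            $chunk =~ /<ipSubnet(?:\s[^>]*)?>(.+?)<\/ipSubnet>/ig,
        );
        push @records, \@found if @found;
    }
    return @records;
}

# A reference to a scalar opens the string as an in-memory file.
open(my $fh, '<', \$sample) or die "Cannot open in-memory file: $!";
my @records = extract_records($fh);
close($fh);

print join(' ', @$_), "\n" for @records;
```

Because only one record is held in memory at a time, memory use should depend on the largest record rather than on the size of the whole file.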

<lj-cut>
Checking against grani.ru:
- d:grani.ru ip:95.211.178.194 ip:166.78.44.193 ip:192.237.193.101 ip:166.78.40.126 ip:209.61.166.159 ip:209.61.166.246
ip:192.237.192.230 ip:209.61.166.64 ip:209.61.166.165 ip:209.61.166.4 ip:161.47.4.47 ip:23.253.121.44 ip:161.47.4.175
ip:161.47.4.248 ip:161.47.5.110 ip:161.47.5.166 ip:161.47.5.217 ip:162.242.141.164 ip:104.130.254.221 ip:104.239.245.140
ip:104.239.247.103 ip:172.99.100.79 ip:161.47.4.162 ip:174.143.186.112 ip:161.47.4.174 ip:66.216.67.114 ip:161.47.4.48
ip:161.47.5.142 ip:161.47.5.172 ip:161.47.5.163 ip:161.47.4.19 ip:174.143.186.218 ip:161.47.4.188 ip:209.61.166.43 ip:209.61.166.28
ip:162.209.66.206 ip:166.78.44.37 ip:66.216.109.184 ip:166.78.45.247 ip:161.47.4.80 ip:166.78.40.252 ip:161.47.5.183
ip:66.216.67.40 ip:104.239.245.37 ip:166.78.40.84 ip:161.47.7.59 ip:104.130.254.249 ip:104.239.247.210 ip:172.99.100.95
ip:161.47.6.150 ip:166.78.40.123 ip:66.216.68.28 ip:161.47.5.39 ip:104.130.144.187 ip:148.62.2.165 ip:104.239.220.103
ip:161.47.16.237 ip:148.62.4.51 ip:161.47.17.78 ip:161.47.17.182 ip:148.62.5.136 ip:148.62.5.188 ip:161.47.17.228 ip:148.62.0.118
ip:148.62.0.187 ip:161.47.18.82 ip:161.47.18.142 ip:148.62.2.12 ip:148.62.2.77 ip:161.47.19.33 ip:148.62.3.24 ip:148.62.4.172
ip:161.47.20.221 ip:161.47.21.100 ip:148.62.2.178 ip:148.62.3.19 ip:148.62.4.105 ip:148.62.5.241 ip:161.47.16.178 ip:161.47.16.198
ip:161.47.19.60 ip:148.62.5.165 ip:161.47.19.123 ip:148.62.4.219 ip:161.47.20.22 ip:148.62.2.60 ip:161.47.20.84 ip:148.62.3.249
ip:161.47.21.126 ip:148.62.4.250 ip:161.47.16.10 ip:161.47.16.140
...
etc.
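
The listing above puts a domain and its addresses on one line as d:... ip:..., which the script as posted does not print. A rough sketch of a variant that could produce that layout follows. It assumes each record carries a <domain> element; that element name, the helper name format_record, and the sample record are all assumptions, not confirmed by the post.

```perl
use strict;
use warnings;

# Format one record as "d:<domain> ip:<addr> ip:<addr> ...".
# The <domain> element name is an assumption, not confirmed by the post.
sub format_record {
    my ($chunk) = @_;
    my @parts;
    push @parts, "d:$1"  while $chunk =~ /<domain(?:\s[^>]*)?>(.+?)<\/domain>/ig;
    push @parts, "ip:$1" while $chunk =~ /<ip(?:\s[^>]*)?>(.+?)<\/ip>/ig;
    return join(' ', @parts);
}

# Hypothetical record using documentation-only names and addresses.
my $record = '<content><domain>example.org</domain><ip>192.0.2.1</ip><ip>192.0.2.2</ip></content>';
print format_record($record), "\n";
```

In the main loop this would replace the two inner while loops, with print format_record($_), "\n" called once per record.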

</lj-cut>
Tags: perl