[Omaha.pm] Another 10m ad-hoc report
I love that it takes longer to explain what I'm doing and why than to
actually do it in Perl. :)
The Swiss army chainsaw of text processing, baby. :)
j
Project:
Given a file that looks like this:
2006-07-14 09:12:59|97036502|NYCBER|GNRSPE|1170245141
2006-07-14 09:12:59|97036503|CRPBFT|GNRSPE|1450000001
CRPBFT|GNRSPE|1450000001|L||2007173547||DMC|2006-07-14 09:17:08.27300|0|0|PROCRPBFTACT-2007173547ITN-6COD-12PMFRD-20060714000000TOD-20060716000000AMT-0STA-A
1) Ignore all lines that don't start with "2006"
2) Ignore all lines that don't contain "GRMSTR"
3) In the remaining lines:
Column 2 (counting from 0) is "prop".
Column 4 (counting from 0) is "message_grp".
Per prop, tell me the number of lines, and the number of unique
message_grp's.
Solution:
$ cat j.pl
use strict;
use warnings;

my %count;
while (<>) {
    next unless /^2006/;     # rule 1: only lines starting with "2006"
    next unless /GRMSTR/;    # rule 2: only GRMSTR records
    chomp;
    my @l = split /\|/;
    $count{$l[2]}{keys}{$l[4]} = 1;   # hash keys de-duplicate message_grp
    $count{$l[2]}{lines}++;
}
foreach my $prop (sort keys %count) {
    my $lines = $count{$prop}{lines};
    my $keys  = scalar(keys %{$count{$prop}{keys}});
    print "$prop sent $lines GRMSTR records containing $keys unique message_grp's\n";
}
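The trick in the script is the hash of hashes: per prop, one counter for lines and one inner hash whose keys are the message_grp values, so duplicates collapse for free. Here is a rough sketch of the same idea wrapped in a sub so it can be tried on in-memory lines. The sub name summarize and the sample lines are made up for illustration; they are not from the real log.

```perl
use strict;
use warnings;

# Hypothetical helper: takes log lines and returns a hash ref of
# { prop => { lines => N, unique => M } }.
sub summarize {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        next unless $line =~ /^2006/;     # rule 1
        next unless $line =~ /GRMSTR/;    # rule 2
        chomp(my $copy = $line);
        my @f = split /\|/, $copy;
        my ($prop, $grp) = @f[2, 4];      # columns 2 and 4, counting from 0
        $count{$prop}{lines}++;
        $count{$prop}{keys}{$grp} = 1;    # same key twice is still one key
    }
    my %summary;
    for my $prop (keys %count) {
        $summary{$prop} = {
            lines  => $count{$prop}{lines},
            unique => scalar keys %{ $count{$prop}{keys} },
        };
    }
    return \%summary;
}

# Invented sample lines, shaped like the log format above.
my @sample = (
    "2006-07-14 09:12:59|1|NYCBER|GRMSTR|100\n",
    "2006-07-14 09:13:00|2|NYCBER|GRMSTR|100\n",   # repeated message_grp
    "2006-07-14 09:13:01|3|NYCBER|GRMSTR|200\n",
    "2006-07-14 09:13:02|4|CRPBFT|GNRSPE|300\n",   # not GRMSTR, skipped
    "CRPBFT|GRMSTR|400|L\n",                       # no leading 2006, skipped
);

my $s = summarize(@sample);
for my $prop (sort keys %$s) {
    print "$prop: $s->{$prop}{lines} lines, $s->{$prop}{unique} unique message_grp's\n";
}
# prints: NYCBER: 3 lines, 2 unique message_grp's
```

Three NYCBER lines survive the two filters, but only two distinct message_grp values (100 and 200) end up as keys in the inner hash.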
Result:
$ cat libqumv.log | perl j.pl
ATLCNN sent 37 GRMSTR records containing 37 unique message_grp's
AUSCTR sent 28 GRMSTR records containing 28 unique message_grp's
...etc...