[Omaha.pm] Tweak the Perl regex engine: assign to pos()
http://headrattle.blogspot.com/search/label/perl
OK, Perl is way too cool.
I was minding my own business, searching for every occurrence of
'CCAGC' in the E. coli genome, when I hit a snag. Several hundred of
my known locations weren't showing up.
Why? Because the Perl regular expression engine, by default, starts
searching for the next occurrence of something after the end of the
occurrence it just found. This is what most humans want. But you may
notice that in the string 'CCAGCCAGC' the thing I'm searching for
('CCAGC') overlaps itself, so the regex engine doesn't see the second
one.
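
Here is a minimal sketch of that default behavior, using the short
example string from above rather than the real genome. The variable
names are just for illustration.

```perl
use strict;
use warnings;

my $seq = 'CCAGCCAGC';

# In list context, /g returns every match, but each new search starts
# after the end of the previous match, so overlapping hits are skipped.
my @hits  = $seq =~ /CCAGC/g;
my $count = scalar @hits;

print "Default /g scan found $count match(es)\n";
```

Only the first 'CCAGC' is reported, even though a second one starts at
the fifth base.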
"Crap," I thought.
But this is Perl -- maybe there's a way? 30 seconds in the
documentation (perldoc -f pos) and it said I could assign to pos().
Really? Sweet! Problem solved!
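#!/usr/bin/perl
use strict;
use warnings;

# Read the sequence; this assumes the whole genome is on the first line.
open(my $in, '<', 'E_coli.seq') or die "Can't open E_coli.seq: $!";
my $seq = <$in>;
chomp $seq;
close $in;

my $find_this = 'CCAGC';
while ($seq =~ /\Q$find_this\E/g) {
    # pos() is the 0-based offset just past the match, which is also the
    # 1-based position of the last matched base.
    my $stop  = pos($seq);
    my $start = $stop - length($find_this) + 1;    # 1-based start
    # Resume one base past the start of this match so overlaps are found.
    pos($seq) = $start;
    print " Found $find_this at [$start..$stop]\n";
}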
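
For comparison, here is a sketch of another way to get overlapping
hits without assigning to pos(): wrap the pattern in a zero-width
lookahead. This is not what the script above does, and the helper name
overlapping_starts is made up for this example. It reports the same
1-based start positions.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns the 1-based
# start position of every occurrence of $motif in $seq, overlaps included.
sub overlapping_starts {
    my ($seq, $motif) = @_;
    my @starts;
    # The lookahead matches without consuming any characters, so each
    # match is zero-length and the engine moves on one position at a time.
    while ($seq =~ /(?=\Q$motif\E)/g) {
        push @starts, pos($seq) + 1;    # pos() is the 0-based match start here
    }
    return @starts;
}

my @starts = overlapping_starts('CCAGCCAGC', 'CCAGC');
print "Found at: @starts\n";    # prints "Found at: 1 5"
```

Assigning to pos() keeps the pattern itself untouched and makes the
"back up and retry" step explicit. The lookahead version trades that
for a slightly odder-looking regex.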