
[Omaha.pm] Tweak the Perl regex engine: assign to pos()



http://headrattle.blogspot.com/search/label/perl


OK, Perl is way too cool.

I was minding my own business, searching for every occurrence of 'CCAGC' in E. coli, when I hit a snag. Several hundred of my known locations weren't showing up.

Why? Because with the /g modifier the Perl regular expression engine, by default, resumes searching right after the end of the match it just found. This is what most humans want. But you may notice that in the string 'CCAGCCAGC' the thing I'm searching for ('CCAGC') overlaps itself: the second copy starts on the last C of the first, so the regex engine never sees it.
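
To see that default behavior in isolation, here is a minimal sketch. The helper name count_non_overlapping is made up for illustration and is not part of my script; it just counts what a plain /g scan finds.

```perl
use strict;
use warnings;

# Hypothetical helper: count matches the way /g scans by default,
# where each search resumes at the end of the previous match.
sub count_non_overlapping {
    my ($seq, $motif) = @_;
    # Assigning the list of matches to () and then to a scalar yields the count.
    my $count = () = $seq =~ /\Q$motif\E/g;
    return $count;
}

# Two copies of CCAGC share the middle C, but only the first is reported.
print count_non_overlapping('CCAGCCAGC', 'CCAGC'), "\n";   # prints 1
```

The second copy begins at the fifth base, which the engine has already consumed as the tail of the first match.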

"Crap," I thought.

But this is Perl -- maybe there's a way? 30 seconds in the documentation (perldoc -f pos) and it said I could assign to pos(). Really? Sweet! Problem solved!


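#!/usr/bin/perl

use strict;
use warnings;

# Read the sequence; this reads only the first line of the file.
open(my $in, '<', 'E_coli.seq') or die "Can't open E_coli.seq: $!";
my $seq = <$in>;
chomp $seq;
close $in;

my $find_this = 'CCAGC';
while ($seq =~ /\Q$find_this\E/g) {
   # pos() is the 0-based offset just past the match,
   # which equals the 1-based position of its last base.
   my $stop  = pos($seq);
   my $start = $stop - length( $find_this ) + 1;   # 1-based position of the first base
   # Resume one base after the match start instead of after its end.
   # Read as a 0-based offset, $start points at the second base of the match.
   pos($seq) = $start;
   print "   Found $find_this at [$start..$stop]\n";
}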
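
For comparison, a zero-width lookahead seems to get the same overlapping hits without assigning to pos(). This is a different technique from the one above, sketched under my own assumptions: the helper name overlapping_starts is hypothetical, and it returns only 1-based start positions.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the script above): returns the 1-based
# start position of every occurrence of $motif, overlaps included.
sub overlapping_starts {
    my ($seq, $motif) = @_;
    my @starts;
    # (?=...) matches without consuming any characters, so pos() stays at
    # the start of each hit. Perl then refuses a second empty match at that
    # same spot and moves on to later offsets.
    while ($seq =~ /(?=\Q$motif\E)/g) {
        push @starts, pos($seq) + 1;
    }
    return @starts;
}

print join(', ', overlapping_starts('CCAGCCAGC', 'CCAGC')), "\n";   # prints 1, 5
```

The stop position, if you need it, is the start plus the motif length minus one.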