
[Omaha.pm] pos() - WHERE did my regex match?



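pos() is neat. Rarely do I care WHERE a regex matched in a string, but in the example below I care, very deeply, WHERE the hits were. Enter pos(), which returns the offset where the last m//g match left off (i.e. where the next attempt will start).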

The part of my code that uses pos():

while ($seqstr =~ /$primer_seq/g) {
    # pos() is the 0-based offset just past the match, so +1 makes it 1-based
    printf(" Found '%s'. Next attempt at character %s\n", $&, pos($seqstr)+1);
}

Yoinked from the "Finding All Matches In a String" section of this page:
 http://www.regular-expressions.info/perl.html

That page is actually more helpful than "perldoc -f pos".
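If you want to poke at pos() without BioPerl or a GenBank file, here is a minimal self-contained sketch of the same loop. The sequence is made up for illustration (two copies of the I2 primer with some padding), and the @hits array is just scaffolding for the example. It also checks the @- and @+ match-offset arrays, which hold the same start/end information without the length() arithmetic.

```perl
use strict;
use warnings;

# Made-up test data: the I2 primer from the script, embedded twice.
my $primer = "GCATCGATGAAGAACGCAGC";
my $seqstr = "TT" . $primer . "TT" . $primer;

my @hits;    # hypothetical: collects 1-based [start, stop] pairs
# In scalar context, m//g resumes from pos($seqstr) and, on success,
# sets pos() to the 0-based offset just past the match.
while ($seqstr =~ /\Q$primer\E/g) {
    my $stop  = pos($seqstr);                  # 1-based position of last matched char
    my $start = $stop - length($primer) + 1;   # 1-based position of first matched char
    # $-[0] is the 0-based start of the match, $+[0] the 0-based offset
    # just past it (the same value pos() returns here).
    die "offset mismatch" unless $start == $-[0] + 1 && $stop == $+[0];
    push @hits, [$start, $stop];
    print "found at [$start..$stop]\n";
}
```

This prints "found at [3..22]" and then "found at [25..44]".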

I end up Googling for this about once a year.  :)

Cheers,

j




primer_finder.pl

#!/usr/bin/perl
use strict;
use warnings;

use Bio::SeqIO;

# A hash of all our known primers...
my %primers;
$primers{"18S_F"} = uc("attggagggcaagtctggtg");
$primers{"18S_R"} = uc("ctatgccgactagggatcgg");
$primers{"M1"} = "GGAAGTAAAAGTCGTAACAAGGTT";
$primers{"I1"} = "CCGTAGGTGAACCTGCG";
$primers{"I4"} = "GCATATCAATAAGCGGAGGA";
$primers{"H2R8"} = "CCTCGGATCAGGTAGGGATAC";
$primers{"I2"} = "GCATCGATGAAGAACGCAGC";
$primers{"I3"} = "CGAGTCTTTGAACGCACATTG";

my $io = Bio::SeqIO->new(
  #-file => '/home/dbastola/genbakDownload/161_88107/gbbct24.seq',
  -file => 'fake_data.gbk',
  -format => 'genbank'
);

while (my $seq = $io->next_seq) {
  # $seq is now a Bio::Seq object
  my $acc = $seq->accession;
  my $seqstr = uc($seq->seq);
  print "Searching $acc...\n";
  # sort so the primers are searched in the same order on every run
  foreach my $primer_name (sort keys %primers) {
     my $primer_seq = $primers{$primer_name};
     print "   looking for $primer_name ($primer_seq)...\n";
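     # \Q...\E treats the primer as a literal string, not a regex
     while ($seqstr =~ /\Q$primer_seq\E/g) {
        # pos() is the 0-based offset just past the match, which is also
        # the 1-based position of the last matched character
        printf(" Found '%s'. Next attempt at character %s\n", $&, pos($seqstr)+1);
        my $start = pos($seqstr) - length( $primer_seq ) + 1;
        my $stop = pos($seqstr);
        print "   Hey, I found $primer_name at [$start..$stop]\n";
     }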
     }
  }

}
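One thing to watch with that loop: after each hit, pos() sits just past the primer, so the next attempt can never start inside a previous hit, and overlapping occurrences are skipped. With primers like these that may never bite, but if it ever matters, a zero-width lookahead is the usual workaround: it consumes nothing, so pos() stays at the start of each hit, and Perl steps forward one character before trying again. A minimal sketch; the helpers find_all_overlapping and find_all_plain are hypothetical names for illustration, not part of the script above.

```perl
use strict;
use warnings;

# Hypothetical helper: 1-based [start, stop] pairs for every occurrence of
# $needle in $haystack, including overlapping ones.
sub find_all_overlapping {
    my ($haystack, $needle) = @_;
    my @hits;
    # The lookahead matches an empty string, so pos() is the 0-based start
    # of each hit; Perl refuses a second empty match at the same spot and
    # moves ahead one character instead.
    while ($haystack =~ /(?=\Q$needle\E)/g) {
        my $start = pos($haystack) + 1;             # 1-based first character
        my $stop  = $start + length($needle) - 1;   # 1-based last character
        push @hits, [$start, $stop];
    }
    return @hits;
}

# Hypothetical helper: the plain m//g loop from the script, for comparison.
sub find_all_plain {
    my ($haystack, $needle) = @_;
    my @hits;
    while ($haystack =~ /\Q$needle\E/g) {
        my $stop = pos($haystack);   # just past the hit
        push @hits, [$stop - length($needle) + 1, $stop];
    }
    return @hits;
}

my @overlap = find_all_overlapping("AAAA", "AA");
my @plain   = find_all_plain("AAAA", "AA");
print "overlapping: ", join(" ", map { sprintf("[%d..%d]", @$_) } @overlap), "\n";
print "plain:       ", join(" ", map { sprintf("[%d..%d]", @$_) } @plain), "\n";
```

On "AAAA" the lookahead version reports [1..2] [2..3] [3..4], while the plain loop reports only [1..2] [3..4].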