
[Omaha.pm] Fwd: [Bioperl-l] Bio::SeqIO -- add an ugly but fast grep hack?




It's humbling when people reduce 18 lines of my code down to a command-line one-liner. :)

j


Begin forwarded message:
From: Jay Hannah <jay@jays.net>
Date: September 29, 2006 7:49:53 AM CDT
To: bioperl-l@bioperl.org
Cc: kiran bina <kiranbina@gmail.com>
Subject: Re: [Bioperl-l] Bio::SeqIO -- add an ugly but fast grep hack?

On Sep 14, 2006, at 10:58 AM, Amir Karger wrote:
From: Chris Fields [mailto:cjfields@uiuc.edu]
{
    local $/ = "//\n";
    while (my $gb = <>) {
        print $gb if $gb =~ m/Staphylococcus\sepidermidis/im;
    }
}

Perl Golf! (Untested, as all good Perl Golf should be.)

perl -wne 'BEGIN {$/="//\n"} print if /Staphylococcus\sepidermidis/im' \
    blah.gb > filtered.gb

Wow. You guys are amazing.

My version was a lot longer (Reverse Perl Golf!!):

   my @files = @{$self->{files}};
   foreach my $file (@files) {
      open(my $in, '<', $file) or die "Can't open $file: $!";
      my $locus;
      while (<$in>) {
         if (/^LOCUS/) {
            # A locus has begun.
            $locus = $_;
         } elsif (/^\/\//) {
            # A locus ends.
            $locus .= $_;
            if ($locus =~ /$args{grep}/s) {
               print OUT $locus;
            }
         } else {
            # A row inside a locus.
            $locus .= $_;
         }
      }
      close $in;
   }

I'm playing with an abstraction layer I'm calling "OpenLab". Here the grep() method does the work:

my $ol  = OpenLab->new();
# Load up just the "ATCC 12228" sequences from a directory...
my $ss1 = $ol->new_SequenceSet(name => "Organism1");
$ss1->load(files => "$data_dir/*");
$ss1->grep(
   grep    => "ATCC 12228",
   storage => $data_tmp
);

I'll be replacing the implementation inside my class with your wizardry.
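
A rough sketch of what that replacement might look like, using the same record-separator trick as the one-liner. The sub name grep_genbank and its argument order are made up for illustration; this is not OpenLab's actual grep() and not anything from BioPerl:

```perl
use strict;
use warnings;

# Hypothetical helper (name and signature are illustrative only):
# print every GenBank record in @files that matches $pattern to the
# filehandle $out, and return how many records were printed.
sub grep_genbank {
    my ($pattern, $out, @files) = @_;
    my $re    = qr/$pattern/s;
    my $count = 0;

    # GenBank records end with a "//" line, so treat that as the input
    # record separator and read one whole record per <$in>.
    local $/ = "//\n";

    for my $file (@files) {
        open(my $in, '<', $file) or die "Can't open $file: $!";
        while (my $record = <$in>) {
            next unless $record =~ $re;
            print {$out} $record;
            $count++;
        }
        close $in;
    }
    return $count;
}
```

It could be called as grep_genbank("ATCC 12228", \*STDOUT, glob("$data_dir/*")). One difference from my longer loop: anything in a file before the first LOCUS line ends up as part of the first record, since records are split only on the "//" terminator.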

Thanks!

j
12 years of Perl later, still learning new tricks.  :)