Re: [Omaha.pm] "Command-Line Bioinformatics"

To: "Perl Mongers of Omaha, Nebraska USA" <omaha-pm@pm.org>
Subject: Re: [Omaha.pm] "Command-Line Bioinformatics"
From: "kiran bina" <kiranbina@gmail.com>
Date: Wed, 28 Feb 2007 09:07:55 -0600
Delivered-to: mailman-omaha-pm@mailman.pm.dev
Delivered-to: omaha-pm@pm.org
In-reply-to: <F1B665E9-138C-43D2-B2C1-F891433C1B8C@jays.net>
References: <F7C1E903-1712-40A5-B817-8CDAADECEBF4@jays.net> <F1B665E9-138C-43D2-B2C1-F891433C1B8C@jays.net>
Reply-to: "Perl Mongers of Omaha, Nebraska USA" <omaha-pm@pm.org>

I am glad I am using Perl and not Python or anything else. Thanks to Bob, who got me started with Perl, and to Jay, who has been helping me with BioPerl every step of the way. :-)

On 2/28/07, Jay Hannah <jay@jays.net> wrote:

Reading this article:
http://www.linuxjournal.com/article/6977
Sequencing the SARS Virus - Linux Journal, Nov 2003

This guy needs Perl and/or BioPerl.  :)

> The sequence file is in FASTA format consisting of a header line
> and the sequence, split into fixed-width lines. The following
> counts the number of Gs and Cs in the sequence and presents the
> total as a fraction of the total number of bases:
>
> > grep -v "^>" AY274119.fa | fold -w 1 |
> tr "ATGC" "..xx" | sort | uniq -c |
> sed 's/[^0-9]//g' | tr -s "\012" " " |
> sed 's/\([0-9]*\) \([0-9]*\)/scale = 3; \2 \/ (\1+\2)/' |
> bc -i
> scale = 3; 12127 / (17624+12127)
> .407
>
> Out of the 29,751 bases in our sequence, 12,127 are either G or C,
> giving a GC content of 41%.

BioPerl version:

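use strict;
use warnings;
use Bio::SeqIO;

# Open the FASTA file and read its first (only) record.
my $io = Bio::SeqIO->new(
  -file   => 'AY274119.fa',
  -format => 'Fasta'
);
my $seq = $io->next_seq->seq;

# tr/GC// counts G and C without modifying $seq.
print( ($seq =~ tr/GC//) / length($seq), "\n" );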

Command-line Perl:

perl -e '$/ = undef; $_ = <>; s/>.*//; s/\n//g; print tr/GC/GC/ / length($_)' AY274119.fa
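
Both versions above assume a single record in upper case. A slightly more defensive take, as a minimal sketch only: gc_fraction is a hypothetical helper name (not from BioPerl or the article), it pools every record in the file into one total, and it counts any non-G/C character (N, ambiguity codes) in the denominator.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): GC fraction of a FASTA string.
# Lines starting with ">" are headers; everything else is sequence.
sub gc_fraction {
    my ($fasta) = @_;
    my $seq = '';
    for my $line (split /\n/, $fasta) {
        next if $line =~ /^>/;      # skip header lines
        $line =~ s/\s+//g;          # drop stray whitespace, including \r
        $seq .= uc $line;           # normalise case so "g" and "c" count too
    }
    return unless length $seq;      # no sequence: avoid dividing by zero
    my $gc = ($seq =~ tr/GC//);     # tr counts without modifying $seq
    return $gc / length $seq;
}

# Usage: perl gc.pl AY274119.fa
if (@ARGV) {
    local $/;                       # slurp mode: read the whole input at once
    my $fasta = <>;
    my $frac  = gc_fraction($fasta);
    printf "%.3f\n", $frac if defined $frac;
}
```

Note that printf rounds to three places, whereas bc with scale = 3 truncates, so the last digit can differ from the article's figure.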

I'm sure you can Perl Golf my stabs at it.  :)

j
seqlab.net
http://www.bioperl.org/wiki/User:Jhannah


--
Dhundy R. Bastola
Assistant Professor
Department of Pediatrics
University of Nebraska Medical Center
Omaha NE 68198
Always reply to: dbastola@unmc.edu
