Announcing scribl-rails
Sometimes back I had mentioned about scribl javascript framework for drawing bioinformatics glyphs on HTML5 canvas. If you are a Rails developer you will be happy to know that I have written scribl-rails, an asset helper for including scribl in your application asset pipeline.
Usage
Add the following to your gemfile:
gem 'scribl-rails'
ran bundle install from the application directory
Add the following directive to your Javascript manifest file (application.js):
//= require scribl
Enjoy using scribl-rails and creating cute bio-graphics! Many thanks to Chase Miller for the awesome library!
For more information and development check out the scribl-rails at github
Goodbye Steve Jobs
Use Scribl to draw genomic glyphs on HTML5 canvas
The Scribl library by Chase Miller is an awesome and a promising Javascript library for visualizing biological sequence information and rendering it on the web. Scribl generates biological charts of genomic regions, alignments, and assembly data. The library is under continuous development and I have been able to use it for some internal projects!
A very nice list of examples and introduction is available at the home page and the wiki provides an elaborate user guide!
Happy biology!
Keep track of Bioruby plugins
Biogems.info is a new site for keeping track of new and existing Bioruby plugins. Plugins are separate code libraries that split functionality out of the Bioruby main tree. The idea is to have a core Bioruby release and to allow Ruby developers to contribute to Bioruby through plugins. According to Bonnal, the maintainer of biogem (the bio-plugin crafting tool), plugins are separately maintained and may represent experimental or work in progress.
To read more about Bioruby plugin system please refer to the wiki page on plugins.
Happy biology!
Bioruby 1.4.2 released!
The Bioruby development team has continued to work tirelessly to bring us the latest release of the Ruby bioinformatics library commonly referred to as bioruby. A list of all the new changes is available here . One of the most pleasant news for beginners is that the Bioruby tutorial has been updated thanks to Michael O’Keefe and Pjotr Prins. The Release is largely a bug fix release with updates on web services from SOAP to REST interfaces. Upgrading to the latest release is easy…
gem update bio
or
gem install bio
Happy biology!
Processing netMHCII-pan prediction output
Like most informatics throughput methods, epitope prediction generates a lot of output and in a not so friendly format suitable for subsequent analysis. I considered writing a parser for the output using Ruby, but would that not take long? A simple vim function that I added to my .vimrc file to format the output and use a single keystroke worked the magic and saved time.
" formating output from netMHCII-pan program function! FormatNetmhcOutput() g/^\#/norm dd g/^--/norm dd g/^Protein/norm dd %le g/^pos/norm dd %s/<=\sWB//g %s/<=\sSB//g %s/\s\+$// %s/\s\+/,/g g/^$/d endfunction nmap ;h :call FormatNetmhcOutput()
This function can be called by pressing the ; and h key when in normal mode. It removes comments and provides a csv output that can be read with a simple R directive.
data <– read.csv("file.csv")
Convert a fastA file to a hash
Sometimes you might want to convert a file of fastA sequences to a hash. Here is a one line method that might come handy for that.
require 'bio' file_path = "example.fasta" def fasta_to_hash Bio::FlatFile.auto(file_path){ |f| f.map {|entry| Hash.[](entry.definition.to_sym,[entry.seq.to_s])}} end #=>[{:"seq1"=>["gatataggagatatcgttagag"]}]
The result is an array of hashes. Each hash key corresponds to the sequence name
My general purpose bioinformatics toolbox
I spend most of my time writing code and using an range of bioinformatics analysis packages. Unlike in many other professions, sometimes there are no best tools for accomplishing a bioinformatics task. The tools are continuously improved and the choice of tools is dependent on the research question and the biology of whatever you are investigating. However I have come to rely on some general purpose resources that make me more productive. Let me introduce you to my general purpose spanner box.
Code Editing
I have finally found nirvana in MacVim, which is the preferred version of Vim for Mac OS. It allows screen splitting, window resizing and integrates with the console, such that you can run system commands right in the editor. You have to install the necessary scripts or plugins to support what you want to do.
It has increased my productivity, although it has a slightly steep learning curve. This is a tool I recommend to any bioinformatician, if you are not already using it!
Cost: Free
Source Control
I use git for source control. It is awesome and fits very well with my workflow. Git has powerful features and easy to use and work with. I like the idea of distributed source control and it makes it easier to work on different versions of the same project!
Cost: Free
Bibliography manager
I use Papers, which is a commercial tool but I would recommend it to anyone. It helps me sort,annotate and read research articles. I once used Mendeley, which is an awesome tool as well.
Cost: $42 (has academic discounts)
Terminal
One of the best tools which we may forget is a tool is the terminal! Since I use Mac OS, I enjoy the best of both worlds, a powerful Unix command line support, excellent graphics and support for proprietary software if need arises.
Pen
I don’t keep an electronic notebook since I prefer jotting down my notes and having a Notebook. I use a liquid ink Pilot V ball RT pen. The pen has a retractable cone-tip liquid ink rollerball, rubber grip and metal pocket clip. It is airplane-safe and writes a 0.4mm line.
Cost: $5
NoteBook.
My preference for a notebook is an A4 D66174 Notebook. Each book has about 180 pages. It comes with a protective handcover. This is an archive for my written thoughts, discussions and workflows.
Cost $90 for a pack of five books.
What is your general purpose bioinformatics toolbox?
Translating a nucleotide sequence in six frames with bioruby
Bioruby offers a very easy and simple way to translate nucleotide sequences.
seq= Bio::Sequence::NA.new("acctatagctctagcta")
seq.translate
We know that there are six posible reading frames for any given nucleotide sequence. Generally the longests Open reading frame is taken to be the correct frame, when we do not have information about the possible protein that is encoded by a given gene. By default the translate method performs translation in the first frame but it can take an argument that defines the translation frame
seq.translate(2) #translate using the second reading frame.
Given a long list of sequences how do we quickly determine the correct reading frame. We would want to have method to translate a given sequence in all frames and pick the longest reading frame. Assuming that the correct reading frame has no stop codons, we can write a quick method to perform the six frame translation.
def longest_reading_frame(sequence) orfs = [] #a container for orfs(open reading frames) #translate a sequence in all 6 frames 6.times do |frame| translated = Bio::Sequence::NA.new(sequence).translate(frame + 1) stop_codons = translated.scan(/\*/).size orfs << translated if stop_codons == 0 end orfs[0] end
This method uses an array to collect all translated sequences that contain no stop codons and returns the first sequence in the array. This might not scale very well for very long sequences but that will be a post for another day!
Happy Biology!