THE WORLD OF AVRO'S

Tuesday, March 9, 2010

BIOTECH COMPANIES IN INDIA

Becton Dickinson India (BD)
CEO: Ram Sharma
6th Floor, Signature Towers B, South City 1, Gurgaon – 122 001
Tel: +91 (124) 2383566-71 Fax: +91 (124) 2383224-5
Website: www.bd.com
Amersham Biosciences
CEO: Ravindar Thirusangu
Corporate office, FF-3, Palani Center, 1st Floor, 32 Venkatnarayana Road,
T Nagar, Chennai - 600 017
Tel.: +91 (044) 24340747/24338582 Fax: +91 (044) 24323770
Ranbaxy Laboratories Ltd
CEO: DS Brar, managing director
19, Ranbaxy Laboratories Ltd, Nehru Place, New Delhi - 19
Tel.: +91(011) 26002075 Fax: +91 (011) 26002074
Website: www.ranbaxy.com
Bio-Rad laboratories
CEO: Dhiren Wagle, country manager
B and B1, Enkey Towers, Udyog Vihar Phase 5, Gurgaon – 122 016
Tel.: +91 (0124) 2398112/13/14 Fax: +91 (0124) 2398115
Website: www.biorad.com
Quintiles Spectral (India) Pvt Ltd
CEO: Dr Ferzaan Engineer, managing director
3 Ashoknagar Bungalows behind Sundarvan,
Satellite road, Ahmedabad - 380 015
Tel.: +91 (079) 6923303 Fax: +91 (079) 6924311
Website: www.quintiles.com
SIRO Clinpharm Pvt Ltd
CEO: Dr Gautam Daftary
106, Lady Ratan Tata Medical and Research Center,
MK Marg, Cooperage, Mumbai - 400 021
Tel.: +91 (022) 22844011/22843892/22885795
Fax: +91 (022) 22844169/22885791
Website: www.siroindia.com

Shanthas Biotechnics
CEO: KI Varaprasad Reddy
Serene Chambers, 3rd Floor, Banjara Hills,
Road No. 7, Hyderabad - 500 034
Tel.: +91 (040) 23608843/23548507/23543010
Fax: +91 (040) 23548476/23541713
Website: www.shanthabiotech.com
Span Diagnostics
CEO: PK Desai, CMD
173-B, New Industrial Estate, Udhna, Surat - 394 210
Tel: +91 (261) 2277211/2277143/2279583/2274528
Fax: +91 (261) 2279319/3665757
Website: www.span.co.in
Wipro HealthScience CEO: TK Kurien, No. 92, 2nd Main Road,
Keonics Electronic City, Bangalore – 560 100
Tel.: +91 (080) 51103070 Fax: +91 (080) 51103083
Website: www.wipro.com/healthscience
Wockhardt CEO: Habil Khorakiwala, chairman,
Wockhardt Towers, Bandra Kurla complex, Bandra (East),
Mumbai - 400 051
Tel.: +91 (022) 26534444 Fax: +91 (022) 26534242
Website: www.wockhardt.com
Eli Lilly & Co. CEO: Rajiv Gulati, managing director and chairman
Plot no-92, Sector 32, Institutional Area, Gurgaon – 122 001
Tel.: +91 (0124) 223000 Fax: +91 (0124) 2823012-14
Website: www.lilly.com
Ranbaxy Laboratories CEO: DS Brar, managing director 19, Ranbaxy Laboratories Ltd,
Nehru Place, New Delhi - 19
Tel.: +91(011) 26002075 Fax: +91 (011) 26002074
Website: www.ranbaxy.com

Haffkine Bio-Pharmaceutical. CEO: Medha Gadge, managing director
Acharya Donde Marg, Parel, Mumbai
Tel.: +91 (022) 24129320-22/24129224 Fax: +91 (022) 24168578
Website: www.vaccinehaffkine.com
Krebs Biochemicals
CEO: Dr RT Ravi
401/ 402, Cyber Heights, 4th Floor, Plot No 13,
Road No. 2, Banjara Hills, Hyderabad – 500 033
Tel.: +91 (040) 23540325 Fax: +91 (040) 23540320
Website: www.krebsbiochem.com
Nicholas Piramal
CEO: Ajay Piramal, chairman
100, Center Point, Dr Ambedkar Road, Parel, Mumbai - 400 012
Tel.: +91 (022) 56636666/56636441 Fax: +91 (022) 24172851
Website: www.nicholaspiramal.com
Bharat Serums and Vaccines
CEO: Dr Gautam Daftary, managing director
52, Mittal Chambers, Nariman Point, Mumbai – 400 021
Tel.: +91 (0 22) 56560900 Fax: +91 (022) 56560903
Website: www.bharatserums.com
Millipore
CEO: Subhash Bagaria
50A Ring Road, Phase – II, Peenya, Bangalore - 560 058
Tel.: +91 (080) 8396320 Fax: +91 (080) 8396345
Website: www.millipore.com
Agilent
Management Team:
Dr G Ramakrishnan, vice president (Agilent LSCA)
Chandiwalan Estate, Maa Anandmai Marg, Kalkaji, New Delhi - 19
Tel.: +91 (011) 26826067 Fax: +91 (011) 26826068
Website: www.agilent.com

Indian Immunologicals
CEO: KV Balasubramaniam
Road No. 44, Jubilee Hills, Hyderabad – 500 033
Tel.: +91(040) 23544585 Fax: +91 (080) 23544007
Website: www.indimmune.com
Biocon India
CEO: Kiran Mazumdar Shaw
20th KM Hosur Road, Electronic City, Bangalore – 561 229
Tel.: +91(080) 8523430-34 Fax: +91(080) 8523423
Website: www.biocon.com
Panacea Biotec
CEO: Soshil Kumar Jain
B-1 Extn. /A-27, Mohan cooperative industrial estate, Mathura road, New Delhi - 110 044
Tel.: +91 (011) 26945270/51678000/51679000
Fax: +91(011) 26940199/26940621
Website: www.panaceabiotec.com
Punjab Biotech Park
CEO: Gurmeet S Bhatia
Association Bhawan, Bay no. 143-144, Industrial phase 7,
Mohali – 160 055
Tel.: +91-172-2236576 Fax: +91-172-2236948
International Biotech Park, Pune
CEO: Balaji Rao
TCG Urban Infrastructure Holdings Ltd, 304, Enterprise center, Vile Parle (East), Mumbai – 400 099.
Tel.: +91-22-26172555 Fax: +91-22-56772575
International Pharma & Biotech Park, Ahmedabad
CEO: Shekhar Patel
Ganesh Housing Corporation Ltd, Ist floor, Samudra Complex, CG Road, Ahmedabad – 380 086
Tel.: +91-79-6562345/6560226/6561350 Fax: +91-79-6421085

Shapoorji Pallonji Biotech Park
CEO: S Dhawan
No. A-13, Ground Floor, Street No.3, Indian Airlines Colony, Begumpet, Secunderabad - 500 003.
Tel.: +91-40-27900893 Fax: +91-40-27901428/2009.
Website: www.spbiotechpark.com
Bangalore Biotech Park
CEO: MK Shankaralinge Gowda, secretary, Dept of IT & BT
Government of Karnataka, No 9, III Floor, UNI Building, Thimmaiah Road, Bangalore - 560 052.
Tel.: +91-80-22262466/22262450 Fax: +91-80-22288341
KINFRA
Kerala Industrial Infrastructure Development Corporation (KINFRA)
CEO: Dr GC Gopala Pillai
Kerala Industrial Infrastucture Development Corporation, Kinfra House, TC 31/2312, Sasthamangalam, Thiruvananthapuram – 10
Tel.: +91-471-2726585 Fax: +91-471-2724773
Website: www.kinfra.com
TICEL Bio Park Ltd
TIDCO Center for Life Sciences (TICEL Bio Park)
CEO: KR Vishwanathan, Ticel Biopark
Taramani Road, Taramani, Chennai - 600 113
Tel.: +91-44-22542060/61 Fax: +91-44-22542055
Website: www.ticelbiopark.com
Mahindra Industrial Park
CEO: BG Menon
Arjay Apex Center, 1st Floor, 24, College Road, Chennai - 600 006
Tel.: +91-44-28212893-94 Fax: +91-44-28212895
Website: www.mahindracity.com
GlaxoSmithKline Pharmaceuticals
CEO: S.Kalyanasundaram,
252, Dr. Annie besent road, Worli, Mumbai-400025
Tel: 91-22-24959595 Fax: 91-22-939744
Web: www.gsk-india.com

MAHYCO-MONSANTO BIOTECH
Maharashtra Hybrid Seeds Corporation (Mahyco)
CEO: Dr MK. Sharma
Avra Laboratories Pvt. Ltd.Avra House, 54 Sai Enclave, Habsiguda, Hyderabad - 500 007, INDIA.
Tel : 00 91 40 27179147, 27178568 Fax : 00 91 40 27175605, 27179149
email: hyd2_avralab@sancharnet.in, info@avralab.com, avra_chandra@sancharnet.in
AVT BIOTECH
The Sheveroy Estates Ltd.
( A.V. Thomas Group )
Plot No. 10 & 11, CEPZ,Cochin 682 030 India
Tel: 0091- 484 - 2413177,2413179,2413194 Fax : 0091 - 484 - 2413193
E-mail : avtbiotech@vsnl.net
Olympus Biotech India Pvt.Ltd
Address: 39, Lower Range, Calcutta 700019, INDIA
Phone: 91 33 247-3081 Fax: 91 33 240-5623
E-Mail: qasid@cal2.vsnl.net.in
IndiaCo Ventures (P) Ltd.
India Contact
Indiaco iCenter, 214 LBS Road,Pune 411030, India.
Tel: 091-20-5602-1357, 091-20-5602-7155 Fax: 091-20-2433-4978
Email: info@indiaco.com
US Contact: 3940, Freedom Circle, Santa Clara, CA 95050, USA
Tel: 408-200-7420 Fax: 408-200-7401
iPharma India Private Limited
4th Floor, Raj Mahal,84, Veer Nariman Road,Churchgate, Mumbai 400020
Tel: (022)22836965/66/67/68 Fax : (022)22024261
E-Mail: ipharma@vsnl.com
IvyComm Systems
Contact Person: Mr. Puneet Aggarwal
Address: 1405, Sector 14 Faridabad - 121007 Haryana INDIA
Tel: 91-129-2284107 Fax : 91-129-28803781
http://www.herbscancure.com

Packdrome International
Contact Person: Mr. Sandeep Kumar Goyal
206 Anchal Plaza Nelson Mandela Road, Vasant Kunj New Delhi - 110070 Tel: 91-11-26131663 Fax: 91-11-26131663
http://www.packdrome.com
Aurocomp
Contact Person: Mr. Madhusudan
14, Perumal Koil Street Stuti Apartments Pondicherry - 605001
Tel: 91-413-2343380 Fax: 91-413-2344641
Home :http://aurocomp.com/
Shreeji Pharma International
Contact Person: Mr. Nilesh Sheth
FF/10, Narsinghdham Complex, Sangam Charrasta, Harni Road, Vadodara - 390 018
Tel: 91-265-2465036 Fax : 91-265-2465036
Home :http://www.shreejipharma.com
Niyaz Agro
Contact Person: Mr. Ravi Chandran
31, Balaji Nagar, Padi Chennai - 600080 Tamilnadu
Tel : 91-44-26541131 Fax : 91-44-26541131
Metro Agrotech
Contact Person: Mr. Saeed H Chanda
Opp. Darpan Talkies Station Road Dahod Gujarat INDIA
Tel: 91-2673-224786, Home: http://www.sennacassia.com
Krishna Tradings
Contact Person: Mr. Dev Mohanty
123, Atlanta Nariman Point Mumbai - 400021 Maharashtra INDIA
Tel: 91-22-22886085 Fax: 91-22-22837494
M.S.S. Asan & Son
Contact Person: Mr. S. Prakash
93, Balamore Road Nagercoil - 629 001 Tamil Nadu INDIA
Tel : 91-4652-227422 Fax : 91-4652-224373
Home :http://www.mssasan.com
Avon Organics Ltd
Contact Person: Mr. Prabhu Ramachandran
Madhupala Tower, Greenlands, Opp Hotel Green Park, Ameerpet Hyderabad - 500016
Tel: 91-40-23414432 / 23405042 Fax : 91-40-23404438
Home :http://www.avonorganicsltd.com
Prakruti Products
Contact Person: Mr. M. R. Shetty
Ground Floor Sagar Complex, Maruti Temple Road Karwar - 581301 Karnataka INDIA
Tel: 91-8382-220000 / 225000 Fax : 91-8382-222400
Home :http://www.prakruti.com
Ceear Impex
Contact Person: Mr. Bala Yogi
4/1, Kumudh Apartments Lakshmipuram First Street Royapettah Chennai - 600014 Tamilnadu
Tel: 91-9884014570
Home: http://www.goldjewelbiz.com
Gaurav Pharma Pvt Ltd
Contact Person :Mr. Vikas Singhal
201-202, Suneja Towers - II District Centre, Janakpuri New Delhi -110058 INDIA
Tel : 91-11-25591435 / 25509020 / 25559310 Fax : 91-11-25591413
Raj Pharmaceuticals
Contact Person: Ms. Beena Mehta
Nanavat Main Road, Above Akhandjyoti Gas, 2nd Floor, Surat - 395003 Gujarat INDIA
Tel: 91-9825098496 / 91-9825127771 Fax : 91-261-2411964
Home :http://www.akhandjyotigasindia.com
Sharda International
Contact Person: Mr. Nilesh Desai
Shree Krishna Bhavan Sahar Mumbai - 400099
Tel : 91-22-28302252 Fax : 91-22-28302252
Rohit Plastics & Chemicals
Contact Person :Mr. Rohit Garg
204 Ravi Kiran Building Off Link Road, Andheri (W) Mumbai - 400053
Tel : 91-22-26736677 Fax : 91-22-26736643
Home :http://www.rpcindia.com
Dava Infotech, Inc
Contact Person :Mr. Jay Patil
4th Floor, Samruddhi Venture Park, Central MIDC Road Andheri (E) Mumbai - 400 093
Tel : 91-22-28316033 / 34 / 35/ 36 Fax : 91-22-28382645
Dinesh Chemicals
Contact Person: Mr. Ramesh Babaria
Babaria House 22/8, Baij Nath Para Raipur - 492001 Chhattishgarh INDIA. Tel: 91-771-2227109 / 5050849 Fax: 91-771-2537662
D.S.A Exports
Contact Person: Mr. Inder Chopra
7/1, Usha Ganj Indore - 452001 Madhya Pradesh INDIA
Tel: 91-0731-5041212 Fax: 91-0731-2702088
http://free.hostdepartment.com/d/dsaexports
Brij & Co.
Contact Person: Mr. Brij Bansal
B-50 Kamla Nagar Agra - 292005 Uttar Pradesh INDIA
Tel: 91-562-2184818 Fax: 91-562-2184818
LT Pharma Pvt Ltd
Contact Person: Mr. Kedar Gadgil
7 Harshada 74 Rambaug Colony Paud Road Pune - 411038 Maharashtra INDIA
Tel: 91-20-25434254 Fax : 91-20-25434254
Home :http://www.lateralthoughtpharma.com
Dheer Trading
Contact Person: Mr. Sahil Patel
Sector 4 Plot 243/5 Dheer Dental Clinic, G-14 Madhuban Complex, Oslo Cinema Road Gandhidham - 370201 Kutch, Gujarat INDIA
Tel: 91-95283636972
R. S. Bhargava Pharmacy Pvt. Ltd.
Contact Person: Mr. R.S. Bhargava
115-A, 2nd Floor, Hari Nagar, Asharm Chowk, New Delhi - 110014 INDIA
Tel : 91-11-26343568 / 26341058 Fax : 91-11-26343580
Pharma & Pharma Inc.
Contact Person: Mr. K. P. Rao
15-23/A, Kodandaram Nagar Dilsuk Nagar P.O. Hyderabad - 500060
Tel: 91-40-24067397 Fax : 91-40-24067397
Pragati Biocare
Contact Person: Dr. Suddhir Jain
No.81/3, Govindappa Road, Basvangudi Bangalore - 560004 Karnataka Tel: 91-80-26500265 Fax: 91-80-51204068
Pyramid Inc
Contact Person: Mr. K. R. Santhosh
'HOME MAID', 7/235 Vadookara, Trichur - 680007 Kerala INDIA
Tel : 91-487-2447290 Fax : 91-487-2252017
D. K. Pharmachem Pvt Ltd
Contact Person: Mr. Rakesh Bakshi
F-32 / W-6, M.I.D.C., Badlapur (E) Mumbai - 421 503 Maharashtara
Tel : 91-251-2690111 / 2691466 Fax : 91-251-2690366
Home :http://www.dkpharmachem.com
Anusha India Pvt Ltd
Contact Person: Mr. V K Jain
61/8 Sahibabad Industrial Area Site IV Ghaziabad - 201010
Tel : 91-120-24773065 Fax : 91-120-24712151
Sree Sai Lakshmi Traders
Contact Person: Mr. G. Bhaskara Ramam
50-94-12/3, Vijayarama Apts Santipuram Viskhapatnam - 530016
Tel: 91-891-2702958 Fax : 91-891-2701815
La Grand Exports
Contact Person: Mr. Parind Shah
501, Adamji Building 413, Narshi Natha Street Mumbai - 400009
Tel: 91-22-2355051 Fax : 91-22-2351660
Matex Exports Pvt Ltd
Contact Person: Mrs. Babitha P. Chintala
3, Shreeji Kiran, Tejpal Road Vile Parle (E) Mumbai - 400 057
Tel: 91-22-28242522 Fax : 91-22-28251591
Shreyas Chemicals
Contact Person: Mr. Jitendra N Shah
627, GIDC, Makarpura Industrial Area Shreyas House, Dandia Bazar, Baroda - 390 001
Tel : 91-265-2643337 / 2636134 Fax : 91-265-2644052
Halcyon Pharmaceuticals Ltd
Contact Person: Dr. Rajender Singh Lamba
Red Square Market S.C.F.-77/78 Hisar - 125001 Haryana
Tel: 91-1662-239950 Fax: 91-1662-231628
Kaipuzha Ayur Exports
Contact Person: Mr. Kaipuzha Rajan Pillai
Karavaloor, Punaloor Kollam - 691318 Kerala INDIA
Tel: 91-475-2251861 Fax : 91-475-2251358
Green Waves Pvt Ltd
Contact Person: Mr. G. Misra
Brahmeswarpatna Bhubaneswar - 751002 Orissa INDIA
Tel: 91-674-2430226 Fax: 91-674-2430224
B. K. Enterprises
Contact Person: Mr. C. M. Trivedi
F/11, Natasha Park Society No 1 Opp. Military Boys Hostel, Chhani Road, and Vadodara - 390 002. Tel: 91-265-2772905 Fax: 91-265-2772905
Unnati Enterprises
Contact Person: Mr. Akshay Lakhotia
11/1 Sarat Bose Road Kolkata - 700020 West Bengal INDIA
Tel: 91-33-22407071 Fax: 91-33-22407475
Cebon India Ltd
Contact Person: Mr. Puneet Pruthi
Plot no. 2 H.S.I.D.C. Gurgaon - 122001 Haryana INDIA
Tel: 91-1762-280797 Fax: 91-1762-282514
Bruklyn Pharma Distributors (P) Ltd
Contact Person: Mr. A. Jithendran
Alappat Cross Road, Perumanoor Cochin - 682015 Kerala INDIA
Tel: 91-484-2357746 Fax: 91-484-2356544
Goodrich Carbohydrates Ltd
Contact Person: Mr. Rajat Gupta
R-146, Model-Town S.C.O. 204, Sector-12 Karnal - 132001 Haryana
Tel: 91-184-2266293 Fax: 91-184-2265157
Seagull Laboratories (I) Pvt Ltd
Contact Person: MR. P.K.SHARMA
G-2, C - Block Community Centre Naraina Vihar New Delhi - 110028
Tel: 91-11-25770553 / 25775451 / 25771017 / 25771021
Fax: 91-11-25774230
Rawat Vit Chem (P) Ltd
Contact Person: Dr. R. S. Rawat
2A, Anand Ashram Near Paranjape School Talegaon Dabhade Pune – 410507, Tel: 91-2114-225947 Fax : 91-2114-225037
Cottex Mills
Contact Person: Mr. Rajiv Dadu
Rajkot Road Barnala - 148 101 Punjab INDIA
Tel : 91-1679-309832 Fax : 91-1679-309932
Blue Cross laboratories limited
Peninsula Chambers, Ganpatrao Kadam Marg, Lower Parel,
Mumbai 400 013. India.
Manufacturing Plants
A-12, MIDC, Ambad, Nasik 422 010. India.
L-17, Verna Industrial Estate, Verna, Goa 403 722. India. E –mail: nkk@bluecrosslabs.com
Elite Pharma Pvt. Ltd.
1210/11/12 G.I.D.C.,
Phase III, Vatva, Ahmedabad - 382 445
Phone : - +91 79 583 1668 Telefax : - +91 79 583 2998
KARNAVATI ENGINEERING LTD.
FACTORY
Survey No. 342, Nani Kadi, Taluka Kadi, Dist. Mehsana (N.G.), INDIA.
Phone: +91-2764 - 262463, 241464-65 Fax : +91-2764 – 242608
TISSUE CULTURE LAB
756, Prakruti Farm, Hirapur Chowkdi, Hirapur, Ahmedabad, INDIA.
Phone: +91-79 - 3855260 / 3855495 / 496
Anant group
194, Arvind Chambers, Gauri Studio Compound,
Western Express Highway, Andheri (E), Mumbai 400 069, India
Tel :0091 - 22 - 26820059,26820060,26823263,26821361,56901361 - 65 Fax : 0091 - 22 - 26830537, 26821425
Email : anantco@anantco.com
Website : www.anantco.com
Aurobindo Pharma Ltd.
Registered & Corporate Office
Plot # 2, Maitri Vihar, Ameerpet,Hyderabad – 500 038,Andhra Pradesh, Phone: +91 (40) 5572 5000 Fax: +91(40) 2374 1080 / 2374 6883
Email : info@aurobindo.com
Lambda Therapeutic Research Ltd.
42 Premier House 1, Bodakdev
Sarkhej Gandhinagar Highway, Ahmedabad 380 054, Gujarat, India
Phone: 91-79-2685 3995, 2685 3441, 2684 0080, Fax: 91-79-2684 0079
Email: lambda@lambda-CRO.com
CITADEL FINE PHARMACEUTICALS LTD.
Archana Towers
First Street, Sripuram Colony, St. Thomas Mount, Chennai 600016
Tel: +91 44 55875555, Fax: +91 44 22334275
E-mail: info@cabl.co.in
Goodcare Pharma Pvt Ltd
1, Gupta Lane, Calcutta – 700 006
Phone: +91-33-2272 6965 , 2269-2265 / 66 Fax : +91-33-2269 5705
Email: corporate@goodcarepharma.com
Goodcare Pharma Pvt Ltd New Delhi
Contact Person: Ms Sharma
Email: order@allayurveda.com
Lincoln Pharmaceutical Ltd
2nd Floor, Nirav Complex, Opp. Navrang School, Naranpura,
Ahmedabad - 380 014.
Phone : +91-79-2768 2704 / 2708 / 2347 / 2097
Surya Pharmaceutical Limited
S.C.O. 164-165, Sector 9-C , Madhya Marg Chandigarh INDIA - 160018
Phone : +91-172-5005000 Fax : +91-172-5076000
email: sales@suryapharma.com
Shasun Chemicals And Drugs Limited,
The Company Secretary
No. 3 Duraisamy Road, T. Nagar, Chennai 600017.
Phone: 91- 44 - 2434 8930 Fax: 91 - 44 - 2434 8924
E-mail: investors@shasun.com
Pharm Products Pvt. Ltd.
No.3362-AT, AF-9, 8th Street 11th Main Road, Anna Nagar, Chennai - 600 040, Tamilnadu, INDIA
Phone: (91 44) 6216635, 6213496 Fax: (91 44) 6222415
email: ppplm@vsnl.com
FACTORY & ADMINISTRATIVE OFFICEPharm Products Pvt. Ltd.
'Vijai' Medical College Road, Thanjavur - 613 007, Tamilnadu, INDIA
Phone: (91 4362) 239176,237471,237225 Fax: (91 4362) 231650
email: pppl@eth.net
Mangalam Drugs & Organics Ltd.
292, Princess Street, 2nd Floor, Mumbai - 400 002
Tel.: 91 22 22087969 / 7489 / 7411 Fax.: 91 22 22087074
Email. : mangalam_drugs@vsnl.com
dhootg@bom2.vsnl.net.in

BY
AVRO JYOTI DE
MERIT IIT

INTRODUCTION OF BIOPERL

INTRODUCTION TO THE BIOPERL

Bioperl is a collection of more than 500 Perl modules for bioinformatics that have
been written and maintained by an international group of volunteers. Bioperl is free
(under a very unrestrictive copyright), and its home is http://bioperl.org.
One of the most difficult things about Bioperl is getting started using it. This is due
to a scarcity of good documentation (which is being rectified) as well as the sheer size
of the Bioperl module library. This chapter will help you get started using the Bioperl
project software; it will guide you through the initial steps of getting the software,
installing it, and exploring the tutorial and example material that it provides.
After working through this chapter, you’ll be well prepared to delve deeper into the
riches of Bioperl, and, if you’ve also worked through the object-oriented chapters
earlier in this book, you’ll be in a good position to read the Bioperl code and contribute
to the project yourself.
The modules in Bioperl are written in the object-oriented style. Perl programmers
who do not know object-oriented programming can still use the Bioperl modules
with just a bit of extra information, as outlined in Chapter 3.
The Bioperl modules cover various areas of bioinformatics, including some you’ve
seen previously in this book. Although Bioperl includes some example programs, it is
not meant to be a collection of complete user-ready programs. Rather, it’s implemented
as a toolkit you can dip into for help when writing your own programs. Its
goal is to provide good working solutions to common bioinformatics tasks and to
speed your program development.
One of the best things about Bioperl is that it’s an open source project, meaning that
interested developers are invited to contribute by writing code or in other ways, and
the code is available to anyone interested. If you’ve learned enough about Perl for
bioinformatics to have worked through a good portion of this book, you’ll find
plenty of opportunity to get involved in Bioperl if you have the time and inclination.
This is the Title of the Book, eMatter Edition
Copyright © 2003 O’Reilly & Associates, Inc. All rights reserved.
292 | Chapter 9: Introduction to Bioperl
The Growth of Bioperl
The practice of freely distributing Perl software for bioinformatics over the Internet
began around 1992. Gradually, Perl became more and more popular for biology
applications.* The release of Perl Version 5 and its support for object-oriented programming
accelerated the development of reusable modules for biology at many
research centers around the world. The large-scale genome sequencing efforts, then
underway, provided much of the impetus, as well as the talent and funding, for these
efforts. The Bioperl project, officially organized in 1995, coalesced around one of
these bioinformatics research groups that was doing a good job of organizing the collective
volunteer effort such collaborative projects require. More information, including
a short history and list of contributors, can be found at the Bioperl site http://
bioperl.org. The Bioperl web site is frequently being updated and improved, and is
the primary source of Bioperl code and documentation.
Today, the Bioperl project has grown to a point where it is both useful enough, and
well enough documented, that it is a must for Perl programming in bioinformatics.
The documentation includes a program bptutorial.pl that comes with Bioperl,
which explains and demonstrates several areas of the project (more on that later in
this chapter).
Installing Bioperl
Installing Bioperl’s large collection of modules isn’t too difficult. It usually goes fairly
painlessly, even though there are a few extra installation steps due to the additional
outside programs and Perl modules on which Bioperl depends. You will probably
want to use Perl’s CPAN to fetch and install Bioperl.
INSTALL is a good document that covers Bioperl installation on Unix/Linux, Windows,
and Mac operating systems. It’s part of the Bioperl distribution and is located
at http://bioperl.org/Core/Latest/INSTALL. This location may change, but it’s easy to
find from the Download link on the main Bioperl web page. If you’re going to install
Bioperl, I recommend you read this document first; here, I’ll give an overview of
what’s available and add a few additional comments that may help with installation.
Here’s how to get Bioperl on different platforms:
• On Unix/Linux, if you download the tar file from the web site, you merely need
to untar it and go through the configure and make process by hand, as described
in the INSTALL file that comes with the distribution.
* See, for example, the article “How Perl Saved the Human Genome Project” by Lincoln Stein at http://bioperl.
org/GetStarted/tpj_ls_bio.html.
This is the Title of the Book, eMatter Edition
Copyright © 2003 O’Reilly & Associates, Inc. All rights reserved.
Installing Bioperl | 293
• If you have a Microsoft Windows machine with ActiveState’s Perl (http://www.
activestate.com), there’s a PPM file available for Bioperl; at the time of this writing,
it’s at http://bioperl.org/ftp/DIST/Bioperl-1.2.1.ppd.
• There is also a CVS repository for Bioperl from which you can fetch the most
current versions of the modules. But be careful: some newer versions of the modules
are implementing new features and have more bugs than are found in some
of the older, more stable releases. The details of how to install from CVS are also
available at the Bioperl web page.
All these methods for installing Bioperl are fine, but probably the most common way
for Perl programmers to install sets of modules is by way of CPAN. I discussed
CPAN in Chapter 1, but it’s worth discussing again as it relates to Bioperl.
To install Bioperl, you start by typing the following at the command line:
perl -MCPAN -e shell;
This gives the CPAN shell prompt:
cpan>
It’s often the case that a module you want to install may require other modules for its
proper operation, and perhaps one or more of these additional modules have not yet
been installed. The CPAN system includes a way to check to see what other modules
are required, and you can configure it to automatically follow and install missing
prerequisites.
Especially with a large collection of modules like Bioperl, you may expect to see such
prerequisites crop up. When I asked my CPAN session how it was configured:
cpan> o conf
one of the lines of output from that query was:
prerequisites_policy ask
I couldn’t find a way to list the options within my CPAN session, so I took a look at
the CPAN documentation in another window with perldoc CPAN, searched for the
string prerequisites_policy, and read the following:
prerequisites_policy
what to do if you are missing module prerequisites
('follow' automatically, 'ask' me, or 'ignore')
To handle the prerequisites automatically, I configured my CPAN session like so:
cpan> o conf prerequisites_policy follow
Now, if you have a slow Internet connection, setting the option this way can be a
problem, because some prerequisites may take a long time to install, and you’ll be
stuck waiting for everything to finish. In that case, you may prefer to be asked about
each prerequisite, so you can see if you have the needed time to do the installation.
Plan on the process taking at least a couple of hours. My experience using both a
This is the Title of the Book, eMatter Edition
Copyright © 2003 O’Reilly & Associates, Inc. All rights reserved.
294 | Chapter 9: Introduction to Bioperl
dial-up modem and a cable modem from a home office is that the time involved was
not too onerous.
As you probably know by now, you can see a summary of CPAN session commands
by typing help:
cpan> help
Display Information
command argument description
a,b,d,m WORD or /REGEXP/ about authors, bundles, distributions, modules
i WORD or /REGEXP/ about anything of above
r NONE reinstall recommendations
ls AUTHOR about files in the author's directory
Download, Test, Make, Install...
get download
make make (implies get)
test MODULES, make test (implies make)
install DISTS, BUNDLES make install (implies test)
clean make clean
look open subshell in these dists' directories
readme display these dists' README files
Other
h,? display this menu ! perl-code eval a perl command
o conf [opt] set and query options q quit the cpan shell
reload cpan load CPAN.pm again reload index load newer indices
autobundle Snapshot force cmd unconditionally do cmd
cpan>
To find the Bioperl distribution, I typed:
cpan> i /bioperl/
and got the following output:
cpan> i /bioperl/
CPAN: Storable loaded ok
Going to read /root/.cpan/Metadata
Database was generated on Sun, 27 Apr 2003 04:12:50 GMT
Bundle Bundle::BioPerl (C/CR/CRAFFI/Bundle-BioPerl-2.04.tar.gz)
Distribution B/BI/BIRNEY/bioperl-0.05.1.tar.gz
Distribution B/BI/BIRNEY/bioperl-0.6.2.tar.gz
Distribution B/BI/BIRNEY/bioperl-0.7.0.tar.gz
Distribution B/BI/BIRNEY/bioperl-1.0.2.tar.gz
Distribution B/BI/BIRNEY/bioperl-1.0.tar.gz
Distribution B/BI/BIRNEY/bioperl-1.2.1.tar.gz
Distribution B/BI/BIRNEY/bioperl-1.2.tar.gz
Distribution B/BI/BIRNEY/bioperl-db-0.1.tar.gz
Distribution B/BI/BIRNEY/bioperl-ext-0.6.tar.gz
Distribution B/BI/BIRNEY/bioperl-gui-0.7.tar.gz
Distribution C/CR/CRAFFI/Bundle-BioPerl-2.04.tar.gz
Module Bio::LiveSeq::IO::BioPerl (B/BI/BIRNEY/bioperl-1.2.1.tar.gz)
This is the Title of the Book, eMatter Edition
Copyright © 2003 O’Reilly & Associates, Inc. All rights reserved.
Installing Bioperl | 295
13 items found
cpan>
From looking at the Bioperl web page, I knew that the Bundle::BioPerl had extra
useful modules in it that Bioperl uses.
I started the installation by first installing the Bundle:
cpan> install Bundle::BioPerl
This process fetched the module code from a CPAN repository, unpacked it, tested
it, and installed it, including its prerequisites (without questions asked because I set
the follow option as described earlier in this section).
Somewhat fortified by my success, I decided to go for broke and install the main Bioperl
distribution.
From the search output from my CPAN query, i /bioperl/, just shown, I saw that
the highest numbered release was 1.2.1. Just to be sure it was the latest release, I also
spent some time at the Bioperl web site reading the news about the latest releases, so
I was quite sure it was what I wanted.
I also spent some time reading the INSTALL file, and I knew that I was generally in
good shape. I had a new-enough version of Perl (Version 5.8.0) and was on one of
the standard supported platforms.
I installed it on a notebook computer with an Intel 686 processor that had the
RedHat Linux 7.2 operating system. The computer and the operating system were
about two years old, old enough that I did do some looking around on the Bioperl
web site to see if there were any advisories about which versions of Linux were recommended
or advised against.
In general, modern computer systems are complex, and they change rapidly. Operating
systems and hardware have a replacement cycle of about two years. Some of this
is planned obsolescence; some is legitimate technical progress. Whatever the cause,
the result is that a system such as the one I’m describing needs several pieces to be in
sync. The hardware, operating system software, Perl version, and Bioperl version
have to coexist; other pieces such as the C compiler, web browser, web server, and
so forth may also cause problems. Hence my interest in checking to see if there were
any advisories on these topics.
However, I didn’t see any warnings about my particular platform. And since I had
recently installed the latest version of Perl with success, I felt I had performed due
diligence and that I should go ahead and try to install the Bioperl modules.
To do so, I typed:
cpan> install B/BI/BIRNEY/bioperl-1.2.1.tar.gz
There followed a great amount of activity as CPAN got the distribution from the
Internet and tested the various modules. After a time, the tests were complete; a few
This is the Title of the Book, eMatter Edition
Copyright © 2003 O’Reilly & Associates, Inc. All rights reserved.
296 | Chapter 9: Introduction to Bioperl
of them had failed, and CPAN decided not to install the modules. Since only a handful
of module tests failed, I looked at the output on my screen and decided that the
failures were only in peripheral parts of Bioperl I didn’t have an immediate use for,
and if I ever did need them, I could fix them later.
So, to finish the installation, I forced CPAN to do the installation despite the presence
of some test failures:
cpan> force install B/BI/BIRNEY/bioperl-1.2.1.tar.gz
This resulted in the modules and the documentation being installed.
Testing Bioperl
To check that things were working okay with my new Bioperl installation, I first
wrote a little test to see if a Perl program could find the Bio::Perl module:
use Bio::Perl;
I ran it by putting it in a file bp0.pl and giving it to the Perl interpreter:
[tisdall]# perl bp0.pl
[tisdall]#
As you see, it didn’t complain, which means Perl found the Bio::Perl module. If it
can’t find it, it will complain. When I copied the bp0.pl program to a file called bp0.
pl.broken and changed the module call of Bio::Perl to a call for the nonexistent
module Bio::Perrl, I got the following (slightly truncated) error output:
[tisdall@coltrane development]$ perl bp0.pl.broken
Can't locate Bio/Perrl.pm in @INC
BEGIN failed--compilation aborted at bp0.pl.broken line 1.
Second Test
Now I knew that Bio::Perl could be found and loaded. I next tried a couple of test
programs that are given in the bptutorial.pl document. I found a link to that tutorial
document on the Web from the http://bioperl.org page. Alternately, I could have
opened a window and typed at the command prompt:
perldoc bptutorial.pl
I went to the section near the beginning of the document called “I.2 Quick getting
started scripts” and created a file tut1.pl on my computer by pasting in the text of
the first tutorial script:
use Bio::Perl;
# this script will only work with an internet connection
# on the computer it is run on
$seq_object = get_sequence('swissprot',"ROA1_HUMAN");
write_sequence(">roa1.fasta",'fasta',$seq_object);
This is the Title of the Book, eMatter Edition
Copyright © 2003 O’Reilly & Associates, Inc. All rights reserved.
Testing Bioperl | 297
I then ran the program and looked at the output file:
[tisdall]$ perl tut1.pl
[tisdall]$ cat roa1.fasta
>ROA1_HUMAN Heterogeneous nuclear ribonucleoprotein A1 (Helix-destabilizing protein)
(Single-strand binding protein) (hnRNP core protein A1).
SKSESPKEPEQLRKLFIGGLSFETTDESLRSHFEQWGTLTDCVVMRDPNTKRSRGFGFVT
YATVEEVDAAMNARPHKVDGRVVEPKRAVSREDSQRPGAHLTVKKIFVGGIKEDTEEHHL
RDYFEQYGKIEVIEIMTDRGSGKKRGFAFVTFDDHDSVDKIVIQKYHTVNGHNCEVRKAL
SKQEMASASSSQRGRSGSGNFGGGRGGGFGGNDNFGRGGNFSGRGGFGGSRGGGGYGGSG
DGYNGFGNDGGYGGGGPGYSGGSRGYGSGGQGYGNQGSGYGGSGSYDSYNNGGGRGFGGG
SGSNFGGGGSYNDFGNYNNQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGYGGSS
SSSSYGSGRRF
[tisdall]$
That seemed to work perfectly.
Third Test
I tried the next short script from the same section of the tutorial, pasting it into a file
called tut2.pl:
[tisdall]$ cat tut2.pl
use Bio::Perl;
# this script will only work with an internet connection
# on the computer it is run on
$seq_object = get_sequence('swissprot',"ROA1_HUMAN");
# uses the default database - nr in this case
$blast_result = blast_sequence($seq);
write_blast(">roa1.blast",$blast_report);
[tisdall]$ perl tut2.pl
-------------------- WARNING ---------------------
MSG: req was POST http://www.ncbi.nlm.nih.gov/blast/Blast.cgi
User-Agent: libwww-perl/5.69
Content-Length: 178
Content-Type: application/x-www-form-urlencoded
...
---------------------------------------------------
Submitted Blast for [blast-sequence-temp-id]
[tisdall]$ cat roa1.blast
[tisdall]$ ls -l roa1.blast
-rw-rw-r-- 1 tisdall tisdall 0 Apr 30 11:28 roa1.blast
[tisdall]$
Here, I experienced a problem. Running the program created a page of error output,
mostly formatted in HTML, which I’ve truncated in the output. When I tried to cat
This is the Title of the Book, eMatter Edition
Copyright © 2003 O’Reilly & Associates, Inc. All rights reserved.
298 | Chapter 9: Introduction to Bioperl
the output file, there was nothing in it, which I verified (on Linux) by examining the
file with ls -l, which showed that it had 0 bytes in it.
This wasn’t very encouraging; the second of the two short example scripts was failing.
Looking at the two programs tut1.pl and tut2.pl, you can see that the second is an
extension of the first; after getting the sequence, the second program also tries to
blast it, and this is where it is failing. It’s running (printing those ... dots took a while
because the program was waiting for a reply from NCBI over the Internet), but it’s
not printing out the BLAST report to the file roa1.blast. What’s going wrong?
I started my investigation by looking at the documentation for the Bio::Perl module
by typing:
perldoc Bio::Perl
and searching for the function that is failing, blast_sequence. Here’s what I found:
blast_sequence
Title : blast_sequence
Usage : $blast_result = blast_sequence($seq)
$blast_result = blast_sequence('MFVEGGTFASEDDDSASAEDE');
Function: If the computer has Internet accessibility, blasts
the sequence using the NCBI BLAST server against nrdb.
It choose the flavour of BLAST on the basis of the sequence.
This function uses Bio::Tools::Run::RemoteBlast, which itself
use Bio::SearchIO - as soon as you want to more, check out
these modules
Returns : Bio::Search::Result::GenericResult.pm
Args : Either a string of protein letters or nucleotides, or a
Bio::Seq object
That last sentence about the Args gave me a clue. I went back and looked at the failing
program, which I placed in a file called tut2.pl, and, sure enough, the argument
for the blast_sequence function, $seq, is neither a string of sequence nor a Bio::Seq
object. In fact, it’s not even defined! Clearly, what the tutorial writer meant instead
of $seq, was $seq_object, which is defined and populated by the earlier get_sequence
method call.
I was visited by a brainwave, and I decided to put a use strict and a use warnings
into the program and to declare variables with my to see what ensued:
[tisdall]$ cat tut2.pl
use Bio::Perl;
use strict;
use warnings;
This is the Title of the Book, eMatter Edition
Copyright © 2003 O’Reilly & Associates, Inc. All rights reserved.
Testing Bioperl | 299
# this script will only work with an internet connection
# on the computer it is run on
my $seq_object = get_sequence('swissprot',"ROA1_HUMAN");
# uses the default database - nr in this case
my $blast_result = blast_sequence($seq);
write_blast(">roa1.blast",$blast_report);
[tisdall]$ perl tut2.pl
Global symbol "$seq" requires explicit package name at tut2.pl line 11.
Global symbol "$blast_report" requires explicit package name at tut2.pl line 13.
Execution of tut2.pl aborted due to compilation errors.
[tisdall]$
Well, that’s pretty clear. The variables $seq and $blast_report are wrong; apparently,
the author intended to reuse the variables $seq_object and $blast_result
instead. So, I edited the file and ran it again:
[tisdall]$ cat tut2.pl
use Bio::Perl;
use strict;
use warnings;
# this script will only work with an internet connection
# on the computer it is run on
my $seq_object = get_sequence('swissprot',"ROA1_HUMAN");
# uses the default database - nr in this case
my $blast_result = blast_sequence($seq_object);
write_blast(">roa1.blast",$blast_result);
[tisdall]$ perl tut2.pl
[tisdall]$ perl tut2.pl
Submitted Blast for [ROA1_HUMAN] ...................
[tisdall]$ ls -l roa1.blast
-rw-rw-r-- 1 tisdall tisdall 56888 May 5 15:05 roa1.blast
[tisdall]$
Examining the roa1.blast file convinced me that the program had successfully called
blast. I decided that was a success, albeit a qualified one: this was a spot in the documentation
that could use a little attention, clearly. By the time you’re reading this, it
may well have been fixed.
Fourth Test
Now, let me show you one more test of my new Bioperl installation.
Looking at the Bio::Perl documentation, I found the following example and discussion
at the very beginning.
This is the Title of the Book, eMatter Edition
Copyright © 2003 O’Reilly & Associates, Inc. All rights reserved.
300 | Chapter 9: Introduction to Bioperl
Bio::Perl(3) User Contributed Perl Documentation Bio::Perl(3)
NAME
Bio::Perl - Functional access to BioPerl for people who
don't know objects
SYNOPSIS
use Bio::Perl;
# will guess file format from extension
$seq_object = read_sequence($filename);
# forces genbank format
$seq_object = read_sequence($filename,'genbank');
# reads an array of sequences
@seq_object_array = read_all_sequences($filename,'fasta');
# sequences are Bio::Seq objects, so the following methods work
# for more info see L, or do 'perldoc Bio/Seq.pm'
print "Sequence name is ",$seq_object->display_id,"\n";
print "Sequence acc is ",$seq_object->accession_number,"\n";
print "First 5 bases is ",$seq_object->subseq(1,5),"\n";
# get the whole sequence as a single string
$sequence_as_a_string = $seq_object->seq( );
# writing sequences
write_sequence(">$filename",'genbank',$seq_object);
write_sequence(">$filename",'genbank',@seq_object_array);
# making a new sequence from just strings you have
# from something else
$seq_object = new_sequence("ATTGGTTTGGGGACCCAATTTGTGTGTTATATGTA",
"myname","AL12232");
# getting a sequence from a database (assumes internet connection)
$seq_object = get_sequence('swissprot',"ROA1_HUMAN");
$seq_object = get_sequence('embl',"AI129902");
$seq_object = get_sequence('genbank',"AI129902");
# BLAST a sequence (assummes an internet connection)
$blast_report = blast_sequence($seq_object);
This is the Title of the Book, eMatter Edition
Copyright © 2003 O’Reilly & Associates, Inc. All rights reserved.
Testing Bioperl | 301
write_blast(">blast.out",$blast_report);
DESCRIPTION
Easy first time access to BioPerl via functions
FEEDBACK
Mailing Lists
User feedback is an integral part of the evolution of this
and other Bioperl modules. Send your comments and suggestions
preferably to one of the Bioperl mailing lists.
Your participation is much appreciated.
bioperl-l@bio.perl.org
Reporting Bugs
Report bugs to the Bioperl bug tracking system to help us
keep track the bugs and their resolution. Bug reports can
be submitted via email or the web:
bioperl-bugs@bio.perl.org
http://bugzilla.bioperl.org/
AUTHOR - Ewan Birney
Email bioperl-l@bio.perl.org
Describe contact details here
APPENDIX
The rest of the documentation details each of the object
methods. Internal methods are usually preceded with a _
...
I took the example code from the SYNOPSIS section and put it in a file called bp1.pl. I
noticed that the code was not meant to be a running program. The variable
$filename refers to some sequence file in the first line of code without being initialized,
and then the same variable name is used in several other lines for different purposes.
This is not an uncommon situation in such SYNOPSIS sections; the point is to
show how individual calls can be made to methods in the class, not to demonstrate a
complete working program (although sometimes the code can be run exactly as is).
I had to make some changes to make this code a working, runnable program. I
declared the strict and warnings pragmas and declared each variable with my. I created
variables for the different sequence filenames I needed as both input and output,
and I made sure that the input sequence files were on disk and of the correct
variety (for example, I created the array.fasta file with three FASTA headers and
sequences one after the other). In the end I had this code:
use Bio::Perl;
use strict;
use warnings;
This is the Title of the Book, eMatter Edition
Copyright © 2003 O’Reilly & Associates, Inc. All rights reserved.
302 | Chapter 9: Introduction to Bioperl
my $gbfilename = 'AI129902.genbank';
# will guess file format from extension
my $seq_object0 = read_sequence($gbfilename);
# forces genbank format
my $seq_object1 = read_sequence($gbfilename,'genbank');
my $fastafilename = 'array.fasta';
# reads an array of sequences
my @seq_object_array = read_all_sequences($fastafilename,'fasta');
# sequences are Bio::Seq objects, so the following methods work
# for more info see L, or do 'perldoc Bio/Seq.pm'
print "Sequence name is ",$seq_object1->display_id,"\n";
print "Sequence acc is ",$seq_object1->accession_number,"\n";
print "First 5 bases is ",$seq_object1->subseq(1,5),"\n";
# get the whole sequence as a single string
my $sequence_as_a_string = $seq_object1->seq( );
# writing sequences
my $gbfilenameout = 'bpout.genbank';
write_sequence(">$gbfilenameout",'genbank',$seq_object1);
write_sequence(">$gbfilenameout",'genbank',@seq_object_array);
# making a new sequence from just strings you have
# from something else
my $seq_object2 = new_sequence("ATTGGTTTGGGGACCCAATTTGTGTGTTATATGTA",
"myname","AL12232");
# getting a sequence from a database (assumes internet connection)
my $seq_object3 = get_sequence('swissprot',"ROA1_HUMAN");
my $seq_object4 = get_sequence('embl',"AI129902");
my $seq_object5 = get_sequence('genbank',"AI129902");
# BLAST a sequence (assummes an internet connection)
my $blast_report = blast_sequence($seq_object3);
write_blast(">blast.out",$blast_report);
This is the Title of the Book, eMatter Edition
Copyright © 2003 O’Reilly & Associates, Inc. All rights reserved.
Bioperl Problems | 303
As you see, I didn’t check on the output of all the method calls, but the real purpose
of this test of my newly installed Bioperl modules was just to see if all the modules
and methods could be found and would run when called.
As mentioned previously, I also added use strict; and use warnings; and declared all
the variables with my. Often, you don’t see that usage in this kind of documentation;
it’s not required in Perl, and it may distract some readers of the documentation from
the main point of showing how the methods are called. This is not an unusual omission
to find in Perl class documentation, but of course, it’s recommended and often
very helpful, as you saw in the last section.
So, I ran my slightly edited version of the example code from the beginning of the
Bio::Perl manpage, with the following results:
[tisdall]$ perl bp1.pl
Sequence name is AI129902
Sequence acc is AI129902
First 5 bases is CTCCG
Submitted Blast for [ROA1_HUMAN] .........
[tisdall]$
I looked at each of the input and output files and verified the expected contents. The
following output from a listing of the files will give you an idea of what to expect
(although you may get different results by running the program with different input
files or database lookups):
-rw-rw-r-- 1 tisdall tisdall 2391 May 5 10:37 AI129902.genbank
-rw-rw-r-- 1 tisdall tisdall 2485 May 5 13:00 array.fasta
-rw-rw-r-- 1 tisdall tisdall 1513 May 5 13:02 bp1.pl
-rw-rw-r-- 1 tisdall tisdall 3653 May 5 13:02 bpout.genbank
-rw-rw-r-- 1 tisdall tisdall 56888 May 5 13:04 blast.out
Notice that at the end of the program I call the blast_sequence method on the protein
sequence object $seq_object3. I discovered that trying to run the blast_sequence
method on a nucleotide sequence object (such as $seq_object5) failed. Although the
documentation for the method said that the sequence type would be examined and
the appropriate BLAST program called (for example, blastp for protein sequence
and blastx for nucleotide sequence, against the nr nonredundant protein database),
it always seemed to call blastp no matter what the input sequence, and therefore it
failed unless called with protein sequence as I had stored in $seq_object3. Perhaps
this bug, a disconnect between the code and the documentation, has been fixed by
the time you read this.
Bioperl Problems
Bioperl is still a work in progress, and it has some problems. I’d like to mention the
two main problems now.
This is the Title of the Book, eMatter Edition
Copyright © 2003 O’Reilly & Associates, Inc. All rights reserved.
304 | Chapter 9: Introduction to Bioperl
First, the Bioperl documentation is incomplete. In fact, until fairly recently, there was
no document that provided a tutorial introduction to the project. This has changed;
the bptutorial.pl document, which you’ve already seen and will see more of, is an
excellent beginning, despite its occasional errors. This document cleverly combines a
tutorial with quite a few example programs that you can run, as you’ll soon see.
Other documentation for Bioperl is also available, including Internet-based tutorials,
forthcoming books, example programs, and journal articles. So, the situation has
recently improved.
Second, Bioperl is big (over 500 modules), written by volunteers, and gradually
evolving. The size of the project is a sign that Bioperl addresses many interesting and
useful problems, but it also means that, for the new user of Bioperl, an overview of
the available resources is a task in itself.
The majority of the Bioperl code is quite good, especially the most-used parts of it.
However, the volunteer and evolving nature of Bioperl development means that
some of the code is unfinished and not as well integrated with other parts of the
project as one would like. Newer or less used modules may still need some shaking
out by users in real-world situations. This is where you can make an initial contribution
to the project: as you find problems, report them (more on that later).
Many of the computing world’s most successful programs are the result of the same
kind of volunteer development as Bioperl (the Perl language itself and the Apache
web server are two examples). Bioperl is well positioned to achieve a similarly central
position in the field of bioinformatics.
Overview of Objects
Bioperl is a big project and a fairly large collection of modules. Some of these modules
are standalone; others interact with each other in various ways.
Your first task in learning about Bioperl is to get an idea of the main subject areas the
modules are designed to address. So to begin with, here is a brief overview of the
main types of objects in Bioperl, collected in a few broadly defined groups.
Sequences
Bio::Seq is the main sequence object in Bioperl.
Bio::PrimarySeq is a sequence object without features.
Bio::SeqIO provides sequence file input and output.
Bio::Tools::SeqStats provides statistics on a sequence.
Bio::LiveSeq::* handles changing sequences.
Bio::Seq::LargeSeq provides support for very large sequences.
Databases
Bio::DB::GenBank provides GenBank access. Similar modules are available for
several biological databases.
This is the Title of the Book, eMatter Edition
Copyright © 2003 O’Reilly & Associates, Inc. All rights reserved.
Overview of Objects | 305
Bio::Index::* indexing and accessing local databases.
Bio::Tools::Run::StandAloneBlast runs BLAST on your local computer.
Bio::Tools::Run::RemoteBlast runs BLAST remotely.
Bio::Tools::BPlite parses BLAST reports.
Bio::Tools::BPpsilite parses psiblast reports.
Bio::Tools::HMMER::Results parses HMMER hidden Markov model results.
Alignments
Bio::SimpleAlign manipulates and displays simple multiple sequence
alignments.
Bio::UnivAln manipulates and displays multiple sequence alignments.
Bio::LocatableSeq are sequence objects with start and end points for locating
relative to other sequences or alignments.
Bio::Tools::pSW aligns two sequences with the Smith-Waterman algorithm.
Bio::Tools::BPbl2seq is a lightweight BLAST parser for pairwise sequence
alignment using the BLAST algorithm.
Bio::AlignIO also aligns two sequences with BLAST.
Bio::Clustalw is an interface to the Clustalw multiple sequence alignment
package.
Bio::TCoffee is an interface to the TCoffee multiple sequence alignment
package.
Bio::Variation::Allele handles sets of alleles.
Bio::Variation::SeqDiff handles sets of mutations and variants.
Features and genes on sequences
Bio::SeqFeature is the sequence feature object in Bioperl.
Bio::Tools::RestrictionEnzyme locates restriction sites in sequence.
Bio::Tools::Sigcleave finds amino acid cleavage sites.
Bio::Tools::OddCodes rewrites amino acid sequences in abbreviated codes for
specific statistical analysis (e.g., a hydrophobic/hydrophilic two-letter
alphabet).
Bio::Tools::SeqPattern provides support for regular expression descriptions of
sequence patterns.
Bio::LocationI provides an interface to location information for a sequence.
Bio::Location::Simple handles simple location information for a sequence, both
as a single location and as a range.
Bio::Location::Split provides location information where the location may
encompass multiple ranges, and even multiple sequences.
Bio::Location::Fuzzy provides location information that may be inexact.
Bio::Tools::Genscan is an interface to the gene finding program.
Bio::Tools::Sim4::Results (and Exon) is an interface to the gene exon finding
program.
Bio::Tools::ESTScan is an interface to the gene finding program.
Bio::Tools::MZEF is an interface to the gene finding program.
This is the Title of the Book, eMatter Edition
Copyright © 2003 O’Reilly & Associates, Inc. All rights reserved.
306 | Chapter 9: Introduction to Bioperl
Bio::Tools::Grail is an interface to the gene finding program.
Bio::Tools::Genemark is an interface to the gene finding program.
Bio::Tools::EPCR parses the output of ePCR program.
bptutorial.pl
I’ve already shown you a little of the bptutorial.pl document. I ran and discussed a
few of the short example programs in the preceding sections.
As you know, one of the easiest ways to get started with a programming system is to
find some working and fairly generic programs in that system. You can read and run
the programs, and then proceed to alter them using them as templates for your own
programming development.
Bioperl comes with a directory of example programs, but the best place to begin
looking for starting-off program code is right in the bptutorial.pl document itself.
That .pl suffix on the name is the giveaway; the document is actually itself a program,
cleverly designed so that you can read, and run, example programs that exercise
the core parts of the Bioperl project.
The following explanation of the runnable programs that are part of bptutorial.pl
appears at the end of the document (when you view it on the Web or as the output
of perldoc bptutorial.pl).
V.2 Appendix: Tutorial demo scripts
The following scripts demonstrate many of the features of
bioperl. To run all the core demos, run:
> perl -w bptutorial.pl 0
To run a subset of the scripts do
> perl -w bptutorial.pl
and use the displayed help screen.
It may be best to start by just running one or two demos
at a time. For example, to run the basic sequence manipulation
demo, do:
> perl -w bptutorial.pl 1
Some of the later demos require that you have an internet
connection and/or that you have an auxilliary bioperl
library and/or external cpan module and/or external program
installed. They may also fail if you are not running
under Linux or Unix. In all of these cases, the script
should fail "gracefully" simply saying the demo is being
skipped. However if the script "crashes", simply run the
This is the Title of the Book, eMatter Edition
Copyright © 2003 O’Reilly & Associates, Inc. All rights reserved.
bptutorial.pl | 307
other demos individually (and perhaps send an email to
bioperl-l@bioperl.org detailing the problem :-).
(Recall that the -w flag to Perl turns on warnings in almost the same manner as a use
warnings; directive.)
To test my Bioperl installation, I started by running the basic sequence manipulation
demo as suggested.
First, I thought I might copy the bptutorial.pl program file into my own working
directory from the Bioperl distribution directory where I’d unpacked the source
code. I wanted to put it in my own directory so as not to muddy up the Bioperl distribution
directory with my own extraneous files. However, I discovered that the
tutorial demo programs rely on a number of datafiles that are found in the t/data/
subdirectory of the Bioperl distribution. Running the program in my own directory
gave me an error because the program evidently requires a FASTA-formatted file
called dna1.fa:
[tisdall]$ perl -w bptutorial.pl 1
Beginning sequence_manipulations and SeqIO example...
------------- EXCEPTION -------------
MSG: Could not open t/data/dna1.fa: No such file or directory
STACK Bio::Root::IO::_initialize_io /usr/local/lib/perl5/site_perl/5.8.0/Bio/Root/IO.
pm:264
STACK Bio::SeqIO::_initialize /usr/local/lib/perl5/site_perl/5.8.0/Bio/SeqIO.pm:437
STACK Bio::SeqIO::fasta::_initialize /usr/local/lib/perl5/site_perl/5.8.0/Bio/SeqIO/
fasta.pm:80
STACK Bio::SeqIO::new /usr/local/lib/perl5/site_perl/5.8.0/Bio/SeqIO.pm:355
STACK Bio::SeqIO::new /usr/local/lib/perl5/site_perl/5.8.0/Bio/SeqIO.pm:368
STACK main::__ANON__ bptutorial.pl:2758
STACK toplevel bptutorial.pl:3933
--------------------------------------
[tisdall]$
I decided it would be easier to run bptutorial.pl from the distribution directory. I
entered that directory; on my Linux machine, I unpacked the Bioperl source code
into /usr/local/src/bioperl-1.2.1. I then tried to run the demo:
[tisdall]$ cd /usr/local/src/bioperl-1.2.1
[tisdall]$ perl -w bptutorial.pl 1
Beginning sequence_manipulations and SeqIO example...
First sequence in fasta format...
>Test1
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTC
TGATAGCAGCTTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGG
TCACTAAATACTTTAACCAATATAGGCATAGCGCACAGACAGATAAAAATTACAGAGTAC
ACAACATCCATGAAACGCATTAGCACCACC
Seq object display id is Test1
This is the Title of the Book, eMatter Edition
Copyright © 2003 O’Reilly & Associates, Inc. All rights reserved.
308 | Chapter 9: Introduction to Bioperl
Sequence is
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGCTTCTGAACTGGTTAC
CTGCCGTGAGTAAATTAAAATTTTATTGACTTAGGTCACTAAATACTTTAACCAATATAGGCATAGCGCACAGACAGATAAAAAT
TACAGAGTACACAACATCCATGAAACGCATTAGCACCACC
Sequence from 5 to 10 is TTTCAT
Acc num is unknown
Moltype is dna
Primary id is Test1
Truncated Seq object sequence is TTTCAT
Reverse complemented sequence 5 to 10 is ATGAAA
Translated sequence 6 to 15 is LQRAICLCVD
Beginning 3-frame and alternate codon translation example...
ctgagaaaataa translated using method defaults : LRK*
ctgagaaaataa translated as a coding region (CDS): MRK
Translating in all six frames:
frame: 0 forward: LRK*
frame: 0 reverse-complement: LFSQ
frame: 1 forward: *EN
frame: 1 reverse-complement: YFL
frame: 2 forward: EKI
frame: 2 reverse-complement: IFS
Translating with all codon tables using method defaults:
1 : LRK*
2 : L*K*
3 : TRK*
4 : LRK*
5 : LSK*
6 : LRKQ
9 : LSN*
10 : LRK*
11 : LRK*
12 : SRK*
13 : LGK*
14 : LSNY
15 : LRK*
16 : LRK*
21 : LSN*
[tisdall]$
That seemed to run without error. Now I wanted to see the Perl code that had run
and produced that output.
Exploring a bit, I found that the POD documentation part of bptutorial.pl is
roughly the first half of the file, and that the second half of the file contains the Perl
code for the demos.
The author attributions and the libraries that are loaded are at the beginning of the
Perl code section, somewhere near the middle of the bptutorial.pl file:
#!/usr/bin/perl
# PROGRAM : bptutorial.pl
# PURPOSE : Demonstrate various uses of the bioperl package
This is the Title of the Book, eMatter Edition
Copyright © 2003 O’Reilly & Associates, Inc. All rights reserved.
bptutorial.pl | 309
# AUTHOR : Peter Schattner schattner@alum.mit.edu
# CREATED : Dec 15 2000
# REVISION : $Id: ch09,v 1.31 2003/09/11 13:54:52 mam Exp mam $
use strict;
use Bio::SimpleAlign;
use Bio::AlignIO;
use Bio::SeqIO;
use Bio::Seq;
That seems like a good bet for a list of the most important Bioperl modules. Actually,
some other modules are loaded for individual demos; at various points, the following
come into play:
use Bio::SearchIO;
use Bio::Root::IO;
use Bio::MapIO;
use Bio::TreeIO;
use Bio::Perl
Following those use Module directives comes the following:
# subroutine references
my ($access_remote_db, $index_local_db, $fetch_local_db,
$sequence_manipulations, $seqstats_and_seqwords,
$restriction_and_sigcleave, $other_seq_utilities, $run_remoteblast,
$run_standaloneblast, $blast_parser, $bplite_parsing, $hmmer_parsing,
$run_clustalw_tcoffee, $run_psw_bl2seq, $simplealign,
$gene_prediction_parsing, $sequence_annotation, $largeseqs,
$run_tree, $run_map, $run_struct, $run_perl, $searchio_parsing,
$liveseqs, $demo_variations, $demo_xml, $display_help, $bpinspect1 );
# global variable file names. Edit these if you want to try
#out a tutorial script on a different file
Bio::Root::IO->catfile("t","data","ecolitst.fa");
# used in $sequence_manipulations
my $dna_seq_file = Bio::Root::IO->catfile("t","data","dna1.fa");
# used in $other_seq_utilities and in $run_perl and $sequence_annotation
my $amino_seq_file = Bio::Root::IO->catfile("t","data","cysprot1.fa");
# used in $blast_parser
my $blast_report_file = Bio::Root::IO->catfile("t","data","blast.report");
# used in $bplite_parsing
my $bp_parse_file1 = Bio::Root::IO->catfile("t","data","blast.report");
# used in $bplite_parsing
my $bp_parse_file2 = Bio::Root::IO->catfile("t","data","psiblastreport.out");
# used in $bplite_parsing
my $bp_parse_file3 = Bio::Root::IO->catfile("t","data","bl2seq.out");
This is the Title of the Book, eMatter Edition
Copyright © 2003 O’Reilly & Associates, Inc. All rights reserved.
310 | Chapter 9: Introduction to Bioperl
# used in $run_clustalw_tcoffee
my $unaligned_amino_file = Bio::Root::IO->catfile("t","data","cysprot1a.fa");
# used in $simplealign
my $aligned_amino_file = Bio::Root::IO->catfile("t","data","testaln.pfam");
# other global variables
my (@runlist, $n );
A look at the documentation (perldoc Bio::Root::IO) shows that the catfile method
returns the pathname of a file, which is being saved in a scalar variable such as $dna_
seq_file. This method is there for portability between operating systems, because
operating systems each have their own syntax for specifying pathnames.
After that comes the code for the help screen, the output of which you’ve already
seen; next comes the individual demo subroutines; and finally the code for the main
subroutine, which you saw in the last section.
At the very end of the bptutorial.pl file, I found the part of the code that launches
all the demos:
## "main" program follows
#no strict 'refs';
@runlist = @ARGV;
# display help if no option
if (scalar(@runlist)= =0) {&$display_help;};
# argument = 0 means run tests 1 thru 22
if ($runlist[0] = = 0) {@runlist = (1..22); };
foreach $n (@runlist) {
if ($n = =100) {my $object = $runlist[1]; &$bpinspect1($object); last;}
if ($n = =1) {&$sequence_manipulations; next;}
if ($n = =2) {&$seqstats_and_seqwords; next;}
if ($n = =3) {&$restriction_and_sigcleave; next;}
if ($n = =4) {&$other_seq_utilities; next;}
if ($n = =5) {&$run_perl; next;}
if ($n = =6) {&$searchio_parsing; next;}
if ($n = =7) {&$bplite_parsing; next;}
if ($n = =8) {&$hmmer_parsing; next;}
if ($n = =9) {&$simplealign ; next;}
if ($n = =10) {&$gene_prediction_parsing; next;}
if ($n = =11) {&$access_remote_db; next;}
if ($n = =12) {&$index_local_db; next;}
if ($n = =13) {&$fetch_local_db; next;}
if ($n = =14) {&$sequence_annotation; next;}
if ($n = =15) {&$largeseqs; next;}
if ($n = =16) {&$liveseqs; next;}
if ($n = =17) {&$run_struct; next;}
if ($n = =18) {&$demo_variations; next;}
if ($n = =19) {&$demo_xml; next;}
if ($n = =20) {&$run_tree; next;}
if ($n = =21) {&$run_map; next;}
if ($n = =22) {&$run_remoteblast; next;}
if ($n = =23) {&$run_standaloneblast; next;}
This is the Title of the Book, eMatter Edition
Copyright © 2003 O’Reilly & Associates, Inc. All rights reserved.
bptutorial.pl | 311
if ($n = =24) {&$run_clustalw_tcoffee; next;}
if ($n = =25) {&$run_psw_bl2seq; next;}
&$display_help;
}
## End of "main"
So, I searched for sequence_manipulation and found the following code for this, the
first of the bptutorial.pl demos:
#################################################
# sequence_manipulations ( ):
#
$sequence_manipulations = sub {
my ($infile, $in, $out, $seqobj);
$infile = $dna_seq_file;
print "\nBeginning sequence_manipulations and SeqIO example... \n";
# III.3.1 Transforming sequence files (SeqIO)
$in = Bio::SeqIO->new('-file' => $infile ,
'-format' => 'Fasta');
$seqobj = $in->next_seq( );
# perl "tied filehandle" syntax is available to SeqIO,
# allowing you to use the standard <> and print operations
# to read and write sequence objects, eg:
#$out = Bio::SeqIO->newFh('-format' => 'EMBL');
$out = Bio::SeqIO->newFh('-format' => 'fasta');
print "First sequence in fasta format... \n";
print $out $seqobj;
# III.4 Manipulating individual sequences
# The following methods return strings
print "Seq object display id is ",
$seqobj->display_id( ), "\n"; # the human read-able id of the sequence
print "Sequence is ",
$seqobj->seq( )," \n"; # string of sequence
print "Sequence from 5 to 10 is ",
$seqobj->subseq(5,10)," \n"; # part of the sequence as a string
print "Acc num is ",
$seqobj->accession_number( ), " \n"; # when there, the accession number
print "Moltype is ",
$seqobj->alphabet( ), " \n"; # one of 'dna','rna','protein'
print "Primary id is ", $seqobj->primary_seq->primary_id( )," \n";
# a unique id for this sequence irregardless
#print "Primary id is ", $seqobj->primary_id( ), " \n";
This is the Title of the Book, eMatter Edition
Copyright © 2003 O’Reilly & Associates, Inc. All rights reserved.
312 | Chapter 9: Introduction to Bioperl
# a unique id for this sequence irregardless
# of its display_id or accession number
# The following methods return an array of Bio::SeqFeature objects
$seqobj->top_SeqFeatures; # The 'top level' sequence features
$seqobj->all_SeqFeatures; # All sequence features, including sub
# seq features
# The following methods returns new sequence objects,
# but do not transfer features across
# truncation from 5 to 10 as new object
print "Truncated Seq object sequence is ",
$seqobj->trunc(5,10)->seq( ), " \n";
# reverse complements sequence
print "Reverse complemented sequence 5 to 10 is ",
$seqobj->trunc(5,10)->revcom->seq, " \n";
# translation of the sequence
print "Translated sequence 6 to 15 is ",
$seqobj->translate->subseq(6,15), " \n";
my $c = shift;
$c ||= 'ctgagaaaataa';
print "\nBeginning 3-frame and alternate codon translation example... \n";
my $seq = new Bio::PrimarySeq('-SEQ' => $c,
'-ID' => 'no.One');
print "$c translated using method defaults : ",
$seq->translate->seq, "\n";
# Bio::Seq uses same sequence methods as PrimarySeq
my $seq2 = new Bio::Seq('-SEQ' => $c, '-ID' => 'no.Two');
print "$c translated as a coding region (CDS): ",
$seq2->translate(undef, undef, undef, undef, 1)->seq, "\n";
print "\nTranslating in all six frames:\n";
my @frames = (0, 1, 2);
foreach my $frame (@frames) {
print " frame: ", $frame, " forward: ",
$seq->translate(undef, undef, $frame)->seq, "\n";
print " frame: ", $frame, " reverse-complement: ",
$seq->revcom->translate(undef, undef, $frame)->seq, "\n";
}
print "Translating with all codon tables using method defaults:\n";
my @codontables = qw( 1 2 3 4 5 6 9 10 11 12 13 14 15 16 21 );
foreach my $ct (@codontables) {
print $ct, " : ",
$seq->translate(undef, undef, undef, $ct)->seq, "\n";
}
return 1;
} ;
#################################################
This is the Title of the Book, eMatter Edition
Copyright © 2003 O’Reilly & Associates, Inc. All rights reserved.
bptutorial.pl: sequence_manipulation Demo | 313
bptutorial.pl: sequence_manipulation
Demo
In this section, I’ll go through the code for the demo subroutine sequence_
manipulation that was shown in the last section.
The subroutine is actually an anonymous subroutine; a reference to the subroutine is
saved in the scalar reference variable $sequence_manipulation:
$sequence_manipulations = sub {
...
}
The first few lines of code declare some variables with my. Notice that these are not
being passed in as arguments; this method uses no arguments but does occasionally
use global variables such as $dna_seq_file, which, as you’ve just seen, contain the
pathname of the input sequence file the demo will use:
my ($infile, $in, $out, $seqobj);
$infile = $dna_seq_file;
print "\nBeginning sequence_manipulations and SeqIO example... \n";
The code is cross-referenced to the tutorial sections of the file. The next comment
line refers to the part of the document:
# III.3.1 Transforming sequence files (SeqIO)
which can be looked up in the table of contents to the document for further reading:
III.3 Manipulating sequences
III.3.1 Manipulating sequence data with Seq methods (Seq)
Now, I’ll take a look at the first section of example code in the sequence_
manipulations method:
# III.3.1 Transforming sequence files (SeqIO)
$in = Bio::SeqIO->new('-file' => $infile ,'-format' => 'Fasta');
$seqobj = $in->next_seq( );
# perl "tied filehandle" syntax is available to SeqIO,
# allowing you to use the standard <> and print operations
# to read and write sequence objects, eg:
#$out = Bio::SeqIO->newFh('-format' => 'EMBL');
$out = Bio::SeqIO->newFh('-format' => 'fasta');
print "First sequence in fasta format... \n";
print $out $seqobj;
The code starts with a call to the new object constructor of the Bio::SeqIO class. The
new method is being passed the pathname to a FASTA file in $infile, and told that
the format is FASTA

A quick look at the Bio::SeqIO documentation explains that the call to Bio::SeqIO->
new returns a stream object for the specified format. So, $out is a stream object (a
stream is input or output of data) for FASTA-formatted data, and $in is a stream
object for FASTA-formatted input from the file named in the $infile variable. These
$in and $out objects are also filehandles.
After the $in object is initialized on the FASTA file named in $infile, it calls the
next_seq method, which gets the next (in this case, the first and perhaps only)
FASTA record from the file, and it creates a sequence object $seqobj. The output
$out object is created. The Perl print statement is then called, using $out as a filehandle,
and printing $seqobj.

BY
AVRO JYOTI DE
MERIT IIT

THE WORLD OF AVRO'S

Tuesday, March 9, 2010

INTRODUCTION OF BIOPERL

Followers

Blog Archive

About Me