Brazilian EMBnet Node: Progress Report


Ana Tereza Vasconcelos1, Wim M. Degrave2, Goran Neshich3

1Laboratório Nacional de Computação Científica, Laboratório de Bioinformática, Quitandinha, Petrópolis, Rio de Janeiro, Brazil

2Oswaldo Cruz Institute (IOC), FIOCRUZ, Rio de Janeiro, Brazil

3Brazilian Agricultural Research Corporation (EMBRAPA) and UNICAMP’s Department of Biology, Campinas, Brazil


The Brazilian EMBnet node conducts research and development in Bioinformatics and Computational Biology, with emphasis on creating and applying computational and mathematical methods and models for solving biological problems. The Brazilian node is formed by a network of three institutions: the National Laboratory for Scientific Computation (LNCC - Petrópolis), the Oswaldo Cruz Foundation (Fiocruz - Rio de Janeiro) and Empresa Brasileira de Pesquisa Agropecuária (EMBRAPA - Campinas). The network maintains and develops databases and tools in bioinformatics and computational biology to meet the needs of thematic networks and of national and international collaborative projects, organizes training courses at several levels, and promotes technology and innovation.

National Laboratory for Scientific Computation - LNCC, Laboratory for Bioinformatics

Dr. Ana Tereza Vasconcelos



The LNCC, one of the National Institutes of the Ministry of Science and Technology, currently has the following computational resources: a SunFire 6800 with 24 processors and 24 GB of memory, a SunFire 3800 with 4 processors and 32 GB of memory, an SGI Challenger with 8 processors and 2 GB of memory, and 2 Sun Enterprise 450 servers. Together they offer an up-to-date set of tools for developing applications that demand high levels of computational and scientific resources. The LNCC also has 90 Unix workstations (IBM, Silicon Graphics, Sun and Linux), 350 PCs and 100 printers. The external network of the LNCC consists of two links: one of 34 Mbps (megabits/second) to the POP-Rio de Janeiro of the RNP, which is operated by the LNCC at its old headquarters, and another of 2 Mbps to the REDERIO. Utilization of the two links (Rio de Janeiro-Petrópolis) is approximately 50%. Expansion of the links with the REDERIO, aiming at interconnection with the REMAV-Rio de Janeiro (High-Speed Metropolitan Network), is under study. Two communication servers for dial-up access, each with 30 digital lines (60 lines in total), should also become available and will be located in Petrópolis and in Rio de Janeiro (POP-Rio de Janeiro).

The internal network of the LNCC currently comprises 2 CISCO Catalyst switches, model 6509, interconnected at 4 Gbps, which link two clusters of Catalyst switches, model XL-2909, with FEC connections of 800 Mbps in each cluster. The master switches of each cluster have 2 expansion slots available, as well as several 10/100 Mbps ports reserved for expansion. In total, the LNCC has approximately 500 10/100 Mbps ports in the switch clusters interconnecting the workstations of its technical and scientific staff. The cabling is certified and warranted by Lucent Technologies for a period of 15 years. The connections between switches use optical fiber, and the links from the switches to the rooms (stations) use category 7 twisted-pair cables.


The Darcy Fontoura de Almeida Computational Genomics Unit is associated with the Laboratory of Bioinformatics of the National Laboratory for Scientific Computation – LNCC. This Unit, coordinated by Ana Tereza Ribeiro de Vasconcelos, integrates high-throughput DNA sequencing and bioinformatics in a single center, allowing the best possible use of the data generated by the new Roche 454 GS FLX sequencer. Inaugurated on September 19, 2008, the Computational Genomics Unit is a national reference center of excellence in high-throughput sequencing. At present, its 454 GS FLX sequencer is the only one in South America that follows all the specifications of the manufacturer, Roche. The laboratory also has an Agilent Bioanalyzer 2100, a Nanodrop 3000 fluorometer, a Genomic Solutions HydroShear, a Qiagen TissueLyser, centrifuges, a Beckman Coulter Z1, Veriti thermocyclers and other support equipment.

Projects with financial support

2008 - present: Computational Genomics and Partial Sequencing of the Trypanosoma cruzi Genome

Financial support: Fundação Carlos Chagas Filho de Amparo à Pesquisa do Estado do Rio de Janeiro-FAPERJ

2008 - present: Support for the installation and maintenance of the multi-user computational genomics unit

Financial support: Fundação Carlos Chagas Filho de Amparo à Pesq. do Estado do Rio de Janeiro-FAPERJ

2008 - present: South American and Ibero-American Bioinformatics Network (Red SurAmericana e Iberoamericana de Bioinformatica)

Financial support: Conselho Nacional de Desenvolvimento Científico e Tecnológico-CNPq

2008 - present: National DNA Sequencing Network - Brazilian Genome Project: Determination of Genomes Relevant to Human Health

Financial support: Ministério da Ciência e Tecnologia-MCT, Ministério da Saúde-MS

2008 - present: Brazilian Cancer Research Network (Rede Brasileira de Pesquisas sobre o Câncer - RBPC)

Financial support: Ministério da Saúde-MS and Conselho Nacional de Desenvolvimento Científico e Tecnológico-CNPq

2007 - present: Biotechnology - Inputs for Genomics and Proteomics

Financial support: Conselho Nacional de Desenvolvimento Científico e Tecnológico-CNPq

2007 - present: Prospecting for new genes with biotechnological potential

Financial support: Conselho Nacional de Desenvolvimento Científico e Tecnológico-CNPq

2006 - present: Multicenter study for the molecular characterization of hemophilia A and B and determination of hemophilia carrier status in Brazil

Financial support: Ministério da Saúde-MS

2006 - 2008: Brazilian Microbiological Resource Center (BMRC)

Financial support: Conselho Nacional de Desenvolvimento Científico e Tecnológico-CNPq, Empresa Brasileira de Pesquisa Agropecuária-Centro Nac. de Pesq. de Soja-EMBRAPA SOJA

2006 - present: CTpedia database

Financial support: Ludwig Institute for Cancer Research

2004 - present: HAMAP BRAZIL - Pathogenic Proteins Annotation Project

Financial support: Swiss Institute for Bioinformatics

2004 - 2008: Comparative genomics of Xylella fastidiosa project

Financial support: Ministério da Ciência e Tecnologia-MCT, Universidade de São Paulo-USP

2004 - present: Nitrogen fixers

Financial support: Conselho Nacional de Desenvolvimento Científico e Tecnológico-CNPq, Empresa Brasileira de Pesquisa Agropecuária-Centro Nac. de Pesq. de Soja-EMBRAPA SOJA


  • Functional genomics of pathogenic microorganisms, 2009.
  • Genomics and Bioinformatics, 2008.
  • Bioinformatics I - Databases from a biological point of view, 2007.
  • Special Topics in Genetics II - Comparative Genomics, 2007.
  • Analysis and Comparison of Genomes - Prokaryotes, 2006.
  • Bioinformatics I - Databases from a biological point of view, 2006.


  • EMBnet node
  • Expasy Mirror
  • CTdatabase
  • Brazilian Microbiological Resource
  • Mamibase
  • Tractor DB
  • Structural Descriptor DataBase
  • SABIA – Software for Automatic Bacterial Annotation

Fundação Oswaldo Cruz – FIOCRUZ, Platform for Bioinformatics and Laboratory for Functional Genomics and Bioinformatics

Dr. Wim Degrave

The EMBnet node activities at Fiocruz are carried out by the institutional Bioinformatics Platform, with support from the VPPLR-PDTIS program-RPT4A, and by the IOC Functional Genomics and Bioinformatics Unit, with support from the Program for Scientific Computing and the Fiocruz Network.

Team members:

  • Wim M. Degrave
  • Antonio Basilio de Miranda
  • Thomas Dan Otto (currently at the Sanger Institute, UK)
  • Fábio F. Mota - Technologist
  • Mark Catanho - PhD student in Cellular and Molecular Biology
  • Ana Carolina Guimarães - PhD student in Cellular and Molecular Biology
  • Flávio Engelke - Master's student in Biomedical Sciences


Bioinformatics services; support for genomics and proteomics platforms at Fiocruz; genome sequencing projects; software and application development; installation and upgrading of software; construction, implementation and updating of databases; design and maintenance of information services; organization of training courses and on-line training; research projects in comparative genomics, evolutionary biology and genome-wide metabolic analysis; drug development for neglected diseases.

The node aims to:

  • provide the environment and support for bioinformatics (biological data processing, access to genetic databases, creating and maintaining databases for proteomic analysis) and for special applications such as molecular modeling, genome assembly and analysis, and proteomics;
  • organize hands-on training courses and on-line training for users, mostly within graduate and post-graduate programs;
  • contribute to specific research projects through software and database development;
  • disseminate bioinformatics as a tool and as a research and development discipline, thereby contributing to the improvement of public health and the development of new technologies and tools;
  • generate a potential economic impact, by contributing to the patentability of research projects and innovations and by attracting external resources for this purpose.


The main infrastructure of the unit currently comprises a dozen smaller dedicated servers. Two larger servers are to be added in 2010. Fiocruz has an extensive fiber-optic network linking several thousand PCs in the different Institutes that make up the Foundation, and is connected to the RNP and REDERIO through high-speed links. Fiocruz has several additional bioinformatics groups performing research and development in fields such as genomics, statistics and epidemiology, molecular modeling, georeferencing and systems biology, and offers post-graduate courses in Computational and Systems Biology.

Special Services offered:

  • Bioinformatics databases and applications
  • Genome assembly
  • Web servers
  • General sequence analysis
  • Proteome analysis
  • Data processing

The most common software packages for sequence assembly and for databases are available.

Products developed by the team of the platform:

  • BioParser * - Analyzer/parser for all varieties of BLAST and FASTA, with support for versions of BLAST with and without gaps.
  • SQUID * - Friendly local grid environment for the use of BLAST and FASTA programs.
  • GenoMycDB - Database for information related to the genome and proteome of mycobacteria
  • REReP - Method to facilitate the assembly of genomes, based on the detection and filtering of repetitive sequences (applicable to data obtained by the Sanger method and probably by pyrosequencing)
  • AnEnPi - Tool for clustering, similarity search, identification of cases of functional analogy and reconstruction of metabolic pathways.
  • ProteinWorldDB - Database of similarity indexes between protein sequences from hundreds of genomes – http://www.proteinworlddb.org
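
To give a flavor of the kind of parsing that a tool such as BioParser performs, the following minimal Python sketch reads BLAST hits in the standard 12-column tabular format (`-outfmt 6` in BLAST+). It is not BioParser's code or interface: the `Hit` record, the function name `parse_blast_tabular` and the sample line are hypothetical and serve only as an illustration.

```python
from typing import Iterable, Iterator, NamedTuple


class Hit(NamedTuple):
    """A subset of the 12 standard tabular BLAST columns (hypothetical record)."""
    query: str
    subject: str
    percent_identity: float
    alignment_length: int
    evalue: float
    bit_score: float


def parse_blast_tabular(lines: Iterable[str]) -> Iterator[Hit]:
    """Yield one Hit per data line, skipping blank lines and '#' comments."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 12:
            raise ValueError(f"expected 12 tab-separated columns, got {len(fields)}")
        # Column order: qseqid sseqid pident length mismatch gapopen
        #               qstart qend sstart send evalue bitscore
        yield Hit(
            query=fields[0],
            subject=fields[1],
            percent_identity=float(fields[2]),
            alignment_length=int(fields[3]),
            evalue=float(fields[10]),
            bit_score=float(fields[11]),
        )


# Hypothetical input, invented for illustration only.
sample = [
    "# made-up example hit\n",
    "q1\ts1\t98.50\t200\t3\t0\t1\t200\t5\t204\t1e-100\t350\n",
]
hits = list(parse_blast_tabular(sample))
print(hits[0])
```

A real parser would also need to handle the other BLAST and FASTA report formats mentioned above; the sketch covers only the simplest tabular case.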


  • Computational analysis of sequences and proteins (IOC 26051)
  • Origin, Structure and Evolution of prokaryotic genomes (IOC 26052)

Recent new collaborations

  • Analysis of the genome of Streptococcus pneumoniae, in collaboration with BioManguinhos (Dr. Marco Medeiros)
  • Development of a multiplex PCR for distinguishing species of the genera Wolbachia, Ehrlichia, Rickettsia and Anaplasma, in collaboration with Dr. Agnes Rossi (top Mar/2009 - Ready for testing on bench)
  • Analysis of genes of Vibrio mimicus, in collaboration with Dr. Ana Carolina Vicente (top Sep/2009)


Empresa Brasileira de Pesquisa Agropecuária - EMBRAPA, Laboratory for Computational Biology

Dr. Goran Neshich

Embrapa, through its Laboratory for Computational Biology, has a long record of offering services to academic partners over the internet, drawing on its experience in maintaining its own product, STING. It also mirrors public databases such as PDB, Uniprot, Prosite, HSSP, DSSP and ProTherm, and maintains STING mirrors on five continents. Embrapa's activities include intensive service, education and development, and involve students as well as experienced colleagues from Brazil and the rest of Latin America.


The lab is located in an environment with ample space for students and researchers, as well as dedicated space for servers and for training. The hardware infrastructure is currently undergoing extensive renewal: new machines are being installed to replace servers and PC stations acquired in 2001, and expanded storage is being acquired to cope with the ever-growing disk space needed for databases and their back-ups. We have SUN, SGI and Dell clusters totaling about 60 CPUs, with around 15 TB of total storage on separate servers. Around 15 Linux/Windows dual-boot PCs are dedicated to student work and another 20 are dedicated to training only. Because of the infrastructure update, lab reconstruction and team renewal, the lab has temporarily stopped offering general EMBnet services until all pending issues are resolved, so that better services can be provided.


We are simultaneously restoring and expanding STING and its database, transforming it into a federative contribution platform. We would like to offer EMBnet not only the new STING but also our experience in upgrading it, maintaining it, mirroring it and using it for educational purposes.


We run courses for the bioinformatics programs of two major universities, UFMG and Unicamp. Both are well attended and mostly teach structural computational biology, along with some tools and databases from the one-dimensional world of sequences.

During the last three years we have offered a total of three courses to more than 50 students, mainly covering material from structural computational biology and structural bioinformatics.

Database construction

We constructed the first Latin American database to be registered in the NAR database issue. Since then, we have aggregated many parameters into that same STING_DB, making it the largest of its kind available for access over the web. Currently this database contains more than 28.5 × 10^9 records (61,000 PDB files, ~130,000 chains, ~300 AA/chain, 731 descriptors/AA).
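
The record count quoted above follows from multiplying the approximate figures given in parentheses. A minimal sketch of that arithmetic in Python, using only the rounded values stated in the text (the variable names are illustrative and are not part of STING_DB):

```python
# Rounded figures quoted in the text; variable names are illustrative only.
chains = 130_000                # ~130,000 chains (from ~61,000 PDB files)
residues_per_chain = 300        # ~300 amino acids (AA) per chain
descriptors_per_residue = 731   # 731 descriptors per amino acid

# One record per (chain, residue, descriptor) combination.
records = chains * residues_per_chain * descriptors_per_residue
print(f"{records:,} records, i.e. about {records / 1e9:.1f} x 10^9")
```

Because the chain count and chain length are approximations, the product should be read as an order-of-magnitude check rather than an exact size.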

Software development

We also published and posted on the Web the STING suite of software programs for educational and analytical purposes. The STING analysis protocols are designed for routine use and can generate advanced reports about the structure, sequence, function, stability and binding of proteins and their ligands.


  • Study of macromolecular communication in homo- and hetero-complexes through their interfaces. Unicamp-IB + Embrapa/CNPTIA
  • Large-scale protein function prediction tools. Genoscope, France + Embrapa/CNPTIA
  • TargetsDB - Database of validated therapeutic targets. UFMG + Embrapa/CNPTIA
  • Automatic prediction of protein-protein interfaces based on a novel hydrophobicity index. Unicamp-IB + Embrapa/CNPTIA
  • Free Bioinformatics Technology consolidation and application in Biomedicine (FreeBIT). Red Iberoamericana de Bioinformatica + Embrapa/CNPTIA
  • Druggable proteins: Identification of potential therapeutic targets for development of agrochemicals, veterinary and medical drugs and vaccines for treating plant and animal diseases important for agriculture and livestock. UFMG + USP + UNICAMP + UNIFEI + EMBRAPA
  • GenoProtPlus. SUN Computers and EMBRAPA
  • Molecular modeling and structural analysis of the twitching motility protein, a product of the XF1633 gene of Xylella fastidiosa. EMBRAPA - CNPTIA


