Trypanosoma cruzi - Programa de Pós
Universidade Federal de Minas Gerais
Instituto de Ciências Biológicas
Departamento de Bioquímica e Imunologia

Doctoral Thesis

Comparative Genomics for the Identification of Virulence Factors in Trypanosoma cruzi

Rondon Pessoa de Mendonça Neto

Advisor: Prof. Dr. Santuza Maria Ribeiro Teixeira
Co-advisor: Prof. Dr. Daniella Castanheira Bartholomeu

Thesis submitted to the Programa de Pós-Graduação em Bioinformática of the Universidade Federal de Minas Gerais in partial fulfillment of the requirements for the degree of Doctor in Bioinformatics.

Belo Horizonte/MG - Brazil
November 2013

“However long the journey, the most important thing is to take the first step.”
Vinícius de Moraes

To Lucas

Contents
Acknowledgments
List of abbreviations
List of figures, tables and annexes
Summary
Abstract
1. Introduction
1.1 Trypanosoma cruzi and Chagas Disease
1.2 Population Variability of Trypanosoma cruzi
1.3 Genome of Trypanosoma cruzi
1.4 Comparative Genomics of Trypanosomatids
1.5 The CL-14 Clone
1.6 Gene Expression in Trypanosomatids
2. Objectives
2.1 General Objective
2.2 Specific Objectives
3. Materials and Methods
3.1 Sequencing of Nuclear and Mitochondrial DNA
3.2 Pre-processing and Preliminary Analyses of the Sequences
3.3 PCR Amplification of Nuclear and Mitochondrial DNA from Trypanosoma cruzi Strains
3.4 Phylogenetic Analyses
3.5 Assembly of the Mitochondrial Genome
3.6 Determination of the Copy Number of Multigene Families
3.7 Determination of Identity between CDSs
3.8 Analyses of Trans-sialidase Genes with SAPA Repeats
3.9 Transcriptome Sequencing and Mapping
4. Results
4.1 Sequencing and Assembly of the Genome of the CL-14 Clone
4.2 Phylogenetic Analyses
4.3 Assembly and Analysis of the Mitochondrial Genome of CL-14
4.4 Comparative Analysis of Multigene Families
4.5 Analyses of the Differences in Genes Encoding Trans-sialidases with SAPA Repeats in CL Brener and CL-14
4.6 Sequencing and Mapping of the CL-14 Transcriptome
5. Discussion
6. References
7. Annexes

Acknowledgments
I thank everyone who supported me during this stage. My special gratitude goes to:
Profs. Santuza Teixeira and Daniella Bartholomeu. Thank you very much; what I have learned I owe to you. You always showed me the best path: planning. Beyond the science you passed on to me, the professionalism you showed is admirable.
My colleagues at the LGMT, LIGP and HPGL laboratories;
Dr. Najib El-Sayed;
Dr. Ricardo Gazzinelli;
Dr. Caroline Junqueira;
All the collaborators at the institutes and universities I passed through during this work;
My teachers;
My friends and relatives, who were understanding of my work, especially Aunt Júlia, who showed me this trail;
I am especially grateful for the sacrifice that Lucas and Nádia made, without complaint, during this stage of hard work;
Finally, CAPES.
List of Abbreviations
BAC – Bacterial artificial chromosome
cDNA – Complementary DNA
DGF – Dispersed gene family
DNA – Deoxyribonucleic acid
DTU – Discrete typing unit
GPI8 – Glycosylphosphatidylinositol-anchor transamidase subunit 8
gRNA – Guide RNA
Kb – Kilobases (10^3 nucleotide bases)
kDNA – Kinetoplast DNA
MASP – Mucin-associated surface protein
mRNA – Messenger RNA
MURF1 – Maxicircle unidentified reading frame 1
MURF2 – Maxicircle unidentified reading frame 2
ND4 – NADH dehydrogenase 4
ND5 – NADH dehydrogenase 5
ng – Nanograms
nt – Nucleotides
ORF – Open reading frame
PFGE – Pulsed-field gel electrophoresis
RNA – Ribonucleic acid
RNAseq – RNA sequencing (quantitative sequencing of RNA)
rRNA – Ribosomal RNA
SAPA – Shed acute phase antigen
SNP – Single nucleotide polymorphism
snRNA – Small nuclear RNA
snoRNA – Small nucleolar RNA
T. cruzi – Trypanosoma cruzi
TcTS-SAPA – Trypanosoma cruzi trans-sialidase with SAPA repeats
tRNA – Transfer RNA
UTR – Untranslated region
WGS – Whole genome shotgun (sequencing strategy based on fragmentation of the whole genome)

List of figures, tables and annexes
Figure 1 – Schematic representation of the life cycle of Trypanosoma cruzi
Table 1 – PCR for differentiation of T. cruzi groups
Table 2 – PCR for differentiating the sizes of SAPA repeat clusters
Table 3 – Comparative data on the sequencing and assembly of the genomes of the CL-14 and CL Brener clones
Figure 2 – Number of CL-14 reads by length in base pairs
Figure 3 – Pulsed-field gel of total DNA from CL-14 and CL Brener
Figure 4 – Synteny between CL-14 contigs and their homologous chromosomes in CL Brener
Figure 5 – Genomic Southern blots
Table 4 – In silico PCR of markers used in T. cruzi genotyping
Figure 6 – Electrophoresis of the amplicons of the markers for DTU differentiation
Figure 7 – Phylogenetic trees
Figure 8 – Part of the alignment between the two different haplotypes
Figure 9 – Alignment of CL-14 reads with homologous CL Brener genes
Figure 10 – Coverage of CL-14 reads on the CL Brener and Esmeraldo maxicircles
Table 5 – Polymorphisms found between the kDNAs of the CL Brener and CL-14 clones
Figure 11 – Comparison between the mitochondrial genomes of CL-14 and CL Brener
Table 6 – Counts of gene families and ortholog groups
Table 7 – Mean identities of protein-coding sequences
Figure 12 – Results of the algorithm
Figure 13 – Comparison of mapping between the algorithm and BWA
Figure 14 – Coverage of the trans-sialidase by genomic reads from CL Brener and CL-14
Figure 15 – Nucleotide-by-nucleotide coverage of the trans-sialidase Tc00.1047053509495.30 by genomic sequencing reads from CL Brener and CL-14 and by transcriptome sequencing reads from CL-14
Figure 16 – Electrophoresis of the TcTS-SAPA analyses
Figure 17 – Organization of the SAPA repeats in the CL Brener and CL-14 clones
Figure 18 – Western blot of TcTS and TcTS-SAPA
Figure 19 – Examples of profiles of total RNAs and cDNA libraries
Figure 20 – Mapping of the CL-14 mRNA sequencing reads
Annex 1 – Publication: Predicting the proteins of Angomonas deanei, Strigomonas culicis and their respective endosymbionts reveals new aspects of the trypanosomatidae family
Annex 2 – Publication: Distinct genomic organization, mRNA expression and cellular localization of members of two amastin sub-families present in Trypanosoma cruzi
Annex 3 – Manuscript in preparation: Genome sequence of a highly attenuate clone of Trypanosoma cruzi identifies SAPA repeats as a major virulence factor in this human parasite

Summary
Trypanosoma cruzi, the etiologic agent of Chagas disease, belongs to a group of organisms with a peculiar genome, in which massive expansions of surface protein gene families are present and a large part of which is devoted to repetitive sequences. The completion of the sequencing of the reference genome, that of the CL Brener clone, revealed several features related to the virulence of the parasite. CL-14 is an avirulent clone derived from the same T. cruzi CL strain; however, in contrast to CL Brener, the CL-14 clone is neither infective nor pathogenic in vivo. To investigate the molecular determinants of T. cruzi virulence, a direct comparison was made between the genomes of the CL Brener and CL-14 clones, based on the available CL Brener sequences and on CL-14 genome sequences that we generated using the 454 FLX platform. Although neither genome has been fully assembled, we found that they are highly similar in organization, in both the nuclear genome and the mitochondrial genome (kDNA), and that they have similar numbers of predicted coding sequences as well as similar copy numbers of members of the multigene families.
PCR analyses as well as phylogenetic inferences showed that CL-14 is also a hybrid clone, belonging to the same DTU as the CL Brener clone (TcVI). Similarity analyses and Southern blots indicate that the two clones have similar karyotypes and sequence identity above 99%. The only important difference detected between these two genomes concerns a subgroup of the large family of genes encoding trans-sialidases (TcTS), known to have a C-terminal domain containing repeats of 12 amino acids named 'shed acute phase antigen', or SAPA repeats. Present in the genome of the CL Brener clone, which has at least three copies of TcTS containing repetitive domains ranging from 19 to 41 repeats, SAPA repeats are highly immunogenic and increase the half-life of the TcTS proteins released into the host bloodstream. In the CL-14 clone, only one copy of TcTS containing three SAPA repeats was identified. This reduced number of SAPA repeats in the TcTS genes of CL-14, confirmed experimentally by PCR, Southern blot and western blot assays and by transcriptome data, may be one of the factors responsible for the differences in virulence between the two strains.

Abstract
Trypanosoma cruzi, the etiologic agent of Chagas disease, belongs to a group of organisms with a peculiar genome in which a massive expansion of surface protein gene families is present and a large proportion of which is devoted to repetitive sequences. The completion of the CL Brener reference strain genome revealed several new features related to parasite virulence. CL-14 is an avirulent clone derived from the same T. cruzi CL strain; however, in contrast to CL Brener, CL-14 is neither infective nor pathogenic in vivo. To investigate the molecular determinants of T. cruzi virulence, we performed a direct comparison of the CL Brener and CL-14 genomes, based on the available CL Brener sequences and sequences we generated from CL-14 using the 454 FLX platform. Although neither genome has been fully assembled, we found that they have highly similar nuclear genome organization, almost 100% identical mitochondrial maxicircle kDNA, and similar numbers of predicted coding sequences and of copies of members of multigene families. PCR analyses as well as phylogenetic inferences showed that CL-14 is also a hybrid that belongs to the same DTU as CL Brener (TcVI). Southern blot analyses indicate a similar karyotype and, for most multigene families, sequence identity between the two clones is higher than 99%. The only major difference detected between these two genomes is related to a subgroup of the large trans-sialidase gene family (TcTS), known to have a C-terminal domain with 12-amino-acid repeats named 'shed acute phase antigen', or SAPA repeats. At least three copies of TcTS containing a repetitive domain varying from 19 to 41 repeats, which are highly immunogenic and increase the half-life of TcTS protein shed into the host bloodstream, are present in the CL Brener genome, whereas in CL-14 only one copy containing 3 SAPA repeats was identified. This reduced number of SAPA repeats in the CL-14 TcTS, confirmed by PCR, Southern blot and western blot analyses and by transcriptome data, may constitute one of the factors responsible for the differences in virulence between these two strains.

1. Introduction

1.1 Trypanosoma cruzi and Chagas Disease
Chagas disease, or American trypanosomiasis, is caused by Trypanosoma cruzi, a protozoan parasite discovered by Carlos Chagas in the early twentieth century. American trypanosomiasis has been designated a neglected tropical disease by the World Health Organization (WHO, 2013).
The disease affects between 7 and 8 million people and causes 12,000 deaths per year (Rassi et al., 2010). Endemic areas are found in 21 Latin American countries. In recent decades, however, the disease has been increasingly detected in the United States of America and Canada owing to migration between countries (WHO, 2013). As a result of widespread insecticide spraying, Uruguay, Chile and Brazil have declared themselves free of transmission by Triatoma infestans, the main vector of T. cruzi (Schofield et al., 2006).

Figure 1: Schematic representation of the life cycle of Trypanosoma cruzi. a: presence of metacyclic trypomastigotes in the feces of the vector; b: entry of metacyclic trypomastigotes into the vertebrate host through a lesion or fissure in the skin or mucous membranes; c: intracellular multiplication of amastigotes; d: differentiation of amastigotes into trypomastigotes; e: release of trypomastigotes and infection of new host cells; f: release of trypomastigotes into the host bloodstream; g: infection of muscle and/or nervous tissue by trypomastigotes; h: ingestion of bloodstream trypomastigotes by the vector; i: differentiation of epimastigotes into metacyclic trypomastigotes in the hindgut of the vector, restarting the life cycle of the parasite. Figure taken and translated from Expert Reviews in Molecular Medicine, Cambridge University Press, 2002.

Transmission of the parasite to humans most commonly occurs when infected feces of the blood-feeding vector come into contact with mucous membranes or with the wound that the vector opens while sucking the host's blood (Fig. 1). Triatomine insects are exclusively hematophagous and become infected with T. cruzi when they feed on mammalian blood containing trypomastigote forms of the parasite. Once in the insect's gut, the parasite transforms into epimastigotes, which are replicative forms.
In the final portion of the digestive tract of the triatomine bug, epimastigotes differentiate into metacyclic trypomastigotes, the form of T. cruzi capable of infecting mammals through vectorial transmission. When infected insects defecate during the blood meal, they deposit parasites, which can result in transmission through contact with the conjunctivae, mucous membranes or the insect's bite wound. Metacyclic trypomastigotes penetrate host cells and transform into amastigotes, the replicative forms in the vertebrate host. After several cycles of multiplication, the amastigotes transform into trypomastigotes and the host cell ruptures, releasing parasites into the blood. The released trypomastigotes can infect adjacent cells or be distributed through the body by the lymphatic or blood vessels, infecting distant organs and tissues. The life cycle and transmission continue when vectors feed on the blood of infected hosts (Brener et al., 2000). Less frequently, transmission can occur through blood transfusion, organ transplantation, congenital transmission (WHO, 2013) or oral transmission by ingestion of contaminated food.

Mortality is mostly associated with the chronic stage of the disease, which can take several years to develop. There is no vaccine for Chagas disease, and only two drugs are available for treatment, both with low efficacy and serious side effects (Brener et al., 2000; WHO, 2013).

1.2 Population Variability of Trypanosoma cruzi
Biological (Andrade, 1974), biochemical and molecular characteristics (Miles et al., 1981; Morel et al., 1980; Tibayrenc and Ayala, 1991; Freitas et al., 2006; Herrera et al., 2007) allowed the various strains of T. cruzi to be classified into two groups, named TcI and TcII.
These lineages are highly divergent, as shown by those authors, and belong to predominantly distinct environments: TcI, found in the central region of South America and with a sylvatic life cycle, shows a low rate of parasitism in humans. TcII, with domestic transmission, causes human infections with high parasitemia in endemic areas (Zingales et al., 1999). In Brazil, TcII strains appear to be exclusively responsible for the lesions of Chagas disease (Freitas et al., 2005). In 2006, Freitas et al. separated 144 different haplotypes through phylogeny with molecular markers and showed that some strains could not be classified as TcI or TcII, suggesting a new group for these strains, TcIII. Other strains could not be classified by the parameters studied, so the phylogenetic classification was not complete. A new group would have to be created for them, since they have characteristics of two distinct groups, TcII and TcIII, indicating the existence of hybrid strains.

With a better understanding of its population structure and the inclusion of new molecular markers, the various strains of this protozoan have more recently been classified into six groups, T. cruzi I-VI (Zingales et al., 2009). The new classification names a hybrid group, TcVI, which originates from a TcII recipient parent and a TcIII donor parent (Freitas et al., 2006). The mitochondrial genome of TcIV is derived from a parental strain belonging to the TcIII group.

1.3 Genome of Trypanosoma cruzi
Analyses of the complete genome sequence of T. cruzi, published in 2005 (El-Sayed et al., 2005), showed that its genome of 55 million base pairs (Mb) is diploid, that 50% of it is coding, and that a large part corresponds to repetitive sequences, such as retrotransposons and genes of large surface protein families. The reference clone chosen for the T. cruzi genome project is CL Brener (Brener and Chiari, 1963), a hybrid clone belonging to the TcVI group (Zingales et al., 2009). The choice of CL Brener for the genome project was based on five characteristics: its pattern of infection in mice is well known, it was isolated from the vector Triatoma infestans, it has a preferential tropism for heart and muscle cells, it presents a clear acute phase in infected humans, and it is sensitive to the drugs used clinically for Chagas disease (Zingales et al., 1997). Other important genomic analyses of this clone had also been published, including karyotype analyses (Branche et al., 2006; Porcile et al., 2003), physical maps and the generation of ESTs from all life-cycle stages of the parasite (Brandão et al., 1997; Cano et al., 1995; Cerqueira et al., 2005; Henriksson et al., 1995; Porcel et al., 2000; Verdun et al., 1998).

The sequencing of the T. cruzi genome was based on the whole-genome shotgun (WGS) technique, with 14-fold coverage and a final assembly of 5,486 scaffolds (El-Sayed et al., 2005). During sequencing it became clear that this is a hybrid genome resulting from the fusion of two genotypes, from strains of T. cruzi II and T. cruzi III; that is, it has two different haplotypes. The genome of the T. cruzi clone Esmeraldo, which belongs to the TcII group, was then sequenced by the same technique at low coverage (2.5x). By comparing the contigs of CL Brener with the reads of Esmeraldo, it was possible to discriminate the two CL Brener haplotypes. CL Brener sequences more similar to the Esmeraldo reads were named Esmeraldo-like. The other haplotype was annotated as non-Esmeraldo-like. Of the 22,570 predicted genes, 6,159 represent Esmeraldo-like alleles, 6,043 represent non-Esmeraldo-like alleles and 10,368 represent sequences that could not be assigned to a particular haplotype.
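As a quick consistency check on the haplotype breakdown quoted above, the three categories should add up to the 22,570 predicted gene models. The short Python sketch below verifies this and computes each category's share of the total; the dictionary keys and the function name `haplotype_shares` are illustrative labels chosen for this example, not identifiers from the genome annotation.

```python
# Counts quoted in the text for the CL Brener annotation (El-Sayed et al., 2005).
TOTAL_PREDICTED_GENES = 22570

# Keys are illustrative labels chosen for this sketch.
gene_models = {
    "Esmeraldo-like": 6159,
    "non-Esmeraldo-like": 6043,
    "unassigned": 10368,
}


def haplotype_shares(counts, total):
    """Return each category's percentage of the total, rounded to one decimal."""
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# The three categories should partition the full set of predicted genes.
assert sum(gene_models.values()) == TOTAL_PREDICTED_GENES

shares = haplotype_shares(gene_models, TOTAL_PREDICTED_GENES)
for name, pct in shares.items():
    print(f"{name}: {gene_models[name]} gene models ({pct}%)")
```

Under these figures, slightly less than half of the predicted gene models (about 46%) could not be assigned to either haplotype, which is worth keeping in mind when haplotype-specific comparisons are discussed.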
In addition to describing the genome and its organization, the authors also presented a new gene family with more than 1,300 copies, the MASP family, which encodes mucin-associated surface proteins. Through phylogenetic analyses, Freitas et al. (2006) confirmed the hybrid nature of the CL Brener clone as resulting from the fusion of parental strains belonging to the T. cruzi II and T. cruzi III groups.

To generate a higher-resolution assembly that better represented the T. cruzi chromosomes, Weatherly et al. (2009) generated a consensus of each pair of homologous chromosomes for both haplotypes. The authors initially assembled 11 chromosomes based on synteny with the chromosomes of T. brucei. Other chromosomes were assembled after mapping both ends of bacterial artificial chromosome (BAC) clones whose end sequences fell in different contigs or scaffolds in the correct orientation. In total, 41 chromosomes were assembled, a count that agrees with the number of T. cruzi chromosomes predicted from pulsed-field gel electrophoresis (PFGE) studies (Branche et al., 2006). The assembly proposed by Weatherly et al. (2009) contains 90% of the genes annotated in the genome.

The genomic organization of T. cruzi was found to be highly syntenic with the genomes of T. brucei and L. major (which, together with T. cruzi, are known as the Tri-Tryps). This synteny is well conserved in regions containing housekeeping genes but is broken in regions of gene families that code for surface proteins, which occur in non-syntenic internal chromosomal regions and in subtelomeric positions. Retroelements and structural RNAs also occur in these low-synteny regions (El-Sayed et al., 2005b).
The mitochondrial genome of these organisms, called kDNA, consists of 25-50 copies of the maxicircle (22 kb) and 5,000-10,000 copies of minicircles (7.5 kb) (Shapiro, 1993). This unique kDNA network is present in the structure of the mitochondrion that characterizes the flagellated eukaryotes of the class Kinetoplastida, to which the trypanosomatids belong. At approximately 22 kb, the maxicircle is distinct from other mitochondrial genomes in its large size, complexity and content (Westenberger et al., 2006), and the kDNA comprises approximately 20-25% of the total DNA of the organism (Souza, 2003). This DNA is an important taxonomic marker, and it was used to define the phylogenetic relationships of 45 T. cruzi strains, grouped into three clades, A, B and C (Machado and Ayala, 2001). The maxicircle of the CL Brener clone derives from TcIII, and the maxicircle of the Esmeraldo clone belongs to TcII.

In 2011, Franzén et al. presented the genome sequence of the T. cruzi clone Sylvio X10/1, a representative of the TcI group. The data revealed that the genomes of the two clones, Sylvio X10/1 and CL Brener, have high synteny and a very similar gene set, but with large differences in the number of genes belonging to multigene families. The alleles of Sylvio X10/1 have 97% and 96% identity with the non-Esmeraldo-like and Esmeraldo-like haplotypes, respectively, which suggests that Sylvio X10/1 is more similar to the non-Esmeraldo-like haplotype, that is, to the T. cruzi type III genotype. The amount of non-coding DNA is also extremely similar between the genomes.

1.4 Comparative Genomics of Trypanosomatids
Comparative genomics studies reveal important aspects related to differences in the life cycle, host type, virulence and pathogenicity of disease-causing organisms. Despite the large phylogenetic distance, analyses of the genomes of T. cruzi and of two other trypanosomatids pathogenic to humans, Leishmania major and Trypanosoma brucei, revealed a common proteome containing 6,200 proteins and high gene synteny (El-Sayed et al., 2005b). The frequent correlation between conserved syntenic blocks and the large directional gene clusters (DGCs), the polycistrons characteristic of the three trypanosomatids, also reflects the coupling of transcription with the subsequent processing of RNA by trans-splicing and polyadenylation.

Even so, there are substantial differences even in genes with the same genomic context, which indicates specific adaptations to selective pressures, to the survival strategy of each organism and to differences in the life cycle. Trypanosoma cruzi, as already mentioned, divides its life cycle between the invertebrate triatomine vector and the vertebrate host, where it infects the blood and invades cells. Trypanosoma brucei divides its life cycle between an invertebrate vector (flies of the genus Glossina), in the epimastigote and promastigote forms, and the vertebrate host, in the trypomastigote form, where it remains in the blood and does not invade cells. Leishmania major also divides its life cycle between an invertebrate vector and a vertebrate host. The vectors are female sand flies of the genera Lutzomyia or Phlebotomus, in which the amastigote and procyclic promastigote forms occur. Leishmania enters the vertebrate host through the insect bite in its promastigote form and, after cell invasion, transforms into amastigotes, the replicative form.

The three trypanosomatids also use different strategies to evade the host immune system: L. major alters the function of the macrophages it infects, T. cruzi expresses a complex variety of surface antigens from inside the cells it infects, and T. brucei remains extracellular but circumvents the host immune response by periodically switching its main surface protein, the VSG (El-Sayed et al., 2005b; Pays et al., 2004).

The location of large arrays of genes coding for surface proteins near or within telomeres, and the presence of transposable elements in these arrays, can increase the frequency of recombination and result in variation of coding sequences. This is observed in T. cruzi with the MASPs, DGF-1 and RHS, the last of these also in T. brucei, where these genes may be related to immune evasion and survival in different hosts. Frequent recombination in these regions results in large polymorphisms between homologous chromosomes.

Similar analyses carried out with genome data from three species of Leishmania, Leishmania infantum, Leishmania braziliensis (Peacock et al., 2007) and Leishmania major (Ivens et al., 2005), also showed that pseudogene formation and gene loss are events that could determine some of the differences observed in parasite-host interaction (Peacock et al., 2007). The comparative study of the genome of Trypanosoma brucei gambiense, the subspecies that causes sleeping sickness in humans, with the genome of T. brucei brucei, a subspecies that does not infect humans, was not able to reveal specific sequences in these trypanosomatids that could explain the differences in their capacity to infect human hosts. However, different copy counts of genes coding for surface proteins were identified between the infective and non-infective clones (Jackson et al., 2010).
Among other trypanosomatids whose genomes have been sequenced, the following are worth noting: Leishmania tarentolae, a non-virulent species lacking genes associated with the intracellular stage in the mammalian host (Raymond et al., 2012); Leishmania donovani, for which the genome of clinical isolates indicated co-infection with Leptomonas (Singh et al., 2013); Trypanosoma cruzi marinkellei B7, a bat-associated parasite, with copy-number variation in multigene families and many unique sequences, including potential subspecies-specific genes (Franzén et al., 2012); Leishmania amazonensis, the etiologic agent of human cutaneous leishmaniasis, whose genome revealed surface genes unique to the genus that may be related to the development of the disease and to interaction with host cells, and for which a hybrid interactome was also proposed between proteins secreted by the parasite and factors that mimic the host system (Real et al., 2013); and Angomonas deanei and Strigomonas culicis, whose sequences revealed data on the interaction and adaptation of these trypanosomatids with their endosymbionts, providing information on the evolution of eukaryotic cells (Motta et al., 2013).

In 2013, Goodhead et al. sequenced the genomes of two subspecies of Trypanosoma brucei, T. b. gambiense and T. b. rhodesiense, which, besides being genetically and geographically distant, are associated with different phases of African sleeping sickness. Using specific markers developed for each of these genotypes, they observed that T. b. rhodesiense isolated from a single focus has the genotype and phenotype of both reference genomes. The authors' results suggest that there has been genetic introgression between the human-infective subspecies of T. brucei and that they are therefore not genetically isolated.

Data from other genomes, such as T. congolense IL3000, T. vivax Y486, L. mexicana U1103, C. fasciculata Cf-Cl, T. brucei Lister 427, T. cruzi Esmeraldo, T. cruzi JR cl. 4, E. monterogeii LV88 and L. panamensis L13, can be found at http://tritrypdb.org.

1.5 The CL-14 Clone
The CL Brener clone, used as the reference strain for the T. cruzi genome project, was derived from a strain isolated from Triatoma infestans (reviewed by Zingales et al., 1997). A second clone derived from the same strain, the T. cruzi clone CL-14, has the peculiar characteristic of being completely avirulent. In vitro infection assays showed that CL-14 is four times less invasive than the parental CL strain (Atayde et al., 2004). Inoculation of CL-14 trypomastigotes, besides producing no parasitemia even in immunodeficient animals, induces efficient protective immunity against subsequent challenge with the CL strain, preventing mortality, the development of parasitemia and symptoms of the disease in mice (Lima et al., 1990 and 1995). In 1999, Paiva et al. demonstrated that CL-14 induces immunity involving a CD8+ response in animals which, after challenge with CL Brener, show no parasitemia or tissue parasitism, including neonatal animals. Immunized animals are able to produce IFN-γ, IgG1, IgG2a and IgG2b (Pyrrho et al., 1998; Soares et al., 2003; Atayde et al., 2004).

Because of its non-virulent nature, the CL-14 clone was used as a vaccine vector against melanoma (Junqueira et al., 2012). The gene for the NY-ESO-1 antigen, which is characterized by low expression in normal tissues but high expression in neoplasms such as tumors of the lung, esophagus, liver, stomach, prostate, ovary and bladder and melanoma, was cloned into an expression vector in CL-14. Transgenic T. cruzi CL-14 expressing NY-ESO-1, when tested in animal models, induced high levels of humoral and Th1-type cellular immune responses against the NY-ESO-1 antigen and completely inhibited the growth of melanomas when human cells expressing NY-ESO-1 were injected into animals infected with CL-14.

1.6 Gene Expression in Trypanosomatids
Gene expression in T. cruzi, as in the other members of the family Trypanosomatidae, occurs in a rather peculiar way. The parasite transcribes its genes constitutively into long polycistronic transcripts that are processed post-transcriptionally. Transcription initiation is bidirectional, between two different polycistrons (Martínez-Calvillo et al., 2003, 2004). Protein-coding genes are transcribed by RNA polymerase II into polycistronic pre-mRNAs (Martínez-Calvillo et al., 2003). Once these are synthesized, trans-splicing reactions, which join the capped Spliced Leader (SL) sequence to the 5' end of the transcript (Liang et al., 2003), and polyadenylation take place, generating mature monocistronic mRNAs (Teixeira and Da Rocha, 2003). Hartmann et al. (1998) and López-Estraño et al. (1998) demonstrated that polypyrimidine-rich intergenic regions guide SL addition and polyadenylation. Campos et al. (2008) showed that in T. cruzi the average distance between SL addition sites and polypyrimidine motifs is 18 nucleotides, and the average distance between poly-A tail addition sites and the nearest upstream polypyrimidine-rich sequence is 40 nucleotides. The authors also showed that the average sizes of the 5'-UTR and 3'-UTR sequences of T. cruzi are 35 and 264 nucleotides, respectively, smaller than those observed for T. brucei, in agreement with the results of the comparative analyses of the Tri-Tryp genomes obtained by Berriman et al., 2005, El-Sayed et al., 2005a and Ivens et al., 2005, which show that the genome of T.
cruzi é mais compacto que o genoma de T. brucei. Poucos estudos sobre expressão gênica ao nível global em T. cruzi foram publicados. Utilizando a técnica de microarray (Minning et al., 2003, Minning et al., 2009), observou-se que um total de 4992 transcritos (aprox. 41% dos genes) parece ser regulado negativa ou positivamente em pelo menos em um dos estágios de vida do parasita. Alguns desses resultados de microarray foram validados por comparação com dados de RT-PCR quantitativo. Foi também observado que membros de clusters parálogos em T. cruzi podem exibir divergências significativas de expressão ao longo do ciclo de vida, no que diz respeito à abundância dos respectivos mRNAs, como é o caso das amastinas, proteínas expressas em amastigotas (Teixeira et al., 1994), mas 15 com membros que expressam também em epimastigotas (Kangussu-Marcolino et al., 2013). Essas análises de microarray apresentam, entretanto, algumas limitações sendo, uma delas, o fato de ser necessário o conhecimento prévio das sequências a serem analisadas, as quais precisam estar presentes nos chips (de oligonucleotídeos ou de cDNAs). Além disso, a detecção de variações nos níveis de mRNAs mais raros torna-se muito difícil, mais ainda quando a quantidade de mRNA para a hibridização com as sondas dos chips é pequena, pois algumas formas, como as amastigotas intracelulares, são difíceis de serem obtidas. Também é difícil representar todos os genes no espaço disponível dos chips. A técnica de microarray não tem sensibilidade de detectar RNAs de baixa expressão, pode criar artefatos de hibridização cruzada, gerar os dados apresenta um custo elevado e tem baixo rendimento (Wang et al., 2009). Em contraste com a tecnologia de microarray, abordagens baseadas em sequenciamento de cDNAs determinam diretamente os níveis dos vários mRNAs nas células. Existe hoje, um grande volume de dados de expressed sequence tags (ESTs) nos bancos de dados genômicos. 
However, these data do not give clear information on gene expression: besides not covering the whole transcriptome, they do not allow the transcripts corresponding to each EST to be quantified accurately. The advent of the new cDNA sequencing technology, RNA-seq, which can determine the structure of transcripts and also quantify their levels in cells, now makes it possible to obtain the complete set of expression data for a genome. This technology has already been used to study the transcriptome of T. brucei. In these studies (Siegel et al., 2011, Kolev et al., 2010 and Archer et al., 2011), sequences were generated from the RNAs present in the two forms of the parasite, and it was shown that transcription initiation in these organisms is not restricted to the start of gene clusters but can occur bidirectionally at internal sites, as described for Leishmania major (Martínez-Calvillo et al., 2003 and 2004) and for Trypanosoma cruzi (reviewed by Araújo et al., 2011). Mapping of the reads from the T. brucei transcriptome sequencing performed by Kolev et al. (2010) showed that transcription is bidirectional, since reads derived from 5'-triphosphate RNA fragments mapped in opposite directions from the same origin. In 2011, Archer et al. identified sequence patterns (motifs) in co-regulated transcripts, suggesting that they may be signals regulating expression. The RNA motifs involved in cell-cycle regulation were described for T. brucei and are conserved among other kinetoplastids. In addition, these studies described the positions of spliced leader addition sites and poly-A tail addition sites. Previously undescribed transcripts were annotated and, no less importantly, transcript abundances were determined for each life-cycle stage, defining the expression level of the genes (Siegel et al., 2005).
In 2011, Franzén and collaborators published a large-scale RNA-seq transcriptome of the small non-coding RNAs of T. cruzi, with the aim of describing RNA metabolism in this organism, given that it lacks the classical RNA interference pathways (da Rocha et al., 2004). The authors found sequences related to rRNAs, snRNAs and snoRNAs, a large quantity of small tRNA-derived RNAs, and 92 new loci, most of which show no homology to known RNA classes.

2. Objectives

2.1 General Objective

To investigate the genomic basis of the difference in infectivity between the Trypanosoma cruzi clones CL Brener, a virulent clone, and CL-14, an avirulent clone in the animal infection model.

2.2 Specific Objectives

1. Generation and analysis of the complete genome sequence of clone CL-14;
2. Comparison of nucleotide sequences between CL-14 and CL Brener;
3. Phylogenetic analyses of CL-14 and CL Brener;
4. Analysis of gene content and copy number of multigene families in CL Brener and CL-14;
5. Assembly and annotation of the CL-14 maxicircle genome;
6. Generation and analysis of cDNA sequences from epimastigotes, intracellular amastigotes and trypomastigotes of CL-14 by RNA-seq, and comparison with RNA sequences expressed in CL Brener and CL-14.

3. Materials and Methods

3.1 Sequencing of Nuclear and Mitochondrial DNA

Epimastigote forms of Trypanosoma cruzi clone CL-14 were grown in LIT medium as described by Teixeira et al. (1994). Total DNA was extracted and used to construct two genomic libraries, each with 5 mg of DNA. One library was prepared by the shotgun method, in which the DNA is fragmented at random. The other was prepared by the paired-end tag (PET) method, in which the DNA is fragmented into larger pieces and "tags" at the ends of these fragments are sequenced and later mapped onto the genome, which is then assembled (Fullwood et al., 2009).
Each library was sequenced individually by high-throughput pyrosequencing on a Roche 454 FLX-Titanium instrument at the Laboratório Nacional de Computação Científica (LNCC) in Petrópolis/RJ. The de novo assembly of the whole T. cruzi genome, also carried out at LNCC, was performed with the Mira (Chevreux et al., 2004) and Newbler (www.454.com) software.

3.2 Pre-processing and Preliminary Analyses of the Sequences

The LNCC group sent us three sets of sequences: the reads, which are the sequences generated by the sequencer, and two contig assemblies, one produced by Mira and the other by Newbler. The reads were pre-processed in silico with the Seqclean software and the UniVec database. The first step was to search for adapters and low-quality sequences using Seqclean with UniVec. These adapters come from two distinct steps of the sequencing technique: adaptation and amplification of the DNA fragments within the beads, and amplification for the sequencing proper. Their sequences are: Primer A - CCTCCCTCGCGCCATCAG and Primer B - GCCTTGCCAGCCCGCTCAG for sample amplification, and Primer A - GCCTCCCTCGCGCCA and Primer B - GCCTTGCCAGCCCGC for sequencing. Reads showing adapter contamination or low quality had those regions removed. The assembled sequences received no pre-processing.

As first analyses, carried out with a Perl pipeline and the BioPerl package (Stajich et al., 2002), we determined the number of sequences generated during sequencing and during the assemblies, the total number of nucleotides sequenced, the genome coverage (how many times the genome was sequenced), the GC percentage, and the N50.
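These summary statistics can be illustrated with a minimal Python sketch. It is an illustration only, not the Perl/BioPerl pipeline used in the thesis, and the function names are hypothetical. The N50 function assumes the reference-based definition that the text goes on to give (cumulative contig length reaching half of a reference genome size), a quantity often called NG50 elsewhere.

```python
def gc_content(sequences):
    """Fraction of G and C bases over all bases in the given sequences."""
    total = sum(len(s) for s in sequences)
    gc = sum(s.upper().count("G") + s.upper().count("C") for s in sequences)
    return gc / total if total else 0.0


def coverage(total_bases, genome_size):
    """Average number of times each genome position was sequenced."""
    return total_bases / genome_size


def n50_against_reference(contig_lengths, reference_size):
    """Length of the contig at which the running sum of contigs, taken from
    largest to smallest, first reaches half of reference_size.
    Returns None if the assembly is too small to reach that target."""
    target = reference_size / 2
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return None
```

For example, `coverage(1_506_882_872, 55_000_000)` evaluates to about 27.4, in line with the 27x coverage reported for CL-14 in Table 3.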
N50 is a statistic that summarizes the size of the best assembled contigs relative to a reference genome, for which we used the genome of Trypanosoma cruzi clone CL Brener. Its value is the size of the contig at which the sum of all contigs of equal or greater size reaches half the size of the reference genome. The contigs were sorted from largest to smallest and added one by one; when the sum reached half the size of the diploid genome of clone CL Brener, the N50 was taken as the size of the last contig added. The assembled CL-14 contigs larger than the N50 were aligned with MEGABLAST against the assembled chromosomes of clone CL Brener (TritrypDB version 4.3), and the chromosomes with the highest alignment scores against the contigs were selected as homologs. The contigs were then aligned to their homologous chromosomes with CONTIGuator (Galardini et al., 2011) for synteny analyses.

3.3 PCR Amplification of Nuclear and Mitochondrial DNA from Trypanosoma cruzi Strains

Polymerase chain reactions (PCR) were performed in silico and then confirmed in vitro for two nuclear markers, the mini-exon SL (Burgos et al., 2007) and the 24S rDNA (Souto et al., 1996), and for one mitochondrial marker in the sequence of the cytochrome oxidase II (COII) gene, as described by De Freitas et al., 2006. Using a Perl pipeline and the algorithms of the e-PCR software (Schuler, 1997), we searched for sequences between primers F- AAGGTGCGTCGACAGTGTGG and R- TTTTCAGAATGGCCGAACAGT for the marker in the nuclear gene encoding the 24S ribosomal subunit, and for sequences between primers F- CGTACCAATATAGTACAGAAACTG and R- CTCCCCAGTGTGGCCTGGG for the nuclear mini-exon SL marker. Primer alignments with up to 2 gaps and 2 mismatches were allowed, both to permit primer pairing and to check whether the primers would anneal to other regions.
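The primer search can be illustrated with a small Python sketch. It is not e-PCR: it scans only the given strand of the template, tolerates substitutions but not gaps, and the function names and the `max_product` limit are assumptions made for this example.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T, N only)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def find_sites(template, primer, max_mismatches=2):
    """Start positions where primer matches template with at most
    max_mismatches substitutions (gaps are not modelled here)."""
    template, primer = template.upper(), primer.upper()
    sites = []
    for start in range(len(template) - len(primer) + 1):
        window = template[start:start + len(primer)]
        mismatches = sum(1 for a, b in zip(window, primer) if a != b)
        if mismatches <= max_mismatches:
            sites.append(start)
    return sites


def in_silico_pcr(template, forward, reverse, max_mismatches=2, max_product=2000):
    """Return (start, product_size) for every forward/reverse primer pair
    found in the expected orientation on the given strand."""
    forward_sites = find_sites(template, forward, max_mismatches)
    # The reverse primer anneals to the opposite strand, so on this strand
    # we look for its reverse complement downstream of the forward primer.
    reverse_sites = find_sites(template, reverse_complement(reverse), max_mismatches)
    products = []
    for f in forward_sites:
        for r in reverse_sites:
            size = r + len(reverse) - f
            if r >= f + len(forward) and size <= max_product:
                products.append((f, size))
    return products
```

A fuller implementation would also scan the reverse-complemented template and allow gapped primer alignments, as the analysis described above did.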
Analysis of the mitochondrial COII marker included an additional step in which the amplicons resulting from electronic PCR with primers F- CCATATATTGTTGCATTATT and R- TTGTAATAGGAGTCATGTTT were retrieved from the assembled genome by a Perl script and digested, also in silico, with the ReMap program of the EMBOSS package (Rice et al., 2010), configured to find the restriction site of the enzyme AluI.

The same analyses were performed in vitro with DNA extracted from culture-derived trypomastigote forms of representatives of groups Tc I-III, V and VI and from two biological samples of CL-14. The reactions contained 1X GoTaq buffer, 1.5 mM MgSO4, 40 mM dNTPs, 0.75 U Taq and 10 pM of each primer. The primer sequences were the same as in the in silico analyses, and the programs are summarized in Table 1. The PCR product obtained for the COII marker was digested with AluI, 1 U per 20 mL, overnight at 36 °C. The PCR products of the mini-exon SL and 24Sα rDNA markers and the digestion product of the COII marker were subjected to electrophoresis in a 6% polyacrylamide gel, followed by silver nitrate staining.

3.4 Phylogenetic Analyses

Using a Perl pipeline and other programs, such as the algorithms of the BLAST package, ClustalW (Larkin et al., 2007) and Mega 5 (Tamura et al., 2011), we determined the phylogenetic distance of 3 single-copy genes among clone CL-14, the two haplotypes of clone CL Brener and clone Sylvio X10/1. These genes are: the DNA mismatch repair protein (MSH2); the oxidative stress response protein trypanothione reductase; and cytochrome oxidase II, a gene encoded in the mitochondrial genome.
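As an illustration of what a pairwise distance between aligned gene copies means, the sketch below computes the simple p-distance (proportion of differing sites) in Python. This is an assumption chosen for clarity: the text does not state which distance model was used in Mega 5, and the function names are hypothetical.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.
    Columns with a gap ('-') or an undefined base ('N') in either
    sequence are skipped."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            differences += 1
    return differences / compared if compared else float("nan")


def distance_matrix(alignment):
    """Pairwise p-distances for a dict mapping sequence name to aligned sequence."""
    names = list(alignment)
    return {(m, n): p_distance(alignment[m], alignment[n])
            for m in names for n in names}
```

A matrix of this kind is the usual input to a neighbour-joining clustering step.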
Step                   COII             Mini-exon              24Sα rDNA
Initial denaturation   94 °C / 5 min    94 °C / 3 min          94 °C / 10 min
Denaturation           94 °C / 45 s     94 °C / 1 min          94 °C / 30 s
Annealing              45 °C / 45 s     68 °C / 1 min (2*)     60 °C / 30 s
                                        66 °C / 1 min (2*)
                                        64 °C / 1 min (2*)
                                        62 °C / 1 min (2*)
                                        60 °C / 1 min (35*)
Extension              72 °C / 1 min    72 °C / 1 min          72 °C / 30 s
Cycles                 40               43**                   30
Final extension        72 °C / 5 min    72 °C / 10 min         72 °C / 10 min

Table 1 - Programs used for the PCRs that differentiate the T. cruzi groups. * Number of cycles at each temperature. ** Total number of cycles.

To this end, the CL-14 contigs were aligned with MEGABLAST, with the low-complexity filter turned off, against the CDS (coding sequences) of CL Brener. The subsequences of the CL-14 contigs that aligned best with the CL Brener CDS were excised and annotated as CL-14 CDS. These, together with their CL Brener reference CDS, were aligned multiply and globally with ClustalW, allowing gaps of up to 10 nucleotides, with 100 bootstrap resamplings, using the IUB scoring matrix, which scores and aligns sequences containing undefined nucleotides, such as "N" in place of a nucleotide (Larkin et al., 2007). Perl scripts were developed to trim overhangs from the alignments, that is, sequences at the ends of the alignments that do not pair with all of the sequences, producing compact alignment blocks. The processed alignments were clustered by the neighbour-joining algorithm in Mega5 to determine the phylogenetic distances for each gene among the different clones.

3.5 Assembly of the Mitochondrial Genome

To determine the haplotype to which CL-14 belongs, we examined the coverage by CL-14 reads of the maxicircles of clones CL Brener and Esmeraldo, which are Tc III and Tc II maxicircles, respectively.
The reads were aligned against these maxicircles with MEGABLAST, a program of the BLAST package, with the low-complexity filter turned off, since this genome contains many repetitive regions. Using Perl scripts, only the best hits were selected and a figure was generated showing the coverage of the maxicircles for each alignment. Subsequently, also with Perl scripts, we checked whether the reads aligned to only one or to both mitochondrial genomes.

3.6 Determination of the Copy Number of Multigene Families

The assembled and annotated ORFs of clone CL Brener were selected and grouped according to their gene families. In an in-house pipeline written in Perl, the CL-14 reads were mapped against each group of ORFs and, from the coverage, the number of sequences of each ORF represented in the CL-14 genome was estimated. As a first step, the whole read set is aligned against each of the ORFs. The best reciprocal alignments are selected, that is, those with the highest score in both the read -> ORF and ORF -> read directions. The selected alignments are tallied to produce a file with the number of reads covering each nucleotide of the ORFs, one by one. The standard deviation is subtracted from the mean coverage over all nucleotides, and the result is divided by the sequencing coverage of each haplotype. The result is rounded to an integer and reported as the predicted copy number for each sequence analyzed. The pipeline can count copies either of a single sequence or of a group of similar sequences, such as a multigene family, starting from a single representative. To estimate the copy number of a single sequence, with a count that represents only that sequence, the mapping is repeated with progressively higher alignment identity until the result stabilizes.
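The copy-number formula described above can be written as a short Python sketch. It follows the stated arithmetic (mean per-base coverage minus its standard deviation, divided by the per-haplotype sequencing coverage, then rounded), but it is not the thesis pipeline: the function name is hypothetical, and the use of the population standard deviation and the floor at zero are assumptions.

```python
import statistics


def estimate_copy_number(per_base_coverage, haplotype_coverage):
    """Predicted copy number of a sequence from the read depth observed
    at each of its nucleotides.

    per_base_coverage: number of reads covering each nucleotide of the ORF.
    haplotype_coverage: expected sequencing depth of a single-copy
        sequence in one haplotype.
    """
    mean_depth = statistics.mean(per_base_coverage)
    # Assumption: population standard deviation; the text does not say
    # whether the sample or population form was used.
    spread = statistics.pstdev(per_base_coverage)
    estimate = round((mean_depth - spread) / haplotype_coverage)
    # Assumption: a negative estimate is reported as zero copies.
    return max(0, estimate)
```

Subtracting the standard deviation makes the estimate conservative: an ORF whose depth fluctuates strongly along its length is credited with fewer copies than one covered evenly at the same mean depth.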
To count a whole multigene family from a single sequence, the opposite is done: the alignment identity is lowered until the final count stabilizes. The pipeline outputs are: the total count for each query sequence, a plot of the coverage at each nucleotide, a text file with the nucleotide-by-nucleotide coverage, and a file with the reads orthologous to each reference ORF. The CL-14 reads were also aligned with BWA (Li & Durbin, 2010), using the "bwasw" option, against the genome of clone CL Brener in order to compare the pipeline's mapping with the mapping produced by this software. The mapping was configured to align each read only once. Visualization was done with IGV (Thorvaldsdóttir et al., 2012).

3.7 Determination of Identity between CDS

The reads of clone CL-14 were mapped against the CDS of clone CL Brener (TritrypDB version 4.3) with BWA, using the "bwasw" option for mapping long reads. The mapping file was processed with the SAMTOOLS package (Li et al., 2009) to convert it from .sam to .bam and to sort the data. Also with SAMTOOLS, the "mpileup" option was used to identify polymorphisms between the clones from the mapping. The BCFTOOLS package, developed by the same authors, was used to generate a text file with the polymorphism information. With this information on polymorphisms between the reads and the reference CDS, the VCFUTILS script, which is part of the SAMTOOLS package, was used to generate the CL-14 CDS. The predicted CL-14 CDS were aligned against the CL Brener CDS with MEGABLAST; the best reciprocal alignments were selected, and the mean of their identities was recorded as the mean identity between the CDS. Multigene families were also selected and their mean identities determined.
3.8 Analyses of Trans-sialidase Genes with SAPA Repeats

Using a pipeline for designing primers that flank repeats, developed in the laboratory of Dr. Daniella C. Bartholomeu, together with the e-PCR software, we identified differences in the sizes of trans-sialidase genes that contain SAPA repeats (TcTS-SAPA). The coverage of these CL Brener genes by CL-14 reads was estimated using the copy-number counting pipeline described above. The reads orthologous to each gene were assembled with CAP3, and the resulting contigs were analyzed in ORFfinder to select the ORF orthologous to the CL Brener ORF. The ORFs were aligned globally with ClustalW, and the numbers of SAPA repeats in the two clones were counted manually. A primer pair was designed to verify the size difference of the SAPA clusters between the clones. Primer F 5'-CGGGATCGTGGGAGACGGGT-3' anneals within the coding region of trans-sialidase Tc00.1047053509495.30, and primer R 5'-ACCGTTGCCAGCGGGAGTTG-3' anneals in the 3'-UTR of the same gene. The amplification program is given in Table 2. To each reaction, 30 ng of template DNA was added.

Step                   Condition
Initial denaturation   94 °C / 10 min
Denaturation           94 °C / 30 s
Annealing              55 °C / 30 s
Extension              72 °C / 30 s
Cycles                 30
Final extension        72 °C / 10 min

Table 2 - PCR program for differentiating the sizes of the SAPA repeat clusters.

Our group performed electrophoresis of endonuclease digests of total DNA from clones CL Brener and CL-14. The enzymes used, AluI, PvuII and HpaII, cleave nucleotide sequences within the SAPA repeats. These digests were hybridized with SAPA probes. Western blots were also performed. Parasites in the epimastigote and trypomastigote life stages, grown in LIT medium at 28 °C and collected during exponential growth, were washed in phosphate buffer (pH 7.4).
The cells were counted and adjusted to a concentration of 3x10^8 cells/mL in sample buffer (0.5 M Tris-HCl, 0.01 M EDTA, 5% SDS, 5% 2-mercaptoethanol) and boiled for 5 min. Electrophoresis was run in a 0.1% SDS/12% polyacrylamide gel at 100 volts for 2 hours at room temperature. Polypeptides were transblotted onto nitrocellulose sheets (0.45 µm pore size) at 100 volts for 1.5 hours and then blocked with 20 mM Tris and 0.13 mM NaCl, pH 7.6, overnight at 4 °C. They were then incubated with anti-trans-sialidase and anti-SAPA antibodies for 1 hour at room temperature, and the reaction was stopped after 30 minutes.

3.9 Sequencing and Mapping of the CL-14 Transcriptome

Epimastigotes, intracellular amastigotes and tissue-culture-derived trypomastigotes of the CL Brener and CL-14 strains were cultured according to the methods described by Chiari (1981) and Teixeira et al. (1994). Total RNA was purified from epimastigote cultures of clone CL-14 according to the methods described in Teixeira et al. (1994). mRNA was extracted by column chromatography using the RNAeasy Extraction Kit (Qiagen), following the manufacturer's instructions. The RNA samples were purified with the RNeasy MinElute Cleanup Kit (Qiagen), following the manufacturer's instructions. The cleaned samples were quantified with a NanoDrop ND-100 UV/Vis instrument (NanoDrop Technologies, USA) and visualized on a 1.2% denaturing agarose gel to check the quality of the total RNA, following standard procedures described in Ausubel et al. (1995). The cDNA libraries of T. cruzi obtained from cell culture were produced with the TruSeq RNA Sample Preparation Kit v2 (Illumina) according to the manufacturer's instructions, using specific primers that distinguish the different samples and their time points.
The cDNA libraries were sequenced on the Illumina HiSeq 1500 system at the facility of Dr. Najib M. El-Sayed. The reads were identified and filtered by quality using FASTQC (Andrews, 2010). Mapping of the reads against the reference genome, the assembled and annotated genome of T. cruzi CL Brener, which had already been done for the previously sequenced samples, was performed with Tophat2 (Kim et al., 2013).

4. Results

4.1 Sequencing and Analysis of the Genome of Clone CL-14

Approximately 3.5 million reads were generated by sequencing a shotgun (WGS) library obtained from total DNA extracted from epimastigote forms of T. cruzi clone CL-14, for a total of more than 1.5 billion nucleotides sequenced (Table 3). Most of these sequences are approximately 400 bp long (Figure 2), as expected for sequencing on the Roche 454 FLX-Titanium platform. Based on a haploid nuclear genome size estimated at 55 Mb (Souza et al., 2011), the total number of nucleotides sequenced for CL-14 corresponds to 27-fold coverage. A similar genome size was estimated for clone CL Brener (El-Sayed et al., 2005), and comparison of chromosomal bands separated by pulsed-field gel electrophoresis (PFGE) shows a similar banding pattern in CL-14 and CL Brener (Figure 3). The diploid nuclear genome of 110 Mb predicted for clone CL Brener is an estimate based on sequencing data and is similar to the genome size estimated for CL-14. The GC content, estimated at 51% from the complete set of CL-14 genome reads, is also similar to that of the CL Brener genome but higher than that of clone Sylvio X10/1, in which GC accounts for 49.21% of the nucleotides sequenced.
                      CL-14         CL Brener
Reads
  Methodology         454 FLX       Sanger
  Count               3457102       1192680
  Bases sequenced     1506882872    768436632
  Coverage            27x           14x
Contigs
  Count               43906         4008
  Base pairs          54782655      60372297
  N50                 1629          25950
  GC content          50.62%        51.00%

Table 3 - Comparative data on the sequencing and assembly of the genomes of clones CL-14 and CL Brener.

Figure 2 - Number of CL-14 reads by length in base pairs.

Figure 3 - Molecular karyotype of CL-14 and CL Brener showing the chromosomes of these clones. The red arrows indicate some differences between the genomes. Despite these differences, the chromosomes are very similar in size and number.

[...] predicted experimentally by densitometry of pulsed-field gel electrophoresis (between 106.4 and 110.7 Mb (Cano et al., 1995)), is slightly larger than the [...]

Some results differ from those obtained during the sequencing of the CL Brener genome, because the latter was sequenced by the Sanger method (El-Sayed et al., 2005a), whereas the CL-14 genome was sequenced by pyrosequencing. Sanger reads are longer than pyrosequencing reads, 800 and 400 bases on average, respectively. Moreover, the CL Brener genome was sequenced with mate pairs of 3, 10, 45 and 100 kb, whereas the CL-14 genome was sequenced with paired ends of approximately 3 kb. These differences were enough to give a more efficient assembly for clone CL Brener, for which 4008 contigs were generated, in contrast to the more than 43 thousand contigs for clone CL-14. The great difficulty in assembling these genomes stems from the fact that about 50% of the T. cruzi genome consists of repetitive sequences. Many of these repeats are longer than the reads, which prevents both a correct assembly of the contigs and a true representation of these repetitive regions in terms of their size and occurrence throughout the genome.
The assembled genome also differs from the CL Brener genome because of the assembly software used (Celera Assembler for the CL Brener genome and Newbler for the CL-14 genome) and because of an inherent limitation of pyrosequencing, which cannot efficiently sequence homopolymer regions with more than 6 repeats (Ronaghi, 2000). The haploid genome of CL Brener has an estimated 12000 genes, organized in long clusters that are transcribed polycistronically (El-Sayed et al., 2005a). Analyses of the CL-14 contigs indicate a similar genomic organization. Figure 4 shows two synteny arrangements between CL Brener chromosomes and the assembled CL-14 contigs. Because the assembly is incomplete and the contigs are small, Figure 4 contains only selected stretches of the CL Brener chromosomes. The CL-14 sequences are syntenic with their orthologs in CL Brener along their entire lengths, with no inversion of the coding-strand polarity. Because of the large number of contigs, it was not possible to predict an accurate count of the total number of genes from the assembly, since many of the open reading frames (ORFs) are truncated. Furthermore, as shown below, CL-14, like clone CL Brener, has a hybrid genome made up of two distinct haplotypes, which makes the genome assembly even more complex and difficult.

Nevertheless, to investigate the existence of changes in the karyotype or of large chromosomal rearrangements, chromosomal bands separated by PFGE were hybridized with different probes. Some differences in the locations of gp82 genes were described by Atayde et al., 2004, who identified two chromosomal bands hybridizing with the gp82 probe in clone CL-14 that are absent in the CL strain. However, since the CL strain consists of a mixed population of different clones, we decided to compare the molecular karyotypes of clones CL-14 and CL Brener. The results are shown in Figure 5. Two different hybridizations were performed on the complete PFGE karyotype, with probes for sequences of the MASP multigene family and for the single-copy gene GPI8, and two hybridizations on electrophoresis gels of enzymatic digests, with probes for sequences of the amastin and DGF-1 multigene families. The results indicate great similarity between the clones in both the size and the intensity of the bands. The only divergence was seen with the probe for the large MASP family, which is apparently distributed over most of the chromosomes of both clones: two bands present in clone CL Brener have lower intensity in clone CL-14. These results suggest that there are no large rearrangements between the two genomes.

Figure 4 - Synteny between CL-14 contigs and their homologous chromosomes in CL Brener. The horizontal lines represent stretches of the assembled chromosomes of clone CL Brener and contigs of clone CL-14. Syntenic blocks between the sequences of the two clones are shown in red. Figure generated with CONTIGuator.

Figure 5 - The upper panels show chromosomal bands separated by pulsed-field gel electrophoresis (PFGE) and stained with ethidium bromide. Two different DNA gene probes were used to hybridize with membranes from the PFGE (Southern blot), showing the same number and positions of bands in the CL Brener and CL-14 samples. The lower panels show hybridizations with DNA probes (Southern blot) for the amastin and DGF-1 genes on membranes from electrophoresis gels of restriction-enzyme digests.
4.2 Phylogenetic Analyses

To determine which group CL-14 belongs to, we analyzed the nuclear markers corresponding to the 24S rDNA ribosomal subunit and the Spliced Leader (SL) gene, as well as a marker from the mitochondrial genome, the cytochrome oxidase II gene (COII) (De Freitas et al., 2006). In silico polymerase chain reactions (PCR) were performed with primers specific for these sequences, and the sizes of the amplicons generated with the CL-14 reads as target were compared with the sizes expected for the corresponding amplicons from the genomic sequences of clone CL Brener. For the cytochrome oxidase II amplicon, we compared the sizes of the AluI digestion products, also obtained in silico. As shown in Table 4, the fragments resulting from amplification of the 24S rDNA and SL markers alone would classify clone CL-14 as TcII, since it yields amplicons of 150 bp for the SL marker and 125 bp for the 24S rDNA marker. The same results are found for clones Esmeraldo (TcII) and CL Brener (TcVI). In addition, the PCR products corresponding to the mitochondrial COII gene yield 2 fragments of 81 and 294 bp after digestion with AluI, which is characteristic of T. cruzi strains of types III, IV, V and VI.

Lineage   SL    24S rDNA      COII
TcI       150   110           30, 81 and 264
TcII      150   125           81, 82 and 212
TcIII     200   110           81 and 294
TcIV      200   125           81 and 294
TcV       150   110 and 125   81, 264 and 294
TcVI      150   125           81 and 294
CL-14     150   125           81 and 294

Table 4 – In silico PCR of the markers used in T. cruzi genotyping. Amplicon sizes in base pairs for each molecular marker. The mini-exon SL and 24S rDNA markers represent the nuclear genome and the COII marker the mitochondrial genome.

Taken together, these results and those described later indicate that, like CL Brener, CL-14 is a hybrid clone and should be classified as TcVI.
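The reasoning behind Table 4 can be sketched as a simple lookup: an isolate is compatible with a lineage when its observed fragment sizes match the expected sizes for every marker considered. The sketch below only transcribes the values of Table 4; the function name `matching_lineages` and the data layout are illustrative assumptions, not the procedure actually used in this work.

```python
# Expected fragment sizes (bp) per lineage, transcribed from Table 4.
TABLE_4 = {
    "TcI":   {"SL": {150}, "24S": {110},      "COII": {30, 81, 264}},
    "TcII":  {"SL": {150}, "24S": {125},      "COII": {81, 82, 212}},
    "TcIII": {"SL": {200}, "24S": {110},      "COII": {81, 294}},
    "TcIV":  {"SL": {200}, "24S": {125},      "COII": {81, 294}},
    "TcV":   {"SL": {150}, "24S": {110, 125}, "COII": {81, 264, 294}},
    "TcVI":  {"SL": {150}, "24S": {125},      "COII": {81, 294}},
}


def matching_lineages(observed, markers=("SL", "24S", "COII")):
    """Return the lineages whose expected sizes equal the observed ones
    for all of the requested markers (hypothetical helper)."""
    return [
        lineage
        for lineage, expected in TABLE_4.items()
        if all(expected[m] == set(observed[m]) for m in markers)
    ]


# Fragment sizes reported for CL-14 in Table 4.
cl14 = {"SL": [150], "24S": [125], "COII": [81, 294]}

nuclear_only = matching_lineages(cl14, markers=("SL", "24S"))
all_markers = matching_lineages(cl14)
print(nuclear_only)  # lineages compatible with the nuclear markers alone
print(all_markers)   # lineages compatible once COII is added
```

With the nuclear markers alone, two lineages remain compatible, which is why the mitochondrial marker is needed to separate them.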
Since the two clones were isolated from the same strain, and given that the mitochondrial marker corresponds to TcIII, we hypothesize that clone CL-14 derives from the same hybridization event between ancestral strains belonging to TcII and TcIII and that, like clone CL Brener, it retained the mitochondrion of the TcIII parent. The results of the in silico analyses were confirmed in vitro by amplification of DNA purified from epimastigote cultures of CL-14 and CL Brener, using primers that amplify the SL, 24S rDNA and COII markers (Figure 6). In addition to these markers, we clustered by similarity the amino acid sequences of two nuclear genes, MSH2 and trypanothione reductase (TR), whose nucleotide sequences differ between the haplotypes, and of one mitochondrial gene, COII. The results, shown in Figure 7, confirm our prediction that CL-14 is phylogenetically very close to CL Brener and that sequences belonging to two distinct haplotypes (Esmeraldo-like and non-Esmeraldo-like) are present in the CL-14 genome. Alignments between 392,310 CL-14 genome reads that correspond to coding regions and the coding sequences of the two CL Brener haplotypes show that 175,612 (44.7%) are more similar to the Esmeraldo-like haplotype and 185,497 (47.3%) to the non-Esmeraldo-like haplotype. For a total of 31,201 reads (8%), it was not possible to distinguish between the two haplotypes.

Figure 6 - Electrophoresis of the marker amplicons: A - mini-exon SL, B - 24S rDNA, C - COII. The Control lane is the PCR without a DNA sample. Colombiana is a strain representing TcI, Esmeraldo represents TcII, 231 is a TcIII strain, 115 is TcV, and CL Brener and CL-14 are TcVI.
Figure 7 - Phylogenetic trees produced with the neighbor-joining algorithm from peptide sequences of the MSH2 and trypanothione reductase genes (nuclear genes) and cytochrome oxidase II (mitochondrial gene) of CL-14 and CL Brener. Sequences from clone Sylvio X10/1, a TcI T. cruzi, were added to show the distance to that T. cruzi DTU.

The CL-14 reads were aligned to the CDS corresponding to the Esmeraldo-like and non-Esmeraldo-like alleles to determine where they differ. Homologous CL Brener genes differ from each other by approximately 2.2% in the coding regions (El-Sayed et al., 2005a). These differences are caused by SNPs (single nucleotide polymorphisms) between the sequences. Figure 8 shows a fragment of a local alignment and the differences between the two haplotypes of a gene encoding an RNA-binding protein, with identifiers Tc00.1047053506211.70 and Tc00.1047053508895.50, which are Esmeraldo-like (esmo-like) and non-Esmeraldo-like (non-esmo-like), respectively. When we aligned the CL-14 reads with their CL Brener homologs and analyzed the polymorphic regions, we found that each read aligns perfectly with only one haplotype. Whenever the coding region of one CL Brener allele was fully covered by reads, its homologous allele was also fully covered. Moreover, no read shows features of both haplotypes at the same time, only of one, suggesting that clone CL-14 has two distinct alleles for each gene (Figure 9). Like clone CL Brener, it can therefore carry polymorphic Esmeraldo-like and non-Esmeraldo-like alleles. In contrast to the results of aligning the CL-14 reads against the CL Brener CDS, many CL-14 contigs showed features of both haplotypes, which shows that the haplotypes were not correctly segregated during contig assembly.
Figure 8 - Part of the alignment between the two CL Brener haplotypes of genes Tc00.1047053506211.70 (esmo-like) and Tc00.1047053508895.50 (non-esmo-like), which encode RNA-binding proteins. The red arrows indicate the polymorphisms between them.

Figure 9 - Schematic alignment of CL-14 reads with homologous CL Brener genes in their different haplotypes. The diamonds represent SNPs between the CL Brener sequences. The gene sequences are represented by the long delimited lines and the CL-14 reads by the shorter lines. In red, sequences assigned to the Esmeraldo-like (esmo-like) haplotype; in blue, sequences assigned to the non-Esmeraldo-like (non-esmo-like) haplotype. No CL-14 read aligned with both haplotypes at once in the polymorphic regions.

4.3 Assembly and Analysis of the CL-14 Mitochondrial Genome

To define the haplotype of the CL-14 maxicircle, we checked the coverage of its reads on the maxicircles of clones Esmeraldo and CL Brener, TcII and TcIII, respectively. As can be seen in Figure 10, in local alignments 13,907 CL-14 reads aligned with the CL Brener maxicircle and only 94 with fragments of the mitochondrial genome of clone Esmeraldo. The mapped reads, selected by their best hits and without alignment redundancy, were more similar to the TcIII maxicircle. All the reads similar to the Esmeraldo maxicircle were also aligned via MEGABLAST to the CL Brener maxicircle, indicating that they represent more conserved regions of these genomes. Only 9 polymorphisms between CL Brener and CL-14 were identified along the entire maxicircle, all of them insertions or deletions, as shown in Table 5. All genes of the CL Brener maxicircle are represented in the CL-14 maxicircle, with a high degree of synteny.
The assembled and annotated CL-14 maxicircle is approximately 20.6 kb long and contains, downstream of the genes encoding the 12S and 9S ribosomal subunits, all 18 protein-coding genes previously identified in the maxicircle of clone CL Brener. The reads belonging to the maxicircle, selected from MEGABLAST alignments against the CL Brener maxicircle, were assembled with the CAP3 software. The resulting contigs were joined manually into a single scaffold, which represents the mitochondrial genome of clone CL-14 (Figure 11).

Figure 10 - Coverage of CL-14 reads on the maxicircles of CL Brener and Esmeraldo, TcIII and TcII, respectively. The rulers represent the linearized genomes and the blue rectangles the reads. The inset shows the complete coverage of the CL Brener maxicircle.

Position   CL Brener    CL-14       Event
5943       GTTTTT       GTTTT       Deletion
6271       TAAAA        TAAA        Deletion
10789      TAAAAAAAAA   TAAAAAAAA   Deletion
14149      ATTTT        ATTT        Deletion
14638      AACA         AA          Deletion
16564      GTTTTT       GTTTT       Deletion
16989      GA           GTA         Insertion
17287      TAA          TA          Deletion
19829      CTTTTTTTT    CTTTTTTT    Deletion
20006      GTTT         GTT         Deletion

Table 5 – Polymorphisms found between the kDNAs of clones CL Brener and CL-14.

Figure 11 - Comparison between the mitochondrial genomes of CL-14 and CL Brener and the polymorphisms between them.

4.4 Comparative Analysis of Multigene Families

Since the assembly of the CL-14 reads resulted in a very fragmented genome, we decided to analyze the genes belonging to the large multigene families known to be involved in host-parasite interactions directly from the CL-14 reads. To this end we developed a Perl script in which the CL-14 reads were aligned, by local pairwise alignment via MEGABLAST, against all CDS of the reference clone CL Brener. The extent of coverage of these alignments was evaluated to check whether any CL Brener gene is not represented in the genome of clone CL-14.
After searching a total of 23,216 protein-coding genes predicted in the CL Brener genome, we concluded that all of them are present in the CL-14 genome, which indicates that the gene content of the two clones is highly similar. Comparative analyses based on the read sequences show more than 99.5% identity with sequences of the multigene families described in CL Brener. To select the orthologous reads for each CL Brener CDS, the alignments were filtered by choosing the best hit both from read to CDS and from CDS to read. The reads from the best hits of each CDS were selected and used to build an alignment coverage, with each read used in only one alignment. From the alignments it was possible to determine how many reads cover each base of the CDS, yielding nucleotide-by-nucleotide coverage information. The coverage values were normalized as z-scores. The z-score is the difference between the sequencing coverage and the mean coverage, divided by the standard deviation of the coverage. Thus, based on the genome sequencing coverage and on the coverage of the assembled CDS, it was possible to estimate the number of copies of each gene or multigene family analyzed. Besides having identical groups of genes, the two genomes show no large differences in copy number among members of multigene families (Table 6). Two single-copy genes, MSH2 and PGP, were used as references to calibrate the copy-number counting software. Table 7 shows the identities of the protein-coding sequences between CL Brener and CL-14. The results for all protein-coding sequences assembled and annotated in CL Brener appear in the CDS row, and the other multigene families and ortholog groups are listed in their corresponding rows.
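The step from filtered alignments to nucleotide-by-nucleotide coverage can be illustrated with a minimal sketch. It is not the Perl script developed in this work: the tuple layout, the function names and the toy hits below are assumptions made only for illustration, and the filter shown keeps each read for its single best-scoring CDS, which is one simple reading of "each read used in only one alignment".

```python
def best_hit_per_read(hits):
    """Keep, for every read, only its highest-scoring alignment.

    hits: iterable of (read_id, cds_id, start, end, score) tuples, with
    1-based inclusive coordinates on the CDS (assumed layout).
    """
    best = {}
    for hit in hits:
        read_id, score = hit[0], hit[4]
        if read_id not in best or score > best[read_id][4]:
            best[read_id] = hit
    return list(best.values())


def per_base_coverage(hits, cds_lengths):
    """Count how many selected reads cover each base of each CDS."""
    coverage = {cds: [0] * length for cds, length in cds_lengths.items()}
    for _read, cds, start, end, _score in hits:
        for pos in range(start - 1, end):  # convert to 0-based indices
            coverage[cds][pos] += 1
    return coverage


# Toy data: read r1 hits two CDS and is kept only for the better one.
toy_hits = [
    ("r1", "geneA", 1, 4, 90),
    ("r1", "geneB", 1, 4, 80),
    ("r2", "geneA", 3, 6, 95),
]
selected = best_hit_per_read(toy_hits)
coverage = per_base_coverage(selected, {"geneA": 6, "geneB": 4})
print(coverage["geneA"])
print(coverage["geneB"])
```

A CDS whose coverage list stays at zero over its whole length would be a candidate for a gene absent from the sequenced genome.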
The results show high similarity between the analyzed sequences, with a minimum CDS identity of 99.73% (Table 7), higher than the 97.8% found between the coding sequences of the CL Brener haplotypes (El-Sayed et al., 2005a).

Gene family            CL-14   CL Brener
Trans-sialidase        1463    1481
MASP                   1399    1465
Mucins                 999     992
RHS                    773     777
DGF-1                  565     569
GP63                   491     449
RNA helicases          156     157
Kinesins               102     102
Tuzins                 83      83
Cruzains (calpains)    67      66
Dynein heavy chain     45      45
Amastins               27      27
GAPDH                  21      20
KMP-11                 18      11
MSH2                   2       2
PGP                    2       2

Table 6 – Counts of gene family members in clones CL-14 and CL Brener, obtained with the script developed in this work.

Sequence set      Identity (%)
CDS               99.79
MASP              99.87
Trans-sialidase   99.80
RHS               99.74
DGF               99.84
GP63              99.73
RNA-binding       99.83
Amastin           99.69

Table 7 – Mean identities of the protein-coding sequences between CL Brener and CL-14. CDS are all the coding sequences assembled and annotated in CL Brener.

The algorithm developed by our group to count genes and members of multigene families from the coverage of reference sequences by reads was also able to estimate the real gene count for clone CL Brener, because when the genome was published (El-Sayed et al., 2005a) the authors counted only the number of assembled ORFs. That earlier strategy fails to represent sequences that were not assembled, whether because of difficulties in the genome assembly process, long stretches of repeats, or identical copies that were not correctly segregated. Counting by read coverage circumvents these problems. "Chimeric" sequences, in which two different haplotypes were assembled together into a single sequence, are also found in the assembled genome of clone CL Brener. Because our algorithm computes coverage nucleotide by nucleotide, these sequences can be analyzed and any SNPs between them are detected.
With the SNP information, the different haplotypes that may have been assembled together can be segregated. By setting the identity between CL-14 reads and CL Brener ORFs to 99.5%, it is possible not only to count the CL-14 genes but also to identify the differences between alleles because, at this stringency, SNP positions are only partially covered: their coverage is half that of the conserved sequences. The algorithm can count the members of large families even when the similarity among them is high. By lowering the identity cutoff until the gene count stops converging, one can estimate the size of multigene families from few representatives. This is desirable when genes with very similar sequences were assembled together into one or a few representatives. The coverage stops converging because of other factors built into the algorithm, mainly the selection of orthologs by reciprocal best alignment. The algorithm outputs five files: the predicted copy numbers, a figure of the nucleotide-by-nucleotide coverage, a text file with the coverage values, the read alignments in XML format, and a list of reads with their respective reference sequences. Figure 12 shows part of a text file with nucleotide-by-nucleotide coverage (Figure 12-A) and plots of the coverage of the two alleles of the GP72 gene by CL Brener reads, together with histograms of the coverage frequencies (Figure 12-B). The topography of the coverage can be seen and, in the histograms, the tallest bar lies close to the sequencing coverage of each allele, which for the CL Brener genome sequencing was sevenfold.
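As a rough illustration of the coverage-based counting described in this section, the sketch below normalizes per-base coverage as z-scores and estimates a copy number by comparing the coverage of a gene with that of a single-copy reference (the role played here by MSH2 and PGP). The use of medians, the function names and the coverage values are assumptions for illustration; the actual calibration used by the algorithm is not detailed in the text.

```python
from statistics import mean, median, pstdev


def z_scores(coverage):
    """Z-score of each per-base coverage value:
    (value - mean coverage) / standard deviation of the coverage."""
    mu = mean(coverage)
    sigma = pstdev(coverage)
    if sigma == 0:
        # Perfectly flat coverage: every position sits on the mean.
        return [0.0 for _ in coverage]
    return [(c - mu) / sigma for c in coverage]


def estimate_copies(gene_coverage, single_copy_coverage):
    """Estimate copy number as the ratio between the typical coverage of a
    gene and the typical coverage of a single-copy reference gene
    (assumed estimator, not necessarily the one used in this work)."""
    return median(gene_coverage) / median(single_copy_coverage)


# Hypothetical per-base coverage values.
gene = [81, 80, 82, 81]
reference = [27, 27, 26, 28]

copies = estimate_copies(gene, reference)
normalized = z_scores([10, 20, 30])
print(copies)
print(normalized)
```

Under the same assumptions, a position whose coverage drops to about half that of its neighbors would be a candidate SNP between alleles, as described for the 99.5% identity setting.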
To check the accuracy of the algorithm and whether its results are compatible with available software, a trans-sialidase sequence containing repetitive sequences was also mapped with the BWA software, using reads from clone CL-14. Figure 13 shows the mapping by BWA and by our algorithm: not only are the coverages the same, but the positions of the repetitive motifs, which in this case are degenerate, were also predicted similarly. The gene copy number algorithm was also used, in collaboration with researchers of the Brazilian Genome Network (Rede Genoma Brasileira), in studies of the multigene families present in the genomes of the trypanosomatids Angomonas deanei and Strigomonas culicis (Motta et al., 2013) and Trypanosoma rangeli (in preparation). It was also used to characterize the genes encoding the two subfamilies of amastin proteins present in the genome of clone CL Brener (Kangussu-Marcolino et al., 2013). Correcting the estimate of the copy number of amastin genes within specific groups, previously made by El-Sayed et al. (2005a), made it possible to infer clusters of genes of this family. Amastin genes of the same group were found to be organized on the same chromosome and in tandem, separated or not by genes encoding tuzin proteins.

Figure 12 - Results of the algorithm. A - Partial text of the coverage of each nucleotide of the mapped ORF. B - Plots of gene coverage, where the abscissa represents the full length of the ORF. The histograms show the frequencies of the coverage values. The value above the histograms is the predicted copy number.

Figure 13 – Comparison of the mapping on the TcTSSAPA gene Tc00.1047053509495.30 by BWA, visualized with IGV, and by the algorithm developed here. Both used reads from clone CL-14.
The upper panel, generated with BWA and IGV, shows a histogram of the coverage of the gene by the reads, and each bar below it is an aligned read. The lower panel was generated by the algorithm developed here and shows the coverage of each nucleotide of the reference sequence by the mapped reads.

4.5 Analysis of Differences in the Genes Encoding Trans-sialidases with SAPA Repeats in CL Brener and CL-14

Several repetitive motifs have been reported as associated with the virulence haplotype in parasites (Mendes et al., 2013). Using algorithms for marker design and in silico PCR, we observed differences in the sizes of the genes encoding the trans-sialidases Tc00.1047053507085.30, Tc00.1047053509495.30 and Tc00.1047053510787.10. These differences are due to a smaller number of SAPA repetitive motifs, of sequence 5'-GACAGCAGTGCCCACGGTACGCCCTCGACTCCCGTTGACAGCAGTGCCCACGGTACACCCTCGACTCCCGTT-3'. In CL Brener, trans-sialidase Tc00.1047053509495.30 has 19 SAPA repeats, whereas its homolog in clone CL-14 has only 3. Figure 14 shows the coverage of reads from clones CL Brener and CL-14 mapped on the coding sequence of this gene. The abscissa represents the full nucleotide sequence and the ordinate the read coverage of each nucleotide. The coverage shown in green refers to CL Brener. The red line represents the coverage by CL-14 reads. The coverage by CL Brener reads in the SAPA region is characteristic of sequences with a single copy in the genome, for a genome sequenced at about 14-fold coverage. The same is not observed for clone CL-14. Its reads, from genome sequencing, cover the gene sequence but clearly do not cover the full extent of the 19 SAPA copies, covering only three repetitive SAPA motifs. Using reads from the transcriptome sequencing of clone CL-14 (see results in section 4.6), we obtained the same result: the CDS is covered, but not the full extent of the SAPA repeats (Figure 15). The stretch of SAPA repeats is degenerate along its length, and the correspondence of each repeat in clone CL-14 to those in CL Brener can be seen in the coverage by reads from genome sequencing. This is because the average length of the pyrosequencing reads spans more than one SAPA repeat. The reads from transcriptome sequencing, however, are shorter and do not span enough repeat sequence to locate, from the transcriptome, the exact position of the SAPA repeats with their degeneracies.

To check whether the in silico finding of fewer SAPA repeats in clone CL-14 than in clone CL Brener can also be observed experimentally, we developed two strategies, based on amplification of the SAPA-containing trans-sialidases and on Southern blot analyses. Figure 16-A shows agarose gel electrophoresis of PCR products obtained with a pair of primers flanking the SAPA repeats, with primer F upstream of the motifs, inside the coding region, and primer R downstream of the motifs, outside the coding region. Amplification in CL-14 yields a single band of approximately 500 bp. The CL Brener sample produced a smear without well-defined bands, which is due to the formation of many fragments of varied sizes. This is caused by the numerous SAPA repeats annealing to one another and forming nonspecific fragments. A Southern blot was also performed with SAPA repeat probes on DNA fragments from CL-14 and CL Brener digested with the enzymes AluI, PvuII and HpaII (Figure 16-B). These enzymes have their cutting sites inside the SAPA motif and can therefore cut the repetitive motifs, so that hybridization can quantify differences in the number of repeats. With AluI, a larger number of digestion bands can be identified in the CL Brener sample, and they are also more intense than in CL-14. Hybridizations with the SAPA probe on the PvuII and HpaII digests give a stronger signal in CL Brener, indicating a greater presence of SAPA motifs in this clone. This confirms the in silico data, showing that SAPA-containing trans-sialidases (TcTS-SAPA) are fewer in clone CL-14 and have fewer SAPA repeats in their sequences (Figure 17).

Figure 14 - Coverage of trans-sialidase Tc00.1047053509495.30 by genomic reads of CL Brener and CL-14. In green, the nucleotide-by-nucleotide coverage of the sequence in CL Brener. The red line shows the coverage in clone CL-14. Below the plot, the ruler shows the positions of the domains (N- and C-terminal domains, sialidase family domain, lectin-like domain and SAPA domain).

Figure 15 – Nucleotide-by-nucleotide coverage of trans-sialidase Tc00.1047053509495.30 by reads from genome sequencing of CL Brener and CL-14 and by reads from transcriptome sequencing of CL-14 (panels: CL-14 transcriptome Rep1, CL-14 transcriptome Rep2, CL-14 genome, CL Brener genome). The pink shading indicates the position of the SAPA repeats.

Figure 16 – A - Electrophoresis of PCR with primers that anneal to sequences flanking the SAPA repeats. B - The gel on the left is an electrophoresis of fragments from digestion of genomic DNA of clones CL Brener and CL-14 with the enzymes AluI, PvuII and HpaII. The gel on the right shows the Southern blot with SAPA probes hybridized to the digests.

Figure 17 – Schematic drawing of the organization of the SAPA repeats in clones CL Brener and CL-14, present in gene Tc00.1047053509495.30 (panel labels: CL Brener, 98.6% identity; CL-14, 100% identity; sialidase and lectin domains; SAPA; transmembrane domain).

To check whether the differences observed in the genomic analyses can be confirmed at the level of protein expression, a western blot assay was performed with anti-SAPA and anti-trans-sialidase antibodies on total proteins of clones CL Brener and CL-14, obtained from cultures of the epimastigote and trypomastigote life stages (Figure 18). Trans-sialidases are detected in the trypomastigote samples of CL Brener and CL-14, indicating that both express trans-sialidases. However, with anti-SAPA antibodies, only the CL Brener trypomastigote samples give a strong signal, and a faint band is observed in the CL-14 trypomastigote sample.

Figure 18 – Left, electrophoresis profile of total proteins of clones CL Brener and CL-14 in the epimastigote and trypomastigote life stages (Coomassie staining). Middle, western blot with anti-SAPA antibody. Right, western blot with anti-trans-sialidase (anti-TS) antibodies. Lanes in each panel: CL Br Try, CL Br Epi, CL-14 Try, CL-14 Epi. Molecular weight markers: 177, 118, 75, 51, 39, 26 and 18 kDa.

4.6 Sequencing and Mapping of the CL-14 Transcriptome

To investigate whether there are differences in the global pattern of gene expression between the two clones, cDNA libraries were produced from mRNAs extracted from epimastigote forms of T. cruzi clone CL-14 obtained from cultures in LIT medium, and from trypomastigote and amastigote forms obtained 48 and 60 hours after infection of human foreskin fibroblast (HFF) cells. For the amastigote cultures, the parasite RNAs were extracted together with the total RNA of the host cells.
Figure 19 A shows the profile of total RNA from a sample of cells infected for 48 hours and containing amastigote forms, and Figure 19 B shows the profile of total RNA from epimastigote forms. In Figure 19 A there are two peaks with high fluorescence signal, corresponding to the large and small ribosomal subunits of the host cell, and also lower-intensity peaks, at the same position as the host cell's small-subunit rRNA, which correspond to the parasite rRNA molecules. In Figure 19 B there are peaks of approximately 2000 and 2151 nucleotides, from the large subunit rRNA of T. cruzi, and of 2221 nucleotides, corresponding to the rRNA of the small subunit of the parasite ribosome. Peaks of lower molecular weight (< 200 nt) correspond to tRNA or some small RNAs, or result from RNA degradation.

Figure 19 – Examples of profiles of total RNAs and cDNA libraries. A - Total RNA extracted from samples of HFF cells infected with amastigotes, 48 hours after infection. B - Total RNA from epimastigotes. C - cDNA libraries produced from mRNA of HFF cells and amastigotes. D - cDNA libraries produced from mRNA of epimastigotes. In C and D, the peaks at the extremes are molecular weight markers.

Using magnetic beads with bound oligo-dT, the mRNAs were purified and used as templates to produce the cDNA libraries, which were sequenced on the Illumina HiSeq 1500 platform. The transcriptome of each life stage was sequenced from biological duplicates with paired-end fragments of 300 nucleotides (Figure 19 C and D). Each fragment had 100 nt sequenced from each end (100-nt reads), using the sequencing facility of the Department of Plant Biology at the University of Maryland, where I carried out my sandwich doctorate internship.
The reads were mapped with TopHat2 onto the chromosomes of the Trypanosoma cruzi CL Brener genome, version 4.3 from TriTrypDB, using two different strategies. One strategy mapped the reads against each haplotype of the reference separately. The other mapped the reads against the complete reference genome at once. As an example, we describe the results for the reads from the library generated with RNA extracted from cells infected for 48 h. This mapping against the complete genome, configured not to map a read more than once, resulted in approximately 19 and 11 million aligned read pairs (one value per biological replicate), of which 17 and 10.5 million had both reads mapped and 16 and 9.5 million were mapped in the correct orientation, with R1 reads aligned in the downstream direction of the reference DNA and R2 reads in the upstream direction. Approximately 1.5 million reads from replicate 1 and 847 thousand reads from replicate 2 aligned as singletons, without their mates. The mappings against the complete genome and against each haplotype separately, visualized with IGV, again demonstrated the hybrid nature of the genome of clone CL-14 (Figure 20). Because the sequencing and analysis of the transcriptome of T. cruzi clone CL Brener (the reference clone of the genome project) have not yet been completed, it was not possible, with the data from the CL-14 libraries, to carry out the comparative analyses that are the aim of this part of the work. These analyses are under way in collaboration with the group of Dr. Najib El-Sayed at the University of Maryland. Once completed, they will allow us to determine whether there are differences in the set of genes expressed in the various life cycle stages of these two T. cruzi clones that could be related to the differences in virulence observed between them.
Figure 20 – Mapping of reads from CL-14 mRNA sequencing. The gene used as reference is MSH2, a single-copy gene. The two upper panels show the mapping on the complete genome. The lower panels show the mapping on each haplotype separately, highlighting the polymorphisms between the haplotypes. The arrows indicate polymorphisms found in reads mapped to the wrong haplotype. Each gray bar represents reads correctly mapped with their mate pairs. Bars of other colors are single mappings, without mate pairs.

5. Discussion

Sequencing of the complete genome of Trypanosoma cruzi clone CL Brener (El-Sayed et al., 2005a) confirmed the diploid nature of this organism and also showed that its genome is hybrid, containing 22,570 protein-coding genes. According to Arner et al. (2007), this prediction of gene content is very conservative, and T. cruzi has at least twice as many genes if all copies of genes belonging to multigene families are considered. This is because genes belonging to these multicopy families were assembled incompletely and imprecisely, without distinguishing the different haplotypes. Sequencing was done by WGS (whole-genome shotgun) with the Sanger method, generating approximately 768 million nucleotides at 14-fold genome coverage. Clone CL-14 was also sequenced by WGS, but by pyrosequencing, generating 1.5 billion nucleotides at 27-fold genome coverage, in contrast with the more modest sequencing of clone CL Brener. Even so, neither sequencing effort produced large contigs, because of the repetitive nature of these genomes.
With average read lengths of 650 and 400 nucleotides for CL Brener and CL-14, respectively, and the large amount of repetitive sequence in the genome (more than 50%), it was not possible to assemble these genomes completely. These repeats include genes encoding surface proteins and many other gene families, as well as repeats outside coding sequences. With longer reads, El-Sayed et al. (2005a) generated 4,008 contigs for clone CL Brener, reaching an N50 of 25,950 nucleotides, a much better result than ours, with an N50 of 1,629 nucleotides for clone CL-14. In both cases these numbers are low, given that T. cruzi chromosomes range in size from 0.51 to 3.27 million base pairs (Souza et al., 2013). The N50 is the length of the contig at which, adding contig lengths from the largest to the smallest, the sum reaches half of the total number of nucleotides of the genome. The N50 is therefore an estimate of the size of the largest assembled contigs. For the study of individual genes, the assembly of the CL Brener genome is satisfactory, because the assembled contigs are, in most cases, longer than protein-coding sequences, whose mean length is 1,513 bp (El-Sayed et al., 2013). Of the more than 40 thousand contigs assembled for the CL-14 genome, only 10,772 are longer than the mean length of T. cruzi CDS, and many of them may be chimeras of sequences from the two haplotypes. The assembly of the CL Brener genome resulted in 838 scaffolds, which were later grouped into 41 predicted chromosome pairs with sizes between 78 kb and 2.4 Mb (Weatherly et al., 2009). Note that the assembled chromosomes are much smaller than the sizes estimated by PFGE.
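The N50 definition given in this section can be made concrete with a short sketch. It assumes the common convention in which the total is the sum of the contig lengths in the assembly (rather than an independent genome size estimate); the function name and the toy contig lengths are illustrative only.

```python
def n50(contig_lengths):
    """Length of the contig at which the running sum of contig lengths,
    taken from the largest to the smallest, first reaches half of the
    total assembled length (assumed convention)."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no contigs given")


# Toy assembly of six contigs, 290 nt in total.
toy_contigs = [80, 70, 50, 40, 30, 20]
toy_n50 = n50(toy_contigs)
print(toy_n50)
```

In the toy assembly the two largest contigs already sum to more than half of the total, so the N50 is the length of the second contig.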
Note also that in that work 9.3 million nucleotides from contigs or singlets were not incorporated into the assembly; that is, even the “improved assembly” published by Weatherly et al. (2009) is still quite incomplete. Using synteny maps with Trypanosoma brucei chromosomes and BAC libraries from the original genome project, those authors proposed the existence of 41 chromosome pairs in the CL Brener clone. PFGE data indicate the presence of 20 chromosomal bands. These data are not contradictory, since chromosomal bands co-migrate in T. cruzi (reviewed by Zingales et al., 1997). For CL-14, an attempt to generate scaffolds with the program Mauve Multiple Genome Alignment (gel.ahabs.wisc.edu/mauve/), based on the CL Brener contigs, did not improve our assembly.

As perspectives for the assembly of the T. cruzi genome, we plan de novo sequencing of the genomes of the CL Brener and CL-14 clones on the Illumina platform, which, despite providing shorter reads, generates more information because of its high coverage. For CL Brener, this higher coverage, added to the original Sanger sequencing, will make it easier to separate the haplotypes and to join contigs so as to represent the chromosomes faithfully. Another strategy among our perspectives is to sequence some individual chromosomes with the newer Single Molecule Real-Time (SMRT) platform developed by Pacific Biosciences, which generates longer reads, between 10 and 20 kb (English et al., 2012). This would make it possible to assemble entire chromosomes from approximately 1,000 SMRT sequences, in an assembly where clusters containing multigene families would be properly represented.
We hope that these combined sequencing strategies will provide enough data for a faithful genome assembly and, ultimately, promising results for the continuation of genomic studies of Trypanosoma cruzi.

The diploid nuclear genome of 110 Mb estimated for the CL Brener clone is similar in size to the genome observed for CL-14, 112 Mb (Souza et al., 2011). In 2005, El-Sayed et al. estimated from sequence data that the nuclear genome of the CL Brener clone has 110 Mb. Small divergences among estimates of the CL Brener genome size reflect the difficulty of assembling a genome with such a load of repeats: the repeats may not be represented in their full extent, so the predictions end up underestimated. Both genomes, CL Brener and CL-14, are significantly larger than the nuclear genome of the clone Sylvio X10/1, a T. cruzi of group TcI, which has 5.9 Mb less haploid sequence (Franzén et al., 2011) than predicted by El-Sayed et al. (2005). Many of the differences that account for the smaller genome of Sylvio X10/1 relate to the number of members of large multigene families (Franzén et al., 2011).

The genomes of CL-14 and CL Brener are more similar to each other than to the genomes of other T. cruzi strains with respect to size and chromosomal band composition (Souza et al., 2011). Hybridizations of DNA probes to chromosomal bands of the two clones (Southern blots) confirm this result. The MASP probe, from a multigene family, hybridizes to the same bands in both clones, although the intensity of some bands differs. These signal differences are probably due to sequence differences within the gene family rather than to differences in gene content, since the estimated copy number is the same.
The same result is observed for a probe that hybridizes to the single-copy gene GPI8 and in two other Southern blots hybridized with probes for the amastin and DGF- genes, in these cases using genomic DNA fragments generated by enzymatic digestion.

Members of the Kinetoplastida have a mitochondrial genome known as kDNA (kinetoplast DNA). The kDNA of T. cruzi consists of thousands of variable minicircles and dozens of maxicircles. The minicircles, of approximately 1.4 kb, are composed of four repeated conserved sequences of 100 to 200 base pairs and hypervariable sequences that encode the guide RNAs, or gRNAs (Ray, 1989). The maxicircles, in turn, have approximately 22 kb, 15 kb of which correspond to sequences encoding mitochondrial proteins (Ruvalcaba-Trejo and Sturm, 2011; Junqueira et al., 2005). Sequencing of the CL Brener maxicircle showed that it can be classified as belonging to group TcIII; that is, the hybrid CL Brener inherited a mitochondrion from the TcIII parental strain. The Esmeraldo strain is a non-hybrid strain belonging to group TcII and therefore has a TcII-type maxicircle sequence (Westenberger et al., 2006).

The maxicircle genome of the CL-14 clone was assembled using the CL Brener maxicircle as a reference, after selecting reads by alignment to the CL Brener mitochondrial genome. High read coverage at certain points of the maxicircles indicates largely repetitive regions which, being longer than the reads generated by pyrosequencing, prevent a correct assembly of these sequences. The mitochondrial DNA of CL-14 confirms the hybrid nature of this clone, isolated from the same strain from which the CL Brener clone was also isolated, since the two are more similar to each other than to the maxicircle genome sequence of the Esmeraldo strain, which is TcII.
The mitochondrial genomes of the CL-14 and CL Brener clones show a high degree of synteny. Only nine polymorphisms were identified between them, all of them nucleotide insertions or deletions (Figure 9). Among the differences observed, the nucleotide deletions in the sequences of cytochrome oxidase I, MURF1, MURF2 and ND5, and the insertion in the ND4 coding region, stand out. Since these mRNAs undergo extensive post-transcriptional modification by addition of uridines (RNA editing), we cannot yet say whether these insertions and deletions affect the expression of these mitochondrial enzymes. However, most of the polymorphisms lie within homopolymers, which may have been misrepresented in the sequencing of the CL-14 genome, performed on the 454 FLX platform. Our RNA-seq data may help clarify this question and perhaps identify other relevant differences that could underlie the differences in virulence between the clones.

The nuclear molecular markers developed by Souto et al. (1996), the mini-exon SL and 24S rDNA, and a mitochondrial marker from the cytochrome oxidase II (COII) gene sequence (De Freitas et al., 2006) were used in in vitro and in silico analyses to determine to which T. cruzi group the CL-14 clone belongs. All results indicate that this clone, like the CL Brener clone, belongs to T. cruzi group VI (TcVI), a group that comprises hybrid lineages (Broutin et al., 2006). Other analyses indicate that the CL-14 haplotypes have the same phylogenetic origin as the CL Brener haplotypes, namely Esmeraldo-like and non-Esmeraldo-like, that is, one TcII haplotype and one TcIII haplotype, as described by El-Sayed et al. (2005a).
This conclusion, that the CL-14 clone belongs to the same DTU as the CL Brener clone and that the two share the same phylogenetic origin, ensures that genomic analyses based on the coverage of CL-14 reads over the CL Brener reference genome can be conducted with confidence.

Two single-copy nuclear genes and one mitochondrial gene, also single-copy, were selected in order to assemble the CL-14 reads related to these genes and build a phylogenetic tree. The assembly of the nuclear genes, carried out independently of the reference genome sequences, yielded two sequences for each nuclear gene and a single sequence for the mitochondrial gene. Global multiple alignments were performed among the sequences of each gene, with data from the CL Brener and CL-14 clones, and phylogenetic trees were built from the alignment results. Notably, the alleles of the nuclear genes cluster with their specific counterpart allele, confirming that the genome is hybrid and that the CL-14 haplotypes derive from TcII and TcIII parentals. Genes from the clone Sylvio X10/1, a type I T. cruzi, were added to the trees as outliers to test whether any of the CL-14 alleles would be phylogenetically closer to TcI, which was not observed. The same results were found in the tree built with the mitochondrial gene, which shows that the CL-14 gene is closer to that of CL Brener than to that of Sylvio X10/1, indicating that the haplotype of the CL-14 mitochondrial genome is TcIII, like the kDNA of CL Brener (Westenberger et al., 2006).

Since it was not possible to assemble the genome of the CL-14 clone, all genomic analyses and comparisons were based on the reads, that is, on the coverage of the sequences of the reference genome, the CL Brener clone. For this purpose, the CL-14 reads were aligned against all CL Brener sequences that are properly assembled, without mixing of haplotypes.
No read that shows at least one feature specific to one haplotype shows any feature of the other haplotype. For every polymorphism between CL-14 reads and the reference genome, in regions where the reference genome is polymorphic between its haplotypes, the CL-14 reads observed correspond to only one of the haplotypes. This again confirms that the genome of the CL-14 clone, besides being hybrid, has no chimeric genes or mixing of sequences between its alleles.

The same analysis was carried out for the amastin family, but with more representatives of the family, which has 12 copies assembled in the genome version published by El-Sayed et al. (2005a). Although the function of the amastins has not been elucidated, the expression of these genes on the surface of amastigotes in T. cruzi and Leishmania spp. suggests that they take part in important interactions with mammalian host cells (Rochette et al., 2005). The sequence diversity in this family, which results from factors such as mutation rates and gene conversion mechanisms, could be related to the role the amastins may play in these interactions with distinct host proteins (Cerqueira et al., 2008).

The T. cruzi clones CL-14 and CL Brener have the same gene content, including housekeeping genes, genes encoding multigene families of surface proteins, and genes of unknown function. There is no ORF or gene family described in CL Brener that lacks total or partial coverage in CL-14. Nor was it possible to identify ORFs specific to the CL-14 clone and therefore absent from CL Brener. By contrast, Franzén et al. (2011a) found differences in 6 ORFs present in CL Brener but absent from Sylvio X10/1. Those authors also found that the sizes of most multigene families differ between these clones.
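The statement that every CL Brener ORF has total or partial coverage in CL-14 depends on a per-ORF coverage measure. The sketch below shows one way such a measure could be computed; it is an illustration under stated assumptions, not the pipeline used in this work. The function name `covered_fraction` and the representation of aligned reads as half-open `(start, end)` intervals on the reference are ours.

```python
def covered_fraction(orf_start, orf_end, read_intervals):
    """Fraction of the ORF [orf_start, orf_end) covered by at least one read.

    read_intervals is an iterable of half-open (start, end) alignment
    coordinates on the same reference sequence as the ORF (assumed format).
    """
    orf_length = orf_end - orf_start
    if orf_length <= 0:
        raise ValueError("orf_end must be greater than orf_start")

    # Keep only reads that overlap the ORF, clipped to the ORF boundaries.
    clipped = sorted(
        (max(start, orf_start), min(end, orf_end))
        for start, end in read_intervals
        if start < orf_end and end > orf_start
    )

    covered = 0
    covered_up_to = orf_start  # rightmost position already counted
    for start, end in clipped:
        if end <= covered_up_to:
            continue  # read lies entirely inside an already counted region
        covered += end - max(start, covered_up_to)
        covered_up_to = end
    return covered / orf_length


# Toy example with made-up coordinates: positions 160-180 of the ORF are uncovered.
toy_reads = [(90, 150), (140, 160), (180, 250)]
print(covered_fraction(100, 200, toy_reads))
```

Under this measure, an ORF with a fraction of 0 would be a candidate for absence in CL-14; according to the text, no such ORF was found.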
Differences were identified in the nucleotide sequences of many homologs, mainly in multigene families. These differences involve SNPs, addition or deletion of motifs characteristic of these sequences, and the extent of repeats. Such divergences were also observed when comparing the genomes of CL Brener and Sylvio X10/1, among subspecies of T. brucei (Jackson et al., 2010) and among Leishmania species (Peacock et al., 2007). Unlike what was observed for Sylvio X10/1, we did not observe large differences in the copy number of genes from multigene families.

In our search for genomic features that could be correlated with differences in virulence between the clones studied, we identified a smaller number of amino acid repeats in a subgroup of genes of the trans-sialidase multigene family. Trans-sialidases are membrane proteins and important study targets because they take part in the parasite-host interaction. They transfer sialic acid from host cells to the parasite cell surface, modulating the action of the host immune system (Schauer et al., 1983), and participate in other aspects of the parasite-host interaction (Dc-Rubin and Schenkman, 2012). Some subgroup I trans-sialidases carry SAPA (shed acute phase antigen) repeats in their C-terminal portion (Pollevick et al., 1991), which appear to increase the half-life of the protein released into the host's blood (Buscaglia et al., 1999). These repeats are targets of the host adaptive immune system. Non-inhibitory antibodies are generated against the vicinity of the catalytic site and the lectin domain of the trans-sialidases (Pitcovsky et al., 2002) and, together with the highly immunogenic SAPA repeats, delay the formation of inhibitory or neutralizing antibodies, which control parasite levels (Risso et al., 2007).
The CL-14 clone has subgroup I trans-sialidases with SAPA repeats (TcTS-SAPA), but with fewer repeats than the subgroup I TcTS of the CL Brener clone. For the CL Brener TcTS-SAPA Tc00.1047053509495.30, 19 SAPA repeats are found, whereas its homolog in the CL-14 clone has only 3. Each SAPA repeat has 12 amino acids, and some specific, conserved degenerations are observed in both CL Brener and CL-14. From the coverage of CL-14 reads over this TcTS-SAPA (Figure 13), and on the basis of these degenerations, it can be seen that the SAPA repeats covered in CL-14 do not occur in the same order as in CL Brener. These data were confirmed by amplification of TcTS-SAPA fragments from the two clones (Figure 16-A). Many amplicons were generated for the CL Brener clone because of its large number of SAPA repeats, which can anneal to one another and form new DNA strands that serve as substrate for the DNA polymerase, which then produces fragments of various sizes. As a result, a smear is seen on the gel where well-defined bands would be expected, as occurs for the CL-14 clone, which has few SAPA repeats and does not produce this effect.

Southern blots with probes that hybridize to the SAPA repeats were performed on membranes transferred from electrophoresis of total DNA from the clones digested with enzymes that cleave the SAPA repeats. In all three digestions, more bands hybridized in the CL Brener samples than in CL-14. In addition, the signal intensities in the hybridizations of the PvuII and HpaII digestions are higher in CL Brener, and the band sizes also differ, confirming the in silico data, which predicted that the CL-14 clone has fewer copies of the SAPA repeats.
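Counting 12-amino-acid repeats that tolerate a few degenerate positions, as described above for the SAPA repeats, can be illustrated with a simple scan. The sketch below is a hypothetical example, not the method used in this work: the function name `count_tandem_repeats`, the mismatch threshold, and the 12-letter motif are all placeholders chosen for illustration (the motif is an arbitrary string, not the real SAPA sequence).

```python
def count_tandem_repeats(protein, motif, max_mismatches=2):
    """Return the length of the longest run of consecutive motif copies.

    A window counts as a copy when it differs from the motif at no more
    than max_mismatches positions, which tolerates degenerate repeats.
    """
    unit = len(motif)
    best = 0
    for start in range(len(protein) - unit + 1):
        run = 0
        position = start
        while position + unit <= len(protein):
            window = protein[position:position + unit]
            mismatches = sum(a != b for a, b in zip(window, motif))
            if mismatches > max_mismatches:
                break
            run += 1
            position += unit
        best = max(best, run)
    return best


# Arbitrary 12-residue placeholder motif (NOT the real SAPA repeat).
PLACEHOLDER_MOTIF = "ACDEFGHIKLMN"

# Toy protein: three exact copies flanked by unrelated residues.
toy_exact = "MMM" + PLACEHOLDER_MOTIF * 3 + "QQQ"
# Toy protein: the middle copy carries one substitution (a "degeneration").
toy_degenerate = PLACEHOLDER_MOTIF + "ACDEFWHIKLMN" + PLACEHOLDER_MOTIF

print(count_tandem_repeats(toy_exact, PLACEHOLDER_MOTIF))
print(count_tandem_repeats(toy_degenerate, PLACEHOLDER_MOTIF))
```

Applied to assembled sequences, a count of this kind would only be as reliable as the assembly of the repeat array itself, which is exactly the limitation discussed for short reads.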
To verify expression of the TcTS-SAPA proteins, western blots were performed with anti-SAPA and anti-trans-sialidase antibodies on the epimastigote and trypomastigote life stages of both clones (Figure 18). The anti-trans-sialidase antibodies recognize trypomastigote proteins of CL-14 with an intensity similar to that of trypomastigote protein samples from the CL Brener clone. The similar intensity shows that genes encoding trans-sialidases are present in both clones. No signal is observed in the epimastigote protein samples, as expected, because in T. cruzi the trans-sialidases are surface proteins that interact with the vertebrate host and not with the invertebrate vector. In the experiment with anti-SAPA antibodies, intense signals and several bands are observed for the CL Brener clone in the trypomastigote stage, but a weak signal and far fewer bands for the CL-14 clone in the same stage. This result also confirms our in silico observations that the CL-14 clone has functional TcTS-SAPA, but with fewer SAPA repeats and probably also fewer TcTS-SAPA copies.

The subgroup I trans-sialidases are located on chromosome 33 of T. cruzi. As in the assembly of the other chromosomes, it is possible to identify sequences from both haplotypes that were concatenated, large gaps, and clusters of repetitive sequences that may or may not be correctly represented. Besides the possibility that parts of the chromosome were assembled incorrectly, there is also the bias of the SAPA repeats not being correctly represented. One of our perspectives is to resequence these chromosomes with long-read sequencing (>10 kb) (English et al., 2012) and assemble them correctly. This will make it possible to verify the exact sequence of these molecules and to compare with greater confidence the differences between the TcTS-SAPA of the CL Brener and CL-14 clones.
In addition, we will be able to verify whether there was a large deletion of entire clusters in the genome of the CL-14 clone compared with the CL Brener genome, a deletion of only the SAPA repeats from some genes encoding subgroup I trans-sialidases, or even an addition of these repeats to the CL Brener genes.

Having found few genomic differences that can be related to virulence, we also aim to carry out a comparative transcriptomics study. To that end, gene expression levels in the clones studied here will be evaluated at the different life stages, mainly in the differentiation between the amastigote and trypomastigote stages, the stages in which the parasite is in close contact with the vertebrate host. So far, we have sequenced cDNAs derived from mRNAs of the CL-14 clone in the epimastigote, amastigote and trypomastigote stages by RNA-seq. This sequencing was performed on the Illumina platform, which generates a large number of reads in proportion to the abundance of sequences in the parasite's transcriptome, which is ideal for evaluating transcript expression levels. In collaboration with the group of Dr. Najib M. El-Sayed, responsible for the study of the CL Brener transcriptome, we will perform comparative analyses of the two transcriptomes to identify significant differences in gene expression between the clones with respect to the capacity for infection and development in the vertebrate host. These analyses are in progress.

6. References

Andrade, S.G. Caracterização de cepas do Trypanosoma cruzi isoladas no Recôncavo Baiano. Rev Patol Trop 3: 65-121, 1974.
Andrews, S.; http://www.bioinformatics.babraham.ac.uk/projects/fastqc/, 2010.
Araújo, P. R.; Teixeira, S. M.
Regulatory elements involved in the post-transcriptional control of stage-specific gene expression in Trypanosoma cruzi: a review. Mem. Inst. Oswaldo Cruz, Rio de Janeiro, v. 106, n. 3, 2011.
Archer, S.K.; Inchaustegui, D.; Queiroz, R.; Clayton, C. The Cell Cycle Regulated Transcriptome of Trypanosoma brucei. PLoS ONE 6(3), 2011.
Arner, E.; Kindlund, E.; Nilsson, D.; Farzana, F.; Ferella, M.; Tammi, M. T.; Andersson, B.; Database of Trypanosoma cruzi repeated genes: 20,000 additional gene variants. BMC Genomics, 8:391, 2007.
Atayde V.D.; Neira, I.; Cortez, M.; Ferreira, D.; Freymuller, E.; Yoshida, N.; Molecular basis of non-virulence of Trypanosoma cruzi clone CL-14. Inter. J. Parasitol., 34: 851-60, 2004.
Ausubel, F.M., Brent, R. and Kingston, R.E. (1995). Current Protocols in Molecular Biology. New York Greene Publishing Associates and Wiley-Interscience.
Berriman, M.; Ghedin, E.; Hertz-Fowler, C.; Blandin, G.; Renauld, H.; Bartholomeu, C. C.; Lennard, N. J.; Caler E. et al. The genome of the African trypanosome Trypanosoma brucei, Science 309, pp. 416–422, 2005.
Branche, C., Ochaya, S., Aslund, L., Andersson, B., 2006. Comparative karyotyping as a tool for genome structure analysis of Trypanosoma cruzi. Mol. Biochem. Parasitol. 147, 30–38.
Brandao, A., Urmenyi, T., Rondinelli, E., Gonzalez, A., de Miranda, A.B., Degrave, W., 1997. Identification of transcribed sequences (ESTs) in the Trypanosoma cruzi genome project. Mem. Inst. Oswaldo Cruz 92, 863–866.
Brener, Z.; Chiari, E. Variações morfológicas observadas em diferentes amostras de Trypanosoma cruzi, Rev. Inst. Med. Trop. São Paulo 5, pp. 220–224, 1963.
Brener, Z.; Andrade, Z.; Barral-Neto, M. Trypanosoma cruzi e doença de Chagas. Guanabara-Koogan, 2ª. Edição, 2000.
Burgos, J.M. et al. Direct molecular profiling of minicircle signatures and lineages of Trypanosoma cruzi bloodstream populations causing congenital Chagas disease, International Journal of Parasitology 37 (12), pp. 1319–1327, 2007.
Buscaglia, C.
A.; Alfonso, J.; Campetella, O.; Frasch, A. C.; Tandem amino acid repeats from Trypanosoma cruzi shed antigens increase the half-life of proteins in blood. Blood, 93, 2025-2032, 1999.
Broutin, H.; Tarrieu, F.; Tibayrenc, M.; Oury, B.; Barnabé, C.; Phylogenetic analysis of the glucose-6-phosphate isomerase gene in Trypanosoma cruzi. Experimental Parasitol. 113:1–7, 2006.
Campos, P. C.; Bartholomeu, D. C.; Da Rocha, W. D.; Cerqueira, G. C.; Teixeira, S.M.R. Sequences involved in mRNA processing in Trypanosoma cruzi, International Journal for Parasitology, Volume 38, Issue 12, Pages 1383-1389, 2008.
Cano, M.I., Gruber, A., Vazquez, M., Cortes, A., Levin, M.J., Gonzalez, A., et al., 1995. Molecular karyotype of clone CL Brener chosen for the Trypanosoma cruzi genome project. Mol. Biochem. Parasitol. 71, 273–278.
Cerqueira G. C.; Bartholomeu, D. C.; Da Rocha, W. D.; Hou, L.; Freitas-Silva, D. M.; Machado, C. R.; El-Sayed, N. M.; Teixeira, S. M. R. Sequence diversity and evolution of multigene families in Trypanosoma cruzi, Molecular and Biochemical Parasitology, Volume 157, Issue 1, Pages 65-72, 2008.
Cerqueira, G.C., Da Rocha, W.D., Campos, P.C., Zouain, C.S., Teixeira, S.M., 2005. Analysis of expressed sequence tags from Trypanosoma cruzi amastigotes. Mem. Inst. Oswaldo Cruz 100, 385–389.
Chervitz, S. S.; Dagdigian, C.; Fuellen, G.; Gilbert, J. G.; Korf, I.; Lapp, H. et al. The Bioperl toolkit: Perl modules for the life sciences. Genome Res.; 12(10):1611–1618, 2002.
Chevreux, B; Pfisterer, T.; Drescher, B.; Driesel, A. J.; Müller, W. E.; Wetter, T.; Suhai, S. Using the miraEST assembler for reliable and automated mRNA transcript assembly and SNP detection in sequenced ESTs. Genome Res 14(6): 1147–1159, 2004.
Chiari E. (1981). Diferenciação do Trypanosoma cruzi em cultura. PhD thesis, Universidade Federal de Minas Gerais, Belo Horizonte.
Cribb, P.; Serra, E.
One and two-hybrid analysis of the interactions between components of the Trypanosoma cruzi spliced leader RNA gene promoter binding complex. Int J Parasitol 39: 525-532, 2008.
Da Rocha, W.D.; Otsu, K.; Teixeira, S. M.; Donelson, J. E.; Tests of cytoplasmic RNA interference (RNAi) and construction of a tetracycline-inducible T7 promoter system in Trypanosoma cruzi. Mol Biochem Parasitol 133: 175–186, 2004.
Dc-Rubin, S. S.; Schenkman, S.; Trypanosoma cruzi trans-sialidase as a multifunctional enzyme in Chagas’ disease. Cellular Microbiology, v. 14, 2012.
De Freitas, J. M.; Augusto-Pinto, L.; Pimenta, J. R.; Bastos-Rodrigues, L.; Goncalves, V. F. et al. Ancestral genomes, sex, and the population structure of Trypanosoma cruzi. PLoS Pathog. 2:e24, 2006.
El-Sayed, N. M.; Myler, P. J.; Bartholomeu, D. C.; Nilsson, D.; Aggarwal, G. et al. The genome sequence of Trypanosoma cruzi, etiologic agent of Chagas disease. Science 309: 409–415, 2005a.
El-Sayed, N.M.; Myler, P. J.; Blandin, G.; Berriman, M.; Crabtree, J.; Aggarwal, G.; Caler, E.; Renauld, H.; Worthey, E. A.; Hertz-Fowler, C.; Ghedin, E.; Peacock, C.; Bartholomeu, D. C. et al. Comparative genomics of trypanosomatid parasitic protozoa. Science 309:404-409, 2005b.
English, A. C.; Richards, S.; Gibbs, R. A. Mind the Gap: Upgrading Genomes with Pacific Biosciences RS Long-Read Sequencing Technology. Plos One, 7(11), 2012.
Franzén O.; Arner, E.; Ferella, M.; Nilsson, D.; Respuela, P.; Carninci, P.; Hayashizaki, Y.; Aslund, L.; Andersson, B.; Daub, C. O. The Short Non-Coding Transcriptome of the Protozoan Parasite Trypanosoma cruzi. PLoS Neglected Tropical Diseases, 5(8): e1283, 2011.
Franzén, O.; Ochaya, S.; Sherwood, E.; Lewis, M. D.; Llewellyn, M. S. et al. Shotgun Sequencing Analysis of Trypanosoma cruzi I Sylvio X10/1 and Comparison with T. cruzi VI CL Brener. PLoS Negl Trop Dis 5(3): e984, 2011.
Franzén O, Talavera-López C, Ochaya S, Butler CE, Messenger LA, Lewis MD, Llewellyn MS, Marinkelle CJ, Tyler KM, Miles MA, Andersson B.; Comparative genomic analysis of human infective Trypanosoma cruzi lineages with the bat-restricted subspecies T. cruzi marinkellei. BMC Genomics. 2012 Oct 5;13:531.
Freitas, J.M.; Lages-Silva, E.; Crema, E.; Pena, S. D. J.; Macedo, A. M.; Real time PCR strategy for the identification of major lineages of Trypanosoma cruzi directly in chronically infected human tissues. Int J Parasitol. 35:411–41, 2005.
Freitas, J. M.; Augusto-Pinto, L.; Pimenta, J. R.; Bastos-Rodrigues, L.; Gonçalves, V. F.; Teixeira, S. M.; Chiari, E.; Junqueira, A. C.; Fernandes, O.; Macedo, A. M.; Machado, C. R.; Pena, S. D. Ancestral genomes, sex and the population structure of Trypanosoma cruzi. PLoS Pathog 2: e24, 2006.
Fullwood, M. J.; Wei, C. L.; Liu, E. T.; Ruan, Y. Next-generation DNA sequencing of paired-end tags (PET) for transcriptome and genome analyses. Genome Res.; 19:521-532, 2009.
Galardini, M.; Biondi, E. G.; Bazzicalupo, M.; Mengoni, A.; CONTIGuator: A Bacterial Genomes Finishing Tool for Structural Insights on Draft Genomes. Source Code for Biology and Medicine, 6:11, 2011.
Hartmann, C.; Hotz, H. R.; McAndrew, M.; Clayton, C. Effect of multiple downstream splice sites on polyadenylation in Trypanosoma brucei. Mol Biochem Parasitol 93: 149-152, 1998.
Henriksson, J., Porcel, B., Rydaker, M., Ruiz, A., Sabaj, V., Galanti, N., et al., 1995. Chromosome specific markers reveal conserved linkage groups in spite of extensive chromosomal size variation in Trypanosoma cruzi. Mol. Biochem. Parasitol. 73, 63–74.
Herrera, C.; Bargues, M. D.; Fajardo, A.; Montilla, M.; Triana, O.; Vallejo, G. A.; Guhl, F. Identifying four Trypanosoma cruzi I isolate haplotypes from different geographic regions in Colombia. Infect Genet Evol 7: 535-539, 2007.
Hotez, P. J.; Molyneux, D. H.; Fenwick, A. et al. Control of neglected tropical diseases, N Engl J Med 357, pp.
1018–1027, 2007.
Huang, X.; Madan, A. CAP3: A DNA sequence assembly program. Genome Res., 9 868-877, 1999.
Ivens, A.C.; Peacock, C. S.; Worthey, E. A.; Murphy, L.; Aggarwal, G.; Berriman, M.; Sisk, E.; Rajandream, M. A. et al. The genome of the kinetoplastid parasite, Leishmania major. Science 309, pp. 436–442, 2005.
Jackson, A. P.; Sanders, M.; Berry, A.; McQuillan, J.; Aslett, M. A.; Quail, M. A.; Chukualim, B.; Capewell, P.; MacLeod, A.; Melville, S. E.; Gibson, W.; Barry, J. D.; Berriman, M.; Hertz-Fowler, C. The genome sequence of Trypanosoma brucei gambiense, causative agent of chronic human african trypanosomiasis. PLoS Negl Trop Dis. 4:e658, 2010.
Junqueira, C.; Gerrero, A. T.; Galvão-Filho, B.; Andrade, W. A.; Salgado, A. P.; Cunha, T. M.; Robert, C.; Campos, M. A.; Penido, M. L.; Mendonça-Previato, L.; Previato, J. O.; Ritter, G.; Cunha, F. Q.; Gazzinelli, R. T.; Trypanosoma cruzi adjuvants potentiate T cell-mediated immunity induced by a NY-ESO-1 based antitumor vaccine. Plos One, vol. 7, 2012.
Kangussu-Marcolino, M. M.; de Paiva, R. C.; Araújo, P. R.; Mendonça-Neto, R. P.; Lemos, L.; Bartholomeu, D. C.; Mortara, R. A.; DaRocha, W. D.; Teixeira, S. M. T.; Distinct genomic organization, mRNA expression and cellular localization of members of two amastin sub-families present in Trypanosoma cruzi. BMC Microbiology, v. 13, p. 10, 2013.
Kim, D.; Pertea, G.; Trapnell, C.; Pimentel, H.; Salzberg, S.; TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions, Genome Biology, 14, R36, 2013.
Kim, K.S.; Teixeira, S.M.; Kirchhoff, L.V.; Donelson, J.E.; Transcription and editing of cytochrome oxidase II RNAs in Trypanosoma cruzi. J Biol Chem, 2, 1994.
Kirchhoff, L. V.; Epidemiology of American Trypanosomiasis. In: Weiss, L. M.; Tanowitz, H. B.; Kirchhoff, L. V.; Advances In Parasitology: Chagas Disease. Elsevier, 2011. 1-14.
Kirchhoff, L. V.; Hieny, S.; Shiver, G. M.; Snary, D.; Sher, A.
Cryptic epitope explains the failure of a monoclonal antibody to bind to certain isolates of Trypanosoma cruzi. J. Immunol. 133, 2731–2735, 1984.
Kolev, N. G.; Franklin, J. B.; Carmi, S.; Shi, H.; Michaeli, S. et al. The Transcriptome of the Human Pathogen Trypanosoma brucei at Single-Nucleotide Resolution. PLoS Pathog 6(9), 2009.
Larkin, M.A.; Blackshields, G.; Brown, N.P.; Chenna, R.; McGettigan, P.A.; McWilliam, H.; Valentin, F.; Wallace, I.M.; Wilm, A.; Lopez, R.; Thompson, J.D.; Gibson, T.J.; Higgins, D.G. ClustalW and ClustalX version 2. 2948, 2007.
Li H.; Durbin, R.; Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics, 26, 589-595, 2010.
Li, H.; Handsaker, B.; Wysoker, A.; Fennell, T.; Ruan, J.; Homer, N.; Marth, G.; Abecasis, G.; Durbin, R.; 1000 Genome Project Data Processing Subgroup; The Sequence alignment/map (SAM) format and SAMtools. Bioinformatics, 25, 2078-9, 2009.
Liang, X.H.; Haritan, A.; Uliel, S.; Michaeli, S. Trans and cis splicing in trypanosomatids: mechanism, factors, and regulation. Eukaryot Cell 2: 830-840, 2003.
Lima, M. T.; Jansen, A. M.; Rondinelli, E.; Gattass, C. R. Trypanosoma cruzi: properties of a clone isolated from the CL strain. Parasitol. Res., 77: 77-81, 1990.
Lima, M. T.; Lenzi, H. L.; Gattass, C. R. Negative tissue parasitism in mice injected with a non-infective clone of Trypanosoma cruzi. Parasitol. Res. 81: 612, 1995.
López-Estraño, C.; Tschudi, C.; Ullu, E. Exonic sequences in the 5' untranslated region of alpha-tubulin mRNA modulate trans splicing in Trypanosoma brucei. Mol Cell Biol 18: 4620-4628, 1995.
Machado, C. A.; Ayala, F. J. Nucleotide sequences provide evidence of genetic exchange among distantly related lineages of Trypanosoma cruzi. Proc Natl Acad Sci USA, 98:7396-7401, 2001.
Martínez-Calvillo, S.; Yan, S.; Nguyen, D.; Fox, M.; Stuart, K.; Myler, P. J. Transcription of Leishmania major Friedlin chromosome 1 initiates in both directions within a single region.
Mol Cell 11: 1291-1299, 2003.
Martínez-Calvillo, S.; Nguyen, D.; Stuart, K.; Myler, P. J. Transcription initiation and termination on Leishmania major chromosome 3. Eukaryot Cell 3: 506-517, 2004.
Mendes, T. A. O.; Lobo, F. P.; Rodrigues, T. S.; Rodrigues-Luiz, G. F.; DaRocha, W. D.; Fujiwara, R. T.; Teixeira, S. M. R.; Bartholomeu, D. C.; Repeat-Enriched Proteins Are Related to Host Cell Invasion and Immune Evasion in Parasitic Protozoa. Mol Biol Evol v. 30, p. 951-963, 2013.
Miles, M. A.; Cedillos, R. A.; Povoa, M. M.; Souza, A. A.; de Prata, A. A.; Macedo, V. Do radically dissimilar Trypanosoma cruzi strains (zymodemes) cause Venezuelan and Brazilian forms of Chagas disease? Lancet 317: 1338-1340, 1981.
Minning, T. A.; Bua, J.; Garcia, G. A.; McGraw, R. A.; Tarleton, R. L. Microarray profiling of gene expression during trypomastigote to amastigote transition in Trypanosoma cruzi. BMC Genomics, 131:55-64, 2003.
Minning, T. A.; Weatherly, D. B.; Atwood, J. 3rd; Orlando, R.; Tarleton, R. L. The steady-state transcriptome of the four major life-cycle stages of Trypanosoma cruzi. BMC Genomics. 7;10:370, 2009.
Morel, C.; Chiari, E.; Camargo, E. A.; Mattei, D. M.; Romanha, A. J.; Simpson, L. Strains and clones of Trypanosoma cruzi can be characterized by pattern of restriction endonuclease. Proc Natl Acad Sci USA 77: 6810-6814, 1980.
Motta MC, Martins AC, de Souza SS, Catta-Preta CM, Silva R, Klein CC, de Almeida LG, de Lima Cunha O, Ciapina LP, Brocchi M, Colabardini AC, de Araujo Lima B, Machado CR, de Almeida Soares CM, Probst CM, de Menezes CB, Thompson CE, Bartholomeu DC, Gradia DF, Pavoni DP, Grisard EC, Fantinatti-Garboggini F, Marchini FK, Rodrigues-Luiz GF, Wagner G, Goldman GH, Fietto JL, Elias MC, Goldman MH, Sagot MF, Pereira M, Stoco PH, de Mendonça-Neto RP, Teixeira SM, Maciel TE, de Oliveira Mendes T.
A, Ürményi TP, de Souza W, Schenkman S, de Vasconcelos AT.; Predicting the proteins of Angomonas deanei, Strigomonas culicis and their respective endosymbionts reveals new aspects of the trypanosomatidae family. PLoS One. 2013 Najafabadi, H. S.; Lu, Z.; MacPherson, C.; Mehta, V.; Adoue, V.; Pastinen, T.; Salavati, R.; Global identification of conserved post-transcriptional regulatory programs in trypanosomatids. Nucleic Acids Research, online, July, 2013. Nozaki, T.;Cross, G. A. M. Effects of 3' untranslated and intergenic regions on gene expression in Trypanosoma cruzi, Molecular and Biochemical Parasitology, Volume 75, Issue 1, Pages 55-67, 1995. Ochs DE, Otsu K, Teixeira SM, Moser DR, Kirchhoff LV: Maxicircle genomic organization and editing of an ATPase subunit 6 RNA in Trypanosoma cruzi. Mol Biochem Parasitol, 76(1-2), 1996. Paiva, C. N., Castelo-Branco, M. T., Rocha, J. A., Lannes-Vieira, J, eGattass, C. R; Trypanosoma cruzi: lack of T cell abnormalities in mice vaccinated with live trypomastigotes. Parasitol Res, p. 1012-1017, 1999. Pays, E.; Vanhamme, L.; Pérez-Morga, D.; Antigenic variation in Trypanosoma brucei: facts, challenges and mysteries. Current Opinion in Microbiology, vol. 7, p. 369–374, 2004. 101 Peacock, C. S.; Seeger, K.; Harris, D.; Murphy, L.; Ruiz, J. C.; Quail, M. A.; Peters, N.; Adlem, E.; Tivey, A. et al. Comparative genomic analysis of three Leishmania species that cause diverse human disease. Nat Genet. 39(7):83947, 2007. Pena, S. D. J.; Machado, C. R.; Macedo, A. M. Trypanosoma cruzi: ancestral genomes and population structure. Mem. Inst. Oswaldo Cruz, Rio de Janeiro, 2011. Pitcovsky, T. A., Buscaglia, C. A., Mucci, J., Campetella, O.; A functional network of intramolecular cross-reacting epitopes delays the elicitation of neutralizing anti- bodies to Trypanosoma cruzi trans-sialidase. J Infect Dis 186: 397–404, 2002. Pollevick, G. D.; Affranchino, J. L.; Frasch, A. C. C.; Sanchez, D. 
O.; The complete sequence of a shed acute-phase antigen of Trypanosoma cruzi. Mol Biochem Parasitol 47: 247–250, 1991. Porcel, B.M., Aslund, L., Pettersson, U., Andersson, B., 2000. Trypanosoma cruzi: a putative vacuolar ATP synthase subunit and a CAAX prenyl proteaseencoding gene, as examples of gene identification in genome projects. Exp. Parasitol. 95, 176–186. Porcile PE, Santos MR, Souza RT, Verbisck NV, Brandão A, Urmenyi T, Silva R, Rondinelli E, Lorenzi H, Levin MJ, Degrave W, Franco da Silveira J. A refined molecular karyotype for the reference strain of the Trypanosoma cruzi genome project (clone CL Brener) by assignment of chromosome markers. Gene. 2003 Apr 10;308:53-65. Pyrrho, A. S.; Moraes, J. L.; Peçanha, L. M.; eGattass, C. R; Trypanosoma cruzi: IgG1 and IgG2b are the main immunoglobulins produced by vaccinated mice." Parasitol Res p. 333- 337, 1998. Ray DS; Conserved sequence blocks in kinetoplast minicircles from diverse species of trypanosomes. Mol Cell Biol, 9(3), 1989. Raymond F., Boisvert S., Roy G., et al.; Genome sequencing of the lizard parasite Leishmania tarentolae reveals loss of genes associated to the intracellular stage of human pathogenic species. Nucleic Acids Res. 2012;40:1131-47. Rassi, A.; Jr, R. A.; Marin-Neto, J. A. Chagas disease.Lancet.375:1388, 2010. Real F, Vidal RO, Carazzolle MF, Mondego JM, Costa GG, Herai RH, Würtele M, de Carvalho LM, E Ferreira RC, Mortara RA, Barbiéri CL, Mieczkowski P, da 102 Silveira JF, Briones MR, Pereira GA, Bahia D.; The Genome Sequence of Leishmania amazonensis: Functional Annotation and Extended Analysis of Gene Models. DNA Res. 2013 Jul 15. Risso, M. G.; Pitcovsky, T. A.; Caccuri, R. L.; Campetella, O.; Leguizamon, M. S.; Immune system pathogenesis is prevented by the neutralization of the systemic trans-sialidase from Trypanosoma cruzi during severe infections. Parasitology 134: 503–510, 2007. Rochette, A.; McNicoll, F.;Girard, F.et al. 
Characterization and developmental gene regulation of a large gene family encoding amastin surface proteins in Leishmania spp, Mol Biochem Parasitol 140, pp. 205–220, 2005. Ronaghi, M; Improved Performance of Pyrosequencing Using Single-Stranded DNA-Binding Protein. Analytical Biochemistry, 286, 2, 2000. Rice, P.; Longden, I.; Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite.Trends in Genetics, vol 16, No 6.pp.276-277, 2000. Schauer, R.; Reuter, G.; Muhlpfordt, H.; Andrade, A. F.; Pereira, M. E.; The occurrence of N-acetyl- and N-glycoloylneuraminic acid in Trypanosoma cruzi. Hoppe Seylers Z Physiol Chem 364: 1053–1057, 1983. Schofield, C. J.; Jannin, J.; Salvatella , R.; The future of Chagas disease control. Trends in Parasitology - Vol. 22, p 583-588, 2006. Shapiro, T. A.; Kinetoplast DNA maxicircles: networks within networks. PNAS, v. 16, p. 7809-7813, 1993. Siegel, T. N.; Tan, K. S.; Cross, G. A..; Systematic study of sequence motifs for RNA trans-splicing in Trypanosoma brucei. Mol. Cell Biol. 25:9586-9594, 2005. Siegel, T. N.; Kapila, G.; George, A.M.; Cross, T. O. Gene expression in Trypanosoma brucei: lessons from high-throughput RNA sequencing, Trends in Parasitology, In Press, Corrected Proof, 2011. Singh, N.; Chikara, S.; Sundar, S.; SOLiD™ Sequencing of Genomes of Clinical Isolates of Leishmania donovani from India Confirm Leptomonas Co-Infection and Raise Some Key Questions. Plos One, 2013, v8.2 Soares, M. B.; Goncalves, R.; et al; Balanced cytokine-producing pattern in mice immunized with an avirulent Trypanosoma cruzi. An Acad Bras Cienc, p. 167-172, 2003. 103 Souto, R. P.;Fernandes, O.;Macedo, A. M.;Campbell, D. A.;Zingales, B. DNA markers define two major phylogenetic lineages of Trypanosoma cruzi, Mol. Biochem. Parasitol. 83, pp. 141–152, 1996. Souza, R. T.; Lima, F. M.; Barros, R. M.; Cortez, D. R.; Santos, M. F.; Cordero, E. M.; Ruiz, J. C.; Goldenberg, S.; Teixeira, M. M. 
G.; Franco da Silveira, J.; Genome Size, Karyotype Polymorphism and Chromosomal Evolution in Trypanosoma cruzi. PLoS ONE 6(8): e23042, 2011. Souza, W. Novel Cell Biology of Trypanosoma cruzi In American Trypanosomiasis World Class Parasites: Volume 7. Edited by Miles MATKM. Boston , Springer; 13-24, 2003. Stajich, J. E.; Block, D.; Boulez, K.; Brenner; Tamura, K.; Peterson, D.; Peterson, N.; Stecher, G.; Nei, M.;Kumar, S. MEGA5: Molecular Evolutionary Genetics Analysis using Maximum Likelihood, Evolutionary Distance, and Maximum Parsimony Methods. Molecular Biology and Evolution, 2011. Stephen, F.; Altschul, T. L.; Madden, A.A.; Jinghui Z.; Zheng, Z.; Miller, W.;Lipman, D. J.Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic Acids Res. 25:3389-3402, 1997. Schuler, G. D.; Sequence mapping by electronic PCR. Genome Res 7: 541– 550, 1997. Teixeira, S. M. R.; da Rocha, W. D. Control of gene expression and genetic manipulation in the Trypanosomatidae. Genet Mol Res 2: 148-158, 2003. Teixeira, S.M.R.; Russell, D.G.; Kirchhoff, L.V.;Donelson, J.E.A differentially expressed gene family encoding "amastin", a surface glycoprotein of Trypanosoma cruzi amastigotes. J. Biol. Chem.269: 20509-20516, 1994. Thorvaldsdóttir, H.; Robinson, J. T.; Mesirov, J. P.; Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Briefings in Bioinformatics 2012. Tibayrenc, M.; Ayala, F. J.;Towards a population genetics of microorganisms: the clonal theory of parasitic protozoa. Parasitol Today 7: 228-232, 1991. Verdun, R.E., Di Paolo, N., Urmenyi, T.P., Rondinelli, E., Frasch, A.C., Sanchez, D.O., 1998. Gene discovery through expressed sequence Tag sequencing in Trypanosoma cruzi. Infect. Immun. 66, 5393–5398. 104 Wang, Z.; Mark, G.; Snyder, M. RNA-Seq: a transcriptomics. Nature 10(1): 57–63,2009. revolutionary tool for Weatherly, D. B.; Boehlke, C.; Tarleton, R. L. 
Chromosome level assembly of the hybrid Trypanosoma cruzi genome. BMC Genomics 10: 255, 2009. Weinkauf, C., Salvador, R., and Pereiraperrin, M.; Neurotrophin receptor TrkC is an entry receptor for Trypanosoma cruzi in neural, glial and epithelial cells. Infect Immun 79: 4081–4087, 2011. Westenberger, S. J.; Cerqueira, G. C.; El-Sayed, N. M.; Zingales. B.; Campbell, D. A.; Sturm, N. R. Trypanosoma cruzi mitochondrial maxicircles display species- and strain-specific variation and possess a conserved element in the non-coding region. BMC Genomics. 7:60. doi: 10.1186/1471-2164-7-60, 2006. WHO; A human rights-based approach to neglected tropical diseases. WHO. 2013. Disponível em http://www.who.int/tdr/publications/documents/humanrights.pdf. 10/10/2013 WHO. Chagas disease (American Trypanosomiaisis). Fact Sheet no 340. Disponível em http://www.who.int/mediacentre/factsheets/fs340/en/. 28/06/2013 Yeo, M.; Mauricio, I. L.; Messenger, L. A.; Lewis, M. D.; Llewellyn, M. S. et al. Multilocus Sequence Typing (MLST) for Lineage Assignment and High Resolution Diversity Studies in Trypanosoma cruzi. PLoS Negl Trop Dis 5(6): e1049, 2011. Zingales, B.; Pereira, M. E.; Almeida, K. A.; Umezawa, E. S.; Nehme, N. S.; Oliveira, R. P.; Macedo, A.; Souto,R. P.Biological parameters and molecular markers of clone CL Brener, the reference organism of the Trypanosoma cruzi genome project. Mem Inst Oswaldo Cruz. 92(6):811-4, 1997. Zingales B, Stolf BS, Souto RP, Fernandes O, Briones MR. Epidemiology, biochemistry and evolution of Trypanosoma cruzi lineages based on ribosomal RNA sequences. Mem Inst Oswaldo Cruz. 94:159–164. 1999 Zingales, B. et al. A new consensus for Trypanosoma cruzi intraspecific nomenclature: second revision meeting recommends TcI to TcVI. Mem. Inst. Oswaldo Cruz, Rio de Janeiro, v. 104, n. 7, Nov. 2009. 
Predicting the Proteins of Angomonas deanei, Strigomonas culicis and Their Respective Endosymbionts Reveals New Aspects of the Trypanosomatidae Family

Maria Cristina Machado Motta1, Allan Cezar de Azevedo Martins1, Silvana Sant’Anna de Souza1,2, Carolina Moura Costa Catta-Preta1, Rosane Silva2, Cecilia Coimbra Klein3,4,5, Luiz Gonzaga Paula de Almeida3, Oberdan de Lima Cunha3, Luciane Prioli Ciapina3, Marcelo Brocchi6, Ana Cristina Colabardini7, Bruna de Araujo Lima6, Carlos Renato Machado9, Célia Maria de Almeida Soares10, Christian Macagnan Probst11,12, Claudia Beatriz Afonso de Menezes13, Claudia Elizabeth Thompson3, Daniella Castanheira Bartholomeu14, Daniela Fiori Gradia11, Daniela Parada Pavoni12, Edmundo C. Grisard15, Fabiana Fantinatti-Garboggini13, Fabricio Klerynton Marchini12, Gabriela Flávia Rodrigues-Luiz14, Glauber Wagner15, Gustavo Henrique Goldman7, Juliana Lopes Rangel Fietto16, Maria Carolina Elias17, Maria Helena S. Goldman18, Marie-France Sagot4,5, Maristela Pereira10, Patrícia H. Stoco15, Rondon Pessoa de Mendonça-Neto9, Santuza Maria Ribeiro Teixeira9, Talles Eduardo Ferreira Maciel16, Tiago Antônio de Oliveira Mendes14, Turán P.
Ürményi2, Wanderley de Souza1, Sergio Schenkman19*, Ana Tereza Ribeiro de Vasconcelos3*

1 Laboratório de Ultraestrutura Celular Hertha Meyer, Instituto de Biofísica Carlos Chagas Filho, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Rio de Janeiro, Brazil
2 Laboratório de Metabolismo Macromolecular Firmino Torres de Castro, Instituto de Biofísica Carlos Chagas Filho, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Rio de Janeiro, Brazil
3 Laboratório Nacional de Computação Científica, Laboratório de Bioinformática, Petrópolis, Rio de Janeiro, Brazil
4 BAMBOO Team, INRIA Grenoble-Rhône-Alpes, Villeurbanne, France
5 Laboratoire de Biométrie et Biologie Evolutive, Université de Lyon, Université Lyon 1, CNRS, UMR5558, Villeurbanne, France
6 Departamento de Genética, Evolução e Bioagentes, Instituto de Biologia, Universidade Estadual de Campinas, Campinas, São Paulo, Brazil
7 Departamento de Ciências Farmacêuticas, Faculdade de Ciências Farmacêuticas de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, São Paulo, Brazil
8 Laboratório Nacional de Ciência e Tecnologia do Bioetanol, Campinas, São Paulo, Brazil
9 Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
10 Laboratório de Biologia Molecular, Instituto de Ciências Biológicas, Universidade Federal de Goiás, Goiânia, Goiás, Brazil
11 Laboratório de Biologia Molecular de Tripanossomatídeos, Instituto Carlos Chagas/Fundação Oswaldo Cruz, Curitiba, Paraná, Brazil
12 Laboratório de Genômica Funcional, Instituto Carlos Chagas/Fundação Oswaldo Cruz, Curitiba, Paraná, Brazil
13 Centro Pluridisciplinar de Pesquisas Químicas, Biológicas e Agrícolas, Universidade Estadual de Campinas, Campinas, São Paulo, Brazil
14 Departamento de Parasitologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
15 Laboratórios de Protozoologia e de Bioinformática, Departamento de Microbiologia, Imunologia e Parasitologia, Centro de Ciências Biológicas, Universidade Federal de Santa Catarina, Florianópolis, Santa Catarina, Brazil
16 Departamento de Bioquímica e Biologia Molecular, Centro de Ciências Biológicas e da Saúde, Universidade Federal de Viçosa, Viçosa, Minas Gerais, Brazil
17 Laboratório Especial de Ciclo Celular, Instituto Butantan, São Paulo, São Paulo, Brazil
18 Departamento de Biologia, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, São Paulo, Brazil
19 Departamento de Microbiologia, Imunologia e Parasitologia, Escola Paulista de Medicina, Universidade Federal de São Paulo, São Paulo, São Paulo, Brazil

Abstract

Endosymbiont-bearing trypanosomatids have been considered excellent models for the study of cell evolution because the host protozoan co-evolves with an intracellular bacterium in a mutualistic relationship. Such protozoa inhabit a single invertebrate host during their entire life cycle and exhibit special characteristics that group them in a particular phylogenetic cluster of the Trypanosomatidae family, thus classified as monoxenics. In an effort to better understand such symbiotic association, we used DNA pyrosequencing and a reference-guided assembly to generate reads that predicted 16,960 and 12,162 open reading frames (ORFs) in two symbiont-bearing trypanosomatids, Angomonas deanei (previously named as Crithidia deanei) and Strigomonas culicis (first known as Blastocrithidia culicis), respectively. Identification of each ORF was based primarily on TriTrypDB using tblastn, and each ORF was confirmed by employing getorf from EMBOSS and Newbler 2.6 when necessary. The monoxenic organisms revealed conserved housekeeping functions when compared to other trypanosomatids, especially compared with Leishmania major.
However, major differences were found in ORFs corresponding to the cytoskeleton, the kinetoplast, and the paraflagellar structure. The monoxenic organisms also contain a large number of genes for cytosolic calpain-like and surface gp63 metalloproteases and a reduced number of compartmentalized cysteine proteases in comparison to other TriTryp organisms, reflecting adaptations to the presence of the symbiont. The assembled bacterial endosymbiont sequences exhibit a high A+T content with a total of 787 and 769 ORFs for the Angomonas deanei and Strigomonas culicis endosymbionts, respectively, and indicate that these organisms hold a common ancestor related to the Alcaligenaceae family. Importantly, both symbionts contain enzymes that complement essential host cell biosynthetic pathways, such as those for amino acid, lipid and purine/pyrimidine metabolism. These findings increase our understanding of the intricate symbiotic relationship between the bacterium and the trypanosomatid host and provide clues to better understand eukaryotic cell evolution.

PLOS ONE | www.plosone.org | April 2013 | Volume 8 | Issue 4 | e60209
Predicting Proteins of A. deanei and S. culicis

Citation: Motta MCM, Martins ACdA, de Souza SS, Catta-Preta CMC, Silva R, et al. (2013) Predicting the Proteins of Angomonas deanei, Strigomonas culicis and Their Respective Endosymbionts Reveals New Aspects of the Trypanosomatidae Family. PLoS ONE 8(4): e60209. doi:10.1371/journal.pone.0060209

Editor: John Parkinson, Hospital for Sick Children, Canada

Received October 16, 2012; Accepted February 22, 2013; Published April 3, 2013

Copyright: © 2013 Motta et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This work was supported by Fundação Carlos Chagas Filho de Amparo à Pesquisa do Estado do Rio de Janeiro (FAPERJ), Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) and Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq). The work of CCK as part of her PhD is funded by the ERC AdG SISYPHE coordinated by MFS. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The co-author Maria Carolina Elias is a PLOS ONE Editorial Board member. This does not alter the authors’ adherence to all the PLOS ONE policies on sharing data and materials.

* E-mail: [email protected] (ATRdV); [email protected] (SS)

Introduction

Protists of the Trypanosomatidae family have been intensively studied because some of them are agents of human illnesses such as Chagas’ disease, African sleeping sickness, and leishmaniasis, which have a high incidence in Latin America, Sub-Saharan Africa, and parts of Asia and Europe, together affecting approximately 33 million people. Some species are also important in veterinary medicine, seriously affecting animals of economic interest such as horses and cattle. In addition, some members of the Phytomonas genus infect and kill plants of considerable economic interest such as coconut, oil palm, and cassava. These organisms circulate between invertebrate and vertebrate or plant hosts. In contrast, monoxenic species, which predominate in this family, inhabit a single invertebrate host during their entire life cycle [1].

Among the trypanosomatids, six species found in insects bear a single obligate intracellular bacterium in their cytoplasm [2], with Angomonas deanei and Strigomonas culicis (previously named Crithidia deanei and Blastocrithidia culicis, respectively) representing the species best characterized by ultrastructural and biochemical approaches [3]. In this obligatory association, the endosymbiont is unable to survive and replicate once isolated from the host, whereas aposymbiotic protozoa are unable to colonize insects [4,5]. The symbiont is surrounded by two membrane units and presents a reduced peptidoglycan layer, which is essential for cell division and morphological maintenance [6]. The lack of a typical gram-negative cell wall could facilitate the intense metabolic exchange between the host cell and the symbiotic bacterium.

Biochemical studies revealed that the endosymbiont contains enzymes that complete essential metabolic pathways of the host protozoan for amino acid production and heme biosynthesis, such as the enzymes of the urea cycle that are absent in the protozoan [7,8,9,10,11]. Furthermore, the bacterium enhances the formation of polyamines, which results in high rates of cell proliferation in endosymbiont-bearing trypanosomatids compared to other species of the family [12]. Conversely, the host cell supplies phosphatidylcholine, which composes the endosymbiont envelope [5], and ATP produced through the activity of protozoan glycosomes [13].

The synchrony in cellular division is another striking feature of this symbiotic relationship. The bacterium divides in coordination with the host cell structures, especially the nucleus, with each daughter cell carrying only one symbiont [14]. The presence of the prokaryote causes ultrastructural alterations in the host trypanosomatid, which exhibits a reduced paraflagellar structure and a typical kinetoplast DNA network [15,16,17]. The endosymbiont-harboring strains exhibit a different surface charge and carbohydrate composition from the aposymbiotic cells obtained after antibiotic treatment [18,19]. Furthermore, the presence of the symbiotic bacterium influences the protozoan interaction with the insect host, which seems to be mediated by gp63 proteases, sialomolecules, and mannose-rich glycoconjugates [20,21].

Molecular data support the grouping of all endosymbiont-containing trypanosomatids together in a single phylogenetic branch. Moreover, studies based on rRNA sequencing suggest that symbionts from different protozoan species share high identities and are most likely derived from an ancestor of a β-proteobacterium of the genus Bordetella, which belongs to the Alcaligenaceae family [2,22,23]. Taken together, these results suggest that a single evolutionary event gave rise to all endosymbiont-bearing trypanosomatids, recapitulating the process that led to the formation of the mitochondrion in eukaryotic cells [24].

In this work, we analyzed the predicted protein sequences of A. deanei and S. culicis and their respective symbionts. This is the first time that genome databases have been generated from endosymbiont-containing trypanosomatids, which represent an excellent biological model to study eukaryotic cell evolution and the bacterial origin of organelles. The analysis presented here also clarifies aspects of the evolutionary history of the Trypanosomatidae family and helps us to understand how these protozoa maintain a close symbiotic relationship.

Materials and Methods

Materials and methods are described in the Text S1.

Nucleotide Sequence Accession Numbers

The sequences of Angomonas deanei, Strigomonas culicis, Candidatus Kinetoplastibacterium crithidii and Candidatus Kinetoplastibacterium blastocrithidii were assigned as PRJNA169008, PRJNA170971, CP003978 and CP003733, respectively, in the DDBJ/EMBL/GenBank.

Results and Discussion

General Characteristics

454-based pyrosequencing generated a total of 3,624,411 reads with an average length of 365 bp for A. deanei and a total of 2,666,239 reads with an average length of 379 bp for S. culicis (Table 1).
A total of 16,957 and 12,157 ORFs were obtained for the A. deanei and S. culicis genomes using this strategy, while their respective endosymbionts held a total of 787 and 769 ORFs, respectively. The total number of ORFs includes non-protein-coding tRNA and rRNA genes. Tables 1 and 2 present the number of known proteins, hypothetical and partial ORFs for the two trypanosomatids and their endosymbionts, respectively. The tRNA genes representing all 20 amino acids were identified in both trypanosomatids and their respective symbionts. At least one copy of the rRNA genes (18S, 5.8S and 28S) was identified in the genomes of A. deanei and S. culicis. We found that the bacterial endosymbiont genomes also contain at least three copies of the rRNA operon.

Table 1. Protein Reference Sequence-Guided Assembly data of A. deanei and S. culicis genomes.

Parameter                                            A. deanei    S. culicis
Reads                                                3,624,411    2,666,239
Average read length (bp)                             365          379
Steps                                                3            5
Genes in contigs (protein reference sequence)        12,469       9,902
Genes in exclusive contigs                           4,435        2,202
Number of known protein ORFs                         7,912        6,192
Number of hypothetical ORFs                          8,791        5,700
Number of partial ORFs                               206          217
Total number of genes (including tRNAs and rRNAs)    16,957       12,157

doi:10.1371/journal.pone.0060209.t001

Figure 1. Venn diagram illustrating the distribution of MCL protein clusters. The diagram shows the cluster distribution comparing endosymbiont-bearing trypanosomatids (group A), Leishmania sp. (group B) and Trypanosoma sp. (group C). Protein clusters with less clear phylogenetic distributions are identified as others. doi:10.1371/journal.pone.0060209.g001

General Protein Cluster Analysis

A total of 16,648 clusters were identified. Of those, 2,616 (16.4%) contained proteins from all species analyzed.
To provide a more comprehensive coverage of the phylogenetic distribution, we separated the species into three groups: endosymbiont-bearing trypanosomatids (A, s = 2 species), Leishmania sp. (B, s = 5) and Trypanosoma sp. (C, s = 4). We considered a protein cluster to be present in a group if at most zero, two or one species, respectively, were missing from it. The protein cluster distribution is shown in Figure 1. In this way, 2,979 protein clusters (17.9%) were identified in all groups, with 130 (0.8%) identified only in groups A and B (AB group), 31 (0.2%) only in groups A and C (AC group), and 501 (3.2%) only in groups B and C (BC group). The AB group represents the proteins that are absent in the Trypanosoma sp. branch. These proteins are mainly related to general metabolic function (p = 46 proteins), hypothetical conserved (p = 37) or transmembrane/surface proteins (p = 33). The AC group is fourfold smaller than the AB group, in accordance with the closer relationship between endosymbiont-bearing trypanosomatids and Leishmania sp. [25]. The proteins in the AC group are mainly related to general metabolic function (p = 11), transmembrane/surface proteins (p = 8) and hypothetical conserved proteins (p = 7), and the relative distribution between these categories is very similar to the distribution in the AB group. The BC group is almost fourfold larger than the AB group and mainly consists of conserved hypothetical proteins. One hypothesis to explain these different levels of conservation is that organisms from the genera Trypanosoma and Leishmania inhabit insect and mammalian hosts, while the symbiont-bearing protozoa are mainly insect parasites. Thus, different surface proteins would be involved in host/protozoa interactions, and distinct metabolic proteins would be required for survival in these diverse environments. Only a small fraction of protein clusters (n = 54, 0.3%) was identified in group A.
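The group-presence rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the species identifiers (in particular the five Leishmania placeholders `Leish_1` to `Leish_5`) and the function name `classify_cluster` are hypothetical, and the handling of "others" (any group that is partly represented but falls below its tolerance) is one possible reading of "less clear phylogenetic distributions".

```python
from typing import Dict, FrozenSet, Set

# Hypothetical species identifiers; the Leishmania names are placeholders.
GROUPS: Dict[str, FrozenSet[str]] = {
    "A": frozenset({"A_deanei", "S_culicis"}),
    "B": frozenset({"Leish_1", "Leish_2", "Leish_3", "Leish_4", "Leish_5"}),
    "C": frozenset({"T_brucei", "T_cruzi", "T_congolense", "T_vivax"}),
}

# Number of species allowed to be missing per group, as stated in the text:
# zero for group A, two for group B, one for group C.
MAX_MISSING: Dict[str, int] = {"A": 0, "B": 2, "C": 1}


def classify_cluster(species_in_cluster: Set[str]) -> str:
    """Return a label such as 'ABC', 'AB', 'C' or 'other' for one cluster."""
    label = ""
    for name in sorted(GROUPS):
        members = GROUPS[name]
        n_found = len(members & species_in_cluster)
        n_missing = len(members) - n_found
        if n_missing <= MAX_MISSING[name]:
            # Enough species of this group are in the cluster.
            label += name
        elif n_found > 0:
            # Assumption: partial representation below the tolerance
            # makes the phylogenetic distribution unclear.
            return "other"
    return label or "other"


if __name__ == "__main__":
    example = {"A_deanei", "S_culicis", "Leish_1", "Leish_2", "Leish_3"}
    print(classify_cluster(example))
```

Counting the labels returned over all clusters would give the kind of tallies reported for the ABC, AB, AC and BC groups, although the exact numbers depend on how the unclear cases are treated.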
This finding is in striking contrast to protein clusters identified only in group B (n = 889, 5.3%) or only in group C (n = 679, 4.5%), which represent specializations of the Leishmania or Trypanosoma branches. This small set is mainly composed of hypothetical proteins without similar proteins in the GenBank database. Only three of the group A clusters are similar to bacterial proteins, with two of these similar to Bordetella (clusters 04518 and 05756). The third one is similar to the bacterial-type glycerol dehydrogenase of Crithidia sp. (cluster 07344).

Of all the clusters that are present in all species except for one (n = 1,274, 7.6%), 694 (54.5%) are missing in S. culicis, followed by T. congolense (n = 211, 16.6%), A. deanei (n = 201, 15.8%) and T. vivax (n = 104, 8.0%). The fact that endosymbiont-bearing species are better represented in these sets could be due to unidentified proteins in the assembly and/or cluster analysis. This is reinforced by the fact that among clusters containing proteins from just one species (n = 9,477; 56.9%), most (73.9%) are from species with genomes that are not completely assembled (T. vivax, n = 1,881, 19.8%; T. congolense, n = 1,845, 19.5%; A. deanei, n = 1,745, 18.4%; S. culicis, n = 1,530, 16.1%). T. brucei and T. cruzi also account for significant numbers of clusters with only a single species (n = 1,094, 11.5% and n = 1,071, 11.3%, respectively), and these clusters mainly consist of multigenic surface proteins.

Our data support the idea that endosymbiont-bearing trypanosomatids share a larger proportion of their genes with the Leishmania sp., in accordance with previous phylogenetic studies [2,25]. Only one fifth of all trypanosomatid protein clusters are shared among most of the species analyzed here. This proportion increases to one fourth if we only analyze the Leishmania and Trypanosoma genera; however, the number of clusters specific for endosymbiont-bearing kinetoplastids is a relatively small proportion (0.6%) of all clusters, indicating that the specialization of genes in the species following this evolutionary process was relatively small.

Genomic Characteristics of the A. deanei and S. culicis Endosymbionts

The endosymbiont genomes. Table 2 summarizes the genome analyses of both symbionts. The genome of the A. deanei endosymbiont contains 821,813 bp, with almost 31% G+C content and 787 CDSs. Of these, 640 (81.3%) were characterized as known CDSs, 94 (11.9%) as hypothetical, and 53 (6.7%) as rRNA or tRNA. The average CDS length is 987 bp, and coding regions account for 88% of the genome, indicating that the genome is highly compact. There are three copies of each rRNA and 44 tRNAs, suggesting a functional translation metabolism. The endosymbiont of S. culicis has a genome composed of 820,037 bp and 769 CDSs, 637 (83.5%) coding for known proteins, 78 (9.5%) annotated as hypothetical proteins, and 54 (6.0%) as rRNA or tRNA. The G+C content (32.6%) is similar to but slightly higher than that of the A. deanei endosymbiont (30.96%). CDSs make up 88% and 87% of the A. deanei and S. culicis endosymbiont genomes, respectively, leaving few regions formed by non-coding sequences.

A direct comparison between the two endosymbionts indicated that they share 507 genes that meet the criteria for inclusion in a cluster as described in the Materials and Methods. This represents approximately 70% of the annotated genes in both genomes, indicating a certain degree of genetic similarity. Figure 2A shows the full alignment of the A. deanei and S. culicis symbionts. This alignment indicates the occurrence of an inversion involving approximately one half of the genomes. However, this inversion should be validated by experimental work. The observed differences agree with phylogenetic analyses suggesting the classification of these symbionts as different species, Candidatus Kinetoplastibacterium crithidii and Candidatus Kinetoplastibacterium blastocrithidii [2,23].

Table 2. General characteristics of the A. deanei and S. culicis symbionts.

Parameter                           A. deanei symbiont    S. culicis symbiont
Length (bp)                         821,813               820,037
G+C (%)                             30.96                 32.55
Number of known protein CDSs        640                   637
Number of hypothetical CDSs         94                    78
Coding region (% of genome size)    88                    87
Average CDS length (bp)             987                   1,004
rRNA                                9                     9
16S rRNA                            3                     3
23S rRNA                            3                     3
5S rRNA                             3                     3
tRNA                                44                    45
Total number of genes               787                   769

doi:10.1371/journal.pone.0060209.t002

Figure 2. Genome alignments. The figure shows the alignment of the A. deanei endosymbiont (Endo-A. deanei) and the S. culicis endosymbiont (Endo-S. culicis) (A); between Endo-A. deanei and T. asinigenitalis (B), T. equigenitalis (C), or Wolbachia (D); and between Wolbachia and T. asinigenitalis (E). Alignments were performed with the ACT program based on tblastx analyses. Red (direct similarity) and blue lines (indirect similarity) connect similar regions with at least 700 bp and a score cutoff of 700. The numbers on the right indicate the size of the entire sequence for each organism. doi:10.1371/journal.pone.0060209.g002

The origins of symbionts in trypanosomatids. Previous phylogenetic studies based on sequencing of the small-subunit ribosomal DNA suggested that symbionts of trypanosomatids descended from a common ancestor, a β-proteobacterium of the Bordetella genus [2,22,23]. Comparisons of the endosymbiont genomes with the KEGG database revealed eight organisms that share high numbers of similar CDSs: Bordetella petrii, A. xylosoxidans, Bordetella avium, Bordetella parapertussis, Pusillimonas, Bordetella bronchiseptica and Taylorella equigenitalis. All these species are phylogenetically related to β-proteobacteria belonging to the Alcaligenaceae family. The genus Taylorella consists of two species, T. equigenitalis and T. asinigenitalis, which are microaerophilic, slow-growing gram-negative bacteria belonging to the family Alcaligenaceae [26,27]. T. equigenitalis is a facultative intracellular pathogen in horses that causes contagious equine metritis (CEM), a sexually transmitted infection [28].

Based on these facts, clustering analysis was performed to compare these genomes and establish the genetic similarity among them. The clustering analysis compared the genomes of the A. deanei and S. culicis endosymbionts, T. equigenitalis MCE9, T. asinigenitalis MCE3, B. petrii DSM 12804, A. xylosoxidans A8 and Wolbachia pipientis (wMel). For the A. deanei endosymbiont, the highest numbers of shared clusters are observed for A. xylosoxidans (490 clusters) and B. petrii (483 clusters), followed by T. asinigenitalis (376 clusters) and T. equigenitalis (375 clusters). However, considering the genome length, T. equigenitalis and T. asinigenitalis had the greater proportion of genes in clusters (24.1 and 24.67% of the annotated genes, respectively). The values for A. xylosoxidans and B. petrii are 7.59 and 9.61%, respectively. Note that the A. xylosoxidans plasmids pA81 and pA82 are not included in these comparisons.

The S. culicis endosymbiont shares a high number of clusters (74%) with other genomes; considering 714 annotated genes (rRNA and tRNA genes were not taken into account), 544 (76.19%) were similar to genes of the other microorganisms. The highest numbers of clusters are shared with A. xylosoxidans (501 clusters) and B. petrii (495 clusters), followed by T. asinigenitalis (390 clusters) and T. equigenitalis (388 clusters). Using W. pipientis (wMel), an endosymbiont of Drosophila melanogaster, as an out-group, we found 70 clusters for A. deanei and 73 clusters for S. culicis. Wolbachia also shares a lower number of clusters with T.
asinigenitalis (79) and T. equigenitalis (81). T. equigenitalis MCE9 and T. asinigenitalis MCE3 contain 1,695,860 and 1,638,559 bps, respectively. Therefore, the A. deanei and S. culicis symbiont genomes are reduced when compared to Taylorella, which also have reduced genomes when compared to Bordetella or Achromobacter [26,27]. Alignments indicate the existence of similar sequences between the Taylorella and the kinetoplastid symbionts (Figure 2B and C), corroborating the results obtained in the clustering analyses. Much less similarity is observed between A. deanei and W. pipientis wMel, as well as between W. pipientis and T. asinigenitalis using the same alignment parameters (Figure 2D and E). Both Taylorella genomes are AT-rich (37.4 and 38.3% for T. equigenitalis and T. asinigenitalis, respectively), a characteristic also shared with both symbionts. Therefore, it is possible that the process of adaptation to intracellular life involved substantial base-composition modification, as most symbiotic bacteria are AT-rich [29,30].

The degree of similarity and even identity of the endosymbionts with Taylorella genomes and even with genomes of other species such as Bordetella and Achromobacter reinforces the origin of both endosymbionts from an ancestor of the Alcaligenaceae group. Both endosymbionts are similar to T. equigenitalis, T. asinigenitalis, B. petrii, and A. xylosoxidans and to other species of this family to different degrees. In absolute numbers, B. petrii and A. xylosoxidans have the highest numbers of clusters in common with the symbionts. However, considering the genome length, Taylorella species have the highest proportions of clusters in common with the A. deanei and S. culicis endosymbionts. A phylogenomic analysis using 235 orthologs was performed in order to establish the evolutionary history among A. xylosoxidans A8, B. petrii DSM 12804, T. asinigenitalis MCE3, T. equigenitalis MCE9, Ca. K. blastocrithidii and Ca. K. crithidii.
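The contrast drawn above between absolute cluster counts and genome-size-adjusted proportions can be checked with a few lines of Python. This is only a sketch: it uses figures quoted in the text (clusters shared with the A. deanei endosymbiont, the reported percentages of annotated genes, and the 544 of 714 S. culicis endosymbiont genes), while the function and variable names are illustrative and not part of the original analysis.

```python
def percent_of_genes(shared, annotated):
    """Percentage of annotated genes that fall into shared clusters."""
    return 100.0 * shared / annotated

# Values quoted in the text for the A. deanei endosymbiont:
# (shared clusters, reported % of that genome's annotated genes in clusters)
reported = {
    "A. xylosoxidans": (490, 7.59),
    "B. petrii": (483, 9.61),
    "T. asinigenitalis": (376, 24.67),
    "T. equigenitalis": (375, 24.10),
}

# Ranking by absolute number of shared clusters
by_count = sorted(reported, key=lambda name: reported[name][0], reverse=True)
# Ranking by proportion of annotated genes (accounts for genome size)
by_share = sorted(reported, key=lambda name: reported[name][1], reverse=True)

# S. culicis endosymbiont: 544 of 714 annotated genes were similar to
# genes of the other microorganisms
s_culicis_share = round(percent_of_genes(544, 714), 2)

print(by_count)
print(by_share)
print(s_culicis_share)
```

The two rankings differ: the larger Achromobacter and Bordetella genomes lead in absolute counts, whereas the Taylorella genomes lead once the counts are expressed relative to the number of annotated genes.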
The results indicated that symbionts present in both trypanosomatid species are closely related to the Alcaligenaceae family (Figure S1). Pseudomonas aeruginosa PA7 was the Gammaproteobacteria used as outgroup. These data corroborate the results from Alves et al. 2011 [11]. Although the genome lengths of both trypanosomatid bacteria are slightly larger than those of Buchnera sp. [31], they are several-fold larger than those of symbiotic bacteria that have extremely reduced genomes [32]. Analysis of the B. pertussis and B. parapertussis genomes revealed a process of gene loss during host adaptation [33,34]. This process was proposed to be associated with mobile DNA elements such as Insertion Sequences (IS) and the presence of pseudogenes [33,34]. However, the mechanism(s) involved in the length reduction observed for the genomes of the two symbionts studied here need further investigation. Our data enable future studies examining the relationship between endosymbiosis in trypanosomatids and the origin of organelles in eukaryotic cells.

Host Trypanosomatid Characteristics

The microtubule cytoskeleton and flagellum of the host trypanosomatids. The cytoskeleton is composed of structures such as the microtubular subpellicular corset, the axoneme, the basal body, and the paraflagellar rod [35]. Thus, the cytoskeleton controls several characteristics of trypanosomatids such as their shape, the positions of structures, the flagellar beating and the host colonization. The presence of the symbiont has been related to unique characteristics of the host trypanosomatid. Six members of the tubulin superfamily (α, β, δ, γ, ε and ζ) are present in A. deanei and S. culicis. Accordingly, δ- and ε-tubulins are present in organisms that possess basal bodies and flagella [36]. γ-tubulin is localized in the basal body of A. deanei [14] as in other trypanosomatids [35]. Additionally, in common with other trypanosomatids, five centrins were identified in A.
deanei and S. culicis. Furthermore, symbiont-containing trypanosomatids contain ε-tubulin, as in algae genomes, which can be related to the replication and inheritance of the centriole and basal bodies [37,38]. Interestingly, the absence of microtubules that form the subpellicular corset in areas where the mitochondrion touches the plasma membrane is unique to symbiont-containing trypanosomatids [15]. However, we cannot explain this atypical microtubule distribution based on database searches. Moreover, no classical eukaryotic microtubule associated proteins (MAPs) or intermediate filament homologues were identified in symbiont-bearing or other trypanosomatids, except for TOG/MOR1 and Asp.

Actin and other protein homologues that play roles in the binding and nucleation of actin filaments are present in A. deanei and S. culicis. However, the ARP 2/3 complex, which is involved in the nucleation of actin, is absent in symbiont-bearing species. As actin seems to be necessary for endocytosis in trypanosomatids [39], the absence of some proteins involved in actin nucleation may be related to the low rates of endocytosis of these protozoa (unpublished data). Indeed, both symbiont-bearing trypanosomatids have low nutritional requirements, as the symbiotic bacterium completes essential metabolic routes of the host cell [3].

Trypanosomatids are the only organisms from the orders Euglenida and Kinetoplastida that have a paraflagellar rod. This structure is continuously associated with the axoneme and contains two major proteins designated PFR1 and PFR2 [35]. Importantly, only PFR1 was identified in A. deanei and S. culicis. Perhaps we missed PFR2 since these PFR proteins are highly repetitive and their assemblies are difficult. Nevertheless, these species have a reduced paraflagellar rod located at the proximal area of the flagellum [15,16], although the same pattern of flagellar beating described for other trypanosomatids is observed for A. deanei [40].
The paraflagellar rod components (PFC) 4, PFC 10, PFC 16, and PFC 18 were detected in the A. deanei database, whereas in S. culicis PFC 11 was also identified. Other minor components of the paraflagellar rod could not be detected. Accordingly, RNA interference (RNAi) knockdown of PFCs such as PFC3 does not impair the flagellar movement of T. brucei [41], differently from PFC4 and PFC6 depletion [42]. Several other minor flagellar proteins detected in these and other trypanosomatids are absent in A. deanei and S. culicis, especially the flagellar membrane proteins and those involved in intraflagellar transport (kinesins). Symbiont-containing species had adenylate kinase B (ADKB) but not ADKA, in contrast to other trypanosomatids, which express both. These proteins are involved in the maintenance of ATP supply to the distal portion of the flagellum [43,44]. Taken together, the differences in the composition and function of the cytoskeleton in symbiont-containing trypanosomatids seem to represent adaptations to incorporate the endosymbiont. Further exploration of these differences could enable a better understanding of how endosymbiosis was established.

The kinetoplast. The kinetoplast is an enlarged portion of the single mitochondrion that contains the mitochondrial DNA, which exhibits an unusual arrangement of catenated circles that form a network. The kinetoplast shape and the kDNA topology vary according to species and developmental stage. Endosymbiont-containing trypanosomatids show differences in the morphology and topology of the kDNA network when compared to other species of the same family. Both species present a loose kDNA arrangement, but in A. deanei, the kinetoplast has a trapezoid-like shape with a characteristic transversal electron-dense band, whereas in S. culicis the disk shape structure is wider at the center in relation to the extremities [2,17].
Differences in kDNA arrangement are related to low molecular weight basic proteins such as kinetoplast-associated proteins (KAPs), which take part in the organization and segregation of the kDNA network [45,46]. Our data indicate that KAP4 and KAP3 homologues are present in A. deanei, while KAP4, KAP2 homologues, and ScKAP-like protein are found in S. culicis (Table S1). In addition, a conserved nine amino acid domain in the N-terminal region, most likely a mitochondrial import signal [47,48], is found in AdKAP4 and ScKAP4 (amino acid positions 10 to 16) (Figure S2). Furthermore, ScKAP2 has a conserved domain called the High Mobility Group (HMG), indicating that this protein may be involved in protein-protein interactions. These KAPs might be related to the typical kDNA condensation of symbiont-bearing trypanosomatids.

Housekeeping genes. Histones, which are responsible for structuring the chromatin, are highly conserved proteins that appeared in the eukaryotic branch of evolution. Although well conserved, Trypanosomatidae histones display differences in the N- and C-terminal sequences, which are sites of post-translational modifications, when compared to other eukaryotes. Phylogenetic analysis revealed that histones and their variants in both A. deanei and S. culicis are clustered in a separate branch, between the Trypanosoma and Leishmania species (Figure 3A). A similar phylogenetic distribution is seen for the dihydrofolate reductase-thymidylate synthase when we performed the analysis using nucleotide sequences (Figure 3B). Nevertheless, the symbiont-bearing species show conservation in the sites of post-translational modification when compared to other trypanosomes, as shown in supplementary Figure S3. In A. deanei and S. culicis the proteins related to chromatin assembly are also maintained, including histones and histone-modifying enzymes, as shown in Tables S2–S7 and Figure S4 of the supporting information. For a more detailed analysis about housekeeping genes of A. deanei and S.
culicis see Text S1.

In A. deanei and S. culicis, functions in DNA replication, repair, transcription, translation and signal transduction can be attributed to at least 914 ORFs and 643 ORFs, respectively (Table 3). Most of the genes are exclusive to the protozoan and are absent in the endosymbiont (Table 4), thus indicating that these processes are exclusive to the host organism as shown in the supplementary Tables S8–S13, and typically contain a conserved spliced-leader RNA as found in other trypanosomes (see Figure S5 for more information). A total of 133 and 130 proteins with similar functions are detectable in the endosymbionts of both species, with up to 95% amino acid identity to proteins of Bordetella sp. and A. xylosoxidans. Similar DNA repair proteins are present in both eukaryote and prokaryote predicted sequences. These findings demonstrate that the endosymbionts conserved essential housekeeping proteins despite their genome reduction.

Some differences were found in mismatch repair (MMR) between symbiont-bearing trypanosomatid genomes. As microsatellite instability is considered the molecular fingerprint of the MMR system, we compared the abundance of tandem repeats in the genomes of A. deanei and S. culicis and their respective endosymbionts. We noticed that the genomes of S. culicis and its endosymbiont are more repetitive than the genomes of A. deanei and its endosymbiont (Figure 4A). However, the higher repetitive content of the genomes of S. culicis and its endosymbiont is due not only to the higher number of microsatellite loci (Figure 4B) but also to the expansion of the size of the microsatellite sequences. These data suggest that microsatellites of S. culicis and its endosymbiont evolved faster than those of A. deanei and its endosymbiont. Interestingly, we identified some missing components of the MMR machinery in S. culicis that are present in A.
deanei, such as exonuclease I (Exo I), a 5′-3′ exonuclease that is implicated in the excision step of the DNA mismatch repair pathway (Table S9). Several studies have correlated the silencing of the ExoI protein and/or mutations of the ExoI gene and microsatellite instability with development of lymphomas and colorectal cancer [49,50,51]. Therefore, we speculate that deficiencies in the MMR machinery in S. culicis may be related to the high proportion of microsatellites in its genome. The association between microsatellite instability and MMR deficiency has already been described for T. cruzi strains [52,53]. The same variability pattern is observed for each symbiont, despite the fact that the MMR machinery seems to be complete in both symbiotic bacteria (Table S10). It is tempting to speculate that this finding may indicate that the parasite and its endosymbiont are exposed to the same environment and therefore may be subjected to similar selective pressures imposed by an external oxidative condition.

A. deanei and S. culicis have 607 and 421 putative kinase-encoding genes, respectively (Table 5). Thirty-one of the A. deanei kinases were classified in the AGC family, 31 as atypical, 49 as CAMK, 15 as CK1, 108 as CMGC, 64 as STE, 1 as TKL, 81 as others, and 227 that could not be classified in any of these families. No typical tyrosine kinases (TK) are present in A. deanei or S. culicis, as in other trypanosomes, although tyrosine residues are subjected to phosphorylation [54,55].

Several phosphatases have also been described in trypanosomes, pointing toward their regulatory role in the development of these organisms. The T. brucei PTP (TbPTP1) is associated with the cytoskeleton and has been reported to be intrinsically involved in this parasite’s cycle [56]. Similar sequences are found in the A. deanei genome, including PTP1, which is not found in the S. culicis database.
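As a rough illustration of the microsatellite measures used in the mismatch-repair comparison above, the following Python sketch computes the two quantities defined in the Figure 4 legend: the percentage of nucleotides lying in microsatellites for each repeat length, and the microsatellite density (number of loci divided by the genome length, times 100). The detection rule (perfect tandem repeats of at least three copies, found with a regular expression), the function names and the toy sequence are assumptions made for this example only; the criteria actually used in the study are those given in its Materials and Methods.

```python
import re

def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit (e.g. 'AT', not 'AA')."""
    return (motif + motif).find(motif, 1) == len(motif)

def find_loci(seq, motif_len, min_repeats=3):
    """Return (start, end) spans of perfect tandem repeats for one motif length.

    Assumption: a locus is min_repeats or more exact copies of a primitive motif.
    """
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_repeats - 1))
    return [m.span() for m in pattern.finditer(seq.upper())
            if is_primitive(m.group(1))]

def microsatellite_stats(seq, motif_lengths=(2, 3, 4, 5, 6), min_repeats=3):
    """Percent of repetitive nucleotides per motif length, and loci density x 100."""
    genome_length = len(seq)
    percent_by_len = {}
    n_loci = 0
    for k in motif_lengths:
        spans = find_loci(seq, k, min_repeats)
        n_loci += len(spans)
        repeat_nt = sum(end - start for start, end in spans)
        percent_by_len[k] = 100.0 * repeat_nt / genome_length
    density = 100.0 * n_loci / genome_length
    return percent_by_len, density

# Toy sequence (hypothetical): one (AT)5 locus and one (CAG)4 locus in 31 nt
toy = "GCG" + "AT" * 5 + "CCGT" + "CAG" * 4 + "TT"
percent_by_len, density = microsatellite_stats(toy)
print(percent_by_len, density)
```

In this sketch loci are counted independently for each motif length, so a genome with both more loci and longer repeat tracts, as described for S. culicis, would score higher on both measures.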
Additionally, a large number of other PTPs appear in both genomes, including ectophosphatases (Table S14).

Figure 3. Phylogeny of histones of A. deanei, S. culicis, and other trypanosomatids. Histone protein (panel A) and nucleotide (panel B) sequences were aligned with the MUSCLE tool using 10 iterations in the Geneious package [120]. Trees were constructed using the Geneious Tree Builder, employing the Jukes-Cantor genetic distance model with the neighbor-joining method and no out-groups. The consensus trees were generated from 100 bootstrap replicates of all detected histone genes, as shown below. Scale bars are indicated for each consensus tree. The trees in panel A are based on a collection of sequences of all trypanosomatids. The nucleotide sequences used for dihydrofolate reductase-thymidylate synthase are: T. cruzi, XM_810234; T. brucei, XM_841078; T. vivax, HE573023; L. mexicana, FR799559; L. major, XM_001680805; L. infantum, XM_001680805; and C. fasciculata, M22852. doi:10.1371/journal.pone.0060209.g003

A. deanei sequences encode enzymes involved in RNAi, a mechanism described in various organisms that promotes the specific degradation of mRNA. RNAi is initiated by the recognition of double-stranded RNA through the action of endoribonucleases known as Dicer and Slicer, members of the Argonaute (Ago) protein family (RNase H-type) [58]. The cleavage of double-stranded RNA results in a complex that specifically cleaves mRNA molecules that are homologous to the double-stranded sequence. A. deanei contains the gene coding Dicer-like protein II (AGDE14022) and Ago1 (AGDE11548), homologous to enzymes in T. brucei and Leishmania braziliensis (Ngo et al., 1998; Lye et al., 2010). In addition, A.
deanei contains the RNA interference factor (RIF) 4 (AGDE09645) with an exonuclease domain of the DnaQ superfamily, as described in T. brucei. A fragmented RIF5 sequence was also found in the sequence AGDE15656. These proteins were shown to interact with Ago1 as was recently demonstrated in T. brucei [59], suggesting that RNAi might be active in A. deanei. None of these sequences were found in the S. culicis database.

Two major signal transduction pathways are described in trypanosomatids: one is the cyclic AMP-dependent route and the other is the mitogen-activated protein kinase pathway [57]. The major components of these pathways, including phosphatidylinositol signaling, mTOR and MAPK signaling pathways, are identified in A. deanei and S. culicis. These pathways may regulate cellular activities such as gene expression, mitosis, differentiation, and cell survival/apoptosis (Table 6).

Most genes encoding heat shock proteins are present in symbiont-bearing species, as was previously described in other trypanosomatids (Table S15). Genes for redox molecules and antioxidant enzymes, which are part of the oxidative stress response, are also present in the A. deanei and S. culicis genomes. Both contain slightly more copies of ascorbate peroxidase, methionine sulfoxide reductase, glucose-6-phosphate dehydrogenase, and trypanothione reductase genes than L. major. In particular, several genes related to the oxidative stress response are present in higher copy numbers in symbiont-bearing trypanosomatids than in L. major (Figure 5).

Table 3. Numbers of ORFs identified in A. deanei and S. culicis and their symbionts, according to the mechanisms of DNA replication and repair, signal transduction, transcription and translation.
Mechanism | A. deanei | S. culicis | A. deanei symbiont | S. culicis symbiont
Replication and Repair | 178 | 148 | 56 | 54
Base excision repair | 34 | 34 | 9 | 9
DNA replication | 54 | 32 | 11 | 11
Homologous recombination | 11 | 11 | 16 | 15
Mismatch repair | 28 | 29 | 12 | 12
Non-homologous end-joining | 8 | 7 | – | –
Nucleotide excision repair | 43 | 35 | 8 | 7
Signal Transduction | 136 | 46 | 1 | 1
Phosphatidylinositol signaling system | 23 | 17 | – | –
mTOR signaling pathway | 113 | 29 | – | –
Two component system | – | – | 1 | 1
Transcription | 96 | 61 | 3 | 3
Basal transcription factors | 15 | 4 | – | –
RNA polymerase | 28 | 16 | 3 | 3
Spliceosome | 53 | 41 | – | –
Translation | 504 | 388 | 73 | 72
Aminoacyl-tRNA biosynthesis | 63 | 56 | 25 | 25
mRNA surveillance pathway | 43 | 45 | – | –
Ribosome proteins | 231 | 152 | 48 | 47
Ribosome biogenesis in eukaryotes | 84 | 66 | – | –
RNA transport | 83 | 69 | – | –
TOTAL | 914 | 643 | 133 | 130
doi:10.1371/journal.pone.0060209.t003

Table 4. Summary of the origin of ORFs found in A. deanei and S. culicis.
A. deanei
Functional Classification | Prokaryotes* | Eukaryotes** | Symbiont P/E***
Replication and Repair
Base excision repair | 5 | 11 | 4/0
Nucleotide excision repair | 2 | 16 | 9/0
Non-homologous end-joining | 1 | 5 | N
Mismatch repair | 2 | 13 | 8/0
Homologous recombination | 2 | 9 | 10/0
DNA replication | 3 | 22 | 10/0
Signal Transduction
Two-component system | N | N | 1
Phosphatidylinositol signaling system | 0 | 16 | N
mTOR signaling pathway | 0 | 8 | N
MAPK signaling pathway - yeast | 0 | 1 | N
Transcription
Spliceosome | 0 | 20 | N
RNA polymerase | 0 | 16 | 3/0
Basal transcription factors | 0 | 5 | N
Translation
RNA transport | 0 | 31 | N
Ribosome biogenesis in eukaryotes | 0 | 27 | N
Ribosome | 0 | 75 | 48/0
mRNA surveillance pathway | 0 | 17 | N
Aminoacyl-tRNA biosynthesis | 0 | 22 | 23
S. culicis
Functional Classification | Prokaryotes* | Eukaryotes** | Symbiont P/E***
Replication and Repair
Base excision repair | 2 | 6 | 5/0
Nucleotide excision repair | 2 | 10 | 7/0
Non-homologous end-joining | 1 | 1 | N
Mismatch repair | 1 | 5 | 8/0
Homologous recombination | 1 | 4 | 11/0
DNA replication | 2 | 15 | 9/0
Signal Transduction
Two-component system | N | N | 1
Phosphatidylinositol signaling system | 0 | 11 | N
mTOR signaling pathway | 0 | 8 | N
MAPK signaling pathway - yeast | 0 | 0 | N
Transcription
Spliceosome | 0 | 13 |
RNA polymerase | 0 | 11 |
Basal transcription factors | 0 | 2 | 3/0
Translation
RNA transport | 0 | 19 | N
Ribosome biogenesis in eukaryotes | 0 | 20 | N
Ribosome | 0 | 53 | 46/0
mRNA surveillance pathway | 0 | 16 | N
Aminoacyl-tRNA biosynthesis | 0 | 18 | 23
*Number of genes with identity to Prokaryotes. **Number of genes with identity to Eukaryotes. ***Ratio of the number of genes with identity to Prokaryotes/Eukaryotes.
doi:10.1371/journal.pone.0060209.t004

Table 5. Kinase families identified in trypanosomatids.
Kinase family | A. deanei | S. culicis
AGC | 31 | 23
Atypical | 31 | 21
CAMK | 49 | 39
CK1 | 15 | 8
CMGC | 108 | 77
STE | 64 | 31
TKL | 1 | 0
Other | 81 | 58
No hits found | 227 | 164
TOTAL | 607 | 421
doi:10.1371/journal.pone.0060209.t005

Table 6. Representative ORFs involved in the signal transduction pathways in A. deanei and S. culicis.
Product | A. deanei | S. culicis
Calmodulin | AGDE02036 | STCU01612
Diacylglycerol kinase | AGDE02361 | STCU00226
CDP-diacylglycerol-inositol-3-phosphatidyltransferase | AGDE04835 | STCU01286
Myo-inositol-1(or 4) monophosphatase | AGDE08470 | STCU02993
Phospholipase C | AGDE12052 | STCU02439
Phosphatidylinositol 4-phosphate 5-kinase alpha | AGDE09669 | STCU03909
Inositol-1,4,5-trisphosphate (IP3) 5-phosphatase | AGDE06690 | nd
Phosphatidate cytidylyltransferase | AGDE09922 | nd
Mitogen-activated protein kinase 5 | AGDE00259 | STCU00603
Protein kinase A | AGDE06073 | STCU01525
TP53 regulating kinase | AGDE08400 | nd
Serine/threonine-protein kinase CTR1 | AGDE00613 | nd
Casein kinase | AGDE11868 | STCU01611
Phosphoinositide-specific phospholipase C | nd | STCU09903
nd: not determined.
doi:10.1371/journal.pone.0060209.t006

Figure 4. Microsatellite content in the genomes of A. deanei, S. culicis, and their endosymbionts. Panel (A) shows the percentage of repetitive nucleotides for each repeat length. The total numbers of nucleotides are derived from microsatellite sequences divided by the total number of assembled nucleotides. Panel (B) shows the microsatellite density. The values indicate the number of microsatellite loci divided by the genome length × 100. doi:10.1371/journal.pone.0060209.g004

Figure 5. Oxidative stress-related genes in the genomes of A. deanei, S. culicis and L. major. The figure shows the number of ORFs for the indicated enzymes for each species. doi:10.1371/journal.pone.0060209.g005

The Coordinated Division of the Bacterium during the Host Protozoan Cell Cycle

Cell cycle control in host trypanosomes. In eukaryotes, DNA replication is coordinated with cell division by a cyclin-CDK complex that triggers DNA duplication during the S phase of the cell cycle. Multiple copies of the CRK gene (cdc2-related protein kinase) are found in A. deanei and four genes coding for two different CRKs are present in S. culicis. Both proteins exhibit structural features of the kinase subunits that make up the CDK complex, as they contain the cyclin-binding PSTAIRE motif, an ATP-binding domain and a catalytic domain. These motifs and domains are not the same in different CRKs (Figure S6), strongly suggesting that these CRKs might control different stages of the cell cycle. A. deanei contains four genes coding for cyclins. Three of these genes are homologues to mitotic cyclin from S. cerevisiae and T. brucei. However, none of them contain the typical destruction domain present in T. brucei mitotic cyclin [60]. The fourth codes for a S. cerevisiae Clb5 homolog, an S-phase cyclin. These data indicate that more than one CRK and more than one cyclin would be involved in the cell cycle control of symbiont-containing trypanosomatids, suggesting that tight regulation must occur to guarantee the precise maintenance of only one symbiont per cell [14].

Cell cycle control in the endosymbionts. Bacterial cell division is a highly regulated event that mainly depends on two structures, the peptidoglycan layer and the Z ring. The first step in the segregation of the bacterium is the formation of a polymerized Z ring at the middle of the cell. This structure acts as a platform for the recruitment of other essential proteins named Filament Temperature Sensitive (Fts), which are mainly involved in the formation and stabilization of the Z ring [61,62] and in establishing the peptidoglycan septum formation site in most bacteria [63] (Figure 6A). Two fts sequences were identified in A. deanei and S. culicis symbionts based on Bordetella genes (Table 7). One of them is FtsZ, which requires integral membrane proteins such as ZipA and FtsA for anchoring. However, these sequences are absent in the symbionts. FtsZ should also interact with FtsE, which is absent in both symbionts. This protein is homologous to the ATP-binding cassette of ABC transporters and co-localizes with the division septum [64]. The lack of these proteins could be related to the absence of a classical Z ring in these symbionts. The other sequence is FtsK that docks FtsQ, FtsB and FtsL, which are related to the formation of the peptidoglycan layer in E. coli and B. subtilis [65,66,67], but these proteins are absent in symbionts, as in most bacteria that exhibit reduced peptidoglycan production [64]. RodA, a homologous integral membrane protein involved in bacterial cell growth, is detected in the endosymbionts. RodA could replace FtsW, which is absent in both symbionts. FtsW is essential for the localization of FtsI (PBP3) in the Z ring [68], which is absent in the symbiotic bacteria. Endosymbionts have only one bifunctional synthase (PBP1A), while E. coli has PBP1A, PBP1B, and PBP1C. Cells require at least one of these synthases for viability. The peptidoglycan layer is functional in trypanosomatid symbionts, as shown by treatment with β-lactam antibiotics affecting the division of the bacterium, generating filamentous structures and culminating in cell lysis. PBP1 and PBP2 have also been detected at the symbiont envelope [6]. PBP1B interacts with the two essential division proteins, FtsN and PBP3/FtsI, which are absent in the symbiont. PBP1B can also interact with PBP2 that is identified in both symbiont databases (see Table 7). A sequence encoding a minor PBP described in E. coli was also identified in the symbionts. This protein is known as a putative PBP precursor (PBP5/dacC). This PBP is involved in the regulation of the peptidoglycan structure, along with 3 other minor PBPs described in E. coli, but these are absent from the symbiont (Table 7). On the other hand, all the enzymes involved in the synthesis of activated nucleotide precursors for the assembly of the peptidoglycan layer are present in the symbiont genome, except for Braun’s lipoprotein (Lpp), which forms the lipid-anchored disaccharide-pentapeptide monomer subunit [69]. In E. coli strains, mutations in Lpp genes result in a significant reduction of the permeability barrier, although small effects on the maintenance of the cell growth and metabolism were observed in these cells [70,71].

Taken together, we consider that gene loss in the dcw cluster [72] (represented in Figure 6) explains the lack of the FtsZ ring in the endosymbiont during its division process [73]. Moreover, the symbiont envelope contains a reduced peptidoglycan layer and lacks a septum during its division process, which can be related to the facilitation of metabolic exchanges, as well as to the control of division by the host protozoan [6]. These losses could be understood since the host trypanosomatid is controlling the number of symbiotic bacteria per cell. This phenomenon has been described for obligatory intracellular bacteria that co-evolve in eukaryotic cells, as well as for the organelles of prokaryotic origin, the chloroplast and the mitochondrion [74,75].

Figure 6. Schematic representation of the cell division machinery found in the endosymbionts. Panel (A) indicates the basic model derived from a gram-negative bacterium with the localization of each component (shown on the right). Panel (B) represents the components found in the endosymbiont of A. deanei, and Panel (C) shows the steps in the assembly of the Z-ring. The missing components of the A. deanei endosymbiont are drawn in red. doi:10.1371/journal.pone.0060209.g006

Table 7. Members of the Fts family and PBPs that are present in endosymbionts of A. deanei and S. culicis.
Function | Protein | A. deanei | S. culicis
Stabilization and attachment of FtsZ polymers to the inner membrane | FtsA | nd | nd
 | FtsE | nd | nd
 | ZipA | nd | nd
 | FtsK | CKCE00084 | CKBE00632
 | FtsQ | nd | nd
 | FtsB | nd | nd
 | FtsL | nd | nd
 | FtsN | nd | nd
Lipid II flippase | FtsW(RodA) | CKCE00486 | CKBE00079
Forms a dynamic cytoplasmic ring structure at midcell | FtsZ | CKCE00034 | CKBE00683
Penicillin binding proteins (PBPs) | PBP1A | CKCE00524 | CKBE00119
 | PBP2 | CKCE00487 | CKBE00080
Interaction with peptidoglycan synthases PBPs | FtsI/PBP3 | CKCE00487 | CKBE00080
 | PBP4 | nd | nd
 | PBP5/dacC | CKCE00510 | CKBE00105
 | PBP6 | nd | nd
 | PBP6B | nd | nd
 | PBP7 | nd | nd
nd: not determined.
doi:10.1371/journal.pone.0060209.t007

Metabolic Co-evolution of the Bacterium and the Host Trypanosomatid

Symbiosis in trypanosomatids is characterized as a mutual association where both partners benefit.
These symbiont-bearing protozoa have low nutritional requirements, as intense metabolic exchanges occur. Our data corroborate previous biochemical and ultrastructural analyses showing that the bacterium has enzymes and metabolic precursors that complete important biosynthetic pathways of the host [76]. Oxidative phosphorylation. FoF1-ATP synthase and the entire mitochondrial electron transport chain are present in A. deanei and S. culicis, although some subunits are missing (Table 8). These species have a rotenone-insensitive NADH:ubiquinone oxidoredutase in complex I, as do other trypanosomatids [77]. Ten complex II (succinate:ubiquinone reductase) subunits of the twelve identified in T. cruzi [78] are also present in both trypanosomatids. Many subunits from complex III, composed of cytochrome c reductase, are found in A. deanei and S. culicis. In addition, these protozoa contain genes for cytochrome c, as previously suggested by biochemical studies in other symbiontcontaining trypanosomatids [3,79]. Both symbionts contain sequences with hits for all subunits of complex I, NADH:ubiquinone oxidoredutase, similar to E. coli (Table 8). Complexes II and III, including cytochrome c, and complex IV (cytochrome c oxidase, succinate:ubiquinone reductase and cytochrome c reductase, respectively) are not found in 12 April 2013 | Volume 8 | Issue 4 | e60209 Predicting Proteins of A. deanei and S. culicis Table 8. Respiratory chain complexes identified in the predicted proteome of A. deanei, S. culicis and their respective endosymbionts. Complex I A. deanei A. deanei endosymbiont S. culicis S. culicis endosymbiont 33 0 33 0 Complex II 10 0 10 0 Complex III 5 0 4 0 Complex IV 10 2* 2 2* Complex V 10 8 3 8 *The complex IV of the endosymbionts might be a cytochrome d ubiquinol oxidase identified in both organisms, instead a classical cytochrome c oxidase. 
doi:10.1371/journal.pone.0060209.t008 intensive metabolic exchanges, reducing the nutritional requirements of these trypanosomatids when compared to species without the symbiotic bacterium, or to aposymbiotic strains. Several biochemical studies have been carried out analyzing the biosynthetic pathways involved in this intricate relationship as recently reviewed [76], and our genomic data corroborate these findings. A schematic description of the potential metabolic interactions concerning the metabolism of amino acids, vitamins, cofactors, and hemin is provided in Figure 7. Both symbiotic bacteria have genes potentially encoding for all necessary enzymes for lysine, phenylalanine, tryptophan and tyrosine synthesis, in agreement with previous experimental data [40]. Tyrosine is required in the growth medium of A. deanei [81], but it is not essential for S. oncolpelti or S. culicis [41,82,83]. Here, in the symbiotic bacteria, we found enzymes involved in tyrosine synthesis, as well as indications that phenylalanine and tyrosine can be interconverted. In fact, protozoan growth is very slow in absence of phenylalanine and tryptophan [81], which may either symbiont. However, we detected the presence of cytochrome d as found in Allochromatium vinosum, and also a cytochrome d oxidase with a sequence close to that of B. parapertussis. All portions of the FoF1-ATP synthase were identified in symbionts, although not every subunit of each portion was found. Lipid metabolism. The sphingophospholipid (SPL) content in A. deanei and its symbiont has been previously described, with phosphatidylcholine (PC) representing the major SPL in the host, whereas cardiolipin predominates in the symbiotic bacterium [5,80]. The synthetic pathway of phosphatidylglycerol from glycerol phosphate is present in both host trypanosomatids (Table S16). 
The Kennedy pathways, which synthesize PC and phosphatidylethanolamine (PE) from CDP-choline and CDP-ethanolamine, respectively, are incomplete in A. deanei and S. culicis. Moreover, the methylation pathway (Greenberg pathway), which converts PE to PC, seems to be absent in both trypanosomatids, even though one enzyme sequence was identified in A. deanei. The symbiont of A. deanei exhibits two routes for PE synthesis, starting from CDP-diacylglycerol and producing phosphatidylserine as an intermediate (Table S17). Interestingly, this last step of the pathway is not found in the S. culicis endosymbiont. Importantly, both symbionts lack genes that encode proteins of PC biosynthetic pathways, reinforcing the idea that this phospholipid is mainly obtained from the host protozoa [5]. Remarkably, phosphatidylglycerophosphatase A, which produces the intermediate phosphatidylglycerol necessary for cardiolipin biosynthesis, was not found in either protozoan but is present in both symbionts. As cardiolipin is present in the inner membranes of host mitochondria, the symbionts may complete cardiolipin biosynthesis. Pathways for sphingolipid production, including the synthesis of ceramide from sphingosine-1P, are present in A. deanei, while S. culicis lacks enzymes of this pathway (Table S16). Both host trypanosomatids have glycerol kinase and 3-glycerophosphate acyltransferase, enzymes for the synthesis of 1,2-diacyl-sn-glycerol and triacylglycerol from D-glycerate. In endosymbionts, glycerolipid metabolism seems to be reduced to two enzymes, 3-glycerophosphate acyltransferase and 1-acylglycerol-3-phosphate O-acyltransferase (Table S17), suggesting metabolic complementation between partners. Furthermore, both hosts contain enzymes of the biosynthesis pathway for ergosterol production from zymosterol, as well as the pathway of sterol biosynthesis that produces lanosterol from farnesyl-PP. These pathways are only complete in A. deanei.
The symbionts do not have enzymes for sterol biosynthesis, in accordance with our previous biochemical analysis [80].

Figure 7. Main metabolic exchanges between host and endosymbionts. Schematic representation of the amino acids, vitamins, and cofactors exchanged between A. deanei and S. culicis and their respective symbionts. Dotted lines indicate pathways that have or might have contributions from both partners, whereas metabolites inside one of the circles, representing the symbiont or host, indicate that one partner holds candidate genes coding for enzymes of the whole biosynthetic pathway. *Candidate genes were only found for the symbiont of S. culicis and not for the symbiont of A. deanei. BCAA (branched-chain amino acids) are leucine, isoleucine and valine.
doi:10.1371/journal.pone.0060209.g007

Metabolism of amino acids, vitamins, cofactors and hemin. Symbiosis in trypanosomatids is characterized by

However, we cannot rule out the possibility that adenosine is transported to the intracellular medium by carriers of monophosphate nucleoside or by the presence of other enzymes that have the same function as 5'-nucleotidase. On the other hand, the lack of 5'-nucleotidase in A. deanei and S. culicis can be related to the fact that such protozoa are only insect parasites. According to this idea, several studies have shown the importance of ectonucleotidases in the establishment of infection by some trypanosomatid species [91]. The high activity of ectonucleotidases with concomitant production of adenosine, a known immune system inhibitor, leads to high susceptibility to Leishmania infection because adenosine can induce anti-inflammatory effects on the host [92,93]. Nucleoside transporters can take up nucleosides and nucleobases generated by ectonucleotidase activity.
Genes encoding nucleoside transporters are present in both trypanosomatid genomes (Table S19), enabling cells to obtain exogenous purines from the medium. Furthermore, A. deanei and S. culicis contain intracellular enzymes that can convert purines to nucleotides, such as adenine phosphoribosyltransferase, hypoxanthine-guanine phosphoribosyltransferase, adenylate kinase, AMP deaminase, inosine monophosphate dehydrogenase and GMP synthetase. These data indicate that these organisms can interconvert intracellular purines into nucleotides. In contrast, both endosymbionts lack all the genes encoding enzymes related to purine salvage. Nevertheless, the symbiotic bacteria have genes encoding all the enzymes expected to participate in the de novo synthesis of purine nucleotides as previously proposed [94,95]. One interesting possibility is that the symbiotic bacterium is able to supply the host trypanosomatid with purines. According to this idea, the endosymbiont participates in the de novo purine nucleotide pathway of A. deanei, as the aposymbiotic strain is unable to utilize glycine for the synthesis of purine nucleotides, only for pyrimidine nucleotide production [87]. Protozoa are generally, but not universally considered to be capable of synthesizing pyrimidines from glutamine and aspartic acid, which are used as precursors. Our results indicate that both symbiont-bearing trypanosomatids carry out de novo pyrimidine synthesis (Table S19). Interestingly, in silico analyses also revealed the presence of all the genes for de novo pyrimidine synthesis in both symbiont genomes, but not for the pyrimidine salvage pathway. A previous report indicated that A. deanei was able to synthesize purine and pyrimidine nucleotides from glycine (‘‘de novo’’ pathway) and purine nucleotides from adenine and guanine (‘‘salvage’’ pathway). 
Adenine would be incorporated into both adenine and guanine nucleotides, whereas guanine was only incorporated into guanine nucleotides, suggesting a metabolic block at the level of GMP reductase [87]. Deoxyribonucleotides are derived from the corresponding ribonucleotides by reactions in which the 2'-carbon atom of the D-ribose portion of the ribonucleotide is directly reduced to form the 2'-deoxy derivative. This reaction requires a pair of hydrogen atoms that are donated by NADPH via the intermediate-carrying protein thioredoxin. The disulfide thioredoxin is reduced by NADPH in a reaction catalyzed by thioredoxin reductase, providing the reducing equivalents for the ribonucleotide reductase, as observed for the endosymbionts that could provide 2'-deoxy derivatives. In folate metabolism, the formation of thymine nucleotides requires methylation of dUMP to produce dTMP, a reaction catalyzed by thymidylate kinase, which is present in A. deanei, S. culicis, and their respective endosymbionts. Figure 8 summarizes the purine and pyrimidine metabolisms in A. deanei and S. culicis considering the metabolic complementarity between the protozoan and the endosymbiont.

indicate that larger amounts of these amino acids are required for rapid cell proliferation. Our data indicate that branched-chain amino acid (BCAA) synthesis mainly occurs in the symbionts except for the last step, with the branched-chain amino acid aminotransferase found in the host protozoan. Among the pathways that involve, or might involve, contributions from both partners, two have previously been characterized in detail: the urea cycle and heme synthesis. The urea cycle is complete in both symbiont-harboring trypanosomatids. Symbiotic bacteria contribute ornithine carbamoyltransferase, which converts ornithine to citrulline, and ornithine acetyltransferase, which transforms acetylornithine into ornithine.
Conversely, aposymbiotic strains and symbiont-free Crithidia species need exogenous arginine or citrulline for cell proliferation [8] [68]. Our genomic data corroborate these studies. Contrary to symbiont-free trypanosomatids, A. deanei and S. culicis do not require any source of heme for growth because the bacterium contains the required enzymes to produce heme precursors that complete the heme synthesis pathway in the host cell [7,9,10,11,84]. Our results support the idea that heme biosynthesis is mainly accomplished by the endosymbiont, with the last three steps of this pathway performed by the host trypanosomatid, and in most cases also by the bacterium as described in [11]. Furthermore, this metabolic route may represent the result of extensive gene loss and multiple lateral gene transfer events in trypanosomatids [11]. According to our genomic analyses, the symbiotic bacteria also perform the synthesis of histidine, folate, riboflavin, and coenzyme A, but one step is missing in the middle of each pathway, making them candidates for metabolic interchange with the host. In the case of folate and coenzyme A biosynthesis, one candidate gene was found in the host trypanosomatid. Moreover, none of these four metabolites are required in the growth medium of A. deanei and S. culicis [85], suggesting that these pathways are fully functional. Candidate genes for the ubiquinone biosynthetic pathway were found in S. culicis but none for A. deanei endosymbionts. For the route with chorismate as precursor, only the first out of nine steps is missing in the S. culicis endosymbiont; moreover a candidate gene for that step is found in S. culicis genome. Only a few steps of these pathways are absent in A. deanei and S. culicis host organisms. In L. major, the ubiquinone ring synthesis has been described as having either acetate (via chorismate as in prokaryotes) or aromatic amino acids (as in mammalian cells) as precursors [45]. 
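The complementation reasoning used for the chorismate route to ubiquinone (eight of nine steps with candidates in the S. culicis endosymbiont, and a candidate for the remaining first step in the S. culicis genome) can be written as simple set operations. The following is only a minimal sketch of that logic, not the annotation procedure used in this work: the labels step_1 to step_9 are hypothetical placeholders, because the text does not name the individual enzymes, and the host set lists only the one step explicitly mentioned.

```python
def missing_steps(pathway_steps, *partner_candidates):
    """Return the pathway steps with no candidate gene in any partner."""
    covered = set().union(*partner_candidates)
    return [step for step in pathway_steps if step not in covered]


# Hypothetical step labels for the nine-step route from chorismate.
chorismate_route = [f"step_{i}" for i in range(1, 10)]

# Assumption from the text: the endosymbiont lacks only the first step.
symbiont_candidates = set(chorismate_route[1:])
# Assumption from the text: the host genome holds a candidate for that step.
host_candidates = {"step_1"}

symbiont_only_gaps = missing_steps(chorismate_route, symbiont_candidates)
combined_gaps = missing_steps(chorismate_route, symbiont_candidates, host_candidates)

print("Missing in symbiont alone:", symbiont_only_gaps)
print("Missing with host contribution:", combined_gaps)
```

Under these assumptions the symbiont alone leaves one gap, and the combined gene set leaves none, which is the pattern the text interprets as a candidate for metabolic interchange.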
Methionine is considered essential for the growth of A. deanei, S. culicis and S. oncopelti [41,81,82]. One of the four enzymes involved in the synthesis of methionine from either pyruvate or serine via cysteine could not be identified in the genomes of A. deanei and S. culicis. No candidate to complement this pathway was found in the symbiotic bacteria.

Purine and pyrimidine metabolism for nucleotide production. Trypanosomatids are not able to synthesize the purine ring de novo [86,87,88]. We observed that endosymbiont-bearing trypanosomatids contain sequences encoding ectonucleotidases from the E-NTPDase family and the adenosine deaminase family (Table S18), which are required for the hydrolysis and deamination of extracellular nucleotides [89,90]. Interestingly, sequences encoding 5'-nucleotidases are not found in either symbiont-bearing trypanosomatid. The absence of this enzyme can be related to the presence of the endosymbiont, which can supply adenosine to the host cell, as we found all genes involved in the de novo pathway in the symbionts, indicating that they are able to complement the purine requirements of the host (Figure 8).

Figure 8. Purine production, acquisition, and utilization in A. deanei and S. culicis. The figure illustrates the production, acquisition and utilization of purines in the host trypanosomes considering the presence of endosymbiont enzymes. This model suggests that the trypanosomatid acquires purines from the symbiont, which synthesizes them de novo. Some ecto-localized proteins, such as apyrase (APY) and adenosine deaminase (ADA), could be responsible for the generation of extracellular nucleosides, nucleobases, and purines.
Nucleobases and purines could be acquired by the parasite through membrane transporters (T) or diffusion and could be incorporated into DNA, RNA, and kDNA molecules after "purine salvage pathway" processing. Abbreviations: NTP (nucleoside triphosphate), NDP (nucleoside diphosphate), NMP (nucleoside monophosphate), N (nucleobase), ADO (adenosine), INO (inosine).
doi:10.1371/journal.pone.0060209.g008

In this way, both symbiont-containing protozoa express a unique complement of nutritionally indispensable salvage and interconversion enzymes that enable the acquisition of purines from the medium. The intracellular purines can be acquired from the medium by the action of ectonucleotidases and nucleoside transporters.

residues is the dolichyl-diphosphooligosaccharide-protein glycosyltransferase (DDOST), an oligosaccharyltransferase (OST) that is not classified in any of the above-mentioned families. The A. deanei and S. culicis DDOSTs contain the STT3 domain, a subunit required to establish the activity of the oligosaccharyl transferase (OTase) complex of proteins, and they are orthologous to the human DDOST. These OTase complexes are responsible for transferring lipid-linked oligosaccharides to the asparagine side chain of the acceptor polypeptides in the endoplasmic reticulum [101], suggesting a conserved N-glycosylation among the trypanosomatids. Five different GalfT sequences are also present in the endosymbiont-bearing trypanosomatids, and all of them contain the proposed catalytic site, indicating genetic redundancy. Redundancy of GalfTs is commonly observed in many different trypanosomatid species, as different transferases are used for each linkage type [102]. As β-galactofuranose (β-Galf) has been shown to participate in trypanosome-host interactions [103], the presence of these transferases in A. deanei and S. culicis might also indicate a role in the interaction with the insect host.
However, no enzymes involved in the synthesis of β-Galf-containing glycoconjugates are detected in our A. deanei dataset, despite reports of enzymes involved in β-Galf synthesis in Crithidia spp. [104,105,106].

Surface proteins and protease gene families. One remarkable characteristic of trypanosomatid genomes is the large expansion of gene families encoding surface proteins [107]. Experimental data indicated that these genes encode surface proteins involved in interactions with the hosts. We selected eight gene families encoding surface proteins present in T. cruzi, T. brucei and Leishmania spp. to search for homologous sequences in the genomes of the two symbiont-bearing trypanosomatids. Because the draft assemblies of these genomes are still fragmented, we also used a read-based analysis to search for sequences with homology to these multigene families. It is well known that misassemblies frequently occur for tandemly repeated genes, as most repetitive copies collapse into only one or two copies. A total of 3,624,411 reads (corresponding to 1,595 Mb of sequences) from the A. deanei genome and 2,666,239 reads (corresponding to 924 Mb) from the S. culicis genome were used in this comparison. In A. deanei and S. culicis, we identified gene families encoding amastins, gp63, and

Factors Involved in Protozoa-host Interactions

Monoxenic trypanosomatids only parasitize invertebrates, especially insects belonging to the orders Diptera and Hemiptera [1]. These organisms have been found in Malpighian tubules, in the hemolymph and hemocoel, and in the midgut, which is considered the preferential site for protozoal multiplication and colonization [1,96,97]. S. culicis, for example, is able to colonize the insect midgut, to invade the hemocoel and to reach the salivary glands [97,98]. The presence of the symbiotic bacterium has been shown to influence the interactions between trypanosomatid cells and insect cell lines, explanted guts and host insects [4,20].
This seems to occur because the endosymbiont influences the glycoprotein and polysaccharide composition of the host, the exposure of carbohydrates on the protozoan plasma membrane, and the surface charge [18,19,20,21]. Several glycosyltransferases from the two major families (GT-A and GT-B [99]) and members of family 25 (glycosyltransferases involved in lipo-oligosaccharide protein biosynthesis) are present in both A. deanei and S. culicis genomes (Table S20). Other glycosyltransferases with no characteristic domains, which are thus not classified as belonging to the GT-A or GT-B families, are also found in the A. deanei and S. culicis genomes. Importantly, 1,2-fucosyltransferase is present in A. deanei but not in the S. culicis dataset, and fucose residues were found in high amounts on glycoinositolphospholipid (GIPL) molecules of A. deanei, different from the observations for other trypanosomatids (unpublished data). Although the role of fucose is unknown, fucose and arabinose transfer to lipophosphoglycan (LPG) of Leishmania is observed when the culture medium is supplemented with this carbohydrate [100], suggesting that fucose might have a specific role in A. deanei-insect interactions. Another glycosyltransferase found in both A. deanei and S. culicis genomes and involved in the N-glycosylation of asparagine

similarity to any other known protein, its function remains unknown. In this work, we identified 31 genes with sequences belonging to all four subfamilies of amastins in the genome of A. deanei and 14 copies of amastin genes in S. culicis. Similar to Leishmania, members of all four amastin subfamilies were identified in symbiont-containing species (see Figure S7).

cysteine peptidases (Table S21). As expected, we could not identify sequences homologous to mucin-like glycoproteins typical of T.
cruzi [108], variant surface glycoprotein (VSG) characteristic of African trypanosomes, or trans-sialidases present in the genomes of all Trypanosoma species. Calpain-like cysteine peptidases constitute the largest gene family identified in the A. deanei (85 members) and S. culicis (62 members) genomes, and they are also abundant in trypanosomatids [46]. The presence of the N-terminal fatty acid acylation motif was found in some members of calpain-like cysteine peptidases, indicating that some of these peptidases are associated with membranes, as has also been shown for other members of the family [109,110]. The relatively large amount of calpain-like peptidases may be related to the presence of the endosymbiont, which would require a more complex regulation of the cell cycle and intracellular organelle distribution [14], as cytosolic calpains were found to regulate cytoskeletal remodeling, signal transduction, and cell differentiation [46]. A second large gene family in the A. deanei and S. culicis genomes encoding surface proteins with proteolytic activity is gp63. In our genomic analyses, we identified 37 and 9 genes containing sequences homologous to the gp63 of Leishmania and Trypanosoma spp. in the genomes of A. deanei and S. culicis, respectively. Proteins belonging to this group of zinc metalloproteases, also known as major surface protease (MSP) or leishmanolysin, have been characterized in various species of Leishmania and Trypanosoma [111]. Extensive studies on the role of this family in Leishmania indicate that they are involved in several aspects of host-parasite interaction including resistance to complement-mediated lysis, cell attachment, entry, and survival in macrophages [112]. Gene deletion studies in T. brucei indicated that the TbMSP of bloodstream trypanosomes acts in concert with phospholipase C to remove the variant surface protein from the membrane, required for parasite differentiation into the procyclic insect form [113]. 
Gp63-like molecules have been observed on the cell surface of symbiont-harboring trypanosomatids [114]. Importantly, the symbiont-containing A. deanei displays a 2-fold higher amount of leishmanolysin-like molecules at the surface than the aposymbiotic strain, which is unable to colonize insects [4]. As anti-gp63 antibodies decrease protozoan-insect interactions [21], our results reinforce the idea that the presence of such interactions caused the expansion of this gene family in endosymbiont-bearing organisms. In contrast, only two copies of lysosomal cathepsin-like cysteine peptidases were identified in the A. deanei (AGDE05983 and AGDE10254) and S. culicis genomes (STCU01417 and STCU06430). The two A. deanei sequences encode identical cathepsin-B-like proteins, whereas the two S. culicis genes encode proteases of the cathepsin-L-like group. This class of cysteine peptidase is represented by cruzain or cruzipain, major lysosomal proteinases of T. cruzi expressed by parasites found in insect and vertebrate hosts, and encoded by a large gene family [115,116]. In T. cruzi, these enzymes have important roles in various aspects of the host/parasite relationship and in intracellular digestion as a nutrient source [115]. Conversely, the low copy number of this class of lysosomal peptidase in symbiont-containing trypanosomatids seems to be related to their low nutritional requirements. Amastins constitute a third large gene family in the A. deanei and S. culicis genomes that encodes surface proteins. Initially described in T. cruzi [117], amastin genes have also been identified in various Leishmania species [118], in A. deanei and in another related insect parasite, Leptomonas seymouri [119]. In Leishmania, amastins constitute the largest gene family with gene expression that is regulated during the parasite life cycle.
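The copy numbers and read totals reported in this section can be tabulated to compare the two hosts directly. The sketch below is an illustration only: it uses the counts stated in the text, the variable and function names are hypothetical, and the mean read length assumes that 1 Mb corresponds to 10^6 bases, which the text does not state explicitly.

```python
# Gene copies per family as reported in the text: (A. deanei, S. culicis).
copy_numbers = {
    "calpain-like cysteine peptidases": (85, 62),
    "gp63": (37, 9),
    "amastins": (31, 14),
    "cathepsin-like cysteine peptidases": (2, 2),
}


def copy_ratio(family):
    """Ratio of A. deanei to S. culicis copies for one gene family."""
    a_deanei, s_culicis = copy_numbers[family]
    return a_deanei / s_culicis


def mean_read_length(total_bases, n_reads):
    """Average read length in bases."""
    return total_bases / n_reads


ratios = {family: copy_ratio(family) for family in copy_numbers}
most_skewed = max(ratios, key=ratios.get)

# Assumption: 1 Mb = 1,000,000 bases.
adeanei_mean = mean_read_length(1_595_000_000, 3_624_411)
sculicis_mean = mean_read_length(924_000_000, 2_666_239)

for family, ratio in ratios.items():
    print(f"{family}: {ratio:.2f}")
print(f"Mean read length: {adeanei_mean:.0f} (A. deanei), {sculicis_mean:.0f} (S. culicis)")
```

With these inputs, gp63 shows the largest difference in copy number between the two hosts, while the cathepsin-like peptidases are present in equal numbers.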
As amastin has no sequence

Conclusion

The putative proteome of symbiont-bearing trypanosomatids revealed that these microorganisms exhibit unique features when compared to other protozoa of the same family and that they are most closely related to Leishmania species. Most relevant are the differences in the genes related to cytoskeleton, paraflagellar and kinetoplast structures, along with a unique pattern of peptidase gene organization that may be related to the presence of the symbiont and to the monoxenic lifestyle. The symbiotic bacteria of A. deanei and S. culicis are phylogenetically related and share a common ancestor, most likely a β-proteobacterium of the Alcaligenaceae family. The genomic content of these symbionts is highly reduced, indicating gene loss and/or transfer to the host cell nucleus. In addition, we confirmed that both bacteria contain genes that encode enzymes that complement several metabolic routes of the host trypanosomatids, supporting the fitness of the symbiotic relationship.

Supporting Information

Figure S1 Evolutionary history of endosymbionts obtained through a phylogenomic approach. The figure indicates analysis using the Neighbor joining (NJ) (A) and Maximum parsimony (MP) (B) methods. For NJ and MP, the percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (1,500 replicates) is shown next to the branches. The scale bar represents amino acid substitutions per site. (TIF)

Figure S2 Amino acid alignment of Kinetoplast Associated Proteins. Panel (A) shows the KAP4 ClustalW alignment of A. deanei (AdKAP-4), S. culicis (ScKAP-4) and C. fasciculata (CfKAP4). Panel (B) shows the ClustalW alignment of KAP2 of S. culicis and C. fasciculata (CfKAP2-2, GenBank Q9TY84 and CfKAP2-1, GenBank Q9TY83). Black highlight is 100% similar, gray is 80 to 99% similar, light gray is 60 to 79% similar, and white is less than 59% similar.
(TIF) Figure S3 Comparison of the histone sequences of A. deanei and S. culicis with other trypanosomes. Residues indicated in red correspond to lysines that are acetylated and green, methylated in T. cruzi and T. brucei [121]. Residues indicated in blue are predicted site for phosphorylation upon DNA damage as shown in T. brucei [122]. (TIF) Figure S4 Phylogenetic tree of sirtuins from Trypanosomatids. The numbers represent bootstrap values. The proteins from each species are grouped in nuclear and mitochondrial Sir2 based on the sequences of S. cerevisiae (nuclear), and the similarity with S. coelicolor and S. enterica. (TIF) Figure S5 Phylogenetic tree of spliced leader (SL) sequences of A. deanei and S. culicis. A neighbor-joining tree (1000 bootstraps) obtained by MEGA 5.0 using the SL gene from the A. deanei and S. culicis genome sequences and sequences 16 April 2013 | Volume 8 | Issue 4 | e60209 Predicting Proteins of A. deanei and S. culicis Table S11 Identified ORFs involved in DNA transcription and RNA splicing in the genome of A. deanei and S. culicis. (DOC) retrieved from GenBank (S. culicis DQ860203.1, L. pyrrhocoris JF950600.1, H. samuelpessoai X62331.1, H. mariadeanei AY547468.1, A. deanei EU099545.1, T. rangeli AF083351 and T. cruzi AY367127). (TIF) Table S12 Transcription related proteins in the endosymbionts of A. deanei and S. culicis. (DOC) Figure S6 Comparison between the amino acid sequences of S. culicis CRK sequences. The figure shows a ClustalW alignment with the ATP binding domains boxed in yellow, PSTAIRE motifs boxed in blue, and the catalytic domain boxed in pink. Red residues indicate the observed variations in the amino acids involved in the activity. (TIF) Table S13 Main ORFs detected participating in ribosomal biogenesis and translation in A. deanei and S. culicis. (DOC) Table S14 Table S15 Number of heat shock and stress response proteins in A. deanei and S.culicis. (DOC) Tree showing the distribution of amastin subfamilies in A. 
deanei. The amastins are grouped as deltaamastin (red), gamma-amastins (yellow), alpha-amastins (dark blue) and beta-amastins (light blue). (TIF) Figure S7 Table S16 Glycerophospholipids (GPL) enzymes of A. deanei and S. culicis endosymbionts. (DOC) Table S17 protein (KAPs) in A. deanei and S. culicis. (DOC) Table S2 Histone acetyltransferases of the MYST family present in A. deanei and S. culicis compared to other trypanosomes. (DOC) Table S18 Ectonucleotidases families and identification of ORFs found in A. deanei and S. culicis. (DOC) Table S19 ORFs encoding enzymes involved in purine and pyrimidine metabolism of A. deanei, S. culicis and their symbionts. (DOC) Table S3 Distribution of Sirtuins in the protozoan and endosymbiont species. (DOC) Table S4 Histone deacetylase identified in A. deanei and S. culicis. (DOC) Table S20 Glysosyltransferases found in A. deanei and S. culicis. (DOC) Table S5 Histone methyltransferase in A. deanei and S. Table S21 culicis. (DOC) Surface proteins of A. deanei e S. culicis. (DOC) Text S1 Histone chaperones identified in A. deanei and S. culicis. (DOC) (DOC) Table S7 Glycerophospholipids (GPL) enzymes of A. deanei and S. culicis1. (DOC) Table S1 ORFs identified as Kinetoplast-associated Table S6 Identified phosphatases in A. deanei and S. culicis. (DOC) Acknowledgments Bromodomain proteins found in A. deanei and S. culicis. (DOC) We would like to dedicate this paper to professors Erney Camargo and Marta Teixeira who have made important contributions related to the study of basic aspects of the biology of trypanosomatids, especially those harboring an endosymbiont, and identified several new species of this relevant and interesting group of eukaryotic microorganism. Table S8 Components of replication mechanism of the kDNA identified in A. deanei and S. culicis and similar endosymbionts ORFs. (DOC) Author Contributions Table S9 Identified ORFs related to DNA replication and DNA repair in A. deanei and S. culicis. 
(DOC) Conceived and designed the experiments: MCMM WS SS ATRV. Analyzed the data: MCMM ACAM SSAS CMCCP RS CCK LGPA OLC LPC MB ACC BAL CRM CMAS CMP CBAM CET DCB DFG DPP ECG FFG FKM GFRL GW GHG JLRF MCE MHSG MFS MP PHS RPMN SMRT TEFM TAOM TPÜ WS SS ATRV. Contributed reagents/materials/analysis tools: ATRV LGPA OLC WS. Wrote the paper: MCMM SS ATRV. Table S10 DNA replication and repair ORFs found in the A. deanei and S. culicis endosymbionts. (DOC) References 3. Edwards C, Chance B (1982) Evidence for the presence of two terminal oxidases in the trypanosomatid Crithidia oncopelti. Journal of General Microbiology 128: 1409–1414. 4. Fampa P, Correa-da-Silva MS, Lima DC, Oliveira SM, Motta MC, et al. (2003) Interaction of insect trypanosomatids with mosquitoes, sand fly and the respective insect cell lines. International Journal for Parasitology 33: 1019– 1026. 1. Wallace FG (1966) The trypanosomatid parasites of insects and arachnids. Experimental Parasitology 18: 124–193. 2. Teixeira MM, Borghesan TC, Ferreira RC, Santos MA, Takata CS, et al. (2011) Phylogenetic validation of the genera Angomonas and Strigomonas of trypanosomatids harboring bacterial endosymbionts with the description of new species of trypanosomatids and of proteobacterial symbionts. Protist 162: 503–524. PLOS ONE | www.plosone.org 17 April 2013 | Volume 8 | Issue 4 | e60209 Predicting Proteins of A. deanei and S. culicis 31. Shigenobu S, Watanabe H, Hattori M, Sakaki Y, Ishikawa H (2000) Genome sequence of the endocellular bacterial symbiont of aphids Buchnera sp. APS. Nature 407: 81–86. 32. McCutcheon JP, Moran NA (2012) Extreme genome reduction in symbiotic bacteria. Nature Reviews Microbiology 10: 13–26. 33. Parkhill J, Sebaihia M, Preston A, Murphy LD, Thomson N, et al. (2003) Comparative analysis of the genome sequences of Bordetella pertussis, Bordetella parapertussis and Bordetella bronchiseptica. Nature Genetics 35: 32–40. 34. 
Kangussu-Marcolino et al.
BMC Microbiology 2013, 13:10
http://www.biomedcentral.com/1471-2180/13/10

RESEARCH ARTICLE Open Access

Distinct genomic organization, mRNA expression and cellular localization of members of two amastin sub-families present in Trypanosoma cruzi

Monica Mendes Kangussu-Marcolino1, Rita Márcia Cardoso de Paiva2, Patrícia Rosa Araújo2, Rondon Pessoa de Mendonça-Neto2, Laiane Lemos1, Daniella Castanheira Bartholomeu3, Renato A Mortara4, Wanderson Duarte daRocha1* and Santuza Maria Ribeiro Teixeira2*

Abstract

Background: Amastins are surface glycoproteins (approximately 180 residues long) initially described in Trypanosoma cruzi as particularly abundant during the amastigote stage of this protozoan parasite. Subsequently, they have been found to be encoded by large gene families also present in the genomes of several species of Leishmania and in other Trypanosomatids. Although most amastin genes are organized in clusters associated with tuzin genes and are up-regulated in the intracellular stage of T. cruzi and Leishmania spp, distinct genomic organizations and mRNA expression patterns have also been reported.

Results: Based on the analysis of the complete genome sequences of two T. cruzi strains, we identified a total of 14 copies of amastin genes in T. cruzi and showed that they belong to two of the four previously described amastin subfamilies. Whereas δ-amastin genes are organized in two or more clusters with alternating copies of tuzin genes, the two copies of β-amastins are linked together in a distinct chromosome. Most T. cruzi amastins have similar surface localization as determined by confocal microscopy and western blot analyses. Transcript levels for δ-amastins were found to be up-regulated in amastigotes from several T. cruzi strains, except in the G strain, which is known to have low infection capacity. In contrast, in all strains analysed, β-amastin transcripts are more abundant in epimastigotes, the stage found in the insect vector.
Conclusions: Here we show that not only are the number and diversity of T. cruzi amastin genes larger than previously predicted, but their mode of expression during the parasite life cycle is also more complex. Although most T. cruzi amastins have a similar surface localization, only δ-amastin genes have their expression up-regulated in amastigotes. The finding that a sub-group of this family is up-regulated in epimastigotes suggests that, in addition to their role in intracellular amastigotes, T. cruzi amastins may also serve important functions during the insect stage of the parasite life cycle. Most importantly, the data showing that δ-amastin expression is down-regulated in a strain with low infection capacity provide evidence for a role of amastins as virulence factors.

Keywords: Trypanosoma cruzi, Amastigote, Amastin, mRNA

* Correspondence: [email protected]; [email protected]
1 Departamento de Bioquímica e Biologia Molecular, Universidade Federal do Paraná, Rua Quinze de Novembro, 1299, Centro Curitiba, PR 80060-000, Brazil
2 Departamento de Bioquímica e Imunologia, Av. Antônio Carlos, 6627, Pampulha Belo Horizonte, MG 31270-901, Brazil
Full list of author information is available at the end of the article

© 2013 Kangussu-Marcolino et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background

Trypanosoma cruzi, the protozoan parasite that is the etiologic agent of Chagas disease [1], undergoes four developmental stages during its complex life cycle: epimastigotes and metacyclic trypomastigotes, present in the insect vector, and intracellular amastigotes and bloodstream trypomastigotes, present in the mammalian host. This parasite must rely on a broad set of genes that allow it to multiply in the insect gut, to differentiate into forms that are able to invade and multiply inside a large number of distinct mammalian cell types, and to circumvent the host immune system. To meet the challenges it faces during its life cycle, complex regulatory mechanisms must control the expression of the T. cruzi repertoire of about 12,000 genes. Among them, there are several large gene families encoding surface proteins, which are key players directly involved in host-parasite interactions (reviewed by Epting et al. [2]).

The amastin gene family was initially reported as a group of T. cruzi genes encoding 174-amino-acid transmembrane glycoproteins whose mRNAs are 60-fold more abundant in amastigotes than in epimastigotes or trypomastigotes [3]. The differential expression of amastin mRNAs during the T. cruzi life cycle has been attributed to cis-acting elements present in the 3' UTR as well as to RNA-binding proteins that may recognize this sequence [4,5]. It is also known that amastin genes alternate with genes encoding a cytoplasmic protein named tuzin [6]. After the completion of the genome sequences of several Trypanosomatids, it was revealed that the amastin gene family is also present in various Leishmania species as well as in two related insect parasites, Leptomonas seymouri and Crithidia spp. [7-9]. It has also been reported that this gene family is actually much larger in the genus Leishmania than in other Trypanosomatids.
Predicted topology based on sequences found in the genomes of L. major, L. infantum and T. cruzi indicates that all amastins have four transmembrane regions, two extracellular domains and N- and C-terminal tails facing the cytosol [8]. Moreover, comparative analyses of amastin genes belonging to six T. cruzi strains showed that sequences encoding the hydrophilic, extracellular domain, which is less conserved, have higher intragenomic variability in strains belonging to T. cruzi group II and in hybrid strains than in T. cruzi I strains [10]. Based on phylogenetic analyses of amastin orthologs from various Trypanosomatids, it has been proposed that amastins can be classified into four subfamilies, named α-, β-, γ- and δ-amastins. Importantly, in L. major and L. infantum, in which members of all four sub-families are found, amastin genes showed differences in genomic positions and expression patterns of their mRNAs [8,9].

More than fifteen years after their discovery, the function of amastins remains unknown. Because of the predicted structure and surface localization in the intracellular stage of T. cruzi and Leishmania spp., it has been proposed that amastins may play a role in host-parasite interactions within the mammalian cell: they could be involved in the transport of ions or nutrients across the membrane, or in cell signaling events that trigger parasite differentiation [9]. Their preferential expression in the intracellular stage also suggests that they may constitute relevant antigens during parasite infection, a prediction that was confirmed by studies showing that amastin peptides elicit a strong immune response during Leishmania infection [11]. Amastin antigens are considered relevant immune biomarkers of cutaneous and visceral leishmaniasis as well as protective antigens in mice [12].

Although complete genome sequences of two strains of T.
cruzi (CL Brener and Sylvio X-10) have been reported, their assemblies were only partially achieved because of their unusually high repeat content [13,14]. Therefore, for several multi-gene families, such as the amastin gene family, the exact number of copies is not yet known. According to the current assembly [15], only four δ-amastins and two β-amastins were identified in the CL Brener genome. Herein, we used the entire data set of sequencing reads from the CL Brener [13] and Sylvio X-10 [14] genomes to analyze all sequences encoding amastin orthologues present in the genomes of these two T. cruzi strains and to determine their copy number as well as their genome organization. Expression of distinct amastin genes in fusion with the green fluorescent protein allowed us to examine the cellular localization of different members of both amastin sub-families. By determining the levels of transcripts corresponding to each sub-family in all three parasite stages of various strains, we showed that, whereas the levels of δ-amastins are up-regulated in amastigotes, β-amastin transcripts are significantly increased in the epimastigote insect stage. Most importantly, analyses showing reduced expression of δ-amastins in amastigotes from strains known to have lower infection capacity suggest that amastins may constitute T. cruzi virulence factors.

Results and discussion

The amastin gene repertoire of Trypanosoma cruzi

In its current assembly, the T. cruzi (CL Brener) genome exhibits 12 putative amastin sequences. Because of its hybrid nature and the high level of divergence between alleles, this genome was assembled as two sets of contigs, each corresponding to one haplotype, which were designated Esmeraldo-like and non-Esmeraldo [13]. Therefore, the 12 amastin sequences annotated in the CL Brener genome database actually correspond to 6 pairs of alleles.
Based on the analyses of amastin sequences present in the genomes of different species of Trypanosoma and Leishmania, as well as in two related insect parasites (Leptomonas seymouri and Crithidia spp.), Jackson (2010) [9] proposed a classification into four amastin sub-families named α-, β-, γ- and δ-amastins. In the current annotation of the T. cruzi CL Brener genome, two genes that belong to the β-amastin sub-family and four genes belonging to the δ-amastin sub-family can be identified. A phylogenetic tree constructed with all 12 amastin sequences annotated in the CL Brener genome plus orthologous sequences obtained from the genome databases of the Sylvio X-10 strain and from the partial genome sequence of the Esmeraldo strain shows a clear division between β-amastin and δ-amastin sequences (Figure 1). The tree also revealed the presence, in all three genomes, of one divergent copy of δ-amastin, which we identified in the CL Brener genome as the two alleles annotated as Tc00.1047053511071.40 and Tc00.1047053511903.50, named here δ-Ama40 and δ-Ama50. It should be noted that, in the phylogeny proposed by Jackson (2010) [9], a group of δ-amastins that includes all T. cruzi amastins as well as amastins from Crithidia spp. was placed in a branch named proto-δ-amastins, from which all Leishmania δ-amastins subsequently derived. It can also be inferred from the analyses described by Jackson (2010) [9] and from the phylogenetic tree shown in Figure 1 that the two members of the β-subfamily, named β1-amastin and β2-amastin, are highly divergent.
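Divergence between amastin sequences such as β1- and β2-amastin is usually summarized as percent identity over a pairwise alignment. The following is a minimal sketch, not the authors' procedure: it assumes the two sequences are already aligned to equal length with '-' marking gaps, and it skips gapped columns, which is only one of several conventions for the denominator. The function name and the toy sequences are hypothetical and are not real amastin data.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the ungapped columns of a pairwise alignment.

    Assumption: both strings come from the same alignment, so they have
    equal length and use '-' for gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0  # columns where neither sequence has a gap
    matches = 0   # compared columns with identical residues
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == "-" or res_b == "-":
            continue  # gapped columns are excluded from the denominator
        compared += 1
        if res_a == res_b:
            matches += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignment: 6 ungapped columns, 4 of them identical.
toy_identity = percent_identity("MKC-WLL", "MRCAWIL")
```

Because the denominator convention changes the result, identity values are only comparable when they were computed the same way.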
Whereas among the CL Brener δ-amastins, if we exclude the two divergent alleles (δ-Ama40 and δ-Ama50), the percentage of identity ranges from 85% to 100% (see Additional file 1: Figure S1A), the identities between the two CL Brener β-amastins are only 25% (between the two copies belonging to the Esmeraldo-like haplotype) and 18% (between the two non-Esmeraldo β-amastins). Analyses of additional sequences corresponding to δ-amastins, which were obtained from the individual reads generated during the CL Brener genome sequencing (see next paragraph), also show sequence identities ranging from 85% to 100% when compared to the previously described δ-amastins. Besides the low sequence similarity found between β- and δ-amastins, δ-Ama40 and δ-Ama50 also show low sequence identity with the other members of the δ-amastin sub-family. On the other hand, sequence identities between members of the β-amastins or between members of the δ-amastin sub-families range from 83% up to 99%, even when we compare amastins from two phylogenetically distant strains such as CL Brener and Sylvio X-10 (Additional file 1: Figure S1A). In spite of the sequence divergence, an alignment of polypeptide sequences belonging to all amastin sub-families shows increased amino acid conservation within the putative hydrophobic transmembrane domains. Within the predicted extracellular domains, two highly conserved cysteine residues and one tryptophan residue, which are part of the 10-amino-acid "amastin signature" [8], may be critical for amastin function (Additional file 1: Figure S1B). On the other hand, the more variable sequences present in the two predicted extracellular, hydrophilic domains suggest that this portion of the protein, which, in amastigotes, is in contact with the host cell cytoplasm, may interact with distinct host cell proteins.

Figure 1 Phylogenetic analyses of amastin sequences from different T. cruzi strains. Amastin amino acid sequences from CL Brener, Esmeraldo and Sylvio X-10 strains were used to generate a tree rooted with an α-amastin sequence from Crithidia sp. Bootstrap values followed by branch length are shown in the major basal nodes.

Because the assembly of the CL Brener genome does not include its complete sequence, we conducted a read-based analysis to estimate the total number of amastin genes in this strain of the parasite. It is well known that the assembly of the CL Brener genome is only accurate for non-repetitive regions and that, for tandemly repeated genes, misassemblies frequently occur, since most repetitive copies usually collapse into one or two copies. Therefore, we used the entire dataset of reads generated by the Tri-Tryp consortium to select reads containing sequences homologous to amastin and, based on a 13× genome coverage [13], we estimated a total number of 14 copies of amastin genes, 2 β-amastins and 12 δ-amastins, in the CL Brener genome. Similar analyses performed with sequencing reads generated by Franzen et al. (2011) [14] from the genome of Sylvio X-10 indicated a comparable number of copies in the genome of this T. cruzi I strain.

In the current assembly of the CL Brener genome, amastin genes are shown to be organized in three loci on chromosomes 26, 32 and 34. Forty-one pairs of homologous chromosomes (corresponding to the Esmeraldo-like and non-Esmeraldo haplotypes) have been assembled using the majority of the contigs and scaffolds generated by the Tri-Tryp consortium and inferences from synteny maps with the fully assembled T. brucei genome [15]. Based on the chromosome assemblies described by Weatherley et al. [15], three copies of δ-amastins are present on chromosome 34 as a tandem array with alternating copies of tuzin genes.
Interestingly, the divergent copy of δ-amastin (which has the Esmeraldo-like allele δ-Ama40 and the non-Esmeraldo allele δ-Ama50) is found as a single sequence linked to one tuzin pseudogene on chromosome 26. On a third chromosome, two copies of β-amastins are linked together without any association with tuzin genes. This gene organization is consistent with the analyses described by Jackson (2010) [9], who found tuzin genes associated only with δ-amastins. In order to confirm the proposed genomic organization in the CL Brener genome and also to verify whether a similar pattern of distribution of amastin genes occurs in other T. cruzi strains, we performed Southern blot hybridizations with chromosomal bands from CL Brener (a strain belonging to T. cruzi VI) as well as from G, Sylvio X-10 and Dm28c strains (all of them belonging to T. cruzi I) and the Y strain (a T. cruzi II strain), separated by pulsed field gel electrophoresis. As shown in Figure 2A, the presence of two copies of β-amastins in a 900 kb chromosomal band, which is similar to the predicted size of chromosome 32 [15], has been confirmed in all T. cruzi strains. Using a probe specific for δ-Ama40, we detected a chromosomal band of 800 kb, similar to the size of chromosome 26, in all strains except Sylvio X-10, where we detected two bands of similar sizes (Figure 2B). Since significant differences in the sizes of homologous chromosomal bands in T. cruzi have been frequently described [16], it is possible that the two bands detected in Sylvio X-10 correspond to size variation of chromosome 26 in this strain. Compared to β-amastins, the pattern of distribution of δ-amastins appears to be much more complex and variable: similar to CL Brener, in Dm28c and G strains, a probe specific for the δ-amastin sub-family, which does not recognize either β-amastins or δ-Ama40/50, hybridizes with sequences present in three chromosomal bands of approximately 1.1, 1.3 and 2.3 Mb (Figure 2C).
In Sylvio X-10, Colombiana and Y strains, these sequences were found in only one or two chromosomal bands. Thus, our analyses indicate that, in addition to β-amastins, which are located on chromosome 32, members of the δ-amastin sub-family are scattered among at least 3 chromosomes in this parasite strain. Whether two of these chromosomes correspond to allelic pairs that have significant differences in size still needs to be verified. This highly heterogeneous pattern of distribution of δ-amastin sequences is also in agreement with previous analyses described by Jackson (2010) [9], which suggest that δ-amastin sequences are highly mobile. Based on analyses of genomic position as well as the phylogeny of Leishmania amastins, it was proposed that independent movements of δ-amastin genes occurred in the genomes of different Leishmania species. Also consistent with these previous analyses, when blots containing chromosomal bands were probed with a sequence encoding one of the tuzin genes, a pattern of hybridization similar to the pattern obtained with the δ-amastin probes was observed (Figure 2D). Thus, for most T. cruzi strains, our results are consistent with the existence of more than one cluster containing linked copies of δ-amastin and tuzin genes and an additional locus with two β-amastins linked together. However, a complete description of the genomic organization of amastin genes could not be attained based solely on PFGE analyses and gene copy number estimations. Further analyses based on sequencing data generated from large inserts previously mapped on specific T. cruzi chromosomes are warranted to resolve this question.
Distinct patterns of amastin gene expression
Because analyses of amastin gene expression have been limited to members of the δ sub-family, and these studies have not been conducted with different strains of the parasite, we used northern blotting to evaluate the expression profiles of members of the δ- and β-amastin sub-families. We also compared the expression levels of different amastin genes in parasite strains representative of T. cruzi I (Sylvio X-10 and G), T. cruzi II (Y) and T. cruzi VI (CL Brener). As shown in Figure 3, the levels of amastin transcripts derived from the δ- and β- sub-families are differentially modulated throughout the T. cruzi life cycle. Most importantly, clear differences in expression levels were found when different T. cruzi strains were compared: whereas in CL Brener, Y and Sylvio X-10 strains, transcripts of δ-amastins are up-regulated in amastigotes, as previously described in the initial characterization of amastins performed with the Tulahuen strain (also a T. cruzi VI strain) [6], the same was not observed with the G strain. Even though it presents a more divergent sequence and is transcribed from a different locus in the genome, the expression of δ-Ama40, similar to other δ-amastins, is also up-regulated in amastigotes in all strains analysed except the G strain. In contrast, in all parasite strains, the expression of β1- and β2-amastin transcripts is up-regulated in epimastigotes. Similar to β2-amastin from CL Brener, two distinct δ-Ama40 transcripts with different sizes were detected in Y and G strains. It can be speculated that transcripts of different sizes derived from δ-Ama40 and β2-amastin genes result from alternative mRNA processing events. Recent RNA-seq analyses indicated that alternative trans-splicing and polyadenylation, as a means of regulating gene expression and creating protein diversity, occur frequently in T. brucei [17]. Ongoing analyses of RNA-seq data will help elucidate the mechanisms responsible for the size variations observed for this subset of β- and δ-amastins. Moreover, the striking difference in the expression of δ-amastins observed in the G strain is also currently being investigated. Because the G strain has been largely characterized as a low virulence strain [18], we speculated that members of the δ-amastin sub-family may constitute virulence factors that contribute to the infection capacity and parasite survival in the mammalian host.
Figure 2 Genomic localization of amastin genes in different T. cruzi strains. Chromosomal bands from different T. cruzi strains, separated by Pulsed Field Gel Electrophoresis (PFGE) and transferred to membranes, were hybridized with 32P-labelled probes corresponding to β2-amastin (A), δ-Ama40 (B), δ-amastin (C) and tuzin genes (D). T. cruzi strains or clones are Sylvio X-10 (Sylvio), Colombiana (Col.), G, Dm28c, Y and CL Brener (CLBr). Sizes of yeast chromosomal bands (Sc) are indicated on the left.
Figure 3 Amastin mRNA expression during the T. cruzi life cycle in different parasite strains. Total RNA was extracted from epimastigote (E), trypomastigote (T) and amastigote (A) forms from CL Brener, Y, G and Sylvio X-10. Electrophoresed RNAs (~10 μg/lane) were transferred to nylon membranes and probed with the 32P-labelled sequences corresponding to δ-amastin, δ-Ama40, β1- and β2-amastins (top panels). Bottom panels show hybridization of the same membranes with a fragment of the 24Sα rRNA.
This hypothesis has been recently verified by experiments in which we over-expressed one δ-amastin gene in the G strain and showed that the transfected parasites have accelerated amastigote differentiation into trypomastigotes in in vitro infections, as well as accelerated parasite dissemination in tissues after infection of mice [19]. It is also noteworthy that both β-amastins exhibited increased transcript levels in epimastigotes of all strains analysed, indicating that this amastin isoform may be involved in parasite adaptation to the insect vector. These results are consistent with previous reports describing microarray and qRT-PCR analyses of the steady-state T. cruzi transcriptome, in which higher levels of β-amastins were detected in epimastigotes compared to amastigote and trypomastigote forms [20]. Similar findings were also described for one Leishmania infantum amastin gene (LinJ34.0730), whose transcript was detected at higher levels in promastigotes after five days, in contrast to all other amastin genes, which showed higher expression levels in amastigotes [8]. The generation of knock-out parasites with the β-amastin locus deleted, and pull-down assays to investigate protein interactions between the distinct T. cruzi amastins and host cell proteins, will help elucidate the function of these proteins. Also, to investigate the mechanisms controlling the expression of the different sub-classes of amastins, sequence alignments of the 3’UTR sequences from β- and δ-amastins were performed. Previous work has identified regulatory elements in the 3’UTR of δ-amastins, as well as in other T. cruzi genes, controlling mRNA stability [4–6,21,22] and mRNA translation [23].
Since we observed that the two groups of amastin genes have highly divergent sequences in their 3’UTR (not shown), we are preparing luciferase reporter constructs to identify regulatory elements that might be present in the β-amastin transcripts, as well as to identify the factors responsible for the differences observed in amastin gene expression in distinct T. cruzi strains.
Amastin cellular localization
In our initial studies describing a member of the δ-amastin sub-family, we showed that this glycoprotein localizes in the plasma membrane of intracellular amastigotes [3]. Here we examine the cellular localization of other members of the amastin family by transfecting epimastigotes of the CL Brener strain with the pTREXnGFP vector [24] containing sequences of two δ-amastins as well as β1- and β2-amastins in fusion with GFP. Using GFP fusion proteins, we were able to examine the cellular localization of each individual member of the family. This approach was also necessary because several attempts to express the recombinant form of the full-length proteins were largely unsuccessful, so it was not possible to generate specific antibodies that could be used to detect unambiguously each member of the distinct amastin sub-families. Confocal images of stably transfected epimastigotes, shown in Figure 4, demonstrated that, whereas GFP is expressed as a soluble protein present throughout the parasite cytoplasm (Figure 4A-C), GFP fusions of β1- and δ-amastins are clearly located at the cell surface (Figure 4D-F and J-L).
Interestingly, a distinct cellular localization was observed for two of the fusions: in addition to their surface localization, the GFP fusion of δ-Ama40 showed a punctate pattern in the parasite cytoplasm, and the β2-amastin GFP fusion showed a more disperse distribution within the cytoplasm (Figure 4G-I and M-O). Although all amastin sequences present an N-terminal signal peptide domain, δ-Ama40 and δ-Ama50 have a C-terminal peptide that is not present in other members of the amastin family (Additional file 2: Figure S2). In spite of these differences, all amastin sequences showed a cellular localization pattern that is consistent with the topology predicted for Leishmania amastins as transmembrane proteins [8], as well as with our in silico analyses, which confirm the presence of four hydrophobic regions, a hallmark of all amastin sequences (Additional file 1: Figure S1B).
Figure 4 Subcellular localization of distinct amastins in fusion with GFP. Images from stably transfected epimastigotes of the CL Brener or G strains obtained by confocal microscopy using 1000x magnification and 2.2 digital zoom. In panels (A-C), parasites transfected with a vector containing only GFP; (D-F), parasites transfected with δ-amastinGFP; (G-I), parasites transfected with δ-Ama40GFP; (J-L), parasites transfected with β1-amastinGFP; (M-O), parasites transfected with β2-amastinGFP. DAPI staining is shown in panels (A, D, G, J and M), GFP fluorescence in panels (B, E, H, K and N) and merged images in panels (C, F, I, L and O). (Bar = 10 μm).
To further examine their cellular localization, particularly for the δ-Ama40:GFP fusion, which may be associated with intracellular vesicles, we performed co-localization analysis with the glycosomal protein phosphoenolpyruvate carboxykinase (PEPCK) in immunofluorescence assays. As shown by the confocal images presented in Additional file 3: Figure S3, the GFP fusion protein does not co-localize with anti-PEPCK antibodies, indicating that the vesicles containing δ-Ama40 are not associated with glycosomal components. Finally, we also performed immunoblot analyses of sub-cellular fractions of the parasite and compared the presence of GFP fusions in enriched membrane and soluble fractions of transfected epimastigotes (Figure 5). In agreement with the confocal analyses, the immunoblot results show that all four amastins that were expressed as GFP fusion proteins are present in membrane-enriched fractions.
Conclusions
Taken together, the results presented here provide further information on amastin sequence diversity, mRNA expression and cellular localization, which may help elucidate the function of this highly regulated family of T. cruzi surface proteins. Our analyses showed that the number of members of this gene family is larger than what has been predicted from the analysis of the T. cruzi genome and actually includes members of two distinct amastin sub-families. Although most T. cruzi amastins have a similar surface localization, as initially described, not all amastin genes have their expression up-regulated in amastigotes: although we confirmed that transcript levels of δ-amastins are up-regulated in amastigotes from different T. cruzi strains, β-amastin transcripts are more abundant in epimastigotes than in amastigotes or trypomastigotes.
Together with the results showing that, in the G strain, which is known to have lower infection capacity, expression of δ-amastin is down-regulated, the additional data on amastin gene expression presented here indicate that, besides a role in the intracellular amastigote stage, T. cruzi amastins may also serve important functions in the insect stage of this parasite. Hence, based on this more detailed study of T. cruzi amastins, we should be able to test several hypotheses regarding their functions using a combination of protein interaction assays and parasite genetic manipulation.
Figure 5 Distribution of amastin proteins in the parasite membrane fractions. Immunoblot of total (T), membrane (M) and cytoplasmic (C) fractions of epimastigotes expressing δ-Ama, δ-Ama40, β1- and β2-amastins in fusion with GFP. All membranes were incubated with α-GFP antibodies.
Methods
Sequence analyses
Amastin sequences were obtained from the genome databases of T. cruzi CL Brener, Esmeraldo and Sylvio X-10 strains [25,26]. The sequences, listed in Additional file 4: Table S1, were named according to the genome annotation of CL Brener or the contig or scaffold ID for the Sylvio X10/1 and. All coding sequences were translated and aligned using ClustalW [27]. Amino acid sequences from CL Brener, Esmeraldo, Sylvio X-10 and Crithidia sp. (ATCC 30255) were subjected to maximum-likelihood tree building using SeaView version 4.4 [28], and the phylogenetic tree was rooted with an α-amastin from Crithidia sp. WebLogo 3.2 was used to display the levels of sequence conservation throughout the protein [29]. Amino acid sequences of one amastin from each sub-family were used to predict transmembrane domains, using SOSUI [30], as well as signal peptides, using SignalP 3.0 [31]. For copy number estimations, individual reads from the genome sequence of T. cruzi CL Brener [13] were aligned by reciprocal BLAST against each amastin coding sequence.
Unique reads showing at least 99.7% identity were mapped on the CDS and the coverage for each nucleotide was determined. Coverage values were normalized through z-score and the copy numbers were determined from the ratios between the z-score and the whole genome coverage.
Parasite culture
T. cruzi strains or clones, obtained from different sources, were classified according to the nomenclature and genotyping protocols described by Zingales et al. [32]. Epimastigote forms of T. cruzi strains or clones Colombiana, G, Sylvio X-10, Dm28c, Y and CL Brener were maintained at 28°C in liver infusion tryptose (LIT) medium supplemented with 10% fetal calf serum (FCS) as previously described [3]. Tissue culture-derived trypomastigotes and amastigotes were obtained after infection of LLC-MK2 or L6 cells with metacyclic trypomastigotes generated in LIT medium as previously described [3].
Pulsed-field gel electrophoresis and Southern blot analyses
Genomic DNA, extracted from 10^7 epimastigotes and embedded in agarose blocks, was separated into chromosomal bands by pulsed-field gel electrophoresis (PFGE) using the Gene Navigator System (Pharmacia) as described by Cano et al. (1995) [33], with the following modifications: separation was done in 0.8% agarose gels using a program with 5 phases of homogeneous pulses (north/south, east/west) with interpolation for 135 h at 83 V. Phase 1 had a pulse time of 90 s (run time 30 h); phase 2, 120 s (30 h); phase 3, 200 s (24 h); phase 4, 350 s (25 h); phase 5, 800 s (26 h). Chromosomes from Saccharomyces cerevisiae (Bio-Rad) were used as molecular mass standards. Separated chromosomes were transferred to nylon filters and hybridized with 32P-labelled probes prepared as described in the following section.
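The read-based copy number estimate described under Sequence analyses can be illustrated with a short sketch. This is not the authors' pipeline: it replaces the z-score normalization with a plain ratio of mean per-base coverage on the coding sequence to the genome-wide coverage (13×, as cited in the text), and the function names and the interval representation of mapped reads are assumptions made only for illustration.

```python
from statistics import mean

def per_base_coverage(cds_length, read_intervals):
    """Count how many mapped reads cover each CDS position.

    read_intervals: iterable of (start, end) pairs, 0-based and
    half-open, giving where each read aligns on the CDS.
    (Hypothetical input format, for illustration only.)
    """
    coverage = [0] * cds_length
    for start, end in read_intervals:
        # Clip each alignment to the CDS boundaries.
        for pos in range(max(0, start), min(cds_length, end)):
            coverage[pos] += 1
    return coverage

def estimate_copy_number(coverage, genome_coverage=13.0):
    """Simplified estimate: mean CDS coverage / genome-wide coverage.

    Assumption: a single-copy sequence is covered at roughly the
    genome-wide depth, so a collapsed repeat with N copies shows
    about N times that depth.
    """
    return mean(coverage) / genome_coverage

# Toy example: 26 reads spanning a 10-nt stretch at 13x genome coverage.
toy_coverage = per_base_coverage(10, [(0, 10)] * 26)
toy_copies = estimate_copy_number(toy_coverage)
```

In this toy case every position is covered 26 times, so the ratio to the 13× genome coverage suggests two copies. Real data would need the identity filter (at least 99.7%) and the normalization step described in the text before such a ratio is meaningful.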
RNA purification and Northern blot assays
Total RNA was isolated from approximately 5 × 10^8 epimastigote, trypomastigote and amastigote forms using the RNeasy kit (Qiagen) following the manufacturer’s recommendations. RNA samples (15 μg/lane) were separated by denaturing agarose gel electrophoresis, transferred to Hybond-N+ membranes and hybridized with the 32P-labelled fragments corresponding to each T. cruzi amastin sequence as described [3]. The probes used were PCR-amplified fragments from total genomic DNA extracted from the CL Brener strain using primers described in Table 1, in addition to a PCR fragment generated by amplification of the insert cloned in plasmid TcA21 (corresponding to δ-amastin) and the 24Sα ribosomal RNA [6]. DNA fragments were labelled using the Megaprime DNA-labeling kit (GE HealthCare) according to the manufacturer’s protocol. All membranes were hybridized in a 50% formamide buffer for 18 h at 42°C and washed twice with 2X SSC/0.1% SDS at 42°C for 30 min each, as previously described [3]. The membranes were exposed to X-ray films (Kodak) or revealed using the STORM 840 PhosphorImager (GE HealthCare).
Plasmid constructions
To express different amastin genes in fusion with GFP, we initially constructed a plasmid named pTREXAmastinGFP. The coding sequence of the TcA21 cDNA clone [3] (accession number U04339) was PCR-amplified using a forward primer (5’-CATCTAGAAAGCAATGAGCAAAC-3’) and a reverse primer (5’-CTGGATCCCTAGCATACGCAGAAGCAC-3’) containing the XbaI and BamHI restriction sites (underlined in the primers), respectively. After digesting the PCR product with XbaI and BamHI, the fragment was ligated with the vector fragment of pTREX-GFP [24] that was previously cleaved with BamHI and XhoI. To generate the GFP constructions with other amastin genes, their corresponding ORFs were PCR-amplified using the primers listed in Table 1 and total genomic DNA that was purified from epimastigote cultures of T.
cruzi CL Brener according to previously described protocols [3]. The PCR products were cloned initially into pTZ (Qiagen) and the amastin sequences, digested with the indicated enzymes, were purified from agarose gels with the Illustra GFX PCR DNA and Gel Band Purification Kit (GE Healthcare). The fragment corresponding to the TcA21 amastin cDNA was removed from pTREXAmastinGFP after digestion with XbaI/BamHI and the fragments corresponding to the other amastin sequences were ligated in the same vector, generating pTREXAma40GFP, pTREXAma390GFP and pTREXAma394GFP. All plasmids were purified using QIAGEN plasmid purification kits and sequenced to confirm that the amastin sequences were properly inserted, in frame with the GFP sequence.
Table 1 Sequence of primers used to amplify amastin isoform ORFs
Primer name        Gene ID                    Primer sequence (5’-3’)        Restriction enzyme
pδ1-amastin (F)    Tc00.1047053511071.40      TTGTTCTAGAGTAGGAAGCAATG        XbaI
pδ1-amastin (R)    Tc00.1047053511071.40      CGCTGGATCCGAACCACGTGCA         BamHI
β1-amastin (F)     Tc00.1047053509965.390     CCTAGGAGGATGTCGAAGAAGAAG       AvrII
β1-amastin (R)     Tc00.1047053509965.390     AGATCTCGAGCACAATGAGGCCCAG      BglII
β2-amastin (F)     Tc00.1047053509965.394     TCTAGATGGGCTTCGAAACGCTTGC      XbaI
β2-amastin (R)     Tc00.1047053509965.394     GGATCCCCAGTGCCAGCAAGAAGACTG    BamHI
The underlined sequences correspond to the restriction sites recognized by the restriction enzyme.
Parasite transfections and fluorescence microscopy analyses
Epimastigotes of T. cruzi CL Brener, growing to a density of 1 to 2 × 10^7 parasites/mL, were transfected as described by DaRocha et al., 2004 [24]. After electroporation, cells were recovered in 5 mL LIT plus 10% FCS at 28°C for 24 h and analysed by confocal microscopy using the Confocal Radiance 2100 (BioRad) system with a 63/100x NA 1.4 oil immersion objective.
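As a consistency check on Table 1, where the underlining that marked the restriction sites was lost in extraction, the sketch below locates each enzyme's recognition sequence within the listed primers. The primer sequences and enzyme names are copied from Table 1; the recognition sequences are the standard ones for these enzymes, and the function and variable names are hypothetical.

```python
# Standard recognition sequences of the enzymes named in Table 1.
RECOGNITION_SITES = {
    "XbaI": "TCTAGA",
    "BamHI": "GGATCC",
    "AvrII": "CCTAGG",
    "BglII": "AGATCT",
}

# (primer name, sequence 5'-3', enzyme) as listed in Table 1.
TABLE1_PRIMERS = [
    ("pδ1-amastin (F)", "TTGTTCTAGAGTAGGAAGCAATG", "XbaI"),
    ("pδ1-amastin (R)", "CGCTGGATCCGAACCACGTGCA", "BamHI"),
    ("β1-amastin (F)", "CCTAGGAGGATGTCGAAGAAGAAG", "AvrII"),
    ("β1-amastin (R)", "AGATCTCGAGCACAATGAGGCCCAG", "BglII"),
    ("β2-amastin (F)", "TCTAGATGGGCTTCGAAACGCTTGC", "XbaI"),
    ("β2-amastin (R)", "GGATCCCCAGTGCCAGCAAGAAGACTG", "BamHI"),
]

def locate_sites(primers):
    """Return {primer name: 0-based start of the site, or -1 if absent}."""
    positions = {}
    for name, sequence, enzyme in primers:
        site = RECOGNITION_SITES[enzyme]
        positions[name] = sequence.upper().find(site)
    return positions

site_positions = locate_sites(TABLE1_PRIMERS)
for name, position in site_positions.items():
    status = "missing" if position < 0 else f"site starts at index {position}"
    print(f"{name}: {status}")
```

Each primer in the table carries the site of the enzyme listed beside it, either at the very 5’ end or after a short leading clamp. The same check could be applied to the TcA21 primers given under Plasmid constructions.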
To perform co-localization analyses, transfected parasites expressing amastin-GFP fusions were prepared for immunofluorescence assays by fixing the cells for 20 minutes in 4% PFA-PBS at room temperature. Parasites adhered to poly-L-lysine coverslips (Sigma) were permeabilized with 0.1% Triton X-100-PBS for 2 minutes, blocked with 4% BSA-PBS for 1 hour and incubated with the primary antibody (rabbit polyclonal anti-phosphoenolpyruvate carboxykinase antibody, anti-PEPCK, kindly provided by Stenio Fragoso, Instituto Carlos Chagas, Curitiba, Brazil) in blocking solution (5.0% non-fat dry milk) for 1 hour, followed by incubation with secondary anti-rabbit IgG conjugated with Alexa546. Samples were also stained with 0.1 μg/mL 4’,6-diamidino-2-phenylindole dihydrochloride (DAPI, from Sigma) at room temperature for 5 min before confocal microscopy.
Parasite membrane fractionation and western blot analyses
Approximately 10^9 epimastigotes growing at a cell density of 2 × 10^7 parasites/mL were harvested, washed with saline buffer (PBS) and resuspended in lysis buffer (Hepes 20 mM; KCl 10 mM; MgCl2 1.5 mM; sucrose 250 mM; DTT 1 mM; PMSF 0.1 mM). After lysing cells with five cycles of freezing in liquid nitrogen and thawing at 37°C, an aliquot corresponding to the total protein (T) extract was collected. The total cell lysate was centrifuged at low speed (2,000 × g) for 10 min and the supernatant was subjected to ultracentrifugation (100,000 × g) for one hour. The resulting supernatant was collected and analysed as the soluble, cytoplasmic fraction (C), whereas the pellet, corresponding to the membrane fraction (M), was resuspended in lysis buffer.
Volumes corresponding to 10 μg of total parasite protein extract (T), cytoplasmic (C) and membrane (M) fractions, mixed with Laemmli’s sample buffer, were loaded onto a 12% SDS–PAGE gel, transferred to Hybond-ECL membranes (GE HealthCare), blocked with 5.0% non-fat dry milk and incubated with anti-GFP antibody (Santa Cruz Biotechnology) or anti-PEPCK antibody, followed by incubation with peroxidase-conjugated anti-rabbit IgG and the ECL Plus reagent (GE HealthCare).
Additional files
Additional file 1: Comparative sequence analysis of T. cruzi amastins. (Figure S1A) Percentages of amino acid identities among all T. cruzi amastin sequences present in the CL Brener and Sylvio X-10 genome databases. (Figure S1B) Conserved amino acid residues and conserved domains among sequences corresponding to all amastin genes present in the T. cruzi CL Brener genome are represented using the WebLogo software. The x axis depicts the amino acid position. The taller the letter, the lower the variability at the site. Predicted transmembrane domains are underlined.
Additional file 2: Amino acid sequences of delta- and beta-amastins. (Figure S2) Predicted amino acid sequences of one representative member of δ-amastin, δ-Ama40, β1- and β2-amastins present in the T. cruzi CL Brener genome.
Additional file 3: Subcellular localization of δ-Ama40 fused with GFP. (Figure S3) Permeabilized, stably transfected CL Brener epimastigotes were incubated with anti-PEPCK antibody and a secondary antibody conjugated to Alexa546. GFP (panels A and D), Alexa 546 (B and E) and merged (C and F) fluorescent images were obtained by confocal microscopy of parasites expressing δ-Ama40GFP as described in Figure 4. (Bar = 10 μm).
Additional file 4: Table S1. Amastin sequences presented in Figure 1.
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
MMK-M, LL and WDR carried out the molecular genetic studies, microscopy analyses, sequence alignments and phylogenetic analyses. RMCP and PRA participated in molecular genetic studies. RPM-Neto and DCB participated in the sequence and phylogenetic analyses. RAM participated in the microscopy analyses. WDR and SMRT designed and coordinated the study and drafted the manuscript. All authors have read and approved the final manuscript.
Acknowledgements
This study was supported by funds from Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq, Brazil), Fundação de Amparo à Pesquisa do Estado de Minas Gerais (FAPEMIG, Brazil) and the Instituto Nacional de Ciência e Tecnologia de Vacinas (INCTV, Brazil). DCB, RAM and SMRT are recipients of CNPq fellowships. The work of WDR, MMK-M and LL is supported by Fundação Araucária, Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), PPSUS/MS and CNPq.
Author details
1 Departamento de Bioquímica e Biologia Molecular, Universidade Federal do Paraná, Rua Quinze de Novembro, 1299, Centro Curitiba, PR 80060-000, Brazil.
2 Departamento de Bioquímica e Imunologia, Av. Antônio Carlos, 6627, Pampulha Belo Horizonte, MG 31270-901, Brazil.
3 Departamento de Parasitologia, Universidade Federal de Minas Gerais, Av. Antônio Carlos, 6627, Pampulha Belo Horizonte, MG 31270-901, Brazil.
4 Departamento de Microbiologia, Imunologia e Parasitologia, Escola Paulista de Medicina, Universidade Federal de São Paulo, São Paulo 04021-001, Brazil.
Received: 1 October 2012 Accepted: 14 January 2013 Published: 17 January 2013
References
1. Brener Z: Biology of Trypanosoma cruzi. Annu Rev Microbiol 1973, 27:347–382.
2. Epting CL, Coates BM, Engman DM: Molecular mechanisms of host cell invasion by Trypanosoma cruzi.
Exp Parasitol 2010, 126:283–291.
3. Teixeira SM, Russel DG, Kirchhoff LV, Donelson JE: A differentially expressed gene family encoding “amastin,” a surface protein of Trypanosoma cruzi amastigotes. J Biol Chem 1994, 269:20509–20516.
4. Coughlin BC, Teixeira SM, Kirchhoff LV, Donelson JE: Amastin mRNA abundance in Trypanosoma cruzi is controlled by a 3’-untranslated region position-dependent cis-element and an untranslated region-binding protein. J Biol Chem 2000, 275:12051–12060.
5. Araújo PR, Burle-Caldas GA, Silva-Pereira RA, Bartholomeu DC, Darocha WD, Teixeira SM: Development of a dual reporter system to identify regulatory cis-acting elements in untranslated regions of Trypanosoma cruzi mRNAs. Parasitol Int 2011, 60:161–169.
6. Teixeira SM, Kirchhoff LV, Donelson JE: Post-transcriptional elements regulating expression of mRNAs from the amastin/tuzin gene cluster of Trypanosoma cruzi. J Biol Chem 1995, 270:22586–22594.
7. Wu Y, El Fakhry Y, Sereno D, Tamar S, Papadopoulou B: A new developmentally regulated gene family in Leishmania amastigotes encoding a homolog of amastin surface proteins. Mol Biochem Parasitol 2000, 110:345–357.
8. Rochette A, Mcnicoll F, Girard J, Breton M, Leblanc E, Bergeron MG, Papadopoulou B: Characterization and developmental gene regulation of a large gene family encoding amastin surface proteins in Leishmania spp. Mol Biochem Parasitol 2005, 140:205–220.
9. Jackson AP: The evolution of amastin surface glycoproteins in Trypanosomatid parasites. Mol Biol Evol 2010, 27:33–45.
10. Cerqueira GC, Bartholomeu DC, Darocha WD, Hou L, Freitas-Silva DM, Machado CR, El-Sayed NM, Teixeira SM: Sequence diversity and evolution of multigene families in Trypanosoma cruzi. Mol Biochem Parasitol 2008, 157:65–72.
11. Rafati S, Hassani N, Taslimi Y, Movassagh H, Rochette A, Papadopoulou B: Amastin peptide-binding antibodies as biomarkers of active human visceral Leishmaniasis. Clin Vaccine Immunol 2006, 13:1104–1110.
12.
Stober CB, Langue UG, Roberts MT, Gilmartin B, Francis R, Almeida R, Peacock CS, McCann S, Blackwell JM: From genome to vaccines for Leishmaniasis: screening 100 novel vaccine candidates against murine Leishmania major infection. Vaccine 2006, 24:2602–2616.
13. El-Sayed NM, Myler PJ, Bartholomeu DC, Nilsson D, Aggarwal G, Tran AN, Ghedin E, Worthey EA, Delcher AL, Blandin G, Westenberger SJ, Caler E, Cerqueira GC, Branche C, Haas B, Anupama A, Arner E, Aslund L, Attipoe P, Bontempi E, Bringaud F, Burton P, Cadag E, Campbell DA, Carrington M, Crabtree J, Darban H, da Silveira JF, de Jong P, Edwards K, et al: The genome sequence of Trypanosoma cruzi, etiologic agent of Chagas disease. Science 2005, 309:409–415.
14. Franzén O, Ochaya S, Sherwood E, Lewis MD, Llewellyn MS, Miles MA, Andersson B: Shotgun sequencing analysis of Trypanosoma cruzi I Sylvio X10/1 and comparison with T. cruzi VI CL Brener. PLoS Negl Trop Dis 2011, 5:984–993.
15. Weatherly DB, Boehlke C, Tarleton RL: Chromosome level assembly of the hybrid Trypanosoma cruzi genome. BMC Genomics 2009, 10:255–268.
16. Souza RT, Lima FM, Barros RM, Cortez DR, Santos MF, Cordero EM, Ruiz JC, Goldenberg S, Teixeira MMG, Silveira JF: Genome size, karyotype polymorphism and chromosomal evolution in Trypanosoma cruzi. PLoS One 2011, 6:e23042.
17. Nilsson D, Gunasekera K, Mani J, Osteras M, Farinelli L, Baerlocher L, Roditi I, Ochsenreiter T: Spliced leader trapping reveals widespread alternative splicing patterns in the highly dynamic transcriptome of Trypanosoma brucei. PLoS Pathog 2010, 6(8):e1001037.
18. Yoshida N: Molecular basis of mammalian cell invasion by Trypanosoma cruzi. An Acad Bras Cienc 2006, 78:87–111.
19. Cruz MC, Souza-Melo N, Vieira-da-Silva C, DaRocha WD, Bahia D, Araújo PR, Teixeira SMR, Mortara RA: Trypanosoma cruzi: role of delta-amastin on extracellular amastigote cell invasion and differentiation. PLoS One 2012, 7:e51804.
20.
Minning TA, Weatherly DB, Atwood J, Orlando R, Tarleton RL: The steady-state transcriptome of the four major life-cycle stages of Trypanosoma cruzi. BMC Genomics 2009, 10:370–385.
21. Araújo PR, Teixeira SM: Regulatory elements involved in the post-transcriptional control of stage-specific gene expression in Trypanosoma cruzi - A Review. Mem Inst Oswaldo Cruz 2011, 106:257–267.
22. Li ZH, De Gaudenzi JG, Alvarez VE, Mendiondo N, Wang H, Kissinger JC, Frasch AC, Docampo R: A 43-nucleotide U-rich element in 3’-untranslated region of large number of Trypanosoma cruzi transcripts is important for mRNA abundance in intracellular amastigotes. J Biol Chem 2012, 287:19058–19069.
23. McNicoll F, Müller M, Cloutier S, Boilard N, Rochette A, Dubé M, Papadopoulou B: Distinct 3’-untranslated region elements regulate stage-specific mRNA accumulation and translation in Leishmania. J Biol Chem 2005, 280:35238–35246.
24. Darocha WD, Silva RA, Bartholomeu DC, Pires SF, Freitas JM, Macedo AM, Vazquez MP, Levin MJ, Teixeira SM: Expression of exogenous genes in Trypanosoma cruzi: improving vectors and electroporation protocols. Parasitol Res 2004, 92:113–120.
25. TriTryp DB: Kinetoplastid genomic resources Database. [http://triTrypdb.org/common/downloads/release-4.1/Tcruzi/fasta/TriTrypDB].
26. Aslett M, Aurrecoechea C, Berriman M, Brestelli J, Brunk BP, Carrington M, Depledge DP, Fischer S, Gajria B, Gao X, Gardner MJ, Gingle A, Grant G, Harb OS, Heiges M, Hertz-Fowler C, Houston R, Innamorato F, Iodice J, Kissinger JC, Kraemer E, Li W, Logan FJ, Miller JA, Mitra S, Myler PJ, Nayak V, Pennington C, Phan I, Pinney DF, et al: TriTrypDB: a functional genomic resource for the Trypanosomatidae. Nucleic Acids Res 2010, 38:457–462.
27. Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R, Thompson JD, Gibson TJ, Higgins DG: Clustal W and Clustal X version 2.0. Bioinformatics 2007, 23:2947–2948.
28.
Gouy M, Guindon S, Gascuel O: SeaView version 4: A multiplatform graphical user interface for sequence alignment and phylogenetic tree building. Mol Biol Evol 2010, 27:221–224. 29. Crooks GE, Hon G, Chandonia JM, Brenner SE: WebLogo: a sequence logo generator. Genome Res 2004, 14:1188–1190. 30. Hirokawa T, Boon-Chieng S, Mitaku S: SOSUI: classification and secondary structure prediction system for membrane proteins. Bioinformatics 1998, 14:378–379. 31. Bendtsen JD, Nielsen H, von Heijne G, Brunak S: Improved prediction of signal peptides: SignalP 30. J Mol Biol 2004, 340:783–795. 32. Zingales B, Andrade SG, Briones MR, Campbell DA, Chiari E, Fernandes O, Guhl F, Lages-Silva E, Macedo AM, Machado CR, Miles MA, Romanha AJ, Sturm NR, Tibayrenc M, Schijman AG: A new consensus for Trypanosoma cruzi intraspecific nomenclature: second revision meeting recommends TcI to TcVI. Mem Inst Oswaldo Cruz 2009, 104:1051–1054. 33. Cano MI, Gruber A, Vazquez M, Cortés A, Levin MJ, Gonzalez A, Degrave W, Rondinelli E, Ramirez JL, Alonso C, Requena JM, Franco Da Silveira J: Molecular karyotype of clone CL Brener chosen for the Trypanosoma cruzi genome project. Mol Biochem Parasitol 1995, 7:273–278. doi:10.1186/1471-2180-13-10 Cite this article as: Kangussu-Marcolino et al.: Distinct genomic organization, mRNA expression and cellular localization of members of two amastin sub-families present in Trypanosoma cruzi. BMC Microbiology 2013 13:10. Genome sequence of a highly attenuate clone of Trypanosoma cruzi identifies SAPA repeats as a major virulence factor in this human parasite Rondon Pessoa Mendonça-Netoa, Caroline Junqueirab, Daniella C. Bartolomeu c, Wanderson D. daRochad, Monica Kangussu-Marcolino d, Gabriela F.R. Luizc, Viviane Santosa, Luiz Gonzaga Paula de Almeidae, Edmundo Grisardf, Ana Tereza Vasconcelose, Sergio Schenkmanf, Nilmar S. Morettif, Ricardo Tostes Gazzinellia,b,, and Santuza M.R. 
Teixeiraa* a Departamento de Bioquímica e Imunologia, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil; b Centro de Pesquisas René Rachou, Fundação Oswaldo Cruz, 30190-002 Belo Horizonte, Minas Gerais, Brazil; c Departamento de Parasitologia, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil; d Departamento de Bioquímica e Biologia Molecular, Universidade Federal do Paraná, Curitiba, PR, Brazil; e Laboratório Nacional da Ciência da Computação, Patrópolis, RJ, Brazil; Laboratório de Protozoologia e Bioinformatica, Universidade Federal de Santa Catarina, Florianópolis, SC, Brazil; 138 Departamento de Microbiologia, Imunologia e Parasitologia, Universidade Federal de São Paulo, São Paulo, SP, Brazil * Corresponding author: Santuza M.R. Teixeira Departamento de Bioquimica e Imunologia, ICB, Universidade Federal de Minas Gerais Av. Antônio Carlos 6627, 31270-901, Belo Horizonte, MG, Brasil Tel : +55(31) 3409-2665; FAX: 55(31) 3409 2614 E-mail: [email protected] 139 Abstract Trypanosoma cruzi, the etiologic agent of Chagas disease, belongs to a group of organisms with a peculiar genome in which a massive expansion of surface protein gene families is present and a large proportion of it is devoted to repetitive sequences. The completion of the CL Brener reference strain genome reveals several new features related to the parasite virulence. CL-14 is an avirulent clone derived from the same T. cruzi CL strain, however, in contrast to CL Brener, CL-14 is neither infective nor pathogenic in vivo, even when infecting newborn or immune deficient mice. To investigate the molecular determinants of T. cruzi virulence, we performed a direct comparison of the CL Brener and CL-14 genomes, based on the available CL Brener sequences and sequences we generated from CL-14 using the 454 FLX platform. 
Although neither genome was fully assembled, we found that they have a highly similar nuclear genome organization, almost 100% identical mitochondrial maxicircle kDNA, and similar numbers of predicted coding sequences and of copies of members of multigene families. PCR analyses and phylogenetic inferences showed that CL-14 is also a hybrid strain that belongs to the same DTU as CL Brener (TcVI). Southern blot analyses indicate a similar karyotype and, for most multigene families, sequence identity between the two clones is higher than 99%. The only major difference detected between the two genomes concerns a sub-group of the large trans-sialidase gene family (TcTS), known to have a C-terminal domain with 12-amino-acid repeats called "shed acute phase antigen" or SAPA repeats. At least three copies of TcTS containing a repetitive domain of 19 to 41 repeats, which are highly immunogenic and increase the half-life of shed TcTS protein, are present in the CL Brener genome, whereas in CL-14 only one copy, containing 3 SAPA repeats, was identified. This reduced number of SAPA repeats in the CL-14 TcTS, confirmed by Southern and western blot analyses, may constitute one of the factors responsible for the differences in virulence between these two strains.

Key words: Trypanosoma cruzi, genome, CL-14, trans-sialidase, virulence

Introduction

Trypanosoma cruzi is the etiological agent of Chagas disease, a malady affecting at least 8 million people throughout Latin America and for which there are only two drugs available, both with poor efficacy and harmful side effects (WHO, 2010). T. cruzi infection begins with metacyclic trypomastigotes that are released in the feces of the triatomine vector during a blood meal. After reaching the host bloodstream through skin cuts and mucosa, trypomastigotes invade different cell types in the mammalian host.
Once in the cytoplasm, they differentiate into replicative, non-flagellated amastigotes, which undergo several rounds of binary division before differentiating again into trypomastigotes and bursting the host cell. Bloodstream trypomastigotes can be ingested by the vector during another blood meal, after which they differentiate into epimastigotes and replicate in the insect gut (Brener, 1973). The T. cruzi population is highly heterogeneous, composed of a pool of strains with distinct characteristics. This striking intra-specific variation has been extensively documented by molecular analyses and biological characterization, which showed distinct morphology, growth rate, parasitemia curves, virulence, drug sensitivity, antigenic profile, metacyclogenesis and tissue tropism (reviewed by Buscaglia and Di Noia, 2003). Studies on the genetic diversity observed among different isolates recently converged on a classification that proposes six major groups in the parasite population, also known as discrete typing units (DTUs), T. cruzi I to VI (Zingales et al., 2009). These divergent lineages occupy distinct ecological environments: T. cruzi I strains are more frequently associated with the sylvatic cycle, whereas T. cruzi II strains are part of the domestic cycle of Chagas disease and are more frequently isolated from chronic chagasic patients (Buscaglia and Di Noia, 2003). Although the population results from predominantly clonal evolution, several lines of evidence indicate that genetic exchange between parasites has occurred in the past (Buscaglia and Di Noia, 2003; Gaunt et al., 2003; Freitas et al., 2006). Among the strains that are products of hybridization events is CL Brener, the reference strain chosen for the T. cruzi genome project.
The complete sequence of the T. cruzi CL Brener genome, with an estimated haploid size of 55 Mb and about 12,000 genes, revealed a highly repetitive genome (El-Sayed et al., 2005) with protein-coding genes organized in long, unidirectional polycistronic transcription units. Because of its hybrid nature and repetitive content, which prevented its complete assembly, the CL Brener genome is represented by two datasets of contigs, each corresponding to one haplotype (El-Sayed et al., 2005; Weatherly et al., 2009). To help identify the sequences belonging to each haplotype, reads were generated from the genome of the cloned Esmeraldo strain, a member of T. cruzi II, which represents one of the CL Brener parental strains (Freitas et al., 2006). Thus, in the annotation data of the CL Brener genome, the two haplotypes, which were assembled separately, are referred to as "esmeraldo-like" or "non-esmeraldo-like" (El-Sayed et al., 2005; Aslett et al., 2010). Because chromosomes do not condense during mitosis in trypanosomatids, the number of CL Brener chromosomes has been estimated at 30 to 41 pairs from karyotype analyses based on pulsed-field gel electrophoresis and from genome assembly based on synteny with the Trypanosoma brucei genome (Cano et al., 1995; Weatherly et al., 2009). At the same time that the first T. cruzi genome was published, draft genome sequences of two other human pathogens of the trypanosomatid family, Trypanosoma brucei and Leishmania major, were also released (Berriman et al., 2005; Ivens et al., 2005). Soon thereafter, other species of Leishmania and another T. brucei sub-species had their genomes sequenced (Peacock et al., 2007; Jackson et al., 2010; Raymond et al., 2011; Rogers et al., 2011). A draft genome sequence of Sylvio X10, a strain belonging to T. cruzi group I, which is the predominant agent of Chagas disease in Central America and in the Amazon region, has also been published (Franzén et al., 2011).
Although rarely isolated from humans in endemic areas of the southern countries of Latin America, where most cases of Chagas disease with mega-syndromes are found, T. cruzi I strains are highly abundant among wild hosts and vectors (Zingales et al., 1998; Buscaglia and Di Noia, 2003). The Sylvio X10 genome was found to be smaller than the CL Brener genome, with fewer copies in several gene families encoding surface molecules (Franzén et al., 2011; Andersson, 2011). Here we describe the sequence analysis of the highly attenuated CL-14 clone, which, like CL Brener, was derived from the CL strain of T. cruzi (Lima et al., 1990). In contrast to CL Brener, CL-14 is neither infective nor pathogenic in vivo, even when infecting newborn mice (Soares et al., 2003) or immune-deficient CD8 -/- mice (Junqueira et al., 2011) that are otherwise highly susceptible to T. cruzi infection. Although inoculation of CL-14 in adult animals results in no parasitemia or detectable tissue parasitism (Lima et al., 1995), it prevents the development of parasitemia and mortality after challenge with the virulent CL strain (Lima et al., 1999; Soares et al., 2003). Importantly, since vaccination with live CL-14 induces potent and long-lasting parasite-specific antibody and T-cell mediated immunity against challenge with highly virulent strains of T. cruzi, the immunological adjuvant properties of the CL-14 clone have been explored as a possible vaccine vector for induction of T-cell mediated immunity against other diseases (Junqueira et al., 2011, 2012). Investigating the molecular basis of the non-virulent phenotype of CL-14, Atayde and co-workers (2004) found that the expression of gp82, a stage-specific glycoprotein involved in infection in vivo and host cell invasion in vitro, was greatly reduced on the surface of metacyclic forms of CL-14.
After performing a direct comparison of the CL Brener and CL-14 genomes, we found that the two genomes are highly similar, with no substantial differences in genome organization, in the total number of predicted coding sequences or in the number of copies within multigene families. The absence of major differences at the genome level warrants further studies focusing on gene expression, to identify changes in the mRNA population or in the proteome that could explain the differences in virulence between the two strains.

Materials and Methods

Parasite cultures and DNA sequencing

Epimastigotes were cultured at 28 °C in liver infusion tryptose (LIT) medium as described by Camargo (1964). Three genomic libraries were constructed with total DNA purified from epimastigotes: two by the shotgun method with 5 µg of total DNA, and the third by the paired-end method (3 kb span) with 500 µg of DNA. Each library was sequenced individually by high-throughput pyrosequencing (Roche-454 FLX Titanium).

Genome assembly and sequence analyses

Whole genome assembly was carried out both with ab initio methods and by comparative genomic analyses with the T. cruzi CL Brener genome, using the Newbler assembler and Perl scripts. Gene prediction and annotation were performed using GeneMarkS (Besemer et al., 2001) and best reciprocal BLAST hits to CL Brener sequences. Individual genes were identified using reciprocal BLASTp and tBLASTn on unassembled reads. A total of 3,457,102 individual reads, totaling 1,506,882,872 nucleotides, were filtered to remove low-quality sequences and submitted to the different analyses. Sequence alignments were created using Megablast and ClustalW. CL-14 contigs were aligned against CL Brener coding sequences with Megablast, without the low-complexity filter that masks repetitive sequences.
Those alignments were parsed using Perl (v5.10.1) and BioPerl (v1.6) scripts to accept only reciprocal best hits in which the HSPs covered at least 95% of the read length with at least 90% identity. Aligned sequences from CL-14 were used to perform multiple and global alignments in ClustalW with the IUB score matrix, within each gene group. Overhangs were removed from those alignments and the results were arranged in neighbor-joining trees with the MEGA software (v4). A search for the three known classes of immunostimulatory CpG DNA motifs (26) was performed on the individual reads using the fuzznuc algorithm (EMBOSS package), as described previously (Bartholomeu et al., 2009).

Phylogenetic analyses and PCR amplifications

In silico PCR analyses were performed with individual reads and then confirmed by PCR amplification, using total DNA from CL-14 as template, and gel electrophoresis of the PCR products. Two nuclear markers, the mini-exon or Spliced Leader (SL) gene (Burgos et al., 2007) and the 24S ribosomal subunit (Souto et al., 1996), and one mitochondrial marker, cytochrome oxidase subunit II (COII), were used to determine the classification of CL-14 as described by Freitas et al. (2006). The e-PCR software (Schuler, 1997), allowing up to 2 mismatches and 2 gaps, was used to search for the primer sequences F-AAGGTGCGTCGACAGTGTGG and R-TTTTCAGAATGGCCGAACAGT, corresponding to the 24S ribosomal subunit, and the primer sequences F-CGTACCAATATAGTACAGAAACTG and R-CTCCCCAGTGTGGCCTGGG, corresponding to the mini-exon genes. Analysis of the COII sequences was performed by in silico PCR using F-CCATATATTGTTGCATTATT and R-TTGTAATAGGAGTCATGTTT, followed by in silico digestion of the PCR products with the AluI restriction enzyme. PCR amplifications were confirmed with DNA extracted from epimastigote cultures of strains representative of T. cruzi groups I-VI and from two biological samples of CL-14.
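The in silico PCR and AluI digestion steps described above can be illustrated with a minimal Python sketch. It is not the e-PCR program used in this work: it tolerates substitutions only (no gaps), scans a single strand, and reports only the first primer pair found. The function names `find_site`, `in_silico_pcr` and `digest` are hypothetical, and the template in the toy example is an invented sequence, not CL-14 data. The AluI recognition site (AG^CT, blunt cut) is the standard one.

```python
from typing import List, Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_site(template: str, primer: str, max_mismatches: int = 2,
              start: int = 0) -> Optional[int]:
    """Index of the first window at or after `start` that matches `primer`
    with at most `max_mismatches` substitutions, or None if there is none."""
    n = len(primer)
    for i in range(start, len(template) - n + 1):
        window = template[i:i + n]
        if sum(a != b for a, b in zip(window, primer)) <= max_mismatches:
            return i
    return None


def in_silico_pcr(template: str, fwd: str, rev: str,
                  max_mismatches: int = 2) -> Optional[str]:
    """Predicted amplicon (primer sites included) on this strand, or None."""
    f = find_site(template, fwd, max_mismatches)
    if f is None:
        return None
    # The reverse primer anneals to the opposite strand, so on this strand
    # we look for its reverse complement downstream of the forward primer.
    r = find_site(template, revcomp(rev), max_mismatches, start=f + len(fwd))
    if r is None:
        return None
    return template[f:r + len(rev)]


def digest(amplicon: str, site: str = "AGCT", cut_offset: int = 2) -> List[int]:
    """Fragment lengths after cutting at every `site` (default AluI, AG^CT)."""
    cuts = []
    pos = amplicon.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = amplicon.find(site, pos + 1)
    bounds = [0] + cuts + [len(amplicon)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Toy example: the COII primers quoted in the text, placed around an
# invented 16-nt insert that carries a single AluI site.
FWD = "CCATATATTGTTGCATTATT"
REV = "TTGTAATAGGAGTCATGTTT"
toy_template = "GGGG" + FWD + "ACGTACAGCTTTGACC" + revcomp(REV) + "CCCC"
toy_amplicon = in_silico_pcr(toy_template, FWD, REV)
toy_fragments = digest(toy_amplicon) if toy_amplicon else []
```

On real reads, the fragment lengths returned by `digest` would be compared with the sizes expected for each T. cruzi group.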
PCR reactions were performed with 0.75 U of GoTaq DNA polymerase (Promega) and buffer containing 1.5 mM MgSO4, 40 µM dNTPs and 10 pM of each primer. PCR products obtained with the COII primers were digested with AluI, and all products were subjected to electrophoresis on a 6% polyacrylamide gel followed by silver staining.

Pulsed-field gel electrophoresis and Southern blot analyses

Epimastigotes were embedded in agarose blocks as described by [27]. Pulsed-field gel electrophoresis (PFGE) was carried out as reported by [28]. Chromosomes from Hansenula wingei (Bio-Rad) were used as molecular mass standards. Separated chromosomes were transferred to nylon filters and hybridized with 32P-labeled probes as described previously (Teixeira et al., 1995).

Results and Discussion

Genome sequencing and comparative karyotyping

Using 454 technology and whole genome shotgun sequencing, we generated a total of 1,507 Mb of sequence, derived from 3,457,102 reads from three CL-14 genomic libraries, and performed a comparative analysis with the genome sequences of T. cruzi CL Brener (Table 1). Based on a haploid nuclear genome size estimated at 55 Mb (Souza et al., 2011), the total number of nucleotides sequenced corresponds to 27x coverage of the CL-14 genome. A similar genome size was estimated for the CL Brener clone (El-Sayed et al., 2005), and a comparison of chromosomal bands separated by pulsed-field gel electrophoresis showed a similar pattern between CL-14 and CL Brener (Fig 1A). The 60 Mb haploid nuclear genome predicted for CL Brener, an estimate based on sequencing data as well as on fluorescent staining (Souza et al., 2011), is only slightly larger than the genome size estimated for CL-14. Both genomes are, however, significantly larger than the genome of a T. cruzi I strain, Sylvio X10, which has 5.9 Mb less haploid sequence (Franzén et al., 2011).
Most of the differences that account for the reduced size of the Sylvio X10 genome are concentrated in the copy number of members of large gene families. The estimated GC content of 51%, based on the total reads of the CL-14 genome, is also similar to that of the CL Brener genome but higher than the GC content of the Sylvio X10 genome (48%). A draft sequence assembly of the CL-14 genome results in a total of 43,906 contigs (Table 1). Such a large number of contigs was expected, since over 50% of this parasite's genome consists of repeated sequences, which also hampered the complete assembly of the CL Brener genome, itself based on a whole genome shotgun sequencing strategy. The haploid CL Brener genome has an estimated 12,000 genes organized in long clusters that are polycistronically transcribed (El-Sayed et al., 2005). Preliminary analyses of assembled CL-14 contigs indicate a similar number of genes with a similar genomic organization. Because of the large number of contigs, it was not possible to make an accurate prediction of the total number of genes, since a vast number of open reading frames were found to be truncated. Moreover, as indicated below, CL-14, like CL Brener, has a hybrid genome constituted by two distinct haplotypes. Since the assembly tools did not discriminate between the two haplotypes, we decided to conduct all further analyses described in the next sections solely on sequencing data generated from the reads and not from assembled contigs. However, to investigate the existence of changes in the karyotype or the presence of large chromosomal rearrangements, we hybridized chromosomal bands separated by PFGE with different probes. A few changes in the chromosomal mapping of gp82 genes have been reported by Atayde et al. (2004), who identified two chromosomal bands hybridizing with a gp82 probe in the CL-14 clone that are absent in the CL isolate.
However, since the CL isolate may contain a mixed population of different clones, we decided to compare the molecular karyotypes of CL-14 and the CL Brener clone. The results shown in Figures 1A and 1B indicate that CL-14 and CL Brener chromosomes have similar patterns and also hybridize with GPI8, MASP, amastin and DGF-1 sequences in a similar way. Although a few differences could be observed in the chromosomes containing MASP sequences, all 27 members of the amastin gene family are equally organized in the CL-14 and CL Brener genomes, indicating that there are no major rearrangements between these two genomes.

Phylogenetic analysis

To determine which T. cruzi group CL-14 belongs to, we analyzed sequences corresponding to the 24S subunit of the ribosomal DNA (rDNA) and the Spliced Leader (SL) gene clusters, as well as sequences corresponding to the mitochondrial gene cytochrome oxidase II (COII). Electronic PCR was performed using primers specific for these sequences, and the sizes of the amplicons generated using the CL-14 reads as template were compared with the sizes expected for the corresponding amplicons derived from genomic sequences of strains representative of all six T. cruzi DTUs. For the COII amplicon, we compared the sizes of the products of AluI digestion. As shown in Table 2, a comparison of the fragments resulting from the amplification of the 24S rDNA and SL markers indicated that CL-14 would be classified as T. cruzi II, since it presents amplicons of 150 bp for the SL and 125 bp for the 24S rDNA markers. However, PCR products corresponding to the mitochondrial COII gene yield two fragments of 81 and 294 bp after AluI digestion, which is characteristic of strains belonging to T. cruzi III, IV, V or VI. Taken together, these results, as well as the results described in the next section, indicate that CL-14, like CL Brener, is a hybrid strain and must be classified as T. cruzi VI.
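The reasoning above, in which the nuclear markers alone are compatible with T. cruzi II and the mitochondrial marker narrows the assignment to T. cruzi VI, can be sketched as a lookup against the expected fragment sizes. The following Python sketch is only an illustration: the sizes are transcribed from the marker table of this manuscript, while the names `DTU_PROFILES` and `compatible_dtus` are hypothetical and exact size matching is a simplifying assumption.

```python
from typing import Dict, FrozenSet, List, Set

# Expected fragment sizes in bp, transcribed from the marker table of this
# manuscript: COII after AluI digestion, SL amplicon and 24S rDNA amplicon.
DTU_PROFILES: Dict[str, Dict[str, FrozenSet[int]]] = {
    "TcI": {"COII": frozenset({30, 81, 264}), "SL": frozenset({150}),
            "24S": frozenset({110})},
    "TcII": {"COII": frozenset({81, 82, 212}), "SL": frozenset({150}),
             "24S": frozenset({125})},
    "TcIII": {"COII": frozenset({81, 294}), "SL": frozenset({200}),
              "24S": frozenset({110})},
    "TcIV": {"COII": frozenset({81, 294}), "SL": frozenset({200}),
             "24S": frozenset({125})},
    "TcV": {"COII": frozenset({81, 294}), "SL": frozenset({150}),
            "24S": frozenset({110, 125})},
    "TcVI": {"COII": frozenset({81, 294}), "SL": frozenset({150}),
             "24S": frozenset({125})},
}


def compatible_dtus(observed: Dict[str, Set[int]]) -> List[str]:
    """Return the DTUs whose expected sizes equal the observed sizes for
    every marker supplied; markers that are not supplied are ignored."""
    return [
        dtu
        for dtu, profile in DTU_PROFILES.items()
        if all(profile[marker] == frozenset(sizes)
               for marker, sizes in observed.items())
    ]


# CL-14 values reported in the text.
nuclear_only = compatible_dtus({"SL": {150}, "24S": {125}})
all_markers = compatible_dtus({"SL": {150}, "24S": {125}, "COII": {81, 294}})
```

With the CL-14 values, the two nuclear markers leave TcII and TcVI as candidates, and adding the COII fragments leaves TcVI only.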
Since the two clones were isolated from the same strain, and since the mitochondrial marker corresponds to T. cruzi III, we hypothesize that CL-14 is derived from the same hybridization event between ancestral strains belonging to T. cruzi II and III and that, like CL Brener, it has retained a T. cruzi III mitochondrion. The results obtained from the in silico analyses were confirmed by in vitro amplification of DNA purified from epimastigote cultures of CL-14 and CL Brener, using primers that amplify the SL, 24S rDNA and COII markers (Supplementary Fig 1). In addition to the analyses of the rDNA and SL genes, we performed sequence alignments of two nuclear single-copy genes, msh2 and trypanothione reductase (TR), as well as one mitochondrial gene, COII. The results shown in Figure 2 confirmed our prediction that CL-14 is phylogenetically very close to CL Brener and that sequences belonging to the two distinct haplotypes (esmeraldo-like and non-esmeraldo-like) are present in the CL-14 genome. Sequence alignments between 392,310 reads from the CL-14 genome that correspond to coding regions and the coding sequences of both CL Brener haplotypes showed that 175,612 reads (44.8%) have a best alignment with the Esmeraldo-like haplotype and 185,497 reads (47.3%) with the non-Esmeraldo-like haplotype. For a total of 31,201 reads (7.95%), it was not possible to distinguish between the two haplotypes.

Mitochondrial maxicircle genome assembly

Members of the kinetoplastid family have a mitochondrial genome organized in a peculiar structure known as kinetoplast DNA (kDNA). T. cruzi kDNA consists of thousands of variable, concatenated minicircles of 0.5 to 5.0 kb and dozens of concatenated maxicircles of approximately 20 kb, of which 15 kb correspond to coding sequences (Ruvalcaba-Trejo and Sturm, 2011; Junqueira et al., 2005).
Most maxicircle genes contain open reading frame (ORF) frameshifts, which are corrected at the RNA level by a complex process of uridine insertions and deletions known as RNA editing, which depends on gRNAs encoded mainly by minicircle sequences (Hajduk et al., 1993). A total of 1,724 reads were found to align with maxicircle sequences derived from CL Brener. A consensus sequence generated from 14 assembled contigs is shown in Fig 3. The assembly and annotation of the CL-14 maxicircle show that it has approximately 20.6 kb and contains, besides the 12S and 9S ribosomal RNA (rRNA) genes, all 18 open reading frames previously identified in the maxicircles of the CL Brener and Esmeraldo strains. The assembly of the CL-14 maxicircle also showed that these three mitochondrial genomes have a high level of synteny. RNA editing is a hallmark of genes encoded by trypanosome mitochondrial maxicircle DNA. ORF analyses of maxicircle DNA from the three T. cruzi strains showed that 9 genes are extensively edited and 3 genes have smaller changes in their ORFs due to insertions and deletions of uridines. Alignment of CL-14 maxicircle genes with CL Brener sequences indicates that a similar number of genes undergo RNA editing and thus depend on this post-transcriptional modification to generate functional mitochondrial mRNAs.

Comparative analyses of multigene families

After searching for all 22,570 protein-coding genes predicted in the CL Brener genome, we found that all of them are present in the CL-14 genome, indicating that the gene content of the two genomes is highly similar. For only three genes, Tc00.1047053506215.10 and Tc00.1047053511215.90, annotated as hypothetical proteins, and Tc00.1047053509351.4, a ribosomal protein L38, was a lower coverage (less than 95%) by CL-14 reads found.
We thus decided to investigate whether differences in the number of copies in the various multigene families may underlie the phenotypic differences observed between these strains. Comparative analyses based on sequence reads presenting more than 97.5% identity to several of the gene families described in the CL Brener genome (Table 3) indicated that, in addition to having a similar set of genes, the two genomes show no large differences in the copy number of members of multigene families.

Differences in the sequences encoding SAPA repeats of trans-sialidases

While aligning the reads against the reference CL Brener genome, we noticed one major difference regarding a sub-group of the large trans-sialidase gene family (TcTS). Members of TcTS group I are known to have a C-terminal domain with 12-amino-acid repeats called "shed acute phase antigen" or SAPA repeats. At least three copies of TcTS containing a repetitive domain of 19 to 41 repeats, which are highly immunogenic and increase the half-life of shed TcTS protein, are present in the CL Brener genome, whereas in CL-14 only one copy, containing 3 SAPA repeats, was identified. This reduced number of SAPA repeats in the CL-14 TcTS was confirmed by Southern and western blot analyses using probes containing SAPA sequences and anti-SAPA antibodies, respectively (Fig. 4-A and C). We also confirmed this deletion in the sequences encoding the C-terminal SAPA repeats by PCR amplification, using primers annealing to the regions flanking these repeats and DNA purified from CL Brener and CL-14. As shown in Fig 4-B, whereas a discrete band of only 500 nucleotides was detected with CL-14, PCR with CL Brener DNA generated a large smear corresponding to sequences containing different sizes of the large repetitive domain.
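As an illustration of how the number of consecutive repeat units in a predicted protein could be counted, the following Python sketch scans a sequence for the longest run of back-to-back copies of a 12-residue unit, tolerating a small number of substitutions per copy. This is not the procedure used in this work: the function name `longest_tandem_run` is hypothetical, and `EXAMPLE_UNIT` is an assumed 12-residue unit used only for illustration, since the text does not spell out the repeat sequence.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def longest_tandem_run(protein: str, unit: str, max_mismatches: int = 1) -> int:
    """Largest number of consecutive, non-overlapping copies of `unit`
    found anywhere in `protein`, allowing up to `max_mismatches`
    substitutions in each copy (insertions and deletions are not handled)."""
    k = len(unit)
    best = 0
    for start in range(len(protein) - k + 1):
        run = 0
        pos = start
        # Extend the run one unit at a time while each window still matches.
        while (pos + k <= len(protein)
               and hamming(protein[pos:pos + k], unit) <= max_mismatches):
            run += 1
            pos += k
        best = max(best, run)
    return best


# Assumed 12-residue unit for illustration only (not taken from the text).
EXAMPLE_UNIT = "DSSAHSTPSTPA"

# Toy proteins: an invented N-terminal stretch followed by 3 or 19 units.
short_domain = "MKT" + EXAMPLE_UNIT * 3 + "GGG"
long_domain = "MKT" + EXAMPLE_UNIT * 19 + "GGG"
short_count = longest_tandem_run(short_domain, EXAMPLE_UNIT)
long_count = longest_tandem_run(long_domain, EXAMPLE_UNIT)
```

Applied to translated reads or contigs, a count of this kind would distinguish a short repetitive domain from the long domains of 19 to 41 units described above, under the stated assumptions.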
By eliciting a strong humoral response, TcTS containing SAPA repeats are considered virulence factors involved in the mechanisms developed by the parasite to evade the host immune response. The lack of a large repetitive domain in the TcTS of CL-14 may thus be one of the factors that could explain the differences in virulence between these two strains.

Acknowledgments

This work is supported by funds from CAPES, CNPq, Fundação de Amparo a Pesquisa do Estado de Minas Gerais - FAPEMIG (Brazil) and the Instituto Nacional de Ciencia e Tecnologia de Vacinas (INCTV).

References

Atayde VD, Neira I, Cortez M, Ferreira D, Freymüller E, Yoshida N. Molecular basis of non-virulence of Trypanosoma cruzi clone CL-14. Int J Parasitol 34(7):851-860, 2004.
Berriman M, Ghedin E, Hertz-Fowler C, Blandin G, Renauld H, Bartholomeu DC, Lennard NJ, Caler E, et al. The genome of the African trypanosome Trypanosoma brucei. Science 309:416-422, 2005.
Besemer J, Lomsadze A, Borodovsky M. GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions. Nucleic Acids Res 29:2607-2618, 2001.
Brener Z. Biology of Trypanosoma cruzi. Annu Rev Microbiol 27:347-382, 1973.
Burgos JM, et al. Direct molecular profiling of minicircle signatures and lineages of Trypanosoma cruzi bloodstream populations causing congenital Chagas disease. Int J Parasitol 37(12):1319-1327, 2007.
Buscaglia CA, Di Noia JM. Trypanosoma cruzi clonal diversity and the epidemiology of Chagas' disease. Microbes Infect 5:419-427, 2003.
Camargo EP. Growth and differentiation in Trypanosoma cruzi. I. Origin of metacyclic trypanosomes in liquid media. Rev Inst Med Trop Sao Paulo 6:93-100, 1964.
El-Sayed NM, et al. The genome sequence of Trypanosoma cruzi, etiologic agent of Chagas disease. Science 309(5733):409-415, 2005.
Franzén O, Ochaya S, Sherwood E, Lewis MD, Llewellyn MS, et al. Shotgun sequencing analysis of Trypanosoma cruzi I Sylvio X10/1 and comparison with T. cruzi VI CL Brener. PLoS Negl Trop Dis 5(3):e984, 2011.
Freitas JM, Augusto-Pinto L, Pimenta JR, Bastos-Rodrigues L, Gonçalves VF, Teixeira SM, Chiari E, Junqueira AC, Fernandes O, Macedo AM, Machado CR, Pena SD. Ancestral genomes, sex and the population structure of Trypanosoma cruzi. PLoS Pathog 2:e24, 2006.
Gaunt MW, Yeo M, Frame IA, Stothard JR, Carrasco HJ, Taylor MC, Mena SS, Veazey P, Miles GAJ, Acosta N, Arias AR, Miles MA. Mechanism of genetic exchange in American trypanosomes. Nature 421:936-939, 2003.
Ivens AC, Peacock CS, Worthey EA, Murphy L, Aggarwal G, Berriman M, Sisk E, Rajandream MA, et al. The genome of the kinetoplastid parasite, Leishmania major. Science 309:436-442, 2005.
Jackson AP, Sanders M, Berry A, McQuillan J, Aslett MA, Quail MA, Chukualim B, Capewell P, MacLeod A, Melville SE, Gibson W, Barry JD, Berriman M, Hertz-Fowler C. The genome sequence of Trypanosoma brucei gambiense, causative agent of chronic human African trypanosomiasis. PLoS Negl Trop Dis 4:e658, 2010.
Junqueira C, Santos LI, Galvão-Filho B, Teixeira SM, Rodrigues FG, DaRocha WD, Chiari E, Jungbluth AA, Ritter G, Gnjatic S, Old LJ, Gazzinelli RT. Trypanosoma cruzi as an effective cancer antigen delivery vector. Proc Natl Acad Sci USA 108(49):19695-19700, 2011.
Junqueira C, Gerrero AT, Galvão-Filho B, Andrade WA, Salgado AP, Cunha TM, Robert C, Campos MA, Penido ML, Mendonça-Previato L, Previato JO, Ritter G, Cunha FQ, Gazzinelli RT. Trypanosoma cruzi adjuvants potentiate T cell-mediated immunity induced by a NY-ESO-1 based antitumor vaccine. PLoS One 7, 2012.
Lima MT, Jansen AM, Rondinelli E, Gattass CR. Trypanosoma cruzi: properties of a clone isolated from the CL strain. Parasitol Res 77:77-81, 1990.
Lima MT, Lenzi HL, Gattass CR. Negative tissue parasitism in mice injected with a non-infective clone of Trypanosoma cruzi. Parasitol Res 81:6-12, 1995.
Machado CM, Ayala FJ. Nucleotide sequences provide evidence of genetic exchange among distantly related lineages of Trypanosoma cruzi. Proc Natl Acad Sci USA 98:7396-7401, 2001.
Peacock CS, Seeger K, Harris D, Murphy L, Ruiz JC, Quail MA, Peters N, Adlem E, Tivey A, et al. Comparative genomic analysis of three Leishmania species that cause diverse human disease. Nat Genet 39(7):839-847, 2007.
Raymond F, Boisvert S, Roy G, et al. Genome sequencing of the lizard parasite Leishmania tarentolae reveals loss of genes associated to the intracellular stage of human pathogenic species. Nucleic Acids Res 40:1131-1147, 2012.
Soares MB, Goncalves R, et al. Balanced cytokine-producing pattern in mice immunized with an avirulent Trypanosoma cruzi. An Acad Bras Cienc, p. 167-172, 2003.
Souza RT, Lima FM, Barros RM, Cortez DR, Santos MF, Cordero EM, Ruiz JC, Goldenberg S, Teixeira MMG, Franco da Silveira J. Genome size, karyotype polymorphism and chromosomal evolution in Trypanosoma cruzi. PLoS One 6(8):e23042, 2011.
Teixeira SM, Kirchhoff LV, Donelson JE. Post-transcriptional elements regulating expression of mRNAs from the amastin/tuzin gene cluster of Trypanosoma cruzi. J Biol Chem 270(38):22586-22594, 1995.
Weatherly DB, Boehlke C, Tarleton RL. Chromosome level assembly of the hybrid Trypanosoma cruzi genome. BMC Genomics 10:255, 2009.
Westenberger SJ, Cerqueira GC, El-Sayed NM, Zingales B, Campbell DA, Sturm NR. Trypanosoma cruzi mitochondrial maxicircles display species- and strain-specific variation and possess a conserved element in the non-coding region. BMC Genomics 7:60, 2006.
WHO. Chagas disease (American trypanosomiasis) fact sheet. In: Weekly epidemiological record. Geneva: World Health Organization, 2010, p. 334-336.
Zingales, B.; Stolf, B. S.; Souto, R. P.; Fernandes, O.; Briones, M. R. Epidemiology, biochemistry and evolution of Trypanosoma cruzi lineages based on ribosomal RNA sequences. Mem Inst Oswaldo Cruz 94: 159-164, 1999.
Zingales, B. et al.; Second Satellite Meeting. A new consensus for Trypanosoma cruzi intraspecific nomenclature: second revision meeting recommends TcI to TcVI. Mem Inst Oswaldo Cruz 104: 1051-1054, 2009.

Table 1: Summary of sequencing data for the CL Brener and CL-14 genomes.

T. cruzi groups    COII              SL     24S rDNA
Tc I               30, 81 and 264    150    110
Tc II              81, 82 and 212    150    125
Tc III             81 and 294        200    110
Tc IV              81 and 294        200    125
Tc V               81 and 294        150    110 and 125
Tc VI              81 and 294        150    125
CL-14              81 and 294        150    125

Table 2 – Molecular marker profiles derived from amplification of two nuclear genes (SL and rRNA) and one mitochondrial gene (COII) in different T. cruzi strains. Expected fragment lengths, in base pairs, are shown for strains belonging to each T. cruzi group and for the products generated in silico from CL-14 sequences.

Multigene family       CL-14    CL Brener    Identity (%)
Trans-sialidase        1463     1481         99.80
MASP                   1399     1465         99.87
Mucin                  999      992          97.82
RHS                    773      777          99.74
DGF                    565      569          99.84
GP63                   491      449          99.73
RNA helicase           156      157          99.68
Kinesin                102      102          98.78
Tuzin                  83       83           99.76
Cruzain (calpain)      67       66           99.05
Dynein heavy chain     45       45           99.35
Amastin                27       27           99.69
GAPDH                  21       20           99.74
MSH2                   2        2            100
PGP                    2        2            100

Table 3 – Number of predicted copies of members of multigene families and average identity between homologous genes.

Figure 1: Separation of chromosomal bands and Southern blot analysis of T. cruzi CL-14 and CL Brener strains. Panel A shows ethidium bromide staining of chromosomal bands from CL-14 and CL Brener epimastigotes separated by pulsed-field gel electrophoresis.
Hybridizations were performed with GPI8 and MASP DNA probes, corresponding to a single-copy gene and a multigene family, respectively. Panel B shows digested DNA bands that hybridized with 32P-labelled probes corresponding, respectively, to a member of the amastin gene family from CL Brener and to the DGF gene family.

Figure 2: Unrooted neighbour-joining trees based on predicted amino acid sequences of trypanothione reductase (TR), the mismatch repair protein MSH2 and cytochrome oxidase subunit II (COII), obtained from the genome databases of the T. cruzi CL Brener, CL-14 and Sylvio X-10 clones. For the two nuclear genes (TR and MSH2), sequences corresponding to the two alleles (Esmo-like and non-Esmo-like) are shown. Bootstrap values were calculated over 1000 trees from pseudo-replicate datasets.

Figure 3: A representation of the T. cruzi CL-14 maxicircle with all 18 annotated protein-coding genes, 2 ribosomal RNA (rRNA) genes and the repetitive region.

[Figure 4 image: panels A, B and C. Lanes: CL Br Try, CL Br Epi, CL-14 Try, CL-14 Epi; MW marker lane. DNA size labels: 100 to 500 bp, with a band at ~500 bp. Protein size labels: 177, 118, 75, 51, 39, 26 and 18 kDa. Panel C labels: Coomassie, Anti-SAPA, Anti-TS.]

Figure 4 – In vitro analysis of the number of SAPA repeats. Panel A shows Southern blots of digested DNA hybridized with SAPA probes. Panel B shows the PCR amplification of SAPA sequences. Panel C shows western blots of total protein extracts probed with anti-TS or anti-SAPA antibodies.

[Supplementary figure 1 image: three gel panels labelled rDNA, SL and COII. Size labels in bp: rDNA 300, 200, 125, 110, 100; SL 300, 200, 150, 100; COII 300, 294, 264, 212, 200, 100, 81.]

Supplementary figure 1: PCR amplification of microsatellite and maxicircle sequences from CL-14 and CL Brener.