Extraction of bacterial transcriptional regulatory interactions from biomedical literature using artificial intelligence
DOI:
https://doi.org/10.22201/dgtic.26832968e.2025.13.49
Keywords:
artificial intelligence, natural language processing, information extraction, regulatory interaction extraction, transcriptional regulation
Abstract
Bacterial transcriptional regulatory networks (TRNs) give a global view of the mechanisms bacteria use to respond to changes in their environment. Studying these networks expands biological knowledge and can lead to research with clinical or pharmaceutical implications. However, TRNs are traditionally reconstructed by hand, through a demanding and costly process of curating scientific articles. In this work, we describe the application of artificial intelligence (AI) approaches, specifically the fine-tuning of pre-trained BERT transformers, to extract TRNs automatically from the literature. Using 1,562 training sentences about the bacterium Escherichia coli, we compared six BERT-like architectures. The best fine-tuned model achieved strong results (F1-score: 0.8685; Matthews correlation coefficient: 0.8163). With this model, we correctly extracted 82% of a Salmonella TRN from 264 full-text articles. The transcription factor PhoP stood out in the resulting network as the node with the most connections (degree = 180), so we analyzed its gene community from a biological perspective. This work shows how AI can facilitate the extraction of biological knowledge that future biomedical studies could build on.
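The abstract reports three kinds of quantities: an F1-score and a Matthews correlation coefficient for the fine-tuned classifier, the share of a reference network that was recovered, and the degree of PhoP in the extracted network. The Python below is a minimal sketch of how such quantities are commonly computed, under stated assumptions: labels are binary (1 = the sentence expresses a regulatory interaction, 0 = it does not), and a network is a set of (regulator, target) pairs. The function names and toy data are hypothetical. The study itself may use multi-class labels with an averaged F1, and its degree computation may treat edge direction, duplicate interactions, or autoregulation differently.

```python
import math
from collections import Counter


def f1_and_mcc(y_true, y_pred):
    """Binary F1-score and Matthews correlation coefficient.

    Assumption: labels are 1 (sentence expresses a regulatory
    interaction) or 0 (it does not).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # MCC uses all four cells of the confusion matrix; 0.0 if any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return f1, mcc


def degree(edges):
    """Degree of every node in a TF -> target network, ignoring direction.

    Assumption: duplicate (regulator, target) pairs are collapsed first;
    a self-loop (autoregulation) adds 2 to its node, as in standard graph theory.
    """
    counts = Counter()
    for regulator, target in set(edges):
        counts[regulator] += 1
        counts[target] += 1
    return counts


def recovered_fraction(extracted, reference):
    """Share of reference (regulator, target) pairs present in the extracted set."""
    reference = set(reference)
    if not reference:
        return 0.0
    return len(reference & set(extracted)) / len(reference)


if __name__ == "__main__":
    # Toy data, purely illustrative (not from the study).
    f1, mcc = f1_and_mcc([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
    print(f"F1={f1:.4f}  MCC={mcc:.4f}")

    extracted = [("tfA", "geneX"), ("tfA", "geneZ"), ("tfB", "geneX")]
    reference = [("tfA", "geneX"), ("tfB", "geneY")]
    print(recovered_fraction(extracted, reference))
    print(degree(extracted)["tfA"])
```

Under these assumptions, `recovered_fraction` is a recall over the reference interactions. It ignores extracted pairs that are absent from the reference, so it says nothing about the precision of the extracted network. MCC is often reported alongside F1 because it also accounts for true negatives.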

Copyright 2025 Ali-Berenice Posada-Reyes, Carlos-Francisco Méndez-Cruz, Sara Berenice Martínez-Luna, Alfredo Varela-Vega

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
TIES, Revista de Tecnología e Innovación en Educación Superior, is a semiannual open-access publication under the Creative Commons Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0).
ISSN 2683-2968 • © 2025 Universidad Nacional Autónoma de México. TIES, Revista de Tecnología e Innovación en Educación Superior is published by the Universidad Nacional Autónoma de México through the Dirección General de Cómputo y de Tecnologías de Información y Comunicación (DGTIC). Circuito exterior s/n, Ciudad Universitaria, Alcaldía Coyoacán, C.P. 04510, Ciudad de México, México • Copyright reservation (Reserva de Derechos de Autor) granted by INDAUTOR: 04-2019-011816190900-203.
The content of the articles is the responsibility of the authors and does not reflect the views of the Editorial Committee, the Editor, or the Universidad Nacional Autónoma de México. Made in Mexico, 2025.