[Rosalind] Translating RNA into Protein


This content is licensed under CC 4.0 BY-SA.


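This post shows how to translate an mRNA sequence into its protein sequence with Python's Bio.Seq module. The mRNA string is read from the input file and translated with the translate() method using to_stop=True, which returns the protein sequence up to the first stop codon. The result is written to the output file.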

Problem Statement

Problem

The 20 commonly occurring amino acids are abbreviated by using 20 letters from the English alphabet (all letters except for B, J, O, U, X, and Z). Protein strings are constructed from these 20 symbols. Henceforth, the term genetic string will incorporate protein strings along with DNA strings and RNA strings.
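As a small illustration of the alphabet described above, the sketch below derives the 20 amino-acid letters by removing B, J, O, U, X and Z from the English alphabet and checks whether a string uses only those letters. The names `AMINO_ACIDS` and `is_protein_string` are hypothetical helpers for this example, not part of Rosalind or Biopython.

```python
import string

# The 20 standard amino-acid letters: the English alphabet minus B, J, O, U, X, Z
EXCLUDED = set("BJOUXZ")
AMINO_ACIDS = "".join(c for c in string.ascii_uppercase if c not in EXCLUDED)


def is_protein_string(s: str) -> bool:
    """Return True if every character of s is one of the 20 amino-acid letters.

    Note: an empty string is (vacuously) accepted.
    """
    return all(c in AMINO_ACIDS for c in s)


print(AMINO_ACIDS)                            # ACDEFGHIKLMNPQRSTVWY
print(is_protein_string("MAMAPRTEINSTRING"))  # True
print(is_protein_string("MAMAB"))             # False
```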

The RNA codon table dictates the details regarding the encoding of specific codons into the amino acid alphabet.
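The codon table can be written out as a plain Python dictionary. This is a minimal sketch assuming the standard genetic code (NCBI translation table 1, which is also Biopython's default table); `CODON_TABLE` is a hypothetical name. The 64-letter string gives the amino acid for each codon with the bases ordered U, C, A, G and the third position varying fastest, and `*` marks a stop codon.

```python
from itertools import product

# Standard genetic code (NCBI table 1). Codons are enumerated as
# UUU, UUC, UUA, UUG, UCU, ..., GGG (third base varies fastest).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each three-letter codon to its one-letter amino acid ('*' = stop)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

stops = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
print(CODON_TABLE["AUG"])  # M  (start codon, methionine)
print(stops)               # ['UAA', 'UAG', 'UGA']
```

The table has 64 entries: 61 sense codons covering the 20 amino acids, plus 3 stop codons.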

Given: An RNA string $s$ corresponding to a strand of mRNA (of length at most 10 kbp).

Return: The protein string encoded by $s$.

Sample Dataset

AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA

Sample Output

MAMAPRTEINSTRING

Problem Summary

Given an mRNA sequence, translate it into a protein.

Solution

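Use the translate() method from Bio.Seq to translate the mRNA sequence into a protein sequence. By default, each stop codon is rendered as '*' and translation continues past it. To stop at the first stop codon, pass to_stop=True: translation then ends at the first stop codon and no '*' is output.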
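To make the difference between the default behavior and to_stop=True concrete without requiring Biopython, here is a pure-Python sketch. `translate_rna` is a hypothetical helper that only imitates the behavior described above for Seq.translate(); it assumes an uppercase RNA string containing only A, C, G and U, and it silently ignores trailing bases that do not form a full codon (whereas Biopython issues a warning for a partial codon).

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons ordered UUU, UUC, ..., GGG
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_rna(rna: str, to_stop: bool = False) -> str:
    """Translate an RNA string codon by codon (hypothetical helper, not Biopython).

    By default every stop codon becomes '*' and translation continues.
    With to_stop=True, translation ends at the first stop codon and the
    '*' is not included. Trailing bases that do not form a codon are ignored.
    """
    protein = []
    # Step through complete codons only
    for i in range(0, len(rna) - len(rna) % 3, 3):
        aa = CODON_TABLE[rna[i:i + 3]]
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


sample = "AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA"
print(translate_rna(sample))                        # MAMAPRTEINSTRING*
print(translate_rna(sample, to_stop=True))          # MAMAPRTEINSTRING
print(translate_rna("AUGGCCUGAAUG"))                # MA*M
print(translate_rna("AUGGCCUGAAUG", to_stop=True))  # MA
```

The last two calls show why to_stop=True matters: without it, translation runs through an internal stop codon and keeps going.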

Reference Code

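from Bio.Seq import Seq

# Read the mRNA string from the Rosalind input file, dropping the trailing newline
with open("rosalind_prot.txt", "r") as f:
    s = f.read().rstrip()

# Bio.Alphabet (including IUPAC) was removed in Biopython 1.78,
# so the sequence is created without an alphabet argument
mRNA = Seq(s)

# to_stop=True ends translation at the first stop codon and omits the '*'
with open("out.txt", "w") as f:
    f.write(str(mRNA.translate(to_stop=True)))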