Loop over two files and count how often each string from the second file occurs in each line of the first
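I need to generate permutations of the letters A, T, G, C. These are in effect dinucleotides (AA, AT, AG, AC, ...), trinucleotides (AAA, AAT, AAC, AAG, ...), tetranucleotides, pentanucleotides and so on (one length at a time). I then need to check them against another file that contains sequences with associated values, and count how often each permutation occurs in each sequence. I have generated the list of permutations. Now I just need to loop over the sequences (splitting each sequence from its value), count every permutation generated above, and write the output to a new file. But I only get an answer for one sequence and not for the others.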
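The program logic I am trying to follow is:
Generate the permutations of ATCG in file 1 (e.g. AT AG AC AA ...)
Read the generated file 1 and the Sequence#Value file (DNA_seq_val.txt)
Read each sequence and separate the sequence from its value
Loop over the permutations for each sequence and print their counts in the result file (each value separated by a comma).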
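The steps above can be sketched in memory, without the intermediate file, roughly as follows. This is a minimal sketch, not the script from this question: the helper names `kmers`, `parse_record` and `count_line` are hypothetical, and it assumes every record has the form `SEQUENCE#VALUE`.

```python
from itertools import product

def kmers(k, alphabet="ACGT"):
    """Return every length-k string over the alphabet, in product() order."""
    return ["".join(p) for p in product(alphabet, repeat=k)]

def parse_record(record):
    """Split one 'SEQUENCE#VALUE' line into (sequence, value)."""
    seq, value = record.strip().split("#", 1)
    return seq, value

def count_line(record, words):
    """Build one output line: comma-separated counts, the value, a tab, the sequence."""
    seq, value = parse_record(record)
    counts = [str(seq.count(w)) for w in words]  # str.count is non-overlapping
    return ",".join(counts + [value]) + "\t" + seq
```

For example, `count_line("AAAATTTT#99", kmers(2))` produces the counts for all 16 two-letter strings followed by `99` and the sequence.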
The input test file is named DNA_seq_val.txt:
AAAATTTT#99
CCCCGGGG#77
ATATATCGCGCG#88
The output I get is:
2,0,1,0,0,0,0,0,0,0,0,0,0,2,99 AAAATTTT
77 CCGGGG
88 ATATCGCGCG
The desired output is:
2,0,0,1,0,0,0,0,0,0,0,0,0,0,0,2,99 AAAATTTT
x,x,77 CCCCGGGG
x,x,88 ATATATCGCGCG
(where x = the corresponding counts, laid out as in the first line)
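One detail worth checking against the desired output: Python's `str.count` counts non-overlapping occurrences, which is why `AA` scores 2 in `AAAATTTT`. If overlapping matches were wanted instead, a small helper along these lines could be used (`count_overlapping` is a hypothetical name, not part of the script in this question):

```python
def count_overlapping(seq, word):
    """Count occurrences of word in seq, allowing matches to overlap."""
    return sum(seq.startswith(word, i) for i in range(len(seq) - len(word) + 1))

seq = "AAAATTTT"
non_overlapping = seq.count("AA")          # matches at positions 0 and 2
overlapping = count_overlapping(seq, "AA")  # matches at positions 0, 1 and 2
```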
from itertools import product
import os
f2 = open('TRYYY', 'a')
#********Generate the permutations start********
per = product('ACGT', repeat=2) # ACGT = nucleotides; 2 = dinucleotides (replace 2 with 3 for trinucleotides and so on)
f = open('myfile', 'w')
p = ""
for p in per:
    p = "".join(p)
    f.write(p + "\n")
f.close()
#********Generate the permutations ENDS********
with open('DNA_seq_val.txt', 'r+') as SEQ, open('myfile', 'r+') as TET: #open two files
    SEQ_lines = sum(1 for line in open('DNA_seq_val.txt')) #count lines in sequences file
    #print (SEQ_lines)
    compo_lines = sum(1 for line in open('myfile')) #count lines in composition
    #print (compo_lines)
    for lines in SEQ:
        line,val1 = lines.split("#")
        val2 = val1.rstrip('\n')
        val = str(val2)
        line = line.rstrip('\n')
        length =len(line)
        #print (line)
        #print (val)
        LIN = line, val
        #print (LIN)
        newstr = "".join((line))
        print (newstr)
        #while True: # infinite loop
        for PER in TET:
            #print (line)
            PER = PER.rstrip('\n')
            length2 =len(PER)
            #print (length2)
            #print (line)
            # print (PER)
            C_PER = str(line.count(PER))
            # print (C_PER)
            for R in C_PER:
                R1 = "".join(R)
                f2.write(R1+ ",")
        f2.write(val,)
        f2.write('\t')
        f2.write(line)
        f2.write('\n')
        #exit()
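A likely cause of the missing counts, offered as a sketch rather than a definitive fix: an open file object is an iterator, so after the first pass of `for PER in TET` it sits at end of file and every later pass yields nothing. Calling `TET.seek(0)` before the inner loop, or keeping the permutations in a list, avoids this. The version below takes the list approach. The function name `write_counts` and its parameters are hypothetical, and it opens the output with `"w"` where the script above uses `'a'`. It also writes each count as a whole string, because looping over the characters of `str(count)` would split a two-digit count into separate fields.

```python
from itertools import product

def write_counts(seq_path, out_path, k=2):
    """Write one line per sequence: counts of every k-letter word, value, tab, sequence."""
    # Build the words once, in memory; a list can be looped over repeatedly,
    # unlike an open file object, which is exhausted after one pass.
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    with open(seq_path) as seq_file, open(out_path, "w") as out:
        for record in seq_file:
            record = record.strip()
            if not record:
                continue  # skip blank lines
            seq, value = record.split("#", 1)
            counts = [str(seq.count(w)) for w in words]  # non-overlapping counts
            out.write(",".join(counts) + "," + value + "\t" + seq + "\n")
```

With the file names from this question, the call would be `write_counts("DNA_seq_val.txt", "TRYYY")`, and `k=3` would give the trinucleotide counts.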