Parsing tab-delimited .txt files with common and distinct attributes

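I want to parse tab-delimited .txt files and separate the attributes that are common to all files from the ones that differ. I only want to parse the attributes in the first line, not the values. Could you correct this script? The file can be downloaded from this URL:
ftp://ftp.ebi.ac.uk/pub/databases/mi...mx-10.sdrf.txt
The source code I wrote is as follows: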

```python
#!/usr/bin/python
import glob
outfile = open('output_attribute.txt', 'w')
files = glob.glob('*.sdrf.txt')
for file in files:
    infile = open(file)
    #ret = False
    for line in infile:
        lineArray = line.split('\t')

        if '\n\n' in line:
            ret = False
            outfile.write('')
            break
        elif len(lineArray) > 2:
            output = "%s\t%s\n\n" % (lineArray[0], lineArray[1])
            outfile.write(output)
        else:
            output = "%s\t\n" % (lineArray[0])
            outfile.write(output)
    infile.close()
outfile.close()
```
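A minimal sketch of the comparison being asked for, assuming a file's attributes are the tab-separated fields of its first line (the header row of an SDRF file). Once each header is a set, the common attributes are the intersection of all the sets, and a file's distinct attributes are those that no other file has. `read_header` and `compare_headers` are hypothetical names, not taken from the thread:

```python
import glob


def read_header(path):
    """Return the non-empty tab-separated fields of a file's first line."""
    with open(path) as f:
        first = f.readline()
    return [field.strip() for field in first.rstrip('\n').split('\t') if field.strip()]


def compare_headers(headers):
    """headers maps file name -> list of attribute names.

    Returns (common, distinct): the attributes found in every file, and a dict
    mapping each file to the attributes that appear in no other file.
    """
    sets = {name: set(attrs) for name, attrs in headers.items()}
    if not sets:
        return set(), {}
    common = set.intersection(*sets.values())
    distinct = {}
    for name, attrs in sets.items():
        others = set()
        for other, other_attrs in sets.items():
            if other != name:
                others |= other_attrs
        distinct[name] = attrs - others
    return common, distinct


if __name__ == '__main__':
    headers = {path: read_header(path) for path in glob.glob('*.sdrf.txt')}
    common, distinct = compare_headers(headers)
    print('Common attributes:\n\t' + '\n\t'.join(sorted(common)))
    for name in sorted(distinct):
        print('%s:\n\t%s' % (name, '\n\t'.join(sorted(distinct[name]))))
```

Only the first line of each file is read, so the size of the data tables does not matter.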
 

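# Answer 1

I'm not clear on your end goal. It looks like you want to read several files, read the first line of each file, split that line on the tab character, and write the first two elements to an output file. Can you clarify the output you want?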
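For comparison, the behaviour described above (first line only, split on tabs, keep the first two fields) can be sketched as follows. It prints instead of writing a file, and `first_two_fields` is a hypothetical helper name:

```python
import glob


def first_two_fields(line):
    """Split a line on tabs and return at most its first two fields."""
    return line.rstrip('\n').split('\t')[:2]


if __name__ == '__main__':
    for path in glob.glob('*.sdrf.txt'):
        with open(path) as f:
            first_line = f.readline()      # only the first line is read
        print('\t'.join(first_two_fields(first_line)))
```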

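# Answer 2

Dear,
Please find the attached zip file. I only want to extract the header from each parsed file. In every file the header starts with Array Design Name, but the attributes that follow are not fixed. So I want to extract the headers, each separated by a blank line (\n\n), as in the attached zip file. I only want to extract the header lines shown in red. I appreciate your support and cooperation.
With regards,
Haobijam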

Attached file: header (73.7 KB)

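# Answer 3

I only want to extract the header from each parsed file. In every file the header starts with Array Design Name but ends with attributes that vary from file to file. So I want to extract the headers, each separated by a blank line (\n\n), as in the attached zip file. I only want to extract the header lines shown in red. I have attached the output of this script. I appreciate your support and cooperation.
The file can be downloaded from this URL:
ftp://ftp.ebi.ac.uk/pub/databases/mi...ffy-10.adf.txt
The source code I wrote is as follows: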

```python
#!/usr/bin/python
import glob

outfile = open('output_attri.txt', 'w')
files = glob.glob('*.adf.txt')

for file in files:
    infile = open(file)

    for line in infile:
        line = line.replace('^', '\n\n').replace('!', '').replace('#', '').replace('\n', '')
        lineArray = line.split('\t')   # split on the tab character itself
        if not line.strip():
            # a blank line ends the header block; separate it from the next file
            outfile.write('\n')
            break
        elif len(lineArray) > 2:
            output = "%s\t%s\n" % (lineArray[0], lineArray[1])
            outfile.write(output)
        else:
            output = "%s\t\n" % (lineArray[0])
            outfile.write(output)
    infile.close()
outfile.close()
```
 

With regards,
Haobijam

Attached file: output_attribute.zip (2.46 MB)

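# Answer 4

Is there always a blank line separating the header information you want from the data you don't want? Do you only want the first two elements of each header line? Untested: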

```python
outFile = open(outFileName, 'w')
for fn in fileNameList:
    f = open(fn)
    output = []
    for line in f:
        line = line.strip().split("\t")
        if line:
            output.append("\t".join(line[:2]))
        else:
            outFile.write("\n".join(output))
            break
outFile.close()
```
 

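# Answer 5

Dear,
Yes, in all the files there is always a blank line separating the header information I want from the text data I don't want to extract.
Regards,
Haobijam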
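Since every file has a blank line between the header and the data, the header can be read as "all lines up to the first blank one". A minimal sketch using `itertools.takewhile`, which stops at that blank line instead of iterating over the rest of the file; `read_header_block` is a hypothetical name:

```python
import glob
import itertools


def read_header_block(lines):
    """Return the lines before the first blank line, newlines removed.

    `lines` can be an open file or any iterable of strings; takewhile stops
    at the first empty or whitespace-only line and does not go past it.
    """
    stripped = (line.rstrip('\n') for line in lines)
    return list(itertools.takewhile(lambda s: s.strip() != '', stripped))


if __name__ == '__main__':
    for path in glob.glob('*.adf.txt'):
        with open(path) as f:
            header = read_header_block(f)
        print('%s: %i header lines' % (path, len(header)))
        print('\n'.join(header) + '\n')
```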

# Answer 6

Dear,
What's wrong with this script? I don't get any output.

```python
#!/usr/bin/python
import glob

outFile = open('output.txt', 'w')
fileNameList = glob.glob('*.adf.txt')
for file in fileNameList:
    f = open(file)
    output = []
    for line in f:
        line = line.strip().split("\t")
        #lineArray = line.split('\t')
        if line:
            #output = "%s\t%s\n"%(lineArray[0],lineArray[1])
            output.append("\t".join(line[:2]))
        else:
            outFile.write("\n".join(output))
            break
    f.close()
outFile.close()
```
 

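# Answer 7

Please use code tags when posting code. That way I won't have to edit your post.
There are no print statements. Is there anything in the output file? Add a print statement such as
`print(line)`
to see what is being read.
BV - Moderator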
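Following the moderator's suggestion, printing `repr(line)` rather than `line` makes invisible characters such as tabs and newlines show up as escape sequences, which is usually what you need to see when a test like `if line:` behaves unexpectedly. A small sketch; `show_lines` and the default limit of 5 lines are illustrative choices:

```python
import glob
import itertools


def show_lines(lines, limit=5):
    """Print repr() of the first `limit` items of `lines` and return them."""
    head = list(itertools.islice(lines, limit))
    for number, line in enumerate(head, 1):
        print(number, repr(line))
    return head


if __name__ == '__main__':
    for path in glob.glob('*.adf.txt'):
        print('==', path)
        with open(path) as f:
            show_lines(f)
```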

# Answer 8

This writes the header information to disk:

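```python
import glob

outFileName = 'output.txt'              # output name as in the script above
fileNameList = glob.glob('*.adf.txt')   # input pattern as in the script above
outFile = open(outFileName, 'w')
for fn in fileNameList:
    f = open(fn)
    output = []
    for line in f:
        line = line.strip()
        if line:
            output.append(line)
        else:
            # blank line: the header block is complete
            outFile.write("\n".join(output) + "\n\n")
            break
    f.close()
outFile.close()
```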
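The important difference from the earlier versions is where the blank-line test happens. `str.split` always returns at least one element, so splitting a blank line first gives `['']`, a non-empty list that counts as true; the `else` branch that writes the output is then never reached, which would explain an empty output file. Testing the stripped string before splitting avoids this. A minimal illustration:

```python
def is_blank_after_split(line):
    """The test used in the earlier scripts: split first, then check."""
    return not line.strip().split('\t')


def is_blank(line):
    """Test the stripped string itself before any splitting."""
    return not line.strip()


if __name__ == '__main__':
    blank = '\n'
    print(repr(blank.strip().split('\t')))   # [''] is a non-empty list
    print(is_blank_after_split(blank))       # False: the blank line is missed
    print(is_blank(blank))                   # True
```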
 
 

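# Answer 9

Note that the following test will never match, because the two newlines are read as two separate records (lines). Test len(line.strip()) instead of looking for an empty record.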

```python
if '\n\n' in line:
```
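To see why, note that iterating over a file yields one line at a time, each ending in a single `'\n'`; a blank line arrives as its own item `'\n'`, so no single item ever contains `'\n\n'`. The sketch below uses `io.StringIO` as a stand-in for a real file, with made-up contents:

```python
import io

text = 'Array Design Name\tX\n\nReporter Name\tY\n'
lines = list(io.StringIO(text))
print(lines)
# ['Array Design Name\tX\n', '\n', 'Reporter Name\tY\n']

print(any('\n\n' in line for line in lines))        # False
print([len(line.strip()) == 0 for line in lines])   # [False, True, False]
```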

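# Answer 10

Dear Sir,
I have written a script that extracts the first line, which starts with Source Name and ends with Comment[ArrayExpress Data Retrieval URI], and that part works, but I cannot parse the distinct (unique) attributes that are not repeated in the other files. I want to parse only the attributes in the first line, not the table values. Could you correct this script? I have attached a zip file with all the sdrf.txt files. The file can be downloaded from this URL:
ftp://ftp.ebi.ac.uk/pub/databases/mi...fmx-1.sdrf.txt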

Regards,
Haobijam

Attached files: sdrf.txt.zip (95.7 KB), sdrf.txt (536 bytes), output_att.zip (3.0 KB)

# Answer 11

```python
#!/usr/bin/python
import glob

outfile = open('output_att.txt', 'w')
files = glob.glob('*.sdrf.txt')
for file in files:
    infile = open(file)
    for line in infile:
        # keep only the header row, which starts with 'Source Name'
        if not line.startswith('Source Name'): continue
        header = line.rstrip('\n')   # drop the trailing newline
        print(header)
        outfile.write(header + '\n')
    infile.close()
outfile.close()
```
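A sketch of the extraction step on its own, assuming (as the script above does) that the attribute row is the first line beginning with `Source Name`. Returning as soon as that row is found means the rest of the table is never read; `header_row` is a hypothetical helper:

```python
import glob


def header_row(lines, prefix='Source Name'):
    """Return the tab-separated fields of the first line starting with
    `prefix`, or None if there is no such line."""
    for line in lines:
        if line.startswith(prefix):
            return [field.strip() for field in line.rstrip('\n').split('\t')]
    return None


if __name__ == '__main__':
    for path in sorted(glob.glob('*.sdrf.txt')):
        with open(path) as f:
            fields = header_row(f)
        print(path, fields)
```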
 
 

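# Answer 12

Dear Sir,
I want to extract the unique terms across all the sdrf.txt files, but this Python code outputs the unique terms for each file separately. Terms such as Array Data File and Array Design REF are repeated in most of the sdrf.txt files, so I don't want them printed as unique terms. Could you also tell me how to ignore case in Python? Characteristics[Organism] is printed as a unique term separate from Characteristics[organism], and the same happens with Characteristics[Sex] and Characteristics[sex]. I eagerly await your support and a positive reply.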

```python
#!/usr/bin/python
import glob

outfile = open('output.txt', 'w')
files = glob.glob('*.sdrf.txt')
previous = set()
for file in files:
    print('\n' + file)
    infile = open(file)
    #previous = set() # uncomment this if terms need not be unique between the files
    for line in infile:
        if not line.startswith('Source Name'): continue
        header = line.rstrip('\n')
        outfile.write(header + '\n')
        # terms in this header that no earlier file has used
        uniqwords = set(word.strip() for word in header.split('\t')
                        if word.strip() and word.strip() not in previous)
        print('The %i unique terms are:\n\t%s' % (len(uniqwords), '\n\t'.join(sorted(uniqwords))))
        previous |= uniqwords
    infile.close()
outfile.close()
print('=' * 80)
print('The %i terms are:\n\t%s' % (len(previous), '\n\t'.join(sorted(previous))))
```
 
 

Attached files: sdrf.zip (95.7 KB), attribute.zip (2.9 KB)
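The script above reports the terms each file adds that earlier files did not have, which is not the same as terms that occur in only one file. One way to get both results is to count how many files each term appears in, after normalising case with `str.lower()` and removing whitespace, so that `Characteristics[Sex]` and `Characteristics [sex]` count as the same term. Terms found in every file are then common, and terms found in exactly one file are unique. The normalisation rule is an assumption; adjust it if spacing differences should matter:

```python
from collections import Counter


def normalise(term):
    """Lower-case a term and drop all whitespace (assumed matching rule)."""
    return ''.join(term.split()).lower()


def classify_terms(headers):
    """headers maps file name -> list of header fields.

    Returns (common, unique): normalised terms present in every file, and
    normalised terms present in exactly one file.
    """
    counts = Counter()
    for fields in headers.values():
        # a set, so a term repeated within one file is counted only once
        counts.update({normalise(f) for f in fields if f.strip()})
    n_files = len(headers)
    common = {t for t, c in counts.items() if c == n_files}
    unique = {t for t, c in counts.items() if c == 1}
    return common, unique


if __name__ == '__main__':
    import glob
    headers = {}
    for path in glob.glob('*.sdrf.txt'):
        with open(path) as f:
            for line in f:
                if line.startswith('Source Name'):
                    headers[path] = line.rstrip('\n').split('\t')
                    break
    common, unique = classify_terms(headers)
    print('Common:', sorted(common))
    print('Unique:', sorted(unique))
```

The results are in normalised form; if the original spelling is needed for output, keep a dict from each normalised term to the first spelling seen.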

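# Answer 13

Dear Sir,
I have a question about parsing the attributes and extracting the unique terms of the adf.txt files from ArrayExpress (
ftp://ftp.ebi.ac.uk/pub/databases/mi...y/data/array/
). The Python code below works for individual files that share the same starting term, but it is not feasible to run it on all of the roughly 2270 adf.txt files at once. Could you correct this Python code, or give me some hints about line 12 (the `line.startswith('Reporter Name')` test)? What I actually want is to parse the first line of the table in each adf.txt file (about 2270 of them) and then extract the unique terms and the common terms from them. For convenience I have attached a zip file of adf.txt files, but you can also get them from the FTP site mentioned above. I appreciate your support and cooperation.
Warm regards,
Haobijam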

```python
#!/usr/bin/python
import glob

with open('output_Reporter Name.txt', 'w') as outfile:
    files = glob.glob('*.adf.txt')
    uniqwords = set()
    previous = set()
    for file in files:
        with open(file) as infile:
            #previous = set() # uncomment this if terms need not be unique between the files
            for line in infile:
                if not line.startswith('Reporter Name'): continue  ## change this line to deal with other formats
                output = line
                # terms in this header that no earlier file has used
                uniqwords = set(word.strip() for word in line.rstrip().split('\t')
                                if word.strip() and word.strip() not in previous)
                previous |= uniqwords
                print(output.rstrip('\n'))
                outfile.write(output)
                print('The %i unique terms in %s are:\n\t%s' % (len(uniqwords), file, '\n\t'.join(sorted(uniqwords))))
                break  # header found: skip the rest of the table
print('=' * 80)
print('The %i terms are:\n\t%s' % (len(previous), '\n\t'.join(sorted(previous))))
```
 
 

Attached file: adf.zip (1.01 MB)
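A sketch for the ADF case, under two assumptions: the table header may start with one of several column names (the tuple below is illustrative; extend it with whatever first columns your files actually use), and only that one row is needed from each file, so reading stops as soon as it is found instead of scanning every reporter row. That early exit is likely what matters most when running over roughly 2270 large files, and files where no header matches are listed rather than silently skipped:

```python
import glob

# Assumed possible first columns of the ADF table header; extend as needed.
HEADER_PREFIXES = ('Reporter Name', 'Composite Element Name')


def adf_header(path, prefixes=HEADER_PREFIXES):
    """Return the fields of the first line starting with any of `prefixes`,
    or None. Stops reading the file as soon as that line is found."""
    with open(path) as f:
        for line in f:
            if line.startswith(prefixes):   # startswith accepts a tuple
                return [w.strip() for w in line.rstrip('\n').split('\t') if w.strip()]
    return None


def collect(paths):
    """Map each path to its header fields and list files with no header."""
    headers, missing = {}, []
    for path in paths:
        fields = adf_header(path)
        if fields is None:
            missing.append(path)
        else:
            headers[path] = fields
    return headers, missing


if __name__ == '__main__':
    headers, missing = collect(sorted(glob.glob('*.adf.txt')))
    print('%i files with a header, %i without' % (len(headers), len(missing)))
    for path in missing:
        print('no header found in', path)
```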
