Extracting the values between two strings in a text file using Python

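Suppose I have a text file (inputfile.txt, about 10 GB in size).
I need to write Python code that reads the text file and copies the content between a start marker and an end marker to another file.
I wrote the following code.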


import re

with open(r'C:\Python27\log\master_input.txt', 'r') as infile, open(r'C:\Python27\log\output', 'w') as outfile:
    copy = False
    for line in infile:
        if re.match("Jun  6 17:58:16(.*)", line):
            copy = True
        elif re.match("Jun  6 17:58:31(.*)", line):
            copy = False
        elif copy:
            outfile.write(line)
 

I am not getting the expected output.
The output of my code is in Output_of_my_code.txt.
The expected output is in Expect_output.txt.
Please help me find the best way to do this.
Attached files: inputfile.txt (1.5 KB), Output_of_my_code.txt (215 bytes), Expect_output.txt (1.1 KB)

# Answer 1

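To get the desired output, use the re module to extract the integer that represents the seconds, then compare it against a lower and an upper bound. Here is an example: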


import re
 
data = """Jun  6 17:58:13 other strings
Jun  6 17:58:13 other strings
Jun  6 17:58:14 other strings
Jun  6 17:58:14 other strings
Jun  6 17:58:15 other strings
Jun  6 17:58:15 other strings
Jun  6 17:58:15 other strings
Jun  6 17:58:15 other strings
Jun  6 17:58:16 other strings
Jun  6 17:58:16 other strings
Jun  6 17:58:16 other strings
Jun  6 17:58:16 other strings
Jun  6 17:58:16 other strings
Jun  6 17:58:16 other strings
Jun  6 17:58:17 other strings
Jun  6 17:58:17 other strings
Jun  6 17:58:17 other strings
Jun  6 17:58:17 other strings
Jun  6 17:58:18 other strings
Jun  6 17:58:18 other strings
Jun  6 17:58:18 other strings
Jun  6 17:58:18 other strings
Jun  6 17:58:18 other strings
Jun  6 17:58:19 other strings
Jun  6 17:58:19 other strings
Jun  6 17:58:20 other strings
Jun  6 17:58:20 other strings
Jun  6 17:58:21 other strings
Jun  6 17:58:21 other strings
Jun  6 17:58:21 other strings
Jun  6 17:58:21 other strings
Jun  6 17:58:22 other strings
Jun  6 17:58:23 other strings
Jun  6 17:58:24 other strings
Jun  6 17:58:27 other strings
Jun  6 17:58:28 other strings
Jun  6 17:58:28 other strings
Jun  6 17:58:29 other strings
Jun  6 17:58:29 other strings
Jun  6 17:58:29 other strings
Jun  6 17:58:29 other strings
Jun  6 17:58:30 other strings
Jun  6 17:58:31 other strings
Jun  6 17:58:31 other strings
Jun  6 17:58:32 other strings
Jun  6 17:58:33 other strings
Jun  6 17:58:33 other strings
Jun  6 17:58:33 other strings
Jun  6 17:58:33 other strings"""
 
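# Capture the seconds field as group 1; a raw string keeps \d unescaped.
patt = re.compile(r"Jun  6 17:58:(\d+?) (.*)")
upper = 31
lower = 16

for line in data.split("\n"):
    m = patt.match(line)
    if m:
        # Compare the seconds as an integer against the inclusive bounds.
        i = int(m.group(1))
        if lower <= i <= upper:
            print(line)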
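
Because the input file in the question is around 10 GB, the same comparison can be applied while streaming the file one line at a time instead of holding the data in a string. The following is only a minimal sketch of that idea, not the answerer's code: the function name `copy_seconds_range` and its parameters are hypothetical, and it assumes every relevant line starts with `Jun  6 17:58:SS` as in the sample data.

```python
import io
import re

# Assumed line prefix taken from the sample data; group 1 is the seconds field.
PATT = re.compile(r"Jun  6 17:58:(\d+) ")


def copy_seconds_range(infile, outfile, lower, upper):
    """Copy lines whose seconds lie in [lower, upper]; return how many were copied."""
    copied = 0
    for line in infile:  # file objects yield one line at a time
        m = PATT.match(line)
        if m and lower <= int(m.group(1)) <= upper:
            outfile.write(line)
            copied += 1
    return copied


# Demo on in-memory files; with real files, pass the objects returned by open().
sample = io.StringIO(
    u"Jun  6 17:58:15 other strings\n"
    u"Jun  6 17:58:16 other strings\n"
    u"Jun  6 17:58:31 other strings\n"
    u"Jun  6 17:58:32 other strings\n"
)
result = io.StringIO()
copied = copy_seconds_range(sample, result, 16, 31)
```

Unlike a start/stop flag, this keeps the lines that carry the boundary timestamps themselves, and memory use stays flat regardless of file size.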
 

# Answer 2

@bvdet: Thank you for your solution. In my case I do not know the upper and lower bound values. How did you get those values?

# Answer 3

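In your original post, you knew the upper and lower bounds. How did you determine them? If you are going to work with dates and times rather than strictly formatted data, consider using the time and datetime modules. Here is an example of creating a datetime object from a date/time string: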


>>> import datetime
>>> datetime.datetime.strptime("Jun  6 17:58:13", "%b  %d %H:%M:%S")
datetime.datetime(1900, 6, 6, 17, 58, 13)
>>>
 

From there you can create timedelta objects:


>>> d1 = datetime.datetime.strptime("Jun  6 17:58:13", "%b  %d %H:%M:%S")
>>> d2 = datetime.datetime.strptime("Jun  7 12:55:48", "%b  %d %H:%M:%S")
>>> d1-d2
datetime.timedelta(-1, 18145)
>>> d2-d1
datetime.timedelta(0, 68255)
>>> dt1 = d1-d2
>>> dt1.days
-1
>>> dt1.total_seconds()
-68255.0
>>> dt2 = d2-d1
>>> dt2.days
0
>>> dt2.total_seconds()
68255.0
>>> 
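
The bounds do not have to be worked out by hand: they can be taken from the start and end timestamps used in the original question ("Jun  6 17:58:16" and "Jun  6 17:58:31") by parsing them with strptime and comparing datetime objects. The following is a sketch under stated assumptions, not a definitive implementation: the helper names `parse_stamp` and `lines_between` are hypothetical, each line is assumed to begin with a 15-character timestamp in the format shown above, month names are assumed to be English, and a fixed year is supplied because the log lines carry none.

```python
import datetime

STAMP_LENGTH = 15      # len("Jun  6 17:58:16")
ASSUMED_YEAR = "1900"  # assumption: the log has no year; replace with the real one
STAMP_FORMAT = "%Y %b %d %H:%M:%S"


def parse_stamp(text):
    """Parse a leading 'Jun  6 17:58:16' style timestamp; return None on failure."""
    try:
        return datetime.datetime.strptime(
            ASSUMED_YEAR + " " + text[:STAMP_LENGTH], STAMP_FORMAT)
    except ValueError:
        return None


def lines_between(lines, start, end):
    """Yield lines whose timestamp lies in the inclusive range [start, end]."""
    first = parse_stamp(start)
    last = parse_stamp(end)
    if first is None or last is None:
        raise ValueError("start and end must be valid timestamps")
    for line in lines:
        stamp = parse_stamp(line)
        if stamp is not None and first <= stamp <= last:
            yield line


# Demo on a small list; an open file object can be passed in the same way.
sample = [
    "Jun  6 17:58:15 other strings",
    "Jun  6 17:58:16 other strings",
    "Jun  6 17:58:31 other strings",
    "Jun  6 17:58:32 other strings",
    "not a log line",
]
selected = list(lines_between(sample, "Jun  6 17:58:16", "Jun  6 17:58:31"))
```

Comparing datetime objects rather than the seconds field alone means the range can also span minutes, hours, or days.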
 
