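Suppose I have a text file (inputfile.txt, about 10 GB in size).
I need to write Python code that reads the text file and copies the content between a start line and an end line to another file.
I wrote the following code.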
- import re
-
- with open(r'C:\Python27\log\master_input.txt', 'r') as infile, open(r'C:\Python27\log\output', 'w') as outfile:
-     copy = False
-     for line in infile:
-         if re.match("Jun 6 17:58:16(.*)", line):
-             copy = True
-         elif re.match("Jun 6 17:58:31(.*)", line):
-             copy = False
-         elif copy:
-             outfile.write(line)
I am not getting the expected output:
Output of my code (Output_of_my_code.txt):
Expected output (Expect_output.txt):
Please help me do this in the best way.
Attached files:
inputfile.txt (1.5 KB)
Output_of_my_code.txt (215 bytes)
Expect_output.txt (1.1 KB)
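# Answer 1
To get the desired output, use re to extract the seconds as an integer and compare it against a lower and an upper bound. Here is an example: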
- import re
-
- data = """Jun 6 17:58:13 other strings
- Jun 6 17:58:13 other strings
- Jun 6 17:58:14 other strings
- Jun 6 17:58:14 other strings
- Jun 6 17:58:15 other strings
- Jun 6 17:58:15 other strings
- Jun 6 17:58:15 other strings
- Jun 6 17:58:15 other strings
- Jun 6 17:58:16 other strings
- Jun 6 17:58:16 other strings
- Jun 6 17:58:16 other strings
- Jun 6 17:58:16 other strings
- Jun 6 17:58:16 other strings
- Jun 6 17:58:16 other strings
- Jun 6 17:58:17 other strings
- Jun 6 17:58:17 other strings
- Jun 6 17:58:17 other strings
- Jun 6 17:58:17 other strings
- Jun 6 17:58:18 other strings
- Jun 6 17:58:18 other strings
- Jun 6 17:58:18 other strings
- Jun 6 17:58:18 other strings
- Jun 6 17:58:18 other strings
- Jun 6 17:58:19 other strings
- Jun 6 17:58:19 other strings
- Jun 6 17:58:20 other strings
- Jun 6 17:58:20 other strings
- Jun 6 17:58:21 other strings
- Jun 6 17:58:21 other strings
- Jun 6 17:58:21 other strings
- Jun 6 17:58:21 other strings
- Jun 6 17:58:22 other strings
- Jun 6 17:58:23 other strings
- Jun 6 17:58:24 other strings
- Jun 6 17:58:27 other strings
- Jun 6 17:58:28 other strings
- Jun 6 17:58:28 other strings
- Jun 6 17:58:29 other strings
- Jun 6 17:58:29 other strings
- Jun 6 17:58:29 other strings
- Jun 6 17:58:29 other strings
- Jun 6 17:58:30 other strings
- Jun 6 17:58:31 other strings
- Jun 6 17:58:31 other strings
- Jun 6 17:58:32 other strings
- Jun 6 17:58:33 other strings
- Jun 6 17:58:33 other strings
- Jun 6 17:58:33 other strings
- Jun 6 17:58:33 other strings"""
-
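- # Capture the seconds field; the raw string keeps \d from being read as a string escape.
- patt = re.compile(r"Jun 6 17:58:(\d+?) (.*)")
- upper = 31
- lower = 16
-
- for line in data.split("\n"):
-     m = patt.match(line)
-     if m:
-         i = int(m.group(1))
-         # Keep lines whose seconds value lies within [lower, upper].
-         if lower <= i <= upper:
-             print(line)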
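For a file of roughly 10 GB, as in the original question, the same comparison can be applied while reading the file line by line instead of holding the data in a string. The following is only a minimal sketch, assuming Python 3 and assuming every line of interest starts with `Jun 6 17:58:SS`; the function name `copy_seconds_range` and its parameters are hypothetical and not part of the posts above.

```python
import re

# Captures the seconds field of lines stamped "Jun 6 17:58:SS ...".
PATT = re.compile(r"Jun 6 17:58:(\d+) ")


def copy_seconds_range(lines, out, lower=16, upper=31):
    """Write every line whose seconds value lies in [lower, upper] to `out`.

    `lines` is any iterable of strings (an open file works), so the input
    is consumed one line at a time. Returns the number of lines written.
    """
    written = 0
    for line in lines:
        m = PATT.match(line)
        if m and lower <= int(m.group(1)) <= upper:
            out.write(line)
            written += 1
    return written
```

On real files, a call such as `with open('inputfile.txt') as infile, open('output.txt', 'w') as outfile: copy_seconds_range(infile, outfile)` should work; both file names are placeholders.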
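# Answer 2
@bvdet: Thank you for your solution. In my case I don't know the upper and lower bound values in advance. How did you get those values?
# Answer 3
In your original post, you knew the upper and lower bounds. How did you determine them? If you are dealing with dates and times rather than strictly formatted data, consider using the time and datetime modules. An example of creating a datetime object from a date/time string: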
- >>> import datetime
- >>> datetime.datetime.strptime("Jun 6 17:58:13", "%b %d %H:%M:%S")
- datetime.datetime(1900, 6, 6, 17, 58, 13)
- >>>
From there you can create timedelta objects:
- >>> d1 = datetime.datetime.strptime("Jun 6 17:58:13", "%b %d %H:%M:%S")
- >>> d2 = datetime.datetime.strptime("Jun 7 12:55:48", "%b %d %H:%M:%S")
- >>> d1-d2
- datetime.timedelta(-1, 18145)
- >>> d2-d1
- datetime.timedelta(0, 68255)
- >>> dt1 = d1-d2
- >>> dt1.days
- -1
- >>> dt1.total_seconds()
- -68255.0
- >>> dt2 = d2-d1
- >>> dt2.days
- 0
- >>> dt2.total_seconds()
- 68255.0
- >>>
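Building on the `strptime` calls above, datetime objects can also be compared directly, which makes it possible to select log lines between a start and an end time even when the range crosses a minute, hour, or day boundary. The following is a minimal sketch, assuming Python 3 and assuming each line begins with a timestamp such as `Jun 6 17:58:13` (its first three whitespace-separated fields). The helper names `parse_stamp` and `lines_between` are hypothetical. An explicit placeholder year of 1900 is prepended because the log lines carry no year; it matches the default seen in the output above.

```python
import datetime

FMT = "%Y %b %d %H:%M:%S"
PLACEHOLDER_YEAR = "1900"  # assumption: all timestamps fall within one year


def _to_datetime(stamp):
    """Parse a stamp like 'Jun 6 17:58:13' using the placeholder year."""
    return datetime.datetime.strptime(PLACEHOLDER_YEAR + " " + stamp, FMT)


def parse_stamp(line):
    """Return the leading timestamp of `line` as a datetime, or None."""
    fields = line.split(None, 3)
    if len(fields) < 3:
        return None
    try:
        return _to_datetime(" ".join(fields[:3]))
    except ValueError:
        # The line does not start with a recognizable timestamp.
        return None


def lines_between(lines, start, end):
    """Yield lines whose timestamp lies in [start, end], both inclusive."""
    lo = _to_datetime(start)
    hi = _to_datetime(end)
    for line in lines:
        stamp = parse_stamp(line)
        if stamp is not None and lo <= stamp <= hi:
            yield line
```

Because `lines_between` is a generator, it can be fed an open file object and its result passed to `outfile.writelines(...)`, so a very large file would not need to be loaded into memory at once.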