Date/time regular expression help

Author: admin

Posted: 22/11/24 15:14:13

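Hello developers,
I'm a beginner and need help writing a regular expression that extracts dates and times from some HTML files. In the code below, I walk through the HTML files in a folder named Event and use Beautiful Soup to print the title in each page's h1 tag. These HTML pages also contain dates and times in different formats, and I would like to extract and display that information as well. The date formats that appear in these HTML documents are:
21 - 27 Nov 2012
1 Dec 2012
30 Nov - 2 Dec 2012
26 Nov 2012
Can anyone help me extract dates in these formats from the HTML documents?
Below is my code, which walks the files and gets the h1 from each HTML file:
Code: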


  
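    import re
    import os
    from bs4 import BeautifulSoup

    for subdir, dirs, files in os.walk("/home/himanshu/event/"):
        for fle in files:
            if not fle.endswith((".html", ".htm")):
                continue  # skip anything that is not an HTML file
            path = os.path.join(subdir, fle)
            # Open in binary mode so BeautifulSoup can detect the encoding,
            # close the file afterwards, and name the parser explicitly.
            with open(path, "rb") as fh:
                soup = BeautifulSoup(fh, "html.parser")

            # soup.h1 is None on pages without an <h1> tag
            if soup.h1 is not None:
                print(soup.h1.string)

            # Date and time detection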
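One possible way to fill in the date-detection step is a single regular expression covering the four formats listed above. This is a minimal sketch that assumes the dates appear in the page text exactly in those shapes: day first, English three-letter month abbreviations, single spaces, and " - " between the two ends of a range. DATE_RE and find_dates are hypothetical names, not part of any library.

```python
import re

# Hypothetical building blocks for the three shapes in the question
MONTH = r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"
DAY = r"\d{1,2}"
YEAR = r"\d{4}"

# Alternation of the shapes; only non-capturing groups, so findall()
# returns the whole matched date string.
DATE_RE = re.compile(r"\b(?:" + "|".join([
    DAY + " " + MONTH + " - " + DAY + " " + MONTH + " " + YEAR,  # 30 Nov - 2 Dec 2012
    DAY + " - " + DAY + " " + MONTH + " " + YEAR,                # 21 - 27 Nov 2012
    DAY + " " + MONTH + " " + YEAR,                              # 1 Dec 2012
]) + r")\b")


def find_dates(text):
    """Return every date or date range found in text, in order of appearance."""
    return DATE_RE.findall(text)


sample = "Workshop: 21 - 27 Nov 2012. Party on 1 Dec 2012. Expo 30 Nov - 2 Dec 2012."
print(find_dates(sample))
# ['21 - 27 Nov 2012', '1 Dec 2012', '30 Nov - 2 Dec 2012']
```

Inside the loop above, something like find_dates(soup.get_text(" ")) would collect the date strings of each page; each string can then be turned into a date with the parsing code from the answer below.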
 

# Answer 1

You can build a list of struct_time objects by combining the re and time modules.


    import re
    import time

    # Two range shapes: "21 - 27 Nov 2012" (groups 1-2)
    # and "30 Nov - 2 Dec 2012" (groups 3-4).
    patt = re.compile(r"^(\d{1,2}) - (.+)|^(\d{1,2} [a-zA-Z]{3}) - (.+)")

    date_format = "%d %b %Y"   # e.g. "26 Nov 2012" (%b expects English month names)
    date_format1 = "%d %m %Y"  # e.g. "21 11 2012"

    dates = ["21 - 27 Nov 2012", "1 Dec 2012", "30 Nov - 2 Dec 2012", "26 Nov 2012"]

    output = []

    for d in dates:
        try:
            m = patt.match(d)
            if m:
                if m.group(1):
                    # "21 - 27 Nov 2012": the start day borrows month and year from the end
                    end_date = time.strptime(m.group(2), date_format)
                    start = "%s %s %s" % (m.group(1), end_date.tm_mon, end_date.tm_year)
                    output.append((time.strptime(start, date_format1), end_date))
                elif m.group(3):
                    # "30 Nov - 2 Dec 2012": the start date borrows only the year
                    end_date = time.strptime(m.group(4), date_format)
                    start = "%s %s" % (m.group(3), end_date.tm_year)
                    output.append((time.strptime(start, date_format), end_date))
            else:
                # Single date such as "1 Dec 2012"
                output.append(time.strptime(d, date_format))
        except ValueError as e:
            print("ValueError:", e)

    for item in output:
        if isinstance(item, tuple):
            print("  From:", item[0])
            print("    To:", item[1])
        else:
            print(item)
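The code above takes the start year of a range from its end date, so a range such as "30 Dec - 2 Jan 2013" would get a start date in 2013, after its end. Below is a sketch of the same parsing that returns datetime.date objects instead of struct_time and moves the start back one year in that case. RANGE_RE and parse_range are hypothetical names, and %b again assumes English month abbreviations.

```python
import re
from datetime import datetime

# Hypothetical pattern for "21 - 27 Nov 2012" or "30 Nov - 2 Dec 2012".
# Group 1 = start day, group 2 = optional start month, group 3 = full end date.
RANGE_RE = re.compile(r"^(\d{1,2})(?: ([A-Za-z]{3}))? - (\d{1,2} [A-Za-z]{3} \d{4})$")
FMT = "%d %b %Y"  # %b assumes English month abbreviations


def parse_range(text):
    """Return (start, end) as datetime.date objects; start == end for one date."""
    text = text.strip()
    m = RANGE_RE.match(text)
    if not m:
        # Single date such as "1 Dec 2012"
        d = datetime.strptime(text, FMT).date()
        return d, d
    end = datetime.strptime(m.group(3), FMT).date()
    if m.group(2):
        # "30 Nov - 2 Dec 2012": the start names its own month
        start = datetime.strptime(f"{m.group(1)} {m.group(2)} {end.year}", FMT).date()
        if start > end:
            # "30 Dec - 2 Jan 2013": the start belongs to the previous year
            start = start.replace(year=end.year - 1)
    else:
        # "21 - 27 Nov 2012": the start shares the end date's month and year
        start = end.replace(day=int(m.group(1)))
    return start, end


for s in ["21 - 27 Nov 2012", "1 Dec 2012", "30 Nov - 2 Dec 2012", "26 Nov 2012"]:
    print(s, "->", parse_range(s))
```

Returning a (start, end) pair for single dates as well keeps the caller's code uniform: every event has a range, and a one-day event simply has equal ends.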
 
