Simple log file parsing with Python

Author: admin

Time: 22/11/13 13:27:37

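I've just started learning Python and am trying to write a simple script that extracts the IP addresses and URLs from a log file. The log file has about 600 entries that look like this: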
208.115.113.86 - - [08/Apr/2016:17:36:09 -0700] "GET /paper2003/0306AssElection.htm HTTP/1.1" 200 5551 "-" "Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, help@moz.com)" "www.redlug.com"
My code is as follows:
import re

f = open("log.txt", 'r')
urlpattern = r'(%ref)'
urls = {}
totalcount = 0
entries = f.readlines()
i = 0
while i != len(entries):
    if not re.search(r'^#', entries[i]):
        totalcount = totalcount + 1
        match = re.search(urlpattern, entries[i])
        if match:
            url = match.group(1)
        if url in urls.keys():
            urls[url] = urls[url] + 1
        else:
            urls[url] = 1
    i = i + 1
f.close()
I get an error saying that url is not defined. Can anyone show me where I went wrong and help me out?
Thanks

# Answer 1

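Since there are no code tags or indentation, there's no way to know exactly what is going on, but what happens if there is no match (so url is never created)? Does the following if still execute, or is it indented under the previous if?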


if match:                ## when nothing is found...
    url = match.group(1)  ## ...this line never executes, so url is never defined

    ## indent this if, so it only executes if there is a match
    if url in urls:       ## same as "url in urls.keys()", just simpler
        urls[url] = urls[url] + 1
    else:
        urls[url] = 1
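
A minimal sketch of that fix as a complete loop, assuming the pattern has exactly one capture group. The function name count_urls, the sample lines, and the pattern r'"GET (\S+)' are hypothetical, since the question's r'(%ref)' pattern is unclear. Because url is only read inside the "if match" branch, a line without a match can no longer raise the NameError.

```python
import re


def count_urls(lines, pattern):
    """Count how often each group(1) value of `pattern` appears.

    count_urls is a hypothetical helper, not code from the original thread.
    """
    urls = {}
    for line in lines:
        if line.startswith('#'):      # skip comment lines, like the original ^# test
            continue
        match = re.search(pattern, line)
        if match:
            url = match.group(1)      # url only exists when there is a match...
            if url in urls:           # ...so this check is nested under it
                urls[url] += 1
            else:
                urls[url] = 1
    return urls


# Hypothetical input: a comment, two matching lines and one non-matching line.
lines = [
    '# comment line',
    '1.2.3.4 "GET /a.htm HTTP/1.1"',
    '5.6.7.8 no request here',
    '9.9.9.9 "GET /a.htm HTTP/1.1"',
]
print(count_urls(lines, r'"GET (\S+)'))  # {'/a.htm': 2}
```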
 

# Answer 2

Sorry, the script is a mess; I botched it when I posted it. What I need to do is extract a list of the IP addresses and the URLs they visited.
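
For that goal, one option is a regular expression over each whole line. This is a sketch that assumes every entry follows the layout of the sample line in the question (IP first, then two fields, a bracketed timestamp, and the quoted request line); LOG_RE, parse_line and ip_url_pairs are hypothetical names, and lines of any other shape are simply skipped.

```python
import re

# Assumed layout: IP, ident, user, [timestamp], "METHOD PATH PROTOCOL", ...
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(\S+) (\S+)[^"]*"')


def parse_line(line):
    """Return (ip, path) for a line in the assumed layout, else None."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    return m.group(1), m.group(3)  # group 2 is the method, e.g. GET


def ip_url_pairs(path):
    """Collect (ip, path) pairs from every matching line of a log file."""
    pairs = []
    with open(path) as f:  # the file is closed automatically
        for line in f:
            parsed = parse_line(line)
            if parsed:
                pairs.append(parsed)
    return pairs


sample = ('208.115.113.86 - - [08/Apr/2016:17:36:09 -0700] '
          '"GET /paper2003/0306AssElection.htm HTTP/1.1" 200 5551 "-" '
          '"Mozilla/5.0 (compatible; DotBot/1.1; '
          'http://www.opensiteexplorer.org/dotbot, help@moz.com)" '
          '"www.redlug.com"')
print(parse_line(sample))  # ('208.115.113.86', '/paper2003/0306AssElection.htm')
```

With the question's file, ip_url_pairs("log.txt") would return the list of pairs.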

# Answer 3

Start with something simple, since you don't know what is and isn't being found in the code you posted.


for entry in entries:
    if urlpattern in entry:    ## plain substring test, no regex involved
        print(urlpattern, entry)
 

You can also use a defaultdict from collections, which adds a key if it isn't already in the dictionary. See "Counter objects" at
https://docs.python.org/2/library/collections.html
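
A short sketch of both collections tools applied to a hypothetical list of URLs that have already been extracted. defaultdict(int) starts a missing key at 0, so the "if url in urls" test disappears; Counter does the whole count in one call and can also list the most frequent entries.

```python
from collections import Counter, defaultdict

# Hypothetical URLs, standing in for whatever the pattern extracted.
found = ['/a.htm', '/b.htm', '/a.htm']

# defaultdict(int): a missing key is created with the value 0 on first access.
urls = defaultdict(int)
for url in found:
    urls[url] += 1

# Counter counts the whole list at once.
counts = Counter(found)
print(counts.most_common(1))  # [('/a.htm', 2)]
```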

# Answer 4

Maybe I'm overcomplicating this. Is this what you want?


url = '208.115.113.86 - - [08/Apr/2016:17:36:09 -0700] "GET /paper2003/0306AssElection.htm HTTP/1.1" 200 5551 "-" "Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, help@moz.com)" "www.redlug.com"'
split_url = url.split()                      ## split on whitespace
print(split_url[0], "-->", split_url[-1])    ## first field (IP) --> last field
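
Going one step further with the same split: for a line shaped like the sample, the requested path lands at index 6 (right after the '"GET' token), so the visited URLs can be grouped by IP without a regex. This is a sketch that assumes every line has that shape; ip_and_path is a hypothetical helper, the second log line is made up for illustration, and lines with fewer than 7 fields are skipped.

```python
from collections import defaultdict


def ip_and_path(line):
    """Return (ip, path) from a whitespace split, or None for short lines."""
    fields = line.split()
    # fields: [0] ip, [1] '-', [2] '-', [3] '[date', [4] 'tz]', [5] '"GET', [6] path
    if len(fields) < 7:
        return None
    return fields[0], fields[6]


lines = [
    '208.115.113.86 - - [08/Apr/2016:17:36:09 -0700] "GET /paper2003/0306AssElection.htm HTTP/1.1" 200 5551',
    # made-up line for illustration
    '208.115.113.86 - - [08/Apr/2016:17:36:10 -0700] "GET /index.htm HTTP/1.1" 200 1024',
    'short line',
]

# Group the visited paths by IP address.
visits = defaultdict(list)
for line in lines:
    parsed = ip_and_path(line)
    if parsed:
        ip, path = parsed
        visits[ip].append(path)

for ip, paths in visits.items():
    print(ip, "-->", paths)
```

Splitting is fragile if a field such as the user name ever contains spaces; the regex approach is stricter about the layout it accepts.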
 

# Answer 5

Thanks, dwblas. That works great.
