apache日誌匯入oracle(日誌經過python處理)
1、apache日誌格式:
LogFormat "%h %l %u %t \"%r\" %>s %b" common(引用httpd.conf配置檔案)
注視:每個位置代表什麼含義網上都有,可以自己看。
2、apache日誌內容:(第一行空格已經標出,其他行格式一樣,不標了)
183.31.43.112 空格 -空格 -空格 [06/Nov/2012:00:00:17 空格+0800] 空格"GET空格 /hz HTTP/1.1" 空格302空格 -
211.103.82.175 - - [06/Nov/2012:00:00:24 +0800] "GET /hz HTTP/1.1" 302 -
221.179.37.12 - - [06/Nov/2012:00:00:33 +0800] "GET /hz?t=35245 HTTP/1.0" 302 -
211.139.145.101 - - [06/Nov/2012:00:00:34 +0800] "GET /hz HTTP/1.1" 302 -
120.196.208.79 - - [06/Nov/2012:00:00:38 +0800] "GET /hz HTTP/1.1" 302 -
117.136.12.210 - - [06/Nov/2012:00:00:38 +0800] "GET /hz HTTP/1.1" 302 -
221.179.85.10 - - [06/Nov/2012:00:00:40 +0800] "GET /hz?t=43385 HTTP/1.0" 302 -
119.139.89.179 - - [06/Nov/2012:00:00:46 +0800] "GET /hz HTTP/1.1" 302 -
。。。。。。。。。
。。。。。。。。。
。。。。。。。。。
3、試驗要求:
把apache日誌中每行的 “遠端主機IP/主機名”,“訪問時間”,“url”,“ 狀態”匯入到oracle資料庫中。
4、oracle資料庫建立的相關表及檢視:
(1) create table APACHE_LOG
(
remote_host CHAR(15),
access_time VARCHAR2(30),
full_url VARCHAR2(200),
status CHAR(10)
);
(2)create table APACHE_LOG_RPT
(
remote_host CHAR(15),
access_time DATE,
real_url VARCHAR2(392),
full_url VARCHAR2(200),
status CHAR(10)
);
(3)create or replace view v_apache_log_vi as
select REMOTE_HOST , trunc(to_date(ACCESS_TIME ,'dd/mon/yyyy hh24:mi:ss')) access_time ,
case when instr(full_url, '?') > 0 then
substr(full_url, 5, instr(full_url, '?')-5 )
else
substr(full_url, 5, instr(full_url, ' ',5) -5 )
end real_url
, FULL_URL , STATUS from apache_log;
5、使用python格式化apache日誌:
(1)python指令碼1:processor.py
#!/bin/env python
import sys, re
class Processor:
def __init__(self, reader, writer):
self.reader = open(reader, 'r')
self.writer = open(writer, 'w')
def convert(self, data):
assert 0, 'convert needed to be defined'
def process(self):
while 1:
line = self.reader.readline()
if not line: break
data = self.convert(line)
self.writer.write(data)
class UpperCase(Processor):
def convert(self, data):
return data.upper()
class HTMLize:
def write(self, data):
print '%s' % data[:-1]
class ScanApacheLog(Processor):
def convert(self, data):
p = re.compile(r'''
(?P\S+) #remote_host
\s+ #whitespace
\S+ #remote_login
\s+ #whitespace
\S+ #remote_user
\s+ #whitespace
(?P
\s+ #whitespace
(?P".*") #first line of request
\s+ #whitespace
(?P\d+) #status
\s+ #whitespace
(-|\d+) # send_size
\s*
''', re.X)
m = p.match(data)
if m is not None:
arr = m.groupdict()
request = re.match( r'"(.*)"', arr['request']).group(1)
access_time = re.match(r'\[(\S+)\s+\S+\]', arr['time']).group(1)
line = arr['remote_host'] + ',' + access_time + ',' + request + ',' + arr['status'] + '\n'
return line
if __name__ == '__main__':
UpperCase(open('1.txt'), sys.stdout).process()
UpperCase(open('1.txt'), open('2.txt', 'w')).process()
UpperCase(open('1.txt'), HTMLize()).process()
(2)python指令碼2:scan.py
#!/bin/env python
from processor import *
if __name__ == '__main__':
reader='/home/oracle/v1x-access_log_20121106'
writer='/tmp/dealed.log'
ScanApacheLog(reader, writer ).process()
(3)把上面兩個指令碼放在同一個目錄下,執行#python scan.py 即可
reader='/home/oracle/v1x-access_log_20121106'記錄的是apache日誌位置;
writer='/tmp/dealed.log' 記錄的是經過python指令碼格式化處理的檔案,也是下面步驟要使用的檔案。
/tmp/dealed.log'部分內容如下:
112.95.194.245,06/Nov/2012:00:00:01,GET /hz HTTP/1.1,302
120.196.208.72,06/Nov/2012:00:00:07,GET /hz HTTP/1.1,302
113.74.6.56,06/Nov/2012:00:00:09,GET /hz HTTP/1.1,302
183.31.43.112,06/Nov/2012:00:00:17,GET /hz HTTP/1.1,302
211.103.82.175,06/Nov/2012:00:00:24,GET /hz HTTP/1.1,302
221.179.37.12,06/Nov/2012:00:00:33,GET /hz?t=35245 HTTP/1.0,302
211.139.145.101,06/Nov/2012:00:00:34,GET /hz HTTP/1.1,302
120.196.208.79,06/Nov/2012:00:00:38,GET /hz HTTP/1.1,302
。。。。。。。
。。。。。。。
6、利用 sqlldr工具把'/tmp/dealed.log匯入到oracle資料庫的apache_log表
首先編寫控制檔案,控制檔案很重要
[oracle@shanxi myself]$ more apachelog2.ctl
load data
truncate into table apache_log
fields terminated by ','
trailing nullcols
(
REMOTE_HOST,
ACCESS_TIME ,
FULL_URL ,
STATUS
)
,然後把/tmp/dealed.log匯入資料庫:
[oracle@shanxi myself]$ sqlldr scott/### control=apachelog.ctl data=/tmp/dealed.log
7、最後一步,使用前面建立的封裝處理邏輯的檢視匯入到apache_log_rpt表
SQL> alter session set nls_date_language='AMERICAN';
SQL > insert into apache_log_rpt select * from v_apachelog_vi;
因為涉及到日誌格式轉換,所以需要修改session的nls_date_language引數( 主要是“select REMOTE_HOST , trunc(to_date(ACCESS_TIME ,'dd/mon/yyyy hh24:mi:ss')) access_time”的mon部分是那種格式,與nls_date_language是CHINEASE還是AMERICAN有關係),否則如果mon格式跟nls_date_language不一致的話,查詢檢視v_apache_log_vi的時候會報錯的。就是因為這個原因導致的。
這個處理邏輯就是上面這些。下面列出我之前接觸過的這方面的指令碼,功能是類似的:
試驗要求是使用crond程式呼叫shell指令碼,每天定時自動匯入昨天的apache日誌(v1x-access_log_***)到資料庫中。
目錄結構如下:
[oracle@searchdb2 apache]$pwd
/logbackup/apache
[oracle@searchdb2 apache]$ll
total 88
drwxr-xr-x 2 yangzhanli dev 9216 Oct 1 08:50 201209
drwxr-xr-x 2 yangzhanli dev 16384 Nov 1 08:50 201210
drwxr-xr-x 2 yangzhanli dev 10240 Nov 19 08:50 201211
drwxr-xr-x 2 yangzhanli dev 38912 Nov 19 08:50 md5sumfile
[oracle@searchdb2 apache]$cd
[oracle@searchdb2 apache]$cd 201211/
[oracle@searchdb2 201211]$ls -l | grep v1x | more
-rw-r--r-- 1 yangzhanli dev 820634 Nov 1 23:59 v1x-access_log_20121101
-rw-r--r-- 1 yangzhanli dev 658913 Nov 2 23:59 v1x-access_log_20121102
-rw-r--r-- 1 yangzhanli dev 1066549 Nov 3 23:59 v1x-access_log_20121103
-rw-r--r-- 1 yangzhanli dev 644101 Nov 4 23:59 v1x-access_log_20121104
-rw-r--r-- 1 yangzhanli dev 888134 Nov 5 23:59 v1x-access_log_20121105
-rw-r--r-- 1 yangzhanli dev 1336714 Nov 6 23:59 v1x-access_log_20121106
-rw-r--r-- 1 yangzhanli dev 596688 Nov 7 23:59 v1x-access_log_20121107
-rw-r--r-- 1 yangzhanli dev 497143 Nov 8 23:59 v1x-access_log_20121108
cron呼叫如下:
[oracle@searchdb2 201211]$ crontab -l
0 7 * * * /home/oracle/admin/tarsmsglogs/bin/tar_smslogs.sh cm
0 8 * * * /home/oracle/admin/tarsmsglogs/bin/tar_smslogs.sh ct
0 6 * 1 * /home/oracle/insert_log/insert_log.sh
4 9 * * * /home/oracle/admin/apache/import_apachelog2.sh > /dev/null 2>&1
shell指令碼import_apachelog2.sh內容如下:
!/bin/bash
# Filename: import_apachelog2.sh
# Date: 2012-10-31
# Desc: import the logfile '/logbackup/apache/${month}/ v1x-access_log_**.log' into the table 'apache_log_rpt' in SPICEMOM schema
# Usage: /home/oracle/admin/apache/import_apachelog2.sh
# Version: 1.1
. /home/oracle/.bash_profile
LOG_ROOT=/logbackup/apache
CUR_MONTH=`date -d '1 day ago' +%Y%m`
YESTERDAY=`date -d '1 day ago' +%Y%m%d`
LOG_PREFIX=v1x-access_log
LOG_FILE=${LOG_ROOT}/${CUR_MONTH}/${LOG_PREFIX}_${YESTERDAY}
SCRIPT_PATH=/home/oracle/admin/apache
#echo $LOG_FILE
#echo ${SCRIPT_PATH}
#exit
cd ${SCRIPT_PATH}
if [ ! -f $LOG_FILE ]; then
now=`date '+ %Y-%m-%d %H:%M:%S'`
echo "* Failed at $now:the logfile ${LOG_FILE} needed to import doesn't exist !" >> import_history.log
exit
fi
cp ${LOG_FILE} /tmp/v1x.log
python scanlog.py
sqlldr spicemom/spicemom control=apachelog.ctl data="/tmp/scanlog.txt"
if [ "$?" -eq 0 ] ; then
now=`date '+ %Y-%m-%d %H:%M:%S'`
sqlplus spicemom/spicemom @append_log.sql
echo "At $now:the logfile ${LOG_FILE} succeeds to import the table apache_log_rpt !" >> import_history.log
rm -rf /tmp/v1x.log
rm -rf /tmp/scanlog.txt
fi
所有涉及到的指令碼目錄結構如下:
[oracle@searchdb2 201211]$cd /home/oracle/admin/apache
[oracle@searchdb2 apache]$ll
total 36
-rw-r--r-- 1 oracle dba 69 Sep 12 15:29 afiedt.buf
-rw-r--r-- 1 oracle dba 383 Oct 30 13:38 apachelog.ctl
-rw-r--r-- 1 oracle dba 1750 Nov 19 09:04 apachelog.log
-rw-r--r-- 1 oracle dba 135 Oct 29 20:29 append_log.sql
-rwxr--r-- 1 oracle dba 968 Nov 4 12:19 import_apachelog.sh
-rwxr--r-- 1 oracle dba 1154 Nov 4 12:19 import_apachelog2.sh
-rw-r--r-- 1 oracle dba 2788 Nov 19 09:04 import_history.log
drwxr-xr-x 2 oracle dba 4096 Oct 30 16:00 orig_source
-rw-r--r-- 1 oracle dba 938 Oct 30 13:35 scanlog.py
再看一下指令碼內容:
python的指令碼得注意格式要符合程式結構,否則的話可能執行不了。python的程式碼是不是一段程式碼塊不是靠類似C語言的中括號{},而是靠格式對齊來判斷是不是同一段程式碼塊, 這點感覺挺噁心的。(如果有人copy可得注意了)
[oracle@searchdb2 apache]$ vi scanlog.py
#!/usr/bin/env python
import sys
import re
def ScanOneLog(sLogFile,dLogFile):
log_line_re = re.compile(r'''(?P\S+) #IP ADDRESS
\s+ #whitespace
\S+ #remote logname
\s+ #whitespace
\S+ #remote user
\s+ #whitespace
(?P
\s+ #whitespace
(?P"[^"]+") #first line of request
\s+ #whitespace
(?P\d+)
\s+ #whitespace
(?P-|\d+)
\s* #whitespace
''', re.VERBOSE)
lf=open(dLogFile,'w')
for line in open(sLogFile,'r'):
m = log_line_re.match(line)
if m is not None :
groupdict = m.groupdict()
tmps=groupdict['remote_host']+','+groupdict['status']+','+groupdict['time']+','+groupdict['request']+'\n'
lf.writelines(tmps)
lf.close()
if __name__=='__main__':
ScanOneLog('/tmp/v1x.log','/tmp/scanlog.txt')
[oracle@searchdb2 apache]$vi apachelog.ctl
load data
truncate into table apache_log
FIELDS TERMINATED BY "," TRAILING NULLCOLS
(
remote_host char,
status char,
access_time char,
url char(4000)
)
[oracle@searchdb2 apache]$vi append_log.sql
alter session set nls_date_language = 'AMERICAN';
insert into apache_log_rpt select * from v_apache_log_ori;
commit;
exit
最後給列下表和檢視結構定義:
create table APACHE_LOG
(
remote_host VARCHAR2(20),
url VARCHAR2(4000),
status VARCHAR2(10),
access_time VARCHAR2(30)
)
create table APACHE_LOG_RPT
(
remote_host VARCHAR2(20),
real_url VARCHAR2(4000),
url VARCHAR2(4000),
status VARCHAR2(10),
access_time DATE
)
create or replace view v_apache_log as
select "REMOTE_HOST","REAL_URL","URL","STATUS","ACCESS_TIME" from apache_log_rpt;
create or replace view v_apache_log_ori as
select remote_host,
case
when instr(url,'?') >0 then substr(url,6,instr(url,'?')-6)
else substr(url,6,instr(url,' ',6)-6)
end real_url,
url,status,
trunc(to_date(SUBSTR(ACCESS_TIME,2,LENGTH(ACCESS_TIME)-8),'dd/mon/yyyy:hh24:mi:ss') ) access_time from apache_log;
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/27042095/viewspace-749496/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- ORACLE 告警日誌alert過大的處理Oracle
- 處理Apache日誌的Bash指令碼Apache指令碼
- apache日誌內容匯入資料庫Apache資料庫
- 日誌分析-apache日誌分析Apache
- oracle alert日誌亂碼處理Oracle
- nginx日誌處理Nginx
- oracle監聽器日誌過大-處理辦法Oracle
- PHP日誌處理類PHP
- oracle 中 alert 報警日誌過大的處理方法Oracle
- orbeon form 的日誌處理ORBORM
- shell日誌顏色處理
- perl分析apache日誌Apache
- Apache日誌詳解Apache
- APACHE日誌檔案Apache
- Apache 配置日誌切割Apache
- 【Oracle日誌】- 日誌檔案重建Oracle
- SQL Server事務日誌過大的處理SQLServer
- 日誌服務資料匯入
- mybatis-plus匯入sql日誌MyBatisSQL
- 玄機-第二章日誌分析-apache日誌分析Apache
- SQL Server日誌檔案總結及日誌滿的處理SQLServer
- Oracle DataGuard歸檔日誌丟失處理方法Oracle
- Oracle日誌Oracle
- 刪除oracle下apache下的日誌OracleApache
- 『無為則無心』Python日誌 — 67、logging日誌模組處理流程Python
- DATAGUARD中手工處理日誌GAP
- node錯誤處理與日誌
- logstash kafka output 日誌處理Kafka
- strom打造日誌處理系統
- Db2 日誌處理二DB2
- [zt] SQL Server日誌檔案總結及日誌滿的處理SQLServer
- 畢業設計二:日誌匯入
- mariadb審計日誌通過 logstash匯入 hiveHive
- ORACLE 11G DATAGUARD 日誌中斷處理方案Oracle
- oracle LOGICAL standby 日誌無法應用處理Oracle
- sql server日誌檔案總結及日誌滿的處理辦法SQLServer
- oracle alert日誌Oracle
- Oracle 附加日誌Oracle