Analyzing Apache Logs with Hadoop

Source: http://mimul.com/pebble/default/2011/11/05/1320482173560.html
Below is a sample I put together that counts visitors per IP address from Apache logs. I'm sharing the test results in the hope that they help with understanding the concepts behind log analysis. ^^
With a bit more work, you could break the traffic down by hour, day, month, or week.
And if you combine it with R to chart the results, it can help you see trends.
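For reference, a line in Apache's common log format looks like this (the stock example from the Apache documentation, not a line from the dataset used below); the client IP is the first field, which is what the mapper extracts:

```
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
```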
1. Preparing the Apache log data for analysis
Download the Apache common logs, place them in the /database/samples/data/apache directory, and then copy them into HDFS in the following order to get ready for analysis.
- Copy the local data to HDFS
```
[mimul]/hadoop-0.20.204.0> bin/hadoop dfs -copyFromLocal /database/samples/data/apache apache
[mimul]/hadoop-0.20.204.0> bin/hadoop dfs -ls
Found 5 items
drwxr-xr-x   - k2 KPCT        0 2011-11-02 18:07 /user/k2/apache
drwxr-xr-x   - k2 KPCT        0 2011-10-21 12:52 /user/k2/gutenberg
drwxr-xr-x   - k2 KPCT        0 2011-10-21 12:57 /user/k2/gutenberg-output
drwxr-xr-x   - k2 KPCT        0 2011-09-28 20:47 /user/k2/input
drwxr-xr-x   - k2 KPCT        0 2011-09-28 20:49 /user/k2/output
[mimul]/hadoop-0.20.204.0> bin/hadoop dfs -ls apache
Found 4 items
-rw-r--r--   1 k2 KPCT 15380331 2011-11-02 18:07 /user/k2/apache/access_log.20110701
-rw-r--r--   1 k2 KPCT 11754087 2011-11-02 18:07 /user/k2/apache/access_log.20110702
-rw-r--r--   1 k2 KPCT 12220413 2011-11-02 18:07 /user/k2/apache/access_log.20110703
-rw-r--r--   1 k2 KPCT 14435475 2011-11-02 18:07 /user/k2/apache/access_log.20110704
```
2. Sample MapReduce source
- LogMapper.java
```java
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.log4j.Logger;

public class LogMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static Logger logger = Logger.getLogger(LogMapper.class);

    @Override
    protected void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        String logEntryLine = value.toString();
        String logEntryPattern = Constants.APACHE_ACCESS_LOG;
        int position = 1; // the first capture group holds the client IP

        if (isValidLine(logEntryLine, logEntryPattern)) {
            String ipAddress = retrieveIPAddress(logEntryLine, logEntryPattern, position);
            logger.warn("Ip address in map : " + ipAddress);
            // Emit (IP, 1) for every well-formed log line
            context.write(new Text(ipAddress), Constants.ONE);
        }
    }

    public boolean isValidLine(String logEntryLine, String logEntryPattern) {
        Pattern p = Pattern.compile(logEntryPattern);
        Matcher matcher = p.matcher(logEntryLine);
        if (!matcher.matches()) {
            logger.warn("Not a valid log format");
            return false;
        }
        return true;
    }

    public String retrieveIPAddress(String logEntryLine, String logEntryPattern, int position) {
        Pattern p = Pattern.compile(logEntryPattern);
        Matcher matcher = p.matcher(logEntryLine);
        if (!matcher.matches() || Constants.PARSE_CNT != matcher.groupCount()) {
            logger.warn("Malformed log line");
            return Constants.INVALID_IPADDRESS;
        }
        return matcher.group(position);
    }
}
```
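The Constants class referenced above isn't included in the original post. Here is a minimal sketch of what it might contain, assuming the Apache common log format; the regex, group count, and placeholder IP are my assumptions, not the author's code:

```java
import org.apache.hadoop.io.IntWritable;

public class Constants {
    // Hypothetical regex for the Apache common log format:
    // host ident authuser [date] "request" status bytes
    public static final String APACHE_ACCESS_LOG =
        "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\S+)$";
    // Number of capture groups in the pattern above (group 1 is the client IP)
    public static final int PARSE_CNT = 7;
    // Placeholder returned for lines that fail to parse
    public static final String INVALID_IPADDRESS = "0.0.0.0";
    // Reusable count of one, emitted once per matching line
    public static final IntWritable ONE = new IntWritable(1);
}
```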
- LogReducer.java

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class LogReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        // Total hit count for this IP address
        context.write(key, new IntWritable(sum));
    }
}
```
- LogMapReduceJob.java

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class LogMapReduceJob extends Configured implements Tool {

    private static void initJob(String jobName, Configuration config,
            String inputPath, String outputPath)
            throws IOException, InterruptedException, ClassNotFoundException {
        Job job = new Job(config, jobName);
        job.setJarByClass(LogMapReduceJob.class);
        job.setMapperClass(LogMapper.class);
        job.setReducerClass(LogReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(inputPath));
        FileOutputFormat.setOutputPath(job, new Path(outputPath));
        job.waitForCompletion(true);
    }

    public static void main(String[] args) throws Exception {
        ToolRunner.run(new LogMapReduceJob(), args);
    }

    @Override
    public int run(String[] args) throws Exception {
        Configuration config = new Configuration();
        System.out.println("Args [0] :" + args[0]); // job name
        System.out.println("Args [1] :" + args[1]); // input path on HDFS
        System.out.println("Args [2] :" + args[2]); // output path on HDFS
        initJob(args[0], config, args[1], args[2]);
        return 0;
    }
}
```
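One optional tweak: because the reducer only sums counts, it can also serve as a combiner and shrink the shuffle (the job output below shows Combine input records=0, i.e. no combiner was set). Adding one line to initJob would enable it:

```java
// Optional: pre-aggregate counts on the map side. This is valid here because
// summing is associative and the reducer's input and output types match.
job.setCombinerClass(LogReducer.class);
```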
- Build file
  . LogAnalizerMapReduce.jar
3. Analyzing the log data
Run the LogAnalizerMapReduce.jar file to analyze the Apache logs.
```
[mimul]/hadoop-0.20.204.0> bin/hadoop jar LogAnalizerMapReduce.jar apache apache-output
Args [0] :LogAnalyze
Args [1] :apache
Args [2] :apache-output
11/11/03 15:39:38 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
11/11/03 15:39:38 INFO input.FileInputFormat: Total input paths to process : 1
11/11/03 15:39:38 INFO mapred.JobClient: Running job: job_201111031454_0003
11/11/03 15:39:39 INFO mapred.JobClient:  map 0% reduce 0%
11/11/03 15:39:56 INFO mapred.JobClient:  map 46% reduce 0%
11/11/03 15:39:59 INFO mapred.JobClient:  map 78% reduce 0%
11/11/03 15:40:02 INFO mapred.JobClient:  map 100% reduce 0%
11/11/03 15:40:17 INFO mapred.JobClient:  map 100% reduce 100%
11/11/03 15:40:22 INFO mapred.JobClient: Job complete: job_201111031454_0003
11/11/03 15:40:22 INFO mapred.JobClient: Counters: 25
11/11/03 15:40:22 INFO mapred.JobClient:   Job Counters
11/11/03 15:40:22 INFO mapred.JobClient:     Launched reduce tasks=1
11/11/03 15:40:22 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=22317
11/11/03 15:40:22 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
11/11/03 15:40:22 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
11/11/03 15:40:22 INFO mapred.JobClient:     Launched map tasks=1
11/11/03 15:40:22 INFO mapred.JobClient:     Data-local map tasks=1
11/11/03 15:40:22 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=13390
11/11/03 15:40:22 INFO mapred.JobClient:   File Output Format Counters
11/11/03 15:40:22 INFO mapred.JobClient:     Bytes Written=105178
11/11/03 15:40:22 INFO mapred.JobClient:   FileSystemCounters
11/11/03 15:40:22 INFO mapred.JobClient:     FILE_BYTES_READ=1806396
11/11/03 15:40:22 INFO mapred.JobClient:     HDFS_BYTES_READ=15380456
11/11/03 15:40:22 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=3655077
11/11/03 15:40:22 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=105178
11/11/03 15:40:22 INFO mapred.JobClient:   File Input Format Counters
11/11/03 15:40:22 INFO mapred.JobClient:     Bytes Read=15380331
11/11/03 15:40:22 INFO mapred.JobClient:   Map-Reduce Framework
11/11/03 15:40:22 INFO mapred.JobClient:     Reduce input groups=6327
11/11/03 15:40:22 INFO mapred.JobClient:     Map output materialized bytes=1806396
11/11/03 15:40:22 INFO mapred.JobClient:     Combine output records=0
11/11/03 15:40:22 INFO mapred.JobClient:     Map input records=86763
11/11/03 15:40:22 INFO mapred.JobClient:     Reduce shuffle bytes=1806396
11/11/03 15:40:22 INFO mapred.JobClient:     Reduce output records=6327
11/11/03 15:40:22 INFO mapred.JobClient:     Spilled Records=172974
11/11/03 15:40:22 INFO mapred.JobClient:     Map output bytes=1633416
11/11/03 15:40:22 INFO mapred.JobClient:     Combine input records=0
11/11/03 15:40:22 INFO mapred.JobClient:     Map output records=86487
11/11/03 15:40:22 INFO mapred.JobClient:     SPLIT_RAW_BYTES=125
11/11/03 15:40:22 INFO mapred.JobClient:     Reduce input records=86487
```

4. Jobtracker
```
[mimul]/hadoop-0.20.204.0> bin/hadoop dfs -ls apache-output
Found 3 items
-rw-r--r--   1 k2 KPCT      0 2011-11-03 15:22 /user/k2/apache-output/_SUCCESS
drwxr-xr-x   - k2 KPCT      0 2011-11-03 15:22 /user/k2/apache-output/_logs
-rw-r--r--   1 k2 KPCT 105178 2011-11-03 15:22 /user/k2/apache-output/part-r-00000
```
```
[mimul]/hadoop-0.20.204.0> bin/hadoop fs -cat apache-output/part-r-00000
58.225.20.33    1
58.225.23.88    6
58.226.140.46   2
58.227.139.198  7
58.227.156.67   1
58.227.19.60    2
58.227.204.143  3
58.227.31.86    3
58.228.13.105   1
58.228.3.131    1
58.228.60.70    1
58.228.89.39    2
58.229.111.26   1
58.229.146.205  10
```
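The output above is sorted by IP, not by hit count. One quick way to eyeball the heaviest visitors (assuming a Unix shell) is to pipe the result through sort:

```
[mimul]/hadoop-0.20.204.0> bin/hadoop fs -cat apache-output/part-r-00000 | sort -k2,2 -nr | head -20
```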
5. Troubleshooting
The Apache log files being analyzed were blocked and the NameNode went into safe mode, so I released safe mode and re-ran the job.

```
SafeModeException: Cannot delete /tmp/hadoop-k2/mapred/system. Name node is in safe mode.
[mimul]/hadoop-0.20.204.0> bin/hadoop dfsadmin -safemode leave
```
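Before retrying, you can also check whether the NameNode is still in safe mode:

```
[mimul]/hadoop-0.20.204.0> bin/hadoop dfsadmin -safemode get
```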