Analyzing Apache Logs with Hadoop
Source: http://mimul.com/pebble/default/2011/11/05/1320482173560.html
Below is a sample I wrote that counts visitors per IP address from Apache logs. I'm sharing the test results in the hope that they help with understanding the concepts behind log analysis. ^^
Taking it a bit further, you could break the traffic down by hour, day, week, or month.
And combining the results with R to chart them can help you see trends.
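As a taste of that, here is a minimal sketch (my own illustration, not part of the original sample) of how an hour bucket could be derived from each log line; keying the map output by this bucket instead of the IP would give per-hour counts with the same sum-reducer. The class name and regex here are assumptions:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: derive an hourly bucket key from an Apache common-log line.
public class HourBucket {
    // Assumed pattern: group 1 is the IP, group 2 is the timestamp up to the hour.
    private static final Pattern LOG = Pattern.compile(
        "^(\\S+) \\S+ \\S+ \\[(\\d{2}/\\w{3}/\\d{4}:\\d{2}).*");

    public static String bucketOf(String line) {
        Matcher m = LOG.matcher(line);
        // A key like "01/Jul/2011:13" groups all hits from the same hour together.
        return m.matches() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String line = "127.0.0.1 - - [01/Jul/2011:13:05:12 +0900] \"GET / HTTP/1.1\" 200 2326";
        System.out.println(bucketOf(line)); // prints 01/Jul/2011:13
    }
}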
1. Preparing the Apache Log Data for Analysis
Download Apache common logs, put them in the /database/samples/data/apache directory, then copy them into HDFS in the following order to get ready for analysis.
- Copy the local data into HDFS
[mimul]/hadoop-0.20.204.0> bin/hadoop dfs -copyFromLocal /database/samples/data/apache apache
[mimul]/hadoop-0.20.204.0> bin/hadoop dfs -ls
Found 5 items
drwxr-xr-x - k2 KPCT        0 2011-11-02 18:07 /user/k2/apache
drwxr-xr-x - k2 KPCT        0 2011-10-21 12:52 /user/k2/gutenberg
drwxr-xr-x - k2 KPCT        0 2011-10-21 12:57 /user/k2/gutenberg-output
drwxr-xr-x - k2 KPCT        0 2011-09-28 20:47 /user/k2/input
drwxr-xr-x - k2 KPCT        0 2011-09-28 20:49 /user/k2/output
[mimul]/hadoop-0.20.204.0> bin/hadoop dfs -ls apache
Found 4 items
-rw-r--r-- 1 k2 KPCT 15380331 2011-11-02 18:07 /user/k2/apache/access_log.20110701
-rw-r--r-- 1 k2 KPCT 11754087 2011-11-02 18:07 /user/k2/apache/access_log.20110702
-rw-r--r-- 1 k2 KPCT 12220413 2011-11-02 18:07 /user/k2/apache/access_log.20110703
-rw-r--r-- 1 k2 KPCT 14435475 2011-11-02 18:07 /user/k2/apache/access_log.20110704
2. MapReduce Sample Source
- LogMapper.java
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.log4j.Logger;

public class LogMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static Logger logger = Logger.getLogger(LogMapper.class);

    @Override
    protected void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        String logEntryLine = value.toString();
        String logEntryPattern = Constants.APACHE_ACCESS_LOG;
        int position = 1; // the IP address is the first capture group

        if (isValidLine(logEntryLine, logEntryPattern)) {
            String ipAddress = retrieveIPAddress(logEntryLine, logEntryPattern, position);
            logger.warn("IP address in map : " + ipAddress);
            // Emit (ip, 1); the reducer sums these into a per-IP visit count.
            context.write(new Text(ipAddress), Constants.ONE);
        }
    }

    public boolean isValidLine(String logEntryLine, String logEntryPattern) {
        Pattern p = Pattern.compile(logEntryPattern);
        Matcher matcher = p.matcher(logEntryLine);
        if (!matcher.matches()) {
            logger.warn("Not a valid log format");
            return false;
        }
        return true;
    }

    public String retrieveIPAddress(String logEntryLine, String logEntryPattern,
            int position) {
        Pattern p = Pattern.compile(logEntryPattern);
        Matcher matcher = p.matcher(logEntryLine);
        if (!matcher.matches() || Constants.PARSE_CNT != matcher.groupCount()) {
            logger.warn("Invalid log line");
            return Constants.INVALID_IPADDRESS;
        }
        return matcher.group(position);
    }
}
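The Constants class referenced above is not shown in the original post. A minimal sketch of what it might contain, assuming the standard Apache common log format, could look like the following; the regex, group count, and placeholder IP are my assumptions, not the original values:

import org.apache.hadoop.io.IntWritable;

// Hypothetical sketch of the Constants class used by LogMapper.
public final class Constants {
    // Assumed common-log regex; groups: 1 IP, 2 identity, 3 user, 4 timestamp,
    // 5 request, 6 status, 7 bytes.
    public static final String APACHE_ACCESS_LOG =
        "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\S+).*";
    public static final int PARSE_CNT = 7;                    // expected capture-group count
    public static final String INVALID_IPADDRESS = "0.0.0.0"; // sentinel for unparsable lines
    public static final IntWritable ONE = new IntWritable(1);

    private Constants() {}
}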
- LogReducer.java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class LogReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum the 1s emitted by the mapper to get the visit count for this IP.
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
- LogMapReduceJob.java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class LogMapReduceJob extends Configured implements Tool {
    private static void initJob(String jobName, Configuration config,
            String inputPath, String outputPath)
            throws IOException, InterruptedException, ClassNotFoundException {
        Job job = new Job(config, jobName);
        job.setJarByClass(LogMapReduceJob.class);
        job.setMapperClass(LogMapper.class);
        job.setReducerClass(LogReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(inputPath));
        FileOutputFormat.setOutputPath(job, new Path(outputPath));
        job.waitForCompletion(true);
    }

    public static void main(String[] args) throws Exception {
        ToolRunner.run(new LogMapReduceJob(), args);
    }

    @Override
    public int run(String[] args) throws Exception {
        // Use the configuration injected by ToolRunner so -D options are honored.
        Configuration config = getConf();
        System.out.println("Args [0] :" + args[0]);
        System.out.println("Args [1] :" + args[1]);
        System.out.println("Args [2] :" + args[2]);
        initJob(args[0], config, args[1], args[2]); // args: jobName, input, output
        return 0;
    }
}
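One tweak worth considering (my suggestion, not in the original job setup): since the reduce step is a plain sum, which is associative and commutative, the same class can safely be registered as a combiner inside initJob() to pre-aggregate counts on the map side and shrink the shuffle:

// Hypothetical addition to initJob(), after setReducerClass:
job.setCombinerClass(LogReducer.class);

With the combiner in place, the "Combine input records=0" counter in the run output below would show map-side aggregation instead.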
- Build artifact: LogAnalizerMapReduce.jar
3. Analyzing the Log Data
Run the LogAnalizerMapReduce.jar file to analyze the Apache logs.
[mimul]/hadoop-0.20.204.0> bin/hadoop jar LogAnalizerMapReduce.jar apache apache-output
Args [0] :LogAnalyze
Args [1] :apache
Args [2] :apache-output
11/11/03 15:39:38 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
11/11/03 15:39:38 INFO input.FileInputFormat: Total input paths to process : 1
11/11/03 15:39:38 INFO mapred.JobClient: Running job: job_201111031454_0003
11/11/03 15:39:39 INFO mapred.JobClient:  map 0% reduce 0%
11/11/03 15:39:56 INFO mapred.JobClient:  map 46% reduce 0%
11/11/03 15:39:59 INFO mapred.JobClient:  map 78% reduce 0%
11/11/03 15:40:02 INFO mapred.JobClient:  map 100% reduce 0%
11/11/03 15:40:17 INFO mapred.JobClient:  map 100% reduce 100%
11/11/03 15:40:22 INFO mapred.JobClient: Job complete: job_201111031454_0003
11/11/03 15:40:22 INFO mapred.JobClient: Counters: 25
11/11/03 15:40:22 INFO mapred.JobClient:   Job Counters
11/11/03 15:40:22 INFO mapred.JobClient:     Launched reduce tasks=1
11/11/03 15:40:22 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=22317
11/11/03 15:40:22 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
11/11/03 15:40:22 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
11/11/03 15:40:22 INFO mapred.JobClient:     Launched map tasks=1
11/11/03 15:40:22 INFO mapred.JobClient:     Data-local map tasks=1
11/11/03 15:40:22 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=13390
11/11/03 15:40:22 INFO mapred.JobClient:   File Output Format Counters
11/11/03 15:40:22 INFO mapred.JobClient:     Bytes Written=105178
11/11/03 15:40:22 INFO mapred.JobClient:   FileSystemCounters
11/11/03 15:40:22 INFO mapred.JobClient:     FILE_BYTES_READ=1806396
11/11/03 15:40:22 INFO mapred.JobClient:     HDFS_BYTES_READ=15380456
11/11/03 15:40:22 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=3655077
11/11/03 15:40:22 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=105178
11/11/03 15:40:22 INFO mapred.JobClient:   File Input Format Counters
11/11/03 15:40:22 INFO mapred.JobClient:     Bytes Read=15380331
11/11/03 15:40:22 INFO mapred.JobClient:   Map-Reduce Framework
11/11/03 15:40:22 INFO mapred.JobClient:     Reduce input groups=6327
11/11/03 15:40:22 INFO mapred.JobClient:     Map output materialized bytes=1806396
11/11/03 15:40:22 INFO mapred.JobClient:     Combine output records=0
11/11/03 15:40:22 INFO mapred.JobClient:     Map input records=86763
11/11/03 15:40:22 INFO mapred.JobClient:     Reduce shuffle bytes=1806396
11/11/03 15:40:22 INFO mapred.JobClient:     Reduce output records=6327
11/11/03 15:40:22 INFO mapred.JobClient:     Spilled Records=172974
11/11/03 15:40:22 INFO mapred.JobClient:     Map output bytes=1633416
11/11/03 15:40:22 INFO mapred.JobClient:     Combine input records=0
11/11/03 15:40:22 INFO mapred.JobClient:     Map output records=86487
11/11/03 15:40:22 INFO mapred.JobClient:     SPLIT_RAW_BYTES=125
11/11/03 15:40:22 INFO mapred.JobClient:     Reduce input records=86487
4. Jobtracker
[mimul]/hadoop-0.20.204.0> bin/hadoop dfs -ls apache-output
Found 3 items
-rw-r--r-- 1 k2 KPCT      0 2011-11-03 15:22 /user/k2/apache-output/_SUCCESS
drwxr-xr-x - k2 KPCT      0 2011-11-03 15:22 /user/k2/apache-output/_logs
-rw-r--r-- 1 k2 KPCT 105178 2011-11-03 15:22 /user/k2/apache-output/part-r-00000
[mimul]/hadoop-0.20.204.0> bin/hadoop fs -cat apache-output/part-r-00000
58.225.20.33 1
58.225.23.88 6
58.226.140.46 2
58.227.139.198 7
58.227.156.67 1
58.227.19.60 2
58.227.204.143 3
58.227.31.86 3
58.228.13.105 1
58.228.3.131 1
58.228.60.70 1
58.228.89.39 2
58.229.111.26 1
58.229.146.205 10
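To rank the heaviest visitors from this output, one option (a sketch of mine, not from the original post) is a small local post-processing step over part-r-00000, assuming the file was first copied locally, e.g. with bin/hadoop fs -get apache-output/part-r-00000 . :

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical post-processing sketch: rank IPs in the MapReduce output by hit count.
public class TopVisitors {
    public static void main(String[] args) throws Exception {
        List<Map.Entry<String, Integer>> counts = new ArrayList<Map.Entry<String, Integer>>();
        BufferedReader in = new BufferedReader(new FileReader(args[0]));
        String line;
        while ((line = in.readLine()) != null) {
            String[] parts = line.split("\t"); // reducer output is "ip<TAB>count"
            counts.add(new AbstractMap.SimpleEntry<String, Integer>(
                parts[0], Integer.parseInt(parts[1])));
        }
        in.close();
        // Sort by count, descending.
        Collections.sort(counts, new Comparator<Map.Entry<String, Integer>>() {
            public int compare(Map.Entry<String, Integer> a, Map.Entry<String, Integer> b) {
                return b.getValue() - a.getValue();
            }
        });
        // Print the top 20 IPs (or fewer, if the file is small).
        for (Map.Entry<String, Integer> e : counts.subList(0, Math.min(20, counts.size()))) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}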
5. Troubleshooting
The blocks of the Apache log files being analyzed were unavailable and the NameNode went into safe mode, so I turned safe mode off and re-ran the job.
SafeModeException: Cannot delete /tmp/hadoop-k2/mapred/system. Name node is in safe mode.
[mimul]/hadoop-0.20.204.0> bin/hadoop dfsadmin -safemode leave
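A related note of my own: bin/hadoop dfsadmin -safemode get reports whether the NameNode is still in safe mode, which is a quick way to confirm the state before (or after) forcing it off with -safemode leave.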