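In the MapReduce model, you pick the key and value in the map phase and write out the computed result in the reduce phase. For UV (unique visitors), the final result is just a single number.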
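My first idea was:
in the map phase, use the constant 1 as the key and the uid as the value; in the reduce phase, put the map output into a HashSet to deduplicate it, and output the HashSet's size, which is the UV.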
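Stripped of Hadoop, the single-key idea reduces to the plain-Java sketch below. It is an illustration only: `SingleKeyUv` and `countDistinct` are hypothetical names, and no Hadoop classes are used. Because every record carries the same key, one reduce call (and therefore one reducer) sees every uid, and a single HashSet does all the deduplication.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the single-key job: every record is emitted
// under the constant key 1, so one reduce call receives all uids.
final class SingleKeyUv {
    static final int CONSTANT_KEY = 1;

    // What that single reduce call does: dedup every uid, then count the set.
    static int countDistinct(List<String> uids) {
        Set<String> seen = new HashSet<>();
        for (String uid : uids) {
            seen.add(uid);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        List<String> uids = List.of("u1", "u2", "u1", "u3", "u2");
        System.out.println("key=" + CONSTANT_KEY + " uv=" + countDistinct(uids)); // key=1 uv=3
    }
}
```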
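A constant key funnels every record into a single reduce call. So the job I actually ran keys on the uid's hash modulo 1000 instead (see the map code below). Its log looked like this: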
uid%1000=0 XXX
uid%1000=1 XXX
uid%1000=2 XXXX
……………………….
uid%1000=998 XXXX
uid%1000=999 XXXX
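That output is not what I need, but it is no problem: run one more MapReduce pass over it and sum the counts in the reduce task. Because a given uid always hashes to the same bucket, no user is counted in two buckets, so the sum is the true UV.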
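The second pass is a plain summation. The sketch below shows what its reduce step computes, written in plain Java rather than as a Hadoop job. It assumes the first pass wrote Hadoop's default TextOutputFormat lines of the form `bucket<TAB>count`. `SumBuckets` and `totalUv` are hypothetical names.

```java
import java.util.List;

// Hypothetical second-pass reduce: every first-pass line is routed to one
// key and this sums their values. Assumes "bucket<TAB>count" lines, the
// default layout written by Hadoop's TextOutputFormat.
final class SumBuckets {
    static long totalUv(List<String> firstPassLines) {
        long total = 0;
        for (String line : firstPassLines) {
            String[] kv = line.split("\t");
            total += Long.parseLong(kv[1].trim()); // distinct-user count of one bucket
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("0\t12", "1\t7", "999\t5");
        System.out.println(totalUv(lines)); // 24
    }
}
```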
Map code:
public void map(Object key, Text value, Context ctx)
        throws IOException, InterruptedException {
    try {
        String[] elements = value.toString().split("\\|"); // split the line on the | delimiter
        String userid = elements[7]; // field 8 holds the user ID
        // NetUtil is the author's own helper (not shown): MD5 the user ID, then hash the digest to a long
        long uidHash = NetUtil.hash(NetUtil.computeMd5(userid), 0);
        int mapKey = (int) (uidHash % 1000); // modulo 1000 spreads the keys over 1000 buckets
        ctx.write(new IntWritable(mapKey), new LongWritable(uidHash));
    } catch (Exception e) {
        e.printStackTrace(); // skip malformed lines (e.g. fewer than 8 fields)
    }
}
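The post does not show `NetUtil`, so the sketch below swaps in an assumed hash to illustrate the bucketing step in plain Java. It takes the MD5 of the uid and reads the first 4 digest bytes as an unsigned 32-bit number, which is never negative. The real `NetUtil.hash` may compute something different. `BucketSketch`, `uidHash`, and `bucketOf` are hypothetical names. The property that matters is that the same uid always lands in the same bucket.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical stand-in for the map step. uidHash() is an ASSUMED
// replacement for NetUtil.hash(NetUtil.computeMd5(...), 0).
final class BucketSketch {
    // MD5 of the uid; first 4 bytes read little-endian as an unsigned 32-bit value.
    static long uidHash(String userid) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(userid.getBytes(StandardCharsets.UTF_8));
            return ((long) (d[3] & 0xFF) << 24) | ((d[2] & 0xFF) << 16)
                    | ((d[1] & 0xFF) << 8) | (d[0] & 0xFF);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("every JDK must provide MD5", e);
        }
    }

    // Same parsing as map(): '|'-separated line, user ID in field 8.
    static int bucketOf(String line) {
        String userid = line.split("\\|")[7];
        return (int) (uidHash(userid) % 1000); // 0..999 because uidHash >= 0
    }

    public static void main(String[] args) {
        String a = "f0|f1|f2|f3|f4|f5|f6|user42|f8";
        String b = "g0|g1|g2|g3|g4|g5|g6|user42";
        // Same uid, different other fields: same bucket.
        System.out.println(bucketOf(a) == bucketOf(b)); // true
    }
}
```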
Reduce code:
public void reduce(IntWritable key, Iterable<LongWritable> values,
        Context ctx) throws IOException, InterruptedException {
    Set<Long> uidSet = new HashSet<Long>(); // distinct uid hashes in this bucket
    for (LongWritable v : values) {
        // copy the primitive out: Hadoop reuses the same Writable object across iterations
        uidSet.add(v.get());
    }
    ctx.write(key, new LongWritable(uidSet.size())); // distinct-user count for this bucket
}
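Both passes can be simulated in memory to check that summing per-bucket distinct counts reproduces the global UV. The sketch below uses `Math.floorMod(uid.hashCode(), 1000)` as a stand-in bucket function instead of the post's `NetUtil` hash. Unlike the reduce above, which stores uid hashes, it deduplicates the uid strings themselves. `TwoPassUv`, `firstPass`, and `secondPass` are hypothetical names.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory simulation of both MapReduce passes.
// Bucket function is a stand-in: Math.floorMod(uid.hashCode(), 1000).
final class TwoPassUv {
    static final int BUCKETS = 1000;

    // Pass 1: map emits (bucket, uid); reduce dedups each bucket with a HashSet.
    static Map<Integer, Integer> firstPass(List<String> uids) {
        Map<Integer, Set<String>> groups = new HashMap<>();
        for (String uid : uids) {
            int bucket = Math.floorMod(uid.hashCode(), BUCKETS);
            groups.computeIfAbsent(bucket, k -> new HashSet<>()).add(uid);
        }
        Map<Integer, Integer> counts = new HashMap<>();
        groups.forEach((bucket, set) -> counts.put(bucket, set.size()));
        return counts;
    }

    // Pass 2: sum per-bucket counts. Valid because each uid maps to exactly one bucket.
    static long secondPass(Map<Integer, Integer> counts) {
        long total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> uids = List.of("u1", "u2", "u1", "u3", "u2", "u4");
        long uv = secondPass(firstPass(uids));
        System.out.println(uv + " == " + new HashSet<>(uids).size()); // 4 == 4
    }
}
```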