hadoop运行mahout的贝叶斯

沙诺 posted @ 2013年11月17日 20:27 in 云计算 with tags hadoop mahout 分类 , 2409 阅读

mahout运行bayes(贝叶斯)算法的前提条件:

(1)启动hadoop
hadoop@master:~$ start-all.sh
(2)成功编译mahout源码
hadoop@master:~$ cd $MAHOUT_HOME
hadoop@master:~/mahout-0.5-src/mahout-distribution-0.5$ mvn install -Dmaven.test.skip=true

mahout运行bayes(贝叶斯)算法的步骤:

(1)生成input的数据
hadoop@master:~/mahout-0.5-src/mahout-distribution-0.5$ mahout org.apache.mahout.classifier.bayes.PrepareTwentyNewsgroups -p /home/hadoop/mahout-0.5-src/mahout-distribution-0.5/my-test-data/20news-bydate/20news-bydate-train -o /home/hadoop/mahout-0.5-src/mahout-distribution-0.5/my-test-result/bayes-train-input -a org.apache.mahout.vectorizer.DefaultAnalyzer -c UTF-8  
运行结果:
Running on hadoop, using HADOOP_HOME=/home/hadoop/cloud/hadoop-1.0.4
HADOOP_CONF_DIR=/home/hadoop/cloud/confDir/hadoop/conf
Warning: $HADOOP_HOME is deprecated.

13/08/09 14:07:09 WARN driver.MahoutDriver: No org.apache.mahout.classifier.bayes.PrepareTwentyNewsgroups.props found on classpath, will use command-line arguments only
13/08/09 14:07:14 INFO driver.MahoutDriver: Program took 5202 ms
(2)生成test的数据
hadoop@master:~/mahout-0.5-src/mahout-distribution-0.5$ mahout org.apache.mahout.classifier.bayes.PrepareTwentyNewsgroups -p /home/hadoop/mahout-0.5-src/mahout-distribution-0.5/my-test-data/20news-bydate/20news-bydate-test -o /home/hadoop/mahout-0.5-src/mahout-distribution-0.5/my-test-result/bayes-test-input -a org.apache.mahout.vectorizer.DefaultAnalyzer -c UTF-8  
运行结果:
Running on hadoop, using HADOOP_HOME=/home/hadoop/cloud/hadoop-1.0.4
HADOOP_CONF_DIR=/home/hadoop/cloud/confDir/hadoop/conf
Warning: $HADOOP_HOME is deprecated.

13/08/09 14:13:35 WARN driver.MahoutDriver: No org.apache.mahout.classifier.bayes.PrepareTwentyNewsgroups.props found on classpath, will use command-line arguments only
13/08/09 14:13:38 INFO driver.MahoutDriver: Program took 3428 ms
(3)将训练文本集上传到HDFS上
hadoop@master:~/mahout-0.5-src/mahout-distribution-0.5$ hadoop dfs -put /home/hadoop/mahout-0.5-src/mahout-distribution-0.5/my-test-result/bayes-train-input/ bayes-train-input
(4)模型训练:依据训练文本集来训练贝叶斯分类器模型
解释一下命令:-i:表示训练集的输入路径,HDFS路径; -o:分类模型输出路径; -type:分类器类型,这里使用bayes,可选cbayes;
-ng:(n-gram)建模的大小,默认为1; -source:数据源的位置,HDFS或HBase
hadoop@master:~/mahout-0.5-src/mahout-distribution-0.5$ mahout trainclassifier -i bayes-train-input -o bayes-newsmodel -type bayes -ng 1 -source hdfs
(5)将测试文本集上传到HDFS上
hadoop@master:~/mahout-0.5-src/mahout-distribution-0.5$ hadoop dfs -put /home/hadoop/mahout-0.5-src/mahout-distribution-0.5/my-test-result/bayes-test-input/ bayes-test-input
(6)模型测试:依据训练的贝叶斯分类器模型来进行分类测试
hadoop@master:~/mahout-0.5-src/mahout-distribution-0.5$ mahout testclassifier -m bayes-newsmodel -d bayes-test-input -type bayes -ng 1 -source hdfs -method mapreduce
运行结果(与apache官网里面的一致):
13/08/09 14:56:34 INFO bayes.BayesClassifierDriver: =======================================================
Confusion Matrix
-------------------------------------------------------
a        b        c        d        e        f        g        h        i        j        k        l        m        n        o        p        q        r        s        t        u        <--Classified as
381      0        0        0        0        9        1        0        0        0        1        0        2        0        0        1        0        0        3        0        0         |  398       a     = rec.motorcycles
1        284      0        0        0        0        1        0        6        3        11       0        3        66       0        1        6        0        4        9        0         |  395       b     = comp.windows.x
2        0        339      2        0        3        5        1        0        0        0        0        1        1        12       1        7        0        2        0        0         |  376       c     = talk.politics.mideast
4        0        1        327      0        2        2        0        0        2        1        1        5        0        1        4        12       0        2        0        0         |  364       d     = talk.politics.guns
7        0        4        32       27       7        7        2        0        12       0        0        0        6        100      9        7        31       0        0        0         |  251       e     = talk.religion.misc
10       0        0        0        0        359      2        2        0        1        3        0        6        1        0        1        0        0        11       0        0         |  396       f     = rec.autos
0        0        0        0        0        1        383      9        1        0        0        0        0        0        0        0        0        0        3        0        0         |  397       g     = rec.sport.baseball
1        0        0        0        0        0        9        382      0        0        0        0        1        1        1        0        2        0        2        0        0         |  399       h     = rec.sport.hockey
2        0        0        0        0        4        3        0        330      4        4        0        12       5        0        0        2        0        12       7        0         |  385       i     = comp.sys.mac.hardware
0        3        0        0        0        0        1        0        0        368      0        0        4        10       1        3        2        0        2        0        0         |  394       j     = sci.space
0        0        0        0        0        3        1        0        27       2        291      0        25       11       0        0        1        0        13       18       0         |  392       k     = comp.sys.ibm.pc.hardware
8        0        1        109      0        6        11       4        1        18       0        98       3        1        11       10       27       1        1        0        0         |  310       l     = talk.politics.misc
6        0        1        0        0        4        2        0        5        2        12       0        321      8        0        4        14       0        8        6        0         |  393       m     = sci.electronics
0        11       0        0        0        3        6        0        10       7        11       0        13       298      0        2        13       0        7        8        0         |  389       n     = comp.graphics
2        0        0        0        0        0        4        1        0        3        1        0        1        3        372      6        0        2        1        2        0         |  398       o     = soc.religion.christian
4        0        0        1        0        2        3        3        0        4        2        0        12       7        6        342      1        0        9        0        0         |  396       p     = sci.med
0        1        0        1        0        1        4        0        3        0        1        0        4        8        0        2        369      0        1        1        0         |  396       q     = sci.crypt
10       0        4        10       1        5        6        2        2        6        2        0        1        2        86       15       14       152      0        1        0         |  319       r     = alt.atheism
4        0        0        0        0        9        1        1        8        1        12       0        6        3        0        2        0        0        341      2        0         |  390       s     = misc.forsale
8        5        0        0        0        1        6        0        8        5        50       0        2        39       1        0        9        0        3        257      0         |  394       t     = comp.os.ms-windows.misc
0        0        0        0        0        0        0        0        0        0        0        0        0        0        0        0        0        0        0        0        0         |  0         u     = unknown
Default Category: unknown: 20


13/08/09 14:56:34 INFO driver.MahoutDriver: Program took 118128 ms

这一篇是转的,上一篇是自己写的,但是自己写的中间过程不完整,所以又转了一篇。

Avatar_small
Celeb Networth 说:
2020年9月28日 01:37

Billie Eilish - the youngest person and second person ever to win the four main Grammy categories, was born in December 18, 2001, find out more about other singer's birthday on Idol Worth

Avatar_small
토토디펜드 说:
2024年9月09日 21:50

Our the purpose is to share the reviews about the latest Jackets,Coats and Vests also share the related Movies,Gaming, Casual,Faux Leather and Leather materials available

Avatar_small
get more info 说:
2024年9月09日 22:10

Pretty useful article. I merely stumbled upon your internet site and wanted to say that I’ve very favored learning your weblog posts. Any signifies I’ll be subscribing with your feed and I hope you publish once additional soon.

Avatar_small
View data 说:
2024年9月09日 22:12

Hello to every one, the contents existing at this site are actually remarkable for people knowledge,Pretty! This has been a really wonderful article...Thank you for providing this information.Hello.This article was extremely motivating, particularly because I waas looking for thoughts onn his subjectt lat Sunday. 

Avatar_small
website 说:
2024年9月09日 22:35

I am incapable of reading articles online very often, but I’m happy I did today. It is very well written, and your points are well-expressed. I request you warmly, please, don’t ever stop writing.Do you want to buy the best Chromebook for artists?

Avatar_small
먹튀젠더 说:
2024年9月09日 22:36

This surely helps me in my work. Lol, thanks for your comment! wink Glad you found it helpful..f you want to be successful in weight loss, you have to focus on more than just how you look. An approach that taps into how you feel, your overall health

Avatar_small
contents 说:
2024年9月09日 22:54

These are all really solid tips. I can’t tell you how often I see a comment on one of the blogs I write for that amounts to “nice post” and nothing more. I’ve even worked at a couple of sites that delete comments that don’t add to the discussion!

Avatar_small
click here 说:
2024年9月09日 22:56

alary of a fashion largely depends on the nature of work. Opening a fashion outlet or a boutique gives a low start, but; ensures higher returns in the long term depending on the creativity and market sense of an individual

Avatar_small
토디야 说:
2024年9月09日 22:58

Hello, I'm happy to see some great articles on your site. Would you like to come to my site later? My site also has posts, comments and communities similar to yours. Please visit and take a look

Avatar_small
get more info 说:
2024年9月09日 23:01

Thanks for all the tips mentioned in this article! it’s always good to read things you have heard before and are implementing, but from a different perspective, always pick up some extra bits of information.

Avatar_small
read more 说:
2024年9月09日 23:02

I am unable to read articles online very often, but I’m glad I did today. This is very well written and your points are well-expressed. Please, don’t ever stop writing.I’ve even worked at a couple of sites that delete comments that don’t add to the discussion!

Avatar_small
get more info 说:
2024年9月09日 23:04

This is a great article, Given such a great amount of information in it, These kind of articles keeps the clients enthusiasm for the site, and continue sharing more … good fortunes.Click Here

Avatar_small
here 说:
2024年9月09日 23:05

 Exceptional examine, positive website online, in which did u come up with the data on this posting? I have read a number of the articles in your website now, and i definitely like your style. Thank you a million and please maintain up the powerful work.

Avatar_small
here 说:
2024年9月09日 23:07

I was very pleased to find this page. I wanted to thank you for your time due to this wonderful read!! I definitely loved every part of it and I have you bookmarked to check out new information on your blog. Feel free to visit my website;

Avatar_small
More details 说:
2024年9月09日 23:08

 to read it,Waiting For More new Update and I Already Read your Recent Post its Great Thank..great article. thank you for sharin..Great post. Thank You For Sharing Valuable. information. It is Very Informative article..I urge you to peruse this content it is fun portrayed .

Avatar_small
nakedemperornews 说:
2024年9月09日 23:09

Your article is very interesting. I think this article has a lot of information needed, looking forward to your new posts. Get permission to share .I bookmark this site and will track down your posts often from now on. Much obliged once more

Avatar_small
안전놀이터 说:
2024年9月09日 23:11

Are you playing in an old, beat up, pair of basketball shoes? You could be risking injury, poor game play, and more! This article shows you why picking the right basketball is so important, and how to do it properly. 

Avatar_small
here 说:
2024年9月09日 23:12

Thanks for all the tips mentioned in this article! it’s always good to read things you have heard before and are implementing, but from a different perspective, always pick up some extra bits of information.

Avatar_small
토토실험실 说:
2024年9月09日 23:37

I am incapable of reading articles online very often, but I’m happy I did today. It is very well written, and your points are well-expressed. I request you warmly, please, don’t ever stop writing.Do you want to buy the best Chromebook for artists?

Avatar_small
먹튀뱅크 说:
2024年9月09日 23:59

I am aware that people like you, are naturally intuitive, resourceful and already have the ability to heal yourself and transform into the person you want to be, you just need assistance to tap into your own resources.

Avatar_small
References 说:
2024年9月10日 00:22

I am incapable of reading articles online very often, but I’m happy I did today. It is very well written, and your points are well-expressed. I request you warmly, please, don’t ever stop writing.Do you want to buy the best Chromebook for artists?

Avatar_small
get more info 说:
2024年9月10日 00:37

 is everything that deals with clothes, accessories, footwear, jewelry, hairstyle and etc. It is a habitual trend in which a person dresses up in her best, does her make up, wears her accessories and shoes. Looking good is the main aim of fashion

Avatar_small
contents 说:
2024年9月10日 01:01

Pretty useful article. I merely stumbled upon your internet site and wanted to say that I’ve very favored learning your weblog posts. Any signifies I’ll be subscribing with your feed and I hope you publish once additional soon.


登录 *


loading captcha image...
(输入验证码)
or Ctrl+Enter