Original poster: Nicolle
[Topic Compilation] Mahout [Share]


Nicolle posted on 2016-3-25 09:33:47
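Mahout in Action (Chinese edition)
  [Case Study] Cluster Analysis using Mahout
  [Case Study] Naive Bayesian using Mahout
  [Case Study] Hidden Markov Model Using Apache Mahout
  [Case Study] Logistic Regression Using Apache Mahout
  [Case Study] Naïve Bayes using Apache Mahout
  [Case Study] Random Forest Using Apache Mahout
  [Case Study] Mahout-Based Movie Recommendation System
  Mahout in Action
  Source Code Analysis of Mahout Association Rules
  Mahout Decision Trees: Partial Implementation in Practice
  Mahout Machine Learning Algorithms
  [Video Lecture] Scaling Intelligent Data Analysis with Apache Mahout
  [Video Lecture] Scaling machine learning with Apache Mahout
  [Big Data Era] Apache Mahout Cookbook
  [Big Data Era] Apache Mahout-Scalable Machine Learning for Everyone
  [Big Data Era] Dimensional Reduction using Apache Mahout
  [Big Data Era] Isabel Drost: Apache Mahout Large Scale Machine Learning
  [Big Data Era] Collected Mahout Learning Materials
  Mahout Algorithm Analysis and Case Studies in Practice (Mini-Book)
  Mahout in Action
  Learning Apache Mahout
  Learning Apache Mahout Classification
  Mahout in Action
  Apache Mahout Essentials
  Bayesian classifiers in Mahout
  Case Study using Mahout Recommendation Algorithm
  Collaborative Filtering with Apache Mahout
  Concepts and Characteristics of DB2, SQL Server, Spark, and Mahout
  Machine Learning using Mahout Naive Bayesian
  Source Code Analysis of the PFPGrowth Algorithm in Mahout
  Machine Learning Algorithms Implemented in Mahout
  A Detailed Guide to the Mahout Recommendation Algorithm API
  A Detailed Case Study: Building a Bayesian Text Classifier with Mahout
  Development Approach for Mahout's Bayesian Algorithm
  Mahout Resource Roundup
  On the Concepts and Characteristics of DB2, SQL Server, Spark, and Mahout
  Building a Social Recommendation Engine with Apache Mahout
  Big Data Technical Books (Hadoop, MapReduce, Hive, HBase, Redis, Mahout)
  Big Data Technical Books (Hadoop, MapReduce, Hive, HBase, Redis, Mahout): Bulk Release
  What are the respective pros and cons of Mahout and Weka, and which is easier to use? Any data mining experts with a CS background who can help
  Bayesian Classification Explained (using Mahout)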

bailihongchen posted on 2016-3-26 09:50:07

Naïve Bayes using Apache Mahout

  1. Create a 20newsdata directory and unzip the data there:
       mkdir /tmp/20newsdata
       cd /tmp/20newsdata
       tar -xzvf /tmp/20news-bydate.tar.gz
  2. You will see two folders under 20newsdata: 20news-bydate-test and 20news-bydate-train. Now create another directory called 20newsdataall and merge both the training and test data of the 20 newsgroups.
  3. Leave the directory, move to the home directory, and execute the following:
       mkdir /tmp/20newsdataall
       cp -R /tmp/20newsdata/*/* /tmp/20newsdataall
  4. Create a directory in Hadoop and save this data in HDFS:
       hadoop fs -mkdir /user/hue/20newsdata
       hadoop fs -put /tmp/20newsdataall /user/hue/20newsdata
  5. Convert the raw data into sequence files. The seqdirectory command generates sequence files from a directory. A sequence file is a flat file of binary key/value pairs, a format used in Hadoop. We convert the files into sequence files so that they can be processed in Hadoop, using the following command:
       bin/mahout seqdirectory -i /user/hue/20newsdata/20newsdataall -o /user/hue/20newsdataseq-out
Learning Apache Mahout Classification
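
The merge in steps 2 and 3 can be tried out locally before touching HDFS. The following is a minimal sketch, not part of the book's listing: the function name merge_splits, the temporary working directory, and the mock message files 1001 and 2001 are all assumptions made for illustration. It copies one split at a time rather than using a single wildcard cp, so that newsgroup folders with the same name in the training and test splits end up in one folder.

```shell
#!/usr/bin/env bash
set -euo pipefail

# merge_splits SRC DEST (hypothetical helper): copy the contents of every
# split directory under SRC (e.g. 20news-bydate-train, 20news-bydate-test)
# into DEST, so that each newsgroup folder holds files from both splits.
merge_splits() {
  local src="$1" dest="$2" split
  mkdir -p "$dest"
  for split in "$src"/*/; do
    # "<split>/." means "the contents of the split directory";
    # a newsgroup folder that already exists in DEST is merged into.
    cp -R "$split". "$dest"
  done
}

# Mock data standing in for the unpacked 20 newsgroups archive.
work="$(mktemp -d)"
mkdir -p "$work/20newsdata/20news-bydate-train/sci.space" \
         "$work/20newsdata/20news-bydate-test/sci.space"
echo "train message" > "$work/20newsdata/20news-bydate-train/sci.space/1001"
echo "test message"  > "$work/20newsdata/20news-bydate-test/sci.space/2001"

merge_splits "$work/20newsdata" "$work/20newsdataall"

# List the merged newsgroup folder to check that both files arrived.
ls "$work/20newsdataall/sci.space"
```

With the real data, the same function would be called as merge_splits /tmp/20newsdata /tmp/20newsdataall before running the hadoop fs -put step.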