[R] Regarding your distributed text mining with tm

Shivani Rao Fri, 22 Oct 2010 23:19:25 -0700

Hello,
I have already been using R for text mining. I now want to use R for
large-scale text processing and for experiments with topic modeling. I have
started reading tutorials and working through some of them. Below is my
understanding of each of the tools:


1) R text mining toolbox (tm): meant for local (client-side) text
processing; it uses the XML library
2) Hive: Hadoop InteractiVE; provides a framework for calling map/reduce
and an interface for storing files on the DFS
3) RHIPE: R and Hadoop Integrated Programming Environment
4) Elastic MapReduce with R: a MapReduce framework for those who do not have
their own clusters
5) Distributed Text Mining with R: an attempt to make a seamless move from
local to server-side processing, from R-tm to R-distributed-tm

I have the following questions about the above packages:

1) Hive, RHIPE, and the distributed text mining toolbox all require you to
have your own cluster, right?

2) If I have just one computer, how would the DFS work in the case of Hive?

3) Are we facing a problem of duplicated effort across the above
packages?

I am hoping to get some insight into the above questions in the next few
days. A timely response would be very helpful.

Thanks and Regards,
Shivani

-- 
Research Scholar,
School of Electrical and Computer Engineering
Purdue University
West Lafayette IN
web.ics.purdue.edu/~sgrao <http://web.ics.purdue.edu/%7Esgrao>





______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
