How to integrate CLD(Google Compact Language Detector) in solr?

2014-05-22 Thread Shuai Zhang
Hi, 

Does anyone have experience with CLD1 or CLD2 in Solr?

According to Michael McCandless, the accuracy and performance of Google's 
Compact Language Detector are very good, so I would like to try it, but I 
don't know how to use it with Solr.
 


Thanks and Best Regards,
--
Gabriel Zhang


Re: How to integrate CLD(Google Compact Language Detector) in solr?

2014-05-22 Thread Shuai Zhang



 
Correcting the link to Michael McCandless's post: "Changing Bits: Accuracy and 
performance of Google's Compact Language Detector" (blog.mikemccandles...).
 
--
Gabriel Zhang


Re: Stopwords

2014-06-26 Thread Shuai Zhang
Hi,

In fact, you can use the Analysis page to check what the analyzer produces at 
index time and at query time.
 


--
Gabriel Zhang



On Thursday, June 26, 2014 5:33 PM, Geert Van Huychem  
wrote:
 


Hello
 
We have the default Dutch stopwords implemented in our Solr instance, so words 
like ‘de’, ‘het’, ‘ben’ are filtered at index time.
 
Is there a way to trick Solr into ignoring those stopwords at query time, when 
a user puts the search terms between quotes?
 
Best
 
 Geert Van Huychem 
 IT Services & Applications Manager 
 T. +32 2 741 60 22
M. +32 497 27 69 03
ge...@iframeworx.be 
 Media ID CVBA
Rue Barastraat 175
1070 Bruxelles - Brussel (BE)
www.media-id.be   
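
As an illustration of the behaviour described in the question, here is a toy sketch in plain Java. It only mimics the effect of a stop filter and is not Solr's analysis chain; the class name, the method name, and the three-word stopword set are assumptions taken from the examples in the message.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy illustration only: mimics the effect of a stop filter,
// it does not use Solr's analysis classes.
class StopwordSketch {
    // Hypothetical stopword set, limited to the three examples in the message.
    static final Set<String> STOPWORDS =
            new HashSet<String>(Arrays.asList("de", "het", "ben"));

    // Lowercase the text, split on whitespace, and drop stopwords.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String token : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!token.isEmpty() && !STOPWORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "het" is dropped before anything would be written to the index.
        System.out.println(analyze("het huis"));
    }
}
```

Since the filtered words never reach the index in this model, a query-time change alone has nothing to match them against. The Analysis page mentioned in the reply shows the same effect for the real field type.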

Does solrj support partial update for solr cloud?

2014-07-10 Thread Shuai Zhang


I am using Solr 4.7.1. When I tested the partial update operation, it worked 
fine with HttpSolrServer, but with CloudSolrServer (SolrCloud) it does not 
seem to be supported.

The whole document is replaced instead of being partially updated.


The code I used in my program:

        // The generic type parameters were lost in the archive; the types below are assumed.
        Map<String, Object> boxUpdateMap = new HashMap<String, Object>();
        boxUpdateMap.put("set", boxId);
        Map<String, Object> folderUpdateMap = new HashMap<String, Object>();
        folderUpdateMap.put("set", folderId);
        Map<String, List<String>> tagUpdateMap = new HashMap<String, List<String>>();
        tagUpdateMap.put("set", tagList);

        document.addField("box", boxUpdateMap);
        document.addField("folder", folderUpdateMap);
        document.addField("tag", tagUpdateMap);


Can anyone give me some advice?

Thank you very much!

Best Regards,
--
Gabriel Zhang


Re: Does solrj support partial update for solr cloud?

2014-07-11 Thread Shuai Zhang
Thanks shamik, I will check it!
 


--
Gabriel Zhang



On Friday, July 11, 2014 1:39 PM, shamik  wrote:
 


Yes, it does, and it is pretty straightforward.

Refer to the following URLs:

http://heliosearch.org/solr/atomic-updates/

http://www.mumuio.com/solrj-4-0-0-alpha-atomic-updates/



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Does-solrj-support-partial-update-for-solr-cloud-tp4146654p4146660.html
Sent from the Solr - User mailing list archive at Nabble.com.
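
The "set" maps in the question follow Solr's atomic update syntax, in which a field value is replaced by a map from a modifier to the new value. The sketch below uses plain Java only, so that it runs without SolrJ. It builds that structure and renders it as JSON. The uniqueKey field name "id", the sample values, and the small JSON writer are assumptions made for illustration, not part of the original code or of the SolrJ API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AtomicUpdateSketch {
    // Wrap a value in a {"set": value} modifier map.
    static Map<String, Object> set(Object value) {
        return Collections.<String, Object>singletonMap("set", value);
    }

    // Build one atomic-update document; "id" is an assumed uniqueKey field name.
    static Map<String, Object> buildUpdate(String id, String boxId,
                                           String folderId, List<String> tags) {
        Map<String, Object> doc = new LinkedHashMap<String, Object>();
        doc.put("id", id);
        doc.put("box", set(boxId));
        doc.put("folder", set(folderId));
        doc.put("tag", set(tags));
        return doc;
    }

    // Minimal JSON rendering for maps, lists and strings (enough for this sketch).
    static String toJson(Object value) {
        if (value instanceof Map) {
            StringBuilder sb = new StringBuilder("{");
            boolean first = true;
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
                if (!first) sb.append(',');
                sb.append(toJson(String.valueOf(entry.getKey())))
                  .append(':')
                  .append(toJson(entry.getValue()));
                first = false;
            }
            return sb.append('}').toString();
        }
        if (value instanceof List) {
            StringBuilder sb = new StringBuilder("[");
            boolean first = true;
            for (Object item : (List<?>) value) {
                if (!first) sb.append(',');
                sb.append(toJson(item));
                first = false;
            }
            return sb.append(']').toString();
        }
        // Strings: escape backslashes and double quotes only.
        String text = String.valueOf(value);
        return "\"" + text.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        Map<String, Object> doc =
                buildUpdate("mail-1", "inbox", "2014", Arrays.asList("work", "solr"));
        System.out.println(toJson(doc));
    }
}
```

Fields that carry a modifier map are changed and the remaining fields of the stored document are kept, provided the document includes its unique key. Printing the structure this way can help check what is being sent when a client appears to replace the whole document.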

Is there any data importer for cassandra in solr?

2014-07-13 Thread Shuai Zhang
Hi all,

We currently use Cassandra as our database, and I have to rebuild all the Solr 
indices, but I cannot find any data importer for Cassandra.

What should I do in this situation?

Can anyone give me some advice?


Thanks very much~~

Regards,
--
Gabriel Zhang


Re: Is there any data importer for cassandra in solr?

2014-07-13 Thread Shuai Zhang
Hi Alexandre and Jack,

Thanks for your advice, but I still cannot find a good solution for my 
requirement.

Our Cassandra cluster holds a very large amount of data, and the Solr 
cluster's indices total more than 120GB. Rebuilding all the indices by 
fetching all the data from Cassandra with the Netflix API would be very slow 
(the process would take more than 5 months, which is too slow).

I suspect this is not the best approach, so I hope to find a better way to 
solve it.

--
Gabriel Zhang



On Sunday, July 13, 2014 8:11 PM, Jack Krupansky  
wrote:
 


Simple csv files are the easiest way to go:

http://www.datastax.com/dev/blog/simple-data-importing-and-exporting-with-cassandra

The Solr Data Import Handler can be used to import from RDBMS databases to 
DataStax Enterprise with its Solr integration:

http://www.datastax.com/documentation/datastax_enterprise/4.0/datastax_enterprise/srch/srchConfDataHand.html

And you can use csv flat files exported by typical RDBMS's using DSE/Solr as 
with regular OSS Solr.

DataStax Enterprise also supports Hadoop/Sqoop for importing from RDBMS 
databases:

http://www.datastax.com/2012/03/how-to-move-data-from-relational-databases-to-datastax-enterprise-cassandra-using-sqoop

There are also ETL tools from Talend, Pentaho, and JasperSoft that can be 
used to import from RDBMS databases into DataStax Enterprise:

http://www.datastax.com/dev/blog/ways-to-move-data-tofrom-datastax-enterprise-and-cassandra

If those approaches are not sufficient for your needs, maybe you could 
elaborate on any special needs you have.

-- Jack Krupansky
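
A minimal sketch of the CSV route Jack mentions, in plain Java: it writes rows with standard CSV quoting so that a CSV loader can read them back. The column names (id, subject, body) and the sample row are hypothetical, and how the rows are read out of Cassandra is left out.

```java
import java.util.Arrays;
import java.util.List;

class CsvExportSketch {
    // Quote a field if it contains a comma, a double quote, or a line break;
    // embedded double quotes are doubled.
    static String escape(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        String escaped = field.replace("\"", "\"\"");
        return needsQuotes ? "\"" + escaped + "\"" : escaped;
    }

    // Join already-escaped fields into one CSV line (without the line terminator).
    static String toCsvLine(List<String> fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(escape(fields.get(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Header row with hypothetical column names, followed by one sample row.
        System.out.println(toCsvLine(Arrays.asList("id", "subject", "body")));
        System.out.println(toCsvLine(Arrays.asList("mail-1", "Hello, world", "She said \"hi\"")));
    }
}
```

Mail bodies often contain commas, quotes and line breaks, so the quoting step is the part most likely to matter in practice.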



Re: Is there any data importer for cassandra in solr?

2014-07-13 Thread Shuai Zhang


Hi Alexandre,

Do you mean the things that you mentioned or the ones that Jack mentioned? 
I searched for information about DSE, but I could not find what I need.
Maybe I need to search more...

Thanks again!

Regards,
--
Gabriel Zhang



On Sunday, July 13, 2014 11:24 PM, Alexandre Rafalovitch  
wrote:
 


So you've tried all the things above?

Not clear what the exact problem is that you are trying to solve.

Regards,
    Alex


Re: Is there any data importer for cassandra in solr?

2014-07-13 Thread Shuai Zhang
Hi Jack,

Sorry for the confusion, and thanks for your reply!

I changed the Solr document structure, so I have to rebuild all the indices.

Our system (a mail system) uses Cassandra as its database, so to rebuild the 
indices for all mails I need to read the data from Cassandra with the Thrift 
API and post update requests with that data through the SolrJ API. But this 
takes a very long time: we have tested it, and rebuilding all the indices 
would need almost 5 months. (We currently have 0.3 billion documents and an 
index size of 130GB across 18 Solr shards, growing by 500 thousand per day.)

I think the Thrift API is too slow, so I want to find another solution.
That is why I said I am looking for something like the Solr Data Import 
Handler (DIH).

Thanks very much~~


Regards,
--
Gabriel Zhang
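
The figures in this message imply a fairly low indexing rate, which a short calculation makes concrete. This is a sketch under stated assumptions: "5 months" is taken as 150 days, and the 7-day target is a hypothetical value chosen only for comparison.

```java
class ReindexEstimateSketch {
    static final long SECONDS_PER_DAY = 24L * 60 * 60;

    // Average documents per second needed to index totalDocs within the given days.
    static double docsPerSecond(long totalDocs, double days) {
        return totalDocs / (days * SECONDS_PER_DAY);
    }

    public static void main(String[] args) {
        long totalDocs = 300000000L; // 0.3 billion documents, from the message

        // Assumption: "almost 5 months" is approximated as 150 days.
        System.out.printf("implied rate: %.1f docs/s%n", docsPerSecond(totalDocs, 150));

        // Hypothetical target of one week, for comparison only.
        System.out.printf("rate for 7 days: %.1f docs/s%n", docsPerSecond(totalDocs, 7));
    }
}
```

Under these assumptions the tested pipeline averages about 23 documents per second in total, or 2 million per day, which is only four times the stated daily growth of 500 thousand.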



On Monday, July 14, 2014 10:27 AM, Jack Krupansky  
wrote:
 


Make sure your per-node Solr index data for DSE fits completely in the OS 
system memory that is available for file system caching (just like we try to 
do for OSS Solr!), and limit each node to about 50 million documents or so. 
Anything bigger than a 32GB memory node is probably a waste for a DSE Solr 
node. A 16GB machine for each DSE Solr node is probably okay, but then you 
may have to stay somewhat under that 50 million doc number for each node. 
Proper provisioning of the cluster with enough nodes and enough memory per 
node and not too many documents per node is essential.

But... none of that has anything to do with your subject question of "data 
importer", so... what is the real question here?

-- Jack Krupansky
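
Jack's rule of thumb can be checked against the figures given in the reply above (0.3 billion documents, a 130GB index, 18 shards). The sketch below is plain arithmetic; the class and method names are hypothetical, and it assumes documents and index data are spread evenly across shards.

```java
class NodeSizingSketch {
    // Smallest node count that keeps every node at or below maxDocsPerNode.
    static long minNodes(long totalDocs, long maxDocsPerNode) {
        return (totalDocs + maxDocsPerNode - 1) / maxDocsPerNode; // ceiling division
    }

    // Even share of a total (documents or gigabytes) per shard.
    static double perShard(double total, int shards) {
        return total / shards;
    }

    public static void main(String[] args) {
        long totalDocs = 300000000L;     // 0.3 billion documents
        long maxDocsPerNode = 50000000L; // about 50 million per node

        System.out.println("minimum nodes: " + minNodes(totalDocs, maxDocsPerNode));
        System.out.printf("docs per shard: %.0f%n", perShard(totalDocs, 18));
        System.out.printf("index GB per shard: %.1f%n", perShard(130.0, 18));
    }
}
```

With an even spread, 18 shards hold roughly 16.7 million documents and about 7.2GB of index each, which is below the 50 million per node limit mentioned above.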

