Solr/Lucene Faceted Search Too Many Unique Values?

2014-01-22 Thread Bing Hua
Hi,

I am going to evaluate how Lucene/Solr handles faceted queries, in
particular with a single facet field that contains a large number (say, up
to 1 million) of distinct values. Does anyone have experience with how
Lucene performs in this scenario?

e.g. 
Doc1 has tags A B C D 
Doc2 has tags B C D E 
and so on, for millions of docs and potentially millions of distinct tag values.

Thanks
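
For readers who want to try this scenario, here is a minimal sketch of the request parameters for faceting on a single field. The field name tags and the limit of 20 are assumptions taken from the example above, not measured recommendations; facet, facet.field, facet.limit and facet.mincount are standard Solr faceting parameters. The class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

final class FacetQuerySketch {

    // Builds the query string of a select request that facets on one field.
    static String facetQuery(String field, int limit) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");             // match all documents
        params.put("rows", "0");            // only the facet counts are wanted
        params.put("facet", "true");
        params.put("facet.field", field);
        params.put("facet.limit", Integer.toString(limit)); // top N values only
        params.put("facet.mincount", "1");  // skip values with no matching docs

        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
        }
        return sb.toString();
    }

    // URL-encodes one key or value as UTF-8.
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        System.out.println("/select?" + facetQuery("tags", 20));
    }
}
```

Keeping facet.limit small means only the top values are returned even when the field has very many distinct values; how the counting itself performs is the open question of this thread.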



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Lucene-Faceted-Search-Too-Many-Unique-Values-tp4112860.html
Sent from the Solr - User mailing list archive at Nabble.com.


SolrCell takes InputStream

2012-12-04 Thread Bing Hua
Hi,

I am using

ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");

The two ways of adding a file are
up.addFile(File)
up.addContentStream(ContentStream)

However, my raw files are stored on remote storage devices, and I can get
an InputStream for each file to be indexed. It seems awkward to store the
file locally first. Is there a way to pass the InputStream in directly
(e.g. by constructing a ContentStream from the InputStream)?

Thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCell-takes-InputStream-tp4024315.html


Search match all tokens in Query Text

2013-01-31 Thread Bing Hua
Hello,

I have a field named text with type text_general.

When I query for text:a b, Solr returns results that contain only a but not
b. That is, it applies an OR operator between the two tokens.

Am I right here? What should I do to force an AND operator between the two
tokens?

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Search-match-all-tokens-in-Query-Text-tp4037758.html


Re: Search match all tokens in Query Text

2013-01-31 Thread Bing Hua
Thanks for the quick reply. It seems you are suggesting that I add an
explicit AND operator. I don't think this solves my problem.

I found it  somewhere, and this
works.
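
The setting mentioned above did not survive in the archive, so the following is not necessarily what was found. One request-level way to get AND semantics without writing AND between the tokens is the q.op parameter of the standard query parser. A minimal sketch, assuming the field is called text as in the question; the class and method names are hypothetical, and the user input is not escaped for query syntax:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class AndOperatorSketch {

    // Builds select parameters that require every token of the input to match.
    static String selectParams(String field, String userInput) {
        // Group the terms so that all of them apply to the same field.
        String q = field + ":(" + userInput.trim() + ")";
        // q.op=AND makes AND the default operator for this request only.
        return "q=" + encode(q) + "&q.op=AND";
    }

    // URL-encodes one value as UTF-8.
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        System.out.println("/select?" + selectParams("text", "a b"));
    }
}
```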





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Search-match-all-tokens-in-Query-Text-tp4037758p4037762.html


How to define a lowercase fieldtype without tokenizer

2013-02-14 Thread Bing Hua
Hi,

I don't want the field to be tokenized because Solr doesn't support sorting
on a tokenized field. To do case-insensitive sorting I need to copy a field
to a lowercased but untokenized field. How do I define this?

I tried defining an analyzer without a tokenizer, but Solr says I need to
specify a tokenizer or a class for the analyzer.

Thanks!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-define-a-lowercase-fieldtype-without-tokenizer-tp4040500.html


Re: How to define a lowercase fieldtype without tokenizer

2013-02-14 Thread Bing Hua
Works perfectly. Thank you. I didn't know before that this tokenizer does
nothing :)
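
The reply being answered is not part of this archive, so the exact tokenizer is not shown; the description matches a tokenizer that emits the whole field value as one token, and that is an assumption here. The effect of such a tokenizer followed by lowercasing can be sketched in plain Java: the entire value becomes a single lowercased sort key. This is an illustration of the idea, not Solr code, and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

final class LowercaseSortKeySketch {

    // The whole value becomes one lowercased key; nothing is split on whitespace.
    static String sortKey(String value) {
        return value.toLowerCase(Locale.ROOT);
    }

    // Returns a copy of the values ordered by their lowercased keys.
    static List<String> sortCaseInsensitive(List<String> values) {
        List<String> sorted = new ArrayList<>(values);
        sorted.sort(Comparator.comparing(LowercaseSortKeySketch::sortKey));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> titles = new ArrayList<>();
        titles.add("banana");
        titles.add("Apple pie");
        titles.add("apple");
        titles.add("Cherry");
        System.out.println(sortCaseInsensitive(titles));
    }
}
```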



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-define-a-lowercase-fieldtype-without-tokenizer-tp4040500p4040507.html


Multiple Embedded Servers Pointing to single solrhome/index

2012-08-06 Thread Bing Hua
Hi,

I'm trying to use two embedded Solr servers pointing to the same solrhome /
index. So that's something like

System.setProperty("solr.solr.home", "SomeSolrDir");
CoreContainer.Initializer initializer = new CoreContainer.Initializer();
CoreContainer coreContainer = initializer.initialize();
m_server = new EmbeddedSolrServer(coreContainer, "");

in both applications. The problem is that after I add and commit one
SolrInputDocument on one embedded server, the other server can no longer
obtain the write lock. I assume there must be a way of releasing the write
lock so that other servers can pick it up. Is there an API that does so?

Any inputs are appreciated.
Bing


Re: Multiple Embedded Servers Pointing to single solrhome/index

2012-08-07 Thread Bing Hua
Thanks, Lance. The use case is a cluster of nodes that each run the same
application with an EmbeddedSolrServer, all pointing to the same index on
NFS. Every application instance is equal, meaning that any of them may
index and/or search.

This way, after every commit the writer needs to be closed so that the
index is available to the other nodes.

Do you see any issues with this use case? Is the EmbeddedSolrServer able to
release its write lock without shutting down?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multiple-Embedded-Servers-Pointing-to-single-solrhome-index-tp3999451p3999591.html


Does Solr support 'Value Search'?

2012-08-07 Thread Bing Hua
Hi folks,

Just wondering if there is a query handler that simply takes a query string
and searches all or some of the fields for matching field values?

e.g. 
q=*admin*

Response may look like
author: [admin, system_admin, sub_admin]
last_modifier: [admin, system_admin, sub_admin]
doctitle: [AdminGuide, AdminManual]



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Does-Solr-support-Value-Search-tp3999654.html


Solr index storage strategy on FileSystem

2012-08-07 Thread Bing Hua
Hi folks,

With StandardDirectoryFactory, the index is stored under data/index in the
form of frq, tim, tip and a few other files. As the index grows, more files
are generated, and sometimes a few of them are merged. There seem to be
some kinds of separation and merging strategies at work.

My question is: are the separation / merging strategies configurable?
Basically I want to add a size limit for any individual file. Is that
feasible without changing Solr core code?

Thanks!
Bing



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-index-storage-strategy-on-FileSystem-tp3999661.html


Re: Does Solr support 'Value Search'?

2012-08-08 Thread Bing Hua
Thanks for the response, but wait... Is it related to my question about
searching for field values? I was not asking how to use wildcards.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Does-Solr-support-Value-Search-tp3999654p3999817.html


Re: Does Solr support 'Value Search'?

2012-08-08 Thread Bing Hua
I don't quite understand, but let me explain the problem I have. The
response would contain only fields, each with a list of its field values
that match the query. Essentially it's querying for field values rather
than documents. The underlying use case is a quick search box: as the user
types, the drill-down menu may contain matches on authors, on doctitles,
and potentially on other fields.

Thanks for your response all the same; hopefully this makes it clearer.
Bing
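
The response shape described above can be sketched in plain Java. This is not a Solr API: the per-field value lists are assumed to be already available in memory, and the class and method names are hypothetical. It only illustrates what "querying for field values rather than documents" would return.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class ValueSearchSketch {

    // For each field, keeps the values that contain the query (case-insensitive).
    // Fields without any matching value are left out of the result.
    static Map<String, List<String>> matchValues(
            Map<String, List<String>> fieldValues, String query) {
        String needle = query.toLowerCase(Locale.ROOT);
        Map<String, List<String>> hits = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : fieldValues.entrySet()) {
            List<String> matched = new ArrayList<>();
            for (String value : e.getValue()) {
                if (value.toLowerCase(Locale.ROOT).contains(needle)) {
                    matched.add(value);
                }
            }
            if (!matched.isEmpty()) {
                hits.put(e.getKey(), matched);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, List<String>> values = new LinkedHashMap<>();
        values.put("author", Arrays.asList("admin", "system_admin", "bob"));
        values.put("doctitle", Arrays.asList("AdminGuide", "Readme"));
        System.out.println(matchValues(values, "admin"));
    }
}
```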



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Does-Solr-support-Value-Search-tp3999654p327.html


Re: Multiple Embedded Servers Pointing to single solrhome/index

2012-08-09 Thread Bing Hua
Makes sense. Thank you.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multiple-Embedded-Servers-Pointing-to-single-solrhome-index-tp3999451p4000180.html


Re: Does Solr support 'Value Search'?

2012-08-09 Thread Bing Hua
Thanks Kuli and Mikhail,

Using either TermsComponent or the suggester I can get some suggested
terms, but I still don't see how to get the respective field names. To get
those with TermsComponent, I would need to run a term query against every
possible field. The same goes for SpellCheckComponent. CopyField won't help
since I want the original field name.

Any suggestions?
Bing 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Does-Solr-support-Value-Search-tp3999654p4000267.html


Re: Multiple Embedded Servers Pointing to single solrhome/index

2012-08-09 Thread Bing Hua
I agree. We chose embedded to minimize the maintenance cost of http solr
servers.

One more concern. Even if I have only one node doing the indexing, the
other nodes need to reopen their index readers periodically to catch up
with new changes, right? Is there a Solr request that does this?

Thanks,
Bing



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multiple-Embedded-Servers-Pointing-to-single-solrhome-index-tp3999451p4000269.html


Multiple SpellCheckComponents

2012-08-09 Thread Bing Hua
Hello,

Background: I want to use both the Suggest and SpellCheck features in a
single query so that alternatives from both are returned at once. Right now
I can specify only one of them, using spellcheck.dictionary at query time.

  

  default
  ..



  suggest
  

  

Can I use two separate SpellCheckComponents for these two and add them to
the same search handler to achieve this? I tried, and it seems that one
overwrites the other.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multiple-SpellCheckComponents-tp4000272.html


SpellCheckComponent Collation query

2012-08-09 Thread Bing Hua
Hello,

From the spell check component I'm able to get the collation query and its
number of hits. Is it possible to have Solr execute the collated query
automatically and return the document results, without the client having to
resend it?

Thanks,
Bing



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SpellCheckComponent-Collation-query-tp4000273.html


Tlog vs. buffer + softcommit.

2012-08-09 Thread Bing Hua
Hello,

I'm a bit confused about the purpose of transaction logs (update logs) in
Solr.

My understanding is that when an update request comes in, the new item is
first put in the RAM buffer as well as the T-Log. After a soft commit, the
new item becomes searchable but is not yet hard committed to stable storage.
Configuring the soft commit interval to 1 sec achieves NRT.

What exactly is the T-Log doing in this scenario, then? Why is it there, and
under what circumstances is it cleared?

I searched for online documentation without success, and am now trying to
work it out from the source code. Any hints would be appreciated.

Thanks,
Bing



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Tlog-vs-buffer-softcommit-tp4000330.html


Re: Tlog vs. buffer + softcommit.

2012-08-10 Thread Bing Hua
Thanks for the information. It definitely helps a lot. UpdateLog has
numDeletesToKeep = 1000 and numRecordsToKeep = 100, so that is probably
what you're referring to.

However, while I was indexing, the total size of the TLogs kept increasing,
which doesn't look as if there is a cap on the number of documents. Also,
is there an introduction to peersync somewhere online?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Tlog-vs-buffer-softcommit-tp4000330p4000503.html


Re: Tlog vs. buffer + softcommit.

2012-08-10 Thread Bing Hua
I remember that I did set a 15 sec autocommit and still saw the TLogs grow
without bound. But it sounds like, in theory, they should not if I index at
a constant rate. I'll probably try it again sometime.

As for peersync, I think SolrCloud now uses push replication rather than
pull. Hmm, it makes sense to keep a certain amount of TLogs for peers to
sync up.

Thanks,
Bing



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Tlog-vs-buffer-softcommit-tp4000330p4000509.html


Solr4.0 Partially update document

2012-08-13 Thread Bing Hua
Hi,

Several days ago I came across some SolrJ test code for partially updating
document field values. Sadly, I forgot where that was. In Solr 4.0,
"/update" is able to take a document id and fields as hashmaps, like

{"id": "doc1",
 "field1": {"set": "new_value"}}

I'm just trying to figure out what SolrJ client code does this.

Thanks for any help on this,
Bing
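
The nested structure in the question can be sketched with plain Java maps. This is only an illustration of the payload shape, not SolrJ code: the class, the method names and the tiny serializer are hypothetical, and every scalar is written as a JSON string. To my understanding, SolrJ accepts the same kind of map as a field value when it is passed to SolrInputDocument.addField, but that call is not exercised here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class AtomicUpdateSketch {

    // Builds {"id": id, field: {"set": newValue}} as nested maps.
    static Map<String, Object> setField(String id, String field, Object newValue) {
        Map<String, Object> operation = new LinkedHashMap<>();
        operation.put("set", newValue);   // "set" replaces the stored value
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", id);
        doc.put(field, operation);
        return doc;
    }

    // Minimal serializer: maps become JSON objects, anything else a JSON string.
    static String toJson(Object value) {
        if (value instanceof Map) {
            StringBuilder sb = new StringBuilder("{");
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                if (sb.length() > 1) {
                    sb.append(',');
                }
                sb.append(toJson(String.valueOf(e.getKey())))
                  .append(':')
                  .append(toJson(e.getValue()));
            }
            return sb.append('}').toString();
        }
        String s = String.valueOf(value);
        // Escape backslashes first, then double quotes.
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(toJson(setField("doc1", "field1", "new_value")));
    }
}
```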



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr4-0-Partially-update-document-tp4000875.html


Re: Solr4.0 Partially update document

2012-08-13 Thread Bing Hua
Got it at 

https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/solrj/src/test/org/apache/solr/client/solrj/SolrExampleTests.java

Problem solved.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr4-0-Partially-update-document-tp4000875p4000878.html


Getting Suggestions without Search Results

2012-08-13 Thread Bing Hua
Hi,

I have a spell check component that does auto-complete suggestions. It is
part of the "last-components" of my /select search handler, so apart from
the normal search results I also get a list of suggestions.

Now I want to split things up. Is there a way to get only the suggestions
for a query, without the normal search results? I may need to create a new
handler for this. Can anyone please give me some ideas on that?

Thanks,
Bing



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Getting-Suggestions-without-Search-Results-tp4000968.html


Re: Getting Suggestions without Search Results

2012-08-14 Thread Bing Hua
Great comments. Thanks to you all.
Bing



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Getting-Suggestions-without-Search-Results-tp4000968p4001192.html


Re: Indexing thousands file on solr

2012-08-14 Thread Bing Hua
You could write a client using SolrJ that loops through all files in that
folder. For each file, something like:

// One request per file, sent to the extracting handler
ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
up.addFile(new File(fileLocation), null);
ModifiableSolrParams p = new ModifiableSolrParams();
// literal.<field> sets a field value directly; str holds this document's id
p.add("literal.id", str);
...
up.setParams(p);
server.request(up);

Bing



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-thousands-file-on-solr-tp4001050p4001196.html


Re: Are there any comparisons of Elastic Search specifically with SOLR 4?

2012-08-14 Thread Bing Hua
Most existing comparisons were done with Solr 3.x or earlier against ES.
Now that Solr 4 has added cloud concepts similar to ES's, there are far
fewer differences. In my opinion, Solr is more heavyweight and was not
designed to maximize elasticity. It's not hard to decide which way to go
once you know whether you prefer better scalability or better stability and
online support.

Bing



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Are-there-any-comparisons-of-Elastic-Search-specifically-with-SOLR-4-tp4000889p4001237.html


Send plain text file to solr for indexing

2012-08-30 Thread Bing Hua
Hello,

I used to use SolrCell, which has built-in Tika support to handle both
extraction and indexing of raw documents. Now I have another text
extraction provider that converts each raw document to a plain text file,
so I want Solr to bypass the extraction phase. Is there a way to send the
plain text file to Solr and simply index it as a fulltext field, without
running extraction on that file?

Thanks,
Bing



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Send-plain-text-file-to-solr-for-indexing-tp4004515.html


Re: Send plain text file to solr for indexing

2012-08-31 Thread Bing Hua
So in order to use SolrCell I'll have to add a number of dependent
libraries, which is one of the things I'm trying to avoid. The second thing
is that SolrCell still parses the plain text files, and I don't want it to
make any changes to my exported files.

Any ideas?
Bing



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Send-plain-text-file-to-solr-for-indexing-tp4004515p4004753.html


Re: Send plain text file to solr for indexing

2012-08-31 Thread Bing Hua
Thanks, Mr. Yagami. I'll look into that.

Jack, the latter two options both require reading the entire text file into
memory, right?

Bing



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Send-plain-text-file-to-solr-for-indexing-tp4004515p4004772.html