Solr/Lucene Faceted Search Too Many Unique Values?
Hi, I am going to evaluate how Lucene/Solr handles faceted queries, in particular with a single facet field that contains a large number (say, up to 1 million) of distinct values. Does anyone have experience with how Lucene performs in this scenario? For example:

Doc1 has tags A B C D
Doc2 has tags B C D E

and so on, with millions of docs and potentially millions of distinct tag values. Thanks
-- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Lucene-Faceted-Search-Too-Many-Unique-Values-tp4112860.html Sent from the Solr - User mailing list archive at Nabble.com.
SolrCell takes InputStream
Hi, I am using

ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");

The two ways of adding a file are

up.addFile(File)
up.addContentStream(ContentStream)

However, my raw files are stored on remote storage devices, and I am able to get an InputStream for the file to be indexed. It seems awkward to store the file locally first. Is there a way to pass the InputStream in directly (e.g. by constructing a ContentStream from the InputStream)? Thanks.
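For illustration only, here is a minimal sketch of one way to avoid the temporary file: stream the InputStream straight into the HTTP request body of /update/extract using plain JDK classes instead of SolrJ. This is a swapped-in technique, not the SolrJ API asked about. The class name, method names and base URL are hypothetical, and literal.id and commit=true are ordinary request parameters chosen for the example. Within SolrJ itself, the usual route, as far as I know, is a small subclass of ContentStreamBase whose getStream() returns the InputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Solr or SolrJ.
final class ExtractStreamPoster {

    /** Builds the extract URL; baseUrl and id are supplied by the caller. */
    static String extractUrl(String baseUrl, String id) {
        try {
            return baseUrl + "/update/extract?literal.id="
                    + URLEncoder.encode(id, "UTF-8") + "&commit=true";
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    /** Copies in fixed-size chunks so the whole file is never held in memory. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }

    /** POSTs the stream as the request body and returns the HTTP status code. */
    static int post(String url, InputStream in, String contentType) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setChunkedStreamingMode(8192); // send chunks instead of buffering the body
        conn.setRequestProperty("Content-Type", contentType);
        try (OutputStream out = conn.getOutputStream()) {
            copy(in, out);
        }
        return conn.getResponseCode();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the InputStream obtained from remote storage.
        InputStream remote = new ByteArrayInputStream(
                "sample file content".getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        System.out.println(copy(remote, sink) + " bytes copied");
        System.out.println(extractUrl("http://localhost:8983/solr", "doc 1"));
    }
}
```

The main method only exercises the copy and URL helpers; post() needs a running Solr instance and is not called here.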
Search match all tokens in Query Text
Hello, I have a field named text of type text_general. When I query for text:a b, Solr returns results that contain a but not b. That is, it applies the OR operator between the two tokens. Am I right? What should I do to force an AND operator between the two tokens? Thanks
Re: Search match all tokens in Query Text
Thanks for the quick reply. It seems you are suggesting adding an explicit AND operator. I don't think this solves my problem. I found it somewhere, and this works.
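The setting the poster found is not preserved in the archive. As a hedged sketch of one known option, the standard query parser accepts a q.op request parameter that switches the default operator to AND for a single request. Note also that in text:a b the field prefix applies only to a, so the terms are grouped as text:(a b) here. The class and method names below are made up for the example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper that only builds request parameters; it sends nothing.
final class AndQueryExample {

    /** URL-encodes one parameter value as UTF-8. */
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    /** Groups all terms under the field: text:(a b) rather than text:a b. */
    static String fieldQuery(String field, String... terms) {
        return field + ":(" + String.join(" ", terms) + ")";
    }

    /** Adds q.op=AND so every term must match. */
    static String selectParams(String query) {
        return "q=" + enc(query) + "&q.op=AND";
    }

    public static void main(String[] args) {
        String q = fieldQuery("text", "a", "b");
        System.out.println(q);
        System.out.println("/select?" + selectParams(q));
    }
}
```

Passing the operator per request leaves the schema and handler defaults untouched, which may or may not be what the poster wanted.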
How to define a lowercase fieldtype without tokenizer
Hi, I don't want the field to be tokenized because Solr doesn't support sorting on a tokenized field. To do case-insensitive sorting I need to copy a field to a lowercased but untokenized field. How do I define this? I did the below, but it says I need to specify a tokenizer or a class for the analyzer. Thanks!
Re: How to define a lowercase fieldtype without tokenizer
Works perfectly. Thank you. I didn't know before that this tokenizer does nothing :)
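The reply with the actual field type is not in the archive. Assuming the tokenizer meant here is the keyword tokenizer (solr.KeywordTokenizerFactory), which emits the whole field value as a single token, followed by a lowercase filter, the effect on sorting can be illustrated in plain Java. This is only a sketch of the behaviour, not Solr code, and all names in it are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Illustration of "one lower-cased token per value" as a sort key.
final class LowercaseSortKeyDemo {

    /** The whole value becomes the key: no splitting, only lower-casing. */
    static String sortKey(String value) {
        return value.toLowerCase(Locale.ROOT);
    }

    /** Returns a copy of the values ordered by their lower-cased key. */
    static List<String> sortCaseInsensitive(List<String> values) {
        List<String> sorted = new ArrayList<>(values);
        sorted.sort(Comparator.comparing(LowercaseSortKeyDemo::sortKey));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList("banana", "Cherry", "apple");
        // Ordering by the lower-cased key puts "Cherry" last instead of first.
        System.out.println(sortCaseInsensitive(titles));
    }
}
```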
Multiple Embedded Servers Pointing to single solrhome/index
Hi, I'm trying to use two embedded Solr servers pointing to the same solrhome / index. So that's something like

System.setProperty("solr.solr.home", "SomeSolrDir");
CoreContainer.Initializer initializer = new CoreContainer.Initializer();
CoreContainer coreContainer = initializer.initialize();
m_server = new EmbeddedSolrServer(coreContainer, "");

in both applications. The problem is that after I have done one add + commit of a SolrInputDocument on one embedded server, the other server can no longer obtain the write lock. I'm thinking there must be a way of releasing the write lock so that other servers can pick it up. Is there an API that does so? Any input is appreciated. Bing
Re: Multiple Embedded Servers Pointing to single solrhome/index
Thanks Lance. The use case is a cluster of nodes that all run the same application with an EmbeddedSolrServer, and they all point to the same index on NFS. Every application instance is equal, meaning that any of them may index and/or search. This way, after every commit the writer needs to be closed so that the other nodes can write. Do you see any issues with this use case? Is the EmbeddedSolrServer able to release its write lock without shutting down?
Does Solr support 'Value Search'?
Hi folks, Just wondering whether there is a query handler that simply takes a query string and searches all or some of the fields for matching field values, e.g. q=*admin*. The response may look like

author: [admin, system_admin, sub_admin]
last_modifier: [admin, system_admin, sub_admin]
doctitle: [AdminGuide, AdminManual]
Solr index storage strategy on FileSystem
Hi folks, With StandardDirectoryFactory, the index is stored under data/index in the form of frq, tim, tip and a few other files. As the index grows larger, more files are generated, and sometimes a few of them are merged. It looks like there are some separation and merging strategies at work. My question is, are the separation / merging strategies configurable? Basically I want to add a size limit for any individual file. Is that feasible without changing Solr core code? Thanks! Bing
Re: Does Solr support 'Value Search'?
Thanks for the response, but wait... Is that related to my question about searching for field values? I was not asking how to use wildcards.
Re: Does Solr support 'Value Search'?
I don't quite understand, but let me explain the problem I have. The response would contain only fields and, for each, a list of field values that match the query. Essentially it's querying for field values rather than documents. The underlying use case is that, when typing in a quick search box, the drop-down menu may contain matches on authors, on doctitles, and potentially on other fields. Still, thanks for your response, and hopefully I'm making it clearer. Bing
Re: Multiple Embedded Servers Pointing to single solrhome/index
Makes sense. Thank you.
Re: Does Solr support 'Value Search'?
Thanks Kuli and Mikhail, Using either TermsComponent or the suggester I can get some suggested terms, but I'm still confused about how to get the respective field names. To get those with TermsComponent, I would need to run a terms query against every possible field. The same goes for SpellCheckComponent. CopyField won't help since I want the original field name. Any suggestions? Bing
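As a hedged sketch: as far as I know, the TermsComponent accepts several terms.fl parameters in one request and groups the returned terms by field name, so one request can cover a list of fields. The helper below only builds the parameter string. The /terms handler path, the class name and the field names are assumptions for illustration, and the component returns indexed terms, so on a tokenized field these are tokens rather than whole field values.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper that only builds request parameters; it sends nothing.
final class TermsRequestExample {

    /** URL-encodes one parameter value as UTF-8. */
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    /** One terms.fl per field, plus a shared case-insensitive regex and limit. */
    static String termsParams(String regex, int limit, String... fields) {
        StringBuilder sb = new StringBuilder("terms=true");
        for (String field : fields) {
            sb.append("&terms.fl=").append(enc(field));
        }
        sb.append("&terms.regex=").append(enc(regex));
        sb.append("&terms.regex.flag=case_insensitive");
        sb.append("&terms.limit=").append(limit);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("/terms?" + termsParams(".*admin.*", 10, "author", "doctitle"));
    }
}
```

The list of fields still has to be known up front, so this reduces the number of requests rather than removing the need to name the fields.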
Re: Multiple Embedded Servers Pointing to single solrhome/index
I agree. We chose embedded to minimize the maintenance cost of HTTP Solr servers. One more concern: even if I have only one node doing the indexing, the other nodes need to reopen their index readers periodically to catch up with new changes, right? Is there a Solr request that does this? Thanks, Bing
Multiple SpellCheckComponents
Hello, The background is that I want to use both the Suggest and SpellCheck features in a single query so that the alternatives from both are returned at once. Right now I can only specify one of them using spellcheck.dictionary at query time. default .. suggest Am I able to use two separate SpellCheckComponents for these two and add them to the same search handler to achieve this? I tried, and it seems one is overwriting the other.
SpellCheckComponent Collation query
Hello, From the spell check component I'm able to get the collation query and its number of hits. Is it possible to have Solr execute the collated query automatically and return the document search results, without the client resending it? Thanks, Bing
Tlog vs. buffer + softcommit.
Hello, I'm a bit confused about the purpose of Transaction Logs (Update Logs) in Solr. My understanding is that when an update request comes in, the new item is first put in the RAM buffer as well as in the T-Log. After a soft commit, the new item becomes searchable but is not yet hard committed to stable storage. Configuring the soft commit interval to 1 sec achieves NRT. What exactly is the T-Log doing in this scenario? Why is it there, and under what circumstances is it cleared? I searched for online documentation without success, and I am now trying to work it out from the source code. Any hints would be appreciated. Thanks, Bing
Re: Tlog vs. buffer + softcommit.
Thanks for the information. It definitely helps a lot. UpdateLog has numDeletesToKeep = 1000; and numRecordsToKeep = 100;, so that is probably what you're referring to. However, when I was indexing, the total size of the TLogs kept increasing. That does not look like there is a cap on the number of documents. Also, for peersync, can I find an introduction online?
Re: Tlog vs. buffer + softcommit.
I remember I did set the 15 sec autocommit and still saw the TLogs growing without bound. But it sounds like, in theory, they should not if I index at a constant rate. I'll probably try it again sometime. As for peersync, I think SolrCloud now uses push replication rather than pull. Hmm, it makes sense to keep a certain amount of TLogs for peers to sync up. Thanks, Bing
Solr4.0 Partially update document
Hi, Several days ago I came across some SolrJ test code for partially updating document field values. Sadly I forgot where that was. In Solr 4.0, "/update" is able to take a document id and fields as hashmaps, like

"id": "doc1"
"field1": {"set":"new_value"}

I'm just trying to figure out what SolrJ client code does this. Thanks for any help on this, Bing
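A minimal sketch of the structure described above, in plain Java so it runs without SolrJ: the field value is itself a one-entry map whose key is the operation ("set") and whose value is the new content. The class, the helper methods and the tiny JSON writer are hypothetical and written only for this example. In SolrJ, as far as I know, the same kind of map is passed as the field value to SolrInputDocument.addField.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that builds the JSON body for a partial update.
final class AtomicUpdateExample {

    /** Builds {"id": id, field: {"set": newValue}} as nested maps. */
    static Map<String, Object> setField(String id, String field, Object newValue) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", id);
        doc.put(field, Collections.singletonMap("set", newValue));
        return doc;
    }

    /** Minimal JSON writer for maps, numbers, booleans and strings only. */
    static String toJson(Object value) {
        if (value == null) {
            return "null";
        }
        if (value instanceof Map) {
            StringBuilder sb = new StringBuilder("{");
            boolean first = true;
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                if (!first) {
                    sb.append(',');
                }
                first = false;
                sb.append(quote(String.valueOf(e.getKey())))
                  .append(':')
                  .append(toJson(e.getValue()));
            }
            return sb.append('}').toString();
        }
        if (value instanceof Number || value instanceof Boolean) {
            return String.valueOf(value);
        }
        return quote(String.valueOf(value));
    }

    /** Escapes backslashes and double quotes, then wraps in quotes. */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** The update handler takes a JSON array of documents. */
    static String updateBody(Map<String, Object> doc) {
        return "[" + toJson(doc) + "]";
    }

    public static void main(String[] args) {
        System.out.println(updateBody(setField("doc1", "field1", "new_value")));
    }
}
```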
Re: Solr4.0 Partially update document
Got it at https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/solrj/src/test/org/apache/solr/client/solrj/SolrExampleTests.java Problem solved.
Getting Suggestions without Search Results
Hi, I have a spell check component that does auto-complete suggestions. It is part of the "last-components" of my /select search handler, so apart from the normal search results I also get a list of suggestions. Now I want to split things up. Is there a way to get only the suggestions for a query, without the normal search results? I may need to create a new handler for this. Can anyone please give me some ideas on that? Thanks, Bing
Re: Getting Suggestions without Search Results
Great comments. Thanks to you all. Bing
Re: Indexing thousands file on solr
You may write a client using SolrJ and loop through all the files in that folder. Something like:

ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
up.addFile(new File(fileLocation), null);
ModifiableSolrParams p = new ModifiableSolrParams();
p.add("literal.id", str);
...
up.setParams(p);
server.request(up);

Bing
Re: Are there any comparisons of Elastic Search specifically with SOLR 4?
Most of the existing comparisons were done between Solr 3.x or earlier and ES. Since Solr 4 added cloud concepts similar to those in ES, there are far fewer differences. In my opinion, Solr is more heavyweight and was not designed to maximize elasticity. It's not hard to decide which way to go as long as you know whether you prefer better scalability or better stability and online support. Bing
Send plain text file to solr for indexing
Hello, I used to use SolrCell, which has built-in Tika support to handle both extraction and indexing of raw documents. Now I have another text extraction provider that converts a raw document to a plain-text .txt file, so I want Solr to bypass the extraction phase. Is there a way I can send the plain .txt file to Solr and simply index it as a fulltext field, without doing extraction on that file? Thanks, Bing
Re: Send plain text file to solr for indexing
So in order to use SolrCell I'll have to add a number of dependent libraries, which is one of the things I'm trying to avoid. The second thing is that SolrCell still parses the plain text files, and I don't want it to make any changes to my exported files. Any ideas? Bing
Re: Send plain text file to solr for indexing
Thanks Mr. Yagami. I'll look into that. Jack, the latter two options both require reading the entire text file into memory, right? Bing