Re: Using fq as OR

2014-05-26 Thread Dmitry Kan
Erick, correct me if the following's wrong, but if you have a custom query parser configured to preprocess your searches, you'd need to send the corresponding bit of the search in the q= parameter, rather than fq= parameter. In that sense, q and fq are not exactly equal. Dmitry On Thu, May 22,

about analyzer and tokenizer

2014-05-26 Thread rachun
Dear all, How can I do this? I index the document => Macbook, and when I query mac book I should get that result. This is my schema setting... Any suggestions would be much appreciated. Chun. -- View this message in context

Re: Full Indexing fails on Solr-Probable connection issue.HELP!

2014-05-26 Thread Aniket Bhoi
On Thu, May 22, 2014 at 9:31 PM, Shawn Heisey wrote: > On 5/22/2014 8:31 AM, Aniket Bhoi wrote: > > On Thu, May 22, 2014 at 7:13 PM, Shawn Heisey wrote: > > > >> On 5/22/2014 1:53 AM, Aniket Bhoi wrote: > >>> Details: > >>> > >>> *Solr Version:* > >>> Solr Specification Version: 3.4.0.2012.01.23

Re: about analyzer and tokenizer

2014-05-26 Thread Dmitry Kan
Hi Chun, You can use the edge ngram filter [1] on your tokens, that will produce all possible letter sequences in a certain (configurable) range, like: ma, ac, bo, ok, mac, aac, boo, ook, book etc. Then when querying, both mac and book should hit in the sequence and you should get the macbook hit
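As a rough illustration of the n-gram idea in the reply above, here is a minimal Python sketch (not Solr code). The function names `ngrams` and `edge_ngrams` and the size ranges are assumptions made for this example. It shows that substrings from the middle of a token, such as "book", come out of a plain n-gram pass, while an edge n-gram pass only yields prefixes of the token.

```python
def ngrams(token, min_size, max_size):
    """All contiguous substrings of token with length in [min_size, max_size]."""
    return [
        token[start:start + size]
        for size in range(min_size, max_size + 1)
        for start in range(len(token) - size + 1)
    ]


def edge_ngrams(token, min_size, max_size):
    """Only the prefixes of token with length in [min_size, max_size]."""
    upper = min(max_size, len(token))
    return [token[:size] for size in range(min_size, upper + 1)]


if __name__ == "__main__":
    # Hypothetical size ranges, chosen only to keep the output short.
    print(ngrams("macbook", 2, 3))
    print(edge_ngrams("macbook", 2, 4))
```

With these definitions, a query token "book" would match a gram of "macbook" only in the plain n-gram case, which may matter when choosing between the two filters.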

Re: Combining Solr score with customized user ratings for a document

2014-05-26 Thread rulinma
Good. -- View this message in context: http://lucene.472066.n3.nabble.com/Combining-Solr-score-with-customized-user-ratings-for-a-document-tp4040200p4138135.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Full Indexing fails on Solr-Probable connection issue.HELP!

2014-05-26 Thread Aniket Bhoi
Another thing I have noted is that the exception always follows a commit operation.Log excerpt below: INFO: SolrDeletionPolicy.onCommit: commits:num=2 commit{dir=/opt/solr/cores/calls/data/index,segFN=segments_2qt,version=1347458723267,generation=3557,filenames=[_3z9.tii, _3z3.fnm, _3z9.nrm, _3za.

how to apply multiplcative Boost in multivalued field

2014-05-26 Thread Aman Tandon
Hi, I am confused about how to apply a multiplicative boost on a multivalued field. Suppose the value in plid looks like 111,1234,2345,4567,2335,9876,67. I am applying the filters on plid like *..&fq=plid:(111 1234 2345 4567 2335 9876 67)* Now I need to apply the boost on the first th

Re: How does query on AND work

2014-05-26 Thread Per Steffensen
Do not know if this is a special-case. I guess an AND-query where one side hits 500-1000 and the other side hits billions is a special-case. But this way of carrying out the query might also be an optimization in less uneven cases. It does not require that the "lots of hits"-part of the query is

sort by spatial distance in faceting

2014-05-26 Thread Aman Tandon
Hi, Is it possible to sort the results returned by faceting by geospatial distance instead of result count? Currently I am faceting on city, which returns the top facets based on the docs matched for each city, e.g.: Delhi, 400 Noida, 380 . . . etc. If the user selects the city

Re: about analyzer and tokenizer

2014-05-26 Thread Jack Krupansky
Unfortunately Solr and Lucene do not provide a truly clean out of the box solution for this obvious use case, but you can approximate it by using index-time synonyms, so that "mac book" will also index as "macbook" and "macbook" will also index as "mac book". Your SYNONYMS.TXT file would contai
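The effect of index-time synonyms described above can be pictured with a small Python simulation. This is only a sketch of the idea, not Solr's synonym filter, and the synonym group below is a hypothetical stand-in: the actual file contents from the message are cut off and are not reproduced here.

```python
# Hypothetical synonym group for illustration only; not the file from the message.
SYNONYM_GROUPS = [("mac book", "macbook")]


def expand_at_index_time(text, groups=SYNONYM_GROUPS):
    """Return every phrase form that would be indexed for the given text."""
    lowered = text.lower()
    forms = {lowered}
    for group in groups:
        for phrase in group:
            if phrase in lowered:
                # Index each alternative form alongside the original.
                for other in group:
                    forms.add(lowered.replace(phrase, other))
    return forms


if __name__ == "__main__":
    print(sorted(expand_at_index_time("Macbook")))
    print(sorted(expand_at_index_time("mac book pro")))
```

Because both forms end up in the index, a query for either "macbook" or "mac book" can find the document without query-time expansion.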

Using SolrCloud with RDBMS or without

2014-05-26 Thread Ali Nazemian
Hi everybody, I was wondering which scenario (or combination) would be better for my application in terms of performance, scalability and high availability. Here is my application: Suppose I am going to have more than 10m documents and it grows every day. (probably in 1 year it will reach

Re: Solr - Cores not initialised

2014-05-26 Thread Jack Krupansky
Usually a message like "SolrCore 'corexxx' is not available due to init failure" means that you had a syntax error in your schema.xml or solrconfig.xml so that it could not be successfully processed by Solr (which is done in the "init" method.) Do a diff between your schema and config files be

Re: Using SolrCloud with RDBMS or without

2014-05-26 Thread Jack Krupansky
You could also consider DataStax Enterprise, which integrates Apache Cassandra as the primary database and Solr for indexing and query. See: http://www.datastax.com/what-we-offer/products-services/datastax-enterprise -- Jack Krupansky -Original Message- From: Ali Nazemian Sent: Monda

Re: Using SolrCloud with RDBMS or without

2014-05-26 Thread Ali Nazemian
I ignored Cassandra because it seems best suited to write-heavy workloads. In my case I do have some update operations, but reads certainly far outnumber writes. By the way, there are probably more scenarios for my application. M

Compression vs FieldCache for doc ids retrieval

2014-05-26 Thread jim ferenczi
Dear Solr users, we migrated our solution from Solr 4.0 to Solr 4.3 and we noticed a degradation of the search performance. We compared the two versions and found out that most of the time is spent in the decompression of the retrievable fields in Solr 4.3. The block compression of the documents i

Re: How does query on AND work

2014-05-26 Thread Alexandre Rafalovitch
Did not follow the whole story but "post-query-value-filter" does exist in Solr. Have you tried searching for pretty much that expression, and maybe something about cost-based filters? Regards, Alex On 26/05/2014 6:49 pm, "Per Steffensen" wrote: > Do not know if this is a special-case. I gue

MergeReduceIndexerTool takes a lot of time for a limited number of documents

2014-05-26 Thread Costi Muraru
Hey guys, I'm using the MergeReduceIndexerTool to import data into a SolrCloud cluster made out of 3 decent machines. Looking in the JobTracker, I can see that the mapper jobs finish quite fast. The reduce jobs get to ~80% quite fast as well. It is here where they get stuck for a long period of

Re: Using SolrCloud with RDBMS or without

2014-05-26 Thread Shawn Heisey
On 5/26/2014 7:50 AM, Ali Nazemian wrote: > I was wondering which scenario (or the combination) would be better for my > application. From the aspect of performance, scalability and high > availability. Here is my application: > > Suppose I am going to have more than 10m documents and it grows eve

Re: pdfs

2014-05-26 Thread Erick Erickson
Brian: Yeah, if you can share the PDF that would be great. Parsing via Tika should not bring down Solr, although I suppose there could be something in Tika that is pathologically bad. You could also try using Tika itself in SolrJ and indexing from a client. That might let you 1> more gracefully

ExtractingRequestHandler indexing zip files

2014-05-26 Thread marotosg
Hi, I am using ExtractingRequestHandler to index different types of documents (doc, pdf, txt, html), but when I try to index compressed files like zip files, Solr returns the name of the file inside the field which I am using to map the content. Any idea if this is actually working? I tried

Re: SolrCloud Nodes autoSoftCommit and (temporary) missing documents

2014-05-26 Thread Erick Erickson
Siegfried's comment is spot-on. Your filter query will not be re-used unless you submit two within the same millisecond! Here's more than you want to know about why. http://searchhub.org/2012/02/23/date-math-now-and-filter-queries/ Best, Erick On Sun, May 25, 2014 at 10:56 AM, Siegfried Goeschl
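Why a filter containing NOW is effectively never reused can be sketched in Python. This is a simplified model, not Solr's cache implementation: the helper names and the `timestamp` field are assumptions for the example. The model treats the filter's cache key as the filter string after NOW has been resolved to a millisecond timestamp, and shows how rounding with `NOW/DAY` date math makes two requests produce the same key.

```python
from datetime import datetime, timedelta, timezone


def _stamp(moment):
    """Format a datetime with millisecond precision."""
    millis = moment.microsecond // 1000
    return moment.strftime("%Y-%m-%dT%H:%M:%S") + ".%03dZ" % millis


def cache_key(fq, now):
    """Model the effective cache key of a filter once NOW is resolved."""
    day_start = now.replace(hour=0, minute=0, second=0, microsecond=0)
    # Rounded form first, so that the bare NOW replacement does not touch it.
    key = fq.replace("NOW/DAY", _stamp(day_start))
    return key.replace("NOW", _stamp(now))


if __name__ == "__main__":
    first = datetime(2014, 5, 26, 10, 0, 0, 1000, tzinfo=timezone.utc)
    second = first + timedelta(milliseconds=5)
    unrounded = "timestamp:[NOW-1DAY TO NOW]"          # hypothetical field name
    rounded = "timestamp:[NOW/DAY-1DAY TO NOW/DAY]"
    print(cache_key(unrounded, first) == cache_key(unrounded, second))
    print(cache_key(rounded, first) == cache_key(rounded, second))
```

In this model the unrounded filter yields a new key on every request, while the rounded one stays stable for the whole day.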

Re: Using fq as OR

2014-05-26 Thread Erick Erickson
Dmitry: You have a valid point. That said I'm pretty sure you could have the filter query use your custom parser by something like fq={!customparser} whatever Of course if you were doing something in your custom qparser that needed both halves, that wouldn't work either.. Best, Erick On
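The `fq={!customparser}` form mentioned above could be assembled on the client side roughly as follows. This is a sketch using only the Python standard library; `customparser` is the placeholder name from the message, while the function name and the example query values are assumptions.

```python
from urllib.parse import parse_qs, urlencode


def build_query_string(main_query, filter_clause, parser_name="customparser"):
    """Build q and fq parameters, routing the filter through a named query parser."""
    params = [
        ("q", main_query),
        # Local-params prefix selects the parser for this filter query only.
        ("fq", "{!%s}%s" % (parser_name, filter_clause)),
    ]
    return urlencode(params)


if __name__ == "__main__":
    # Hypothetical query values for illustration.
    encoded = build_query_string("title:solr", "category:books")
    print(encoded)
    print(parse_qs(encoded))
```

The braces and exclamation mark are percent-encoded in the query string, so the server receives the filter exactly as written.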

Re: MergeReduceIndexerTool takes a lot of time for a limited number of documents

2014-05-26 Thread Erick Erickson
The MapReduceIndexerTool is really intended for very large data sets, and by today's standards 80K doesn't qualify :). Basically, MRIT creates N sub-indexes, then merges them, which it may do in a tiered fashion. That is, it may merge gen1 to gen2, then merge gen2 to gen3 etc. Which is great when

Re: Using SolrCloud with RDBMS or without

2014-05-26 Thread Erick Erickson
What you haven't told us is where the data comes from. But until you put some numbers to it, it's hard to decide. I tend to prefer storing the data somewhere else, filesystem, whatever and indexing to Solr when data changes. Even if that means re-indexing the entire corpus. I don't like going to m

How to Configure Solr For Test Purposes?

2014-05-26 Thread Furkan KAMACI
Hi; I run Solr within my test suite. I delete documents or atomically update them and check whether it works. I know that I have to set up a hard/soft commit timing for my test Solr. However, even though I have these settings: 1 true 1 and even

Re: “ClientAbortException: java.io.IOException” in solr query

2014-05-26 Thread Shawn Heisey
On 8/3/2013 7:18 AM, Alexandre Rafalovitch wrote: > The client closed the web-browser page or stopped loading or some other > timeout/connection close. Then, the server tries to write to no-longer > existing connection and fails. > > If you control the client, then you might have some sort of time

Re: MergeReduceIndexerTool takes a lot of time for a limited number of documents

2014-05-26 Thread Costi Muraru
Hey Erick, The job reducers began to die with "Error: Java heap space", after 1h and 22 minutes being stuck at ~80%. I did a few more tests: Test 1. 80,000 documents Each document had *20* fields. The field names were* the same *for all the documents. Values were different. Job status: success

Re: How to Configure Solr For Test Purposes?

2014-05-26 Thread Shawn Heisey
On 5/26/2014 10:57 AM, Furkan KAMACI wrote: > Hi; > > I run Solr within my Test Suite. I delete documents or atomically update > them and check whether if it works or not. I know that I have to setup a > hard/soft commit timing for my test Solr. However even I have that settings: > > >

Solr Deduplicate - Class Not Found Exception

2014-05-26 Thread Manikandan Saravanan
Hi, I’m running Nutch 2 on a Hadoop 1.2.1 cluster with 2 nodes. I’m running Solr 4 separately on a box and I replaced Solr’s schema with Nutch’s Solr-4 schema. When I run a crawl, I get the following error at the end of the job 14/05/26 14:08:32 INFO solr.SolrDeleteDuplicates: SolrDeleteDuplica

Re: Solr Deduplicate - Class Not Found Exception

2014-05-26 Thread Shawn Heisey
On 5/26/2014 12:20 PM, Manikandan Saravanan wrote: > I’m running Nutch 2 on a Hadoop 1.2.1 cluster with 2 nodes. I’m running Solr > 4 separately on a box and I replaced Solr’s schema with Nutch’s Solr-4 > schema. When I run a crawl, I get the following error at the end of the job > > 14/05/26 14

Re: Wordbreak spellchecker excessive breaking.

2014-05-26 Thread S.L
Anyone? On Sat, May 24, 2014 at 5:21 PM, S.L wrote: > > I am using Solr wordbreak spellchecker and the issue is that when I search > for a term like "mob ile" expecting that the wordbreak spellchecker would > actually return a suggestion for "mobile" it breaks the search term into > letters li

Re: Using SolrCloud with RDBMS or without

2014-05-26 Thread Ali Nazemian
Dear Erick, Thank you for your reply. Some parts of the documents come from the Nutch crawler and the other parts come from processing those documents. I really need it to be as fast as possible, and 10 hours for indexing is not acceptable for my application. Regards. On Mon, May 26, 2014 at 9:25 PM, Erick

Re: Using SolrCloud with RDBMS or without

2014-05-26 Thread Ali Nazemian
Dear Shawn, Hi and thank you for your reply. Could you please tell me about the performance and scalability of the mentioned solutions? Suppose I have a SolrCloud with 4 different machines. Would it scale linearly if I add another 4 machines to that? I mean when the number of documents increases from 10

Re: Using SolrCloud with RDBMS or without

2014-05-26 Thread Shawn Heisey
On 5/26/2014 1:48 PM, Ali Nazemian wrote: > Dear Shawn, > Hi and thank you for your reply. > Could you please tell me about the performance and scalability of the > mentioned solutions? Suppose I have a SolrCloud with 4 different machines. > Would it scale linearly if I add another 4 machines to that

RE: Using SolrCloud with RDBMS or without

2014-05-26 Thread Susheel Kumar
Few things will help here if you can clarify what is acceptable in terms of indexing hours & what is the use case for indexing · Are you looking to re-index all data (say 100 m) frequently that you need indexing hours to be on lower side (<10 or <5 etc.). If so how many reasonable hours

Re: How to Configure Solr For Test Purposes?

2014-05-26 Thread Furkan KAMACI
Hi Shawn; I know that it is a bad practise but I just commit up to 5 documents and there will not be more than 5 documents at any time at any test method. It is just for test purpose to see that my API works. I want to have automatic tests. What do you suggest for my purpose? If a test case fails

RE: 答复: Internals about "Too many values for UnInvertedField faceting on field xxx"

2014-05-26 Thread 张月祥
Thanks a lot. > There are only 256 byte arrays to hold all of the ord data, and the pointers into those arrays are only 24 bits long. That gets you back to 32 bits, or 4GB of ord data max. It's practically less since you only have to overflow one array before the exception is thrown. What does
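The arithmetic in the quoted explanation can be checked with a few lines of Python. The constant and function names are assumptions for this sketch. The numbers (256 arrays, 24-bit pointers) come from the quoted text.

```python
ARRAY_COUNT = 256    # number of byte arrays, per the quoted explanation
POINTER_BITS = 24    # width of a pointer into one of those arrays


def bytes_per_array(pointer_bits=POINTER_BITS):
    """Largest size a single array can have with pointers of this width."""
    return 2 ** pointer_bits


def max_ord_bytes(array_count=ARRAY_COUNT, pointer_bits=POINTER_BITS):
    """Upper bound on total ord data across all arrays."""
    return array_count * bytes_per_array(pointer_bits)


if __name__ == "__main__":
    print(bytes_per_array() // 2 ** 20, "MiB per array")
    print(max_ord_bytes() // 2 ** 30, "GiB in total")
```

Since 256 is 2^8, the total is 2^8 × 2^24 = 2^32 bytes. The practical limit is lower because overflowing any single 16 MiB array is enough to trigger the exception, even when the other arrays have room left.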

Re: ExtractingRequestHandler indexing zip files

2014-05-26 Thread Alexandre Rafalovitch
A zip file can contain many files and directories in a nested structure. With files of any type and size. What would you expect Solr to do facing a generic Zip file? And what would you like it to do for _your_ - one assumes more restricted - scenario? Regards, Alex. Personal website: http://

Re: about analyzer and tokenizer

2014-05-26 Thread rachun
Thank you both very much for your suggestions. I will try further to figure out which approach fits my case. Chun. -- View this message in context: http://lucene.472066.n3.nabble.com/about-analyzer-and-tokenizer-tp4138129p4138227.html Sent from the Solr - User mailing list archive

Re: sort by spatial distance in faceting

2014-05-26 Thread david.w.smi...@gmail.com
Hi Aman, That’s an interesting feature request that I haven’t heard before. First reaction: Heliosearch (a fork of Solr that is kept up to date with changes from Solr) is extremely close to supporting such a thing because it supports sorting facets by Heliosearch-specific aggregation functions

Solr shut down by itself

2014-05-26 Thread rachun
Dear all, Could anyone tell me what is wrong with this? How can I fix this problem? INFO - 2014-05-27 03:08:00.252; org.eclipse.jetty.server.Server; Graceful shutdown SocketConnector@0.0.0.0:8983 INFO - 2014-05-27 03:08:00.254; org.eclipse.jetty.server.Server; Graceful shutdown o.e.j.w.WebAppCont

Re: Solr shut down by itself

2014-05-26 Thread Alexandre Rafalovitch
INFO - 2014-05-27 03:08:00.252; org.eclipse.jetty.server.Server; Graceful shutdown SocketConnector@0.0.0.0:8983 That's the first line. Looks like a normal non-aborted shutdown. I would actually look at the messages before that first line. Also, why do you think it was abnormal? Is that something

Grouping on a multi-valued field

2014-05-26 Thread Bhoomit Vasani
Hi, Does the latest release of Solr support grouping on a multi-valued field? According to this https://wiki.apache.org/solr/FieldCollapsing#Known_Limitations it doesn't, but the doc was last updated 14 months ago... -- -- Thanks & Regards, Bhoomit Vasani | SE @ Mygola WE are LIVE

Re: Applying boosting for keyword search

2014-05-26 Thread manju16832003
Hi Jack, Thank you for the suggestions. :-) -- View this message in context: http://lucene.472066.n3.nabble.com/Applying-boosting-for-keyword-search-tp4137523p4138239.html Sent from the Solr - User mailing list archive at Nabble.com.