Spellcheck not working for alphanumeric words

2014-12-12 Thread Shivanand Yeurkar
Hi, Spell check is not working for alphanumeric and numeric words. For example, I have indexed alphanumeric words like BL5C, BL4C and BL5F. Spellcheck for the word "BL" does not suggest any of the above. I have the same problem even with numeric words. For example, indexed words- Nokia Lumia 520 Nokia

Re: different fields for user-supplied phrases in edismax

2014-12-12 Thread Amit Jha
Hi Mike, What exactly is your use case? What do you mean by "controlling the fields used for phrase queries"? Rgds AJ > On 12-Dec-2014, at 20:11, Michael Sokolov > wrote: > > Doug - I believe pf controls the fields that are used for the phrase queries > *generated by the parser*. > > What I

Re: first time user

2014-12-12 Thread Alexandre Rafalovitch
RTFineM: https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-CSVFormattedIndexUpdates The default separator is ',' (a comma). If you want a semicolon, you need to use the 'separator' parameter to tell Solr so. It's not quite magic, esp
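
A minimal sketch of the 'separator' parameter mentioned above, written in Python for illustration only. It just builds the request URL and does not contact Solr. The host, port, and core name ("collection1") are assumptions, as is the helper name csv_update_url; the file itself would still need to be posted with a CSV content type.

```python
from urllib.parse import urlencode

# Assumptions: a local Solr on the default port and a core named
# "collection1"; adjust both for a real installation.
SOLR_UPDATE_URL = "http://localhost:8983/solr/collection1/update"


def csv_update_url(separator=";", commit=True):
    """Build an update URL that names the field separator for CSV input."""
    params = {"separator": separator, "commit": str(commit).lower()}
    # urlencode percent-escapes the separator, so ";" becomes "%3B".
    return SOLR_UPDATE_URL + "?" + urlencode(params)


print(csv_update_url())
```

The separator is percent-encoded because a raw semicolon can be misread as a parameter delimiter by some HTTP tooling.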

first time user

2014-12-12 Thread onyourmark
Hi. I tried setting up and running solr on a pc. Then I tried to index a document that was semicolon delimited although it has a file extension of .csv and got the following: C:\Users\Owner\Downloads\SOLR\solr-4.10.2>java -classpath dist/solr-core-4.10.2.jar -Dauto org.apache.solr.util.SimplePostT

Re: Solr 4.10.2 "Found core" but I get "No cores available" in dashboard page

2014-12-12 Thread Alexandre Rafalovitch
Sounds like a bug report. Can you be very specific on what the broken definition looked like. To replicate. Regards, Alex On 12/12/2014 6:54 pm, "solr-user" wrote: > I did find out the cause of my problems. Turns out the problem wasn't due > to > the solrconfig.xml file; it was in the schem

Re: Solr 4.10.2 "Found core" but I get "No cores available" in dashboard page

2014-12-12 Thread solr-user
I did find out the cause of my problems. Turns out the problem wasn't due to the solrconfig.xml file; it was in the schema.xml file. I spent a fair bit of time making my solrconfig closer to the default solrconfig.xml in the solr download; when that didn't get rid of the error I went back to the on

LUCENE-2899 patch will not compile

2014-12-12 Thread Tim Hearn
After I applied the LUCENE-2899.patch file to the lucene-solr 4.10.2 release, I tried to run an ant compile pursuant to the following directions under 'installation': https://wiki.apache.org/solr/OpenNLP And I received the following error indicating a dependency is missing - how do I find that depend

Re: Solr hangs on distributed updates

2014-12-12 Thread Peter Keegan
The AMIs are Red Hat (not Amazon's) and the instances are properly sized for the environment (t1.micro for ZK, m3.xlarge for Solr). I do plan to add hooks for a clean shutdown of Solr when the VM is shut down, but if Solr takes too long, AWS may clobber it anyway. One frustrating part of auto scali

Re: Solr hangs on distributed updates

2014-12-12 Thread Peter Keegan
> The Solr leader should stop sending requests to the stopped replica once > that replica's live node is removed from ZK (after session expiry). Fwiw, here's the Zookeeper log entry for a graceful shutdown of the Solr replica: 2014-12-12 15:04:21,304 [myid:2] - INFO [ProcessThread(sid:2 cport:81

Re: Solr hangs on distributed updates

2014-12-12 Thread Chris Hostetter
: No, I wasn't aware of these. I will give that a try. If I stop the Solr : jetty service manually, things recover fine, but the hang occurs when I : 'stop' or 'terminate' the EC2 instance. The Zookeeper leader reports a I don't know squat about AWS Auto-Scaling, (and barely anything about AWS)

Re: Have anyone used Automatic Phrase Tokenization (AutoPhrasingTokenFilterFactory) ?

2014-12-12 Thread shamik
Ted, Thanks a lot, I had gone through your blogs but the white space issue slipped out of my mind. "replaceWhitespaceWith" addressed the issue. I think it's a great filter to have, surely takes care of an important use case. Appreciate your help. -Shamik -- View this message in context: ht

Re: Have anyone used Automatic Phrase Tokenization (AutoPhrasingTokenFilterFactory) ?

2014-12-12 Thread Doug Turnbull
Ted, nice work on this filter. I happened to read the blog article yesterday, it definitely addresses a common pain-point in a lot of relevancy work. I had been doing something similar with a combination of shingles and a keepwords list and my own query parser (or using hon-lucene-synonyms) Are th

Re: Have anyone used Automatic Phrase Tokenization (AutoPhrasingTokenFilterFactory) ?

2014-12-12 Thread Ted Sullivan
Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Have-anyone-used-Automatic-Phrase-Tokenization-AutoPhrasingTokenFilterFactory-tp4173808p4174109.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Have anyone used Automatic Phrase Tokenization (AutoPhrasingTokenFilterFactory) ?

2014-12-12 Thread James Strassburg
Yes, I'll submit a pull-request back to the LucidWorks github. For the specific replaceWhitespaceWith nothing issue look at my changesets: https://github.com/jstrassburg/auto-phrase-tokenfilter/commit/a9450f2500d864539b3e5632c6cd47b283f3a481 and https://github.com/jstrassburg/auto-phrase-tokenfilt

Re: Have anyone used Automatic Phrase Tokenization (AutoPhrasingTokenFilterFactory) ?

2014-12-12 Thread Ted Sullivan
Hi James: Could you send me the fix? I would be interested in merging this in for my submission to Solr/Lucene, and any other bugs that you found would be much appreciated. Ted -- View this message in context: http://lucene.472066.n3.nabble.com/Have-anyone-used-Automatic-Phrase-Tokenization-A

Re: Have anyone used Automatic Phrase Tokenization (AutoPhrasingTokenFilterFactory) ?

2014-12-12 Thread James Strassburg
Yes, actually that was one of the bugs I fixed so that we could replaceWhitespaceWith nothing. On Fri, Dec 12, 2014 at 3:34 PM, Ted Sullivan wrote: > > Hi Shamik: > > One thing that might help is to use the "replaceWhitespaceWith" parameter > of > the QParserPlugin and in your index-time Autophra

Re: Use a constant entity in all imported documents of DIH

2014-12-12 Thread Alexandre Rafalovitch
Have a look at the documentation for the rootEntity attribute. https://wiki.apache.org/solr/DataImportHandler If you set it on the outer entity, I think it should give you what you want with the nested entity structure. Then the outside entity will load from the constant table and the inside from

Re: Have anyone used Automatic Phrase Tokenization (AutoPhrasingTokenFilterFactory) ?

2014-12-12 Thread Ted Sullivan
Hi Shamik: The link to the second blog post which discusses this problem is https://lucidworks.com/blog/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/ Ted -- View this message in context: http://lucene.472066.n3.nabble.com/Have-anyone-used-Automatic-Phra

Re: Have anyone used Automatic Phrase Tokenization (AutoPhrasingTokenFilterFactory) ?

2014-12-12 Thread Ted Sullivan
Hi Shamik: One thing that might help is to use the "replaceWhitespaceWith" parameter of the QParserPlugin and in your index-time Autophrase TokenFilter. so in my solrconfig.xml I have autophrases.txt _ then if in your fieldType in schema.xml if you have: The reason for this

Re: Have anyone used Automatic Phrase Tokenization (AutoPhrasingTokenFilterFactory) ?

2014-12-12 Thread James Strassburg
I'm on 4.8.1 On Fri, Dec 12, 2014 at 3:11 PM, shamik wrote: > > Jim, > > Thanks for your response. I've tried including > AutoPhrasingTokenFilterFactory as part of the query analyzer, but didn't > make any difference. > > positionIncrementGap="100"> > >

Re: Have anyone used Automatic Phrase Tokenization (AutoPhrasingTokenFilterFactory) ?

2014-12-12 Thread shamik
Jim, Thanks for your response. I've tried including AutoPhrasingTokenFilterFactory as part of the query analyzer, but didn't make any difference.

Re: Have anyone used Automatic Phrase Tokenization (AutoPhrasingTokenFilterFactory) ?

2014-12-12 Thread shamik
Ted, Here's the query I'm using and the debug info. It's still returning all 5 results back as if it's simply looking for either of the term with q.op set as OR (default). http://localhost:8983/solr/autophrase?q=text:seat+cushions&wt=xml&debugQuery=true Debug text:seat cushions text:seat

Re: Have anyone used Automatic Phrase Tokenization (AutoPhrasingTokenFilterFactory) ?

2014-12-12 Thread James Strassburg
Also, Shamik: I believe you need to configure the AutoPhrasingTokenFilterFactory in your query analyzer for your text_autophrase field type. JiM On Fri, Dec 12, 2014 at 2:39 PM, James Strassburg wrote: > > Hello, > > I've been using auto-phrasing. I believe it was my company's query to > LucidW

Re: Solr hangs on distributed updates

2014-12-12 Thread Shalin Shekhar Mangar
Sorry I should have specified. These timeouts go inside the section and apply for inter-shard update requests only. The socket and connection timeout inside the shardHandlerFactory section apply for inter-shard search requests. On Fri, Dec 12, 2014 at 8:38 PM, Peter Keegan wrote: > Btw, are the

Re: Have anyone used Automatic Phrase Tokenization (AutoPhrasingTokenFilterFactory) ?

2014-12-12 Thread James Strassburg
Hello, I've been using auto-phrasing. I believe it was my company's query to LucidWorks that got that initial implementation created. In working with it I found a few issues and forked the repo and simplified some code (where I didn't need features) and expanded the testing quite a bit. I've got m

Re: Solr hangs on distributed updates

2014-12-12 Thread Shalin Shekhar Mangar
Okay, that should solve the hung threads on the leader. When you stop the jetty service, it is a graceful shutdown where existing requests finish before the searcher thread pool is shut down completely. An EC2 terminate probably just kills the processes and leader threads just wait due to a lack

Re: Solr hangs on distributed updates

2014-12-12 Thread Peter Keegan
Btw, are the following timeouts still supported in solr.xml, and do they only apply to distributed search? ${socketTimeout:0} ${connTimeout:0} Thanks, Peter On Fri, Dec 12, 2014 at 3:14 PM, Peter Keegan wrote: > No, I wasn't aware of these. I will give that a try. If I stop the S

Re: Have anyone used Automatic Phrase Tokenization (AutoPhrasingTokenFilterFactory) ?

2014-12-12 Thread Ted Sullivan
Hi Shamik: Can you send me a JSON output using debugQuery=true so I can help troubleshoot this? As to the question about edismax features - yes I *think* so :) but it would be great if you could give me some specific examples of queries as I am currently writing the test cases for this. General

Re: Solr hangs on distributed updates

2014-12-12 Thread Peter Keegan
No, I wasn't aware of these. I will give that a try. If I stop the Solr jetty service manually, things recover fine, but the hang occurs when I 'stop' or 'terminate' the EC2 instance. The Zookeeper leader reports a 15-sec timeout from the stopped node, and expires the session, but the Solr leader n

Re: Use a constant entity in all imported documents of DIH

2014-12-12 Thread Per Newgro
Do you mean with inner entity something like Yes, that I could use. But I would always use the same entity in the where clause of the sub-entity. I would like to do something like

Re: Solr hangs on distributed updates

2014-12-12 Thread Shalin Shekhar Mangar
Do you have distribUpdateConnTimeout and distribUpdateSoTimeout set to reasonable values in your solr.xml? These are the timeouts used for inter-shard update requests. On Fri, Dec 12, 2014 at 2:20 PM, Peter Keegan wrote: > We are running SolrCloud in AWS and using their auto scaling groups to sp

Re: SolrCloud - Shard splitting and re-sizing nodes

2014-12-12 Thread Erick Erickson
A couple of options: 1> physically copy the index over 2> (what I prefer) is to use the ADDREPLICA command from the Collections API to bring up a new node on the new machine as a replica of one of your splits. It'll automatically synchronize, and after it's done then shut down the original split.
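
As a rough illustration of option 2, the sketch below builds a Collections API request for ADDREPLICA. It only assembles the URL and sends nothing. The base URL, collection name, shard name, and node name in the usage line are placeholders, and the helper name add_replica_url is hypothetical.

```python
from urllib.parse import urlencode


def add_replica_url(base_url, collection, shard, node=None):
    """Build a Collections API URL asking Solr to add a replica of a shard."""
    params = {"action": "ADDREPLICA", "collection": collection, "shard": shard}
    if node is not None:
        # Optional: the node on which the new replica should be created.
        params["node"] = node
    return base_url.rstrip("/") + "/admin/collections?" + urlencode(params)


# Placeholder values; substitute the real collection, split shard, and node.
print(add_replica_url("http://localhost:8983/solr", "mycollection",
                      "shard1_0", "newhost:8983_solr"))
```

Once the new replica has finished synchronizing, the original node for that split can be shut down, as described above.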

SolrCloud - Shard splitting and re-sizing nodes

2014-12-12 Thread Trilok Prithvi
Hello, We have a 2 shards (S1, S2), 2 replica (R1, R2) setup (Solr Cloud) using 4.10.2 version. Each shard and replica resides on its own nodes (so, total of 4 nodes). As the data increased, we would like to split the shards. So, we are thinking about creating 4 more nodes (2 for shards (S3, S4)

Re: Have anyone used Automatic Phrase Tokenization (AutoPhrasingTokenFilterFactory) ?

2014-12-12 Thread shamik
Anyone ? -- View this message in context: http://lucene.472066.n3.nabble.com/Have-anyone-used-Automatic-Phrase-Tokenization-AutoPhrasingTokenFilterFactory-tp4173808p4174069.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Use a constant entity in all imported documents of DIH

2014-12-12 Thread Alexandre Rafalovitch
Sounds like a case for nested entity definitions with the inner entity being the one that's actually indexed? Just need to remember that all the parent mapping is also applicable to all children. Have you tried that? Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr r

Re: Details on why ConccurentUpdateSolrServer is reccommended for maximum index performance

2014-12-12 Thread Mikhail Khludnev
Tom, note about https://issues.apache.org/jira/browse/SOLR-6559 and https://issues.apache.org/jira/browse/SOLR-3585. They seem relevant. On Fri, Dec 12, 2014 at 7:31 PM, Tom Burton-West wrote: > Thanks everybody for the information. > > Shawn, thanks for bringing up the issues around making sure

Use a constant entity in all imported documents of DIH

2014-12-12 Thread Per Newgro
Hello, I would like to load an entity before document import in DIH starts. I want to use the entity id for a sub-select in the document entity. Can I achieve something like that? Thanks for helping me Per

Re: Details on why ConccurentUpdateSolrServer is reccommended for maximum index performance

2014-12-12 Thread Tom Burton-West
Thanks everybody for the information. Shawn, thanks for bringing up the issues around making sure each document is indexed ok. With our current architecture, that is important for us. Yonik's clarification about streaming really helped me to understand one of the main advantages of CUSS: >>When

OpenNLP integration to Solr

2014-12-12 Thread Tim H
Hi everyone, I am trying to use the Open NLP and Solr integration described here: https://wiki.apache.org/solr/OpenNLP I have followed all of the steps under the 'Installation' section of the wiki (I have completed running the ant test-contrib command), however, I do not have an 'opennlp' folde

Re: Join in SOLR

2014-12-12 Thread Mikhail Khludnev
On Fri, Dec 12, 2014 at 5:31 PM, Shawn Heisey wrote: > Using a database view that does the JOIN on the server side is pretty > much guaranteed to have far better performance. Database software is > very good at doing joins efficiently when proper DB indexes are > available ... the dataimport han

Re: different fields for user-supplied phrases in edismax

2014-12-12 Thread Michael Sokolov
Doug - I believe pf controls the fields that are used for the phrase queries *generated by the parser*. What I am after is controlling the fields used for the phrase queries *supplied by the user* -- ie surrounded by double-quotes. -Mike On 12/12/2014 08:53 AM, Doug Turnbull wrote: Michael,

Re: [Hep] tab delimited gz file indexing steps

2014-12-12 Thread Alexandre Rafalovitch
gzcat may do the job by streaming as it expands. Another option is to use DataImportHandler and write a custom FileSystem data source that will do expansion. Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and newsletter: http://www.solr-start.com/ and @solrstar
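
The streaming idea can also be sketched client-side. The Python below is not gzcat and not a DataImportHandler data source; it is an assumed alternative that decompresses a tab-delimited .gz file row by row, so the whole file never has to be expanded on disk. The function name iter_tsv_gz is hypothetical.

```python
import csv
import gzip


def iter_tsv_gz(path, encoding="utf-8"):
    """Yield rows from a gzip-compressed, tab-delimited file one at a time."""
    # Text mode ("rt") decompresses lazily while the file is being read.
    with gzip.open(path, mode="rt", encoding=encoding, newline="") as handle:
        # QUOTE_NONE treats quote characters as ordinary data.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            yield row
```

Each yielded row is a list of column values that could then be batched and sent to Solr by whatever client is in use.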

Re: Documents with SOLR function "sort" are NOT sorted by score

2014-12-12 Thread Erick Erickson
Searching is all about speed, and relevance calculations can be very expensive. As Shawn says, when you explicitly specify sort criteria you are, in effect, taking explicit control of ranking so scores don't need to be calculated and that expense can be avoided. If you need score, just specify it
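
One possible reading of the advice above, sketched in Python: request the score in the field list and add it as a secondary sort key after the explicit sort. The message is cut off, so this is an assumption about what was meant, and the field name "price" in the usage line is a placeholder.

```python
from urllib.parse import urlencode


def select_params(query, sort_field, direction="desc"):
    """Build query-string parameters that sort by a field, then by score."""
    params = {
        "q": query,
        # Explicit sort first; score only breaks ties between equal values.
        "sort": f"{sort_field} {direction}, score desc",
        # Ask for the score to be returned alongside the stored fields.
        "fl": "*,score",
    }
    return urlencode(params)


print(select_params("text:solr", "price", "asc"))
```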

Re: Join in SOLR

2014-12-12 Thread Shawn Heisey
On 12/12/2014 5:16 AM, Tomoko Uchida wrote: > I cannot find out your table structure and Solr schema, > but if your requirement is too complex to handle by DIH, you could handle > it by rich database functionality. > > I think creating a database view is good choice... > > (Of course, other exper

Re: Details on why ConccurentUpdateSolrServer is reccommended for maximum index performance

2014-12-12 Thread Shawn Heisey
On 12/12/2014 3:49 AM, Michael Della Bitta wrote: > I seem to remember being able to do something about errors with the > handleError method, but I must have had to do it in a custom subclass to > actually have visibility into what exactly went wrong. Although it may be possible to override the ha

Re: [Hep] tab delimited gz file indexing steps

2014-12-12 Thread Shawn Heisey
On 12/11/2014 11:56 PM, Sithik wrote: > I have a compressed text file (gz) which holds tab delimited data. Is it > possible for me to index this file directly without doing any pre > processing of uncompressing the file on my own? if so, can you please tell > me the steps/config changes I am suppos

Re: Documents with SOLR function "sort" are NOT sorted by score

2014-12-12 Thread Shawn Heisey
On 12/11/2014 9:51 PM, eakarsu wrote: > I am having difficulty with my sort function. With the following sort, > documents are not sorted by score if you can see. Why sort function is not > able to sort it properly? I don't know why this is surprising. If you don't use the sort parameter at all,

Solr hangs on distributed updates

2014-12-12 Thread Peter Keegan
We are running SolrCloud in AWS and using their auto scaling groups to spin up new Solr replicas when CPU utilization exceeds a threshold for a period of time. All is well until the replicas are terminated when CPU utilization falls below another threshold. What happens is that index updates sent t

Re: different fields for user-supplied phrases in edismax

2014-12-12 Thread Doug Turnbull
Michael, I typically solve this problem by using a copyField and running different analysis on the destination field. Then you could use this field as pf instead of qf. If I recall, fields in pf must also be mentioned in qf for this to work. -Doug On Fri, Dec 12, 2014 at 8:13 AM, Michael Sokolov

Re: different fields for user-supplied phrases in edismax

2014-12-12 Thread Michael Sokolov
Yes, I guess it's a common expectation that searches work this way. It was actually almost trivial to add as an extension to the edismax parser, and I have what I need now; I opened SOLR-6842; if there's interest I'll try to find the time to contribute back to Solr -Mike On 12/11/14 5:20 PM,

Re: Highlighting integer field

2014-12-12 Thread Tomoko Uchida
I did not know about LUCENE-3080. (and take a look now) So discussion seems to be beyond user mailing list, but thank you for your information. Regards, Tomoko 2014-12-12 21:04 GMT+09:00 Pawel : > > Hi again, > When I removed those lines from DefaultSolrHighlighter and rebuilt Solr it > seems to

Re: Join in SOLR

2014-12-12 Thread Tomoko Uchida
I cannot find out your table structure and Solr schema, but if your requirement is too complex to handle by DIH, you could handle it by rich database functionality. I think creating a database view is good choice... (Of course, other experts may have ideas using DIH?) 2014-12-12 20:43 GMT+09:0

Re: Highlighting integer field

2014-12-12 Thread Pawel
Hi again, When I removed those lines from DefaultSolrHighlighter and rebuilt Solr it seems to work. final SchemaField schemaField = schema.getFieldOrNull(fieldName); if (schemaField != null && ((schemaField.getType() instanceof org.apache.solr.schema.TrieField) || (schemaField.

Re: Highlighting integer field

2014-12-12 Thread Tomoko Uchida
Hi, As Mike has pointed out, there is no way to highlight numeric fields. If you want to highlight them, you have to index them as a *text* field. It's not about Solr, but Lucene. (Maybe it's possible to "highlight" them in the application layer, a server side program or client side JavaScript, rather than Solr?

Re: Join in SOLR

2014-12-12 Thread Rajesh
Yes, two entities are children of the first one. I've gone through the link. But what I can get out of the configuration given in that link is, I could get an array for all the individual fields defined in the sub-entities. For e.g., if my sub-entity has 3 fields (name, id, desc), I'm getting a list for

Re: Join in SOLR

2014-12-12 Thread Tomoko Uchida
Thank you for the config information. Do the three tables have a relation (by foreign key)? You might want to have one nested tag in rather than 3 one in . By using a nested tag, you may be able to merge tables *before* importing them to Solr. All the work is done by SQL. Have you already seen this wiki? If not, ex

Re: Details on why ConccurentUpdateSolrServer is reccommended for maximum index performance

2014-12-12 Thread Michael Della Bitta
Shawn: I seem to remember being able to do something about errors with the handleError method, but I must have had to do it in a custom subclass to actually have visibility into what exactly went wrong. On Dec 11, 2014 9:28 PM, "Shawn Heisey" wrote: > On 12/11/2014 9:19 AM, Michael Della Bitta w

Re: Highlighting integer field

2014-12-12 Thread Pawel
Hi, Thanks for your response. Do you maybe have an idea how to handle integers (even on low level - Lucene) in highlighter? -- Paweł On Fri, Dec 12, 2014 at 12:28 AM, Michael Sokolov < msoko...@safaribooksonline.com> wrote: > > So the short answer to your original question is "no." Highlighting i

Re: Join in SOLR

2014-12-12 Thread Rajesh
Thanks for your reply Tomoko. My data-config file looks like the below. Each entity represents a table in DB. Now, If I want to join these three tables, can I make use of the SOLR join functionality.. -- View this message in context: http://lucene.472066.n3.nabble.com/Join-in-SOLR-

Re: Details on why ConccurentUpdateSolrServer is reccommended for maximum index performance

2014-12-12 Thread Mikhail Khludnev
On Wed, Dec 10, 2014 at 10:12 PM, Tom Burton-West wrote: > I have very large XML documents, and the examples I see all build documents > by adding fields in Java code. Is there an example that actually reads XML > files from the file system? > Tom, What's the possible architecture, can you let S