Re: Tokenizers and DelimitedPayloadTokenFilterFactory

2015-08-25 Thread Erick Erickson
Sure, I think it's fine to raise a JIRA, especially if you can include a patch, even a preliminary one to solicit feedback... which I'll leave to people who are more familiar with that code... I'm not sure how generally useful this would be, and if it comes at a cost to normal searching there's su

Re: Behavior of grouping on a field with same value spread across shards.

2015-08-25 Thread Erick Erickson
That should be the case. Best, Erick On Tue, Aug 25, 2015 at 8:55 PM, Modassar Ather wrote: > Thanks Erick, > > I saw the link. So is it that the grouping functionality works fine in > distributed search except the two cases mentioned in the link? > > Regards, > Modassar > > On Tue, Aug 25,

IOException, ConnectionTimeout Error while searching

2015-08-25 Thread Nitin Solanki
Hello, I indexed 2 million documents, and after indexing completed I tried searching. It throws an IOException and Connection Timeout Error. "error":{ "msg":"org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://192.168.1.25:8983/so

Re: Solr performance is slow with just 1GB of data indexed

2015-08-25 Thread Toke Eskildsen
On Wed, 2015-08-26 at 10:10 +0800, Zheng Lin Edwin Yeo wrote: > I'm currently trying out on the Carrot2 Workbench and get it to call Solr > to see how they did the clustering. Although it still takes some time to do > the clustering, but the results of the cluster is much better than mine. I > thin

Re: Behavior of grouping on a field with same value spread across shards.

2015-08-25 Thread Modassar Ather
Thanks Erick, I saw the link. So is it that the grouping functionality works fine in distributed search except the two cases mentioned in the link? Regards, Modassar On Tue, Aug 25, 2015 at 10:40 PM, Erick Erickson wrote: > That's not really the case. Perhaps you're confusing > group.ngroups a

Re: splitting shards on 4.7.2 with custom plugins

2015-08-25 Thread Anshum Gupta
Can you elaborate a bit more on the setup: what do the custom plugins do, and what error do you get? It seems like a classloader/classpath issue to me, which doesn't really relate to shard splitting. On Tue, Aug 25, 2015 at 7:59 PM, Jeff Courtade wrote: > I am getting failures when trying to split

splitting shards on 4.7.2 with custom plugins

2015-08-25 Thread Jeff Courtade
I am getting failures when trying to split shards on solr 4.2.7 with custom plugins. It fails regularly; it cannot find the jar files for plugins when creating the new cores/shards. Ideas? -- Thanks, Jeff Courtade M: 240.507.6116

Re: Tokenizers and DelimitedPayloadTokenFilterFactory

2015-08-25 Thread Jamie Johnson
Looks like I have something basic working for Trie fields. I am doing exactly what I said in my previous email, so good news there. I think this is a big step as there are only a few field types left that I need to support, those being date (should be similar to Trie) and Spatial fields, which at

Re: Solr performance is slow with just 1GB of data indexed

2015-08-25 Thread Zheng Lin Edwin Yeo
Hi Toke, Thank you for your reply. I'm currently trying out the Carrot2 Workbench and getting it to call Solr to see how they did the clustering. Although it still takes some time to do the clustering, the results of the clustering are much better than mine. I think it's probably due to the differe

CloudSolrClient does not distribute suggest.build=true

2015-08-25 Thread Arcadius Ahouansou
When using the new Suggester component (with AnalyzingInfixSuggester) in Solr trunk with solrj, the suggest.build command seems to be executed only on one of the solr cloud nodes. I had to add shards.qt=/suggest and shards=host1:port2/solr/mycollection,host2:port2/solr/mycollection... to distribut

Re: how to prevent uuid-field changing in /update query?

2015-08-25 Thread Chris Hostetter
: updates? i can't do this because i have delta-import queries which also : should be able to assign uuid when it is needed You really need to give us a full and complete picture of what exactly you are currently doing, what's working, what's not working, and when it's not working what is it

Re: Tokenizers and DelimitedPayloadTokenFilterFactory

2015-08-25 Thread Jamie Johnson
Right, I had assumed (obviously here is my problem) that I'd be able to specify payloads for the field regardless of the field type. Looking at TrieField that is certainly non-trivial. After a bit of digging it appears that if I wanted to do something here I'd need to build a new TrieField, overr

Re: Tokenizers and DelimitedPayloadTokenFilterFactory

2015-08-25 Thread Erick Erickson
Well, you're going down a path that hasn't been trodden before ;). If you can treat your primitive types as text types you might get some traction, but that makes a lot of operations like numeric comparison difficult. H. another idea from left field. For single-valued types, what about a side

Re: Exact substring search with ngrams

2015-08-25 Thread Erick Erickson
Hmmm, this sounds like a nonsensical question, but "what do you mean by arbitrary substring"? Because if your substrings consist of whole _tokens_, then ngramming is totally unnecessary (and gets in the way). Phrase queries with no slop fulfill this requirement. But let's assume you need to march

Exact substring search with ngrams

2015-08-25 Thread Christian Ramseyer
Hi, I'm trying to build an index for technical documents that basically works like "grep", i.e. the user gives an arbitrary substring somewhere in a line of a document and the exact matches will be returned. I specifically want no stemming etc. and keep all whitespace, parentheses etc. because they

Re: Search opening hours

2015-08-25 Thread Yonik Seeley
On Tue, Aug 25, 2015 at 5:02 PM, O. Klein wrote: > I'm trying to find the best way to search for stores that are open NOW. It's probably not the *best* way, but assuming it's currently 4:10pm, you could do +open:[* TO 1610] +close:[1610 TO *] And to account for days of the week have different f
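The range query suggested above can be generated from the current time. The following is only a minimal sketch, assuming that `open` and `close` are integer fields holding times as HHMM values (as the 1610 example implies); the helper name `open_now_query` and its parameters are hypothetical, and day-of-week handling is left out.

```python
from datetime import datetime


def open_now_query(now, open_field="open", close_field="close"):
    """Build a Solr query matching stores open at `now`.

    Assumes both fields store times as integers in HHMM form,
    e.g. 4:10pm is stored as 1610. The field names are assumptions.
    """
    hhmm = now.hour * 100 + now.minute
    # Store must have opened at or before now, and close at or after now.
    return f"+{open_field}:[* TO {hhmm}] +{close_field}:[{hhmm} TO *]"


if __name__ == "__main__":
    # Reproduces the query from the message for 4:10pm.
    print(open_now_query(datetime(2015, 8, 25, 16, 10)))
```

A store whose hours cross midnight would not match this simple form, so it probably needs separate handling.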

Re: Tokenizers and DelimitedPayloadTokenFilterFactory

2015-08-25 Thread Jamie Johnson
We were originally using this approach, i.e. run things through the KeywordTokenizer -> DelimitedPayloadFilter -> WordDelimiterFilter. Again this works fine for text, though I had wanted to use the StandardTokenizer in the chain. Is there an equivalent filter that does what the StandardTokenizer

Re: Search opening hours

2015-08-25 Thread Alexandre Rafalovitch
Have you seen: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201212.mbox/%3c1354991310424-4025359.p...@n3.nabble.com%3E https://wiki.apache.org/solr/SpatialForTimeDurations https://people.apache.org/~hossman/spatial-for-non-spatial-meetup-20130117/ Regards, Alex. Solr Analyzers

Search opening hours

2015-08-25 Thread O. Klein
I'm trying to find the best way to search for stores that are open NOW. I have day of week, open and closing times. I've seen some examples, but not an exact fit. What is the best way to tackle this? Thank you for any suggestions you have to offer. -- View this message in context: http

RE: Bot protection (CAPTCHA)

2015-08-25 Thread Davis, Daniel (NIH/NLM) [C]
> So, usually, the middleware is the answer, just like with a database. With applications backed by database systems, there is usually an application server tier, and then a database tier. There may be a web server tier in front of the application server tier. The search engine and database

Re: Unknown query parser 'terms' with TermsComponent defined

2015-08-25 Thread P Williams
Thanks Hoss! It's obvious what the problem(s) are when you lay it all out that way. On Tue, Aug 25, 2015 at 12:14 PM, Chris Hostetter wrote: > > 1) The "terms" Query Parser (TermsQParser) has nothing to do with the > "TermsComponent" (the first is for querying many distinct terms, the > latter is

ANNOUNCE: Apache Solr Reference Guide for Solr 5.3 released

2015-08-25 Thread Cassandra Targett
The Lucene PMC is pleased to announce the release of the Solr Reference Guide for Solr 5.3. This 577 page PDF is the definitive guide for using Apache Solr and can be downloaded from: https://www.apache.org/dyn/closer.cgi/lucene/solr/ref-guide/ If you have

RE: Tokenizers and DelimitedPayloadTokenFilterFactory

2015-08-25 Thread Markus Jelsma
Well, if i remember correctly (i have no testing facility at hand) WordDelimiterFilter maintains payloads on emitted sub terms. So if you use a KeywordTokenizer, input 'some text^PAYLOAD', and have a DelimitedPayloadFilter, the entire string gets a payload. You can then split that string up agai

Re: Unknown query parser 'terms' with TermsComponent defined

2015-08-25 Thread Chris Hostetter
1) The "terms" Query Parser (TermsQParser) has nothing to do with the "TermsComponent" (the first is for querying many distinct terms, the latter is for requesting info about low level terms in your index) https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-TermsQueryParse

Re: Tokenizers and DelimitedPayloadTokenFilterFactory

2015-08-25 Thread Erick Erickson
Oh My. What fun! bq: I need a way to specify the payload on the other field types Not to my knowledge. The payload mechanism is built on the capability of having a filter in the analysis chain. And there's no analysis chain with primitive types (string, numeric and the like). Hmmm. Totally off t

Unknown query parser 'terms' with TermsComponent defined

2015-08-25 Thread P Williams
Hi, We've encountered a strange situation, I'm hoping someone might be able to shed some light. We're using Solr 4.9 deployed in Tomcat 7. We build a query that has these params: 'params'=>{ 'fl'=>'id', 'sort'=>'system_create_dtsi asc', 'indent'=>'true', 'start'=>'0',

Re: testing with EmbeddedSolrServer

2015-08-25 Thread Mikhail Khludnev
Hello, I'm trying to guess what are you doing. It's not clear so far. I found http://stackoverflow.com/questions/11951695/embedded-solr-dih My conclusion, if you play with DIH and EmbeddedSolrServer you'd better to avoid the third beast, you don't need to bother with tests. I guess that main() is

Re: Tokenizers and DelimitedPayloadTokenFilterFactory

2015-08-25 Thread Jamie Johnson
To be clear, we are using payloads as a way to attach authorizations to individual tokens within Solr. The payloads are normal Solr Payloads though we are not using floats, we are using the identity payload encoder (org.apache.lucene.analysis.payloads.IdentityEncoder) which allows for storing a by

Re: how to index document with multiple words (phrases) and words permutation?

2015-08-25 Thread simon
What you want to do is basically named entity recognition. We have a quite similar use case (medical/scientific documents, need to look for disease names /drug names /MeSH terms, etc). Take a look at David Smiley's Solr Text Tagger ( https://github.com/OpenSextant/SolrTextTagger ) which we've been

Re: Behavior of grouping on a field with same value spread across shards.

2015-08-25 Thread Erick Erickson
That's not really the case. Perhaps you're confusing group.ngroups and group.facet with just grouping? See the ref guide: https://cwiki.apache.org/confluence/display/solr/Result+Grouping#ResultGrouping-DistributedResultGroupingCaveats Best, Erick On Tue, Aug 25, 2015 at 4:51 AM, Modassar Ather

Re: Tokenizers and DelimitedPayloadTokenFilterFactory

2015-08-25 Thread Erick Erickson
This really sounds like an XY problem. Or when you use "payload" it's not the Solr payload. So Solr Payloads are a float value that you can attach to individual terms to influence the scoring. Attaching the _same_ payload to all terms in a field is much the same thing as boosting on any matches in

Tokenizers and DelimitedPayloadTokenFilterFactory

2015-08-25 Thread Jamie Johnson
I would like to specify a particular payload for all tokens emitted from a tokenizer, but don't see a clear way to do this. Ideally I could specify that something like the DelimitedPayloadTokenFilter be run on the entire field and then standard analysis be done on the rest of the field, so in the

attribute based recommender with solr

2015-08-25 Thread ReconX92
Hey Guys, I wanted to create a simple, attributed based food recommender with solr. The User makes his choice concerning ingredients, cooking time, difficulty and so on. It is based on a SQL database where the recipes are stored. So, for example the user likes tomatoes, then the recipes with toma

Re: Please answer my question on StackOverflow ... "Best approach to guarantee commits in SOLR"

2015-08-25 Thread Jack Krupansky
You could also look at an integrated product such as DataStax Enterprise which fully integrates the Cassandra database and Solr - you execute your database transactions in Cassandra and then DSE Search automatically indexes the data in the embedded version of Solr. See: http://www.datastax.com/pro

Re: Using copyField with dynamicField

2015-08-25 Thread Scott Dawson
Zach, As an alternative to 'copyField', you might want to consider the CloneFieldUpdateProcessorFactory: http://lucene.apache.org/solr/5_0_0/solr-core/org/apache/solr/update/processor/CloneFieldUpdateProcessorFactory.html It supports specification of field names with regular expressions, exclusion

RE: User Authentication

2015-08-25 Thread Davis, Daniel (NIH/NLM) [C]
We use CAS as well, and are also not using ZooKeeper/SolrCloud. We may move to SolrCloud after getting our current very-basic setup into production. We'll definitely take a look at the rule-based authorization plugin and see how we can leverage that. -Original Message- From: LeZotte, T

Re: Query timeAllowed and its behavior.

2015-08-25 Thread Shawn Heisey
On 8/25/2015 3:18 AM, Modassar Ather wrote: > Kindly help me understand the query time allowed attribute. The following > is set in solrconfig.xml. > 30 > > Does this setting stop the query from running after the timeAllowed is > reached? If not is there a way to stop it as it will occupy resou

Re: Please answer my question on StackOverflow ... "Best approach to guarantee commits in SOLR"

2015-08-25 Thread Upayavira
On Tue, Aug 25, 2015, at 01:21 PM, Simer P wrote: > http://stackoverflow.com/questions/32138845/what-is-the-best-approach-to-guarantee-commits-in-apache-solr > . > > *Question:* How can I get "guarantee commits" with Apache SOLR where > persisting data to disk and visibility are both equally imp

Re: how to prevent uuid-field changing in /update query?

2015-08-25 Thread Jack Krupansky
UUIDUpdateProcessorFactory - "An update processor that adds a newly generated UUID value to any document being added that does not already have a value in the specified field." See: http://lucene.apache.org/solr/5_2_1/solr-core/org/apache/solr/update/processor/UUIDUpdateProcessorFactory.html -- J

Re: Bot protection (CAPTCHA)

2015-08-25 Thread Alexandre Rafalovitch
The standard answer is that exposing the API is a REALLY bad idea. To start from, you can issue the delete commands through the API. And they can be escaped in multiple different ways. Plus, you have admin UI there as well to manipulate the cores as well as to see the configuration files for them.

testing with EmbeddedSolrServer

2015-08-25 Thread Moen Endre
Is there an example of integration-testing with EmbeddedSolrServer that loads data from a data importhandler - then queries the data? I've tried doing this based on org.apache.solr.client.solrj.embedded.TestEmbeddedSolrServerConstructors. But no data is being imported. Here is the test-class iv

Bot protection (CAPTCHA)

2015-08-25 Thread Dmitry Savenko
Hello, I plan to expose Solr search REST API to the world, so it can be called from my web page directly, without additional server layer. I'm concerned about bots, so I plan to add CAPTCHA to my page. Surely, I'd like to do it with as little effort as possible. Does Solr provide CAPTCHA support o

RE: Spellcheck / Suggestions : Append custom dictionary to SOLR default index

2015-08-25 Thread Dyer, James
Max, If you know the entire list of words you want to spellcheck against, you can use FileBasedSpellChecker. See http://wiki.apache.org/solr/FileBasedSpellChecker . If, however, you have a field you want to spellcheck against but also want additional words added, consider using a copy of the

Please answer my question on StackOverflow ... "Best approach to guarantee commits in SOLR"

2015-08-25 Thread Simer P
http://stackoverflow.com/questions/32138845/what-is-the-best-approach-to-guarantee-commits-in-apache-solr . *Question:* How can I get "guarantee commits" with Apache SOLR where persisting data to disk and visibility are both equally important ? *Background:* We have a website which requires high

Re: Performance gain with setting !cache=false in the query for complex queries

2015-08-25 Thread wwang525
Hi Erick, Up to now, all the tests were based on randomly generated requests. In reality, many requests will get executed more than twice since this is to support the advertising project. On the other hand, new queries could be generated daily. So some of the filter queries will be used frequent

Re: how to prevent uuid-field changing in /update query?

2015-08-25 Thread Jamie Johnson
I am honestly not familiar enough to say. Best to try it On Aug 25, 2015 7:59 AM, "CrazyDiamond" wrote: > It sounds like you need to control when the uuid is and is not created, > just feels like you'd get better mileage doing this outside of solr > Can I simply insert a condition(blank or not )

Re: how to prevent uuid-field changing in /update query?

2015-08-25 Thread CrazyDiamond
It sounds like you need to control when the uuid is and is not created, just feels like you'd get better mileage doing this outside of solr Can I simply insert a condition(blank or not ) in uuid update-chain? -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-prevent-u

Behavior of grouping on a field with same value spread across shards.

2015-08-25 Thread Modassar Ather
Hi, As per my understanding, to group on a field all documents with the same value in the field have to be in the same shard. Can we group by a field where the documents with the same value in that field will be distributed across shards? Please let me know what are the limitations, feature not a

Re: how to prevent uuid-field changing in /update query?

2015-08-25 Thread Jamie Johnson
It sounds like you need to control when the uuid is and is not created, just feels like you'd get better mileage doing this outside of solr On Aug 25, 2015 7:49 AM, "CrazyDiamond" wrote: > Why not generate the uuid client side on the initial save and reuse this on > updates? i can't do this beca

Re: how to prevent uuid-field changing in /update query?

2015-08-25 Thread CrazyDiamond
Why not generate the uuid client side on the initial save and reuse this on updates? i can't do this because i have delta-import queries which also should be able to assign uuid when it is needed -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-prevent-uuid-field-ch

Re: Query timeAllowed and its behavior.

2015-08-25 Thread Modassar Ather
Thanks for your response Jonathon. Please correct me if I am wrong in following points. -query actually ceases to run once time allowed is reached and releases all the resources. -query expansion is stopped and the query is terminated from execution releasing all the resources. Thanks, Moda

Re: how to prevent uuid-field changing in /update query?

2015-08-25 Thread Jamie Johnson
Why not generate the uuid client side on the initial save and reuse this on updates? On Aug 25, 2015 4:22 AM, "CrazyDiamond" wrote: > i have uuid field. it is not set as unique, but nevertheless i want it not > to > be changed every time when i call /update. it might be because i added > request

Re: Lucene/Solr 5.0 and custom FieldCahe implementation

2015-08-25 Thread Jamie Johnson
I had seen this as well, if I over wrote this by extending SolrIndexSearcher how do I have my extension used? I didn't see a way that could be plugged in. On Aug 25, 2015 7:15 AM, "Mikhail Khludnev" wrote: > On Tue, Aug 25, 2015 at 2:03 PM, Jamie Johnson wrote: > > > Thanks Mikhail. If I'm rea

Re: Lucene/Solr 5.0 and custom FieldCahe implementation

2015-08-25 Thread Mikhail Khludnev
On Tue, Aug 25, 2015 at 2:03 PM, Jamie Johnson wrote: > Thanks Mikhail. If I'm reading the SimpleFacets class correctly, it > delegates to DocValuesFacets when facet method is FC, what used to be > FieldCache I believe. DocValuesFacets either uses DocValues or builds them > using the Uninverti

Re:Query timeAllowed and its behavior.

2015-08-25 Thread Jonathon Marks (BLOOMBERG/ LONDON)
timeAllowed applies to the time taken by the collector in each shard (TimeLimitingCollector). Once timeAllowed is exceeded the collector terminates early, returning any partial results it has and freeing the resources it was using. From Solr 5.0 timeAllowed also applies to the query expansion ph
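Since an exceeded timeAllowed yields partial results rather than an error, a client may want to check for that case. The following is a rough sketch only: the base URL, collection name and helper names are hypothetical, and it assumes the JSON response carries a `partialResults` flag in `responseHeader` when the time limit was hit.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, time_allowed_ms):
    """Build a /select URL with a per-request timeAllowed (milliseconds)."""
    params = {"q": query, "wt": "json", "timeAllowed": time_allowed_ms}
    return f"{base_url}/select?{urlencode(params)}"


def is_partial(response):
    """Return True if the parsed JSON response is flagged as partial.

    Assumes responseHeader.partialResults is set when the collector
    terminated early; a missing flag is treated as a complete result.
    """
    header = response.get("responseHeader", {})
    return bool(header.get("partialResults", False))


if __name__ == "__main__":
    # Hypothetical host and collection, for illustration only.
    print(build_select_url("http://localhost:8983/solr/mycollection", "solr", 30))
    print(is_partial({"responseHeader": {"status": 0, "partialResults": True}}))
```

Passing timeAllowed per request like this overrides whatever default the handler defines in solrconfig.xml for that one query.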

Re: Lucene/Solr 5.0 and custom FieldCahe implementation

2015-08-25 Thread Jamie Johnson
Thanks Mikhail. If I'm reading the SimpleFacets class correctly, it delegates to DocValuesFacets when facet method is FC, what used to be FieldCache I believe. DocValuesFacets either uses DocValues or builds them using the UninvertingReader. I am not seeing a clean extension point to add a cust

Query timeAllowed and its behavior.

2015-08-25 Thread Modassar Ather
Hi, Kindly help me understand the query time allowed attribute. The following is set in solrconfig.xml. 30 Does this setting stop the query from running after the timeAllowed is reached? If not is there a way to stop it as it will occupy resources in background for no benefit. Thanks, Modass

how to prevent uuid-field changing in /update query?

2015-08-25 Thread CrazyDiamond
I have a uuid field. It is not set as unique, but nevertheless I want it not to be changed every time I call /update. It might be because I added a requesthandler with name "/update" which contains a uuid update chain. But if I do not do this I have no uuid at all. Maybe I can config uuid update-chain

Re: Solr performance is slow with just 1GB of data indexed

2015-08-25 Thread Toke Eskildsen
On Tue, 2015-08-25 at 10:40 +0800, Zheng Lin Edwin Yeo wrote: > Would like to confirm, when I set rows=100, does it mean that it only build > the cluster based on the first 100 records that are returned by the search, > and if I have 1000 records that matches the search, all the remaining 900 > rec