Re: solr 5.x on glassfish/tomcat instead of jetty

2015-05-21 Thread TK Solr
On 5/21/15, 5:26 AM, Steven White wrote: Hi TK, Can you share the thread you found on this WAR topic? Steve, Actually, that was my mistake. I still don't know why WARs are bad. In the thread "Solr 5.0, Jetty and WAR", which you started and are familiar with, https://wiki.apache.org/solr/Wh

Re: Java upgrade for solr in master-slave configuration

2015-05-21 Thread Kamal Kishore Aggarwal
Hi, Anybody tried upgrading master first prior to slave Java upgrade. Please suggest. On Tue, May 19, 2015 at 6:50 PM, Shawn Heisey wrote: > On 5/19/2015 12:21 AM, Kamal Kishore Aggarwal wrote: > > I am currently working with Java-1.7, Solr-4.8.1 with tomcat 7. The solr > > configuration has

Applying gzip compression in Solr 5.1

2015-05-21 Thread Zheng Lin Edwin Yeo
Hi, I'm trying to apply gzip compression in Solr 5.1. I understand that Running Solr on Tomcat is no longer supported from Solr 5.0, so I've tried to implement it in Solr. I've downloaded jetty-servlets-9.3.0.RC0.jar and placed it in my webapp\WEB-INF folder, and have added the following in webap

Re: Index optimize runs in background.

2015-05-21 Thread Modassar Ather
Hi An insight on the question will be really helpful. Thanks, Modassar On Thu, May 21, 2015 at 5:51 PM, Modassar Ather wrote: > Hi, > > I am using Solr-5.1.0. I have an indexer class which invokes > cloudSolrClient.optimize(true, true, 1). My indexer exits after the > invocation of optimize an

Re: Index Sizes

2015-05-21 Thread Shawn Heisey
On 1/7/2014 7:48 AM, Steven Bower wrote: > I was looking at the code for getIndexSize() on the ReplicationHandler to > get at the size of the index on disk. From what I can tell, because this > does directory.listAll() to get all the files in the directory, the size on > disk includes not only what

Re: SolrCloud with local configs

2015-05-21 Thread Shawn Heisey
On 5/21/2015 7:24 PM, Steven Bower wrote: > Is it possible to run in "cloud" mode with zookeeper managing > collections/state/etc.. but to read all config files (solrconfig, schema, > etc..) from local disk? > > Obviously this implies that you'd have to keep them in sync.. > > My thought here is

Re: Search for numbers

2015-05-21 Thread david.w.smi...@gmail.com
Hi Holger, It’s not apparent to me why you are using the spatial field to index a number. Why not simply a “tfloat” or whatever numeric field? Then you could use {!frange} with a function to get the difference and filter it to be in the range you want. RE query parsing (problem #1): you should
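
As a rough sketch of the {!frange} suggestion above: the filter below keeps documents whose numeric field lies within a tolerance of the searched number. The field name `diameter`, the tolerance 0.05, and the helper name `deviation_filter` are assumptions made for illustration; they do not come from the thread.

```python
from urllib.parse import urlencode


def deviation_filter(field, target, tolerance):
    """Build an fq keeping docs where |field - target| <= tolerance.

    `field` is a hypothetical numeric (e.g. tfloat) field name.
    """
    # frange filters on the value of a function query;
    # l and u are the lower and upper bounds of the accepted range.
    return "{!frange l=0 u=%s}abs(sub(%s,%s))" % (tolerance, field, target)


fq = deviation_filter("diameter", 1.23, 0.05)
# The text part of the query stays with edismax; the number goes into the filter.
params = urlencode({"q": "twist drill mm", "defType": "edismax", "fq": fq})
print(fq)
```

This assumes the number has already been pulled out of the user's search expression before the request is built; how to do that is not covered here.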

Re: Price Range Faceting Based on Date Constraints

2015-05-21 Thread david.w.smi...@gmail.com
Indeed: https://github.com/dsmiley/SOLR-2155 On Thu, May 21, 2015 at 8:59 PM alexw wrote: > Thanks David. Unfortunately we are on Solr 3.5, so I am not sure whether > RPT > is available. If not, is there a way to patch 3.5 to make it work? > > > > -- > View this message in context: > http://luce

SolrCloud with local configs

2015-05-21 Thread Steven Bower
Is it possible to run in "cloud" mode with zookeeper managing collections/state/etc.. but to read all config files (solrconfig, schema, etc..) from local disk? Obviously this implies that you'd have to keep them in sync.. My thought here is of running Solr in a docker container, but instead of ha

Re: Price Range Faceting Based on Date Constraints

2015-05-21 Thread alexw
Thanks David. Unfortunately we are on Solr 3.5, so I am not sure whether RPT is available. If not, is there a way to patch 3.5 to make it work? -- View this message in context: http://lucene.472066.n3.nabble.com/Price-Range-Faceting-Based-on-Date-Constraints-tp4206817p4207003.html Sent from the

Re: Reindex of document leaves old fields behind

2015-05-21 Thread tuxedomoon
I'm posting the fields from one of my problem documents, based on this comment I found from Shawn on Grokbase. >> If you are trying to use a Map object as the value of a field, that is >> probably why it is interpreting your add request as an atomic update. >> If this is the case, and you're doin

Re: Indexing gets significantly slower after every batch commit

2015-05-21 Thread Erick Erickson
bq: Which is logical as index growth and time needed to put something to it is log(n) Not really. Solr indexes to segments, each segment is a fully consistent "mini index". When a segment gets flushed to disk, a new one is started. Of course there'll be a _little bit_ of added overhead, but it sho

Re: Price Range Faceting Based on Date Constraints

2015-05-21 Thread David Smiley
Another more modern option, very related to this, is to use DateRangeField in 5.0. You have full 64 bit precision. More info is in the Solr Ref Guide. If Alessandro sticks with RPT, then the best reference to give is this: http://wiki.apache.org/solr/SpatialForTimeDurations ~ David https://www

Re: Is it possible to search for the empty string?

2015-05-21 Thread Chris Hostetter
: Subject: Re: Is it possible to search for the empty string? : : Not out of the box. : : Fields are parsed into tokens and queries search on tokens. An empty : string has no tokens for that field and a missing field has no tokens : for that field. that's a misleading oversimplification of

Re: Price Range Faceting Based on Date Constraints

2015-05-21 Thread alexw
Thanks Holger and Alessandro, SpatialRecursivePrefixTreeFieldType is a new concept to me, and I need some time to dig into it and see how it can help solve my problem. Alex Wang Technical Architect Crossview, Inc. C: (647) 409-3066 aw...@crossview.com On Thu, May 21, 2015 at 11:50 AM, Holger Rie

Re: Reindex of document leaves old fields behind

2015-05-21 Thread tuxedomoon
a few further clues to this unresolved problem 1. I found one of my 5 zookeeper instances was down 2. I tried another reindex of a bad document but no change on the SOLR side 3. I deleted and reindexed the same doc, that worked (obviously, but at this point I don't know what to expect) -- View

Re: Confused about whether Real-time Gets must be sent to leader?

2015-05-21 Thread Yonik Seeley
On Thu, May 21, 2015 at 3:15 PM, Timothy Potter wrote: > I'm seeing that RTG requests get routed to any active replica of the > shard hosting the doc requested by /get ... I was thinking only the > leader should handle that request since there's a brief window of time > where the latest update may

Re: solr uima and opennlp

2015-05-21 Thread Tommaso Teofili
Hi Andreea, 2015-05-21 18:12 GMT+02:00 hossmaa : > Hi everyone > > I'm trying to plug in a new UIMA annotator into solr. What is necessary for > this? Is it enough to build a Jar similarly to the ones from the > uima-addons > package? yes, exactly. Actually you just need a jar containing the An

Re: Indexing gets significantly slower after every batch commit

2015-05-21 Thread Sergey Shvets
Hi Angel We also noticed that kind of performance degradation in our workloads. Which is logical as index growth and time needed to put something to it is log(n) On Thursday, May 21, 2015, Angel Todorov wrote: > hi Shawn, > > Thanks a bunch for your feedback. I've played with the heap

Confused about whether Real-time Gets must be sent to leader?

2015-05-21 Thread Timothy Potter
I'm seeing that RTG requests get routed to any active replica of the shard hosting the doc requested by /get ... I was thinking only the leader should handle that request since there's a brief window of time where the latest update may not be on the replica (albeit usually very brief) and the lates

Re: Reindex of document leaves old fields behind

2015-05-21 Thread tuxedomoon
I'm relying on an autocommit of 60 secs. I just ran the same test via my SolrJ client and result was the same, SolrCloud query always returns correct number of fields. Is there a way to find out which shard and replica a particular document lives on? -- View this message in context: http://

Re: Solr suggester

2015-05-21 Thread Erick Erickson
right. File-based suggestions should be much faster to build, but it's certainly the case with large indexes that you have to build it periodically so they won't be completely up to date. However, this stuff is way cool. AnalyzingInfixSuggester, for instance, suggests entire fields rather than iso

Re: Reindex of document leaves old fields behind

2015-05-21 Thread Erick Erickson
My guess is that you're not committing from your SolrJ program. That's automatic when you post. Best, Erick On Thu, May 21, 2015 at 10:13 AM, tuxedomoon wrote: > OK it is composite > > I've just used post.sh to index a test doc with 3 fields to leader 1 of my > SolrCloud. I then reindexed it wi

Re: Reindex of document leaves old fields behind

2015-05-21 Thread tuxedomoon
OK it is composite I've just used post.sh to index a test doc with 3 fields to leader 1 of my SolrCloud. I then reindexed it with 1 field removed and the query on it shows 2 fields. I repeated this a few times and always get the correct field count from Solr. I'm now wondering if SolrJ is so

Re: Indexing gets significantly slower after every batch commit

2015-05-21 Thread Angel Todorov
hi Shawn, Thanks a bunch for your feedback. I've played with the heap size, but I don't see any improvement. Even if i index, say , a million docs, and the throughput is about 300 docs per sec, and then I shut down solr completely - after I start indexing again, the throughput is dropping below 30

Re: Price Range Faceting Based on Date Constraints

2015-05-21 Thread Alessandro Benedetti
Just thinking a little bit about it, I should investigate SpatialRecursivePrefixTreeFieldType more. Is each value of that field a Point? Actually, each of our values must be a rectangle, because the time frame and the price are a single value (not only the duration of the price 'end dat

Re: Price Range Faceting Based on Date Constraints

2015-05-21 Thread Alessandro Benedetti
The geo-spatial idea is brilliant! Are you thinking of translating the date into ms? Alex, you should try that approach, it can work! Cheers 2015-05-21 16:49 GMT+01:00 Holger Rieß : > Give geospatial search a chance. Use the > 'SpatialRecursivePrefixTreeFieldType' field type, set 'geo' to false. > T

Re: Reindex of document leaves old fields behind

2015-05-21 Thread Shawn Heisey
On 5/21/2015 9:54 AM, tuxedomoon wrote: > I'm doing all my index to leader 1 and have not specified any router > configuration. But there is an equal distribution of 240M docs across 5 > shards. I think I've been stating I have 3 shards in these posts, I have 5, > sorry. > > How do I know what k

solr uima and opennlp

2015-05-21 Thread hossmaa
Hi everyone I'm trying to plug in a new UIMA annotator into solr. What is necessary for this? Is it enough to build a Jar similarly to the ones from the uima-addons package? More specifically, are the uima-addons Jars identical to the ones found in solr's contrib folder? Thanks! Andreea --

Re: Solr suggester

2015-05-21 Thread jon kerling
Hi Erick, I have read your blog and it is really helpful. I'm thinking about upgrading to Solr 5.1 but it won't solve all my problems with this issue, as you said each build will have to read all docs, and analyze its fields. The only advantage is that I can skip default suggest.build on start u

Re: Reindex of document leaves old fields behind

2015-05-21 Thread tuxedomoon
I'm doing all my index to leader 1 and have not specified any router configuration. But there is an equal distribution of 240M docs across 5 shards. I think I've been stating I have 3 shards in these posts, I have 5, sorry. How do I know what kind of routing I am using? -- View this messag

AW: Price Range Faceting Based on Date Constraints

2015-05-21 Thread Holger Rieß
Give geospatial search a chance. Use the 'SpatialRecursivePrefixTreeFieldType' field type, set 'geo' to false. The date is located on the X-axis, prices on the Y axis. For every price you get a horizontal line between start and end date. Index a rectangle with height 0.001(< 1 cent) and width 'en
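
The rectangle idea above can be sketched as follows. This is only an illustration under stated assumptions: dates are mapped to day numbers on the X axis (the thread does not say which unit to use), the rectangle is written in the "minX minY maxX maxY" form, and `price_rectangle` is a hypothetical helper, not part of Solr.

```python
from datetime import date


def price_rectangle(start, end, price, height=0.001):
    """Return a 'minX minY maxX maxY' rectangle for one price period.

    X axis: the date as a day number (an assumed mapping).
    Y axis: the price; the rectangle is `height` tall (less than 1 cent).
    """
    x_min = start.toordinal()
    x_max = end.toordinal()
    return f"{x_min} {price:.3f} {x_max} {price + height:.3f}"


# One price of 9.99 valid for the whole of May 2015.
rect = price_rectangle(date(2015, 5, 1), date(2015, 5, 31), 9.99)
print(rect)
```

Each price period of a product would become one such value in a multi-valued field, so a query rectangle covering the wanted date and price range intersects only the prices that are valid on that date.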

Re: Is it possible to do term Search for the filtered result set

2015-05-21 Thread Upayavira
and then facet on the tags field. &facet=on&facet.field=tags Upayavira On Thu, May 21, 2015, at 04:34 PM, Erick Erickson wrote: > Have you tried > > &fq=type:A > > Best, > Erick > > On Thu, May 21, 2015 at 5:49 AM, Danesh Kuruppu > wrote: > > Hi all, > > > > Is it possible to do term search
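
Putting the two suggestions in this thread together (filter with fq, then facet on the tags field), the request parameters might be built like this. It is a minimal sketch: `q=*:*` and `rows=0` are assumptions added so that only the facet counts come back, and `tag_facet_params` is a hypothetical helper name.

```python
from urllib.parse import urlencode, parse_qs


def tag_facet_params(doc_type):
    """Query string that restricts to one type and facets on 'tags'."""
    return urlencode([
        ("q", "*:*"),                # match everything (assumed)
        ("fq", f"type:{doc_type}"),  # the filter Erick suggested
        ("facet", "on"),
        ("facet.field", "tags"),     # the facet Upayavira suggested
        ("rows", "0"),               # facet counts only (assumed)
    ])


query_string = tag_facet_params("A")
print(query_string)
```

The facet counts returned for `tags` then cover only the documents that pass the `type:A` filter.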

Re: Solr suggester

2015-05-21 Thread Erick Erickson
Frankly, the suggester is rather broken in Solr 4.x with large indexes. Building the suggester index (or FST) requires that _all_ the docs get read, the stored fields analyzed and added to the suggester. Unfortunately, this happens _every_ time you start Solr and can take many minutes whether or no

Re: Is it possible to do term Search for the filtered result set

2015-05-21 Thread Erick Erickson
Have you tried &fq=type:A Best, Erick On Thu, May 21, 2015 at 5:49 AM, Danesh Kuruppu wrote: > Hi all, > > Is it possible to do term search for the filtered result set. we can do > term search for all documents. Can we do the term search only for the > specified filtered result set. > > Lets sa

Re: Reindex of document leaves old fields behind

2015-05-21 Thread Shawn Heisey
On 5/21/2015 9:02 AM, tuxedomoon wrote: > l>> If it is "implicit" then >>> you may have indexed the new document to a different shard, which means >>> that it is now in your index more than once, and which one gets returned >>> may not be predictable. > > If a document with uniqueKey "1234" is ass

Re: optimal shard assignment with low shard key cardinality using compositeId to enable shard splitting

2015-05-21 Thread Erick Erickson
I question your base assumption: bq: So shard by document producer seems a good choice Because what this _also_ does is force all of the work for a query onto one node and all indexing for a particular producer ditto. And will cause you to manually monitor your shards to see if some of them grow

Re: Price Range Faceting Based on Date Constraints

2015-05-21 Thread alexw
Hi Alex, Thanks for the link to the presentation. I am going through the slides and trying to figure out the time-sensitive search it talks about and how it relates to the problem I am facing. It looks like it tries to solve the problem of sku availability based on date, while in my case, all skus

Re: Price Range Faceting Based on Date Constraints

2015-05-21 Thread alexw
Thanks Alessandro. I am implementing this in the Hybris framework. It is not easy to create nested documents during indexing using the Hybris Solr indexer. So I am trying to avoid additional documents and cores if at all possible. -- View this message in context: http://lucene.472066.n3.nabble.

Clarification on Collections API for 5.x

2015-05-21 Thread Jim . Musil
Hi, In the guide for moving from Solr 4.x to 5.x, it states the following: "Solr 5.0 only supports creating and removing SolrCloud collections through the Collections API, unlike previous versions. While not using the collection

Re: Reindex of document leaves old fields behind

2015-05-21 Thread tuxedomoon
l>> If it is "implicit" then >> you may have indexed the new document to a different shard, which means >> that it is now in your index more than once, and which one gets returned >> may not be predictable. If a document with uniqueKey "1234" is assigned to a shard by SolrCloud, implicit routing w

Re: Reindex of document leaves old fields behind

2015-05-21 Thread tuxedomoon
>> let's see the code. simplified code and some comments 1. solrUrl points at leader 1 of 3 leaders, each with a replica 2. createSolrDoc takes a full Mongo doc and returns a valid SolrInputDocument 3. I have done dumps of the returned solrDoc and verified it does not have the unwanted fiel

Re: Price Range Faceting Based on Date Constraints

2015-05-21 Thread Alexandre Rafalovitch
Did you look at Gilt's presentation from a while ago: http://www.slideshare.net/trenaman/personalized-search-on-the-largest-flash-sale-site-in-america Slides 33 on might be most relevant. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start

Re: Price Range Faceting Based on Date Constraints

2015-05-21 Thread Alessandro Benedetti
Hi Alex, this is not a simple problem. In your domain we can consider a Product as a document and the list of nested Documents. Ideally we would model the Product as the father and the prices as children. Each will be defined by : - *start_date * - *end_date * - *price * - *productI

Re: Logic on Term Frequency Calculation : Bug or Functionality

2015-05-21 Thread Ahmet Arslan
Hi Ariya, DefaultSimilarity does not use raw term frequency, but instead it uses square root of raw term frequency. If you want to observe raw term frequency information in explain section, I suggest you to play with org.apache.lucene.search.similarities.SimilarityBase and its sub-classes. ahme
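
To illustrate the point above with numbers: the tf factor is the square root of the raw term frequency, so a term that occurs twice contributes about 1.41 rather than 2. The function below only mirrors that formula for illustration; `default_similarity_tf` is a hypothetical name, not Lucene code.

```python
import math


def default_similarity_tf(raw_freq):
    """tf factor as described above: square root of the raw frequency."""
    return math.sqrt(raw_freq)


for raw_freq in (1, 2, 4):
    print(raw_freq, round(default_similarity_tf(raw_freq), 4))
```

So a value near 1.41 in the explain output corresponds to a raw frequency of 2.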

Solr suggester

2015-05-21 Thread jon kerling
Hi, I'm using solr 4.10 and I'm trying to add autosuggest ability to my application. I'm currently using this kind of configuration: mySuggester FuzzyLookupFactory suggester_fuzzy_dir DocumentDictionaryFactory field2 weightField text_gen

Price Range Faceting Based on Date Constraints

2015-05-21 Thread alexw
Hi, I have a unique requirement to facet on product prices based on date constraints, for which I have been trying to find a solution for a couple of days now, but to no avail. The details are as follows: 1. Each product can have multiple prices, each price has a start-date and an end-date. 2. At

Logic on Term Frequency Calculation : Bug or Functionality

2015-05-21 Thread ariya bala
Hi, I am puzzled by the term frequency behaviour of the DefaultSimilarity implementation. I have suppressed the IDF by setting it to 1, so TF-IDF should in turn reflect the same value as the term frequency. Below are the inferences: Red coloured are expected to give a hit count (Term Frequency) of 2 but was

Is it possible to do term Search for the filtered result set

2015-05-21 Thread Danesh Kuruppu
Hi all, Is it possible to do term search for the filtered result set. we can do term search for all documents. Can we do the term search only for the specified filtered result set. Lets says we have, Doc1 --> type: A tags: T1 T2 Doc2 --> type: A tags: T1 T3 Doc3 --> t

Re: SolrCloud Leader Election

2015-05-21 Thread Ramkumar R. Aiyengar
This shouldn't happen, but if it does, there's no good way currently for Solr to automatically fix it. There are a couple of issues being worked on to do that currently. But till then, your best bet is to restart the node which you expect to be the leader (you can look at ZK to see who is at the he

Re: solr 5.x on glassfish/tomcat instead of jetty

2015-05-21 Thread Steven White
Hi TK, Can you share the thread you found on this WAR topic? Thanks, Steve On Wed, May 20, 2015 at 8:58 PM, TK Solr wrote: > Never mind. I found that thread. Sorry for the noise. > > > On 5/20/15, 5:56 PM, TK Solr wrote: > >> On 5/20/15, 8:21 AM, Shawn Heisey wrote: >> >>> As of right now, th

Index optimize runs in background.

2015-05-21 Thread Modassar Ather
Hi, I am using Solr-5.1.0. I have an indexer class which invokes cloudSolrClient.optimize(true, true, 1). My indexer exits after the invocation of optimize and the optimization keeps on running in the background. Kindly let me know if it is per design and how can I make my indexer to wait until th

Re: [solr 5.1] Looking for full text + collation search field

2015-05-21 Thread Björn Keil
Thanks for the advice. I have tried the field type and it seems to do what it is supposed to in combination with a lower case filter. However, that raises another slight problem: German umlauts are supposed to be treated slightly differently for the purpose of searching than for sorting. For sort

optimal shard assignment with low shard key cardinality using compositeId to enable shard splitting

2015-05-21 Thread Matteo Grolla
Hi I'd like some feedback on how I'd like to solve the following sharding problem I have a collection that will eventually become big Average document size is 1.5kb Every year 30 Million documents will be indexed Data come from different document producers (a person, owner of his documents) an

Re: Need help with Nested docs situation

2015-05-21 Thread Alessandro Benedetti
This scenario is a perfect fit to play with Solr Joins [1]. As you observed, you would prefer to go with a query time join. This kind of join can be done inter-collection. You can have you collection and collection . Every product will have one field to match all the parent deals. When you a

Re: Indexing gets significantly slower after every batch commit

2015-05-21 Thread Shawn Heisey
On 5/21/2015 2:07 AM, Angel Todorov wrote: > I'm crawling a file system folder and indexing 10 million docs, and I am > adding them in batches of 5000, committing every 50 000 docs. The problem I > am facing is that after each commit, the documents per sec that are indexed > gets less and less. >

Indexing gets significantly slower after every batch commit

2015-05-21 Thread Angel Todorov
hi guys, I'm crawling a file system folder and indexing 10 million docs, and I am adding them in batches of 5000, committing every 50 000 docs. The problem I am facing is that after each commit, the documents per sec that are indexed gets less and less. If I do not commit at all, I can index thos

Search for numbers

2015-05-21 Thread Holger Rieß
Hi, I try to search numbers with a certain deviation. My parser is ExtendedDisMax. A possible search expression could be 'twist drill 1.23 mm'. It will not match any documents, because the document contains the keywords 'twist drill', '1.2' and 'mm'. In order to reach my goal, I've indexed all