Re: really slow performance when trying to get facet.field

2012-01-18 Thread Dmitry Kan
Sounds good! So the take away lesson here is to remember cache pre-warming. And of course keep track of RAM allocation :) On Tue, Jan 17, 2012 at 11:23 PM, Daniel Bruegge < daniel.brue...@googlemail.com> wrote: > Ok, I have now changed the static warming in the solrconfig.xml using > first- and n

Re: Question on Reverse Indexing

2012-01-18 Thread Dmitry Kan
Just to play safe here, can you double check that the reversing is not any more the case by issuing a query through the admin analysis page? Dmitry On Wed, Jan 18, 2012 at 4:23 AM, Shyam Bhaskaran < shyam.bhaska...@synopsys.com> wrote: > Hi Francois, > > I understand that disabling of ReversedWi

Re: How to return the distance geo distance on solr 3.5 with bbox filtering

2012-01-18 Thread Mikhail Khludnev
Maxim, Which version of Solr are you using? Why doesn't the second approach at the link work for you? Just move q=trafficRouteId:235 to fq=, becaus

RE: Question on Reverse Indexing

2012-01-18 Thread Shyam Bhaskaran
Dmitry, Using http://localhost:7070/solr/docs/admin/analysis.jsp I passed the query *lock and did not find ReversedWildcardFilterFactory in the indexer, or any other filters that could do the reversing. -Shyam -Original Message- From: Dmitry Kan [mailto:dmitry@gmail.com] Sent: Wedne

Re: Question on Reverse Indexing

2012-01-18 Thread Dmitry Kan
OK. Not sure what is your system architecture there, but could your queries stay cached in some server caches even after you have re-indexed your data? The way the index level leading wildcard works (reading SOLR 3.4 code, but seems to be true circa 1.4) is that the following check is done for the

RE: Question on Reverse Indexing

2012-01-18 Thread Shyam Bhaskaran
Dmitry, We are using Solr 4.0. To rule out server caching issues I restarted our Tomcat web server after performing a re-index. For reverse indexing we have defined a fieldType "text_rev" and this fieldType was used against the fields.

Re: Question on Reverse Indexing

2012-01-18 Thread Dmitry Kan
Shyam, You still didn't say whether you started re-indexing from a clean index, i.e. whether you removed all the data prior to re-indexing. You can use Luke (http://code.google.com/p/luke/) to check the contents of your text field, and see if it still contains reversed sequences. On Wed, Ja

Solrj use wrong queryResponseWriter

2012-01-18 Thread tschiela
Hello, I ran into a puzzling problem here. I use SolrJ 3.5 to query my Solr Server 3.5. So I set the QueryResponseWriter to xml in my code and in solrconfig.xml... In code I use this.server.setParser(new XMLResponseParser()); After I query Solr I want to output the QueryResponse: String xml = solrs

Re: How to return the distance geo distance on solr 3.5 with bbox filtering

2012-01-18 Thread Maxim Veksler
Hello Mikhail, Please see reply inline. On Wed, Jan 18, 2012 at 11:00 AM, Mikhail Khludnev < mkhlud...@griddynamics.com> wrote: > Maxim, > > Which version of Solr you are using? > As mentioned in the title, I'm using Solr 3.5. > Why the second approach at the link doesn't work for you? > just

RE: Improving Solr Spell Checker Results

2012-01-18 Thread O. Klein
Dyer, James wrote > > David, > > The spellchecker normally won't give suggestions for any term in your > index. So even if "wever" is misspelled in context, if it exists in the > index the spell checker will not try correcting it. There are 3 > workarounds: > 1. Use the patch included with SOL

Different mm for spellcheckquery

2012-01-18 Thread O. Klein
What is the best way to search with a mm of 0%, but use a mm of 100% on the spellcheck query so maxCollationTries gives the best results? -- View this message in context: http://lucene.472066.n3.nabble.com/Different-mm-for-spellcheckquery-tp3669200p3669200.html Sent from the Solr - User mailing l

Re: Grouping results after Sorting or vice-versa

2012-01-18 Thread Vijayaragavan
Thanks Tomás and Juan... I got the expected results when I updated Solr to v3.5.0 -- View this message in context: http://lucene.472066.n3.nabble.com/Grouping-results-after-Sorting-or-vice-versa-tp3615957p3669299.html Sent from the Solr - User mailing list archive at Nabble.com.

"index-time" over boosted

2012-01-18 Thread remi tassing
Hello all, I've come across a problem where newly indexed pages almost always come first even when the term frequency is relatively low. I read the posts below on "fieldNorm" and "omitNorms" but setting "omitNorms=true" doesn't change anything for me on the calculation of fieldNorm. e.g.: 0.123

ReversedWildcardFilterFactory Question

2012-01-18 Thread Jamie Johnson
I'm trying to determine when it is appropriate to use the solr.ReversedWildcardFilterFactory, specifically if I have a field content of type text (from default schema) which I want to be able to search with leading wildcards do I need to index this information into both a text field and a text_rev

Size of fields from one document (monitoring, debugging)

2012-01-18 Thread Vadim Kisselmann
Hello folks, is it possible to find out the size (in KB) of specific fields from one document? Possibly with Luke or Lucid Gaze? My case: docs in my old index (Solr 1.4) have sizes of 3-4KB each. In my new index (Solr 4.0 trunk) there are about 15KB per doc. I changed only 2 things in my schema.x

Re: ReversedWildcardFilterFactory Question

2012-01-18 Thread Dmitry Kan
You can store both the non-reversed and reversed terms in one field, so that you can do leading wildcard and other searches against one field. So your schema may look something like this: On Wed, Jan 18, 2012 at 4:19 PM, Jamie Johnson wrote: > I'm t
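
The idea of keeping reversed terms next to the original ones can be sketched outside Solr. The Python below is only an illustration of the principle, not Solr's implementation: the marker character, the `index_terms` and `leading_wildcard_match` helpers, and the sample terms are all assumptions made for the example. It shows how a leading-wildcard query such as `*lock` turns into a plain prefix lookup on the reversed terms.

```python
# Illustrative sketch only: mimics the idea of storing reversed tokens
# alongside the originals; it is not how Solr implements it internally.

MARKER = "\u0001"  # hypothetical prefix that flags a reversed term


def index_terms(tokens):
    """Return the set of terms to index: each token plus its marked reversal."""
    terms = set()
    for token in tokens:
        terms.add(token)
        terms.add(MARKER + token[::-1])
    return terms


def leading_wildcard_match(terms, pattern):
    """Answer a '*suffix' query as a prefix scan over the reversed terms."""
    if not pattern.startswith("*"):
        raise ValueError("expected a pattern of the form '*suffix'")
    reversed_prefix = MARKER + pattern[1:][::-1]
    hits = [t for t in terms if t.startswith(reversed_prefix)]
    # Undo the reversal so the caller sees the original tokens.
    return sorted(t[len(MARKER):][::-1] for t in hits)


terms = index_terms(["lock", "block", "clock", "locker"])
print(leading_wildcard_match(terms, "*lock"))
```

Because both forms live in the same term set, ordinary (non-wildcard) lookups still work against the unreversed tokens.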

Re: Trying to understand SOLR memory requirements

2012-01-18 Thread Dave
I'm using 3.5 On Tue, Jan 17, 2012 at 7:57 PM, Lance Norskog wrote: > Which version of Solr do you use? 3.1 and 3.2 had a memory leak bug in > spellchecking. This was fixed in 3.3. > > On Tue, Jan 17, 2012 at 5:59 AM, Robert Muir wrote: > > I committed it already: so you can try out branch_3x i

Re: Trying to understand SOLR memory requirements

2012-01-18 Thread Dave
Robert, where can I pull down a nightly build from? Will it include the apache-solr-core-3.3.0.jar and lucene-core-3.3-SNAPSHOT.jar jars? I need to re-build with a custom SpellingQueryConverter.java. Thanks, Dave On Tue, Jan 17, 2012 at 8:59 AM, Robert Muir wrote: > I committed it already: so y

RE: DataImportHandler in Solr 4.0

2012-01-18 Thread Dyer, James
You need to find "apache-solr-solrj-4.0.jar" from your distribution and put it in the classpath somewhere. Perhaps the easiest thing is to include it in your core's "lib" directory. James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: Rob [mailto:

RE: Improving Solr Spell Checker Results

2012-01-18 Thread Dyer, James
Taking a quick look at DirectSolrSpellChecker I think I agree that using DirectSolrSpellChecker and the "thresholdTokenFrequency" parameter may provide an additional workaround for David's situation. One caveat is that terms like "wever" need to always be low-frequency. Also, DirectSolrSpellCh

replication, disk space

2012-01-18 Thread Jonathan Rochkind
So Solr 1.4. I have a solr master/slave, where it actually doesn't poll for replication, it only replicates irregularly when I issue a replicate command to it. After the last replication, the slave, in solr_home, has a data/index directory as well as a data/index.20120113121302 directory. Th

Re: Solr Cloud Indexing

2012-01-18 Thread Sujatha Arun
Thanks for the input. I conclude that it does not make sense to do it this way. Regards, Sujatha On Wed, Jan 18, 2012 at 6:26 AM, Lance Norskog wrote: > Cloud upload bandwidth is free, but download bandwidth costs money. If > you upload a lot of data but do not query it often, Amazon can make > s

Re: PositionIncrementGap inside a field

2012-01-18 Thread maurizio1976
This is actually a *Nested proximity search*. I think the query you wrote there, Mergio, will not work, and I think there is no way in Solr to run a nested proximity query yet. Do you know anything about that, Erik? This is what you want to do: http://www.slideshare.net/MarkHarwood/proposal-for-nes

Re: replication, disk space

2012-01-18 Thread Artem Lokotosh
Which OS are you using? Maybe related to this Solr bug https://issues.apache.org/jira/browse/SOLR-1781 On Wed, Jan 18, 2012 at 6:32 PM, Jonathan Rochkind wrote: > So Solr 1.4. I have a solr master/slave, where it actually doesn't poll for > replication, it only replicates irregularly when I issue

RE: replication, disk space

2012-01-18 Thread Dyer, James
I've seen this happen when the configuration files change on the master and replication deems it necessary to do a core-reload on the slave. In this case, replication copies the entire index to the new directory then does a core re-load to make the new config files and new index directory go liv

Re: "index-time" over boosted

2012-01-18 Thread Jan Høydahl
> I've come accros a problem where newly indexed pages almost always come > first even when the term frequency is relatively slow. There is no inherent index-time boost, so this must be something else. Can you give us an example of a query? Which query parser do you use? > I read the posts below

Re: replication, disk space

2012-01-18 Thread Tomás Fernández Löbbe
As far as I know, the replication is supposed to delete the old directory index. However, the initial question is "why is this new index directory being created". Are you adding/updating documents in the slave? what about optimizing it? Are you rebuilding the index from scratch in the master? Also

Pdf Portfolios

2012-01-18 Thread Lucas Simão
Hello, I am trying to index PDF files in Solr. When the PDF file is simple everything is fine, but when I use a PDF Portfolio ( http://help.adobe.com/en_US/Acrobat/9.0/Standard/WSA2872EA8-9756-4a8c-9F20-8E93D59D91CE.html ) with Tika it does not work. Does anyone know how to extract data f

Re: Trying to understand SOLR memory requirements

2012-01-18 Thread Dave
Ok, I've been able to pull the code from SVN, build it, and compile my SpellingQueryConverter against it. However, I'm at a loss as to where to find / how to build the solr.war file? On Tue, Jan 17, 2012 at 8:59 AM, Robert Muir wrote: > I committed it already: so you can try out branch_3x if you

RE: Trying to understand SOLR memory requirements

2012-01-18 Thread Steven A Rowe
Hi Dave, Try 'ant usage' from the solr/ directory. Steve > -Original Message- > From: Dave [mailto:dla...@gmail.com] > Sent: Wednesday, January 18, 2012 2:11 PM > To: solr-user@lucene.apache.org > Subject: Re: Trying to understand SOLR memory requirements > > Ok, I've been able to pull

Re: How to return the distance geo distance on solr 3.5 with bbox filtering

2012-01-18 Thread Mikhail Khludnev
Can you try to specify two fqs, geodist as a function query, sort by score? fq={!bbox}&.&sort=score%20asc&fq=trafficRouteId:235&q={!func}geodist()&fl=*,score On Wed, Jan 18, 2012 at 4:46 PM, Maxim Veksler wrote: > Hello Mikhail, > > Please see reply inline. > > On Wed, Jan 18, 2012 at 11:00
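
As a rough sketch of the request suggested above, the parameters can be assembled with Python's standard library. The spatial field name `location`, the point `45.15,-93.85`, and the distance `5` are placeholders invented for this example (the message elides them), as is the `build_geodist_query` helper; only the `fq`, `q`, `sort`, and `fl` values come from the message.

```python
from urllib.parse import urlencode, parse_qs


def build_geodist_query(route_id, sfield, point, distance_km):
    """Build a query string with two filter queries and geodist() as the main query.

    sfield, point and distance_km are placeholders for values the original
    message does not show.
    """
    params = [
        ("q", "{!func}geodist()"),               # score each document by its distance
        ("fq", "{!bbox}"),                       # first filter: bounding box
        ("fq", "trafficRouteId:%d" % route_id),  # second filter: the route
        ("sfield", sfield),
        ("pt", point),
        ("d", str(distance_km)),
        ("sort", "score asc"),                   # nearest first, since score == distance
        ("fl", "*,score"),
    ]
    return urlencode(params)


query_string = build_geodist_query(235, "location", "45.15,-93.85", 5)
print(query_string)
```

Passing the parameters as a list of pairs rather than a dict is what allows `fq` to appear twice.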

Re: How can I index this?

2012-01-18 Thread ahammad
That would certainly work. Just as a general thing, how would one go about indexing Sharepoint content anyway? I heard about the Sharepoint connector for Lucene but I know nothing about it. Is there a standard best practice method? Also, what are your thoughts on extending the DIH? Is that recomm

Solr hides some facet.fields when doing a distributed search over multiple shards

2012-01-18 Thread Daniel Bruegge
Hi, I have asked the question already over Stackoverflow ( http://stackoverflow.com/questions/8913654/solr-hides-some-facet-fields-when-doing-a-distributed-search), but maybe someone here can give me a hint how to solve this issue: I am searching over 6 Solr shards (Solr version 3.5). What I reco

Re: Solr hides some facet.fields when doing a distributed search over multiple shards

2012-01-18 Thread Yonik Seeley
On Wed, Jan 18, 2012 at 3:36 PM, Daniel Bruegge wrote: > > Hi, > > I have asked the question already over Stackoverflow ( > http://stackoverflow.com/questions/8913654/solr-hides-some-facet-fields-when-doing-a-distributed-search), > but maybe someone here can give me a hint how to solve this issue:

Re: Solr hides some facet.fields when doing a distributed search over multiple shards

2012-01-18 Thread Daniel Bruegge
Thanks a lot. That worked like a charm. On Wed, Jan 18, 2012 at 9:50 PM, Yonik Seeley wrote: > On Wed, Jan 18, 2012 at 3:36 PM, Daniel Bruegge > wrote: > > > > Hi, > > > > I have asked the question already over Stackoverflow ( > > > http://stackoverflow.com/questions/8913654/solr-hides-some-fac

conditional field weighting

2012-01-18 Thread Jack Kanaska
Hello Solr Users, I am wondering if there's any mechanism to achieve conditional field weighting. For example, let's say I have 3 fields which are being searched: NAME, DESCRIPTION, LOCATION I want the weights to be applied according to these rules: 1) If search term is found in NAME, use weigh

Re: Trying to understand SOLR memory requirements

2012-01-18 Thread Dave
Unfortunately, that doesn't look like it solved my problem. I built the new .war file, dropped it in, and restarted the server. When I tried to build the spellchecker index, it ran out of memory again. Is there anything I needed to change in the configuration? Did I need to upload new .jar files, o

Highlighting more than 1 term

2012-01-18 Thread Tim Hibbs
Hello, I have so far been unable to get more than one term highlighted for a given field. In the example below, I expected (and want) both the words "Scheduling" and "Pickup" to be surrounded with , but only one word is ever highlighted. Any advice would be greatly appreciated. Points of in

Takes a while to see changes in data even after comit

2012-01-18 Thread abhayd
Hi, we have a small index. Whenever we commit new data we still see some old data coming from Solr (NOT A BROWSER CACHE ISSUE). We do have autowarmCount set. As I read it, the autowarm count takes entries from the old cache to pre-populate the filter cache. Can this cause this type of issue? -- View this mes

Enforce overall Solr timeout

2012-01-18 Thread Jose Aguilar
Hi all, Is there a setting to enforce an overall timeout for Solr? For example, we are setting timeAllowed=2000 in solrconfig.xml (using version 3.5), but as far as I can tell, that only applies to the search part that returns partial results if it takes more than 2 seconds and returns pa

How can a distributed Solr setup scale to TB-data, if URL limitations are 4000 for distributed shard search?

2012-01-18 Thread Daniel Bruegge
Hi, I am just wondering how I can 'grow' a distributed Solr setup to an index size of a couple of terabytes, when one of the distributed Solr limitations is max. 4000 characters in URI limitation. See: *The number of shards is limited by number of characters allowed for GET > method's URI; most W

Re: How can a distributed Solr setup scale to TB-data, if URL limitations are 4000 for distributed shard search?

2012-01-18 Thread Mark Miller
You can raise the limit to a point. On Jan 18, 2012, at 5:59 PM, Daniel Bruegge wrote: > Hi, > > I am just wondering how I can 'grow' a distributed Solr setup to an index > size of a couple of terabytes, when one of the distributed Solr limitations > is max. 4000 characters in URI limitation. Se

Re: Highlighting more than 1 term

2012-01-18 Thread aronitin
Hi Tim, Can you share the "text_en" type definition? Do check whether you have a stemmer configured in the type definition. If not, that might be the reason "scheduled" is not matching "scheduling". Thanks Nitin -- View this message in context: http://lucene.472066.n3.nabble.com/Highlighting-

How to boost the relevancy of a field

2012-01-18 Thread Dean Del Ponte
I'm indexing some web pages and would like the "title" field to carry more relevance than any other field. What's the best way to do this? For example, if I search for the word "Solr", a web page with a title of "Solr" should rank higher than a web page with a title of "Nutch", even if the nutch

RE: How to accelerate your Solr-Lucene appication by 4x

2012-01-18 Thread Steven A Rowe
Hi Peter, Commercial solicitations are taboo here, except in the context of a request for help that is directly relevant to a product or service. Please don’t do this again. Steve Rowe From: Peter Velikin [mailto:pe...@velobit.com] Sent: Wednesday, January 18, 2012 6:33 PM To: solr-user@lucene

Re: How can a distributed Solr setup scale to TB-data, if URL limitations are 4000 for distributed shard search?

2012-01-18 Thread Daniel Bruegge
But one reads so often about huge Solr clusters, and I am wondering how they do this, because I also often read that the index size of one shard should fit into RAM, or at least that the heap size should be as big as the index size. So I see a lot of limitations hardware-wise. Or am I on the totally

RE: Highlighting more than 1 term

2012-01-18 Thread Tim Hibbs
Aro, thanks for your interest and response. I'm using the "stock" definition in the supplied config.xml, as follows: When viewing the debug output of the results, I have: 2.097683 = (MATCH) sum of: 1.072057 = (MATCH) weight(text:schedul in 1595), product of: 0.75786966

Re: How can a distributed Solr setup scale to TB-data, if URL limitations are 4000 for distributed shard search?

2012-01-18 Thread Darren Govoni
Try changing the URI/HTTP/GET size limitation on your app server. On 01/18/2012 05:59 PM, Daniel Bruegge wrote: Hi, I am just wondering how I can 'grow' a distributed Solr setup to an index size of a couple of terabytes, when one of the distributed Solr limitations is max. 4000 characters in UR
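
To get a feel for the limit being discussed, the length of a distributed request can be estimated as the shard list grows. This is a back-of-the-envelope sketch in Python; the host naming scheme (`solrNNN.example.com:8983/solr`), the helper names, and the 4000-character threshold taken from the question are assumptions for illustration, and the real limit depends on how the servlet container is configured.

```python
from urllib.parse import urlencode


def shards_param(shard_count):
    """Comma-separated shard list using a hypothetical host naming scheme."""
    return ",".join(
        "solr%03d.example.com:8983/solr" % i for i in range(1, shard_count + 1)
    )


def request_length(shard_count, query="*:*"):
    """Length of the encoded query string for a distributed search."""
    return len(urlencode({"q": query, "shards": shards_param(shard_count)}))


def max_shards_within(limit):
    """Largest shard count whose query string still fits within `limit` characters."""
    count = 0
    while request_length(count + 1) <= limit:
        count += 1
    return count


for n in (10, 50, 100):
    print(n, "shards ->", request_length(n), "characters")
print("shards that fit in 4000 characters:", max_shards_within(4000))
```

Shorter host names or a raised container limit shift the result, which is why the figure is a rule of thumb rather than a hard ceiling.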

Re: How to boost the relevancy of a field

2012-01-18 Thread aronitin
Hi Dean, You can use query-time boosting, where you specify the boost value in the query itself, e.g. title:solr^2 OR body:solr Thanks Nitin -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-boost-the-relevancy-of-a-field-tp3671020p3671118.html Sent from the Solr - User m

RE: Question on Reverse Indexing

2012-01-18 Thread Shyam Bhaskaran
Dmitry, I completed a clean index and I still see the same behavior. I did not use Luke, but from the search page we use, leading wildcard search is working. -Shyam -Original Message- From: Dmitry Kan [mailto:dmitry@gmail.com] Sent: Wednesday, January 18, 2012 5:07 PM To: solr-user@lu

Re: Takes a while to see changes in data even after comit

2012-01-18 Thread Jan Høydahl
Hi, What Solr version? How many docs? What do you use as autowarm count? If it's too high, it may take time. Do you use spellcheck and buildOnCommit? -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 18. jan. 2012, at 23:45, abhayd

Re: How to boost the relevancy of a field

2012-01-18 Thread Jan Høydahl
And using dismax query parser makes this easier: http://wiki.apache.org/solr/DisMaxQParserPlugin Example: q=solr&defType=edismax&qf=title^10 body^0.5 -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 19. jan. 2012, at 01:29, aroni
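
The same request can be put together programmatically. A minimal Python sketch, assuming the field names `title` and `body` from the example above; the `edismax_params` helper is made up for illustration and simply formats the `qf` boosts.

```python
from urllib.parse import urlencode, parse_qs


def edismax_params(user_query, field_boosts):
    """Return an encoded query string that searches several fields with per-field boosts."""
    # qf is a space-separated list of field^boost pairs.
    qf = " ".join("%s^%s" % (field, boost) for field, boost in field_boosts.items())
    return urlencode({"q": user_query, "defType": "edismax", "qf": qf})


encoded = edismax_params("solr", {"title": 10, "body": 0.5})
print(encoded)
```

Keeping the boosts in a mapping makes it easy to tune them without touching the user's query text.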

Re: How can I index this?

2012-01-18 Thread Matthew Parker
I just started trying Apache ManifoldCF, which has a SharePoint connector that appears to integrate through Sharepoint's web services. Nutch also has a SharePoint connector, and it can publish documents into SOLR for indexing. On Wed, Jan 18, 2012 at 3:34 PM, ahammad wrote: > That would certain

Re: How to accelerate your Solr-Lucene appication by 4x

2012-01-18 Thread Jason Rutherglen
Steven, If you are going to admonish people for advertising, it should be equally dished out or not at all. On Wed, Jan 18, 2012 at 6:38 PM, Steven A Rowe wrote: > Hi Peter, > > Commercial solicitations are taboo here, except in the context of a request > for help that is directly relevant to a

Re: How to accelerate your Solr-Lucene appication by 4x

2012-01-18 Thread Darren Govoni
And to be honest, many people on this list are professionals who not only build their own solutions, but also buy tools and tech. I don't see what the big deal is if some clever company has something of imminent value here to share it. Considering that its a rare event. On 01/18/2012 08:28 PM

Re: first time query is very slow

2012-01-18 Thread gabriel shen
Hi Yonik, The index I am querying against is 20 GB and contains 200,000 documents. Some of the documents are quite big, and the schema contains more than 50 fields. Main content fields are defined as both stored and indexed, applying HTML stripping, standard tokenization, decompounding, stemming filters, wit

Re: Enforce overall Solr timeout

2012-01-18 Thread Otis Gospodnetic
Jose, I'm not aware of such functionality in Solr.  But there may be something of that sort doable on the servlet container or, if you are using SolrJ to talk to Solr, you should be able to set the socket/HTTP connection timeout via the underlying HttpClient API. Otis  Performance Monitor

Re: conditional field weighting

2012-01-18 Thread csscouter
Jack, Did you see this response to a similar question? I think this is how to refer to it: http://lucene.472066.n3.nabble.com/How-to-boost-the-relevancy-of-a-field-tp3671020p3671020.html How to boost the relevancy of a field I have / had a similar question to yours, and the response to this qu

Re: How can a distributed Solr setup scale to TB-data, if URL limitations are 4000 for distributed shard search?

2012-01-18 Thread Otis Gospodnetic
Hi Daniel, > > From: Daniel Bruegge >Subject: Re: How can a distributed Solr setup scale to TB-data, if URL >limitations are 4000 for distributed shard search? > >But you can read so often about huge solr clusters and I am wondering how >they do this.  Huge is r

Re: first time query is very slow

2012-01-18 Thread Otis Gospodnetic
Gabriel, It sounds like it's not the CPU. Are you watching disk IO?  Maybe the time is spent reading from disk?  Although if you are repeating the same query the results should be cached by Solr if you have query cache enabled. Or JVM/GC?  Maybe the heap is too small and the JVM is busy GCing?  

Re: conditional field weighting

2012-01-18 Thread Jack Kanaska
Hi Tim, Unfortunately that's not what I am looking for. I understand how to use the relevancy of a field as described in that example, but it doesn't do what I asked, which is conditional field weighting. The difference is that specifying a query with something like &qf=name^10 description^5 lo

RE: How to accelerate your Solr-Lucene appication by 4x

2012-01-18 Thread Steven A Rowe
Why Jason, I declare, whatever do you mean? > -Original Message- > From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com] > Sent: Wednesday, January 18, 2012 8:29 PM > To: solr-user@lucene.apache.org > Subject: Re: How to accelerate your Solr-Lucene appication by 4x > > Steven, > >

RE: How to accelerate your Solr-Lucene appication by 4x

2012-01-18 Thread Steven A Rowe
Hi Darren, I think it's rare because it's rare: if this were found to be a useful advertising space, rare would cease to be descriptive of it. But I could be wrong. Steve > -Original Message- > From: Darren Govoni [mailto:dar...@ontrenet.com] > Sent: Wednesday, January 18, 2012 8:40 P

Re: How to accelerate your Solr-Lucene appication by 4x

2012-01-18 Thread Jason Rutherglen
Steven, Fun-NY... 17 hits for this spam: http://search-lucene.com/?q=%22Performance+Monitoring+SaaS+for+Solr%22 Though this was already partially discussed with Chris @ fucu.org which according to him, should have already been moved to Lucene General. On Wed, Jan 18, 2012 at 11:04 PM, Steven A

Ngram autocompleter and term frequency boosting

2012-01-18 Thread Cuong Hoang
Hi guys, I'm trying to build a Ngram-based autocompleter that takes term frequency into account. Let's say I have the following documents: D1: title => "Java Developer" D2: title => "Java Programmer" D3: title => "Java Developer" When the user types in "Java", I want to display 1. "Java Develo
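
The ranking being asked for can be illustrated with a small in-memory model. This is a sketch of the idea only, not a Solr configuration: it uses edge n-grams (prefixes) rather than general n-grams, and the `edge_ngrams`, `build_index`, and `suggest` helpers are hypothetical names. Each prefix maps to a count of the titles that start with it, so more frequent titles are suggested first.

```python
from collections import Counter, defaultdict


def edge_ngrams(text, min_len=1):
    """All prefixes of the lower-cased text, from min_len characters up."""
    lowered = text.lower()
    return [lowered[:i] for i in range(min_len, len(lowered) + 1)]


def build_index(titles):
    """Map each prefix to a Counter of the titles that start with it."""
    index = defaultdict(Counter)
    for title in titles:
        for gram in edge_ngrams(title):
            index[gram][title] += 1
    return index


def suggest(index, typed, limit=5):
    """Suggestions for the typed prefix, most frequent title first."""
    counts = index.get(typed.lower(), Counter())
    # Sort by descending frequency, then alphabetically for stable ties.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [title for title, _ in ranked[:limit]]


titles = ["Java Developer", "Java Programmer", "Java Developer"]
index = build_index(titles)
print(suggest(index, "Java"))
```

With the three sample documents from the message, "Java Developer" occurs twice and is therefore ranked ahead of "Java Programmer".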

Re: How to accelerate your Solr-Lucene appication by 4x

2012-01-18 Thread Ted Dunning
On Thu, Jan 19, 2012 at 1:40 AM, Darren Govoni wrote: > And to be honest, many people on this list are professionals who not only > build their own solutions, but also buy tools and tech. > > I don't see what the big deal is if some clever company has something of > imminent value here to share i

RE: Question on Reverse Indexing

2012-01-18 Thread Shyam Bhaskaran
Dmitry, I downloaded Luke but it was not working for me against Solr indexes. But using the Solr analysis page I did not find any reversed sequences on the field. -Shyam -Original Message- From: Shyam Bhaskaran [mailto:shyam.bhaska...@synopsys.com] Sent: Thursday, January 19, 2012 6

Re: "index-time" over boosted

2012-01-18 Thread remi tassing
Hi, just some background on my setup. I'm crawling with Nutch-1.2, and I used Solr-1.4 and Solr-3.5, with the same result. Solr is still using the default settings. I found this problem just by accident. I queried "mobile broadband": page A has 2 occurrences and scores higher than page B that has 19 occ

RE: Question on Reverse Indexing

2012-01-18 Thread Shyam Bhaskaran
Dmitry, I have used lukeall-3.5.0.jar and when trying to open the index it gives me the error "No Valid Directory at the location, try another location". When using the command below I see this error: "luke java.lang.ArrayIndexOutOfBoundsException: 1" java -cp C:\lukeall-3.5.0.jar org.getopt.luk