Hello,
I'm using Lucene/Solr 4.10.4 for keyword-match functionality, and I found an
issue with the fuzzy-distance rule.
I added a search keyword with an edit distance of 2: "Bridgewater~2".
When I search, it does not return "bridwater" in the results, though it should.
If I change the placing of 'ge' to any other place it
Hi Upaya,
thanks for the explanation. I actually already did some investigation
about it (my first source was:
http://cephas.net/blog/2008/03/30/how-morelikethis-works-in-lucene/ ) and
then I took a look at the code.
Was just wondering what the community was thinking about
including/providi
Happy to read that! Regarding the spellcheck, that is a different thing, so
let us know if you need further details!
Cheers
2015-09-27 18:59 GMT+01:00 Mark Fenbers :
> I am delighted to announce that I have it all working again! Well, not
> all, just the searching!
>
> I deleted my core and created a new o
Huh, strange - I didn't even notice that you could create cores through the UI.
I suppose it depends on what order you read the documentation in and what you infer from it.
See "Create a Core":
https://cwiki.apache.org/confluence/display/solr/Running+Solr
I followed the "solr create -help" option to work out how
Hi,
I am using Apache Nutch 1.7 to crawl and Apache Solr 4.7.2 for indexing. In
my tests there is a gap between the number of results fetched by Nutch and the
number of documents indexed in Solr. For example, one of the crawls
fetched 23343 pages and 1146 images successfully, while Solr shows only 19250
docs
On Sun, 2015-09-27 at 14:47 +0200, Uwe Reh wrote:
> Like Walter Underwood wrote, in technical sense faceting on authors
> isn't a good idea.
In a technical sense, there is no good or bad about faceting on
high-cardinality fields in Solr. The faceting code is fairly efficient
(modulo the newly dis
Maybe it's a silly observation...
But are you lowercasing at indexing/querying time ?
Can you show us the schema analysis config for the field type you use ?
Because, strictly speaking about Levenshtein distance, "bridwater" is 3 edits
from "Bridgewater".
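That edit count can be checked with a short sketch (nothing Solr-specific, just a plain dynamic-programming Levenshtein shown as an illustration): with the original case, "bridwater" really is 3 edits from "Bridgewater" (substitute B/b, delete 'g', delete 'e'), and only 2 once both sides are lowercased, which is within the ~2 of the fuzzy query.

```python
# Plain (case-sensitive) Levenshtein edit distance, for checking the claim.
def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

# With the capital B it is 3 edits; lowercased, only 2.
print(levenshtein("Bridgewater", "bridwater"))   # 3
print(levenshtein("bridgewater", "bridwater"))   # 2
```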
Cheers
2015-09-28 8:26 GMT+01:00 anil.vadhavane
So, based on my knowledge, it is not possible (except if you customise the
component).
Read here :
http://lucene.472066.n3.nabble.com/How-do-I-recover-the-position-and-offset-a-highlight-for-solr-4-1-4-2-td4051763.html
Another data structure that you may find useful to store is the Term
Vect
Erick, Walter and all,
as I wrote, I am aware of the firstSearcher event; we tried it manually before
we chose to enhance the QuerySenderListener.
I think our usage scenario (which I didn't write about, for simplicity) is a bit
different from yours, which is what makes this necessary. We are implementing
How does facet_count work with a facet field whose type uses
solr.PathHierarchyTokenizerFactory?
I have multiple records that contain a field Parameter which is of a type
using PathHierarchyTokenizerFactory.
E.g.
"Parameter": [
"EARTH SCIENCE>OCEANS>OCEAN TEMPERATURE>WATER TEMPERATUR
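As an illustration (a sketch of the tokenizer's behaviour, not actual Solr code), PathHierarchyTokenizerFactory with delimiter ">" emits one token per ancestor path, and each of those tokens becomes its own facet bucket, so every level of the hierarchy shows up in facet counts:

```python
# Sketch of what a path-hierarchy tokenizer emits for one field value:
# a token for each prefix of the path, joined by the delimiter.
def path_hierarchy_tokens(value, delimiter=">"):
    parts = value.split(delimiter)
    return [delimiter.join(parts[:i]) for i in range(1, len(parts) + 1)]

for token in path_hierarchy_tokens("EARTH SCIENCE>OCEANS>OCEAN TEMPERATURE"):
    print(token)
# EARTH SCIENCE
# EARTH SCIENCE>OCEANS
# EARTH SCIENCE>OCEANS>OCEAN TEMPERATURE
```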
I suspect you may be better off asking this on the Nutch user list. The
decisions you are describing will be within the Nutch codebase, not
Solr. Someone here may know (hopefully) but you may get more support
over on the Nutch list.
One suggestion - start with a clean, empty index. Run a crawl. Loo
This is a major release supporting lucene / solr 5.3.0. Download the zip
here:
https://github.com/DmitryKey/luke/releases/tag/luke-5.3.0
This release runs on Java 8 and does not run on Java 7.
The release includes a number of pull requests and GitHub issues. Worth
mentioning:
https://github.com/Dm
There is also facet.limit, which controls how many facet entries are returned.
Is that catching you out?
The document either matches your query or it doesn't. If it does, then all
values of the Parameter field should be included in your faceting. But
perhaps not all facet buckets are being returned to you - he
Hi,
I want to register multiple, identical search handlers to have multiple
buckets in which to measure performance for our different APIs and consumers
(and to find out who is actually using Solr).
Are there costs associated with having multiple search handlers? Are
they negligible?
Cheers,
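For what it's worth, registering the extra handlers is just duplicating the entry in solrconfig.xml under different paths (a hedged sketch; the handler names and defaults below are made up for illustration):

```xml
<!-- Hypothetical: identical handlers on separate paths, one per API -->
<requestHandler name="/select_api1" class="solr.SearchHandler">
  <lst name="defaults"><str name="df">text</str></lst>
</requestHandler>
<requestHandler name="/select_api2" class="solr.SearchHandler">
  <lst name="defaults"><str name="df">text</str></lst>
</requestHandler>
```

Each client then targets its own path, and per-path request metrics fall out of whatever monitoring you attach.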
This looks similar to SOLR-4489, which is marked fixed for version 4.5. If
you're using an older version, the fix is to upgrade.
Also see SOLR-3608, which is similar but here it seems as if the user's query
is more than spellcheck was designed to handle. This should still be looked at
and p
Yes, that solved my problem. There must be an implicit facet.limit set, because
I tried the same URL query with facet.limit=-1 and got back records with
"EARTH SCIENCE>GEOGRAPHIC REGION>ARCTIC"
Cheers!
Endre
-Original Message-
From: Upayavira [mailto:u...@odoko.co.uk]
Sent: 28. sept
You could use the MLT query parser, and combine that with other queries,
whether as filters or boosts.
You can't use stream.body yet, so you would need to use the handler if
you need that.
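For example (a sketch only; the field names are invented and the exact {!mlt} local parameters vary by Solr version), something along these lines combines an MLT query with a filter:

```
q={!mlt qf=title,description mintf=1}THE_DOC_ID&fq=inStock:true
```

The braces select the MLT query parser, the text after them is the id of the seed document, and the fq restricts the "similar" results like any other query.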
Upayavira
On Mon, Sep 28, 2015, at 09:53 AM, Alessandro Benedetti wrote:
> Hi Upaya,
> thanks for the expla
I would expect this to be negligible.
Upayavira
On Mon, Sep 28, 2015, at 01:30 PM, Oliver Schrenk wrote:
> Hi,
>
> I want to register multiple but identical search handler to have multiple
> buckets to measure performance for our different apis and consumers (and
> to find out who is actually us
On 9/28/2015 6:30 AM, Oliver Schrenk wrote:
> I want to register multiple but identical search handler to have multiple
> buckets to measure performance for our different apis and consumers (and to
> find out who is actually using Solr).
>
> What are there some costs associated with having multi
Were all of the shard replicas in an active state (green in the admin UI) before
starting?
It sounds like they were; otherwise you wouldn't hit the replica that is out of
sync.
Replicas can get out of sync, and then report being in sync, after a sequence of
stop/starts without a chance to complete syncing.
See if it might have hap
Hello,
I am importing 2 entities into Solr, coming from 2 different tables, and
I have defined an update request processor chain with two custom processor
factories:
- the first processor factory needs to be executed first for one type
of entity and then for the other (I differentiate the
From the Solr wiki, the default facet.limit should be 100!
Anyway, I find the way field faceting is displayed for path-hierarchy-tokenized
fields to be not very user friendly.
Ideally, for those fields, we should show a facet representation similar to
facet pivoting.
It would be nice to think of an idea
A different solution to the same need: I'm measuring response times of
different collections, measuring online/batch queries separately, using New
Relic. I've added a servlet filter that analyses the request and makes this
info available to New Relic via a request argument.
The built-in New Relic Solr
We did the same thing, but reporting performance metrics to Graphite.
But we won’t be able to add servlet filters in 6.x, because it won’t be a
webapp.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
> On Sep 28, 2015, at 11:32 AM, Gili Nachum wrote:
>
Hi,
I am trying to retrieve all the documents from a Solr index in a batched
manner.
I have 100M documents. I am retrieving them using the method proposed here:
https://nowontap.wordpress.com/2014/04/04/solr-exporting-an-index-to-an-external-file/
I am dumping 10M-document splits into each file. I ge
Gili, I was constantly checking the cloud admin UI and it always stayed
green; that is why I initially overlooked sync issues... finally, when all
options dried up, I went to each node individually and queried it, and that is
when I found the out-of-sync issue. The way I resolved my issue was to shut
down t
Hi - you need to use the CursorMark feature for larger sets:
https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results
M.
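The cursorMark protocol itself is just "resend the query with the returned mark until the mark stops changing". A minimal sketch, with a stand-in for the HTTP call (fetch_page below fakes a Solr sorted by its unique key; a real request would pass sort=id asc and cursorMark=<mark> and read nextCursorMark from the response):

```python
# fetch_page is a fake: cursor encodes the position of the next doc, and
# the returned mark stops advancing once the result set is exhausted,
# mirroring how Solr's nextCursorMark behaves.
def fetch_page(docs, cursor, rows):
    start = 0 if cursor == "*" else int(cursor)
    page = docs[start:start + rows]
    return page, str(start + len(page))

def export_all(docs, rows=3):
    results, cursor = [], "*"          # "*" starts a cursor in Solr too
    while True:
        page, next_cursor = fetch_page(docs, cursor, rows)
        results.extend(page)
        if next_cursor == cursor:      # unchanged mark => done
            break
        cursor = next_cursor
    return results
```

Unlike deep paging with start=N, each page costs the same regardless of how far into the result set you are.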
-Original message-
> From:Ajinkya Kale
> Sent: Monday 28th September 2015 20:46
> To: solr-user@lucene.apache.org; java-u...@lucene.apache.org
> Subj
If I am not wrong this works only with Solr version > 4.7.0 ?
On Mon, Sep 28, 2015 at 12:23 PM Markus Jelsma
wrote:
> Hi - you need to use the CursorMark feature for larger sets:
> https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results
> M.
>
>
>
> -Original message-
> > F
Greetings!
I have highlighting turned on in my Solr searches, but what I get back
is the found term surrounded by tags. Since I use an SWT StyledText
widget to display my search results, what I really want is the offset
and length of each found term, so that I can highlight it in my own way
wi
If you can't use CursorMark, then I suggest not using the start parameter;
instead, sort ascending by a unique field and range the query to records with
a field value larger than the last doc you read. Then set rows to
whatever you found can fit in memory.
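That advice can be sketched as follows (fake_query is a stand-in for a request like q=*:*&fq=id:{LAST TO *]&sort=id asc&rows=N; nothing here is real SolrJ):

```python
# Fake Solr: return up to `rows` docs with id greater than `last_id`,
# in ascending id order, as the fq range + sort would.
def fake_query(index, last_id, rows):
    hits = sorted(d for d in index if last_id is None or d > last_id)
    return hits[:rows]

def read_all(index, rows=2):
    out, last = [], None
    while True:
        page = fake_query(index, last, rows)
        if not page:               # empty page => no more docs
            break
        out.extend(page)
        last = page[-1]            # next page starts after the last id seen
    return out
```

The exclusive lower bound ({LAST TO *]) is what prevents re-reading the boundary document on the next page.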
On Mon, Sep 28, 2015 at 10:59 PM, Ajinkya
Hi,
I'm using HttpSolrClient to connect to Solr. Everything works until when I
enabled basic authentication in Jetty. My question is, how do I pass to
SolrJ the basic auth info. so that I don't get a 401 error?
Thanks in advance
Steve
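In that era of SolrJ, one common route is to build the underlying Apache HttpClient with the credentials set and pass it to HttpSolrClient's constructor (check your SolrJ version's API for the exact calls). The wire-level mechanism is ordinary HTTP Basic auth; as an illustration of the header itself (Python, not SolrJ, and the credentials are made up):

```python
import base64

# HTTP Basic auth is just "Basic " + base64("user:password") in the
# Authorization header; any client that can set a header can send it.
def basic_auth_header(user, password):
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return {"Authorization": f"Basic {token}"}

print(basic_auth_header("steve", "s3cret"))
```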
Hi,
if I need fine-grained error reporting, I use the Http Solr server and send
1 doc per request using the add method.
I report errors on exceptions from the add method.
I'm using autocommit, so I'm not seeing errors related to commits.
Am I losing some errors? Is there a better way?
Thanks
CloudSolrClient has zkClientTimeout/zkConnectTimeout for access to
zookeeper.
It would be handy to also be able to set something like
soTimeout/connectTimeout for accessing the Solr nodes, similarly to the old
non-cloud client.
Currently, in order to set a timeout for the client to
One would hope that https://issues.apache.org/jira/browse/SOLR-4735 will
be done by then.
On 9/28/15, 11:39 AM, "Walter Underwood" wrote:
>We did the same thing, but reporting performance metrics to Graphite.
>
>But we won’t be able to add servlet filters in 6.x, because it won’t be a
>webapp
http://opensourceconnections.com/blog/2014/07/13/reindexing-collections-with-solrs-cursor-support/
-Original Message-
From: Ajinkya Kale [mailto:kaleajin...@gmail.com]
Sent: Monday, September 28, 2015 2:46 PM
To: solr-user@lucene.apache.org; java-u...@lucene.apache.org
Subject: Solr jav
You shouldn't be losing errors with HttpSolrServer. Are you
seeing evidence that you are, or is this mostly a curiosity question?
Do note it's better to batch up docs; your throughput will increase
a LOT. That said, when you do batch (e.g. send 500 docs per update
or whatever) and you get an error b
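The batch-then-fall-back pattern being described can be sketched like this (client here is a stand-in with an add method that raises on failure; real SolrJ naming and error handling will differ):

```python
# Send docs in batches for throughput; on a failed batch, re-send its
# docs one at a time so the bad document(s) can be isolated and reported.
def index_in_batches(client, docs, batch_size=500):
    failed = []
    for i in range(0, len(docs), batch_size):
        batch = docs[i:i + batch_size]
        try:
            client.add(batch)
        except Exception:
            for doc in batch:
                try:
                    client.add([doc])
                except Exception:
                    failed.append(doc)   # fine-grained error report
    return failed
```

You keep the throughput of batching in the common case and only pay the one-doc-per-request cost when a batch actually fails.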
On 9/28/2015 4:04 PM, Arcadius Ahouansou wrote:
> CloudSolrClient has zkClientTimeout/zkConnectTimeout for access to
> zookeeper.
>
> It would be handy to also have the possibility to set something like
> soTimeout/connectTimeout for accessing the solr nodes similarly to the old
> non-cloud clien
We built our own because there was no movement on that. Don’t hold your breath.
Glad to contribute it. We’ve been running it in production for a year, but the
config is pretty manual.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
> On Sep 28, 2015, at