RE: Please any idea? Highlighting exact phrases with solr

2013-10-14 Thread Bryan Loofbourrow
Sil, When you switched over to using the Fast Vector Highlighter, did you change your schema so that the fields that you want to highlight provide term vector information, and reindex your documents? Term vectors are necessary when using the Fast Vector Highlighter. Posting your schema may show va
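
As a rough illustration of the schema check suggested above, the following sketch scans a schema fragment and reports which highlighted fields lack the term vector attributes (termVectors, termPositions, termOffsets). The schema fragment, the field names "content" and "title", and the helper name fields_missing_term_vectors are all hypothetical, and the sketch assumes the attributes are set directly on each field rather than inherited from a fieldType.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema fragment; the field names are made up for illustration.
SCHEMA = """
<fields>
  <field name="content" type="text" indexed="true" stored="true"
         termVectors="true" termPositions="true" termOffsets="true"/>
  <field name="title" type="text" indexed="true" stored="true"/>
</fields>
"""

# Attributes the Fast Vector Highlighter relies on.
REQUIRED = ("termVectors", "termPositions", "termOffsets")


def fields_missing_term_vectors(schema_xml, highlight_fields):
    """Return {field name: [absent attributes]} for fields not ready for FVH.

    Assumption: only attributes on the <field> element are inspected;
    defaults inherited from a fieldType are ignored in this sketch.
    """
    root = ET.fromstring(schema_xml)
    by_name = {f.get("name"): f for f in root.iter("field")}
    missing = {}
    for name in highlight_fields:
        field = by_name.get(name)
        if field is None:
            # Field is not declared at all, so every attribute is absent.
            missing[name] = list(REQUIRED)
            continue
        absent = [a for a in REQUIRED if field.get(a, "false").lower() != "true"]
        if absent:
            missing[name] = absent
    return missing


result = fields_missing_term_vectors(SCHEMA, ["content", "title"])
print(result)
```

Any field reported here would need its definition changed and the documents reindexed before the Fast Vector Highlighter can use it.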

RE: Highlighting externally stored text

2013-07-16 Thread Bryan Loofbourrow
> I'm trying to find a way to best highlight search results even though > those > results are not stored in my index. Has anyone been successful in reusing > the SOLR highlighting logic on non-stored data? I was able to do this by slightly modifying the FastVectorHighlighter so that it returned b

RE: Highlighting externally stored text

2013-07-31 Thread Bryan Loofbourrow
> Hey Bryan, Thanks for the response! To make use of the > FastVectorHighlighter > you need to enable termVectors, termPositions, and termOffsets correct? > Which takes a considerable amount of space, but is good to know and I may > possibly pursue this solution as well. Just starting to look at

RE: Solr highlighting fragment issue

2013-09-04 Thread Bryan Loofbourrow
>> I’m having some issues with Solr search results (using Solr 1.4 ) . I have enabled highlighting of searched text (hl=true) and set the fragment size as 500 (hl.fragsize=500) in the search query. Below is the (screen shot) results shown when I searched for the term ‘grandfather’ (2 results are

RE: Some highlighted snippets aren't being returned

2013-09-08 Thread Bryan Loofbourrow
Eric, Your example document is quite long. Are you setting hl.maxAnalyzedChars? If you don't, the highlighter you appear to be using will not look past the first 51,200 characters of the document for snippet candidates. http://wiki.apache.org/solr/HighlightingParameters#hl.maxAnalyzedChars -- Br
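
To make the hl.maxAnalyzedChars suggestion concrete, here is a minimal sketch that builds a highlighting request URL with that limit raised. The host, port, handler path, field name "content", and the helper name highlight_query are assumptions for illustration; hl, hl.fl, hl.fragsize, and hl.maxAnalyzedChars are the highlighting parameters discussed in these threads.

```python
from urllib.parse import urlencode


def highlight_query(base_url, q, fields, max_analyzed_chars=51200, fragsize=100):
    """Build a select URL with highlighting enabled.

    max_analyzed_chars defaults to 51,200, the limit mentioned above;
    raise it for long documents so later matches can produce snippets.
    """
    params = {
        "q": q,
        "hl": "true",
        "hl.fl": ",".join(fields),
        "hl.maxAnalyzedChars": max_analyzed_chars,
        "hl.fragsize": fragsize,
    }
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and field name.
url = highlight_query(
    "http://localhost:8983/solr/select",
    "grandfather",
    ["content"],
    max_analyzed_chars=1000000,
    fragsize=500,
)
print(url)
```

Raising the limit makes the highlighter analyze more text per document, so it is worth trying on a few long documents before applying it broadly.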

RE: Slow Highlighter Performance Even Using FastVectorHighlighter

2013-05-20 Thread Bryan Loofbourrow
My guess is that the problem is those 200M documents. FastVectorHighlighter is fast at deciding whether a match, especially a phrase, appears in a document, but it still starts out by walking the entire list of term vectors, and ends by breaking the document into candidate-snippet fragments, both p

RE: Slow Highlighter Performance Even Using FastVectorHighlighter

2013-05-29 Thread Bryan Loofbourrow
atches/highlighting. I have setup another request handler that > only searches the whole word fields and it returns in 850 ms with > highlighting. > > Any ideas? > > - Andy > > > -Original Message- > From: Bryan Loofbourrow [mailto:bloofbour...@knowledgemosaic

RE: Slow Highlighter Performance Even Using FastVectorHighlighter

2013-06-18 Thread Bryan Loofbourrow
description_par content content_par" so that it > returns highlights for full and partial word matches. All of those > fields have indexed, stored, termPositions, termVectors, and termOffsets > set to "true". > > It all seems redundant just to allow for partial

RE: Slow Highlighter Performance Even Using FastVectorHighlighter

2013-06-18 Thread Bryan Loofbourrow
hen > > I turn on highlighting that I take the huge performance hit. > > > > Again, I'm using the FastVectorHighlighting. The hl.fl is set to "name > > name_par description description_par content content_par" so that it > > returns highligh

Improving proximity search performance

2012-02-16 Thread Bryan Loofbourrow
. Thanks, -- Bryan Loofbourrow

RE: Frequent garbage collections after a day of operation

2012-02-16 Thread Bryan Loofbourrow
A couple of thoughts: We wound up doing a bunch of tuning on the Java garbage collection. However, the pattern we were seeing was periodic very extreme slowdowns, because we were then using the default garbage collector, which blocks when it has to do a major collection. This doesn't sound like yo

RE: Improving proximity search performance

2012-02-17 Thread Bryan Loofbourrow
Apologies. I meant to type “1.4 TB” and somehow typed “1.4 GB.” Little wonder that no one thought the question was interesting, or figured I must be using Sneakernet to run my searches. -- Bryan Loofbourrow -- *From:* Bryan Loofbourrow [mailto:bloofbour

Exception using distributed field-collapsing

2012-06-20 Thread Bryan Loofbourrow
I am doing a search on three shards with identical schemas (I double-checked!), using the group feature, and Solr/Lucene 3.5. Solr is giving me back the exception listed at the bottom of this email: Other information: My schema uses the following field types: StrField, DateField, TrieDateFiel

RE: Exception using distributed field-collapsing

2012-06-20 Thread Bryan Loofbourrow
> Hi Bryan, > > What is the fieldtype of the groupField? You can only group by field > that is of type string as is described in the wiki: > http://wiki.apache.org/solr/FieldCollapsing#Request_Parameters > > When you group by another field type a http 400 should be returned > instead of this error.

RE: Exception using distributed field-collapsing

2012-06-21 Thread Bryan Loofbourrow
indexed. > I've had a problem with distributed not working when the uniqueKey field > was indexed but not stored. Was it the same exception I'm seeing? -- Bryan > > -Original Message- > From: Bryan Loofbourrow [mailto:bloofbour...@knowledgemosaic.com] > Sent

RE: Exception using distributed field-collapsing

2012-06-21 Thread Bryan Loofbourrow
ieve it was a different exception, just brainstorming. (it was a > null reference iirc) > > Does a *:* query with no sorting work? > > Cody > > -Original Message- > From: Bryan Loofbourrow [mailto:bloofbour...@knowledgemosaic.com] > Sent: Thursday, June 21, 2012 1

Displaying highlights in formatted HTML document

2011-06-08 Thread Bryan Loofbourrow
Here is my use case: I have a large number of HTML documents, sizes in the 0.5K-50M range, most around, say, 10M. I want to be able to present the user with the formatted HTML document, with the hits tagged, so that he may iterate through them, and see them in the context of the document, wit

RE: Displaying highlights in formatted HTML document

2011-06-09 Thread Bryan Loofbourrow
Ludovic, >> how do you index your html files ? I mean do you create fields for different parts of your document (for different stop words lists, stemming, etc) ? with DIH or solrj or something else ? << We are sending them over http, and using Tika to strip the HTML, at present. We do not split

RE: Displaying highlights in formatted HTML document

2011-06-09 Thread Bryan Loofbourrow
> -Original Message- > From: Ahmet Arslan [mailto:iori...@yahoo.com] > Sent: Wednesday, June 08, 2011 11:56 PM > To: solr-user@lucene.apache.org > Subject: Re: Displaying highlights in formatted HTML document > > > > --- On Thu, 6/9/11, Bryan Loofbourrow

RE: Displaying highlights in formatted HTML document

2011-06-09 Thread Bryan Loofbourrow
> > OK, I think I see what you're up to. Might be pretty viable > > for me as well. > > Can you talk about anything in your mappings.txt files that > > is an > > important part of the solution? > > It is not important. I just copied it. Plus html strip char filter does > not have mappings parameter.

RE: solr java.lang.NullPointerException on select queries

2012-06-26 Thread Bryan Loofbourrow
Regarding the large number of files, even after optimize, we found that when rebuilding a large, experimental 1.7TB index on Solr 3.5, instead of Solr 1.4.1, there were a ton of index files, thousands, in 3.5, when there used to be just 10 (or 11?) segments worth (as expected with mergeFactor set t

RE: Using Solr 3.4 running on tomcat7 - very slow search

2012-07-16 Thread Bryan Loofbourrow
5 min is ridiculously long for a query that used to take 65ms. That ought to be a great clue. The only two things I've seen that could cause that are thrashing, or GC. Hard to see how it could be thrashing, given your hardware, so I'd initially suspect GC. Aim VisualVM at the JVM. It shows how muc

RE: Using Solr 3.4 running on tomcat7 - very slow search

2012-07-16 Thread Bryan Loofbourrow
estion. > > I haven't used VisualVM before but I am going to use it to see where CPU > is > going. I saw that CPU is overly used. I haven't seen so much CPU use in > testing. > Although I think GC is not a problem, splitting the jvm per shard would be > a good idea. >

A strange Solr NullPointerException while shutting down Tomcat, possible connection to messed-up index files

2012-09-18 Thread Bryan Loofbourrow
I’m using Solr/Lucene 3.6 under Tomcat 6. When shutting down an indexing server after much indexing activity, occasionally, I see the following NullPointerException trace from Tomcat: INFO: Stopping Coyote HTTP/1.1 on http-1800 Exception in thread "Lucene Merge Thread #1" org.apache.lucene.i