Sil,
When you switched over to using the Fast Vector Highlighter, did you
change your schema so that the fields that you want to highlight provide
term vector information, and reindex your documents? Term vectors are
necessary when using the Fast Vector Highlighter. Posting your schema may
show va
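For reference, a field wired up to supply the term vector information the FastVectorHighlighter needs might look like this in schema.xml (the field and type names here are placeholders, not taken from the original thread):

```xml
<!-- termVectors/termPositions/termOffsets must all be true,
     and documents must be reindexed after the change -->
<field name="content" type="text_general" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
```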
> I'm trying to find a way to best highlight search results even though
> those
> results are not stored in my index. Has anyone been successful in
reusing
> the SOLR highlighting logic on non-stored data?
I was able to do this by slightly modifying the FastVectorHighlighter so
that it returned b
> Hey Bryan, Thanks for the response! To make use of the
> FastVectorHighlighter
> you need to enable termVectors, termPositions, and termOffsets correct?
> Which takes a considerable amount of space, but is good to know and I
may
> possibly pursue this solution as well. Just starting to look at
>> I’m having some issues with Solr search results (using Solr 1.4). I
have enabled highlighting of searched text (hl=true) and set the fragment
size to 500 (hl.fragsize=500) in the search query.
Below are the results (screenshot) shown when I searched for the term
‘grandfather’ (2 results are
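As a sketch, the highlighting request described above can be assembled like this; only q, hl, and hl.fragsize come from the thread, while the core name and the field in hl.fl are hypothetical:

```python
from urllib.parse import urlencode

# Parameters from the thread: highlighting on, 500-character fragments.
params = {
    "q": "grandfather",
    "hl": "true",
    "hl.fragsize": 500,   # fragment size in characters
    "hl.fl": "content",   # hypothetical field to highlight
    "wt": "json",
}

# Hypothetical core name "mycore"; send this URL with any HTTP client.
url = "http://localhost:8983/solr/mycore/select?" + urlencode(params)
print(url)
```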
Eric,
Your example document is quite long. Are you setting hl.maxAnalyzedChars?
If you don't, the highlighter you appear to be using will not look past
the first 51,200 characters of the document for snippet candidates.
http://wiki.apache.org/solr/HighlightingParameters#hl.maxAnalyzedChars
-- Br
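Raising that limit is a per-request or per-handler setting; a hedged example as a request-handler default in solrconfig.xml (the 51,200-character default is per the wiki page above; the value chosen here is purely illustrative):

```xml
<!-- Analyze up to the first 1,000,000 characters of each document
     when looking for snippet candidates (illustrative value) -->
<str name="hl.maxAnalyzedChars">1000000</str>
```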
My guess is that the problem is those 200M documents.
FastVectorHighlighter is fast at deciding whether a match, especially a
phrase, appears in a document, but it still starts out by walking the
entire list of term vectors, and ends by breaking the document into
candidate-snippet fragments, both p
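The two costly steps described above can be modeled with a toy sketch; this is not Lucene's actual code, just an illustration of why the work scales with document size even when the match count is small:

```python
def highlight(term_vectors, query_terms, frag_size=100):
    """Toy model of the two passes described above (not Lucene's code).

    term_vectors: list of (term, character_offset) pairs for one document.
    query_terms: set of terms to highlight.
    """
    # Pass 1: walk the ENTIRE term-vector list to find match offsets.
    match_offsets = [off for term, off in term_vectors if term in query_terms]

    # Pass 2: break the document into candidate-snippet fragments;
    # here each fragment is identified by its index in the document.
    return sorted({off // frag_size for off in match_offsets})

tv = [("the", 0), ("quick", 4), ("fox", 10), ("quick", 250)]
print(highlight(tv, {"quick"}))  # → [0, 2]
```

Both passes touch data proportional to the document length, which is why very large documents dominate highlighting cost.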
atches/highlighting. I have setup another request handler that
> only searches the whole word fields and it returns in 850 ms with
> highlighting.
>
> Any ideas?
>
> - Andy
>
>
> -Original Message-
> From: Bryan Loofbourrow [mailto:bloofbour...@knowledgemosaic
description_par content content_par" so that it
> returns highlights for full and partial word matches. All of those
> fields have indexed, stored, termPositions, termVectors, and termOffsets
> set to "true".
>
> It all seems redundant just to allow for partial
hen
> > I turn on highlighting that I take the huge performance hit.
> >
> > Again, I'm using the FastVectorHighlighting. The hl.fl is set to "name
> > name_par description description_par content content_par" so that it
> > returns highligh
.
Thanks,
-- Bryan Loofbourrow
A couple of thoughts:
We wound up doing a bunch of tuning on the Java garbage collection.
However, the pattern we were seeing was periodic very extreme slowdowns,
because we were then using the default garbage collector, which blocks
when it has to do a major collection. This doesn't sound like yo
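For example, moving a JVM of that era off the default stop-the-world collector onto the concurrent collector looked something like this (illustrative flags and heap sizes, not the settings actually used in the thread):

```
JAVA_OPTS="-Xms4g -Xmx4g \
  -XX:+UseConcMarkSweepGC \
  -XX:+CMSParallelRemarkEnabled"
```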
Apologies. I meant to type “1.4 TB” and somehow typed “1.4 GB.” Little
wonder that no one thought the question was interesting, or figured I must
be using Sneakernet to run my searches.
-- Bryan Loofbourrow
--
*From:* Bryan Loofbourrow [mailto:bloofbour
I am doing a search on three shards with identical schemas (I
double-checked!), using the group feature, and Solr/Lucene 3.5. Solr is
giving me back the exception listed at the bottom of this email:
Other information:
My schema uses the following field types: StrField, DateField,
TrieDateFiel
> Hi Bryan,
>
> What is the fieldtype of the groupField? You can only group by field
> that is of type string as is described in the wiki:
> http://wiki.apache.org/solr/FieldCollapsing#Request_Parameters
>
> When you group by another field type, an HTTP 400 should be returned
> instead of this error.
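So a groupable field would need to be a plain string type, along these lines (the field and type names are placeholders):

```xml
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
<!-- grouping requires a StrField-typed, indexed field -->
<field name="groupKey" type="string" indexed="true" stored="false"/>
```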
indexed.
> I've had a problem with distributed not working when the uniqueKey field
> was indexed but not stored.
Was it the same exception I'm seeing?
-- Bryan
>
> -Original Message-
> From: Bryan Loofbourrow [mailto:bloofbour...@knowledgemosaic.com]
> Sent
> I believe it was a different exception, just brainstorming. (it was a
> null reference iirc)
>
> Does a *:* query with no sorting work?
>
> Cody
>
> -Original Message-
> From: Bryan Loofbourrow [mailto:bloofbour...@knowledgemosaic.com]
> Sent: Thursday, June 21, 2012 1
Here is my use case:
I have a large number of HTML documents, sizes in the 0.5K-50M range, most
around, say, 10M.
I want to be able to present the user with the formatted HTML document, with
the hits tagged, so that he may iterate through them, and see them in the
context of the document, wit
Ludovic,
>> How do you index your HTML files? I mean, do you create fields for
different parts of your document (for different stop-word lists, stemming,
etc.)? With DIH or SolrJ or something else? <<
We are sending them over http, and using Tika to strip the HTML, at
present.
We do not split
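Posting a document over HTTP so that Tika strips the markup is typically done through Solr's extracting handler; a sketch along these lines, where the document ID and file name are placeholders:

```
curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" \
  -F "myfile=@example.html"
```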
> -Original Message-
> From: Ahmet Arslan [mailto:iori...@yahoo.com]
> Sent: Wednesday, June 08, 2011 11:56 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Displaying highlights in formatted HTML document
>
>
>
> --- On Thu, 6/9/11, Bryan Loofbourrow
> > OK, I think I see what you're up to. Might be pretty viable
> > for me as well.
> > Can you talk about anything in your mappings.txt files that
> > is an
> > important part of the solution?
>
> It is not important. I just copied it. Plus, the HTML strip char filter
> does not have a mappings parameter.
Regarding the large number of files, even after optimize, we found that
when rebuilding a large, experimental 1.7TB index on Solr 3.5, instead of
Solr 1.4.1, there were a ton of index files, thousands, in 3.5, when there
used to be just 10 (or 11?) segments worth (as expected with mergeFactor
set t
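For context, the expectation of roughly ten segments comes from a solrconfig.xml setting along these lines (value illustrative):

```xml
<!-- With a merge factor of 10, an optimized-then-regrown index tends
     to settle around ten segments' worth of files -->
<mergeFactor>10</mergeFactor>
```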
5 min is ridiculously long for a query that used to take 65ms. That ought
to be a great clue. The only two things I've seen that could cause that
are thrashing, or GC. Hard to see how it could be thrashing, given your
hardware, so I'd initially suspect GC.
Aim VisualVM at the JVM. It shows how muc
estion.
>
> I haven't used VisualVM before but I am going to use it to see where CPU
> is
> going. I saw that CPU is overly used. I haven't seen so much CPU use in
> testing.
> Although I think GC is not a problem, splitting the jvm per shard would
be
> a good idea.
>
I’m using Solr/Lucene 3.6 under Tomcat 6.
When shutting down an indexing server after much indexing activity,
occasionally, I see the following NullPointerException trace from Tomcat:
INFO: Stopping Coyote HTTP/1.1 on http-1800
Exception in thread "Lucene Merge Thread #1"
org.apache.lucene.i