assumptions can be painfully expensive.
Thank you for reading my bloated e-mail. Again, I'm mostly just looking to
be pointed to various pieces of the Lucene / Solr code-base, and am trolling
for any insight that people might share.
Scott Gonyea
t all
20. Further, the white-listing can generally be applied to other sites in
which they appear.
I'd love to get some thoughts on how to tackle this problem, but I think
that kicking off separate documents within Solr, for each specific
occurrence, would be the simplest path.
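A rough sketch of what I mean, using rsolr; the field names (site, url,
occurrence_text) are made up for illustration and would be whatever the
schema actually defines:

require 'rsolr' # gem install rsolr

solr = RSolr.connect(:url => 'http://localhost:8983/solr')

# One Solr document per occurrence; the id just needs to be unique.
occurrences = [
  {:site => 'example.com', :url => 'http://example.com/a', :occurrence_text => 'first hit'},
  {:site => 'example.com', :url => 'http://example.com/b', :occurrence_text => 'second hit'}
]

occurrences.each_with_index do |occ, i|
  solr.add occ.merge(:id => "occurrence-#{i}")
end
solr.commit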
to the
> original pages for each context
>
> You may be able to represent your grammar as textual rules instead of code.
> Your latency may be minutes instead of milliseconds though...
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Training
Hi,
I'm tweaking my schema, and the LowerCaseTokenizerFactory doesn't create
tokens based solely on lower-casing characters. Is there a way to tell it
NOT to drop non-letter characters? It's amazingly frustrating that the
TokenizerFactory and the FilterFactory have two entirely different modes of
behavior.
I went for a different route:
https://issues.apache.org/jira/browse/LUCENE-2644
Scott
On Tue, Sep 14, 2010 at 11:18 AM, Robert Muir wrote:
> On Tue, Sep 14, 2010 at 1:54 PM, Scott Gonyea wrote:
>
> > Hi,
> >
> > I'm tweaking my schema and the LowerCaseTo
> it for the demonstrated
> performance advantage. (At least I hope that's what happened, otherwise
> there's no excuse for it!).
>
> Do you know you get a worthwhile performance benefit for what you're doing?
> If not, why do it?
>
> Jonathan
>
>
> Scott Gonyea wrote:
>
> >
it for the demonstrated
> > performance advantage. (At least I hope that's what happened, otherwise
> > there's no excuse for it!).
> >
> > Do you know you get a worthwhile performance benefit for what you're
> doing?
> > If not, why do it?
> >
> >
table feeling? If it performs enough worse to matter,
> then that's why you'd need a custom tokenizer, other than that I'm not sure
> anything's undesirable about the PatternTokenizer.
>
>
> Jonathan
>
> Scott Gonyea wrote:
>
>> I'd agree with yo
If you want to do it in Ruby, you can use this script as scaffolding:
require 'rsolr' # run `gem install rsolr` to get this
solr = RSolr.connect(:url => 'http://ip-10-164-13-204:8983/solr')
total = solr.select({:q => '*:*', :rows => 0})["response"]["numFound"]
rows = 10
query = {
  :q     => '*:*',
  :rows  => rows,
  :start => 0
}
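A rough sketch of the loop that scaffolding is building toward, continuing
with the variables above (the match-all :q is just a placeholder; swap in
whatever you actually need):

# Walk the whole result set, `rows` documents at a time.
(0...total).step(rows) do |offset|
  response = solr.select(query.merge(:start => offset))
  response["response"]["docs"].each do |doc|
    puts doc["id"] # replace with whatever you need to do per document
  end
end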
lol, note to self: scratch out IPs. Good thing firewalls exist to
keep my stupidity at bay.
Scott
On Thu, Sep 16, 2010 at 2:55 PM, Scott Gonyea wrote:
> If you want to do it in Ruby, you can use this script as scaffolding:
> require 'rsolr' # run `gem install rsolr` to
Break your HTML pages into the desired fields, format it as follows:
http://wiki.apache.org/solr/UpdateXmlMessages
And away you go. You may want to search / review the Wiki. Also, if
you're indexing websites and want to put them in Solr, you should look
at Nutch. It can do all that work for you.
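If you happen to be scripting this in Ruby, rsolr will build the update
message for you, so you only have to hand it one hash per page. A rough
sketch; the field names (title, url, body) are placeholders and need to
exist in (or match a dynamicField of) your schema:

require 'rsolr' # gem install rsolr

solr = RSolr.connect(:url => 'http://localhost:8983/solr')

# One hash per HTML page, with whatever fields you extracted.
solr.add(
  :id    => 'page-1',
  :title => 'Example page title',
  :url   => 'http://example.com/',
  :body  => 'Extracted text of the page.'
)
solr.commit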
Your solrconfig has a highlighting section. You can make that CDATA
thing whatever you want; the default <em> is what gives you the italics,
so putting <b> there gets you bold. I changed mine.
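If you'd rather not touch solrconfig at all, the tags can also be passed
per request with hl.simple.pre / hl.simple.post. A quick rsolr sketch; the
'content' field name here is a placeholder:

require 'rsolr'

solr = RSolr.connect(:url => 'http://localhost:8983/solr')

# Wrap matches in <b>...</b> instead of the default <em>...</em>.
results = solr.select(
  :q               => 'content:baffle',
  :hl              => true,
  'hl.fl'          => 'content',
  'hl.simple.pre'  => '<b>',
  'hl.simple.post' => '</b>'
)
puts results["highlighting"].inspect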
On Thu, Sep 30, 2010 at 2:54 PM, efr...@gmail.com wrote:
> Hi all -
>
> Does anyone know how to produce solr results where the match term is
> highlighted in bold rather than italic?
>
Wow, this is probably the most annoying Solr issue I've *ever* dealt
with. First question: How do I debug Dismax, and its query handling?
Issue: When I query against this StrField, I am attempting to do an
*exact* match... Albeit one that is case-insensitive :). So, 90%
exact. It works in a majority of cases.
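One way to see what dismax is actually doing with the input is debugQuery,
which adds a debug section showing the parsed query. A rough rsolr sketch;
the qf field and the query string are placeholders:

require 'rsolr'

solr = RSolr.connect(:url => 'http://localhost:8983/solr')

# The "debug" section shows the query dismax actually built.
results = solr.select(
  :q          => 'Some Exact Phrase',
  :defType    => 'dismax',
  :qf         => 'my_string_field',
  :debugQuery => 'on'
)
puts results["debug"]["parsedquery"]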
Wow, that's pretty infuriating. Thank you for the suggestion. I
added it to the Wiki, with the hope that if it contains misinformation
then someone will correct it and, consequently, save me from another
one of these experiences :) (...and to also document that, hey, there
is a tokenizer which t
RD_STARTbaffleTEST_KEYWORD_END_prices.html\"]TEST_KEYWORD_STARTbaffleTEST_KEYWORD_END
Is there something about this data that makes the highlighter not want to split
it up? Do I have to have Solr tokenize the words by some character that I
somehow excluded?
Thank you,
Scott Gonyea
First, make sure your request handler is set to spit out everything. I take
it you did, but I hate to assume.
Second, I suggest indexing your data twice: once as tokenized text, once
as a string. It'll save you from howling at the moon in anguish...
Unless you really only care about pure, exact string matches.
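Concretely, something like this, assuming the example schema's *_t
(tokenized text) and *_s (string) dynamic fields are available; adjust the
names to whatever your schema actually uses:

require 'rsolr'

solr = RSolr.connect(:url => 'http://localhost:8983/solr')

raw = 'Exact String I Care About'

# Same value indexed twice: once tokenized, once as a verbatim string.
solr.add(
  :id        => 'doc-1',
  :content_t => raw,  # tokenized text: normal full-text searching
  :content_s => raw   # string: exact, untokenized matching
)
solr.commit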