Hello, I think you are confused between two different index
structures, probably because of the names of the options in Solr.
1. indexing term vectors: this means that, given a document, you can
go look up a miniature "inverted index" just for that document. That means
each document has "term vectors" whi
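The idea of a per-document "mini inverted index" can be sketched in plain Python. This is an illustration of the concept only, not Lucene's actual data structure, and the function name is made up:

```python
from collections import Counter

def term_vector(text):
    """Illustrative per-document 'mini inverted index': term -> frequency.

    A real Lucene term vector can also store positions and offsets per
    term; this sketch keeps only frequencies.
    """
    return Counter(text.lower().split())

tv = term_vector("the quick brown fox jumps over the lazy dog the end")
# Look up a single term's statistics within just this one document:
print(tv["the"])  # 3
```

The point is that the lookup is scoped to one document, unlike the main inverted index, which maps a term to all the documents containing it.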
April 2014, Apache Solr™ 4.7.2 available
The Lucene PMC is pleased to announce the release of Apache Solr 4.7.2
Solr is the popular, blazing fast, open source NoSQL search platform
from the Apache Lucene project. Its major features include powerful
full-text search, hit highlighting, faceted search
May 2014, Apache Solr™ 4.8.1 available
25 June 2014, Apache Solr™ 4.9.0 available
I think it's a bug, but that's just my opinion. I sent a patch to dev@
for thoughts.
On Tue, Oct 29, 2013 at 6:09 PM, Erick Erickson wrote:
> Hmmm, so you're saying that merging indexes where a field
> has been removed isn't handled. So you have some documents
> that do have a "what" field, but you
Which example? There are so many.
On Wed, Nov 13, 2013 at 1:00 PM, Mark Miller wrote:
> RE: the example folder
>
> It’s something I’ve been pushing towards moving away from for a long time -
> see https://issues.apache.org/jira/browse/SOLR-3619 Rename 'example' dir to
> 'server' and pull exampl
Your analyzer needs to set positionIncrement correctly: it sounds like it's broken.
On Thu, Dec 5, 2013 at 1:53 PM, Isaac Hebsh wrote:
> Hi,
> we implemented a morphologic analyzer, which stems words on index time.
> For some reasons, we index both the original word and the stem (on the same
> positi
ll right (for me).
> 2) fieldNorm is determined by the size of the termVector, isn't it? the
> termVector size isn't affected by the positions.
>
>
> On Fri, Dec 6, 2013 at 10:46 AM, Robert Muir wrote:
>
>> Your analyzer needs to set positionIncrement correctly: so
it's accurate; you are wrong.
Please look at setDiscountOverlaps in your similarity. This is really
easy to understand.
On Sun, Dec 8, 2013 at 7:23 AM, Manuel Le Normand
wrote:
> Robert, you last reply is not accurate.
> It's true that the field norms and termVectors are independent. But this
>
No, it's turned on by default in the default similarity.
As I said, all that is necessary is to fix your analyzer to emit the
proper position increments.
On Mon, Dec 9, 2013 at 12:27 PM, Manuel Le Normand
wrote:
> In order to set discountOverlaps to true you must have added the
> to the schema.x
This exception comes from OffsetAttributeImpl (i.e. you don't need to
index anything to reproduce it).
Maybe you have a missing clearAttributes() call (your tokenizer
'returns true' without calling that first)? This could explain it, if
something like a StopFilter is also present in the chain: basi
January 2014, Apache Solr™ 4.6.1 available
you need the solr analysis-extras jar in your classpath, too.
On Wed, Feb 19, 2014 at 6:45 AM, Thomas Fischer wrote:
> Hello,
>
> I'm migrating to solr 4.6.1 and have problems with the ICUCollationField
> (apache-solr-ref-guide-4.6.pdf, pp. 31 and 100).
>
> I get consistently the error message
classes mentioned are
> loaded.
>
> Do you know which jar is supposed to contain the ICUCollationField?
>
> Best regards
> Thomas
>
>
>
> Am 19.02.2014 um 13:54 schrieb Robert Muir:
>
> > you need the solr analysis-extras jar in your classpath, too.
> >
&g
HOME/lib in order to use it."
>
> is misleading insofar as this README.txt doesn't mention the
> solr-analysis-extras-4.6.1.jar in dist.
>
> Best
> Thomas
>
>
> Am 19.02.2014 um 14:27 schrieb Robert Muir:
>
> > you need the solr analysis-extras
On Wed, Feb 19, 2014 at 10:33 AM, Thomas Fischer wrote:
>
> > Hmm, for standardization of text fields, collation might be a little
> > awkward.
>
> I arrived there after using custom rules for a while (see
> "RuleBasedCollator" on http://wiki.apache.org/solr/UnicodeCollation) and
> then being tol
I debugged the PDF a little. FWIW, the following code (using iText)
takes it to 9MB:

    public static void main(String args[]) throws Exception {
        Document document = new Document();
        PdfSmartCopy copy = new PdfSmartCopy(document,
            new FileOutputStream("/home/rmuir/Downloads/test.pdf"));
Where do you get the docid from? Usually it's best to just look at the whole
algorithm; e.g. docids come from per-segment readers by default anyway, so
ideally you want to access any per-document things from that same
SegmentReader.
As far as supporting docvalues, the FieldCache API "passes thru" to doc
If you use WikipediaTokenizer it will tag different wiki elements with
different types (you can see it in the admin UI).
So then follow up with TypeTokenFilter to keep only the types you care
about, and I think it will do what you want.
On Tue, Jul 23, 2013 at 7:53 AM, Furkan KAMACI wrote:
> Hi
On Mon, Aug 5, 2013 at 11:42 AM, Chris Hostetter
wrote:
>
> : I agree with you, 0xfffe is a special character, that is why I was asking
> : how it's handled in solr.
> : In my document, 0xfffe does not appear at the beginning, it's in the
> : content.
>
> Unless i'm missunderstanding something (an
On Mon, Aug 5, 2013 at 3:03 PM, Chris Hostetter
wrote:
>
> : > 0xfffe is not a special character -- it is explicitly *not* a character in
> : > Unicode at all, it is set asside as "not a character." specifically so
> : > that the character 0xfeff can be used as a BOM, and if the BOM is read
> : >
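The distinction being made here can be checked directly with Python's unicodedata module (just an illustration; nothing Solr-specific):

```python
import unicodedata

# U+FEFF is a real assigned character (ZERO WIDTH NO-BREAK SPACE),
# which is what serves as the byte order mark (BOM):
print(unicodedata.name("\ufeff"))  # ZERO WIDTH NO-BREAK SPACE

# U+FFFE is set aside as a noncharacter precisely so that a byte-swapped
# BOM is unambiguous; it has no character name and should not appear in text:
try:
    unicodedata.name("\ufffe")
except ValueError:
    print("U+FFFE is not an assigned character")
```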
On Fri, Aug 9, 2013 at 7:48 PM, Erick Erickson wrote:
>
> So is there a good way, without optimizing, to purge any segments not
> referenced in the segments file? Actually I doubt that optimizing would
> even do it if I _could_, any phantom segments aren't visible from the
> segments file anyway..
On Mon, Aug 12, 2013 at 8:38 AM, Mathias Lux wrote:
> Hi!
>
> I'm basically searching for a method to put byte[] data into Lucene
> DocValues of type BINARY (see [1]). Currently only primitives and
> Strings are supported according to [1].
>
> I know that this can be done with a custom update hand
On Mon, Aug 12, 2013 at 12:25 PM, Mathias Lux wrote:
>
> Another thing for not using the the SORTED_SET and SORTED
> implementations is, that Solr currently works with Strings on that and
> I want to have a small memory footprint for millions of images ...
> which does not go well with immutables.
Did you do a (real) commit before trying to use this?
I am not sure how this splitting works, but at least the merge option
requires that.
I can't see this happening unless you are somehow splitting a
0-document index (or the splitter is creating 0-document splits),
so this is likely just a sym
Well, I meant before, but I just took a look and this is implemented
differently than the "merge" one.
In any case, I think it's the same bug, because I think the only way
this can happen is if somehow this splitter is trying to create a
0-document "split" (or maybe a split containing all deletions
On Tue, Aug 13, 2013 at 11:39 AM, Shalin Shekhar Mangar
wrote:
> The splitting code calls commit before it starts the splitting. It creates
> a LiveDocsReader using a bitset created by the split. This reader is merged
> to an index using addIndexes.
>
> Shouldn't the addIndexes code then ignore al
On Wed, Aug 14, 2013 at 3:53 AM, ses wrote:
> We are trying out the new PostingsHighlighter with Solr 4.2.1 and finding
> that the highlighting section of the response includes self-closing tags
> for
> all the fields in hl.fl (by default for edismax it is all fields in qf)
> where there are no h
On Wed, Aug 14, 2013 at 5:29 PM, Chris Hostetter
wrote:
>
> : why? Those are my sort fields and they are occupying a lot of space (doubled
> : in this case but I see that sometimes I have three or four "old" segment
> : references)
> :
> : Is there something I can do to remove those old references
On Wed, Aug 14, 2013 at 5:58 PM, Chris Hostetter
wrote:
>
> : > FieldCaches are managed using a WeakHashMap - so once the IndexReader's
> : > associated with those FieldCaches are no logner used, they will be garbage
> : > collected when and if the JVMs garbage collector get arround to it.
> : >
>
On Sat, Aug 17, 2013 at 3:59 AM, Chris Collins wrote:
> I am using 4.4 in an embedded mode and found that it has a dependency on
> hadoop 2.0.5. alpha that in turn depends on jetty 6.1.26 which I think
> pre-dates electricity :-}
>
I think this is only a "test dependency" ?
On Sun, Mar 24, 2013 at 4:19 AM, John Nielsen wrote:
> Schema with DocValues attempt at solving problem:
> http://pastebin.com/Ne23NnW4
> Config: http://pastebin.com/x1qykyXW
>
This schema isn't using docvalues, due to a typo in your config:
it should not be DocValues="true" but docValues="true" (attribute names are case-sensitive).
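For reference, a hedged sketch of what a corrected schema.xml field might look like (the field and type names here are placeholders, not taken from the pastebin):

```xml
<!-- attribute names are case-sensitive: docValues, not DocValues -->
<field name="price" type="tlong" indexed="true" stored="true" docValues="true"/>
```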
Done. Let us know if you have any problems.
On Sat, May 4, 2013 at 10:12 AM, Krunal wrote:
> Dear Sir,
>
> Kindly add me to the contributor group to help me contribute to the Solr
> wiki.
>
> My Email id: jariwalakru...@gmail.com
> Login Name: Krunal
>
> Specific changes I would like to make to
If you have a good idea... Just do it. Open an issue
On Jun 11, 2013 9:34 PM, "Alexandre Rafalovitch" wrote:
> I think it is quite hard for beginners that basic solr example
> directory is competing for attention with other - nested - examples. I
> see quite a lot of questions on which directory
On Fri, Sep 16, 2011 at 6:53 PM, Burton-West, Tom wrote:
> Hello,
>
> The TieredMergePolicy has become the default with Solr 3.3, but the
> configuration in the example uses the mergeFactor setting which applys to the
> LogByteSizeMergePolicy.
>
> How is the mergeFactor interpreted by the Tiered
On Mon, Sep 19, 2011 at 9:57 AM, Burton-West, Tom wrote:
> Thanks Robert,
>
> Removing "set" from " setMaxMergedSegmentMB" and using "maxMergedSegmentMB"
> fixed the problem.
> ( Sorry about the multiple posts. Our mail server was being flaky and the
> client lied to me about whether the messag
On Tue, Sep 20, 2011 at 12:32 PM, Michael McCandless
wrote:
>
> Or: is it possible you reopened the reader several times against the
> index (ie, after committing from Solr)? If so, I think 2.9.x never
> unmaps the mapped areas, and so this would "accumulate" against the
> system limit.
In order
https://issues.apache.org/jira/browse/LUCENE-3421
Note: if you are using 'includeSpanScore=false' (which I think
you are, as that's where the bug applies), be aware this means the
score is *only* the result of your payload: boosts, tf, length
normalization, idf, none of this is incorporated in
Your Persian PDF problem is different, and is already taken care of in PDFBox trunk:
https://issues.apache.org/jira/browse/PDFBOX-1127
On Tue, Oct 4, 2011 at 2:04 PM, ahmad ajiloo wrote:
> I have this problem too, in indexing some of persian pdf files.
>
> 2011/10/4 Héctor Trujillo
>
>> Hi all, I'm
On Wed, Oct 5, 2011 at 2:23 PM, David Ryan wrote:
> Hi,
>
> According to the IRA issue 2959,
> https://issues.apache.org/jira/browse/LUCENE-2959
>
> BM25 will be included in the next release of LUCENE.
>
> 1). Will BM25F be included in the next release as well as part
> of LUCENE-2959?
should be
On Wed, Oct 5, 2011 at 3:03 PM, David Ryan wrote:
> Do you mean both BM25 and BM25F?
>
>
No, BM25F and other "fielded" or structured models are somewhat different.
In these models, if you have two fields (body/title), you are saying
that "dogs" in body is actually the same term as "dogs" in title.
The word delimiter filter also does other things: it treats ' as
punctuation by default, so it normally splits on ', except if it's 's
(in that case it removes the 's completely if you use
stemEnglishPossessive).
There are a couple of approaches you can use:
1. you can keep WordDelimiterFilter wi
On Thu, Oct 27, 2011 at 6:00 PM, Simon Willnauer
wrote:
> we are not actively removing norms. if you set omitNorms=true and
> index documents they won't have norms for this field. Yet, other
> segment still have norms until they get merged with a segment that has
> no norms for that field ie. omit
On Fri, Oct 28, 2011 at 5:03 PM, Jason Rutherglen
wrote:
> +1 I suggested it should be backported a while back. Or that Lucene
> 4.x should be released. I'm not sure what is holding up Lucene 4.x at
> this point, bulk postings is only needed useful for PFOR.
This is not true, most modern index
On Fri, Oct 28, 2011 at 8:10 PM, Jason Rutherglen
wrote:
>> Otherwise we have "flexible indexing" where "flexible" means "slower
>> if you do anything but the default".
>
> The other encodings should exist as modules since they are pluggable.
> 4.0 can ship with the existing codec. 4.1 with addit
On Wed, Nov 2, 2011 at 8:53 AM, Phil Hoy wrote:
> It is solr 4.0 and uses the new FSTSynonymFilterFactory i believe but defers
> to ZkSolrResourceLoader to load the synonym file when in cloud mode.
> Phil
>
FYI: The synonyms implementation supports multiple formats (currently
"solr" and "wordnet
What is the point of a unique indexed field?
If, for all of your fields, there is only one possible document, you
don't need length normalization, scoring, or a search engine at all...
just use a HashMap?
On Thu, Nov 10, 2011 at 7:42 AM, Ivan Hrytsyuk
wrote:
> Hello everyone,
>
> We have large in
Hi,
locale-sensitive range queries don't work with these filters, only sort,
although Erick Erickson has a patch that will enable this (the lowercasing
wildcards patch; then you could add this filter to your multiterm chain).
Separately, locale range queries and sort both work easily on trunk (wit
On Wed, Nov 23, 2011 at 11:22 PM, Michael Sokolov wrote:
> Thanks for confirming that, and laying out the options, Robert.
>
FYI: Erick committed the multiterm stuff, so I opened an issue for
this: https://issues.apache.org/jira/browse/SOLR-2919
--
lucidimagination.com
On Sat, Nov 26, 2011 at 8:43 PM, Michael Sokolov wrote:
> That's great news! We can't really track trunk, but it looks like this is
> targeted for 3.6, right? As a short-term alternative, I was considering
> using ICUFoldingFilter; this won't preserve some of the finer distinctions,
> but will at
Technically it could? I'm just not sure if the current spellchecking
APIs allow for it, but maybe someone has a good idea on how to easily
expose this.
I think it's a good idea.
Care to open a JIRA issue?
On Mon, Nov 28, 2011 at 1:31 PM, Phil Hoy wrote:
> Hi,
>
> Can the DirectSolrSpellChecker b
On Mon, Nov 28, 2011 at 4:36 PM, Phil Hoy wrote:
> Added issue: https://issues.apache.org/jira/browse/SOLR-2926
> Please let me know if more information needs adding to JIRA.
>
> Phil
>
Thanks, I'll follow up on the issue
On Tue, Nov 29, 2011 at 8:07 AM, elisabeth benoit
wrote:
> Hello,
>
> I'd like to know if the Levensthein distance algorithm used by Solr 4.0
> DirectSpellChecker (working quite well I must say) is considering an
> inversion as distance = 1 or distance = 2?
>
> For instance, if I write Monteruil a
On Tue, Nov 29, 2011 at 9:21 AM, elisabeth benoit
wrote:
> ok, thanks.
>
> I think it would be a nice improvment to consider inversion as distance =
> 1, since it's a so common mistake. The distance = 2 makes it difficult to
> correct transpositions on small words (for instance, the DirectSpellChe
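The difference being discussed can be shown with a plain dynamic-programming sketch (this is not Lucene's automaton-based implementation; with transpositions enabled it is the restricted Damerau-Levenshtein / "optimal string alignment" variant):

```python
def edit_distance(a, b, transpositions=False):
    """Levenshtein distance; with transpositions=True, a swap of two
    adjacent characters counts as a single edit (Damerau-Levenshtein,
    restricted variant)."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]

# "Monteruil" vs "Montreuil" differ by one adjacent-letter swap:
print(edit_distance("Monteruil", "Montreuil"))                       # 2
print(edit_distance("Monteruil", "Montreuil", transpositions=True))  # 1
```

Under classic Levenshtein the swap costs two edits (two substitutions), so a transposed pair on a short word can push a correction past an edit budget of 2; counting it as one edit is what the poster is asking for.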
On Thu, Dec 8, 2011 at 11:01 AM, Jay Luker wrote:
> Hi,
>
> I am trying to provide a means to search our corpus of nearly 2
> million fulltext astronomy and physics articles using regular
> expressions. A small percentage of our users need to be able to
> locate, for example, certain types of iden
On Thu, Dec 8, 2011 at 10:46 AM, Mark Miller wrote:
>
> On Dec 8, 2011, at 8:50 AM, Jamie Johnson wrote:
>
>> Isn't the codec stuff merged with trunk now?
>
> Robert merged this recently AFAIK.
>
True, but that issue only moved the majority of the rest of the index
(stored fields, term vectors, fi
On Thu, Dec 8, 2011 at 12:55 PM, Jamie Johnson wrote:
> Thanks Andrzej. I'll continue to follow the portable format JIRA
> along with 3622, are there any others that you're aware of that are
> blockers that would be useful to watch?
>
There is a lot to be done, particularly norms and deleted doc
On Sun, Dec 11, 2011 at 11:34 AM, eks dev wrote:
> on the latest trunk, my schema.xml with field type declaration
> containing //codec="Pulsing"// does not work any more (throws
> exception from FieldType). It used to work wit approx. a month old
> trunk version.
>
> I didn't dig deeper, can be th
On Mon, Dec 12, 2011 at 5:18 AM, Max wrote:
> The end offset remains 11 even after folding and transforming "æ" to
> "ae", which seems wrong to me.
End offsets refer to the *original text*, so this is correct.
What is wrong is the EdgeNGram filter. See how it turns that 11 into a 12?
>
> I also stum
On Mon, Dec 12, 2011 at 5:18 AM, Max wrote:
> It seems like there is some weird stuff going on when folding the
> string, it can be seen in the analysis view, too:
>
> http://i.imgur.com/6B2Uh.png
>
I created a bug here, https://issues.apache.org/jira/browse/LUCENE-3642
Thanks for the screensho
The old one didn't really handle this correctly either.
Koji, what is the highlighting problem? Can we have a test case?
2011/12/26 Koji Sekiguchi :
> I found that SynonymFilter javadoc says:
>
> "Matches single or multi word synonyms in a token stream.
> This token stream cannot properly handle
On Mon, Dec 26, 2011 at 10:54 AM, Koji Sekiguchi wrote:
> I don't have JUnit test case. What I tried was:
>
> I have indexing time synonym definition:
>
> nhl, national hockey league
>
> and I indexed "I like national hockey league".
>
> Then I searched nhl with hl=on, I got an unwanted highlight
On Sat, Jan 14, 2012 at 12:58 PM, wrote:
> Hi,
>
> I'm switching from Lucene 2.3 to Solr 3.5. I want to reuse the existing
> indexes (huge...).
If you want to use a Lucene 2.3 index, then you should set this in
your solrconfig.xml:
<luceneMatchVersion>LUCENE_23</luceneMatchVersion>
>
> In Lucene I use an untweaked org.apache.lucene.a
On Sat, Jan 14, 2012 at 5:09 PM, Lance Norskog wrote:
> Has the GermanAnalyzer behavior changed at all? This is another kind
> of mismatch, and it can cause very subtle problems. If text is
> indexed and queried using different Analyzers, queries will not do
> what you think they should.
It acts
looks like https://issues.apache.org/jira/browse/SOLR-2888.
Previously, FST would need to hold all the terms in RAM during
construction, but with the patch it uses offline sorts/temporary
files.
I'll reopen the issue to backport this to the 3.x branch.
On Mon, Jan 16, 2012 at 8:31 PM, Dave wrot
how long it will take to
> get a fix? Would I be better switching to trunk? Is trunk stable enough for
> someone who's very much a SOLR novice?
>
> Thanks,
> Dave
>
> On Mon, Jan 16, 2012 at 10:08 PM, Robert Muir wrote:
>
>> looks like https://issues.apache.org/j
>> at
>> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>> at
>> org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>> at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>
countryid,
> c.plainname as countryname, p.timezone as timezone, r.id as regionid,
> r.plainname as regionname from places p, regions r, countries c, cities c2
> where c2.id = p.cityid AND p.settingid = 1 AND p.regionid > 1 AND
> p.countryid=c.id AND r.id=p.regionid"
> transformer="TemplateTransformer">
>
>
>
make it work with the KStem jars?
>
> Thanks!
>
--
Robert Muir
rcm...@gmail.com
idea.)
>
>
I don't think we should do this. How many tokens would that make?
(Such malformed input exists in the wild, e.g. someone spills beer on their
keyboard and the key gets sticky.)
On Wed, Jun 23, 2010 at 3:34 PM, Peter Karich wrote:
>
> So, you mean I should try it out her:
> http://svn.apache.org/viewvc/lucene/dev/trunk/solr/
>
>
Yes, the speedups are only in trunk.
results with word
> 'comfort' in the title. I assume it is because of stemming. What is the
> right way to handle this?
>
From your examples, it seems a more lightweight stemmer might be an easy
option: https://issues.apache.org/jira/browse/LUCENE-2503
ost on the
> nGram_text field.
>
> If I do a *:* on the Solr administration interface it shows the nGram_text
> field to be populated.
> However if I search for plan (Assume I indexed the word Plane) no results
> are shown.
> Is there any other configurations that needs to be done ?
>
> Thanks in advance,
>
> Regards,
> Indika
>
ght be something in analysis)
--
> Peter M. Wolanin, Ph.D.
> Momentum Specialist, Acquia. Inc.
> peter.wola...@acquia.com
>
Solr - User mailing list archive at Nabble.com.
>
ache.org/jira/browse/SOLR-2003
In this case, the wrong encoding could have been detected and saved you some
time...
cluded
> >> ZooKeeper jar (java versioning issue) - so I had to download the source
> and
> >> build this. Now 'ant' gets a bit further, to the stage listed above.
> >>
> >> Any idea of the problem??? THANKS!
> >>
> >> [javac] Compiling 438 source files to
> >> /Volumes/newpart/solrcloud/cloud/build/solr
> >> [javac]
> >>
> /Volumes/newpart/solrcloud/cloud/src/java/org/apache/solr/cloud/ZkController.java:588:
> >> cannot find symbol
> >> [javac] symbol : method stringPropertyNames()
> >> [javac] location: class java.util.Properties
> >> [javac] for (String sprop :
> >> System.getProperties().stringPropertyNames()) {
> >>
> >
> >
> >
>
and I would want when a user make a search and forget to accent
> the words the search results show both posibilities: the results without
> the
> accent an the results with the accent.
>
> would you help me please ???
> Regards
> Ariel
>
wrote:
> Hi,
>
> I want to setup an solr with support for several languages.
> The language list includes slovene, unfortunately I found nothing about it
> in the wiki.
> Has some one experiences with solr 1.4 and slovene?
>
> thanks for help
> Markus
message in context:
> http://lucene.472066.n3.nabble.com/Stemming-tp982690p982690.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
Stemming-tp982690p982786.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
м, Коврове.
>
> Are there other stemming plugins for the russian language that can handle
> this?
> If not, what are the options. A simple solution may be to use the wildcard
> queries in Standard mode instead of the DisMaxQueryHandler:
> Ковров*
>
> but I'd like to avoid it.
>
> Thanks.
>
might give you less problems on
average, but I noticed it has this same problem with the example you gave.
On Tue, Jul 27, 2010 at 4:25 AM, Robert Muir wrote:
> All of your examples stem to "ковров":
>
>assertAnalyzesTo(a, "Коврова Коврову Ковровом Коврове",
>
?lg=1&q=%D0%BD%D0%B5%D0%BC%D1%86%D0%BE%D0%B2%D0%B0
> >
> > Немцов: 74 articles
> >
> >
> http://www.sova-center.ru/search/?lg=1&q=%D0%BD%D0%B5%D0%BC%D1%86%D0%BE%D0%B2
> >
> >
> >
> >
>
.* stopwords is ideal for the english language,
> although in russian nouns are inflected: Борис, Борису, Бориса, Борисом
>
> I'll try the RussianLightStemFilterFactory (the article in the PDF
> mentioned
> it's more accurate).
>
> Once again thanks,
> Oleg Bu
agine such a list could be added to the example protwords.txt
>
> Thanks,
> Otis
>
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>
lo,
>
> I have found the analysis tool in the admin page to be very useful in
> understanding my schema. I've made changes to my schema so that a
> particular case I'm looking at matches properly. I restarted solr,
> deleted the document from the index, and added it again. But still,
> when I do a query, the document does not get returned in the results.
>
> Does anyone have any tips for debugging this sort of issue? What is
> different between what I see in analysis tool and new documents added
> to the index?
>
> Thanks,
> Justin
>
ing the
> "match" highlighting will actaully reduce confusion, but perhaps there is
> verbage/disclaimers that could be added to make it more clear?
>
As I said before, I think I disagree with you. I think for stuff like this
the technicals are less important; what's important is that this is a misleading
checkbox that really confuses users.
I suggest disabling it entirely: you are only going to remove confusion.
rom 'Query Analyzer' is completely bogus.
On Wed, Aug 4, 2010 at 1:57 PM, Robert Muir wrote:
>
>
> On Wed, Aug 4, 2010 at 1:45 PM, Chris Hostetter
> wrote:
>
>>
>> it really only attempts to identify when there is overlap between
>> analaysis at
after and including version
X-1.0, but may-or-may-not be able to read indexes generated by version
X-2.N.
(And personally I think there is stuff in 2.x like modified-utf8 that i
would object to adding support for with terms now as byte[])
tem won't form phrase queries unless the user explicitly puts
double quotes around it.
ll actually form slow phrase queries by default.
> >
>
> do you mean that http://lucene.apache.org will be split up into "http
> lucene apache org" and solr will perform a phrase query?
>
> Regards,
> Peter.
>
on whitespace first. That's my
> point: analysis.jsp doesn't make any assumptions about what query parser
> *might* be used, it just tells you what your analyzers do with strings.
>
You're right, we should just fix the bug that the queryparser tokenizes on
whitespace first. Then analysis.jsp will be significantly less confusing.
sing.
> even if you change the Lucene QUeryParser so that whitespace isn't a meta
> character it doens't affect the underlying issue: analysis.jsp is agnostic
> about QueryParsers.
analysis.jsp isn't agnostic about queryparsers, it's ignorant of them, and
your default queryparser is actually a de facto whitespace tokenizer; don't
try to sugarcoat it.
r indexes will not be able to be
read natively without conversion first (with maybe loss of analyzer
compatibility)."
The fact that 4.0 can read 3.x indexes *at all* without a converter tool is
only because Mike McCandless went the extra mile.
I don't see anything suggesting we should support any tools for 2.x indexes!
ly why it comes up on the mailing list, it seems, at least every week [at this
point you have to admit there is a problem].
If you want to say the analysis tool is agnostic about queryparsers, that's
fine, you can keep saying that. I'm saying it shouldn't be.
ght, we should just fix the bug that the queryparser tokenizes on
> whitespace first. then analysis.jsp will be significantly less confusing.
>> dude .. not trying to get into a holy war here
> -1 from me.
>
>
Well, that might be your opinion, but it doesn't change the facts.
e analyzers jar!
This way, in a single jar you have the TurkishLowerCaseFilter, but also the
Turkish stemmer from snowball, a set of Turkish stopwords in resources/, and
a Lucene TurkishAnalyzer that puts it all together.
uery of foo bar is processed as TokenStream(foo) +
TokenStream(bar)
so query-time shingling like this doesn't work as you expect for this
reason.
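A toy sketch of why this breaks query-time shingling (plain Python with made-up function names, not Solr's actual code path): the query parser splits on whitespace before analysis, so each word reaches the analyzer alone and no shingle can span the split.

```python
def shingle(tokens, size=2):
    """Toy shingle filter: emit word n-grams from a token stream."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]

def analyze(text):
    """Toy analyzer: whitespace-tokenize, then shingle."""
    return shingle(text.lower().split())

# Analyzing the whole string produces the expected bigrams...
print(analyze("foo bar baz"))   # ['foo bar', 'bar baz']

# ...but a whitespace-splitting query parser analyzes each word alone,
# so the shingle filter never sees two tokens at once:
print([analyze(word) for word in "foo bar baz".split()])  # [[], [], []]
```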