StandardTokenizer splits your text into tokens, and the suggester
suggests tokens independently. It sounds as if you want the suggestions
to be based on the entire text (not just the current word), and that
only adjacent words in the original should appear as suggestions.
Assuming that's what you're after:
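one way to get it (a minimal sketch, and not necessarily what this reply originally went on to recommend) is to build the suggester on a field analyzed with a ShingleFilter, so every candidate suggestion is a run of words that actually occur next to each other in the source text. Lucene 4.x-era API:

    import java.io.Reader;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.core.LowerCaseFilter;
    import org.apache.lucene.analysis.shingle.ShingleFilter;
    import org.apache.lucene.analysis.standard.StandardTokenizer;
    import org.apache.lucene.util.Version;

    public class ShingleSuggestAnalyzer extends Analyzer {
        @Override
        protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
            Tokenizer source = new StandardTokenizer(Version.LUCENE_47, reader);
            TokenStream stream = new LowerCaseFilter(Version.LUCENE_47, source);
            // emit 2- and 3-word shingles: only words that are adjacent in
            // the original text can ever be suggested together
            stream = new ShingleFilter(stream, 2, 3);
            return new TokenStreamComponents(source, stream);
        }
    }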
On 02/17/2015 03:46 AM, Volkan Altan wrote:
First of all thank you for your answer.
You're welcome - thanks for sending a more complete example of your
problem and expected behavior.
I don’t want to use KeywordTokenizer. Because, as long as the compound words
written by the user are available
There is also PostingsHighlighter -- I recommend it, if only for the
performance improvement, which is substantial, but I'm not completely
sure how it handles this issue. The one drawback I *am* aware of is
that it is insensitive to positions (so words from phrases get
highlighted even in isolation)
Yes, Lux automatically indexes text in XML elements associated with
their element names so you can run efficient XPath/XQuery queries; in
your case I would write:
q=/MainData/Info/Info[@name="Bob"][city="Cincinnati"]
or
q=//Info[@name="Bob"][city="Cincinnati"]
It also lets you mix "regular"
While upgrading from 4.2.1 to 4.6.1 I noticed that many of the fields
defined in the example schema.xml that used to be indexed and not stored
are now defined as indexed and stored. Is there anything behind this
change other than the idea that it would be more convenient to have all
the values
On Sat, Mar 15, 2014 at 1:02 PM, Michael Sokolov
wrote:
While upgrading from 4.2.1 to 4.6.1 I noticed that many of the fields
defined in the example schema.xml that used to be indexed and not stored are
now defined as indexed and stored. Is there anything behind this change
other than the idea that it would be more convenient to have all the values
-- Jack Krupansky
-----Original Message----- From: Michael Sokolov
Sent: Saturday, March 15, 2014 1:02 PM
To: solr-user@lucene.apache.org
Subject: example schema now stores most field values
While upgrading from 4.2.1 to 4.6.1 I noticed that many of the fields
defined in the example schema.xml that used to be indexed and not stored
Lately, it doesn't seem to be working. (Anonymous - via GTD book)
On Sun, Mar 16, 2014 at 9:28 PM, Michael Sokolov
wrote:
Thanks for hunting that down, Jack. It may very well have been a change
that we made (to remove the stored="true"). Sorry if I led you on a wild
goose chase.
A
I'm getting a similar exception when writing documents (on the client
side). I can write one document fine, but the second (which is being
routed to a different shard) generates the error. It happens every time
- definitely not a resource issue or timing problem since this database
is complet
an error at
the same time and put it on pastebin or the like.
Thanks,
Greg
On Mar 20, 2014, at 3:36 PM, Michael Sokolov
wrote:
I'm getting a similar exception when writing documents (on the client side). I
can write one document fine, but the second (which is being routed to a
differ
Excellent, thanks Shalin!
On 3/22/2014 3:32 PM, Shalin Shekhar Mangar wrote:
Thanks Michael! I just committed your fix. It will be released with 4.7.1
On Fri, Mar 21, 2014 at 8:30 PM, Michael Sokolov
wrote:
I just managed to track this down -- as you said the disconnect was a red
herring
On 3/22/2014 2:16 AM, anupamk wrote:
Hi,
Is the solrTomcat wiki article valid for solr-4.7.0 ?
http://wiki.apache.org/solr/SolrTomcat
I am not able to deploy solr after following the instructions there.
When I try to access the solr admin page I get a 404.
I followed every step exactly as mentioned.
On 4/1/14 2:32 PM, Walter Underwood wrote:
And here is another peculiarity of short text fields.
The movie "New York, New York" should not be twice as relevant for the query "new
york". Is there a way to use a binary term frequency rather than a count?
wunder
--
Walter Underwood
wun...@wunderw
On 4/3/14 7:46 AM, Michael Sokolov wrote:
On 4/1/14 2:32 PM, Walter Underwood wrote:
And here is another peculiarity of short text fields.
The movie "New York, New York" should not be twice as relevant for
the query "new york". Is there a way to use a binary term frequency
I had to grapple with something like this problem when I wrote Lux's
app-server. I extended SolrDispatchFilter and handle parameter
swizzling to keep everything nicey-nicey for Solr while being able to
play games with parameters of my own. Perhaps this will give you some
ideas:
https://gith
getParameterNames before SolrDispatchFilter has a chance to access the
InputStream.
I opened https://issues.apache.org/jira/browse/SOLR-5969 to discuss further
and attached our current patch.
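For anyone curious what the "parameter swizzling" mentioned above looks like in practice, here is a rough sketch (not the actual Lux code or the SOLR-5969 patch, and the "myapp." prefix is made up): subclass SolrDispatchFilter and hand Solr a wrapped request whose parameter map has been doctored.

    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Iterator;
    import java.util.Map;
    import javax.servlet.FilterChain;
    import javax.servlet.ServletException;
    import javax.servlet.ServletRequest;
    import javax.servlet.ServletResponse;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletRequestWrapper;
    import org.apache.solr.servlet.SolrDispatchFilter;

    public class ParamSwizzlingDispatchFilter extends SolrDispatchFilter {
        @Override
        public void doFilter(ServletRequest req, ServletResponse rsp, FilterChain chain)
                throws IOException, ServletException {
            HttpServletRequestWrapper wrapped =
                    new HttpServletRequestWrapper((HttpServletRequest) req) {
                @Override
                public Map<String, String[]> getParameterMap() {
                    // copy the map and hide app-private parameters from Solr
                    Map<String, String[]> params =
                            new HashMap<String, String[]>(super.getParameterMap());
                    for (Iterator<String> it = params.keySet().iterator(); it.hasNext();) {
                        if (it.next().startsWith("myapp.")) {
                            it.remove();
                        }
                    }
                    return params;
                }
                // a complete version would also override getParameter,
                // getParameterNames and getParameterValues to match
            };
            super.doFilter(wrapped, rsp, chain);
        }
    }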
On Mon, Apr 7, 2014 at 2:02 PM, Michael Sokolov <
msoko...@safaribooksonline.com> wrote:
I had to grappl
I think I would like to do something like copyfield from a bunch of
fields into a single field, but with different analysis for each source,
and I'm pretty sure that's not a thing. Is there some alternate way to
accomplish my goal?
Which is to have a suggester that suggests words from my full
Thanks
Mike
On 4/9/2014 4:16 PM, Michael Sokolov wrote:
I think I would like to do something like copyfield from a bunch of
fields into a single field, but with different analysis for each
source, and I'm pretty sure that's not a thing. Is there some
alternate way to accomplish my goal?
On Fri, Apr 11, 2014 at 8:05 AM, Michael Sokolov
wrote:
The lack of response to this question makes me think that either there is no
good answer, or maybe the question was too obtuse. So I'll give it one more
go with some more detail ...
My main goal is to implement autocompletion with a mix of
within a field based upon multiple inputs.
All the best,
Trey Grainger
Co-author, Solr in Action
Director of Engineering, Search & Analytics @ CareerBuilder
On Thu, Apr 10, 2014 at 9:05 PM, Michael Sokolov <
msoko...@safaribooksonline.com> wrote:
The lack of response to this question makes me
I just wanted to let people know about some recent Solr books and videos
that are now available at safariflow.com. You can sign up for a free
trial and get instant access, buy a subscription, or you may already be
a subscriber. I don't normally send out announcements like this, but
because we
I've had this happen to me before too; it's always a mystery. I wonder
if it has to do with specifying the "file" appender for both rootLogger
and solrj?
-Mike
On 4/12/2014 5:20 PM, Shawn Heisey wrote:
On 4/11/2014 3:21 PM, Shawn Heisey wrote:
This is lucene_solr_4_7_2_r1586229, downloaded
usage/statistics too. To know
which chapters of my book were most useful/recommended.
Regards,
Alex
On 11/04/2014 8:45 pm, "Michael Sokolov"
wrote:
I just wanted to let people know about some recent Solr books and videos
that are now available at safariflow.com. You can sign up for a
I lost the original thread; sorry for the new / repeated topic, but
thought I would follow up to let y'all know that I ended up implementing
Alex's idea to implement an UpdateRequestProcessor in order to apply
different analysis to different fields when doing something analogous to
copyFields.
mine.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency
On Tue, Apr 15, 2014 at 8:52 AM, Michael Sokolov
wrote:
I lost the original thread; sorry for the new / repeated topic, but thought
I would follow up to
experience thus sounds like either two or no blog
posts. I certainly have killed a bunch of good articles by waiting for
perfection:-)
On 15/04/2014 7:01 pm, "Michael Sokolov"
wrote:
A blog post is a great idea, Alex! I think I should wait until I have a
complete end-to-end implem
I've been working on getting AnalyzingInfixSuggester to make suggestions
using tokens drawn from multiple fields. I've done this by copying
tokens from each of those fields into a destination field, and building
suggestions using that destination field. This allows me to use
different analysis
I believe you could use term vectors to retrieve all the terms in a
document, with their offsets. Retrieving them from the inverted index
would be expensive since the index is term-oriented, not
document-oriented. Without tv, I think you essentially have to scan the
entire term dictionary looking for them.
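Here is a minimal sketch of the term-vector route (Lucene 4.x API; it assumes the field was indexed with termVectors, termPositions and termOffsets enabled, otherwise getTermVector or docsAndPositions returns null):

    import java.io.IOException;
    import org.apache.lucene.index.DocsAndPositionsEnum;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Terms;
    import org.apache.lucene.index.TermsEnum;
    import org.apache.lucene.util.BytesRef;

    public class TermVectorDumper {
        public static void dumpTerms(IndexReader reader, int docId, String field)
                throws IOException {
            Terms vector = reader.getTermVector(docId, field);
            if (vector == null) {
                return; // no term vector stored for this doc/field
            }
            TermsEnum termsEnum = vector.iterator(null);
            BytesRef term;
            while ((term = termsEnum.next()) != null) {
                DocsAndPositionsEnum postings = termsEnum.docsAndPositions(null, null);
                postings.nextDoc(); // a term vector holds exactly one document
                for (int i = 0; i < postings.freq(); i++) {
                    int position = postings.nextPosition();
                    System.out.println(term.utf8ToString() + " pos=" + position
                            + " offsets=" + postings.startOffset()
                            + "-" + postings.endOffset());
                }
            }
        }
    }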
The ordering at the lowest level in Lucene is controlled based on an
arbitrary weighting factor: I believe the only option you have at the
Solr level is to order by term value (eg alphabetically), or by term
frequency. You could do this by creating a field with all of your
"sales" - if you cre
I'm trying to understand the facet counts I'm getting back from Solr
when the main query includes a term that restricts on a field that is
being faceted. After reading the docs on the wiki (both wikis) I'm
confused.
In my little test dataset, if I facet on "type" and use q=*:*, I get
facet counts
On 4/27/2014 6:30 PM, Trey Grainger wrote:
So my question basically is: which restrictions are applied to the docset
from which (field) facets are computed?
Facets are generated based upon values found within the documents matching
your "q=" parameter and also all of your "fq=" parameters. Basi
On 4/27/14 7:02 PM, Michael Sokolov wrote:
On 4/27/2014 6:30 PM, Trey Grainger wrote:
So my question basically is: which restrictions are applied to the
docset
from which (field) facets are computed?
Facets are generated based upon values found within the documents
matching
your "q=" parameter
I've been wanting to try out the PostingsHighlighter, so I added
storeOffsetsWithPositions to my field definition, enabled the
highlighter in solrconfig.xml, reindexed and tried it out. When I
issue a query I'm getting this error:
field 'text' was indexed without offsets, cannot highlight
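For reference, storeOffsetsWithPositions="true" in the schema corresponds, at the Lucene level, to indexing the field with offsets in the postings; a sketch with the 4.x API:

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.FieldType;
    import org.apache.lucene.index.FieldInfo.IndexOptions;

    public class OffsetsFieldExample {
        public static Document makeDoc(String text) {
            FieldType ft = new FieldType();
            ft.setIndexed(true);
            ft.setTokenized(true);
            ft.setStored(true);
            // offsets in the postings are what PostingsHighlighter requires
            ft.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
            ft.freeze();
            Document doc = new Document();
            doc.add(new Field("text", text, ft));
            return doc;
        }
    }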
I checked using the analysis admin page, and I believe there are offsets
being generated (I assume start/end=offsets). So IDK I am going to try
reindexing again. Maybe I neglected to reload the config before I
indexed last time.
-Mike
On 05/02/2014 09:34 AM, Michael Sokolov wrote:
I
For posterity, in case anybody follows this thread, I tracked the
problem down to WordDelimiterFilter; apparently it creates an offset of
-1 in some cases, which PostingsHighlighter rejects.
-Mike
On 5/2/2014 10:20 AM, Michael Sokolov wrote:
I checked using the analysis admin page, and I
on lucene 4.8?
https://issues.apache.org/jira/plugins/servlet/mobile#issue/LUCENE-5111
Michael Sokolov wrote: For posterity, in case
anybody follows this thread, I tracked the
problem down to WordDelimiterFilter; apparently it creates an offset of
-1 in some cases, which PostingsHighlighter rejects.
-Mike
I'm pretty sure there's nothing to automate that task, but there are
some tools to help with indexing XML. Lux (http://luxdb.org) is one; it
can index all the element text and attribute values, effectively
creating an index for each tag name -- these are not specifically
Solr/Lucene fields, but
I don't know what the design was, but your use case seems valid to me: I
think you should submit a ticket and a patch. If you write a test, I
suppose it might be more likely to get accepted.
-Mike
On 5/6/2014 10:59 AM, Cario, Elaine wrote:
I experimented locally with modifying the SolrCore c
On 5/11/2014 12:55 PM, Olivier Austina wrote:
Hi All,
Is there a way to know if a website use Solr? Thanks.
Regards
Olivier
Ask the people who run the site?
It seems as if the location of the suggester dictionary directory is not
core-specific, so when the suggester is defined for multiple cores, they
collide: you get exceptions attempting to obtain the lock, and the
suggestions bleed from one core to the other. There is an
(undocumented) "indexP
Thanks Dmitry!
On 05/15/2014 07:54 AM, Dmitry Kan wrote:
Hi Mike,
The core name can be accessed via: ${solr.core.name} in solrconfig.xml
(verified in a solr replication config).
HTH,
Dmitry
On Fri, May 9, 2014 at 4:07 PM, Michael Sokolov <
msoko...@safaribooksonline.com> wrote:
It
Alex - the query parsers generally accept an analyzer, which they must
apply after they perform their own tokenization. Consider: how would a
capitalized query term match lower-cased terms in the index without
query analysis?
-Mike
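A small illustration of that order of operations (classic QueryParser; the field name and analyzer here are arbitrary):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryparser.classic.ParseException;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.util.Version;

    public class QueryAnalysisExample {
        public static Query parse(String input) throws ParseException {
            // the parser first splits the input using its own syntax rules
            // (AND, OR, quotes, etc.), then runs each chunk through the
            // analyzer -- which is what lower-cases a capitalized query term
            // so it can match the lower-cased terms in the index
            QueryParser parser = new QueryParser(Version.LUCENE_47, "text",
                    new StandardAnalyzer(Version.LUCENE_47));
            return parser.parse(input);
        }
    }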
On 5/17/2014 4:05 AM, Alexandre Rafalovitch wrote:
Hello,
Is it possible that all your requests are routed to that single shard?
I.e. you are not using the smart client that round-robins requests? I
think that could cause all of the merging of results to be done on a
single node.
Also - is it possible you have a "bad" document in that shard? Like o
Joe - there shouldn't really be a problem *indexing* these fields:
remember that all the terms are spread across the index, so there is
really no storage difference between one 180MB document and 180 1 MB
documents from an indexing perspective.
Making the field "stored" is more likely to lead
this shard
Best,
Erick
On Mon, Jun 2, 2014 at 4:27 AM, Michael Sokolov <
msoko...@safaribooksonline.com> wrote:
Joe - there shouldn't really be a problem *indexing* these fields:
remember that all the terms are spread across the index, so there is
really
no storage difference
It seems as if 0-hit queries should be pretty fast since they can
terminate very early? Are you seeing a big difference between
first-time and subsequent (cached) no-match queries?
-Mike
On 6/5/2014 8:47 AM, Dmitry Kan wrote:
Hi,
Solr is good at caching: even if first "cold" query takes lo
On 10/26/2013 8:31 PM, Bill Bell wrote:
Full JSON support: deep complex object indexing and search. Game changer.
Bill Bell
Sent from mobile
Not JSON (yet?) but take a look at http://luxdb.org which does XML
indexing and search. We index all the text of all the nodes in your
tree: no nee
I'm pleased to announce the release of Lux, version 0.11.2, the Dublin
edition.
There have been the usual round of bug fixes and enhancements, but the
main news with this release is the inclusion of support for SolrCloud.
You can now store and search XML documents in a distributed index using
Don't feel bad: character encoding problems are often said to be among
the hardest in software engineering.
There's no simple answer to problems like this since as Erick said, any
tool in your chain could be the culprit. I doubt anyone on this list
will be able to guess "the answer" since the
I think there is a place for a client-side query hierarchy. It would be
nice if you could build a Lucene Query and the Solr client would
serialize it for you. If there were a general-purpose query
serialization library then you could support a similar programming model
for Lucene-only and wit
Did you say what the memory profile of your machine is? How much
memory, and how large are the shards? This is just a random guess, but
it might be that if you are memory-constrained, there is a lot of
thrashing caused by paging (swapping?) in and out the sharded indexes
while a single index c
Some of our customers want to display a "number of matches" score next
to each search result. I think what they want is to list the number of
matches that will be displayed when the entire document is highlighted.
But this can be slow to do for every search result (some documents can
be very
OK -- I did find SOLR-1298
<https://issues.apache.org/jira/browse/SOLR-1298> which explains how to
request the function as a field value. Still looking for a function
that does what I asked for ...
On 11/18/2013 11:55 AM, Michael S
return new SumFloatFunction(termcounts);
}
}
On 11/18/13 2:19 PM, Michael Sokolov wrote:
OK -- I did find SOLR-1298
<https://issues.apache.org/jira/browse/SOLR-1298> which explains how to
request the function as a field value. Still looking for a function
that does what I asked for ...
On 11
List<ValueSource> termcounts = new ArrayList<ValueSource>();
for (Term t : terms) {
    if (fields.isEmpty() || fields.contains(t.field())) {
        termcounts.add(new TermFreqValueSource(t.field(), t.text(), t.field(), t.bytes()));
    }
}
return new SumFloatFunction(termcounts.toArray(new ValueSource[termcounts.size()]));
}
}
On 11/18/13 8:38 PM, Michael Sokolov wrote:
any case.
-- Jack Krupansky
-----Original Message----- From: Michael Sokolov
Sent: Thursday, November 21, 2013 8:56 AM
To: solr-user@lucene.apache.org
Subject: Re: How to index X™ as &#8482; (HTML decimal entity)
I have to agree w/Walter. Use unicode as a storage format. The entity
encodings are for transfer/interchange.
I have to agree w/Walter. Use unicode as a storage format. The entity
encodings are for transfer/interchange. Encode/decode on the way in and
out if you have to. Would you store "a" as "&#97;"? It makes it
impossible to search for, for one thing. What if someone wants to
search for the TM character
I just posted a writeup of the Lucene/Solr Revolution Dublin
conference. I've been waiting for videos to become available, but I got
impatient. Slides are there, mostly though. Sorry if I missed your
talk -- I'm hoping to catch up when the videos are posted...
http://blog.safariflow.com/201
On 12/03/2013 01:55 AM, Dmitry Kan wrote:
Hello!
We have been experimenting with post filtering lately. Our setup is a
filter having long boolean query; drawing the example from the Dublin's
Stump the Chump:
fq=UserId:(user1 OR user2 OR...OR user1000)
The underlying issue impacting performance
On 12/9/2013 11:13 PM, neerajp wrote:
Hi,
Pls. find my response in-line:
That said, the obvious alternative is to use /update/extract instead of
/update – this gives you a way of handling up to one binary stream in
addition to any number of fields that can be represented as text. In that
case, y
Have you considered using a custom UpdateProcessor to catch the
exception and provide more context in the logs?
-Mike
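Something along these lines, as a sketch (the corresponding factory and update-chain configuration are omitted; getPrintableId supplies the document's uniqueKey for the log message):

    import java.io.IOException;
    import org.apache.solr.update.AddUpdateCommand;
    import org.apache.solr.update.processor.UpdateRequestProcessor;

    public class DiagnosticUpdateProcessor extends UpdateRequestProcessor {
        public DiagnosticUpdateProcessor(UpdateRequestProcessor next) {
            super(next);
        }

        @Override
        public void processAdd(AddUpdateCommand cmd) throws IOException {
            try {
                super.processAdd(cmd);
            } catch (Exception e) {
                // rethrow with the offending document's id attached
                throw new IOException("failed indexing doc "
                        + cmd.getPrintableId(), e);
            }
        }
    }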
On 01/03/2014 03:33 PM, Benson Margulies wrote:
Robert,
Yes, if the problem was not data-dependent, indeed I wouldn't need to
index anything. However, I've run a small mountain
t up OK, I think you get the insert
messages at INFO level?
-Mike
On 1/4/2014 9:24 PM, Benson Margulies wrote:
I rather assumed that there was some log4j-ish config to be set that
would do this for me. Lacking one, I guess I'll end up there.
On Fri, Jan 3, 2014 at 8:23 PM, Michael Sokolov wrote:
I think the key optimization when there are no deletions is that you
don't need to renumber documents and can bulk-copy blocks of contiguous
documents, and that is independent of merge policy. I think :)
-Mike
On 01/06/2014 01:54 PM, Shawn Heisey wrote:
On 1/6/2014 11:24 AM, Otis Gospodnetic
On 01/28/2014 11:55 AM, Alexandre Rafalovitch wrote:
As to ESS, like I mentioned, the classpath issue seem to be quite a
challenge. Again, perhaps not something that shows up during the
testing because the directory layout during testing is rather
different from the end-user's layout.
I'm not s
,
Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working. (Anonymous - via GTD
book)
On Wed, Jan 29, 2014 at 5:35 AM,
Maybe he can use updateable docvalues (LUCENE-5189)? I heard that was a
thing. Has it made its way into Solr in some way?
-Mike
On 2/19/2014 4:23 AM, Mikhail Khludnev wrote:
Just a side note. Sidecar index might be really useful for updating blocked
docs, but it's in the experimental stage, iirc
On 3/1/2014 6:53 PM, Jack Krupansky wrote:
NoSQL? To me it's just a marketing term, like Big Data.
Data store? That does imply support for persistence, as opposed to
mere caching, but mere persistence doesn't assure that the store is
suitable for use as a System of Record which is a requiremen
On 3/3/2014 1:54 AM, KNitin wrote:
3. 2.8 Gb - Perm Gen (I am guessing this is because of interned strings)
As others have pointed out, this is really unusual for Solr. We often
see high permgen in our app servers due to dynamic class loading that
the framework performs; maybe you are somehow
Does that mean newer clients work with older servers (I think so, from
reading this thread), or the other way round? If so, I guess the advice
would be -- upgrade all your clients first?
-Mike
On 03/04/2014 10:00 AM, Mark Miller wrote:
Yeah, sorry :( the fix applied is only for compatibil
(old client -> new server or new
client -> old server), but we as a community need to pick one and build out the
test suites that ensure SolrJ compatibility with different versions.
Timothy Potter
Sr. Software Engineer, LucidWorks
www.lucidworks.com
________
F
On 3/5/2014 1:36 AM, Shawn Heisey wrote:
On 3/4/2014 8:15 PM, Michael Sokolov wrote:
Thanks, Tim, it's great to hear you say that! I tried to make that
point myself with various patches, but they never really got taken up by
committers, so I kind of gave up, but I agree with you 100% this
I've been looking at ExternalFileField to handle popularity boosting,
since Solr updatable docvalues (SOLR-5944) isn't quite there yet. My
question is whether there is any support for uploading the external file
via Solr, or if people do that some other (external, I guess) way?
-Mike
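For what it's worth, the file itself is just plain text in the index data directory, named external_<fieldName> (optionally with a suffix; if several files match, Solr uses the last one in sorted order), with one key=value line per document, where the key is the document's key field value and the value is a float:

    doc-101=1.5
    doc-102=3.25

Solr only ever reads it from local disk, so pushing a new copy to each node is an external concern; ExternalFileFieldReloader can then reload it when a new searcher is opened.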
Thanks for the links, Ramzi. I had already read the wiki page, which
merely talks about how to reload the file into memory once it has been
updated on disk. It doesn't mention any support for uploading that I can
see. Did I miss it?
-Mike
On 10/23/14 1:36 PM, Ramzi Alqrainy wrote:
Of course
That's what I thought; thanks, Markus.
On 10/23/14 2:19 PM, Markus Jelsma wrote:
You either need to upload them and issue the reload command, or download them
from the machine, and then issue the reload command. There is no REST support
for it (yet) like the synonym filter, or was it stop filt
3.16e-11.0 looks fishy to me
On 10/23/14 5:09 PM, eShard wrote:
Good evening,
I'm using solr 4.0 Final.
I tried using this function
boost=recip(ms(NOW/HOUR,startdatez,3.16e-11.0,0.08,0.05))
but it fails with this error:
org.apache.lucene.queryparser.classic.ParseException: Expected ')' at
position
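To elaborate on "fishy": 3.16e-11.0 is not a valid float literal, and the closing parenthesis of ms() is misplaced, so recip() never gets its three constants. The intended boost was presumably something like

    boost=recip(ms(NOW/HOUR,startdatez),3.16e-11,0.08,0.05)

where 3.16e-11 is roughly 1/(number of milliseconds in a year).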
This project (https://github.com/safarijv/ifpress-solr-plugin/) has some
examples of custom Solr UpdateRequestProcessors that feed a single
suggester from multiple fields, applying different weights to them,
using complete values from some and analyzing others into tokens.
The first thing I di
really offer a solution to your
problem, but there are some possibly helpful similarities: you will
probably want to write a custom UpdateRequestProcessor, and you will
want to feed the suggester with a custom Dictionary / InputIterator as I
have done in that example.
-Mike
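A stripped-down sketch of that idea (the real version, with per-source weighting and analysis, is in the ifpress-solr-plugin repository linked above; the field names here are hypothetical):

    import java.io.IOException;
    import org.apache.solr.common.SolrInputDocument;
    import org.apache.solr.update.AddUpdateCommand;
    import org.apache.solr.update.processor.UpdateRequestProcessor;

    public class MergeSuggestFieldsProcessor extends UpdateRequestProcessor {
        // hypothetical source fields to merge into the suggest field
        private static final String[] SOURCES = { "title", "author", "body" };

        public MergeSuggestFieldsProcessor(UpdateRequestProcessor next) {
            super(next);
        }

        @Override
        public void processAdd(AddUpdateCommand cmd) throws IOException {
            SolrInputDocument doc = cmd.getSolrInputDocument();
            for (String source : SOURCES) {
                if (doc.getFieldValues(source) == null) {
                    continue;
                }
                for (Object value : doc.getFieldValues(source)) {
                    // the suggest field's own analyzer decides how each value
                    // is tokenized; per-source weighting would be applied here
                    doc.addField("suggest", value);
                }
            }
            super.processAdd(cmd);
        }
    }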
-Clemens
-U
I noticed that when you include a function as a result field, the
corresponding key in the result markup includes trailing whitespace,
which seems like a bug. I wonder if anyone knows if there is a ticket
for this already?
Example:
fl="id field(units_used) archive_id"
ends up returning resu
OK, I opened SOLR-6672; not sure how I stumbled into using white space;
I would ordinarily use commas too, I think.
-Mike
On 10/29/14 1:23 PM, Chris Hostetter wrote:
: fl="id field(units_used) archive_id"
I didn't even realize until today that fl was documented to support space
separated field
Just to get the obvious sledgehammer solution out of the way - upload a
new, edited solrconfig.xml with the default changed, and reload the core.
-Mike
On 11/3/14 6:28 AM, Dmitry Kan wrote:
Hello solr fellows,
I'm working on a project that involves using two update chains. One default
chain
Shawn this is really weird -- we run log4j in lots of installations and
have never seen an issue like this.
I wonder if you might be running some other log rotation software (like
logrotate) that is somehow getting in the way or conflicting?
-Mike
On 11/01/2014 01:45 PM, Shawn Heisey wrote:
You didn't describe your analysis chain, but maybe you are using
WordDelimiterFilter to break up hyphenated words? If so, it has a
protwords.txt feature that lets you specify exceptions
-Mike
On 11/5/2014 5:36 PM, Michael Della Bitta wrote:
Pretty sure what you need is called KeywordMarkerFilter
The goal is to ensure that suggestions from autocomplete are actually
terms in the main index, so that the suggestions will actually result in
matches. You've considered expanding the main index by adding the
suggestion n-grams to it, but it would probably be better to alter your
suggester so
The usual approach is to use copyField to copy multiple fields to a
single field.
I posted a solution using an UpdateRequestProcessor to merge fields, but
with different analyzers, here:
https://blog.safaribooksonline.com/2014/04/15/search-suggestions-with-solr-2/
My latest approach is this:
We routinely store images and pdfs in Solr. There *is* a benefit, since
you don't need to manage another storage system, you don't have to worry
about Solr getting out of sync with the other system, you can use Solr
replication for all your assets, etc.
I don't use DIH, so personally I don't c
I believe the spellchecker component persists these indexes now and
reloads them on restart rather than rebuilding.
-Mike
On 11/13/14 7:40 PM, Walter Underwood wrote:
We have to manually rebuild the suggest dictionaries after a restart. This
seems odd, since someone else had a problem because
On 11/14/14 2:01 AM, Walter Underwood wrote:
We get no suggestions until we force a build with suggest.build=true. Maybe we
need to define a spellchecker component to get that behavior?
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/
On Nov 13, 2014, at 10:56 PM, Michael Sokolov wrote:
can use filter query like "fq=terms:a:1"
On Nov 13, 2014, at 3:59 AM, "Michael Sokolov" wrote:
We routinely store images and pdfs in Solr. There *is* a benefit, since
you don't need to manage another storage system, you don't have to worry
about Solr getting out of sync with
On 11/14/2014 01:43 PM, Erick Erickson wrote:
Just skimming, so maybe I misinterpreted.
ExternalFileField and ExternalFileFieldReloader
refer to storing values for each doc in an external file, they have
nothing to do with storing _files_.
The usual pattern is to have Solr store just enough da
Mike
On 11/14/14 2:01 AM, Walter Underwood wrote:
We get no suggestions until we force a build with suggest.build=true. Maybe we
need to define a spellchecker component to get that behavior?
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/
On Nov 13, 2014, at
generating multiple "stems" causes issues
On 11/18/2014 02:33 PM, Michael Sokolov wrote:
I find that a query for stemmed terms sometimes fails with the edismax
query parser and hunspell stemmer. Looking at the output of analysis
for the query (text:following) I can see that it generates two
I find that a query for stemmed terms sometimes fails with the edismax
query parser and hunspell stemmer. Looking at the output of analysis for
the query (text:following) I can see that it generates two different
terms at the same position: "follow" and "following". Then edismax seems
to genera
OK - please disregard; I found a rogue new component in our analyzer
that was messing everything up.
The hunspell behavior was perhaps a little confusing, but I don't
believe it leads to broken queries.
-Mike
On 11/18/2014 02:38 PM, Michael Sokolov wrote:
followup - hunspell has:
f
If you're willing to write some Java you can do something more efficient
by intersecting two terms enumerations: this works with constant memory
for any number of values in two fields, basically like intersecting any
two sorted lists, you leap frog between them. I have an example if
you're interested.
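Roughly, it looks like this (a sketch against the Lucene 4.x API; null checks for a field with no terms are elided):

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.MultiFields;
    import org.apache.lucene.index.TermsEnum;
    import org.apache.lucene.util.BytesRef;

    public class TermIntersector {
        public static List<BytesRef> intersect(IndexReader reader, String f1, String f2)
                throws IOException {
            List<BytesRef> common = new ArrayList<BytesRef>();
            TermsEnum e1 = MultiFields.getTerms(reader, f1).iterator(null);
            TermsEnum e2 = MultiFields.getTerms(reader, f2).iterator(null);
            BytesRef t1 = e1.next();
            BytesRef t2 = e2.next();
            while (t1 != null && t2 != null) {
                int cmp = t1.compareTo(t2);
                if (cmp == 0) {
                    // term present in both fields
                    common.add(BytesRef.deepCopyOf(t1));
                    t1 = e1.next();
                    t2 = e2.next();
                } else if (cmp < 0) {
                    // e1 is behind: leap forward to e2's current term
                    t1 = e1.seekCeil(t2) == TermsEnum.SeekStatus.END ? null : e1.term();
                } else {
                    // e2 is behind: leap forward to e1's current term
                    t2 = e2.seekCeil(t1) == TermsEnum.SeekStatus.END ? null : e2.term();
                }
            }
            return common;
        }
    }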
Those SPI classes rely on a configuration file that gets stored in the
META-INF folder. I'm not familiar with how OSGi works, but I'm pretty
sure that failure is because the file
META-INF/services/org.apache.lucene.codecs.Codec (you'll see it in the
lucene-core jar) can't be found
-Mike
On
maybe try
description_shingle:(Highest quality)
On 11/24/14 1:46 PM, vit wrote:
I have Solr 4.2.1
I am using the following analyser:
The index size will not increase as quickly as you might think, and is
not an issue in most cases. An alternative to two fields, though, is to
index both upper- and lower-case tokens at the same position in a single
field, and then to perform no case folding at query time. There is no
standard
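A sketch of a filter that does this (illustrative, not a stock Lucene class; a production version would capture/restore the full attribute state rather than only touching the term and position-increment attributes):

    import java.io.IOException;
    import java.util.Locale;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

    public final class DualCaseFilter extends TokenFilter {
        private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
        private final PositionIncrementAttribute posIncrAtt =
                addAttribute(PositionIncrementAttribute.class);
        private String pendingLowerCase; // folded copy waiting to be emitted

        public DualCaseFilter(TokenStream input) {
            super(input);
        }

        @Override
        public boolean incrementToken() throws IOException {
            if (pendingLowerCase != null) {
                termAtt.setEmpty().append(pendingLowerCase);
                posIncrAtt.setPositionIncrement(0); // same position as original
                pendingLowerCase = null;
                return true;
            }
            if (!input.incrementToken()) {
                return false;
            }
            String original = termAtt.toString();
            String lower = original.toLowerCase(Locale.ROOT);
            if (!lower.equals(original)) {
                pendingLowerCase = lower; // queue the folded variant
            }
            return true;
        }

        @Override
        public void reset() throws IOException {
            super.reset();
            pendingLowerCase = null;
        }
    }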
right -- missed Ahmet's answer there in my haste to respond ...
-Mike
On 11/25/14 6:56 AM, Ahmet Arslan wrote:
Hi Apurv,
I wouldn't worry about index size, increase in index size is not linear (2x)
like that.
Please see similar discussion :
https://issues.apache.org/jira/browse/LUCENE-5620
A
Scores are related to total term frequencies *in each shard*, not
globally, and I think they may include term counts from deleted
documents as well, which could account for the discrepancy in scores
across the two shards.
-Mike
On 11/25/14 3:22 AM, rashi gandhi wrote:
Hi,
I have created t
Yes - here's a working example we have in production (tested in 4.8.1
and 4.10.2, but the underlying lucene stuff hasn't changed since 4.6.1
I'm pretty sure):
https://github.com/safarijv/ifpress-solr-plugin/blob/master/src/main/java/com/ifactory/press/db/solr/processor/UpdateDocValuesProcessor.