> I'm surprised, as you are, by the non-linearity. Out of curiosity, what is
Unless the data in "stored" fields is significantly greater than "indexed"
fields, the index size almost never grows linearly with the number of
documents -- it's the number of unique terms that tends to primarily
in
Since this looks like more of a lucene issue, I've replied in
[EMAIL PROTECTED]
-Yonik
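
A rough sketch of the point above, in Python rather than anything
Lucene-specific: it counts how many distinct terms have been seen after each
document is added. The function name and the toy documents are made up for
illustration, and real index size also depends on postings, stored fields and
compression, so treat this only as a picture of why the term dictionary tends
not to grow linearly with the document count.

```python
def unique_term_growth(docs):
    """Return the cumulative number of distinct terms after each document."""
    seen = set()
    growth = []
    for text in docs:
        # Naive whitespace tokenization; a real analyzer does much more.
        seen.update(text.lower().split())
        growth.append(len(seen))
    return growth


# Hypothetical documents that reuse most of their vocabulary.
docs = [
    "solr index size",
    "index size grows",
    "solr index grows",
    "size of solr index",
]
print(unique_term_growth(docs))
```

Each new document adds fewer new terms than the one before it, because most
of its words are already in the dictionary.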
On Thu, Aug 14, 2008 at 10:18 PM, Ian Connor <[EMAIL PROTECTED]> wrote:
> I seem to be able to reproduce this very easily and the data is
> medline (so I am sure I can share it if needed with a quick email to
>
I seem to be able to reproduce this very easily and the data is
medline (so I am sure I can share it if needed with a quick email to
check).
- I am using fedora:
%uname -a
Linux ghetto5.projectlounge.com 2.6.23.1-42.fc8 #1 SMP Tue Oct 30
13:18:33 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
%java -vers
Rob,
Actually I am copying *_facet to text. I have the following for
copyField in my schema:
This is my field configuration in my schema:
Thanks,
- Jake
On Thu, Aug 14, 2008 at 5:49 PM, Rob Casson <[EMAI
you're likely not copyField-ing *_facet to text, and we'd need to see
what type of field it is to see how it will be analyzed at both
search/index time.
the default schema.xml file is pretty well documented, so you might
want to spend some time looking thru it, and reading the
comments...lots of
On Thu, 14 Aug 2008 11:34:47 -0400
"Steven A Rowe" <[EMAIL PROTECTED]> wrote:
[...]
> The kind of filter Walter is talking about - a generalized language-aware
> character normalization Solr/Lucene filter - does not yet exist. My guess is
> that if/when it does materialize, both the Solr and th
On Thu, Aug 14, 2008 at 6:31 PM, Chris Harris <[EMAIL PROTECTED]> wrote:
> (The only time a
> segment will be modified is if you delete files from it, and that will
> only alter the segment's .del file, leaving .tis and friends alone.)
Actually, these days .del files are even versioned.
> I don't
Hi Shalin,
"foobar_facet" is a dynamic field. It's defined in my schema like this:
I have the default search field set to text. Can I use more than one
default search field?
text
Thanks,
- Jake
On Thu, Aug 14, 2008 at 2:48 PM, Shalin Shekhar Mangar
<[EMAIL PROTECTED]> wrote:
> Hi Jake,
>
> W
The main thing that bugs me about this index now is that the latest
version of Luke (0.8.1) won't open it. ("Unknown format version: -6")
The Solr Luke handler works fine with it, though.
Luke comes with a released version of Lucene probably, while solr is
using a later version. You have to
On Thu, Aug 14, 2008 at 2:01 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
> Chris Harris <[EMAIL PROTECTED]> wrote:
>> It's my understanding that if my mergeFactor is 10, then there
>> shouldn't be more than 11 segments in my index directory (10 segments,
>> plus an additional segment if a mer
A question mark huh? You sure there are no character encoding issues
going on?
Otis Gospodnetic wrote:
Paul, we had many highlighter-related changes since 1.2, so I suggest you try
the nightly.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
Hi Jake,
What is the type of the foobar_facet field in your schema.xml ?
Did you add foobar_facet as the default search field?
On Fri, Aug 15, 2008 at 3:13 AM, Jake Conk <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I inserted the following documents into Solr:
>
>
> --
Hello,
I inserted the following documents into Solr:
---
124
Jake Conk
125
Jake Conk
---
id is the only requ
The DocSet isn't part of the cache key. The key is usually just a simple string
(e.g. companyId). They just return a DocSet. I think the user caches are fine.
This DocSet is then used as a filter for the actual query. I believe it is this
step that is slow.
However, I am guessing that the solut
Paul, we had many highlighter-related changes since 1.2, so I suggest you try
the nightly.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
> From: pdovyda2 <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Thursday, August 14, 2008 2:56
Chris Harris <[EMAIL PROTECTED]> wrote:
> It's my understanding that if my mergeFactor is 10, then there
> shouldn't be more than 11 segments in my index directory (10 segments,
> plus an additional segment if a merge is in progress).
Actually, mergeFactor 10 means each *level* will have <= 10 seg
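
To picture the per-level rule, here is a simplified model in Python. It is an
assumption-laden sketch, not Lucene's actual merge policy (which assigns
levels by segment size): every flush adds one segment at level 0, and
whenever a level reaches merge_factor segments they are merged into a single
segment on the next level. The function name is hypothetical.

```python
def segments_per_level(flushes, merge_factor=10):
    """Simulate flushes and return the segment count at each level."""
    levels = []  # levels[i] = number of segments currently at level i
    for _ in range(flushes):
        if not levels:
            levels.append(0)
        levels[0] += 1  # a flush creates one new level-0 segment
        level = 0
        # Cascade merges upward while a level is full.
        while levels[level] >= merge_factor:
            levels[level] -= merge_factor
            if level + 1 == len(levels):
                levels.append(0)
            levels[level + 1] += 1
            level += 1
    return levels


counts = segments_per_level(999)
print(counts, sum(counts))
```

In this model, 999 flushes with a merge factor of 10 leave 9 segments on each
of three levels, 27 in total, so the index directory can hold well over 11
segments without any single level going past the limit.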
On Thu, Aug 14, 2008 at 3:15 PM, Kevin Osborn <[EMAIL PROTECTED]> wrote:
> The problem here is that the calls in SolrIndexSearcher don't appear to use
> the QueryResultsCache if the filter is a DocSet rather than a List.
Right... using a DocSet as part of the cache key would be pretty slow
(key co
Yikes... not good. This shouldn't be due to anything you did wrong
Ian... it looks like a lucene bug.
Some questions:
- what platform are you running on, and what JVM?
- are you using multicore? (I fixed some index locking bugs recently)
- are there any exceptions in the log before this?
- how re
In the effort to clean up confusion around MultiCore usage, we have
renamed the classes that handle runtime core administration from
"MultiCoreX" to CoreAdminX. Additionally, the path that the default
MultiCoreRequest expects to hit is /admin/cores rather than
/admin/multicore -- if you hav
We have a bunch of user caches that return DocSet objects. So, we intersect
them and send a DocSet filter and the actual query to getDocListAndSet or
getDocList. The problem here is that the calls in SolrIndexSearcher don't
appear to use the QueryResultsCache if the filter is a DocSet rather than
Assuming you mean significant in the traditional IR sense, I would
start with the MoreLikeThis. See http://wiki.apache.org/solr/MoreLikeThisHandler
In particular the mlt.interestingTerms option.
As for phrases, that is a bit harder. You could try playing around
with token-based n-grams (
Right before I sent the message. Did a 'svn up src/;ant clean;ant
dist' and it failed. Seems to work fine now.
On Aug 14, 2008, at 2:38 PM, Ryan McKinley wrote:
have you updated recently?
isEnabled() was removed last night...
On Aug 14, 2008, at 2:30 PM, Doug Steigerwald wrote:
I'd try
This is kind of a strange issue, but when I submit a query and ask for
highlighting back, sometimes the highlighted text includes a question mark
at the beginning, although a question mark character does not appear in the
field that the highlighted text is taken from.
I've put some sample XML out
Hi,
I have rebuilt my index a few times (it should get up to about 4
Million but around 1 Million it starts to fall apart).
Exception in thread "Lucene Merge Thread #0"
org.apache.lucene.index.MergePolicy$MergeException:
java.lang.IndexOutOfBoundsException: Index: 105, Size: 33
at
org.ap
Humm, I am new to the world of search
I am looking for something that will give me a list of significant words or
phrases extracted from a document stored in solr.
Jack
On Fri, Aug 8, 2008 at 9:33 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
> See https://issues.apache.org/jira/browse/SOLR-65
have you updated recently?
isEnabled() was removed last night...
On Aug 14, 2008, at 2:30 PM, Doug Steigerwald wrote:
I'd try, but the build is failing from (guessing) Ryan's last commit:
compile:
[mkdir] Created dir: /Users/dsteiger/Desktop/java/solr/build/core
[javac] Compiling 337 s
It's my understanding that if my mergeFactor is 10, then there
shouldn't be more than 11 segments in my index directory (10 segments,
plus an additional segment if a merge is in progress). It would seem
to follow that there shouldn't be more than 11 fdt files, 11 tis
files, etc.. However, I'm looki
I'd try, but the build is failing from (guessing) Ryan's last commit:
compile:
[mkdir] Created dir: /Users/dsteiger/Desktop/java/solr/build/core
[javac] Compiling 337 source files to /Users/dsteiger/Desktop/
java/solr/build/core
[javac] /Users/dsteiger/Desktop/java/solr/client/java/s
I believe I just fixed this on SOLR-606 (thanks to Stefan's patch).
Give it a try and let us know.
-Grant
On Aug 13, 2008, at 2:25 PM, Doug Steigerwald wrote:
I've noticed a few things with the new spellcheck component that
seem a little strange.
Here's my document:
5
wii blackberry
I have 2 fields which will sometimes contain the same data. When they do
contain the same data, am I paying the same performance cost as when they
contain unique data? I think the real question here is: does Lucene index
values per field, or per document?
--
View this message in context:
http://
Thank you for your suggestion, I really don't see anything 'wrong'
with the longer lists.. I entered https://issues.apache.org/jira/browse/SOLR-702
for this issue, and attached relevant files. If you need anything
more, don't hesitate to contact me!
Thanks for your time!
Matthew Runo
Softw
There should be no limit, so you may have uncovered a bug. Could you
open a JIRA issue? If it's a real bug, it should get fixed before
1.3.
-Yonik
On Thu, Aug 14, 2008 at 12:35 PM, Matthew Runo <[EMAIL PROTECTED]> wrote:
> Hello folks!
>
> Having a heck of a time trying to get a synonyms file t
Hello folks!
Having a heck of a time trying to get a synonyms file to work
properly. It seems that something's wrong with the way it's been set
up, but, honestly, I can't see anything wrong with it. Some samples...
This works...
zutanoapparel => zutano
But this does not...
aadias, aadidas,
Erick Erickson wrote:
I'm surprised, as you are, by the non-linearity. Out of curiosity, what is
your MaxFieldLength? By default only the first 10,000 tokens are added
to a field per document. If you haven't set this higher, that could account
for it.
We set it to a very large number so we in
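
For readers unfamiliar with the setting, the effect Erick describes can be
pictured like this. It is only a toy model in Python, assuming the limit
simply drops every token after the first N in a field; the function name is
invented for the example and this is not the Lucene API.

```python
def tokens_indexed(tokens, max_field_length=10_000):
    """Keep only the first max_field_length tokens of a field."""
    return tokens[:max_field_length]


# Hypothetical field with 25,000 tokens.
field_tokens = ["t%d" % i for i in range(25_000)]
kept = tokens_indexed(field_tokens)
print(len(field_tokens), len(kept))
```

Under this assumption, everything past token 10,000 never reaches the index,
which would make large documents contribute less than their size suggests.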
Hi Shalin,
As there is certainly the potential for several thousand different
attribute types across all of our categories I guess I will have to
manage them myself (was hoping for a short-cut or that I was missing a
trick) but no problem. Solr still seems to outperform the commercial
package we a
Hi Norberto,
On 08/14/2008 at 8:10 AM, Norberto Meijome wrote:
> > On 8/13/08 9:16 AM, "Steven A Rowe" <[EMAIL PROTECTED]> wrote:
> >
> > > Hi Norberto,
> > >
> > > https://issues.apache.org/jira/browse/LUCENE-1343
>
> hi Steve,
> thanks for the pointer. this is a Lucene entry... I thought the
Hi Barry,
If each category has an exclusive set of fields on which you want to facet,
then you can simply facet on all facet-able fields (across all
categories). The ones which are not present for the selected category will
show up with zero facets which your front-end can suppress. However if
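
One way the front-end suppression mentioned above might look, sketched in
Python. The shape of facet_counts (field name mapped to value/count pairs)
and the field names are assumptions for the example, not the exact Solr
response format.

```python
def visible_facets(facet_counts):
    """Drop zero-count values, then drop fields that have nothing left."""
    visible = {}
    for field, counts in facet_counts.items():
        nonzero = {value: n for value, n in counts.items() if n > 0}
        if nonzero:
            visible[field] = nonzero
    return visible


# Hypothetical counts for a category where only "voltage" applies.
facet_counts = {
    "voltage": {"12V": 4, "24V": 0},
    "screen_size": {"15in": 0, "17in": 0},
}
print(visible_facets(facet_counts))
```

Fields that belong to other categories come back with all-zero counts and
disappear from the display without any per-category configuration.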
Hi Yonik & Erik,
Thanks to both of you. It seems like our container had some issues and
was causing this problem.
Thanks,
Raghu
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik
Seeley
Sent: Wednesday, August 13, 2008 10:57 AM
To: solr-user@lucene
On Wed, Aug 13, 2008 at 1:52 PM, Jon Drukman <[EMAIL PROTECTED]> wrote:
> Duh. I should have thought of that. I'm a big fan of djbdns so I'm quite
> familiar with daemontools.
>
> Thanks!
>
:) My pleasure. Was nice to hear recently that DJB is moving toward more
flexible licensing terms. For
Hi,
I have Solr set up to index technical data for a number of different
types of products, and this means that different products have different
facet fields available.
For example, here is a small sample of the sort of data we are
indexing, in reality there are between 10 and 20 facet fields
( 2 in 1 reply)
On Wed, 13 Aug 2008 09:59:21 -0700
Walter Underwood <[EMAIL PROTECTED]> wrote:
> Stripping accents doesn't quite work. The correct translation
> is language-dependent. In German, o-dieresis should turn into
> "oe", but in English, it should be "o" (as in "coöperate" or
> "Mötle
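
A small Python sketch of the language-dependent folding Walter describes.
The table, the language codes and the function name are assumptions made for
illustration; it covers only o-dieresis and is not an existing Solr or Lucene
filter.

```python
# Per-language replacements for o-dieresis (U+00F6 / U+00D6).
FOLDING = {
    "de": {"\u00f6": "oe", "\u00d6": "Oe"},
    "en": {"\u00f6": "o", "\u00d6": "O"},
}


def fold(text, lang):
    """Replace o-dieresis according to the rules for the given language."""
    table = str.maketrans(FOLDING[lang])
    return text.translate(table)


print(fold("co\u00f6perate", "en"))
print(fold("K\u00f6ln", "de"))
```

The same character maps to different output depending on the language, which
is why a single accent-stripping filter cannot be correct for both.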
On Thu, 14 Aug 2008 12:21:13 +0530
"Shalin Shekhar Mangar" <[EMAIL PROTECTED]> wrote:
> The SpellCheckerRequestHandler is now deprecated with Solr 1.3 and it has
> been replaced by SpellCheckComponent.
>
> http://wiki.apache.org/solr/SpellCheckComponent
which works quite well with dismax.
B
42 matches