Hi All,
I am working on a module using Solr, where I want to get the stats of
each keyword found in each field.
If my search term is: (title:("web2.0" OR "ajax") OR
description:("web2.0" OR "ajax"))
Then I want to know how many times web2.0/ajax were found in title or
description.
Any suggestions?
Ok this might be a simple one, or more likely, my understanding of solr is
shot to bits
We have a catalogue of documents that we have a solr index on. We need to
provide an alphabetical search, so that a user can list all documents with a
title beginning A, B and so on...
So how do we do this?
On Jul 22, 2008, at 5:08 AM, Adrian M Bell wrote:
We have a catalogue of documents that we have a solr index on. We need to
provide an alphabetical search, so that a user can list all documents with a
title beginning A, B and so on...
So how do we do this?
Currently we have built up the f
hi,
try using faceted search,
http://wiki.apache.org/solr/SimpleFacetParameters
something like facet=true&facet.query=title:("web2.0" OR "ajax")
facet.query - gives the number of matching documents for a query.
You can run the examples in the above link and see how it works.
You can also try u
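To get per-keyword counts rather than one combined count, the suggestion above can be split into one facet.query per term. A hedged sketch, assuming a local Solr instance at localhost:8983 and the title/description fields from the original question:

```
http://localhost:8983/solr/select?q=title:("web2.0" OR "ajax") OR description:("web2.0" OR "ajax")
  &facet=true
  &facet.query=title:"web2.0"
  &facet.query=title:"ajax"
  &facet.query=description:"web2.0"
  &facet.query=description:"ajax"
```

Note that each facet.query returns the number of matching *documents* within the result set, not the total number of term occurrences.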
Hi,
I have a category field in my index which I'd like to use as a facet.
However my search frontend only allows you to search in one category at a
time for which I'm using a filter query. Unfortunately the filter query
restricts the facets as well.
My query looks like this:
?q=content:foo&fq=cat
28 votes so far and counting!
When should we close this poll?
On Tue, Jul 22, 2008 at 1:18 AM, Mark Miller <[EMAIL PROTECTED]> wrote:
> Perfect! Thank you Shalin. Much appreciated, and a dead simple system. My
> vote is in.
>
> - Mark
>
>
> Shalin Shekhar Mangar wrote:
>
>> Will this do? A 1-5 f
My opinion: if it's already a runaway, we might as well not prolong
things. If not though, we should probably give some time for any
possible laggards. The 'admin look' poll received its first 19-20 votes
in the first night / morning, and has only gotten 2 or 3 since then, so
probably no use goi
This is *exactly* my issue ... very nicely worded :-)
I would have thought facet.query=*:* would have been the solution but
it does not seem to work. I'm interested in getting these *total*
counts for UI display.
- Jon
On Jul 22, 2008, at 6:05 AM, Stefan Oestreicher wrote:
Hi,
I have a
All facet counts currently returned are _within_ the set of documents
constrained by the query (q) and filter query (fq) parameters - just to
clarify what it does. Why? That's the general use case. Returning
counts from differently constrained sets requires some custom
coding - perhaps
It seems that the spellchecker works great, except all the "7 words you
can't say on TV" resolve to very important people. Is there a way to
exclude certain words so they don't resolve?
Thanks.
- Jon
Can someone point me to an in-depth explanation of the Solr cache
statistics? I'm having a hard time finding it online. Specifically, I'm
interested in these fields that are listed on the Solr admin statistics
pages in the cache section:
lookups
hits
hitratio
inserts
evictions
size
cumulative_
Shalin Shekhar Mangar wrote:
The problems you described in the spellchecker are noted in
https://issues.apache.org/jira/browse/SOLR-622 -- I shall create an issue to
synchronize spellcheck.build so that the index is not corrupted.
I'd like to discuss this a little...
I'm not sure that I want
How can I require an exact field match in a query. For instance, if a
title field contains "Nature" or "Nature Cell Biology", when I search
title:Nature I only want "Nature" and not "Nature Cell Biology". Is
that something I do as a query or do I need to re index it with the
field defined in a cert
On Tue, Jul 22, 2008 at 11:07 AM, Geoffrey Young
<[EMAIL PROTECTED]> wrote:
> Shalin Shekhar Mangar wrote:
>>
>> The problems you described in the spellchecker are noted in
>> https://issues.apache.org/jira/browse/SOLR-622 -- I shall create an issue
>> to
>> synchronize spellcheck.build so that the
On Tue, Jul 22, 2008 at 8:37 PM, Geoffrey Young <[EMAIL PROTECTED]>
wrote:
>
>
> Shalin Shekhar Mangar wrote:
>
>> The problems you described in the spellchecker are noted in
>> https://issues.apache.org/jira/browse/SOLR-622 -- I shall create an issue
>> to
>> synchronize spellcheck.build so that
On Tue, Jul 22, 2008 at 11:08 AM, Ian Connor <[EMAIL PROTECTED]> wrote:
> How can I require an exact field match in a query. For instance, if a
> title field contains "Nature" or "Nature Cell Biology", when I search
> title:Nature I only want "Nature" and not "Nature Cell Biology". Is
> that someth
Hi Guys,
This is my first post. We are running Solr with multiple indexes, 20
indexes. I'm facing a problem with the 5th one. I'm not able to run optimize on
that index. I'm getting the following error. Your help is really appreciated.
java.io.IOException: read past EOF
at
org.apache.lucene.store.B
Hi, we've looked for info about this issue online and in the code and are
none the wiser - help would be much appreciated.
We are indexing the full text of journals using Solr. We currently pass
in the journal text, up to maybe 130 pages, and index it in one go.
We are seeing Solr stop indexing af
At the moment for "string", I have:
Is there an example type so that it will do exact matches?
Would "alphaOnlySort" do the trick? It looks like it might.
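The field type definition was stripped by the list archive; a minimal sketch of what an exact-match setup in schema.xml could look like (the field name "title" is taken from the earlier question; the rest is a standard solr.StrField definition, not necessarily what was posted):

```
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
<field name="title" type="string" indexed="true" stored="true"/>
```

A solr.StrField is not analyzed at all, so title:Nature would match only documents whose title is exactly "Nature", not "Nature Cell Biology"; a restart and reindex are required after changing the type.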
On Tue, Jul 22, 2008 at 11:20 AM, Yonik Se
On Tue, Jul 22, 2008 at 11:39 AM, Ian Connor <[EMAIL PROTECTED]> wrote:
> omitNorms="true"/>
This will give you an exact match. As I said, if it's not, then you
didn't restart and reindex, or you are querying the wrong field.
-Yonik
Lucene has a maxFieldLength (the number of tokens to index for a given
field name).
It can be configured via solrconfig.xml:
1
-Yonik
On Tue, Jul 22, 2008 at 11:38 AM, Tom Lord <[EMAIL PROTECTED]> wrote:
> Hi, we've looked for info about this issue online and in the code and am
> none the wis
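The XML in Yonik's mail was stripped by the archive; a hedged sketch of what the solrconfig.xml entry looks like (the value here is just an illustrative "effectively unlimited" choice, not necessarily what was posted):

```
<!-- in solrconfig.xml, inside <indexDefaults> / <mainIndex> -->
<maxFieldLength>2147483647</maxFieldLength>
```

Tokens beyond this limit are silently dropped at index time, which would explain Solr appearing to stop indexing partway through a long journal.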
Lucene index corrupted... which hard drive do you use?
Quoting Rohan <[EMAIL PROTECTED]>:
Hi Guys,
This is my first post. We are running solr with multiple Indexes, 20
Indexes. I'm facing problem with 5 one. I'm not able to run optimized on
that index. I'm getting following error. Your help is
Indeed - one of my shards had it listed as "text" doh!
Thanks for the assurance that led me to find my bug.
On Tue, Jul 22, 2008 at 11:43 AM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> On Tue, Jul 22, 2008 at 11:39 AM, Ian Connor <[EMAIL PROTECTED]> wrote:
>> > omitNorms="true"/>
>
> This will giv
lookups : how many times the cache was referenced
hits : how many lookups were cache hits
hitratio : hits/lookups
and for other items, see my previous mail at:
http://www.nabble.com/about-cache-to10192953.html
Koji
Marshall Gunter wrote:
Can someone point me to an in depth explanation of the Solr c
I'm somewhat perplexed, under what circumstances would you be able to
send one query to Solr but not two?
-Mike
On 21-Jul-08, at 8:37 PM, Jon Baer wrote:
Well that's my problem ... I can't :-)
When you put a fq=doctype:news in there you can't get an explicit
facet.query, it will only let
: http://people.apache.org/~shalin/poll.html
Except the existing Solr logo isn't on that list.
i smell election tampering :)
Seriously though: I realized a long time ago that there was too much email
to reply too, too many features to work on, too many patches to review,
and too few hours in
Chris Hostetter wrote:
: http://people.apache.org/~shalin/poll.html
Except the existing Solr logo isn't on that list.
i smell election tampering :)
I had put it in my poll :) I actually considered bringing that up to
Shalin as well, but couldn't bring myself to be so fair I suppose
Serious
Hi, We are developing a product in an agile manner and the current
implementation has data of about 800 megs in dev. The memory
allocated to Solr on dev (dual core Linux box) is 128-512 MB. My config=
trueMy Field===
Just tried adding a pf field to my request handler. When I did this, solr
returned all document fields for each doc (no "score") instead of returning
the fields specified in fl. Bug? Feature? Anyone know what the reason for
this behavior is? I'm using solr 1.2.
Thanks,
Jason
On 22-Jul-08, at 11:53 AM, Jason Rennie wrote:
Just tried adding a pf field to my request handler. When I did
this, solr
returned all document fields for each doc (no "score") instead of
returning
the fields specified in fl. Bug? Feature? Anyone know what the
reason for
this behavior
Sorry for that. I didn't realise how my mail had finally arrived. Sorry!!!
From: [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Subject: OOM on Solr Sort
Date: Tue, 22 Jul 2008 18:33:43 +
Hi,
We are developing a product in a agile manner and the current
implementation has a data of size ju
I'm using solrj and all I did was add a pf entry to solrconfig.xml. I don't
think it could be an ampersand issue...
Here's an example query:
wt=xml&rows=10&start=0&q=urban+outfitters&qt=recsKeyword&version=2.2
Here's qt config:
0.06
name^1.5 tags description^0
Hi,
Sorry again, fellows. I am not sure what's happening. The day with Solr is bad for
me I guess. EZMLM didn't let me send any mails this morning. Asked me to confirm
my subscription and when I did, it said I was already a member. Now my mails are
all coming out bad. Sorry for troubling y'all this ba
> From: [EMAIL PROTECTED]
> To: solr-user@lucene.apache.org
> Subject: Out of memory on Solr sorting
> Date: Tue, 22 Jul 2008 19:11:02 +
>
>
> Hi,
> Sorry again fellos. I am not sure whats happening. The day with solr is bad
> for me I guess. EZMLM didnt let me send any mails this morning
Hi,
In my project I have to index a whole database which contains text data only.
So if I follow an incremental indexing approach, my problem is how will
I pick the delta data from the database. Is there any utility in Solr to keep
track of the last indexed record? Or is there any other approach to solve
Doh! I mistakenly changed the request handler from dismax to standard.
Ignore me...
Jason
On Tue, Jul 22, 2008 at 2:59 PM, Jason Rennie <[EMAIL PROTECTED]> wrote:
> I'm using solrj and all I did was add a pf entry to solrconfig.xml. I
> don't think it could be an ampersand issue...
>
> Here's
org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:403)
- this piece of code does not request an Array[100M] (as I've seen with
Lucene), it asks for only a few bytes / KB for a field...
Probably 128 - 512 is not enough; it is also advisable to use equal sizes
-Xms1024M -Xmx1024M
(i
Thanks Fuad.
But why does just sorting produce an OOM? I executed the
query without adding the sort clause and it executed perfectly. In fact I even
tried removing the maxrows=10 and executed; it came out fine. Queries with bigger
results seem to come out fine too. But why just sort
Because to sort efficiently, Solr loads the term to sort on for each doc
in the index into an array. For ints, longs, etc it's just an array the
size of the number of docs in your index (I believe deleted or not). For
a String it's an array to hold each unique string and an array of ints
indexing
I've even seen exceptions (posted here) when "sort"-type queries
caused Lucene to allocate 100Mb arrays, here is what happened to me:
SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object
size: 100767936, Num elements: 25191979
at
org.apache.lucene.search.FieldCacheImpl$1
SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object
size: 100767936, Num elements: 25191979
I just noticed, this is an exact number of documents in index: 25191979
(http://www.tokenizer.org/, you can sort - click headers Id, [COuntry,
Site, Price] in a table; experimental)
Thanks for the explanation, Mark. The reason I had it as 512 max was cos earlier
the data file was just about 30 megs, and it increased to this much because of the
usage of EdgeNGramFilterFactory for 2 fields. That's great to know it just
happens for the first search. But this exception has been occur
Sorry, Not 30, but 300 :)
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: RE: Out of memory on Solr sorting
Date: Tue, 22 Jul 2008 20:19:49 +
Thanks for the explanation mark. The reason I had it as 512 max was cos earlier
the data file was just about 30 megs and it increased to this much for of
Fuad Efendi wrote:
SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object
size: 100767936, Num elements: 25191979
I just noticed, this is an exact number of documents in index: 25191979
(http://www.tokenizer.org/, you can sort - click headers Id, [COuntry,
Site, Price] in a tab
Can't you write triggers for your database/tables you want to index?
That way you can keep track of all kinds of changes and updates and
not just addition of a new record.
Ravish
On Tue, Jul 22, 2008 at 8:15 PM, anshuljohri <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> In my project i have to index whol
Mark,
Question: how much memory do I need for 25,000,000 docs if I do a sort by
field, 256 bytes? 6.4Gb?
Quoting Mark Miller <[EMAIL PROTECTED]>:
Because to sort efficiently, Solr loads the term to sort on for each
doc in the index into an array. For ints,longs, etc its just an array
the siz
Thank you very much Mark,
it explains a lot to me;
I am guessing: for 1,000,000 documents with a [string] field of
average size 1024 bytes I need 1Gb for a single IndexSearcher instance;
the field-level cache is used internally by Lucene (can Lucene manage
its size?); we can't have 1G of such
Hmmm...I think it's 32 bits per integer with an index entry for each doc, so
**25 000 000 x 32 bits = 95.3674316 megabytes**
Then you have the string array that contains each unique term from your
index...you can guess that based on the number of terms in your index
and an avg length guess.
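The arithmetic above can be checked with a quick calculation (a sketch; the 4-bytes-per-int figure matches the 32-bit entries described above, while the unique-term count and average term length are made-up illustrative values):

```python
# FieldCache memory estimate for sorting on a string field:
# one int per document (the ords array), plus one entry per unique term.

num_docs = 25_000_000          # documents in the index (from the thread)
bytes_per_int = 4              # 32-bit int per doc in the ords array

ords_bytes = num_docs * bytes_per_int
ords_mb = ords_bytes / (1024 * 1024)
print(f"ords array: {ords_mb:.1f} MB")  # ~95.4 MB, matching the figure above

# The unique-terms array comes on top of this: roughly
# unique_terms * (avg term bytes + per-String JVM overhead).
unique_terms = 1_000_000       # hypothetical
avg_term_bytes = 32            # hypothetical average, overhead ignored
terms_mb = unique_terms * avg_term_bytes / (1024 * 1024)
print(f"terms array (rough lower bound): {terms_mb:.1f} MB")
```

This is only a lower bound: per-object JVM overhead for each String can easily double the terms-array figure.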
Hi Mark,
I am still getting an OOM even after increasing the heap to 1024.
The docset I have is
numDocs : 1138976 maxDoc : 1180554
Not sure how much more I would need. Is there any other way out of this? I
noticed another interesting behavior. I have a Solr setup on a personal B
Someone else is going to have to take over, Sundar - I am new to Solr
myself. I will say this though - 25 million docs is pushing the limits
of a single machine - especially with only 2 gigs of RAM, especially with
any sort fields. You are at the edge, I believe.
But perhaps you can get by. Have
Thanks for your help Mark. Lemme explore a little more and see if someone else
can help me out too. :)
> Date: Tue, 22 Jul 2008 16:53:47 -0400
> From: [EMAIL PROTECTED]
> To: solr-user@lucene.apache.org
> Subject: Re: Out of memory on Solr sorting
>
> Someone else is going to have to take over
Ok, what is confusing me is the implicit guess that FieldCache contains
the "field" and Lucene uses an in-memory sort instead of using the file-system
"index"...
Array size: 100Mb (25M x 4 bytes), and it is just pointers (4-byte
integers) to documents in the index.
org.apache.lucene.search.FieldCacheI
Ok, after some analysis of FieldCacheImpl:
- it is assumed that the (sorted) Enumeration of "terms" is smaller than the
total number of documents
(so SOLR uses a specific field type for sorted searches:
solr.StrField with omitNorms="true")
It creates an int[reader.maxDoc()] array, checks the (sorted) En
I haven't seen the source code before, but I don't know why the sorting isn't
done after the fetch is done. Wouldn't that make it faster, at least in
the case of field-level sorting? I could be wrong too and the implementation
might well be better. But I don't know why all of the fields have
I am hoping [new StringIndex (retArray, mterms)] is called only once
per-sort-field and cached somewhere in Lucene;
theoretically you need to multiply the number of documents by the size of the
field (supposing that the field contains unique text); you need not tokenize
this field; you need not store TermVect
Yes, it is a cache; it stores an array of Document IDs "sorted" by the
"sorted field", together with the sorted fields; query results can intersect
with it and be reordered accordingly.
But the memory requirements should be well documented.
It uses a WeakHashMap internally, which is not good(!!!) - a lot of
"unde
How about releasing the preliminary results so we can see if a run-off
is in order!
On Tue, Jul 22, 2008 at 6:37 AM, Mark Miller <[EMAIL PROTECTED]> wrote:
> My opinion: if its already a runaway, we might as well not prolong things.
> If not though, we should probably give some time for any possib
Hey everybody, I'll be giving a talk called "Apache Solr: Beyond the Box"
at ApacheCon this year, which will focus on the how/when/why of
writing Solr Plugins...
http://us.apachecon.com/c/acus2008/sessions/10
I've got several use cases I can refer to for examples, both from my day
j
On 22-Jul-08, at 4:34 PM, Chris Hostetter wrote:
Hey everybody, I'll be giving a talk called "Apache Solr: Beyond the
Box" at ApacheCon this year, which will focus on the how/when/why of
writing Solr Plugins...
http://us.apachecon.com/c/acus2008/sessions/10
I've got several use
Did you take a look at DataImportHandler?
On Wed, Jul 23, 2008 at 1:57 AM, Ravish Bhagdev
<[EMAIL PROTECTED]> wrote:
> Can't you write triggers for your database/tables you want to index?
> That way you can keep track of all kinds of changes and updates and
> not just addition of a new record.
>
>
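For the delta-tracking question, DataImportHandler can record the last index time itself; a hedged configuration sketch (the table and column names here are made up for illustration, and the exact delta attributes vary by DIH version):

```
<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
  <document>
    <entity name="item"
            query="SELECT id, text FROM item"
            deltaQuery="SELECT id FROM item
                        WHERE last_modified &gt; '${dataimporter.last_index_time}'"
            deltaImportQuery="SELECT id, text FROM item
                              WHERE id = '${dataimporter.delta.id}'"/>
  </document>
</dataConfig>
```

Running /dataimport?command=delta-import then picks up only the rows changed since the previous run, with no need for triggers or a hand-rolled bookkeeping table.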
Thanks Paul, this is what I was looking for :)
-Anshul Johri
Noble Paul നോബിള് नोब्ळ् wrote:
>
> Did you take a look at DataImportHandler?
>
> On Wed, Jul 23, 2008 at 1:57 AM, Ravish Bhagdev
> <[EMAIL PROTECTED]> wrote:
>> Can't you write triggers for your database/tables you want to index?