The SolrRelevancyFAQ does suggest that both index-time and search-time boosting
can be used to boost the score of newer documents, but it doesn't suggest for
what reasons, or in which contexts, one might choose one over the other. It
only provides an example of a search-time boost, though, so it doesn't answer the question.
It seems like it would be far more efficient to calculate the boost factor
once and store it, rather than calculating it for each request in real time.
Some of our queries match tens of thousands, if not hundreds of thousands, of
documents in a 15GB index. However, I'm not well-versed in Lucene internals
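(A minimal sketch of the calculate-once idea, using a hypothetical float field
"recency_boost" and 1.4-era SolrJ; "server" is an assumed, already-initialized
SolrServer, e.g. a CommonsHttpSolrServer:)

  // Index time: compute the boost once and store it alongside the document.
  SolrInputDocument doc = new SolrInputDocument();
  doc.addField("id", "123");
  doc.addField("recency_boost", 0.85f); // derived from the document's date at indexing
  server.add(doc);
  // Query time (dismax): read the stored value back as a function instead of
  // recomputing it per request, e.g. bf=recency_boost
  // Caveat raised elsewhere in this thread: the stored value goes stale as the
  // document ages, unless documents are periodically reindexed.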
+1
Good question, my use of Solr would benefit from nested annotated beans as well.
Awaiting a reply,
Thom
On 2010-06-03, at 1:35 PM, Peter Hanning wrote:
>
> When modeling documents with a lot of fields (hundreds), the bean class used
> with SolrJ to interact with the Solr index tends to grow
On Fri, Jun 4, 2010 at 7:33 PM, Andy wrote:
> Yonik,
>
> Just curious, why does using enum improve facet performance?
>
> Furkan was faceting on a text field with each word being a facet value. I'd
> imagine that'd mean there's a large number of facet values. According to the
> documentation
I've done a lot of recency boosting of documents, and I'm wondering why you
would want to do that at index time. If you are continuously indexing new
documents, what was "recent" when it was indexed becomes, over time, "less
recent". Are you unsatisfied with your current performance with the boost
function?
The query parser first splits on whitespace,
so each individual word of your query (short,red,evil,fox) gets its own
token stream, and therefore isn't shingled.
On Fri, Jun 4, 2010 at 6:21 PM, Greg Bowyer wrote:
> Hi all
>
> Interesting, and by the looks of things a very solid project you have here
>
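(To make the workaround concrete: assuming a hypothetical field "shingled_text"
whose query analyzer contains a ShingleFilter, quoting the phrase sends the
whole string through the analyzer in one pass:)

  q=shingled_text:(short red evil fox)    -- parser splits on whitespace first;
                                             each word is analyzed alone, no shingles
  q=shingled_text:"short red evil fox"    -- the full string reaches the analyzer
                                             as one token stream, so shingles can form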
Perhaps I should have been more specific in my initial post. I'm doing
date-based boosting on the documents in my index, so as to assign a higher
score to more recent documents. Currently I'm using a boost function to
achieve this. I'm wondering if there would be a performance improvement if
instead the boost were applied at index time.
Yonik,
Just curious, why does using enum improve facet performance?
Furkan was faceting on a text field with each word being a facet value. I'd
imagine that'd mean there's a large number of facet values. According to the
documentation (http://wiki.apache.org/solr/SimpleFacetParameters#facet
Index-time boosting is different from search-time boosting, so
asking which performs better isn't really the right question.
Paraphrasing Hossman from years ago on the Lucene list (from
memory):
...index-time boosting is a way of saying this document's
title is more important than other documents' titles. Search
time
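(For reference, what an index-time boost looks like in 1.4-era SolrJ; "server"
is an assumed, already-initialized SolrServer:)

  SolrInputDocument doc = new SolrInputDocument();
  doc.addField("title", "Some title", 2.0f); // field boost: this doc's title counts more
  doc.setDocumentBoost(1.5f);                // boost for the document as a whole
  server.add(doc);
  server.commit();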
Hi all,
Interesting, and by the looks of things a very solid project you have here with
Solr, however...
I have an index that contains a large number of "phrases" that I need to search
over; each of these phrases is fairly small, being on average about 4 words
long.
The search terms that I am
: That is still really small for 5MB documents. I think the default solr
: document cache is 512 items, so you would need at least 3 GB of memory
: if you didn't change that and the cache filled up.
that assumes that the extracted text Tika extracts from each document is
the same size as the original
: to format the data from my sources. I can read through the catalina
: log, but this seems to just log requests; not much info is given about
: errors or when the service hangs. Here are some examples:
If you are only seeing one log line per request, then you are just looking
at the "request"
On 10-06-04 05:11 PM, Ahmet Arslan wrote:
> I have an issue with range queries on a long value in our
> dataset (the dataset is fairly large, but I believe the
> problem still exists for smaller datasets). When I
> query the index with a range, such as id:[1 TO 2000], I get
> values back that are well
> I have an issue with range queries on a long value in our
> dataset (the dataset is fairly large, but I believe the
> problem still exists for smaller datasets). When I
> query the index with a range, such as id:[1 TO 2000], I get
> values back that are well outside that range. It's as
> if th
Hi,
What are the performance ramifications of using a function-based boost at
search time (through bf in the dismax parser) versus an index-time boost?
Currently I'm using boost functions on a 15GB index of ~14mm documents. Our
queries generally match many thousands of documents. I'm wondering if I
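(For concreteness, the search-time variant under discussion as a SolrJ sketch;
the recip(ms(NOW,...)) form is the one from the SolrRelevancyFAQ, the field
names are assumptions, and "server" is an initialized SolrServer:)

  SolrQuery q = new SolrQuery("ipod");
  q.set("defType", "dismax");
  q.set("qf", "title body");
  // Recency boost recomputed on every request; 3.16e-11 ~= 1 / (ms in a year):
  q.set("bf", "recip(ms(NOW,created_at),3.16e-11,1,1)");
  QueryResponse rsp = server.query(q);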
Check the wiki
1. Do I need to copy the entire example folder from my local machine to Solr
home on Sun Solaris box?
http://wiki.apache.org/solr/SolrJBoss
2. How can I have multiple cores on the Sun Solaris box?
http://wiki.apache.org/solr/CoreAdmin
Regards
Juan
www.linebee.com
Hi guys,
I have a list of consultants, and the users (people who work for the
company) are supposed to be able to search for consultants based on
the time frame they worked for a company. For example, I should
be able to search for all consultants who worked for Bear Stearns in
the month of J
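(One common way to model this -- an assumption, not something from this thread:
index one document per engagement with start/end dates, then query for overlap
with the target month. Field names and dates are hypothetical:)

  // A consultant worked at the company during June 2010 iff the engagement
  // started before the month ended AND ended after the month began:
  SolrQuery q = new SolrQuery("company:\"Bear Stearns\"");
  q.addFilterQuery("start_date:[* TO 2010-06-30T23:59:59Z]");
  q.addFilterQuery("end_date:[2010-06-01T00:00:00Z TO *]");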
Hi,
I have an issue with range queries on a long value in our dataset (the
dataset is fairly large, but I believe the problem still exists for
smaller datasets). When I query the index with a range, such as id:[1
TO 2000], I get values back that are well outside that range. It's as if
the r
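(This is the classic symptom of range queries comparing lexicographically on a
plain string/integer field type. A sketch of the usual 1.4-era schema fix,
assuming the field is declared in schema.xml; the field must be reindexed
afterwards:)

  <fieldType name="tlong" class="solr.TrieLongField" precisionStep="8"
             omitNorms="true" positionIncrementGap="0"/>
  <field name="id" type="tlong" indexed="true" stored="true"/>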
: Ok so I think that Solr (lucene) will only remove deleted/updated
: documents from the disk after an optimize or after an 'expungeDeletes'
: request. Is there a way to trigger the expunsion (new word) across the
: entire index? I tried :
deletes are removed when segments are merged -- an optimize
You are my hero. I replaced the Tika 0.8 snapshots that were included with Solr
with 0.6 and it works now. Thank you!
Brad
On Jun 3, 2010, at 6:22 AM, David George wrote:
>
> Which version of Tika do you have? There was a problem introduced somewhere
> between Tika 0.6 and Tika 0.7 whereby the
Hello out there,
I am searching for a solution for conditional document boosting.
While analyzing the fields of a document, I want to create a document boost
based on some metrics.
There are two approaches:
First: I preprocess the data. The main problem with this is that I need to
take care ab
Very informative - thank you!
I think it might be useful to have this feature - maybe have an interface for
plugins to register an XSD or otherwise declare their expected XML elements and
attributes. I'm not sure if there's enough demand for this to justify the time
it would take to make this change
> P.S. Might it be helpful for Solr to complain about invalid
> XML during startup? Does it do this and I'm just not
> noticing?
Chris's explanation about a similar topic:
http://search-lucene.com/m/11JWX1hxL4u/
I installed Solr on my local machine and it works fine with Jetty. I am trying
to install on JBoss which is running on a Sun Solaris box and I have the
following questions:
1. Do I need to copy the entire example folder from my local machine to Solr
home on the Sun Solaris box?
2. How can I have multiple cores on the Sun Solaris box?
That did it. Thank you =)
P.S. Might it be helpful for Solr to complain about invalid XML during startup?
Does it do this and I'm just not noticing?
-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com]
Sent: Friday, June 04, 2010 12:18 PM
To: solr-user@lucene.apache.org
Subje
Simple lowercase F is causing this. It should be
(10/05/25 0:31), n...@frameweld.com wrote:
Hello,
How am I able to highlight a field that contains a specific value? If I have a field
called type, how am I able to highlight the rows whose values contain something like
"title"?
http://localhost:8983/solr/select?q=title&hl=on&hl.fl=type
Ok, so I think that Solr (Lucene) will only remove deleted/updated documents
from the disk after an optimize or after an 'expungeDeletes' request. Is there
a way to trigger the expunsion (new word) across the entire index? I tried:
final UpdateRequest request = new UpdateRequest();
request.setPar
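(A sketch of how that request might be finished in 1.4-era SolrJ; "server" is
an assumed, initialized SolrServer, and expungeDeletes rides along on a commit:)

  UpdateRequest request = new UpdateRequest();
  request.setAction(UpdateRequest.ACTION.COMMIT, true, true); // waitFlush, waitSearcher
  request.setParam("expungeDeletes", "true"); // merge away segments holding deletes
  request.process(server);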
I am using version 1.4.
I have tried your suggestion;
it takes around 25-30 seconds now.
Thank you,
On Fri, Jun 4, 2010 at 5:54 PM, Yonik Seeley wrote:
> Faceting on a full-text field is hard.
> What version of Solr are you using?
>
> If it's 1.4 or later, try setting
> facet.method=enum
All,
I am trying to sort on a text field and can't get it to work. I try sorting on
"sortTitle" and get no errors; it just doesn't appear to sort. The pertinent
parts of my schema:
... lots of filters that do work...
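(Sorting needs a field that yields exactly one token per document, which
word-splitting analyzers break. For comparison, the sort-friendly type from the
stock example schema looks roughly like this:)

  <fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory"/> <!-- whole value = one token -->
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.TrimFilterFactory"/>
    </analyzer>
  </fieldType>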
Hi Guys,
I'm experiencing the same issue with a single war. I'm using a brand-new
Solr war built from yesterday's version of the trunk.
I've got one master with 2 cores and one slave with a single core. I'm using
one core from the master as the master of the second core (which is configured
as a re
Faceting on a full-text field is hard.
What version of Solr are you using?
If it's 1.4 or later, try setting
facet.method=enum
And to use the filterCache less, try
facet.enum.cache.minDf=100
-Yonik
http://www.lucidimagination.com
On Fri, Jun 4, 2010 at 10:31 AM, Furkan Kuru wrote:
> Hello,
Hello,
I have been dealing with real-time data.
As the number of total indexed documents gets larger (now 5M),
a faceted search on a text field limited by creation time, which we use
to find the most-used word in all these text fields, slows down.
query string: created_time:[NOW-1HOUR
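(Yonik's suggestions applied to a query like the above, as a SolrJ sketch; the
faceted field name and "server" are assumptions:)

  SolrQuery q = new SolrQuery("created_time:[NOW-1HOUR TO NOW]");
  q.setFacet(true);
  q.addFacetField("text");                // the full-text field being faceted
  q.set("facet.method", "enum");          // enumerate indexed terms
  q.set("facet.enum.cache.minDf", "100"); // don't filterCache terms with df < 100
  QueryResponse rsp = server.query(q);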
I guess the following works.
A. Similar to your option 2, but using the filterCache:
fq=-item_id:001 -item_id:002
B. Similar to your option 3, but using the filterCache:
fq=-users_excluded_field:
The advantage being that the filter is cached independently from the rest of
the query, so it can be reused
How would you model this?
We have a table of news items that people can view in their news stream and
comment on. Users have the ability to "mute" items so they never see them in
their feed or search results.
From what I can see, there are a couple of ways to accomplish this.
1 - Post-process the
Additionally, I should have mentioned that you can instead do
fq=field_3:[* TO *], which uses the filterCache.
The method presented by Chris will probably outperform the above method, but
only on the first request; from then on the filterCache takes over.
From a performance standpoint it's probably
Nice one! Thanks.
>
>> I could be wrong, but it seems this
>> way has a performance hit?
>>
>> Or am I missing something?
>
> Did you read Chris's message in http://search-lucene.com/m/1o5mEk8DjX1/ ?
> He proposes an alternative (more efficient) way other than [* TO *].
Hoss,
Thanks a lot! (We are using Tomcat, so the logging properties file is fine.)
Do you know what the reason for the mentioned exception could be?
It seems to me that if this exception occurs, even the replication
for that index does not work.
If I then remove the data directory + reload + poll
> I could be wrong, but it seems this
> way has a performance hit?
>
> Or am I missing something?
Did you read Chris's message in http://search-lucene.com/m/1o5mEk8DjX1/ ?
He proposes an alternative (more efficient) way other than [* TO *].
I could be wrong, but it seems this way has a performance hit?
Or am I missing something?
> field1:"new york"+field2:"new york"+field3:[* TO *]
>
> 2010/6/4 bluestar
>
>> Hi there,
>>
>> Say my search query is "new york", and I am searching field1 and field2
>> for it, how do I specify that I wan
field1:"new york"+field2:"new york"+field3:[* TO *]
2010/6/4 bluestar
> Hi there,
>
> Say my search query is "new york", and I am searching field1 and field2
> for it, how do I specify that I want to exclude docs where field3 doesn't
> exist?
>
> Thanks
> Say my search query is "new york", and I am searching
> field1 and field2
> for it, how do I specify that I want to exclude docs where
> field3 doesn't
> exist?
http://search-lucene.com/m/1o5mEk8DjX1/
Hi,
Here's a field type using synonyms:
<filter class="solr.SynonymFilterFactory" synonyms="french-synonyms.txt" ignoreCase="true" expand="true"/>
<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
Here are the contents of 'french-synonyms.txt' that I used for testing:
PC,parti co