Hi all,
I want to implement a facet search for my application. Can someone help
me out?
Thanks,
Regards,
Sajith Vimukthi Weerakoon.
You can send arbitrary requests via SolrJ, just use the parameter map
via the query method: http://lucene.apache.org/solr/api/solrj/org/apache/solr/client/solrj/SolrServer.html
-Grant
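For the facet question above, a minimal SolrJ sketch (the field name
"brand" and the server URL are placeholders, not from the original mails):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.FacetField;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class FacetExample {
        public static void main(String[] args) throws Exception {
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
            SolrQuery query = new SolrQuery("*:*");
            query.setFacet(true);         // turn faceting on
            query.addFacetField("brand"); // facet on this field
            query.setFacetMinCount(1);    // skip zero-count values
            QueryResponse rsp = server.query(query);
            for (FacetField.Count c : rsp.getFacetField("brand").getValues()) {
                System.out.println(c.getName() + " (" + c.getCount() + ")");
            }
        }
    }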
On Apr 7, 2009, at 1:52 PM, Fink, Clayton R. wrote:
These URLs give me what I want - word completion and t
: > The last value is used for sorting in multi-valued fields. What is the
: > reason behind sorting on a multi-valued field?
strictly speaking the behavior is non-deterministic. in most cases
attempting to sort on a multi-valued field will generate an error.
: Can't do much about it, that is
: Does anybody have any further suggestions on what I might try in this
: situation? Any tools perhaps that might help me put my finger on Solr's
: pulse so I can figure out just what's going on in there at index and query
: time?
1) FYI: you don't always need the settings on every filter to be
If you've been using a MultiSearcher to query multiple *remote* searchers,
then distributed searching in Solr should be appropriate.
if you're used to using MultiSearcher as a way of aggregating from
multiple *local* indexes, distributed searching is probably going to seem
slow compared to wh
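For reference, distributed search in Solr is driven by the shards request
parameter; a sketch with placeholder hostnames:

    http://localhost:8983/solr/select?q=ipod&shards=host1:8983/solr,host2:8983/solr

The node that receives the request queries each listed shard and merges the
results.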
leap years don't just complicate the calculation when a person was born on
Feb 29 ... even if no one was born on Feb 29, answering the question
"whose birthday is in the next/last X days?" is complicated by needing to
know whether the current year is a leap year...
: Or have two fields, dayofy
Or have two fields, dayofyear and dayofleapyear, then use the
right field in the right year. --wunder
On 4/7/09 4:32 PM, "Stephen Weiss" wrote:
> If someone's birthday falls on a leap day, in most countries their
> birthday is considered to be February 28th unless the current year is
> a leap year
If someone's birthday falls on a leap day, in most countries their
birthday is considered to be February 28th unless the current year is a
leap year. You could make the field a float, encode the day number as
59.5, so it will match where it should, and write special handling
along these lines
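A minimal sketch of that float encoding (the class and method here are
made up for illustration): map month/day to its day number in a fixed
non-leap year, and give Feb 29 the value 59.5 so it lands between Feb 28
(day 59) and Mar 1 (day 60):

    import java.util.Calendar;
    import java.util.GregorianCalendar;

    public class DayOfYearEncoder {
        static float encode(int month, int day) {
            if (month == 2 && day == 29) {
                return 59.5f; // between Feb 28 (59.0) and Mar 1 (60.0)
            }
            // 2009 is a non-leap reference year; Calendar months are 0-based
            GregorianCalendar cal = new GregorianCalendar(2009, month - 1, day);
            return cal.get(Calendar.DAY_OF_YEAR);
        }

        public static void main(String[] args) {
            System.out.println(encode(2, 28)); // 59.0
            System.out.println(encode(2, 29)); // 59.5
            System.out.println(encode(3, 1));  // 60.0
        }
    }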
ashokc wrote:
What I am doing right now is to capture all the content under "content_korea"
for example, use 'copyField' to duplicate that content to "content_english".
"content_korea" gets processed with CJK analyzers, and "content_english"
gets processed with the usual detailed index/query analyzer
(for people who don't know, the schema browser and the Luke
handler return a "histogram" for each field)
: I have noticed that I can't seem to make sense of the histogram. For every
: field the x-axis shows powers of 2 which make no sense for things like brand
: name. Am I looking at it wrong?
What I am doing right now is to capture all the content under "content_korea"
for example, use 'copyField' to duplicate that content to "content_english".
"content_korea" gets processed with CJK analyzers, and "content_english"
gets processed with the usual detailed index/query analyzers, filters, syn
: Hi everyone,
: I have an index that stores birth-dates, and I would like to search for
: anybody whose birth-date is within X days of a certain month/day. For
: example, I'd like to know if anybody's birthday is coming up within a
: certain number of days, regardless of what year they were
: I have documents where text from two languages, e.g. (english & korean) or
: (english & german) are mixed up in a fairly intensive way. 20-30% of the
if you search the list archives you'll find a lot of results for
"languages" ... it's not something i deal with much but i believe using
separ
: Any documents marked deleted in this index are just the result of updates to
: those documents. There are no purely deleted documents. Furthermore, the
: field that I am ordering by in my function query remains untouched over the
: updates.
it doesn't matter whether it was an update or a true
: 1. What is the userId to be given in scripts.conf file.
it's just a username that the scripts will try to sudo to if specified ...
it's a way of ensuring that all of the actions the script takes (logging,
creating files, etc...) are executed by a specific unix user no matter who
runs the scr
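From memory, scripts.conf is a simple key=value file along these lines
(exact keys may differ between versions); the user key is the one being
asked about:

    user=solr
    solr_hostname=localhost
    solr_port=8983
    rsyncd_port=18983
    data_dir=
    webapp_name=solr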
: - Going to http://localhost:8983/core1/admin/stats.jsp#cache shows a
: nearly empty Cache section. The only cache that shows up there is
: fieldValueCache (which is really commented out in solrconfig.xml, but
: Solr creates it anyway, which is normal). All other caches are missing.
:
: Any
: Sorry, I just realized I can use SolrIndexSearcher.search(Query, Hit)...
:
: that was my question basically.
I wouldn't recommend it ... those methods bypass all of the goodness Solr
adds on top of Lucene (caching, etc...)
if you're writing plugin/embedded code where you have access to th
: Indeed. I wrote the following test:
:
: Pattern p = Pattern.compile("(.*)");
: Matcher m = p.matcher("xyz");
: Assert.assertEquals("", "Video", m.replaceAll("Video"));
:
: The test fails. It gives "VideoVideo" as the actual result. I guess there is
: something about Matcher.replaceAll that I d
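That is documented java.util.regex behavior rather than a Matcher bug:
after consuming "xyz", the pattern (.*) also matches the empty string at
the end of the input, so the replacement is applied twice. A small demo,
with two ways to avoid it:

    import java.util.regex.Pattern;

    public class ReplaceAllDemo {
        public static void main(String[] args) {
            // (.*) matches "xyz" and then the empty string at the end:
            System.out.println(Pattern.compile("(.*)").matcher("xyz")
                    .replaceAll("Video")); // VideoVideo
            // (.+) cannot match an empty string, so it fires only once:
            System.out.println(Pattern.compile("(.+)").matcher("xyz")
                    .replaceAll("Video")); // Video
            // anchoring also prevents the second, empty match:
            System.out.println("xyz".replaceAll("^.*$", "Video")); // Video
        }
    }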
StandardTokenizer is tricky.
it does a lot of kooky things that probably made sense when it was
written. you'll note in your output that the "term type" is getting set to
"HOST": StandardTokenizer has decided that L.I.C looks like a hostname, so
it's not splitting on the periods.
: analys
Looks like I was using the wrong field when searching (tokenized instead
of untokenized) and this approach actually worked. Sorry for the
confusion.
-Original Message-
From: Vauthrin, Laurent
Sent: Monday, April 06, 2009 10:03 AM
To: solr-user@lucene.apache.org
Subject: RE: Wildcard sear
These URLs give me what I want - word completion and term counts. What I don't
see is a way to call these via SolrJ. I could call the server directly using
java.net classes and process the XML myself, I guess. There needs to be an auto
suggest request class.
http://localhost:8983/solr/autoSugge
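Until such a class exists, Grant's parameter-map route (above) works for
this; a sketch, assuming a handler registered as /autoSuggest that takes
TermsComponent-style parameters (both of those are assumptions):

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.params.ModifiableSolrParams;

    public class AutoSuggestCall {
        public static void main(String[] args) throws Exception {
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
            ModifiableSolrParams params = new ModifiableSolrParams();
            params.set("qt", "/autoSuggest"); // hypothetical handler name
            params.set("terms", true);
            params.set("terms.fl", "name");   // hypothetical field
            params.set("terms.prefix", "so");
            QueryResponse rsp = server.query(params);
            // no typed accessor for this, so walk the raw response
            System.out.println(rsp.getResponse());
        }
    }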
I see this interesting line in the wiki page LargeIndexes
http://wiki.apache.org/solr/LargeIndexes (sorting section towards the bottom)
Using _val_:ord(field) as a search term will sort the results without incurring
the memory cost.
I'd like to know what this means, but I'm having a bit of trou
It does end up in the right order (sorted), but it's very expensive. Sorting
by a couple fields that each have fewer unique index values seems to limit the
memory consumption greatly.
-Original Message-
From: Walter Underwood [mailto:wunderw...@netflix.com]
Sent: Tuesday, April 07, 2009
Why tokenize the date? It sorts just fine as a string. --wunder
On 4/7/09 8:50 AM, "Erick Erickson" wrote:
> Your observations about date sorting are probably correct. The
> issue is that the sort caches in Lucene look at the unique terms.
> There are many more unique terms (nearly every one) in
Good info to have. Thanks Erick.
-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Tuesday, April 07, 2009 10:51 AM
To: solr-user@lucene.apache.org
Subject: Re: Coming up with a model of memory usage
Your observations about date sorting are probably correct.
Cool, great resource, thanks.
-Original Message-
From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com]
Sent: Tuesday, April 07, 2009 10:13 AM
To: solr-user@lucene.apache.org
Subject: Re: Coming up with a model of memory usage
On Tue, Apr 7, 2009 at 8:25 PM, Joe Pollard wrote:
> It
>Thank you much Fergus,
>
>I was considering implementing a database which would hold a path name
>and an MD5 sum of each file.
Snap. That is close to what we did. However due to our previous
duff full text search engine we had to hold this information in
a separate checksums file. Solr is much bet
Your observations about date sorting are probably correct. The
issue is that the sort caches in Lucene look at the unique terms.
There are many more unique terms (nearly every one) in
2008-08-12T12:18:26.510
than when the field is split. You can reduce memory consumption
when sorting even more by
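A sketch of the field-splitting idea (field names are made up): keep the
full timestamp for display, and sort on a coarser date-only copy so the
sort cache sees far fewer unique terms:

    import org.apache.solr.common.SolrInputDocument;

    public class SplitDateExample {
        public static void main(String[] args) {
            String ts = "2008-08-12T12:18:26.510";
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("timestamp", ts);                  // full precision, for display
            doc.addField("date_only", ts.substring(0, 10)); // "2008-08-12", for sorting
            System.out.println(doc);
        }
    }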
Note that Solr (trunk, soon to be 1.4) has a duplicate detection
feature that may work for your need. See http://wiki.apache.org/solr/Deduplication
(looks like docs need updating to say 1.4 here) and http://issues.apache.org/jira/browse/SOLR-799
Erik
On Apr 7, 2009, at 11:25 AM, Ves
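For reference, the SOLR-799 approach is wired up as an update processor
chain in solrconfig.xml, roughly like this (reconstructed from the wiki
page above, so element names may differ on trunk):

    <updateRequestProcessorChain name="dedupe">
      <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
        <bool name="enabled">true</bool>
        <str name="signatureField">id</str>
        <bool name="overwriteDupes">true</bool>
        <str name="fields">name,content</str>
        <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
      </processor>
      <processor class="solr.RunUpdateProcessorFactory"/>
    </updateRequestProcessorChain>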
Thank you much Fergus,
I was considering implementing a database which would hold a path name
and an MD5 sum of each file.
Then as a part of Solr indexing, one could check against the DB if a
file path exists, if Yes, then compare MD5 and only index if different.
Regards,
Veselin K
On Tue, Apr
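A minimal sketch of the MD5 half of that plan (the path/database lookup
around it is left out):

    import java.io.FileInputStream;
    import java.io.InputStream;
    import java.security.MessageDigest;

    public class FileMd5 {
        public static String md5(String path) throws Exception {
            MessageDigest md = MessageDigest.getInstance("MD5");
            InputStream in = new FileInputStream(path);
            try {
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    md.update(buf, 0, n);
                }
            } finally {
                in.close();
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        }

        public static void main(String[] args) throws Exception {
            System.out.println(args[0] + "  " + md5(args[0]));
        }
    }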
On Tue, Apr 7, 2009 at 8:25 PM, Joe Pollard wrote:
> It doesn't seem to matter whether fields are stored or not, but I've
> found a rather striking difference in the memory requirements during
> sorting. Sorting on a string field representing datetime like
> '2008-08-12T12:18:26.510' is about twi
It doesn't seem to matter whether fields are stored or not, but I've
found a rather striking difference in the memory requirements during
sorting. Sorting on a string field representing datetime like
'2008-08-12T12:18:26.510' is about twice as memory intense as sorting
first by '2008-08-12' and th
Hi,
I want to use the NGramTokenizerFactory tokeniser to enable partial
matching on a field in my index. For instance for the field:
"Lorem ipsum"
I want it to match "lor", "lorem", and "lorem i". However I am finding it
matches the first two but not the third - the whitespace is causing
problems
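One common fix for exactly that symptom (a sketch, not necessarily what
was recommended later in this thread): tokenize with KeywordTokenizerFactory
so the whole value, spaces included, stays one token, and do the n-gramming
in a filter; at query time, skip the n-gramming so "lorem i" is looked up
as a single gram:

    <fieldType name="ngram_text" class="solr.TextField">
      <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>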
Hi,
So I did two tests on two servers.
First server: with just replication every 20 min, as you can see:
http://www.nabble.com/file/p22930179/cpu_without_request.png
http://www.nabble.com/file/p22930179/cpu2_without_request.jpg
Second server
Yeah, that is a good idea. Some of it can be obtained already through
the Editorial Boosting, some through function queries, similarity
factory, custom sorting and other features.
User feedback and click log analysis would be nice features to have as
well.
http://wiki.apache.org/solr/How
Hi all,
I am looking for a mechanism to check the amount
of difference between a document already in the index
and the one updated with some new content. Basically,
I want to design a criteria to decide whether or not to
update the document with the new one.
In case solr already has something lik
Can you add the values as literals?
http://wiki.apache.org/solr/ExtractingRequestHandler#head-88b9f55989c9878638e88be5d335b5126550f87c
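For example, with curl (the literal.* field names here are placeholders
for whatever the schema defines):

    curl "http://localhost:8983/solr/update/extract?literal.id=doc1&literal.category=report&commit=true" \
         -F "myfile=@report.pdf"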
On Apr 3, 2009, at 8:29 PM, Venu Mittal wrote:
Hi,
I am using ExtractingRequestHandler to index rich text documents.
The way I am doing it is I get some dat
Would it not be a good idea to provide ranking as a Solr plugin, in which
users can write their custom ranking algorithms and reorder the results
returned by Solr in whichever way they need. It may also help Solr users to
incorporate learning (from search user feedback - such as click logs), and
reor
yes, non cached. If I repeat a query the response is fast since the results
are cached.
2009/4/7 Noble Paul നോബിള് नोब्ळ्
> are these the numbers for non-cached requests?
>
> On Tue, Apr 7, 2009 at 11:46 AM, CIF Search wrote:
> > Hi,
> >
> > I have around 10 solr servers running indexes of aro
are these the numbers for non-cached requests?
On Tue, Apr 7, 2009 at 11:46 AM, CIF Search wrote:
> Hi,
>
> I have around 10 solr servers running indexes of around 80-85 GB each and
> and with 16,000,000 docs each. When i use distrib for querying, I am not
> getting a satisfactory response time.
Let me assume that the graph shows the CPU idle time. How do I know
that the spikes are during the replication?
It is possible that you observe CPU spikes soon after the replication
because that is when you will have very few cache hits, since
searches are done live.
Even if the index is very l
Hi Noble
I turned autoWarming down to zero.
And yes, during replication it pulls over the entire index.
Because there are too many updates (2000 docs every 30 min), it is always
merging my index,
so the replication brings back all of my data/index,
which uses a big part of the CPU, as you can see on t
Veselin,
Well, as far as Solr is concerned, there are two issues here:
1) To stop the same document ending up in the index twice, use the document
pathname as the unique ID. Then if you do index it twice, the previous index
information will be discarded. Not very efficient, but it may be