Hi all!
I found some strange behavior in Solr. If I sort by two text fields in a
chain, some results come back duplicated.
Neither text field is multivalued; one of them is a string, the other a
custom type based on a text field with the keyword analyzer.
I do this:
*CommonsHttpSolrServer
Well, these are pretty different things. SolrCloud is meant to handle
distributed search in an easier way than "raw" Solr distributed search,
where you have to build the shards in your own way.
Solr+Hadoop is a way to build these shards/indexes in parallel.
My guess is two things are happening:
1/ Your combination of filters is in parallel, or an OR expression. I'm
fairly sure of this, given the next point.
2/ To get 3 duplicate results, your custom filter AND the OR expression above
have to be working together, or it's possible that your customer f
The balanced segment merging is a really cool idea. I'll definitely
have a look at this, thanks!
One thing I forgot to mention in the original post is we use a
mergeFactor of 25. Somewhat on the high side, so that incoming commits
aren't trying to merge new data into large segments.
25 is a good b
1. You can run multiple Solr instances in separate JVMs, with both
having their solr.xml configured to use the same index folder.
You need to be careful that one and only one of these instances will
ever update the index at a time. The best way to ensure this is to use
one for writing only,
and the
Hi Erik,
I thought this would be good for the wiki, but I've not submitted to
the wiki before, so I thought I'd put this info out there first, then
add it if it was deemed useful.
If you could let me know the procedure for submitting, it probably
would be worth getting it into the wiki (couldn't d
Hi Dennis,
These are the Lucene file segments that hold the index data on the file system.
Have a look at: http://wiki.apache.org/solr/SolrPerformanceFactors
Peter
On Mon, Sep 13, 2010 at 7:02 AM, Dennis Gearon wrote:
> BTW, what is a segment?
>
> I've only heard about them in the last 2 weeks
On Mon, Sep 13, 2010 at 8:02 AM, Dennis Gearon wrote:
> BTW, what is a segment?
On the Lucene level an index is composed of one or more index
segments. Each segment is an index by itself and consists of several
files like doc stores, proximity data, term dictionaries etc. During
indexing Lucene /
Hi,
Could you show us what result you actually get? Wouldn't it make more sense to
choose a numeric fieldtype? To get proper sort order of numbers in a string
field, all numbers need to be exactly the same length, since the order will be
lexicographical, i.e. "10" will come before "2", but after "02".
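
To make the ordering concrete, here is a minimal plain-Java sketch (no Solr involved) that sorts the same values once as strings and once as numbers. The class and method names are made up for illustration; zeroPad shows the fixed-width workaround for string fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical demo class, not part of Solr or Lucene.
class SortOrderDemo {

    // Orders the values as plain strings, character by character.
    static List<String> sortAsStrings(List<String> values) {
        List<String> copy = new ArrayList<String>(values);
        Collections.sort(copy);
        return copy;
    }

    // Orders the same values by the integer each string represents.
    static List<String> sortAsNumbers(List<String> values) {
        List<String> copy = new ArrayList<String>(values);
        Collections.sort(copy, new Comparator<String>() {
            public int compare(String a, String b) {
                return Integer.compare(Integer.parseInt(a), Integer.parseInt(b));
            }
        });
        return copy;
    }

    // Left-pads a non-negative number with zeros to a fixed width,
    // so that string order and numeric order agree.
    static String zeroPad(int value, int width) {
        return String.format("%0" + width + "d", value);
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("2", "10", "02");
        System.out.println(sortAsStrings(values)); // [02, 10, 2]
        System.out.println(sortAsNumbers(Arrays.asList("10", "2", "1"))); // [1, 2, 10]
        System.out.println(zeroPad(2, 2)); // 02
    }
}
```

Padding only helps if every value fits the chosen width, which is why a numeric field type is usually the simpler route.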
--
J
As Erick points out, you don't want a random doc as response!
What you're looking at is how to avoid the "0 hits" problem.
You could look into one of these:
* Introduce autosuggest to avoid many 0-hits cases
* Introduce spellchecking
* Re-run the failed query with fuzzy turned on (e.g. alpha~)
* Re
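
As a rough illustration of the fuzzy re-run idea, the sketch below appends "~" to every whitespace-separated term of a failed query. It is deliberately naive and the class name is invented: it assumes a plain keyword query and does not handle field prefixes, quoted phrases or boolean operators.

```java
// Hypothetical helper, not part of Solr; assumes a simple keyword query.
class FuzzyRetry {

    // Turns "alpha beta" into "alpha~ beta~" so the query can be re-run
    // with fuzzy matching after the exact query returned 0 hits.
    static String toFuzzy(String query) {
        StringBuilder out = new StringBuilder();
        for (String term : query.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue; // empty input produces a single empty token
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(term);
            if (!term.endsWith("~")) {
                out.append('~'); // leave terms that are already fuzzy alone
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toFuzzy("alpha"));      // alpha~
        System.out.println(toFuzzy("alpha beta")); // alpha~ beta~
    }
}
```

The application would call this only after the original query came back empty, and then send the rewritten string as the new query.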
Hi Dennis,
thanks for the reply.
Please explain which filter you mean.
I'm searching only on one field with names:
query.setQuery(suchstring);
then I'm adding two sorts on other fields:
query.addSortField("type", SolrQuery.ORDER.asc);
query.addSortField("sortName", SolrQuery.ORDER.asc);
th
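
For anyone who wants to reproduce this outside SolrJ, the two sort calls above should correspond to a single request parameter, sort=type asc,sortName asc. The sketch below builds such a request URL with plain Java. The base URL, the /select path and the example query name:smith are assumptions for illustration, and the class is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper; base URL and handler path are assumptions.
class TwoFieldSortUrl {

    // URL-encodes a parameter value as UTF-8.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    // Joins "field direction" clauses with commas, as the sort parameter expects.
    static String sortParam(String... clauses) {
        StringBuilder sb = new StringBuilder();
        for (String clause : clauses) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(clause);
        }
        return sb.toString();
    }

    // Builds a select URL carrying the query and the sort parameter.
    static String selectUrl(String baseUrl, String q, String sort) {
        return baseUrl + "/select?q=" + encode(q) + "&sort=" + encode(sort);
    }

    public static void main(String[] args) {
        String sort = sortParam("type asc", "sortName asc");
        System.out.println(selectUrl("http://localhost:8983/solr", "name:smith", sort));
    }
}
```

Pasting the resulting URL into a browser makes it easy to check whether the duplicates also show up without SolrJ in the picture.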
MitchK wrote:
Frank,
have a look at SOLR-646.
Do you think a workaround for the data-dir-tag in the solrconfig.xml can
help?
I think about something like ${solr./data/corename} for
illustration.
Unfortunately I am not very skilled in working with solr's variables and
therefore I do not know
A couple of things come to mind:
1> what happens if you remove the sort clauses?
Because I suspect they're irrelevant and your
duplicate issue is something different.
2> SOLR admin should let you determine this.
3> Please show us the configurations that
make you sure that the documen
Let's suppose we have a regular search field body_t, and an internal
boolean flag flag_t not exposed to the user.
I'd like
body_t:foo AND flag_t:true
to be an intersection, but if "foo" is a stopword I get all documents
for which flag_t is true, as if the first clause was dropped, or if
techn
On Mon, Sep 13, 2010 at 3:27 PM, Xavier Noria wrote:
> Let's suppose we have a regular search field body_t, and an internal
> boolean flag flag_t not exposed to the user.
>
> I'd like
>
> body_t:foo AND flag_t:true
This is Solr, right? Why don't you use a filter query for your unexposed
flag_t fiel
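
A minimal sketch of what that suggestion could look like as a request: the user's text stays in q and the internal flag moves to an fq parameter, which restricts the result set without taking part in scoring. The base URL and /select path are assumptions, the class is hypothetical, and the sketch does not settle how a q consisting only of stopwords behaves.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper; base URL and handler path are assumptions.
class FilterQueryUrl {

    // URL-encodes a parameter value as UTF-8.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    // Builds a select URL with the main query in q and each filter in its own fq.
    static String selectUrl(String baseUrl, String q, String... filterQueries) {
        StringBuilder url = new StringBuilder(baseUrl);
        url.append("/select?q=").append(encode(q));
        for (String fq : filterQueries) {
            url.append("&fq=").append(encode(fq));
        }
        return url.toString();
    }

    public static void main(String[] args) {
        System.out.println(selectUrl("http://localhost:8983/solr", "body_t:foo", "flag_t:true"));
    }
}
```

Keeping the flag out of q also means the user-facing query string never has to be rewritten to include an internal field.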
On Mon, Sep 13, 2010 at 4:29 PM, Simon Willnauer
wrote:
> On Mon, Sep 13, 2010 at 3:27 PM, Xavier Noria wrote:
>> Let's suppose we have a regular search field body_t, and an internal
>> boolean flag flag_t not exposed to the user.
>>
>> I'd like
>>
>> body_t:foo AND flag_t:true
>
> this is so
Hi Erik,
I completely agree with you that showing a random document for user's query
would be very poor experience. I have raised this in our product review
meetings before. I was told that because of contractual agreement some
sponsored content needs to be returned even if it meant no match. And
You're right, it would be better to just give it a sortable numerical value.
For now I gave time_code an sdouble type to see if it sorted, and it did.
However all the 0's are trimmed, but that shouldn't be a problem unless it were
to truncate any values past the hundreds column.
Thanks.
- Noel
I thought I saw 'custom analyzer', but you wrote 'custom field'.
My mistake.
Dennis Gearon
Signature Warning
EARTH has a Right To Life,
otherwise we all die.
Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php
--- On Mon, 9/13/10, Stanislaw wrote:
> From:
This issue is one I hope to head off in my application / on my site. Instead of
an ad feed, I HOPE to be able to have an ad QUEUE on my site. If necessary,
I'll convert the feed TO a queue.
The queue will get a first pass done on it by either an employee or a
compensated user. Either one genera
I just tried several searches again on google.
I think they've refined the ad placements so that certain kinds of searches
return no ads, the kinds that I've been doing relative to programming being one
of them.
If OTOH I do some product related search, THEN lots of ads show up, but fairly
acc
Thanks guys for the explanation.
Dennis Gearon
--- On Mon, 9/13/10, Simon Willnauer wrote:
> From: Simon Willnauer
> Subject: Re: Tuning
You do not need either addition if you just want to have multiple Solr
instances on different machines, and query them all at once. Look at
this for the simplest way:
http://wiki.apache.org/solr/DistributedSearch
On Mon, Sep 13, 2010 at 12:52 AM, Marc Sturlese wrote:
>
> Well these are pretty di
"Java Swing" no longer gives ads for "swinger's clubs".
On Mon, Sep 13, 2010 at 9:37 AM, Dennis Gearon wrote:
> I just tried several searches again on google.
>
> I think they've refined the ads placements so that certain kind of searches
> return no ads, the kinds that I've been doing relative
On Mon, Sep 13, 2010 at 8:07 PM, Lance Norskog wrote:
> "Java Swing" no longer gives ads for "swinger's clubs".
damned, now I have to explicitly enter it?! - argh!
:)
simon
>
> On Mon, Sep 13, 2010 at 9:37 AM, Dennis Gearon wrote:
>> I just tried several searches again on google.
>>
>> I think th
Hi Savannah,
if you *only want to boost* documents based on the information you
calculate from the MoreLikeThis results (i.e. numeric measure), you
might want to take a look at the ExternalFileField type. This field type
reads its contents from a file which contains key-value pairs, e.g. the
Thanks Robert and everyone!
I'm working on changing our JVM settings today, since putting Solr 1.4.1 into
production will take a bit more work and testing. Hopefully, I'll be able to
test the setTermIndexDivisor on our test server tomorrow.
Mike, I've started the process to see if we can provi
Thanks Kent for your info.
We are not doing any faceting, sorting, or much else. My guess is that most of
the memory increase is just the data structures created when parts of the frq
and prx files get read into memory. Our frq files are about 77GB and the prx
files are about 260GB per sha
Hi
is it possible to issue a query to solr, to get a list which contains all the
field names in the index?
What about getting a list of the frequency of individual words in each field?
thanks,
Peter
: Yes, I have thought of that, or even extending field type. But this does not
: work for my use case, since I can have multiple fields of a same type
: (therefore with the same field type, and same analyzer), but each one of them
: needs specific information. Therefore, I think the only "nice" wa
: References:
: <4c881061.60...@jhu.edu>
:
: In-Reply-To:
:
: Subject: Need Advice for Finding Freelance Solr Expert
http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists
When starting a new discussion on a mailing list, please do not reply to
an exis
check:
http://wiki.apache.org/solr/LukeRequestHandler
On Mon, Sep 13, 2010 at 7:00 PM, Peter A. Kirk wrote:
> Hi
>
> is it possible to issue a query to solr, to get a list which contains all the
> field names in the index?
>
> What about to get a list of the freqency of individual words in eac
Fantastic - that is exactly what I was looking for!
But here is one thing I don't understand:
If I call the url:
http://localhost:8983/solr/admin/luke?numTerms=10&fl=name
Some of the result looks like:
<int name="gb">18</int>
Does this mean that the term "gb" occurs 18 times in the name field?
Be
On Mon, Sep 13, 2010 at 6:29 PM, Burton-West, Tom wrote:
> Thanks Robert and everyone!
>
> I'm working on changing our JVM settings today, since putting Solr 1.4.1 into
> production will take a bit more work and testing. Hopefully, I'll be able to
> test the setTermIndexDivisor on our test serv
I tracked down the problem and found a workaround. If there is a
wildcard entry in schema.xml such as the following.
then sort by function fails and returns Error 400 can not sort on
unindexed field:
Removing the name="*" entry from schema.xml is a workaround. I noted
this in the Solr-1
On Mon, Sep 13, 2010 at 6:45 PM, Burton-West, Tom wrote:
> Thanks Kent for your info.
>
> We are not doing any faceting, sorting, or much else. My guess is that most
> of the memory increase is just the data structures created when parts of the
> frq and prx files get read into memory. Our frq
Think about THE big one - google.
(First, China for this example is avoided because much Chinese data is
ILLEGAL to be
provided for search outside of China)
If there is data generated by people in Europe, in various languages:
1/ Is it stored close to where it is generated?
2/ Are sharding an
We are running SOLR 1.4.1 (Lucene 2.9.3) on a 2-CPU Linux host, but it seems
that only 1 CPU is ever being used. It almost seems like something is
single-threading inside the SOLR application. The CPU utilization is very
seldom over 0.9 even under load.
We are running on virtual Linux hosts and o
On Tue, Sep 14, 2010 at 1:39 AM, Peter A. Kirk wrote:
> Fantastic - that is exactly what I was looking for!
>
> But here is one thing I don't undertstand:
>
> If I call the url:
> http://localhost:8983/solr/admin/luke?numTerms=10&fl=name
>
> Some of the result looks like:
>
>
>
>
> 18
Hi,
I'm trying to spell check a whole field using a lowercasing keyword
tokenizer [1].
for example if I query for "furntree gully" I'm hoping to get back
"ferntree gully" as a suggestion. Unfortunately the spell checker
seems to be recognizing this as two tokens and returning suggestions
for bot
Never mind this one... With a bit more research I discovered I can use
spellcheck.q to provide the correct suggestion.
On 14 September 2010 16:02, Glen Stampoultzis wrote:
> Hi,
>
> I'm trying to spell check a whole field using a lowercasing keyword
> tokenizer [1].
>
> for example if I query for