Hi,
Long story short, we use a CMS that is integrated with Solr 4.6, with the solrj
jar file in the global/common Tomcat classpath. We currently use a Google
Search Appliance machine for our own freetext search needs, but plan to replace
that with some other solution in the near future. Since w
Yeah, sort of. Solr isn't bundled in the CMS, it is in a separate Tomcat
instance. But our code is running on the same Tomcat as the CMS, and the CMS
uses solrj 4.x to talk with its solr. And now we want to be able to talk with
our own separate solr, running solr 5.x, and would prefer to use sol
Shawn wrote:
>
> If you are NOT running SolrCloud, then that should work with no problem.
> The HTTP API is fairly static and has not seen any major upheaval recently.
> If you're NOT running SolrCloud, you may even be able to replace the
> SolrJ jar in your existing system with the 5.4.1 versio
OK, so just to be clear. As far as you know, and from your point of view, you
would consider it a better solution to stick with the 4.6 solrj client jar for
both the 4.6 and 5.x communication, rather than switching the 4.6 solrj client
jar to the 5.x version and hoping that the CMS solr-specific
Oh, one more thing. Would this setup still be possible if we wanted the new 5.x
Solr server to be the SolrCloud version? I'm not saying that SolrCloud is a
requirement for us (it might not even be suitable, since our index is not that
large), but it would still be good to know.
/Jimi
--
Hi,
We have a use case where we want to influence the score of the documents based
on the document type, and I am a bit unsure what is the best way to achieve
this. In essence we have about 100,000 documents, spread over about 15 different
document types. And we more or less want to tweak the score diff
Hi,
I want to setup ExtendedDisMax in our solr 4.6 server, but I can't seem to find
any example configuration for this. Ie the configuration needed in
solrconfig.xml. In the wiki page http://wiki.apache.org/solr/ExtendedDisMax it
simply says:
"Extended DisMax is already configured in the examp
I'm sorry, but I am still confused. I'm expecting to see some
tag somewhere. Why does neither the documentation nor the example solrconfig.xml
contain such a tag?
If the edismax requestHandler is defined automatically, the documentation
should explain that. Also, there should still exist some xml c
I have no problem with automatic. It is the "automagical" stuff that I find a bit
hard to like. Ie things that are automatic, but don't explain how and why
they are automatic. But Disneyland and Disney World are actually really good
examples of places where the magic stuff is suitable, ie in the
There is no need to deliberately misinterpret what I wrote. What I was trying
to say was that "automagical" things don't belong in a professional
environment, because it hides important information from people. And this is bad
enough as it is, but if on top of that it is the *intended* meaning for
Hi Jan,
Well, I have very likely confused some old documentation to be up to date, but
all I did was to google for "ExtendedDisMax" and clicked on the first result:
https://wiki.apache.org/solr/ExtendedDisMax
I could only assume that this was a valid page since it belongs to
wiki.apache.org/so
Well, I have to say that I strongly disagree with you. No regular user should
have to resort to the source code to understand that edismax is preconfigured.
Because that is what this is all about, in essence. The current documentation
doesn't mention this, and the only documentation about config
Thanks Shawn,
I had more or less assumed that the cwiki site was focused on the latest Solr
version, but never really noticed that the "reference guide" was available in
version-specific releases. I guess that is partly because I prefer googling
about a specific topic, instead of reading some
On Thursday, March 17, 2016 7:58 PM, wun...@wunderwood.org wrote:
>
> Think about using popularity as a boost. If one movie has a million rentals
> and one has a hundred rentals, there is no additive formula that balances
> that with text relevance. Even with log(popularity), it doesn't work.
I
On Thursday, March 17, 2016 11:21 PM, u...@odoko.co.uk wrote:
>
> If you use additive boosting, when you add a boost to a search with one term,
> (e.g. between 0 and 1)
> you get a different effect compared to when you add the same boost to a
> search with four terms (e.g. between 0 and 4).
Wo
On Friday, March 18, 2016 5:11 PM, wun...@wunderwood.org wrote:
>
> I used a popularity score based on the DVD being in people's queues and the
> streaming views.
> The Peter Jackson films were DVD only. They were in about 100 subscriber
> queues.
> The first Twilight film was in 1.25 million
Hi,
After reading a bit on various sites, and especially the blog post "Comparing
boost methods in Solr", it seems that the preferred boosting type is the
multiplicative one, over the additive one. But I can't really get my head
around *why* that is so, since in most boosting problems I can thi
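To illustrate the point being asked about: a toy calculation (not Solr code; the numbers are invented) shows why an additive boost behaves so differently depending on how large the text score happens to be, while a multiplicative boost preserves the balance with text relevance.

```java
// Toy illustration (not Solr code): why a multiplicative boost behaves more
// predictably than an additive one. All numbers are invented examples.
public class BoostDemo {
    // Additive style: final = textScore + boost
    static double additive(double textScore, double boost) {
        return textScore + boost;
    }
    // Multiplicative style: final = textScore * boost
    static double multiplicative(double textScore, double boost) {
        return textScore * boost;
    }
    public static void main(String[] args) {
        // Suppose a one-term query produces text scores around 0.5, and a
        // four-term query produces scores around 4.0. The same additive
        // boost of 1.0 swamps the first query's text score (tripling 0.5)
        // but barely moves the second one.
        System.out.println(additive(0.5, 1.0));       // 1.5 -> boost dominates
        System.out.println(additive(4.0, 1.0));       // 5.0 -> boost is minor
        // A multiplicative boost of 2.0 changes both by the same relative
        // amount, so the balance with text relevance is preserved.
        System.out.println(multiplicative(0.5, 2.0)); // 1.0
        System.out.println(multiplicative(4.0, 2.0)); // 8.0
    }
}
```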
Hi,
We are using Solrj to query our solr server, and it works great. However, it
uses the binary format wt=javabin, and now when I'm trying to get better debug
output, I notice a problem with this. The thing is, I want to include the
explain data for each search result, by adding "[explain]" as
On Friday, March 18, 2016 4:25 PM, wun...@wunderwood.org wrote:
>
> That works fine if you have a query that matches things with a wide range of
> popularities. But that is the easy case.
>
> What about the query "twilight", which matches all the Twilight movies, all
> of which are popular (mil
On Friday, March 18, 2016 3:53 PM, wun...@wunderwood.org wrote:
>
> Popularity has a very wide range. Try my example, scale 1 million and 100
> into the same 1.0-0.0 range. Even with log popularity.
Well, in our case, we don't really care to differentiate between documents with
low popularity.
On Friday, March 18, 2016 2:19 PM, apa...@elyograg.org wrote:
>
> The "max score" of a particular query can vary widely, and only has meaning
> within the context of that query.
> One query on an index might produce a max score of 0.944, so *every* document
> has a score less than one,
> whil
Forgot to add that we use Solr 4.6.0.
-Original Message-
From: jimi.hulleg...@svensktnaringsliv.se
[mailto:jimi.hulleg...@svensktnaringsliv.se]
Sent: Wednesday, March 16, 2016 9:39 PM
To: solr-user@lucene.apache.org
Subject: Explain style json? Without using wt=json...
Hi,
We are using
Hi,
I'm trying to boost documents using a phrase field boosting (ie the pf
parameter for edismax), but I can't get it to work (ie boosting documents where
the pf field matches the query as a phrase).
As far as I can tell, solr, or more specifically the edismax handler, does
*something* when I ad
OK. Interesting. But... I added a solr.TrimFilterFactory at the end of my
analyzer definition. Shouldn't that take care of the added space at the end?
The admin analysis page indicates that it works as it should, but I still can't
get edismax to boost.
-Original Message-
From: Jack Krup
I now used the Eclipse debugger to try and see if I can understand what is
happening, and it seems like the ExtendedDismaxQParser simply ignores my pf
parameter, since it doesn't interpret it as a phrase query.
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/4.6.0/solr/core/src/ja
Some more input, before I call it a day. Just for the heck of it, I tried
changing minClauseSize to 0 using the Eclipse debugger, so that it didn't
return null at line 1203, but instead returned the TermQuery on line 1205. Then
everything worked exactly as it should. The matching document got bo
I guess I can conclude that this is a bug. But I wasn't able to report it in
Jira. I just got to some servicedesk form
(https://issues.apache.org/jira/servicedesk/customer/portal/5/create/27) that
didn't seem related to solr in any way, (the affects/fix version fields didn't
correspond to any s
OK, well I'm not sure I agree with you. First of all, you ask me to point my
"pf" towards a tokenized field, but I already do that (the fact that all text
is tokenized into a single token doesn't change that fact). Also, I don't agree
with the view that a single-term phrase is never valid/reason
I think that this parameter is only used to interpret the dates provided in the
query, like query filters. At least that is how I interpret the wiki text. Your
interpretation makes more sense in general though, it would be nice if it was
possible to modify the timezone for both the query and the
On Wednesday, April 6, 2016 2:50 PM, apa...@elyograg.org wrote:
>
> If you can only create a service desk request, then you might be clicking the
> "Service Desk" menu item,
> or maybe you're clicking the little down arrow on the right side of the big
> red "Create" button.
> Try clicking the
Hi,
In general I think that the fieldNorm factor in the score calculation is quite
good. But when the text is short I think that the effect is too big.
Ie with two documents that have a short text in the same field, just a few
characters extra in one of the documents lowers the fieldNorm factor too
Hi,
I have been looking a bit at the tie parameter, and I think I understand how it
works, but I still have a few questions about it.
1. It is not documented anywhere (as far as I have seen) what the default value
is. Some testing indicates that the default value is 0, and it makes perfect
sen
OK. Well, still, the fact that the score increases almost 20% because of just
one extra term in the field, is not really reasonable if you ask me. But you
seem to say that this is expected, reasonable and wanted behavior for most use
cases?
I'm not sure that I feel comfortable replacing the defa
Hi Ahmet,
SweetSpotSimilarity seems quite nice. Some simple testing by throwing some
different values at the class gives quite good results. Setting ln_min=1,
ln_max=2, steepness=0.1 and discountOverlaps=true should give me more or less
what I want. At least for the title field. I'm not sure wh
Thanks Ahmet! The second I read that part about the "albino elephant" query I
remembered that I had read that before, but just forgotten about it. That
explanation is really good, and really should be part of the regular
documentation if you ask me. :)
/Jimi
-Original Message-
From: Ah
Hang on... It didn't work out as I wanted. But the problem seems to be in the
encoding of the fieldNorm value. The encoding is so coarse that two values that
were quite close to each other originally can become quite far apart after
encoding and de
I am talking about the title field. And for the title field, a sweetspot
interval of 1 to 50 makes very little sense. I want to have a fieldNorm value
that differentiates between for example 2, 3, 4 and 5 terms in the title, but
only very little.
The 20% number I got by simply calculating the d
Ok sure, I can try and give some examples :)
Let's say that we have the following documents:
Id: 1
Title: John Doe
Id: 2
Title: John Doe Jr.
Id: 3
Title: John Lennon: The Life
Id: 4
Title: John Thompson's Modern Course for the Piano: First Grade Book
Id: 5
Title: I Rode With Stonewall: Being C
Yes, the example was contrived. Partly because our documents are mostly in
Swedish text, but mostly because I thought that the example should be simple
enough so it focused on the thing discussed (even though I simplified it to
such a degree that I left out the current main problem with the fiel
Yes, we do edismax per field boosting, with explicit boosting of the title
field. So it sure makes length normalization less relevant. But not
*completely* irrelevant, which is why I still want to have it as part of the
scoring, just with much less impact that it currently has.
/Jimi
__
Yes, it definitely seems to be the main problem for us. I did some simple tests
of the encoding and decoding calculations in DefaultSimilarity, and my findings
are:
* For input between 1.0 and 0.5, a difference of 0.01 in the input causes the
output to change by a value of 0 or 0.125 depending
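The coarseness comes from the norm being packed into a single byte with only 3 mantissa bits. This is a standalone copy of the encode/decode arithmetic (matching, to my understanding, Lucene's SmallFloat.floatToByte315/byte315ToFloat that DefaultSimilarity delegates to), which reproduces the 0.125-wide steps near 1.0:

```java
// Standalone copy of the one-byte norm encoding arithmetic (3 mantissa bits,
// 5 exponent bits, as in Lucene's SmallFloat), showing the coarse 0.125-wide
// steps near 1.0 that the findings above describe.
public class NormEncodingDemo {
    static byte floatToByte315(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);
        if (smallfloat <= ((63 - 15) << 3)) {
            return (bits <= 0) ? (byte) 0 : (byte) 1; // underflow: 0 or smallest
        }
        if (smallfloat >= ((63 - 15) << 3) + 0x100) {
            return (byte) -1; // overflow: largest representable value
        }
        return (byte) (smallfloat - ((63 - 15) << 3));
    }
    static float byte315ToFloat(byte b) {
        if (b == 0) return 0.0f;
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }
    public static void main(String[] args) {
        // Every float in [0.875, 1.0) collapses to 0.875; just below that
        // the decoded value drops a whole step, to 0.75.
        for (float f : new float[] {1.0f, 0.95f, 0.9f, 0.88f, 0.87f}) {
            System.out.printf("%.2f -> %.4f%n", f, byte315ToFloat(floatToByte315(f)));
        }
    }
}
```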
Hi Ahmet,
Yes, I have also come to that conclusion, that I need to do one of those things
if I want this function, since Solr/Lucene is lacking in this area. Although
after some discussion with my coworkers, we decided to simply disable norms for
the title field, and not do anything more, for n
Hi,
An extra tip, on top of everything that Erick said:
Add an extra field to all documents that contains the date the document was
indexed. That way, you can always compare the solr documents on different
machines, and quickly see what "version" exists on each machine.
And you don't have to
Hi,
Is it possible using some Solr schema magic to make solr get the default value
for a field from another field? Ie, if the value is specified in the document
to be indexed, then that value is used. Otherwise it uses the value of another
field. As far as I understand it, the field property "d
Hi Emir,
Thanks for the tip about DefaultValueUpdateProcessorFactory. But even though I
agree that it most likely isn't too hard to write custom code that does this,
the overhead is a bit too much I think considering we now use a vanilla Solr
with no custom code deployed. So we would need to se
Thank you Alexandre! It worked great. :)
And here is how it is configured, if someone else wants to do this, but is too
busy to read the documentation for these classes:
source_field
target_field
target_field
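For anyone else hitting the same stripped-XML rendering above: the chain was presumably something like the following. The processor class names are my assumption, based on the standard Solr update processors for cloning a field and then keeping only the first value; double-check them against the documentation before using this.

```xml
<updateRequestProcessorChain name="default-from-other-field">
  <!-- copy source_field into target_field on every incoming document -->
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">source_field</str>
    <str name="dest">target_field</str>
  </processor>
  <!-- if target_field already had a value, keep that first value and drop
       the cloned one, so the clone only acts as a default -->
  <processor class="solr.FirstFieldValueUpdateProcessorFactory">
    <str name="fieldName">target_field</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```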
Hi,
I'm trying to add the spelling suggestion feature to our search, but I'm having
problems getting suggestions on some misspellings.
For example, the Swedish word 'mycket' exists in ~14,000 of a total of ~40,000
documents in our index.
A search for the incorrect spelling 'myket' (a missing '
Hi,
I wasn't happy with how our current solr configuration handled diacritics (like
'é') in the text and in search queries, since it simply considered the letter
with a diacritic as a distinct letter. Ie 'é' didn't match 'e', and vice versa.
Except for a handful of rare words where the diacritical
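The basic folding idea can be sketched with just the JDK (this is not Solr's ASCIIFoldingFilterFactory, only an illustration of the principle): decompose to NFD so 'é' becomes 'e' plus a combining accent, then strip the combining marks.

```java
import java.text.Normalizer;

// Illustration of diacritic folding using only the JDK (not Solr's
// ASCIIFoldingFilterFactory): decompose to NFD, then strip combining marks.
public class FoldDemo {
    static String fold(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD)
                         .replaceAll("\\p{M}", "");
    }
    public static void main(String[] args) {
        System.out.println(fold("café"));   // cafe
        System.out.println(fold("résumé")); // resume
    }
}
```

Note that a blanket fold like this would also turn Swedish å/ä/ö into a/a/o, which is usually wrong for Swedish text; an explicit mapping (e.g. via MappingCharFilterFactory, as discussed later in this thread) avoids that.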
Hi Kshitij,
Quoting Yonik, the creator of solr:
"Ties are the same as in lucene... internal docid (equiv to the order in which
they were added to the index)."
Also, you can have multiple sort clauses, where score can be the first one.
Like sort=score DESC, publishDate DESC. But I think the rec
Hi Eric.
> But that's not the most important bit. Have you considered something like
> MappingCharFilterFactory?
> Unfortunately that's a charFilter which transforms everything before it gets
> to the repeatFilter so you'd have to use two fields.
Yes, that is actually what I tried after giving
No one has any input on my post below about the spelling suggestions? I just
find it a bit frustrating not being able to understand this feature better, and
why it doesn't give the expected results. A built in "explain" feature really
would have helped.
/Jimi
-Original Message-
From: j
Hi Alessandro,
Thanks for your explanation. It helped a lot. Although setting
"spellcheck.maxResultsForSuggest" to a value higher than zero was not enough. I
also had to set "spellcheck.alternativeTermCount". With that done, I now get
suggestions when searching for 'mycet' (a misspelling of the
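For reference, the two parameters discussed above can be set as request handler defaults in solrconfig.xml. This is only a sketch; the handler name and the values are example choices, not a recommendation:

```xml
<!-- Sketch: spellcheck parameters as defaults on a search handler.
     Handler name and values are examples only. -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.maxResultsForSuggest">5</str>
    <str name="spellcheck.alternativeTermCount">5</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```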
I just noticed why setting maxResultsForSuggest to a high value was not a good
thing. Because now it shows spelling suggestions even on correctly spelled words.
I think what I would need is the logic of
SuggestMode.SUGGEST_WHEN_NOT_IN_INDEX, but with a configurable limit instead of
it being ha
Hi,
What is the preferred way to do searches with date truncation with respect to a
specific time zone? The dates are stored correctly, ie I can see the UTC date
in the index and if I add 1 or 2 hours (depending on daylight saving time or
not) I get the time in our time zone (CET/CEST). But when
OK. Feels a bit strange that one would have to do this manual calculation in
every place that performs searches like this.
Would be much more logical if solr supported specifying the timezone in the
query (with a default setting being possible to configure in solrconfig), and
that solr itself di
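The manual calculation in question can be sketched with just the JDK: take the local day boundaries in our own zone and convert them to the UTC instants that go into the Solr range query. The zone id, the date, and the field name are example values.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;

// The "manual calculation": convert local day boundaries (CET/CEST) into
// the UTC instants that a Solr date range query expects. Zone, date, and
// the field name in the comment below are example values.
public class DayRangeDemo {
    static Instant[] localDayToUtcRange(LocalDate day, ZoneId zone) {
        Instant start = day.atStartOfDay(zone).toInstant();
        Instant end = day.plusDays(1).atStartOfDay(zone).toInstant();
        return new Instant[] {start, end};
    }
    public static void main(String[] args) {
        ZoneId cet = ZoneId.of("Europe/Stockholm");
        Instant[] range = localDayToUtcRange(LocalDate.of(2016, 3, 16), cet);
        // CET is UTC+1 in mid-March (before the DST switch), so the local
        // day starts at 23:00 UTC the previous evening.
        System.out.println(range[0]); // 2016-03-15T23:00:00Z
        System.out.println(range[1]); // 2016-03-16T23:00:00Z
        // e.g. fq=scheduledate_start_tdate:[2016-03-15T23:00:00Z TO 2016-03-16T23:00:00Z}
    }
}
```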
Thanks! I totally forgot to add the word "math" (as in 'solr date math time
zone') when searching for a solution on this, so I never stumbled upon that
jira issue. Will give it a try.
/Jimi
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent:
Hi,
Is there a way to dynamically change a field name using some magic regex or
similar in the schema file?
For example, if we have a field named "subtitle_string_indexed_stored", then we
could have a dynamic field that matches "*_string_indexed_stored" and then
renames it to simply "subtitle"
Thanks Jan, for the links and quick explanation.
In our case Solr is integrated into the CMS we use, so I think upgrading to a
4.4+ version (that supports write calls to the Schema API) is not an option at
the moment.
The field alias function for the result set sounds simple enough, as long as
Hi,
I'm experimenting with date range faceting, and would like to use different
gaps depending on how old the date is. But I am not sure on how to do that.
This is what I have tried, using the SolrJ Java API 4.0.0 and Solr 4.1.0:
solrQuery.addDateRangeFacet("scheduledate_start_tdate", date1, da
Directly after I sent my email, I tested using two different field names,
instead of the same field name for both range facets. And then it worked.
So, it seems there is a bug where multiple range facets on the same field name
aren't handled. A workaround is to use a copyField to another field, an
> Chris Hostetter wrote:
>
> You can see that in the resulting URL you got the params are duplicated -- the
> problem is that when expressed this way, Solr doesn't know when the
> different values of the start/end/gap params should be applied -- it just
> loops over each of the facet.range fields (
Hi,
We have a problem with solr in our test environment in our new project. The
stack trace can be seen below. The thing is that this only affects the search
that is performed by the CMS itself. Our own custom searches still work fine.
Anyone know what could cause this error? A restart doesn't