Hi,
Long story short, we use a CMS that is integrated with Solr 4.6, with the solrj
jar file in the global/common Tomcat classpath. We currently use a Google
Search Appliance machine for our own freetext search needs, but plan to replace
that with some other solution in the near future. Since w
Yeah, sort of. Solr isn't bundled in the CMS, it is in a separate Tomcat
instance. But our code is running on the same Tomcat as the CMS, and the CMS
uses solrj 4.x to talk with its solr. And now we want to be able to talk with
our own separate solr, running solr 5.x, and would prefer to use sol
Shawn wrote:
>
> If you are NOT running SolrCloud, then that should work with no problem.
> The HTTP API is fairly static and has not seen any major upheaval recently.
> If you're NOT running SolrCloud, you may even be able to replace the
> SolrJ jar in your existing system with the 5.4.1 versio
OK, so just to be clear. As far as you know, and from your point of view, you
would consider it a better solution to stick with the 4.6 solrj client jar for
both the 4.6 and 5.x communication, rather than switching the 4.6 solrj client
jar to the 5.x version and hoping that the CMS solr-specific
Oh, one more thing. Would this setup still be possible if we wanted the new 5.x
Solr server to be the SolrCloud version? I'm not saying that SolrCloud is a
requirement for us (it might not even be suitable, since our index is not that
large), but it would still be good to know.
/Jimi
--
Hi,
We have a use case where we want to influence the score of the documents based
on the document type, and I am a bit unsure what is the best way to achieve
this. In essence we have about 100,000 documents, spread over about 15 different
document types. And we more or less want to tweak the score diff
Hi,
I want to setup ExtendedDisMax in our solr 4.6 server, but I can't seem to find
any example configuration for this. Ie the configuration needed in
solrconfig.xml. In the wiki page http://wiki.apache.org/solr/ExtendedDisMax it
simply says:
"Extended DisMax is already configured in the examp
I'm sorry, but I am still confused. I'm expecting to see some
tag somewhere. Why does neither the documentation nor the example solrconfig.xml
contain such a tag?
If the edismax requestHandler is defined automatically, the documentation
should explain that. Also, there should still exist some xml c
I have no problem with automatic. It is the "automagical" stuff that I find a bit
hard to like. Ie things that are automatic, but don't explain how and why
they are automatic. But Disneyland and Disney World are actually really good
examples of places where the magic stuff is suitable, ie in the
There is no need to deliberately misinterpret what I wrote. What I was trying
to say was that "automagical" things don't belong in a professional
environment, because it hides important information from people. And this is bad
enough as it is, but if on top of that it is the *intended* meaning for
Hi Jan,
Well, I have very likely confused some old documentation to be up to date, but
all I did was to google for "ExtendedDisMax" and clicked on the first result:
https://wiki.apache.org/solr/ExtendedDisMax
I could only assume that this was a valid page since it belongs to
wiki.apache.org/so
Well, I have to say that I strongly disagree with you. No regular user should
have to resort to the source code to understand that edismax is preconfigured.
Because that is what this is all about, in essence. The current documentation
doesn't mention this, and the only documentation about config
Thanks Shawn,
I had more or less assumed that the cwiki site was focused on the latest Solr
version, but never really noticed that the "reference guide" was available in
version-specific releases. I guess that is partly because I prefer googling
about a specific topic, instead of reading some
On Thursday, March 17, 2016 7:58 PM, wun...@wunderwood.org wrote:
>
> Think about using popularity as a boost. If one movie has a million rentals
> and one has a hundred rentals, there is no additive formula that balances
> that with text relevance. Even with log(popularity), it doesn't work.
I
On Thursday, March 17, 2016 11:21 PM, u...@odoko.co.uk wrote:
>
> If you use additive boosting, when you add a boost to a search with one term,
> (e.g. between 0 and 1)
> you get a different effect compared to when you add the same boost to a
> search with four terms (e.g. between 0 and 4).
Wo
On Friday, March 18, 2016 5:11 PM, wun...@wunderwood.org wrote:
>
> I used a popularity score based on the DVD being in people's queues and the
> streaming views.
> The Peter Jackson films were DVD only. They were in about 100 subscriber
> queues.
> The first Twilight film was in 1.25 million
Hi,
After reading a bit on various sites, and especially the blog post "Comparing
boost methods in Solr", it seems that the preferred boosting type is the
multiplicative one, over the additive one. But I can't really get my head
around *why* that is so, since in most boosting problems I can thi
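To illustrate the point being asked about: a toy calculation (not Solr code; the numbers are invented) shows why an additive boost behaves so differently depending on how large the text score happens to be, while a multiplicative boost preserves the balance with text relevance.

```java
// Toy illustration (not Solr code): why a multiplicative boost behaves more
// predictably than an additive one. All numbers are invented examples.
public class BoostDemo {
    // Additive style: final = textScore + boost
    static double additive(double textScore, double boost) {
        return textScore + boost;
    }
    // Multiplicative style: final = textScore * boost
    static double multiplicative(double textScore, double boost) {
        return textScore * boost;
    }
    public static void main(String[] args) {
        // Suppose a one-term query produces text scores around 0.5, and a
        // four-term query produces scores around 4.0. The same additive
        // boost of 1.0 swamps the first query's text score (tripling 0.5)
        // but barely moves the second one.
        System.out.println(additive(0.5, 1.0));       // 1.5 -> boost dominates
        System.out.println(additive(4.0, 1.0));       // 5.0 -> boost is minor
        // A multiplicative boost of 2.0 changes both by the same relative
        // amount, so the balance with text relevance is preserved.
        System.out.println(multiplicative(0.5, 2.0)); // 1.0
        System.out.println(multiplicative(4.0, 2.0)); // 8.0
    }
}
```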
Hi,
We are using Solrj to query our solr server, and it works great. However, it
uses the binary format wt=javabin, and now when I'm trying to get better debug
output, I notice a problem with this. The thing is, I want to include the
explain data for each search result, by adding "[explain]" as
On Friday, March 18, 2016 4:25 PM, wun...@wunderwood.org wrote:
>
> That works fine if you have a query that matches things with a wide range of
> popularities. But that is the easy case.
>
> What about the query "twilight", which matches all the Twilight movies, all
> of which are popular (mil
On Friday, March 18, 2016 3:53 PM, wun...@wunderwood.org wrote:
>
> Popularity has a very wide range. Try my example, scale 1 million and 100
> into the same 1.0-0.0 range. Even with log popularity.
Well, in our case, we don't really care to differentiate between documents with
low popularity.
On Friday, March 18, 2016 2:19 PM, apa...@elyograg.org wrote:
>
> The "max score" of a particular query can vary widely, and only has meaning
> within the context of that query.
> One query on an index might produce a max score of 0.944, so *every* document
> has a score less than one,
> whil
Forgot to add that we use Solr 4.6.0.
-Original Message-
From: jimi.hulleg...@svensktnaringsliv.se
[mailto:jimi.hulleg...@svensktnaringsliv.se]
Sent: Wednesday, March 16, 2016 9:39 PM
To: solr-user@lucene.apache.org
Subject: Explain style json? Without using wt=json...
Hi,
We are using
Hi,
I'm trying to boost documents using a phrase field boosting (ie the pf
parameter for edismax), but I can't get it to work (ie boosting documents where
the pf field matches the query as a phrase).
As far as I can tell, solr, or more specifically the edismax handler, does
*something* when I ad
OK. Interesting. But... I added a solr.TrimFilterFactory at the end of my
analyzer definition. Shouldn't that take care of the added space at the end?
The admin analysis page indicates that it works as it should, but I still can't
get edismax to boost.
-Original Message-
From: Jack Krup
I now used the Eclipse debugger to try and see if I can understand what is
happening, and it seems like the ExtendedDismaxQParser simply ignores my pf
parameter, since it doesn't interpret it as a phrase query.
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/4.6.0/solr/core/src/ja
Some more input, before I call it a day. Just for the heck of it, I tried
changing minClauseSize to 0 using the Eclipse debugger, so that it didn't
return null at line 1203, but instead returned the TermQuery on line 1205. Then
everything worked exactly as it should. The matching document got bo
I guess I can conclude that this is a bug. But I wasn't able to report it in
Jira. I just got to some servicedesk form
(https://issues.apache.org/jira/servicedesk/customer/portal/5/create/27) that
didn't seem related to solr in any way, (the affects/fix version fields didn't
correspond to any s
OK, well I'm not sure I agree with you. First of all, you ask me to point my
"pf" towards a tokenized field, but I already do that (the fact that all text
is tokenized into a single token doesn't change that fact). Also, I don't agree
with the view that a single-term phrase is never valid/reason
I think that this parameter is only used to interpret the dates provided in the
query, like query filters. At least that is how I interpret the wiki text. Your
interpretation makes more sense in general though, it would be nice if it was
possible to modify the timezone for both the query and the
On Wednesday, April 6, 2016 2:50 PM, apa...@elyograg.org wrote:
>
> If you can only create a service desk request, then you might be clicking the
> "Service Desk" menu item,
> or maybe you're clicking the little down arrow on the right side of the big
> red "Create" button.
> Try clicking the
Hi,
In general I think that the fieldNorm factor in the score calculation is quite
good. But when the text is short I think that the effect is too big.
Ie with two documents that have a short text in the same field, just a few
characters extra in one of the documents lowers the fieldNorm factor too
Hi,
I have been looking a bit at the tie parameter, and I think I understand how it
works, but I still have a few questions about it.
1. It is not documented anywhere (as far as I have seen) what the default value
is. Some testing indicates that the default value is 0, and it makes perfect
sen
OK. Well, still, the fact that the score increases almost 20% because of just
one extra term in the field, is not really reasonable if you ask me. But you
seem to say that this is expected, reasonable and wanted behavior for most use
cases?
I'm not sure that I feel comfortable replacing the defa
Hi Ahmet,
SweetSpotSimilarity seems quite nice. Some simple testing by throwing some
different values at the class gives quite good results. Setting ln_min=1,
ln_max=2, steepness=0.1 and discountOverlaps=true should give me more or less
what I want. At least for the title field. I'm not sure wh
Thanks Ahmet! The second I read that part about the "albino elephant" query I
remembered that I had read that before, but just forgotten about it. That
explanation is really good, and really should be part of the regular
documentation if you ask me. :)
/Jimi
-Original Message-
From: Ah
Hang on... It didn't work out as I wanted. But the problem seems to be in the
encoding of the fieldNorm value. The encoding is so coarse that two values that
were quite close to each other originally can become quite far apart after
encoding and de
I am talking about the title field. And for the title field, a sweetspot
interval of 1 to 50 makes very little sense. I want to have a fieldNorm value
that differentiates between for example 2, 3, 4 and 5 terms in the title, but
only very little.
The 20% number I got by simply calculating the d
Ok sure, I can try and give some examples :)
Let's say that we have the following documents:
Id: 1
Title: John Doe
Id: 2
Title: John Doe Jr.
Id: 3
Title: John Lennon: The Life
Id: 4
Title: John Thompson's Modern Course for the Piano: First Grade Book
Id: 5
Title: I Rode With Stonewall: Being C
Yes, the example was contrived. Partly because our documents are mostly in
Swedish text, but mostly because I thought that the example should be simple
enough so it focused on the thing discussed (even though I simplified it to
such a degree that I left out the current main problem with the fiel
Yes, we do edismax per field boosting, with explicit boosting of the title
field. So it sure makes length normalization less relevant. But not
*completely* irrelevant, which is why I still want to have it as part of the
scoring, just with much less impact that it currently has.
/Jimi
__
Yes, it definitely seems to be the main problem for us. I did some simple tests
of the encoding and decoding calculations in DefaultSimilarity, and my findings
are:
* For input between 1.0 and 0.5, a difference of 0.01 in the input causes the
output to change by a value of 0 or 0.125 depending
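The coarseness comes from the norm being packed into a single byte with only 3 mantissa bits. This is a standalone copy of the encode/decode arithmetic (matching, to my understanding, Lucene's SmallFloat.floatToByte315/byte315ToFloat that DefaultSimilarity delegates to), which reproduces the 0.125-wide steps near 1.0:

```java
// Standalone copy of the one-byte norm encoding arithmetic (3 mantissa bits,
// 5 exponent bits, as in Lucene's SmallFloat), showing the coarse 0.125-wide
// steps near 1.0 that the findings above describe.
public class NormEncodingDemo {
    static byte floatToByte315(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);
        if (smallfloat <= ((63 - 15) << 3)) {
            return (bits <= 0) ? (byte) 0 : (byte) 1; // underflow: 0 or smallest
        }
        if (smallfloat >= ((63 - 15) << 3) + 0x100) {
            return (byte) -1; // overflow: largest representable value
        }
        return (byte) (smallfloat - ((63 - 15) << 3));
    }
    static float byte315ToFloat(byte b) {
        if (b == 0) return 0.0f;
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }
    public static void main(String[] args) {
        // Every float in [0.875, 1.0) collapses to 0.875; just below that
        // the decoded value drops a whole step, to 0.75.
        for (float f : new float[] {1.0f, 0.95f, 0.9f, 0.88f, 0.87f}) {
            System.out.printf("%.2f -> %.4f%n", f, byte315ToFloat(floatToByte315(f)));
        }
    }
}
```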
Hi Ahmet,
Yes, I have also come to that conclusion, that I need to do one of those things
if I want this function, since Solr/Lucene is lacking in this area. Although
after some discussion with my coworkers, we decided to simply disable norms for
the title field, and not do anything more, for n
Hi,
An extra tip, on top of everything that Erick said:
Add an extra field to all documents that contains the date the document was
indexed. That way, you can always compare the solr documents on different
machines, and quickly see what "version" exists on each machine.
And you don't have to
Hi,
Is it possible using some Solr schema magic to make solr get the default value
for a field from another field? Ie, if the value is specified in the document
to be indexed, then that value is used. Otherwise it uses the value of another
field. As far as I understand it, the field property "d
Hi Emir,
Thanks for the tip about DefaultValueUpdateProcessorFactory. But even though I
agree that it most likely isn't too hard to write custom code that does this,
the overhead is a bit too much I think considering we now use a vanilla Solr
with no custom code deployed. So we would need to se
Thank you Alexandre! It worked great. :)
And here is how it is configured, if someone else wants to do this, but is too
busy to read the documentation for these classes:
source_field
target_field
target_field
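For anyone else hitting the same stripped-XML rendering above: the chain was presumably something like the following. The processor class names are my assumption, based on the standard Solr update processors for cloning a field and then keeping only the first value; double-check them against the documentation before using this.

```xml
<updateRequestProcessorChain name="default-from-other-field">
  <!-- copy source_field into target_field on every incoming document -->
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">source_field</str>
    <str name="dest">target_field</str>
  </processor>
  <!-- if target_field already had a value, keep that first value and drop
       the cloned one, so the clone only acts as a default -->
  <processor class="solr.FirstFieldValueUpdateProcessorFactory">
    <str name="fieldName">target_field</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```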
Hi,
I'm trying to add the spelling suggestion feature to our search, but I'm having
problems getting suggestions on some misspellings.
For example, the Swedish word 'mycket' exists in ~14,000 of a total of ~40,000
documents in our index.
A search for the incorrect spelling 'myket' (a missing '
Hi,
I wasn't happy with how our current solr configuration handled diacritics (like
'é') in the text and in search queries, since it simply considered the letter
with a diacritic as a distinct letter. Ie 'é' didn't match 'e', and vice versa.
Except for a handful of rare words where the diacritical
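The basic folding idea can be sketched with just the JDK (this is not Solr's ASCIIFoldingFilterFactory, only an illustration of the principle): decompose to NFD so 'é' becomes 'e' plus a combining accent, then strip the combining marks.

```java
import java.text.Normalizer;

// Illustration of diacritic folding using only the JDK (not Solr's
// ASCIIFoldingFilterFactory): decompose to NFD, then strip combining marks.
public class FoldDemo {
    static String fold(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD)
                         .replaceAll("\\p{M}", "");
    }
    public static void main(String[] args) {
        System.out.println(fold("café"));   // cafe
        System.out.println(fold("résumé")); // resume
    }
}
```

Note that a blanket fold like this would also turn Swedish å/ä/ö into a/a/o, which is usually wrong for Swedish text; an explicit mapping (e.g. via MappingCharFilterFactory, as discussed later in this thread) avoids that.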
Hi Kshitij,
Quoting Yonik, the creator of solr:
"Ties are the same as in lucene... internal docid (equiv to the order in which
they were added to the index)."
Also, you can have multiple sort clauses, where score can be the first one.
Like sort=score DESC, publishDate DESC. But I think the rec
Hi Eric.
> But that's not the most important bit. Have you considered something like
> MappingCharFilterFactory?
> Unfortunately that's a charFilter which transforms everything before it gets
> to the repeatFilter so you'd have to use two fields.
Yes, that is actually what I tried after giving
No one has any input on my post below about the spelling suggestions? I just
find it a bit frustrating not being able to understand this feature better, and
why it doesn't give the expected results. A built in "explain" feature really
would have helped.
/Jimi
-Original Message-
From: j
Hi Alessandro,
Thanks for your explanation. It helped a lot. Although setting
"spellcheck.maxResultsForSuggest" to a value higher than zero was not enough. I
also had to set "spellcheck.alternativeTermCount". With that done, I now get
suggestions when searching for 'mycet' (a misspelling of the
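For reference, the two parameters discussed above can be set as request handler defaults in solrconfig.xml. This is only a sketch; the handler name and the values are example choices, not a recommendation:

```xml
<!-- Sketch: spellcheck parameters as defaults on a search handler.
     Handler name and values are examples only. -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.maxResultsForSuggest">5</str>
    <str name="spellcheck.alternativeTermCount">5</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```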
I just noticed why setting maxResultsForSuggest to a high value was not a good
thing. Because now it shows spelling suggestions even on correctly spelled words.
I think what I would need is the logic of
SuggestMode.SUGGEST_WHEN_NOT_IN_INDEX, but with a configurable limit instead of
it being ha
Hi,
What is the preferred way to do searches with date truncation with respect to a
specific time zone? The dates are stored correctly, ie I can see the UTC date
in the index and if I add 1 or 2 hours (depending on daylight saving time or
not) I get the time in our time zone (CET/CEST). But when
OK. Feels a bit strange that one would have to do this manual calculation in
every place that performs searches like this.
Would be much more logical if solr supported specifying the timezone in the
query (with a default setting being possible to configure in solrconfig), and
that solr itself di
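The manual calculation in question can be sketched with just the JDK: take the local day boundaries in our own zone and convert them to the UTC instants that go into the Solr range query. The zone id, the date, and the field name are example values.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;

// The "manual calculation": convert local day boundaries (CET/CEST) into
// the UTC instants that a Solr date range query expects. Zone, date, and
// the field name in the comment below are example values.
public class DayRangeDemo {
    static Instant[] localDayToUtcRange(LocalDate day, ZoneId zone) {
        Instant start = day.atStartOfDay(zone).toInstant();
        Instant end = day.plusDays(1).atStartOfDay(zone).toInstant();
        return new Instant[] {start, end};
    }
    public static void main(String[] args) {
        ZoneId cet = ZoneId.of("Europe/Stockholm");
        Instant[] range = localDayToUtcRange(LocalDate.of(2016, 3, 16), cet);
        // CET is UTC+1 in mid-March (before the DST switch), so the local
        // day starts at 23:00 UTC the previous evening.
        System.out.println(range[0]); // 2016-03-15T23:00:00Z
        System.out.println(range[1]); // 2016-03-16T23:00:00Z
        // e.g. fq=scheduledate_start_tdate:[2016-03-15T23:00:00Z TO 2016-03-16T23:00:00Z}
    }
}
```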
Thanks! I totally forgot to add the word "math" (as in 'solr date math time
zone') when searching for a solution on this, so I never stumbled upon that
jira issue. Will give it a try.
/Jimi
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent:
Hi,
Is there a way to dynamically change a field name using some magic regex or
similar in the schema file?
For example, if we have a field named "subtitle_string_indexed_stored", then we
could have a dynamic field that matches "*_string_indexed_stored" and then
renames it to simply "subtitle"
Thanks Jan, for the links and quick explanation.
In our case Solr is integrated into the CMS we use, so I think upgrading to a
4.4+ version (that supports write calls to the Schema API) is not an option at
the moment.
The field alias function for the result set sounds simple enough, as long as
Hi,
I'm experimenting with date range faceting, and would like to use different
gaps depending on how old the date is. But I am not sure on how to do that.
This is what I have tried, using the SolrJ Java API 4.0.0 and Solr 4.1.0:
solrQuery.addDateRangeFacet("scheduledate_start_tdate", date1, da
Directly after I sent my email, I tested using two different field names,
instead of the same field name for both range facets. And then it worked.
So, it seems there is a bug where multiple range facets on the same field name
aren't handled. A workaround is to use a copyField to another field, an
> Chris Hostetter wrote:
>
> You can see that in the resulting URL you got the params are duplicated -- the
> problem is that when expressed this way, Solr doesn't know when the
> different values of the start/end/gap params should be applied -- it just
> loops over each of the facet.range fields (
Hi,
We have a problem with solr in our test environment in our new project. The
stack trace can be seen below. The thing is that this only affects the search
that is performed by the CMS itself. Our own custom searches still work fine.
Anyone know what could cause this error? A restart doesn't