date slider
Hi, I have implemented a search where all the facets are offered as checkbox-style filters, along with a fulltext search to first narrow down the result set. For this, the search runs the fulltext query together with the faceting. If additional checkbox filters have been deselected, I run a secondary query without faceting to get the actual results (and set rows=0 in the facet query). I just stumbled over an entry in the wiki [1] which suggests that I do not really need that secondary query if filters are selected. But this is not the main topic of my question. Now I also want to offer a slider to define the date range to include in the result set. Here I do not want to do faceting; I just want to find the min and max date values in the result (without any of the facet filters applied) so I know the start and end points for the slider. The user can then move the slider handles to further filter the result set. How can I best fetch just those min and max values, ideally without adding a separate query just for this? regards, Lukas Kahwe Smith m...@pooteeweet.org [1] http://wiki.apache.org/solr/SimpleFacetParameters#Tagging_and_excluding_Filters
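One way to get the slider bounds without an extra round trip is to add StatsComponent parameters (stats=true, stats.field=...) to the faceting request that already runs without the checkbox filters; the response then carries the min and max of that field. The sketch below only builds that query string. The field name pub_date_ms is hypothetical, and the sketch assumes the date is also indexed as a numeric field (for example epoch milliseconds), since StatsComponent may not accept date fields in every Solr version.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

final class SliderStatsQuery {
    // URL-encodes a parameter value (UTF-8 is always available on the JVM).
    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds the facet request (no checkbox fq, rows=0) with StatsComponent
    // switched on, so one response carries both facet counts and min/max.
    static String build(String userQuery, String statsField, List<String> facetFields) {
        List<String> parts = new ArrayList<>();
        parts.add("q=" + enc(userQuery));
        parts.add("rows=0");                 // documents come from the other query
        parts.add("facet=true");
        for (String f : facetFields) {
            parts.add("facet.field=" + enc(f));
        }
        parts.add("stats=true");             // enable StatsComponent
        parts.add("stats.field=" + enc(statsField)); // min/max of this (numeric) field
        return String.join("&", parts);
    }
}
```

The min and max should then appear under stats/stats_fields/pub_date_ms in the response, next to the facet counts.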
Re: Autosuggest
Hi, maybe you would like to have a look at solr.ShingleFilterFactory [1] to expand your autosuggest to more than one term. -Sascha [1] http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory Blargy wrote: Thanks for your help and especially your analyzer.. probably saved me a full-import or two :)
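To see what ShingleFilterFactory adds, here is a simplified plain-Java illustration of word shingles (not Lucene's implementation). With the filter's defaults (maxShingleSize of 2, unigrams kept), each token is emitted together with the two-word phrase starting at it, so a prefix lookup over the indexed terms can also suggest multi-word phrases. Whitespace tokenization and lowercasing are assumptions of the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class Shingles {
    // Simplified word shingling: for each position, optionally emit the single
    // token, then every phrase of 2..maxSize tokens that starts there.
    static List<String> shingles(String text, int maxSize, boolean outputUnigrams) {
        String[] tokens = text.trim().toLowerCase(Locale.ROOT).split("\\s+");
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (outputUnigrams) {
                out.add(tokens[i]);
            }
            for (int n = 2; n <= maxSize && i + n <= tokens.length; n++) {
                out.add(String.join(" ", Arrays.copyOfRange(tokens, i, i + n)));
            }
        }
        return out;
    }
}
```

In the schema, the corresponding knobs are the maxShingleSize and outputUnigrams attributes on solr.ShingleFilterFactory.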
Re: How to tell which field matched?
Hi, I'm not sure if debugQuery=on is a feasible solution in a production environment, as generating that extra information requires a considerable amount of computation. -Sascha Jon Baer wrote: Does the standard debug component (?debugQuery=on) give you what you need? http://wiki.apache.org/solr/SolrRelevancyFAQ#Why_does_id:archangel_come_before_id:hawkgirl_when_querying_for_.22wings.22 - Jon On May 14, 2010, at 4:03 PM, Tim Garton wrote: All, I've searched around for help with something we are trying to do and haven't come across much. We are running solr 1.4. Here is a summary of the issue we are facing: A simplified example of our schema is something like this: When someone does a search we search across the title, supplement_title, and supplement_pdf_text fields. When we get our results, we would like to be able to tell which field the search matched and if it's a multiValued field, which of the multiple values matched. This is so that we can display results similar to: Example Title Example Supplement Title Example Supplement Title 2 (your search matched this document) Example Supplement Title 3 Example Title 2 Example Supplement Title 4 Example Supplement Title 5 Example Supplement Title 6 (your search matched this document) etc. How would you recommend doing this? Is there some way to get solr to tell us which field matched, including multiValued fields? As a workaround we have been using highlighting to tell which field matched, but it doesn't get us what we want for multiValued fields and there is a significant cost to enabling the highlighting. Should we design our schema in some other fashion to achieve these results? Thanks. -Tim
Re: How to tell which field matched?
Additionally, I don't think this gets us what we want with multiValued fields. It tells if a multiValued field matched, but not which value out of the multiple values matched. I am beginning to suspect that this information can't be returned and we may have to restructure our schema. -Tim On Sat, May 15, 2010 at 7:12 AM, Sascha Szott wrote: > Hi, > > I'm not sure if debugQuery=on is a feasible solution in a productive > environment, as generating such extra information requires a reasonable > amount of computation. > > -Sascha
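Since the thread suggests Solr 1.4 does not report which value of a multiValued field matched, one workaround is to return the stored field (via fl) and check its values on the client. The sketch below is a naive, hypothetical helper: it treats a value as matching when it contains every query term as a whole word, ignoring stemming, synonyms and any other analysis, so it only approximates what Solr actually matched. Restructuring the schema (for example one document per supplement), as Tim suggests, avoids the approximation.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class MatchedValues {
    // Returns the values of a multiValued field that contain all query terms.
    // Hypothetical client-side approximation; Solr's analysis is NOT replicated.
    static List<String> matchingValues(List<String> values, String query) {
        Set<String> terms = tokens(query);
        List<String> hits = new ArrayList<>();
        for (String v : values) {
            if (tokens(v).containsAll(terms)) {
                hits.add(v);
            }
        }
        return hits;
    }

    // Lowercases and splits on anything that is not a letter or digit.
    private static Set<String> tokens(String s) {
        Set<String> out = new HashSet<>();
        for (String t : s.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }
}
```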
Re: Autosuggest
On 2010-05-15 02:46, Blargy wrote: > > Thanks for your help and especially your analyzer.. probably saved me a > full-import or two :) > Also, take a look at this issue: https://issues.apache.org/jira/browse/SOLR-1316 -- Best regards, Andrzej Bialecki Information Retrieval, Semantic Web; Embedded Unix, System Integration http://www.sigram.com Contact: info at sigram dot com
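SOLR-1316 proposes a dedicated autosuggest component. The sketch below is not that patch; it is a toy illustration of the underlying idea: a sorted dictionary of terms and phrases queried by prefix, with a per-entry weight (for example the click-through counts mentioned later in the thread) deciding the order. Class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

final class PrefixSuggester {
    private final TreeMap<String, Integer> weights = new TreeMap<>();

    // Records a term/phrase with a weight (e.g. a hypothetical click count);
    // repeated adds accumulate.
    void add(String phrase, int weight) {
        weights.merge(phrase.toLowerCase(Locale.ROOT), weight, Integer::sum);
    }

    // Returns up to `limit` entries starting with `prefix`, heaviest first.
    List<String> suggest(String prefix, int limit) {
        String p = prefix.toLowerCase(Locale.ROOT);
        // Every key in [p, p + '\uffff') starts with p.
        SortedMap<String, Integer> range = weights.subMap(p, p + Character.MAX_VALUE);
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(range.entrySet());
        entries.sort((a, b) -> Integer.compare(b.getValue(), a.getValue())); // stable: ties stay alphabetical
        List<String> out = new ArrayList<>();
        for (int i = 0; i < entries.size() && i < limit; i++) {
            out.add(entries.get(i).getKey());
        }
        return out;
    }
}
```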
Re: Autosuggest
Andrzej, is this ready for production use? "Hopefully in the future we can include user click through rates to boost those terms/phrases higher" - This could be huge! -- View this message in context: http://lucene.472066.n3.nabble.com/Autosuggest-tp818430p819762.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Autosuggest
Maybe I should have phrased it as: "Is this ready to be used with Solr 1.4?" Also, as Grant asked in the thread, what is the actual status of that patch? Thanks again! -- View this message in context: http://lucene.472066.n3.nabble.com/Autosuggest-tp818430p819765.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to tell which field matched?
Sorry, my response wasn't meant as a suggestion to actually run with debugQuery on in production; I was wondering whether it (the component) gave you the insight you were looking for. On a side note, I'm also interested in this type of component, because on a number of projects I have worked on recently, people who are not tuning the index want to know "why did my query match these results?" in some sort of ~plain English explanation~. I have the feeling what you want is possible; it's just not finding its way into the result set yet (a guess), or it needs a plugin. - Jon On May 15, 2010, at 11:16 AM, Tim Garton wrote: > Additionally, I don't think this gets us what we want with multiValued > fields. It tells if a multiValued field matched, but not which value > out of the multiple values matched. I am beginning to suspect that > this information can't be returned and we may have to restructure our > schema. > > -Tim
Re: Connection Pool
Connection pooling is determined by the underlying Apache Commons HttpClient connection manager, which is configured when you create the server object. SUSS (StreamingUpdateSolrServer) does socket pooling by default and is the preferred way to do concurrent indexing. There are some quirks in the set of server implementations, and SUSS avoids them. Unless you are willing to dig into the SolrJ server code and understand exactly how it works, stay with SUSS. On Fri, May 14, 2010 at 6:44 AM, gabriele renzi wrote: > On Fri, May 14, 2010 at 3:35 PM, Anderson vasconcelos > wrote: >> Hi >> I wanna to know if has any connection pool client to manage the connections >> with solr. In my system, we have a lot of concurrency index request. I cant >> shared my connection, i need to create one per transaction. But if i create >> one per transaction, i think the performance will down. >> >> How you resolve this problem? > > The commonsHttpSolrServer class does connection pooling, and IIRC also > the StreamingUpdateSolrServer. > > > > -- > blog en: http://www.riffraff.info > blog it: http://riffraff.blogsome.com > -- Lance Norskog goks...@gmail.com
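The pattern described here is to create one server object and share it across threads rather than creating one per transaction. The sketch below shows only that sharing pattern; SharedIndexClient is a hypothetical stand-in for a single thread-safe CommonsHttpSolrServer or StreamingUpdateSolrServer instance, so the code runs without SolrJ.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for ONE thread-safe SolrJ server instance.
final class SharedIndexClient {
    private final AtomicInteger sent = new AtomicInteger();

    // A real client would send the document over its pooled connections.
    void add(String docId) {
        sent.incrementAndGet();
    }

    int sentCount() {
        return sent.get();
    }
}

final class ConcurrentIndexing {
    // Creates a single client and shares it across all worker threads,
    // instead of building a new client (and new connections) per transaction.
    static int indexAll(int docs, int threads) {
        SharedIndexClient client = new SharedIndexClient();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < docs; i++) {
            final String id = "doc-" + i;
            pool.submit(() -> client.add(id));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return client.sentCount();
    }
}
```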
Re: Short DismaxRequestHandler Question
Okay, I will do so in future, if another problem like this occurs. At the moment, everything is fine after I followed your suggestions. Kind regards - Mitch -- View this message in context: http://lucene.472066.n3.nabble.com/Short-DismaxRequestHandler-Question-tp775913p820355.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: multi-valued associated fields
Here's the problem with mixing dissimilar text: relevance. Text relevance depends on how a document compares with all the other documents in the index (for example, how rare each query term is across the whole index). If you index nothing but technical papers, searching for a technical term will find what you expect. If you mix technical papers and movie titles, text queries will be useless. On Thu, May 13, 2010 at 12:06 PM, Eric Grobler wrote: > Hi Ahmed > > Thanks again for sharing your insight and experience. > I will discuss the multi-core approach with members of our team. > > Regards > Eric > > On Wed, May 12, 2010 at 9:24 PM, ahammad wrote: > >> >> In our deployment, we thought that complications might arise when >> attempting >> to hit the Solr server with addresses of too many cores. For instance, we >> have 15+ cores running at the moment. At the worst case, we will have to >> use >> all 15+ addresses of all the cores to search all our data. What we >> eventually did was to combine all the cores into a single core, which will >> basically give us a more clean solution. You will get the simplicity of >> querying one core, but the flexibility of modifying cores separately. >> >> Basically, we have all the cores indexing separately. We set up a script >> that would use the index merge functionality of Solr to combine all the >> indexes into a single index accessible through one core. Yes, there will be >> some overhead on the server, but I believe that it's a good compromise. In >> our case, we have multiple servers at our disposal, so this was not a >> problem to implement. It all depends on your data set and the volume of >> documents that you will be indexing. >> >> -- >> View this message in context: >> http://lucene.472066.n3.nabble.com/multi-valued-associated-fields-tp811883p813419.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> > -- Lance Norskog goks...@gmail.com
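To make the relevance point concrete: Lucene's default scoring in the 2.9/3.x line (which Solr 1.4 builds on) weights terms by idf = 1 + ln(numDocs / (docFreq + 1)), a statistic taken over the whole index. The sketch computes it (as a double rather than Lucene's float) for made-up numbers: a word found in 900 of 1,000 technical papers, before and after adding 99,000 movie titles that never contain it.

```java
final class IdfDemo {
    // Inverse document frequency as in Lucene's DefaultSimilarity (2.9/3.x):
    // idf = 1 + ln(numDocs / (docFreq + 1)). It depends on the whole index.
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }
}
```

With these hypothetical counts, idf(900, 1000) is about 1.1 while idf(900, 100000) is about 5.7: a word that is nearly useless for ranking among technical papers suddenly looks like a rare, highly discriminating term once the movie titles are mixed in.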
Re: sort by function
Can you provide us with some more information on what you really want to do? As the examples in the wiki show, the value returned by the function query is multiplied by the score; you can boost the value returned by the function query if you like. Kind regards - Mitch -- View this message in context: http://lucene.472066.n3.nabble.com/sort-by-function-tp814380p820359.html Sent from the Solr - User mailing list archive at Nabble.com.
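A small sketch of the arithmetic, assuming the {!boost b=...} query parser, which multiplies the text score by the function value (dismax's bf parameter, by contrast, adds the function value to the score). recip(x,m,a,b) = a/(m*x+b) is one of the documented function queries; the helper names and the popularity field in the example are hypothetical.

```java
final class BoostDemo {
    // recip(x, m, a, b) = a / (m*x + b), as documented for Solr function queries.
    static double recip(double x, double m, double a, double b) {
        return a / (m * x + b);
    }

    // With {!boost b=...}, the final score is the text score times the function value.
    static double boostedScore(double textScore, double functionValue) {
        return textScore * functionValue;
    }

    // Builds a q parameter such as {!boost b=log(popularity)}ipod (not URL-encoded).
    static String boostQuery(String functionExpr, String userQuery) {
        return "{!boost b=" + functionExpr + "}" + userQuery;
    }
}
```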
Re: Advanced Reading
One of my tricks for studying a deep project is to look at bug fixes, release notes, and new features. Understanding one little bug fix will teach you a subset of the code. Once you have that structure in your head, exploring more bugs and features in JIRA will fill out that structure. Lance On Thu, May 13, 2010 at 11:34 AM, Peter Sturge wrote: > A truly indispensable resource is Yonik's Mastering Solr 1.4 on-demand > webinar: > > > http://www.lucidimagination.com/solutions/Webinars/mastering-solr-1.4-with-yonik-seeley > > > > > On Thu, May 13, 2010 at 6:04 PM, Blargy wrote: > >> >> Does anyone know of any documentation that is more in-depth that the wiki >> and >> the Solr 1.4 book? I'm passed the basic usage of Solr and creating simple >> support plugins. I really want to know all about the inner workings of Solr >> and Lucene. Can someone recommend anything? >> >> Thanks >> -- >> View this message in context: >> http://lucene.472066.n3.nabble.com/Advancded-Reading-tp815382p815382.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> > -- Lance Norskog goks...@gmail.com
Re: ContentStreamUpdateRequest - out of memory on a large file
There is a known problem (which I can't find at the moment) where an uploaded file is retained while the next one is processed. When two successive files are both huge, having both in memory at once causes an OOM. Do you have this problem on the first file, the second file, or at some later point? But yes, supplying a Content-Length during upload would obviously be a great help. On Thu, May 13, 2010 at 5:39 AM, Grant Ingersoll wrote: > > On May 12, 2010, at 1:58 PM, Christopher Baird wrote: > >> We're running into an out of memory problem when sending a large file to our >> SOLR server using the ContentStreamUpdateRequest. It appears that this >> happens because when the request method of CommonsHttpSolrServer is called >> (this is called even when using a StreamingUpdateSolrServer instance because >> the ContentStreamUpdateRequest class is not an instance of UpdateRequest) an >> InputStreamRequestEntity is used in the PostMethod buffers the content. The >> buffering happens because the content length is not provided and thus >> defaults to "CONTENT_LENGTH_AUTO" which instructs InputStreamRequestEntity >> to buffer the entire content. >> >> >> >> Is there an existing work-around to this? >> >> >> >> If not, can anyone think of why I wouldn't want to update the code to pass >> in the content-length and avoid the buffering (I don't want to walk down a >> path to find out I really stepped in something). > > I can't think of any reason not to put up a patch for it. -- Lance Norskog goks...@gmail.com
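Christopher's proposed fix is to pass the content length so the body is streamed rather than buffered; commons-httpclient's InputStreamRequestEntity has constructors that accept a content length. The sketch below swaps in the plain JDK HttpURLConnection to show the same idea: setFixedLengthStreamingMode declares the length up front, so the body goes out through a fixed-size buffer instead of being held in memory to compute Content-Length. The update URL is an assumption (for example a hypothetical http://localhost:8983/solr/update/extract).

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

final class StreamingUpload {
    // Copies with a fixed 8 KB buffer: memory use is bounded by the buffer,
    // not by the size of the file being sent.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }

    // Hypothetical plain-JDK upload. Declaring the length up front lets
    // HttpURLConnection stream the body instead of buffering all of it.
    static int post(File file, String updateUrl, String contentType) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(updateUrl).openConnection();
        conn.setDoOutput(true);
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", contentType);
        conn.setFixedLengthStreamingMode(file.length());
        try (InputStream in = new FileInputStream(file);
             OutputStream out = conn.getOutputStream()) {
            copy(in, out);
        }
        return conn.getResponseCode();
    }
}
```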
Re: grouping in fq
Wait. If the default op is OR, I thought this query: (+category:xyz +price:[100 TO *]) -category:xyz meant "with xyz and range, OR without xyz", because without a plus or minus, OR really means SHOULD (which, bizarrely, is not a keyword). (+category:xyz +price:[100 TO *]) (-category:xyz) Is this what I'm thinking of? Does this really need an OR in the middle? On Thu, May 13, 2010 at 9:48 AM, Chris Hostetter wrote: > > : >> (+category:xyz +price:[100 TO *]) -category:xyz > : > : this one doesn't seem to work (I'm not using a price field, but a text field > : -- using price field here just for example). > > it never will, it's saying only things that are in category xyz and above > 100 dollars can match, but anything in category xyz can not match. > > inherient contradiction. > > : (+category:xyz +price:[100 TO *]) (-category:xyz) -- returns only results > : with category xyz and price >=100 > > you can't have a pure negative clauses in a boolean query -- they match > nothing (by definition: a query that only rejects things doesn't select > anything) the second set of parens creates a boolean query with one > negative clause, so it selects nothing, hence you only get docs matching > the first part. > > > : (+category:xyz +price:[100 TO *]) (*:* -category:xyz) -- returns results > : with category xyz and price >=100 AND results where category!=xyz > > exactly. *:* selects all docs, and -category:xyz then rejects the ones in > category xyz. these are then combined with the docs from the first part > (in cat xyz and above 100) > > so now you have what you want... > > : > >> > How do I implement a requirement like "if category is xyz, > : > >> > the price should > : > >> > be greater than 100 for inclusion in the result set". > > > -Hoss > > -- Lance Norskog goks...@gmail.com
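Hoss's explanation can be modeled in plain Java. The sketch treats each parenthesized group as a predicate over a hypothetical document with category and price fields; the two groups are optional (SHOULD) clauses, so a document is kept if either group matches. It models the semantics only, not how Lucene evaluates queries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class FqSemantics {
    static final class Doc {
        final String id;
        final String category;
        final int price;

        Doc(String id, String category, int price) {
            this.id = id;
            this.category = category;
            this.price = price;
        }
    }

    // (+category:xyz +price:[100 TO *])
    static final Predicate<Doc> XYZ_AND_PRICEY = d -> d.category.equals("xyz") && d.price >= 100;
    // (*:* -category:xyz): start from all docs, then reject category xyz
    static final Predicate<Doc> ALL_BUT_XYZ = d -> !d.category.equals("xyz");
    // (-category:xyz) alone: only a prohibited clause, so it selects nothing
    static final Predicate<Doc> PURE_NEGATIVE_GROUP = d -> false;

    // Returns the ids of the documents the combined query accepts.
    static List<String> matching(List<Doc> docs, Predicate<Doc> query) {
        List<String> ids = new ArrayList<>();
        for (Doc d : docs) {
            if (query.test(d)) {
                ids.add(d.id);
            }
        }
        return ids;
    }
}
```

Combining XYZ_AND_PRICEY with PURE_NEGATIVE_GROUP reproduces the "only xyz and price >= 100" result Hoss describes, while combining it with ALL_BUT_XYZ gives the intended filter.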
Re: maximum recommended document cache size
The general recommendation is to watch the caches during normal user searches and keep increasing the size until evictions stop happening. This may or may not work for your situation. The problem is that the eviction rate does not show "lifetime in cache". So if 90% of the cache sits there indefinitely and the remaining 10% churns, the cache is fine but you'll see zillions of evictions. On Thu, May 13, 2010 at 10:38 AM, Nagelberg, Kallin wrote: > I am trying to tune my Solr setup so that the caches are well warmed after > the index is updated. My documents are quite small, usually under 10k. I > currently have a document cache size of about 15,000, and am warming up 5,000 > with a query after each indexing. Autocommit is set at 30 seconds, and my > caches are warming up easily in just a couple of seconds. I've read of > concerns regarding garbage collection when your cache is too large. Does > anyone have experience with this? Ideally I would like to get 90% of all > documents from the last month in memory after each index, which would be > around 25,000. I'm doing extensive load testing, but if someone has > recommendations I'd love to hear them. > > Thanks, > -Kallin Nagelberg > -- Lance Norskog goks...@gmail.com
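The point about evictions can be reproduced with a toy LRU cache built on LinkedHashMap in access order (Solr's LRUCache is similar in spirit). In the simulation, nine hot entries are read on every round while one new cold entry arrives per round; the hot set never leaves the cache, yet the eviction counter grows with every round. Capacity and key names are arbitrary.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy LRU cache that counts evictions.
final class CountingLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxSize;
    private long evictions = 0;

    CountingLruCache(int maxSize) {
        super(16, 0.75f, true); // accessOrder = true gives LRU ordering
        this.maxSize = maxSize;
    }

    // Called after each put: drop the least recently used entry when over capacity.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        boolean evict = size() > maxSize;
        if (evict) {
            evictions++;
        }
        return evict;
    }

    long evictions() {
        return evictions;
    }
}

final class CacheChurnDemo {
    // Each round: touch every hot key (loading it on a miss), then insert one new cold key.
    static CountingLruCache<String, String> simulate(int capacity, int hotKeys, int rounds) {
        CountingLruCache<String, String> cache = new CountingLruCache<>(capacity);
        for (int r = 0; r < rounds; r++) {
            for (int h = 0; h < hotKeys; h++) {
                String key = "hot-" + h;
                if (cache.get(key) == null) {
                    cache.put(key, "doc");
                }
            }
            cache.put("cold-" + r, "doc");
        }
        return cache;
    }
}
```

With capacity 10, 9 hot keys and 100 rounds, the cache reports 99 evictions even though every hot entry stays resident.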