date slider

2010-05-15 Thread Lukas Kahwe Smith
Hi,

I have implemented a search where all the facets are offered as checkbox-style
filters, along with a fulltext search to first narrow down the result set.
For this, I run the fulltext search together with the facets. If additional
checkbox filters have been deselected, I then run a secondary query that leaves
the faceting out to get the actual results (and I set rows=0 in the facet
query). I just stumbled over an entry in the wiki [1] which suggests that I do
not really need that secondary query if filters are selected.

But this is not the main topic of my question.

Now I also want to offer a slider to define the date range to include in the
result set. Here, however, I do not want to do faceting; I just want to find out
the min and max date values in the result (without any of the facet filters
applied) so I know the start and end points for the slider. The user can then
move the sliders to further filter the result set.

How can I best go about fetching just those min and max values, ideally without 
having to add a separate query just for this?

regards,
Lukas Kahwe Smith
m...@pooteeweet.org

[1] 
http://wiki.apache.org/solr/SimpleFacetParameters#Tagging_and_excluding_Filters
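
One possible approach, sketched here under stated assumptions rather than as a confirmed answer: Solr's StatsComponent can report min and max for a field in the same request that returns the results and facets. The sketch below only builds the request parameters. The field names `pub_date` and `category` are hypothetical, and it assumes the request handler has the stats component enabled. Whether `stats.field` accepts a date field, and whether filters can be excluded from the stats the way tagged filters can be excluded from facets, depends on the Solr version, so check the documentation for the release in use.

```python
from urllib.parse import urlencode

def build_search_params(user_query, date_field="pub_date", facet_fields=("category",)):
    """Build parameters for one Solr request that returns results,
    facet counts, and min/max statistics for a single field."""
    params = [
        ("q", user_query),
        ("facet", "true"),
        ("stats", "true"),            # turn on the StatsComponent
        ("stats.field", date_field),  # hypothetical field name
    ]
    for field in facet_fields:
        params.append(("facet.field", field))  # hypothetical facet fields
    return params

query_string = urlencode(build_search_params("solr slider"))
print(query_string)
```

The statistics are computed over the documents the request matches, so any filter queries sent along would also narrow the reported min and max.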



Re: Autosuggest

2010-05-15 Thread Sascha Szott

Hi,

maybe you would like to have a look at solr.ShingleFilterFactory [1] to 
expand your autosuggest to more than one term.


-Sascha

[1] 
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory
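
To illustrate what shingling does to the input, here is a minimal sketch in Python. It is not the Lucene implementation, only the idea: consecutive tokens are joined into word n-grams so that multi-term phrases become available as suggestions. The function name and parameters are made up for this example; the defaults are assumed to mirror the filter's defaults (shingles of up to two tokens, with single tokens also emitted).

```python
def shingles(tokens, max_shingle_size=2, output_unigrams=True):
    """Return word n-grams ("shingles") built from consecutive tokens."""
    result = []
    first_size = 1 if output_unigrams else 2
    for start in range(len(tokens)):
        for size in range(first_size, max_shingle_size + 1):
            end = start + size
            if end > len(tokens):
                break  # not enough tokens left for a shingle of this size
            result.append(" ".join(tokens[start:end]))
    return result

suggestions = shingles("please divide this sentence".split())
print(suggestions)
```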


Blargy wrote:


Thanks for your help and especially your analyzer.. probably saved me a
full-import or two  :)





Re: How to tell which field matched?

2010-05-15 Thread Sascha Szott

Hi,

I'm not sure if debugQuery=on is a feasible solution in a production
environment, as generating such extra information requires a considerable
amount of computation.


-Sascha

Jon Baer wrote:

Does the standard debug component (?debugQuery=on) give you what you need?

http://wiki.apache.org/solr/SolrRelevancyFAQ#Why_does_id:archangel_come_before_id:hawkgirl_when_querying_for_.22wings.22

- Jon

On May 14, 2010, at 4:03 PM, Tim Garton wrote:


All,
 I've searched around for help with something we are trying to do
and haven't come across much.  We are running solr 1.4.  Here is a
summary of the issue we are facing:

A simplified example of our schema is something like this:

   
   
   
   
   
   

When someone does a search we search across the title,
supplement_title, and supplement_pdf_text fields.  When we get our
results, we would like to be able to tell which field the search
matched and if it's a multiValued field, which of the multiple values
matched.  This is so that we can display results similar to:

Example Title
Example Supplement Title
Example Supplement Title 2 (your search matched this document)
Example Supplement Title 3

Example Title 2
Example Supplement Title 4
Example Supplement Title 5
Example Supplement Title 6 (your search matched this document)

etc.

How would you recommend doing this?  Is there some way to get solr to
tell us which field matched, including multiValued fields?  As a
workaround we have been using highlighting to tell which field
matched, but it doesn't get us what we want for multiValued fields and
there is a significant cost to enabling the highlighting.  Should we
design our schema in some other fashion to achieve these results?
Thanks.

-Tim






Re: How to tell which field matched?

2010-05-15 Thread Tim Garton
Additionally, I don't think this gets us what we want with multiValued
fields.  It tells if a multiValued field matched, but not which value
out of the multiple values matched.  I am beginning to suspect that
this information can't be returned and we may have to restructure our
schema.

-Tim
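
One way such a restructuring could look, as a sketch only: index every supplement as its own document that points back to its parent, so that a hit on a supplement identifies exactly which value matched. The field names `doc_type` and `parent_id`, the id scheme, and both helper functions are hypothetical; only `title`, `supplement_title`, and `supplement_pdf_text` come from the thread.

```python
def flatten(record):
    """Turn one parent record into one index document per supplement."""
    docs = [{"id": record["id"], "doc_type": "parent", "title": record["title"]}]
    for n, supplement in enumerate(record["supplements"], start=1):
        docs.append({
            "id": f'{record["id"]}-s{n}',   # hypothetical id scheme
            "doc_type": "supplement",
            "parent_id": record["id"],
            "supplement_title": supplement["title"],
            "supplement_pdf_text": supplement["pdf_text"],
        })
    return docs

def group_hits(hits):
    """Group matching documents by parent so the UI can mark which one matched."""
    grouped = {}
    for hit in hits:
        parent = hit.get("parent_id", hit["id"])
        grouped.setdefault(parent, []).append(hit["id"])
    return grouped

record = {
    "id": "doc1",
    "title": "Example Title",
    "supplements": [
        {"title": "Example Supplement Title", "pdf_text": "first pdf"},
        {"title": "Example Supplement Title 2", "pdf_text": "second pdf"},
    ],
}
docs = flatten(record)
print(group_hits([docs[2]]))
```

The trade-off is that the parent's title is no longer part of each supplement document unless it is copied in, and the client has to regroup hits by parent.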



Re: Autosuggest

2010-05-15 Thread Andrzej Bialecki
On 2010-05-15 02:46, Blargy wrote:
> 
> Thanks for your help and especially your analyzer.. probably saved me a
> full-import or two  :)
> 

Also, take a look at this issue:

https://issues.apache.org/jira/browse/SOLR-1316


-- 
Best regards,
Andrzej Bialecki <><
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: Autosuggest

2010-05-15 Thread Blargy

Andrzej is this ready for production usage?

"Hopefully in the future we can include user click through rates to boost
those terms/phrases higher"
 - This could be huge!
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Autosuggest-tp818430p819762.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Autosuggest

2010-05-15 Thread Blargy

Maybe I should have phrased it as: "Is this ready to be used with Solr 1.4?"

Also, as Grant asked in the thread, what is the actual status of that patch?
Thanks again!
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Autosuggest-tp818430p819765.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to tell which field matched?

2010-05-15 Thread Jon Baer
Sorry, my response wasn't a suggestion to actually use debugQuery=on in
production; I was wondering whether the debug component gave you the insight
you were looking for. On a side note, I'm also interested in this type of
component, because on a number of projects I have worked on recently, people
outside of index tuning want to know "why did my query match these results?"
in some sort of plain-English explanation.

I have the feeling that what you want is possible; it's just not finding its
way into the result set yet (a guess), or it needs a plugin.

- Jon  

On May 15, 2010, at 11:16 AM, Tim Garton wrote:

> Additionally, I don't think this gets us what we want with multiValued
> fields.  It tells if a multiValued field matched, but not which value
> out of the multiple values matched.  I am beginning to suspect that
> this information can't be returned and we may have to restructure our
> schema.
> 
> -Tim
> 



Re: Connection Pool

2010-05-15 Thread Lance Norskog
Connection pooling is specified through the underlying Apache Commons
HttpClient connection manager when you create the Server.

StreamingUpdateSolrServer (SUSS) does socket pooling by default and is the
preferred way to do concurrent indexing. There are some quirks in the set of
Server implementations, and SUSS avoids them. Unless you are willing to
root around in the SolrJ Server code and understand exactly how it
works, stay with SUSS.

On Fri, May 14, 2010 at 6:44 AM, gabriele renzi  wrote:
> On Fri, May 14, 2010 at 3:35 PM, Anderson vasconcelos
>  wrote:
>> Hi
>> I want to know whether there is a connection pool client to manage the
>> connections to Solr. Our system issues a lot of concurrent index requests. I
>> can't share my connection; I need to create one per transaction. But if I
>> create one per transaction, I think performance will suffer.
>>
>> How do you resolve this problem?
>
> The CommonsHttpSolrServer class does connection pooling, and IIRC so does
> the StreamingUpdateSolrServer.
>
>
>
> --
> blog en: http://www.riffraff.info
> blog it: http://riffraff.blogsome.com
>



-- 
Lance Norskog
goks...@gmail.com


Re: Short DismaxRequestHandler Question

2010-05-15 Thread MitchK

Okay, I will do so in future, if another problem like this occurs.

At the moment, everything is fine after I followed your suggestions.

Kind regards
- Mitch
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Short-DismaxRequestHandler-Question-tp775913p820355.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: multi-valued associated fields

2010-05-15 Thread Lance Norskog
Here's the problem with mixing dissimilar text: relevance. Your text
relevance depends on a document's "delta" with all other documents in
the index. If you index nothing but technical papers, searching a
technical term will find what you expect. If you mix technical papers
and movie titles, text query will be useless.
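
A small worked example of why the rest of the index matters, as a sketch: the weight of a term depends on how rare it is across all indexed documents. The formula below, idf = 1 + ln(numDocs / (docFreq + 1)), is assumed to be the inverse document frequency used by Lucene's default similarity of that era; the document counts are invented for illustration.

```python
import math

def classic_idf(doc_freq, num_docs):
    """Inverse document frequency: 1 + ln(num_docs / (doc_freq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

# A technical term found in 900 documents of a 1,000-paper index...
papers_only = classic_idf(doc_freq=900, num_docs=1_000)
# ...and the same term after 99,000 unrelated movie titles are added.
mixed = classic_idf(doc_freq=900, num_docs=100_000)

print(round(papers_only, 3), round(mixed, 3))
```

The term itself has not changed, yet its weight in scoring shifts substantially once dissimilar documents share the index.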

On Thu, May 13, 2010 at 12:06 PM, Eric Grobler
 wrote:
> Hi Ahmed
>
> Thanks again for sharing your insight and experience.
> I will discuss the multi-core approach with members of our team.
>
> Regards
> Eric
>
> On Wed, May 12, 2010 at 9:24 PM, ahammad  wrote:
>
>>
>> In our deployment, we thought that complications might arise when
>> attempting to hit the Solr server with the addresses of too many cores. For
>> instance, we have 15+ cores running at the moment. In the worst case, we
>> would have to use the addresses of all 15+ cores to search all our data.
>> What we eventually did was combine all the cores into a single core, which
>> basically gives us a cleaner solution. You get the simplicity of
>> querying one core, but the flexibility of modifying cores separately.
>>
>> Basically, we have all the cores indexing separately. We set up a script
>> that would use the index merge functionality of Solr to combine all the
>> indexes into a single index accessible through one core. Yes, there will be
>> some overhead on the server, but I believe that it's a good compromise. In
>> our case, we have multiple servers at our disposal, so this was not a
>> problem to implement. It all depends on your data set and the volume of
>> documents that you will be indexing.
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/multi-valued-associated-fields-tp811883p813419.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>



-- 
Lance Norskog
goks...@gmail.com


Re: sort by function

2010-05-15 Thread MitchK

Can you give us some more information on what you really want to do?
As the examples in the wiki show, the value returned by the function query
is multiplied by the score; you can boost the value returned by the
function query if you like.

Kind regards
- Mitch
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/sort-by-function-tp814380p820359.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Advancded Reading

2010-05-15 Thread Lance Norskog
One of my tricks for studying a deep project is to look at bug
fixes, release notes, and new features. Understanding one little bug fix
will cause you to learn a subset of the code. Once you have that
structure in your head, exploring more bugs & features on the Jira
will fill out that structure.

Lance

On Thu, May 13, 2010 at 11:34 AM, Peter Sturge
 wrote:
> A truly indispensable resource is Yonik's Mastering Solr 1.4 on-demand
> webinar:
>
>
> http://www.lucidimagination.com/solutions/Webinars/mastering-solr-1.4-with-yonik-seeley
>
>
>
>
> On Thu, May 13, 2010 at 6:04 PM, Blargy  wrote:
>
>>
>> Does anyone know of any documentation that is more in-depth than the wiki
>> and the Solr 1.4 book? I'm past the basic usage of Solr and creating simple
>> support plugins. I really want to know all about the inner workings of Solr
>> and Lucene. Can someone recommend anything?
>>
>> Thanks
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Advancded-Reading-tp815382p815382.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>



-- 
Lance Norskog
goks...@gmail.com


Re: ContentStreamUpdateRequest - out of memory on a large file

2010-05-15 Thread Lance Norskog
There is a known problem (that I can't find at the moment) where an
uploaded file is retained while the next one is processed. When these
two successive files are both huge, having the two giant files in memory
at once causes an OOM.

Do you have this problem on the first file, second file, or at some time later?

But, yes, a Content-Length during upload is obviously a great help.

On Thu, May 13, 2010 at 5:39 AM, Grant Ingersoll  wrote:
>
> On May 12, 2010, at 1:58 PM, Christopher Baird wrote:
>
>> We're running into an out of memory problem when sending a large file to our
>> Solr server using the ContentStreamUpdateRequest.  It appears that this
>> happens because, when the request method of CommonsHttpSolrServer is called
>> (this is called even when using a StreamingUpdateSolrServer instance, because
>> the ContentStreamUpdateRequest class is not an instance of UpdateRequest), the
>> InputStreamRequestEntity used in the PostMethod buffers the content.  The
>> buffering happens because the content length is not provided and thus
>> defaults to CONTENT_LENGTH_AUTO, which instructs InputStreamRequestEntity
>> to buffer the entire content.
>>
>> Is there an existing work-around for this?
>>
>> If not, can anyone think of why I wouldn't want to update the code to pass
>> in the content-length and avoid the buffering (I don't want to walk down a
>> path to find out I really stepped in something).
>
> I can't think of any reason not to put up a patch for it.



-- 
Lance Norskog
goks...@gmail.com


Re: grouping in fq

2010-05-15 Thread Lance Norskog
Wait. If the default op is OR, I thought this query:

(+category:xyz +price:[100 TO *]) -category:xyz

meant "with xyz and range, OR without xyz" because without a plus or
minus, OR really means SHOULD (which, bizarrely, is not a keyword).

(+category:xyz +price:[100 TO *]) (-category:xyz)

Is this what I'm thinking of? Does this really need an OR in the middle?

On Thu, May 13, 2010 at 9:48 AM, Chris Hostetter
 wrote:
>
> : >> (+category:xyz +price:[100 TO *]) -category:xyz
> :
> : this one doesn't seem to work (I'm not using a price field, but a text field
> : -- using price field here just for example).
>
> it never will: it says that only things that are in category xyz and above
> 100 dollars can match, but that anything in category xyz cannot match.
>
> inherent contradiction.
>
> : (+category:xyz +price:[100 TO *]) (-category:xyz) -- returns only results
> : with category xyz and price >=100
>
> you can't have pure negative clauses in a boolean query -- they match
> nothing (by definition: a query that only rejects things doesn't select
> anything). the second set of parens creates a boolean query with one
> negative clause, so it selects nothing, hence you only get docs matching
> the first part.
>
>
> : (+category:xyz +price:[100 TO *]) (*:* -category:xyz) -- returns results
> : with category xyz and price >=100 AND results where category!=xyz
>
> exactly.  *:* selects all docs, and -category:xyz then rejects the ones in
> category xyz.  these are then combined with the docs from the first part
> (in cat xyz and above 100)
>
> so now you have what you want...
>
> : > >> > How do I implement a requirement like "if category is xyz,
> : > >> > the price should
> : > >> > be greater than 100 for inclusion in the result set".
>
>
> -Hoss
>
>



-- 
Lance Norskog
goks...@gmail.com
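
The three query variants discussed in this thread can be checked with a toy model. This is a sketch of the boolean semantics as Hoss describes them, not Lucene code: required (+) clauses are intersected, optional clauses are unioned when nothing is required, prohibited (-) clauses are subtracted, and a query made only of prohibited clauses selects nothing. The three sample documents and all helper names are invented.

```python
DOCS = {
    1: {"category": "xyz", "price": 50},
    2: {"category": "xyz", "price": 150},
    3: {"category": "abc", "price": 20},
}

def term(field, value):
    return lambda: {i for i, d in DOCS.items() if d[field] == value}

def price_at_least(minimum):
    return lambda: {i for i, d in DOCS.items() if d["price"] >= minimum}

def match_all():
    return lambda: set(DOCS)

def boolean(must=(), should=(), must_not=()):
    def run():
        if must:
            hits = set(DOCS)
            for clause in must:
                hits &= clause()
        else:
            hits = set()  # stays empty for a pure negative query
            for clause in should:
                hits |= clause()
        for clause in must_not:
            hits -= clause()
        return hits
    return run

xyz = term("category", "xyz")
expensive_xyz = boolean(must=[xyz, price_at_least(100)])

# (+category:xyz +price:[100 TO *]) -category:xyz
contradiction = boolean(should=[expensive_xyz], must_not=[xyz])()
# (+category:xyz +price:[100 TO *]) (-category:xyz)
pure_negative = boolean(should=[expensive_xyz, boolean(must_not=[xyz])])()
# (+category:xyz +price:[100 TO *]) (*:* -category:xyz)
with_match_all = boolean(
    should=[expensive_xyz, boolean(should=[match_all()], must_not=[xyz])]
)()

print(contradiction, pure_negative, with_match_all)
```

Only the last form returns both the expensive xyz document and the document outside category xyz, which matches the requirement "if category is xyz, the price should be greater than 100".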


Re: maximum recommended document cache size

2010-05-15 Thread Lance Norskog
The general recommendation is to watch the caches during normal user
searches and keep increasing the size until evictions stop happening.
This may or may not work for your situation.

The problem is that the eviction rate does not show "lifetime in
cache". So if 90% of the cache sits there indefinitely and the
remaining 10% churns, the cache is fine but you'll show zillions of
evictions.
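
The effect can be reproduced with a generic LRU cache, sketched below. This is a plain simulation and not Solr's cache implementation; the cache size, the 90/10 split, and the access pattern are invented to mirror the scenario above. Ninety "hot" keys are read in every round and never leave the cache, while ten fresh keys per round push each other out.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal least-recently-used cache that counts hits, misses, evictions."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()
        self.hits = self.misses = self.evictions = 0

    def get(self, key):
        if key in self.data:
            self.data.move_to_end(key)     # mark as most recently used
            self.hits += 1
            return
        self.misses += 1
        self.data[key] = True
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # drop the least recently used entry
            self.evictions += 1

def simulate(rounds=1000, capacity=100, hot=90, cold_per_round=10):
    cache = LRUCache(capacity)
    next_cold = 0
    for _ in range(rounds):
        for key in range(hot):             # the stable 90% of the cache
            cache.get(("hot", key))
        for _ in range(cold_per_round):    # the churning 10%
            cache.get(("cold", next_cold))
            next_cold += 1
    return cache

cache = simulate()
print(cache.hits, cache.misses, cache.evictions)
```

In this run the hit ratio stays near 90% even though thousands of evictions are recorded, so the eviction counter alone says little about whether the cache is too small.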

On Thu, May 13, 2010 at 10:38 AM, Nagelberg, Kallin
 wrote:
> I am trying to tune my Solr setup so that the caches are well warmed after 
> the index is updated. My documents are quite small, usually under 10k. I 
> currently have a document cache size of about 15,000, and am warming up 5,000 
> with a query after each indexing. Autocommit is set at 30 seconds, and my 
> caches are warming up easily in just a couple of seconds. I've read of 
> concerns regarding garbage collection when your cache is too large. Does 
> anyone have experience with this? Ideally I would like to get 90% of all 
> documents from the last month in memory after each index, which would be 
> around 25,000. I'm doing extensive load testing, but if someone has 
> recommendations I'd love to hear them.
>
> Thanks,
> -Kallin Nagelberg
>



-- 
Lance Norskog
goks...@gmail.com