lowercase text/strings to be used in list box

2007-10-19 Thread Ben Incani
I have a field which will only ever contain one of several values (values
that include spaces).

I want to display a list box with all possible values by browsing the
lucene terms.

I have set up a field in the schema.xml file, and also tried an
alternative definition.

This allows me to browse all the values without a problem, but when it
comes to searching the documents I have to use
org.apache.lucene.analysis.KeywordAnalyzer, when I would rather use
org.apache.lucene.analysis.standard.StandardAnalyzer and the power of
the default query parser to perform a phrase query such as my_field:(the
value) or my_field:"the value", neither of which works.

So is there a way to prevent tokenisation of a field using the
StandardAnalyzer, without implementing your own TokenizerFactory?
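For reference, the kind of field type I have in mind looks roughly like
this (a sketch using the stock factories; the type name is illustrative):

```xml
<!-- Sketch: KeywordTokenizerFactory emits the whole field value as a
     single token, so "the value" stays one term; LowerCaseFilterFactory
     then lowercases it for the list box. -->
<fieldType name="string_lc" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```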

Regards

Ben


Re: Solr + Tomcat Undeploy Leaks

2007-10-19 Thread Ed Summers
On 10/18/07, Mike Klaas <[EMAIL PROTECTED]> wrote:
> I have a large number of servers, each running 1-2 containers, each
> having 1-2 solr deployments (fixed).  If I want a new Solr instance,
> I just start a new container (possibly on a new server).  I treat it
> like a process, and can shut it down using kill and other unix tools.

I realize this is a bit off-topic -- but I'm curious what the
rationale was behind having that many solr instances on that many
machines and how they are coordinated. Is it a master/slave setup or
are they distinct indexes? Any further details about your architecture
would be interesting to read about :-)

//Ed


Re: Geographical distance searching

2007-10-19 Thread patrick o'leary




Hi Ryan

Thanks for looking at it, yes definitely I'd like to contribute back.

I'm currently working out some of the logistics to make it easier to
maintain. I think I'd need a centralized project to do the releases
from; it will have two sets of patches in two different trunks (lucene &
solr) until it becomes core in lucene. So a batteries-included release
will make that easier for users and developers alike.
Once it's accepted I'll move development to ASF.

SOLR-281 looks like it will solve one of my frustrations, another being
that the handlers were final ;-)
Is it close to being committed to the trunk? 

Thanks
P

Ryan McKinley wrote:
> This looks good!
>
> Are you interested in contributing it to solr core?
>
> One major thing in the solr pipeline you may want to be aware of is the
> search component interface (SOLR-281).
>
> This would let you make simple component that adds the:
>   DistanceQuery dq = new DistanceQuery(dlat,dlng,dradius);
>   dsort = new DistanceSortSource(filter);
>
> and later adds the 'distance' to each result
>
> This way you could reuse the other standard search stuff (faceting,
> debugging, etc) and would not need to make your own custom
> LocalResponseWritter.
>
> ryan

-- 
Patrick O'Leary


You see, wire telegraph is a kind of a very, very long cat. You pull his tail in New York and his head is meowing in Los Angeles.
 Do you understand this? 
And radio operates exactly the same way: you send signals here, they receive them there. The only difference is that there is no cat.
  - Albert Einstein

Re: GET_SCORES flag in SolrIndexSearcher

2007-10-19 Thread Chris Hostetter
: Besides the typo :), the only problem for what I want is the fact that it
: returns null for a default score sort instead of setting
: SortField.FIELD_SCORE.  I want a default score desc sort, but I want the
: scores from the lucene Hits object.  Is the only way to get score values to
: modify the solr code?

i'm a little confused ... Solr doesn't use "Hits" at all ... if you ask 
for scores, you get raw unnormalized scores (normalizing is easy enough to 
do after the fact)  if you want the scores returned, all you have to do is 
ask for them in the fl param; if you want to sort by score, all you have 
to do is ask for it in the sort param (it doesn't matter that parseSort 
returns null when the sort string is just "score" ... SolrIndexSearcher 
recognizes a null Sort as being the default sort by score)




-Hoss



Re: Solr, operating systems and globalization

2007-10-19 Thread Ken Krugler

On 18-Oct-07, at 11:43 AM, Chris Hostetter wrote:

: This is easy--I always convert dates to UTC.  Doubly important since
: several of our servers operate in different timezones.
:
: Less easy is changing Solr's interpretation of NOW in DateMath to be UTC.
: What is the correct way to go about this?

You lost me there ... "Dates" in java have no concept of timezone, they
are absolute moments in the space/time continuum.  timezones only affect
the parsing/formatting of dates.  "NOW" is whenever Solr parses the string,
and when Solr then formats that Date as a string, it formats it in UTC.


Ah, that is good.  So if:

$ date
Thu Oct 18 12:07:42 PDT 2007

Then NOW in Solr will be the absolute date Thu Oct 18 04:07:42 2007 
(which is the current time in UTC)?



i'm guessing you are referring to the notion of rounding down to the
nearest "day" (or anything of less granularity) ... this is currently
hardcoded to be done relative to UTC -- but as I mention, this is the type of
thing where ideally Solr would have a setting to let you specify which
timezone the rounding should be relative to.


I'm not sure this is desirable.  If your users are all over the 
world, you'd ideally want to round to _their_ timezone, but I don't 
see how this is realistic.


We had the same general issue just a few months ago. We can generate 
reports on things like SCM commit activity for a given day. For 
larger customers, they have users in multiple timezones - so what is 
"the" timezone to use?


I wrote a blog post about it at http://blog.krugle.com/?p=267, but 
the short answer is that ultimately we decided to use UTC for all 
times (server, report, API, and UI) as the least heinous of the 
various options.


-- Ken
--
Ken Krugler
Krugle, Inc.
+1 530-210-6378
"If you can't find it, you can't fix it"


Re: Geographical distance searching

2007-10-19 Thread Ryan McKinley

patrick o'leary wrote:
Actually I misspoke; it's the XMLWriter that's final that was a little 
annoying, rather than a handler.

Would have been nice to just extend that, and cut down on the code.



aaah -- Just to be clear, if you could augment the doc list with a 
calculated field ('distance') you would not need to extend XMLWriter - 
correct?


And then it would also work with the JSON/python/php etc writers.

ryan


Re: Geographical distance searching

2007-10-19 Thread Chris Hostetter

: aaah -- Just to be clear, if you could augment the doc list with a calculated
: field ('distance') you would not need to extend XMLWriter - correct?
: 
: And then it would also work with the JSON/python/php etc writers.

i don't think i ever looked at the patch in question (is it in Jira?) but 
i would definitely recommend going a route like this rather than adding a 
new "primitive" type that all the ResponseWriters would need to know 
about.  

Personally: i don't even think "augmenting DocList" should be done ... 
once the distance score calculations are done, and the documents are 
ordered, returning the distance as a numeric value for the clients is really 
no different than returning highlighting info or score explanations ... it 
should be a separate piece of info in the response.

(i think if we had it to do all over again, i would suggest the same 
approach for returning the "scores" of docs ... at the moment there is 
ambiguity between the relevancy score and the possibility of a field named 
"score")




-Hoss



Re: Geographical distance searching

2007-10-19 Thread patrick o'leary




Actually I misspoke; it's the XMLWriter that's final that was a little
annoying, rather than a handler.
Would have been nice to just extend that, and cut down on the code.

P

Ryan McKinley wrote:
>> SOLR-281 looks like it will solve one of my frustrations, another being
>> that the handlers were final ;-)
>
> What handlers are final that you found annoying?
>
>> Is it close to being committed to the trunk?
>
> I hope so ;)  Since this patch reworks the *core* query handlers
> (dismax/standard) I really want someone else to look it over before
> committing...
>
> ryan


-- 
Patrick O'Leary

AOL Local Search Technologies
Phone: + 1 703 265 8763

You see, wire telegraph is a kind of a very, very long cat. You pull his tail in New York and his head is meowing in Los Angeles.
 Do you understand this? 
And radio operates exactly the same way: you send signals here, they receive them there. The only difference is that there is no cat.
  - Albert Einstein






Re: Geographical distance searching

2007-10-19 Thread Ryan McKinley


> SOLR-281 looks like it will solve one of my frustrations, another being
> that the handlers were final ;-)

What handlers are final that you found annoying?

> Is it close to being committed to the trunk?

I hope so ;)  Since this patch reworks the *core* query handlers 
(dismax/standard) I really want someone else to look it over before 
committing...


ryan



Re: Solr, operating systems and globalization

2007-10-19 Thread Chris Hostetter
: Ah, that is good.  So if:
: 
: $ date
: Thu Oct 18 12:07:42 PDT 2007
: 
: Then NOW in Solr will be the absolute date Thu Oct 18 04:07:42 2007 (which is
: the current time in UTC)?

first off: PDT is only 7 hours off UTC

Second: i'm going to get a little bit pedantic... 

NOW is "now" .. it's an abstract DateTime instance, a point in the 
one-dimensional space representing linear time.  TimeZones are an 
artificial concept that exist only in the perspective of an observer who 
places a coordinate system (with an origin) in that dimension.

When you try to express an abstract DateTime instance in an email (or in 
an HTTP response) it stops being an abstract moment in time, and becomes a 
string representation of a DateTime instance relative to that coordinate 
system.  If the string representation includes a TimeZone declaration (and as 
much precision as is measurable in your universe, but for now let's gloss 
over that and assume milliseconds is a quantum unit and all string 
representations include millisecond precision), then that string 
representation is unambiguous. (just as referring to a point in space using 
coordinates requires you to have an origin and a unit of distance in 
order for it to be unambiguous).

"Thu Oct 18 12:07:42 PDT 2007" and "Thu Oct 18 19:07:42 UTC 2007" are 
both unambiguous string representations of the same moment in time.  they 
happen to be relative to different origins, but the information about the 
difference in their origins is included in their string representation.

Date objects in java represent abstract moments in time, and Solr uses 
those abstract objects when doing all of its date based calculations.  
when it's necessary to know about the "coordinate system" (in order to 
represent as a string, or to do rounding or math) Solr uses the UTC 
coordinate system.

: I'm not sure this is desirable.  If your user's are all over the world, you'd
: ideally want to round to _their_ timezone, but I don't see how this is
: realistic.

hence the reason it's not implemented yet :)

in theory, we should at least allow the schema to specify what the 
"normal" TimeZone and Locale are for a date field ... and then let clients 
specify alternatives per request ... this would only affect the computation 
of info (and perhaps the string representations accepted from clients or 
returned to clients); the string representations stored in the physical 
index should always be in UTC.


-Hoss



Re: [jira] Commented: (SOLR-380) There's no way to convert search results into page-level hits of a "structured document".

2007-10-19 Thread Erik Hatcher


On Oct 18, 2007, at 11:53 AM, Binkley, Peter wrote:

I think the requirements I mentioned in a comment
(https://issues.apache.org/jira/browse/SOLR-380#action_12535296) justify
abandoning the one-page-per-document approach. The increment-gap
approach would break the cross-page searching, and would involve about
as much work as the stored map, since the gap would have to vary
depending on the number of terms on each page, wouldn't it? (if there
are 100 terms on page one, the gap has to be 900 to get page two to
start at 1000 - or can you specify the absolute position you want for a
term?).


Yeah, one Solr document per page is not sufficient for this purpose.

As for position increment gap and querying across page boundaries, I
still think having all text in a single field is necessary, but with
pages somehow separated such that a query can control whether it spans
pages or not.  This could be accomplished trivially with a position
increment gap.  The gap needed only depends on the slop factor you use
for phrase queries, not on the number of tokens per page.  For "quick
fox"~10, for example, the default gap of 100 would prevent that query
from matching across page boundaries.  I haven't thought this through
thoroughly, so more thinking is needed here.
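In schema terms, the gap is just an attribute on the field type. A
sketch (type name and analyzer choice are illustrative):

```xml
<!-- Sketch: each value added to a multiValued field of this type starts
     100 positions after the end of the previous value, so a sloppy
     phrase query like "quick fox"~10 cannot match across a boundary. -->
<fieldType name="text_paged" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```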



I think the problem of indexing books (or any text with arbitrary
subdivisions) is common enough that a generic approach like this would
be useful to more people than just me, and justifies some enhancements
within Solr to make the solution easy to reuse; but maybe when we've
figured out the best approach it will become clear how much of it is
worth packing into Solr.


Most definitely this would be a VERY useful addition to Solr.  I know  
of several folks that are working with XTF (which uses a custom  
version of Lucene and other interesting data structures) to achieve  
this capability, but blending that sort of thing into Solr would make  
life a lot better for these projects.



(and just to clarify roles: Tricia's the one who'll actually be coding
this, if it's feasible; I'm just helping to think out requirements and
approaches based on a project in hand.)


There is more to consider here.  Lucene now supports "payloads",  
additional metadata on terms that can be leveraged with custom  
queries.  I've not yet tinkered with them myself, but my  
understanding is that they would be useful (and in fact designed in  
part) for representing structured documents.  It would behoove us to  
investigate how payloads might be leveraged for your needs here, such  
that a single field could represent an entire document, with payloads  
representing the hierarchical structure.  This will require  
specialized Analyzer and Query subclasses be created to take  
advantage of payloads.  The Lucene community itself is just now  
starting to exploit this new feature, so there isn't a lot out there  
on it yet, but I think it holds great promise for these purposes.


Erik



Re: preconfiguring which xsl file to use

2007-10-19 Thread Robert Young
Thanks Erik. For the moment we're only using one requestHandler for
basic querying so that should work OK

Cheers
Rob

On 10/19/07, Erik Hatcher <[EMAIL PROTECTED]> wrote:
>
> On Oct 19, 2007, at 8:30 AM, Robert Young wrote:
> > Is it possible to configure which xsl file to use for a particular
> > queryResponseWriter in the solrconfig.xml?
>
> I don't believe so, but instead I think something like this will work:
>
>   <requestHandler name="opensearch" class="solr.StandardRequestHandler">
>     <lst name="defaults">
>       <str name="wt">xslt</str>
>       <str name="tr">opensearch.xsl</str>
>     </lst>
>   </requestHandler>
>
> And then a ?qt=opensearch should do the trick.   The dilemma here is
> that you can't then toggle between various request handlers, unless
> you mapped them separately.
>
> Erik
>
> > I would like to have something like the following so that I don't have
> > to put it in for every query.
> >   <queryResponseWriter name="xslt"
> >       class="org.apache.solr.request.XSLTResponseWriter">
> >     <int name="xsltCacheLifetimeSeconds">5</int>
> >     <str name="xslt">opensearch.xsl</str>
> >   </queryResponseWriter>
> >
> > Any ideas?
> >
> > Cheers
> > Rob


Re: GET_SCORES flag in SolrIndexSearcher

2007-10-19 Thread John Reuning
Ah ha, perfect.  That worked brilliantly.  In case anyone is interested, 
it turns out that defining "id score" as the field list for the standard 
request handler in solrconfig.xml does the same thing:

  <requestHandler name="standard" class="solr.StandardRequestHandler"
      default="true">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="fl">id score</str>
    </lst>
  </requestHandler>
Thanks for the help,

-jrr

Henrib wrote:

I believe that keeping your code as is but initializing the query parameters
should do the trick:

HashMap params = new HashMap();
params.put("fl", "id score"); // field list is id & score
...
Regards


Re: [jira] Commented: (SOLR-380) There's no way to convert search results into page-level hits of a "structured document".

2007-10-19 Thread Tricia Williams


I echo the apology for using JIRA to work out ideas on this.

Just thinking out loud here:

  * Is there any reason why the page id should be an integer?  I mean
    could the page identifier be an alphanumeric string?
  * Ideally our project would like to store some page level meta-data
    (especially a URL link to page content).  Would this be contorting
    the use of a field too much?  If we stored the URL in a dynamic
    field URL_*, how would we retrieve this at query time?
  * Is there a way to alter FieldType to use the Composite design
    pattern?  (http://en.wikipedia.org/wiki/Composite_pattern)  In
    this way a document could be composed of fields, which could be
    composed of fields.  For example: The monograph is a document, a
    page in the monograph is a field in the document, the text on the
    page is a field in the field, a single piece of metadata for the
    page is a field in the field, etc. ( monograph
    ( page ( fulltext, page_metadata_1, page_metadata_2, etc ),
    monograph_metadata_1, monograph_metadata_2, etc ) ).  Maybe what
    I'm trying to describe is that Documents can contain Documents?

Following the path of least resistance, I think the first step is to 
create a highlighter which returns positions instead of highlighted 
text.  The next step would be to create an Analyzer and/or Filter and/or 
Tokenizer, as well as a FieldType which creates the page mappings. The 
last step (and the one I am least certain of how it could work) is to 
evolve the position highlighter to get the position to page mapping and 
group the positions by page (number or id) or alternately just write out 
the page (number or id) and drop the position.


Tricia

Binkley, Peter wrote:

(I'm taking this discussion to solr-user, as Mike Klaas suggested; sorry
for using JIRA for it. Previous discussion is at
https://issues.apache.org/jira/browse/SOLR-380).

I think the requirements I mentioned in a comment
(https://issues.apache.org/jira/browse/SOLR-380#action_12535296) justify
abandoning the one-page-per-document approach. The increment-gap
approach would break the cross-page searching, and would involve about
as much work as the stored map, since the gap would have to vary
depending on the number of terms on each page, wouldn't it? (if there
are 100 terms on page one, the gap has to be 900 to get page two to
start at 1000 - or can you specify the absolute position you want for a
term?). 


I think the problem of indexing books (or any text with arbitrary
subdivisions) is common enough that a generic approach like this would
be useful to more people than just me, and justifies some enhancements
within Solr to make the solution easy to reuse; but maybe when we've
figured out the best approach it will become clear how much of it is
worth packing into Solr.

Assuming the two-field approach works
(https://issues.apache.org/jira/browse/SOLR-380#action_12535755), then
what we're really talking about is two things: a token filter to
generate and store the map, and a process like the highlighter to
generate the output. Suppose the map is stored in tokens with the
starting term position for each page, like this:

0:1
345:2
827:3

The output function would then imitate the highlighter to discover term
positions, use the map (either by loading all its terms or by doing
lookups) to convert them to page positions, and generate the appropriate
output. I'm not clear where that output process should live, but we can
just imitate the highlighter.

(and just to clarify roles: Tricia's the one who'll actually be coding
this, if it's feasible; I'm just helping to think out requirements and
approaches based on a project in hand.)

Peter
  






Re: Geographical distance searching

2007-10-19 Thread Ryan McKinley

This looks good!

Are you interested in contributing it to solr core?

One major thing in the solr pipeline you may want to be aware of is the 
search component interface (SOLR-281).


This would let you make simple component that adds the:
  DistanceQuery dq = new DistanceQuery(dlat,dlng,dradius);
  dsort = new DistanceSortSource(filter);

and later adds the 'distance' to each result

This way you could reuse the other standard search stuff (faceting, 
debugging, etc) and would not need to make your own custom 
LocalResponseWritter.


ryan


Doug Daniels wrote:

Hi Patrick,

Was mainly interested in seeing how you did the RequestHandler.  Thanks for
sending the link!

Best,
Doug


patrick o'leary wrote:

Hi Doug

What exactly are you looking for?
The code for localsolr is still in dev state, but I've left my work open
and available for download
at http://www.nsshutdown.com/viewcvs/viewcvs.cgi/localsolr/

Once I'm happy with it, I'll donate it back in the form of patches until
/ unless it's accepted
as a contribution, depending on how folks feel.

If you're talking about the demo ui, it's a little piece of html & JS, you
can pull directly from the jar.
I've not included that in the repository.

HTH
P

Doug Daniels wrote:

Hi Patrick,

Are the solr components of that demo in the repository as well?  I
couldn't
find them there.

Best,
Doug


patrick o'leary wrote:
  

As far as I'm concerned nothing's going to beat PG's GIS calculations,
but its tsearch was
a lot slower than myisam.

My goal was a single solution to reduce our complexity, but am
interested to know if combining
both an rdbms & lucene works for you. Definitely let me know how it goes
!

P

Guillaume Smet wrote:


Hi Patrick,

On 9/27/07, patrick o'leary <[EMAIL PROTECTED]> wrote:
  
  

 p.s after a little tidy up I'll be adding this to both lucene and
solr's repositories if folks feel that it's a useful addition.



It's definitely very interesting. Did you compare the performance of
Lucene with a database allowing you to perform real GIS queries?
I'm more a PostgreSQL guy and I must admit we usually use the cube contrib
or PostGIS for this sort of thing; with both, we are able to use
indexes for proximity queries and they can be pretty fast. The
method you used with MySQL is definitely too slow and not usable as soon
as you have a certain amount of data in your table.

Regards,

  
  

--

Patrick O'Leary


You see, wire telegraph is a kind of a very, very long cat. You pull his
tail in New York and his head is meowing in Los Angeles.
 Do you understand this? 
And radio operates exactly the same way: you send signals here, they

receive them there. The only difference is that there is no cat.
  - Albert Einstein


Best practice to get all highlighting terms for a document?

2007-10-19 Thread Thomas

Hi,

One of the requirements of the application I'm currently working on is
highlighting of matching terms not only in the search result page but
also when the user clicks on a result and the whole page is displayed.
In this particular app it is not possible to just query for the selected
document and set hl.fragsize=0. For display, I have to retrieve the
document from a different source.

Is there a "best practice" to retrieve all the highlighted terms? I
thought about setting hl.fragsize=1 and using an xsltresponsewriter to
filter out the highlighted keywords.

Is there an easier/cleaner solution?

Thanks,

Thomas


Filtering by document unique key

2007-10-19 Thread Henrib


I'm trying to filter my document collection by an external means that
produces a set of document unique keys.
Assuming this goes into a custom request handler (solr-281 making that
easy), is there any pitfall in using a ConstantScoreQuery (or an
equivalent filtering functionality) as a Solr "filter query"?
Thanks
-- 
View this message in context: 
http://www.nabble.com/Filtering-by-document-unique-key-tf4654343.html#a13298066
Sent from the Solr - User mailing list archive at Nabble.com.



Re: GET_SCORES flag in SolrIndexSearcher

2007-10-19 Thread Henrib

I believe that keeping your code as is but initializing the query parameters
should do the trick:

HashMap params = new HashMap();
params.put("fl", "id score"); // field list is id & score
...
Regards


John Reuning-2 wrote:
> 
> My first pass was to implement the embedded solr example:
> 
> --
> MultiCore mc = MultiCore.getRegistry();
> SolrCore core = mc.getCore(mIndexName);
> 
> SolrRequestHandler handler = core.getRequestHandler("");
> HashMap params = new HashMap();
> 
> SolrQueryRequest request = new LocalSolrQueryRequest(core, query, 
> "standard", 0, 100, params);
> SolrQueryResponse response = new SolrQueryResponse();
> core.execute(handler, request, response);
> 
> DocList docs = (DocList) response.getValues().get("response");
> --
> 
> Is the only way to access scores to call directly to SolrIndexSearcher? 
>   I was wondering if there's a solr config option I'm missing somewhere 
> that tells the SolrIndexSearcher to retain lucene scores.  I'll keep 
> digging.  Maybe there's a way to set a LocalSolrQueryRequest param that 
> passes the right info through to SolrIndexSearcher?
> 
> Thanks,
> 
> -jrr
> 
> Chris Hostetter wrote:
>> : The scores list in DocIterator is null after a successful query.
>> There's a
>> : flag in SolrIndexSearcher, GET_SCORES, that looks like it should
>> trigger
>> : setting the scores array for the resulting DocList, but I can't figure
>> out how
>> : to set it.  Any suggestions?  I'm using the svn trunk code.
>> 
>> Can you elaborate (ie: paste some code examples) on how you are acquiring 
>> your DocList ... what method are you calling on SolrIndexSearcher? what 
>> arguments are you passing it?
>> 
>> NOTE: the SolrIndexSearcher.getDocList* methods may choose to build 
>> the DocList from a DocSet unless:
>> a) you use a sort that includes score
>> or  b) you use a method sig that takes a flags arg and explicitly set 
>>the GET_SCORES mask on your flags arg.
>> 
>> 
>> 
>> 
>> -Hoss
>> 
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/GET_SCORES-flag-in-SolrIndexSearcher-tf4641637.html#a13297657
Sent from the Solr - User mailing list archive at Nabble.com.



Re: preconfiguring which xsl file to use

2007-10-19 Thread Erik Hatcher


On Oct 19, 2007, at 8:30 AM, Robert Young wrote:

Is it possible to configure which xsl file to use for a particular
queryResponseWriter in the solrconfig.xml?


I don't believe so, but instead I think something like this will work:

  <requestHandler name="opensearch" class="solr.StandardRequestHandler">
    <lst name="defaults">
      <str name="wt">xslt</str>
      <str name="tr">opensearch.xsl</str>
    </lst>
  </requestHandler>

And then a ?qt=opensearch should do the trick.   The dilemma here is  
that you can't then toggle between various request handlers, unless  
you mapped them separately.


	Erik


> I would like to have something like the following so that I don't have
> to put it in for every query.
>   <queryResponseWriter name="xslt"
>       class="org.apache.solr.request.XSLTResponseWriter">
>     <int name="xsltCacheLifetimeSeconds">5</int>
>     <str name="xslt">opensearch.xsl</str>
>   </queryResponseWriter>
>
> Any ideas?
>
> Cheers
> Rob




NPE on auto-warming and out of memory issues

2007-10-19 Thread briand

We are experiencing OOM issues with a SOLR index that has about 12G of
indexed data with 2GB allocated to the JVM.   We first see this type of
message in the log: 

Oct 18, 2007 10:25:00 AM org.apache.solr.core.SolrException log
SEVERE: Error during auto-warming of key:+(search_place_type:citi
search_place_type:attract search_place_type:airport
search_place_type:univers):java.lang
.OutOfMemoryError: Java heap space

Oct 18, 2007 10:25:07 AM org.apache.solr.core.SolrException log
SEVERE: Error during auto-warming of
key:+search_place_type:busi:java.lang.OutOfMemoryError: Java heap space

After a lot of continuing messages like the ones above, we'll see a series
of log messages like this:

Oct 18, 2007 10:55:58 AM org.apache.solr.core.SolrException log
SEVERE: Error during auto-warming of
key:[EMAIL PROTECTED]:java.lang.NullPointerException
at org.apache.lucene.index.Term.compareTo(Term.java:91)
at
org.apache.lucene.index.TermInfosReader.getIndexOffset(TermInfosReader.java:112)
at
org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:147)
at
org.apache.lucene.index.SegmentReader.docFreq(SegmentReader.java:342)
at org.apache.lucene.index.MultiReader.docFreq(MultiReader.java:220)
at
org.apache.lucene.search.IndexSearcher.docFreq(IndexSearcher.java:87)
at org.apache.lucene.search.Similarity.idf(Similarity.java:459)
at
org.apache.lucene.search.TermQuery$TermWeight.(TermQuery.java:44)
at
org.apache.lucene.search.TermQuery.createWeight(TermQuery.java:145)
at
org.apache.lucene.search.DisjunctionMaxQuery$DisjunctionMaxWeight.(DisjunctionMaxQuery.java:99)
at
org.apache.lucene.search.DisjunctionMaxQuery.createWeight(DisjunctionMaxQuery.java:161)
at
org.apache.lucene.search.BooleanQuery$BooleanWeight.<init>(BooleanQuery.java:187)
at
org.apache.lucene.search.BooleanQuery$BooleanWeight2.<init>(BooleanQuery.java:342)
at
org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:384)
at org.apache.lucene.search.Query.weight(Query.java:95)
at org.apache.lucene.search.Searcher.createWeight(Searcher.java:171)
at org.apache.lucene.search.Searcher.search(Searcher.java:118)
at org.apache.lucene.search.Searcher.search(Searcher.java:97)
at
org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:888)
at
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:805)
at
org.apache.solr.search.SolrIndexSearcher.access$100(SolrIndexSearcher.java:60)
at
org.apache.solr.search.SolrIndexSearcher$2.regenerateItem(SolrIndexSearcher.java:251)
at org.apache.solr.search.LRUCache.warm(LRUCache.java:193)
at
org.apache.solr.search.SolrIndexSearcher.warm(SolrIndexSearcher.java:1385)
at org.apache.solr.core.SolrCore$1.call(SolrCore.java:488)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269)
at java.util.concurrent.FutureTask.run(FutureTask.java:123)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
at java.lang.Thread.run(Thread.java:595)

We have our configuration set for autowarmCount to:







It appears that autowarming has something to do with our OOM.   I've read
that if you set the autowarmCount to 0 or a lower value then the first
request after a commit may take some time.Definitely willing to try
setting the autowarmCount to a lower value.   Does anyone have any other
ideas to help with this autowarming issue?   Thanks. 
-- 
View this message in context: 
http://www.nabble.com/NPE-on-auto-warming-and-out-of-memory-issues-tf4654164.html#a13297439
Sent from the Solr - User mailing list archive at Nabble.com.
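
As a concrete starting point for the autowarming question above, lowering the autowarm counts on the caches in solrconfig.xml reduces how many cached queries get re-executed against the new searcher after a commit (the values here are illustrative, not recommendations):

```xml
<filterCache      class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="16"/>
<!-- the document cache cannot be meaningfully autowarmed across commits -->
<documentCache    class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
```

The trade-off is exactly the one mentioned in the message: with less warming, the first requests after a commit pay the cost instead.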



Re: Geographical distance searching

2007-10-19 Thread Doug Daniels

Hi Patrick,

Was mainly interested in seeing how you did the RequestHandler.  Thanks for
sending the link!

Best,
Doug


patrick o'leary wrote:
> 
> Hi Doug
> 
> What exactly are you looking for?
> The code for localsolr is still in dev state, but I've left my work open
> and available for download
> at http://www.nsshutdown.com/viewcvs/viewcvs.cgi/localsolr/
> 
> Once I'm happy with it, I'll donate it back in the form of patches until
> / unless it's accepted
> as a contribution, depending on how folks feel.
> 
> If you're talking about the demo ui, it's a little piece of html & JS, you
> can pull directly from the jar.
> I've not included that in the repository.
> 
> HTH
> P
> 
> Doug Daniels wrote:
>> Hi Patrick,
>>
>> Are the solr components of that demo in the repository as well?  I
>> couldn't
>> find them there.
>>
>> Best,
>> Doug
>>
>>
>> patrick o'leary wrote:
>>   
>>> As far as I'm concerned nothings going to beat PG's GIS calculations,
>>> but it's tsearch was
>>> a lot slower than myisam.
>>>
> My goal was a single solution to reduce our complexity, but I'm
> interested to know if combining
>>> both an rdbms & lucene works for you. Definitely let me know how it goes
>>> !
>>>
>>> P
>>>
>>> Guillaume Smet wrote:
>>> 
 Hi Patrick,

 On 9/27/07, patrick o'leary <[EMAIL PROTECTED]> wrote:
   
   
>  p.s after a little tidy up I'll be adding this to both lucene and
> solr's repositories if folks feel that it's a useful addition.
> 
> 
 It's definitely very interesting. Did you compare performances of
 Lucene with a database allowing you to perform real GIS queries?
 I'm more a PostgreSQL guy and I must admit we usually use cube contrib
 or PostGIS for this sort of thing; with both, we are able to use
 indexes for proximity queries and they can be pretty fast. The
 method you used with MySQL is definitely too slow and not usable as soon
 as you have a certain amount of data in your table.

 Regards,

   
   
>>> -- 
>>>
>>> Patrick O'Leary
>>>
>>>
>>> You see, wire telegraph is a kind of a very, very long cat. You pull his
>>> tail in New York and his head is meowing in Los Angeles.
>>>  Do you understand this? 
>>> And radio operates exactly the same way: you send signals here, they
>>> receive them there. The only difference is that there is no cat.
>>>   - Albert Einstein
>>>
>>> 
>>>
>>>
>>> 
>>
>>   
> 
> -- 
> 
> Patrick O'Leary
> 
> 
> You see, wire telegraph is a kind of a very, very long cat. You pull his
> tail in New York and his head is meowing in Los Angeles.
>  Do you understand this? 
> And radio operates exactly the same way: you send signals here, they
> receive them there. The only difference is that there is no cat.
>   - Albert Einstein
> 
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Geographical-distance-searching-tf4524338.html#a13296862
Sent from the Solr - User mailing list archive at Nabble.com.
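
For readers following the thread, the great-circle math behind these proximity queries is the haversine formula; a self-contained sketch (class name, method name, and the spherical-Earth radius are my own choices, not from localsolr or PostGIS):

```java
public class Haversine {
    private static final double EARTH_RADIUS_KM = 6371.0;

    /** Great-circle distance between two lat/lon points, in kilometers. */
    public static double distanceKm(double lat1, double lon1,
                                    double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        // haversine of the central angle between the two points
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return EARTH_RADIUS_KM * 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
    }

    public static void main(String[] args) {
        // New York to Los Angeles: roughly 3935 km great-circle
        System.out.println(distanceKm(40.7128, -74.0060, 34.0522, -118.2437));
    }
}
```

A common optimization, in the spirit of what the GIS-backed databases do with indexes, is to prefilter candidates with a cheap bounding-box range query on indexed lat/lon fields and only apply the exact formula to the survivors.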



Re: GET_SCORES flag in SolrIndexSearcher

2007-10-19 Thread John Reuning

So, I found the following in QueryParsing::parseSort

  if( "score".equals(part) ) {
if (top) {
  // If thre is only one thing in the list, just do the regular 
thing...

  if( parts.length == 1 ) {
return null; // do normal scoring...
  }
  lst[i] = SortField.FIELD_SCORE;
}
else {
  lst[i] = new SortField(null, SortField.SCORE, true);
}

Besides the typo :), the only problem for what I want is the fact that 
it returns null for a default score sort instead of setting 
SortField.FIELD_SCORE.  I want a default score desc sort, but I want the 
scores from the lucene Hits object.  Is the only way to get score values 
to modify the solr code?


Thanks,

-jrr

Chris Hostetter wrote:

: The scores list in DocIterator is null after a successful query. There's a
: flag in SolrIndexSearcher, GET_SCORES, that looks like it should trigger
: setting the scores array for the resulting DocList, but I can't figure out how
: to set it.  Any suggestions?  I'm using the svn trunk code.

Can you elaborate (ie: paste some code examples) on how you are acquiring 
your DocList ... what method are you calling on SolrIndexSearcher? what 
arguments are you passing it?


NOTE: the SolrIndexSearcher.getDocList* methods may choose to build 
the DocList from a DocSet unless:

a) you use a sort that includes score
or  b) you use a method sig that takes a flags arg and explicitly set 
   the GET_SCORES mask on your flags arg.





-Hoss
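
Hoss's point (b) refers to a plain bit mask on the flags argument. The real constants live in SolrIndexSearcher; the sketch below uses a made-up value purely to show how such a mask is set and tested:

```java
public class FlagsDemo {
    // Hypothetical flag value; see SolrIndexSearcher for the real constant.
    public static final int GET_SCORES = 0x01;

    /** True if the GET_SCORES bit is set in the flags word. */
    public static boolean wantsScores(int flags) {
        return (flags & GET_SCORES) != 0;
    }

    public static void main(String[] args) {
        int flags = 0;
        flags |= GET_SCORES; // request that scores be retained in the DocList
        System.out.println(wantsScores(flags)); // true
    }
}
```

The same pattern would apply when calling one of the getDocList* signatures that accepts a flags argument: OR the score mask into whatever flags you already pass.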





Re: FunctionQuery, DisMax and Highlighting

2007-10-19 Thread Alf Eaton
Mike Klaas wrote:
> On 18-Oct-07, at 8:47 AM, Alf Eaton wrote:
> 
>> I'm currently using the standard request handler for queries, because it
>> provides highlighting (unlike DisMax). I'd also like to be able to use
>> FunctionQuery to boost certain fields.
>>
>> From looking through the lists and JIRA it looks like there has been
>> some work to add highlighting to DisMax queries, but that things seem to
>> be stalled waiting for a more modular approach (Search Components). Is
>> that a fair assessment, and, if so, can anyone suggest the best way to
>> get both highlighting and FunctionQuery at the same time?
> 
> I'm pleased to inform you that DisMax already provides highlighting, in
> exactly the same way as does StandardRequestHandler.

Good, thanks Mike - I must have been getting confused with the
wildcards/DisMax/highlighting situation.

alf


Re: GET_SCORES flag in SolrIndexSearcher

2007-10-19 Thread John Reuning

My first pass was to implement the embedded solr example:

--
MultiCore mc = MultiCore.getRegistry();
SolrCore core = mc.getCore(mIndexName);

SolrRequestHandler handler = core.getRequestHandler("");
HashMap params = new HashMap();

SolrQueryRequest request = new LocalSolrQueryRequest(core, query, 
"standard", 0, 100, params);

SolrQueryResponse response = new SolrQueryResponse();
core.execute(handler, request, response);

DocList docs = (DocList) response.getValues().get("response");
--

Is the only way to access scores to call directly to SolrIndexSearcher? 
 I was wondering if there's a solr config option I'm missing somewhere 
that tells the SolrIndexSearcher to retain lucene scores.  I'll keep 
digging.  Maybe there's a way to set a LocalSolrQueryRequest param that 
passes the right info through to SolrIndexSearcher?


Thanks,

-jrr

Chris Hostetter wrote:

: The scores list in DocIterator is null after a successful query. There's a
: flag in SolrIndexSearcher, GET_SCORES, that looks like it should trigger
: setting the scores array for the resulting DocList, but I can't figure out how
: to set it.  Any suggestions?  I'm using the svn trunk code.

Can you elaborate (ie: paste some code examples) on how you are acquiring 
your DocList ... what method are you calling on SolrIndexSearcher? what 
arguments are you passing it?


NOTE: the SolrIndexSearcher.getDocList* methods may choose to build 
the DocList from a DocSet unless:

a) you use a sort that includes score
or  b) you use a method sig that takes a flags arg and explicitly set 
   the GET_SCORES mask on your flags arg.





-Hoss





Re: allow some IP to access web UI

2007-10-19 Thread Gabriel Sosa
sorry.. I forgot to say that I'm using Jetty

2007/10/19, Gabriel Sosa <[EMAIL PROTECTED]>:
>
> hi all,
> I'm currently running the web interface of Solr in my devel stage. I want
> to know how I can allow certain IPs to access the Solr administrator in my live
> stage and deny all other IPs.
>
> make sense?
>
> sorry for my english
>
> best regards
>
> gabriel
>
> --
> Los sabios buscan la sabiduría; los necios creen haberla encontrado.
> Gabriel Sosa




-- 
Los sabios buscan la sabiduría; los necios creen haberla encontrado.
Gabriel Sosa
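
Since Jetty is in the mix, one common approach is to front the container with Apache httpd and restrict by address there; a sketch assuming Apache 2.2 syntax and mod_proxy (addresses and paths are illustrative):

```apache
# Forward /solr to the Jetty instance
ProxyPass /solr http://localhost:8983/solr

# Only localhost and the internal network may reach the admin UI
<Location /solr/admin>
    Order deny,allow
    Deny from all
    Allow from 127.0.0.1 10.0.0.0/8
</Location>
```

Alternatives include a servlet filter inside the webapp or firewall rules on the host itself.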


Re: Search results problem

2007-10-19 Thread Maximilian Hütter
Yonik Seeley wrote:
> On 10/17/07, Maximilian Hütter <[EMAIL PROTECTED]> wrote:
>> I also found this:
>>
>> "Controls the maximum number of terms that can be added to a Field for a
>> given Document, thereby truncating the document. Increase this number if
>> large documents are expected. However, setting this value too high may
>> result in out-of-memory errors."
>>
>> Coming from: http://www.ibm.com/developerworks/library/j-solr2/index.html
>>
>> That might be a problem for me.
>>
>> I was thinking about using copyFields, instead of one large fulltext
>> field. Would that solve my problem, or would the maxFieldLength still
>> apply when using copyFields?
> 
> maxFieldLength is a setting on the IndexWriter and applies to all fields.
> If you want more tokens indexed, simply increase the value of
> maxFieldLength to something like 20 and you should be fine.
> 
> There's no penalty for setting it higher than the largest field you
> are indexing (no diff between 1M and 2B if all your docs have field
> lengths less than 1M tokens anyway).
> 
> -Yonik
> 
Yes, that would be an easy solution, as there is no performance penalty,
as you say.
I am still unsure whether maxFieldLength applies to copyFields.
When using copyFields I get an array back for the field I copied to,
so it seems to be different.
Is there a performance penalty for using copyFields when indexing? How
about mixed fieldtypes in the source fields? What happens when I
copy an sint-based field and a string-based field to a string-based field?

Best regards,

Max

-- 
Maximilian Hütter
blue elephant systems GmbH
Wollgrasweg 49
D-70599 Stuttgart

Tel:  (+49) 0711 - 45 10 17 578
Fax:  (+49) 0711 - 45 10 17 573
e-mail :  [EMAIL PROTECTED]
Sitz   :  Stuttgart, Amtsgericht Stuttgart, HRB 24106
Geschäftsführer:  Joachim Hörnle, Thomas Gentsch, Holger Dietrich
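
For reference, the setting Yonik mentions lives in solrconfig.xml, inside the indexDefaults (and mainIndex) block; a sketch with an illustrative limit:

```xml
<indexDefaults>
  <!-- maximum number of tokens indexed per field, per document -->
  <maxFieldLength>2000000</maxFieldLength>
</indexDefaults>
```

As noted in the thread, it is an IndexWriter-level setting and applies to every field, including copyField targets.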


Re: Tagging in solr

2007-10-19 Thread Thorsten Scherler
On Fri, 2007-10-19 at 11:01 +0100, Spas Poptchev wrote:
> Hi,
>  
> What I want to do is to store tags that belong to products. Each tag should 
> also store information about how often it was used with a certain product.
> So for example:
>  
> product1
> cool 5=> product1 was tagged 5 times with cool
>  
> What would be the best way to implement this kind of stuff in solr?

There is a wiki page on some brainstorming on how to implement  
tagging within Solr: 

It's easy enough to have a tag_keywords field, but updating a single  
tag_keywords field is not so straightforward without sending the  
entire document to Solr every time it is tagged.  See SOLR-139's  
extensive comments and patches to see what you're getting into.

salu2
-- 
Thorsten Scherler thorsten.at.apache.org
Open Source Java  consulting, training and solutions
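
One way to model per-product tag counts without SOLR-139-style partial updates is a dynamic field per tag, reindexing the whole document whenever a count changes; a sketch (field and type names are my own, assuming the example schema's sint type):

```xml
<!-- schema.xml: one stored, indexed integer per tag -->
<dynamicField name="tag_*" type="sint" indexed="true" stored="true"/>

<!-- update message: product1 was tagged "cool" 5 times -->
<add>
  <doc>
    <field name="id">product1</field>
    <field name="tag_cool">5</field>
  </doc>
</add>
```

The caveat from the message above still applies: every tag change means resending the entire document, which is the cost SOLR-139 aims to remove.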



Re: Solr + Tomcat Undeploy Leaks

2007-10-19 Thread Mike Klaas

On 19-Oct-07, at 7:19 AM, Ed Summers wrote:


On 10/18/07, Mike Klaas <[EMAIL PROTECTED]> wrote:



I realize this is a bit off-topic -- but I'm curious what the
rationale was behind having that many solr instances on that many
machines and how they are coordinated. Is it a master/slave setup or
are they distinct indexes? Any further details about your architecture
would be interesting to read about :-)


Rationale?  Performance!  I can't divulge the exact size of our  
corpus, but it is between zero and 1 billion web documents.  To  
search that many documents efficiently requires distributing over  
many machines.


Most of the architecture is not Solr-related, but it is pretty  
standard large-scale search engine stuff (namely, distributing  
documents using some sort of unique hash across multiple machines).   
I'm sure Nutch's design is similar, and there are several academic  
papers on the subject.


Solr plays the role of index at the nodes--it isn't the primary  
document storage.   Each individual index doesn't look so different  
from a typical-size Solr index: the main differences are 1) splitting  
the stored fields among two Solr apps running in a single jvm for io  
performance (for highlighting) 2) scoring/query tweaks.


cheers,
-Mike
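
The document-distribution scheme Mike alludes to can be sketched as hashing each document's unique key to pick a shard (the shard count and key are illustrative; real systems often use consistent hashing to ease resharding):

```java
public class ShardRouter {
    private final int numShards;

    public ShardRouter(int numShards) {
        this.numShards = numShards;
    }

    /** Map a document's unique key to a stable shard index in [0, numShards). */
    public int shardFor(String uniqueKey) {
        // floorMod keeps the result non-negative even for negative hash codes
        return Math.floorMod(uniqueKey.hashCode(), numShards);
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(8);
        int shard = router.shardFor("http://example.com/page");
        System.out.println(shard >= 0 && shard < 8); // true
    }
}
```

At query time the same routing runs in reverse: the query is broadcast to all shards and their partial results are merged by score.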


allow some IP to access web UI

2007-10-19 Thread Gabriel Sosa
hi all,
I'm currently running the web interface of Solr in my devel stage. I want to
know how I can allow certain IPs to access the Solr administrator in my live
stage and deny all other IPs.

make sense?

sorry for my english

best regards

gabriel

-- 
Los sabios buscan la sabiduría; los necios creen haberla encontrado.
Gabriel Sosa


preconfiguring which xsl file to use

2007-10-19 Thread Robert Young
Hi,

Is it possible to configure which xsl file to use for a particular
queryResponseWriter in the solrconfig.xml?

I would like to have something like the following so that I don't have
put it in for every query.

  5
  opensearch.xsl


Any ideas?

Cheers
Rob


Tagging in solr

2007-10-19 Thread Spas Poptchev

Hi,
 
What I want to do is to store tags that belong to products. Each tag should 
also store information about how often it was used with a certain product.
So for example:
 
product1
cool 5=> product1 was tagged 5 times with cool
 
What would be the best way to implement this kind of stuff in solr?
 
Btw, it's a huge database with over 50 products.
 
cheers,
Spas
_
Invite your mail contacts to join your friends list with Windows Live Spaces. 
It's easy!
http://spaces.live.com/spacesapi.aspx?wx_action=create&wx_url=/friends.aspx&mkt=en-us