Re: Solr nightly build and the multicore mode

2008-03-12 Thread Nguyen Kien Trung

Hi Vinci,

Q1. I don't know how to set the path... WHERE should I put the core1 and
core0 folders? Somewhere in solr/home or somewhere in webapps? And how do I
make the admin panel work?
  

for the multicore.xml

 
 

the directory structure is as follows:

solr.home
|--- core1
|    |--- conf
|         |--- *.xml
|--- core2
|    |--- conf
|         |--- *.xml
|--- multicore.xml

Q2. How can I disable the multicore function when multicore.xml exists? Just
remove the second core?
  

Remove the second core, or de-register it at startup.

Hope it helps

Trung


Re: [Update] Solr can be started from jetty but not tomcat

2008-03-12 Thread Ryan McKinley

Vinci wrote:
Hi all, 


After several hours I got Solr working a little: the Jetty version
works, but the Tomcat version doesn't.



To me it looks like the xml parser is not loading properly... check the 
last line of your trace.



at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:433)
Caused by: java.lang.RuntimeException: XPathFactory#newInstance() failed to
create an XPathFactory for the default object model:
http://java.sun.com/jaxp/xpath/dom with the
XPathFactoryConfigurationException:
javax.xml.xpath.XPathFactoryConfiguration


I'm not sure how that is configured in tomcat though...


Re: How to get incrementPositionGap value from IndexSchema ?

2008-03-12 Thread Renaud Delbru

Hi Chris,

Thanks for your reply. Indeed, there is the getPositionIncrementGap 
method, I forgot it.


I need this information to configure my query processor. I 
have extended Solr with a new query parser so that I can search documents 
at sentence-level granularity. Each sentence is a fieldable instance 
of the field 'sentences', and I execute span queries to match a 
boolean combination of terms at the sentence level, not the document level.

I hope this explanation is clear and makes sense.

Regards.


Chris Hostetter wrote:

: I am looking for a way to access the incrementPositionGap value defined for a
: field type in the schema.xml.

I think you mean "positionIncrementGap"

It's a property of the <fieldType> in schema.xml, but internally it's 
passed to SolrAnalyzer.setPositionIncrementGap.  If you want to 
programmatically know what the "positionIncrementGap" is for any analyzer of 
any field or fieldtype, regardless of whether or not it's a SolrAnalyzer, 
just use Analyzer.getPositionIncrementGap(String fieldName).


ie: myFieldType.getAnalyzer().getPositionIncrementGap(myFieldName)


If you don't mind me asking:  why do you want/need this information in 
your custom code?



-Hoss
  



--
Renaud Delbru,
E.C.S., Ph.D. Student,
Semantic Information Systems and
Language Engineering Group (SmILE),
Digital Enterprise Research Institute,
National University of Ireland, Galway.
http://smile.deri.ie/


Re: Query Level Boosting

2008-03-12 Thread Ryan McKinley

oleg_gnatovskiy wrote:

Hello. I was wondering if anyone knew a way to do query-level boosting with
SolrJ. On the HTTP client I could just do something like sku:123^2.3, which
would give the sku query a boost of 2.3.


boosting is part of the query string, try:
 query.setQuery( "sku:123^2.3" );
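
A minimal sketch of building such a boosted clause in client code before handing it to query.setQuery(...). The BoostedQuery class and its boost method are hypothetical helpers, not part of SolrJ; only the field:value^boost syntax comes from the messages above, and the code assumes the value needs no escaping.

```java
// Hypothetical helper (not part of SolrJ): builds a "field:value^boost" clause.
class BoostedQuery {

    // Assumes the field and value need no query-syntax escaping.
    static String boost(String field, String value, double factor) {
        if (Double.isNaN(factor) || Double.isInfinite(factor) || factor < 0) {
            throw new IllegalArgumentException("boost must be a finite, non-negative number");
        }
        // Double.toString always uses '.' as the decimal separator.
        return field + ":" + value + "^" + factor;
    }

    public static void main(String[] args) {
        // The resulting string is what would be passed to query.setQuery(...).
        System.out.println(boost("sku", "123", 2.3)); // prints sku:123^2.3
    }
}
```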




Re: schema help

2008-03-12 Thread Geoffrey Young



Rachel McConnell wrote:

Our Solr use consists of several rather different data types, some of
which have one-to-many relationships with other types.  We don't need
to do any searching of quite the kind you describe, but I have an idea
about it, depending on what you need to do with the book data.  It is
rather hacky, but maybe you can improve it.


coolio, thanks :)

[snip]



If your 'authors' 'write' 'books' with great frequency, you'd need to
update a lot...


yeah, unfortunately that's the case :)

I was using the book analogy because I figured it was simple to explain, 
not necessarily because I was trying to be vague :)



Another possibility is to do two searches, with this kind of
structure, which sort of mimics an RDBMS:
* everything in Solr has a field, type (book, author, library, etc).
these can be filtered on a search by search basis
* books have a field, authorId, uniquely referencing the author
* your first search will be restricted to just authors, from which you
will extract the IDs.
* your second search will be restricted to just books, whose authorId
field is exactly one of the IDs from the first search
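
A rough sketch of the second step, assuming the author IDs from the first search are already in hand as strings. The field names type and authorId follow the list above; the AuthorBookJoin class and booksByAuthors method are made up for illustration, and the resulting string would be sent as the query for the second search.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: turns the IDs from the author search into a book query.
class AuthorBookJoin {

    // Restricts to books whose authorId is exactly one of the given IDs.
    static String booksByAuthors(List<String> authorIds) {
        if (authorIds.isEmpty()) {
            throw new IllegalArgumentException("first search returned no authors");
        }
        String ids = authorIds.stream().collect(Collectors.joining(" OR ", "(", ")"));
        return "type:book AND authorId:" + ids;
    }

    public static void main(String[] args) {
        // IDs here are invented sample values.
        System.out.println(booksByAuthors(Arrays.asList("17", "42")));
    }
}
```

With many authors the OR list can grow long, so the first search may need to be capped or paged.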


I think this approach solves the mindset issues I was having - I didn't 
want to be left with a schema like this


  authorId
  bookID1
  bookID2
  ...

but since lucene allows for all kinds of slots to exist and be empty, it 
seems I can simplify that to


  authorId
  bookId

and use multiple queries to satisfy the display needs.  it's probably 
more a duh! moment for the majority, but lucene is sufficiently 
different from what I'm used to that it's taking me a bit of time :)




As you have noticed, Lucene is not an RDBMS.  Searching through all
the text of all the books is more the use it was designed around; of
course the analogy might not be THAT strong with your need!


I think the fulltext search capabilities will serve us well for some 
aspects of our search needs.  the stemming, language, and other filters 
will definitely be a help to just about everything we do.


speaking of language, this is my last question for now...

what's the idiomatic way to represent multiple languages?  left to my 
own devices I'd probably do something like


   name_en-us
   name_es-us

anyway, thanks so much for your help.

--Geoff


Re: return only sorted Field, but with a different Field Name

2008-03-12 Thread Ryan McKinley

Chris Hostetter wrote:
: 
: For example, say I want to sort by the field '162_sortable_s', so I add a
: parameter like so: 'sort=162_sortable_s'. I need to change the settings so
: that when the result set is returned from solr, it takes the values of
: '162_sortable_s' and inserts them into a separate field called 'SortedField'
: so that the return doc looks like this:

There is nothing like this in Solr right now, and it doesn't seem like 
something that should be done in Solr, as it would be a simple translation 
that could be done via an XSLT or some client layer code.




It may be more work than it is worth, but I would like to see something 
comparable to the SQL 'AS' syntax:


 &fl=162_sortable_s as SortedField,name,id

This would be good because it solves the problem for non XML based 
responses also.


ryan



Re: schema help

2008-03-12 Thread Geoffrey Young



the trouble I'm having is one of dimension.  an author has many, many
attributes (name, birthdate, biography in $language, etc).  as does
each book (title in $language, summary in $language, genre, etc).  as
does each library (name, address, directions in $language, etc).  so
an author with N books doesn't seem to scale very well in the flat
representations I'm finding in all the lucene/solr docs and
examples... at least not in some way I can wrap my head around.

OG: I'm not sure why the number of attributes worries you.  Imagine
it as a wide RDBMS table, if it helps.  Indices with dozens of fields
are not uncommon.


it's not necessarily the number of fields, it's the Attribute1 .. 
AttributeN-style numbering that worries me.  but I think it's all 
starting to make sense now... if wanting to pull data in multiple 
queries was my holdup.



OG: You certainly can do that.  I'm not sure I understand where the
hard part is.  You seem to know what attributes each entity has.
Maybe you are confused by how to handle N different types of entities
in a single index? 


yes... or, more properly, how to relate them to each other.

I understand that the schema can hold tons of attributes that are unused 
in different documents.  my question seems to be how to organize my data 
such that I can answer the question "how do I get a list of libraries 
with $book like $pattern" - where does the de-normalization typically 
occur?  if a document fully represents "a book by an author in a 
library" such that the same book (with all its attributes) is in my 
index multiple times (once for each library) how do I drill down to 
showing just the directions to a specific library?



(I'm assuming a single index is what you currently
have in mind)


using different indices is what my lucene+compass counterparts are 
doing.  I couldn't find an example of that in the solr docs (unless the 
answer is running multiple, distinct instances at the same time)



eew :)  seriously, though, that's what we have now - all rdbms
driven. if solr could only conceptually handle the initial lookup
there wouldn't be much point.

OG: Well, there might or might not be, depending on how much data you
have, how flexible and fast your RDBMS-powered (full-text?) search,
and so on.  The Lucene/Solr for full-text search + RDBMS/BDB for
display data is a common combination.


"the decision has been made to use lucene to replace all rdbms 
functionality for search"


*cough*

:)



maybe I'm thinking about this all wrong (as is to be expected :), but
I just can't believe that nobody is using solr to represent data a
bit more complex than the examples out there.

OG: Oh, lots of people are, it's just that examples are simple, so
people new to Solr, Lucene, etc. have an easier time learning.


:)

thanks for your help here.

--Geoff


sorting on a multivalued field

2008-03-12 Thread Joshua Reedy
I'd like to be able to sort documents based on date.  For ascending
sort, the first date in the future relative to the time of the search
would be used as the sort date.  If all dates are in the past, the
last date should be used.  For descending sort, the opposite . . .

document 1:
id: 1
date: 2008-03-10 11am, 2008-03-10 3pm, 2008-04-10 11am

document 2:
id: 2
date: 2008-02-10 12pm, 2008-03-10 1pm, 2008-03-11 12pm, 2008-03-11
1pm, 2008-04-02 12pm

At 11:30am on 2008-03-10, doc2 would sort before doc1 in an ascending sort,
and at 1:30pm, doc1 would sort before doc2.
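
To pin down the ascending rule, here is a small sketch that only computes the sort key outside Solr. It is not a Solr or Lucene sort implementation; the NextDateSortKey class is hypothetical, and it assumes each document's dates are non-empty and already in ascending order.

```java
import java.time.LocalDateTime;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the ascending sort key described above.
class NextDateSortKey {

    // Returns the first date after 'now', or the last date if all are in the past.
    // Assumes 'dates' is non-empty and sorted ascending.
    static LocalDateTime ascendingKey(List<LocalDateTime> dates, LocalDateTime now) {
        for (LocalDateTime d : dates) {
            if (d.isAfter(now)) {
                return d;
            }
        }
        return dates.get(dates.size() - 1);
    }

    public static void main(String[] args) {
        List<LocalDateTime> doc1 = Arrays.asList(
                LocalDateTime.of(2008, 3, 10, 11, 0),
                LocalDateTime.of(2008, 3, 10, 15, 0),
                LocalDateTime.of(2008, 4, 10, 11, 0));
        LocalDateTime now = LocalDateTime.of(2008, 3, 10, 11, 30);
        System.out.println(ascendingKey(doc1, now));
    }
}
```

Because the key depends on the time of the search, it cannot simply be precomputed once at index time.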


It appears that adding sort functions would be done in Lucene, and not
in solr.  I'm not sure I want to go down that path, so I'm wondering
if there's a way to accomplish this with solr.  From recent
discussions, it sounds like I might be able to do this with some boost
magic.  Unfortunately, I haven't found any examples of boosting that
seem close to what I want to do.



thanks,
joshua

-- 
-
Be who you are and say what you feel,
because those who mind don't matter and
those who matter don't mind.
 -- Dr. Seuss


Re: Companies Using Solr

2008-03-12 Thread Ryan Grange
It's definitely not immutable.  A while back I added DollarDays 
International.  Just remember to be polite and add yourself to the end 
of the list.


Ryan Grange, IT Manager
DollarDays International, LLC
[EMAIL PROTECTED]
480-922-8155 x106



oleg_gnatovskiy wrote:


Clay Webster wrote:
  

Hey Folks,

Reminder: http://wiki.apache.org/solr/PublicServers lists the sites using
Solr.  The listing is a bit thin.  I know many people don't know about the
list or don't have the time to add themselves to the list.  I'd like to be
able to promote open sourcing more systems (like Solr) and this information
would help show it is helping a large community.

Feel free to reply directly to me and I can add you.

Thanks.

--cw

Clay Webster
Associate VP, Platform Infrastructure
CNET, Inc. (Nasdaq:CNET)





How would you add to that list anyway? It's immutable.
  


Re: SolrQuery.getStart(), SolrQuery.getRows(), always return null

2008-03-12 Thread Ryan McKinley
I just committed a change to SolrQuery so that getRows and getStart use 
"getInt()" rather than "getFieldInt()"


Thanks for pointing this out!


Thijs wrote:
I'm running into a problem where the calls to SolrQuery.getStart(), 
SolrQuery.getRows() always return null

I'm using trunk of 1.3
I think I also found the problem.

If I use SolrQuery.setRows(20), the value is set in the LinkedHashMap 
with the key-values {"rows", {"20"}} in method set() (line 66 in 
ModifiableSolrParams).
However, when I use SolrQuery.getRows(), the values are retrieved 
through SolrParams.getFieldInt("rows", null) --> 
SolrParams.getFieldParam. This method first calls method fpname("rows", 
null), which returns "f.rows.null", and that value is used as the key in 
ModifiableSolrParams to get the value from the LinkedHashMap. This of 
course returns nothing because that key is not in the hashmap.
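
The mismatch can be reproduced without Solr at all. The sketch below is a standalone illustration, not the actual SolrParams code: the FieldParamLookup class is hypothetical, and fpname only mirrors the key format quoted in this message.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Standalone illustration of the key mismatch; not the real SolrParams class.
class FieldParamLookup {

    // Mirrors the per-field key format quoted above: "f." + field + '.' + param.
    static String fpname(String field, String param) {
        return "f." + field + '.' + param;
    }

    // Stores the value the way setRows is described: under the plain key "rows".
    static Map<String, String[]> paramsWithRows(int rows) {
        Map<String, String[]> params = new LinkedHashMap<>();
        params.put("rows", new String[] { String.valueOf(rows) });
        return params;
    }

    public static void main(String[] args) {
        Map<String, String[]> params = paramsWithRows(20);
        System.out.println(fpname("rows", null));                     // prints f.rows.null
        System.out.println(params.get(fpname("rows", null)) == null); // prints true
        System.out.println(params.get("rows")[0]);                    // prints 20
    }
}
```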

So if I change method fpname in SolrParams to

protected String fpname(String field, String param) {
  // use the plain field name when no param is given
  return field + (param == null ? "" : param);
  // return "f." + field + '.' + param;
}

it works, and getStart and getRows return the values previously set.

I'm not sure this is the correct solution, could someone have a look and 
if ok, commit it to the codebase?


Thanks

Thijs







Re: return only sorted Field, but with a different Field Name

2008-03-12 Thread Chris Hostetter

: > There is nothing like this in Solr right now, and it doesn't seem like something
: > that should be done in Solr, as it would be a simple translation that could
: > be done via an XSLT or some client layer code.

: It may be more work than it is worth, but I would like to see something
: comparable to the SQL 'AS' syntax:
: 
:  &fl=162_sortable_s as SortedField,name,id
: 
: This would be good because it solves the problem for non XML based responses
: also.

But regardless of the format, If the client is the one saying "ugly_fieldname 
AS pretty_fieldname" then why not let the client do the translation?

I can see having server-side configuration to "hide" the real field names 
from clients ( http://wiki.apache.org/solr/FieldAliasesAndGlobsInParams ), but 
I just don't see the benefit in having the client ask the server to do 
something the client can do just as easily.
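
As a sketch of how small the client-side translation can be, assume the client has already read a returned document into a Map. The FieldRenamer class is hypothetical and uses no Solr API; the field names are the ones from this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical client-side helper: renames one field of a returned document.
class FieldRenamer {

    // Returns a copy of 'doc' with key 'from' renamed to 'to', keeping field order.
    static Map<String, Object> rename(Map<String, Object> doc, String from, String to) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            out.put(e.getKey().equals(from) ? to : e.getKey(), e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("162_sortable_s", "abc"); // sample values, invented for the demo
        doc.put("name", "example");
        doc.put("id", "1");
        System.out.println(rename(doc, "162_sortable_s", "SortedField").keySet());
    }
}
```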

If, however, some of the ideas from that FieldAliasesAndGlobsInParams 
wiki were implemented, then we probably would need some way to configure 
generic fieldname renaming in the response ... if you've got server 
configuration that maps pretty_fieldname to ugly_fieldname for the 
purposes of sorting you need to ensure that the users can see 
pretty_fieldname in the response (hmm, the sort field could probably be 
handled as a special case, just like score field).  Likewise if you have 
server config to translate "fl=pretty1,pretty2,name,nice*" to 
"fl=uglyA,uglyB,name,text_*" then you need to make sure to do the reverse 
translation when outputting the stored field values for uglyA, uglyB and the 
text_ fields (which could get complicated when dealing with regex based 
field mappings ... i guess we could just require the solr admin to 
configure both regexes if they want both the param translation and the 
response translation)


-Hoss



Empty fields - dynamic

2008-03-12 Thread Lance Norskog
Is there a way to specify that a dynamic field cannot have an empty string?
With static fields, you can enforce this with 'required="true"
default="-1"'.
 
Is there any way to enforce this in the shipped Solr 1.2?  One could
write a new custom analyzer that requires input. But is there anything
available out of the box?
 
Thanks,
 
Lance Norskog