Re: multicore shards and relevancy score

2009-09-15 Thread Shalin Shekhar Mangar
On Tue, Sep 15, 2009 at 2:39 AM, Paul Rosen wrote:

>
> I've done a few experiments with searching two cores with the same schema
> using the shard syntax. (using solr 1.3)
>
> My use case is that I want to have multiple cores because a few different
> people will be managing the indexing, and that will happen at different
> times. The data, however, is homogeneous.
>
>
Multiple cores were not built for distributed search. It is inefficient as
compared to a single index. But if you want to use them that way, that's
your choice.


> I've noticed in my tests that the results are not interwoven, but it might
> just be my test data. In other words, all the results from one core appear,
> then all the results from the other core.
>
> In thinking about it, it would make sense if the relevancy scores for each
> core were completely independent of each other. And that would mean that
> there is no way to compare the relevancy scores between the cores.
>
> In other words, I'd like the following results:
>
> - really relevant hit from core0
> - pretty relevant hit from core1
> - kind of relevant hit from core0
> - not so relevant hit from core1
>
> but I get:
>
> - really relevant hit from core0
> - kind of relevant hit from core0
> - pretty relevant hit from core1
> - not so relevant hit from core1
>
> So, are the results supposed to be interwoven, and I need to study my data
> more, or is this just not something that is possible?
>
>
The only difference wrt relevancy between a distributed search and a
single-node search is that there is no distributed IDF and therefore a
distributed search assumes a random distribution of terms among shards. I'm
not sure if that is what you are seeing.
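
For reference, a sharded request across two local cores looks something
like this (core names and port are only an example):

  http://localhost:8983/solr/core0/select?q=solr
      &shards=localhost:8983/solr/core0,localhost:8983/solr/core1

The shard responses are merged by score, so results normally do
interleave; only the IDF part of the score is computed per shard.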


> Also, if this is insurmountable, I've discovered two show stoppers that
> will prevent using multicore in my project (counting the lack of support for
> faceting in multicore). Are these issues addressed in solr 1.4?
>
>
Can you give more details on what these two issues are?

-- 
Regards,
Shalin Shekhar Mangar.


Re: Dataimport MySQLNonTransientConnectionException: No operations allowed after connection closed

2009-09-15 Thread Noble Paul നോബിള്‍ नोब्ळ्
First of all, let us confirm whether this issue is fixed in 1.4.

1.4 is stable, a lot of people are using it in production, and it is
going to be released pretty soon.

On Mon, Sep 14, 2009 at 8:05 PM, palexv  wrote:
>
> I am using 1.3.
> Do you suggest 1.4 from the developer trunk? I am concerned about
> whether it is stable. Is it safe to use in a big commerce app?
>
>
>
> Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
>>
>> Which version of Solr are you using? Can you try with a recent one and
>> confirm this?
>>
>> On Mon, Sep 14, 2009 at 7:45 PM, palexv  wrote:
>>>
>>> I know that my issue is related to
>>> http://www.nabble.com/dataimporthandler-and-multiple-delta-import-td19160129.html#a19160129
>>> and https://issues.apache.org/jira/browse/SOLR-728
>>> but my case is quite different.
>>> As I understand it, the patch at https://issues.apache.org/jira/browse/SOLR-728
>>> prevents concurrent execution of the import operation but does NOT put
>>> the command
>>> in a queue.
>>>
>>> I have only a few records to index. When I run a full reindex, it works
>>> very fast. But when I try to rerun it even after a couple of seconds, I
>>> get:
>>> Caused by:
>>> com.mysql.jdbc.exceptions.MySQLNonTransientConnectionException:
>>> No operations allowed after connection closed.
>>>
>>> At this time, when I check the status, it says the status is idle and
>>> everything was indexed successfully.
>>> A second reindex without the exception can only be run after 10 seconds.
>>> That does not work for me! If I apply the patch from
>>> https://issues.apache.org/jira/browse/SOLR-728 I will be unable to
>>> reindex in the next 10 seconds as well.
>>> Any suggestions?
>>> --
>>> View this message in context:
>>> http://www.nabble.com/Dataimport-MySQLNonTransientConnectionException%3A-No-operations-allowed-after-connection-closed-tp25436605p25436605.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>>
>>
>>
>>
>> --
>> -
>> Noble Paul | Principal Engineer| AOL | http://aol.com
>>
>>
>
> --
> View this message in context: 
> http://www.nabble.com/Dataimport-MySQLNonTransientConnectionException%3A-No-operations-allowed-after-connection-closed-tp25436605p25436948.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Is it possible to query for "everything" ?

2009-09-15 Thread Erik Hatcher
[* TO *] on the standard handler is an implicit query of
default_field_name:[* TO *], which matches only documents that have the
default field on them.  So [* TO *] and *:* are two very different
queries; only the latter is guaranteed to match all documents.
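
For example, against the standard handler (the "text" field name is
illustrative):

  q=*:*             matches every document in the index
  q=[* TO *]        matches only documents with a value in the default field
  q=text:[* TO *]   matches only documents with a value in "text"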


Erik


On Sep 14, 2009, at 9:39 PM, Bill Au wrote:


For the standard query handler, try [* TO *].
Bill

On Mon, Sep 14, 2009 at 8:46 PM, Jay Hill   
wrote:



With dismax you can use q.alt when the q param is missing:
q.alt=*:*
should work.

-Jay


On Mon, Sep 14, 2009 at 5:38 PM, Jonathan Vanasco 
wrote:


Thanks Jay & Matt

I tried *:* on my app, and it didn't work

I tried it on the solr admin, and it did

I checked the solr config file, and realized that it works on standard,
but not on dismax, queries

So I have my app checking *:* on a standard qt, and then filtering what I
need on other qts!

I would never have figured this out without you two!







Re: Single Core or Multiple Core?

2009-09-15 Thread Noble Paul നോബിള്‍ नोब्ळ्
A large majority of users use a single core ONLY. It is hard to explain
to them the need for an extra component in the URL.

I would say it is a design problem which we should solve instead of
asking users to change their applications.

On Tue, Sep 15, 2009 at 3:12 AM, Uri Boness  wrote:
> IMO forcing the users to do a configuration change in Solr or in their
> application is the same thing - it all boils down to configuration change
> (I'll be very surprised if someone is actually hardcoding the Solr URL in
> their system - most probably it is configurable, and if it's not, forcing
> them to change it is actually a good thing).
>>
>> Besides,
>> if there's only one core, why need a name?
>
> Consistency. Having a default core as Israel suggested can probably do the
> trick. At first it might seem that having a default core and not
> needing to specify the core name will make it easier for users to use. But I
> actually disagree - don't underestimate the power of being consistent. I'd
> rather have a manual telling me "this is how it works and it always works
> like that in all scenarios" than something like "this is how it works
> but if you have scenario A then it works differently and you have to do this
> instead".
>
> Shalin Shekhar Mangar wrote:
>>
>> On Mon, Sep 14, 2009 at 8:16 PM, Uri Boness  wrote:
>>
>>
>>>
>>> Is it really a problem? I mean, as i see it, solr to cores is what RDBMS
>>> is
>>> to databases. When you connect to a database you also need to specify the
>>> database name.
>>>
>>>
>>>
>>
>> The problem is compatibility. If we make solr.xml compulsory then we only
>> force people to do a configuration change. But if we make a core name
>> mandatory, then we force them to change their applications (or the
>> applications' configurations). It is better if we can avoid that. Besides,
>> if there's only one core, why need a name?
>>
>>
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Dealing with term vectors

2009-09-15 Thread Licinio Fernández Maurelo
Hi there,

I want to retrieve the term vectors from the index, not calculate them,
but just read them back instead.

Some questions about this topic:


   1. When I put the termVectors="true" option on a field... what's
   happening behind the scenes?
  1. Is Lucene storing the tv in the index?
  2. Is Lucene storing additional info to allow tv's calculation?
   2. Reading the Solr 1.4 Enterprise Search book (amazing book!) I found this: "
   In Solr 1.4, it is now possible to tell Lucene that a field should store
   these for efficient retrieval. Without them, the same information can be
   derived at runtime but that's slower" (p. 286) - Does this mean that older
   Solr versions don't come with this functionality?
   3. Can the tv component expose raw term vectors for fields not marked with
   termVectors="true"? (See the reference snippet below.)
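
For reference, enabling stored term vectors in schema.xml looks something
like this (the field name is just an example):

  <field name="content" type="text" indexed="true" stored="true"
         termVectors="true" termPositions="true" termOffsets="true"/>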


Thx

-- 
Lici


New to Solr : How to create solr index for rich documents especially .xls

2009-09-15 Thread busbus

Hi

I am a newbie to Solr. Right now I have the task of converting rich
documents into a Solr-readable index format so that I can use the index
for searching.

I learnt about Solr and got a rough idea of what has to be done.

Requirement 1: 

1)  I have to index rich document format files like .xls, .pdf, .doc, .ppt

Information that I know:

As far as I have searched on the Internet, I came to know that we can use
the Data Import Handler and Apache Tika for this (but how?). Should I
write code against the Data Import Handler?

So far I have downloaded a sample document from the net and tried running
that. The application runs on a Jetty web server, and when I query it I
get an xml file as output.

Problems faced:

Since I am very new to Java, I am not able to get a clear picture of what
has to be done, or what the Ant tool is used for.

Requirement 2:

I need to change the web server from Jetty to the JBoss application
server. What has to be done for this?



Solution tried:

I tried copying solr.war into the web app directory and running the
application. Since I am very new to Java, I might have made some basic
mistake. Please guide me.

Thanks in advance.


-- 
View this message in context: 
http://www.nabble.com/New-to-Solr-%3A-How-to-create-solr-index-for-rich-documents-especially-.xls-tp25451164p25451164.html
Sent from the Solr - User mailing list archive at Nabble.com.



Best strategy to commit often under load.

2009-09-15 Thread Jérôme Etévé
Hi all,

 I've got a solr server under significant load ( ~40/s ) and a single
process which can potentially commit as often as possible.
Typically, when it commits every 5 or 10s, my solr server slows down
quite a lot and this can lead to congestion problems on my client
side.

What would you recommend in this situation? Is it better to let solr
perform the commits automatically with reasonable autocommit
parameters?

What are solr's best practices concerning this point?

Thanks for your help!

Jerome.

-- 
Jerome Eteve.
http://www.eteve.net
jer...@eteve.net


Re: Dealing with term vectors

2009-09-15 Thread Grant Ingersoll


On Sep 15, 2009, at 5:31 AM, Licinio Fernández Maurelo wrote:


Hi there,

I want to retrieve the term vectors from the index, not calculate them,
but just read them back instead.



http://wiki.apache.org/solr/TermVectorComponent


Some questions about this topic:


   1. When I put the termVectors="true" option ... what's happening
behind the scenes?

 1. Is Lucene storing the tv in the index?


Yes.


 2. Is Lucene storing additional info to allow tv's calculation?
  2. Reading Solr 1.4 Enterprise Search book (amazing book!) found  
this: "
  In Solr 1.4, it is now possible to tell Lucene that a field should  
store
  these for efficient retrieval. Without them, the same information  
can be
  derived at runtime but that's slower" (p. 286) - Does this mean  
that older

  Solr versions don't come with this functionality?


I haven't gotten to that section yet, but I bet it's referring to
recreating them by analyzing the content.


  3. Can the tv component expose raw term vectors for fields not marked
with termVectors="true"?



Not yet.  You can use the FieldAnalysisRequestHandler (I think that's  
the name, it used to be called the DocumentAnalysisRequestHandler) to  
do that, but that would require two trips to the server.
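
As a reference, assuming a request handler with the TermVectorComponent
registered and a field with termVectors="true", a request looks something
like this (the field name is an example):

  http://localhost:8983/solr/select?q=id:doc1&tv=true&tv.fl=content
      &tv.tf=true&tv.df=true&tv.positions=true&tv.offsets=true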


-Grant

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Retrieving a field from all result docuemnts & couple of more queries

2009-09-15 Thread Shashikant Kore
Hi,

I am familiar with Lucene and trying out Solr.

I have an index which was created outside solr. The index is fairly
simple with two fields - document_id & content. The query result needs
to return all the document IDs. The result need not be ordered by the
score. For this, in Lucene, I use a custom hit collector with search to
get results quickly. The index has a few million documents, and queries
returning hundreds of thousands of documents are not uncommon. So the
speed is crucial here.

Since retrieving the document_id for each document is slow, I am using
the FieldCache to store the values of document_id. For all the results
collected (in a bitset) with the hit collector, the document_id field is
retrieved from the FieldCache.

1. How can I effectively disable scoring? I have read that
ConstantScoreQuery is quite fast, but from the code, I see that it is
used only for wildcard queries. How can I use ConstantScoreQuery for
all the queries (boolean, term, phrase, ..)?  Also, is
ConstantScoreQuery as fast as a custom hit collector?

2. How can Solr take advantage of the fieldcache while returning the
field document_id? The documentation says, fieldcache can be
explicitly auto warmed with Solr.  If fieldcache is available and
initialized at the beginning, will solr look into the cache to
retrieve the fields to be returned?

3. If there is an additional field for stemmed_content on which search
needs to use different analyzer, I suppose, that could be specified by
fieldType attribute in the schema.

Thank you,

--shashi


How to create a new index file automatically

2009-09-15 Thread busbus

Hi all,

I am a newbie to Solr.

I have downloaded and used the solr example and I have a basic question.

There are some xml documents present in
apache-solr-1.3.0\example\exampledocs.
These are the input files to the solr index, and I found that they are
posted by running this command:

java -jar post.jar *.xml

All these xml documents have the same basic structure.

Say for example

<add>
<doc>
   <field name="id">abc</field>
…
….
</doc>
</add>


I want to index some more files. In that case, should I have to create a
new xml file manually, or what should I do to create it automatically?

Please give me a solution. I am very new to Solr, so please make it as
simple as possible.

Thanks a lot...

-- 
View this message in context: 
http://www.nabble.com/How-to-create-a-new-index-file-automatically-tp25455045p25455045.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: multicore shards and relevancy score

2009-09-15 Thread Paul Rosen

Shalin Shekhar Mangar wrote:

On Tue, Sep 15, 2009 at 2:39 AM, Paul Rosen wrote:


I've done a few experiments with searching two cores with the same schema
using the shard syntax. (using solr 1.3)

My use case is that I want to have multiple cores because a few different
people will be managing the indexing, and that will happen at different
times. The data, however, is homogeneous.



Multiple cores were not built for distributed search. It is inefficient as
compared to a single index. But if you want to use them that way, that's
your choice.


Well, I'm experimenting with them because it will simplify index 
maintenance greatly. I am beginning to think that it won't work in my 
case, though.





I've noticed in my tests that the results are not interwoven, but it might
just be my test data. In other words, all the results from one core appear,
then all the results from the other core.

In thinking about it, it would make sense if the relevancy scores for each
core were completely independent of each other. And that would mean that
there is no way to compare the relevancy scores between the cores.

In other words, I'd like the following results:

- really relevant hit from core0
- pretty relevant hit from core1
- kind of relevant hit from core0
- not so relevant hit from core1

but I get:

- really relevant hit from core0
- kind of relevant hit from core0
- pretty relevant hit from core1
- not so relevant hit from core1

So, are the results supposed to be interwoven, and I need to study my data
more, or is this just not something that is possible?



The only difference wrt relevancy between a distributed search and a
single-node search is that there is no distributed IDF and therefore a
distributed search assumes a random distribution of terms among shards. I'm
not sure if that is what you are seeing.



Also, if this is insurmountable, I've discovered two show stoppers that
will prevent using multicore in my project (counting the lack of support for
faceting in multicore). Are these issues addressed in solr 1.4?



Can you give more details on what these two issues are?



The first issue is detailed above, where the results from a search over 
two shards don't appear to be returned in relevancy order.


The second issue was detailed in an email last week "shards and facet 
count". The facet information is lost when doing a search over two 
shards, so if I use multicore, I can no longer have facets.





RE: Dataimport MySQLNonTransientConnectionException: No operations allowed after connection closed

2009-09-15 Thread Fuad Efendi

Easy FIX: use autoReconnect=true for MySQL:

jdbc:mysql://localhost:3306/?useUnicode=true&characterEncoding=UTF-8&autoReconnect=true


Maybe it will help; the connection is auto-closed "after a couple of
seconds" (usually 10 seconds) by default for MySQL... connection pooling
won't help (their JDBC driver is already pool-based, and the server
closes connections after some delay).


-Fuad
(MySQL contributor)
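
For DIH, the flag goes on the dataSource URL in data-config.xml; a minimal
sketch (driver class, database name, and credentials are assumptions):

  <dataSource driver="com.mysql.jdbc.Driver"
      url="jdbc:mysql://localhost:3306/mydb?autoReconnect=true"
      user="solr" password="secret"/>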




> -Original Message-
> From: noble.p...@gmail.com [mailto:noble.p...@gmail.com] On Behalf Of Noble
> Paul നോബിള്‍ नोब्ळ्
> Sent: September-15-09 3:48 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Dataimport MySQLNonTransientConnectionException: No operations
> allowed after connection closed
> 
> First of all let us confirm this issue is fixed in 1.4.
> 
> 1.4 is stable and a lot of people are using it in production and it is
> going to be released pretty soon
> 
> On Mon, Sep 14, 2009 at 8:05 PM, palexv  wrote:
> >
> > I am using 1.3.
> > Do you suggest 1.4 from the developer trunk? I am concerned about
> > whether it is stable. Is it safe to use in a big commerce app?
> >
> >
> >
> > Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
> >>
> >> Which version of Solr are you using? Can you try with a recent one and
> >> confirm this?
> >>
> >> On Mon, Sep 14, 2009 at 7:45 PM, palexv  wrote:
> >>>
> >>> I know that my issue is related to
> >>> http://www.nabble.com/dataimporthandler-and-multiple-delta-import-
> td19160129.html#a19160129
> >>> and https://issues.apache.org/jira/browse/SOLR-728
> >>> but my case is quite different.
> >>> As I understand it, the patch at https://issues.apache.org/jira/browse/SOLR-728
> >>> prevents concurrent execution of the import operation but does NOT put
> >>> the command
> >>> in a queue.
> >>>
> >>> I have only a few records to index. When I run a full reindex, it works
> >>> very fast. But when I try to rerun it even after a couple of seconds, I
> >>> get:
> >>> Caused by:
> >>> com.mysql.jdbc.exceptions.MySQLNonTransientConnectionException:
> >>> No operations allowed after connection closed.
> >>>
> >>> At this time, when I check the status, it says the status is idle and
> >>> everything was indexed successfully.
> >>> A second reindex without the exception can only be run after 10 seconds.
> >>> That does not work for me! If I apply the patch from
> >>> https://issues.apache.org/jira/browse/SOLR-728 I will be unable to
> >>> reindex in the next 10 seconds as well.
> >>> Any suggestions?
> >>> --
> >>> View this message in context:
> >>> http://www.nabble.com/Dataimport-MySQLNonTransientConnectionException%3A-
> No-operations-allowed-after-connection-closed-tp25436605p25436605.html
> >>> Sent from the Solr - User mailing list archive at Nabble.com.
> >>>
> >>>
> >>
> >>
> >>
> >> --
> >> -
> >> Noble Paul | Principal Engineer| AOL | http://aol.com
> >>
> >>
> >
> > --
> > View this message in context: http://www.nabble.com/Dataimport-
> MySQLNonTransientConnectionException%3A-No-operations-allowed-after-
> connection-closed-tp25436605p25436948.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
> >
> 
> 
> 
> --
> -
> Noble Paul | Principal Engineer| AOL | http://aol.com




Re: Return one word - Auto Complete Request Handler

2009-09-15 Thread Grant Ingersoll


On Sep 14, 2009, at 2:06 PM, Mohamed Parvez wrote:


I am trying to configure a request handler that will be used in the
auto-complete query.

I am limiting the result to one field by using the "fl" parameter,
which can be used to specify the fields to return.

How can I make the field return only one word, not full sentences?




Is http://wiki.apache.org/solr/TermsComponent helpful?
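
A typical prefix request against it looks something like this, assuming
the field to complete on is called "name":

  http://localhost:8983/solr/terms?terms.fl=name&terms.prefix=ap&terms.limit=10

It returns matching indexed terms (single tokens) rather than whole stored
field values, which is usually what auto-complete wants.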


--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



do NOT want to stem plurals for a particular field, or words

2009-09-15 Thread DHast

I have a field where there are items that are plurals, and used as very
specific locators. So I do a solr search type:articles, and it translates
it into type:article, then into type:articl... Is there a way to stop it
from doing this on either the field "type" or on a list of words
("articles, notes, etc")?

I tried entering them into the protwords.txt file and don't seem to get
anywhere.
-- 
View this message in context: 
http://www.nabble.com/do-NOT-want-to-stem-plurals-for-a-particular-field%2C-or-words-tp25455570p25455570.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: do NOT want to stem plurals for a particular field, or words

2009-09-15 Thread Jérôme Etévé
Hi,

  You can enable/disable stemming per field type in the schema.xml by
removing the stemming filters from the type definition.

Basically, copy your preferred type, rename it to something like
'text_nostem', remove the stemming filter from the type, and use your
'text_nostem' type for your field 'type'. See the sketch below.

From what you say, I guess your field 'type' would be even happier
simply being of type 'string'.
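
A minimal sketch of such a type (the tokenizer and filter are assumptions,
modelled on the default 'text' type minus the stemmer):

  <fieldType name="text_nostem" class="solr.TextField"
             positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

  <field name="type" type="text_nostem" indexed="true" stored="true"/>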

Jerome.

2009/9/15 DHast :
>
> I have a field where there are items that are plurals, and used as very
> specific locators. So I do a solr search type:articles, and it translates
> it into type:article, then into type:articl... Is there a way to stop it
> from doing this on either the field "type" or on a list of words
> ("articles, notes, etc")?
>
> I tried entering them into the protwords.txt file and don't seem to get
> anywhere.
> --
> View this message in context: 
> http://www.nabble.com/do-NOT-want-to-stem-plurals-for-a-particular-field%2C-or-words-tp25455570p25455570.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
Jerome Eteve.
http://www.eteve.net
jer...@eteve.net


Expected Approximate Release Date Solr 1.4

2009-09-15 Thread Mohamed Parvez
It's 15th-November-2009. It's been a year since Solr 1.3 was released.

Everyone is eagerly expecting that around this time Solr 1.4 will be
released.

(Refer Book: Solr 1.4 Enterprise Search Server, By David Smiley & Eric
Pugh,  Page 11
"the latest official release. Solr 1.3 was released on September 15th, 2008.
Solr 1.4 is expected around the same time a year later")

Is there any expected approximate release date for Solr 1.4?


Thanks/Regards,
Parvez


Re: stopfilterFactory isn't removing field name

2009-09-15 Thread mike anderson
Could this be related to SOLR-1423?

On Mon, Sep 14, 2009 at 8:51 AM, Yonik Seeley wrote:

> Thanks, I'll see if I can reproduce...
>
> -Yonik
> http://www.lucidimagination.com
>
> On Mon, Sep 14, 2009 at 2:10 AM, mike anderson 
> wrote:
> > Yeah.. that was weird. removing the line "forever,for ever" from my
> synonyms
> > file fixed the problem. In fact, i was having the same problem for every
> > double word like that. I decided I didn't really need the synonym filter
> for
> > that field so I just took it out, but I'd really like to know what the
> > problem is.
> > -mike
> >
> > On Mon, Sep 14, 2009 at 1:10 AM, Yonik Seeley <
> yo...@lucidimagination.com>
> > wrote:
> >>
> >> That's pretty strange... perhaps something to do with your synonyms
> >> file mapping "for" to a zero length token?
> >>
> >> -Yonik
> >> http://www.lucidimagination.com
> >>
> >> On Mon, Sep 14, 2009 at 12:13 AM, mike anderson  >
> >> wrote:
> >> > I'm kind of stumped by this one.. is it something obvious?
> >> > I'm running the latest trunk. In some cases the stopFilterFactory
> isn't
> >> > removing the field name.
> >> >
> >> > Thanks in advance,
> >> >
> >> > -mike
> >> >
> >> > From debugQuery (both words are in the stopwords file):
> >> >
> >> > http://localhost:8983/solr/select?q=citations:for&debugQuery=true
> >> >
> >> > <str name="rawquerystring">citations:for</str>
> >> > <str name="querystring">citations:for</str>
> >> > <str name="parsedquery">citations:</str>
> >> > <str name="parsedquery_toString">citations:</str>
> >> >
> >> >
> >> > http://localhost:8983/solr/select?q=citations:the&debugQuery=true
> >> >
> >> > <str name="rawquerystring">citations:the</str>
> >> > <str name="querystring">citations:the</str>
> >> > <str name="parsedquery"></str>
> >> > <str name="parsedquery_toString"></str>
> >> >
> >> >
> >> >
> >> >
> >> > schema analyzer for this field:
> >> >
> >> > <fieldType name="text_citation" class="solr.TextField"
> >> > positionIncrementGap="100">
> >> >   <analyzer type="index">
> >> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >> >     <filter class="solr.SynonymFilterFactory"
> >> > synonyms="substitutions.txt" ignoreCase="true" expand="false"/>
> >> >     <filter class="solr.StopFilterFactory" ignoreCase="true"
> >> > words="citationstopwords.txt"/>
> >> >   </analyzer>
> >> >   <analyzer type="query">
> >> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >> >     <filter class="solr.SynonymFilterFactory"
> >> > synonyms="substitutions.txt" ignoreCase="true" expand="false"/>
> >> >     <filter class="solr.StopFilterFactory" ignoreCase="true"
> >> > words="citationstopwords.txt"/>
> >> >   </analyzer>
> >> > </fieldType>
> >> >
> >
> >
>


Re: multicore shards and relevancy score

2009-09-15 Thread Jason Rutherglen
You can query multiple cores using MultiEmbeddedSearchHandler in
SOLR-1431.  Then the facet counts will be merged just like the current
distributed requests.

On Tue, Sep 15, 2009 at 7:41 AM, Paul Rosen  wrote:
> Shalin Shekhar Mangar wrote:
>>
>> On Tue, Sep 15, 2009 at 2:39 AM, Paul Rosen
>> wrote:
>>
>>> I've done a few experiments with searching two cores with the same schema
>>> using the shard syntax. (using solr 1.3)
>>>
>>> My use case is that I want to have multiple cores because a few different
>>> people will be managing the indexing, and that will happen at different
>>> times. The data, however, is homogeneous.
>>>
>>>
>> Multiple cores were not built for distributed search. It is inefficient as
>> compared to a single index. But if you want to use them that way, that's
>> your choice.
>
> Well, I'm experimenting with them because it will simplify index maintenance
> greatly. I am beginning to think that it won't work in my case, though.
>
>>
>>> I've noticed in my tests that the results are not interwoven, but it
>>> might
>>> just be my test data. In other words, all the results from one core
>>> appear,
>>> then all the results from the other core.
>>>
>>> In thinking about it, it would make sense if the relevancy scores for
>>> each
>>> core were completely independent of each other. And that would mean that
>>> there is no way to compare the relevancy scores between the cores.
>>>
>>> In other words, I'd like the following results:
>>>
>>> - really relevant hit from core0
>>> - pretty relevant hit from core1
>>> - kind of relevant hit from core0
>>> - not so relevant hit from core1
>>>
>>> but I get:
>>>
>>> - really relevant hit from core0
>>> - kind of relevant hit from core0
>>> - pretty relevant hit from core1
>>> - not so relevant hit from core1
>>>
>>> So, are the results supposed to be interwoven, and I need to study my
>>> data
>>> more, or is this just not something that is possible?
>>>
>>>
>> The only difference wrt relevancy between a distributed search and a
>> single-node search is that there is no distributed IDF and therefore a
>> distributed search assumes a random distribution of terms among shards.
>> I'm
>> not sure if that is what you are seeing.
>>
>>
>>> Also, if this is insurmountable, I've discovered two show stoppers that
>>> will prevent using multicore in my project (counting the lack of support
>>> for
>>> faceting in multicore). Are these issues addressed in solr 1.4?
>>>
>>>
>> Can you give more details on what these two issues are?
>>
>
> The first issue is detailed above, where the results from a search over two
> shards don't appear to be returned in relevancy order.
>
> The second issue was detailed in an email last week "shards and facet
> count". The facet information is lost when doing a search over two shards,
> so if I use multicore, I can no longer have facets.
>
>
>


Re: Best strategy to commit often under load.

2009-09-15 Thread Jason Rutherglen
Hi Jerome,

5 seconds is too little using Solr 1.3 or 1.4 because of caching
and segment warming. If you turn off caching and segment
warming, then you may be able to do 5s latency using either a
RAMDirectory or an SSD. In the future these issues will be fixed
and less than 1s will be possible.
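
If you let Solr drive the commits instead, the knob is the autoCommit
block inside <updateHandler> in solrconfig.xml; the thresholds below are
just example values to tune:

  <autoCommit>
    <maxDocs>10000</maxDocs> <!-- commit after this many pending docs -->
    <maxTime>60000</maxTime> <!-- or after this many ms, whichever first -->
  </autoCommit>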

-J

On Tue, Sep 15, 2009 at 3:07 AM, Jérôme Etévé  wrote:
> Hi all,
>
>  I've got a solr server under significant load ( ~40/s ) and a single
> process which can potentially commit as often as possible.
> Typically, when it commits every 5 or 10s, my solr server slows down
> quite a lot and this can lead to congestion problems on my client
> side.
>
> What would you recommend in this situation, is it better to leave solr
> performs the commits automatically with reasonable autocommit
> parameters?
>
> What are solr's best practices concerning this point?
>
> Thanks for your help!
>
> Jerome.
>
> --
> Jerome Eteve.
> http://www.eteve.net
> jer...@eteve.net
>


Re: stopfilterFactory isn't removing field name

2009-09-15 Thread Yonik Seeley
On Tue, Sep 15, 2009 at 1:14 PM, mike anderson  wrote:
> Could this be related to SOLR-1423?

Nope, and I haven't been able to reproduce the bug you saw either.

-Yonik

> On Mon, Sep 14, 2009 at 8:51 AM, Yonik Seeley 
> wrote:
>
>> Thanks, I'll see if I can reproduce...
>>
>> -Yonik
>> http://www.lucidimagination.com
>>
>> On Mon, Sep 14, 2009 at 2:10 AM, mike anderson 
>> wrote:
>> > Yeah.. that was weird. removing the line "forever,for ever" from my
>> synonyms
>> > file fixed the problem. In fact, i was having the same problem for every
>> > double word like that. I decided I didn't really need the synonym filter
>> for
>> > that field so I just took it out, but I'd really like to know what the
>> > problem is.
>> > -mike
>> >
>> > On Mon, Sep 14, 2009 at 1:10 AM, Yonik Seeley <
>> yo...@lucidimagination.com>
>> > wrote:
>> >>
>> >> That's pretty strange... perhaps something to do with your synonyms
>> >> file mapping "for" to a zero length token?
>> >>
>> >> -Yonik
>> >> http://www.lucidimagination.com
>> >>
>> >> On Mon, Sep 14, 2009 at 12:13 AM, mike anderson > >
>> >> wrote:
>> >> > I'm kind of stumped by this one.. is it something obvious?
>> >> > I'm running the latest trunk. In some cases the stopFilterFactory
>> isn't
>> >> > removing the field name.
>> >> >
>> >> > Thanks in advance,
>> >> >
>> >> > -mike
>> >> >
>> >> > From debugQuery (both words are in the stopwords file):
>> >> >
>> >> > http://localhost:8983/solr/select?q=citations:for&debugQuery=true
>> >> >
>> >> > <str name="rawquerystring">citations:for</str>
>> >> > <str name="querystring">citations:for</str>
>> >> > <str name="parsedquery">citations:</str>
>> >> > <str name="parsedquery_toString">citations:</str>
>> >> >
>> >> >
>> >> > http://localhost:8983/solr/select?q=citations:the&debugQuery=true
>> >> >
>> >> > <str name="rawquerystring">citations:the</str>
>> >> > <str name="querystring">citations:the</str>
>> >> > <str name="parsedquery"></str>
>> >> > <str name="parsedquery_toString"></str>
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > schema analyzer for this field:
>> >> >
>> >> > <fieldType name="text_citation" class="solr.TextField"
>> >> > positionIncrementGap="100">
>> >> >   <analyzer type="index">
>> >> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>> >> >     <filter class="solr.SynonymFilterFactory"
>> >> > synonyms="substitutions.txt" ignoreCase="true" expand="false"/>
>> >> >     <filter class="solr.StopFilterFactory" ignoreCase="true"
>> >> > words="citationstopwords.txt"/>
>> >> >   </analyzer>
>> >> >   <analyzer type="query">
>> >> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>> >> >     <filter class="solr.SynonymFilterFactory"
>> >> > synonyms="substitutions.txt" ignoreCase="true" expand="false"/>
>> >> >     <filter class="solr.StopFilterFactory" ignoreCase="true"
>> >> > words="citationstopwords.txt"/>
>> >> >   </analyzer>
>> >> > </fieldType>
>> >> >
>> >
>> >
>>
>


Re: CSV Update - Need help mapping csv field to schema's ID

2009-09-15 Thread Insight 49, LLC

Bump. Can anyone help guide me in the right direction?

Want to map each sku field to the schema unique id field using update/csv.

Thanks. Dan.


Insight 49, LLC wrote:
Using http://localhost:8983/solr/update/csv?stream.file, is there any 
way to map one of the csv fields to one's schema unique id?


e.g. A file with 3 fields (sku, product,price):
http://localhost:8983/solr/update/csv?stream.file=products.csv&stream.contentType=text/plain;charset=utf-8&header=true&separator=%2c&encapsulator=%22&escape=%5c&fieldnames=sku,product,price 



I would like to add an additional name:value pair for every line, 
mapping the sku field to my schema's id field:


.map={sku.field}:{id}

I would prefer NOT to change the schema by adding a <copyField source="sku" dest="id"/>.


I read: http://wiki.apache.org/solr/UpdateCSV, but can't quite get it.

Thanks!

Dan



Re: CSV Update - Need help mapping csv field to schema's ID

2009-09-15 Thread Mark A. Matienzo
On Tue, Sep 15, 2009 at 2:23 PM, Insight 49, LLC  wrote:
> Want to map each sku field to the schema unique id field using update/csv.

You can set the sku field to be the uniqueKey field in the schema. See
http://wiki.apache.org/solr/SchemaXml#head-bec9b4f189d7f493c42f99b479ed0a8d0dd3d76e
for more info.
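
In schema.xml that is just (assuming sku is a suitable string field):

  <uniqueKey>sku</uniqueKey>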

Mark Matienzo
Applications Developer, Digital Experience Group
The New York Public Library


Re: "standard" requestHandler components

2009-09-15 Thread Chris Hostetter

: I just copied this information to the wiki at
: http://wiki.apache.org/solr/SolrRequestHandler

FYI: All of this info is specific to SearchComponents, which are specific 
to SearchHandler -- so that page is a misleading place to put this info 
(plenty of other request handlers don't support components at all)

I've updated the wiki accordingly (most of this info was already on the 
SearchComponent wiki page)



-Hoss



Re: How to create a new index file automatically

2009-09-15 Thread Chris Harris
There are a few different ways to get data into Solr. XML is one way,
and probably the most common. As far as Solr is concerned it doesn't
matter whether you construct XML input by hand or write some kind of
code to do it. Solr won't automatically create any files like the
example .xml files for you, though, nor would it make all that much
sense for it to do so.

For testing it's fine to use the post.jar script like you're doing,
but most people are probably not going to do this in production;
rather they'll submit the XML to Solr with an HTTP POST from some
indexing process. The format for the XML files is described at

http://wiki.apache.org/solr/UpdateXmlMessages

If you're doing an HTTP POST, the URL to post to will be something like

http://<host>:<port>/solr/update

Solr can also accept input in CSV format. Or it can import data from
your SQL database using http://wiki.apache.org/solr/DataImportHandler
It can import documents in certain other formats using the
http://wiki.apache.org/solr/ExtractingRequestHandler

Note: I'm not sure if you understand, from your message, that you're
going to have to create a schema for your data at some point. The
"example" directory contains an example schema, but it probably won't
be suitable for your application. See
http://wiki.apache.org/solr/SchemaXml
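
For reference, a minimal update message in that XML format looks something
like this (the field names must match your schema):

  <add>
    <doc>
      <field name="id">doc1</field>
      <field name="title">My first document</field>
    </doc>
  </add>

followed by a separate <commit/> message to make the documents visible
to searches.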

2009/9/15 busbus :
>
> Hi all,
>
> I am a newbie to Solr.
>
> I have downloaded and used the solr  example and I have a basic doubt.
>
> There are some xml documents present in
> apache-solr-1.3.0\example\exampledocs.
> These are the input files to the solr index, and I found that they are
> posted by running this command:
>
> java -jar post.jar *.xml
>
> All these xml documents have the same basic structure.
>
> Say for example
>
> <add>
> <doc>
>   <field name="id">abc</field>
>    …
>    ….
>
> </doc>
> </add>
>
> I want to index some more files. In that case, should I have to create a
> new xml file manually, or what should I do to create it automatically?
>
> Please give me a solution. I am very new to Solr and so please make it as
> simple as possible.
>
> Thanks a lot...
>
> --
> View this message in context: 
> http://www.nabble.com/How-to-create-a-new-index-file-automatically-tp25455045p25455045.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Solr exception with missing required field (meta_guid_s)

2009-09-15 Thread kedardes

Hi, I have a data-config file where I map the fields of a very simple table
using dynamic field definitions:

<document>
  <entity name="..." query="...">
    <field column="id" name="id_i"/>
    <field column="name" name="name_s"/>
    <field column="city" name="city_s"/>
  </entity>
</document>

but when I run the dataimport I get this error:
WARNING: Error creating document : SolrInputDocumnt[{id_i=id_i(1.0)={2},
name_s=name_s(1.0)={John Smith}, city_s=city_s(1.0)={Newark}}]
org.apache.solr.common.SolrException: Document [null] missing required
field: meta_guid_s

From the schema.xml I see that the meta_guid_s field is defined as a "Global
unique ID" but does this have to be set explicitly or mapped to a particular
field?

thanks.
-- 
View this message in context: 
http://www.nabble.com/Solr-exception-with-missing-required-field-%28meta_guid_s%29-tp25460529p25460529.html
Sent from the Solr - User mailing list archive at Nabble.com.



faceted query not working as i expected

2009-09-15 Thread Jonathan Vanasco
I'm trying to request documents that have "facet.venue_type" as
"Private Collection".

Instead I'm also getting items where another field is marked
"Permanent Collection".


My schema has:

  <field name="venue_type" type="text" indexed="true" stored="true"
required="false" />
  <field name="facet.venue_type" type="string" indexed="true"
stored="true" required="false" />

  <copyField source="venue_type" dest="facet.venue_type" />


My query is

q=*:*
qt=standard
facet=true
facet.missing=true
facet.field=facet.venue_type
fq=venue_type:Private+Collection

Can anyone offer a suggestion as to what I'm doing wrong ?


Re: Single Core or Multiple Core?

2009-09-15 Thread Chris Hostetter

: A large majority of users use single core ONLY. It is hard to explain
: them the need for an extra componentin the url.

A majority use only a single core because that's all they know because 
it's what the default example and the tutorial use.  Even when people 
have no have use for running multiple cores with differnet 
schemas *concurrently* the value of swapping out cores on config upgrade 
is certainly worth the inconvinince of needing to add "/corename" to the 
urls they connect from in their clients.

: I would say it is a design problem which we should solve instead of
: asking users to change

the pros/cons of default core names were discussed at great length when 
multicore support was first added.  Because of core swapping and path 
based requestHandler naming the confusion introduced by trying to have a 
default core winds up being *vastly* worse then the confusion of trying to 
explain why they should use "/solr/core/select" instead of "/solr/select"


-Hoss



Re: faceted query not working as i expected

2009-09-15 Thread AHMET ARSLAN


--- On Tue, 9/15/09, Jonathan Vanasco  wrote:

> From: Jonathan Vanasco 
> Subject: faceted query not working as i expected
> To: solr-user@lucene.apache.org
> Date: Tuesday, September 15, 2009, 10:54 PM
> I'm trying to request documents that
> have "facet.venue_type" as "Private Collection"
> 
> Instead I'm also getting items where another field is
> marked "Permanent Collection"
> 
> My schema has:
>
>   <field name="venue_type" type="text" indexed="true"
> stored="true" required="false" />
>   <field name="facet.venue_type" type="string" indexed="true"
> stored="true" required="false" />
>
>   <copyField source="venue_type" dest="facet.venue_type" />
>
> My query is
> 
>     q=*:*
>     qt=standard
>     facet=true
>     facet.missing=true
>     facet.field=facet.venue_type
>     fq=venue_type:Private+Collection
> 
> Can anyone offer a suggestion as to what I'm doing wrong ?
> 

The filter query fq=venue_type:Private+Collection has a part that runs on
the default field. It is parsed to venue_type:Private defaultField:Collection.
You can use
fq=venue_type:"Private+Collection"
or
fq=venue_type:(Private AND Collection)
instead.

These will/may bring back documents having "something Private Collection" in
the venue_type field, since it is a tokenized field.

If you want to retrieve documents that have "facet.venue_type" as "Private
Collection", you can use fq=facet.venue_type:"Private Collection", which
operates on a string (non-tokenized) field.

Hope this helps.





documentation deficiency : case sensitivity of boolean operators

2009-09-15 Thread Jonathan Vanasco

I couldn't find this anywhere in solr's docs / faq.

I finally found a reference in the lucene docs:
http://lucene.apache.org/java/2_4_0/queryparsersyntax.html

This should really be added somewhere.  I'm not sure where, but I
thought this was worth bringing up to the list -- as it really
confused the hell out of me :)


Re: documentation deficiency : case sensitivity of boolean operators

2009-09-15 Thread Chris Hostetter

: Subject: documentation deficiency : case sensitivity of boolean operators
: 
: I couldn't find this anywhere on solr's docs / faq

if you have suggestions on places to add it, feel free to update the wiki.

(most of the documentation is deliberately agnostic to the specifics of the
query parser syntax, instead relying on links to point you to the same
reference URL you found ... so I can't actually think of anywhere in the
Solr docs that mentions the AND/OR/NOT syntax where it would make sense to
clarify this)

-Hoss



Re: documentation deficiency : case sensitivity of boolean operators

2009-09-15 Thread Yonik Seeley
That's already linked from
http://wiki.apache.org/solr/SolrQuerySyntax

-Yonik
http://www.lucidimagination.com


On Tue, Sep 15, 2009 at 5:38 PM, Jonathan Vanasco  wrote:
> I couldn't find this anywhere on solr's docs / faq
>
> i finally found a reference on lucene
>        http://lucene.apache.org/java/2_4_0/queryparsersyntax.html
>
> this should really be added somewhere.  i'm not sure where, but I thought
> this was worth bringing up to the list -- as it really confused the hell out
> of me :)
>


Re: Automatically calculate boost factor

2009-09-15 Thread Chris Hostetter

: http://wiki.apache.org/solr/FunctionQuery.  Either that or roll it up into the
: document boost, but that loses some precision.

but if that's what you want to do then yes: solr can compute the
document boost on submission based on the field values ... *IF* you
write an UpdateProcessor to do that.
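
A rough sketch of such a processor against the 1.4 APIs (the
boost_a/boost_b/boost_c field names are hypothetical):

  import java.io.IOException;
  import org.apache.solr.common.SolrInputDocument;
  import org.apache.solr.request.SolrQueryRequest;
  import org.apache.solr.response.SolrQueryResponse;
  import org.apache.solr.update.AddUpdateCommand;
  import org.apache.solr.update.processor.UpdateRequestProcessor;
  import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

  public class BoostFromFieldsFactory extends UpdateRequestProcessorFactory {
    public UpdateRequestProcessor getInstance(SolrQueryRequest req,
        SolrQueryResponse rsp, UpdateRequestProcessor next) {
      return new UpdateRequestProcessor(next) {
        public void processAdd(AddUpdateCommand cmd) throws IOException {
          SolrInputDocument doc = cmd.solrDoc;
          float boost = 1.0f;
          // multiply together the values of the boost-carrying fields
          for (String f : new String[] {"boost_a", "boost_b", "boost_c"}) {
            Object v = doc.getFieldValue(f);
            if (v != null) boost *= Float.parseFloat(v.toString());
          }
          doc.setDocumentBoost(boost);
          super.processAdd(cmd); // hand the doc down the chain
        }
      };
    }
  }

It would then be registered in an updateRequestProcessorChain in
solrconfig.xml and referenced from the update handler.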

: > 1.2
: > 1.5
: > 0.8
: > 
: > Document boost = 1.2*1.5*0.8
: > 
: > Is it possible to get SOLr to calculate the boost automatically upon
: > submission based on field values?

-Hoss



Re: Expected Approximate Release Date Solr 1.4

2009-09-15 Thread Chris Hostetter

: Its 15th-November-2009. Its been a year since Solr 1.3 was released.

It's September, actually.

: Is there any expected approximate release date for Solr 1.4

there is no specific date, but the timeframe and what the release is 
dependent on have been discussed in several threads...

http://wiki.apache.org/solr/Solr1.4


-Hoss



Re: CSV Update - Need help mapping csv field to schema's ID

2009-09-15 Thread Chris Hostetter

: I would like to add an additional name:value pair for every line, mapping the
: sku field to my schema's id field:
: 
: .map={sku.field}:{id}

the map param is for replacing a *value* with a different value ... it's 
useful for things like numeric codes in CSV files that you want to replace 
with strings in your index.

: I would prefer NOT to change the schema by adding a <copyField source="sku" dest="id"/>.

that's the only solution i can think of unless you want to write an 
UpdateProcessor.


-Hoss



Multiple parsedquery in the result set when debugQuery=true

2009-09-15 Thread Jason Rutherglen
Are there supposed to be multiple parsedquery entries for a
distributed query when debugQuery=true?


Re: Retrieving a field from all result docuemnts & couple of more queries

2009-09-15 Thread abhay kumar
Hi,

1) Solr has various types of caches. We can specify how many documents a
cache can hold at a time.
   e.g. if windowSize=50,
   50 results will be cached in the queryResult cache.
If the user makes a new request to the server for results beyond those 50
documents, a new request will be sent to the server & the server will
retrieve the next 50 results into the cache.
   http://wiki.apache.org/solr/SolrCaching
   Yes, solr looks into the cache to retrieve the fields to be returned.

2) Yes, we can have different tokenizers or filters for index & search. We
need not create a different fieldtype. We need to configure the index &
query analyzer sections of the same fieldtype (datatype) differently.

   e.g.

   <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
     <analyzer type="index">
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <filter class="solr.StopFilterFactory" ignoreCase="true"
               words="stopwords.txt"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.EnglishPorterFilterFactory"
               protected="protwords.txt"/>
     </analyzer>
     <analyzer type="query">
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <filter class="solr.LowerCaseFilterFactory"/>
     </analyzer>
   </fieldType>




Regards,
Abhay

On Tue, Sep 15, 2009 at 6:41 PM, Shashikant Kore wrote:

> Hi,
>
> I am familiar with Lucene and trying out Solr.
>
> I have an index which was created outside solr. The index is fairly
> simple with two fields - document_id & content. The query result needs
> to return all the document IDs. The result need not be ordered by the
> score. For this, in Lucene, I use a custom hit collector with search to
> get results quickly. The index has a few million documents, and queries
> returning hundreds of thousands of documents are not uncommon. So the
> speed is crucial here.
>
> Since retrieving the document_id for each document is slow, I am using
> the FieldCache to store the values of document_id. For all the results
> collected (in a bitset) with the hit collector, the document_id field is
> retrieved from the FieldCache.
>
> 1. How can I effectively disable scoring? I have read that
> ConstantScoreQuery is quite fast, but from the code, I see that it is
> used only for wildcard queries. How can I use ConstantScoreQuery for
> all the queries (boolean, term, phrase, ..)?  Also, is
> ConstantScoreQuery as fast as a custom hit collector?
>
> 2. How can Solr take advantage of the fieldcache while returning the
> field document_id? The documentation says, fieldcache can be
> explicitly auto warmed with Solr.  If fieldcache is available and
> initialized at the beginning, will solr look into the cache to
> retrieve the fields to be returned?
>
> 3. If there is an additional field for stemmed_content on which search
> needs to use different analyzer, I suppose, that could be specified by
> fieldType attribute in the schema.
>
> Thank you,
>
> --shashi
>


Re: How to create a new index file automatically

2009-09-15 Thread busbus



> It can import documents in certain other formats using the 
> http://wiki.apache.org/solr/ExtractingRequestHandler
> 

1) According to my inference, Solr uses Apache Tika to convert other rich
document formats to text, so that the ExtractingRequestHandler class can
use the extracted text to build the index.

2) If point 1 is correct, then I think this could suit my requirements,
since I need to index rich document files, especially the .xls format.
But I can't find the ExtractingRequestHandler class, which has to be
configured in the solrconfig.xml file, so that I can import XLS documents
through the servlet

http://localhost:8983/solr/update/extract?...
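
For reference, the handler ships in the Solr Cell contrib; wiring it up in
solrconfig.xml looks something like this (the defaults shown are
assumptions based on the 1.4 example config):

  <requestHandler name="/update/extract"
      class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
    <lst name="defaults">
      <str name="fmap.content">text</str>
      <str name="uprefix">attr_</str>
    </lst>
  </requestHandler>

A spreadsheet could then be posted with something like:

  curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" \
       -F "myfile=@spreadsheet.xls"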
-- 
View this message in context: 
http://www.nabble.com/How-to-create-a-new-index-file-automatically-tp25455045p25466714.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr exception with missing required field (meta_guid_s)

2009-09-15 Thread Shalin Shekhar Mangar
On Wed, Sep 16, 2009 at 1:13 AM, kedardes  wrote:

>
> Hi, I have a data-config file where I map the fields of a very simple table
> using dynamic field definitions :
>
>
>
>
>
>
>
>
>
> but when I run the dataimport I get this error:
> WARNING: Error creating document : SolrInputDocumnt[{id_i=id_i(1.0)={2},
> name_s=name_s(1.0)={John Smith}, city_s=city_s(1.0)={Newark}}]
> org.apache.solr.common.SolrException: Document [null] missing required
> field: meta_guid_s
>
> From the schema.xml I see that the meta_guid_s field is defined as a
> "Global
> unique ID" but does this have to be set explicitly or mapped to a
> particular
> field?
>

You created that schema, so you are the better person to answer that
question. As far as a required field or uniqueKey is concerned, its value
has to be set explicitly or copied from another field.
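
If the id_i value is what should serve as the GUID, one low-effort option
(an assumption about your intent) is to copy it in schema.xml:

  <copyField source="id_i" dest="meta_guid_s"/>

or else populate meta_guid_s directly from your DIH entity.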

-- 
Regards,
Shalin Shekhar Mangar.


Re: Questions on copyField

2009-09-15 Thread Rahul R
Would appreciate any help on this. Thanks

Rahul
On Mon, Sep 14, 2009 at 5:12 PM, Rahul R  wrote:

> Hello,
> I have a few questions regarding the copyField directive in schema.xml
>
> 1. Does the destination field store a reference or the actual data ?
> If I have something like this:
> <copyField source="name" dest="text"/>
> then will the values in the 'name' field get copied into the 'text' field
> or will the 'text' field only store a reference to the 'name' field ? To put
> it more simply, if I later delete the 'name' field from the index will I
> lose the corresponding data in the 'text' field ?
>
> 2. Is there any inbuilt API which I can use to do the copyField action
> programmatically ?
>
> 3. Can I do a copyfield from the schema as well as programmatically for the
> same destination field
> Suppose I want the 'text' field to contain values for name, age and
> location. In my index only 'name' and 'age' are defined as fields. So I can
> add directives like
> <copyField source="name" dest="text"/>
> <copyField source="age" dest="text"/>
> The location however, I want to add it to the 'text' field
> programmatically. I don't want to store the location as a separate field in
> the index. Can I do this ?
>
> Thank you.
>
> Regards
> Rahul
>