Re: DataImportHandler: Deleting from index and db; lastIndexed id feature

2008-12-03 Thread Marc Sturlese

That's what I am trying to do. Thanks for the advice. Once I have it done I
will raise the issue and upload the patch.
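
For anyone following the thread, a rough sketch of how the proposed
persistence API might be used from a custom DIH Transformer. Note that
Context#persist and Context.getPersistValue are only proposals (quoted
below) and are not part of the released Solr 1.3 API; the column name is
illustrative:

    import java.util.Map;

    import org.apache.solr.handler.dataimport.Context;
    import org.apache.solr.handler.dataimport.Transformer;

    public class LastIdTransformer extends Transformer {
      @Override
      public Object transformRow(Map<String, Object> row, Context context) {
        Object id = row.get("id"); // illustrative primary-key column
        if (id != null) {
          // proposed API: would end up in dataimport.properties,
          // retrievable later via the proposed ${dataimport.persist...} syntax
          context.persist("lastIndexedId", id.toString());
        }
        return row;
      }
    }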
 

Noble Paul നോബിള്‍ नोब्ळ् wrote:
> 
> OK, I guess I see it. I am thinking of exposing the writes to the
> properties file via an API.
> 
> say Context#persist(key,value);
> 
> 
> This can write the data to the dataimport.properties.
> 
> You must be able to retrieve that value by ${dataimport.persist.}
> 
> or through an API, Context.getPersistValue(key)
> 
> You can raise an issue and give a patch and we can get it committed
> 
> I guess this is what you wish to achieve
> 
> --Noble
> 
> 
> 
> On Wed, Dec 3, 2008 at 3:28 AM, Marc Sturlese <[EMAIL PROTECTED]>
> wrote:
>>
>> Do you mean the file used by the DataImportHandler called
>> dataimport.properties?
>> If you mean that one, it's written at the end of the indexing process. The
>> written date will be used by the delta-query in the next indexing run to
>> identify the new or modified rows in the database.
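
(For context, the stock timestamp-based delta setup being described looks
roughly like this in data-config.xml; the entity, table, and column names
are illustrative, while ${dataimporter.last_index_time} is the variable DIH
fills in from dataimport.properties:)

    <entity name="item" pk="id"
            query="select * from item"
            deltaQuery="select id from item
                        where last_modified > '${dataimporter.last_index_time}'">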
>>
>> What I am trying to do is save the last indexed id instead of a timestamp.
>> That way, the next execution will start indexing from the last doc that
>> was indexed in the previous run. But I am still a bit confused about how
>> to do that...
>>
>> Noble Paul നോബിള്‍ नोब्ळ् wrote:
>>>
>>> delta-import file?
>>>
>>>
>>> On Wed, Dec 3, 2008 at 12:08 AM, Lance Norskog <[EMAIL PROTECTED]>
>>> wrote:
 Does the DIH delta feature rewrite the delta-import file for each set
 of
 rows? If it does not, that sounds like a bug/enhancement.
 Lance

-----Original Message-----
 From: Noble Paul നോബിള്‍ नोब्ळ् [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, December 02, 2008 8:51 AM
 To: solr-user@lucene.apache.org
 Subject: Re: DataImportHandler: Deleting from index and db;
 lastIndexed
 id feature

 You can write the details to a file using a Transformer itself.

 It is wise to stick to the public API as far as possible. We will
 maintain back compat and your code will be usable w/ newer versions.
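
(A minimal sketch of that idea — the details written out as a plain
properties file from transformRow; the file name and column name are
illustrative:)

    import java.io.FileWriter;
    import java.util.Map;

    import org.apache.solr.handler.dataimport.Context;
    import org.apache.solr.handler.dataimport.Transformer;

    public class FilePersistTransformer extends Transformer {
      @Override
      public Object transformRow(Map<String, Object> row, Context context) {
        Object id = row.get("id"); // illustrative primary-key column
        if (id != null) {
          try {
            // overwrite the marker file with the most recent id seen
            FileWriter w = new FileWriter("lastindexed.properties");
            w.write("last.indexed.id=" + id + "\n");
            w.close();
          } catch (Exception e) {
            // a real implementation should log this and decide whether to abort
          }
        }
        return row;
      }
    }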


 On Tue, Dec 2, 2008 at 5:12 PM, Marc Sturlese <[EMAIL PROTECTED]>
 wrote:
>
> Thanks, I really appreciate your help.
>
> I didn't explain myself so well in here:
>
>> 2.- This is probably my most difficult goal.
>> Delta-import reads a timestamp from the dataimport.properties and
>> modifies/adds all documents from the db which were inserted after that date.
>> What I want is to be able to save in the field the id of the last
>> indexed doc, so the next time I execute the indexer it starts
>> indexing from that last indexed id.
> You can use a Transformer to write something to the DB.
> Context#getDataSource(String) for each row
>
> When I said:
>
>> be able to save in the field the id of the last indexed doc
> I made a mistake; I meant:
>
> be able to save in the file (dataimport.properties) the id of the last
> indexed doc.
> The point would be to do my own delta-query, indexing from the last
> indexed doc id instead of the timestamp.
> So I think this would not work in that case (it's my mistake because
> of the bad explanation):
>
>>You can use a Transformer to write something to the DB.
>>Context#getDataSource(String) for each row
>
> It is because I was saying:
>> I think I should begin modifying the SolrWriter.java and
>> DocBuilder.java.
>> Creating functions like getStartTime, persistStartTime... for ID
>> control
>
> Am I heading in the correct direction?
>  Sorry for my English, and thanks in advance
>
>
> Noble Paul നോബിള്‍ नोब्ळ् wrote:
>>
>> On Tue, Dec 2, 2008 at 3:01 PM, Marc Sturlese
>> <[EMAIL PROTECTED]>
>> wrote:
>>>
>>> Hey there,
>>>
>>> I have my DataImportHandler almost completely configured. I am
>>> missing three goals. I don't think I can reach them just via xml
>>> conf or a Transformer and SqlEntityProcessor plugin, but I need to be
>>> sure of that.
>>> If there's no other way I will hack some Solr source classes, and would
>>> like to know the best way to do that. Once I have it solved, I can
>>> upload or post the source in the forum in case someone thinks it can
>>> be helpful.
>>>
>>> 1.- Every time I execute the DataImportHandler (to index data from a
>>> db), at the start time or end time I need to delete some expired
>>> documents. I have to delete them from the database and from the
>>> index. I know which documents must be deleted because a field in
>>> the db says so. I would not like to delete all from the DB first or all
>>> from the index first, but one from the index and one from the DB each time.
>>
>> You can override the init()/destroy() of the SqlEntityProcessor and
>> use it as the processor for the root entity. At this point you can
>> run the necessary db queries a
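
(A sketch of that override approach — a subclass of SqlEntityProcessor
registered as the root entity's processor, with the cleanup run from init();
the class name, table, and SQL are illustrative, and the unchecked cast
reflects the raw DataSource type in the 1.3 DIH API:)

    import java.util.Iterator;
    import java.util.Map;

    import org.apache.solr.handler.dataimport.Context;
    import org.apache.solr.handler.dataimport.SqlEntityProcessor;

    public class CleanupSqlEntityProcessor extends SqlEntityProcessor {
      @Override
      @SuppressWarnings("unchecked")
      public void init(Context context) {
        super.init(context);
        // fetch the ids of expired rows through the entity's data source
        Iterator<Map<String, Object>> expired =
            (Iterator<Map<String, Object>>) context.getDataSource()
                .getData("select id from item where expired = 1");
        while (expired != null && expired.hasNext()) {
          Object id = expired.next().get("id");
          // ... delete this row from the DB and remember the id so the
          // matching document can be removed from the index as well
        }
      }
    }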

Re: Solr 1.3 - response time very long

2008-12-03 Thread sunnyfr

Hi again,

In my test I've a maximum response time of 65 sec for an average of 3 sec,
so it might be that some requests return an error; for example, in my test
of 50,000 requests I've around 30 requests which get back an error, and
that's why the max response time is 65 sec.

I just don't get why I've this error on some requests, like:
/test/selector?cache=0&backend=solr&request=/relevance/search/D
/test/selector?cache=0&backend=solr&request=/relevance/search/?f+you
/test/selector?cache=0&backend=solr&request=/relevance/search/?
/test/selector?cache=0&backend=solr&request=/relevance/search/the
/test/selector?cache=0&backend=solr&request=/relevance/search/?
...
When I search it manually, not via jMeter, it indeed takes a long time and
then it gets back ids.
What do you think?

Thanks a lot for your help.


sunnyfr wrote:
> 
> Hi Matthew, Hi Yonik,
> 
> ...sorry for the flag .. didn't want to ...
> 
> Solr 1.3  / Apache 5.5
> 
> Data directory size: 7.9G
> I'm using jMeter to send HTTP requests; I'm sending exactly the same ones
> to Solr and to Sphinx (MySQL), both over HTTP.
> 
> solr
> http://test-search.com/test/selector?cache=0&backend=solr&request=/relevance/search/dog
> sphinx
> http://test-search.com/test/selector?cache=0&backend=mysql&request=/relevance/search/dog
> 
> When there are more than 4 threads it's getting slower. For a big test over
> 40 min, ramping up to 100 threads/sec for Solr as for Sphinx, at the end
> the average for Solr is 3 sec and for Sphinx 1 sec.
> 
> solrconfig.xml :  http://www.nabble.com/file/p20802690/solrconf.xml
> solrconf.xml 
> 
> schema.xml:
> [schema field definitions mangled in the archive: roughly 30 <field .../>
> entries, all indexed="true", a mix of stored="true" and stored="false",
> all omitNorms="true", plus a few multiValued fields and a date field with
> default="NOW"]
> 
> What would you reckon ???
> Thanks a lot,
> 
> 
> 
> 
> Matthew Runo wrote:
>> 
>> Could you provide more information? How big is the index? How are you  
>> searching it? Some examples might help pin down the issue.
>> 
>> How long are the queries taking? How long did they take on Sphinx?
>> 
>> Thanks for your time!
>> 
>> Matthew Runo
>> Software Engineer, Zappos.com
>> [EMAIL PROTECTED] - 702-943-7833
>> 
>> On Dec 2, 2008, at 9:04 AM, sunnyfr wrote:
>> 
>>>
>>> Hi,
>>>
>>> I tested my old search engine, which is Sphinx, and my new one, which is
>>> Solr, and I've got a huge difference in results.
>>> How can I make it faster?
>>>
>>> Thanks a lot,
>>> -- 
>>> View this message in context:
>>> http://www.nabble.com/Solr-1.3---response-time-very-long-tp20795134p20795134.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>> 
>> 
>> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Solr-1.3---response-time-very-long-tp20795134p20809121.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: DataImportHandler: Deleting from index and db; lastIndexed id feature

2008-12-03 Thread Noble Paul നോബിള്‍ नोब्ळ्
Good.
We need use cases like these and contributions from users.

This is a win-win:
you will not have to manage the code yourself once it is checked in,
and as we have more eyes on the DIH code it will also improve.

Thanks a lot,
Noble

On Wed, Dec 3, 2008 at 1:49 PM, Marc Sturlese <[EMAIL PROTECTED]> wrote:
>
> That's what I am trying to do. Thanks for the advice. Once I have it done I
> will raise the issue and upload the patch.
>
>
> Noble Paul നോബിള്‍ नोब्ळ् wrote:
>>
>> OK, I guess I see it. I am thinking of exposing the writes to the
>> properties file via an API.
>>
>> say Context#persist(key,value);
>>
>>
>> This can write the data to the dataimport.properties.
>>
>> You must be able to retrieve that value by ${dataimport.persist.}
>>
>> or through an API, Context.getPersistValue(key)
>>
>> You can raise an issue and give a patch and we can get it committed
>>
>> I guess this is what you wish to achieve
>>
>> --Noble
>>
>>
>>
>> On Wed, Dec 3, 2008 at 3:28 AM, Marc Sturlese <[EMAIL PROTECTED]>
>> wrote:
>>>
>>> Do you mean the file used by the DataImportHandler called
>>> dataimport.properties?
>>> If you mean that one, it's written at the end of the indexing process. The
>>> written date will be used by the delta-query in the next indexing run to
>>> identify the new or modified rows in the database.
>>>
>>> What I am trying to do is save the last indexed id instead of a timestamp.
>>> That way, the next execution will start indexing from the last doc that
>>> was indexed in the previous run. But I am still a bit confused about how
>>> to do that...
>>>
>>> Noble Paul നോബിള്‍ नोब्ळ् wrote:

 delta-import file?


 On Wed, Dec 3, 2008 at 12:08 AM, Lance Norskog <[EMAIL PROTECTED]>
 wrote:
> Does the DIH delta feature rewrite the delta-import file for each set
> of
> rows? If it does not, that sounds like a bug/enhancement.
> Lance
>
> -----Original Message-----
> From: Noble Paul നോബിള്‍ नोब्ळ् [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, December 02, 2008 8:51 AM
> To: solr-user@lucene.apache.org
> Subject: Re: DataImportHandler: Deleting from index and db;
> lastIndexed
> id feature
>
> You can write the details to a file using a Transformer itself.
>
> It is wise to stick to the public API as far as possible. We will
> maintain back compat and your code will be usable w/ newer versions.
>
>
> On Tue, Dec 2, 2008 at 5:12 PM, Marc Sturlese <[EMAIL PROTECTED]>
> wrote:
>>
>> Thanks, I really appreciate your help.
>>
>> I didn't explain myself so well in here:
>>
>>> 2.- This is probably my most difficult goal.
>>> Delta-import reads a timestamp from the dataimport.properties and
>>> modifies/adds all documents from the db which were inserted after that date.
>>> What I want is to be able to save in the field the id of the last
>>> indexed doc, so the next time I execute the indexer it starts
>>> indexing from that last indexed id.
>> You can use a Transformer to write something to the DB.
>> Context#getDataSource(String) for each row
>>
>> When I said:
>>
>>> be able to save in the field the id of the last indexed doc
>> I made a mistake; I meant:
>>
>> be able to save in the file (dataimport.properties) the id of the last
>> indexed doc.
>> The point would be to do my own delta-query, indexing from the last
>> indexed doc id instead of the timestamp.
>> So I think this would not work in that case (it's my mistake because
>> of the bad explanation):
>>
>>>You can use a Transformer to write something to the DB.
>>>Context#getDataSource(String) for each row
>>
>> It is because I was saying:
>>> I think I should begin modifying the SolrWriter.java and
>>> DocBuilder.java.
>>> Creating functions like getStartTime, persistStartTime... for ID
>>> control
>>
>> Am I heading in the correct direction?
>>  Sorry for my English, and thanks in advance
>>
>>
>> Noble Paul നോബിള്‍ नोब्ळ् wrote:
>>>
>>> On Tue, Dec 2, 2008 at 3:01 PM, Marc Sturlese
>>> <[EMAIL PROTECTED]>
>>> wrote:

 Hey there,

 I have my DataImportHandler almost completely configured. I am
 missing three goals. I don't think I can reach them just via xml
 conf or a Transformer and SqlEntityProcessor plugin, but I need to be
 sure of that.
 If there's no other way I will hack some Solr source classes, and would
 like to know the best way to do that. Once I have it solved, I can
 upload or post the source in the forum in case someone thinks it can
 be helpful.

 1.- Every time I execute the DataImportHandler (to index data from a
 db), at the start time or end time I need to delete some expired
 documents. I have to delete them from the database and fro

Re: Solr 1.3 - response time very long

2008-12-03 Thread sunnyfr

this is my error:
Caused by: java.net.SocketException: Unexpected end of file from server
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:769)

It's like it doesn't find the data, but it takes time to look for it ???


sunnyfr wrote:
> 
> Hi again,
> 
> In my test I've a maximum response time of 65 sec for an average of 3 sec,
> so it might be that some requests return an error; for example, in my test
> of 50,000 requests I've around 30 requests which get back an error, and
> that's why the max response time is 65 sec.
> 
> I just don't get why I've this error on some requests, like:
> /test/selector?cache=0&backend=solr&request=/relevance/search/D
> /test/selector?cache=0&backend=solr&request=/relevance/search/?f+you
> /test/selector?cache=0&backend=solr&request=/relevance/search/?
> /test/selector?cache=0&backend=solr&request=/relevance/search/the
> /test/selector?cache=0&backend=solr&request=/relevance/search/?
> ...
> When I search it manually, not via jMeter, it indeed takes a long time and
> then it gets back ids.
> What do you think?
> 
> Thanks a lot for your help.
> 
> 
> sunnyfr wrote:
>> 
>> Hi Matthew, Hi Yonik,
>> 
>> ...sorry for the flag .. didn't want to ...
>> 
>> Solr 1.3  / Apache 5.5
>> 
>> Data directory size: 7.9G
>> I'm using jMeter to send HTTP requests; I'm sending exactly the same ones
>> to Solr and to Sphinx (MySQL), both over HTTP.
>> 
>> solr
>> http://test-search.com/test/selector?cache=0&backend=solr&request=/relevance/search/dog
>> sphinx
>> http://test-search.com/test/selector?cache=0&backend=mysql&request=/relevance/search/dog
>> 
>> When there are more than 4 threads it's getting slower. For a big test
>> over 40 min, ramping up to 100 threads/sec for Solr as for Sphinx, at the
>> end the average for Solr is 3 sec and for Sphinx 1 sec.
>> 
>> solrconfig.xml :  http://www.nabble.com/file/p20802690/solrconf.xml
>> solrconf.xml 
>> 
>> schema.xml:
>> [schema field definitions mangled in the archive: roughly 30 <field .../>
>> entries, all indexed="true", a mix of stored="true" and stored="false",
>> all omitNorms="true", plus a few multiValued fields and a date field with
>> default="NOW"]
>> 
>> What would you reckon ???
>> Thanks a lot,
>> 
>> 
>> 
>> 
>> Matthew Runo wrote:
>>> 
>>> Could you provide more information? How big is the index? How are you  
>>> searching it? Some examples might help pin down the issue.
>>> 
>>> How long are the queries taking? How long did they take on Sphinx?
>>> 
>>> Thanks for your time!
>>> 
>>> Matthew Runo
>>> Software Engineer, Zappos.com
>>> [EMAIL PROTECTED] - 702-943-7833
>>> 
>>> On Dec 2, 2008, at 9:04 AM, sunnyfr wrote:
>>> 

 Hi,

 I tested my old search engine, which is Sphinx, and my new one, which is
 Solr, and I've got a huge difference in results.
 How can I make it faster?

 Thanks a lot,
 -- 
 View this message in context:
 http://www.nabble.com/Solr-1.3---response-time-very-long-tp20795134p20795134.html
 Sent from the Solr - User mailing list archive at Nabble.com.

>>> 
>>> 
>>> 
>> 
>> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Solr-1.3---response-time-very-long-tp20795134p20810804.html
Sent from the Solr - User mailing list archive at Nabble.com.



sql tables to XML(indexing SQL tables)

2008-12-03 Thread Neha Bhardwaj
I have just started using Solr, and from the available documentation I can't
figure out whether there is any way I can convert SQL data into XML so that
I can index it in Solr.

Can anyone help me with that?

 

 

 




Re: sql tables to XML(indexing SQL tables)

2008-12-03 Thread Noble Paul നോബിള്‍ नोब्ळ्
Did you look at the DataImportHandler?
http://wiki.apache.org/solr/DataImportHandler
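
(For example, a minimal data-config.xml along the lines of that wiki page —
driver, connection details, table, and column names are all illustrative.
DIH queries the database and indexes the rows directly, so no intermediate
XML files are needed:)

    <dataConfig>
      <dataSource driver="com.mysql.jdbc.Driver"
                  url="jdbc:mysql://localhost/mydb"
                  user="db_user" password="db_pass"/>
      <document>
        <entity name="item" query="select id, name from item">
          <field column="id" name="id"/>
          <field column="name" name="name"/>
        </entity>
      </document>
    </dataConfig>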


On Wed, Dec 3, 2008 at 4:29 PM, Neha Bhardwaj
<[EMAIL PROTECTED]> wrote:
> I have just started using Solr, and from the available documentation I can't
> figure out whether there is any way I can convert SQL data into XML so that
> I can index it in Solr.
>
>
>
>
>
> Can anyone help me with that?
>
>
>
>
>
>
>
>
>



-- 
--Noble Paul


Re: Multi Language Search

2008-12-03 Thread Shalin Shekhar Mangar
Option 1 is correct.
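
(That is, the two words joined by an explicit AND — the words themselves are
placeholders here:

    q=<russian_word_1> AND <russian_word_2>

rather than simply listing the two words next to each other.)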

On Tue, Dec 2, 2008 at 3:22 PM, tushar kapoor <
[EMAIL PROTECTED]> wrote:

>
> Hi,
>
> Before I start with Solr specific question, there is one thing I need to
> get
> information on.
>
> If I am a Russian user on a Russian Website & I want to search for indexes
> having two Russian words, what is the query term going to look like?
>
> 1. <russian_word_1> AND <russian_word_2>
>
> or rather,
>
> 2. <russian_word_1> <russian_word_2>
>
> Now over to the Solr-specific question. Whether the answer to the above is
> 1 or 2, how does one do it using Solr? I tried using the language analyzers
> but I'm not too sure how exactly they work.
>
> Regards,
> Tushar.
> --
> View this message in context:
> http://www.nabble.com/Multi-Language-Search-tp20789025p20789025.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


-- 
Regards,
Shalin Shekhar Mangar.


Re: Query ID range? possible?

2008-12-03 Thread Shalin Shekhar Mangar
On Wed, Dec 3, 2008 at 6:16 AM, <[EMAIL PROTECTED]> wrote:

> We are using Solr and would like to know: is there a query syntax to
> retrieve the newest x records, in descending order?


Not out of the box. You can keep a new field in the schema of date type with
default value of "NOW". Then you can ask for documents sorted desc by this
field.
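
(For example — the field name is illustrative:

    <field name="timestamp" type="date" indexed="true" stored="true"
           default="NOW" multiValued="false"/>

and then something like q=*:*&sort=timestamp desc&rows=100 returns the 100
most recently added documents.)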


>
> Our id field is simply that (a unique record identifier), so ideally we
> would want to get the last, say, 100 records added.
>
> Possible?
>
> Also is there a special way it needs to be defined in the schema?
> [schema snippet mangled in the archive: the uniqueKey declaration for id
> and its <field .../> definition with required="true" omitNorms="false"]
>

A word of caution: don't keep the uniqueKey as a text type because (if you
haven't modified the example schema) the text type is tokenized. Keep it as
a string type.


> In addition, what if we want the last 100 records added (order by id desc)
> and another field... say media type A, for example:
> [another <field .../> definition mangled in the archive, with
> omitNorms="true" required="false"]
>

Same as my first suggestion. Keep a date field which defaults to NOW, sort
desc by this field and additionally add your own sort fields.

-- 
Regards,
Shalin Shekhar Mangar.


disappearing index

2008-12-03 Thread Justin
I built up two indexes using a multicore configuration,
one containing 52,000+ documents and the other over 10 million; the entire
indexing process showed no errors.

The server crashed over night, well after the indexing had completed, and
now no documents are reported for either index.

This despite the fact that the cores both have huge /data folders (one is
1.5GB, the other 8.5GB).

Any ideas?


boost field which are not stored

2008-12-03 Thread sunnyfr

Hi,

I would like to know if it's a problem: I've around 50 fields and I just
need back the id.
Do I need to store fields which are boosted by qf or bf in dismax?
I stored language titles and descriptions, and my data folder is now 8G; it
sometimes takes a long time to get back data with multiple threads ... is
there a link? Is it better to store the data or not?
Should I limit my boosts? Because it looks like:
select/?qt=dismax&fl=id,score,
language,title,status_official,stat_views&q=svr09+tutorial&debugQuery=true&qf=title_en+title^1.1+status_official^2.2+status_creative^1.4+description&bf=recip(rord(created),1,10,10)^25+pow(stat_views,0.1)^4

Maybe it's too much ???
thanks a lot,

-- 
View this message in context: 
http://www.nabble.com/boost-field-which-are-not-stored-tp20815036p20815036.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: disappearing index

2008-12-03 Thread Toby Cole

Could be that all your documents have not yet been committed.
Have you tried running a commit?
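
(For example — host, port, and core name are illustrative:

    curl http://localhost:8983/solr/core0/update \
         -H 'Content-type: text/xml' --data-binary '<commit/>'

Documents added since the last commit are not searchable until a commit is
issued, even though they are already sitting in the data directory.)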

On 3 Dec 2008, at 15:00, Justin wrote:


I built up two indexes using a multicore configuration,
one containing 52,000+ documents and the other over 10 million; the entire
indexing process showed no errors.

The server crashed over night, well after the indexing had  
completed, and

now no documents are reported for either index.

This despite the fact that the cores both have huge /data folders (one is
1.5GB, the other 8.5GB).

Any ideas?


Toby Cole
Software Engineer

Semantico
Lees House, Floor 1, 21-23 Dyke Road, Brighton BN1 3FE
T: +44 (0)1273 358 238
F: +44 (0)1273 723 232
E: [EMAIL PROTECTED]
W: www.semantico.com



Re: Solr 1.3 - response time very long

2008-12-03 Thread Matthew Runo
Are you manipulating the query at all between the url like /test/ 
selector?cache=0&backend=solr&request=/relevance/search/D and what  
gets sent to Solr? To me, those don't look like solr requests (I could  
be missing something though). I'd be curious to see the actual  
requests to try and let you know why you're getting an error (what  
error is it giving you?).


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
[EMAIL PROTECTED] - 702-943-7833

On Dec 3, 2008, at 1:02 AM, sunnyfr wrote:



Hi again,

In my test I've a maximum response time of 65 sec for an average of 3 sec,
so it might be that some requests return an error; for example, in my test
of 50,000 requests I've around 30 requests which get back an error, and
that's why the max response time is 65 sec.

I just don't get why I've this error on some requests, like:
/test/selector?cache=0&backend=solr&request=/relevance/search/D
/test/selector?cache=0&backend=solr&request=/relevance/search/?f+you
/test/selector?cache=0&backend=solr&request=/relevance/search/?
/test/selector?cache=0&backend=solr&request=/relevance/search/the
/test/selector?cache=0&backend=solr&request=/relevance/search/?
...
When I search it manually, not via jMeter, it indeed takes a long time and
then it gets back ids.
What do you think?

Thanks a lot for your help.


sunnyfr wrote:


Hi Matthew, Hi Yonik,

...sorry for the flag .. didn't want to ...

Solr 1.3  / Apache 5.5

Data directory size: 7.9G
I'm using jMeter to send HTTP requests; I'm sending exactly the same ones
to Solr and to Sphinx (MySQL), both over HTTP.

solr
http://test-search.com/test/selector?cache=0&backend=solr&request=/relevance/search/dog
sphinx
http://test-search.com/test/selector?cache=0&backend=mysql&request=/relevance/search/dog

When there are more than 4 threads it's getting slower. For a big test
over 40 min, ramping up to 100 threads/sec for Solr as for Sphinx, at the
end the average for Solr is 3 sec and for Sphinx 1 sec.

solrconfig.xml :  http://www.nabble.com/file/p20802690/solrconf.xml
solrconf.xml

schema.xml:

[schema field definitions mangled in the archive: roughly 30 <field .../>
entries, all indexed="true", a mix of stored="true" and stored="false",
all omitNorms="true", plus a few multiValued fields and a date field with
default="NOW"]

What would you reckon ???
Thanks a lot,




Matthew Runo wrote:


Could you provide more information? How big is the index? How are  
you

searching it? Some examples might help pin down the issue.

How long are the queries taking? How long did they take on Sphinx?

Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
[EMAIL PROTECTED] - 702-943-7833

On Dec 2, 2008, at 9:04 AM, sunnyfr wrote:



Hi,

I tested my old search engine, which is Sphinx, and my new one, which is
Solr, and I've got a huge difference in results.
How can I make it faster?

Thanks a lot,
--
View this message in context:
http://www.nabble.com/Solr-1.3---response-time-very-long-tp20795134p20795134.html
Sent from the Solr - User mailing list archive at Nabble.com.










--
View this message in context: 
http://www.nabble.com/Solr-1.3---response-time-very-long-tp20795134p20809121.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: Solr 1.3 - response time very long

2008-12-03 Thread sunnyfr

Sorry, the request is more like:

/select?q=text:"svr09\+tutorial"+AND+status_published:1+AND+status_moderated:0+AND+status_personal:0+AND+status_explicit:0+AND+status_private:0+AND+status_deleted:0+AND+status_error:0+AND+status_read
or even I tried :

select/?qt=dismax&fl=id,score,%20language,title,status_official,stat_views&q=svr09+tutorial&debugQuery=true&qf=title_en+title^1.1+status_official^2.2+status_creative^1.4+description&bf=recip(rord(created),1,10,10)^25+pow(stat_views,0.1)^4

Thanks Matthew



Matthew Runo wrote:
> 
> Are you manipulating the query at all between the url like /test/ 
> selector?cache=0&backend=solr&request=/relevance/search/D and what  
> gets sent to Solr? To me, those don't look like solr requests (I could  
> be missing something though). I'd be curious to see the actual  
> requests to try and let you know why you're getting an error (what  
> error is it giving you?).
> 
> Thanks for your time!
> 
> Matthew Runo
> Software Engineer, Zappos.com
> [EMAIL PROTECTED] - 702-943-7833
> 
> On Dec 3, 2008, at 1:02 AM, sunnyfr wrote:
> 
>>
>> Hi again,
>>
>> In my test I've a maximum response time of 65 sec for an average of 3 sec,
>> so it might be that some requests return an error; for example, in my test
>> of 50,000 requests I've around 30 requests which get back an error, and
>> that's why the max response time is 65 sec.
>>
>> I just don't get why I've this error on some requests, like:
>> /test/selector?cache=0&backend=solr&request=/relevance/search/D
>> /test/selector?cache=0&backend=solr&request=/relevance/search/?f+you
>> /test/selector?cache=0&backend=solr&request=/relevance/search/?
>> /test/selector?cache=0&backend=solr&request=/relevance/search/the
>> /test/selector?cache=0&backend=solr&request=/relevance/search/?
>> ...
>> When I search it manually, not via jMeter, it indeed takes a long time and
>> then it gets back ids.
>> What do you think?
>>
>> Thanks a lot for your help.
>>
>>
>> sunnyfr wrote:
>>>
>>> Hi Matthew, Hi Yonik,
>>>
>>> ...sorry for the flag .. didn't want to ...
>>>
>>> Solr 1.3  / Apache 5.5
>>>
>>> Data directory size: 7.9G
>>> I'm using jMeter to send HTTP requests; I'm sending exactly the same ones
>>> to Solr and to Sphinx (MySQL), both over HTTP.
>>>
>>> solr
>>> http://test-search.com/test/selector?cache=0&backend=solr&request=/relevance/search/dog
>>> sphinx
>>> http://test-search.com/test/selector?cache=0&backend=mysql&request=/relevance/search/dog
>>>
>>> When there are more than 4 threads it's getting slower. For a big test
>>> over 40 min, ramping up to 100 threads/sec for Solr as for Sphinx, at the
>>> end the average for Solr is 3 sec and for Sphinx 1 sec.
>>>
>>> solrconfig.xml :  http://www.nabble.com/file/p20802690/solrconf.xml
>>> solrconf.xml
>>>
>>> schema.xml:
>>> [schema field definitions mangled in the archive: roughly 30 <field .../>
>>> entries, all indexed="true", a mix of stored="true" and stored="false",
>>> all omitNorms="true", plus a few multiValued fields and a date field with
>>> default="NOW"]
>>> 
>>>
>>> What would you reckon ???
>>> Thanks a lot,
>>>
>>>
>>>
>>>
>>> Matthew Runo wrote:

 Could you prov

Newbie question - using existing Lucene Index

2008-12-03 Thread Sudarsan, Sithu D.
Hi All,

Using Lucene, an index has been created. It has five different fields.

How can I use that index from Solr for searching? I tried changing the
schema as in the tutorial and copied the index to the data directory,
but all searches return empty, with no error message!

Is there a sample project available which shows using Tomcat as the servlet
container rather than Jetty?

Your help is appreciated,
Sincerely,
Sithu D Sudarsan

ORISE Fellow, DESE/OSEL/CDRH
WO62 - 3209
&
GRA, UALR

[EMAIL PROTECTED]
[EMAIL PROTECTED]



Query performance insight ...

2008-12-03 Thread souravm
Hi All,

Through my testing I found that query performance, when not served from
cache, depends largely on the number of hits and the number of concurrent
queries. In both cases the query is essentially CPU bound.

Just wondering whether we can note this somewhere in the Wiki, as it would
be very helpful for anyone planning to use Solr.

Regards,
Sourav


Re: Encoded search string & qt=Dismax

2008-12-03 Thread tushar kapoor

Hoss,

If the way I am doing it (Query 1) is a fluke, what is the correct way of
doing it? It seems like there is something fundamental that I am missing.

It would be great if you could list the steps required to support
multi-language search. Please provide some context on how exactly language
analyzers are used.

I am attaching - 

http://www.nabble.com/file/p20817191/schema.xml schema.xml 
http://www.nabble.com/file/p20817191/solrconfig.xml solrconfig.xml 

Also, I am using a multicore setup with support for only one language per
core.
The field type on which I have applied language analyzer(Russian) is "text".

Regards,
Tushar.


hossman wrote:
> 
> 
> First of all...
> 
> standard request handler uses the default search field specified in your 
> schema.xml -- dismax does not.  dismax looks at the "qf" param to decide 
> which fields to search for the "q" param.  if you started with the example 
> schema the dismax handler may have a default value for "qf" which is 
> trying to query different fields than you actually use in your documents.
> 
> &debugQuery=true will show you exactly what query structure (and on which 
> fields) each request is using.
> 
> Second...
> 
> I don't know Russian, and character encoding issues tend to make my head 
> spin, but the fact that the responseHeader is echoing back a q param 
> containing java string literal sequences suggests that you are doing 
> something wrong.  you should be sending the URL encoding of the actual 
> characters of the Russian word, not the java string literal encoding of 
> the word.  I 
> suspect the fact that you are getting any results at all from your first 
> query is a fluke.
> 
> The echoed q param in the responseHeader should show you the real word you 
> want to search for -- once it does, then you'll know that you have the 
> URL+UTF8 encoding issues straightened out.  *THEN* i would worry about the 
> dismax/standard behavior.
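
(For illustration: the java literal sequence quoted below decodes to the
word "Предварительное"; sent as proper UTF-8 percent-encoding, the q
parameter would instead look like

    q=%D0%9F%D1%80%D0%B5%D0%B4%D0%B2%D0%B0%D1%80%D0%B8%D1%82%D0%B5%D0%BB%D1%8C%D0%BD%D0%BE%D0%B5

with no \u041f... escapes anywhere in the request.)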
> 
> : <str name="q">\u041f\u0440\u0435\u0434\u0432\u0430\u0440\u0438\u0442\u0435\u043b\u044c\u043d\u043e\u0435</str>
> 
> 
> -Hoss
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Encoded--search-string---qt%3DDismax-tp20797703p20817191.html
Sent from the Solr - User mailing list archive at Nabble.com.



Ordering updates

2008-12-03 Thread Laurence Rowe
Hi,

Our CMS is distributed over a cluster and I was wondering how I can
ensure that index records of newer versions of documents are never
overwritten by older ones. Amazon AWS uses a timestamp on requests to
ensure 'eventual consistency' of operations. Is there a way to supply
a transaction ID with an update so an update is conditional on the
supplied transaction id being greater than the existing indexed
transaction id?

Laurence


Re: Solr 1.3 - response time very long

2008-12-03 Thread Yonik Seeley
On Wed, Dec 3, 2008 at 11:49 AM, sunnyfr <[EMAIL PROTECTED]> wrote:
> Sorry the request is more :
>
/select?q=text:"svr09\+tutorial"+AND+status_published:1+AND+status_moderated:0+AND+status_personal:0+AND+status_explicit:0+AND+status_private:0+AND+status_deleted:0+AND+status_error:0+AND+status_read
> or even I tried :

There are a bunch of things you could try to speed things up a bit:
1) optimize the index if you haven't
2) use a faster response writer with a more compact format (i.e. add
wt=javabin for a binary format or wt=json for JSON)
3) use fl (field list) to restrict the results to only the fields you need
4) never use debugQuery to benchmark performance (I don't think you
actually did, but you did list it in the example dismax URL)
5) pull out clauses that match many documents and that are common
across many queries into filters.

/select?q=text:"svr09\+tutorial"&fq=status_published:1+AND+status_moderated:0+AND+status_personal:0+AND+status_explicit:0+AND+status_private:0+AND+status_deleted:0+AND+status_error:0+AND+status_read

You can also use multiple filter queries for better caching if some of
the clauses appear in smaller groups or in isolation.  If you can give
more examples, we can tell what the common parts are.
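
(For instance — how the clauses are grouped into separate fq params is
illustrative, and the URL is shown unencoded and split across lines for
readability; each distinct fq value gets its own filterCache entry:

    /select?q=text:"svr09 tutorial"
           &fq=status_published:1
           &fq=status_moderated:0 AND status_personal:0 AND status_explicit:0
           &fq=status_private:0 AND status_deleted:0 AND status_error:0
)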

-Yonik


Re: Compiling Solr 1.3.0 + KStem

2008-12-03 Thread Rob Casson
i've experimented with the KStem stuff in the past, and just pulled a
fresh copy of solr from trunk.

it looks like Hoss' suggestion #1 does the trick, by simply commenting
out the super.init call...loaded the example data, tested some
analysis, and it seems to work as before.

just a confirmation, and thanks,
rob

On Fri, Nov 28, 2008 at 6:18 PM, Chris Hostetter
<[EMAIL PROTECTED]> wrote:
>
> : /usr/local/build/apache-solr-1.3.0/src/java/org/apache/solr/analysis/
> : KStemFilterFactory.java:63:
> : cannot find symbol
> : [javac] symbol  : method
> : init(org.apache
> : .solr.core.SolrConfig,java.util.Map)
> : [javac] location: class org.apache.solr.analysis.BaseTokenFilterFactory
> : [javac] super.init(solrConfig, args);
> : [javac]  ^
>
> that KStemFilterFactory seems to be trying to use a method that existed
> for a while on the trunk, but was never released.
>
> i'm not familiary with KStemFilterFactory to know why/if it needs a
> SolrConfig, but a few things you can try...
>
> 1) if there are no references to solrConfig anywhere except the init
> method (and the super.init method it calls) just remove the references to
> it (so the methods just deal with the Map)
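
(A sketch of what suggestion 1 amounts to inside KStemFilterFactory,
assuming nothing else in the class touches solrConfig — the SolrConfig
parameter and the super.init(solrConfig, args) call simply go away:)

    public void init(Map<String, String> args) {
      super.init(args); // the released BaseTokenFilterFactory init(Map)
      // ... any KStem-specific argument handling stays as before
    }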
>
> 2) if there are other references to the solrConfig, they *may* just be to
> take advantage of ResourceLoader methods, so after making the changes
> above, make KStemFilterFactory "implements ResourceLoaderAware" and then
> add a method like this...
>
>  public void inform(ResourceLoader loader) {
>// code that used solrConfig should go here, but use loader
>  }
>
> ...it will get called after the init(Map) method and let
> KStemmFilterFactory get access to files on disk.
>
> 3) if that doesn't work ... i don't know what else to try (i'd need to get
> a lot more familiar with KStem to guess)
>
>
>
> -Hoss
>
>


Re: boost field which are not stored

2008-12-03 Thread Yonik Seeley
On Wed, Dec 3, 2008 at 10:25 AM, sunnyfr <[EMAIL PROTECTED]> wrote:
> I would like to know if it's a problem: I've around 50 fields and I just
> need back the id.
> Do I need to store fields which are boosted by qf or bf in dismax?

Nope.  Searching/Querying is completely separate from retrieval of
stored  fields for the hits.
Index a field you want to search on (or facet by or sort by), and
store a field you want returned back.
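
(For example — field names illustrative: a field that is only searched can
be indexed but not stored, while the id you want back must be stored:

    <field name="id"          type="string" indexed="true" stored="true"/>
    <field name="description" type="text"   indexed="true" stored="false"/>
)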

> I stored language titles and descriptions, and my data folder is now 8G; it
> sometimes takes a long time to get back data with multiple threads ... is
> there a link?

If it's really mult-threaded related (many query threads executing at
once), the very latest nightly build may help with lock contention
while reading index files (provided you aren't running on Windows):
http://hudson.zones.apache.org/hudson/job/Solr-trunk/lastSuccessfulBuild/artifact/trunk/dist/

A little more about that at
http://yonik.wordpress.com/2008/12/01/solr-scalability-improvements/

-Yonik


> Is it better to store the data or not?
> Should I limit my boosts? Because it looks like:
> select/?qt=dismax&fl=id,score,
> language,title,status_official,stat_views&q=svr09+tutorial&debugQuery=true&qf=title_en+title^1.1+status_official^2.2+status_creative^1.4+description&bf=recip(rord(created),1,10,10)^25+pow(stat_views,0.1)^4
>
> Maybe it's too much ???
> thanks a lot,