Re: DataImportHandler: Deleting from index and db; lastIndexed id feature
That's what I am trying to do. Thanks for the advice. Once I have it done I will raise the issue and upload the patch.

Noble Paul നോബിള് नोब्ळ् wrote:
>
> OK. I guess I see it. I am thinking of exposing the writes to the
> properties file via an API.
>
> Say Context#persist(key, value);
>
> This can write the data to dataimport.properties.
>
> You must be able to retrieve that value by ${dataimport.persist.<key>},
> or through an API, Context.getPersistValue(key).
>
> You can raise an issue and give a patch and we can get it committed.
>
> I guess this is what you wish to achieve.
>
> --Noble
>
> On Wed, Dec 3, 2008 at 3:28 AM, Marc Sturlese <[EMAIL PROTECTED]> wrote:
>>
>> Do you mean the file used by DataImportHandler called
>> dataimport.properties? If you mean this one, it's written at the end of
>> the indexing process. The written date will be used in the next indexing
>> run by the delta-query to identify the new or modified rows from the
>> database.
>>
>> What I am trying to do is, instead of saving a timestamp, save the last
>> indexed id. Doing that, in the next execution I will start indexing from
>> the last doc that was indexed in the previous run. But I am still a bit
>> confused about how to do that...
>>
>> Noble Paul നോബിള് नोब्ळ् wrote:
>>>
>>> delta-import file?
>>>
>>> On Wed, Dec 3, 2008 at 12:08 AM, Lance Norskog <[EMAIL PROTECTED]>
>>> wrote:
>>>> Does the DIH delta feature rewrite the delta-import file for each set
>>>> of rows? If it does not, that sounds like a bug/enhancement.
>>>> Lance
>>>>
>>>> -----Original Message-----
>>>> From: Noble Paul നോബിള് नोब्ळ् [mailto:[EMAIL PROTECTED]]
>>>> Sent: Tuesday, December 02, 2008 8:51 AM
>>>> To: solr-user@lucene.apache.org
>>>> Subject: Re: DataImportHandler: Deleting from index and db;
>>>> lastIndexed id feature
>>>>
>>>> You can write the details to a file using a Transformer itself.
>>>>
>>>> It is wise to stick to the public API as far as possible. We will
>>>> maintain back compat and your code will be usable w/ newer versions.
>>>>
>>>> On Tue, Dec 2, 2008 at 5:12 PM, Marc Sturlese <[EMAIL PROTECTED]>
>>>> wrote:
>>>>>
>>>>> Thanks, I really appreciate your help.
>>>>>
>>>>> I didn't explain myself so well here:
>>>>>
>>>>>> 2.- This is probably my most difficult goal.
>>>>>> Delta-import reads a timestamp from dataimport.properties and
>>>>>> modifies/adds all documents from the db which were inserted after
>>>>>> that date. What I want is to be able to save in the field the id of
>>>>>> the last indexed doc, so the next time I execute the indexer it
>>>>>> starts indexing from that last indexed doc id.
>>>>>
>>>>>> You can use a Transformer to write something to the DB.
>>>>>> Context#getDataSource(String) for each row
>>>>>
>>>>> When I said:
>>>>>
>>>>>> be able to save in the field the id of the last indexed doc
>>>>>
>>>>> I made a mistake; I meant: be able to save in the file
>>>>> (dataimport.properties) the id of the last indexed doc. The point
>>>>> would be to do my own delta query, indexing from the last indexed doc
>>>>> id instead of the timestamp. So I think this would not work in that
>>>>> case (it's my mistake because of the bad explanation):
>>>>>
>>>>>> You can use a Transformer to write something to the DB.
>>>>>> Context#getDataSource(String) for each row
>>>>>
>>>>> It is because I was saying:
>>>>>
>>>>>> I think I should begin modifying SolrWriter.java and
>>>>>> DocBuilder.java, creating functions like getStartTime,
>>>>>> persistStartTime... for ID control.
>>>>>
>>>>> Am I in the correct direction?
>>>>> Sorry for my English, and thanks in advance.
>>>>>
>>>>> Noble Paul നോബിള് नोब्ळ् wrote:
>>>>>>
>>>>>> On Tue, Dec 2, 2008 at 3:01 PM, Marc Sturlese <[EMAIL PROTECTED]>
>>>>>> wrote:
>>>>>>>
>>>>>>> Hey there,
>>>>>>>
>>>>>>> I have my DataImportHandler almost completely configured. I am
>>>>>>> missing three goals. I don't think I can reach them just via xml
>>>>>>> conf or a Transformer and SqlEntityProcessor plugin, but I need to
>>>>>>> be sure of that. If there's no other way I will hack some Solr
>>>>>>> source classes, and would like to know the best way to do that.
>>>>>>> Once I have it solved, I can upload or post the source in the
>>>>>>> forum in case someone thinks it can be helpful.
>>>>>>>
>>>>>>> 1.- Every time I execute DataImportHandler (to index data from a
>>>>>>> db), at the start time or end time I need to delete some expired
>>>>>>> documents. I have to delete them from the database and from the
>>>>>>> index. I know which documents must be deleted because a field in
>>>>>>> the db says it. I would not like to delete first all from the DB
>>>>>>> or first all from the index, but one from the index and one from
>>>>>>> the db each time.
>>>>>>
>>>>>> You can override the init() and destroy() of the SqlEntityProcessor
>>>>>> and use it as the processor for the root entity. At this point you
>>>>>> can run the necessary db queries a
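To make Noble's earlier suggestion concrete ("you can write the details to a file using a Transformer itself"), here is a minimal sketch of a custom DIH Transformer that records the id of the last row it processes. The class name, the "id" column, and the lastid.properties file name are illustrative, not part of Solr:

import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Map;
import java.util.Properties;
import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;

public class LastIdTransformer extends Transformer {
    public Object transformRow(Map<String, Object> row, Context context) {
        Object id = row.get("id"); // assumes the entity exposes an "id" column
        if (id != null) {
            Properties props = new Properties();
            props.setProperty("last.indexed.id", id.toString());
            FileOutputStream out = null;
            try {
                // written to a separate file so DIH's own
                // dataimport.properties is not clobbered
                out = new FileOutputStream("lastid.properties");
                props.store(out, null);
            } catch (IOException e) {
                // best-effort; a real implementation should log this
            } finally {
                if (out != null) try { out.close(); } catch (IOException ignored) {}
            }
        }
        return row;
    }
}

Attached to the root entity via its transformer attribute, the last value written when the import finishes is the last indexed id, which the next run can read back for its own delta query.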
Re: Solr 1.3 - response time very long
Hi again,

In my test I have a maximum response time of 65 sec for an average of 3
sec, so it might come from some requests which produce errors; for
example, in my test of 50,000 requests around 30 requests get back an
error, and that's why the max response time is 65 sec.

I just don't get why I have this error on some requests, like:
/test/selector?cache=0&backend=solr&request=/relevance/search/D
/test/selector?cache=0&backend=solr&request=/relevance/search/?f+you
/test/selector?cache=0&backend=solr&request=/relevance/search/?
/test/selector?cache=0&backend=solr&request=/relevance/search/the
/test/selector?cache=0&backend=solr&request=/relevance/search/?
...
When I search it manually, not via JMeter, it does indeed take a long
time and then it gets back ids. What do you think?

Thanks a lot for your help.

sunnyfr wrote:
>
> Hi Matthew, Hi Yonik,
>
> ...sorry for the flag .. didn't want to ...
>
> Solr 1.3 / Apache 5.5
>
> Data directory size: 7.9G
> I'm using JMeter to send the HTTP requests; I'm sending exactly the same
> ones to solr and sphinx (mysql), both over HTTP.
>
> solr:
> http://test-search.com/test/selector?cache=0&backend=solr&request=/relevance/search/dog
> sphinx:
> http://test-search.com/test/selector?cache=0&backend=mysql&request=/relevance/search/dog
>
> With more than 4 threads it gets slower. For a big test over 40mn,
> increasing to 100 threads/sec for solr as for sphinx, at the end the
> average for solr is 3 sec and for sphinx 1 sec.
>
> solrconfig.xml: http://www.nabble.com/file/p20802690/solrconf.xml
>
> schema.xml:
> [the field definitions were mangled by the archive; only fragments such
> as indexed/stored/omitNorms attributes survive]
>
> What would you reckon???
> Thanks a lot,
>
> Matthew Runo wrote:
>>
>> Could you provide more information? How big is the index? How are you
>> searching it? Some examples might help pin down the issue.
>>
>> How long are the queries taking? How long did they take on Sphinx?
>>
>> Thanks for your time!
>>
>> Matthew Runo
>> Software Engineer, Zappos.com
>> [EMAIL PROTECTED] - 702-943-7833
>>
>> On Dec 2, 2008, at 9:04 AM, sunnyfr wrote:
>>>
>>> Hi,
>>>
>>> I tested my old search engine, which is Sphinx, and my new one, which
>>> is Solr, and I've got a huge difference in results. How can I make it
>>> faster?
>>>
>>> Thanks a lot,
Re: DataImportHandler: Deleting from index and db; lastIndexed id feature
Good. We need use cases like these and contributions from users. This is a
win-win: you will not have to manage the code yourself once it is checked
in, and as we have more eyes on the DIH code it will also improve.

Thanks a lot,
Noble

On Wed, Dec 3, 2008 at 1:49 PM, Marc Sturlese <[EMAIL PROTECTED]> wrote:
>
> That's what I am trying to do. Thanks for the advice. Once I have it done
> I will raise the issue and upload the patch.
Re: Solr 1.3 - response time very long
This is my error:

Caused by: java.net.SocketException: Unexpected end of file from server
    at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:769)

It's like it doesn't find the data, but it takes time to look for it???

sunnyfr wrote:
>
> Hi again,
>
> In my test I have a maximum response time of 65 sec for an average of 3
> sec, so it might come from some requests which produce errors; for
> example, in my test of 50,000 requests around 30 requests get back an
> error, and that's why the max response time is 65 sec.
SQL tables to XML (indexing SQL tables)
I have just started using Solr, and from the documentation available I
can't figure out whether there is any way to convert SQL data into XML so
that I can index it in Solr.

Can anyone help me with that?
Re: SQL tables to XML (indexing SQL tables)
Did you look at the DataImportHandler?
http://wiki.apache.org/solr/DataImportHandler

On Wed, Dec 3, 2008 at 4:29 PM, Neha Bhardwaj <[EMAIL PROTECTED]> wrote:
>
> I have just started using Solr, and from the documentation available I
> can't figure out whether there is any way to convert SQL data into XML so
> that I can index it in Solr.
>
> Can anyone help me with that?

--
--Noble Paul
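For a first try, a minimal data-config.xml sketch for importing a single table might look like the following (the driver, connection URL, credentials, and table/column names are placeholders for your own):

<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb"
              user="dbuser" password="dbpass"/>
  <document>
    <!-- each row returned by the query becomes one Solr document -->
    <entity name="item" query="SELECT id, name, description FROM item">
      <field column="id" name="id"/>
      <field column="name" name="name"/>
      <field column="description" name="description"/>
    </entity>
  </document>
</dataConfig>

Registered against a DataImportHandler instance in solrconfig.xml, a full import is then triggered with /dataimport?command=full-import; DIH indexes the rows directly, so you never have to produce intermediate XML files yourself.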
Re: Multi Language Search
Option 1 is correct.

On Tue, Dec 2, 2008 at 3:22 PM, tushar kapoor <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> Before I start with the Solr-specific question, there is one thing I need
> to get information on.
>
> If I am a Russian user on a Russian website and I want to search for
> indexes containing two Russian words, what is the query term going to
> look like:
>
> 1. [...] AND [...]
>
> or rather,
>
> 2. [...]
>
> Now over to the Solr-specific question. If the answer to the above is
> either 1 or 2, how does one do it using Solr? I tried using the language
> analyzers but I'm not too sure how exactly they work.
>
> Regards,
> Tushar.

--
Regards,
Shalin Shekhar Mangar.
Re: Query ID range? possible?
On Wed, Dec 3, 2008 at 6:16 AM, <[EMAIL PROTECTED]> wrote:
> We are using Solr and would like to know: is there a query syntax to
> retrieve the newest x records, in descending order?

Not out of the box. You can keep a new field in the schema of date type
with a default value of "NOW". Then you can ask for documents sorted
descending by this field.

> Our id field is simply that (a unique record identifier), so ideally we
> would want to get, say, the last 100 records added.
>
> Possible?
>
> Also, is there a special way it needs to be defined in the schema?
> [the id field definition was mangled by the archive; only the
> required="true" and omitNorms="false" attributes survive]

A word of caution: don't make the uniqueKey a text type, because (if you
haven't modified the example schema) the text type is tokenized. Keep it
as a string type.

> In addition, what if we want the last 100 records added (ordered by id
> desc) plus another field, say media type A for example:
> [field definition mangled; only omitNorms="true" and required="false"
> survive]

Same as my first suggestion: keep a date field which defaults to NOW, sort
descending by this field, and additionally add your own sort fields.

--
Regards,
Shalin Shekhar Mangar.
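As a sketch of the suggestion above (the field name "added" is illustrative), the schema addition and the query for the newest 100 documents would look like:

<!-- defaults to the time the document was indexed -->
<field name="added" type="date" indexed="true" stored="true"
       default="NOW" multiValued="false"/>

and then:

/select?q=*:*&sort=added+desc&rows=100

Documents indexed before the field is added won't have a value for it, so this only orders records added after a full reindex.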
disappearing index
I built up two indexes using a multicore configuration, one containing 52,000+ documents and the other over 10 million; the entire indexing process showed no errors. The server crashed overnight, well after the indexing had completed, and now no documents are reported for either index. This despite the fact that the cores both have huge /data folders (one is 1.5GB, the other is 8.5GB). Any ideas?
boost fields which are not stored
Hi,

I would like to know if it's a problem: I have around 50 fields and I just
need back the id. Do I need to store fields which need to be boosted by qf
or bf in dismax?

I stored language titles .. and description, and my data folder now is 8G;
it sometimes takes a long time to get back data with multiple threads ...
is there a link ...?

Is it better to store data or not? Should I limit my boosts? Because it
looks like:

select/?qt=dismax&fl=id,score,language,title,status_official,stat_views&q=svr09+tutorial&debugQuery=true&qf=title_en+title^1.1+status_official^2.2+status_creative^1.4+description&bf=recip(rord(created),1,10,10)^25+pow(stat_views,0.1)^4

Maybe it's too much .. ???

Thanks a lot,
Re: disappearing index
Could it be that all your documents have not yet been committed? Have you
tried running a commit?

On 3 Dec 2008, at 15:00, Justin wrote:
> I built up two indexes using a multicore configuration, one containing
> 52,000+ documents and the other over 10 million; the entire indexing
> process showed no errors. The server crashed overnight, well after the
> indexing had completed, and now no documents are reported for either
> index. This despite the fact that the cores both have huge /data folders
> (one is 1.5GB, the other is 8.5GB). Any ideas?

--
Toby Cole
Software Engineer, Semantico
Lees House, Floor 1, 21-23 Dyke Road, Brighton BN1 3FE
T: +44 (0)1273 358 238
F: +44 (0)1273 723 232
E: [EMAIL PROTECTED]
W: www.semantico.com
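For reference, a commit can be issued by POSTing a bare commit message to each core's update handler (the host, port, and core name below are illustrative):

<commit/>

e.g. posted to http://localhost:8983/solr/core0/update

If the server crashed before a commit was ever issued, though, the uncommitted documents are not referenced by the index and will need to be re-indexed.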
Re: Solr 1.3 - response time very long
Are you manipulating the query at all between the URL like
/test/selector?cache=0&backend=solr&request=/relevance/search/D and what
gets sent to Solr? To me, those don't look like Solr requests (I could be
missing something though). I'd be curious to see the actual requests to
try and let you know why you're getting an error (what error is it giving
you?).

Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
[EMAIL PROTECTED] - 702-943-7833

On Dec 3, 2008, at 1:02 AM, sunnyfr wrote:
>
> Hi again,
>
> In my test I have a maximum response time of 65 sec for an average of 3
> sec, so it might come from some requests which produce errors; for
> example, in my test of 50,000 requests around 30 requests get back an
> error, and that's why the max response time is 65 sec.
Re: Solr 1.3 - response time very long
Sorry, the request is more like:

/select?q=text:"svr09\+tutorial"+AND+status_published:1+AND+status_moderated:0+AND+status_personal:0+AND+status_explicit:0+AND+status_private:0+AND+status_deleted:0+AND+status_error:0+AND+status_read

or even, I tried:

select/?qt=dismax&fl=id,score,%20language,title,status_official,stat_views&q=svr09+tutorial&debugQuery=true&qf=title_en+title^1.1+status_official^2.2+status_creative^1.4+description&bf=recip(rord(created),1,10,10)^25+pow(stat_views,0.1)^4

Thanks Matthew

Matthew Runo wrote:
>
> Are you manipulating the query at all between the URL like
> /test/selector?cache=0&backend=solr&request=/relevance/search/D and what
> gets sent to Solr? To me, those don't look like Solr requests (I could
> be missing something though).
Newbie question - using existing Lucene Index
Hi All,

An index has been created using Lucene; it has five different fields. How
can I use that existing index from Solr for searching? I tried changing
the schema as in the tutorial and copied the index to the data directory,
but all searches return empty, with no error message!

Is there a sample project available which shows using Tomcat as the web
engine rather than Jetty?

Your help is appreciated,

Sincerely,
Sithu D Sudarsan
ORISE Fellow, DESE/OSEL/CDRH
WO62 - 3209 & GRA, UALR
[EMAIL PROTECTED]
[EMAIL PROTECTED]
Query performance insight ...
Hi All,

Through my testing I found that query performance, when not served from
the cache, depends largely on the number of hits and on the number of
concurrent queries, and that in both cases the query is essentially CPU
bound. Just wondering whether we can note this somewhere in the Wiki, as
it would be very helpful for anyone planning to use Solr.

Regards,
Sourav
Re: Encoded search string & qt=Dismax
Hoss,

If the way I am doing it (Query 1) is a fluke, what is the correct way of
doing it? It seems like there is something fundamental that I am missing.
It would be great if you could list the steps required to support
multi-language search, and provide some context on how exactly language
analyzers are used.

I am attaching:
http://www.nabble.com/file/p20817191/schema.xml schema.xml
http://www.nabble.com/file/p20817191/solrconfig.xml solrconfig.xml

Also, I am using a multicore setup with support for only one language per
core. The field type on which I have applied the language analyzer
(Russian) is "text".

Regards,
Tushar.

hossman wrote:
>
> First of all...
>
> The standard request handler uses the default search field specified in
> your schema.xml -- dismax does not. Dismax looks at the "qf" param to
> decide which fields to search for the "q" param. If you started with the
> example schema, the dismax handler may have a default value for "qf"
> which is trying to query different fields than you actually use in your
> documents.
>
> &debugQuery=true will show you exactly what query structure (and on
> which fields) each request is using.
>
> Second...
>
> I don't know Russian, and character encoding issues tend to make my head
> spin, but the fact that the responseHeader is echoing back a q param
> containing Java string literal sequences suggests that you are doing
> something wrong. You should be sending the URL encoding of the actual
> Russian characters, not the URL encoding of their Java string literal
> form. I suspect the fact that you are getting any results at all from
> your first query is a fluke.
>
> The <str name="q"> in the responseHeader should show you the real word
> you want to search for -- once it does, then you'll know that you have
> the URL+UTF8 encoding issues straightened out. *THEN* I would worry
> about the dismax/standard behavior.
>
> : <str name="q">\u041f\u0440\u0435\u0434\u0432\u0430\u0440\u0438\u0442\u0435\u043b\u044c\u043d\u043e\u0435</str>
>
> -Hoss
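For illustration, the encoding Hoss describes (percent-encoding the UTF-8 bytes of the actual Russian characters, not sending the \uXXXX literals as text) can be produced with the standard java.net.URLEncoder:

import java.net.URLEncoder;

public class EncodeQuery {
    public static void main(String[] args) throws Exception {
        // The \u escapes below are Java source literals; at runtime the
        // string holds the actual Russian characters, and encode() turns
        // them into UTF-8 percent-escapes such as %D0%9F%D1%80...
        String word = "\u041f\u0440\u0435\u0434\u0432\u0430\u0440\u0438\u0442\u0435\u043b\u044c\u043d\u043e\u0435";
        System.out.println(URLEncoder.encode(word, "UTF-8"));
    }
}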
Ordering updates
Hi,

Our CMS is distributed over a cluster, and I was wondering how I can
ensure that index records of newer versions of documents are never
overwritten by older ones. Amazon AWS uses a timestamp on requests to
ensure 'eventual consistency' of operations. Is there a way to supply a
transaction ID with an update so that the update is conditional on the
supplied transaction ID being greater than the existing indexed
transaction ID?

Laurence
Re: Solr 1.3 - response time very long
On Wed, Dec 3, 2008 at 11:49 AM, sunnyfr <[EMAIL PROTECTED]> wrote:
> Sorry, the request is more like:
> /select?q=text:"svr09\+tutorial"+AND+status_published:1+AND+status_moderated:0+AND+status_personal:0+AND+status_explicit:0+AND+status_private:0+AND+status_deleted:0+AND+status_error:0+AND+status_read
> or even I tried:

There are a bunch of things you could try to speed things up a bit:

1) optimize the index if you haven't

2) use a faster response writer with a more compact format (i.e. add
wt=javabin for a binary format or wt=json for JSON)

3) use fl (field list) to restrict the results to only the fields you need

4) never use debugQuery to benchmark performance (I don't think you
actually did, but you did list it in the example dismax URL)

5) pull out clauses that match many documents and that are common across
many queries into filters:

/select?q=text:"svr09\+tutorial"&fq=status_published:1+AND+status_moderated:0+AND+status_personal:0+AND+status_explicit:0+AND+status_private:0+AND+status_deleted:0+AND+status_error:0+AND+status_read

You can also use multiple filter queries for better caching if some of the
clauses appear in smaller groups or in isolation. If you can give more
examples, we can tell what the common parts are.

-Yonik
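As a sketch of the "multiple filter queries" variant in point 5, each clause below becomes its own independently cached fq (the truncated status_read clause from the original query is left out):

/select?q=text:"svr09\+tutorial"&fq=status_published:1&fq=status_moderated:0&fq=status_personal:0&fq=status_explicit:0&fq=status_private:0&fq=status_deleted:0&fq=status_error:0

Split this way, a query that shares only some of the clauses (say, just status_published:1) can still reuse those cached filters instead of missing the cache on one big combined filter.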
Re: Compiling Solr 1.3.0 + KStem
I've experimented with the KStem stuff in the past, and just pulled a
fresh copy of Solr from trunk. It looks like Hoss's suggestion #1 does the
trick, by simply commenting out the super.init call... I loaded the
example data, tested some analysis, and it seems to work as before.

Just a confirmation, and thanks,
rob

On Fri, Nov 28, 2008 at 6:18 PM, Chris Hostetter <[EMAIL PROTECTED]>
wrote:
>
> : /usr/local/build/apache-solr-1.3.0/src/java/org/apache/solr/analysis/KStemFilterFactory.java:63:
> : cannot find symbol
> : [javac] symbol  : method init(org.apache.solr.core.SolrConfig,java.util.Map)
> : [javac] location: class org.apache.solr.analysis.BaseTokenFilterFactory
> : [javac]         super.init(solrConfig, args);
> : [javac]         ^
>
> That KStemFilterFactory seems to be trying to use a method that existed
> for a while on the trunk, but was never released.
>
> I'm not familiar enough with KStemFilterFactory to know why/if it needs
> a SolrConfig, but a few things you can try...
>
> 1) if there are no references to solrConfig anywhere except the init
> method (and the super.init method it calls), just remove the references
> to it (so the methods just deal with the Map)
>
> 2) if there are other references to the solrConfig, they *may* just be
> there to take advantage of ResourceLoader methods, so after making the
> changes above, make KStemFilterFactory "implements ResourceLoaderAware"
> and then add a method like this...
>
> public void inform(ResourceLoader loader) {
>    // code that used solrConfig should go here, but use loader
> }
>
> ...it will get called after the init(Map) method and let
> KStemFilterFactory get access to files on disk.
>
> 3) if that doesn't work ... I don't know what else to try (I'd need to
> get a lot more familiar with KStem to guess)
>
> -Hoss
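For reference, a minimal sketch of what suggestion #1 amounts to inside KStemFilterFactory (a hypothetical reconstruction; the KStem-specific fields and the rest of the class are omitted):

public void init(Map<String, String> args) {
    // The released 1.3 BaseTokenFilterFactory only has init(Map), so the
    // unreleased init(SolrConfig, Map) call is dropped.
    super.init(args);
    // ... existing KStem-specific setup that reads `args` stays here ...
}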
Re: boost fields which are not stored
On Wed, Dec 3, 2008 at 10:25 AM, sunnyfr <[EMAIL PROTECTED]> wrote:
> I would like to know if it's a problem: I have around 50 fields and I
> just need back the id. Do I need to store fields which need to be
> boosted by qf or bf in dismax?

Nope. Searching/querying is completely separate from retrieval of stored
fields for the hits. Index a field you want to search on (or facet by or
sort by), and store a field you want returned back.

> I stored language titles .. and description, and my data folder now is
> 8G; it sometimes takes a long time to get back data with multiple
> threads ... is there a link ...?

If it's really multi-thread related (many query threads executing at
once), the very latest nightly build may help with lock contention while
reading index files (provided you aren't running on Windows):
http://hudson.zones.apache.org/hudson/job/Solr-trunk/lastSuccessfulBuild/artifact/trunk/dist/

A little more about that at
http://yonik.wordpress.com/2008/12/01/solr-scalability-improvements/

-Yonik

> Is it better to store data or not? Should I limit my boosts? Because it
> looks like:
> select/?qt=dismax&fl=id,score,language,title,status_official,stat_views&q=svr09+tutorial&debugQuery=true&qf=title_en+title^1.1+status_official^2.2+status_creative^1.4+description&bf=recip(rord(created),1,10,10)^25+pow(stat_views,0.1)^4
>
> Maybe it's too much .. ???
> Thanks a lot,
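To illustrate the indexed/stored split with two hypothetical schema.xml fields: with fl=id, only the stored id is ever fetched from disk, while title remains fully searchable and boostable via qf:

<!-- returned in results, so stored -->
<field name="id" type="string" indexed="true" stored="true"/>
<!-- searchable and boostable via qf, but never returned, so not stored -->
<field name="title" type="text" indexed="true" stored="false"/>

Leaving the 49 fields you never return unstored keeps the stored-field data (and hence the data directory) much smaller without changing search behavior at all.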