Re: questions about autocommit & committing documents
Hi Andy,

Andy-152 wrote:
>
> <autoCommit>
>   <maxDocs>10000</maxDocs>
>   <maxTime>1000</maxTime>
> </autoCommit>
>
> has been commented out.
>
> - With <autoCommit> commented out, does it mean that every new document
> indexed to Solr is being auto-committed individually? Or that they are
> not being auto-committed at all?

I am not sure whether there is a default value, but if not, commenting it
out would mean that you have to send a commit explicitly.

> - If I enable <autoCommit> and set <maxDocs> at 10000, does it mean that
> my new documents won't be available for searching until 10,000 new
> documents have been added?

Yes, that's correct. However, you can do a commit explicitly, if you want
to do so.

> - When I add a new document to Solr, do I need to call commit
> explicitly? If so, how do I do that?
> I looked at the Solr tutorial
> (http://lucene.apache.org/solr/tutorial.html); the command used to index
> documents (java -jar post.jar solr.xml monitor.xml) doesn't include any
> explicit call to commit the documents. So I'm not sure if it's
> necessary.
>
> Thanks

Committing is necessary: an added document is not visible at query time
until a commit has been issued.
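For reference, a sketch of how that section of the example solrconfig.xml
looks (the values shown are the example defaults discussed above, so
check your own config):

    <!-- Remove the comment markers around <autoCommit> to enable
         autocommit: Solr then commits automatically after maxDocs
         pending documents or after maxTime milliseconds, whichever
         comes first. -->
    <!--
    <autoCommit>
      <maxDocs>10000</maxDocs>
      <maxTime>1000</maxTime>
    </autoCommit>
    -->

Kind regards,
Mitch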
Re: questions about autocommit & committing documents
Thanks Mitch. How do I do an explicit commit?

Andy

--- On Sun, 9/26/10, MitchK wrote:
> [full reply quoted above]
Re: questions about autocommit & committing documents
First: usually you do not use post.jar for updating your index. It's a
simple tool; normally you use features like the CSV or XML update request
handlers. Have a look at "UpdateCSV" and "UpdateXmlMessages" in the wiki;
there you can find examples of how to commit explicitly.

With post.jar you need to either set -Dcommit=yes or append a "<commit/>"
message, I think.

Hope this helps.
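For example, with the XML update handler you can send the commit over
HTTP (assuming the default example URL and port):

    curl http://localhost:8983/solr/update \
         -H 'Content-Type: text/xml' --data-binary '<commit/>'

The CSV handler also accepts a commit=true request parameter, so you can
commit in the same request that uploads the data.

- Mitch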
RE: Concurrent DB updates and delta import misses few records
You could store the last indexed ID in the DB. Implement the delta import
as a stored procedure that saves the last imported ID in the DB. On
subsequent delta imports, use the deltaQuery to get that ID from the DB
and use it in the deltaImportQuery. See
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201009.mbox/%3c9f8b39cb3b7c6d4594293ea29ccf438b0174c...@icq-mail.icq.il.office.aol.com%3e

Ephraim Ofir

-----Original Message-----
From: Shashikant Kore [mailto:shashik...@gmail.com]
Sent: Thursday, September 23, 2010 8:48 AM
To: solr-user@lucene.apache.org
Subject: Re: Concurrent DB updates and delta import misses few records

Thanks for the pointer, Shawn. It definitely is useful.

I am wondering if you could retrieve minDid from Solr rather than storing
it externally. The max id from the Solr index and the max id from the DB
should define the lower and upper thresholds, respectively, of the delta
range. Am I missing something?

--shashi

On Wed, Sep 22, 2010 at 6:47 PM, Shawn Heisey wrote:
> On 9/22/2010 1:39 AM, Shashikant Kore wrote:
>> Hi,
>>
>> I'm using DIH to index records from a database. After every update on
>> the (MySQL) DB, Solr DIH is invoked for a delta import. In my tests, I
>> have observed that if DB updates and the DIH import happen
>> concurrently, the import misses a few records.
>>
>> Here is how it happens.
>>
>> The table has a column 'lastUpdated' which has a default value of the
>> current timestamp. Many records are added to the database in a single
>> transaction that takes several seconds. For example, if 10,000 rows
>> are being inserted, the rows may get timestamp values from '2010-09-20
>> 18:21:20' to '2010-09-20 18:21:26'. These rows become visible only
>> after the transaction is committed. That happens at, say, '2010-09-20
>> 18:21:30'.
>>
>> If the Solr import gets triggered at '18:21:29', it will use the
>> timestamp of the last import for the delta query. This import will not
>> see the records added in the aforementioned transaction, as the
>> transaction was not committed at that instant. After this import,
>> dataimport.properties will have a last index time of '18:21:29'. The
>> next import will not be able to get all the rows of the previously
>> mentioned transaction, as some of the rows have timestamps earlier
>> than '18:21:29'.
>>
>> While I am testing extreme conditions, there is a possibility of
>> missing out on some data.
>>
>> I could not find any solution in the Solr framework to handle this.
>> The table has an auto-increment key, and all updates are deletes
>> followed by inserts. So, having a last_indexed_id would have helped,
>> where last_indexed_id is the max value of id fetched in that import.
>> The query would then become "Select id where id > last_indexed_id."
>> I suppose Solr does not have any provision like this.
>>
>> Two options I could think of are:
>> (a) Ensure at the application level that DB updates and DIH import
>> requests do not run concurrently.
>> (b) Use exclusive locking during DB updates.
>>
>> What is the best way to address this problem?
>
> Shashi,
>
> I was not solving the same problem, but perhaps you can adapt my
> solution to yours. My main problem was that I don't have a modified
> date in my database, and due to the size of the table, it is
> impractical to add one. Instead, I chose to track the database primary
> key (a simple autoincrement) outside of Solr and pass min/max values
> into DIH for it to use in the SELECT statement.
>
> You can see a simplified version of my entity here, with a URL showing
> how to send the parameters in via the dataimport GET:
>
> http://www.mail-archive.com/solr-user@lucene.apache.org/msg40466.html
>
> The update script that runs every two minutes gets MAX(did) from the
> database, retrieves the minDid from a file on an NFS share, and runs a
> delta-import with those two values. When the import is reported
> successful, it writes the maxDid value to the minDid file on the
> network share for the next run. If the import fails, it sends an alarm
> and doesn't update the minDid.
>
> Shawn
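To make the bookmark approach concrete, a rough sketch of such a DIH
entity (table, column, and field names here are made up for
illustration):

    <!-- data-config.xml (sketch): deltaQuery returns the single bookmark
         row maintained by the stored procedure; deltaImportQuery then
         pulls every row added since that ID. -->
    <entity name="item" pk="id"
            query="SELECT id, title FROM item"
            deltaQuery="SELECT last_indexed_id AS id FROM dih_bookmark"
            deltaImportQuery="SELECT id, title FROM item
                              WHERE id &gt; ${dataimporter.delta.id}"/>

After a successful import, the stored procedure (or the calling script)
updates dih_bookmark with MAX(id), much as Shawn's script rewrites the
minDid file.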
how are you using Solr?
I am trying to understand the breadth of its usage!

I am from Finance, and I am using it for content/material search.
Initially we were storing the content in the database, but we had
performance issues with the search, so later on we moved to Solr.

How about you? Why did you choose Solr, and what business stream are you
using it in?
Re: how are you using Solr?
Custom search engine in stealth mode. Will be going 'private alpha' near
the end of the year.

Dennis Gearon

Signature Warning: EARTH has a Right To Life, otherwise we all die.
Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php

--- On Sun, 9/26/10, Girish Pandit wrote:
> [original question quoted above]
RE: how are you using Solr?
http://wiki.apache.org/solr/PublicServers
http://www.lucidimagination.com/developer/Community/Application-Showcase-Wiki

-----Original message-----
From: Girish Pandit
Sent: Sun 26-09-2010 14:16
To: solr-user@lucene.apache.org
Subject: how are you using Solr?

[original question quoted above]
Re: how are you using Solr?
We are building a knowledge networking app powered by Solr. Right now in
alpha - will be in beta by the end of the year.

www.bibkosh.com

Markus Jelsma wrote:
> http://wiki.apache.org/solr/PublicServers
> http://www.lucidimagination.com/developer/Community/Application-Showcase-Wiki
> [original question quoted above]
Re: TokenFilter that removes payload ?
Erick,

On Sep 26, 2010, at 8:04 AM, Erick Erickson wrote:
> The reason I ask is that you had to put the payloads into the
> input in the first place, and they don't affect searching unless
> you want them to. So why do you want to remove them
> with a token filter?

Our Tokenizer puts a part-of-speech tag into each Token as a payload.
There is an accompanying TokenFilter, later in the analysis chain, that
removes Tokens marked with a configurable set of part-of-speech tags.

As I understand it, payloads go into the Lucene index. In most cases the
part-of-speech tags are not used by the search applications, so they
shouldn't go into the index. I'd like to know if there is an existing
TokenFilter that strips payloads; otherwise, I'd like to write one.
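A minimal sketch of what I have in mind (the class name is made up, and
it assumes Lucene's PayloadAttribute API):

    import java.io.IOException;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;

    /**
     * Sketch: clears payloads so they are not written to the index.
     * It must run after any filter that still needs to read the payload.
     */
    public final class StripPayloadsFilter extends TokenFilter {
      private final PayloadAttribute payloadAtt =
          addAttribute(PayloadAttribute.class);

      public StripPayloadsFilter(TokenStream input) {
        super(input);
      }

      @Override
      public boolean incrementToken() throws IOException {
        if (!input.incrementToken()) {
          return false;
        }
        payloadAtt.setPayload(null); // drop the part-of-speech payload
        return true;
      }
    }

T. "Kuro" Kurosaka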
Re: TokenFilter that removes payload ?
On Sun, Sep 26, 2010 at 11:49 PM, Teruhiko Kurosaka wrote:
>
> As I understand it, payloads go into the Lucene index. In most cases
> the part-of-speech tags are not used by the search applications, so
> they shouldn't go into the index. I'd like to know if there is an
> existing TokenFilter that strips payloads; otherwise, I'd like to
> write one.

I agree with Erick; I think a better approach would be to put the
part-of-speech tags into another attribute. For example, you can put them
in TypeAttribute, which is not stored in the index by default. Then, if
users want to store them in the index, they just add
TypeAsPayloadTokenFilterFactory, which copies the type into the
payload... but otherwise the tags would not be stored.
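For instance, a field type along these lines (a sketch; the tokenizer
factory name is hypothetical, standing in for your POS-tagging
tokenizer):

    <fieldType name="text_pos" class="solr.TextField">
      <analyzer>
        <!-- hypothetical tokenizer that sets the POS tag as the
             token type -->
        <tokenizer class="com.example.PosTaggingTokenizerFactory"/>
        <!-- include this line only if you want the POS tags written
             to the index as payloads -->
        <filter class="solr.TypeAsPayloadTokenFilterFactory"/>
      </analyzer>
    </fieldType>

--
Robert Muir
rcm...@gmail.com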