On Mon, Jun 6, 2011 at 1:47 PM, Naveen Gupta <nkgiit...@gmail.com> wrote:
> Hi Tomas,
>
> 1. Regarding SolrInputDocument,
>
> We are not using the Java client, rather we are using PHP Solr,
> wrapping content in SolrInputDocument. I am not sure how to do this in
> the PHP client? In this case, we need Tika-related jars to obtain the
> metadata such as content .. we certainly don't want to handle all these
> things in the PHP client.

I don't understand, Tika IS integrated in Solr, it doesn't matter which
client or client language you are using. To add a static value, all you
have to do is add it as a request parameter with the prefix "literal",
something like "literal.somefield=thevalue". Content and other file
metadata such as author etc. (see
http://wiki.apache.org/solr/ExtractingRequestHandler#Metadata) will be
added to the document inside Solr and indexed. You don't need to handle
this in the client application.

> Secondly, what I was asking about was the commit strategy --
>
> suppose you have 100 docs
>
> iterate over 99 docs and fire curl without commit in the url
>
> and for the 100th doc, use commit ....
>
> so doing so, will it also update the indexes for the last 99 docs ....
>
> while (up to 99) {
>     curl_command = url without commit;
> }
>
> when i = 100, url would include commit

You can certainly do this. The 100 documents will be available for
search after the commit. None of the documents will be available for
search before the commit.

> i wanted to achieve something similar to an optimize kind of thing ....

The optimize command should be issued when not many queries or updates
are being sent to the index. It uses lots of resources and will slow
down queries.

> why are these kinds of general-purpose use cases not included in the
> example (especially for other languages ... Java guys can easily do
> this using the API)?

They are: you can use the auto-commit feature, configured in the
solrconfig.xml file. You can either tell Solr to commit on a time
interval or when a certain number of documents have been updated but not
committed. In the example file, autocommit is commented out, but you can
uncomment it.
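For reference, the relevant block in the example solrconfig.xml looks
roughly like this (quoting from memory, so the exact default values and
comments in your copy may differ):

  <updateHandler class="solr.DirectUpdateHandler2">
    <!-- Uncomment to commit automatically, either after maxDocs
         uncommitted updates or when the oldest uncommitted update is
         maxTime milliseconds old. -->
    <!--
    <autoCommit>
      <maxDocs>10000</maxDocs>
      <maxTime>1000</maxTime>
    </autoCommit>
    -->
  </updateHandler>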
> I am basically a Java guy, so I can feel the problem.
>
> Thanks
> Naveen
>
> 2011/6/6 Tomás Fernández Löbbe <tomasflo...@gmail.com>
>
> > 1. About the commit strategy: all the ExtractingRequestHandler (the
> > request handler that uses Tika to extract content from the input
> > file) will do is extract the content of your file and add it to a
> > SolrInputDocument. The commit strategy should not change because of
> > this, compared to other documents you might be indexing. It is
> > usually not recommended to commit on every new / updated document.
> >
> > 2. Don't know if I understand the question. You can add all the
> > static fields you want to the document by adding the "literal."
> > prefix to the names of the fields when using ExtractingRequestHandler
> > (as you are doing with "literal.id"). You can also leave fields empty
> > if they are not marked as "required" in the schema.xml file. See:
> > http://wiki.apache.org/solr/ExtractingRequestHandler#Literals
> >
> > 3. Solr cores can work almost as completely different Solr instances.
> > You could tell one core to replicate from another core, but I don't
> > think this would be of any help here. If you want to separate the
> > indexing operations from the query operations, you could probably use
> > different machines; that's usually a better option. Configure the
> > indexing box as master and the query box as slave. Here you have some
> > more information about it:
> > http://wiki.apache.org/solr/SolrReplication
> >
> > Were these the answers you were looking for, or did I misunderstand
> > your questions?
> >
> > Tomás
> >
> > On Mon, Jun 6, 2011 at 2:54 AM, Naveen Gupta <nkgiit...@gmail.com>
> > wrote:
> >
> > > Hi
> > >
> > > Since it is PHP, we are using solphp for making curl-based calls.
> > >
> > > My concern here is that for each user, we might have 20-40
> > > attachments that need to be indexed each day, and there are various
> > > users .. daily we are targeting around 500-1000 users ..
> > >
> > > Right now, this is what we do:
> > >
> > > <?php
> > > $ch = curl_init('http://localhost:8010/solr/update/extract?literal.id=doc2&commit=true');
> > > curl_setopt($ch, CURLOPT_POST, 1);
> > > curl_setopt($ch, CURLOPT_POSTFIELDS, array('myfile' => "@paper.pdf"));
> > > $result = curl_exec($ch);
> > > ?>
> > >
> > > We are also planning to use other fields which are to be indexed
> > > and stored ...
> > >
> > > There are a couple of questions here:
> > >
> > > 1. What would be the best strategy for commits? If we take all the
> > > documents in an array, iterate over them one by one firing the curl
> > > call, and commit only for the last doc, will it work, or do we need
> > > to commit for each doc?
> > >
> > > 2. We have several fields which are already defined in the schema,
> > > and a few of them were required earlier, but for this purpose we
> > > don't want them required. How do we have the two requirements
> > > together in the same schema?
> > >
> > > 3. Since commits are frequent, how do we use Solr multicore for
> > > write and read operations separately?
> > >
> > > Thanks
> > > Naveen
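To make the commit-on-last-document idea concrete, here is a rough,
untested PHP sketch based on the curl snippet above. The file list, the
"literal.author" field and the URL are placeholders, not something from
your actual setup; adapt them to your schema:

<?php
// Index a batch of files through the ExtractingRequestHandler and ask
// for a commit only on the last document of the batch. Placeholder
// data -- replace with your real files and field values.
$files = array('paper1.pdf', 'paper2.pdf', 'paper3.pdf');
$total = count($files);

foreach ($files as $i => $file) {
    // Only request a commit for the last document.
    $commit = ($i == $total - 1) ? 'true' : 'false';

    $url = 'http://localhost:8010/solr/update/extract'
         . '?literal.id=doc' . $i          // unique id per document
         . '&literal.author=someauthor'    // static field via "literal."
         . '&commit=' . $commit;

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_POST, 1);
    // Old-style curl file upload: the "@" prefix sends the file contents.
    curl_setopt($ch, CURLOPT_POSTFIELDS, array('myfile' => '@' . $file));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $result = curl_exec($ch);
    curl_close($ch);
}
?>

This way Solr performs a single commit for the whole batch, and all of
the documents become searchable at that point, which is much cheaper
than committing once per document.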