Help creating schema for indexable document
Hi Guys.

I am struggling to create a schema with a deterministic content model for a set of documents I want to index. My indexable documents will look something like:

1 code1 code2 mycategory

My service will be mission critical and will accept batch imports from a potentially unreliable source. Are there any XML Schema gurus who can help me create an XSD which will work with my sample document?

Thanks in advance for your help,

-- Ross

-- View this message in context: http://www.nabble.com/Help-creating-schema-for-indexable-document-tp24862700p24862700.html Sent from the Solr - User mailing list archive at Nabble.com.
Additional metadata when using Solr Cell
Hi.

I am indexing a PDF document with the ExtractingRequestHandler. My curl post has a URL like:

../solr/update/extract?ext.idx.attr=true&ext.def.fl=text&ext.literal.id=123&ext.literal.author=Somebody

Sure enough, I see in the server logs:

params={ext.def.fl=text&ext.literal.id=123&ext.idx.attr=true&ext.literal.author=Somebody}

I am trying to get my field back in the results from a query:

../solr/select?indent=on&version=2.2&q=hello&start=0&rows=10&fl=author%2Cscore&qt=standard&wt=standard&explainOther=&hl.fl=

I see the score in the results 'doc' but no reference to author.

Can anyone advise on what I am forgetting to do to get hold of this field?

Thanks in advance for your help,

-- Ross

-- View this message in context: http://www.nabble.com/Additional-metadata-when-using-Solr-Cell-tp23541256p23541256.html
Re: Additional metadata when using Solr Cell
There is no reference to the author field I am trying to set. I am using the latest nightly download.

-- Ross

Grant Ingersoll-6 wrote:
> what does /admin/luke show for fields and terms in the fields?
>
> On May 14, 2009, at 10:03 AM, rossputin wrote:
>> [...]
>
> --
> Grant Ingersoll
> http://www.lucidimagination.com/

-- View this message in context: http://www.nabble.com/Additional-metadata-when-using-Solr-Cell-tp23541256p23541857.html
Re: Additional metadata when using Solr Cell
There is now, thanks for your help. On the same topic: is there a best practice for modifying the schema in a future-proof way?

-- Ross

Grant Ingersoll-6 wrote:
> Do you have an author field in your schema?
>
> On May 14, 2009, at 10:31 AM, rossputin wrote:
>> [...]

-- View this message in context: http://www.nabble.com/Additional-metadata-when-using-Solr-Cell-tp23541256p23542620.html
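The exchange above comes down to whether every `ext.literal.*` parameter names a field that the schema actually declares. A minimal sketch of that check follows; the schema fragment is hypothetical (it is not the poster's schema), the helper names are made up for illustration, and `dynamicField` patterns are deliberately ignored.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qsl

# Hypothetical schema.xml fragment for illustration: only 'id' and 'text' are declared.
SCHEMA_XML = """
<schema name="example" version="1.1">
  <fields>
    <field name="id" type="string" indexed="true" stored="true"/>
    <field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
  </fields>
</schema>
"""

def declared_fields(schema_xml):
    """Return the names of all <field> elements declared in the schema."""
    root = ET.fromstring(schema_xml)
    return {field.get("name") for field in root.iter("field")}

def literal_fields(query_string, prefix="ext.literal."):
    """Return the field names targeted by ext.literal.* request parameters."""
    return [key[len(prefix):] for key, _ in parse_qsl(query_string)
            if key.startswith(prefix)]

def missing_literals(query_string, schema_xml):
    """List literal field names that have no matching <field> declaration."""
    declared = declared_fields(schema_xml)
    return [name for name in literal_fields(query_string) if name not in declared]

# Query string taken from the post above.
QUERY = "ext.idx.attr=true&ext.def.fl=text&ext.literal.id=123&ext.literal.author=Somebody"
print(missing_literals(QUERY, SCHEMA_XML))
```

With this hypothetical schema the check reports `author` as undeclared, which matches the symptom in the thread. A field also has to be stored to come back in the `fl` list of a query.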
highlight results from pdf search
Hi.

I have some PDF documents indexed through Solr Cell. My highlighting queries work fine on standard XML doc types, e.g. the samples. I would now like to highlight some queries on a PDF document. Currently, for my simple examples, I am just indexing a PDF, providing an id and an arbitrary ext.literal. I would like to be able to get highlighted snippets back from the extracted content of the PDF. Is this possible?

Thanks in advance for your help,

- Ross

-- View this message in context: http://www.nabble.com/highlight-results-from-pdf-search-tp23791905p23791905.html
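For the highlighting question above, a request of roughly the following shape is one thing to try. This is only a sketch: it assumes the extracted PDF content lands in a field called `text` (as with `ext.def.fl=text` in the earlier posts) and that this field is declared `stored="true"`, since the highlighter needs stored content to build snippets. The function name and the default values are illustrative.

```python
from urllib.parse import urlencode

def highlight_query_url(base, query, field="text", snippets=2):
    """Build a /select URL that asks Solr for highlighted snippets.

    'field' is assumed to hold the extracted PDF content and to be stored.
    """
    params = [
        ("q", query),
        ("fl", "id,score"),
        ("hl", "true"),                 # switch highlighting on
        ("hl.fl", field),               # field(s) to take snippets from
        ("hl.snippets", str(snippets)), # maximum snippets per field
    ]
    return base + "/select?" + urlencode(params)

print(highlight_query_url("http://localhost:8983/solr", "hello"))
```

If the snippets come back empty while the document still matches, the stored setting of the content field is the first thing worth checking.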
Solr document structure for preserving version information
Hi Guys.

This is a schema design question, I suppose. I would like to store a series of version elements, each comprising two attributes: 'updated' (a date) and 'reason' (just a simple string). I aim to produce XML based on a search which would look something like:

So I realise I could use multiValued fields, but I want to avoid doing something like:

01/04/2009 10:30:00|changes made

(using | or some other separator), as I would need to split the field in my code. This approach does not seem the best. Has anyone got an approach they could share?

Thanks in advance for your help,

- Ross

-- View this message in context: http://www.nabble.com/Solr-document-structure-for-preserving-version-information-tp23885262p23885262.html
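One alternative to a separator-delimited value is a pair of parallel multiValued fields that are matched up by position. The sketch below shows only the client-side pairing; the field names `version_updated` and `version_reason` are hypothetical, the sample values are invented, and it relies on the assumption that Solr returns the values of a stored multiValued field in the order they were added.

```python
from typing import NamedTuple

class Version(NamedTuple):
    updated: str  # date string in whatever format the date field expects
    reason: str

def to_solr_fields(versions):
    """Split version records into two parallel multiValued field lists."""
    return {
        "version_updated": [v.updated for v in versions],
        "version_reason": [v.reason for v in versions],
    }

def from_solr_fields(doc):
    """Rebuild version records by pairing the two lists position by position."""
    updated = doc.get("version_updated", [])
    reason = doc.get("version_reason", [])
    if len(updated) != len(reason):
        raise ValueError("version fields are out of step")
    return [Version(u, r) for u, r in zip(updated, reason)]

# Hypothetical sample data.
history = [
    Version("2009-06-01T09:10:30Z", "changes made"),
    Version("2009-06-02T14:00:00Z", "typo fixed"),
]
doc = to_solr_fields(history)
print(from_solr_fields(doc) == history)
```

The trade-off is that both fields must always be written together; the length check catches documents where they have drifted apart.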
posting binary file and metadata in two separate documents
Hi.

I am currently using Solr Cell to extract content from binary files, and I am passing along some additional metadata with ext.literal params. Sample below:

curl "http://localhost:8983/solr/update/extract?ext.literal.id=2&ext.literal.some_code1=code1&ext.literal.some_code2=code2&ext.idx.attr=true\&ext.def.fl=text" -F "myfi...@myfile.pdf"

Where I have large numbers of ext.literal params this becomes a bit of a chore, and it would be the same in an HTML form with many params. Can I pass both files to '/update/extract' as documents (files) linked together? Or are there any other options like this? Perhaps something I can do with Solrj.

Thanks in advance for your help,

regards,

Ross.

-- View this message in context: http://www.nabble.com/posting-binary-file-and-metadata-in-two-separate-documents-tp24375649p24375649.html
Re: posting binary file and metadata in two separate documents
Hi.

Apologies for bumping this one, but another question occurred to me: is there a limit to the number of &ext.literal components I can put in my curl command? If so, I will definitely need to find another way to get this data in, as I am building up relationships between documents, and there will be many of them.

Thanks in advance for your help,

regards,

Ross

rossputin wrote:
> [...]

-- View this message in context: http://www.nabble.com/posting-binary-file-and-metadata-in-two-separate-documents-tp24375649p24423267.html
Re: posting binary file and metadata in two separate documents
Hi.

Thanks for your reply. It's a shame nobody has already implemented the multiple 'ContentStreams' idea :-)

With regard to posting in a form, I had considered that, but unfortunately there can be an arbitrary number of 'ext.literal' params, so it would be difficult to build a form which would handle all cases.

Regards,

-- Ross

hossman wrote:
> : Subject: posting binary file and metadata in two separate documents
>
> there was some discussion a while back about the fact that you can push
> multiple "ContentStreams" to Solr in a single request, and while the
> existing handlers all just iterate over and process them separately, it
> would be *possible* for a variant of ExtractingRequestHandler to use the
> first stream to get document metadata, and have that metadata reference the
> other streams in some way (for large chunks of text)
>
> But no one has attempted to implement that as far as i know.
>
> : [...]
>
> there's no reason those params have to be in the URL. you can do a
> multipart POST with application/x-www-form-urlencoded in one part and your
> pdf file in another part (just like doing a POST from a massive HTML form
> with an '' option)
>
> -Hoss

-- View this message in context: http://www.nabble.com/posting-binary-file-and-metadata-in-two-separate-documents-tp24375649p24530051.html
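Because the request body can be generated by a program rather than a hand-built HTML form, an arbitrary number of literals need not be a problem. The sketch below builds a `multipart/form-data` body with one part per parameter plus one file part, using only the Python standard library. It is an illustration under assumptions, not a tested Solr client: the function name, the file field name `myfile`, the placeholder file bytes, and the fixed boundary are all made up for the example.

```python
import uuid

def build_multipart(fields, file_field, filename, file_bytes,
                    content_type="application/pdf", boundary=None):
    """Encode (name, value) pairs plus one file as a multipart/form-data body."""
    boundary = boundary or uuid.uuid4().hex
    crlf = b"\r\n"
    delimiter = b"--" + boundary.encode("ascii")
    parts = []
    # One form-data part per request parameter, however many there are.
    for name, value in fields:
        parts.append(delimiter + crlf)
        parts.append(f'Content-Disposition: form-data; name="{name}"'.encode("utf-8"))
        parts.append(crlf + crlf + str(value).encode("utf-8") + crlf)
    # The binary file goes in its own part with a filename and content type.
    parts.append(delimiter + crlf)
    parts.append(
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"'
        .encode("utf-8") + crlf
    )
    parts.append(f"Content-Type: {content_type}".encode("ascii") + crlf + crlf)
    parts.append(file_bytes + crlf)
    parts.append(delimiter + b"--" + crlf)  # closing delimiter
    body = b"".join(parts)
    headers = {
        "Content-Type": f"multipart/form-data; boundary={boundary}",
        "Content-Length": str(len(body)),
    }
    return body, headers

# Parameters from the curl example in this thread; the list can grow freely.
literals = [
    ("ext.literal.id", "2"),
    ("ext.literal.some_code1", "code1"),
    ("ext.literal.some_code2", "code2"),
    ("ext.idx.attr", "true"),
    ("ext.def.fl", "text"),
]
body, headers = build_multipart(literals, "myfile", "myfile.pdf",
                                b"%PDF-1.4 placeholder", boundary="example-boundary")
print(headers["Content-Type"])
```

The resulting `body` and `headers` could then be passed to `urllib.request.Request` for a POST to `/update/extract`. Moving the parameters into the body also sidesteps any URL length limit in the client or the servlet container.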
post error - ERROR:unknown field 'title'
Hi guys.

I have two different Solr versions, as I am evaluating nightly builds. On a more recent one (I think 15th July) I am getting the following error:

ERROR:unknown field 'title'

I am posting to 'solr/update/extract' with the following:

curl "http://localhost:8983/solr/update/extract?ext.literal.id=1&ext.literal.code=somecode&ext.literal.url=someurl/file.pdf&ext.literal.category=somecat&ext.literal.updated=2009-06-01T09:10:30.000Z&ext.idx.attr=true\&ext.def.fl=text" -F "myfi...@1411_9.pdf"

My schema does not contain a 'title' field, and is not intended to.

Thanks in advance for your help,

-- Ross

-- View this message in context: http://www.nabble.com/post-error---ERROR%3Aunknown-field-%27title%27-tp24567235p24567235.html