Help creating schema for indexable document

2009-08-07 Thread rossputin

Hi Guys.

I am struggling to create a schema with a deterministic content model for a
set of documents I want to index.

My indexable documents will look something like:


  
1
code1
code2
mycategory
  


My service will be mission critical and will accept batch imports from a
potentially unreliable source.  Are there any XML Schema gurus who can help
me create an XSD that will work with my sample document?

Thanks in advance for your help,

 -- Ross
-- 
View this message in context: 
http://www.nabble.com/Help-creating-schema-for-indexable-document-tp24862700p24862700.html
Sent from the Solr - User mailing list archive at Nabble.com.
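
One way to guard a batch import against malformed input is to check each
document's structure before posting it.  The following is a minimal sketch in
Python, not the XSD asked for above.  It assumes the documents use Solr's
<add><doc><field name="..."> update format, and the field names id, code1,
code2 and category are hypothetical, since the markup of the sample document
did not survive in the archive.

```python
import xml.etree.ElementTree as ET

# Hypothetical field names; the original sample's tags were lost.
REQUIRED_FIELDS = ("id", "code1", "code2", "category")


def validate_batch(xml_text):
    """Return a list of problems found in a Solr-style <add> batch."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return ["not well-formed XML: %s" % exc]

    if root.tag != "add":
        return ["root element is <%s>, expected <add>" % root.tag]

    problems = []
    for index, doc in enumerate(root.findall("doc")):
        names = [field.get("name") for field in doc.findall("field")]
        # Each required field must appear exactly once per document.
        for required in REQUIRED_FIELDS:
            count = names.count(required)
            if count != 1:
                problems.append(
                    "doc %d: field '%s' appears %d times, expected 1"
                    % (index, required, count)
                )
    return problems


SAMPLE = """
<add>
  <doc>
    <field name="id">1</field>
    <field name="code1">code1</field>
    <field name="code2">code2</field>
    <field name="category">mycategory</field>
  </doc>
</add>
"""

print(validate_batch(SAMPLE))
```

An empty list means the batch passed these checks.  A batch with problems can
be rejected before anything is sent to Solr.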



Additional metadata when using Solr Cell

2009-05-14 Thread rossputin

Hi.

I am indexing a PDF document with the ExtractingRequestHandler.  My curl
post has a URL like:

../solr/update/extract?ext.idx.attr=true&ext.def.fl=text&ext.literal.id=123&ext.literal.author=Somebody

Sure enough I see in the server logs:

params={ext.def.fl=text&ext.literal.id=123&ext.idx.attr=true&ext.literal.author=Somebody}

I am trying to get my field back in the results from a query:

../solr/select?indent=on&version=2.2&q=hello&start=0&rows=10&fl=author%2Cscore&qt=standard&wt=standard&explainOther=&hl.fl=

I see the score in the results 'doc' but no reference to author.

Can anyone advise on what I am forgetting to do, to get hold of this field?

Thanks in advance for your help,

 -- Ross
-- 
View this message in context: 
http://www.nabble.com/Additional-metadata-when-using-Solr-Cell-tp23541256p23541256.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Additional metadata when using Solr Cell

2009-05-14 Thread rossputin

There is no reference to the author field I am trying to set.  I am using the
latest nightly download.

 -- Ross


Grant Ingersoll-6 wrote:
> 
> what does /admin/luke show for fields and terms in the fields?
> 
> On May 14, 2009, at 10:03 AM, rossputin wrote:
> 
>>
>> Hi.
>>
>> I am indexing a PDF document with the ExtractingRequestHandler.  My curl
>> post has a URL like:
>>
>> ../solr/update/extract?ext.idx.attr=true&ext.def.fl=text&ext.literal.id=123&ext.literal.author=Somebody
>>
>> Sure enough I see in the server logs:
>>
>> params={ext.def.fl=text&ext.literal.id=123&ext.idx.attr=true&ext.literal.author=Somebody}
>>
>> I am trying to get my field back in the results from a query:
>>
>> ../solr/select?indent=on&version=2.2&q=hello&start=0&rows=10&fl=author%2Cscore&qt=standard&wt=standard&explainOther=&hl.fl=
>>
>> I see the score in the results 'doc' but no reference to author.
>>
>> Can anyone advise on what I am forgetting to do, to get hold of this field?
>>
>> Thanks in advance for your help,
>>
>> -- Ross
>>
> 
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
> 
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
> using Solr/Lucene:
> http://www.lucidimagination.com/search
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Additional-metadata-when-using-Solr-Cell-tp23541256p23541857.html
Sent from the Solr - User mailing list archive at Nabble.com.
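
The Luke check suggested in the quoted reply can be scripted.  This is a
minimal sketch that only builds the request URL.  The base URL
http://localhost:8983/solr is an assumption taken from the curl examples in
later threads, and the helper name luke_url is hypothetical.  fl and numTerms
are parameters of Solr's Luke request handler.

```python
from urllib.parse import urlencode


def luke_url(base_url, field, num_terms=10):
    """Build a /admin/luke URL that reports on a single field."""
    query = urlencode({"fl": field, "numTerms": num_terms})
    return "%s/admin/luke?%s" % (base_url.rstrip("/"), query)


print(luke_url("http://localhost:8983/solr", "author"))
```

If the response lists no author field at all, the literal never reached the
index, which points at the schema rather than at the query.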



Re: Additional metadata when using Solr Cell

2009-05-14 Thread rossputin

There is now, thanks for your help.  On the same topic: is there a best
practice for modifying the schema in a future-proof way?

 -- Ross



Grant Ingersoll-6 wrote:
> 
> Do you have an author field in your schema?
> 
> On May 14, 2009, at 10:31 AM, rossputin wrote:
> 
>>
>> There is no reference to the author field I am trying to set.  I am using the
>> latest nightly download.
>>
>> -- Ross
> 
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Additional-metadata-when-using-Solr-Cell-tp23541256p23542620.html
Sent from the Solr - User mailing list archive at Nabble.com.
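
The fix in this thread was to declare the author field in the schema.  A field
named in fl can only come back in results if it is stored, so a quick check of
the schema can save a round trip.  The sketch below is a simplification: it
looks only at explicit <field> declarations and ignores dynamic fields and
any defaults inherited from the field type.  The schema fragment and its type
names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema fragment; the "string" and "text" type names are assumptions.
SCHEMA = """
<schema name="example" version="1.1">
  <fields>
    <field name="id" type="string" indexed="true" stored="true"/>
    <field name="author" type="string" indexed="true" stored="true"/>
    <field name="text" type="text" indexed="true" stored="false"/>
  </fields>
</schema>
"""


def returnable_fields(schema_xml):
    """Names of <field> declarations explicitly marked stored="true"."""
    root = ET.fromstring(schema_xml)
    return sorted(
        field.get("name")
        for field in root.iter("field")
        if field.get("stored") == "true"
    )


print(returnable_fields(SCHEMA))
```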



highlight results from pdf search

2009-05-30 Thread rossputin

Hi.

I have some PDF documents indexed through Solr Cell.  My highlighting
queries work fine on standard XML doc types, e.g. the samples.  I would now
like to highlight some queries on a PDF document.  Currently for my simple
examples I am just indexing a PDF, providing an id, and an arbitrary
ext.literal.  I would like to be able to get highlighted snippets back from
the extracted content of the PDF.  Is this possible?

Thanks in advance for your help,

 - Ross
-- 
View this message in context: 
http://www.nabble.com/highlight-results-from-pdf-search-tp23791905p23791905.html
Sent from the Solr - User mailing list archive at Nabble.com.
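
Highlighting works on any stored field, so the extracted PDF content can be
highlighted as long as the field it lands in is stored.  The sketch below
builds such a query URL.  It assumes the extracted content goes into a field
named text (the ext.def.fl value used in the earlier Solr Cell thread) and
that this field is declared stored="true" in the schema, which the thread does
not confirm.  The helper name highlight_url is hypothetical.

```python
from urllib.parse import urlencode


def highlight_url(base_url, query, field="text", snippets=2):
    """Build a /select URL that asks for highlighted snippets from one field."""
    params = [
        ("q", query),
        ("fl", "id,score"),
        ("hl", "true"),            # turn highlighting on
        ("hl.fl", field),          # field to take snippets from (must be stored)
        ("hl.snippets", snippets), # maximum snippets per field
    ]
    return "%s/select?%s" % (base_url.rstrip("/"), urlencode(params))


print(highlight_url("http://localhost:8983/solr", "hello"))
```

If the response contains an empty highlighting section for the document, the
stored attribute of the target field is the first thing to check.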



Solr document structure for preserving version information

2009-06-05 Thread rossputin

Hi Guys.

This is a schema design question I suppose.  I would like to store a series
of version elements, each comprising two attributes, 'updated' (a date) and
'reason' (just a simple string).  I aim to produce xml based on a search
which would look something like:


  
  
  


So I realise I could use multiValued fields, but I want to avoid doing
something like:

01/04/2009 10:30:00|changes made (using | or some other
separator)

That would require me to split the field in my code, which does not seem
ideal.  Has anyone got an approach they could share?

Thanks in advance for your help,

 - Ross
-- 
View this message in context: 
http://www.nabble.com/Solr-document-structure-for-preserving-version-information-tp23885262p23885262.html
Sent from the Solr - User mailing list archive at Nabble.com.
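
One possible alternative to a delimited string is a pair of parallel
multiValued fields, paired up again on the client.  This is only a sketch of
that idea: the field names version_updated and version_reason are
hypothetical, and it relies on both fields returning their values in the order
they were added, which is worth verifying against your own index.

```python
def zip_versions(doc):
    """Pair up two parallel multiValued fields from a Solr result document."""
    updated = doc.get("version_updated", [])
    reasons = doc.get("version_reason", [])
    if len(updated) != len(reasons):
        raise ValueError(
            "parallel fields differ in length: %d vs %d"
            % (len(updated), len(reasons))
        )
    return [{"updated": u, "reason": r} for u, r in zip(updated, reasons)]


# Hypothetical result document with two versions.
doc = {
    "id": "1",
    "version_updated": ["01/04/2009 10:30:00", "02/04/2009 09:00:00"],
    "version_reason": ["changes made", "further changes"],
}

for version in zip_versions(doc):
    print(version["updated"], version["reason"])
```

The length check catches the case where one value was indexed without its
partner, which a delimited string would otherwise have made impossible.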



posting binary file and metadata in two separate documents

2009-07-07 Thread rossputin

Hi.

I am currently using Solr Cell to extract content from binary files, and I
am passing along some additional metadata with ext.literal params. Sample
below:

curl
"http://localhost:8983/solr/update/extract?ext.literal.id=2&ext.literal.some_code1=code1&ext.literal.some_code2=code2&ext.idx.attr=true\&ext.def.fl=text";
-F "myfi...@myfile.pdf"

With large numbers of ext.literal params this becomes a bit of a
chore, and the same would be true of an HTML form with many params.
Can I pass both the binary file and a metadata file to '/update/extract' as
linked documents?  Or are there any other options like this?  Perhaps
something I can do with SolrJ.

Thanks in advance for your help,

regards,

Ross.


-- 
View this message in context: 
http://www.nabble.com/posting-binary-file-and-metadata-in-two-separate-documents-tp24375649p24375649.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: posting binary file and metadata in two separate documents

2009-07-10 Thread rossputin

Hi.

Apologies for bumping this one, but another question occurred to me: is
there a limit to the number of &ext.literal components I can put in my curl
command?  If so, I will definitely need to find another way to get this
data in, as I am building up relationships between documents, and there will
be many of them.

Thanks in advance for your help,

regards,

Ross




-- 
View this message in context: 
http://www.nabble.com/posting-binary-file-and-metadata-in-two-separate-documents-tp24375649p24423267.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: posting binary file and metadata in two separate documents

2009-07-17 Thread rossputin

Hi.

Thanks for your reply, shame nobody has already implemented the multiple
'ContentStreams' idea :-)
With regard to posting in a form, I had considered that, but unfortunately
there can be an arbitrary number of 'ext.literals', so it would be difficult
to build a form which would handle all cases.

Regards,

 -- Ross


hossman wrote:
> 
> 
> : Subject: posting binary file and metadata in two separate documents
> 
> there was some discussion a while back about the fact that you can push
> multiple "ContentStreams" to Solr in a single request, and while the
> existing handlers all just iterate over and process them separately, it
> would be *possible* for a variant of ExtractingRequestHandler to use the
> first stream to get document metadata, and have that metadata reference the
> other streams in some way (for large chunks of text)
> 
> But no one has attempted to implement that as far as i know.
> 
> :
> "http://localhost:8983/solr/update/extract?ext.literal.id=2&ext.literal.some_code1=code1&ext.literal.some_code2=code2&ext.idx.attr=true\&ext.def.fl=text";
> : -F "myfi...@myfile.pdf"
> : 
> : Where I have large numbers of ext.literal params this becomes a bit of a
> : chore.. and it would be the same case in an html form with many params...
> : can I pass both files to '/update/extract' as documents, (files) linked
> : together?  Or are there any other options like this?  Perhaps something I
> : can do with Solrj.
> 
> there's no reason those params have to be in the URL.  you can do a
> multipart POST with application/x-www-form-urlencoded in one part and your 
> pdf file in another part (just like doing a POST from a massive HTML form 
> with an '' option)
> 
> 
> -Hoss
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/posting-binary-file-and-metadata-in-two-separate-documents-tp24375649p24530051.html
Sent from the Solr - User mailing list archive at Nabble.com.
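
An arbitrary number of literals does not need a hand-built form: the curl
arguments can be generated from a mapping.  The sketch below is a variation on
the multipart suggestion quoted above.  It sends each literal as its own form
field rather than as one urlencoded part, on the assumption that Solr treats
plain multipart form fields as request parameters.  The form field name myfile
and the helper name extract_curl_args are hypothetical.  curl's --form-string
sends the value literally, so a value starting with '@' is not read as a file.

```python
def extract_curl_args(url, literals, file_field, file_path):
    """Build a curl argument list that posts literals in the request body."""
    args = ["curl", url]
    for name, value in literals.items():
        # One form field per literal; nothing is added to the URL.
        args += ["--form-string", "ext.literal.%s=%s" % (name, value)]
    # The binary file goes in its own part of the same multipart request.
    args += ["-F", "%s=@%s" % (file_field, file_path)]
    return args


args = extract_curl_args(
    "http://localhost:8983/solr/update/extract",
    {"id": "2", "some_code1": "code1", "some_code2": "code2"},
    "myfile",
    "myfile.pdf",
)
print(" ".join(args))
```

The list can be passed to subprocess.run(args) as it stands.  Keeping the
parameters in the body also sidesteps any URL length limit in the servlet
container.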



post error - ERROR:unknown field 'title'

2009-07-20 Thread rossputin

Hi guys.

I have two different Solr versions as I am evaluating nightly builds.  On a
more recent one (I think 15th July) I am getting the following error:

ERROR:unknown field 'title'

I am posting to 'solr/update/extract' with the following:

curl
"http://localhost:8983/solr/update/extract?ext.literal.id=1&ext.literal.code=somecode&ext.literal.url=someurl/file.pdf&ext.literal.category=somecat&ext.literal.updated=2009-06-01T09:10:30.000Z&ext.idx.attr=true\&ext.def.fl=text";
-F "myfi...@1411_9.pdf"

My schema does not, and is not intended to contain a 'title' field.

Thanks in advance for your help,

 -- Ross
-- 
View this message in context: 
http://www.nabble.com/post-error---ERROR%3Aunknown-field-%27title%27-tp24567235p24567235.html
Sent from the Solr - User mailing list archive at Nabble.com.
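
One thing that may be worth trying, offered as a guess and not as a
diagnosis: the Solr Cell parameter names that later shipped in Solr 1.4 differ
from the ext.* names used above (literal.<field>, defaultField and uprefix).
If the newer nightly already uses those names, which this thread does not
confirm, the ext.* params would be ignored and metadata extracted from the PDF,
such as title, would be indexed under its own name.  The sketch below builds a
URL in the released style.  The ignored_ prefix assumes a matching
dynamicField such as ignored_* exists in the schema, and the helper name
extract_url is hypothetical.

```python
from urllib.parse import urlencode


def extract_url(base_url, literals, unknown_prefix="ignored_", default_field="text"):
    """Build an /update/extract URL using the Solr 1.4 style parameter names."""
    params = [("literal.%s" % name, value) for name, value in literals.items()]
    # Fields not declared in the schema get this prefix instead of failing.
    params.append(("uprefix", unknown_prefix))
    params.append(("defaultField", default_field))
    return "%s/update/extract?%s" % (base_url.rstrip("/"), urlencode(params))


print(extract_url("http://localhost:8983/solr", {"id": "1", "category": "somecat"}))
```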