Re: ExternalRequestHandler and ContentStreamUpdateRequest usage

javaxmlsoapdev Wed, 25 Nov 2009 05:53:21 -0800

Grant, can you assist? I am clueless as to why it's not indexing the content
of the file. I have provided the schema and code info below and in previous
messages. Do I need to explicitly add param("content", "") to the
ContentStreamUpdateRequest? I don't think that is the right thing to do.
Please advise.
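
A note on the parameter question above, offered as a sketch rather than a diagnosis. As far as the Solr 1.4 release documents it, ExtractingRequestHandler reads the body from the uploaded stream and routes it with request parameters such as fmap.content, uprefix, and literal.<field>, so passing the text itself as a "content" parameter should not be needed. The code below only builds the query string such a request would carry; it does not contact Solr. The class and helper names, and the ignored_ prefix (which would need a matching dynamic field in schema.xml), are assumptions for illustration and do not come from the thread. Parameter names may differ on pre-release builds.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class ExtractParams {

    /** URL-encodes each name/value pair and joins the pairs with '&'. */
    public static String buildQuery(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
        }
        return sb.toString();
    }

    /** Form-encodes a single value as UTF-8. */
    public static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException(ex); // UTF-8 is always supported
        }
    }

    /** Hypothetical parameter set for the two-field schema (key, content). */
    public static Map<String, String> paramsFor(String key) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("literal.key", key);        // unique id, as in the test case
        p.put("fmap.content", "content"); // route extracted body to "content"
        p.put("uprefix", "ignored_");     // assumption: prefix for unknown fields
        p.put("commit", "true");
        return p;
    }

    public static void main(String[] args) {
        System.out.println("/update/extract?" + buildQuery(paramsFor("8978")));
    }
}
```

With SolrJ, the same names would be passed through up.setParam(...) on the ContentStreamUpdateRequest, as the test case already does for literal.key.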


Let me know if you need anything else. I appreciate your help.

Thanks,

javaxmlsoapdev wrote:
> 
> Following is the Luke response. <lst name="fields" /> is empty. Can someone
> help find out why the file content isn't being indexed?
> 
>   <?xml version="1.0" encoding="UTF-8" ?> 
>  <response>
>  <lst name="responseHeader">
>   <int name="status">0</int> 
>   <int name="QTime">0</int> 
>   </lst>
>  <lst name="index">
>   <int name="numDocs">0</int> 
>   <int name="maxDoc">0</int> 
>   <int name="numTerms">0</int> 
>   <long name="version">1259085661332</long> 
>   <bool name="optimized">false</bool> 
>   <bool name="current">true</bool> 
>   <bool name="hasDeletions">false</bool> 
>   <str name="directory">org.apache.lucene.store.NIOFSDirectory:org.apache.lucene.store.NIOFSDirectory@/home/tomcat-solr/bin/docs/data/index</str> 
>   <date name="lastModified">2009-11-24T18:01:01Z</date> 
>   </lst>
>   <lst name="fields" /> 
>  <lst name="info">
>  <lst name="key">
>   <str name="I">Indexed</str> 
>   <str name="T">Tokenized</str> 
>   <str name="S">Stored</str> 
>   <str name="M">Multivalued</str> 
>   <str name="V">TermVector Stored</str> 
>   <str name="o">Store Offset With TermVector</str> 
>   <str name="p">Store Position With TermVector</str> 
>   <str name="O">Omit Norms</str> 
>   <str name="L">Lazy</str> 
>   <str name="B">Binary</str> 
>   <str name="C">Compressed</str> 
>   <str name="f">Sort Missing First</str> 
>   <str name="l">Sort Missing Last</str> 
>   </lst>
>   <str name="NOTE">Document Frequency (df) is not updated when a document
> is marked for deletion. df values include deleted documents.</str> 
>   </lst>
>   </response>
> 
> javaxmlsoapdev wrote:
>> 
>> I was able to configure /docs index separately from my db data index.
>> 
>> Still, I am seeing the same behavior, where it only puts .docName & its
>> size in the "content" field (I have renamed the field to "content" in
>> this new schema).
>> 
>> Below are the only two fields I have in schema.xml:
>> <field name="key" type="slong" indexed="true" stored="true"
>> required="true" /> 
>> <field name="content" type="text" indexed="true" stored="true"
>> multiValued="true"/>  
>> 
>> Following is the updated code from the test case:
>> 
>> File fileToIndex = new File("file.txt");
>> 
>> ContentStreamUpdateRequest up = new
>> ContentStreamUpdateRequest("/update/extract");
>> up.addFile(fileToIndex);
>> up.setParam("literal.key", "8978");
>> up.setParam("literal.docName", "doc123.txt");
>> up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
>> NamedList list = server.request(up);
>> assertNotNull("Couldn't upload .txt",list);
>>                      
>> QueryResponse rsp = server.query( new SolrQuery( "*:*") );
>> assertEquals( 1, rsp.getResults().getNumFound() );
>> System.out.println(rsp.getResults().get(0).getFieldValue("content"));
>> 
>> Also, from the Solr admin UI, only when I search for "doc123.txt" does it
>> return the following response. Not sure why it's not indexing the file's
>> content into the "content" attribute.
>> <result name="response" numFound="1" start="0">
>> <doc>
>> <arr name="content">
>>   <str>702</str> 
>>   <str>text/plain</str> 
>>   <str>doc123.txt</str> 
>>   <str /> 
>>   </arr>
>>   <long name="key">8978</long> 
>>   </doc>
>>   </result>
>> 
>> Any idea?
>> 
>> Thanks,
>> 
>> 
>> javaxmlsoapdev wrote:
>>> 
>>> http://machinename:port/solr/admin/luke gives me a 404 error, so it seems
>>> it's not able to find Luke.
>>> 
>>> I am reusing a schema that is used for indexing another entity from the
>>> database, which has no relevance to documents. That was my next
>>> question: what do I put in a schema if my documents don't need any
>>> column mappings? Also, I want to keep the file documents index separate
>>> from the database entity index. What's the best way to achieve this?
>>> 
>>> thanks,
>>> 
>>> 
>>> 
>>> Grant Ingersoll-6 wrote:
>>>> 
>>>> 
>>>> On Nov 23, 2009, at 5:33 PM, javaxmlsoapdev wrote:
>>>> 
>>>>> 
>>>>> *:* returns a count of 1, but when I search for a specific word (which
>>>>> was part of the .txt file I indexed before) it doesn't return
>>>>> anything. I don't have Luke set up on my end.
>>>> 
>>>> http://localhost:8983/solr/admin/luke should give you some info.
>>>> 
>>>> 
>>>>> let me see if I can set that up quickly but otherwise do
>>>>> you see anything I am missing in solrconfig mapping or something?
>>>> 
>>>> What's your schema look like and how are you querying?
>>>> 
>>>>> which maps
>>>>> document "content" to wrong attribute?
>>>>> 
>>>>> thanks,
>>>>> 
>>>>> Grant Ingersoll-6 wrote:
>>>>>> 
>>>>>> 
>>>>>> On Nov 23, 2009, at 5:04 PM, javaxmlsoapdev wrote:
>>>>>> 
>>>>>>> 
>>>>>>> Following code is from my test case where it tries to index a file
>>>>>>> (of type .txt)
>>>>>>> ContentStreamUpdateRequest up = new
>>>>>>> ContentStreamUpdateRequest("/update/extract");
>>>>>>> up.addFile(fileToIndex);
>>>>>>> up.setParam("literal.key", "8978"); //key is the uniqueId
>>>>>>> up.setParam("ext.literal.docName", "doc123.txt");
>>>>>>> up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);  
>>>>>>> server.request(up);             
>>>>>>> 
>>>>>>> The test case doesn't give me any error and "I think" it's indexing
>>>>>>> the file, but when I search for text (which was part of the .txt
>>>>>>> file) the search doesn't return anything.
>>>>>> 
>>>>>> What do your logs show?  Else, what does Luke show or doing a *:*
>>>>>> query (assuming this is the only file you added)?
>>>>>> 
>>>>>> Also, I don't think you need ext.literal anymore, just literal.
>>>>>> 
>>>>>>> 
>>>>>>> Following is the config from solrconfig.xml where I have mapped
>>>>>>> content to "description" field (default search field) in the schema.
>>>>>>> 
>>>>>>> <requestHandler name="/update/extract"
>>>>>>> class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
>>>>>>>   <lst name="defaults">
>>>>>>>     <str name="map.content">description</str>
>>>>>>>     <str name="defaultField">description</str>
>>>>>>>   </lst>
>>>>>>> </requestHandler>
>>>>>>> 
>>>>>>> Clearly it seems I am missing something. Any idea?
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --------------------------
>>>>>> Grant Ingersoll
>>>>>> http://www.lucidimagination.com/
>>>>>> 
>>>>>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
>>>>>> using Solr/Lucene:
>>>>>> http://www.lucidimagination.com/search
>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>>> -- 
>>>>> View this message in context:
>>>>> http://old.nabble.com/ExternalRequestHandler-and-ContentStreamUpdateRequest-usage-tp26486817p26487320.html
>>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>>> 
>>>> 
>>>> --------------------------
>>>> Grant Ingersoll
>>>> http://www.lucidimagination.com/
>>>> 
>>>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
>>>> using Solr/Lucene:
>>>> http://www.lucidimagination.com/search
>>>> 
>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/ExternalRequestHandler-and-ContentStreamUpdateRequest-usage-tp26486817p26513001.html
Sent from the Solr - User mailing list archive at Nabble.com.
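
For anyone reading the Luke response quoted in this thread: it reports numDocs 0 and an empty fields list, while the *:* query in the test case found one document. One possible reading, which the thread does not confirm, is that Luke was answering from a different index than the one the test wrote to. The sketch below shows one way to pull those two values out of a Luke response with the JDK's XML parser so they can be compared against the query result. The class and method names are illustrative assumptions, and the sample XML in main is a trimmed stand-in for the full response.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class LukeCheck {

    /** Parses the response text into a DOM document. */
    private static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception ex) {
            throw new IllegalArgumentException("Not a parseable Luke response", ex);
        }
    }

    /** Returns the value of <int name="numDocs">, or -1 if it is absent. */
    public static int numDocs(String lukeXml) {
        NodeList ints = parse(lukeXml).getElementsByTagName("int");
        for (int i = 0; i < ints.getLength(); i++) {
            Element e = (Element) ints.item(i);
            if ("numDocs".equals(e.getAttribute("name"))) {
                return Integer.parseInt(e.getTextContent().trim());
            }
        }
        return -1;
    }

    /** Counts the element children of <lst name="fields">; 0 if empty or absent. */
    public static int fieldCount(String lukeXml) {
        NodeList lsts = parse(lukeXml).getElementsByTagName("lst");
        for (int i = 0; i < lsts.getLength(); i++) {
            Element e = (Element) lsts.item(i);
            if ("fields".equals(e.getAttribute("name"))) {
                int n = 0;
                for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
                    if (c.getNodeType() == Node.ELEMENT_NODE) {
                        n++;
                    }
                }
                return n;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        // Trimmed stand-in for the response quoted in the thread.
        String xml = "<response><lst name=\"index\"><int name=\"numDocs\">0</int></lst>"
                + "<lst name=\"fields\" /></response>";
        System.out.println("numDocs=" + numDocs(xml) + ", fields=" + fieldCount(xml));
    }
}
```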
