Hi Karan,
clean=false will not delete existing documents in the index, but if you reimport 
documents with the same ID they will be overwritten. If you see the same doc 
with an updated timestamp, it most likely means you did a full import of docs 
with the same file name. Also note that your xtimestamp field has 
default="NOW", so Solr assigns its value at index time and every reimport 
refreshes it. To verify that clean=false works, check that documents from 
earlier imports are still in the index, rather than looking at that field.
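As a rough illustration (plain Python, not Solr or SolrNet code), overwrite-by-ID combined with a NOW-style default behaves like this:

```python
import time

# Toy model of an index keyed by a unique id (like Solr's uniqueKey).
index = {}

def import_doc(doc_id, body):
    # Mimics clean=false: existing docs stay, same-id docs are overwritten.
    # "xtimestamp" has a NOW-style default, so it is set on every (re)index.
    index[doc_id] = {"body": body, "xtimestamp": time.time()}

import_doc("a.pdf", "v1")
first_ts = index["a.pdf"]["xtimestamp"]
import_doc("b.pdf", "v1")   # new id: a.pdf survives because nothing is cleaned
import_doc("a.pdf", "v2")   # same id: doc overwritten, timestamp refreshed

assert len(index) == 2                            # earlier docs not deleted
assert index["a.pdf"]["xtimestamp"] >= first_ts   # default re-applied on reimport
```

The point: docs with other IDs survive the import, but reimporting the same ID always rewrites the whole document, including fields filled by defaults.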

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 30 Jan 2018, at 08:34, Karan Saini <maximus...@gmail.com> wrote:
> 
> Hi Emir,
> 
> There is one behavior I noticed while performing the incremental import. I
> added a new field to the managed-schema to test the incremental
> nature of using clean=false.
> 
>         <field name="xtimestamp" type="date" indexed="true" stored="true"
> default="NOW" multiValued="false"/>
> 
> Now xtimestamp gets a new value on every DIH import, even with the
> clean=false property. I am confused: how will I know whether
> clean=false is working or not?
> Please suggest.
> 
> Kind regards,
> Karan
> 
> 
> 
> On 29 January 2018 at 20:12, Emir Arnautović <emir.arnauto...@sematext.com>
> wrote:
> 
>> Hi Karan,
>> Glad it worked for you.
>> 
>> I am not sure how to do it in the C# client, but adding the clean=false
>> parameter to the request URL (e.g.
>> /solr/<core>/dataimport?command=full-import&clean=false) should do the trick.
>> 
>> Thanks,
>> Emir
>> --
>> 
>> 
>> 
>>> On 29 Jan 2018, at 14:48, Karan Saini <maximus...@gmail.com> wrote:
>>> 
>>> Thanks Emir :-). Setting the property *clean=false* worked for me.
>>> 
>>> Is there a way I can selectively clean a particular index from
>>> C#.NET code using the SolrNet API?
>>> Please suggest.
>>> 
>>> Kind regards,
>>> Karan
>>> 
>>> 
>>> On 29 January 2018 at 16:49, Emir Arnautović <
>> emir.arnauto...@sematext.com>
>>> wrote:
>>> 
>>>> Hi Karan,
>>>> Did you try running a full import with clean=false?
>>>> 
>>>> Emir
>>>> --
>>>> 
>>>> 
>>>> 
>>>>> On 29 Jan 2018, at 11:18, Karan Saini <maximus...@gmail.com> wrote:
>>>>> 
>>>>> Hi folks,
>>>>> 
>>>>> Please suggest a solution for importing and indexing PDF files
>>>>> *incrementally*. My requirement is to pull the PDF files remotely from a
>>>>> network folder path. This network folder will receive new sets of PDF
>>>>> files at certain intervals (say, every 20 seconds). The folder is forced
>>>>> to be emptied every time a new set of PDF files is copied into it. I do
>>>>> not want to lose the previously saved index of the old files while doing
>>>>> the next incremental import.
>>>>> 
>>>>> Currently, I am using Solr 6.6 for this research.
>>>>> 
>>>>> The data import handler config is currently like this:
>>>>> 
>>>>> <!--Remote Access-->
>>>>> <dataConfig>
>>>>>   <dataSource type="BinFileDataSource"/>
>>>>>   <document>
>>>>>     <entity name="K2FileEntity" processor="FileListEntityProcessor"
>>>>>             dataSource="null"
>>>>>             recursive="true"
>>>>>             baseDir="\\CLDSINGH02\RemoteFileDepot"
>>>>>             fileName=".*pdf" rootEntity="false">
>>>>> 
>>>>>       <field column="file" name="id"/>
>>>>>       <field column="fileSize" name="size"/>
>>>>>       <field column="fileLastModified" name="lastmodified"/>
>>>>> 
>>>>>       <entity name="pdf" processor="TikaEntityProcessor" onError="skip"
>>>>>               url="${K2FileEntity.fileAbsolutePath}" format="text">
>>>>>         <field column="title" name="title" meta="true"/>
>>>>>         <field column="dc:format" name="format" meta="true"/>
>>>>>         <field column="text" name="text"/>
>>>>>       </entity>
>>>>>     </entity>
>>>>>   </document>
>>>>> </dataConfig>
>>>>> 
>>>>> 
>>>>> Kind regards,
>>>>> Karan Singh
>>>> 
>>>> 
>> 
>> 
