Quick Query about

2017-11-09 Thread Karan Saini
Hi there,

I am new to Apache Solr and am currently exploring how to use it to
search within PDF files.


https://lucene.apache.org/solr/guide/6_6/uploading-structured-data-store-data-with-the-data-import-handler.html#the-tikaentityprocessor




I am able to index PDF files using the "BinFileDataSource" when the PDF
files are on the same server, as shown in the example below.

Now I want to know if there is a way to change the baseDir so that it
points to a folder on a different server.

Please suggest an example of how to access the PDF files from another server.



<dataConfig>
  <dataSource type="BinFileDataSource"/>
  <document>
    <entity name="K2FileEntity" processor="FileListEntityProcessor"
            dataSource="null"
            recursive="true"
            baseDir="C:/solr-6.6.1/server/solr/core_K2_Depot/Depot"
            fileName=".*pdf"
            rootEntity="false">

      <field column="fileLastModified" name="lastmodified" />

      <entity name="pdf" processor="TikaEntityProcessor"
              onError="skip"
              url="${K2FileEntity.fileAbsolutePath}" format="text">

        <field column="title" name="title" meta="true"/>
        <field column="Content-Type" name="format" meta="true"/>
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
Kind regards,
Karan


Re: Quick Query about

2017-11-09 Thread Karan Saini
Hi Deepak,

I think you have misunderstood my query. I am looking to access PDF files
from another server, not a database.

Thanks,
Karan


On 9 November 2017 at 14:49, Deepak Vohra 
wrote:

> Provide the URL to the data source on a different server:
>
> <dataConfig>
>   <dataSource type="JdbcDataSource"
>               driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
>               url="jdbc:sqlserver://10.0.0.102;databaseName=Dictionary;"
>               user="sa"
>               password=""
>               batchSize="5" />
>
>   <document>
>     ...
>   </document>
> </dataConfig>
>
> On Thu, 11/9/17, Karan Saini  wrote:
>
>  Subject: Quick Query about
>  To: solr-user@lucene.apache.org
>  Received: Thursday, November 9, 2017, 1:13 AM
>
>  Hi there,
>
>  I am new to Apache Solr and am currently exploring how to use it to
>  search within PDF files.
>
>  https://lucene.apache.org/solr/guide/6_6/uploading-structured-data-store-data-with-the-data-import-handler.html#the-tikaentityprocessor
>
>  I am able to index PDF files using the "BinFileDataSource" when the
>  PDF files are on the same server, as shown in the example below.
>
>  Now I want to know if there is a way to change the baseDir so that it
>  points to a folder on a different server.
>
>  Please suggest an example of how to access the PDF files from another
>  server.
>
>
>  <dataConfig>
>    <dataSource type="BinFileDataSource"/>
>    <document>
>      <entity name="K2FileEntity" processor="FileListEntityProcessor"
>              dataSource="null"
>              recursive="true"
>              baseDir="C:/solr-6.6.1/server/solr/core_K2_Depot/Depot"
>              fileName=".*pdf"
>              rootEntity="false">
>
>        <field column="fileLastModified" name="lastmodified" />
>
>        <entity name="pdf" processor="TikaEntityProcessor"
>                onError="skip"
>                url="${K2FileEntity.fileAbsolutePath}" format="text">
>
>          <field column="title" name="title" meta="true"/>
>          <field column="Content-Type" name="format" meta="true"/>
>          <field column="text" name="text"/>
>        </entity>
>      </entity>
>    </document>
>  </dataConfig>
>
>
>  Kind regards,
>  Karan
>
>


Re: Quick Query about

2017-11-09 Thread Karan Saini
Thanks, Charlie Hull, for the quick answer. It worked for me on Windows.

*baseDir="\\CLDserver02\RemoteK1Depot"*
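
The change was just the baseDir attribute on the FileListEntityProcessor
entity; a trimmed sketch of that entity now looks like this (the rest of
the data-config.xml is unchanged from the config quoted below):

  <entity name="K2FileEntity" processor="FileListEntityProcessor"
          dataSource="null"
          recursive="true"
          baseDir="\\CLDserver02\RemoteK1Depot"
          fileName=".*pdf"
          rootEntity="false">
    ...
  </entity>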

Regards,
Karan



On 9 November 2017 at 14:58, Charlie Hull  wrote:

> On 09/11/2017 09:13, Karan Saini wrote:
>
>> Hi there,
>>
>
> Hi Karan,
>
> Have you tried the syntax baseDir="//servername/sharedfoldername"? I
> believe this should work on a Windows network.
>
> Regards
>
> Charlie
>
>
>> I am new to Apache Solr and am currently exploring how to use it to
>> search within PDF files.
>>
>> https://lucene.apache.org/solr/guide/6_6/uploading-structured-data-store-data-with-the-data-import-handler.html#the-tikaentityprocessor
>>
>> I am able to index PDF files using the "BinFileDataSource" when the PDF
>> files are on the same server, as shown in the example below.
>>
>> Now I want to know if there is a way to change the baseDir so that it
>> points to a folder on a different server.
>>
>> Please suggest an example of how to access the PDF files from another
>> server.
>>
>>
>> <dataConfig>
>>   <dataSource type="BinFileDataSource"/>
>>   <document>
>>     <entity name="K2FileEntity" processor="FileListEntityProcessor"
>>             dataSource="null"
>>             recursive="true"
>>             baseDir="C:/solr-6.6.1/server/solr/core_K2_Depot/Depot"
>>             fileName=".*pdf"
>>             rootEntity="false">
>>
>>       <field column="fileLastModified" name="lastmodified" />
>>
>>       <entity name="pdf" processor="TikaEntityProcessor"
>>               onError="skip"
>>               url="${K2FileEntity.fileAbsolutePath}" format="text">
>>
>>         <field column="title" name="title" meta="true"/>
>>         <field column="Content-Type" name="format" meta="true"/>
>>         <field column="text" name="text"/>
>>       </entity>
>>     </entity>
>>   </document>
>> </dataConfig>
>>
>>
>> Kind regards,
>> Karan
>>
>>
>
> --
> Charlie Hull
> Flax - Open Source Enterprise Search
>
> tel/fax: +44 (0)8700 118334
> mobile:  +44 (0)7767 825828
> web: www.flax.co.uk
>


Make search on the particular field to be case sensitive

2017-11-09 Thread Karan Saini
Hi guys,

Solr version :: 6.6.1

**

I have around 10 fields in my core. I want to make the search on this
specific field case sensitive. Please advise how to introduce case
sensitivity at the field level. What changes do I need to make for this
field?

Thanks,
Karan


Re: Make search on the particular field to be case sensitive

2017-11-10 Thread Karan Saini
Hi Erick,

Thanks for the help. It is working fine with the *KeywordTokenizerFactory*.
Like you mentioned, I also want to be able to search for "dog" or "*dog*"
on its own. Case sensitivity is working fine, but I want to have
wildcard-based search as well.

So I tried this changed code, but no luck!

  <fieldType name="..." class="solr.TextField" sortMissingLast="true">
    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      ...
    </analyzer>
  </fieldType>

Please suggest where I am making a mistake.

Kind regards,
Karan



On 9 November 2017 at 21:05, Erick Erickson  wrote:

> This won't quite work. "string" types are totally un-analyzed; you
> cannot add filters to a solr.StrField. You must use solr.TextField
> rather than solr.StrField.
>
> <!-- field and type names below are placeholders -->
> <field name="my_field" type="string_ci" indexed="true" stored="true"
>        docValues="true"/>
>
> <fieldType name="string_ci" class="solr.TextField">
>   <analyzer>
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>   </analyzer>
> </fieldType>
>
>
> Start over and re-index from scratch in a new collection, of course.
>
> You also need to make sure you really want to search on the whole
> field. The KeywordTokenizerFactory doesn't split the incoming text up
> _at all_. So if the input is
> "my dog has fleas" you can't search for just "dog" unless you use the
> extremely inefficient *dog* form. If you want to search for words, use
> a tokenizer that breaks up the input, WhitespaceTokenizer for
> instance.
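>
> A minimal sketch of such a field type (the field type name is a
> placeholder):
>
> <fieldType name="text_ws_cs" class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <!-- no LowerCaseFilterFactory, so matching stays case sensitive -->
>   </analyzer>
> </fieldType>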
>
> Best,
> Erick
>
> On Thu, Nov 9, 2017 at 3:24 AM, Amrit Sarkar 
> wrote:
> > The behavior of field values is defined by the fieldType's analyzer
> > declaration.
> >
> > If you look at the managed-schema,
> >
> > you will find fieldType declarations like:
> >
> > <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
> >   <analyzer type="index">
> >     <tokenizer class="solr.StandardTokenizerFactory"/>
> >     <filter class="solr.StopFilterFactory" words="lang/stopwords_en.txt" ignoreCase="true"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="solr.EnglishPossessiveFilterFactory"/>
> >     <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
> >     <filter class="solr.PorterStemFilterFactory"/>
> >   </analyzer>
> >   <analyzer type="query">
> >     <tokenizer class="solr.StandardTokenizerFactory"/>
> >     <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" expand="true" ignoreCase="true"/>
> >     <filter class="solr.StopFilterFactory" words="lang/stopwords_en.txt" ignoreCase="true"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="solr.EnglishPossessiveFilterFactory"/>
> >     <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
> >     <filter class="solr.PorterStemFilterFactory"/>
> >   </analyzer>
> > </fieldType>
> >
> >
> > In your case the fieldType is "string". You need to write an analyzer
> > chain for that fieldType and not include:
> >
> >   <filter class="solr.LowerCaseFilterFactory"/>
> >
> > LowerCaseFilterFactory is responsible for lowercasing tokens both at
> > query time and at index time.
> >
> > Something like this will work for you:
> >
> > <field name="..." type="string" indexed="true" stored="true"
> > docValues="true"/>
> >
> > <fieldType name="string" class="solr.StrField">
> >   <analyzer>
> >     <tokenizer class="solr.KeywordTokenizerFactory"/>
> >   </analyzer>
> > </fieldType>
> >
> > I listed "KeywordTokenizerFactory" considering this is string, not text.
> >
> > More details on: https://lucene.apache.org/solr/guide/6_6/analyzers.html
> >
> > Amrit Sarkar
> > Search Engineer
> > Lucidworks, Inc.
> > 415-589-9269
> > www.lucidworks.com
> > Twitter http://twitter.com/lucidworks
> > LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> > Medium: https://medium.com/@sarkaramrit2
> >
> > On Thu, Nov 9, 2017 at 4:41 PM, Karan Saini 
> wrote:
> >
> >> Hi guys,
> >>
> >> Solr version :: 6.6.1
> >>
> >> **
> >>
> >> I have around 10 fields in my core. I want to make the search on this
> >> specific field case sensitive. Please advise how to introduce case
> >> sensitivity at the field level. What changes do I need to make for this
> >> field?
> >>
> >> Thanks,
> >> Karan
> >>
>


Re: Make search on the particular field to be case sensitive

2017-11-10 Thread Karan Saini
Hi Erick,

Please ignore my earlier mail. I got it working! I had missed the rule
attribute.



Now it is working.

Thanks,
Karan



On 10 November 2017 at 15:59, Karan Saini  wrote:

> Hi Erick,
>
> Thanks for the help. It is working fine with the *KeywordTokenizerFactory*.
> Like you mentioned, I also want to be able to search for "dog" or "*dog*"
> on its own. Case sensitivity is working fine, but I want to have
> wildcard-based search as well.
>
> So I tried this changed code, but no luck!
>
>   <fieldType name="..." class="solr.TextField" sortMissingLast="true">
>     <analyzer>
>       <tokenizer class="solr.KeywordTokenizerFactory"/>
>       ...
>     </analyzer>
>   </fieldType>
>
> Please suggest where I am making a mistake.
>
> Kind regards,
> Karan
>
>
>
> On 9 November 2017 at 21:05, Erick Erickson 
> wrote:
>
>> This won't quite work. "string" types are totally un-analyzed; you
>> cannot add filters to a solr.StrField. You must use solr.TextField
>> rather than solr.StrField.
>>
>> <!-- field and type names below are placeholders -->
>> <field name="my_field" type="string_ci" indexed="true" stored="true"
>>        docValues="true"/>
>>
>> <fieldType name="string_ci" class="solr.TextField">
>>   <analyzer>
>>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>>   </analyzer>
>> </fieldType>
>>
>> Start over and re-index from scratch in a new collection, of course.
>>
>> You also need to make sure you really want to search on the whole
>> field. The KeywordTokenizerFactory doesn't split the incoming text up
>> _at all_. So if the input is
>> "my dog has fleas" you can't search for just "dog" unless you use the
>> extremely inefficient *dog* form. If you want to search for words, use
>> a tokenizer that breaks up the input, WhitespaceTokenizer for
>> instance.
>>
>> Best,
>> Erick
>>
>> On Thu, Nov 9, 2017 at 3:24 AM, Amrit Sarkar 
>> wrote:
>> > The behavior of field values is defined by the fieldType's analyzer
>> > declaration.
>> >
>> > If you look at the managed-schema,
>> >
>> > you will find fieldType declarations like:
>> >
>> > <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
>> >   <analyzer type="index">
>> >     <tokenizer class="solr.StandardTokenizerFactory"/>
>> >     <filter class="solr.StopFilterFactory" words="lang/stopwords_en.txt" ignoreCase="true"/>
>> >     <filter class="solr.LowerCaseFilterFactory"/>
>> >     <filter class="solr.EnglishPossessiveFilterFactory"/>
>> >     <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
>> >     <filter class="solr.PorterStemFilterFactory"/>
>> >   </analyzer>
>> >   <analyzer type="query">
>> >     <tokenizer class="solr.StandardTokenizerFactory"/>
>> >     <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" expand="true" ignoreCase="true"/>
>> >     <filter class="solr.StopFilterFactory" words="lang/stopwords_en.txt" ignoreCase="true"/>
>> >     <filter class="solr.LowerCaseFilterFactory"/>
>> >     <filter class="solr.EnglishPossessiveFilterFactory"/>
>> >     <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
>> >     <filter class="solr.PorterStemFilterFactory"/>
>> >   </analyzer>
>> > </fieldType>
>> >
>> >
>> > In your case the fieldType is "string". You need to write an analyzer
>> > chain for that fieldType and not include:
>> >
>> >   <filter class="solr.LowerCaseFilterFactory"/>
>> >
>> > LowerCaseFilterFactory is responsible for lowercasing tokens both at
>> > query time and at index time.
>> >
>> > Something like this will work for you:
>> >
>> > <field name="..." type="string" indexed="true" stored="true"
>> > docValues="true"/>
>> >
>> > <fieldType name="string" class="solr.StrField">
>> >   <analyzer>
>> >     <tokenizer class="solr.KeywordTokenizerFactory"/>
>> >   </analyzer>
>> > </fieldType>
>> >
>> > I listed "KeywordTokenizerFactory" considering this is string, not text.
>> >
>> > More details on: https://lucene.apache.org/solr/guide/6_6/analyzers.html
>> >
>> > Amrit Sarkar
>> > Search Engineer
>> > Lucidworks, Inc.
>> > 415-589-9269
>> > www.lucidworks.com
>> > Twitter http://twitter.com/lucidworks
>> > LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>> > Medium: https://medium.com/@sarkaramrit2
>> >
>> > On Thu, Nov 9, 2017 at 4:41 PM, Karan Saini 
>> wrote:
>> >
>> >> Hi guys,
>> >>
>> >> Solr version :: 6.6.1
>> >>
>> >> **
>> >>
>> >> I have around 10 fields in my core. I want to make the search on this
>> >> specific field case sensitive. Please advise how to introduce case
>> >> sensitivity at the field level. What changes do I need to make for this
>> >> field?
>> >>
>> >> Thanks,
>> >> Karan
>> >>
>>
>
>


Solr - How to Clear the baseDir folder after the DIH import

2017-11-20 Thread Karan Saini
Hi guys,

Solr Version :: 6.6.1

I am able to import PDF files into Solr using the DIH, and the indexing
works as expected. But I wish to clear the folder
C:/solr-6.6.1/server/solr/core_K2_Depot/Depot after the indexing process
finishes successfully.

Please suggest if there is a way to delete all the files from that folder
via the DIH data-config.xml.

<dataConfig>
  <dataSource type="BinFileDataSource"/>
  <document>
    <entity name="K2FileEntity" processor="FileListEntityProcessor"
            dataSource="null"
            recursive="true"
            baseDir="C:/solr-6.6.1/server/solr/core_K2_Depot/Depot"
            fileName=".*pdf"
            rootEntity="false">

      <field column="fileLastModified" name="lastmodified" />

      <entity name="pdf" processor="TikaEntityProcessor"
              onError="skip"
              url="${K2FileEntity.fileAbsolutePath}" format="text">

        <field column="title" name="title" meta="true"/>
        <field column="Content-Type" name="format" meta="true"/>
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>


Thanks,
Karan


Solr :: How to trigger the DIH from SolrNet API with C# code

2017-12-11 Thread Karan Saini
Hi guys,

*Solr Version :: 6.6.1*
API :: SolrNet with a C#-based application

I wish to invoke or trigger the data import handler from C# code with the
help of SolrNet, but I am unable to locate any tutorial for this in the
SolrNet API.

Please suggest how I can invoke the data import action from a C#-based
application.
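
For now I am assuming the DIH is just a regular request handler, so a
plain HTTP GET like the one below should trigger it (host and port are
from my local setup and may differ; the path assumes the usual
/dataimport handler registration in solrconfig.xml):

  http://localhost:8983/solr/core_K2_Depot/dataimport?command=full-import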

Regards,
Karan


Solr - Achieve Delta-Import with the FileListEntityProcessor for PDF Files

2017-12-12 Thread Karan Saini
Solr version :: 6.6.1

I am using Solr to index PDF files and it is working as expected. Now I
have a requirement to perform a delta-import on the PDF files.
I am not able to locate an example of implementing delta-import with the
FileListEntityProcessor.

Please suggest.


The *data-config.xml* file currently looks like this:

<dataConfig>
  <dataSource type="BinFileDataSource"/>
  <document>
    <entity name="K2FileEntity" processor="FileListEntityProcessor"
            dataSource="null"
            recursive="true"
            baseDir="C:/solr-6.6.1/server/solr/core_K2_Depot/Depot"
            fileName=".*pdf"
            rootEntity="false">

      <field column="fileLastModified" name="lastmodified" />

      <entity name="pdf" processor="TikaEntityProcessor"
              onError="skip"
              url="${K2FileEntity.fileAbsolutePath}" format="text">

        <field column="title" name="title" meta="true"/>
        <field column="Content-Type" name="format" meta="true"/>
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>



Thanks,
Karan


Perform incremental import with PDF Files

2018-01-29 Thread Karan Saini
Hi folks,

Please suggest a solution for importing and indexing PDF files
*incrementally*. My requirement is to pull the PDF files from a remote
network folder path. This network folder will receive new sets of PDF
files at certain intervals (say, every 20 seconds). The folder is forcibly
emptied every time a new set of PDF files is copied into it. I do not want
to lose the previously saved index of the old files while doing the next
incremental import.

Currently, I am using Solr 6.6 for this research.

The data import handler config is currently like this:

<dataConfig>
  <dataSource type="BinFileDataSource"/>
  <document>
    <entity name="K2FileEntity" processor="FileListEntityProcessor"
            dataSource="null"
            recursive="true"
            baseDir="\\CLDSINGH02\RemoteFileDepot"
            fileName=".*pdf"
            rootEntity="false">

      <field column="fileLastModified" name="lastmodified" />

      <entity name="pdf" processor="TikaEntityProcessor"
              onError="skip"
              url="${K2FileEntity.fileAbsolutePath}" format="text">

        <field column="title" name="title" meta="true"/>
        <field column="Content-Type" name="format" meta="true"/>
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>


Kind regards,
Karan Singh


Re: Perform incremental import with PDF Files

2018-01-29 Thread Karan Saini
Thanks, Emir :-) . Setting the property *clean=false* worked for me.

Is there a way I can selectively clean a particular index from the
C#.NET code using the SolrNet API?
Please suggest.

Kind regards,
Karan


On 29 January 2018 at 16:49, Emir Arnautović 
wrote:

> Hi Karan,
> Did you try running full import with clean=false?
>
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 29 Jan 2018, at 11:18, Karan Saini  wrote:
> >
> > Hi folks,
> >
> > Please suggest a solution for importing and indexing PDF files
> > *incrementally*. My requirement is to pull the PDF files from a remote
> > network folder path. This network folder will receive new sets of PDF
> > files at certain intervals (say, every 20 seconds). The folder is
> > forcibly emptied every time a new set of PDF files is copied into it.
> > I do not want to lose the previously saved index of the old files while
> > doing the next incremental import.
> >
> > Currently, I am using Solr 6.6 for this research.
> >
> > The data import handler config is currently like this:
> >
> > <dataConfig>
> >   <dataSource type="BinFileDataSource"/>
> >   <document>
> >     <entity name="K2FileEntity" processor="FileListEntityProcessor"
> >             dataSource="null"
> >             recursive="true"
> >             baseDir="\\CLDSINGH02\RemoteFileDepot"
> >             fileName=".*pdf"
> >             rootEntity="false">
> >
> >       <field column="fileLastModified" name="lastmodified" />
> >
> >       <entity name="pdf" processor="TikaEntityProcessor"
> >               onError="skip"
> >               url="${K2FileEntity.fileAbsolutePath}" format="text">
> >
> >         <field column="title" name="title" meta="true"/>
> >         <field column="Content-Type" name="format" meta="true"/>
> >         <field column="text" name="text"/>
> >       </entity>
> >     </entity>
> >   </document>
> > </dataConfig>
> >
> >  
> >
> >
> > Kind regards,
> > Karan Singh
>
>


Re: Perform incremental import with PDF Files

2018-01-29 Thread Karan Saini
Hi Emir,

There is one behavior I noticed while performing the incremental import. I
added a new field to the managed-schema to test the incremental nature of
using clean=false:

 <field name="xtimestamp" type="date" indexed="true" stored="true" default="NOW"/>

Now xtimestamp gets a new value on every DIH import, even with the
clean=false property, so I am confused: how will I know whether
clean=false is working or not?
Please suggest.

Kind regards,
Karan



On 29 January 2018 at 20:12, Emir Arnautović 
wrote:

> Hi Karan,
> Glad it worked for you.
>
> I am not sure how to do it in the C# client, but adding the clean=false
> parameter to the URL should do the trick.
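>
> For example (host, port, and core name are placeholders):
>
>   http://localhost:8983/solr/<core_name>/dataimport?command=full-import&clean=false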
>
> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 29 Jan 2018, at 14:48, Karan Saini  wrote:
> >
> > Thanks, Emir :-) . Setting the property *clean=false* worked for me.
> >
> > Is there a way I can selectively clean a particular index from the
> > C#.NET code using the SolrNet API?
> > Please suggest.
> >
> > Kind regards,
> > Karan
> >
> >
> > On 29 January 2018 at 16:49, Emir Arnautović <
> emir.arnauto...@sematext.com>
> > wrote:
> >
> >> Hi Karan,
> >> Did you try running full import with clean=false?
> >>
> >> Emir
> >> --
> >> Monitoring - Log Management - Alerting - Anomaly Detection
> >> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> >>
> >>
> >>
> >>> On 29 Jan 2018, at 11:18, Karan Saini  wrote:
> >>>
> >>> Hi folks,
> >>>
> >>> Please suggest a solution for importing and indexing PDF files
> >>> *incrementally*. My requirement is to pull the PDF files from a remote
> >>> network folder path. This network folder will receive new sets of PDF
> >>> files at certain intervals (say, every 20 seconds). The folder is
> >>> forcibly emptied every time a new set of PDF files is copied into it.
> >>> I do not want to lose the previously saved index of the old files
> >>> while doing the next incremental import.
> >>>
> >>> Currently, I am using Solr 6.6 for this research.
> >>>
> >>> The data import handler config is currently like this:
> >>>
> >>> <dataConfig>
> >>>   <dataSource type="BinFileDataSource"/>
> >>>   <document>
> >>>     <entity name="K2FileEntity" processor="FileListEntityProcessor"
> >>>             dataSource="null"
> >>>             recursive="true"
> >>>             baseDir="\\CLDSINGH02\RemoteFileDepot"
> >>>             fileName=".*pdf"
> >>>             rootEntity="false">
> >>>
> >>>       <field column="fileLastModified" name="lastmodified" />
> >>>
> >>>       <entity name="pdf" processor="TikaEntityProcessor"
> >>>               onError="skip"
> >>>               url="${K2FileEntity.fileAbsolutePath}" format="text">
> >>>
> >>>         <field column="title" name="title" meta="true"/>
> >>>         <field column="Content-Type" name="format" meta="true"/>
> >>>         <field column="text" name="text"/>
> >>>       </entity>
> >>>     </entity>
> >>>   </document>
> >>> </dataConfig>
> >>>  
> >>>
> >>>   
> >>> 
> >>>
> >>>
> >>> Kind regards,
> >>> Karan Singh
> >>
> >>
>
>