Re: Error when submitting PDF to Solr w/text fields using SolrJ

Erick Erickson Fri, 19 Jun 2015 09:22:18 -0700

This may be another forehead-slapper (man, you don't know how often
I've injured myself that way).


Did you commit at the end of the SolrJ indexing to Testcore2? DIH automatically
commits at the end of the run, and depending on how your SolrJ program is
written it may not have. Or just set autoCommit (with openSearcher=true) in
your solrconfig file. Or set autoSoftCommit there. In either case, wait until
the interval has expired after your indexing has run.

Or, for that matter, you can ensure you've committed by using curl or just
entering something like
..../Testcore2/update?commit=true
in a URL.

And another one that'll make you cringe is if your SolrJ program looks like:

while (more docs) {
   create a solr doc and add it to my list
   if (list > 100) {
      send list to Solr
      clear list
   }
}
end of program.

As the program exits, there'll still be docs in the list that haven't
been sent to Solr.
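
A minimal sketch of the fix, in plain Java so it runs without a Solr install.
The BatchIndexer class and its sender callback are made-up names standing in
for whatever your program uses to push a list of docs to Solr (they are not
part of SolrJ), and the docs are plain strings here. The part that matters is
the flush() after the loop, which sends whatever is still sitting in the list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchIndexer {
    private final int batchSize;
    // Hypothetical stand-in for "send list to Solr".
    private final Consumer<List<String>> sender;
    private final List<String> pending = new ArrayList<>();

    public BatchIndexer(int batchSize, Consumer<List<String>> sender) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sender = sender;
    }

    // Buffer one doc and send the buffer once it reaches the batch size.
    public void add(String doc) {
        pending.add(doc);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Send whatever is buffered, even a partial batch. Call this after the loop.
    public void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sender.accept(new ArrayList<>(pending));
        pending.clear();
    }

    public static void main(String[] args) {
        List<String> received = new ArrayList<>();
        BatchIndexer indexer = new BatchIndexer(100, received::addAll);

        for (int i = 0; i < 281; i++) {
            indexer.add("doc-" + i);
        }
        // Only the two full batches have gone out so far.
        System.out.println("before final flush: " + received.size()); // 200

        indexer.flush(); // the step the loop above is missing
        System.out.println("after final flush: " + received.size());  // 281
    }
}
```

In a real indexer the sender would wrap your SolrJ client's add call, and
you'd still want a commit after that final flush.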

Alessandro's question hints at things like this: the first question is
whether all the docs got sent to Solr or not. Second question is whether
they're analyzed differently in the two cores. Third question....

Best,
Erick



On Fri, Jun 19, 2015 at 8:32 AM, Alessandro Benedetti
<benedetti.ale...@gmail.com> wrote:
> So, the first thing I can say is that if this is true: "it almost killed
> Solr with 280 files", you are doing something wrong for sure.
> At least if you are not trying to index 4k full movies xD
>
> Joking apart:
> 1) You should carefully design your analyser.
> 2) You should store your fields initially to verify that you index what you
> were supposed to (in number and in content).
> Assuming you are a beginner, storing the fields will make it easier for you
> to check, as they will pop out of the results.
>
> Is at least the number of docs indexed correct?
>
>
> 2015-06-19 15:34 GMT+01:00 Paden <rumsey...@gmail.com>:
>
>> Yeah, actually changing the field to "text_en" or "text_en_splitting"
>> actually made it so my indexer indexed all my files. The only problem is, I
>> don't think it's doing it well.
>>
>> I have two Cores that I'm working with. Both of them have indexed the same
>> set of files. The first core, which I will refer to as Testcore, I used a
>> DIH configuration that indexed the files with their metadata. (It indexed
>> everything fine, but it almost killed Solr with 280 files; I would hate to
>> see what would happen with, say, 10,000 files.) When I query Testcore on
>> some random common word like "a" it returns like 279 files. A good margin;
>> I can accept that.
>>
>> The second core, which I will refer to as Testcore2, I used my own indexer
>> that I created and use SolrJ as the client. It indexes everything. However,
>> when I query on the same word "a" it only returns 208 of the 281 files.
>> Which is weird cause I'm using the exact same Querying handler for both. So
>> I don't think a comprehensive indexed text is being sent to Solr.
>>
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Error-when-submitting-PDF-to-Solr-w-text-fields-using-SolrJ-tp4212704p4212933.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>
>
>
> --
> --------------------------
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
