No, there’s no a-priori file size limit in Solr. But ingesting a 340MB file
will take a long time. A very long time. The timeout is probably just the
client timeout; I’ve seen situations where the doc does get indexed even
though the client reports a timeout.

However:

1> There are several timeouts to be aware of that you can lengthen, all in 
solr.xml:
        • socketTimeout
        • connTimeout
        • distribUpdateConnTimeout
        • distribUpdateSoTimeout

distribUpdateConnTimeout is important. If you have leaders and replicas 
(SolrCloud), the leader forwards the doc to the follower. If this timeout is 
exceeded, the leader may put the follower into “Leader Initiated Recovery”. You 
really need to ensure that this parameter is longer than the longest update 
time you anticipate.
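
For reference, here is a rough sketch of where those four settings live, 
assuming the stock solr.xml layout that ships with Solr 7.x (the distrib* 
pair under <solrcloud>, the other two under <shardHandlerFactory>). The sample 
values are illustrative only, not recommendations, and read_timeouts is a 
hypothetical helper name; check it against your own solr.xml.

```python
import re
import xml.etree.ElementTree as ET

# Sample modeled on the default solr.xml from Solr 7.x (an assumption; your
# file may differ). "${name:default}" means "use the system property if it
# is set, otherwise the default after the colon". Values are milliseconds.
SAMPLE_SOLR_XML = """
<solr>
  <solrcloud>
    <int name="distribUpdateConnTimeout">${distribUpdateConnTimeout:60000}</int>
    <int name="distribUpdateSoTimeout">${distribUpdateSoTimeout:600000}</int>
  </solrcloud>
  <shardHandlerFactory name="shardHandlerFactory" class="HttpShardHandlerFactory">
    <int name="socketTimeout">${socketTimeout:600000}</int>
    <int name="connTimeout">${connTimeout:60000}</int>
  </shardHandlerFactory>
</solr>
"""

TIMEOUT_NAMES = {"socketTimeout", "connTimeout",
                 "distribUpdateConnTimeout", "distribUpdateSoTimeout"}


def read_timeouts(xml_text):
    """Return {setting name: configured default in ms} for the four timeouts."""
    timeouts = {}
    for elem in ET.fromstring(xml_text).iter("int"):
        name = elem.get("name")
        if name not in TIMEOUT_NAMES:
            continue
        text = (elem.text or "").strip()
        # Accept either "${prop:12345}" or a bare number such as "12345".
        match = re.fullmatch(r"\$\{[^:}]+:(\d+)\}", text)
        timeouts[name] = int(match.group(1) if match else text)
    return timeouts


if __name__ == "__main__":
    for name, millis in sorted(read_timeouts(SAMPLE_SOLR_XML).items()):
        print(f"{name}: {millis} ms")
```

Because the stock file uses the ${name:default} form, the same values can 
also be overridden with system properties instead of editing the file.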

2> If you’re just throwing a 340MB “semi-structured” document at Solr (i.e. 
Word, PDF, whatever) you’re putting an awful lot of work on the node doing the 
indexing. You probably want to move the parsing off Solr; see 
https://lucidworks.com/post/indexing-with-solrj/ or use one of the services.
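
The linked article does this in Java with SolrJ and Tika. As a rough sketch of 
the same shape using only the Python standard library: the text is extracted 
on the client, and Solr only receives plain fields through its JSON update 
handler. The URL, the collection name "docs", and the field names are all 
assumptions for illustration, and the extraction step itself is not shown.

```python
import json
import urllib.request


def build_update_request(solr_url, collection, doc):
    """Build a POST to Solr's JSON update handler for one already-parsed doc."""
    url = f"{solr_url.rstrip('/')}/{collection}/update?commit=true"
    body = json.dumps([doc]).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"})


def index_document(solr_url, collection, doc, timeout_seconds=600):
    """Send the doc to Solr and return the HTTP status code.

    timeout_seconds is the client-side timeout only; it does not change
    any of the server-side timeouts in solr.xml.
    """
    request = build_update_request(solr_url, collection, doc)
    with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
        return response.status


if __name__ == "__main__":
    # The text would come from a parser run on the client (for example Tika),
    # not from Solr. Field names here are hypothetical.
    doc = {"id": "file-1", "title": "File 1", "text": "...extracted text..."}
    request = build_update_request("http://localhost:8983/solr", "docs", doc)
    print(request.full_url)
```

Calling index_document(...) with the same arguments would actually send the 
request; the main block only builds it so the sketch runs without a server.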

3> I always question the utility of indexing such a large document. Assuming 
that’s mostly textual data, what are you going to do with it? It’ll have so 
many words in it that it’ll be found by many, many, many searches. It’ll also 
have so many words in it that it’ll tend to be far down in the results list. 
Assuming you’re OK with those issues, what will the user do with it if they 
click on it? Wait until the entire file is returned to the laptop, then have 
the browser blow up trying to load it? My point is that it may be better to 
ask first what use case indexing this document serves. You may have a 
perfectly valid reason; I just want to be sure you’ve thought through the 
implications.

Best,
Erick

> On Aug 22, 2019, at 9:00 AM, Sanjoy Ganguly <gangulysanjoy.gang...@gmail.com> 
> wrote:
> 
> Hello,
> 
> Good evening!
> 
> I am facing issue while trying to index 4 files. Getting "time out error"
> in log.
> 
> I am using Solr 7.5, installed in the Linux server.  We have lot of
> business document that we are able to index but except below listed file.
> 
> 1. File 1
>    Size- approx 340 MB
>    Page count- approx 5800
> 
> Rest files are also have same type of figure.
> 
> Just to clarify this file are opening in Adobe reader. File are having text.
> 
> All files are in PDF format.
> 
> Question-  Is there any file size or page count restriction in solr?
> 
> *Asper business protocol I will not be able to attach the files.
> 
> Thanks .
> 
> Awaiting your response.
> 
> Regards,
> Sanjoy Ganguly
