No, there’s no a-priori file size limit in Solr. But ingesting a 340MB file will
take a long time. A very long time. The timeout is probably just the client
timeout; I’ve seen situations where the doc does get indexed even though there’s
a timeout.
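One way to check whether the doc made it in despite the client timeout is to
query for its id afterwards. A minimal sketch, assuming the uniqueKey field is
called "id" and that a commit has happened since the update (otherwise the doc
won't be visible yet). The base URL, collection name and helper names are
placeholders of mine, not anything from your setup:

```python
import json
import urllib.parse
import urllib.request


def build_lookup_url(solr_base_url, collection, doc_id):
    """Build a /select URL that matches exactly one id (assumed uniqueKey: "id")."""
    # The {!term} query parser avoids having to escape special characters in the id.
    params = urllib.parse.urlencode(
        {"q": "{!term f=id}" + doc_id, "rows": 0, "wt": "json"}
    )
    return f"{solr_base_url.rstrip('/')}/{collection}/select?{params}"


def count_hits(response_body):
    """Pull numFound out of a standard Solr JSON response (str or bytes)."""
    return json.loads(response_body)["response"]["numFound"]


def is_indexed(solr_base_url, collection, doc_id, timeout_seconds=30):
    """Return True if a committed document with this id is searchable."""
    url = build_lookup_url(solr_base_url, collection, doc_id)
    with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
        return count_hits(response.read()) > 0
```

Something like is_indexed("http://localhost:8983/solr", "docs", "file-1") would
then tell you whether the timeout was only on the client side.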
However:
1> There are several timeouts to be aware of that you can lengthen, all in
solr.xml:
• socketTimeout
• connTimeout
• distribUpdateConnTimeout
• distribUpdateSoTimeout
distribUpdateConnTimeout is important. If you have leaders and replicas
(SolrCloud), the leader forwards the doc to the follower. If this timeout is
exceeded, the leader may put the follower into “Leader Initiated Recovery”. You
really need to ensure that this parameter is long enough to cover the longest
indexing time you anticipate.
2> If you’re just throwing a 340MB “semi structured” document at Solr (e.g.
Word, PDF, whatever) you’re putting an awful lot of work on the node doing the
indexing. You probably want to move the parsing off Solr. See
https://lucidworks.com/post/indexing-with-solrj/ or use one of the services.
3> I always question the utility of indexing such a large document. Assuming
that’s mostly textual data, what are you going to do with it? It’ll have so
many words in it that it’ll be found by many, many, many searches. It’ll also
have so many words in it that it’ll tend to be far down in the results list.
Assuming you’re OK with those issues, what will the user do with it if they
click on it? Wait until the entire file is returned to the laptop, then have the
browser blow up trying to load it? My point is that it’s worth asking first
what use case indexing this document serves. You may well have a perfectly
valid reason; I just want to be sure you’ve thought through the implications.
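To make point 2 a bit more concrete, here is a rough sketch of the “parse on
the client, send only the extracted fields to Solr” idea, using nothing but the
Python standard library. Treat it as an illustration under assumptions, not a
recipe: extract_text() is a hypothetical stand-in for whatever parser you run
outside Solr (Tika or similar), the field name "content_txt" and the collection
name are made up, and the 600 second client timeout is just an example value.

```python
import json
import urllib.request


def extract_text(path):
    """Hypothetical placeholder for client-side parsing (e.g. a Tika call).

    Here it only reads a plain-text file so the sketch stays self-contained.
    """
    with open(path, encoding="utf-8", errors="replace") as handle:
        return handle.read()


def build_update_request(solr_base_url, collection, doc):
    """Build a JSON POST to the collection's /update handler."""
    # commitWithin asks Solr to commit within 60s rather than forcing a commit now.
    url = f"{solr_base_url.rstrip('/')}/{collection}/update?commitWithin=60000"
    body = json.dumps([doc]).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def index_document(solr_base_url, collection, doc, timeout_seconds=600):
    """Send one already-parsed document, with a generous client-side timeout."""
    request = build_update_request(solr_base_url, collection, doc)
    with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
        return response.status
```

Usage would look roughly like
index_document("http://localhost:8983/solr", "docs",
{"id": "file-1", "content_txt": extract_text("file-1.txt")}).
The heavy parsing then happens in your client process, and the Solr node only
has to analyze and index plain text.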
Best,
Erick
> On Aug 22, 2019, at 9:00 AM, Sanjoy Ganguly <[email protected]>
> wrote:
>
> Hello,
>
> Good evening!
>
> I am facing an issue while trying to index 4 files. I am getting a "time out
> error" in the log.
>
> I am using Solr 7.5, installed on a Linux server. We have a lot of
> business documents that we are able to index, except the file listed below.
>
> 1. File 1
> Size- approx 340 MB
> Page count- approx 5800
>
> The rest of the files have similar figures.
>
> Just to clarify, these files open in Adobe Reader. The files contain text.
>
> All files are in PDF format.
>
> Question- Is there any file size or page count restriction in solr?
>
> *As per business protocol I will not be able to attach the files.
>
> Thanks .
>
> Awaiting your response.
>
> Regards,
> Sanjoy Ganguly