For general search use cases, it's usually not a good idea to index giant
documents. A relevance score for an entire book is less meaningful than
scores for its individual chapters or sections. Those subdivisions are
also more useful to the user: they show not just that, say, a book is
relevant, but which particular section of the book is relevant to their
query.
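
To make that concrete, here is a rough sketch of splitting one book into
one document per section before indexing. It only builds plain
dictionaries and does not talk to Solr. The function name and all field
names (book_id, section_heading, and so on) are made up for illustration,
so treat them as assumptions and map them onto whatever your schema uses.

```python
def split_into_section_docs(book_id, title, sections):
    """Build one document per section instead of one giant book document.

    `sections` is a list of (heading, text) pairs. All field names below
    are hypothetical and would need to match your own schema.
    """
    docs = []
    for position, (heading, text) in enumerate(sections, start=1):
        docs.append({
            "id": f"{book_id}-s{position}",   # unique per section
            "book_id": book_id,               # lets you group hits by book
            "book_title": title,
            "section_number": position,
            "section_heading": heading,
            "text": text,
        })
    return docs


# Hypothetical sample data, not taken from any real book.
book_sections = [
    ("Chapter 1: Getting Started", "How to install and start the server."),
    ("Chapter 2: Indexing", "How to send documents to the index."),
]
docs = split_into_section_docs("book42", "A Hypothetical Guide", book_sections)
```

Each section then gets its own relevance score, and the shared book_id
field makes it possible to show which book a matching section came from.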

Just my 2 cents
-Doug

On Thu, Nov 3, 2016 at 9:57 AM Shawn Heisey <apa...@elyograg.org> wrote:

> On 11/3/2016 2:49 AM, Chien Nguyen wrote:
> > Hi everyone! I'm a newbie in using Apache Solr. I've read some
> > documents about it. But I can't answer some questions.
>
> Second reply, so I'm aiming for more detail.
>
> > 1. How many documents can Solr search at one time?
>
> A *single* Solr index has Lucene's limitation of slightly more than 2
> billion documents.  This is part of the problem solved by SolrCloud.  By
> throwing multiple machines/shards at the problem, there is effectively
> no limit to the size of a SolrCloud collection.  I have encountered
> someone who has a collection with five billion documents in it.
>
> That 2 billion document limit I mentioned, which is Java's
> Integer.MAX_VALUE, is the ONLY hard limit that I know of in the
> software, and only applies when the index is not sharded.
>
> > 2. Can Solr index the media data??
>
> I have no idea what you meant here, but if you mean metadata, Solr most
> likely can handle it.  If you meant actual media, like an image, I
> believe there is a binary field type that you can even store a full
> source document in, but that is not normally the way Solr is used, and I
> don't recommend it.
>
> > 3. What's the max size of document that Solr can index???
>
> I don't think there is a limit.  I think there are some limits on the
> number and size of individual terms, but not on the total size of a
> document.  If documents get particularly large and numerous, performance
> might suffer, but I am not aware of any total size limitations.
>
> Thanks,
> Shawn
>
>
