For general search use cases, it's usually not a good idea to index giant documents. A relevance score for an entire book is less meaningful than scores for its chapters or sections. Those subdivisions are also far more useful from a usability standpoint: the user learns not just that a book is relevant, but which section of the book is relevant to their query. A rough sketch of that approach is below.
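This is a minimal sketch, not a prescription: it indexes each section of a book as its own Solr document via the JSON update handler. The field names (book_id, section_title, content), the "books" collection, and the localhost URL are all assumptions to adapt to your own schema and deployment.

    # Minimal sketch: index sections, not whole books, as Solr documents.
    # Assumes Solr at localhost:8983 with a "books" collection; the field
    # names here are hypothetical and must match your schema.
    import requests

    SOLR_UPDATE_URL = "http://localhost:8983/solr/books/update?commit=true"

    def index_book_by_section(book_id, book_title, sections):
        """sections: list of (section_title, section_text) tuples."""
        docs = [
            {
                "id": f"{book_id}-{i}",        # unique key per section
                "book_id": book_id,            # lets you group hits by book
                "book_title": book_title,
                "section_title": section_title,
                "content": text,
            }
            for i, (section_title, text) in enumerate(sections)
        ]
        resp = requests.post(SOLR_UPDATE_URL, json=docs)
        resp.raise_for_status()

    index_book_by_section(
        "moby-dick",
        "Moby-Dick",
        [("Loomings", "Call me Ishmael. ..."),
         ("The Carpet-Bag", "I stuffed a shirt or two ...")],
    )

Keeping a book_id field on every section also lets you collapse or group results by book at query time if the user wants book-level hits.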
Just my 2 cents
-Doug

On Thu, Nov 3, 2016 at 9:57 AM Shawn Heisey <apa...@elyograg.org> wrote:
> On 11/3/2016 2:49 AM, Chien Nguyen wrote:
> > Hi everyone! I'm a newbie in using Apache Solr. I've read some
> > documents about it, but I can't answer some questions.
>
> Second reply, so I'm aiming for more detail.
>
> > 1. How many documents can Solr search at a moment?
>
> A *single* Solr index has Lucene's limitation of slightly more than 2
> billion documents. This is part of the problem solved by SolrCloud. By
> throwing multiple machines/shards at the problem, there is effectively
> no limit to the size of a SolrCloud collection. I have encountered
> someone who has a collection with five billion documents in it.
>
> That 2 billion document limit I mentioned, which is Java's
> Integer.MAX_VALUE, is the ONLY hard limit that I know of in the
> software, and it only applies when the index is not sharded.
>
> > 2. Can Solr index media data?
>
> I have no idea what you meant here, but if you mean metadata, Solr most
> likely can handle it. If you meant actual media, like an image, I
> believe there is a binary field type in which you can even store a full
> source document, but that is not normally the way Solr is used, and I
> don't recommend it.
>
> > 3. What's the max size of document that Solr can index?
>
> I don't think there is a limit. I think there are some limits on the
> number and size of individual terms, but not on the total size of a
> document. If documents get particularly large and numerous, performance
> might suffer, but I am not aware of any total size limitations.
>
> Thanks,
> Shawn
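To illustrate the sharding Shawn describes: because the ~2 billion (Integer.MAX_VALUE) limit applies per Lucene index, creating a SolrCloud collection with multiple shards makes the limit apply per shard rather than to the collection as a whole. The sketch below uses the real Collections API CREATE action, but the collection name, shard count, and config name are assumptions, as is the localhost node address.

    # Minimal sketch: create a 4-shard SolrCloud collection so each shard
    # is its own Lucene index with its own ~2 billion document ceiling.
    # Assumes a SolrCloud node at localhost:8983 and the "_default" configset.
    import requests

    resp = requests.get(
        "http://localhost:8983/solr/admin/collections",
        params={
            "action": "CREATE",
            "name": "bigcollection",        # hypothetical collection name
            "numShards": 4,                 # each shard = one Lucene index
            "replicationFactor": 2,
            "collection.configName": "_default",
        },
    )
    resp.raise_for_status()
    print(resp.json())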