Indexing should be fine, depending on your total document count. The potential issue is the FieldCache at query time. For string values, FieldCache memory should grow roughly linearly with the number of documents, the number of fields, and the number of unique terms per field. So, do two tests: index 1,000 docs and then 2,000 docs. For each, check Java memory usage after a simple query, then after a query that facets on a significant number of these fields, and then after a couple more queries that facet on a high number of distinct fields. Then scale those memory-use increments up to your expected document range; that should give you a semi-decent estimate of the memory the JVM will need.

Estimating the CPU requirement would be more complex, but the memory has to work out first. The index-size delta between 1,000 and 2,000 docs should also give you a number to scale up to the total index size, roughly, though it depends on the relative uniqueness of the field values.
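To make the extrapolation concrete, here is a minimal, hypothetical sketch of the arithmetic in plain Java. The heap numbers are placeholders; you would substitute readings of the Solr server JVM's used heap (e.g. from jconsole or the admin stats page) taken after the faceted queries, minus the reading taken after a simple query on the same index:

    // Hypothetical linear extrapolation of FieldCache heap from two test indexes.
    // All measured values below are placeholders, not real measurements.
    public class FieldCacheEstimate {
        public static void main(String[] args) {
            long docsSmall = 1000L;
            long docsLarge = 2000L;
            long targetDocs = 10000000L;              // expected production doc count (placeholder)

            long facetHeapSmall = 120L * 1024 * 1024; // measured at 1,000 docs (placeholder: 120 MB)
            long facetHeapLarge = 180L * 1024 * 1024; // measured at 2,000 docs (placeholder: 180 MB)

            // Per-document increment implied by the two measurements,
            // plus whatever fixed overhead doesn't scale with doc count.
            double bytesPerDoc = (double) (facetHeapLarge - facetHeapSmall) / (docsLarge - docsSmall);
            double fixedBytes = facetHeapSmall - bytesPerDoc * docsSmall;

            double estimate = fixedBytes + bytesPerDoc * targetDocs;
            System.out.printf("Estimated FieldCache heap at %,d docs: %.1f GB%n",
                    targetDocs, estimate / (1024.0 * 1024 * 1024));
        }
    }

The same two-point arithmetic works for the index-size-on-disk delta.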

-- Jack Krupansky

-----Original Message-----
From: Keswani, Nitin - BLS CTR
Sent: Monday, May 14, 2012 10:27 AM
To: solr-user@lucene.apache.org
Subject: RE: Documents With large number of fields

Unfortunately I never got any response. However, I did a POC with a document containing 400 fields and loaded around 1,000 docs onto my local machine. I didn't see any issues, but then again the document set was very small. Hopefully, as mentioned below, providing enough memory should help alleviate any performance issues.

Thanks.

Regards,

Nitin Keswani


-----Original Message-----
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Sunday, May 13, 2012 10:42 PM
To: solr-user@lucene.apache.org
Subject: Re: Documents With large number of fields

I didn't see any response. There was a similar issue recently, where someone had 400 faceted fields with 50-70 facets per query and was running out of memory due to accumulation of FieldCache entries for those faceted fields, but that was on a 3 GB system.

It probably could be done, assuming a fair number of 64-bit sharded machines.

-- Jack Krupansky

-----Original Message-----
From: Darren Govoni
Sent: Sunday, May 13, 2012 7:56 PM
To: solr-user@lucene.apache.org
Subject: Re: Documents With large number of fields

Was there a response to this?

On Fri, 2012-05-04 at 10:27 -0400, Keswani, Nitin - BLS CTR wrote:
Hi,

My data model consists of different types of data, and each data type has
its own characteristics.

If I include the unique characteristics of each type of data, a
single Solr document could end up containing 300-400 fields.

In order to drill down into this data set, I would have to provide
faceting on most of these fields so that I can narrow down to a very
small set of documents.
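For illustration, a drill-down query over several of these faceted fields might look like this in SolrJ (the field names and URL are made up; this is just a sketch of the query shape):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServerException;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class DrillDownSketch {
        public static void main(String[] args) throws SolrServerException {
            // Placeholder URL; HttpSolrServer is the SolrJ client as of Solr 3.6.
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");

            SolrQuery q = new SolrQuery("*:*");
            q.setFacet(true);
            q.setFacetMinCount(1);
            // Facet on a subset of the 300-400 fields (hypothetical names).
            q.addFacetField("typeA_status", "typeB_region", "typeC_category");
            // Each drill-down step adds a chosen facet value as a filter query.
            q.addFilterQuery("typeA_status:active");

            QueryResponse rsp = server.query(q);
            System.out.println("Matching docs: " + rsp.getResults().getNumFound());
        }
    }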

Here are some of my questions:

1) What's the best approach when dealing with documents with a large
number of fields? Should I keep a single document with a large number
of fields, or split my document into a number of smaller documents,
each consisting of a subset of the fields?

2) From an operational point of view, what are the drawbacks of having
a single document with a very large number of fields? Can Solr support
documents with a large number of fields (say 300 to 400)?


Thanks.

Regards,

Nitin Keswani

