Oh, and to make matters even more "interesting": for docValues=true fields there's no need to store anything at all. Fields that are docValues=true, stored=false can still be returned by listing them in the fl parameter.
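
For illustration only, a field defined like this would fit that description. This is a sketch, not taken from anyone's actual schema: the field name "event_id" is made up, and it assumes a "string" fieldType is defined elsewhere in the schema. It also assumes a Solr version recent enough to return docValues fields as if they were stored; the 4.8.1 mentioned further down the thread likely predates that.

```xml
<!-- schema.xml fragment; "event_id" is a hypothetical field name -->
<!-- stored="false": nothing for this field goes into the *.fdt/*.fdx stored-data files -->
<!-- docValues="true": the value is kept in the columnar docValues structure instead -->
<!-- As I understand it, useDocValuesAsStored controls whether fl=* also returns such fields -->
<field name="event_id" type="string" indexed="true" stored="false" docValues="true"/>
```

A query such as q=*:*&fl=id,event_id would then be expected to return event_id even though it is not stored.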

On Tue, Nov 15, 2016 at 1:53 AM, Prateek Jain J <prateek.j.j...@ericsson.com> wrote:
>
> Thanks a lot Erick
>
> Regards,
> Prateek Jain
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: 14 November 2016 09:14 PM
> To: solr-user <solr-user@lucene.apache.org>
> Subject: Re: index and data directories
>
> Theoretically, perhaps. And it's quite true that stored data for fields
> marked stored=true are just passed through verbatim and compressed on disk
> while the data associated with indexed=true fields go through an analysis
> chain and are stored in a much different format. However these different data
> are simply stored in files with different suffixes in a segment. So you might
> have _0.fdx, _0.fdt, _0.tim, _0.tvx etc. that together form a single segment.
>
> This is done on a per-segment basis. So certain segment files, namely the
> *.fdt and *.fdx files, will contain the stored data while other extensions have
> the indexed data, see: "File naming" here for a somewhat out of date format,
> but close enough for this discussion:
> https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/codecs/lucene40/package-summary.html
> And there's no option to store the *.fdt and *.fdx files independently from
> the rest of the segment files.
>
> This statement: "I mean documents which are to be indexed" really doesn't
> make sense. You send these things called Solr documents to be indexed, but
> they are just a set of fields with values handled as their definitions
> indicate (i.e. respecting stored=true|false, indexed=true|false,
> docValues=true|false). The Solr document sent by SolrJ is simply thrown away
> after processing into segment files.
>
> If you're sending semi-structured docs (say Word, PDF etc) to be indexed
> through Tika they are simply transformed into a Solr doc (set of field/value
> pairs) and the original document is thrown away as well. There's no option to
> store the original semi-structured doc either.
>
> Best,
> Erick
>
> On Mon, Nov 14, 2016 at 12:35 PM, Prateek Jain J
> <prateek.j.j...@ericsson.com> wrote:
>>
>> By data, I mean documents which are to be indexed. Some fields can be
>> stored="true" but that doesn’t matter.
>>
>> For example: App1 creates an object (AppObj) to be indexed and sends it to
>> SOLR via solrj. Some of the attributes of this object can be declared to be
>> used for storage.
>>
>> Now, my understanding is data and indexes generated on data are two separate
>> things. In my particular example, all fields have stored="true" but only
>> selected fields have indexed="true". My expectation is, indexes are stored
>> separately from data because indexes can be generated by different
>> techniques/algorithms but data/documents remain unchanged. Please correct me
>> if my understanding is not correct.
>>
>> Regards,
>> Prateek Jain
>>
>> -----Original Message-----
>> From: Erick Erickson [mailto:erickerick...@gmail.com]
>> Sent: 14 November 2016 07:05 PM
>> To: solr-user <solr-user@lucene.apache.org>
>> Subject: Re: index and data directories
>>
>> The question is pretty opaque. What do you mean by "data" as opposed to
>> "indexes"? Are you talking about where Lucene puts stored="true"
>> fields? If not, what do you mean by "data"?
>>
>> If you are talking about where Lucene puts the stored="true" bits then no,
>> there's no way to segregate that out from the other files that make up a
>> segment.
>>
>> Best,
>> Erick
>>
>> On Mon, Nov 14, 2016 at 7:58 AM, Prateek Jain J
>> <prateek.j.j...@ericsson.com> wrote:
>>>
>>> Hi Alex,
>>>
>>> I am unable to get it correctly. Is it possible to store indexes and data
>>> separately?
>>>
>>> Regards,
>>> Prateek Jain
>>>
>>> -----Original Message-----
>>> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
>>> Sent: 14 November 2016 03:53 PM
>>> To: solr-user <solr-user@lucene.apache.org>
>>> Subject: Re: index and data directories
>>>
>>> solr.xml also has a bunch of properties under the core tag:
>>>
>>> <cores adminPath="/admin/cores">
>>>   <core name="core0" instanceDir="core0">
>>>     <property name="dataDir" value="/data/core0"/>
>>>   </core>
>>>   <core name="core1" instanceDir="core1"/>
>>> </cores>
>>>
>>> You can get the Reference Guide for your specific version here:
>>> http://archive.apache.org/dist/lucene/solr/ref-guide/
>>>
>>> Regards,
>>> Alex.
>>> ----
>>> Solr Example reading group is starting November 2016, join us at
>>> http://j.mp/SolrERG
>>> Newsletter and resources for Solr beginners and intermediates:
>>> http://www.solr-start.com/
>>>
>>> On 15 November 2016 at 02:37, Prateek Jain J <prateek.j.j...@ericsson.com>
>>> wrote:
>>>>
>>>> Hi All,
>>>>
>>>> We are using solr 4.8.1 and would like to know if it is possible to
>>>> store data and indexes in separate directories? I know the following tag
>>>> exists in the solrconfig.xml file:
>>>>
>>>> <!-- Data Directory
>>>>      Used to specify an alternate directory to hold all index data
>>>>      other than the default ./data under the Solr home. If replication
>>>>      is in use, this should match the replication configuration. -->
>>>> <dataDir>C:/del-it/solr/cm_events_nbi/data</dataDir>
>>>>
>>>> Regards,
>>>> Prateek Jain