No, you cannot ignore the schema. If you try to add a field that is not in the schema, you get an error. You can, however, use any arbitrary subset of the fields defined in the schema for any particular *document* in the index. Say your schema had fields f1, f2, f3...f10. You could have fields f1-f5 in one doc, fields f6-f10 in another, f1, f4, and f9 in another, and so on.
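
As a rough illustration of that rule, here is a toy sketch in plain Python. It does not call Solr or use its API; `SCHEMA_FIELDS` and `check_document` are made-up names, and the sketch assumes a schema with only the explicitly declared fields f1-f10.

```python
# Toy model only: this mimics the rule in plain Python and does not call Solr.
# SCHEMA_FIELDS and check_document are hypothetical names for illustration.
SCHEMA_FIELDS = frozenset(f"f{i}" for i in range(1, 11))  # f1 .. f10


def check_document(doc: dict) -> dict:
    """Return doc unchanged if every field name is in the schema, else raise."""
    unknown = sorted(set(doc) - SCHEMA_FIELDS)
    if unknown:
        raise ValueError(f"unknown field(s): {', '.join(unknown)}")
    return doc


# Three documents, each using a different subset of the schema's fields.
docs = [
    {f"f{i}": "x" for i in range(1, 6)},    # f1-f5
    {f"f{i}": "x" for i in range(6, 11)},   # f6-f10
    {"f1": "x", "f4": "x", "f9": "x"},      # f1, f4, f9
]
accepted = [check_document(d) for d in docs]

# A field that is not in the schema is rejected.
try:
    check_document({"f1": "x", "f11": "x"})
    rejected = False
except ValueError:
    rejected = True
```

All three documents pass even though no two of them share the same set of fields; only the one naming f11 fails.
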
The only field(s) that *must* be in a document are the required="true" fields. There's no real penalty for omitting fields from particular documents. This allows you to store "special" documents that aren't part of normal searches. You could, for instance, use a document to store meta-information about your index, with whatever meaning you wanted, in a field(s) that *no* other document had. Your app could then read that "special" document and make use of that info. Searches on "normal" documents wouldn't return that doc, etc.

You could effectively have N indexes contained in one index, where the documents in each logical sub-index had fields disjoint from those of the other logical sub-indexes. Why you'd do something like that rather than use cores is a very good question, but you *could* do it that way...

All this is much different from a database, where there are penalties for defining a large number of unused fields.

Whether doing this is wise or not given the particular problem you're trying to solve is another discussion <G>..

Best
Erick

On Mon, Dec 20, 2010 at 11:03 PM, Dennis Gearon <gear...@sbcglobal.net> wrote:

> Based on more searches and manual consolidation, I've put together some of the ideas for this already suggested in a summary below. The last item in the summary seems to be an interesting, low technical cost way of doing it.
>
> Basically, it treats the index like a 'BigTable', a la "No SQL".
>
> Erick Erickson pointed out:
> "...but there's absolutely no requirement that all documents in SOLR have the same fields..."
>
> I guess I don't have the right understanding of what goes into a Document in Solr. Is it just a set of fields, each with its own independent field type declaration/id, its name, and its content?
>
> So even though there's a schema for an index, one could ignore it and just throw any other named fields and types and content at document addition time?
>
> So if I wanted to search on a base set, all documents having it, I could then additionally filter based on the (might be wrong use of this) dynamic fields?
>
> Original thread that I started:
> ----------------------------------------
> http://lucene.472066.n3.nabble.com/A-schema-inside-a-Solr-Schema-Schema-in-a-can-tt2103260.html
>
> -----------------------------------------------------------------------------------------------------
> Repeat of the problem (not actual ratios or numbers, i.e. it could be WORSE!):
> -----------------------------------------------------------------------------------------------------
>
> 1/ Base object of some kind, x number of fields
> 2/ Derived objects representing divisions in a company, different customer bases, etc., each having 2 additional, unique fields.
> 3/ Assume 1000 such derived object types
> 4/ A 'flattened' index would have the x base object fields, ****and 2000**** additional fields
>
> ================================================
> Solutions Posited
> -----------------------
>
> A/ First thought: multi-value columns as key pairs.
>    1/ Difficult to access individual items of more than one 'word' length for querying in multivalued fields.
>    2/ All sorts of statistical stuff probably wouldn't apply?
>    3/ (James Dayer said:) There's also one "gotcha" we've experienced when searching across multi-valued fields: SOLR will match across field occurrences. In the example below, if you were to search q=contrib_name:(james AND smith), you will get this record back. It matches one name from one contributor and another name from a different contributor. This is not what our users want.
>
>       As a work-around, I am converting these to phrase queries with slop: "james smith"~50 ... Just use a slop # smaller than your positionIncrementGap and bigger than the # of terms entered.
>       This will prevent the cross-field matches yet allow the words to occur in any order.
>
>       The problem with this approach is that Lucene doesn't support wildcards in phrases.
>
> B/ Dynamic fields were suggested, but I am not sure exactly how they work, and the person who suggested them was not sure they would work, either.
> C/ Different field naming conventions were suggested if field types were similar. I can't predict that.
> D/ Found this old thread, and it had other suggestions:
>    1/ Use multiple cores, one for each record type/schema, and aggregate them during the query.
>    2/ Use a fixed number of additional fields X 2. Each additional field is actually a pair of fields. The first of the pair gives the column name, the second gives the data.
>       a) Although I like this, I wonder how many extra fields to use.
>       b) It was pointed out that relevancy and other statistical criteria for queries might suffer.
>    3/ Index the different objects exactly as they are, i.e. as Erick Erickson said:
>       "I'm not entirely sure this is germane, but there's absolutely no requirement that all documents in SOLR have the same fields. So it's possible for you to index the "wildly different content" in "wildly different fields" <G>. Then searching for screen:LCD would be straightforward."...
>
> Dennis Gearon
>
> Signature Warning
> ----------------
> It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
>
> EARTH has a Right To Life,
> otherwise we all die.