Hi, (should this be on the solr-dev mailing list?)
I have this kind of data, about articles in newspapers:

article A-001
  . published on 2010-10-31, in newspaper "N-1", edition "E1"
  . published on 2010-10-30, in newspaper "N-2", edition "E2"
article A-002
  . published on 2010-10-30, in newspaper "N-1", edition "E1"

I have to be able to search on those "sub-fields", e.g.: all articles published on 2010-10-30 in newspaper "N-1" (all editions). I expect to find document A-002, but not document A-001.

I control the indexing, analyzers, ... but I would like to use the standard Solr query syntax (or an extension of it).

If I index those documents:

<add>
  <doc>
    <field name="id">A-001</field>
    <field name="pubDate">2010-10-31</field>
    <field name="ns">N-1</field>
    <field name="ed">E1</field>
    <field name="pubDate">2010-10-30</field>
    <field name="ns">N-2</field>
    <field name="ed">E2</field>
  </doc>
  <doc>
    <field name="id">A-002</field>
    <field name="pubDate">2010-10-30</field>
    <field name="ns">N-1</field>
    <field name="ed">E1</field>
  </doc>
</add>

(i.e. flattening the structure and losing the link between newspapers and dates), then a search for "pubDate=2010-10-30 AND ns=N-1" will give me both documents, because A-001 has been published in newspaper N-1 (on another date) and has been published on 2010-10-30 (but in another newspaper).

Is there any way to index the data / express the search / ... so that only document "A-002" is found?

In Solr terms, I believe this is a multi-valued "poly" field (not yet available in the current stable version, 1.4...). Will this be supported in the next release? (With what syntax?)

Some ideas I've had (usable with Solr 1.4):

(1) Add fields like this for doc A-001:

  <field name="combined">N-1/E1/2010-10-31</field>
  <field name="combined">N-2/E2/2010-10-30</field>

and run a wildcard search such as "N-1/*/2010-10-30". This will work for simple queries, but:
  . I think it will not allow range queries, e.g. "all articles published in newspaper N-1 between 2009-08-01 and 2010-10-15";
  . a wildcard query on N-1/E2/* will be very inefficient;
  . writing queries will be more difficult (sometimes the user has to use the field "ns", sometimes the field "combined", ...).

(2) Run the simple query "pubDate=2010-10-30 AND ns=N-1", then filter the results (the query above returns all the correct results, plus some extra ones). This is not a generic solution, and writing the filter will be difficult if the query is more complex, e.g.:

  (pubDate=2010-10-31 AND ns=N-1) OR (text contains "Barack")

(3) On the same field as in (1), use an analyzer that cheats the proximity search by emitting the following terms (with these positions):

  term 1:  "ns:N-1"
  term 2:  "ed:E1"
  term 3:  "pubDate:2010-10-31"
  term 11: "ns:N-2"
  term 12: "ed:E2"
  term 13: "pubDate:2010-10-30"
  ...

Then a proximity search, written as a sloppy phrase (assuming the query-time analyzer keeps each "ns:N-1"-style token intact):

  combined:"ns:N-1 pubDate:2010-10-30"~3

will give me only document A-002, not document A-001. Again, this will cause problems with range queries, won't it?

Isn't there a better way to do this? Ideally, I would index this (with my own syntax...):

<doc>
  <field name="id">A-001</field>
  <field name="pubDate" set="1">2010-10-31</field>
  <field name="ns" set="1">N-1</field>
  <field name="ed" set="1">E1</field>
  <field name="pubDate" set="2">2010-10-30</field>
  <field name="ns" set="2">N-2</field>
  <field name="ed" set="2">E2</field>
</doc>

and then search:

  (pubDate=2010-10-31 AND ns=N-1){sameSet}

or something like that...

I've found references to similar questions, but no answer I could use in my case (this one is the closest to my problem: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201001.mbox/%3c9b742a34aa31814594f2bed8dfd9cceec96ca...@sparky.office.techtarget.com%3e or *http://tinyurl.com/3527w4u*).

Thanks in advance for your ideas! (And sorry for any English mistakes.)
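To make the problem and idea (3) concrete, here is a toy in-memory model in Python. It is NOT Solr or Lucene code, just a sketch under stated assumptions: the tuple layout, the gap of 10 positions between publication sets (taken from the term numbering 1-3 / 11-13 in idea (3); as far as I know, this is the role the `positionIncrementGap` attribute of a Solr field type plays between the values of a multi-valued field), and the simplified proximity check are all illustrative assumptions. The function names are hypothetical.

```python
"""Toy in-memory model (NOT Solr/Lucene) of the flattening problem and of idea (3)."""
from typing import Dict, List, Tuple

Publication = Tuple[str, str, str]  # (newspaper, edition, YYYY-MM-DD date)

# Sample data from the e-mail: one tuple per publication of an article.
ARTICLES: Dict[str, List[Publication]] = {
    "A-001": [("N-1", "E1", "2010-10-31"), ("N-2", "E2", "2010-10-30")],
    "A-002": [("N-1", "E1", "2010-10-30")],
}

# Assumed gap between publication sets (idea (3) numbers terms 1-3, then 11-13).
# It must be larger than slop + 2 so tokens of different sets never fall within the slop.
GAP = 10


def flattened_match(ns: str, pub_date: str) -> List[str]:
    """Multi-valued fields indexed independently: the ns/date link is lost."""
    hits = []
    for doc_id, pubs in ARTICLES.items():
        all_ns = {p[0] for p in pubs}
        all_dates = {p[2] for p in pubs}
        if ns in all_ns and pub_date in all_dates:
            hits.append(doc_id)
    return hits


def token_positions(pubs: List[Publication]) -> Dict[str, List[int]]:
    """Emit idea-(3) tokens, leaving GAP positions between publication sets."""
    index: Dict[str, List[int]] = {}
    for set_no, (ns, ed, date) in enumerate(pubs):
        base = set_no * GAP + 1
        for offset, tok in enumerate((f"ns:{ns}", f"ed:{ed}", f"pubDate:{date}")):
            index.setdefault(tok, []).append(base + offset)
    return index


def proximity_match(tok_a: str, tok_b: str, slop: int) -> List[str]:
    """Both tokens within `slop` positions of each other, in any order.

    A simplified stand-in for a sloppy phrase query; Lucene's real slop
    arithmetic (which also depends on term order) differs in detail.
    """
    hits = []
    for doc_id, pubs in ARTICLES.items():
        idx = token_positions(pubs)
        if any(abs(a - b) <= slop
               for a in idx.get(tok_a, [])
               for b in idx.get(tok_b, [])):
            hits.append(doc_id)
    return hits


def same_set_match(ns: str, date_from: str, date_to: str) -> List[str]:
    """Intended {sameSet} semantics: every condition holds on ONE publication.

    YYYY-MM-DD strings sort chronologically, so a date range is a plain comparison.
    """
    return [doc_id for doc_id, pubs in ARTICLES.items()
            if any(p[0] == ns and date_from <= p[2] <= date_to for p in pubs)]


if __name__ == "__main__":
    print(flattened_match("N-1", "2010-10-30"))                     # ['A-001', 'A-002']
    print(proximity_match("ns:N-1", "pubDate:2010-10-30", slop=3))  # ['A-002']
    print(same_set_match("N-1", "2010-10-30", "2010-10-30"))        # ['A-002']
    print(same_set_match("N-1", "2010-10-01", "2010-10-31"))        # ['A-001', 'A-002']
```

The first call reproduces the false positive of the flattened index. The second shows why idea (3) drops A-001: its "ns:N-1" and "pubDate:2010-10-30" tokens end up 12 positions apart. The last two spell out the intended {sameSet} semantics, including a date range, which is exactly what the token-based ideas (1) and (3) make hard.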