Re: "Avoiding" a schema.xml

Erick Erickson Sun, 03 May 2015 09:59:16 -0700

OK, I don't think you actually need the managed schema stuff (although
you could use it).


So, you're analyzing these docs and making guesses (educated guesses,
probably very sophisticated guesses, but guesses) about what kind of
thing each field is (numeric, name, city, concept, whatever).

You can simply define a bunch of canonical dynamic fields, each defined
however you want; there are samples in the schema.xml files in the
distro. Then, when you're indexing, just map each of your fields onto
one of the dynamic field definitions.
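
For what it's worth, here is a minimal sketch of that mapping step in
plain Java (JDK only, no SolrJ calls). The DynamicFieldMapper class, the
Kind enum and the method names are made up for illustration, and the
suffixes (_i, _d, _dt, _s, _t) assume dynamic fields like the ones in the
sample schema.xml; adjust them to whatever your schema actually defines.

```java
import java.util.EnumMap;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: turns "a field name plus a guessed kind" into the
// name of a dynamic field that the schema is assumed to define.
class DynamicFieldMapper {

    // The kinds of thing the document analysis might guess (illustrative).
    enum Kind { INTEGER, DOUBLE, DATE, EXACT_STRING, FREE_TEXT }

    // Assumed suffixes, modeled on the sample schema.xml (*_i, *_d, ...).
    private static final Map<Kind, String> SUFFIX = new EnumMap<>(Kind.class);
    static {
        SUFFIX.put(Kind.INTEGER, "_i");
        SUFFIX.put(Kind.DOUBLE, "_d");
        SUFFIX.put(Kind.DATE, "_dt");
        SUFFIX.put(Kind.EXACT_STRING, "_s");
        SUFFIX.put(Kind.FREE_TEXT, "_t");
    }

    // Normalize the extracted name and append the suffix for its kind.
    static String toDynamicField(String name, Kind kind) {
        String base = name.trim()
                .toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9]+", "_")   // anything odd becomes "_"
                .replaceAll("^_+|_+$", "");      // no leading/trailing "_"
        return base + SUFFIX.get(kind);
    }

    // Rename every extracted field; fields without a guess fall back to
    // free text, which is an arbitrary choice made for this sketch.
    static Map<String, Object> toSolrFields(Map<String, Object> values,
                                            Map<String, Kind> guesses) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : values.entrySet()) {
            Kind kind = guesses.getOrDefault(e.getKey(), Kind.FREE_TEXT);
            out.put(toDynamicField(e.getKey(), kind), e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> values = new LinkedHashMap<>();
        values.put("City Name", "Haifa");
        values.put("population", 280000);

        Map<String, Kind> guesses = new LinkedHashMap<>();
        guesses.put("City Name", Kind.EXACT_STRING);
        guesses.put("population", Kind.INTEGER);

        System.out.println(toSolrFields(values, guesses));
    }
}
```

Each resulting name/value pair can then go into a SolrInputDocument via
addField(name, value), and the server never needs to learn about a new
field ahead of time.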

HTH,
Erick

On Sat, May 2, 2015 at 2:33 PM, Sznajder ForMailingList
<bs4mailingl...@gmail.com> wrote:
> Thanks!
>
> Indeed, one of my issues is that I cannot know which fields will be
> indexed before seeing (and running some entity extraction on) the browsed
> documents.
> That is the reason I thought of avoiding the schema definition ...
>
> The schema API sounds interesting! Is it available via SolrJ?
>
> Many thanks!
>
> Benjamin
>
> On Thu, Apr 30, 2015 at 6:27 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> Could you explain a bit more _why_ you want to do this? As you're
>> probably well aware, there are multiple ways to shoot yourself in the
>> foot in lower-level Lucene.
>>
>> If you have some situation where you're creating indexes on the fly
>> and their fields may vary, then you could consider the "managed
>> schema", which lets you create a schema via API calls. Then you
>> wouldn't need to mess with editing the schema.xml file, for instance.
>>
>> Best,
>> Erick
>>
>> On Thu, Apr 30, 2015 at 8:12 AM, Shawn Heisey <apa...@elyograg.org> wrote:
>> > On 4/30/2015 8:43 AM, Sznajder ForMailingList wrote:
>> >> I am interested in indexing some documents in Solr, as I did in Lucene.
>> >>
>> >> I mean: giving via SolrJ all the information about the field I am adding
>> >> (tokenized, stored, faceted, etc.)
>> >>
>> >> Can we do that? Or is it mandatory to define a schema on the collection?
>> >
>> > All that information is defined on the server.  You do not have direct
>> > access to the Lucene index - Solr is intended as an abstraction, so the
>> > admin and the users/applications that use Solr do not need to understand
>> > all the low-level details that go into a Lucene application.  The admin
>> > just has to deal with configuration files like schema.xml, and the users
>> > just need to know what fields are in each document and how the query
>> > syntax works.  Deeper Lucene knowledge is helpful, but not strictly
>> > necessary.
>> >
>> > If you want Lucene-level control, you'll need to write the search server
>> > yourself using Lucene.  If you have very specific needs that Solr's
>> > approach can't satisfy, you always have this option.
>> >
>> > The newest Solr versions do have an example of what's known as a
>> > "data-driven" schema, or schemaless mode.  In this mode, Solr builds up
>> > the schema automatically, guessing the field type based on what kind of
>> > data is the first to arrive for each field.  This is good for
>> > prototyping, but for production use, I would want to be in full manual
>> > control of the schema.
>> >
>> > Thanks,
>> > Shawn
>> >
>>
