bq: So I want to allow people to upload any CSV/XML/JSON to solr they want so having a predefined schema isn't going to cut it....
Piling on to Shawn's excellent comments... I would really advise against this. Sure, you could make everything a text field using the * catch-all, but...

If a field in that CSV/XML/JSON is an integer, they can't do numeric range queries. If it's a date, they can't do date math. If it's French, they can't get French stopwords. Or accent folding. Or accent keeping. And if it's a language that doesn't break on whitespace or doesn't read left-to-right, the search experience will be very poor, if it works at all.

So what you're looking at is a long list of dynamic fields with suffixes, as Shawn advised. Possibly multiple variants of text (e.g. *_txt_en, *_txt_fr and the like) if you mean to support more languages than just English.

Schemaless isn't particularly suited for this case. As Shawn indicated, Solr will try to guess. So if the first time it sees a field the value is an integer and the next time it's a float, indexing will fail.

Best,
Erick

On Sat, Jul 18, 2015 at 9:04 AM, Shawn Heisey <apa...@elyograg.org> wrote:
> On 7/18/2015 9:49 AM, Charlie Hubbard wrote:
>> So I want to allow people to upload any CSV/XML/JSON to solr they want, so
>> having a predefined schema isn't going to cut it. After reading about my
>> options I figured my choices were schema-less mode and dynamic fields using
>> the * with a type other than ignore. I know the docs say schema-less isn't
>> something for production, but it seems like that is changing if I read
>> between the lines. With dynamic fields can I still use the schema API to
>> describe all of the fields that have been indexed?
>>
>> I like how easy dynamic fields are to configure, so what are the pros and
>> cons of both?
>
> Dynamic fields are a reasonable way to do *some* things. Using a full
> wildcard of "*" for them is generally NOT a good way to do it, although
> it will work. It's better to do things like "*_i" for integer, "*_s"
> for string, etc.
>
> Schemaless mode has some inherent risks.
> You are asking Solr to *guess* what fieldType the new field will get...
> if Solr guesses wrong, you can't fix it without manually modifying the
> schema, which will almost certainly require a reindex.
>
> https://wiki.apache.org/solr/HowToReindex
>
> Schemaless mode is great during prototyping and initial setup, but I
> personally would not want to run in that mode in production. There's
> nothing *wrong* with doing so, but I would not want my production schema
> to change because the data guys added a new field and didn't tell me. I
> would rather have the indexing fail loudly so everyone is aware that the
> config needs attention. At that point, I can fix the config, and I will
> know it's fixed correctly. A reindex might *still* be required after
> the change, of course.
>
> Your situation sounds like it might be a little different than mine. If
> it were me, I would require that the users conform their field names to
> something like the *_i and *_s that I mentioned above, and use dynamic
> fields. The schema is still completely static and behavior is entirely
> predictable.
>
> Thanks,
> Shawn
>
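A minimal sketch of Shawn's suggestion, assuming the check runs client-side before documents are sent to Solr: every incoming field name must end in a known dynamic-field suffix, and each value must fit that suffix's type. Anything else is reported loudly instead of being guessed, which also catches Erick's integer-then-float case. The suffix table (`SUFFIX_TYPES`) and the function name `check_document` are hypothetical; the suffixes just follow the *_i / *_s / *_txt_en style from this thread, and the real list must match whatever dynamic fields your schema actually defines.

```python
from datetime import datetime


def _is_int(v):
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(v, int) and not isinstance(v, bool)


def _is_float(v):
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def _is_date(v):
    # Simplified: accepts only ISO-8601 UTC without fractional seconds,
    # e.g. "2015-07-18T09:04:00Z"
    if not isinstance(v, str):
        return False
    try:
        datetime.strptime(v, "%Y-%m-%dT%H:%M:%SZ")
        return True
    except ValueError:
        return False


# Hypothetical suffix -> validator table; your schema is the real source of truth.
SUFFIX_TYPES = {
    "_i": _is_int,
    "_f": _is_float,
    "_dt": _is_date,
    "_s": lambda v: isinstance(v, str),
    "_txt_en": lambda v: isinstance(v, str),
    "_txt_fr": lambda v: isinstance(v, str),
}


def check_document(doc):
    """Return a list of problems; an empty list means the document conforms."""
    problems = []
    # Try longer suffixes first, mirroring Solr's rule that longer
    # dynamic-field patterns take precedence over shorter ones.
    suffixes = sorted(SUFFIX_TYPES, key=len, reverse=True)
    for name, value in doc.items():
        if name == "id":
            continue  # assumes "id" is the explicitly declared uniqueKey
        suffix = next((s for s in suffixes
                       if name.endswith(s) and len(name) > len(s)), None)
        if suffix is None:
            problems.append(f"{name}: no known dynamic-field suffix")
        elif not SUFFIX_TYPES[suffix](value):
            problems.append(f"{name}: value {value!r} does not fit {suffix}")
    return problems


if __name__ == "__main__":
    doc = {"id": "1", "count_i": 3.5, "title_txt_en": "Hello", "color": "red"}
    for problem in check_document(doc):
        print(problem)
```

Run against the sample document, this reports `count_i: value 3.5 does not fit _i` and `color: no known dynamic-field suffix`, so the upload can be rejected with a clear message and the schema itself stays static.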