OK, this is bizarre. To be in SolrCloud you'd have had to specify the -zkRun or -zkHost option when you started Solr, which is highly unlikely. If you were, the admin page would have a "Cloud" link on the left side, and I really doubt one's there.
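If you'd rather check from a script than eyeball the admin page, here's a minimal sketch. It assumes Solr is listening on localhost:8983 and that the system info handler at /solr/admin/info/system reports a "mode" value ("std" for standalone, "solrcloud" for cloud), which is what I'd expect on a 5.x install. The URL, port, and function names are my assumptions, so adjust for your setup.

```python
import json
from urllib.request import urlopen

# Assumed location of the Solr system info handler; adjust host/port as needed.
SYSTEM_INFO_URL = "http://localhost:8983/solr/admin/info/system?wt=json"


def is_cloud_mode(system_info):
    """Return True if a parsed system-info response reports SolrCloud mode."""
    # Assumption: the "mode" key is "std" (standalone) or "solrcloud".
    return str(system_info.get("mode", "")).lower() == "solrcloud"


def fetch_system_info(url=SYSTEM_INFO_URL):
    """Fetch and parse the system info response (needs a running Solr)."""
    with urlopen(url, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    info = fetch_system_info()
    print("SolrCloud" if is_cloud_mode(info) else "standalone (or mode not reported)")
```

If the response has no "mode" key at all, the check falls back to "not cloud", so treat a negative answer as a hint rather than proof.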
You should have a data directory; it should be the parent of the index and tlog directories.

As a sanity check, try looking at the analysis page. Type a bunch of words in the left-hand indexing box and uncheck the verbose box. As you can tell, I'm grasping at straws.

I'm still puzzled why you don't have a "data" directory here, but that shouldn't really matter. How did you create this index? I don't mean the data import handler; I mean, how did you create the core that you're indexing to?

Best,
Erick

On Wed, Sep 23, 2015 at 10:16 AM, Mark Fenbers <mark.fenb...@noaa.gov> wrote:

> On 9/23/2015 12:30 PM, Erick Erickson wrote:
>
>> Then my next guess is you're not pointing at the index you think you are
>> when you 'rm -rf data'
>>
>> Just ignore the Elall field for now I should think, although get rid of it
>> if you don't think you need it.
>>
>> DIH should be irrelevant here.
>>
>> So let's back up.
>> 1> go ahead and "rm -fr data" (with Solr stopped).
>>
> I have no "data" dir. Did you mean "index" dir? I removed 3 index
> directories (2 for spelling):
> cd /localapps/dev/eventLog; rm -rfv index solr/spFile solr/spIndex
>
>> 2> start Solr
>> 3> do NOT re-index.
>> 4> look at your index via the schema-browser. Of course there should be
>> nothing there!
>>
> Correct! It said "there is no term info :("
>
>> 5> now kick off the DIH job and look again.
>>
> Now it shows a histogram, but most of the "terms" are long -- the full
> texts of (the table.column) eventlogtext.logtext, including the whitespace
> (with %0A used for newline characters)... So, it appears it is not being
> tokenized properly, correct?
>
>> Your logtext field should have only single tokens. The fact that you have
>> some very long tokens (presumably with whitespace) indicates that you
>> aren't really blowing the index away between indexing.
>>
> Well, I did this time for sure. I verified that initially, because it
> showed there was no term info until I DIH'd again.
>
>> Are you perhaps in Solr Cloud with more than one replica?
>>
> Not that I know of, but being new to Solr, there could be things going on
> that I'm not aware of. How can I tell? I certainly didn't set anything up
> for solrCloud deliberately.
>
>> In that case you might be getting the index replicated on startup
>> assuming you didn't blow away all replicas. If you are in SolrCloud, I'd
>> just delete the collection and start over, after ensuring that you'd
>> pushed the configset up to Zookeeper.
>>
>> BTW, I always look at the schema.xml file from the Solr admin window just
>> as a sanity check in these situations.
>>
> Good idea! But the one shown in the browser is identical to the one I've
> been editing! So that's not an issue.
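On the long "terms" described in the quoted thread: one quick way to confirm the symptom is to scan a list of terms (copied by hand from the schema browser, say) for embedded whitespace or the %0A newline escape Mark mentions. This is only a minimal sketch; the function name and the sample terms are made up for illustration, and it doesn't talk to Solr at all.

```python
def untokenized_terms(terms):
    """Return the terms that contain whitespace or an escaped newline (%0A)."""
    suspicious = []
    for term in terms:
        has_whitespace = any(ch.isspace() for ch in term)
        has_escaped_newline = "%0a" in term.lower()
        if has_whitespace or has_escaped_newline:
            suspicious.append(term)
    return suspicious


if __name__ == "__main__":
    # Hypothetical sample terms, not taken from a real index.
    sample = ["river", "flood warning issued%0Afor the county", "gauge"]
    print(untokenized_terms(sample))
```

If a properly tokenized text field is in play, I'd expect this to come back empty; anything it returns points at the field type or analyzer not being applied.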