On 9/23/2015 12:30 PM, Erick Erickson wrote:
Then my next guess is you're not pointing at the index you think you are
when you 'rm -rf data'

Just ignore the Elall field for now I should think, although get rid of it
if you don't think you need it.

DIH should be irrelevant here.

So let's back up.
1> go ahead and "rm -fr data" (with Solr stopped).
I have no "data" dir. Did you mean "index" dir? I removed 3 index directories (2 for spelling):
cd /localapps/dev/eventLog; rm -rfv index solr/spFile solr/spIndex
2> start Solr
3> do NOT re-index.
4> look at your index via the schema-browser. Of course there should be
nothing there!
Correct!  It said "there is no term info :("
5> now kick off the DIH job and look again.
Now it shows a histogram, but most of the "terms" are long -- the full texts of (the table.column) eventlogtext.logtext, including the whitespace (with %0A used for newline characters)... So, it appears it is not being tokenized properly, correct?
Your logtext field should have only single tokens. The fact that you have
some very long tokens (presumably with whitespace) indicates that you aren't
really blowing the index away between indexing runs.
Well, I did this time for sure. I verified that initially, because it showed there was no term info until I DIH'd again.
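
One way to double-check the "long terms" observation outside the schema browser is to pull the top terms for the field and flag any that contain whitespace. The following is only a sketch under assumptions: it uses Solr's Luke request handler (/admin/luke) with wt=json and assumes the response lists topTerms as a flat [term, count, term, count, ...] array. The base URL and the core name "eventLog" are hypothetical placeholders, not taken from this thread.

```python
import json
import re
import urllib.parse
import urllib.request


def suspicious_terms(top_terms):
    """Return the terms that look untokenized.

    top_terms is assumed to be a flat list: [term, count, term, count, ...].
    A term is flagged if it contains whitespace or a literal "%0A".
    """
    terms = top_terms[0::2]  # every other entry is a term, the rest are counts
    return [t for t in terms if re.search(r"\s|%0A", str(t), re.IGNORECASE)]


def fetch_top_terms(base_url, core, field, num_terms=20):
    """Ask the Luke handler for the top terms of one field.

    base_url and core are placeholders; adjust them to the local install.
    """
    query = urllib.parse.urlencode(
        {"fl": field, "numTerms": num_terms, "wt": "json"}
    )
    url = f"{base_url}/{core}/admin/luke?{query}"
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)
    return data["fields"][field].get("topTerms", [])
```

With Solr running, something like suspicious_terms(fetch_top_terms("http://localhost:8983/solr", "eventLog", "logtext")) should come back empty if the field is really being tokenized on whitespace.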
Are you perhaps in Solr Cloud with more than one replica?
Not that I know of, but being new to Solr, there could be things going on that I'm not aware of. How can I tell? I certainly didn't set anything up for SolrCloud deliberately.
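
As a possible answer to "How can I tell?", here is a minimal sketch, assuming the node-level system info endpoint (/solr/admin/info/system?wt=json) is reachable and reports a "mode" value of "solrcloud" for SolrCloud or "std" for a standalone node. The default base URL is a hypothetical placeholder.

```python
import json
import urllib.request


def is_solrcloud(system_info):
    """Return True if the parsed system info reports SolrCloud mode.

    Assumes the response carries a top-level "mode" key; a missing key
    is treated as "not SolrCloud".
    """
    return system_info.get("mode") == "solrcloud"


def fetch_system_info(base_url="http://localhost:8983/solr"):
    """Fetch and parse the system info response (base_url is a placeholder)."""
    url = f"{base_url}/admin/info/system?wt=json"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)
```

Calling is_solrcloud(fetch_system_info()) against the running instance would then give a yes/no answer. The admin UI is another hint: a "Cloud" entry in the left-hand menu normally appears only in SolrCloud mode.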
In that case you might be getting the index replicated on startup, assuming
you didn't blow away all replicas. If you are in SolrCloud, I'd just delete
the collection and start over, after ensuring that you'd pushed the configset
up to ZooKeeper.

BTW, I always look at the schema.xml file from the Solr admin window just as
a sanity check in these situations.
Good idea! But the one shown in the browser is identical to the one I've been editing! So that's not an issue.
