Then my next guess is you're not pointing at the index you think you are when you 'rm -rf data'.

Just ignore the ELall field for now I should think, although get rid of it
if you don't think you need it.

DIH should be irrelevant here.

So let's back up.
1> go ahead and "rm -fr data" (with Solr stopped).
2> start Solr
3> do NOT re-index.
4> look at your index via the schema-browser. Of course there should be
   nothing there!
5> now kick off the DIH job and look again.

Your logtext field should have only single tokens. The fact that you have
some very long tokens (presumably with whitespace) indicates that you
aren't really blowing the index away between indexing runs.

Are you perhaps in SolrCloud with more than one replica? In that case you
might be getting the index replicated on startup, assuming you didn't blow
away all replicas. If you are in SolrCloud, I'd just delete the collection
and start over, after ensuring that you'd pushed the configset up to
Zookeeper.

BTW, I always look at the schema.xml file from the Solr admin window just
as a sanity check in these situations.

Best,
Erick

On Wed, Sep 23, 2015 at 9:22 AM, Mark Fenbers <mark.fenb...@noaa.gov> wrote:

> On 9/23/2015 11:28 AM, Erick Erickson wrote:
>
>> This is totally weird.
>>
>> Don't only re-index your old docs, find the data directory and
>> rm -rf data (with Solr stopped) and re-index.
>>
> I pretty much do that. The thing is: I don't have a data directory
> anywhere! Most of my stuff is in /localapps/dev/EventLog/solr/, but I *do*
> have a /localapps/dev/EventLog/index/ directory where the main index
> resides. I'd like to move that into /localapps/dev/EventLog/solr/ so that
> I can keep all Solr-related files under one parent dir, but I can't find
> where the configuration for that is...
>
> Perhaps I should also share what start command I'm using (in case it is
> wrong!):
>
> /localapps/dev/solr-5.3.0/bin/solr start -s /localapps/dev/EventLog
>
>> re: the analysis page Alessandro mentioned.
>> Go to the Solr admin UI (http://localhost:8983/solr).
>> You'll see a drop-down on the left that lets you select a core,
>> select the appropriate one.
>>
>> Now you'll see a bunch of new choices. The "analysis" section
>> is what Alessandro is referencing. That shows you _exactly_ what
>> effects your analysis chain has at index and query time.
>>
>> On the same page, you'll find "schema browser". Take a look at
>> your logtext field and hit the "load term info" button. You should
>> see a bunch of single-word tokens listed. If you see really long ones,
>> then your index is hosed and you should start by blowing away
>> the data directory....
>>
> I wish I could show a screen capture! But according to your symptoms, my
> index is hosed (I see very few single-word tokens and lots of really long
> ones.) I have no data directory to blow away, though. I've blown away
> /localapps/dev/EventLog/index/ before, but that has had no effect on the
> problem.
>
> Am I indexing improperly perhaps? I'm using /dataimport. Here is my
> data-config.xml, which hasn't been giving me any obvious trouble. Import
> seems successful. And I can get correct search results so long as I wrap
> my search text in asterisks...
>
> <?xml version="1.0"?>
> <dataConfig>
>   <dataSource user="awips" url="jdbc:postgresql://dx1f/OHRFC"
>               driver="org.postgresql.Driver"/>
>   <document>
>     <entity deltaQuery="SELECT posttime AS id FROM eventlogtext WHERE lastmodtime > '${dataimporter.last_index_time}';"
>             query="SELECT posttime AS id, username, logtext, category FROM eventlogtext;"
>             name="eventlogtext">
>       <entity query="SELECT catname FROM categorytypes WHERE catid='${eventlogtext.category}';"
>               name="categorytypes"> </entity>
>     </entity>
>   </document>
> </dataConfig>
>
>> Because this symptom is totally explained by searching on a "string"
>> rather than a "text" type. But your definition is clearly a tokenized text
>> type so I'm mystified.
>>
>> The ELall field is a red herring.
>> The debug output shows you're searching
>> on the logtext field, this line is the relevant one:
>> "parsedquery_toString":"logtext:deeper",
>>
> Should I just get rid of "ELall"? I only created it with the intent to be
> able to search on "fenbers" and get hits if "fenbers" occurred in either
> place, the logtext field or the username field.
>
> thanks,
> Mark
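
For anyone who would rather script the "load term info" check than click through the schema browser, here is a rough Python sketch of the same test: pull the top terms for logtext and flag any that contain whitespace. It rests on several assumptions that are not from this thread: that the core exposes the Luke request handler at /admin/luke, that the JSON response lists "topTerms" as a flat [term, count, term, count, ...] array, and that the core is named EventLog (a hypothetical name). The sample data is hand-written for illustration and is not output from a real index.

```python
import json
import urllib.request


def top_terms(luke_response, field):
    """Return the term strings for one field from a parsed Luke response.

    Assumes "topTerms" is a flat list: [term, count, term, count, ...].
    """
    flat = luke_response.get("fields", {}).get(field, {}).get("topTerms", [])
    return [flat[i] for i in range(0, len(flat), 2)]


def multiword_terms(terms):
    """Return terms containing whitespace.

    A tokenized text field should yield none; any hit suggests the index
    still holds data written with an untokenized (string-like) field type.
    """
    return [t for t in terms if any(ch.isspace() for ch in t)]


def fetch_luke(base_url, core, field, num_terms=50):
    """Fetch term info over HTTP. Needs a running Solr; not called below."""
    url = f"{base_url}/{core}/admin/luke?fl={field}&numTerms={num_terms}&wt=json"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hand-written stand-in for a real response (illustrative only).
    sample = {"fields": {"logtext": {"topTerms": ["river", 12, "gage reading was low", 3]}}}
    print(multiword_terms(top_terms(sample, "logtext")))
```

Against a live instance the call would look something like fetch_luke("http://localhost:8983/solr", "EventLog", "logtext"), with the core name swapped for whatever the admin UI drop-down shows. An empty result after step 4 and again after step 5 is what a clean, tokenized index should give.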