No need to re-install Solr; just create a new core. This time it'd probably be easiest to use the bin/solr create_core command. In the Solr directory, type bin/solr create_core -help to see the options.
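Here is a minimal sketch of scripting that step from Python. It assumes Solr 5.x, where create_core takes -c for the core name and -d for the configset to copy. If I recall the 5.x defaults correctly, omitting -d copies the schemaless data_driven_schema_configs, so basic_configs is passed explicitly here to get the regular, non-schemaless setup. The install path and the core name eventlog are hypothetical.

```python
import subprocess
from pathlib import Path


def create_core_command(solr_home: Path, core: str, configset: str = "basic_configs") -> list[str]:
    """Build the argv for `bin/solr create_core` (Solr 5.x).

    -c names the new core; -d picks the configset to copy into it.
    basic_configs is assumed here because it uses a classic schema
    rather than schemaless field guessing.
    """
    return [str(solr_home / "bin" / "solr"), "create_core", "-c", core, "-d", configset]


def create_core(solr_home: Path, core: str) -> None:
    # check=True raises CalledProcessError if bin/solr exits non-zero
    subprocess.run(create_core_command(solr_home, core), check=True)
```

For example, create_core(Path("/opt/solr"), "eventlog") runs /opt/solr/bin/solr create_core -c eventlog -d basic_configs. Building the argv list separately keeps the command inspectable before anything touches the server.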
We're pretty much trying to migrate to using bin/solr for all the maintenance we can, but as always the documentation lags the code. Yeah, things are a bit ragged. The admin UI/core UI is really a legacy bit of code that has _always_ been confusing; I'm hoping we can pretty much remove it at some point, since it's so trappy.

Best,
Erick

On Sat, Sep 26, 2015 at 12:49 PM, Mark Fenbers <mark.fenb...@noaa.gov> wrote:
> OK, a lot of dialog while I was gone for two days! I read the whole thread, but I'm a newbie to Solr, so some of the dialog was Greek to me. I understand the words, of course, but applying them so I know exactly what to do without screwing something else up is the problem. After all, that is how I got into the mess in the first place. I'm glad I have good help to untangle the knots I've made!
>
> I'd like to start over (option 1 below), but does this mean deleting all my config and reinstalling Solr?? Maybe that is not a bad idea, but I will at least save off my data-config.xml, as that is clearly the one thing that is probably working right. However, I did do quite a bit of editing that I would have to do again. Please advise...
>
> To be fair, I must answer Erick's question of how I created the data index in the first place, because this might be relevant...
>
> The bulk of the data is read from 9000+ text files, where each file was manually typed. Before inserting into the database, I do a little processing of the text using "sed" to delete the top few and bottom few lines, and to substitute each single-quote character with a pair of single-quotes (so PostgreSQL doesn't choke). Line-feed characters are preserved as ASCII 10 (hex 0A), but there shouldn't be (and I am not aware of) any characters aside from what is on the keyboard.
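For reference, here is a minimal Python equivalent of the sed preprocessing Mark describes. The thread does not say how many header and footer lines are dropped, so drop_head and drop_tail are hypothetical parameters.

```python
def preprocess(text: str, drop_head: int, drop_tail: int) -> str:
    """Approximate the sed step: drop header/footer lines, double single quotes.

    drop_head / drop_tail are hypothetical counts. Line feeds inside the
    kept body are preserved because lines are split with their endings.
    """
    lines = text.splitlines(keepends=True)
    body = "".join(lines[drop_head:len(lines) - drop_tail])
    # Doubling ' is the standard way to escape a quote inside a SQL string literal
    return body.replace("'", "''")
```

The doubled quotes only exist inside the SQL literal. PostgreSQL stores a single ', so the escaping should not be what Solr ends up indexing. Passing the text as a bound parameter through a database driver would make the manual escaping unnecessary.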
>
> Next, I insert it with this command:
> psql -U awips -d OHRFC -c "INSERT INTO EventLogText VALUES('$postDate', '$user', '$postDate', '$entryText', '$postCatVal');"
>
> In case you are wondering about my table, it is defined this way:
> CREATE TABLE eventlogtext (
>   posttime timestamp without time zone NOT NULL, -- Timestamp of this entry's original posting
>   username character varying(8), -- username (logname) of the original poster
>   lastmodtime timestamp without time zone, -- Last time record was altered
>   logtext text, -- text of the log entry
>   category integer, -- bit-wise category value
>   CONSTRAINT eventlogtext_pkey PRIMARY KEY (posttime)
> )
>
> To do the indexing, I merely use /dataimport?command=full-import, and it knows what to do from my data-config.xml, which is here:
>
> <dataConfig>
>   <dataSource driver="org.postgresql.Driver" url="jdbc:postgresql://dx1f/OHRFC" user="awips" />
>   <document>
>     <entity name="eventlogtext" query="SELECT posttime AS id, username, logtext, category FROM eventlogtext;"
>             deltaQuery="SELECT posttime AS id FROM eventlogtext WHERE lastmodtime > '${dataimporter.last_index_time}';">
>       <entity name="categorytypes" query="SELECT catname FROM categorytypes WHERE catid='${eventlogtext.category}';">
>       </entity>
>     </entity>
>   </document>
> </dataConfig>
>
> Hope this helps!
>
> Thanks,
> Mark
>
> On 9/24/2015 10:57 AM, Erick Erickson wrote:
>>
>> Geraint:
>>
>> Good catch! I totally missed that. So all of our focus on schema.xml has been... totally irrelevant. Now that you've pointed that out, there's also the add-unknown-fields-to-the-schema update processor chain, which indicates you started this up in "schemaless" mode.
>>
>> In short, Solr is trying to guess what your field types should be and guessing wrong (again and again and again). This is the classic weakness of schemaless. It's great for indexing stuff fast, but if it guesses wrong you're stuck.
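Mark's import is triggered over HTTP. Here is a minimal sketch of building that request. It assumes the handler is registered at /dataimport, as in the thread, on a hypothetical core named eventlog at http://localhost:8983/solr. The clean and commit parameters are standard DIH full-import options. clean=true deletes existing documents before importing, but it does not undo field types that schemaless mode has already written into managed-schema, which is why re-importing alone does not fix a bad guess.

```python
from urllib.parse import urlencode


def full_import_url(base: str, core: str, clean: bool = True, commit: bool = True) -> str:
    """Build a DataImportHandler full-import URL.

    command=full-import selects the operation; clean=true deletes existing
    documents first; commit=true commits when the import finishes.
    """
    params = {
        "command": "full-import",
        "clean": str(clean).lower(),
        "commit": str(commit).lower(),
    }
    return f"{base}/{core}/dataimport?{urlencode(params)}"
```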
>>
>> So, to the original problem: I'd start over and either
>> 1> use the regular setup, not schemaless,
>> or
>> 2> use the _managed_ schema API to explicitly add fields and fieldTypes to the managed schema.
>>
>> Best,
>> Erick
>>
>> On Thu, Sep 24, 2015 at 2:02 AM, Duck Geraint (ext) GBJH <geraint.d...@syngenta.com> wrote:
>>
>>> Okay, so maybe I'm missing something here (I'm still relatively new to Solr myself), but am I right in thinking the following is still in your solrconfig.xml file:
>>>
>>> <schemaFactory class="ManagedIndexSchemaFactory">
>>>   <bool name="mutable">true</bool>
>>>   <str name="managedSchemaResourceName">managed-schema</str>
>>> </schemaFactory>
>>>
>>> If so, wouldn't using a managed schema make several of your field definitions inside the schema.xml file semi-redundant?
>>>
>>> Regards,
>>> Geraint
>>>
>>> Geraint Duck
>>> Data Scientist
>>> Toxicology and Health Sciences
>>> Syngenta UK
>>> Email: geraint.d...@syngenta.com
>>>
>>> -----Original Message-----
>>> From: Alessandro Benedetti [mailto:benedetti.ale...@gmail.com]
>>> Sent: 24 September 2015 09:23
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: query parsing
>>>
>>> I would focus on this:
>>>
>>> "
>>>> 5> now kick off the DIH job and look again.
>>>>
>>> Now it shows a histogram, but most of the "terms" are long -- the full texts of (the table.column) eventlogtext.logtext, including the whitespace (with %0A used for newline characters)... So, it appears it is not being tokenized properly, correct?"
>>>
>>> Can you open the schema.xml from your Solr UI and show us the snippet for the field that seems not to tokenise?
>>> Can you show us the related schema browser page (even a screenshot is fine)?
>>> Could it be an encoding problem?
>>> Following Erick's details about the analysis, what are your results?
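Option 2 relies on the Schema API, which only accepts changes when the schema is managed and mutable, as in Geraint's schemaFactory snippet. Here is a minimal sketch that builds, but does not send, an add-field request. The base URL, the core name eventlog, and the choice of text_general for logtext are assumptions. text_general ships in the stock 5.x configsets as far as I know, but check yours. On a core where schemaless mode has already created logtext, add-field should be rejected because the field exists. Starting from a fresh core avoids that, or the Schema API's replace-field command can be used instead.

```python
import json
from urllib.request import Request


def add_field_request(base: str, core: str, name: str, field_type: str) -> Request:
    """Build (but do not send) a Schema API request that adds one field."""
    body = {
        "add-field": {
            "name": name,
            "type": field_type,  # assumed to be a tokenized type defined in the schema
            "indexed": True,
            "stored": True,
        }
    }
    return Request(
        f"{base}/{core}/schema",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-type": "application/json"},
        method="POST",
    )
```

Sending it is a matter of urllib.request.urlopen(req) against a running Solr.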
>>>
>>> Cheers
>>>
>>> 2015-09-24 8:04 GMT+01:00 Upayavira <u...@odoko.co.uk>:
>>>
>>>> Typically, the index dir is inside the data dir. Delete the index dir and you should be good. If there is a tlog next to it, you might want to delete that also.
>>>>
>>>> If you don't have a data dir, I wonder whether you set the data dir when creating your core or collection. Typically, specifying the instance dir and data dir isn't needed.
>>>>
>>>> Upayavira
>>>>
>>>> On Wed, Sep 23, 2015, at 10:46 PM, Erick Erickson wrote:
>>>>>
>>>>> OK, this is bizarre. You'd have had to set up SolrCloud by specifying -zkRun or -zkHost when you started Solr, which is highly unlikely. On the admin page there would be a "Cloud" link on the left side; I really doubt one's there.
>>>>>
>>>>> You should have a data directory; it should be the parent of the index and tlog directories. As a sanity check, try looking at the analysis page. Type a bunch of words in the left-hand indexing box and uncheck the verbose box. As you can tell, I'm grasping at straws. I'm still puzzled why you don't have a "data" directory here, but that shouldn't really matter. How did you create this index? I don't mean the data import handler; I mean, how did you create the core that you're indexing to?
>>>>>
>>>>> Best,
>>>>> Erick
>>>>>
>>>>> On Wed, Sep 23, 2015 at 10:16 AM, Mark Fenbers <mark.fenb...@noaa.gov> wrote:
>>>>>
>>>>>> On 9/23/2015 12:30 PM, Erick Erickson wrote:
>>>>>>
>>>>>>> Then my next guess is you're not pointing at the index you think you are when you 'rm -rf data'.
>>>>>>>
>>>>>>> Just ignore the Elall field for now, I should think, although get rid of it if you don't think you need it.
>>>>>>>
>>>>>>> DIH should be irrelevant here.
>>>>>>>
>>>>>>> So let's back up.
>>>>>>> 1> go ahead and "rm -fr data" (with Solr stopped).
>>>>>>>
>>>>>> I have no "data" dir. Did you mean "index" dir? I removed 3 index directories (2 for spelling):
>>>>>> cd /localapps/dev/eventLog; rm -rfv index solr/spFile solr/spIndex
>>>>>>
>>>>>>> 2> start Solr
>>>>>>> 3> do NOT re-index.
>>>>>>> 4> look at your index via the schema browser. Of course there should be nothing there!
>>>>>>>
>>>>>> Correct! It said "there is no term info :("
>>>>>>
>>>>>>> 5> now kick off the DIH job and look again.
>>>>>>>
>>>>>> Now it shows a histogram, but most of the "terms" are long -- the full texts of (the table.column) eventlogtext.logtext, including the whitespace (with %0A used for newline characters)... So, it appears it is not being tokenized properly, correct?
>>>>>>
>>>>>>> Your logtext field should contain only single-word tokens. The fact that you have some very long tokens (presumably with whitespace) indicates that you aren't really blowing the index away between indexing runs.
>>>>>>>
>>>>>> Well, I did this time for sure. I verified that initially, because it showed there was no term info until I DIH'd again.
>>>>>>
>>>>>>> Are you perhaps in SolrCloud with more than one replica?
>>>>>>>
>>>>>> Not that I know of, but being new to Solr, there could be things going on that I'm not aware of. How can I tell? I certainly didn't set anything up for SolrCloud deliberately.
>>>>>>
>>>>>>> In that case you might be getting the index replicated on startup, assuming you didn't blow away all replicas. If you are in SolrCloud, I'd just delete the collection and start over, after ensuring that you'd pushed the configset up to ZooKeeper.
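The long-term symptom can also be checked without eyeballing the histogram. As far as I know, the schema browser's term list is served by the Luke request handler. Here is a minimal sketch that assumes a hypothetical core eventlog and the default JSON rendering (json.nl=flat), in which topTerms alternates term and count. A properly tokenized text field should yield no top terms containing whitespace. Whole log entries showing up here would be consistent with logtext being indexed as an untokenized string type.

```python
from urllib.parse import urlencode


def luke_url(base: str, core: str, field: str, num_terms: int = 20) -> str:
    # fl restricts the report to one field; numTerms sets how many top terms to return
    params = {"fl": field, "numTerms": num_terms, "wt": "json"}
    return f"{base}/{core}/admin/luke?{urlencode(params)}"


def whitespace_terms(luke_response: dict, field: str) -> list[str]:
    """Return top terms for `field` that contain whitespace (a sign of no tokenization)."""
    flat = luke_response["fields"][field]["topTerms"]
    terms = flat[0::2]  # assumed layout: term, count, term, count, ...
    return [t for t in terms if any(ch.isspace() for ch in t)]
```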
>>>>>>>
>>>>>>> BTW, I always look at the schema.xml file from the Solr admin window just as a sanity check in these situations.
>>>>>>>
>>>>>> Good idea! But the one shown in the browser is identical to the one I've been editing! So that's not an issue.
>>>
>>> --
>>> Benedetti Alessandro
>>> Visiting card - http://about.me/alessandro_benedetti
>>> Blog - http://alexbenedetti.blogspot.co.uk
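With ManagedIndexSchemaFactory configured, Solr reads managed-schema, so schema.xml as shown in the admin window may not be the schema in effect. Here is a minimal sketch that asks the Schema API for the live definition of a field (GET /solr/<core>/schema/fields/<name>) and compares its type with what you expect. The base URL, the core name eventlog, and the expected type text_general are assumptions. The sample responses used to exercise it are made up for illustration.

```python
import json
from typing import Optional
from urllib.parse import quote
from urllib.request import urlopen


def field_url(base: str, core: str, name: str) -> str:
    # Schema API endpoint for a single field's live definition
    return f"{base}/{core}/schema/fields/{quote(name)}"


def fetch_field(base: str, core: str, name: str) -> dict:
    # Requires a running Solr; returns the parsed JSON response
    with urlopen(field_url(base, core, name) + "?wt=json") as resp:
        return json.load(resp)


def type_mismatch(response: dict, expected_type: str) -> Optional[str]:
    """Describe a mismatch between the live field type and expected_type, or return None."""
    field = response["field"]
    if field["type"] != expected_type:
        return f"{field['name']} is {field['type']}, expected {expected_type}"
    return None
```

Usage against a live core would be type_mismatch(fetch_field("http://localhost:8983/solr", "eventlog", "logtext"), "text_general").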