bq: However, I discovered that if I search on "Wednesday*" (trailing
asterisk), then I get all the results containing Wednesday that I'm
looking for!

This almost always means you're not searching on the field you think
you're searching on and/or the field isn't being analyzed as you think
(i.e. the fieldType isn't what you expect). If you're really searching
on a fieldType of text_en (and you haven't changed the definition),
then there's something very weird here. FieldTypes are totally
mutable: they are composed of analysis chains that you (or
someone else) can freely alter, so seeing a <field> definition that
references type="text_en" is suggestive but not definitive.

I'm going to further guess that when you search on "Wednesday*", all
the matches are at the beginning of the line, and you find docs where
the field has "Wednesday, September...." but not "The party was on
Wednesday".
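
To make that guess concrete, here is a toy Python model (not Solr code)
of how an unanalyzed field behaves. It assumes the whole field value is
indexed as one term, which is how a "string" type works. The two sample
documents are made up for illustration.

```python
# Toy model of term matching against an unanalyzed field. NOT Solr code.
# The sample documents below are hypothetical.
docs = [
    "Wednesday, September 2 was quiet",
    "The party was on Wednesday",
]

def unanalyzed_terms(text):
    # A "string"-style field indexes the entire value as a single term.
    return [text]

def exact_match(query, terms):
    # Models a plain term query such as logtext:Wednesday
    return query in terms

def prefix_match(prefix, terms):
    # Models a trailing-wildcard query such as logtext:Wednesday*
    return any(t.startswith(prefix) for t in terms)

exact_hits = [d for d in docs if exact_match("Wednesday", unanalyzed_terms(d))]
prefix_hits = [d for d in docs if prefix_match("Wednesday", unanalyzed_terms(d))]

print(exact_hits)   # []
print(prefix_hits)  # ['Wednesday, September 2 was quiet']
```

If your real results look like prefix_hits (only docs that start with
the word), that points strongly at an unanalyzed field.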

So let's see the <fieldType> associated with the logtext field, plus
the results of adding &debug=true to the query.
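
If it helps, a debug query URL can be built like this. This is only a
sketch: the host, port, and the core name "mycore" are assumptions (use
your own), and wt=json is just added to make the output easier to read.
The debug=true parameter is the one that matters.

```python
from urllib.parse import urlencode

def debug_query_url(base_url, core, field, term):
    # debug=true asks Solr to include the parsed query in the response.
    params = {"q": f"{field}:{term}", "debug": "true", "wt": "json"}
    return f"{base_url}/{core}/select?{urlencode(params)}"

# "http://localhost:8983/solr" and "mycore" are placeholders, not values
# taken from this thread.
url = debug_query_url("http://localhost:8983/solr", "mycore", "logtext", "Wednesday")
print(url)
# http://localhost:8983/solr/mycore/select?q=logtext%3AWednesday&debug=true&wt=json
```

Paste the resulting URL into a browser or curl and look at the
parsedquery entries in the debug section of the response.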

But you can get a lot of info a lot faster if you go to the admin UI
screen, select the proper core from the drop-down on the left side and
go to the "analysis" section. Pick the field (or field type), enter
some text and hit analyze (and uncheck the "verbose" box; that's
largely uninteresting info at this level). That'll show you exactly
how the input document is parsed, exactly how the query is parsed etc.
And be sure to enter something like
"september first was a Wednesday" in the left-hand (index) box, then
just "Wednesday" in the right-hand (query) side. My bet: You'll see on
the index side that the input is not broken up, not transformed, etc.
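
Roughly what you should expect the two outcomes to look like, as a toy
Python sketch. Both "analyzers" here are simplified stand-ins I made up
for illustration: the real text_en chain typically does more (stopword
removal and stemming, for instance), so its actual tokens will differ.

```python
import re

def keyword_analyze(text):
    # Stand-in for an unanalyzed field: the input stays one term.
    return [text]

def simple_text_analyze(text):
    # Simplified stand-in for a text field: split into words, lowercase.
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]

index_input = "september first was a Wednesday"
query_input = "Wednesday"

results = {}
for name, analyze in [("keyword", keyword_analyze), ("text", simple_text_analyze)]:
    index_terms = analyze(index_input)
    query_terms = analyze(query_input)
    # A term query matches only if the query term equals an indexed term.
    results[name] = all(q in index_terms for q in query_terms)
    print(name, index_terms, query_terms, results[name])

# keyword ['september first was a Wednesday'] ['Wednesday'] False
# text ['september', 'first', 'was', 'a', 'wednesday'] ['wednesday'] True
```

If the analysis page shows a single unbroken term on the index side,
like the "keyword" row, then the field is not being analyzed as text_en.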

Best,
Erick

On Mon, Sep 21, 2015 at 9:49 AM, Mark Fenbers <mark.fenb...@noaa.gov> wrote:
> Ok, Erick, you provided useful info to help with my understanding. However,
> I still get zero results when I search on literal text (e.g., "Wednesday"),
> even with making changes that you suggest. However, I discovered that if I
> search on "Wednesday*" (trailing asterisk), then I get all the results
> containing Wednesday that I'm looking for!  Why would adding a wildcard
> token change the results I get back?
>
> In my schema.xml, my customized section now looks like this, based on your
> previous message:
>
> <field name="id" type="date" indexed="true" stored="true" required="true" />
> <field name="logtext" type="text_en" indexed="true" stored="true"
> required="true" />
> <field name="username" type="string" indexed="true" stored="true"
> required="true" />
> <field name="category" type="int" indexed="true" stored="true"
> required="true" />
>
> <field name="ELall" type="text" indexed="true" stored="true"
> multiValued="true" />
> <copyField source="logtext" dest="ELall" />
> <copyField source="username" dest="ELall" />
>
> Then I removed the data subdir, did a solr restart, and did a /dataimport
> again.  It successfully processed all 9857 documents. No stack traces in
> solr.log.  It is at this point that searching on Wednesday gave zero results
> (Boo!), but searching on Wednesday* gave hundreds of results. (Yay!)  My
> changes to schema.xml were to make logtext be the type "text_en".
> Previously, the only line in schema.xml was the first one ("id"), and I
> changed that from type="text" to type="date" because it is a Timestamp
> object in Java and a "timestamp without time zone" in PostgreSQL.  But even
> with these changes, the results are the same as before.
>
> Do you have any more ideas why searching on any literal string finds zero
> documents?
>
> Thanks,
> Mark
>
>
> On 9/18/2015 10:30 PM, Erick Erickson wrote:
>>
>> bq: There is no fieldType defined in my solrconfig.xml, unless you are
>> referring to this line:
>>
>> Well, that's because you should be looking in schema.xml ;).....
>>
>> This line from your stacktrace file is very suspicious:
>>    <str name="parsedquery_toString">logtext:Wednesday</str>
>>
>> It _looks_ like your logtext field is perhaps a "string" type. String
>> types are totally unanalyzed,
>> so unless the input matches _exactly_ (and by exactly I mean same case,
>> same words, same
>> order, identical punctuation) you won't find the doc. Thus with a
>> string field type, if the doc had
>> "my Dog has fleas.", searching for "my" or "My" or "My dog has fleas"
>> or "my Dog has fleas"
>> would all not find the doc (this last one has no period).
>>
>> You usually want one of the text types, text_en or the like. Note that
>> you will be a _long_ time
>> figuring out how all that works and affects your searches, the
>> admin/analysis page is definitely
>> your friend.
>>
>> There should be a line similar to
>> <field name="logtext" type="something" blah blah blah/>
>>
>> Somewhere else there should be something like:
>> <fieldType name="something" potentially a lot of stuff, perhaps lots
>> of lines maybe not />
>>
>> The fieldType is what determines how the text is handled to search,
>> how it's broken up
>> and, in essence, how searches behave.
>>
>> So what Erik and Shawn were asking for is those two definitions.
>>
>> Do note if you've changed the definitions here, it's usually wise to
>> 'rm -rf <core>/data' and completely re-index from scratch.
>>
>> Best,
>> Erick
>>
>
