Re: Solr Basic Configuration - Highlight - Begginer

Erick Erickson Wed, 16 Dec 2015 09:18:12 -0800

Ok, you're getting confused by all the options, an easy thing to do.
You're trying to do too many things at once without making sure
the basics work....


1> Forget all about the f.content.hl.... stuff. That's there in case
you want to specify different parameters for different fields in the same
highlight request. That's an advanced option for later....

2> Start with the basic techproducts example. Then this should show
you highlights:
q=features:accents&hl=true&hl.fl=features

That's about as basic as you get. It's searching for "accents" in the
features field and returning highlights on the features field.
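
For what it's worth, here is a minimal Python sketch of that same request
and of pulling the snippets out of the "highlighting" section of the
response rather than the document list. The host, port and core path are
the ones from the URLs quoted further down in this thread; the function
names and the sample response are made up for illustration and are not
real Solr output.

```python
from urllib.parse import urlencode

# Assumed base URL: the techproducts example on a default local install,
# as in the URLs quoted later in this thread.
SOLR_SELECT = "http://localhost:8983/solr/techproducts/select"


def build_highlight_url(field, term, base=SOLR_SELECT):
    """Build the basic request: q=<field>:<term>&hl=true&hl.fl=<field>."""
    params = {
        "q": f"{field}:{term}",  # search one field for one term
        "hl": "true",            # switch highlighting on
        "hl.fl": field,          # highlight that same field
        "wt": "json",
    }
    return base + "?" + urlencode(params)


def extract_snippets(response, field):
    """Return {doc_id: [snippets]} taken from the "highlighting" section."""
    highlighting = response.get("highlighting", {})
    return {
        doc_id: fields[field]
        for doc_id, fields in highlighting.items()
        if field in fields  # skip docs with an empty highlight entry
    }


url = build_highlight_url("features", "accents")

# Hypothetical, hand-written response that only shows the shape:
# "docs" holds the documents, "highlighting" is keyed by document id.
sample_response = {
    "response": {"numFound": 2, "docs": [{"id": "doc1"}, {"id": "doc2"}]},
    "highlighting": {
        "doc1": {"features": ["some text with <em>accents</em> in it"]},
        "doc2": {},
    },
}

snippets = extract_snippets(sample_response, "features")
```

A document that matched but got no snippet (like "doc2" here, or the
"pdf1": {} case quoted below) is simply left out of the result.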

Once that's working, _then_ refine.

Best,
Erick

On Wed, Dec 16, 2015 at 8:21 AM, Evert R. <evert.ra...@gmail.com> wrote:
> Hi Andrea,
>
> OK, let's do it:
>
> 1. It does have the 'nietava' term: the query returns the only book (PDF file)
> that contains this word, along with all of its content, as in my previous
> message to Erick, so the content field is there.
>
> 2. Using content:nietava does not show any result, as below:
>
> {
>   "responseHeader": {
>     "status": 400,
>     "QTime": 12,
>     "params": {
>       "q": "contents:nietava",
>       "indent": "true",
>       "fl": "id",
>       "wt": "json",
>       "_": "1450282631352"
>     }
>   },
>   "error": {
>     "msg": "undefined field contents",
>     "code": 400
>   }
> }
>
> 3. Here is what I found when grepping 'content' from the techproducts conf
> folder:
>
> schema.xml:   <field name="content_type" type="string" indexed="true" stored="true" multiValued="true"/>
> schema.xml:   <field name="content" type="text_general" indexed="false" stored="true" multiValued="true"/>
> schema.xml:   <copyField source="content" dest="text"/>
> schema.xml:   <copyField source="content_type" dest="text"/>
> solrconfig.xml:       <str name="facet.field">content_type</str>
> solrconfig.xml:       <str name="hl.fl">content features title name</str>
> solrconfig.xml:       <str name="f.content.hl.snippets">3</str>
> solrconfig.xml:       <str name="f.content.hl.fragsize">200</str>
> solrconfig.xml:       <str name="f.content.hl.alternateField">content</str>
> solrconfig.xml:       <str name="f.content.hl.maxAlternateFieldLength">750</str>
> solrconfig.xml:       <str name="stream.contentType">application/json</str>
> solrconfig.xml:       <str name="stream.contentType">application/csv</str>
> solrconfig.xml:       <str name="content-type">text/plain; charset=UTF-8</str>
>
> and the grep on 'content_type':
>
> schema.xml:   <field name="content_type" type="string" indexed="true" stored="true" multiValued="true"/>
> schema.xml:   <copyField source="content_type" dest="text"/>
> solrconfig.xml:       <str name="facet.field">content_type</str>
>
> =)
>
> Thanks for checking out.
>
>
>
> *Evert *
>
> 2015-12-16 12:59 GMT-02:00 Andrea Gazzarini <a.gazzar...@gmail.com>:
>
>> hl=f.content.hl.content (I guess) is definitely wrong. Some questions:
>>
>>    - First, sorry, the obvious question: are you sure the documents contain
>>    the "nietava" term?
>>    - Could you try to use q=content:nietaval?
>>    - Could you paste the definition (field & fieldtype) of the content
>>    field?
>>
>> > Should I have this configuration in the XML file?
>>
>> You could, but it's up to you and it strongly depends on your context. The
>> simple thing is that if you have those parameters in the configuration
>> you can avoid passing them as part of each request, but in this
>> phase, where you are testing, it's probably better to have them in the
>> request.
>>
>> Andrea
>>
>> 2015-12-16 15:28 GMT+01:00 Evert R. <evert.ra...@gmail.com>:
>>
>> > Hi Andrea,
>> >
>> > Thanks for the reply!
>> >
>> > I tried with the hl.fl parameter as well, using as below:
>> >
>> >
>> > http://localhost:8983/solr/techproducts/select?q=nietava&fl=id%2C+content&wt=json&indent=true&hl=true&hl.fl=f.content.hl.content%3D4&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E
>> >
>> > with the parameter under the hl field in the solr ui:
>> >
>> > 1. f.content.hl.snnipets=2
>> > 2. f.content.hl.content=4
>> > 3. content
>> >
>> > with no success...
>> >
>> > Should I have this configuration in the XML file?
>> >
>> > Regards,
>> >
>> > *Evert *
>> >
>> > 2015-12-16 11:23 GMT-02:00 Andrea Gazzarini <a.gazzar...@gmail.com>:
>> >
>> > > Hi Evert,
>> > > what is the configuration of the default request handler? Did you set
>> > > the hl.fl parameter?
>> > >
>> > > Please check here [1] the parameters that the highlighting component
>> > > expects. Required parameters should be in the query string or declared
>> > > within the request handler that answers your query.
>> > >
>> > > Andrea
>> > >
>> > > [1] https://wiki.apache.org/solr/HighlightingParameters
>> > >
>> > >
>> > >
>> > >
>> > > 2015-12-16 12:51 GMT+01:00 Evert R. <evert.ra...@gmail.com>:
>> > >
>> > > > Hi everyone!
>> > > >
>> > > > I think I should not have posted my server name... never had that
>> > > > many access attempts...
>> > > >
>> > > >
>> > > >
>> > > > 2015-12-16 9:03 GMT-02:00 Evert R. <evert.ra...@gmail.com>:
>> > > >
>> > > > > Hello Erick,
>> > > > >
>> > > > > Thanks again for your time.
>> > > > >
>> > > > > Here is as far as I have gone:
>> > > > >
>> > > > > 1. I started a fresh install and did the following:
>> > > > >
>> > > > > [evert@nix]$ bin/solr start -e techproducts
>> > > > > [evert@nix]$ curl 'http://localhost:8983/solr/techproducts/update/extract?literal.id=pdf1&commit=true' -F "Emmanuel=@/home/solr/dados/teste/Emmanuel.pdf"
>> > > > >
>> > > > > 2. I am using only the Solr Admin UI to check the query response;
>> > > > > here is an example:
>> > > > >
>> > > > > Query: http://localhost:8983/solr/techproducts/select?q=nietava&fl=id%2C+author%2C+content&wt=json&indent=true&hl=true&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E
>> > > > >
>> > > > > Result: {
>> > > > >   "responseHeader": {
>> > > > >     "status": 0,
>> > > > >     "QTime": 14,
>> > > > >     "params": {
>> > > > >       "q": "nietava",
>> > > > >       "hl": "true",
>> > > > >       "hl.simple.post": "</em>",
>> > > > >       "indent": "true",
>> > > > >       "fl": "id, author, content",
>> > > > >       "wt": "json",
>> > > > >       "hl.simple.pre": "<em>",
>> > > > >       "_": "1450262674102"
>> > > > >     }
>> > > > >   },
>> > > > >   "response": {
>> > > > >     "numFound": 1,
>> > > > >     "start": 0,
>> > > > >     "docs": [
>> > > > >       {
>> > > > >         "id": "pdf1",
>> > > > >         "author": "Wander",
>> > > > >         "content": [
>> > > > >           "André Luiz - Sexo e Destino _Chico e Waldo_.doc \n \n \n
>> > > > > Francisco Cândido Xavier \ne \n \n Waldo Vieira \n \n \n \n \n Sexo e
>> > > > > Destino \n \n \n \n 12o livro da Coleção \n“A Vida no Mundo Espiritual” \n
>> > > > > \n  \n \n \n \n Ditado pelo Espírito \nAndré Luiz \n \n  \n \n \n \n \n \n
>> > > > > \n FEDERAÇÃO ESPÍRITA BRASILEIRA \nDEPARTAMENTO EDITORIAL \n \n Rua Souza
>> > > > > Valente, 17 \n20941-040 - Rio - RJ - Brasil \n \n  \nhttp://www.febnet.org.br/
>> > > > >  \n  \n \n   \n Francisco Cândido Xavier - Sexo e
>> > > > > Destino - pelo Espírito André Luiz \n \n  \n2 \n \n  \n \n \n \n Coleção
>> > > > > \n“A Vida no Mundo Espiritual” \n"
>> > > > >         ]
>> > > > >       }
>> > > > >     ]
>> > > > >   },
>> > > > >   "highlighting": {
>> > > > >     "pdf1": {}
>> > > > >   }
>> > > > > }
>> > > > >
>> > > > > **In the content field it brings back the whole PDF content (book), and
>> > > > > notice that the highlighting section is empty.
>> > > > >
>> > > > > I tried creating a new core with bin/solr create -c test, using the
>> > > > > schema.xml and solrconfig.xml standard found in
>> > > > > /solr/server/solr/configsets/basic_configs/conf
>> > > > >
>> > > > > But even so... it is not working as expected (I think).
>> > > > >
>> > > > >
>> > > > > Would you know how to set this techproducts example to bring the
>> > > > > snippets of text?
>> > > > >
>> > > > > The server only allows specific IP addresses for this port; if you
>> > > > > would like, I could open it for you to check.
>> > > > >
>> > > > >
>> > > > > Thanks again and best regards!
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > *Evert
>> > > > >
>> > > > >
>> > > > > 2015-12-15 18:14 GMT-02:00 Erick Erickson <erickerick...@gmail.com>:
>> > > > >
>> > > > >> No, that's not what I meant. The highlight component adds a special
>> > > > >> section to the return packet that will contain "snippets" of text with
>> > > > >> highlights. You control how big those snippets are via various
>> > > > >> parameters in the highlight component and they'll have the tags you
>> > > > >> specify for highlighting.
>> > > > >>
>> > > > >> Your app needs to pull the information from the highlight portion of
>> > > > >> the response packet rather than the document list. Just execute your
>> > > > >> queries via cURL or a browser to see the structure of a response to
>> > > > >> see what I mean.
>> > > > >>
>> > > > >> And note that you do _not_ need to return the fields you're
>> > > > >> highlighting in the "fl" list so you do _not_ need to return the
>> > > > >> entire document contents.
>> > > > >>
>> > > > >> What are you using to display the results anyway?
>> > > > >>
>> > > > >> Best,
>> > > > >> Erick
>> > > > >>
>> > > > >> On Tue, Dec 15, 2015 at 10:02 AM, Evert R. <evert.ra...@gmail.com> wrote:
>> > > > >> > Hi Erick,
>> > > > >> >
>> > > > >> > Thank you very much for the reply!!
>> > > > >> >
>> > > > >> > I do get back the full text, author, and a whole lot of stuff which
>> > > > >> > doesn't really matter for my project.
>> > > > >> >
>> > > > >> > So, what you are saying is that Solr gives me back the full content
>> > > > >> > and my application handles the rest? That would mean that when I
>> > > > >> > search for a specific word across my books (PDF files), Solr returns
>> > > > >> > the whole content of each book that matches the query, and my
>> > > > >> > application (PHP) takes care of showing only part of the text (as in
>> > > > >> > a highlight, as I understood it) and highlighting the keyword I was
>> > > > >> > looking for?
>> > > > >> >
>> > > > >> > If so, Erick, you have been a big help in clearing this up... I
>> > > > >> > thought I would do that with Solr in an easy way. =)
>> > > > >> >
>> > > > >> > Thanks for the attachments tip!
>> > > > >> >
>> > > > >> > Best regards,
>> > > > >> >
>> > > > >> > Evert
>> > > > >> >
>> > > > >> > 2015-12-15 14:56 GMT-02:00 Erick Erickson <erickerick...@gmail.com>:
>> > > > >> >
>> > > > >> >> How are you trying to display the results? Highlighting is a bit of
>> > > > >> >> an odd beast. Assuming it's correctly configured, the response packet
>> > > > >> >> will have a separate highlight section, it's the application's
>> > > > >> >> responsibility to present that pleasingly.
>> > > > >> >>
>> > > > >> >> What _do_ you get back in the response?
>> > > > >> >>
>> > > > >> >> BTW, the mail server pretty aggressively strips attachments, yours
>> > > > >> >> didn't come through.
>> > > > >> >>
>> > > > >> >> Best,
>> > > > >> >> Erick
>> > > > >> >>
>> > > > >> >> On Tue, Dec 15, 2015 at 3:25 AM, Evert R. <evert.ra...@gmail.com> wrote:
>> > > > >> >> > Hi there!
>> > > > >> >> >
>> > > > >> >> > It's my first installation, not sure if this is the right channel...
>> > > > >> >> >
>> > > > >> >> > Here are my steps:
>> > > > >> >> >
>> > > > >> >> > 1. Set up a basic install of Solr 5.4.0
>> > > > >> >> >
>> > > > >> >> > 2. Create a new core through the command line (bin/solr create -c test)
>> > > > >> >> >
>> > > > >> >> > 3. Post 2 files: 1 .docx and 2 .pdf (bin/post -c test /docs/test/)
>> > > > >> >> >
>> > > > >> >> > 4. Query over the browser and it brings the correct search, but it
>> > > > >> >> > does not show the part of the text I am querying, the highlight.
>> > > > >> >> >
>> > > > >> >> >   I have already flagged the 'hl' option. But still it does not work...
>> > > > >> >> >
>> > > > >> >> > Example: I am looking for the word 'peace' in my PDF file (book). I
>> > > > >> >> > have 4 matches for this word; it shows me the book name (PDF file)
>> > > > >> >> > but does not show which part of the text contains the word 'peace'.
>> > > > >> >> >
>> > > > >> >> >
>> > > > >> >> > I am probably missing some configuration in schema.xml, which is
>> > > > >> >> > missing from my folder.... /solr/server/solr/test/conf/
>> > > > >> >> >
>> > > > >> >> > Or even the solrconfig.xml...
>> > > > >> >> >
>> > > > >> >> > I have read a bunch of things about highlighting, checked these
>> > > > >> >> > files, and copied the standard schema.xml to my core/conf folder,
>> > > > >> >> > but still it does not bring the highlight.
>> > > > >> >> >
>> > > > >> >> >
>> > > > >> >> > Attached a copy of my solrconfig.xml file.
>> > > > >> >> >
>> > > > >> >> >
>> > > > >> >> > I am very sorry for this probably dumb and too basic question... It
>> > > > >> >> > is the first time I have seen Solr live.
>> > > > >> >> >
>> > > > >> >> >
>> > > > >> >> > Any help will be appreciated.
>> > > > >> >> >
>> > > > >> >> >
>> > > > >> >> >
>> > > > >> >> > Best regards,
>> > > > >> >> >
>> > > > >> >> >
>> > > > >> >> > Evert Ramos
>> > > > >> >> >
>> > > > >> >> > evert.ra...@gmail.com
>> > > > >> >> >
>> > > > >> >>
>> > > > >>
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
