Re: Solr Basic Configuration - Highlight - Begginer

2015-12-16 Thread Evert R.
Hello Erick,

Thanks again for your time.

Here is as far as I have gone:

1. I started a fresh install and did the following:

[evert@nix]$ bin/solr start -e techproducts
[evert@nix]$ curl 'http://localhost:8983/solr/techproducts/update/extract?literal.id=pdf1&commit=true' \
    -F "Emmanuel=@/home/solr/dados/teste/Emmanuel.pdf"

2. I am using only the Solr Admin UI to check the query response. Here is an
example:

Query:
http://nix.budhi.com.br:8983/solr/techproducts/select?q=nietava&fl=id%2C+author%2C+content&wt=json&indent=true&hl=true&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E

Result:
{
  "responseHeader": {
    "status": 0,
    "QTime": 14,
    "params": {
      "q": "nietava",
      "hl": "true",
      "hl.simple.post": "</em>",
      "indent": "true",
      "fl": "id, author, content",
      "wt": "json",
      "hl.simple.pre": "<em>",
      "_": "1450262674102"
    }
  },
  "response": {
    "numFound": 1,
    "start": 0,
    "docs": [
      {
        "id": "pdf1",
        "author": "Wander",
        "content": [
          "André Luiz - Sexo e Destino _Chico e Waldo_.doc \n \n\n
Francisco Cândido Xavier \ne \n \n Waldo Vieira \n \n \n \n \n Sexo e
Destino \n \n \n \n 12o livro da Coleção \n“A Vida no Mundo Espiritual” \n
\n  \n \n \n \n Ditado pelo Espírito \nAndré Luiz \n \n  \n \n \n \n \n \n
\n FEDERAÇÃO ESPÍRITA BRASILEIRA \nDEPARTAMENTO EDITORIAL \n \n Rua Souza
Valente, 17 \n20941-040 - Rio - RJ - Brasil \n \n  \nhttp://
www.febnet.org.br/  \n  \n \n   \n Francisco Cândido Xavier - Sexo e
Destino - pelo Espírito André Luiz \n \n  \n2 \n \n  \n \n \n \n Coleção
\n“A Vida no Mundo Espiritual” \n"
        ]
      }
    ]
  },
  "highlighting": {
    "pdf1": {}
  }
}

Note that the content field returns the whole PDF content (the book), while
the highlighting section comes back empty.

I also tried creating a new core with bin/solr create -c test, using the
standard schema.xml and solrconfig.xml found in
/solr/server/solr/configsets/basic_configs/conf

But even so, it is not working as expected (I think).


Would you know how to set up this techproducts example to return the snippets
of text?

The server only allows specific IP addresses on this port; if you would like,
I could open it up for you to check.


Thanks again and best regards!




*Evert Ramos*
*evert.ra...@gmail.com *


2015-12-15 18:14 GMT-02:00 Erick Erickson :

> No, that's not what I meant. The highlight component adds a special
> section to the return packet that will contain "snippets" of text with
> highlights. You control how big those snippets are via various
> parameters in the highlight component and they'll have the tags you
> specify for highlighting.
>
> Your app needs to pull the information from the highlight portion of
> the response packet rather than the document list. Just execute your
> queries via cURL or a browser to see the structure of a response to
> see what I mean.
>
> And note that you do _not_ need to return the fields you're
> highlighting in the "fl" list so you do _not_ need to return the
> entire document contents.
>
> What are you using to display the results anyway?
>
> Best,
> Erick
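
Erick's point about reading the separate highlight section can be sketched in a few lines of Python. This is only an illustration: the sample response below is hypothetical (it merely follows the shape of a JSON select response with hl=true), and the helper name snippets_by_doc is made up for this note, not part of any Solr client.

```python
import json

# Hypothetical response, shaped like a Solr JSON select response with hl=true.
SAMPLE_RESPONSE = json.loads("""
{
  "response": {"numFound": 1, "start": 0, "docs": [{"id": "doc1"}]},
  "highlighting": {
    "doc1": {"features": ["stores <em>accents</em> correctly"]}
  }
}
""")


def snippets_by_doc(solr_response, field):
    """Map each returned document id to its highlighted snippets for `field`.

    The snippets live in the top-level "highlighting" section, keyed by
    document id, not inside the entries of response.docs.
    """
    highlighting = solr_response.get("highlighting", {})
    result = {}
    for doc in solr_response["response"]["docs"]:
        doc_id = doc["id"]
        # An empty object for a document means no snippet was produced.
        result[doc_id] = highlighting.get(doc_id, {}).get(field, [])
    return result


print(snippets_by_doc(SAMPLE_RESPONSE, "features"))
```

Because the snippets are looked up by id, the highlighted field itself does not have to appear in the "fl" list; only the id does.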
>
> On Tue, Dec 15, 2015 at 10:02 AM, Evert R.  wrote:
> > Hi Erick,
> >
> > Thank you very much for the reply!!
> >
> > I do get back the full text, author, and a whole lot of stuff which
> > doesn't really matter for my project.
> >
> > So, what you are saying is that Solr gets me back the full content and
> > my application will handle the rest? Which means that, for all my books
> > (PDF files), searching for a specific word will bring back the whole
> > content of each book that matches the query, and my application (PHP)
> > will take care of showing only part of the text (as in highlighting,
> > as I understood it) and highlighting the keyword I was looking for?
> >
> > If so, Erick, you gave me a big help clearing that up... I thought I
> > could do that with Solr in an easy way. =)
> >
> > Thanks for the attachments tip!
> >
> > Best regards,
> >
> > Evert
> >
> > 2015-12-15 14:56 GMT-02:00 Erick Erickson :
> >
> >> How are you trying to display the results? Highlighting is a bit of an
> >> odd beast. Assuming it's correctly configured, the response packet
> >> will have a separate highlight section, it's the application's
> >> responsi

Re: Solr Basic Configuration - Highlight - Begginer

2015-12-16 Thread Evert R.
Hi everyone!

I think I should not have posted my server name... never had that many
access attempts...




Re: Solr Basic Configuration - Highlight - Begginer

2015-12-16 Thread Evert R.
Hi Andrea,

Thanks for the reply!

I tried with the hl.fl parameter as well, using as below:

http://localhost:8983/solr/techproducts/select?q=nietava&fl=id%2C+content&wt=json&indent=true&hl=true&hl.fl=f.content.hl.content%3D4&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E

and with each of the following values in the hl.fl field of the Solr Admin UI:

1. f.content.hl.snnipets=2
2. f.content.hl.content=4
3. content

with no success...

Should I have this configuration in the XML file?

Regards,

*Evert *

2015-12-16 11:23 GMT-02:00 Andrea Gazzarini :

> Hi Evert,
> what is the configuration of the default request handler? Did you set the
> hl.fl parameter?
>
> Please check here [1] the parameters that the highlighting component
> expects. Required parameters should be in the query string or declared
> within the request handler which answers to your query.
>
> Andrea
>
> [1] https://wiki.apache.org/solr/HighlightingParameters
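
As a rough sketch of "required parameters in the query string", the request could be assembled like this in Python. The function name and its defaults are assumptions made for illustration; the parameter names themselves (hl, hl.fl, hl.snippets, hl.fragsize, hl.simple.pre, hl.simple.post) are the ones listed on the wiki page above. Note that hl.fl takes a plain field name.

```python
from urllib.parse import urlencode


def build_highlight_url(base_url, core, query, hl_field, snippets=1, fragsize=100):
    """Build a /select URL that asks for highlighted snippets on `hl_field`.

    Hypothetical helper: the base URL, core name and the snippet/fragsize
    defaults are placeholders, not values taken from the thread.
    """
    params = [
        ("q", query),
        ("fl", "id"),              # the highlighted field need not be in fl
        ("wt", "json"),
        ("hl", "true"),
        ("hl.fl", hl_field),       # a field name, e.g. "content"
        ("hl.snippets", snippets),
        ("hl.fragsize", fragsize),
        ("hl.simple.pre", "<em>"),
        ("hl.simple.post", "</em>"),
    ]
    # urlencode percent-encodes the values, so "<em>" becomes "%3Cem%3E".
    return f"{base_url}/solr/{core}/select?{urlencode(params)}"


print(build_highlight_url("http://localhost:8983", "techproducts", "nietava", "content"))
```

Building the URL from a parameter list avoids hand-encoding mistakes such as a stray ";" or a parameter name pasted into the value of another parameter.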
>
>
>
>

Re: Solr Basic Configuration - Highlight - Begginer

2015-12-16 Thread Evert R.
Hi Andrea,

ok, let´s do it:

1. It does have the 'nietava' term: the search brings back the only book (pdf
file) that has this word, with all its content, as in my previous message to
Erick, so the content field is there.

2. Using content:nietava, it does not show any result, as below:

{
  "responseHeader": {
    "status": 400,
    "QTime": 12,
    "params": {
      "q": "contents:nietava",
      "indent": "true",
      "fl": "id",
      "wt": "json",
      "_": "1450282631352"
    }
  },
  "error": {
    "msg": "undefined field contents",
    "code": 400
  }
}

3. Here is what I found when grepping 'content' from the techproducts conf
folder:

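schema.xml: stored="true" multiValued="true"/>
schema.xml: type="text_general" indexed="false" stored="true" multiValued="true"/>
schema.xml:
schema.xml:
solrconfig.xml: name="facet.field">content_type
solrconfig.xml: name="hl.fl">content features title name
solrconfig.xml: name="f.content.hl.snippets">3
solrconfig.xml: name="f.content.hl.fragsize">200
solrconfig.xml: name="f.content.hl.alternateField">content
solrconfig.xml: name="f.content.hl.maxAlternateFieldLength">750
solrconfig.xml: application/json
solrconfig.xml: application/csv
solrconfig.xml: text/plain; charset=UTF-8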

and the grep on 'content_type':

schema.xml:   
schema.xml:   
solrconfig.xml:   content_type

=)

Thanks for checking out.



*Evert ​​*

2015-12-16 12:59 GMT-02:00 Andrea Gazzarini :

> hl=f.content.hl.content (I guess) is definitely wrong. Some questions:
>
> - First, sorry, the obvious question: are you sure the documents contain
>   the "nietava" term?
> - Could you try to use q=content:nietava?
> - Could you paste the definition (field & fieldtype) of the content
>   field?
>
> > Should I have this configuration in the XML file?
>
> You could, but it's up to you and it strongly depends on your context. The
> simple thing is that if you have those parameters within the configuration
> you can avoid passing them (as part of the requests), but probably in this
> phase, where you are testing, it's better to have them there (in the
> request).
>
> Andrea
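
One way to picture Andrea's distinction between configured and per-request parameters is as a simple merge, sketched below in Python. The dictionaries are hypothetical and the function is only a model of the idea that values in a request handler's defaults section apply unless the request supplies its own; it is not Solr code.

```python
def effective_params(handler_defaults, request_params):
    """Model of how handler defaults and request parameters combine.

    Assumption for this sketch: the configured values sit in a "defaults"
    section, so anything passed on the request takes precedence.
    """
    merged = dict(handler_defaults)
    merged.update(request_params)
    return merged


# Hypothetical defaults that could be declared once in solrconfig.xml.
defaults = {"hl": "true", "hl.fl": "content", "hl.snippets": "1"}

# While testing, everything is visible on the request itself.
request = {"q": "nietava", "hl.snippets": "3"}

print(effective_params(defaults, request))
```

Keeping the parameters on the request during testing, as Andrea suggests, makes it obvious which values are actually in effect.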
>

Re: Solr Basic Configuration - Highlight - Begginer

2015-12-16 Thread Evert R.
Hi Erick,

I think you are right!

When I use the form 'features:accents', in my case 'content:nietava', it
shows no matching documents... but if I take the field off, leaving
only 'q=searchword' (q=nietava), it brings back the pdf file content,
as below (in XML output format):

#partial snip:


Microsoft Word - André Luiz - Sexo e Destino _Chico e Waldo_.doc Francisco
Cândido Xavier e Waldo Vieira Sexo e Destino 12o livro da Coleção “A Vida
no Mundo Espiritual” Ditado pelo Espírito André Luiz FEDERAÇÃO ESPÍRITA
BRASILEIRA DEPARTAMENTO EDITORIAL Rua Souza Valente, 17 20941-040 - Rio -
RJ - Brasil http://www.febnet.org.br/ Francisco Cândido Xavier - Sexo e
Destino - pelo Espírito André Luiz 2 Coleção “A Vida no Mundo Espiritual”
01 - Nosso Lar 02 - Os Mensageiros 03 - Missionários da Luz 04 - Obreiros
da Vida Eterna 05 - No Mundo Maior 06 - Libertação 07 - Entre a Terra e o
Céu 08 - Nos Domínios da Mediunidade 09 - Ação e Reação 10 - Evolução em
Dois Mundos 11 - Mecanismos da Mediunidade 12 - Sexo e Destino 13 - E a
Vida Continua... Francisco Cândid
​
So, using:

1. q=content:nietava&hl=true&hl.fl=content  -> results:

status: 0
QTime: 3
params: q=content:nietava, hl=true, hl.fl=content
(no documents)


2. q=nietava&hl=true&hl.fl=content  -> results:

status: 0
QTime: 93
params: q=nietava, hl=true, hl.fl=content

id: pdf1
last_modified: 2011-07-28T20:39:26Z
title: Microsoft Word - André Luiz - Sexo e Destino _Chico e Waldo_.doc
content_type: application/pdf
author: Wander
author_s: Wander
content:
Microsoft Word - André Luiz - Sexo e Destino _Chico e Waldo_.doc Francisco
Cândido Xavier e Waldo Vieira Sexo e Destino 12o livro da Coleção “A Vida
no Mundo Espiritual” Ditado pelo Espírito André Luiz FEDERAÇÃO ESPÍRITA
BRASILEIRA DEPARTAMENTO EDITORIAL Rua Souza Valente, 17 20941-040 - Rio -
RJ - Brasil http://www.febnet.org.br/ Francisco Cândido Xavier - Sexo e
Destino - pelo Espírito André Luiz 2 Coleção “A Vida no Mundo Espiritual”
01 - Nosso Lar 02 - Os Mensageiros 03 - Missionários da Luz 04 - Obreiros
da Vida Eterna 05 - No Mundo Maior 06 - Libertação 07 - Entre a Terra e o
Céu 08 - Nos Domínios da Mediunidade 09 - Ação e Reação 10 - Evolução em
Dois Mundos 11 - Mecanismos da Mediunidade 12 - Sexo e Destino 13 - E a
Vida Continua... Francisco Cândido Xavier - ...(long text...
including the word 'nietava'
_version_: 1520731379641352192

 =(

Thanks!

​
*Evert*

2015-12-16 15:17 GMT-02:00 Erick Erickson :

> Ok, you're getting confused by all the options, an easy thing to do.
> You're trying to do too many things at once without making sure
> the basics work.
>
> 1> Forget all about the f.content.hl stuff. That's there in case
> you want to specify different parameters for different fields in the same
> highlight request. That's an advanced option for later.
>
> 2> Start with the basic techproducts example. Then this should show
> you highlights:
> q=features:accents&hl=true&hl.fl=features
>
> That's about as basic as you get. It's searching for "accents" in the
> features field and returning highlights on the features field.
>
> Once that's working, _then_ refine.
>
> Best,
> Erick
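
Erick's "basics first, then refine" advice could look roughly like this in Python. The helper per_field is a name invented for this note; it only spells out the f.<field>.<param> naming pattern that the f.content.hl.* options follow, and the snippet count used in the refinement step is an arbitrary example value.

```python
from urllib.parse import urlencode


def per_field(field, param):
    """Name of a per-field override, following the f.<field>.<param> pattern."""
    return f"f.{field}.{param}"


# Step 1: the most basic highlight request from the message above.
params = {"q": "features:accents", "hl": "true", "hl.fl": "features"}
basic_query = urlencode(params)
print(basic_query)

# Step 2 (only once step 1 returns highlights): refine a single field.
params[per_field("features", "hl.snippets")] = "2"
refined_query = urlencode(params)
print(refined_query)
```

The override is a separate request parameter of its own; it is not a value for hl.fl, which keeps holding only the field name.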
>
> On Wed, Dec 16, 2015 at 8:21 AM, Evert R.  wrote:
> > Hi Andrea,
> >
> > ok, let´s do it:
> >
> > 1. it does has the 'nietava' term, so it brings the only book (pdf file)
> > has this word, and all its content as my previous message to Erick, so
> the
> > content field is there.
> >
> > 2. using content:nietava it does not show any result as below:
> >
> > { "responseHeader": { "status": 400, "QTime": 12, "params": { "q":
> > "contents:nietava", "indent": "true", "fl": "id", "wt": "json", "_":
> > "1450282631352" } }, "error": { "msg": "undefined field contents",
> "code":
> > 400 } }
> >
> > 3. Here is what I found when grepping 'content' from the techproducts
> conf
> > folder:
> >
> > schema.xml: stored="true" multiValued="true"/>
> > schema.xml: type="text_general" indexed="false" stored="true" multiValued="true"/>
> > schema.xml:
> > schema.xml:
> > solrconfig.xml: name="facet.field">content_type
> > solrconfig.xml: name="hl.fl">content features title name
> > solrconfig.xml: name="f.content.hl.snippets">3
> > solrconfig.xml: name="f.content.hl.fragsize">200
> > solrconfig.xml: name="f.content.hl.alternateField">content
> > solrconfig.xml: name="f.content.hl.maxAlternateFieldLength">750
> > solrconfig.xml: name="stream.c

Re: Solr Basic Configuration - Highlight - Begginer

2015-12-16 Thread Evert R.
Hi Teague!

I configured solrconfig.xml and schema.xml exactly the way you did, only
replacing 'documentText' with 'content', the field used by the techproducts
sample, and reindexed with:

curl 'http://localhost:8983/solr/techproducts/update/extract?literal.id=pdf1&commit=true' \
    -F "Emmanuel=@/home/solr/dados/teste/Emmanuel.pdf"

with the same result, no highlighting in the response, as below:

"highlighting": { "pdf1": {} }

=(

I really do not know what to do...

Thanks for your time. If you have any more suggestions about where I could be
missing something, please let me know.


Best regards,

*Evert*

2015-12-16 15:30 GMT-02:00 Teague James :

> Hi Evert,
>
> I recently needed help with phrase highlighting and was pointed to the
> FastVectorHighlighter which worked out great. I just made a change to the
> configuration to add generateWordParts="0" and generateNumberParts="0" so
> that searches for things like "1a" would get highlighted correctly. You may
> or may not need that feature. You can always remove them or change the
> value to "1" to switch them on explicitly. Anyway, hope this helps!
>
> solrconfig.xml (partial snip)
> 
> 
> xml
> explicit
> 10
> documentText
> on
> text
> true
> 100
> 
> 
> 
> 
>
> schema.xml (partial snip)
> required="true" multiValued="false" />
> multivalued="true" termVectors="true" termOffsets="true"
> termPositions="true" />
>
>  positionIncrementGap="100">
> 
> 
>  words="stopwords.txt" />
>  catenateAll="1" preserveOriginal="1" generateNumberParts="0"
> generateWordParts="0" />
>  synonyms="index_synonyms.txt" ignoreCase="true" expand="true"/>
> 
> 
> 
> 
> 
> 
>  catenateAll="1" preserveOriginal="1" generateWordParts="0" />
>  words="stopwords.txt" />
> 
> 
> 
> 
>
> -Teague
>
> From: Evert R. [mailto:evert.ra...@gmail.com]
> Sent: Tuesday, December 15, 2015 6:25 AM
> To: solr-user@lucene.apache.org
> Subject: Solr Basic Configuration - Highlight - Begginer
>
> Hi there!
>
> It´s my first installation, not sure if here is the right channel...
>
> Here is my steps:
>
> 1. Set up a basic install of solr 5.4.0
>
> 2. Create a new core through command line (bin/solr create -c test)
>
> 3. Post 2 files: 1 .docx and 2 .pdf (bin/post -c test /docs/test/)
>
> 4. Query from the browser: it returns the correct search results, but it
> does not show the part of the text that matches my query, the highlight.
>
>   I have already checked the 'hl' option, but it still does not work...
>
> Example: I am looking for the word 'peace' in my pdf file (book). I have 4
> matches for this word; it shows me the book name (pdf file) but does not
> show which part of the text contains the word peace.
>
>
> I am probably missing some configuration in schema.xml, which is missing
> from my folder /solr/server/solr/test/conf/
>
> Or even in solrconfig.xml...
>
> I have read a bunch of things about highlighting, checked these files, and
> copied the standard schema.xml to my core/conf folder, but it still does
> not return the highlight.
>
>
> Attached a copy of my solrconfig.xml file.
>
>
> I am very sorry for this probably dumb and too basic question... This is
> the first time I have seen Solr live.
>
>
> Any help will be appreciated.
>
>
>
> Best regards,
>
>
> Evert Ramos
>
> mailto:evert.ra...@gmail.com
>
>
>


Re: Solr Basic Configuration - Highlight - Begginer

2015-12-16 Thread Evert R.
 with
> your search term (assuming you already indexed your document) and if
> necessary you can put something in the FQ like "id:123456" to target a
> specific record.
> >
> > Did you get a hit? If no, then it's not highlighting that's the issue.
> > If yes, then try dumping this in your address bar (using your URL/IP,
> > search term, and core name of course. The fq= is an example):
> > http://[URL/IP]/solr/[CORE-NAME]/select?fq=id:123456&q="[SEARCH-TERM]"
> >
> > That will dump Solr's output to your browser where you can see exactly
> > what is getting hit.
> >
> > Hope that helps! Let me know how it goes. Good luck.
> >
> > -Teague
> >


Re: Solr Basic Configuration - Highlight - Begginer

2015-12-17 Thread Evert R.
Hello Erick,

Sorry for my mistakes. Here is everything I have so far:

1. It returns the result perfectly, but the highlighting field is empty, as below:
{
  "responseHeader": {
    "status": 0,
    "QTime": 15,
    "params": {
      "q": "text:nietava",
      "debug": "query",
      "hl": "true",
      "hl.simple.post": "</em>",
      "indent": "true",
      "fq": "id:pdf1",
      "hl.fl": "text",
      "wt": "json",
      "hl.simple.pre": "<em>"
    }
  },
  "response": {
    "numFound": 1,
    "start": 0,
    "docs": [
      {
        "id": "pdf1",
        "last_modified": "2011-07-28T20:39:26Z",
        "title": ["Microsoft Word - André Luiz - Sexo e Destino _Chico e Waldo_.doc"],
        "content_type": ["application/pdf"],
        "author": "Wander",
        "author_s": "Wander",
        "content": ["André Luiz - Sexo e Destino _Chico e Waldo_.doc ***the whole content*** nietava"],
        "_version_": 1520765393269948416
      }
    ]
  },
  *"highlighting": {
    "pdf1": {***I THINK THE SNIPPETS OF TEXT SHOULD BE IN HERE, RIGHT?***}
  },*
  "debug": {
    "rawquerystring": "text:nietava",
    "querystring": "text:nietava",
    "parsedquery": "text:nietava",
    "parsedquery_toString": "text:nietava",
    "QParser": "LuceneQParser",
    "filter_queries": ["id:pdf1"],
    "parsed_filter_queries": ["id:pdf1"]
  }
}


2. Here is my settings:

In schema.xml:





  



  
  




  


In solrconfig.xml:

  explicit 10 false 

I have tried:

schema.xml:   

schema.xml:   

schema.xml:

















solrconfig.xml:

text
on
text
true
100



The debug output is included in the response I pasted above.


I am still using the standard techproducts.


I hope this is complete enough.


Thanks again!



*Evert*

2015-12-17 2:01 GMT-02:00 Erick Erickson :

> bq: but when highlight, using the text field...nothing comes up...
>
>
> http://localhost:8983/solr/techproducts/select?q=text:nietava&fq=id:pdf1&wt=json&indent=true&hl=true&hl.fl=text&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E
>
> It's unclear what this means. No results showed up (i.e. numFound==0)
> or no highlighting showed up? Assuming that
> 1> the "text" field has stored=true and
> 2> you find documents when searching on the "text" field
> the above should show something in the highlights section.
>
> Please take the time to provide complete details. Guessing what you're
> doing is wasting time, mine and yours. Once more:
> 1> what is the schema definition for the "text" field. Include the
> fieldType definition
> 2> What is the result of adding &debug=query to the request when you
> don't get highlights
>
> You might review: http://wiki.apache.org/solr/UsingMailingLists
> because it's becoming quite frustrating that you give us little bits
> of information that leave us guessing what you're _really_ doing.
> Highlighting is working for lots of people in lots of sites, it's not
> likely that this functionality is completely broken so the answer will
> be in the docs.
>
> Best,
> ERick
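
The distinction Erick asks for (no results at all versus results without highlights) can be checked mechanically. Below is a small Python sketch; the function name and the return labels are made up for illustration, and the sample dictionary simply mirrors the numFound and highlighting values reported in this thread.

```python
def diagnose(solr_response, hl_field):
    """Classify a select response so a report can say what actually happened.

    Returns one of four hypothetical labels:
      "no_hits"                 numFound == 0, highlighting is not the issue
      "no_highlighting_section" hits, but no "highlighting" key at all
      "empty_snippets"          hits and a section, but nothing for hl_field
      "ok"                      at least one snippet for every listed document
    """
    if solr_response["response"]["numFound"] == 0:
        return "no_hits"
    highlighting = solr_response.get("highlighting")
    if highlighting is None:
        return "no_highlighting_section"
    if any(not fields.get(hl_field) for fields in highlighting.values()):
        # Per the message above: check that the field has stored=true
        # and that the search actually matches on that field.
        return "empty_snippets"
    return "ok"


# Shape of the response reported in this thread: one hit, empty highlight.
reported = {
    "response": {"numFound": 1, "start": 0, "docs": [{"id": "pdf1"}]},
    "highlighting": {"pdf1": {}},
}
print(diagnose(reported, "text"))
```

A report that states which of these cases applies, together with the field and fieldType definitions and the debug=query output, covers the details requested above.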
>
> On Wed, Dec 16, 2015 at 5:54 PM, Evert R.  wrote:
> > Hi Erick and Teague,
> >
> >
> > I found that when using the field 'text' it shows the pdf file result
> > id:pdf1 in this case, like:
> >
> > http://localhost:8983/solr/techproducts/select?fq=id:pdf1&q=nietava
> >
> > but when highlight, using the text field...nothing comes up...
> >
> >
> http://localhost:8983/solr/techproducts/select?q=text:nietava&fq=id:pdf1&wt=json&indent=true&hl=true&hl.fl=text&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E
> >
> > or even with the option
> >
> > f.text.hl.snippets=2 under the hl.fl field.
> >
> >
> > I tried as well with the standard configuration, did it all over,
> reindexed
> > a couple times... and still did not work.
> >
> > Also,
> >
> > Using the Analysis, it brings below information:
> >
> > ST
> > text | raw_bytes | start | end | positionLength | type | position
> > nietav

Re: Solr Basic Configuration - Highlight - Begginer

2015-12-17 Thread Evert R.
Hello Teague,

Thanks for your reply and the tip! I think Solr will give me better results
than just using Tika to read my files and send them to a full-text index in
MySQL, which has the specific drawback of not highlighting text snippets...

So, I will keep trying to fit Solr to my needs. I am sure it works... I
am missing something.

Thanks again; I will keep at it.

When I find the solution I will post all the files and configs here for
future reference.

Best regards,

*Evert*

2015-12-17 6:11 GMT-02:00 Teague James :

> Erick's comments notwithstanding, there are some gaps in my understanding
> of your precise situation. Here are a few things that weren't necessarily
> obvious to me when I took my first try with Solr.
>
> Highlighting is the end result of a good hit. It is essentially formatting
> applied to your hit. It is possible to get a hit without a highlight if
> certain conditions exist.
>
> First, start by making sure you are indexing your target (a PDF file?)
> correctly. Assuming you are indexing PDFs, are you extracting meta data
> only or are you parsing the document with Tika? If you want hits on the
> contents of your PDF, then you have to parse it at index time and store
> that. That was why I suggested just running some queries through the
> interface and the URL to see what Solr actually captured from your indexed
> PDF before worrying about how it looks on the screen.
>
> Next, you should look carefully at the Analyzer's output. Notice the
> abbreviations to the left of the columns? Hover over those to see what
> filter factory it is. When words are split into multiple columns at one of
> those points, it indicates that the filter factory broke apart the word
> while analyzing it. Do a search for the filter factories that you
> find and read up on them. In my case "1a" was being split into 4 by a word
> delimiter filter factory - "1a", "1", "a", "1a" which caused highlighting
> to fail in my case while still getting a hit. It also caused erroneous hits
> elsewhere. Adding some switches to the schema is all it took to correct
> that for me. However, every case is different based on your needs. That is
> why it is important to go through the analyzer and see if Solr's indexing
> and querying are doing what you expect.
>
> If that looks good and you've got solid hits all the way down, then it is
> time to start looking at your highlighter implementation in the index and
> query analyzers that you are using. My original issue of not being able to
> highlight phrases with one set of tags necessitated me switching to the
> fast vector highlighter - which had its own requirements for certain
> parameters to be set. Here again - going to the Solr docs and reading up on
> the various highlighters will be helpful in most cases.
>
> Solr has a very steep learning curve. I've been using it for several years
> and I still consider myself a noob. It can be a deep dive, but don't be
> discouraged. Keep at it. Cheers!
>
> -Teague
>
> On Wed, Dec 16, 2015 at 8:54 PM, Evert R.  wrote:
>
> > Hi Erick and Teague,
> >
> >
> > I found that when using the field 'text' it shows the pdf file result
> > id:pdf1 in this case, like:
> >
> > http://localhost:8983/solr/techproducts/select?fq=id:pdf1&q=nietava
> >
> > but when highlight, using the text field...nothing comes up...
> >
> >
> >
> > http://localhost:8983/solr/techproducts/select?q=text:nietava&fq=id:pdf1&wt=json&indent=true&hl=true&hl.fl=text&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E
> >
> > or even with the option
> >
> > f.text.hl.snippets=2 under the hl.fl field.
> >
> >
> > I tried as well with the standard configuration, did it all over,
> > reindexed a couple times... and still did not work.
> >
> > Also,
> >
> > Using the Analysis, it brings below information:
> >
> > ST
> > text     raw_bytes               start  end  positionLength  type        position
> > nietava  [6e 69 65 74 61 76 61]  0      7    1               <ALPHANUM>  1
> > SF
> > text     raw_bytes               start  end  positionLength  type        position
> > nietava  [6e 69 65 74 61 76 61]  0      7    1               <ALPHANUM>  1
> > LCF
> > text     raw_bytes               start  end  positionLength  type        position
> > nietava  [6e 69 65 74 61 76 61]  0      7    1               <ALPHANUM>  1
> >
> >
> > Alphanumeric I think... so, it's 'string', right? Would that be a problem?
> > Should there be some other indication?
> >
> >
> > Thanks again!
> >
> >
> > *Evert*
> >
> > 2015-12-16 21:09 GMT-02:00 Erick Erickson :
> >
> > > I think you're still missing the critical bit. Highlighting is
> > 

Highlight brings the content from the first pages of pdf

2016-02-14 Thread Evert R.
Hi There,

I started the techproducts example, without any modification, and posted a
PDF file. When searching with:

q=text:search_word
hl=true
hl.fl=content

it shows the highlight as expected! =)

BUT... *if the "search_word" appears after the first pages* of my PDF file,
such as on page 15...

it simply *does not show* *the HIGHLIGHT*...

Has anyone faced this situation before?


Thanks!


*--Evert*


Re: Highlight brings the content from the first pages of pdf

2016-02-14 Thread Evert R.
Hi Paul,

Sorry for my late reply.

All the content is inside the docs. The query returns the docs, including the
PDF file that contains the search word. But the highlight is not shown if the
search word appears after the first few pages.

Evert


*--Evert*

2016-02-14 8:36 GMT-02:00 Paul Libbrecht :

> This looks like the stored content is shortened. Can it be?
> Can you see that inside the docs?
>
> paul


Re: Highlight brings the content from the first pages of pdf

2016-02-14 Thread Evert R.
Hi Binoy,

I could not find this option in my solrconfig.xml file.

I tried to add this setting and nothing changed...

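Here is the config; I might have misplaced the setting:

<searchComponent class="solr.HighlightComponent" name="highlight">
  <highlighting>
    <fragmenter name="gap"
                default="true"
                class="solr.highlight.GapFragmenter">
      <lst name="defaults">
        <int name="hl.fragsize">400</int>
        <int name="hl.maxAnalyzedChars">409600</int>
      </lst>
    </fragmenter>

    <fragmenter name="regex"
                class="solr.highlight.RegexFragmenter">
      <lst name="defaults">
        <int name="hl.fragsize">200</int>
        <int name="hl.maxAnalyzedChars">409600</int>
        <float name="hl.regex.slop">0.5</float>
        <str name="hl.regex.pattern">[-\w ,/\n\"']{20,200}</str>
      </lst>
    </fragmenter>
  </highlighting>
</searchComponent>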

thanks!


*--Evert*

2016-02-14 12:14 GMT-02:00 Binoy Dalal :

> From the solr wiki:
> hl.maxAnalyzedChars
>
> How many characters into a document to look for suitable
> snippets (Solr 1.3). This parameter makes sense for the original Highlighter
> only.
>
> The default value is "51200".
>
> You can assign a large value to this parameter and use hl.fragsize=0 to
> return highlighting in large fields that have size greater than 51200
> characters.
>
> I think this might be your hiccup.
>
> --
> Regards,
> Binoy Dalal
>
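
The effect Binoy describes can be pictured with a toy model. This is not Solr's implementation; it only assumes, per the wiki text quoted above, that the original highlighter examines the first hl.maxAnalyzedChars characters of the stored field. find_snippet is a hypothetical function written for this illustration.

```python
# Toy model (not Solr code): only the first max_analyzed_chars characters
# of the stored text are searched for something to highlight.
def find_snippet(text, term, max_analyzed_chars=51200, context=20):
    window = text[:max_analyzed_chars]
    pos = window.find(term)
    if pos == -1:
        return None  # the term lies beyond the analyzed window, or is absent
    start = max(0, pos - context)
    end = pos + len(term) + context
    return (
        window[start:pos]
        + "<em>" + term + "</em>"
        + window[pos + len(term):end]
    )

# A "book" whose only match sits after the default 51200-character limit.
book = "x" * 60000 + " nietava " + "y" * 100

print(find_snippet(book, "nietava"))                             # no snippet
print(find_snippet(book, "nietava", max_analyzed_chars=409600))  # snippet found
```

The document still matches the query in both cases; only the snippet search is cut short, which is consistent with a hit that comes back with an empty highlighting section.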


Re: Highlight brings the content from the first pages of pdf

2016-02-14 Thread Evert R.
Hi Binoy,

thanks!

Still not working, check the output:

{
  "responseHeader": {
    "status": 0,
    "QTime": 58,
    "params": {
      "q": "nietava",
      "hl": "true",
      "hl.simple.post": "",
      "indent": "true",
      "fl": "id",
      "hl.flagsize": "0",
      "hl.fl": "content",
      "hl.maxAnalzyedChars": "208400",
      "wt": "json",
      "hl.simple.pre": ""
    }
  },
  "response": {
    "numFound": 1,
    "start": 0,
    "docs": [
      {
        "id": "/home/solr/dados/teste/Emmanuel.pdf"
      }
    ]
  },
  "highlighting": {
    "/home/solr/dados/teste/Emmanuel.pdf": {}
  }
}



*--Evert*

2016-02-14 14:31 GMT-02:00 Binoy Dalal :

> Don't add this parameter to the searchComponent definition, because the
> components where you've added it, GapFragmenter and RegexFragmenter, simply
> don't use it.
> Instead, add it to your request handler (/select etc.) if you've configured
> highlighting in the handler or append it to your query:
> *&hl.maxAnalzyedChars=*.
> Additionally also set the *hl.fragsize parameter to 0*, if your text is
> larger than 51200 chars which it mostly is, in a similar fashion.
>
>
> --
> Regards,
> Binoy Dalal
>


Re: Highlight brings the content from the first pages of pdf

2016-02-14 Thread Evert R.
Binoy,

You are the man! =)

Thank you very much!

Would you by chance know how I could get the second highlight of the same
word in the same file?

For example: file_1.pdf contains the word "nietava" three times, so how can
I bring back the highlights for all three occurrences?

I am pretty new around here; should I open another thread for this?

Thanks again!


*--Evert*

2016-02-14 16:40 GMT-02:00 Binoy Dalal :

> Are you sure you've typed in the parameters correctly?
> In your response it says flagsize instead of fragsize and maxanalzyedchars
> instead of maxanalyzedchars.
>
> Ohh wait, I see that I made the analyzed typo. Awfully sorry for that, I'm
> using my phone to send the mail out.
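
Misspelled parameter names like these are easy to miss, since Solr echoed them back in the response above without reporting an error. A minimal sketch of a client-side check in Python follows; the list of known names covers only the parameters mentioned in this thread, and suspicious_params is a hypothetical helper, not a Solr feature.

```python
import difflib

# Highlighting parameters mentioned in this thread; extend as needed.
KNOWN_HL_PARAMS = [
    "hl", "hl.fl", "hl.snippets", "hl.fragsize",
    "hl.maxAnalyzedChars", "hl.simple.pre", "hl.simple.post",
]

def suspicious_params(params):
    """Map each unknown hl* name to the closest known name (or None)."""
    report = {}
    for name in params:
        if name.startswith("hl") and name not in KNOWN_HL_PARAMS:
            close = difflib.get_close_matches(name, KNOWN_HL_PARAMS, n=1)
            report[name] = close[0] if close else None
    return report

# The "params" section echoed back in the response above.
echoed = {
    "q": "nietava", "hl": "true", "indent": "true", "fl": "id",
    "hl.flagsize": "0", "hl.fl": "content",
    "hl.maxAnalzyedChars": "208400", "wt": "json",
}

for wrong, suggestion in suspicious_params(echoed).items():
    print(f"{wrong}: did you mean {suggestion}?")
```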

Re: Highlight brings the content from the first pages of pdf

2016-02-15 Thread Evert R.
Binoy,

Thank you very much for your reply and explanation.

Best regards,


*--Evert*

2016-02-14 23:28 GMT-02:00 Binoy Dalal :

> What you've done so far will highlight every instance of "nietava" found in
> the field, and return it, i.e., your entire field will return with all the
> "nietava"s in <em> tags.
> If you do not want the entire field, only portions of your field containing
> the matched terms, then use the hl.snippets parameter = the number of snippets
> you want, in this particular case 3, along with the hl.fragsize parameter
> set to the same number as your hl.maxAnalyzedChars (or a really large
> number).
>
> I suggest you go through the wiki documentation for highlighting once (
> https://wiki.apache.org/solr/HighlightingParameters). It should answer all
> of your questions regarding the use of the standard highlighter that you
> might have.
>
> As an additional note, I also suggest that you look into the
> PostingsHighlighter (
> https://cwiki.apache.org/confluence/display/solr/Postings+Highlighter),
> since you seem to be running highlighting on pretty big fields and postings
> is much more efficient at highlighting huge fields as compared to the
> standard highlighter.
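
The parameters Binoy lists can be collected in one place before being sent. The sketch below is in Python and rests on assumptions: snippet_params is a hypothetical helper, the field name content comes from earlier in this thread, and the fragsize of 100 is only a placeholder to be replaced with whatever fragment size suits the application (Binoy suggests a large value).

```python
from urllib.parse import urlencode

# Hypothetical helper: request several highlighted fragments per document.
def snippet_params(term, field="content", snippets=3, fragsize=100,
                   max_analyzed_chars=409600):
    return {
        "q": term,
        "fl": "id",
        "wt": "json",
        "hl": "true",
        "hl.fl": field,
        "hl.snippets": snippets,                    # max fragments per field
        "hl.fragsize": fragsize,                    # fragment size in characters
        "hl.maxAnalyzedChars": max_analyzed_chars,  # how far into the text to look
    }

query_string = urlencode(snippet_params("nietava"))
print(query_string)
```

Appending the printed string to the core's /select? URL gives a request that asks for up to three snippets for the matched document.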

Re: Highlight brings the content from the first pages of pdf

2016-02-15 Thread Evert R.
Hello Mark,

Thanks for your reply.

All text is indexed (1 pdf file). It works now.

Best regards,


*--Evert*

2016-02-14 23:47 GMT-02:00 Mark Ehle :

> Is all the text being indexed? Check to make sure that the data you are
> looking for is actually in the index. Is there a setting in Tika that
> limits how much is indexed? I seem to remember confronting this problem
> myself once, and the data that I wanted just wasn't in the index because it
> was never put there in the first place. Something about setMaxStringLength
> or something.
>

Re: What is the best way to index 15 million documents of total size 425 GB?

2016-03-04 Thread Evert R.
I have worked with Pentaho and I believe your problem might be there.

Try setting up a quick PHP script; you might get better results with it.
There is no need for Data Integration for that.

Just a tip.
On 04/03/2016 13:12, "Walter Underwood" wrote:

>
> > On Mar 3, 2016, at 9:54 AM, Aneesh Mon N  wrote:
> >
> > To be noted that all the fields are stored so as to support the atomic
> > updates.
>
> Are you doing all of these updates as atomic? That could be slow. If you
> are supplying all the fields, then just do a regular add.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
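
The difference Walter points at shows up in the shape of the update payload. A minimal sketch in Python, with made-up field names and values: a regular add sends the whole document, while an atomic update sends only the changed field wrapped in a modifier such as "set". The is_atomic function is a hypothetical check written for this illustration.

```python
import json

# Regular add: the full document is sent and replaces any existing version.
# Field names and values here are made up for illustration.
regular_add = [{"id": "doc1", "title": "Example title", "price": 10}]

# Atomic update: only the changed field is sent, wrapped in a modifier
# such as "set"; Solr rebuilds the rest of the document from stored fields.
atomic_update = [{"id": "doc1", "price": {"set": 12}}]

def is_atomic(doc):
    """True if any field other than "id" carries an update modifier (a dict)."""
    return any(isinstance(value, dict) for key, value in doc.items() if key != "id")

for payload in (regular_add, atomic_update):
    kind = "atomic" if any(is_atomic(d) for d in payload) else "regular"
    print(kind, json.dumps(payload))
```

If the indexing job already has every field of every document in hand, the regular form avoids the extra work of reading back and rebuilding stored fields for each update.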


Solr Basic Configuration - Highlight - Begginer

2015-12-15 Thread Evert R.
Hi there!

It's my first installation, and I am not sure if this is the right channel...

Here are my steps:

1. Set up a basic install of Solr 5.4.0

2. Create a new core through the command line (bin/solr create -c test)

3. Post 2 files: 1 .docx and 2 .pdf (bin/post -c test /docs/test/)

4. Query from the browser. It returns the correct results, but it does not
show the part of the text I am querying for, the highlight.

  I have already checked the 'hl' option, but it still does not work...

Example: I am looking for the word 'peace' in my PDF file (a book). I have 4
matches for this word; it shows me the book name (PDF file) but does not
show which part of the text contains the word 'peace'.


I am probably missing some configuration in schema.xml, which is missing
from my folder /solr/server/solr/test/conf/

Or even in solrconfig.xml...

I have read a bunch of things about highlighting, checked these files, and
copied the standard schema.xml to my core/conf folder, but it still does not
bring back the highlight.


Attached is a copy of my solrconfig.xml file.


I am very sorry for this probably dumb and too basic question... It is the
first time I have seen Solr live.


Any help will be appreciated.



Best regards,



*Evert Ramos*
*evert.ra...@gmail.com *


Re: Solr Basic Configuration - Highlight - Begginer

2015-12-15 Thread Evert R.
Hi Erick,

Thank you very much for the reply!!

I do get back the full text, author, and a whole lot of other stuff which
doesn't really matter for my project.

So, what you are saying is that Solr gives me back the full content and
my application handles the rest? That means that when I search my books (PDF
files) for a specific word, Solr returns the whole content of each book that
matches the query, and my application (PHP, in this case) takes care of
showing only part of the text (as in a highlight, as I understood it) and
highlighting the keyword I was looking for?

If so, Erick, you have been a big help in clearing this up... I thought I
could do that with Solr in an easy way. =)

Thanks for the attachments tip!

Best regards,

Evert

2015-12-15 14:56 GMT-02:00 Erick Erickson :

> How are you trying to display the results? Highlighting is a bit of an
> odd beast. Assuming it's correctly configured, the response packet
> will have a separate highlight section, it's the application's
> responsibility to present that pleasingly.
>
> What _do_ you get back in the response?
>
> BTW, the mail server pretty aggressively strips attachments; yours
> didn't come through.
>
> Best,
> Erick
>
>
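
Erick's point that the highlights arrive in a separate section of the response, keyed by document id, can be sketched as follows. The example is in Python rather than PHP; the sample response is shaped like the JSON shown in this thread, but the snippet text is made up, and attach_snippets is a hypothetical helper.

```python
import json

# Illustrative response shaped like the JSON in this thread; the snippet
# text is invented for the example.
raw = """
{
  "response": {"numFound": 1, "start": 0, "docs": [{"id": "pdf1"}]},
  "highlighting": {
    "pdf1": {"content": ["... the word <em>peace</em> appears here ..."]}
  }
}
"""

def attach_snippets(result, field="content"):
    """Pair each returned document with its highlight snippets, by id."""
    highlighting = result.get("highlighting", {})
    merged = []
    for doc in result["response"]["docs"]:
        snippets = highlighting.get(doc["id"], {}).get(field, [])
        merged.append({"id": doc["id"], "snippets": snippets})
    return merged

for item in attach_snippets(json.loads(raw)):
    print(item["id"], item["snippets"])
```

The application then decides how to present the snippets; a document whose highlighting entry is empty simply gets an empty list.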