Re: Information on classifier based key word suggestion

2017-01-24 Thread alessandro.benedetti
Hi Shamik,
for classification you can take a look at the Lucene module and the Solr
integration (through an UpdateRequestProcessor [1]).

Unfortunately I haven't had the time to work on the request handler version
[2]; anyway, you are free to contribute!

Regarding the extraction of interesting terms from a text or a set of
documents, that is still a work in progress.
But you can potentially play a bit with faceting to achieve something
similar.
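As a rough sketch of that faceting workaround (my own illustration, not from the original mail; the collection and field names are made up), you can restrict the result set with a query and read the top terms of an indexed text field off the facet counts:

```python
from urllib.parse import urlencode

# Hypothetical names: facet on an indexed, tokenized text field,
# restricted to the subset of documents you want keywords for.
params = {
    "q": "category:security",   # the document subset of interest
    "rows": 0,                  # only the facet counts are needed
    "facet": "true",
    "facet.field": "body_txt",  # an indexed text field (assumed name)
    "facet.limit": 20,          # top 20 terms ~ candidate keywords
    "facet.mincount": 2,
}
print("/solr/mycollection/select?" + urlencode(params))
```

The terms with the highest counts in the restricted set are a crude approximation of "interesting terms" for that set.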

Cheers

[1]
http://www.slideshare.net/AlessandroBenedetti/lucene-and-solr-document-classification
, 
https://issues.apache.org/jira/browse/SOLR-7739, 
https://issues.apache.org/jira/browse/SOLR-8871

[2] https://issues.apache.org/jira/browse/SOLR-7738



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Information-on-classifier-based-key-word-suggestion-tp4314942p4315510.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: A tool to quickly browse Solr documents ?

2017-01-24 Thread Charlie Hull

On 24/01/2017 04:36, Fengtan wrote:

Hi All,

I am looking for a tool to quickly browse/investigate documents indexed in
a Solr core.

The default web admin interface already offers this, but you need to know
the Solr query syntax if you want to list/filter/sort documents.

I have started to build my own tool (https://github.com/fengtan/sophie) but
I don't want to reinvent the wheel -- does anyone know if something similar
already exists?

Thanks

We're building Marple, a RESTful API and GUI client to inspect Lucene 
indexes, originally started during our hackdays in October:

https://github.com/flaxsearch/marple

Cheers

Charlie

--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk


Re: Upgrade SOLR version - facets perfomance regression

2017-01-24 Thread alessandro.benedetti
Hi Solr,
I admit the issue you mention has not been transparently solved; indeed, you
would need to explicitly use facet.method=uif to get the 4.10.1 behavior.

This applies if you were using the fc/fcs approaches with high-cardinality
fields.

In case your facet method is enum (term enumeration), the issue has been
transparently solved (
https://issues.apache.org/jira/browse/SOLR-9176 )
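For reference, the request-level switch looks like this (a sketch with a made-up field name; facet.method=uif is the parameter referred to above):

```python
from urllib.parse import urlencode

# facet.method=uif requests the 4.10-style UnInvertedField faceting,
# which suits high-cardinality fields; fc/fcs and enum remain available.
params = {
    "q": "*:*",
    "rows": 0,
    "facet": "true",
    "facet.field": "author_s",  # hypothetical high-cardinality field
    "facet.method": "uif",
}
print("/solr/mycollection/select?" + urlencode(params))
```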

Cheers



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Upgrade-SOLR-version-facets-perfomance-regression-tp4315027p4315512.html
Sent from the Solr - User mailing list archive at Nabble.com.


Indexing nested documents giving back unrelated parents when asking for children

2017-01-24 Thread Fabien Renaud
Hello,

I'm wondering if I missed something in my code (which uses SolrJ 6.3):

import java.util.Arrays;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class Main {

    private SolrClient client1;

    public void run() {
        client1 = new HttpSolrClient.Builder("http://localhost:8983/solr").build();

        // doc2 is a child of doc1
        SolrInputDocument doc1 = new SolrInputDocument();
        doc1.addField("id", "1");
        doc1.addField("type_s", "up");

        SolrInputDocument doc2 = new SolrInputDocument();
        doc2.addField("id", "2");
        doc2.addField("type_s", "down");

        doc1.addChildDocument(doc2);

        // doc5 is a child of doc4
        SolrInputDocument doc4 = new SolrInputDocument();
        doc4.addField("id", "4");
        doc4.addField("type_s", "up");

        SolrInputDocument doc5 = new SolrInputDocument();
        doc5.addField("id", "5");
        doc5.addField("type_s", "down");

        doc4.addChildDocument(doc5);

        try {
            client1.add("techproducts", Arrays.asList(doc1, doc4));
        } catch (Exception e) {
            System.out.println("Indexing failed: " + e);
        }
    }
}

If I start Solr 6.3 using bin/solr start -e techproducts and issue the following query:

http://localhost:8983/solr/techproducts/select?fl=*,[child%20parentFilter=type_s:down]&fq=type_s:down&indent=on&q=*:*&wt=json


then I get:

{
  "docs": [
{
  "id": "2",
  "type_s": "down"
},
{
  "id": "5",
  "type_s": "down",
  "_childDocuments_": [
{
  "id": "1",
  "type_s": "up"
}
  ]
}
  ]
}

which seems like a bug to me. Or did I miss something?
Note that the relations "2 is a child of 1" and "5 is a child of 4" work
fine; it's just that I get extra (unwanted and unrelated) relations.

Note also that at some point I managed to get back two documents with the __same__
id (with different versions). I'm not able to reproduce this, but I guess it
could be related.

Fabien



no dataimport-handler defined!

2017-01-24 Thread Chris Rogers
Hi all,

Having frustrating issues with getting SOLR 6.4.0 to recognize the existence of
my DIH config. I’m using the Oracle Java 8 JDK on Ubuntu 14.04.

The DIH .jar file appears to be loading correctly. There are no errors in the 
SOLR logs. It just says “Sorry, no dataimport-handler defined” in the SOLR 
admin UI.

My config files are listed below. Can anyone spot any mistakes here?

Many thanks,
Chris

# solrconfig.xml ##

  <lib dir="..." regex=".*dataimporthandler-.*\.jar" />

…

  <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">DIH-data-config.xml</str>
    </lst>
  </requestHandler>

# DIH-data-config.xml (in the same dir as solrconfig.xml) ##

<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <entity name="f"
            processor="FileListEntityProcessor"
            fileName=".*xml"
            newerThan="'NOW-5YEARS'"
            recursive="true"
            rootEntity="false"
            dataSource="null"
            baseDir="/home/bodl-tei-svc/sites/bodl-tei-svc/var/data/tolkein_tei">

      <entity name="..."
              processor="XPathEntityProcessor"
              forEach="/TEI"
              url="${f.fileAbsolutePath}"
              transformer="RegexTransformer">
        <field column="..." xpath="/TEI/teiHeader/fileDesc/titleStmt/title"/>
        <field column="..." xpath="/TEI/teiHeader/fileDesc/publicationStmt/publisher"/>
        <field column="..." xpath="/TEI/teiHeader/fileDesc/sourceDesc/msDesc/msIdentifier/altIdentifier/idno"/>
      </entity>

    </entity>
  </document>
</dataConfig>


--
Chris Rogers
Digital Projects Manager
Bodleian Digital Library Systems and Services
chris.rog...@bodleian.ox.ac.uk


Re: Indexing nested documents giving back unrelated parents when asking for children

2017-01-24 Thread Mikhail Khludnev
Hello Fabien,

I believe parentFilter should be type_s:up, and consequently the type_s:up
should go in fq.
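Spelled out as a request (my reconstruction of the suggested fix, untested against the thread's index), the parentFilter identifies the parent documents and the fq then selects those parents:

```python
from urllib.parse import urlencode

# parentFilter tells the [child] transformer which documents are parents;
# fq then selects the parents whose children should be attached.
params = {
    "q": "*:*",
    "fq": "type_s:up",
    "fl": "*,[child parentFilter=type_s:up]",
    "wt": "json",
    "indent": "on",
}
print("/solr/techproducts/select?" + urlencode(params))
```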

On Tue, Jan 24, 2017 at 3:30 PM, Fabien Renaud 
wrote:

> Hello,
>
> I'm wondering if I missed something in my code (which uses solrj 6.3):
>
> public class Main {
>
> private SolrClient client1;
>
> public void run() {
> client1 = new HttpSolrClient.Builder("http://localhost:8983/solr
> ").build();
>
> SolrInputDocument doc1 = new SolrInputDocument();
>
> doc1.addField("id", "1");
> doc1.addField("type_s", "up");
> SolrInputDocument doc2 = new SolrInputDocument();
>
> doc2.addField("id", "2");
> doc2.addField("type_s", "down");
>
> doc1.addChildDocument(doc2);
>
> SolrInputDocument doc4 = new SolrInputDocument();
> doc4.addField("id", "4");
> doc4.addField("type_s", "up");
>
> SolrInputDocument doc5 = new SolrInputDocument();
> doc5.addField("id", "5");
> doc5.addField("type_s", "down");
>
> doc4.addChildDocument(doc5);
>
> try {
> client1.add("techproducts", Arrays.asList(doc1,doc4));
> } catch (Exception e) {
> System.out.println("Indexing failed" + e);
> }
> }
>
> If I start Solr 6.3 using bin/start start -e techproduct and ask the
> following:
>
> http://localhost:8983/solr/techproducts/select?fl=*,[
> child%20parentFilter=type_s:down]&fq=type_s:down&indent=on&q=*:*&wt=json
>
>
> then I get:
>
> {
>   "docs": [
> {
>   "id": "2",
>   "type_s": "down"
> },
> {
>   "id": "5",
>   "type_s": "down",
>   "_childDocuments_": [
> {
>   "id": "1",
>   "type_s": "up"
> }
>   ]
> }
>   ]
> }
>
> which seems to be a bug for me. Or did I miss something?
> Notice that the relations "2 is a child of 1" and "5 is a child of 4" are
> working fine. It's just that I get extra (unwanted and unrelated) relations.
>
> Notice that at some point I manage to get back two documents with the
> __same__ id (with different version). I'm not able to reproduce this but I
> guess it could be related.
>
> Fabien
>
>


-- 
Sincerely yours
Mikhail Khludnev


SQL-like queries (with percent character) - matching an exact substring, with parts of words

2017-01-24 Thread Maciej Ł. PCSS

Dear SOLR users,

please point me to the right solution of my problem. I'm using SOLR to 
implement a Google-like search in my application and this scenario is 
working fine.


However, in specific use-cases I need to filter documents that include a 
specific substring in a given field. It's about an SQL-like query 
similar to this:


SELECT * FROM table WHERE someField LIKE '%c def g%'

I expect to match documents having someField = 'abc def ghi'. That means
I expect to match parts of words.


As I understand it, SOLR, being an inverted index, works with tokens rather
than character strings and thereby looks for whole words (not substrings).

Is there any solution for such an issue?
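The thread leaves this open, but one common approach (my suggestion, not from the original mail) is to index n-grams of the whole field value, for example with a KeywordTokenizer followed by an NGramFilter, so that any substring up to a bounded length becomes a searchable token. A toy illustration of the idea:

```python
def ngrams(text, min_len=2, max_len=15):
    # All substrings of text between min_len and max_len characters,
    # mimicking what a KeywordTokenizer + NGramFilter chain would index.
    return {text[i:i + n]
            for n in range(min_len, max_len + 1)
            for i in range(len(text) - n + 1)}

indexed = ngrams("abc def ghi")
# The SQL-LIKE pattern '%c def g%' becomes an exact token lookup:
print("c def g" in indexed)  # True
```

The trade-off is index size: n-gramming every field value multiplies the number of indexed terms considerably.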

Regards
Maciej Łabędzki



RE: Indexing nested documents giving back unrelated parents when asking for children

2017-01-24 Thread Fabien Renaud
I know it works as expected when I set type_s:up as you describe, but I was
expecting no children at all in my query.

In my real query I have a document with several children and thus can't specify
a specific type with childFilter. And I can't return all the children, because
some of them do not make any sense at all.
The problem also appears for an intermediate node (one which has children and
which is itself a child of another).

Fabien

-Original Message-
From: Mikhail Khludnev [mailto:m...@apache.org] 
Sent: 24 January 2017 14:06
To: solr-user 
Subject: Re: Indexing nested documents giving back unrelated parents when 
asking for children

Hello Fabien,

I believe parentFilter should be type_s:up, and consequently the type_s:up 
should go in fq.

On Tue, Jan 24, 2017 at 3:30 PM, Fabien Renaud 
wrote:

> Hello,
>
> I'm wondering if I missed something in my code (which uses solrj 6.3):
>
> public class Main {
>
> private SolrClient client1;
>
> public void run() {
> client1 = new 
> HttpSolrClient.Builder("http://localhost:8983/solr
> ").build();
>
> SolrInputDocument doc1 = new SolrInputDocument();
>
> doc1.addField("id", "1");
> doc1.addField("type_s", "up");
> SolrInputDocument doc2 = new SolrInputDocument();
>
> doc2.addField("id", "2");
> doc2.addField("type_s", "down");
>
> doc1.addChildDocument(doc2);
>
> SolrInputDocument doc4 = new SolrInputDocument();
> doc4.addField("id", "4");
> doc4.addField("type_s", "up");
>
> SolrInputDocument doc5 = new SolrInputDocument();
> doc5.addField("id", "5");
> doc5.addField("type_s", "down");
>
> doc4.addChildDocument(doc5);
>
> try {
> client1.add("techproducts", Arrays.asList(doc1,doc4));
> } catch (Exception e) {
> System.out.println("Indexing failed" + e);
> }
> }
>
> If I start Solr 6.3 using bin/start start -e techproduct and ask the
> following:
>
> http://localhost:8983/solr/techproducts/select?fl=*,[
> child%20parentFilter=type_s:down]&fq=type_s:down&indent=on&q=*:*&wt=js
> on
>
>
> then I get:
>
> {
>   "docs": [
> {
>   "id": "2",
>   "type_s": "down"
> },
> {
>   "id": "5",
>   "type_s": "down",
>   "_childDocuments_": [
> {
>   "id": "1",
>   "type_s": "up"
> }
>   ]
> }
>   ]
> }
>
> which seems to be a bug for me. Or did I miss something?
> Notice that the relations "2 is a child of 1" and "5 is a child of 4" 
> are working fine. It's just that I get extra (unwanted and unrelated) 
> relations.
>
> Notice that at some point I manage to get back two documents with the 
> __same__ id (with different version). I'm not able to reproduce this 
> but I guess it could be related.
>
> Fabien
>
>


--
Sincerely yours
Mikhail Khludnev


Re: no dataimport-handler defined!

2017-01-24 Thread Alexandre Rafalovitch
Which solrconfig.xml are you editing, and what kind of Solr install are
you running (cloud?)? And did you reload the core?

I suspect you are not editing the file that is actually in use. For
example, if you are running a cloud setup, the solrconfig.xml on the
filesystem is disconnected from the config actually in use, which is
stored in ZooKeeper. You would need to re-upload it for the change to take
effect.

You also may need to reload the core for changes to take effect.

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 24 January 2017 at 07:43, Chris Rogers
 wrote:
> Hi all,
>
> Having frustrating issues with getting SOLR 6.4.0 to recognize the existence 
> of my DIH config. I’m using Oracle Java8 jdk on Ubuntu 14.04.
>
> The DIH .jar file appears to be loading correctly. There are no errors in the 
> SOLR logs. It just says “Sorry, no dataimport-handler defined” in the SOLR 
> admin UI.
>
> My config files are listed below. Can anyone spot any mistakes here?
>
> Many thanks,
> Chris
>
> # solrconfig.xml ##
>
>regex=".*dataimporthandler-.*\.jar" />
>
> …
>
>class="org.apache.solr.handler.dataimport.DataImportHandler">
> 
>   DIH-data-config.xml
> 
>   
>
> # DIH-data-config.xml (in the same dir as solrconfig.xml) ##
>
> 
>   
>   
> 
>  fileName=".*xml"
> newerThan="'NOW-5YEARS'"
> recursive="true"
> rootEntity="false"
> dataSource="null"
> 
> baseDir="/home/bodl-tei-svc/sites/bodl-tei-svc/var/data/tolkein_tei">
>
>   
>
>  forEach="/TEI" url="${f.fileAbsolutePath}" 
> transformer="RegexTransformer" >
>  xpath="/TEI/teiHeader/fileDesc/titleStmt/title"/>
>  xpath="/TEI/teiHeader/fileDesc/publicationStmt/publisher"/>
>  xpath="/TEI/teiHeader/fileDesc/sourceDesc/msDesc/msIdentifier/altIdentifier/idno"/>
>   
>
> 
>
>   
> 
>
>
> --
> Chris Rogers
> Digital Projects Manager
> Bodleian Digital Library Systems and Services
> chris.rog...@bodleian.ox.ac.uk


Re: no dataimport-handler defined!

2017-01-24 Thread Chris Rogers
Hi Alex,

I’m editing the solrconfig.xml file at /solr/server/solr/tei_config (i.e. the one
generated from the configset when the node was created).

I’m running standalone, not cloud.

I’m restarting Solr after every change. Do I need to reload the core instead of
restarting?

I’ve also tried replacing the relative path to the .jar with an absolute path
to the dist directory. That still didn’t work.

Thanks,
Chris

On 24/01/2017, 13:20, "Alexandre Rafalovitch"  wrote:

Which solrconfig.xml are you editing and what kind of Solr install are
you running (cloud?). And did you reload the core.

I suspect you are not editing the file that is actually in use. For
example, if you are running a cloud setup, the solrconfig.xml on the
filesystem is disconnected from the config actually in use that is
stored in ZooKeeper. You would need to reupload it for change to take
effect.

You also may need to reload the core for changes to take effect.

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 24 January 2017 at 07:43, Chris Rogers
 wrote:
> Hi all,
>
> Having frustrating issues with getting SOLR 6.4.0 to recognize the 
existence of my DIH config. I’m using Oracle Java8 jdk on Ubuntu 14.04.
>
> The DIH .jar file appears to be loading correctly. There are no errors in 
the SOLR logs. It just says “Sorry, no dataimport-handler defined” in the SOLR 
admin UI.
>
> My config files are listed below. Can anyone spot any mistakes here?
>
> Many thanks,
> Chris
>
> # solrconfig.xml ##
>
>   
>
> …
>
>   
> 
>   DIH-data-config.xml
> 
>   
>
> # DIH-data-config.xml (in the same dir as solrconfig.xml) ##
>
> 
>   
>   
> 
>  fileName=".*xml"
> newerThan="'NOW-5YEARS'"
> recursive="true"
> rootEntity="false"
> dataSource="null"
> 
baseDir="/home/bodl-tei-svc/sites/bodl-tei-svc/var/data/tolkein_tei">
>
>   
>
>  forEach="/TEI" url="${f.fileAbsolutePath}" 
transformer="RegexTransformer" >
> 
> 
> 
>   
>
> 
>
>   
> 
>
>
> --
> Chris Rogers
> Digital Projects Manager
> Bodleian Digital Library Systems and Services
> chris.rog...@bodleian.ox.ac.uk




Re: no dataimport-handler defined!

2017-01-24 Thread Chris Rogers
A quick update: I rolled back to Solr 6.2, and the data import handler is
recognized there.

So either the required config has changed between 6.2 and 6.4, or there’s a
bug in 6.4.

Any thoughts?

On 24/01/2017, 13:32, "Chris Rogers"  wrote:

Hi Alex,

I’m editing the solrconfig.xml file at /solr/server/solr/tei_config (ie the 
one generated from the configset when the node was created).

I’m running standalone, not cloud.

I’m restarting sole after every change. Do I need to reload the core 
instead of restarting?

I’ve also tried replacing the relative path to the .jar with an absolute 
path to the dist directory. Still didn’t work.

Thanks,
Chris

On 24/01/2017, 13:20, "Alexandre Rafalovitch"  wrote:

Which solrconfig.xml are you editing and what kind of Solr install are
you running (cloud?). And did you reload the core.

I suspect you are not editing the file that is actually in use. For
example, if you are running a cloud setup, the solrconfig.xml on the
filesystem is disconnected from the config actually in use that is
stored in ZooKeeper. You would need to reupload it for change to take
effect.

You also may need to reload the core for changes to take effect.

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and 
experienced


On 24 January 2017 at 07:43, Chris Rogers
 wrote:
> Hi all,
>
> Having frustrating issues with getting SOLR 6.4.0 to recognize the 
existence of my DIH config. I’m using Oracle Java8 jdk on Ubuntu 14.04.
>
> The DIH .jar file appears to be loading correctly. There are no 
errors in the SOLR logs. It just says “Sorry, no dataimport-handler defined” in 
the SOLR admin UI.
>
> My config files are listed below. Can anyone spot any mistakes here?
>
> Many thanks,
> Chris
>
> # solrconfig.xml ##
>
>   
>
> …
>
>   
> 
>   DIH-data-config.xml
> 
>   
>
> # DIH-data-config.xml (in the same dir as solrconfig.xml) ##
>
> 
>   
>   
> 
>  fileName=".*xml"
> newerThan="'NOW-5YEARS'"
> recursive="true"
> rootEntity="false"
> dataSource="null"
> 
baseDir="/home/bodl-tei-svc/sites/bodl-tei-svc/var/data/tolkein_tei">
>
>   
>
>  forEach="/TEI" url="${f.fileAbsolutePath}" 
transformer="RegexTransformer" >
> 
> 
> 
>   
>
> 
>
>   
> 
>
>
> --
> Chris Rogers
> Digital Projects Manager
> Bodleian Digital Library Systems and Services
> chris.rog...@bodleian.ox.ac.uk






Get Handler Returning Null

2017-01-24 Thread Chris Ulicny
Recently started using the get handler on a solr cloud collection and it
seems that it does not return any documents even when I can find those
documents by filtering for their unique ids.

I explicitly enabled the get handler, reindexed one of the documents, and
it seems to work fine for that single document, even when the handler is no
longer explicitly defined.

I am pretty sure that there weren't any changes made to the schema or
config files since the initial indexing. The update log has been enabled
since the collection was created.

Are there any obvious changes (or botched initial configurations) that
would possibly cause the get handler to not return documents that exist in
the collection?
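For reference, the real-time get requests in question look like the following (a sketch; the collection name and ids are hypothetical):

```python
from urllib.parse import urlencode

# /get serves documents by uniqueKey from the index plus the update log,
# which is why it depends on <updateLog> being enabled in solrconfig.xml.
params = {"ids": "doc1,doc2", "wt": "json"}
print("/solr/mycollection/get?" + urlencode(params))
```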

Thanks,
Chris


Re: no dataimport-handler defined!

2017-01-24 Thread Alexandre Rafalovitch
Strange.

If you run a pre-built DIH example, do any of the cores work? (not the
RSS one, that is broken anyway).

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 24 January 2017 at 08:32, Chris Rogers
 wrote:
> Hi Alex,
>
> I’m editing the solrconfig.xml file at /solr/server/solr/tei_config (ie the 
> one generated from the configset when the node was created).
>
> I’m running standalone, not cloud.
>
> I’m restarting sole after every change. Do I need to reload the core instead 
> of restarting?
>
> I’ve also tried replacing the relative path to the .jar with an absolute path 
> to the dist directory. Still didn’t work.
>
> Thanks,
> Chris
>
> On 24/01/2017, 13:20, "Alexandre Rafalovitch"  wrote:
>
> Which solrconfig.xml are you editing and what kind of Solr install are
> you running (cloud?). And did you reload the core.
>
> I suspect you are not editing the file that is actually in use. For
> example, if you are running a cloud setup, the solrconfig.xml on the
> filesystem is disconnected from the config actually in use that is
> stored in ZooKeeper. You would need to reupload it for change to take
> effect.
>
> You also may need to reload the core for changes to take effect.
>
> Regards,
>Alex.
> 
> http://www.solr-start.com/ - Resources for Solr users, new and experienced
>
>
> On 24 January 2017 at 07:43, Chris Rogers
>  wrote:
> > Hi all,
> >
> > Having frustrating issues with getting SOLR 6.4.0 to recognize the 
> existence of my DIH config. I’m using Oracle Java8 jdk on Ubuntu 14.04.
> >
> > The DIH .jar file appears to be loading correctly. There are no errors 
> in the SOLR logs. It just says “Sorry, no dataimport-handler defined” in the 
> SOLR admin UI.
> >
> > My config files are listed below. Can anyone spot any mistakes here?
> >
> > Many thanks,
> > Chris
> >
> > # solrconfig.xml ##
> >
> >regex=".*dataimporthandler-.*\.jar" />
> >
> > …
> >
> >class="org.apache.solr.handler.dataimport.DataImportHandler">
> > 
> >   DIH-data-config.xml
> > 
> >   
> >
> > # DIH-data-config.xml (in the same dir as solrconfig.xml) ##
> >
> > 
> >   
> >   
> > 
> >  > fileName=".*xml"
> > newerThan="'NOW-5YEARS'"
> > recursive="true"
> > rootEntity="false"
> > dataSource="null"
> > 
> baseDir="/home/bodl-tei-svc/sites/bodl-tei-svc/var/data/tolkein_tei">
> >
> >   
> >
> >>   forEach="/TEI" url="${f.fileAbsolutePath}" 
> transformer="RegexTransformer" >
> >  xpath="/TEI/teiHeader/fileDesc/titleStmt/title"/>
> >  xpath="/TEI/teiHeader/fileDesc/publicationStmt/publisher"/>
> >  xpath="/TEI/teiHeader/fileDesc/sourceDesc/msDesc/msIdentifier/altIdentifier/idno"/>
> >   
> >
> > 
> >
> >   
> > 
> >
> >
> > --
> > Chris Rogers
> > Digital Projects Manager
> > Bodleian Digital Library Systems and Services
> > chris.rog...@bodleian.ox.ac.uk
>
>


Problems with stored/not-stored field filter queries

2017-01-24 Thread Stanislav Sandalnikov
Hi everyone,

I’m facing strange Solr behavior, which is best described with examples:


With indexed but not stored IndexDate field:

1) With this query everything works fine, I’m getting the results back:
/select?fl=taskid,docid,score&q=*:*&fq=category:"Security")))+AND+(datasource:(sites)))&fq={!frange+l%3D0}query($q)&sort=IndexDate+desc&rows=100&start=0

2) With this query I get nothing:
/select?fl=taskid,docid,score&q=*:*&fq=category:"Security")))+AND+(datasource:(sites))+AND+(IndexDate:[NOW/DAY-27DAYS+TO+NOW/DAY%2B1DAY]))&fq={!frange+l%3D0}query($q)&sort=IndexDate+desc&rows=100&start=0

The documents’ IndexDate values fit the specified timeframe for sure.

3) With this query there is no category filter, but there is timeframe filter 
and everything works fine:
/select?fl=taskid,docid,score&q=*:*&fq=((datasource:(sites))+AND+(IndexDate:[NOW/DAY-27DAYS+TO+NOW/DAY%2B1DAY]))&fq={!frange+l%3D0}query($q)&sort=IndexDate+desc&rows=100&start=0

Then I decided that it might be related to IndexDate not being stored. I
reindexed the data with a stored IndexDate field, and now query number 2 works
just fine.

However, I don’t get the logic here: why doesn’t it work in this particular
case? Can someone explain?

Thank you in advance

P.S. category field is indexed but not stored in all cases.

Regards
Stanislav






Re: Problems with stored/not-stored field filter queries

2017-01-24 Thread Shawn Heisey
On 1/24/2017 8:29 AM, Stanislav Sandalnikov wrote:
> With indexed but not stored IndexDate field:
>
> 1) With this query everything works fine, I’m getting the results back:
> /select?fl=taskid,docid,score&q=*:*&fq=category:"Security")))+AND+(datasource:(sites)))&fq={!frange+l%3D0}query($q)&sort=IndexDate+desc&rows=100&start=0
>
> 2) With this query I get nothing:
> /select?fl=taskid,docid,score&q=*:*&fq=category:"Security")))+AND+(datasource:(sites))+AND+(IndexDate:[NOW/DAY-27DAYS+TO+NOW/DAY%2B1DAY]))&fq={!frange+l%3D0}query($q)&sort=IndexDate+desc&rows=100&start=0
>
> Document’s IndexDate fit specified timeframe for sure. 
>
> 3) With this query there is no category filter, but there is timeframe filter 
> and everything works fine:
> /select?fl=taskid,docid,score&q=*:*&fq=((datasource:(sites))+AND+(IndexDate:[NOW/DAY-27DAYS+TO+NOW/DAY%2B1DAY]))&fq={!frange+l%3D0}query($q)&sort=IndexDate+desc&rows=100&start=0

It sounds like there are no documents that fit all three conditions --
the correct datasource, the correct category, AND that date range.

> Then I decided that it might be related to IndexDate as it is not stored. I 
> reindexed data with stored IndexDate field and now query number 2 works just 
> fine.

Whether or not a field is stored does not affect searches, only which
fields are returned in the results.  My guess is that when you do this
reindex, there's some difference, so different information is being sent
to Solr and incorporated into the index.  There is also the possibility
that when you change the schema, the changes are not limited to the
"stored" parameter on one field.  If you are changing the "type"
parameter for the field, the effect may be larger than you realize.

When you find that the search doesn't work, check the schema browser for
all the fields used in your query, and load the top N terms for each one.
You may find that there aren't any terms to load for some reason, or
that the terms that get loaded are not compatible with the part of the
query which is being done for that field.  The term info that the schema
browser shows is not affected by whether or not the field is stored.
Also in the schema browser, check all the settings for those fields when
it's working compared to when it's not working.  There may be a
difference other than the "stored" checkbox.
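The indexed/stored split can be pictured with a toy model (purely illustrative, not Solr code):

```python
# Toy model of the split between "indexed" (searchable terms) and
# "stored" (retrievable values): searches consult only the inverted
# index; stored values only affect what a matching document returns.
index = {}   # term -> set of doc ids, built only for indexed fields
stored = {}  # doc id -> stored value, kept only for stored fields

def add(doc_id, value, indexed=True, is_stored=True):
    if indexed:
        for term in value.lower().split():
            index.setdefault(term, set()).add(doc_id)
    if is_stored:
        stored[doc_id] = value

def search(term):
    return sorted(index.get(term.lower(), set()))

add(1, "Hello World", indexed=True, is_stored=False)
print(search("hello"))  # [1] -- findable even though not stored
print(stored.get(1))    # None -- the value cannot be returned
```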

Here's a schema browser screenshot with terms loaded from a field that
would be compatible with your date range query:

https://www.dropbox.com/s/t5b4b6hrk0wz6oz/trie-date-schema-browser.png?dl=0

This screenshot comes from version 6.3.0 running in standalone mode.

If everything appears to be correct on your system when it's not working
compared to when it is working, then a bug is always a possibility, but
a bug like that would affect a LOT of people.  This list would have
heard from those people on the problem.  Such a bug should also get
caught by the numerous tests that are part of the Lucene/Solr source
code, tests that are frequently run by automated systems and Solr
developers working on the code.

Although your question did include more information than many do, it was
missing a number of important details.  Some of the more important
pieces are the version of Solr and concrete information from your
config/schema.

Thanks,
Shawn



Re: Problems with stored/not-stored field filter queries

2017-01-24 Thread Mikhail Khludnev
Hello Stanislav,
Stored fields have nothing to do with findability, I believe. Usually debugQuery
and explainOther are the right way to see what's going on there. What is $q?
How is it supposed to work?
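For example (an illustrative request, not from the thread; the id is hypothetical), adding debugQuery and explainOther to the failing query shows how each document scores against the filters:

```python
from urllib.parse import urlencode

# debugQuery explains the scoring of matching documents; explainOther
# explains why a specific (possibly non-matching) document was excluded.
params = {
    "q": "*:*",
    "fq": "IndexDate:[NOW/DAY-27DAYS TO NOW/DAY+1DAY]",
    "debugQuery": "true",
    "explainOther": "id:some_doc_id",  # hypothetical document id
}
print("/solr/mycollection/select?" + urlencode(params))
```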

On 24 Jan 2017 at 18:29, "Stanislav Sandalnikov" <s.sandalni...@gmail.com> wrote:

> Hi everyone,
>
> I’m facing strange Solr behavior, which could be better described in
> examples:
>
>
> With indexed but not stored IndexDate field:
>
> 1) With this query everything works fine, I’m getting the results back:
> /select?fl=taskid,docid,score&q=*:*&fq=category:"
> Security")))+AND+(datasource:(sites)))&fq={!frange+l%3D0}
> query($q)&sort=IndexDate+desc&rows=100&start=0
>
> 2) With this query I get nothing:
> /select?fl=taskid,docid,score&q=*:*&fq=category:"
> Security")))+AND+(datasource:(sites))+AND+(IndexDate:[NOW/
> DAY-27DAYS+TO+NOW/DAY%2B1DAY]))&fq={!frange+l%3D0}query($q)&
> sort=IndexDate+desc&rows=100&start=0
>
> Document’s IndexDate fit specified timeframe for sure.
>
> 3) With this query there is no category filter, but there is timeframe
> filter and everything works fine:
> /select?fl=taskid,docid,score&q=*:*&fq=((datasource:(sites))
> +AND+(IndexDate:[NOW/DAY-27DAYS+TO+NOW/DAY%2B1DAY]))&
> fq={!frange+l%3D0}query($q)&sort=IndexDate+desc&rows=100&start=0
>
> Then I decided that it might be related to IndexDate as it is not stored.
> I reindexed data with stored IndexDate field and now query number 2 works
> just fine.
>
> However, I don’t get the logic here, why it doesn’t work in some
> particular case? Can someone explain?
>
> Thank you in advance
>
> P.S. category field is indexed but not stored in all cases.
>
> Regards
> Stanislav
>
>
>
>
>


RE: Indexing nested documents giving back unrelated parents when asking for children

2017-01-24 Thread Mikhail Khludnev
Fabien,
Given that you have three levels, can you update the sample code accordingly?
I might have already replied to such a question earlier; IIRC the filter
should enumerate all the types besides the one in question.

On 24 Jan 2017 at 16:21, "Fabien Renaud" <fabien.ren...@findwise.com> wrote:

I know it works as expected when I set type_s:up as you describe. But I was
expecting no children at all in my query.

In my real query I have a document with several children and thus can't
specify a specific type with childFilter. And I can't give back all
children because some of them do not make any sense at all.
And the problem appears for an intermediate node (which has children and
which itself a child of another).

Fabien

-Original Message-
From: Mikhail Khludnev [mailto:m...@apache.org]
Sent: 24 January 2017 14:06
To: solr-user 
Subject: Re: Indexing nested documents giving back unrelated parents when
asking for children

Hello Fabien,

I believe parentFilter should be type_s:up, and consequently the type_s:up
should go in fq.

On Tue, Jan 24, 2017 at 3:30 PM, Fabien Renaud 
wrote:

> Hello,
>
> I'm wondering if I missed something in my code (which uses solrj 6.3):
>
> public class Main {
>
> private SolrClient client1;
>
> public void run() {
> client1 = new
> HttpSolrClient.Builder("http://localhost:8983/solr
> ").build();
>
> SolrInputDocument doc1 = new SolrInputDocument();
>
> doc1.addField("id", "1");
> doc1.addField("type_s", "up");
> SolrInputDocument doc2 = new SolrInputDocument();
>
> doc2.addField("id", "2");
> doc2.addField("type_s", "down");
>
> doc1.addChildDocument(doc2);
>
> SolrInputDocument doc4 = new SolrInputDocument();
> doc4.addField("id", "4");
> doc4.addField("type_s", "up");
>
> SolrInputDocument doc5 = new SolrInputDocument();
> doc5.addField("id", "5");
> doc5.addField("type_s", "down");
>
> doc4.addChildDocument(doc5);
>
> try {
> client1.add("techproducts", Arrays.asList(doc1,doc4));
> } catch (Exception e) {
> System.out.println("Indexing failed" + e);
> }
> }
>
> If I start Solr 6.3 using bin/start start -e techproduct and ask the
> following:
>
> http://localhost:8983/solr/techproducts/select?fl=*,[
> child%20parentFilter=type_s:down]&fq=type_s:down&indent=on&q=*:*&wt=js
> on
>
>
> then I get:
>
> {
>   "docs": [
> {
>   "id": "2",
>   "type_s": "down"
> },
> {
>   "id": "5",
>   "type_s": "down",
>   "_childDocuments_": [
> {
>   "id": "1",
>   "type_s": "up"
> }
>   ]
> }
>   ]
> }
>
> which seems to be a bug for me. Or did I miss something?
> Notice that the relations "2 is a child of 1" and "5 is a child of 4"
> are working fine. It's just that I get extra (unwanted and unrelated)
relations.
>
> Notice that at some point I manage to get back two documents with the
> __same__ id (with different version). I'm not able to reproduce this
> but I guess it could be related.
>
> Fabien
>
>


--
Sincerely yours
Mikhail Khludnev


Single call for distributed IDF?

2017-01-24 Thread Walter Underwood
I tried running with the LRUStatsCache for global IDF, but the performance 
penalty was pretty big. The 95th percentile response time went from 3.4 seconds 
to 13 seconds. Oops.

We should not need a separate call to get the tf and df stats. Those are 
already calculated when doing the first request. I worked on a search engine 
that did it that way twenty years ago.

In the past, there would have been an IP obstacle, but I think that is resolved.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)




RE: Indexing nested documents giving back unrelated parents when asking for children

2017-01-24 Thread Fabien Renaud
But the problem is already there with only two levels.

If I change the code to add the documents to Solr as follows:
   client1.add(doc1);
   client1.commit();
   client1.add(doc4);
   client1.commit();

then things work as expected, and I get the following result (as well as the
correct parent-child relations between 1 and 2, and between 4 and 5):
"docs": [
  {
"id": "2"
  },
  {
"id": "5"
  }
]

Fabien
-Original Message-
From: Mikhail Khludnev [mailto:gge...@gmail.com] 
Sent: 24 January 2017 19:02
To: solr-user 
Subject: RE: Indexing nested documents giving back unrelated parents when 
asking for children

Fabien,
Given that you have three levels, can you update the sample code accordingly? I 
might have already replied to such a question earlier; IIRC the filter should 
enumerate all types besides the certain one.

24 янв. 2017 г. 16:21 пользователь "Fabien Renaud" < 
fabien.ren...@findwise.com> написал:

I know it works as expected when I set type_s:up as you describe. But I was 
expecting no children at all in my query.

In my real query I have a document with several children and thus can't specify 
a specific type with childFilter. And I can't give back all children because 
some of them do not make any sense at all.
And the problem appears for an intermediate node (which has children and which 
is itself a child of another).

Fabien

-Original Message-
From: Mikhail Khludnev [mailto:m...@apache.org]
Sent: den 24 januari 2017 14:06
To: solr-user 
Subject: Re: Indexing nested documents giving back unrelated parents when 
asking for children

Hello Fabien,

I believe parentFilter should be type_s:up, and consequently the type_s:up 
should go in fq.

On Tue, Jan 24, 2017 at 3:30 PM, Fabien Renaud 
wrote:

> Hello,
>
> I'm wondering if I missed something in my code (which uses solrj 6.3):
>
> public class Main {
>
> private SolrClient client1;
>
> public void run() {
> client1 = new
> HttpSolrClient.Builder("http://localhost:8983/solr
> ").build();
>
> SolrInputDocument doc1 = new SolrInputDocument();
>
> doc1.addField("id", "1");
> doc1.addField("type_s", "up");
> SolrInputDocument doc2 = new SolrInputDocument();
>
> doc2.addField("id", "2");
> doc2.addField("type_s", "down");
>
> doc1.addChildDocument(doc2);
>
> SolrInputDocument doc4 = new SolrInputDocument();
> doc4.addField("id", "4");
> doc4.addField("type_s", "up");
>
> SolrInputDocument doc5 = new SolrInputDocument();
> doc5.addField("id", "5");
> doc5.addField("type_s", "down");
>
> doc4.addChildDocument(doc5);
>
> try {
> client1.add("techproducts", Arrays.asList(doc1,doc4));
> } catch (Exception e) {
> System.out.println("Indexing failed" + e);
> }
> }
>
> If I start Solr 6.3 using bin/solr start -e techproducts and ask the
> following:
>
> http://localhost:8983/solr/techproducts/select?fl=*,[
> child%20parentFilter=type_s:down]&fq=type_s:down&indent=on&q=*:*&wt=js
> on
>
>
> then I get:
>
> {
>   "docs": [
> {
>   "id": "2",
>   "type_s": "down"
> },
> {
>   "id": "5",
>   "type_s": "down",
>   "_childDocuments_": [
> {
>   "id": "1",
>   "type_s": "up"
> }
>   ]
> }
>   ]
> }
>
> which seems like a bug to me. Or did I miss something?
> Notice that the relations "2 is a child of 1" and "5 is a child of 4"
> are working fine. It's just that I get extra (unwanted and unrelated)
relations.
>
> Notice that at some point I manage to get back two documents with the 
> __same__ id (with different version). I'm not able to reproduce this 
> but I guess it could be related.
>
> Fabien
>
>


--
Sincerely yours
Mikhail Khludnev
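A sketch of the fix Mikhail suggests above (parentFilter on the parent type, with the parent type in fq), applied to the example query from this thread; this is untested, so treat it as a sketch:

```
http://localhost:8983/solr/techproducts/select?q=*:*&fq=type_s:up&fl=*,[child parentFilter=type_s:up]&indent=on&wt=json
```

This should return the parents 1 and 4, each with its own child attached under _childDocuments_.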


Re: Single call for distributed IDF?

2017-01-24 Thread Joel Bernstein
This may help out:
https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/ScoreNodesStream.java#L208

This points to some code that calculates global idf for a list of terms.
Not sure if this matches your use case. It seems to be very fast.

Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Jan 24, 2017 at 1:09 PM, Walter Underwood 
wrote:

> I tried running with the LRUStatsCache for global IDF, but the performance
> penalty was pretty big. The 95th percentile response time went from 3.4
> seconds to 13 seconds. Oops.
>
> We should not need a separate call to get the tf and df stats. Those are
> already calculated when doing the first request. I worked on a search
> engine that did it that way twenty years ago.
>
> In the past, there would have been an IP obstacle, but I think that is
> resolved.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
>


Re: Single call for distributed IDF?

2017-01-24 Thread Walter Underwood
I know how to do it. You return df for each term and num_docs, then recalculate 
idf. I wrote up how we did it in Ultraseek XPA about ten years ago, though with 
MonkeyRank instead of global IDF.

https://observer.wunderwood.org/2007/04/04/progressive-reranking/ 


I was wondering why Solr makes a separate request to each shard for that 
information instead of piggybacking it on the original request.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)
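The recalculation Walter describes (each shard returns per-term df plus its num_docs; the aggregator merges them and recomputes idf once, globally) can be sketched in plain Java. This is a minimal sketch with made-up shard numbers; the idf formula below is Lucene's classic 1 + ln((N + 1) / (df + 1)), which may differ from whatever similarity you actually run:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GlobalIdf {
    // Lucene's classic idf: 1 + ln((numDocs + 1) / (docFreq + 1))
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log((numDocs + 1.0) / (docFreq + 1.0));
    }

    public static void main(String[] args) {
        // Hypothetical per-shard stats: term -> docFreq, plus each shard's numDocs.
        List<Map<String, Long>> shardDf = List.of(
                Map.of("solr", 120L, "search", 400L),
                Map.of("solr", 80L, "search", 300L));
        long[] shardNumDocs = {10_000L, 8_000L};

        // Merge: sum df per term and sum numDocs across shards.
        Map<String, Long> globalDf = new HashMap<>();
        long globalNumDocs = 0;
        for (int i = 0; i < shardDf.size(); i++) {
            globalNumDocs += shardNumDocs[i];
            shardDf.get(i).forEach((term, df) -> globalDf.merge(term, df, Long::sum));
        }

        // Recompute idf once with the global stats.
        for (Map.Entry<String, Long> e : globalDf.entrySet()) {
            System.out.printf("%s: df=%d idf=%.4f%n",
                    e.getKey(), e.getValue(), idf(e.getValue(), globalNumDocs));
        }
    }
}
```

The point of the piggybacking idea is that each shard already knows its df numbers while executing the first request, so in principle no second round trip is needed before merging.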


> On Jan 24, 2017, at 10:34 AM, Joel Bernstein  wrote:
> 
> This may help out:
> https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/ScoreNodesStream.java#L208
> 
> This points to some code that calculates global idf for a list of terms.
> Not sure if this matches you use case. It seems to be very fast.
> 
> Joel Bernstein
> http://joelsolr.blogspot.com/
> 
> On Tue, Jan 24, 2017 at 1:09 PM, Walter Underwood 
> wrote:
> 
>> I tried running with the LRUStatsCache for global IDF, but the performance
>> penalty was pretty big. The 95th percentile response time went from 3.4
>> seconds to 13 seconds. Oops.
>> 
>> We should not need a separate call to get the tf and df stats. Those are
>> already calculated when doing the first request. I worked on a search
>> engine that did it that way twenty years ago.
>> 
>> In the past, there would have been an IP obstacle, but I think that is
>> resolved.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>> 
>> 



Re: Single call for distributed IDF?

2017-01-24 Thread Joel Bernstein
Ah, I thought you were just interested in a fast way to get at IDF. This
approach does take a callback but it's really fast.

Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Jan 24, 2017 at 1:39 PM, Walter Underwood 
wrote:

> I know how to do it. You return df for each term and num_docs then
> recalculate idf. I wrote up how we did it in Ultraseek XPA about ten years
> ago, though with MonkeyRank instead of global IDF.
>
> https://observer.wunderwood.org/2007/04/04/progressive-reranking/ <
> https://observer.wunderwood.org/2007/04/04/progressive-reranking/>
>
> I was wondering why Solr makes a separate request to each shard for that
> information instead of piggybacking it on the original request.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
> > On Jan 24, 2017, at 10:34 AM, Joel Bernstein  wrote:
> >
> > This may help out:
> > https://github.com/apache/lucene-solr/blob/master/solr/
> solrj/src/java/org/apache/solr/client/solrj/io/stream/
> ScoreNodesStream.java#L208
> >
> > This points to some code that calculates global idf for a list of terms.
> > Not sure if this matches you use case. It seems to be very fast.
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Tue, Jan 24, 2017 at 1:09 PM, Walter Underwood  >
> > wrote:
> >
> >> I tried running with the LRUStatsCache for global IDF, but the
> performance
> >> penalty was pretty big. The 95th percentile response time went from 3.4
> >> seconds to 13 seconds. Oops.
> >>
> >> We should not need a separate call to get the tf and df stats. Those are
> >> already calculated when doing the first request. I worked on a search
> >> engine that did it that way twenty years ago.
> >>
> >> In the past, there would have been an IP obstacle, but I think that is
> >> resolved.
> >>
> >> wunder
> >> Walter Underwood
> >> wun...@wunderwood.org
> >> http://observer.wunderwood.org/  (my blog)
> >>
> >>
> >>
>
>


Re: Single call for distributed IDF?

2017-01-24 Thread Walter Underwood
Specifically, I’m talking about this:

http://observer.wunderwood.org/  (my blog)


> On Jan 24, 2017, at 10:43 AM, Joel Bernstein  wrote:
> 
> Ah, I thought you were just interested in a fast way to get at IDF. This
> approach does take a callback but it's really fast.
> 
> Joel Bernstein
> http://joelsolr.blogspot.com/
> 
> On Tue, Jan 24, 2017 at 1:39 PM, Walter Underwood 
> wrote:
> 
>> I know how to do it. You return df for each term and num_docs then
>> recalculate idf. I wrote up how we did it in Ultraseek XPA about ten years
>> ago, though with MonkeyRank instead of global IDF.
>> 
>> https://observer.wunderwood.org/2007/04/04/progressive-reranking/ <
>> https://observer.wunderwood.org/2007/04/04/progressive-reranking/>
>> 
>> I was wondering why Solr makes a separate request to each shard for that
>> information instead of piggybacking it on the original request.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>> 
>>> On Jan 24, 2017, at 10:34 AM, Joel Bernstein  wrote:
>>> 
>>> This may help out:
>>> https://github.com/apache/lucene-solr/blob/master/solr/
>> solrj/src/java/org/apache/solr/client/solrj/io/stream/
>> ScoreNodesStream.java#L208
>>> 
>>> This points to some code that calculates global idf for a list of terms.
>>> Not sure if this matches you use case. It seems to be very fast.
>>> 
>>> Joel Bernstein
>>> http://joelsolr.blogspot.com/
>>> 
>>> On Tue, Jan 24, 2017 at 1:09 PM, Walter Underwood >> 
>>> wrote:
>>> 
 I tried running with the LRUStatsCache for global IDF, but the
>> performance
 penalty was pretty big. The 95th percentile response time went from 3.4
 seconds to 13 seconds. Oops.
 
 We should not need a separate call to get the tf and df stats. Those are
 already calculated when doing the first request. I worked on a search
 engine that did it that way twenty years ago.
 
 In the past, there would have been an IP obstacle, but I think that is
 resolved.
 
 wunder
 Walter Underwood
 wun...@wunderwood.org
 http://observer.wunderwood.org/  (my blog)
 
 
 
>> 
>> 



Re: Feedback on Match Query Parser (for fixing multiterm synonyms and other things)

2017-01-24 Thread Doug Turnbull
Just throwing this back out there as a bit more official. Finally got
around to documenting how I use it. You can also download the plugin jar
from github

http://opensourceconnections.com/blog/2017/01/23/our-solution-to-solr-multiterm-synonyms/
https://github.com/o19s/match-query-parser

Enjoy! GH Issues, Feedback, and PRs welcome

-Doug

On Mon, Sep 5, 2016 at 4:32 AM Alexandre Rafalovitch 
wrote:

> Looks interesting.
>
> I especially like "we analyze it" and then "we analyze/space-split it
> again" as the last tutorial example.
>
> Regards,
>Alex.
> P.s. Cool enough for http://solr.cool/ ?
> 
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
>
>
> On 2 September 2016 at 07:45, Doug Turnbull
>  wrote:
> > I wanted to solicit feedback on my query parser, the match query parser (
> > https://github.com/o19s/match-query-parser). It's a work in progress, so
> > any thoughts from the community would be welcome.
> >
> > The point of this query parser is that it's not a query parser!
> >
> > Instead, it's a way of selecting any analyzer to apply to the query
> string. I
> > use it for all kinds of things, finely controlling a bigram phrase
> search,
> > searching with stemmed vs exact variants of the query.
> >
> > But its biggest value to me is as a fix for multiterm synonyms. Because
> > I'm not giving the user's query to any underlying query parser -- I'm
> > always just doing analysis. So I know my selected analyzer will not be
> > disrupted by whitespace-based query parsing prior to query analysis.
> >
> > Those of you also in the Elasticsearch community may be familiar with the
> > match query (
> >
> https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query.html
> > ). This is similar, except it also lets you select whether to turn the
> > resulting tokens into a term query body:(sea\ biscuit likes to fish) or a
> > phrase query body:"sea biscuit" likes to fish. See the examples above for
> > more.
> >
> > It's also similar to Solr's field query parser. However the field query
> > parser tries to turn the fully analyzed token stream into a phrase query.
> > Moreover, the field query parser can only select the field's own
> > query-time analyzer, while the match query parser lets you select an
> > arbitrary analyzer. So match has more bells and whistles and acts as a
> > complement to the field qp.
> >
> > Thanks for any thoughts, feedback, or critiques
> >
> > Best,
> > -Doug
>


Re: Single call for distributed IDF?

2017-01-24 Thread Joel Bernstein
Ok, my mistake, I was thinking you were writing your own component and
needed a fast way to get global IDF. It sounds like you're looking for fast
global IDF during scoring. That seems like a reasonable thing to want.

In the piggybacking approach you mention, does the aggregator node parse
the query and fetch the IDF, then pass it along to the shards?



Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Jan 24, 2017 at 2:01 PM, Walter Underwood 
wrote:

> Specifically, I’m talking about this:
>
> http://observer.wunderwood.org/  (my blog)
>
>
> > On Jan 24, 2017, at 10:43 AM, Joel Bernstein  wrote:
> >
> > Ah, I thought you were just interested in a fast way to get at IDF. This
> > approach does take a callback but it's really fast.
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Tue, Jan 24, 2017 at 1:39 PM, Walter Underwood  >
> > wrote:
> >
> >> I know how to do it. You return df for each term and num_docs then
> >> recalculate idf. I wrote up how we did it in Ultraseek XPA about ten
> years
> >> ago, though with MonkeyRank instead of global IDF.
> >>
> >> https://observer.wunderwood.org/2007/04/04/progressive-reranking/ <
> >> https://observer.wunderwood.org/2007/04/04/progressive-reranking/>
> >>
> >> I was wondering why Solr makes a separate request to each shard for that
> >> information instead of piggybacking it on the original request.
> >>
> >> wunder
> >> Walter Underwood
> >> wun...@wunderwood.org
> >> http://observer.wunderwood.org/  (my blog)
> >>
> >>
> >>> On Jan 24, 2017, at 10:34 AM, Joel Bernstein 
> wrote:
> >>>
> >>> This may help out:
> >>> https://github.com/apache/lucene-solr/blob/master/solr/
> >> solrj/src/java/org/apache/solr/client/solrj/io/stream/
> >> ScoreNodesStream.java#L208
> >>>
> >>> This points to some code that calculates global idf for a list of
> terms.
> >>> Not sure if this matches you use case. It seems to be very fast.
> >>>
> >>> Joel Bernstein
> >>> http://joelsolr.blogspot.com/
> >>>
> >>> On Tue, Jan 24, 2017 at 1:09 PM, Walter Underwood <
> wun...@wunderwood.org
> >>>
> >>> wrote:
> >>>
>  I tried running with the LRUStatsCache for global IDF, but the
> >> performance
>  penalty was pretty big. The 95th percentile response time went from
> 3.4
>  seconds to 13 seconds. Oops.
> 
>  We should not need a separate call to get the tf and df stats. Those
> are
>  already calculated when doing the first request. I worked on a search
>  engine that did it that way twenty years ago.
> 
>  In the past, there would have been an IP obstacle, but I think that is
>  resolved.
> 
>  wunder
>  Walter Underwood
>  wun...@wunderwood.org
>  http://observer.wunderwood.org/  (my blog)
> 
> 
> 
> >>
> >>
>
>


Re: Single call for distributed IDF?

2017-01-24 Thread Joel Bernstein
Reading your blogs now.

Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Jan 24, 2017 at 3:28 PM, Joel Bernstein  wrote:

> Ok my mistake, I was thinking you were writing your own component and
> needed a fast way to get global IDF. You're looking for fast global IDF
> during the scoring it sounds like. That seems like a reasonable thing to
> want.
>
> In the piggy backing approach you mention does the aggregator node parse
> the query and fetch the IDF, then pass it along to the shards?
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Tue, Jan 24, 2017 at 2:01 PM, Walter Underwood 
> wrote:
>
>> Specifically, I’m talking about this:
>>
>> http://observer.wunderwood.org/  (my blog)
>>
>>
>> > On Jan 24, 2017, at 10:43 AM, Joel Bernstein 
>> wrote:
>> >
>> > Ah, I thought you were just interested in a fast way to get at IDF. This
>> > approach does take a callback but it's really fast.
>> >
>> > Joel Bernstein
>> > http://joelsolr.blogspot.com/
>> >
>> > On Tue, Jan 24, 2017 at 1:39 PM, Walter Underwood <
>> wun...@wunderwood.org>
>> > wrote:
>> >
>> >> I know how to do it. You return df for each term and num_docs then
>> >> recalculate idf. I wrote up how we did it in Ultraseek XPA about ten
>> years
>> >> ago, though with MonkeyRank instead of global IDF.
>> >>
>> >> https://observer.wunderwood.org/2007/04/04/progressive-reranking/ <
>> >> https://observer.wunderwood.org/2007/04/04/progressive-reranking/>
>> >>
>> >> I was wondering why Solr makes a separate request to each shard for
>> that
>> >> information instead of piggybacking it on the original request.
>> >>
>> >> wunder
>> >> Walter Underwood
>> >> wun...@wunderwood.org
>> >> http://observer.wunderwood.org/  (my blog)
>> >>
>> >>
>> >>> On Jan 24, 2017, at 10:34 AM, Joel Bernstein 
>> wrote:
>> >>>
>> >>> This may help out:
>> >>> https://github.com/apache/lucene-solr/blob/master/solr/
>> >> solrj/src/java/org/apache/solr/client/solrj/io/stream/
>> >> ScoreNodesStream.java#L208
>> >>>
>> >>> This points to some code that calculates global idf for a list of
>> terms.
>> >>> Not sure if this matches you use case. It seems to be very fast.
>> >>>
>> >>> Joel Bernstein
>> >>> http://joelsolr.blogspot.com/
>> >>>
>> >>> On Tue, Jan 24, 2017 at 1:09 PM, Walter Underwood <
>> wun...@wunderwood.org
>> >>>
>> >>> wrote:
>> >>>
>>  I tried running with the LRUStatsCache for global IDF, but the
>> >> performance
>>  penalty was pretty big. The 95th percentile response time went from
>> 3.4
>>  seconds to 13 seconds. Oops.
>> 
>>  We should not need a separate call to get the tf and df stats. Those
>> are
>>  already calculated when doing the first request. I worked on a search
>>  engine that did it that way twenty years ago.
>> 
>>  In the past, there would have been an IP obstacle, but I think that
>> is
>>  resolved.
>> 
>>  wunder
>>  Walter Underwood
>>  wun...@wunderwood.org
>>  http://observer.wunderwood.org/  (my blog)
>> 
>> 
>> 
>> >>
>> >>
>>
>>
>


Re: Multivalued Fields queries for Occurences.

2017-01-24 Thread slee
Anyone?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multivalued-Fields-queries-for-Occurences-tp4315482p4315580.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Multivalued Fields queries for Occurences.

2017-01-24 Thread Erick Erickson
You might be able to do something with termfreq here:
https://cwiki.apache.org/confluence/display/solr/Function+Queries as
well as some of the conditionals.

You need to be sure that those functions are in your version of Solr
of course...

Best,
Erick

On Tue, Jan 24, 2017 at 12:42 PM, slee  wrote:
> Anyone?
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Multivalued-Fields-queries-for-Occurences-tp4315482p4315580.html
> Sent from the Solr - User mailing list archive at Nabble.com.
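Erick's termfreq suggestion might look like this as a query; the field name and term here are hypothetical, not from the original question:

```
/select?q={!frange l=2}termfreq(tags,'red')&fl=id,occurrences:termfreq(tags,'red')
```

The frange keeps only documents where 'red' occurs at least twice across the values of the multivalued tags field, and the pseudo-field returns the per-document count.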


Re: Problems with stored/not-stored field filter queries

2017-01-24 Thread Stanislav Sandalnikov
Hi Shawn, 

Thanks a lot for your valuable input. Of course you were right: the data was 
changed after the reindex step. I completely forgot that categories are done by a 
separate application, and this application was pushing an empty IndexDate field 
after the update, because it couldn’t extract a value from the field since it was 
not stored. By the way, is there any way to see if there is an index for some 
particular field of a document?

Stanislav

> 24 янв. 2017 г., в 23:43, Shawn Heisey  написал(а):
> 
> On 1/24/2017 8:29 AM, Stanislav Sandalnikov wrote:
>> With indexed but not stored IndexDate field:
>> 
>> 1) With this query everything works fine, I’m getting the results back:
>> /select?fl=taskid,docid,score&q=*:*&fq=category:"Security")))+AND+(datasource:(sites)))&fq={!frange+l%3D0}query($q)&sort=IndexDate+desc&rows=100&start=0
>> 
>> 2) With this query I get nothing:
>> /select?fl=taskid,docid,score&q=*:*&fq=category:"Security")))+AND+(datasource:(sites))+AND+(IndexDate:[NOW/DAY-27DAYS+TO+NOW/DAY%2B1DAY]))&fq={!frange+l%3D0}query($q)&sort=IndexDate+desc&rows=100&start=0
>> 
>> Document’s IndexDate fit specified timeframe for sure. 
>> 
>> 3) With this query there is no category filter, but there is timeframe 
>> filter and everything works fine:
>> /select?fl=taskid,docid,score&q=*:*&fq=((datasource:(sites))+AND+(IndexDate:[NOW/DAY-27DAYS+TO+NOW/DAY%2B1DAY]))&fq={!frange+l%3D0}query($q)&sort=IndexDate+desc&rows=100&start=0
> 
> It sounds like there are no documents that fit all three conditions --
> the correct datasource, the correct category, AND that date range.
> 
>> Then I decided that it might be related to IndexDate as it is not stored. I 
>> reindexed data with stored IndexDate field and now query number 2 works just 
>> fine.
> 
> Whether or not a field is stored does not affect searches, only which
> fields are returned in the results.  My guess is that when you do this
> reindex, there's some difference, so different information is being sent
> to Solr and incorporated into the index.  There is also the possibility
> that when you change the schema, that the changes are not limited to the
> "stored" parameter on one field.  If you are changing the "type"
> parameter for the field, the effect may be larger than you realize.
> 
> When you find that the search doesn't work, check the schema browser for
> all the fields used in your query, and load the top N terms on each one. 
> You may find that there aren't any terms to load for some reason, or
> that the terms that get loaded are not compatible with the part of the
> query which is being done for that field.  The term info that the schema
> browser shows is not affected by whether or not the field is stored. 
> Also in the schema browser, check all the settings for those fields when
> it's working compared to when it's not working.  There may be a
> difference other than the "stored" checkbox.
> 
> Here's a schema browser screenshot with terms loaded from a field that
> would be compatible with your date range query:
> 
> https://www.dropbox.com/s/t5b4b6hrk0wz6oz/trie-date-schema-browser.png?dl=0
> 
> This screenshot comes from version 6.3.0 running in standalone mode.
> 
> If everything appears to be correct on your system when it's not working
> compared to when it is working, then a bug is always a possibility, but
> a bug like that would affect a LOT of people.  This list would have
> heard from those people on the problem.  Such a bug should also get
> caught by the numerous tests that are part of the Lucene/Solr source
> code, tests that are frequently run by automated systems and Solr
> developers working on the code.
> 
> Although your question did include more information than many do, it was
> missing a number of important details.  Some of the more important
> pieces are the version of Solr and concrete information from your
> config/schema.
> 
> Thanks,
> Shawn
> 



Re: Problems with stored/not-stored field filter queries

2017-01-24 Thread Stanislav Sandalnikov
Thanks Mikhail, didn’t know about debugQuery and explainOther, could be useful.

Regarding $q, you can find this information here - 
https://cwiki.apache.org/confluence/display/solr/Function+Queries#FunctionQueries-AvailableFunctions
 

 scroll down to «query» function

Stanislav

> 25 янв. 2017 г., в 0:56, Mikhail Khludnev  написал(а):
> 
> Hello Stanislav,
> Stored fields have nothing to do with findability, I believe. Usually debugQuery
> and explainOther is a right way to get what's going on there. What is $q ?
> How it's supposed to work?
> 
> 24 янв. 2017 г. 18:29 пользователь "Stanislav Sandalnikov" <
> s.sandalni...@gmail.com> написал:
> 
>> Hi everyone,
>> 
>> I’m facing strange Solr behavior, which could be better described in
>> examples:
>> 
>> 
>> With indexed but not stored IndexDate field:
>> 
>> 1) With this query everything works fine, I’m getting the results back:
>> /select?fl=taskid,docid,score&q=*:*&fq=category:"
>> Security")))+AND+(datasource:(sites)))&fq={!frange+l%3D0}
>> query($q)&sort=IndexDate+desc&rows=100&start=0
>> 
>> 2) With this query I get nothing:
>> /select?fl=taskid,docid,score&q=*:*&fq=category:"
>> Security")))+AND+(datasource:(sites))+AND+(IndexDate:[NOW/
>> DAY-27DAYS+TO+NOW/DAY%2B1DAY]))&fq={!frange+l%3D0}query($q)&
>> sort=IndexDate+desc&rows=100&start=0
>> 
>> Document’s IndexDate fit specified timeframe for sure.
>> 
>> 3) With this query there is no category filter, but there is timeframe
>> filter and everything works fine:
>> /select?fl=taskid,docid,score&q=*:*&fq=((datasource:(sites))
>> +AND+(IndexDate:[NOW/DAY-27DAYS+TO+NOW/DAY%2B1DAY]))&
>> fq={!frange+l%3D0}query($q)&sort=IndexDate+desc&rows=100&start=0
>> 
>> Then I decided that it might be related to IndexDate as it is not stored.
>> I reindexed data with stored IndexDate field and now query number 2 works
>> just fine.
>> 
>> However, I don’t get the logic here, why it doesn’t work in some
>> particular case? Can someone explain?
>> 
>> Thank you in advance
>> 
>> P.S. category field is indexed but not stored in all cases.
>> 
>> Regards
>> Stanislav
>> 
>> 
>> 
>> 
>> 



Re: Latest advice on G1 collector?

2017-01-24 Thread Shawn Heisey
On 1/23/2017 1:00 PM, Walter Underwood wrote:
> We have a workload with very long queries, and that can drive the CMS 
> collector into using about 20% of the CPU time. So I’m ready to try G1 on a 
> couple of replicas and see what happens. I’ve already upgraded to Java 8 
> update 121.
>
> I’ve read these pages:
>
> https://wiki.apache.org/solr/ShawnHeisey#G1_.28Garbage_First.29_Collector 
> 
> https://gist.github.com/rockagen/e6d28244e1d540c05144370d6a64ba66 
> 

I would really like a concrete reason as to why the Lucene wiki
recommends NEVER using G1.

https://wiki.apache.org/lucene-java/JavaBugs#Oracle_Java_.2F_Sun_Java_.2F_OpenJDK_Bugs

If that recommendation can be backed up with demonstrable problems, then
it would make sense.  I took a look through some of the email history
sent by Jenkins, which runs automated testing of Lucene and Solr using
various configurations and Java versions.  Problems that were detected
on tests run with the G1 collector *also* happen on test runs using
other collectors.  The number of new messages from tests using G1 are a
very minor portion of the total number of new messages.  If G1 were a
root cause of big problems, I would expect the number of new failures
using G1 to be somewhere near half of the total, possibly more.

As many of you know, Solr's essential functionality comes from Lucene,
so this does matter for Solr.

I myself have never had an issue running Solr with the G1 collector.  I
haven't found any open and substantiated bugs on Lucene or Solr that
document real problems with G1 on a 64-bit JVM.  There is one bug that
happens on a 32-bit JVM ... but most users are NOT limited to 32-bit. 
For those that are limited that way, CMS is probably plenty fast because
the heap can't go beyond 2GB.

For my production and dev systems, the 4.x versions are running the G1
collector.  Most of the 5.x and later installs are using the GC tuning
that Solr contains by default, which is CMS.

Thanks,
Shawn
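For anyone who wants to experiment the way Walter describes, G1 can be switched on through GC_TUNE in solr.in.sh. The flags below are only an illustrative starting point, not a recommendation; tune them against your own heap and workload:

```
# solr.in.sh -- illustrative G1 settings; values are guesses to be measured, not gospel
GC_TUNE="-XX:+UseG1GC \
  -XX:+ParallelRefProcEnabled \
  -XX:G1HeapRegionSize=8m \
  -XX:MaxGCPauseMillis=250"
```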



Re: Problems with stored/not-stored field filter queries

2017-01-24 Thread Erick Erickson
bq: By the way is there any way to see if there is a index for some
particular field of a document?

Not really, not conveniently. To know that, you have to unwind the inverted
index. The "luke" program can do this. Of course if the field is
_stored_ it's easy: just return q=id:doc_id&fl=*


Also note that in 6x, the above will work for returning the value from
a docValues field even if the field isn't stored:
https://cwiki.apache.org/confluence/display/solr/DocValues. Lots of
background here: https://issues.apache.org/jira/browse/SOLR-8220.

Best,
Erick

On Tue, Jan 24, 2017 at 5:19 PM, Stanislav Sandalnikov
 wrote:
> Thanks Mikhail, didn’t know about debugQuery and explainOther, could be 
> useful.
>
> Regarding $q, you can find this information here - 
> https://cwiki.apache.org/confluence/display/solr/Function+Queries#FunctionQueries-AvailableFunctions
>  
> 
>  scroll down to «query» function
>
> Stanislav
>
>> 25 янв. 2017 г., в 0:56, Mikhail Khludnev  написал(а):
>>
>> Hello Stanislav,
>> Stored fields have nothing to do with findability, I believe. Usually debugQuery
>> and explainOther is a right way to get what's going on there. What is $q ?
>> How it's supposed to work?
>>
>> 24 янв. 2017 г. 18:29 пользователь "Stanislav Sandalnikov" <
>> s.sandalni...@gmail.com> написал:
>>
>>> Hi everyone,
>>>
>>> I’m facing strange Solr behavior, which could be better described in
>>> examples:
>>>
>>>
>>> With indexed but not stored IndexDate field:
>>>
>>> 1) With this query everything works fine, I’m getting the results back:
>>> /select?fl=taskid,docid,score&q=*:*&fq=category:"
>>> Security")))+AND+(datasource:(sites)))&fq={!frange+l%3D0}
>>> query($q)&sort=IndexDate+desc&rows=100&start=0
>>>
>>> 2) With this query I get nothing:
>>> /select?fl=taskid,docid,score&q=*:*&fq=category:"
>>> Security")))+AND+(datasource:(sites))+AND+(IndexDate:[NOW/
>>> DAY-27DAYS+TO+NOW/DAY%2B1DAY]))&fq={!frange+l%3D0}query($q)&
>>> sort=IndexDate+desc&rows=100&start=0
>>>
>>> Document’s IndexDate fit specified timeframe for sure.
>>>
>>> 3) With this query there is no category filter, but there is timeframe
>>> filter and everything works fine:
>>> /select?fl=taskid,docid,score&q=*:*&fq=((datasource:(sites))
>>> +AND+(IndexDate:[NOW/DAY-27DAYS+TO+NOW/DAY%2B1DAY]))&
>>> fq={!frange+l%3D0}query($q)&sort=IndexDate+desc&rows=100&start=0
>>>
>>> Then I decided that it might be related to IndexDate as it is not stored.
>>> I reindexed data with stored IndexDate field and now query number 2 works
>>> just fine.
>>>
>>> However, I don’t get the logic here, why it doesn’t work in some
>>> particular case? Can someone explain?
>>>
>>> Thank you in advance
>>>
>>> P.S. category field is indexed but not stored in all cases.
>>>
>>> Regards
>>> Stanislav
>>>
>>>
>>>
>>>
>>>
>


Re: Problems with stored/not-stored field filter queries

2017-01-24 Thread Shawn Heisey
On 1/24/2017 6:15 PM, Stanislav Sandalnikov wrote:
> Thanks a lot for your valuable input. Of course you were right, the
> data was changed after reindex step, I completely forgot that
> categories are done by separate application and this application was
> pushing empty IndexDate field after update, because it couldn’t
> extract a value from it since the field was not stored. By the way is
> there any way to see if there is a index for some particular field of
> a document?

As long as the field doesn't have docValues enabled, the facet feature
can be used to drill down into what terms are in the index for that
field in a document.  If docValues are enabled, then you would only see
the originally indexed input, not the analyzed terms.  If your field
meets this requirement, do a facet on the field, and use the q/fq
parameters to search for a single document, probably by ID.

There is also a separate program called Luke that can inspect Lucene
indexes down to the individual document level.

Thanks,
Shawn
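The facet drill-down Shawn describes might look like this (the uniqueKey value is hypothetical):

```
/select?q=id:SOME_DOC_ID&rows=0&facet=true&facet.field=IndexDate&facet.mincount=1
```

With rows=0 no documents come back, just the facet counts, which reflect what is actually indexed for the single matched document.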



Solr Search Handler Suggestion

2017-01-24 Thread Moenieb Davids
Hi Guys,

Just an Idea for easier config of search handlers:

Would it be feasible to configure a search handler that has its own schema based 
on the current core, as well as inserting nested objects from cross-core queries?

Example (for illustration purposes only; ignore the syntax :) ):

<searchHandler name="custom">
  <field name="id" />
  <field name="description" />
  <nested name="likedItems" core="http://localhost:8983/solr/items" query="user_liking_this:${thiscore.id}">
    <field name="itemname" />
  </nested>
  <nested name="likedUsers" core="http://localhost:8983/solr/items" query="user_liking_this:${thiscore.id}">
    <field name="username" />
  </nested>
</searchHandler>

This would allow you to create endpoints that interact with other cores and return 
their fields and values, and it seems like it could be easier to manage.










===
GPAA e-mail Disclaimers and confidential note 

This e-mail is intended for the exclusive use of the addressee only.
If you are not the intended recipient, you should not use the contents 
or disclose them to any other person. Please notify the sender immediately 
and delete the e-mail. This e-mail is not intended nor 
shall it be taken to create any legal relations, contractual or otherwise. 
Legally binding obligations can only arise for the GPAA by means of 
a written instrument signed by an authorised signatory.
===