Search over XML data using xpath

2016-04-01 Thread Miguel Valencia Zurera

Hi everybody,

I'm looking for a way to store an XML file and keep the hierarchy of the 
data, because I need to show the full XML and also search inside the 
nodes of the XML.
The only thing I have found is XPathEntityProcessor for importing XML, 
but it does not keep the hierarchy of the data.


I have not found a field type that allows storing XML and running 
XPath against it. So I have thought of parsing all the fields of the XML 
file and additionally adding a new field with the full XML.


Is there another option?
Thanks


Re: Deleted documents and expungeDeletes

2016-04-01 Thread Jostein Elvaker Haande
On 30 March 2016 at 17:46, Erick Erickson  wrote:
> through a clever bit of reflection, you can set the
> reclaimDeletesWeight variable from solrconfig by including something
> like
> 5 (going from memory
> here, you'll get an error on startup if I've messed it up.)

I added the following to my solrconfig a couple of days ago:


  8
  8
  5.0


There have been several commits and the core is current according to
the Solr admin UI, however I'm still seeing a lot of deleted docs. These
are my current core statistics.

Last Modified: 4 minutes ago
Num Docs: 1 675 255
Max Doc: 2 353 476
Heap Memory Usage: 208 464 267
Deleted Docs: 678 221
Version: 1 870 539
Segment Count: 39

Index size is close to 149GB.

So at the moment, the ratio of deleted docs to max docs is about
28.8%. With 'reclaimDeletesWeight' set to 5, it doesn't seem to be
clearing away any deleted docs.
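
For reference, the percentage can be reproduced from the core statistics listed in this message. A minimal Python sketch, using only those numbers (the variable names are just for illustration):

```python
# Core statistics copied from the message.
num_docs = 1_675_255
max_doc = 2_353_476
deleted_docs = 678_221

# Max Doc counts live plus deleted documents, so the two should add up.
assert num_docs + deleted_docs == max_doc

# Share of the index that is taken up by deleted documents.
deleted_pct = 100.0 * deleted_docs / max_doc
print(f"{deleted_pct:.1f}% of maxDoc is deleted")
```

The same ratio is a convenient figure to track over time when judging whether a merge policy change is having any effect.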

Anything obvious I'm missing?

-- 
Yours sincerely Jostein Elvaker Haande
"A free society is a society where it is safe to be unpopular"
- Adlai Stevenson

http://tolecnal.net -- tolecnal at tolecnal dot net


Re: Deleted documents and expungeDeletes

2016-04-01 Thread David Santamauro


The docs on reclaimDeletesWeight say:

"Controls how aggressively merges that reclaim more deletions are 
favored. Higher values favor selecting merges that reclaim deletions."


I can't imagine you would notice anything after only a few commits. I 
have many shards that size or larger and what I do occasionally is to 
loop an optimize, setting maxSegments with decremented values, e.g.,


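for maxSegments in $( seq 40 -1 20 ); do
  # optimize maxSegments=$maxSegments, e.g. (host and core name are placeholders):
  curl "http://localhost:8983/solr/mycore/update?optimize=true&maxSegments=${maxSegments}"
done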

It's definitely a poor man's hack and is clearly not the most efficient 
way of optimizing, but it does remove deletes without requiring the 
double or triple disk space that a full optimize requires. I can usually 
reclaim 100-300GB of disk space in a collection that is currently ~2TB, 
which is not inconsequential.


Seeing that you only have about 1.7M documents, perhaps an index rebuild 
isn't out of the question? I did just that on a test collection with 100M 
documents. Starting with 0 deleted docs, with reclaimDeletesWeight=5.0 and 
probably about 1-3% document turnover per week (updates) over the last 3 
months, my deleted percentage is staying below 10%.


If that's not an option, keeping reclaimDeletesWeight at 5.0 and using 
expungeDeletes=true on commit will get that percentage down over time.
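
As an illustration of the two requests mentioned in this message, here is a small Python sketch that only builds the update URLs and does not send them. The host, port and core name are placeholders, and `update_url` is a hypothetical helper, not part of any Solr client library.

```python
from urllib.parse import urlencode

# Hypothetical base URL and core name; replace with your own.
SOLR_CORE_URL = "http://localhost:8983/solr/mycore"

def update_url(**params):
    """Build an /update URL with the given request parameters."""
    return f"{SOLR_CORE_URL}/update?{urlencode(params)}"

# Commit that also asks Solr to merge away segments containing deletes.
expunge_url = update_url(commit="true", expungeDeletes="true")

# One step of the decrementing-optimize loop described earlier.
optimize_url = update_url(optimize="true", maxSegments=30)

print(expunge_url)
print(optimize_url)
```

Either URL can then be requested with any HTTP client. Both operations rewrite segments, so they are best run when the extra I/O is acceptable.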


//




Re: Search over XML data using xpath

2016-04-01 Thread Alessandro Benedetti
Let's try to bring some clarity here: Lucene query syntax is not XPath.
This means you cannot search Lucene documents the way you search XML nodes.

You need to model your information according to the Lucene Document (and
children) structure.
Then you can play with the Lucene query language and the different Solr
query parsers.
What the XPathEntityProcessor does is allow you to evaluate XPath
expressions and fetch values from the XML to be indexed in specific Lucene
fields.

Then you can search as you like.

Returning the whole XML is another problem.
You can simply store it in an additional field (without indexing it).
Or you can store the URL and then resolve it at the front-end level ...
Etc. etc.
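
A rough Python sketch of that modelling step, done on the client side rather than with XPathEntityProcessor: each element path becomes a flat field and the untouched XML goes into one extra field. The field names (`raw_xml`, the `xml_` prefix) and the helper `xml_to_solr_doc` are assumptions made for this example. The schema would still need to declare `raw_xml` as stored but not indexed.

```python
import xml.etree.ElementTree as ET

def xml_to_solr_doc(doc_id, xml_text):
    """Flatten an XML string into path-named fields plus the raw XML."""
    root = ET.fromstring(xml_text)
    doc = {"id": doc_id, "raw_xml": xml_text}  # raw_xml: meant to be stored only

    def walk(node, path):
        text = (node.text or "").strip()
        if text:
            # Repeated paths are collected as lists (multivalued fields).
            doc.setdefault(path, []).append(text)
        for child in node:
            walk(child, f"{path}_{child.tag}")

    # The "xml_" prefix keeps generated names apart from "id" and "raw_xml".
    walk(root, f"xml_{root.tag}")
    return doc

sample = "<book><title>Solr</title><author><name>Ann</name></author></book>"
doc = xml_to_solr_doc("1", sample)
print(doc["xml_book_title"], doc["xml_book_author_name"])
```

The hierarchy survives only in the field names, so searches go against the flat fields and the full document is rebuilt for display from `raw_xml`.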

Cheers




-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Sorting question

2016-04-01 Thread Tamás Barta
Hi,

I have a problem and I don't know how I should solve it in Solr.

I have products indexed. Every product can be in lists. A product may be in
no list at all, or in multiple lists.
Within a list the products are ordered. I would like to search for products
in a specified list and get them back in the correct order.

Earlier I tried creating a field for every list the product is in, where
each field stored the position (index) of the product in that list.
For example: list_23=1, list_841=8, ...

After that I could run a query for list N: products where list_N exists,
ordered by list_N asc.
In this case the index size was extremely large because there are a lot of
lists, and I guess every document stored a value for every field (it was a
long time ago; I don't know if it would still be a problem in the current
version of Solr).
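
The per-list field approach described here could be expressed as request parameters roughly as follows. This is a sketch under the assumption that the fields follow the `list_N` naming from the example. `list_query_params` is a hypothetical helper.

```python
def list_query_params(list_id):
    """Request parameters for 'products in list N, in list order'."""
    field = f"list_{list_id}"        # field naming taken from the example
    return {
        "q": "*:*",
        "fq": f"{field}:[* TO *]",   # only documents with a value in this field
        "sort": f"{field} asc",      # order by the stored position
    }

params = list_query_params(23)
print(params)
```

The range query `[* TO *]` stands in for "list_N exists", and the sort uses the same field, so one field per list carries both membership and order.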

Do you have any idea how I could solve this problem?

Thanks,
Tamas


Re: Sorting question

2016-04-01 Thread Binoy Dalal
I don't think I understand your problem properly. Are you trying to
pre-sort the products?

-- 
Regards,
Binoy Dalal


Re: Sorting question

2016-04-01 Thread Alessandro Benedetti
I think this is a classic XY problem: you are trying to solve X with Y,
and you are asking us about Y.
Could you describe what your X problem is? What are you trying to do
with these ordered lists?

If not, I would add a field to the product called
list_position (or a similar name) of type geo point (x,y).
X could be your list ID and
Y the position.
Then you can play with spatial search to get what you want.

But again, let's try to solve X.

Cheers


Re: Sorting question

2016-04-01 Thread Tamás Barta
For example, I have to display sellable products which are in list X in the
correct order.

If I add "status" and "list" (multivalued) fields to every document
(product), then I can execute a query: status:sellable AND list:X, where X
is the ID of the list. The list field contains the IDs of the lists that the
product is in.

The problem is that I can't sort the result. A product has a different index
in every list.

Is it clear now?

Earlier I added a "listpos" field with multivalued content, for example:

1:23
2:4

This means that the product is at position 23 in list 1 and at
position 4 in list 2. After that I created a custom comparator which parses
the field values to get the index for the specified list and sorts by that
index.
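
To make the "listpos" encoding concrete, here is a small client-side Python sketch of what such a comparator does. It is not the Lucene comparator from the original setup, and the sample documents are invented for the example.

```python
def position_in_list(listpos_values, list_id):
    """Return the product's position in the given list, or None if absent."""
    for value in listpos_values:
        lid, pos = value.split(":")
        if int(lid) == list_id:
            return int(pos)
    return None

# Hypothetical documents; only "listpos" mirrors the field described here.
products = [
    {"id": "A", "listpos": ["1:23", "2:4"]},
    {"id": "B", "listpos": ["1:2"]},
    {"id": "C", "listpos": ["2:1"]},
]

def sorted_for_list(docs, list_id):
    """Keep the products that are in the list and order them by position."""
    members = [d for d in docs if position_in_list(d["listpos"], list_id) is not None]
    return sorted(members, key=lambda d: position_in_list(d["listpos"], list_id))

print([d["id"] for d in sorted_for_list(products, 1)])
```

The sort key depends on which list is requested, which is why a plain field sort is not enough for this encoding.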

But I didn't like that solution much. I wish there were a better
solution. In SolrJ, unfortunately, I can't find an API to set a custom
comparator like I did in Lucene. So I don't know how to solve this problem
in Solr.

Thanks,
Tamás


Function Query Parsing problem in Solr 5.4.1 and Solr 5.5.0

2016-04-01 Thread Max Bridgewater
Hi,

I have the following configuration for the firstSearcher handler in
solrconfig.xml:


  
  

  parts
  score desc, Review1 asc, Rank2 asc


  make
  {!func}sum(product(0.01,param1), product(0.20,param2),  min(param2,0.4)) desc

  


This works great in Solr 4.10. However, in Solr 5.4.1 and Solr 5.5.0, I get
the error below. How do I write this kind of query with Solr 5?


Thanks,
Max.


ERROR org.apache.solr.handler.RequestHandlerBase  [   x:productsearch] –
org.apache.solr.common.SolrException: Can't determine a Sort Order (asc or
desc) in sort spec '{!func}sum(product(0.01,param1), product(0.20,param2),
min(param2,0.4)) desc', pos=32
    at org.apache.solr.search.SortSpecParsing.parseSortSpec(SortSpecParsing.java:143)
    at org.apache.solr.search.QParser.getSort(QParser.java:247)
    at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:187)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:247)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:156)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:2073)
    at org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:69)
    at org.apache.solr.core.SolrCore$5.call(SolrCore.java:1840)
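
One observation that may help narrow this down: the reported pos=32 coincides with the first space inside the function. The Python sketch below checks that and builds a compact variant of the sort spec without the `{!func}` prefix and without inner whitespace. That the whitespace is the actual cause, and that the compact form is accepted by Solr 5, are both assumptions not confirmed in this thread.

```python
sort_spec = ("{!func}sum(product(0.01,param1), product(0.20,param2), "
             "min(param2,0.4)) desc")

# Index of the first space in the spec, to compare with the reported pos.
first_space = sort_spec.index(" ")

# Untested workaround: drop the {!func} prefix and the inner whitespace,
# keeping a single space before the sort direction.
function, direction = sort_spec[len("{!func}"):].rsplit(" ", 1)
compact_spec = function.replace(" ", "") + " " + direction

print(first_space)
print(compact_spec)
```

If the compact form parses, the remaining question is whether the `{!func}` prefix is needed at all, since the sort parameter accepts function expressions directly.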


Re: Sorting question

2016-04-01 Thread John Bickerstaff
Specifically, what drives the position in the list?  Is it arbitrary or is
it driven by some piece of data?

If data-driven - code could do the sorting based on that data...  separate
from SOLR...

Alternatively, if the data point exists in SOLR, a "sub-query" might be
used to get the right sort order on the items returned by the "main"
search...  Possibly without having to resort to the clunky-feeling listpos
multivalued field...



Re: Sorting question

2016-04-01 Thread Tamás Barta
Some of the lists are created by users and some are generated by
applications; it doesn't matter.

It would be good to solve it in Solr because Solr does the work of
filtering and pagination. If sorting were done outside, then I would have to
read every document from Solr to sort them. That is not an option; I have to
query only one page.

I don't understand how to solve it using subqueries.


Re: Sorting question

2016-04-01 Thread John Bickerstaff
Just to be clear - I don't mean who requests the list (application or user)
I mean what "rule" determines the ordering of the list?

Or, is there even a rule of any kind?

In other words, does a user arbitrarily decide that documentA, documentF,
and documentW should be on a list of their own?  For reasons known only to
the user?

Or - does the ordering of the list depend on some piece of data?  (like a
date, or a manufacturer, or a price range or any other piece of "hard" data)

===

To give an example from what I'm working on right now --

My subject matter experts have given me a rule that says:

*Documents of content_type "bar" should come higher in the results than
documents of content_type "foo".*

Pseudocode: If (content_type == bar) then put this doc highest in the
results.  If (content_type == foo) put those docs after the "bar"
content_type docs.
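
Written out as runnable Python, that rule might look like the following. It is only a client-side sketch with invented sample documents, not the Solr query used in the project mentioned here.

```python
# Lower rank sorts first; any other content_type goes after both.
RANK = {"bar": 0, "foo": 1}

docs = [
    {"id": 1, "content_type": "foo"},
    {"id": 2, "content_type": "bar"},
    {"id": 3, "content_type": "foo"},
    {"id": 4, "content_type": "bar"},
]

# sorted() is stable, so documents keep their original relative order
# within each content_type.
ordered = sorted(docs, key=lambda d: RANK.get(d["content_type"], len(RANK)))
print([d["id"] for d in ordered])
```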


This is an example of the ordering being tied to a specific piece of data
which I can manipulate in a "sub query"  (that's probably the wrong term...)


This isn't exactly what you're doing, but it's close -- IF you have rules
you can express clearly in this way...

---

Also, I'm a little confused by your statement that Solr does the filtering
and pagination, thus you can't sort the documents after Solr returns them...

My mental model is that you ask Solr for all the documents that match
certain criteria.  Solr returns that "set" of documents and then for your
list, you sort those document titles or IDs according to some rule --
possibly in the JavaScript on the web page...  But perhaps I'm not
understanding your situation well enough...

Oh - are you perhaps saying that your ONLY criterion for getting these
documents is the list number?  That would make sense, although there may
still be room for sorting based on some kind of logic / data point outside
of Solr.  You could get all the documents associated with list #4, and then
sort them based on some hard data point they all contain.  At the very
least, your listpos "array" becomes simpler...

What does your query currently look like?


Re: Sorting question

2016-04-01 Thread John Bickerstaff
Oh - and if you send a copy of your query - please include a human-readable
version of what your intent is...

Something like: "Find all the documents that have 'blue' in the color field
in addition to searching the title field for the user's search term..."

...Or whatever your intent is for this search.


Re: Sorting question

2016-04-01 Thread Tamás Barta
So, the list order is determined by the user. The user creates a list, adds
products to it, and I have to display these lists using filters and
pagination.

Let's assume there is a list with 1 products in it. On the website where
I display the list, only 50 products are displayed per page. So if I could
query Solr to give me the products from list X, ordered as the user defined,
but only the products matching some criteria (status, amount, ...), from an
offset and with 50 rows, then it would be perfect and fast. If the ordering
were done outside of Solr, then I would have to retrieve almost all 1
documents from Solr (a bit fewer if filtered) to order them and display the
page of 50 products.
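
The request described here, one page of 50 rows from a given offset with filters and a list-specific sort, can be sketched as a set of parameters. The sort field `list_N` is a hypothetical per-list position field, and `page_params` is a helper invented for this example. How to make that sort available in Solr is exactly the open question of this thread.

```python
def page_params(list_id, page, rows=50, filters=()):
    """Parameters for one page of a list; pages are numbered from 1."""
    return {
        "q": f"list:{list_id}",
        "fq": list(filters),               # e.g. ["status:sellable"]
        "sort": f"list_{list_id} asc",     # hypothetical per-list position field
        "start": (page - 1) * rows,        # offset of the first row on the page
        "rows": rows,
    }

params = page_params(list_id=7, page=3, filters=["status:sellable"])
print(params["start"], params["rows"])
```

With sorting done inside Solr, only the 50 rows of the requested page cross the wire, which is the point being made about sorting outside Solr.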

Re: Sorting question

2016-04-01 Thread billnbell
One way is to put the match into 2 separate fields and index them, then 
sort in Solr by the 2 fields.

Bill Bell
Sent from mobile


> On Apr 1, 2016, at 11:15 AM, John Bickerstaff  
> wrote:
> 
> Just to be clear - I don't mean who requests the list (application or user)
> I mean what "rule" determines the ordering of the list?
> 
> Or, is there even a rule of any kind?
> 
> In other words, does a user arbitrarily decide that documentA, documentF,
> and documentW should be on a list of their own?  For reasons known only to
> the user?
> 
> Or - does the ordering of the list depend on some piece of data?  (like a
> date, or a manufacturer, or a price range or any other piece of "hard" data)
> 
> ===
> 
> To give an example from what I'm working on right now --
> 
> My subject matter experts have given me a rule that says:
> 
> *Documents of  content_type "bar" should come higher in the results than
> documents of content_type "foo".*
> 
> PsuedoCode: If (content_type == bar) then put this doc highest in the
> results.  If (content_type == foo) put those docs after the "bar"
> content_type docs.
> 
> 
> This is an example of the ordering being tied to a specific piece of data
> which I can manipulate in a "sub query"  (that's probably the wrong term...)
> 
> 
> This isn't exactly what you're doing, but it's close -- IF you have rules
> you can express clearly in this way...
> 
> ---
> 
> Also, I'm confused a little by your statement that SOLR does the filtering
> and pagination, thus you can't sort the documents after Solr returns them...
> 
> My mental model is that you ask Solr for all the documents that match a
> certain criteria.  Solr returns that "set" of documents and then for your
> list, you sort those document titles or ID's according to some rule --
> possibly in the javascript on the web page...  But perhaps I'm not
> understanding your situation well enough...
> 
> Oh - are you perhaps saying that your ONLY criteria for getting these
> documents is the list number?  That would make sense, although there may
> still be room for sorting based on some kind of logic / data point outside
> of SOlR.  You could get all the documents associated to list #4, and then
> sort them based on some hard data point they all contain.  At the very
> least, your listpos "array" becomes simpler...
> 
> What does your query currently look like?
> 
>> On Fri, Apr 1, 2016 at 10:51 AM, Tamás Barta  wrote:
>> 
>> Some of the lists are created by users and some are generated by
>> applications, it doesn't matter.
>> 
>> It would be fine to solve it in Solr because Solr does the work of
>> filtering and pagination. If sorting were done outside than I would have to
>> read every document from Solr to sort them. It is not an option, I have to
>> query onle one page.
>> 
>> I don't understand how to solve it using subqueries.
>> On Apr 1, 2016, at 18:42, "John Bickerstaff" wrote:
>> 
>>> Specifically, what drives the position in the list?  Is it arbitrary or
>> is
>>> it driven by some piece of data?
>>> 
>>> If data-driven - code could do the sorting based on that data...
>> separate
>>> from SOLR...
>>> 
>>> Alternatively, if the data point exists in SOLR, a "sub-query" might be
>>> used to get the right sort order on the items returned by the "main"
>>> search...  Possibly without having to resort to the clunky-feeling
>> listpos
>>> multivalued field...
>>> 
>>>> On Fri, Apr 1, 2016 at 10:32 AM, Tamás Barta
>>> wrote:
>>>
>>>> For example I have to display sellable products which are in list X in
>>> the
>>>> correct order.
>>>>
>>>> If I add a "status" and "list" (multivalued) fields to every document
>>>> (products), then I can execute a query: status:sellable AND list:X,
>>> where X
>>>> is the ID of the list. The list field contains IDs of the list in which
>>> the
>>>> product is in.
>>>>
>>>> The problem is that I can't sort the result. A product has different
>>> index
>>>> for every list.
>>>>
>>>> Is it clear now?
>>>>
>>>> Earlier I added a "listpos" field with multivalue content, for example:
>>>>
>>>> 1:23
>>>> 2:4
>>>>
>>>> Which means that this product is in position 23 in list 1 and it is in
>>>> position 4 in list 2. After that I created a custom comparator which
>>> parses
>>>> field values to get index for the specified list and sorts by that
>> index.
>>>>
>>>> But I didn't like that solution much. I wish there would be a better
>>>> solution. In SolrJ unfortunately I can't find an API to set custom
>>>> comparator like I did in Lucene. So I don't know how to solve this
>>> problem
>>>> in Solr.
>>>>
>>>> Thanks,
>>>> Tamás
>>>> On Apr 1, 2016, at 17:25, "Alessandro Benedetti" <
>>>> abenede...@apache.org
>>>>> wrote:
>>>>
>>>>> I think this is a classic XY Problem , you are trying to solve X with
>>> Y ,
>>>>> and you are asking us about Y .
>>>>> Could you describe us what is your X problem ? What are you trying to
>>> do
>>>>> with this ordered lists ?
>>>>>
>>>>> If not I would add a fiel

Re: Sorting question

2016-04-01 Thread Tamás Barta
Sorry, I don't know what you mean.

If the "listpos" field contains multiple values like "list=pos", is it
possible to order by the field value that matches a query?

For example, if list 1 contains p1 and p2, and list 2 contains p2 and p1, in
that order, then:

the p1 document has a listpos field with values "1=1" and "2=2"
the p2 document has a listpos field with values "1=2" and "2=1"

If list 1 should be displayed, then I would tell Solr: sort by the field
value where listpos:1=*

That is because I need the products which are in list 1, and I want to sort
by this matching term.
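
To make the encoding in this example concrete, here is a small Python sketch that derives the "list=pos" values from the lists themselves. The function name build_listpos and the dictionary layout are assumptions for illustration only; nothing here calls Solr.

```python
def build_listpos(lists):
    """Map each product ID to its "list=pos" values, with positions starting at 1."""
    listpos = {}
    for list_id, product_ids in lists.items():
        for position, product_id in enumerate(product_ids, start=1):
            listpos.setdefault(product_id, []).append(f"{list_id}={position}")
    return listpos


# List 1 holds p1 then p2; list 2 holds p2 then p1.
lists = {1: ["p1", "p2"], 2: ["p2", "p1"]}
listpos = build_listpos(lists)
print(listpos["p1"])  # ['1=1', '2=2']
print(listpos["p2"])  # ['1=2', '2=1']
```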
On Apr 1, 2016, at 19:35, ( ) wrote:

> Put the match into 2 separate fields and index it. Then sort in Solr by
> the 2 fields is one way
>
> Bill Bell
> Sent from mobile
>
>
> > On Apr 1, 2016, at 11:15 AM, John Bickerstaff 
> wrote:
> >
> > Just to be clear - I don't mean who requests the list (application or
> user)
> > I mean what "rule" determines the ordering of the list?
> >
> > Or, is there even a rule of any kind?
> >
> > In other words, does a user arbitrarily decide that documentA, documentF,
> > and documentW should be on a list of their own?  For reasons known only
> to
> > the user?
> >
> > Or - does the ordering of the list depend on some piece of data?  (like a
> > date, or a manufacturer, or a price range or any other piece of "hard"
> data)
> >
> > ===
> >
> > To give an example from what I'm working on right now --
> >
> > My subject matter experts have given me a rule that says:
> >
> > *Documents of  content_type "bar" should come higher in the results than
> > documents of content_type "foo".*
> >
> > PsuedoCode: If (content_type == bar) then put this doc highest in the
> > results.  If (content_type == foo) put those docs after the "bar"
> > content_type docs.
> >
> >
> > This is an example of the ordering being tied to a specific piece of data
> > which I can manipulate in a "sub query"  (that's probably the wrong
> term...)
> >
> >
> > This isn't exactly what you're doing, but it's close -- IF you have rules
> > you can express clearly in this way...
> >
> > ---
> >
> > Also, I'm confused a little by your statement that SOLR does the
> filtering
> > and pagination, thus you can't sort the documents after Solr returns
> them...
> >
> > My mental model is that you ask Solr for all the documents that match a
> > certain criteria.  Solr returns that "set" of documents and then for your
> > list, you sort those document titles or ID's according to some rule --
> > possibly in the javascript on the web page...  But perhaps I'm not
> > understanding your situation well enough...
> >
> > Oh - are you perhaps saying that your ONLY criteria for getting these
> > documents is the list number?  That would make sense, although there may
> > still be room for sorting based on some kind of logic / data point
> outside
> > of SOlR.  You could get all the documents associated to list #4, and then
> > sort them based on some hard data point they all contain.  At the very
> > least, your listpos "array" becomes simpler...
> >
> > What does your query currently look like?
> >
> >> On Fri, Apr 1, 2016 at 10:51 AM, Tamás Barta 
> wrote:
> >>
> >> Some of the lists are created by users and some are generated by
> >> applications, it doesn't matter.
> >>
> >> It would be fine to solve it in Solr because Solr does the work of
> >> filtering and pagination. If sorting were done outside than I would
> have to
> >> read every document from Solr to sort them. It is not an option, I have
> to
> >> query onle one page.
> >>
> >> I don't understand how to solve it using subqueries.
> >> On Apr 1, 2016, at 18:42, "John Bickerstaff" <
> j...@johnbickerstaff.com
> >>> wrote:
> >>
> >>> Specifically, what drives the position in the list?  Is it arbitrary or
> >> is
> >>> it driven by some piece of data?
> >>>
> >>> If data-driven - code could do the sorting based on that data...
> >> separate
> >>> from SOLR...
> >>>
> >>> Alternatively, if the data point exists in SOLR, a "sub-query" might be
> >>> used to get the right sort order on the items returned by the "main"
> >>> search...  Possibly without having to resort to the clunky-feeling
> >> listpos
> >>> multivalued field...
> >>>
> >>>> On Fri, Apr 1, 2016 at 10:32 AM, Tamás Barta
> >>> wrote:
> >>>
> >>>> For example I have to display sellable products which are in list X in
> >>> the
> >>>> correct order.
> >>>>
> >>>> If I add a "status" and "list" (multivalued) fields to every document
> >>>> (products), then I can execute a query: status:sellable AND list:X,
> >>> where X
> >>>> is the ID of the list. The list field contains IDs of the list in
> which
> >>> the
> >>>> product is in.
> >>>>
> >>>> The problem is that I can't sort the result. A product has different
> >>> index
> >>>> for every list.
> >>>>
> >>>> Is it clear now?
> >>>>
> >>>> Earlier I added a "listpos" field with multivalue content, for
> example:
> >>>>
> >>>> 1:23
> >>>> 2:4
> >>>>
> >>>> Which means that this product is in positio

Re: Sorting question

2016-04-01 Thread John Bickerstaff
OK - I get it.  List order is totally arbitrary and cannot be tied to a
hard data point.

I'll have to think - Perhaps billnbell's solution will help, although I'm
not totally sure I understand that suggestion yet.

At this point, you could get all the documents for List X that match the
search terms.  The next problem is sorting.  If you have the listpos field
too, you could use that, and some regex to find the proper order for these
documents before displaying them (in code I mean) but of course that means
you need some kind of "interceptor" to deal with this before the results
are displayed.

If I had enough control to do this in code, behind the scenes, I'd grab
that second part of the listpos field, put it into a variable on each
object and then sort by that.  Then I'd return the entire list to the UI.

I understand that if you could get SOLR to do it all, that would be
ideal...  There is the possibility of writing some new code and plugging it
in to Solr, but I'm guessing you don't want to go that far...  As a final
step in the process, with custom code to consume the listpos entry, sorting
these would be fairly straightforward.  I'm not sure how you get away from
the listpos multivalue field however...

I'll keep thinking...
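
A minimal Python sketch of the "interceptor" idea above, assuming the documents have already been fetched from Solr as plain dictionaries and that listpos values use the "list:position" form from earlier in the thread. The function names are invented for illustration. Note that this sorts on the client, so it needs every matching document in hand, which is exactly the cost Tamás wants to avoid for large lists.

```python
import re

# Matches values such as "1:23" (list ID, then position within that list).
LISTPOS_PATTERN = re.compile(r"^(\d+):(\d+)$")


def position_for(doc, list_id):
    """Return the document's position in the given list, or None if it is not on it."""
    for value in doc.get("listpos", []):
        match = LISTPOS_PATTERN.match(value)
        if match and int(match.group(1)) == list_id:
            return int(match.group(2))
    return None


def sort_by_list_position(docs, list_id):
    """Keep only documents on the list and order them by their stored position."""
    positioned = []
    for doc in docs:
        position = position_for(doc, list_id)
        if position is not None:
            positioned.append((position, doc))
    positioned.sort(key=lambda pair: pair[0])
    return [doc for _, doc in positioned]


docs = [
    {"id": "a", "listpos": ["1:23", "2:4"]},
    {"id": "b", "listpos": ["1:5"]},
    {"id": "c", "listpos": ["2:1"]},
]
print([doc["id"] for doc in sort_by_list_position(docs, 1)])
```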

On Fri, Apr 1, 2016 at 11:26 AM, Tamás Barta  wrote:

> So, the list order is determined by the user. The user creates a list, adds
> products to it and i have to display these list using filters and
> pagination.
>
> Let's assume there is list with 1 products in it. In the website where
> i display the list only 50 products are displayed in a page. So if i could
> query solr to give me products from list X, ordered as user defined, but
> only products with some criteria (status, amount, ..) from offset and 50
> rows then it would be perfect and fast. If ordering would be outside of
> solr then i have to retrive almost every 1 documents from solr (a bit
> less if filtered) to order them and display the page of 50 products.
> On Apr 1, 2016, at 19:15, "John Bickerstaff" wrote:
>
> > Just to be clear - I don't mean who requests the list (application or
> user)
> > I mean what "rule" determines the ordering of the list?
> >
> > Or, is there even a rule of any kind?
> >
> > In other words, does a user arbitrarily decide that documentA, documentF,
> > and documentW should be on a list of their own?  For reasons known only
> to
> > the user?
> >
> > Or - does the ordering of the list depend on some piece of data?  (like a
> > date, or a manufacturer, or a price range or any other piece of "hard"
> > data)
> >
> > ===
> >
> > To give an example from what I'm working on right now --
> >
> > My subject matter experts have given me a rule that says:
> >
> > *Documents of  content_type "bar" should come higher in the results than
> > documents of content_type "foo".*
> >
> > PsuedoCode: If (content_type == bar) then put this doc highest in the
> > results.  If (content_type == foo) put those docs after the "bar"
> > content_type docs.
> >
> >
> > This is an example of the ordering being tied to a specific piece of data
> > which I can manipulate in a "sub query"  (that's probably the wrong
> > term...)
> >
> >
> > This isn't exactly what you're doing, but it's close -- IF you have rules
> > you can express clearly in this way...
> >
> > ---
> >
> > Also, I'm confused a little by your statement that SOLR does the
> filtering
> > and pagination, thus you can't sort the documents after Solr returns
> > them...
> >
> > My mental model is that you ask Solr for all the documents that match a
> > certain criteria.  Solr returns that "set" of documents and then for your
> > list, you sort those document titles or ID's according to some rule --
> > possibly in the javascript on the web page...  But perhaps I'm not
> > understanding your situation well enough...
> >
> > Oh - are you perhaps saying that your ONLY criteria for getting these
> > documents is the list number?  That would make sense, although there may
> > still be room for sorting based on some kind of logic / data point
> outside
> > of SOlR.  You could get all the documents associated to list #4, and then
> > sort them based on some hard data point they all contain.  At the very
> > least, your listpos "array" becomes simpler...
> >
> > What does your query currently look like?
> >
> > On Fri, Apr 1, 2016 at 10:51 AM, Tamás Barta 
> wrote:
> >
> > > Some of the lists are created by users and some are generated by
> > > applications, it doesn't matter.
> > >
> > > It would be fine to solve it in Solr because Solr does the work of
> > > filtering and pagination. If sorting were done outside than I would
> have
> > to
> > > read every document from Solr to sort them. It is not an option, I have
> > to
> > > query onle one page.
> > >
> > > I don't understand how to solve it using subqueries.
> > > On Apr 1, 2016, at 18:42, "John Bickerstaff" <
> > j...@johnbickerstaff.com
> > > > wrote:
> > >
> > > > Specifically, what drives th

Re: Slor 5.5.0 : SolrException: fieldType 'booleans' not found in the schema

2016-04-01 Thread Girish Tavag
Hi Jack,

 I copied schema.xml from solr-5.5.0\example\example-DIH\solr\db\conf\
to  \solr-5.5.0\server\solr\myDatabase\conf\
I've attached the file too.

@Shawn
The file does not have any field which is defined as <field name="somefield" type="booleans" indexed="true" stored="true"/>


On Fri, Apr 1, 2016 at 9:13 AM, Jack Krupansky 
wrote:

> Exactly which file did you copy? Please give the specific directory.
>
> -- Jack Krupansky
>
> On Thu, Mar 31, 2016 at 3:24 PM, Girish Tavag 
> wrote:
>
> > Hi Binoy,
> >
> >  I copied the entire file schema.xml from the working example provided by
> > solr itself. Solr provided dih example i'm able to run successfully .How
> > could this be a problem?
> >
> > On Fri, Apr 1, 2016 at 12:39 AM, Binoy Dalal 
> > wrote:
> >
> > > Somewhere in your schema you've defined a field with type as
> "booleans".
> > > You should check if you've made a typo somewhere by adding that extra s
> > > after boolean.
> > > Else if it is a separate field that you're looking to add, define a new
> > > fieldtype called booleans.
> > >
> > > All the info to help you with this can be found here:
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/solr/Documents,+Fields,+and+Schema+Design
> > >
> > > I higly recommend that you go through the documentation before
> starting.
> > >
> > > On Fri, 1 Apr 2016, 00:34 Girish Tavag, 
> > wrote:
> > >
> > > > Hi,
> > > >
> > > > I am new to solr, I started using this only from today,  when I
> wanted
> > to
> > > > create dih, i'm getting the below error.
> > > >
> > > > SolrException: fieldType 'booleans' not found in the schema
> > > >
> > > > What does this mean? and How  to resolve this.
> > > >
> > > > Regards,
> > > > GNT
> > > >
> > > --
> > > Regards,
> > > Binoy Dalal
> > >
> >
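>

[Attachment: schema.xml. The XML markup was stripped by the list archive; only the text value "id" survives.]
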
Re: Slor 5.5.0 : SolrException: fieldType 'booleans' not found in the schema

2016-04-01 Thread Girish Tavag
Here is the error message "*myDatabase:*
org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
fieldType 'booleans' not found in the schema"

On Fri, Apr 1, 2016 at 11:41 PM, Girish Tavag 
wrote:

> Hi Jack,
>
>  I copied schema.xml from solr-5.5.0\example\example-DIH\solr\db\conf\
> to  \solr-5.5.0\server\solr\myDatabase\conf\
> I've attached the file too.
>
> @Shawn
> The file does not have any field which is defined as <field name="somefield" type="booleans" indexed="true" stored="true"/>
>
>
> On Fri, Apr 1, 2016 at 9:13 AM, Jack Krupansky 
> wrote:
>
>> Exactly which file did you copy? Please give the specific directory.
>>
>> -- Jack Krupansky
>>
>> On Thu, Mar 31, 2016 at 3:24 PM, Girish Tavag 
>> wrote:
>>
>> > Hi Binoy,
>> >
>> >  I copied the entire file schema.xml from the working example provided
>> by
>> > solr itself. Solr provided dih example i'm able to run successfully .How
>> > could this be a problem?
>> >
>> > On Fri, Apr 1, 2016 at 12:39 AM, Binoy Dalal 
>> > wrote:
>> >
>> > > Somewhere in your schema you've defined a field with type as
>> "booleans".
>> > > You should check if you've made a typo somewhere by adding that extra
>> s
>> > > after boolean.
>> > > Else if it is a separate field that you're looking to add, define a
>> new
>> > > fieldtype called booleans.
>> > >
>> > > All the info to help you with this can be found here:
>> > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/solr/Documents,+Fields,+and+Schema+Design
>> > >
>> > > I higly recommend that you go through the documentation before
>> starting.
>> > >
>> > > On Fri, 1 Apr 2016, 00:34 Girish Tavag, 
>> > wrote:
>> > >
>> > > > Hi,
>> > > >
>> > > > I am new to solr, I started using this only from today,  when I
>> wanted
>> > to
>> > > > create dih, i'm getting the below error.
>> > > >
>> > > > SolrException: fieldType 'booleans' not found in the schema
>> > > >
>> > > > What does this mean? and How  to resolve this.
>> > > >
>> > > > Regards,
>> > > > GNT
>> > > >
>> > > --
>> > > Regards,
>> > > Binoy Dalal
>> > >
>> >
>>
>
>


Re: Slor 5.5.0 : SolrException: fieldType 'booleans' not found in the schema

2016-04-01 Thread Shawn Heisey
On 4/1/2016 12:11 PM, Girish Tavag wrote:
>  I copied schema.xml from solr-5.5.0\example\example-DIH\solr\db\conf\
>   to  \solr-5.5.0\server\solr\myDatabase\conf\
> I've attached the file too.
>
> @Shawn
> The file does not have any field which is defined as <field name="somefield" type="booleans" indexed="true" stored="true"/>

You are correct, "booleans" does not show up in that file.

Either the schema you sent is not the schema that's actually being used,
or the error message that you sent is not a precise copy/paste of what
you are seeing.  You may need to let us see the entire stacktrace for
any error messages in your logfile, as well as several lines before and
after each error.  If you can share the entire logfile, that would be
helpful.

FYI -- attaching files often does not work.  You got lucky -- typically
such attachments do not make it to the list.

Thanks,
Shawn
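
One hedged way to check which file in the core's conf directory actually mentions the type name is a small Python scan like the sketch below. The function name find_mentions is invented for illustration. The reason to look beyond schema.xml is that the stack trace posted later in this thread points at AddSchemaFieldsUpdateProcessorFactory, an update processor, and update processors are normally configured in solrconfig.xml rather than in the schema.

```python
from pathlib import Path


def find_mentions(conf_dir, needle="booleans"):
    """Return (file name, line number, stripped line) for every line containing needle."""
    hits = []
    for path in sorted(Path(conf_dir).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file: skip it rather than abort the scan
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((path.name, number, line.strip()))
    return hits
```

Calling find_mentions on the conf directory of the failing core (in this thread, the one under server\solr\myDatabase) would list every file and line that refers to "booleans", which shows whether the reference comes from the schema or from another config file.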



Re: Slor 5.5.0 : SolrException: fieldType 'booleans' not found in the schema

2016-04-01 Thread Girish Tavag
Hi Shawn,

 Thank you for responding and for letting me know about the attachments.
Here are the log file details:
2016-04-01 18:24:08.191 INFO  (coreLoadExecutor-6-thread-1) [ x:myDatabase] o.a.s.c.CachingDirectoryFactory looking to close solr-5.5.0\server\solr\myDatabase\data\index [CachedDir>]
2016-04-01 18:24:08.192 INFO  (coreLoadExecutor-6-thread-1) [ x:myDatabase] o.a.s.c.CachingDirectoryFactory Closing directory: solr-5.5.0\server\solr\myDatabase\data\index
2016-04-01 18:24:08.192 INFO  (coreLoadExecutor-6-thread-1) [ x:myDatabase] o.a.s.c.CachingDirectoryFactory looking to close solr-5.5.0\server\solr\myDatabase\data [CachedDir>]
2016-04-01 18:24:08.192 INFO  (coreLoadExecutor-6-thread-1) [ x:myDatabase] o.a.s.c.CachingDirectoryFactory Closing directory: solr-5.5.0\server\solr\myDatabase\data
2016-04-01 18:24:08.193 ERROR (coreLoadExecutor-6-thread-1) [ x:myDatabase] o.a.s.c.CoreContainer Error creating core [myDatabase]: fieldType 'booleans' not found in the schema
org.apache.solr.common.SolrException: fieldType 'booleans' not found in the schema
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:820)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:658)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:814)
    at org.apache.solr.core.CoreContainer.access$000(CoreContainer.java:87)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:467)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:458)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:231)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.solr.common.SolrException: fieldType 'booleans' not found in the schema
    at org.apache.solr.update.processor.AddSchemaFieldsUpdateProcessorFactory$TypeMapping.populateValueClasses(AddSchemaFieldsUpdateProcessorFactory.java:247)
    at org.apache.solr.update.processor.AddSchemaFieldsUpdateProcessorFactory.inform(AddSchemaFieldsUpdateProcessorFactory.java:173)
    at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:697)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:800)
    ... 10 more
2016-04-01 18:24:08.196 ERROR (coreContainerWorkExecutor-2-thread-1) [   ] o.a.s.c.CoreContainer Error waiting for SolrCore to be created
java.util.concurrent.ExecutionException: org.apache.solr.common.SolrException: Unable to create core [myDatabase]
    at java.util.concurrent.FutureTask.report(Unknown Source)
    at java.util.concurrent.FutureTask.get(Unknown Source)
    at org.apache.solr.core.CoreContainer$2.run(CoreContainer.java:496)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:231)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.solr.common.SolrException: Unable to create core [myDatabase]
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:828)
    at org.apache.solr.core.CoreContainer.access$000(CoreContainer.java:87)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:467)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:458)
    ... 5 more
Caused by: org.apache.solr.common.SolrException: fieldType 'booleans' not found in the schema
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:820)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:658)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:814)
    ... 8 more
Caused by: org.apache.solr.common.SolrException: fieldType 'booleans' not found in the schema
    at org.apache.solr.update.processor.AddSchemaFieldsUpdateProcessorFactory$TypeMapping.populateValueClasses(AddSchemaFieldsUpdateProcessorFactory.java:247)
    at org.apache.solr.update.processor.AddSchemaFieldsUpdateProcessorFactory.inform(AddSchemaFieldsUpdateProcessorFactory.java:173)
    at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:697)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:800)
    ... 10 more
2016-04-01 18:24:10.176 INFO  (qtp7980742-15) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system params={wt=json} status=0 QTime=3426
2016-04-01 18:24:17.394 INFO  (qtp7980742-15) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores params={indexInfo=false&wt=json&_=1459535057377} status=0 QTime=2
2016-04-01 18:24:17.494 INFO  (qtp7980742-16) [   ] o.a.s.s.HttpSolrCall [admin] web

Re: Sorting question

2016-04-01 Thread John Bickerstaff
Tamas,

I'm brainstorming here - not being careful, just throwing out ideas...

One thing that comes up is a separate document in SOLR - one doc for each
list.

If a user adds a doc to their list, that doc's id gets added to this other
type of document...

So, a document with the title "List 1" would have a multivalue field of
ID's and the list order number like so:

ID          List Position
_________________________
doc1 ID:    1
doc2 ID:    2
doc3 ID:    3

and so on...  The big problem I see with this is keeping it organized
correctly.  More code would have to be written to handle this when the user
does any kind of "crud" on the list...

I'm pretty sure there's a way to write a query that uses that list to
properly order the items returned by your primary search, although I
haven't written such a query yet.

If you have the luxury of NOT being in production yet with this system, I'd
seriously consider pushing to keep application metadata OUT of your product
information store.  This particular problem (of ordering the results based
on arbitrary user choices) might be more easily handled via a separate step
that queries a relational database to handle list order - once Solr gives
you the documents that match the query and the user's list number...

Even if you can't use another relational data store - keeping that metadata
out of your individual product documents could be argued to be a good
design idea...

+

Here's an alternative brainstorm...

Where does the user data live?  What about putting the information about
the order of document ID's in the User's lists with the User?  Then you can
get all documents that match the search terms and are on List X from Solr -
and then sort them by ID based on the data associated with the User (a list
of ID's, in order)

There is even a way to write a plugin that will go after external data to
help sort Solr documents, although I'm guessing you'd rather avoid that...
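
As a rough Python sketch of this second brainstorm: assume the ordered ID list for a user's list lives outside Solr (for example with the user record) and that the matching documents have already come back from Solr as dictionaries with an "id" key. The names order_by_list and page_of are made up for illustration.

```python
def order_by_list(docs, ordered_ids):
    """Order Solr results by their position in an externally stored ID list."""
    rank = {doc_id: index for index, doc_id in enumerate(ordered_ids)}
    on_list = [doc for doc in docs if doc["id"] in rank]
    return sorted(on_list, key=lambda doc: rank[doc["id"]])


def page_of(ordered_docs, page, rows=50):
    """Slice one zero-based page out of the already ordered results."""
    start = page * rows
    return ordered_docs[start:start + rows]


# IDs in the order the user arranged them, then results in whatever order Solr returned.
ordered_ids = ["doc1", "doc2", "doc3"]
solr_results = [{"id": "doc3"}, {"id": "doc1"}, {"id": "doc9"}, {"id": "doc2"}]
print([doc["id"] for doc in order_by_list(solr_results, ordered_ids)])
```

The trade-off is the one already raised in this thread: the ordering happens after Solr has returned every match, so paging is done in the application rather than by Solr.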



On Fri, Apr 1, 2016 at 11:59 AM, John Bickerstaff 
wrote:

> OK - I get it.  List order is totally arbitrary and cannot be tied to a
> hard data point.
>
> I'll have to think - Perhaps billnbell's solution will help, although I'm
> not totally sure I understand that suggestion yet.
>
> At this point, you could get all the documents for List X that match the
> search terms.  The next problem is sorting.  If you have the listpos field
> too, you could use that, and some regex to find the proper order for these
> documents before displaying them (in code I mean) but of course that means
> you need some kind of "interceptor" to deal with this before the results
> are displayed.
>
> If I had enough control to do this in code, behind the scenes, I'd grab
> that second part of the listpos field, put it into a variable on each
> object and then sort by that.  Then I'd return the entire list to the UI.
>
> I understand that if you could get SOLR to do it all, that would be
> ideal...  There is the possibility of writing some new code and plugging it
> in to Solr, but I'm guessing you don't want to go that far...  As a final
> step in the process, with custom code to consume the listpos entry, sorting
> these would be fairly straightforward.  I'm not sure how you get away from
> the listpos multivalue field however...
>
> I'll keep thinking...
>
> On Fri, Apr 1, 2016 at 11:26 AM, Tamás Barta  wrote:
>
>> So, the list order is determined by the user. The user creates a list,
>> adds
>> products to it and i have to display these list using filters and
>> pagination.
>>
>> Let's assume there is list with 1 products in it. In the website where
>> i display the list only 50 products are displayed in a page. So if i could
>> query solr to give me products from list X, ordered as user defined, but
>> only products with some criteria (status, amount, ..) from offset and 50
>> rows then it would be perfect and fast. If ordering would be outside of
>> solr then i have to retrive almost every 1 documents from solr (a bit
>> less if filtered) to order them and display the page of 50 products.
>> On Apr 1, 2016, at 19:15, "John Bickerstaff" <
>> j...@johnbickerstaff.com
>> > wrote:
>>
>> > Just to be clear - I don't mean who requests the list (application or
>> user)
>> > I mean what "rule" determines the ordering of the list?
>> >
>> > Or, is there even a rule of any kind?
>> >
>> > In other words, does a user arbitrarily decide that documentA,
>> documentF,
>> > and documentW should be on a list of their own?  For reasons known only
>> to
>> > the user?
>> >
>> > Or - does the ordering of the list depend on some piece of data?  (like
>> a
>> > date, or a manufacturer, or a price range or any other piece of "hard"
>> > data)
>> >
>> > ===
>> >
>> > To give an example from what I'm working on right now --
>> >
>> > My subject matter experts have given me a rule that says:
>> >
>> > *Documents of  content_type "bar" should come higher in 

Re: Sorting question

2016-04-01 Thread John Bickerstaff
Tamas,

This feels a bit like a "user favorites" problem.

I did a little searching and found this...  Don't know if it will help, but
when I'm looking for stuff like this I find it helps to try to come up with
generic or different descriptions of my problem and go search those as
well...

http://stackoverflow.com/questions/3931827/solr-merging-results-of-2-cores-into-only-those-results-that-have-a-matching-fie

On Fri, Apr 1, 2016 at 12:40 PM, John Bickerstaff 
wrote:

> Tamas,
>
> I'm brainstorming here - not being careful, just throwing out ideas...
>
> One thing that comes up is a separate document in SOLR - one doc for each
> list.
>
> If a user adds a doc to their list, that doc's id gets added to this other
> type of document...
>
> So, a document with the title "List 1" would have a multivalue field of
> ID's and the list order number like so:
>
> ID          List Position
> _________________________
> doc1 ID:    1
> doc2 ID:    2
> doc3 ID:    3
>
> and so on...  The big problem I see with this is keeping it organized
> correctly.  More code would have to be written to handle this when the user
> does any kind of "crud" on the list...
>
> I'm pretty sure there's a way to write a query that uses that list to
> properly order the items returned by your primary search, although I
> haven't written such a query yet.
>
> If you have the luxury of NOT being in production yet with this system,
> I'd seriously consider pushing to keep application metadata OUT of your
> product information store.  This particular problem (of ordering the
> results based on arbitrary user choices) might be more easily handled via a
> separate step that queries a relational database to handle list order -
> once Solr gives you the documents that match the query and the user's list
> number...
>
> Even if you can't use another relational data store - keeping that
> metadata out of your individual product documents could be argued to be a
> good design idea...
>
> +
>
> Here's an alternative brainstorm...
>
> Where does the user data live?  What about putting the information about
> the order of document ID's in the User's lists with the User?  Then you can
> get all documents that match the search terms and are on List X from Solr -
> and then sort them by ID based on the data associated with the User (a list
> of ID's, in order)
>
> There is even a way to write a plugin that will go after external data to
> help sort Solr documents, although I'm guessing you'd rather avoid that...
>
>
>
> On Fri, Apr 1, 2016 at 11:59 AM, John Bickerstaff <
> j...@johnbickerstaff.com> wrote:
>
>> OK - I get it.  List order is totally arbitrary and cannot be tied to a
>> hard data point.
>>
>> I'll have to think - Perhaps billnbell's solution will help, although I'm
>> not totally sure I understand that suggestion yet.
>>
>> At this point, you could get all the documents for List X that match the
>> search terms.  The next problem is sorting.  If you have the listpos field
>> too, you could use that, and some regex to find the proper order for these
>> documents before displaying them (in code I mean) but of course that means
>> you need some kind of "interceptor" to deal with this before the results
>> are displayed.
>>
>> If I had enough control to do this in code, behind the scenes, I'd grab
>> that second part of the listpos field, put it into a variable on each
>> object and then sort by that.  Then I'd return the entire list to the UI.
>>
>> I understand that if you could get SOLR to do it all, that would be
>> ideal...  There is the possibility of writing some new code and plugging it
>> in to Solr, but I'm guessing you don't want to go that far..  As a final
>> step in the process, with custom code to consume the listpos entry, sorting
>> these would be fairly straightforward.  I'm not sure how you get away from
>> the listpos multivalue field however...
>>
>> I'll keep thinking...
>>
>> On Fri, Apr 1, 2016 at 11:26 AM, Tamás Barta 
>> wrote:
>>
>>> So, the list order is determined by the user. The user creates a list,
>>> adds products to it, and I have to display these lists using filters and
>>> pagination.
>>>
>>> Let's assume there is a list with 1 products in it. On the website where
>>> I display the list, only 50 products are displayed per page. So if I could
>>> query Solr to give me products from list X, ordered as the user defined,
>>> but only products matching some criteria (status, amount, ..), from a
>>> given offset and 50 rows, then it would be perfect and fast. If the
>>> ordering were done outside of Solr, then I would have to retrieve almost
>>> all 1 documents from Solr (a bit fewer if filtered) to order them and
>>> display the page of 50 products.
>>> On Apr 1, 2016 at 19:15, "John Bickerstaff" <j...@johnbickerstaff.com>
>>> wrote:
>>>
>>> > Just to be clear - I don't mean who requests the list (application or
>>> user)
>>> > I mean what "rule" de

Re: Solr 5.5.0 : SolrException: fieldType 'booleans' not found in the schema

2016-04-01 Thread Girish Tavag
Hi Shawn,

 Finally I was able to figure out the problem. The issue was in
solrconfig.xml, where the booleans type was referenced. I replaced booleans
with boolean, did the same for the other similar fields, and it worked
correctly :)

Regards,
GNT.


Re: Solr 5.5.0 : SolrException: fieldType 'booleans' not found in the schema

2016-04-01 Thread Jack Krupansky
I think it's a bug...

Ah, the key clue is here:
Caused by: org.apache.solr.common.SolrException: fieldType 'booleans' not found in the schema
    at org.apache.solr.update.processor.AddSchemaFieldsUpdateProcessorFactory$TypeMapping.populateValueClasses(AddSchemaFieldsUpdateProcessorFactory.java:247)

In fact, if we look at this solrconfig.xml in the repo:
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/5.5.0/solr/server/solr/configsets/data_driven_schema_configs/conf/solrconfig.xml

We can find this update processor chain:


Which has this processor:

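<processor class="solr.AddSchemaFieldsUpdateProcessorFactory">
  <str name="defaultFieldType">strings</str>
  <lst name="typeMapping">
    <str name="valueClass">java.lang.Boolean</str>
    <str name="fieldType">booleans</str>
  </lst>
  ...
</processor>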

And there is your field type "booleans" reference.

That type used to be needed for multivalued boolean fields, but now the
dynamic pattern for *_bs is itself multivalued and simply references the
"boolean" type.

This solrconfig.xml in the repo has a similar issue:
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/5.5.0/solr/example/files/conf/solrconfig.xml

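<processor class="solr.AddSchemaFieldsUpdateProcessorFactory">
  <str name="defaultFieldType">strings</str>
  <lst name="typeMapping">
    <str name="valueClass">java.lang.Boolean</str>
    <str name="fieldType">booleans</str>
  </lst>
  ...
</processor>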

Hmmm... or maybe the old "booleans" field type should be restored to allow
boolean fields to be multivalued?

So, somebody should file a Jira on this.


-- Jack Krupansky

On Fri, Apr 1, 2016 at 3:24 PM, Girish Tavag 
wrote:

> Hi Shawn,
>
>  Finally I was able to figure out the problem. The issue was in
> solrconfig.xml, where the booleans type was referenced. I replaced booleans
> with boolean, did the same for the other similar fields, and it worked
> correctly :)
>
> Regards,
> GNT.
>


Re: Solr 5.5.0 : SolrException: fieldType 'booleans' not found in the schema

2016-04-01 Thread Shawn Heisey
On 4/1/2016 1:24 PM, Girish Tavag wrote:
>  Finally I was able to figure out the problem. The issue was in
> solrconfig.xml, where the booleans type was referenced. I replaced booleans
> with boolean, did the same for the other similar fields, and it worked
> correctly :)

This has happened because you mixed the solrconfig.xml file from
data_driven_schema_configs with the schema from a completely different
example.

The config and schema in each example are intended to be used as a
matched pair.  If you mix pieces from examples without checking to make
sure they are compatible and fixing anything you find, problems *will*
happen.
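
One way to check a config/schema pair for this particular kind of mismatch before starting Solr is sketched below. It is only an illustration under stated assumptions: it looks for <str name="fieldType"> and <str name="defaultFieldType"> entries in solrconfig.xml and for <fieldType name="..."> definitions in the schema, and it ignores everything else that can differ between examples. The function names are made up.

```python
import xml.etree.ElementTree as ET


def referenced_field_types(solrconfig_xml):
    """Field type names referenced by <str name="fieldType"> or
    <str name="defaultFieldType"> anywhere in a solrconfig.xml string."""
    root = ET.fromstring(solrconfig_xml)
    wanted = {"fieldType", "defaultFieldType"}
    return {
        el.text.strip()
        for el in root.iter("str")
        if el.get("name") in wanted and el.text and el.text.strip()
    }


def defined_field_types(schema_xml):
    """Names of all <fieldType> elements defined in a schema string."""
    root = ET.fromstring(schema_xml)
    return {el.get("name") for el in root.iter("fieldType")}


def missing_field_types(solrconfig_xml, schema_xml):
    """Types the config refers to that the schema does not define."""
    return sorted(
        referenced_field_types(solrconfig_xml) - defined_field_types(schema_xml)
    )


if __name__ == "__main__":
    # Tiny stand-ins for the real files; real ones would be read from disk.
    config = """<config><processor>
      <str name="defaultFieldType">strings</str>
      <lst name="typeMapping">
        <str name="valueClass">java.lang.Boolean</str>
        <str name="fieldType">booleans</str>
      </lst></processor></config>"""
    schema = """<schema>
      <fieldType name="strings" class="solr.StrField"/>
      <fieldType name="boolean" class="solr.BoolField"/>
    </schema>"""
    print(missing_field_types(config, schema))
```

Anything this reports is a type that the update processor chain would fail to find at startup, as in the 'booleans' error in this thread.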

Thanks,
Shawn



Re: Solr 5.5.0 : SolrException: fieldType 'booleans' not found in the schema

2016-04-01 Thread Girish Tavag
Hi Shawn and Jack,

 Yes, that is true. I was following the tutorial while practicing and ended
up with this.
By the way, one more thing I would like to know: is it possible to schedule
the full data import, or the delta import,
i.e. so that the data is updated at regular intervals?

Regards,
GNT

On Sat, Apr 2, 2016 at 2:21 AM, Shawn Heisey  wrote:

> On 4/1/2016 1:24 PM, Girish Tavag wrote:
> >  Finally I was able to figure out the problem. The issue was in
> > solrconfig.xml, where the booleans type was referenced. I replaced booleans
> > with boolean, did the same for the other similar fields, and it worked
> > correctly :)
>
> This has happened because you mixed the solrconfig.xml file from
> data_driven_schema_configs with the schema from a completely different
> example.
>
> The config and schema in each example are intended to be used as a
> matched pair.  If you mix pieces from examples without checking to make
> sure they are compatible and fixing anything you find, problems *will*
> happen.
>
> Thanks,
> Shawn
>
>


Re: Search over XML data using xpath

2016-04-01 Thread Alexandre Rafalovitch
You may be interested in checking out:
http://luxdb.org/ - seems to be still on Solr 4.x though
http://siren.solutions/siren/overview/ - I believe they indexed XML

Regards,
   Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 1 April 2016 at 19:41, Miguel Valencia Zurera
 wrote:
> Hi everybody
>
> I'm looking for a way to store an XML file and keep the hierarchy of the
> data, because I need to show the full XML and also search inside the nodes
> of the XML. I have only found XPathEntityProcessor for importing XML, but it
> does not keep the hierarchy of the data.
>
> I have not found a field type that allows storing XML and running XPath on
> it. So I have thought of parsing all fields of the XML file and additionally
> adding a new field with the full XML.
>
> Is there another option?
> thanks


Re: make document with more matches rank higher with edismax parser?

2016-04-01 Thread Alexandre Rafalovitch
Have you tried the 'tie' parameter?

https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser#TheDisMaxQueryParser-Thetie%28TieBreaker%29Parameter
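
To show why 'tie' helps here: a disjunction-max score is the best per-field score plus 'tie' times the sum of the other per-field scores. With tie=0 only the best field counts, so two docs with the same best field score are tied no matter how many other fields match. The sketch below just works through that arithmetic; the per-field scores are invented for illustration and are not taken from the debug output in this thread.

```python
def dismax_score(field_scores, tie=0.0):
    """Best field score plus tie * (sum of the remaining field scores)."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)


if __name__ == "__main__":
    # Hypothetical per-field scores for two docs with the same best match.
    doc_a = [0.8, 0.1]        # matches in two fields
    doc_b = [0.8, 0.3, 0.2]   # matches in three fields
    for tie in (0.0, 0.1, 1.0):
        print(tie, dismax_score(doc_a, tie), dismax_score(doc_b, tie))
```

With tie=0.0 both docs score the same; with any tie above zero the doc with more matching fields comes out ahead, and tie=1.0 turns the score into a plain sum over fields.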

Regards,
   Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 1 April 2016 at 14:03, Derek Poh  wrote:
> Hi
>
> Correct me if I am wrong: my understanding of the edismax parser is that it
> uses the max score of the field matches in a doc.
>
> How do I make docs with more matches rank higher with edismax?
>
> These 2 docs are from the same query result and this is their order in the
> result.
>
> P_ProductId: 1116393488
> P_CatConcatKeyword: Bancos del poder
> P_NewShortDescription: Accione el banco, 10,400mAh, 5.0V DC entran
> P_VeryShortDescription: Accione el banco
>
> score: 0.83850163
>
> P_ProductId: 1124048475
> P_CatConcatKeyword: Bancos del poder
> P_NewShortDescription: Banco del poder con el altavoz
> P_VeryShortDescription: Banco del poder
>
> score: 0.83850163
>
> q=Bancos del poder
> qf=P_CatConcatKeyword^3.0 P_NewShortDescription^2.0
> P_NewVeryShortDescription^1.0
>
> From the debug info, the max-scoring match for both docs comes from the
> P_CatConcatKeyword field. Debug info for both docs is attached.
> Comparing the field matches between the two, the 2nd doc has more fields
> with matches. How can I make the 2nd doc rank higher based on this?
>
>
>


Re: Performance potential for updating (reindexing) documents

2016-04-01 Thread Erick Erickson
Shawn:

bq: The bottleneck is definitely Solr.

Since you commented out the server.add(doclist), you're right to focus
there. I've seen a few things that help.

1> Batch the documents, i.e. in the doclist above the list should be on
the order of 1,000 docs. Here are some numbers I worked up one time:
https://lucidworks.com/blog/2015/10/05/really-batch-updates-solr-2/

2> If your Solr CPUs aren't running flat out, then adding threads until
they are being pretty well hammered is A Good Thing. Of course you have
to balance that off against anything else your servers are doing, like
serving queries.

3> Make sure you're using CloudSolrClient.

4> If you still need more throughput, use more shards.
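
Points 1> and 2> above (batches of roughly 1,000 docs, sent from several threads) can be sketched in a language-neutral way as below. This is not SolrJ and not CloudSolrClient; send_batch is a hypothetical callable supplied by the caller that posts one list of docs to Solr, and the batch size and thread count are just starting values to tune.

```python
from concurrent.futures import ThreadPoolExecutor


def batches(docs, size=1000):
    """Yield consecutive lists of at most `size` docs."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def index_all(docs, send_batch, batch_size=1000, threads=4):
    """Send docs in batches from a thread pool; return the number sent.

    send_batch is a caller-supplied (hypothetical) function that posts one
    list of docs to Solr. Executor.map submits all batches up front, so this
    simple version holds every batch in memory at once.
    """
    def send(batch):
        send_batch(batch)
        return len(batch)

    with ThreadPoolExecutor(max_workers=threads) as pool:
        return sum(pool.map(send, batches(docs, batch_size)))


if __name__ == "__main__":
    sent = []  # stand-in for a real Solr client; just records each batch
    total = index_all(({"id": i} for i in range(2500)), sent.append)
    print(total, sorted(len(b) for b in sent))
```

The thread count is the knob from point 2>: raise it until the Solr CPUs are busy, and back off if query traffic on the same servers starts to suffer.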

Best,
Erick

On Thu, Mar 31, 2016 at 6:39 PM, Shawn Heisey  wrote:
> On 3/24/2016 11:57 AM, tedsolr wrote:
>> My post was scant on details. The numbers I gave for collection sizes are
>> projections for the future. I am in the midst of an upgrade that will be
>> completed within a few weeks. My concern is that I may not be able to
>> produce the throughput necessary to index an entire collection quickly
>> enough (3 to 4 hours) for a large customer (100M docs).
>
> I can fully rebuild one of my indexes, with 146 million docs, in 8-10
> hours.  This is fairly inefficient indexing -- six large shards (not
> cloud), each one running the dataimport handler, importing from MySQL.
> I suspect I could probably get two or three times this rate (and maybe
> more) on the same hardware if I wrote a SolrJ application that uses
> multiple threads for each Solr shard.
>
> I know from experiments that the MySQL server can push over 100 million
> rows to a SolrJ program in less than an hour, including constructing
> SolrInputDocument objects.  That experiment just left out the
> "client.add(docs);" line.  The bottleneck is definitely Solr.
>
> Each machine holds three large shards (half the index), is running Solr
> 4.x (5.x upgrade is in the works), and has 64GB RAM with an 8GB heap.
> Each shard is approximately 24.4 million docs and 28GB.  These machines
> also hold another sharded index in the same Solr install, but it's quite
> a lot smaller.
>
> Thanks,
> Shawn
>


Re: Function Query Parsing problem in Solr 5.4.1 and Solr 5.5.0

2016-04-01 Thread Mikhail Khludnev
Hello Max,

Since it reports the position of the first space (pos=32), I advise removing
all spaces inside the parentheses of sum().
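
As a small sketch of that clean-up, the helper below strips all whitespace from the function part of a single sort clause and keeps the trailing asc/desc. The name compact_function_sort is made up, it handles only one clause (not a comma-separated list of sorts), and whether the compacted spec is accepted by Solr 5.x was not confirmed in this thread.

```python
import re


def compact_function_sort(sort_spec):
    """Remove whitespace inside the function part of one sort clause,
    keeping a single space before the trailing asc/desc."""
    m = re.match(r"^(.*\S)\s+(asc|desc)\s*$", sort_spec,
                 flags=re.IGNORECASE | re.DOTALL)
    if not m:
        raise ValueError("sort spec must end with asc or desc: %r" % sort_spec)
    func, direction = m.groups()
    return re.sub(r"\s+", "", func) + " " + direction


if __name__ == "__main__":
    spec = ("{!func}sum(product(0.01,param1), "
            "product(0.20,param2),  min(param2,0.4)) desc")
    print(compact_function_sort(spec))
```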

On Fri, Apr 1, 2016 at 7:40 PM, Max Bridgewater 
wrote:

> Hi,
>
> I have the following configuration for firstSearcher handler in
> solrconfig.xml:
>
>
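> <listener event="firstSearcher" class="solr.QuerySenderListener">
>   <arr name="queries">
>     <lst>
>       <str name="q">parts</str>
>       <str name="sort">score desc, Review1 asc, Rank2 asc</str>
>     </lst>
>     <lst>
>       <str name="q">make</str>
>       <str name="sort">{!func}sum(product(0.01,param1),
> product(0.20,param2),  min(param2,0.4)) desc</str>
>     </lst>
>   </arr>
> </listener>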
>
> This works great in Solr 4.10. However, in solr 5.4.1 and solr 5.5.0, I get
> the below error. How do I write this kind of query with Solr 5?
>
>
> Thanks,
> Max.
>
>
> ERROR org.apache.solr.handler.RequestHandlerBase  [   x:productsearch] –
> org.apache.solr.common.SolrException: Can't determine a Sort Order (asc or
> desc) in sort spec '{!func}sum(product(0.01,param1), product(0.20,param2),
> min(param2,0.4)) desc', pos=32
> at org.apache.solr.search.SortSpecParsing.parseSortSpec(SortSpecParsing.java:143)
> at org.apache.solr.search.QParser.getSort(QParser.java:247)
> at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:187)
> at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:247)
> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:156)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2073)
> at org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:69)
> at org.apache.solr.core.SolrCore$5.call(SolrCore.java:1840)
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics