Pivot facets - distributed search - request

2016-04-12 Thread Pablo
Hi,
Is there any way of requesting a limit of 10, ordered by a stat, within a
facet pivot?
I know that the JSON Facet component can do this, and it has a very
comprehensive API, but it has a consistency (refinement) problem when
querying across multiple shards.
And given that pivot facets support distributed search, I tried to make
a similar request, but couldn't find out how to do it.
Thanks in advance!
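
For reference, a sketch of the kind of JSON Facet request being alluded to
(field names are illustrative, not from this thread):

json.facet={
  categories : {
    type : terms,
    field : category,
    limit : 10,
    sort : "total desc",
    facet : { total : "sum(price)" }
  }
}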



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Pivot-facets-distributed-search-request-tp4269570.html
Sent from the Solr - User mailing list archive at Nabble.com.


Which line is solr following in terms of a BI Tool?

2016-04-12 Thread Pablo
Hello,
I think this topic is important for Solr users that are planning to use Solr
as a BI tool.
Speaking about facets, nowadays there are three major ways of doing (more or
less) the same thing in Solr.
First, you have the pivot facets; then there is the Analytics
component; and finally you have the JSON Facet API.
So, which line is Solr following? Which of these components is going to be in
constant development, and which one is going to be deprecated sooner?
On Yonik's page, there are some tests that show how the JSON Facet API performs
better than the legacy facets, and the API is also way simpler than pivot
facets, so in my case that was enough to base my solution around the JSON
API. But I would like to know what the Solr developers think.

Thanks! 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Which-line-is-solr-following-in-terms-of-a-BI-Tool-tp4269597.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Which line is solr following in terms of a BI Tool?

2016-04-13 Thread Pablo Anzorena
Thank you very much both of you for your insights!
I really appreciate it.



2016-04-13 11:30 GMT-03:00 Kevin Risden :

> For Solr 6, ParallelSQL and Solr JDBC driver are going to be developed more
> as well as JSON facets. The Solr JDBC driver that is in Solr 6 contains
> SOLR-8502. There are further improvements coming in SOLR-8659 that didn't
> make it into 6.0. The Solr JDBC piece leverages ParallelSQL and in some
> cases uses JSON facets under the hood.
>
> The Solr JDBC driver should enable BI tools to connect to Solr and use the
> language of SQL. This is also a familiar interface for many Java
> developers.
>
> Just a note: Solr is not an RDBMS and shouldn't be treated like one even
> with a JDBC driver. The Solr JDBC driver is more of a convenience for
> querying.
>
> Kevin Risden
>
> On Tue, Apr 12, 2016 at 6:24 PM, Erick Erickson 
> wrote:
>
> > The unsatisfactory answer is that they have different characteristics.
> >
> > The analytics contrib does not work in distributed mode. It's not
> > receiving a lot of love at this point.
> >
> > The JSON facets are estimations. Generally very close but are not
> > guaranteed to be 100% accurate. The variance, as I understand it,
> > is something on the order of < 1% in most cases.
> >
> > The pivot facets are accurate, but more expensive than the JSON
> > facets.
> >
> > And, to make matters worse, the ParallelSQL way of doing some
> > aggregations is going to give yet another approach.
> >
> > Best,
> > Erick
> >
> > On Tue, Apr 12, 2016 at 7:15 AM, Pablo  wrote:
> > > Hello,
> > > I think this topic is important for solr users that are planning to use
> > solr
> > > as a BI Tool.
> > > Speaking about facets, nowadays there are three majors way of doing
> > (more or
> > > less) the same  in solr.
> > > First, you have the pivot facets, on the other hand you have the
> > Analytics
> > > component and finally you have the JSON Facet Api.
> > > So, which line is Solr following? Which of these component is going to
> > be in
> > > constant development and which one is going to be deprecated sooner.
> > > In Yonik page, there are some test that shows how JSON Facet Api
> performs
> > > better than legacy facets, also the Api was way simpler than the pivot
> > > facets, so in my case that was enough to base my solution around the
> JSON
> > > Api. But I would like to know what are the thoughts of the solr
> > developers.
> > >
> > > Thanks!
> > >
> > >
> > >
> > > --
> > > View this message in context:
> >
> http://lucene.472066.n3.nabble.com/Which-line-is-solr-following-in-terms-of-a-BI-Tool-tp4269597.html
> > > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>
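
As a rough sketch of the Solr 6 JDBC usage Kevin describes above (the
ZooKeeper address, collection, and field names are placeholders, not from
this thread):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SolrJdbcExample {
    public static void main(String[] args) throws Exception {
        // The driver ships in solr-solrj; the URL points at the ZooKeeper ensemble.
        Class.forName("org.apache.solr.client.solrj.io.sql.DriverImpl");
        String url = "jdbc:solr://zk1:9983?collection=mycollection&aggregationMode=facet";
        try (Connection con = DriverManager.getConnection(url);
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "select fieldA, count(*) from mycollection group by fieldA")) {
            while (rs.next()) {
                System.out.println(rs.getString("fieldA") + " -> " + rs.getLong(2));
            }
        }
    }
}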


Create collections in SolrCloud "Could not get shard id for core"

2016-04-22 Thread Pablo Anzorena
Hey,
I'm using Solr 5.2.1 and yesterday I started migrating to SolrCloud, so I
might be quite new to it. The thing is that I could create 3
collections without much trouble, but when trying to create
another collection it throws a timeout, and the Solr log says "Could
not get shard id for core my_core_shard1_replica1".

My SolrCloud has three servers running zookeeper and two of them running
solr in cloud mode.
The command for creating the collection is:

http://localhost:8983/solr/admin/collections?action=CREATE&name=my_core&numShards=1&maxShardsPerNode=1&collection.configName=my_core_config&createNodeSet=server1:8983_solr

Thanks.


Lucene 5 Custom FieldComparator

2015-08-13 Thread Pablo Mincz
Hi,

I'm doing a migration from Lucene 3.6.1 to 5.2.1 and I have a custom
FieldComparator that sorts the results by available discounts. For this,
I first check that the date range is valid and then sort by the discount
amount.

I did this in Lucene 3.6.1, but now in version 5.2.1 the FieldComparator
has the method doSetNextReader, which takes a LeafReaderContext, and I do not
know how to read all the values of this field from the LeafReader because I
did not index the field with DocValues.

I tried with MultiFields but I got only one result instead of an array, and
some values are floats.
Someone know how to do this?

Thanks a lot for the help.

Regards,
Pablo.
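
Not an answer from the thread, but a minimal sketch of the direction usually
suggested for this migration, assuming the discount amount can be re-indexed
as a NumericDocValuesField (without re-indexing, wrapping the reader with
UninvertingReader is the usual fallback):

import java.io.IOException;
import org.apache.lucene.index.DocValues;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.NumericDocValues;
import org.apache.lucene.search.SimpleFieldComparator;

// Sorts by a long "discount" value read per segment from doc values.
public class DiscountComparator extends SimpleFieldComparator<Long> {
    private final long[] values;      // slot -> comparison value
    private long bottom;
    private long topValue;
    private NumericDocValues current; // doc values of the current segment

    public DiscountComparator(int numHits) {
        values = new long[numHits];
    }

    @Override
    protected void doSetNextReader(LeafReaderContext context) throws IOException {
        // Per-segment lookup replaces the global field access of 3.x.
        current = DocValues.getNumeric(context.reader(), "discount");
    }

    @Override public int compare(int slot1, int slot2) { return Long.compare(values[slot1], values[slot2]); }
    @Override public void setBottom(int slot) { bottom = values[slot]; }
    @Override public void setTopValue(Long value) { topValue = value; }
    @Override public int compareBottom(int doc) { return Long.compare(bottom, current.get(doc)); }
    @Override public int compareTop(int doc) { return Long.compare(topValue, current.get(doc)); }
    @Override public void copy(int slot, int doc) { values[slot] = current.get(doc); }
    @Override public Long value(int slot) { return values[slot]; }
}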


Lucene 5.2.1 Spatial Strategy PointVectorStrategy

2015-08-19 Thread Pablo Mincz
Hi,

I'm implementing sorting of search results by distance with a PointVectorStrategy.
In the indexing process I used createIndexableFields from the strategy
and makePoint from the GEO context.

But when I sort the search I get the error:
Java::JavaLang::IllegalStateException: unexpected docvalues type NONE
for field 'location__x' (expected=NUMERIC)

And from what I can see, it is impossible to use a specific FieldType with
DocValuesType NUMERIC.

Someone know how to fix this?

Thanks a lot for the help.

Regards,
Pablo.
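
One commonly suggested workaround for this error, sketched under the
assumption that you open the IndexSearcher yourself (field names follow the
error message above; UninvertingReader lives in the lucene-misc module):

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.Directory;
import org.apache.lucene.uninverting.UninvertingReader;

public static IndexSearcher openDistanceSortableSearcher(Directory dir) throws IOException {
    // PointVectorStrategy indexes plain numeric fields without doc values,
    // so sorting by the distance ValueSource needs an uninverting view.
    Map<String, UninvertingReader.Type> mapping = new HashMap<>();
    mapping.put("location__x", UninvertingReader.Type.DOUBLE);
    mapping.put("location__y", UninvertingReader.Type.DOUBLE);
    DirectoryReader reader = UninvertingReader.wrap(DirectoryReader.open(dir), mapping);
    return new IndexSearcher(reader);
}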


FieldCacheTermsFilter Replacement in Lucene 5.2.1

2015-08-21 Thread Pablo Mincz
Hi,

I'm doing a migration from Lucene 3.6.1 to 5.2.1, and from what I can see,
FieldCacheTermsFilter no longer exists.

Is there any replacement that keeps the same functionality?

Thanks for the help.

Regards,
Pablo.
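
For what it's worth, one possible stand-in in 5.x is the TermsQuery from the
lucene-queries module, sketched here with illustrative field and values (note
that it matches via the inverted index rather than the old field cache):

import org.apache.lucene.index.Term;
import org.apache.lucene.queries.TermsQuery;
import org.apache.lucene.search.Query;

// Matches documents whose "id" field contains any of the given terms.
Query idFilter = new TermsQuery(
        new Term("id", "1"), new Term("id", "7"), new Term("id", "42"));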


OutOfMemoryError does not fire the script

2016-05-27 Thread Pablo Anzorena
Hello,

I am using Solr 5.2.1 in cloud mode. My JVM argument for the
OutOfMemoryError is
-XX:OnOutOfMemoryError='/etc/init.d/solrcloud;restart'

In the Solr UI, the event shows as being fired, but nothing happens.

What am I missing?

Regards.


Re: OutOfMemoryError does not fire the script

2016-05-27 Thread Pablo Anzorena
Perfect, thank you very much.

2016-05-27 12:44 GMT-03:00 Shawn Heisey :

> On 5/27/2016 7:05 AM, Pablo Anzorena wrote:
> > I am using solr 5.2.1 in cloud mode. My jvm arguments for the
> > OutOfMemoryError is
> > -XX:OnOutOfMemoryError='/etc/init.d/solrcloud;restart'
> >
> > In the Solr UI, the event is beign fired, but nothing happens.
>
> In all versions before 5.5.1, that -XX parameter is placed incorrectly
> on the commandline, in the bin/solr script, so it doesn't work.
>
> https://issues.apache.org/jira/browse/SOLR-8145
>
> If you move the parameter so it comes before "-jar start.jar" on the
> commandline (see the newest patch attached to SOLR-8145) and restart
> Solr, it will fix this problem.
>
> Thanks,
> Shawn
>
>
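
For reference, a sketch of the corrected ordering Shawn describes (heap
settings are illustrative). Note also that the command given to
-XX:OnOutOfMemoryError takes its arguments after a space, while ';' separates
multiple commands, so '/etc/init.d/solrcloud;restart' would run "restart" as
a second, separate command:

java -Xms2g -Xmx2g -XX:OnOutOfMemoryError="/etc/init.d/solrcloud restart" -jar start.jar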


SolrCloud SolrNode stopping randomly for no reason

2016-06-07 Thread Pablo Anzorena
Hey,

I'm using SolrCloud with two nodes (5.2.1). Once or twice a day
node1 stops for no reason. I checked the logs but no errors are being
logged.
I also have a standalone Solr service running in production on both nodes
(that's because we are doing the migration to SolrCloud).
Thanks.


Re: SolrCloud SolrNode stopping randomly for no reason

2016-06-07 Thread Pablo Anzorena
Sorry for the poor details, but I didn't post the log files because there
was nothing out of the ordinary in the solr.log file, nor in
solr-8984-console.log, nor in solr_gc.log.

Which log do you want me to show you? solr.log.1 (which I think should be
the most recent one), for example? Do you need the tail or the head of the file?

When I say "stopping for no reason" I mean the service is not running
anymore; the process is gone. I tried killing it with the kill -9 command
and that is not logged either. My first thought was that I had restarted the
standalone Solr service, which tries to stop the service and, if it can't,
kills it with something like SOLR_PROCESS_ID=$(ps -eaf | grep -v grep | grep
"start.jar" | awk '{print $2}'); kill -9 ${SOLR_PROCESS_ID}. So sometimes it
could kill SolrCloud instead of the standalone instance, but sometimes the
datetimes do not match.
Another option is that it hit an OutOfMemoryError and the OOM script
killed the process, but again I saw nothing in solr_gc.log.

2016-06-07 10:18 GMT-03:00 Shawn Heisey :

> On 6/7/2016 6:08 AM, Pablo Anzorena wrote:
> > I'am using SolrCloud with two nodes (5.2.1). One or two times a day the
> > node1 is stopping for no reason. I checked the logs but no errors are
> beign
> > logged.
> > I also have a standalone solr service in both nodes running in production
> > (we are doing the migration to SolrCloud, that's why).
>
> https://wiki.apache.org/solr/UsingMailingLists
>
> There are no real details to your message.  What precisely does
> "stopping for no reason" mean?  What does Solr *do*?  We cannot see your
> system, you must tell us what is happening with considerable detail.
>
> It seems highly unlikely that Solr would misbehave without logging
> *something*.  Are you looking at the Logging tab in the admin UI, or the
> actual solr.log file?  The solr.log file is the only reliable place to
> look.  When you restart Solr, the current logfile is renamed and a new
> solr.log will be created.
>
> Thanks,
> Shawn
>
>


No live SolrServers triggered by maxclausecount

2016-06-27 Thread Pablo Anzorena
Hi,

I have a ZooKeeper ensemble consisting of 3 machines, and 2 machines with
SolrCloud.

With high frequency I see in the logging:
*No live SolrServers available to handle this
request:[http://solr2:8983/solr/usa_bills_imp_2016_2016062300_shard1_replica1
,
http://solr3:8983/solr/usa_bills_imp_2016_2016062300_shard1_replica2
]*

and the state.json is:

{"usa_bills_imp_2016_2016062300":{
"replicationFactor":"2",
"shards":{"shard1":{
"range":"8000-7fff",
"state":"active",
"replicas":{
  "core_node1":{
"core":"usa_bills_imp_2016_2016062300_shard1_replica2",
"base_url":"http://solr3:8983/solr";,
"node_name":"solr3:8983_solr",
"state":"active"},
  "core_node2":{
"core":"usa_bills_imp_2016_2016062300_shard1_replica1",
"base_url":"http://solr2:8983/solr";,
"node_name":"solr2:8983_solr",
"state":"active",
"leader":"true",
"router":{"name":"compositeId"},
"maxShardsPerNode":"1",
"autoAddReplicas":"false"}}
*And the full stacktrace of the error is:*

null:org.apache.solr.common.SolrException:
org.apache.solr.client.solrj.SolrServerException: No live SolrServers
available to handle this
request:[http://solr3:8983/solr/usa_bills_imp_2014_20160603115528_shard1_replica2,
http://solr3:8983/solr/usa_bills_imp_2016_2016062300_shard1_replica2,
http://solr2:8983/solr/usa_bills_imp_2014_20160603115528_shard1_replica1,
http://solr2:8983/solr/usa_bills_imp_2015_20160610125230_shard1_replica1]
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:375)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:497)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at 
org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.solr.client.solrj.SolrServerException: No live
SolrServers available to handle this
request:[http://solr3:8983/solr/usa_bills_imp_2014_20160603115528_shard1_replica2,
http://solr3:8983/solr/usa_bills_imp_2016_2016062300_shard1_replica2,
http://solr2:8983/solr/usa_bills_imp_2014_20160603115528_shard1_replica1,
http://solr2:8983/solr/usa_bills_imp_2015_20160610125230_shard1_replica1]
at 
org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:355)
at 
org.apache.solr.handler.component.HttpShardHandlerFactory.makeLoadBalancedRequest(HttpShardHandlerFactory.java:246)
at 
org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:221)
at 
org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:183)
at java.util.concurrent.Future

Re: No live SolrServers triggered by maxclausecount

2016-06-28 Thread Pablo Anzorena
Hi Erick, thanks for answering.

I attached the image to the body so you can see it.



Why do I need so many clauses?
It's because I have two text fields that contain on average 25 words, with
a lot of typos (which I'm not cleaning up), and on top of that the index
consists of 25 million records. And we let the users make queries with
phrases and with prefix and suffix wildcards. So, for example, the following
query is valid: *q=text_field:"*ban*" AND text_field2:"foo* bar*"* (I omitted
the query parser syntax that allows this).

2016-06-28 2:08 GMT-03:00 Erick Erickson :

> That error sometimes gets reported inappropriately, as long as the
> servers are live
> you can pretty much ignore it.
>
> Attachments pretty much all get stripped by the mail server so we can't
> see your
> screen shot.
>
> Setting your max clause count to over 100K is pretty much an
> anti-pattern, what in the world
> are you doing that would require it to be that high? You haven't
> really shown us the query
> you're sending, but I bet it's amazing. Frankly, anything over the
> default of 1K is suspect.
>
> If this is some clause like id:(1 OR 2 OR 3 OR 4...) you really
> want to try using the
> TermsQueryParser (note the 's'  as opposed to TermQueryParser (no 's').
> See:
>
> https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-TermQueryParser
>
> And if you use the TermsQueryParser, it's probably a good idea to sort
> the list of terms, it's
> more efficient.
>
> Or do a join or... because anything with that many terms will be
> rather slow to say the least.
>
> Best,
> Erick
>
>
> On Mon, Jun 27, 2016 at 8:38 AM, Pablo Anzorena 
> wrote:
> > Hi,
> >
> > I have an ensemble zookeeper consisting of 3 machines and 2 machines with
> > solrcloud.
> >
> > With a high frequency I see in the logging:
> > No live SolrServers available to handle this
> > request:[
> http://solr2:8983/solr/usa_bills_imp_2016_2016062300_shard1_replica1,
> > http://solr3:8983/solr/usa_bills_imp_2016_2016062300_shard1_replica2
> ]
> >
> > and the state.json is:
> >
> > {"usa_bills_imp_2016_2016062300":{
> > "replicationFactor":"2",
> > "shards":{"shard1":{
> > "range":"8000-7fff",
> > "state":"active",
> > "replicas":{
> >   "core_node1":{
> > "core":"usa_bills_imp_2016_2016062300_shard1_replica2",
> > "base_url":"http://solr3:8983/solr";,
> > "node_name":"solr3:8983_solr",
> > "state":"active"},
> >   "core_node2":{
> > "core":"usa_bills_imp_2016_2016062300_shard1_replica1",
> > "base_url":"http://solr2:8983/solr";,
> > "node_name":"solr2:8983_solr",
> > "state":"active",
> > "leader":"true",
> > "router":{"name":"compositeId"},
> > "maxShardsPerNode":"1",
> > "autoAddReplicas":"false"}}
> >
> > And the full stacktrace of the error is:
> >
> > null:org.apache.solr.common.SolrException:
> > org.apache.solr.client.solrj.SolrServerException: No live SolrServers
> > available to handle this
> > request:[
> http://solr3:8983/solr/usa_bills_imp_2014_20160603115528_shard1_replica2,
> > http://solr3:8983/solr/usa_bills_imp_2016_2016062300_shard1_replica2
> ,
> > http://solr2:8983/solr/usa_bills_imp_2014_20160603115528_shard1_replica1
> ,
> > http://solr2:8983/solr/usa_bills_imp_2015_20160610125230_shard1_replica1
> ]
> >   at
> >
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:375)
> >   at
> >
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
> >   at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
> >   at
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
> >   at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)
> >   at
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
> >   at
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)

Re: No live SolrServers triggered by maxclausecount

2016-06-28 Thread Pablo Anzorena
Thanks! I will analyze it and let you know.


2016-06-28 11:25 GMT-03:00 Erick Erickson :

> OK, but the consider what you can do to keep from having to
> create such things.
> 1> for infix notation (leading and trailing wildcards) you can
> use ngrams to turn them into simple queries. These are
> performance-killers.
> 2> Use reverseWildcardFactory to deal with leading wildcards
> 3> restrict the wildcard search to at least two "real" characters.
>  I can (and have) argued that a* is not useful to a user at all.
>
> The point is that your responsiveness will suffer if you come
> anywhere near 100K OR clauses. Maybe that's OK, but
> it's lurking out there.
>
>
> And no I can't see the image, the Apache server is pretty
> aggressive about stripping those too.
>
> Best,
> Erick
>
> On Tue, Jun 28, 2016 at 7:19 AM, Pablo Anzorena 
> wrote:
>
> > Hi Erick, thanks for answering.
> >
> > I attached the image to the body so you can see it.
> >
> >
> >
> > Why do I need so many clauses?
> > It is because I have two text fields that contains in average 25 words
> > with a lot of typos (which I'm not cleaning it) and on top of that the
> > index consists of 25 million records. And we let the users make queries
> > with phrases, wildcards prefix and suffix. So for example the following
> > query is valid *q=text_field:"*ban*" AND text_field2:"foo* bar*"* (I
> > ommited the query parser syntax that allows this).
> > ​
> >
> > 2016-06-28 2:08 GMT-03:00 Erick Erickson :
> >
> >> That error sometimes gets reported inappropriately, as long as the
> >> servers are live
> >> you can pretty much ignore it.
> >>
> >> Attachments pretty much all get stripped by the mail server so we can't
> >> see your
> >> screen shot.
> >>
> >> Setting your max clause count to over 100K is pretty much an
> >> anti-pattern, what in the world
> >> are you doing that would require it to be that high? You haven't
> >> really shown us the query
> >> you're sending, but I bet it's amazing. Frankly, anything over the
> >> default of 1K is suspect.
> >>
> >> If this is some clause like id:(1 OR 2 OR 3 OR 4...) you really
> >> want to try using the
> >> TermsQueryParser (note the 's'  as opposed to TermQueryParser (no 's').
> >> See:
> >>
> >>
> https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-TermQueryParser
> >>
> >> And if you use the TermsQueryParser, it's probably a good idea to sort
> >> the list of terms, it's
> >> more efficient.
> >>
> >> Or do a join or... because anything with that many terms will be
> >> rather slow to say the least.
> >>
> >> Best,
> >> Erick
> >>
> >>
> >> On Mon, Jun 27, 2016 at 8:38 AM, Pablo Anzorena <
> anzorena.f...@gmail.com>
> >> wrote:
> >> > Hi,
> >> >
> >> > I have an ensemble zookeeper consisting of 3 machines and 2 machines
> >> with
> >> > solrcloud.
> >> >
> >> > With a high frequency I see in the logging:
> >> > No live SolrServers available to handle this
> >> > request:[
> >>
> http://solr2:8983/solr/usa_bills_imp_2016_2016062300_shard1_replica1,
> >> >
> >>
> http://solr3:8983/solr/usa_bills_imp_2016_2016062300_shard1_replica2]
> >> >
> >> > and the state.json is:
> >> >
> >> > {"usa_bills_imp_2016_2016062300":{
> >> > "replicationFactor":"2",
> >> > "shards":{"shard1":{
> >> > "range":"8000-7fff",
> >> > "state":"active",
> >> > "replicas":{
> >> >   "core_node1":{
> >> >
>  "core":"usa_bills_imp_2016_2016062300_shard1_replica2",
> >> > "base_url":"http://solr3:8983/solr";,
> >> > "node_name":"solr3:8983_solr",
> >> > "state":"active"},
> >> >   "core_node2":{
> >> >
>  "core":"usa_bills_imp_2016_2016062300_shard1_replica1",
> >> > "base_ur

Re: Facet in SOLR Cloud vs Core

2016-07-07 Thread Pablo Anzorena
As long as you don't shard your index, you will have no problem migrating
to solrcloud.

The problem with the shards appears in the following scenario (note that
the problem below also applies in a Solr standalone environment with
distributed search):

Shard1: DATA_SOURCE1 (3 docs), DATA_SOURCE2 (2 docs), DATA_SOURCE3 (2 docs).
Shard2: DATA_SOURCE3 (2 docs), DATA_SOURCE2 (1 doc).

If you make a distributed search across these two shards, faceting on
dataSourceName with a limit of 1, it will ask for the top 1 in the first
shard (DATA_SOURCE1, 3 docs) and for the top 1 in the second shard
(DATA_SOURCE3, 2 docs). After that it will merge the results and return
DATA_SOURCE1 (3 docs), when it should have returned DATA_SOURCE3 (4 docs).

Summarizing: if you make a distributed search with a facet.limit, there is
a chance that the count is not correct (this also applies to stats).

2016-07-07 15:28 GMT-03:00 Whelan, Andy :

> Hello,
>
> I am somewhat of a novice when it comes to using SOLR in a
> distributed SolrCloud environment. My team and I are doing development work
> with a SOLR core. We will shortly be transitioning over to a SolrCloud
> environment.
>
> My question specifically has to do with Facets in a SOLR cloud/collection
> (distributed environment). The core I am working with has a field
> "dataSourceName" defined as following in its schema.xml file.
>
> <field name="dataSourceName" type="string" indexed="true" stored="true" required="true"/>
>
> I am using the following facet query, which works fine in my core-based
> index
>
>
> http://localhost:8983/solr/gamra/select?q=*:*&rows=0&facet=true&facet.field=dataSourceName
>
> It returns counts for each distinct dataSourceName as follows (which is
> the desired behavior).
>
> 
>
>   169
>   121
>   68
>
> 
>
> I am wondering if this should work fine in the SOLR Cloud as well?  Will
> this method give me accurate counts out of the box in a SOLR Cloud
> configuration?
>
> Thanks
> -Andrew
>
> PS: The reason I ask is because I know there is some estimating performed
> in certain cases for the Facet "unique" function (as is outlined here:
> http://yonik.com/solr-count-distinct/ ). So I guess I am wondering why
> folks wouldn't just do what I have done vs going through the trouble of
> using the unique(dataSourceName) function?
>
>
>


Re: Facet in SOLR Cloud vs Core

2016-07-07 Thread Pablo Anzorena
Sorry for introducing bad information.
Because it happens in the JSON Facet API, I thought it would also happen in
the facets. Sorry again for the misunderstanding.

2016-07-07 16:08 GMT-03:00 Chris Hostetter :

>
> : The problem with the shards appears in the following scenario (note that
> : the problem below also applies in a solr standalone enviroment with
> : distributed search):
> :
> : Shard1: DATA_SOURCE1 (3 docs), DATA_SOURCE2 (2 docs), DATA_SOURCE3 (2
> docs).
> : Shard2: DATA_SOURCE3 (2 docs), DATA_SOURCE2 (1 docs).
> :
> : If you make a distributed search across these two shards, faceting
> : dataSourceName with a limit of 1, it will ask for the top 1 in the first
> : shard (DATA_SOURCE1 (3 docs)) and for the top 1 in the second shard
> : (DATA_SOURCE3
> : (2 docs)). After that it will merge the results and return DATA_SOURCE1
> (3
> : docs), when it should have return DATA_SOURCE3 (4 docs).
>
> That's completely false.
>
> a) in the first pass, even if you ask for "top 1" (ie: facet.limit=1) solr
> will overrequest when comunicating with each shard (the amount of
> overrequest is a function of your facet.limit, so as facet.limit increases
> so does the overrequest amount)
>
> b) if *any* (but not *all*) shards returns DATA_SOURCE3 from the
> initial shard request, a second "refinement" step will request the count
> for DATA_SOURCE3 from all of the other shards to get an accurate count,
> and to accurately sort DATA_SOURCE3 to the top of the facet constraint
> list.
>
>
> -Hoss
> http://www.lucidworks.com/
>


Is SolrCloudClient Singleton Pattern possible with multiple collections?

2016-07-14 Thread Pablo Anzorena
Hey,
So the question is quite simple: is it possible to use the singleton pattern
with the SolrCloudClient instantiation and then reuse that instance to handle
multiple concurrent requests accessing different collections?

Thanks.
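
A minimal sketch of the pattern being asked about (the ZooKeeper addresses
are illustrative; per the SolrJ javadocs the client is thread-safe, so one
shared instance with a per-request collection is the usual approach in the
5.x/6.x API):

import java.io.IOException;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.SolrParams;

public final class SharedSolrClient {
    // One client for the whole JVM, shared across threads and collections.
    private static final CloudSolrClient CLIENT =
            new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181");

    private SharedSolrClient() {}

    public static QueryResponse query(String collection, SolrParams params)
            throws SolrServerException, IOException {
        // Collection chosen per request instead of via setDefaultCollection().
        return CLIENT.query(collection, params);
    }
}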


Re: Is SolrCloudClient Singleton Pattern possible with multiple collections?

2016-07-14 Thread Pablo Anzorena
I was using
public QueryResponse query(ModifiableSolrParams params, METHOD method)

And my actual code is parsing that object. I can change it to your method,
but before that let me ask you if using the param "collection" is the same.

Actually I am using the param "collection" only when I need to query
multiple collections.

Thanks.



2016-07-14 14:15 GMT-03:00 Erick Erickson :

> Just use the
>
> public NamedList request(SolrRequest request, String collection)
>
> method on the SolrCloudClient?
>
> Best,
> Erick
>
> On Thu, Jul 14, 2016 at 9:18 AM, Pablo Anzorena 
> wrote:
> > Hey,
> > So the question is quite simple, Is it possible to use Singleton Pattern
> > with SolrCloudClient instantiation and then reuse that instance to handle
> > multiple requests concurrently accessing differente collections?
> >
> > Thanks.
>


Re: Is SolrCloudClient Singleton Pattern possible with multiple collections?

2016-07-14 Thread Pablo Anzorena
The thing is that back in Solr 4.8, when I was using Solr standalone and I
had to make a distributed query among multiple shards, I found that for
each shard in the "shards" param it makes a request (which I know is the
correct behaviour), but when I put just one shard in the "shards" param it
makes two identical requests.
So now, because I'm using SolrCloud, I replaced "shards" with the
"collection" param, and I was wondering if it would have the same erratic
behaviour.

Now I tried and I found that it has the correct behaviour.

Thanks, and sorry for asking before testing it.

2016-07-14 15:26 GMT-03:00 Erick Erickson :

> bq:  if using the param "collection" is the same
>
> Did you just try it? If so what happened?
>
> Not sure what you're asking here. It's the name of the
> collection you want to query against. It's only
> necessary when you want to go against a
> collection that isn't the default which you can set with
> setDefaultCollection()
>
> Best,
> Erick
>
> On Thu, Jul 14, 2016 at 10:51 AM, Pablo Anzorena
>  wrote:
> > I was using
> > public QueryResponse query(ModifiableSolrParams params, METHOD method)
> >
> > And my actual code is parsing that object. I can change it to your
> method,
> > but before that let me ask you if using the param "collection" is the
> same.
> >
> > Actually I am using the param "collection" only when I need to request to
> > multiple collections.
> >
> > Thanks.
> >
> >
> >
> > 2016-07-14 14:15 GMT-03:00 Erick Erickson :
> >
> >> Just use the
> >>
> >> public NamedList request(SolrRequest request, String collection)
> >>
> >> method on the SolrCloudClient?
> >>
> >> Best,
> >> Erick
> >>
> >> On Thu, Jul 14, 2016 at 9:18 AM, Pablo Anzorena <
> anzorena.f...@gmail.com>
> >> wrote:
> >> > Hey,
> >> > So the question is quite simple, Is it possible to use Singleton
> Pattern
> >> > with SolrCloudClient instantiation and then reuse that instance to
> handle
> >> > multiple requests concurrently accessing differente collections?
> >> >
> >> > Thanks.
> >>
>


Parallel SQL where exists, in

2016-07-19 Thread Pablo Anzorena
Hey,

Is anyone willing to add the where exists and in clauses to Parallel SQL?

Thanks.


Re: Parallel SQL where exists, in

2016-07-20 Thread Pablo Anzorena
I don't have time now, but I'll keep that in mind.

Thanks.

2016-07-19 12:27 GMT-03:00 Erick Erickson :

> There is a lot of activity in the ParallelSQL world, all being done
> by a very few people, so it's a matter of priorities. Can you
> consider submitting a patch?
>
> Best,
> Erick
>
> On Tue, Jul 19, 2016 at 8:12 AM, Pablo Anzorena 
> wrote:
> > Hey,
> >
> > Is anyone willing to add the where exists and in clauses into paraller
> sql?
> >
> > Thanks.
>


Re: Group and sum in SOLR 5.3

2016-07-29 Thread Pablo Anzorena
Using the JSON Facet API, this is quite easy.
Check out: http://yonik.com/json-facet-api/#TermsFacet

http://solr:8983/solr/your_collection/select?q=*:*&wt=json&indent=true&json.facet=
{
  property_codes_group_by : {
type : terms,
field : property_code,
facet : { sum_price : "sum(price)" }
  }
}


2016-07-29 7:47 GMT-03:00 andreap21 :

> Hi,
> is there any way in SOLR 5.3 to achieve grouping and do mathematical
> operations like sum, avg on a certain document property?
>
> Please find below example:
>
> 
>   HIKS
>   Hotel Holiday Inn King's cross
>   4
>   40.99
> 
>
> 
>   HIKS
>   Hotel Holiday Inn King's cross
>   4
>   40.99
> 
>
> 
>   HIKS
>   Hotel Holiday Inn King's cross
>   4
>   40.99
> 
>
> We would need to group by the property_code (HIKS) and get the sum of all
> the prices from the resulting group.
>
>
> Thanks,
> Andrea
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Group-and-sum-in-SOLR-5-3-tp4289556.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Does solr support two phase commit or any other distributed transaction protocol?

2016-08-09 Thread Pablo Anzorena
That's it.

Thanks.


Re: Does solr support two phase commit or any other distributed transaction protocol?

2016-08-09 Thread Pablo Anzorena
Thanks Shawn, I understood perfectly well. One important thing in my use
case is that I only have one entry point for indexing Solr, so I won't have
any problems with multiple threads trying to update the index.

So what can I do if I have to index in solr and also in postgres and I need
to do it transactionally?

I imagine something like (sketched in code below):
1) Open a distributed transaction in PostgreSQL and "index" the data with
the global transaction id.
1.1) If some problem occurs, roll back Postgres. End of transaction.
2) Index the data in Solr. If no problem occurs, commit in Solr and then
commit in Postgres. End of transaction.
2.1) If some problem occurs in Solr, roll back Solr and roll back Postgres.
End of transaction.
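
A compressed sketch of that flow (the helpers insertRows and toSolrDocs are
hypothetical, "solr" is assumed to be a SolrClient, and this is not a true
two-phase commit: Solr's rollback is global, only safe with a single writer,
and not supported under SolrCloud):

import java.sql.Connection;
import java.sql.DriverManager;

try (Connection pg = DriverManager.getConnection("jdbc:postgresql://dbhost/mydb")) {
    pg.setAutoCommit(false);
    try {
        insertRows(pg, docs);                        // 1) write rows, uncommitted (hypothetical helper)
        solr.add("mycollection", toSolrDocs(docs));  // 2) send docs to Solr (hypothetical helper)
        solr.commit("mycollection");                 //    commit Solr first...
        pg.commit();                                 //    ...then Postgres
    } catch (Exception e) {
        solr.rollback("mycollection");               // 2.1) drop uncommitted Solr adds
        pg.rollback();                               //      and the Postgres transaction
        throw e;                                     // note: a failure between the two
    }                                                // commits still leaves a window
}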

2016-08-09 11:24 GMT-03:00 Shawn Heisey :

> On 8/9/2016 7:55 AM, Pablo Anzorena wrote:
> > That's it. Thanks.
>
> Solr doesn't support transactions in the way that most people with a
> database background imagine them.
>
> With a typical database server, all changes to the database that happen
> on a single DB connection can be committed or rolled back completely
> independently from updates that happen on other DB connections.
>
> Solr doesn't work this way.
>
> In a Lucene index (Solr is a Lucene program), a "transaction" is all
> updates made since the last commit with openSearcher=true.  This
> includes ALL updates made, regardless of where they came from.  So if
> you have a dozen different threads/processes making changes to your Solr
> index, then have something do a commit, all of the updates made by those
> 12 sources before the commit will be committed.  There is no concept of
> an individual transaction.
>
> Adding the DB transaction model would be a *major* development effort,
> and there's a good chance that adding it would destroy the blazing
> search performance that Solr and Lucene are known for.
>
> Thanks,
> Shawn
>
>


Consume sql response using solrj

2016-08-11 Thread Pablo Anzorena
Hey,

I'm trying to get the response of solr via QueryResponse using
QueryResponse queryResponse = client.query(solrParams); (where client is a
CloudSolrClient)

The error it throws is:

org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
from server at http://tywin:8983/solr/testcollection1_shard1_replica1:
Expected mime type application/octet-stream but got text/plain.
{"result-set":{"docs":[
{"count(*)":5304,"d1":2},
{"count(*)":5160,"d1":1},
{"count(*)":5016,"d1":3},
{"count(*)":4893,"d1":4},
{"count(*)":4824,"d1":5},
{"EOF":true,"RESPONSE_TIME":11}]}}
at
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:558)

Then I tried to implement a custom ResponseParser that overrides
getContentType() and returns "text/plain", but that returns another error.

So... is there a way to get the SQL response via this method?

I made it work via Connection and ResultSet, but I need to use the other
way (if possible).

Thanks!


Re: Consume sql response using solrj

2016-08-11 Thread Pablo Anzorena
Excellent!

Thanks Joel

2016-08-11 11:19 GMT-03:00 Joel Bernstein :

> There are two ways to do this with SolrJ:
>
> 1) Use the JDBC driver.
>
> 2) Use the SolrStream to send the request and then read() the Tuples. This
> is what the JDBC driver does under the covers. The sample code can be found
> here:
> https://github.com/apache/lucene-solr/blob/master/solr/
> solrj/src/java/org/apache/solr/client/solrj/io/sql/StatementImpl.java
>
> The constructStream() method creates a SolrStream with the request.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Thu, Aug 11, 2016 at 10:05 AM, Pablo Anzorena 
> wrote:
>
> > Hey,
> >
> > I'm trying to get the response of solr via QueryResponse using
> > QueryResponse queryResponse = client.query(solrParams); (where client is
> a
> > CloudSolrClient)
> >
> > The error it thows is:
> >
> > org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
> > Error
> > from server at http://tywin:8983/solr/testcollection1_shard1_replica1:
> > Expected mime type application/octet-stream but got text/plain.
> > {"result-set":{"docs":[
> > {"count(*)":5304,"d1":2},
> > {"count(*)":5160,"d1":1},
> > {"count(*)":5016,"d1":3},
> > {"count(*)":4893,"d1":4},
> > {"count(*)":4824,"d1":5},
> > {"EOF":true,"RESPONSE_TIME":11}]}}
> > at
> > org.apache.solr.client.solrj.impl.HttpSolrClient.
> > executeMethod(HttpSolrClient.java:558)
> >
> > Then I tryed to implement a custom ResponseParser that override the
> > getContentType() and returns "text/plain", but it returns another error.
> >
> > So... Is it a way to get the sql response via this method?
> >
> > I make it works via Connection and ResultSets, but I need to use the
> other
> > way (if possible).
> >
> > Thanks!
> >
>
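
And a sketch of option 2, the SolrStream route Joel describes (the URL and
SQL mirror the thread; the exact constructor and parameter names moved around
during 6.x, so treat the details as assumptions):

import org.apache.solr.client.solrj.io.Tuple;
import org.apache.solr.client.solrj.io.stream.SolrStream;
import org.apache.solr.common.params.CommonParams;
import org.apache.solr.common.params.ModifiableSolrParams;

ModifiableSolrParams params = new ModifiableSolrParams();
params.set(CommonParams.QT, "/sql");  // route the request to the SQL handler
params.set("stmt", "select d1, count(*) from testcollection1 group by d1");
params.set("aggregationMode", "facet");

SolrStream stream = new SolrStream("http://tywin:8983/solr/testcollection1", params);
try {
    stream.open();
    Tuple tuple = stream.read();
    while (!tuple.EOF) {                 // the final tuple carries EOF=true
        System.out.println(tuple.get("d1") + " -> " + tuple.get("count(*)"));
        tuple = stream.read();
    }
} finally {
    stream.close();
}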


Re: Consume sql response using solrj

2016-08-11 Thread Pablo Anzorena
Joel, one more thing.

Is there any way to use both SQL and the Lucene query syntax? The thing is
that my business application is tightly coupled to the Lucene query
syntax, so I need a way to use both the SQL features (without the where
clause) and the Lucene query syntax.

Thanks.

2016-08-11 11:40 GMT-03:00 Pablo Anzorena :

> Excellent!
>
> Thanks Joel
>
> 2016-08-11 11:19 GMT-03:00 Joel Bernstein :
>
>> There are two ways to do this with SolrJ:
>>
>> 1) Use the JDBC driver.
>>
>> 2) Use the SolrStream to send the request and then read() the Tuples. This
>> is what the JDBC driver does under the covers. The sample code can be
>> found
>> here:
>> https://github.com/apache/lucene-solr/blob/master/solr/solrj
>> /src/java/org/apache/solr/client/solrj/io/sql/StatementImpl.java
>>
>> The constructStream() method creates a SolrStream with the request.
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>> On Thu, Aug 11, 2016 at 10:05 AM, Pablo Anzorena > >
>> wrote:
>>
>> > Hey,
>> >
>> > I'm trying to get the response of solr via QueryResponse using
>> > QueryResponse queryResponse = client.query(solrParams); (where client
>> is a
>> > CloudSolrClient)
>> >
>> > The error it thows is:
>> >
>> > org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
>> > Error
>> > from server at http://tywin:8983/solr/testcollection1_shard1_replica1:
>> > Expected mime type application/octet-stream but got text/plain.
>> > {"result-set":{"docs":[
>> > {"count(*)":5304,"d1":2},
>> > {"count(*)":5160,"d1":1},
>> > {"count(*)":5016,"d1":3},
>> > {"count(*)":4893,"d1":4},
>> > {"count(*)":4824,"d1":5},
>> > {"EOF":true,"RESPONSE_TIME":11}]}}
>> > at
>> > org.apache.solr.client.solrj.impl.HttpSolrClient.
>> > executeMethod(HttpSolrClient.java:558)
>> >
>> > Then I tryed to implement a custom ResponseParser that override the
>> > getContentType() and returns "text/plain", but it returns another error.
>> >
>> > So... Is it a way to get the sql response via this method?
>> >
>> > I make it works via Connection and ResultSets, but I need to use the
>> other
>> > way (if possible).
>> >
>> > Thanks!
>> >
>>
>
>


Re: Consume sql response using solrj

2016-08-12 Thread Pablo Anzorena
Thanks Joel, that works perfectly well. I checked some cases and the data
is consistent.



2016-08-11 14:17 GMT-03:00 Joel Bernstein :

> Actually try this:
>
> select a from b where _query_='a:b'
>
> *This produces the query:*
>
> (_query_:"a:b")
>
> which should run.
>
>
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Thu, Aug 11, 2016 at 1:04 PM, Joel Bernstein 
> wrote:
>
> > There are no test cases for this but you can try this syntax:
> >
> > select a from b where _query_=(a:c AND d:f)
> >
> > This should get translated to:
> >
> > _query_:(a:c AND d:f)
> >
> > This link describes the behavior of _query_ https://lucidworks.
> > com/blog/2009/03/31/nested-queries-in-solr/
> >
> > Just not positive how the SQL parser will treat the : in the query.
> >
> >
> >
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Thu, Aug 11, 2016 at 12:22 PM, Pablo Anzorena <
> anzorena.f...@gmail.com>
> > wrote:
> >
> >> Joel, one more thing.
> >>
> >> Is there anyway to use the sql and the lucene query syntax? The thing is
> >> that my bussiness application is tightly coupled with the lucene query
> >> syntax, so I need a way to use both the sql features (without the where
> >> clause) and the query syntax of lucene.
> >>
> >> Thanks.
> >>
> >> 2016-08-11 11:40 GMT-03:00 Pablo Anzorena :
> >>
> >> > Excellent!
> >> >
> >> > Thanks Joel
> >> >
> >> > 2016-08-11 11:19 GMT-03:00 Joel Bernstein :
> >> >
> >> >> There are two ways to do this with SolrJ:
> >> >>
> >> >> 1) Use the JDBC driver.
> >> >>
> >> >> 2) Use the SolrStream to send the request and then read() the Tuples.
> >> This
> >> >> is what the JDBC driver does under the covers. The sample code can be
> >> >> found
> >> >> here:
> >> >> https://github.com/apache/lucene-solr/blob/master/solr/solrj
> >> >> /src/java/org/apache/solr/client/solrj/io/sql/StatementImpl.java
> >> >>
> >> >> The constructStream() method creates a SolrStream with the request.
> >> >>
> >> >> Joel Bernstein
> >> >> http://joelsolr.blogspot.com/
> >> >>
> >> >> On Thu, Aug 11, 2016 at 10:05 AM, Pablo Anzorena <
> >> anzorena.f...@gmail.com
> >> >> >
> >> >> wrote:
> >> >>
> >> >> > Hey,
> >> >> >
> >> >> > I'm trying to get the response of solr via QueryResponse using
> >> >> > QueryResponse queryResponse = client.query(solrParams); (where
> client
> >> >> is a
> >> >> > CloudSolrClient)
> >> >> >
> >> >> > The error it thows is:
> >> >> >
> >> >> > org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrE
> >> xception:
> >> >> > Error
> >> >> > from server at http://tywin:8983/solr/testcol
> >> lection1_shard1_replica1:
> >> >> > Expected mime type application/octet-stream but got text/plain.
> >> >> > {"result-set":{"docs":[
> >> >> > {"count(*)":5304,"d1":2},
> >> >> > {"count(*)":5160,"d1":1},
> >> >> > {"count(*)":5016,"d1":3},
> >> >> > {"count(*)":4893,"d1":4},
> >> >> > {"count(*)":4824,"d1":5},
> >> >> > {"EOF":true,"RESPONSE_TIME":11}]}}
> >> >> > at
> >> >> > org.apache.solr.client.solrj.impl.HttpSolrClient.
> >> >> > executeMethod(HttpSolrClient.java:558)
> >> >> >
> >> >> > Then I tryed to implement a custom ResponseParser that override the
> >> >> > getContentType() and returns "text/plain", but it returns another
> >> error.
> >> >> >
> >> >> > So... Is it a way to get the sql response via this method?
> >> >> >
> >> >> > I make it works via Connection and ResultSets, but I need to use
> the
> >> >> other
> >> >> > way (if possible).
> >> >> >
> >> >> > Thanks!
> >> >> >
> >> >>
> >> >
> >> >
> >>
> >
> >
>


Re: Solr 6 Configuration - java.net.SocketTimeoutException

2016-08-15 Thread Pablo Anzorena
I think it is the file named jetty.xml under solr_path/server/etc/.



2016-08-15 10:01 GMT-03:00 Stan Lee :

> I'm currently doing a POC with SOLR 6 on my Windows 7 machine, with 16GB RAM.
> Successfully imported 16 million of documents from SQL Server, where one of
> the SQL column is an XML.
> Whenever I query on the XML datatype (In solr, it's a text field), I keep
> getting SocketTimeoutException if the /select query goes behind the 1min
> mark. Can someone shed some light as to where I can config solr to go
> beyond the 1 min sockettimeout exception?
>
> Many thanks.
>
> Regards, Stan.
>


Rollback solrcloud

2016-10-06 Thread Pablo Anzorena
Hey,

I was trying to do a rollback under SolrCloud and found that it's not
supported:
https://issues.apache.org/jira/browse/SOLR-4895 (I have Solr 6.1.0)

So my question is, how can I simulate a rollback?
Actually what I'm doing is:

   1. prepareCommit
   2. add documents
   3. try to commit
   4. if success, then exit, else rollback.

I have to point out that it doesn't happen that multiple threads are
preparing commits nor adding documents, just single thread.

Thanks.


Posting files 405 http error

2016-11-01 Thread Pablo Anzorena
Hey,

I'm indexing a file with a delete query in xml format using the post.jar. I
have two solrclouds, which apparently have all the same configurations. The
thing is that I have no problem when indexing in one of them, but the other
keeps giving me this error:

SimplePostTool version 5.0.0
Posting files to [base] url
http://solr2:8983/solr/mycollection/update?separator=| using content-type
application/xml...
POSTing file delete_file.unl.tmp to [base]
SimplePostTool: WARNING: Solr returned an error #405 (Method Not Allowed)
for url: http://solr2:8983/solr/mycollection/update?separator=|
SimplePostTool: WARNING: Response:
Error: HTTP method POST is not
supported by this URL
SimplePostTool: WARNING: IOException while reading response:
java.io.IOException: Server returned HTTP response code: 405 for URL:
http://solr2:8983/solr/mycollection/update?separator=|
1 files indexed.
Time spent: 0:00:00.253

Do I need some extra configuration to support for xml updates?

Thanks!


Re: Posting files 405 http error

2016-11-03 Thread Pablo Anzorena
Thanks for the answer.

I checked the log and it wasn't logging anything.

The error I'm facing is quite bizarre... I create a fresh new collection and
then index with no problem, but it keeps throwing this error if I copy the
collection from one SolrCloud to the other and then index.

Any clue on why is this happening?

2016-11-01 17:42 GMT-03:00 Erick Erickson :

> What does the solr log say? I'd tail the Solr log while
> sending the query, that'll do two things:
>
> 1> insure that your request is actually getting to the
> Solr you expect.
>
> 2> the details in the solr log are often much more helpful
> than what gets returned to the client.
>
> Best,
> Erick
>
> On Tue, Nov 1, 2016 at 1:37 PM, Pablo Anzorena 
> wrote:
> > Hey,
> >
> > I'm indexing a file with a delete query in xml format using the
> post.jar. I
> > have two solrclouds, which apparently have all the same configurations.
> The
> > thing is that I have no problem when indexing in one of them, but the
> other
> > keeps giving me this error:
> >
> > SimplePostTool version 5.0.0
> > Posting files to [base] url
> > http://solr2:8983/solr/mycollection/update?separator=| using
> content-type
> > application/xml...
> > POSTing file delete_file.unl.tmp to [base]
> > SimplePostTool: WARNING: Solr returned an error #405 (Method Not Allowed)
> > for url: http://solr2:8983/solr/mycollection/update?separator=|
> > SimplePostTool: WARNING: Response:
> > ErrorHTTP method POST is not
> > supported by this URL
> > SimplePostTool: WARNING: IOException while reading response:
> > java.io.IOException: Server returned HTTP response code: 405 for URL:
> > http://solr2:8983/solr/mycollection/update?separator=|
> > 1 files indexed.
> > Time spent: 0:00:00.253
> >
> > Do I need some extra configuration to support for xml updates?
> >
> > Thanks!
>


Re: Posting files 405 http error

2016-11-03 Thread Pablo Anzorena
Thanks Shawn.

Actually there is no load balancer or proxy in the middle, but even if
there was, how would you explain that I can index if I create a completely
new collection?

I figured out how to fix it. What I'm doing is creating a new collection,
then unloading it (by unloading all the shards/replicas), then copying the
data directory from the collection in the other SolrCloud, and finally
creating the collection again. It's not the best solution, but it works;
nevertheless, I would still like to know what's causing the problem...

It's worth mentioning that I'm not using Jetty; I'm using solr-undertow
https://github.com/kohesive/solr-undertow

2016-11-03 12:56 GMT-03:00 Shawn Heisey :

> On 11/3/2016 9:10 AM, Pablo Anzorena wrote:
> > Thanks for the answer.
> >
> > I checked the log and it wasn't logging anything.
> >
> > The error i'm facing is way bizarre... I create a new fresh collection
> and
> > then index with no problem, but it keeps throwing this error if i copy
> the
> > collection from one solrcloud to the other and then index.
> >
> > Any clue on why is this happening?
>
> Solr's source code doesn't seem to even have a 405 error, so I bet
> what's happening is that you have Solr sitting behind a proxy or load
> balancer, and that server doesn't like the request you sent, so it
> rejects it and Solr never receives anything.
>
> Here's an excerpt of code from the SolrException class in the master
> branch:
>
>   /**
>* This list of valid HTTP Status error codes that Solr may return in
>* the case of a "Server Side" error.
>*
>* @since solr 1.2
>*/
>   public enum ErrorCode {
> BAD_REQUEST( 400 ),
> UNAUTHORIZED( 401 ),
> FORBIDDEN( 403 ),
> NOT_FOUND( 404 ),
> CONFLICT( 409 ),
> UNSUPPORTED_MEDIA_TYPE( 415 ),
> SERVER_ERROR( 500 ),
> SERVICE_UNAVAILABLE( 503 ),
> INVALID_STATE( 510 ),
> UNKNOWN(0);
> public final int code;
>
> private ErrorCode( int c )
> {
>   code = c;
> }
> public static ErrorCode getErrorCode(int c){
>   for (ErrorCode err : values()) {
> if(err.code == c) return err;
>   }
>   return UNKNOWN;
> }
>   };
>
> Thanks,
> Shawn
>
>


Re: Posting files 405 http error

2016-11-03 Thread Pablo Anzorena
When I manually copy one collection to another, I copy the core.properties
from the source to the destination with the name core.properties.unloaded
so there is no problem.

So the steps I'm doing are:
1> Index to my source collection.
2> Copy the directory of the source collection, excluding core.properties.
3> Copy the core.properties under the name core.properties.unloaded to
the destination.
4> Create the collection in the destination.
5> Use the ADDREPLICA command to add replicas.

With these steps it throws the error.

They are very similar to those you mentioned, but instead you first create
the destination collection and then copy the data. The problem I face with
your approach is that unless I unload my collection, solr doesn't realize
there is data indexed.

2016-11-03 14:54 GMT-03:00 Erick Erickson :

> Wait. What were you doing originally? Just copying the entire
> SOLR_HOME over or something?
>
> Because one of the things each core carries along is a
> "core.properties" file that identifies
> 1> the name of the core, something like collection_shard1_replica1
> 2> the name of the collection the core belongs to
>
> So if you just copy a directory containing the core.properties file
> from one place to another _and_ they're pointing to the same Zookeeper
> then the behavior is undefined.
>
> And if you _don't_ point to the same zookeeper, your copied collection
> isn't registered with ZK so that's a weird state as well.
>
> If your goal is to move things from one collection to another, here's
> a possibility (assuming the nodes can all "see" each other).
>
> 1> index to your source collection
> 2> create a new destination collection
> 3a> use the "fetchindex" command to move the relevant indexes from the
> source to the destination, see
> https://cwiki.apache.org/confluence/display/solr/Index+
> Replication#IndexReplication-HTTPAPICommandsfortheReplicationHandler
> 3b> instead of <3a>, manually copy the data directory from the source
> to each replica.
> 4> In either 3a> or 3b>, it's probably easier to create a leader-only
> (replicationFactor=1) destination collection then use the ADDREPLICA
> command to add replicas, that way they'll all sync automatically.
>
> Best,
> Erick
>
> On Thu, Nov 3, 2016 at 10:28 AM, Pablo Anzorena 
> wrote:
> > Thanks Shawn.
> >
> > Actually there is no load balancer or proxy in the middle, but even if
> > there was, how would you explain that I can index if a create a
> completely
> > new collection?
> >
> > I figured out how to fix it. What I'm doing is creating a new collection,
> > then unloading it (by unloading all the shards/replicas), then copy the
> > data directory from the collection in the other solrcloud, and finally
> > creating again the collection. It's not the best solution, but it works,
> > nevertheless I still would like to know what's causing the problem...
> >
> > It's worth to mention that I'm not using jetty, I'm using solr-undertow
> > https://github.com/kohesive/solr-undertow
> >
> > 2016-11-03 12:56 GMT-03:00 Shawn Heisey :
> >
> >> On 11/3/2016 9:10 AM, Pablo Anzorena wrote:
> >> > Thanks for the answer.
> >> >
> >> > I checked the log and it wasn't logging anything.
> >> >
> >> > The error i'm facing is way bizarre... I create a new fresh collection
> >> and
> >> > then index with no problem, but it keeps throwing this error if i copy
> >> the
> >> > collection from one solrcloud to the other and then index.
> >> >
> >> > Any clue on why is this happening?
> >>
> >> Solr's source code doesn't seem to even have a 405 error, so I bet
> >> what's happening is that you have Solr sitting behind a proxy or load
> >> balancer, and that server doesn't like the request you sent, so it
> >> rejects it and Solr never receives anything.
> >>
> >> Here's an excerpt of code from the SolrException class in the master
> >> branch:
> >>
> >>   /**
> >>* This list of valid HTTP Status error codes that Solr may return in
> >>* the case of a "Server Side" error.
> >>*
> >>* @since solr 1.2
> >>*/
> >>   public enum ErrorCode {
> >> BAD_REQUEST( 400 ),
> >> UNAUTHORIZED( 401 ),
> >> FORBIDDEN( 403 ),
> >> NOT_FOUND( 404 ),
> >> CONFLICT( 409 ),
> >> UNSUPPORTED_MEDIA_TYPE( 415 ),
> >> SERVER_ERROR( 500 ),
> >> SERVICE_UNAVAILABLE( 503 ),
> >> INVALID_STATE( 510 ),
> >> UNKNOWN(0);
> >> public final int code;
> >>
> >> private ErrorCode( int c )
> >> {
> >>   code = c;
> >> }
> >> public static ErrorCode getErrorCode(int c){
> >>   for (ErrorCode err : values()) {
> >> if(err.code == c) return err;
> >>   }
> >>   return UNKNOWN;
> >> }
> >>   };
> >>
> >> Thanks,
> >> Shawn
> >>
> >>
>
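
For reference, the fetchindex call in Erick's step 3a above looks like this
(hosts and core names are placeholders):

http://destHost:8983/solr/mycollection_shard1_replica1/replication?command=fetchindex&masterUrl=http://srcHost:8983/solr/mycollection_shard1_replica1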


Solr dynamic "on the fly fields"

2017-07-03 Thread Pablo Anzorena
Hey,

I was wondering if there is some way to add fields "on the fly" based on
arithmetic operations on other fields. For example, add a new field
"custom_field" = log(field1) + field2 -5.

Thanks.


Re: Solr dynamic "on the fly fields"

2017-07-03 Thread Pablo Anzorena
Thanks Erick,

For my use case, neither of those solutions is possible. I have a
multitenancy scheme at the most basic level, that is, I have a single
collection with fields (clientId, field1, field2, ..., field50) serving
many clients.

Clients can create custom fields based on arithmetic operations on any
other fields.

So, is it possible to update, let's say, field49 with the following
operation: log(field39) + field25 on clientId=43?

Do field39 and field25 need to be stored to accomplish this? Is there any
other way to avoid storing them?

Thanks!


2017-07-03 15:00 GMT-03:00 Erick Erickson :

> There are two ways:
> 1> define a dynamic field pattern, i.e.
>
> 
>
> Now just add any field in the doc you want. If it ends in "_sum" and
> no other explicit field matches you have a new field.
>
> 2> Use the managed schema to add these on the fly. I don't recommend
> this from what I know of your use case, this is primarily intended for
> front-ends to be able to modify the schema and/or "field guessing".
>
> I do caution you though that either way don't go over-the-top. If
> you're thinking of thousands of different fields that can lead to
> performance issues.
>
> You can either put stuff in the field on your indexing client or
> create a custom update component, perhaps the simplest would be a
> "StatelessScriptUpdateProcessorFactory:
>
> see: https://cwiki.apache.org/confluence/display/solr/
> Update+Request+Processors#UpdateRequestProcessors-
> UpdateRequestProcessorFactories
>
> Best,
> Erick
>
> On Mon, Jul 3, 2017 at 10:52 AM, Pablo Anzorena 
> wrote:
> > Hey,
> >
> > I was wondering if there is some way to add fields "on the fly" based on
> > arithmetic operations on other fields. For example add a new field
> > "custom_field" = log(field1) + field2 -5.
> >
> > Thanks.
>
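
A bare-bones sketch of the custom update processor route Erick mentions
above (the class name and chain wiring are illustrative; the field names
follow Pablo's example):

import java.io.IOException;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class DerivedFieldProcessorFactory extends UpdateRequestProcessorFactory {
    @Override
    public UpdateRequestProcessor getInstance(SolrQueryRequest req,
            SolrQueryResponse rsp, UpdateRequestProcessor next) {
        return new UpdateRequestProcessor(next) {
            @Override
            public void processAdd(AddUpdateCommand cmd) throws IOException {
                SolrInputDocument doc = cmd.getSolrInputDocument();
                Object f1 = doc.getFieldValue("field39");
                Object f2 = doc.getFieldValue("field25");
                if (f1 instanceof Number && f2 instanceof Number) {
                    // e.g. field49 = log(field39) + field25, computed at index time
                    doc.setField("field49", Math.log(((Number) f1).doubleValue())
                            + ((Number) f2).doubleValue());
                }
                super.processAdd(cmd);
            }
        };
    }
}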


Re: Solr dynamic "on the fly fields"

2017-07-05 Thread Pablo Anzorena
Thanks Erick for the answer. Function queries are great, but for my use
case what I really do is make aggregations (using JSON Facet, for example)
over these functions.

I have tried using function queries with JSON Facet, but it does not support
them.

Any other ideas you can imagine?





2017-07-03 21:57 GMT-03:00 Erick Erickson :

> I don't know how one would do this. But I would ask what the use-case
> is. Creating such fields at index time just seems like it would be
> inviting abuse by creating a zillion fields as you have no control
> over what gets created. I'm assuming your tenants don't talk to each
> other
>
> Have you thought about using function queries to pull this data out as
> needed at _query_ time? See:
> https://cwiki.apache.org/confluence/display/solr/Function+Queries
>
> Best,
> Erick
>
> On Mon, Jul 3, 2017 at 12:06 PM, Pablo Anzorena 
> wrote:
> > Thanks Erick,
> >
> > For my use case it's not possible any of those solutions. I have a
> > multitenancy scheme in the most basic level, that is I have a single
> > collection with fields (clientId, field1, field2, ..., field50) attending
> > many clients.
> >
> > Clients can create custom fields based on arithmetic operations of any
> > other field.
> >
> > So, is it possible to update let's say field49 with the follow operation:
> > log(field39) + field25 on clientId=43?
> >
> > Do field39 and field25 need to be stored to accomplish this? Is there any
> > other way to avoid storing them?
> >
> > Thanks!
> >
> >
> > 2017-07-03 15:00 GMT-03:00 Erick Erickson :
> >
> >> There are two ways:
> >> 1> define a dynamic field pattern, i.e.
> >>
> >> <dynamicField name="*_sum" type="tfloat" indexed="true" stored="true"/>
> >>
> >> Now just add any field in the doc you want. If it ends in "_sum" and
> >> no other explicit field matches you have a new field.
> >>
> >> 2> Use the managed schema to add these on the fly. I don't recommend
> >> this from what I know of your use case, this is primarily intended for
> >> front-ends to be able to modify the schema and/or "field guessing".
> >>
> >> I do caution you though that either way don't go over-the-top. If
> >> you're thinking of thousands of different fields that can lead to
> >> performance issues.
> >>
> >> You can either put stuff in the field on your indexing client or
> >> create a custom update component, perhaps the simplest would be a
> >> "StatelessScriptUpdateProcessorFactory:
> >>
> >> see: https://cwiki.apache.org/confluence/display/solr/
> >> Update+Request+Processors#UpdateRequestProcessors-
> >> UpdateRequestProcessorFactories
> >>
> >> Best,
> >> Erick
> >>
> >> On Mon, Jul 3, 2017 at 10:52 AM, Pablo Anzorena <
> anzorena.f...@gmail.com>
> >> wrote:
> >> > Hey,
> >> >
> >> > I was wondering if there is some way to add fields "on the fly" based
> on
> >> > arithmetic operations on other fields. For example add a new field
> >> > "custom_field" = log(field1) + field2 -5.
> >> >
> >> > Thanks.
> >>
>
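
For the record, the query-time variant Erick suggests would look roughly like
this (field names taken from the thread; aliasing a function in fl works in
Solr 4+, and log() is Solr's base-10 log):

    q=clientId:43&fl=id,custom49:sum(log(field39),field25)&sort=sum(log(field39),field25) desc

This computes the value per returned document only; as the follow-up says,
json.facet did not accept such composite functions as aggregates at the time.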


How difficult is adding a new aggregate function to Json Facet?

2017-07-15 Thread Pablo Anzorena
Hey,

I have been using Solr for a while, but I have never looked at the code. Now
I need to add some capability to Solr.

What I need to add is another aggregate function to Json Facet API, for
example "sum(field1)-sum(field2)". How hard do you think it would be? Also,
it would be great if you could give me some advice on where to start.

Thanks!


Re: Configuring Solr for Maximum Concurrency

2016-12-28 Thread Pablo Anzorena
Dave,

there are settings like MAX_CONNECTIONS and
MAX_CONNECTIONS_PER_HOST which control the number of connections.

Are you leaving open the connection to zookeeper after you establish it?
Are you using the singleton pattern?

2016-12-28 14:14 GMT-03:00 Dave Seltzer :

> Hi Erick,
>
> I'll dig in on these timeout settings and see how changes affect behavior.
>
> One interesting aspect is that we're not indexing any content at the
> moment. The rate of ingress is something like 10 to 20 documents per day.
>
> So my guess is that ZK simply is deciding that these servers are dead based
> on the fact that responses are so very sluggish.
>
> You've mentioned lots of timeouts, but are there any settings which control
> the number of available threads? Or is this something which is largely
> handled automagically?
>
> Many thanks!
>
> -Dave
>
> On Wed, Dec 28, 2016 at 11:56 AM, Erick Erickson 
> wrote:
>
> > Dave:
> >
> > There are at least 4 timeouts (not even including ZK) that can
> > be relevant, defined in solr.xml:
> > socketTimeout
> > connTimeout
> > distribUpdateConnTimeout
> > distribUpdateSoTimeout
> >
> > Plus the ZK timeout
> > zkClientTimeout
> >
> > Plus the ZK configurations.
> >
> > So it would help narrow down what's going on if we knew why the nodes
> > dropped out. There are indeed a lot of messages dumped, but somewhere
> > in the logs there should be a root cause.
> >
> > You might see Leader Initiated Recovery (LIR) which can indicate that
> > an update operation from the leader took too long, the timeouts above
> > can be adjusted in this case.
> >
> > You might see evidence that ZK couldn't get a response from Solr in
> > "too long" and decided it was gone.
> >
> > You might see...
> >
> > One thing I'd look at very closely is GC processing. One of the
> > culprits for this behavior I've seen is a very long GC stop-the-world
> > pause leading to ZK thinking the node is dead and tripping this chain.
> > Depending on the timeouts, "very long" might be a few seconds.
> >
> > Not entirely helpful, but until you pinpoint why the node goes into
> > recovery it's throwing darts at the wall. GC and log messages might
> > give some insight into the root cause.
> >
> > Best,
> > Erick
> >
> > On Wed, Dec 28, 2016 at 8:26 AM, Dave Seltzer 
> wrote:
> > > Hello Everyone,
> > >
> > > I'm working on a Solr Cloud cluster which is used in a hash matching
> > > application.
> > >
> > > For performance reasons we've opted to batch-execute hash matching
> > queries.
> > > This means that a single query will contain many nested queries. As you
> > > might expect, these queries take a while to execute. (On the order of 5
> > to
> > > 10 seconds.)
> > >
> > > I've noticed that Solr will act erratically when we send too many
> > > long-running queries. Specifically, heavily-loaded servers will
> > repeatedly
> > > fall out of the cluster and then recover. My theory is that there's
> some
> > > limit on the number of concurrent connections and that client queries
> are
> > > preventing zookeeper related queries... but I'm not sure. I've
> increased
> > > ZKClientTimeout to combat this.
> > >
> > > My question is: What configuration settings should I be looking at in
> > order
> > > to make sure I'm maximizing the ability of Solr to handle concurrent
> > > requests.
> > >
> > > Many thanks!
> > >
> > > -Dave
> >
>
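
For reference, the timeouts Erick lists and the connection limits Pablo
mentions all live in solr.xml; a sketch with purely illustrative values:

    <solr>
      <solrcloud>
        <int name="zkClientTimeout">30000</int>
        <int name="distribUpdateConnTimeout">60000</int>
        <int name="distribUpdateSoTimeout">600000</int>
      </solrcloud>
      <shardHandlerFactory name="shardHandlerFactory" class="HttpShardHandlerFactory">
        <int name="socketTimeout">600000</int>
        <int name="connTimeout">60000</int>
        <int name="maxConnections">10000</int>
        <int name="maxConnectionsPerHost">100</int>
      </shardHandlerFactory>
    </solr>

The last two are the MAX_CONNECTIONS / MAX_CONNECTIONS_PER_HOST knobs referred
to above; they bound how many concurrent inter-node requests each node will
hold open.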


Pagination bug? when sorting by a field (not unique field)

2017-03-29 Thread Pablo Anzorena
Hey,

I was paginating the results of a query and noticed that some documents
were repeated across pagination buckets of 100 rows.
When I sort by the unique field there is no repeated document but when I
sort by another field then repeated documents appear.
I assume it is a bug and not the intended behaviour, right?

Solr version:5.2.1

Regards,
Pablo.


Re: Pagination bug? when sorting by a field (not unique field)

2017-03-29 Thread Pablo Anzorena
Let me try. It is really hard to replicate, but I will try it out and come
back when I get it.

2017-03-29 9:40 GMT-03:00 Erik Hatcher :

> Certainly not intended behavior.  Can you show us a way to replicate the
> issue?
>
>
> > On Mar 29, 2017, at 8:35 AM, Pablo Anzorena 
> wrote:
> >
> > Hey,
> >
> > I was paginating the results of a query and noticed that some documents
> > were repeated across pagination buckets of 100 rows.
> > When I sort by the unique field there is no repeated document but when I
> > sort by another field then repeated documents appear.
> > I assume is a bug and it's not the intended behaviour, right?
> >
> > Solr version:5.2.1
> >
> > Regards,
> > Pablo.
>
>


Re: Pagination bug? when sorting by a field (not unique field)

2017-03-29 Thread Pablo Anzorena
Shawn,

Yes, the field has duplicate values and yes, if I add the secondary sort by
the uniqueKey it solves the issue.

Those 2 situations you mentioned are not occurring, none of them. The index
is replicated, but not sharded.

Does solr sort by an internal id if no uniqueKey is present in the sort?

2017-03-29 9:58 GMT-03:00 Shawn Heisey :

> On 3/29/2017 6:35 AM, Pablo Anzorena wrote:
> > I was paginating the results of a query and noticed that some
> > documents were repeated across pagination buckets of 100 rows. When I
> > sort by the unique field there is no repeated document but when I sort
> > by another field then repeated documents appear. I assume is a bug and
> > it's not the intended behaviour, right?
>
> There is a potential situation that can cause this problem that is NOT a
> bug.
>
> If the field you are sorting on contains duplicate values (same value in
> multiple documents), then I am pretty sure that the sort order of
> documents with the same value in the sort field is non-deterministic in
> these situations:
>
> 1) A distributed (sharded) index.
> 2) When the index contents can change between a request for one page and
> a request for the next page -- documents being added, deleted, or changed.
>
> Because the sort order of documents with the same value can change, one
> document that may have ended up on the first page on the first query may
> end up on the second page on the second query.
>
> Sorting by a field with no duplicate values (the unique field you
> mentioned) will always result in the exact same sort order ... but if
> you add documents that sort to near the start of the sort order between
> queries, the behavior you have noticed can still happen.
>
> If this is what you are encountering, adding secondary sort on the
> uniqueKey field would probably clear up the problem.  If your uniqueKey
> field is "id", something like this:
>
> sort=someField desc,id desc
>
> Thanks,
> Shawn
>
>


Re: Pagination bug? when sorting by a field (not unique field)

2017-03-29 Thread Pablo Anzorena
Mikhail,

Indeed, maxDocs are different and also deletedDocs, but numDocs are ok.

I don't really get it, but can that be the problem?

2017-03-29 10:35 GMT-03:00 Mikhail Khludnev :

> Can it happen that the replicas differ by deleted docs? I mean numDocs
> is the same, but maxDocs differs by the number of deleted docs; you can
> see it in the solr admin at the core page.
>
> On Wed, Mar 29, 2017 at 4:16 PM, Pablo Anzorena 
> wrote:
>
> > Shawn,
> >
> > Yes, the field has duplicate values and yes, if I add the secondary sort
> by
> > the uniqueKey it solve the issue.
> >
> > Those 2 situations you mentioned are not occurring, none of them. The
> index
> > is replicated, but not sharded.
> >
> > Does solr sort by an internal id if no uniqueKey is present in the sort?
> >
> > 2017-03-29 9:58 GMT-03:00 Shawn Heisey :
> >
> > > On 3/29/2017 6:35 AM, Pablo Anzorena wrote:
> > > > I was paginating the results of a query and noticed that some
> > > > documents were repeated across pagination buckets of 100 rows. When I
> > > > sort by the unique field there is no repeated document but when I
> sort
> > > > by another field then repeated documents appear. I assume is a bug
> and
> > > > it's not the intended behaviour, right?
> > >
> > > There is a potential situation that can cause this problem that is NOT
> a
> > > bug.
> > >
> > > If the field you are sorting on contains duplicate values (same value
> in
> > > multiple documents), then I am pretty sure that the sort order of
> > > documents with the same value in the sort field is non-deterministic in
> > > these situations:
> > >
> > > 1) A distributed (sharded) index.
> > > 2) When the index contents can change between a request for one page
> and
> > > a request for the next page -- documents being added, deleted, or
> > changed.
> > >
> > > Because the sort order of documents with the same value can change, one
> > > document that may have ended up on the first page on the first query
> may
> > > end up on the second page on the second query.
> > >
> > > Sorting by a field with no duplicate values (the unique field you
> > > mentioned) will always result in the exact same sort order ... but if
> > > you add documents that sort to near the start of the sort order between
> > > queries, the behavior you have noticed can still happen.
> > >
> > > If this is what you are encountering, adding secondary sort on the
> > > uniqueKey field would probably clear up the problem.  If your uniqueKey
> > > field is "id", something like this:
> > >
> > > sort=someField desc,id desc
> > >
> > > Thanks,
> > > Shawn
> > >
> > >
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


Re: Pagination bug? when sorting by a field (not unique field)

2017-04-01 Thread Pablo Anzorena
Excellent, guys, thank you very much!

El mar. 29, 2017 18:09, "Erick Erickson"  escribió:

> You might be helped by "distributed IDF".
> see: SOLR-1632
>
> On Wed, Mar 29, 2017 at 1:56 PM, Chris Hostetter
>  wrote:
> >
> > The thing to keep in mind, is that w/o a fully deterministic sort,
> > the underlying problem statement "doc may appera on multiple pages" can
> > exist even in a single node solr index, even if no documents are
> > added/deleted between bage requests: because background merges /
> > searcher re-opening may happen in between those page requests.
> >
> > The best practice, if you really care about ensuring no (non-updated) doc
> > is ever returned twice in subsequent pages, is to to use a fully
> > deterministic sort, with a "tie breaker" clause that is unique to every
> > document (ie: uniqueKey field)
> >
> >
> >
> > : Date: Wed, 29 Mar 2017 23:14:22 +0300
> > : From: Mikhail Khludnev 
> > : Reply-To: solr-user@lucene.apache.org
> > : To: solr-user 
> > : Subject: Re: Pagination bug? when sorting by a field (not unique field)
> > :
> > : Great explanation, Alessandro!
> > :
> > : Let me briefly explain my experience. I have a tiny test with 2 shards
> and
> > : 2 replicas, index about a hundred of docs. And then when I fully
> paginate
> > : search results with score ranking, I've got duplicates across pages.
> And
> > : the reason is deletes, which occur probably due to update/failover.
> Every
> > : paging request lands to the different replica. There are a few
> workarounds:
> > : lands consequent requests to the same replicas; also  fixes
> > : duplicates; but tie-breaking is the best way for sure.
> > :
> > : On Wed, Mar 29, 2017 at 7:10 PM, alessandro.benedetti <
> a.benede...@sease.io>
> > : wrote:
> > :
> > : > The reason Mikhail mentioned that, is probably related to :
> > : >
> > : > *The way how number of document calculated is changed (LUCENE-6711)*
> > : > /The number of documents (docCount) is used to calculate term
> specificity
> > : > (idf) and average document length (avdl). Prior to LUCENE-6711,
> > : > collectionStats.maxDoc() was used for the statistics. Now,
> > : > collectionStats.docCount() is used whenever possible, if not
> maxDocs() is
> > : > used.
> > : > Assume that a collection contains 100 documents, and 50 of them have
> > : > "keywords" field. In this example, maxDocs is 100 while docCount is
> 50 for
> > : > the "keywords" field. The total number of tokens for "keywords"
> field is
> > : > divided by docCount to obtain avdl. Therefore, docCount which is the
> total
> > : > number of documents that have at least one term for the field, is a
> more
> > : > precise metric for optional fields.
> > : > DefaultSimilarity does not leverage avdl, so this change would have
> > : > relatively minor change in the result list. Because relative idf
> values of
> > : > terms will remain same. However, when combined with other factors
> such as
> > : > term frequency, relative ranking of documents could change. Some
> Similarity
> > : > implementations (such as the ones instantiated with NormalizationH2
> and
> > : > BM25) take account into avdl and would have notable change in ranked
> list.
> > : > Especially if you have a collection of documents with varying
> lengths.
> > : > Because NormalizationH2 tends to punish documents longer than avdl./
> > : >
> > : > This means that if you are load balancing, the page 2 query could go
> to
> > : > another replica, where the doc is scored differently, ending up on a
> > : > different position ( and maybe appearing again as a final effect).
> > : > This scenario is referred to scored ranking, so it will not affect
> sorting
> > : > (
> > : > and I believe in your initial mail you were referring not to sorting)
> > : >
> > : > Cheers
> > : >
> > : >
> > : > Pablo wrote
> > : > > Mikhall,
> > : > >
> > : > > effectively maxDocs are different and also deletedDocs, but
> numDocs are
> > : > > ok.
> > : > >
> > : > > I don't really get it, but can that be the problem?
> > : >
> > : >
> > : >
> > : >
> > : >
> > : > -
> > : > ---
> > : > Alessandro Benedetti
> > : > Search Consultant, R&D Software Engineer, Director
> > : > Sease Ltd. - www.sease.io
> > : > --
> > : > View this message in context: http://lucene.472066.n3.
> > : > nabble.com/Pagination-bug-when-sorting-by-a-field-not-unique-field-
> > : > tp4327408p4327461.html
> > : > Sent from the Solr - User mailing list archive at Nabble.com.
> > : >
> > :
> > :
> > :
> > : --
> > : Sincerely yours
> > : Mikhail Khludnev
> > :
> >
> > -Hoss
> > http://www.lucidworks.com/
>
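
Worth noting alongside the tie-breaker advice: cursorMark deep paging
(available in the 5.x version discussed here) requires exactly this kind of
uniqueKey tie-breaker and sidesteps start/rows drift entirely. A sketch,
assuming "id" is the uniqueKey:

    q=*:*&sort=someField desc,id desc&rows=100&cursorMark=*

Each response returns a nextCursorMark to pass as cursorMark on the following
request; the sort must end on the uniqueKey field.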


How to use Wordnet in solr?

2017-04-21 Thread Pablo Anzorena
Hey,

I'm planning to use Wordnet and I want to know how.

There's a class called *WordnetSynonymParser *, does anybody use it? It
says it is experimental...

I'm using solr 5.2.1

Briefly speaking about my needs:
I have different collections in different languages (fr, pr, sp, en).
When the user searches, for example, in the english collection for the word
"furnitures", I want to look for "table", "chair", "furniture" (without the
plural) and all the synonyms of "furnitures". Wordnet already provides all of
this and in different languages; that's why it would be great to have solr
using it.

Regards,
Pablo.


Re: How to use Wordnet in solr?

2017-04-21 Thread Pablo Anzorena
Thanks to everybody.

I will try first Alessandro and Steve recommendation.

If I haven't misunderstood, you are telling me that I have to customize the
prolog files to the "solr txt synonyms" syntax? If that is correct, what is
the point of format:wordnet?

2017-04-21 12:52 GMT-03:00 alessandro.benedetti :

> Hi Pablo,
> with the wordnet format, Solr will just parse synonyms from a different file
> format [1].
> The rest will work exactly the same.
> You will use a managed resource to load the file and then potentially
> update it.
> If you were thinking of using the online resource directly, you may need to
> customize it a bit.
>
> Cheers
>
> [1] http://wordnet.princeton.edu/man/prologdb.5WN.html
>
>
>
> -
> ---
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director
> Sease Ltd. - www.sease.io
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/How-to-use-Wordnet-in-solr-tp4331273p4331306.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
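
In other words, with format="wordnet" the filter reads the prolog file
directly; a sketch for the analyzer chain (the file name follows the
Princeton distribution):

    <filter class="solr.SynonymFilterFactory" synonyms="wn_s.pl"
            format="wordnet" ignoreCase="true" expand="true"/>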


RE: NoClassDefFoundError while indexing in Solr

2014-07-23 Thread Pablo Queixalos
There is a source code "parser" in Tika that in fact just renders the source
using an external source highlighter.

Seen in your stack trace:
org.apache.tika.parser.code.SourceCodeParser.getRenderer(SourceCodeParser.java:121)

You are indexing code (Java, C or Groovy). Solr seems to be missing a
transitive Tika dependency (http://freecode.com/projects/jhighlight).

Copying that lib into Solr's runtime lib directory should solve your issue.


Pablo.
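
Concretely: either copy jhighlight-*.jar into Solr's lib directory, or
reference it from solrconfig.xml; the path below is illustrative:

    <lib dir="/path/to/extra-libs" regex="jhighlight-.*\.jar" />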

From: Shalin Shekhar Mangar 
Sent: Wednesday, July 23, 2014 7:43 AM
To: solr-user@lucene.apache.org
Subject: Re: NoClassDefFoundError while indexing in Solr

Solr is trying to load "com/uwyn/jhighlight/renderer/XhtmlRendererFactory"
but that is not a class which is shipped or used by Solr. I think you have
some custom plugins (a highlighter perhaps?) which uses that class and the
classpath is not setup correctly.


On Wed, Jul 23, 2014 at 2:20 AM, Ameya Aware  wrote:

> Hi
>
> I am running into below error while indexing a file in solr.
>
> Can you please help to fix this?
>
> ERROR - 2014-07-22 16:40:32.126; org.apache.solr.common.SolrException;
> null:java.lang.RuntimeException: java.lang.NoClassDefFoundError:
> com/uwyn/jhighlight/renderer/XhtmlRendererFactory
> at
>
> org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:790)
> at
>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:439)
> at
>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
> at
>
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
> at
>
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
> at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
> at
>
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
> at
>
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
> at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
> at
>
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
> at
>
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
> at
>
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
> at
>
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
> at
>
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
> at
>
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
> at org.eclipse.jetty.server.Server.handle(Server.java:368)
> at
>
> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
> at
>
> org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
> at
>
> org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
> at
>
> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
> at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:636)
> at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
> at
>
> org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
> at
>
> org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
> at
>
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
> at
>
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
> at java.lang.Thread.run(Unknown Source)
> Caused by: java.lang.NoClassDefFoundError:
> com/uwyn/jhighlight/renderer/XhtmlRendererFactory
> at
>
> org.apache.tika.parser.code.SourceCodeParser.getRenderer(SourceCodeParser.java:121)
> at
>
> org.apache.tika.parser.code.SourceCodeParser.parse(SourceCodeParser.java:102)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> at
>
> org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
> at
>
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> at
>
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> at
>
> org.apache.solr.core.RequestHandler

How to return a function result instead of doclist in the Solr collapsing/grouping feature?

2011-09-12 Thread Pablo Ricco
I have the following solr fields in schema.xml:

   - id (string)
   - name (string)
   - category(string)
   - latitude (double)
   - longitude(double)

Is it possible to make a query that groups by category and returns the
average of latitude and longitude instead of the doclist?

Thanks,
Pablo
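
One way to get those averages with the Solr of that era is the
StatsComponent, which reports a mean per facet value, so no custom grouping
code is needed; a sketch:

    q=*:*&stats=true&stats.field=latitude&stats.field=longitude&stats.facet=category

Each category bucket in the response then carries mean (plus min, max and
sum) for latitude and longitude.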


Get field value in custom searchcomponent (solr 3.3)

2011-09-13 Thread Pablo Ricco
What is the best way to get a float field value from a docID?
I tried the following code, but when it runs it throws an exception, For input
string: "`??eI", at the line float lat = Float.parseFloat(tlat);

schema.xml:
...
<field name="latitude" type="float" indexed="true" stored="true"/>
...

component.java:

@Override
public void process(ResponseBuilder rb) throws IOException {
  DocSet docs = rb.getResults().docSet;
  SolrIndexSearcher searcher = req.getSearcher();
  FieldCache.StringIndex slat =
      FieldCache.DEFAULT.getStringIndex(searcher.getReader(), "latitude");
  DocIterator iter = docs.iterator();
  while (iter.hasNext()) {
    int docID = iter.nextDoc();
    String tlat = slat.lookup[slat.order[docID]];
    if (tlat != null) {
      float lat = Float.parseFloat(tlat); // Exception!
    }
  }
}

Thanks,
Pablo
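
For a float field on Lucene/Solr 3.x, one way to avoid parsing the indexed
term bytes altogether is the float cache. A sketch for the same component,
assuming "latitude" is single-valued and indexed:

    float[] lats = FieldCache.DEFAULT.getFloats(searcher.getReader(), "latitude");
    DocIterator iter = docs.iterator();
    while (iter.hasNext()) {
      float lat = lats[iter.nextDoc()];  // no string parsing involved
    }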


Re: Implementing Search Suggestion on Solr

2010-10-19 Thread Pablo Recio
Yeah, I know.

Could anyone tell me which one is the right way?

Regards,
> What an interesting application :-)
>
> Dennis Gearon
>
> Signature Warning
> 
> It is always a good idea to learn from your own mistakes. It is usually a
better idea to learn from others’ mistakes, so you do not have to make them
yourself. from '
http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
>
> EARTH has a Right To Life,
> otherwise we all die.
>
>
> --- On Mon, 10/18/10, Pablo Recio Quijano  wrote:
>
>> From: Pablo Recio Quijano 
>> Subject: Implementing Search Suggestion on Solr
>> To: solr-user@lucene.apache.org
>> Date: Monday, October 18, 2010, 3:53 AM
>> Hi!
>>
>> I'm trying to implement some kind of Search Suggestion on a
>> search engine I have implemented. This search suggestions
>> should not be automatically like the one described for the
>> SpellCheckComponent [1]. I'm looking something like:
>>
>> "SAS oppositions" => "Public job offers for
>> some-company"
>>
>> So I will have to define it manually. I was thinking about
>> synonyms [2] but I don't know if it's the proper way to do
>> it, because semantically those terms are not synonyms.
>>
>> Any ideas or suggestions?
>>
>> Regards,
>>
>> [1] http://wiki.apache.org/solr/SpellCheckComponent
>> [2]
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
>>


Re: Implementing Search Suggestion on Solr

2010-10-27 Thread Pablo Recio
Hi,

I don't want to be annoying, but I'm looking for a way to do that.

I repeat the question: is there a way to implement Search Suggestion
manually?

Thanks in advance.
Regards,

2010/10/18 Pablo Recio Quijano 

> Hi!
>
> I'm trying to implement some kind of Search Suggestion on a search engine I
> have implemented. This search suggestions should not be automatically like
> the one described for the SpellCheckComponent [1]. I'm looking something
> like:
>
> "SAS oppositions" => "Public job offers for some-company"
>
> So I will have to define it manually. I was thinking about synonyms [2] but
> I don't know if it's the proper way to do it, because semantically those
> terms are not synonyms.
>
> Any ideas or suggestions?
>
> Regards,
>
> [1] http://wiki.apache.org/solr/SpellCheckComponent
> [2]
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
>


Re: Implementing Search Suggestion on Solr

2010-10-27 Thread Pablo Recio
Thanks, it's not what I'm looking for.

Actually I need something like: search "Ubuntu" and it will prompt "Maybe you
will like 'Debian' too", or something like that. I'm not trying to do it
automatically; manually will be ok.

Anyway, it's a good article you shared, maybe I will implement it, thanks!

2010/10/27 Jakub Godawa 

> I am a real rookie at solr, but try this:
> http://solr.pl/2010/10/18/solr-and-autocomplete-part-1/?lang=en
>
> 2010/10/27 Pablo Recio 
>
> > Hi,
> >
> > I don't want to be annoying, but I'm looking for a way to do that.
> >
> > I repeat the question: is there a way to implement Search Suggestion
> > manually?
> >
> > Thanks in advance.
> > Regards,
> >
> > 2010/10/18 Pablo Recio Quijano 
> >
> > > Hi!
> > >
> > > I'm trying to implement some kind of Search Suggestion on a search
> engine
> > I
> > > have implemented. This search suggestions should not be automatically
> > like
> > > the one described for the SpellCheckComponent [1]. I'm looking
> something
> > > like:
> > >
> > > "SAS oppositions" => "Public job offers for some-company"
> > >
> > > So I will have to define it manually. I was thinking about synonyms [2]
> > but
> > > I don't know if it's the proper way to do it, because semantically
> those
> > > terms are not synonyms.
> > >
> > > Any ideas or suggestions?
> > >
> > > Regards,
> > >
> > > [1] http://wiki.apache.org/solr/SpellCheckComponent
> > > [2]
> > >
> >
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
> > >
> >
>


Possible bug in query sorting

2010-10-28 Thread Pablo Recio
Hi all. I'm having a problem with solr sorting search results.

When I try to make a query and sort it by title:

http://localhost:8983/solr/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on&sort=title%20desc

I get the error below [1]. If I try to sort by another indexed field it works;
indeed, if I rename the title field to titlx in the solr schema, for example,
it works.

Is it a bug? Does anyone have the same problem?

[1] HTTP ERROR: 500

501

java.lang.ArrayIndexOutOfBoundsException: 501
at 
org.apache.lucene.search.FieldCacheImpl$StringIndexCache.createValue(FieldCacheImpl.java:721)
at 
org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:224)
at 
org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:692)
at 
org.apache.lucene.search.FieldComparator$StringOrdValComparator.setNextReader(FieldComparator.java:667)
at 
org.apache.lucene.search.TopFieldCollector$OneComparatorNonScoringCollector.setNextReader(TopFieldCollector.java:94)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:249)
at org.apache.lucene.search.Searcher.search(Searcher.java:171)
at 
org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:988)
at 
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:884)
at 
org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:341)
at 
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:182)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at 
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at 
org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

RequestURI=*/solr/select/*

Powered by Jetty://


Re: Possible bug in query sorting

2010-10-29 Thread Pablo Recio
That's my schema XML (the XML itself was stripped by the list archive; of
the field definitions only the names "link" and "text" survive):

2010/10/28 Gora Mohanty 

> On Thu, Oct 28, 2010 at 5:18 PM, Michael McCandless
>  wrote:
> > Is it somehow possible that you are trying to sort by a multi-valued
> field?
> [...]
>
> Either that, or or your field gets processed into multiple tokens via the
> analyzer/tokenizer path in your schema. The reported error is a
> consequence of the fact that different documents might result in a
> different number of tokens.
>
> Please show us the part of schema.xml that defines the field type for
> the field "title".
>
> Regards,
> Gora
>
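
For completeness, the usual fix once a tokenized or multivalued field turns
out to be the culprit is to sort on an untokenized copy; a sketch, assuming
title is single-valued (names illustrative):

    <field name="title_sort" type="string" indexed="true" stored="false"/>
    <copyField source="title" dest="title_sort"/>

and then sort=title_sort desc in the query.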


need some guidance about how to configure a specific solr solution.

2011-08-11 Thread Roman, Pablo
Hi There,

I am IT and  work on a project based on Liferary 605 with solr-3.2 like the 
indexer/search engine.

I presently have only one server that is indexing and searching, but the
Liferay Support suggestions point to the need of having:
- 2 to n SOLR read-servers, for searching from any member of the Liferay cluster
- 1 SOLR write-server, where all Liferay cluster members write.

However, going down to the detail of implementing that on the Liferay side, I
think I know how to do it: editing these entries in the Solr plugin, in
solr-spring.xml in the WEB-INF/classes/META-INF folder. Open this file in a
text editor and you will see that there are two entries which define where the
Solr server can be found by Liferay: one pointing at
http://localhost:8080/solr/select and one at http://localhost:8080/solr/update.

However, I don't know how to replicate the writer solr server content into the 
readers. Please can you provide advice about that?

Thanks,
Pablo

This e-mail may contain confidential and/or privileged information for the sole 
use of the intended recipient. 
Any review or distribution by anyone other than the person for whom it was 
originally intended is strictly prohibited. 
If you have received this e-mail in error, please contact the sender and delete 
all copies. 
Opinions, conclusions or other information contained in this e-mail may not be 
that of the organization.
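
For the record, the piece the question is after is Solr's ReplicationHandler;
a sketch of the solrconfig.xml for both sides, with an illustrative host name:

    <!-- on the write server -->
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="master">
        <str name="replicateAfter">commit</str>
        <str name="confFiles">schema.xml,stopwords.txt</str>
      </lst>
    </requestHandler>

    <!-- on each read server -->
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="slave">
        <str name="masterUrl">http://write-server:8080/solr/replication</str>
        <str name="pollInterval">00:00:60</str>
      </lst>
    </requestHandler>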


Clustering not working when using 'text' field as snippet.

2011-08-12 Thread Pablo Queixalos
Hi,

 

 

I am using solr-3.3.0 and carrot² clustering which works fine out of the box 
with the examples doc and default solr configuration (the 'features' Field is 
used as snippet).

 

I indexed my own documents using the embed ExtractingRequestHandler wich by 
default stores contents in the 'text' Field. When configuring clustering on 
'text' as snippet, carrot doesn't work fine and only shows 'Other topics' with 
all the documents within. It looks like carrot doesn't get the 'text' Field 
stored content.

 

 

If I store the documents content in the 'features' field and get back to the 
original configuration clustering works fine.

 

The only difference I see between 'text' and 'features' Fields in schema.xml is 
that some CopyFields are defined for 'text'.

 

 

I didn't debug solr.clustering.ClusteringComponent nor CarrotClusteringEngine 
yet, but am I misunderstanding something about the 'text' Field ? 

 

 

Thanks,

 

Pablo.



RE: Clustering not working when using 'text' field as snippet.

2011-08-12 Thread Pablo Queixalos
Thanks for your reply Staszek,


Of course, the field has to be stored. I forgot to mention that I already
updated the schema for that. I also checked that the data was effectively
stored in that field.

Anyway, I tried to reproduce it on a fresh Solr install and clustering works 
well. ;-)


Pablo.

-----Original Message-----
From: stac...@gmail.com [mailto:stac...@gmail.com] On behalf of Stanislaw
Osinski
Sent: Friday, August 12, 2011 11:00
To: solr-user@lucene.apache.org
Subject: Re: Clustering not working when using 'text' field as snippet.

Hi Pablo,

The reason clustering doesn't work with the "text" field is that the field is
not stored:

<field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>

For clustering to work, you'll need to keep your documents' titles and content 
in stored fields.

Staszek


On Fri, Aug 12, 2011 at 10:28, Pablo Queixalos  wrote:

> Hi,
>
>
>
>
>
> I am using solr-3.3.0 and carrot² clustering which works fine out of 
> the box with the examples doc and default solr configuration (the 'features'
> Field is used as snippet).
>
>
>
> I indexed my own documents using the embed ExtractingRequestHandler 
> wich by default stores contents in the 'text' Field. When configuring 
> clustering on 'text' as snippet, carrot doesn't work fine and only shows 
> 'Other topics'
> with all the documents within. It looks like carrot doesn't get the 'text'
> Field stored content.
>
>
>
>
>
> If I store the documents content in the 'features' field and get back 
> to the original configuration clustering works fine.
>
>
>
> The only difference I see between 'text' and 'features' Fields in 
> schema.xml is that some CopyFields are defined for 'text'.
>
>
>
>
>
> I didn't debug solr.clustering.ClusteringComponent nor 
> CarrotClusteringEngine yet, but am I misunderstanding something about 
> the 'text' Field ?
>
>
>
>
>
> Thanks,
>
>
>
> Pablo.
>
>


Re: Adding docs from MySQL and php

2009-09-01 Thread Pablo Ferrari
Thanks Aakash!

I've looked at it and it looks very interesting, the problem is that my
database is a relational model, therefore I don't have a table with all the
information, but many tables related to each other by their ids (primary
keys and foreign keys).

I've been thinking about using DataImportHandler in any of this two ways:
- Write a script that creates a table with all the information I need for
searching (it is not very efficient because of duplicate data)
- Configure DataImportHandler with some JOIN SQL statement

I'll let you know how I did, thanks again!

Pablo
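
As a sketch of the second option (table and column names are made up), DIH's
data-config.xml can carry the JOIN directly:

    <dataConfig>
      <dataSource driver="com.mysql.jdbc.Driver"
                  url="jdbc:mysql://localhost/shopdb" user="solr" password="secret"/>
      <document>
        <entity name="product"
                query="SELECT p.id, p.name, c.label AS category
                       FROM product p JOIN category c ON c.id = p.category_id">
          <field column="id" name="id"/>
          <field column="name" name="name"/>
          <field column="category" name="category"/>
        </entity>
      </document>
    </dataConfig>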

2009/9/1 Aakash Dharmadhikari 

> hi Pablo,
>
>  DataImportHandler might be the best option for you. check this link
> http://wiki.apache.org/solr/DataImportHandler
>
> regards,
> aakash
>
> On Tue, Sep 1, 2009 at 9:18 PM, Pablo Ferrari  >wrote:
>
> > Hello all,
> >
> > I'm new to the list and new to Solr. My name is Pablo, I'm from Spain and
> > I'm developing a web site using Solr.
> >
> > I have Solr with the examples working correctly and now I would like to
> > load
> > the data from a MySQL database using php.
> > Is the best way to do this to write a php script that get the info from
> the
> > MySQL and then generates an XML document to load into Solr? Is there a
> > maximum size for this XML document? My MySQL database is quite big...
> >
> > Any help, book or internet tutorial you know will be really appreciated.
> >
> > Thank you!
> >
> > Pablo
> >
>


Re: Adding docs from MySQL and php

2009-09-01 Thread Pablo Ferrari
wow, it looks like DIH already works with relational databases... thanks
again!

2009/9/1 Pablo Ferrari 

> Thanks Aakash!
>
> I've looked at it and it looks very interesting, the problem is that my
> database is a relational model, therefore I don't have a table with all the
> information, but many tables related to each other by their ids (primary
> keys and foreign keys).
>
> I've been thinking about using DataImportHandler in any of this two ways:
> - Write a script that creates a table with all the information I need for
> searching (it is not very efficient because of duplicate data)
> - Configure DataImportHandler with some JOIN SQL statement
>
> I'll let you know how I did, thanks again!
>
> Pablo
>
> 2009/9/1 Aakash Dharmadhikari 
>
> hi Pablo,
>>
>>  DataImportHandler might be the best option for you. check this link
>> http://wiki.apache.org/solr/DataImportHandler
>>
>> regards,
>> aakash
>>
>> On Tue, Sep 1, 2009 at 9:18 PM, Pablo Ferrari > >wrote:
>>
>> > Hello all,
>> >
>> > I'm new to the list and new to Solr. My name is Pablo, I'm from Spain
>> and
>> > I'm developing a web site using Solr.
>> >
>> > I have Solr with the examples working correctly and now I would like to
>> > load
>> > the data from a MySQL database using php.
>> > Is the best way to do this to write a php script that get the info from
>> the
>> > MySQL and then generates an XML document to load into Solr? Is there a
>> > maximum size for this XML document? My MySQL database is quite big...
>> >
>> > Any help, book or internet tutorial you know will be really appreciated.
>> >
>> > Thank you!
>> >
>> > Pablo
>> >
>>
>
>


PhP, Solr and Delta Imports

2009-11-16 Thread Pablo Ferrari
Hello,

I have an already working Solr service based on full imports, connected via
php to a Zend Framework MVC (I connect it directly to the Controller).
I use the SolrClient class for php which is great:
http://www.php.net/manual/en/class.solrclient.php

For now, every time I want to edit a document I have to do a full import
again, or I can delete the document by its id and add it again with the
updated info...
Can anyone guide me a bit on how to do delta imports? If it's via php, even
better!

Thanks in advance,

Pablo Ferrari
Tinkerlabs.net


Control DIH from PHP

2009-11-19 Thread Pablo Ferrari
Hello!

After working on Solr document updates using direct php code (using the
SolrClient class), I want to use the DIH (Data Import Handler) to update my
documents.

Does anyone know how I can send commands to the DIH from php? Any idea or
tutorial will be of great help because I'm not finding anything useful so
far.

Thank you for you time!

Pablo
Tinkerlabs


Re: Control DIH from PHP

2009-11-19 Thread Pablo Ferrari
More specifically, I'm looking to update only one document using its Unique
ID: I don't want the DIH to look up the whole database because I already know
the Unique ID that has changed.

Pablo

2009/11/19 Pablo Ferrari 

>
>
> Hello!
>
> After been working in Solr documents updates using direct php code (using
> SolrClient class) I want to use the DIH (Data Import Handler) to update my
> documents.
>
> Any one knows how can I send commands to the DIH from php? Any idea or
> tutorial will be of great help because I'm not finding anything useful so
> far.
>
> Thank you for you time!
>
> Pablo
> Tinkerlabs
>


Re: Control DIH from PHP

2009-11-23 Thread Pablo Ferrari
Thank you

2009/11/21 Lance Norskog 

> Nice! I didn't notice that before. Very useful.
>
> 2009/11/19 Noble Paul നോബിള്‍  नोब्ळ् :
> > you can pass the uniqueId as a param and use it in a sql query
> >
> http://wiki.apache.org/solr/DataImportHandler#Accessing_request_parameters
> .
> > --Noble
> >
> > On Thu, Nov 19, 2009 at 3:53 PM, Pablo Ferrari 
> wrote:
> >> Most specificly, I'm looking to update only one document using it's
> Unique
> >> ID: I dont want the DIH to lookup the whole database because I already
> know
> >> the Unique ID that has changed.
> >>
> >> Pablo
> >>
> >> 2009/11/19 Pablo Ferrari 
> >>
> >>>
> >>>
> >>> Hello!
> >>>
> >>> After been working in Solr documents updates using direct php code
> (using
> >>> SolrClient class) I want to use the DIH (Data Import Handler) to update
> my
> >>> documents.
> >>>
> >>> Any one knows how can I send commands to the DIH from php? Any idea or
> >>> tutorial will be of great help because I'm not finding anything useful
> so
> >>> far.
> >>>
> >>> Thank you for you time!
> >>>
> >>> Pablo
> >>> Tinkerlabs
> >>>
> >>
> >
> >
> >
> > --
> > -
> > Noble Paul | Principal Engineer| AOL | http://aol.com
> >
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>
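
Spelling out Noble's suggestion, with an illustrative entity and id: the
entity query can reference a request parameter, and the import is then
triggered over HTTP, which is easy to call from php:

    <entity name="item"
            query="SELECT * FROM item WHERE id = '${dataimporter.request.id}'">

    http://localhost:8983/solr/dataimport?command=full-import&clean=false&commit=true&id=123

With clean=false, only the matching document is re-imported and updated in
the index.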


SEVERE: java.lang.NumberFormatException: For input string: "104708<"

2010-02-26 Thread Pablo Mercado
Hello,

Solr is raising the following exception when processing queries that
sort on an integer attribute.  The same queries and sorts have been
running fine in production for almost a year now.  If I run the query
without the sort on the integer attribute, the query runs fine.  If I
run a query that would return 0 results but still has a sort
parameter, the exception is raised.  The stack trace is the same no
matter what the query.

I need help troubleshooting this issue.  Any clues or suggested
approaches would be helpful.  Thank you in advance!

The stack trace is as follows:

SEVERE: java.lang.NumberFormatException: For input string: "104708<"
at 
java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
at java.lang.Integer.parseInt(Integer.java:456)
at java.lang.Integer.parseInt(Integer.java:497)
at 
org.apache.lucene.search.FieldCacheImpl$3.parseInt(FieldCacheImpl.java:148)
at 
org.apache.lucene.search.FieldCacheImpl$7.createValue(FieldCacheImpl.java:262)
at 
org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
at 
org.apache.lucene.search.FieldCacheImpl.getInts(FieldCacheImpl.java:245)
at 
org.apache.lucene.search.FieldCacheImpl.getInts(FieldCacheImpl.java:239)
at 
org.apache.lucene.search.FieldSortedHitQueue.comparatorInt(FieldSortedHitQueue.java:291)
at 
org.apache.lucene.search.FieldSortedHitQueue$1.createValue(FieldSortedHitQueue.java:188)
at 
org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
at 
org.apache.lucene.search.FieldSortedHitQueue.getCachedComparator(FieldSortedHitQueue.java:168)
at 
org.apache.lucene.search.FieldSortedHitQueue.(FieldSortedHitQueue.java:56)
at 
org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:907)
at 
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:838)
at 
org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:269)
at 
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:160)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:169)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at 
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at 
org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)


Our solr info is:
Solr Specification Version: 1.3.0
Solr Implementation Version: 1.3.0 694707 - grantingersoll - 2008-09-12 11:06:47
Lucene Specification Version: 2.4-dev
Lucene Implementation Version: 2.4-dev 691741 - 2008-09-03 15:25:16


Re: SEVERE: java.lang.NumberFormatException: For input string: "104708<"

2010-02-26 Thread Pablo Mercado
Thank you for taking the time to look at my issue and respond.

Do you have any suggestions for purging the document with this field
from the index?  Would that even help?

I do not know which document has the corrupt value, and searching for
the document with something like

pk_i:104708<

does not return a document with that value.

(pk_i is the integer field that we try to sort on and that,
presumably, has a non-integer value stored for some document)




On Fri, Feb 26, 2010 at 10:26, Yonik Seeley  wrote:
> One of your field values isn't a valid integer, it's "104708<"
> You're probably using the straight integer type in 1.3, which is meant
> for back compat with existing lucene indexes and currently doesn't do
> validation on it's input.
>
> For Solr 1.4, "int" is a new field type (example schema maps it to
> TrieIntField) that does do validation at index time, and is just as
> efficient for sorting.
>
> -Yonik
> http://www.lucidimagination.com
>
>
> On Fri, Feb 26, 2010 at 9:59 AM, Pablo Mercado  wrote:
>> Hello,
>>
>> Solr is raising the following exception when processing queries that
>> sort on integer attribute.  The same queries and sorts have been
>> running fine in production for almost a year now.   If I run the query
>> without the sort on the integer attribute, the query runs fine.  If I
>> run a query that would return 0 results, but still has a sort
>> parameter the exception is raised.  The stack trace is the same no
>> matter what the query.
>>
>> I need help troubleshooting this issue.  Any clues, or suggested
>> approaches would be helpful.  Thank you in advance!.
>>
>> The stack trace is as follows:
>>
>> SEVERE: java.lang.NumberFormatException: For input string: "104708<"
>>        at 
>> java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>>        at java.lang.Integer.parseInt(Integer.java:456)
>>        at java.lang.Integer.parseInt(Integer.java:497)
>>        at 
>> org.apache.lucene.search.FieldCacheImpl$3.parseInt(FieldCacheImpl.java:148)
>>        at 
>> org.apache.lucene.search.FieldCacheImpl$7.createValue(FieldCacheImpl.java:262)
>>        at 
>> org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
>>        at 
>> org.apache.lucene.search.FieldCacheImpl.getInts(FieldCacheImpl.java:245)
>>        at 
>> org.apache.lucene.search.FieldCacheImpl.getInts(FieldCacheImpl.java:239)
>>        at 
>> org.apache.lucene.search.FieldSortedHitQueue.comparatorInt(FieldSortedHitQueue.java:291)
>>        at 
>> org.apache.lucene.search.FieldSortedHitQueue$1.createValue(FieldSortedHitQueue.java:188)
>>        at 
>> org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
>>        at 
>> org.apache.lucene.search.FieldSortedHitQueue.getCachedComparator(FieldSortedHitQueue.java:168)
>>        at 
>> org.apache.lucene.search.FieldSortedHitQueue.(FieldSortedHitQueue.java:56)
>>        at 
>> org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:907)
>>        at 
>> org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:838)
>>        at 
>> org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:269)
>>        at 
>> org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:160)
>>        at 
>> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:169)
>>        at 
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>>        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
>>        at 
>> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
>>        at 
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
>>        at 
>> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
>>        at 
>> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
>>        at 
>> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>>        at 
>> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
>>        at 
>> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
>>        at 
>> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
>>        at 
>> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
>>        at 
>> org

Re: SEVERE: java.lang.NumberFormatException: For input string: "104708<"

2010-02-26 Thread Pablo Mercado
A big thanks to Yonik and Mark.  Using the raw term query I was able
to find the range(!) of documents that had bad integer field values.
Deleting those documents, committing and optimizing cleared up the
issue.

Still not sure how the bad values were inserted in the first place,
but that is another task.  Thanks again for being so helpful.



On Fri, Feb 26, 2010 at 11:29, Yonik Seeley  wrote:
> On Fri, Feb 26, 2010 at 10:59 AM, Mark Miller  wrote:
>> You have to find the document with the bad value somehow.
>>
>> In the past I have used Luke to help with this.
>>
>> Then you need to delete the document.
>
> You can also find the document with a raw term query.
>
> q={!raw f=myfield}104708<
>
> -Yonik
> http://www.lucidimagination.com
>


ubuntu lucid package

2010-04-29 Thread pablo platt
Hi

I've installed the solr-tomcat package on ubuntu lucid (10.04, the latest).
It automatically installs java and tomcat and hopefully all other
dependencies.
I can access tomcat at http://localhost:8080 but I'm not sure where to find
the solr web admin; http://localhost:8180 gives me nothing.

Is this package known to work? I've read that on previous ubuntu releases
the packages were broken.
Do I need to configure anything after installing the package?

Thanks


Re: ubuntu lucid package

2010-04-30 Thread pablo platt
http://localhost:8080/solr/admin/ gives me the solr admin.
thanks

On Fri, Apr 30, 2010 at 10:24 AM, Gora Mohanty  wrote:

> On Thu, 29 Apr 2010 19:54:49 -0700 (PDT)
> Otis Gospodnetic  wrote:
>
> > Pablo, Ubuntu Lucid is *brand* new :)
> >
> > try:
> > find / -name \*solr\*
> > or
> > locate solr.war
> [...]
>
> Also, the standard Debian/Ubuntu way of finding out what files a
> package installed is:
>  dpkg -l <package-name>
>
> Regards,
> Gora
>


Re: ubuntu lucid package

2010-04-30 Thread pablo platt
What parts don't work for you?
If there are bugs in the package, it would be great if you could report them
to make it better.

On Fri, Apr 30, 2010 at 1:50 PM, Olivier Dobberkau  wrote:

>
> Am 30.04.2010 um 09:24 schrieb Gora Mohanty:
>
> > Also, the standard Debian/Ubuntu way of finding out what files a
> > package installed is:
> >  dpkg -l <package-name>
> >
> > Regards,
> > Gora
>
> You might try:
>
> # dpkg -L solr-common
> /.
> /etc
> /etc/solr
> /etc/solr/web.xml
> /etc/solr/conf
> /etc/solr/conf/admin-extra.html
> /etc/solr/conf/elevate.xml
> /etc/solr/conf/mapping-ISOLatin1Accent.txt
> /etc/solr/conf/protwords.txt
> /etc/solr/conf/schema.xml
> /etc/solr/conf/scripts.conf
> /etc/solr/conf/solrconfig.xml
> /etc/solr/conf/spellings.txt
> /etc/solr/conf/stopwords.txt
> /etc/solr/conf/synonyms.txt
> /etc/solr/conf/xslt
> /etc/solr/conf/xslt/example.xsl
> /etc/solr/conf/xslt/example_atom.xsl
> /etc/solr/conf/xslt/example_rss.xsl
> /etc/solr/conf/xslt/luke.xsl
> /usr
> /usr/share
> /usr/share/solr
> /usr/share/solr/WEB-INF
> /usr/share/solr/WEB-INF/lib
> /usr/share/solr/WEB-INF/lib/apache-solr-core-1.4.0.jar
> /usr/share/solr/WEB-INF/lib/apache-solr-dataimporthandler-1.4.0.jar
> /usr/share/solr/WEB-INF/lib/apache-solr-solrj-1.4.0.jar
> /usr/share/solr/WEB-INF/weblogic.xml
> /usr/share/solr/scripts
> /usr/share/solr/scripts/abc
> /usr/share/solr/scripts/abo
> /usr/share/solr/scripts/backup
> /usr/share/solr/scripts/backupcleaner
> /usr/share/solr/scripts/commit
> /usr/share/solr/scripts/optimize
> /usr/share/solr/scripts/readercycle
> /usr/share/solr/scripts/rsyncd-disable
> /usr/share/solr/scripts/rsyncd-enable
> /usr/share/solr/scripts/rsyncd-start
> /usr/share/solr/scripts/rsyncd-stop
> /usr/share/solr/scripts/scripts-util
> /usr/share/solr/scripts/snapcleaner
> /usr/share/solr/scripts/snapinstaller
> /usr/share/solr/scripts/snappuller
> /usr/share/solr/scripts/snappuller-disable
> /usr/share/solr/scripts/snappuller-enable
> /usr/share/solr/scripts/snapshooter
> /usr/share/solr/admin
> /usr/share/solr/admin/_info.jsp
> /usr/share/solr/admin/action.jsp
> /usr/share/solr/admin/analysis.jsp
> /usr/share/solr/admin/analysis.xsl
> /usr/share/solr/admin/distributiondump.jsp
> /usr/share/solr/admin/favicon.ico
> /usr/share/solr/admin/form.jsp
> /usr/share/solr/admin/get-file.jsp
> /usr/share/solr/admin/get-properties.jsp
> /usr/share/solr/admin/header.jsp
> /usr/share/solr/admin/index.jsp
> /usr/share/solr/admin/jquery-1.2.3.min.js
> /usr/share/solr/admin/meta.xsl
> /usr/share/solr/admin/ping.jsp
> /usr/share/solr/admin/ping.xsl
> /usr/share/solr/admin/raw-schema.jsp
> /usr/share/solr/admin/registry.jsp
> /usr/share/solr/admin/registry.xsl
> /usr/share/solr/admin/replication
> /usr/share/solr/admin/replication/header.jsp
> /usr/share/solr/admin/replication/index.jsp
> /usr/share/solr/admin/schema.jsp
> /usr/share/solr/admin/solr-admin.css
> /usr/share/solr/admin/solr_small.png
> /usr/share/solr/admin/stats.jsp
> /usr/share/solr/admin/stats.xsl
> /usr/share/solr/admin/tabular.xsl
> /usr/share/solr/admin/threaddump.jsp
> /usr/share/solr/admin/threaddump.xsl
> /usr/share/solr/admin/debug.jsp
> /usr/share/solr/admin/dataimport.jsp
> /usr/share/solr/favicon.ico
> /usr/share/solr/index.jsp
> /usr/share/doc
> /usr/share/doc/solr-common
> /usr/share/doc/solr-common/changelog.Debian.gz
> /usr/share/doc/solr-common/README.Debian
> /usr/share/doc/solr-common/TODO.Debian
> /usr/share/doc/solr-common/copyright
> /usr/share/doc/solr-common/changelog.gz
> /usr/share/doc/solr-common/NOTICE.txt.gz
> /usr/share/doc/solr-common/README.txt.gz
> /var
> /var/lib
> /var/lib/solr
> /var/lib/solr/data
> /usr/share/solr/WEB-INF/lib/xml-apis.jar
> /usr/share/solr/WEB-INF/lib/xml-apis-ext.jar
> /usr/share/solr/WEB-INF/lib/slf4j-jdk14.jar
> /usr/share/solr/WEB-INF/lib/slf4j-api.jar
> /usr/share/solr/WEB-INF/lib/lucene-spellchecker.jar
> /usr/share/solr/WEB-INF/lib/lucene-snowball.jar
> /usr/share/solr/WEB-INF/lib/lucene-queries.jar
> /usr/share/solr/WEB-INF/lib/lucene-highlighter.jar
> /usr/share/solr/WEB-INF/lib/lucene-core.jar
> /usr/share/solr/WEB-INF/lib/lucene-analyzers.jar
> /usr/share/solr/WEB-INF/lib/jetty-util.jar
> /usr/share/solr/WEB-INF/lib/jetty.jar
> /usr/share/solr/WEB-INF/lib/commons-io.jar
> /usr/share/solr/WEB-INF/lib/commons-httpclient.jar
> /usr/share/solr/WEB-INF/lib/commons-fileupload.jar
> /usr/share/solr/WEB-INF/lib/commons-csv.jar
> /usr/share/solr/WEB-INF/lib/commons-codec.jar
> /usr/share/solr/WEB-INF/web.xml
> /usr/share/solr/conf
>
> If I recall correctly, some parts of Apache Solr will not work with the
> Ubuntu Lucid distribution.
>
> http://solr.dkd.local/update/extract
>  throws an error:
>
> The server encountered an internal error (lazy loading error
> org.apache.solr.common.SolrException: lazy loading error at
>
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:249)
> at
>
> Maybe someone from ubuntu reading this list can confirm this.
>
> Olivier
> --
>
> Olivier

Re: Licensing issue advice for Solr.

2017-03-24 Thread Pablo Pita Leira
No answer from my side, but if you'd like to search the mailing list, you
can try this:


http://markmail.org/search/?q=license+list%3Aorg.apache.lucene.solr-user


On 24.03.2017 18:53, russell.lemas...@comcast.net wrote:

Hi all,

I'm just getting started with Solr (6.4.2) and am trying to get approval for 
usage in my workplace.
I know that the product in general is licensed as Apache 2.0, but unfortunately 
there are packages
included in the build that are considered "non-permissive" by my company and as 
such, means that
I am having trouble getting things approved.

It appears that the vast majority of the licensing issues are within the 
contrib directory. I know these
provide significant functionality for Solr, but I was wondering if there is an 
official build that contains
just the Solr and Lucene server distribution (minus demos and contrib). Some of 
the packages are
dual licensed so I am able to deal with that by selecting which we wish to use, 
but there are some
that are either not licensed at all or are only non-permissive (ie: not Apache, 
BSD, MIT, etc.) like
GPL, CDDL, etc.

Has anyone had to deal with this in the past? My apologies if this has been
discussed before, but it doesn't appear that the mailing list archive has a
search option (correct me if I'm wrong on that).

Thanks







Extend the Solr Terms Component to implement a customized Autosuggest

2014-07-31 Thread Juan Pablo Albuja
Good afternoon guys, I would really appreciate it if someone in the community
could help me with the following issue:

I need to implement a Solr autosuggest that supports:

1.   Get autosuggestion over multivalued fields

2.   Case - Insensitiveness

3.   Look for content in the middle; for example, I have the value "Hello
World" indexed, and I need to get that value when the user types "wor"

4.   Filter by an additional field.

I was using the terms component because with it I can satisfy points 1 to 3,
but point 4 is not possible. I also looked at faceted searches and
NGrams/Edge-NGrams, but the problem with those approaches is that I need to
copy fields over to make them tokenized or to apply grams to them, and I don't
want to do that because I have more than 6 fields that need autosuggest; my
index is big (more than 400k documents) and I don't want to increase its size.
I tried to extend the terms component in order to add an additional filter,
but it uses a TermsEnum, which iterates the terms of a single field, and I
couldn't figure out how to filter it in a really efficient way.
Do you guys have an idea of how I can satisfy my requirements in an efficient
way? Another approach that doesn't use the terms component would also be
awesome.
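
For concreteness, the NGram approach I am trying to avoid would be something
like this (field and type names are illustrative):

    <fieldType name="text_ngram" class="solr.TextField">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="15"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
    <field name="name_ngram" type="text_ngram" indexed="true" stored="false"
           multiValued="true"/>
    <copyField source="name" dest="name_ngram"/>

A query like q=name_ngram:wor&fq=category:books&fl=name would then cover
points 1 to 4, but one such copy per field is exactly the index growth I
want to avoid.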

Thanks




Juan Pablo Albuja
Senior Developer




Grouping or Facet ?

2011-12-03 Thread Juan Pablo Mora
I need to do some counts on a StrField field to suggest options from two
different categories, and I don't know which option is best:

My schema looks:

- id
- name
- category: XX or YY

with Grouping I do:

http://localhost:8983/?q=name:prefix*&group=true&group.field=category

But I can change my schema to:

- id
- nameXX
- nameYY
- category: XX or YY (only 1 value in nameXX or nameYY)

With facet:
http://localhost:8983/?q=*:*&facet=true&facet.field=nameXX&facet.field=nameYY&facet.prefix=prefix


Which option has the best performance?

Best,
Juampa.

RE: Grouping or Facet ?

2011-12-05 Thread Juan Pablo Mora
Because I need both the count and the result returned to the client side.
Grouping and faceting each offer me a solution for that, but my doubt is
about performance ...

With Grouping my results are:

"grouped":{
"category":{
  "matches": ...,
  "groups":[{
  "groupValue":"categoryXX",
  "doclist":{"numFound":Important_number,"start":0,"docs":[
  {
   doc:id
   category:XX
  }  
   "groupValue":"categoryYY",
  "doclist":{"numFound":Important_number,"start":0,"docs":[
  {
   doc: id
   category:YY
  }  

And with faceting my results are :
"facet.prefix=whatever"
"facet_counts":{
"facet_queries":{},
"facet_fields":{
  "namesXX":[
"whatever_name_in_category",76,
...
  "namesYY":[
"whatever_name_in_category",76,
...

Both results are OK to me.



From: Erick Erickson [erickerick...@gmail.com]
Sent: Monday, December 5, 2011 14:48
To: solr-user@lucene.apache.org
Subject: Re: Grouping or Facet ?

Why not just use the first form of the document
and just facet.field=category? You'll get
two different facet counts for XX and YY
that way.

I don't think grouping is the way to go here.

Best
Erick

On Sat, Dec 3, 2011 at 6:43 AM, Juan Pablo Mora  wrote:
> I need to do some counts on a StrField field to suggest options from two
> different categories, and I don't know which option is best:
>
> My schema looks:
>
> - id
> - name
> - category: XX or YY
>
> with Grouping I do:
>
> http://localhost:8983/?q=name:prefix*&group=true&group.field=category
>
> But I can change my schema to:
>
> - id
> - nameXX
> - nameYY
> - category: XX or YY (only 1 value in nameXX or nameYY)
>
> With facet:
> http://localhost:8983/?q=*:*&facet=true&facet.field=nameXX&facet.field=nameYY&facet.prefix=prefix
>
>
> Which option has the best performance?
>
> Best,
> Juampa.


Re: Grouping or Facet ?

2011-12-09 Thread Juan Pablo Mora
Sorry if I didn't explain my problem clearly...

I need to build a suggester of names based on a prefix. My data come from two
categories of people, admins and developers for example. So when the client
types "SAN" my results should be:

Prefix: San
Developers: Sanchez Garcia, Juan (5)
   Sanchez Roman, Ivan (2)
   San...

Admins: Sanchez, Pedro (7)
Sanchez Garcia, Javier (2)


And the more common a name is, the higher it should rank. So I think it is
not possible to do that with grouping. So finally my schema will be:

id
nameDeveloper or nameAdmin: both String fields, but only one will have values
in a doc.

And my query with facet will be:

/q=*:*&facet=true&facet.field=nameDeveloper&facet.field=nameAdmin&facet.prefix=SAN&facet.mincount=1


If I try to do that with grouping I need something like
group.pivot=category,name, and that is not possible in Solr yet.
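
If I understand the pivot faceting work on trunk correctly, something like

    facet=true&facet.pivot=category,name&facet.limit=10

would give exactly this shape, but that is not in a released Solr yet.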


Best,
Juampa.



On 08/12/2011, at 02:23, Darren Govoni wrote:

> Yes. That's what I would expect. I guess I didn't understand when you said
> 
> "The facet counts are the counts of the *values* in that field"
> 
> Because it seems it's the count of the number of matching documents,
> irrespective of whether one document has 20 values for that field and
> another has 10; the facet count will be 2,
> one for each document in the results.
> 
> On 12/07/2011 09:04 AM, Erick Erickson wrote:
>> In your example you'll have 10 facets returned each with a value of 1.
>> 
>> Best
>> Erick
>> 
>> On Tue, Dec 6, 2011 at 9:54 AM,  wrote:
>>> Sorry to jump into this thread, but are you saying that the facet count is
>>> not # of result hits?
>>> 
>>> So if I have 1 document with field CAT that has 10 values and I do a query
>>> that returns this 1 document with faceting, that the CAT facet count will
>>> be 10 not 1? I don't seem to be seeing that behavior in my app (Solr 3.5).
>>> 
>>> Thanks.
>>> 
>>>> OK, I'm not understanding here. You get the counts and the results if you
>>>> facet
>>>> on a single category field. The facet counts are the counts of the
>>>> *values* in that
>>>> field. So it would help me if you showed the output of faceting on a
>>>> single
>>>> category field and why that didn't work for you
>>>> 
>>>> But either way, faceting will probably outperform grouping.
>>>> 
>>>> Best
>>>> Erick
>>>> 
>>>> On Mon, Dec 5, 2011 at 9:05 AM, Juan Pablo Mora  wrote:
>>>>> Because I need both the count and the result returned to the client
>>>>> side. Grouping and faceting each offer me a solution for that,
>>>>> but my doubt is about performance ...
>>>>> 
>>>>> With Grouping my results are:
>>>>> 
>>>>> "grouped":{
>>>>>"category":{
>>>>>  "matches": ...,
>>>>>  "groups":[{
>>>>>  "groupValue":"categoryXX",
>>>>>  "doclist":{"numFound":Important_number,"start":0,"docs":[
>>>>>  {
>>>>>   doc:id
>>>>>   category:XX
>>>>>  }
>>>>>   "groupValue":"categoryYY",
>>>>>  "doclist":{"numFound":Important_number,"start":0,"docs":[
>>>>>  {
>>>>>   doc: id
>>>>>   category:YY
>>>>>  }
>>>>> 
>>>>> And with faceting my results are :
>>>>> "facet.prefix=whatever"
>>>>> "facet_counts":{
>>>>>"facet_queries":{},
>>>>>"facet_fields":{
>>>>>  "namesXX":[
>>>>>"whatever_name_in_category",76,
>>>>>...
>>>>>  "namesYY":[
>>>>>"whatever_name_in_category",76,
>>>>>...
>>>>> 
>>>>> Both results are OK to me.
>>>>> 
>>>>> 
>>>>> 
>>>>> From: Erick Erickson [erickerick...@gmail.com]
>>>>> Sent: Monday, December 5, 2011 14:48
>>>>> To: solr-user@lucene.apache.org
>>>>> Subject: Re: Grouping or Facet ?
>>>>> 
>>>>> Why not just use the first form of the document
>>>>> and just facet.field=category? You'll get
>>>>> two different facet counts for XX and YY
>>>>> that way.
>>>>> 
>>>>> I don't think grouping is the way to go here.
>>>>> 
>>>>> Best
>>>>> Erick
>>>>> 
>>>>> On Sat, Dec 3, 2011 at 6:43 AM, Juan Pablo Mora
>>>>> wrote:
>>>>>> I need to do some counts on a StrField field to suggest options from
>>>>>> two different categories, and I don't know which option is best:
>>>>>> 
>>>>>> My schema looks:
>>>>>> 
>>>>>> - id
>>>>>> - name
>>>>>> - category: XX or YY
>>>>>> 
>>>>>> with Grouping I do:
>>>>>> 
>>>>>> http://localhost:8983/?q=name:prefix*&group=true&group.field=category
>>>>>> 
>>>>>> But I can change my schema to:
>>>>>> 
>>>>>> - id
>>>>>> - nameXX
>>>>>> - nameYY
>>>>>> - category: XX or YY (only 1 value in nameXX or nameYY)
>>>>>> 
>>>>>> With facet:
>>>>>> http://localhost:8983/?q=*:*&facet=true&facet.field=nameXX&facet.field=nameYY&facet.prefix=prefix
>>>>>> 
>>>>>> 
>>>>>> Which option has the best performance?
>>>>>> 
>>>>>> Best,
>>>>>> Juampa.
> 



RE: Solr Optimization Fail

2011-12-16 Thread Juan Pablo Mora
Maybe you are generating a snapshot of your index attached to the optimize?
Look for postCommit or postOptimize events in your solrconfig.xml.
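
That is, something like the stock example listener (the exact values will
differ in your config; this is the classic shape, not your actual file):

    <listener event="postOptimize" class="solr.RunExecutableListener">
      <str name="exe">snapshooter</str>
      <str name="dir">solr/bin</str>
      <bool name="wait">true</bool>
    </listener>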


From: Rajani Maski [rajinima...@gmail.com]
Sent: Friday, December 16, 2011 11:11
To: solr-user@lucene.apache.org
Subject: Solr Optimization Fail

Hi,

 When we do an optimize, it actually reduces the data size, right?

I have an index of size 6 GB (5 million documents). The index was already
created, with commits for every 1 documents.

Now I tried to do an optimization with the HTTP optimize command. When I
did that, the data size became 12 GB. Why might this have happened?

And can anyone please suggest a fix for it?

Thanks
Rajani


Implementing Search Suggestion on Solr

2010-10-18 Thread Pablo Recio Quijano

Hi!

I'm trying to implement some kind of search suggestion on a search
engine I have built. These search suggestions should not be automatic
like the ones described for the SpellCheckComponent [1].
I'm looking for something like:


"SAS oppositions" => "Public job offers for some-company"

So I will have to define them manually. I was thinking about synonyms [2],
but I don't know if that's the proper way to do it, because semantically
those terms are not synonyms.
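
To make the idea concrete, the synonym-file entry I had in mind would be an
explicit mapping like this (the entry itself is illustrative):

    # synonyms.txt, applied in the query-time analyzer
    sas oppositions => public job offers for some-company

though I have read that multi-word mappings like this can behave oddly at
query time, which adds to my doubt.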


Any ideas or suggestions?

Regards,

[1] http://wiki.apache.org/solr/SpellCheckComponent
[2] 
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory


RE: SOLR 3.3 DIH and Java 1.6

2012-03-20 Thread Juan Pablo Mora
Some versions of OpenJDK don't include the Rhino engine needed to run the
JavaScript in a DataImportHandler script transformer. You have to use the
Oracle JDK.
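
A quick way to check whether the JVM that runs Solr ships a JavaScript
engine at all (plain javax.script, nothing Solr-specific; a minimal sketch):

    import javax.script.ScriptEngine;
    import javax.script.ScriptEngineManager;

    public class CheckJs {
        public static void main(String[] args) {
            // Oracle JDK 6 prints the Rhino engine name; OpenJDK builds
            // without Rhino print "no JavaScript engine"
            ScriptEngine js = new ScriptEngineManager().getEngineByName("JavaScript");
            System.out.println(js == null ? "no JavaScript engine"
                                          : js.getFactory().getEngineName());
        }
    }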

Juampa.

From: randolf.julian [randolf.jul...@dominionenterprises.com]
Sent: Tuesday, March 20, 2012 5:41
To: solr-user@lucene.apache.org
Subject: SOLR 3.3 DIH and Java 1.6

I am trying to use the data import handler to update the SOLR index with
Oracle data. In the SOLR schema, a dynamic field called PHOTO_* has been
defined. I created a script transformer:

[script transformer snippet lost in the archive]

RE: problems with search in solr

2012-03-22 Thread Juan Pablo Mora
Remove the stemmer filter. "Caso" and "casa" are transformed into "cas" if you 
use the stemmer filter.

In Spanish: remove the stemmer filter, which is used to extract the root of
words; in your case the root of "casa" and "caso" is the same, "cas".

Regards.


From: PINA CORONADO, RAFAEL [rafael.p...@carm.es]
Sent: Thursday, March 22, 2012 13:38
To: solr-user@lucene.apache.org
Subject: problems with search in solr

Good morning:
I have a problem with the results Solr returns for a search string (e.g.
"caso"). It gives me back records with similar terms (in this example it
would return the same results as if searching for "casa").
The Solr version is 1.4.1.
The definition of the "text" type in the schema.xml file is:

[fieldtype definition lost in the archive]

Could you tell me if there is an error in the configuration and how to solve
it?

thanks

=
Rafael Pina Coronado
Servicio de Informática.
Archivo General de la Región de Murcia
Email: rafael.p...@carm.es
==



RE: Transform a SolrDocument into a SolrInputDocument

2011-03-21 Thread Juan Pablo Mora
I answered a similar question of my own here:

http://stackoverflow.com/questions/4037625/change-schema-in-solr-without-reindex

I hope it helps.
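
In SolrJ terms the manual copy is only a few lines anyway. A sketch, assuming
every field you need is stored (the id, field name and value are placeholders):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.common.SolrDocument;
    import org.apache.solr.common.SolrInputDocument;

    // re-index one document with a single field changed
    void updateField(SolrServer server, String id, String field, Object value)
            throws Exception {
        SolrDocument doc = server.query(new SolrQuery("id:" + id))
                                 .getResults().get(0);
        SolrInputDocument in = new SolrInputDocument();
        for (String name : doc.getFieldNames()) {
            in.addField(name, doc.getFieldValue(name)); // copy every stored field
        }
        in.setField(field, value);                      // overwrite the one to change
        server.add(in);
        server.commit();
    }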


De: Marc SCHNEIDER [marc.schneide...@gmail.com]
Enviado el: lunes, 21 de marzo de 2011 15:20
Para: solr-user@lucene.apache.org
Asunto: Re: Transform a SolrDocument into a SolrInputDocument

Hi Péter,

I'm not sure I understand your answer. A SolrDocument retrieved from a query
always contains only stored fields, so I don't see the problem.
I'd just like to update an existing stored field...

Thanks,
Marc.

2011/3/21 Péter Király 

> Hi Marc,
>
> as far as I know the best way to do it is to work from the original
> source, because it is possible that not all fields are stored, and
> the original content of the non-stored fields is not inside the Solr
> document.
>
> Péter
>
> 2011/3/21 Marc SCHNEIDER :
> > Hello,
> >
> > I'd like to know the fastest way (code lines) to update a field of a
> > document.
> > So my idea was:
> > 1) Get a SolrDocument
> > 2) Add all fields of the SolrDocument to a new SolrInputDocument
> > 3) Update the field in SolrInputDocument
> > 4) Add SolrInputDocument to the server and commit it
> >
> > Is there a faster way to do that? I mean transforming a SolrDocument
> into a
> > SolrInputDocument?
> >
> > Thanks in advance,
> > Regards,
> > Marc.
> >
>


Re: Matching on a multi valued field

2011-03-29 Thread Juan Pablo Mora
>> A multiValued field
>> is actually a single field with all data separated with positionIncrement.
>> Try setting that value high enough and use a PhraseQuery.


That is true but you cannot do things like:

q="bar* foo*"~10 with default query search.

and if you use dismax you will have the same problems with multivalued fields. 
Imagine the situation:

Doc1:
field A: ["foo bar","dooh"] 2 values

Doc2:
field A: ["bar dooh", "whatever"] Another 2 values

the query:
qt=dismax & qf= fieldA & q = ( bar dooh )

will return both Doc1 and Doc2. The only thing you can do in this situation is
to boost the phrase match in Doc2 with the pf parameter, in order to get Doc2
in the first position of the results:

pf = fieldA^1


Thanks,
JP.


On 29/03/2011, at 23:14, Markus Jelsma wrote:

> orly, all replies came in while sending =)
> 
>> Hi,
>> 
>> Your filter query is looking for a match of "man's friend" in a single
>> field. Regardless of analysis of the common_names field, all terms are
>> present in the common_names field of both documents. A multiValued field
>> is actually a single field with all data separated with positionIncrement.
>> Try setting that value high enough and use a PhraseQuery.
>> 
>> That should work
>> 
>> Cheers,
>> 
>>> Hi all,
>>> 
>>> I have a field set up like this:
>>> 
>>> >> stored="true" required="false" />
>>> 
>>> And I have some records:
>>> 
>>> RECORD1
>>> 
>>> 
>>>  man's best friend
>>>  pooch
>>> 
>>> 
>>> 
>>> RECORD2
>>> 
>>> 
>>>  man's worst enemy
>>>  friend to no one
>>> 
>>> 
>>> 
>>> Now if I do a search such as:
>>> http://localhost:8983/solr/search/?q=*:*&fq={!q.op=AND
>>> df=common_names}man's friend
>>> 
>>> Both records are returned. However, I only want RECORD1 returned. I
>>> understand why RECORD2 is returned but how can I structure my query so
>>> that only RECORD1 is returned?
>>> 
>>> Thanks,
>>> 
>>> Brian Lamb



Re: Matching on a multi valued field

2011-04-04 Thread Juan Pablo Mora
I have not found any solution to this. The only option is to denormalize your
multivalued field into several docs with a single-valued field.

Try the ComplexPhraseQueryParser
(https://issues.apache.org/jira/browse/SOLR-1604) if you are using Solr 1.4.


On 04/04/2011, at 21:21, Brian Lamb wrote:

I just noticed Juan's response, and I find that I am encountering that very
issue in a few cases. Boosting is a good way to put the more relevant results
at the top, but is it possible to only have the correct results returned?

On Wed, Mar 30, 2011 at 11:51 AM, Brian Lamb 
mailto:brian.l...@journalexperts.com>> wrote:
Thank you all for your responses. The field had already been set up with 
positionIncrementGap=100 so I just needed to add in the slop.
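
For anyone who lands on this thread later, the combination looks like this
(the gap lives on the fieldType; the slop of 50 is arbitrary, it just has to
stay below the gap):

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      ...
    </fieldType>
    <field name="common_names" type="text" indexed="true" stored="true"
           multiValued="true"/>

    fq={!q.op=AND df=common_names}"man's friend"~50

Because consecutive values sit 100 positions apart, a slop of 50 can never
bridge two values, so the phrase only matches within a single entry.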


On Tue, Mar 29, 2011 at 6:32 PM, Juan Pablo Mora 
mailto:jua...@informa.es>> wrote:
>> A multiValued field
>> is actually a single field with all data separated with positionIncrement.
>> Try setting that value high enough and use a PhraseQuery.


That is true but you cannot do things like:

q="bar* foo*"~10 with default query search.

and if you use dismax you will have the same problems with multivalued fields. 
Imagine the situation:

Doc1:
   field A: ["foo bar","dooh"] 2 values

Doc2:
   field A: ["bar dooh", "whatever"] Another 2 values

the query:
   qt=dismax & qf= fieldA & q = ( bar dooh )

will return both Doc1 and Doc2. The only thing you can do in this situation is 
boost phrase query in Doc2 with parameter pf in order to get Doc2 in the first 
position of the results:

pf = fieldA^1


Thanks,
JP.


On 29/03/2011, at 23:14, Markus Jelsma wrote:

> orly, all replies came in while sending =)
>
>> Hi,
>>
>> Your filter query is looking for a match of "man's friend" in a single
>> field. Regardless of analysis of the common_names field, all terms are
>> present in the common_names field of both documents. A multiValued field
>> is actually a single field with all data separated with positionIncrement.
>> Try setting that value high enough and use a PhraseQuery.
>>
>> That should work
>>
>> Cheers,
>>
>>> Hi all,
>>>
>>> I have a field set up like this:
>>>
>>> >> stored="true" required="false" />
>>>
>>> And I have some records:
>>>
>>> RECORD1
>>> 
>>>
>>>  man's best friend
>>>  pooch
>>>
>>> 
>>>
>>> RECORD2
>>> 
>>>
>>>  man's worst enemy
>>>  friend to no one
>>>
>>> 
>>>
>>> Now if I do a search such as:
>>> http://localhost:8983/solr/search/?q=*:*&fq={!q.op=AND<http://localhost:8983/solr/search/?q=*:*&fq=%7B!q.op=AND>
>>> df=common_names}man's friend
>>>
>>> Both records are returned. However, I only want RECORD1 returned. I
>>> understand why RECORD2 is returned but how can I structure my query so
>>> that only RECORD1 is returned?
>>>
>>> Thanks,
>>>
>>> Brian Lamb






Highlight, Dismax and local params

2011-04-18 Thread Juan Pablo Mora
Hello,

I think I have found something strange with local params and edismax. If I do
queries like:


"params":{
  "hl.requireFieldMatch":"true",
  "hl.fragsize":"200",
  "json.wrf":"callback0",
  "indent":"on",
  "hl.fl":"domicilio,deno",
  "wt":"json",
  "hl":"true",
  "rows":"5",
  "fl":"oidEmpresa,codNif,codTpoEmp,codVidaEmp,denoDef",
  "debugQuery":"on",
  "q":"{!edismax qf=$tipoDeno^5 pf=$tipoDeno^30 ps=5 qs=1}construcciones 
garcía",
  "tipoDeno":"deno",
  "f.domicilio.hl.alternateField":"domicilioDef",
  "fq":"-codTpoNif:F"}},

The highlighting section of the response looks like:


"highlighting":{
"75663":{
  "domicilio":["P45 FOO BAR"],
  "deno":["V00T06 FOO BAR"]},
"76021":{
  "domicilio":["P45 BLAH BLAH"],
  "deno":["V00T00 BLAH BLAH"]},

But if I repeat the query with:

 "q":"{!edismax qf='$tipoDeno^5 ANOTHER_FIELD' pf=$tipoDeno^30 ps=5 qs=1} 
construcciones garcía"
 tipoDeno = deno


The debug show:

"parsedquery":"+((DisjunctionMaxQuery((deno:construcciones)) 
DisjunctionMaxQuery((deno:garcia)))~2)",
"parsedquery_toString":"+(((deno:construcciones) (deno:garcia))~2)",

And there is no reference to the ANOTHER_FIELD field, and the highlighting of
the deno field disappears from the response.


"highlighting":{
"75663":{
  "domicilio":["P45 FOO BAR"],
"76021":{
  "domicilio":["P45 BLAH BLAH"],





Solr :: Snapshooter Cannot allocate memory

2008-08-29 Thread OLX - Pablo Garrido
Hello

We are facing this recurrent error on our master Solr server
every now and then:

SEVERE: java.io.IOException: Cannot run program
"/opt/solr/bin/snapshooter" (in directory "solr/bin"):
java.io.IOException: error=12, Cannot allocate memory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:459)
at java.lang.Runtime.exec(Runtime.java:593)
at
org.apache.solr.core.RunExecutableListener.exec(RunExecutableListener.java:70)
at
org.apache.solr.core.RunExecutableListener.postCommit(RunExecutableListener.java:97)
at
org.apache.solr.update.UpdateHandler.callPostCommitCallbacks(UpdateHandler.java:99)
at
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:514)
at
org.apache.solr.handler.XmlUpdateRequestHandler.update(XmlUpdateRequestHandler.java:214)
at
org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:84)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:77)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:658)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:191)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:159)
at org.mortbay.jetty.servlet.ServletHandler
$CachedChain.doFilter(ServletHandler.java:1089)
at
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at
org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at org.mortbay.jetty.HttpConnection
$RequestHandler.content(HttpConnection.java:835)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:723)
at
org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
at
org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at org.mortbay.jetty.bio.SocketConnector
$Connection.run(SocketConnector.java:226)
at org.mortbay.thread.BoundedThreadPool
$PoolThread.run(BoundedThreadPool.java:442)
Caused by: java.io.IOException: java.io.IOException: error=12, Cannot
allocate memory
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
at java.lang.ProcessImpl.start(ProcessImpl.java:65)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:452)
... 28 more


We have 3 Solr servers, one master and 2 slaves. A cron job commits
inserts/deletes on the master every 5 minutes, and the slaves rsync the
latest index version from the master every 10 minutes. All 3 servers have
this setup:

* RHEL 5.0 OS 64 bits
* 16 GB RAM

We are giving the Solr Java process 10 GB of RAM. Did anybody face this
error? Will the snapshooter process try to allocate memory from the 6 GB of
RAM available to the OS? Should we reduce the Solr Java process from 10 GB
to 8 GB?
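
One thing we found while digging: error=12 is the fork() behind Runtime.exec
failing, because the fork momentarily needs as much committed address space
as the whole 10 GB JVM. A commonly suggested check on Linux (the sysctl name
is real; whether to change it is site-specific, not a recommendation):

# 0 = heuristic overcommit, 1 = always allow, 2 = strict accounting
cat /proc/sys/vm/overcommit_memory
# allowing overcommit lets fork() from a large-heap JVM succeed
sysctl -w vm.overcommit_memory=1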

Thanks for your help


Pablo




Solr Slaves Sync

2008-09-04 Thread OLX - Pablo Garrido
Hello

We have a 3 Solr servers replication schema, one master and 2 slaves.
Commits are done every 5 minutes on the master and an optimize is done once
a day at midnight; snapshots are copied via rsync to the slaves every 10
minutes. We are facing serious problems with keeping the slaves serving
queries as usual when doing the sync after the optimize: active connections
to the slaves increase sharply during the post-optimize snapshot sync. Is
there any way we can tune this process? We tried this procedure:

1. stop the sync process on one slave
2. take the other one out of the LB pool
3. do the sync on this offline slave
4. after the sync is over, add the synced slave back to the LB pool
5. take the other slave out of the LB pool
6. start the sync process on this offline slave
7. add the synced slave back to the LB pool

Following these steps we sometimes face high active-connection counts when
moving slaves back into the LB pool. Has anybody faced this situation in
production environments? Thanks

Pablo
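
PS: one theory we are testing is that the connection spikes are just cold
caches on a freshly synced slave. A warming hook like the stock newSearcher
listener in solrconfig.xml (the query below is only illustrative) might
smooth the re-add:

    <listener event="newSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst><str name="q">our most common query</str><str name="rows">10</str></lst>
      </arr>
    </listener>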