is it appropriate to use external cache for whole shards

2018-02-26 Thread park
I'm indexing and searching documents using solr 6.x.
It is quite efficient when there are only a few shards and a few nodes in
the cluster. However, when the number of shards exceeds 30 and each shard
is around 30 GB, search performance drops significantly.
We already make heavy use of Solr's user caches, so we are planning a
queryResultCache that spans the entire set of shards.
Is an external cache (for example, Redis, Memcached, Apache Ignite, etc.)
the right solution for this?
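
(For reference, a minimal sketch of the pattern being asked about, assuming
SolrJ 6.x; the URL and collection are placeholders, and a ConcurrentHashMap
stands in for the external store such as Redis or Memcached, which would
additionally need serialization and invalidation on commit:)

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrDocumentList;

public class CachedSearcher {
    // Stand-in for an external cache; a real deployment would
    // serialize results into Redis/Memcached/Ignite instead.
    private final Map<String, SolrDocumentList> cache = new ConcurrentHashMap<>();
    private final SolrClient solr =
            new HttpSolrClient.Builder("http://localhost:8983/solr/collection1").build();

    public SolrDocumentList search(String q) throws Exception {
        SolrDocumentList hit = cache.get(q);
        if (hit != null) {
            return hit; // served from the cache: no shard fan-out at all
        }
        SolrDocumentList results = solr.query(new SolrQuery(q)).getResults();
        cache.put(q, results); // must be invalidated when the index changes
        return results;
    }
}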




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: is it appropriate to use external cache for whole shards

2018-03-01 Thread park
Thank you for the answer. We will improve our system based on what you said.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


SolrJ: SolrQuery and ModifiableSolrParams

2014-03-18 Thread Cynthia Park
Hello,

What is the difference between setting parameters via SolrQuery vs.
ModifiableSolrParams? If there is a difference, is there a preferred
choice? I'm using Solr 4.6.1.

SolrQuery query = new SolrQuery();
query.setParam("wt", "json");

ModifiableSolrParams params = new ModifiableSolrParams();

params.set("wt", "json");
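
(For what it's worth, a small sketch of how the two relate, assuming stock
SolrJ 4.x: SolrQuery extends ModifiableSolrParams, so the two snippets above
build equivalent parameter sets, and SolrQuery merely adds typed convenience
setters on top:)

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.common.params.ModifiableSolrParams;

public class ParamsDemo {
    public static void main(String[] args) {
        // SolrQuery: typed helpers over the same underlying param map.
        SolrQuery query = new SolrQuery();
        query.setParam("wt", "json");
        query.setRows(10); // equivalent to setParam("rows", "10")

        // ModifiableSolrParams: the raw name/value form.
        ModifiableSolrParams params = new ModifiableSolrParams();
        params.set("wt", "json");
        params.set("rows", 10);

        // Both serialize to the same request parameters.
        System.out.println(query.get("rows").equals(params.get("rows"))); // true
    }
}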


Re: [ANNOUNCE] Apache Solr 4.5.1 released.

2013-10-24 Thread Jack Park
Download redirects to 4.5.0
Is there a typo in the server path?

On Thu, Oct 24, 2013 at 9:14 AM, Mark Miller  wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
>
> October 2013, Apache Solr™ 4.5.1 available
>
> The Lucene PMC is pleased to announce the release of Apache Solr 4.5.1
>
> Solr is the popular, blazing fast, open source NoSQL search platform
> from the Apache Lucene project. Its major features include powerful
> full-text search, hit highlighting, faceted search, dynamic clustering,
> database integration, rich document (e.g., Word, PDF) handling, and
> geospatial search. Solr is highly scalable, providing fault tolerant
> distributed search and indexing, and powers the search and navigation
> features of many of the world's largest internet sites.
>
> Solr 4.5.1 includes 16 bug fixes as well as Lucene 4.5.1 and its bug
> fixes. The release is available for immediate download at:
>
> http://lucene.apache.org/solr/mirrors-solr-latest-redir.html
>
>
> See the CHANGES.txt file included with the release for a full list of
> changes and further details.
>
> Please report any feedback to the mailing lists
> (http://lucene.apache.org/solr/discussion.html)
>
> Note: The Apache Software Foundation uses an extensive mirroring network
> for distributing releases. It is possible that the mirror you are using
> may not have replicated the release yet. If that is the case, please try
> another mirror. This also goes for Maven access.
>
> Happy searching,
>
> Lucene/Solr developers
> -BEGIN PGP SIGNATURE-
> Version: GnuPG v1.4.14 (GNU/Linux)
> Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
>
> iQIcBAEBAgAGBQJSaUdSAAoJED+/0YJ4eWrI90UP/RGSmLBdvrc/5NZEb7LSCSjW
> z4D3wJ2i4a0rLpiW2qA547y/NZ5KZcmrDSzJu0itf8Q/0q+tm7/d30uPg/cdRlgl
> wGERcxsyfPfTqBjzdSNNGgNm++tnkkqRJbYEfsG5ApWrKicitU7cPb82m8oCdlnn
> 4wnhYt6tfu/EPCglt9ixF7Ukv5o7txMnwWGmkGTbUt8ugp9oOMN/FfGHex/FVxcF
> xHhWBLymIJy24APEEF/Mq3UW12hQT+aRof66xBch0fEPVlbDitBa9wNuRNQ98M90
> ZpTl8o0ITMUKjTKNkxZJCO5LQeNwhYaOcM5nIykGadWrXBZo5Ob611ZKeYPZBWCW
> Ei88dwJQkXaDcVNLZ/HVcAePjmcALHd3nc4uNfcJB8zvgZOPagMpXW2rRSXFACHM
> FdaRezTdH8Uh5zp2n3hsqYCbpDreRoXGXaiOgVZ+8EekVMGYUnMFKdqNlqhVnF6r
> tzp+aaCBhGDUD5xUw2w2fb5c9Jh1oIQ9f7fsVH78kgsHShySnte3NbfoFWUClPMX
> PwrfWuZpmu9In2ZiJVYSOD6MBqmJ+z3N1bnf1kqsitv7MonkvQkOoDIafW835vG9
> 3aajknE1vazOATSGHIxCtJfqzTEqeqFqVbjG/qS72XIhMey8tVAwjrjcgFnayk9Z
> xrG1W1o2sjrYkioJ7nZK
> =8++G
> -END PGP SIGNATURE-


Re: [ANNOUNCE] Apache Solr 4.5.1 released.

2013-10-24 Thread Jack Park
Using a different server than the default gets 4.5.1.

On Thu, Oct 24, 2013 at 9:35 AM, Jack Park  wrote:
> Download redirects to 4.5.0
> Is there a typo in the server path?
>
> On Thu, Oct 24, 2013 at 9:14 AM, Mark Miller  wrote:
>> -BEGIN PGP SIGNED MESSAGE-
>> Hash: SHA1
>>
>> October 2013, Apache Solr™ 4.5.1 available
>>
>> The Lucene PMC is pleased to announce the release of Apache Solr 4.5.1
>>
>> Solr is the popular, blazing fast, open source NoSQL search platform
>> from the Apache Lucene project. Its major features include powerful
>> full-text search, hit highlighting, faceted search, dynamic clustering,
>> database integration, rich document (e.g., Word, PDF) handling, and
>> geospatial search. Solr is highly scalable, providing fault tolerant
>> distributed search and indexing, and powers the search and navigation
>> features of many of the world's largest internet sites.
>>
>> Solr 4.5.1 includes 16 bug fixes as well as Lucene 4.5.1 and its bug
>> fixes. The release is available for immediate download at:
>>
>> http://lucene.apache.org/solr/mirrors-solr-latest-redir.html
>>
>>
>> See the CHANGES.txt file included with the release for a full list of
>> changes and further details.
>>
>> Please report any feedback to the mailing lists
>> (http://lucene.apache.org/solr/discussion.html)
>>
>> Note: The Apache Software Foundation uses an extensive mirroring network
>> for distributing releases. It is possible that the mirror you are using
>> may not have replicated the release yet. If that is the case, please try
>> another mirror. This also goes for Maven access.
>>
>> Happy searching,
>>
>> Lucene/Solr developers
>> -BEGIN PGP SIGNATURE-
>> Version: GnuPG v1.4.14 (GNU/Linux)
>> Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
>>
>> iQIcBAEBAgAGBQJSaUdSAAoJED+/0YJ4eWrI90UP/RGSmLBdvrc/5NZEb7LSCSjW
>> z4D3wJ2i4a0rLpiW2qA547y/NZ5KZcmrDSzJu0itf8Q/0q+tm7/d30uPg/cdRlgl
>> wGERcxsyfPfTqBjzdSNNGgNm++tnkkqRJbYEfsG5ApWrKicitU7cPb82m8oCdlnn
>> 4wnhYt6tfu/EPCglt9ixF7Ukv5o7txMnwWGmkGTbUt8ugp9oOMN/FfGHex/FVxcF
>> xHhWBLymIJy24APEEF/Mq3UW12hQT+aRof66xBch0fEPVlbDitBa9wNuRNQ98M90
>> ZpTl8o0ITMUKjTKNkxZJCO5LQeNwhYaOcM5nIykGadWrXBZo5Ob611ZKeYPZBWCW
>> Ei88dwJQkXaDcVNLZ/HVcAePjmcALHd3nc4uNfcJB8zvgZOPagMpXW2rRSXFACHM
>> FdaRezTdH8Uh5zp2n3hsqYCbpDreRoXGXaiOgVZ+8EekVMGYUnMFKdqNlqhVnF6r
>> tzp+aaCBhGDUD5xUw2w2fb5c9Jh1oIQ9f7fsVH78kgsHShySnte3NbfoFWUClPMX
>> PwrfWuZpmu9In2ZiJVYSOD6MBqmJ+z3N1bnf1kqsitv7MonkvQkOoDIafW835vG9
>> 3aajknE1vazOATSGHIxCtJfqzTEqeqFqVbjG/qS72XIhMey8tVAwjrjcgFnayk9Z
>> xrG1W1o2sjrYkioJ7nZK
>> =8++G
>> -END PGP SIGNATURE-


First test cloud error question...

2013-10-24 Thread Jack Park
Background: all testing done on a Win7 platform. This is my first
migration from a single Solr server to a simple cloud. Everything is
configured exactly as specified in the wiki.

I created a simple 3-node cloud, all on localhost with different server
URLs, and a lone external ZooKeeper. The online admin console shows they
are all up.

I then start an agent which sends in documents to "bootstrap" the
index. That's when issues start.  A clip from the log shows this:
First, I create a SolrDocument with this JSON data:

DEBUG 2013-10-24 18:00:09,143 [main] - SolrCloudClient.mapToDocument-
{"locator":"EssayNodeType","smallIcon":"\/images\/cogwheel.png","subOf":["NodeType"],"details":["The
TopicQuests NodeTypes typology essay
type."],"isPrivate":"false","creatorId":"SystemUser","label":["Essay
Type"],"largeIcon":"\/images\/cogwheel_sm.png","lastEditDate":Thu Oct
24 18:00:09 PDT 2013,"createdDate":Thu Oct 24 18:00:09 PDT 2013}

Then, send it in from SolrJ which has a CloudSolrServer initialized
with localhost:2181 and an instance of LBHttpSolrServer initialized
with http://localhost:8983/solr/

That trace follows

INFO  2013-10-24 18:00:09,145 [main] - Initiating client connection,
connectString=localhost:2181 sessionTimeout=1
watcher=org.apache.solr.common.cloud.ConnectionManager@e6c
INFO  2013-10-24 18:00:09,148 [main] - Waiting for client to connect
to ZooKeeper
INFO  2013-10-24 18:00:09,150 [main-SendThread(0:0:0:0:0:0:0:1:2181)]
- Opening socket connection to server
0:0:0:0:0:0:0:1/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate
using SASL (Unable to locate a login configuration)
ERROR 2013-10-24 18:00:09,151 [main-SendThread(0:0:0:0:0:0:0:1:2181)]
- Unable to open socket to 0:0:0:0:0:0:0:1/0:0:0:0:0:0:0:1:2181
WARN  2013-10-24 18:00:09,151 [main-SendThread(0:0:0:0:0:0:0:1:2181)]
- Session 0x0 for server null, unexpected error, closing socket
connection and attempting reconnect
java.net.SocketException: Address family not supported by protocol
family: connect
at sun.nio.ch.Net.connect(Native Method)
at sun.nio.ch.SocketChannelImpl.connect(Unknown Source)
at 
org.apache.zookeeper.ClientCnxnSocketNIO.registerAndConnect(ClientCnxnSocketNIO.java:266)

I can watch the Zookeeper console running; it's mostly complaining
about too many connections from /127.0.0.1 ; I am seeing the errors in
the agent's log file.

Following that trace in the log is this:

INFO  2013-10-24 18:00:09,447 [main-SendThread(127.0.0.1:2181)] -
Opening socket connection to server 127.0.0.1/127.0.0.1:2181. Will not
attempt to authenticate using SASL (Unable to locate a login
configuration)
INFO  2013-10-24 18:00:09,448 [main-SendThread(127.0.0.1:2181)] -
Socket connection established to 127.0.0.1/127.0.0.1:2181, initiating
session
DEBUG 2013-10-24 18:00:09,449 [main-SendThread(127.0.0.1:2181)] -
Session establishment request sent on 127.0.0.1/127.0.0.1:2181
DEBUG 2013-10-24 18:00:09,449 [main-SendThread(127.0.0.1:2181)] -
Could not retrieve login configuration: java.lang.SecurityException:
Unable to locate a login configuration
INFO  2013-10-24 18:00:09,501 [main-SendThread(127.0.0.1:2181)] -
Session establishment complete on server 127.0.0.1/127.0.0.1:2181,
sessionid = 0x141ece7e6160017, negotiated timeout = 1
INFO  2013-10-24 18:00:09,501 [main-EventThread] - Watcher
org.apache.solr.common.cloud.ConnectionManager@42bad8a8
name:ZooKeeperConnection Watcher:localhost:2181 got event WatchedEvent
state:SyncConnected type:None path:null path:null type:None
INFO  2013-10-24 18:00:09,502 [main] - Client is connected to ZooKeeper
DEBUG 2013-10-24 18:00:09,502 [main-SendThread(127.0.0.1:2181)] -
Could not retrieve login configuration: java.lang.SecurityException:
Unable to locate a login configuration
DEBUG 2013-10-24 18:00:09,502 [main-SendThread(127.0.0.1:2181)] -
Could not retrieve login configuration: java.lang.SecurityException:
Unable to locate a login configuration
DEBUG 2013-10-24 18:00:09,503 [main-SendThread(127.0.0.1:2181)] -
Could not retrieve login configuration: java.lang.SecurityException:
Unable to locate a login configuration
DEBUG 2013-10-24 18:00:09,503 [main-SendThread(127.0.0.1:2181)] -
Could not retrieve login configuration: java.lang.SecurityException:
Unable to locate a login configuration
DEBUG 2013-10-24 18:00:09,504 [main-SendThread(127.0.0.1:2181)] -
Could not retrieve login configuration: java.lang.SecurityException:
Unable to locate a login configuration
DEBUG 2013-10-24 18:00:09,504 [main-SendThread(127.0.0.1:2181)] -
Could not retrieve login configuration: java.lang.SecurityException:
Unable to locate a login configuration
DEBUG 2013-10-24 18:00:09,505 [main-SendThread(127.0.0.1:2181)] -
Could not retrieve login configuration: java.lang.SecurityException:
Unable to locate a login configuration
DEBUG 2013-10-24 18:00:09,506 [main-SendThread(127.0.0.1:2181)] -
Reading reply sessionid:0x141ece7e6160017, packet:: clientPath:null
serverPath:null finished:false header:: 1,3  replyHeader:: 1,541,0
request:: '/clus

Re: First test cloud error question...

2013-10-25 Thread Jack Park
Focus turned to the issue of " Unable to open socket to
0:0:0:0:0:0:0:1/0:0:0:0:0:0:0:1:2181"

That's apparently been problematic for others as well. It might be at the root of this.
I believe I am able to prove zookeeper is running by asking its
status, which reports at least something.
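
(This particular SocketException is the well-known Java-on-Windows IPv6
quirk, where "localhost" resolves to ::1 before 127.0.0.1. A hedged
workaround sketch, assuming the stock ZooKeeper client API: prefer the
IPv4 stack, or connect to 127.0.0.1 explicitly:)

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class ZkConnectTest {
    public static void main(String[] args) throws Exception {
        // Same effect as passing -Djava.net.preferIPv4Stack=true to the JVM.
        System.setProperty("java.net.preferIPv4Stack", "true");
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 15000, new Watcher() {
            public void process(WatchedEvent event) {
                System.out.println("ZK event: " + event.getState());
            }
        });
        Thread.sleep(2000); // crude wait, long enough to see SyncConnected
        System.out.println("state = " + zk.getState());
        zk.close();
    }
}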

I moved the entire SolrCloud installation over to an Ubuntu box.
There, I see a new problem: zookeeper doesn't appear to be running,
even though it says "STARTED" after booting, but gives no further
console messages and returns to the command line as if it ended. If I
ask zkServer status, it says "probably not running".

In the data directory (/tmp/zookeeper -- same for both windoz and
nix), in nix, there is a pid file and nothing else. In the windoz
data, there is no pid file, but a /version-2 directory with what
appear to be runtime log files -- not the debug ones. Neither
installation shows a log4j log file anywhere.

I have reason to believe I followed all the instructions in the
ZooKeeper Getting Started page accurately. Still, no real cigar...

Java on windoz is 1.6.0_31; on ubuntu it is 1.7.0_40

Thanks in advance for any hints.

On Thu, Oct 24, 2013 at 6:24 PM, Jack Park  wrote:
> Background: all testing done on a Win7 platform. This is my first
> migration from a single Solr server to a simple cloud. Everything is
> configured exactly as specified in the wiki.
>
> I created a simple 3-node cloud, all on localhost with different server
> URLs, and a lone external ZooKeeper. The online admin console shows they
> are all up.
>
> I then start an agent which sends in documents to "bootstrap" the
> index. That's when issues start.  A clip from the log shows this:
> First, I create a SolrDocument with this JSON data:
>
> DEBUG 2013-10-24 18:00:09,143 [main] - SolrCloudClient.mapToDocument-
> {"locator":"EssayNodeType","smallIcon":"\/images\/cogwheel.png","subOf":["NodeType"],"details":["The
> TopicQuests NodeTypes typology essay
> type."],"isPrivate":"false","creatorId":"SystemUser","label":["Essay
> Type"],"largeIcon":"\/images\/cogwheel_sm.png","lastEditDate":Thu Oct
> 24 18:00:09 PDT 2013,"createdDate":Thu Oct 24 18:00:09 PDT 2013}
>
> Then, send it in from SolrJ which has a CloudSolrServer initialized
> with localhost:2181 and an instance of LBHttpSolrServer initialized
> with http://localhost:8983/solr/
>
> That trace follows
>
> INFO  2013-10-24 18:00:09,145 [main] - Initiating client connection,
> connectString=localhost:2181 sessionTimeout=1
> watcher=org.apache.solr.common.cloud.ConnectionManager@e6c
> INFO  2013-10-24 18:00:09,148 [main] - Waiting for client to connect
> to ZooKeeper
> INFO  2013-10-24 18:00:09,150 [main-SendThread(0:0:0:0:0:0:0:1:2181)]
> - Opening socket connection to server
> 0:0:0:0:0:0:0:1/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate
> using SASL (Unable to locate a login configuration)
> ERROR 2013-10-24 18:00:09,151 [main-SendThread(0:0:0:0:0:0:0:1:2181)]
> - Unable to open socket to 0:0:0:0:0:0:0:1/0:0:0:0:0:0:0:1:2181
> WARN  2013-10-24 18:00:09,151 [main-SendThread(0:0:0:0:0:0:0:1:2181)]
> - Session 0x0 for server null, unexpected error, closing socket
> connection and attempting reconnect
> java.net.SocketException: Address family not supported by protocol
> family: connect
> at sun.nio.ch.Net.connect(Native Method)
> at sun.nio.ch.SocketChannelImpl.connect(Unknown Source)
> at 
> org.apache.zookeeper.ClientCnxnSocketNIO.registerAndConnect(ClientCnxnSocketNIO.java:266)
>
> I can watch the Zookeeper console running; it's mostly complaining
> about too many connections from /127.0.0.1 ; I am seeing the errors in
> the agent's log file.
>
> Following that trace in the log is this:
>
> INFO  2013-10-24 18:00:09,447 [main-SendThread(127.0.0.1:2181)] -
> Opening socket connection to server 127.0.0.1/127.0.0.1:2181. Will not
> attempt to authenticate using SASL (Unable to locate a login
> configuration)
> INFO  2013-10-24 18:00:09,448 [main-SendThread(127.0.0.1:2181)] -
> Socket connection established to 127.0.0.1/127.0.0.1:2181, initiating
> session
> DEBUG 2013-10-24 18:00:09,449 [main-SendThread(127.0.0.1:2181)] -
> Session establishment request sent on 127.0.0.1/127.0.0.1:2181
> DEBUG 2013-10-24 18:00:09,449 [main-SendThread(127.0.0.1:2181)] -
> Could not retrieve login configuration: java.lang.SecurityException:
> Unable to locate a login configuration
> INFO  2013-10-24 18:00:09,501 [main-SendThread(127.0.0.1:2181)] -
> Session establishment complete on server 127.0.0.1/127.0.0.1:2181,
> sessionid = 0x141ece7e6160017, negotiated timeou

Simple (?) zookeeper question

2013-10-31 Thread Jack Park
Latest zookeeper is installed on an Ubuntu server box.
Java is 1.7 latest build.
whereis points to java just fine.
/etc/zookeeper is empty.

boot zookeeper from /bin as sudo ./zkServer.sh start
Console says "Started"
/etc/zookeeper now has a .pid file
In another console, ./zkServer.sh status returns:
"It's probably not running"

An interesting fact: the log4j.properties file says there should be a
zookeeper.log file in "."; there is no log file. When I do a text
search in the zookeeper source code for where it picks up the
log4j.properties, nothing is found.

Fascinating, what?  This must be a common beginner's question, not
well covered in web-search for my context. Does it ring any bells?

Many thanks.
Jack


Re: Simple (?) zookeeper question

2013-10-31 Thread Jack Park
After digging deeper (slow going for a *nix newbie), I uncovered issues with
the java installation. A step in the installation of Oracle Java has you run
update-alternatives to install "java" with the path to /bin/java. That done,
zookeeper seems to be running.

I booted three cores (on the same box) -- this is the simple one-box
3-node cloud test, and used the test code from the Lucidworks course
to send over and read some documents. That failed with this:
Unknown document router '{name=compositeId}'

Lots more research.
Closer...

On Thu, Oct 31, 2013 at 5:44 PM, Jack Park  wrote:
> Latest zookeeper is installed on an Ubuntu server box.
> Java is 1.7 latest build.
> whereis points to java just fine.
> /etc/zookeeper is empty.
>
> boot zookeeper from /bin as sudo ./zkServer.sh start
> Console says "Started"
> /etc/zookeeper now has a .pid file
> In another console, ./zkServer.sh status returns:
> "It's probably not running"
>
> An interesting fact: the log4j.properties file says there should be a
> zookeeper.log file in "."; there is no log file. When I do a text
> search in the zookeeper source code for where it picks up the
> log4j.properties, nothing is found.
>
> Fascinating, what?  This must be a common beginner's question, not
> well covered in web-search for my context. Does it ring any bells?
>
> Many thanks.
> Jack


Re: Simple (?) zookeeper question

2013-11-01 Thread Jack Park
Alan,
That was brilliant!
My test harness was behind a couple of notches.

Hah! So, now we open yet another can of strange looking creatures, namely:

No live SolrServers available to handle this
request:[http://127.0.1.1:8983/solr/collection1]
at 
org.apache.solr.client.solrj.impl.CloudSolrServer.directUpdate(CloudSolrServer.java:347)

3 times, once for each URL I passed into the server. Here is the code:

String zkurl = "10.1.10.178:2181";
String solrurla = "10.1.10.178:8983";
String solrurlb = "10.1.10.178:7574";
String solrurlc = "10.1.10.178:7590";

LBHttpSolrServer sv = new LBHttpSolrServer(solrurla,solrurlb,solrurlc);
CloudSolrServer server = new CloudSolrServer(zkurl,sv);
server.setDefaultCollection("collection1");
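
(One thing worth noting in passing: LBHttpSolrServer normally expects full
base URLs, scheme and /solr context path included, rather than bare
host:port pairs. A hedged sketch of that form, reusing the addresses above:)

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.impl.LBHttpSolrServer;

public class CloudClientSetup {
    public static void main(String[] args) throws Exception {
        // Full base URLs, not bare host:port pairs.
        LBHttpSolrServer lb = new LBHttpSolrServer(
                "http://10.1.10.178:8983/solr",
                "http://10.1.10.178:7574/solr",
                "http://10.1.10.178:7590/solr");
        CloudSolrServer server = new CloudSolrServer("10.1.10.178:2181", lb);
        server.setDefaultCollection("collection1");
    }
}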

I am struggling to imagine how 10.1.10.178 got translated to 127.0.1.1
and the port assignments ignored for each URL passed in.

That error message seems well known to search engines. One suggestion
is to check the zookeeper logs.  According to the zookeeper's log4j
properties, there should be a zookeeper.log in the zookeeper
directory. There is no such log. I went to /etc/zookeeper/Version_2
and looked at log.1 (binary) but could see hints that this might be
where the 127.0.1.1 is coming from: zookeeper sending such an error
message back. This would suggest that, somehow or other, my nodes are
not properly registering themselves, though no error messages were
tossed when each node was booted.

solr.log for node1 only reflects queries from the admin page.

That's what I am working on now.
Thanks!

On Fri, Nov 1, 2013 at 6:03 AM, Alan Woodward  wrote:
> Unknown document router errors are usually caused by using different solr and 
> solrj versions - which version of solr and solrj are you using?
>
> Alan Woodward
> www.flax.co.uk
>
>
> On 1 Nov 2013, at 04:19, Jack Park wrote:
>
>> After digging deeper (slow going for a *nix newbie), I uncovered issues with
>> the java installation. A step in the installation of Oracle Java has you run
>> update-alternatives to install "java" with the path to /bin/java. That done,
>> zookeeper seems to be running.
>>
>> I booted three cores (on the same box) -- this is the simple one-box
>> 3-node cloud test, and used the test code from the Lucidworks course
>> to send over and read some documents. That failed with this:
>> Unknown document router '{name=compositeId}'
>>
>> Lots more research.
>> Closer...
>>
>> On Thu, Oct 31, 2013 at 5:44 PM, Jack Park  wrote:
>>> Latest zookeeper is installed on an Ubuntu server box.
>>> Java is 1.7 latest build.
>>> whereis points to java just fine.
>>> /etc/zookeeper is empty.
>>>
>>> boot zookeeper from /bin as sudo ./zkServer.sh start
>>> Console says "Started"
>>> /etc/zookeeper now has a .pid file
>>> In another console, ./zkServer.sh status returns:
>>> "It's probably not running"
>>>
>>> An interesting fact: the log4j.properties file says there should be a
>>> zookeeper.log file in "."; there is no log file. When I do a text
>>> search in the zookeeper source code for where it picks up the
>>> log4j.properties, nothing is found.
>>>
>>> Fascinating, what?  This must be a common beginner's question, not
>>> well covered in web-search for my context. Does it ring any bells?
>>>
>>> Many thanks.
>>> Jack
>


Re: Simple (?) zookeeper question

2013-11-01 Thread Jack Park
/clusterstate.json seems to clearly state that all 3 nodes are alive,
have ranges, and are active.

Still, it would seem that java is not properly installed.
ZooKeeper is dropping zookeeper.out in the /bin directory, which says
this, among other things:

Server environment:java.home=/usr/local/java/jdk1.7.0_40/jre

Server 
environment:java.class.path=/usr/local/lib/SolrCloud/zookeeper/zookeeper-3.4.5/bin/../build/classes:/usr/local/lib/SolrCloud/zookeeper/zookeeper-3.4.5/bin/../build/lib/*.jar:/usr/local/lib/SolrCloud/zookeeper/zookeeper-3.4.5/bin/../lib/slf4j-log4j12-1.6.1.jar:/usr/local/lib/SolrCloud/zookeeper/zookeeper-3.4.5/bin/../lib/slf4j-api-1.6.1.jar:/usr/local/lib/SolrCloud/zookeeper/zookeeper-3.4.5/bin/../lib/netty-3.2.2.Final.jar:/usr/local/lib/SolrCloud/zookeeper/zookeeper-3.4.5/bin/../lib/log4j-1.2.15.jar:/usr/local/lib/SolrCloud/zookeeper/zookeeper-3.4.5/bin/../lib/jline-0.9.94.jar:/usr/local/lib/SolrCloud/zookeeper/zookeeper-3.4.5/bin/../zookeeper-3.4.5.jar:/usr/local/lib/SolrCloud/zookeeper/zookeeper-3.4.5/bin/../src/java/lib/*.jar:/usr/local/lib/SolrCloud/zookeeper/zookeeper-3.4.5/bin/../conf:

Server environment:java.library.path=
/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib

There is no /usr/java/...
It's really a mystery where zookeeper is getting these values;
everything else seems right.

But, for me, here's the amazing chunk of traces (cleaned up a bit)

Accepted socket connection from /127.0.0.1:39065
Client attempting to establish new session at /127.0.0.1:39065
Established session 0x1421197e6e90002 with negotiated timeout 15000
for client /127.0.0.1:39065
Got user-level KeeperException when processing
sessionid:0x1421197e6e90002 type:create cxid:0x1 zxid:0xc0 txntype:-1
reqpath:n/a Error Path:/overseer Error:KeeperErrorCode = NodeExists
for /overseer
Got user-level KeeperException when processing
sessionid:0x1421197e6e90002 type:create cxid:0x3 zxid:0xc1 txntype:-1
reqpath:n/a Error Path:/overseer Error:KeeperErrorCode = NodeExists
for /overseer
Got user-level KeeperException when processing
sessionid:0x1421197e6e90002 type:delete cxid:0xe zxid:0xc2 txntype:-1
reqpath:n/a Error Path:/live_nodes/127.0.1.1:7590_solr
Error:KeeperErrorCode = NoNode for /live_nodes/127.0.1.1:7590_solr
Got user-level KeeperException when processing
sessionid:0x1421197e6e90002 type:delete cxid:0x9f zxid:0xcd txntype:-1
reqpath:n/a Error Path:/collections/collection1/leaders/shard3
Error:KeeperErrorCode = NoNode for
/collections/collection1/leaders/shard3
2013-10-31 21:01:19,344 [myid:] - INFO  [ProcessThread(sid:0
cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException
when processing sessionid:0x1421197e6e90002 type:create cxid:0xa0
zxid:0xce txntype:-1 reqpath:n/a Error Path:/overseer
Error:KeeperErrorCode = NodeExists for /overseer
Got user-level KeeperException when processing
sessionid:0x1421197e6e90002 type:create cxid:0xaa zxid:0xd1 txntype:-1
reqpath:n/a Error Path:/overseer Error:KeeperErrorCode = NodeExists
for /overseer
Accepted socket connection from /10.1.10.180:55528
Client attempting to establish new session at /10.1.10.180:55528
Established session 0x1421197e6e90003 with negotiated timeout 1
for client /10.1.10.180:55528
WARN Exception causing close of session 0x1421197e6e90003 due to
java.io.IOException: Connection reset by peer
Closed socket connection for client /10.1.10.180:55528 which had
sessionid 0x1421197e6e90003

Sockets from 10.1.10.180 are my windoz box shipping solr documents. I
am not sure how I am using 55528 unless that's a solrj behavior.
Connection reset by peer would suggest something in my code, but my
code is a clone of code supplied in a Solr training course. Must be
good. Right?

I also have no clue what /127.0.0.1:39065 is -- that's not one of my nodes.

The quest continues.

On Fri, Nov 1, 2013 at 9:21 AM, Jack Park  wrote:
> Alan,
> That was brilliant!
> My test harness was behind a couple of notches.
>
> Hah! So, now we open yet another can of strange looking creatures, namely:
>
> No live SolrServers available to handle this
> request:[http://127.0.1.1:8983/solr/collection1]
> at 
> org.apache.solr.client.solrj.impl.CloudSolrServer.directUpdate(CloudSolrServer.java:347)
>
> 3 times, once for each URL I passed into the server. Here is the code:
>
> String zkurl = "10.1.10.178:2181";
> String solrurla = "10.1.10.178:8983";
> String solrurlb = "10.1.10.178:7574";
> String solrurlc = "10.1.10.178:7590";
>
> LBHttpSolrServer sv = new LBHttpSolrServer(solrurla,solrurlb,solrurlc);
> CloudSolrServer server = new CloudSolrServer(zkurl,sv);
> server.setDefaultCollection("collection1");
>
> I am struggling to imagine how 10.1.10.178 got translated to 127.0.1.1
> and the port assignments ignored for each URL passed in.
>
> That error message seems well known to search engines. One suggest

Re: Simple (?) zookeeper question

2013-11-01 Thread Jack Park
The top error message at my test harness is this:

No live SolrServers available to handle this request:
[http://127.0.1.1:8983/solr/collection1,
http://127.0.1.1:7574/solr/collection1,
http://127.0.1.1:7590/solr/collection1]

I have to assume that error message was somehow shipped by zookeeper to
the test harness, because those servers actually exist at 10.1.10.178, and
if I access any one of them from the browser, /solr/collection1 does not
work, but /solr/#/collection1 does work.

On Fri, Nov 1, 2013 at 10:34 AM, Jack Park  wrote:
> /clusterstate.json seems to clearly state that all 3 nodes are alive,
> have ranges, and are active.
>
> Still, it would seem that java is not properly installed.
> ZooKeeper is dropping zookeeper.out in the /bin directory, which says
> this, among other things:
>
> Server environment:java.home=/usr/local/java/jdk1.7.0_40/jre
>
> Server 
> environment:java.class.path=/usr/local/lib/SolrCloud/zookeeper/zookeeper-3.4.5/bin/../build/classes:/usr/local/lib/SolrCloud/zookeeper/zookeeper-3.4.5/bin/../build/lib/*.jar:/usr/local/lib/SolrCloud/zookeeper/zookeeper-3.4.5/bin/../lib/slf4j-log4j12-1.6.1.jar:/usr/local/lib/SolrCloud/zookeeper/zookeeper-3.4.5/bin/../lib/slf4j-api-1.6.1.jar:/usr/local/lib/SolrCloud/zookeeper/zookeeper-3.4.5/bin/../lib/netty-3.2.2.Final.jar:/usr/local/lib/SolrCloud/zookeeper/zookeeper-3.4.5/bin/../lib/log4j-1.2.15.jar:/usr/local/lib/SolrCloud/zookeeper/zookeeper-3.4.5/bin/../lib/jline-0.9.94.jar:/usr/local/lib/SolrCloud/zookeeper/zookeeper-3.4.5/bin/../zookeeper-3.4.5.jar:/usr/local/lib/SolrCloud/zookeeper/zookeeper-3.4.5/bin/../src/java/lib/*.jar:/usr/local/lib/SolrCloud/zookeeper/zookeeper-3.4.5/bin/../conf:
>
> Server environment:java.library.path=
> /usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
>
> There is no /usr/java/...
> It's really a mystery where zookeeper is getting these values;
> everything else seems right.
>
> But, for me, here's the amazing chunk of traces (cleaned up a bit)
>
> Accepted socket connection from /127.0.0.1:39065
> Client attempting to establish new session at /127.0.0.1:39065
> Established session 0x1421197e6e90002 with negotiated timeout 15000
> for client /127.0.0.1:39065
> Got user-level KeeperException when processing
> sessionid:0x1421197e6e90002 type:create cxid:0x1 zxid:0xc0 txntype:-1
> reqpath:n/a Error Path:/overseer Error:KeeperErrorCode = NodeExists
> for /overseer
> Got user-level KeeperException when processing
> sessionid:0x1421197e6e90002 type:create cxid:0x3 zxid:0xc1 txntype:-1
> reqpath:n/a Error Path:/overseer Error:KeeperErrorCode = NodeExists
> for /overseer
> Got user-level KeeperException when processing
> sessionid:0x1421197e6e90002 type:delete cxid:0xe zxid:0xc2 txntype:-1
> reqpath:n/a Error Path:/live_nodes/127.0.1.1:7590_solr
> Error:KeeperErrorCode = NoNode for /live_nodes/127.0.1.1:7590_solr
> Got user-level KeeperException when processing
> sessionid:0x1421197e6e90002 type:delete cxid:0x9f zxid:0xcd txntype:-1
> reqpath:n/a Error Path:/collections/collection1/leaders/shard3
> Error:KeeperErrorCode = NoNode for
> /collections/collection1/leaders/shard3
> 2013-10-31 21:01:19,344 [myid:] - INFO  [ProcessThread(sid:0
> cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException
> when processing sessionid:0x1421197e6e90002 type:create cxid:0xa0
> zxid:0xce txntype:-1 reqpath:n/a Error Path:/overseer
> Error:KeeperErrorCode = NodeExists for /overseer
> Got user-level KeeperException when processing
> sessionid:0x1421197e6e90002 type:create cxid:0xaa zxid:0xd1 txntype:-1
> reqpath:n/a Error Path:/overseer Error:KeeperErrorCode = NodeExists
> for /overseer
> Accepted socket connection from /10.1.10.180:55528
> Client attempting to establish new session at /10.1.10.180:55528
> Established session 0x1421197e6e90003 with negotiated timeout 1
> for client /10.1.10.180:55528
> WARN Exception causing close of session 0x1421197e6e90003 due to
> java.io.IOException: Connection reset by peer
> Closed socket connection for client /10.1.10.180:55528 which had
> sessionid 0x1421197e6e90003
>
> Sockets from 10.1.10.180 are my windoz box shipping solr documents. I
> am not sure how I am using 55528 unless that's a solrj behavior.
> Connection reset by peer would suggest something in my code, but my
> code is a clone of code supplied in a Solr training course. Must be
> good. Right?
>
> I also have no clue what /127.0.0.1:39065 is -- that's not one of my nodes.
>
> The quest continues.
>
> On Fri, Nov 1, 2013 at 9:21 AM, Jack Park  wrote:
>> Alan,
>> That was brilliant!
>> My test harness was behind a couple of notches.
>>
>> Hah! So, now we open yet another can of strange looking creatures, namely:
>>
>&

Re: Simple (?) zookeeper question

2013-11-01 Thread Jack Park
Thanks. I reviewed clusterstate.json again; those URLs are alive. Why
they are not responding seems to be the mystery du jour.

I reviewed my test suite: it is using field names in schema.xml, and
the server is configured to use the update responders I installed, all
of which work fine in a non-cloud mode.

Thanks
Jack

On Fri, Nov 1, 2013 at 11:12 AM, Shawn Heisey  wrote:
> On 11/1/2013 12:07 PM, Jack Park wrote:
>>
>> The top error message at my test harness is this:
>>
>> No live SolrServers available to handle this request:
>> [http://127.0.1.1:8983/solr/collection1,
>> http://127.0.1.1:7574/solr/collection1,
>> http://127.0.1.1:7590/solr/collection1]
>>
>> I have to assume that error message was somehow shipped by zookeeper,
>> because those servers actually exist, to the test harness, at
>> 10.1.10.178, and if I access any one of them from the browser,
>> /solr/collection1 does not work, but /solr/#/collection1 does work.
>
>
> Those are *base* urls.  By themselves, they return 404. For an example of
> how a base URL is used, try /solr/collection1/select?q=*:* instead.
>
> Any URL with /#/ in it is part of the admin UI, which runs mostly in the
> browser and accesses Solr handlers to gather information. It is not Solr
> itself.
>
> Thanks,
> Shawn
>


Cloud issue as an issue with SolrJ?

2013-11-03 Thread Jack Park
I now have a single ZK running standalone on 2181. On the same box, I
have three nodes.

I used curl to send over two documents, one each to two of the three
nodes in the cloud.  According to a web query, they are both there.

My solrconfig.xml file has a custom update request processor chain
defined thus:

<updateRequestProcessorChain name="merge">
  <processor class="org.apache.solr.update.TopicQuestsHarvestProcessFactory">
    <str>hello</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

where the added processor intercepts a SolrDocument after it is
processed and sends it out as a JSON object to TCP socket listeners.
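
(A skeletal sketch of what such an intercepting processor looks like,
assuming the standard UpdateRequestProcessor API of that era; the class
name and the TCP forwarding are placeholders for the poster's actual
TopicQuestsHarvestProcessFactory:)

import java.io.IOException;

import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class HarvestProcessorFactory extends UpdateRequestProcessorFactory {
    @Override
    public UpdateRequestProcessor getInstance(SolrQueryRequest req,
            SolrQueryResponse rsp, UpdateRequestProcessor next) {
        return new UpdateRequestProcessor(next) {
            @Override
            public void processAdd(AddUpdateCommand cmd) throws IOException {
                SolrInputDocument doc = cmd.getSolrInputDocument();
                // ... forward doc as JSON to TCP listeners here ...
                super.processAdd(cmd); // hand off to the rest of the chain
            }
        };
    }
}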

The instance of SolrJ I have implemented looks like this:

LBHttpSolrServer sv = new LBHttpSolrServer(solrurla, solrurlb, solrurlc);
sv.getHttpClient().getParams().setParameter("update.chain",
    "update"); // "merge");
CloudSolrServer server = new CloudSolrServer(zkurl, sv);
server.setDefaultCollection("collection1");

where the commented-out code would call my "merge" update chain.

In curl tests, /solr/merge?commit=true ... got a jetty error
/solr/merge not found.
When I changed that to /solr/update?commit=true... the document got
indexed. Thus, commenting out "merge" in favor of "update".

In any case (merge, update, or no update.chain setting at all), the
SolrJ implementation fails, typically at a zookeeper.out nio exception
"socket closed by peer".

Rewriting my implementation to this:
CloudSolrServer server = new CloudSolrServer(zkurl);
server.setDefaultCollection("collection1");
makes no change in behavior.

Where is the error thrown?

The code to build a doc is this (which reflects my field definitions):

SolrInputDocument doc = new SolrInputDocument();
doc.addField("locator", "doc" + i);
doc.addField("label", "document " + i);
doc.addField("details", "This is document " + i);
server.add(doc);

The error is thrown at server.add(doc)

Many thanks in advance for any observations or suggestions.

Cheers
Jack


Re: Cloud issue as an issue with SolrJ?

2013-11-03 Thread Jack Park
Issue resolved, with great thanks to Tim Casey.
The issue was based on my own poor understanding of the mechanics of
ZooKeeper. The "host" setting in solr.xml must find the correct value
and not default to localhost. Simply hard-wiring host to the network
address of the computer made everything work.
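
(For readers wanting the concrete syntax: a hedged sketch of that setting
in the new-style solr.xml of the 4.x line; the address is the one from this
thread, and surrounding elements are elided:)

<solr>
  <solrcloud>
    <!-- hard-wire the advertised host instead of letting it default -->
    <str name="host">10.1.10.178</str>
    <int name="hostPort">8983</int>
  </solrcloud>
</solr>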


On Sun, Nov 3, 2013 at 12:04 PM, Jack Park  wrote:
> I now have a single ZK running standalone on 2181. On the same box, I
> have three nodes.
>
> I used curl to send over two documents, one each to two of the three
> nodes in the cloud.  According to a web query, they are both there.
>
> My solrconfig.xml file has a custom update request processor chain
> defined thus:
>
> <updateRequestProcessorChain name="merge">
>   <processor class="org.apache.solr.update.TopicQuestsHarvestProcessFactory">
>     <str>hello</str>
>   </processor>
>   <processor class="solr.RunUpdateProcessorFactory" />
> </updateRequestProcessorChain>
>
> where the added processor intercepts a SolrDocument after it is
> processed and sends it out as a JSON object to TCP socket listeners.
>
> The instance of SolrJ I have implemented looks like this:
>
> LBHttpSolrServer sv = new LBHttpSolrServer(solrurla, solrurlb, solrurlc);
> sv.getHttpClient().getParams().setParameter("update.chain",
>     "update"); // "merge");
> CloudSolrServer server = new CloudSolrServer(zkurl, sv);
> server.setDefaultCollection("collection1");
>
> where the commented-out code would call my "merge" update chain.
>
> In curl tests, /solr/merge?commit=true ... got a jetty error
> /solr/merge not found.
> When I changed that to /solr/update?commit=true... the document got
> indexed. Thus, commenting out "merge" in favor of "update".
>
> In any case (merge, update, or no update.chain setting at all), the
> SolrJ implementation fails, typically at a zookeeper.out nio exception
> "socket closed by peer".
>
> Rewriting my implementation to this:
> CloudSolrServer server = new CloudSolrServer(zkurl);
> server.setDefaultCollection("collection1");
> makes no change in behavior.
>
> Where is the error thrown?
>
> The code to build a doc is this (which reflects my field definitions):
>
> SolrInputDocument doc = new SolrInputDocument();
> doc.addField("locator", "doc" + i);
> doc.addField("label", "document " + i);
> doc.addField("details", "This is document " + i);
> server.add(doc);
>
> The error is thrown at server.add(doc)
>
> Many thanks in advance for any observations or suggestions.
>
> Cheers
> Jack


Indexing URLs in Solr?

2013-11-07 Thread Jack Park
Figuring out a Google query that yields an answer is difficult given
the ambiguity:

I have a field:



into which I store a URL

which, when displayed as a result of a query, looks like this in the
admin console:

"resourceURL": "http://someotherserver.org/";,

The query "resourceURL:*" will find all of them, but there is this question:

What does the query look like to find that specific URL?

Of course, "resourceURL:http://someotherserver.org/" doesn't work.

This one
resourceURL=http%3A%2F%2Fsomeotherserver.org%2F

fails as well.

What am I overlooking?

Many thanks in advance.
Jack


Re: Indexing URLs in Solr?

2013-11-07 Thread Jack Park
Spoke too soon. Hacking rocks!
Finally landed on this heuristic, and it works:

resourceURL:"http://someotherserver.org/"
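
(The same idea in SolrJ, as a hedged sketch assuming stock 4.x: quoting
keeps the colon and slashes from being read as query syntax, and
ClientUtils.escapeQueryChars is the escaping alternative:)

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.util.ClientUtils;

public class UrlQueryDemo {
    public static void main(String[] args) {
        String url = "http://someotherserver.org/";

        // Quote the value so ':' and '/' are not parsed as syntax.
        SolrQuery quoted = new SolrQuery("resourceURL:\"" + url + "\"");

        // Or escape each special character instead of quoting.
        SolrQuery escaped = new SolrQuery("resourceURL:" + ClientUtils.escapeQueryChars(url));

        System.out.println(quoted.getQuery());
        System.out.println(escaped.getQuery());
    }
}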

On Thu, Nov 7, 2013 at 9:52 AM, Jack Park  wrote:
> Figuring out a Google query that yields an answer is difficult given
> the ambiguity:
>
> I have a field:
>
> 
>
> into which I store a URL
>
> which, when displayed as a result of a query, looks like this in the
> admin console:
>
> "resourceURL": "http://someotherserver.org/";,
>
> The query "resourceURL:*" will find all of them, but there is this question:
>
> What does the query look like to find that specific URL?
>
> Of course, "resourceURL:http://someotherserver.org/" doesn't work.
>
> This one
> resourceURL=http%3A%2F%2Fsomeotherserver.org%2F
>
> fails as well.
>
> What am I overlooking?
>
> Many thanks in advance.
> Jack


Solr 4.6.1: Core discovery and default core

2014-02-27 Thread Cynthia Park
Hello,

I may have missed this, but how do you specify a default core when using
the new-style solr.xml? When I view the status of my Solr core setup
(http://localhost:8983/solr/admin/cores?action=STATUS) I see an
isDefaultCore specification (false), but I'm not sure where it came from
and where it's located so that it may be changed.

Also when viewing the status I see a defaultCoreName of collection1.

I thought that defaultCoreName was not supported when using the new-style
solr.xml? Also, I'm not sure why it picked up the value "collection1" as I
did not specify a default core.

Any help is greatly appreciated.

Thanks


Re: solr.DirectUpdateHandler2 failed to instantiate

2013-06-27 Thread Jack Park
Wow! That's been a while back, and it appears that my journal didn't
carry a good trace of what I did. Here's a reconstruction:

From my earlier attempt, which is reflected in this solrconfig.xml entry



notice that I am calling solr.DirectUpdateHandler2 directly in defining
a requestHandler

I don't do that anymore. Now, it's this:



which took a lot of fishing to sort out, because, being somewhat
dyslexic, it took a long time to figure out that I can use "harvest"
as a setting in SolrJ, thus:

harvestServer = new HttpSolrServer(solrURL);
harvestServer.getHttpClient().getParams().setParameter("update.chain",
"harvest");

In short, the original exception was based on a gross
misinterpretation of how one goes about equating solrconfig.xml with
configurations of SolrJ.

Hope that helps more than it confuses!

Cheers
Jack

On Thu, Jun 27, 2013 at 9:45 AM, Mark Bennett
 wrote:
> Jack,
>
> Did you ever find a fix for this?
>
> I'm having similar issues (different parts of solrconfig) and my guess is 
> it's a config issue somewhere, vs. a proper casting problem, some nested init 
> issue.
>
> Was curious what you found?
>
>
> On Mar 13, 2013, at 11:52 AM, Jack Park  wrote:
>
>> I can safely say that it is not DirectUpdateHandler2 failing;  By
>> commenting out my own handlers, the system boots without error.
>>
>> This means that my handlers are problematic in some way. The moment I
>> put back just one of my handlers:
>>
>> <updateRequestProcessorChain name="harvest">
>>   <processor class="org.apache.solr.update.TopicQuestsDocumentProcessFactory">
>>     <str>hello</str>
>>   </processor>
>>   <processor class="solr.RunUpdateProcessorFactory" />
>> </updateRequestProcessorChain>
>>
>> <requestHandler name="/harvest" class="solr.DirectUpdateHandler2">
>>   <lst name="defaults">
>>     <str name="update.chain">harvest</str>
>>   </lst>
>> </requestHandler>
>>
>> The problem returns.  It simply appears that I cannot declare a named
>> requestHandler using that class.
>>
>> Jack
>>
>> On Tue, Mar 12, 2013 at 12:22 PM, Jack Park  wrote:
>>> Indeed! Perhaps the germane part is this, before the failure to
>>> instantiate notice:
>>>
>>> Caused by: java.lang.ClassCastException: class org.apache.solr.update.DirectUpdateHandler2
>>>at java.lang.Class.asSubclass(Unknown Source)
>>>at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:432)
>>>at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:507)
>>>
>>> This suggests that I might be doing something wrong elsewhere in 
>>> solrconfig.xml.
>>>
>>> The possibly relevant parts (my contributions) are these:
>>>
>>> 
>>>  
>>>  
>>> 
>>>
>>> <updateRequestProcessorChain name="harvest">
>>>   <processor class="org.apache.solr.update.TopicQuestsDocumentProcessFactory">
>>>     <str>hello</str>
>>>   </processor>
>>>   <processor class="solr.RunUpdateProcessorFactory" />
>>> </updateRequestProcessorChain>
>>>
>>> <requestHandler name="/harvest" class="solr.DirectUpdateHandler2">
>>>   <lst name="defaults">
>>>     <str name="update.chain">harvest</str>
>>>   </lst>
>>> </requestHandler>
>>>
>>> <requestHandler name="/partial" class="solr.DirectUpdateHandler2">
>>>   <lst name="defaults">
>>>     <str name="update.chain">partial</str>
>>>   </lst>
>>> </requestHandler>
>>>
>>> Thanks
>>> Jack
>>>
>>> On Tue, Mar 12, 2013 at 12:16 PM, Mark Miller  wrote:
>>>> There should be a stack trace - also, you shouldn't have to do anything 
>>>> special to use this class. It's the default and only truly supported 
>>>> implementation…
>>>>
>>>> - Mark
>>>>
>>>> On Mar 12, 2013, at 2:53 PM, Jack Park  wrote:
>>>>
>>>>> That messages gives great, but terrible google. Zillions of hits,
>>>>> mostly filled with very long log traces, and zero messages (that I
>>>>> could find) about what to do about it.
>>>>>
>>>>> I switched over to using that handler since it has an update log
>>>>> specified, and that's the only place I've found how to use update log.
>>>>> But, can't boot now.
>>>>>
>>>>> All the jars are in place; I'm able to import that class in my code.
>>>>>
>>>>> Is there any news on that issue?
>>>>>
>>>>> Many thanks
>>>>> Jack
>>>>
>


Question about soft commit and updateRequestProcessorChain

2013-08-07 Thread Jack Park
If one allows for a soft commit (rather than a hard commit on each
request), when does the updateRequestProcessorChain fire? Does it fire
after the commit?

Many thanks
Jack


Re: Question about soft commit and updateRequestProcessorChain

2013-08-07 Thread Jack Park
Ok. So, running the update processor chain *is* the commit process?

In answer to Erick's question: my habit, an old and apparently bad
one, has been to call a hard commit at the end of each update. My
question had to do with allowing soft commits to be controlled by
settings in solrconfig.xml, say every 30 seconds or something like
that (I really haven't studied such options yet).
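
(For reference, a hedged sketch of those solrconfig.xml settings; a
30-second soft-commit window, paired with a longer hard commit for
durability, would look roughly like this inside the updateHandler element:)

<updateHandler class="solr.DirectUpdateHandler2">
  <autoSoftCommit>
    <maxTime>30000</maxTime>   <!-- soft commit, visibility, every 30 s -->
  </autoSoftCommit>
  <autoCommit>
    <maxTime>600000</maxTime>  <!-- hard commit, durability -->
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>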

I ask this question because I add an additional call to the update
processor, which, after running Lucene, the document is then sent
outside to an agent network for further processing. I needed to know
if the document was already committed by that time.

I am inferring from here that the document has been committed after
the first step in the update processor chain, even if that's based on
a soft commit.

Thanks!
JackP

On Wed, Aug 7, 2013 at 4:20 PM, Jack Krupansky  wrote:
> Most update processor chains will be configured with the Run Update
> processor as the last processor of the chain. That's where the Lucene index
> update and optional commit would be done.
>
> -- Jack Krupansky
>
> -Original Message- From: Jack Park
> Sent: Wednesday, August 07, 2013 1:04 PM
> To: solr-user@lucene.apache.org
> Subject: Question about soft commit and updateRequestProcessorChain
>
>
> If one allows for a soft commit (rather than a hard commit on each
> request), when does the updateRequestProcessorChain fire? Does it fire
> after the commit?
>
> Many thanks
> Jack


Question about plug-in update handler failure

2013-10-10 Thread Jack Park
I have an "interceptor" which grabs SolrDocument instances in the
update handler chain. It feeds those documents as a JSON string out to
an agent system.

That system has been running fine all the way up to Solr 4.3.1
I have discovered that, as of 4.4 and now 4.5, the very same config
files, agent jar, and test harness shows that no documents are
intercepted, even though the index is built.

I am wondering if I missed something in changes to Solr beyond 4.3.1
which would invalidate my setup.

For the record, earlier trials opened the war and dropped my agent jar
into WEB-INF/lib; the most recent trials on all systems leave the war
intact and drop the agent jar into collection1/lib -- it still works
on 4.3.1, but nothing beyond that.

Many thanks in advance for any thoughts.

Jack


Re: Question about plug-in update handler failure

2013-10-11 Thread Jack Park
Issue resolved. Not a Solr issue; a really hard to discover missing
library in my installation.

On Thu, Oct 10, 2013 at 7:10 PM, Jack Park  wrote:
> I have an "interceptor" which grabs SolrDocument instances in the
> update handler chain. It feeds those documents as a JSON string out to
> an agent system.
>
> That system has been running fine all the way up to Solr 4.3.1
> I have discovered that, as of 4.4 and now 4.5, the very same config
> files, agent jar, and test harness shows that no documents are
> intercepted, even though the index is built.
>
> I am wondering if I missed something in changes to Solr beyond 4.3.1
> which would invalidate my setup.
>
> For the record, earlier trials opened the war and dropped my agent jar
> into WEB-INF/lib; the most recent trials on all systems leave the war
> intact and drop the agent jar into collection1/lib -- it still works
> on 4.3.1, but nothing beyond that.
>
> Many thanks in advance for any thoughts.
>
> Jack


Querying a transitive closure?

2013-03-27 Thread Jack Park
This is a question about "isA?"

We want to know if M isA B   isA?(M,B)

For some M, one might be able to look into M to see its type or which
class(es) for which it is a subClass. We're talking taxonomic queries
now.
But, for some M, one might need to ripple up the "transitive closure",
looking at all the super classes, etc, recursively.

It seems unreasonable to do that over HTTP; it seems more reasonable
to grab a core and write a custom isA query handler. But, how do you
do that in a SolrCloud?

Really curious...

Many thanks in advance for ideas.
Jack


Re: Querying a transitive closure?

2013-03-27 Thread Jack Park
Hi Otis,

I fully expect to grow to SolrCloud -- many shards. For now, it's
solo. But, my thinking relates to cloud. I look for ways to reduce the
number of HTTP round trips through SolrJ. Maybe you have some ideas?

Thanks
Jack

On Wed, Mar 27, 2013 at 10:04 AM, Otis Gospodnetic
 wrote:
> Hi Jack,
>
> Is this really about HTTP and Solr vs. SolrCloud or more whether
> Solr(Cloud) is the right tool for the job and if so how to structure
> the schema and queries to make such lookups efficient?
>
> Otis
> --
> Solr & ElasticSearch Support
> http://sematext.com/
>
>
>
>
>
> On Wed, Mar 27, 2013 at 12:53 PM, Jack Park  wrote:
>> This is a question about "isA?"
>>
>> We want to know if M isA B   isA?(M,B)
>>
>> For some M, one might be able to look into M to see its type or which
>> class(es) for which it is a subClass. We're talking taxonomic queries
>> now.
>> But, for some M, one might need to ripple up the "transitive closure",
>> looking at all the super classes, etc, recursively.
>>
>> It seems unreasonable to do that over HTTP; it seems more reasonable
>> to grab a core and write a custom isA query handler. But, how do you
>> do that in a SolrCloud?
>>
>> Really curious...
>>
>> Many thanks in advance for ideas.
>> Jack


Re: Querying a transitive closure?

2013-03-27 Thread Jack Park
Hi Otis,
That's essentially the answer I was looking for: each shard (are we
talking master + replicas?) has the plug-in custom query handler.  I
need to build it to find out.

What I mean is that there is a taxonomy, say one with a single root
for sake of illustration, which grows all the classes, subclasses, and
instances. If I have an object that is somewhere in that taxonomy,
then it has a zigzag chain of parents up that tree (I've seen that
called a "transitive closure"). If class B is way up that tree from M,
there's no telling how many queries it will take to find it. Hmmm...
recursive ascent, I suppose.

Many thanks
Jack

On Wed, Mar 27, 2013 at 6:52 PM, Otis Gospodnetic
 wrote:
> Hi Jack,
>
> I don't fully understand the exact taxonomy structure and your needs,
> but in terms of reducing the number of HTTP round trips, you can do it
> by writing a custom SearchComponent that, upon getting the initial
> request, does everything "locally", meaning that it talks to the
> local/specified shard before returning to the caller.  In SolrCloud
> setup with N shards, each of these N shards could be queried in such a
> way in parallel, running query/queries on their local shards.
>
> Otis
> --
> Solr & ElasticSearch Support
> http://sematext.com/
>
>
>
>
>
> On Wed, Mar 27, 2013 at 3:11 PM, Jack Park  wrote:
>> Hi Otis,
>>
>> I fully expect to grow to SolrCloud -- many shards. For now, it's
>> solo. But, my thinking relates to cloud. I look for ways to reduce the
>> number of HTTP round trips through SolrJ. Maybe you have some ideas?
>>
>> Thanks
>> Jack
>>
>> On Wed, Mar 27, 2013 at 10:04 AM, Otis Gospodnetic
>>  wrote:
>>> Hi Jack,
>>>
>>> Is this really about HTTP and Solr vs. SolrCloud or more whether
>>> Solr(Cloud) is the right tool for the job and if so how to structure
>>> the schema and queries to make such lookups efficient?
>>>
>>> Otis
>>> --
>>> Solr & ElasticSearch Support
>>> http://sematext.com/
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Mar 27, 2013 at 12:53 PM, Jack Park  
>>> wrote:
>>>> This is a question about "isA?"
>>>>
>>>> We want to know if M isA B   isA?(M,B)
>>>>
>>>> For some M, one might be able to look into M to see its type or which
>>>> class(es) for which it is a subClass. We're talking taxonomic queries
>>>> now.
>>>> But, for some M, one might need to ripple up the "transitive closure",
>>>> looking at all the super classes, etc, recursively.
>>>>
>>>> It seems unreasonable to do that over HTTP; it seems more reasonable
>>>> to grab a core and write a custom isA query handler. But, how do you
>>>> do that in a SolrCloud?
>>>>
>>>> Really curious...
>>>>
>>>> Many thanks in advance for ideas.
>>>> Jack


Re: Querying a transitive closure?

2013-03-28 Thread Jack Park
Thank you for this. I had thought about it but reasoned in a naive
way: who would do such a thing?

Doing so makes the query local: once the object has been retrieved, no
further HTTP queries are required. Implementation perhaps entails one
request to fetch the presumed parent in order to harvest its
transitive closure.  I need to think about that.

Many thanks
Jack
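
(A hedged sketch of the approach discussed below: precompute the closure at
indexing time into a multivalued field, so isA?(M, B) becomes one query.
The "ancestors" field name is illustrative; "locator" is from the poster's
schema:)

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class TransitiveClosureDemo {
    public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr");

        // Index every superclass of M, not just its direct parent.
        SolrInputDocument m = new SolrInputDocument();
        m.addField("locator", "M");
        m.addField("ancestors", "A"); // hypothetical multivalued field
        m.addField("ancestors", "B");
        server.add(m);
        server.commit();

        // isA?(M, B) is then a single query: no recursive ascent.
        SolrQuery q = new SolrQuery("locator:M AND ancestors:B");
        boolean isA = server.query(q).getResults().getNumFound() > 0;
        System.out.println("M isA B: " + isA);
    }
}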

On Thu, Mar 28, 2013 at 5:06 AM, Jens Grivolla  wrote:
> Exactly, you should usually design your schema to fit your queries, and if
> you need to retrieve all ancestors then you should index all ancestors so
> you can query for them easily.
>
> If that doesn't work for you then either Solr is not the right tool for the
> job, or you need to rethink your schema.
>
> The description of doing lookups within a tree structure doesn't sound at
> all like what you would use a text retrieval engine for, so you might want
> to rethink why you want to use Solr for this. But if that "transitive
> closure" is something you can calculate at indexing time then the correct
> solution is the one Upayavira provided.
>
> If you want people to be able to help you you need to actually describe your
> problem (i.e. what is my data, and what are my queries) instead of diving
> into technical details like "reducing HTTP roundtrips". My guess is that if
> you need to "reduce HTTP roundtrips" you're probably doing it wrong.
>
> HTH,
> Jens
>
>
> On 03/28/2013 08:15 AM, Upayavira wrote:
>>
>> Why don't you index all ancestor classes with the document, as a
>> multivalued field, then you could get it in one hit. Am I missing
>> something?
>>
>> Upayavira
>>
>> On Thu, Mar 28, 2013, at 01:59 AM, Jack Park wrote:
>>>
>>> Hi Otis,
>>> That's essentially the answer I was looking for: each shard (are we
>>> talking master + replicas?) has the plug-in custom query handler.  I
>>> need to build it to find out.
>>>
>>> What I mean is that there is a taxonomy, say one with a single root
>>> for sake of illustration, which grows all the classes, subclasses, and
>>> instances. If I have an object that is somewhere in that taxonomy,
>>> then it has a zigzag chain of parents up that tree (I've seen that
>>> called a "transitive closure"). If class B is way up that tree from M,
>>> there's no telling how many queries it will take to find it. Hmmm...
>>> recursive ascent, I suppose.
>>>
>>> Many thanks
>>> Jack
>>>
>>> On Wed, Mar 27, 2013 at 6:52 PM, Otis Gospodnetic
>>>  wrote:
>>>>
>>>> Hi Jack,
>>>>
>>>> I don't fully understand the exact taxonomy structure and your needs,
>>>> but in terms of reducing the number of HTTP round trips, you can do it
>>>> by writing a custom SearchComponent that, upon getting the initial
>>>> request, does everything "locally", meaning that it talks to the
>>>> local/specified shard before returning to the caller.  In SolrCloud
>>>> setup with N shards, each of these N shards could be queried in such a
>>>> way in parallel, running query/queries on their local shards.
>>>>
>>>> Otis
>>>> --
>>>> Solr & ElasticSearch Support
>>>> http://sematext.com/
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Mar 27, 2013 at 3:11 PM, Jack Park 
>>>> wrote:
>>>>>
>>>>> Hi Otis,
>>>>>
>>>>> I fully expect to grow to SolrCloud -- many shards. For now, it's
>>>>> solo. But, my thinking relates to cloud. I look for ways to reduce the
>>>>> number of HTTP round trips through SolrJ. Maybe you have some ideas?
>>>>>
>>>>> Thanks
>>>>> Jack
>>>>>
>>>>> On Wed, Mar 27, 2013 at 10:04 AM, Otis Gospodnetic
>>>>>  wrote:
>>>>>>
>>>>>> Hi Jack,
>>>>>>
>>>>>> Is this really about HTTP and Solr vs. SolrCloud or more whether
>>>>>> Solr(Cloud) is the right tool for the job and if so how to structure
>>>>>> the schema and queries to make such lookups efficient?
>>>>>>
>>>>>> Otis
>>>>>> --
>>>>>> Solr & ElasticSearch Support
>>>>>> http://sematext.com/
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Mar 27, 2013 at 12:53 PM, Jack Park 
>>>>>> wrote:
>>>>>>>
>>>>>>> This is a question about "isA?"
>>>>>>>
>>>>>>> We want to know if M isA B   isA?(M,B)
>>>>>>>
>>>>>>> For some M, one might be able to look into M to see its type or which
>>>>>>> class(es) for which it is a subClass. We're talking taxonomic queries
>>>>>>> now.
>>>>>>> But, for some M, one might need to ripple up the "transitive
>>>>>>> closure",
>>>>>>> looking at all the super classes, etc, recursively.
>>>>>>>
>>>>>>> It seems unreasonable to do that over HTTP; it seems more reasonable
>>>>>>> to grab a core and write a custom isA query handler. But, how do you
>>>>>>> do that in a SolrCloud?
>>>>>>>
>>>>>>> Really curious...
>>>>>>>
>>>>>>> Many thanks in advance for ideas.
>>>>>>> Jack
>>
>>
>
>


Re: Flow Chart of Solr

2013-04-03 Thread Jack Park
There are three books on Solr, two with that in the title, and one,
Taming Text, each of which has been very valuable in understanding
Solr.

Jack

On Wed, Apr 3, 2013 at 5:25 AM, Jack Krupansky  wrote:
> Sure, yes. But... it comes down to what level of detail you want and need
> for a specific task. In other words, there are probably a dozen or more
> levels of detail. The reality is that if you are going to work at the Solr
> code level, that is very, very different than being a "user" of Solr, and at
> that point your first step is to become familiar with the code itself.
>
> When you talk about "parsing" and "stemming", you are really talking about
> the user-level, not the Solr code level. Maybe what you really need is a
> cheat sheet that maps a user-visible feature to the main Solr code component
> for that implements that user feature.
>
> There are a number of different forms of "parsing" in Solr - parsing of
> what? Queries? Requests? Solr documents? Function queries?
>
> Stemming? Well, in truth, Solr doesn't even do stemming - Lucene does that.
> Lucene does all of the "token filtering". Are you asking for details on how
> Lucene works? Maybe you meant to ask how "term analysis" works, which is
> split between Solr and Lucene. Or maybe you simply wanted to know when and
> where term analysis is done. Tell us your specific problem or specific
> question and we can probably quickly give you an answer.
>
> In truth, NOBODY uses "flow charts" anymore. Sure, there are some user-level
> diagrams, but not down to the code level.
>
> If you could focus on specific questions, we could give you specific
> answers.
>
> "Main steps"? That depends on what level you are working at. Tell us what
> problem you are trying to solve and we can point you to the relevant areas.
>
> In truth, if you become generally familiar with Solr at the user level
> (study the wikis), you will already know what the "main steps" are.
>
> So, it is not "main steps of Solr", but main steps of some specific
> "request" of Solr, and for a specified level of detail, and for a specified
> area of Solr if greater detail is needed. Be more specific, and then we can
> be more specific.
>
> For now, the general advice for people who need or want to go far beyond the
> user level is to "get familiar with the code" - just LOOK at it - a lot of
> the package and class names are OBVIOUS, really, and follow the class
> hierarchy and code flow using the standard features of any modern Java IDE.
> If you are wondering where to start for some specific user-level feature,
> please ask specifically about that feature. But... make a diligent effort to
> discover and learn on your own before asking open-ended questions.
>
> Sure, there are lots of things in Lucene and Solr that are rather complex
> and seemingly convoluted, and not obvious, but people are more than willing
> to help you out if you simply ask a specific question. I mean, not everybody
> needs to know the fine detail of query parsing, analysis, building a
> Lucene-level stemmer, etc. If we tried to put all of that in a diagram, most
> people would be more confused than enlightened.
>
> At which step are scores calculated? That's more of a Lucene question. Or,
> are you really asking what code in Solr invokes Lucene search methods that
> calculate basic scores?
>
> In short, you need to be more specific. Don't force us to guess what problem
> you are trying to solve.
>
> -- Jack Krupansky
>
> -Original Message- From: Furkan KAMACI
> Sent: Wednesday, April 03, 2013 6:52 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Flow Chart of Solr
>
>
> So, all in all, is there anybody who can write down just main steps of
> Solr(including parsing, stemming etc.)?
>
>
> 2013/4/2 Furkan KAMACI 
>
>> I think about myself as an example. I have started to make research about
>> Solr just for some weeks. I have learned Solr and its related projects. My
>> next step writing down the main steps Solr. We have separated learning
>> curve of Solr into two main categories.
>> First one is who are using it as out of the box components. Second one is
>> developer side.
>>
>> Actually developer side branches into two way.
>>
>> First one is general steps of it. i.e. document comes into Solr (i.e.
>> crawled data of Nutch). which analyzing processes are going to done
>> (stamming, hamming etc.), what will be doing after parsing step by step.
>> When a search query happens what happens step by step, at which step
>> scores
>> are calculated so on so forth.
>> Second one is more code specific i.e. which handlers takes into account
>> data that will going to be indexed(no need the explain every handler at
>> this step) . Which are the analyzer, tokenizer classes and what are the
>> flow between them. How response handlers works and what are they.
>>
>> Also explaining about cloud side is other work.
>>
>> Some of explanations are currently presents at wiki (but some of them are
>> at very deep places at 

Re: Flow Chart of Solr

2013-04-03 Thread Jack Park
Jack,

Is that new book up to the 4.+ series?

Thanks
The other Jack

On Wed, Apr 3, 2013 at 9:19 AM, Jack Krupansky  wrote:
> And another one on the way:
> http://www.amazon.com/Lucene-Solr-Definitive-comprehensive-realtime/dp/1449359957
>
> Hopefully that helps a lot as well. Plenty of diagrams. Lots of examples.
>
> -- Jack Krupansky
>
> -Original Message- From: Jack Park
> Sent: Wednesday, April 03, 2013 11:25 AM
>
> To: solr-user@lucene.apache.org
> Subject: Re: Flow Chart of Solr
>
> There are three books on Solr, two with that in the title, and one,
> Taming Text, each of which have been very valuable in understanding
> Solr.
>
> Jack
>
> On Wed, Apr 3, 2013 at 5:25 AM, Jack Krupansky 
> wrote:
>>
>> Sure, yes. But... it comes down to what level of detail you want and need
>> for a specific task. In other words, there are probably a dozen or more
>> levels of detail. The reality is that if you are going to work at the Solr
>> code level, that is very, very different than being a "user" of Solr, and
>> at
>> that point your first step is to become familiar with the code itself.
>>
>> When you talk about "parsing" and "stemming", you are really talking about
>> the user-level, not the Solr code level. Maybe what you really need is a
>> cheat sheet that maps a user-visible feature to the main Solr code
>> component
>> for that implements that user feature.
>>
>> There are a number of different forms of "parsing" in Solr - parsing of
>> what? Queries? Requests? Solr documents? Function queries?
>>
>> Stemming? Well, in truth, Solr doesn't even do stemming - Lucene does
>> that.
>> Lucene does all of the "token filtering". Are you asking for details on
>> how
>> Lucene works? Maybe you meant to ask how "term analysis" works, which is
>> split between Solr and Lucene. Or maybe you simply wanted to know when and
>> where term analysis is done. Tell us your specific problem or specific
>> question and we can probably quickly give you an answer.
>>
>> In truth, NOBODY uses "flow charts" anymore. Sure, there are some
>> user-level
>> diagrams, but not down to the code level.
>>
>> If you could focus on specific questions, we could give you specific
>> answers.
>>
>> "Main steps"? That depends on what level you are working at. Tell us what
>> problem you are trying to solve and we can point you to the relevant
>> areas.
>>
>> In truth, if you become generally familiar with Solr at the user level
>> (study the wikis), you will already know what the "main steps" are.
>>
>> So, it is not "main steps of Solr", but main steps of some specific
>> "request" of Solr, and for a specified level of detail, and for a
>> specified
>> area of Solr if greater detail is needed. Be more specific, and then we
>> can
>> be more specific.
>>
>> For now, the general advice for people who need or want to go far beyond
>> the
>> user level is to "get familiar with the code" - just LOOK at it - a lot of
>> the package and class names are OBVIOUS, really, and follow the class
>> hierarchy and code flow using the standard features of any modern Java
>> IDE.
>> If you are wondering where to start for some specific user-level feature,
>> please ask specifically about that feature. But... make a diligent effort
>> to
>> discover and learn on your own before asking open-ended questions.
>>
>> Sure, there are lots of things in Lucene and Solr that are rather complex
>> and seemingly convoluted, and not obvious, but people are more than
>> willing
>> to help you out if you simply ask a specific question. I mean, not
>> everybody
>> needs to know the fine detail of query parsing, analysis, building a
>> Lucene-level stemmer, etc. If we tried to put all of that in a diagram,
>> most
>> people would be more confused than enlightened.
>>
>> At which step are scores calculated? That's more of a Lucene question. Or,
>> are you really asking what code in Solr invokes Lucene search methods that
>> calculate basic scores?
>>
>> In short, you need to be more specific. Don't force us to guess what
>> problem
>> you are trying to solve.
>>
>> -- Jack Krupansky
>>
>> -Original Message- From: Furkan KAMACI
>> Sent: Wednesday, April 03, 2013 6:52 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Flow Chart of Solr
>>
>>
>> So, all in all, is 

Re: Downloaded Solr 4.2.1 Source: Build Failing

2013-04-14 Thread Jack Park
What I learned is that I needed to upgrade Ant, then needed to install
Ivy; the build.xml in the outer subversion directory has an ant target
to install Ivy, and one to run-maven-build. I ran that, then switched
to /solr and ran "ant dist" which finished in under 2 minutes.
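
If it helps anyone else, the sequence boiled down to roughly this, run from
the top of the checkout (use "ant -p" to confirm the exact target names):

ant ivy-bootstrap     # drops the Ivy jar into ~/.ant/lib
ant run-maven-build
cd solr
ant dist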

On Sun, Apr 14, 2013 at 10:14 AM, Steve Rowe  wrote:
> Hi Umesh,
>
> I have the exact same Java 1.6 version as you, on OS X v10.8.3.
>
> I downloaded the source distribution from the same mirror as you did, and ran 
> 'ant dist' under the solr/ directory, and got "BUILD SUCCESSFUL".
>
> (FYI, building the source distribution is part of the "smoke testing" we do 
> as part of validating a release, and this passed for me on my OS X 10.8.3 
> machine before I voted to release 4.2.1.)
>
> What version of Ant are you using?
>
> What command are you using to build?
>
> Did you try running 'ant clean' from the top level and then re-building?
>
> Steve
>
> On Apr 14, 2013, at 7:41 AM, Umesh Prasad  wrote:
>
>> Further update on same.
>> Build on Branch
>> http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_2_1 succeeds
>> fine.
>> Build fails only for Source code downloaded from
>> http://apache.techartifact.com/mirror/lucene/solr/4.2.1/solr-4.2.1-src.tgz
>>
>>
>>
>>
>> On Sun, Apr 14, 2013 at 1:05 PM, Umesh Prasad  wrote:
>>
>>> java version "1.6.0_43"
>>> Java(TM) SE Runtime Environment (build 1.6.0_43-b01-447-11M4203)
>>> Java HotSpot(TM) 64-Bit Server VM (build 20.14-b01-447, mixed mode)
>>>
>>> Mac OS X : Version 10.7.5
>>>
>>> --
>>> Umesh
>>>
>>>
>>>
>>> On Sat, Apr 13, 2013 at 12:08 AM, Chris Hostetter <
>>> hossman_luc...@fucit.org> wrote:
>>>

 :
 /Users/umeshprasad/Downloads/solr-4.2.1/solr/core/src/java/org/apache/solr/handler/c
 : omponent/QueryComponent.java:765: cannot find symbol
 : [javac] symbol  : class ShardFieldSortedHitQueue
 : [javac] location: class
 org.apache.solr.handler.component.QueryComponent
 : [javac]   ShardFieldSortedHitQueue queue;

 Weird ... can you provide us more details about the java compiler you are
 using?

 ShardFieldSortedHitQueue is a package protected class declared in
 ShardDoc.java (in the same package as QueryComponent).  That isn't exactly
 a best practice, but it shouldn't be causing a compilation failure.


 -Hoss

>>>
>>>
>>>
>>> --
>>> ---
>>> Thanks & Regards
>>> Umesh Prasad
>>>
>>
>>
>>
>> --
>> ---
>> Thanks & Regards
>> Umesh Prasad
>


Re: Best way to design a "story and comments" schema.

2013-05-13 Thread Jack Park
Jack,

Why are multi-valued fields considered messy?
I think I am about to learn something..

Thanks
Another Jack

On Mon, May 13, 2013 at 5:29 AM, Jack Krupansky  wrote:
> Try the simplest, cleanest design first (at least on paper), before you
> start resorting to either dynamic fields or multi-valued fields or other
> messy approaches. Like, one collection for stories, which would have a story
> id and a second collection for comments, each with a comment id and a field
> that is the associated story id and user id. And a third collection for
> users and their profiles. Identify the user and get their user id. Identify
> the story (maybe by keyword search) to get story id. Then identify and facet
> user comments by story id and user id and whatever other search criteria,
> and then facet on that.
>
> -- Jack Krupansky
>
> -Original Message- From: samabhiK
> Sent: Monday, May 13, 2013 5:24 AM
> To: solr-user@lucene.apache.org
> Subject: Best way to design a "story and comments" schema.
>
>
> Hi, I wish to know how to best design a schema to store comments in stories
> /
> articles posted.
> I have a set of fields:
>   /    indexed="true" stored="true"/>
>    indexed="true" stored="true"/>
>    indexed="true" stored="true"/>
>    indexed="false" stored="true" />   /
> Users can post their comments on a post and I should be able to retrieve
> these comments and show it along side the original post. I only need to show
> the last 3 comments and show a facet of the remaining comments which user
> can click and see the rest of the comments ( something like facebook does ).
> One alternative, I could think of, was adding a dynamic field for all
> comments :
> / indexed="false"  stored="true"/>/
> So, to store each comments, I would send a text to solr of the form ->
> For Field Name: /comment_n/ Value:/[Commenter Name]:[Commenter ID]:[Actual
> Comment Text]/
> And to keep the count of those comments, I could use another field like so
> :/ indexed="true" stored="true"/>/
> With this approach, I will have to do some calculation when a comment is
> deleted by the user but I still can manage to show the comments right.
> My idea is to find the best solution for this scenario which will be fast
> and also be simple.
> Kindly suggest.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Best-way-to-design-a-story-and-comments-schema-tp4062867.html
> Sent from the Solr - User mailing list archive at Nabble.com.
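
To make the retrieval side of that multi-collection design concrete: "the
last 3 comments" is just a sorted, row-limited query against the comments
collection. A sketch with hypothetical field names, where commentsServer is
an HttpSolrServer pointed at the comments collection:

SolrQuery q = new SolrQuery("story_id:" + storyId);
q.addSortField("comment_date", SolrQuery.ORDER.desc);  // newest first
q.setRows(3);                                          // only the last three
QueryResponse last3 = commentsServer.query(q);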


Re: Quick SolrJ query how-to question.

2013-05-14 Thread Jack Park
In some sense, if all you want to do is send over a URL, e.g.
http://localhost:8993/, it's not out of the question to
use the java url stuff as exemplified at
http://www.cafeaulait.org/course/week12/22.html
or
http://stackoverflow.com/questions/7500342/using-sockets-to-fetch-a-webpage-with-java

But, that's a trivial case. You might have something else in mind.
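
For that trivial case the whole client is a few lines of stock java.net
code (the URL and query are placeholders):

URL url = new URL("http://localhost:8983/solr/select?q=*:*&wt=json");
BufferedReader in = new BufferedReader(
    new InputStreamReader(url.openStream(), "UTF-8"));
String line;
while ((line = in.readLine()) != null)
  System.out.println(line);   // raw JSON response, one line at a time
in.close();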

Jack

On Tue, May 14, 2013 at 1:36 PM, Shawn Heisey  wrote:
> On 5/14/2013 3:13 AM, Luis Cappa Banda wrote:
>> I know that, but I was wondering if it exists another way just to set the
>> complete query (including q, fq, sort, etc.) embedded in a SolrQuery object
>> as the same way that you query using some kind of RequestHandler. That way
>> would be more flexible because you don't need to parse the complete query
>> checking q, fg, sort... parameters one by one and setting them with
>> setFields(), setStart(), setRows(), etcetera. Solr is doing that query
>> parse internally when you execute queries with it's REST API and maybe
>> there exist a way to re-use that functionality to just set a String to a
>> SolrQuery and that SolrQuery does internally all the magic.
>
> This is a little bit of an odd idea, because it goes against the way a
> Java programmer expects to do things.  Where does the 'URL parameter'
> version of your query come from?  If it's possible, it would make more
> sense to incorporate that code into your SolrJ app and avoid two steps
> -- the need to create the URL syntax and the need to decode the URL syntax.
>
> In a later message, you said that you are working on a SolrServer
> implementation to handle your use case.  I'm wondering if SolrJ already
> has URL query parameter parsing capability.  I'd be slightly surprised
> to learn that it does - that code is probably part of the servlet API.
>
> It's not that your idea is bad, it just sounds like a ton of extra work
> that could be better spent elsewhere.
>
> Thanks,
> Shawn
>


Re: Varnish

2013-06-21 Thread Jack Park
I presume you mean https://www.varnish-cache.org/
That's the first I'd heard of it.

Thanks
Jack

On Thu, Jun 20, 2013 at 10:48 PM, William Bell  wrote:
> Who is using varnish in front of SOLR?
>
> Anyone have any configs that work with the cache control headers of SOLR?
>
> --
> Bill Bell
> billnb...@gmail.com
> cell 720-256-8076


Re: The book: Solr 4.x Deep Dive - Early Access Release #1

2013-06-21 Thread Jack Park
As one of the early reviewers of the manuscript, I always had high
hopes for this work.

I now have the pdf from lulu; do not have time now to dive deeply, but
will comment that it seems, to me at least, well worth owning.

Jack

On Fri, Jun 21, 2013 at 11:41 AM, Jack Krupansky
 wrote:
> Okay, it's DONE. Here's the Lulu link, ready to go:
>
> http://www.lulu.com/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-1/ebook/product-21079719.html
>
> (Or, go to Lulu.com and just search for "Solr" - It's the only hit so far.)
>
> Price is $9.99 for now (I get $8.10 of that, BTW, in case you're wondering
> how Lulu works - minus $0.90 (10%)  "base price" to host the file,
> bandwidth, credit card processing, etc., and minus another $0.90 (10%) for
> Lulu's "share, a total of 19% to Lulu.)
>
> I'll see how the response is over the next two weeks and maybe adjust the
> price. I almost went with $14.99 or even $19.99, but I  decided this was a
> decent introductory special. I mean, if it was complete, I might sell the
> e-book for $25 or $29.99 or so.
>
> This pricing and distribution is all an experiment and subject to change at
> any time.
>
> Thanks for all the feedback!
>
> Seriously, if you want to wait two weeks or a month for cleanup, go right
> ahead. I thought of delaying so that "everything looks right", but I decided
> that some us us just want the facts and the "finish" is not as important.
> I'll try to cater to both.
>
> I'll spend another week or so on cleanup, and then decide whether to
> intensify "finish" work, or focus on adding more content, like highlighting,
> distributed search, DIH, core and collection management, or maybe even
> Spatial.
>
> Here are the topics that are NOT in the current early-access edition:
>
> - SolrCloud
> - Traditional Distributed Solr - shards, master/slave, replication
> - Data Import Handler (DIH)
> - Core management
> - Collection management
> - Admin UI
> - Admin API
> - Luke
> - CheckIndex
> - Spatial and Geospatial search
> - Highlighting
> - Query elevation
> - Autocomplete deep dive
> - SolrJ API
> - UI example
> - Application layer example
> - Terms Component
> - Term vectors component
> - Javabin format
> - Deeper coverage of DocValues (mentioned in Faceting)
>
> All of those are candidates for work over in the coming months.
>
> Here are aspects that are NOT under consideration and beyond the current
> anticipated scope of the book, for now:
>
> - Cookbook approach to Solr
> - Deployment, such as configuring Tomcat
> - Tuning, estimation, performance optimization
> - Troubleshooting
> - Tips
> - Security
> - Access control
> - Document-level access control
> - Relevance Tuning
> - Data Modeling
> - How to develop custom plugin code
> - Lucene API itself
> - Diagrams - sorry, I'm a text guy - but contributions are welcome
> - Details of Lucene index format
> - Details of Lucene document scoring and relevancy
> - Non-Java client APIs
>
> -- Jack Krupansky
>
> -Original Message- From: Jack Krupansky
> Sent: Friday, June 21, 2013 9:04 AM
> To: solr-user@lucene.apache.org
> Subject: The book: Solr 4.x Deep Dive - Early Access Release #1
>
>
> I’m expecting to self-publish the first Early Access Release for my book,
> Solr 4.x Deep Dive, on lulu.com sometime today. It is still far from
> finished and needs lots of work and missing a lot of important areas
> (SolrCloud and distributed Solr in general, DIH, highlighting, core and
> collection API, admin API and UI, query elevation, etc.), but I think there
> is a critical mass of useful material that is a decent foundation to build
> the rest of the book on. For those who participated in the early chapter
> review process for the book’s predecessor (Lucene and Solr: The Definitive
> Guide), most of those review chapters (at least the ones authored by me) are
> included, plus a bunch more, especially chapters on indexing data, update
> processors, and faceting. The new book is Solr-only. Alas, I have not
> incorporated most of the reviewer feedback yet as I have been focused on
> writing for the indexing and faceting chapters for the past two months.
>
> It will be e-book (PDF) only for the time being. Don’t even think about
> printing it yourself – over 1,100 pages, and counting! Currently a 5MB
> download.
>
> I still haven’t settled on pricing. For early access, the intent is that
> people will want to check back every couple weeks or month or two, more like
> a subscription. My current thought is to treat it as if it were a $60 to $80
> paper book bought once per year, but on a monthly subscription, say $5 to $8
> per download. My expectation is to update roughly every two weeks, or at
> least monthly, as new material is added, issues resolved, and new Solr
> releases. In the early going, I’ll probably update the PDF on lulu every
> week.
>
> Given this rough model, what price point has the most appeal: $2.99 (yeah,
> who doesn’t want it, but little incentive for me!), $4.99 (seems reasonable,
> 

New user: version conflict for ....

2012-12-26 Thread Jack Park
I am running against a networked Solr4 installation -- but not using
any of the cloud apparatus. I wish to update a document (Node) with
new information. I send back as a partial update using SolrJ's add()
command
document id
the new or updated field
version number precisely as it was fetched

What I get back is an error message:
version conflict for MyFirstNode1356582803755
expected=1422480157168369664 actual=1422480158385766400

When I look at that document in the admin browser, it looks like this:
{
"TupleListProperty": [
"250d3a66\\-c2cc\\-4c83\\-9b93\\-46cfdde120b8"
],
"_version_": 1422480158385766400,
"LocatorPropertyType": "MyFirstNode1356582803755"
}

which means all the other data in that node were replaced.

There must be something obvious I am missing. Is there a particular
recipe online using SolrJ to do this properly?

Many thanks in advance
Jack


Re: New user: version conflict for ....

2012-12-31 Thread Jack Park
Hi Chris,

Your suspicion turned out to be spot on with a code glitch.

The history of this has been due to a fairly weak understanding of how
partial update works. The first code error was just a simple, stupid
one in which I was not working against a "current" copy of the
document. But, when I got that single update working, I revised the
unit test to increase the complexity.

The core unit test creates two "nodes", and then wires them together
with a third node which serves as a relation topic between the two
nodes; yes, I am building topic maps here.  Each actor node
(SolrDocument) is constructed, then is updated with an addition to a
multi-valued field which contains node identifiers for the TupleNode,
a SolrDocument which turns a relationship between two actors into a
topic itself.  The update is that of adding a value to a previously
empty field.

The second version of the unit test creates the two actors, and a
first relation, then adds a second one.  This is where I discovered
(lots of trial and error here), that you don't send in the list in the
add or set, rather you send in one value at a time.

I am imagining that the solution to sending in multivalued updates on
the same field might mean a custom update handler which reduces HTTP
round trips when dealing with a list of values to add. Perhaps there
is a documented way to do multiple updates on the same document/field
pair in a single call?
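
For the record, the one-value-at-a-time shape that does work for me looks
roughly like this ("locator" is my unique key and "tupleList" my multi-valued
field -- nothing standard about either name):

SolrInputDocument doc = new SolrInputDocument();
doc.addField("locator", nodeId);              // identifies the doc to patch
Map<String, Object> op = new HashMap<String, Object>();
op.put("add", newValue);                      // one value per request
doc.addField("tupleList", op);
server.add(doc);
server.commit();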

Many thanks.
Jack

On Mon, Dec 31, 2012 at 12:06 PM, Chris Hostetter
 wrote:
>
> : any of the cloud apparatus. I wish to update a document (Node) with
> : new information. I send back as a partial update using SolrJ's add()
> : command
> : document id
> : the new or updated field
> : version number precisely as it was fetched
>
> can you give us more details about what your client code is doing --
> ideally just include a complate example.
>
> : What I get back is an error message:
> : version conflict for MyFirstNode1356582803755
> : expected=1422480157168369664 actual=1422480158385766400
>
> that error suggests that in between the time you downloaded the document,
> and when you sent the request to update the document, some other update
> was already recieved and changed the version number.
>
> : When I look at that document in the admin browser, it looks like this:
> ...
> : which means all the other data in that node were replaced.
>
> Are you sure the field values were replaced by *your* changes, the ones
> that you sent when that error was returned, or is it possible some other
> instnace of your code did the exact same update and got a success?
>
> Checking your server logs to see if multiple update commands were recieved
> by solr is one way to help verify this.
>
> my suspicion is that maybe you have a glitch in your code that results in
> the update operation actually happening twice -- and it's the second
> update command that is getting the error.
>
>
>
> -Hoss


Question about dates and SolrJ

2013-01-12 Thread Jack Park
My work engages SolrJ, with which I send documents off to Solr 4 which
properly store, as viewed in the admin panel, as this example:
2013-02-04T02:11:39.995Z

When I retrieve a document with that date, I use the SolrDocument
returned as a Map in which the date now looks like
this:
Sun Feb 03 18:11:39 PST 2013

I am thinking that I am missing something in the SolrJ configuration,
though it could be in how I structure the query; for now, here is the
simplistic way I setup SolrJ:

HttpSolrServer server = new HttpSolrServer(solrURL);
server.setParser(new XMLResponseParser())

Is there something I am missing to retain dates as Solr stores them?

Many thanks in advance
Jack


Re: Question about dates and SolrJ

2013-01-13 Thread Jack Park
Thanks Shawn.

I stopped setting the parser as suggested.

I found that what I had to do is to just store Date objects in my
documents, then, at the last minute, when building a SolrDocument to
send, convert with DateField. When I Export to XML, I export to that
DateField string, then convert the zulu string back to a Date object
as needed.

Seems to be working fine now.
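
Concretely, the zulu round trip can be written with plain java.text if you
would rather not lean on Solr's DateField class (a sketch; SimpleDateFormat
is not thread-safe, so create one per use or guard it):

SimpleDateFormat zulu = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'");
zulu.setTimeZone(TimeZone.getTimeZone("UTC"));  // Solr dates are always UTC
String wire = zulu.format(someDate);            // e.g. 2013-02-04T02:11:39.995Z
Date back = zulu.parse(wire);                   // and back again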

Many thanks
Jack

On Sat, Jan 12, 2013 at 10:52 PM, Shawn Heisey  wrote:
> On 1/12/2013 7:51 PM, Jack Park wrote:
>>
>> My work engages SolrJ, with which I send documents off to Solr 4 which
>> properly store, as viewed in the admin panel, as this example:
>> 2013-02-04T02:11:39.995Z
>>
>> When I retrieve a document with that date, I use the SolrDocument
>> returned as a Map in which the date now looks like
>> this:
>> Sun Feb 03 18:11:39 PST 2013
>>
>> I am thinking that I am missing something in the SolrJ configuration,
>> though it could be in how I structure the query; for now, here is the
>> simplistic way I setup SolrJ:
>>
>> HttpSolrServer server = new HttpSolrServer(solrURL);
>> server.setParser(new XMLResponseParser())
>>
>> Is there something I am missing to retain dates as Solr stores them?
>
>
> Quick note: setting the parser is NOT necessary unless you are trying to
> connect radically different versions of Solr and SolrJ (1.x and 3.x/later,
> to be precise), and will in fact make SolrJ slightly slower when contacting
> Solr.  Just let it use the default javabin parser -- it's faster.
>
> If your date field in Solr is an actual date type, then you should be
> getting back a Date object in Java which you can manipulate in all the usual
> Java ways.  The format that you are seeing matches the toString() output
> from a Date object:
>
> http://docs.oracle.com/javase/6/docs/api/java/util/Date.html#toString%28%29
>
> You'll almost certainly have to cast the object so it's the right type:
>
> Date dateField = (Date) doc.get("datefieldname");
>
> Thanks,
> Shawn
>


Re: URL encoding problems

2013-01-17 Thread Jack Park
Similar thoughts: I used unit tests to explore that issue with SolrJ,
originally encoding with ClientUtils; The returned results had "|"
many places in the text, with no clear way to un-encode. I eventually
ran some tests with no encoding at all, including strings like
"hello & goodbye"; such strings were served and fetched
without errors. In queries at the admin console, they show up in the
JSON results correctly.  What's left? I share the confusion about what
is really going on.

Jack

On Thu, Jan 17, 2013 at 2:44 AM, Bruno Dusausoy  wrote:
> Hi,
>
> I have some problems related to URL encoding.
> I'm using Solr 3.6.1 on a Windows (32 bit) system.
> Apache Tomcat is version 6.0.36.
> I'm accessing Solr through solrj-3.3.0.
>
> When using the Solr admin and specifying my request, the URL looks like this
> (${SOLR} is there for the sake of brevity) :
> ${SOLR}/select?q=rapporteur_name%3A%28John+%2BSmith+%2B%5C%28FOO%5C%29%29
>
> But when my app launching the query, the URL looks like this :
> ${SOLR}/select?q=rapporteur_name%3A%28John%5C+Smith%5C+%5C%28FOO%5C%29%29
>
> My "decoded" query, as entered in the admin interface, is :
> rapporteur_name:(John +Smith +\(FOO\))
>
> Both request return results, but only the one returns the correct ones.
>
> The code that escapes the query is :
>
> SolrQuery query = new SolrQuery();
> query.setQuery("rapporteur_name:(" + ClientUtils.escapeQueryChars("John
> Smith (FOO)") + ")");
>
> I don't know if it's the right way to encode the query.
>
> Any ideas or directions ?
>
> Regards.
> --
> Bruno Dusausoy
> Software Engineer
> YP5 Software
> --
> Pensez environnement : limitez l'impression de ce mail.
> Please don't print this e-mail unless you really need to.


When a URL is a component of a query string's data?

2013-01-21 Thread Jack Park
There exists in my Solr index a document (several, actually) which
harbor http:// URL values. Trying to find documents with a particular
URL fails.

The query is like this:
ResourceURLPropertyType:http://someserver.org/something

Fails due to the second ":"

If I substitute %3a into that query, e.g.
ResourceURLPropertyType:http$3a//someserver.org/something
the query goes through and finds nothing.

A fork in the road?
Make it a policy to swap %3a into all URL values going to Solr, then
use the same format in search.
or
Find another way to get the query to work with the full URL,
untouched, in the index.

Googling this one has been difficult due to the ambiguity of "url" in
query strings.

Thoughts?

Many thanks in advance
Jack


Re: When a URL is a component of a query string's data?

2013-01-21 Thread Jack Park
At the admin console, surrounding with "" worked fine.
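
In SolrJ the same two options look like this (a sketch; the field name is
from my schema):

String url = "http://someserver.org/something";
SolrQuery quoted = new SolrQuery("ResourceURLPropertyType:\"" + url + "\"");
// or backslash-escape the reserved characters instead of quoting:
SolrQuery escaped = new SolrQuery("ResourceURLPropertyType:"
    + ClientUtils.escapeQueryChars(url));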

Many thanks
Jack

On Mon, Jan 21, 2013 at 11:24 AM, Jack Krupansky
 wrote:
> The colons are probably okay. It is probably the slashes causing the
> problem. An embedded slash now terminates the preceding term and starts a
> regular expression term (that is terminated by a second slash).
>
> Solution: quote each slash with a backslash.
>
>ResourceURLPropertyType:http:\/\/someserver.org\/something
>
> Or, enclose the URL in quotes.
>
>ResourceURLPropertyType:"http://someserver.org/something";
>
> -- Jack Krupansky
>
> -Original Message- From: Jack Park
> Sent: Monday, January 21, 2013 1:41 PM
> To: solr-user@lucene.apache.org
> Subject: When a URL is a component of a query string's data?
>
>
> There exists in my Solr index a document (several, actually) which
> harbor http:// URL values. Trying to find documents with a particular
> URL fails.
>
> The query is like this:
> ResourceURLPropertyType:http://someserver.org/something
>
> Fails due to the second ":"
>
> If I substitute %3a into that query, e.g.
> ResourceURLPropertyType:http$3a//someserver.org/something
> the query goes through and finds nothing.
>
> A fork in the road?
> Make it a policy to swap %3a into all URL values going to Solr, then
> use the same format in search.
> or
> Find another way to get the query to work with the full URL,
> untouched, in the index.
>
> Googling this one has been difficult due to the ambiguity of "url" in
> query strings.
>
> Thoughts?
>
> Many thanks in advance
> Jack


Solr and Unicode characters in strings

2013-01-21 Thread Jack Park
Here is a situation I now experience:

What Solr has:
economist and thus …@en
What was sent:
economist and thus …@en
where those are just snippets from what I sent up -- the ellipsis was
created by Carrot2, and what comes back when I fetch the document with
that passage.

There is a hint in the Solr FAQ that the server must support UTF-8;
it's not clear how to do that from HTTPSolrServer.
Other hints from around the web suggest I should be using a different
field than type = "string"

I should point out that I am running these developmental tests on the
Solr 4 example build with my schema.xml.

My question is this: what simple, say, utility call would return the
text to its original?
(perhaps that's the wrong question...)

Many thanks in advance
Jack


Re: Solr and Unicode characters in strings

2013-01-22 Thread Jack Park
Thanks!

On Tue, Jan 22, 2013 at 8:59 AM, Otis Gospodnetic
 wrote:
> Hi,
>
> When you run your indexing app make sure you treat what you send to Solr as
> UTF-8.
> Use -Dfile.encoding=UTF8 -Dclient.encoding.override=UTF-8 to the Java
> command line.
>
> Otis
> --
> Solr & ElasticSearch Support
> http://sematext.com/
>
>
>
>
>
> On Mon, Jan 21, 2013 at 3:06 PM, Jack Park  wrote:
>
>> Here is a situation I now experience:
>>
>> What Solr has:
>> economist and thus …@en
>> What was sent:
>> economist and thus …@en
>> where those are just snippets from what I sent up -- the ellipsis was
>> created by Carrot2, and what comes back when I fetch the document with
>> that passage.
>>
>> There is a hint in the Solr FAQ that the server must support UTF-8;
>> it's not clear how to do that from HTTPSolrServer.
>> Other hints from around the web suggest I should be using a different
>> field than type = "string"
>>
>> I should point out that I am running these developmental tests on the
>> Solr 4 example build with my schema.xml.
>>
>> My question is this: what simple, say, utility call would return the
>> text to its original?
>> (perhaps that's the wrong question...)
>>
>> Many thanks in advance
>> Jack
>>
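
The same principle applies on the indexing side: never let a Reader pick up
the platform default charset. A minimal guard, in standard Java only:

BufferedReader in = new BufferedReader(
    new InputStreamReader(new FileInputStream(path), "UTF-8"));

With the bytes decoded correctly on the way in, HttpSolrServer already puts
UTF-8 on the wire.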


UpdateResponse agents in a cloud?

2013-02-10 Thread Jack Park
Say you have a dozen servers, one core each. Say you wish to add an
agent reference inside the solrconfig update response descriptor.
Would you do that for every core?

Thanks in advance.
Jack


Re: Introducing Solrstrap: A blazing fast tool for querying Solr in a Googleish fashion

2013-02-17 Thread Jack Park
Hi Fergus,

Would it make sense to you to switch to the Apache 2 license so that
your project can "play nice" in the apache ecosystem?

Thanks
Jack

On Sun, Feb 17, 2013 at 6:25 AM, Fergus McDowall
 wrote:
> Erik
>
> Thanks for the great feedback. It fills me with joy to know that another
> human being has chosen to use Solrstrap
>
> 1) I have added a couple more CONST variables to the code to allow the
> implementer to specify the names of the hit body and hit title
> (re: exampledocs/*.xml)
>
> 2) In order to pass a full document to the hit-template you could simply to
> this:
>
> rs.append(hitTemplate({doc: result.response.docs[i]}));
>
> and then change the hit template so that it references each hit as "doc"
> and subfields thereof {{doc.somefield}}
>
> 
> 
> {{doc.title}}
> {{doc.text}}
> {{doc.metadata}}
>
>
>
> 3) As for the license- I take your ribbing in the spirit in which it was
> intended :) Seriously though- this is my first open source contribution, so
> I haven't given licensing a lot of though. What would a more appropriate
> license be?
>
> Fergie
>
> On Sun, Feb 17, 2013 at 12:43 PM, Erik Hatcher wrote:
>
>> Fergie -
>>
>> Nice!
>>
>> I was able to get this working on a Solr 4.1 "example" instance following
>> these steps:
>>
>> * Adjusting SERVERROOT in bootstrap/js/solrstrap.js to
>> http://localhost:8983/solr/collection1/select/
>> * Changed line #38 in the same file to this:
>>
>> rs.append(hitTemplate({title: result.response.docs[i].name,
>> text: result.response.docs[i].text}));
>>
>> Just changing ".title" to ".name" since Solr's exampledocs/*.xml files use
>> "name" not "title".
>>
>> I like projects like this, making it really point and click easy to see
>> and work with Solr. I'll just point out the important caveat that you
>> mention, that it's "Designed for "open" solr instances" and "needs clear
>> access to /select", as this is something easy to overlook at first
>> (beautiful) glance and think we can just go to production without taking
>> the necessary other steps to prevent Solr from being exposed directly.
>>
>> This is a nice start to a fun way to get started with Solr.
>>
>> A few questions:
>>
>> What would it take to get the full document object passed into the hit
>> template? And what would that hit template then look like? (navigating
>> say a "doc" object in the template rather than each field being passed
>> explicitly)
>>
>> Right now it's called from the above line of code (is hitTemplate()
>> mapping to the id="hit-template" in solrstramp.html part of handlebars
>> magic? Or is this explicit somewhere?)
>>
>> Here's the current hit template:
>>
>>
>> {{title}}
>> {{text}}
>>
>>
>>
>> And finally... GPL?! ewww, why?! (-1) :)
>>
>> Well played, Fergus!
>>
>> Erik
>>
>>
>> On Feb 17, 2013, at 05:35 , Fergus McDowall wrote:
>>
>> > Solrstrap is a very basic Query-Result interface for Solr. Solrstrap is
>> intended to be a starting point for those building web interfaces that talk
>> to Solr, or a very lightweight admin tool for querying Solr in a Googleish
>> fashion.
>> >
>> > Cool things about Solrstrap:
>> >
>> >* Requires only local installation- easy to set up
>> >* Access to all Bootstrap functionality. Can be easily extended in a
>> Bootstrappy way.
>> >* Blazing fast
>> >* Uses less bandwidth
>> >
>> > Use it as you see fit. Merciless criticism and fawning praise equally
>> welcome.
>> >
>> > See http://fergiemcdowall.github.com/solrstrap/
>> >
>> > and
>> >
>> > http://blog.comperiosearch.com/blog/2013/02/17/introducing-solrstrap/
>> >
>> > Fergus
>> >
>> >
>>

Document update question

2013-02-20 Thread Jack Park
From what I can read about partial updates, it will only work for
singleton fields where you can set them to something else, or
multi-valued fields where you can add something. I am testing on 4.1

I ran some tests to prove to me that you cannot do anything else to a
multi-valued field, like remove a value and do a partial update on the
whole list. It flattens the result to a comma delimited String when I
remove a value, from
   "details": [
  "here & there",
  "Hello there",
  "Oh Fudge"
],
to this
   "details": [
  "[here & there, Oh Fudge]"
],

Does this mean that I must remove the entire document and re-index it?

Many thanks in advance
Jack


Re: Document update question

2013-02-21 Thread Jack Park
I am using 4.1. I was not aware of that link. In the absence of being
able to do partial updates to multi-valued fields, I just punted to
delete and reindex. I'd like to see otherwise.

Many thanks
Jack

On Thu, Feb 21, 2013 at 8:13 AM, Timothy Potter  wrote:
> Hi Jack,
>
> There was a bug for this fixed for 4.1 - which version are you on? I
> remember this b/c I was on 4.0 and had to upgrade for this exact
> reason.
>
> https://issues.apache.org/jira/browse/SOLR-4134
>
> Tim
>
> On Wed, Feb 20, 2013 at 9:16 PM, Jack Park  wrote:
>> From what I can read about partial updates, it will only work for
>> singleton fields where you can set them to something else, or
>> multi-valued fields where you can add something. I am testing on 4.1
>>
>> I ran some tests to prove to me that you cannot do anything else to a
>> multi-valued field, like remove a value and do a partial update on the
>> whole list. It flattens the result to a comma delimited String when I
>> remove a value, from
>>"details": [
>>   "here & there",
>>   "Hello there",
>>   "Oh Fudge"
>> ],
>> to this
>>"details": [
>>   "[here & there, Oh Fudge]"
>> ],
>>
>> Does this meant that I must remove the entire document and re-index it?
>>
>> Many thanks in advance
>> Jack


Re: Document update question

2013-02-21 Thread Jack Park
Interesting you should say that.  Here is my solrj code:

public Solr3Client(String solrURL) throws Exception {
    server = new HttpSolrServer(solrURL);
    //  server.setParser(new XMLResponseParser());
}

I cannot recall why I commented out the setParser line; something
about someone saying in another thread it's not important. I suppose I
should revisit my unit tests with that line uncommented. Or, did I
miss something?

The JSON results I painted earlier were from reading the document
online in the admin query panel.

Many thanks
Jack

On Thu, Feb 21, 2013 at 8:52 AM, Timothy Potter  wrote:
> Weird - the only difference I see is that we us XML vs. JSON, but
> otherwise, doing the following works for us:
>
> VALU1
> VALU2
>
> Result would be:
>
> 
>   VALU1
>   VALU2
> 
>
>
> On Thu, Feb 21, 2013 at 9:44 AM, Jack Park  wrote:
>> I am using 4.1. I was not aware of that link. In the absence of being
>> able to do partial updates to multi-valued fields, I just punted to
>> delete and reindex. I'd like to see otherwise.
>>
>> Many thanks
>> Jack
>>
>> On Thu, Feb 21, 2013 at 8:13 AM, Timothy Potter  wrote:
>>> Hi Jack,
>>>
>>> There was a bug for this fixed for 4.1 - which version are you on? I
>>> remember this b/c I was on 4.0 and had to upgrade for this exact
>>> reason.
>>>
>>> https://issues.apache.org/jira/browse/SOLR-4134
>>>
>>> Tim
>>>
>>> On Wed, Feb 20, 2013 at 9:16 PM, Jack Park  wrote:
>>>> From what I can read about partial updates, it will only work for
>>>> singleton fields where you can set them to something else, or
>>>> multi-valued fields where you can add something. I am testing on 4.1
>>>>
>>>> I ran some tests to prove to me that you cannot do anything else to a
>>>> multi-valued field, like remove a value and do a partial update on the
>>>> whole list. It flattens the result to a comma delimited String when I
>>>> remove a value, from
>>>>"details": [
>>>>   "here & there",
>>>>   "Hello there",
>>>>   "Oh Fudge"
>>>> ],
>>>> to this
>>>>"details": [
>>>>   "[here & there, Oh Fudge]"
>>>> ],
>>>>
>>>> Does this meant that I must remove the entire document and re-index it?
>>>>
>>>> Many thanks in advance
>>>> Jack


Re: If we Open Source our platform, would it be interesting to you?

2013-02-21 Thread Jack Park
Marcelo

In some sense, it sounds like you are aiming at building a topic map
of all your resources.

Jack

On Thu, Feb 21, 2013 at 11:54 AM, Marcelo Elias Del Valle
 wrote:
> Hello David,
>
>  First of all, thanks for answering!
>
> 2013/2/21 David Quarterman 
>
>> Looked through your site and the framework looks very powerful as an
>> aggregator. We do a lot of data aggregation from many different sources in
>> many different formats (XML, JSON, text, CSV, etc) using RDBMS as the main
>> repository for eventual SOLR indexing. A 'one-stop-shop' for all this would
>> be very appealing.
>>
>
> Actually, just to clarify, it uses Cassandra as repository, not an
> RDMS. We want to use it for large scale, so you could import entire company
> databases into the repo and relate the data from one another. However, If I
> understood you right, you got the idea, an intermediate repo before
> indexing, so you could postpone decisions about what to index and how...
>
>
>> Have you looked at products like Talend & Jitterbit? These offer
>> transformation from almost anything to almost anything using graphical
>> interfaces (Jitterbit is better) and a PHP-like coding format for trickier
>> work. If you (or somebody) could add a graphical interface, the world would
>> beat a path to your door!
>
>
>  This is very interesting, actually! We considered using Talend when we
> started our business, but we decided to go ahead with the development of a
> new product. The reason was: Talend is great, but it limits a good
> programmer, if he is more agile coding than using graphical interfaces.
> Have user interfaces as a possibility is nice, but as something you HAVE TO
> use is awful. Besides, it has a learning curve and seems to run better and
> you hire their own platform, and we wanted to choose the fine grain of our
> platform.
>   However, your question made me think a lot about it. Do you think
> integrating to jitterbit or talend could be interesting? Or did you mean
> developing a new user interface? The bad thing I see in integrating with a
> talend like program is that you start to be dependent on the graphical
> interface, I feel it's hard to use my own java code... I might be wrong.
>   Anyway, I will consider this possibility, but if you could explain
> better why you think one or other could be such a good idea would help us a
> lot. Would you be interested in using such a tool yourself?
>
> Best regards,
> Marcelo.


Re: semantic search questions

2013-02-22 Thread Jack Park
Hi Vinay,

Perhaps you could say more about what you are looking for? What use cases, say.
Did you see the book _Taming Text_?

Thanks
Jack

On Fri, Feb 22, 2013 at 8:48 AM, Vinay B,  wrote:
> Hi,
>
> A few questions, some specific to UIMA, others more general.
> 1. The SOLR/UIMA example employs 3rd party (some of which are
> commercial) semantic APIs such as AlchemyApi and OpenCalais. This
> won't do for our application (semantic analysis of large numbers of
> plain text files) . Are there any open source alternatives that work
> with SOLR and can achieve the same results.  OpenNLP can extract parts
> of speech and extract names etc but isn't really meant for concept
> extraction.
> 2. Regardless of the caveat mentioned above, can someone illustrate a
> usecase for UIMA annotations . i.e. what kind of queries can be
> performed once a document has been processed via the UIMA plugin
> 3. Does (or can) SOLR have any disambiguation functionality (either
> native or via a 3rd party plugin) and if so, how can I leverage it.
> Once again OpenNLP has a part of speech tagger that could possibly be
> used for this.
> eg. if doc 1 contains text "This pipe is made of lead" (lead is a
> noun) and doc 2 contains  text "Lincoln lead by example" (lead is a
> verb) , how would I phrase a query intended to return docs that
> contain the term "lead" as a verb. If there's a link that explains
> how to do this, please do post it.
>
> Apparently SIREN (http://siren.sindice.com/index.html) has some of
> this functionality (and more) built in but the documentation and use
> cases are a bit sketchy. It also hasn't been updated in a year. Does
> anyone know if it will be compatible with future SOLR / Lucene
> releases.
>
> Thanks for your responses.


Interesting issue with "special characters" in a string field value

2013-02-22 Thread Jack Park
I have a multi-value stored field called "details"

I've been deliberately sending it values like



If I fetch a document with that field at the admin query console,
using XML, I get:

 
  


If I fetch with JSON, I get:
"details": [
  ""
],

Even more curious, if I use this query at the console:

details:

I get nothing back.
I think I'm having an identity crisis in relation to escaping
characters at SolrJ. The values are going up, and when the query is to
bring the document back, they come back. But, as individual values,
they don't appear to submit to query. If I actually escape them going
up, then the document is full of escaped characters, which can be
troublesome when fetching and using.
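
For reference, the send side is nothing fancier than this sketch (the field
value is a stand-in):

SolrInputDocument doc = new SolrInputDocument();
doc.addField("details", "<something to use as a source node>");
server.add(doc);     // SolrJ does the wire-format escaping itself
server.commit();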

Any thoughts?

Many thanks
Jack


Re: Interesting issue with "special characters" in a string field value

2013-02-22 Thread Jack Park
Michael,
I don't think you misunderstood. I will soon give a full response here, but
am on the road at the moment.

Many thanks
Jack

On Friday, February 22, 2013, Michael Della Bitta <
michael.della.bi...@appinions.com> wrote:
> My mistake, I misunderstood the problem.
>
> Michael Della Bitta
>
> 
> Appinions
> 18 East 41st Street, 2nd Floor
> New York, NY 10017-6271
>
> www.appinions.com
>
> Where Influence Isn't a Game
>
>
> On Fri, Feb 22, 2013 at 3:55 PM, Chris Hostetter
>  wrote:
>>
>> : If you're submitting documents as XML, you're always going to have to
>> : escape meaningful XML characters going in. If you ask for them back as
>> : XML, you should be prepared to unescape special XML characters as
>>
>> that still wouldn't explain the discrepancy he's claiming to see between
>> the json & xml responses (the json containing an empty string
>>
>> Jack: please elaborate with specifics about your solr version, field,
>> field type, how you indexed your doc, and what the request urls & raw
>> responses that you get are (ie: don't trust the XML you see in your
>> browser, it may be unescaping escaped sequences in element text to be
>> "helpful" .. use something like curl)
>>
>> For example...
>>
>> BEGIN GOOD EXAMPLE OF SPECIFICS---
>>
>> I'm using Solr 4.x with the 4.x example schema which has the following
>> field...
>>
>>    <field name="cat" type="string" indexed="true" stored="true" multiValued="true"/>
>>    <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
>>
>> I indexed a doc like this...
>>
>> $ curl "http://localhost:8983/solr/update?commit=true"; -H
'Content-type:application/json' -d '[{"id":"hoss", "cat":"<Something to use as a source node>" } ]'
>>
>> And this is what i get from the following requests...
>>
>> $ curl "
http://localhost:8983/solr/select?q=id:hoss&wt=xml&indent=true&omitHeader=true
"
>> <?xml version="1.0" encoding="UTF-8"?>
>> <response>
>>
>> <result name="response" numFound="1" start="0">
>>   <doc>
>>     <str name="id">hoss</str>
>>     <arr name="cat">
>>       <str>&lt;Something to use as a source node&gt;</str>
>>     </arr>
>>     <long name="_version_">1427705631375097856</long>
>>   </doc>
>> </result>
>> </response>
>>
>> $ curl "
http://localhost:8983/solr/select?q=id:hoss&wt=json&indent=true&omitHeader=true
"
>> {
>>   "response":{"numFound":1,"start":0,"docs":[
>>   {
>> "id":"hoss",
>> "cat":[""],
>> "_version_":1427705631375097856}]
>>   }}
>>
>> $ curl "http://localhost:8983/solr/select?q=cat:%22
%22&wt=json&indent=true&omitHeader=true"
>> {
>>   "response":{"numFound":1,"start":0,"docs":[
>>   {
>> "id":"hoss",
>> "cat":[""],
>> "_version_":1427705631375097856}]
>>   }}
>>
>> END GOOD EXAMPLE OF SPECIFICS---
>>
>> : > Even more curious, if I use this query at the console:
>> : >
>> : > details:
>> : >
>> : > I get nothing back.
>>
>> note in my last example above the importance of using quotes (or the
>> {!term} qparser) to query string fields that contain special characters
>> like whitespace -- whitespace is syntactically meaningful to the lucene query
>> parser, it separates clauses of a boolean query.
>>
>>
>> -Hoss
>


Re: Interesting issue with "special characters" in a string field value

2013-02-23 Thread Jack Park
Ok. I have revisited this issue as deeply as possible using simplistic
unit tests, tossing out indexes, and starting fresh.

A typical Solr document might have a label, e.g. the string inside the
quotes: "Node Type".  That would be queried, according to what I've
been able to read, as a Phrase Query, which means, include the quotes
around the text.

When I use the admin query panel with this query:
label:"Node Type"
A fragment of the full document is returned. It is this:

  <doc>
    <str name="locator">NodeType</str>
    <arr name="label">
      <str>Node Type</str>
    </arr>

In my code using SolrJ, I have printlines just as the "escaped" query
string comes in, and one which shows what the SolrQuery looks like
after setting it up to go online. I then show what came back:

Solr3Client.runQuery- label:"Node Type" 0 10
Solr3Client.runQuery-1 q=label%3A%22Node+Type%22&start=0&rows=10
 {numFound=1,start=0,docs=[SolrDocument{locator=NodeType,
smallIcon=cogwheel.png, subOf=ClassType, details=The TopicQuests
typology node type., isPrivate=false, creatorId=SystemUser, label=Node
Type, largeIcon=cogwheel.png, lastEditDate=Sat Feb 23 20:43:22 PST
2013, createdDate=Sat Feb 23 20:43:22 PST 2013,
_version_=1427826019119661056}]}

What that says is that SolrQuery inserted a + inside the query string,
and that it found 1 document, but did not return it.

In the largest picture, I have returned to using XMLResponseParser on
the theory that I will now be able to take advantage of partialUpdates
on multi-valued fields (List) but haven't tested that yet. I
am not yet escaping such things as "<" or ">" but just escaping those
things mentioned in the Solr documents which are reserved characters.

So, the current update is this: learning about phrase queries, and
judicious escaping of reserved characters seems to be helping. Next up
entails two issues: more robust testing of escaped characters, and
trying to discover what is the best approach to dealing with
characters that must be escaped to get past XML, e.g. '<', '>', and
others.
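
In code, the fix amounts to quoting the phrase rather than escaping its
whitespace (a sketch):

String label = "Node Type";
// Quoting keeps the space inside one phrase query; only an embedded
// quote character would still need a backslash.
SolrQuery q = new SolrQuery("label:\"" + label.replace("\"", "\\\"") + "\"");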

Many thanks
Jack


On Fri, Feb 22, 2013 at 2:44 PM, Jack Park  wrote:
> Michael,
> I don't think you misunderstood. I will soon give a full response here, but
> am on the road at the moment.
>
> Many thanks
> Jack
>
>
> On Friday, February 22, 2013, Michael Della Bitta
>  wrote:
>> My mistake, I misunderstood the problem.
>>
>> Michael Della Bitta
>>
>> 
>> Appinions
>> 18 East 41st Street, 2nd Floor
>> New York, NY 10017-6271
>>
>> www.appinions.com
>>
>> Where Influence Isn't a Game
>>
>>
>> On Fri, Feb 22, 2013 at 3:55 PM, Chris Hostetter
>>  wrote:
>>>
>>> : If you're submitting documents as XML, you're always going to have to
>>> : escape meaningful XML characters going in. If you ask for them back as
>>> : XML, you should be prepared to unescape special XML characters as
>>>
>>> that still wouldn't explain the discrepancy he's claiming to see between
>>> the json & xml responses (the json containing an empty string
>>>
>>> Jack: please elaborate with specifics about your solr version, field,
>>> field type, how you indexed your doc, and what the request urls & raw
>>> responses that you get are (ie: don't trust the XML you see in your
>>> browser, it may be unescaping escaped sequences in element text to be
>>> "helpful" .. use something like curl)
>>>
>>> For example...
>>>
>>> BEGIN GOOD EXAMPLE OF SPECIFICS---
>>>
>>> I'm using Solr 4.x with the 4.x example schema which has the following
>>> field...
>>>
>>>>> multiValued="true"/>
>>>>> />
>>>
>>> I indexed a doc like this...
>>>
>>> $ curl "http://localhost:8983/solr/update?commit=true"; -H
>>> 'Content-type:application/json' -d '[{"id":"hoss", "cat":"<Something to use
>>> as a source node>" } ]'
>>>
>>> And this is what i get from the following requests...
>>>
>>> $ curl
>>> "http://localhost:8983/solr/select?q=id:hoss&wt=xml&indent=true&omitHeader=true";
>>> <?xml version="1.0" encoding="UTF-8"?>
>>> <response>
>>>
>>> <result name="response" numFound="1" start="0">
>>>   <doc>
>>>     <str name="id">hoss</str>
>>>     <arr name="cat">
>>>       <str>&lt;Something to use as a source node&gt;</str>
>>>     </arr>
>>>     <long name="_version_">1427705631375097856</long>
>>>   </doc>
>>> </result>
>>> </response>
>>>
>>> $ curl
>>> "http://localhost:8983/solr/select?q=id:hoss&wt=json&indent=true&omitHeader=true"

Re: Document update question

2013-02-24 Thread Jack Park
I uncommented out the line which sets server to an XMLResponse parser,
and used the following code in a tiny test:

    String sourceNodeLocator = node.getLocator();
    Map<String, Object> updateMap = new HashMap<String, Object>();
    Map<String, Object> newMap = new HashMap<String, Object>();
    Map<String, Object> myMap = node.getProperties();
    List<String> values = (List<String>) myMap.get(key);
    String what = "set";
    values.add(newValue);
    // atomic-update payload: {key: {"set": values}}, keyed by the doc locator
    updateMap.put(ITopicQuestsOntology.LOCATOR_PROPERTY, sourceNodeLocator);
    newMap.put(what, values);
    updateMap.put(key, newMap);
    IResult result = solr.partialUpdateData(updateMap);

The printstring fragment from that test looks like this:


  


  

and fetching in JSON from the admin query console looks like this:
"locator": "MySecondNode1361728848603",
"details": [
  "here & there",
  "Oh Fudge"
],

It appears that using the XMLResponseParser and getting the query
string right works!

Many thanks for all the comments.

Cheers
Jack

On Thu, Feb 21, 2013 at 5:45 PM, Shawn Heisey  wrote:
> On 2/21/2013 10:00 AM, Jack Park wrote:
>>
>> Interesting you should say that.  Here is my solrj code:
>>
>> public Solr3Client(String solrURL) throws Exception {
>> server = new HttpSolrServer(solrURL);
>> //  server.setParser(new XMLResponseParser());
>> }
>>
>> I cannot recall why I commented out the setParser line; something
>> about someone saying in another thread it's not important. I suppose I
>> should revisit my unit tests with that line uncommented. Or, did I
>> miss something?
>>
>> The JSON results I painted earlier were from reading the document
>> online in the admin query panel.
>
>
> Jack,
>
> SolrJ defaults to the javabin response parser, which offers maximum
> efficiency in the communication.  Between version 1.4.1 and 3.1.0, the
> javabin version changed and became incompatible with the old one.
>
> The XML parser is a little bit less efficient than javabin, but is the only
> way to get Solr/SolrJ to talk when one side is using a different javabin
> version than the other side.  If you are not mixing 1.x with later versions,
> you do not need to worry about changing the response parser.
>
> Thanks,
> Shawn
>


Re: Interesting issue with "special characters" in a string field value

2013-02-24 Thread Jack Park
I did run attempt queries with and without escaping at the admin query
browser; made no difference. I seem to recall that the system did not
work without escaping, but it does seem worth blocking escaping and
testing again.

Many thanks
Jack

On Sun, Feb 24, 2013 at 1:16 PM, Michael Della Bitta
 wrote:
> Hello Jack,
>
> I'm not sure if this is an option for you, but if you submit and
> retrieve your documents using only SolrJ, you won't have to worry
> about escaping them for encoding into a particular document format.
> SolrJ would handle that for you.
>
> Michael Della Bitta
>
> 
> Appinions
> 18 East 41st Street, 2nd Floor
> New York, NY 10017-6271
>
> www.appinions.com
>
> Where Influence Isn’t a Game
>
>
> On Sun, Feb 24, 2013 at 12:29 AM, Jack Park  wrote:
>> Ok. I have revisited this issue as deeply as possible using simplistic
>> unit tests, tossing out indexes, and starting fresh.
>>
>> A typical Solr document might have a label, e.g. the string inside the
>> quotes: "Node Type".  That would be queried, according to what I've
>> been able to read, as a Phrase Query, which means, include the quotes
>> around the text.
>>
>> When I use the admin query panel with this query:
>> label:"Node Type"
>> A fragment of the full document is returned. It is this:
>>
>>   <doc>
>>     <str name="locator">NodeType</str>
>>     <arr name="label">
>>       <str>Node Type</str>
>>     </arr>
>>   </doc>
>>
>> In my code using SolrJ, I have printlines just as the "escaped" query
>> string comes in, and one which shows what the SolrQuery looks like
>> after setting it up to go online. I then show what came back:
>>
>> Solr3Client.runQuery- label:"Node Type" 0 10
>> Solr3Client.runQuery-1 q=label%3A%22Node+Type%22&start=0&rows=10
>>  {numFound=1,start=0,docs=[SolrDocument{locator=NodeType,
>> smallIcon=cogwheel.png, subOf=ClassType, details=The TopicQuests
>> typology node type., isPrivate=false, creatorId=SystemUser, label=Node
>> Type, largeIcon=cogwheel.png, lastEditDate=Sat Feb 23 20:43:22 PST
>> 2013, createdDate=Sat Feb 23 20:43:22 PST 2013,
>> _version_=1427826019119661056}]}
>>
>> What that says is that SolrQuery inserted a + inside the query string,
>> and that it found 1 document, but did not return it.
>>
>> In the largest picture, I have returned to using XMLResponseParser on
>> the theory that I will now be able to take advantage of partialUpdates
>> on multi-valued fields (List) but haven't tested that yet. I
>> am not yet escaping such things as "<" or ">" but just escaping those
>> things mentioned in the Solr documents which are reserved characters.
>>
>> So, the current update is this: learning about phrase queries, and
>> judicious escaping of reserved characters seems to be helping. Next up
>> entails two issues: more robust testing of escaped characters, and
>> trying to discover what is the best approach to dealing with
>> characters that must be escaped to get past XML, e.g. '<', '>', and
>> others.
>>
>> Many thanks
>> Jack
>>
>>
>> On Fri, Feb 22, 2013 at 2:44 PM, Jack Park  wrote:
>>> Michael,
>>> I don't think you misunderstood. I will soon give a full response here, but
>>> am on the road at the moment.
>>>
>>> Many thanks
>>> Jack
>>>
>>>
>>> On Friday, February 22, 2013, Michael Della Bitta
>>>  wrote:
>>>> My mistake, I misunderstood the problem.
>>>>
>>>> Michael Della Bitta
>>>>
>>>> 
>>>> Appinions
>>>> 18 East 41st Street, 2nd Floor
>>>> New York, NY 10017-6271
>>>>
>>>> www.appinions.com
>>>>
>>>> Where Influence Isn’t a Game
>>>>
>>>>
>>>> On Fri, Feb 22, 2013 at 3:55 PM, Chris Hostetter
>>>>  wrote:
>>>>>
>>>>> : If you're submitting documents as XML, you're always going to have to
>>>>> : escape meaningful XML characters going in. If you ask for them back as
>>>>> : XML, you should be prepared to unescape special XML characters as
>>>>>
>>>>> that still wouldn't explain the discrepancy he's claiming to see between
>>>>> the json & xml responses (the json containing an empty string
>>>>>
>>>>> Jack: please elaborate

Re: Query with whitespace

2013-03-01 Thread Jack Park
I found a tiny notice about just using quotes; tried it in the admin
query console and it works. e.g. label:"car house" would fetch any
document for which the label field contained that phrase.
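
For reference, a few related forms (the field name is just an example):

  label:"car house"       phrase: both words, adjacent and in order
  label:(+car +house)     both words required, anywhere in the field
  label:(car AND house)   the same, spelled with the AND operator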

Jack

On Fri, Mar 1, 2013 at 9:17 AM, Shawn Heisey  wrote:
> On 3/1/2013 8:50 AM, vsl wrote:
>>
>> I would like to send query like "car house". My expectation is to have
>> resulting documents that contains both car and house. Unfortunately Apache
>> Solr out of the box returns documents as if the whitespace between was
>> treated as OR. Does anybody know how to fix this?
>
>
> Three solutions come to mind: 1) Set the q.op parameter to AND.  2) Send
> "car AND house" instead, or "+car +house".  3) Use the edismax query parser
> (defType=edismax) and set the mm parameter to 100%.  The wiki should have
> info on all these.
>
> Thanks,
> Shawn
>


Custom update handler?

2013-03-10 Thread Jack Park
With 4.1, not in cloud configuration, I have a custom response handler
chain which injects an additional handler for studying the documents
as they come in. But, when I do partial updates on those documents, I
don't want them to be studied again, so I created another version of
the same chain, but without my added feature. I named it "/partial".

When I create an instance of SolrJ for the url /solr/partial,
I get back this error message:

Server at http://localhost:8983/solr/partial returned non ok
status:404, message:Not Found
{locator=2146fd50-fac9-47d5-85c0-47aaeafe177f,
tuples={set=99edfffe-b65c-4b5e-9436-67085ce49c9c}}

Here is the configuration for that:

<updateRequestProcessorChain name="partial">
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

The normal handler chain is this:

<updateRequestProcessorChain name="harvest">
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="org.apache.solr.update.TopicQuestsDocumentProcessFactory">
    <str name="...">hello</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

which runs on a SolrJ set for  http://localhost:8983/solr/

What might I be missing?

Many thanks
Jack


Re: Custom update handler?

2013-03-11 Thread Jack Park
Many thanks.
Let me record here what I have tried.
I have viewed:
http://wiki.apache.org/solr/UpdateXmlMessages

and this github project which is suggestive:
https://github.com/industria/solrprocessors


I now have two UpdateRequestChains:

<updateRequestProcessorChain name="harvest">
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="org.apache.solr.update.TopicQuestsDocumentProcessFactory">
    <str name="...">hello</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

and the new one (which is "harvest" without the
TopicQuestsDocumentProcessFactory):

<updateRequestProcessorChain name="partial">
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

Before I added "partial"
<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
...

"harvest" always ran using http://localhost:8983/solr as the base URL.

A goal was to use "harvest" only for "updates" and use "partial" for
partial updates.

I am now feeding partial with this code:

UpdateRequest ur = new UpdateRequest();
ur.add(document);
ur.setCommitWithin(1000);
UpdateResponse response = 
ur.process(updateServer);
where updateServer is a second SolrJ server set to
http://localhost:8983/solr/update
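
(A sketch of another way to route a request to a specific handler from
SolrJ, assuming a request handler registered under the name "/partial";
the path is resolved relative to the core's base URL:)

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.common.SolrInputDocument;

public class PartialUpdateRouting {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
        UpdateRequest ur = new UpdateRequest("/partial"); // handler name is an assumption
        ur.add(new SolrInputDocument()); // the partial-update document goes here
        ur.setCommitWithin(1000);
        ur.process(server);
        server.shutdown();
    }
}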

But, what is now happening, after I made this addition:

<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">partial</str>
  </lst>
</requestHandler>

dropping "partial" into /update where nothing was there before,

Now, just "partial" is running from the base URL and "harvest" is
never called, which means that I never see partial updates to validate
that part of the code.

At issue is this:

I have two "update" pathways:
One for when I am adding new documents
One for which I am performing partial updates

May I ask how I can configure my system to use "harvest" for new
documents and "partial" for when partial updates are sent in?

Many thanks
Jack


On Mon, Mar 11, 2013 at 12:23 AM, Upayavira  wrote:
> You need to refer to your chain in a RequestHandler config. Search for
> /update, duplicate that, and change the chain it points to.
>
> Upayavira
>
> On Mon, Mar 11, 2013, at 05:22 AM, Jack Park wrote:
>> With 4.1, not in cloud configuration, I have a custom response handler
>> chain which injects an additional handler for studying the documents
>> as they come in. But, when I do partial updates on those documents, I
>> don't want them to be studied again, so I created another version of
>> the same chain, but without my added feature. I named it "/partial".
>>
>> When I create an instance of SolrJ for the url /solr/partial,
>> I get back this error message:
>>
>> Server at http://localhost:8983/solr/partial returned non ok
>> status:404, message:Not Found
>> {locator=2146fd50-fac9-47d5-85c0-47aaeafe177f,
>> tuples={set=99edfffe-b65c-4b5e-9436-67085ce49c9c}}
>>
>> Here is the configuration for that:
>>
>> <updateRequestProcessorChain name="partial">
>>   <processor class="solr.LogUpdateProcessorFactory" />
>>   <processor class="solr.RunUpdateProcessorFactory" />
>> </updateRequestProcessorChain>
>>
>> The normal handler chain is this:
>>
>> <updateRequestProcessorChain name="harvest">
>>   <processor class="solr.LogUpdateProcessorFactory" />
>>   <processor class="org.apache.solr.update.TopicQuestsDocumentProcessFactory">
>>     <str name="...">hello</str>
>>   </processor>
>>   <processor class="solr.RunUpdateProcessorFactory" />
>> </updateRequestProcessorChain>
>>
>> which runs on a SolrJ set for  http://localhost:8983/solr/
>>
>> What might I be missing?
>>
>> Many thanks
>> Jack


Re: Custom update handler? Some progress, new issue

2013-03-11 Thread Jack Park
Further progress now hampered by configuring an update log. When I
follow instructions found around the web, I get this:

SEVERE: Unable to create core: collection1
caused by
Caused by: java.lang.NullPointerException
at org.apache.solr.common.params.SolrParams.toSolrParams(SolrParams.java:295)

Now, the updateLog is configured thus:

<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">partial</str>
  </lst>
  <updateLog>
    <str name="dir">${solr.data.dir:}</str>
  </updateLog>
</requestHandler>

I think the issue lies with "solr.data.dir"
The wikis just say to drop that into the request handler chain,
without any explanation of where "solr.data.dir" comes from.
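
(For reference, in the stock 4.x solrconfig.xml the update log is
declared inside the updateHandler element, not inside a request handler,
and solr.data.dir is a substitutable property that falls back to the
core's data directory when left empty:)

<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.data.dir:}</str>
  </updateLog>
</updateHandler>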

In any case, I might have successfully settled on how to choose which
update chain, but now I am deep into the bowels of update logs.

What am I missing?

Many thanks
Jack


On Mon, Mar 11, 2013 at 9:45 PM, Jack Park  wrote:
> Many thanks.
> Let me record here what I have tried.
> I have viewed:
> http://wiki.apache.org/solr/UpdateXmlMessages
>
> and this github project which is suggestive:
> https://github.com/industria/solrprocessors
>
>
> I now have two UpdateRequestChains:
>
> <updateRequestProcessorChain name="harvest">
>   <processor class="solr.LogUpdateProcessorFactory" />
>   <processor class="org.apache.solr.update.TopicQuestsDocumentProcessFactory">
>     <str name="...">hello</str>
>   </processor>
>   <processor class="solr.RunUpdateProcessorFactory" />
> </updateRequestProcessorChain>
>
> and the new one (which is "harvest" without the
> TopicQuestsDocumentProcessFactory):
>
> <updateRequestProcessorChain name="partial">
>   <processor class="solr.LogUpdateProcessorFactory" />
>   <processor class="solr.RunUpdateProcessorFactory" />
> </updateRequestProcessorChain>
>
> Before I added "partial"
>  class="solr.XmlUpdateRequestHandler">
> ...
>
> "harvest" always ran using http://localhost:8983/solr as the base URL.
>
> A goal was to use "harvest" only for "updates" and use "partial" for
> partial updates.
>
> I am now feeding partial with this code:
>
> UpdateRequest ur = new UpdateRequest();
> ur.add(document);
> ur.setCommitWithin(1000);
> UpdateResponse response = 
> ur.process(updateServer);
> where updateServer is a second SolrJ server set to
> http://localhost:8983/solr/update
>
> But, what is now happening, after I made this addition:
>
>  class="solr.XmlUpdateRequestHandler">
>
>  partial
>
>   
>
> dropping "partial" into /update where nothing was there before,
>
> Now, just "partial" is running from the base URL and "harvest" is
> never called, which means that I never see partial updates to validate
> that part of the code.
>
> At issue is this:
>
> I have two "update" pathways:
> One for when I am adding new documents
> One for which I am performing partial updates
>
> May I ask how I can configure my system to use "harvest" for new
> documents and "partial" for when partial updates are sent in?
>
> Many thanks
> Jack
>
>
> On Mon, Mar 11, 2013 at 12:23 AM, Upayavira  wrote:
>> You need to refer to your chain in a RequestHandler config. Search for
>> /update, duplicate that, and change the chain it points to.
>>
>> Upayavira
>>
>> On Mon, Mar 11, 2013, at 05:22 AM, Jack Park wrote:
>>> With 4.1, not in cloud configuration, I have a custom response handler
>>> chain which injects an additional handler for studying the documents
>>> as they come in. But, when I do partial updates on those documents, I
>>> don't want them to be studied again, so I created another version of
>>> the same chain, but without my added feature. I named it "/partial".
>>>
>>> When I create an instance of SolrJ for the url /solr/partial,
>>> I get back this error message:
>>>
>>> Server at http://localhost:8983/solr/partial returned non ok
>>> status:404, message:Not Found
>>> {locator=2146fd50-fac9-47d5-85c0-47aaeafe177f,
>>> tuples={set=99edfffe-b65c-4b5e-9436-67085ce49c9c}}
>>>
>>> Here is the configuration for that:
>>>
>>> <updateRequestProcessorChain name="partial">
>>>   <processor class="solr.LogUpdateProcessorFactory" />
>>>   <processor class="solr.RunUpdateProcessorFactory" />
>>> </updateRequestProcessorChain>
>>>
>>> The normal handler chain is this:
>>>
>>> <updateRequestProcessorChain name="harvest">
>>>   <processor class="solr.LogUpdateProcessorFactory" />
>>>   <processor class="org.apache.solr.update.TopicQuestsDocumentProcessFactory">
>>>     <str name="...">hello</str>
>>>   </processor>
>>>   <processor class="solr.RunUpdateProcessorFactory" />
>>> </updateRequestProcessorChain>
>>>
>>> which runs on a SolrJ set for  http://localhost:8983/solr/
>>>
>>> What might I be missing?
>>>
>>> Many thanks
>>> Jack


solr.DirectUpdateHandler2 failed to instantiate

2013-03-12 Thread Jack Park
That message is great for Google but terrible in its results: zillions
of hits, mostly filled with very long log traces, and zero messages
(that I could find) about what to do about it.

I switched over to using that handler since it has an update log
specified, and that's the only place I've found how to use update log.
But, can't boot now.

All the jars are in place; I'm able to import that class in my code.

Is there any news on that issue?

Many thanks
Jack


Re: solr.DirectUpdateHandler2 failed to instantiate

2013-03-12 Thread Jack Park
Indeed! Perhaps the germane part is this, before the failure to
instantiate notice:

Caused by: java.lang.ClassCastException: class org.apache.solr.update.DirectUpdateHandler2
at java.lang.Class.asSubclass(Unknown Source)
at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:432)
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:507)

This suggests that I might be doing something wrong elsewhere in solrconfig.xml.

The possibly relevant parts (my contributions) are these:

<updateRequestProcessorChain name="partial">
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

<updateRequestProcessorChain name="harvest">
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="org.apache.solr.update.TopicQuestsDocumentProcessFactory">
    <str name="...">hello</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

<requestHandler name="/update" class="solr.DirectUpdateHandler2">
  <lst name="defaults">
    <str name="update.chain">harvest</str>
  </lst>
</requestHandler>


<requestHandler name="/partial" class="solr.DirectUpdateHandler2">
  <lst name="defaults">
    <str name="update.chain">partial</str>
  </lst>
</requestHandler>

Thanks
Jack

On Tue, Mar 12, 2013 at 12:16 PM, Mark Miller  wrote:
> There should be a stack trace - also, you shouldn't have to do anything 
> special to use this class. It's the default and only truly supported 
> implementation…
>
> - Mark
>
> On Mar 12, 2013, at 2:53 PM, Jack Park  wrote:
>
>> That message is great for Google but terrible in its results: zillions
>> of hits, mostly filled with very long log traces, and zero messages
>> (that I could find) about what to do about it.
>>
>> I switched over to using that handler since it has an update log
>> specified, and that's the only place I've found how to use update log.
>> But, can't boot now.
>>
>> All the jars are in place; I'm able to import that class in my code.
>>
>> Is there any news on that issue?
>>
>> Many thanks
>> Jack
>


Re: solr.DirectUpdateHandler2 failed to instantiate

2013-03-13 Thread Jack Park
I can safely say that it is not DirectUpdateHandler2 failing; by
commenting out my own handlers, the system boots without error.

This means that my handlers are problematic in some way. The moment I
put back just one of my handlers:

<updateRequestProcessorChain name="harvest">
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="org.apache.solr.update.TopicQuestsDocumentProcessFactory">
    <str name="...">hello</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

<requestHandler name="/update" class="solr.DirectUpdateHandler2">
  <lst name="defaults">
    <str name="update.chain">harvest</str>
  </lst>
</requestHandler>


The problem returns.  It simply appears that I cannot declare a named
requestHandler using that class.
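
(That is consistent with what the ClassCastException says:
DirectUpdateHandler2 implements an update handler, not a request
handler, so it cannot back a requestHandler element. A sketch of the
split, reusing the handler and chain names from the configs above:)

<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.data.dir:}</str>
  </updateLog>
</updateHandler>

<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">harvest</str>
  </lst>
</requestHandler>

<requestHandler name="/partial" class="solr.XmlUpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">partial</str>
  </lst>
</requestHandler>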

Jack

On Tue, Mar 12, 2013 at 12:22 PM, Jack Park  wrote:
> Indeed! Perhaps the germane part is this, before the failure to
> instantiate notice:
>
> Caused by: java.lang.ClassCastException: class org.apache.solr.update.DirectUpdateHandler2
> at java.lang.Class.asSubclass(Unknown Source)
> at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:432)
> at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:507)
>
> This suggests that I might be doing something wrong elsewhere in 
> solrconfig.xml.
>
> The possibly relevant parts (my contributions) are these:
>
> <updateRequestProcessorChain name="partial">
>   <processor class="solr.LogUpdateProcessorFactory" />
>   <processor class="solr.RunUpdateProcessorFactory" />
> </updateRequestProcessorChain>
>
> <updateRequestProcessorChain name="harvest">
>   <processor class="solr.LogUpdateProcessorFactory" />
>   <processor class="org.apache.solr.update.TopicQuestsDocumentProcessFactory">
>     <str name="...">hello</str>
>   </processor>
>   <processor class="solr.RunUpdateProcessorFactory" />
> </updateRequestProcessorChain>
>
> <requestHandler name="/update" class="solr.DirectUpdateHandler2">
>   <lst name="defaults">
>     <str name="update.chain">harvest</str>
>   </lst>
> </requestHandler>
> 
>
> <requestHandler name="/partial" class="solr.DirectUpdateHandler2">
>   <lst name="defaults">
>     <str name="update.chain">partial</str>
>   </lst>
> </requestHandler>
>
> Thanks
> Jack
>
> On Tue, Mar 12, 2013 at 12:16 PM, Mark Miller  wrote:
>> There should be a stack trace - also, you shouldn't have to do anything 
>> special to use this class. It's the default and only truly supported 
>> implementation…
>>
>> - Mark
>>
>> On Mar 12, 2013, at 2:53 PM, Jack Park  wrote:
>>
>>> That message is great for Google but terrible in its results: zillions
>>> of hits, mostly filled with very long log traces, and zero messages
>>> (that I could find) about what to do about it.
>>>
>>> I switched over to using that handler since it has an update log
>>> specified, and that's the only place I've found how to use update log.
>>> But, can't boot now.
>>>
>>> All the jars are in place; I'm able to import that class in my code.
>>>
>>> Is there any news on that issue?
>>>
>>> Many thanks
>>> Jack
>>


Re: SolrCloud with Zookeeper ensemble in production environment: SEVERE problems.

2013-03-15 Thread Jack Park
Is there a document that tells how to create multiple threads? Search
returns many hits which orbit this idea, but I haven't spotted one
which tells how.
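
(In the meantime, a minimal sketch of one common pattern -- a fixed
thread pool feeding document batches to a shared CloudSolrServer, which
is thread-safe. The pool size and batch shape are only illustrative:)

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class ParallelIndexer {
    public static void indexAll(final CloudSolrServer server,
            List<List<SolrInputDocument>> batches) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (final List<SolrInputDocument> batch : batches) {
            pool.submit(new Runnable() {
                public void run() {
                    try {
                        server.add(batch); // one round trip per batch, not per doc
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        server.commit(); // commit once at the end
    }
}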

Thanks
Jack

On Fri, Mar 15, 2013 at 1:01 PM, Mark Miller  wrote:
> You def have to use multiple threads with it for it to be fast, but 3 or 4 
> docs a second still sounds absurdly slow.
>
> - Mark
>
> On Mar 15, 2013, at 2:58 PM, Luis Cappa Banda  wrote:
>
>> And up! :-)
>>
>> I've been wondering if using CloudSolrServer has something to do here.
>> Does it perform badly when a CloudSolrServer singleton receives
>> multiple queries? Is it recommended to have a list of CloudSolrServer
>> instances and select one of them with a round-robin criterion?
>>
>>
>>
>> 2013/3/14 Luis Cappa Banda 
>>
>>> Hello!
>>>
>>> Thanks a lot, Erick! I've attached some stack traces during a normal
>>> 'engine' running.
>>>
>>> Cheers,
>>>
>>> - Luis Cappa
>>>
>>>
>>> 2013/3/13 Erick Erickson 
>>>
 Stack traces..

 First,
 jps -l

 that will give you the process IDs of your running Java processes. Then:

 jstack <pid>

 Usually I pipe the output from jstack into a text file...

 Best
 Erick


 On Wed, Mar 13, 2013 at 1:48 PM, Luis Cappa Banda  wrote:

> Uhm, how can I do that... 'cleanly'? I know that with JConsole it's
> possible to output these traces, but with a .war application built on
> top of Spring I don't know how to do that. In any case, here is my
> CloudSolrServer wrapper that is used by other classes. There is no sync
> method or piece of code:
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> public class BinaryLBHttpSolrServer extends LBHttpSolrServer {
>
>     private static final long serialVersionUID = 3905956120804659445L;
>
>     public BinaryLBHttpSolrServer(String[] endpoints) throws MalformedURLException {
>         super(endpoints);
>     }
>
>     @Override
>     protected HttpSolrServer makeServer(String server) throws MalformedURLException {
>         HttpSolrServer solrServer = super.makeServer(server);
>         solrServer.setRequestWriter(new BinaryRequestWriter());
>         return solrServer;
>     }
> }
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> public class CloudSolrHttpServerImpl implements CloudSolrHttpServer {
>
>     private CloudSolrServer cloudSolrServer;
>
>     private Logger log = Logger.getLogger(CloudSolrHttpServerImpl.class);
>
>     public CloudSolrHttpServerImpl(String zookeeperEndpoints, String[] endpoints,
>             int clientTimeout, int connectTimeout, String cloudCollection) {
>         try {
>             BinaryLBHttpSolrServer lbSolrServer = new BinaryLBHttpSolrServer(endpoints);
>             this.cloudSolrServer = new CloudSolrServer(zookeeperEndpoints, lbSolrServer);
>             this.cloudSolrServer.setZkConnectTimeout(connectTimeout);
>             this.cloudSolrServer.setZkClientTimeout(clientTimeout);
>             this.cloudSolrServer.setDefaultCollection(cloudCollection);
>         } catch (MalformedURLException e) {
>             log.error(e);
>         }
>     }
>
>     @Override
>     public QueryResponse search(SolrQuery query) throws SolrServerException {
>         return cloudSolrServer.query(query, METHOD.POST);
>     }
>
>     @Override
>     public boolean index(DocumentBean user) {
>         boolean indexed = false;
>         int retries = 0;
>         do {
>             indexed = addBean(user);
>             retries++;
>         } while (!indexed && retries < 4);
>         return indexed;
>     }
>
>     @Override
>     public boolean update(SolrInputDocument updateDoc) {
>         boolean update = false;
>         int retries = 0;
>         do {
>             update = addSolrInputDocument(updateDoc);
>             retries++;
>         } while (!update && retries < 4);
>         return update;
>     }
>
>     @Override
>     public void commit() {
>         try {
>             cloudSolrServer.commit();
>         } catch (SolrServerException e) {
>             log.error(e);
>         } catch (IOException e) {
>             log.error(e);
>         }
>     }
>
>     @Override
>     public boolean delete(String... ids) {
>         boolean deleted = false;
>         List<String> idList = Arrays.asList(ids);
>         try {
>             this.cloudSolrServer.deleteById(idList);
>             this.cloudSolrServer.commit(true, true);
>             deleted = true;
>         } catch (SolrServerException e) {
>             log.error(e);
>         } catch (IOException e) {
>             log.error(e);
>         }
>         return deleted;
>     }
>
>     @Override
>     public void optimize() {
>         try {
>             this.cloudSolrServer.optimize();
>         } catch (SolrServerException e) {
>             log.error(e);
>         } catch (IOException e) {
>             log.error(e);
>         }
>     }
>>

RE: what is too large for an indexed field

2009-09-21 Thread Park, Michael
I get no results back on a search.  But I can see the actual word or phrase in 
the stored doc.

-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Monday, September 21, 2009 4:18 PM
To: solr-user@lucene.apache.org
Subject: Re: what is too large for an indexed field

On Mon, Sep 21, 2009 at 3:27 PM, Park, Michael  wrote:
> I am trying to place the value of around 390,000 characters into a
> single field.  However, my search results have become inaccurate.

Do you mean that the document should score higher, or that the
document doesn't match a particular query?
If the former, keep in mind that length normalization penalizes long documents.

-Yonik
http://www.lucidimagination.com


RE: what is too large for an indexed field

2009-09-21 Thread Park, Michael
I'm using the solr.WhitespaceTokenizerFactory and the
solr.LowerCaseFilterFactory.  Is it safe to assume that a token would be
created for each word?  

I can't imagine anything that would be even close to 16383 chars. Is there
a way to dissect the tokens? 
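
(For what it's worth, that analyzer pair corresponds to a fieldType
roughly like the sketch below, and the admin analysis page
(admin/analysis.jsp) shows the exact tokens produced for any input:)

<fieldType name="text_ws" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>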

Thanks, Mike

-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com] 
Sent: Monday, September 21, 2009 3:42 PM
To: solr-user@lucene.apache.org
Subject: Re: what is too large for an indexed field

Park, Michael wrote:
> I am trying to place the value of around 390,000 characters into a
> single field.  However, my search results have become inaccurate.  Is
> this too large?  I tried bumping the maxFieldLength in the
> solrconfig.xml file to 500,000 and it hasn't fixed the problem.
>
>  
>
> Thanks,
>
> Mike
>
>
>   
How large is your largest token? There is a hard limit of (I think) 16383
chars.

-- 
- Mark

http://www.lucidimagination.com





solr home

2009-09-25 Thread Park, Michael
I already have a handful of solr instances running .  However, I'm
trying to install solr (1.4) on a new linux server with tomcat using a
context file (same way I usually do):

 
<Context docBase=".../solr.war" debug="0" crossContext="true">
   <Environment name="solr/home" type="java.lang.String" value="..."
                override="true"/>
</Context>
 

However it throws an exception due to the following:

SEVERE: Could not start SOLR. Check solr/home property

java.lang.RuntimeException: Can't find resource 'solrconfig.xml' in
classpath or 'solr/conf/', cwd=/opt/local/solr/fedora_solr

at org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:198)

at org.apache.solr.core.SolrResourceLoader.openConfig(SolrResourceLoader.java:166)

 

Any ideas why this is happening?

 

Thanks, Mike



RE: Solr + autocomplete

2007-10-15 Thread Park, Michael
Thanks!  That's a good suggestion too.  I'll look into that.

Actually, I was hoping someone had used a reliable JS library that
accepted JSON. 

-Original Message-
From: Ryan McKinley [mailto:[EMAIL PROTECTED] 
Sent: Monday, October 15, 2007 4:44 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr + autocomplete

> 
> I would imagine there is a library to set up an autocomplete search
with
> Solr.  Does anyone have any suggestions?  Scriptaculous has a
JavaScript
> autocomplete library.  However, the server must return an unordered
> list.
> 

Solr does not provide an autocomplete UI, but it can return JSON that a 
JS library can use to populate an autocomplete.

Depending on your index size / query speed, you may be fine with a
standard faceting prefix filter.  If the index is large, you may want to
index using the EdgeNGramFilterFactory.

Check the last comment in:
https://issues.apache.org/jira/browse/SOLR-357

ryan




RE: Solr + autocomplete

2007-10-18 Thread Park, Michael
Thx!  I remember coming across extjs a ways back.  It was very slick.
I'll give it a try.

-Original Message-
From: Bharani [mailto:[EMAIL PROTECTED] 
Sent: Thursday, October 18, 2007 5:59 AM
To: solr-user@lucene.apache.org
Subject: RE: Solr + autocomplete


You should take a look at http://www.extjs.com. The combo box has got
autocomplete functionality. In fact it even has paging built into it. I
just did a demo using Solr for autocomplete and I got a very good,
responsive GUI. I have got about 100,000 documents with 26 fields each
and get a response < 1s.

Hope that helps
-Bharani

Park, Michael wrote:
> 
> Thanks!  That's a good suggestion too.  I'll look into that.
> 
> Actually, I was hoping someone had used a reliable JS library that
> accepted JSON. 
> 
> -Original Message-
> From: Ryan McKinley [mailto:[EMAIL PROTECTED] 
> Sent: Monday, October 15, 2007 4:44 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr + autocomplete
> 
>> 
>> I would imagine there is a library to set up an autocomplete search
> with
>> Solr.  Does anyone have any suggestions?  Scriptaculous has a
> JavaScript
>> autocomplete library.  However, the server must return an unordered
>> list.
>> 
> 
> Solr does not provide an autocomplete UI, but it can return JSON that
a 
> JS library can use to populate an autocomplete.
> 
> Depending on your index size / query speed, you may be fine with a
> standard faceting prefix filter.  If the index is large, you may want to
> index using the EdgeNGramFilterFactory.
> 
> Check the last comment in:
> https://issues.apache.org/jira/browse/SOLR-357
> 
> ryan
> 
> 
> 
> 

-- 
View this message in context:
http://www.nabble.com/Solr-%2B-autocomplete-tf4630140.html#a13271445
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Solr + autocomplete

2007-11-12 Thread Park, Michael
Thanks Ryan,

This looks like the way to go.  However, when I set up my schema I get,
"Error loading class 'solr.EdgeNGramFilterFactory'".  For some reason
the class is not found.  I tried the stable 1.2 build and even tried the
nightly build.  I'm using "".

Any suggestions?

Thanks,
Mike

-Original Message-
From: Ryan McKinley [mailto:[EMAIL PROTECTED] 
Sent: Monday, October 15, 2007 4:44 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr + autocomplete

> 
> I would imagine there is a library to set up an autocomplete search
with
> Solr.  Does anyone have any suggestions?  Scriptaculous has a
JavaScript
> autocomplete library.  However, the server must return an unordered
> list.
> 

Solr does not provide an autocomplete UI, but it can return JSON that a 
JS library can use to populate an autocomplete.

Depending on your index size / query speed, you may be fine with a
standard faceting prefix filter.  If the index is large, you may want to
index using the EdgeNGramFilterFactory.

Check the last comment in:
https://issues.apache.org/jira/browse/SOLR-357

ryan




RE: Solr + autocomplete

2007-11-12 Thread Park, Michael
Will I need to use Solr 1.3 with the EdgeNGramFilterFactory in order to
get the autosuggest feature?
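
(For reference, once on a build that includes the filter, a typical
autocomplete field type looks roughly like this -- the type name and
gram sizes are only illustrative:)

<fieldType name="autocomplete" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>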

-Original Message-
From: Chris Hostetter [mailto:[EMAIL PROTECTED] 
Sent: Monday, November 12, 2007 1:05 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr + autocomplete


: "Error loading class 'solr.EdgeNGramFilterFactory'".  For some reason

EdgeNGramFilterFactory didn't exist when Solr 1.2 was released, but the 
EdgeNGramTokenizerFactory did.  (the javadocs that come with each
release 
list all of the various factories in that release)


-Hoss



tomcat context fragment

2007-06-05 Thread Park, Michael
Hello All,

 

I've been working with solr on Tomcat 5.5/Windows and had success
setting my solr home using the context fragment.  However, I cannot get
it to work on Tomcat 5.028/Unix.  I've read and re-read the Apache
Tomcat documentation and cannot find a solution.  Has anyone run into
this issue?  Is there some Tomcat setting that is preventing this from
working?

 

Thanks,

Mike



RE: tomcat context fragment

2007-06-06 Thread Park, Michael
I've found the problem. 

The Context attribute path needed to be set:

<Context path="..." docBase=".../solr.war" debug="0" crossContext="true">
   <Environment name="solr/home" type="java.lang.String" value="..."
                override="true"/>
</Context>


-Original Message-
From: Park, Michael [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, June 05, 2007 5:28 PM
To: solr-user@lucene.apache.org
Subject: tomcat context fragment

Hello All,

 

I've been working with solr on Tomcat 5.5/Windows and had success
setting my solr home using the context fragment.  However, I cannot get
it to work on Tomcat 5.028/Unix.  I've read and re-read the Apache
Tomcat documentation and cannot find a solution.  Has anyone run into
this issue?  Is there some Tomcat setting that is preventing this from
working?

 

Thanks,

Mike



RE: tomcat context fragment

2007-06-06 Thread Park, Michael
Hi Chris,

No.  I set up a separate file, same as the wiki.  

It's either a tomcat version issue or a difference between how tomcat on
my Win laptop is configured vs. the configuration on our tomcat Unix
machine. 

I intend to run multiple instances of solr in production and wanted to
use the context fragments.

I have 3 test instances of solr running now (with 3 context files) and
found that whatever you set the path attribute to becomes the name of
the deployed web app (it doesn't have to match the name of the context
file, but cleaner to keep the names the same).

Here is what I found on the Apache site about this:
"The context path of this web application, which is matched
against
the beginning of each request URI to select the appropriate web
application for processing. All of the context paths within a
particular Host must be unique. If you specify a context path of an
empty string (""), you are defining the default web application for
this Host, which will process all requests not assigned to other
Contexts."

~Mike

-Original Message-
From: Chris Hostetter [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, June 06, 2007 2:53 PM
To: solr-user@lucene.apache.org
Subject: RE: tomcat context fragment

: I've found the problem.
:
: The Context attribute path needed to be set:
:
: