Re: Multiple Collections in one Zookeeper

2013-03-09 Thread jimtronic
Ok, I'm a little confused.

I had originally bootstrapped zookeeper using a solr.xml file which
specified the following cores:

cats
dogs
birds

In my /solr/#/cloud?view=tree view I see that I have

/collections
 /cats
 /dogs
 /birds
/configs
 /cats
 /dogs
 /birds

When I launch a new server and connect it to zookeeper, it creates all three
collections. What I'd like to do is move cats to its own set of boxes.

When I run:

java -DzkHost=zookeeper:9893/cats -jar start.jar

or

java -DzkHost=zookeeper:9893,zookeeper:9893/cats -jar start.jar


I get this error:

SEVERE: Could not create Overseer node

For simplicity, I'd like to have only one zookeeper ensemble.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multiple-Collections-in-one-Zookeeper-tp4045936p4045981.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 4.1: problems with Spatial Search.

2013-03-09 Thread Luis Cappa Banda
Hello again!

Uhm, so if I have understood correctly, when I'm writing to and reading from the
index at the same time (in other words, indexing operations are running
while queries are executing), performance goes down. Is that right?

Regarding the fieldType that I'm using: I'm just testing, and I just want
to implement the classic scenario where each document has one coordinate
and I run geospatial queries of the form "given a circle with radius X". I
was testing with a multiValued field because I was considering storing one
or more coordinates per document, due to the requirements of the
project that I'm developing: I'm tracking user positions, up to a maximum of N
positions (coordinates) stored in that multiValued field. But I can
switch to just one coordinate per user if that change gives better
performance.

In other words, David, to summarize:

- I just want to implement the classic scenario "coordinate + circle +
radius X".

- Due to NRT user position changes, I will write to and read from the index
continuously.

- I was testing with that multiValued fieldType, but I can change it to
another fieldType and to a single-valued field if that improves
performance. In that case, what would you recommend?

- If performance doesn't improve even after changing fieldTypes, multiValued
or not, etc., what other alternatives would you recommend?
I've been a Solr lover since version 1.4, but I'm familiar with other spatial
search technologies/tools/DBs such as MongoDB. Any suggestions or
recommendations?



Best regards,

- Luis Cappa

2013/3/6 David Smiley (@MITRE.org) 

> Luis,
> I should have asked how much data you have when I offered the solution.
>
> If you have a multi-valued spatial field and you need to get the closest
> of potentially many indexed points (and your schema snippet below shows
> multiValued=true) then I'm afraid you're stuck with this until the
> underlying distance caching mechanism is improved.  This is the biggest
> limitation of this field type.  See this issue for background on the
> problem:
> https://issues.apache.org/jira/browse/LUCENE-4698
> I suggest "watching" that issue to be notified of changes.  I've got a
> couple approaches to a solution on the horizon but it unfortunately hasn't
> been a priority for my time.
>
> RE 3-4 seconds… the first time it needs to build the cache (slow) but then
> speed-wise it shouldn't be too bad — it depends on how many documents you
> actually matched in your query, not how many might be in the system.  So
> instead please tell me how many documents your search matched (aka
> "numFound").  If you're doing simultaneous committing then this approach is
> completely un-workable.
>
> ~ David
>
> From: "Rakudten [via Lucene]"
> Date: Wednesday, March 6, 2013 11:48 AM
> To: "Smiley, David W." <dsmi...@mitre.org>
> Subject: Re: Solr 4.1: problems with Spatial Search.
>
> I've been doing some performance tests and I've noticed that with the new
> query syntax that David told me to use, the QTime increases a lot. I have an
> index with up to 8 million docs, and sometimes a query takes three, four or
> more seconds to finish. Is that normal?
>
> 2013/3/6 Luis Cappa Banda <[hidden email]>
>
> > Hey David, it works! Thank you very much. The truth is that the
> > documentation is a little bit confusing, but now it works perfectly.
> >
> > Regards,
> >
> > - Luis Cappa
> >
> > 2013/3/6 David Smiley (@MITRE.org) <[hidden email]>
> >
> >> Ah; bingo!
> >>
> >> The top error in the log is what Solr reports in the HTTP response you
> >> reported but it's the message of the exception wrapped by it in the logs
> >> which is more indicative of the problem:
> >>
> >> Caused by: org.apache.solr.common.SolrException: A ValueSource isn't
> >> directly available from this field. Instead try a query using the distance
> >> as the score.
> >>
> >> That error message (which I wrote) even contains the solution :-)
> >>
> >> You're using geodist() against solr.SpatialRecursivePrefixTreeFieldType
> >> which isn't supported.  You can get the distance but not using that
> >> approach.  Instead the query itself returns the distance as the score.  In
> >> the example schema you'll see a link to documentation about this field type
> >> which is this URL:
> >> http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4
> >> From there click on "Sorting and Relevancy":
> >>
> >> http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4#Sorting_and_Relevancy
> >> And you'll see an example query along the lines of what you want:
> >> &fl=*,score&sort=score asc&q={! score=distance}geo:"Intersects(Circle(54.729696,-98.525391 d=10))"
> >> (the score is the distance in this case)
> >>
> >> ~ David
> >>
> >>
> >> Rakudten wrote
> >> > Hello everyone!
> >> >
> >> >- I'm using Solr 4.1.0.
> >> >
> >> >
> >> >- Yes, without the sort the query works perfectly.
> >> >

Re: Multiple Collections in one Zookeeper

2013-03-09 Thread Mark Miller
You want to create both under different root nodes in zk, so that you would have

/cluster1
and
/cluster2

Then you start up with addresses of:

> zookeeper:{port1},zookeeper:{port2}/cluster1

> zookeeper:{port1},zookeeper:{port2}/cluster2

If you are using one of the bootstrap calls on startup, it should create those 
for you with Solr 4.1, otherwise you have to create the root nodes ahead of 
time (you can use the zkcli tool we provide).

- mark
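
The addressing scheme described above can be sketched as follows. This is only an illustration, assuming the usual ZooKeeper connect-string convention in which the chroot path is written once, after the last host:port pair, and applies to the whole ensemble. The host names, ports, and the helper name `zk_host` are hypothetical and not part of Solr.

```python
def zk_host(servers, chroot=""):
    """Build a -DzkHost value: comma-separated host:port pairs, then one chroot."""
    if chroot and not chroot.startswith("/"):
        raise ValueError("chroot must start with '/'")
    return ",".join(f"{host}:{port}" for host, port in servers) + chroot


# Hypothetical three-node ensemble shared by two independent Solr clusters.
ensemble = [("zk1", 2181), ("zk2", 2181), ("zk3", 2181)]

cluster1 = zk_host(ensemble, "/cluster1")
cluster2 = zk_host(ensemble, "/cluster2")

# Each cluster's nodes start with the same servers but a different root node.
print("java -DzkHost=" + cluster1 + " -jar start.jar")
print("java -DzkHost=" + cluster2 + " -jar start.jar")
```

Both clusters share the same ensemble and differ only in the trailing root node. As noted above, that root node has to exist before Solr connects unless a bootstrap call creates it.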





Re: Solr 4.1: problems with Spatial Search.

2013-03-09 Thread David Smiley (@MITRE.org)
Rakudten wrote
> Hello again!
> 
> Uhm, so if I have understood correctly, when I'm writing to and reading from
> the index at the same time (in other words, indexing operations are running
> while queries are executing), performance goes down. Is that right?

Committing is the problem (soft or hard) due to the subsequent warmup of the
point cache, not adding/indexing data.


Rakudten wrote
> Regarding the fieldType that I'm using: I'm just testing, and I just want
> to implement the classic scenario where each document has one coordinate
> and I run geospatial queries of the form "given a circle with radius X". I
> was testing with a multiValued field because I was considering storing one
> or more coordinates per document, due to the requirements of the
> project that I'm developing: I'm tracking user positions, up to a maximum of
> N positions (coordinates) stored in that multiValued field. But I can
> switch to just one coordinate per user if that change gives better
> performance.
> 
> In other words, David, to summarize:
> 
> - I just want to implement the classic scenario "coordinate + circle +
> radius X".
> 
> - Due to NRT user position changes, I will write to and read from the index
> continuously.
> 
> - I was testing with that multiValued fieldType, but I can change it to
> another fieldType and to a single-valued field if that improves
> performance. In that case, what would you recommend?

Is it good enough to sort by the most recent point only?  You could still
filter by all points if you want.  Spatial sorting & filtering are currently
independent.  So I suggest using not only the fieldType you have there but
also LatLonType for the latest position that you want to sort on.  Configure
it (via coordinate_ field type) to use floats instead of doubles.


Rakudten wrote
> - If performance doesn't improve even after changing fieldTypes, multiValued
> or not, etc., what other alternatives would you recommend?
> I've been a Solr lover since version 1.4, but I'm familiar with other spatial
> search technologies/tools/DBs such as MongoDB. Any suggestions or
> recommendations?

Solr should be fine.

Cheers,
 David
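
The split between filtering and sorting suggested above might look like the following sketch. It assumes two fields: `geo`, the multi-valued SpatialRecursivePrefixTreeFieldType field from the earlier query, and `latest_pos`, a hypothetical single-valued LatLonType field holding only the most recent position. The helper name `nearby_params` is also made up for illustration. The value of `d` is passed through unchanged, in whatever unit the field type expects.

```python
from urllib.parse import urlencode


def nearby_params(lat, lon, d, filter_field="geo", sort_field="latest_pos"):
    """Filter on one spatial field and sort by distance on another."""
    circle = f"Circle({lat},{lon} d={d})"
    return {
        "q": "*:*",
        # Filter: documents with any indexed point inside the circle.
        "fq": f'{filter_field}:"Intersects({circle})"',
        # Sort: distance from the query point to the single latest position.
        "sort": f"geodist({sort_field},{lat},{lon}) asc",
        "fl": "*",
    }


params = nearby_params(54.729696, -98.525391, 10)
print(urlencode(params))  # URL-encoded query string to append to /select?
```

Because geodist() here reads the single-valued LatLonType field, the sort does not touch the multi-valued field, which is then used only for filtering.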



-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-1-problems-with-Spatial-Search-tp4044868p4046004.html
Sent from the Solr - User mailing list archive at Nabble.com.


Feeding Custom QueryParser with Nested Query

2013-03-09 Thread jimtronic
I've written a custom query parser that we'll call {!doFoo } which takes two
parameters: a field name and a space delimited list of values. The parser
does some calculations between the list of values and the field in question.

In some cases, the list is quite long and as it turns out, the core already
has the information. I think most of my latency in this operation is just
passing big lists around. 

Ideally, I'd like to accomplish something like this:

{!doFoo f=my_field v='query(...)'}

Or, even better, if I could just pass a parameter in and get the results.

{!doFoo with='bar'}

Thanks for any advice!
Jim



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Feeding-Custom-QueryParser-with-Nested-Query-tp4046007.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Search a folder with File name and retrieve all the files matched

2013-03-09 Thread Jan Høydahl
Sure Erik,

Or since we already default to full path name as "id", perhaps we could change 
literal.resourcename to be the filename only. Guess that one is mostly for Tika 
to have more hints to guess the type of file, so it doesn't need to be 
absolute, especially when you have it in the ID already. See any downsides? 
Please just go ahead with whatever you think best :)

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 9 March 2013 at 04:35, Erik Hatcher wrote:

> Thanks, Jan, for making the post tool do this type of thing.  Great stuff.
> 
> The filename would be a good one to add for out of the box goodness.  We can 
> easily add just the filename to the index with something like the patch 
> below.  And on that note, what else would folks want in an easy to use 
> document search system like this?
> 
>   Erik
> 
> Index: core/src/java/org/apache/solr/util/SimplePostTool.java
> ===
> --- core/src/java/org/apache/solr/util/SimplePostTool.java (revision 1450270)
> +++ core/src/java/org/apache/solr/util/SimplePostTool.java (working copy)
> @@ -749,6 +749,7 @@
>            urlStr = appendParam(urlStr, "resource.name=" + URLEncoder.encode(file.getAbsolutePath(), "UTF-8"));
>          if(urlStr.indexOf("literal.id")==-1)
>            urlStr = appendParam(urlStr, "literal.id=" + URLEncoder.encode(file.getAbsolutePath(), "UTF-8"));
> +        urlStr = appendParam(urlStr, "literal.filename_s=" + URLEncoder.encode(file.getName(), "UTF-8"));
>          url = new URL(urlStr);
>        }
>      } else {
> 
> 
> 
> On Mar 8, 2013, at 19:16 , Jan Høydahl wrote:
> 
>> Since this is a POC you could simply run this command with the default 
>> example schema:
>> 
>> cd solr/example/exampledocs
>> java -Dauto -Drecursive=0 -jar post.jar path/to/folder
>> 
>> You will get the full file name with path in field "resourcename"
>> If you need to search just the filename, you can achieve that through adding 
>> a new field "filename" with a copyField resourcename->filename and a custom 
>> fieldType for filename with a PatternReplaceFilterFactory to remove the path.
>> 
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>> Solr Training - www.solrtraining.com
>> 
>> On 7 March 2013 at 22:11, Alexandre Rafalovitch wrote:
>> 
>>> You could use DataImportHandler with FileListEntityProcessor to get the
>>> file names in:
>>> http://wiki.apache.org/solr/DataImportHandler#FileListEntityProcessor
>>> 
>>> Then, if it is recursive enumeration and not just one level, you probably
>>> want a tokenizer that splits on path separator characters (e.g. /). Or
>>> maybe you want to index filename as a separate field from full path (can do
>>> it in FileListEntityProcessor itself).
>>> 
>>> And if you combined the list of files with inner entity using Tika, you can
>>> load the file content for searching as well:
>>> http://wiki.apache.org/solr/DataImportHandler#Tika_Integration
>>> 
>>> Regards,
>>> Alex.
>>> 
>>> Personal blog: http://blog.outerthoughts.com/
>>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>>> - Time is the quality of nature that keeps events from happening all at
>>> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
>>> 
>>> 
>>> On Thu, Mar 7, 2013 at 3:39 PM, pavangolla  wrote:
>>> 
>>>> Hi,
>>>> I am new to Apache Solr.
>>>>
>>>> I am doing a POC where there is a folder (on the system or in some repository)
>>>> which has different files with different extensions: pdf, doc, xls, ...
>>>>
>>>> I want to search with a file name and retrieve all the files whose name
>>>> matches.
>>>>
>>>> How do I proceed on this?
>>>>
>>>> Please help me with this.
>>>>
>>>>
>>>>
>>>> --
>>>> View this message in context:
>>>> http://lucene.472066.n3.nabble.com/Search-a-folder-with-File-name-and-retrieve-all-the-files-matched-tp4045629.html
>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>>
>> 
> 
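
The two ways of getting a bare file name that come up in this thread can be compared in a small sketch. The first mirrors the patch, which takes the last path segment at indexing time. The second mirrors the copyField approach, where a pattern replacement strips the path during analysis. The sample path and the helper names are hypothetical, and the regular expression is only one possible pattern, not the one anyone in the thread actually used.

```python
import posixpath
import re
from urllib.parse import quote_plus


def file_name(path):
    """Index-time approach: keep only the last path segment."""
    return posixpath.basename(path)


def strip_path(path):
    """Analysis-time approach: delete everything up to the last '/'."""
    return re.sub(r"^.*/", "", path)


path = "/data/docs/reports/q1 summary.pdf"  # hypothetical file

# Both approaches should yield the same bare file name.
name = file_name(path)
stripped = strip_path(path)

# The extra request parameter, URL-encoded as in the patch.
param = "literal.filename_s=" + quote_plus(name)
print(name, stripped, param)
```

Sending the name as its own literal keeps the schema simple, while the pattern-based route needs no change to the post tool.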



Re: SolrCloud - Sorting Problem

2013-03-09 Thread varun srivastava
Hi Deniz,
 Your mail about distributed queries is really helpful. Could you or someone
else improve the following wiki page? Right now we don't have any document
explaining distributed search in Solr, which is now the backbone of SolrCloud.

http://wiki.apache.org/solr/WritingDistributedSearchComponents

Thanks
Varun

On Sun, Dec 2, 2012 at 10:49 PM, deniz  wrote:

> I think I have figured this out... at least somewhat.
>
> After adding log statements here and there in the code, especially in the
> SolrCore, HttpShardHandler, and SearchHandler classes, it seems that sorting
> is done after all of the shards finish "responding", and the result set is
> sorted just before we see the results. I am not sure whether this is entirely
> correct; it is what I see from the logs, in the request headers.
>
> So for a shard or distributed search the header looks like this:
>
> status=0,QTime=4,params={df=text,fl=*,position,shard.url=blablabla
>
> and just before I see the results in my browser the header becomes this:
>
> status=0,QTime=178,params={fl=*,position,sort=myfield desc
>
> Basically, because the position field was filled in before the actual sorting
> on the page, the positions are incorrect.
>
> Is this right? I mean, is sorting really done after everything finishes and
> we are about to get the results?
>
>
>
> -
> Zeki ama calismiyor... Calissa yapar...
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/SolrCloud-Sorting-Problem-tp4023382p4023889.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
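
The behaviour described in the quoted message can be illustrated with a simplified sketch. It is not Solr's actual merge code. It only assumes, as the message suggests, that each shard returns its own sorted list and that the lists are merged afterwards. The documents, the `position` field, and the helper name are hypothetical.

```python
import heapq
from operator import itemgetter

# Hypothetical per-shard results, each already sorted by "myfield" descending.
shard1 = [{"id": "a", "myfield": 90}, {"id": "c", "myfield": 40}]
shard2 = [{"id": "b", "myfield": 70}, {"id": "d", "myfield": 10}]


def with_positions(docs):
    """Return copies of the docs numbered 1..n in their current order."""
    return [dict(doc, position=i) for i, doc in enumerate(docs, start=1)]


# Positions filled in on each shard, before the merge: they restart per shard.
early = list(heapq.merge(with_positions(shard1), with_positions(shard2),
                         key=itemgetter("myfield"), reverse=True))

# Positions filled in after the merged, globally sorted list exists.
late = with_positions(list(heapq.merge(shard1, shard2,
                                       key=itemgetter("myfield"), reverse=True)))

print([d["position"] for d in early])
print([d["position"] for d in late])
```

In the first list the positions repeat because each shard numbered its own documents. In the second they follow the final order, which suggests that a position-like value has to be computed after the merge step.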


Re: SolrCloud - Sorting Problem

2013-03-09 Thread varun srivastava
Also, if anyone who understands distributed search could update the following
wiki page, it would be really helpful for all of us.

http://wiki.apache.org/solr/DistributedSearchDesign

Thanks
Varun



Re: InvalidShapeException when using SpatialRecursivePrefixTreeFieldType with custom worldBounds

2013-03-09 Thread Lance Norskog
Thank you (and Hoss)! I have found this concept elusive, and you two 
have nailed it. I will be able to understand it for the 5 minutes I will 
need to code with it.


Lance

On 03/09/2013 10:57 AM, David Smiley (@MITRE.org) wrote:

Just finished:
http://wiki.apache.org/solr/SpatialForTimeDurations



-
  Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/InvalidShapeException-when-using-SpatialRecursivePrefixTreeFieldType-with-custom-worldBounds-tp4045351p4045997.html
Sent from the Solr - User mailing list archive at Nabble.com.