Re: DIH with several Cores
Unfortunately, what you are asking for is not possible: the DIH needs to be configured separately for each core. I have a similar situation with my Solr application. I am solving it by creating a custom index feeder that is aware of all of the cores and which documents to send to which cores.

-- View this message in context: http://lucene.472066.n3.nabble.com/DIH-wiht-several-Cores-tp1767883p1769794.html
Sent from the Solr - User mailing list archive at Nabble.com.
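A custom feeder like the one described can be as simple as a routing function in front of each core's /update handler. A minimal sketch; the base URL, core names, and the type-based routing rule here are all hypothetical, not taken from the post above:

```python
# Sketch of a multi-core "index feeder": route each document to the core
# that should index it, then build that core's /update URL.  The base
# URL, core names, and routing rule are hypothetical.

SOLR_BASE = "http://localhost:8983/solr"
CORES = {"products": "core-products", "articles": "core-articles"}

def route(doc):
    """Pick the target core from the document's type field."""
    return CORES[doc["type"]]

def update_url(doc):
    """Build the /update URL for the core this document belongs to."""
    return f"{SOLR_BASE}/{route(doc)}/update"

# A real feeder would POST an <add><doc>...</doc></add> payload to
# update_url(doc), e.g. with urllib.request; that part is omitted so
# the sketch stays offline.
```

Each core keeps its own schema and config; the feeder simply takes over the "which documents go where" decision that DIH cannot make across cores.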
How does DIH multithreading work?
I understand that the thread count is specified on root entities only. Does it spawn multiple threads per root entity? Or multiple threads per descendant entity? Can someone give an example of how you would make a database query in an entity with 4 threads that would select 1 row per thread?

Thanks,
Mark
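For context, the attribute in question sits on the root entity in DIH's data-config.xml. A hedged sketch of where it goes; the data source, table, and column names are hypothetical, and as far as I know the threads attribute was only honored on root entities in Solr 3.x and was removed in later releases:

```xml
<!-- Hypothetical data-config.xml showing where "threads" is declared.
     Driver, tables, and columns are made up for illustration. -->
<dataConfig>
  <dataSource driver="org.postgresql.Driver"
              url="jdbc:postgresql://localhost/db" user="solr"/>
  <document>
    <entity name="item" threads="4"
            query="SELECT id, name FROM item">
      <entity name="feature"
              query="SELECT description FROM feature WHERE item_id='${item.id}'"/>
    </entity>
  </document>
</dataConfig>
```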
Re: How does DIH multithreading work?
Anyone know how it works?
Core/shard preference
I have a small core performing deltas quickly (core00), and a large core performing deltas slowly (core01), both on the same set of documents. The delta core is cleaned nightly. As you can imagine, at times there are two versions of a document, one in each core. When I execute a query that matches such a document, sometimes it comes from the delta core and sometimes from the large core. It almost seems random. Here is my query:

http://porsche:8181/worldip5/core00/select?shards=porsche:8181/worldip5/core00/,porsche:8181/worldip5/core01/&start=0&rows=20&q=hazard+gas+countrycode:JP

When the delta documents from core00 are returned as desired, the access logs show:

10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core00/select HTTP/1.1 200 293 1
10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core01/select HTTP/1.1 200 506 1
10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core00/select HTTP/1.1 200 1151 1
10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core01/select HTTP/1.1 200 2597 1
10.36.34.151 - - [19/Oct/2009:15:22:37 -0700] GET /worldip5/core00/select?shards=porsche:8181/worldip5/core00/,porsche:8181/worldip5/core01/&start=0&rows=20&q=hazard+gas+countrycode:JP HTTP/1.1 200 11881 9

When the documents are returned from core01, the access logs show:

10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core00/select HTTP/1.1 200 289 1
10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core01/select HTTP/1.1 200 506 1
10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core01/select HTTP/1.1 200 3390 1
10.36.34.151 - - [19/Oct/2009:15:22:37 -0700] GET /worldip5/core00/select?shards=porsche:8181/worldip5/core00/,porsche:8181/worldip5/core01/&start=0&rows=20&q=hazard+gas+countrycode:JP HTTP/1.1 200 11873 9

Any ideas on why there is a difference in the requests made? Is there a way I can tell Solr to prefer the documents in core00?
Mark
Re: Core/shard preference
Thank you guys for your responses. That is what I suspected, that it was going with the first instance of the document that it sees.

I tried setting up Solr in Eclipse and ran into a couple of issues blocking it from compiling. I also did some reading, but none of the write-ups were very comprehensive. Are there any good write-ups that you know of with instructions on setting up Solr in Eclipse?

Thanks again,
Mark

Yonik Seeley-2 wrote:
>
> Although shards should be disjoint, Solr "tolerates" duplication
> (won't return duplicates in the main results list, but doesn't make
> any effort to correct facet counts, etc).
>
> Currently, whichever shard responds first wins.
> The relevant code is around line 420 in QueryComponent.java:
>
>   String prevShard = uniqueDoc.put(id, srsp.getShard());
>   if (prevShard != null) {
>     // duplicate detected
>     numFound--;
>
>     // For now, just always use the first encountered since we can't currently
>     // remove the previous one added to the priority queue.  If we switched
>     // to the Java5 PriorityQueue, this would be easier.
>     continue;
>     // make which duplicate is used deterministic based on shard
>     // if (prevShard.compareTo(srsp.shard) >= 0) {
>     //   TODO: remove previous from priority queue
>     //   continue;
>     // }
>   }
>
> So it's certainly possible to make it deterministic, we just haven't
> done it yet.
>
> -Yonik
> http://www.lucidimagination.com
>
>
> On Mon, Oct 19, 2009 at 7:30 PM, Lance Norskog wrote:
>> Distributed Search is designed only for disjoint cores.
>>
>> The document list from each core is returned sorted by the relevance
>> score. The distributed searcher merges these sorted lists. Solr does
>> not implement "distributed IDF", which essentially means distributed
>> coordinated scoring. All scoring happens inside each core, relative to
>> that core's contents. The resulting score numbers are not coordinated
>> with each other, and you will get random results.
>>
>> There is no way to say "use this core's results" because the searches
>> are not compared all at once. Only the page of results fetched is
>> compared, so there's no way to suppress a result in the second page if
>> it was already found in the first.
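The commented-out block in Yonik's snippet hints at what a shard-preference tie-break would look like. A toy sketch of that idea outside Solr; the shard names, preference list, and response shapes are made up for illustration and are not Solr internals:

```python
# Toy merge of per-shard hit lists: when the same document id arrives
# from two shards, keep the hit from the preferred shard instead of
# whichever shard responded first.  Shard names are hypothetical.

PREFERENCE = ["core00", "core01"]  # earlier in the list wins

def merge(shard_responses):
    """shard_responses: iterable of (shard_name, [doc dicts with 'id']).
    Return, for each doc id, the name of the shard whose copy wins."""
    best = {}  # doc id -> (winning shard, doc)
    for shard, docs in shard_responses:
        for doc in docs:
            doc_id = doc["id"]
            prev = best.get(doc_id)
            if prev is None or PREFERENCE.index(shard) < PREFERENCE.index(prev[0]):
                best[doc_id] = (shard, doc)
    return {doc_id: shard for doc_id, (shard, _) in best.items()}
```

With this rule, core00's copy of a duplicated document always wins, even if core01 responded first, which is the deterministic behavior Mark is asking for.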
RE: Solr under tomcat - UTF-8 issue
I was originally using POST for the same reason, however I discovered that Tomcat can easily be configured to accept any length of URI. All it requires is specifying the maxHttpHeaderSize attribute on your default Connector in server.xml. I set my value to 1MB, which is certainly excessive, but it ensures I will never hit the limit. As the other chap mentioned, I now have the benefits of caching and, most importantly, proper web logs!

I also have a similar situation where I constrain the search results based on the user's role. I have only two roles to support, so my case is very simple, but I could imagine having a multivalued "role" field that you could perform facet queries on.

Mark

Glock, Thomas wrote:
>
> Thanks -
>
> I agree. However my application requires results be trimmed to users
> based on roles. The roles are repeating values on the documents. Users
> have many different role combinations, as do documents.
> I recognize this is going to hamper caching - but using a GET will tend to
> limit the size of search phrases when combined with the boolean role
> clause. And I am concerned with hitting URL limits.
>
> At any rate I solved it thanks to Yonik's recommendation.
>
> My Flex client HTTPService by default only sets the content-type request
> header to "application/x-www-form-urlencoded"; what it needed to do for
> Tomcat is set the content-type request header to
> "application/x-www-form-urlencoded; charset=UTF-8".
>
> If you have any suggestions regarding limiting results based on user and
> document role permutations - I'm all ears. I've been to the Search Summit
> in NYC and no vendor could even seem to grasp the concept.
>
> The problem case statement is this - I have users globally who need to
> search for content tailored to them. Users searching for 'Holiday' don't
> get any value from 1 documents having the word holiday. What they need
> are documents authored for that population.
> The documents have the associated role information as metadata, and
> therefore users will get only the documents they have access to and that
> are relevant to them. That's the plan anyway!
>
> By chance I stumbled on Solr a month or so ago and I think it's awesome. I
> got the book two days ago too - fantastic!
>
> Thanks again,
> Tom
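The server.xml change Mark describes would look roughly like this. This is a sketch, not his actual config: the port and protocol are stock Tomcat defaults, 1048576 bytes is the 1MB value he mentions, and URIEncoding is added because it is the usual Tomcat-side fix for the UTF-8 GET issue this thread started with:

```xml
<!-- Hypothetical Connector in Tomcat's server.xml: raise
     maxHttpHeaderSize so long GET query strings are accepted, and
     decode URIs as UTF-8. -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           maxHttpHeaderSize="1048576"
           URIEncoding="UTF-8" />
```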
Delete, commit, optimize doesn't reduce index file size
I have an index that used to have ~38M docs at 17.2GB. I deleted all but 13K docs using a delete by query, commit and then optimize. A "*:*" query now returns 13K docs. The problem is that the files on disk are still 17.1GB in size. I expected the optimize to shrink the files. Is there a way I can shrink them now that the index only has 13K docs?

Mark
Re: Delete, commit, optimize doesn't reduce index file size
Yonik Seeley-2 wrote:
>
> On Tue, Dec 29, 2009 at 1:23 PM, markwaddle wrote:
>> I have an index that used to have ~38M docs at 17.2GB. I deleted all but
>> 13K docs using a delete by query, commit and then optimize. A "*:*" query
>> now returns 13K docs. The problem is that the files on disk are still
>> 17.1GB in size. I expected the optimize to shrink the files. Is there a
>> way I can shrink them now that the index only has 13K docs?
>
> Are you on Windows?
> The IndexWriter can't delete files in use by the current IndexReader
> (like it can in UNIX) when the commit is done.
> If you make further changes to the index and do a commit, you should
> see the space go down.
>
> -Yonik
> http://www.lucidimagination.com

I am on Windows. Would a DataImportHandler delta-import with 1 or more changes be a sufficient change to allow the files to be deleted?

Mark
Re: Delete, commit, optimize doesn't reduce index file size
Yonik Seeley-2 wrote:
>
> If you make further changes to the index and do a commit, you should
> see the space go down.

It worked. I added a bogus document using /update and then performed a commit, and now the files are down to 6MB.

http://.../core00/update?stream.body=%3Cadd%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E0%3C/field%3E%3C/doc%3E%3C/add%3E
http://.../core00/update?stream.body=%3Ccommit/%3E

Thanks!
Mark
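For anyone squinting at those URLs: the stream.body values are just URL-encoded XML. Decoding them with the Python standard library shows the actual payloads sent to /update:

```python
# Decode the percent-encoded stream.body values from the URLs above.
from urllib.parse import unquote

add_body = unquote(
    "%3Cadd%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E0%3C/field%3E%3C/doc%3E%3C/add%3E"
)
commit_body = unquote("%3Ccommit/%3E")

print(add_body)     # <add><doc><field name="id">0</field></doc></add>
print(commit_body)  # <commit/>
```

So the fix was simply to add a throwaway document with id 0 and then issue a commit, which lets the IndexWriter finally delete the old files.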
Unexpected boolean query behavior
Here is my query:

(virt* AND "machine fingerprinting") OR (virt* AND encryption) OR (virt* AND anonymous) OR (virt* AND analytic*) AND owned:true

It can be broken down to:

(A) OR (B) OR (C) OR (D) AND E

A, B, C and D are themselves AND boolean clauses. The E clause at the end is not behaving the way I would expect. No matter how I order the A, B, C and D clauses, it always returns the equivalent of ((D) AND E).

When I add additional parentheses it behaves the way I expect, like:

((A) OR (B) OR (C) OR (D)) AND E
or
(A) OR (B) OR (C) OR ((D) AND E)

Can anyone explain why it behaves the way it does without the parentheses? Is there something I am missing in the way it processes boolean clauses?

Thanks,
Mark
Re: Unexpected boolean query behavior
That is a reasonable question. The problem here is that my users have already created numerous queries just like this one, using ANDs and ORs. My users are very technical, and they have been using the results of these queries for months now to perform analysis that drives business decisions. I need an explanation for why this is happening so I can not only train them on how to use it more effectively, but also restore their trust in the search application.

Does anyone understand this behavior? Or can you recommend a place for me to look?

Otis Gospodnetic wrote:
>
> Mark,
>
> Does it help if you rewrite your query using +/- syntax ("required",
> "prohibited"), or nothing for "should"? Because that's what happens under
> the hood (terms are required, prohibited, or should occur).
>
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
>
>
>
> ----- Original Message -----
>> From: markwaddle
>> To: solr-user@lucene.apache.org
>> Sent: Thu, January 14, 2010 2:39:21 PM
>> Subject: Unexpected boolean query behavior
>>
>> Here is my query:
>> (virt* AND "machine fingerprinting") OR (virt* AND encryption) OR (virt*
>> AND anonymous) OR (virt* AND analytic*) AND owned:true
>>
>> It can be broken down to:
>> (A) OR (B) OR (C) OR (D) AND E
>>
>> A, B, C and D are themselves AND boolean clauses.
>>
>> The E clause at the end is not behaving the way I would expect. No matter
>> how I order the A, B, C and D clauses, it always returns the equivalent of
>> ((D) AND E).
>>
>> When I add additional parentheses it behaves the way I expect. Like:
>> ((A) OR (B) OR (C) OR (D)) AND E
>> or
>> (A) OR (B) OR (C) OR ((D) AND E)
>>
>> Can anyone explain why it behaves the way it does without the parentheses?
>> Is there something I am missing in the way it processes boolean clauses?
>>
>> Thanks,
>> Mark
Re: Unexpected boolean query behavior
That explains my exact problem, thank you! May I ask how you found that wiki posting?

Otis Gospodnetic wrote:
>
> Hi Mark,
>
> Does this help?
> http://wiki.apache.org/lucene-java/BooleanQuerySyntax
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
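For readers hitting the same surprise: the BooleanQuerySyntax wiki page linked above describes how the classic Lucene query parser does not build a precedence tree; it walks the clauses left to right, and each AND/OR only adjusts the required/optional flag of the clauses next to it. A toy model of that rule (a deliberate simplification, not the real parser code):

```python
# Toy model of the classic Lucene query parser's flat AND/OR handling:
# clauses default to SHOULD (optional); AND marks both neighbouring
# clauses MUST (required); OR leaves the default alone.

def occurs(tokens):
    """tokens: clauses and operators in order, e.g.
    ["A", "OR", "B", "OR", "C", "OR", "D", "AND", "E"].
    Returns {clause: "MUST" or "SHOULD"}."""
    flags = {t: "SHOULD" for t in tokens if t not in ("AND", "OR")}
    for i, tok in enumerate(tokens):
        if tok == "AND":
            flags[tokens[i - 1]] = "MUST"
            flags[tokens[i + 1]] = "MUST"
    return flags

print(occurs(["A", "OR", "B", "OR", "C", "OR", "D", "AND", "E"]))
# {'A': 'SHOULD', 'B': 'SHOULD', 'C': 'SHOULD', 'D': 'MUST', 'E': 'MUST'}
```

Run on the query from this thread, A, B and C come out SHOULD while D and E come out MUST. Since MUST clauses alone decide which documents match, the result set is exactly the documents matching D AND E, with A, B and C only affecting ranking: the "((D) AND E)" behavior Mark observed. Explicit parentheses avoid the issue because each group is parsed as its own sub-query.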