Re: NRT and warmupTime of filterCache

2011-03-10 Thread stockii
>> it'll negatively impact the desired goal of low latency new index readers?
- yes, I think so; that's the reason why I don't understand the
wiki article ...

I set the warmupCount to 500 and I got no error messages saying that Solr isn't
available ...
but solr-stats.jsp shows me a warmup time of "warmupTime : 12174" - why?

Is the warmup time in solrconfig.xml the maximum time in ms for autowarming,
or what does it really mean?

-
--- System 

One Server, 12 GB RAM, 2 Solr Instances, 7 Cores, 
1 Core with 31 Million Documents other Cores < 100.000

- Solr1 for Search-Requests - commit every Minute  - 5GB Xmx
- Solr2 for Update-Request  - delta every Minute - 4GB Xmx
--
View this message in context: 
http://lucene.472066.n3.nabble.com/NRT-and-warmupTime-of-filterCache-tp2654886p2659560.html
Sent from the Solr - User mailing list archive at Nabble.com.
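The warmupCount discussed here corresponds to the autowarmCount attribute on the
cache definition in solrconfig.xml, and it counts items, while the warmupTime on
solr-stats.jsp is the measured time in milliseconds that autowarming took. A
filterCache entry of the kind being discussed looks roughly like this (a sketch;
the size values are illustrative):

<filterCache class="solr.FastLRUCache"
             size="16384"
             initialSize="4096"
             autowarmCount="500"/>

So 500 is how many entries get regenerated when a new searcher opens, and 12174
is how long that regeneration took.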


Re: NRT and warmupTime of filterCache

2011-03-10 Thread stockii
okay, so it's not the time ... it's the number of items ...

-
--- System 

One Server, 12 GB RAM, 2 Solr Instances, 7 Cores, 
1 Core with 31 Million Documents other Cores < 100.000

- Solr1 for Search-Requests - commit every Minute  - 5GB Xmx
- Solr2 for Update-Request  - delta every Minute - 4GB Xmx
--
View this message in context: 
http://lucene.472066.n3.nabble.com/NRT-and-warmupTime-of-filterCache-tp2654886p2659562.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrJ and digest authentication

2011-03-10 Thread Erlend Garåsen


I figured it out. Since this Solr server does not have an SSL interface,
I had to change the following line from 443 to 80:

AuthScope scope = new AuthScope(host, 80, "resin");

Erlend

On 09.03.11 17.09, Erlend Garåsen wrote:


I'm trying to do a search with SolrJ using digest authentication, but
I'm getting the following error:
org.apache.solr.common.SolrException: Unauthorized

I'm setting up SolrJ this way:

HttpClient client = new HttpClient();
List<String> authPrefs = new ArrayList<String>();
authPrefs.add(AuthPolicy.DIGEST);
client.getParams().setParameter(AuthPolicy.AUTH_SCHEME_PRIORITY,
authPrefs);
AuthScope scope = new AuthScope(host, 443, "resin");
client.getState().setCredentials(scope, new
UsernamePasswordCredentials(username, password));
client.getParams().setAuthenticationPreemptive(true);
SolrServer server = new CommonsHttpSolrServer(url, client);

Is this something which is not supported by SolrJ or have I written
something wrong in the code above?

Erlend




--
Erlend Garåsen
Center for Information Technology Services
University of Oslo
P.O. Box 1086 Blindern, N-0317 OSLO, Norway
Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050
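Putting Erlend's pieces together, a minimal end-to-end sketch of the working
setup (host, realm, credentials, and URL are placeholders; the AuthScope port
must match the port Solr is actually served on, which was the fix above):

import java.util.ArrayList;
import java.util.List;

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.UsernamePasswordCredentials;
import org.apache.commons.httpclient.auth.AuthPolicy;
import org.apache.commons.httpclient.auth.AuthScope;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class DigestAuthSearch {
    public static void main(String[] args) throws Exception {
        HttpClient client = new HttpClient();
        // prefer digest authentication over other schemes
        List<String> authPrefs = new ArrayList<String>();
        authPrefs.add(AuthPolicy.DIGEST);
        client.getParams().setParameter(AuthPolicy.AUTH_SCHEME_PRIORITY, authPrefs);
        // port 80 because this server has no SSL interface; "resin" is the realm
        AuthScope scope = new AuthScope("solr.example.com", 80, "resin");
        client.getState().setCredentials(scope,
                new UsernamePasswordCredentials("user", "secret"));
        client.getParams().setAuthenticationPreemptive(true);
        SolrServer server =
                new CommonsHttpSolrServer("http://solr.example.com/solr", client);
        // run a test query through the authenticated client
        QueryResponse rsp = server.query(new SolrQuery("*:*"));
        System.out.println("Found " + rsp.getResults().getNumFound() + " docs");
    }
}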


Possible to sort in .xml file?

2011-03-10 Thread Andy Newby
Hi,

I'm trying to set up Solr so that we can "sort" using:

document_views asc,score

...is this possible via the solrconfig.xml/schema.xml file?

I know it's possible to do by adding sort=, but the Perl module
(WebService::Solr) doesn't seem to offer the option to pass in this value :(

TIA
-- 
Andy Newby
a...@ultranerds.com


Re: Possible to sort in .xml file?

2011-03-10 Thread Markus Jelsma
Is there no generic parameter store in the Solr module you can use for passing 
the sort parameter? If not, you can define your sort parameter as a default in
the request handler you use in solrconfig. See the shipped config for 
examples.
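Concretely, that default might look like this inside the handler definition (a
sketch; note that each sort field needs an explicit direction, so "score desc"
rather than bare "score"):

<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="sort">document_views asc, score desc</str>
  </lst>
</requestHandler>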

On Thursday 10 March 2011 11:25:01 Andy Newby wrote:
> Hi,
> 
> I'm trying to setup Solr so that we can "sort" using:
> 
> document_views asc,score
> 
> ...is this possible via the solrconfig.xml/schema.xml file?
> 
> I know its possible to do via adding sort= , but the Perl module
> (WebService::Solr) doesn't seem to offer the option to pass in this value
> :(
> 
> TIA

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: Possible to sort in .xml file?

2011-03-10 Thread Markus Jelsma
No, look for request handlers:

<requestHandler name="search" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
  </lst>
</requestHandler>

etc... You can add any valid parameter there as a default.

http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml

On Thursday 10 March 2011 11:34:47 Andy Newby wrote:
> Hi,
> 
> Thanks for the quick reply!
> 
> I did a quick look in the solrconfig.xml file, but can't see anything about
> "sort", appart from:
> 
>
> 
> 
> TIA
> 
> Andy
> 
> On Thu, Mar 10, 2011 at 10:33 AM, Markus Jelsma
> 
> wrote:
> > Is there no generic parameter store in the Solr module you can use for
> > passing
> > the sort parameter? If not, you can define your sort parameter as default
> > in
> > the request handler you use in solrconfig. See the shipped config for
> > examples.
> > 
> > On Thursday 10 March 2011 11:25:01 Andy Newby wrote:
> > > Hi,
> > > 
> > > I'm trying to setup Solr so that we can "sort" using:
> > > 
> > > document_views asc,score
> > > 
> > > ...is this possible via the solrconfig.xml/schema.xml file?
> > > 
> > > I know its possible to do via adding sort= , but the Perl module
> > > (WebService::Solr) doesn't seem to offer the option to pass in this
> > > value
> > > 
> > > :(
> > > 
> > > TIA
> > 
> > --
> > Markus Jelsma - CTO - Openindex
> > http://www.linkedin.com/in/markus17
> > 050-8536620 / 06-50258350

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


disquery - difference qf qs / pf ps

2011-03-10 Thread Gastone Penzo
Hi,
I understand what the qf and qs parameters are,
but I can't understand what pf and ps are exactly.
Can someone explain them to me?

For example:

qf=title^2 name^1.2 surname^1
qs=3

This means I search in the title field with boost 2, or in the name field with
boost 1.2, or in the surname field with boost 1,
and the maximum slop between terms to match is 3.

Right?

And what about ps and pf (phrase fields and phrase slop)?
Can I use all 4 parameters together?

Thanx

-- 
Gastone Penzo


Re: disquery - difference qf qs / pf ps

2011-03-10 Thread Ahmet Arslan
> Hi
> i understand what qf and qs parameters are
> but i can't understand what pf and ps are exactly.
> someone can explain it to me??
> 
> for example
> 
> qf=title^2 name^1.2 surname^1
> qs=3
> 
> it means i search in title field with boost 2 or in name
> field with boost
> 1.2 or in surname field with boost 1
> and the maximum slop beetween term to match is 3.
> 
> right??
> 
> and the ps? pf? (phrase filter and phrase slop)?
> can i use all 4 parameters together??

Yes you can use all 4 parameters together. Please see similar discussion:
http://search-lucene.com/m/KWkYf2kE4Ng1/


  


Re: disquery - difference qf qs / pf ps

2011-03-10 Thread Gastone Penzo
Thank you very much. I understand the difference between qs and ps, but not
what pf is... Is it necessary to use ps?


>  Yes you can use all 4 parameters together. Please see similar discussion:
> http://search-lucene.com/m/KWkYf2kE4Ng1/
>
>
>
>


-- 
Gastone Penzo


Re: Math-generated fields during query

2011-03-10 Thread Markus Jelsma
Not at the moment, if I'm not mistaken. The same issue exists in Solr 3.1, where
relative distances are not returned as a field value when doing spatial
filtering. To retrieve the value one must use the score as a sort of pseudo
field.

http://wiki.apache.org/solr/SpatialSearch#Returning_the_distance

On Wednesday 09 March 2011 23:06:33 Peter Sturge wrote:
> Hi,
> 
> I was wondering if it is possible during a query to create a returned
> field 'on the fly' (like function query, but for concrete values, not
> score).
> 
> For example, if I input this query:
>q=_val_:"product(15,3)"&fl=*,score
> 
> For every returned document, I get score = 45.
> 
> If I change it slightly to add *:* like this:
>q=*:* _val_:"product(15,3)"&fl=*,score
> 
> I get score = 32.526913.
> 
> If I try my use case of _val_:"product(qty_ordered,unit_price)", I get
> varying scores depending on...well depending on something.
> 
> I understand this is doing relevance scoring, but it doesn't seem to
> tally with the FunctionQuery Wiki
> [example at the bottom of the page]:
> 
>q=boxname:findbox+_val_:"product(product(x,y),z)"&fl=*,score
> ...where score will contain the resultant volume.
> 
> Is there a trick to getting not a score, but the actual value of
> quantity*price (e.g. product(5,2.21) == 11.05)?
> 
> Many thanks

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: disquery - difference qf qs / pf ps

2011-03-10 Thread Ahmet Arslan


> i understand the
> difference beetween qs and ps but not
> what pf is...is it necessary to use ps?

pf (Phrase Fields) and ps (Phrase Slop) are related to each other.

Let's say you have &q=term1 term2&pf=title text&ps=10

We can think of it as if dismax adds title:"term1 term2"~10 text:"term1 term2"~10
as imaginary optional clauses to your original query. Optional means they affect
the order of documents, not which documents match.

http://wiki.apache.org/solr/DisMaxQParserPlugin#pf_.28Phrase_Fields.29
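Putting the four parameters together, a full dismax request could look like
this (a sketch; field names are taken from the example above):

/select?defType=dismax&q=term1+term2&qf=title^2+name^1.2+surname^1&qs=3&pf=title+text&ps=10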


  


Re: NRT in Solr

2011-03-10 Thread Jason Rutherglen
Bill,

I think all of the improvements can be made, however they are fairly
large structural changes that would require perhaps several patches.
The other issue is we'll likely land RT this year (or next) and then
the cached values need to be appended to as the documents are added,
that and they'll be across several DWPTs (see LUCENE-2324).  So one
could easily do work for per-segment caching, and then need to go back
and do per-segment, append caches.  I'm not sure caching is needed at
all, especially with the recent speed improvements, except for facets
which resemble field caches, and probably should be subsumed there.

Jason

On Wed, Mar 9, 2011 at 8:27 PM, Bill Bell  wrote:
> So it looks like can handle adding new documents, and expiring old
> documents. Updating a document is not part of the game.
> This would work well for message boards or tweet type solutions.
>
> Solr can do this as well directly. Why wouldn't you just improve the
> document and facet caching so that when you append there is not a huge hit
> to Solr? Also we could add a expiration to documents as well.
>
> The big issue for me is that when I update Solr I need to replicate that
> change quickly to all slaves. If we changed replication to stream to the
> slaves in Near Real Time and not have to create a whole new index version,
> warming, etc, that would be awesome. That combined with better caching
> smarts and we have a near perfect solution.
>
> Thanks.
>
> On 3/9/11 3:29 PM, "Smiley, David W."  wrote:
>
>>Zoie adds NRT to Solr:
>>http://snaprojects.jira.com/wiki/display/ZOIE/Zoie+Solr+Plugin
>>
>>I haven't tried it yet but looks cool.
>>
>>~ David Smiley
>>Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/
>>
>>On Mar 9, 2011, at 9:01 AM, Jason Rutherglen wrote:
>>
>>> Jae,
>>>
>>> NRT hasn't been implemented NRT as of yet in Solr, I think partially
>>> because major features such as replication, caching, and uninverted
>>> faceting suddenly are no longer viable, eg, it's another round of
>>> testing etc.  It's doable, however I think the best approach is a
>>> separate request call path, to avoid altering to current [working]
>>> API.
>>>
>>> On Tue, Mar 8, 2011 at 1:27 PM, Jae Joo  wrote:
 Hi,
 Is NRT in Solr 4.0 from trunk? I have checkouted from Trunk, but could
not
 find the configuration for NRT.

 Regards

 Jae

>>
>>
>>
>>
>>
>
>
>


Re: FunctionQueries and FieldCache and OOM

2011-03-10 Thread Markus Jelsma
Well, it's quite hard to debug because the values listed on the stats page in
the fieldCache section don't make much sense. Reducing precision with
NOW/HOUR, however, does seem to make a difference.

It is hard (or impossible) to reproduce this in a test setup with the same
index but without continuous updates and without stress tests. Firing manual
queries with different values for the bf parameter doesn't show any difference
in the values listed on the stats page.

Does someone care to provide an explanation?

Thanks

On Wednesday 09 March 2011 22:21:19 Markus Jelsma wrote:
> Hi,
> 
> In one of the environments i'm working on (4 Solr 1.4.1. nodes with
> replication, 3+ million docs, ~5.5GB index size, high commit rate
> (~1-2min), high query rate (~50q/s), high number of updates
> (~1000docs/commit)) the nodes continuously run out of memory.
> 
> During development we frequently ran excessive stress tests and after
> tuning JVM and Solr settings all ran fine. A while ago i added the DisMax
> bq parameter for boosting recent documents, documents older than a day
> receive 50% less boost, similar to the example but with a much steeper
> slope. For clarity, i'm not using the ordinal function but the reciprocal
> version in the bq parameter which is warned against when using Solr 1.4.1
> according to the wiki.
> 
> This week we started the stress tests and nodes are going down again. I've
> reconfigured the nodes to have different settings for the bq parameter (or
> no bq parameter).
> 
> It seems the bq the cause of the misery.
> 
> Issue SOLR- keeps popping up but it has not been resolved. Is there
> anyone who can confirm one of those patches fixes this issue before i
> waste hours of work finding out it doesn't? ;)
> 
> Am i correct when i assume that Lucene FieldCache entries are added for
> each unique function query?  In that case, every query is a unique cache
> entry because it operates on milliseconds. If all doesn't work i might be
> able to reduce precision by operating on minutes or even more instead of
> milli seconds. I, however, cannot use other nice math function in the ms()
> parameter so that might make things difficult.
> 
> However, date math seems available (NOW/HOUR) so i assume it would also
> work for /HOUR as well. This way i just might prevent
> useless entries.
> 
> My apologies for this long mail but it may prove useful for other users and
> hopefully we find the solution and can update the wiki to add this warning.
> 
> Cheers,

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: NRT and warmupTime of filterCache

2011-03-10 Thread Jason Rutherglen
> - yes, i think so, thats the reason because i dont understand the
> wiki-article ...

Maybe the article is out of date?  I think it's grossly inefficient to
warm the searchers at all in the NRT case.  Queries are being
performed across *all* segments, even though there should only be 1
that's new that may require warming.  However, given that the new segment is
so small, there should be no reason to warm it at all?

On Thu, Mar 10, 2011 at 12:14 AM, stockii  wrote:
>>> it'll negatively impact the desired goal of low latency new index readers?
> - yes, i think so, thats the reason because i dont understand the
> wiki-article ...
>
> i set the warmupCount to 500 and i got no error messages, that solr isnt
> available ...
> but solr-stats.jsp show me a warmuptime of "warmupTime : 12174 " why ?
>
> is the warmuptime in solrconfig.xml the maximum time in ms, for autowarming
> ? or what does it really means ?
>
> -
> --- System 
> 
>
> One Server, 12 GB RAM, 2 Solr Instances, 7 Cores,
> 1 Core with 31 Million Documents other Cores < 100.000
>
> - Solr1 for Search-Requests - commit every Minute  - 5GB Xmx
> - Solr2 for Update-Request  - delta every Minute - 4GB Xmx
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/NRT-and-warmupTime-of-filterCache-tp2654886p2659560.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Indexing a text string for faceting

2011-03-10 Thread Erick Erickson
A string type will NOT use tokenizers, so that's probably not it

There are several things that are extremely suspicious:
1> the '&' character in your URL. Add "debugQuery=on" and see what
happens there. I suspect everything after the & is interpreted as a
separate parameter and ignored.
2> even if the & isn't a problem, the general syntax of
fCategoryName:term1 term2 term3
usually means fCategoryName:term1 defaultField:term2 defaultField:term3.
You probably want something like fCategoryName:"term1 term2 term3"

Best
Erick

On Wed, Mar 9, 2011 at 4:45 PM, Greg Georges  wrote:
> Hello all,
>
> I have a small problem with my faceting fields. In all I create a new 
> faceting field which is indexed and not stored, and use copyField. The 
> problem is I facet on category names which have examples like this
>
> Policies & Documentation (37)
> Forms & Checklists (22)
>
> Right now my fields are using the string type, which is not good because I
> think by default it is using a tokenizer etc. I think I must define a new
> field type so that my category names will be properly indexed as a facet
> field. Here is what I have now:
>
> 
> 
>  multiValued="true"/>
>  multiValued="true"/>
>
> 
> 
>
> Can someone give me a type configuration which will support my category names 
> which have whitespaces and ampersands?
>
> Thanks in advance
>
> Greg
>


Re: Math-generated fields during query

2011-03-10 Thread Markus Jelsma
There is a ticket:
https://issues.apache.org/jira/browse/SOLR-1298


On Thursday 10 March 2011 15:46:55 Peter Sturge wrote:
> Hi Markus,
> 
> Thanks for your input. Hmmm, so it sounds like all those nice math
> functions operate only on the Lucene tf/idf score.
> In the link you gave, there is mention of 'Returning distances (and
> any arbitrary function query value) is currently under development.'
> (the workaround mentioned doesn't work for product()).
> Do you know if there is a JIRA ticket for this (I couldn't ssee one in
> a search of JIRA)?
> 
> Thanks again!
> Peter
> 
> 
> On Thu, Mar 10, 2011 at 1:19 PM, Markus Jelsma
> 
>  wrote:
> > Not at the moment if i'm not mistaken. The same issue is with Solr 3.1
> > where relative distances are not being returned as field value when
> > doing spatial filtering. To retrieve the value one must use the score as
> > the some pseudo field.
> > 
> > http://wiki.apache.org/solr/SpatialSearch#Returning_the_distance
> > 
> > On Wednesday 09 March 2011 23:06:33 Peter Sturge wrote:
> >> Hi,
> >> 
> >> I was wondering if it is possible during a query to create a returned
> >> field 'on the fly' (like function query, but for concrete values, not
> >> score).
> >> 
> >> For example, if I input this query:
> >>q=_val_:"product(15,3)"&fl=*,score
> >> 
> >> For every returned document, I get score = 45.
> >> 
> >> If I change it slightly to add *:* like this:
> >>q=*:* _val_:"product(15,3)"&fl=*,score
> >> 
> >> I get score = 32.526913.
> >> 
> >> If I try my use case of _val_:"product(qty_ordered,unit_price)", I get
> >> varying scores depending on...well depending on something.
> >> 
> >> I understand this is doing relevance scoring, but it doesn't seem to
> >> tally with the FunctionQuery Wiki
> >> [example at the bottom of the page]:
> >> 
> >>q=boxname:findbox+_val_:"product(product(x,y),z)"&fl=*,score
> >> ...where score will contain the resultant volume.
> >> 
> >> Is there a trick to getting not a score, but the actual value of
> >> quantity*price (e.g. product(5,2.21) == 11.05)?
> >> 
> >> Many thanks
> > 
> > --
> > Markus Jelsma - CTO - Openindex
> > http://www.linkedin.com/in/markus17
> > 050-8536620 / 06-50258350

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


DIH : modify document in sibling entity of root entity

2011-03-10 Thread Chantal Ackermann
Dear all,

in DIH, is it possible to have two sibling entities where:

- the first one is the root entity that creates the documents by
iterating over a table that has one row per document.
- the second one is executed after the completion of the first entity
iteration, and it provides more data that is added to the newly created
documents.


I've set up such a dih configuration, and the second entity is executed,
but no data is written into the index apart from the data extracted by
the root entity  (=no document is modified?).

Documents are identified by the unique key 'id' which is defined by
pk="id" on both entities.

Is this supposed to work at all? I haven't found anything so far on the
net but I could have used the wrong keywords for searching, of course.

As answer to the maybe obvious question why I'm not using a subentity:
I thought that this solution might be faster because it iterates over
the second data source instead of hitting it with a query per each
document.

Anyway, the main reason I tried this is because I want to know whether
it works. I'm still not sure whether it should work but I'm doing
something wrong...


Thanks!
Chantal



Re: FunctionQueries and FieldCache and OOM

2011-03-10 Thread Markus Jelsma
Alright, I can now confirm the issue has been resolved by reducing precision.
The garbage collector on nodes without reduced precision has a really hard time
keeping up and clearly shows a very different graph of heap consumption.

Consider using MINUTE, HOUR or DAY as precision in case you suffer from
excessive memory consumption:

recip(ms(NOW/HOUR,your_date_field),3.16e-11,1,1)

On Thursday 10 March 2011 15:14:25 Markus Jelsma wrote:
> Well, it's quite hard to debug because the values listed on the stats page
> in the fieldCache section don't make much sense. Reducing precision with
> NOW/HOUR, however, does seem to make a difference.
> 
> It is hard (or impossible) to reproduce this is a test setup with the same
> index but without continues updates and without stress tests. Firing manual
> queries with different values for the bf parameter don't show any
> difference in the values listed on the stats page.
> 
> Someone cares to provide an explanation?
> 
> Thanks
> 
> On Wednesday 09 March 2011 22:21:19 Markus Jelsma wrote:
> > Hi,
> > 
> > In one of the environments i'm working on (4 Solr 1.4.1. nodes with
> > replication, 3+ million docs, ~5.5GB index size, high commit rate
> > (~1-2min), high query rate (~50q/s), high number of updates
> > (~1000docs/commit)) the nodes continuously run out of memory.
> > 
> > During development we frequently ran excessive stress tests and after
> > tuning JVM and Solr settings all ran fine. A while ago i added the DisMax
> > bq parameter for boosting recent documents, documents older than a day
> > receive 50% less boost, similar to the example but with a much steeper
> > slope. For clarity, i'm not using the ordinal function but the reciprocal
> > version in the bq parameter which is warned against when using Solr 1.4.1
> > according to the wiki.
> > 
> > This week we started the stress tests and nodes are going down again.
> > I've reconfigured the nodes to have different settings for the bq
> > parameter (or no bq parameter).
> > 
> > It seems the bq the cause of the misery.
> > 
> > Issue SOLR- keeps popping up but it has not been resolved. Is there
> > anyone who can confirm one of those patches fixes this issue before i
> > waste hours of work finding out it doesn't? ;)
> > 
> > Am i correct when i assume that Lucene FieldCache entries are added for
> > each unique function query?  In that case, every query is a unique cache
> > entry because it operates on milliseconds. If all doesn't work i might be
> > able to reduce precision by operating on minutes or even more instead of
> > milli seconds. I, however, cannot use other nice math function in the
> > ms() parameter so that might make things difficult.
> > 
> > However, date math seems available (NOW/HOUR) so i assume it would also
> > work for /HOUR as well. This way i just might prevent
> > useless entries.
> > 
> > My apologies for this long mail but it may prove useful for other users
> > and hopefully we find the solution and can update the wiki to add this
> > warning.
> > 
> > Cheers,

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: NRT and warmupTime of filterCache

2011-03-10 Thread stockii
>> Maybe the article is out of date? 
  - maybe ... I don't know.

In my case it makes no sense, so I use another configuration ...


-
--- System 

One Server, 12 GB RAM, 2 Solr Instances, 7 Cores, 
1 Core with 31 Million Documents other Cores < 100.000

- Solr1 for Search-Requests - commit every Minute  - 5GB Xmx
- Solr2 for Update-Request  - delta every Minute - 4GB Xmx
--
View this message in context: 
http://lucene.472066.n3.nabble.com/NRT-and-warmupTime-of-filterCache-tp2654886p2660814.html
Sent from the Solr - User mailing list archive at Nabble.com.


Error on string searching # [STRANGE]

2011-03-10 Thread Dario Rigolin
I have a text field indexed using WordDelimiter,
indexed this way:

S.#L.W.VI.37
...


Searching this way:
http://192.168.3.3:8983/solr3.1/core0/select?q=myfield:("S.#L.W.VI.37")

Makes this error:

org.apache.lucene.queryParser.ParseException: Cannot parse 'myfield:("S.':
Lexical error at line 1, column 17.  Encountered: <EOF> after : "\"S."

It seems that # is a wrong character for a query... I tried URL-encoding, or
adding a slash before, or removing the quotes, but other errors come:

http://192.168.3.3:8983/solr3.1/core0/select?q=myfield:(S.#L.W.VI.37)

org.apache.lucene.queryParser.ParseException: Cannot parse 'myfield:(S.':
Encountered "<EOF>" at line 1, column 15.
Was expecting one of:
    <AND> ...
    <OR> ...
    <NOT> ...
    "+" ...
    "-" ...
    "(" ...
    ")" ...
    "*" ...
    "^" ...
    <QUOTED> ...
    <TERM> ...
    <FUZZY_SLOP> ...
    <PREFIXTERM> ...
    <WILDTERM> ...
    "[" ...
    "{" ...
    <NUMBER> ...


Any idea how to solve this?
Maybe a bug? Or probably I'm missing something.

Dario.


Re: Error on string searching # [STRANGE]

2011-03-10 Thread Juan Grande
I think that the problem is with the "#" symbol, because it has a special
meaning when used inside a URL. Try replacing it with "%23", like this:
http://192.168.3.3:8983/solr3.1/core0/select?q=myfield:("S.%23L.W.VI.37")

Regards,
*
Juan G. Grande*
-- Solr Consultant @ http://www.plugtree.com
-- Blog @ http://juanggrande.wordpress.com


On Thu, Mar 10, 2011 at 12:45 PM, Dario Rigolin
wrote:

> I have a text field indexed using WordDelimeter
> Indexed in that way
> 
> S.#L.W.VI.37
> ...
> 
>
> Serching in that way:
> http://192.168.3.3:8983/solr3.1/core0/select?q=myfield:("S.#L.W.VI.37")
>
> Makes this error:
>
> org.apache.lucene.queryParser.ParseException: Cannot parse 'myfield:("S.':
> Lexical error at line 1, column 17.  Encountered:  after : "\"S."
>
> It seems that # is a wrong character for query... I try urlencoding o
> adding a
> slash before or removing quotes but other errors comes:
>
> http://192.168.3.3:8983/solr3.1/core0/select?q=myfield:(S.#L.W.VI.37)
>
> org.apache.lucene.queryParser.ParseException: Cannot parse 'myfield:(S.':
> Encountered "" at line 1, column 15.
> Was expecting one of:
> ...
> ...
> ...
>"+" ...
>"-" ...
>"(" ...
>")" ...
>"*" ...
>"^" ...
> ...
> ...
> ...
> ...
> ...
>"[" ...
>"{" ...
> ...
>
>
> Any idea how to solve this?
> Maybe a bug? Or probably I'm missing something.
>
> Dario.
>


Re: Math-generated fields during query

2011-03-10 Thread dan sutton
As a workaround can you not have a search component run after the
querycomponent, and have the qty_ordered,unit_price as stored fields
and returned with the fl parameter and have your custom component do
the calc, unless you need to sort by this value too?

Dan

On Wed, Mar 9, 2011 at 10:06 PM, Peter Sturge  wrote:
> Hi,
>
> I was wondering if it is possible during a query to create a returned
> field 'on the fly' (like function query, but for concrete values, not
> score).
>
> For example, if I input this query:
>   q=_val_:"product(15,3)"&fl=*,score
>
> For every returned document, I get score = 45.
>
> If I change it slightly to add *:* like this:
>   q=*:* _val_:"product(15,3)"&fl=*,score
>
> I get score = 32.526913.
>
> If I try my use case of _val_:"product(qty_ordered,unit_price)", I get
> varying scores depending on...well depending on something.
>
> I understand this is doing relevance scoring, but it doesn't seem to
> tally with the FunctionQuery Wiki
> [example at the bottom of the page]:
>
>   q=boxname:findbox+_val_:"product(product(x,y),z)"&fl=*,score
> ...where score will contain the resultant volume.
>
> Is there a trick to getting not a score, but the actual value of
> quantity*price (e.g. product(5,2.21) == 11.05)?
>
> Many thanks
>


Re: Error on string searching # [STRANGE]

2011-03-10 Thread Dario Rigolin
On Thursday, March 10, 2011 04:53:51 pm Juan Grande wrote:
> I think that the problem is with the "#" symbol, because it has a special
> meaning when used inside a URL. Try replacing it with "%23", like this:
> http://192.168.3.3:8983/solr3.1/core0/select?q=myfield:("S.%23L.W.VI.37")

If I do URL-encoding and change it to %23, I get this error:


java.lang.ArrayIndexOutOfBoundsException: 3
    at org.apache.lucene.search.MultiPhraseQuery$MultiPhraseWeight.scorer(MultiPhraseQuery.java:185)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:208)
    at org.apache.lucene.search.Searcher.search(Searcher.java:88)

 
> Regards,
> *
> Juan G. Grande*
> -- Solr Consultant @ http://www.plugtree.com
> -- Blog @ http://juanggrande.wordpress.com
> 
> 
> On Thu, Mar 10, 2011 at 12:45 PM, Dario Rigolin
> 
> wrote:
> > I have a text field indexed using WordDelimeter
> > Indexed in that way
> > 
> > S.#L.W.VI.37
> > ...
> > 
> > 
> > Serching in that way:
> > http://192.168.3.3:8983/solr3.1/core0/select?q=myfield:("S.#L.W.VI.37")
> > 
> > Makes this error:
> > 
> > org.apache.lucene.queryParser.ParseException: Cannot parse
> > 'myfield:("S.': Lexical error at line 1, column 17.  Encountered: 
> > after : "\"S."
> > 
> > It seems that # is a wrong character for query... I try urlencoding o
> > adding a
> > slash before or removing quotes but other errors comes:
> > 
> > http://192.168.3.3:8983/solr3.1/core0/select?q=myfield:(S.#L.W.VI.37)
> > 
> > org.apache.lucene.queryParser.ParseException: Cannot parse 'myfield:(S.':
> > Encountered "" at line 1, column 15.
> > 
> > Was expecting one of:
> > ...
> > ...
> > ...
> >"+" ...
> >"-" ...
> >"(" ...
> >")" ...
> >"*" ...
> >"^" ...
> > ...
> > ...
> > ...
> > ...
> > ...
> >"[" ...
> >"{" ...
> > ...
> > 
> > Any idea how to solve this?
> > Maybe a bug? Or probably I'm missing something.
> > 
> > Dario.


Re: DIH : modify document in sibling entity of root entity

2011-03-10 Thread Stefan Matheis
Hi Chantal,

I'm not sure if I understood you correctly (if at all): two entities,
not arranged as sub-entities, but using values from the previous
entity? Could you paste your dataimport config & the relevant part of the
logging output?

Regards
Stefan

On Thu, Mar 10, 2011 at 4:12 PM, Chantal Ackermann
 wrote:
> Dear all,
>
> in DIH, is it possible to have two sibling entities where:
>
> - the first one is the root entity that creates the documents by
> iterating over a table that has one row per document.
> - the second one is executed after the completion of the first entity
> iteration, and it provides more data that is added to the newly created
> documents.
>
>
> I've set up such a dih configuration, and the second entity is executed,
> but no data is written into the index apart from the data extracted by
> the root entity  (=no document is modified?).
>
> Documents are identified by the unique key 'id' which is defined by
> pk="id" on both entities.
>
> Is this supposed to work at all? I haven't found anything so far on the
> net but I could have used the wrong keywords for searching, of course.
>
> As answer to the maybe obvious question why I'm not using a subentity:
> I thought that this solution might be faster because it iterates over
> the second data source instead of hitting it with a query per each
> document.
>
> Anyway, the main reason I tried this is because I want to know whether
> it works. I'm still not sure whether it should work but I'm doing
> something wrong...
>
>
> Thanks!
> Chantal
>
>


Re: True master-master fail-over without data gaps (choosing CA in CAP)

2011-03-10 Thread Otis Gospodnetic
Hi,



- Original Message 
> From: Jake Luciani 
> To: solr-user@lucene.apache.org
> Sent: Wed, March 9, 2011 8:07:00 PM
> Subject: Re: True master-master fail-over without data gaps (choosing CA in
> CAP)
>
> Yeah sure. Let me update this on the Solandra wiki. I'll send across the
> link

Excellent.  You could include ES there, too, if you feel extra adventurous. ;)

> I think you hit the main two shortcomings atm.

- Grandma, why are your eyes so big?
- To see you better.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/


> -Jake
>
> On Wed, Mar 9, 2011 at 6:17 PM, Otis Gospodnetic wrote:
>
> > Jake,
> >
> > Maybe it's time to come up with the Solandra/Solr matrix so we can see
> > Solandra's strengths (e.g. RT, no replication) and weaknesses (e.g. I
> > think I saw a mention of some big indices?) or missing features (e.g. no
> > delete by query), etc.
> >
> > Thanks!
> > Otis
> > 
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > Lucene ecosystem search :: http://search-lucene.com/
> >
> >
> >
> > - Original Message 
> > > From: Jake Luciani 
> > > To: "solr-user@lucene.apache.org"
> > > Sent: Wed, March 9, 2011 6:04:13 PM
> > > Subject: Re: True master-master fail-over without data gaps (choosing
> > > CA in CAP)
> > >
> > > Jason,
> > >
> > > Its predecessor did, Lucandra. But Solandra is a new approach that
> > > manages shards of documents across the cluster for you and uses Solr's
> > > distributed search to query indexes.
> > >
> > >
> > > Jake
> > >
> > > On Mar 9, 2011, at 5:15 PM, Jason Rutherglen <
> > > jason.rutherg...@gmail.com> wrote:
> > >
> > > > Doesn't Solandra partition by term instead of document?
> > > >
> > > > On Wed, Mar 9, 2011 at 2:13 PM, Smiley, David W. wrote:
> > > >> I was just about to jump in this conversation to mention Solandra
> > > >> and go fig, Solandra's committer comes in.  :-)  It was nice to meet
> > > >> you at Strata, Jake.
> > > >>
> > > >> I haven't dug into the code yet but Solandra strikes me as a killer
> > > >> way to scale Solr. I'm looking forward to playing with it;
> > > >> particularly looking at disk requirements and performance
> > > >> measurements.
> > > >>
> > > >> ~ David Smiley
> > > >>
> > > >> On Mar 9, 2011, at 3:14 PM, Jake Luciani wrote:
> > > >>
> > > >>> Hi Otis,
> > > >>>
> > > >>> Have you considered using Solandra with Quorum writes
> > > >>> to achieve master/master with CA semantics?
> > > >>>
> > > >>> -Jake
> > > >>>
> > > >>>
> > > >>> On Wed, Mar 9, 2011 at 2:48 PM, Otis Gospodnetic wrote:
> > > >>>
> > > >>>> Hi,
> > > >>>>
> > > >>>> Original Message 
> > > >>>>
> > > >>>>> From: Robert Petersen 
> > > >>>>>
> > > >>>>> Can't you skip the SAN and keep the indexes locally?  Then you
> > > >>>>> would have two redundant copies of the index and no lock issues.
> > > >>>>
> > > >>>> I could, but then I'd have the issue of keeping them in sync,
> > > >>>> which seems more fragile.  I think SAN makes things simpler
> > > >>>> overall.
> > > >>>>
> > > >>>>> Also, can't master02 just be a slave to master01 (in the master
> > > >>>>> farm and separate from the slave farm) until such time as
> > > >>>>> master01 fails? Then
> > > >>>>
> > > >>>> No, because it wouldn't be in sync.  It would always be N minutes
> > > >>>> behind, and when the primary master fails, the secondary would not
> > > >>>> have all the docs - data loss.
> > > >>>>
> > > >>>>> master02 would start receiving the new documents with an indexes
> > > >>>>> complete up to the last replication at least and the other slaves
> > > >>>>> would be directed by LB to poll master02 also...
> > > >>>>
> > > >>>> Yeah, "complete up to the last replication" is the problem.  It's
> > > >>>> a data gap that now needs to be filled somehow.
> > > >>>>
> > > >>>> Otis
> > > >>>> 
> > > >>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > > >>>> Lucene ecosystem search :: http://search-lucene.com/
> > > >>>>
> > > >>>>
> > > >>>>> -Original Message-
> > > >>>>> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
> > > >>>>> Sent: Wednesday, March 09, 2011 9:47 AM
> > > >>>>> To: solr-user@lucene.apache.org
> > > >>>>> Subject: Re: True master-master fail-over without data gaps
> > > >>>>> (choosing CA in CAP)
> > > >>>>>
> > > >>>>> Hi,
> > > >>>>>
> > > >>>>>
> > > >>>>> - Original Message 
> > > >>>>>> From: Walter Underwood 
> > > >>>>>
> > > >>>>>> On Mar 9, 2011, at 9:02 AM, Otis Gospodnetic wrote:
> > > >>>>>>
> > > >>>>>>> You mean it's not possible to have 2 masters that are in
> > > >>>>>>> nearly real-time sync?
> > > >>>>>>> How about with DRBD?  I know peop

question regarding proper placement of geofilt in fq=

2011-03-10 Thread Jerry Mindek
Hi,

I am using rev 1036236 of solr trunk running as a servlet in Tomcat 7.
The doc set is sharded over 11 shards.
Currently, I have all the shards running in a single tomcat.

Please see the bottom of the email for the bits of my schema.xml and 
solrconfig.xml that might help you understand my configuration.

I am seeing what I think is strange behavior when I try to use the geofilt in a 
filter query.
Here's what I am seeing:


1.   If I put the {!geofilt} as the last argument of the fq= parameter and I
send the following distributed query to my sharded index:
/select?&start=0&rows=30&q=food&fq=b_type:shops AND 
{!geofilt}&qt=spatialdismax&fl=*%2Cscore&facet=false&pt=38.029191,-78.479266&sfield=lat_long&d=80&shards=...
I get a syntax error. Which seems odd to me.


2.   If I move the {!geofilt} to the first position in the fq= and send the 
following distributed query:
/select?&start=0&rows=30&q=food&fq={!geofilt} AND 
b_type:T01&qt=spatialdismax&fl=*%2Cscore&facet=false&pt=38.029191,-78.479266&sfield=lat_long&d=80&shards=...
Then only the geofilt is applied, not the b_type:T01, which seems odd to me. I
would expect both filters to be applied.


3.   Finally, when I submit this query as:
/select?&start=0&rows=30&q=food&fq=_query_:"{!geofilt} AND
b_type:T01"&qt=spatialdismax&fl=*%2Cscore&facet=false&pt=38.029191,-78.479266&sfield=lat_long&d=80&shards=...
This works as I had hoped, i.e. both the geofilt and the b_type filters are 
applied.

Am I trying to use geofilt in the wrong way or is this possibly a bug?

Thanks,
Jerry Mindek













...







 score desc
 true
 1
 explicit
 20
 0.01
 
cn^2.0 t1^2.0 ts^2.0 lb^2.0 s4_s^2.0 sim^2.0
 
 
cn^2.0 t1^2.0 ts^2.0 lb^2.0 s4_s^2.0 sim^2.0
 
   dn, cn,  t1, stat, pst, pct, ts, sv, score
 
2<-1 5<-2 6<90%
 
 100
 *:*

  




Re: Error on string searching # [STRANGE] [FIX]

2011-03-10 Thread Dario Rigolin
On Thursday, March 10, 2011 04:58:43 pm Dario Rigolin wrote:

It seems fixed by setting, in the WordDelimiter filter,

catenateWords="0" catenateNumbers="0"

instead of "1" on both...

Nice to know...


> On Thursday, March 10, 2011 04:53:51 pm Juan Grande wrote:
> > I think that the problem is with the "#" symbol, because it has a special
> > meaning when used inside a URL. Try replacing it with "%23", like this:
> > http://192.168.3.3:8983/solr3.1/core0/select?q=myfield:("S.%23L.W.VI.37")
> 
> If I do urlencoding and changing in %23 I get this error
> 
> 
> 3
> 
> java.lang.ArrayIndexOutOfBoundsException: 3
>   at
> org.apache.lucene.search.MultiPhraseQuery$MultiPhraseWeight.scorer(MultiPhr
> aseQuery.java:185) at
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:208) at
> org.apache.lucene.search.Searcher.search(Searcher.java:88)
> 
> 
> > Regards,
> > *
> > Juan G. Grande*
> > -- Solr Consultant @ http://www.plugtree.com
> > -- Blog @ http://juanggrande.wordpress.com
> > 
> > 
> > On Thu, Mar 10, 2011 at 12:45 PM, Dario Rigolin
> > 
> > wrote:
> > > I have a text field indexed using WordDelimeter
> > > Indexed in that way
> > > 
> > > S.#L.W.VI.37
> > > ...
> > > 
> > > 
> > > Serching in that way:
> > > http://192.168.3.3:8983/solr3.1/core0/select?q=myfield:("S.#L.W.VI.37")
> > > 
> > > Makes this error:
> > > 
> > > org.apache.lucene.queryParser.ParseException: Cannot parse
> > > 'myfield:("S.': Lexical error at line 1, column 17.  Encountered: 
> > > after : "\"S."
> > > 
> > > It seems that # is a wrong character for query... I try urlencoding o
> > > adding a
> > > slash before or removing quotes but other errors comes:
> > > 
> > > http://192.168.3.3:8983/solr3.1/core0/select?q=myfield:(S.#L.W.VI.37)
> > > 
> > > org.apache.lucene.queryParser.ParseException: Cannot parse
> > > 'myfield:(S.': Encountered "" at line 1, column 15.
> > > 
> > > Was expecting one of:
> > > ...
> > > ...
> > > ...
> > >"+" ...
> > >"-" ...
> > >"(" ...
> > >")" ...
> > >"*" ...
> > >"^" ...
> > > ...
> > > ...
> > > ...
> > > ...
> > > ...
> > >"[" ...
> > >"{" ...
> > > ...
> > > 
> > > Any idea how to solve this?
> > > Maybe a bug? Or probably I'm missing something.
> > > 
> > > Dario.


Re: DIH : modify document in sibling entity of root entity

2011-03-10 Thread Gora Mohanty
On Thu, Mar 10, 2011 at 8:42 PM, Chantal Ackermann
 wrote:
[...]
> Is this supposed to work at all? I haven't found anything so far on the
> net but I could have used the wrong keywords for searching, of course.
>
> As answer to the maybe obvious question why I'm not using a subentity:
> I thought that this solution might be faster because it iterates over
> the second data source instead of hitting it with a query per each
> document.
[...]

I think that what you are after can be handled by Solr's
CachedSqlEntityProcessor:
http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor

Two major caveats here:
* I am not 100% sure that I have understood your requirements.
* The documentation for CachedSqlEntityProcessor needs to be improved.
  Will see if I can test it, and come up with a better example. As I have
  not actually used this, it could be that I have misunderstood its purpose.

Regards,
Gora
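For reference, a cached sub-entity of the kind the wiki describes might look
like this (a sketch; table and column names are invented):

<entity name="doc" pk="id" query="SELECT id, title FROM docs">
  <entity name="contributor" pk="doc_id"
          processor="CachedSqlEntityProcessor"
          query="SELECT doc_id, name FROM contributors"
          where="doc_id=doc.id"/>
</entity>

With this, the inner query runs once and is cached; rows are then looked up per
document via the where condition instead of issuing one query per document.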


Re: disquery - difference qf qs / pf ps

2011-03-10 Thread Jonathan Rochkind

On 3/10/2011 8:15 AM, Gastone Penzo wrote:

Thank you very much. I understand the difference between qs and ps, but not
what pf is... Is it necessary to use ps?


It's not necessary to use anything, including Solr.

pf: will take the entire query the user entered, make it into a single
phrase, and boost documents within the already existing result set that
match that phrase. pf does not change the result set, it just changes
the ranking.
ps: will set the phrase query slop on that pf query of the entire entered
search string, which affects the boosting.





Re: DIH : modify document in sibling entity of root entity

2011-03-10 Thread Chantal Ackermann
Hi Stefan,

thanks for your time!

No, the second entity is not reusing values from the previous one. It
just provides more fields for it, and, of course, the unique identifier -
which in the case of the second entity is not unique:



   






and here are the fields:







(For the sake of simplicity I've removed some fields that would be
created using copyfield instructions and transformers.)

I'm currently trying to run this using a subentity using the SQL
restriction "SUBVALUE like '${contributor.id};%'" but this takes ages...

The other one finished in under a minute (and it did actually process
the second entity, I think, it just didn't modify the index). The
current one runs for about 30min, and has only processed 22,000
documents out of more than 390,000. (Of course, there is probably no
index on that column)


Thanks for any suggestions!
Chantal




On Thu, 2011-03-10 at 17:13 +0100, Stefan Matheis wrote:
> Hi Chantal,
> 
> i'm not sure if i understood you correctly (if at all)? Two entities,
> not arranged as sub-entitiy, but using values from the previous
> entity? Could you paste your dataimport & the relevant part of the
> logging-output?
> 
> Regards
> Stefan
> 
> On Thu, Mar 10, 2011 at 4:12 PM, Chantal Ackermann
>  wrote:
> > Dear all,
> >
> > in DIH, is it possible to have two sibling entities where:
> >
> > - the first one is the root entity that creates the documents by
> > iterating over a table that has one row per document.
> > - the second one is executed after the completion of the first entity
> > iteration, and it provides more data that is added to the newly created
> > documents.
> >
> >
> > I've set up such a dih configuration, and the second entity is executed,
> > but no data is written into the index apart from the data extracted by
> > the root entity  (=no document is modified?).
> >
> > Documents are identified by the unique key 'id' which is defined by
> > pk="id" on both entities.
> >
> > Is this supposed to work at all? I haven't found anything so far on the
> > net but I could have used the wrong keywords for searching, of course.
> >
> > As answer to the maybe obvious question why I'm not using a subentity:
> > I thought that this solution might be faster because it iterates over
> > the second data source instead of hitting it with a query per each
> > document.
> >
> > Anyway, the main reason I tried this is because I want to know whether
> > it works. I'm still not sure whether it should work but I'm doing
> > something wrong...
> >
> >
> > Thanks!
> > Chantal
> >
> >



Re: Math-generated fields during query

2011-03-10 Thread Peter Sturge
Hi Dan,

Yes, you're right - in fact that was precisely what I was thinking of
doing! Also looking at SOLR-1298 & SOLR-1566 - which would be good for
applying functions generically rather than on a per-use-case basis.

Thanks!
Peter


On Thu, Mar 10, 2011 at 3:58 PM, dan sutton  wrote:
> As a workaround can you not have a search component run after the
> querycomponent, and have the qty_ordered,unit_price as stored fields
> and returned with the fl parameter and have your custom component do
> the calc, unless you need to sort by this value too?
>
> Dan
>
> On Wed, Mar 9, 2011 at 10:06 PM, Peter Sturge  wrote:
>> Hi,
>>
>> I was wondering if it is possible during a query to create a returned
>> field 'on the fly' (like function query, but for concrete values, not
>> score).
>>
>> For example, if I input this query:
>>   q=_val_:"product(15,3)"&fl=*,score
>>
>> For every returned document, I get score = 45.
>>
>> If I change it slightly to add *:* like this:
>>   q=*:* _val_:"product(15,3)"&fl=*,score
>>
>> I get score = 32.526913.
>>
>> If I try my use case of _val_:"product(qty_ordered,unit_price)", I get
>> varying scores depending on...well depending on something.
>>
>> I understand this is doing relevance scoring, but it doesn't seem to
>> tally with the FunctionQuery Wiki
>> [example at the bottom of the page]:
>>
>>   q=boxname:findbox+_val_:"product(product(x,y),z)"&fl=*,score
>> ...where score will contain the resultant volume.
>>
>> Is there a trick to getting not a score, but the actual value of
>> quantity*price (e.g. product(5,2.21) == 11.05)?
>>
>> Many thanks
>>
>


Re: DIH : modify document in sibling entity of root entity

2011-03-10 Thread Chantal Ackermann
Hi Gora,

thanks for making me read this part of the documentation again!
This processor probably cannot do what I need out of the box but I will
try to extend it to allow specifying a regular expression in its "where"
attribute.

Thanks!
Chantal

On Thu, 2011-03-10 at 17:39 +0100, Gora Mohanty wrote:
> On Thu, Mar 10, 2011 at 8:42 PM, Chantal Ackermann
>  wrote:
> [...]
> > Is this supposed to work at all? I haven't found anything so far on the
> > net but I could have used the wrong keywords for searching, of course.
> >
> > As answer to the maybe obvious question why I'm not using a subentity:
> > I thought that this solution might be faster because it iterates over
> > the second data source instead of hitting it with a query per each
> > document.
> [...]
> 
> I think that what you are after can be handled by Solr's
> CachedSqlEntityProcessor:
> http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor
> 
> Two major caveats here:
> * I am not 100% sure that I have understood your requirements.
> * The documentation for CachedSqlEntityProcessor needs to be improved.
>   Will see if I can test it, and come up with a better example. As I have
>   not actually used this, it could be that I have misunderstood its purpose.
> 
> Regards,
> Gora



Custom fieldtype with sharding?

2011-03-10 Thread Peter Cline

Hi all,
I'm having an issue with using a custom fieldtype with distributed 
search.  It may be the case that what I'm looking for could be 
accomplished in a different way, but this is my first stab at it.


I'm looking to store XML in a field.  What I've done, which works fine, 
is to:

- on ingest, wrap the XML in a CDATA tag
- write a simple class that extends org.apache.solr.schema.TextField, 
which writes an XML node much in the way that a textfield would, but 
without escaping the contents


It looks like this:

public class XMLField extends TextField {
    @Override
    public void write(TextResponseWriter xmlWriter, String name, Fieldable f)
            throws java.io.IOException {
        Writer writer = xmlWriter.getWriter();
        // open an element carrying the field name, e.g. <xml name="myfield">,
        // then write the stored value verbatim, without escaping
        writer.write("<xml name=\"" + name + "\">");
        writer.write(f.stringValue(), 0, f.stringValue() == null ? 0 : f.stringValue().length());
        writer.write("</xml>");
    }
}

Like I said, simple.  Not especially pretty, but it does the job.  Works
fine for normal searching, I get back a response like:

<xml name="myfield"><record>...</record></xml>

When I try to use this with distributed searching, though, it comes back
written as a normal textfield, like:

<str name="myfield">&lt;record&gt;...&lt;/record&gt;</str>

It looks like it doesn't know anything about my custom fieldtype at all, 
and is defaulting to writing it as a StrField or TextField instead.


So, my question:
- is there a better way to do this?  I'd be fine if it came back with a 
'str' element name, as long as it's not escaped.
- is there perhaps a different class I should extend to do this with 
sharded searching?
- should I just bite the bullet and manually unescape the xml after 
receiving the response?  I'd really prefer not to do this if I can get 
around it.


Thanks in advance for any help.

Peter


Re: question regarding proper placement of geofilt in fq=

2011-03-10 Thread Bill Bell
Can you use 2 fq parameters? The default op is usually set to AND.

Bill Bell
Sent from mobile
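In other words (a sketch reusing the parameters from the original query), each
fq is applied independently and the result sets are intersected:

/select?start=0&rows=30&q=food&fq={!geofilt}&fq=b_type:T01&qt=spatialdismax&pt=38.029191,-78.479266&sfield=lat_long&d=80&shards=...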
 

On Mar 10, 2011, at 9:33 AM, Jerry Mindek  wrote:

> Hi,
> 
> I am using rev 1036236 of solr trunk running as a servlet in Tomcat 7.
> The doc set is sharded over 11 shards.
> Currently, I have all the shards running in a single tomcat.
> 
> Please see the bottom of the email for the bits of my schema.xml and 
> solrconfig.xml that might help you understand my configuration.
> 
> I am seeing what I think is strange behavior when I try to use the geofilt in 
> a filter query.
> Here's what I am seeing:
> 
> 
> 1.   If put the {!geofilt} as the last argument of the fq= parameter and 
> I send the following distributed query to my sharded index:
> /select?&start=0&rows=30&q=food&fq=b_type:shops AND 
> {!geofilt}&qt=spatialdismax&fl=*%2Cscore&facet=false&pt=38.029191,-78.479266&sfield=lat_long&d=80&shards=...
> I get a syntax error. Which seems odd to me.
> 
> 
> 2.   If I move the {!geofilt} to the first position in the fq= and send 
> the following distributed query:
> /select?&start=0&rows=30&q=food&fq={!geofilt} AND 
> b_type:T01&qt=spatialdismax&fl=*%2Cscore&facet=false&pt=38.029191,-78.479266&sfield=lat_long&d=80&shards=...
> Then only the geofilt is apply, not the b_type:T01. Which seems odd to me. I 
> would expect both filters to be applied.
> 
> 
> 3.   Finally, when I submit this query as:
> /select?&start=0&rows=30&q=food&fq=_query_:"{!geofilt} AND 
> b_type:T01&qt=spatialdismax&fl=*%2Cscore&facet=false&pt=38.029191,-78.479266&sfield=lat_long&d=80&shards=...
> This works as I had hoped, i.e. both the geofilt and the b_type filters are 
> applied.
> 
> Am I trying to use geofilt in the wrong way or is this possibly a bug?
> 
> Thanks,
> Jerry Mindek
> 
> 
> 
> 
>  />
> 
> 
> 
> 
> 
> 
> 
>  subFieldSuffix="_coordinate"/>
> ...
> 
> 
> 
> 
> 
> 
>
> score desc
> true
> 1
> explicit
> 20
> 0.01
> 
>cn^2.0 t1^2.0 ts^2.0 lb^2.0 s4_s^2.0 sim^2.0
> 
> 
>cn^2.0 t1^2.0 ts^2.0 lb^2.0 s4_s^2.0 sim^2.0
> 
>   dn, cn,  t1, stat, pst, pct, ts, sv, score
> 
>2<-1 5<-2 6<90%
> 
> 100
> *:*
>
>  
> 
> 


Re: question regarding proper placement of geofilt in fq=

2011-03-10 Thread Bill Bell
Also, _query_ is the right approach when combining two boolean clauses in one
fq. Just make sure you double-quote the "{!geofilt}" when using that.

Bill Bell
Sent from mobile
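That is, the quoted form would be (a sketch):

fq=_query_:"{!geofilt}" AND b_type:T01

The quotes isolate the local-params subquery, so the rest of the fq is parsed
as a normal boolean clause.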


On Mar 10, 2011, at 9:33 AM, Jerry Mindek  wrote:

> Hi,
> 
> I am using rev 1036236 of solr trunk running as a servlet in Tomcat 7.
> The doc set is sharded over 11 shards.
> Currently, I have all the shards running in a single tomcat.
> 
> Please see the bottom of the email for the bits of my schema.xml and 
> solrconfig.xml that might help you understand my configuration.
> 
> I am seeing what I think is strange behavior when I try to use the geofilt in 
> a filter query.
> Here's what I am seeing:
> 
> 
> 1.   If put the {!geofilt} as the last argument of the fq= parameter and 
> I send the following distributed query to my sharded index:
> /select?&start=0&rows=30&q=food&fq=b_type:shops AND 
> {!geofilt}&qt=spatialdismax&fl=*%2Cscore&facet=false&pt=38.029191,-78.479266&sfield=lat_long&d=80&shards=...
> I get a syntax error. Which seems odd to me.
> 
> 
> 2.   If I move the {!geofilt} to the first position in the fq= and send 
> the following distributed query:
> /select?&start=0&rows=30&q=food&fq={!geofilt} AND 
> b_type:T01&qt=spatialdismax&fl=*%2Cscore&facet=false&pt=38.029191,-78.479266&sfield=lat_long&d=80&shards=...
> Then only the geofilt is apply, not the b_type:T01. Which seems odd to me. I 
> would expect both filters to be applied.
> 
> 
> 3.   Finally, when I submit this query as:
> /select?&start=0&rows=30&q=food&fq=_query_:"{!geofilt} AND 
> b_type:T01&qt=spatialdismax&fl=*%2Cscore&facet=false&pt=38.029191,-78.479266&sfield=lat_long&d=80&shards=...
> This works as I had hoped, i.e. both the geofilt and the b_type filters are 
> applied.
> 
> Am I trying to use geofilt in the wrong way or is this possibly a bug?
> 
> Thanks,
> Jerry Mindek
> 
> 
> 
> 
>  />
> 
> 
> 
> 
> 
> 
> 
>  subFieldSuffix="_coordinate"/>
> ...
> 
> 
> 
> 
> 
> 
>
> score desc
> true
> 1
> explicit
> 20
> 0.01
> 
>cn^2.0 t1^2.0 ts^2.0 lb^2.0 s4_s^2.0 sim^2.0
> 
> 
>cn^2.0 t1^2.0 ts^2.0 lb^2.0 s4_s^2.0 sim^2.0
> 
>   dn, cn,  t1, stat, pst, pct, ts, sv, score
> 
>2<-1 5<-2 6<90%
> 
> 100
> *:*
>
>  
> 
> 


Re: docBoost

2011-03-10 Thread Brian Lamb
Okay I think I have the idea:

<script><![CDATA[
function f1(row) {
    if (some_condition) {
        row.put('$docBoost', 1.5);
    }
    return row;
}
]]></script>

Regular search: http://localhost/solr/search/?q=dog
Boosted search: http://localhost/solr/search?q=dog&boost=true

To achieve this, would it be applied in the data import handler? If so, what
would I need to put in for some_condition?

Thanks for all the help so far. I truly do appreciate it.

Thanks,

Brian Lamb
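As an invented example, some_condition could test another column pulled in by
the entity query, e.g.:

    if (row.get('animal_type') == 'endangered') {
        row.put('$docBoost', 1.5);
    }

Note, though, that $docBoost is applied at index time, so a boost=true request
parameter cannot switch it on and off per query; per-query switching would need
query-time boosting (bq/bf) instead.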

On Wed, Mar 9, 2011 at 11:50 PM, Bill Bell  wrote:

> Yes just add if statement based on a field type and do a row.put() only if
> that other value is a certain value.
>
>
>
> On 3/9/11 1:39 PM, "Brian Lamb"  wrote:
>
> >That makes sense. As a follow up, is there a way to only conditionally use
> >the boost score? For example, in some cases I want to use the boost score
> >and in other cases I want all documents to be treated equally.
> >
> >On Wed, Mar 9, 2011 at 2:42 PM, Jayendra Patil
> > >> wrote:
> >
> >> you can use the ScriptTransformer to perform the boost calcualtion and
> >> addition.
> >> http://wiki.apache.org/solr/DataImportHandler#ScriptTransformer
> >>
> >> <script><![CDATA[
> >> function f1(row) {
> >>     // Add boost
> >>     row.put('$docBoost', 1.5);
> >>     return row;
> >> }
> >> ]]></script>
> >>
> >> <entity name="x" transformer="script:f1"
> >>         query="select * from X">
> >>     ...
> >> </entity>
> >>
> >> Regards,
> >> Jayendra
> >>
> >>
> >> On Wed, Mar 9, 2011 at 2:01 PM, Brian Lamb
> >>  wrote:
> >> > Anyone have any clue on this on?
> >> >
> >> > On Tue, Mar 8, 2011 at 2:11 PM, Brian Lamb <
> >> brian.l...@journalexperts.com>wrote:
> >> >
> >> >> Hi all,
> >> >>
> >> >> I am using dataimport to create my index and I want to use docBoost
> >>to
> >> >> assign some higher weights to certain docs. I understand the concept
> >> behind
> >> >> docBoost but I haven't been able to find an example anywhere that
> >>shows
> >> how
> >> >> to implement it. Assuming the following config file:
> >> >>
> >> >> 
> >> >> >> >>   dataSource="animals"
> >> >>   pk="id"
> >> >>   query="SELECT * FROM animals">
> >> >> 
> >> >> 
> >> >> 
> >> >>  >> >>dataSource="boosts"
> >> >>query="SELECT boost_score FROM boosts WHERE animal_id
> >>=
> >> ${
> >> >> animal.id}">
> >> >>   

Re: Sorting

2011-03-10 Thread Brian Lamb
Any ideas on this one?

On Wed, Mar 9, 2011 at 2:00 PM, Brian Lamb wrote:

> Hi all,
>
> I know that I can add &sort=score desc to the url to sort in descending
> order. However, I would like to sort a MoreLikeThis response which returns
> records like this:
>
> 
>   
>   
> 
>
> I don't want them grouped by result; I would just like have them all thrown
> together and then sorted according to score. I have an XSLT which does put
> them altogether and returns the following:
>
> <results>
>   <result>
>     <score>x.xxx</score>
>     <id>some_id</id>
>   </result>
> </results>
>
> However, it appears that it basically applies the stylesheet to result
> name="3" and then to result name="2".
>
> How can I make it so that, with my XSLT, the results appear sorted by
> score?
>
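
One sketch of how the merge-and-sort could look inside the stylesheet,
assuming the stock Solr XML response format and that the stylesheet already
has a template matching doc (the results wrapper element is illustrative):
xsl:sort inside xsl:apply-templates orders the selected nodes before the
templates run, so all doc elements can be pulled out of every result block
and sorted globally by score:

<xsl:template match="/">
  <results>
    <!-- select every doc from every result block, then sort numerically
         on its score field, highest first -->
    <xsl:apply-templates select="//result/doc">
      <xsl:sort select="float[@name='score']" data-type="number"
                order="descending"/>
    </xsl:apply-templates>
  </results>
</xsl:template>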


Solr

2011-03-10 Thread yazhini.k vini
Hi,

I need notes and details about Solr, because I am now working with Solr and
need help.


Regards ,

Yazhini . K
 NCSI ,
 M.Sc ( Software Engineering ) .


Re: Possible to sort in .xml file?

2011-03-10 Thread Chris Hostetter

: I know its possible to do via adding sort= , but the Perl module
: (WebService::Solr) doesn't seem to offer the option to pass in this value :(

according to the docs, you can pass any query params you want, including
sort, to the search method...

http://search.cpan.org/~bricas/WebService-Solr-0.11/lib/WebService/Solr.pm#search%28_$query,_\%options_%29

>> All key-value pairs supplied in \%options are serialized in the request
>> URL.


-Hoss


Re: Solr

2011-03-10 Thread Geert-Jan Brits
Start by reading  http://wiki.apache.org/solr/FrontPage and the provided
links (introduction, tutorial, etc. )

2011/3/10 yazhini.k vini 

> Hi,
>
> I need notes and details about Solr, because I am now working with Solr and
> need help.
>
>
> Regards ,
>
> Yazhini . K
>  NCSI ,
>  M.Sc ( Software Engineering ) .
>


If statements in DataImportHandler?

2011-03-10 Thread Jason Rutherglen
Is it possible to conditionally load sub-entities in
DataImportHandler, based on the gathered value of parent entities?


Re: New PHP API for Solr (Logic Solr API)

2011-03-10 Thread Liam O'Boyle
How about the Solr PHP Client (http://code.google.com/p/solr-php-client/)?
 We use this and have been quite happy with it, and it seems that it
addresses all of the concerns you expressed.

What advantages does yours offer?

Liam

On 8 March 2011 17:02, Burak  wrote:

> On 03/07/2011 12:43 AM, Stefan Matheis wrote:
>
>> Burak,
>>
>> what's wrong with the existing PHP-Extension
>> (http://php.net/manual/en/book.solr.php)?
>>
> I think "wrong" is not the appropriate word here. But if I had to summarize
> why I wrote this API:
>
> * Not everybody is enthusiastic about adding another item to an already
> long list of server dependencies. I just wanted a pure PHP option.
> * I am not a C programmer either so the ability to understand the source
> code and modify it according to my needs is another advantage.
> * Yes, a PECL package would be faster. However, in 99% of the cases, after
> everything is said, coded, and byte-code cached, my biggest bottlenecks end
> up being the database and network.
> * Last of all, choice is what open source means to me.
>
> Burak
>


-- 
Liam O'Boyle

IntelligenceBank Pty Ltd
Level 1, 31 Coventry Street Southbank, Victoria 3006, Australia
P:   +613 8618 7810   F:   +613 8618 7899   M: +61 403 88 66 44

*Awarded 2010 "Best New Business" and "Business of the Year" - Business3000
Awards*

This email and any attachments are confidential and may contain legally
privileged information or copyright material. If you are not an intended
recipient, please contact us at once by return email and then delete both
messages. We do not accept liability in connection with transmission of
information using the internet.


Solr and Permissions

2011-03-10 Thread Liam O'Boyle
Morning,

We use solr to index a range of content to which, within our application,
access is restricted by a system of user groups and permissions.  In order
to ensure that search results don't reveal information about items which the
user doesn't have access to, we need to somehow filter the results; this
needs to be done within Solr itself, rather than after retrieval, so that
the facet and result counts are correct.

Currently we do this by creating a filter query which specifies all of the
items which may be allowed to match (e.g. id: (foo OR bar OR blarg OR ...)),
but this has definite scalability issues - we're starting to run into
issues, as this can be a set of ORs of potentially unlimited size (and
practically, we're hitting the low thousands sometimes).  While we can
adjust maxBooleanClauses upwards, I understand that this has performance
implications...

So, has anyone had to implement something similar in the past?  Any
suggestions for a more scalable approach?  Any advice on safe and sensible
limits on how far I can push maxBooleanClauses?

Thanks for your advice,

Liam


Re: Solr and Permissions

2011-03-10 Thread Sujit Pal
How about assigning content types to documents in the index, and map
users to a set of content types they are allowed to access? That way you
will pass in fewer parameters in the fq.

-sujit

On Fri, 2011-03-11 at 11:53 +1100, Liam O'Boyle wrote:
> Morning,
> 
> We use solr to index a range of content to which, within our application,
> access is restricted by a system of user groups and permissions.  In order
> to ensure that search results don't reveal information about items which the
> user doesn't have access to, we need to somehow filter the results; this
> needs to be done within Solr itself, rather than after retrieval, so that
> the facet and result counts are correct.
> 
> Currently we do this by creating a filter query which specifies all of the
> items which may be allowed to match (e.g. id: (foo OR bar OR blarg OR ...)),
> but this has definite scalability issues - we're starting to run into
> issues, as this can be a set of ORs of potentially unlimited size (and
> practically, we're hitting the low thousands sometimes).  While we can
> adjust maxBooleanClauses upwards, I understand that this has performance
> implications...
> 
> So, has anyone had to implement something similar in the past?  Any
> suggestions for a more scalable approach?  Any advice on safe and sensible
> limits on how far I can push maxBooleanClauses?
> 
> Thanks for your advice,
> 
> Liam
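
A sketch of the same idea generalized beyond content types, with hypothetical
field and group names: index the set of groups allowed to see each document
in a multivalued field, and filter on the user's group memberships. The fq
then needs one clause per group the user belongs to, rather than one per
visible item:

<field name="acl_groups" type="string" indexed="true" stored="false"
       multiValued="true"/>

fq=acl_groups:(sales OR marketing OR contractors)

The trade-off is that a permission change on a folder or item means
re-indexing the acl_groups field of the affected documents.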



Re: If statements in DataImportHandler?

2011-03-10 Thread Gora Mohanty
On Fri, Mar 11, 2011 at 4:48 AM, Jason Rutherglen
 wrote:
> Is it possible to conditionally load sub-entities in
> DataImportHandler, based on the gathered value of parent entities?

Probably the easiest way to do that is with a transformer.
Please see the DIH Wiki page for details:
http://wiki.apache.org/solr/DataImportHandler#Transformer

Regards,
Gora


Re: If statements in DataImportHandler?

2011-03-10 Thread Jason Rutherglen
Right, but that's not within the XML, and it's unclear how to access the
upper-level entities that have already been instantiated, e.g., beyond the
given 'transform' row.

On Thu, Mar 10, 2011 at 8:02 PM, Gora Mohanty  wrote:
> On Fri, Mar 11, 2011 at 4:48 AM, Jason Rutherglen
>  wrote:
>> Is it possible to conditionally load sub-entities in
>> DataImportHandler, based on the gathered value of parent entities?
>
> Probably the easiest way to do that is with a transformer.
> Please see the DIH Wiki page for details:
> http://wiki.apache.org/solr/DataImportHandler#Transformer
>
> Regards,
> Gora
>


Re: Solr and Permissions

2011-03-10 Thread go canal
I have similar requirements.

Content type is one solution, but there are also other use cases where this is
not enough.

Another requirement: when the access permission is changed, we need to update
the field - my understanding is that we cannot, unless we re-index the whole
document again. Am I correct?
thanks,
canal





From: Sujit Pal 
To: solr-user@lucene.apache.org
Sent: Fri, March 11, 2011 10:39:27 AM
Subject: Re: Solr and Permissions

How about assigning content types to documents in the index, and map
users to a set of content types they are allowed to access? That way you
will pass in fewer parameters in the fq.

-sujit

On Fri, 2011-03-11 at 11:53 +1100, Liam O'Boyle wrote:
> Morning,
> 
> We use solr to index a range of content to which, within our application,
> access is restricted by a system of user groups and permissions.  In order
> to ensure that search results don't reveal information about items which the
> user doesn't have access to, we need to somehow filter the results; this
> needs to be done within Solr itself, rather than after retrieval, so that
> the facet and result counts are correct.
> 
> Currently we do this by creating a filter query which specifies all of the
> items which may be allowed to match (e.g. id: (foo OR bar OR blarg OR ...)),
> but this has definite scalability issues - we're starting to run into
> issues, as this can be a set of ORs of potentially unlimited size (and
> practically, we're hitting the low thousands sometimes).  While we can
> adjust maxBooleanClauses upwards, I understand that this has performance
> implications...
> 
> So, has anyone had to implement something similar in the past?  Any
> suggestions for a more scalable approach?  Any advice on safe and sensible
> limits on how far I can push maxBooleanClauses?
> 
> Thanks for your advice,
> 
> Liam


  

Re: If statements in DataImportHandler?

2011-03-10 Thread Gora Mohanty
On Fri, Mar 11, 2011 at 10:23 AM, Jason Rutherglen
 wrote:
> Right, but that's not within the XML, and it's unclear how to access the
> upper-level entities that have already been instantiated, e.g., beyond the
> given 'transform' row.

The second example for a ScriptTransformer in
http://wiki.apache.org/solr/DataImportHandler#Transformer
should give you an idea of how to proceed:
* row.get( 'category' ) gets the field 'category' from the
  current entity to which the ScriptTransformer is being
  applied.
* Fields from higher-level entities will need to be passed
  in using DIH variables. E.g., if you have a higher-level
  entity called 'parent', and are getting data from the current
  entity via a database select, e.g.,
      query="SELECT * FROM child"
  you will need to modify the query to something like
      query="SELECT * FROM child WHERE parent_id = '${parent.id}'"
  and add
      <field column="parent_id" />
  inside the current entity (cannot remember now if this is
  required, or can be dispensed with).

Regards,
Gora
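
Building on that, a minimal sketch of one way to make a sub-entity
effectively conditional, assuming SQL data sources and hypothetical table and
column names: fold the condition on the parent's value into the sub-entity's
WHERE clause, so the sub-entity simply returns no rows when the condition
does not hold:

<entity name="parent" query="SELECT id, type FROM parent">
  <!-- the second predicate makes this sub-entity a no-op for any
       parent row whose type is not 'book' -->
  <entity name="child"
          query="SELECT data FROM child
                 WHERE parent_id = '${parent.id}' AND '${parent.type}' = 'book'">
    <field column="data" name="child_data"/>
  </entity>
</entity>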


Re: DIH : modify document in sibling entity of root entity

2011-03-10 Thread Lance Norskog
The DIH is strictly tree-structured. Data flows down the tree. If the
first sibling is the root entity, nothing is used from the second
sibling. This is a configuration on which the DIH should fail.

On Thu, Mar 10, 2011 at 9:14 AM, Chantal Ackermann
 wrote:
> Hi Gora,
>
> thanks for making me read this part of the documentation again!
> This processor probably cannot do what I need out of the box but I will
> try to extend it to allow specifying a regular expression in its "where"
> attribute.
>
> Thanks!
> Chantal
>
> On Thu, 2011-03-10 at 17:39 +0100, Gora Mohanty wrote:
>> On Thu, Mar 10, 2011 at 8:42 PM, Chantal Ackermann
>>  wrote:
>> [...]
>> > Is this supposed to work at all? I haven't found anything so far on the
>> > net but I could have used the wrong keywords for searching, of course.
>> >
>> > To answer the perhaps obvious question of why I'm not using a subentity:
>> > I thought that this solution might be faster because it iterates over
>> > the second data source instead of hitting it with a query for each
>> > document.
>> [...]
>>
>> I think that what you are after can be handled by Solr's
>> CachedSqlEntityProcessor:
>> http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor
>>
>> Two major caveats here:
>> * I am not 100% sure that I have understood your requirements.
>> * The documentation for CachedSqlEntityProcessor needs to be improved.
>>   Will see if I can test it, and come up with a better example. As I have
>>   not actually used this, it could be that I have misunderstood its purpose.
>>
>> Regards,
>> Gora
>
>



-- 
Lance Norskog
goks...@gmail.com
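
For reference, the wiki's shape for this, with hypothetical tables x and y:
the sub-entity's query is run once, its rows are cached in memory, and the
where attribute performs the per-row lookup locally instead of issuing one
query per parent row:

<entity name="x" query="SELECT * FROM x">
  <!-- y is loaded in a single query, then joined in memory on xid = x.id -->
  <entity name="y" query="SELECT * FROM y"
          processor="CachedSqlEntityProcessor" where="xid=x.id"/>
</entity>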


Re: Solr and Permissions

2011-03-10 Thread Liam O'Boyle
As Canal points out,  grouping into types is not always possible.

In our case, permissions are not on a per-type level, but either on a per
"folder" (of which there can be hundreds) or per item in some cases (of
which there can be... any number at all).

Reindexing is also too slow to really be an option; some of the items use
Tika to extract content, which means that we need to re-extract the content
(a variable length of time; the average is about half a second, but on some
documents it will sit there until the connection times out).  Querying it,
modifying, then resubmitting without rerunning content extraction is still
faster, but involves sending even more data over the network; either way is
relatively slow.

Liam

On 11 March 2011 16:24, go canal  wrote:

> I have similar requirements.
>
> Content type is one solution, but there are also other use cases where this
> is not enough.
>
> Another requirement: when the access permission is changed, we need to
> update the field - my understanding is that we cannot, unless we re-index
> the whole document again. Am I correct?
>  thanks,
> canal
>
>
>
>
> 
> From: Sujit Pal 
> To: solr-user@lucene.apache.org
> Sent: Fri, March 11, 2011 10:39:27 AM
> Subject: Re: Solr and Permissions
>
> How about assigning content types to documents in the index, and map
> users to a set of content types they are allowed to access? That way you
> will pass in fewer parameters in the fq.
>
> -sujit
>
> On Fri, 2011-03-11 at 11:53 +1100, Liam O'Boyle wrote:
> > Morning,
> >
> > We use solr to index a range of content to which, within our application,
> > access is restricted by a system of user groups and permissions.  In
> order
> > to ensure that search results don't reveal information about items which
> the
> > user doesn't have access to, we need to somehow filter the results; this
> > needs to be done within Solr itself, rather than after retrieval, so that
> > the facet and result counts are correct.
> >
> > Currently we do this by creating a filter query which specifies all of
> the
> > items which may be allowed to match (e.g. id: (foo OR bar OR blarg OR
> ...)),
> > but this has definite scalability issues - we're starting to run into
> > issues, as this can be a set of ORs of potentially unlimited size (and
> > practically, we're hitting the low thousands sometimes).  While we can
> > adjust maxBooleanClauses upwards, I understand that this has performance
> > implications...
> >
> > So, has anyone had to implement something similar in the past?  Any
> > suggestions for a more scalable approach?  Any advice on safe and
> sensible
> > limits on how far I can push maxBooleanClauses?
> >
> > Thanks for your advice,
> >
> > Liam
>
>
>
>



-- 
Liam O'Boyle

IntelligenceBank Pty Ltd
Level 1, 31 Coventry Street Southbank, Victoria 3006, Australia
P:   +613 8618 7810   F:   +613 8618 7899   M: +61 403 88 66 44

*Awarded 2010 "Best New Business" and "Business of the Year" - Business3000
Awards*

This email and any attachments are confidential and may contain legally
privileged information or copyright material. If you are not an intended
recipient, please contact us at once by return email and then delete both
messages. We do not accept liability in connection with transmission of
information using the internet.


Re: Solr and Permissions

2011-03-10 Thread go canal
To be fair, I think there is a slight difference between a Content Management
system and a Search Engine.

Access control at the per-document level, at the per-type level, support for
dynamic role changes, etc. are more like content management use cases, whereas
a search solution like Solr focuses on a different set of use cases.

But in the real world, any content management system needs full-text search,
so the question is how to support search with permission control.

Jackrabbit integrates with Lucene/Tika; this could be one solution, but I do
not know its performance and scalability.

CouchDB also integrates with Lucene/Tika - another option?

I have yet to see a Search Engine that provides the sort of Content Management
features we are discussing here (Solr, ElasticSearch?).

Then the last option is probably to build an application that pairs a document
repository, with all the necessary content management features, with Solr for
search capability, handling the permissions outside Solr?
thanks,
canal





From: Liam O'Boyle 
To: solr-user@lucene.apache.org
Cc: go canal 
Sent: Fri, March 11, 2011 2:28:19 PM
Subject: Re: Solr and Permissions

As Canal points out,  grouping into types is not always possible.

In our case, permissions are not on a per-type level, but either on a per
"folder" (of which there can be hundreds) or per item in some cases (of
which there can be... any number at all).

Reindexing is also too slow to really be an option; some of the items use
Tika to extract content, which means that we need to re-extract the content
(a variable length of time; the average is about half a second, but on some
documents it will sit there until the connection times out).  Querying it,
modifying, then resubmitting without rerunning content extraction is still
faster, but involves sending even more data over the network; either way is
relatively slow.

Liam

On 11 March 2011 16:24, go canal  wrote:

> I have similar requirements.
>
> Content type is one solution, but there are also other use cases where this
> is not enough.
>
> Another requirement: when the access permission is changed, we need to
> update the field - my understanding is that we cannot, unless we re-index
> the whole document again. Am I correct?
>  thanks,
> canal
>
>
>
>
> 
> From: Sujit Pal 
> To: solr-user@lucene.apache.org
> Sent: Fri, March 11, 2011 10:39:27 AM
> Subject: Re: Solr and Permissions
>
> How about assigning content types to documents in the index, and map
> users to a set of content types they are allowed to access? That way you
> will pass in fewer parameters in the fq.
>
> -sujit
>
> On Fri, 2011-03-11 at 11:53 +1100, Liam O'Boyle wrote:
> > Morning,
> >
> > We use solr to index a range of content to which, within our application,
> > access is restricted by a system of user groups and permissions.  In
> order
> > to ensure that search results don't reveal information about items which
> the
> > user doesn't have access to, we need to somehow filter the results; this
> > needs to be done within Solr itself, rather than after retrieval, so that
> > the facet and result counts are correct.
> >
> > Currently we do this by creating a filter query which specifies all of
> the
> > items which may be allowed to match (e.g. id: (foo OR bar OR blarg OR
> ...)),
> > but this has definite scalability issues - we're starting to run into
> > issues, as this can be a set of ORs of potentially unlimited size (and
> > practically, we're hitting the low thousands sometimes).  While we can
> > adjust maxBooleanClauses upwards, I understand that this has performance
> > implications...
> >
> > So, has anyone had to implement something similar in the past?  Any
> > suggestions for a more scalable approach?  Any advice on safe and
> sensible
> > limits on how far I can push maxBooleanClauses?
> >
> > Thanks for your advice,
> >
> > Liam
>
>
>
>



-- 
Liam O'Boyle

IntelligenceBank Pty Ltd
Level 1, 31 Coventry Street Southbank, Victoria 3006, Australia
P:   +613 8618 7810   F:   +613 8618 7899   M: +61 403 88 66 44

*Awarded 2010 "Best New Business" and "Business of the Year" - Business3000
Awards*

This email and any attachments are confidential and may contain legally
privileged information or copyright material. If you are not an intended
recipient, please contact us at once by return email and then delete both
messages. We do not accept liability in connection with transmission of
information using the internet.



  

Problem with copyfield

2011-03-10 Thread nidhi gupta
I want to implement a type-ahead feature for the description field. For that I
defined an ngtext field type. I indexed description as text and then, using
copyField, indexed it into an ngtext field. But I found that it is not working.
If I put ngtext directly as the field's type, without using copyField, it works
fine.
I am not able to understand the reason behind this.
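
A minimal sketch of the usual setup, with hypothetical field names: copyField
copies the raw incoming value (not the analyzed tokens of the source field)
into the destination, which then applies its own ngtext analysis. Two common
gotchas are querying the original field instead of the destination, and
forgetting to re-index after adding the copyField:

<field name="description" type="text" indexed="true" stored="true"/>
<field name="description_ng" type="ngtext" indexed="true" stored="false"/>
<copyField source="description" dest="description_ng"/>

Type-ahead queries then have to target the destination field explicitly, e.g.
q=description_ng:des rather than q=description:des.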