How to display search results of solr in to other application.
Hi, I am creating indexes using Solr, which is running on a Jetty server on port 8983, and my application is running on a Tomcat server on port 8080. My problem is that I want to display the search results in my application. I created an AJAX/JavaScript page for parsing the JSON object. Please suggest how I can send my request to the Solr server for a search and get the results back. (My sample HTML file where I parse the JSON data, a small page with a query box and a "Raw JSON String" output area, was stripped by the list.) I suppose I am making a mistake in xmlhttpPost("/solr/db/select").

Thanks and regards
Romi
tika and solr 3.1 integration
Hi, I am trying to integrate Solr 3.1 and Tika (the version that ships with it). When I try to index a few documents with a curl command, I get the error that the attr_meta field is unknown. I checked solrconfig.xml and it looks correct to me; can you please tell me what I am missing? I copied all the jars from contrib/extraction/lib to the solr/lib folder that is in the same place as conf. I am using the default /update/extract request handler that ships with Solr (the XML of its configuration was stripped by the list; it is the stock setup mapping content to text and unknown fields to the ignored_ prefix).

curl "http://dev.grexit.com:8080/solr1/update/extract?literal.id=who.pdf&uprefix=attr_&fmap.content=attr_content&commit=true" -F "myfile=@/root/apache-solr-3.1.0/docs/who.pdf"

The response is: Apache Tomcat/6.0.18 - Error report. HTTP Status 400 - ERROR:unknown field 'attr_meta' (The request sent by the client was syntactically incorrect.)

Please note that I integrated Apache Tika 0.9 with apache-solr-1.4 locally on a Windows machine, and calling the program through Solr Cell works fine without any configuration changes.

Thanks
Naveen
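For reference, uprefix=attr_ renames any Tika-extracted field that is missing from the schema to attr_<name> (attr_meta, for example), so the schema must define a dynamic field covering that prefix. A minimal sketch of the line the stock 3.1 example schema.xml carries for this (the type name may differ in a custom schema):

  <dynamicField name="attr_*" type="textgen" indexed="true" stored="true" multiValued="true"/>

If no attr_* dynamic field exists (or whatever prefix uprefix names), the extract handler rejects the document with exactly this "unknown field" error.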
how to request for Json object
How do I parse JSON through AJAX when the AJAX page is on one server (Tomcat) and the JSON object comes from another server (the Solr server)? I mean, I have to make a request to another server; how can I do it?

- Thanks & Regards
Romi
Standard Request Handler Boosting
I want to know the difference between normal boosting and boosting using a FunctionQuery with the standard request handler. In the example below I want to boost field2 with a higher influence on the score.

Example: field1:... field2:...^boost
Example: field1:... AND _val_:"..."^boost

Regards
Sujatha
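For illustration, with hypothetical field names and values, the two styles look roughly like this:

  q=field1:solr field2:search^4
  q=field1:solr AND _val_:"popularity"^0.5

The first re-weights the text-match score of field2 by 4, so its contribution still depends on tf/idf of the matched terms. The second adds a clause whose score is simply the numeric value of the popularity field (scaled by 0.5), independent of term statistics. So a _val_/FunctionQuery boost lets a document's own field value feed the score, while a plain boost only rescales the relevance of a text match.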
Re: how to request for Json object
AJAX does not allow requests to another domain (the same-origin policy). The only way, unless you do the request server side, is to go through a proxy that hides the origin host, so that the AJAX request sees both servers as the same.

2011/6/2 Romi
> How do I parse JSON through AJAX when the AJAX page is on one server
> (Tomcat) and the JSON object comes from another server (the Solr server)?
> I mean, I have to make a request to another server; how can I do it?
>
> - Thanks & Regards
> Romi
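One rough sketch of such a proxy (assuming an Apache httpd sits in front of the Tomcat application, with mod_proxy and mod_proxy_http enabled; adjust host and port):

  ProxyPass        /solr http://localhost:8983/solr
  ProxyPassReverse /solr http://localhost:8983/solr

The AJAX code then requests /solr/db/select?... on the same origin as the page, and Apache forwards it to the Jetty/Solr instance.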
Re: how to request for Json object
Look at the uploaded file here: it makes a request from my local server for JSON to the server http://api.flickr.com. I just want the same thing: to request JSON from my local server to the Solr server.

http://lucene.472066.n3.nabble.com/file/n3014191/Jquery_Json.html Jquery_Json.html

- Thanks & Regards
Romi
Re: how to request for Json object
Sorry for the inconvenience; please look at this file:
http://lucene.472066.n3.nabble.com/file/n3014224/JsonJquery.text JsonJquery.text

- Thanks & Regards
Romi
Re: Solr memory consumption
> Hey Denis,
> * How big is your index in terms of number of documents and index size?

5 cores, average 250.000 documents, one with about 1 million (but without text, just int/float fields), one with about 10 million id/name documents, but with n-gram. Size: 4 databases about 1G (sum), 1 database (with n-gram) of 21G. I don't know any other way to search for product names except n-gram =\

> * Is it production system where you have many search requests?

Yes, it depends on the database, but not less than 100 req/sec.

> * Is there any pattern for OOM errors? I.e. right after you start your
> Solr app, after some search activity or specific Solr queries, etc?

No, Java's memory usage just keeps growing until it crashes.

> * What are 1) cache settings 2) facets and sort-by fields 3) commit
> frequency and warmup queries?

All settings are default (as given in trunk / example). Facets are used, sort-by is also used. Commits are divided into 2 groups:
- often but small (last-changed info)
- once per day, the whole database

> etc
> Generally you might want to connect to your jvm using jconsole tool
> and monitor your heap usage (and other JVM/Solr numbers)
> * http://java.sun.com/developer/technicalArticles/J2SE/jconsole.html
> * http://wiki.apache.org/solr/SolrJmx#Remote_Connection_to_Solr_JMX
> HTH,
> Alexey
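As a follow-up on the jconsole suggestion: for a remote box the JVM is typically started with the standard JMX system properties, roughly like this (the port is arbitrary, and disabling auth/SSL is only acceptable on a trusted network):

  java -Dcom.sun.management.jmxremote \
       -Dcom.sun.management.jmxremote.port=9010 \
       -Dcom.sun.management.jmxremote.authenticate=false \
       -Dcom.sun.management.jmxremote.ssl=false \
       -jar start.jar

Then point jconsole at host:9010 and watch the heap over time. A sawtooth that keeps trending upward until OOM usually points at caches (filterCache, or the field caches used for sorting and faceting) rather than a plain leak.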
Multilingual text analysis
Hello, some of the analyzers that can be applied to a text field depend on the language of the text being analyzed and can be configured for a concrete language. In my case, the text fields can be in many different languages, but each document also includes a field containing the language of the text fields. Is it possible to configure the analyzers to use the suitable language for each document, based on that language field?

Thanks,

Juan
Re: synonyms problem
On Thu, Jun 2, 2011 at 11:58 AM, deniz wrote:
> Hi all,
>
> here is a piece from my solrconfig:
[...]
> but somehow synonyms are not read... I mean there is no match when i use a
> word in the synonym file... any ideas?
[...]

Please provide further details, e.g.: is your field in schema.xml using this fieldType, one example line from the synonyms.txt file, how are you searching, what results you expect to get, and what the actual results are.

Also, while this is not the issue here, normally the fieldType "string" is a non-analyzed field, and one would normally use a different fieldType, e.g. "text", for data that are to be analyzed.

Regards,
Gora
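For reference, a minimal sketch of a schema.xml fieldType that actually applies synonyms, along the lines of the stock example schema (adjust the filter chain to taste):

  <fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

with synonyms.txt containing lines like "TV, television". Note that the type is solr.TextField, not solr.StrField, and that the SynonymFilterFactory sits inside an index-time analyzer; both points come up again later in this thread.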
Question about sorting by coordination factor
Hi, I am trying to solve a sorting problem using Solr. The sorting requirements are a bit complicated. I have to sort the documents by three different criteria: - First by number of keywords that match (coordination factor) - Then, within the documents that match the same number of keywords, sort first the documents that match a user value (country) and then the rest. - Then within those two blocks, sort by a document value (popularity). I have managed to make the second and third criteria to work, with a query like this: http://localhost:8983/solr/select/?q=description%3Afootball&version=2.2&start=0&rows=10&indent=on&qq=country_uk:true&sort=map%28query%28$qq,-1%29,0,999,1%29%20desc,popularity%20desc This gets with the query function a positive value for the documents that match the country, and a negative for the ones that don't, and then maps those ones to 1, so I have two blocks of documents with sorting value of 1 and -1, which works for me cause ties are then sorted by popularity. But as you see, this is only searching for 1 keyword. My problem comes with the first requirement when we search for more than one keyword, because as I understand, I would like to sort by the coordination factor, which is the number of query keywords that each document matches. The problem is that there's no Function Query I can use to get that value, so I don't know how to proceed. I was trying to understand if there was a way to split the regular score into sets which should mean that the same number of keywords was matched, but the score depends on different things, and the range of values can be arbitrary, so I'm not able to make such a function. Is there any solution to this? Thanks, Jesus.
Sorting algorithm
Hi,

I want a sorting function query similar to the way reddit handles its ranking. I have the date stored in a TrieDate field (precisionStep="6", positionIncrementGap="0"). I also have the number of Twitter and Facebook shares, and reads from our site, stored. Below is the pseudo-code I want to work out.

var t = (CreationDate - 1131428803) / 1000;
var x = FacebookCount + TwitterCount + VoteCount - DownVoteCount;
var y = 0;
if (x > 0) {
  y = 1;
} else if (x == 0) {
  y = 0;
} else if (x < 0) {
  y = -1;
}
var z = 1;
var absX = Math.abs(x);
if (absX >= 1) {
  z = absX;
}
var ranking = (Math.log(z) / Math.LN10) + ((y * t) / 45000);

I have no Java experience so I cannot re-write it as a custom function. This is the current query I am trying to use:

http://127.0.0.1:8983/solr/select?q.alt=*:*&fq=content_type:news&start=0&rows=10&wt=json&indent=on&omitHeader=true
&fl=id,name,excerpt,timestamp,domain,source,facebook,twitter,read,imageheight
&defType=dismax
&tt=div(sub(_val_:timestamp,1131428803),1000)
&xx=sub(sum(facebook,twitter,read),0)
&yy=map(query($xx),1,,1,map(query($xx),0,0,0,map(query($xx),-,-1,-1,0)))
&zz=map(abs(query($xx)),-9,0,1)
&sort=sum(div(log(query($zz)),ln(10)),div(product(query($yy),query($tt)),45000)) desc

Currently I am getting errors relating to my date field when trying to convert it from the TrieDate to a timestamp with _val_:MyDateField. I also wanted to know if there is another way to do this, and if my query is even correct.

Thanks in advance

Richard
Re: synonyms problem
Deniz,

it looks like you are missing an index analyzer, or have you removed that for brevity?

lee c

On 2 June 2011 10:41, Gora Mohanty wrote:
> On Thu, Jun 2, 2011 at 11:58 AM, deniz wrote:
>> Hi all,
>>
>> here is a piece from my solrconfig:
> [...]
>> but somehow synonyms are not read... I mean there is no match when i use a
>> word in the synonym file... any ideas?
> [...]
>
> Please provide further details, e.g.: is your field in schema.xml using
> this fieldType, one example line from the synonyms.txt file, how are
> you searching, what results you expect to get, and what the actual
> results are.
>
> Also, while this is not the issue here, normally the fieldType
> "string" is a non-analyzed field, and one would normally use
> a different fieldType, e.g. "text", for data that are to be analyzed.
>
> Regards,
> Gora
Re: synonyms problem
Oh, and it's a string field (class="solr.StrField"); change this to a text field if you need analysis.

lee c

On 2 June 2011 11:45, lee carroll wrote:
> Deniz,
>
> it looks like you are missing an index analyzer, or have you removed
> that for brevity?
>
> lee c
> [...]
Re: Multilingual text analysis
Juan

I don't think so.

You can try indexing fields like myfield_en, myfield_fr, myfield_xx if you know what language you are dealing with at index and query time.

You can also have separate cores for your documents, one per language, if you don't want to complicate your schema; again you will need to know the language at index and query time.

On 2 June 2011 08:57, Juan Antonio Farré Basurte wrote:
> Hello, some of the analyzers that can be applied to a text field depend on
> the language of the text being analyzed and can be configured for a
> concrete language. In my case, the text fields can be in many different
> languages, but each document also includes a field containing the language
> of the text fields. Is it possible to configure the analyzers to use the
> suitable language for each document, based on that language field?
> Thanks,
>
> Juan
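A minimal sketch of the per-language-fields approach described above (type and field names are made up): one fieldType per language, plus dynamic fields to route each document's text into the right one at index time:

  <fieldType name="text_en" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="English"/>
    </analyzer>
  </fieldType>
  <fieldType name="text_fr" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="French"/>
    </analyzer>
  </fieldType>

  <dynamicField name="*_en" type="text_en" indexed="true" stored="true"/>
  <dynamicField name="*_fr" type="text_fr" indexed="true" stored="true"/>

Since in this thread each document carries its language in a field, the indexing code picks body_en or body_fr per document, and the query side searches the field matching the user's chosen language.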
Re: synonyms problem
Are you sure solr.StrField is the way to go with this? solr.StrField stores the entire text verbatim and, I am pretty sure, skips any analysis. Perhaps you should use solr.TextField instead.

François

On Jun 2, 2011, at 2:28 AM, deniz wrote:
> Hi all,
>
> here is a piece from my solrconfig: [the fieldType XML was stripped by the
> list; it was a StrField-based type with a SynonymFilterFactory using
> ignoreCase="true" expand="true"]
>
> but somehow synonyms are not read... I mean there is no match when i use a
> word in the synonym file... any ideas?
>
> -
> Zeki ama calismiyor... Calissa yapar...
query routing with shards
Hello all, We have currently several pretty fat logically isolated shards with the same schema / solrconfig (indices are separate). We currently have one single front end SOLR (1.4) for the client code calls. Since a client code query usually hits only one shard, we are considering making a smart routing of queries to the shards they map to. Can you please give some pointers as to what would be an optimal way to achieve such a routing inside the front end solr? Is there a way to configure mapping inside the solrconfig? Thanks. -- Regards, Dmitry Kan
Re: How to display search results of solr in to other application.
This is from another post and could help. Can you use a JavaScript library which handles AJAX and JSON/JSONP? You will end up with much cleaner client code. For example, a jQuery implementation looks quite nice using Solr's neat JSONP support:

queryString = "*:*"
$.getJSON(
  "http://[server]:[port]/solr/select/?jsoncallback=?",
  {"q": queryString,
  "version": "2.2",
  "start": "0",
  "rows": "10",
  "indent": "on",
  "json.wrf": "callbackFunctionToDoSomethingWithOurData",
  "wt": "json",
  "fl": "field1"}
);

and the callback function:

function callbackFunctionToDoSomethingWithOurData(solrData) {
  // do stuff with your nice data
}

There is also a JavaScript client for Solr, but I've not used it.

On 2 June 2011 08:14, Romi wrote:
> Hi, I am creating indexes using Solr, which is running on a Jetty server on
> port 8983, and my application is running on a Tomcat server on port 8080.
> My problem is that I want to display the search results in my application.
> [...]
> I suppose I am making a mistake in xmlhttpPost("/solr/db/select").
>
> Thanks and regards
> Romi
Re: how to request for Json object
Use Solr's JSONP format.

On 2 June 2011 08:54, Romi wrote:
> Sorry for the inconvenience; please look at this file:
> http://lucene.472066.n3.nabble.com/file/n3014224/JsonJquery.text
> JsonJquery.text
>
> - Thanks & Regards
> Romi
Re: Result Grouping always returns grouped output
Hi Karel, group.main=true should do the trick. When that is set to true the group.format is always simple. Martijn On 27 May 2011 19:13, kare...@gmail.com wrote: > Hello, > > I am using the latest nightly build of Solr 4.0 and I would like to > use grouping/field collapsing while maintaining compatibility with my > current parser. I am using the regular webinterface to test it, the > same commands like in the wiki, just with the field names matching my > dataset. > > Grouping itself works, group=true and group.field return the expected > results, but neither group.main=true or group.format=simple seem to > change anything. > > Do I have to include something special in solrconconfig.xml or > scheme.xml to make the simple output work? > > Thanks for any hints, > K > -- Met vriendelijke groet, Martijn van Groningen
Re: how to request for Json object
This is not really an issue with Solr per se, and I have run into this before. You will need to read up on 'Access-Control-Allow-Origin', which needs to be set in the HTTP headers returned by the server you request the JSON from (here, the Solr server). Beware that not all browsers obey it, and Olivier is right when he suggested creating a proxy, which is what I did.

François

On Jun 2, 2011, at 3:27 AM, Romi wrote:
> How do I parse JSON through AJAX when the AJAX page is on one server
> (Tomcat) and the JSON object comes from another server (the Solr server)?
> I mean, I have to make a request to another server; how can I do it?
>
> - Thanks & Regards
> Romi
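For the record, the header itself is a single line in the HTTP response, e.g. (hypothetical origin):

  Access-Control-Allow-Origin: http://myapp.example.com:8080

As far as I know, Solr 3.x does not emit this header itself; it would have to be added by a servlet filter or a web server in front of Solr, which is one more reason the proxy route is usually simpler.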
Re: Question about sorting by coordination factor
Say you're trying to match terms A, B, C. Would something like

(A AND B AND C)^1000 OR (A AND B)^100 OR (A AND C)^100 OR (B AND C)^100 OR A OR B OR C

work? It wouldn't be an absolute ordering, but it would tend to push the documents where all three terms matched toward the top. It would get really cumbersome if there were lots of terms, though.

Best
Erick

On Thu, Jun 2, 2011 at 6:21 AM, Jesus Gabriel y Galan wrote:
> Hi,
>
> I am trying to solve a sorting problem using Solr. The sorting requirements
> are a bit complicated. I have to sort the documents by three different
> criteria:
>
> - First by number of keywords that match (coordination factor)
> - Then, within the documents that match the same number of keywords, sort
> first the documents that match a user value (country) and then the rest.
> - Then within those two blocks, sort by a document value (popularity).
> [...]
> Is there any solution to this?
>
> Thanks,
>
> Jesus.
Function Query not getting picked up by Standard Query Parser
Hello,

I'm trying to find out why my Function Query isn't getting picked up by the Standard Parser. More specifically, I send the following set of HTTP params (I'm using the "_val_" syntax; the XML parameter listing was stripped by the list, leaving only the values "creationDate"^0.01, on, 225, allFields:(born to be wild), 5 — the function was passed as its own parameter rather than inside q), and turning on debugQuery yields the following calculation for the first result:

0.29684606 = (MATCH) product of:
  0.5936921 = (MATCH) sum of:
    0.5936921 = (MATCH) weight(allFields:wild in 13093), product of:
      0.64602524 = queryWeight(allFields:wild), product of:
        5.88155 = idf(docFreq=223, maxDocs=29531)
        0.10983928 = queryNorm
      0.91899216 = (MATCH) fieldWeight(allFields:wild in 13093), product of:
        1.0 = tf(termFreq(allFields:wild)=1)
        5.88155 = idf(docFreq=223, maxDocs=29531)
        0.15625 = fieldNorm(field=allFields, doc=13093)
  0.5 = coord(1/2)

but I don't see my Function Query affecting the score anywhere. Is there something else I should be setting? What am I missing?

Cheers,
Savvas
Re: how to request for Json object
Just to reiterate: JSONP gets round the AJAX same-origin policy.

2011/6/2 François Schiettecatte :
> This is not really an issue with Solr per se, and I have run into this
> before. You will need to read up on 'Access-Control-Allow-Origin', which
> needs to be set in the HTTP headers returned by the server you request the
> JSON from (here, the Solr server). Beware that not all browsers obey it,
> and Olivier is right when he suggested creating a proxy, which is what I did.
>
> François
> [...]
Re: how to request for Json object
I did this:

$(document).ready(function(){
  $.getJSON("http://[remotehost]:8983/solr/select/?q=diamond&wt=json&json.wrf=?",
    function(result){
      alert("hello" + result.response.docs[0].name);
    });
});

But I am not getting any result. What did I do wrong?

- Thanks & Regards
Romi
'deltaImportQuery' attribute is not specified for entity : user
Hi, I'm trying to build a delta index. I have an entity called 'user' in data-config.xml like '
Re: How to display search results of solr in to other application.
I did this:

$(document).ready(function(){
  $.getJSON("http://[remotehost]:8983/solr/select/?q=diamond&wt=json&json.wrf=?",
    function(result){
      alert("hello" + result.response.docs[0].name);
    });
});

But I am not getting any result. What did I do wrong?

- Thanks & Regards
Romi
Re: 'deltaImportQuery' attribute is not specified for entity : user
Take a look at the following URL; it might help you:
http://wiki.apache.org/solr/DataImportHandler#Using_delta-import_command

- Thanks & Regards
Romi
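The message in the subject line means DIH expects a deltaImportQuery attribute on the entity alongside deltaQuery. A minimal sketch (table and column names are made up):

  <entity name="user" pk="id"
          query="select id, name from users"
          deltaQuery="select id from users where last_modified &gt; '${dataimporter.last_index_time}'"
          deltaImportQuery="select id, name from users where id='${dataimporter.delta.id}'"/>

deltaQuery returns only the primary keys of rows changed since the last import; deltaImportQuery is then run once per returned key to fetch the full row.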
Re: Function Query not getting picked up by Standard Query Parser
For this to work, _val_:"" goes *in* the q parameter, not as a separate parameter.

See here for more details:
http://wiki.apache.org/solr/SolrQuerySyntax#Differences_From_Lucene_Query_Parser

Erik

On Jun 2, 2011, at 07:46 , Savvas-Andreas Moysidis wrote:
> Hello,
>
> I'm trying to find out why my Function Query isn't getting picked up by
> the Standard Parser. More specifically, I send the following set of HTTP
> params (I'm using the "_val_" syntax), with "creationDate"^0.01 passed as
> its own parameter alongside q=allFields:(born to be wild).
> [...]
> but I don't see my Function Query affecting the score anywhere.
> Is there something else I should be setting? What am I missing?
>
> Cheers,
> Savvas
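To make that concrete, a sketch of the corrected request using the values from the original mail:

  q=allFields:(born to be wild) _val_:"creationDate"^0.01

With debugQuery=on, the explain output should then show an additional FunctionQuery clause contributing to the score sum. (Boosting by a raw date value is of limited use by itself; a recency boost is more commonly written as something like recip(ms(NOW,creationDate),3.16e-11,1,1), but that is a separate topic.)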
Re: Sorting algorithm
Hi Richard,

All your data seem to be available at indexing time, am I correct? Why don't you do the math at index time and just index the result in a field, which you can then sort on at query time?

On Thu, Jun 2, 2011 at 7:26 AM, Richard Hodsdon wrote:
> Hi,
>
> I want a sorting function query similar to the way reddit handles its
> ranking. I have the date stored in a TrieDate field (precisionStep="6",
> positionIncrementGap="0"). I also have the number of Twitter and Facebook
> shares, and reads from our site, stored.
> [...]
> Currently I am getting errors relating to my date field when trying to
> convert it from the TrieDate to a timestamp with _val_:MyDateField.
>
> I also wanted to know if there is another way to do this, and if my query
> is even correct.
>
> Thanks in advance
>
> Richard
Re: how to request for Json object
$.getJSON("http://192.168.1.9:8983/solr/db/select/?q=diamond&wt=json&json.wrf=?";, function(result){ alert("hello" + result.response.docs[0].name); }); }); using this i got the result. But as you can see it is hard coded, i am passing a query in the url how can i make it as user choice. - Thanks & Regards Romi -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-request-for-Json-object-tp3014138p3014928.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: NRT facet search options comparison
Andy:

I did not actually measure and benchmark facet search performance with and without NRT. The screenshots do show the performance impact, though, if you look at the QTime parameter in Fig 1 and Fig 2. I did not notice any appreciable difference in performance when I tested faceting with NRT. The current implementation just recreates the UnInvertedField cache, as this was easier to implement; this needs to become dynamic rather than recreated.

Regards,
- NN

On 6/1/2011 8:53 PM, Andy wrote:
> Nagendra,
>
> Thanks. Can you comment on the performance impact of NRT on facet search?
> The pages you linked to don't really touch on that.
>
> My concern is that with NRT, the facet cache will be constantly
> invalidated. How will that impact the performance of faceting? Do you have
> any benchmark comparing the performance of facet search with and without
> NRT?
>
> Thanks
> Andy
>
> --- On Wed, 6/1/11, Nagendra Nagarajayya wrote:
>> Hi Andy:
>>
>> Here is a white paper that shows screenshots of faceting working with
>> Solr and RankingAlgorithm under NRT:
>> http://solr-ra.tgels.com/wiki/en/Near_Real_Time_Search
>>
>> The implementation (src) is also available with the download and is
>> described in the document below:
>> http://solr-ra.tgels.com/papers/NRT_Solr_RankingAlgorithm.pdf
>>
>> The faceting test was done with the mbartists demo from the book "Solr
>> 1.4 Enterprise Search Server" and is approx. 390k docs.
>>
>> Regards,
>> - Nagendra Nagarajayya
>> http://solr-ra.tgels.com
>> http://rankingalgorithm.tgels.com
>>
>> On 6/1/2011 12:52 PM, Andy wrote:
>>> Hi,
>>>
>>> I need to provide NRT search with faceting. I've been looking at the
>>> options out there and wondered if anyone could clarify some questions I
>>> have and perhaps share your NRT experiences.
>>>
>>> The various NRT options:
>>>
>>> 1) Solr - Solr doesn't have NRT, yet. What is the expected time frame
>>> for NRT? Is it a few months or more like a year? How would Solr faceting
>>> work with NRT? My understanding is that faceting in Solr relies on
>>> caching, which doesn't go well with NRT updates. When NRT arrives, would
>>> facet performance take a huge drop because of this caching issue?
>>>
>>> 2) ElasticSearch - ES supports NRT, so that's great. Does anyone have
>>> experiences with ES that they could share? Does faceting work with NRT
>>> in ES? Any Solr features that are missing in ES?
>>>
>>> 3) Solr-RA - I read in this list about Solr-RA, which has NRT support.
>>> Has anyone used it? Can you share your experiences? Again, not sure if
>>> facets would work with Solr-RA NRT: Solr-RA is based on Solr, so
>>> faceting in Solr-RA relies on caching, I suppose. Does NRT affect facet
>>> performance?
>>>
>>> 4) Zoie plugin for Solr - Zoie is an NRT search library. I tried but
>>> couldn't get the Zoie plugin to work with Solr; I always got the error
>>> message of opening too many Searchers. Has anyone got this to work?
>>>
>>> Any other options?
>>>
>>> Thanks
>>> Andy
how to make getJson parameter dynamic
$.getJSON("http://192.168.1.9:8983/solr/db/select/?q=diamond&wt=json&json.wrf=?";, function(result){ alert("hello" + result.response.docs[0].name); }); }); using this i am parsing solr json response, but as you can see it is hard coded (q=diamond) how can i make it user's choice. i mean user can pass the query at run time for example using a text box. - Thanks & Regards Romi -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-make-getJson-parameter-dynamic-tp3014941p3014941.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Sorting algorithm
Thanks for the response. You are correct, but my pseudo-code was not: the line

var t = (CreationDate - 1131428803) / 1000;

should be

var t = (CreationDate - now()) / 1000;

This will cause an item's ranking to decay over time.

Richard
Re: Question about sorting by coordination factor
On 02/06/11 13:32, Erick Erickson wrote:
> Say you're trying to match terms A, B, C. Would something like
> (A AND B AND C)^1000 OR (A AND B)^100 OR (A AND C)^100 OR (B AND C)^100 OR A OR B OR C
> work? It wouldn't be an absolute ordering, but it would tend to push the
> documents where all three terms matched toward the top.

The problem with this is that it would give a better score to the documents with the most matches, but then I have to sort internally within those groups. So I'd need sort=score,xxx,yyy, and the score would not be equal for the documents which match the same number of keywords. I would need as many groups as keywords, and within each group all documents would need the same value for that sorting criterion (score or a function or whatever), so that they tie and move on to the next sorting criterion.

Thanks,

Jesus.
Re: Function Query not getting picked up by Standard Query Parser
Great, that did it! I can now see the Function Query part in the calculation.

Thanks very much Erik,
Savvas

On 2 June 2011 13:28, Erik Hatcher wrote:
> For this to work, _val_:"" goes *in* the q parameter, not as a separate
> parameter.
>
> See here for more details:
> http://wiki.apache.org/solr/SolrQuerySyntax#Differences_From_Lucene_Query_Parser
>
> Erik
> [...]
Re: Question about sorting by coordination factor
Ahhh, you're right. I know there's been some discussion in the past about how to find out the number of terms that matched, but I don't remember the outcome off-hand. You might try searching the mail archive for something like "number of matching terms" or some such.

Sorry I'm not more help
Erick

On Thu, Jun 2, 2011 at 8:48 AM, Jesus Gabriel y Galan wrote:
> On 02/06/11 13:32, Erick Erickson wrote:
>> Say you're trying to match terms A, B, C. Would something like
>> (A AND B AND C)^1000 OR (A AND B)^100 OR (A AND C)^100 OR (B AND C)^100 OR A OR B OR C
>> work? It wouldn't be an absolute ordering, but it would tend to
>> push the documents where all three terms matched toward the top.
>
> The problem with this is that it would give a better score to the documents
> with the most matches, but then I have to sort internally within those
> groups.
> [...]
>
> Thanks,
>
> Jesus.
Re: Sorting algorithm
OK, then (everything that's available at index time, I'll say it's constant):

(Math.log(z) / Math.LN10) is constant, I'll call it c1.
((y * t) / 45000) = (y/45000)*t, and y/45000 is constant, I'll call it c2.
c1 + (c2 * t) = c1 + (c2 * (CreationDate - now) / 1000), and c2/1000 is also constant, I'll call it c3.

Then your ranking formula is: c1 + (c3 * (creationDate - now)). In Solr, this will be:

  &sort=sum(c1,product(c3,ms(creationDate,NOW)))

I haven't tried it, but if my arithmetic is correct (I'm a little bit rusty with that), this should work and should be faster than doing the whole thing at query time. Of course, c1 and c3 must be indexed as fields.

Regards,
Tomás

On Thu, Jun 2, 2011 at 9:46 AM, Richard Hodsdon wrote:
> Thanks for the response. You are correct, but my pseudo-code was not:
> the line
>
> var t = (CreationDate - 1131428803) / 1000;
>
> should be
>
> var t = (CreationDate - now()) / 1000;
>
> This will cause an item's ranking to decay over time.
>
> Richard
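A worked sketch of that suggestion, with made-up numbers: for a post with x = 12 shares (so y = 1),

  c1 = log10(12) = approx. 1.079
  c3 = 1 / (45000 * 1000) = approx. 2.22e-8

Index c1 and c3 as float fields on the document, and the whole sort reduces to

  sort=sum(c1,product(c3,ms(creationDate,NOW))) desc

One thing to watch: ms() returns milliseconds, so the "/ 1000" seconds conversion from the pseudo-code has to be folded into c3 (as above), or the decay will run a thousand times too fast.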
Re: Multilingual text analysis
Juan,

An easy way in Solr, I think, is indeed to use different fields at index time and to expand the query over multiple fields at query time. I believe field-name wildcards allow you to specify a different analyzer per language when doing this.

There have been long discussions on the java-u...@lucene.apache.org mailing list about the best design for multilingual indexing and searching. One of the key arguments was whether you can reliably detect the language of a query, which is generally very hard.

It would make sense to start a page at the solr website...

paul

Le 2 juin 2011 à 12:52, lee carroll a écrit :
> Juan
>
> I don't think so.
>
> You can try indexing fields like myfield_en, myfield_fr, myfield_xx if you
> know what language you are dealing with at index and query time.
>
> You can also have separate cores for your documents, one per language, if
> you don't want to complicate your schema; again you will need to know the
> language at index and query time.
>
> On 2 June 2011 08:57, Juan Antonio Farré Basurte wrote:
> [...]
Re: Multilingual text analysis
Thank you both Paul and Lee for your answers. Luckily, in my case there's no problem knowing the language at index time, nor do we really have to worry about the language of the query, as users can specify the language they are interested in. So I guess our solution will be to use different optional fields, one for each language, and that should be good enough. I had just wondered whether it was possible to parametrize the analyzers based on one field's value; I think this would be a very elegant solution for many needs. Maybe it could be an improvement for a future version of Solr :)

Paul, what do you mean when you say it would make sense to start a page at the solr website?

Thanks again,

Juan

El 02/06/2011, a las 16:06, Paul Libbrecht escribió:
> Juan,
>
> An easy way in Solr, I think, is indeed to use different fields at index
> time and to expand the query over multiple fields at query time. I believe
> field-name wildcards allow you to specify a different analyzer per
> language when doing this.
>
> There have been long discussions on the java-u...@lucene.apache.org
> mailing list about the best design for multilingual indexing and
> searching. One of the key arguments was whether you can reliably detect
> the language of a query, which is generally very hard.
>
> It would make sense to start a page at the solr website...
>
> paul
> [...]
Re: Multilingual text analysis
Le 2 juin 2011 à 16:27, Juan Antonio Farré Basurte a écrit :
> Paul, what do you mean when you say it would make sense to start a page at
> the solr website?

I meant the Solr wiki.

> I had just wondered whether it was possible to parametrize the analyzers
> based on one field's value; I think this would be a very elegant solution
> for many needs. Maybe it could be an improvement for a future version of
> Solr :)

Honestly, I think it is of utmost importance for a CMS manager to know, more or less, "how much stemming" one wishes, so configuring which analyzer is used for which language is, I think, really useful, and the schema makes that easy to write. In one of my search projects, I have a series of unit tests that all fail because the analyzer, say, for Arabic or Hungarian, was not "good enough"... this always happens, and it's better to be aware of it.

paul
Re: How to display search results of solr in to other application.
Did you include the jQuery lib? And make sure you use the jsoncallback, i.e.

$.getJSON(
  "http://[server]:[port]/solr/select/?jsoncallback=?",
  {"q": queryString,
  "version": "2.2",
  "start": "0",
  "rows": "10",
  "indent": "on",
  "json.wrf": "callbackFunctionToDoSomethingWithOurData",
  "wt": "json",
  "fl": "field1"}
);

not what you have got.

On 2 June 2011 13:00, Romi wrote:
> I did this:
>
> $(document).ready(function(){
>   $.getJSON("http://[remotehost]:8983/solr/select/?q=diamond&wt=json&json.wrf=?",
>     function(result){
>       alert("hello" + result.response.docs[0].name);
>     });
> });
>
> But I am not getting any result. What did I do wrong?
>
> - Thanks & Regards
> Romi
Re: how to make getJson parameter dynamic
Hi Romi, this is the third thread you have created on this subject. That's not good, and it will get you ignored by many people who could help. The question relates to JS rather than Solr now: see any good JS manual or site for how to assign values to a variable and then concatenate them into a string. A sketch follows below.

lee c

On 2 June 2011 13:40, Romi wrote:
> $.getJSON("http://192.168.1.9:8983/solr/db/select/?q=diamond&wt=json&json.wrf=?",
>   function(result){
>     alert("hello" + result.response.docs[0].name);
>   });
>
> Using this I am parsing the Solr JSON response, but as you can see, the
> query is hard-coded (q=diamond). How can I make it the user's choice? I
> mean, the user should be able to pass the query at run time, for example
> via a text box.
>
> - Thanks & Regards
> Romi
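The sketch (plain jQuery; element ids are made up): read the user's text from an input and pass it as the q parameter, which jQuery URL-encodes for you:

$(document).ready(function(){
  $("#searchButton").click(function(){
    var userQuery = $("#searchBox").val();
    $.getJSON(
      "http://192.168.1.9:8983/solr/db/select/?wt=json&json.wrf=?",
      {"q": userQuery},  // values in this object are URL-encoded by jQuery
      function(result){
        alert("hello " + result.response.docs[0].name);
      });
  });
});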
Need Schema help
Hi)

What I need: index prices for products; each product has multiple prices, one per country and region. I tried a multiValued field of type "long", forming each value as "country code + region code + price" (1004000349601, for example), but it behaves strangely: price:[* TO 1004000349600] includes 1004000349601. Am I doing something wrong?

Possible data:
Country: 1-9
Region: 0-99
Price: 1-999
Re: Need Schema help
Denis, would dynamic fields help: field defined as *_price in schema at index time you index fields named like: [1-9]_[0-99]_price at query time you search the price field for a given country region 1_10_price:[10 TO 100] This may work for some use-cases i guess lee 2011/6/2 Denis Kuzmenok : > Hi) > > What i need: > Index prices to products, each product has multiple prices, to each > region, country, and price itself. > I tried to do with field type "long" multiple:true, and form > value as "country code + region code + price" (1004000349601, for > example), but it has strange behaviour.. price:[* TO 1004000349600] do > include 1004000349601.. I am doing something wrong? > > Possible data: > Country: 1-9 > Region: 0-99 > Price: 1-999 > >
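A sketch of what lee describes, with made-up names: in schema.xml,

  <dynamicField name="*_price" type="int" indexed="true" stored="true"/>

(any integer type from your schema works). A product priced 499 in country 1, region 10 is then indexed with a field 1_10_price=499, and "products between 10 and 100 in that region" becomes

  q=1_10_price:[10 TO 100]

The trade-off is one field per country/region combination (up to 9 x 100 here), which dynamic fields handle fine, whereas the packed-long scheme mixes country, region and price into one number, so a range query stops meaning "price range".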
Re: return unaltered complete multivalued fields with Highlighted results
Hi,

Here is the code for Solr 3.1 that will preserve all the text and will disable sorting.

This goes in the solrconfig.xml request handler defaults, or whichever way you pass params:

  <bool name="hl.preserveOrder">true</bool>

This line goes into the HighlightParams class:

  public static final String PRESERVE_ORDER = HIGHLIGHT + ".preserveOrder";

Replace this method, DefaultSolrHighlighter.doHighlightingByHighlighter (I only added 3 if blocks):

private void doHighlightingByHighlighter( Query query, SolrQueryRequest req, NamedList docSummaries,
    int docId, Document doc, String fieldName ) throws IOException {
  SolrParams params = req.getParams();
  String[] docTexts = doc.getValues(fieldName);
  // according to Document javadoc, doc.getValues() never returns null. check empty instead of null
  if (docTexts.length == 0) return;

  SolrIndexSearcher searcher = req.getSearcher();
  IndexSchema schema = searcher.getSchema();
  TokenStream tstream = null;
  int numFragments = getMaxSnippets(fieldName, params);
  boolean mergeContiguousFragments = isMergeContiguousFragments(fieldName, params);

  String[] summaries = null;
  List<TextFragment> frags = new ArrayList<TextFragment>();

  TermOffsetsTokenStream tots = null; // to be non-null iff we're using TermOffsets optimization
  try {
    TokenStream tvStream = TokenSources.getTokenStream(searcher.getReader(), docId, fieldName);
    if (tvStream != null) {
      tots = new TermOffsetsTokenStream(tvStream);
    }
  }
  catch (IllegalArgumentException e) {
    // No problem. But we can't use TermOffsets optimization.
  }

  for (int j = 0; j < docTexts.length; j++) {
    if( tots != null ) {
      // if we're using TermOffsets optimization, then get the next
      // field value's TokenStream (i.e. get field j's TokenStream) from tots:
      tstream = tots.getMultiValuedTokenStream( docTexts[j].length() );
    } else {
      // fall back to analyzer
      tstream = createAnalyzerTStream(schema, fieldName, docTexts[j]);
    }

    Highlighter highlighter;
    if (Boolean.valueOf(req.getParams().get(HighlightParams.USE_PHRASE_HIGHLIGHTER, "true"))) {
      // TODO: this is not always necessary - eventually we would like to avoid this wrap
      // when it is not needed.
      tstream = new CachingTokenFilter(tstream);

      // get highlighter
      highlighter = getPhraseHighlighter(query, fieldName, req, (CachingTokenFilter) tstream);

      // after highlighter initialization, reset tstream since construction of highlighter already used it
      tstream.reset();
    }
    else {
      // use "the old way"
      highlighter = getHighlighter(query, fieldName, req);
    }

    int maxCharsToAnalyze = params.getFieldInt(fieldName, HighlightParams.MAX_CHARS,
        Highlighter.DEFAULT_MAX_CHARS_TO_ANALYZE);
    if (maxCharsToAnalyze < 0) {
      highlighter.setMaxDocCharsToAnalyze(docTexts[j].length());
    } else {
      highlighter.setMaxDocCharsToAnalyze(maxCharsToAnalyze);
    }

    try {
      TextFragment[] bestTextFragments =
          highlighter.getBestTextFragments(tstream, docTexts[j], mergeContiguousFragments, numFragments);
      for (int k = 0; k < bestTextFragments.length; k++) {
        if (params.getBool( HighlightParams.PRESERVE_ORDER, false ) ) {
          // added: keep every fragment, regardless of score, so no text is dropped
          if ((bestTextFragments[k] != null) ){//&& (bestTextFragments[k].getScore() > 0)) {
            frags.add(bestTextFragments[k]);
          }
        } else {
          if ((bestTextFragments[k] != null) && (bestTextFragments[k].getScore() > 0)) {
            frags.add(bestTextFragments[k]);
          }
        }
      }
    } catch (InvalidTokenOffsetsException e) {
      throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, e);
    }
  }
  // sort such that the fragments with the highest score come first
  if (!params.getBool( HighlightParams.PRESERVE_ORDER, false ) ) {
    Collections.sort(frags, new Comparator<TextFragment>() {
      public int compare(TextFragment arg0, TextFragment arg1) {
        return Math.round(arg1.getScore() - arg0.getScore());
      }
    });
  }

  // convert fragments back into text
  // TODO: we can include score and position information in output as snippet attributes
  if (frags.size() > 0) {
    ArrayList<String> fragTexts = new ArrayList<String>();
    for (TextFragment fragment: frags) {
      if (params.getBool( HighlightParams.PRESERVE_ORDER, false ) ) {
        // added: keep every fragment in original order, up to numFragments
        if ((fragment != null) ){// && (fragment.getScore() > 0)) {
          fragTexts.add(fragment.toString());
        }
        if (fragTexts.size() >= numFragments) break;
      } else {
        if ((fragment != null) && (fragment.getScore() > 0)) {
          fragTexts.add(fragment.toString());
        }
RE: Spellcheck Phrases
Actually, someone just pointed out to me that a patch like this is unnecessary. The code works as-is if configured like this:

  <float name="thresholdTokenFrequency">.01</float>

(correct) instead of this:

  <str name="thresholdTokenFrequency">.01</str>

(incorrect). I tested this and it seems to work. I'm still trying to figure out whether using this parameter actually improves the quality of our spell suggestions, now that I know how to use it properly. Sorry about the mis-information earlier.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Dyer, James
Sent: Wednesday, June 01, 2011 3:02 PM
To: solr-user@lucene.apache.org
Subject: RE: Spellcheck Phrases

Tanner,

I just entered SOLR-2571 to fix the float-parsing bug that breaks "thresholdTokenFrequency". It's just a 1-line code fix, so I also included a patch that should cleanly apply to Solr 3.1. See https://issues.apache.org/jira/browse/SOLR-2571 for info and patches.

This parameter appears to be absent from the wiki. And as it has always been broken for me, I haven't tested it. However, my understanding is that it should be set to the minimum percentage of documents in which a term has to occur in order for it to appear in the spelling dictionary. For instance, in the config below, a term would have to occur in at least 1% of the documents to be part of the spelling dictionary. This might be a good setting for long fields, but for the short fields in my application I was thinking of setting this to something like 1/1000 of 1% ...

  <str name="queryAnalyzerFieldType">text</str>
  <lst name="spellchecker">
    <str name="name">Spelling_Dictionary</str>
    <str name="field">text</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <float name="thresholdTokenFrequency">.01</float>
  </lst>

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Tanner Postert [mailto:tanner.post...@gmail.com]
Sent: Friday, May 27, 2011 6:04 PM
To: solr-user@lucene.apache.org
Subject: Re: Spellcheck Phrases

are there any updates on this? any third party apps that can make this work as expected?

On Wed, Feb 23, 2011 at 12:38 PM, Dyer, James wrote:
> Tanner,
>
> Currently Solr will only make suggestions for words that are not in the
> dictionary, unless you specify "spellcheck.onlyMorePopular=true". However,
> if you do that, then it will try to "improve" every word in your query,
> even the ones that are spelled correctly (so while it might change "brake"
> to "break", it might also change "leg" to "log").
>
> You might be able to alleviate some of the pain by setting the
> "thresholdTokenFrequency" so as to remove misspelled and rarely-used words
> from your dictionary, although I personally haven't been able to get this
> parameter to work. It also doesn't seem to be documented on the wiki, but
> it is in the 1.4.1 source code, in class IndexBasedSpellChecker. It's also
> mentioned in Smiley & Pugh's book. I tried setting it like this, but got a
> ClassCastException on the float value:
>
>   <str name="queryAnalyzerFieldType">text_spelling</str>
>   <lst name="spellchecker">
>     <str name="name">Spelling_Dictionary</str>
>     <str name="field">text_spelling</str>
>     <str name="buildOnCommit">true</str>
>     <str name="thresholdTokenFrequency">.001</str>
>   </lst>
>
> I have it on my to-do list to look into this further but haven't yet. If
> you decide to try it and can get it to work, please let me know how you
> do it.
>
> James Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311
>
> -----Original Message-----
> From: Tanner Postert [mailto:tanner.post...@gmail.com]
> Sent: Wednesday, February 23, 2011 12:53 PM
> To: solr-user@lucene.apache.org
> Subject: Spellcheck Phrases
>
> right now when I search for 'brake a leg', solr returns valid results with
> no indication of misspelling, which is understandable since all of those
> terms are valid words and are probably found in a few pieces of our
> content.
>
> My question is:
>
> is there any way for it to recognize that the phrase should be "break a
> leg" and not "brake a leg" and suggest the proper phrase?
>
Re: Need Schema help
Thursday, June 2, 2011, 6:29:23 PM, you wrote: Wow. This sounds nice. Will try this way. Thanks! > Denis, > would dynamic fields help: > field defined as *_price in schema > at index time you index fields named like: > [1-9]_[0-99]_price > at query time you search the price field for a given country region > 1_10_price:[10 TO 100] > This may work for some use-cases i guess > lee
SolrJ and Range Faceting
Currently the range and date faceting in SolrJ acts a bit differently than I would expect. Specifically, range facets aren't parsed at all and date facets end up generating filterQueries which don't have the range, just the lower bound. Is there a reason why SolrJ doesn't support these? I have written some things on my end to handle these and generate filterQueries for date ranges of the form dateTime:[start TO end] and I have a function (which I copied from the date faceting) which parses the range facets, but would prefer not to have to maintain these myself. Is there a plan to implement these? Also is there a plan to update FacetField to not have end be a date, perhaps making it a String like start so we can support date and range queries?
Re: return unaltered complete multivalued fields with Highlighted results
Hmmm, I don't know a thing about the highlighter code, but if you can just make a patch and create a JIRA (https://issues.apache.org/jira/browse/SOLR) and attach it, it'll get "in the system". I suspect you've seen this page, but just in case: http://wiki.apache.org/solr/HowToContribute See, especially, "Yonik's Law of Patches" on that page...

Two questions:
1> after your changes, could you successfully run "ant test"?
2> can you supply any unit tests that illustrate the correct behavior here?

Even if both answers are "no", it's still probably a good idea to submit the patch. Although first it might be a good idea to discuss this on the dev list (d...@lucene.apache.org) before opening a JIRA; it's possible that there's something similar in the works already...

Best
Erick

On Thu, Jun 2, 2011 at 11:31 AM, alexei wrote:
> Hi,
>
> Here is the code for Solr 3.1 that will preserve all the text and will
> disable sorting.
> [...]
Re: Need Schema help
This range behavior doesn't make sense. Are you completely sure you're not dropping a digit out someplace? Best Erick 2011/6/2 Denis Kuzmenok : > Hi) > > What i need: > Index prices to products, each product has multiple prices, to each > region, country, and price itself. > I tried to do with field type "long" multiple:true, and form > value as "country code + region code + price" (1004000349601, for > example), but it has strange behaviour.. price:[* TO 1004000349600] do > include 1004000349601.. I am doing something wrong? > > Possible data: > Country: 1-9 > Region: 0-99 > Price: 1-999 > >
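For what it's worth, a composite numeric code only range-scans cleanly if every component has a fixed width. A sketch of what I assume the intended packing is (digit counts taken from the mail; note the example value in the original post has far more digits than this scheme allows, which fits the dropped-digit theory):

// 1-digit country (1-9), 2-digit region (00-99), 3-digit price (001-999)
long encode(int country, int region, int price) {
    return country * 100000L + region * 1000L + price;   // (1, 4, 349) -> 104349
}
// with fixed widths, price:[104000 TO 104349] stays inside country 1, region 04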
Is there a way to get all the hits and score them later?
Basically I don't want the hits and the scores at the same time. I want to get a list of hits but I want to score them myself externally (there is a dedicated server that will do the scoring given a list of id's). Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Is-there-a-way-to-get-all-the-hits-and-score-them-later-tp3016424p3016424.html Sent from the Solr - User mailing list archive at Nabble.com.
Large number of dynamic fields
Hello, I have a 7GB index with 2MM documents. Each document has about 400 fields, but the fields are dynamic, and in total I have ~200k distinct fields. We're using Solr 3.1 and Tomcat 5.5. We are seeing very slow start-up times (about 5 minutes from Tomcat startup to Solr being ready to answer queries). We have tried from 8 to 32GB of memory, with little difference. Would you say Solr is not suitable for such a large number of fields? Committing ~10k docs takes about 5 minutes as well. Thanks in advance, Santiago
Re: return unaltered complete multivalued fields with Highlighted results
I could use this feature too, encourage you to submit a patch in JIRA. I wouldn't call the param "preserveOrder" though -- what it's really doing is returning the whole entire field, with highlighting markers, not just "preserving order" of fragments. Not sure what to call it, but not "preserveOrder". On 6/2/2011 11:31 AM, alexei wrote: Hi, Here is the code for Solr 3.1 that will preserve all the text and will disable sorting. This goes in solrconfig.xml request handler config or which ever way you pass params: true This line goes into HighlightParams class: public static final String PRESERVE_ORDER = HIGHLIGHT + ".preserveOrder"; Replace this method DefaultSolrHighlighter.doHighlightingByHighlighter (I only added 3 if blocks): private void doHighlightingByHighlighter( Query query, SolrQueryRequest req, NamedList docSummaries, int docId, Document doc, String fieldName ) throws IOException { SolrParams params = req.getParams(); String[] docTexts = doc.getValues(fieldName); // according to Document javadoc, doc.getValues() never returns null. check empty instead of null if (docTexts.length == 0) return; SolrIndexSearcher searcher = req.getSearcher(); IndexSchema schema = searcher.getSchema(); TokenStream tstream = null; int numFragments = getMaxSnippets(fieldName, params); boolean mergeContiguousFragments = isMergeContiguousFragments(fieldName, params); String[] summaries = null; List frags = new ArrayList(); TermOffsetsTokenStream tots = null; // to be non-null iff we're using TermOffsets optimization try { TokenStream tvStream = TokenSources.getTokenStream(searcher.getReader(), docId, fieldName); if (tvStream != null) { tots = new TermOffsetsTokenStream(tvStream); } } catch (IllegalArgumentException e) { // No problem. But we can't use TermOffsets optimization. } for (int j = 0; j< docTexts.length; j++) { if( tots != null ) { // if we're using TermOffsets optimization, then get the next // field value's TokenStream (i.e. get field j's TokenStream) from tots: tstream = tots.getMultiValuedTokenStream( docTexts[j].length() ); } else { // fall back to analyzer tstream = createAnalyzerTStream(schema, fieldName, docTexts[j]); } Highlighter highlighter; if (Boolean.valueOf(req.getParams().get(HighlightParams.USE_PHRASE_HIGHLIGHTER, "true"))) { // TODO: this is not always necessary - eventually we would like to avoid this wrap // when it is not needed. 
tstream = new CachingTokenFilter(tstream); // get highlighter highlighter = getPhraseHighlighter(query, fieldName, req, (CachingTokenFilter) tstream); // after highlighter initialization, reset tstream since construction of highlighter already used it tstream.reset(); } else { // use "the old way" highlighter = getHighlighter(query, fieldName, req); } int maxCharsToAnalyze = params.getFieldInt(fieldName, HighlightParams.MAX_CHARS, Highlighter.DEFAULT_MAX_CHARS_TO_ANALYZE); if (maxCharsToAnalyze< 0) { highlighter.setMaxDocCharsToAnalyze(docTexts[j].length()); } else { highlighter.setMaxDocCharsToAnalyze(maxCharsToAnalyze); } try { TextFragment[] bestTextFragments = highlighter.getBestTextFragments(tstream, docTexts[j], mergeContiguousFragments, numFragments); for (int k = 0; k< bestTextFragments.length; k++) { if (params.getBool( HighlightParams.PRESERVE_ORDER, false ) ) { if ((bestTextFragments[k] != null) ){//&& (bestTextFragments[k].getScore()> 0)) { frags.add(bestTextFragments[k]); } } else { if ((bestTextFragments[k] != null)&& (bestTextFragments[k].getScore()> 0)) { frags.add(bestTextFragments[k]); } } } } catch (InvalidTokenOffsetsException e) { throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, e); } } // sort such that the fragments with the highest score come first if (!params.getBool( HighlightParams.PRESERVE_ORDER, false ) ) { Collections.sort(frags, new Comparator() { public int compare(TextFragment arg0, TextFragment arg1) { return Math.round(arg1.getScore() - arg0.getScore()); } }); } // convert fragments back into text // TODO: we can include score and position information in output as snippet attributes if (frags.size()> 0) { ArrayList fragTexts = new ArrayList(); for (TextFragment fragment: frags) { if (params.getBool( HighlightParams.PRESERVE_ORDER, false ) ) {
Re: Is there a way to get all the hits and score them later?
To clarify. I want to do this all underneath solr. I don't want to get a bunch of hits from solr in my app and then go to my server and score them again. I'd like to score them myself underneath solr before I return the results to my app. -- View this message in context: http://lucene.472066.n3.nabble.com/Is-there-a-way-to-get-all-the-hits-and-score-them-later-tp3016424p3016592.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Is there a way to get all the hits and score them later?
Well, you can get all the hits by setting "rows" to a very high value, say one more than the total number of docs you have in the database, so all hits will be returned. If there are a lot of them, it won't be quick. If you choose to sort by something other than 'score', I don't know if Solr will score anyway; I'm not sure there's a way to actually turn off scoring. But you can certainly ignore it. Not sure if this is really what you were asking, it is a pretty simple answer. On 6/2/2011 2:30 PM, arian487 wrote: Basically I don't want the hits and the scores at the same time. I want to get a list of hits but I want to score them myself externally (there is a dedicated server that will do the scoring given a list of id's). Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Is-there-a-way-to-get-all-the-hits-and-score-them-later-tp3016424p3016424.html Sent from the Solr - User mailing list archive at Nabble.com.
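In SolrJ terms the same approach looks like this (the query string and the row count are placeholders):

SolrQuery q = new SolrQuery("field:value");
q.setRows(1000001);   // one more than the number of docs in the index
q.setFields("id");    // fetch only ids; the score is simply never used
QueryResponse rsp = server.query(q);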
Re: Is there a way to get all the hits and score them later?
It sounds to me like maybe you want to implement a custom scoring algorithm in Solr? I have no experience with that, but maybe if you ask and/or google using those words, you'll have more luck. I know it's possible to implement a custom scoring algorithm, but I believe it's kind of tricky, and also of course has performance implications depending on implementation -- and definitely isn't designed for the use case of sending all results to an external server for scoring (not sure how you could do that in a performant way even if Solr's architecture supported it, which I'm not sure it does). On 6/2/2011 3:01 PM, arian487 wrote: To clarify. I want to do this all underneath solr. I don't want to get a bunch of hits from solr in my app and then go to my server and score them again. I'd like to score them myself underneath solr before I return the results to my app. -- View this message in context: http://lucene.472066.n3.nabble.com/Is-there-a-way-to-get-all-the-hits-and-score-them-later-tp3016424p3016592.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Searching using a PDF
I mean instead of typing http://localhost:8983/?q=mysearch, I would send a PDF file with the contents of "mysearch" and search based on that. I am leaning toward handling this before it hits solr however. Thanks, Brian Lamb On Wed, Jun 1, 2011 at 3:52 PM, Erick Erickson wrote: > I'm not quite sure what you mean by "regular search". When > you index a PDF (Presumably through Tika or Solr Cell) the text > is indexed into your index and you can certainly search that. Additionally, > there may be meta data indexed in specific fields (e.g. author, > date modified, etc). > > But what does "search based on a PDF file" mean in your context? > > Best > Erick > > On Wed, Jun 1, 2011 at 3:41 PM, Brian Lamb > wrote: > > Is it possible to do a search based on a PDF file? I know its possible to > > update the index with a PDF but can you do just a regular search with it? > > > > Thanks, > > > > Brian Lamb > > >
Re: Is there a way to get all the hits and score them later?
don't know if this is what you mean: you can add 'score' to the fl field list, and it will show you the score for each item. Upayavira On Thu, 02 Jun 2011 11:30 -0700, "arian487" wrote: > Basically I don't want the hits and the scores at the same time. I want > to > get a list of hits but I want to score them myself externally (there is a > dedicated server that will do the scoring given a list of id's). Thanks! > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Is-there-a-way-to-get-all-the-hits-and-score-them-later-tp3016424p3016424.html > Sent from the Solr - User mailing list archive at Nabble.com. > --- Enterprise Search Consultant at Sourcesense UK, Making Sense of Open Source
Re: Searching using a PDF
Not that I know of, you'll probably have to handle this before it hits Solr. Best Erick On Thu, Jun 2, 2011 at 3:10 PM, Brian Lamb wrote: > I mean instead of typing http://localhost:8983/?q=mysearch, I would send a > PDF file with the contents of "mysearch" and search based on that. I am > leaning toward handling this before it hits solr however. > > Thanks, > > Brian Lamb > > On Wed, Jun 1, 2011 at 3:52 PM, Erick Erickson wrote: > >> I'm not quite sure what you mean by "regular search". When >> you index a PDF (Presumably through Tika or Solr Cell) the text >> is indexed into your index and you can certainly search that. Additionally, >> there may be meta data indexed in specific fields (e.g. author, >> date modified, etc). >> >> But what does "search based on a PDF file" mean in your context? >> >> Best >> Erick >> >> On Wed, Jun 1, 2011 at 3:41 PM, Brian Lamb >> wrote: >> > Is it possible to do a search based on a PDF file? I know its possible to >> > update the index with a PDF but can you do just a regular search with it? >> > >> > Thanks, >> > >> > Brian Lamb >> > >> >
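The usual way to handle it before it hits Solr is to run the PDF through Tika on the client and query with the extracted text. A rough sketch using the Tika facade from tika-core (treating the whole extracted text as one query is naive; picking out key terms would work better):

import java.io.File;
import org.apache.tika.Tika;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.util.ClientUtils;

Tika tika = new Tika();
String text = tika.parseToString(new File("mysearch.pdf"));   // extract plain text from the PDF
SolrQuery q = new SolrQuery(ClientUtils.escapeQueryChars(text));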
RE: Anyway to know changed documents?
...and it works really well!!! :) -Original Message- From: Jonathan Rochkind [mailto:rochk...@jhu.edu] Sent: Wednesday, June 01, 2011 5:37 AM To: solr-user@lucene.apache.org Subject: Re: Anyway to know changed documents? On 6/1/2011 6:12 AM, pravesh wrote: > SOLR wiki will provide help on this. You might be interested in pure Java > based replication too. I'm not sure,whether SOLR operational will have this > feature(synch'ing only changed segments). You might need to change > configuration in searchconfig.xml Yes, this feature is there in the Java/HTTP based replication since Solr 1.4
Re: tika and solr 3,1 integration
Hi Naveen, Check if there is a dynamic field named "attr_*" in the schema. The "uprefix=attr_" parameter means that if Solr can't find an extracted field in the schema, it'll add the prefix "attr_" and try again. *Juan* On Thu, Jun 2, 2011 at 4:21 AM, Naveen Gupta wrote: > Hi > > I am trying to integrate solr 3.1 and tika (which comes default with the > version) > > and using curl command trying to index few of the documents, i am getting > this error. the error is attr_meta field is unknown. i checked the > solrconfig, it looks perfect to me. > > can you please tell me what i am missing. > > I copied all the jars from contrib/extraction/lib to solr/lib folder that > is > there in same place where conf is there > > > I am using the same request handler which is coming with default > > startup="lazy" > class="solr.extraction.ExtractingRequestHandler" > > > > text > true > ignored_ > > > true > links > ignored_ > > > > > > > > * curl " > > http://dev.grexit.com:8080/solr1/update/extract?literal.id=who.pdf&uprefix=attr_&attr_&fmap.content=attr_content&commit=true > " > -F "myfile=@/root/apache-solr-3.1.0/docs/who.pdf"* > > > Apache Tomcat/6.0.18 - Error report > HTTP Status 400 - ERROR:unknown field 'attr_meta' size="1" noshade="noshade">type Status > reportmessage > ERROR:unknown field 'attr_meta'description The > request sent by the client was syntactically incorrect (ERROR:unknown field > 'attr_meta').Apache > Tomcat/6.0.18root@weforpeople:/usr/share/solr1/lib# > > > Please note > > i integrated apacha tika 0.9 with apache-solr-1.4 locally on windows > machine > and using solr cell > > calling the program works fine without any changes in configuration. > > Thanks > Naveen >
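For reference, the catch-all dynamic field from the example schema that the uprefix mechanism depends on looks like this (the type just needs to exist in your schema):

<dynamicField name="attr_*" type="textgen" indexed="true" stored="true" multiValued="true"/>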
Sorting
Hi, When using the following URL: http://localhost:8080/solr/StatReg/select?version=2.2&sort=path+asc&fl=path&start=0&q=paths%3A%222%2Froot%2FStatReg%2F--+C+--%22&hl=off&rows=500 I get the result in the following order: [...] /-- C --/Community Care Facility Act [RSBC 1996] c. 60/00_96060REP_01.xml /-- C --/Community Care and Assisted Living Act [SBC 2002] c. 75/00_02075_01.xml [...] However, the order is not right: "and Assisted" should come before "Facility Act". I'm using the following schema configuration: Thanks, Clécio
Re: Is there a way to get all the hits and score them later?
Actually I was thinking I wanted to do something before the sharding (like in the layer where faceting happens for example). I wanna hack a plugin in the middle to go to my server after I have a bunch of hits. Just not sure where to do this... Though I've decided I can do scoring from solr (like a preliminary scoring to narrow down some results) and then in the middle send those hits to my server for additional scoring. I can't hack it on in the end since the sharding has happened I think, I'm just not sure where to look right now. -- View this message in context: http://lucene.472066.n3.nabble.com/Is-there-a-way-to-get-all-the-hits-and-score-them-later-tp3016424p3017401.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Sorting
Hi Clécio, Your problem may be caused by case sensitiveness of string fields. Try using the "lowercase" field type that comes in the example. Regards, *Juan* On Thu, Jun 2, 2011 at 6:13 PM, Clecio Varjao wrote: > Hi, > > When using the following URL: > > http://localhost:8080/solr/StatReg/select?version=2.2&sort=path+asc&fl=path&start=0&q=paths%3A%222%2Froot%2FStatReg%2F--+C+--%22&hl=off&rows=500 > > I get the result in the following order: > > [...] > /-- C --/Community Care Facility Act [RSBC 1996] c. 60/00_96060REP_01.xml > /-- C --/Community Care and Assisted Living Act [SBC 2002] c. > 75/00_02075_01.xml > [...] > > However, the order is not right "and Assisted" should come before > "Facitity Act". > > I'm using the following schema configuration: > > omitNorms="true"/> > > multiValued="false" /> > > Thanks, > > Clécio >
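The stock example schema ships a type intended for exactly this; a sketch of a case-insensitive sort field fed by copyField (path is the field from the question, path_sort is made up):

<fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>

<field name="path_sort" type="alphaOnlySort" indexed="true" stored="false"/>
<copyField source="path" dest="path_sort"/>

and then sort=path_sort+asc in the request.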
Indexes in ramdisk don't show performance improvement?
Hey everyone. Been doing some load testing over the past few days. I've been throwing a good bit of load at an instance of solr and have been measuring response time. We're running a variety of different keyword searches to keep solr's cache on its toes. I'm running two identical load-testing scenarios: one with indexes residing in /dev/shm and another from local disk. The indexes are about 4.5GB in size. In both tests the response times are the same. I wasn't expecting that. I do see the java heap size grow when indexes are served from disk (which is expected). When the indexes are served out of /dev/shm, the java heap stays small. So in general is this consistent behavior? I don't really see the advantage of serving indexes from /dev/shm. When the indexes are being served out of ramdisk, is the linux kernel or the memory mapper doing something tricky behind the scenes to use ramdisk in lieu of the java heap? For what it is worth, we are running x86_64 RHEL 5.4 on a 12 core 2.27GHz Xeon system with 48GB RAM. Thoughts? -Park
Re: Solr memory consumption
> Commits are divided into 2 groups: > - often but small (last changed > info) 1) Make sure that it's not too often and you don't have a commit overlapping problem. http://wiki.apache.org/solr/FAQ#What_does_.22PERFORMANCE_WARNING:_Overlapping_onDeckSearchers.3DX.22_mean_in_my_logs.3F 2) You may also try to limit cache sizes and check if it helps. 3) If it doesn't help then try to monitor your app using jconsole * try to hit the garbage collector and see if it frees some memory * browse the Solr JMX attributes and see if there are any hints regarding Solr cache usage, etc. 4) Try to run jmap -heap and jmap -histo and see if there are any hints there 5) If none of the above helps then you probably need to examine your memory usage using some kind of Java profiler tool (like YourKit profiler) > Size: 4 databases about 1G (sum), 1 database (with n-gram) for 21G.. > I don't know any other way to search for product names except n-gram > =\ Isn't a standard text field with solr.WordDelimiterFilterFactory and generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" during indexing good enough? You might want to limit the min and max ngram size, just to reduce your index size.
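Spelled out, the suggested alternative to n-grams is roughly this field type (the name is made up; the WordDelimiterFilter parameters are the ones listed above, and the LowerCaseFilter is an addition):

<fieldType name="text_products" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>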
Hitting the URI limit, how to get around this?
I have a master solr instance that I send my requests to; it hosts no documents, it just farms the request out to a large number of shards. All the other solr instances that host the data contain multiple cores. Therefore my search string looks like "http://host:port/solr/select?...&shards=nodeA:1234/solr/core01,nodeA:1234/solr/core02,nodeA:1234/solr/core03,..."; This shard list is pretty long and has finally hit "the limit". So my question is how to best avoid having to build such a long URI? Is there a way to have multiple tiers, where the master server has a list of servers (nodeA:1234,nodeB:1234,...) and each of those nodes queries the cores that they host (nodeA hosts core01, core02, core03, ...)? Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Hitting-the-URI-limit-how-to-get-around-this-tp3017837p3017837.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Indexes in ramdisk don't show performance improvement?
What I expect is happening is that the Solr caches are effectively making the two tests identical, using memory to hold the vital parts of the index in both cases (after disk warming on the instance using the local disk). I suspect if you measured the first few queries (assuming no auto-warming) you'd see the local disk version be slower. Were you running these tests for curiosity or is running from /dev/shm something you're considering for production? Best Erick On Thu, Jun 2, 2011 at 5:47 PM, Parker Johnson wrote: > > Hey everyone. > > Been doing some load testing over the past few days. I've been throwing a > good bit of load at an instance of solr and have been measuring response > time. We're running a variety of different keyword searches to keep > solr's cache on its toes. > > I'm running two exact same load testing scenarios: one with indexes > residing in /dev/shm and another from local disk. The indexes are about > 4.5GB in size. > > On both tests the response times are the same. I wasn't expecting that. > I do see the java heap size grow when indexes are served from disk (which > is expected). When the indexes are served out of /dev/shm, the java heap > stays small. > > So in general is this consistent behavior? I don't really see the > advantage of serving indexes from /dev/shm. When the indexes are being > served out of ramdisk, is the linux kernel or the memory mapper doing > something tricky behind the scenes to use ramdisk in lieu of the java heap? > > For what it is worth, we are running x_64 rh5.4 on a 12 core 2.27Ghz Xeon > system with 48GB ram. > > Thoughts? > > -Park > > >
Re: Sorting algorithm
It hasn't been committed yet, but you may want to track this JIRA: https://issues.apache.org/jira/browse/SOLR-2136 I happened to notice it over on the dev list, it's about adding if () to function queries. Best Erick 2011/6/2 Tomás Fernández Löbbe : > OK, then (everything that's available at index time, I'll say it's > constant): > (Math.log(z) / Math.LN10) (not sure what you mean with Math.LN10) is > constant, I'll call it c1 > > ((y * t) / 45000) = (y/45000)*t --> y/45000 is constant, I'll call it c2. > > c1+(c2 * t) = c1 + (c2 * (CreationDate - now) / 1000) --> c2 / 1000 is also > constant, I'll call it c3. > > Then, your ranking formula is: c1 + (c3 * (creationDate - now)). > > In solr, this will be: &sort=sum(c1,product(c3,ms(creationDate, NOW))). > > I haven't tried it but if my arithmetic is correct (I'm a little bit rusty > with that), that should work and should be faster than doing the whole thing > at query time. Of course, "c1" and "c3" must be indexed as fields. > > Regards, > > Tomás > On Thu, Jun 2, 2011 at 9:46 AM, Richard Hodsdon > wrote: > >> Thanks for the response, >> >> You are correct, but my pseudo code was not. >> this line >> var t = (CreationDate - 1131428803) / 1000; >> should be >> var t = (CreationDate - now()) / 1000; >> >> This will cause the items ranking to depreciate over time. >> >> Richard >> >> >> -- >> View this message in context: >> http://lucene.472066.n3.nabble.com/Sorting-algorithm-tp3014549p3014961.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >
Better to have lots of smaller cores or one really big core?
I am trying to decide what the right approach would be: to have one big core or many smaller cores hosted by a solr instance. I think there may be trade-offs either way but wanted to see what others do. And by small I mean about 5-10 million documents; large may be 50 million. It seems like small cores are better because - If one server can host say 70 million documents (before memory issues) we can get really close with a bunch of small indexes, vs only being able to host one 50 million document index. And when a software update comes out that allows us to host 90 million then we could add a few more small indexes. - It takes less time to build ten 5 million document indexes than one 50 million document index. It seems like larger cores are better because - Each core returns its own result set, so if I want 1000 results and there are 100 cores, the network is transferring up to 100,000 candidate documents for that search, whereas if I had only 10 much larger cores only 10,000 would be sent over the network. - It would prolong my time until I hit URI length limits, since there would be fewer cores in my system. Any thoughts??? Other trade-offs??? How do you find what the right size for you is? -- View this message in context: http://lucene.472066.n3.nabble.com/Better-to-have-lots-of-smaller-cores-or-one-really-big-core-tp3017973p3017973.html Sent from the Solr - User mailing list archive at Nabble.com.
DeltaImport records not commited using DIH.
Hi, 1) When I run full-import it works fine and commits all the records. The document count matches table and dataImport.properties is updated with last_index timestamp. 2) After some time I ran the delta import and it is giving enough information but it is not adding the new record into the index. I am including my config and log information. Could anyone help me to fix this. *Data-config* *DeltaImport Status message.* 0:0:3.391 15 77 0 0 2011-06-02 16:58:23 2011-06-02 16:58:23 2011-06-02 16:58:24 2011-06-02 16:58:24 77 *Log info* Jun 2, 2011 3:59:20 PM org.apache.solr.handler.dataimport.DocBuilder execute INFO: Time taken = 0:0:15.641 Jun 2, 2011 4:01:02 PM org.apache.solr.handler.dataimport.DataImporter doDeltaImport INFO: Starting Delta Import Jun 2, 2011 4:01:02 PM org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/select params={command=delta-import&clean=false&qt=/dataimport&commit=true} status=0 QTime=0 Jun 2, 2011 4:01:02 PM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties INFO: Read dataimport.properties Jun 2, 2011 4:01:02 PM org.apache.solr.handler.dataimport.DocBuilder doDelta INFO: Starting delta collection. Jun 2, 2011 4:01:02 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Running ModifiedRowKey() for Entity: cps_dataset Jun 2, 2011 4:01:02 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call INFO: Creating a connection for entity cps_dataset with URL: jdbc:oracle:thin:@lnxdb-stg-abcd.com:1521:STG Jun 2, 2011 4:01:03 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call INFO: Time taken for getConnection(): 484 Jun 2, 2011 4:01:03 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Completed ModifiedRowKey for Entity: xxx_xxx_cps_dataset rows obtained : 77 Jun 2, 2011 4:01:03 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Completed DeletedRowKey for Entity: xxx_xxx_cps_dataset rows obtained : 0 Jun 2, 2011 4:01:03 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Completed parentDeltaQuery for Entity: xxx_xxx_dataset Jun 2, 2011 4:01:18 PM org.apache.solr.handler.dataimport.DocBuilder doDelta INFO: Delta Import completed successfully Jun 2, 2011 4:01:18 PM org.apache.solr.update.processor.LogUpdateProcessor finish INFO: {} 0 0 Jun 2, 2011 4:01:18 PM org.apache.solr.handler.dataimport.DocBuilder execute INFO: Time taken = 0:0:15.625
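The data-config didn't survive the mail, but the symptom (77 modified rows found by deltaQuery, then an add count of zero in "{} 0 0") is a common signature of a missing or broken deltaImportQuery: deltaQuery only collects the changed primary keys, and DIH needs a deltaImportQuery to fetch the actual rows for those keys. A sketch of the relevant entity attributes, with placeholder table and column names, Oracle-style to match the JDBC URL in the log:

<entity name="cps_dataset" pk="ID"
        query="select * from cps_dataset"
        deltaQuery="select ID from cps_dataset
                    where LAST_MODIFIED &gt; to_date('${dataimporter.last_index_time}', 'YYYY-MM-DD HH24:MI:SS')"
        deltaImportQuery="select * from cps_dataset where ID = '${dataimporter.delta.ID}'"/>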
Re: Is there a way to get all the hits and score them later?
Hmm, looks like I can inherit the Similarity Class and do my own thing there. -- View this message in context: http://lucene.472066.n3.nabble.com/Is-there-a-way-to-get-all-the-hits-and-score-them-later-tp3016424p3018001.html Sent from the Solr - User mailing list archive at Nabble.com.
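A note of caution before going down this road: a Similarity subclass only reshapes Lucene's per-term scoring factors (tf, idf, norms); it is called deep inside the search loop and is no place to call out to an external service. A minimal sketch of the kind of thing it can do (class name made up):

import org.apache.lucene.search.DefaultSimilarity;

public class FlatSimilarity extends DefaultSimilarity {
    // every matching doc gets the same term-frequency contribution
    @Override
    public float tf(float freq) { return freq > 0 ? 1.0f : 0.0f; }

    // rare terms are not weighted above common ones
    @Override
    public float idf(int docFreq, int numDocs) { return 1.0f; }
}

registered in schema.xml via <similarity class="com.example.FlatSimilarity"/>.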
Re: Better to have lots of smaller cores or one really big core?
Take another approach? Cores are often used for isolation purposes. That is, the data in one core may have nothing to do with another core, the schemas don't have to match etc. They #may# be both logically and physically separate. I don't have measurements for this, so I'm guessing a little. But I expect that using multiple cores will actually use a few more resources than a single core (e.g. memory). Each core will be keeping a separate cache, duplicating terms etc. (I may be wrong on this one!). But if you have a single schema in a logically single core that just grows too big to serve queries acceptably, the usual approach is to go to shards, which are just cores, but Solr manages the query part over multiple shards via configuration, which is probably easier. So the answer in this case is to put stuff on a single machine in a single core until it grows too big, then go to sharding. So the question is really whether you consider the cores sub-parts of a single index or distinct units (say one core per customer). In the former, I'd use one core until it gets too big, then shard. In the latter, multiple cores are a good solution, largely for administrative/security reasons, but then you aren't manually constructing a huge URL... Hope that helps Erick On Thu, Jun 2, 2011 at 7:57 PM, JohnRodey wrote: > I am trying to decide what the right approach would be, to have one big core > and many smaller cores hosted by a solr instance. > > I think there may be trade offs either way but wanted to see what others do. > And by small I mean about 5-10 million documents, large may be 50 million. > > It seems like small cores are better because > - If one server can host say 70 million documents (before memory issues) we > can get really close with a bunch of small indexes, vs only being able to > host one 50 million document index. And when a software update comes out > that allows us to host 90 million then we could add a few more small > indexes. > - It takes less time to build ten 5 million document indexes than one 50 > million document index. > > It seems like larger cores are better because > - Each core returns their result set, so if I want 1000 results and their > are 100 cores the network is transferring 10 documents for that search. > Where if I had only 10 much larger cores only 1 documents would be sent > over the network. > - It would prolong my time until I hit uri length limits being that there > would be less cores in my system. > > Any thoughts??? Other trade-offs??? > > How do you find what the right size for you is? > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Better-to-have-lots-of-smaller-cores-or-one-really-big-core-tp3017973p3017973.html > Sent from the Solr - User mailing list archive at Nabble.com. >
Re: Indexes in ramdisk don't show performance improvement?
That's just the thing. Even the initial queries have similar response times to the later ones. WEIRD! I was considering running from /dev/shm in production, but for slaves only (master remains on disk). At this point though, I'm not seeing a benefit to ramdisk so I think I'm going back to traditional disk so the indexes stay intact after a power cycle. Has anyone else seen that indexes served from disk perform similarly to indexes served from ramdisk? -Park On 6/2/11 4:15 PM, "Erick Erickson" wrote: >What I expect is happening is that the Solr caches are effectively making >the >two tests identical, using memory to hold the vital parts of the code in >both >cases (after disk warming on the instance using the local disk). I >suspect if >you measured the first few queries (assuming no auto-warming) you'd see >the >local disk version be slower. > >Were you running these tests for curiosity or is running from /dev/shm >something >you're considering for production? > >Best >Erick > >On Thu, Jun 2, 2011 at 5:47 PM, Parker Johnson >wrote: >> >> Hey everyone. >> >> Been doing some load testing over the past few days. I've been throwing >>a >> good bit of load at an instance of solr and have been measuring response >> time. We're running a variety of different keyword searches to keep >> solr's cache on its toes. >> >> I'm running two exact same load testing scenarios: one with indexes >> residing in /dev/shm and another from local disk. The indexes are about >> 4.5GB in size. >> >> On both tests the response times are the same. I wasn't expecting that. >> I do see the java heap size grow when indexes are served from disk >>(which >> is expected). When the indexes are served out of /dev/shm, the java >>heap >> stays small. >> >> So in general is this consistent behavior? I don't really see the >> advantage of serving indexes from /dev/shm. When the indexes are being >> served out of ramdisk, is the linux kernel or the memory mapper doing >> something tricky behind the scenes to use ramdisk in lieu of the java >>heap? >> >> For what it is worth, we are running x_64 rh5.4 on a 12 core 2.27Ghz >>Xeon >> system with 48GB ram. >> >> Thoughts? >> >> -Park >> >> >> >
Re: Indexes in ramdisk don't show performance improvement?
Linux will cache the open index files in RAM (in the filesystem cache) after their first read which makes the ram disk generally useless. Unless you're processing other files on the box with a size greater than your total unused ram (and thus need to micro-manage what stays in RAM), then I wouldn't recommend using a ramdisk - it's just more to manage. If you reboot the box and run a few searches, those first few will likely be slower until all the index files are cached in Memory. After that point, the performance should be comparable because all files are read out of RAM from that point forward. If solr caches are enabled and your queries are repetitive then that could also be contributing to the speed of repetitive queries. Note that the above advice assumes your total unused ram (not allocated to the JVM or any other processes) is greater than the size of your lucene index files, which should be a safe assumption considering you're trying to put the whole index in a ramdisk. -Trey On Thu, Jun 2, 2011 at 7:15 PM, Erick Erickson wrote: > What I expect is happening is that the Solr caches are effectively making the > two tests identical, using memory to hold the vital parts of the code in both > cases (after disk warming on the instance using the local disk). I suspect if > you measured the first few queries (assuming no auto-warming) you'd see the > local disk version be slower. > > Were you running these tests for curiosity or is running from /dev/shm > something > you're considering for production? > > Best > Erick > > On Thu, Jun 2, 2011 at 5:47 PM, Parker Johnson wrote: >> >> Hey everyone. >> >> Been doing some load testing over the past few days. I've been throwing a >> good bit of load at an instance of solr and have been measuring response >> time. We're running a variety of different keyword searches to keep >> solr's cache on its toes. >> >> I'm running two exact same load testing scenarios: one with indexes >> residing in /dev/shm and another from local disk. The indexes are about >> 4.5GB in size. >> >> On both tests the response times are the same. I wasn't expecting that. >> I do see the java heap size grow when indexes are served from disk (which >> is expected). When the indexes are served out of /dev/shm, the java heap >> stays small. >> >> So in general is this consistent behavior? I don't really see the >> advantage of serving indexes from /dev/shm. When the indexes are being >> served out of ramdisk, is the linux kernel or the memory mapper doing >> something tricky behind the scenes to use ramdisk in lieu of the java heap? >> >> For what it is worth, we are running x_64 rh5.4 on a 12 core 2.27Ghz Xeon >> system with 48GB ram. >> >> Thoughts? >> >> -Park >> >> >> >
Re: synonyms problem
oh thank you for reminding me about the string and text issues... I will change it asap... and about the index analyzer, I just removed it for brevity... I will try again and if it fails will post here again... thank you so much - Zeki ("but it isn't working... if it worked, it would do it...") -- View this message in context: http://lucene.472066.n3.nabble.com/synonyms-problem-tp3014006p3018185.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Hitting the URI limit, how to get around this?
just a suggestion ... if the shards are known, you can add them as default params in the request handler so they are always added, and the URL would just need the qt parameter. The URI length limit is client-dependent: how are you querying Solr .. any client API? Through a browser? Is it hitting the max header length? Can you use POST instead? Regards, Jayendra On Thu, Jun 2, 2011 at 7:12 PM, JohnRodey wrote: > I have a master solr instance that I sent my request to, it hosts no > documents it just farms the request out to a large number of shards. All the > other solr instances that host the data contain multiple cores. > > Therefore my search string looks like > "http://host:port/solr/select?...&shards=nodeA:1234/solr/core01,nodeA:1234/solr/core02,nodeA:1234/solr/core03,..."; > This shard list is pretty long and has finally hit "the limit". > > So my question is how to best avoid having to build such a long uri? > > Is there a way to have mutiple tiers, where the master server has a list of > servers (nodeA:1234,nodeB:1234,...) and each of those nodes query the cores > that they host (nodeA hosts core01, core02, core03, ...)? > > Thanks! > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Hitting-the-URI-limit-how-to-get-around-this-tp3017837p3017837.html > Sent from the Solr - User mailing list archive at Nabble.com. >
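Concretely, the shard list can move out of the URL and into the front-end instance's solrconfig.xml (handler name is made up):

<requestHandler name="/distrib" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="shards">nodeA:1234/solr/core01,nodeA:1234/solr/core02,nodeA:1234/solr/core03</str>
  </lst>
</requestHandler>

after which clients hit /solr/distrib?q=... and never carry the list themselves.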
Re: Debugging a Solr/Jetty Hung Process
If you have an SNMP infrastructure available (nagios or similar) you should be able to set up a polling monitor that will keep statistics on the number of threads in your jvm and even allow you to inspect their stacks remotely. You can set alarms so you will be notified if cpu thread count or other metrics exceed a configurable threshold and then take a look at the process before it goes off the deep end. It is a fair amount of work to set this up, but really useful if you need to support a critical system. -Mike On 6/1/2011 3:42 PM, Jonathan Rochkind wrote: First guess (and it really is just a guess) would be Java garbage collection taking over. There are some JVM parameters you can use to tune the GC process, especially if the machine is multi-core, making sure GC happens in a seperate thread is helpful. But figuring out exactly what's going on requires confusing JVM debugging of which I am no expert at either. On 6/1/2011 3:04 PM, Chris Cowan wrote: About once a day a Solr/Jetty process gets hung on my server consuming 100% of one of the CPU's. Once this happens the server no longer responds to requests. I've looked through the logs to try and see if anything stands out but so far I've found nothing out of the ordinary. My current remedy is to log in and just kill the single processes that's hung. Once that happens everything goes back to normal and I'm good for a day or so. I'm currently the running following: solr-jetty-1.4.0+ds1-1ubuntu1 which is comprised of Solr 1.4.0 Jetty 6.1.22 on Unbuntu 10.10 I'm pretty new to managing a Jetty/Solr instance so at this point I'm just looking for advice on how I should go about trouble shooting this problem. Chris
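Even without an SNMP setup, two commands at the moment of the hang will usually identify the spinning thread (assuming a Sun JDK and Linux):

# per-thread CPU for the Jetty JVM; note the TID of the 100% thread
top -H -p <jetty-pid>
# full stack dump; the hot thread's "nid" in the dump is that TID in hex
jstack <jetty-pid> > /tmp/stacks.txt
printf '%x\n' <tid>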
Re: Index vs. Query Time Aware Filters
It doesn't look like this is supported in any way that is at all straightforward. http://wiki.apache.org/solr/SolrPlugins talks about the easy ways to parameterize plugins, and they don't include what you're after. I think maybe you could extend the query parser you are currently using, wrap the parse() method, get a hold of your analyzer, which maybe is your own class with special knowledge of its filter chain and can inform the filter that it's being used in "query" mode; otherwise it would default to index mode. If you are letting Solr generate the Analyzer, or maybe in either case (?) you could call Analyzer.reusableTokenStream() to get the TokenStream, but from there things get murky. I don't think TokenStream provides any mechanism to walk the chain so you could find your special filter and inform it of its status. You'd probably have to add your own mechanism for tracking this, extending all TokenStreams, but I don't think this is actually feasible since these are required to be final! -Mike On 6/1/2011 12:23 PM, Mike Schultz wrote: I should have explained that the queryMode parameter is for our own custom filter. So the result is that we have 8 filters in our field definition. All the filter parameters (30 or so) of the query time and index time are identical EXCEPT for our one custom filter which needs to know if it's in query time or index time mode. If we could determine inside our custom code whether we're indexing or querying, then we could omit the query time definition entirely and save about 50 lines of configuration and be much less error prone. One possible solution would be if we could get at the SolrCore from within a filter. Then at init time we could iterate through the filter chains and determine when we find a factory == this. (I've done this in other places where it's useful to know the name of a ValueSourceParser for example) -- View this message in context: http://lucene.472066.n3.nabble.com/Index-vs-Query-Time-Aware-Filters-tp3009450p3011556.html Sent from the Solr - User mailing list archive at Nabble.com.
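For contrast, the configuration-heavy status quo the poster wants to shrink is itself simple to implement, which is partly why nothing nicer exists. A sketch of a factory reading a queryMode flag (MyFilter/MyFilterFactory are stand-ins for the custom filter being discussed):

import java.util.Map;
import org.apache.lucene.analysis.TokenStream;
import org.apache.solr.analysis.BaseTokenFilterFactory;

public class MyFilterFactory extends BaseTokenFilterFactory {
    private boolean queryMode;

    @Override
    public void init(Map<String, String> args) {
        super.init(args);
        // set queryMode="true" only in the <analyzer type="query"> chain
        queryMode = "true".equals(args.get("queryMode"));
    }

    public TokenStream create(TokenStream input) {
        return new MyFilter(input, queryMode);
    }
}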
Re: Newbie question: how to deal with different # of search results per page due to pagination then grouping
Just keep one extra facet value hidden; ie request one more than you need to show the current page. If you get it, there are more (show the next button), otherwise there aren't. You can't page arbitrarily deep like this, but you can have a next button reliably enabled or disabled. On 6/1/2011 5:57 PM, Robert Petersen wrote: Yes that is exactly the issue... we're thinking just maybe always have a next button and if you go too far you just get zero results. User gets what the user asks for, and so user could simply back up if desired to where the facet still has values. Could also detect an empty facet results on the front end. You can also only expand one facet only to allow paging only the facet pane and not the whole page using an ajax call. -Original Message- From: Jonathan Rochkind [mailto:rochk...@jhu.edu] Sent: Wednesday, June 01, 2011 2:30 PM To: solr-user@lucene.apache.org Cc: Robert Petersen Subject: Re: Newbie question: how to deal with different # of search results per page due to pagination then grouping How do you know whether to provide a 'next' button, or whether you are the end of your facet list? On 6/1/2011 4:47 PM, Robert Petersen wrote: I think facet.offset allows facet paging nicely by letting you index into the list of facet values. It is working for me... http://wiki.apache.org/solr/SimpleFacetParameters#facet.offset -Original Message- From: Jonathan Rochkind [mailto:rochk...@jhu.edu] Sent: Wednesday, June 01, 2011 12:41 PM To: solr-user@lucene.apache.org Subject: Re: Newbie question: how to deal with different # of search results per page due to pagination then grouping There's no great way to do that. One approach would be using facets, but that will just get you the author names (as stored in fields), and not the documents under it. If you really only want to show the author names, facets could work. One issue with facets though is Solr won't tell you the total number of facet values for your query, so it's tricky to provide next/prev paging through them. There is also a 'field collapsing' feature that I think is not in a released Solr, but may be in the Solr repo. I'm not sure it will quite do what you want either though, although it's related and worth a look. http://wiki.apache.org/solr/FieldCollapsing Another vaguely related thing that is also not yet in a released Solr, is a 'join' function. That could possibly be used to do what you want, although it'd be tricky too. https://issues.apache.org/jira/browse/SOLR-2272 Jonathan On 6/1/2011 2:56 PM, beccax wrote: Apologize if this question has already been raised. I tried searching but couldn't find the relevant posts. We've indexed a bunch of documents by different authors. Then for search results, we'd like to show the authors that have 1 or more documents matching the search keywords. The problem is right now our solr search method first paginates results to 100 documents per page, then we take the results and group by authors. This results in different number of authors per page. (Some authors may only have one matching document and others 5 or 10.) How do we change it to somehow show the same number of authors (say 25) per page? I mean alternatively we could just show all the documents themselves ordered by author, but it's not the user experience we're looking for. Thanks so much. And please let me know if you need more details not provided here. 
B -- View this message in context: http://lucene.472066.n3.nabble.com/Newbie-question-how-to-deal-with-diff erent-of-search-results-per-page-due-to-pagination-then-grouping-tp30121 68p3012168.html Sent from the Solr - User mailing list archive at Nabble.com.
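In parameter terms, for 25 authors per page the one-extra trick looks like this (author as the facet field is an assumption from the thread):

# page 3: ask for 26 values starting at offset 50, render at most 25
...&facet=true&facet.field=author&facet.limit=26&facet.offset=50&facet.mincount=1
# 26 values back -> enable "next"; 25 or fewer -> this is the last page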
Re: tika and solr 3,1 integration
Hi This is fixed .. yes, schema.xml was the culprit and I fixed it by looking at the sample schema provided with the distribution. But on Windows, I am getting an slf4j IllegalAccessError, which looks like a jar problem. Looking at the fixes suggested in their FAQs, they suggest using version 1.5.5, which is already there in the lib folder .. I have been deploying a lot of jars .. I am afraid that may be causing the problem .. Has somebody experienced the same? Thanks Naveen On Fri, Jun 3, 2011 at 2:41 AM, Juan Grande wrote: > Hi Naveen, > > Check if there is a dynamic field named "attr_*" in the schema. The > "uprefix=attr_" parameter means that if Solr can't find an extracted field > in the schema, it'll add the prefix "attr_" and try again. > > *Juan* > > > > On Thu, Jun 2, 2011 at 4:21 AM, Naveen Gupta wrote: > > > Hi > > > > I am trying to integrate solr 3.1 and tika (which comes default with the > > version) > > > > and using curl command trying to index few of the documents, i am getting > > this error. the error is attr_meta field is unknown. i checked the > > solrconfig, it looks perfect to me. > > > > can you please tell me what i am missing. > > > > I copied all the jars from contrib/extraction/lib to solr/lib folder that > > is > > there in same place where conf is there > > > > > > I am using the same request handler which is coming with default > > > > > startup="lazy" > > class="solr.extraction.ExtractingRequestHandler" > > > > > > > text > > true > > ignored_ > > > > > > true > > links > > ignored_ > > > > > > > > > > > > > > > > * curl " > > > > > http://dev.grexit.com:8080/solr1/update/extract?literal.id=who.pdf&uprefix=attr_&attr_&fmap.content=attr_content&commit=true > > " > > -F "myfile=@/root/apache-solr-3.1.0/docs/who.pdf"* > > > > > > Apache Tomcat/6.0.18 - Error > report > > HTTP Status 400 - ERROR:unknown field > 'attr_meta' > size="1" noshade="noshade">type Status > > reportmessage > > ERROR:unknown field 'attr_meta'description The > > request sent by the client was syntactically incorrect (ERROR:unknown > field > > 'attr_meta').Apache > > Tomcat/6.0.18root@weforpeople:/usr/share/solr1/lib# > > > > > > Please note > > > > i integrated apacha tika 0.9 with apache-solr-1.4 locally on windows > > machine > > and using solr cell > > > > calling the program works fine without any changes in configuration. > > > > Thanks > > Naveen > > >
Strategy --> Frequent updates in our application
Hi We have an application where every 10 minutes we index each user's document repository, and whenever a new thread is added to a particular discussion we need to index that thread again (please note we are not doing blind indexing each time; we have various rules to filter out which threads are new and therefore candidates for indexing, plus the newly arrived ones). So we are doing updates for each user's document repository, and the performance so far is not looking very good. In the future we are going to get hits in volume (1,000 to 10,000 hits per minute), so we are looking for a strategy to tune Solr to index the data in real time. And what about NRT: is it suitable for this kind of scenario? I read that Solr NRT performance is not very good, but I am not going to believe that, since Solr is one of the best open source projects, so this problem will surely be sorted out in the near future. But if any benchmark exists, kindly share it with me; we would like to analyze it against our requirements. Is there any way to add incremental indexes, as we generally find in other search engines like Endeca? I don't know much detail about Solr since I am a newbie, so can you please tell me if there are settings which can keep track of incremental indexing? Thanks Naveen
RE: solr Invalid Date in Date Math String/Invalid Date String
Hi Erick Here is the error message: Fieldtype: tdate (I use the default one in solr schema.xml) Field value(Index): 2006-12-22T13:52:13Z Field value(query): [2006-12-22T00:00:00Z TO 2006-12-22T23:59:59Z] <<< with '[' and ']' And it generates the result below: ---Start--- HTTP ERROR: 500 org.apache.solr.common.SolrException: Invalid Date in Date Math String:'[2006-12-22T00:00:00Z TO 2006-12' org.apache.jasper.JasperException: org.apache.solr.common.SolrException: Invalid Date in Date Math String:'[2006-12-22T00:00:00Z TO 2006-12' at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:4 02) at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:464) at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:358) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:487) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:367) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405) at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:268) at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:126) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java: 264) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler .java:1089) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerColl ection.java:211) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:11 4) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139) at org.mortbay.jetty.Server.handle(Server.java:285) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502) at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java: 835) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:22 6) at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:4 42) Caused by: org.apache.solr.common.SolrException: Invalid Date in Date Math String:'[2006-12-22T00:00:00Z TO 2006-12' at org.apache.solr.schema.DateField.parseMath(DateField.java:158) at org.apache.solr.analysis.TrieTokenizer.reset(TrieTokenizerFactory.java:101) at org.apache.solr.analysis.TrieTokenizer.(TrieTokenizerFactory.java:73) at org.apache.solr.analysis.TrieTokenizerFactory.create(TrieTokenizerFactory.ja va:51) at org.apache.solr.analysis.TrieTokenizerFactory.create(TrieTokenizerFactory.ja va:41) at org.apache.solr.analysis.TokenizerChain.getStream(TokenizerChain. 
java:69) at org.apache.solr.analysis.SolrAnalyzer.reusableTokenStream(SolrAnalyzer.java: 74) at org.apache.jsp.admin.analysis_jsp._jspService(org.apache.jsp.admin.analysis_ jsp:685) at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:80) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:3 73) ... 29 more Caused by: java.text.ParseException: Unparseable date: "[2006-12-22T00:00:00Z" at java.text.DateFormat.parse(Unknown Source) at org.apache.solr.schema.DateField.parseDate(DateField.java:254) at org.apache.solr.schema.DateField.parseMath(DateField.java:156) ... 39 more RequestURI=/solr/i-audience.com-contacts-test/admin/analysis.jsp Powered by Jetty:// --- End --- Can you tell me what is the problem? Thank you very much in advance. -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: 2011年5月31日 9:54 下午 To: solr-user@lucene.apache.org; elleryle...@be-o.com Subject: Re: solr Invalid Date in Date Math String/Invalid Date String Can we see the results of attaching &debugQuery=on to the query? That often points out the issue. I'd expect this form to work: [
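The stack trace actually answers this: org.apache.jsp.admin.analysis_jsp shows the string was entered on the admin Analysis page, whose field-value boxes analyze a single value, so the whole bracketed range is handed to the tdate parser and fails. Range syntax is query syntax; it needs to go through an actual query, along the lines of:

http://localhost:8983/solr/select?q=mydatefield:[2006-12-22T00:00:00Z%20TO%202006-12-22T23:59:59Z]

with mydatefield replaced by the real tdate field and the spaces URL-encoded as shown.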
Re: Strategy --> Frequent updates in our application
Naveen, Solr does support incremental indexing. Solr currently doesn't make use of Lucene's NRT support, but that is starting to change. If you provide more specifics about issues you are having and your architecture, data and query volume, we may be able to help better. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message > From: Naveen Gupta > To: solr-user@lucene.apache.org > Sent: Thu, June 2, 2011 11:29:42 PM > Subject: Strategy --> Frequent updates in our application > > Hi > > We are having an application where every 10 mins, we are doing indexing of > users docs repository, and eventually, if some thread is being added in that > particular discussion, we need to index the thread again (please note we are > not doing blind indexing each time, we have various rules to filter out > which thread is new and thus that is a candidate for indexing plus new ones > which has arrived). > > So we are doing updates for each user docs repository .. the performance is > not looking so far very good. the future is that we are going to get hits in > volume(1000 to 10,000 hits per mins), so looking for strategy where we can > tune solr in order to index the data in real time > > and what about NRT, is it fine to apply in this case of scenario. i read > that solr NRT is not very good in performance, but i am not going to believe > it since it is one of the best open sources ..so it is going to have this > problem sorted in near future ..but if any benchmark is there, kindly share > with me ... we would like to analyze with our requirements. > > Is there any way to add incremental indexes which we generally find in other > search engine like endeca and etc? i don't know much in detail about solr... > since i am newbie, so can you please tell me if we can have some settings > which can keep track of incremental indexing? > > > Thanks > Naveen >
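On the commit-frequency side, solrconfig.xml's autoCommit is the usual starting point for batches arriving every few minutes; a sketch (the thresholds are arbitrary):

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>10000</maxDocs>  <!-- commit after this many pending docs -->
    <maxTime>60000</maxTime>  <!-- or after 60 seconds, whichever comes first -->
  </autoCommit>
</updateHandler>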
Re: Indexes in ramdisk don't show performance improvement?
Park, I think there is no way initial queries will be the same IF: * your index in ramfs is really in RAM * your index in regular FS is not already in RAM due to being previously cached (you *did* flush OS cache before the test, right?) Having said that, if you update your index infrequently and make use of warm up queries and cache warming, you are likely to be very fine with the index on disk. For example, we have a customer right now that we helped a bit with performance. They also have lots of RAM, 10M docs in the index, and replicate the whole optimized index nightly. They have 2 servers, each handling about 1000 requests per minute and their average response time is under 20 ms with pre-1.4.1 Solr and lots of facets and fqs (they use Solr not only for search, but also navigation). No ramfs involved, but they have zero disk reads because the whole index is cached in memory, so things are fast. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message > From: Parker Johnson > To: "solr-user@lucene.apache.org" > Sent: Thu, June 2, 2011 9:20:55 PM > Subject: Re: Indexes in ramdisk don't show performance improvement? > > > That¹s just the thing. Even the initial queries have similar response > times as the later ones. WEIRD! > > I was considering running from /dev/shm in production, but for slaves only > (master remains on disk). At this point though, I'm not seeing a benefit > to ramdisk so I think I'm going back to traditional disk so the indexes > stay intact after a power cycle. > > Has anyone else seen that indexes served from disk perform similarly as > indexes served from ramdisk? > > -Park > > On 6/2/11 4:15 PM, "Erick Erickson" wrote: > > >What I expect is happening is that the Solr caches are effectively making > >the > >two tests identical, using memory to hold the vital parts of the code in > >both > >cases (after disk warming on the instance using the local disk). I > >suspect if > >you measured the first few queries (assuming no auto-warming) you'd see > >the > >local disk version be slower. > > > >Were you running these tests for curiosity or is running from /dev/shm > >something > >you're considering for production? > > > >Best > >Erick > > > >On Thu, Jun 2, 2011 at 5:47 PM, Parker Johnson > >wrote: > >> > >> Hey everyone. > >> > >> Been doing some load testing over the past few days. I've been throwing > >>a > >> good bit of load at an instance of solr and have been measuring response > >> time. We're running a variety of different keyword searches to keep > >> solr's cache on its toes. > >> > >> I'm running two exact same load testing scenarios: one with indexes > >> residing in /dev/shm and another from local disk. The indexes are about > >> 4.5GB in size. > >> > >> On both tests the response times are the same. I wasn't expecting that. > >> I do see the java heap size grow when indexes are served from disk > >>(which > >> is expected). When the indexes are served out of /dev/shm, the java > >>heap > >> stays small. > >> > >> So in general is this consistent behavior? I don't really see the > >> advantage of serving indexes from /dev/shm. When the indexes are being > >> served out of ramdisk, is the linux kernel or the memory mapper doing > >> something tricky behind the scenes to use ramdisk in lieu of the java > >>heap? > >> > >> For what it is worth, we are running x_64 rh5.4 on a 12 core 2.27Ghz > >>Xeon > >> system with 48GB ram. > >> > >> Thoughts? > >> > >> -Park > >> > >> > >> > > > > >
Re: query routing with shards
Hi Dmitry (you may not want to additionally copy Yonik, he's subscribed to this list, too) It sounds like you have the knowledge of which query maps to which shard. If so, why not control/change the value of "shards" param in the request to your front-end Solr (aka distributed request dispatcher) within your app, which is the one calling Solr? Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message > From: Dmitry Kan > To: solr-user@lucene.apache.org; yo...@lucidimagination.com > Sent: Thu, June 2, 2011 7:00:53 AM > Subject: query routing with shards > > Hello all, > > We have currently several pretty fat logically isolated shards with the same > schema / solrconfig (indices are separate). We currently have one single > front end SOLR (1.4) for the client code calls. Since a client code query > usually hits only one shard, we are considering making a smart routing of > queries to the shards they map to. Can you please give some pointers as to > what would be an optimal way to achieve such a routing inside the front end > solr? Is there a way to configure mapping inside the solrconfig? > > Thanks. > > -- > Regards, > > Dmitry Kan >
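For illustration, the kind of request Otis is suggesting, with the application setting the shards parameter per query (host names here are made up):

    http://frontend.example.com:8080/solr/select?q=some+query&shards=shard1.example.com:8983/solr,shard2.example.com:8983/solr

Each shards entry is the host:port/path of a core; the front-end Solr fans the query out only to the shards listed, so per-query routing reduces to the application choosing that list.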
java.io.IOException: The specified network name is no longer available
Hi, I am using Solr 1.4.1, and at the time of updating the index I get the following error:

2011-06-03 05:54:06,943 ERROR [org.apache.solr.core.SolrCore] (http-10.38.33.146-8080-4) java.io.IOException: The specified network name is no longer available
 at java.io.RandomAccessFile.readBytes(Native Method)
 at java.io.RandomAccessFile.read(RandomAccessFile.java:322)
 at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.readInternal(SimpleFSDirectory.java:132)
 at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:157)
 at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
 at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:78)
 at org.apache.lucene.index.TermBuffer.read(TermBuffer.java:64)
 at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:129)
 at org.apache.lucene.index.SegmentTermEnum.scanTo(SegmentTermEnum.java:160)
 at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:232)
 at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:179)
 at org.apache.lucene.index.SegmentTermDocs.seek(SegmentTermDocs.java:57)
 at org.apache.lucene.index.IndexReader.termDocs(IndexReader.java:1103)
 at org.apache.lucene.index.SegmentReader.termDocs(SegmentReader.java:981)
 at org.apache.solr.search.SolrIndexReader.termDocs(SolrIndexReader.java:320)
 at org.apache.solr.search.SolrIndexSearcher.getDocSetNC(SolrIndexSearcher.java:640)
 at org.apache.solr.search.SolrIndexSearcher.getDocSet(SolrIndexSearcher.java:545)
 at org.apache.solr.search.SolrIndexSearcher.getDocSet(SolrIndexSearcher.java:581)
 at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:903)
 at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:884)
 at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:341)
 at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:182)
 at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
 at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
 at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:274)
 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:242)
 at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:275)
 at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at org.jboss.web.tomcat.security.SecurityAssociationValve.invoke(SecurityAssociationValve.java:181)
 at org.jboss.modcluster.catalina.CatalinaContext$RequestListenerValve.event(CatalinaContext.java:285)
 at org.jboss.modcluster.catalina.CatalinaContext$RequestListenerValve.invoke(CatalinaContext.java:261)
 at org.jboss.web.tomcat.security.JaccContextValve.invoke(JaccContextValve.java:88)
 at org.jboss.web.tomcat.security.SecurityContextEstablishmentValve.invoke(SecurityContextEstablishmentValve.java:100)
 at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
 at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
 at org.jboss.web.tomcat.service.jca.CachedConnectionValve.invoke(CachedConnectionValve.java:158)
 at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 at org.jboss.web.tomcat.service.request.ActiveRequestResponseCacheValve.invoke(ActiveRequestResponseCacheValve.java:53)
 at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:362)
 at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:877)
 at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:654)
 at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:951)
 at java.lang.Thread.run(Thread.java:619)

2011-06-03 05:54:06,943 INFO [org.apache.solr.core.SolrCore] (http-10.38.33.146-8080-4) [project_58787] webapp=/solr path=/select params={sort=revisionid_l+desc&start=0&q=type_s:IFCFileMaster+AND+modelversionid_l:(+8+7+)&wt=javabin&fq=reftable_s:IFCRELDEFINESBYPROPERTIES&version=1&rows=100} status=500 QTime=0

2011-06-03 05:54:06,990 ERROR [org.apache.solr.servlet.SolrDispatchFilter] (http-10.38.33.146-8080-4) java.io.IOException: The specified network name is no longer available
 at java.io.RandomAccessFile.readBytes(Native Method)
 at java.io.RandomAccessFile.read(RandomAccessFile.java:322)
 at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.readInternal(SimpleFSDirectory.java:132) a
different indexes for multitenant approach
Hi, I want to implement a per-tenant index strategy, where we keep an index for each tenant and maintain the indexes separately ...

first level of category -- company name
second level of category -- company name + fields to be indexed
further categories -- groups of different company names based on some heuristic (hashing), if it grows further

I want to do this in the same Solr instance. Is it possible? Thanks Naveen
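For what it's worth, the usual way to keep separate per-tenant indexes inside a single Solr instance is multicore; a minimal solr.xml sketch, with hypothetical core names mirroring the category levels above:

    <solr persistent="true">
      <cores adminPath="/admin/cores">
        <!-- first level: one core per company -->
        <core name="companyA" instanceDir="cores/companyA" />
        <!-- second level: company + field set -->
        <core name="companyA_orders" instanceDir="cores/companyA_orders" />
        <!-- further levels: hashed company groups -->
        <core name="group_0" instanceDir="cores/group_0" />
      </cores>
    </solr>

Each core gets its own conf/ and data/ under its instanceDir, so schemas and indexes stay fully isolated while sharing one JVM.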
Re: how to make getJson parameter dynamic
lee carroll: Sorry for this. I did it because I was not getting any response. Anyway, thanks for letting me know, and I have now found the solution to the above problem :) Now I am facing a very strange problem related to jQuery; can you please help me out?

$(document).ready(function(){
  $("#c2").click(function(){
    var q = getquerystring();
    $.getJSON("http://192.168.1.9:8983/solr/db/select/?wt=json&q=" + q + "&json.wrf=?", function(result){
      $.each(result.response.docs, function(i, item){
        alert(result.response.docs);
        alert(item.UID_PK);
      });
    });
  });
});

When I wrap the call in $("#c2").click(function(){...}), execution never enters $.getJSON(), but when I remove the click handler the code runs fine. Why is that? Please explain, because I want to get data from a text box on a click event and then display the response.

-
Thanks & Regards
Romi
--
View this message in context: http://lucene.472066.n3.nabble.com/how-to-make-getJson-parameter-dynamic-tp3014941p3018732.html
Sent from the Solr - User mailing list archive at Nabble.com.
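One possible explanation, not confirmed in the thread: if #c2 is a submit button inside a form, the default form submission reloads the page before the asynchronous request returns, which would produce exactly this symptom. A sketch of the same pattern with the default action suppressed; getquerystring() and UID_PK come from the message above, and preventDefault() is the speculative fix:

    $(document).ready(function(){
      $("#c2").click(function(event){
        event.preventDefault();  // assumption: stop a submit button from reloading the page
        var q = getquerystring();  // helper from the original message
        $.getJSON(
          "http://192.168.1.9:8983/solr/db/select/?wt=json&q=" + encodeURIComponent(q) + "&json.wrf=?",
          function(result){
            $.each(result.response.docs, function(i, item){
              alert(item.UID_PK);  // field name from the original message
            });
          });
      });
    });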
Re: How to display search results of solr in to other application.
$.getJSON(
  "http://[server]:[port]/solr/select/?jsoncallback=?",
  {"q": queryString,
   "version": "2.2",
   "start": "0",
   "rows": "10",
   "indent": "on",
   "json.wrf": "callbackFunctionToDoSomethingWithOurData",
   "wt": "json",
   "fl": "field1"}
);

Would you please explain what queryString and "json.wrf": "callbackFunctionToDoSomethingWithOurData" are? And what if I want to change my query string each time?

-
Thanks & Regards
Romi
--
View this message in context: http://lucene.472066.n3.nabble.com/How-to-display-search-results-of-solr-in-to-other-application-tp3014101p3018740.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: query routing with shards
Hi Otis, I merely followed Gmail's suggestion to include other people in the recipients list, and Yonik was the first one :) I won't do it next time. Thanks for the rapid reply.

The reason for doing this query routing is that we abstract the distributed SOLR away from the client code, both for security reasons (that is, we don't want to expose the entire shard farm to the world, only the front-end SOLR) and for better decoupling. Is it possible to implement a plugin for SOLR that would map queries to shards? We have other choices too, but they'll take quite some time, which is why I decided to ask quickly whether I was missing something in the design and configuration of SOLR's main components.

Dmitry

On Fri, Jun 3, 2011 at 8:25 AM, Otis Gospodnetic wrote:
> Hi Dmitry (you may not want to additionally copy Yonik, he's subscribed to
> this list, too)
>
> It sounds like you have the knowledge of which query maps to which shard.
> If so, why not control/change the value of "shards" param in the request
> to your front-end Solr (aka distributed request dispatcher) within your
> app, which is the one calling Solr?
>
> Otis
>
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>
> - Original Message
> > From: Dmitry Kan
> > To: solr-user@lucene.apache.org; yo...@lucidimagination.com
> > Sent: Thu, June 2, 2011 7:00:53 AM
> > Subject: query routing with shards
> >
> > Hello all,
> >
> > We have currently several pretty fat logically isolated shards with the
> > same schema / solrconfig (indices are separate). We currently have one
> > single front end SOLR (1.4) for the client code calls. Since a client
> > code query usually hits only one shard, we are considering making a
> > smart routing of queries to the shards they map to. Can you please give
> > some pointers as to what would be an optimal way to achieve such a
> > routing inside the front end solr? Is there a way to configure mapping
> > inside the solrconfig?
> >
> > Thanks.
> >
> > --
> > Regards,
> >
> > Dmitry Kan
>

--
Regards,

Dmitry Kan
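One configuration-only sketch (an assumption, not a tested recipe): request handlers in solrconfig.xml accept default parameters, so the front-end Solr could expose one handler per shard group with a fixed shards default, leaving the client to pick the handler path; the handler name and host below are made up:

    <requestHandler name="/route-groupA" class="solr.SearchHandler">
      <lst name="defaults">
        <!-- every query to /route-groupA fans out only to group A's shard -->
        <str name="shards">shard-a.example.com:8983/solr</str>
      </lst>
    </requestHandler>

This keeps the shard farm hidden behind the front end without any custom plugin code, at the cost of one handler per routing target.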
Re: How to display search results of solr in to other application.
Hi Romi

As I see it, you first need to understand how Ajax with jQuery works, then go for JSON, and then JSONP (if you are fetching from a different domain).

The query here is the dynamic query with which you hit Solr (it could be simple text, or a more advanced query string): http://wiki.apache.org/solr/CommonQueryParameters

The callback is the name of a method you define; after the response arrives, this method is called (the callback mechanism) with the response from Solr (in JSON format), and in it you show or analyze the response as your business need dictates.

Thanks
Naveen

On Fri, Jun 3, 2011 at 12:00 PM, Romi wrote:
> $.getJSON(
> "http://[server]:[port]/solr/select/?jsoncallback=?",
> {"q": queryString,
> "version": "2.2",
> "start": "0",
> "rows": "10",
> "indent": "on",
> "json.wrf": "callbackFunctionToDoSomethingWithOurData",
> "wt": "json",
> "fl": "field1"}
> );
>
> would you please explain what are queryString and "json.wrf":
> "callbackFunctionToDoSomethingWithOurData". and what if i want to change my
> query string each time.
>
> -
> Thanks & Regards
> Romi
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/How-to-display-search-results-of-solr-in-to-other-application-tp3014101p3018740.html
> Sent from the Solr - User mailing list archive at Nabble.com.
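Putting those pieces together, a minimal sketch of a dynamic JSONP request against Solr; the host, element IDs, and field name are placeholders, not from the thread:

    $(document).ready(function(){
      $("#searchButton").click(function(){
        var queryString = $("#queryBox").val();  // dynamic query read from a text box
        $.getJSON(
          "http://localhost:8983/solr/select/?json.wrf=?",  // jQuery replaces "?" with a generated callback name
          {"q": queryString, "wt": "json", "rows": "10"},
          function(result){
            // invoked with Solr's JSON response once the cross-domain request completes
            $.each(result.response.docs, function(i, doc){
              alert(doc.field1);
            });
          });
      });
    });

Because json.wrf names the wrapper function, Solr returns the JSON wrapped in a script call, which is what lets the browser fetch it from a different host than the one serving the page.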