Trouble handling Unit symbol

2012-03-30 Thread Rajani Maski
Hi,

We have data containing symbols such as: µ

Indexed data has - Dose:"0 µL"
Language type - "English"

Now, when it is searched as - Dose:"0 µL"
Number of documents matched = 0

Query Q value observed: S257:"0 µL/injection"

Any solution to handle such cases?

Thanks & Regards,
Rajani


Re: Trouble handling Unit symbol

2012-03-30 Thread Paul Libbrecht
Rajani,

you need to look at the analysis tools of solr-admin, or even luke, to help you.

paul


Le 30 mars 2012 à 10:01, Rajani Maski a écrit :

> Hi,
> 
> We have data containing symbols such as: µ
> 
> Indexed data has - Dose:"0 µL"
> Language type - "English"
> 
> Now, when it is searched as - Dose:"0 µL"
> Number of documents matched = 0
> 
> Query Q value observed: S257:"0 µL/injection"
> 
> Any solution to handle such cases?
> 
> Thanks & Regards,
> Rajani



Re: UTF-8 encoding

2012-03-30 Thread henri.gour...@laposte.net
Paul,

velocity.properties is set.
One thing I am not 100% sure about is where this file should reside.
I have placed it in the example/solr/conf/velocity folder (where the .vm
files reside).

Cheers,
Henri


--
View this message in context: 
http://lucene.472066.n3.nabble.com/UTF-8-encoding-tp3867885p3870398.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: query help

2012-03-30 Thread Ahmet Arslan
> b) As I have explained, I have a result set (documents) and each document
> contains a field "ad_text" (along with other fields) which is multivalued,
> storing some tags, say "B1, B2, B3", in each. But the order of the tags is
> different for each doc: (B1, B2, B3) for doc1, (B3, B1, B2) for doc2,
> (B1, B3, B2) for doc3, (B2, B3, B1) for doc4.
> 
> If I search for B1, the results should come in the order doc1, doc3, doc2,
> doc4 (as B1 is the first value in the multivalued field for doc1 and doc3,
> the 2nd value in doc2, and the 3rd in doc4).
> If I search for B2, the results should come in the order doc4, doc1, doc3,
> doc2.


I have an idea that could work. Please note that this is untested.

Insert an artificial token at index time, e.g. your multivalued field changes
from "B1, B2, B3" to "ARTIFICIALTOKEN, B1, B2, B3".

Then fire a query like &q=multivaluedField:"ARTIFICIALTOKEN B2"~100^10

Sorting by relevancy score (which is the default) should do the trick.

"It may be desirable to boost the score of documents with query terms that
appear closer together. Phrase queries with slop will score higher when the
terms are closer together."

http://wiki.apache.org/solr/SolrRelevancyCookbook#Term_Proximity
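A hedged sketch (untested against a real index; the doc ids and field values are illustrative) of why the artificial-token trick works: the sloppy phrase "ARTIFICIALTOKEN B2"~100 scores higher when fewer tokens separate the sentinel from the queried tag, so ranking by score mirrors ranking by the tag's position in the multivalued field. The simulation below uses `indexOf` in place of the phrase-slop distance:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TagOrderDemo {

    // Rank doc ids by how early `tag` appears after the sentinel token;
    // indexOf(tag) plays the role of the phrase-slop distance.
    public static List<String> rankByTag(Map<String, List<String>> docs, String tag) {
        List<String> ranked = new ArrayList<>(docs.keySet());
        ranked.sort(Comparator.comparingInt(id -> docs.get(id).indexOf(tag)));
        return ranked;
    }

    public static void main(String[] args) {
        Map<String, List<String>> docs = new LinkedHashMap<>();
        docs.put("doc1", Arrays.asList("ARTIFICIALTOKEN", "B1", "B2", "B3"));
        docs.put("doc2", Arrays.asList("ARTIFICIALTOKEN", "B3", "B1", "B2"));
        docs.put("doc3", Arrays.asList("ARTIFICIALTOKEN", "B1", "B3", "B2"));
        docs.put("doc4", Arrays.asList("ARTIFICIALTOKEN", "B2", "B3", "B1"));
        // Matches the order the original poster asked for when searching B1
        System.out.println(rankByTag(docs, "B1")); // [doc1, doc3, doc2, doc4]
    }
}
```

Note that ties (two docs with the tag at the same position) fall back to index order here, just as tied scores fall back to Solr's internal ordering.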



Re: SolrCloud

2012-03-30 Thread Erick Erickson
Zookeeper is the "meta data" repository. It's in charge of keeping the
state of the cluster, which machines are up/down, etc. It's also
where the bookkeeping for bringing on additional shards lives.

Best
Erick

On Fri, Mar 30, 2012 at 12:30 AM, asia  wrote:
> OK. Then what exactly does ZooKeeper do in SolrCloud? Why do we use it? I am
> getting query responses from both shards even without using ZooKeeper.
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/SolrCloud-tp3867086p3869896.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Quantiles in SOLR ???

2012-03-30 Thread Erick Erickson
I really don't understand what you're trying
to accomplish. What is a "quantities function"?
What do you want it to do?

Best
Erick

On Fri, Mar 30, 2012 at 2:50 AM, Kashif Khan  wrote:
> Hi all,
>
> I am doing R&D on whether SOLR has any quantiles function for a result set. I
> need a quick-start road map for adding a quantiles function to my SOLR
> plugin. I am thinking that it might require a third-party tool or library.
>
> --
> Kashif Khan
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Quantiles-in-SOLR-tp3870084p3870084.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Using jetty's GzipFilter in the example solr.war

2012-03-30 Thread mechravi25
Hi, 

I tried including the following filter tag inside the web.xml in
/apache-solr-1.5-dev/example/work/Jetty/webapp/WEB-INF:

<filter>
  <filter-name>GzipFilter</filter-name>
  <filter-class>org.mortbay.servlet.GzipFilter</filter-class>
  <init-param>
    <param-name>mimeTypes</param-name>
    <param-value>text/html,text/plain,text/xml,application/xhtml+xml,text/css,application/javascript,image/svg+xml</param-value>
  </init-param>
</filter>
<filter-mapping>
  <filter-name>GzipFilter</filter-name>
  <url-pattern>/*</url-pattern>
</filter-mapping>

This gives the following error again:

java.lang.ClassNotFoundException: org.mortbay.servlet.GzipFilter 
Failed startup of context
org.mortbay.jetty.webapp.WebAppContext@b8bef7{/solr,jar:file:/apache-solr-1.4.0/example/webapps/solr.war!/}
javax.servlet.UnavailableException: org.mortbay.servlet.GzipFilter 

Then I tried placing the jar containing org.mortbay.servlet.GzipFilter in all
the lib folders, but it results in the same exception. I also tried adding the
above filter tag to the web.xml present in the following path:
\apache-solr-1.4.0\src\webapp\web\WEB-INF

Finally, I tried adding the above tag to the web.xml inside the solr.war file
(after extracting the files) and then started the server again. But it results
in the same error.

Can you tell me where I'm going wrong? Can you guide me on this?

Thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Using-jetty-s-GzipFilter-in-the-example-solr-war-tp1894069p3870625.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr advanced boosting

2012-03-30 Thread mads
Hi Martin,

Thanks for the reply. I have been reading a bit about the way you suggest,
so I think I will try to look further into it, unless anybody else has
comments or ideas.

Thanks again.
Regards Mads

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-advanced-boosting-tp3867025p3870661.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Using jetty's GzipFilter in the example solr.war

2012-03-30 Thread Dmitry Kan
You didn't say which Jetty version you are using. This blog post explains how
to configure the filter depending on the version:

http://blog.max.berger.name/2010/01/jetty-7-gzip-filter.html

On Fri, Mar 30, 2012 at 3:00 PM, mechravi25  wrote:

> Hi,
>
> I tried including the following filter tag inside the web.xml in
> /apache-solr-1.5-dev/example/work/Jetty/webapp/WEB-INF:
>
> <filter>
>   <filter-name>GzipFilter</filter-name>
>   <filter-class>org.mortbay.servlet.GzipFilter</filter-class>
>   <init-param>
>     <param-name>mimeTypes</param-name>
>     <param-value>text/html,text/plain,text/xml,application/xhtml+xml,text/css,application/javascript,image/svg+xml</param-value>
>   </init-param>
> </filter>
> <filter-mapping>
>   <filter-name>GzipFilter</filter-name>
>   <url-pattern>/*</url-pattern>
> </filter-mapping>
>
> This gives the following error again:
>
> java.lang.ClassNotFoundException: org.mortbay.servlet.GzipFilter
> Failed startup of context
> org.mortbay.jetty.webapp.WebAppContext@b8bef7
> {/solr,jar:file:/apache-solr-1.4.0/example/webapps/solr.war!/}
> javax.servlet.UnavailableException: org.mortbay.servlet.GzipFilter
>
> Then I tried placing the jar containing org.mortbay.servlet.GzipFilter in all
> the lib folders, but it results in the same exception. I also tried adding the
> above filter tag to the web.xml present in the following path:
> \apache-solr-1.4.0\src\webapp\web\WEB-INF
>
> Finally, I tried adding the above tag to the web.xml inside the solr.war file
> (after extracting the files) and then started the server again. But it results
> in the same error.
>
> Can you tell me where I'm going wrong? Can you guide me on this?
>
> Thanks.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Using-jetty-s-GzipFilter-in-the-example-solr-war-tp1894069p3870625.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Regards,

Dmitry Kan


Re: Solr advanced boosting

2012-03-30 Thread Jamie Johnson
Another option is to use something like edismax:
http://wiki.apache.org/solr/ExtendedDisMax. You simply set your qf
to something like title^10 brand^5 description^1 and then sort
by price/discount, i.e. sort=price asc, discount desc

On Fri, Mar 30, 2012 at 8:17 AM, mads  wrote:
> Hi Martin,
>
> Thanks for the reply. I have been reading a bit about the way you suggest,
> so I think I will try to look further into it, unless anybody else has
> comments or ideas.
>
> Thanks again.
> Regards Mads
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-advanced-boosting-tp3867025p3870661.html
> Sent from the Solr - User mailing list archive at Nabble.com.
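As a rough sketch of Jamie's suggestion, the qf weighting and sort could be set as defaults on a request handler; the handler name "/browse_products" below is an illustrative assumption, and the field names come from his example:

```xml
<!-- Hypothetical solrconfig.xml fragment; handler name is illustrative -->
<requestHandler name="/browse_products" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- weight title matches highest, then brand, then description -->
    <str name="qf">title^10 brand^5 description^1</str>
    <!-- override relevance ordering with business fields -->
    <str name="sort">price asc, discount desc</str>
  </lst>
</requestHandler>
```

The same parameters can of course be passed per-request instead of baked into the handler.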


Re: SolrCloud

2012-03-30 Thread Mark Miller
If you want to be able to continue when a node goes down, you need at least one 
replica for each shard. Then if a node goes down, the replica will continue 
serving requests for that shard.

If you have no replicas and a node goes down, requests would return only 
partial results! We will support this in the future, with a warning in the 
returned header that the full results were not available, but currently we do 
not.

On Mar 30, 2012, at 12:30 AM, asia wrote:

> OK. Then what exactly does ZooKeeper do in SolrCloud? Why do we use it? I am
> getting query responses from both shards even without using ZooKeeper.
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/SolrCloud-tp3867086p3869896.html
> Sent from the Solr - User mailing list archive at Nabble.com.

- Mark Miller
lucidimagination.com


Re: [Announce] Solr 4.0 with RankingAlgorithm 1.4.1, NRT now supports both RankingAlgorithm and Lucene

2012-03-30 Thread Nagendra Nagarajayya
The NRT implementation, which is different from the soft commit
implementation, is being contributed back to the Solr source. This should
happen any time now.


RA is closed source, so I am not sure how it could be contributed or made
available as a module. I will explore this option.


Regards,

Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org


On 3/29/2012 9:38 PM, William Bell wrote:

Why don't you contribute RA to the source so that it becomes a
feature/module inside SOLR?

On Thu, Mar 29, 2012 at 8:32 AM, Nagendra Nagarajayya
  wrote:

It is from build 2012-03-19 from the trunk (part of the email). No fork.


Regards,

Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org

On 3/29/2012 7:20 AM, Bernd Fehling wrote:

Nothing against RankingAlgorithm and your work, which sounds great, but
I think that YOUR "Solr 4.0" might confuse some Solr users and/or newbies.
As far as I know, the next official release will be 3.6.

So is your "Solr 4.0" a trunk snapshot, or what?

If so, which revision number?

Or have you done a fork and produced a stable Solr 4.0 of your own?

Regards
Bernd


Am 29.03.2012 15:49, schrieb Nagendra Nagarajayya:

I am very excited to announce the availability of Solr 4.0 with
RankingAlgorithm 1.4.1 (NRT support) (build 2012-03-19). The NRT
implementation
now supports both RankingAlgorithm and Lucene.

RankingAlgorithm 1.4.1 has improved performance over the earlier release
(1.4), supports the entire Lucene query syntax including and/or boolean
queries, and is compatible with the new Lucene 4.0 API.

You can get more information about NRT performance from here:
http://solr-ra.tgels.org/wiki/en/Near_Real_Time_Search_ver_4.x

You can download Solr 4.0 with RankingAlgorithm 1.4.1 from here:
http://solr-ra.tgels.org

Please download and give the new version a try.

Regards,

Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org



Solr 3.1 JDBC DataImportHandler doesn't work with Tomcat v7.0.23

2012-03-30 Thread Bill Kratochvil
The DataImportHandler stopped working for our JDBC connection to SQL Server 
after deploying to a new production environment.

When querying the import status, it would just report "busy". We attached the 
SQL Profiler to our SQL Server and saw that when an import was requested it 
touched the SQL Server to do an authentication test (can't recall the exact 
message), so we knew the connection strings and configuration were correct, but 
that is where communications ended.

We attached the SQL Profiler to a functioning "development" environment and 
found that after initiating an import there was a lot more activity after the 
authentication line. This helped us isolate the problem to environment versus 
configuration. Note: we thought perhaps it was related to 32-bit versus 64-bit 
Java environments (as these differed also), but after installing consistent Java 
environments we continued to have the problem.

We rolled Tomcat back to v7.0.19, to be consistent with our other production 
environments, and our imports started functioning again.


Position Solr results

2012-03-30 Thread Manuel Antonio Novoa Proenza

Hi,

I'm not good with English, so I had to resort to a translator.

I have the following question: how can I get the position at which a certain
website appears in the Solr results generated for a given search query?

Regards,
ManP






10mo. ANIVERSARIO DE LA CREACION DE LA UNIVERSIDAD DE LAS CIENCIAS 
INFORMATICAS...
CONECTADOS AL FUTURO, CONECTADOS A LA REVOLUCION

http://www.uci.cu
http://www.facebook.com/universidad.uci
http://www.flickr.com/photos/universidad_uci

Re: Solr adding and commit

2012-03-30 Thread Erick Erickson
Commit as rarely as possible. But let's be clear about what "commit" means:
I'm talking about an actual call to _server.commit(), as opposed to
_server.add(List). I don't issue an explicit commit until all the documents
are indexed; I just rely on commitWithin to keep things flowing.

I'm guessing you're talking about SolrJ here, BTW.

By and large I prefer sending fewer packets with lots of docs in them; I
think 5-10K docs per packet is fine. You probably want to set the commitWithin
variable to some rather long period (I often use 10 minutes or even longer)
to limit how often commits happen.

But really the packet size is irrelevant if your parsing/etc. process isn't
keeping Solr busy. Throw a perf monitor on it (or just use top or similar)
and see if you're pegging the CPU before worrying about tweaking your
packet size, IMO...

Best
Erick

On Fri, Mar 30, 2012 at 1:00 PM, Daniel Persson  wrote:
> Hi Solr users.
>
> I've been using solr for a while now and got really good search potential.
>
> The last assignment was to take a really large dataset in files and load it
> to solr for searches.
>
> My solution was to build a tool that loads information with about 20
> threads reading data and submitting, because the data needs some
> preprocessing and the parsing isn't simple either.
> At the moment each thread creates up to 5000 documents with a size of about
> 1 KB, adds them to the server and commits the data. So small documents, but
> about 30 million of them :)
>
> So my question to you is: what is the best practice for loading Solr with
> small documents?
> Should I send larger chunks and commit less often, or smaller chunks and
> commit more often?
> The load is done to a separate core that isn't in use, so there is no issue
> of waiting for commits.
>
> Eager to hear your suggestions.
>
> best regards
>
> Daniel
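A minimal sketch of the batching pattern Erick describes: accumulate documents into large batches, flush each batch with a long commitWithin, and issue one explicit commit at the end. The actual SolrJ calls are left as comments so nothing here is presented as a tested integration; only the flush logic is exercised:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchLoader {
    static final int BATCH_SIZE = 5000;                   // docs per packet
    static final int COMMIT_WITHIN_MS = 10 * 60 * 1000;   // 10 minutes

    static int flushes = 0; // counts packets sent, for demonstration only

    static void flush(List<String> batch) {
        // In real code: server.add(toSolrDocs(batch), COMMIT_WITHIN_MS);
        flushes++;
        batch.clear();
    }

    // Returns the number of packets that would be sent to Solr.
    public static int load(Iterable<String> rawRecords) {
        flushes = 0;
        List<String> batch = new ArrayList<>();
        for (String rec : rawRecords) {
            batch.add(rec); // preprocessing/parsing would happen here
            if (batch.size() >= BATCH_SIZE) flush(batch);
        }
        if (!batch.isEmpty()) flush(batch); // final partial batch
        // In real code: server.commit() once, after everything is added.
        return flushes;
    }

    public static void main(String[] args) {
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 12000; i++) records.add("doc" + i);
        System.out.println(load(records)); // 12000 docs -> prints 3
    }
}
```

With one such loader per indexing thread, the explicit commit belongs in the coordinating code after all threads finish.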


index the links having a certain website

2012-03-30 Thread Manuel Antonio Novoa Proenza

Hello,

I'm not good with English, so I had to resort to a translator.

I have the following question: how can I index the links contained in a
certain website?

Regards,
ManP



10mo. ANIVERSARIO DE LA CREACION DE LA UNIVERSIDAD DE LAS CIENCIAS 
INFORMATICAS...
CONECTADOS AL FUTURO, CONECTADOS A LA REVOLUCION

http://www.uci.cu
http://www.facebook.com/universidad.uci
http://www.flickr.com/photos/universidad_uci

Re: Quantiles in SOLR ???

2012-03-30 Thread Walter Underwood
Quantiles require accessing the entire list of results, or at least, sorting by 
the interesting values, checking the total hits, then accessing the results 
list at the desired interval. So, with 3000 hits, get deciles by getting the 
first row, then the 301st row, the 601st row, etc.

This might be slow and require a lot of memory. Solr is optimized for showing 
the top few results. Anything that accesses all results or even requests rows 
far down the list can be very slow.

If your main use requires accessing all results, Solr may not be the right 
choice. A relational database is designed for efficient operations over the 
entire set of results.

wunder

On Mar 29, 2012, at 11:50 PM, Kashif Khan wrote:

> Hi all,
> 
> I am doing R&D on whether SOLR has any quantiles function for a result set. I
> need a quick-start road map for adding a quantiles function to my SOLR
> plugin. I am thinking that it might require a third-party tool or library.
> 
> --
> Kashif Khan
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Quantiles-in-SOLR-tp3870084p3870084.html
> Sent from the Solr - User mailing list archive at Nabble.com.
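Walter's decile recipe above can be sketched as a small helper that computes which 1-based row offsets to request; each offset would then become a Solr query like `&start=<row-1>&rows=1`, sorted by the field of interest. This is an illustrative sketch, not Solr API code:

```java
import java.util.ArrayList;
import java.util.List;

public class QuantileRows {
    // 1-based row offsets marking each quantile boundary:
    // for 3000 hits and deciles, the 1st, 301st, 601st, ... rows.
    public static List<Integer> rows(int numFound, int quantiles) {
        List<Integer> out = new ArrayList<>();
        int step = numFound / quantiles;
        for (int i = 0; i < quantiles; i++) {
            out.add(i * step + 1);
        }
        return out;
    }

    public static void main(String[] args) {
        // 1-based offsets: 1, 301, 601, ..., 2701
        System.out.println(rows(3000, 10));
    }
}
```

This still costs one request per quantile and forces a deep sort, which is exactly why Walter warns it may be slow on large result sets.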


Re: Quantiles in SOLR ???

2012-03-30 Thread Erick Erickson
Duh. quantiles. I was reading quantities. Old eyes.

On Fri, Mar 30, 2012 at 1:44 PM, Walter Underwood  wrote:
> Quantiles require accessing the entire list of results, or at least, sorting 
> by the interesting values, checking the total hits, then accessing the 
> results list at the desired interval. So, with 3000 hits, get deciles by 
> getting the first row, then the 301st row, the 601st row, etc.
>
> This might be slow and require a lot of memory. Solr is optimized for showing 
> the top few results. Anything that accesses all results or even requests rows 
> far down the list can be very slow.
>
> If your main use requires accessing all results, Solr may not be the right 
> choice. A relational database is designed for efficient operations over the 
> entire set of results.
>
> wunder
>
> On Mar 29, 2012, at 11:50 PM, Kashif Khan wrote:
>
>> Hi all,
>>
>> I am doing R&D on whether SOLR has any quantiles function for a result set. I
>> need a quick-start road map for adding a quantiles function to my SOLR
>> plugin. I am thinking that it might require a third-party tool or library.
>>
>> --
>> Kashif Khan
>>
>> --
>> View this message in context: 
>> http://lucene.472066.n3.nabble.com/Quantiles-in-SOLR-tp3870084p3870084.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>
>
>
>


Re: Custom scoring question

2012-03-30 Thread Tomás Fernández Löbbe
But if you have that "score" in a field, you could use the field as part
of a function query instead of sorting on it directly; that would mix this
"score" with the score calculated from the other fields.

On Thu, Mar 29, 2012 at 5:49 PM, Darren Govoni  wrote:

> Yeah, I guess that would work. I wasn't sure if it would change relative
> to other documents. But if it were to be combined with other fields,
> that approach may not work because the calculation wouldn't include the
> scoring for other parts of the query. So then you have the dynamic score
> and the question of what to do with it.
>
> On Thu, 2012-03-29 at 16:29 -0300, Tomás Fernández Löbbe wrote:
> > Can't you simply calculate that at index time and assign the result to a
> > field, then sort by that field.
> >
> > On Thu, Mar 29, 2012 at 12:07 PM, Darren Govoni 
> wrote:
> >
> > > I'm going to try index time per-field boosting and do the boost
> > > computation at index time and see if that helps.
> > >
> > > On Thu, 2012-03-29 at 10:08 -0400, Darren Govoni wrote:
> > > > Hi,
> > > >  I have a situation I want to re-score document relevance.
> > > >
> > > > Let's say I have two fields:
> > > >
> > > > text: The quick brown fox jumped over the white fence.
> > > > terms: fox fence
> > > >
> > > > Now my queries come in as:
> > > >
> > > > terms:[* TO *]
> > > >
> > > > and Solr scores them on that field.
> > > >
> > > > What I want is to rank them according to the distribution of field
> > > > "terms" within field "text". Which is a per document calculation.
> > > >
> > > > Can this be done with any kind of dismax? I'm not searching for known
> > > > terms at query time.
> > > >
> > > > If not, what is the best way to implement a custom scoring handler to
> > > > perform this calculation and re-score/sort the results?
> > > >
> > > > thanks for any tips!!!
> > > >
> > >
> > >
> > >
>
>
>
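Concretely, Tomás's suggestion of mixing a precomputed per-document score into the relevance score might look like the following edismax request, where the field name `doc_score` is a hypothetical stand-in for wherever the index-time calculation is stored:

```
# Hypothetical query: multiply the text-relevance score by a stored field
# (boost is edismax's multiplicative function boost; field() reads the value)
q=terms:[* TO *]&defType=edismax&boost=field(doc_score)
```

Unlike sorting on the field, this keeps the contribution of the other query clauses in the final ranking, which was Darren's concern.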


Re: Unload(true) doesn't delele Index file when unloading a core

2012-03-30 Thread vybe3142
Thanks, good to know. I'll program around this.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Unload-true-doesn-t-delele-Index-file-when-unloading-a-core-tp3862816p3872022.html
Sent from the Solr - User mailing list archive at Nabble.com.


Sorting each group using different fields

2012-03-30 Thread prakash.balaji
We have one single query which returns products belonging to multiple
subcategories, and we group the products by subcategory. The requirement we
have now is to sort the products within each subcategory using that
subcategory's own sort order. We can't use a single field to sort because the
same product can be available in multiple subcategories. There is an example
below to explain our use case; any help would be much appreciated.

Jeans
   skinny - P1, P2 , P3
   bootcut - P3,P4,P5

As you can see, P3 is shared between both subcategories, but it appears last
in skinny and first in bootcut.

What we have is a dynamic sort order on each product

P1 - skinny_sort_order = 1
P2 - skinny_sort_order = 2
P3 - skinny_sort_order = 3, bootcut_sort_order = 1
P4 - bootcut_sort_order = 2
P5 - bootcut_sort_order = 3

group.query={cateogory:skinny}&group.sort=skinny_sort_order&group.query={category:bootcut}&group.sort=bootcut_sort_order
 

is not giving us the result, as it tries to sort on a combination of the two
sort-order fields.

Running these as separate queries would solve the issue, but we would be
firing n queries, one per subcategory. Moreover, multiple queries are not an
option for us because we retrieve facet information as part of this query,
since it looks at all the documents in the result set.

Thanks,
Prakash



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Sorting-each-group-using-different-fields-tp3871997p3871997.html
Sent from the Solr - User mailing list archive at Nabble.com.


How do I use localparams/joins using SolrJ and/or the Admin GUI

2012-03-30 Thread vybe3142
Here's a JOIN query using local params that I can successfully execute in a
browser window:


When I paste the relevant part of the query into the Solr admin UI query
interface, {!join+from=join_id+to=id}attributes_AUTHORS.4:4, I fail to
retrieve any documents.

The query translates to:




Server stack trace


There appears to be some sort of URL translation going on, but I'd like to
understand how to make it work.

Next, I'd like to understand how to make this work with SolrJ.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-do-I-use-localparams-joins-using-SolrJ-and-or-the-Admin-GUI-tp3872088p3872088.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: pattern error in PatternReplaceCharFilterFactory

2012-03-30 Thread Chris Hostetter

: This works. Other patterns tried were: \p{InLatin-1_Supplement} or \p{Latin}
: These throw an exception, from the log:
: ***
: Mar 29, 2012 5:56:45 PM org.apache.solr.common.SolrException log
: SEVERE: null:org.apache.solr.common.SolrException: Plugin init failure for
: [schema.xml] fieldType:Plugin init failure for [schema.xml]
: analyzer/charFilter:Configuration Error: 'pattern' can not be parsed in
: org.apache.solr.analysis.PatternReplaceCharFilterFactory

Immediately below that there should have been more details on the error 
generated by the Java regex engine when trying to parse your pattern 
(something like "Caused by: ..."), which is fairly crucial to understanding 
what might be going wrong.

: Can anybody help? Or, might this be a java issue?

I suspect it's a Java issue ... you didn't mention which version of Java 
you are using, and I don't know which Java versions correspond to which 
Unicode versions in terms of the block names they support, but is it 
possible some of those patterns are only legal in a newer version of Java 
than the one you have?

Have you tried running a simple little Java main() to verify that those 
patterns are legal in your JVM?

import java.util.regex.Pattern;

public final class PatTest {
  public static void main(String[] args) {
    String pat = args[0];
    String input = args[1];
    Pattern p = Pattern.compile(pat);
    System.out.println(input +
        (p.matcher(input).matches() ? " does match " : " does NOT match ") +
        pat);
  }
}


-Hoss


Re: Trouble handling Unit symbol

2012-03-30 Thread Chris Hostetter

: We have data containing symbols such as: µ
: Indexed data has - Dose:"0 µL"
: Now, when it is searched as - Dose:"0 µL"
...
: Query Q value observed: S257:"0 µL/injection"

First off: your "when searched as" example does not match up with your 
"Query Q" observed value (i.e. different field names, extra "/injection" text 
at the end), suggesting that you maybe cut/pasted something you didn't mean to 
-- so take the rest of this advice with a grain of salt.

If I ignore your "when it is searched as" example and focus entirely on 
what you say you've indexed the data as, and the Q value you are seeing (in 
what looks like the echoParams output), then the first thing that jumps out 
at me is that it looks like your servlet container (or perhaps your web 
browser, if that's where you tested this) is not dealing with the unicode 
correctly -- because although I see a "µ" in the first three lines I 
quoted above (UTF8: 0xC2 0xB5), in your observed value I'm seeing it 
preceded by a "Â" (UTF8: 0xC3 0x82) ... suggesting that perhaps the "µ" 
did not get URL encoded properly when the request was made to your servlet 
container?

In particular, you might want to take a look at...

https://wiki.apache.org/solr/FAQ#Why_don.27t_International_Characters_Work.3F
http://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config
The example/exampledocs/test_utf8.sh script included with Solr




-Hoss
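The Â-prefix Hoss points out is the classic symptom of UTF-8 bytes being decoded as Latin-1 somewhere in the request path. A small self-contained demonstration:

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encode as UTF-8, then (wrongly) decode those bytes as ISO-8859-1,
    // reproducing what a mis-configured container does to the query string.
    public static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "µ" is 0xC2 0xB5 in UTF-8; read as Latin-1 it becomes "Â" + "µ"
        System.out.println(misdecode("\u00b5L")); // prints "µL"
    }
}
```

Seeing exactly this "µ" pattern in echoParams output is strong evidence the bytes reached Solr intact but were decoded with the wrong charset.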

Distributed grouping issue

2012-03-30 Thread Young, Cody
Hi All,

I'm having an issue getting distributed grouping working on trunk (Mar 29,
2012).

If I send this query:

http://localhost:8086/solr/core0/select/?q=*:*&group=false 
&shards=localhost:8086/solr/core0,localhost:8086/solr/core1,localhost:8086/solr/core2,localhost:8086/solr/core3,localhost:8086/solr/core4,localhost:8086/solr/core5,localhost:8086/solr/core6,localhost:8086/solr/core7

I get 260,000 results. As soon as I change to using grouping:

http://localhost:8086/solr/core0/select/?q=*:*&group=true&group.field=group_field&shards=localhost:8086/solr/core0,localhost:8086/solr/core1,localhost:8086/solr/core2,localhost:8086/solr/core3,localhost:8086/solr/core4,localhost:8086/solr/core5,localhost:8086/solr/core6,localhost:8086/solr/core7

I only get 32,000 results. (the number of documents in a single core.)

The field that I am grouping on is defined as:





The document id:






document_id

Anyone else experiencing this? Any ideas?

Thanks,
Cody


RE: Distributed grouping issue

2012-03-30 Thread Young, Cody
I forgot to mention, I can see the distributed requests happening in the logs:

Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core2] webapp=/solr path=/select 
params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core2&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true}
 status=0 QTime=2
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core4] webapp=/solr path=/select 
params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core4&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true}
 status=0 QTime=1
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core1] webapp=/solr path=/select 
params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core1&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true}
 status=0 QTime=1
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core3] webapp=/solr path=/select 
params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core3&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true}
 status=0 QTime=1
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core0] webapp=/solr path=/select 
params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core0&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true}
 status=0 QTime=1
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core6] webapp=/solr path=/select 
params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core6&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true}
 status=0 QTime=0
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core7] webapp=/solr path=/select 
params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core7&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true}
 status=0 QTime=3
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core5] webapp=/solr path=/select 
params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core5&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true}
 status=0 QTime=1
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core4] webapp=/solr path=/select 
params={distrib=false&group.distributed.second=true&wt=javabin&version=2&rows=10&group.topgroups.group_field=4183765296&group.topgroups.group_field=4608765424&group.topgroups.group_field=3524954944&group.topgroups.group_field=4182445488&group.topgroups.group_field=4213143392&group.topgroups.group_field=4328299312&group.topgroups.group_field=4206259648&group.topgroups.group_field=3465497912&group.topgroups.group_field=3554417600&group.topgroups.group_field=3140802904&fl=document_id,score&shard.url=localhost:8086/solr/core4&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true}
 status=0 QTime=2
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core6] webapp=/solr path=/select 
params={distrib=false&group.distributed.second=true&wt=javabin&version=2&rows=10&group.topgroups.group_field=4183765296&group.topgroups.group_field=4608765424&group.topgroups.group_field=3524954944&group.topgroups.group_field=4182445488&group.topgroups.group_field=4213143392&group.topgroups.group_field=4328299312&group.topgroups.group_field=4206259648&group.topgroups.group_field=3465497912&group.topgroups.group_field=3554417600&group.topgroups.group_field=3140802904&fl=document_id,score&shard.url=localhost:8086/solr/core6&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true}
 status=0 QTime=2
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core4] webapp=/solr path=/select 
params={NOW=1333151353217&shard.url=localhost:8086/solr/core4&ids=4182445488-535180165,3554417600-527549713,4608765424-526014561,3524954944-531590393,4183765296-514134497,4206259648-530219973,3465497912-534955957,4213143392-534186349,3140802904-538688961,4328299312-533482537&q=*:*&distrib=false&group.field=group_field&wt=javabin&isShard=true&version=2&rows=10}
 status=0 QTime=5
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core0] webapp=/solr path=/select/ 
params={shards=localhost:8086/solr/core0,localhost:8086/solr/core1,localhost:8086/solr/core2,localhost:8086/solr/core3,localhost:8086/solr/core4,localhost:8086

Re: Slow first searcher with facet on bibliographic data in Master - Slave

2012-03-30 Thread Chris Hostetter
: I do have a firstSearcher, but currently coldSearcher is set to true. 
: But doesn't this just mean that that any searches will block while the 
: first searcher is running? This is how the comment describes first 
: searcher. It would almost give the same effect; that some searches take 
: a long time.
: 
: What I am looking for is after receiving replicated data, do first 
: searcher and then switch to new index.

"firstSearcher" is literally the very first searcher used when the 
SolrCore is loaded -- it is *NOT* the first searcher after replication, 
those are "newSearcher" instances.


-Hoss
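To warm the searcher that is opened after each replication, a "newSearcher" listener can be registered in solrconfig.xml. A sketch of the standard QuerySenderListener configuration follows; the facet field name is a placeholder, not taken from the original poster's schema:

```xml
<!-- solrconfig.xml: warm every new searcher (including the one opened
     after replication) by replaying an expensive facet query.
     "author" is a placeholder; substitute your real facet fields. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="facet">true</str>
      <str name="facet.field">author</str>
    </lst>
  </arr>
</listener>
```

With this in place, the slave pays the facet-warming cost inside the listener before the new searcher is registered, instead of on the first user query.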


Large Index and OutOfMemoryError: Map failed

2012-03-30 Thread Gopal Patwa
I need help!!

I am using a Solr 4.0 nightly build with NRT, and I often get this error
during auto commit: "java.lang.OutOfMemoryError: Map failed". I have
searched this forum, and what I found suggests it is related to the OS
ulimit settings; please see my ulimit settings below. I am not sure what
ulimit values I should use. We also get "java.net.SocketException: Too
many open files", and I am not sure how many open files we need to
allow.


I have 3 cores with index sizes: core1 - 70GB, core2 - 50GB, and core3 -
15GB, each with a single shard.


We update the index every 5 seconds, soft commit every 1 second, and
hard commit every 15 minutes.
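For reference, that commit schedule would look roughly like this in solrconfig.xml for Solr 4.x (a sketch of the standard autoCommit/autoSoftCommit syntax, not the poster's actual config):

```xml
<!-- Sketch of the described schedule: hard commit every 15 minutes,
     soft commit every 1 second. openSearcher=false keeps hard commits
     from opening a new searcher, since soft commits already handle visibility. -->
<autoCommit>
  <maxTime>900000</maxTime>        <!-- 15 minutes, in milliseconds -->
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>1000</maxTime>          <!-- 1 second -->
</autoSoftCommit>
```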


Environment: JBoss 4.2, JDK 1.6, CentOS, JVM heap size = 24GB


ulimit:

core file size  (blocks, -c) 0
data seg size   (kbytes, -d) unlimited
scheduling priority (-e) 0
file size   (blocks, -f) unlimited
pending signals (-i) 401408
max locked memory   (kbytes, -l) 1024
max memory size (kbytes, -m) unlimited
open files  (-n) 1024
pipe size(512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority  (-r) 0
stack size  (kbytes, -s) 10240
cpu time   (seconds, -t) unlimited
max user processes  (-u) 401408
virtual memory  (kbytes, -v) unlimited
file locks  (-x) unlimited
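The "open files (-n) 1024" value above is very low for large MMapDirectory-backed indexes, and "Map failed" is often a memory-map limit rather than a heap issue. A sketch of how to inspect and raise the relevant limits follows; the exact values and file locations are assumptions to verify for your distro, and the raising commands are left commented because they need root:

```shell
# Check current limits (read-only; safe to run as any user)
ulimit -n                                   # open file descriptors; 1024 is far too low here
cat /proc/sys/vm/max_map_count 2>/dev/null  # max mmap regions per process (Linux only)

# Possible remedies (assumptions -- adjust values for your workload; require root):
#   ulimit -n 65536                         # per-shell, in the script that starts JBoss
#   sysctl -w vm.max_map_count=262144       # raise mmap region limit for MMapDirectory
#   In /etc/security/limits.conf, for a persistent per-user limit:
#     solr  hard  nofile  65536
```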


ERROR:

2012-03-29 15:14:08,560 [] priority=ERROR app_name= thread=pool-3-thread-1 location=CommitTracker line=93 auto commit error...:java.io.IOException: Map failed
	at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:748)
	at org.apache.lucene.store.MMapDirectory$MMapIndexInput.<init>(MMapDirectory.java:293)
	at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:221)
	at org.apache.lucene.codecs.lucene40.Lucene40PostingsReader.<init>(Lucene40PostingsReader.java:58)
	at org.apache.lucene.codecs.lucene40.Lucene40PostingsFormat.fieldsProducer(Lucene40PostingsFormat.java:80)
	at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader$1.visitOneFormat(PerFieldPostingsFormat.java:189)
	at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$VisitPerFieldFile.<init>(PerFieldPostingsFormat.java:280)
	at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader$1.<init>(PerFieldPostingsFormat.java:186)
	at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader.<init>(PerFieldPostingsFormat.java:186)
	at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat.fieldsProducer(PerFieldPostingsFormat.java:256)
	at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:108)
	at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:51)
	at org.apache.lucene.index.IndexWriter$ReadersAndLiveDocs.getReader(IndexWriter.java:494)
	at org.apache.lucene.index.BufferedDeletesStream.applyDeletes(BufferedDeletesStream.java:214)
	at org.apache.lucene.index.IndexWriter.applyAllDeletes(IndexWriter.java:2939)
	at org.apache.lucene.index.IndexWriter.maybeApplyDeletes(IndexWriter.java:2930)
	at org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2681)
	at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2804)
	at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2786)
	at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:391)
	at org.apache.solr.update.CommitTracker.run(CommitTracker.java:197)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.OutOfMemoryError: Map failed
	at sun.nio.ch.FileChannelImpl.map0(Native Method)
	at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:745)
	... 28 more



SolrConfig.xml:



false
10
2147483647
1
4096