Re: Multi-words synonyms matching

2012-04-11 Thread elisabeth benoit
> Have you tried the "=>" mapping instead? Something
> like
> hotel de ville => mairie
> might work for you.
>
> Best
> Erick
>
> On Tue, Apr 10, 2012 at 1:41 AM, elisabeth benoit
>  wrote:
> > Hello,
> >
> > I've read several post on this issue, but can't find a real solution to
> my
> > multi-words synonyms matching problem.
> >
> > I have in my synonyms.txt an entry like
> >
> > mairie, hotel de ville
> >
> > and my index time analyzer is configured as followed for synonyms.
> >
> > <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> > ignoreCase="true" expand="true"/>
> >
> > The problem I have is that now "mairie" matches with "hotel" and I would
> > only want "mairie" to match with "hotel de ville" and "mairie".
> >
> > When I look into the analyzer, I see that "mairie" is mapped into
> "hotel",
> > and words "de ville" are added in second and third position. To change
> > that, I tried to do
> >
> > <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> > ignoreCase="true" expand="true"
> > tokenizerFactory="solr.KeywordTokenizerFactory"/> (as I read in one post)
> >
> > and I can see now in the analyzer that "mairie" is mapped to "hotel de
> > ville", but now when I have query "hotel de ville", it doesn't match at
> all
> > with "mairie".
> >
> > Anyone has a clue of what I'm doing wrong?
> >
> > I'm using Solr 3.4.
> >
> > Thanks,
> > Elisabeth
>


Re: using solr to do a 'match'

2012-04-11 Thread jmlucjav
I have done that by getting the X top hits, finding the best match among them
(a combination of Levenshtein distance, containment, etc.; I tweaked the code
until testing showed good results), and then deciding if the candidate was a
match or not, again based on custom code plus a user-defined leniency value.
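A rough sketch of that decision step (untested, and the class/method names,
the similarity blend and the containment shortcut here are illustrative
assumptions, not the actual code):

import java.util.List;

public class MatchDecider {

    /** Plain Levenshtein distance, iterative two-row implementation. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    /** Similarity in [0,1]: 1.0 means identical strings. */
    static double similarity(String a, String b) {
        int max = Math.max(a.length(), b.length());
        return max == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / max;
    }

    /**
     * Pick the best of the top-X hits and accept it only if it clears the
     * user-defined leniency threshold (e.g. 0.85 = strict, 0.6 = lenient).
     * Returns null when no candidate is good enough.
     */
    static String bestMatch(String query, List<String> topHits, double leniency) {
        String best = null;
        double bestScore = -1;
        for (String hit : topHits) {
            double s = similarity(query.toLowerCase(), hit.toLowerCase());
            // containment alone is enough to clear the bar
            if (hit.toLowerCase().contains(query.toLowerCase())) {
                s = Math.max(s, leniency);
            }
            if (s > bestScore) { bestScore = s; best = hit; }
        }
        return bestScore >= leniency ? best : null;
    }
}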

xab



Re: using solr to do a 'match'

2012-04-11 Thread Mikhail Khludnev
Hi,

This use case is similar to the boolean expression matching problem; you can
find a recent thread about it. I have an idea that we could introduce a
disjunction query with a dynamic mm (minShouldMatch parameter,
http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/all/org/apache/lucene/search/BooleanQuery.html#setMinimumNumberShouldMatch(int)),
i.e. 'match these clauses disjunctively, but for every document use the value
from the field cache of field xxxCount as the minShouldMatch parameter'. Norms
could also be used as a source for dynamic mm values.

Wdyt?
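For the static half, a rough sketch (Lucene 3.5-era API; the class name is
mine) that builds the disjunction with a fixed mm. The per-document mm read
from the field cache is exactly the part that would need a new scorer, so it
is not shown:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

public class DynamicMmSketch {
    /** Pure disjunction that only matches docs with at least mm of the terms. */
    public static BooleanQuery disjunctionWithMm(String field, String[] terms, int mm) {
        BooleanQuery bq = new BooleanQuery();
        for (String t : terms) {
            bq.add(new TermQuery(new Term(field, t)), Occur.SHOULD);
        }
        bq.setMinimumNumberShouldMatch(mm); // static today; per-doc is the proposal
        return bq;
    }
}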

On Wed, Apr 11, 2012 at 10:08 AM, Li Li  wrote:

> it's not possible now because lucene don't support this.
> when doing disjunction query, it only record how many terms match this
> document.
> I think this is a common requirement for many users.
> I suggest lucene should divide scorer to a matcher and a scorer.
> the matcher just return which doc is matched and why/how the doc is
> matched.
> especially for disjuction query, it should tell which term matches and
> possible other
> information such as tf/idf and the distance of terms(to support proximity
> search).
> That's the matcher's job. and then the scorer(a ranking algorithm) use
> flexible algorithm
> to score this document and the collector can collect it.
>
> On Wed, Apr 11, 2012 at 10:28 AM, Chris Book  wrote:
>
> > Hello, I have a solr index running that is working very well as a search.
> >  But I want to add the ability (if possible) to use it to do matching.
>  The
> > problem is that by default it is only looking for all the input terms to
> be
> > present, and it doesn't give me any indication as to how many terms in
> the
> > target field were not specified by the input.
> >
> > For example, if I'm trying to match to the song title "dust in the wind",
> > I'm correctly getting a match if the input query is "dust in wind".  But
> I
> > don't want to get a match if the input is just "dust".  Although as a
> > search "dust" should return this result, I'm looking for some way to
> filter
> > this out based on some indication that the input isn't close enough to
> the
> > output.  Perhaps if I could get information that that the number of input
> > terms is much less than the number of terms in the field.  Or something
> > else along those line?
> >
> > I realize that this isn't the typical use case for a search, but I'm just
> > looking for some suggestions as to how I could improve the above example
> a
> > bit.
> >
> > Thanks,
> > Chris
> >
>



-- 
Sincerely yours
Mikhail Khludnev
ge...@yandex.ru


 


Re: using solr to do a 'match'

2012-04-11 Thread Li Li
I searched my mail but found nothing.
The thread found by the keywords "boolean expression" is "Indexing Boolean
Expressions" from joaquin.delgado.
To tell which terms are matched, for BooleanScorer2 a simple method is to
modify DisjunctionSumScorer and add a BitSet to record matched scorers.
When the collector collects this document, it can get the scorer and recursively
find the matched terms.
But I think it may be better to add a component, maybe named matcher, that
does the matching job; the scorer then uses the information from the matcher
and does the ranking.

On Wed, Apr 11, 2012 at 4:32 PM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> Hi,
>
> This use case is similar to matching boolean expression problem. You can
> find recent thread about it. I have an idea that we can introduce
> disjunction query with dynamic mm (minShouldMatch parameter
>
> http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/all/org/apache/lucene/search/BooleanQuery.html#setMinimumNumberShouldMatch(int)
> )
> i.e. 'match these clauses disjunctively but for every document use
> value
> from field cache of field xxxCount as a minShouldMatch parameter'. Also
> norms can be used as a source for dynamics mm values.
>
> Wdyt?
>
> On Wed, Apr 11, 2012 at 10:08 AM, Li Li  wrote:
>
> > it's not possible now because lucene don't support this.
> > when doing disjunction query, it only record how many terms match this
> > document.
> > I think this is a common requirement for many users.
> > I suggest lucene should divide scorer to a matcher and a scorer.
> > the matcher just return which doc is matched and why/how the doc is
> > matched.
> > especially for disjuction query, it should tell which term matches and
> > possible other
> > information such as tf/idf and the distance of terms(to support proximity
> > search).
> > That's the matcher's job. and then the scorer(a ranking algorithm) use
> > flexible algorithm
> > to score this document and the collector can collect it.
> >
> > On Wed, Apr 11, 2012 at 10:28 AM, Chris Book 
> wrote:
> >
> > > Hello, I have a solr index running that is working very well as a
> search.
> > >  But I want to add the ability (if possible) to use it to do matching.
> >  The
> > > problem is that by default it is only looking for all the input terms
> to
> > be
> > > present, and it doesn't give me any indication as to how many terms in
> > the
> > > target field were not specified by the input.
> > >
> > > For example, if I'm trying to match to the song title "dust in the
> wind",
> > > I'm correctly getting a match if the input query is "dust in wind".
>  But
> > I
> > > don't want to get a match if the input is just "dust".  Although as a
> > > search "dust" should return this result, I'm looking for some way to
> > filter
> > > this out based on some indication that the input isn't close enough to
> > the
> > > output.  Perhaps if I could get information that that the number of
> input
> > > terms is much less than the number of terms in the field.  Or something
> > > else along those line?
> > >
> > > I realize that this isn't the typical use case for a search, but I'm
> just
> > > looking for some suggestions as to how I could improve the above
> example
> > a
> > > bit.
> > >
> > > Thanks,
> > > Chris
> > >
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> ge...@yandex.ru
>
> 
>  
>


Re: Large Index and OutOfMemoryError: Map failed

2012-04-11 Thread Michael McCandless
Hi,

65K is already a very large number and should have been sufficient...

However: have you increased the merge factor?  Doing so increases the
open files (maps) required.

Have you disabled compound file format?  (Hmmm: I think Solr does so
by default... which is dangerous).  Maybe try enabling compound file
format?
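If it helps, in a 3.x-era solrconfig.xml that is a one-line change (a sketch,
assuming the stock <indexDefaults>/<mainIndex> layout):

  <useCompoundFile>true</useCompoundFile>

Compound files trade a little indexing speed for far fewer files per segment,
and therefore far fewer mmap regions.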

Can you "ls -l" your index dir and post the results?

It's also possible Solr isn't closing the old searchers quickly enough
... I don't know the details on when Solr closes old searchers...

Mike McCandless

http://blog.mikemccandless.com



On Tue, Apr 10, 2012 at 11:35 PM, Gopal Patwa  wrote:
> Michael, thanks for the response.
>
> It was 65K, as you mention, the default value of "cat
> /proc/sys/vm/max_map_count". How do we determine what value this should be?
> Is it the number of documents per hard commit (in my case every 15 minutes)?
> Or is it the number of index files, or the number of documents we have in all cores?
>
> I have raised the number to 140K, but it still fills up; when it reaches 140K
> we have to restart the jboss server to free up the map count, and sometimes an OOM
> error happens during "Error opening new searcher".
>
> Is making this number unlimited the only solution?
>
>
> Error log:
>
> location=CommitTracker line=93 auto commit
> error...:org.apache.solr.common.SolrException: Error opening new
> searcher
>        at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1138)
>        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1251)
>        at 
> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:409)
>        at org.apache.solr.update.CommitTracker.run(CommitTracker.java:197)
>        at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>        at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
>        at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
>        at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:662)
> Caused by: java.io.IOException: Map failed
>        at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:748)
>        at 
> org.apache.lucene.store.MMapDirectory$MMapIndexInput.(MMapDirectory.java:293)
>        at 
> org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:221)
>        at 
> org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$VisitPerFieldFile.(PerFieldPostingsFormat.java:262)
>        at 
> org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$1.(PerFieldPostingsFormat.java:316)
>        at 
> org.apache.lucene.codecs.perfield.PerFieldPostingsFormat.files(PerFieldPostingsFormat.java:316)
>        at org.apache.lucene.codecs.Codec.files(Codec.java:56)
>        at org.apache.lucene.index.SegmentInfo.files(SegmentInfo.java:423)
>        at 
> org.apache.lucene.index.SegmentInfo.sizeInBytes(SegmentInfo.java:215)
>        at 
> org.apache.lucene.index.IndexWriter.prepareFlushedSegment(IndexWriter.java:2220)
>        at 
> org.apache.lucene.index.DocumentsWriter.publishFlushedSegment(DocumentsWriter.java:497)
>        at 
> org.apache.lucene.index.DocumentsWriter.finishFlush(DocumentsWriter.java:477)
>        at 
> org.apache.lucene.index.DocumentsWriterFlushQueue$SegmentFlushTicket.publish(DocumentsWriterFlushQueue.java:201)
>        at 
> org.apache.lucene.index.DocumentsWriterFlushQueue.innerPurge(DocumentsWriterFlushQueue.java:119)
>        at 
> org.apache.lucene.index.DocumentsWriterFlushQueue.tryPurge(DocumentsWriterFlushQueue.java:148)
>        at 
> org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:438)
>        at 
> org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:553)
>        at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:354)
>        at 
> org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:258)
>        at 
> org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:243)
>        at 
> org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:250)
>        at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1091)
>        ... 11 more
> Caused by: java.lang.OutOfMemoryError: Map failed
>        at sun.nio.ch.FileChannelImpl.map0(Native Method)
>        at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:745)
>
>
>
> And one more issue we came across i.e
>
> On Sat, Mar 31, 2012 at 3:15 AM, Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>> It's the virtual memory limit that matters; yours says unlimited below
>> (good!

Solr Http Caching

2012-04-11 Thread Kissue Kissue
Hi,

Are any of you using Solr Http caching? I am interested to see how people
use this functionality. I have an index that basically changes once a day
at midnight. Is it okay to enable Solr Http caching for such an index and
set the max age to 1 day? Any potential issues?

I am using solr 3.5 with SolrJ.

Thanks.
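For reference, the knob I am looking at is the <httpCaching> element inside
<requestDispatcher> in solrconfig.xml; a sketch with a 1-day max-age (86400
seconds):

  <httpCaching lastModifiedFrom="openTime" etagSeed="Solr">
    <cacheControl>max-age=86400, public</cacheControl>
  </httpCaching>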


Re: I've broken delete in SolrCloud and I'm a bit clueless as to how

2012-04-11 Thread Darren Govoni
Hard to say why it's not working for you. Start with a fresh Solr and
work forward from there, or back out your configs and plugins until it
works again.

On Tue, 2012-04-10 at 17:15 -0400, Benson Margulies wrote:
> In my cloud configuration, if I push
> 
> <delete>
>   <query>*:*</query>
> </delete>
> 
> followed by:
> 
> <commit/>
> 
> I get no errors, the log looks happy enough, but the documents remain
> in the index, visible to /query.
> 
> Here's what seems my relevant bit of solrconfig.xml. My URP only
> implements processAdd.
> 
>
>    <updateRequestProcessorChain name="RNI">
>      <processor
> class="com.basistech.rni.solr.NameIndexingUpdateRequestProcessorFactory"/>
>      <processor class="solr.LogUpdateProcessorFactory"/>
>      <processor class="solr.RunUpdateProcessorFactory"/>
>    </updateRequestProcessorChain>
>
>    <requestHandler name="/update"
>                    class="solr.XmlUpdateRequestHandler">
>      <lst name="defaults">
>        <str name="update.chain">RNI</str>
>      </lst>
>    </requestHandler>
> 




Re: EmbeddedSolrServer and StreamingUpdateSolrServer

2012-04-11 Thread pcrao
Hi,

Any update on this?
Please let me know if you need additional information on this.

Thanks,
PC Rao.



solr 3.4 with nTiers >= 2: usage of ids param causes NullPointerException (NPE)

2012-04-11 Thread Dmitry Kan
Hello,

Hopefully this question is not too complex to handle, but I'm currently
stuck with it.

We have a system with nTiers, that is:

Solr front base ---> Solr front --> shards

Inside QueryComponent there is a method createRetrieveDocs(ResponseBuilder
rb) which collects doc ids of each shard and sends them in different
queries using the ids parameter:

[code]
sreq.params.add(ShardParams.IDS, StrUtils.join(ids, ','));
[/code]

This actually produces NPE (same as in
https://issues.apache.org/jira/browse/SOLR-1477) in the first tier, because
Solr front (on the second tier) fails to process such a query. I have tried
to fix this by using a unique field with a value of ids ORed (the following
code substitutes the code above):

[code]
  StringBuffer idsORed = new StringBuffer();
  for (Iterator<String> iterator = ids.iterator(); iterator.hasNext();
) {
String next = iterator.next();

if (iterator.hasNext()) {
  idsORed.append(next).append(" OR ");
} else {
  idsORed.append(next);
}
  }

  sreq.params.add(rb.req.getSchema().getUniqueKeyField().getName(),
idsORed.toString());
[/code]

This works perfectly if, for rows=n, there are n or fewer hits from the
distributed query. However, if there are more than 2*n hits, the querying
fails with an NPE in a completely different component, which is
HighlightComponent (highlights are requested in the same query with
hl=true&hl.fragsize=5&hl.requireFieldMatch=true&hl.fl=targetTextField):

SEVERE: java.lang.NullPointerException
at
org.apache.solr.handler.component.HighlightComponent.finishStage(HighlightComponent.java:161)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:295)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:619)

It sounds like the ids of documents somehow get shuffled and the
instruction (only a hypothesis)

[code]
ShardDoc sdoc = rb.resultIds.get(id);
[/code]

returns sdoc=null, which causes the next line of code to fail with an NPE:

[code]
int idx = sdoc.positionInResponse;
[/code]

Am I missing anything? Can something be done for solving this issue?

Thanks.

-- 
Regards,

Dmitry Kan


Re: EmbeddedSolrServer and StreamingUpdateSolrServer

2012-04-11 Thread Mikhail Khludnev
Hi,

it's hard to help until you tell us why you think the index is corrupted.
Logs, steps, and stack traces are useful.

Regards

On Wed, Apr 11, 2012 at 2:56 PM, pcrao  wrote:

> Hi,
>
> Any update on this?
> Please let me know if you need additional information on this.
>
> Thanks,
> PC Rao.
>
>



-- 
Sincerely yours
Mikhail Khludnev
ge...@yandex.ru


 


Re: I've broken delete in SolrCloud and I'm a bit clueless as to how

2012-04-11 Thread Benson Margulies
See https://issues.apache.org/jira/browse/SOLR-3347. I can replace the
solrconfig.xml with the vanilla solrconfig.xml and the problem
remains.

On Wed, Apr 11, 2012 at 6:35 AM, Darren Govoni  wrote:
> Hard to say why its not working for you. Start with a fresh Solr and
> work forward from there or back out your configs and plugins until it
> works again.
>
> On Tue, 2012-04-10 at 17:15 -0400, Benson Margulies wrote:
>> In my cloud configuration, if I push
>>
>> <delete>
>>   <query>*:*</query>
>> </delete>
>>
>> followed by:
>>
>> <commit/>
>>
>> I get no errors, the log looks happy enough, but the documents remain
>> in the index, visible to /query.
>>
>> Here's what seems my relevant bit of solrconfig.xml. My URP only
>> implements processAdd.
>>
>>    <updateRequestProcessorChain name="RNI">
>>      <processor
>> class="com.basistech.rni.solr.NameIndexingUpdateRequestProcessorFactory"/>
>>      <processor class="solr.LogUpdateProcessorFactory"/>
>>      <processor class="solr.RunUpdateProcessorFactory"/>
>>    </updateRequestProcessorChain>
>>
>>    <requestHandler name="/update"
>>                    class="solr.XmlUpdateRequestHandler">
>>      <lst name="defaults">
>>        <str name="update.chain">RNI</str>
>>      </lst>
>>    </requestHandler>
>>
>
>


Re: I've broken delete in SolrCloud and I'm a bit clueless as to how

2012-04-11 Thread Benson Margulies
I didn't have a _version_ field, since nothing in the schema says that
it's required!

On Wed, Apr 11, 2012 at 6:35 AM, Darren Govoni  wrote:
> Hard to say why its not working for you. Start with a fresh Solr and
> work forward from there or back out your configs and plugins until it
> works again.
>
> On Tue, 2012-04-10 at 17:15 -0400, Benson Margulies wrote:
>> In my cloud configuration, if I push
>>
>> <delete>
>>   <query>*:*</query>
>> </delete>
>>
>> followed by:
>>
>> <commit/>
>>
>> I get no errors, the log looks happy enough, but the documents remain
>> in the index, visible to /query.
>>
>> Here's what seems my relevant bit of solrconfig.xml. My URP only
>> implements processAdd.
>>
>>    <updateRequestProcessorChain name="RNI">
>>      <processor
>> class="com.basistech.rni.solr.NameIndexingUpdateRequestProcessorFactory"/>
>>      <processor class="solr.LogUpdateProcessorFactory"/>
>>      <processor class="solr.RunUpdateProcessorFactory"/>
>>    </updateRequestProcessorChain>
>>
>>    <requestHandler name="/update"
>>                    class="solr.XmlUpdateRequestHandler">
>>      <lst name="defaults">
>>        <str name="update.chain">RNI</str>
>>      </lst>
>>    </requestHandler>
>>
>
>


Re: How to get a list of values of a specified field

2012-04-11 Thread a sd
The type of the content is "solr.string"; a value is actually an arbitrary
sequence of characters, "_", numbers, etc.

On Wed, Apr 11, 2012 at 7:06 PM, Marcelo Carvalho Fernandes <
mcf2...@gmail.com> wrote:

> What type of content do you have in this field?
>
> ---
> Marcelo Carvalho Fernandes
>
> On Wednesday, April 11, 2012, a sd  wrote:
> > hi,all.
> >  I want to get all values of a specified field,  this field is type
> of
> > "solr.string".
> >  I can achieve this object by using "facet" feature, but there is a
> > trouble : it respond me the all values by the default "facet" query. If
> > there are millions of values with a field ,or more, it is a disaster to
> > the application.  I thought, was there a way  by which i can account the
> > amount of values at first, and then i query a segment of values by
> > specified the "facet.offset" and "facet.limit" iteratively?
> > Thanks for your attention.
> > B.R.
> > murphy
> >
>
> --
> 
> Marcelo Carvalho Fernandes
> +55 21 8272-7970
> +55 21 2205-2786
>


Re: How to facet data from a multivalued field?

2012-04-11 Thread Thiago
Thank you very much, Erik. I just changed the field type to string and it
worked as I expected. Now I can select the count of the series. Thanks again,
and thanks to the others too.

Thiago


Erik Hatcher-4 wrote
> 
> Thiago -
> 
> You'll want your series field to be of type "string".   If you also need
> that field searchable by the words within them, you can copyField to a
> separate "text" (or other analyzed) field type where you search on the
> tokenized field but facet on the "string" one.
> 
>   Erik
> 




Re: Multi-words synonyms matching

2012-04-11 Thread Jeevanandam Madanagopal
Elisabeth -

As you described, the mapping below might suit your need:
mairie => hotel de ville, mairie

"mairie" gets expanded to "hotel de ville" and "mairie" at index time, so both
"mairie" and "hotel de ville" are searchable on the document.

However, the whitespace tokenizer splitting at query time will still be a
problem, as described by Markus.
--Jeevanandam

On Apr 11, 2012, at 12:30 PM, elisabeth benoit wrote:

> < Have you tried the "=>" mapping instead? Something
> < like
> < hotel de ville => mairie
> <
> Yes, thanks, I've tried it, but from what I understand it doesn't solve my
> problem, since this means "hotel de ville" will be replaced by "mairie" at
> index time (I use synonyms only at index time). So when a user asks for
> "hôtel de ville", it won't match.
> 
> In fact, at index time I have mairie in my data, but I want the user to be able
> to request "mairie" or "hôtel de ville" and get mairie as an answer, and not
> get mairie as an answer when requesting "hôtel".
> 
> 
> < the query parser splits on white
> < space before the analyzer sees the
> < query
> <
> Ok, I guess this means I have a problem. No simple solution, since at query
> time my tokenizer does split on white space.
> I guess my problem is more or less one of the problems discussed in
> 
> http://lucene.472066.n3.nabble.com/Multi-word-synonyms-td3716292.html#a3717215
> 
> 
> Thanks a lot for your answers,
> Elisabeth
> 
> 
> 
> 
> 
> 2012/4/10 Erick Erickson 
> 
>> Have you tried the "=>' mapping instead? Something
>> like
>> hotel de ville => mairie
>> might work for you.
>> 
>> Best
>> Erick
>> 
>> On Tue, Apr 10, 2012 at 1:41 AM, elisabeth benoit
>>  wrote:
>>> Hello,
>>> 
>>> I've read several post on this issue, but can't find a real solution to
>> my
>>> multi-words synonyms matching problem.
>>> 
>>> I have in my synonyms.txt an entry like
>>> 
>>> mairie, hotel de ville
>>> 
>>> and my index time analyzer is configured as followed for synonyms.
>>> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>>> ignoreCase="true" expand="true"/>
>>> 
>>> The problem I have is that now "mairie" matches with "hotel" and I would
>>> only want "mairie" to match with "hotel de ville" and "mairie".
>>> 
>>> When I look into the analyzer, I see that "mairie" is mapped into
>> "hotel",
>>> and words "de ville" are added in second and third position. To change
>>> that, I tried to do
>>> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>>> ignoreCase="true" expand="true"
>>> tokenizerFactory="solr.KeywordTokenizerFactory"/> (as I read in one post)
>>> 
>>> and I can see now in the analyzer that "mairie" is mapped to "hotel de
>>> ville", but now when I have query "hotel de ville", it doesn't match at
>> all
>>> with "mairie".
>>> 
>>> Anyone has a clue of what I'm doing wrong?
>>> 
>>> I'm using Solr 3.4.
>>> 
>>> Thanks,
>>> Elisabeth
>> 



Re: solr hangs

2012-04-11 Thread Pawel Rog
You wrote that you see an "OutOfMemoryError". I had such problems when my
caches were too big. It means that there is no more free memory in the JVM,
and probably a full GC starts running. How big is your Java heap? Maybe the
cache sizes in your Solr are too big for your JVM settings.

--
Regards,
Pawel

On Tue, Apr 10, 2012 at 9:51 PM, Peter Markey  wrote:

> Hello,
>
> I have a solr cloud setup based on a blog (
> http://outerthought.org/blog/491-ot.html) and am able to bring up the
> instances and cores. But when I start indexing data (through csv update),
> the core throws a out of memory exception (null:java.lang.RuntimeException:
> java.lang.OutOfMemoryError: unable to create new native thread). The thread
> dump from new solr ui is below:
>
> cmdDistribExecutor-8-thread-777 (827)
>
>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@1bd11b79
>
>   - sun.misc.Unsafe.park​(Native Method)
>   - java.util.concurrent.locks.LockSupport.park​(LockSupport.java:186)
>   -
>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await
> (AbstractQueuedSynchronizer.java:2043)
>   -
>
> org.apache.http.impl.conn.tsccm.WaitingThread.await​(WaitingThread.java:158)
>   -
>   org.apache.http.impl.conn.tsccm.ConnPoolByRoute.getEntryBlocking
> (ConnPoolByRoute.java:403)
>   -
>   org.apache.http.impl.conn.tsccm.ConnPoolByRoute$1.getPoolEntry
> (ConnPoolByRoute.java:300)
>   -
>
> org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager$1.getConnection
> (ThreadSafeClientConnManager.java:224)
>   -
>   org.apache.http.impl.client.DefaultRequestDirector.execute
> (DefaultRequestDirector.java:401)
>   -
>   org.apache.http.impl.client.AbstractHttpClient.execute
> (AbstractHttpClient.java:820)
>   -
>   org.apache.http.impl.client.AbstractHttpClient.execute
> (AbstractHttpClient.java:754)
>   -
>   org.apache.http.impl.client.AbstractHttpClient.execute
> (AbstractHttpClient.java:732)
>   -
>   org.apache.solr.client.solrj.impl.HttpSolrServer.request
> (HttpSolrServer.java:304)
>   -
>   org.apache.solr.client.solrj.impl.HttpSolrServer.request
> (HttpSolrServer.java:209)
>   -
>   org.apache.solr.update.SolrCmdDistributor$1.call
> (SolrCmdDistributor.java:320)
>   -
>   org.apache.solr.update.SolrCmdDistributor$1.call
> (SolrCmdDistributor.java:301)
>   - java.util.concurrent.FutureTask$Sync.innerRun​(FutureTask.java:334)
>   - java.util.concurrent.FutureTask.run​(FutureTask.java:166)
>   -
>   java.util.concurrent.Executors$RunnableAdapter.call​(Executors.java:471)
>   - java.util.concurrent.FutureTask$Sync.innerRun​(FutureTask.java:334)
>   - java.util.concurrent.FutureTask.run​(FutureTask.java:166)
>   -
>   java.util.concurrent.ThreadPoolExecutor.runWorker
> (ThreadPoolExecutor.java:1110)
>   -
>   java.util.concurrent.ThreadPoolExecutor$Worker.run
> (ThreadPoolExecutor.java:603)
>   - java.lang.Thread.run​(Thread.java:679)
>
>
>
> Apparently I do see lots of threads like above in the thread dump. I'm
> using latest build from the trunk (Apr 10th). Any insights into this issue
> woudl be really helpful. Thanks a lot.
>


Re: Facets involving multiple fields

2012-04-11 Thread Erick Erickson
Have you considered facet.query? You can specify an arbitrary query
to facet on which might do what you want. Otherwise, I'm not sure what
you mean by "faceted search using two fields". How should these fields
be combined into a single facet? What that means practically is not at
all obvious from your problem statement.
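For example (the field names f1/f2 and the values are hypothetical), each
facet.query returns its own count, so a single logical facet can be computed
across two fields at once:

  &facet=true
  &facet.query=f1:laptop OR f2:laptop
  &facet.query=f1:tablet OR f2:tablet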

Best
Erick

On Tue, Apr 10, 2012 at 8:55 AM, Marc SCHNEIDER
 wrote:
> Hi,
>
> I'd like to make a faceted search using two fields. I want to have a
> single result and not a result by field (like when using
> facet.field=f1,facet.field=f2).
> I don't want to use a copy field either because I want it to be
> dynamic at search time.
> As far as I know this is not possible for Solr 3.x...
> But I saw a new parameter named "group.facet" for Solr4. Could that
> solve my problem? If yes could somebody give me an example?
>
> Thanks,
> Marc.


Re: Default qt on SolrCloud

2012-04-11 Thread Erick Erickson
What does your "query" request handler look like? By adding qt=standard
you're specifying the standard request handler, whereas your
...solr/query?q=*:* format goes at the request handler you named
"query" which presumably you've defined in solrconfig.xml...

What does &debugQuery=on show?

Best
Erick

On Tue, Apr 10, 2012 at 12:31 PM, Benson Margulies
 wrote:
> After I load documents into my cloud instance, a URL like:
>
> http://localhost:PORT/solr/query?q=*:*
>
> finds nothing.
>
> http://localhost:PORT/solr/query?q=*:*&qt=standard
>
> finds everything.
>
> My custom request handlers have 'default="false"'.
>
> What have I done?


Re: term frequency outweighs exact phrase match

2012-04-11 Thread Erick Erickson
Consider boosting on the phrase with a SHOULD clause, something
like field:"apache solr"^2.
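E.g. something along these lines (hypothetical field name; the quoted clause
is optional but lifts documents where the words are adjacent):

  q=content:(+apache +solr) content:"apache solr"^2

With edismax you can get a similar effect declaratively through the pf
parameter you already have, by raising its boosts relative to qf.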

Best
Erick


On Tue, Apr 10, 2012 at 12:46 PM,   wrote:
> Hello,
>
> I use solr 3.5 with edismax. I have the following issue with phrase search.
> For example, if I have three documents with content like
>
> 1. apache apache
> 2. solr solr
> 3. apache solr
>
> then a search for "apache solr" displays the documents in the order 1, 2, 3
> instead of 3, 2, 1, because the term frequency in the first and second
> documents is higher than in the third document. We want results displayed in
> the order 3, 2, 1, since the third document has the exact match.
>
> My request handler is as follows.
>
> <requestHandler name="/select" class="solr.SearchHandler">
> <lst name="defaults">
> <str name="defType">edismax</str>
> <str name="echoParams">explicit</str>
> <float name="tie">0.01</float>
> <str name="qf">host^30  content^0.5 title^1.2</str>
> <str name="pf">host^30  content^20 title^22</str>
> <str name="fl">url,id, site ,title</str>
> <str name="mm">2<-1 5<-2 6<90%</str>
> <int name="ps">1</int>
> <bool name="hl">true</bool>
> <str name="q.alt">*:*</str>
> <str name="hl.fl">content</str>
> 0
> 165
> title
> 0
> url
> regex
> true
> true
> 5
> true
> site
> true
> 
> </lst>
> <arr name="last-components">
>  <str>spellcheck</str>
> </arr>
> </requestHandler>
>
> Any ideas how to fix this issue?
>
> Thanks in advance.
> Alex.


Re: Suggester not working for digit starting terms

2012-04-11 Thread Erick Erickson
Hmmm, I can't pursue this right now, anyone want to jump in?

Erick

On Tue, Apr 10, 2012 at 2:41 PM, jmlucjav  wrote:
> I have double checked and still get the same behaviour. My field is:
>                <fieldType name="text_suggest" class="solr.TextField"
> positionIncrementGap="100">
>                        <analyzer>
>                                <charFilter
> class="solr.MappingCharFilterFactory"
> mapping="mapping-ISOLatin1Accent.txt"/>
>                                <tokenizer
> class="solr.KeywordTokenizerFactory"/>
>                                <filter class="solr.LowerCaseFilterFactory"/>
>                                <filter class="solr.TrimFilterFactory"/>
>                        </analyzer>
>                </fieldType>
>
Analysis shows the numbers are there; for '500 $' I get as the last step, both in
> index&query:
>
> org.apache.solr.analysis.TrimFilterFactory {luceneMatchVersion=LUCENE_35}
> position        1
> term text       500 $
> startOffset     0
> endOffset       5
>
> So I still see something going wrong here
> xab
>


Re: custom query string parsing?

2012-04-11 Thread sam ”
Yah, RequestHandler is much better. Thanks! I don't know why I started with
QParserPlugin and SearchComponent.


Even with my own RequestHandler that only passes down selected query
params, people can  still get around it through qt parameter:

  ?qt=/update&stream.update=*:*&commit=true

I think I am trying to solve two things at once: security and application
specific query parameter transformation.

Security will have to be handled elsewhere. Query parameter manipulation
can indeed be done by providing RequestHandler...
If I could just introduce a custom application, both will be solved quite
easily.
But, I am required to do all application development using Solr only
(through plugins and velocity templates).


Thanks.


On Tue, Apr 10, 2012 at 10:19 PM, Chris Hostetter
wrote:

>
> : Essentially, this is what I want to do  (I'm extending SearchComponent):
>
> the level of request manipulation you seem to be interested strikes me as
> something that you should do as a custom RequestHandler -- not a
> SearchComponent or a QParserPlugin.
>
> You can always subclass SearchHandler, and override the handleRequest
> method to manipulate the request however you want and then delegate to
> super.
>
>
> -Hoss
>


Does the lucene can read the index file from solr?

2012-04-11 Thread neosky
Both are version 3.5.
I have verified that Solr can read an index file written by Lucene, but when I
try to use Lucene to read a specific field from the index written by Solr, it
only returns results when I do a *:* search.



Re: custom query string parsing?

2012-04-11 Thread sam ”
Actually, /solr/mycore/myhandler/?qt=/update still uses my handler.

Only /solr/mycore/select/?qt=/update uses the update handler :P



On Wed, Apr 11, 2012 at 11:41 AM, sam ”  wrote:

> Yah, RequestHandler is much better. Thanks! I don't know why I started
> with QParserPlugin and SearchComponent.
>
>
> Even with my own RequestHandler that only passes down selected query
> params, people can  still get around it through qt parameter:
>
>   ?qt=/update&stream.update=*:*&commit=true
>
> I think I am trying to solve two things at once: security and application
> specific query parameter transformation.
>
> Security will have to be handled elsewhere. Query parameter manipulation
> can indeed be done by providing RequestHandler...
> If I could just introduce a custom application, both will be solved quite
> easily.
> But, I am required to do all application development using Solr only
> (through plugins and velocity templates).
>
>
> Thanks.
>
>
>
> On Tue, Apr 10, 2012 at 10:19 PM, Chris Hostetter <
> hossman_luc...@fucit.org> wrote:
>
>>
>> : Essentially, this is what I want to do  (I'm extending SearchComponent):
>>
>> the level of request manipulation you seem to be interested strikes me as
>> something that you should do as a custom RequestHandler -- not a
>> SearchComponent or a QParserPlugin.
>>
>> You can always subclass SearchHandler, and override the handleRequest
>> method to manipulate the request however you want and then delegate to
>> super.
>>
>>
>> -Hoss
>>
>
>


Re: How to get a list of values of a specified field

2012-04-11 Thread Erick Erickson
Consider using the TermsComponent
(http://wiki.apache.org/solr/TermsComponent)
You could get some number of terms from
your field at a time by judicious use of, say,
facet.prefix if you wanted.
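For example (hypothetical field name, with the /terms handler registered as in
the stock example solrconfig), you can page through the terms in index order,
using the last term you received as the cursor:

  /terms?terms.fl=myfield&terms.sort=index&terms.limit=1000
  /terms?terms.fl=myfield&terms.sort=index&terms.limit=1000&terms.lower=<last term seen>&terms.lower.incl=false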

But why do you want to do this? It's kind of an
odd requirement, and since you say there are
millions of values this will be expensive

Best
Erick

On Wed, Apr 11, 2012 at 7:23 AM, a sd  wrote:
> The type of content is "solr.string", actually is a sequence of any
> characters,"_",number,etc.
>
> On Wed, Apr 11, 2012 at 7:06 PM, Marcelo Carvalho Fernandes <
> mcf2...@gmail.com> wrote:
>
>> What type of content do you have in this field?
>>
>> ---
>> Marcelo Carvalho Fernandes
>>
>> On Wednesday, April 11, 2012, a sd  wrote:
>> > hi,all.
>> >      I want to get all values of a specified field,  this field is type
>> of
>> > "solr.string".
>> >      I can achieve this object by using "facet" feature, but there is a
>> > trouble : it respond me the all values by the default "facet" query. If
>> > there are millions of values with a field ,or more, it is a disaster to
>> > the application.  I thought, was there a way  by which i can account the
>> > amount of values at first, and then i query a segment of values by
>> > specified the "facet.offset" and "facet.limit" iteratively?
>> >     Thanks for your attention.
>> >     B.R.
>> > murphy
>> >
>>
>> --
>> 
>> Marcelo Carvalho Fernandes
>> +55 21 8272-7970
>> +55 21 2205-2786
>>


Re: Boost differences in two environments for same query and config

2012-04-11 Thread Erick Erickson
Well, you're matching a different number of records, so I have to assume
your indexes are different on the two machines.

Here is one case where doing an optimize might make sense, that'll purge
the data associated with any deleted records from the index which should
make comparisons better
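For instance (assuming the stock XML update handler is registered at /update),
an optimize can be triggered with:

  http://localhost:8983/solr/update?optimize=true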

Additionally, you have to insure that your request handler is identical
on both, have you made any changes to solrconfig.xml?

About the coord (2/3), I'm pretty clueless. But also insure that your
parsed query is identical on both, which is an additional check on
whether you've changed something on one server and not the
other.

Best
Erick

On Wed, Apr 11, 2012 at 8:19 AM, Kerwin  wrote:
> Hi All,
>
> I am firing the following Solr query against installations on two
> environments one on my local Windows machine and the other on Unix
> (Remote).
>
> RECORD_TYPE:info AND (NAME:ee123* OR CD:ee123^1000 OR CD:ee123*^100)
>
> There are no differences in the DataImportHandler configuration ,
> Schema and Solrconfig for both these installations.
> The correct expected result is given by the local installation of Solr
> which also gives scores as expected for the boosts.
>
> CORRECT/Expected:
> Debug query output for local installation:
>
> 10.822258 = (MATCH) sum of:
>        0.002170282 = (MATCH) weight(RECORD_TYPE:info in 35916), product of:
>                3.65739E-4 = queryWeight(RECORD_TYPE:info), product of:
>                        5.933964 = idf(docFreq=58891, maxDocs=8181811)
>                        6.1634855E-5 = queryNorm
>                5.933964 = (MATCH) fieldWeight(RECORD_TYPE:info in 35916), 
> product of:
>                        1.0 = tf(termFreq(RECORD_TYPE:info)=1)
>                        5.933964 = idf(docFreq=58891, maxDocs=8181811)
>                        1.0 = fieldNorm(field=RECORD_TYPE, doc=35916)
>        10.820087 = (MATCH) product of:
>                16.230131 = (MATCH) sum of:
>                        16.223969 = (MATCH) weight(CD:ee123^1000.0 in 35916), 
> product of:
>                                0.81 = queryWeight(CD:ee123^1000.0), 
> product of:
>                                        1000.0 = boost
>                                        16.224277 = idf(docFreq=1, 
> maxDocs=8181811)
>                                        6.1634855E-5 = queryNorm
>                                16.224277 = (MATCH) fieldWeight(CD:ee123 in 
> 35916), product of:
>                                        1.0 = tf(termFreq(CD:ee123)=1)
>                                        16.224277 = idf(docFreq=1, 
> maxDocs=8181811)
>                                        1.0 = fieldNorm(field=CD, doc=35916)
>                                0.0061634853 = (MATCH)
> ConstantScoreQuery(QueryWrapperFilter(CD:ee123 CD:ee123c CD:ee123c.
> CD:ee123dc CD:ee123e CD:ee123e. CD:ee123en CD:ee123fx CD:ee123g
> CD:ee123g.1 CD:ee123g1 CD:ee123ee123 CD:ee123l.1 CD:ee123l1 CD:ee123ll
> CD:ee123lr CD:ee123m.z CD:ee123mg CD:ee123mz CD:ee123na CD:ee123nx
> CD:ee123ol CD:ee123op CD:ee123p CD:ee123p.1 CD:ee123p1 CD:ee123pn
> CD:ee123r.1 CD:ee123r1 CD:ee123s CD:ee123s.z CD:ee123sm CD:ee123sn
> CD:ee123sp CD:ee123ss CD:ee123sz)), product of:
>                                        100.0 = boost
>                                        6.1634855E-5 = queryNorm
>                0.667 = coord(2/3)
>
> INCORRECT/Unexpected:
> Debug query output for Unix installation (Remote):
>
> 9.950362E-4 = (MATCH) sum of:
>        9.950362E-4 = (MATCH) weight(RECORD_TYPE:info in 35948), product of:
>                9.950362E-4 = queryWeight(RECORD_TYPE:info), product of:
>                        1.0 = idf(docFreq=58891, maxDocs=8181811)
>                        9.950362E-4 = queryNorm
>                1.0 = (MATCH) fieldWeight(RECORD_TYPE:info in 35948), product 
> of:
>                        1.0 = tf(termFreq(RECORD_TYPE:info)=1)
>                        1.0 = idf(docFreq=58891, maxDocs=8181811)
>                        1.0 = fieldNorm(field=RECORD_TYPE, doc=35948)
>        0.0 = (MATCH) product of:
>                1.0945399 = (MATCH) sum of:
>                        0.99503624 = (MATCH) weight(CD:ee123^1000.0 in 35948), 
> product of:
>                                0.99503624 = queryWeight(CD:ee123^1000.0), 
> product of:
>                                        1000.0 = boost
>                                        1.0 = idf(docFreq=1, maxDocs=8181811)
>                                        9.950362E-4 = queryNorm
>                                1.0 = (MATCH) fieldWeight(CD:ee123 in 35948), 
> product of:
>                                        1.0 = tf(termFreq(CD:ee123)=1)
>                                        1.0 = idf(docFreq=1, maxDocs=8181811)
>                                        1.0 = fieldNorm(field=CD, doc=35948)
>                                0.09950362 = (MATCH)
> ConstantScoreQuery(QueryWrapperFilter(CD:ee123 CD:ee123c CD:ee123c.
> CD:ee123dc CD:ee123e CD:ee123e. CD:ee123en CD:ee123fx CD:ee123g
> CD:ee12

Re: custom query string parsing?

2012-04-11 Thread Chris Hostetter

: Only /solr/mycore/select/?qt=/update  uses update handler :P

or just register your handler using the name "/select" then the request 
dispatcher will use it, and ignore "qt".

In trunk, the legacy SolrServlet has been removed, so you'll be able to
set handleSelect="false" on the <requestDispatcher> and not worry about 
/select at all unless you really want a url with that path to exist.

-Hoss


Re: Does the lucene can read the index file from solr?

2012-04-11 Thread Erick Erickson
Solr uses Lucene, so any index written with Solr should be
usable by Lucene and vice-versa.

But searching will be significantly different in the sense that Solr
wraps the raw Lucene search, so you'll have to make sure your use
of Lucene is compatible with your Solr configurations if you
compare results.

Best
Erick

On Wed, Apr 11, 2012 at 9:47 AM, neosky  wrote:
> both are version 3.5
> I have tried that the solr can read the index file by lucene,
> but I tried to use the lucene to read the index file from a specific field.
> It returns me the result when I do the *.* search
>


Re: Default qt on SolrCloud

2012-04-11 Thread Benson Margulies
On Wed, Apr 11, 2012 at 11:19 AM, Erick Erickson
 wrote:
> What does your "query" request handler look like? By adding qt=standard
> you're specifying the standard request handler, whereas your
> ...solr/query?q=*:* format goes at the request handler you named
> "query" which presumably you've defined in solrconfig.xml...
>
> What does &debugQuery=on show?


It turned out that I had left an extraneous declaration for /query
with my custom request handler, and when I removed it all was well.

thanks, benson


>
> Best
> Erick
>
> On Tue, Apr 10, 2012 at 12:31 PM, Benson Margulies
>  wrote:
>> After I load documents into my cloud instance, a URL like:
>>
>> http://localhost:PORT/solr/query?q=*:*
>>
>> finds nothing.
>>
>> http://localhost:PORT/solr/query?q=*:*&qt=standard
>>
>> finds everything.
>>
>> My custom request handlers have 'default="false"'.
>>
>> What have I done?


SOLR 4 autocommit - is it working as I think it should?

2012-04-11 Thread vybe3142
I've gotten past most of my initial hurdles with SOLR, with some useful
suggestions from this group. 

Thank You.

On to tweaking. 

This morning, I've been looking at the autocommit functionality as defined
in solrconfig.xml. By default, it appears that it should kick in 15 seconds
after a new document has been added. I do see this event triggered via the
SOLR/tomcat logs, but can't see the docs/terms  in the index or query them.
I haven't bothered with the softcommit yet as I'd like to first understand
what the issue is wrt the autocommit.

My config / logs are pasted as follows.

Thanks 





Re: SOLR 4 autocommit - is it working as I think it should?

2012-04-11 Thread Yonik Seeley
On Wed, Apr 11, 2012 at 12:58 PM, vybe3142  wrote:
> This morning, I've been looking at the autocommit functionality as defined
> in solrconfig.xml. By default, it appears that it should kick in 15 seconds
> after a new document has been added. I do see this event triggered via the
> SOLR/tomcat logs, but can't see the docs/terms  in the index or query them.
> I haven't bothered with the softcommit yet as I'd like to first understand
> what the issue is wrt the autocommit.

The 15 second hard autocommit is not for the purpose of update
visibility, but for durability (hence the hard autocommit uses
openSearcher=false).  It simply makes sure that recent changes are
flushed to disk.

If you want to automatically see changes after some period of time,
use an additional soft autocommit for that (and leave the hard
autocommit exactly as configured),
or use commitWithin when you do an update... that's more flexible and
allows you to specify latency on a per-update basis.
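In solrconfig.xml terms, a sketch of that split (the values are examples only):

  <autoCommit>                <!-- durability: flush to disk, no new searcher -->
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

  <autoSoftCommit>            <!-- visibility: cheap reopen for searching -->
    <maxTime>1000</maxTime>
  </autoSoftCommit>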

-Yonik
lucenerevolution.com - Lucene/Solr Open Source Search Conference.
Boston May 7-10


solr 3.5 taking long to index

2012-04-11 Thread Rohit
We recently migrated from solr3.1 to solr3.5; we have one master and one
slave configured. The master has two cores:

1) Core1 - 44555972 documents

2) Core2 - 29419244 documents

We commit every 5000 documents, but lately a commit is taking very long, 15
minutes plus in some cases. What could have caused this? I have checked the
logs and the only warning I can see is:

 

"WARNING: Use of deprecated update request parameter update.processor
detected. Please use the new parameter update.chain instead, as support for
update.processor will be removed in a later version."

 

Memory details:

 

export JAVA_OPTS="$JAVA_OPTS -Xms6g -Xmx36g -XX:MaxPermSize=5g"

 

Solr Config:

 

<useCompoundFile>false</useCompoundFile>

<mergeFactor>10</mergeFactor>

<ramBufferSizeMB>32</ramBufferSizeMB>



  1

  1000

  1

 

What could be causing this, as everything was running fine a few days back?

 

 

Regards,

Rohit

Mobile: +91-9901768202

About Me:   http://about.me/rohitg

 



Re: SOLR 4 autocommit - is it working as I think it should?

2012-04-11 Thread vybe3142
Thanks, makes perfect sense



Re: Moving to Maven from Ant solr.build.dir Not Found

2012-04-11 Thread Eli Finkelshteyn
Alright, for those interested, I got this to work using the dependencies 
I mentioned before by swapping in the web.xml from the latest nightly 
build and passing in parameters for my custom stuff. I'm now running 
Solr with all my dependencies neatly stashed away in Maven and custom 
code just built on top, without having to touch ant at all. I'll try to 
write up a short guide on how to do this some time soon, as it's saving 
us a whole ton of hassle in terms of configuration for our project, and 
I'll bet it can do the same for others.


Eli

On 4/10/12 3:38 PM, Steven A Rowe wrote:

Eli,

Sorry, I don't have any experience using Solr in this way.

Has anybody else here successfully run Solr when it's included as a war dependency in an 
external Maven-based war project, by running "mvn jetty:run-exploded" from the 
external project?

FYI, The nightly download page I pointed you to includes a *binary* distribution, and you 
can run Solr using such a binary distribution by following the Solr tutorial I linked to. 
 (This is the standard way to "get Solr running".)

Steve

-Original Message-
From: Eli Finkelshteyn [mailto:iefin...@gmail.com]
Sent: Tuesday, April 10, 2012 3:26 PM
To: solr-user@lucene.apache.org
Subject: Re: Moving to Maven from Ant solr.build.dir Not Found

I'm running mvn jetty:run-exploded on my own project. My dependencies are:



<dependencies>
  <dependency>
    <groupId>org.apache.solr</groupId>
    <artifactId>solr</artifactId>
    <version>4.0-SNAPSHOT</version>
    <type>war</type>
  </dependency>
  <dependency>
    <groupId>org.apache.solr</groupId>
    <artifactId>solr-core</artifactId>
    <version>4.0-SNAPSHOT</version>
  </dependency>
  <dependency>
    <groupId>org.apache.solr</groupId>
    <artifactId>solr-analysis-extras</artifactId>
    <version>4.0-SNAPSHOT</version>
  </dependency>
  <dependency>
    <groupId>org.apache.solr</groupId>
    <artifactId>solr-commons-csv</artifactId>
    <version>4.0-SNAPSHOT</version>
  </dependency>
  <dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>4.0-SNAPSHOT</version>
    <type>jar</type>
  </dependency>
  <dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-spatial</artifactId>
    <version>4.0-SNAPSHOT</version>
  </dependency>
  <dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-queryparser</artifactId>
    <version>4.0-SNAPSHOT</version>
  </dependency>
  <dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-analyzers-common</artifactId>
    <version>4.0-SNAPSHOT</version>
  </dependency>
  <dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-queries</artifactId>
    <version>4.0-SNAPSHOT</version>
  </dependency>
</dependencies>

I know I could download the snapshot manually, but I'd much prefer to do that 
through Maven since I don't need to modify source at all.

Eli

On 4/10/12 3:14 PM, Steven A Rowe wrote:

You didn't answer my question about where you are running "mvn 
jetty:run-exploded" - is it in your own project, or from the Solr sources?

Exactly which Solr Maven artifacts are you including as dependencies
in your project's POM?  (Can you copy/paste the <dependencies>
section?)


Basically, I was just doing that to try to get Solr up and running. I
haven't found too many clear guides on this point, so I could
definitely be doing something wrong here.

Have you seen?

If you haven't already done so, you can download a recent 4.0 snapshot by following the 
"Download" link next to "Trunk (4.x-SNAPSHOT)" 
from.

Steve

-Original Message-
From: Eli Finkelshteyn [mailto:iefin...@gmail.com]
Sent: Tuesday, April 10, 2012 2:28 PM
To: solr-user@lucene.apache.org
Subject: Re: Moving to Maven from Ant solr.build.dir Not Found

Hey Steven,
I'm not modifying Solr sources at all. I just have a project that's built on 
top of Solr using ant. I'd like to move it to use maven instead of ant. The way 
I was going about this was just adding in all parts of Solr that it's using as 
dependencies in Maven. I wasn't using a local repo for this at all, and instead 
just pulling everything from http://repository.apache.org/snapshots. I'm using 
version 4.0-SNAPSHOT for everything right now.

I'm running mvn jetty:run-exploded after compiling right now (or as my build 
target in Eclipse as per that guide I originally posted).
Basically, I was just doing that to try to get Solr up and running. I haven't 
found too many clear guides on this point, so I could definitely be doing 
something wrong here.

I'm fine with maven being officially unsupported as long as I can get things 
working. I'm not doing anything too fancy or out of the ordinary, so I'm 
thinking this shouldn't be too bad.

Thanks again for the help!

Eli

On 4/10/12 2:12 PM, Steven A Rowe wrote:

Eli,

Could you please more fully describe what you're doing?

Are you modifying Solr sources, and then compiling&installing the resulting 
modifications to your local Maven repository?

Or do you have a project that doesn't include any Solr sources at all, but only 
depends on Solr artifacts pulled in via Maven?

Also, which branch are you using?  Trunk (will be 4.0 when released)?  If 
you're using branch_3x, my recommendation is that you instead use released 
artifacts instead of snapshots.

Where are you running "mvn jetty:run-exploded"?


I'm not using ant at all, and would really like to keep it that way
if at all possible.

Well, the official Lucene/Solr build is Ant.  Using Maven to build Lucene/Solr is 
"officially unsupported".  So depending on what you're doing, it may not be 
possible to avoid Ant.

Steve

-Original Message-
From: Eli Finkelshteyn [mailto:iefin...@gmail.com]
Sent: Tuesday, April 10, 2012 2:03 PM
To: solr-user@lucene.apache.org
Subject: Re: Moving to Maven from Ant solr.build.dir Not Found

Hey

Re: Suggester not working for digit starting terms

2012-04-11 Thread jmlucjav
Just to be sure, reproduced this with example config from 3.5.

1. add to schema.xml

<fieldType name="text_suggest" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>

<field name="a_suggest" type="text_suggest" indexed="true" stored="true"/>
2. add to solrconfig.xml

<searchComponent name="suggest" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">a_suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookup</str>
    <str name="field">a_suggest</str>
    <str name="buildOnCommit">true</str>
    <int name="weightBuckets">100</int>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">a_suggest</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.collate">true</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
3. wipe data and index the sample docs
4. query:
http://localhost:8983/solr/suggest?q=720&debugQuery=true          --- 0 results
http://localhost:8983/solr/select/?q={!prefix%20f=a_suggest}720   --- 1 result




Question about solr.WordDelimiterFilterFactory

2012-04-11 Thread Jian Xu
Hello,

I am new to solr/lucene. I am tasked to index a large number of documents. Some 
of these documents contain decimal points. I am looking for a way to index 
these documents so that adjacent numeric characters (such as [0-9.,]) are 
treated as single token. For example,

12.34 => "12.34"
12,345 => "12,345"

However, "," and "." should be treated as usual when around non-digital 
characters. For example,

ab,cd => "ab" "cd".

It is so that searching for "12.34" will match "12.34" not "12 34". Searching 
for "ab.cd" should match both "ab.cd" and "ab cd".

After doing some research on solr, it seems that there is a built-in filter 
called solr.WordDelimiterFilterFactory that supports a "types" attribute which 
maps special characters to different delimiter types.  However, it isn't exactly 
what I want. It doesn't provide a context check such as "," or "." must be 
surrounded by digit characters, etc. 
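The closest I can get with it is to retype those characters unconditionally,
something like this (an untested sketch; the types file name is arbitrary, and
a lone "," between letters may still produce a stray token):

<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1"
        catenateNumbers="1" preserveOriginal="1"
        types="wdff-types.txt"/>

# wdff-types.txt: treat . and , as digits so "12.34" and "12,345" stay whole
\u002E => DIGIT
\u002C => DIGIT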

Does anyone have any experience configuring solr to meet these requirements?  Is 
writing my own plugin necessary for this simple thing?

Thanks in advance!

-Jian

Re: How to get a list of values of a specified field

2012-04-11 Thread a sd
I know, I know, this is a very expensive operation, and the requirement is
very odd, but it is also very real. We actually need to go through all of the
documents within lucene again and again.
The original intention is to list all potential values of a specified field,
and then divide the whole traversal into a series of sub-jobs keyed by those
field values.
By the way, I have a suggestion: could solr/lucene expose some more
classes/interfaces as "public"? I have studied the lucene/solr source and
found some utilities that would be convenient for my needs, but sadly they
all have private, protected, or default access modifiers.
B.R.
murphy

On Wed, Apr 11, 2012 at 11:59 PM, Erick Erickson wrote:

> Consider using the TermsComponent
> (http://wiki.apache.org/solr/TermsComponent)
> You could get some number of terms from
> your field at a time by judicious use of, say,
> facet.prefix if you wanted.
>
> But why do you want to do this? It's kind of an
> odd requirement, and since you say there are
> millions of values this will be expensive
>
> Best
> Erick
>
> On Wed, Apr 11, 2012 at 7:23 AM, a sd  wrote:
> > The type of content is "solr.string", actually is a sequence of any
> > characters,"_",number,etc.
> >
> > On Wed, Apr 11, 2012 at 7:06 PM, Marcelo Carvalho Fernandes <
> > mcf2...@gmail.com> wrote:
> >
> >> What type of content do you have in this field?
> >>
> >> ---
> >> Marcelo Carvalho Fernandes
> >>
> >> On Wednesday, April 11, 2012, a sd  wrote:
> >> > hi, all.
> >> >  I want to get all the values of a specified field; this field is of
> >> > type "solr.string".
> >> >  I can achieve this by using the "facet" feature, but there is a
> >> > problem: it responds with all the values of the default "facet" query.
> >> > If there are millions of values in a field, or more, it is a disaster
> >> > for the application. I wondered: is there a way to count the
> >> > number of values first, and then query a segment of values by
> >> > specifying "facet.offset" and "facet.limit" iteratively?
> >> > Thanks for your attention.
> >> > B.R.
> >> > murphy
> >> >
> >>
> >> --
> >> 
> >> Marcelo Carvalho Fernandes
> >> +55 21 8272-7970
> >> +55 21 2205-2786
> >>
>
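
As a concrete sketch of the TermsComponent paging Erick suggests above,
assuming the /terms request handler from the stock example solrconfig.xml and
a placeholder field name my_field. First slice, in index order:

http://localhost:8983/solr/terms?terms.fl=my_field&terms.sort=index&terms.limit=1000

Each following slice passes the last term received as an exclusive lower bound
(LAST_TERM is a placeholder for that term):

http://localhost:8983/solr/terms?terms.fl=my_field&terms.sort=index&terms.limit=1000&terms.lower=LAST_TERM&terms.lower.incl=false

Repeat until a slice comes back with fewer than terms.limit entries; each
slice can then be handed to one of the sub-jobs.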


Re: solr 3.5 taking long to index

2012-04-11 Thread Lance Norskog
It's telling you the problem. Compare your solrconfig.xml against the one
in 3.5/solr/example/solr/conf and you will see what has changed in the
suggested settings.
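
For example (paths assumed, adjust to your layout), something like

diff -u apache-solr-3.5.0/example/solr/conf/solrconfig.xml /your/solr/home/conf/solrconfig.xml

will show exactly where your config has drifted from the stock 3.5 one.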


On Wed, Apr 11, 2012 at 10:42 AM, Rohit  wrote:
> We recently migrated from solr3.1 to solr3.5,  we have one master and one
> slave configured. The master has two cores,
>
>
>
> 1) Core1 - 44555972 documents
>
> 2) Core2 - 29419244 documents
>
>
>
> We commit every 5000 documents, but lately each commit is taking very long, 15
> minutes plus in some cases. What could have caused this? I have checked the
> logs and the only warning I can see is:
>
>
>
> "WARNING: Use of deprecated update request parameter update.processor
> detected. Please use the new parameter update.chain instead, as support for
> update.processor will be removed in a later version."
>
>
>
> Memory details:
>
>
>
> export JAVA_OPTS="$JAVA_OPTS -Xms6g -Xmx36g -XX:MaxPermSize=5g"
>
>
>
> Solr Config:
>
>
>
> false
>
> 10
>
> 32
>
> 
>
>  1
>
>  1000
>
>  1
>
>
>
> What could be causing this, as everything was running fine a few days back?
>
>
>
>
>
> Regards,
>
> Rohit
>
> Mobile: +91-9901768202
>
> About Me:   http://about.me/rohitg
>
>
>



-- 
Lance Norskog
goks...@gmail.com


Solr 3.5 takes very long to commit gradually

2012-04-11 Thread Rohit
We recently migrated from solr3.1 to solr3.5; we have one master and one
slave configured. The master has two cores,

1) Core1 - 44555972 documents

2) Core2 - 29419244 documents

We commit every 5000 documents, but lately the commit time has been gradually
increasing, and Solr is taking very long, 15 minutes plus in some cases. What
could have caused this? I have checked the logs and the only warning I can
see is:

"WARNING: Use of deprecated update request parameter update.processor
detected. Please use the new parameter update.chain instead, as support for
update.processor will be removed in a later version."

Memory details:

export JAVA_OPTS="$JAVA_OPTS -Xms6g -Xmx36g -XX:MaxPermSize=5g"

Solr Config:

false

10

32



1

1000

1

Also noticed that the top command shows almost 350GB of virtual memory usage.

What could be causing this? Everything was running fine a few days back.

 

 

Regards,

Rohit

Mobile: +91-9901768202

About Me:   http://about.me/rohitg
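
One thing worth trying, purely as a sketch and assuming the stock
DirectUpdateHandler2: stop committing from the client every 5000 documents and
let Solr batch commits through autoCommit in solrconfig.xml. The thresholds
below are only illustrative:

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- commit after this many added docs... -->
    <maxDocs>50000</maxDocs>
    <!-- ...or after 5 minutes, whichever comes first -->
    <maxTime>300000</maxTime>
  </autoCommit>
</updateHandler>

Fewer, larger commits mean fewer searcher reopens and less merge pressure,
which is usually where multi-minute commit times on indexes this size come
from.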

 



Re: Suggester not working for digit starting terms

2012-04-11 Thread Robert Muir
On Wed, Apr 11, 2012 at 4:37 PM, jmlucjav  wrote:
> Just to be sure, reproduced this with example config from 3.5.
>

Regardless of your tokenizer, be aware that with this version of solr
it's going to split up terms based on 'identifier rules' (including
splitting on whitespace).
This is because suggesters go through the ordinary spellchecker framework.

If you are trying to autosuggest actual phrases, have a look at
http://wiki.apache.org/solr/Suggester#Tips_and_tricks
which describes how to set this up along with example configurations.
More information is available in
https://issues.apache.org/jira/browse/SOLR-3143

Essentially this provides a QueryConverter that's hopefully more
suitable for autosuggesters: it just passes the entire input to
your query analyzer,
and it's your responsibility to do whatever you need there to extract
the 'meat' of the query for autosuggest. The example configuration
linked from the wiki page does just that, using some regexps to
imitate what google does (discarding operators like +/- but still
keeping the whole thing as a phrase).

You will need Solr 3.6 for this, but it's on its way out the door.
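
If I'm reading SOLR-3143 right, the registration in solrconfig.xml is roughly
the following (3.6 only; treat the class name as coming from that patch rather
than from anything tested here):

<queryConverter name="queryConverter"
                class="org.apache.solr.spelling.SuggestQueryConverter"/>

The heavy lifting then happens in the query analyzer of the field type your
suggester is configured with, e.g. the regexp-based example linked from the
wiki page above.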

-- 
lucidimagination.com


Re: solr 3.5 taking long to index

2012-04-11 Thread Bernd Fehling

There were some changes in solrconfig.xml between solr3.1 and solr3.5.
Always read CHANGES.txt when switching to a new version.
Also helpful is comparing both versions of solrconfig.xml from the examples.

Are you sure you need a MaxPermSize of 5g?
Use jvisualvm to see what you really need.
The same goes for all the other JAVA_OPTS.
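
For instance, a quick check with jstat (the pid is a placeholder):

jstat -gcutil <solr-pid> 5000

samples GC utilization every 5 seconds; if the P (perm gen) column never gets
anywhere near 100, the 5g MaxPermSize is just wasted address space.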



Am 11.04.2012 19:42, schrieb Rohit:
> We recently migrated from solr3.1 to solr3.5,  we have one master and one
> slave configured. The master has two cores,
> 
>  
> 
> 1) Core1 - 44555972 documents
> 
> 2) Core2 - 29419244 documents
> 
>  
> 
> We commit every 5000 documents, but lately each commit is taking very long, 15
> minutes plus in some cases. What could have caused this? I have checked the
> logs and the only warning I can see is:
> 
>  
> 
> "WARNING: Use of deprecated update request parameter update.processor
> detected. Please use the new parameter update.chain instead, as support for
> update.processor will be removed in a later version."
> 
>  
> 
> Memory details:
> 
>  
> 
> export JAVA_OPTS="$JAVA_OPTS -Xms6g -Xmx36g -XX:MaxPermSize=5g"
> 
>  
> 
> Solr Config:
> 
>  
> 
> false
> 
> 10
> 
> 32
> 
> 
> 
>   1
> 
>   1000
> 
>   1
> 
>  
> 
> What could be causing this, as everything was running fine a few days back?
> 
>  
> 
>  
> 
> Regards,
> 
> Rohit
> 
> Mobile: +91-9901768202
> 
> About Me:   http://about.me/rohitg
> 
>  
> 
> 


Re: Multi-words synonyms matching

2012-04-11 Thread elisabeth benoit
oh, that's right.

thanks a lot,
Elisabeth
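
For the record, a minimal sketch of the mapping Jeevanandam describes below,
assuming index-time-only synonyms and whitespace tokenization (the field type
name is made up):

In synonyms.txt:

mairie => mairie, hotel de ville

In schema.xml:

<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- expand="true" only affects comma-only lines; "=>" rules always map -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

A document containing "mairie" then also indexes hotel/de/ville at adjacent
positions, so both "mairie" and "hotel de ville" match it, while a document
containing only "hotel" is left untouched.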

2012/4/11 Jeevanandam Madanagopal 

> Elisabeth -
>
> As you described, the mapping below might suit your need.
> mairie => hotel de ville, mairie
>
> mairie gets expanded to "hotel de ville" and "mairie" at index time, so
> both "mairie" and "hotel de ville" are searchable on the document.
>
> However, the whitespace tokenizer splitting the query at search time will
> still be a problem, as described by Markus.
>
> --Jeevanandam
>
> On Apr 11, 2012, at 12:30 PM, elisabeth benoit wrote:
>
> > < Have you tried the "=>" mapping instead? Something
> > < like
> > < hotel de ville => mairie
> > < might work for you.
> > Yes, thanks, I've tried it, but from what I understand it doesn't solve my
> > problem, since this means "hotel de ville" will be replaced by "mairie" at
> > index time (I use synonyms only at index time). So when a user asks for
> > "hôtel de ville", it won't match.
> >
> > In fact, at index time I have mairie in my data, but I want users to be
> > able to request "mairie" or "hôtel de ville" and get mairie as an answer,
> > and not get mairie as an answer when requesting "hôtel".
> >
> >
> > < your whitespace tokenizer will still split "hotel de ville" at query
> > < time, so the multi-word synonym won't match the query
> > Ok, I guess this means I have a problem. No simple solution, since at
> > query time my tokenizer does split on white spaces.
> >
> > I guess my problem is more or less one of the problems discussed in
> >
> > http://lucene.472066.n3.nabble.com/Multi-word-synonyms-td3716292.html#a3717215
> >
> >
> > Thanks a lot for your answers,
> > Elisabeth
> >
> >
> >
> >
> >
> > 2012/4/10 Erick Erickson 
> >
> >> Have you tried the "=>' mapping instead? Something
> >> like
> >> hotel de ville => mairie
> >> might work for you.
> >>
> >> Best
> >> Erick
> >>
> >> On Tue, Apr 10, 2012 at 1:41 AM, elisabeth benoit
> >>  wrote:
> >>> Hello,
> >>>
> >>> I've read several post on this issue, but can't find a real solution to
> >> my
> >>> multi-words synonyms matching problem.
> >>>
> >>> I have in my synonyms.txt an entry like
> >>>
> >>> mairie, hotel de ville
> >>>
> >>> and my index time analyzer is configured as followed for synonyms.
> >>>
> >>>  >>> ignoreCase="true" expand="true"/>
> >>>
> >>> The problem I have is that now "mairie" matches with "hotel" and I
> would
> >>> only want "mairie" to match with "hotel de ville" and "mairie".
> >>>
> >>> When I look into the analyzer, I see that "mairie" is mapped into
> >> "hotel",
> >>> and words "de ville" are added in second and third position. To change
> >>> that, I tried to do
> >>>
> >>>  >>> ignoreCase="true" expand="true"
> >>> tokenizerFactory="solr.KeywordTokenizerFactory"/> (as I read in one
> post)
> >>>
> >>> and I can see now in the analyzer that "mairie" is mapped to "hotel de
> >>> ville", but now when I have query "hotel de ville", it doesn't match at
> >> all
> >>> with "mairie".
> >>>
> >>> Anyone has a clue of what I'm doing wrong?
> >>>
> >>> I'm using Solr 3.4.
> >>>
> >>> Thanks,
> >>> Elisabeth
> >>
>
>