Re: NRT or similar for Solr 3.5?

2011-12-12 Thread vikram kamath
The onclick handler does not seem to be called in Google Chrome (Ubuntu).

Also, I don't seem to receive the email with the confirmation link on
registering (I have checked my spam folder).




Regards
Vikram Kamath



2011/12/12 Nagendra Nagarajayya 

> Steven:
>
> There is an onclick handler that allows you to download the src. BTW, an
> early access Solr 3.5 with RankingAlgorithm 1.3 (NRT) release is
> available for download. So please give it a try.
>
> Regards,
>
> - Nagendra Nagarajayya
> http://solr-ra.tgels.org
> http://rankingalgorithm.tgels.org
>
>
> On 12/10/2011 11:18 PM, Steven Ou wrote:
> > All the links on the download section link to http://solr-ra.tgels.org/#
> > --
> > Steven Ou | 歐偉凡
> >
> > *ravn.com* | Chief Technology Officer
> > steve...@gmail.com | +1 909-569-9880
> >
> >
> > 2011/12/11 Nagendra Nagarajayya 
> >
> >> Steven:
> >>
> >> Not sure why you had problems; #downloads (
> >> http://solr-ra.tgels.org/#downloads ) should point you to the downloads
> >> section showing the different versions available for download. Please
> >> share if this is not so (there were downloads yesterday with no
> problems).
> >>
> >> Regarding NRT, you can switch between RA and Lucene at query level or at
> >> config level; in the current version with RA, NRT is in effect, while
> >> with Lucene it is not. You can get more information from here:
> >> http://solr-ra.tgels.org/papers/Solr34_with_RankingAlgorithm13.pdf
> >>
> >> Solr 3.5 with RankingAlgorithm 1.3 should be available next week.
> >>
> >> Regards,
> >>
> >> - Nagendra Nagarajayya
> >> http://solr-ra.tgels.org
> >> http://rankingalgorithm.tgels.org
> >>
> >> On 12/9/2011 4:49 PM, Steven Ou wrote:
> >>> Hey Nagendra,
> >>>
> >>> I took a look and Solr-RA looks promising - but:
> >>>
> >>>- I could not figure out how to download it. It seems like all the
> >>>download links just point to "#"
> >>>- I wasn't looking for another ranking algorithm, so would it be
> >>>possible for me to use NRT but *not* RA (i.e. just use the normal
> >> Lucene
> >>>library)?
> >>>
> >>> --
> >>> Steven Ou | 歐偉凡
> >>>
> >>> *ravn.com* | Chief Technology Officer
> >>> steve...@gmail.com | +1 909-569-9880
> >>>
> >>>
> >>> On Sat, Dec 10, 2011 at 5:13 AM, Nagendra Nagarajayya <
> >>> nnagaraja...@transaxtions.com> wrote:
> >>>
>  Steven:
> 
>  Please take a look at Solr  with RankingAlgorithm. It offers NRT
>  functionality. You can set your autoCommit to about 15 mins. You can
> get
>  more information from here:
>  http://solr-ra.tgels.com/wiki/en/Near_Real_Time_Search_ver_3.x
> 
>  Regards,
> 
>  - Nagendra Nagarajayya
>  http://solr-ra.tgels.org
>  http://rankingalgorithm.tgels.org
> 
> 
>  On 12/8/2011 9:30 PM, Steven Ou wrote:
> 
> > Hi guys,
> >
> > I'm looking for NRT functionality or similar in Solr 3.5. Is that
> > possible?
> >> From what I understand there's NRT in Solr 4, but I can't figure out
> > whether or not 3.5 can do it as well?
> >
> > If not, is it feasible to use an autoCommit every 1000ms? We don't
> > currently process *that* much data so I wonder if it's OK to just
> >> commit
> > very often? Obviously not scalable on a large scale, but it is
> feasible
> > for
> > a relatively small amount of data?
> >
> > I recently upgraded from Solr 1.4 to 3.5. I had a hard time getting
> > everything working smoothly and the process ended up taking my site
> >> down
> > for a couple hours. I am very hesitant to upgrade to Solr 4 if it's
> not
> > necessary to get some sort of NRT functionality.
> >
> > Can anyone help me? Thanks!
> > --
> > Steven Ou | 歐偉凡
> >
> > *ravn.com* | Chief Technology Officer
> > steve...@gmail.com | +1 909-569-9880
> >
> >
> >>
>
>
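
As an aside on the autoCommit idea raised above: in Solr 3.x this is configured
in solrconfig.xml under the update handler. A minimal sketch, where the 1000 ms
and ~15 minute values are simply the two options mentioned in this thread, not
recommendations:

  <!-- solrconfig.xml, inside <updateHandler class="solr.DirectUpdateHandler2"> -->
  <autoCommit>
    <maxTime>1000</maxTime>       <!-- commit at most every second, as Steven proposes -->
    <!-- <maxTime>900000</maxTime>   roughly 15 minutes, as Nagendra suggests -->
    <maxDocs>10000</maxDocs>      <!-- or commit once this many documents are pending -->
  </autoCommit>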


Re: cache monitoring tools?

2011-12-12 Thread Dmitry Kan
Hoss, I can't see why Network IO is the issue, as the shards and the front
end SOLR resided on the same server. I said "resided" because I got rid of
the front end (which, according to my measurements, was taking at least as
much time for merging as it took to find the actual data in the shards) and
of the shards. Now I have only one shard holding all the data. Filter cache
tuning also helped to reduce the amount of evictions to a minimum.

Dmitry

On Fri, Dec 9, 2011 at 10:42 PM, Chris Hostetter
wrote:

>
> : The culprit seems to be the merger (frontend) SOLR. Talking to one shard
> : directly takes substantially less time (1-2 sec).
> ...
> : >> > > >>facet.limit=50
>
> Your problem most likely has very little to do with your caches at all
> -- a facet.limit that high requires sending a very large amount of data
> over the wire, multiplied by the number of shards, multiplied by some
> constant (i think it's 2 but it might be higher) in order to "over
> request" facet constraint counts from each shard to aggregate them.
>
> the dominant factor in the slow speed you are seeing is most likely
> Network IO between the shards.
>
>
>
> -Hoss
>



-- 
Regards,

Dmitry Kan
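
For reference, the filterCache tuning Dmitry mentions lives in solrconfig.xml.
A minimal sketch, with sizes that are only examples (the goal is simply to keep
evictions near zero for your particular filter mix):

  <!-- solrconfig.xml, inside <query> -->
  <filterCache class="solr.FastLRUCache"
               size="16384"
               initialSize="4096"
               autowarmCount="1024"/>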


Re: cache monitoring tools?

2011-12-12 Thread Dmitry Kan
Paul, have you checked solrmeter and zabbix?

Dmitry

On Fri, Dec 9, 2011 at 11:16 PM, Paul Libbrecht  wrote:

> Allow me to chime in and ask a generic question about monitoring tools for
> people close to developers: are any of the tools mentioned in this thread
> actually able to show graphs of loads, e.g. cache counts or CPU load, in
> parallel to a console log or to an http request log??
>
> I am working on such a tool currently but I have a bad feeling of
> reinventing the wheel.
>
> thanks in advance
>
> Paul
>
>
>
> Le 8 déc. 2011 à 08:53, Dmitry Kan a écrit :
>
> > Otis, Tomás: thanks for the great links!
> >
> > 2011/12/7 Tomás Fernández Löbbe 
> >
> >> Hi Dimitry, I pointed to the wiki page to enable JMX, then you can use
> any
> >> tool that visualizes JMX stuff like Zabbix. See
> >>
> >>
> http://www.lucidimagination.com/blog/2011/10/02/monitoring-apache-solr-and-lucidworks-with-zabbix/
> >>
> >> On Wed, Dec 7, 2011 at 11:49 AM, Dmitry Kan 
> wrote:
> >>
> >>> The culprit seems to be the merger (frontend) SOLR. Talking to one
> shard
> >>> directly takes substantially less time (1-2 sec).
> >>>
> >>> On Wed, Dec 7, 2011 at 4:10 PM, Dmitry Kan 
> wrote:
> >>>
>  Tomás: thanks. The page you gave didn't mention cache specifically, is
>  there more documentation on this specifically? I have used solrmeter
> >>> tool,
>  it draws the cache diagrams, is there a similar tool, but which would
> >> use
>  jmx directly and present the cache usage in runtime?
> 
>  pravesh:
>  I have increased the size of filterCache, but the search hasn't become
> >>> any
>  faster, taking almost 9 sec on avg :(
> 
>  name: search
>  class: org.apache.solr.handler.component.SearchHandler
>  version: $Revision: 1052938 $
>  description: Search using components:
> 
> >>>
> >>
> org.apache.solr.handler.component.QueryComponent,org.apache.solr.handler.component.FacetComponent,org.apache.solr.handler.component.MoreLikeThisComponent,org.apache.solr.handler.component.HighlightComponent,org.apache.solr.handler.component.StatsComponent,org.apache.solr.handler.component.DebugComponent,
> 
>  stats: handlerStart : 1323255147351
>  requests : 100
>  errors : 3
>  timeouts : 0
>  totalTime : 885438
>  avgTimePerRequest : 8854.38
>  avgRequestsPerSecond : 0.008789442
> 
>  the stats (copying fieldValueCache as well here, to show term
> >>> statistics):
> 
>  name: fieldValueCache
>  class: org.apache.solr.search.FastLRUCache
>  version: 1.0
>  description: Concurrent LRU Cache(maxSize=10000, initialSize=10,
>  minSize=9000, acceptableSize=9500, cleanupThread=false)
>  stats: lookups : 79
>  hits : 77
>  hitratio : 0.97
>  inserts : 1
>  evictions : 0
>  size : 1
>  warmupTime : 0
>  cumulative_lookups : 79
>  cumulative_hits : 77
>  cumulative_hitratio : 0.97
>  cumulative_inserts : 1
>  cumulative_evictions : 0
>  item_shingleContent_trigram :
> 
> >>>
> >>
> {field=shingleContent_trigram,memSize=326924381,tindexSize=4765394,time=215426,phase1=213868,nTerms=14827061,bigTerms=35,termInstances=114359167,uses=78}
>  name: filterCache
>  class: org.apache.solr.search.FastLRUCache
>  version: 1.0
>  description: Concurrent LRU Cache(maxSize=153600, initialSize=4096,
>  minSize=138240, acceptableSize=145920, cleanupThread=false)
>  stats: lookups : 1082854
>  hits : 940370
>  hitratio : 0.86
>  inserts : 142486
>  evictions : 0
>  size : 142486
>  warmupTime : 0
>  cumulative_lookups : 1082854
>  cumulative_hits : 940370
>  cumulative_hitratio : 0.86
>  cumulative_inserts : 142486
>  cumulative_evictions : 0
> 
> 
>  index size: 3,25 GB
> 
>  Does anyone have some pointers to where to look at and optimize for
> >> query
>  time?
> 
> 
>  2011/12/7 Tomás Fernández Löbbe 
> 
> > Hi Dimitry, cache information is exposed via JMX, so you should be
> >> able
> >>> to
> > monitor that information with any JMX tool. See
> > http://wiki.apache.org/solr/SolrJmx
> >
> > On Wed, Dec 7, 2011 at 6:19 AM, Dmitry Kan 
> >>> wrote:
> >
> >> Yes, we do require that much.
> >> Ok, thanks, I will try increasing the maxsize.
> >>
> >> On Wed, Dec 7, 2011 at 10:56 AM, pravesh 
> > wrote:
> >>
> > facet.limit=50
> >>> your facet.limit seems too high. Do you actually require this
> >> much?
> >>>
> >>> Since there a lot of evictions from filtercache, so, increase the
> > maxsize
> >>> value to your acceptable limit.
> >>>
> >>> Regards
> >>> Pravesh
> >>>
> >>> --
> >>> View this message in context:
> >>>
> >>
> >
> >>>
> >>
> http://lucene.472066.n3.nabble.com/cache-monitoring-tools-tp3566645p3566811.html
> >>> Sent from the Solr - User mailing list archive at Nabble.com.
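
A note for anyone wiring up the JMX-based tools mentioned in this thread
(Zabbix, jconsole, etc.): Solr only registers its MBeans when JMX is enabled in
solrconfig.xml. A minimal sketch, per http://wiki.apache.org/solr/SolrJmx:

  <!-- solrconfig.xml -->
  <jmx />
  <!-- or attach to a specific MBean server, e.g.: -->
  <!-- <jmx serviceUrl="service:jmx:rmi:///jndi/rmi://localhost:9999/solr"/> -->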

limiting the content of content field in search results

2011-12-12 Thread ayyappan
I am developing an application which indexes whole PDFs and other documents
to Solr. I have completed a working version of my application, but there are
some problems. The main one is that when I do a search, the whole indexed
document is shown. I have used SolrJ and need some help to reduce this
content.

How can I limit the content of the content field in search results and
display it there?

I need something like this:



*Grammer1.docx*
Blazing – burring Faceted Cluster – to gather Geospatial Replication –
coping Distinguish – apart from Flawlessly – perfectly Recipe –method
Concentrated inscription 
Last Modified : 2011-12-11T14:42:27Z

*who.pdf*
Who We Are Table of contents 1 Solr Committers (in alphabetical
order)fgfgfgfg2 2 Inactive Committers (in alphabetical orde 

*version_control.pdf*
Solr Version Control System Table of contents 1 Overview.gfgfgfg 2 Web Acce 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/limiting-the-content-of-content-field-in-search-results-tp3578859p3578859.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 3.4 problem with words separated by coma without space

2011-12-12 Thread elisabeth benoit
Thanks for the answer.

Yes, in fact when I look at the debugQuery output, I notice that name and
number are never treated as single entries.

I have

(((text:name text:number)) (text:ru) (text:tain) (text:paris)))

so name and number are in the same parentheses, but not exactly treated as a
phrase, as far as I know, since a phrase would be more like text:"name
number".

could you tell me what is the difference between (text:name text:number)
and (text:"name number")?

I'll check autoGeneratePhraseQueries.

Best regards,
Elisabeth




2011/12/8 Chris Hostetter 

>
> : If I check in the solr.admin.analyzer, I get the same analysis for the
> two
> : different requests. But it seems, if fact, that the lacking space after
> : coma prevents name and number from matching.
>
> query analysis is only part of the picture ... Did you look at the
> debugQuery output? ...  i believe you are seeing the effects of the
> QueryParser analyzing "name," distinctly from "number" in one case, vs
> analyzing the entire string "name,number" in the second case, and treating
> the latter as a phrase query (because one input clause produces multiple
> tokens)
>
> there is a recently added autoGeneratePhraseQueries option that affects
> this.
>
>
> -Hoss
>
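
For reference, the autoGeneratePhraseQueries option Hoss mentions is a
per-fieldType attribute in Solr 3.1 and later. A sketch, assuming a typical
text field (the analyzer shown here is only an example):

  <!-- schema.xml: with autoGeneratePhraseQueries="false", an input clause like
       "name,number" that analyzes to several tokens is parsed as
       (text:name text:number) rather than the phrase query text:"name number" -->
  <fieldType name="text" class="solr.TextField"
             positionIncrementGap="100"
             autoGeneratePhraseQueries="false">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

In the first form the two clauses are independent term queries (either one can
match, subject to the default operator); the quoted form is a phrase query that
requires the terms to appear next to each other.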


Re: cache monitoring tools?

2011-12-12 Thread Dmitry Kan
Justin, in terms of overhead, have you noticed whether Munin adds much of it
when used in production? And regarding the Solr farm: how big is a shard's
index (given that you have a sharded architecture)?

Dmitry

On Sun, Dec 11, 2011 at 6:39 PM, Justin Caratzas
wrote:

> At my work, we use Munin and Nagios for monitoring and alerts.  Munin is
> great because writing a plugin for it is so simple, and with Solr's
> statistics handler, we can track almost any solr stat we want.  It also
> comes with included plugins for load, file system stats, processes,
> etc.
>
> http://munin-monitoring.org/
>
> Justin
>
> Paul Libbrecht  writes:
>
> > Allow me to chime in and ask a generic question about monitoring tools
> > for people close to developers: are any of the tools mentioned in this
> > thread actually able to show graphs of loads, e.g. cache counts or CPU
> > load, in parallel to a console log or to an http request log??
> >
> > I am working on such a tool currently but I have a bad feeling of
> reinventing the wheel.
> >
> > thanks in advance
> >
> > Paul
> >
> >
> >
> > Le 8 déc. 2011 à 08:53, Dmitry Kan a écrit :
> >
> >> Otis, Tomás: thanks for the great links!
> >>
> >> 2011/12/7 Tomás Fernández Löbbe 
> >>
> >>> Hi Dimitry, I pointed to the wiki page to enable JMX, then you can use
> any
> >>> tool that visualizes JMX stuff like Zabbix. See
> >>>
> >>>
> http://www.lucidimagination.com/blog/2011/10/02/monitoring-apache-solr-and-lucidworks-with-zabbix/
> >>>
> >>> On Wed, Dec 7, 2011 at 11:49 AM, Dmitry Kan 
> wrote:
> >>>
>  The culprit seems to be the merger (frontend) SOLR. Talking to one
> shard
>  directly takes substantially less time (1-2 sec).
> 
>  On Wed, Dec 7, 2011 at 4:10 PM, Dmitry Kan 
> wrote:
> 
> > Tomás: thanks. The page you gave didn't mention cache specifically,
> is
> > there more documentation on this specifically? I have used solrmeter
>  tool,
> > it draws the cache diagrams, is there a similar tool, but which would
> >>> use
> > jmx directly and present the cache usage in runtime?
> >
> > pravesh:
> > I have increased the size of filterCache, but the search hasn't
> become
>  any
> > faster, taking almost 9 sec on avg :(
> >
> > name: search
> > class: org.apache.solr.handler.component.SearchHandler
> > version: $Revision: 1052938 $
> > description: Search using components:
> >
> 
> >>>
> org.apache.solr.handler.component.QueryComponent,org.apache.solr.handler.component.FacetComponent,org.apache.solr.handler.component.MoreLikeThisComponent,org.apache.solr.handler.component.HighlightComponent,org.apache.solr.handler.component.StatsComponent,org.apache.solr.handler.component.DebugComponent,
> >
> > stats: handlerStart : 1323255147351
> > requests : 100
> > errors : 3
> > timeouts : 0
> > totalTime : 885438
> > avgTimePerRequest : 8854.38
> > avgRequestsPerSecond : 0.008789442
> >
> > the stats (copying fieldValueCache as well here, to show term
>  statistics):
> >
> > name: fieldValueCache
> > class: org.apache.solr.search.FastLRUCache
> > version: 1.0
> > description: Concurrent LRU Cache(maxSize=10000, initialSize=10,
> > minSize=9000, acceptableSize=9500, cleanupThread=false)
> > stats: lookups : 79
> > hits : 77
> > hitratio : 0.97
> > inserts : 1
> > evictions : 0
> > size : 1
> > warmupTime : 0
> > cumulative_lookups : 79
> > cumulative_hits : 77
> > cumulative_hitratio : 0.97
> > cumulative_inserts : 1
> > cumulative_evictions : 0
> > item_shingleContent_trigram :
> >
> 
> >>>
> {field=shingleContent_trigram,memSize=326924381,tindexSize=4765394,time=215426,phase1=213868,nTerms=14827061,bigTerms=35,termInstances=114359167,uses=78}
> > name: filterCache
> > class: org.apache.solr.search.FastLRUCache
> > version: 1.0
> > description: Concurrent LRU Cache(maxSize=153600, initialSize=4096,
> > minSize=138240, acceptableSize=145920, cleanupThread=false)
> > stats: lookups : 1082854
> > hits : 940370
> > hitratio : 0.86
> > inserts : 142486
> > evictions : 0
> > size : 142486
> > warmupTime : 0
> > cumulative_lookups : 1082854
> > cumulative_hits : 940370
> > cumulative_hitratio : 0.86
> > cumulative_inserts : 142486
> > cumulative_evictions : 0
> >
> >
> > index size: 3,25 GB
> >
> > Does anyone have some pointers to where to look at and optimize for
> >>> query
> > time?
> >
> >
> > 2011/12/7 Tomás Fernández Löbbe 
> >
> >> Hi Dimitry, cache information is exposed via JMX, so you should be
> >>> able
>  to
> >> monitor that information with any JMX tool. See
> >> http://wiki.apache.org/solr/SolrJmx
> >>
> >> On Wed, Dec 7, 2011 at 6:19 AM, Dmitry Kan 
>  wrote:
> >>
> >>> Yes, we do require that much.
> >>> Ok, thanks, I will try increasing the maxsize.

InvalidTokenOffsetsException in conjunction with highlighting and ICU folding and edgeNgrams

2011-12-12 Thread Max
Hi there,

when highlighting a field with this definition:
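
(The fieldType XML here was stripped by the list archive. Going by the thread
subject and the replies below, it combined a char filter, ICU folding and an
edge n-gram filter; a rough sketch of such a definition, in which every name
and parameter is an assumption, would be:)

  <fieldType name="text_edge" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <!-- some charFilter was configured here as well; Max later removes it -->
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.ICUFoldingFilterFactory"/>
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
    </analyzer>
  </fieldType>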

containing this string:

"Mosfellsbær"

I get the following exception, if that field is in the highlight fields:

SEVERE: org.apache.solr.common.SolrException:
org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token
mosfellsbaer exceeds length of provided text sized 11
at 
org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:497)
at 
org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:401)
at 
org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:131)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:636)
Caused by: org.apache.lucene.search.highlight.InvalidTokenOffsetsException:
Token mosfellsbaer exceeds length of provided text sized 11
at 
org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:233)
at 
org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:490)

I tried with solr 3.4 and 3.5, same error for both. Removing the char
filter didn't fix the problem either.

It seems like there is some weird stuff going on when folding the
string, it can be seen in the analysis view, too:

http://i.imgur.com/6B2Uh.png

The end offset remains 11 even after folding and transforming "æ" to
"ae", which seems wrong to me.

I also stumbled upon https://issues.apache.org/jira/browse/LUCENE-1500
which seems like a similiar issue.

Is there a workaround for that problem or is the field configuration wrong?


Ask about the question of solr cache

2011-12-12 Thread JiaoyanChen
When I delete or add data from my application through SolrJ, or import an
index through the command "nutch solrindex", Solr's caches are not
updated unless I restart Solr.
Could anyone tell me how I can refresh the Solr caches without restarting,
using a shell command?
When I recreate the index with Nutch, the data in Solr should be updated as well.
I use java -jar start.jar to run Solr.
Thanks!



Re: InvalidTokenOffsetsException in conjunction with highlighting and ICU folding and edgeNgrams

2011-12-12 Thread Robert Muir
On Mon, Dec 12, 2011 at 5:18 AM, Max  wrote:

> The end offset remains 11 even after folding and transforming "æ" to
> "ae", which seems wrong to me.

End offsets refer to the *original text* so this is correct.

What is wrong, is EdgeNGramsFilter. See how it turns that 11 to a 12?

>
> I also stumbled upon https://issues.apache.org/jira/browse/LUCENE-1500
> which seems like a similiar issue.
>
> Is there a workaround for that problem or is the field configuration wrong?

For now, don't use EdgeNGrams.

-- 
lucidimagination.com


Re: InvalidTokenOffsetsException in conjunction with highlighting and ICU folding and edgeNgrams

2011-12-12 Thread Robert Muir
On Mon, Dec 12, 2011 at 5:18 AM, Max  wrote:

> It seems like there is some weird stuff going on when folding the
> string, it can be seen in the analysis view, too:
>
> http://i.imgur.com/6B2Uh.png
>

I created a bug here, https://issues.apache.org/jira/browse/LUCENE-3642

Thanks for the screenshot, makes it easy to do a test case here.

-- 
lucidimagination.com


Re: InvalidTokenOffsetsException in conjunction with highlighting and ICU folding and edgeNgrams

2011-12-12 Thread Max
Robert, thank you for creating the issue in JIRA.

However, I need ngrams on that field – is there an alternative to the
EdgeNGramFilterFactory ?

Thanks!

On Mon, Dec 12, 2011 at 1:25 PM, Robert Muir  wrote:
> On Mon, Dec 12, 2011 at 5:18 AM, Max  wrote:
>
>> It seems like there is some weird stuff going on when folding the
>> string, it can be seen in the analysis view, too:
>>
>> http://i.imgur.com/6B2Uh.png
>>
>
> I created a bug here, https://issues.apache.org/jira/browse/LUCENE-3642
>
> Thanks for the screenshot, makes it easy to do a test case here.
>
> --
> lucidimagination.com


Re: Setting group.ngroups=true considerable slows down queries

2011-12-12 Thread Martijn v Groningen
Hi!

As far as I know, there currently isn't another way. Unfortunately the
performance degrades badly when there are a lot of unique groups.
I think an issue should be opened to investigate how we can improve this...

Question: Does Solr have a decent chunk of heap space (-Xmx)? Because
grouping requires quite some heap space (also without
group.ngroups=true).

Martijn

On 9 December 2011 23:08, Michael Jakl  wrote:
> Hi!
>
> On Fri, Dec 9, 2011 at 17:41, Martijn v Groningen
>  wrote:
>> On what field type are you grouping and what version of Solr are you
>> using? Grouping by string field is faster.
>
> The field is defined as follows:
> 
>
> Grouping itself is quite fast; only computing the number of groups
> seems to increase significantly with the number of documents (linearly).
>
> I was hoping for a faster solution to compute the total number of
> distinct documents (or in other terms, the number of distinct values
> in the signature field). Facets came to mind, but as far as I could
> see, they don't offer a total number of facets as well.
>
> I'm using Solr 3.5 (upgraded from Solr 3.4 without reindexing).
>
> Thanks,
> Michael
>
>> On 9 December 2011 12:46, Michael Jakl  wrote:
>>> Hi, I'm using the grouping feature of Solr to return a list of unique
>>> documents together with a count of the duplicates.
>>>
>>> Essentially I use Solr's signature algorithm to create the "signature"
>>> field and use grouping on it.
>>>
>>> To provide good numbers for paging through my result list, I'd like to
>>> compute the total number of documents found (= matches) and the number
>>> of unique documents (= ngroups). Unfortunately, enabling
>>> "group.ngroups" considerably slows down the query (from 500ms to
>>> 23000ms for a result list of roughly 30 documents).
>>>
>>> Is there a faster way to compute the number of groups (or unique
>>> values in the signature field) in the search result? My Solr instance
>>> currently contains about 50 million documents and around 10% of them
>>> are duplicates.
>>>
>>> Thank you,
>>> Michael
>>
>>
>>
>> --
>> Met vriendelijke groet,
>>
>> Martijn van Groningen



-- 
Met vriendelijke groet,

Martijn van Groningen
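
For context, a sketch of the kind of request being discussed, assuming
Michael's signature field (group.ngroups=true is the part that triggers the
extra cost):

  ...&q=*:*&group=true&group.field=signature&group.ngroups=true&rows=10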


ExtractingRequestHandler and HTML

2011-12-12 Thread Michael Kelleher
I am submitting HTML documents to Solr using the ERH.  Is it possible to 
store the contents of the document (including all markup) in a field?  
Using fmap.content (I am assuming this comes from Tika) stores the 
extracted text of the document in a field, but not the markup.  I want 
the whole unaltered document.


Is this possible?

thanks

--mike


Re: performance of json vs xml?

2011-12-12 Thread Erick Erickson
How are you getting your documents into Solr? Because
if you're using SolrJ it's a moot point because a binary
format is used.

I haven't done any specific comparisons, but I'd be
surprised if JSON took longer.

And removing a whole operation from your update
chain that had to be kept fed and watered is worth
the risk of a bit of slowdown.

In other words, "Try it and see" ...

Best
Erick

On Sun, Dec 11, 2011 at 3:16 PM, Jason Toy  wrote:
> I'm thinking about modifying my index process to use json because all my
> docs are originally in json anyway . Are there any performance issues if I
> insert json docs instead of xml docs?  A colleague recommended to me to
> stay with xml because solr is highly optimized for xml.
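
For anyone weighing the two formats, the same (made-up) document looks like
this in each; in Solr 3.x the XML form goes to /update and the JSON form to
/update/json:

  <add>
    <doc>
      <field name="id">1</field>
      <field name="title">hello world</field>
    </doc>
  </add>

  [ { "id": "1", "title": "hello world" } ]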


Re: Setting group.ngroups=true considerable slows down queries

2011-12-12 Thread Michael Jakl
Hi!

On Mon, Dec 12, 2011 at 13:57, Martijn v Groningen
 wrote:
> As far as I know, there currently isn't another way. Unfortunately the
> performance degrades badly when having a lot of unique groups.
> I think an issue should be opened to investigate how we can improve this...
>
> Question: Does Solr have a decent chunk of heap space (-Xmx)? Because
> grouping requires quite some heap space (also without
> group.ngroups=true).

Thanks, for answering. The Server has gotten as much memory as the
machine can afford (without swapping):
  -Xmx21g \
  -Xms4g \

Shall I open an issue as a subtask of SOLR-236 even though there is
already a performance related task (SOLR-2205)?

Cheers,
Michael


Re: Setting group.ngroups=true considerable slows down queries

2011-12-12 Thread Martijn v Groningen
I'd not make a subtask under SOLR-236 b/c it is related to a
completely different implementation which was never committed.
SOLR-2205 is related to general result grouping and I think it should be closed.
I'd make a new issue for improving the performance of
group.ngroups=true when there are a lot of unique groups.

Martijn

On 12 December 2011 14:32, Michael Jakl  wrote:
> Hi!
>
> On Mon, Dec 12, 2011 at 13:57, Martijn v Groningen
>  wrote:
>> As far as I know, there currently isn't another way. Unfortunately the
>> performance degrades badly when having a lot of unique groups.
>> I think an issue should be opened to investigate how we can improve this...
>>
>> Question: Does Solr have a decent chunk of heap space (-Xmx)? Because
>> grouping requires quite some heap space (also without
>> group.ngroups=true).
>
> Thanks, for answering. The Server has gotten as much memory as the
> machine can afford (without swapping):
>  -Xmx21g \
>  -Xms4g \
>
> Shall I open an issue as a subtask of SOLR-236 even though there is
> already a performance related task (SOLR-2205)?
>
> Cheers,
> Michael



-- 
Met vriendelijke groet,

Martijn van Groningen


manipulate the results coming back from SOLR? (was: possible to do arithmetic on returned values?)

2011-12-12 Thread Gabriel Cooper
I'm hoping I just got lost in the shuffle due to posting on a Friday 
night. Is there a way to change a field's data via some function, e.g. 
add, subtract, product, etc.?



On 12/9/11 4:17 PM, Gabriel Cooper wrote:

Is there a way to manipulate the results coming back from SOLR?

I have a SOLR 3.5 index that contains values in cents (e.g. "100" in the
index represents $1.00) and in certain contexts (e.g. CSV export) I'd
like to divide by 100 for that field to provide a user-friendly "in
dollars" number. To do this I played around with Function Queries for a
while without realizing they're limited to relevancy scores, and later
found "DocTransformers" in 4.0 whose description sounded right but don't
exist in 3.5.

Is there anything else I haven't considered?

Thanks for any help

Gabriel Cooper.




Re: cache monitoring tools?

2011-12-12 Thread Justin Caratzas
Dmitry,

The only added stress that munin puts on each box is the 1 request per
stat per 5 minutes to our admin stats handler.  Given that we get 25
requests per second, this doesn't make much of a difference.  We don't
have a sharded index (yet) as our index is only 2-3 GB, but we do have slave 
servers with replicated
indexes that handle the queries, while our master handles
updates/commits.

Justin

Dmitry Kan  writes:

> Justin, in terms of overhead, have you noticed whether Munin adds much of it
> when used in production? And regarding the Solr farm: how big is a shard's
> index (given that you have a sharded architecture)?
>
> Dmitry
>
> On Sun, Dec 11, 2011 at 6:39 PM, Justin Caratzas
> wrote:
>
>> At my work, we use Munin and Nagios for monitoring and alerts.  Munin is
>> great because writing a plugin for it is so simple, and with Solr's
>> statistics handler, we can track almost any solr stat we want.  It also
>> comes with included plugins for load, file system stats, processes,
>> etc.
>>
>> http://munin-monitoring.org/
>>
>> Justin
>>
>> Paul Libbrecht  writes:
>>
>> > Allow me to chime in and ask a generic question about monitoring tools
>> > for people close to developers: are any of the tools mentioned in this
>> > thread actually able to show graphs of loads, e.g. cache counts or CPU
>> > load, in parallel to a console log or to an http request log??
>> >
>> > I am working on such a tool currently but I have a bad feeling of
>> reinventing the wheel.
>> >
>> > thanks in advance
>> >
>> > Paul
>> >
>> >
>> >
>> > Le 8 déc. 2011 à 08:53, Dmitry Kan a écrit :
>> >
>> >> Otis, Tomás: thanks for the great links!
>> >>
>> >> 2011/12/7 Tomás Fernández Löbbe 
>> >>
>> >>> Hi Dimitry, I pointed to the wiki page to enable JMX, then you can use
>> any
>> >>> tool that visualizes JMX stuff like Zabbix. See
>> >>>
>> >>>
>> http://www.lucidimagination.com/blog/2011/10/02/monitoring-apache-solr-and-lucidworks-with-zabbix/
>> >>>
>> >>> On Wed, Dec 7, 2011 at 11:49 AM, Dmitry Kan 
>> wrote:
>> >>>
>>  The culprit seems to be the merger (frontend) SOLR. Talking to one
>> shard
>>  directly takes substantially less time (1-2 sec).
>> 
>>  On Wed, Dec 7, 2011 at 4:10 PM, Dmitry Kan 
>> wrote:
>> 
>> > Tomás: thanks. The page you gave didn't mention cache specifically,
>> is
>> > there more documentation on this specifically? I have used solrmeter
>>  tool,
>> > it draws the cache diagrams, is there a similar tool, but which would
>> >>> use
>> > jmx directly and present the cache usage in runtime?
>> >
>> > pravesh:
>> > I have increased the size of filterCache, but the search hasn't
>> become
>>  any
>> > faster, taking almost 9 sec on avg :(
>> >
>> > name: search
>> > class: org.apache.solr.handler.component.SearchHandler
>> > version: $Revision: 1052938 $
>> > description: Search using components:
>> >
>> 
>> >>>
>> org.apache.solr.handler.component.QueryComponent,org.apache.solr.handler.component.FacetComponent,org.apache.solr.handler.component.MoreLikeThisComponent,org.apache.solr.handler.component.HighlightComponent,org.apache.solr.handler.component.StatsComponent,org.apache.solr.handler.component.DebugComponent,
>> >
>> > stats: handlerStart : 1323255147351
>> > requests : 100
>> > errors : 3
>> > timeouts : 0
>> > totalTime : 885438
>> > avgTimePerRequest : 8854.38
>> > avgRequestsPerSecond : 0.008789442
>> >
>> > the stats (copying fieldValueCache as well here, to show term
>>  statistics):
>> >
>> > name: fieldValueCache
>> > class: org.apache.solr.search.FastLRUCache
>> > version: 1.0
>> > description: Concurrent LRU Cache(maxSize=10000, initialSize=10,
>> > minSize=9000, acceptableSize=9500, cleanupThread=false)
>> > stats: lookups : 79
>> > hits : 77
>> > hitratio : 0.97
>> > inserts : 1
>> > evictions : 0
>> > size : 1
>> > warmupTime : 0
>> > cumulative_lookups : 79
>> > cumulative_hits : 77
>> > cumulative_hitratio : 0.97
>> > cumulative_inserts : 1
>> > cumulative_evictions : 0
>> > item_shingleContent_trigram :
>> >
>> 
>> >>>
>> {field=shingleContent_trigram,memSize=326924381,tindexSize=4765394,time=215426,phase1=213868,nTerms=14827061,bigTerms=35,termInstances=114359167,uses=78}
>> > name: filterCache
>> > class: org.apache.solr.search.FastLRUCache
>> > version: 1.0
>> > description: Concurrent LRU Cache(maxSize=153600, initialSize=4096,
>> > minSize=138240, acceptableSize=145920, cleanupThread=false)
>> > stats: lookups : 1082854
>> > hits : 940370
>> > hitratio : 0.86
>> > inserts : 142486
>> > evictions : 0
>> > size : 142486
>> > warmupTime : 0
>> > cumulative_lookups : 1082854
>> > cumulative_hits : 940370
>> > cumulative_hitratio : 0.86
>> > cumulative_inserts : 142486
>> > cumulative_evictions : 0
>> >
>> >

Re: RegexQuery performance

2011-12-12 Thread Jay Luker
On Sat, Dec 10, 2011 at 9:25 PM, Erick Erickson  wrote:
> My off-the-top-of-my-head notion is you implement a
> Filter whose job is to emit some "special" tokens when
> you find strings like this that allow you to search without
> regexes. For instance, in the example you give, you could
> index something like...oh... I don't know, ###VER### as
> well as the "normal" text of "IRAS-A-FPA-3-RDR-IMPS-V6.0".
> Now, when searching for docs with the pattern you used
> as an example, you look for ###VER### instead. I guess
> it all depends on how many regexes you need to allow.
> This wouldn't work at all if you allow users to put in arbitrary
> regexes, but if you have a small enough number of patterns
> you'll allow, something like this could work.

This is a great suggestion. I think the number of users that need this
feature, as well as the variety of regexes that would be used, is small
enough that it could definitely work. It turns it into a problem of
collecting the necessary regexes, plus the UI details.

Thanks!
--jay


Re: limiting the content of content field in search results

2011-12-12 Thread Juan Grande
Hi,

It sounds like highlighting might be the solution for you. See
http://wiki.apache.org/solr/HighlightingParameters

*Juan*



On Mon, Dec 12, 2011 at 4:42 AM, ayyappan  wrote:

> I am developing n application which indexes whole pdfs and other documents
> to solr. I have completed a working version of my application. But there
> are
> some problems. The main one is that when I do a search the indexed whole
> document is shown. I have used solrj and need some help to reduce this
> content.
>
>How limiting the content of content field in search results and display
> over there .
>
> i need like this
>
>
>
> *Grammer1.docx*
> Blazing – burring Faceted Cluster – to gather Geospatial Replication
> –
> coping Distinguish – apart from Flawlessly – perfectly Recipe –method
> Concentrated inscription
> Last Modified : 2011-12-11T14:42:27Z
>
> *who.pdf*
> Who We Are Table of contents 1 Solr Committers (in alphabetical
> order)fgfgfgfg2 2 Inactive Committers (in alphabetical orde
>
> *version_control.pdf*
> Solr Version Control System Table of contents 1 Overview.gfgfgfg 2 Web Acce
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/limiting-the-content-of-content-field-in-search-results-tp3578859p3578859.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
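
A sketch of the highlighting request Juan is pointing at, assuming the field is
called "content" (the sizes are just examples); combined with fl, the full
stored document never has to be returned:

  ...&q=solr&fl=id,title,last_modified
     &hl=true&hl.fl=content&hl.snippets=1&hl.fragsize=150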


Solr Load Testing

2011-12-12 Thread Kissue Kissue
Hi,

I ran some JMeter load tests on my Solr instance, version 3.5.0, running on
tomcat 6.6.29, using 1000 concurrent users, and the error below is thrown
after a certain number of requests. My Solr configuration is basically the
default configuration at this time. Has anybody done something similar?
Should Solr be able to handle 1000 concurrent users with the default
configuration? Any ideas, let me know. Thanks.

12-Dec-2011 15:56:02 org.apache.solr.common.SolrException log
SEVERE: ClientAbortException:  java.io.IOException
at
org.apache.catalina.connector.OutputBuffer.doFlush(OutputBuffer.java:319)
at
org.apache.catalina.connector.OutputBuffer.flush(OutputBuffer.java:288)
at
org.apache.catalina.connector.CoyoteOutputStream.flush(CoyoteOutputStream.java:98)
at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:278)
at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:122)
at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:212)
at org.apache.solr.common.util.FastWriter.flush(FastWriter.java:115)
at
org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:344)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:265)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at
org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:861)
at
org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:579)
at
org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1584)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.IOException
at
org.apache.coyote.http11.InternalAprOutputBuffer.flushBuffer(InternalAprOutputBuffer.java:696)
at
org.apache.coyote.http11.InternalAprOutputBuffer.flush(InternalAprOutputBuffer.java:284)
at
org.apache.coyote.http11.Http11AprProcessor.action(Http11AprProcessor.java:1016)
at org.apache.coyote.Response.action(Response.java:183)
at
org.apache.catalina.connector.OutputBuffer.doFlush(OutputBuffer.java:314)
... 20 more


Re: Virtual Memory very high

2011-12-12 Thread Yury Kats
On 12/11/2011 4:57 AM, Rohit wrote:
> What are the difference in the different DirectoryFactory?

http://lucene.apache.org/java/3_3_0/api/core/org/apache/lucene/store/MMapDirectory.html
http://lucene.apache.org/java/3_3_0/api/core/org/apache/lucene/store/NIOFSDirectory.html
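
For reference, the directory implementation is chosen in solrconfig.xml; a
minimal sketch (MMapDirectory maps the index files into virtual address space,
which is what shows up as a very large virtual-memory figure, while an
NIO-based directory does not):

  <!-- solrconfig.xml -->
  <directoryFactory name="DirectoryFactory"
                    class="solr.MMapDirectoryFactory"/>
  <!-- alternative: solr.StandardDirectoryFactory; depending on the release,
       solr.NIOFSDirectoryFactory may also be available -->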


Re: MySQL data import

2011-12-12 Thread Brian Lamb
Hi all,

Any tips on this one?

Thanks,

Brian Lamb

On Sun, Dec 11, 2011 at 3:54 PM, Brian Lamb
wrote:

> Hi all,
>
> I have a few questions about how the MySQL data import works. It seems it
> creates a separate connection for each entity I create. Is there any way to
> avoid this?
>
> By nature of my schema, I have several multivalued fields. Each one I
> populate with a separate entity. Is there a better way to do it? For
> example, could I pull in all the singular data in one sitting and then come
> back in later and populate with the multivalued items.
>
> An alternate approach in some cases would be to do a GROUP_CONCAT and then
> populate the multivalued column with some transformation. Is that possible?
>
> Lastly, is it possible to use copyField to copy three regular fields into
> one multiValued field and have all the data show up?
>
> Thanks,
>
> Brian Lamb
>


URLDataSource delta import

2011-12-12 Thread Brian Lamb
Hi all,

According to
http://wiki.apache.org/solr/DataImportHandler#Usage_with_XML.2BAC8-HTTP_Datasource
a delta-import is not "currently" implemented for URLDataSource. I say
"currently" because I've noticed that such documentation is out of date in
many places. I wanted to see if this feature had been added yet or if there
were plans to do so.

Thanks,

Brian Lamb


Possible to configure the fq caching settings on the server?

2011-12-12 Thread Andrew Lundgren
Is it possible to configure Solr such that filter queries are treated as 
fq={!cache=false} by default?

--
Andrew Lundgren
lundg...@familysearch.org


 NOTICE: This email message is for the sole use of the intended recipient(s) 
and may contain confidential and privileged information. Any unauthorized 
review, use, disclosure or distribution is prohibited. If you are not the 
intended recipient, please contact the sender by reply email and destroy all 
copies of the original message.




Re: MySQL data import

2011-12-12 Thread Gora Mohanty
On Mon, Dec 12, 2011 at 2:24 AM, Brian Lamb
 wrote:
> Hi all,
>
> I have a few questions about how the MySQL data import works. It seems it
> creates a separate connection for each entity I create. Is there any way to
> avoid this?

Not sure, but I do not think that it is possible. However, from your description
below, I think that you are unnecessarily multiplying entities.

> By nature of my schema, I have several multivalued fields. Each one I
> populate with a separate entity. Is there a better way to do it? For
> example, could I pull in all the singular data in one sitting and then come
> back in later and populate with the multivalued items.

Not quite sure as to what you mean. Would it be possible for you
to post your schema.xml, and the DIH configuration file? Preferably,
put these on pastebin.com, and send us links. Also, you should
obfuscate details like access passwords.

> An alternate approach in some cases would be to do a GROUP_CONCAT and then
> populate the multivalued column with some transformation. Is that possible?
[...]

This is how we have been handling it. A complete description would
be long, but here is the gist of it:
* A transformer will be needed. In this case, we found it easiest
  to use a Java-based transformer. Thus, your entity should include
  something like
  <entity ... transformer="com.mycompany.search.solr.handler.JobsNumericTransformer...>
  Here, the class name to be used for the transformer attribute follows
  the usual Java rules, and the .jar needs to be made available to Solr.
* The SELECT statement for the entity looks something like
  select group_concat( myfield SEPARATOR '@||@')...
  The separator should be something that does not occur in your
  normal data stream.
* Within the entity, define a <field/> for "myfield".
* There are complications involved if NULL values are allowed
   for the field, in which case you would need to use COALESCE,
   maybe along with CAST
* The transformer would look up "myfield", split along the separator,
   and populate the multi-valued field.

This *is* a little complicated, so I would also like to hear about
possible alternatives.

Regards,
Gora
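
A rough sketch of the entity Gora describes, with hypothetical table, column
and class names (the transformer class is the one from his example and would
have to be written and put on Solr's classpath):

  <!-- DIH data-config.xml -->
  <entity name="item"
          transformer="com.mycompany.search.solr.handler.JobsNumericTransformer"
          query="SELECT i.id,
                        GROUP_CONCAT(t.tag SEPARATOR '@||@') AS myfield
                 FROM items i LEFT JOIN tags t ON t.item_id = i.id
                 GROUP BY i.id">
    <field column="id" name="id"/>
    <!-- the transformer splits 'myfield' on '@||@' and emits one value
         per entry for the multiValued field -->
    <field column="myfield" name="tags"/>
  </entity>

For what it's worth, DIH's stock RegexTransformer with a splitBy attribute on
the <field> can do the same split without writing any Java.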


Re: Trim and copy a solr field

2011-12-12 Thread Juan Grande
Hi Swapna,

You could try using a copyField to a field that uses
PatternReplaceFilterFactory:
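
(The schema XML was stripped by the list archive; a sketch of what Juan's
suggestion might look like, where the field names and the regular expression
are guesses:)

  <!-- schema.xml -->
  <copyField source="path" dest="parent_path"/>

  <field name="parent_path" type="parent_path_type" indexed="true" stored="true"/>

  <fieldType name="parent_path_type" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <!-- drop everything from the last "/" onward, leaving only the folder part -->
      <filter class="solr.PatternReplaceFilterFactory"
              pattern="/[^/]*$" replacement="" replace="first"/>
    </analyzer>
  </fieldType>
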
The regular expression may not be exactly what you want, but it will give
you an idea of how to do it. I'm pretty sure there must be some other ways
of doing this, but this is the first that comes to my mind.

*Juan*



On Mon, Dec 12, 2011 at 4:46 AM, Swapna Vuppala wrote:

> Hi,
>
> I have a Solr field that contains the absolute path of the file that is
> indexed, which will be something like
> file:/myserver/Folder1/SubFol1/Sub-Fol2/Test.msg.
>
> Am interested in indexing the location in a separate field.  I was looking
> for some way to trim the field value from last occurrence of char "/", so
> that I can get the location value, something like
> file:/myserver/Folder1/SubFol1/Sub-Fol2,
> and store it in a new field. Can you please suggest some way to achieve
> this ?
>
> Thanks and Regards,
> Swapna.
> 
> Electronic mail messages entering and leaving Arup  business
> systems are scanned for acceptability of content and viruses
>


Re: MySQL data import

2011-12-12 Thread Erick Erickson
You might want to consider just doing the whole
thing in SolrJ with a JDBC connection. When things
get complex, it's sometimes more straightforward.

Best
Erick...

P.S. Yes, it's pretty standard to have a single
field be the destination for several copyField
directives.

On Mon, Dec 12, 2011 at 12:48 PM, Gora Mohanty  wrote:
> On Mon, Dec 12, 2011 at 2:24 AM, Brian Lamb
>  wrote:
>> Hi all,
>>
>> I have a few questions about how the MySQL data import works. It seems it
>> creates a separate connection for each entity I create. Is there any way to
>> avoid this?
>
> Not sure, but I do not think that it is possible. However, from your 
> description
> below, I think that you are unnecessarily multiplying entities.
>
>> By nature of my schema, I have several multivalued fields. Each one I
>> populate with a separate entity. Is there a better way to do it? For
>> example, could I pull in all the singular data in one sitting and then come
>> back in later and populate with the multivalued items.
>
> Not quite sure as to what you mean. Would it be possible for you
> to post your schema.xml, and the DIH configuration file? Preferably,
> put these on pastebin.com, and send us links. Also, you should
> obfuscate details like access passwords.
>
>> An alternate approach in some cases would be to do a GROUP_CONCAT and then
>> populate the multivalued column with some transformation. Is that possible?
> [...]
>
> This is how we have been handling it. A complete description would
> be long, but here is the gist of it:
> * A transformer will be needed. In this case, we found it easiest
>  to use a Java-based transformer. Thus, your entity should include
>  something like
>   transformer="com.mycompany.search.solr.handler.JobsNumericTransformer...>
>  ...
>  
>  Here, the class name to be used for the transformer attribute follows
>  the usual Java rules, and the .jar needs to be made available to Solr.
> * The SELECT statement for the entity looks something like
>  select group_concat( myfield SEPARATOR '@||@')...
>  The separator should be something that does not occur in your
>  normal data stream.
> * Within the entity, define
>   
> * There are complications involved if NULL values are allowed
>   for the field, in which case you would need to use COALESCE,
>   maybe along with CAST
> * The transformer would look up "myfield", split along the separator,
>   and populate the multi-valued field.
>
> This *is* a little complicated, so I would also like to hear about
> possible alternatives.
>
> Regards,
> Gora


Re: Solr Load Testing

2011-12-12 Thread Otis Gospodnetic
Hi,

1000 *concurrent* *queries* is a lot.  If your index is small relative to hw 
specs, sure.  If not, then tuning may be needed, including maybe Tomcat and JVM 
level tuning.  The error below is from Tomcat, not really tied to Solr...

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/


- Original Message -
> From: Kissue Kissue 
> To: solr-user@lucene.apache.org
> Cc: 
> Sent: Monday, December 12, 2011 11:43 AM
> Subject: Solr Load Testing
> 
> Hi,
> 
> I ran some jmeter load testing on my solr instance version 3.5.0 running on
> tomcat 6.6.29 using 1000 concurrent users and the error below is thrown
> after a certain number of requests. My solr configuration is basically the
> default configuration at this time. Has anybody done something similar?
> Should solr be able to handle 1000 concurrent users based on the default
> configuration? Any ideas let me know. Thanks.
> 
> 12-Dec-2011 15:56:02 org.apache.solr.common.SolrException log
> SEVERE: ClientAbortException:  java.io.IOException
>         at
> org.apache.catalina.connector.OutputBuffer.doFlush(OutputBuffer.java:319)
>         at
> org.apache.catalina.connector.OutputBuffer.flush(OutputBuffer.java:288)
>         at
> org.apache.catalina.connector.CoyoteOutputStream.flush(CoyoteOutputStream.java:98)
>         at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:278)
>         at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:122)
>         at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:212)
>         at org.apache.solr.common.util.FastWriter.flush(FastWriter.java:115)
>         at
> org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:344)
>         at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:265)
>         at
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>         at
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>         at
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>         at
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>         at
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
>         at
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>         at
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>         at
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
>         at
> org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:861)
>         at
> org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:579)
>         at
> org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1584)
>         at java.lang.Thread.run(Thread.java:619)
> Caused by: java.io.IOException
>         at
> org.apache.coyote.http11.InternalAprOutputBuffer.flushBuffer(InternalAprOutputBuffer.java:696)
>         at
> org.apache.coyote.http11.InternalAprOutputBuffer.flush(InternalAprOutputBuffer.java:284)
>         at
> org.apache.coyote.http11.Http11AprProcessor.action(Http11AprProcessor.java:1016)
>         at org.apache.coyote.Response.action(Response.java:183)
>         at
> org.apache.catalina.connector.OutputBuffer.doFlush(OutputBuffer.java:314)
>         ... 20 more
>


Re: performance of json vs xml?

2011-12-12 Thread Mark Miller
On Sun, Dec 11, 2011 at 3:16 PM, Jason Toy  wrote:

> I'm thinking about modifying my index process to use json because all my
> docs are originally in json anyway . Are there any performance issues if I
> insert json docs instead of xml docs?  A colleague recommended to me to
> stay with xml because solr is highly optimized for xml.
>


I'd make a big bet the JSON parsing is faster than the xml parsing.

And you have the cost of converting your docs to XML...

If you are too worried, do some testing. I'd simply use JSON. The JSON
support should be considered first class - it just came after the XML
support.

-- 
- Mark

http://www.lucidimagination.com


Re: SmartChineseAnalyzer

2011-12-12 Thread Chris Hostetter

: Subject: SmartChineseAnalyzer
: References:
: In-Reply-To:

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is "hidden" in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.



-Hoss


Facet on same date field multiple times

2011-12-12 Thread dbashford
I've Googled around a bit and seen this referenced a few times, but cannot
seem to get it to work.

I have a query that looks like this:

facet=true
&facet.date={!key=foo}date
&f.foo.facet.date.start=2010-12-12T00:00:00Z
&f.foo.facet.date.end=2011-12-12T00:00:00Z
&f.foo.facet.date.gap=%2B1DAY

Eventually the goal is to do different ranges on the same field.  Month by
day.  Day by hour.  Year by week.  Something to that effect.  But I thought
I'd start simple to see if I could get the syntax right and what I have
above doesn't seem to work.

I get:
message Missing required parameter: f.date.facet.date.start (or default:
facet.date.start)
description The request sent by the client was syntactically incorrect
(Missing required parameter: f.date.facet.date.start (or default:
facet.date.start)).

So it doesn't seem interested in me using the local key.  From reading here: 
http://lucene.472066.n3.nabble.com/Date-Faceting-on-Solr-3-1-td3302499.html#a3309517
it would seem i should be able to do it (see the note at the bottom).

I know one option is to copyField the date into a few other spots, and I can
use that as a last resort, but if this works and I'm just arsing something
up...

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Facet-on-same-date-field-multiple-times-tp3580449p3580449.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Facet on same date field multiple times

2011-12-12 Thread Chris Hostetter

: Eventually the goal is to do different ranges on the same field.  Month by
: day.  Day by hour.  Year by week.  Something to that effect.  But I thought
: I'd start simple to see if I could get the syntax right and what I have
: above doesn't seem to work.
...
: So it doesn't seem interested in me using the local key.  From reading here: 
: 
http://lucene.472066.n3.nabble.com/Date-Faceting-on-Solr-3-1-td3302499.html#a3309517
: it would seem i should be able to do it (see the note at the bottom).

That was me, and i was wrong in that post ... what worked was changing the 
output key, but using that key to specify the various date (ie: range) 
based params has never worked, and i didn't realize that at the time.

The work to try and fix this is currently being tracked in this Jira
issue, i recently spelled out what i think would be needed to finish it
up, but i don't think anyone is actively working on it (if you want to
jump in, patches would certainly be welcome)...

https://issues.apache.org/jira/browse/SOLR-1351

-Hoss
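
Until SOLR-1351 is finished, the copyField workaround mentioned in the question
would look roughly like this (the field names are made up, and each copy needs
to be declared as its own date field):

  <!-- schema.xml -->
  <copyField source="date" dest="date_byday"/>
  <copyField source="date" dest="date_byhour"/>

  &facet=true
  &facet.date=date_byday
  &f.date_byday.facet.date.start=2010-12-12T00:00:00Z
  &f.date_byday.facet.date.end=2011-12-12T00:00:00Z
  &f.date_byday.facet.date.gap=%2B1DAY
  &facet.date=date_byhour
  &f.date_byhour.facet.date.start=2011-12-11T00:00:00Z
  &f.date_byhour.facet.date.end=2011-12-12T00:00:00Z
  &f.date_byhour.facet.date.gap=%2B1HOUR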


Re: MySQL data import

2011-12-12 Thread Brian Lamb
Thanks all. Erick, is there documentation on doing things with SolrJ and a
JDBC connection?

On Mon, Dec 12, 2011 at 1:34 PM, Erick Erickson wrote:

> You might want to consider just doing the whole
> thing in SolrJ with a JDBC connection. When things
> get complex, it's sometimes more straightforward.
>
> Best
> Erick...
>
> P.S. Yes, it's pretty standard to have a single
> field be the destination for several copyField
> directives.
>
> On Mon, Dec 12, 2011 at 12:48 PM, Gora Mohanty  wrote:
> > On Mon, Dec 12, 2011 at 2:24 AM, Brian Lamb
> >  wrote:
> >> Hi all,
> >>
> >> I have a few questions about how the MySQL data import works. It seems
> it
> >> creates a separate connection for each entity I create. Is there any
> way to
> >> avoid this?
> >
> > Not sure, but I do not think that it is possible. However, from your
> description
> > below, I think that you are unnecessarily multiplying entities.
> >
> >> By nature of my schema, I have several multivalued fields. Each one I
> >> populate with a separate entity. Is there a better way to do it? For
> >> example, could I pull in all the singular data in one sitting and then
> come
> >> back in later and populate with the multivalued items.
> >
> > Not quite sure as to what you mean. Would it be possible for you
> > to post your schema.xml, and the DIH configuration file? Preferably,
> > put these on pastebin.com, and send us links. Also, you should
> > obfuscate details like access passwords.
> >
> >> An alternate approach in some cases would be to do a GROUP_CONCAT and
> then
> >> populate the multivalued column with some transformation. Is that
> possible?
> > [...]
> >
> > This is how we have been handling it. A complete description would
> > be long, but here is the gist of it:
> > * A transformer will be needed. In this case, we found it easiest
> >  to use a Java-based transformer. Thus, your entity should include
> >  something like
> >   > transformer="com.mycompany.search.solr.handler.JobsNumericTransformer...>
> >  ...
> >  
> >  Here, the class name to be used for the transformer attribute follows
> >  the usual Java rules, and the .jar needs to be made available to Solr.
> > * The SELECT statement for the entity looks something like
> >  select group_concat( myfield SEPARATOR '@||@')...
> >  The separator should be something that does not occur in your
> >  normal data stream.
> > * Within the entity, define
> >   
> > * There are complications involved if NULL values are allowed
> >   for the field, in which case you would need to use COALESCE,
> >   maybe along with CAST
> > * The transformer would look up "myfield", split along the separator,
> >   and populate the multi-valued field.
> >
> > This *is* a little complicated, so I would also like to hear about
> > possible alternatives.
> >
> > Regards,
> > Gora
>
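For reference, a minimal sketch of the kind of Java transformer Gora describes
above; the class, column, and separator names are illustrative (DIH locates the
class via the entity's transformer attribute and calls its transformRow method):

  package com.mycompany.search.solr.handler;

  import java.util.Arrays;
  import java.util.Map;

  // Splits the GROUP_CONCAT'ed column back into individual values so DIH can
  // populate a multi-valued field (DIH treats a List value as multiple values).
  public class GroupConcatTransformer {
    public Object transformRow(Map<String, Object> row) {
      Object joined = row.get("myfield");
      if (joined != null) {
        // '@||@' is the SEPARATOR used in the entity's group_concat() SELECT
        row.put("myfield", Arrays.asList(joined.toString().split("@\\|\\|@")));
      }
      return row;
    }
  }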


Re: Possible to configure the fq caching settings on the server?

2011-12-12 Thread Chris Hostetter

: Is it possible to configure solr such that the filter query cache 
: settings is set to fq={!cache=false} by default?

well, you could always disable the filterCache -- but I get the impression 
you want *most* "fq" filters to not be cached, but sometimes you'll 
specify some that you *do* want cached? is that it?

I don't know of any way to do that (or even any way to change Solr easily 
to make that possible) for *only* the "fq" params.

I was going to suggest that something like this should work as a way to 
disable caching of all queries unless you explicitly re-enable it...

  ?cache=false&q={!cache=true}foo&fq=bar&fq={!cache=true}yak

...in which case you could change up your "q" param so it would default to 
being cached (and move that "cache=false" to a default in your solrconfig 
if you desired)...

  ?cache=false&q={!cache=true v=$qq}&qq=foo&fq=bar&fq={!cache=true}yak

...but evidently that doesn't work.  Apparently "cache" is only consulted 
as a local param, and doesn't default up to the other request (or 
configured default) SolrParams.

I'm not sure if that was intentional or an oversight -- but if you'd like 
to open a Jira requesting that it work someone could probably look into 
it (patches welcome!)


-Hoss


Re: sub query parsing bug???

2011-12-12 Thread Steve Fuchs
Thanks for the reply!

I do believe I have set (or have tried setting) all of those options for the 
default query and none of them seem to help. Any time an OR appears inside the 
query, the default operator for that query becomes OR; at least that's the 
anecdotal evidence I've encountered.
Also, in this case the results do match what the parser is telling me, so I'm 
not getting the results I expect.

As for the second suggestion, the actual fields searched are controlled by the 
user, so it can get more complicated. But even in the single field search I do 
believe I need to use the edismax parser. I have tried the regular query syntax 
for searching one field and find that it can't handle the more complex queries.

Something like 
ref_expertise:(nonlinear OR soliton) AND "optical lattice"

won't return any documents even though there are many that satisfy those 
requirements. Is there some other way I could be executing this query even in 
the single field case?

Thanks and Thanks in Advance for all help

Steve





On Dec 6, 2011, at 8:26 AM, Erick Erickson wrote:

> Hmmm, does this help?
> 
> In Solr 1.4 and prior, you should basically set mm=0 if you want the
> equivalent of q.op=OR, and mm=100% if you want the equivalent of
> q.op=AND. In 3.x and trunk the default value of mm is dictated by the
> q.op param (q.op=AND => mm=100%; q.op=OR => mm=0%). Keep in mind the
> default operator is affected by your schema.xml <solrQueryParser defaultOperator="xxx"/> entry. In older versions of Solr the default
> value is 100% (all clauses must match)
> (from http://wiki.apache.org/solr/DisMaxQParserPlugin).
> 
> I don't think you'll see the query parsed as you expect, but the
> results of the query
> should be what you expect. Tricky, eh?
> 
> I'm assuming you've simplified the example for clarity and your qf
> will be on more than one field when you use it "for real", but if not
> the actual query doesn't need edismax at all.
> 
> Best
> Erick
> 
> On Mon, Dec 5, 2011 at 10:52 AM, Steve Fuchs  wrote:
>> Hello All,
>> 
>> I have my field description listed below, but I don't think its pertinent. 
>> As my issue seems to be with the query parser.
>> 
>> I'm currently using an edismax subquery clause to help with my searching as 
>> such:
>> 
>> _query_:"{!type=edismax qf='ref_expertise'}\(nonlinear OR soliton\) AND 
>> \"optical lattice\""
>> 
>> translates correctly to
>> 
>> +(+((ref_expertise:nonlinear) (ref_expertise:soliton)) 
>> +(ref_expertise:"optical lattice"))
>> 
>> 
>> but the users expect the default operator to be AND (it is in all simpler 
>> searches), however nothing I can do here gets me that same result as above 
>> when the search is:
>> 
>> _query_:"{!type=edismax qf='ref_expertise'}\(nonlinear OR soliton\) 
>> \"optical lattice\""
>> 
>> this gets converted to:
>> 
>> +(((ref_expertise:nonlinear) (ref_expertise:soliton)) 
>> (ref_expertise:"optical lattice"))
>> 
>> where the "optical lattice" is optional.
>> 
>> These produce the same results, trying q.op and mm. Also the default search 
>> term as  set in the solr.config is AND.
>> 
>> _query_:"{!type=edismax q.op=AND qf='ref_expertise'}\(nonlinear OR 
>> soliton\)\"optical lattice\""
>> _query_:"{!type=edismax mm=1.0 qf='ref_expertise'}\(nonlinear OR 
>> soliton\)\"optical lattice\""
>> 
>> 
>> 
>> 
>> Any ideas???
>> 
>> Thanks In Advance
>> 
>> Steven Fuchs
>> 
>> 
>> 
>> 
>> 
>> 
>> [fieldType definition stripped by the mail archive; it included an
>> EdgeNGram filter with maxGramSize="25"]
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 



Re: NRT or similar for Solr 3.5?

2011-12-12 Thread Steven Ou
Yeah, I'm running Chrome on OS X and it doesn't do anything.

Just switched to Firefox and it works. *But* I also don't seem to be
receiving the confirmation email.
--
Steven Ou | 歐偉凡

*ravn.com* | Chief Technology Officer
steve...@gmail.com | +1 909-569-9880


2011/12/12 vikram kamath 

> The Onclick handler does not seem to be called on google chrome (Ubuntu ).
>
> Also , I dont seem to receive the email with the confirmation link on
> registering (I have checked my spam)
>
>
>
>
> Regards
> Vikram Kamath
>
>
>
> 2011/12/12 Nagendra Nagarajayya 
>
> > Steven:
> >
> > There is an onclick handler that allows you to download the src. BTW, an
> > early access Solr 3.5 with RankingAlgorithm 1.3 (NRT) release is
> > available for download. So please give it a try.
> >
> > Regards,
> >
> > - Nagendra Nagarajayya
> > http://solr-ra.tgels.org
> > http://rankingalgorithm.tgels.org
> >
> >
> > On 12/10/2011 11:18 PM, Steven Ou wrote:
> > > All the links on the download section link to
> http://solr-ra.tgels.org/#
> > > --
> > > Steven Ou | 歐偉凡
> > >
> > > *ravn.com* | Chief Technology Officer
> > > steve...@gmail.com | +1 909-569-9880
> > >
> > >
> > > 2011/12/11 Nagendra Nagarajayya 
> > >
> > >> Steven:
> > >>
> > >> Not sure why you had problems, #downloads (
> > >> http://solr-ra.tgels.org/#downloads ) should point you to the
> downloads
> > >> section showing the different versions available for download ? Please
> > >> share if this is not so ( there were downloads yesterday with no
> > problems )
> > >>
> > >> Regarding NRT, you can switch between RA and Lucene at query level or
> at
> > >> config level; in the current version with RA, NRT is in effect while
> > >> with lucene, it is not, you can get more information from here:
> > >> http://solr-ra.tgels.org/papers/Solr34_with_RankingAlgorithm13.pdf
> > >>
> > >> Solr 3.5 with RankingAlgorithm 1.3 should be available next week.
> > >>
> > >> Regards,
> > >>
> > >> - Nagendra Nagarajayya
> > >> http://solr-ra.tgels.org
> > >> http://rankingalgorithm.tgels.org
> > >>
> > >> On 12/9/2011 4:49 PM, Steven Ou wrote:
> > >>> Hey Nagendra,
> > >>>
> > >>> I took a look and Solr-RA looks promising - but:
> > >>>
> > >>>- I could not figure out how to download it. It seems like all the
> > >>>download links just point to "#"
> > >>>- I wasn't looking for another ranking algorithm, so would it be
> > >>>possible for me to use NRT but *not* RA (i.e. just use the normal
> > >> Lucene
> > >>>library)?
> > >>>
> > >>> --
> > >>> Steven Ou | 歐偉凡
> > >>>
> > >>> *ravn.com* | Chief Technology Officer
> > >>> steve...@gmail.com | +1 909-569-9880
> > >>>
> > >>>
> > >>> On Sat, Dec 10, 2011 at 5:13 AM, Nagendra Nagarajayya <
> > >>> nnagaraja...@transaxtions.com> wrote:
> > >>>
> >  Steven:
> > 
> >  Please take a look at Solr  with RankingAlgorithm. It offers NRT
> >  functionality. You can set your autoCommit to about 15 mins. You can
> > get
> >  more information from here:
> >  http://solr-ra.tgels.com/wiki/**en/Near_Real_Time_Search_ver_**3.x<
> > >> http://solr-ra.tgels.com/wiki/en/Near_Real_Time_Search_ver_3.x>
> > 
> >  Regards,
> > 
> >  - Nagendra Nagarajayya
> >  http://solr-ra.tgels.org
> >  http://rankingalgorithm.tgels.**org <
> > http://rankingalgorithm.tgels.org>
> > 
> > 
> >  On 12/8/2011 9:30 PM, Steven Ou wrote:
> > 
> > > Hi guys,
> > >
> > > I'm looking for NRT functionality or similar in Solr 3.5. Is that
> > > possible?
> > >> From what I understand there's NRT in Solr 4, but I can't figure
> out
> > > whether or not 3.5 can do it as well?
> > >
> > > If not, is it feasible to use an autoCommit every 1000ms? We don't
> > > currently process *that* much data so I wonder if it's OK to just
> > >> commit
> > > very often? Obviously not scalable on a large scale, but it is
> > feasible
> > > for
> > > a relatively small amount of data?
> > >
> > > I recently upgraded from Solr 1.4 to 3.5. I had a hard time getting
> > > everything working smoothly and the process ended up taking my site
> > >> down
> > > for a couple hours. I am very hesitant to upgrade to Solr 4 if it's
> > not
> > > necessary to get some sort of NRT functionality.
> > >
> > > Can anyone help me? Thanks!
> > > --
> > > Steven Ou | 歐偉凡
> > >
> > > *ravn.com* | Chief Technology Officer
> > > steve...@gmail.com | +1 909-569-9880
> > >
> > >
> > >>
> >
> >
>


Removing whitespace

2011-12-12 Thread Devon Baumgarten
Hello,

I am having trouble finding how to remove/ignore whitespace when indexing. The 
only answer I have found suggested that it is necessary to write my own 
tokenizer. Is this true? I want to remove whitespace and special characters 
from the phrase and create N-grams from the result.

Ultimately, the effect I am after is that searching "bobdole" would match "Bob 
Dole", "Bo B. Dole", and maybe "Bobdo". Maybe there is a better way... can 
anyone lend some assistance?

Thanks!

Dev B



Re: Removing whitespace

2011-12-12 Thread Alireza Salimi
That sounds like a strange requirement, but I think you can use CharFilters
instead of implementing your own Tokenizer.
Take a look at this section, maybe it helps.
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#CharFilterFactories




On Mon, Dec 12, 2011 at 4:51 PM, Devon Baumgarten <
dbaumgar...@nationalcorp.com> wrote:

> Hello,
>
> I am having trouble finding how to remove/ignore whitespace when indexing.
> The only answer I have found suggested that it is necessary to write my own
> tokenizer. Is this true? I want to remove whitespace and special characters
> from the phrase and create N-grams from the result.
>
> Ultimately, the effect I am after is that searching "bobdole" would match
> "Bob Dole", "Bo B. Dole", and maybe "Bobdo". Maybe there is a better way...
> can anyone lend some assistance?
>
> Thanks!
>
> Dev B
>
>


-- 
Alireza Salimi
Java EE Developer


RE: Removing whitespace

2011-12-12 Thread Steven A Rowe
Hi Devon,

Something like this should work for you (untested!):
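(The fieldType XML in the original message was stripped by the mail archive.
Purely as a reconstruction sketch of the kind of chain being described, with
illustrative names and gram sizes, not necessarily what was actually posted:)

   <fieldType name="squashed_ngram" class="solr.TextField">
     <analyzer>
       <!-- strip whitespace and other non-alphanumerics before tokenizing -->
       <charFilter class="solr.PatternReplaceCharFilterFactory"
                   pattern="[^a-zA-Z0-9]+" replacement=""/>
       <!-- keep the whole (now squashed) value as a single token -->
       <tokenizer class="solr.KeywordTokenizerFactory"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <!-- n-grams so partial strings like "bobdo" can match "bobdole" -->
       <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="15"/>
     </analyzer>
   </fieldType>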


Steve

> -Original Message-
> From: Devon Baumgarten [mailto:dbaumgar...@nationalcorp.com]
> Sent: Monday, December 12, 2011 4:52 PM
> To: 'solr-user@lucene.apache.org'
> Subject: Removing whitespace
> 
> Hello,
> 
> I am having trouble finding how to remove/ignore whitespace when indexing.
> The only answer I have found suggested that it is necessary to write my
> own tokenizer. Is this true? I want to remove whitespace and special
> characters from the phrase and create N-grams from the result.
> 
> Ultimately, the effect I am after is that searching "bobdole" would match
> "Bob Dole", "Bo B. Dole", and maybe "Bobdo". Maybe there is a better
> way... can anyone lend some assistance?
> 
> Thanks!
> 
> Dev B



Re: Removing whitespace

2011-12-12 Thread Koji Sekiguchi

(11/12/13 6:51), Devon Baumgarten wrote:

Hello,

I am having trouble finding how to remove/ignore whitespace when indexing. The 
only answer I have found suggested that it is necessary to write my own 
tokenizer. Is this true? I want to remove whitespace and special characters 
from the phrase and create N-grams from the result.


How about using one of the existing charfilters?

https://builds.apache.org/job/Solr-3.x/javadoc/org/apache/solr/analysis/PatternReplaceCharFilterFactory.html

https://builds.apache.org/job/Solr-3.x/javadoc/org/apache/solr/analysis/MappingCharFilterFactory.html

koji
--
Check out "Query Log Visualizer" for Apache Solr
http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html
http://www.rondhuit.com/en/


RE: Removing whitespace

2011-12-12 Thread Devon Baumgarten
Thanks Alireza, Steven and Koji for the quick responses!

I'll read up on those and give it a shot.

Devon Baumgarten

-Original Message-
From: Alireza Salimi [mailto:alireza.sal...@gmail.com] 
Sent: Monday, December 12, 2011 4:08 PM
To: solr-user@lucene.apache.org
Subject: Re: Removing whitespace

That sounds strange requirement, but I think you can use CharFilters
instead of implementing your own Tokenizer.
Take a look at this section, maybe it helps.
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#CharFilterFactories




On Mon, Dec 12, 2011 at 4:51 PM, Devon Baumgarten <
dbaumgar...@nationalcorp.com> wrote:

> Hello,
>
> I am having trouble finding how to remove/ignore whitespace when indexing.
> The only answer I have found suggested that it is necessary to write my own
> tokenizer. Is this true? I want to remove whitespace and special characters
> from the phrase and create N-grams from the result.
>
> Ultimately, the effect I am after is that searching "bobdole" would match
> "Bob Dole", "Bo B. Dole", and maybe "Bobdo". Maybe there is a better way...
> can anyone lend some assistance?
>
> Thanks!
>
> Dev B
>
>


-- 
Alireza Salimi
Java EE Developer




Re: MySQL data import

2011-12-12 Thread Erick Erickson
Here's a quick demo I wrote at one point. I haven't run it in a while,
but you should be able to get the idea.


package jdbc;


import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
import org.apache.solr.client.solrj.impl.XMLResponseParser;
import org.apache.solr.common.SolrInputDocument;

import java.io.IOException;
import java.sql.*;
import java.util.ArrayList;
import java.util.Collection;


public class Indexer {
  public static void main(String[] args) {
startIndex("http://localhost:8983/solr";);
  }

  private static void startIndex(String url) {
Connection con = DataSource.getConnection();
try {

  long start = System.currentTimeMillis();
      // Create a multi-threaded communications channel to the Solr server.
      // Full interface (3.3) at:
      // http://lucene.apache.org/solr/api/org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer.html
      StreamingUpdateSolrServer server = new StreamingUpdateSolrServer(url, 10, 4);

      // You may want to set these timeouts higher; Solr occasionally
      // will have long pauses while segments merge.
  server.setSoTimeout(1000);  // socket read timeout
  server.setConnectionTimeout(100);
  //server.setDefaultMaxConnectionsPerHost(100);
  //server.setMaxTotalConnections(100);
  //server.setFollowRedirects(false);  // defaults to false
  // allowCompression defaults to false.
  // Server side must support gzip or deflate for this to have any effect.
  //server.setAllowCompression(true);
  server.setMaxRetries(1); // defaults to 0.  > 1 not recommended.
      server.setParser(new XMLResponseParser()); // binary parser is used by default

  doDocuments(server, con);
      server.commit(); // Only needs to be done at the end; autocommit
      // or commitWithin should do the rest.
  long endTime = System.currentTimeMillis();
  System.out.println("Total Time Taken->" + (endTime - start) + " mils");

} catch (Exception e) {
  e.printStackTrace();
  String msg = e.getMessage();
  System.out.println(msg);
}
  }

  private static void doDocuments(StreamingUpdateSolrServer server,
Connection con) throws SQLException, IOException, SolrServerException
{

Statement st = con.createStatement();
ResultSet rs = st.executeQuery("select id,title,text from test");

    // SolrInputDocument interface (3.3) at
    // http://lucene.apache.org/solr/api/org/apache/solr/common/SolrInputDocument.html
    Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
int total = 0;
int counter = 0;

while (rs.next()) {
      SolrInputDocument doc = new SolrInputDocument(); // DO NOT move this
      // outside the while loop, or be sure to call doc.clear()

  String id = rs.getString("id");
  String title = rs.getString("title");
  String text = rs.getString("text");

  doc.addField("id", id);
  doc.addField("title", title);
  doc.addField("text", text);

  docs.add(doc);
  ++counter;
  ++total;
      if (counter > 1000) { // Completely arbitrary; just batch up
        // more than one document at a time for throughput!
server.add(docs);
docs.clear();
counter = 0;
  }
    }
    if (!docs.isEmpty()) {
      server.add(docs); // flush the final partial batch; the loop only sends full batches
    }
    System.out.println("Total " + total + " Docs added successfully");

  }
}

// Trivial class showing connecting to a MySql database server via jdbc...
class DataSource {
  public static Connection getConnection() {
Connection conn = null;
try {

  Class.forName("com.mysql.jdbc.Driver").newInstance();
  System.out.println("Driver Loaded..");
  conn = DriverManager.getConnection("jdbc:mysql://172.16.0.169:3306/test?"
+ "user=testuser&password=test123");
  System.out.println("Connection build..");
} catch (Exception ex) {
  System.out.println(ex);
}
return conn;
  }

  public static void closeConnection(Connection con) {
try {
  if (con != null)
con.close();
} catch (SQLException e) {
  e.printStackTrace();
}
  }
}

On Mon, Dec 12, 2011 at 2:57 PM, Brian Lamb
 wrote:
> Thanks all. Erick, is there documentation on doing things with SolrJ and a
> JDBC connection?
>
> On Mon, Dec 12, 2011 at 1:34 PM, Erick Erickson 
> wrote:
>
>> You might want to consider just doing the whole
>> thing in SolrJ with a JDBC connection. When things
>> get complex, it's sometimes more straightforward.
>>
>> Best
>> Erick...
>>
>> P.S. Yes, it's pretty standard to have a single
>> field be the destination for several copyField
>> directives.
>>
>> On Mon, Dec 12, 2011 at 12:48 PM, Gora Mohanty  wrote:
>> > On Mon, Dec 12, 2011 at 2:24 AM, Brian Lamb
>> >  wrote:
>> >> Hi all,
>> >>
>> >> I have a few questions about how the MySQL data import works. It seems
>> it
>> >> creates a separate connection for each entity I create. Is there any
>> way to
>> >> avoid this?
>> >
>> > Not sure, but I do not think that it is possible. However, from your
>> description
>> > below, I think that you are unnecessarily multiplying entities.

Re: Images for the DataImportHandler page

2011-12-12 Thread Chris Hostetter

: There is some very useful information on the 
: http://wiki.apache.org/solr/DataImportHandler page about indexing 
: database contents, but the page contains three images whose links are 
: broken. The descriptions of those images sound like it would be quite 
: handy to see them in the page. Could someone please fix the links so the 
: images are displayed?

Images, and all attachments in general, were disabled some time back for 
all of wiki.apache.org.  Pages that still refer/link to old attachments 
just never got updated after the fact to reflect this.

ASF Infra has a policy permitting individual wikis to re-enable 
attachment support, but doing so would require switching the entire wiki 
over to a new ACL model, where only people who had been granted explicit 
access to perform edits would be allowed to do so.

My personal opinion is that I'd rather have a low barrier for editing the 
wiki (ie: register and do a textcha) and live w/o images, rather than have 
images but a high barrier to editing (ie: register, ask for 
edit permission from a committer, *and* do textchas).  But I'm open to 
other suggestions...

https://wiki.apache.org/general/OurWikiFarm
https://wiki.apache.org/general/OurWikiFarm#Attachments


-Hoss


Re: server down caused by complex query

2011-12-12 Thread Chris Hostetter

: Because our user send very long and complex queries with asterisk and near
: operator.
: Sometimes near operator exceeds 1,000 and keywords almost include asterisk.
: If such query is sent to server, jvm memory is full. (our jvm memory

"near" operator isn't something I know of as a built in feature of SOlr 
(definitely not Solr 1.4) ... which query parser are you using?  

what is the value of your  setting in solrconfig.xml?  
that's the method that should help to limit the risk of query explosion if 
users try to overwelm the server with really large queries, but for 
wildcard and prefix queries (ie: using "*") even Solr 1.4 implemented 
those using "ConstantScoreQuery" instead of using query expansion, so i'm 
no sure how/why a single query could eat up so much ram.
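For reference, a minimal sketch of where that setting lives (it sits inside the
<query> section of solrconfig.xml; 1024 is the stock default, raise it with care):

  <!-- caps how many clauses a single BooleanQuery may expand to -->
  <maxBooleanClauses>1024</maxBooleanClauses>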

In general, there have been a lot of improvements in memory usage in 
recent versions of Solr, so I suggest you upgrade to Solr 3.5 -- but 
beyond that basic advice any other suggestions will require a *lot* more 
specifics about exactly what your configs look like, the full requests 
(all params) of the queries that are causing you problems, details on your 
JVM configuration, etc... 

-Hoss


FTP mount crash when crawling with solrj

2011-12-12 Thread hadi
I have a lot of files in my FTP account, and I use curlftpfs to mount them to a
folder and then start indexing them with the SolrJ API. But after a few minutes
something strange happens: the mounted folder becomes inaccessible and crashes,
and I cannot unmount it either (the message "device is in use" appears). My
SolrJ code is OK; I tested it with my local files and the result is great, but
indexing the mounted folder is my terrible problem. I should mention that I have
used curlftpfs with CentOS, Fedora and Ubuntu, and the result of crashing is the
same. How can I fix this problem? Is the problem with my code? Has somebody ever
faced this problem when indexing a mounted folder?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/FTP-mount-crash-when-crawling-with-solrj-tp3580982p3580982.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr-3.5.0/Nutch-1.4 - SolrDeleteDuplicates fails

2011-12-12 Thread Patrick Durusau

Greetings!

On the Nutch Tutorial:

I can run the following commands with Solr-3.5.0/Nutch-1.4:

bin/nutch crawl urls -dir crawl -depth 3 -topN 5


then:

bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb -linkdb 
crawl/linkdb crawl/segments/*



successfully.

But, if I run:

bin/nutch crawl urls -solr http://localhost:8983/solr/ -depth 3 -topN 5

It fails with the following messages:

SolrIndexer: starting at 2011-12-11 14:01:27

Adding 11 documents

SolrIndexer: finished at 2011-12-11 14:01:28, elapsed: 00:00:01

SolrDeleteDuplicates: starting at 2011-12-11 14:01:28

SolrDeleteDuplicates: Solr url: http://localhost:8983/solr/

Exception in thread "main" java.io.IOException: Job failed!

at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)

at 
org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:373)


at 
org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:353)


at org.apache.nutch.crawl.Crawl.run(Crawl.java:153)

at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)

at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)

I am running on Ubuntu 10.10 with 12 GB of memory, Java version 1.6.0_26.

I can delete the crawl directory and replicate this error consistently.

Suggestions?

Other than "...use the way that doesn't fail." ;-)

I am concerned that a different invocation of Solr failing consistently 
represents something that may cause trouble elsewhere when least 
expected. (And hard to isolate as the problem.)


Thanks!

Hope everyone is having a great weekend!

Patrick

PS: From the hadoop log (when it fails) if that's helpful:

2011-12-11 15:21:51,436 INFO  solr.SolrWriter - Adding 11 documents

2011-12-11 15:21:52,250 INFO  solr.SolrIndexer - SolrIndexer: finished 
at 2011-12-11 15:21:52, elapsed: 00:00:01


2011-12-11 15:21:52,251 INFO  solr.SolrDeleteDuplicates - 
SolrDeleteDuplicates: starting at 2011-12-11 15:21:52


2011-12-11 15:21:52,251 INFO  solr.SolrDeleteDuplicates - 
SolrDeleteDuplicates: Solr url: http://localhost:8983/solr/


2011-12-11 15:21:52,330 WARN  mapred.LocalJobRunner - job_local_0020

java.lang.NullPointerException

at org.apache.hadoop.io.Text.encode(Text.java:388)

at org.apache.hadoop.io.Text.set(Text.java:178)

at 
org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:270)


at 
org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:241)


at 
org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:192)


at 
org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:176)


at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)

at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)

at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)

at 
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)


--
Patrick Durusau
patr...@durusau.net
Chair, V1 - US TAG to JTC 1/SC 34
Convener, JTC 1/SC 34/WG 3 (Topic Maps)
Editor, OpenDocument Format TC (OASIS), Project Editor ISO/IEC 26300
Co-Editor, ISO/IEC 13250-1, 13250-5 (Topic Maps)
OASIS Technical Advisory Board (TAB) - member

Another Word For It (blog): http://tm.durusau.net
Homepage: http://www.durusau.net
Twitter: patrickDurusau



highlighting questions

2011-12-12 Thread Bent Jensen


I am trying to figure out how to display search query fields highlighted in 
HTML. I can enable highlighting in the query, and I think I get the correct 
response back (see below: I search using 'Contents' and the highlighting markers 
show up around ''Contents''). However, I can't figure out what to add to the 
XSLT file to display it in HTML. I think it is a question of defining the 
appropriate XPath(?), but I am stuck. Can someone point me in the right 
direction? Thanks in advance!


Here is the result I get back (the XML element tags were stripped when pasting, 
so only the element values are visible):

responseHeader:
  status: 0, QTime: 20
  echoed params: on, '', *, on, 10, 2.2, 0, contents, ''

matching document:
  Start with the Table of Contents. See if you can find the topic that you 
are interested in. Look through the section to see if there is a resource that 
can help you. If you find one, you may want to attach a Post-it tab so you can 
find the page later. Write down all of the information that you need to find 
out more information about the resource: agency name, name of contact person, 
telephone number, email and website addresses. If you were unable to find a 
resource that will help you in this resource guide, a good first step would be 
to call your local Independent Living Center. They will have a good idea of 
what is available in your area. A second step would be to call or email us at 
the Rehabilitation Research Center. We have a ROBOT resource specialist who may 
be able to assist. You can reach Lois Roberts, the “Back On Track …To Success” 
Mentoring Program Assistant, at 408-793-6426 or email her at 
lois.robe...@hhs.sccgov.org 
  robot.pdf#page=11 
  CHAPTER 1: How to Use This Resource Guide 
  1-1 

highlighting:
  Start with the Table of ''Contents''. See if you can 
find the topic that you are interested in. Look 
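Since the question is what to put in the XSLT: a hedged sketch of a minimal
stylesheet that pulls the highlighting snippets out of the standard XML response
(it assumes the default XML response writer and that the highlighted field is
called "contents"; adjust the @name values to your setup):

  <?xml version="1.0" encoding="UTF-8"?>
  <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="html"/>
    <xsl:template match="/">
      <html>
        <body>
          <!-- one <lst> per matching document, keyed by its uniqueKey value -->
          <xsl:for-each select="response/lst[@name='highlighting']/lst">
            <p>
              <b><xsl:value-of select="@name"/></b>:
              <!-- emit the snippet unescaped so the highlight markup renders -->
              <xsl:value-of select="arr[@name='contents']/str"
                            disable-output-escaping="yes"/>
            </p>
          </xsl:for-each>
        </body>
      </html>
    </xsl:template>
  </xsl:stylesheet>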
  
  
  
  

Re: server down caused by complex query

2011-12-12 Thread Jason
Hello Hoss,

We're using ComplexPhraseQueryParser and our maxBooleanClauses setting is
100. I know our maxBooleanClauses is very big,
but we are an expert-search organization, our queries are very complex and
include wildcards, so we need it.
Our application receives queries like ((A* OR B* OR C*,...) n/2 (X*
OR Y* OR Z*,...)) AND (...) from users.
It then converts them into Solr queries like ("A* X*"~2 OR "A* Y*"~2 OR "A*
Z*"~2 OR "B* X*"~2 OR ...) AND (...).
As you can see, the near expressions get written out repeatedly.
I expect this is inefficient and is why the JVM memory fills up.

I think the surround query parser may be our solution,
so we are now customizing the surround query parser because it is very limited.


Below is our Tomcat setenv...
==
export CATALINA_OPTS="-Xms112640m -Xmx112640m"
export CATALINA_OPTS="$CATALINA_OPTS -Dserver"
export CATALINA_OPTS="$CATALINA_OPTS
-Djava.library.path=/usr/local/lib:/usr/local/apr/lib"
export CATALINA_OPTS="$CATALINA_OPTS -Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=9014
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false"
export CATALINA_OPTS="$CATALINA_OPTS -Dfile.encoding=utf-8"
export CATALINA_OPTS="$CATALINA_OPTS -XX:+UseConcMarkSweepGC"
==

Thanks
Jason

--
View this message in context: 
http://lucene.472066.n3.nabble.com/server-down-caused-by-complex-query-tp3535506p3581218.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: sub query parsing bug???

2011-12-12 Thread Erick Erickson
Well, your query below becomes ref_expertise:(nonlinear OR soliton)
AND default_search:"optical lattice:"

The regular Solr/Lucene query should handle pretty much anything you
can throw at it. But do be aware that Solr/Lucene syntax is not true
boolean logic, you have to think in terms of SHOULD, MUST, MUST_NOT.

But this works:
q={!type=edismax qf='name' }(nonlinear OR soliton) AND "optical lattice"
giving this:
+(+((name:nonlinear) (name:soliton)) +(name:"optical lattice"))

Best
Erick

On Mon, Dec 12, 2011 at 3:29 PM, Steve Fuchs  wrote:
> Thanks for the reply!
>
> I do believe I have set (or have tried setting) all of those options for the 
> default query and none of them seem to help. Anytime an OR appears inside the 
> query the default for that query becomes OR. At least thats the anecdotal 
> evidence I've encountered.
> Also in this case the results do match what the parser is telling me, so I'm 
> not getting the results I expect.
>
> As for the second suggestion, the actual fields searched are controlled by 
> the user, so it can get more complicated. But even in the single field search 
> I do believe I need to use the edismax parser. I have tried the regular query 
> syntax for searching one field and find that it can't handle the more complex 
> queries.
>
> Something like
> ref_expertise:(nonlinear OR soliton) AND "optical lattice"
>
> won't return any documents even though there are many that satisfy those 
> requirements. Is there some other way I could be executing this query even in 
> the single field case?
>
> Thanks and Thanks in Advance for all help
>
> Steve
>
>
>
>
>
> On Dec 6, 2011, at 8:26 AM, Erick Erickson wrote:
>
>> Hmmm, does this help?
>>
>> In Solr 1.4 and prior, you should basically set mm=0 if you want the
>> equivalent of q.op=OR, and mm=100% if you want the equivalent of
>> q.op=AND. In 3.x and trunk the default value of mm is dictated by the
>> q.op param (q.op=AND => mm=100%; q.op=OR => mm=0%). Keep in mind the
>> default operator is affected by your schema.xml <solrQueryParser defaultOperator="xxx"/> entry. In older versions of Solr the default
>> value is 100% (all clauses must match)
>> (from http://wiki.apache.org/solr/DisMaxQParserPlugin).
>>
>> I don't think you'll see the query parsed as you expect, but the
>> results of the query
>> should be what you expect. Tricky, eh?
>>
>> I'm assuming you've simplified the example for clarity and your qf
>> will be on more than one field when you use it "for real", but if not
>> the actual query doesn't need edismax at all.
>>
>> Best
>> Erick
>>
>> On Mon, Dec 5, 2011 at 10:52 AM, Steve Fuchs  wrote:
>>> Hello All,
>>>
>>> I have my field description listed below, but I don't think its pertinent. 
>>> As my issue seems to be with the query parser.
>>>
>>> I'm currently using an edismax subquery clause to help with my searching as 
>>> such:
>>>
>>> _query_:"{!type=edismax qf='ref_expertise'}\(nonlinear OR soliton\) AND 
>>> \"optical lattice\""
>>>
>>> translates correctly to
>>>
>>> +(+((ref_expertise:nonlinear) (ref_expertise:soliton)) 
>>> +(ref_expertise:"optical lattice"))
>>>
>>>
>>> but the users expect the default operator to be AND (it is in all simpler 
>>> searches), however nothing I can do here gets me that same result as above 
>>> when the search is:
>>>
>>> _query_:"{!type=edismax qf='ref_expertise'}\(nonlinear OR soliton\) 
>>> \"optical lattice\""
>>>
>>> this gets converted to:
>>>
>>> +(((ref_expertise:nonlinear) (ref_expertise:soliton)) 
>>> (ref_expertise:"optical lattice"))
>>>
>>> where the "optical lattice" is optional.
>>>
>>> These produce the same results, trying q.op and mm. Also the default search 
>>> term as  set in the solr.config is AND.
>>>
>>> _query_:"{!type=edismax q.op=AND qf='ref_expertise'}\(nonlinear OR 
>>> soliton\)\"optical lattice\""
>>> _query_:"{!type=edismax mm=1.0 qf='ref_expertise'}\(nonlinear OR 
>>> soliton\)\"optical lattice\""
>>>
>>>
>>>
>>>
>>> Any ideas???
>>>
>>> Thanks In Advance
>>>
>>> Steven Fuchs
>>>
>>>
>>>
>>>
>>>
>>>
>>> [fieldType definition stripped by the mail archive; the index analyzer
>>> included a filter with preserveOriginal="1" and an EdgeNGram filter with
>>> maxGramSize="25"]
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>


Re: Reducing heap space consumption for large dictionaries?

2011-12-12 Thread Maciej Lisiewski

Hi,

in my index schema I has defined a
DictionaryCompoundWordTokenFilterFactory and a
HunspellStemFilterFactory. Each FilterFactory has a dictionary with
about 100k entries.

To avoid an out of memory error I have to set the heap space to 128m
for 1 index.

Is there a way to reduce the memory consumption when parsing the dictionary?
I need to create several indexes and 128m for each index is too much.


Same problem here - even with an empty index (no data yet) and two 
fields using Hunspell (pl_PL) I had to increase heap size to over 2GB 
for solr to start at all..


Stempel using the very same dictionary works fine with 128M..

--
Maciej Lisiewski


Re: Reducing heap space consumption for large dictionaries?

2011-12-12 Thread Chris Male
Hi,

It's good to hear some feedback on using the Hunspell dictionaries.
 Lucene's support is pretty new so we're obviously looking to improve it.
 Could you open a JIRA issue so we can explore whether there are some ways
to reduce memory consumption?

On Tue, Dec 13, 2011 at 5:37 PM, Maciej Lisiewski  wrote:

> Hi,
>>
>> in my index schema I has defined a
>> DictionaryCompoundWordTokenFil**terFactory and a
>> HunspellStemFilterFactory. Each FilterFactory has a dictionary with
>> about 100k entries.
>>
>> To avoid an out of memory error I have to set the heap space to 128m
>> for 1 index.
>>
>> Is there a way to reduce the memory consumption when parsing the
>> dictionary?
>> I need to create several indexes and 128m for each index is too much.
>>
>
> Same problem here - even with an empty index (no data yet) and two fields
> using Hunspell (pl_PL) I had to increase heap size to over 2GB for solr to
> start at all..
>
> Stempel using the very same dictionary works fine with 128M..
>
> --
> Maciej Lisiewski
>



-- 
Chris Male | Software Developer | DutchWorks | www.dutchworks.nl


RE: Trim and copy a solr field

2011-12-12 Thread Swapna Vuppala
Hi Juan,

Thanks for the reply. I tried using this, but I don't see any effect of the 
analyzer/filter.

I tried copying my Solr field to another field of the type defined below. Then 
I indexed a couple of documents with the new schema, but I see that both fields 
have got the same value.
I am looking at the indexed data in Luke.

I am assuming that analyzers process the field value (as specified by the various 
filters etc.) and then store the modified value. Is that true? What else could 
I be missing here?

Thanks and Regards,
Swapna.

-Original Message-
From: Juan Grande [mailto:juan.gra...@gmail.com] 
Sent: Monday, December 12, 2011 11:50 PM
To: solr-user@lucene.apache.org
Subject: Re: Trim and copy a solr field

Hi Swapna,

You could try using a copyField to a field that uses
PatternReplaceFilterFactory:
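(The fieldType XML in the original message was stripped by the mail archive. A
minimal sketch of the sort of thing being described; the type name and the exact
regex are illustrative:)

   <fieldType name="parent_path" class="solr.TextField">
     <analyzer>
       <!-- keep the whole path as one token -->
       <tokenizer class="solr.KeywordTokenizerFactory"/>
       <!-- chop off everything from the last '/' onwards -->
       <filter class="solr.PatternReplaceFilterFactory"
               pattern="/[^/]+$" replacement="" replace="first"/>
     </analyzer>
   </fieldType>

Keep in mind that analysis changes the indexed tokens, not the stored value of
the field.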


The regular expression may not be exactly what you want, but it will give
you an idea of how to do it. I'm pretty sure there must be some other ways
of doing this, but this is the first that comes to my mind.

*Juan*



On Mon, Dec 12, 2011 at 4:46 AM, Swapna Vuppala wrote:

> Hi,
>
> I have a Solr field that contains the absolute path of the file that is
> indexed, which will be something like
> file:/myserver/Folder1/SubFol1/Sub-Fol2/Test.msg.
>
> Am interested in indexing the location in a separate field.  I was looking
> for some way to trim the field value from last occurrence of char "/", so
> that I can get the location value, something like
> file:/myserver/Folder1/SubFol1/Sub-Fol2,
> and store it in a new field. Can you please suggest some way to achieve
> this ?
>
> Thanks and Regards,
> Swapna.
> 
> Electronic mail messages entering and leaving Arup  business
> systems are scanned for acceptability of content and viruses
>


Generic RemoveDuplicatesTokenFilter

2011-12-12 Thread pravesh
Hi All,

Currently, Solr's existing RemoveDuplicatesTokenFilter
(http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.RemoveDuplicatesTokenFilterFactory)
filters out duplicate tokens that have the same text and are logically at the
same position.

In my case, if the same term appears repeatedly, one occurrence after the other,
then I need to remove the duplicates and keep only a single occurrence of the
term (even if the position increment gap == 1).

For example, if the input stream is:  /quick brown brown brown fox jumps jumps
over the little little lazy brown dog/
then the output should be:  quick brown fox jumps over the little lazy brown
dog.

To achieve this, I implemented my own version of
/RemoveDuplicatesTokenFilter/ with an overridden /process()/ method:

  protected Token process(Token t) throws IOException {
    Token nextTok = peek(1);  // look ahead at the next buffered token
    if (t != null && nextTok != null) {
      // drop the current token when the one right after it has the same text
      if (t.termText().equalsIgnoreCase(nextTok.termText())) {
        return null;
      }
    }
    return t;
  }

The above implementation works as desired and the consecutive duplicates
are getting removed :)

Any advice/feedback on the above implementation? :)
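As a side note, to use the filter from schema.xml you also need a factory; a
hypothetical one for the Solr 1.4/3.x-era API (class and package names here are
made up, and it assumes your filter has a (TokenStream) constructor):

   package com.mycompany.analysis;

   import org.apache.lucene.analysis.TokenStream;
   import org.apache.solr.analysis.BaseTokenFilterFactory;

   // Lets schema.xml reference the custom filter as
   // <filter class="com.mycompany.analysis.ConsecutiveDuplicatesFilterFactory"/>
   public class ConsecutiveDuplicatesFilterFactory extends BaseTokenFilterFactory {
     @Override
     public TokenStream create(TokenStream input) {
       return new ConsecutiveDuplicatesFilter(input); // your filter class
     }
   }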

Regards
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Generic-RemoveDuplicatesTokenFilter-tp3581656p3581656.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: NRT or similar for Solr 3.5?

2011-12-12 Thread vikram kamath
@Steven: try some alternate email address (besides Google/Yahoo) and check your
spam folder.


Regards
Vikram Kamath



2011/12/13 Steven Ou 

> Yeah, running Chrome on OSX and doesn't do anything.
>
> Just switched to Firefox and it works. *But*, also don't seem to be
> receiving confirmation email.
> --
> Steven Ou | 歐偉凡
>
> *ravn.com* | Chief Technology Officer
> steve...@gmail.com | +1 909-569-9880
>
>
> 2011/12/12 vikram kamath 
>
> > The Onclick handler does not seem to be called on google chrome (Ubuntu
> ).
> >
> > Also , I dont seem to receive the email with the confirmation link on
> > registering (I have checked my spam)
> >
> >
> >
> >
> > Regards
> > Vikram Kamath
> >
> >
> >
> > 2011/12/12 Nagendra Nagarajayya 
> >
> > > Steven:
> > >
> > > There is an onclick handler that allows you to download the src. BTW,
> an
> > > early access Solr 3.5 with RankingAlgorithm 1.3 (NRT) release is
> > > available for download. So please give it a try.
> > >
> > > Regards,
> > >
> > > - Nagendra Nagarajayya
> > > http://solr-ra.tgels.org
> > > http://rankingalgorithm.tgels.org
> > >
> > >
> > > On 12/10/2011 11:18 PM, Steven Ou wrote:
> > > > All the links on the download section link to
> > http://solr-ra.tgels.org/#
> > > > --
> > > > Steven Ou | 歐偉凡
> > > >
> > > > *ravn.com* | Chief Technology Officer
> > > > steve...@gmail.com | +1 909-569-9880
> > > >
> > > >
> > > > 2011/12/11 Nagendra Nagarajayya 
> > > >
> > > >> Steven:
> > > >>
> > > >> Not sure why you had problems, #downloads (
> > > >> http://solr-ra.tgels.org/#downloads ) should point you to the
> > downloads
> > > >> section showing the different versions available for download ?
> Please
> > > >> share if this is not so ( there were downloads yesterday with no
> > > problems )
> > > >>
> > > >> Regarding NRT, you can switch between RA and Lucene at query level
> or
> > at
> > > >> config level; in the current version with RA, NRT is in effect while
> > > >> with lucene, it is not, you can get more information from here:
> > > >> http://solr-ra.tgels.org/papers/Solr34_with_RankingAlgorithm13.pdf
> > > >>
> > > >> Solr 3.5 with RankingAlgorithm 1.3 should be available next week.
> > > >>
> > > >> Regards,
> > > >>
> > > >> - Nagendra Nagarajayya
> > > >> http://solr-ra.tgels.org
> > > >> http://rankingalgorithm.tgels.org
> > > >>
> > > >> On 12/9/2011 4:49 PM, Steven Ou wrote:
> > > >>> Hey Nagendra,
> > > >>>
> > > >>> I took a look and Solr-RA looks promising - but:
> > > >>>
> > > >>>- I could not figure out how to download it. It seems like all
> the
> > > >>>download links just point to "#"
> > > >>>- I wasn't looking for another ranking algorithm, so would it be
> > > >>>possible for me to use NRT but *not* RA (i.e. just use the
> normal
> > > >> Lucene
> > > >>>library)?
> > > >>>
> > > >>> --
> > > >>> Steven Ou | 歐偉凡
> > > >>>
> > > >>> *ravn.com* | Chief Technology Officer
> > > >>> steve...@gmail.com | +1 909-569-9880
> > > >>>
> > > >>>
> > > >>> On Sat, Dec 10, 2011 at 5:13 AM, Nagendra Nagarajayya <
> > > >>> nnagaraja...@transaxtions.com> wrote:
> > > >>>
> > >  Steven:
> > > 
> > >  Please take a look at Solr  with RankingAlgorithm. It offers NRT
> > >  functionality. You can set your autoCommit to about 15 mins. You
> can
> > > get
> > >  more information from here:
> > > 
> http://solr-ra.tgels.com/wiki/**en/Near_Real_Time_Search_ver_**3.x<
> > > >> http://solr-ra.tgels.com/wiki/en/Near_Real_Time_Search_ver_3.x>
> > > 
> > >  Regards,
> > > 
> > >  - Nagendra Nagarajayya
> > >  http://solr-ra.tgels.org
> > >  http://rankingalgorithm.tgels.**org <
> > > http://rankingalgorithm.tgels.org>
> > > 
> > > 
> > >  On 12/8/2011 9:30 PM, Steven Ou wrote:
> > > 
> > > > Hi guys,
> > > >
> > > > I'm looking for NRT functionality or similar in Solr 3.5. Is that
> > > > possible?
> > > >> From what I understand there's NRT in Solr 4, but I can't figure
> > out
> > > > whether or not 3.5 can do it as well?
> > > >
> > > > If not, is it feasible to use an autoCommit every 1000ms? We
> don't
> > > > currently process *that* much data so I wonder if it's OK to just
> > > >> commit
> > > > very often? Obviously not scalable on a large scale, but it is
> > > feasible
> > > > for
> > > > a relatively small amount of data?
> > > >
> > > > I recently upgraded from Solr 1.4 to 3.5. I had a hard time
> getting
> > > > everything working smoothly and the process ended up taking my
> site
> > > >> down
> > > > for a couple hours. I am very hesitant to upgrade to Solr 4 if
> it's
> > > not
> > > > necessary to get some sort of NRT functionality.
> > > >
> > > > Can anyone help me? Thanks!
> > >