Re: Question about the message "Indexing failed. Rolled back all changes."

2009-11-10 Thread Bertie Shen
No. I did not check the logs.

But even after I successfully index data using
http://host:port/solr-example/dataimport?command=full-import&commit=true&clean=true,
run a Solr search that returns meaningful results, and then visit
http://host:port/solr-example/dataimport?command=status, I still see the
following result:


(XML element names stripped by the list archive; values only:)

<response>
  0
  1
  data-config.xml
  status
  idle
  0:2:11.426
  584
  1538
  0
  2009-11-09 23:54:41
  *Indexing failed. Rolled back all changes.*
  2009-11-09 23:54:42
  2009-11-09 23:54:42
  2009-11-09 23:54:42
  This response format is experimental.  It is likely to change in the future.
</response>



On Mon, Nov 9, 2009 at 7:39 AM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> On Sat, Nov 7, 2009 at 1:10 PM, Bertie Shen  wrote:
>
> >
> >  When I use
> > http://localhost:8180/solr/admin/dataimport.jsp?handler=/dataimport to
> > debug the indexing config file, I always see the status message
> > "Indexing failed. Rolled back all changes." on the right part, even
> > though the indexing process looks to be successful. I am not sure
> > whether you guys have seen the same phenomenon or not.  BTW, I usually
> > check the checkbox Clean and sometimes check the Commit box, and then
> > click the Debug Now button.
> >
> >
> Do you see any exceptions in the logs?
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: Question about the message "Indexing failed. Rolled back all changes."

2009-11-10 Thread Avlesh Singh
>
> But even after I successfully index data using
> http://host:port/solr-example/dataimport?command=full-import&commit=true&clean=true,
> run a Solr search which returns meaningful results
>
I am not sure what "meaningful" means. The full-import command starts an
asynchronous process to begin re-indexing. The response that you get for
the above-mentioned URL (always) indicates that a full-import has been
started. It does NOT know about anything that might go wrong with the
process itself.

> and then visit http://host:port/solr-example/dataimport?command=status, I
> can see the following result ...
>
The status URL is the one which tells you what is going on with the process.
The message "Indexing failed. Rolled back all changes" can show up for
multiple reasons - missing database drivers, incorrect SQL queries, runtime
errors in custom transformers, etc.

Start the full-import once more. Keep a watch on the Solr server log. If you
can figure out what's going wrong, great; otherwise, copy-paste the
exception stack-trace from the log file for specific answers.
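If you prefer to script the status check, here is a minimal SolrJ sketch
(assuming SolrJ 1.4 and a DIH handler registered at /dataimport; the host
and core name are illustrative):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.common.util.NamedList;

SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr-example");
ModifiableSolrParams params = new ModifiableSolrParams();
params.set("command", "status");
QueryRequest req = new QueryRequest(params);
req.setPath("/dataimport");               // route to the DIH handler, not /select
NamedList<Object> resp = server.request(req);
System.out.println(resp.get("status"));          // e.g. "idle"
System.out.println(resp.get("statusMessages"));  // includes the rollback message, if any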

Cheers
Avlesh

On Tue, Nov 10, 2009 at 1:32 PM, Bertie Shen  wrote:

> No. I did not check the logs.
>
> But even after I successfully index data using
> http://host:port
> /solr-example/dataimport?command=full-import&commit=true&clean=true,
> run a Solr search that returns meaningful results, and then visit
> http://host:port/solr-example/dataimport?command=status, I still see the
> following result:
>
> (XML element names stripped by the list archive; values only:)
>
> <response>
>   0
>   1
>   data-config.xml
>   status
>   idle
>   0:2:11.426
>   584
>   1538
>   0
>   2009-11-09 23:54:41
>   *Indexing failed. Rolled back all changes.*
>   2009-11-09 23:54:42
>   2009-11-09 23:54:42
>   2009-11-09 23:54:42
>   This response format is experimental.  It is likely to change in the
>   future.
> </response>
>
> On Mon, Nov 9, 2009 at 7:39 AM, Shalin Shekhar Mangar <
> shalinman...@gmail.com> wrote:
>
> > On Sat, Nov 7, 2009 at 1:10 PM, Bertie Shen 
> wrote:
> >
> > >
> > >  When I use
> > > http://localhost:8180/solr/admin/dataimport.jsp?handler=/dataimport to
> > > debug the indexing config file, I always see the status message
> > > "Indexing failed. Rolled back all changes." on the right part, even
> > > though the indexing process looks to be successful. I am not sure
> > > whether you guys have seen the same phenomenon or not.  BTW, I usually
> > > check the checkbox Clean and sometimes check the Commit box, and then
> > > click the Debug Now button.
> > >
> > >
> > Do you see any exceptions in the logs?
> >
> > --
> > Regards,
> > Shalin Shekhar Mangar.
> >
>


Re: How TEXT field make sortable?

2009-11-10 Thread Lucas F. A. Teixeira
That's correct.

You can use copyField to copy this field's content to another field of
another type (string) and sort on that one.
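For example, a minimal schema.xml sketch (field names are illustrative):

  <field name="title"      type="text"   indexed="true" stored="true"/>
  <field name="title_sort" type="string" indexed="true" stored="false"/>

  <copyField source="title" dest="title_sort"/>

and then sort with sort=title_sort+asc in the query.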

[]s,

Lucas Frare Teixeira .·.
- lucas...@gmail.com
- lucastex.com.br
- blog.lucastex.com
- twitter.com/lucastex


On Tue, Nov 10, 2009 at 5:36 AM, Avlesh Singh  wrote:

> >
> > Can some one help me how we can sort the text field.
> >
> You CANNOT sort on a "text" field. Sorting can only be done on an
> untokenized field (e.g. string, sint, sfloat fields)
>
> Cheers
> Avlesh
>
> On Tue, Nov 10, 2009 at 11:44 AM, deepak agrawal 
> wrote:
>
> > Can some one help me how we can sort the text field.
> >
> > 
> >
> > --
> > DEEPAK AGRAWAL
> > +91-9379433455
> > GOOD LUCK.
> >
>


Re: A question about how to make schema.xml change take effect

2009-11-10 Thread Chantal Ackermann
Did the schema browser really show a different type after restarting? I
would think you'd have to reindex before the change gets applied to the
actual data. Or is your index/import process launched on Tomcat startup?


(schema.xml != schema browser ?!)

Chantal


Bertie Shen schrieb:

Oh. Sorry, take back what I said. Most of my config change is at
data-config.xml, not schema.xml.

I just made a change for field data type in schema.xml and noticed that I
have to restart tomcat.



On Mon, Nov 9, 2009 at 10:37 PM, Ritesh Gurung  wrote:


Well, every time you make a change in the schema.xml file you need to
restart the Tomcat server.

On Tue, Nov 10, 2009 at 11:59 AM, Bertie Shen 
wrote:

Hey folks,

 When I update schema.xml, I found most of the time I do not need to restart
tomcat in order to make the change take effect. But sometimes, I have to
restart the tomcat server to make the change take effect.

  For example, when I changed a field data type from sint to tlong, I called
http://host:port/solr/dataimport?command=full-import&commit=true&clean=true.
I clicked the [Schema] link from the admin page and found the data type is
tlong; but clicking [Schema Browser] and that field link, I found the data
type is still sint.  When I make a search, the result also shows the field
is still sint. The only way to make the change effective I found is to
restart tomcat.

  I want to confirm whether it is intended or it is a bug.

  Thanks.






Configuring 1.4 - multi master setup?

2009-11-10 Thread Kevin Jackson
Hi all,

We have a situation where we would like to have
1 Master server (creates the index)
1 input slave server (which receives the updated index from the master)
n slaves (which receive the updated index from the input slave server)

This is to prevent each of the n slaves polling the master server.

a: is this setup possible?
b: has anyone done anything like this, if so do you have any advice?

This is all with 1.4 so we would be using inbuilt/java replication,
not snapshooter/snappuller

Thanks,
Kev


distributed facet dates

2009-11-10 Thread Marc Sturlese

Hey there,
I am thinking of developing date faceting for distributed search but I don't
know exactly where to start. I am familiar with the facet dates source code
and I think if I could understand how distributed facet queries work it
shouldn't be that difficult.
I have read http://wiki.apache.org/solr/WritingDistributedSearchComponents
but I miss some info.
Could anyone point me to how I could start?

Thanks in advance

-- 
View this message in context: 
http://old.nabble.com/distributed-facet-dates-tp26282343p26282343.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Configuring 1.4 - multi master setup?

2009-11-10 Thread Noble Paul നോബിള്‍ नोब्ळ्
see the setting up a repeater section in this page

http://wiki.apache.org/solr/SolrReplication
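The gist of it: the repeater node runs the ReplicationHandler as both
master and slave at once. A minimal sketch of that configuration (the
master URL and poll interval are illustrative):

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">commit</str>
    </lst>
    <lst name="slave">
      <str name="masterUrl">http://main-master:8983/solr/replication</str>
      <str name="pollInterval">00:00:60</str>
    </lst>
  </requestHandler>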

On Tue, Nov 10, 2009 at 5:17 PM, Kevin Jackson  wrote:
> Hi all,
>
> We have a situation where we would like to have
> 1 Master server (creates the index)
> 1 input slave server (which receives the updated index from the master)
> n slaves (which receive the updated index from the input slave server)
>
> This is to prevent each of the n slaves polling the master server.
>
> a: is this setup possible?
> b: has anyone done anything like this, if so do you have any advice?
>
> This is all with 1.4 so we would be using inbuilt/java replication,
> not snapshooter/snappuller
>
> Thanks,
> Kev
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: distributed facet dates

2009-11-10 Thread Yonik Seeley
On Tue, Nov 10, 2009 at 7:09 AM, Marc Sturlese  wrote:
> Hey there,
> I am thinking of developing date faceting for distributed search but I don't
> know exactly where to start. I am familiar with the facet dates source code and I
> think if I could understand how distributed facet queries work it shouldn't be
> that difficult.
> I have read http://wiki.apache.org/solr/WritingDistributedSearchComponents
> but I miss some info.
> Could anyone point me how could I start?


It should be relatively straightforward and easier than normal
facets... a single phase (the first main phase) and merge the results.
Date math relative to "NOW" needs to be coordinated across shards.

-Yonik
http://www.lucidimagination.com


Re: distributed facet dates

2009-11-10 Thread Yonik Seeley
On Tue, Nov 10, 2009 at 7:54 AM, Yonik Seeley
 wrote:
> On Tue, Nov 10, 2009 at 7:09 AM, Marc Sturlese  
> wrote:
>> Hey there,
>> I am thinking of developing date faceting for distributed search but I don't
>> know exactly where to start. I am familiar with the facet dates source code and I
>> think if I could understand how distributed facet queries work it shouldn't be
>> that difficult.
>> I have read http://wiki.apache.org/solr/WritingDistributedSearchComponents
>> but I miss some info.
>> Could anyone point me how could I start?
>
>
> It should be relatively straightforward and easier than normal
> facets... a single phase (the first main phase) and merge the results.
>  Date math relative to "NOW" needs to be coordinated across shards.

Check out FacetComponent.countFacets()

-Yonik
http://www.lucidimagination.com





> -Yonik
> http://www.lucidimagination.com
>


Re: tracking solr response time

2009-11-10 Thread bharath venkatesh
Otis,

   This means we have to leave enough room for the OS cache to cache the
whole index. So in the case of a 16 GB index, if I am not wrong, at least
16 GB of memory must not be allocated to any application, so the OS cache
can utilize that memory.

>> The operating systems are very good at maintaining this cache. It
> > usually better to give the Solr JVM enough memory to run comfortably
> > and rely on the OS cache to optimize disk I/O, instead of giving it
> > all available ram.

How much RAM would be good enough for the Solr JVM to run comfortably?


thanks,
Bharath


On Tue, Nov 10, 2009 at 3:59 AM, Otis Gospodnetic <
otis_gospodne...@yahoo.com> wrote:

> Bharat,
>
> No, you should not give the JVM so much memory.  Give it enough to avoid
> overly frequent GC, but don't steal memory from the OS cache.
>
> Otis
> --
> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
>
>
>
> - Original Message 
> > From: bharath venkatesh 
> > To: solr-user@lucene.apache.org
> > Sent: Sun, November 8, 2009 2:15:00 PM
> > Subject: Re: tracking solr response time
> >
> > Thanks Lance for the clear explanation .. are you saying we should give
> > the solr JVM enough memory so that the os cache can optimize disk I/O
> > efficiently .. that means in our case we have a 16 GB index, so would it
> > be enough to allocate the solr JVM 20GB of memory and rely on the OS
> > cache to optimize disk I/O, i.e. cache the index in memory ??
> >
> >
> > below is stats related to cache
> >
> >
> > name: queryResultCache
> > class: org.apache.solr.search.LRUCache
> > version: 1.0
> > description: LRU Cache(maxSize=512, initialSize=512, autowarmCount=256,
> > regenerator=org.apache.solr.search.solrindexsearche...@67e112b3)
> > stats:
> > lookups : 0
> > hits : 0
> > hitratio : 0.00
> > inserts : 8
> > evictions : 0
> > size : 8
> > cumulative_lookups : 15
> > cumulative_hits : 7
> > cumulative_hitratio : 0.46
> > cumulative_inserts : 8
> > cumulative_evictions : 0
> >
> >
> > name: documentCache
> > class: org.apache.solr.search.LRUCache
> > version: 1.0
> > description: LRU Cache(maxSize=512, initialSize=512)
> > stats:
> > lookups : 0
> > hits : 0
> > hitratio : 0.00
> > inserts : 0
> > evictions : 0
> > size : 0
> > cumulative_lookups : 744
> > cumulative_hits : 639
> > cumulative_hitratio : 0.85
> > cumulative_inserts : 105
> > cumulative_evictions : 0
> >
> >
> > name: filterCache
> > class: org.apache.solr.search.LRUCache
> > version: 1.0
> > description: LRU Cache(maxSize=512, initialSize=512, autowarmCount=256,
> > regenerator=org.apache.solr.search.solrindexsearche...@1e3dbf67)
> > stats:
> > lookups : 0
> > hits : 0
> > hitratio : 0.00
> > inserts : 20
> > evictions : 0
> > size : 12
> > cumulative_lookups : 64
> > cumulative_hits : 60
> > cumulative_hitratio : 0.93
> > cumulative_inserts : 12
> > cumulative_evictions : 0
> >
> > hits and hit ratio are zero for document cache, filter cache and query
> > cache ..  only cumulative hits and hitratio have non-zero numbers ..  is
> > this how it is supposed to be .. or do we need to configure it properly?
> >
> > Thanks,
> > Bharath
> >
> >
> >
> >
> >
> > On Sat, Nov 7, 2009 at 5:47 AM, Lance Norskog wrote:
> >
> > > The OS cache is the memory used by the operating system (Linux or
> > > Windows) to store a cache of the data stored on the disk. The cache is
> > > usually keyed by block numbers and is not correlated to files. Disk blocks
> > > that are not used by programs are slowly pruned from the cache.
> > >
> > > The operating systems are very good at maintaining this cache. It
> > > usually better to give the Solr JVM enough memory to run comfortably
> > > and rely on the OS cache to optimize disk I/O, instead of giving it
> > > all available ram.
> > >
> > > Solr has its own caches for certain data structures, and there are no
> > > solid guidelines for tuning those. The solr/admin/stats.jsp page shows
> > > the number of hits & deletes for the caches and most people just
> > > reload that over & over.
> > >
> > > On Fri, Nov 6, 2009 at 3:09 AM, bharath venkatesh
> > > wrote:
> > > >>I have to state the obvious: you may really want to upgrade to 1.4
> when
> > > > it's out
> > > >
> > > > when would solr 1.4 be released .. is there any beta version
> available ?
> > > >
> > > >>We don't have the details, but a machine with 32 GB RAM and 16 GB
> index
> > > > should have the whole index cached by >the OS
> > > >
> > > > do we have to configure solr for the index to be cached by the OS in
> > > > an optimised way? how does this caching of the index in memory
> > > > happen? are there any docs or links which give details regarding the
> > > > same
> > > >
> > > >>unless something else is consuming the memory or unless something is
> > > > constantly throwing data out of the OS >cache (e.g. frequent index
> > > > optimization).
> > > >
> > > > what are the factors which would cause co

Re: adding and updating a lot of document to Solr, metadata extraction etc

2009-11-10 Thread Eugene Dzhurinsky
On Tue, Nov 03, 2009 at 05:49:23PM -0800, Lance Norskog wrote:
> The DIH has improved a great deal from Solr 1.3 to 1.4. You will be
> much better off using the DIH from this.
> 
> This is the current Solr release candidate binary:
> http://people.apache.org/~gsingers/solr/1.4.0/

In fact we are prohibited from using release candidates/nightly builds; we
are forced to use only official releases of Solr :(

-- 
Eugene N Dzhurinsky




Re: tracking solr response time

2009-11-10 Thread Yonik Seeley
On Tue, Nov 10, 2009 at 8:07 AM, bharath venkatesh
 wrote:
> how much ram would be good enough for the Solr JVM  to run comfortably.

It really depends on how much stuff is cached, what fields you facet
and sort on, etc.

It can be easier to measure than to try and calculate it.
Run jconsole to see the memory use, do a whole bunch of queries that
do all the faceting, sorting, and function queries you will do in
production.  Then invoke GC a few times in rapid succession via
jconsole and see how much memory is actually used.  Double that to
account for a new index searcher being opened while the current one is
still open (that's just the worst case for Solr 1.4... the average
reopen case is better since many segments can be shared).  Add a
little more for safety.
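For example (numbers illustrative): if jconsole shows ~700MB still in use
after the forced GCs, doubling gives ~1.4GB, so something like -Xmx1600m
leaves a little headroom.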

-Yonik
http://www.lucidimagination.com


Re: Configuring 1.4 - multi master setup?

2009-11-10 Thread Kevin Jackson
Hi,

2009/11/10 Noble Paul നോബിള്‍  नोब्ळ् :
> see the setting up a repeater section in this page
>
> http://wiki.apache.org/solr/SolrReplication

Doh!

Sorry for the noise

Thanks,
Kev


Re: sanizing/filtering query string for security

2009-11-10 Thread michael8

Thanks guys for your input and suggestions!

Michael


Otis Gospodnetic wrote:
> 
> Word of warning:
> Careful with q.alt=*:* if you are dealing with large indices! :)
> 
> Otis
> --
> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
> 
> 
> 
> - Original Message 
>> From: Alexey Serba 
>> To: solr-user@lucene.apache.org
>> Sent: Mon, November 9, 2009 5:23:52 PM
>> Subject: Re: sanizing/filtering query string for security
>> 
>> > BTW, I have not used DisMax handler yet, but does it handle *:*
>> properly?
>> See q.alt DisMax parameter
>> http://wiki.apache.org/solr/DisMaxRequestHandler#q.alt
>> 
>> You can specify q.alt=*:* and q as empty string to get all results.
>> 
>> > do you care if users issue this query
>> I allow users to issue an empty search and get all results with all
>> facets / etc. It's a nice navigation UI btw.
>> 
>> > Basically given my UI, I'm trying to *hide* the total count from users 
>> searching for *everything*
>> If you don't specify q.alt parameter then Solr returns zero results
>> for empty search. *:* won't work either.
>> 
>> > though this syntax has helped me debug/monitor the state of my search
>> doc pool 
>> size.
>> see q.alt
>> 
>> Alex
>> 
>> On Tue, Nov 10, 2009 at 12:59 AM, michael8 wrote:
>> >
>> > Sounds like a nice approach you have  done.  BTW, I have not used
>> DisMax
>> > handler yet, but does it handle *:* properly?  IOW, do you care if
>> users
>> > issue this query, or does DisMax treat this query string differently
>> than
>> > standard request handler?  Basically given my UI, I'm trying to *hide*
>> the
>> > total count from users searching for *everything*, though this syntax
>> has
>> > helped me debug/monitor the state of my search doc pool size.
>> >
>> > Thanks,
>> > Michael
>> >
>> >
>> > Alexey-34 wrote:
>> >>
>> >> I added some kind of pre and post processing of Solr results for this,
>> >> i.e.
>> >>
>> >> If I find fieldname specified in query string in form of
>> >> "fieldname:term" then I pass this query string to standard request
>> >> handler, otherwise use DisMaxRequestHandler ( DisMaxRequestHandler
>> >> doesn't break the query, at least I haven't seen yet ). If standard
>> >> request handler throws error ( invalid field, too many clauses, etc )
>> >> then I pass original query to DisMax request handler.
>> >>
>> >> Alex
>> >>
>> >> On Mon, Nov 9, 2009 at 10:05 PM, michael8 wrote:
>> >>>
>> >>> Hi Julian,
>> >>>
>> >>> Saw you post on exactly the question I have.  I'm curious if you got
>> any
>> >>> response directly, or figured out a way to do this by now that you
>> could
>> >>> share?  I'm in the same situation trying to 'sanitize' the query
>> string
>> >>> coming in before handing it to solr.  I do see that characters like
>> ":"
>> >>> could break the query, but am curious if anyone has come up with a
>> >>> general
>> >>> solution as I think this must be a fairly common problem for any solr
>> >>> deployment to tackle.
>> >>>
>> >>> Thanks,
>> >>> Michael
>> >>>
>> >>>
>> >>> Julian Davchev wrote:
>> 
>>  Hi,
>>  Is there anything special that can be done for sanitizing user input
>>  before it is passed as a query to solr.
>>  Not allowing * and ? as the first char is the only thing I can think of
>>  right now. Anything else it should somehow handle.
>> 
>>  I am not able to find any relevant document.
>> 
>> 
>> >>>
>> >>> --
>> >>> View this message in context:
>> >>> 
>> http://old.nabble.com/sanizing-filtering-query-string-for-security-tp21516844p26271891.html
>> >>> Sent from the Solr - User mailing list archive at Nabble.com.
>> >>>
>> >>>
>> >>
>> >>
>> >
>> > --
>> > View this message in context: 
>> http://old.nabble.com/sanizing-filtering-query-string-for-security-tp21516844p26274459.html
>> > Sent from the Solr - User mailing list archive at Nabble.com.
>> >
>> >
> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/sanizing-filtering-query-string-for-security-tp21516844p26283657.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: adding and updating a lot of document to Solr, metadata extraction etc

2009-11-10 Thread Israel Ekpo
On Tue, Nov 10, 2009 at 8:26 AM, Eugene Dzhurinsky  wrote:

> On Tue, Nov 03, 2009 at 05:49:23PM -0800, Lance Norskog wrote:
> > The DIH has improved a great deal from Solr 1.3 to 1.4. You will be
> > much better off using the DIH from this.
> >
> > This is the current Solr release candidate binary:
> > http://people.apache.org/~gsingers/solr/1.4.0/
>
> In fact we are prohibited to use release candidates/nightly builds, we are
> forced to use only releases of Solr :(
>
> --
> Eugene N Dzhurinsky
>


Well, the official release is out and you can pick it up from your closest
mirror here

http://www.apache.org/dyn/closer.cgi/lucene/solr/


-- 
"Good Enough" is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.


Re: Configuring 1.4 - multi master setup?

2009-11-10 Thread Walter Underwood
Replication creates very little load on the master, so you should not  
need to have a separate machine just to handle the replication.


Why do you think you need that?

wunder

On Nov 10, 2009, at 5:37 AM, Kevin Jackson wrote:


Hi,

2009/11/10 Noble Paul നോബിള്‍  नोब्ळ्  
:

see the setting up a repeater section in this page

http://wiki.apache.org/solr/SolrReplication


Doh!

Sorry for the noise

Thanks,
Kev





Re: Slow Commits

2009-11-10 Thread Jim Murphy

Just an update to the list.  It appears that memory was the culprit.  I
attached a JMX console to the running Tomcat instance and monitored memory
usage.  Used total memory stayed ~900MB till a commit, then jumped to my Xmx
setting of 1.2GB, where the "peak" flatlined and fell down, likely after an
OOM exception.  I upped the Xmx to 2GB and commits are happening much better
- in the 1 minute range.


Jim



Jim Murphy wrote:
> 
> Thanks Jerome,
> 
> 
> 1. I have shut off autowarming by setting params to 0.
> 2. My JVM Settings: -Xmx1200m -Xms1200m -XX:-UseGCOverheadLimit
> -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=50
> 3. I am using autocommits - every 6 ms.  But the commit blocks all the
> master request threadpool threads as it spends 2-3 minutes committing.
> 4. I'm reluctant to NOT waitFlush since I don't want commits stacking up. 
> 
> 
> Any other thoughts?
> 
> Thanks
> 
> Jim
> 
> 
> 
> 
> Jérôme Etévé wrote:
>> 
>> Hi, here's two thing that can slow down commits:
>> 
>> 1) Autowarming the caches.
>> 2) The Java old generation object garbage collection.
>> 
>> You can try:
>> - Turning autowarming off (set autowarmCount="0"  in the caches
>> configuration)
>> - If you use the sun jvm, use  -XX:+UseConcMarkSweepGC to get a less
>> blocking garbage collection.
>> 
>> You may also try to:
>> - Not wait for the new searcher when you commit. The commit will then
>> be instant from your posting application point of view. ( option
>> waitSearcher=false  ).
>> - Leave the commits to the server (by setting autocommits in
>> solrconfig.xml). This is the best strategy if you've got lots of
>> concurrent processes posting.
>> 
>> Cheers.
>> 
>> Jerome.
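For reference, a minimal solrconfig.xml sketch of the autocommit suggestion
above (the thresholds are illustrative):

  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <maxDocs>10000</maxDocs>  <!-- commit after this many pending docs -->
      <maxTime>60000</maxTime>  <!-- or after this many milliseconds -->
    </autoCommit>
  </updateHandler>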
>> 
>> 2009/10/28 Jim Murphy :
>>>
>>> Hi All,
>>>
>>> We have 8 solr shards, index is ~ 90M documents 190GB.  :)
>>>
>>> 4 of the shards have acceptable commit time - 30-60 seconds.  The other
>>> 4
>>> have drifted over the last couple months to but up around 2-3 minutes. 
>>> This
>>> is killing our write throughput as you can imagine.
>>>
>>> I've included a log dump of a typical commit.  Note the large time period
>>> (3:40) between the start commit log message and the OnCommit log
>>> message.
>>> So, I think warming issues are not relevant.
>>>
>>> Any ideas what to debug at this point?
>>>
>>> I'm about to issue an optimize and see where that goes.  Its been a
>>> while
>>> since I did that.
>>>
>>> Cheers,
>>>
>>> Jim
>>>
>>>
>>>
>>>
>>> Oct 28, 2009 11:47:02 AM org.apache.solr.update.DirectUpdateHandler2
>>> commit
>>> INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true)
>>> Oct 28, 2009 11:50:43 AM org.apache.solr.core.SolrDeletionPolicy
>>> onCommit
>>> INFO: SolrDeletionPolicy.onCommit: commits:num=2
>>>
>>> commit{dir=/master/data/index,segFN=segments_8us4,version=1228872482131,generation=413140,filenames=[segments_8us4,
>>> _alae.fnm, _ai
>>> lk.tis, _ala9.fnm, _ala9.fdx, _alac.fnm, _al9w_h.del, _alab.prx,
>>> _ala9.fdt,
>>> _a61p_b76.del, _alab.fnm, _al8x.frq, _al7i_2f.del, _akh1.tis,
>>> _add1.frq, _alae.tis, _alad_1.del, _alaa.fnm, _alad.nrm, _al9w.frq,
>>> _alae.tii, _ailk.tii, _add1.tis, _alac.tii, _akuu.tis, _add1.tii, _ail
>>> k.frq, _alac.tis, _7zfh.tii, _962y.tis, _ala7.frq, _ah91.prx, _akuu.tii,
>>> _alab_3.del, _ah91.fnm, _7zfh.tis, _ala8.frq, _962y.tii, _alae.pr
>>> x, _a61p.fdt, _akuu.frq, _a61p.fdx, _al7i.fdx, _al2o.tis, _al9w.tis,
>>> _ala7.fnm, _a61p.frq, _akzu.fnm, _9wzn.fnm, _akh1.prx, _al7i.fdt, _al
>>> a9_2.del, _962y.prx, _al7i.prx, _al9w.tii, _alaa_4.del, _al7i.frq,
>>> _ah91.tii, _ala8.nrm, _962y.fdt, _add1_62u.del, _alae.nrm, _ah91.tis, _
>>> 962y.fdx, _akh1.fnm, _al8x.prx, _al2o.tii, _ala7.fdx, _ala9.prx,
>>> _ala7.fdt,
>>> _al9w.prx, _ala8.prx, _akh1.tii, _al2o.fdx, _7zfh.frq, _alac_3
>>> .del, _akzu.tii, _akzu.fdt, _alad.fnm, _akzu.tis, _alab.nrm, _akzu.fdx,
>>> _al2o.fnm, _al2o.fdt, _alaa.prx, _alaa.nrm, _962y.fnm, _ala7.prx,
>>> _alaa.tis, _ailk.fdt, _akzu_8d.del, _alac.frq, _akzu.prx, _ala9.nrm,
>>> _ailk.prx, _ala9.tis, _alaa.tii, _alae.frq, _add1.fnm, _7zfh.prx, _al
>>> 9w.fnm, _ala9.tii, _ala9.frq, _962y.nrm, _alab.frq, _ala8.fdx,
>>> _al8x.fnm,
>>> _a61p.prx, _7zfh.fnm, _ala8.fdt, _ailk.fdx, _alaa.frq, _7zfh.fdx
>>> , _al7i.tis, _ah91.fdt, _ailk.fnm, _9wzn_i0m.del, _ah91.fdx, _al7i.tii,
>>> _ailk_24j.del, _alad.fdx, _al8x.tii, _alae.fdx, _add1.prx, _akuu.f
>>> nm, _al8x.tis, _ah91.frq, _ala8.fnm, _7zfh.fdt, _alad.fdt, _alae_1.del,
>>> _alae.fdt, _akzu.frq, _a61p.fnm, _9wzn.frq, _ala8.tii, _7zfh_1gsd.
>>> del, _7zfh.nrm, _ala7_6.del, _a61p.tis, _9wzn.tii, _alad.frq, _alad.tii,
>>> _akuu.fdt, _alab.tii, _ala8.tis, _962y_xgg.del, _akh1.frq, _akuu.
>>> fdx, _alab.tis, _al7i.fnm, _alad.tis, _alac.nrm, _alab.fdx, _ala8_5.del,
>>> _add1.fdx, _ala7.tii, _akuu_cc.del, _alab.fdt, _9wzn.prx, _alaa.f
>>> dx, _al9w.fdt, _al2o.frq, _akh1_nf.del, _alac.prx, _akh1.fdx, _alaa.fdt,
>>> _al9w.fdx, _al8x_17.del, _add1.fdt, _al2o.prx, _akh1.fdt, _alad.p
>>> rx, _akuu.prx, _962y.frq, _al2o_66.del, _

Re: de-boosting certain facets during search

2009-11-10 Thread Paul Rosen

Thanks Erik,

Your suggestion below works great.

And we do want a particularly relevant Citation to appear higher in the 
list.


I'm guessing that the value of the boost (you've given "5" in your 
example) is important to getting the Citations to be just high enough.


Is there a way for me to determine, in a generic way, what a good value 
for that boost would be? Since there are an infinite number of possible 
queries the user can make, I don't think trial and error is particularly 
useful.


Are there any rules-of-thumb for determining that number?

Erik Hatcher wrote:

Paul,

Inline below...

On Nov 9, 2009, at 6:28 PM, Paul Rosen wrote:
If I could just create the desired URL, I can probably work backwards 
and construct the correct ruby call.


Right, this list will always serve you best if you take the Ruby out of 
the equation.  solr-ruby, while cool and all, isn't very well known by 
many, but Solr URLs are universal lingo here.



http://localhost:8983/solr/resources/select?hl.fragsize=600
&hl=true
&facet.field=genre
&facet.field=archive
&facet.limit=-1
&qt=standard
&start=0
&fq=archive%3A%22blake%22
&hl.fl=text
&fl=uri%2Carchive%2Cdate_label%2Cgenre
&facet=true
&q=%28history%29
&rows=60
&facet.missing=true
&facet.mincount=1

What this search returns from my index is 53 hits. The first 43 
contain the genre field value "Citation" and the last 10 do not (they 
contain other values in that field.)


Note: the genre field is multivalued, if that matters.


It matters if you want to sort by genre.  It doesn't make sense to sort 
by a multivalued field though.


I'd like the search to put all of the objects that contain genre 
"Citation" below the 10 objects that do not contain that genre.


Are you dogmatic about them _all_ appearing below?  Or might it be ok if 
a Citation that has substantially better term matching than another type 
of object appear ahead in the results?


I've read the various pages on boosting, but since I'm not actively 
searching on the field that I want to put a boost value on, I'm not 
sure how to go about this.


How this is done is dependent on the query parser.  You're using the 
Lucene query parser.  Something like this might work for you:



http://localhost:8983/solr/select?q=ipod%20%20OR%20%28ipod%20-manu:Belkin%29^5&debugQuery=true 



un-urlencoded, that is q=ipod OR (ipod -manu:Belkin)^5, where the user's
query is repeated in a second clause that boosts up all documents that
are not of a particular manufacturer, using the example docs that Solr
ships with.
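Adapted to your genre case (a sketch, reusing the query from your URL):

q=(history) OR ((history) -genre:Citation)^5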


Be sure to use debugQuery=true to look at the score explanations (try 
looking at the output in the wt=ruby&indent=on format for best 
readability).


Additionally...




Thanks for any hints.

Paul Rosen wrote:

Hi,
I'm using solr-ruby-0.0.8 and solr 1.4.
My data contains a faceted field called "genre". We would like one 
particular genre, (the one named "Citation") to show up last in the 
results.
I'm having trouble figuring out how to add the boost parameter to the 
solr-ruby call. Here is my code:

req = Solr::Request::Standard.new(:start => start,
 :rows => max,
 :sort => sort_param,
 :query => query,
 :filter_queries => filter_queries,
 :field_list => @field_list,
 :facets => {:fields => @facet_fields,
   :mincount => 1,
   :missing => true,
   :limit => -1},
 :highlighting => {:field_list => ['text'],
   :fragment_size => 600},
   :shards => @cores)
response = @solr.send(req)
Do I just format it inside my query, like this:
query = query + "AND genre:Citation^.01"
or in filter_query, like this:
filter_queries.push("genre:Citation^.01")
or is there a hash parameter that I set?


filter queries (fq) do not contribute to the score, so boosting them 
makes no score difference at all.


(Note that the user can select Citation explicitly. I'll probably 
special case that.)

I've tried variations of the above, but I've had no luck so far.
Thanks,
Paul




Erik






Converting SortableIntField to Integer (Externalizing)

2009-11-10 Thread Chantal Ackermann

Hi all,

does anyone have a code snippet on how to convert the String representation
of a SortableIntField (or SortableLongField or else) to a
java.lang.Integer or int?


Input: String (cryptic, non human readable, value of a sint field)
Output: Integer or int

I would appreciate if anyone could give me some hint/pointer/input on 
how to do that. I couldn't find anything about it in the Javadoc for 
SortableIntField or by googling for it.




More in detail:
Requesting for example "interestingTerms" using the MoreLikeThisHandler 
returns a list of fields and terms where the terms' string values are 
not externalized. Thus, fields of type sint do not contain the actual 
int value but some cryptic (non-human-readable) string. I would like to 
extract the int value to create another query with it.



Thanks a lot,
Chantal


Re: tracking solr response time

2009-11-10 Thread bharath venkatesh
Thanks yonik .. will consider Jconsole

On Tue, Nov 10, 2009 at 7:01 PM, Yonik Seeley wrote:

> On Tue, Nov 10, 2009 at 8:07 AM, bharath venkatesh
>  wrote:
> > how much ram would be good enough for the Solr JVM  to run comfortably.
>
> It really depends on how much stuff is cached, what fields you facet
> and sort on, etc.
>
> It can be easier to measure than to try and calculate it.
> Run jconsole to see the memory use, do a whole bunch of queries that
> do all the faceting, sorting, and function queries you will do in
> production.  Then invoke GC a few times in rapid succession via
> jconsole and see how much memory is actually used.  Double that to
> account for a new index searcher being opened while the current one is
> still open (that's just the worst case for Solr 1.4... the average
> reopen case is better since many segments can be shared).  Add a
> little more for safety.
>
> -Yonik
> http://www.lucidimagination.com
>


understanding how solr/lucene handles a select query (to analyze where solr/lucene is taking time )

2009-11-10 Thread bharath venkatesh
Hi,
As mentioned in my previous post,
we are experiencing a delay (latency) for 15% of the requests to solr. The
delay is about 2-4 sec and sometimes even reaches 10 sec (noticed from the
apache tomcat logs where solr is running, so an internal network issue is
ruled out). To fix the problem we need to analyze where solr/lucene is
taking time, and for that we need to understand how solr/lucene handles a
select query (what are the methods being used). Is there any doc or link
which explains the same in detail?  We are planning to change the source
code to log the time each method takes while solr handles a request, so that
we can analyze where solr/lucene is taking time. I am not sure if this is
the right way (unless this is the only way). Is there any other way to
analyze where solr/lucene is taking time?

so we need to know two  things :
 1.how solr/lucene handles a select query (link or doc will do ) ?
 2. any way to  anaylse where solr/lucene is taking time  ?

Thanks in Advance,
Bharath


[ANN] Solr 1.4.0 Released

2009-11-10 Thread Grant Ingersoll
Apache Solr 1.4 has been released and is now available for public  
download!

http://www.apache.org/dyn/closer.cgi/lucene/solr/

Solr is the popular, blazing fast open source enterprise search
platform from the Apache Lucene project.  Its major features include
powerful full-text search, hit highlighting, faceted search, dynamic
clustering, database integration, and rich document (e.g., Word, PDF)
handling.  Solr is highly scalable, providing distributed search and
index replication, and it powers the search and navigation features of
many of the world's largest internet sites.

Solr is written in Java and runs as a standalone full-text search server
within a servlet container such as Tomcat.  Solr uses the Lucene Java
search library at its core for full-text indexing and search, and has
REST-like HTTP/XML and JSON APIs that make it easy to use from virtually
any programming language.  Solr's powerful external configuration  
allows it to

be tailored to almost any type of application without Java coding, and
it has an extensive plugin architecture when more advanced
customization is required.


New Solr 1.4 features include
- Major performance enhancements in indexing, searching, and faceting
- Revamped all-Java index replication that's simple to configure and
can replicate config files
- Greatly improved database integration via the DataImportHandler
- Rich document processing (Word, PDF, HTML) via Apache Tika
- Dynamic search results clustering via Carrot2
- Multi-select faceting (support for multiple items in a single
category to be selected)
- Many powerful query enhancements, including ranges over arbitrary
functions, and nested queries of different syntaxes
- Many other plugins including Terms for auto-suggest, Statistics,
TermVectors, Deduplication

Getting Started

New to Solr?  Follow the steps below to get up and running ASAP.

1. Download Solr at http://www.apache.org/dyn/closer.cgi/lucene/solr/
2. Check out the tutorial at http://lucene.apache.org/solr/tutorial.html
3. Read the Solr wiki (http://wiki.apache.org/solr) to learn more
4. Join the community by subscribing to solr-user@lucene.apache.org
5. Give Back (Optional, but encouraged!)  See 
http://wiki.apache.org/solr/HowToContribute

For more information on Apache Solr, see http://lucene.apache.org/solr

Selection of terms for MoreLikeThis

2009-11-10 Thread Andrew Clegg

Hi,

If I run a MoreLikeThis query like the following:

http://www.cathdb.info/solr/mlt?q=id:3.40.50.720&rows=0&mlt.interestingTerms=list&mlt.match.include=false&mlt.fl=keywords&mlt.mintf=1&mlt.mindf=1

one of the hits in the results is "and" (I don't do any stopword removal on
this field).

However if I look inside that document with the TermVectorComponent:

http://www.cathdb.info/solr/select/?q=id:3.40.50.720&tv=true&tv.all=true&tv.fl=keywords

I see that "and" has a measly tf.idf of 7.46E-4. But there are other terms
with *much* higher tf.idf scores, e.g.:


tf: 1
df: 10
tf-idf: 0.1


that *don't* appear in the MoreLikeThis list. (I tried adding &mlt.maxwl=999
to the end of the MLT query but it makes no difference.)

What's going on? Surely something with tf.idf = 0.1 is a far better
candidate for a MoreLikeThis query than something with tf.idf = 1.46E-4? Or
does MoreLikeThis do some other heuristic magic to select good candidates,
and sometimes get it wrong?

BTW the keywords field is indexed, stored, multi-valued and term-vectored.

Thanks,

Andrew.

-- 
:: http://biotext.org.uk/ ::

-- 
View this message in context: 
http://old.nabble.com/Selection-of-terms-for-MoreLikeThis-tp26286005p26286005.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Highlighting is very slow

2009-11-10 Thread Nicolas Dessaigne
I'm afraid there is no perfect solution for this problem, as you may always
have very long documents that will result in long response times, even with
a faster implementation (see https://issues.apache.org/jira/browse/SOLR-1268
).

The only way to avoid confusion for users and to ensure acceptable response
times is to truncate the indexed field. This way, every document returned
can be highlighted... but you'll miss matches in long documents!

If you don't control the length of the documents and need highlighting,
either you don't highlight all documents, or you don't find all documents. I
think that a pretty large copyField (maybe 50k chars?) is usually enough for
most documents to be highlighted, but that depends on your corpus.
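A minimal schema.xml sketch of that truncated copyField (field names and
the limit are illustrative):

  <field name="body"    type="text" indexed="true" stored="false"/>
  <field name="body_hl" type="text" indexed="true" stored="true"/>

  <copyField source="body" dest="body_hl" maxChars="50000"/>

Search on body, but highlight on the truncated copy with hl.fl=body_hl.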

Good luck ;)
Nicolas


2009/11/9 Andrew Clegg 

>
>
> Nicolas Dessaigne wrote:
> >
> > Alternatively, you could use a copyfield with a maxChars limit as your
> > highlighting field. Works well in my case.
> >
>
> Thanks for the tip. We did think about doing something similar (only
> enabling highlighting for certain shorter fields) but we decided that
> perhaps users would be confused if search terms were sometimes
> snippeted+highlighted and sometimes not. (A brief run through with a single
> user suggested this, although that's not statistically significant...) So
> we
> decided to avoid highlighting altogether until we can do it across the
> board.
>
> Cheers,
>
> Andrew.
> --
> View this message in context:
> http://old.nabble.com/Highlighting-is-very-slow-tp26160216p26267441.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Re: Converting SortableIntField to Integer (Externalizing)

2009-11-10 Thread Yonik Seeley
On Tue, Nov 10, 2009 at 10:26 AM, Chantal Ackermann
 wrote:
> has anyone some code snippet on how to convert the String representation of
> a SortableIntField (or SortableLongField or else) to a java.lang.Integer or
> int?

FieldType.indexedToReadable()
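A minimal sketch of using it (assumes you are in code that has access to
the core's schema, e.g. a custom handler; the field name is illustrative):

IndexSchema schema = req.getSchema();                 // req = the SolrQueryRequest
FieldType ft = schema.getFieldType("mySintField");    // the sint field
String readable = ft.indexedToReadable(indexedForm);  // the cryptic indexed term
int value = Integer.parseInt(readable);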

-Yonik
http://www.lucidimagination.com


[Fwd: [ANN] Solr 1.4.0 Released]

2009-11-10 Thread Sean Timm


--- Begin Message ---
Apache Solr 1.4 has been released and is now available for public  
download!

http://www.apache.org/dyn/closer.cgi/lucene/solr/

Solr is the popular, blazing fast open source enterprise search
platform from the Apache Lucene project.  Its major features include
powerful full-text search, hit highlighting, faceted search, dynamic
clustering, database integration, and rich document (e.g., Word, PDF)
handling.  Solr is highly scalable, providing distributed search and
index replication, and it powers the search and navigation features of
many of the world's largest internet sites.

Solr is written in Java and runs as a standalone full-text search server
within a servlet container such as Tomcat.  Solr uses the Lucene Java
search library at its core for full-text indexing and search, and has
REST-like HTTP/XML and JSON APIs that make it easy to use from virtually
any programming language.  Solr's powerful external configuration  
allows it to

be tailored to almost any type of application without Java coding, and
it has an extensive plugin architecture when more advanced
customization is required.


New Solr 1.4 features include
- Major performance enhancements in indexing, searching, and faceting
- Revamped all-Java index replication that's simple to configure and
can replicate config files
- Greatly improved database integration via the DataImportHandler
- Rich document processing (Word, PDF, HTML) via Apache Tika
- Dynamic search results clustering via Carrot2
- Multi-select faceting (support for multiple items in a single
category to be selected)
- Many powerful query enhancements, including ranges over arbitrary
functions, and nested queries of different syntaxes
- Many other plugins including Terms for auto-suggest, Statistics,
TermVectors, Deduplication

Getting Started

New to Solr?  Follow the steps below to get up and running ASAP.

1. Download Solr at http://www.apache.org/dyn/closer.cgi/lucene/solr/
2. Check out the tutorial at http://lucene.apache.org/solr/tutorial.html
3. Read the Solr wiki (http://wiki.apache.org/solr) to learn more
4. Join the community by subscribing to solr-user@lucene.apache.org
5. Give Back (Optional, but encouraged!)  See 
http://wiki.apache.org/solr/HowToContribute

For more information on Apache Solr, see http://lucene.apache.org/solr
--- End Message ---


Re: Configuring 1.4 - multi master setup?

2009-11-10 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Tue, Nov 10, 2009 at 7:58 PM, Walter Underwood  wrote:
> Replication creates very little load on the master, so you should not need
> to have a separate machine just to handle the replication.
>
> Why do you think you need that?
Correct.

A repeater is set up when your main master is not located in the same LAN.
>
> wunder
>
> On Nov 10, 2009, at 5:37 AM, Kevin Jackson wrote:
>
>> Hi,
>>
>> 2009/11/10 Noble Paul നോബിള്‍  नोब्ळ् :
>>>
>>> see the setting up a repeater section in this page
>>>
>>> http://wiki.apache.org/solr/SolrReplication
>>
>> Doh!
>>
>> Sorry for the noise
>>
>> Thanks,
>> Kev
>>
>
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Configuring 1.4 - multi master setup?

2009-11-10 Thread Walter Underwood
If the master and slaves are separated by a WAN, sure, but Kev wants  
all the slaves to go to a single repeater in order to "reduce  
polling", so I doubt this is a WAN issue.


Just trying to keep the configuration simple. Only use a repeater if  
you actually need it.


wunder

On Nov 10, 2009, at 10:31 AM, Noble Paul നോബിള്‍  
नोब्ळ् wrote:


On Tue, Nov 10, 2009 at 7:58 PM, Walter Underwood > wrote:
Replication creates very little load on the master, so you should  
not need

to have a separate machine just to handle the replication.

Why do you think you need that?

correct.

A repeater is setup when your main master is not located in the same  
LAN


wunder

On Nov 10, 2009, at 5:37 AM, Kevin Jackson wrote:


Hi,

2009/11/10 Noble Paul നോബിള്‍  नोब्ळ्  
:


see the setting up a repeater section in this page

http://wiki.apache.org/solr/SolrReplication


Doh!

Sorry for the noise

Thanks,
Kev








--
-
Noble Paul | Principal Engineer| AOL | http://aol.com





HTMLStripCharFilterFactory not working when using SolrJ java client

2009-11-10 Thread aseem cheema
Hey Guys,
I have HTMLStripCharFilterFactory char filter declared in my
schema.xml for fieldType text (code below). I am using this field type
for body field of my schema. I am seeing different behavior when I use
SolrJ to post a document (code below) and when I use the analysis.jsp.
The text I am putting in the field is <html><body>content</body></html>.

When SolrJ is used, the field gets the whole value
<html><body>content</body></html>, but when analysis.jsp is used, it shows
only "content" being used for the field.

What am I possibly doing wrong here? How do I get
HTMLStripCharFilterFactory to work, even if I am pushing data using
SolrJ. Thanks.

Your help is highly appreciated.
Thanks
-- 
Aseem

# schema.xml ##
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.SnowballPorterFilterFactory" protected="protwords.txt"/>
  </analyzer>
</fieldType>

## SolrJ Code ##
CommonsHttpSolrServer server = new CommonsHttpSolrServer(
    "http://aseem.desktop.amazon.com:8983/solr/sharepoint");
SolrInputDocument doc = new SolrInputDocument();
UpdateRequest req = new UpdateRequest();
doc.addField("url", "http://haha.com");
// doc.addField("body", sbr.toString());
doc.addField("body", "<html><body>content</body></html>");
req.add(doc);
req.setAction(ACTION.COMMIT, false, false);
UpdateResponse resp = req.process(server);
System.out.println(resp);


Re: [Fwd: [ANN] Solr 1.4.0 Released]

2009-11-10 Thread Sean Timm
Apologies.  Meant to forward the message to a corporate internal list.  
I blame my e-mail address auto-complete. ;-)


Sean Timm wrote:





Subject:
[ANN] Solr 1.4.0 Released
From:
Grant Ingersoll 
Date:
Tue, 10 Nov 2009 11:01:27 -0500
To:
solr-user@lucene.apache.org, gene...@lucene.apache.org, 
solr-...@lucene.apache.org, annou...@apache.org





Apache Solr 1.4 has been released and is now available for public 
download!

http://www.apache.org/dyn/closer.cgi/lucene/solr/

Solr is the popular, blazing fast open source enterprise search
platform from the Apache Lucene project.  Its major features include
powerful full-text search, hit highlighting, faceted search, dynamic
clustering, database integration, and rich document (e.g., Word, PDF)
handling.  Solr is highly scalable, providing distributed search and
index replication, and it powers the search and navigation features of
many of the world's largest internet sites.

Solr is written in Java and runs as a standalone full-text search server
within a servlet container such as Tomcat.  Solr uses the Lucene Java
search library at its core for full-text indexing and search, and has
REST-like HTTP/XML and JSON APIs that make it easy to use from virtually
any programming language.  Solr's powerful external configuration 
allows it to

be tailored to almost any type of application without Java coding, and
it has an extensive plugin architecture when more advanced
customization is required.


New Solr 1.4 features include
- Major performance enhancements in indexing, searching, and faceting
- Revamped all-Java index replication that's simple to configure and
can replicate config files
- Greatly improved database integration via the DataImportHandler
- Rich document processing (Word, PDF, HTML) via Apache Tika
- Dynamic search results clustering via Carrot2
- Multi-select faceting (support for multiple items in a single
category to be selected)
- Many powerful query enhancements, including ranges over arbitrary
functions, and nested queries of different syntaxes
- Many other plugins including Terms for auto-suggest, Statistics,
TermVectors, Deduplication

Getting Started

New to Solr?  Follow the steps below to get up and running ASAP.

1. Download Solr at http://www.apache.org/dyn/closer.cgi/lucene/solr/
2. Check out the tutorial at http://lucene.apache.org/solr/tutorial.html
3. Read the Solr wiki (http://wiki.apache.org/solr) to learn more
4. Join the community by subscribing to solr-user@lucene.apache.org 

5. Give Back (Optional, but encouraged!) 
 See http://wiki.apache.org/solr/HowToContribute
 
For more information on Apache Solr, see http://lucene.apache.org/solr


How to Post Search Query

2009-11-10 Thread deepak agrawal
Hi All,

My Solr search query is too long, so I am not able to send it through the
GET method. I want to send it through the POST method instead.
Is there any way I can send the search query through the POST method?

-- 
DEEPAK AGRAWAL
+91-9379433455
GOOD LUCK.


RE: How to Post Search Query

2009-11-10 Thread Ankit Bhatnagar

Hi Deepak,
U can specify - METHOD.POST 

-Ankit

-Original Message-
From: deepak agrawal [mailto:dk.a...@gmail.com] 
Sent: Tuesday, November 10, 2009 3:08 PM
To: solr-user@lucene.apache.org
Subject: How to Post Search Query

Hi All,

My Solr search query is too long, so I am not able to send it through the
GET method. I want to send it through the POST method instead.
Is there any way I can send the search query through the POST method?

-- 
DEEPAK AGRAWAL
+91-9379433455
GOOD LUCK.


any docs on solr.EdgeNGramFilterFactory?

2009-11-10 Thread Peter Wolanin
This fairly recent blog post:

http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/

describes the use of the solr.EdgeNGramFilterFactory in the index-time
analyzer.  I don't see any mention of that filter on the Solr
wiki - is it just waiting to be added, or is there any other
documentation in addition to the blog post?  In particular, there was
a thread last year about using an N-gram tokenizer to enable
reasonable (if not ideal) searching of CJK text, so I'd be curious to
know how people are configuring their schema (with this filter?)
for that use case.
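From the blog post, I gather the schema looks something like this sketch
(the type name and gram sizes are illustrative):

  <fieldType name="autocomplete" class="solr.TextField">
    <analyzer type="index">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>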

Thanks,

Peter

-- 
Peter M. Wolanin, Ph.D.
Momentum Specialist,  Acquia. Inc.
peter.wola...@acquia.com


Re: HTMLStripCharFilterFactory not working when using SolrJ java client

2009-11-10 Thread aseem cheema
I printed the UpdateRequest object (getXML) and the XML is:

<add><doc><field name="url">http://haha.com</field><field
name="body">&lt;html&gt;&lt;body&gt;content&lt;/body&gt;&lt;/html&gt;</field></doc></add>

I can see that the issue is because the HTML/XML <> are replaced by &lt;
&gt;. I understand that it is required to do so to keep them from
interfering with the solr xml document, but how do I accomplish what I want
to? I need to get the html in the body field stripped out.

Any help is highly appreciated.
Thanks
Aseem

On Tue, Nov 10, 2009 at 10:56 AM, aseem cheema wrote:
> Hey Guys,
> I have HTMLStripCharFilterFactory char filter declared in my
> schema.xml for fieldType text (code below). I am using this field type
> for body field of my schema. I am seeing different behavior when I use
> SolrJ to post a document (code below) and when I use the analysis.jsp.
> The text I am putting in the field is <html><body>content</body></html>.
>
> When SolrJ is used, the field gets the whole value
> <html><body>content</body></html>, but when analysis.jsp is used, it
> shows only "content" being used for the field.
>
> What am I possibly doing wrong here? How do I get
> HTMLStripCharFilterFactory to work, even if I am pushing data using
> SolrJ. Thanks.
>
> Your help is highly appreciated.
> Thanks
> --
> Aseem
>
> # schema.xml ##
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <charFilter class="solr.HTMLStripCharFilterFactory"/>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords.txt" enablePositionIncrements="true"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>             generateNumberParts="1" catenateWords="1" catenateNumbers="1"
>             catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>             ignoreCase="true" expand="true"/>
>     <filter class="solr.SnowballPorterFilterFactory" protected="protwords.txt"/>
>   </analyzer>
> </fieldType>
>
> ## SolrJ Code ##
> CommonsHttpSolrServer server = new CommonsHttpSolrServer(
>     "http://aseem.desktop.amazon.com:8983/solr/sharepoint");
> SolrInputDocument doc = new SolrInputDocument();
> UpdateRequest req = new UpdateRequest();
> doc.addField("url", "http://haha.com");
> // doc.addField("body", sbr.toString());
> doc.addField("body", "<html><body>content</body></html>");
> req.add(doc);
> req.setAction(ACTION.COMMIT, false, false);
> UpdateResponse resp = req.process(server);
> System.out.println(resp);

Field settings for best highlighting performance

2009-11-10 Thread Jake Brownell
Hi,

I've seen the use case for highlighting on:

http://wiki.apache.org/solr/FieldOptionsByUseCase

I just wanted to confirm that for best performance

Indexed=true
Stored=true
termVectors=true
termPositions=true

is the way to go for highlighting for Solr 1.4. Note that I'm not doing 
anything else with this field, it's just for highlighting.
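i.e. a field declaration like this sketch (the field name is illustrative;
the wiki page also lists termOffsets for this use case):

  <field name="content_hl" type="text" indexed="true" stored="true"
         termVectors="true" termPositions="true" termOffsets="true"/>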

Congratulations on the release; I'm particularly excited because it came
soon enough to be included in our launch of full-text search integration.

Thanks,
Jake


RE: Segment file not found error - after replicating

2009-11-10 Thread Maduranga Kannangara
Thanks Otis,

I did the du -s for all three index directories as you said, right after
replicating and when I found the errors.

All three gave me the exact same value. This time I found the error in a
rather small index too (31MB).

BTW, if I copy the segments_x file to what Solr is looking for, and restart
the Solr web-app from the Tomcat manager, this resolves the problem. But
it's just a workaround, never good enough for production deployments.

My next plan is to do a remote debug to see what exactly is happening in the code.

Any other things I should be looking at?
Any help is really appreciated on this matter.

Thanks
Madu


-Original Message-
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
Sent: Tuesday, 10 November 2009 1:14 PM
To: solr-user@lucene.apache.org
Subject: Re: Segment file not found error - after replicating

Madu,

So are you saying that all slaves have the exact same index, and that index is 
exactly the same as the one on the master, yet only some of those slaves 
exhibit this error, while others do not?  Mind listing index directories of 1) 
master 2) slave without errors, 3) slave with errors and doing:
du -s /path/to/index/on/master
du -s /path/to/index/on/slave/without/errors
du -s /path/to/index/on/slave/with/errors


Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
> From: Maduranga Kannangara 
> To: "solr-user@lucene.apache.org" 
> Sent: Mon, November 9, 2009 7:47:04 PM
> Subject: RE: Segment file not found error - after replicating
>
> Thanks Otis!
>
> Yes, I checked the index directories and they are 100% same, both timestamp 
> and
> size wise.
>
> Not all the slaves face this issue. I would say roughly 50% has this trouble.
>
> Logs do not have any errors too :-(
>
> Any other things I should do/look at?
>
> Cheers
> Madu
>
>
> -Original Message-
> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
> Sent: Tuesday, 10 November 2009 9:26 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Segment file not found error - after replicating
>
> It's hard to troubleshoot blindly like this, but have you tried manually
> comparing the contents of the index dir on the master and on the slave(s)?
> If they are out of sync, have you tried forcing of replication to see if one 
> of
> the subsequent replication attempts gets the dirs in sync?
> Do you have more than 1 slave and do they all start having this problem at the
> same time?
> Any errors in the logs for any of the scripts involved in replication in 1.3?
>
> Otis
> --
> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
>
>
>
> - Original Message 
> > From: Maduranga Kannangara
> > To: "solr-user@lucene.apache.org"
> > Sent: Sun, November 8, 2009 10:30:44 PM
> > Subject: Segment file not found error - after replicating
> >
> > Hi guys,
> >
> > We use Solr 1.3 for indexing large amounts of data (50G avg) on a Linux
> > environment and use the replication scripts to make replicas that live in
> > load balancing slaves.
> >
> > The issue we face quite often (only on Linux servers) is that they tend
> > to not be able to find the segment file (segments_x etc.) after the
> > replication completed. As this has become quite common, we have started
> > hitting a serious issue.
> >
> > Below is a stack trace, if that helps and any help on this matter is greatly
> > appreciated.
> >
> > 
> >
> > Nov 5, 2009 11:34:46 PM org.apache.solr.util.plugin.AbstractPluginLoader 
> > load
> > INFO: created /admin/: org.apache.solr.handler.admin.AdminHandlers
> > Nov 5, 2009 11:34:46 PM org.apache.solr.util.plugin.AbstractPluginLoader 
> > load
> > INFO: created /admin/ping: org.apache.solr.handler.PingRequestHandler
> > Nov 5, 2009 11:34:46 PM org.apache.solr.util.plugin.AbstractPluginLoader 
> > load
> > INFO: created /debug/dump: org.apache.solr.handler.DumpRequestHandler
> > Nov 5, 2009 11:34:46 PM org.apache.solr.util.plugin.AbstractPluginLoader 
> > load
> > INFO: created gap: org.apache.solr.highlight.GapFragmenter
> > Nov 5, 2009 11:34:46 PM org.apache.solr.util.plugin.AbstractPluginLoader 
> > load
> > INFO: created regex: org.apache.solr.highlight.RegexFragmenter
> > Nov 5, 2009 11:34:46 PM org.apache.solr.util.plugin.AbstractPluginLoader 
> > load
> > INFO: created html: org.apache.solr.highlight.HtmlFormatter
> > Nov 5, 2009 11:34:46 PM org.apache.solr.servlet.SolrDispatchFilter init
> > SEVERE: Could not start SOLR. Check solr/home property
> > java.lang.RuntimeException: java.io.FileNotFoundException:
> > /solrinstances/solrhome01/data/index/segments_v (No such file or directory)
> > at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:960)
> > at org.apache.solr.core.SolrCore.(SolrCore.java:470)
> > at
> >
> org.apache.solr.core.CoreContainer$Initializer.initiali

Request assistance with distributed search multi shard/core setup and configuration

2009-11-10 Thread Turner, Robbin J
I've been looking through all the documentation.  I've set up a single solr 
instance, and one multicore instance.  If someone would be willing to share 
some configuration examples and/or advice for setting up solr for distributing 
the search, I would really appreciate it.  I've read that there is a way to do 
it, but most of the current documentation doesn't provide enough examples on 
what to do with solr.xml, and the solrconfig.xml.  Also, I'm using tomcat 6 for 
the servlet container.  I deployed the solr 1.4.0 released yesterday.

Thanks
RJ



Re: Request assistance with distributed search multi shard/core setup and configuration

2009-11-10 Thread Otis Gospodnetic
RJ,

You may want to take a simpler step - single Solr core (no solr.xml needed) per 
machine.  Then distributed search really only requires that you specify shard 
URLs in the URL of the search requests.  In practice/production you rarely 
benefit from distributed search against multiple cores on the same server 
anyway.
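
For example (a sketch with hypothetical host names), a distributed request is 
an ordinary query plus a shards parameter listing host:port/context for each 
core to search:

http://host1:8983/solr/select?q=ipod&shards=host1:8983/solr,host2:8983/solr

Note that the shard entries themselves omit the http:// prefix.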

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR





From: "Turner, Robbin J" 
To: "solr-user@lucene.apache.org" 
Sent: Tue, November 10, 2009 5:58:52 PM
Subject: Request assistance with distributed search multi shard/core setup and 
configuration

I've been looking through all the documentation.  I've set up a single solr 
instance, and one multicore instance.  If someone would be willing to share 
some configuration examples and/or advice for setting up solr for distributing 
the search, I would really appreciate it.  I've read that there is a way to do 
it, but most of the current documentation doesn't provide enough examples on 
what to do with solr.xml, and the solrconfig.xml.  Also, I'm using tomcat 6 for 
the servlet container.  I deployed the solr 1.4.0 released yesterday.

Thanks
RJ


Re: Segment file not found error - after replicating

2009-11-10 Thread Otis Gospodnetic
It sounds like your index is not being fully replicated.  I can't tell why, but 
I can suggest you try the new Solr 1.4 replication.
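
The new Java-based replication is configured through the ReplicationHandler in 
solrconfig.xml. As a sketch (master host and poll interval are just examples), 
the slave side looks like:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>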

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
> From: Maduranga Kannangara 
> To: "solr-user@lucene.apache.org" 
> Sent: Tue, November 10, 2009 5:42:44 PM
> Subject: RE: Segment file not found error - after replicating
> 
> Thanks Otis,
> 
> I did the du -s for all three index directories as you said right after 
> replicating and when I find errors.
> 
> All three gave me the exact same value. This time I found the error in a 
> rather 
> small index too (31Mb).
> 
> BTW, if I copy the segments_x file to the location Solr is looking for, and 
> restart the Solr web-app from the Tomcat manager, the problem resolves. But 
> it's just a workaround, never good enough for production deployments.
> 
> My next plan is to do a remote debug to see what exactly is happening in the 
> code.
> 
> Any other things I should be looking at?
> Any help is really appreciated on this matter.
> 
> Thanks
> Madu
> 
> 
> -Original Message-
> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
> Sent: Tuesday, 10 November 2009 1:14 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Segment file not found error - after replicating
> 
> Madu,
> 
> So are you saying that all slaves have the exact same index, and that index 
> is 
> exactly the same as the one on the master, yet only some of those slaves 
> exhibit 
> this error, while others do not?  Mind listing index directories of 1) master 
> 2) 
> slave without errors, 3) slave with errors and doing:
> du -s /path/to/index/on/master
> du -s /path/to/index/on/slave/without/errors
> du -s /path/to/index/on/slave/with/errors
> 
> 
> Otis
> --
> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
> 
> 
> 
> - Original Message 
> > From: Maduranga Kannangara 
> > To: "solr-user@lucene.apache.org" 
> > Sent: Mon, November 9, 2009 7:47:04 PM
> > Subject: RE: Segment file not found error - after replicating
> >
> > Thanks Otis!
> >
> > Yes, I checked the index directories and they are 100% same, both timestamp 
> and
> > size wise.
> >
> > Not all the slaves face this issue. I would say roughly 50% has this 
> > trouble.
> >
> > Logs do not have any errors too :-(
> >
> > Any other things I should do/look at?
> >
> > Cheers
> > Madu
> >
> >
> > -Original Message-
> > From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
> > Sent: Tuesday, 10 November 2009 9:26 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Segment file not found error - after replicating
> >
> > It's hard to troubleshoot blindly like this, but have you tried manually
> > comparing the contents of the index dir on the master and on the slave(s)?
> > If they are out of sync, have you tried forcing of replication to see if 
> > one 
> of
> > the subsequent replication attempts gets the dirs in sync?
> > Do you have more than 1 slave and do they all start having this problem at 
> > the
> > same time?
> > Any errors in the logs for any of the scripts involved in replication in 
> > 1.3?
> >
> > Otis
> > --
> > Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> > Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
> >
> >
> >
> > - Original Message 
> > > From: Maduranga Kannangara
> > > To: "solr-user@lucene.apache.org"
> > > Sent: Sun, November 8, 2009 10:30:44 PM
> > > Subject: Segment file not found error - after replicating
> > >
> > > Hi guys,
> > >
> > > We use Solr 1.3 for indexing large amounts of data (50G avg) in a Linux
> > > environment and use the replication scripts to make replicas that live in
> > > load-balancing slaves.
> > >
> > > The issue we face quite often (only on Linux servers) is that they tend to
> > > not be able to find the segment file (segments_x etc.) after the replication
> > > completes. As this has become quite common, we have started hitting a serious
> > > issue.
> > >
> > > Below is a stack trace, if that helps and any help on this matter is 
> > > greatly
> > > appreciated.
> > >
> > > 
> > >
> > > Nov 5, 2009 11:34:46 PM org.apache.solr.util.plugin.AbstractPluginLoader 
> load
> > > INFO: created /admin/: org.apache.solr.handler.admin.AdminHandlers
> > > Nov 5, 2009 11:34:46 PM org.apache.solr.util.plugin.AbstractPluginLoader 
> load
> > > INFO: created /admin/ping: org.apache.solr.handler.PingRequestHandler
> > > Nov 5, 2009 11:34:46 PM org.apache.solr.util.plugin.AbstractPluginLoader 
> load
> > > INFO: created /debug/dump: org.apache.solr.handler.DumpRequestHandler
> > > Nov 5, 2009 11:34:46 PM org.apache.solr.util.plugin.AbstractPluginLoader 
> load
> > > INFO: created gap: org.apache.solr.highlight.GapFragmenter
> > > Nov 5, 2009 11:34:46 PM org.apache.solr.u

RE: Request assistance with distributed search multi shard/core setup and configuration

2009-11-10 Thread Turner, Robbin J
I've already done the single Solr setup; that's why my request.  I read on some site 
that there is a way to set up the configuration so I can send a query to one 
Solr instance and have it pass it on or distribute it across all the instances.

Btw, thanks for the quick reply.
RJ 

-Original Message-
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] 
Sent: Tuesday, November 10, 2009 6:02 PM
To: solr-user@lucene.apache.org
Subject: Re: Request assistance with distributed search multi shard/core setup 
and configuration

RJ,

You may want to take a simpler step - single Solr core (no solr.xml needed) per 
machine.  Then distributed search really only requires that you specify shard 
URLs in the URL of the search requests.  In practice/production you rarely 
benefit from distributed search against multiple cores on the same server 
anyway.

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR





From: "Turner, Robbin J" 
To: "solr-user@lucene.apache.org" 
Sent: Tue, November 10, 2009 5:58:52 PM
Subject: Request assistance with distributed search multi shard/core setup and 
configuration

I've been looking through all the documentation.  I've set up a single solr 
instance, and one multicore instance.  If someone would be willing to share 
some configuration examples and/or advice for setting up solr for distributing 
the search, I would really appreciate it.  I've read that there is a way to do 
it, but most of the current documentation doesn't provide enough examples on 
what to do with solr.xml, and the solrconfig.xml.  Also, I'm using tomcat 6 for 
the servlet container.  I deployed the solr 1.4.0 released yesterday.

Thanks
RJ


Re: any docs on solr.EdgeNGramFilterFactory?

2009-11-10 Thread Otis Gospodnetic
Peter,

For CJK and n-grams, I think you don't want the *Edge* n-grams, but just 
n-grams.
Before you take the n-gram route, you may want to look at the smart Chinese 
analyzer in Lucene contrib (I think it works only for Simplified Chinese) and 
Sen (on java.net).  I also spotted a Korean analyzer in the wild a few months 
back.

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
> From: Peter Wolanin 
> To: solr-user@lucene.apache.org
> Sent: Tue, November 10, 2009 4:06:52 PM
> Subject: any docs on solr.EdgeNGramFilterFactory?
> 
> This fairly recent blog post:
> 
> http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/
> 
> describes the use of the solr.EdgeNGramFilterFactory as the tokenizer
> for the index.  I don't see any mention of that tokenizer on the Solr
> wiki - is it just waiting to be added, or is there any other
> documentation in addition to the blog post?  In particular, there was
> a thread last year about using an N-gram tokenizer to enable
> reasonable (if not ideal) searching of CJK text, so I'd be curious to
> know how people are configuring their schema (with this tokenizer?)
> for that use case.
> 
> Thanks,
> 
> Peter
> 
> -- 
> Peter M. Wolanin, Ph.D.
> Momentum Specialist,  Acquia. Inc.
> peter.wola...@acquia.com



Re: Request assistance with distributed search multi shard/core setup and configuration

2009-11-10 Thread Otis Gospodnetic
Right, that's http://wiki.apache.org/solr/DistributedSearch

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
> From: "Turner, Robbin J" 
> To: "solr-user@lucene.apache.org" 
> Sent: Tue, November 10, 2009 6:05:19 PM
> Subject: RE: Request assistance with distributed search multi shard/core  
> setup and configuration
> 
> I've already done the single Solr, that's why my request.  I read on some 
> site 
> that there is a way to set up the configuration so I can send a query to one 
> solr 
> instance and have it pass it on or distribute it across all the instances?
> 
> Btw, thanks for the quick reply.
> RJ 
> 
> -Original Message-
> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] 
> Sent: Tuesday, November 10, 2009 6:02 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Request assistance with distributed search multi shard/core 
> setup 
> and configuration
> 
> RJ,
> 
> You may want to take a simpler step - single Solr core (no solr.xml needed) 
> per 
> machine.  Then distributed search really only requires that you specify shard 
> URLs in the URL of the search requests.  In practice/production you rarely 
> benefit from distributed search against multiple cores on the same server 
> anyway.
> 
> Otis
> --
> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
> 
> 
> 
> 
> 
> From: "Turner, Robbin J" 
> To: "solr-user@lucene.apache.org" 
> Sent: Tue, November 10, 2009 5:58:52 PM
> Subject: Request assistance with distributed search multi shard/core setup 
> and 
> configuration
> 
> I've been looking through all the documentation.  I've set up a single solr 
> instance, and one multicore instance.  If someone would be willing to share 
> some 
> configuration examples and/or advice for setting up solr for distributing the 
> search, I would really appreciate it.  I've read that there is a way to do 
> it, 
> but most of the current documentation doesn't provide enough examples on what 
> to 
> do with solr.xml, and the solrconfig.xml.  Also, I'm using tomcat 6 for the 
> servlet container.  I deployed the solr 1.4.0 released yesterday.
> 
> Thanks
> RJ



Re: understanding how solr/lucene handles a select query (to analyze where solr/lucene is taking time )

2009-11-10 Thread Otis Gospodnetic
Hi,

I don't think there is anything inside Lucene/Solr that will give you granular 
timing information.  The only thing I can think of is using &debugQuery=true 
and looking at timing info for different search components.
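
For example (hypothetical host and query):

http://host:port/solr/select?q=ipod&debugQuery=true

Look for the "timing" section in the debug output; it breaks the request down 
into prepare/process times per search component (query, facet, highlight, 
debug).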

You're better off using a profiler, though such slow queries tend to be the 
result of some bad setup (config, JVM...)

 Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
> From: bharath venkatesh 
> To: solr-user@lucene.apache.org
> Sent: Tue, November 10, 2009 10:30:49 AM
> Subject: understanding how solr/lucene handles a select query (to analyze  
> where solr/lucene is taking time )
> 
> Hi,
> As mentioned in my previous post,
> we are experiencing a delay (latency) for 15% of the requests to Solr.
> The delay is about 2-4 sec and sometimes even reaches 10 sec (noticed from the
> Apache Tomcat logs where Solr is running, so an internal network issue is
> ruled out).
> To fix the problem we need to analyze where Solr/Lucene is taking time,
> and for that we need to understand how Solr/Lucene handles a select query
> (what methods are being used). Is there any doc or link which
> explains the same in detail? We are planning to change the source code
> to log the time each method takes while Solr handles a request so that we
> can analyze where Solr/Lucene is taking time. I am not sure if this is the
> right way (unless this is the only way). Is there any other way to
> analyze where Solr/Lucene is taking time?
> 
> So we need to know two things:
> 1. How does Solr/Lucene handle a select query (a link or doc will do)?
> 2. Is there any way to analyse where Solr/Lucene is taking time?
> 
> Thanks in Advance,
> Bharath



Re: tracking solr response time

2009-11-10 Thread Otis Gospodnetic
Hello,


- Original Message 
> From: bharath venkatesh 
> To: solr-user@lucene.apache.org
> Sent: Tue, November 10, 2009 8:07:59 AM
> Subject: Re: tracking solr response time
> 
> Otis,
> 
>    This means we have to leave enough space for the OS cache to cache the
> whole index. So in the case of a 16 GB index, if I am not wrong, at least
> 16 GB of memory must not be allocated to any application, so that the OS
> cache can utilize the memory.

No, on the contrary - you want to give the process (the JVM in this case) 
enough so it works comfortably, but not too much, since it is not the process 
itself that will load and cache your data/index, but the OS.
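
As a rough sketch (the numbers are only an example, not a recommendation): on 
a 16 GB box you might start Solr with

java -Xms1g -Xmx2g -jar start.jar

and leave the remaining RAM unallocated, so the OS can use it to cache the 
index files.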


Otis 
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR


> >> The operating systems are very good at maintaining this cache. It is
> > > usually better to give the Solr JVM enough memory to run comfortably
> > > and rely on the OS cache to optimize disk I/O, instead of giving it
> > > all available ram.
> 
> How much RAM would be good enough for the Solr JVM to run comfortably?
> 
> 
> thanks,
> Bharath
> 
> 
> On Tue, Nov 10, 2009 at 3:59 AM, Otis Gospodnetic <
> otis_gospodne...@yahoo.com> wrote:
> 
> > Bharat,
> >
> > No, you should not give the JVM so much memory.  Give it enough to avoid
> > overly frequent GC, but don't steal memory from the OS cache.
> >
> > Otis
> > --
> > Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> > Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
> >
> >
> >
> > - Original Message 
> > > From: bharath venkatesh 
> > > To: solr-user@lucene.apache.org
> > > Sent: Sun, November 8, 2009 2:15:00 PM
> > > Subject: Re: tracking solr response time
> > >
> > > Thanks Lance for the clear explanation. Are you saying we should give the
> > > Solr JVM enough memory so that the OS cache can optimize disk I/O
> > > efficiently? That means in our case we have a 16 GB index, so would it be
> > > enough to allocate the Solr JVM 20 GB of memory and rely on the OS cache
> > > to optimize disk I/O, i.e. cache the index in memory?
> > >
> > >
> > > below is stats related to cache
> > >
> > >
> > > name: queryResultCache
> > > class: org.apache.solr.search.LRUCache
> > > version: 1.0
> > > description: LRU Cache(maxSize=512, initialSize=512, autowarmCount=256,
> > > regenerator=org.apache.solr.search.solrindexsearche...@67e112b3)
> > > stats:
> > > lookups : 0
> > > hits : 0
> > > hitratio : 0.00
> > > inserts : 8
> > > evictions : 0
> > > size : 8
> > > cumulative_lookups : 15
> > > cumulative_hits : 7
> > > cumulative_hitratio : 0.46
> > > cumulative_inserts : 8
> > > cumulative_evictions : 0
> > >
> > > name: documentCache
> > > class: org.apache.solr.search.LRUCache
> > > version: 1.0
> > > description: LRU Cache(maxSize=512, initialSize=512)
> > > stats:
> > > lookups : 0
> > > hits : 0
> > > hitratio : 0.00
> > > inserts : 0
> > > evictions : 0
> > > size : 0
> > > cumulative_lookups : 744
> > > cumulative_hits : 639
> > > cumulative_hitratio : 0.85
> > > cumulative_inserts : 105
> > > cumulative_evictions : 0
> > >
> > > name: filterCache
> > > class: org.apache.solr.search.LRUCache
> > > version: 1.0
> > > description: LRU Cache(maxSize=512, initialSize=512, autowarmCount=256,
> > > regenerator=org.apache.solr.search.solrindexsearche...@1e3dbf67)
> > > stats:
> > > lookups : 0
> > > hits : 0
> > > hitratio : 0.00
> > > inserts : 20
> > > evictions : 0
> > > size : 12
> > > cumulative_lookups : 64
> > > cumulative_hits : 60
> > > cumulative_hitratio : 0.93
> > > cumulative_inserts : 12
> > > cumulative_evictions : 0
> > >
> > >
> > > Hits and hit ratio are zero for the document cache, filter cache and query
> > > cache; only cumulative hits and hit ratio have non-zero numbers. Is
> > > this how it is supposed to be, or do we need to configure it properly?
> > >
> > > Thanks,
> > > Bharath
> > >
> > >
> > >
> > >
> > >
> > > On Sat, Nov 7, 2009 at 5:47 AM, Lance Norskog wrote:
> > >
> > > > The OS cache is the memory used by the operating system (Linux or
> > > > Windows) to store a cache of the data stored on the disk. The cache is
> > > > usually keyed by block numbers and is not correlated to files. Disk blocks
> > > > that are not used by programs are slowly pruned from the cache.
> > > >
> > > > The operating systems are very good at maintaining this cache. It is
> > > > usually better to give the Solr JVM enough memory to run comfortably
> > > > and rely on the OS cache to optimize disk I/O, instead of giving it
> > > > all available ram.
> > > >
> > > > Solr has its own caches for certain data structures, and there are no
> > > > solid guidelines for tuning those. The solr/admin/stats.jsp page shows
> > > > the number of hits & deletes for the caches and most people just
> > > > reload that over & over.
> > > >
> > > > On Fri, Nov 6, 2009 at 3:09 AM, bharath venkatesh
> > > > wrote:
> > 

Re: HTMLStripCharFilterFactory not working when using SolrJ java client

2009-11-10 Thread aseem cheema
The HTMLStripCharFilterFactory class has a constructor that accepts
escapedTags. I believe this will solve my problem. But I am not sure
how to pass this from the schema.xml file. I have tried  but
that didn't work.
Anybody?
Thanks

On Tue, Nov 10, 2009 at 10:56 AM, aseem cheema  wrote:
> Hey Guys,
> I have HTMLStripCharFilterFactory char filter declared in my
> schema.xml for fieldType text (code below). I am using this field type
> for body field of my schema. I am seeing different behavior when I use
> SolrJ to post a document (code below) and when I use the analysis.jsp.
> The text I am putting in the field is content.
>
> When SolrJ is used, the field gets the whole value
> content, but when analysis.jsp is used, it shows only
> "content" being used for the field.
>
> What am I possibly doing wrong here? How do I get
> HTMLStripCharFilterFactory to work, even if I am pushing data using
> SolrJ. Thanks.
>
> Your help is highly appreciated.
> Thanks
> --
> Aseem
>
> # schema.xml ##
>   <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>     <analyzer>
>       <charFilter class="solr.HTMLStripCharFilterFactory"/>
>       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>       <filter class="solr.StopFilterFactory"
>               ignoreCase="true"
>               words="stopwords.txt"
>               enablePositionIncrements="true"
>               />
>       <filter class="solr.WordDelimiterFilterFactory"
>               generateWordParts="1" generateNumberParts="1" catenateWords="1"
>               catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>       <filter class="solr.LowerCaseFilterFactory"/>
>       <filter class="solr.SynonymFilterFactory"
>               synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>       <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
>       <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>     </analyzer>
>   </fieldType>
>
> ## SolrJ Code ##
>     CommonsHttpSolrServer server = new CommonsHttpSolrServer(
>         "http://aseem.desktop.amazon.com:8983/solr/sharepoint");
>     SolrInputDocument doc = new SolrInputDocument();
>     UpdateRequest req = new UpdateRequest();
>     doc.addField("url", "http://haha.com");
>     /* doc.addField("body", sbr.toString()); */
>     doc.addField("body", "content");
>     req.add(doc);
>     req.setAction(ACTION.COMMIT, false, false);
>     UpdateResponse resp = req.process(server);
>     System.out.println(resp);
>



-- 
Aseem


RE: Request assistance with distributed search multi shard/core setup and configuration

2009-11-10 Thread Turner, Robbin J
Thanks, I had already read through this url.  I guess my request was: is there a 
way to set up something that is already part of Solr itself to pass the 
URL[shard...], rather than having to create a custom handler.

thanks

-Original Message-
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] 
Sent: Tuesday, November 10, 2009 6:09 PM
To: solr-user@lucene.apache.org
Subject: Re: Request assistance with distributed search multi shard/core setup 
and configuration

Right, that's http://wiki.apache.org/solr/DistributedSearch

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
> From: "Turner, Robbin J" 
> To: "solr-user@lucene.apache.org" 
> Sent: Tue, November 10, 2009 6:05:19 PM
> Subject: RE: Request assistance with distributed search multi 
> shard/core  setup and configuration
> 
> I've already done the single Solr, that's why my request.  I read on 
> some site that there is a way to set up the configuration so I can send 
> a query to one solr instance and have it pass it on or distribute it across 
> all the instances?
> 
> Btw, thanks for the quick reply.
> RJ
> 
> -Original Message-
> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
> Sent: Tuesday, November 10, 2009 6:02 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Request assistance with distributed search multi 
> shard/core setup and configuration
> 
> RJ,
> 
> You may want to take a simpler step - single Solr core (no solr.xml 
> needed) per machine.  Then distributed search really only requires 
> that you specify shard URLs in the URL of the search requests.  In 
> practice/production you rarely benefit from distributed search against 
> multiple cores on the same server anyway.
> 
> Otis
> --
> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
> 
> 
> 
> 
> 
> From: "Turner, Robbin J" 
> To: "solr-user@lucene.apache.org" 
> Sent: Tue, November 10, 2009 5:58:52 PM
> Subject: Request assistance with distributed search multi shard/core 
> setup and configuration
> 
> I've been looking through all the documentation.  I've set up a single 
> solr instance, and one multicore instance.  If someone would be 
> willing to share some configuration examples and/or advice for setting 
> up solr for distributing the search, I would really appreciate it.  
> I've read that there is a way to do it, but most of the current 
> documentation doesn't provide enough examples on what to do with 
> solr.xml, and the solrconfig.xml.  Also, I'm using tomcat 6 for the servlet 
> container.  I deployed the solr 1.4.0 released yesterday.
> 
> Thanks
> RJ



Re: Are subqueries possible in Solr? If so, are they performant?

2009-11-10 Thread Otis Gospodnetic
No, I don't think you can do that with Solr.  Somebody will correct me if I'm 
wrong. :)

What you are describing are SQL sub-queries and the closest things I can think 
of are using AND as I mentioned, and maybe using filter queries (the "fq" 
parameter).
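
For example (hypothetical field name):

q=cookies AND flavor:vanilla
q=cookies&fq=flavor:vanilla

The fq variant applies flavor:vanilla as a cached filter that does not affect 
relevance scoring, which is often what a SQL-style sub-query is really after.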


Otis 
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
> From: Vicky_Dev 
> To: solr-user@lucene.apache.org
> Sent: Tue, November 10, 2009 1:23:07 AM
> Subject: Re: Are subqueries possible in Solr? If so, are they performant?
> 
> 
> Thanks Otis for your response.
> 
> Is it possible to feed the result of one Solr query into another Solr query?
> 
> The issue which I am facing right now is:
> I am getting results from one query and I just need 2 index attribute values.
> These index attribute values are used to form a new query to Solr. 
> 
> Since Solr gives results only for GET requests, there is a restriction on
> forming a query with all the values.
> 
> Please do send your views on the above problem
> 
> Thanks
> ~Vikrant
> 
> 
> 
> 
> Otis Gospodnetic wrote:
> > 
> > You can mimic them by combining 2 clauses with an AND.
> > e.g.
> > cookies
> > vs.
> > cookies AND vanilla
> > 
> > Otis
> > --
> > Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> > Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
> > 
> > 
> > 
> > - Original Message 
> >> From: Vicky_Dev 
> >> To: solr-user@lucene.apache.org
> >> Sent: Mon, November 9, 2009 1:48:03 PM
> >> Subject: Re: Are subqueries possible in Solr? If so, are they performant?
> >> 
> >> 
> >> 
> >> Hi Team,
> >> Is it possible to write subqueries in dismaxrequest handler?
> >> 
> >> ~Vikrant
> >> 
> >> 
> >> Edoardo Marcora wrote:
> >> > 
> >> > Does Solr have the ability to do subqueries, like this one (in SQL):
> >> > 
> >> > SELECT id, first_name
> >> > FROM student_details
> >> > WHERE first_name IN (SELECT first_name
> >> > FROM student_details
> >> > WHERE subject= 'Science'); 
> >> > 
> >> > If so, how performant is this kind of queries?
> >> > 
> >> 
> >> -- 
> >> View this message in context: 
> >> 
> http://old.nabble.com/Are-subqueries-possible-in-Solr--If-so%2C-are-they-performant--tp24467023p26271600.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> > 
> > 
> > 
> 
> -- 
> View this message in context: 
> http://old.nabble.com/Are-subqueries-possible-in-Solr--If-so%2C-are-they-performant--tp24467023p26278872.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: any docs on solr.EdgeNGramFilterFactory?

2009-11-10 Thread Peter Wolanin
So, this is the normal N-gram one?  NGramTokenizerFactory

Digging deeper - there are actually CJK and Chinese tokenizers in the
Solr codebase:

http://lucene.apache.org/solr/api/org/apache/solr/analysis/CJKTokenizerFactory.html
http://lucene.apache.org/solr/api/org/apache/solr/analysis/ChineseTokenizerFactory.html

The CJK one uses the lucene CJKTokenizer
http://lucene.apache.org/java/2_9_1/api/contrib-analyzers/org/apache/lucene/analysis/cjk/CJKTokenizer.html

and there seems to be another one even that no one has wrapped into Solr:
http://lucene.apache.org/java/2_9_1/api/contrib-smartcn/org/apache/lucene/analysis/cn/smart/package-summary.html

So it seems like the existing options are a little better than I thought,
though it would be nice to have some docs on properly configuring
these.
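
As an untested sketch (the field type names are made up), a CJK-friendly field 
type might be configured like:

<fieldType name="text_cjk" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.CJKTokenizerFactory"/>
  </analyzer>
</fieldType>

or, with plain n-grams:

<fieldType name="text_ngram" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="1" maxGramSize="2"/>
  </analyzer>
</fieldType>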

-Peter

On Tue, Nov 10, 2009 at 6:05 PM, Otis Gospodnetic
 wrote:
> Peter,
>
> For CJK and n-grams, I think you don't want the *Edge* n-grams, but just 
> n-grams.
> Before you take the n-gram route, you may want to look at the smart Chinese 
> analyzer in Lucene contrib (I think it works only for Simplified Chinese) and 
> Sen (on java.net).  I also spotted a Korean analyzer in the wild a few months 
> back.
>
> Otis
> --
> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
>
>
>
> - Original Message 
>> From: Peter Wolanin 
>> To: solr-user@lucene.apache.org
>> Sent: Tue, November 10, 2009 4:06:52 PM
>> Subject: any docs on solr.EdgeNGramFilterFactory?
>>
>> This fairly recent blog post:
>>
>> http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/
>>
>> describes the use of the solr.EdgeNGramFilterFactory as the tokenizer
>> for the index.  I don't see any mention of that tokenizer on the Solr
>> wiki - is it just waiting to be added, or is there any other
>> documentation in addition to the blog post?  In particular, there was
>> a thread last year about using an N-gram tokenizer to enable
>> reasonable (if not ideal) searching of CJK text, so I'd be curious to
>> know how people are configuring their schema (with this tokenizer?)
>> for that use case.
>>
>> Thanks,
>>
>> Peter
>>
>> --
>> Peter M. Wolanin, Ph.D.
>> Momentum Specialist,  Acquia. Inc.
>> peter.wola...@acquia.com
>
>



-- 
Peter M. Wolanin, Ph.D.
Momentum Specialist,  Acquia. Inc.
peter.wola...@acquia.com


Apache Hadoop Get Together Berlin - December 2009

2009-11-10 Thread Isabel Drost

As announced at ApacheCon US, the next Apache Hadoop Get Together Berlin is 
scheduled for December 2009.

When: Wednesday December 16, 2009  at 5:00pm 
Where: newthinking store, Tucholskystr. 48, Berlin

As always there will be slots of 20min each for talks on your Hadoop topic. 
After each talk there will be a lot of time to discuss. You can order drinks 
directly at the bar in the newthinking store. If you like, you can order 
pizza. We will go to Cafe Aufsturz after the event for some beer and 
something to eat.

Talks scheduled so far:

Richard Hutton (nugg.ad): "Moving from five days to one hour." - This talk 
explains how we made data processing scalable at nugg.ad. The company's core 
business is online advertisement targeting. Our servers receive 10,000 
requests per second resulting in data of 100GB per day.

As the classical data warehouse solution reached its limit, we moved to a 
framework built on top of Hadoop to make analytics speedy, data mining 
detailed and all of our lives easier. We will give an overview of our 
solution involving file system structures, scheduling, messaging and 
programming languages from the future.

Jörg Möllenkamp (Sun): "Hadoop on Sun"
Abstract: Hadoop is a well known technology inside of Sun. This talk wants to 
show some interesting use cases of Hadoop in conjunction with Sun 
technologies. The first use case wants to demonstrate how Hadoop can be used 
to load a massive multicore system with up to 256 threads in a single system 
to the max. The second use case shows how several mechanisms integrated in 
Solaris can ease the deployment and operation of Hadoop even in non-dedicated 
environments. The last use case will show the combination of the Sun Grid 
Engine and Hadoop. The talk may contain command-line demonstrations ;).

Nikolaus Pohle (nurago): "M/R for MR - Online Market Research powered by 
Apache Hadoop. Enable consultants to analyze online behavior for audience 
segmentation, advertising effects and usage patterns."

We would like to invite you, the visitor, to also tell your Hadoop story. If 
you like, you can bring slides - there will be a projector.

A big Thanks goes to the newthinking store for providing a room in the center 
of Berlin for us. Another big thanks goes to StudiVZ for sponsoring videos of 
the talks. Links to the videos will be posted here as well as on the StudiVZ 
blog.

Please do indicate on the following Upcoming event whether you are planning to 
attend, to make planning (and booking tables at Aufsturz) easier:

http://upcoming.yahoo.com/event/4842528/


Looking forward to seeing you in Berlin,
Isabel

-- 
  |\  _,,,---,,_   Web:   
  /,`.-'`'-.  ;-;;,_  
 |,4-  ) )-,_..;\ (  `'-' 
'---''(_/--'  `-'\_) (fL)  IM:  





Re: any docs on solr.EdgeNGramFilterFactory?

2009-11-10 Thread Otis Gospodnetic
Yes, that's the n-gram one.  I believe the existing CJK one in Lucene is really 
just an n-gram tokenizer, so no different than the normal n-gram tokenizer.

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
> From: Peter Wolanin 
> To: solr-user@lucene.apache.org
> Sent: Tue, November 10, 2009 7:34:37 PM
> Subject: Re: any docs on solr.EdgeNGramFilterFactory?
> 
> So, this is the normal N-gram one?  NGramTokenizerFactory
> 
> Digging deeper - there are actually CJK and Chinese tokenizers in the
> Solr codebase:
> 
> http://lucene.apache.org/solr/api/org/apache/solr/analysis/CJKTokenizerFactory.html
> http://lucene.apache.org/solr/api/org/apache/solr/analysis/ChineseTokenizerFactory.html
> 
> The CJK one uses the lucene CJKTokenizer
> http://lucene.apache.org/java/2_9_1/api/contrib-analyzers/org/apache/lucene/analysis/cjk/CJKTokenizer.html
> 
> and there seems to be another one even that no one has wrapped into Solr:
> http://lucene.apache.org/java/2_9_1/api/contrib-smartcn/org/apache/lucene/analysis/cn/smart/package-summary.html
> 
> So it seems like the existing options are a little better than I thought,
> though it would be nice to have some docs on properly configuring
> these.
> 
> -Peter
> 
> On Tue, Nov 10, 2009 at 6:05 PM, Otis Gospodnetic
> wrote:
> > Peter,
> >
> > For CJK and n-grams, I think you don't want the *Edge* n-grams, but just 
> n-grams.
> > Before you take the n-gram route, you may want to look at the smart Chinese 
> analyzer in Lucene contrib (I think it works only for Simplified Chinese) and 
> Sen (on java.net).  I also spotted a Korean analyzer in the wild a few months 
> back.
> >
> > Otis
> > --
> > Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> > Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
> >
> >
> >
> > - Original Message 
> >> From: Peter Wolanin 
> >> To: solr-user@lucene.apache.org
> >> Sent: Tue, November 10, 2009 4:06:52 PM
> >> Subject: any docs on solr.EdgeNGramFilterFactory?
> >>
> >> This fairly recent blog post:
> >>
> >> 
> http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/
> >>
> >> describes the use of the solr.EdgeNGramFilterFactory as the tokenizer
> >> for the index.  I don't see any mention of that tokenizer on the Solr
> >> wiki - is it just waiting to be added, or is there any other
> >> documentation in addition to the blog post?  In particular, there was
> >> a thread last year about using an N-gram tokenizer to enable
> >> reasonable (if not ideal) searching of CJK text, so I'd be curious to
> >> know how people are configuring their schema (with this tokenizer?)
> >> for that use case.
> >>
> >> Thanks,
> >>
> >> Peter
> >>
> >> --
> >> Peter M. Wolanin, Ph.D.
> >> Momentum Specialist,  Acquia. Inc.
> >> peter.wola...@acquia.com
> >
> >
> 
> 
> 
> -- 
> Peter M. Wolanin, Ph.D.
> Momentum Specialist,  Acquia. Inc.
> peter.wola...@acquia.com



Re: Request assistance with distributed search multi shard/core setup and configuration

2009-11-10 Thread Otis Gospodnetic
Hm, I don't follow.  You don't need to create a custom (request) handler to 
make use of Solr's distributed search.
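
If the goal is to hit one URL without the client passing shards on every 
request, the shards parameter can also be baked into a handler's defaults in 
solrconfig.xml (a sketch with hypothetical hosts):

<requestHandler name="/distrib" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="shards">host1:8983/solr,host2:8983/solr</str>
  </lst>
</requestHandler>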

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
> From: "Turner, Robbin J" 
> To: "solr-user@lucene.apache.org" 
> Sent: Tue, November 10, 2009 6:41:32 PM
> Subject: RE: Request assistance with distributed search multi shard/core   
> setup and configuration
> 
> Thanks, I had already read through this url.  I guess my request was: is there 
> a way to set up something that is already part of Solr itself to pass the 
> URL[shard...], rather than having to create a custom handler.
> 
> thanks
> 
> -Original Message-
> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] 
> Sent: Tuesday, November 10, 2009 6:09 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Request assistance with distributed search multi shard/core 
> setup 
> and configuration
> 
> Right, that's http://wiki.apache.org/solr/DistributedSearch
> 
> Otis
> --
> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
> 
> 
> 
> - Original Message 
> > From: "Turner, Robbin J" 
> > To: "solr-user@lucene.apache.org" 
> > Sent: Tue, November 10, 2009 6:05:19 PM
> > Subject: RE: Request assistance with distributed search multi 
> > shard/core  setup and configuration
> > 
> > I've already done the single Solr, that's why my request.  I read on 
> > some site that there is a way to set up the configuration so I can send 
> > a query to one solr instance and have it pass it on or distribute it across 
> all the instances?
> > 
> > Btw, thanks for the quick reply.
> > RJ
> > 
> > -Original Message-
> > From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
> > Sent: Tuesday, November 10, 2009 6:02 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Request assistance with distributed search multi 
> > shard/core setup and configuration
> > 
> > RJ,
> > 
> > You may want to take a simpler step - single Solr core (no solr.xml 
> > needed) per machine.  Then distributed search really only requires 
> > that you specify shard URLs in the URL of the search requests.  In 
> > practice/production you rarely benefit from distributed search against 
> > multiple cores on the same server anyway.
> > 
> > Otis
> > --
> > Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> > Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
> > 
> > 
> > 
> > 
> > 
> > From: "Turner, Robbin J" 
> > To: "solr-user@lucene.apache.org" 
> > Sent: Tue, November 10, 2009 5:58:52 PM
> > Subject: Request assistance with distributed search multi shard/core 
> > setup and configuration
> > 
> > I've been looking through all the documentation.  I've set up a single 
> > solr instance, and one multicore instance.  If someone would be 
> > willing to share some configuration examples and/or advice for setting 
> > up solr for distributing the search, I would really appreciate it.  
> > I've read that there is a way to do it, but most of the current 
> > documentation doesn't provide enough examples on what to do with 
> > solr.xml, and the solrconfig.xml.  Also, I'm using tomcat 6 for the servlet 
> container.  I deployed the solr 1.4.0 released yesterday.
> > 
> > Thanks
> > RJ



Re: long startup time

2009-11-10 Thread Otis Gospodnetic
I'm not sure if anyone answered this.
The "2 minutes" makes me think it's a DNS lookup timeout.  Is something trying 
to look up some host name? (say from the top of some XML file)

 Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
> From: Teruhiko Kurosaka 
> To: "solr-user@lucene.apache.org" 
> Sent: Mon, October 26, 2009 8:32:54 PM
> Subject: long startup time
> 
> I've been testing Solr 1.4.0 (RC).
> After some time, Solr started to pause
> for a long time (a minute or two) after
> printing:
> 
> INFO:  jetty-6.1.3
> 
> Sometimes it starts immediately, but more often
> than not, it pauses.  Is there any known cause
> of this kind of long pause?
> 
> -kuro 



memory size

2009-11-10 Thread Jörg Agatz
Hello,

I have a problem with the memory size, but I don't know how I can fix it.

Maybe it is a PHP problem, but I don't know.

My Error:

Fatal error: Allowed memory size of 16777216 bytes exhausted (tried to
allocate 16515072 bytes)
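
That limit (16777216 bytes = 16 MB) is PHP's memory_limit setting, not 
anything in Solr. If that is the cause, raising it in php.ini is the usual 
fix (the value below is just an example):

memory_limit = 64M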


I hope you can help me

KinGArtus