Re: Count distinct groups in grouping distributed

2012-09-12 Thread Jason Rutherglen
Distinct in a distributed environment would require de-duplication en masse; use Hive or MapReduce instead. On Wed, Sep 12, 2012 at 11:53 AM, yriveiro wrote: > Hi, > > Is it possible to do a distinct group count in a grouping done using > a sharding schema? > > This issue https://issues.a
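
The de-duplication point can be illustrated with a minimal sketch. It is not Solr, Hive, or MapReduce code; the shard contents and function names are hypothetical and only show why per-shard distinct counts cannot simply be added together.

```python
# Hypothetical per-shard data: each list holds the group values one shard has seen.
def naive_distinct_count(shards):
    """Sum each shard's local distinct count (overcounts groups shared by shards)."""
    return sum(len(set(groups)) for groups in shards)


def merged_distinct_count(shards):
    """De-duplicate across shards by merging every value into one set."""
    seen = set()
    for groups in shards:
        seen.update(groups)
    return len(seen)


shards = [
    ["red", "green", "red"],
    ["green", "blue"],
    ["blue", "red", "yellow"],
]
print(naive_distinct_count(shards))   # 2 + 2 + 3 = 7
print(merged_distinct_count(shards))  # 4
```

The merged version has to move every distinct value to one place, which is the "en masse" cost that makes a batch system attractive for this job.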

Re: Nrt and caching

2012-07-07 Thread Jason Rutherglen
documents? > > Which open source library are you referring to? Will Solr adopt this > per-segment approach any time soon? > > Thanks > > > ____ > From: Jason Rutherglen > To: solr-user@lucene.apache.org > Sent: Saturday, July 7, 2012 2:05

Re: Grouping and Averages

2012-07-07 Thread Jason Rutherglen
ftware Engineer > http://LinkedIn.com/in/JeremyBranham > http://jeremybranham.wordpress.com/ > http://Zeroth.biz > > -Original Message- From: Jason Rutherglen > Sent: Saturday, July 07, 2012 2

Re: Grouping and Averages

2012-07-07 Thread Jason Rutherglen
Average should be doable in Solr, maybe not today, not sure. Median is the challenge :) Try Hive. On Sat, Jul 7, 2012 at 3:34 PM, Walter Underwood wrote: > It sounds like you need a database for analytics, not a search engine. > > Solr cannot do aggregates like that. It can select and group, bu
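
A rough sketch of why the average is the easy case and the median is not, assuming hypothetical per-shard value lists (this is not Solr or Hive code): an average can be rebuilt from small per-shard (sum, count) pairs, while per-shard medians do not combine into the global median.

```python
from statistics import median


def merge_average(partials):
    """Combine per-shard (sum, count) pairs into a global average."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


# Hypothetical values held by three shards.
shard_values = [[1, 2, 3], [10, 20], [100]]
partials = [(sum(v), len(v)) for v in shard_values]
all_values = [x for v in shard_values for x in v]

global_avg = merge_average(partials)                         # same as averaging all values
median_of_medians = median(median(v) for v in shard_values)  # 15
true_median = median(all_values)                             # 6.5
print(global_avg, median_of_medians, true_median)
```

An exact median needs all values (or a multi-pass selection) in one place, which is why a batch system is suggested for it.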

Re: Nrt and caching

2012-07-07 Thread Jason Rutherglen
ache to per-segment? How do I do that? > > Thanks. > > > ________ > From: Jason Rutherglen > To: solr-user@lucene.apache.org > Sent: Saturday, July 7, 2012 11:32 AM > Subject: Re: Nrt and caching > > The field caches are per-segment, whi

Re: Nrt and caching

2012-07-07 Thread Jason Rutherglen
, as with some other Apache licensed Lucene based search engines. On Sat, Jul 7, 2012 at 10:42 AM, Yonik Seeley wrote: > On Sat, Jul 7, 2012 at 9:59 AM, Jason Rutherglen > wrote: > > Currently the caches are stored per-multiple-segments, meaning after each > > 'soft'

Re: Nrt and caching

2012-07-07 Thread Jason Rutherglen
Hi Amit, If the caches were per-segment, then NRT would be optimal in Solr. Currently the caches are stored per-multiple-segments, meaning after each 'soft' commit, the cache(s) will be purged. On Fri, Jul 6, 2012 at 9:45 PM, Amit Nithian wrote: > Sorry I'm a bit new to the nrt stuff in solr b
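
To make the per-segment argument concrete, here is a toy sketch. The class and function names are hypothetical and this is not how Solr or Lucene implement caching; it only shows that entries keyed by segment survive a commit that adds one new segment.

```python
class PerSegmentCache:
    """Toy cache keyed by (segment_id, filter_name); all names are hypothetical."""

    def __init__(self):
        self.entries = {}
        self.computed = 0  # how many per-segment results had to be built

    def get(self, segment_id, filter_name, compute):
        key = (segment_id, filter_name)
        if key not in self.entries:
            self.entries[key] = compute(segment_id)
            self.computed += 1
        return self.entries[key]


def search(cache, segments, filter_name):
    # Evaluate the filter segment by segment, reusing cached results.
    return [cache.get(seg, filter_name, lambda s: "bits-for-" + s) for seg in segments]


cache = PerSegmentCache()
search(cache, ["seg0", "seg1"], "type:book")          # builds 2 entries
search(cache, ["seg0", "seg1", "seg2"], "type:book")  # a soft commit added seg2: builds 1 more
print(cache.computed)  # 3
```

A cache keyed on the whole multi-segment view would instead be rebuilt in full after every soft commit, which is the purge described in the message.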

Re: Search timeout for Solrcloud

2012-06-05 Thread Jason Rutherglen
There isn't a solution for killing long running queries that works. On Tue, Jun 5, 2012 at 1:34 AM, arin_g wrote: > Hi, > We use solrcloud in production, and we are facing some issues with queries > that take very long specially deep paging queries, these queries keep our > servers very busy. i a

Re: Solr Merge during off peak times

2012-05-02 Thread Jason Rutherglen
> BTW, in 4.0, there's DocumentWriterPerThread that > merges in the background It flushes without pausing, but does not perform merges. Maybe you're thinking of ConcurrentMergeScheduler? On Wed, May 2, 2012 at 7:26 AM, Erick Erickson wrote: > Optimizing is much less important query-speed wise >

Re: Benchmark Solr vs Elastic Search vs Sensei

2012-04-27 Thread Jason Rutherglen
I think DataStax Enterprise is faster than Solr Cloud with transaction logging turned on. Cassandra has its own fast(er) transaction logging mechanism. Of course it's best to use two HDs when testing, e.g., one for the data, the other for the transaction log. On Fri, Apr 27, 2012 at 12:58 PM, Jeff

Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-18 Thread Jason Rutherglen
would recommend you to ask on ES ML with further questions, I > do not want to run into system X vs system Y flame here...) > > Regards, > Lukas > > On Wed, Apr 18, 2012 at 2:22 PM, Jason Rutherglen < > jason.rutherg...@gmail.com> wrote: > >> I'm curious

Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-18 Thread Jason Rutherglen
dd/remove > indices, nodes and aliases on the fly I think there is a way how to handle > growing data set with ease. If anyone is interested such scenario has been > discussed in detail in ES mail list. > > Regards, > Lukas > > On Tue, Apr 17, 2012 at 2:42 AM, Jason Ruthergl

Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-17 Thread Jason Rutherglen
ext.com/spm/solr-performance-monitoring/index.html > > > >> >> From: Jason Rutherglen >>To: solr-user@lucene.apache.org >>Sent: Monday, April 16, 2012 8:42 PM >>Subject: Re: Options for automagically Scaling Solr (without needing >>

Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-16 Thread Jason Rutherglen
One of the big weaknesses of Solr Cloud (and ES?) is the lack of the ability to redistribute shards across servers, meaning splitting a shard that has grown too large while live updates continue. How do you plan on elastically adding more servers without this feature? Cassandra and HBase handle

Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-15 Thread Jason Rutherglen
This was done in SOLR-1301 going on several years ago now. On Sat, Apr 14, 2012 at 4:11 PM, Lance Norskog wrote: > It sounds like you really want the final map/reduce phase to put Solr > index files into HDFS. Solr has a feature to do this called 'Embedded > Solr'. This packages Solr as a library

Re: Frequent garbage collections after a day of operation

2012-02-16 Thread Jason Rutherglen
> One thing that could fit the pattern you describe would be Solr caches > filling up and getting you too close to your JVM or memory limit This [uncommitted] issue would solve that problem by allowing the GC to collect caches that become too large, though in practice, the cache setting would need

Re: How to accelerate your Solr-Lucene application by 4x

2012-01-19 Thread Jason Rutherglen
own-either) > commercial message policy, I'll echo Ted's observation that some commercial > messages (depending on content, tone and context) are acceptable. > > Steve > >> -Original Message- >> From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com] >

Re: How to accelerate your Solr-Lucene application by 4x

2012-01-18 Thread Jason Rutherglen
A Rowe wrote: > Why Jason, I declare, whatever do you mean? > > >> -Original Message- >> From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com] >> Sent: Wednesday, January 18, 2012 8:29 PM >> To: solr-user@lucene.apache.org >> Subject: Re: How to

Re: How to accelerate your Solr-Lucene application by 4x

2012-01-18 Thread Jason Rutherglen
Steven, If you are going to admonish people for advertising, it should be equally dished out or not at all. On Wed, Jan 18, 2012 at 6:38 PM, Steven A Rowe wrote: > Hi Peter, > > Commercial solicitations are taboo here, except in the context of a request > for help that is directly relevant to a

Re: soft commit

2012-01-03 Thread Jason Rutherglen
Yonik Seeley wrote: > On Tue, Jan 3, 2012 at 5:03 PM, Jason Rutherglen > wrote: >> Yikes.  I'd love to see a test showing that un-inverted field cache >> (which is for ALL segments as a single unit) can be used efficiently >> with NRT / soft commit. > > Pleas

Re: soft commit

2012-01-03 Thread Jason Rutherglen
The main point is that Solr, unlike for example Elastic Search and other Lucene-based systems, does NOT cache filters or facets per-segment. This is a fundamental design flaw. On Tue, Jan 3, 2012 at 1:50 PM, Yonik Seeley wrote: > On Tue, Jan 3, 2012 at 4:36 PM, Erik Hatcher wrote: >> As I understand

Re: soft commit

2012-01-03 Thread Jason Rutherglen
> multi-select faceting Yikes. I'd love to see a test showing that un-inverted field cache (which is for ALL segments as a single unit) can be used efficiently with NRT / soft commit. On Tue, Jan 3, 2012 at 1:50 PM, Yonik Seeley wrote: > On Tue, Jan 3, 2012 at 4:36 PM, Erik Hatcher wrote: >> A

Re: soft commit

2012-01-03 Thread Jason Rutherglen
*Laugh* I stand by what Mark said: "Right - in most NRT cases (very frequent soft commits), the cache should probably be disabled." On Mon, Jan 2, 2012 at 7:45 PM, Yonik Seeley wrote: > On Mon, Jan 2, 2012 at 9:58 PM, Jason Rutherglen > wrote: >>> It still normall

Re: soft commit

2012-01-02 Thread Jason Rutherglen
> It still normally makes sense to have the caches enabled (esp filter and > document caches). In the NRT case that statement is completely incorrect On Mon, Jan 2, 2012 at 5:37 PM, Yonik Seeley wrote: > On Mon, Jan 2, 2012 at 1:28 PM, Mark Miller wrote: >> Right - in most NRT cases (very freq

Re: Core overhead

2011-12-16 Thread Jason Rutherglen
Ted, The list would be unreadable if everyone spammed at the bottom of their email like Otis. It's just bad form. Jason On Fri, Dec 16, 2011 at 12:00 PM, Ted Dunning wrote: > Sounds like we disagree. > > On Fri, Dec 16, 2011 at 11:56 AM, Jason Rutherglen < > jason.ru

Re: Core overhead

2011-12-16 Thread Jason Rutherglen
tool that > will help you solve your problem".  That is responsive to the OP and it is > clear that it is a commercial deal. > > On Fri, Dec 16, 2011 at 10:02 AM, Jason Rutherglen < > jason.rutherg...@gmail.com> wrote: > >> Wow the shameless plugging of product (fo

Re: Core overhead

2011-12-16 Thread Jason Rutherglen
Wow the shameless plugging of product (footer) has hit a new low Otis. On Fri, Dec 16, 2011 at 7:32 AM, Otis Gospodnetic wrote: > Hi Yury, > > Not sure if this was already covered in this thread, but with N smaller cores > on a single N-CPU-core box you could run N queries in parallel over small

Re: overwrite=false support with SolrJ client

2011-11-04 Thread Jason Rutherglen
It should be supported in SolrJ, I'm surprised it's been lopped out. Bulk indexing is extremely common. On Fri, Nov 4, 2011 at 1:16 PM, Ken Krugler wrote: > Hi list, > > I'm working on improving the performance of the Solr scheme for Cascading. > > This supports generating a Solr index as the out

Re: large scale indexing issues / single threaded bottleneck

2011-10-28 Thread Jason Rutherglen
> abstract away the encoding of the index Robert, this is what you wrote. "Abstract away the encoding of the index" means pluggable, otherwise it's not abstract and / or it's a flawed design. Sounds like it's the latter.

Re: large scale indexing issues / single threaded bottleneck

2011-10-28 Thread Jason Rutherglen
postings at a later time. Otherwise it will be 6 months before 4.0 ships, that's too long. Also it is an amusing contradiction that your argument flies in the face of Lucid shipping 4.x today without said functionality. On Fri, Oct 28, 2011 at 5:09 PM, Robert Muir wrote: > On Fri, Oct 28

Re: large scale indexing issues / single threaded bottleneck

2011-10-28 Thread Jason Rutherglen
> We should maybe try to fix this in 3.x too? +1 I suggested it should be backported a while back. Or that Lucene 4.x should be released. I'm not sure what is holding up Lucene 4.x at this point, bulk postings is only useful for PFOR. On Fri, Oct 28, 2011 at 3:27 PM, Simon Willnauer wro

Re: How to make UnInvertedField faster?

2011-10-21 Thread Jason Rutherglen
Sweet + Very cool! On Fri, Oct 21, 2011 at 7:50 AM, Simon Willnauer < simon.willna...@googlemail.com> wrote: > In trunk we have a feature called IndexDocValues which basically > creates the uninverted structure at index time. You can then simply > suck that into memory or even access it on disk d

Re: Solr vs ElasticSearch

2011-06-01 Thread Jason Rutherglen
Jonathan, This is all true, however it ends up being hacky (this is from experience) and the core on the source needs to be deleted. Feel free to post to the issue. Jason On Wed, Jun 1, 2011 at 8:44 AM, Jonathan Rochkind wrote: > On 6/1/2011 10:52 AM, Jason Rutherglen wrote: >> >&

Re: Solr vs ElasticSearch

2011-06-01 Thread Jason Rutherglen
> And some way to delete the core when it has been transferred. Right, I manually added that to CoreAdminHandler. I opened an issue to try to solve this problem: SOLR-2569 On Wed, Jun 1, 2011 at 8:26 AM, Upayavira wrote: > > > On Wed, 01 Jun 2011 07:52 -0700, "Jason Rut

Re: Solr vs ElasticSearch

2011-06-01 Thread Jason Rutherglen
Upayavira wrote: > > > On Tue, 31 May 2011 19:38 -0700, "Jason Rutherglen" > wrote: >> Mark, >> >> Nice email address.  I personally have no idea, maybe ask Shay Banon >> to post an answer?  I think it's possible to make Solr more elastic, >> eg

Re: Solr vs ElasticSearch

2011-05-31 Thread Jason Rutherglen
Thanks Shashi, this is oddly coincidental with another issue being put into Solr (SOLR-2193) to help solve some of the NRT issues, the timing is impeccable. At a base however Solr uses Lucene, as does ES. I think the main advantage of ES is the auto-sharding etc. I think it uses a gossip protoco

Re: Solr vs ElasticSearch

2011-05-31 Thread Jason Rutherglen
Mark, Nice email address. I personally have no idea, maybe ask Shay Banon to post an answer? I think it's possible to make Solr more elastic, eg, it's currently difficult to make it move cores between servers without a lot of manual labor. Jason On Tue, May 31, 2011 at 7:33 PM, Mark wrote: >

Re: Can the Suggester be updated incrementally?

2011-04-29 Thread Jason Rutherglen
Good question, you could be correct about that. It's possible that part hasn't been built yet? If not then you could create a patch? On Thu, Apr 28, 2011 at 10:13 PM, Andy wrote: > > --- On Fri, 4/29/11, Jason Rutherglen wrote: > >> It's answered on the wiki si

Re: Can the Suggester be updated incrementally?

2011-04-28 Thread Jason Rutherglen
It's answered on the wiki site: "TSTLookup - ternary tree based representation, capable of immediate data structure updates" Although the EdgeNGram technique is probably more widely adopted, eg, it's closer to what Google has implemented. http://www.lucidimagination.com/blog/2009/09/08/auto-sugg
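
As a rough illustration of the EdgeNGram idea (not Solr's filter and not the linked article's configuration), the sketch below indexes every leading prefix of a phrase so that a typed prefix becomes an exact lookup. Treating the whole phrase as one token, the gram-size defaults, and the sample phrases are all assumptions made for the example.

```python
def edge_ngrams(token, min_gram=1, max_gram=15):
    """Return the leading prefixes of token between min_gram and max_gram chars."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def build_index(phrases):
    """Map every edge n-gram to the set of phrases that start with it."""
    index = {}
    for phrase in phrases:
        for gram in edge_ngrams(phrase.lower()):
            index.setdefault(gram, set()).add(phrase)
    return index


index = build_index(["apple mouse", "apple pie"])
print(edge_ngrams("solr"))                  # ['s', 'so', 'sol', 'solr']
print(sorted(index.get("app", [])))         # ['apple mouse', 'apple pie']
print(sorted(index.get("apple mou", [])))   # ['apple mouse']
```

The cost is a larger index, since each phrase contributes one entry per prefix length.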

Re: Search across related/correlated multivalue fields in Solr

2011-04-27 Thread Jason Rutherglen
Renaud, Can you provide a brief synopsis of how your system works? Jason On Wed, Apr 27, 2011 at 11:17 AM, Renaud Delbru wrote: > Hi, > > you might want to look at the SIREn plugin [1,2], which allows you to index > and query 1:N relationships such as yours, in a tabular data format [3]. > > [1

Re: Updates during Optimize

2011-04-12 Thread Jason Rutherglen
You can index and optimize at the same time. The current limitation or pause is when the ram buffer is flushing to disk, however that's changing with the DocumentsWriterPerThread implementation, eg, LUCENE-2324. On Tue, Apr 12, 2011 at 8:34 AM, Shawn Heisey wrote: > On 4/12/2011 6:21 AM, stockii

Re: solr on the cloud

2011-03-25 Thread Jason Rutherglen
Dmitry, If you're planning on using HBase you can take a look at https://issues.apache.org/jira/browse/HBASE-3529 I think we may even have a reasonable solution for reading the index [randomly] out of HDFS. Benchmarking'll be implemented next. It's not production ready, suggestions are welcome.

Re: If statements in DataImportHandler?

2011-03-10 Thread Jason Rutherglen
Right that's not within the XML however, and it's unclear how to access the upper level entities that have already been instantiated, eg, beyond the given 'transform' row. On Thu, Mar 10, 2011 at 8:02 PM, Gora Mohanty wrote: > On Fri, Mar 11, 2011 at 4:48 AM, Jason Ruther

If statements in DataImportHandler?

2011-03-10 Thread Jason Rutherglen
Is it possible to conditionally load sub-entities in DataImportHandler, based on the gathered value of parent entities?

Re: NRT and warmupTime of filterCache

2011-03-10 Thread Jason Rutherglen
> - yes, i think so, thats the reason because i dont understand the > wiki-article ... Maybe the article is out of date? I think it's grossly inefficient to warm the searchers at all in the NRT case. Queries are being performed across *all* segments, even though there should only be 1 that's new

Re: NRT in Solr

2011-03-10 Thread Jason Rutherglen
lution. > > Thanks. > > On 3/9/11 3:29 PM, "Smiley, David W." wrote: > >>Zoie adds NRT to Solr: >>http://snaprojects.jira.com/wiki/display/ZOIE/Zoie+Solr+Plugin >> >>I haven't tried it yet but looks cool. >> >>~ David Smiley >>Aut

Re: True master-master fail-over without data gaps (choosing CA in CAP)

2011-03-09 Thread Jason Rutherglen
Doesn't Solandra partition by term instead of document? On Wed, Mar 9, 2011 at 2:13 PM, Smiley, David W. wrote: > I was just about to jump in this conversation to mention Solandra and go fig, > Solandra's committer comes in. :-)   It was nice to meet you at Strata, Jake. > > I haven't dug into t

Re: Solr Hanging all of sudden with update/csv

2011-03-09 Thread Jason Rutherglen
You will need to cap the maximum segment size using LogByteSizeMergePolicy.setMaxMergeMB. As then you will only have segments that are of an optimal size, and Lucene will not try to create gigantic segments. I think though on the query side you will run out of heap space due to the terms index si

Re: True master-master fail-over without data gaps

2011-03-09 Thread Jason Rutherglen
This is why there's block cipher cryptography. On Wed, Mar 9, 2011 at 9:11 AM, Otis Gospodnetic wrote: > On disk, yes, but only indexed, and thus far enough from the original content > to > make storing terms in Lucene's inverted index. > > Otis > > Sematext :: http://sematext.com/ :: Solr

Re: True master-master fail-over without data gaps

2011-03-09 Thread Jason Rutherglen
> Oh, there is no DB involved.  Think of a document stream continuously coming > in, > a component listening to that stream, grabbing docs, and pushing it to > master(s). I don't think Solr is designed for this use case, eg, I wouldn't expect deterministic results with the current architecture as

Re: True master-master fail-over without data gaps

2011-03-09 Thread Jason Rutherglen
If you're using the delta import handler the problem would seem to go away because you can have two separate masters running at all times, and if one fails, you can then point the slaves to the secondary master, that is guaranteed to be in sync because it's been importing from the same database? O

Re: NRT and warmupTime of filterCache

2011-03-09 Thread Jason Rutherglen
I think it's best to turn the warmupCount to zero because usually there isn't time in between the creation of a new searcher to run the warmup queries, eg, it'll negatively impact the desired goal of low latency new index readers? On Wed, Mar 9, 2011 at 3:41 AM, stockii wrote: > I tried to create

Re: NRT in Solr

2011-03-09 Thread Jason Rutherglen
Jae, NRT hasn't been implemented in Solr as of yet, I think partially because major features such as replication, caching, and uninverted faceting suddenly are no longer viable, eg, it's another round of testing etc. It's doable, however I think the best approach is a separate request call pa

Re: Solr Hanging all of sudden with update/csv

2011-03-08 Thread Jason Rutherglen
> The index size itself is about 270Gb, (we are hoping to support up to > 500-1TB), and have supplied the system with ~3TB diskspace. That's simply massive for a single node. When the system tries to merge the segments the queries are probably not working? And the merges will take quite a while.

Passing parameters to DataImportHandler

2011-02-15 Thread Jason Rutherglen
It'd be nice to be able to pass HTTP parameters into DataImportHandler that'd be passed into the SQL as parameters, is this possible?

Re: Search for social networking sites

2011-01-21 Thread Jason Rutherglen
Out of curiosity, how would Lucandra help in the NRT use case? On Thu, Jan 20, 2011 at 11:42 PM, Espen Amble Kolstad wrote: > I haven't tried myself, but you could look at solandra : > https://github.com/tjake/Lucandra > > - Espen > > On Thu, Jan 20, 2011 at 6:30 PM, stockii wrote: >> >> http:/

Re: salvaging uncommitted data

2011-01-18 Thread Jason Rutherglen
tarted the process yet. > if i restart it, will i lose any data that is in memory? if so, is there a > way around it? > is there a way to know if there is any data waiting to be written? (if not, > i will just restart...) > > thanks. > > On Tue, Jan 18, 2011 at 12:23 PM,

Re: salvaging uncommitted data

2011-01-18 Thread Jason Rutherglen
> btw where will i find the writes that have not been committed? are they all > in memory or are they in some temp files somewhere? The writes'll be gone if they haven't been committed yet and the process fails. > org.apache.lucene.store.LockObtainFailedException: Lock obtain timed If it's remov

Re: NRT

2011-01-17 Thread Jason Rutherglen
> How is NRT doing, being used in production? It works and there are not any lingering bugs as it's been available for quite a while. > Which Solr is it in? Per-segment field cache is used transparently by Solr, IndexWriter.getReader is what's not used yet. I'm not sure where per-segment faceti

Re: What can cause segment corruption?

2011-01-11 Thread Jason Rutherglen
Stéphane, I've only seen production index corruption when the process ran out of disk space during a merge, or when there was an underlying hardware-related issue. On Tue, Jan 11, 2011 at 5:06 AM, Stéphane Delprat wrote: > Hi, > > > I'm using Solr 1.4.1 (Lucene 2.9.3) > > And some segments get corrupted

Re: Including Small Amounts of New Data in Searches (MultiSearcher ?)

2011-01-10 Thread Jason Rutherglen
> things are in the NRT work. I don't know how merges work now, in re > multitasking and thread contention. Most of the Solr sites I know of > have much larger indexes than ram and expect everything to work > smoothly. > > Lance > > On Sun, Jan 9, 2011 at 9:18 AM,

Re: Including Small Amounts of New Data in Searches (MultiSearcher ?)

2011-01-09 Thread Jason Rutherglen
> The older MergePolicies followed a strategy which is quite disruptive in an > NRT environment. Can you elaborate as to why (maybe we need to place this in a wiki)? If large merges are running in their own thread, they should not disrupt queries, eg, there won't be CPU contention. The IO conten

Re: Rollback can't be done after committing?

2010-11-14 Thread Jason Rutherglen
The timed deletion policy is a bit too abstract, as is keeping a numbered limit of commit points. How would one know what they're rolling back to when num limit is defined? I think committing to a name and being able to roll back to it in Solr is a good feature to add. On Fri, Nov 12, 2010 at 2:

Re: Deletes writing bytes len 0, corrupting the index

2010-11-14 Thread Jason Rutherglen
?). > > Yes this could be a hardware issue... > > Millions of docs indexed per hour sounds like fun! > > Mike > > On Fri, Nov 5, 2010 at 5:33 PM, Jason Rutherglen > wrote: >>>  can you enable IndexWriter's infoStream >> >> I'd like to however t

Re: Deletes writing bytes len 0, corrupting the index

2010-11-05 Thread Jason Rutherglen
n Fri, Nov 5, 2010 at 1:58 AM, Michael McCandless wrote: > Hmmm... Jason can you enable IndexWriter's infoStream and get the > corruption to happen again and post that (along with "ls -l" output)? > > Mike > > On Thu, Nov 4, 2010 at 5:11 PM, Jason Rutherglen > wro

Re: Deletes writing bytes len 0, corrupting the index

2010-11-04 Thread Jason Rutherglen
I'm still seeing this error after downloading the latest 2.9 branch version, compiling, copying to Solr 1.4 and deploying. Basically as mentioned, the .del files are of zero length... Hmm... On Wed, Oct 13, 2010 at 1:33 PM, Jason Rutherglen wrote: > Thanks Robert, that Jira iss

Re: Deletes writing bytes len 0, corrupting the index

2010-10-13 Thread Jason Rutherglen
9-branch (http://svn.apache.org/repos/asf/lucene/java/branches/lucene_2_9/). > > then you get the fix for https://issues.apache.org/jira/browse/LUCENE-2593 > too. > > On Wed, Oct 13, 2010 at 11:37 AM, Jason Rutherglen > wrote: >> We have unit tests for running out of disk space?  Howev

Re: Deletes writing bytes len 0, corrupting the index

2010-10-13 Thread Jason Rutherglen
t index?  Ie, exception on open or on > searching or on CheckIndex? > > Or: do you see a disk-full exception when writing the del file, during > indexing, that does not in fact corrupt the index (this is of course > what I hope you are seeing ;) ). > > Mike > > On Wed, Oct 13, 20

Deletes writing bytes len 0, corrupting the index

2010-10-13 Thread Jason Rutherglen
We have unit tests for running out of disk space? However we have Tomcat logs that fill up quickly and starve Solr 1.4.1 of space. The main segments are probably not corrupted, however routinely now, there are deletes files of length 0. 0 2010-10-12 18:35 _cc_8.del Which is fundamental index co

Re: Autosuggest with inner phrases

2010-10-02 Thread Jason Rutherglen
This's what yer lookin' for: http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ On Sat, Oct 2, 2010 at 3:14 AM, sivaprasad wrote: > > Hi , > I implemented the auto suggest using terms component.But the suggestions are > coming from the starting of

Re: Autocomplete: match words anywhere in the token

2010-09-22 Thread Jason Rutherglen
This may be what you're looking for. http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ On Wed, Sep 22, 2010 at 4:41 AM, Arunkumar Ayyavu wrote: > It's been over a week since I started learning Solr. Now, I'm using the > electronics store example t

Re: Can I tell Solr to merge segments more slowly on an I/O starved system?

2010-09-19 Thread Jason Rutherglen
Here's the remainder of the discussion, albeit brief: http://www.lucidimagination.com/search/document/d6fa7b3241ed11b8/throttling_merges#9df776e79da71044 On Sun, Sep 19, 2010 at 12:04 AM, Ron Mayer wrote: > My system which has documents being added pretty much > continually seems pretty well beh

Re: Can I tell Solr to merge segments more slowly on an I/O starved system?

2010-09-19 Thread Jason Rutherglen
Ron, IO throttling was discussed a while back however I don't think it was implemented. For systems that search on indexes where indexing is happening on the same server, reducing IO contention would be useful. Here is a somewhat similar issue for merging segments: https://issues.apache.org/jira/

Re: Shingle filter factory and the min shingles

2010-09-14 Thread Jason Rutherglen
And here's the issue... https://issues.apache.org/jira/browse/SOLR-1740 On Tue, Sep 14, 2010 at 6:08 PM, Jason Rutherglen wrote: > To answer my own question, and this sucks :)  the minShingleSize isn't > set in at least 1.4.2.  I'm guessing a later version though? > >

Re: Shingle filter factory and the min shingles

2010-09-14 Thread Jason Rutherglen
To answer my own question, and this sucks :) the minShingleSize isn't set in at least 1.4.2. I'm guessing a later version though? On Tue, Sep 14, 2010 at 5:49 PM, Jason Rutherglen wrote: > positionIncrementGap="100"> > > > > words="stopwords.

Shingle filter factory and the min shingles

2010-09-14 Thread Jason Rutherglen
I'm using for a field, indexing, then looking at the terms component. I'm seeing shingles that consist of only 2 terms, whereas I'm expecting all the terms to be at least 4 terms... What's up? Thanks.

Re: Tuning Solr caches with high commit rates (NRT)

2010-09-12 Thread Jason Rutherglen
tch work with Lucene > 2.9/branch_3x? > > Thanks, > Peter > > > > On Mon, Sep 13, 2010 at 12:05 AM, Jason Rutherglen > wrote: >> Peter, >> >> Are you using per-segment faceting, eg, SOLR-1617?  That could help >> your situation. >> >> On

Re: Tuning Solr caches with high commit rates (NRT)

2010-09-12 Thread Jason Rutherglen
Peter, Are you using per-segment faceting, eg, SOLR-1617? That could help your situation. On Sun, Sep 12, 2010 at 12:26 PM, Peter Sturge wrote: > Hi, > > Below are some notes regarding Solr cache tuning that should prove > useful for anyone who uses Solr with frequent commits (e.g. <5min). > >

Re: Solr + Katta ... benefits?

2010-09-04 Thread Jason Rutherglen
Katta can be used for managing shards that are built and live in HDFS. On Fri, Sep 3, 2010 at 10:29 AM, thiseye wrote: > > I'm investigating using Lucene for a project to index a massive HBase > database. I was looking at using Katta to distribute the index because > people have said that becomes

Re: Auto Suggest

2010-09-04 Thread Jason Rutherglen
et.prefix=mou&facet.field=term_suggest&qt=basic&wt=javabin&rows=0&version=1 > > > Jason Rutherglen wrote: >> >> To clarify, the query analyzer returns that.  Variations such as >> "apple mou" also do not return anything.  Maybe Jay can comment

Re: Auto Suggest

2010-09-04 Thread Jason Rutherglen
/> is the bit missing i think here > > This way the search is agnostic to case and any non-alphanum chars, this was > to facilitate a location autocomplete for searching > > So is was a basic search, returning the top N results along with additional > info to show in the autocomp

Re: Auto Suggest

2010-09-03 Thread Jason Rutherglen
To clarify, the query analyzer returns that. Variations such as "apple mou" also do not return anything. Maybe Jay can comment and then we can amend the article? On Fri, Sep 3, 2010 at 6:12 AM, Jason Rutherglen wrote: > Analysis returns "app mou". > > On Thu,

Re: Auto Suggest

2010-09-03 Thread Jason Rutherglen
Analysis returns "app mou". On Thu, Sep 2, 2010 at 6:12 PM, Lance Norskog wrote: > What does analysis.jsp show? > > On Thu, Sep 2, 2010 at 5:53 AM, Jason Rutherglen > wrote: >> I'm having a different issue with the EdgeNGram technique described >> here:

Re: Auto Suggest

2010-09-02 Thread Jason Rutherglen
I'm having a different issue with the EdgeNGram technique described here: http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ That is one word queries q=app on the query_text field, work fine however "q=app mou" do not. Why would this be or is ther

Re: Total number of terms in an index?

2010-07-28 Thread Jason Rutherglen
Tom, The total number of terms... Ah well, not a big deal, however yes the flex branch does expose this so we can show this in Solr at some point, hopefully outside of Solr's Luke impl. On Tue, Jul 27, 2010 at 9:27 AM, Burton-West, Tom wrote: > Hi Jason, > > Are you looking for the total number

Re: Total number of terms in an index?

2010-07-26 Thread Jason Rutherglen
Sorry, like the subject, I mean the total number of terms. On Mon, Jul 26, 2010 at 4:03 PM, Jason Rutherglen wrote: > What's the fastest way to obtain the total number of docs from the > index?  (The Luke request handler takes a long time to load so I'm > looking for something else). >

Total number of terms in an index?

2010-07-26 Thread Jason Rutherglen
What's the fastest way to obtain the total number of docs from the index? (The Luke request handler takes a long time to load so I'm looking for something else).

Re: solr with hadoop

2010-07-06 Thread Jason Rutherglen
> If you do distributed indexing correctly, what about updating the documents > and what about replicating them correctly? Yes, you can do that and it'll work great. On Mon, Jul 5, 2010 at 7:42 AM, MitchK wrote: > > I need to revive this discussion... > > If you do distributed indexing correctly,

Re: anyone use hadoop+solr?

2010-06-22 Thread Jason Rutherglen
We (Attensity Group) have been using SOLR-1301 for 6+ months now because we have a ready Hadoop cluster and need to be able to re/index up to 3 billion docs. I read the various emails and wasn't sure what you're asking. Cheers... On Tue, Jun 22, 2010 at 8:27 AM, Neeb wrote: > > Hey James, > > J

TrieRange for storage of dates

2010-06-09 Thread Jason Rutherglen
What is the best practice? Perhaps we can amend the article at http://www.lucidimagination.com/blog/2009/05/13/exploring-lucene-and-solrs-trierange-capabilities/ to include the recommendation (ie, dates are commonly unique). I'm assuming using a long is the best choice.

Re: Different mergeFactor for master and slaves

2010-06-03 Thread Jason Rutherglen
Kris, That wouldn't do anything because all merging occurs on the master. Jason On Thu, Jun 3, 2010 at 6:25 AM, Kris Jack wrote: > > Hi everyone, > > I have set up a master-slave configuration where the master machine will be > used primarily for indexing while the slave machines will be used f

Inserting shards in overridden SearchComponent prepare method yields null pointer

2010-06-01 Thread Jason Rutherglen
The insert shards code is as follows: ModifiableSolrParams modParams = new ModifiableSolrParams(params); modParams.set("shards", shards); rb.req.setParams(modParams); Where shards is a valid single shard pseudo URL. Stacktrace: HTTP Status 500 - null java.lang.NullPointerException at org.apache

Re: ApacheCon CFP Closes on Friday

2010-05-26 Thread Jason Rutherglen
Grant, the link's broken? http://blogs.apache.org/conferences/date/20100428 Unexpected Exception Status Code 500 Message You have closed the EntityManager, though the persistence context will remain active until the current transaction commits. Type Exception Roller has enco

Re: How real-time are Solr/Lucene queries?

2010-05-25 Thread Jason Rutherglen
The main issue is if you're using facets, which are currently inefficient for the realtime use case because they're created on the entire set of segment/readers. Field caches in Lucene are per segment and so don't have this problem. On Tue, May 25, 2010 at 4:09 AM, Grant Ingersoll wrote: > How m

Cleanup of indexes using Solr 1.4 Replication

2010-05-12 Thread Jason Rutherglen
Is the cleanup of indexes using Solr 1.4 Replication documented somewhere? I can't find any information regarding this at: http://wiki.apache.org/solr/SolrReplication Too many snapshot indexes are being left around, and so they need to be cleaned up.

Re: Replicate cores from master to slave

2010-04-28 Thread Jason Rutherglen
cores. > > Just throwing that out there since Im not sure even the ZooKeeper setup would > include something like this. > > - Jon > > On Apr 28, 2010, at 10:14 AM, Jason Rutherglen wrote: > >> I guess I didn't explain it properly. I want to create a core on >&g

Re: Specify the spellchecker by name in the request URL?

2010-04-28 Thread Jason Rutherglen
Ahmet, thanks, however it's un-intuitive, it should be spellchecker.name? On Wed, Apr 28, 2010 at 12:01 PM, Ahmet Arslan wrote: >> Multiple spellcheckers may be >> specified by name in solrconfig, such as >> "jarowinkler", however >> how does one make a >> request to this particular spellchecker,

Specify the spellchecker by name in the request URL?

2010-04-28 Thread Jason Rutherglen
Multiple spellcheckers may be specified by name in solrconfig, such as "jarowinkler", however how does one make a request to this particular spellchecker, as opposed to the one named "default"?

Re: Replicate cores from master to slave

2010-04-28 Thread Jason Rutherglen
28, 2010 at 10:14 AM, Jason Rutherglen > wrote: >> I guess I didn't explain it properly. I want to create a core on >> the master, and then have N slaves also (aka replicate) create >> those new core(s) on the slave servers, then of course, begin to >> replicate (ye

Re: Replicate cores from master to slave

2010-04-28 Thread Jason Rutherglen
I guess I didn't explain it properly. I want to create a core on the master, and then have N slaves also (aka replicate) create those new core(s) on the slave servers, then of course, begin to replicate (yeah, got that part). There doesn't appear to be anything today that does this, it's unclear ho
