Re: Faceting on a multi-valued field by index

2010-04-08 Thread Lance Norskog
Nope! Lucene is committed to maintaining the order of values added to a field, but does not have this feature. On Thu, Apr 8, 2010 at 6:44 PM, Blargy wrote: > > Is there any way to facet on a multi-valued field at a particular index? > > For example, I have a field category_ids which is multi-valu

Re: [search_dev] Re: Opinions on Facet+Fulltext behavior?

2010-04-08 Thread Lance Norskog
This is how http://www.lucidimagination.com/search works. On Thu, Apr 8, 2010 at 2:54 PM, Yonik Seeley wrote: > On Thu, Apr 8, 2010 at 5:44 PM, Mark Bennett wrote: >> A while back I had asked about the proper behavior when combining fulltext >> search with Tags or Faceted filtering.  Generally f

Re: Index "transaction log" or equivalent?

2010-04-08 Thread Lance Norskog
Log everything to SLF? SLF & Log4j include a key/value pair map for each logging event. This is perfect for storing the fields, and log4j has a thing that writes raw logging event objects to a target server: http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/net/SocketServer.html All yo

Faceting on a multi-valued field by index

2010-04-08 Thread Blargy
Is there any way to facet on a multi-valued field at a particular index? For example, I have a field category_ids which is multi-valued containing category ids. The first value in that field is always the root category and I would like to be able to facet on just that one field. Is this possible w
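Since Lucene cannot facet on "the first value" of a multi-valued field, one possible workaround (an assumption, not something proposed in this thread) is to copy that first value into its own single-valued field before the document is sent to Solr, and then facet on the new field with facet.field=root_category_id. The field name root_category_id is hypothetical and would need a matching single-valued field in schema.xml.

```python
def add_root_category(doc):
    """Return a copy of doc with a single-valued root_category_id field.

    Assumes, as stated in the question, that category_ids[0] is always the
    root category. 'root_category_id' is a hypothetical field name.
    """
    out = dict(doc)
    ids = doc.get("category_ids") or []
    if ids:
        out["root_category_id"] = ids[0]
    return out

doc = {"id": "42", "category_ids": [7, 19, 23]}
print(add_root_category(doc)["root_category_id"])  # prints 7
```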

Re: Is there any other tool other than DIH to index a database

2010-04-08 Thread Lance Norskog
Nice! On Thu, Apr 8, 2010 at 6:50 AM, Brendan Grainger wrote: > For what it's worth, it's also really easy to implement your own > EntityProcessor. Extend from EntityProcessorBase then implement the getNext > method to return a Map representing the row you want indexed. > I did exactly this so

Re: Multi-core memory problem

2010-04-08 Thread Lance Norskog
Since the facet "cache" is hard-allocated and has no eviction policy, you could do a facet query on each core as part of the warm-up. This way, the facets will not fail. At that point, you can tune the Solr cache sizes. Solr caches documents, searches, and filter queries. Filter queries are sets

Re: use a solr-built index with lucene?

2010-04-08 Thread Erik Hatcher
Yes... gotta jive with schema.xml though. Erik On Apr 8, 2010, at 7:18 PM, Tommy Chheng wrote: If i build an index with solr, is it possible to use the index folder with lucene? -- Tommy Chheng Programmer and UC Irvine Graduate Student Twitter @tommychheng http://tommy.chheng.com

autocommiting with expungeDeletes=true

2010-04-08 Thread Giovanni Fernandez-Kincade
Is there any way to configure autocommit to expungeDeletes? Looking at the code, it seems that there isn't... From org.apache.solr.update.DirectUpdateHandler2: public synchronized void run() { long started = System.currentTimeMillis(); try { CommitUpdateCommand command =
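If autocommit really cannot pass expungeDeletes, a possible workaround (an assumption, not confirmed in the thread) is to turn off autocommit and have a client-side timer send explicit commits instead. The sketch below only builds the XML commit message; it assumes your Solr version's XML update handler honours the expungeDeletes attribute on commit (check this first), and that the message would be POSTed with Content-Type text/xml to the core's update handler (commonly /update in the example config).

```python
import xml.etree.ElementTree as ET

def commit_message(expunge_deletes=True):
    """Build an explicit Solr XML commit message.

    The expungeDeletes attribute is assumed to be supported by the target
    Solr version's XML update handler; verify before relying on it.
    """
    elem = ET.Element("commit")
    if expunge_deletes:
        elem.set("expungeDeletes", "true")
    return ET.tostring(elem, encoding="unicode")

print(commit_message())  # prints <commit expungeDeletes="true" />
```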

Re: Elevate query and standard RH

2010-04-08 Thread Chris Hostetter
: I found elevate query working fine with dismax handler when I added the : searchComponent to my Dismax RH. : : Couldn't find the desired results when trying with the standard : RequestHandler. Hope it works just like that with the Standard RH also. Did you add it to the declaration for your st

Re: MoreLikeThis function queries

2010-04-08 Thread Blargy
Not yet. I've been stuck trying to figure out what the hell is happening with my delta-imports: http://n3.nabble.com/Need-help-with-StackOverflowError-td704451.html#a704451

Re: MoreLikeThis function queries

2010-04-08 Thread Chris Hostetter
: Are function queries possible using the MLT request handler? How about using : the _val_ hack? Thanks for your help Did you try it? The MoreLikeThisHandler uses the same parsing logic to build its initial query as the SearchHandler, so you can use whatever parsers you want (including the "{

use a solr-built index with lucene?

2010-04-08 Thread Tommy Chheng
If I build an index with Solr, is it possible to use the index folder with Lucene? -- Tommy Chheng Programmer and UC Irvine Graduate Student Twitter @tommychheng http://tommy.chheng.com

Re: Minimum Should Match the other way round

2010-04-08 Thread Chris Hostetter
: However, I got some doubts on this: What about queries that should be : filtered with the WordDelimiterFilter. This could make a large difference to : a non-delimiter-filtered MAX_LEN *and* it has got a protwords param. I : can't instantiate a new WordDelimiterFilter every time I do a query, so

Re: [search_dev] Re: Opinions on Facet+Fulltext behavior?

2010-04-08 Thread Yonik Seeley
On Thu, Apr 8, 2010 at 5:44 PM, Mark Bennett wrote: > A while back I had asked about the proper behavior when combining fulltext > search with Tags or Faceted filtering. Generally folks agree that > Tags/Facets should further filter search results. The search text > should be "sticky" whe

Re: [search_dev] Re: Opinions on Facet+Fulltext behavior?

2010-04-08 Thread Mark Bennett
A while back I had asked about the proper behavior when combining fulltext search with Tags or Faceted filtering. Generally folks agree that Tags/Facets should further filter search results. The search text should be "sticky" when facets are clicked. The issue was whether selected Tag or

Tutorials for developing filter plugins.

2010-04-08 Thread Michael
Hi all, I was wondering whether any of you know any good tutorials online describing how to develop a custom filter plugin. I have been trying to create a latitude/longitude bounding box filter using two NumericRangeFilters in a ChainedFilter object to no avail. No documents are returned by getDoc
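A bounding box is the intersection (AND) of a latitude range and a longitude range. The plain-Python model below is not Lucene code; it only computes which document ids a correct lat/long filter should return for a small test set, so the Lucene results can be compared against it. One thing that may be worth checking (hedged, from memory of Lucene's contrib ChainedFilter): its default combining logic is OR, so AND has to be requested explicitly. The field names lat and lon are hypothetical.

```python
def range_filter(docs, field, lo, hi):
    """Ids of docs whose field lies in [lo, hi] (both ends inclusive)."""
    return {doc_id for doc_id, d in docs.items() if lo <= d[field] <= hi}

def bounding_box(docs, min_lat, max_lat, min_lon, max_lon):
    """AND of a latitude range and a longitude range."""
    return (range_filter(docs, "lat", min_lat, max_lat)
            & range_filter(docs, "lon", min_lon, max_lon))

docs = {
    "a": {"lat": 40.7, "lon": -74.0},
    "b": {"lat": 51.5, "lon": -0.1},
    "c": {"lat": 40.0, "lon": 10.0},  # latitude matches, longitude does not
}
print(bounding_box(docs, 40.0, 42.0, -75.0, -73.0))  # prints {'a'}
```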

Re: Handling missing date fields in a date-oriented function query

2010-04-08 Thread Chris Harris
If anyone is curious, I've created a patch that creates a variant of map that can be used in the way indicated below. See http://issues.apache.org/jira/browse/SOLR-1871 On Wed, Apr 7, 2010 at 3:41 PM, Chris Harris wrote: > Option 1. Use map > > The most obvious way to do this would be to wrap th

Re: including external files in config by corename

2010-04-08 Thread Chris Hostetter
: I'm having a problem with this idea. It seems that what you include with : XInclude can only be a single XML element, not "big chunks" as I had hoped. : If you have more than one, it dies. It took running it through xmllint to : figure out why, the Solr exception was not informative. This wil

Re: Need help with StackOverflowError

2010-04-08 Thread Blargy
Also, if I remove my deletedPkQuery on the root entity, the delta-import will complete successfully. Does anyone have any idea how a deletedPkQuery would end up in this circular StackOverflowError? FYI, I have a logical model called "item" and whenever an item gets deleted it gets moved over to t

Re: index corruption / deployment strategy

2010-04-08 Thread Erik Hatcher
Kallin, It's a very rare report, and practically impossible I'm told, to corrupt the index these days thanks to Lucene's improvements over the last several releases (ignoring hardware malfunctions). A single index is the best way to go, in my opinion - though at your scale you're probably

index corruption / deployment strategy

2010-04-08 Thread Nagelberg, Kallin
Hi everyone, I've been doing work evaluating Solr for use on a high-traffic website for some time and things are looking positive. I have some concerns from my higher-ups that I need to address. I have suggested that we use a single index in order to keep things simple, but there are suggestions

Re: Handling missing date fields in a date-oriented function query

2010-04-08 Thread Chris Harris
On Wed, Apr 7, 2010 at 7:10 PM, Lance Norskog wrote: >> Since min(a,b) == -1*max(-1*a, -1*b), you could rewrite the previous >> expression using this more complicated logic and it would work. But >> that's ugly. >> >> Also, it would crash anyway. It looks like max currently requires one >> of its
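The identity quoted above holds because negating both arguments reverses their order, so the largest negated value is the negation of the smallest original value. A quick numeric check of the arithmetic (this says nothing about whether Solr's max function accepts two non-constant arguments, which the message says it currently does not):

```python
def min_via_max(a, b):
    # min(a, b) == -1 * max(-1*a, -1*b): negation flips the ordering,
    # so the maximum of the negated values is minus the minimum.
    return -1 * max(-1 * a, -1 * b)

for a, b in [(3, 5), (5, 3), (-2, 4), (0, 0), (2.5, -7.0)]:
    assert min_via_max(a, b) == min(a, b)
```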

Re: Index "transaction log" or equivalent?

2010-04-08 Thread Erik Hatcher
And there's an open issue where this sort of feature can be contributed: https://issues.apache.org/jira/browse/SOLR-903 Though in that issue there are two different approaches mentioned, one being purely SolrJ client-side (my original intention in opening the issue), but also what Mark m

Re: solr.WordDelimiterFilterFactory problem with hyphenated terms?

2010-04-08 Thread Erick Erickson
Thanks, that'll keep me from running down the wrong path. Erick On Thu, Apr 8, 2010 at 11:22 AM, Robert Muir wrote: > right, it's fixed only in the "new trunk": > http://svn.apache.org/repos/asf/lucene/dev/trunk/ > > nothing has been changed with regards to the solr 1.5 branch yet. > > On Thu

Re: Index "transaction log" or equivalent?

2010-04-08 Thread Rich Cariens
Thanks Mark. That's sort of what I was thinking of doing. On Thu, Apr 8, 2010 at 10:33 AM, Mark Miller wrote: > On 04/08/2010 09:23 AM, Rich Cariens wrote: > >> Are there any best practices or built-in support for keeping track of >> what's >> been indexed in a Solr application so as to support

Re: solr.WordDelimiterFilterFactory problem with hyphenated terms?

2010-04-08 Thread Robert Muir
Right, it's fixed only in the "new trunk": http://svn.apache.org/repos/asf/lucene/dev/trunk/ Nothing has been changed with regards to the Solr 1.5 branch yet. On Thu, Apr 8, 2010 at 10:01 AM, Erick Erickson wrote: > You're right, it sure looks related. But according to that JIRA, it's > fixed >

Re: solr.WordDelimiterFilterFactory problem with hyphenated terms?

2010-04-08 Thread Erick Erickson
I'm not all that familiar with the underlying issues, but of the two I'd pick moving the WordDelimiterFilterFactory rather than setting increments = "false". But that's at least partly a guess. Best, Erick On Thu, Apr 8, 2010 at 11:00 AM, Demian Katz wrote: > Thanks for looking into this -- I appre

RE: solr.WordDelimiterFilterFactory problem with hyphenated terms?

2010-04-08 Thread Demian Katz
Thanks for looking into this -- I appreciate the help (and feel a little better that there seems to be a bug at work here and not just my total incomprehension). Sorry for any confusion over the UnicodeNormalizationFactory -- that's actually a plug-in from the SolrMarc project (http://code.goog

Re: Index "transaction log" or equivalent?

2010-04-08 Thread Mark Miller
On 04/08/2010 09:23 AM, Rich Cariens wrote: Are there any best practices or built-in support for keeping track of what's been indexed in a Solr application so as to support a full rebuild? I'm not indexing from a single source, but from many, sometimes arbitrary, sources including: 1. A doc

Re: Short Question: Fills this entity multiValued Fields (DIH)?

2010-04-08 Thread MitchK
Thank you Alexey.

Re: numFound:0 when documents exists

2010-04-08 Thread Erick Erickson
We can't help with the information you've provided. Please review: http://wiki.apache.org/solr/UsingMailingLists Best Erick On Thu, Apr 8, 2010 at 7:23 AM, Pooja Verlani wrote: > Hi, > In our search engine, we are getting numFound to be "0" for some queries > where documents actually exist and a

RE: solr best practice to submit many documents

2010-04-08 Thread Wawok, Brian
As a follow-up for anyone who is watching: I changed from sending 100,000 separate posts in Python to streaming with the Java client (not sure if it's XML or CSV behind the scenes), and my time to send the data dropped from 340 seconds to 88 seconds. May try and figure out a better streaming
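The speedup above comes largely from not paying one HTTP round trip per document. A similar effect in Python might be had by packing many documents into each Solr XML add message; the sketch below only builds the messages (POSTing them is left out), and the batch size of 1000 is an arbitrary assumption to tune.

```python
import xml.etree.ElementTree as ET

def add_message(docs):
    """Serialize a list of dicts into one Solr XML <add> message."""
    add = ET.Element("add")
    for doc in docs:
        d = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # A list becomes a multi-valued field: one <field> per value.
            values = value if isinstance(value, list) else [value]
            for v in values:
                ET.SubElement(d, "field", name=name).text = str(v)
    return ET.tostring(add, encoding="unicode")

def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

docs = [{"id": str(i), "title": "doc %d" % i} for i in range(2500)]
messages = [add_message(batch) for batch in batches(docs, 1000)]
print(len(messages))  # prints 3
```

Each message would then be sent as one request, so 100,000 documents become 100 requests instead of 100,000.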

Re: solr.WordDelimiterFilterFactory problem with hyphenated terms?

2010-04-08 Thread Erick Erickson
You're right, it sure looks related. But according to that JIRA, it's fixed in trunk and I'm pretty sure I have a very recent version that I built from code I updated within the last few days. I'll update tonight and double check. If it's still a problem I'll see if I can write a test case illust

Re: Is there any other tool other than DIH to index a database

2010-04-08 Thread Brendan Grainger
For what it's worth, it's also really easy to implement your own EntityProcessor. Extend from EntityProcessorBase, then implement the getNext method to return a Map representing the row you want indexed. I did exactly this so I could reuse my Hibernate domain models to query for the data ins

Index "transaction log" or equivalent?

2010-04-08 Thread Rich Cariens
Are there any best practices or built-in support for keeping track of what's been indexed in a Solr application so as to support a full rebuild? I'm not indexing from a single source, but from many, sometimes arbitrary, sources including: 1. A document repository that fires events (containing
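One minimal way to keep such a record (a sketch under assumptions, not a built-in Solr feature) is an append-only log with one JSON line per indexed document, tagged with its source, which can be replayed for a full rebuild. It assumes every document carries its uniqueKey in an "id" field and keeps only the latest version per id; deletes are not handled, and the source names are hypothetical.

```python
import io
import json

def log_document(log_file, source, doc):
    """Append one indexed document, tagged with its source, as a JSON line."""
    log_file.write(json.dumps({"source": source, "doc": doc}) + "\n")

def replay(log_lines):
    """Return the latest logged version of each document, keyed on "id"."""
    latest = {}
    for line in log_lines:
        if line.strip():
            entry = json.loads(line)
            latest[entry["doc"]["id"]] = entry["doc"]
    return list(latest.values())

log = io.StringIO()
log_document(log, "repository-events", {"id": "1", "title": "v1"})
log_document(log, "web-crawl", {"id": "2", "title": "other"})
log_document(log, "repository-events", {"id": "1", "title": "v2"})
docs = replay(log.getvalue().splitlines())
```

Here docs holds the second version of document 1 followed by document 2, ready to be re-sent to an empty index.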

Re: Is there any other tool other than DIH to index a database

2010-04-08 Thread Shawn Heisey
On 4/7/2010 9:26 PM, bbarani wrote: Hi, I am currently using DIH to index the data from a database. I am just trying to figure out if there are any other open source tools which I can use just for indexing purposes and use SOLR for querying. I also thought of writing custom code for retrieving

Re: Solr DataImportHandler

2010-04-08 Thread Shawn Heisey
On 4/8/2010 7:05 AM, Shawn Heisey wrote: Here's what I'm using as the query in my latest config: Actually, that was three separate queries: query="SELECT * FROM ${dataimporter.request.dataTable} WHERE did > ${dataimporter.request.minDid} AND did <= ${dataimporter.request.maxDid} AND (did %
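The ${dataimporter.request.*} variables in that query are filled from request parameters, so a range-by-range import can be driven by building one URL per did range. The sketch below assumes the handler lives at /dataimport and uses a hypothetical table name. It passes clean=false because a DIH full-import otherwise clears the index before importing, which would make each range wipe out the previous one. Waiting for each import to finish before starting the next (for example by polling command=status) is left out.

```python
from urllib.parse import urlencode

def import_urls(base_url, table, min_did, max_did, step):
    """Build one full-import URL per did range (minDid, maxDid]."""
    urls = []
    lo = min_did
    while lo < max_did:
        hi = min(lo + step, max_did)
        params = {"command": "full-import", "clean": "false",
                  "dataTable": table, "minDid": lo, "maxDid": hi}
        urls.append(base_url + "?" + urlencode(params))
        lo = hi
    return urls

urls = import_urls("http://localhost:8983/solr/dataimport", "docs", 0, 250000, 100000)
print(len(urls))  # prints 3
```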

Re: Solr DataImportHandler

2010-04-08 Thread Shawn Heisey
On 4/8/2010 2:11 AM, Mark N wrote: Is it possible to use Solr DataImportHandler when the database fields are not fixed? As per my findings, we need to configure which table (entity) we will read the data from, and which database fields will map to which fields in the Solr schema. Since in my cas

Re: Error sorting random field with Solr 1.4

2010-04-08 Thread SandeepTagore
Commented out the following line in org.apache.lucene.search.SortField return comparatorSource.newComparator(field, numHits, sortPos, reverse); and added the following line temporarily. return new FieldComparator.DoubleComparator(numHits, field, parser); Changed geo_distance type to double from sdoub

Re: solr.WordDelimiterFilterFactory problem with hyphenated terms?

2010-04-08 Thread Robert Muir
Erick, this sounds like https://issues.apache.org/jira/browse/SOLR-1852 On Wed, Apr 7, 2010 at 10:04 PM, Erick Erickson wrote: > Well, for a quick trial using trunk, I had to remove the > UnicodeNormalizationFactory, is that yours? > > But with that removed, I get the results you do, ASSUMING tha

Re: Spatial / Local Solr radius

2010-04-08 Thread SandeepTagore
I faced the same problem when I used locallucene 1.5 and localsolr 1.5. Now I am using localsolr 2.0 and locallucene 2.0 and I don't see that issue. You need to upgrade the binaries.

Re: Error sorting random field with Solr 1.4

2010-04-08 Thread SandeepTagore
I get the same error when I try to sort the result by geo_distance. I am using Solr 1.4 (Nov 2009 release), lucene 2.9.1 and localsolr 2.0. Thank you very much for your support. Here is the stacktrace... SEVERE: java.lang.NullPointerException at org.apache.lucene.search.SortField.getCompa

numFound:0 when documents exists

2010-04-08 Thread Pooja Verlani
Hi, In our search engine, we are getting numFound to be "0" for some queries where documents actually exist and are also returned. It randomly sometimes returns numFound="0". Does anyone have an idea what the possible reason could be? Regards, Pooja

Re: Tag Cloud Generation Problem

2010-04-08 Thread Ninad Raut
Thanks it worked... with some innovations of my own. :) On Thu, Apr 8, 2010 at 2:25 PM, Markus Jelsma wrote: > Hi, > > > It's simpler than you might think :) > > ?q=*:*&facet=true&facet.field=buzzWord&rows=0 > > This will retrieve an overall facet count (useful for navigation and tag > cloud > g

Re: Multi-core memory problem

2010-04-08 Thread Victoria Kagansky
I noticed now that the OutOfMemory exception occurs upon faceting queries. Queries without facets do return successfully. There are two log types upon the exception. The queries causing them differ only in q parameter, the faceting and sorting parameters are the same. I guess this has something to

Berlin Buzzwords - early registration extended

2010-04-08 Thread Isabel Drost
Hello, we would like to invite everyone interested in data storage, analysis and search to join us for two days on June 7/8th in Berlin for an in-depth, technical, developer-focused conference located in the heart of Europe. Presentations will range from beginner friendly introductions on the

Re: Short Question: Fills this entity multiValued Fields (DIH)?

2010-04-08 Thread Alexey Serba
> Have a look at these two lines: > > >                 > > > If there is more than one description per item_ID, does the features-field > gets multiple values if it is defined as multiValued=true? Correct.

Re: Tag Cloud Generation Problem

2010-04-08 Thread Markus Jelsma
Hi, It's simpler than you might think :) ?q=*:*&facet=true&facet.field=buzzWord&rows=0 This will retrieve an overall facet count (useful for navigation and tag cloud generation) but doesn't return the documents themselves. Check the faceting wiki [1] for more information. [1]: http://wiki
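Building on that query, a tag-cloud threshold can be pushed into Solr with facet.mincount, and facet.limit=-1 asks for every term rather than the default top 100. Note that facet counts are numbers of matching documents containing each term, not sums of the per-document frequencies the original poster stores. The sketch assumes JSON output (wt=json) with the default flat layout, where facet values come back as [term1, count1, term2, count2, ...]; the sample response below is made up.

```python
import json
from urllib.parse import urlencode

def tag_cloud_params(field="buzzWord", min_count=1):
    """Query string for an all-documents facet on `field`, no rows returned."""
    return urlencode({"q": "*:*", "rows": 0, "wt": "json", "facet": "true",
                      "facet.field": field, "facet.limit": -1,
                      "facet.mincount": min_count})

def parse_facets(response_text, field="buzzWord"):
    """Turn the flat [term, count, term, count, ...] list into a dict."""
    flat = json.loads(response_text)["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))

sample = '{"facet_counts":{"facet_fields":{"buzzWord":["solr",12,"lucene",9]}}}'
print(parse_facets(sample))  # prints {'solr': 12, 'lucene': 9}
```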

Re: Tag Cloud Generation Problem

2010-04-08 Thread Ninad Raut
Hi Markus, But the problem is, we do not know the words beforehand. What would the facet query be? If you could explain it with an example, that would be really nice of you. Regards, Ninad R On Thu, Apr 8, 2010 at 2:09 PM, Markus Jelsma wrote: > The faceting engine can do this job. > > > >

Re: Tag Cloud Generation Problem

2010-04-08 Thread Markus Jelsma
The faceting engine can do this job. On Thursday 08 April 2010 10:16:09 Ninad Raut wrote: > Hi, > > I have a business use case wherein I have to generate a tag cloud for words > with frequency greater than a specified threshold. > > The way I store records in solr is : > For every solr docum

Tag Cloud Generation Problem

2010-04-08 Thread Ninad Raut
Hi, I have a business use case wherein I have to generate a tag cloud for words with frequency greater than a specified threshold. The way I store records in Solr is: For every Solr document (which includes content) I store a multivalued entry of buzzwords with their frequency. The technical pr

Solr DataImportHandler

2010-04-08 Thread Mark N
Is it possible to use Solr DataImportHandler when the database fields are not fixed? As per my findings, we need to configure which table (entity) we will read the data from, and which database fields will map to which fields in the Solr schema. Since in my case database fields could be dynamic,

Separate index files

2010-04-08 Thread S Goorov
Hello there, how often is Solr's ability to store the index in separate files for different things (for example, products) used within a single Solr instance? The aim is to maintain separate files for backup, independent re-indexing, and possibly other purposes. To what extent are such solutions useful? Thanks Se

Re: Multi-core memory problem

2010-04-08 Thread Victoria Kagansky
The queries do require sorting (on int) and faceting. They should fetch first 200 docs. The current problematic core has 10 entries in fieldCache and 5 entries in filterCache. The other caches are empty. Is there any way to know how much memory specific cache takes? The problem is that one core b
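For a rough idea of how much memory the caches mentioned above can take, some back-of-the-envelope arithmetic may help. This is a sketch under assumptions: sorting on an int field loads one 4-byte int per document into the Lucene FieldCache, and a filterCache entry is at most a bitset of maxDoc bits (small sets may be stored more compactly). Faceting structures for multi-valued fields are not covered here, and maxDoc below is a hypothetical index size.

```python
def int_sort_bytes(max_doc, num_fields=1):
    """FieldCache cost of sorting on int fields: 4 bytes per doc per field."""
    return 4 * max_doc * num_fields

def filter_cache_upper_bound_bytes(max_doc, entries):
    """Upper bound for filterCache: each entry at most maxDoc bits."""
    return entries * ((max_doc + 7) // 8)

max_doc = 10_000_000  # hypothetical; use the core's actual maxDoc
print(int_sort_bytes(max_doc) / 2**20)                     # about 38.1 MiB
print(filter_cache_upper_bound_bytes(max_doc, 5) / 2**20)  # about 6.0 MiB
```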