Autocommit & Index Size

2011-12-06 Thread Husain, Yavar
In solrconfig.xml I was experimenting with Indexing Performance. When I set the maxDocs (in autoCommit) to say 1 document, the index size is double what it is if I just don't use autoCommit (i.e. keep it commented out, i.e. commit at the end only after adding documents). Does autoCommit affect the index

Re: [Announce] Solr-RA, Solr with RankingAlgorithm

2011-12-06 Thread yu shen
Hi Nagendra, I tried to use solr-nrt-ra-3.4, but the dataimporthandler does not work. The error message is: INFO: created /dataimport: org.apache.solr.handler.dataimport.DataImportHandler Dec 6, 2011 1:16:18 AM org.apache.solr.common.SolrException log SEVERE: java.lang.IncompatibleClassChangeEr

LineEntityProcessor

2011-12-06 Thread Oleg Tikhonov
Hello everybody, I'm trying to use the LineEntityProcessor of DIH, but so far without success. I've created data-lep-config.xml and added a request handler in solrconfig.xml. During full-import I get a response saying that x rows were fetched, 0 docs added/updated. I also defined a very basic regex for Reg

lower score for synonyms

2011-12-06 Thread Robert Brown
is it possible to lower the score for synonym matches? we setup... admin => administration but if someone searches specifically for "admin", we want those specific matches to rank higher than matches for "administration" -- IntelCompute Web Design & Local Online Marketing http://www.inte

highlight 1 field twice

2011-12-06 Thread Robert Brown
When searching against 1 field, is it possible to have highlighting returned 2 different ways? We'd like the full field returned with keywords highlighted, but then also returned as snippets. Any possible approaches? -- IntelCompute Web Design & Local Online Marketing http://www.intelcomp

Testing a custom implementation of CommonsHttpSolrServer

2011-12-06 Thread Mark Swinson
Hi, I want to test a custom implementation of CommonsHttpSolrServer, which is required so that we can enable it to use SSL certificates and proxies when accessing the Solr REST API. One thing I want to avoid is having to have a Solr instance set up on every developer's sandbox in order for the tests

Re: lower score for synonyms

2011-12-06 Thread Marc SCHNEIDER
Hello, You could create another field and attach the synonym analyzer to it. When querying, set a lower boost for this field. Marc. On Tue, Dec 6, 2011 at 11:31 AM, Robert Brown wrote: > is it possible to lower the score for synonym matches? > > we setup... > > admin => administration > > but if
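The two-field approach suggested above can be sketched as a dismax request in which the exact field carries a higher boost than the synonym-expanded copy. This is only a minimal sketch: the field names `title` and `title_syn` and the boost values are hypothetical and do not come from the thread.

```python
from urllib.parse import urlencode


def build_dismax_params(user_query, exact_field="title", synonym_field="title_syn",
                        exact_boost=2.0, synonym_boost=0.5):
    """Return dismax parameters that weight exact matches above synonym matches.

    Field names and boosts are illustrative assumptions, not values from the thread.
    """
    # qf lists the fields to search; each field carries a ^boost suffix.
    qf = f"{exact_field}^{exact_boost} {synonym_field}^{synonym_boost}"
    return {"defType": "dismax", "q": user_query, "qf": qf}


params = build_dismax_params("admin")
query_string = urlencode(params)  # ready to append to a /select URL
```

With such a setup a document matching "admin" literally scores on both fields, while a document matching only through the synonym scores on the lower-boosted field alone.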

Re: highlight 1 field twice

2011-12-06 Thread Erik Hatcher
Within one request, it isn't possible to highlight the same field twice differently (what's the use case here?), but you could either make multiple requests or copyField to have two stored copies that could be highlighted separately in a single request. Erik On Dec 6, 2011, at 06:01 ,
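The copyField option mentioned above might look like the following request parameters. This is a sketch under assumptions: the schema is assumed to copy a stored field `body` into a second stored field `body_full` (both names are hypothetical), and a per-field `hl.fragsize` of 0 is used to ask for the whole field value rather than snippets.

```python
def build_highlight_params(query, snippet_field="body", full_field="body_full"):
    """Build highlighting parameters for two stored copies of the same text.

    Assumes a copyField from snippet_field to full_field exists in schema.xml;
    both field names are hypothetical.
    """
    return {
        "q": query,
        "hl": "true",
        # Highlight both copies in a single request.
        "hl.fl": f"{snippet_field},{full_field}",
        # Per-field override: fragment size 0 requests the whole field value.
        f"f.{full_field}.hl.fragsize": "0",
    }


params = build_highlight_params("keyword")
```

The snippet field keeps the default fragment settings, so one response would carry both the fragments and the fully highlighted text.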

Re: Testing a custom implementation of CommonsHttpSolrServer

2011-12-06 Thread Erik Hatcher
Mark - So you want the *server* to be started programmatically? You could use Jetty's API to do this... or fork a JVM. As for client-side SolrJ, you can pass an HttpClient to CommonsHttpSolrServer's constructor to customize how the HTTP connection is configured. EmbeddedSolrServer - no, it i

Solr request handler queries in fiddler

2011-12-06 Thread Kashif Khan
Hi all, I have developed a Solr request handler in which I am querying for shards and merging the results, but I do not see any queries in Fiddler. How can I track or capture the queries from the request handler in Fiddler to see the queries, and what settings do I have to change for that? Please hel

Solr sorting issue : can not sort on multivalued field

2011-12-06 Thread Ramesh kumar Velusamy
Hi, I am getting this weird error message `can not sort on multivalued field: fieldname` on all the indexed fields. This is the full error message from solr: HTTP Status 400 - can not sort on multivalued field: price; type: Status report; message: can not sort on multivalued field: price; descripti

Re: Document Processing

2011-12-06 Thread Erik Hatcher
As for XML "overloading" Solr... certainly it will add processing time to the situation as well as additional memory requirements. At worst it'd require more RAM and slow things down, but all depends on scale of ingestion rate and size of the documents whether it'd be prohibitive. Erik

Re: Replication downtime?? - master slave

2011-12-06 Thread Erick Erickson
Replication is basically a background file transfer, your slave shouldn't notice. But what your slave will notice is two things: 1> after replication if your first few queries are slow, you need to autowarm your caches. 2> you will see some memory footprint increase while autowarming is

Re: Grouping or Facet ?

2011-12-06 Thread Erick Erickson
OK, I'm not understanding here. You get the counts and the results if you facet on a single category field. The facet counts are the counts of the *values* in that field. So it would help me if you showed the output of faceting on a single category field and why that didn't work for you But ei

Re: Out of memory during the indexing

2011-12-06 Thread Erick Erickson
I'm going to defer to the folks who actually know the guts here. If you've turned down the cache entries for your Solr caches, you're pretty much left with Lucene caching which is a mystery... Best Erick On Mon, Dec 5, 2011 at 9:23 AM, Jeff Crump wrote: > Yes, and without doing much in the way o

UUID field changed when document is updated

2011-12-06 Thread blaise thomson
Hi, I've been trying to use the UUIDField in solr to maintain ids of the pages I've crawled with nutch (as per http://wiki.apache.org/solr/UniqueKey). The use case is that I want to have the server able to use these ids in another database for various statistics gathering. So I want the link u

Delays when deleting by query

2011-12-06 Thread Mike Gallan
Hello, We're encountering delays of 10+ minutes when trying to delete from our Solr 3.4 instance. We have 335k documents indexed and interface using SolrJ. Our schema basically consists of a parent object with multiple child objects. Every object is indexed as a separate document wit

Re: sub query parsing bug???

2011-12-06 Thread Erick Erickson
Hmmm, does this help? In Solr 1.4 and prior, you should basically set mm=0 if you want the equivalent of q.op=OR, and mm=100% if you want the equivalent of q.op=AND. In 3.x and trunk the default value of mm is dictated by the q.op param (q.op=AND => mm=100%; q.op=OR => mm=0%). Keep in mind the def
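The q.op to mm correspondence quoted above can be written down as a small helper. This is only an illustration of the mapping as stated in the message; the function name is made up and it is not part of Solr or SolrJ.

```python
def default_mm(q_op):
    """Return the mm value that corresponds to a q.op setting.

    Mirrors the mapping quoted in the thread: AND => 100%, OR => 0%.
    """
    op = q_op.strip().upper()
    if op == "AND":
        return "100%"
    if op == "OR":
        return "0%"
    raise ValueError(f"unsupported q.op value: {q_op!r}")


# On Solr 1.4 and earlier, mm would be passed explicitly alongside the query.
params = {"defType": "dismax", "q": "solr lucene", "mm": default_mm("AND")}
```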

Re: Solr Version Upgrade issue

2011-12-06 Thread Mark Miller
Looks like you must have a mix of old and new jars. On Tuesday, December 6, 2011, Pawan Darira wrote: > Hi > > I am trying to upgrade my SOLR version from 1.4 to 3.2. but it's giving me > below exception. I have checked solr home path & it is correct.. Please help > > SEVERE: Could not start Solr

Re: synonyms with dashes '-'

2011-12-06 Thread Erick Erickson
Details matter. Your analysis chain on the field may well be the issue. Look at the terms in the field (admin/schema browser). Look at &debugQuery=on to see how the query is parsed Look at the admin/analysis page to see the effects of the analysis chain. You might review: http://wiki.apache.org/s

Alternate score-based sorting for Solr Grouping

2011-12-06 Thread George Stathis
My previous subject line was not very scannable. Apologies for the re-post, I'm just hoping to get more eye-balls and hopefully some insights. Thank you in advance for your time. See below. -GS On Mon, Dec 5, 2011 at 1:37 PM, George Stathis wrote: > Currently, solr grouping (http://wiki.apache.

Best practice schema.xml when importing rich documents?

2011-12-06 Thread Pål Brattberg
I'm working with SOLR on amainly MS Word, Powerpoint, Excel and PDFs. Is there a best practice schema.xml and/or solrconfig.xml to use in SOLR when using the ExtractingRequestHandler? I have been doing tweaks to the default schema to attempt to get facets working on date modification times, but

Re: [Announce] Solr-RA, Solr with RankingAlgorithm

2011-12-06 Thread Nagendra Nagarajayya
Spark: The code is compiled to be compliant with JDK 1.5 and above. So you will need to use at least JDK 1.5 for this to work. BTW, make sure you add the lib path to the dataimporthandler-3.4.0.jar in your solrconfig.xml. If you want your data import to be searchable in real time, please make s

Re: Grouping or Facet ?

2011-12-06 Thread darren
Sorry to jump into this thread, but are you saying that the facet count is not # of result hits? So if I have 1 document with field CAT that has 10 values and I do a query that returns this 1 document with faceting, that the CAT facet count will be 10 not 1? I don't seem to be seeing that behavior
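The scenario in the question can be modelled with a toy counter. This is not Solr's implementation, only a sketch of the usual field-faceting rule that the count for a value is the number of matching documents containing that value; the document layout below is invented for illustration.

```python
from collections import Counter


def facet_counts(docs, field):
    """Count, for each value of `field`, how many documents contain it."""
    counts = Counter()
    for doc in docs:
        # set() so a value repeated inside one document is counted once.
        for value in set(doc.get(field, [])):
            counts[value] += 1
    return counts


# One matching document whose multivalued CAT field holds 10 distinct values.
docs = [{"id": 1, "CAT": [f"cat{i}" for i in range(10)]}]
counts = facet_counts(docs, "CAT")
```

Under this model the single document yields ten facet entries, each with a count of 1, rather than one entry with a count of 10.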

Solr Trunk Changes requires a reindex

2011-12-06 Thread Jamie Johnson
Are there any migration utilities to move from an index built by a Solr 4.0 snapshot to Solr Trunk? The issue is referenced here http://markmail.org/thread/4ruznwzofyrh776j https://issues.apache.org/jira/browse/LUCENE-3490

Re: Question on DIH delta imports

2011-12-06 Thread Mark
Anyone? On 12/5/11 11:04 AM, Mark wrote: *pk*: The primary key for the entity. It is *optional* and only needed when using delta-imports. It has no relation to the uniqueKey defined in schema.xml but they both can be the same. When using in a nested entity is the PK the primary key column of th

Use solr to search in a document repository

2011-12-06 Thread marotosg
Hi. I'm considering using Solr to search a huge document repository. My idea is to read documents (pdf, html, outlook, excel, doc, openoffice, powerpoint...), extract the information from them and index it in Solr. Basically I'm looking for a solution to search in my documents.

Lucene 4.0 Index Format

2011-12-06 Thread Jamie Johnson
Does anyone know if this has been finalized yet?

Solr Join with Dismax

2011-12-06 Thread Pascal Dimassimo
Hi, I was trying Solr Join across 2 cores on the same Solr installation. For example: /solr/index1/select?q={!join fromIndex=index2 from=tag to=tag}restaurant My understanding is that the "restaurant" query will be executed on index2 and the results of this query will be joined with the document
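The cross-core join request quoted above can be assembled programmatically so that the local-params prefix is URL-encoded correctly. A minimal sketch: the helper name is made up, and the core and field names are simply the ones from the example in the message.

```python
from urllib.parse import urlencode, parse_qs


def build_join_query(from_index, from_field, to_field, inner_query):
    """Return a q value using the {!join ...} local-params syntax."""
    return (f"{{!join fromIndex={from_index} from={from_field} "
            f"to={to_field}}}{inner_query}")


q = build_join_query("index2", "tag", "tag", "restaurant")
# urlencode escapes the braces, '!' and spaces so the URL is safe to send.
url = "/solr/index1/select?" + urlencode({"q": q})
```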

Re: Lucene 4.0 Index Format

2011-12-06 Thread Mark Miller
On Tue, Dec 6, 2011 at 12:51 PM, Jamie Johnson wrote: > Does anyone know if this has been finalized yet? > It's subject to change up till release. -- - Mark http://www.lucidimagination.com

Re: Autocommit & Index Size

2011-12-06 Thread Shawn Heisey
On 12/6/2011 1:01 AM, Husain, Yavar wrote: In solrconfig.xml I was experimenting with Indexing Performance. When I set the maxDocs (in autoCommit) to say 1 document, the index size is double what it is if I just don't use autoCommit (i.e. keep it commented out, i.e. commit at the end only after adding d

Solr tf ifd

2011-12-06 Thread Nejla Karacan
Hello, I need the tf-idf values from texts and now I'm using Apache Solr. I am a novice and have some problems. My question is, how can I extract the tf-idf values? There are many files in the folder apache-solr-3.5.0\example\solr\data\index but I can't use them. Is the Output only as a xml-Fil

Facet values that should always appear

2011-12-06 Thread Jamie Johnson
Is there a way within Solr to instruct the system that a certain set of values should always appear regardless of their counts when faceting?

Re: Lucene/Solr

2011-12-06 Thread Erick Erickson
If you're not using Drupal, understand that Solr is an *engine*, not a full application. You download solr from the website and install it, which is just basically unpacking it and executing "java -jar start.jar". From there you send documents to Solr (there are a number of ways to accomplish this).

RE: Autocommit & Index Size

2011-12-06 Thread Husain, Yavar
Hi Shawn Absolutely perfect. It is always great reading your answers again and again as you explain the concepts so very well. Three cheers and thanks for your reply. Regards, Yavar From: Shawn Heisey [s...@elyograg.org] Sent: Wednesday, December 07, 2011

Re: Multivalued field

2011-12-06 Thread Erick Erickson
Then sometime later id (all this in your schema.xml file). That's it. The data field isn't analyzed at all, so the type is largely irrelevant. what you put in it is all your pairs of doubles in some kind of delimited format, e.g. 2345.0, | 873945.7, Now you just get your data field back, spli
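The "store everything in one delimited field and split it on the client" idea can be sketched as below. The exact delimiter layout in the message is not recoverable, so this assumes a hypothetical format in which pairs are separated by `|` and the two doubles in a pair by `,`.

```python
def encode_pairs(pairs, pair_sep="|", value_sep=","):
    """Serialize (double, double) pairs into one string for a stored field.

    The separators are assumptions chosen for illustration.
    """
    return pair_sep.join(f"{a!r}{value_sep}{b!r}" for a, b in pairs)


def decode_pairs(text, pair_sep="|", value_sep=","):
    """Split the stored string back into a list of (float, float) tuples."""
    result = []
    for chunk in text.split(pair_sep):
        chunk = chunk.strip()
        if not chunk:
            continue  # tolerate empty input or a trailing separator
        a, b = chunk.split(value_sep)
        result.append((float(a), float(b)))
    return result


stored = encode_pairs([(2345.0, 873945.7), (1.5, -2.25)])
pairs = decode_pairs(stored)
```

Because the field is only stored and returned, never analyzed, any delimiter that cannot occur in a formatted double would serve equally well.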

Re: Use solr to search in a document repository

2011-12-06 Thread Pål Brattberg
Go for it, it's perfect for that! Here's a good starting point for you: http://lucene.apache.org/solr/tutorial.html / pål On Dec 6, 2011, at 6:31 PM, marotosg wrote: > Hi. > > I'm just thinking in the option of using solr to search in a huge document > repository. > My idea is reading > docum

Re: Solr Join with Dismax

2011-12-06 Thread Jeff Schmidt
Hi Pascal: I have an issue similar to yours, but also need to facet the joined documents... I've been playing with various things. There's not much documentation I can find. Looking at http://wiki.apache.org/solr/Join, in the fourth example you can see the join being relegated to a filter quer

Solr Lucene Index Version

2011-12-06 Thread Jamie Johnson
Is there a way to specify the index version solr uses? We're currently using SolrCloud but with the index format changing it'd be preferable to be able to specify a particular index format to avoid having to do a complete reindex. Is this possible?

Re: Solr Join with Dismax

2011-12-06 Thread Pascal Dimassimo
Hi, Thanks for this! But your "partner-tmo" request handler is probably configured with your ing-content index, no? In my case, I'd like to execute a dismax query on the fromIndex. On Tue, Dec 6, 2011 at 2:57 PM, Jeff Schmidt wrote: > Hi Pascal: > > I have an issue similar to yours, but also ne

Re: Continuous update on progress of "New SolrCloud Design" work

2011-12-06 Thread Per Steffensen
Yonik Seeley wrote: On Mon, Dec 5, 2011 at 6:23 AM, Per Steffensen wrote: Will it be possible to maintain a how-to-use section on http://wiki.apache.org/solr/NewSolrCloudDesign with examples, e.g. like the ones on http://wiki.apache.org/solr/SolrCloud, Yep, it was on my near-term tod

RE: Sharing dih "dictionaries"

2011-12-06 Thread Brent Mills
You're totally correct. There's actually a link on the DIH page now which wasn't there when I had read it a long time ago. I'm really looking forward to 4.0, it's got a ton of great new features. Thanks for the links!! -Original Message- From: Mikhail Khludnev [mailto:mkhlud...@griddy

Re: Solr Lucene Index Version

2011-12-06 Thread Alireza Salimi
Hi, I'm not sure if it would help. in solrconfig.xml: <luceneMatchVersion>LUCENE_34</luceneMatchVersion> On Tue, Dec 6, 2011 at 3:14 PM, Jamie Johnson wrote: > Is there a way to specify the index version solr uses? We're > currently using SolrCloud but with the index format changing it'd be > preferable to be able to specify a p

Re: Continuous update on progress of "New SolrCloud Design" work

2011-12-06 Thread Per Steffensen
Andy wrote: Hi, add features corresponding to stuff that we used to use in ElasticSearch Does that mean you have used ElasticSearch but decided to try SolrCloud instead? Yes, or at least we are looking for alternatives right now. Considering Solandra, SolrCloud, Katta, Riak Search,

Re: Solr Lucene Index Version

2011-12-06 Thread Jamie Johnson
Thanks, but I don't believe that will do it. From my understanding that does not control the index version written, it's used to control the behavior of some analyzers (taken from some googling). I'd love if someone told me otherwise though. On Tue, Dec 6, 2011 at 3:48 PM, Alireza Salimi wrote:

Re: UUID field changed when document is updated

2011-12-06 Thread Chris Hostetter
: I've been trying to use the UUIDField in solr to maintain ids of the : pages I've crawled with nutch (as per : http://wiki.apache.org/solr/UniqueKey). The use case is that I want to : have the server able to use these ids in another database for various : statistics gathering. So I want the l

Re: Solr Lucene Index Version

2011-12-06 Thread Erik Hatcher
Jamie - I think the best thing that you could do here would be to lock in a version of Lucene (all the Lucene libraries) that you use with SolrCloud. Certainly not out of the realm of possibilities of some upcoming SolrCloud capability that requires some upgrading of Lucene though, but you may

RE: Sharing dih "dictionaries"

2011-12-06 Thread Dyer, James
Just FYI that the final piece of SOLR-2382 has not been committed, and instead has been spun off to SOLR-2943. So if you're using Trunk and you need the ability to persist a cache on disk and then read it back again later as a DIH entity, you'll need both SOLR-2943 and also a cache implementat

Re: Solr Lucene Index Version

2011-12-06 Thread Jamie Johnson
So if I wanted to use lucene index 3.5 with SolrCloud I "should" be able to just move the 3.5 jars in and remove any of the snapshot jars that are present when I build locally? On Tue, Dec 6, 2011 at 4:06 PM, Erik Hatcher wrote: > Jamie - > > I think the best thing that you could do here would b

debugging failed documents

2011-12-06 Thread Alan Miller
Just getting started with DIH and I have a very simple setup. My dih-config.xml is querying my postgres db and does a select on a crosstab() table that returns just 100 rows. When I do a full-import I see that 22 docs fail, but what debug settings do I have to tweak to see why the docs failed?

Re: Solr Lucene Index Version

2011-12-06 Thread Erik Hatcher
Oh geez... no... I didn't mean 3.x JARs... I meant the trunk/4.0 ones that are there now. Erik On Dec 6, 2011, at 16:22 , Jamie Johnson wrote: > So if I wanted to use lucene index 3.5 with SolrCloud I "should" be > able to just move the 3.5 jars in and remove any of the snapshot jars >

Re: Document Processing

2011-12-06 Thread Tommaso Teofili
Hello Michael, I can help you with using the UIMA UpdateRequestProcessor [1]; the current implementation uses in-memory execution of UIMA pipelines but since I was planning to add the support for higher scalability (with UIMA-AS [2]) that may help you as well. Tommaso [1] : http://svn.apache.org

Re: Solr Lucene Index Version

2011-12-06 Thread Jamie Johnson
Problem is that really doesn't help me. We still have the same issue that when the 4.0 becomes final there is no migration utility from this pre 4.0 version to 4.0, right? On Tue, Dec 6, 2011 at 4:36 PM, Erik Hatcher wrote: > Oh geez... no... I didn't mean 3.x JARs... I meant the trunk/4.0 ones

To optimize or not - Solr vs Lucene

2011-12-06 Thread Scott Smith
Wasn't sure which mailing list to send this to. I'm writing an application that can be configured to run directly with lucene or with solr and I'm trying to figure out whether optimization of the index should be totally eliminated, eliminated in the lucene case only or what. If I read the 3.5

Re: Solr Lucene Index Version

2011-12-06 Thread Erik Hatcher
Right. Not sure what to advise you. We have worked on this problem with our LucidWorks platform and have some tools available to do this sort of thing, I think, but it's not generally something that you can do with Lucene going from a snapshot to a released version. Perhaps others with deeper

Re: Solr Lucene Index Version

2011-12-06 Thread Jamie Johnson
What about modifying something like SolrIndexConfig.java to change the lucene version that is used when creating the index? (may not be the right place, but is something like this possible?) On Tue, Dec 6, 2011 at 5:13 PM, Erik Hatcher wrote: > Right.  Not sure what to advise you.  We have worke

Re: To optimize or not - Solr vs Lucene

2011-12-06 Thread Yonik Seeley
On Tue, Dec 6, 2011 at 5:04 PM, Scott Smith wrote: > If I read the 3.5 lucene javadocs, optimize() has been deprecated because it > is "rarely justified" with the current lucene index implementation Its functionality is not being deprecated... it's just that the method is being renamed in Lucen

Re: Lucene 4.0 Index Format

2011-12-06 Thread Jamie Johnson
Thanks for the response Mark. Are there any details on the expected Freeze date (not looking for exacts) for this? I'm thinking I'm going to catch hell if I tell our team we need to reindex the entire data set. On Tue, Dec 6, 2011 at 1:25 PM, Mark Miller wrote: > On Tue, Dec 6, 2011 at 12:51 PM,

RE: debugging failed documents

2011-12-06 Thread Young, Cody
In my experience with DIH, the errors for failed documents end up in the log files. Catalina.out for Tomcat. Can you check your log files? Cody -Original Message- From: Alan Miller [mailto:alan.mill...@gmail.com] Sent: Tuesday, December 06, 2011 1:25 PM To: Solr Subject: debugging fail

Re: Attempting to achieve something similar to PostgreSQL's pg_trgm / K-NN combo with Solr

2011-12-06 Thread Chris Hostetter
: I'm working on using trigrams for similarity matching on some data, : where there's a canonical name and lots of personalised variants, e.g.: : : canonical: "My Wonderful Thing" : variant: "My Wonderful Thing (for Matt Patterson)" I'm really not sure why you would need trigrams for something

Re: two word phrase search using dismax

2011-12-06 Thread Erick Erickson
OK, why not just bump the boost on the site field way higher than you already have? A note of caution. You'll drive yourself crazy trying to get *exact* ordering based on some arbitrary (and usually changing) set of requirements. Put what you have working in front of product management and see if

Re: Solr's FieldValueCache and Lucene's FieldCache

2011-12-06 Thread Erick Erickson
Cool! thanks, Hoss. On Mon, Dec 5, 2011 at 6:40 PM, Chris Hostetter wrote: > > : Have you looked at: > : http://wiki.apache.org/solr/SolrCaching > > this page was actually a little light on details about fieldValueCache, so > i tried to fill in some of the blanks in the latest version. > > https:

Re: Solr tf ifd

2011-12-06 Thread Koji Sekiguchi
(11/12/07 3:42), Nejla Karacan wrote: Hello, I need the tf-idf-values from texts and now Im using Apache-Solr. I am a novice and have some Problems. My question is, how can I extract the tf-idf-values? Nejla, You can use TermVectorComponent on your field which is needed to be set termVector
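A request to the TermVectorComponent suggested above might be parameterized as follows. This is a sketch under assumptions: the parameter names (`tv`, `tv.fl`, `tv.tf`, `tv.df`, `tv.tf_idf`) are given as I recall them from the component's documentation, the handler name `tvrh` is assumed to match the example configuration, and the field name `text` is hypothetical. The field is assumed to be indexed with term vectors enabled.

```python
def build_term_vector_params(query, field, handler="tvrh"):
    """Build request parameters asking for per-term tf, df and tf-idf values.

    The handler and field names are assumptions; check them against your own
    solrconfig.xml and schema.xml.
    """
    return {
        "q": query,
        "qt": handler,        # route the request to the term-vector handler
        "tv": "true",         # switch the component on
        "tv.fl": field,       # field whose term vectors are returned
        "tv.tf": "true",      # term frequency within the document
        "tv.df": "true",      # document frequency across the index
        "tv.tf_idf": "true",  # combined value computed by the component
    }


params = build_term_vector_params("id:1", "text")
```

The response can be requested in XML or another response format through the usual `wt` parameter, so the output is not limited to raw index files.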

Invoking an updateRequestProcessorChain from updateHandler

2011-12-06 Thread Jan
Hi all, I'm wondering if it's possible to configure solrconfig.xml so that the updateHandler invokes an updateRequestProcessorChain? At the moment I have modified the /update requestHandler to invoke an updateRequestProcessorChain, which is working nicely. The catch is that I have to POST docume

Re: Invoking an updateRequestProcessorChain from updateHandler

2011-12-06 Thread Mark Miller
You should use the LWE forums for questions about it. The crawlers are hard coded to use the lucid-update-chain currently. If you want them to use the UIMA processor you will have to modify that chain definition to include it. On Dec 6, 2011, at 8:16 PM, Jan wrote: > Hi all, > > I'm wondering

Re: [Announce] Solr-RA, Solr with RankingAlgorithm

2011-12-06 Thread yu shen
thanks for the information 2011/12/6 Nagendra Nagarajayya > Spark: > > The code is compiled to be compliant with JDK 1.5 and above. So you will > need to use at least JDK 1.5 for this to work. > BTW, make sure you add the lib path to the dataimporthandler-3.4.0.jar in > you solrconfig.xml. If yo

Re: Solr Version Upgrade issue

2011-12-06 Thread Pawan Darira
I checked that. there are only latest jars. I am not able to figure out the issue. On Tue, Dec 6, 2011 at 6:57 PM, Mark Miller wrote: > Looks like you must have a mix of old and new jars. > > On Tuesday, December 6, 2011, Pawan Darira wrote: > > Hi > > > > I am trying to upgrade my SOLR version

Re: Memory Leak in Solr?

2011-12-06 Thread Samarendra Pratap
Hi, one of the problems is now alleviated. The number of lines with "can't identify protocol " in "lsof" output is now much reduced. Earlier it kept increasing up to "ulimit -n", thus causing a "Too many open files" error, but now it is contained to a much smaller number. This happened after I changed max

cache monitoring tools?

2011-12-06 Thread Dmitry Kan
Hello list, We've noticed quite a heavy strain on the filterCache in facet queries against trigram fields (see schema at the end of this e-mail). The typical query contains some keywords in the q parameter and a boolean filter query on other solr fields. It is also a facet query; the facet field is of ty

Re: Sharing dih "dictionaries"

2011-12-06 Thread Mikhail Khludnev
AFAIK DIH jar is separated from Solr war. Isn't there a chance to use DIH from 4.0 in Solr 3.4? James, Sorry for hijacking the thread. But, do you have a chance to review https://issues.apache.org/jira/browse/SOLR-2947 I want to provide a patch for fixing multi-threading in DIH. But formally speak

Solr or SQL fultext search

2011-12-06 Thread Mersad
Hi everyone, I am wondering how much benefit I get if I move from SQL Server to Solr in my text-based search project. Any help is appreciated! Best, Mersad

Re: Solr request handler queries in fiddler

2011-12-06 Thread Dmitry Kan
If you mean debugging the queries, you can use eclipse+jetty plugin setup ( http://code.google.com/p/run-jetty-run/) with solr web app ( http://hokiesuns.blogspot.com/2010/01/setting-up-apache-solr-in-eclipse.html ) On Tue, Dec 6, 2011 at 2:57 PM, Kashif Khan wrote: > Hi all, > > I have develope