date:20120208

Re: Which Tokeniser (and/or filter)

2012-02-08 Thread Rob Brown

Apologies if things were a little vague. Given the example snippet to index (numbered to show searches needed to match)... 1: i am a sales-manager in here 2: using asp.net and .net daily 3: working in design. 4: using something called sage 200. and i'm fluent 5: german sausages. 6: busy A&E dept

Re: is there any practice to load index into RAM to accelerate solr performance?

2012-02-08 Thread Ted Dunning

This is true with Lucene as it stands. It would be much faster if there were a specialized in-memory index such as is typically used with high performance search engines. On Tue, Feb 7, 2012 at 9:50 PM, Lance Norskog wrote: > Experience has shown that it is much faster to run Solr with a small

Re:Re: is there any practice to load index into RAM to accelerate solr performance?

2012-02-08 Thread James

But Solr does not have an in-memory index, am I right? At 2012-02-08 16:17:49,"Ted Dunning" wrote: >This is true with Lucene as it stands. It would be much faster if there >were a specialized in-memory index such as is typically used with high >performance search engines. > >On Tue, Feb

Re: is there any practice to load index into RAM to accelerate solr performance?

2012-02-08 Thread Patrick Plaatje

A start may be to use a RAM disk for that. Mount it as a normal disk and have the index files stored there. Have a read here: http://en.wikipedia.org/wiki/RAM_disk Cheers, Patrick 2012/2/8 Ted Dunning > This is true with Lucene as it stands. It would be much faster if there > were a speciali

Re: is there any practice to load index into RAM to accelerate solr performance?

2012-02-08 Thread Dmitry Kan

Hi, This talk has some interesting details on setting up a Lucene index in RAM: http://www.lucidimagination.com/devzone/events/conferences/revolution/2011/lucene-yelp Would be great to hear your findings! Dmitry 2012/2/8 James > Is there any practice to load index into RAM to accelerate so

Query in starting solr 3.5

2012-02-08 Thread mechravi25

Hi, I am using solr 3.5 version. I moved the data import handler files from solr 1.4(which I used previously) to the new solr. When I tried to start the solr 3.5, I got the following message in my log WARNING: XML parse warning in "solrres:/dataimport.xml", line 2, column 95: Include operation fa

Re: is there any practice to load index into RAM to accelerate solr performance?

2012-02-08 Thread Andrzej Bialecki

On 08/02/2012 09:17, Ted Dunning wrote: This is true with Lucene as it stands. It would be much faster if there were a specialized in-memory index such as is typically used with high performance search engines. This could be implemented in Lucene trunk as a Codec. The challenge though is to c

Re: Improving performance for SOLR geo queries?

2012-02-08 Thread Matthias Käppler

Hi Erick, if we're not doing geo searches, we filter by "location tags" that we attach to places. This is simply a hierarchical regional id, which is simple to filter for, but much less flexible. We use that on Web a lot, but not on mobile, where we want to perform searches in arbitrary radii a

Re: URI Encoding with Solr and Weblogic

2012-02-08 Thread Elisabeth Adler

Hi, I found a solution to it. Adding the Weblogic Server Argument -Dfile.encoding=UTF-8 did not affect the encoding. Only a change to the .war file's weblogic.xml and redeployment of the modified .war solved it. I added the following to the weblogic.xml: * UTF-8 Would it ma

How to reindex about 10Mio. docs

2012-02-08 Thread Vadim Kisselmann

Hello folks, I want to reindex about 10Mio. docs from one Solr(1.4.1) to another Solr(1.4.1). I changed my schema.xml (field types sint to slong), so standard replication would fail. What is the fastest and smartest way to manage this? This here sounds great (EntityProcessor): http://www.searchworkin

Re: How to reindex about 10Mio. docs

2012-02-08 Thread Ahmet Arslan

> i want to reindex about 10Mio. Docs. from one Solr(1.4.1) to > another > Solr(1.4.1). > I changed my schema.xml (field types sing to slong), > standard > replication would fail. > what is the fastest and smartest way to manage this? > this here sound great (EntityProcessor): > http://www.searchwo

Re: is there any practice to load index into RAM to accelerate solr performance?

2012-02-08 Thread Robert Stewart

I concur with this. As long as index segment files are cached in the OS file cache, performance is about as good as it gets. Pulling segment files into RAM inside the JVM process may actually be slower, given Lucene's existing data structures and algorithms for reading segment file data. If you have

Re: How to reindex about 10Mio. docs

2012-02-08 Thread Vadim Kisselmann

Hi Ahmet, thanks for the quick response:) I've already thought the same... And it will be a pain to export and import this huge doc-set as CSV. Do I have another solution? Regards Vadim 2012/2/8 Ahmet Arslan : >> i want to reindex about 10Mio. Docs. from one Solr(1.4.1) to >> another >> Solr(1.4.1

usage of /etc/jetty.xml when debugging Solr in Eclipse

2012-02-08 Thread jmlucjav

Hi, I am following http://www.lucidimagination.com/devzone/technical-articles/setting-apache-solr-eclipse in order to be able to debug Solr in eclipse. I got it working fine. Now, I usually use ./etc/jetty.xml to set logging configuration. When starting jetty in eclipse I dont see any log files c

Re: How to reindex about 10Mio. docs

2012-02-08 Thread Vadim Kisselmann

Another problem appeared ;) how can i export my docs in csv-format? In Solr 3.1+ i can use the query-param &wt=csv, but in Solr 1.4.1? Best Regards Vadim 2012/2/8 Vadim Kisselmann : > Hi Ahmet, > thanks for quick response:) > I've already thought the same... > And it will be a pain to export and
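A possible workaround for the missing wt=csv writer, sketched under assumptions: if the documents can be fetched in some structured form (for example the JSON response writer, assumed here to be available on the source server), the conversion to CSV can be done on the client. The function name docs_to_csv and the "|" separator for multi-valued fields are illustrative choices, not part of Solr.

```python
import csv
import io


def docs_to_csv(docs, fieldnames, multi_sep="|"):
    """Render a list of Solr-style documents (plain dicts) as CSV text.

    Multi-valued fields (Python lists) are joined with `multi_sep`;
    missing fields become empty cells. Both choices are assumptions.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=fieldnames, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for doc in docs:
        row = {}
        for name in fieldnames:
            value = doc.get(name, "")
            if isinstance(value, list):
                value = multi_sep.join(str(v) for v in value)
            row[name] = value
        writer.writerow(row)
    return buf.getvalue()
```

For 10 million documents the export would need to be paged (start/rows) and written batch by batch rather than held in one string.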

Custom Document Clustering and Mahout Integration

2012-02-08 Thread Selvam

Hi all, I am trying to write a custom document clustering component that should take all the docs in commit and cluster them; Solr Version:3.5.0 Main Class: public class KMeansClusteringEngine extends DocumentClusteringEngine implements SolrEventListener I added newSearcher event listener, that

Re: struggling with solr.WordDelimiterFilterFactory and periods "." or dots

2012-02-08 Thread Erick Erickson

Hmmm, seems OK. Did you re-index after any schema changes? You'll learn to love admin/analysis for questions like this, that page should show you what the actual tokenization results are, make sure to click the "verbose" check boxes. Best Erick On Tue, Feb 7, 2012 at 10:52 PM, geeky2 wrote: > h

Re: Which Tokeniser (and/or filter)

2012-02-08 Thread Erick Erickson

Yes, WDDF creates multiple tokens. But that has nothing to do with the multiValued suggestion. You can get exactly what you want by 1> setting multiValued="true" in your schema file and re-indexing. Say positionIncrementGap is set to 100 2> When you index, add the field for each sentence, so your
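The effect of positionIncrementGap on a multiValued field can be illustrated with a small sketch. It assumes whitespace tokenization in place of the real analyzer chain, and the exact position arithmetic inside Lucene may differ slightly; the point is only that tokens of different values end up far apart, so a phrase query with a small slop cannot match across two sentences.

```python
def assign_positions(values, position_increment_gap=100):
    """Return (token, position) pairs for the values of one multiValued field.

    Whitespace splitting stands in for the real tokenizer (an assumption).
    """
    positions = []
    next_pos = 0
    for i, value in enumerate(values):
        if i > 0:
            # Jump ahead between values, as positionIncrementGap does.
            next_pos += position_increment_gap
        for token in value.lower().split():
            positions.append((token, next_pos))
            next_pos += 1
    return positions


def phrase_matches(positions, first, second, slop=0):
    """True if `second` follows `first` with at most `slop` tokens in between.

    A simplified, ordered-only model of a sloppy phrase query.
    """
    firsts = [p for t, p in positions if t == first]
    seconds = [p for t, p in positions if t == second]
    return any(0 <= s - f - 1 <= slop for f in firsts for s in seconds)
```

With the first two sentences from the original question indexed as separate values, "here" lands at position 5 and "using" at 106, so "here using" does not match even with a slop of 10, while "asp.net and" still matches inside one value.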

Re: Fields not indexed?

2012-02-08 Thread Dmitry Kan

What does your schema for the fields look like? On Wed, Feb 8, 2012 at 2:41 PM, Radu Toev wrote: > Hi, > > I am really new to Solr so I apologize if the question is a little off. > I was playing with DataImportHandler and tried to index a table in a MS SQL > database. > I configured my datasource

Re: Fields not indexed?

2012-02-08 Thread Radu Toev

The schema.xml is the default file that comes with Solr 3.5, didn't change anything there. On Wed, Feb 8, 2012 at 2:45 PM, Dmitry Kan wrote: > How does your schema for the fields look like? > > On Wed, Feb 8, 2012 at 2:41 PM, Radu Toev wrote: > > > Hi, > > > > I am really new to Solr so I apolo

Re: Fields not indexed?

2012-02-08 Thread Dmitry Kan

well, you should add these fields in schema.xml, otherwise solr won't know them. On Wed, Feb 8, 2012 at 2:48 PM, Radu Toev wrote: > The schema.xml is the default file that comes with Solr 3.5, didn't change > anything there. > > On Wed, Feb 8, 2012 at 2:45 PM, Dmitry Kan wrote: > > > How does y

Re: Fields not indexed?

2012-02-08 Thread Radu Toev

I just realized that as I pushed the send button :P Thanks, I'll have a look. On Wed, Feb 8, 2012 at 2:58 PM, Dmitry Kan wrote: > well, you should add these fields in schema.xml, otherwise solr won't know > them. > > On Wed, Feb 8, 2012 at 2:48 PM, Radu Toev wrote: > > > The schema.xml is the d

Re: usage of /etc/jetty.xml when debugging Solr in Eclipse

2012-02-08 Thread Bernd Fehling

Hi, run-jetty-run issue #9: ... In the VM Arguments of your launch configuration set -Drjrxml=./jetty.xml If jetty.xml is in the root of your project it will be used (you can also use a fully qualified path name). The UI port, context and WebApp dir are ignored, since you can define them in j

Re: struggling with solr.WordDelimiterFilterFactory and periods "." or dots

2012-02-08 Thread geeky2

hello, thank you for the reply. yes - i did re-index after the changes to the schema. also - thank you for the direction on using the analyzer - but i am not sure if i am interpreting the feedback from the analyzer correctly. here is what i did: in the Field value (Index) box - i placed this:

Re: Which Tokeniser (and/or filter)

2012-02-08 Thread Robert Brown

Thanks Erick, I didn't get confused with multiple tokens vs multiValued :) Before I go ahead and re-index 4m docs, and believe me I'm using the analysis page like a mad-man! What do I need to configure to have the following both indexed with and without the dots... .net sales manager. £12.50

Re: is there any practice to load index into RAM to accelerate solr performance?

2012-02-08 Thread Ted Dunning

Add this as well: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.155.5030 On Wed, Feb 8, 2012 at 1:56 AM, Andrzej Bialecki wrote: > On 08/02/2012 09:17, Ted Dunning wrote: > >> This is true with Lucene as it stands. It would be much faster if there >> were a specialized in-memory inde

How to identify the field with highest score in dismax

2012-02-08 Thread crisfromnova

Hi, According to the Solr documentation, the dismax score is calculated by the formula: (score of matching clause with the highest score) + ( (tie parameter) * (scores of any other matching clauses) ). Is there a way to identify the field on which the matching clause score is the highest? For exa
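The formula quoted in the question can be written out as a short sketch. It assumes the per-field clause scores are already known (Solr does not return them directly; they would have to be read from the debugQuery explain output), and the field names in the usage note are hypothetical.

```python
def dismax_score(field_scores, tie=0.0):
    """Return (best_field, score) for one document.

    field_scores maps a field name to the score of its matching clause.
    score = max clause score + tie * (sum of the other clause scores).
    """
    if not field_scores:
        raise ValueError("at least one matching clause is required")
    best_field = max(field_scores, key=field_scores.get)
    best = field_scores[best_field]
    others = sum(s for f, s in field_scores.items() if f != best_field)
    return best_field, best + tie * others
```

For example, dismax_score({"title": 2.0, "body": 0.5, "tags": 1.0}, tie=0.1) picks "title" as the winning field, and the other two clauses contribute 0.1 * 1.5 on top of its score.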

Sorting solrdocumentlist object after querying

2012-02-08 Thread Kashif Khan

Hi all, I want to sort a SolrDocumentList after it has been queried and obtained from the QueryResponse.getResults(). The reason is i have a SolrDocumentList obtained after querying using QueryResponse.getResults() and i have added few docs to it. Now i want to sort this SolrDocumentList based on

Wildcard ? issue?

2012-02-08 Thread Dalius Sidlauskas

Sorry for the inaccurate title. I have 3 fields (dc_title, dc_title_unicode, dc_unicode_full) containing same value: http://www.tei-c.org/ns/1.0";>cal.lígraf and these fields are configured accordingly:

Re: Wildcard ? issue?

2012-02-08 Thread Dalius Sidlauskas

If you can not read this mail easily check this ticket: https://issues.apache.org/jira/browse/SOLR-3106 This is a copy. Regards! Dalius Sidlauskas On 08/02/12 15:44, Dalius Sidlauskas wrote: Sorry for inaccurate title. I have a 3 fields (dc_title, dc_title_unicode, dc_unicode_full) containi

Re: Wildcard ? issue?

2012-02-08 Thread Sethi, Parampreet

Hi Dalius, If not already tried, Check http://localhost:8983/solr/admin/analysis.jsp (enable verbose output for both Field Value index and query for details) for your queries and see what all filters/tokenizers are being applied. Hope it helps! -param On 2/8/12 10:48 AM, "Dalius Sidlauskas" wr

Re: Wildcard ? issue?

2012-02-08 Thread Dalius Sidlauskas

I have already tried this and it did not help because it does not highlight matches if a wild-card is used. The field configuration turns data to: dc_title: calligraf dc_title_unicode: cal·lígraf dc_title_unicode_full: cal·lígraf Debug parsedquery says: [Search for *cal·ligraf*] +Disjunction

Re: struggling with solr.WordDelimiterFilterFactory and periods "." or dots

2012-02-08 Thread Erick Erickson

Hmmm, that all looks correct, from the output you pasted I'd expect you to be finding the doc. So next thing: add &debugQuery=on to your query and look at the debug information after the list of documents, particularly the "parsedQuery" bit. Are you searching against the fields you think you are?
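As a small illustration of the suggestion above, the debug parameter can be appended when the request URL is built. This is only a sketch: the base URL (a local Solr with the standard /select handler) is an assumption, and the query string is the itemNo query quoted elsewhere in this thread.

```python
from urllib.parse import urlencode


def debug_query_url(base_url, query):
    """Build a select URL that asks Solr to include debug information."""
    params = {"q": query, "debugQuery": "on"}
    return base_url + "?" + urlencode(params)


# Hypothetical host and core layout; adjust to the real installation.
url = debug_query_url("http://localhost:8983/solr/select", "itemNo:BP2.1UAA")
```

The parsed query then appears in the debug section of the response, after the list of documents.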

Re: Which Tokeniser (and/or filter)

2012-02-08 Thread Erick Erickson

You'll probably have to index them in separate fields to get what you want. The question is always whether it's worth it, is the use-case really well served by having a variant that keeps dots and things? But that's always more a question for your product manager Best Erick On Wed, Feb 8, 201

Re: Which Tokeniser (and/or filter)

2012-02-08 Thread Robert Brown

Attempting to re-produce legacy behaviour (i know!) of simple SQL substring searching, with and without phrases. I feel simply NGram'ing 4m CV's may be pushing it? --- IntelCompute Web Design & Local Online Marketing http://www.intelcompute.com On Wed, 8 Feb 2012 11:27:24 -0500, Erick Ericks
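To get a feel for why n-gramming a large corpus may be "pushing it", here is a minimal sketch of character n-gram generation. It is plain Python, not the Solr NGram filter itself, and the gram sizes are arbitrary example values.

```python
def char_ngrams(text, min_gram=2, max_gram=3):
    """Return all character n-grams of `text` for sizes min_gram..max_gram."""
    text = text.lower()
    grams = []
    for n in range(min_gram, max_gram + 1):
        for i in range(len(text) - n + 1):
            grams.append(text[i:i + n])
    return grams
```

A token of length L produces L - n + 1 grams for each size n, so the four-letter word "sage" already yields five terms for sizes 2 and 3; across millions of documents this multiplies the number of indexed terms considerably.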

Re: struggling with solr.WordDelimiterFilterFactory and periods "." or dots

2012-02-08 Thread geeky2

hello, thanks for sticking with me on this ...very frustrating ok - i did perform the query with the debug parms using two scenarios: 1) a successful search (where i insert the period / dot) in to the itemNo field and the search returns a document. itemNo:BP2.1UAA http://hfsthssolr1.intra.sea

Re: Wildcard ? issue?

2012-02-08 Thread Ahmet Arslan

> I have already tried this and it did > not helped because it does not > highlight matches if wild-card is used. The field > configuration turns > data to: This writeup should explain your scenario : http://wiki.apache.org/solr/MultitermQueryAnalysis
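The behaviour described in that writeup, that wildcard terms skip the analysis chain applied at index time, can be mimicked in a few lines. This is a sketch under assumptions: the fold function below only approximates whatever filters produced "calligraf" for dc_title (the real field configuration is not shown here), and fnmatch stands in for Lucene's wildcard matching.

```python
import fnmatch
import unicodedata

MIDDLE_DOT = "\u00b7"


def fold(term):
    """Lowercase, strip accents and drop the middle dot (assumed analysis)."""
    decomposed = unicodedata.normalize("NFKD", term.lower())
    return "".join(
        ch for ch in decomposed
        if not unicodedata.combining(ch) and ch != MIDDLE_DOT
    )


def wildcard_hit(indexed_term, pattern, analyze_pattern):
    """Match a wildcard pattern against an indexed term.

    analyze_pattern=False models a wildcard query that bypasses analysis.
    """
    if analyze_pattern:
        pattern = fold(pattern)
    return fnmatch.fnmatchcase(indexed_term, pattern)
```

With the indexed term fold("cal·lígraf"), which is "calligraf", the raw pattern "*cal·lígraf*" does not match, while the same pattern passed through fold does.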

Re: solr cloud concepts

2012-02-08 Thread Mark Miller

On Feb 8, 2012, at 10:31 AM, Adeel Qureshi wrote: > I have been using solr for a while and have recently started getting into > solrcloud .. i am a bit confused with some of the concepts .. > > 1. what exactly is the relationship between a collection and the core .. > can a core has multiple col

Re: Sorting solrdocumentlist object after querying

2012-02-08 Thread Ahmet Arslan

> I want to sort a SolrDocumentList after it has been queried > and obtained > from the QueryResponse.getResults(). The reason is i have a > SolrDocumentList > obtained after querying using QueryResponse.getResults() and > i have added > few docs to it. Now i want to sort this SolrDocumentList > ba

Re: solr cloud concepts

2012-02-08 Thread Bruno Dumon

Hi Adeel, I just started looking into SolrCloud and had some of the same questions. I wrote a blog with the understanding I gained so far, maybe it will help you: http://outerthought.org/blog/491-ot.html Regards, Bruno. On Wed, Feb 8, 2012 at 4:31 PM, Adeel Qureshi wrote: > I have been using

Re: How to reindex about 10Mio. docs

2012-02-08 Thread Otis Gospodnetic

Vadim, Would using xslt output help? Otis Performance Monitoring SaaS for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html > > From: Vadim Kisselmann >To: solr-user@lucene.apache.org >Sent: Wednesday, February 8, 2012 7:09 AM >Subje

Re: Using UUID for uniqueId

2012-02-08 Thread François Schiettecatte

Anderson I would say that this is highly unlikely, but you would need to pay attention to how they are generated, this would be a good place to start: http://en.wikipedia.org/wiki/Universally_unique_identifier Cheers François On Feb 8, 2012, at 1:31 PM, Anderson vasconcelos wrote: >
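Assuming the question was about the chance of two generated ids colliding, the "highly unlikely" can be put into numbers with a rough sketch. It assumes random (version 4) UUIDs, which carry 122 random bits, and uses the usual birthday approximation; other UUID versions are generated differently and this estimate does not apply to them.

```python
import math
import uuid


def uuid4_collision_probability(n):
    """Approximate probability of at least one collision among n random UUIDs.

    Birthday bound: p ~= 1 - exp(-n * (n - 1) / (2 * 2**122)).
    """
    x = n * (n - 1) / (2 * 2 ** 122)
    return -math.expm1(-x)


# A random id as it might be used for the uniqueKey field.
doc_id = str(uuid.uuid4())
```

For ten million documents the estimate is on the order of 1e-23, far below any practical concern, provided the generator has a good source of randomness.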

Thank you all

2012-02-08 Thread Tim Hibbs

All, It appears my attempt at using solr for the application I support is about to fail. I'm personally and professionally disappointed, but I wanted to say "Many Thanks" to those of you who have provided so much help to so many on this list. In the right hands and in the right environments, it ha

solr/tomcat performance.

2012-02-08 Thread adm1n

Hi, I'm running solr+tomcat with the following configuration: I have 16 slaves, which are being queried by aggregator, while aggregator being queried by the users. My slaveUrls variable in solr.xml (on aggregator) looks like - '' I'm running it on linux machine (not dedicated, there are some other

Index Start Question

2012-02-08 Thread Hoffman, Chase

Please forgive me if this is a dumb question. I've never dealt with SOLR before, and I'm being asked to determine from the logs when a SOLR index is kicked off (it is a Windows server). The TOMCAT service runs continually, so no love there. In parsing the logs, I think "org.apache.solr.core.

SolrCloud is in trunk.

2012-02-08 Thread Mark Miller

For those that are interested and have not noticed, the latest work on SolrCloud and distributed indexing is now in trunk. SolrCloud is our name for a new set of distributed capabilities that improve upon the old style distributed search and index based replication. It provides for high availab

Re: Using UUID for uniqueId

2012-02-08 Thread Anderson vasconcelos

Thanks 2012/2/8 François Schiettecatte > Anderson > > I would say that this is highly unlikely, but you would need to pay > attention to how they are generated, this would be a good place to start: > >http://en.wikipedia.org/wiki/Universally_unique_identifier > > Cheers > > François > > O

Re: SolrCloud is in trunk.

2012-02-08 Thread darren

Good job on this work. A monumental effort. On Wed, 8 Feb 2012 16:41:13 -0500, Mark Miller wrote: > For those that are interested and have not noticed, the latest work on > SolrCloud and distributed indexing is now in trunk. > > SolrCloud is our name for a new set of distributed capabilities th

Re: Improving performance for SOLR geo queries?

2012-02-08 Thread Ryan McKinley

Hi Matthias- I'm trying to understand how you have your data indexed so we can give reasonable direction. What field type are you using for your locations? Is it using the solr spatial field types? What do you see when you look at the debug information from &debugQuery=true? From my experienc

Re: solr cloud concepts

2012-02-08 Thread Adeel Qureshi

okay so after reading Bruno's blog post .. let's add slice to the mix as well .. so we have got collections, cores, shards, partitions and slices :) .. The whole point with cores is to be able to have different schemas on the same solr server instance. So how does that change with collections .. m

linking documents in solr

2012-02-08 Thread T Vinod Gupta

hi, I have a question around linking documents in solr and want to know if it's possible. let's say i have a set of blogs and their authors that i want to index separately. is it possible to link a document describing a blog to another document describing an author? if yes, can i search for blogs wit

Re: solr cloud concepts

2012-02-08 Thread Mark Miller

On Feb 8, 2012, at 5:26 PM, Adeel Qureshi wrote: > okay so after reading Bruno's blog post .. lets add slice to the mix as > well .. so we have got collections, cores, shards, partitions and slices :) > .. Yeah - heh - this has bugged me, but we have not really all come down on agreement of ter

Re: Improving performance for SOLR geo queries?

2012-02-08 Thread Nicolas Flacco

I compared locallucene to spatial search and saw a performance degradation, even using geohash queries, though perhaps I indexed things wrong? Locallucene across 6 machines handles 150 queries per second fine, but using geofilt and geohash I got lots of timeouts even when I was doing only 50 querie

Re: usage of /etc/jetty.xml when debugging Solr in Eclipse

2012-02-08 Thread jmlucjav

yes, I am using https://github.com/alexwinston/RunJettyRun that apparently is a fork of the original project that originated in the need to use an jetty.xml. So I am already setting an additional jetty.xml, this can be done in the Run configuration, no need to use -D param. But as I mentioned solr

Re: solr cloud concepts

2012-02-08 Thread Jamie Johnson

Mark, is the recommendation now to have each solr instance be a separate core in solr cloud? I had thought that the core name was by default the collection name? Or are you saying that although they have the same name they are separate because they are in different JVMs? On Wednesday, February 8,

Re: solr cloud concepts

2012-02-08 Thread Mark Miller

On Feb 8, 2012, at 9:36 PM, Jamie Johnson wrote: > Mark, > is the recommendation now to have each solr instance be a separate core in > solr cloud? I had thought that the core name was by default the collection > name? Or are you saying that although they have the same name they are > separate be

Re: multiple cores in a single instance vs multiple instances with single core

2012-02-08 Thread Mark Miller

On Feb 8, 2012, at 9:52 PM, Jamie Johnson wrote: > In solr cloud what is a better approach / use of resources having multiple > cores on a single instance or multiple instances with a single core? What > are the benefits and drawbacks of each? It depends I suppose. If you are talking about on a

Re: multiple cores in a single instance vs multiple instances with single core

2012-02-08 Thread Jamie Johnson

Thanks Mark, in regards to failover I completely agree, I am wondering more about performance and memory usage if the indexes are large and wondering if the separate Java instances under heavy load would be more or less performant. Currently we deploy a single core per instance but deploy multiple in

Re: solr cloud concepts

2012-02-08 Thread Adeel Qureshi

Thanks for the explanation. It makes sense but I am hoping that you can clarify things a bit more .. so now it sounds like in solrcloud the concept of cores have changed a bit .. as you explained that for me to have 2 cores with different schemas I will need 2 different collections .. and one good

Re: Sorting solrdocumentlist object after querying

2012-02-08 Thread Kashif Khan

No that sorting is based on multiple fields. Basically i want to sort them as the group by statement like in the SQL based on few fields and many loops to go through. The problem is that i have say 1,000,000 solr docs after injecting my few solr docs and then i want to do group by these solr docs b
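A client-side "group by" over several fields, as described above, usually comes down to sorting on a composite key and then walking the sorted list. The sketch below shows the idea in Python with documents as plain dicts; in SolrJ the same shape would be a Comparator over SolrDocument objects. The field names in the usage note are hypothetical, and every document is assumed to contain all grouping fields.

```python
from itertools import groupby


def group_docs(docs, group_fields):
    """Group documents by the tuple of values in `group_fields`.

    Sorting is stable, so documents keep their original relative order
    inside each group.
    """
    def key(doc):
        return tuple(doc[f] for f in group_fields)

    ordered = sorted(docs, key=key)
    return {k: list(g) for k, g in groupby(ordered, key=key)}
```

Calling group_docs(docs, ["city", "type"]) returns one list per distinct (city, type) pair. For around a million documents this is a single in-memory sort rather than nested loops, though holding that many documents on the client is a cost of its own.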

How do i do group by in solr with multiple shards?

2012-02-08 Thread Kashif Khan

Hi all, I have tried group by in solr with multiple shards but it does not work. Basically i want to simply do GROUP BY statement like in SQL in solr with multiple shards. Please suggest me how can i do this as it is not supported currently OOB by solr. Thanks & regards, Kashif Khan -- View this
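One conceivable client-side workaround, offered only as a sketch and not as a confirmed recommendation from the list: obtain per-group counts from each shard separately (however they are obtained) and merge them in the client. The function name and the input shape, one mapping of group value to count per shard, are assumptions.

```python
from collections import Counter


def merge_shard_counts(per_shard_counts):
    """Sum per-group counts coming from several shards."""
    total = Counter()
    for counts in per_shard_counts:
        total.update(counts)  # adds counts for keys seen on earlier shards
    return total
```

This only reproduces a GROUP BY with COUNT; returning the top documents per group across shards needs more than summing counters.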

Re: How to identify the field with highest score in dismax

2012-02-08 Thread Mikhail Khludnev

Hello, Have you tried to specify debugQuery=on and look into explain section? Though it's not really performant, but anyway I propose to start from it. Regards On Wed, Feb 8, 2012 at 7:32 PM, crisfromnova wrote: > Hi, > > According solr documentation the dismax score is calculating after the >

Help:Solr can't put all pdf files into index

2012-02-08 Thread 荣康

Hey, I am using solr as my search engine to search my pdf files. I have 18219 files (different file names) and all the files are in the same directory. But when I use solr to import the files into the index using the DataImport method, solr reports importing only 17233 files. It's very strange. This problem

63 matches
