Re: Distributed search component.

2011-05-13 Thread Rok Rejc
I am still fighting (after a month of doing other things) with the first part of the problem. Any ideas? Many thanks, Rok On Mon, Apr 4, 2011 at 9:06 AM, Rok Rejc wrote: > Hi all, > > I am trying to create a distributed search component in solr which is quite > difficult (at least for me, becau

Re: Facet Count Based on Dates

2011-05-13 Thread Jasneet Sabharwal
Hey Otis, FieldCollapsing is again a feature of Solr 4.0. Is anything possible using the default features of Solr 3.1? Btw, how can I apply these patches on Solr 3.1? On 13-05-2011 12:10, Otis Gospodnetic wrote: Jasneet, Like in http://wiki.apache.org/solr/FieldCollapsing ? Otis Sematext

Results with and without whitespace(soccer club and soccerclub)

2011-05-13 Thread roySolr
Hello, My index looks like this: Soccer club Football club etc. Now I want a user to be able to search for "soccer club" and "soccerclub". "Soccer club" works but without the whitespace it's not a match. How can I fix this? What should my configuration look like? Is there a filter or something? --

Re: DIH entity threads (multithreading)

2011-05-13 Thread Jamroz Marcin
I am using Solr 3.1 but tried it with 4.0 beta too. Does it depend on the batchSize argument? I also have table relationships; I tried without them, with the same effect. Is there a full-featured example of how to use this threads parameter?

Re: Results with and without whitespace(soccer club and soccerclub)

2011-05-13 Thread Paul Libbrecht
Roy, I believe the way to do that is to use a compound-words-analyzer. The issue: you need to know the decompositions in advance. Compound words are pretty common in German, for example, and I wish there were research efforts to maintain a compound-words corpus, but I have not seen one yet. paul On 13 May
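
The point about needing the decompositions in advance can be illustrated with a minimal Python sketch. It is not Solr's compound-word filter: the `WORDS` set and the `decompound` helper are hypothetical and only show that a word such as "soccerclub" can be split when its parts are already in a known dictionary.

```python
# Hypothetical dictionary of known words; a real one would hold thousands of terms.
WORDS = {"soccer", "football", "club"}


def decompound(word, dictionary, min_len=3):
    """Split a compound into dictionary words, or return it unchanged."""
    word = word.lower()

    def split(rest):
        # An empty remainder means everything before it was matched.
        if not rest:
            return []
        # Try the longest candidate prefix first.
        for end in range(len(rest), min_len - 1, -1):
            head = rest[:end]
            if head in dictionary:
                tail = split(rest[end:])
                if tail is not None:
                    return [head] + tail
        return None  # no full decomposition found

    parts = split(word)
    return parts if parts else [word]


parts = decompound("soccerclub", WORDS)
unknown = decompound("Handball", WORDS)
```

With this dictionary, "soccerclub" splits into "soccer" and "club", while a word whose parts are not listed is left as a single lowercased token.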

Field collapsing classloading issues

2011-05-13 Thread karan singh
I applied the SOLR field collapsing patch to solr 3.1.0. I'm not really sure about what to add in solrconfig.xml. Right now I've added the following :

Re: DIH help request: nested xml entities and xpath

2011-05-13 Thread Gora Mohanty
On Fri, May 13, 2011 at 10:18 AM, Ashique wrote: > Hi All, > > I am a Java/J2ee programmer and very new to SOLR. I would  like to index a > table in a postgresSql database to SOLR. Then searching the records from a > GUI (Jsp Page) and showing the results in tabular form. Could any one help > me o

RE: Document match with no highlight

2011-05-13 Thread Pierre GOSSE
In WordDelimiterFilter the parameters catenateNumbers, catenateWords and catenateAll are set to 1. These parameters add overlapping tokens, which could explain why you are hitting the bug described in the JIRA issue I mentioned. As I understand WordDelimiterFilter: "0176 R3 1.5 TO" should be tokenized
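
The overlapping tokens described here can be sketched in a few lines of Python. This is a simplified model, not Solr's WordDelimiterFilter: it only splits each whitespace-separated word on non-alphanumerics and on letter/digit transitions, and adds the catenated form when a word yields more than one part. The function name `overlapping_tokens` is an assumption made for illustration.

```python
import re


def overlapping_tokens(text):
    """Return (word, tokens) pairs; tokens include the catenated form if split."""
    result = []
    for word in text.split():
        # Runs of letters or runs of digits; anything else acts as a delimiter.
        parts = re.findall(r"[A-Za-z]+|[0-9]+", word)
        tokens = list(parts)
        if len(parts) > 1:
            # Catenated token that overlaps the individual parts.
            tokens.append("".join(parts))
        result.append((word, tokens))
    return result


tokens = overlapping_tokens("0176 R3 1.5 TO")
for word, toks in tokens:
    print(word, "->", toks)
```

Under this model "R3" produces "R", "3" and the overlapping "R3", and "1.5" produces "1", "5" and the overlapping "15", which is the kind of overlap that highlighting has to cope with.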

How does Solr's MoreLikeThis component internally work to get results?

2011-05-13 Thread Gnanakumar
Hi, I'm new to Apache Solr and am currently exploring/trying to make use of MoreLikeThis as a search component (instead of a dedicated request handler). I'm finding it difficult to understand clearly how this works internally to get more-like-this results. For example, I'm trying to search for the

Re: Results with and without whitespace(soccer club and soccerclub)

2011-05-13 Thread Grijesh
what about synonym filter factory - Thanx: Grijesh www.gettinhahead.co.in

Order of words in proximity search

2011-05-13 Thread Tor Henning Ueland
Hi, The documentation does not(?) specify this, but it is still an interesting question. Does the order of the words in a proximity search matter? And if it does, is it possible to ignore the order? I did not believe it did, but some tests against an ngram field do give different results. Examples: "fo

Re: Document match with no highlight

2011-05-13 Thread Phong Dais
Pierre, Thank you very much, Pierre. :) You saved me a lot of time and headache. >As I understand WordDelimiterFilter : > >"0176 R3 1.5 TO" should be tokenized with tokens "R3" overlapping with "R" > and "3", and "15" overlapping with "1" and "5" > >These parameters are set to 0 for query, but having t

Re: Huge performance drop in distributed search w/ shards on the same server/container

2011-05-13 Thread Grant Ingersoll
Is that 10 different Tomcat instances or are you using multicore? How are you testing? On May 13, 2011, at 6:08 AM, Frederik Kraus wrote: > Hi, > > I'm having some serious problems scaling the following setup: > > 48 CPU / Tomcat / ... > > localhost/shard1 > ... > localhost/shard10 > > Whe

Re: Results with and without whitespace(soccer club and soccerclub)

2011-05-13 Thread roySolr
mm,, it's about 10.000 terms. It's possible but not the best solution i think.

Re: Results with and without whitespace(soccer club and soccerclub)

2011-05-13 Thread Paul Libbrecht
Hey guys, keep a bit of the thread! Roy, I'm afraid it's not different with CompoundAnalyzer: all in memory. Have you tried? I sure wish such a compound-analysis would be done with a lucene-powered dictionary! That would rock. paul On 13 May 2011 at 11:57, Grijesh wrote: > what about sy

Re: Results with and without whitespace(soccer club and soccerclub)

2011-05-13 Thread Markus Jelsma
Yes http://lucene.apache.org/solr/api/org/apache/solr/analysis/HyphenationCompoundWordTokenFilterFactory.html > Roy, > > I believe the way to do that is to use a compound-words-analyzer. > The issue: you need to know the decompositions in advance. > Compound words are pretty common in German, for

Re: Huge performance drop in distributed search w/ shards on the same server/container

2011-05-13 Thread Frederik Kraus
One Tomcat with multicore. I have a list of about 2 million "real" queries that I'm firing at the cluster with jmeter. The reason for splitting up the index into rather small parts is that the maximum response time of 1 sec cannot be exceeded for any of those queries. On Friday, 13 May 2011 at 12:57,

Re: Changing the schema

2011-05-13 Thread Chamnap Chhorn
I wonder: if I add a new field to the schema, do I have to reindex? If there is no need to reindex, can I just update schema.xml directly? After that, should I restart the Tomcat service? And if there is no need to reindex, what about the existing documents? If I query on the new field, does it cause errors?

Re: Changing the schema

2011-05-13 Thread Stefan Matheis
Chamnap, On Fri, May 13, 2011 at 2:59 PM, Chamnap Chhorn wrote: > I wonder what if I add new field in the schema, do i have to reindex? If you're using that field within the DIH .. then of course yes, but normally/otherwise: No :) On Fri, May 13, 2011 at 2:59 PM, Chamnap Chhorn wrote: > If no

Re: Solr performance

2011-05-13 Thread javaxmlsoapdev
Alright. It turned out that defaultSearchField=title, where the title field is of a custom fieldType=edgyText, so if no value is passed in the "q" parameter, Solr picks up the default field, which is title of type "edgyText", taking a very long time to return results. Is there a way to IGN

Re: Faceting question

2011-05-13 Thread Mark
No mixup. I probably didn't explain myself correctly. Suppose my document has fields "title", "description" and "foo". When I search I would like to search across "title" and "description". I then would like facet counts on "foo" for documents that matched the "title" field only. IE, I would l

Re: Support for huge data set?

2011-05-13 Thread Shawn Heisey
Our system, which I am not at liberty to disclose, consists of 55 million documents, mostly photos and text, but video is starting to become prominent. The entire archive is about 80 terabytes, but we only index a subset of the metadata, stored in a MySQL database, which is about 100GB or so i

When to use trie over standard

2011-05-13 Thread Mark
When should one use Trie fields over the standard fields? What are the pros and cons of each? Thanks

RE: When to use trie over standard

2011-05-13 Thread Jonathan Rochkind
Well, let's be clear about what we're talking about. The suggested numeric and date fields in the current Solr example schema are in fact ALL Trie based fields. http://svn.apache.org/viewvc/lucene/dev/trunk/solr/example/solr/conf/schema.xml?view=markup I don't think there is any downside to usi

Multi Word Filter Queries

2011-05-13 Thread davaugust
I've recently installed solr and built an index. It's working for the most part, but no matter what I do, I can't seem to get filter queries where the filtered value is multiple words to work. By work I mean to literally filter exactly by the phrase provided in "fq". After trying many different

Re: Results with and without whitespace(soccer club and soccerclub)

2011-05-13 Thread Robert Muir
On Fri, May 13, 2011 at 7:07 AM, Paul Libbrecht wrote: > I sure wish such a compound-analysis would be done with a lucene-powered > dictionary! > That would rock. > me too, but it's a chicken-and-egg problem (you would have to basically index everything without decomposition to get the dictionar

Re: When to use trie over standard

2011-05-13 Thread Mark
Great explanation. Thanks On 5/13/11 8:25 AM, Jonathan Rochkind wrote: Well, let's be clear about what we're talking about. The suggested numeric and date fields in the current Solr example schema are in fact ALL Trie based fields. http://svn.apache.org/viewvc/lucene/dev/trunk/solr/example/so

Re: Results with and without whitespace(soccer club and soccerclub)

2011-05-13 Thread Paul Libbrecht
On 13 May 2011 at 17:32, Robert Muir wrote: > On Fri, May 13, 2011 at 7:07 AM, Paul Libbrecht wrote: > >> I sure wish such a compound-analysis would be done with a lucene-powered >> dictionary! >> That would rock. >> > > me too, but it's a chicken-and-egg problem (you would have to basica

Re: Multi Word Filter Queries

2011-05-13 Thread Tor Henning Ueland
On Fri, May 13, 2011 at 5:26 PM, davaugust wrote: > When I fq by a multi word string, it does something, but returns many > results with values not in the FQ Is it as simple as having forgotten to add quotation marks around them? If you have not changed the default operator and do not use opera
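
The quotation-mark suggestion can be sketched as follows in Python. The base URL, the `category` field and the `build_select_url` helper are assumptions for illustration only; the point is that the multi-word value is wrapped in double quotes inside `fq` so it is parsed as one phrase rather than as separate terms.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, field, phrase):
    """Build a select URL whose fq wraps the phrase in double quotes."""
    # Escape backslashes and embedded quotes so the phrase stays one unit.
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    params = [("q", query), ("fq", f'{field}:"{escaped}"')]
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical host, core path and field name.
url = build_select_url("http://localhost:8983/solr", "*:*", "category", "soccer club")
print(url)
```

Quoting only controls how the query is parsed; whether the filter matches the exact phrase also depends on how the field is analyzed at index time.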

Schema Design Question

2011-05-13 Thread Zac Smith
Let's say I have a data model that involves books and bookshelves. I have tens of thousands of books and thousands of bookshelves. There is a many-many relationship between books & bookshelves. All of the books are indexed by SOLR. I need to be able to query SOLR and get all the books for a give

Editor loads wrong version of IndexSearcher while debugging - how to fix?

2011-05-13 Thread Gabriele Kahlout
Hello, I'm debugging Solr built as a maven project in NB, and when I enter the code of a Lucene dependency, namely org.apache.lucene.search.IndexSearcher.explain(..) the call stack expects this method to be at line 599 while in the editor the class ends at 304. from solr-core's pom.xml:

Re: Support for huge data set?

2011-05-13 Thread Jack Repenning
On May 13, 2011, at 7:59 AM, Shawn Heisey wrote: > The entire archive is about 80 terabytes, but we only index a subset of the > metadata, stored in a MySQL database, which is about 100GB or so in size. > > The Solr index (version 1.4.1) consists of six large shards, each about 16GB > in size,

Re: Support for huge data set?

2011-05-13 Thread Darren Govoni
Can I ask if you do any faceted or MLT type searches? Do those even work across shards? On Fri, 2011-05-13 at 08:59 -0600, Shawn Heisey wrote: > Our system, which I am not at liberty to disclose, consists of 55 > million documents, mostly photos and text, but video is starting to > become promi

Boosting score by distance

2011-05-13 Thread Ian Eure
I have a bunch of documents representing points of interest indexed in Solr. I'm trying to boost the score of documents based on distance from an origin point, and having some difficulty. I'm currently using the standard query parser and sending in this query: (name:sushi OR tags:sushi OR class

Re: DIH help request: nested xml entities and xpath

2011-05-13 Thread Weiss, Eric
I think my original question/thread was accidentally pwnd. Let me take this opportunity to refocus this thread to my original question about DIH and nested entities and xpath. I'll try to ask a very simple question instead: Why doesn't this field xpath work? By "not working" I mean the MsgKeywo

Re: Support for huge data set?

2011-05-13 Thread Shawn Heisey
The objects are in a number of filesystems, taking up 80TB of space. The MySQL database is about 128GB, 117GB of which is a table containing the metadata for the documents. We don't use all that metadata, just a subset. I don't have any way to really calculate the subset's size. With seven sh

Re: Schema Design Question

2011-05-13 Thread Otis Gospodnetic
Hi Zac, Solr 4.0 (trunk) has support for relationships/JOIN. Have a look: http://search-lucene.com/?q=solr+join Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message > From: Zac Smith > To: "solr-

Re: Sub query using SOLR?

2011-05-13 Thread Thalaiselvam
But we can't judge the subquery return value. Is it possible to add more than the ID in the sub query? Thanks & Regards, Thalaiselvam N

Show filename in search result using a FileListEntityProcessor

2011-05-13 Thread Marcel Panse
Hi Solr community, I'm new to solr and trying to scan all pdf/doc files in a directory. This works fine and I am able to scan all documents. The next thing i'm trying to do is also receiving the filename of the file in the search results. The filename however never shows up. I tried a couple of th

Re: Support for huge data set?

2011-05-13 Thread Renaud Delbru
Hi, Our system [1] consists of +220 million semi-structured web documents (RDF, Microformats, etc.), with fairly small documents (a few kb) and large documents (a few MB). Each document has in addition a dozen of additional fields for indexing and storing metadata about the document. It runs

Re: UIMA analysisEngine path

2011-05-13 Thread chamara
Hi, Does line 57 of this code need to be changed to the location where the jar files (library files) reside? URL url = this.getClass().getResource(""); I did change it but no luck so far. Let me know what I am doing wrong.

Re: Support for huge data set?

2011-05-13 Thread Shawn Heisey
On 5/13/2011 11:09 AM, Darren Govoni wrote: Can I ask if you do any faceted or MLT type searches? Do those even work across shards? We currently aren't using facets in production, but we've done some data mining with them. They work very well in distributed mode. We plan to start incorporat

Re: Faceting question

2011-05-13 Thread lee carroll
Hi Mark, I think you would need to issue two separate queries. It's also a, I was going to say odd use case but who am I to judge, interesting use case. If you have a faceted navigation front end you are in real danger of confusing your users. I suppose it's a case of what do you want to achieve? Facet

document storage

2011-05-13 Thread Mike Sokolov
Would anyone care to comment on the merits of storing indexed full-text documents in Solr versus storing them externally? It seems there are three options for us: 1) store documents both in Solr and externally - this is what we are doing now, and gives us all sorts of flexibility, but doesn't

Writing response from custom RequestHandler

2011-05-13 Thread logan.stinger
I am writing a custom RequestHandler by extending RequestHandlerBase. I would like this request handler to perform some work and then write the response using the VelocityResponseWriter. I am just getting started so currently my custom RequestHandler looks like this: @Override public void handle

Re: MoreLikeThis PDF search

2011-05-13 Thread Brian Lamb
Any thoughts on this one? On Thu, May 12, 2011 at 10:46 AM, Brian Lamb wrote: > Hi all, > > I've become more and more familiar with the MoreLikeThis handler over the > last several months. I'm curious whether it is possible to do a MoreLikeThis > search by uploading a PDF? I looked at the Extract

Re: Support for huge data set?

2011-05-13 Thread Darren Govoni
Thanks for the info Shawn. I'll look into the issue as well. On Fri, 2011-05-13 at 12:34 -0600, Shawn Heisey wrote: > On 5/13/2011 11:09 AM, Darren Govoni wrote: > > Can I ask if you do any faceted or MLT type searches? Do those even work > > across shards? > > We currently aren't using facets i

Re: Replication Clarification Please

2011-05-13 Thread Ravi Solr
Sorry guys, I spoke too soon I guess. The replication still remains very slow even after upgrading to 3.1 and setting the compression off. Now I am totally clueless. I have tried everything that I know of to increase the speed of replication but failed. If anybody has faced the same issue, can you please t

RE: SolrDispatchFilter

2011-05-13 Thread Chris Hostetter
: This problem is only occurring when using IE8 ( Chrome & FireFox fine ) if it only happens when using the form on the admin screen (and not when hitting the URL directly, via shift-reload for example), it may just be a different manifestation of this silly javascript bug... https://issues.ap

Re: document storage

2011-05-13 Thread Rich Cariens
We've decided to store the original document in both Solr and external repositories. This is to support the following: 1. highlighting - We need to mark-up the entire document with hit-terms. However if this was the only reason to store the text I'd seriously consider calling out to the e

Re: Solr Range Facets

2011-05-13 Thread Chris Hostetter
: I did try what you suggested, but I am not getting the expected results. The : code is given below, +5 points for posting the code you tried, but -10 points for not explaining how the results you get are different from the results you expect, and -5 more points for not even giving an example

Want to Delete Existing Index & create fresh index

2011-05-13 Thread Pawan Darira
Hi, I have an existing index created months back. Now my database schema has changed. I wanted to delete the current data/index directory & re-create a fresh index, but it says that the "segments" file was not found & just creates a blank data/index directory. Please help -- Thanks, Pawan Darira

Re: Want to Delete Existing Index & create fresh index

2011-05-13 Thread Gabriele Kahlout
"curl --fail $solrIndex/update?commit=true -d '<delete><query>*:*</query></delete>'" #empty index [1 ] did u try? On Sat, May 14, 2011 at 7:26 AM, Pawan Darira wrote: > Hi > > I had an existing index created months back. now my database schema has > cha

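As a rough Python equivalent of the curl suggestion in the reply above, the sketch below builds (but does not send) a delete-by-query request that matches every document. The Solr URL and the `build_delete_all_request` helper are assumptions; the body uses Solr's XML delete-by-query message with the match-all query `*:*`.

```python
import urllib.request


def build_delete_all_request(solr_url):
    """Build a POST to the update handler that deletes every document."""
    body = b"<delete><query>*:*</query></delete>"
    return urllib.request.Request(
        f"{solr_url}/update?commit=true",
        data=body,  # supplying data makes this a POST request
        headers={"Content-Type": "text/xml"},
    )


# Hypothetical local Solr URL; adjust to the actual instance.
req = build_delete_all_request("http://localhost:8983/solr")
# urllib.request.urlopen(req) would send it; omitted here because it is destructive.
```

Emptying the index this way keeps the index directory and its segments files in place, so Solr does not have to start from a missing data/index directory.
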
RE: Schema Design Question

2011-05-13 Thread Zac Smith
Thanks that looks interesting. Don't think it helps my situation though as I would have to index all the bookshelves and will still end up having to put thousands of Book ID values in a multi-value field. I guess the question I have is: Is it more appropriate to load a multi-value field with a