Re: Solr or SQL fulltext search

2011-12-07 Thread Mersad
Thanks, Hector! Are there any other comments from other people? Best, Mersad On 12/7/2011 7:20 PM, Hector Castro wrote: This article shouldn't flat out make the decision for you, but these concerns raised by the guys at StackOverflow (over SQL Server 2008) helped guide us toward Solr:

Re: cache monitoring tools?

2011-12-07 Thread Dmitry Kan
Otis, Tomás: thanks for the great links! 2011/12/7 Tomás Fernández Löbbe > Hi Dimitry, I pointed to the wiki page to enable JMX, then you can use any > tool that visualizes JMX stuff like Zabbix. See > > http://www.lucidimagination.com/blog/2011/10/02/monitoring-apache-solr-and-lucidworks-with-z

Re: Solr 4 near real time commit

2011-12-07 Thread Mark Miller
Hmmm...that sounds pretty odd... How are you measuring the commit time? You likely want to turn off any caches, as they will be expired every second, but that should not cause this... I can try and duplicate your setup tomorrow and see what I can spot. - Mark On Dec 7, 2011, at 8:13 PM, yu sh

how to implement per doc weighting

2011-12-07 Thread Jason Toy
I've been reading the Solr source code and made modifications by implementing a custom Similarity class. I want to apply a weight to the score by multiplying it by a number based on whether the current doc has a certain term in it. So if the query was q=data_text:foo then the Similarity class would apply

Re: Solr Lucene Index Version

2011-12-07 Thread Mark Miller
Replication just copies the index, so I'm not sure how this would help offhand? With SolrCloud this is a breeze - just fire up another replica for a shard and the current index will replicate to it. If you were willing to export the data to some portable format and then pull it back in, why no

Re: Too long to index PDF - SOLR 3.5, SOLRNet 0.4.0.2001, Tom Cat 7.0

2011-12-07 Thread Soumitra Banerjee
Thanks for the response. I will set the stream accordingly. As for extraction of the text from the PDF, I want the entire content of the PDF. This content will be part of a SOLR document, which has a uniqueid. The unique is for what? Here's my schema: Inter

Re: Solr Lucene Index Version

2011-12-07 Thread Jamie Johnson
Yeah, I was actually hoping that somehow I could use the replication handler to do this: fire up 1 shard, set another as a slave and see if it would replicate the index to it, but obviously I'm not sure that would work either. Something like this would be great too https://issues.apache.org/jira/br

Re: Grouping or Facet ?

2011-12-07 Thread Darren Govoni
Yes. That's what I would expect. I guess I didn't understand when you said "The facet counts are the counts of the *values* in that field" because it seems it's the count of the number of matching documents, irrespective of whether one document has 20 values for that field and another 10, the facet coun

Re: Solr 4 near real time commit

2011-12-07 Thread yu shen
Hi Mark, and all I now use commit configuration exactly as below: 10 1000 But the commit time takes about 60 seconds. I have around 120 - 130 documents in my server. And each day, the number will increase about 6000. My symptom is if solr server is just s

Re: UUID field changed when document is updated

2011-12-07 Thread Lance Norskog
http://www.lucidimagination.com/search/link?url=http://wiki.apache.org/solr/UniqueKey On Wed, Dec 7, 2011 at 5:04 PM, Lance Norskog wrote: > Yes, the SignatureUpdateProcessor is what you want. The 128-bit hash is > exactly what you want to use in this situation. You will never get the > same ID

Re: UUID field changed when document is updated

2011-12-07 Thread Lance Norskog
Yes, the SignatureUpdateProcessor is what you want. The 128-bit hash is exactly what you want to use in this situation. You will never get the same ID for two urls- collisions have never been observed "in the wild" for this hash algorithm. Another cool thing about using hash-codes as fields is th
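
As a rough illustration of the idea (not of Solr's own implementation), the Python sketch below derives a stable 128-bit ID from a URL. MD5 is used here only as one example of a 128-bit hash; which hash the SignatureUpdateProcessor actually applies depends on how it is configured, and the URLs and the `url_to_id` helper are made up for the example.

```python
import hashlib

def url_to_id(url):
    """Return the 128-bit MD5 digest of the URL as a 32-character hex string."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()

# Hypothetical URLs, used only to show that the ID is stable per URL.
first = url_to_id("http://example.com/page-a")
again = url_to_id("http://example.com/page-a")
other = url_to_id("http://example.com/page-b")

print(first == again)  # same URL, same ID on every update
print(first != other)  # different URLs get different IDs
print(len(first))      # 32 hex characters = 128 bits
```

Because the ID is computed from the URL alone, re-indexing the same page yields the same key, which is the property the original poster was missing with a randomly generated UUID.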

Re: Solr Lucene Index Version

2011-12-07 Thread Mark Miller
Unfortunately, I think the only silver bullet here, for pure Solr, is to build a system that makes it possible to reindex somehow. On Dec 7, 2011, at 1:38 PM, Erik Hatcher wrote: > > On Dec 7, 2011, at 13:20 , Shawn Heisey wrote: > >> On 12/6/2011 2:06 PM, Erik Hatcher wrote: >>> I think t

Re: Too long to index PDF - SOLR 3.5, SOLRNet 0.4.0.2001, Tom Cat 7.0

2011-12-07 Thread Mauricio Scheffer
Try setting the StreamType to application/pdf, that way Tika doesn't have to infer it. BTW the second argument to ExtractParameters is the unique key... a value of "*" probably doesn't make sense. -- Mauricio On Wed, Dec 7, 2011 at 5:50 PM, Soumitra Banerjee < soumitrabaner...@gmail.com> wrote:
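
For readers calling the extract handler over plain HTTP rather than through SolrNet, the suggestion can be sketched as follows. This Python snippet only builds the request URL; it assumes that SolrNet's StreamType corresponds to Solr's `stream.type` request parameter, and the base URL, document ID and `extract_url` helper are hypothetical.

```python
from urllib.parse import urlencode

def extract_url(base_url, doc_id, file_name, mime_type="application/pdf"):
    """Build an /update/extract URL with an explicit unique key and MIME type."""
    params = [
        ("literal.id", doc_id),        # a real unique key rather than "*"
        ("resource.name", file_name),  # file name hint passed to Tika
        ("stream.type", mime_type),    # assumed equivalent of SolrNet's StreamType
    ]
    return base_url + "/update/extract?" + urlencode(params)

# Hypothetical host, core and document ID.
url = extract_url("http://localhost:8080/Solr/core1", "doc-10310", "10310.pdf")
print(url)
```

Supplying the MIME type up front spares Tika the detection step, and giving each file its own `literal.id` keeps one document from overwriting another.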

Too long to index PDF - SOLR 3.5, SOLRNet 0.4.0.2001, Tom Cat 7.0

2011-12-07 Thread Soumitra Banerjee
All - I am using SOLR 3.5, SOLRNet 0.4.0.2001, Tom Cat 7.0 and am running a job to extract the text from PDFs stored on my local hard disk. *Tomcat StdErr log Shows:* INFO: [core1] webapp=/Solr path=/update/extract params={extractOnly=true& literal.id=*&resource.name=C:\XXX\10310.pdf&extractForm

Re: Using result grouping with SolrJ

2011-12-07 Thread Kissue Kissue
Thanks Juan. I guess I have found my reason to migrate to 3.4. Many thanks. On Wed, Dec 7, 2011 at 7:43 PM, Juan Grande wrote: > Hi Kissue, > > Support for grouping on SolrJ was added in Solr 3.4, see > https://issues.apache.org/jira/browse/SOLR-2637 > > In previous versions you can access the

Re: avoid overwrite in DataImportHandler

2011-12-07 Thread P Williams
Hi, I've wondered the same thing myself. I feel like the "clean" parameter has something to do with it but it doesn't work as I'd expect either. Thanks in advance to anyone who can answer this question. *clean* : (default 'true'). Tells whether to clean up the index before the indexing is start

Re: cache monitoring tools?

2011-12-07 Thread Tomás Fernández Löbbe
Hi Dimitry, I pointed to the wiki page to enable JMX, then you can use any tool that visualizes JMX stuff like Zabbix. See http://www.lucidimagination.com/blog/2011/10/02/monitoring-apache-solr-and-lucidworks-with-zabbix/ On Wed, Dec 7, 2011 at 11:49 AM, Dmitry Kan wrote: > The culprit seems to

avoid overwrite in DataImportHandler

2011-12-07 Thread sabman
I have a unique ID defined for the documents I am indexing. I want to avoid overwriting the documents that have already been indexed. I am using XPathEntityProcessor and TikaEntityProcessor to process the documents. The DataImportHandler does not seem to have the option to set overwrite=false. I h

Re: Using result grouping with SolrJ

2011-12-07 Thread Juan Grande
Hi Kissue, Support for grouping on SolrJ was added in Solr 3.4, see https://issues.apache.org/jira/browse/SOLR-2637 In previous versions you can access the grouping results by simply traversing the various named lists. *Juan* On Wed, Dec 7, 2011 at 1:22 PM, Kissue Kissue wrote: > Hi, > > I
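
The "traverse the named lists" approach can be sketched outside SolrJ as well. The Python snippet below walks a grouped response of the shape Solr returns with `wt=json`; the field name `category`, the sample documents and the `groups_by_value` helper are invented, and the layout (`grouped`, then the field name, then `groups` with `groupValue` and `doclist`) is assumed to match what your Solr version returns.

```python
def groups_by_value(response, field):
    """Map each group value to the list of documents in that group."""
    field_groups = response.get("grouped", {}).get(field, {})
    result = {}
    for group in field_groups.get("groups", []):
        result[group["groupValue"]] = group["doclist"]["docs"]
    return result

# Made-up response for a query sent with group=true&group.field=category.
sample = {
    "grouped": {
        "category": {
            "matches": 3,
            "groups": [
                {"groupValue": "books",
                 "doclist": {"numFound": 2, "start": 0,
                             "docs": [{"id": "1"}, {"id": "2"}]}},
                {"groupValue": "music",
                 "doclist": {"numFound": 1, "start": 0,
                             "docs": [{"id": "3"}]}},
            ],
        }
    }
}

by_value = groups_by_value(sample, "category")
for value, docs in by_value.items():
    print(value, [doc["id"] for doc in docs])
```

In SolrJ before 3.4 the same walk would be done over the nested NamedList objects of the response instead of dictionaries.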

Re: Solr Lucene Index Version

2011-12-07 Thread Erik Hatcher
On Dec 7, 2011, at 13:20 , Shawn Heisey wrote: > On 12/6/2011 2:06 PM, Erik Hatcher wrote: >> I think the best thing that you could do here would be to lock in a version >> of Lucene (all the Lucene libraries) that you use with SolrCloud. Certainly >> not out of the realm of possibilities of s

Re: Solr Lucene Index Version

2011-12-07 Thread Shawn Heisey
On 12/6/2011 2:06 PM, Erik Hatcher wrote: I think the best thing that you could do here would be to lock in a version of Lucene (all the Lucene libraries) that you use with SolrCloud. Certainly not out of the realm of possibilities of some upcoming SolrCloud capability that requires some upgr

Re: cache monitoring tools?

2011-12-07 Thread Otis Gospodnetic
Hi Dmitry, You should use SPM for Solr - it exposes all Solr metrics and more (JVM, system info, etc.) PLUS it's currently 100% free. http://sematext.com/spm/solr-performance-monitoring/index.html We use it with our clients on a regular basis and it helps us a TON - we just helped a very popu

RE: SolR - Index problems

2011-12-07 Thread Husain, Yavar
Hi Jiggy When you query the index, what do you get in the tomcat logs? (Check that out in tomcat/logs directory) How much of Heap memory have you allocated to Tomcat? - Yavar From: jiggy [new...@trash-mail.com] Sent: Wednesday, December 07, 2011 9:53 P

SolR - Index problems

2011-12-07 Thread jiggy
Hello guys, I have a big problem. I have integrated Solr into Magento EE. I have two Solr folders: one is in c:/tomcat 7.0/ and the other one is in my web folder (c:/www/). The Tomcat folder contains Solr's data folder, which holds about 200 MB of index files (I think this is my data from Magento). In t

Boost Query in Edismax

2011-12-07 Thread John
I have a complex edismax query: facet=true&facet.mincount=0&qf=title^0.08+categorysearch^0.05+abstract^0.03+body^0.1&wt=javabin&rows=25&defType=edismax&version=2&omitHeader=true&fl=*,score&bq=eqid:(3yp^1.57+OR+5fi^1.55+OR+c1s^1.55+OR+3ym^1.55+OR+gjz^1.55...)&start=0&q=*:*&facet.field=category&face

Using result grouping with SolrJ

2011-12-07 Thread Kissue Kissue
Hi, I am using Solr 3.3 with SolrJ. Does anybody know how I can use result grouping with SolrJ? In particular, how can I retrieve the grouped results with SolrJ? Any help will be much appreciated. Thanks.

Re: Solr response writer

2011-12-07 Thread Finotti Simone
Thank you Erik, I will work on your suggestion! It seems it could work, provided I can boost matches on "redirect" document type S From: Erik Hatcher [erik.hatc...@gmail.com] Sent: Wednesday, 7 December 2011 16:56 To: solr-user@lucene.apache.org Subj

XPathEntityProcessor and ExtractingRequestHandler

2011-12-07 Thread Michael Kelleher
Can I use an XPathEntityProcessor in conjunction with an ExtractingRequestHandler? Also, the scripting language that XPathEntityProcessor uses/supports, is that just ECMA/JavaScript? Or is XPathEntityProcessor only supported for use in conjunction with the DataImportHandler? Thanks.

Re: Solr response writer

2011-12-07 Thread Erik Hatcher
What you can do is index the "redirect" documents along with the associated words, and let Solr do the stemming. Maybe add a "document type" field and if you get a match on a redirect document type, your web service can do what it needs to do from there. Erik On Dec 7, 2011, at 10:
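
A minimal sketch of the client-side half of this suggestion, in Python: after Solr returns its matches, the web service checks whether any hit is a "redirect" document and, if so, uses its URL. The field names `doctype` and `url`, the sample hits and the `pick_redirect` helper are hypothetical and would need to match your own schema.

```python
def pick_redirect(docs, type_field="doctype", url_field="url"):
    """Return the URL of the first 'redirect' document, or None if there is none."""
    for doc in docs:
        if doc.get(type_field) == "redirect":
            return doc.get(url_field)
    return None

# Made-up search results: one redirect document among ordinary product hits.
hits = [
    {"id": "p1", "doctype": "product"},
    {"id": "r7", "doctype": "redirect", "url": "http://example.com/sale"},
]

target = pick_redirect(hits)
print(target if target else "no redirect, show normal results")
```

Keeping the redirect decision in the web service leaves Solr's response format untouched, which is the point made in the thread about handling the HTTP 307 outside Solr.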

Re: Solr or SQL fulltext search

2011-12-07 Thread Hector Castro
This article shouldn't flat out make the decision for you, but these concerns raised by the guys at StackOverflow (over SQL Server 2008) helped guide us toward Solr: http://www.infoq.com/news/2008/11/SQL-Server-Text -- Hector On Dec 7, 2011, at 2:17 AM, Mersad wrote: > hi Everyone, >

Re: Solr response writer

2011-12-07 Thread Finotti Simone
No, actually it's a .NET web service that queries Endeca (call it Wrapper). It returns to its clients a collection of unique product IDs, then the client will ask other web services for more detailed information about the given products. As long as no URL redirection is involved, I think that s

Difference between field collapsing and result grouping

2011-12-07 Thread Kissue Kissue
Sorry if this question sounds stupid, but I am really confused about this. Is there actually a difference between field collapsing and result grouping in SOLR? I have come across articles that have talked about setting up field collapsing with commands that look different from the grouping o

Re: Solr 4 near real time commit

2011-12-07 Thread yu shen
Thanks for the correction, I did not notice that. Spark 2011/12/7 Mark Miller > Well, if that is exactly what you put, it's wrong. That second one should > be softAutoCommit. > > On Wednesday, December 7, 2011, yu shen wrote: > > Hi All, > > > > I tried using solr 4 nightly build: apache-s

Re: Solr Lucene Index Version

2011-12-07 Thread Erik Hatcher
Jamie - The details would of course be entirely dependent on what changed, but with Lucene trunk/4.0 there is the flexible indexing API with codecs. I imagine with a compatibility codec layer one could provide some insulation to changes. You're at big scale, so the "just reindex everything" an

Re: Solr Version Upgrade issue

2011-12-07 Thread Erick Erickson
How did you upgrade? What steps did you follow? Do you have any custom code? Any additional entries in your solrconfig.xml? These details help us diagnose your problem, but it's almost certainly that you have a mixture of jar files lying around your machine in a place you don't expect. Best Eric

Re: cache monitoring tools?

2011-12-07 Thread Dmitry Kan
The culprit seems to be the merger (frontend) SOLR. Talking to one shard directly takes substantially less time (1-2 sec). On Wed, Dec 7, 2011 at 4:10 PM, Dmitry Kan wrote: > Tomás: thanks. The page you gave didn't mention cache specifically, is > there more documentation on this specifically? I

Problem in searching the indexed PDF and Word documents in Apache Tika + Solr environment using SolrJ?

2011-12-07 Thread kiran.bodigam
I am trying to index PDF and Word documents in Solr 3.3.0 with Apache Tika using SolrJ. I am able to find the documents by file name, but when I search for any text in the content (the text data in the file), no documents are returned in the response. Do I need t

Re: R: Solr response writer

2011-12-07 Thread Michael Kuhlmann
On 07.12.2011 15:09, Finotti Simone wrote: I got your and Michael's point. Indeed, I'm not very skilled in web development so there may be something that I'm missing. Anyway, Endeca does something like this: 1. accepts a query; 2. does the stemming; 3. checks if the result of step 2 matches

Re: Solr Trunk Changes requires a reindex

2011-12-07 Thread Jamie Johnson
Thanks for the response Erick. On Wed, Dec 7, 2011 at 9:08 AM, Erick Erickson wrote: > Not that I know of. That's one drawback to being on the bleeding edge, when > the index format changes you have to re-index... > > Best > Erick > > On Tue, Dec 6, 2011 at 10:09 AM, Jamie Johnson wrote: >> Are t

Problems with SolrUIMA

2011-12-07 Thread Adriana Farina
Hello, I'm trying to use the SolrUIMA component of Solr 3.4.0. I modified the solrconfig.xml file in the following way: C:\Users\Stefano\workspace2\UimaComplete\descriptors\analysis_engine\AggregateAE.xml true false te

Re: cache monitoring tools?

2011-12-07 Thread Dmitry Kan
Tomás: thanks. The page you gave didn't mention cache specifically, is there more documentation on this specifically? I have used solrmeter tool, it draws the cache diagrams, is there a similar tool, but which would use jmx directly and present the cache usage in runtime? pravesh: I have increased

R: Solr response writer

2011-12-07 Thread Finotti Simone
I got your and Michael's point. Indeed, I'm not very skilled in web development so there may be something that I'm missing. Anyway, Endeca does something like this: 1. accepts a query; 2. does the stemming; 3. checks if the result of step 2 matches one of the redirectable words. If so, returns

Re: Solr 4 near real time commit

2011-12-07 Thread Mark Miller
Well, if that is exactly what you put, it's wrong. That second one should be softAutoCommit. On Wednesday, December 7, 2011, yu shen wrote: > Hi All, > > I tried using solr 4 nightly build: apache-solr-4.0-2011-12-06_08-52-46. > And try to enable autoSoftCommit like below in solrconfig.xml > >

Re: Solr Trunk Changes requires a reindex

2011-12-07 Thread Erick Erickson
Not that I know of. That's one drawback to being on the bleeding edge, when the index format changes you have to re-index... Best Erick On Tue, Dec 6, 2011 at 10:09 AM, Jamie Johnson wrote: > Are there any migration utilities to move from an index built by a > Solr 4.0 snapshot to Solr Trunk? Th

Re: Grouping or Facet ?

2011-12-07 Thread Erick Erickson
In your example you'll have 10 facets returned each with a value of 1. Best Erick On Tue, Dec 6, 2011 at 9:54 AM, wrote: > Sorry to jump into this thread, but are you saying that the facet count is > not # of result hits? > > So if I have 1 document with field CAT that has 10 values and I do a
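
To make the answer concrete, here is a small Python sketch of how field facet counts behave for a multivalued field. It is not Solr code: the document and its `CAT` values are made up, and the `facet_counts` helper simply mimics the behaviour described in this thread, where each distinct value is counted once per matching document.

```python
from collections import Counter

def facet_counts(docs, field):
    """For each distinct value of `field`, count how many documents contain it."""
    counts = Counter()
    for doc in docs:
        # A value repeated inside one document still counts that document once.
        for value in set(doc.get(field, [])):
            counts[value] += 1
    return dict(counts)

# Hypothetical data: a single document whose CAT field holds 10 values.
docs = [{"id": "1", "CAT": ["cat%d" % i for i in range(10)]}]

counts = facet_counts(docs, "CAT")
print(len(counts))           # number of facet entries returned
print(set(counts.values()))  # the distinct counts among those entries
```

With one matching document carrying 10 values, the sketch yields 10 facet entries, each with a count of 1, so the facet counts add up to more than the number of result hits.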

RE: Delays when deleting by query

2011-12-07 Thread Mike Gallan
I ran some more tests.  I added an explicit commit after each deleteByQuery() call and removed the add/reindex step.  This hung up immediately and completed (or timed out?) after 20 minutes.  The hangs occur almost exactly 20 minutes apart.  Could this be a Tomcat issue? I ran jconsole but did

Re: Solr Lucene Index Version

2011-12-07 Thread Jamie Johnson
Erik, Do you have any details behind what would be required to write a tool to move from one index format to another? Any examples/suggestions would be appreciated. On Tue, Dec 6, 2011 at 5:19 PM, Jamie Johnson wrote: > What about modifying something like SolrIndexConfig.java to change the > lu

Re: Solr response writer

2011-12-07 Thread Erik Hatcher
Either way (Endeca's 307, which seems crazy to me) or simply plucking off a "url" field from the first document returned in a search request... you're getting a URL back to your client and then using that URL to further send back to a user's browser, I presume. I personally wouldn't implement it

Re: Solr response writer

2011-12-07 Thread Michael Kuhlmann
On 07.12.2011 14:26, Finotti Simone wrote: That's the scenario: I have an XML that maps words W to URLs; when a search request is issued by my web client, a query will be issued to my Solr application. If, after stemming, the query matches any in W, the client must be redirected to the associ

Re: Solr response writer

2011-12-07 Thread Finotti Simone
That's the scenario: I have an XML that maps words W to URLs; when a search request is issued by my web client, a query will be issued to my Solr application. If, after stemming, the query matches any in W, the client must be redirected to the associated URL. I agree that it should be handled ou

Re: Solr response writer

2011-12-07 Thread Erik Hatcher
First, could you tell us more about your use case? Why do you want to change the response code? HTTP 307 = Temporary redirect - where are you going to redirect? Sounds like something best handled outside of Solr. If you went down the route of creating your own custom response writer, then

custom types file for WordDelimiterFilterFactory

2011-12-07 Thread Maurizio Piccini
Hi, I'm actually having the exact same problem. Did you anyhow find a solution for this? cheers Maurizio

Re: cache monitoring tools?

2011-12-07 Thread Tomás Fernández Löbbe
Hi Dimitry, cache information is exposed via JMX, so you should be able to monitor that information with any JMX tool. See http://wiki.apache.org/solr/SolrJmx On Wed, Dec 7, 2011 at 6:19 AM, Dmitry Kan wrote: > Yes, we do require that much. > Ok, thanks, I will try increasing the maxsize. > > On

Solr response writer

2011-12-07 Thread Finotti Simone
Hello, I need to change the HTTP result code of the query result if some conditions are met. Analyzing the flow of execution of the Solr query process, it seems to me that the "place" that fits best is the QueryResponseWriter. Anyway, I didn't find a way to change the HTTP request layout (I need

Solr using very high I/O

2011-12-07 Thread Adrian Fita
Hi. I am experiencing an issue where Solr is using huge amounts of I/O. Basically it uses the whole HDD continuously, leaving nothing to the other processes. Solr is called by a script which continuously indexes some files. The index is around 800MB and I can't understand why it could thrash the HDD so

Re: Solr request handler queries in fiddler

2011-12-07 Thread Dmitry Kan
Is it not possible to expose the shards to your IP and eclipse-debug the queries via the solr frontend? If you need to intercept the queries between frontend and shards in a non-windows environment, you could try wireshark or tcpmon (http://ws.apache.org/commons/tcpmon/) On Wed, Dec 7, 2011 at 10:

Re: UUID field changed when document is updated

2011-12-07 Thread blaise thomson
Hi Hoss, Thanks for getting back to me on this. : I've been trying to use the UUIDField in solr to maintain ids of the >: pages I've crawled with nutch (as per >: http://wiki.apache.org/solr/UniqueKey). The use case is that I want to >: have the server able to use these ids in another database

Re: cache monitoring tools?

2011-12-07 Thread Dmitry Kan
Yes, we do require that much. Ok, thanks, I will try increasing the maxsize. On Wed, Dec 7, 2011 at 10:56 AM, pravesh wrote: > >>facet.limit=50 > your facet.limit seems too high. Do you actually require this much? > > Since there a lot of evictions from filtercache, so, increase the maxsize

Re: Solr sorting issue : can not sort on multivalued field

2011-12-07 Thread pravesh
Was that field multivalued="true" earlier by any chance??? Did you rebuild the index from scratch after changing it to multivalued="false" ??? Regards Pravesh -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-sorting-issue-can-not-sort-on-multivalued-field-tp3564266p356683

Re: cache monitoring tools?

2011-12-07 Thread pravesh
>>facet.limit=50 Your facet.limit seems too high. Do you actually require this much? Since there are a lot of evictions from the filterCache, increase the maxsize value to your acceptable limit. Regards Pravesh -- View this message in context: http://lucene.472066.n3.nabble.com/cache-monitoring

Re: Solr request handler queries in fiddler

2011-12-07 Thread Kashif Khan
I am already using Eclipse with Jetty for debugging, but it is really hectic when we have shards and queries going to each shard. I want to skip that and see the queries in Fiddler instead. -- Kashif Khan. B.E., +91 99805 57379 http://www.kashifkhan.in On Wed, Dec 7, 2011 at 12:54 PM, Dmitry Kan [via Lucene] < ml

Reducing heap space consumption for large dictionaries?

2011-12-07 Thread Mark Schoy
Hi, in my index schema I have defined a DictionaryCompoundWordTokenFilterFactory and a HunspellStemFilterFactory. Each FilterFactory has a dictionary with about 100k entries. To avoid an out of memory error I have to set the heap space to 128m for 1 index. Is there a way to reduce the memory cons

Solr 4 near real time commit

2011-12-07 Thread yu shen
Hi All, I tried using the Solr 4 nightly build apache-solr-4.0-2011-12-06_08-52-46 and tried to enable autoSoftCommit like below in solrconfig.xml: 10 1000 I tried adding a document to this Solr instance using the SolrJ client in the nightly build. I did see a commit time boost. Single docume

Edismax and fuzzy querying

2011-12-07 Thread Marc SCHNEIDER
Hello, I'm using edismax and Solr 4.0 and I'd like to add fuzzy parameters for some fields like this : my_field1~2 my_field2 my_field3 Unfortunately It doesn't work, so I tried following approaches : 1) /select?q=my_search_string~2 => of course it applies to *all* fields of my edismax query, an