Re: Measuring SOLR performance

2013-07-30 Thread William Bell
But that link does not tell me which one you are using. You are listing about 4 versions on your site. Also, what did it fix? Pause times? Any other words of wisdom? On Tue, Jul 30, 2013 at 9:01 PM, Shawn Heisey wrote: > On 7/30/2013 6:59 PM, Roman Chyla wrote: > > I have been wanting some

Re: How might one search for dupe IDs other than faceting on the ID field?

2013-07-30 Thread Dotan Cohen
On Wed, Jul 31, 2013 at 4:56 AM, Bill Bell wrote: > On Jul 30, 2013, at 12:34 PM, Dotan Cohen wrote: >> On Tue, Jul 30, 2013 at 9:21 PM, Aloke Ghoshal wrote: >>> Does adding facet.mincount=2 help? >> >> In fact, when adding facet.mincount=20 (I know that some dupes are in >> the hundreds) I got

Negative Query Behaviour in Solr 3.2

2013-07-30 Thread karanjindal
Hi All, I am using Solr 3.2 and am confused about how a particular query is executed: q=name:memory OR -name:encoded. Separately firing q=name:memory gives 3 results and q=-name:encoded gives 25 results, and the result sets are disjoint. Since I am doing an OR query it should return 28 results, but it is onl
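One common explanation for this behaviour, offered as a sketch and not as a confirmed diagnosis for this index: inside a boolean query, Lucene treats `-name:encoded` as a prohibited clause that only subtracts from what the other clauses matched, so it contributes no documents of its own. The usual workaround is to give the negative clause something to subtract from, `(*:* -name:encoded)`. The helper names below are hypothetical and only build the query string.

```python
def anchor_negative_clause(clause):
    """Wrap a purely negative clause so it subtracts from all documents.

    '-name:encoded' becomes '(*:* -name:encoded)'; other clauses are
    returned unchanged. Hypothetical helper, not part of any Solr client.
    """
    clause = clause.strip()
    if clause.startswith("-"):
        return "(*:* " + clause + ")"
    return clause


def or_query(*clauses):
    """Join clauses with OR after anchoring any purely negative ones."""
    return " OR ".join(anchor_negative_clause(c) for c in clauses)


query = or_query("name:memory", "-name:encoded")
```

The resulting string can be passed as the `q` parameter in place of the original query.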

Re: How might one search for dupe IDs other than faceting on the ID field?

2013-07-30 Thread Dotan Cohen
On Wed, Jul 31, 2013 at 12:48 AM, Jack Krupansky wrote: > You could also try the terms component which provides a very efficient > facet-like feature - counting the terms. And you can set a minimum term > frequency of 2, so only the dups would come back: > > curl "http://localhost:8983/solr/terms?

Re: How might one search for dupe IDs other than faceting on the ID field?

2013-07-30 Thread Dotan Cohen
On Tue, Jul 30, 2013 at 11:14 PM, Jack Krupansky wrote: > The Solr SignatureUpdateProcessorFactory is designed to facilitate dedupe... > any particular reason you did not use it? > > See: > http://wiki.apache.org/solr/Deduplication > > and > > https://cwiki.apache.org/confluence/display/solr/De-Du

Re: How might one search for dupe IDs other than faceting on the ID field?

2013-07-30 Thread Dotan Cohen
On Tue, Jul 30, 2013 at 11:00 PM, Mikhail Khludnev wrote: > Dotan, > > Could you please provide more lines of the stack trace? Sure, thanks: java.lang.OutOfMemoryError: Java heap space java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space at org.apache.solr.servlet.SolrDispat

Re: How might one search for dupe IDs other than faceting on the ID field?

2013-07-30 Thread Dotan Cohen
On Tue, Jul 30, 2013 at 9:56 PM, Shawn Heisey wrote: > On 7/30/2013 12:49 PM, Dotan Cohen wrote: >> >> ‎Thanks, the query ran for almost 2 full minutes but it returned >> results! I'll google for how to increase the disk cache for queries >> like this. Other than the Qtime, is there no way to judg

Re: Ingesting geo data into Solr very slow

2013-07-30 Thread David Smiley (@MITRE.org)
Hi Marta, Presumably you are indexing polygons -- I suspect complex ones. There isn't too much that you can do about this right now other than index them in parallel. I see you are doing this in 2 threads; try 4, or maybe even 6. Also, ensure that maxDistErr is reflective of the smallest distan

solr4.0 how the log repeat "Waiting for client to connect to Zookeeper"

2013-07-30 Thread 黄飞鸿
Hi, The Solr 4.0 log always shows “Waiting for client to connect to ZooKeeper” and “Client is connected to Zookeeper”. But looking at the code, this only happens when “state == KeeperState.Expired”. We can see the value of state is syncConnected, so how did it happen? Can anyone he

Re: Measuring SOLR performance

2013-07-30 Thread Shawn Heisey
On 7/30/2013 6:59 PM, Roman Chyla wrote: > I have been wanting some tools for measuring performance of SOLR, similar > to Mike McCandles' lucene benchmark. > > so yet another monitor was born, is described here: > http://29min.wordpress.com/2013/07/31/measuring-solr-query-performance/ > > I teste

Re: [ANN] Solr Usability contest started

2013-07-30 Thread Alexandre Rafalovitch
Hello. I wanted to do a follow-up after the contest has been running for a week. It has been going relatively well. There were a lot of visitors last week, then a bit of quiet and then - after some of you re-announced the contest - a second wave of activity. Thanks to everybody contributing and

Re: How might one search for dupe IDs other than faceting on the ID field?

2013-07-30 Thread Bill Bell
This seems like a fairly large issue. Can you create a Jira issue ? Bill Bell Sent from mobile On Jul 30, 2013, at 12:34 PM, Dotan Cohen wrote: > On Tue, Jul 30, 2013 at 9:21 PM, Aloke Ghoshal wrote: >> Does adding facet.mincount=2 help? >> >> > > In fact, when adding facet.mincount=20 (I

Measuring SOLR performance

2013-07-30 Thread Roman Chyla
Hello, I have been wanting some tools for measuring performance of SOLR, similar to Mike McCandless' Lucene benchmark. So yet another monitor was born; it is described here: http://29min.wordpress.com/2013/07/31/measuring-solr-query-performance/ I tested it on the problem of garbage collectors (see

Solr rss indexation doubt

2013-07-30 Thread Luís Portela Afonso
Hi, I'm using Apache Solr to index RSS feeds. I'm successfully getting data (the url and whether the feed is active to index) from a database, and using that as the source of an entity to index the RSS data. I'm trying to reach a result but I don't get it. I will try to explain that with an example. The RSS

FieldCollapsing issues in SolrCloud 4.4

2013-07-30 Thread Ali, Saqib
Hello all, Is anyone experiencing issues with the numFound when using group=true in SolrCloud 4.4? Sometimes the results are off for us. I will post more details shortly. Thanks.

Ingesting geo data into Solr very slow

2013-07-30 Thread Simonian, Marta M (US SSA)
Hi, We are using Solr 4.4 to ingest geo data and it's really slow. When we don't index the geo it takes seconds to ingest 100,000 records, but as soon as we add it, it takes 2 hours. Also we found that when changing the distErrPct from 0.025 to 0.1, 1000 rows are ingested in 20 sec vs 2 min. But w

Re: Solr Cloud Questions

2013-07-30 Thread Timothy Potter
1) Depends on your document routing strategy. It sounds like you could be using the compositeId strategy and if so, there's still a hash range assigned to each shard, so you can split the big shards into smaller shards. 2) Since you're replicating in 2 places, when one of your servers crash, there

Re: How might one search for dupe IDs other than faceting on the ID field?

2013-07-30 Thread Jack Krupansky
You could also try the terms component which provides a very efficient facet-like feature - counting the terms. And you can set a minimum term frequency of 2, so only the dups would come back: curl "http://localhost:8983/solr/terms?terms.fl=id&terms.mincount=2" -- Jack Krupansky -Origina
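A minimal sketch of consuming that terms request from a script. It assumes the `/solr/terms` handler from the message is configured, that `wt=json` is added to the request, and that the terms come back as a flat `[term, count, term, count, ...]` list under `terms.<field>`; the function names are hypothetical.

```python
from urllib.parse import urlencode


def build_terms_url(base="http://localhost:8983/solr/terms", field="id", mincount=2):
    """Build the terms-component URL from the message, plus wt=json (assumption)."""
    params = [("terms.fl", field), ("terms.mincount", mincount), ("wt", "json")]
    return base + "?" + urlencode(params)


def duplicated_terms(response, field="id"):
    """Turn the assumed flat [term, count, ...] list into a {term: count} dict."""
    flat = response["terms"][field]
    return dict(zip(flat[0::2], flat[1::2]))


# Hand-written sample in the assumed response shape, not real Solr output.
sample = {"terms": {"id": ["doc-17", 3, "doc-42", 2]}}
dupes = duplicated_terms(sample)
```

Since `terms.mincount=2` is applied server-side, every term in the parsed dict is an ID that occurs more than once.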

Re: Performance question on Spatial Search

2013-07-30 Thread Luis Cappa Banda
Thank you very much, David. That was a great explanation! Regards, - Luis Cappa 2013/7/30 Smiley, David W. > Luis, > > field:* and field:[* TO *] are semantically equivalent -- they have the > same effect. But they internally work differently depending on the field > type. The field type ha

Re: Performance question on Spatial Search

2013-07-30 Thread Smiley, David W.
Luis, field:* and field:[* TO *] are semantically equivalent -- they have the same effect. But they internally work differently depending on the field type. The field type has the chance to intercept the range query to do something smart (FieldType.getRangeQuery(...)). Numeric/Date (trie) field

Re: Performance question on Spatial Search

2013-07-30 Thread Steven Bower
@David I will certainly update when we get the data refed... and if you have things you'd like to investigate or try out please let me know.. I'm happy to eval things at scale here... we will be taking this index from its current 45m records to 6-700m over the next few months as well.. steve On

SOLR Delta Import from a database View

2013-07-30 Thread danabdu
Hi All: I am new to SOLR and I would like to perform SOLR delta import from SQL Server database. I have seen a lot of examples online that use SQL server tables as the entity data sources in the dataconfig file. My question is - Is it possible to do delta import using SQL server database VIEW, rat

Re: Performance question on Spatial Search

2013-07-30 Thread Steven Bower
Very good read... Already using MMap... verified using pmap and vsz from top.. not sure what you mean by good hit ratio? Here are the stacks... Name Time (ms) Own Time (ms) org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet(AtomicReaderContext, Bits) 300879 203478 org.apache.luc

Re: Sending shard requests to all replicas

2013-07-30 Thread Isaac Hebsh
Hi, I submitted a new JIRA for this: https://issues.apache.org/jira/browse/SOLR-5092 A (very initial) patch is already attached. Reviews are very welcome. On Sun, Jul 28, 2013 at 4:50 PM, Erick Erickson wrote: > You'd probably start in CloudSolrServer in SolrJ code, > as far as I know that's wh

Solr Cloud Questions

2013-07-30 Thread AdityaR
Hi, I have created a solr cloud and have started using it , I have few questions regarding the set up and it would be really helpful if someone can answer these. Use Case: We have many clients and each clients data is in his own collection, we currently have 10 server cloud and have distributed

Re: Performance question on Spatial Search

2013-07-30 Thread Luis Cappa Banda
Hey, David, I've been reading the thread and I think it is one of the most educational mail threads I've read on the Solr mailing list. Just out of curiosity: internally for Solr, is a query like "field:*" the same as "field:[* TO *]"? I think that it's expected to receive the same number of numFound

Re: Performance question on Spatial Search

2013-07-30 Thread Smiley, David W.
Steve, The FieldCache and DocValues are irrelevant to this problem. Solr's FilterCache is, and Lucene has no counterpart. Perhaps it would be cool if Solr could look for expensive field:* usages when parsing its queries and re-write them to use the FilterCache. That's quite doable, I think. I ju

Re: Solr Cloud Setup

2013-07-30 Thread AdityaR
I was able to get the setup to work. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Cloud-Setup-tp4080182p4081434.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Email regular expression.

2013-07-30 Thread Luis Cappa Banda
I've been re-reading about that in older solr-mail-list messages, and it seems that a query like 'field:*' implies that internally all the indexed terms are checked one by one even if there are some caches filled for that field. That explains my poor performance in the past. However, it ma

Re: Performance question on Spatial Search

2013-07-30 Thread Mikhail Khludnev
On Tue, Jul 30, 2013 at 12:45 AM, Steven Bower wrote: > > - Most of my time (98%) is being spent in > java.nio.Bits.copyToByteArray(long,Object,long,long) which is being Steven, please http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html .my benchmarking experience shows that

Re: Email regular expression.

2013-07-30 Thread Luis Cappa Banda
I´ve tried this kind of queries in the past but I detected that they have a poor performance and that they are incredibly slow. But it´s just my experience, maybe someone can share with us any other opinion. 2013/7/30 Raymond Wiker > On Jul 30, 2013, at 22:05 , Luis Cappa Banda wrote: > > Anyw

Re: How might one search for dupe IDs other than faceting on the ID field?

2013-07-30 Thread Jack Krupansky
The Solr SignatureUpdateProcessorFactory is designed to facilitate dedupe... any particular reason you did not use it? See: http://wiki.apache.org/solr/Deduplication and https://cwiki.apache.org/confluence/display/solr/De-Duplication And I give a bunch of examples in my book. -- Jack Krupans

Re: Email regular expression.

2013-07-30 Thread Raymond Wiker
On Jul 30, 2013, at 22:05 , Luis Cappa Banda wrote: > Anyway, I still need to do a query like the following to retrieve those > documents with at least one E-mail detected: > > http://localhost:8080/mysolr/select?q=emails:[* TO > *]&start=0&rows=10&sort=mydate desc Can't you just use emails:* ?

Re: poor facet search performance

2013-07-30 Thread Mikhail Khludnev
On Tue, Jul 30, 2013 at 11:48 PM, Robert Stewart wrote: > Also we need to issue frequent commits since we are constantly streaming > new content into the system. I'd like to say show me profiler snapshot, but after that note. Solr's filter/field caches are top level datastructures, hence they ar

Re: Email regular expression.

2013-07-30 Thread Luis Cappa Banda
Hello guys, Hey, I think I´ve found how to do this just adding a filter. Just for anyone´s curiosity: Anyway, I still need to do a query like the following to retrieve those documents with at least one E-mail detected: http://localhost:8080/mysolr/select

Re: How might one search for dupe IDs other than faceting on the ID field?

2013-07-30 Thread Mikhail Khludnev
Dotan, Could you please provide more lines of the stack trace? I have no idea why it got worse in 4.3. I know that 4.3 can use facets backed by DocValues, which are modest on the heap. But from what I saw (I may be wrong), it's disabled for numeric facets. Hence, I can suggest to reindex id as s

Re: Email regular expression.

2013-07-30 Thread Luis Cappa Banda
Hello, Jack, Steve, Thank you for your answers. I´ve never used UAX29URLEmailTokenizerFactory, but I´ve read about it before trying RegExp´s queries. As far as I know, UAX29URLEmailTokenizerFactory allows to tokenize an entry text value into patterns that match URLs, E-mails, etc. Reading the docu

poor facet search performance

2013-07-30 Thread Robert Stewart
A little bit of history: We built a solr-like solution on Lucene.NET and C# about 5 years ago, which including faceted search. In order to get really good facet performance, what we did was pre-cache all the facet fields in RAM as efficient compressed data structures (either a variable byte en

Re: How might one search for dupe IDs other than faceting on the ID field?

2013-07-30 Thread Shawn Heisey
On 7/30/2013 12:49 PM, Dotan Cohen wrote: Thanks, the query ran for almost 2 full minutes but it returned results! I'll google for how to increase the disk cache for queries like this. Other than the Qtime, is there no way to judge the amount of memory required for a particular query to run? T

Re: How might one search for dupe IDs other than faceting on the ID field?

2013-07-30 Thread Dotan Cohen
On Tue, Jul 30, 2013 at 9:43 PM, Michael Della Bitta wrote: > Since this is a one-time problem, Have you thought of just dumping all the > IDs and looking for dupes using sort and awk or something similar to that? > All 100,000,000 of them :) That would take even longer! Also, I fear that this is

Re: How might one search for dupe IDs other than faceting on the ID field?

2013-07-30 Thread Dotan Cohen
On Tue, Jul 30, 2013 at 9:24 PM, Shawn Heisey wrote: > Add &facet.method=enum to the query URL. This will cause Solr to enumerate > the facet information on every query rather than load it into the field > cache, which takes a lot of memory. Solr 4.1 was probably very close to > running out of m
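A small sketch of the adjusted request, combining the original dupe-hunting query with the `facet.method=enum` suggestion above and the `facet.mincount=2` suggestion from elsewhere in this thread. Only the query string is built; the host and core path are left out because they are not given, and the function name is hypothetical.

```python
from urllib.parse import urlencode


def dupe_facet_query(field="id", mincount=2):
    """Build the select query string for finding duplicated values of `field`.

    facet.method=enum asks Solr to enumerate terms instead of loading the
    field into the field cache; facet.mincount hides values seen only once.
    """
    params = [
        ("q", "*:*"),
        ("facet", "true"),
        ("facet.field", field),
        ("facet.method", "enum"),
        ("facet.mincount", mincount),
        ("rows", 0),
    ]
    return "select?" + urlencode(params)


query_string = dupe_facet_query()
```

Note that `urlencode` percent-encodes `*:*`, which Solr decodes back to the match-all query.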

Re: How might one search for dupe IDs other than faceting on the ID field?

2013-07-30 Thread Michael Della Bitta
Since this is a one-time problem, Have you thought of just dumping all the IDs and looking for dupes using sort and awk or something similar to that? Michael Della Bitta Applications Developer o: +1 646 532 3062 | c: +1 917 477 7906 appinions inc. “The Science of Influence Marketing” 18 East
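The same dump-and-count idea can be sketched in a few lines of Python in place of sort and awk. This assumes the IDs have already been exported, one per item, and that the distinct IDs fit in memory; for an index of 100,000,000 IDs an external sort may still be the safer route. The function name is hypothetical.

```python
from collections import Counter


def find_duplicate_ids(ids):
    """Return {id: occurrences} for every ID that appears more than once."""
    counts = Counter(ids)
    return {doc_id: n for doc_id, n in counts.items() if n > 1}


# Hand-written sample IDs for illustration only.
sample_ids = ["a1", "b2", "a1", "c3", "b2", "a1"]
duplicates = find_duplicate_ids(sample_ids)
```

Passing an open file object (with newlines stripped) instead of a list lets the IDs stream through without first building the whole list.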

Re: How might one search for dupe IDs other than faceting on the ID field?

2013-07-30 Thread Dotan Cohen
On Tue, Jul 30, 2013 at 9:23 PM, Michael Della Bitta wrote: > Are you talking about the document's ID field? > > If so, you can't have duplicates... the latter document would overwrite the > earlier. > > If not, sorry for asking irrelevant questions. :) > In Solr 4.1 we were using overwrite=false

Re: Sort top N results in solr after boosting

2013-07-30 Thread Utkarsh Sengar
Thanks guys! Will play around with the function query. Thanks, -Utkarsh On Tue, Jul 30, 2013 at 10:50 AM, Chris Hostetter wrote: > > : bq: I am also trying to figure out if I can place > : extra dimensions to the solr score which takes other attributes into > : consideration > > To re-iterate er

Re: How might one search for dupe IDs other than faceting on the ID field?

2013-07-30 Thread Dotan Cohen
On Tue, Jul 30, 2013 at 9:21 PM, Aloke Ghoshal wrote: > Does adding facet.mincount=2 help? > > In fact, when adding facet.mincount=20 (I know that some dupes are in the hundreds) I got the OutOfMemoryError in seconds instead of minutes. -- Dotan Cohen http://gibberish.co.il http://what-is-what

Re: How might one search for dupe IDs other than faceting on the ID field?

2013-07-30 Thread Michael Della Bitta
Are you talking about the document's ID field? If so, you can't have duplicates... the latter document would overwrite the earlier. If not, sorry for asking irrelevant questions. :) Michael Della Bitta Applications Developer o: +1 646 532 3062 | c: +1 917 477 7906 appinions inc. “The Scienc

Re: How might one search for dupe IDs other than faceting on the ID field?

2013-07-30 Thread Shawn Heisey
On 7/30/2013 12:16 PM, Dotan Cohen wrote: To search for duplicate IDs, I am running the following query: select?q=*:*&facet=true&facet.field=id&rows=0 However, since upgrading from Solr 4.1 to Solr 4.3 I am receiving OutOfMemoryError errors instead of the desired facet: Might there be a les

Re: How might one search for dupe IDs other than faceting on the ID field?

2013-07-30 Thread Aloke Ghoshal
Does adding facet.mincount=2 help? On Tue, Jul 30, 2013 at 11:46 PM, Dotan Cohen wrote: > To search for duplicate IDs, I am running the following query: > select?q=*:*&facet=true&facet.field=id&rows=0 > > However, since upgrading from Solr 4.1 to Solr 4.3 I am receiving > OutOfMemoryError error

How might one search for dupe IDs other than faceting on the ID field?

2013-07-30 Thread Dotan Cohen
To search for duplicate IDs, I am running the following query: select?q=*:*&facet=true&facet.field=id&rows=0 However, since upgrading from Solr 4.1 to Solr 4.3 I am receiving OutOfMemoryError errors instead of the desired facet: java.lang.OutOfMemoryError: Java heap space java.lang.RuntimeExceptio

Trying to determine the benefit of spellcheck-based suggester vs. using terms component?

2013-07-30 Thread Timothy Potter
Going over the comments in SOLR-1316, I seem to have lost the forest for the trees. What is the benefit of using the spellcheck-based suggester over something like the terms component to get suggestions as the user types? Maybe it is faster because it builds the in-memory data structure on comm

Re: Performance question on Spatial Search

2013-07-30 Thread Steven Bower
I am curious why the field:* walks the entire terms list.. could this be discovered from a field cache / docvalues? steve On Tue, Jul 30, 2013 at 2:00 PM, Steven Bower wrote: > Until I get the data refed I there was another field (a date field) that > was there and not when the geo field was/w

Re: Performance question on Spatial Search

2013-07-30 Thread Steven Bower
Until I get the data refed I there was another field (a date field) that was there and not when the geo field was/was not... i tried that field:* and query times come down to 2.5s .. also just removing that filter brings the query down to 30ms.. so I'm very hopeful that with just a boolean i'll be

Re: Sort top N results in solr after boosting

2013-07-30 Thread Chris Hostetter
: bq: I am also trying to figure out if I can place : extra dimensions to the solr score which takes other attributes into : consideration To re-iterate Erick's point, you should definitely look at using things like the {!boost} qparser combined with function queries that take into account pre-

Re: Searching in stopwords

2013-07-30 Thread Chris Hostetter
: So if i search for companies like HR Club i get no results. Similarly : search for India HR giving no results. How can i get results in query for : following companies : take a look at the "CommonGramsFilterFactory" and "CommonGramsQueryFilterFactory" ... they should let you remove stopwords,

Re: Failed to add documents

2013-07-30 Thread Jasvir Singh
On Tue, Jul 30, 2013 at 10:51 PM, Jack Krupansky-2 [via Lucene] wrote: > > Sorry, but you will have to ask your question in a haystack support forum. Ok thanks. > But, a 404 usually means the path of the request URL is wrong. What is the > request URL in this case? My url is http://127.0.0.1:89

Re: Shows different result with using 'and' and 'AND'

2013-07-30 Thread Erick Erickson
Try attaching &debug=query and see what the parsed query looks, that can often give you clues as to what's really going on. Of course if tag is a string type then Jack's comment is spot on, it's case sensitive. The admin/analysis page will also help you understand the analysis chains. But also, n

Re: Failed to add documents

2013-07-30 Thread Jack Krupansky
Sorry, but you will have to ask your question in a haystack support forum. But, a 404 usually means the path of the request URL is wrong. What is the request URL in this case? My guess is that you haven't configured haystack properly in terms of where your Solr server lives, but we can't help

Failed to add documents

2013-07-30 Thread Jasvir Singh
I am using Apache Solr 4.4.0 on Ubuntu 13.04. I have to use Apache Solr with haystack, for which I am using documentation. But I got stuck at a point. Whenever I rebuild the index, it gives the error "Failed to add docume

Re: Boost on specific fields

2013-07-30 Thread Chris Hostetter
: coming as part of search results. Here, I am applying boosting on the no of : reviews and the has_image(This will be "Y" Or "N") and I am expecting the : product which has no of reviews count is more and the has_image="Y" should : come first. But, in some of the cases , I am not getting what I

Using HP SiteScope to monitor individual Solr shards

2013-07-30 Thread Ali, Saqib
We would like to use HP SiteScope to monitor the availability of the individual Solr shards. Any ideas on how we can do that? Is there a shard based URL that is a sure shot of knowing that the shard is feeling healthy? Thanks! :)

Re: Performance question on Spatial Search

2013-07-30 Thread Steven Bower
Will give the boolean thing a shot... makes sense... On Tue, Jul 30, 2013 at 11:53 AM, Smiley, David W. wrote: > I see the problem: it's +pp:*. It may look innocent but it's a > performance killer. What you're telling Lucene to do is iterate over > *every* term in this index to find all document

Is there a way to prioritize document updates when using solrCloud?

2013-07-30 Thread SolrLover
We index the data using queue (from application) and batch processing (around 10 million documents). We want the data sent by queue to be seen instantaneously even when delta process is submitting the documents. I was thinking of submitting documents from queue to a node and using another node for

DataImportHandler rows parameter and performance

2013-07-30 Thread Luis Lebolo
Hi All, I'm using the Admin UI dataimport page to load some documents into my index. There's a rows parameter that you can leave blank (to load all documents). When I change it to the maximum number of documents, the performance drops by a factor of 10. For example, I have 1627 root entities. If

Re: Performance question on Spatial Search

2013-07-30 Thread Smiley, David W.
I see the problem: it's +pp:*. It may look innocent but it's a performance killer. What you're telling Lucene to do is iterate over *every* term in this index to find all documents that have this data. Most fields are pretty slow to do that. Lucene/Solr does not have some kind of cache for this. I
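The replies in this thread mention trying "the boolean thing" in place of `+pp:*`. A minimal sketch of that idea, assuming documents are prepared as Python dicts before indexing: a flag recording whether the field is present is added at index time, so the filter becomes a single-term lookup. The flag name `has_pp` and the helper are hypothetical, and the schema would need a matching boolean field.

```python
def add_presence_flag(doc, field="pp", flag="has_pp"):
    """Return a copy of doc with a boolean flag recording whether `field` has a value.

    The query side can then use fq=has_pp:true (hypothetical field name)
    rather than walking every term of the field with pp:*.
    """
    flagged = dict(doc)
    value = doc.get(field)
    flagged[flag] = value is not None and value != "" and value != []
    return flagged


with_geo = add_presence_flag({"id": "1", "pp": "40.7,-74.0"})
without_geo = add_presence_flag({"id": "2"})
```

The trade-off is a reindex and one extra field, in exchange for avoiding the term enumeration on every query.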

Re: CachedSqlEntityProcessor not adding fields

2013-07-30 Thread Luis Lebolo
I'm noticing some very odd behavior using dataimport from the Admin UI. Whenever I limit the number of rows to 75 or below, the aliases field never gets populated. As soon as I increase the limit to 76 or more, the aliases field gets populated! What am I not understanding here? On Tue, Jul 30,

Re: Performance question on Spatial Search

2013-07-30 Thread Steven Bower
#1 Here is my query: sort=vid asc start=0 rows=1000 defType=edismax q=*:* fq=recordType:"xxx" fq=vt:"X12B" AND fq=(cls:"3" OR cls:"8") fq=dt:[2013-05-08T00:00:00.00Z TO 2013-07-08T00:00:00.00Z] fq=(vid:86XXX73 OR vid:86XXX20 OR vid:89XXX60 OR vid:89XXX72 OR vid:89XXX48 OR vid:89XXX31 OR vid:89XXX2

Re: Email regular expression.

2013-07-30 Thread Jack Krupansky
Just use the UAX29URLEmailTokenizerFactory, which recognizes email addresses. Any particular reason that you're trying to reinvent the wheel? -- Jack Krupansky -Original Message- From: Luis Cappa Banda Sent: Tuesday, July 30, 2013 10:53 AM To: solr-user@lucene.apache.org Subject: Ema

Re: Email regular expression.

2013-07-30 Thread Steve Rowe
Luis, do you know about UAX29URLEmailTokenizerFactory?: On Jul 30, 2013, at 10:53 AM, Luis Cappa Banda wrote: > Hello everyone! > > Unfortunately I have to search all E-mail addresses found in a te

Re: Email regular expression.

2013-07-30 Thread Andy Lester
On Jul 30, 2013, at 9:53 AM, Luis Cappa Banda wrote: > The syntax is the following: > > *E-mail: * > text:/[a-z0-9_\|-]+(\.[a-z0-9_\|-]|)*@[a-z0-9-]|(\.[a-z0-9-]|)*\.([a-z]{2,4})/ Please note that the question of "How do I write a regex to match an email address" is one of the most discussed

Re: SolrCloud commit process is too time consuming, even if documents are light

2013-07-30 Thread Mark Miller
I don't seem to be seeing a significant slowdown over time when I use the old defaults for merge threads and max merges. - Mark On Jul 25, 2013, at 10:17 AM, Mark Miller wrote: > I'm looking into some possible slow down after long indexing issues when I > get back from vacation. This could be

CachedSqlEntityProcessor not adding fields

2013-07-30 Thread Luis Lebolo
Hi All, I'm trying to use CachedSqlEntityProcessor in one of my sub-entities, but the field never gets populated. I'm using Solr 4.4. The field is a multi-valued field: The relevant part of my data-config.xml looks like: ... Let me know if you

Email regular expression.

2013-07-30 Thread Luis Cappa Banda
Hello everyone! Unfortunately I have to search all E-mail addresses found in a text field from each document. I've been reading for a while how to use RegExp's in Solr, but after trying some of them they didn't work. I've noticed that Lucene RegExp syntax sometimes is very different from the class

Re: Performance question on Spatial Search

2013-07-30 Thread David Smiley (@MITRE.org)
Steve, (1) Can you give a specific example of how you are specifying the spatial query? I'm looking to ensure you are not using "IsWithin", which is not meant for point data. If your query shape is a circle or the bounding box of a circle, you should use the geofilt query parser, otherwise use

Re: Improper shutdown of Solr in Jetty 9

2013-07-30 Thread Alexandre Rafalovitch
Of course, I meant Jetty (not Tomcat). So apologies for spam and confusion of my own. The rest of the statement stands. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at

Re: Improper shutdown of Solr in Jetty 9

2013-07-30 Thread Alexandre Rafalovitch
Thanks for letting us know. See if you can add it to the documentation somewhere. Solr is not using Tomcat 9, but I believe that was primarily because Tomcat 9 requires Java 7 and Solr 4.x is staying with Java 6 as minimum requirement. Regards, Alex. Personal website: http://www.outerthoughts.

Re: Improper shutdown of Solr in Jetty 9

2013-07-30 Thread Artem Karpenko
Uh, sorry for spamming, but if anyone interested there is a way to properly shutdown Jetty when it's launched with --exec flag. You can use JMX to invoke method stop() on the Jetty's Server MBean. This triggers a proper shutdown with all Solr's close() callbacks executed. I wonder why it's not n

Re: Improper shutdown of Solr in Jetty 9

2013-07-30 Thread Artem Karpenko
After some investigation I found that the problem is not with Jetty's version but with usage of the --exec flag. Namely, when --exec is used (to specify JVM args) then shutdown is not graceful; it seems that the Java process is just killed. Not sure how to handle this... Regards, Artem Karpenko. 29.07

Re: Shows different result with using 'and' and 'AND'

2013-07-30 Thread Jack Krupansky
#3 and #4 are different queries - the "other" term is used in different fields. What is your default search field, which will be used for "other" in #3? Is your "tag" field a "string" field type? If so, then it is case sensitive. If you really need it to be case insensitive, make it a "text" f

Re: Auto Correction of Solr Query

2013-07-30 Thread Jack Krupansky
Not at this time. It is a reasonable request, but Solr doesn't yet have that feature. You might want to search Jira to see if it has been filed. If not, file a request. -- Jack Krupansky -Original Message- From: sivaprasad Sent: Tuesday, July 30, 2013 5:40 AM To: solr-user@lucen

Re: Synonyms with wildcard search

2013-07-30 Thread Jack Krupansky
Sorry, but Solr synonym processing does not know about wildcards, so it is bypassed when a wildcard is present. Technically, it could probably be enhanced to support them, at least for some common special cases such as yours, but that prospect won't help you right now. Your best bet is to pr

Re: Boost on specific fields

2013-07-30 Thread Jack Krupansky
Can you point to a specific portion of your query that leads you to "expect" some particular document to score higher? Please keep in mind that Solr does not yet have the ability to read your mind - it simply does its best to follow your specific instructions. Doing a keyword match on a field

Re: Performance question on Spatial Search

2013-07-30 Thread Erick Erickson
bq: i've added {!cache=false} Ahh, ok. forget my comments on warming then, they're irrelevant. Heap probably isn't relevant either given, as you say, you don't see pressure there. What puzzles me then is why you're spending all your time in copyToByteArray(long,Object,long,long). I _suppose_ (an

Re: Shows different result with using 'and' and 'AND'

2013-07-30 Thread Payal.Mulani
Hi Raymond Wiker, When we search like this 1) tag:”test” works 2) tag:”TEST” works 3) tag:”test” && tag:”other” works to find items with both tags 4) tag:”TEST” && tag:”other” *doesn’t work.* Either 2 should fail with true case sensitivity or 4 should work (as the combination of two valid

Index timestamp of pdf in unix timeformat

2013-07-30 Thread xan
Currently, while using ExtractingResourceHandler to index rich documents like pdfs, docs, etc. solr automatically indexes the time-created/modified in human-readable time format (Wed May 29 20:38:30 IST 2013). How can I make solr to index the time in unixtime format? -- View this message in co

Re: DIH to index the data - 250 millions - Need a best architecture

2013-07-30 Thread Santanu8939967892
Hi Shawn, Thanks for your detailed explanation. Will do a POC and finalize the arch. With Regards, Santanu On Tue, Jul 30, 2013 at 12:20 PM, Shawn Heisey wrote: > On 7/30/2013 12:23 AM, Santanu8939967892 wrote: > > Yes, your assumption is correct. The index size is around 250 GB and

Re: Using a dictionary to boost queries

2013-07-30 Thread Giovanni Bricconi
Maybe you can try with synonyms add a to the field type you are using for text and then place habeas corpus => habeascorpusxx in the special_words.txt file then reindex some documents and try some queries with debugQuery=true. Remember to reload the core when changing configuration. 2013

Re: Auto Correction of Solr Query

2013-07-30 Thread sivaprasad
Thank you for the quick response. I checked the document on spellcheck.collate. Looks like it is going to return the suggestion to the client and the client needs to make one more request to the server with the suggestion. Is there any way to "auto correct" at the server end? -- View this mess

Synonyms with wildcard search

2013-07-30 Thread Sandeep Gupta
Hello All, I want to know whether it is possible to make a query of word which has synonym+wildcard. For example : I have one field which is type of text_en (default fieldType in 4.3.1) And synonym.txt file has this entry colour => color Now when I am using full text search as colour* (with wil

Re: Machine memory full

2013-07-30 Thread Ranjith Venkatesan
Thanks for the reply. I think this approach will work only for new collections. Is there any approach to shift some existing cores to a new machine or node?? -- View this message in context: http://lucene.472066.n3.nabble.com/Machine-memory-full-tp4080511p4081235.html Sent from the Solr - User

Re: DataImportHandler, BlobTransformer, FieldReaderDataSource and TikaEntityExtractor

2013-07-30 Thread Shalin Shekhar Mangar
There's no BlobTransformer in DataImportHandler. You'll have to write one. Also, you'd probably need to write a FieldInputStreamDataSource instead of FieldReaderDataSource. On Tue, Jul 30, 2013 at 12:30 PM, Raymond Wiker wrote: > I have a case where I want to documents and metadata content from

Boost on specific fields

2013-07-30 Thread sivaprasad
Hi, I have indexed product information , the product name, no of reviews and has_image fields. For example, any two products has "laptop" in the product name and the user issued "laptop" as the search query, both the products are coming as part of search results. Here, I am applying boosting on t

Re: Auto Correction of Solr Query

2013-07-30 Thread Artem Karpenko
Hi Siva, you might want to consider spell check component: http://wiki.apache.org/solr/SpellCheckComponent. Parameter "collate=true" can be used to automatically replace query with the top suggestions. Best, Artem. 30.07.2013 10:31, sivaprasad wrote: Hi, Is there any way to "auto correct"

Auto Correction of Solr Query

2013-07-30 Thread sivaprasad
Hi, Is there any way to "auto correct" the Solr query and get the results? For example, user tries to search for "beats by dre" , but by mistake , he typed "beats bt dre". In this case, Solr should correct the query and return the results for "beats by dre". Is there any suggestions, how we can a

Re: Solr Cloud - How to balance Batch and Queue indexing?

2013-07-30 Thread Aditya
Hi, Do you want 5 replicas? 1 or 2 is enough. If you already have 100 million records, you don't need to do batch indexing. Push it once, Solr has the capability to soft commit every N docs. Use round robin and send documents to different core. When you search, search from all the cores. How yo

DataImportHandler, BlobTransformer, FieldReaderDataSource and TikaEntityExtractor

2013-07-30 Thread Raymond Wiker
I have a case where I want to documents and metadata content from a database. The metadata is not a problem, but it does not appear that I can handle the document content (held as BLOBS in the database) with out-of-the-box SOLR 4.4 functionality. I was hoping to be able to solve this by doin