RE: Mapping and Capture in ExtractingRequestHandler

2011-12-20 Thread Swapna Vuppala
Hi Erick, Can you please give me a little more information about the SolrJ program and how to use it to construct a Solr document? Thanks and Regards, Swapna. -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Wednesday, December 21, 2011 2:28 AM To: solr-user@lu

Re: Release build or code for SolrCloud

2011-12-20 Thread Mark Miller
You might find the solr/cloud-dev/solrcloud.sh script informative. From a solrcloud branch checkout, you can run it and it will start up a 2-shard, 6-node cluster with ZooKeeper running on a single node. stop.sh will shut down the 6 nodes. Once you start up the nodes, you can start indexing and s

Re: Inserting a field in the json doc before indexing

2011-12-20 Thread Dipti Srivastava
Thanks for responding. Both options seem good. For now I just need to manipulate the JSON string I get and add a special field to it which I will later on use for my searches. I am going with the approach of using a JSONObject and converting it back to string. Dipti On 12/20/11 10:32 AM, "Erick Er
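A minimal sketch of the client-side approach described above: parse the incoming JSON string, add one extra field, and serialize it again before indexing. It is written in Python rather than with the Java JSONObject mentioned in the message, and the field name `source_tag` and its value are hypothetical.

```python
import json


def add_field(json_doc: str, field: str, value) -> str:
    """Parse a JSON object, add one field, and return it as a string again."""
    doc = json.loads(json_doc)
    if not isinstance(doc, dict):
        raise ValueError("expected a JSON object at the top level")
    doc[field] = value
    return json.dumps(doc)


# Hypothetical input document and special field, for illustration only.
original = '{"id": "42", "title": "hello"}'
updated = add_field(original, "source_tag", "batch-7")
```

Doing this on the client keeps the load off the Solr server; the alternative raised in this thread, a custom update handler, moves the same step to the server.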

Re: Release build or code for SolrCloud

2011-12-20 Thread Dipti Srivastava
Thanks for all the responses. I got the code from the trunk. Now I will work through the rest of the steps. Dipti On 12/20/11 1:58 PM, "Chris Hostetter" wrote: > >: >> I am following the 2 shard example from the wiki page >: >> http://wiki.apache.org/solr/SolrCloud#SolrCloud-1 > >Everything on that wiki

Re: Release build or code for SolrCloud

2011-12-20 Thread Chris Hostetter
: >> I am following the 2 shard example from the wiki page : >> http://wiki.apache.org/solr/SolrCloud#SolrCloud-1 Everything on that wiki should apply to trunk, as noted on the wiki page itself. the "solrcloud" branch people have mentioned is related to this comment from that wiki page... >>

Re: Release build or code for SolrCloud

2011-12-20 Thread Dipti Srivastava
Is this the branch for SolrCloud? If not, what is the location of the SolrCloud branch? https://svn.apache.org/repos/asf/lucene/dev/trunk/ Thanks! Dipti On 12/20/11 1:20 PM, "Rafał Kuć" wrote: >Hello! > >Those examples should work with the code in solrcloud branch. > >-- >Regards, > R

Re: queryResultCache hit count is not being increased when programmatically adding Lucene queries as filters in the SearchComponent

2011-12-20 Thread Igor Muntyan
Thanks Chris, Now it all makes sense. One of my filters is a date range filter with the current time as the lower bound, so it changes with every query. I just tried commenting it out, and I now see the queryResultCache being utilized. I will try to round the currentTimeMillis down to the nearest mi
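The rounding idea in this message can be sketched as follows: truncate the millisecond timestamp to a minute boundary so that every query issued within the same minute produces an identical filter. This is a Python illustration, not the poster's SearchComponent code, and the function names are made up for the example.

```python
from datetime import datetime, timezone

MILLIS_PER_MINUTE = 60_000


def round_down_to_minute(epoch_millis: int) -> int:
    """Drop the seconds and milliseconds from an epoch timestamp."""
    return epoch_millis - (epoch_millis % MILLIS_PER_MINUTE)


def to_solr_date(epoch_millis: int) -> str:
    """Format an epoch timestamp (milliseconds) as a UTC date string."""
    seconds = epoch_millis // 1000
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


now_millis = 1324400000123          # example value standing in for currentTimeMillis
lower_bound = round_down_to_minute(now_millis)
lower_bound_text = to_solr_date(lower_bound)
```

The trade-off is precision: the filter's lower bound can be up to one minute stale, in exchange for a filter that repeats and can therefore be found in the cache.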

Re: Solr Distributed Search vs Hadoop

2011-12-20 Thread Ted Dunning
Well, that begins to look not so much like a Solr/Lucene problem. The overall data is moderately large (TBs to tens of TBs) for Lucene, and the individual user profiles are distinctly large to be storing in Lucene. If there is part of the profile that you might want to search, that would be appropria

Re: Release build or code for SolrCloud

2011-12-20 Thread Rafał Kuć
Hello! Those examples should work with the code in solrcloud branch. -- Regards, Rafał Kuć > I am following the 2 shard example from the wiki page > http://wiki.apache.org/solr/SolrCloud#SolrCloud-1 > And it points me to > Getting started: > Check out and build the trunk: > https://svn.apach

Re: Release build or code for SolrCloud

2011-12-20 Thread Dipti Srivastava
I am following the 2 shard example from the wiki page http://wiki.apache.org/solr/SolrCloud#SolrCloud-1 And it points me to Getting started: Check out and build the trunk: https://svn.apache.org/repos/asf/lucene/dev/trunk and build the example server with cd solr; ant example. That's why I am n

Re: Solr Distributed Search vs Hadoop

2011-12-20 Thread Alireza Salimi
Well, actually we haven't started the actual project yet. But it will probably have to handle the data of millions of users, and a rough estimate for each user's data would be around 5 MB. The other problem is that the data will change very often. I hope I answered your question

Re: Release build or code for SolrCloud

2011-12-20 Thread Rafał Kuć
Hello! Please take a look at the solrcloud branch of the svn repository. > Hi, > I am looking to do a POC with SolrCloud, where could I get the > actual release build or else code that I can build from. > Thanks, > Dipti > > This message is private and confidenti

Re: Solr Distributed Search vs Hadoop

2011-12-20 Thread Ted Dunning
You didn't mention how big your data is or how you create it. Hadoop would mostly be used in the preparation of the data or the off-line creation of indexes. On Tue, Dec 20, 2011 at 12:28 PM, Alireza Salimi wrote: > Hi, > > I have a basic question, let's say we're going to have a very very huge set

Release build or code for SolrCloud

2011-12-20 Thread Dipti Srivastava
Hi, I am looking to do a POC with SolrCloud. Where could I get the actual release build, or else code that I can build from? Thanks, Dipti

Re: Mapping and Capture in ExtractingRequestHandler

2011-12-20 Thread Erick Erickson
When you start getting into complex HTML extraction, you're probably better off using a SolrJ program with a forgiving HTML parser, extracting the relevant bits yourself, and constructing a SolrDocument. FWIW, Erick On Tue, Dec 20, 2011 at 12:54 AM, Swapna Vuppala wrote: > Hi, > > I understand
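The extraction step suggested here can be illustrated independently of SolrJ. The sketch below uses Python's standard `html.parser` as a stand-in for a forgiving HTML parser and builds a plain dictionary as a stand-in for the document; the field names `id`, `title`, and `text` are assumptions, and sending the result to Solr is left out.

```python
from html.parser import HTMLParser


class TitleAndTextExtractor(HTMLParser):
    """Collects the <title> text and the visible body text, skipping scripts and styles."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.skip_depth = 0
        self.title_parts = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
        elif tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False
        elif tag in ("script", "style") and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text or self.skip_depth:
            return
        (self.title_parts if self.in_title else self.text_parts).append(text)


def html_to_document(doc_id: str, html: str) -> dict:
    """Extract the relevant bits and return them as a field-name to value mapping."""
    parser = TitleAndTextExtractor()
    parser.feed(html)
    parser.close()
    return {
        "id": doc_id,
        "title": " ".join(parser.title_parts),
        "text": " ".join(parser.text_parts),
    }


page = "<html><head><title>Hello</title><style>p{}</style></head><body><p>First<p>Second</body>"
document = html_to_document("page-1", page)
```

Because the extraction is ordinary client code, any mapping rule that is awkward to express through request-handler parameters can be written directly as program logic.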

Re: queryResultCache hit count is not being increased when programmatically adding Lucene queries as filters in the SearchComponent

2011-12-20 Thread Chris Hostetter
: In the /admin/stats.jsp I have noticed that if the code above gets executed : then my queryResultCache hit count does not increase. Filters are part of the cache key for the queryResultCache (because the cache contains the sorted paginated results *after* filters have been applied) so if your
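The effect described here can be reproduced with a toy cache whose key includes the filters. This is only a model of the idea, not Solr's actual cache implementation; the class name, the use of a `frozenset` for the filters, and the example filter strings are all assumptions.

```python
class QueryResultCacheSketch:
    """Toy cache keyed on (query, filters, sort), counting lookups and hits."""

    def __init__(self):
        self._entries = {}
        self.lookups = 0
        self.hits = 0

    def get_or_compute(self, query, filters, sort, compute):
        key = (query, frozenset(filters), sort)
        self.lookups += 1
        if key in self._entries:
            self.hits += 1
            return self._entries[key]
        result = compute()
        self._entries[key] = result
        return result


def run_search():
    return ["doc1", "doc2"]


cache = QueryResultCacheSketch()

# A filter that embeds the current time differs on every request: no hits.
cache.get_or_compute("solr", ["date:[1324400000123 TO *]"], "score desc", run_search)
cache.get_or_compute("solr", ["date:[1324400001456 TO *]"], "score desc", run_search)
hits_with_changing_filter = cache.hits

# The same filter repeated produces the same key: the second request is a hit.
cache.get_or_compute("solr", ["date:[1324399980000 TO *]"], "score desc", run_search)
cache.get_or_compute("solr", ["date:[1324399980000 TO *]"], "score desc", run_search)
hits_with_stable_filter = cache.hits - hits_with_changing_filter
```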

Solr Distributed Search vs Hadoop

2011-12-20 Thread Alireza Salimi
Hi, I have a basic question. Let's say we're going to have a very large set of data, such that we will certainly need many servers (tens or hundreds). We will also need failover. Now the question is whether we should use Hadoop, or whether Solr Distributed Search with shards would be en

RE: Poor performance on distributed search

2011-12-20 Thread Chris Hostetter
: I had a similar requirement in my project, where a user might ask for up : to 3000 results. What I did was change SolrIndexSearcher.doc(int, Set) : to retrieve the unique key from the field cache instead of retrieving it : as a stored field from disk. This resulted in a massive speed : impro

Re: Poor performance on distributed search

2011-12-20 Thread Chris Hostetter
: So why do you have this 2,000 requirement in the first : place? This really sounds like an XY problem. I would really suggest re-visiting this question. No single user is going to look at 2000 docs on a single page, and in your previous email you said there was a requirement to ask Solr for 2

Re: Poor performance on distributed search

2011-12-20 Thread Chris Hostetter
: For example, I have 4 shards and, in the end, I need 2000 docs. Now, when I'm using : &shards=127.0.0.1:8080/solr/shard1,127.0.0.1:8080/solr/shard2,127.0.0.1:8080/solr/shard3,127.0.0.1:8080/solr/shard4 : Solr gets 2000 docs from each shard (shards 1 to 4, so 8000 : docs in total), merges and sorts them,
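The arithmetic in the quoted message can be written out as a small sketch: build the `shards` value from a list of shard addresses and count how many entries take part in the merge when each shard contributes the full requested number. The helper names are made up for the example, and the count follows the poster's description rather than a measurement of what Solr transfers.

```python
def shards_param(shard_addresses):
    """Join shard addresses into the comma-separated value of the shards parameter."""
    return ",".join(shard_addresses)


def entries_merged(rows, num_shards):
    """Entries the coordinating node sees when every shard returns `rows` candidates."""
    return rows * num_shards


addresses = [f"127.0.0.1:8080/solr/shard{i}" for i in range(1, 5)]
shards_value = shards_param(addresses)
total = entries_merged(rows=2000, num_shards=len(addresses))
```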

Re: Default Search UI not working

2011-12-20 Thread Chris Hostetter
: Subject: Default Search UI not working : In-Reply-To: <1324287306816-3597893.p...@n3.nabble.com> http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fres

Call RequestHandler from QueryComponent

2011-12-20 Thread marita
Thanks a lot Hoss. I will try it today, that's exactly what I needed. Sorry again about the multiple emails. Thanks, Maria From: Chris Hostetter [hossman_luc...@fucit.org] > Sent: Friday, December 16, 2011 3:23 PM > To: solr-user@lucene.apache.org > Subject: Re: Call RequestHandler from QueryCom

Re: Inserting a field in the json doc before indexing

2011-12-20 Thread Erick Erickson
The other option is to write a custom update handler that injects the data. A lot depends on whether you want to put the load on the server or the client. Best Erick On Mon, Dec 19, 2011 at 1:35 PM, Anuj Kumar wrote: > Hi Dipti, > > If you are receiving the JSON within your Java code, you can try an

edismax ignores the quoted sub phrase query ?

2011-12-20 Thread ldavid2020
Hi, I am new to edismax and am trying to migrate from dismax to edismax. For queries with an explicitly quoted sub-phrase, edismax seems to ignore the quoted part during whole-query phrase matching (pf), unlike dismax. Here is one example: For the same query: 2012

Re: Exception using SolrJ

2011-12-20 Thread Otis Gospodnetic
Shawn, Give httping a try: http://www.vanheusden.com/httping/ It may reveal something about the connection being dropped periodically. Maybe even a plain ping would show some dropped packets if it's a general network issue and not a Solr-specific one. Otis Performance Monitoring SaaS for Solr - h

Re: Exception using SolrJ

2011-12-20 Thread Shawn Heisey
On 12/20/2011 2:57 AM, Chantal Ackermann wrote: Hi Shawn, the exception indicates that the connection was lost. I'm sure you figured that out for yourself. Questions: - is that specific server instance really running? That is, can you reach it via browser? - If yes: how is your connection pool

Re: issues with WordDelimiterFilter

2011-12-20 Thread Otis Gospodnetic
Hi Steven, Hm, not being able to find the exact original phrase indeed sounds buggy to me, worthy of a JIRA issue and a unit test that shows this happening, if you can. Thanks, Otis Performance Monitoring SaaS for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html >_

Re: In-web search

2011-12-20 Thread Otis Gospodnetic
Hi Remi, That depends on how you've structured and indexed your documents (web pages?) with Solr. If you've extracted the hostname into a 'hostname' field and indexed it, then you should be able to use syntax like:   hostname:www.sematext.com If you've extracted the domain name into a 'domain'
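A rough sketch of the approach in this reply: derive a hostname value from each page URL at index time, then restrict a search to one site with a query on that field. The field name `hostname` comes from the message; the other field names, the helper names, and the example URL path are assumptions.

```python
from urllib.parse import urlparse


def hostname_of(url: str) -> str:
    """Return the lower-cased host part of a URL, or an empty string if there is none."""
    return (urlparse(url).hostname or "").lower()


def page_fields(url: str, body: str) -> dict:
    """Fields for one indexed page, including the extracted hostname."""
    return {"url": url, "hostname": hostname_of(url), "body": body}


def site_restriction(hostname: str) -> str:
    """Field query limiting results to a single host."""
    return f"hostname:{hostname}"


fields = page_fields("http://www.sematext.com/spm/index.html", "Performance monitoring for Solr")
restriction = site_restriction(fields["hostname"])
```

This only works if the hostname was stored in its own indexed field when the pages were added; a site-style restriction cannot be derived at query time from an analyzed URL or body field alone.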

In-web search

2011-12-20 Thread remi tassing
Hi, What is the query syntax for Solr to search within a specific site? For example, in Google you can search like this: "Solr site:apache.org" Remi

Re: Problems while searching in default field

2011-12-20 Thread Ahmet Arslan
> This is a FAQ and nothing to do with default field. > > http://wiki.apache.org/lucene-java/LuceneFAQ#Are_Wildcard.2C_Prefix.2C_and_Fuzzy_queries_case_sensitive.3F > Also, you might be interested in this new feature: https://issues.apache.org/jira/browse/SOLR-2438

Re: full-data import suddenly stopped working. Total Rows Fetched remains 0

2011-12-20 Thread Chantal Ackermann
Never would have thought that MS could help me earn such honours... ;D On Tue, 2011-12-20 at 12:57 +0100, PeterKerk wrote: > Chantal...you are the queen! :p > That was it, I downgraded to 6.27 and now it works again...thank god! > > > -- > View this message in context: > http://lucene.472066.n3

Re: Problems while searching in default field

2011-12-20 Thread Ahmet Arslan
> I noticed a scenario where in I am giving the search string > as Country* > without mentioning any field name, but no documents are > fetched but if i > give the query string as country*, then the docs are > getting fetched. This is a FAQ and has nothing to do with the default field. http://wiki.apache
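One client-side workaround for the case sensitivity of wildcard terms is to lowercase them before sending the query. The sketch below assumes the searched field is lowercased at index time, and it uses a deliberately naive whitespace tokenizer that does not handle quoted phrases or range queries; the function name is made up.

```python
def lowercase_wildcard_terms(query: str) -> str:
    """Lowercase only the terms that contain * or ?, leaving field names and operators alone."""
    tokens = []
    for token in query.split():
        if "*" in token or "?" in token:
            # Keep an optional "field:" prefix untouched; lowercase the term itself.
            field, separator, term = token.rpartition(":")
            token = field + separator + term.lower()
        tokens.append(token)
    return " ".join(tokens)


fixed = lowercase_wildcard_terms("Country*")
fixed_with_field = lowercase_wildcard_terms("Country* AND Name:Fra?ce")
```

Operators such as AND are left as they are because they contain no wildcard characters, so only the wildcard terms change.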

Problems while searching in default field

2011-12-20 Thread mechravi25
Hi, I noticed a scenario in which I give the search string Country* without mentioning any field name and no documents are fetched, but if I give the query string country*, the docs are fetched. If no field name is given in the search, then the search happens in the default on

Re: multiple temporary indexes

2011-12-20 Thread graham
Could be, but I had been assuming I would need to delete each result set at the end of the users' sessions to stop the index growing and indexing time growing with it, which would probably rule out complete statistics. Graham On 12/20/11 11:58, Chantal Ackermann wrote: > > You could also create

Re: multiple temporary indexes

2011-12-20 Thread Chantal Ackermann
You could also create a single index and use a field "user" to filter results for only a single user. This would also allow for statistics over the complete base. Chantal On Tue, 2011-12-20 at 12:43 +0100, graham wrote: > Hi, > > I'm a complete newbie and currently at the stage of wondering w
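The single-index suggestion can be sketched as request parameters: every search carries a filter on the `user` field, so each user only sees their own result set while all documents live in one index. The field name `user` comes from the message; the helper name, the quoting scheme, and the example values are assumptions.

```python
from urllib.parse import parse_qs, urlencode


def user_scoped_params(user_id: str, query: str, sort: str = "") -> str:
    """Build a query string that restricts a search to one user's documents."""
    # Escape backslashes and double quotes so the id can sit inside a quoted term.
    safe_id = user_id.replace("\\", "\\\\").replace('"', '\\"')
    params = [("q", query), ("fq", f'user:"{safe_id}"')]
    if sort:
        params.append(("sort", sort))
    return urlencode(params)


query_string = user_scoped_params("alice", "solr", sort="date desc")
decoded = parse_qs(query_string)
```

Leaving the filter out gives a search over the complete base, which is what makes the statistics across all users possible.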

Re: full-data import suddenly stopped working. Total Rows Fetched remains 0

2011-12-20 Thread PeterKerk
Chantal...you are the queen! :p That was it, I downgraded to 6.27 and now it works again...thank god! -- View this message in context: http://lucene.472066.n3.nabble.com/full-data-import-suddenly-stopped-working-Total-Rows-Fetched-remains-0-tp3599004p3601013.html Sent from the Solr - User mailin

multiple temporary indexes

2011-12-20 Thread graham
Hi, I'm a complete newbie and currently at the stage of wondering whether Solr might be suitable for what I want. I need to take search results collected by another system in response to user requests and allow each user to view their set of results in different ways: sorting into different order

Re: Exception using SolrJ

2011-12-20 Thread Chantal Ackermann
Hi Shawn, the exception indicates that the connection was lost. I'm sure you figured that out for yourself. Questions: - is that specific server instance really running? That is, can you reach it via browser? - If yes: how is your connection pool configured and how do you initialize it? More spec

Re: full-data import suddenly stopped working. Total Rows Fetched remains 0

2011-12-20 Thread Chantal Ackermann
DIH does not simply fail. Without more information, it's hard to do more than guess. As you're using MS SQL Server, maybe you ran into this? http://blogs.msdn.com/b/jdbcteam/archive/2011/11/07/supported-java-versions-november-2011.aspx That would be a problem caused by certain Java versions. Have you turned the

Re: DeleteByQuery and date filter

2011-12-20 Thread darul
One possible correct syntax may be: (context:BACKOFFICE)AND(type:IDEA)AND(creationDate:[2011-12-01T00:00:00Z TO 2011-12-20T00:00:00Z]) ... -- View this message in context: http://lucene.472066.n3.nabble.com/DeleteByQuery-and-date-filter-tp3600739p3600769.html Sent from the Solr - User maili
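A sketch of how a query of this shape could be assembled in code: format both dates in UTC without fractional seconds, join the three clauses with AND, and wrap the result in a delete-by-query XML message. The helper names are made up, and the field names and values are taken from the example in the post.

```python
from datetime import datetime, timezone
from xml.sax.saxutils import escape


def solr_date(moment: datetime) -> str:
    """Format a timezone-aware datetime as a UTC date without fractional seconds."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def delete_by_query_xml(context: str, doc_type: str, start: datetime, end: datetime) -> str:
    """Build a delete message whose query combines two field clauses and a date range."""
    query = (
        f"(context:{context}) AND (type:{doc_type}) AND "
        f"(creationDate:[{solr_date(start)} TO {solr_date(end)}])"
    )
    return f"<delete><query>{escape(query)}</query></delete>"


message = delete_by_query_xml(
    "BACKOFFICE",
    "IDEA",
    datetime(2011, 12, 1, tzinfo=timezone.utc),
    datetime(2011, 12, 20, tzinfo=timezone.utc),
)
```

Escaping the query text before embedding it in XML matters as soon as a clause contains characters such as `&` or `<`.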

Re: DeleteByQuery and date filter

2011-12-20 Thread darul
I was thinking of this kind of reason but no. See my next reply. Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/DeleteByQuery-and-date-filter-tp3600739p3600757.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: DeleteByQuery and date filter

2011-12-20 Thread darul
It looks like the milliseconds in the formatter are unnecessary and make the query fail. Replace (context:BACKOFFICE)(type:IDEA)AND(creationDate:[2011-12-01T00:00:00*.000*Z TO 2011-12-25T00:00:00*.000*Z]) with (context:BACKOFFICE)(type:IDEA)AND(creationDate:[2011-12-01T00:00:00Z TO 2011-12-25T00:00:00Z]) Solved

Re: DeleteByQuery and date filter

2011-12-20 Thread yunfei wu
Is it caused by a space, which should be encoded as %20? Yunfei On Tue, Dec 20, 2011 at 1:05 AM, darul wrote: > Hello, > > I have the following issue when using deleteByQuery, it works fine with > simple filters: > > > > and fail when using date filter > > > > Can you help me ? I have tried a lot

DeleteByQuery and date filter

2011-12-20 Thread darul
Hello, I have the following issue when using deleteByQuery: it works fine with simple filters: and fails when using a date filter Can you help me? I have tried a lot of syntaxes but have not found the right one yet...boring Thanks, Julien -- View this message in context: http://lucene.472066.n3.

Re: Poor performance on distributed search

2011-12-20 Thread ku3ia
tomas.zerolo wrote > > But then the results would be wrong? Suppose the documents are not evenly > distributed (wrt the sort criterium) across all the shards. In an extreme > case, just imagine all 2000 top-most documents are on shard 3. You would > get > the 500 top-most (from shard 3) and some
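The objection quoted here can be checked with a small simulation. Each shard is modeled as a list of scores sorted in descending order, and the merge keeps the best N overall. Asking every shard for the full N gives the right answer, while asking each shard for only N divided by the number of shards does not when the best documents sit on one shard. The data and the function name are made up for the example.

```python
import heapq
from itertools import islice


def merge_top_n(shard_results, n, per_shard_limit):
    """Merge descending score lists, taking at most `per_shard_limit` from each shard."""
    truncated = [scores[:per_shard_limit] for scores in shard_results]
    merged = heapq.merge(*truncated, reverse=True)
    return list(islice(merged, n))


# Four shards; all of the four best scores live on the third shard.
shards = [
    [50, 40, 30, 20],
    [45, 35, 25, 15],
    [100, 99, 98, 97],
    [10, 9, 8, 7],
]

correct = merge_top_n(shards, n=4, per_shard_limit=4)   # each shard returns its top 4
shortcut = merge_top_n(shards, n=4, per_shard_limit=1)  # each shard returns only 4 / 4 = 1
```

With the full per-shard limit the result is the true top four, all from the third shard; with the shortcut, three of the four results come from shards that hold none of the best documents.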

Re: Poor performance on distributed search

2011-12-20 Thread Tomas Zerolo
On Mon, Dec 19, 2011 at 01:32:22PM -0800, ku3ia wrote: > >>Uhm, either I misunderstand your question or you're doing > >>a lot of extra work for nothing > > >>The whole point of sharding is exactly to collect the top N docs > >>from each shard and merge them into a single result [...] > >>