Boosting documents with terms derived from clustering - good idea?

2013-05-14 Thread David Parks
We have a number of queries that produce good results based on the textual data, but are contextually wrong (for example, an "SSD hard drive" search matches the music album "SSD hip hop drives us crazy"). Textually it's a fair match, but SSD is a term that strongly relates to technical documents.

RE: Is the CoreAdmin RENAME method atomic?

2013-05-09 Thread David Parks
Find the discussion titled "Indexing off the production servers" from just a week ago in this same forum; there is a significant discussion of this feature that you will probably want to review. -----Original Message----- From: Lan [mailto:dung@gmail.com] Sent: Friday, May 10, 2013 3:42 AM To: so

RE: More Like This and Caching

2013-05-09 Thread David Parks
I'm not the expert here, but perhaps what you're noticing is actually the OS's disk cache. The actual solr index isn't cached by solr, but as you read the blocks off disk the OS disk cache probably did cache those blocks for you. On the 2nd run the index blocks were read out of memory. There was a

RE: Solr Cloud with large synonyms.txt

2013-05-08 Thread David Parks
I can see your point, though I think edge cases would be one concern: if someone *can* create a very large synonyms file, someone *will* create that file. What would you set the ZooKeeper max data size to be? 50MB? 100MB? Someone is going to do something bad if there's nothing to tell them not to

RE: Solr Cloud with large synonyms.txt

2013-05-06 Thread David Parks
Wouldn't it make more sense to only store a pointer to a synonyms file in ZooKeeper? Maybe just make the synonyms file accessible via HTTP so other boxes can copy it if needed? ZooKeeper was never meant for storing significant amounts of data. -----Original Message----- From: Jan Høydahl [mailto:
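The "store a pointer instead of the file" idea could be expressed as a simple size check. This is only a sketch, assuming ZooKeeper's default per-znode limit (`jute.maxbuffer`, default 0xfffff bytes, just under 1 MB); the helper `synonyms_config_entry` and the URL are hypothetical and are not part of Solr or ZooKeeper.

```python
# Assumed ZooKeeper default znode limit: jute.maxbuffer = 0xfffff bytes (about 1 MB).
ZK_DEFAULT_MAX_BYTES = 0xFFFFF


def synonyms_config_entry(size_bytes: int, http_url: str,
                          limit: int = ZK_DEFAULT_MAX_BYTES) -> dict:
    """Hypothetical helper: keep small synonyms files in ZooKeeper as ordinary
    config data, but record only a pointer (an HTTP URL) for large ones."""
    if size_bytes < limit:
        # Small enough to upload to ZooKeeper alongside the rest of the config.
        return {"mode": "inline", "bytes": size_bytes}
    # Too large for a single znode: store only where each node can fetch it from.
    return {"mode": "pointer", "url": http_url, "bytes": size_bytes}


print(synonyms_config_entry(40 * 1024 * 1024, "http://config-host/synonyms.txt"))
```

In practice you would pass something like `os.path.getsize("synonyms.txt")` as the size; each Solr node would then download the file from the URL instead of reading it from ZooKeeper.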

RE: Indexing off of the production servers

2013-05-06 Thread David Parks
So, am I following this correctly: this proposed solution would give us a way to index a collection on an offline/dev SolrCloud instance and *move* that pre-prepared index to the production server using an alias/rename trick? That seems like a reasonably doable solution. I also
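The alias half of that trick might look roughly like the sketch below. It assumes Solr 4.2 or later, where the Collections API supports `action=CREATEALIAS` with `name` and `collections` parameters, and (as we understand it) re-issuing CREATEALIAS for an existing alias repoints it. The host, the alias `products` and the dated collection name are hypothetical, and getting the offline-built index onto the production nodes is a separate step not shown here.

```python
from urllib.parse import urlencode


def create_alias_url(solr_base: str, alias: str, target_collection: str) -> str:
    """Build a Collections API CREATEALIAS request that points `alias` at
    `target_collection`. Issuing it again with a new target moves the alias."""
    params = {"action": "CREATEALIAS", "name": alias, "collections": target_collection}
    return f"{solr_base.rstrip('/')}/admin/collections?{urlencode(params)}"


# Hypothetical daily cycle: queries always go to the alias "products", while each
# day's batch is built into a new dated collection and the alias is then switched.
url = create_alias_url("http://prod-solr:8983/solr", "products", "products_20130506")
print(url)
```

Because clients only ever query the alias, the switch from yesterday's collection to today's is a single request rather than a file copy under live traffic.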

RE: Indexing off of the production servers

2013-05-06 Thread David Parks
of them and every shard has 2 replicas. When you send a query into a SolrCloud, every replica will help you with searching, and if you add more replicas to your SolrCloud your search performance will improve. 2013/5/6 David Parks

Indexing off of the production servers

2013-05-06 Thread David Parks
I've had trouble figuring out what options exist if I want to perform all indexing off of the production servers (I'd like to keep them only for user queries). We index data in batches roughly daily; ideally I'd index all SolrCloud shards offline, then move the final index files to the solr cl

RE: Bug? JSON output changes when switching to solr cloud

2013-04-22 Thread David Parks
Subject: Re: Bug? JSON output changes when switching to solr cloud Thanks David, I've confirmed this is still a problem in trunk and opened https://issues.apache.org/jira/browse/SOLR-4746 -Yonik http://lucidworks.com On Sun, Apr 21, 2013 at 11:16 PM, David Parks wrote: > We just

Bug? JSON output changes when switching to solr cloud

2013-04-21 Thread David Parks
We just took an installation of 4.1 which was working fine and changed it to run as solr cloud. We encountered the most incredibly bizarre apparent bug: In the JSON output, a colon ':' changed to a comma ',', which of course broke the JSON parser. I'm guessing I should file this as a bug, but it

RE: SolrCloud loadbalancing, replication, and failover

2013-04-19 Thread David Parks
Friday, April 19, 2013 9:42 PM To: solr-user@lucene.apache.org Subject: Re: SolrCloud loadbalancing, replication, and failover On 4/19/2013 3:48 AM, David Parks wrote: > The Physical Memory is 90% utilized (21.18GB of 23.54GB). Solr has > dark grey allocation of 602MB, and light grey of an

RE: SolrCloud loadbalancing, replication, and failover

2013-04-19 Thread David Parks
Wow, thank you for those benchmarks, Toke; that really gives me some firm footing to stand on in knowing what to expect and thinking out which path to venture down. It's tremendously appreciated! Dave -----Original Message----- From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk] Sent: Frida

RE: SolrCloud loadbalancing, replication, and failover

2013-04-19 Thread David Parks
day, April 19, 2013 4:19 PM To: solr-user@lucene.apache.org Subject: Re: SolrCloud loadbalancing, replication, and failover On 4/19/2013 2:15 AM, David Parks wrote: > Interesting. I'm trying to correlate this new understanding to what I > see on my servers. I've got one server

RE: SolrCloud loadbalancing, replication, and failover

2013-04-19 Thread David Parks
uery over every single GB of data. If you only actually query over, say, 500MB of the 120GB of data in your dev environment, you would only use 500MB worth of RAM for caching, not 120GB. On Fri, Apr 19, 2013 at 7:55 AM, David Parks wrote: > Wow! That was the most pointed, concise discussion of h

RE: SolrCloud loadbalancing, replication, and failover

2013-04-18 Thread David Parks
---Original Message- From: Shawn Heisey [mailto:s...@elyograg.org] Sent: Friday, April 19, 2013 11:51 AM To: solr-user@lucene.apache.org Subject: Re: SolrCloud loadbalancing, replication, and failover On 4/18/2013 8:12 PM, David Parks wrote: > I think I still don't understand something h

RE: SolrCloud loadbalancing, replication, and failover

2013-04-18 Thread David Parks
disk performance, and CPU regardless of how you lay out the cluster; otherwise performance will suffer. My guess is that if each Solr had sufficient resources, you wouldn't actually notice much difference in query performance. Tim On Thu, Apr 18, 2013 at 8:03 AM, David Parks wrote: > But my con

RE: SolrCloud loadbalancing, replication, and failover

2013-04-18 Thread David Parks
On Apr 18, 2013 3:11 AM, "David Parks" wrote: > Step 1: distribute processing > > We have 2 servers on which we'll run 2 SolrCloud instances. > > We'll define 2 shards so that both servers are busy for each request > (improving response time of the req

SolrCloud loadbalancing, replication, and failover

2013-04-18 Thread David Parks
Step 1: distribute processing. We have 2 servers on which we'll run 2 SolrCloud instances. We'll define 2 shards so that both servers are busy for each request (improving the response time of the request). Step 2: failover. We would now like to ensure that if either of the servers goes down (we
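The two steps above (2 shards spread over 2 servers, then surviving the loss of either server) can be checked with a toy model. This is a simplified round-robin placement written for illustration, not SolrCloud's actual assignment logic; the server names are hypothetical. In SolrCloud terms the layout roughly corresponds to a collection with `numShards=2` and `replicationFactor=2`, which on two nodes also needs `maxShardsPerNode=2`.

```python
from itertools import combinations


def place_replicas(servers, num_shards, replication_factor):
    """Toy round-robin placement: copy r of shard s goes to server (s + r) % len(servers).
    A simplified model for reasoning about failover, not SolrCloud's real placement."""
    layout = {srv: [] for srv in servers}
    for shard in range(num_shards):
        for r in range(replication_factor):
            layout[servers[(shard + r) % len(servers)]].append(f"shard{shard + 1}")
    return layout


def survives_failures(layout, num_shards, failures=1):
    """True if every shard still has at least one copy after ANY `failures` servers die."""
    all_shards = {f"shard{i + 1}" for i in range(num_shards)}
    for dead in combinations(layout, failures):
        alive = {s for srv, shards in layout.items() if srv not in dead for s in shards}
        if alive != all_shards:
            return False
    return True


layout = place_replicas(["server1", "server2"], num_shards=2, replication_factor=2)
print(layout)
print(survives_failures(layout, num_shards=2))
```

With `replication_factor=1` each server holds only one shard, so losing either server loses half the index; with 2 replicas each server holds a copy of both shards, which is what makes single-server failover possible.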

RE: MoreLikeThis - Odd results - what am I doing wrong?

2013-04-02 Thread David Parks
Isn't this an AWS security groups question? You should probably post this question on the AWS forums, but for the moment, here's the basic reading material - go set up your EC2 security groups and lock down your systems. http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-network-s

RE: Slow queries for common terms

2013-03-25 Thread David Parks
ng as it doesn't have an fq clause. Best Erick On Sat, Mar 23, 2013 at 3:10 AM, David Parks wrote: > I see the CPU working very hard, and at the same time I see 2 MB/sec > disk access for that 15 seconds. I am not running it this instant, but > it seems to me that there

RE: Slow queries for common terms

2013-03-23 Thread David Parks
I see the CPU working very hard, and at the same time I see 2 MB/sec of disk access for those 15 seconds. I am not running it at this instant, but it seems to me that there were more CPU cycles available, so unless it's an issue of not being able to multithread it any further, I'd say it's more IO-related.

RE: Slow queries for common terms

2013-03-21 Thread David Parks
ll have acceptable indexing/query performance. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 21 March 2013 at 12:43, David Parks wrote: > We have 300M documents, each about a paragraph of text on average. The > index is 140GB i

Slow queries for common terms

2013-03-21 Thread David Parks
I've got a query that takes 15 seconds to return whenever I have the term "book" in a query that isn't cached. That's a pretty common term in our search index. We're indexing about 120 GB of text data. We only store terms and IDs, no document data, and the disk is virtually unused, it's all CPU

RE: Slow queries for common terms

2013-03-21 Thread David Parks
how much RAM, whether you utilize disk caching well enough, and many other things which could affect this situation. But the mere fact that only a few common search words trigger such a delay would suggest CommonGrams as a possible way forward. -- Jan Høydahl, search solution architect Cominvent AS
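The CommonGrams suggestion can be illustrated with a simplified model of what the filter indexes. In Solr this is done by `solr.CommonGramsFilterFactory` at index time (and usually `solr.CommonGramsQueryFilterFactory` at query time); the Python function below only mimics the idea, and its exact token order and the word list are assumptions for illustration. The point is that a phrase such as "the book" can then be matched through one rarer combined token instead of walking two very long posting lists, so this mainly pays off for multi-word and phrase queries that contain the common term.

```python
def common_grams(tokens, common_words):
    """Simplified model of CommonGrams: emit every unigram, plus a joined bigram
    wherever either of two adjacent tokens is in the common-word list."""
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            if tok in common_words or nxt in common_words:
                # The bigram is far rarer than either common word on its own.
                out.append(f"{tok}_{nxt}")
    return out


print(common_grams(["the", "book", "of", "life"], {"the", "of"}))
```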

RE: Is Solr more CPU bound or IO bound?

2013-03-17 Thread David Parks
ibing your use case in more detail with the above questions so we'd be able to give you guidelines. Best, Manu On Mon, Mar 18, 2013 at 3:55 AM, David Parks wrote: > I'm spec'ing out some hardware for a first go at our production Solr > instance, but I haven't spent

Is Solr more CPU bound or IO bound?

2013-03-17 Thread David Parks
I'm spec'ing out some hardware for a first go at our production Solr instance, but I haven't spent enough time load testing it yet. What I want to ask is how IO-intensive Solr typically is vs. CPU-intensive. Specifically, I'm considering whether to dual-purpose the Solr servers to run Solr a

RE: After upgrade to solr4, search doesn't work

2013-03-07 Thread David Parks
ry much for all your help on this; it certainly helped me get my configuration straight, and the upgrade to 4 is now complete. All the best, David -----Original Message----- From: Jack Krupansky [mailto:j...@basetechnology.com] Sent: Wednesday, March 06, 2013 7:56 PM To: solr-user@lucene.apache.org;

Re: After upgrade to solr4, search doesn't work

2013-03-05 Thread David Parks
ied a comma separated list of my fields here but that was invalid. dvddvdid:dvdid:dvd From: David Parks To: "solr-user@lucene.apache.org" Sent: Wednesday, March 6, 2013 1:52 PM Subject: Re: After upgrade to solr4, search doesn't work Good th

Re: After upgrade to solr4, search doesn't work

2013-03-05 Thread David Parks
Oops, I didn't include the full XML there, hopefully this formats ok. From: David Parks To: "solr-user@lucene.apache.org" Sent: Wednesday, March 6, 2013 1:58 PM Subject: Re: After upgrade to solr4, search doesn't work All but the uni

Re: After upgrade to solr4, search doesn't work

2013-03-05 Thread David Parks
6, 2013 at 11:56 AM, David Parks wrote: > I just upgraded from solr3 to solr4, and I wiped the previous work and > reloaded 500,000 documents. > > I see in solr that I loaded the documents, and from the console, if I do a > query "*:*" I see documents returned. > > I

Re: After upgrade to solr4, search doesn't work

2013-03-05 Thread David Parks
"df" parameter in the /select request handler in solrconfig.xml to be your default query field name if it is not "text". -- Jack Krupansky -----Original Message----- From: David Parks Sent: Wednesday, March 06, 2013 1:26 AM To: solr-user@lucene.apache.org Subject: After upgra
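Besides setting `df` in the /select handler defaults as suggested, the same parameter can be passed per request, which is a quick way to confirm that the default field is the problem. A minimal sketch: the core URL and the field name `title` are hypothetical. A `*:*` query still works without a default field because it is a match-all query that doesn't name any field, which fits the symptom described in this thread.

```python
from urllib.parse import urlencode


def select_url(solr_core_url: str, user_query: str, default_field: str) -> str:
    """Build a /select request that names the default query field explicitly via `df`,
    so a bare term like 'dvd' is searched in `default_field`."""
    params = {"q": user_query, "df": default_field, "wt": "json"}
    return f"{solr_core_url.rstrip('/')}/select?{urlencode(params)}"


print(select_url("http://localhost:8983/solr/collection1", "dvd", "title"))
# -> http://localhost:8983/solr/collection1/select?q=dvd&df=title&wt=json
```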

After upgrade to solr4, search doesn't work

2013-03-05 Thread David Parks
I just upgraded from solr3 to solr4, and I wiped the previous work and reloaded 500,000 documents. I see in solr that I loaded the documents, and from the console, if I do a query "*:*" I see documents returned. I copied a single word from the text of the query results I got from "*:*" but any qu

RE: Field Collapsing - Anything in the works for multi-valued fields?

2013-01-18 Thread David Parks
ti-valued fields, would a parent-child setup work for you here? See http://search-lucene.com/?q=solr+join&fc_type=wiki Otis -- Solr & ElasticSearch Support http://sematext.com/ On Thu, Jan 17, 2013 at 8:04 PM, David Parks wrote: > The documents are individual products which come from 1 or

RE: Field Collapsing - Anything in the works for multi-valued fields?

2013-01-17 Thread David Parks
18, 2013 2:32 AM To: solr-user Subject: Re: Field Collapsing - Anything in the works for multi-valued fields? David, what are the documents and the field? That can help in suggesting a workaround. On Thu, Jan 17, 2013 at 5:51 PM, David Parks wrote: > I want to configure Field Collapsing, but m

Field Collapsing - Anything in the works for multi-valued fields?

2013-01-17 Thread David Parks
I want to configure Field Collapsing, but my target field is multi-valued (e.g. the field I want to group on has a variable # of entries per document, 1-N entries). I read on the wiki (http://wiki.apache.org/solr/FieldCollapsing) that grouping doesn't support multi-valued fields yet. Anything in
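Part of why grouping on a multi-valued field is tricky is that a document with N values belongs to N groups. Until Solr supports it, one stop-gap is to group client-side over the page of results you already fetched. This is a sketch of that workaround, not Solr's grouping: it only sees the returned rows, so it can't give per-group counts over the full result set. The field name `merchant` and the sample documents are hypothetical.

```python
from collections import defaultdict


def group_by_multivalued(docs, field):
    """Client-side grouping on a multi-valued field: a document with N values
    appears in N groups. Returns {value: [doc ids]} in first-seen order."""
    groups = defaultdict(list)
    for doc in docs:
        for value in doc.get(field, []):
            groups[value].append(doc["id"])
    return dict(groups)


docs = [
    {"id": "p1", "merchant": ["storeA", "storeB"]},  # hypothetical product in two groups
    {"id": "p2", "merchant": ["storeB"]},
    {"id": "p3", "merchant": ["storeC"]},
]
print(group_by_multivalued(docs, "merchant"))
```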

RE: Search strategy - improving search quality for short search terms such as "doll"

2013-01-16 Thread David Parks
result set. What I understood is that you are talking about the context of the query. For example, if you search "books on MK Gandhi" and "books by MK Gandhi", the two queries have different contexts. Context-based search is, at some level, achieved by natural language processing. This on

RE: Search strategy - improving search quality for short search terms such as "doll"

2013-01-16 Thread David Parks
ex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Wed, Jan 16, 2013 at 4:40 AM, David Parks wr

Search strategy - improving search quality for short search terms such as "doll"

2013-01-16 Thread David Parks
I'm a beginner-intermediate Solr admin; I've set up the basics for our application and it runs well. Now it's time for me to dig in and start tuning and improving queries. My next target is searches on simple terms such as "doll" which, in Google, would return documents about, well, "toy do

RE: MoreLikeThis supporting multiple document IDs as input?

2013-01-04 Thread David Parks
are a bunch of parameters that you have to tune for either approach. -- Jack Krupansky -----Original Message----- From: David Parks Sent: Thursday, January 03, 2013 4:11 AM To: solr-user@lucene.apache.org Subject: RE: MoreLikeThis supporting multiple document IDs as input? I'm not seeing the

RE: MoreLikeThis supporting multiple document IDs as input?

2013-01-03 Thread David Parks
nce?! Or, maybe you are wondering WHY they are different? That latter question I don't have the answer to. -- Jack Krupansky -----Original Message----- From: David Parks Sent: Friday, December 28, 2012 2:48 AM To: solr-user@lucene.apache.org Subject: RE: MoreLikeThis supporting multiple

What do I need to research to solve the problem of returning good results for a generic term?

2012-12-28 Thread David Parks
I'm sure this is a complex problem requiring many iterations of work, so I'm just looking for pointers in the right direction of research here. I have a base term, say "black dress", that I might search for. Someone searching on this term is most logically looking for black dresses

RE: MoreLikeThis supporting multiple document IDs as input?

2012-12-27 Thread David Parks
each search request. If you open solrconfig.xml you will see how they are defined and used. HTH Otis Solr & ElasticSearch Support http://sematext.com/ On Dec 28, 2012 12:06 AM, "David Parks" wrote: > I'm somewhat new to Solr (it's running, I've been through the books,

RE: MoreLikeThis supporting multiple document IDs as input?

2012-12-27 Thread David Parks
the components as they are. You would have to manually merge the values from the base documents and then you could POST that text back to the MLT handler and find similar documents using the posted text rather than a query. Kind of messy, but in theory that should work. -- Jack Krupansky
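The "merge the base documents, then POST the text to the MLT handler" workaround might be sketched like this. The sketch assumes a `/mlt` request handler of class `solr.MoreLikeThisHandler` is registered in solrconfig.xml and that, when no `q` is given, the handler uses the posted body as its content stream (our reading of how the handler works). The base documents' field values would come from an earlier query; here they are passed in directly. The URL, field name and helper name are hypothetical, and the request is only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def mlt_request_from_docs(mlt_handler_url, docs, fields, rows=10):
    """Build (but do not send) a POST to a MoreLikeThis handler whose body is the
    merged text of several base documents' `fields`."""
    # Concatenate the chosen fields of every base document into one text blob.
    merged = "\n".join(str(doc[f]) for doc in docs for f in fields if f in doc)
    params = {"mlt.fl": ",".join(fields), "rows": rows, "wt": "json"}
    return Request(
        f"{mlt_handler_url}?{urlencode(params)}",
        data=merged.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


req = mlt_request_from_docs(
    "http://localhost:8983/solr/collection1/mlt",
    [{"id": "a", "body": "solid state drive"}, {"id": "b", "body": "ssd benchmarks"}],
    ["body"],
)
print(req.get_method(), req.full_url)
```

Sending it would be `urllib.request.urlopen(req)`. As noted in the thread, it's messy: very long merged texts may need the MLT tuning parameters adjusted to keep the generated query reasonable.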

RE: MoreLikeThis only returns 1 result

2012-12-27 Thread David Parks
23.102.164:8080/solr/mlt?q=... Or, use the MoreLikeThis search component: http://localhost:8983/solr/select?q=...&mlt=true&... See: http://wiki.apache.org/solr/MoreLikeThis -- Jack Krupansky -----Original Message----- From: David Parks Sent: Thursday, December 27, 201
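The two routes in that reply behave differently, which is a likely explanation for "only the document I sent comes back". With the search component (`/select` plus `mlt=true`), the main result list is just the document matched by `q`, and the similar documents are reported in a separate `moreLikeThis` section of the response. With a dedicated MoreLikeThis handler, the returned documents are the similar ones. A sketch of both requests, assuming a `/mlt` handler is registered in solrconfig.xml; the core URL and the field `text` are hypothetical, and the ID is the one from the original question.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr/collection1"  # hypothetical core URL


def mlt_handler_url(doc_id, fields):
    # Dedicated MoreLikeThis handler: the returned docs ARE the similar documents.
    params = {"q": f"id:{doc_id}", "mlt.fl": ",".join(fields), "wt": "json"}
    return f"{BASE}/mlt?{urlencode(params)}"


def mlt_component_url(doc_id, fields):
    # Search component on /select: the main results contain only the matched doc;
    # similar docs are listed under the response's separate "moreLikeThis" section.
    params = {"q": f"id:{doc_id}", "mlt": "true", "mlt.fl": ",".join(fields), "wt": "json"}
    return f"{BASE}/select?{urlencode(params)}"


print(mlt_handler_url("1004401713626", ["text"]))
print(mlt_component_url("1004401713626", ["text"]))
```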

MoreLikeThis only returns 1 result

2012-12-27 Thread David Parks
I'm doing a query like this for MoreLikeThis, sending it a document ID. But the only result I ever get back is the document ID I sent it. The debug response is below. If I read it correctly, it's taking "id:1004401713626" as the term (not the document ID) and only finding it once. But I want it to

RE: solr + jetty deployment issue

2012-12-27 Thread David Parks
Do you see any errors coming in on the console, stderr? I start solr this way and redirect the stdout and stderr to log files, when I have a problem stderr generally has the answer: java \ -server \ -Djetty.port=8080 \ -Dsolr.solr.home=/opt/solr \ -Dsolr.data.dir=/

RE: MoreLikeThis supporting multiple document IDs as input?

2012-12-26 Thread David Parks
ually merge the values from the base documents > and then you could POST that text back to the MLT handler and find > similar documents using the posted text rather than a query. Kind of > messy, but in theory that should work. > > -- Jack Krupansky > > -----Original

MoreLikeThis supporting multiple document IDs as input?

2012-12-25 Thread David Parks
I'm unclear on this point from the documentation. Is it possible to give Solr X # of document IDs and tell it that I want documents similar to those X documents? Example: - The user is browsing 5 different articles - I send Solr the IDs of these 5 articles so I can present the user other simi