We have a number of queries that produce good results based on the textual
data but are contextually wrong (for example, an "SSD hard drive" search
matches the music album "SSD hip hop drives us crazy").
Textually it's a fair match, but "SSD" is a term that strongly relates to
technical documents.
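One way to picture a fix is a toy client-side re-ranking sketch (NOT a Solr feature or API; the term-to-domain map, categories, and scores are all invented here) that boosts documents whose category matches the domain a query term strongly implies:

```python
# Hypothetical post-search re-ranking sketch: boost documents whose
# category matches the domain implied by a query term. Everything here
# (mapping, categories, boost factor) is an invented illustration.

TERM_DOMAINS = {"ssd": "technical"}  # assumed term -> domain mapping

def rerank(query_terms, docs):
    """docs: list of (doc_id, text_score, category); returns ids, best first."""
    implied = {TERM_DOMAINS[t] for t in query_terms if t in TERM_DOMAINS}
    def score(doc):
        doc_id, text_score, category = doc
        # Double the textual score when the doc's category matches an
        # implied domain, so context breaks textual near-ties.
        return text_score * (2.0 if category in implied else 1.0)
    return [d[0] for d in sorted(docs, key=score, reverse=True)]
```

With the album scoring 1.0 and a disk-drive document 0.9 on pure text, the domain boost puts the technical document first for an "ssd" query. Inside Solr itself this would more likely be done with a boost query or query elevation rather than client-side code.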
Find the discussion titled "Indexing off the production servers" from just a
week ago on this same list; it contains a significant discussion of this
feature that you will probably want to review.
-Original Message-
From: Lan [mailto:dung@gmail.com]
Sent: Friday, May 10, 2013 3:42 AM
To: so
I'm not the expert here, but perhaps what you're noticing is actually the
OS's disk cache. The Solr index isn't cached by Solr itself, but as you read
the blocks off disk, the OS disk cache probably cached those blocks for
you. On the second run the index blocks were read out of memory.
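A toy model of that effect (pure simulation, no real I/O; block names and counts are invented): once a block has been read, rereads are served from memory and never touch the disk.

```python
# Toy model of why the second query run is fast: the OS keeps recently
# read index blocks in its page cache, so rereads never hit the disk.
# (Simulation only -- block names are made up.)

class PageCache:
    def __init__(self):
        self.cache = {}
        self.disk_reads = 0

    def read_block(self, block_id):
        if block_id not in self.cache:
            self.disk_reads += 1          # cold read: must go to disk
            self.cache[block_id] = f"data-{block_id}"
        return self.cache[block_id]       # warm read: served from memory

cache = PageCache()
for _ in range(2):                        # two identical query runs
    for block in ("seg1", "seg2", "seg3"):
        cache.read_block(block)
# after both runs, only the first run's 3 reads touched the disk
```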
There was a
I can see your point, though I think edge cases would be one concern: if
someone *can* create a very large synonyms file, someone *will* create that
file. What would you set the ZooKeeper max data size to be? 50MB? 100MB?
Someone is going to do something bad if there's nothing to tell them not to.
Wouldn't it make more sense to store only a pointer to a synonyms file in
ZooKeeper? Maybe just make the synonyms file accessible via HTTP so other
boxes can copy it if needed? ZooKeeper was never meant for storing
significant amounts of data.
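A sketch of the pointer pattern being suggested (a plain dict stands in for a real ZooKeeper client here; the znode path and URL are made up for illustration):

```python
# "Pointer, not payload": keep only a small URL in ZooKeeper and let each
# Solr node fetch the large synonyms file over HTTP itself.
# A dict stands in for a real ZooKeeper client; names are invented.

zk = {}  # stand-in for a ZooKeeper client

def publish_synonyms(zk_store, znode, url):
    # Store only the pointer (a few bytes) instead of a multi-MB file.
    zk_store[znode] = url.encode("utf-8")

def synonyms_location(zk_store, znode):
    # Each node reads the pointer, then downloads the file over HTTP
    # (e.g. with urllib) instead of pulling megabytes through ZooKeeper.
    return zk_store[znode].decode("utf-8")

publish_synonyms(zk, "/configs/core1/synonyms.url",
                 "http://config-host/synonyms.txt")
```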
-Original Message-
From: Jan Høydahl [mailto:
So, am I following this correctly in saying that this proposed solution
would give us a way to index a collection on an offline/dev SolrCloud
instance and *move* that pre-prepared index to the production server using
an alias/rename trick?
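If I'm reading the thread right, the swap itself would be a single Collections API CREATEALIAS call; a minimal sketch of building that request (host, alias, and collection names are assumptions):

```python
# Sketch of the alias/rename trick: build the index offline into a new
# collection, then atomically repoint a stable alias at it with the
# Collections API CREATEALIAS action. Names and host are made up.
from urllib.parse import urlencode

def createalias_url(base, alias, collection):
    params = urlencode({"action": "CREATEALIAS",
                        "name": alias,
                        "collections": collection})
    return f"{base}/admin/collections?{params}"

# Queries always hit the alias "products"; after the offline build
# finishes, swap it to the freshly built collection in one call.
url = createalias_url("http://prod-solr:8983/solr", "products", "products_v2")
```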
That seems like a reasonably doable solution. I also
> > of them and every shard has 2 replicas. When you send a query into a
> > SolrCloud, every replica will help you with searching, and if you add
> > more replicas to your SolrCloud, your search performance will improve.
> >
> >
> > 2013/5/6 David Parks
>
I've had trouble figuring out what options exist if I want to perform all
indexing off of the production servers (I'd like to keep them only for user
queries).
We index data in batches roughly daily; ideally I'd index all SolrCloud
shards offline, then move the final index files to the solr cl
Subject: Re: Bug? JSON output changes when switching to solr cloud
Thanks David,
I've confirmed this is still a problem in trunk and opened
https://issues.apache.org/jira/browse/SOLR-4746
-Yonik
http://lucidworks.com
On Sun, Apr 21, 2013 at 11:16 PM, David Parks
wrote:
> We just
We just took an installation of 4.1 that was working fine and changed it to
run as SolrCloud. We encountered the most incredibly bizarre apparent bug:
in the JSON output, a colon ':' changed to a comma ',', which of course
broke the JSON parser. I'm guessing I should file this as a bug, but it
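For what it's worth, the corruption described is easy to reproduce in miniature (field names here are invented): a single ':' swapped for ',' makes the whole response unparseable.

```python
# Why the reported colon->comma swap "of course broke the JSON parser":
# the result is no longer valid JSON at all. Field names are invented.
import json

good = '{"id": "1234", "score": 1.5}'
bad = good.replace('"id":', '"id",', 1)   # the kind of corruption described

parsed_good = json.loads(good)            # parses fine
try:
    json.loads(bad)
    bad_parsed = True
except json.JSONDecodeError:
    bad_parsed = False                    # parser rejects the response
```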
Friday, April 19, 2013 9:42 PM
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud loadbalancing, replication, and failover
On 4/19/2013 3:48 AM, David Parks wrote:
> The Physical Memory is 90% utilized (21.18GB of 23.54GB). Solr has
> dark grey allocation of 602MB, and light grey of an
Wow, thank you for those benchmarks Toke, that really gives me some firm
footing to stand on in knowing what to expect and thinking out which path to
venture down. It's tremendously appreciated!
Dave
-Original Message-
From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk]
Sent: Friday, April 19, 2013 4:19 PM
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud loadbalancing, replication, and failover
On 4/19/2013 2:15 AM, David Parks wrote:
> Interesting. I'm trying to correlate this new understanding to what I
> see on my servers. I've got one server
uery
over every single GB of data.
If you only actually query over, say, 500MB of the 120GB of data in your dev
environment, you would only use 500MB worth of RAM for caching, not 120GB.
On Fri, Apr 19, 2013 at 7:55 AM, David Parks wrote:
> Wow! That was the most pointed, concise discussion of h
---Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org]
Sent: Friday, April 19, 2013 11:51 AM
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud loadbalancing, replication, and failover
On 4/18/2013 8:12 PM, David Parks wrote:
> I think I still don't understand something h
disk
performance, and CPU, regardless of how you lay out the cluster; otherwise
performance will suffer. My guess is that if each Solr had sufficient
resources, you wouldn't actually notice much difference in query performance.
Tim
On Thu, Apr 18, 2013 at 8:03 AM, David Parks wrote:
> But my con
On Apr 18, 2013 3:11 AM, "David Parks" wrote:
> Step 1: distribute processing
>
> We have 2 servers on which we'll run 2 SolrCloud instances.
>
> We'll define 2 shards so that both servers are busy for each request
> (improving response time of the req
Step 1: distribute processing
We have 2 servers on which we'll run 2 SolrCloud instances.
We'll define 2 shards so that both servers are busy for each request
(improving response time of the request).
Step 2: Failover
We would now like to ensure that if either of the servers goes down (we
Isn't this an AWS security groups question? You should probably post it on
the AWS forums, but for the moment here's the basic reading material: go set
up your EC2 security groups and lock down your systems.
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-network-s
ng as it doesn't have an fq clause.
Best
Erick
On Sat, Mar 23, 2013 at 3:10 AM, David Parks wrote:
> I see the CPU working very hard, and at the same time I see 2 MB/sec
> disk access for that 15 seconds. I am not running it this instant, but
> it seems to me that there
I see the CPU working very hard, and at the same time I see 2 MB/sec disk
access for those 15 seconds. I'm not running it this instant, but it seems
to me that there were more CPU cycles available, so unless it's an issue of
not being able to multithread it any further, I'd say it's more IO-related.
ll have acceptable indexing/query
performance.
--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com
21. mars 2013 kl. 12:43 skrev David Parks :
> We have 300M documents, each about a paragraph of text on average. The
> index is 140GB i
I've got a query that takes 15 seconds to return whenever I have the term
"book" in a query that isn't cached. That's a pretty common term in our search
index. We're indexing about 120 GB of text data. We only store terms and IDs,
no document data, and the disk is virtually unused; it's all CPU time.
how much RAM, whether you utilize disk caching well enough, and many other
things which could affect this situation. But the pure fact that only a few
common search words trigger such a delay would suggest CommonGrams as a
possible way forward.
--
Jan Høydahl, search solution architect
Cominvent AS
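Roughly what CommonGrams buys you, as a toy illustration (this is not the Lucene implementation, and the real CommonGramsFilter also emits the original single-word tokens; the common-words list here is invented): very frequent words get fused with their neighbours into much rarer bigram tokens, so a query containing a common word like "book" matches far shorter postings lists.

```python
# Toy illustration of the CommonGrams idea: fuse common words with their
# neighbours into bigram tokens. The common-words list is invented, and
# the real Lucene filter also keeps the original unigrams.

COMMON = {"the", "a", "of", "book"}  # assumed common-words list

def common_grams(tokens):
    grams = []
    for left, right in zip(tokens, tokens[1:]):
        if left in COMMON or right in COMMON:
            # "the book" becomes the single, much rarer token "the_book"
            grams.append(f"{left}_{right}")
    return grams
```

So a phrase query for "the book store" can be answered from the postings of "the_book" and "book_store" instead of scanning the enormous postings list for "book" alone.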
I've got a query that takes 15 seconds to return whenever I have the term
"book" in a query that isn't cached. That's a pretty common term in our
search index. We're indexing about 120 GB of text data. We only store terms
and IDs, no document data, and the disk is virtually unused; it's all CPU
time.
describing your use case in more detail with the above questions so
we'd be able to give you guidelines.
Best,
Manu
On Mon, Mar 18, 2013 at 3:55 AM, David Parks wrote:
> I'm spec'ing out some hardware for a first go at our production Solr
> instance, but I haven't spent
I'm spec'ing out some hardware for a first go at our production Solr
instance, but I haven't spent enough time loadtesting it yet.
What I want to ask is how IO-intensive Solr is vs. CPU-intensive, typically.
Specifically, I'm considering whether to dual-purpose the Solr servers to run
Solr a
ry much for all your
help on this, it certainly helped me get my configuration straight and the
upgrade to 4 is now complete.
All the best,
David
-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Wednesday, March 06, 2013 7:56 PM
To: solr-user@lucene.apache.org;
ied a comma-separated list of
my fields here, but that was invalid.
dvddvdid:dvdid:dvd
From: David Parks
To: "solr-user@lucene.apache.org"
Sent: Wednesday, March 6, 2013 1:52 PM
Subject: Re: After upgrade to solr4, search doesn't work
Good th
Oops, I didn't include the full XML there, hopefully this formats ok.
From: David Parks
To: "solr-user@lucene.apache.org"
Sent: Wednesday, March 6, 2013 1:58 PM
Subject: Re: After upgrade to solr4, search doesn't work
All but the uni
6, 2013 at 11:56 AM, David Parks wrote:
> I just upgraded from solr3 to solr4, and I wiped the previous work and
> reloaded 500,000 documents.
>
> I see in solr that I loaded the documents, and from the console, if I do a
> query "*:*" I see documents returned.
>
> I
"df" parameter in the
/select request handler in solrconfig.xml to be your default query field name
if it is not "text".
-- Jack Krupansky
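The advice above amounts to making sure every query carries a default field; a sketch of passing df explicitly on the request instead (host, core, and field names are assumptions):

```python
# Sketch of the upgrade pitfall: without a default field, a bare query
# like "dress" has nothing to search against. Passing df explicitly (or
# setting it on /select in solrconfig.xml) restores the old behaviour.
# Host, core, and field names are assumptions.
from urllib.parse import urlencode

def select_url(base, q, default_field):
    return f"{base}/select?" + urlencode({"q": q, "df": default_field})

url = select_url("http://localhost:8983/solr/core1", "dress", "text")
```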
-----Original Message- From: David Parks
Sent: Wednesday, March 06, 2013 1:26 AM
To: solr-user@lucene.apache.org
Subject: After upgra
I just upgraded from solr3 to solr4, and I wiped the previous work and
reloaded 500,000 documents.
I see in solr that I loaded the documents, and from the console, if I do a
query "*:*" I see documents returned.
I copied a single word from the text of the query results I got from "*:*"
but any qu
ti-valued fields, would a parent-child setup work for you here?
See http://search-lucene.com/?q=solr+join&fc_type=wiki
Otis
--
Solr & ElasticSearch Support
http://sematext.com/
On Thu, Jan 17, 2013 at 8:04 PM, David Parks wrote:
> The documents are individual products which come from 1 or
18, 2013 2:32 AM
To: solr-user
Subject: Re: Field Collapsing - Anything in the works for multi-valued
fields?
David,
What are the documents and the field? That could help in suggesting a workaround.
On Thu, Jan 17, 2013 at 5:51 PM, David Parks wrote:
> I want to configure Field Collapsing, but m
I want to configure Field Collapsing, but my target field is multi-valued
(e.g. the field I want to group on has a variable # of entries per document,
1-N entries).
I read on the wiki (http://wiki.apache.org/solr/FieldCollapsing) that
grouping doesn't support multi-valued fields yet.
Anything in the works?
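A client-side sketch of why multi-valued grouping is awkward (documents and field name are invented): a document carrying N values for the group field would have to land in N groups, which is exactly the ambiguity grouping currently avoids.

```python
# What multi-valued grouping would have to mean: a document with N group
# values appears in N groups. Documents and field name are invented.
from collections import defaultdict

def group_multivalued(docs, field):
    groups = defaultdict(list)
    for doc in docs:
        for value in doc.get(field, []):
            groups[value].append(doc["id"])   # same doc, several groups
    return dict(groups)

docs = [{"id": "p1", "category": ["shoes", "sale"]},
        {"id": "p2", "category": ["shoes"]}]
groups = group_multivalued(docs, "category")
```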
result set.
What I understand is that you are talking about the context of the query. For
example, if you search "books on MK Gandhi" and "books by MK Gandhi", the two
queries have different contexts.
Context-based search is at some level achieved by natural language processing.
This on
ex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once.
Lately, it doesn't seem to be working. (Anonymous - via GTD book)
On Wed, Jan 16, 2013 at 4:40 AM, David Parks wr
I'm a beginner-intermediate solr admin, I've set up the basics for our
application and it runs well.
Now it's time for me to dig in and start tuning and improving queries.
My next target is searches on simple terms such as "doll" which, in Google,
would return documents about, well, "toy do
are a bunch of parameters that you have to tune for either approach.
-- Jack Krupansky
-Original Message-
From: David Parks
Sent: Thursday, January 03, 2013 4:11 AM
To: solr-user@lucene.apache.org
Subject: RE: MoreLikeThis supporting multiple document IDs as input?
I'm not seeing the
nce?! Or maybe you are wondering
WHY they are different? That latter question I don't have the answer to.
-- Jack Krupansky
-Original Message-
From: David Parks
Sent: Friday, December 28, 2012 2:48 AM
To: solr-user@lucene.apache.org
Subject: RE: MoreLikeThis supporting multiple
I'm sure this is a complex problem requiring many iterations of work, so I'm
just looking for pointers in the right direction of research here.
I have a base term, such as let's say "black dress" that I might search for.
Someone searching on this term is most logically looking for black dresses
each search request. If you open solrconfig.xml you will see how they
are defined and used.
HTH
Otis
Solr & ElasticSearch Support
http://sematext.com/
On Dec 28, 2012 12:06 AM, "David Parks" wrote:
> I'm somewhat new to Solr (it's running, I've been through the books,
>
the components as they are.
You would have to manually merge the values from the base documents and then
you could POST that text back to the MLT handler and find similar documents
using the posted text rather than a query. Kind of messy, but in theory that
should work.
-- Jack Krupansky
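A sketch of the merge-and-POST workaround described above (the documents, field name, and row count are illustrative; the real call would be an HTTP POST to the /mlt handler with the merged text as the posted body):

```python
# Sketch of the workaround: concatenate the text of the base documents
# and send it to the MLT handler as posted content rather than a query.
# Documents and field name are made up for illustration.
from urllib.parse import urlencode

def build_mlt_request(docs, field):
    merged = " ".join(doc[field] for doc in docs)        # merge step
    params = urlencode({"mlt.fl": field, "rows": 10})
    return f"/mlt?{params}", merged                      # path + POST body

docs = [{"text": "black cotton dress"}, {"text": "black evening dress"}]
path, body = build_mlt_request(docs, "text")
```

Messy, as noted, but it sidesteps the single-document-ID limitation by letting any number of base documents contribute to the posted text.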
23.102.164:8080/solr/mlt?q=...
Or, use the MoreLikeThis search component:
http://localhost:8983/solr/select?q=...&mlt=true&...
See:
http://wiki.apache.org/solr/MoreLikeThis
-- Jack Krupansky
-Original Message-
From: David Parks
Sent: Thursday, December 27, 201
I'm doing a query like this for MoreLikeThis, sending it a document ID. But
the only result I ever get back is the document ID I sent it. The debug
response is below.
If I read it correctly, it's taking "id:1004401713626" as the term (not the
document ID) and only finding it once. But I want it to
Do you see any errors coming in on the console, stderr?
I start solr this way and redirect the stdout and stderr to log files, when
I have a problem stderr generally has the answer:
java \
-server \
-Djetty.port=8080 \
-Dsolr.solr.home=/opt/solr \
-Dsolr.data.dir=/
ually merge the values from the base documents
> and then you could POST that text back to the MLT handler and find
> similar documents using the posted text rather than a query. Kind of
> messy, but in theory that should work.
>
> -- Jack Krupansky
>
> -Original
I'm unclear on this point from the documentation. Is it possible to give
Solr X document IDs and tell it that I want documents similar to those
X documents?
Example:
- The user is browsing 5 different articles
- I send Solr the IDs of these 5 articles so I can present the user other
simi