I'm somewhat perplexed: under what circumstances would you be able to
send one query to Solr but not two?
-Mike
On 21-Jul-08, at 8:37 PM, Jon Baer wrote:
Well that's my problem ... I can't :-)
When you put a fq=doctype:news in there you can't get an explicit
facet
s behavior is? I'm using solr 1.2.
What exact URL did you send to Solr? I bet there is a missing '&'.
-Mike
ther features like query-injected
filter queries. This type of extension is largely obsolete with
QueryComponents
Let me know if you want more detail--most of this is relative to a
somewhat older version of Solr, so it might not all apply.
cheers,
-Mike
same amount of ram.
The situation you are experiencing is one-seek-per-doc, which is
performance death.
-Mike
On 28-Jul-08, at 1:34 PM, Yonik Seeley wrote:
That's a bit too tight to have *all* of the index cached...your best
bet is to go to 4GB+, or figure out a way not to have to retrie
doc data should probably be all in cache. One way to
mitigate this is to partition the fields like I suggested in the other
reply.
-Mike
like these extra
fields should just be stored in a separate file/database. I also
wonder if solving the underlying problem really requires storing 10k
values per doc (you haven't given us many clues in this regard)?
-Mike
To me, the release timing doesn't much affect what logo we decide to
use or when to adopt it. Surely the most visible, important location
for the logo is on the website, which we can replace at any time?
-Mike
On 8-Aug-08, at 7:30 AM, Otis Gospodnetic wrote:
I think you are right
e
add happens before delete, in which case I end up with no more doc
id=1?
As long as you are sending these requests on the same thread, they
will occur in order.
-Mike
ly.
If 1.3, is the nightly build the best one to grab bearing in mind
that we would want any protocols around distributed search to be as
stable as possible? Or just wait for the 1.3 release?
Go for the nightly build. The release will look very similar to it.
-Mike
(rawText:python)=27)
2.581456 = idf(docFreq=16017)
0.03125 = fieldNorm(field=rawText, doc=950285)
The =27 is the number of times 'python' appears in this document.
You could also write a custom component that included this
information in the response.
-Mike
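A rough sketch of how such a component might pull out the raw term
frequency, using the Lucene 2.x TermDocs API (untested; the field name
and doc id are just the ones from the explain output above):

    IndexReader reader = searcher.getReader();
    TermDocs td = reader.termDocs(new Term("rawText", "python"));
    int freq = 0;
    if (td.skipTo(950285) && td.doc() == 950285) {
        freq = td.freq();  // occurrences of 'python' in this doc
    }
    td.close();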
On 18-Aug-08,
On 19-Aug-08, at 12:58 PM, Phillip Farber wrote:
So your experience differs from Mike's. Obviously it's an important
decision as to whether to buy more machines. Can you (or Mike)
weigh in on what factors led to your different take on local shards
vs. shards distributed acros
Nice job Lukas; the professionalism and quality of work is evident. I
like aspects of the logo, but I too am having trouble getting past the
eye-looking O. Is it intentional (eye:look:search, etc)?
-Mike
On 20-Aug-08, at 5:25 AM, Mark Miller wrote:
I went through the same thought process
I thought the plan was to run more of a logo contest?
-Mike
On 21-Aug-08, at 9:29 AM, Otis Gospodnetic wrote:
One more +1 for the eye/sun O. I don't think I thought "eye" when I
saw it, but I think having an eye there is actually a cool little
detail.
I think Shalin sh
Hi Jim,
Looks like an SQL injection attack that is automatically entered into
search forms. Solr should not be affected, but it could affect you if
you insert the raw/unescaped query into an SQL database (for logging,
etc.).
-Mike
On 21-Aug-08, at 3:30 PM, Jim Hurst wrote:
Hey folks
sure someone would find that useful.
-Mike
2008/8/22 Chris Hostetter <[EMAIL PROTECTED]>
: I would like to know if I can add a FAQ entry about this topic, the
: motivation, ideas and workarounds used. If yes, I would like to do it
: with help from all guys that faced this problem.
Anyone
you can also use queries like field:[* TO Z] or field:[Z TO *]
-Mike
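The square brackets are inclusive of the endpoint; if memory serves,
the query parser also accepts curly braces for the exclusive form,
e.g. field:{Z TO *} for values strictly greater than Z.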
Jake Conk wrote:
Hello,
I was trying to figure out how to query ranges greater than and less
than. The closest solution I could find was using the range format:
field:[x TO z]
While this solution works for querying
ormance benefit from
indexing the field, is there? I guess if you have indexed and
termVectors and termPositions then you'll see a highlighting speedup,
but not from indexed alone.
True.
-Mike
stead of
max.
Any custom scoring example will help.
(On one hand, DisjunctionMaxQuery itself is an example :-). It is too
professional :-)
DisjunctionMaxQuery takes the max plus (tiebreak)*sum(others). So, if
you set tie=1.0, dismax becomes exactly what you are seeking.
-Mike
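A quick worked example of that formula: if a term scores 0.5 in one
field and 0.3 in another, tie=0.0 gives max = 0.5, tie=0.5 gives
0.5 + 0.5*0.3 = 0.65, and tie=1.0 gives 0.5 + 0.3 = 0.8, i.e. a plain
sum over the fields.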
restrictive fq, you might try an approach similar to the
one in https://issues.apache.org/jira/browse/SOLR-407 .
-Mike
ou can "fake" it by only using fieldsets (qf) that have a
consistent set of stopwords.
-Mike
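For example (field names here are hypothetical), one handler could use
qf=title_nostop body_nostop, where those fields are copyField targets
whose analyzer omits StopFilterFactory, while another handler sticks
to qf=title body.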
On 7-Oct-08, at 9:27 AM, Jon Drukman wrote:
Mike Klaas wrote:
On 6-Oct-08, at 11:20 AM, Jon Drukman wrote:
is there any way i could 'fake' it by adding a second field
without stopwords, or something like that?
Yep, you can "fake" it by only using fieldsets (qf) that
I think you can do field:["" TO *] to grab everything that is not null.
-Mike
John E. McBride wrote:
Hello All,
I need to run a query which asks:
field = NOT NULL
should this perhaps be done with a filter?
I can't find out how to do NOT NULL from the documentation, would
I don't think that there is any outstanding work to do on this issue.
2.4.0 should be compatible with the Solr 1.3 release; simply drop the
lucene jars in solr's lib directory if you want to use the (slightly
newer) version of lucene.
-Mike
On 15-Oct-08, at 10:00 AM, Feak,
gher than this is a net loss.
-Mike
If you never execute any queries, a gig should be more than enough.
Of course, I've never played around with a 0.8 billion doc corpus on
one machine.
-Mike
On 3-Nov-08, at 2:16 PM, Alok Dhir wrote:
in terms of RAM -- how to size that on the indexer?
---
Alok K. Dhir
Symplicity Corpor
n but
http://localhost:7001/solr/admin/luke works fine.
Regards,
Mike
btw I don't have a solr.xml
hossman wrote:
>
>
> i don't have time to really dig into the code right now, but out of
> curiosity what happens when you hit http://localhost:7001/solr/admin/
> and/or http://localhost:7001/solr/admin/index.jsp ?
>
>
I get the same exception when going to both of those.
this needs to be fixed. It
isn't as easy as synchronizing didCommit/didRollback, though--this
would introduce definite deadlock scenarios.
Mark, is there any chance you could post the thread dump for the
deadlocked process? Do you issue manual commits during insertion?
-Mike
On 18-Nov-08, at 12:18 PM, Mark Miller wrote:
Mike Klaas wrote:
autoCommitCount is written in a CommitTracker.synchronized block
only. It is read to print stats in an unsynchronized fashion,
which perhaps could be fixed, though I can't see how it could cause
a problem
lastAdde
& as performant as it
can be. Is there a test SQL database that is used to test Solr, so I
might try to do some comparisons?
Actually, I think that Solr's multithreaded indexing could be
improved. It is really only analysis that is parallelizable ATM.
-Mike
field, you are enumerating every possible value of that field
and excluding the docs containing it).
The solution is to store a token indicating that the field is empty,
such as "" (I think that "" works too). Then change your
fq to
fq=-comments:""
It should be much faster.
-Mike
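For example (the sentinel value is hypothetical): at index time, write
a token such as _empty_ into the comments field whenever there is no
comment, then filter with

    fq=-comments:_empty_

which only has to look up a single term instead of enumerating every
value in the field.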
d list of tokens
not to
tokenize like EnglishPorterFilter ?
That's a possibility. Another is to add code to filter out short
tokens from being generated, and use catenateAll=true
-Mike
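If you'd rather not write code, a declarative sketch of the same idea
is to follow WordDelimiterFilterFactory with a LengthFilterFactory in
the fieldtype's analyzer chain (min/max values here are arbitrary):

    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" catenateAll="1"/>
    <filter class="solr.LengthFilterFactory" min="3" max="255"/>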
ay to do it is to issue multiple queries
to Solr.
-Mike
On 21-Nov-08, at 3:45 AM, Mark Miller wrote:
To do it now, you'd have to switch the query parser to using the old
style wildcard (and/or prefix) query, which is slower on large
indexes and has max clause issues.
An alternative is to query for q=tele?*, which forces wildcardquery
https://issues.apache.org/jira/secure/attachment/12394350/solr.s4.jpg
https://issues.apache.org/jira/secure/attachment/12394268/apache_solr_c_red.jpg
https://issues.apache.org/jira/secure/attachment/12393995/sslogo-solr-70s.png
https://issues.apache.org/jira/secure/attachment/12393936/logo_remake
the old docs are gone? Try wiping the index
completely:
deleteByQuery *:*
(it is also more efficient to do this first if you are going to
re-index everything).
-Mike
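For the record, the full update messages for a wipe would be something
like

    <delete><query>*:*</query></delete>
    <commit/>

posted to the /update handler.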
iltercache code--have you tried the concurrent filter cache impl?
-Mike
I have posted my setup here:
http://www.nabble.com/Throughput-Optimization-td20335132.html.
My original filterCache was 700,000. Reducing it to 20,000, I found:
- Average response time decreased by 85%
- Average throughpu
particular word, how can i do that?
Sounds like you want query autocomplete. The best way to do this
(including if you want the box filled with some queries), is to use
the query logs, not the documents.
-Mike
y the index
until the merge is complete. But I am not familiar enough with this
code in lucene to be sure.
-Mike
On 2-Jan-09, at 10:17 AM, Brian Whitman wrote:
I think I'm getting close with this (sorry for the self-replies)
I tried an optimize (which we never do) and it took 30m and s
Kalidoss,
You can subscribe here:
http://lucene.apache.org/solr/mailing_lists.html
regards,
-Mike
On 5-Jan-09, at 4:19 AM, kalidoss wrote:
Thanks,
kalidoss.m,
y case, Jetty).
Note that if these instances are sharing a single disk, and your RAM
is low, then they will be competing over the slowest resource on your
machine and the query could be IO bound, in which case sharding is
useless.
-Mike
all sub-docs in a result
set. Which interface do I need to implement to achieve this?
3) if I do duping, my total result count will be off; what is the
right way to return an estimated total doc count ...
Thanks
Mike
e of
the highlighter (which first generates fragments and only then
determines whether they are snippets that contain the keyword(s))
-Mike
ate of the filtercache mem usage
by looking at its size.
-Mike
They are documented in http://wiki.apache.org/solr/FieldOptionsByUseCase
and in the FAQ, but I agree that it could be
more readily accessible.
-Mike
On 27-Jan-09, at 5:26 AM, Jarek Zgoda wrote:
Finally found that the fields have to have an analyzer to be
highlighted. Neat.
Can I ask
Well, both pages I listed are in the search results :). But I agree
that it isn't obvious to find, and that it should be improved. (The
Wiki is a community-created site which anyone can contribute to,
incidentally.)
cheers,
-Mike
On 28-Jan-09, at 1:11 AM, Jarek Zgoda wrote:
I sw
Thanks, Jarek.
-Mike
On 29-Jan-09, at 12:20 AM, Jarek Zgoda wrote:
Added appropriate amendment to FAQ, but I'd consider reorganizing
information in the whole wiki, like creating a section titled
"Common Tasks". Bit of redundancy does not hurt if it comes to
documentati
'm using solr 1.3.
Try hl.requireFieldMatch=true
http://wiki.apache.org/solr/HighlightingParameters
-Mike
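For example, a request along the lines of

    ?q=title:solr&hl=true&hl.fl=title&hl.requireFieldMatch=true

restricts highlighting to fields that the query actually matched.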
solr-user using the web, try nabble:
http://www.nabble.com/Solr---User-f14480.html
-Mike
to justify splitting the list into sub-lists (or sub-fora)
Fora have the same problems as do mailing lists in terms of people
asking the same questions.
-Mike
,
one that is indexed and not stored and one that is stored and not
indexed and only send the first N characters to the stored field?
-Mike
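A sketch of what that might look like in schema.xml (field names are
hypothetical):

    <field name="body"         type="text"   indexed="true"  stored="false"/>
    <field name="body_excerpt" type="string" indexed="false" stored="true"/>

with the client sending only the first N characters to body_excerpt.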
Cool, we are actually still on 1.2 but were planning on upgrading to 1.3.
Is this a feature of 1.3 or just on the nightly builds?
-Mike
Koji Sekiguchi wrote:
> Mike Topper wrote:
>> Hello,
>>
>> In one of the fields in my schema I am sending somewhat large texts. I
that I'll make all efforts
to prevent it from happening again.
It would be forgivable if only the email didn't contain the
misspelling "Lucen" :)
-Mike
cally release by then, or should I stick
with 1.3?
Thanks,
Mike
the central problems of
computer science.
-Mike
Hi Jayson,
It is on my list of things to do. I've been having a very busy week
and and am also working all weekend. I hope to get to it next week
sometime, if no-one else has taken it.
cheers,
-mike
On 8-May-09, at 10:15 PM, jayson.minard wrote:
First cut of updated handler n
uld I consider changing the lock timeout settings (currently set to
defaults)? If so, I'm not sure what to base these values on.
Thanks in advance,
mike
On Wed, Nov 4, 2009 at 8:27 PM, Lance Norskog wrote:
> This will not ever work reliably. You should have 2x total disk space
> for the
I think you might be looking for Apache Tika.
On Mon, Jan 25, 2010 at 3:55 PM, Frank van Lingen wrote:
> I recently started working with solr and find it easy to setup and tinker
> with.
>
> I now want to scale up my setup and was wondering if there is an
> application/component that can do the
pipeline again. That's a lot of overhead AFAIK.
- A TokenFilter would allow me to tap into the existing analysis pipeline so
I get the tokens for free but I can't access the document.
Any suggestions on how to best implement this?
Thanks in advance,
mike
There might be an OCR plugin for Apache Tika (which does exactly this out of
the box except for OCR capability, I believe).
http://lucene.apache.org/tika/
-mike
2010/2/4 Kranti K K Parisa
> Hi,
>
> Can anyone list the best OCR APIs available to use in combination with
> SOLR.
Hello,
One of the commercial search platforms I work with has the concept of
'document vectors', which are 1-gram and 2-gram phrases and their
associated tf/idf weights on a 0-1 scale, i.e. ["banana pie", 0.99]
means banana pie is very relevant for this document.
During the ingest/indexing proces
Thank you Ahmet, this is exactly what I was looking for. Looks like
the shingle filter can produce 3+-gram terms as well, that's great.
I'm going to try this with both western and CJK language tokenizers
and see how it turns out.
On Tue, Feb 9, 2010 at 5:07 PM, Ahmet Arslan wrote:
>> I've been l
on how to
implement this efficiently with Lucene/Solr.
mike
On Thu, Jan 28, 2010 at 4:31 PM, Otis Gospodnetic
wrote:
>
> How about this crazy idea - a custom TokenFilter that stores the safe flag in
> ThreadLocal?
>
>
>
> - Original Message
> > From: M
In an UpdateRequestProcessor (processing an AddUpdateCommand), I have
a SolrInputDocument with a field 'content' that has termVectors="true"
in schema.xml. Is it possible to get access to that field's term
vector in the URP?
false;
}
}
termAtt.setTermBuffer("n", 0, 1);
return false;
}
mike
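Term vectors aren't available at that point--the document hasn't been
indexed yet--so one workaround is to run the field's analyzer yourself
inside the URP. A rough Solr 1.4-era sketch (untested):

    SolrInputDocument doc = cmd.getSolrInputDocument();
    String text = (String) doc.getFieldValue("content");
    Analyzer analyzer = schema.getFieldType("content").getAnalyzer();
    TokenStream ts = analyzer.tokenStream("content", new StringReader(text));
    TermAttribute termAtt = ts.addAttribute(TermAttribute.class);
    while (ts.incrementToken()) {
        String term = termAtt.term();  // count or inspect tokens here
    }
    ts.close();

where schema would be grabbed from the request in the factory's
getInstance().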
disclosure I
work at New Relic.)
Mike
Siddhant Goel wrote:
>
> Hi everyone,
>
> I have an index corresponding to ~2.5 million documents. The index size is
> 43GB. The configuration of the machine which is running Solr is - Dual
> Processor Quad Core Xeon 5430 - 2.66GHz (Harpertown) - 2
y the
clustering output is nowhere to be found. I noticed that the
clustering output has a data type of "Arr", where the response and
other components have output of type "Lst"; could this be the problem?
If anyone can think of some other debugging I could try I'd love to hear it.
Thanks in advance,
Mike
False alarm: on the client side I was specifically setting a shard,
and this was causing my query/solr-ruby/solr to think it was a
distributed request, which isn't supported by the clustering
component.
cheers,
mike
On Mon, Mar 22, 2010 at 8:53 PM, mike anderson wrote:
> Has anybody
Hi, I don't think my problem is unique, but I couldn't find any answers after
an hour of searching...
I have two databases with identical schemas and different data. I want to
use DIH to index both into a single Solr index (right now, I have them in
separate indexes, but I find this cumbersome).
f docs, it's probably not
maxBufferedDocs, but when a big Lucene merge is triggered.
Could happen when doDeleting the pending docs too. James: try
sending commit every 500k docs or so.
-Mike
On 18-Jul-07, at 2:58 PM, Yonik Seeley wrote:
On 7/18/07, Mike Klaas <[EMAIL PROTECTED]> wrote:
Could happen when doDeleting the pending docs too. James: try
sending commit every 500k docs or so.
Hmmm, right... some of the memory usage will be related to the treemap
keeping tr
bedded mode, but the main problem is likely the values of your
parameters. You probably want:
Subject:(Math English)
Grade:(junior senior)
-Mike
The problem is that your query/filter syntax is incorrect.
subject:Math English
does _not_ search for subject=Math OR subject=English. It searches
for subject=Math OR 'English' in the default search field.
You need to use
subject:Math subject:English
or
subject:(Math English)
reg
after indexing is done, especially if an
is done.
If we change the value, do I have to reindex it?
1
This is the only setting that affects search, and it is the maximum
length of searchable documents. You will have to reindex to see the
changes here.
-Mike
E warnings too--you may have
more than two Searchers open at once.
-Mike
to express requirements. This also plays better with
caching.
NOT clauses -> fqs
required clauses (either OR or AND) -> q + mm
purely optional clauses -> bq/bf
If you want complicated (read: parenthesized) boolean logic, it's
best to develop your own solution.
-Mike
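For instance (field names hypothetical), a query with two required
terms, one NOT clause, and an optional boost might be sent to a dismax
handler as:

    q=ipod nano
    mm=2
    qf=title^2 body
    fq=-category:archived
    bq=inStock:true^1.5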
ontains the word in the query?
This is because the example Solr distribution is configured to do
stemming (see the definition for "text" fieldtype in schema.xml).
Remove PorterStemmerFilterFactory to do exact(er)
searching/highlighting only.
-Mike
s the most performant, but you can also issue a
series of facet queries:
?
q=...&facet.query=title:a*&facet.query=title:b*&facet.query=title:c*&...
Now, if this doesn't have to be query-specific (you want the global
counts), you can use TermDocs to get the answer quickly.
-Mike
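One way to approximate the global counts with the raw Lucene API (a
sketch using TermEnum rather than TermDocs; untested, and note that
summing docFreq over-counts documents containing more than one
distinct matching term):

    IndexReader reader = searcher.getReader();
    TermEnum te = reader.terms(new Term("title", "a"));
    int count = 0;
    do {
        Term t = te.term();
        if (t == null || !"title".equals(t.field()) || !t.text().startsWith("a")) break;
        count += te.docFreq();
    } while (te.next());
    te.close();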
pieces to
solrconfig.xml:
false
10
1000
2147483647
1
1000
Have you tried upping this? The problem might be that you are
committing every 1.0s, and a single commit eventually might take
longer than this (and you're only waiting 1.0s to acquire the wri
On 8-Aug-07, at 2:09 PM, Jae Joo wrote:
How about standformat optimization?
Jae
Optimized indexes are always faster at query time than their non-
optimized counterparts. Sometimes significantly so.
-Mike
e for
faceting, we should only append to the original query without the
pagination
params in order to get the correct faceting results. Right?
Faceting ignores pagination/startat/maxresults/etc.
regards,
-Mike
On 9-Aug-07, at 7:52 AM, Ard Schrijvers wrote:
ulimit -n 8192
Unless you have an old, creaky box, I highly recommend simply upping
your filedesc cap.
-Mike
y has no way of
knowing) what parts of a doc matched, so it would still have to try
highlighting first.
Note that you can control the CPU usage for long fields by setting
hl.maxAnalyzedChars (will be in the next release).
best,
-Mike
No. Hard links are alternative names for an inode: when lucene
replaces a file, it is creating a new (underlying) inode/file, and
the backup "link" points to the old one.
Don't think of hard links as "links", but as additional logical names
for the same physical dat
On 13-Aug-07, at 6:18 PM, Benjamin Higgins wrote:
(using last night's Solr build)
Can't seem to get this to work. I am trying to use the regex
highlighter fragment type. The docs suggest looking at the example
solrconfig.xml for a demonstration of a fragmenter that splits on
sentences. It
admin UI that ships with Solr?
-Mike
s -sh" will tell you roughly where the space is being
occupied. There is something strange going on: 2.5kB * 2.7m is only
6GB, and I have trouble imagining where the 30-fold index size
expansion is coming from.
-Mike
ster than every couple of minutes (say,
every 10 seconds)?
What if I take out the postcommit hook on the master and just have the
snapshooter run on a cron every 5 minutes?
-Mike
In the XML response it's displaying numDocs as 22 but giving only
the first 10 records. I am unable to get the remaining 12 records.
Do I have to do any configuration in solrconfig.xml?
See the 'start' and 'rows' parameters:
http://wiki.apache.org/solr/CommonQueryParameters
-Mike
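For example, to fetch the remaining 12 documents after the first 10:

    ?q=...&start=10&rows=12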
c per http request, using persistent connections, and
threading.
-Mike
n handling strategy: are you
using persistent HTTP connections? Are you indexing from multiple threads?
cheers,
-Mike
Paul Sundling
-Original Message-
From: climbingrose [mailto:[EMAIL PROTECTED]
Sent: Monday, August 27, 2007 12:22 AM
To: solr-user@lucene.apache.org
Subject: Re: Embedded about 50%
best,
-Mike
2 billion docs (signed int).
On 29-Aug-07, at 6:24 PM, James liu wrote:
what is the limits for Lucene and Solr.
100m, 1000m, 5000m or other number docs?
2007/8/24, Walter Underwood <[EMAIL PROTECTED]>:
It should work fine to index them and search them. 13 million docs is
not even close to t
very slow. Any suggestions as to how to improve this?
Maybe a problem with HashSets? Try reducing this value to zero:
-Mike
you will hit physical limits of your machine long before
you can achieve your hypothetical situation: that's 20,000 TB, which
is many, many times the size of a complete internet crawl.
-Mike
2007/8/30, Mike Klaas <[EMAIL PROTECTED]>:
2 billion docs (signed int).
On 29-Aug-07,
functionality.
Not currently developed. See
http://wiki.apache.org/solr/FederatedSearch and
http://issues.apache.org/jira/browse/SOLR-303
-Mike
-Nathan
-Original Message-
From: Mike Klaas [mailto:[EMAIL PROTECTED]
Sent: Thursday, August 30, 2007 11:44 AM
To: solr-user@lucene.apache.org
n
the appropriate place (e.g. as a filter).
best,
-Mike
I can send multiple values?
Yes. The one-term-per-field restriction applies to:
a) sorting
b) _optimization_ of facets.
-Mike
On 30-Aug-07, at 12:09 PM, Lance Norskog wrote:
Is there an app that walks a Lucene index and checks for corruption?
How would we know if our index had become corrupted?
Try asking on [EMAIL PROTECTED]
-Mike
(for Embedded version of Solr)?
Probably not until the multiple solr core support patch gets
committed. SolrCore is currently a singleton.
-Mike