Hi guys,
I'm trying to use the UIMA contrib, but I got the following error:
...
INFO: [] webapp=/solr path=/select
params={clean=false&commit=true&command=status&qt=/dataimport} status=0
QTime=0
05/02/2011 10:54:53 AM
org.apache.solr.uima.processor.UIMAUpdateRequestProcessor processText
INFO: Analazying
Why not just:
q=*:*
fq={!bbox}
sfield=store
pt=49.45031,11.077721
d=40
fl=store
sort=geodist() asc
http://localhost:8983/solr/select?q=*:*&sfield=store&pt=49.45031,11.077721&d=40&fq={!bbox}&sort=geodist%28%29%20asc
That will sort, and filter up to 40km.
No need for the
fq={!func}geodist()
Heh, I'm not sure if this is valid thinking. :)
By *matching* doc distribution I meant: what proportion of your millions of
documents actually ever get matched and then how many of those make it to the
UI.
If you have 1000 queries in a day and they all end up matching only 3 of your
docs, the s
Salman,
Warming up may be useful if your caches are getting decent hit ratios. Plus,
you
are warming up the OS cache when you warm up.
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
- Original Message
> From:
Hi Salman,
Ah, so in the end you *did* have TV enabled on one of your fields! :) (I think
this was a problem we were trying to solve a few weeks ago here)
How many docs you have in the index doesn't matter here - only N docs/fields
that you need to display on a page with N results need to be re
Hi Gustavo,
I think none of the answers I could give you would be valuable to you now,
because they would be from circa 2007 or 2008. We didn't use Solr, just Lucene.
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
You can always try something like this out in the analysis.jsp page,
accessible from the Solr Admin home. Check out that page and see how it
allows you to enter text to represent what was indexed, and text for a
query. You can then see if there are matches. Very handy to see how the
various filters
If i use WordDelimiterFilterFactory during indexing and at query time,
will a search for "cls500" find "cls 500" and "cls500x"? If so, will
it find and score exact matches higher? If not, how do you get exact
matches to display first?
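For context, this is roughly what WordDelimiterFilter does to such tokens (a simplified Python sketch of the default letter/digit split, not the actual filter):

```python
import re

def word_delimiter(token):
    # Rough approximation of WordDelimiterFilter's default split on
    # letter/digit transitions (the real filter does much more).
    return re.findall(r"[A-Za-z]+|[0-9]+", token)

print(word_delimiter("cls500"))   # ['cls', '500']
print(word_delimiter("cls500x"))  # ['cls', '500', 'x']
```

With both index and query sides split this way, "cls500" can match both "cls 500" and "cls500x"; getting exact matches first usually means also indexing an unsplit copy of the field and boosting matches on it.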
You mentioned that dismax does not support wildcards, but edismax does. Not
sure if dismax would have solved your other problems, or whether you just
had to shift gears because of the wildcard issue, but you might want to have
a look at edismax.
-Jay
http://www.lucidimagination.com
On Mon, Jan 3
I just moved to a multi core solr instance a few weeks ago, and it's
been working great. I'm trying to add a 3rd core and I can't query
against it though.
I'm running 1.4.1 (and tried 1.4.0) with the spatial search plugin.
This is the section in solr.xml
I've removed the index dir and c
Well, I assume many people out there have indexes larger than 100GB, and I
don't think you'd normally have more than 32GB or 64GB of RAM!
As I mentioned, the queries are mostly phrase, proximity, wildcard, and
combinations of these.
What exactly do you mean by distribution of documents? On this
I know, so we are not really using it for regular warm-ups (in any case the
index is updated on an hourly basis). I just tried it a few times to compare
results. The issue is I am not even sure warming up is useful for such
regular updates.
On Fri, Feb 4, 2011 at 5:16 PM, Otis Gospodnetic wrote:
> Salman,
That's a good idea, Yonik. So, fields that aren't stored don't get displayed,
so
the float field in the schema never gets seen by the user. Good, I like it.
Dennis Gearon
Signature Warning
It is always a good idea to learn from your own mistakes. It is usually a
better
idea
Hi Otis,
That's good, I finally made it work. As for Sematext, I am afraid I am too
poor to consider this solution :) (I am doing this for fun)
Thank you anyway!
2011/2/4 Otis Gospodnetic
> Hi,
>
> The main difference is that CommonGrams will take 2 adjacent words and put
> them
> together, while N
Your prices are just dollars and cents? For actual queries, you might consider
an int type rather than a float type. Multiply by a hundred to put it in the
index, then multiply your values in queries by a hundred before putting them in
the query. Same for range faceting, just divide by 100 be
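A minimal sketch of that cents-as-int scheme (the helper names are made up for illustration):

```python
def to_index_price(dollars: float) -> int:
    # Store prices as integer cents, so float truncation never eats
    # the trailing zeroes or the decimal point.
    return round(dollars * 100)

def to_display_price(cents: int) -> str:
    # Convert back for display, keeping the trailing zeroes.
    return f"{cents // 100}.{cents % 100:02d}"

print(to_index_price(19.99))                     # 1999
print(to_display_price(to_index_price(20.00)))   # 20.00
```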
Sorry for the lack of details.
It's all clear in my head.. :)
We checked out the head revision from the 3.x branch a few weeks ago
(https://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/). We
picked up r1058326.
We upgraded from a previous checkout (r960098). I am using our
customi
On Fri, Feb 4, 2011 at 12:56 PM, Dennis Gearon wrote:
> Using solr 1.4.
>
> I have a price in my schema. Currently it's a tfloat. Somewhere along the way
> from php, json, solr, and back, extra zeroes are getting truncated along with
> the decimal point for even dollar amounts.
>
> So I have two q
Hi Otis... thanks for your thoughts.
>I don't think DIH can read from a triple store today. It can read from a
>RDBMS,
>RSS/Atom feeds, URLs, mail servers, maybe others...
>Maybe what you should be looking at is the ManifoldCF instead, although I don't
>think it can fetch data from triple stores
Using solr 1.4.
I have a price in my schema. Currently it's a tfloat. Somewhere along the way
from php, json, solr, and back, extra zeroes are getting truncated along with
the decimal point for even dollar amounts.
So I have two questions, neither of which seemed to be findable with google.
A/
Hi Guys,
It depends on what properties you're trying to maximize. I've done several
studies of this over the years:
http://sunset.usc.edu/~mattmann/pubs/MSST2006.pdf
http://sunset.usc.edu/~mattmann/pubs/IWICSS07.pdf
http://sunset.usc.edu/~mattmann/pubs/icse-shark08.pdf
And if you're really bore
Hi Otis,
Hello,
You have many documents, 2 billion. Could you explain how yours is set up?
Mine is defined as follows, but using Lucene.
I have 3 machines, each with 6 HDs. Each HD holds an index fragment of 10GB.
So I have 3 search servers. Each server uses the l
Hello,
I am not using Nutch.
Let me explain more about how I use Lucene.
Lucene has a class, RemoteSearchable, which a server machine uses to publish
its index.
RemoteSearchable remote = new RemoteSearchable(parallelSearcher);
Naming.rebind("//" + LocalIP + "/" + artPortMap.getNick(), remote
Try http://localhost:8080/solr/select?q=*:* or, if using Solr's
default port, http://localhost:8983/solr/select?q=*:*
On Fri, Feb 4, 2011 at 2:50 PM, Esclusa, Will
wrote:
> Hello Grijesh,
>
> The URL below returns a 404 with the following error:
>
> The requested resource (/select/) is not avail
Hello Grijesh,
The URL below returns a 404 with the following error:
The requested resource (/select/) is not available.
-Original Message-
From: Grijesh [mailto:pintu.grij...@gmail.com]
Sent: Friday, February 04, 2011 12:17 AM
To: solr-user@lucene.apache.org
Subject: RE: Index Not Ma
yes it works fine ... thanks
--
View this message in context:
http://lucene.472066.n3.nabble.com/Facet-Query-tp2422212p2424155.html
Sent from the Solr - User mailing list archive at Nabble.com.
Sending two separate queries is an approach, but I think it may affect Solr's
performance, because every new search would mean two queries to Solr. For that
reason I was thinking of doing it in a single query. I am going to implement it
with two queries now, but if anything is found usefu
Basically, Term Vectors are only on one main field, i.e. Contents. The average
size of each document would be a few KB, but there are around 130 million
documents, so what do you suggest now?
On Fri, Feb 4, 2011 at 5:24 PM, Otis Gospodnetic wrote:
> Salman,
>
> It also depends on the size of your docume
Rohan,
You can do that with Lucene's tokenizers: get the individual tokens/words and
build a HashMap whose keys are the words/tokens from the first document. You
can then tokenize the second doc and check each of its words against the HashMap.
Our Key Phrase Extractor (
http://sematext.com/produc
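Stripped of Lucene's analyzers, the idea reduces to a hash lookup; here is a naive whitespace-tokenizer sketch (a simplification: real tokenizers also handle punctuation, stemming, and so on):

```python
def shared_tokens(doc1: str, doc2: str) -> set:
    # Tokenize doc1 (naively, on whitespace) into a hash set...
    first = {tok.lower() for tok in doc1.split()}
    # ...then check each token of doc2 against that set.
    return {tok.lower() for tok in doc2.split() if tok.lower() in first}

print(shared_tokens("Solr is a search server",
                    "Lucene powers the Solr search server"))
```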
Salman,
It also depends on the size of your documents. Re-analyzing 20 fields of 500
bytes each will be a lot faster than re-analyzing 20 fields with 50 KB each.
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
- Ori
Hi,
> Sharding is an option too but that too comes with limitations so want to
> keep that as a last resort but I think there must be other things coz 150GB
> is not too big for one drive/server with 32GB Ram.
Hmm what makes you think 32 GB is enough for your 150 GB index?
It depends on q
Salman,
I only skimmed your email, but wanted to say that this part sounds a little
suspicious:
> Our warm up script currently executes all distinct queries in our logs
> having count > 5. It was run yesterday (with all the indexing update every
It sounds like this will make warmup take a loo
Hi,
There are external tools that one can use to watch Java processes, listen for
errors, and restart processes if they die - monit, daemontools, and some
Java-specific ones.
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
Hi,
2 GB for ramBufferSize is probably too much and not needed, but you could
increase it from default 32 MB to something like 128 MB or even 512 MB, if you
really have that much data where that would make a difference (you mention only
49 PDF files). I'd leave mergeFactor at 10 for now. The
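In solrconfig.xml those knobs look something like this (a sketch with the suggested bump, assuming the stock Solr 1.4 indexDefaults section):

```xml
<indexDefaults>
  <!-- default is 32; 128 or even 512 MB can help for heavy indexing, 2 GB is overkill -->
  <ramBufferSizeMB>128</ramBufferSizeMB>
  <!-- leave at the default for now -->
  <mergeFactor>10</mergeFactor>
</indexDefaults>
```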
Hi,
I'll admit I didn't read your email closely, but the first part makes me think
that ngrams, which I don't think you mentioned, might be handy for you here,
allowing for misspellings without the implementation complexity.
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lu
Hi,
The main difference is that CommonGrams will take 2 adjacent words and put them
together, while NGram* stuff will take a single word and chop it up in
sequences
of one or more characters/letters.
If you are stuck with auto-complete stuff, consider
http://sematext.com/products/autocomplete
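Roughly, the two behaviors look like this (a Python sketch for illustration, not the actual Lucene filters; real CommonGrams only forms grams around common words such as stopwords):

```python
def common_grams(tokens):
    # CommonGrams-style: glue adjacent words into a single token.
    return [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]

def ngrams(word, n=3):
    # NGram-style: chop a single word into letter sequences.
    return [word[i:i + n] for i in range(len(word) - n + 1)]

print(common_grams(["the", "quick", "fox"]))  # ['the_quick', 'quick_fox']
print(ngrams("quick"))                        # ['qui', 'uic', 'ick']
```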
Lewis,
A large maxFieldLength may not necessarily result in OOM - it depends on -Xmx
you are using, the number of concurrent documents being processed, and such.
So the first thing I'd look at would be my machine's RAM, then the -Xmx I can
afford, then based on that set maxFieldLength.
Otis
Se
Gustavo,
I haven't used RMI in 5 years, but last time I used it I remember it being
problematic - this is in the context of Lucene-based search involving some 40
different shards/servers, high query rates, and some 2 billion documents, if I
remember correctly. I remember us wanting to get away
Hi Grant,
Thanks for the tip
This seems to work:
q=*:*
fq={!func}geodist()
sfield=store
pt=49.45031,11.077721
fq={!bbox}
sfield=store
pt=49.45031,11.077721
d=40
fl=store
sort=geodist() asc
On Thu, Feb 3, 2011 at 7:46 PM, Grant Ingersoll wrote:
> Use a filter query? See the {!geofilt} stuff
Yes, I see I didn't understand that facet.query parameter.
Have you considered submitting two queries? One for results with q.op=OR, one
for faceting with q.op=AND?
-Original Message-
From: Grijesh [mailto:pintu.grij...@gmail.com]
Sent: Friday, February 4, 2011 10:42
To: solr-user@lu
Hi Lewis,
> I am very interested in DataImportHandler. I have data stored in an RDF db
> and
>wish to use this data to boost query results via Solr. I wish to keep this
>data
>stored in db as I have a web app which directly maintains this db. Is it
>possible to use a DataImportHandler to
thanks Dominique
I am on Windows... how do I do this on a Windows 7 machine? I have
NetBeans, with the SVN and Ant plugins.
regards
Mambe Churchill Nanje
237 33011349,
AfroVisioN Founder, President,CEO
http://www.afrovisiongroup.com | http://mambenanje.blogspot.com
skypeID: mambenanje
www.twit
facet.query=+water +treatment +plant will not return the city facet the poster
needs.
It will only give the count of documents matching the query
facet.query=+water +treatment +plant.
-
Thanx:
Grijesh
http://lucidimagination.com
Using a facet query like
facet.query=+water +treatment +plant
... should give a count of 0 for documents not having all three terms. This could
do the trick, if I understand how this parameter works.
Try solr's new Local Params ,may that will help for your requirement.
http://wiki.apache.org/solr/LocalParams
-
Thanx:
Grijesh
http://lucidimagination.com
Hi friends, is it possible to do faceting over score? I want results from
facets which have higher scores. Please suggest.
But I want the results exactly as the above query returns them. There is no
problem with the results it is returning.
Problem detail
I have implemented search for my company, in which a user can type any query
into the search box. Now, when a user searches "water treatment plant", the
results come
No, the facet.query and fq parameters work with any type of query. When you
search with facet.query=city:mumbai, it will return a facet count like
3
facet.query is for faceting against a particular query.
If you want results for that query, then you have to use fq=city:mumbai
-
Thanx:
Change the default operator from "OR" to "AND" by using q.op or in the schema.
-
Thanx:
Grijesh
http://lucidimagination.com