Hi,
I am using Solr with a Tomcat server. I have configured two
cores (multicore) inside the Solr home directory. The solr.xml file looks like
I am also using DIH to upload the data into these two cores separately, and the
document count in the two cores is different. However, wheneve
I'm running the 3.x branch and I'm trying to implement spatial searching.
I am able to sort results by distance from a given lat/long using a query
like:
http://localhost:8080/solr/select/?q=_val_:"recip(dist(2, lat_long,
vector(-66.5,75.1)),1,1,0)"&fl=*,score
which gives me the expected resul
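For anyone puzzling over that function query: recip and dist are easy to reproduce outside Solr. A minimal sketch (plain Python, assuming dist(2, ...) is the Euclidean 2-norm and recip(x, m, a, b) = a / (m*x + b), as the function query docs describe; the doc point is hypothetical):

```python
import math

def dist2(p, q):
    # Solr's dist(2, ...): Euclidean (L2) distance between two points.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def recip(x, m, a, b):
    # Solr's recip(x, m, a, b) = a / (m*x + b): largest when x is small,
    # so nearer documents get higher scores.
    return a / (m * x + b)

query_point = (-66.5, 75.1)
doc_point = (-66.0, 75.0)  # hypothetical stored lat_long value
print(recip(dist2(query_point, doc_point), 1, 1, 0))  # roughly 1.96
```

Note that with b=0, as in the query above, the score blows up when the distance is exactly zero; a small non-zero b avoids the division by zero.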
On Jun 9, 2010, at 8:38pm, Blargy wrote:
What is the preferred way to index html using DIH (my html is stored
in a
blob field in our database)?
I know there is the built in HTMLStripTransformer but that doesn't
seem to
work well with malformed/incomplete HTML. I've created a custom
tra
Wait... do you mean I should try the HTMLStripCharFilterFactory analyzer at
index time?
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.HTMLStripCharFilterFactory
--
View this message in context:
http://lucene.472066.n3.nabble.com/Indexing-HTML-tp884497p884592.html
Sent from
Does the HTMLStripChar apply at index time or query time? Would it matter to
use one over the other?
As a side question, if I want to perform highlighter summaries against this
field do I need to store the whole field or just index it with
TermVector.WITH_POSITIONS_OFFSETS?
I tried putting "shards" into the default request handler.
But now each time I search, Solr hangs forever.
So what's the correct solution?
Thanks.
<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="fl">*</str>
    <str name="version">2.1</str>
    <str name="shards">localhost:7500/solr,localhost:7501/solr,localhost:7502/solr,localhost:7503/solr,localhost</str>
  </lst>
</requestHandler>
Hi. I am running distributed search on solr.
I have 70 Solr instances. So each time I want to search I need to use
?shards=localhost:7500/solr,localhost..7620/solr
which is a very long URL.
So how can I encode the shards into a config file so I don't need to type them
each time?
thanks.
Scott
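Since the shards list never changes per request, one approach is to generate the string once and paste it into the requestHandler defaults in solrconfig.xml. A sketch (it assumes the 70 instances sit on consecutive ports 7500-7569; adjust to the real hosts):

```python
# Build the comma-separated value for <str name="shards">...</str>.
ports = range(7500, 7570)  # 70 consecutive ports -- an assumption
shards = ",".join(f"localhost:{port}/solr" for port in ports)

print(len(shards.split(",")))  # 70 entries
print(shards[:40])             # first couple of entries
```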
The HTMLStripChar variants are newer and might work better.
On Wed, Jun 9, 2010 at 8:38 PM, Blargy wrote:
>
> What is the preferred way to index html using DIH (my html is stored in a
> blob field in our database)?
>
> I know there is the built in HTMLStripTransformer but that doesn't seem to
> w
Every time you reload the index it has to rebuild the cached facet
data. Could that be it?
Also, how big are the fields being highlighted? And are they indexed
with term vectors? (If not, the text is re-analyzed in flight with
term vectors.)
How big are the caches? Are they growing & growing?
On
I want to try out the bobo plugin for Solr, which is a custom request handler
(http://code.google.com/p/bobo-browse/wiki/SolrIntegration).
At the same time I want to use BoostQParserPlugin to boost my queries,
something like {!boost b=log(popularity)}foo
Can I use the {!boost} feature in conj
What is the preferred way to index html using DIH (my html is stored in a
blob field in our database)?
I know there is the built in HTMLStripTransformer but that doesn't seem to
work well with malformed/incomplete HTML. I've created a custom transformer
to first tidy up the html using JTidy then
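For what it's worth, lenient tag stripping is easy to prototype outside Solr before committing to a custom transformer. A sketch using Python's stdlib parser, which tolerates malformed/unclosed markup rather than raising; this is only an illustration, not the HTMLStripTransformer itself:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    # Collects only text nodes; never raises on bad or incomplete markup.
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

    def text(self):
        return "".join(self.parts)

p = TextExtractor()
p.feed("<p>Hello <b>world</b><br><div>unclosed")  # deliberately malformed
print(p.text())  # Hello worldunclosed
```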
This is what Field Collapsing does. It is a complex feature and is not
in the Solr trunk yet.
On Tue, Jun 8, 2010 at 9:15 AM, Moazzam Khan wrote:
> How would I do a facet search if I did this and not get duplicates?
>
> Thanks,
> Moazzam
>
> On Mon, Jun 7, 2010 at 10:07 AM, Israel Ekpo wrote:
>>
Is it necessary that a document 1 year old be more relevant than one
that's 1 year and 1 hour old? In other words, can the boosting be
logarithmic wrt time instead of linear?
A schema design tip: you can store a separate date field which is
rounded down to the hour. This will make for a much small
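The rounding the tip describes is just truncation of the timestamp; a minimal sketch (how the value gets into the separate field is up to your indexing code):

```python
from datetime import datetime

def round_down_to_hour(dt):
    # Truncate to the hour so many documents share one indexed value,
    # keeping the number of unique date terms small.
    return dt.replace(minute=0, second=0, microsecond=0)

indexed = round_down_to_hour(datetime(2010, 6, 9, 20, 38, 12))
print(indexed.isoformat() + "Z")  # 2010-06-09T20:00:00Z
```

Solr's own date math can do the same truncation at query time with NOW/HOUR.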
The Distributed Search feature assumes that a document only exists in
one core. Updating a doc in a small core will fail because it may be
found twice.
If you are only updating a popularity score, and only need it for
boosting (but not for searching on a value), there is a feature called
the Exter
Does Solr handle having two masters that are also slaves to each other (i.e.
in a cycle)?
Regards,
Glen
The DataImportHandler has a tool for fetching recent updates in the
database and indexing only those new&changed records. It has no
scheduler. You would set up the DIH configuration and then write a
cron job to run it at regular intervals.
Lance
On Wed, Jun 9, 2010 at 7:51 AM, Sumit Arora wrote
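The cron-driven setup Lance describes boils down to requesting the DIH delta-import command on a schedule. A sketch of building that request URL (host, port, and handler path are assumptions based on the stock example config):

```python
from urllib.parse import urlencode

base = "http://localhost:8983/solr/dataimport"
params = urlencode({
    "command": "delta-import",  # fetch only new & changed rows
    "clean": "false",           # keep existing documents
    "commit": "true",           # commit when the import finishes
})
url = f"{base}?{params}"
print(url)
```

A crontab entry such as `*/15 * * * * curl -s "<that url>"` then runs it every 15 minutes.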
https://issues.apache.org/jira/browse/LUCENE-2387
There is a "memory leak" that causes the last PDF binary file image to
stick around while working on the next binary image. When you commit
after every extraction, you clear up this "memory leak".
This is fixed in trunk and should make it into a '
Thanks for the comments. I still can't get this multicore thing to work!
Here is my directory structure:
d:
__apachesolr
lucidworks
__lucidworks
solr
__bin
__conf
__lib
tomcat
There is no solr.xml, and solr.solr.home points to
d:\apachesolr\lucidw
I use the following article as a reference when dealing with GC related issues
http://www.petefreitag.com/articles/gctuning/
I suggest you activate the verbose option and send GC stats to a file. I don't
remember exactly what the option was, but you should find the information
easily.
Good luck
On Fri, Jun 4, 2010 at 3:14 PM, Chris Hostetter
wrote:
> : That is still really small for 5MB documents. I think the default solr
> : document cache is 512 items, so you would need at least 3 GB of memory
> : if you didn't change that and the cache filled up.
>
> that assumes that the extracted te
I am keeping some data in JSON format in an HBase table.
I would like to index this data with Solr.
Are there any examples of indexing an HBase table?
Every node in HBase has an attribute that stores the date when it was written
into the table.
Is there any option to search not only by text but also to search the da
On Tue, Jun 8, 2010 at 4:18 PM, wrote:
> The following should work on centos/redhat, don't forget to edit the paths,
> user, and java options for your environment. You can use chkconfig to add it
> to your startup.
Thanks, Colin.
Sixten
What is the best practice? Perhaps we can amend the article at
http://www.lucidimagination.com/blog/2009/05/13/exploring-lucene-and-solrs-trierange-capabilities/
to include the recommendation (ie, dates are commonly unique).
I'm assuming using a long is the best choice.
>Have you looked at the garbage collector statistics? I've experienced this
>kind of issues in the past
and I was getting huge spikes when the GC was doing its job.
I haven't, and I'm not sure what a good way to monitor this is. The
problem occurs maybe once a week on a server. Should I run jstat
Have you looked at the garbage collector statistics? I've experienced this kind
of issue in the past
and I was getting huge spikes when the GC was doing its job.
On 2010-06-09, at 10:52 AM, Paul wrote:
> Hi all,
>
> In my app, it seems like solr has become slower over time. The index
> has gro
Hi,
When using the data import handler and clicking on 'Debug now' it stores the
current date as 'last_index_time' into the dataimport.properties file.
Is it the right behaviour, as debug doesn't do a commit?
Thanks
marc
Hi all,
In my app, it seems like solr has become slower over time. The index
has grown a bit, and there are probably a few more people using the
site, but the changes are not drastic.
I notice that when a solr search is made, the amount of cpu and ram
spike precipitously.
I notice in the solr lo
Hey All,
I am new to the Solr area; I just started exploring it and have done the basic
stuff. Now I am stuck on some logic:
How does Solr manage connected database updates?
Scenario:
-- I wrote an indexing program which runs on Tomcat, and by running this
program, it reads data from a connected MySQL database a
help me please =(
--
View this message in context:
http://lucene.472066.n3.nabble.com/XSLT-for-JSON-tp845386p882319.html
Sent from the Solr - User mailing list archive at Nabble.com.
>>... but decided not to use it anyway?
that's pretty much correct. the huge commercial scale of the project
dictates that we need as much system stability as possible from the outset;
thus the tools we use must be established, community-tested and trusted
versions. we also noticed that some
Hi,
Check your requestHandler. It may preset some values that you don't see. Your
echoParams setting may be explicit instead of all [1]. Alternatively, you
could add the echoParams parameter to your query if it isn't set as an
invariant in your requestHandler.
[1]: http://wiki.apache.org/solr
Hi all,
We are currently working on a proof-of-concept for a client using Solr
and have been able to configure all the features they want except the
scoring.
Problem is that they want scores that make results fall in buckets:
* Bucket 1: exact match on category (score = 4)
* Bu
Referring to
http://lucene.472066.n3.nabble.com/unloading-a-solr-core-doesn-t-free-any-memory-td501246.html#a501246
Do we have any solution to free up memory after Solr Core Unload?
Ankit
--
View this message in context:
http://lucene.472066.n3.nabble.com/Solr-Core-Unload-tp882187p882187.html
S
Hi,
I have been using Solr for some time now and had no issues while I was using
it on Windows. Yesterday I moved the Solr code to Linux servers and started
to index the data. Indexing completed successfully on the Linux servers, but
when I queried the index, the response header returned (by the Solr
Hi everyone,
I am trying to build the spellcheck index with *IndexBasedSpellChecker*
<lst name="spellchecker">
  <str name="name">default</str>
  <str name="field">text</str>
  <str name="spellcheckIndexDir">./spellchecker</str>
</lst>
And I want to specify the dynamic field "*_text" as the field option:
How can it be done?
Thanks, Bogdan
--
Bogdan Gusiev.
agre...@gmail.com
I agree. I'll add this information to the wiki.
On 9 June 2010 14:32, Jean-Sebastien Vachon wrote:
> ok great.
>
> I believe this should be mentioned in the wiki.
>
> Later
>
> On 2010-06-09, at 4:06 AM, Martijn v Groningen wrote:
>
>> The fieldCollapseCache should not be used as it is now, it us
ok great.
I believe this should be mentioned in the wiki.
Later
On 2010-06-09, at 4:06 AM, Martijn v Groningen wrote:
> The fieldCollapseCache should not be used as it is now, it uses too
> much memory. It stores any information relevant for a field collapse
> search. Like document collapse cou
- solr.xml has to reside in the solr.home dir. You can set this up with the
Java option
-Dsolr.solr.home=
- admin is per core, so solr/CORENAME/admin will work
It is quite simple to set up.
> -Original Message-
> From: xdzgor [mailto:p...@alphasolutions.dk]
> Sent: Wednesday
Are there any built-in tools for performance testing? Thanks.
Thanks again Ahmet and Erik.
Turns out that this was calling the correct query parser all along.
The real problem was a combination of the query cache and my hacking the query
to enable BM25 scoring.
When I use a standard BooleanQuery, this behaved as published.
Now I have to understand how to twe
Hello.
I want to call the TermsComponent with this request:
http://host/solr/app/select/?q=har
I want the same result as when I use this request:
http://host/solr/app/terms/?q=har&terms.prefix=har
-->
9
9
9
...
. this is my solrconfig.xml requestHandler
Hello,
Is there a way to copy a multivalued field to a single value by taking, for
example, the first entry of the multivalued field?
I am actually trying to sort my index by title, and my index contains
Tika-extracted titles, which come in as multivalued; hence my title field is
multivalued.
> Thanks, Ahmet.
> Yes, my solrconfig.xml file is very similar to what you
> wrote.
> When I use &echoparams=all and defType=myqp, I get:
>
>
> hi
> all
> myqp
>
>
> However, when I do not use the defType (hoping it will be
> automatically
> Inserted from solrconfig), I get:
>
>
> hi
> all
If you take a look in the examples directory there is a directory called
multicore. This is an example of the solrhome of a multicore setup.
Otherwise take a look at the logged output of Solr itself. It should tell
you what is wrong with the setup
On 9 June 2010 11:08, xdzgor wrote:
>
> Hi - I
Thanks guys.
I will try this with some test documents, fingers crossed.
And by the way, I got the minTokenLen parameter from one of the thread
replies (from Erik).
Cheerz,
Ali
--
View this message in context:
http://lucene.472066.n3.nabble.com/Filtering-near-duplicates-using-TextProfileSignat
Hi - I can't seem to get "multicores" to work. I have a solr installation
which does not have a "solr.xml" file - I assume this means it is not
multicore.
If I create a solr.xml, as described on
http://wiki.apache.org/solr/CoreAdmin, my solr installation fails - for
example I get 404 errors when t
We have following solr configuration:
java -Xms512M -Xmx1024M -Dsolr.solr.home= -jar
start.jar
in SolrConfig.xml
false
4
20
1024
1
1000
1
native
false
1024
4
false
true
Markus Jelsma wrote:
>
> Well, it got me too! KMail didn't properly order this thread. Can't seem
> to
> find Hatcher's reply anywhere. ??!!?
>
Whole thread here:
http://lucene.472066.n3.nabble.com/Filtering-near-duplicates-using-TextProfileSignature-tt479039.html
Yuval - my only hunch is that you're hitting a different request
handler than where you configured the default defType. Send us the
URL you're hitting Solr with, and the full request handler mapping.
And are you sure about the exact core you're hitting (since you
mention multicore)?
On Jun 8, 2010, at 1:57 PM, Naomi Dushay wrote:
Missing Facet Values:
---
to find how many documents are missing values:
facet.missing=true&facet.mincount=really big
http://your.solr.baseurl/select?rows=0&facet.field=ffldname&facet.mincount=1000&facet.missing=true
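Assembled programmatically, the parameters for that missing-values trick look like this (the field name ffldname and the deliberately huge mincount come from the example above; facet=true is assumed, since faceting has to be switched on):

```python
from urllib.parse import urlencode

params = urlencode({
    "rows": 0,                # no documents needed, only facet counts
    "facet": "true",
    "facet.field": "ffldname",
    "facet.mincount": 1000,   # threshold so high that real values drop out...
    "facet.missing": "true",  # ...leaving only the docs missing the field
})
print(params)
```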
Well, it got me too! KMail didn't properly order this thread. Can't seem to
find Hatcher's reply anywhere. ??!!?
On Tuesday 08 June 2010 22:00:06 Andrew Clegg wrote:
> Andrew Clegg wrote:
> > Re. your config, I don't see a minTokenLength in the wiki page for
> > deduplication, is this a recent a
Here's my config for the updateProcessor. It now uses another signature method,
but I've used TextProfileSignature as well and it works - sort of.
<processor class="solr.processor.SignatureUpdateProcessorFactory">
  <bool name="enabled">true</bool>
  <str name="signatureField">sig</str>
  <bool name="overwriteDupes">true</bool>
  <str name="fields">content</str>
  <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
</processor>
Of course, you must
The fieldCollapseCache should not be used as it is now, it uses too
much memory. It stores any information relevant for a field collapse
search. Like document collapse counts, collapsed document ids /
fields, collapsed docset and uncollapsed docset (everything per unique
search). So the memory usag
Thanks, Ahmet.
Yes, my solrconfig.xml file is very similar to what you wrote.
When I use &echoparams=all and defType=myqp, I get:
<str name="q">hi</str>
<str name="echoParams">all</str>
<str name="defType">myqp</str>
However, when I do not use the defType (hoping it will be automatically
inserted from solrconfig), I get:
<str name="q">hi</str>
<str name="echoParams">all</str>
Can you see what I am doing wrong?