Thanks Hector!
Are there any other comments from other people?
best
mersad
On 12/7/2011 7:20 PM, Hector Castro wrote:
This article shouldn't flat out make the decision for you, but these concerns
raised by the guys at StackOverflow (over SQL Server 2008) helped guide us
toward Solr:
http://www.infoq.com/news/2008/11/SQL-Server-Text
Otis, Tomás: thanks for the great links!
2011/12/7 Tomás Fernández Löbbe
> Hi Dimitry, I pointed to the wiki page to enable JMX, then you can use any
> tool that visualizes JMX stuff like Zabbix. See
>
> http://www.lucidimagination.com/blog/2011/10/02/monitoring-apache-solr-and-lucidworks-with-zabbix/
Hmmm...that sounds pretty odd...
How are you measuring the commit time?
You likely want to turn off any caches, as they will be expired every second,
but that should not cause this...
I can try and duplicate your setup tomorrow and see what I can spot.
- Mark
On Dec 7, 2011, at 8:13 PM, yu shen wrote:
I've been reading the solr source code and made modifications by
implementing a custom Similarity class.
I want to apply a weight to the score by multiplying in a number
based on whether the current doc has a certain term in it.
So if the query was q=data_text:foo
then the Similarity class would apply
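A query-side way to approximate that effect, without a custom Similarity, is a boost query (the boosted term and weight here are hypothetical):

  /select?defType=edismax&q=data_text:foo&bq=data_text:special^1.5

Note that bq adds a boost to the score rather than multiplying it, so a true multiplier still needs custom scoring code.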
Replication just copies the index, so I'm not sure how this would help offhand?
With SolrCloud this is a breeze - just fire up another replica for a shard and
the current index will replicate to it.
If you were willing to export the data to some portable format and then pull
it back in, why no
Thanks for the response. I will set the stream accordingly. As for
extraction of the text from the pdf, I want the entire content of the pdf. This
content will be part of a SOLR document, which has a unique id.
What is the unique id for? Here's my schema:
Inter
Yeah, I was actually hoping that somehow I could use the replication
handler to do this: fire up one shard, set another as a slave, and see if
it would replicate the index to it, but obviously I'm not sure that
would work either.
Something like this would be great too
https://issues.apache.org/jira/br
Yes. That's what I would expect. I guess I didn't understand when you said
"The facet counts are the counts of the *values* in that field",
because it seems it's the count of the number of matching documents
irrespective:
if one document has 20 values for that field and another 10, the facet
coun
Hi Mark, and all
I now use commit configuration exactly as below:
10
1000
But the commit time takes about 60 seconds.
I have around 120 - 130 documents in my server. And each day, the
number will increase by about 6000. My symptom is that if the solr server is just
s
http://www.lucidimagination.com/search/link?url=http://wiki.apache.org/solr/UniqueKey
On Wed, Dec 7, 2011 at 5:04 PM, Lance Norskog wrote:
> Yes, the SignatureUpdateProcessor is what you want. The 128-bit hash is
> exactly what you want to use in this situation. You will never get the
> same ID
Yes, the SignatureUpdateProcessor is what you want. The 128-bit hash is
exactly what you want to use in this situation. You will never get the
same ID for two urls; collisions have never been observed "in the wild" for
this hash algorithm.
Another cool thing about using hash-codes as fields is th
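For reference, the corresponding solrconfig.xml chain follows the Solr Deduplication wiki pattern; MD5Signature is the 128-bit hash discussed here, and the url field matches this use case:

  <updateRequestProcessorChain name="dedupe">
    <processor class="solr.processor.SignatureUpdateProcessorFactory">
      <bool name="enabled">true</bool>
      <str name="signatureField">id</str>
      <bool name="overwriteDupes">false</bool>
      <str name="fields">url</str>
      <str name="signatureClass">solr.processor.MD5Signature</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
  </updateRequestProcessorChain>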
Unfortunately, I think the only silver bullet here, for pure Solr, is to
build a system that makes it possible to reindex somehow.
On Dec 7, 2011, at 1:38 PM, Erik Hatcher wrote:
>
> On Dec 7, 2011, at 13:20 , Shawn Heisey wrote:
>
>> On 12/6/2011 2:06 PM, Erik Hatcher wrote:
>>> I think the best thing that you could do here would be to lock in a version of Lucene (all the Lucene libraries) that you use with SolrCloud.
Try setting the StreamType to application/pdf, that way Tika doesn't have
to infer it.
BTW the second argument to ExtractParameters is the unique key... a value
of "*" probably doesn't make sense.
--
Mauricio
On Wed, Dec 7, 2011 at 5:50 PM, Soumitra Banerjee <
soumitrabaner...@gmail.com> wrote:
All -
I am using SOLR 3.5, SolrNet 0.4.0.2001, Tomcat 7.0 and am running a job
to extract the text from pdfs stored on my local hard disk.
*Tomcat StdErr log Shows:*
INFO: [core1] webapp=/Solr path=/update/extract params={extractOnly=true&
literal.id=*&resource.name=C:\XXX\10310.pdf&extractForm
Thanks Juan. I guess I have found my reason to migrate to 3.4.
Many thanks.
On Wed, Dec 7, 2011 at 7:43 PM, Juan Grande wrote:
> Hi Kissue,
>
> Support for grouping on SolrJ was added in Solr 3.4, see
> https://issues.apache.org/jira/browse/SOLR-2637
>
> In previous versions you can access the
Hi,
I've wondered the same thing myself. I feel like the "clean" parameter has
something to do with it, but it doesn't work as I'd expect either. Thanks
in advance to anyone who can answer this question.
*clean*: (default 'true'). Tells whether to clean up the index before the
indexing is started.
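For reference, the parameter rides on the DIH command URL, so a full-import that keeps existing documents looks like this (host, port, and handler path are the usual defaults; adjust to your setup):

  http://localhost:8983/solr/dataimport?command=full-import&clean=false&commit=true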
Hi Dimitry, I pointed to the wiki page to enable JMX, then you can use any
tool that visualizes JMX stuff like Zabbix. See
http://www.lucidimagination.com/blog/2011/10/02/monitoring-apache-solr-and-lucidworks-with-zabbix/
On Wed, Dec 7, 2011 at 11:49 AM, Dmitry Kan wrote:
> The culprit seems to
I have a unique ID defined for the documents I am indexing. I want to avoid
overwriting the documents that have already been indexed. I am using
XPathEntityProcessor and TikaEntityProcessor to process the documents.
The DataImportHandler does not seem to have the option to set
overwrite=false. I h
Hi Kissue,
Support for grouping on SolrJ was added in Solr 3.4, see
https://issues.apache.org/jira/browse/SOLR-2637
In previous versions you can access the grouping results by simply
traversing the various named lists.
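For the pre-3.4 case, that traversal looks roughly like this sketch (the group field name and server URL are assumptions):

  import java.util.List;
  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;
  import org.apache.solr.common.SolrDocumentList;
  import org.apache.solr.common.util.NamedList;

  public class GroupingDemo {
    public static void main(String[] args) throws Exception {
      SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
      SolrQuery query = new SolrQuery("*:*");
      query.set("group", true);
      query.set("group.field", "myGroupField");
      QueryResponse rsp = server.query(query);
      // Older SolrJ has no typed accessor for grouping, so walk the raw response
      NamedList<?> grouped = (NamedList<?>) rsp.getResponse().get("grouped");
      NamedList<?> byField = (NamedList<?>) grouped.get("myGroupField");
      List<?> groups = (List<?>) byField.get("groups");
      for (Object g : groups) {
        NamedList<?> group = (NamedList<?>) g;
        Object groupValue = group.get("groupValue");                      // value grouped on
        SolrDocumentList docs = (SolrDocumentList) group.get("doclist"); // top docs in group
        System.out.println(groupValue + ": " + docs.getNumFound());
      }
    }
  }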
*Juan*
On Wed, Dec 7, 2011 at 1:22 PM, Kissue Kissue wrote:
> Hi,
>
> I
On Dec 7, 2011, at 13:20 , Shawn Heisey wrote:
> On 12/6/2011 2:06 PM, Erik Hatcher wrote:
>> I think the best thing that you could do here would be to lock in a version
>> of Lucene (all the Lucene libraries) that you use with SolrCloud. Certainly
>> not out of the realm of possibilities of s
On 12/6/2011 2:06 PM, Erik Hatcher wrote:
I think the best thing that you could do here would be to lock in a version of
Lucene (all the Lucene libraries) that you use with SolrCloud. Certainly not
out of the realm of possibilities of some upcoming SolrCloud capability that
requires some upgr
Hi Dmitry,
You should use SPM for Solr - it exposes all Solr metrics and more (JVM, system
info, etc.)
PLUS it's currently 100% free.
http://sematext.com/spm/solr-performance-monitoring/index.html
We use it with our clients on a regular basis and it helps us a TON - we just
helped a very popu
Hi Jiggy
When you query the index, what do you get in the tomcat logs? (Check that out
in tomcat/logs directory)
How much heap memory have you allocated to Tomcat?
- Yavar
From: jiggy [new...@trash-mail.com]
Sent: Wednesday, December 07, 2011 9:53 P
Hello Guys,
I have a big problem. I have integrated Solr with Magento EE. I have two solr
folders: one is in c:/tomcat 7.0/
and the other one is in my web folder (c:/www/).
In the tomcat folder is the data folder of solr; there are about 200 MB of
index files (I think this is my data from Magento).
In t
I have a complex edismax query:
facet=true&facet.mincount=0&qf=title^0.08+categorysearch^0.05+abstract^0.03+body^0.1&wt=javabin&rows=25&defType=edismax&version=2&omitHeader=true&fl=*,score&bq=eqid:(3yp^1.57+OR+5fi^1.55+OR+c1s^1.55+OR+3ym^1.55+OR+gjz^1.55...)&start=0&q=*:*&facet.field=category&face
Hi,
I am using Solr 3.3 with SolrJ. Does anybody know how i can use result
grouping with SolrJ? Particularly how i can retrieve the result grouping
results with SolrJ?
Any help will be much appreciated.
Thanks.
Thank you Erik, I will work on your suggestion! It seems it could work,
provided I can boost matches on the "redirect" document type
S
From: Erik Hatcher [erik.hatc...@gmail.com]
Sent: Wednesday, 7 December 2011 16:56
To: solr-user@lucene.apache.org
Subject:
Can I use a XPathEntityProcessor in conjunction with an
ExtractingRequestHandler? Also, the scripting language that
XPathEntityProcessor uses/supports, is that just ECMA/JavaScript?
Or is XPathEntityProcessor only supported for use in conjunction with the
DataImportHandler?
Thanks.
What you can do is index the "redirect" documents along with the associated
words, and let Solr do the stemming. Maybe add a "document type" field and if
you get a match on a redirect document type, your web service can do what it
needs to do from there.
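A sketch of what such a document might look like (field names are hypothetical):

  <add>
    <doc>
      <field name="id">redirect-001</field>
      <field name="doc_type">redirect</field>
      <field name="words">trousers</field>
      <field name="redirect_url">http://www.example.com/trousers</field>
    </doc>
  </add>

When a search matches a doc with doc_type:redirect, the web service reads redirect_url and issues the redirect itself.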
Erik
On Dec 7, 2011, at 10:
This article shouldn't flat out make the decision for you, but these concerns
raised by the guys at StackOverflow (over SQL Server 2008) helped guide us
toward Solr:
http://www.infoq.com/news/2008/11/SQL-Server-Text
--
Hector
On Dec 7, 2011, at 2:17 AM, Mersad wrote:
> hi Everyone,
>
No, actually it's a .NET web service that queries Endeca (call it Wrapper). It
returns to its clients a collection of unique product IDs; then the client will
ask other web services for more detailed information about the given products.
As long as no URL redirection is involved, I think that s
Sorry if this question sounds stupid, but I am really really confused about
this. Is there actually a difference between field collapsing and result
grouping in SOLR?
I have come across articles that have talked about setting up field
collapsing with commands that look different from the grouping o
Thanks for the correction, I did not notice that.
Spark
2011/12/7 Mark Miller
> Well, if that is exactly what you put, it's wrong. That second one should
> be softAutoCommit.
>
> On Wednesday, December 7, 2011, yu shen wrote:
> > Hi All,
> >
> > I tried using solr 4 nightly build: apache-s
Jamie -
The details would of course be entirely dependent on what changed, but with
Lucene trunk/4.0 there is the flexible indexing API with codecs. I imagine
with a compatibility codec layer one could provide some insulation to changes.
You're at big scale, so the "just reindex everything" an
How did you upgrade? What steps did you follow? Do you have
any custom code? Any additional entries in your
solrconfig.xml?
These details help us diagnose your problem, but it's almost certain
that you have a mixture of jar files lying around your machine in
a place you don't expect.
Best
Erick
The culprit seems to be the merger (frontend) SOLR. Talking to one shard
directly takes substantially less time (1-2 sec).
On Wed, Dec 7, 2011 at 4:10 PM, Dmitry Kan wrote:
> Tomás: thanks. The page you gave didn't mention cache specifically, is
> there more documentation on this specifically? I
I am trying to index pdf and word documents in solr 3.3.0 + apache
tika using SOLRJ. I am able to search the documents by file name,
but when I try to search for any text in the content (text
data in the file), no documents show up in the response. Do I need t
On 07.12.2011 15:09, Finotti Simone wrote:
I got your and Michael's point. Indeed, I'm not very skilled in web development
so there may be something that I'm missing. Anyway, Endeca does something like
this:
1. accepts a query;
2. does the stemming;
3. checks if the result of step 2 matches
Thanks for the response Erick.
On Wed, Dec 7, 2011 at 9:08 AM, Erick Erickson wrote:
> Not that I know of. That's one drawback to being on the bleeding edge, when
> the index format changes you have to re-index...
>
> Best
> Erick
>
> On Tue, Dec 6, 2011 at 10:09 AM, Jamie Johnson wrote:
>> Are t
Hello,
I'm trying to use the SolrUIMA component of solr 3.4.0. I modified the
solrconfig.xml file in the following way:
C:\Users\Stefano\workspace2\UimaComplete\descriptors\analysis_engine\AggregateAE.xml
true
false
te
Tomás: thanks. The page you gave didn't mention caches specifically; is
there more documentation on that? I have used the solrmeter tool,
which draws the cache diagrams; is there a similar tool that would use
JMX directly and present the cache usage at runtime?
pravesh:
I have increased
I got your and Michael's point. Indeed, I'm not very skilled in web development
so there may be something that I'm missing. Anyway, Endeca does something like
this:
1. accepts a query;
2. does the stemming;
3. checks if the result of step 2 matches one of the redirectable words. If
so, returns
Well, if that is exactly what you put, it's wrong. That second one should
be softAutoCommit.
On Wednesday, December 7, 2011, yu shen wrote:
> Hi All,
>
> I tried using solr 4 nightly build: apache-solr-4.0-2011-12-06_08-52-46.
> And try to enable autoSoftCommit like below in solrconfig.xml
>
>
Not that I know of. That's one drawback to being on the bleeding edge, when
the index format changes you have to re-index...
Best
Erick
On Tue, Dec 6, 2011 at 10:09 AM, Jamie Johnson wrote:
> Are there any migration utilities to move from an index built by a
> Solr 4.0 snapshot to Solr Trunk? Th
In your example you'll have 10 facets returned each with a value of 1.
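A quick illustration with hypothetical data: index a single document whose multivalued CAT field holds v1 through v10, then query

  q=*:*&facet=true&facet.field=CAT

and the facet_counts section lists v1 (1) through v10 (1). Each count is the number of matching documents containing that value, not the number of value occurrences.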
Best
Erick
On Tue, Dec 6, 2011 at 9:54 AM, wrote:
> Sorry to jump into this thread, but are you saying that the facet count is
> not # of result hits?
>
> So if I have 1 document with field CAT that has 10 values and I do a
I ran some more tests. I added an explicit commit after each deleteByQuery()
call and removed the add/reindex step. This hung up immediately and completed
(or timed out?) after 20 minutes. The hangs occur almost exactly 20 minutes
apart. Could this be a Tomcat issue?
I ran jconsole but did
Erik,
Do you have any details behind what would be required to write a tool
to move from one index format to another? Any examples/suggestions
would be appreciated.
On Tue, Dec 6, 2011 at 5:19 PM, Jamie Johnson wrote:
> What about modifying something like SolrIndexConfig.java to change the
> lu
Either way (Endeca's 307, which seems crazy to me) or simply plucking off a
"url" field from the first document returned in a search request... you're
getting a URL back to your client and then using that URL to further send back
to a user's browser, I presume. I personally wouldn't implement it
On 07.12.2011 14:26, Finotti Simone wrote:
That's the scenario:
I have an XML that maps words W to URLs; when a search request is issued by my
web client, a query will be issued to my Solr application. If, after stemming,
the query matches any in W, the client must be redirected to the associ
That's the scenario:
I have an XML that maps words W to URLs; when a search request is issued by my
web client, a query will be issued to my Solr application. If, after stemming,
the query matches any in W, the client must be redirected to the associated URL.
I agree that it should be handled ou
First, could you tell us more about your use case? Why do you want to change
the response code? HTTP 307 = Temporary redirect - where are you going to
redirect? Sounds like something best handled outside of Solr.
If you went down the route of creating your own custom response writer, then
Hi,
I'm actually having the exact same problem. Did you ever find a solution
for this?
cheers
Maurizio
Hi Dimitry, cache information is exposed via JMX, so you should be able to
monitor that information with any JMX tool. See
http://wiki.apache.org/solr/SolrJmx
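Enabling it is a one-line addition to solrconfig.xml (per that wiki page):

  <config>
    ...
    <jmx />
    ...
  </config>

With no attributes, Solr attaches to an existing MBeanServer, e.g. when the JVM is started with -Dcom.sun.management.jmxremote.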
On Wed, Dec 7, 2011 at 6:19 AM, Dmitry Kan wrote:
> Yes, we do require that much.
> Ok, thanks, I will try increasing the maxsize.
>
> On
Hello,
I need to change the HTTP result code of the query result if some conditions
are met.
Analyzing the flow of execution of the Solr query process, it seems to me that
the "place" that fits best is the QueryResponseWriter. Anyway, I didn't find a
way to change the HTTP request layout (I need
Hi. I experience an issue where Solr is using huge amounts of I/O.
Basically it uses the whole HDD continuously, leaving nothing to the
other processes. Solr is called by a script which continuously indexes
some files.
The index is around 800MB and I can't understand why it could thrash
the HDD so
Is it not possible to expose the shards to your IP and eclipse-debug the
queries via the solr frontend? If you need to intercept the queries between
frontend and shards in a non-windows environment, you could try wireshark
or tcpmon (http://ws.apache.org/commons/tcpmon/)
On Wed, Dec 7, 2011 at 10:
Hi Hoss,
Thanks for getting back to me on this.
>: I've been trying to use the UUIDField in solr to maintain ids of the
>: pages I've crawled with nutch (as per
>: http://wiki.apache.org/solr/UniqueKey). The use case is that I want to
>: have the server able to use these ids in another database
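For anyone following along, the wiki pattern referenced above is roughly this schema.xml fragment:

  <fieldType name="uuid" class="solr.UUIDField" indexed="true" />
  <field name="id" type="uuid" indexed="true" stored="true" default="NEW" />
  <uniqueKey>id</uniqueKey>

With default="NEW", Solr generates a fresh UUID for any document added without an id.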
Yes, we do require that much.
Ok, thanks, I will try increasing the maxsize.
On Wed, Dec 7, 2011 at 10:56 AM, pravesh wrote:
> >>facet.limit=50
> your facet.limit seems too high. Do you actually require this much?
>
> Since there are a lot of evictions from the filterCache, increase the maxsize
Was that field multivalued="true" earlier by any chance??? Did you rebuild
the index from scratch after changing it to multivalued="false" ???
Regards
Pravesh
>>facet.limit=50
your facet.limit seems too high. Do you actually require this much?
Since there are a lot of evictions from the filterCache, increase the maxsize
value to your acceptable limit.
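Concretely, that means raising the size attribute on the filterCache entry in solrconfig.xml; the numbers below are placeholders to tune against your eviction rate:

  <filterCache class="solr.FastLRUCache"
               size="16384"
               initialSize="4096"
               autowarmCount="4096"/>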
Regards
Pravesh
I am already using Eclipse with Jetty for debugging, but it is really hectic
when we have shards and queries going to each shard. I want to skip that and
inspect the traffic in Fiddler instead.
--
Kashif Khan. B.E.,
+91 99805 57379
http://www.kashifkhan.in
On Wed, Dec 7, 2011 at 12:54 PM, Dmitry Kan [via Lucene] <
ml
Hi,
in my index schema I have defined a
DictionaryCompoundWordTokenFilterFactory and a
HunspellStemFilterFactory. Each FilterFactory has a dictionary with
about 100k entries.
To avoid an out of memory error I have to set the heap space to 128m
for 1 index.
Is there a way to reduce the memory cons
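For context, the chain described would look something like this in schema.xml (dictionary and affix file names are hypothetical):

  <fieldType name="text_compound" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.DictionaryCompoundWordTokenFilterFactory"
              dictionary="compound-dictionary.txt" minWordSize="5"/>
      <filter class="solr.HunspellStemFilterFactory"
              dictionary="lang.dic" affix="lang.aff"/>
    </analyzer>
  </fieldType>

Both filters load their dictionaries into the heap, which is where the memory pressure described above comes from.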
Hi All,
I tried using solr 4 nightly build: apache-solr-4.0-2011-12-06_08-52-46.
And try to enable autoSoftCommit like below in solrconfig.xml
10
1000
I tried to add a document to this solr instance using the solrj client in the
nightly build. I did see a commit time boost. Single docume
Hello,
I'm using edismax and Solr 4.0 and I'd like to add fuzzy parameters for
some fields like this:
my_field1~2 my_field2 my_field3
Unfortunately it doesn't work, so I tried the following approaches:
1) /select?q=my_search_string~2 => of course it applies to *all* fields of
my edismax query, an