Can you try this?
bin/nutch solrindex http://127.0.0.1:8080/solr/ crawl/crawldb -linkdb
crawl/linkdb crawl/segments/*
Thanks! So Solr 4.7 does not seem to respect the luceneMatchVersion on the
binary (index) level. Or perhaps I misunderstand the meaning of
luceneMatchVersion.
This is what I see when loading the index from HDFS via Luke and launching the
Index Checker tool:
[clip]
Segments file=segments_2 numSeg
You can use PowerShell in windows to kick off a URL at a scheduled time.
On Thu, Apr 10, 2014 at 11:02 PM, harshrossi wrote:
> I am using *DeltaImportHandler* for indexing data in Solr. Currently I am
> manually indexing the data into Solr by selecting commands full-import or
> delta-import fr
DataImportHandler is just a URL call. You can see the specific URL you
want to call by opening the debugger window in Chrome/Firefox and looking
at the network tab.
Then, you have a general problem of how to call a URL from Windows
Scheduler. Google brings a lot of results for that, so you should be
a
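Since the DIH command is just an HTTP GET, any scheduler that can run a script can trigger it. A minimal Python sketch of building such a URL (the host, port, and core name "mycore" are placeholder assumptions):

```python
from urllib.parse import urlencode

def dih_command_url(base, command, clean=True, commit=True):
    """Build a DataImportHandler command URL, e.g. for full-import or delta-import."""
    params = urlencode({
        "command": command,           # "full-import" or "delta-import"
        "clean": str(clean).lower(),  # wipe the index first (typically full-import only)
        "commit": str(commit).lower()
    })
    return f"{base}/dataimport?{params}"

# The URL a scheduled task would fetch (host and core name are placeholders)
url = dih_command_url("http://localhost:8983/solr/mycore", "delta-import", clean=False)
```

A Windows Scheduled Task (or a PowerShell one-liner fetching the URL) can then hit that endpoint at the desired interval.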
I am using *DeltaImportHandler* for indexing data in Solr. Currently I am
manually indexing the data into Solr by selecting commands full-import or
delta-import from the Solr Admin screen.
I am using Windows 7 and would like to automate the process by specifying a
certain time interval for executi
Does your Solr schema match the data output by nutch? It’s up to you to create
a Solr schema that matches the output of nutch – read up on the nutch doc for
that info. Solr doesn’t define that info, nutch does.
-- Jack Krupansky
From: Xavier Morera
Sent: Thursday, April 10, 2014 12:58 PM
To: s
It's fine Erick, I am guessing that maybe &fq=(SKU:204-161)... this SKU
with that value is present in all results; that's why Name products are not
getting boosted.
Ravi: check your results without filtering: do all the results
include SKU:204-161?
I guess this may help.
On Fri, Apr 11, 201
First, there is no "master" node, just leaders and replicas. But that's a nit.
No real clue why you would be going out of memory. Deleting a
document, even by query, should just mark the docs as deleted, a pretty
low-cost operation.
How much memory are you giving the JVM?
Best,
Erick
On Thu, Apr
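For context on why delete-by-query is cheap: it posts a small update message that only flags the matching docs; disk space is reclaimed later when segments merge. A sketch of the payload (the field name expired_b is made up):

```python
def delete_by_query_xml(query):
    """Build the XML body POSTed to the /update handler.
    Matching docs are only marked deleted; merges reclaim the space later."""
    return f"<delete><query>{query}</query></delete>"

body = delete_by_query_xml("expired_b:true")
```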
Aman:
Oops, looked at the wrong part of the query, didn't see the bq clause.
You're right of course. Sorry for the misdirection.
Erick
Thanks for your detailed answer, Trey! I guess it helps to have just
written that book :) By the way - I am eager to get it on our platform
(safariflow.com -- but I think it hasn't arrived from Manning yet).
I had a half-baked idea about using a prefix like that. It did seem like
it would be so
Yes, I see - I could essentially do the tokenization "myself" (or using
some Analyzer chain) in an Update Processor. Yes I think that could
work. Thanks, Alex!
-Mike
On 4/10/14 10:09 PM, Alexandre Rafalovitch wrote:
It's an interesting question.
To start from, the copyField copies the sour
Hi Michael,
It IS possible to utilize multiple Analyzers within a single field, but
it's not a "built in" capability of Solr right now. I wrote something I
called a "MultiTextField" which provides this capability, and you can see
the code here:
https://github.com/treygrainger/solr-in-action/tree/m
Thanks sir,
in that case I need to know about SVN as well.
Thanks
Aman Tandon
On Fri, Apr 11, 2014 at 7:26 AM, Alexandre Rafalovitch
wrote:
> You can find the read-only Git's version of Lucene+Solr source code
> here: https://github.com/apache/lucene-solr . The SVN preference is
> Apache Founda
It's an interesting question.
To start with, copyField copies the source content, so there is no
source-related tokenization description, only the target's. So
that approach is not suitable.
Regarding the lookups/auto-complete. There has been a bunch of various
implementations added rece
You can find the read-only Git's version of Lucene+Solr source code
here: https://github.com/apache/lucene-solr . The SVN preference is
the Apache Foundation's choice and legacy. Most of the developers'
workflows are also built around SVN.
Regards,
Alex.
Personal website: http://www.outerthoughts.com/
C
Put separate issues into separate emails. That way new people will
look at the new thread. As it was, it was out of the conversation flow
and got lost.
Regards,
Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency
[solr version 4.3.1]
Hello,
I have a Solr cloud (4 nodes - 2 shards) with a fairly large number of
documents (~360G of index per shard). Now, a major portion of the data is
not required and I need to delete those documents. I would need to delete
around 75% of the data.
One of the solutions could b
The lack of response to this question makes me think that either there
is no good answer, or maybe the question was too obtuse. So I'll give
it one more go with some more detail ...
My main goal is to implement autocompletion with a mix of words and
short phrases, where the words are drawn fr
Hi,
I am new here. I have a question: why do we prefer SVN
over Git?
--
With Regards
Aman Tandon
Hello Erick,
I am confused here: how can the boost have no effect if he is boosting
the Name products by 2? He is filtering the results and then
applying the boost.
On Fri, Apr 11, 2014 at 6:12 AM, Aman Tandon wrote:
> Hi Ravi,
>
> For the better analysis for ranking of document
Hi Ravi,
For a better analysis of the ranking of documents, you need to query
the index with these extra parameters:
e.g. whole_query&debug=true&wt=xml
Copy that XML and paste it into http://explain.solr.pl/ ; you can then
easily find out the ranking analysis in the forms of the p
What Shawn said. q=*:* is a "constant-score query", i.e. every match
has a score of 1.0.
fq clauses don't contribute to the score. The boosts you're specifying
have absolutely no effect.
Move the fq clause to your main query (q=) to see any effect.
Try adding &debug=all to your query and look at th
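To make the fq-vs-q point concrete, here is a sketch of the two request forms built with Python's urllib (field names taken from the thread; host and handler omitted). Only the second form can change the ranking, because fq clauses never contribute to the score:

```python
from urllib.parse import urlencode

# Boost buried in a filter query: matches are filtered,
# but every doc still scores the constant 1.0 from q=*:*
filtered = urlencode({"q": "*:*", "fq": 'SKU:204-161 OR Name:"204-161"^2'})

# Boost moved into the main query: the Name clause now contributes to the score
scored = urlencode({"q": 'SKU:204-161 OR Name:"204-161"^2', "debug": "all"})
```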
Any help related to my previous mail update?
On Thu, Apr 10, 2014 at 7:52 PM, Aman Tandon wrote:
> thanks sir, i always smile when people here are always ready for help, i
> am thankful to all, and yes i started learning by reading daily at least
> 50-60 mails to increase my knowledge gave my s
Chris,
Thank you for the detailed explanation, this helps a lot.
One of my current hurdles is my search system is in Java using Lucene Query
objects to construct a BooleanQuery which is then handed to Solr. Since Lucene
does not know about the LocalParams it's tricky to get them to play properly
Mark: first off, the details matter.
Nothing in your first email made it clear that the {!join} query you were
referring to was not the entirety of your query param -- which is part of
the confusion and was a significant piece of Shawn's answer. Had you
posted the *exact* request you were sendi
All,
What is the best practice or guideline towards considering multiple
collections particularly in the solr cloud env?
Thanks
Srikanth
On Sun, Mar 16, 2014 at 2:47 PM, danny teichthal wrote:
>
> To make things short, I would like to use block joins, but to be able to
> index each document on the block separately.
>
> Is it possible?
>
No way. Use a query-time {!join}, or denormalize and then use field collapsing.
--
Sincerely yours
Mi
It sounds like you can make it work with the frange qparser plugin:
fq={!frange l=0 u=0}sub(field(a),field(b))
Joel Bernstein
Search Engineer at Heliosearch
On Thu, Apr 10, 2014 at 3:36 PM, Erick Erickson wrote:
> Uhhhm, did you look at function queries at all? That should work for you.
>
>
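In plain terms, {!frange l=0 u=0}sub(field(a),field(b)) computes a - b per document and keeps documents where the result falls in [0, 0], i.e. where a equals b. A small Python model of that semantics (the sample docs are invented):

```python
def frange_filter(docs, func, lower, upper):
    """Keep docs whose computed function value lies in [lower, upper],
    mimicking what fq={!frange l=.. u=..}func does."""
    return [d for d in docs if lower <= func(d) <= upper]

docs = [{"a": 5, "b": 5}, {"a": 3, "b": 7}, {"a": 2, "b": 2}]
# sub(field(a), field(b)) with l=0, u=0 keeps only docs where a == b
matched = frange_filter(docs, lambda d: d["a"] - d["b"], 0, 0)
```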
There’s no such other location in there. BTW, you can disable the mtree merge
via --reducers=-2 (or --reducers=0 in old versions) .
Wolfgang.
On Apr 10, 2014, at 3:44 PM, Dmitry Kan wrote:
> a correction: actually when I tested the above change I had so little data,
> that it didn't trigger su
Hi there
I am using solrcloud (4.3). I am trying to get the status of a core from
solr using (localhost:8000/solr/admin/cores?action=STATUS&core=) and
I get the following output (XML tags are lost in this quote):
100
102
2
20527
20
false
What does current mean? A few of the cores are optimized (with segment
count 1) and show
On 4/8/2014 22:00 GMT Shawn Heisey wrote:
>On 4/8/2014 1:48 PM, Mark Olsen wrote:
>> Solr version 4.2.1
>>
>> I'm having an issue using a "join" query with a range query, but only when
>> the query is wrapped in parens.
>>
>> This query works:
>>
>> {!join from=member_profile_doc_id to=id}language
Erick, below is the query part:
select?q=*:*&fq={!join%20from=SKU%20to=SKU%20fromIndex=Collection2}(CatalogName:*Products)&fq=(SKU:204-161)%20OR%20(Name:%22204-161%22)&bq=Name:%22204-161%22^2
I am not getting the Name-matching record first in the list; I am always
getting the SKU-matching record.
An
Uhhhm, did you look at function queries at all? That should work for you.
You might want to review:
http://wiki.apache.org/solr/UsingMailingLists
Best,
Erick
On Thu, Apr 10, 2014 at 11:51 AM, horot wrote:
> Values come from the Solr doc. I can not get to compare the two fields to
> get some res
On 4/10/2014 12:49 PM, EXTERNAL Taminidi Ravi (ETI,
Automotive-Service-Solutions) wrote:
Hi, I am looking at boosting to see if I can achieve ranking equal to MS SQL
Server.
I have a query something like
&fq=(SKU:123-87458) OR Name:" 123-87458"
I need to get the Exact Match as first in the re
What kind of field is "Name"? Assuming it's a string, you should be able
to boost it. Boosts are not relevant to filter (fq) clauses at all;
where were you trying to add the boost?
You need to provide significantly more information to get a more
helpful answer. You might review:
http://wiki.apache
Values come from the Solr doc. I cannot manage to compare the two fields to
get some result. The logic of such a query: x <> '' and y <> '' and x = y.
It's something like
q=x:* AND y:* AND x:y
but the problem is that the fields cannot be compared in this direct form. If
someone knows how to solve this p
Hi, I am looking at boosting to see if I can achieve ranking equal to MS SQL
Server.
I have a query something like
&fq=(SKU:123-87458) OR Name:" 123-87458"
I need to get the Exact Match as first in the results, In this case SKU. But
also I can change to display Name in the List which is not
Where are the values coming from? You might be able to use the _val_
hook for function queries if they're in fields in the doc. Or if it's
a constant you pass in
Let's claim it's just a value in your document. Can't you just form a
filter query on it?
Details matter, there's not enough info he
hi
Is it possible to compare two fields in a Solr filter?
How best to build a filter like x - y = 0, i.e. get all records where x = y?
bq: The SQL query contains a Replace statement that does this
Well, I suspect that's where the issue is. The facet values being
reported include:
134,826
which indicates that the incoming text to Solr still has the commas.
Solr is seeing the commas and all.
You can cure this by using PatternReplac
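The filter Erick is pointing at applies a regex replacement during analysis; its effect on values like the one above can be modeled with a plain regex (the exact pattern is an assumption; adjust it to the real separators):

```python
import re

def strip_thousands_separators(token):
    """Model of a pattern-replace analysis step with pattern="," replacement="",
    so a value like "134,826" is indexed and faceted as "134826"."""
    return re.sub(r",", "", token)

cleaned = strip_thousands_separators("134,826")
```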
My analysis chain includes CJKBigramFilter on both the index and query.
I have outputUnigrams enabled on the index side, but it is disabled on
the query side. This has resulted in a problem with phrase queries.
This is a subset of my index analysis for the three terms you can see in
the ICUN
fwiw,
Facets are much less heap-greedy when counted on docValues-enabled fields;
they should not hit UnInvertedField in this case. Try them.
On Thu, Apr 10, 2014 at 8:20 PM, Toke Eskildsen wrote:
> Shawn Heisey [s...@elyograg.org] wrote:
> >On 4/9/2014 11:53 PM, Toke Eskildsen wrote:
> >> The m
The SQL query contains a Replace statement that does this
> -Original Message-
> From: Shawn Heisey [mailto:s...@elyograg.org]
> Sent: April-10-14 11:30 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Were changes made to facetting on multivalued fields recently?
>
> On 4/10/2014 9:14
Shawn Heisey [s...@elyograg.org] wrote:
>On 4/9/2014 11:53 PM, Toke Eskildsen wrote:
>> The memory allocation for enum is both low and independent of the amount
>> of unique values in the facets. The trade-off is that it is very slow
>> for medium- to high-cardinality fields.
> This is where it is
Hi Ayush,
I think this:
""IBM!12345". The exclamation mark ('!') is critical here, as it distinguishes
the prefix used to determine which shard to direct the document to."
https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud
On Thursday, April 10, 2014 2:3
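The routing scheme quoted above can be sketched in a few lines; documents whose IDs share the same prefix before '!' land on the same shard (the prefix and id values here are invented):

```python
def composite_id(route_prefix, doc_id):
    """Compose a SolrCloud compositeId: the part before '!' drives shard
    routing, so all docs with the same prefix co-locate on one shard."""
    return f"{route_prefix}!{doc_id}"

doc_id = composite_id("IBM", "12345")
```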
On 4/10/2014 9:14 AM, Jean-Sebastien Vachon wrote:
> Here are the field definitions for both our old and new index... as you can
> see, they are identical. We've been using this chain and field type starting
> with Solr 1.4 and never had any problem. As for the documents, both indexes
> are using
So you're saying that you have B_1 - B_8 in one doc, B_9 - B_16 in
another doc etc?
What's so confusing is that in your first e-mail, you said:
bq: This denormalization grows the index size with a factor 100 in worse case.
Which I took to mean you have at most 100 of these fields.
Please look at
On 4/9/2014 11:53 PM, Toke Eskildsen wrote:
>> This does not happen with the 'old' method 'facet.method=enum' - memory
>> usage is stable and solr is unbreakable with my hold-reload test.
>
> The memory allocation for enum is both low and independent of the amount
> of unique values in the facets.
Here are the field definitions for both our old and new index... as you can see,
they are identical. We've been using this chain and field type starting with
Solr 1.4 and never had any problem. As for the documents, both indexes are
using the same data source. They could be slightly out of sync f
On 4/10/2014 12:40 AM, Atanas Atanasov wrote:
> I need some help. After updating to SOLR 4.4 the tomcat process is
> consuming about 2GBs of memory, the CPU usage is about 40% after the start
> for about 10 minutes. However, the bigger problem is, I have about 1000
> cores and seems that for each c
Erick Erickson wrote
> Well, you're constructing the URL somewhere, you can choose the right
> boost there can't you?
Yes of course!
As example:
We have one filter field called FILTER which can have unlimited values across
all documents.
Each document has on average 8 values set for FILTER (e.g.
bq: But it is possible to select a different boost field depending on
the current filter query?
Well, you're constructing the URL somewhere, you can choose the right
boost there can't you?
I don't understand this bit:
Well, it's basically one multivalued field that can have unlimited
values and has
Erick Erickson wrote
>
> So why not index these boosts in separate fields in the document (e.g.
> f1_boost, f2_boost etc) and use a function query
> (https://cwiki.apache.org/confluence/display/solr/Function+Queries) at
> query time to boost by the correct one?
Well, it's basically one multivalued
Okay okay, I am a CentOS user in the office, Windows at home :D
Thanks
Aman Tandon
On Thu, Apr 10, 2014 at 8:01 PM, Atanas Atanasov wrote:
> Hi,
>
> I see the threads of the tomcat7.exe process in the Windows Task manager.
>
> Regards,
> Atanas Atanasov
>
>
> On Thu, Apr 10, 2014 at 5:28 PM, Aman Ta
Hi,
I see the threads of the tomcat7.exe process in the Windows Task manager.
Regards,
Atanas Atanasov
On Thu, Apr 10, 2014 at 5:28 PM, Aman Tandon wrote:
> Hi Atanas,
>
> I have a question: how do I know how many threads Tomcat
> has?
>
> Thanks
> Aman Tandon
>
>
> On Thu, Apr
Hi Atanas,
I have a question: how do I know how many threads Tomcat
has?
Thanks
Aman Tandon
On Thu, Apr 10, 2014 at 7:53 PM, Erick Erickson wrote:
> I don't expect having equal values to make a noticeable difference,
> except possibly in some corner cases. Setting them equal is
I don't expect having equal values to make a noticeable difference,
except possibly in some corner cases. Setting them equal is mostly for
avoiding surprises...
Erick
On Thu, Apr 10, 2014 at 7:17 AM, Atanas Atanasov wrote:
> Thanks for the tip,
>
> I already set the core properties. Now tomcat h
Thanks sir, I always smile because people here are always ready to help; I am
thankful to all. And yes, I started learning by reading at least 50-60 mails
daily to increase my knowledge, giving my suggestions when I am familiar with a
topic; people here correct me as well if I am wrong. I know it will take time
Hmmm, I scanned your question, so maybe I missed something. It sounds
like you have a fixed number of filters known at index time, right? So
why not index these boosts in separate fields in the document (e.g.
f1_boost, f2_boost etc) and use a function query
(https://cwiki.apache.org/confluence/disp
Thanks for the tip,
I already set the core properties. Now tomcat has only 27 threads after
start up, which is awesome.
Works fine, first search is not noticeably slower than before.
I'll put equal values for Xmx and Xms and see if there will be any
difference.
Regards,
Atanas
On Thu, Apr 10, 2
Please provide suggestions on what could be the reason for this.
Thanks,
On Thu, Apr 10, 2014 at 2:54 PM, Jilani Shaik wrote:
> Hi,
>
> When I queried terms component with a "terms.prefix" the QTime for it is
> <100 milliseconds, whereas the same query I am giving with "terms.lower"
> then the
Trying to fit 1,000 cores in 6G of memory is... interesting. That's a
lot of stuff in a small amount of memory. I hope these cores' indexes
are tiny.
The lazy-loading bit for cores has a price. The first user in will pay
the warmup penalty for that core while it loads. This may or may not
be notic
Aman:
Here's another helpful resource:
http://wiki.apache.org/solr/HowToContribute
It tells you how to get the source code, set up an IDE etc. for Solr/Lucene
In addition to Alexandre's suggestions, one possibility (but I warn
you it can be challenging) is to create unit tests. Part of the buil
a correction: actually when I tested the above change I had so little data,
that it didn't trigger sub-shard slicing and thus merging of the slices.
Still, it looks as if somewhere in the map-reduce contrib code there is a
"link" to which Lucene version to use.
Wolfgang, do you happen to know where th
Hi,
We are migrating from Solr 4.6 standalone to the Solr 4.7 cloud version; while
reindexing the documents we are getting the following error. This is happening
when the unique key has a special character; this was not noticed in version 4.6
standalone mode, so we are not sure if this is a version pro
Hello,
We have a denormalized index where certain documents point in essence to the
same content.
The relevance of the documents depends on the current context. E.g. document
A has a different boost factor when we apply filter F1 compared to when we
use filter F2 (or F3, etc).
To support this we
Thanks for responding, Wolfgang.
Changing to LUCENE_43:
IndexWriterConfig writerConfig = new IndexWriterConfig(Version.LUCENE_43,
null);
didn't affect the index format version, because, I believe, if the
format of the index to merge has been of higher version (4.1 in this case),
it will merge
Hi,
When I queried the terms component with "terms.prefix", the QTime for it is
<100 milliseconds, whereas when I give the same query with "terms.lower"
the QTime is >500 milliseconds. I am using Solr Cloud.
In both cases I am giving terms.limit as 60 and terms.sort=index.
Query1 Param
mark.
On Thu, Apr 10, 2014 at 2:14 PM, Atanas Atanasov wrote:
> SEVERE: null:ClientAbortException: java.net.SocketException: Software
> caused connection abort: socket write error
Separate issue, but most likely the client closed the browser and the
server had nowhere to send the response to. So, it c
Thanks for the quick responses,
I have allocated 1GB min and 6 GB max memory to Java. The cache settings
are the default ones (maybe this is a good point to start).
All cores share the same schema and config.
I'll try setting the
loadOnStartup=false and transient=true options for each core and see
Thanks sir, I will look into this.
Solr and its developers are all helpful and awesome; I am feeling great.
Thanks
Aman Tandon
On Thu, Apr 10, 2014 at 12:29 PM, Alexandre Rafalovitch
wrote:
> Sure, you can do it in Java too. The difference is that Solr comes
> with Java client SolrJ which is tes
Sure, you can do it in Java too. The difference is that Solr comes
with Java client SolrJ which is tested and kept up-to-date. But there
could still be more tutorials.
For other languages/clients, there is a lot less information
available. Especially, if you start adding (human) languages into it