already processed. Fetch records from Solr. For each record,
> > check the new DB to see if the record is already processed.
> >
> > Regards
> > Aditya
> > www.findbestopensource.com
> >
> >
> >
> >
> >
> > On Mon, Jul 29, 2013 at 10:26 AM, Joe Zhang
Basically, I was thinking about running a range query like Shawn suggested
on the tstamp field, but unfortunately it was not indexed. Range queries
only work on indexed fields, right?
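
For illustration, the range query being discussed could be built roughly as follows. This is only a sketch: it assumes tstamp is a date field declared with indexed="true" in schema.xml, and the helper name and the date window are made up for the example.

```python
from urllib.parse import urlencode

def tstamp_range_params(start, end, rows=100):
    """Return an encoded query string selecting docs with start <= tstamp <= end."""
    # Hypothetical helper. Square brackets make both bounds inclusive.
    range_filter = "tstamp:[{} TO {}]".format(start, end)
    return urlencode({"q": "*:*", "fq": range_filter, "rows": rows})

# Example window (invented dates); append the result to the select handler URL.
query_string = tstamp_range_params("2013-07-01T00:00:00Z", "2013-07-28T00:00:00Z")
print(query_string)
```

If the field is not indexed, it would have to be changed in the schema and the documents reindexed before a filter like this returns anything useful.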
On Sun, Jul 28, 2013 at 9:49 PM, Joe Zhang wrote:
> I've been thinking about the tstamp solution in the
better performance, first I'd load just all the IDs,
> > after, during processing I'd load each document.
> > As for the incremental requirement, it should not be difficult to
> > write a hash function which maps a non-numerical id to a value.
> &
etion.
On Sat, Jul 27, 2013 at 10:28 AM, Shawn Heisey wrote:
> On 7/27/2013 11:17 AM, Joe Zhang wrote:
> > Thanks for sharing, Roman. I'll look into your code.
> >
> > One more thought on your suggestion, Shawn. In fact, for the id, we need
> > more than "
my
> current workload time, it will take longer and also somebody else will
> *have to* invest their time and energy in testing it, reporting, etc. Of
> course, feel free to create the jira yourself or reuse the code -
> hopefully, you will improve it and let me know ;-)
>
> Roman
Thanks.
On Fri, Jul 26, 2013 at 11:34 PM, Shawn Heisey wrote:
> On 7/27/2013 12:30 AM, Joe Zhang wrote:
> > ==> so a "url" field would work fine?
>
> As long as it's guaranteed unique on every document (especially if it is
> your uniqueKey) and goes in
On Fri, Jul 26, 2013 at 11:18 PM, Shawn Heisey wrote:
> On 7/26/2013 11:50 PM, Joe Zhang wrote:
> > ==> Essentially we are doing pagination here, right? If performance is not
> > the concern, given that the index is dynamic, does the order of
> > entries remain stab
On a related note, inspired by what you said, Shawn, an auto-increment id seems
perfect here. Yet I found there is no such support in Solr. The UUID only
guarantees uniqueness.
On Fri, Jul 26, 2013 at 10:50 PM, Joe Zhang wrote:
> Thanks for your kind reply, Shawn.
>
> On Fri, Jul 26, 2013
Thanks for your kind reply, Shawn.
On Fri, Jul 26, 2013 at 10:27 PM, Shawn Heisey wrote:
> On 7/26/2013 11:02 PM, Joe Zhang wrote:
> > I have an ever-growing solr repository, and I need to process every single
> > document to extract statistics. What would be a reaso
Dear list:
I have an ever-growing solr repository, and I need to process every single
document to extract statistics. What would be a reasonable process that
satisfies the following properties:
- Exhaustive: I have to traverse every single document
- Incremental: in other words, it has to allow me
Erick
>
> On Tue, Jul 23, 2013 at 1:43 AM, Jack Krupansky
> wrote:
> > That means that for that document "china" occurs in the title, whereas
> > "snowden" is found in the document but not in the title.
> >
> >
> > -- Jack Krupansky
>
Is my reading correct that the boost is only applied on "china" but not
"snowden"? How can that be?
My query is: q=china+snowden&qf=title^10 content
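
A toy sketch of why the boost can show up on one term only, assuming the dismax parser from the original question (defType=dismax): each query term is scored per field, the best boosted field score is taken for that term, and the terms are summed. The raw scores below are invented and the function is not Lucene's actual scoring code (it ignores coord, queryNorm and so on).

```python
def dismax_score(per_field_scores, boosts, tie=0.0):
    """Toy dismax: per term, take the best boosted field score, then sum over terms."""
    total = 0.0
    for term, fields in per_field_scores.items():
        # Apply the qf boost of each field (default 1.0) to that field's raw score.
        boosted = [score * boosts.get(field, 1.0) for field, score in fields.items()]
        if not boosted:
            continue
        best = max(boosted)
        # With tie=0.0 only the best field counts for this term.
        total += best + tie * (sum(boosted) - best)
    return total

# Invented raw scores: "china" matches title and content, "snowden" only content.
scores = {
    "china": {"title": 0.5, "content": 0.2},
    "snowden": {"content": 0.3},
}
boosts = {"title": 10.0}  # qf=title^10 content
print(dismax_score(scores, boosts))
```

In this sketch "snowden" never matches the title, so the title^10 boost has nothing to multiply for that term, which is consistent with Jack's reading of the debug output.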
On Mon, Jul 22, 2013 at 9:43 PM, Joe Zhang wrote:
> Thanks for your hint, Jack. Here is the debug results, which I
score is dominated by your query terms in the non-title fields.
>
> -- Jack Krupansky
>
> -Original Message- From: Joe Zhang
> Sent: Monday, July 22, 2013 11:06 PM
> To: solr-user@lucene.apache.org
> Subject: Question about field boost
>
>
> Dear Solr experts:
>
>
Dear Solr experts:
Here is my query:
defType=dismax&q=term1+term2&qf=title^100 content
Apparently (at least I thought) my intention is to boost the title field.
While I'm getting some non-trivial results, I'm surprised that the
documents with both term1 and term2 in title (I know such docs do ex
g/apache/lucene/search/similarities/TFIDFSimilarity.html>
>
> You would need to talk to the Nutch guys to see why THEY are setting
> document boost to 0.0.
>
>
> -- Jack Krupansky
>
> -Original Message- From: Joe Zhang
> Sent: Friday, July 12, 2013 11:57 PM
> To:
ansky wrote:
> Did you put a boost of 0.0 on the documents, as opposed to the default of
> 1.0?
>
> x * 0.0 = 0.0
>
> -- Jack Krupansky
>
> -Original Message- From: Joe Zhang
> Sent: Friday, July 12, 2013 10:31 PM
> To: solr-user@lucene.apache.org
> Subject: zer
when I search a keyword (such as "apple"), most of the docs carry 0.0 as
score. Here is an example from explain:
<str name="http://www.bloomberg.com/slideshow/2013-07-12/world-at-work-india.html">
0.0 = (MATCH) fieldWeight(content:appl in 51), product of:
1.0 = tf(termFreq(content:appl)=1)
2.
Can somebody help with this one, please?
On Fri, Jun 21, 2013 at 10:36 PM, Joe Zhang wrote:
> A quite standard configuration of nutch seems to automatically map "url"
> to "id". Two questions:
>
> - Where is such mapping defined? I can't find it anywhere i
A quite standard configuration of nutch seems to automatically map "url" to
"id". Two questions:
- Where is such mapping defined? I can't find it anywhere in nutch-site.xml
or schema.xml. The latter does define the "id" field as well as its
uniqueness, but not the mapping.
- Given that nutch nutc
ce to see it structured
> properly.
>
> Upayavira
>
> On Tue, Jun 18, 2013, at 02:52 PM, Joe Zhang wrote:
> > I did include "debugQuery=on" in the query, but nothing extra showed up in
> > the response.
> >
> >
> > On Mon, Jun 17, 2013 at 10:29 PM,
I did include "debugQuery=on" in the query, but nothing extra showed up in
the response.
On Mon, Jun 17, 2013 at 10:29 PM, Gora Mohanty wrote:
> On 18 June 2013 10:49, Joe Zhang wrote:
> > I issued a simple query ("apple") to my collection and got 201 documen
I issued a simple query ("apple") to my collection and got 201 documents
back, all of which are scored 0. What does this mean? --- The documents do
contain the query words.
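
One explanation offered in this thread is a document boost of 0.0 ("x * 0.0 = 0.0"). The sketch below only illustrates that arithmetic, assuming the classic TF-IDF fieldWeight product (tf * idf * fieldNorm) with the index-time boost folded into fieldNorm; the idf and norm values are made up.

```python
import math

def field_weight(term_freq, idf, field_norm):
    """Toy version of the fieldWeight product: tf * idf * fieldNorm."""
    tf = math.sqrt(term_freq)  # classic TF-IDF similarity uses sqrt(freq)
    return tf * idf * field_norm

# Invented idf and norm values; only the zero factor matters here.
normal = field_weight(term_freq=1, idf=2.5, field_norm=0.5)
zeroed = field_weight(term_freq=1, idf=2.5, field_norm=0.0)
print(normal, zeroed)
```

Any zero factor in the product forces the whole field weight to zero, however well the document matches the query words.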
Thank you very much! This is a good starting point!
On Fri, Dec 21, 2012 at 6:15 AM, Erick Erickson wrote:
> Have you seen the functions here:
> http://wiki.apache.org/solr/FunctionQuery#Relevance_Functions
>
> Best
> Erick
>
>
> On Thu, Dec 20, 2012 at 1:18 PM, Joe Zhang
the problem.
>
> You can also change splitOnCaseChange="1" to splitOnCaseChange="0" to
> avoid the term splitting issue.
>
> Be sure to completely reindex in either case.
>
> -- Jack Krupansky
>
> -Original Message- From: Joe Zhang
> Sent:
I have a search like this:
When I query "COST", it gives reasonable results (n1);
When I query "CoSt", however, it gives me n2 (>n1) results, and I can't
locate actual
s are included.
>
>
> On Mon, Dec 3, 2012 at 3:04 PM, Joe Zhang wrote:
>
> > In other words, what I wanted to achieve is case-sensitive indexing on a
> > small set of words. Can anybody help?
> >
> > On Sun, Dec 2, 2012 at 11:56 PM, Joe Zhang wrote:
>
In other words, what I wanted to achieve is case-sensitive indexing on a
small set of words. Can anybody help?
On Sun, Dec 2, 2012 at 11:56 PM, Joe Zhang wrote:
> To be more specific, this is the data type I was using:
>
> positionIncremen
To be more specific, this is the data type I was using:
On Sun, Dec 2, 2012 at 11:51 PM, Joe Zhang wrote:
> yes, that is
ache/solr/analysis/KeepWordFilter.html
> ,
> I am pretty sure it is the correct behavior of this filter :)
>
> I guess you are trying to use this filter to index some special words in
> Chinese?
>
>
> On Mon, Dec 3, 2012 at 1:54 PM, Joe Zhang wrote:
>
> > I defined the
Sorry I didn't make it perfectly clear. The "id" field is URL.
On Sun, Dec 2, 2012 at 11:33 PM, Joe Zhang wrote:
> Thanks!
>
>
> On Sun, Dec 2, 2012 at 11:20 PM, Xi Shen wrote:
>
>> If the value for "id" field is the same, the old entry will b
Thanks!
On Sun, Dec 2, 2012 at 11:20 PM, Xi Shen wrote:
> If the value for "id" field is the same, the old entry will be updated; if
> it is new, a new entry will be created & indexed.
>
> This is my experience. :)
>
>
> On Mon, Dec 3, 2012 at 1:45 PM
I defined the following data type in my solr schema.xml
When I use the type "testkeep" to index a test field, my true expectation
was to make sure solr indexes the uppercase form of a small list of words
in the file, AND TREAT EVERY OTHER WORD AS USUAL. The goal of securing the
clo
This is very helpful. Thanks a lot, Shawn and Dikchant!
So in the default single-core situation, the index would live in data/index,
correct?
On Fri, Nov 30, 2012 at 11:02 PM, Shawn Heisey wrote:
> On 11/30/2012 10:11 PM, Joe Zhang wrote:
>
>> May I ask: how to set up multiple indexes,
That is really strange. So basic stopwords such as "a" and "the" are not
eliminated from the index?
On Tue, Nov 27, 2012 at 11:16 PM, 曹霖 wrote:
> just no stopwords are considered in that case
>
> 2012/11/28 Joe Zhang
>
> > t no stopwords are considered in
> > this case
> >
>