Hi All
I have an issue with highlighting: if I query Solr on more than one field,
like "+Contents:risk +Form:1", then even if I specify the highlighting field as
"Contents" it still highlights "risk" as well as "1", because both are specified
in the query. Now if I split the query as "+Contents:risk" I
Another feature missing in DIH is the ability to pass parameters into your
queries. If one could pass a named or positional parameter for an entity
query, it would give them a lot of freedom to optimize their delta or full-load
queries. One can even get creative with entity and delta queries that can
take
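Just to illustrate the idea, here is a rough sketch of what a parameterized
delta query could look like if request-parameter substitution (something like
${dataimporter.request.lastLoadDate}) were available; the entity, table, and
parameter names below are made up:

  <entity name="item"
          query="select id, title from item"
          deltaQuery="select id from item
                      where updated_at &gt; '${dataimporter.request.lastLoadDate}'"/>

One could then drive it with something like
/dataimport?command=delta-import&lastLoadDate=2010-09-17.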
Hi,
It's great to see such a fantastic response to this thread - NRT is
alive and well!
I'm hoping to collate this information and add it to the wiki when I
get a few free cycles (thanks Erik for the heads up).
In the meantime, I thought I'd add a few tidbits of additional
information that might
The entry for each term in the terms dict stores a long file offset
pointer into the .frq file, and another long for the .prx file.
But these longs are delta-coded, so as you scan you have to sum up
these deltas to get the absolute file pointers.
The terms index (once loaded into RAM) has absolute longs, too.
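To make the delta-coding concrete, here is a small hedged sketch (not actual
Lucene code; the array and method names are invented) of turning the stored
per-term deltas back into absolute file pointers by keeping a running sum:

  // Hypothetical sketch, not actual Lucene code.
  static long[] toAbsolutePointers(long[] deltas) {
      long[] absolute = new long[deltas.length];
      long pointer = 0;
      for (int i = 0; i < deltas.length; i++) {
          pointer += deltas[i];   // each entry stores only the gap to the previous pointer
          absolute[i] = pointer;  // absolute offset into .frq (or .prx) for term i
      }
      return absolute;
  }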
Hi
My index has fields named ad_title, ad_description & ad_post_date. Let's
suppose a user searches for more than one keyword; then I want the documents
with the maximum occurrence of all the keywords together to come out on top. The
closer the keywords are in ad_title & ad_description, the more weight should be given
A slightly different route to take, but one that should help test/refine a
semantic parser, is Wikipedia. They make available their entire corpus, or
any subset you define. The whole thing is like 14 terabytes, but you can get
smaller sets.
Those are at least 3 different questions. Easiest first, sorting:
add &sort=ad_post_date+desc (or asc) for sorting on date,
descending or ascending.
Check out http://www.supermind.org/blog/378/lucene-scoring-for-dummies for how
Lucene scores by default. It might be close to what you want. The
@Markus Jelsma - the wiki confirms what I said before:
rows
This parameter is used to paginate results from a query. When
specified, it indicates the maximum number of documents from the
complete result set to return to the client for every request. (You
can consider it as the maximum number of re
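As an aside, a minimal example of paging with start and rows (host, core and
page size here are placeholders):
  http://localhost:8983/solr/select?q=*:*&start=0&rows=10     (first page)
  http://localhost:8983/solr/select?q=*:*&start=10&rows=10    (second page)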
Sounds like you want the
http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor
CachedSqlEntityProcessor. It lets you make one query that is cached locally
and can be joined to by a separate query.
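For reference, a hedged sketch of how that can look in data-config.xml (the
entity, table and column names below are invented):

  <entity name="item" query="select id, title from item">
    <entity name="item_tag"
            processor="CachedSqlEntityProcessor"
            query="select item_id, tag from item_tag"
            where="item_id=item.id"/>
  </entity>

The inner query runs once and is cached, and each outer row is joined against
the cache via the where= key.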
Chris, I agree, having the ability to set rows to something like -1 to bring
back everything would be convenient. However, the 2-call approach
(q=blah&rows=0 followed by q=blah&rows=numFound) isn't that slow, and does
give you more information up front. You can optimize your Array or List<>
sizes in
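For what it's worth, a hedged SolrJ sketch of that 2-call pattern (class names
as they were in the 1.4/3.x client; the URL, query text and field name are
placeholders):

  import java.net.MalformedURLException;
  import java.util.ArrayList;
  import java.util.List;

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.SolrServerException;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;
  import org.apache.solr.common.SolrDocument;

  public class TwoCallFetch {
      public static void main(String[] args) throws MalformedURLException, SolrServerException {
          SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

          // Call 1: rows=0 just to learn numFound.
          SolrQuery query = new SolrQuery("blah");
          query.setRows(0);
          long numFound = server.query(query).getResults().getNumFound();

          // Call 2: fetch everything, pre-sizing the client-side list from numFound.
          query.setRows((int) numFound);
          QueryResponse rsp = server.query(query);
          List<String> ids = new ArrayList<String>((int) numFound);
          for (SolrDocument doc : rsp.getResults()) {
              ids.add((String) doc.getFieldValue("id"));
          }
          System.out.println("Fetched " + ids.size() + " of " + numFound + " documents");
      }
  }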
You don't give an indication of size. How large are the documents being
indexed, and how many of them are there? However, my opinion would be a
single index with an 'active' flag. In your queries you can use
filter queries (fq=) to restrict to just active documents if you wish, or just
inactive if that is ne
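For example (assuming a boolean field named 'active' in your schema; the rest
of the query is made up):
  http://localhost:8983/solr/select?q=ad_title:bike&fq=active:true
  http://localhost:8983/solr/select?q=ad_title:bike&fq=active:false
The fq clause is cached in the filter cache, so the active/inactive
restriction stays cheap across queries.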
What is it about the standard relevance ranking that doesn't suit your
needs?
And note that if you sort by your date field, relevance doesn't matter at
all
because the date sort overrides all the scoring, by definition.
Best
Erick
On Fri, Sep 17, 2010 at 6:57 AM, Pawan Darira wrote:
> Hi
>
> My
Sure - start here: http://wiki.apache.org/solr/SolrLogging
Solr uses java.util.logging out of the box.
You will end up with something like this:
java.util.logging.FileHandler.limit=102400
java.util.logging.FileHandler.count=5
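A slightly fuller logging.properties sketch along those lines (the file
pattern and level are just an example, adjust to taste):
java.util.logging.FileHandler.pattern=logs/solr_%g.log
java.util.logging.FileHandler.formatter=java.util.logging.SimpleFormatter
handlers=java.util.logging.FileHandler
.level=INFO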
- Mark
lucidimagination.com
On 9/14/10 2:02 PM, Vladimir Sutskever wr
Hi,
I'm trying to filter and sort by distance with this URL:
http://localhost:8080/solr/select/?q=*:*&fq={!sfilt%20fl=loc_lat_lon}&pt=52.02694,-0.49567&d=2&sort={!func}hsin(52.02694,-0.49567,loc_lat_lon_0_d,%20loc_lat_lon_1_d,3963.205)asc
Filtering is fine, but it fails while parsing the sort wi
OK, 1.5 won't be released, so we'll avoid that. I've now got my code
additions compiling against a version of 3.x so we'll stick with that
rather than solr_trunk for the time being.
Does anyone have any sense of when 3.x might be considered stable
enough for a release? We're hoping to go
> The terms index (once loaded into RAM) has absolute longs, too.
So in the TermInfo index (.tii), the FreqDelta, ProxDelta, and SkipDelta stored
with each TermInfo are actually absolute?
-Original Message-
From: Michael McCandless [mailto:luc...@mikemccandless.com]
Sent: Friday, Septemb
I'm sorry to bother you all with this, but is there a way to search through
the mailing list archive? I've found
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/ so far,
but there isn't any convenient way to search through the archive.
Thanks for your help
(10/09/17 16:36), Ahson Iqbal wrote:
Hi All
I have an issue with highlighting: if I query Solr on more than one field,
like "+Contents:risk +Form:1", then even if I specify the highlighting field as
"Contents" it still highlights "risk" as well as "1", because both are specified in the
query.. now if I s
http://www.lucidimagination.com/search/?q=
On Friday 17 September 2010 16:10:23 alexander sulz wrote:
> Im sry to bother you all with this, but is there a way to search through
> the mailinglist archive? Ive found
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/ so far
> but there i
The 3.x line should be pretty stable. Hopefully we will do a release
soon. A conversation was again started about more frequent releases
recently, and hopefully that will lead to a 3.x release near term.
In any case, 3.x is the stable branch - 4.x is where the more crazy
stuff happens. If you are
I think we aim for a "stable" trunk (4.0-dev) too, as we always have
(in the functional sense... i.e. operate correctly, don't crash, etc).
The stability is more a reference to API stability - the Java APIs are
much more likely to change on trunk. Solr's *external* APIs are much
less likely to ch
Yes.
They are decoded from the deltas in the tii file into absolutes in
memory, on load.
Note that trunk (w/ flex indexing) has changed this substantially: we
store only the offset into the terms dict file, as an absolute in a
packed int array (no object per indexed term). Then, at the seek
poin
Hi Koji,
thank you very much, it really works.
From: Koji Sekiguchi
To: solr-user@lucene.apache.org
Sent: Fri, September 17, 2010 7:11:31 PM
Subject: Re: Solr Highlighting Issue
(10/09/17 16:36), Ahson Iqbal wrote:
> Hi All
>
> I have an issue in highlighting
Interesting. Thanks for your help Mike!
-Original Message-
From: Michael McCandless [mailto:luc...@mikemccandless.com]
Sent: Friday, September 17, 2010 10:29 AM
To: solr-user@lucene.apache.org
Subject: Re: Understanding Lucene's File Format
Yes.
They are decoded from the deltas in the t
I agree it's mainly API wise, but there are other issues - largely due
to Lucene right now - consider the bugs that have been dug up this year
on the 4.x line because flex has been such a large rewrite deep in
Lucene. We wouldn't do flex on the 3.x stable line and it's taken a
while for everything
On Fri, Sep 17, 2010 at 10:46 AM, Mark Miller wrote:
> I agree it's mainly API wise, but there are other issues - largely due
> to Lucene right now - consider the bugs that have been dug up this year
> on the 4.x line because flex has been such a large rewrite deep in
> Lucene. We wouldn't do flex
I'm a total Lucene/SOLR newbie, and I'm surprised to see that when there are
multiple search terms, term proximity isn't part of the scoring process. Has
anyone on the list done custom scoring that weights proximity?
Andy Cogan
-Original Message-
From: kenf_nc [mailto:ken.fos...@realestat
You're welcome!
Mike
On Fri, Sep 17, 2010 at 10:44 AM, Giovanni Fernandez-Kincade
wrote:
> Interesting. Thanks for your help Mike!
>
> -Original Message-
> From: Michael McCandless [mailto:luc...@mikemccandless.com]
> Sent: Friday, September 17, 2010 10:29 AM
> To: solr-user@lucene.apach
>
> What I am envisioning (at least to start) is have all this add two fields in
> the index. One would be for color information for the color similarity
> search. The other would be a simple multivalued text field that we put
> keywords into based on what OpenCV can detect about the image. If i
Also there is http://lucene.472066.n3.nabble.com/Solr-User-f472068.html if
you prefer a forum format.
On Fri, Sep 17, 2010 at 9:15 AM, Markus Jelsma wrote:
> http://www.lucidimagination.com/search/?q=
>
>
> On Friday 17 September 2010 16:10:23 alexander sulz wrote:
> > Im sry to bother you all
Go ahead and put an absurdly large value as the rows parameter.
Then wait, because that query is going to take a really long time; it can
interfere with every other query on the Solr server (denial of service), and
quite possibly cause your client to run out of memory as it parses the result.
A
Or, for a fascinating multi-dimensional UI to mailing list archives:
http://markmail.org/ --wunder
On Sep 17, 2010, at 7:15 AM, Markus Jelsma wrote:
> http://www.lucidimagination.com/search/?q=
>
>
> On Friday 17 September 2010 16:10:23 alexander sulz wrote:
>> Im sry to bother you all with
Thanks for being so helpful! You really helped me to answer my
question! You aren't condescending at all!
I'm not using it to pull down *everything* that the Solr instance
stores, just a portion of it. Currently, I need to get 16 records at
once, not just the 10 that show. So I have the rows s
The problem, and it's a practical one, is that terms usually have to be
pretty
close to each other for proximity to matter, and you can get this with
phrase queries by varying the slop.
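For example (the field and phrase are made up; the number after ~ is the slop,
i.e. how far apart the terms may sit):
  q=ad_description:"risk management"~3
matches documents where the two words appear within a few positions of each
other, while
  q=ad_description:"risk management"
requires the exact phrase.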
FWIW
Erick
On Fri, Sep 17, 2010 at 11:05 AM, Andrew Cogan
wrote:
> I'm a total Lucene/SOLR newbie, and I'm sur
Hi everyone.
I'm successfully indexing PDF files right now, but I still have some problems.
1. Tika seems to map some content to appropriate fields in my schema.xml.
If I pass a literal.title=blabla parameter, Tika may have parsed some
information out of the PDF to fill in the field "title" its
Many thank yous to all of you :)
Am 17.09.2010 17:24, schrieb Walter Underwood:
Or, for a fascinating multi-dimensional UI to mailing list archives:
http://markmail.org/ --wunder
On Sep 17, 2010, at 7:15 AM, Markus Jelsma wrote:
http://www.lucidimagination.com/search/?q=
On Friday 17 Sep
How does highlighting work with JSON output?
Dennis Gearon
Signature Warning
EARTH has a Right To Life,
otherwise we all die.
Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php
--- On Fri, 9/17/10, Ahson Iqbal wrote:
> From: Ahson Iqbal
> Subject: Solr Hi
BTW, what is NRT?
Dennis Gearon
Signature Warning
EARTH has a Right To Life,
otherwise we all die.
Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php
--- On Fri, 9/17/10, Peter Sturge wrote:
> From: Peter Sturge
> Subject: Re: Tuning Solr caches with high
Well ..
> because the date sort overrides all the scoring, by
> definition.
THAT'S not good for what I want, LOL!
Is there any way to chain things like distance, date, relevancy, an integer
field to force sort order, like when using SQL 'ORDER BY', where the order of sort
is the order of listing?
Den
Near Real Time...
Erick
On Fri, Sep 17, 2010 at 12:55 PM, Dennis Gearon wrote:
> BTW, what is NRT?
>
> Dennis Gearon
>
> Signature Warning
>
> EARTH has a Right To Life,
> otherwise we all die.
>
> Read 'Hot, Flat, and Crowded'
> Laugh at http://www.yert.com/film.php
>
>
> ---
Sure, you can specify multiple sort fields. If the first sort field results
in a tie, then
the second is used to resolve. If both first and second match, then the
third is
used to break the tie.
Note that relevancy is tricky to include in the chain because it's
infrequent to have two
docs with exa
Yes. Just as you'd expect:
&sort=score asc,date desc,title asc [url encoded of course]
The only trick is knowing the special key 'score' for sorting by relevancy.
This is all in the wiki docs:
http://wiki.apache.org/solr/CommonQueryParameters#sort
Also keep in mind, as the docs say, sort
On Sep 17, 2010, at 10:00 AM, Dennis Gearon wrote:
> Well ..
>> because the date sort overrides all the scoring, by
>> definition.
>
> THAT'S not good for what I want, LOL!
>
> Is there any way to chain things like distance, date, relevancy, an integer
> field to force sort oder, like when usin
The users will be able to choose the order of sort based on distance, date and
time, and relevancy.
More than likely, my initial version will do range limits on distance and
date and time. Then relevancy will sort it and send it to the browser.
After that, the user will sort it in the browser as desired.
How does one 'vary the slop'?
Dennis Gearon
Signature Warning
EARTH has a Right To Life,
otherwise we all die.
Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php
--- On Fri, 9/17/10, Erick Erickson wrote:
> From: Erick Erickson
> Subject: Re: Can i do rel
This means both the indexing and the searching are NRT?
Dennis Gearon
Signature Warning
EARTH has a Right To Life,
otherwise we all die.
Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php
--- On Fri, 9/17/10, Erick Erickson wrote:
> From: Erick Erickson
>
Does Solr use Lucene NRT?
--- On Fri, 9/17/10, Erick Erickson wrote:
> From: Erick Erickson
> Subject: Re: Tuning Solr caches with high commit rates (NRT)
> To: solr-user@lucene.apache.org
> Date: Friday, September 17, 2010, 1:05 PM
> Near Real Time...
>
> Erick
>
> On Fri, Sep 17, 2010 at 12
On Fri, 17 Sep 2010 04:46:44 -0700 (PDT), kenf_nc
wrote:
>A slightly different route to take, but one that should help test/refine a
>semantic parser is wikipedia. They make available their entire corpus, or
>any subset you define. The whole thing is like 14 terabytes, but you can get
>smaller se
On 9/16/2010 12:27 PM, Dennis Gearon wrote:
Is a core a running piece of software, or just an index/config pairing?
Dennis Gearon
A core is one complete index within a Solr instance.
http://wiki.apache.org/solr/CoreAdmin
My master index servers have five cores - ncmain, ncrss, live, build,
On Thu, 16 Sep 2010 15:31:02 -0700, you wrote:
>The public terabyte dataset project would be a good match for what you
>need.
>
>http://bixolabs.com/datasets/public-terabyte-dataset-project/
>
>Of course, that means we have to actually finish the crawl & finalize
>the Avro format we use for th
On 9/17/2010 3:01 AM, Paul Dhaliwal wrote:
Another feature missing in DIH is ability to pass parameters into your
queries. If one could pass a named or positional parameter for an entity
query, it will give them lot of freedom to optimize their delta or full load
queries. One can even get creati
For some reason, when I run a query that has only two words in it, I get back
repeating results of the last word. If I search for something like
"good tonight", I get results like:
good tonight
tonight good
tonight
tonight
tonight
tonight
tonight
tonight
Basically, the first word if
All,
I have a new Windows 7 machine and have been trying to import an RSS feed
like in the SlashDot example that is included in the software. My dataConfig
file looks fine.
url="http://rss.slashdot.org/Slashdot/slashdot"
processor="XPathEntityProcessor"
Hi,
I would like a json result like that:
{
id:2342,
name:"Abracadabra",
metadatas: [
{type:"tag", name:"tutorial"},
{type:"value", name:"2323.434/434"},
]
}
Is it possible?
On Fri, Sep 17, 2010 at 4:12 PM, facholi wrote:
>
> Hi,
>
> I would like a json result like that:
>
> {
> id:2342,
> name:"Abracadabra",
> metadatas: [
> {type:"tag", name:"tutorial"},
> {type:"value", name:"2323.434/434"},
> ]
> }
Do you mean JSON with the tags not quoted (that
That's pretty good stuff to know, thanks everybody.
For my application, it's pretty hard to do crawling and universally assign
desired fields from the text returned.
However, I would WELCOME someone with that expertise into the company when it
gets funded, to prove me wrong :-)
Dennis Gearon
Solr 4.x has new NRT stuff included (uses latest Lucene 3.x, includes
per-segment faceting etc.). The Solr 3.x branch doesn't currently..
On Fri, Sep 17, 2010 at 8:06 PM, Andy wrote:
> Does Solr use Lucene NRT?
>
> --- On Fri, 9/17/10, Erick Erickson wrote:
>
>> From: Erick Erickson
>> Subject
An essential problem is that Solr does not let you update just one
field. When an ad changes from active to inactive, you have to reindex
the whole document. If you have large documents (large text fields for
example) this is a big pain.
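For example, flipping one ad from active to inactive still means re-posting
the whole document (field names here are just guesses at the schema from
earlier in the thread):

  <add>
    <doc>
      <field name="id">ad-123</field>
      <field name="ad_title">Mountain bike for sale</field>
      <field name="ad_description">... the full original description again ...</field>
      <field name="active">false</field>   <!-- the only value that actually changed -->
    </doc>
  </add>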
On Fri, Sep 17, 2010 at 5:37 AM, kenf_nc wrote:
>
> You don
Look up _docid_ on the Solr wiki. It lets you walk the entire index
about as fast as possible.
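A hedged example of what that walk can look like (host and page size are
arbitrary):
  http://localhost:8983/solr/select?q=*:*&sort=_docid_+asc&start=0&rows=1000
  http://localhost:8983/solr/select?q=*:*&sort=_docid_+asc&start=1000&rows=1000
and so on, bumping start by the page size until you've seen numFound documents.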
On Fri, Sep 17, 2010 at 8:47 AM, Christopher Gross wrote:
> Thanks for being so helpful! You really helped me to answer my
> question! You aren't condescending at all!
>
> I'm not using it to pull dow
Tika is not perfect. Very much not perfect. I've seen a 10-15% failure
rate on randomly sampled files. It works for creating searchable text
fields, but not for text fields to return. That is, the analyzers rip
out the nulls and make an intelligible stream of words.
If you want to save these words
And http://www.lucidimagination.com/Search
taptaptap calling Otis taptaptap
On Fri, Sep 17, 2010 at 9:30 AM, alexander sulz wrote:
> Many thank yous to all of you :)
>
> Am 17.09.2010 17:24, schrieb Walter Underwood:
>>
>> Or, for a fascinating multi-dimensional UI to mailing list archives:
>>
The same as with other formats. You give it strings to drop in before
and after the highlighted text.
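For example (field name carried over from earlier in the thread; the pre/post
markers are whatever you want wrapped around the hits):
  ...&wt=json&hl=true&hl.fl=Contents&hl.simple.pre=<em>&hl.simple.post=</em>
The snippets come back in a separate "highlighting" section of the JSON
response, keyed by the document's unique key.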
On Fri, Sep 17, 2010 at 9:48 AM, Dennis Gearon wrote:
> How does highlighting work with JSON output?
>
> Dennis Gearon
>
> Signature Warning
>
> EARTH has a Right To Life,
> oth
http://wiki.apache.org/solr/CommonQueryParameters?action=fullsearch&context=180&value=slop&fullsearch=Text
On Fri, Sep 17, 2010 at 10:55 AM, Dennis Gearon wrote:
> HOw does one 'vary the slop'?
>
> Dennis Gearon
>
> Signature Warning
>
> EARTH has a Right To Life,
> otherwise we
I suspect that you're seeing the default query operator
in action, as an OR. We could tell more if you posted
the results of your query with &debugQuery=on
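For example (query text taken from the original post):
  ...&q=good+tonight&debugQuery=on
and, to see whether the implicit OR is the culprit, try forcing both terms:
  ...&q=good+AND+tonight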
Best
Erick
On Fri, Sep 17, 2010 at 3:58 PM, wrote:
> For some reason, when I run a query that has only two words in it, I get
> back repeat
Wow, that's a lot to learn. At some point, I need to really dig in, or find
some pretty pictures, graphical aids.
Dennis Gearon
Signature Warning
EARTH has a Right To Life,
otherwise we all die.
Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php
--- On Fri
: Is it possible to use mergeindexes action using EmbeddedSolrServer?
: Thanks in advance
I haven't tried it, but this should be the same as any other feature of
the CoreAdminHandler -- construct an instance using your CoreContainer,
and then execute the appropriate request directly.
(you may
Brad:
1) if you haven't already figured this out, I would suggest emailing the
java-user mailing list. It's got a bigger collection of users who are
familiar with the internals of the Lucene-Java API (that's the level it
seems like you are having difficulty at)
2) Maybe you mentioned your sor
'slop' is an actual argument!?!? LOL!
I thought you were just describing some ASPECT of the search process, not its
workings :-)
Dennis Gearon
Signature Warning
EARTH has a Right To Life,
otherwise we all die.
Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.p
: I would like to drop ft_text and make each index shard 3GB smaller, but make
: it so that any queries which use ft_text get automatically redirected to
: catchall. Ultimately we will be replacing catchall with dismax and
: eliminating it. After the switch to dismax is complete and catchall is
: During the actual import - SOLR complains because its looking for method
: with signature transformRow(Map row)
It would be helpful if you could clarify what you mean by "complains".
Are you getting an error? a message in the logs? what exactly does it
say? (please cut/paste and provide plen
On 9/17/2010 7:22 PM, Chris Hostetter wrote:
a) not really. assuming you have no problem modifying the indexing code
in the way you want, and are primarily worried about searching from
various clients, then the most straight forward approach is probably to
use RewriteRules (or something equivi
: Reindexing with a +1MILLI hack had occurred to me and I guess that's what
: I'll do in the meantime; it just seemed like something that people must have
: run into before! I suppose it depends on the granularity of your
people have definitely run into it before, and most of them (that i know
: I use the PingRequestHandler option that tells my load balancer whether a
: machine is available.
:
: When the service is disabled, every one of those requests, which my load
: balancer makes every five seconds, results in the following in the log:
:
: Sep 9, 2010 6:06:58 PM org.apache.solr.c
: Since Lucene 3.0.2 is 'out there', does this mean the format is nailed down,
: and some sort of porting is possible?
: Does anyone know of a tool that can read the entire contents of a Solr index
: and (re)write it another? (as an indexing operation - eg 2.9 -> 3.0.x, so not
: repl)
3.0.2 shoul
: stores, just a portion of it. Currently, I need to get 16 records at
: once, not just the 10 that show. So I have the rows set to "99" for
: the testing phase, and I can increase it later. I just wanted to have
: a better way of getting all the results that didn't require hard
: coding a value