Just to confirm I'm not doing something insane, this is my general setup:
- index approx 1MM documents including HTML, pictures, office files, etc.
- files are not local to solr process
- use upload/extract to extract text from them through tika
- use commit=1 on each POST (reasons below)
- use op
I've found that when running a SolrJ client on J2SE 1.5 Update 21, in addition
to the jars in the dist/solrj-lib directory I need slf4j-jdk14-1.5.5.jar in
the lib directory, otherwise I get an exception where it can't find
org.slf4j.impl.StaticLoggerBinder.
-Jon
---
: aha. the type is "sint"
:
: do i need to use "string" or another field type that does not use any tokenizer? ^^
: i thought sint was untokenized...
You have to be explicit about what you mean -- if by sint you are referring
to something like this from the Solr 1.4 example schema...
...then you ar
: thanks for your explanations. But why are all docs being *removed* from the
: set of all docs that contain R in their topic field? This would correspond to
: a boolean AND and would stand in conflict with the clause q.op=OR. This seems
: a bit strange to me.
Erick's explanation might have been
: The changes to AbstractSubTypeFieldType do not have any adverse effects on the
: solr.PointType class, so I'd quite like to suggest it gets included in the
: main solr source code. Where can I send a patch for someone to evaluate or
: should I just attach it to the issue in JIRA and see what hap
found the problem, please close/disregard this.
I needed to increase maxFieldLength
On Fri, Jul 2, 2010 at 3:47 PM, Moises Muratalla wrote:
> here is some more info
>
> I added this line
>
>
> 03320001 Travel Germany Arrive next day
>
>
> It finds the "03320001" but none of the tokens on th
: I am using Windows XP, curl 7.19.5, Solr 1.4.1
:
: the command is:
:
: curl http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true'
-F "myfi...@tutorial.pdf"
:
: I got error :
: HTTP Error: 400. missing content stream.
: 'commit' is not a recognized as an internal or extern
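For what it's worth, that last error comes from the Windows shell, not Solr: the unquoted `&` splits the line, so `commit=true` is run as a separate command. A sketch of the fix is to quote the whole URL (double quotes on Windows cmd, single or double quotes on a Unix shell). The `-F` parameter below is an assumption based on the Solr tutorial (`myfile=@tutorial.pdf`); the archive has obscured the original.

```shell
# Quote the URL so the shell treats "&commit=true" as part of the
# argument instead of a command separator.
url="http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true"
# curl "$url" -F "myfile=@tutorial.pdf"   # actual upload, not run here;
#                                         # -F value assumed from the tutorial
printf '%s\n' "$url"
```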
: > depending on words in the query i select the correct urls by applying
: > mathematical formulae. This result should be shown to the user in
: > descending order.
: >
: > Now as i know that lucene has its own searcher
: > which is used by solr as well. cant i replace this searcher part in
: > S
Hi Jan,
Thanks for this suggestion. If we choose parsing, then why don't we do it on the
indexing side instead of the querying side, which might slow down the
search process? I.e., if a document has "is_man=true" and "is_single=true", then
we populate a text field with the words "man" and "singl
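The index-side idea above can be sketched in a few lines: at document-build time, translate boolean flags into plain keywords in a searchable text field, so no query-time parsing is needed. The field names come from the thread; the keyword mapping is illustrative.

```python
# Map boolean flag fields to searchable keywords at index time.
# Flag names are from the thread; the word choices are made up.
FLAG_WORDS = {
    "is_man": "man",
    "is_single": "single",
    "has_job": "employed",
}

def enrich(doc):
    """Return a copy of doc with a 'keywords' field built from true flags."""
    words = [word for flag, word in FLAG_WORDS.items() if doc.get(flag) == "true"]
    out = dict(doc)
    out["keywords"] = " ".join(words)
    return out

print(enrich({"id": "1", "is_man": "true", "is_single": "true"})["keywords"])
# -> man single
```

The "keywords" field would then be a normal indexed text field in the schema, queried like any other.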
: In the default schema.xml, only text_rev fieldType has
: ReversedWildcardFilterFactory. The query below (manu_exact:*in) returns
: two documents (whose manu_exact is Belkin).
:
: Am i missing something?
:
:
http://localhost:8983/solr/select/?q=manu_exact%3A*in&version=2.2&start=0&rows=10&in
here is some more info
I added this line
03320001 Travel Germany Arrive next day
It finds the "03320001" but none of the tokens on the rest of the line
Here are the definitions from the schema
all
On Fri, Jul 2, 2010 at 3:23 PM, Moises Muratalla wrote
Hi,
I would rather go for the boolean variant and spend some time writing a query
parser which tries to understand all kinds of input people may make, mapping it
into boolean filters. In this way you can support both navigation and search
and keep both in sync, whatever people prefer to start w
> that's how SolrQueryParser works at the moment, yes.
In the default schema.xml, only text_rev fieldType has
ReversedWildcardFilterFactory. The query below (manu_exact:*in) returns two
documents (whose manu_exact is Belkin).
Am i missing something?
http://localhost:8983/solr/select/?q=manu_e
i tried openNLP but found it's not very good for search queries
because it uses grammar features like capitalization.
i coded up a bayesian model with mutual information to model dependence
between terms. ex. grouping "stanford university" together in the query
"stanford university solar"
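The mutual-information approach described above can be sketched as follows: count unigrams and adjacent bigrams over a corpus, then group two adjacent query terms when their pointwise mutual information exceeds a threshold. The corpus and threshold here are made up for illustration; a real system would use the indexed collection's statistics.

```python
import math
from collections import Counter

# Toy corpus standing in for real collection statistics.
docs = [
    "stanford university announces solar research grant",
    "stanford university opens new solar lab",
    "solar panels installed at the university",
    "stanford hosts solar energy conference",
]

unigrams, bigrams, total = Counter(), Counter(), 0
for d in docs:
    toks = d.split()
    unigrams.update(toks)
    bigrams.update(zip(toks, toks[1:]))
    total += len(toks)

def pmi(a, b):
    """Pointwise mutual information of the adjacent pair (a, b)."""
    p_ab = bigrams[(a, b)] / max(sum(bigrams.values()), 1)
    p_a = unigrams[a] / total
    p_b = unigrams[b] / total
    return math.log(p_ab / (p_a * p_b)) if p_ab > 0 else float("-inf")

def group(query, threshold=1.0):
    """Greedily quote adjacent term pairs whose PMI clears the threshold."""
    toks = query.split()
    out, i = [], 0
    while i < len(toks):
        if i + 1 < len(toks) and pmi(toks[i], toks[i + 1]) > threshold:
            out.append(f'"{toks[i]} {toks[i+1]}"')
            i += 2
        else:
            out.append(toks[i])
            i += 1
    return " ".join(out)

print(group("stanford university solar"))
# -> "stanford university" solar
```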
: Does this mean leading * operator can only be used with fields whose
: fieldType definition has ReversedWildcardFilterFactory at index time?
that's how SolrQueryParser works at the moment, yes.
-Hoss
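For anyone wondering what such a field type looks like, here is a trimmed sketch along the lines of the text_rev definition in the Solr 1.4 example schema. The attribute values are illustrative; check the example schema.xml shipped with your release for the exact definition.

```xml
<!-- Index-time reversal enables efficient leading-wildcard queries;
     the query analyzer deliberately omits the filter. -->
<fieldType name="text_rev" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
            maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```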
And what did you use for entity detection?
GATE,openNLP?
Do you mind sharing that please?
From: Tommy Chheng-2 [via Lucene]
[mailto:ml-node+939600-682384129-124...@n3.nabble.com]
Sent: Friday, July 02, 2010 3:20 PM
To: caman
Subject: Re: Query modification
Hi,
I actually did somethin
I posted some plain ASCII text files using the post.jar in the exampledocs
directory.
My queries work mostly.
One recent problem: I search for "germany", it only returns 9 results, when
there are actually 10.
Would the schema help? I didn't really modify the solrconfig file
On Thu, Jul 1, 2010
Hi,
I actually did something similar on http://researchwatch.net/
if you search for "stanford university solar", it will process the query
by tagging the stanford university to the organization field.
I created a querycomponent class and altered the query string like
this(in scala but transla
If I wanted to intercept a query and turn
q=romantic italian restaurant in seattle
into
q=romantic tag:restaurant city:seattle cuisine:italian
would I subclass QueryComponent, modify the query, and pass it to super? Or
is there a standard way already to do this?
What about changing it to
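The rewrite asked about above (turning free-text tokens into fielded constraints before the query reaches the parser) can be sketched with simple vocabulary lookups. The tag/city/cuisine vocabularies below are illustrative stand-ins for real dictionaries or an entity tagger; inside Solr this logic would live in a custom SearchComponent that mutates the q parameter before QueryComponent runs.

```python
# Hypothetical vocabularies mapping known tokens to fields.
FIELD_VOCAB = {
    "tag": {"restaurant", "bar", "cafe"},
    "city": {"seattle", "portland"},
    "cuisine": {"italian", "thai"},
}
STOPWORDS = {"in", "near", "the"}

def rewrite(query):
    """Replace recognized tokens with field:value clauses; keep the rest."""
    out = []
    for tok in query.lower().split():
        if tok in STOPWORDS:
            continue
        for field, vocab in FIELD_VOCAB.items():
            if tok in vocab:
                out.append(f"{field}:{tok}")
                break
        else:
            out.append(tok)
    return " ".join(out)

print(rewrite("romantic italian restaurant in seattle"))
# -> romantic cuisine:italian tag:restaurant city:seattle
```

Clause order differs from the hand-written target in the thread, but for a boolean query the order does not change the result set.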
> that's not correct. What SolrQueryParser does is
> check which field
> types use ReversedWildcardFilterFactory at indexing time,
> and then when
> parsing queries, it allows fields that use those field types to
> be parsed with a leading wildcard.
Does this mean leading * operator can only be u
: > I'm going to guess that is what you meant, that the very
: > presence of the
: > filter in the schema, whether it is used or not, allows you
: > to do wildcard
: > searches.
:
: Exactly.
that's not correct. What SolrQueryParser does is check which field
types use ReversedWildcardFilterFa
I read that article. However, the thing is I am trying to find an NLP
application to interact with solr (or even by itself) to do context
based searches.
I think Solr has a wordnet filter which I haven't looked into but so
far I haven't come across anything helpful in this regard (maybe
because I
: Yes, the StatsComponent returns the values in an XML.
:
: http://wiki.apache.org/solr/StatsComponent
the StatsComponent returns stats about document values -- not stats from
the SolrInfoMBeans.
: > I knew that the jsp page= http://localhost:8983/solr/admin/stats.jsp
: > shows the differ
: Subject: OOM on uninvert field request
: > I just noticed that field compression (e.g. compressed="true") is no longer
: > in Solr, nor can I find why this was done. Can a committer offer an
This is clearly spelled out in the "Upgrading from Solr 1.4" section of
CHANGES.txt on trunk and branch 3x...
* Field compression is no longer
: Side question. How would I know if a configuration option can also take a
: factory class.. like in this instance?
by reading the example schema.xml...
-Hoss
On 6/30/2010 5:44 PM, Shawn Heisey wrote:
Is it possible for Solr (or Luke/Lucene) to tell me exactly how much
of the total index disk space is used by each field? It would also be
very nice to know, for each field, how much is used by the index and
how much is used for stored data.
Still
A quick reminder that there's one week left to submit your abstract for
this year's Surge Scalability Conference. The event is taking place on
Sept 30 and Oct 1, 2010 in Baltimore, MD. Surge focuses on case studies
that address production failures and the re-engineering efforts that led
to victor
Is there a way to configure the Field Collapse functionality to not collapse
Null fields? I want to collapse on a field that a certain percentage of
documents in my index have...but not all of them. If they don't have the
field I want it to be treated uncollapsed. Is there a setting to do this?
-
I saw mention earlier about a way to link in openNLP into solr (
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Optimizing-Findability-Lucene-and-Solr)
.I haven't followed up on that yet so I don't know much about it. However if
you do figure anything out please share
Hi,
I have the following kind of data to index in a multilingual context: is_man,
is_single, has_job, etc.
Logically, the underlying fields have a value of "yes" or "no." That's why the
boolean type would be appropriate. But my problem is that, in addition to being able
to filter on these fields, I wo
Hi,
sure. It basically depends on what kind of NLP you're going to do.
However, given its solid tokenizers, management of large amounts of
texts and similarity measures I'd say it's well-suited for natural
language processing.
On 2 July 2010 17:15, Moazzam Khan wrote:
> Hi guys,
>
> Is there a
On Fri, Jul 2, 2010 at 9:51 AM, Mark Allan wrote:
[...]
> The changes to AbstractSubTypeFieldType do not have any adverse effects on
> the solr.PointType class, so I'd quite like to suggest it gets included in
> the main solr source code. Where can I send a patch for someone to evaluate
> or shou
Thanks Leonardo, I didn't know that tool, very good!
So I see what is wrong:
SnowballPorterFilterFactory and StopFilterFactory. (both used on index and
query)
I tried removing the snowball and changing the stopfilter to "ignorecase=false" on
QUERY, and restarted solr.
But now I get no results :(.
Hi guys,
Is there a way I can make Solr work with an NLP application? Are there
any NLP applications that will work with Solr? Can someone please
point me to a tutorial or something if it's possible.
Thanks,
Moazzam
Hi folks,
I've made a few small changes to the AbstractSubTypeFieldType class to
allow users to define distinct field types for each subfield. This
enables us to define complex data types in the schema.
For example, we have our own subclass of the CoordinateFieldType
called TemporalCover
most likely due to:
EnglishPorterFilterFactory
RemoveDuplicatesTokenFilterFactory
StopFilterFactory
you get those "fake" matches. Try going into the admin, in the analysis
section. There you can "simulate" the indexing/searching of a document, and see
how it's actually searched/indexed. It will give y
> My Query: Headline:("paying for it") on solr admin
> interface
>
> Some results:
> ...l stop paying tax until council pays for dam...
> "Why paying extra doesn't always pay!"
> "...pay cut as M&S investor pressure pays off"
> "Can't pay or won't pay: the debt collector call"
>
> What could be w
For the example given, I need the full expression "paying for it", so
yes all the words.
-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com]
Sent: sexta-feira, 2 de Julho de 2010 12:30
To: solr-user@lucene.apache.org
Subject: RE: steps to improve search
> I need to know how t
I'm using " surrounding the text.
My Query: Headline:("paying for it") on solr admin interface
Some results:
...l stop paying tax until council pays for dam...
"Why paying extra doesn't always pay!"
"...pay cut as M&S investor pressure pays off"
"Can't pay or won't pay: the debt collector call"
: > I need to know how to achieve more accurate queries (like
> the example below...) using these filters.
Do you want all the terms you search for to appear in the returned documents?
You can change the default operator of QueryParser to AND, either in schema.xml or
by appending &q.op=AND to your searc
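For reference, the schema-level switch is a one-line setting. This is a fragment only; in Solr 1.x it goes inside the top-level <schema> element of schema.xml:

```xml
<!-- Make all query terms required by default instead of optional -->
<solrQueryParser defaultOperator="AND"/>
```

The &q.op=AND request parameter overrides this per query, which is handy for testing before changing the schema default.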
No, you explained it alright, but then didn't understand the answer. Searching
with " surrounding the text you are searching for has exactly the
effect you are looking for. Try it...
On Fri, Jul 2, 2010 at 1:23 PM, Frederico Azeiteiro <
frederico.azeite...@cision.com> wrote:
> I'm sorry, maybe I
I'm sorry, maybe I didn’t explain correctly.
The issue is using the default text FIELD TYPE, not the default text FIELD.
The "text" field type uses a lot of filters on indexing.
I need to know how to achieve more accurate queries (like the example
below...) using these filters.
-Origina
Try
field:"text to search"
On Fri, Jul 2, 2010 at 12:57 PM, Frederico Azeiteiro <
frederico.azeite...@cision.com> wrote:
> Hi,
>
> I'm using the default text field type on my schema.
>
>
>
> Is there a quick way to do more accurate searches like searching for
> "paying for it" only return docs wi
Hi,
I'm using the default text field type on my schema.
Is there a quick way to do more accurate searches like searching for
"paying for it" only return docs with the full expression "paying for
it", and not return articles with word "pay" as it does now?
Thanks,
Frederico
Indeed, I can reproduce this: if I create an index on 2.3 and try to
read it on trunk w/ CheckIndex, I hit that same exception. But: this
is [somewhat] expected, because trunk = 4.0, which can no longer read
indices created with Lucene <= 3.0. However... instead of throwing a
weird exception, we
I want to stem the terms in my index, but currently I am using the standard
analyzer, which does not perform any stemming.
StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
After some searching i found a code for PorterStemAnalyzer but that is having
some problems
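To make the behavior under discussion concrete, here is a toy suffix-stripping stemmer. This is deliberately NOT the real Porter algorithm that PorterStemFilter implements (Porter has measure conditions and multiple rule passes); it only illustrates why a stemmed index matches "paying" and "pays" against a query for "pay", and why irregular forms like "paid" are missed.

```python
# Toy suffix-stripping stemmer -- illustrative only, not Porter.
def toy_stem(token):
    # Try longer suffixes first; keep at least a 3-letter stem.
    for suffix in ("ingly", "edly", "ing", "ed", "ly", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

print([toy_stem(t) for t in "paying pays paid pay".split()])
# -> ['pay', 'pay', 'paid', 'pay']
```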
Thanks! I tested it and it works perfectly.
> However remember that wildcard, prefix searches (*) are not analyzed.
> For example HAN* won't return anything.
I'm lowercasing the query dynamically as well, so it's not a problem for me.
Thanks Joe. This is all very interesting. So though it helps us scale,
sharding doesn't come cheap.
On Mon, Jun 28, 2010 at 9:50 AM, Joe Calderon wrote:
> there is a first pass query to retrieve all matching document ids from
> every shard along with relevant sorting information, the document ids
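The first pass Joe describes can be sketched as a k-way merge: each shard returns (score, id) pairs in sorted order, the coordinator keeps the global top-k, and a second pass would then fetch stored fields only for the winners. The shard data below is made up.

```python
import heapq

# Per-shard results from pass one, each already sorted by score descending.
shard_results = [
    [(0.9, "a1"), (0.5, "a2"), (0.1, "a3")],   # shard A
    [(0.8, "b1"), (0.7, "b2"), (0.2, "b3")],   # shard B
]

def merge_top_k(shards, k):
    """Merge per-shard (score, id) streams and keep the global top-k ids."""
    # heapq.merge wants ascending order, so negate scores to merge descending.
    streams = [((-s, d) for s, d in shard) for shard in shards]
    merged = heapq.merge(*streams)
    return [doc_id for _, doc_id in list(merged)[:k]]

print(merge_top_k(shard_results, 3))
# -> ['a1', 'b1', 'b2']
```

This is also why sharding "doesn't come cheap": the second pass costs one more round trip per shard that contributed a winner.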