Re: general debugging techniques?

2010-07-02 Thread Jim Blomo
Just to confirm I'm not doing something insane, this is my general setup: - index approx 1MM documents including HTML, pictures, office files, etc. - files are not local to solr process - use upload/extract to extract text from them through tika - use commit=1 on each POST (reasons below) - use op

SolrJ-1.4.0 client needs slf4j-jdk14-1.5.5 library on J2SE 1.5 Update 21

2010-07-02 Thread Sharp, Jonathan
I've found that when running a SolrJ client on J2SE 1.5 Update 21, in addition to the jars in the dist/solrj-lib directory I need slf4j-jdk14-1.5.5.jar in the lib directory, otherwise I get an exception where it can't find org.slf4j.impl.StaticLoggerBinder. -Jon ---

Re: ArrayIndexOutOfBoundsException heeeeeelp !?!?!?!!?! Sorting

2010-07-02 Thread Chris Hostetter
: aha. the type is "sint" : : do i need to use "string" or which field did not use any tokenizer ? ^^ : i thought sint is untokenized... You have to be explicit about what you mean -- if by sint you are referring to something like this from the Solr 1.4 example schema... ...then you ar

Re: problem with formulating a negative query

2010-07-02 Thread Chris Hostetter
: thanks for your explanations. But why are all docs being *removed* from the : set of all docs that contain R in their topic field? This would correspond to : a boolean AND and would stand in conflict with the clause q.op=OR. This seems : a bit strange to me. Erick's explanation might have been

Re: Modifications to AbstractSubTypeFieldType

2010-07-02 Thread Chris Hostetter
: The changes to AbstractSubTypeFieldType do not have any adverse effects on the : solr.PointType class, so I'd quite like to suggest it gets included in the : main solr source code. Where can I send a patch for someone to evaluate or : should I just attach it to the issue in JIRA and see what hap

Re: incomplete/missing results

2010-07-02 Thread Moises Muratalla
found the problem, please close/disregard this. I needed to increase maxFieldLength On Fri, Jul 2, 2010 at 3:47 PM, Moises Muratalla wrote: > here is some more info > > I added this line > > > 03320001 Travel Germany Arrive next day > > > It finds the "03320001" but none of the tokens on th

Re: upload PDF using curl

2010-07-02 Thread Chris Hostetter
: I am using Windows XP, curl 7.19.5, Solr 1.4.1 : : the command is: : : curl http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true' -F "myfi...@tutorial.pdf" : : I got error : : HTTP Error: 400. missing content stream. : 'commit' is not recognized as an internal or extern

Re: Mr Lance : customize the search algorithm of solr

2010-07-02 Thread Chris Hostetter
: > depending on words in the query i select the correct urls by applying : > mathematical formulae. This result should be shown to the user in : > descending order. : > : > Now as i know that lucene has its own searcher : > which is used by solr as well. cant i replace this searcher part in : > S

Re: Use free text to search against boolean fields?

2010-07-02 Thread Saïd Radhouani
Hi Jan, Thanks for this suggestion. If we choose parsing, then why don't we do it at the indexing side, instead of the querying side, which might slow down the search process? i.e., if a document has "is_man=true" and "is_single=true", then we populate a text field with the words "man" and "singl
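The index-time idea in the post above can be sketched roughly as follows. This is only an illustration in plain Java, not SolrJ code: the class name, the WORD_FOR_FLAG table, and the word chosen for has_job are assumptions made for the example; only the field names is_man, is_single, and has_job come from the thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch: derive the words for a free-text field from boolean flags
// before the document is sent to the indexer.
class FlagsToTextSketch {

    // Hypothetical mapping from boolean field name to the word to index.
    static final Map<String, String> WORD_FOR_FLAG = new LinkedHashMap<String, String>();
    static {
        WORD_FOR_FLAG.put("is_man", "man");
        WORD_FOR_FLAG.put("is_single", "single");
        WORD_FOR_FLAG.put("has_job", "employed"); // assumed wording
    }

    // Returns the space-separated words for every flag that is set to true.
    static String textFor(Map<String, Boolean> flags) {
        List<String> words = new ArrayList<String>();
        for (Map.Entry<String, String> e : WORD_FOR_FLAG.entrySet()) {
            if (Boolean.TRUE.equals(flags.get(e.getKey()))) {
                words.add(e.getValue());
            }
        }
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        Map<String, Boolean> doc = new LinkedHashMap<String, Boolean>();
        doc.put("is_man", true);
        doc.put("is_single", true);
        doc.put("has_job", false);
        System.out.println(textFor(doc)); // prints "man single"
    }
}
```

In a multilingual setup one such table per language would presumably be needed, which is part of the trade-off being discussed in this thread.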

Re: Leading Wildcard query strangeness

2010-07-02 Thread Chris Hostetter
: In the default schema.xml, only text_rev fieldType has : ReversedWildcardFilterFactory. The query below (manu_exact:*in) returns : two documents (whose manu_exact is Belkin). : : Am i missing something? : : http://localhost:8983/solr/select/?q=manu_exact%3A*in&version=2.2&start=0&rows=10&in

Re: incomplete/missing results

2010-07-02 Thread Moises Muratalla
here is some more info I added this line 03320001 Travel Germany Arrive next day It finds the "03320001" but none of the tokens on the rest of the line Here are the definitions from the schema all On Fri, Jul 2, 2010 at 3:23 PM, Moises Muratalla wrote

Re: Use free text to search against boolean fields?

2010-07-02 Thread Jan Høydahl / Cominvent
Hi, I would rather go for the boolean variant and spend some time writing a query parser which tries to understand all kinds of input people may make, mapping it into boolean filters. In this way you can support both navigation and search and keep both in sync whatever people prefer to start w

Re: Leading Wildcard query strangeness

2010-07-02 Thread Ahmet Arslan
> that's how SolrQueryParser works at the moment, yes. In the default schema.xml, only text_rev fieldType has ReversedWildcardFilterFactory. The query below (manu_exact:*in) returns two documents (whose manu_exact is Belkin). Am i missing something? http://localhost:8983/solr/select/?q=manu_e

Re: Query modification

2010-07-02 Thread Tommy Chheng
i tried openNLP but found it's not very good for search queries because it uses grammar features like capitalization. i coded up a bayesian model with mutual information to model dependence between terms. ex. grouping "stanford university" together in the query "stanford university solar" @

Re: Leading Wildcard query strangeness

2010-07-02 Thread Chris Hostetter
: Does this mean leading * operator can only be used with fields whose : fieldType definition has ReversedWildcardFilterFactory at index time? that's how SolrQueryParser works at the moment, yes. -Hoss
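For readers wondering why a reversing filter helps with leading wildcards at all, here is a conceptual sketch in plain Java. It is not how Solr implements ReversedWildcardFilterFactory or SolrQueryParser; the class and method names are made up for illustration. The idea it shows is that a leading-wildcard pattern such as *in becomes an ordinary prefix lookup once terms are compared in reversed form.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Conceptual sketch only: NOT Solr's implementation.
class ReversedWildcardSketch {

    // A reversing filter would store this form of each term at index time.
    static String reverse(String term) {
        return new StringBuilder(term).reverse().toString();
    }

    // Rewrites "*suffix" into a prefix test against the reversed form of each term.
    // For brevity the terms are reversed here at lookup time; a real index would
    // already hold the reversed terms, so the lookup is a cheap prefix scan.
    static List<String> leadingWildcard(List<String> terms, String pattern) {
        if (!pattern.startsWith("*") || pattern.indexOf('*', 1) >= 0) {
            throw new IllegalArgumentException("sketch handles only a single leading '*'");
        }
        String reversedPrefix = reverse(pattern.substring(1));
        List<String> hits = new ArrayList<String>();
        for (String term : terms) {
            if (reverse(term).startsWith(reversedPrefix)) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("belkin", "maxtor", "apple");
        System.out.println(leadingWildcard(terms, "*in")); // prints [belkin]
    }
}
```

Without the reversed form, a leading wildcard has to be checked against every term in the field, which is the cost the filter is meant to avoid.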

RE: Query modification

2010-07-02 Thread caman
And what did you use for entity detection? GATE,openNLP? Do you mind sharing that please? From: Tommy Chheng-2 [via Lucene] [mailto:ml-node+939600-682384129-124...@n3.nabble.com] Sent: Friday, July 02, 2010 3:20 PM To: caman Subject: Re: Query modification Hi, I actually did somethin

Re: incomplete/missing results

2010-07-02 Thread Moises Muratalla
I posted some plain ASCII text files using the post.jar in the exampledocs directory. My queries work mostly. One recent problem: I search for "germany", it only returns 9 results, when there are actually 10. Would the schema help? I didn't really modify the solrconfig file On Thu, Jul 1, 2010

Re: Query modification

2010-07-02 Thread Tommy Chheng
Hi, I actually did something similar on http://researchwatch.net/ if you search for "stanford university solar", it will process the query by tagging the stanford university to the organization field. I created a querycomponent class and altered the query string like this(in scala but transla

Query modification

2010-07-02 Thread osocurious2
If I wanted to intercept a query and turn q=romantic italian restaurant in seattle into q=romantic tag:restaurant city:seattle cuisine:italian would I subclass QueryComponent, modify the query, and pass it to super? Or is there a standard way already to do this? What about changing it to
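The rewriting step asked about above, independent of where it is hooked into Solr, might look something like this. It is a minimal sketch in plain Java under stated assumptions: the lookup table, the dropped-word list, and all names are hypothetical, and it does not subclass QueryComponent or touch any Solr API. It only shows the string transformation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch: tag known terms with a field name, leave the rest as free text.
class QueryRewriteSketch {

    // Hypothetical dictionary: term -> field it should be searched in.
    static final Map<String, String> FIELD_FOR_TERM = new LinkedHashMap<String, String>();
    static {
        FIELD_FOR_TERM.put("restaurant", "tag");
        FIELD_FOR_TERM.put("seattle", "city");
        FIELD_FOR_TERM.put("italian", "cuisine");
    }

    // Hypothetical list of words to drop from the query entirely.
    static final Set<String> DROPPED = new HashSet<String>(Arrays.asList("in"));

    static String rewrite(String query) {
        List<String> clauses = new ArrayList<String>();
        for (String token : query.trim().toLowerCase().split("\\s+")) {
            if (token.isEmpty() || DROPPED.contains(token)) {
                continue;
            }
            String field = FIELD_FOR_TERM.get(token);
            clauses.add(field == null ? token : field + ":" + token);
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        // prints "romantic cuisine:italian tag:restaurant city:seattle"
        System.out.println(rewrite("romantic italian restaurant in seattle"));
    }
}
```

The clauses come out in input order rather than in the order shown in the post. A single-token dictionary like this cannot handle multi-word entities, which is where the entity-detection approaches mentioned in the replies come in.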

Re: Leading Wildcard query strangeness

2010-07-02 Thread Ahmet Arslan
> that's not correct  what SolrQueryParser does is > check which field > types use ReversedWildcardFilterFactory at indexing time, > and then when > parsing queries, it allows fields that use field types to > be parsed with a leading wildcard. Does this mean leading * operator can only be u

Re: Leading Wildcard query strangeness

2010-07-02 Thread Chris Hostetter
: > I'm going to guess that is what you meant, that the very : > presence of the : > filter in the schema, whether it is used or not, allows you : > to do wildcard : > searches. : : Exactly. that's not correct what SolrQueryParser does is check which field types use ReversedWildcardFilterFa

Re: Solr and NLP

2010-07-02 Thread Moazzam Khan
I read that article. However, the thing is I am trying to find an NLP application to interact with solr (or even by itself) to do context based searches. I think Solr has a wordnet filter which I haven't looked into but so far I haven't come across anything helpful in this regard (maybe because I

Re: Cache hits exposed by API

2010-07-02 Thread Chris Hostetter
: Yes, the StatsComponent returns the values in an XML. : : http://wiki.apache.org/solr/StatsComponent the StatsComponent returns stats about document values -- not stats from the SolrInfoMBeans. : >  I knew that  the jsp page=  http://localhost:8983/solr/admin/stats.jsp : >  shows the differ

Re: OOM on uninvert field request

2010-07-02 Thread Chris Hostetter
: Subject: OOM on uninvert field request : In-Reply-To: <1277850992.1955.6.ca...@kratos> : References: <1277726685.6747.2.ca...@kratos> : <9f5fcd40-c9bb-4cfb-bb0d-d3cdf1680...@gmail.com> : <9eb24a79bbfe195513fa05e0ce2c654c.squir...@sm.webmail.pair.com> : : : <1277850992.1955.

Re: Wither field compresed="true" ?

2010-07-02 Thread Chris Hostetter
: > I just noticed that field compression (e.g. compressed="true") is no longer : > in Solr, nor can I find why this was done. Can a committer offer an This is clearly spelled out in the "Upgrading from Solr 1.4" section of CHANGES.txt on trunk and branch 3x... * Field compression is no longer

Re: SweetSpotSimilarity

2010-07-02 Thread Chris Hostetter
: Side question. How would I know if a configuration option can also take a : factory class.. like in this instance? by reading the example schema.xml... -Hoss

Re: Disk usage per-field

2010-07-02 Thread Shawn Heisey
On 6/30/2010 5:44 PM, Shawn Heisey wrote: Is it possible for Solr (or Luke/Lucene) to tell me exactly how much of the total index disk space is used by each field? It would also be very nice to know, for each field, how much is used by the index and how much is used for stored data. Still

CFP for Surge Scalability Conference 2010

2010-07-02 Thread Jason Dixon
A quick reminder that there's one week left to submit your abstract for this year's Surge Scalability Conference. The event is taking place on Sept 30 and Oct 1, 2010 in Baltimore, MD. Surge focuses on case studies that address production failures and the re-engineering efforts that led to victor

Field Collapse question

2010-07-02 Thread osocurious2
Is there a way to configure the Field Collapse functionality to not collapse Null fields? I want to collapse on a field that a certain percentage of documents in my index have...but not all of them. If they don't have the field I want it to be treated uncollapsed. Is there a setting to do this? -

Re: Solr and NLP

2010-07-02 Thread Gregg Hoshovsky
I saw mention earlier about a way to link in openNLP into solr (http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Optimizing-Findability-Lucene-and-Solr). I haven't followed up on that yet so I don't know much about it. However if you do figure anything out please share

Use free text to search against boolean fields?

2010-07-02 Thread Saïd Radhouani
Hi, I have the following kind of data to index in a multilingual context: is_man, is_single, has_job, etc. Logically, the underlying fields have a value of "yes" or "no." That's why the boolean type would be appropriate. But my problem is, in addition to being able to filter on these fields, I wo

Re: Solr and NLP

2010-07-02 Thread Björn Wilmsmann
Hi, sure. It basically depends on what kind of NLP you're going to do. However, given its solid tokenizers, management of large amounts of texts and similarity measures I'd say it's well-suited for natural language processing. On 2 July 2010 17:15, Moazzam Khan wrote: > Hi guys, > > Is there a

Re: Modifications to AbstractSubTypeFieldType

2010-07-02 Thread Yonik Seeley
On Fri, Jul 2, 2010 at 9:51 AM, Mark Allan wrote: [...] > The changes to AbstractSubTypeFieldType do not have any adverse effects on > the solr.PointType class, so I'd quite like to suggest it gets included in > the main solr source code.  Where can I send a patch for someone to evaluate > or shou

RE: steps to improve search

2010-07-02 Thread Frederico Azeiteiro
Thanks Leonardo, I didn't know that tool, very good! So I see what is wrong: SnowballPorterFilterFactory and StopFilterFactory. (both used on index and query) I tried removing the snowball and changing the stopfilter to "ignorecase=false" on QUERY and restarted solr. But now I get no results :(.

Solr and NLP

2010-07-02 Thread Moazzam Khan
Hi guys, Is there a way I can make Solr work with an NLP application? Are there any NLP applications that will work with Solr? Can someone please point me to a tutorial or something if it's possible. Thanks, Moazzam

Modifications to AbstractSubTypeFieldType

2010-07-02 Thread Mark Allan
Hi folks, I've made a few small changes to the AbstractSubTypeFieldType class to allow users to define distinct field types for each subfield. This enables us to define complex data types in the schema. For example, we have our own subclass of the CoordinateFieldType called TemporalCover

Re: steps to improve search

2010-07-02 Thread Leonardo Menezes
most likely due to: EnglishPorterFilterFactory RemoveDuplicatesTokenFilterFactory StopFilterFactory you get those "fake" matches. try going into the admin, on the analysis section. in there you can "simulate" the index/search of a document, and see how its actually searched/indexed. it will give y

RE: steps to improve search

2010-07-02 Thread Ahmet Arslan
> My Query: Headline:("paying for it") on solr admin > interface > > Some results: > ...l stop paying tax until council pays for dam... > "Why paying extra doesn't always pay!" > "...pay cut as M&S investor pressure pays off" > "Can't pay or won't pay: the debt collector call" > > What could be w

RE: steps to improve search

2010-07-02 Thread Frederico Azeiteiro
For the example given, I need the full expression "paying for it", so yes all the words. -Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com] Sent: sexta-feira, 2 de Julho de 2010 12:30 To: solr-user@lucene.apache.org Subject: RE: steps to improve search > I need to know how t

RE: steps to improve search

2010-07-02 Thread Frederico Azeiteiro
I'm using " surrounding the text. My Query: Headline:("paying for it") on solr admin interface Some results: ...l stop paying tax until council pays for dam... "Why paying extra doesn't always pay!" "...pay cut as M&S investor pressure pays off" "Can't pay or won't pay: the debt collector call"

RE: steps to improve search

2010-07-02 Thread Ahmet Arslan
> I need to know how to achieve more accurate queries (like > the example below...) using these filters. Do you want all the terms you search for to appear in returned documents? You can change the default operator of QueryParser to AND, either in schema.xml or by appending &q.op=AND to your searc
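As a small illustration of the q.op suggestion, the following plain-Java sketch builds a select URL with the parameter appended. The host, port, and /solr/select path are taken from URLs quoted elsewhere in this listing; the class and method names and the example query are assumptions, and no request is actually sent.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Illustrative sketch: build a Solr select URL that asks for AND as the default operator.
class DefaultOperatorUrlSketch {

    static String selectUrl(String baseUrl, String query) {
        try {
            // URL-encode the query so characters such as ':' and spaces are safe in the URL.
            String encoded = URLEncoder.encode(query, "UTF-8");
            return baseUrl + "/select?q=" + encoded + "&q.op=AND";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints http://localhost:8983/solr/select?q=Headline%3A%28paying+for+it%29&q.op=AND
        System.out.println(selectUrl("http://localhost:8983/solr", "Headline:(paying for it)"));
    }
}
```

Note that q.op=AND only requires every term to be present. As the later replies in this thread point out, it does not stop the stemming filter from matching "pay" for "paying".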

Re: steps to improve search

2010-07-02 Thread Leonardo Menezes
No, you explained all right, but then didn't understand the answer. Searching with the " surrounding the text you are searching for has exactly the effect you are looking for. Try it... On Fri, Jul 2, 2010 at 1:23 PM, Frederico Azeiteiro < frederico.azeite...@cision.com> wrote: > I'm sorry, maybe I

RE: steps to improve search

2010-07-02 Thread Frederico Azeiteiro
I'm sorry, maybe I didn't explain correctly. The issue is using the default text FIELD TYPE, not the default text FIELD. The "text" field type uses a lot of filters on indexing. I need to know how to achieve more accurate queries (like the example below...) using these filters. -Origina

Re: steps to improve search

2010-07-02 Thread Leonardo Menezes
Try field:"text to search" On Fri, Jul 2, 2010 at 12:57 PM, Frederico Azeiteiro < frederico.azeite...@cision.com> wrote: > Hi, > > I'm using the default text field type on my schema. > > > > Is there a quick way to do more accurate searches like searching for > "paying for it" only return docs wi

steps to improve search

2010-07-02 Thread Frederico Azeiteiro
Hi, I'm using the default text field type on my schema. Is there a quick way to do more accurate searches like searching for "paying for it" only return docs with the full expression "paying for it", and not return articles with word "pay" as it does now? Thanks, Frederico

Re: IOException: read past EOF when opening index built directly w/Lucene

2010-07-02 Thread Michael McCandless
Indeed, I can reproduce this: if I create an index on 2.3 and try to read it on trunk w/ CheckIndex, I hit that same exception. But: this is [somewhat] expected, because trunk = 4.0, which can no longer read indices created with Lucene <= 3.0. However... instead of throwing a weird exception, we

how to apply stemming to the index ?

2010-07-02 Thread sarfaraz masood
I want to stem the terms in my index. but currently i am using standard analyzer that is not performing any kind of stemming. StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT); After some searching i found a code for PorterStemAnalyzer but that is having some problems

Re: How to force wildcard query not to ignore word endings

2010-07-02 Thread easy.angel
Thanks! I tested it and it works perfectly. > However remember that wildcard, prefix searches (*) are not analyzed. > For example HAN* won't return anything. I am also lowercasing the query dynamically, so it's not a problem for me. -- View this message in context: http://lucene.472066.n3.n

Re: questions about Solr shards

2010-07-02 Thread Babak Farhang
Thanks Joe. This is all very interesting. So though it helps us scale, sharding doesn't come cheap. On Mon, Jun 28, 2010 at 9:50 AM, Joe Calderon wrote: > there is a first pass query to retrieve all matching document ids from > every shard along with relevant sorting information, the document ids