Re: question about Field Collapsing/ grouping

2011-09-13 Thread O. Klein
Isn't that what the parameter group.ngroups=true is for?
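
(For reference, a minimal sketch of a grouped query that also returns the
group count; the field name "manu" here is just the one from the example
schema:

http://localhost:8983/solr/select?q=*:*&group=true&group.field=manu&group.ngroups=true

With group.ngroups=true the grouped response reports the number of distinct
groups alongside the usual "matches" count.)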

--
View this message in context: 
http://lucene.472066.n3.nabble.com/question-about-Field-Collapsing-grouping-tp3331821p3332471.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: question about Field Collapsing/ grouping

2011-09-13 Thread Jayendra Patil
Yup, it seems the group count feature is included now, as mentioned by Klein.

Regards,
Jayendra

On Tue, Sep 13, 2011 at 8:27 AM, O. Klein  wrote:
> Isn't that what the parameter group.ngroups=true is for?
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/question-about-Field-Collapsing-grouping-tp3331821p3332471.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: can indexing information be stored in db rather than filesystem?

2011-09-13 Thread kiran.bodigam
Thanks for your replies, guys.

As suggested, I agree that we are losing many of the benefits of Solr/Lucene,
but I still want to store the index output (index files) in a db table. Please
suggest the steps I need to follow to configure the db with the Solr
engine. (Just as we point at the filesystem in solrconfig.xml with
<dataDir>${solr.data.dir:}</dataDir>, I would similarly like to give the path
for a db table.)

--
View this message in context: 
http://lucene.472066.n3.nabble.com/can-indexing-information-stored-in-db-rather-than-filesystem-tp3319687p3332663.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: can indexing information be stored in db rather than filesystem?

2011-09-13 Thread Markus Jelsma
I'm curious; what benefits do you think you'll get by storing the files in 
some DB table?

On Tuesday 13 September 2011 15:51:19 kiran.bodigam wrote:
> Thanks for your replies, guys.
> 
> As suggested, I agree that we are losing many of the benefits of Solr/Lucene,
> but I still want to store the index output (index files) in a db table. Please
> suggest the steps I need to follow to configure the db with the Solr
> engine. (Just as we point at the filesystem in solrconfig.xml with
> <dataDir>${solr.data.dir:}</dataDir>, I would similarly like to give the
> path for a db table.)
> 
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/can-indexing-information-stored-in-db-r
> ather-than-filesystem-tp3319687p3332663.html Sent from the Solr - User
> mailing list archive at Nabble.com.

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


RE: can indexing information be stored in db rather than filesystem?

2011-09-13 Thread Jaeger, Jay - DOT
I don't think you understand.  Solr does not have the code to do that.  It just 
isn't there, nor would I expect it would ever be there.

Solr is open source though.  You could look at the code and figure out how to 
do it (though why anyone would do that remains beyond my ability to 
understand).  As the saying goes:  "Knock yourself out".

(Happy programmer's day to all.
http://en.wikipedia.org/wiki/Programmers'_Day ).

JRJ

-Original Message-
From: kiran.bodigam [mailto:kiran.bodi...@gmail.com] 
Sent: Tuesday, September 13, 2011 8:51 AM
To: solr-user@lucene.apache.org
Subject: Re: can indexing information be stored in db rather than filesystem?

Thanks for your replies, guys.

As suggested, I agree that we are losing many of the benefits of Solr/Lucene,
but I still want to store the index output (index files) in a db table. Please
suggest the steps I need to follow to configure the db with the Solr
engine. (Just as we point at the filesystem in solrconfig.xml with
<dataDir>${solr.data.dir:}</dataDir>, I would similarly like to give the path
for a db table.)

--
View this message in context: 
http://lucene.472066.n3.nabble.com/can-indexing-information-stored-in-db-rather-than-filesystem-tp3319687p3332663.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: can indexing information be stored in db rather than filesystem?

2011-09-13 Thread Walter Underwood
On Sep 13, 2011, at 6:51 AM, kiran.bodigam wrote:

> As suggested, I agree that we are losing many of the benefits of Solr/Lucene,
> but I still want to store the index output (index files) in a db table. Please
> suggest the steps I need to follow to configure the db with the Solr
> engine.

The steps are:

1. write the Java code to do that
2. submit it as contrib, because it is such a bad idea that I doubt it will be 
added to the common code

wunder
--
Walter Underwood
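
For reference, "write the Java code" here means implementing your own
org.apache.lucene.store.Directory (the abstraction Lucene reads and writes
index files through) and pointing Solr at it via a custom DirectoryFactory in
solrconfig.xml. A hypothetical sketch of the registration only, with a
made-up class name; everything behind it (JDBC access, BLOB handling,
locking) is exactly the part nobody has written:

<directoryFactory name="DirectoryFactory" class="com.example.JdbcDirectoryFactory"/>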




using a function query with OR and spaces?

2011-09-13 Thread Jason Toy
I had queries breaking on me when there were spaces in the text I was
searching for. Originally I had :

fq=state_s:New York
and that would break, I found a work around by using:

fq={!raw f=state_s}New York


My problem now is doing this with an OR query,  this is what I have now, but
it doesn't work:


fq=({!raw f=country_s}United States OR {!raw f=city_s}New York


Re: How to combine RSS w/ Tika when using Data Import Handler (DIH)

2011-09-13 Thread Pulkit Singhal
Hello Everyone,

I've been investigating and I understand that using the RegexTransformer is
an option that is open for identifying and extracting data to multiple
fields from a single rss value source ... But rather than hack together
something I once again wanted to check with the community: Is there another
option for navigating the HTML DOM tree using some well-tested transformer
or Tika or something?

Thanks!
- Pulkit

On Mon, Sep 12, 2011 at 1:45 PM, Pulkit Singhal wrote:

> Given an RSS raw feed source link such as the following:
>
> http://persistent.info/cgi-bin/feed-proxy?url=http%3A%2F%2Fwww.amazon.com%2Frss%2Ftag%2Fblu-ray%2Fnew%2Fref%3Dtag_rsh_hl_ersn
>
> I can easily get to the value of the description for an item like so:
> <field column="description" xpath="/rss/item/description" />
>
> But the content of "description" happens to be in HTML and sadly it is this
> HTML chunk that has some pretty decent information that I would like to
> import as well.
> 1) For example it has the image for the item:
> <img src="http://ecx.images-amazon.com/images/I/51yyAAoYzKL._SL160_SS160_.jpg" ...
> />
> 2) It has the price for the item:
> $13.99
> And many other useful pieces of data that aren't in a proper rss format but
> they are simply thrown together inside the html chunk that is served as the
> value for the xpath="/rss/item/description"
>
> So, how can I configure DIH to start importing this html information as
> well?
> Is Tika the way to go?
> Can someone give a brief example of what a config file with both Tika
> config and RSS config would/should look like?
>
> Thanks!
> - Pulkit
>


Re: using a function query with OR and spaces?

2011-09-13 Thread josh lucas
On Sep 13, 2011, at 8:37 AM, Jason Toy wrote:

> I had queries breaking on me when there were spaces in the text I was
> searching for. Originally I had :
> 
> fq=state_s:New York
> and that would break, I found a work around by using:
> 
> fq={!raw f=state_s}New York
> 
> 
> My problem now is doing this with an OR query,  this is what I have now, but
> it doesn't work:
> 
> 
> fq=({!raw f=country_s}United States OR {!raw f=city_s}New York

Couldn't you do:

fq=(country_s:(United States) OR city_s:(New York))

I think that should work though you probably will need to surround the queries 
with quotes to get the exact phrase match.

Re: using a function query with OR and spaces?

2011-09-13 Thread Chris Hostetter
: Subject: using a function query with OR and spaces?

First off, what you are asking about is a "filter query" not a "function 
query"

https://wiki.apache.org/solr/CommonQueryParameters#fq

: I had queries breaking on me when there were spaces in the text I was
: searching for. Originally I had :
: 
: fq=state_s:New York
: and that would break, I found a work around by using:
: 
: fq={!raw f=state_s}New York

assuming the field is a StrField, the "raw" or "Term" QParsers will work, 
or you can quote the value using something like fq=state_s:"New York"

: My problem now is doing this with an OR query,  this is what I have now, but
: it doesn't work:
...
: fq=({!raw f=country_s}United States OR {!raw f=city_s}New York

That's because:

a) local params (i.e. the {! ...} syntax) must come at the start of a Solr 
param as an instruction of how to parse it.

b) the "raw" and "term" QParsers don't support *any* query markup/syntax 
(like "OR" modifiers).  If you want to build a complex query using 
multiple clauses that are constructed using specific QParsers, you need to 
build them up using multiple query params and/or the "_query_" hook in the 
LuceneQParser...

fq=_query_:"{!term f=state_s}New York" OR _query_:"{!term f=country_s}United 
States"

https://wiki.apache.org/solr/LocalParams
http://www.lucidimagination.com/blog/2009/03/31/nested-queries-in-solr/


-Hoss


Re: How to combine RSS w/ Tika when using Data Import Handler (DIH)

2011-09-13 Thread Chris Hostetter

: I've been investigating and I understand that using the RegexTransformer is
: an option that is open for identifying and extracting data to multiple
: fields from a single rss value source ... But rather than hack together
: something I once again wanted to check with the community: Is there another
: option for navigating the HTML DOM tree using some well-tested transformer
: or TIka or something?

I don't think so ... if it's a *really* well-formed feed, then the 
description will actually be xhtml nodes (with the appropriate 
namespace) that are already part of the Document's DOM.

But if it's just a blob of CDATA that happens to contain well-formed HTML, 
then I think a regex is currently your best option -- you'll probably want 
something tailor-made for the subtleties of the site whose RSS you're 
scraping anyway, since things like "are & chars in the URLs html escaped?" 
are going to vary from site to site.

It would probably be possible to write a DIH Transformer based on 
something like tagsoup to actually produce a DOM from an arbitrary html 
string in an entity, so you could then treat it as a subentity and use the 
XPathEntityProcessor -- but I don't think I've seen anyone talk about 
doing anything like that before.
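
A rough sketch of what such a Transformer could look like (hypothetical
class; a regex stands in for a real tagsoup parse, and the "description" and
"price" names are the fields from this thread):

import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;

public class HtmlFieldsTransformer extends Transformer {
  // matches a price like $13.99 inside the HTML blob
  private static final Pattern PRICE = Pattern.compile("\\$(\\d+\\.\\d+)");

  @Override
  public Object transformRow(Map<String, Object> row, Context context) {
    Object desc = row.get("description");
    if (desc != null) {
      Matcher m = PRICE.matcher(desc.toString());
      if (m.find()) {
        row.put("price", m.group(1)); // add the derived field to the row
      }
    }
    return row;
  }
}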

-Hoss


Highlight compounded word instead of part

2011-09-13 Thread O. Klein
I am using DictionaryCompoundWordTokenFilterFactory and want to highlight the
whole word instead of the part that matched the dictionary. 

So when the query "word" matches on "compoundedword", the whole word
"compoundedword" is highlighted, instead of just "word".

Any ideas?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Highlight-compounded-word-instead-of-part-tp225p225.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr messing up the UK GBP (pound) symbol in response, even though the Java environment variable file encoding is set to UTF-8....

2011-09-13 Thread Chris Hostetter

: Any idea why solr is unable to return the pound sign as-is?
: 
: I tried typing in £ 1 million in Solr admin GUI and got following response.
...
: £ 1 million
...
: Here is my Java Properties I got also from admin interface:
...
: catalina.home =
: /home/rbhagdev/SCCRepos/SCC_Platform/search/solr/target/

Looks like you are using tomcat, so I suspect you are getting bit by 
this...

https://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config

If that's not the problem, please try running the 
example/exampledocs/test_utf8.sh script against your Solr instance (you'll 
need to change the URL variable to match your host:port)


-Hoss

How to plug a new ANTLR grammar

2011-09-13 Thread Roman Chyla
Hi,

The standard lucene/solr parsing is nice but not really flexible. I
saw questions and discussion about ANTLR, but unfortunately never a
working grammar, so... maybe you find this useful:
https://github.com/romanchyla/montysolr/tree/master/src/java/org/apache/lucene/queryParser/iqp/antlr

In the grammar, the parsing is completely abstracted from the Lucene
objects, and the parser is not mixed with Java code. At first it
produces structures like this:
https://svnweb.cern.ch/trac/rcarepo/raw-attachment/wiki/MontySolrQueryParser/index.html

But now I have a problem. I don't know if I should use query parsing
framework in contrib.

It seems that the qParser in contrib can use different parser
generators (the default JavaCC, but also ANTLR). But I am confused and
I don't understand this new queryParser from contrib. It is really
very confusing to me. Is there any benefit in trying to plug the ANTLR
tree into it? Because looking at the AST pictures, it seems that with
a relatively simple tree walker we could build the same queries as the
current standard lucene query parser. And it would be much simpler and
flexible. Does it bring something new? I have a feeling I miss
something...

Many thanks for help,

  Roman


Re: How to search on specific file types?

2011-09-13 Thread ahmad ajiloo
1- How can I put the file extension into my index? I'm using Nutch to
crawl web pages and send Nutch's data to Solr for indexing, and I have
no idea how to put the file extension into my index.
2- Please give me some helpful links about mime types. I'm new to Solr and don't
know anything about mime types. Please note that I have to index Nutch's data,
and I couldn't find useful commands in the Nutch tutorial for advanced indexing!
Thank you very much.


On Mon, Sep 12, 2011 at 6:07 PM, Jaeger, Jay - DOT wrote:

> Some possibilities:
>
> 1) Put the file extension into your index (that is what we did when we were
> testing indexing documents with Solr)
> 2) Put a mime type for the document into your index.
> 3) Put the whole file name / URL into your index, and match on part of the
> name.  This will give some false positives.
>
> JRJ
>
> -Original Message-
> From: ahmad ajiloo [mailto:ahmad.aji...@gmail.com]
> Sent: Monday, September 12, 2011 5:58 AM
> To: solr-user@lucene.apache.org
> Subject: Fwd: How to search on specific file types?
>
> Hello
> I want to search on articles, so I need to find only specific files like doc,
> docx, and pdf.
> I don't need any html pages. Thus the result of our search should only
> consist of doc, docx, and pdf files.
> can you help me?
>


Get field value in custom searchcomponent (solr 3.3)

2011-09-13 Thread Pablo Ricco
What is the best way to get a float field value from docID?
I tried the following code, but when it runs it throws an exception (For input
string: "`??eI") at the line "float lat = Float.parseFloat(tlat);".

schema.xml:
...

...


component.java:

@Override
public void process(ResponseBuilder rb) throws IOException {
    DocSet docs = rb.getResults().docSet;
    SolrIndexSearcher searcher = rb.req.getSearcher();
    FieldCache.StringIndex slat =
        FieldCache.DEFAULT.getStringIndex(searcher.getReader(), "latitude");
    DocIterator iter = docs.iterator();
    while (iter.hasNext()) {
        int docID = iter.nextDoc();
        String tlat = slat.lookup[slat.order[docID]];
        if (tlat != null) {
            float lat = Float.parseFloat(tlat); // Exception!
        }
    }
}

Thanks,
Pablo
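
(For a plain, non-trie float field, an alternative sketch is to let the
FieldCache parse the values once instead of calling Float.parseFloat per
document; "searcher" and "docs" are the same variables as in the snippet
above, Lucene/Solr 3.x API:

float[] lats = FieldCache.DEFAULT.getFloats(searcher.getReader(), "latitude");
DocIterator iter = docs.iterator();
while (iter.hasNext()) {
    float lat = lats[iter.nextDoc()]; // already a float, no parsing needed
}
)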


Re: How to search on specific file types?

2011-09-13 Thread Chris Hostetter

: 1- How can I put the file extension into my index? I'm using Nutch to
: crawl web pages and send Nutch's data to Solr for indexing, and I have
: no idea how to put the file extension into my index.
: 2- Please give me some helpful links about mime types. I'm new to Solr and don't
: know anything about mime types. Please note that I have to index Nutch's data,
: and I couldn't find useful commands in the Nutch tutorial for advanced indexing!
: Thank you very much.

I think you need to ask on the nutch users' list about the type of schema 
nutch uses when indexing into Solr, whether it creates a specific field for 
file extension, and/or how you can modify the nutch indexer to create a 
field like that for you.

Assuming you get nutch to create a field named "extension" you can query 
solr for only docs that have a certain extension by adding it as an fq...

q=what i want&fq=extension:doc


-Hoss


Re: How to search on specific file types?

2011-09-13 Thread Markus Jelsma

> 1- How can I put the file extension into my index? I'm using Nutch to
> crawl web pages and send Nutch's data to Solr for indexing, and I
> have no idea how to put the file extension into my index.

To get the file extension in a separate field you can copyField the url and 
use Solr's char pattern replace filter to strip away everything up to the last 
dot, if there is any.
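
A minimal sketch of that schema.xml setup (the type and field names here are
made up):

<fieldType name="file_ext" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- the greedy .*\. strips everything up to the last dot, leaving the extension -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern=".*\." replacement="" replace="all"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="extension" type="file_ext" indexed="true" stored="true"/>
<copyField source="url" dest="extension"/>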

> 2- Please give me some helpful links about mime types. I'm new to Solr and
> don't know anything about mime types. Please note that I have to index
> Nutch's data, and I couldn't find useful commands in the Nutch tutorial for
> advanced indexing! Thank you very much.

Use Nutch's index-more plugin. By default it'll add two or three values to a 
multi-valued field (type): both the sub-types and the complete mime-type, if I'm 
not mistaken. There's a configuration directive to have it index only the 
complete mime-type.

> 
> On Mon, Sep 12, 2011 at 6:07 PM, Jaeger, Jay - DOT 
wrote:
> > Some possibilities:
> > 
> > 1) Put the file extension into your index (that is what we did when we
> > were testing indexing documents with Solr)
> > 2) Put a mime type for the document into your index.
> > 3) Put the whole file name / URL into your index, and match on part of
> > the name.  This will give some false positives.
> > 
> > JRJ
> > 
> > -Original Message-
> > From: ahmad ajiloo [mailto:ahmad.aji...@gmail.com]
> > Sent: Monday, September 12, 2011 5:58 AM
> > To: solr-user@lucene.apache.org
> > Subject: Fwd: How to search on specific file types?
> > 
> > Hello
> > I want to search on articles, so I need to find only specific files like
> > doc, docx, and pdf.
> > I don't need any html pages. Thus the result of our search should only
> > consist of doc, docx, and pdf files.
> > can you help me?


Re: using a function query with OR and spaces?

2011-09-13 Thread Jason Toy
I wrote the title wrong, it's a filter query, not a function query. Thanks
for the correction.
The field is a string. I had tried fq=state_s:"New York" before and that
did not work; I'm puzzled as to why it didn't.
I tried out your (b) suggestion and that worked, thanks!

On Tue, Sep 13, 2011 at 9:00 AM, Chris Hostetter
wrote:

> : Subject: using a function query with OR and spaces?
>
> First off, what you are asking about is a "filter query" not a "function
> query"
>
> https://wiki.apache.org/solr/CommonQueryParameters#fq
>
> : I had queries breaking on me when there were spaces in the text I was
> : searching for. Originally I had :
> :
> : fq=state_s:New York
> : and that would break, I found a work around by using:
> :
> : fq={!raw f=state_s}New York
>
> assuming the field is a StrField, the "raw" or "Term" QParsers will work,
> or you can quote the value using something like fq=state_s:"New York"
>
> : My problem now is doing this with an OR query,  this is what I have now,
> but
> : it doesn't work:
>...
> : fq=({!raw f=country_s}United States OR {!raw f=city_s}New York
>
> That's because:
>
> a) local params (i.e. the {! ...} syntax) must come at the start of a Solr
> param as an instruction of how to parse it.
>
> b) the "raw" and "term" QParsers don't support *any* query markup/syntax
> (like "OR" modifiers).  If you want to build a complex query using
> multiple clauses that are constructed using specific QParsers, you need to
> build them up using multiple query params and/or the "_query_" hook in the
> LuceneQParser...
>
> fq=_query_:"{!term f=state_s}New York" OR _query_:"{!term
> f=country_s}United States"
>
> https://wiki.apache.org/solr/LocalParams
> http://www.lucidimagination.com/blog/2009/03/31/nested-queries-in-solr/
>
>
> -Hoss
>


Re: Adding Query Filter custom implementation to Solr's pipeline

2011-09-13 Thread Chris Hostetter

: If you do need to implement something truely custom, writing it as your 
: own QParser to trigger via an "fq" can be advantageous so it can cached 
: and re-used by many queries.

I forgot to mention a very cool new feature that is about to be released 
in Solr 3.4

You can now instruct Solr that an "fq" filter query should not be cached, 
in which case Solr will only consult it after executing the main query -- 
which can be handy if you have some filtering logic that is very expensive 
to compute for each document, and you only want to evaluate it for documents 
that have already been matched by the main query and all other filter 
queries.
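
For example, something along these lines (a sketch only; "popularity" and
"price" are made-up numeric fields, and the exact parameters are on the wiki
page below):

fq={!frange l=10 u=100 cache=false cost=100}mul(popularity,price)

cache=false keeps the filter out of the filterCache, and a high cost pushes
its evaluation after the cheaper filters.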

Details are on the wiki...

https://wiki.apache.org/solr/CommonQueryParameters#Caching_of_filters

-Hoss


Re: How to plug a new ANTLR grammar

2011-09-13 Thread Jason Toy
I'd love to see the progress on this.

On Tue, Sep 13, 2011 at 10:34 AM, Roman Chyla  wrote:

> Hi,
>
> The standard lucene/solr parsing is nice but not really flexible. I
> saw questions and discussion about ANTLR, but unfortunately never a
> working grammar, so... maybe you find this useful:
>
> https://github.com/romanchyla/montysolr/tree/master/src/java/org/apache/lucene/queryParser/iqp/antlr
>
> In the grammar, the parsing is completely abstracted from the Lucene
> objects, and the parser is not mixed with Java code. At first it
> produces structures like this:
>
> https://svnweb.cern.ch/trac/rcarepo/raw-attachment/wiki/MontySolrQueryParser/index.html
>
> But now I have a problem. I don't know if I should use query parsing
> framework in contrib.
>
> It seems that the qParser in contrib can use different parser
> generators (the default JavaCC, but also ANTLR). But I am confused and
> I don't understand this new queryParser from contrib. It is really
> very confusing to me. Is there any benefit in trying to plug the ANTLR
> tree into it? Because looking at the AST pictures, it seems that with
> a relatively simple tree walker we could build the same queries as the
> current standard lucene query parser. And it would be much simpler and
> flexible. Does it bring something new? I have a feeling I miss
> something...
>
> Many thanks for help,
>
>  Roman
>



-- 
- sent from my mobile
6176064373


Out of memory

2011-09-13 Thread Rohit
I have Solr running on a machine with 18Gb RAM, with 4 cores. One of the
cores is very big, containing 77516851 docs; the stats for the searcher are
given below:

searcherName : Searcher@5a578998 main 
caching : true 
numDocs : 77516851 
maxDoc : 77518729 
lockFactory=org.apache.lucene.store.NativeFSLockFactory@5a9c5842 
indexVersion : 1308817281798 
openedAt : Tue Sep 13 18:59:52 GMT 2011 
registeredAt : Tue Sep 13 19:00:55 GMT 2011 
warmupTime : 63139

- Is there a way to reduce the number of docs loaded into memory for
this core?

- At any given time I don't need data more than 15 days old, unless
someone queries for it explicitly. How can this be achieved?

- Will it be better to go for Solr replication or distribution if
there is little option left?

Regards,

Rohit

Mobile: +91-9901768202

About Me:   http://about.me/rohitg

 



Lucene Grid question

2011-09-13 Thread sol myr
Hi,

I have a huge Lucene index, which I'd like to split between machines ("Grid").

E.g. say I have a chain of book-stores, in different countries, and I'm aiming 
for the following:
- Each country has its own index file, on its own machine (e.g. books from 
Japan are indexed on machine "japan1")
- Most users search only within their own country (e.g. search only the 
"japan1" index)
- But sometimes, they might ask to search the entire chain (all countries), 
meaning some sort of "map/reduce" (=collect data from all countries).


The main challenge is the "entire chain search", especially if I want 
reasonable ranking.

After some investigation (+great help from Hibernate Search forum), I've seen 
the following suggestions:


1) Implement a LuceneDirectory that transparently spreads across several 
machines.

I'm not sure how the Search would work - can I ask each index for *relevant* 
data only?
Or would I need to maintain one huge combined file, allowing "random access" 
for the Searcher?


2) Run an IndexReader on each machine.

They tell me each reader can report its relevant term-frequencies, and based on 
that I can fetch relevant results from each machine.
Apparently the ranking won't be perfect (for the overall result), but bearable.

Now, I'm not familiar with Lucene internals, and would really appreciate your 
views on it.
- Any good articles on Lucene "Gridding"?
- Any idea whether approach #1 makes any sense (IMHO it's not very sensible if 
I need to merge everything to a single huge file).
- Any good implementations (of either approaches)? So far I found Hibernate 
Search 4, and Solandra.
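
(Worth noting on the Solr side: Solr's built-in distributed search is
essentially approach #2 -- one index per machine, queried scatter/gather
style, with the same "ranking won't be perfect" caveat. A sketch with
hypothetical host names:

http://japan1:8983/solr/select?q=tolstoy&shards=japan1:8983/solr,usa1:8983/solr,france1:8983/solr

Each shard returns its top matches and the coordinating node merges them.)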


Thanks very much.



Using the contrib flexible query parser in Solr

2011-09-13 Thread Michael Ryan
Has anyone used the "Flexible Query Parser" 
(https://issues.apache.org/jira/browse/LUCENE-1567) in Solr?  I'm just starting 
to look at it for the first time and was wondering if it is something that can 
be dropped into Solr fairly easily, or if more extensive changes are needed.  I 
thought perhaps someone had already done this, but I couldn't find anything in 
the Solr bug tracker.

-Michael


Re: How to plug a new ANTLR grammar

2011-09-13 Thread Peter Keegan
Roman,

I'm not familiar with the contrib, but you can write your own Java code to
create Query objects from the tree produced by your lexer and parser
something like this:

// Build the ANTLR lexer/parser over the query string
StandardLuceneGrammarLexer lexer = new StandardLuceneGrammarLexer(
    new ANTLRReaderStream(new StringReader(queryString)));
CommonTokenStream tokens = new CommonTokenStream(lexer);
StandardLuceneGrammarParser parser = new StandardLuceneGrammarParser(tokens);
StandardLuceneGrammarParser.query_return ret = parser.mainQ();
CommonTree t = (CommonTree) ret.getTree();
parseTree(t);

void parseTree(Tree t) {
    // recursively walk the tree, visiting each node
    visit(t);
}

void visit(Tree node) {
    switch (node.getType()) {
    case StandardLuceneGrammarParser.AND:
        // Create a BooleanQuery, push it onto the stack
        ...
    }
}

I use the stack to build up the final Query from the queries produced in the
tree parsing.

Hope this helps.
Peter


On Tue, Sep 13, 2011 at 3:16 PM, Jason Toy  wrote:

> I'd love to see the progress on this.
>
> On Tue, Sep 13, 2011 at 10:34 AM, Roman Chyla 
> wrote:
>
> > Hi,
> >
> > The standard lucene/solr parsing is nice but not really flexible. I
> > saw questions and discussion about ANTLR, but unfortunately never a
> > working grammar, so... maybe you find this useful:
> >
> >
> https://github.com/romanchyla/montysolr/tree/master/src/java/org/apache/lucene/queryParser/iqp/antlr
> >
> > In the grammar, the parsing is completely abstracted from the Lucene
> > objects, and the parser is not mixed with Java code. At first it
> > produces structures like this:
> >
> >
> https://svnweb.cern.ch/trac/rcarepo/raw-attachment/wiki/MontySolrQueryParser/index.html
> >
> > But now I have a problem. I don't know if I should use query parsing
> > framework in contrib.
> >
> > It seems that the qParser in contrib can use different parser
> > generators (the default JavaCC, but also ANTLR). But I am confused and
> > I don't understand this new queryParser from contrib. It is really
> > very confusing to me. Is there any benefit in trying to plug the ANTLR
> > tree into it? Because looking at the AST pictures, it seems that with
> > a relatively simple tree walker we could build the same queries as the
> > current standard lucene query parser. And it would be much simpler and
> > flexible. Does it bring something new? I have a feeling I miss
> > something...
> >
> > Many thanks for help,
> >
> >  Roman
> >
>
>
>
> --
> - sent from my mobile
> 6176064373
>


Re: OOM issue

2011-09-13 Thread Erick Erickson
Multiple webapps will not help you; they still use the same underlying
memory. In fact, it'll make matters worse since they won't share
resources.

So the questions become:
1> Why do you have 10 cores? Putting 10 cores on the same machine
doesn't really do much. It can make lots of sense to put 10 cores on the
same machine for *indexing*, then replicate them out. But putting
10 cores on one machine in hopes of making better use of memory
isn't useful. It may be useful to just go to one core.

2> Indexing, reindexing and searching on a single machine is requiring a
lot from that machine. Really you should consider having a master/slave
setup.

3> But assuming more hardware of any sort isn't in the cards, sure, reduce
your cache sizes. Look at the cache settings (e.g. <filterCache>) in
solrconfig.xml and make them small.

4> Consider indexing with Tika via SolrJ and only sending the finished
document to Solr (a sketch follows below).
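
A rough sketch of point 4>, parsing with Tika on the client and sending only
the extracted text via SolrJ (class, URL and field names are illustrative;
Solr 3.x era APIs):

import java.io.File;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.apache.tika.Tika;

public class IndexWithTika {
  public static void main(String[] args) throws Exception {
    // extract the text client-side so Solr never has to run Tika itself
    Tika tika = new Tika();
    String text = tika.parseToString(new File(args[0]));

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", args[0]);
    doc.addField("text", text);

    CommonsHttpSolrServer server =
        new CommonsHttpSolrServer("http://localhost:8983/solr");
    server.add(doc);
    server.commit();
  }
}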

Best
Erick

On Mon, Sep 12, 2011 at 5:42 AM, Manish Bafna  wrote:
> Reducing the cache sizes is definitely going to reduce heap usage.
>
> Can you run those xlsx files separately with Tika and see if you are getting
> the OOM issue?
>
> On Mon, Sep 12, 2011 at 3:09 PM, abhijit bashetti > wrote:
>
>> I am facing the OOM issue.
>>
>> Other than increasing the RAM, can we change some other parameters to
>> avoid the OOM issue?
>>
>>
>> such as minimizing the filter cache size , document cache size etc.
>>
>> Can you suggest me some other option to avoid the OOM issue?
>>
>>
>> Thanks in advance!
>>
>>
>> Regards,
>>
>> Abhijit
>>
>


Re: Document row in solr Result

2011-09-13 Thread Erick Erickson
Not sure if it really applies, but consider the
QueryElevationComponent. It can force
the display of certain documents (identified by search term) to the
top of the results
list.
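
A sketch of the elevate.xml it is driven by, assuming the product id from
this thread is the uniqueKey:

<elevate>
  <query text="iphone">
    <!-- force this product to the top whenever "iphone" is searched -->
    <doc id="635001"/>
  </query>
</elevate>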

Best
Erick

On Mon, Sep 12, 2011 at 5:44 AM, Eric Grobler  wrote:
> Hi Pierre,
>
> Great idea, that will speed things up!
>
> Thank your very much.
>
> Regards
> Ericz
>
>
> On Mon, Sep 12, 2011 at 10:19 AM, Pierre GOSSE wrote:
>
>> Hi Eric,
>>
>> If you want a query informing one customer of its product row at any given
>> time, the easiest way is to filter on submission date greater than this
>> customer's and return the result count. If you have 500 products with an
>> earlier submission date, your row number is 501.
>>
>> Hope this helps,
>>
>> Pierre
>>
>>
>> -----Original Message-----
>> From: Eric Grobler [mailto:impalah...@googlemail.com]
>> Sent: Monday, September 12, 2011 11:00
>> To: solr-user@lucene.apache.org
>> Subject: Re: Document row in solr Result
>>
>> Hi Manish,
>>
>> Thank you for your time.
>>
>> For upselling reasons I want to inform the customer that:
>> "your product is on the last page of the search result. However, click here
>> to put your product back on the first page..."
>>
>>
>> Here is an example:
>> I have a phone with productid 635001 in the iphone category.
>> When I sort this category by submissiondate this product will be near the
>> end of the result (on row 9863 in this example).
>> At the moment I have to scan nearly 10000 rows in the client to determine
>> the position of this product.
>> Is there a more efficient way to find the position of a specific document
>> in
>> a resultset without returning the full result?
>>
>> q=category:iphone
>> fl=productid
>> sort=submissiondate desc
>> rows=10000
>>
>>  row productid submissiondate
>>   1 656569    2011-09-12 08:12
>>   2 656468    2011-09-12 08:03
>>   3 656201    2011-09-11 23:41
>> ...
>> 9863 635001    2011-08-11 17:22
>> ...
>> 9922 634423    2011-08-10 21:51
>>
>> Regards
>> Ericz
>>
>> On Mon, Sep 12, 2011 at 9:38 AM, Manish Bafna > >wrote:
>>
>> > You might not be able to find the row index.
>> > Can you post your query in detail. The kind of inputs and outputs you are
>> > expecting.
>> >
>> > On Mon, Sep 12, 2011 at 2:01 PM, Eric Grobler > > >wrote:
>> >
>> > > Hi Manish,
>> > >
>> > > Thanks for your reply - but how will that return me the row index of
>> the
>> > > original query.
>> > >
>> > > Regards
>> > > Ericz
>> > >
>> > > On Mon, Sep 12, 2011 at 9:24 AM, Manish Bafna <
>> manish.bafna...@gmail.com
>> > > >wrote:
>> > >
>> > > > fq -> filter query parameter searches within the results.
>> > > >
>> > > > On Mon, Sep 12, 2011 at 1:49 PM, Eric Grobler <
>> > impalah...@googlemail.com
>> > > > >wrote:
>> > > >
>> > > > > Hi Solr experts,
>> > > > >
>> > > > > If you have a site with products sorted by submission date, the
>> > product
>> > > > of
>> > > > > a
>> > > > > customer might be on page 1 on the first day, and then move down to
>> > > page
>> > > > x
>> > > > > as other customers submit newer entries.
>> > > > >
>> > > > > To find the row of a product you can of course run the query and
>> loop
>> > > > > through the result until you find the specific productid like:
>> > > > > q=category:myproducttype
>> > > > > fl=productid
>> > > > > sort=submissiondate desc
>> > > > > rows=10000
>> > > > >
>> > > > > But is there perhaps a more efficient way to do this? Maybe a
>> special
>> > > > > syntax
>> > > > > to search within the result.
>> > > > >
>> > > > > Thanks
>> > > > > Ericz
>> > > > >
>> > > >
>> > >
>> >
>>
>
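
(A sketch of Pierre's counting approach with the fields used earlier in this
thread; the timestamp is product 635001's submission date:

q=category:iphone&fq=submissiondate:{2011-08-11T17:22:00Z TO *}&rows=0

numFound in that response is the number of newer products, so the product's
row is numFound + 1.)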


RE: can indexing information be stored in db rather than filesystem?

2011-09-13 Thread Jaeger, Jay - DOT
Nicely put.  ;^)

-Original Message-
From: Walter Underwood [mailto:wun...@wunderwood.org] 
Sent: Tuesday, September 13, 2011 9:16 AM
To: solr-user@lucene.apache.org
Subject: Re: can indexing information be stored in db rather than filesystem?

On Sep 13, 2011, at 6:51 AM, kiran.bodigam wrote:

> As suggested, I agree that we are losing many of the benefits of Solr/Lucene,
> but I still want to store the index output (index files) in a db table. Please
> suggest the steps I need to follow to configure the db with the Solr
> engine.

The steps are:

1. write the Java code to do that
2. submit it as contrib, because it is such a bad idea that I doubt it will be 
added to the common code

wunder
--
Walter Underwood




RE: Out of memory

2011-09-13 Thread Jaeger, Jay - DOT
numDocs is not the number of documents in memory.  It is the number of 
documents currently in the index (which is kept on disk).  Same goes for 
maxDocs, except that it is a count of all of the documents that have ever been 
in the index since it was created or optimized (including deleted documents).

Your subject indicates that something is giving you some kind of Out of memory 
error.  We might better be able to help you if you provide more information 
about your exact problem.

JRJ


-Original Message-
From: Rohit [mailto:ro...@in-rev.com] 
Sent: Tuesday, September 13, 2011 2:29 PM
To: solr-user@lucene.apache.org
Subject: Out of memory

I have Solr running on a machine with 18Gb RAM, with 4 cores. One of the
cores is very big, containing 77516851 docs; the stats for the searcher are
given below:

searcherName : Searcher@5a578998 main 
caching : true 
numDocs : 77516851 
maxDoc : 77518729 
lockFactory=org.apache.lucene.store.NativeFSLockFactory@5a9c5842 
indexVersion : 1308817281798 
openedAt : Tue Sep 13 18:59:52 GMT 2011 
registeredAt : Tue Sep 13 19:00:55 GMT 2011 
warmupTime : 63139

- Is there a way to reduce the number of docs loaded into memory for
this core?

- At any given time I don't need data more than 15 days old, unless
someone queries for it explicitly. How can this be achieved?

- Will it be better to go for Solr replication or distribution if
there is little option left?

Regards,

Rohit

Mobile: +91-9901768202

About Me:   http://about.me/rohitg

 



Can index size increase when no updates/optimizes are happening?

2011-09-13 Thread Yury Kats
One of my users observed that the index size (in bytes)
increased over night. There was no indexing activity
at that time, only querying was taking place.

Running "optimize" brought the index size back down to
what it was when indexing finished the day before.

What could explain that?



Document Boost not evaluated when using standard Query Type?

2011-09-13 Thread Daniel Pötzinger
Hey all

I want to show all documents of a certain type. The documents should be 
ordered by the index time document boost.

So I expected that this would work:

/select?debugQuery=on&q=doctype:music&q.op=OR&qt=standard

But in fact every document gets the same score:

0.7306 = (MATCH) fieldWeight(doctype:music in 1), product of:
  1.0 = tf(termFreq(doctype:music)=1)
  0.7306 = idf(docFreq=37138, maxDocs=37138)
  1.0 = fieldNorm(field=doctype, doc=1)


So I am a bit confused now. When is the (index time) document boost evaluated? 
( My understanding was that at indexing time the document field values are 
multiplied by it - and that during search this results in higher scores? )

Is there a better way to get a list of all documents (matching a simple "where 
clause") sorted by document boost?

Thanks for any hints.

Daniel




Re: DIH load only selected documents with XPathEntityProcessor

2011-09-13 Thread Pulkit Singhal
This solution doesn't seem to be working for me.

I am using Solr trunk and I have the same question as Bernd with a small
twist: the field that should NOT be empty, happens to be a derived field
called price, see the config below:

<entity ...
    transformer="RegexTransformer,HTMLStripTransformer,DateFormatTransformer,script:skipRow">
  <field column="description" xpath="/rss/channel/item/description" />
  <field column="price" regex=".*\$(\d*.\d*)" sourceColName="description" />
  ...
</entity>


I have also changed the sample script to check the price field instead of
the link field that was being used as an example in this thread earlier:

<script><![CDATA[
function skipRow(row) {
    var price = row.get( 'price' );
    if ( price == null || price == '' ) {
        row.put( '$skipRow', 'true' );
    }
    return row;
}
]]></script>
Does anyone have any thoughts on what I'm missing?
Thanks!
- Pulkit

On Mon, Jan 10, 2011 at 3:06 AM, Bernd Fehling <
bernd.fehl...@uni-bielefeld.de> wrote:

> Hi Gora,
>
> thanks a lot, very nice solution, works perfectly.
> I will dig more into ScriptTransformer, seems to be very powerful.
>
> Regards,
> Bernd
>
> On 08.01.2011 14:38, Gora Mohanty wrote:
> > On Fri, Jan 7, 2011 at 12:30 PM, Bernd Fehling
> >  wrote:
> >> Hello list,
> >>
> >> is it possible to load only selected documents with
> XPathEntityProcessor?
> >> While loading docs I want to drop/skip/ignore documents with missing
> URL.
> >>
> >> Example:
> >> <documents>
> >>  <document>
> >>    <title>first title</title>
> >>    <identifier>identifier_01</identifier>
> >>    <link>http://www.foo.com/path/bar.html</link>
> >>  </document>
> >>  <document>
> >>    <title>second title</title>
> >>    <identifier>identifier_02</identifier>
> >>    <link></link>
> >>  </document>
> >> </documents>
> >>
> >> The first document should be loaded, the second document should be
> ignored
> >> because it has an empty link (should also work for missing link field).
> > [...]
> >
> > You can use a ScriptTransformer, along with $skipRow/$skipDoc.
> > E.g., something like this for your data import configuration file:
> >
> > <dataConfig>
> > <script><![CDATA[
> >   function skipRow(row) {
> >     var link = row.get( 'link' );
> >     if( link == null || link == '' ) {
> >       row.put( '$skipRow', 'true' );
> >     }
> >     return row;
> >   }
> > ]]></script>
> > <document>
> > <entity name="f" processor="FileListEntityProcessor"
> > baseDir="/home/gora/test" fileName=".*xml" newerThan="'NOW-3DAYS'"
> > recursive="true" rootEntity="false" dataSource="null">
> > <entity name="d" processor="XPathEntityProcessor"
> > forEach="/documents/document" url="${f.fileAbsolutePath}"
> > transformer="script:skipRow">
> > <field column="title" xpath="/documents/document/title" />
> > <field column="identifier" xpath="/documents/document/identifier" />
> > <field column="link" xpath="/documents/document/link" />
> > </entity>
> > </entity>
> > </document>
> > </dataConfig>
> >
> > Regards,
> > Gora
>


Re: DIH load only selected documents with XPathEntityProcessor

2011-09-13 Thread Pulkit Singhal
Oh, and I'm sure that I'm using Java 6 because the properties from the Solr
webpage spit out:

java.runtime.version = 1.6.0_26-b03-384-10M3425


On Tue, Sep 13, 2011 at 4:15 PM, Pulkit Singhal wrote:

> This solution doesn't seem to be working for me.
>
> I am using Solr trunk and I have the same question as Bernd with a small
> twist: the field that should NOT be empty, happens to be a derived field
> called price, see the config below:
>
> <entity ...
>     transformer="RegexTransformer,HTMLStripTransformer,DateFormatTransformer,script:skipRow">
>   <field column="description" xpath="/rss/channel/item/description" />
>   <field column="price" regex=".*\$(\d*.\d*)" sourceColName="description" />
> ...
> </entity>
>
> I have also changed the sample script to check the price field instead of
> the link field that was being used as an example in this thread earlier:
>
> <script><![CDATA[
> function skipRow(row) {
>     var price = row.get( 'price' );
>     if ( price == null || price == '' ) {
>         row.put( '$skipRow', 'true' );
>     }
>     return row;
> }
> ]]></script>
>
> Does anyone have any thoughts on what I'm missing?
> Thanks!
> - Pulkit
>
>
> On Mon, Jan 10, 2011 at 3:06 AM, Bernd Fehling <
> bernd.fehl...@uni-bielefeld.de> wrote:
>
>> Hi Gora,
>>
>> thanks a lot, very nice solution, works perfectly.
>> I will dig more into ScriptTransformer, seems to be very powerful.
>>
>> Regards,
>> Bernd
>>
>> On 08.01.2011 14:38, Gora Mohanty wrote:
>> > On Fri, Jan 7, 2011 at 12:30 PM, Bernd Fehling
>> >  wrote:
>> >> Hello list,
>> >>
>> >> is it possible to load only selected documents with
>> XPathEntityProcessor?
>> >> While loading docs I want to drop/skip/ignore documents with missing
>> URL.
>> >>
>> >> Example:
>> >> <documents>
>> >>  <document>
>> >>    <title>first title</title>
>> >>    <identifier>identifier_01</identifier>
>> >>    <link>http://www.foo.com/path/bar.html</link>
>> >>  </document>
>> >>  <document>
>> >>    <title>second title</title>
>> >>    <identifier>identifier_02</identifier>
>> >>    <link></link>
>> >>  </document>
>> >> </documents>
>> >>
>> >> The first document should be loaded, the second document should be
>> ignored
>> >> because it has an empty link (should also work for missing link field).
>> > [...]
>> >
>> > You can use a ScriptTransformer, along with $skipRow/$skipDoc.
>> > E.g., something like this for your data import configuration file:
>> >
>> > <dataConfig>
>> > <script><![CDATA[
>> >   function skipRow(row) {
>> >     var link = row.get( 'link' );
>> >     if( link == null || link == '' ) {
>> >       row.put( '$skipRow', 'true' );
>> >     }
>> >     return row;
>> >   }
>> > ]]></script>
>> > <document>
>> > <entity name="f" processor="FileListEntityProcessor"
>> > baseDir="/home/gora/test" fileName=".*xml" newerThan="'NOW-3DAYS'"
>> > recursive="true" rootEntity="false" dataSource="null">
>> > <entity name="d" processor="XPathEntityProcessor"
>> > forEach="/documents/document" url="${f.fileAbsolutePath}"
>> > transformer="script:skipRow">
>> > <field column="title" xpath="/documents/document/title" />
>> > <field column="identifier" xpath="/documents/document/identifier" />
>> > <field column="link" xpath="/documents/document/link" />
>> > </entity>
>> > </entity>
>> > </document>
>> > </dataConfig>
>> >
>> > Regards,
>> > Gora
>>
>
>


Managing solr machines (start/stop/status)

2011-09-13 Thread Jamie Johnson
I know this isn't a Solr-specific question, but I was wondering what
folks do in regard to managing the machines in their Solr cluster.
Are there any recommendations for how to start/stop/manage these
machines? Any suggestions would be appreciated.


DIH skipping imports with skipDoc vs skipRow

2011-09-13 Thread Pulkit Singhal
Hello,

1)  The documented explanation of skipDoc and skipRow is not enough
for me to discern the difference between them:
$skipDoc : Skip the current document. Do not add it to Solr. The
value can be String true/false.
$skipRow : Skip the current row. The document will be added with rows
from other entities. The value can be String true/false.
Can someone please elaborate and help me out with an example?

2) I am working off the Solr trunk (4.x) and nothing I do seems to
make the import for a given row/doc get skipped.
As proof I've added these tests to my data import xml and all the rows
are still getting indexed!!!
If anyone sees something wrong with my config please tell me.
Make sure to take note of the blatant use of row.put( '$skipDoc', 'true' )
in the config below.
Yet stuff still gets imported; this is beyond me. Need a fresh pair of eyes :)

<dataConfig>
<script><![CDATA[
function skipRow(row) {
    row.put( '$skipDoc', 'true' );
    return row;
}
]]></script>
<document>
<entity name="..."
url="http://www.amazon.com/gp/rss/new-releases/apparel/1040660/ref=zg_bsnr_1040660_rsslink"
processor="XPathEntityProcessor"
forEach="/rss/channel | /rss/channel/item"
transformer="RegexTransformer,HTMLStripTransformer,DateFormatTransformer,script:skipRow,TemplateTransformer">
...
</entity>
</document>
</dataConfig>

Thanks!
- Pulkit
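
On question 1, the difference only shows up when one document is assembled
from several rows, e.g. via sub-entities: $skipRow drops just the current row
and the document is still built from the remaining rows, while $skipDoc
throws the whole document away. A sketch inside a script transformer, where
badRow/badDoc stand in for whatever tests apply:

function skipRow(row) {
    if (badRow(row)) row.put('$skipRow', 'true'); // lose this row only
    if (badDoc(row)) row.put('$skipDoc', 'true'); // lose the whole document
    return row;
}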


Re: select query does not find indexed pdf document

2011-09-13 Thread Michael Dockery
Thank you for your informative reply.

I would like to start simple by combining both filename and content 
  into the same default search field
   ...which my default schema xml calls  "text"
...
text
...

also:
-case and accent insensitive
-no splits on numb3rs
-no highlights 
-text processing same for index and search

however I do like
-I like ngrams prerrably (partial/prefix word/token search)


what schema mod's would be needed?

also what curl syntax to submit/index a pdf (with filename and content combined 
into the default search field)?
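
For the curl part, the stock ExtractingRequestHandler call looks roughly like
this (assuming the example Solr on port 8983 and the id and pdf name from
earlier in this thread):

curl "http://localhost:8983/solr/update/extract?literal.id=pdfy&commit=true" \
     -F "myfile=@dmvpndeploy.pdf"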




From: Bob Sandiford 
To: Michael Dockery 
Cc: "solr-user@lucene.apache.org" 
Sent: Monday, September 12, 2011 1:38 PM
Subject: RE: select query does not find indexed pdf document

Hi, Michael.

Well, the stock answer is, 'it depends'

For example - would you want to be able to search filename without searching 
file contents, or would you always search both of them together?  If both, then 
copy both the file name and the parsed file content from the pdf into a single 
search field, and you can set that up as the default search field.

Or - what kind of processing / normalizing do you want on this data?  Case 
insensitive?  Accent insensitive?  If a 'word' contains camel case (e.g. 
TheVeryIdea), do you want that split on the case changes?  (but then watch out 
for things like "iPad")  If a 'word' contains numbers, do you want them left 
together, or separated?  Do you want stemming (where searching for 'stemming' 
would also find 'stem', 'stemmed', that sort of thing)?  Is this always 
English, or are other languages involved?  Do you want the text processing 
to be the same for indexing vs searching?  Do you want to be able to find hits 
based on the first few characters of a term?  (ngrams)

Do you want to be able to highlight text segments where the search terms were 
found?

probably you want to read up on the various tokenizers and filters that are 
available.  Do some prototyping and see how it looks.

Here's a starting point: 
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

Basically, there is no 'one size fits all' here.  Part of the power of Solr / 
Lucene is its configurability to achieve the results your business case calls 
for.  Part of the drawback of Solr / Lucene - especially for new folks - is its 
configurability to achieve the results your business case calls for. :)

Anyone got anything else to suggest for Michael?

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com

From: Michael Dockery [mailto:dockeryjava...@yahoo.com]
Sent: Monday, September 12, 2011 1:18 PM
To: Bob Sandiford
Subject: Re: select query does not find indexed pdf document

thank you.  that worked.

Any tips for   very   very  basic setup of the schema xml?
   or is the default basic enough?

I basically only want to search on
        filename   and    file contents


From: Bob Sandiford 
To: "solr-user@lucene.apache.org" ; Michael 
Dockery 
Sent: Monday, September 12, 2011 10:04 AM
Subject: RE: select query does not find indexed pdf document

Um - looks like you specified your id value as "pdfy", which is reflected in 
the results from the "*:*" query, but your id query is searching for "vpn", 
hence no matches...

What does this query yield?

http://www/SearchApp/select/?q=id:pdfy

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | 
bob.sandif...@sirsidynix.com
www.sirsidynix.com

> -Original Message-
> From: Michael Dockery 
> [mailto:dockeryjava...@yahoo.com]
> Sent: Monday, September 12, 2011 9:56 AM
> To: solr-user@lucene.apache.org
> Subject: Re: select query does not find indexed pdf document
>
> http://www/SearchApp/select/?q=id:vpn
>
> yields this:
>
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">15</int>
>     <lst name="params">
>       <str name="q">id:vpn</str>
>     </lst>
>   </lst>
>   <result name="response" numFound="0" start="0"/>
> </response>
>
>
> *
>
>  http://www/SearchApp/select/?q=*:*
>
> yields this:
>
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">16</int>
>     <lst name="params">
>       <str name="q">*.*</str>
>     </lst>
>   </lst>
>   <result name="response" numFound="1" start="0">
>     <doc>
>       <str name="...">doc</str>
>       <str name="...">application/pdf</str>
>       <str name="id">pdfy</str>
>       <date name="...">2011-05-20T02:08:48Z</date>
>       <str name="...">dmvpndeploy.pdf</str>
>     </doc>
>   </result>
> </response>
>
>
> From: Jan Høydahl <jan@cominvent.com>
> To: solr-user@lucene.apache.org; Michael Dockery <dockeryjava...@yahoo.com>
> Sent: Monday, September 12, 2011 4:59 AM
> Subject: Re: select query does not find indexed pdf document
>
> Hi,
>
> What do you get from a query http://www/SearchApp/select/?q=*:* or
> http://www/SearchApp/select/?q=id:vpn ?
> You may not have mapped the fields correctly to your schema?
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
>
> On 12. sep. 2011, at 02:12, Michael Dockery wrote:
>
> > I am new to solr.
> >
> > I tried to uploa

Re: Document Boost not evaluated when using standard Query Type?

2011-09-13 Thread Chris Hostetter

: I want to show all documents with of a certain type. The documents 
: should be ordered by the index time document boost.

...

: But in fact every document gets the same score:
: 
: 0.7306 = (MATCH) fieldWeight(doctype:music in 1), product of:
:   1.0 = tf(termFreq(doctype:music)=1)
:   0.7306 = idf(docFreq=37138, maxDocs=37138)
:   1.0 = fieldNorm(field=doctype, doc=1)

Index boosts are folded into the fieldNorm.  By the looks of it, you are 
using omitNorms="true" on the field "doctype".

: Is there a better way to get a list of all documents (matching a simple 
: "where clause) sorted by documents boost?

fieldNorms are very coarse.  In my opinion, if you have a 
"weighting" you want to use to affect score sort, it's better to index 
that weight as a numeric field, and explicitly factor it into the score 
using a function query...

q={!boost b=yourWeightField v=$qq}&qq=doctype:music

More info...

https://lucene.apache.org/solr/api/org/apache/solr/search/BoostQParserPlugin.html
http://www.lucidimagination.com/blog/2011/06/20/solr-powered-isfdb-part-10/
https://github.com/lucidimagination/isfdb-solr/commit/75f830caa1a11fd97ab48d6428096cf63f53cb3b

-Hoss


where is the SOLR_HOME ?

2011-09-13 Thread ahmad ajiloo
Hi
This page (
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ICUTokenizerFactory
) says:
"Note: to use this filter, see solr/contrib/analysis-extras/README.txt for
instructions on which jars you need to add to your SOLR_HOME/lib"
I can't find "SOLR_HOME/lib"!
1- Is it "apache-solr-3.3.0\example\solr"? There is no directory named lib
there.
I created an "example/solr/lib" directory, copied the jar files into it, and
tested these expressions in solrconfig.xml:

<lib dir="..." />
<lib dir="..." />
 (for more assurance!!!)
but it doesn't work and still gives the following errors!

2- or "apache-solr-3.3.0\"? There is no directory named lib there.
3- or "apache-solr-3.3.0\example"? There is a "lib" directory. I copied the 4
libraries in "solr/contrib/analysis-extras/
" to "apache-solr-3.3.0\example\lib", but some errors occur when loading the page "
http://localhost:8983/solr/admin":

I use Nutch to crawl the web and fetch web pages. I send Nutch's data
to Solr for indexing. According to the Nutch tutorial (
http://wiki.apache.org/nutch/NutchTutorial#A6._Integrate_Solr_with_Nutch) I
should copy Nutch's schema.xml to the conf directory of Solr.
So I added all of my required analyzers, like "ICUNormalizer2FilterFactory", to
this new schema.xml.


This is the schema.xml (I added the bold text to this file):

[most of the schema.xml was stripped by the mail archive; only the closing
elements survived:]

<uniqueKey>id</uniqueKey>
<defaultSearchField>content</defaultSearchField>

---

And these are errors in loading http://localhost:8983/solr/admin/

org.apache.solr.common.SolrException:
Error loading class 'solr.ICUFoldingFilterFactory'
at 
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:389)
at 
org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:404)
at 
org.apache.solr.util.plugin.AbstractPluginLoader.create(AbstractPluginLoader.java:83)
at 
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:140)
at org.apache.solr.schema.IndexSchema.readAnalyzer(IndexSchema.java:941)
at org.apache.solr.schema.IndexSchema.access$100(IndexSchema.java:62)
at org.apache.solr.schema.IndexSchema$1.create(IndexSchema.java:450)
at org.apache.solr.schema.IndexSchema$1.create(IndexSchema.java:435)
at 
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:140)
at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:480)
at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:125)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:461)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:316)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:207)
at 
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:130)
at 
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:94)
at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at 
org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:713)
at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
at 
org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1282)
at 
org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:518)
at 
org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:499

Re: Document Boost not evaluated when using standard Query Type?

2011-09-13 Thread Daniel Pötzinger
Thanks, that helped!

On Sep 14, 2011, at 4:56 PM, Chris Hostetter wrote:

> 
> 
> fieldNorms are very coarse.  In my opinion, if you have a 
> "weighting" you want to use to affect score sort, it's better to index 
> that weight as a numeric field, and explicitly factor it into the score 
> using a function query...
> 
>   q={!boost b=yourWeightField v=$qq}&qq=doctype:music
> 
> More info...
> 
> https://lucene.apache.org/solr/api/org/apache/solr/search/BoostQParserPlugin.html
> http://www.lucidimagination.com/blog/2011/06/20/solr-powered-isfdb-part-10/
> https://github.com/lucidimagination/isfdb-solr/commit/75f830caa1a11fd97ab48d6428096cf63f53cb3b
> 
> -Hoss



Re: Document Boost not evaluated when using standard Query Type?

2011-09-13 Thread Daniel Pötzinger
> 
> fieldNorms are very coarse.  In my opinion, if you have a 
> "weighting" you want to use to affect score sort, it's better to index 
> that weight as a numeric field, and explicitly factor it into the score 
> using a function query...

I see that in this use case this makes the most sense - thanks.

But why are fieldNorms in general very coarse?

Thanks,
Daniel



DIH delta last_index_time

2011-09-13 Thread Maria Vazquez
Hi,
How do you handle the situation where the time on the server running Solr
doesn't match the time in the database?
I'm using the last_index_time saved by Solr in the delta query, checking it
against the lastModifiedDate field in the database, but the times are not in
sync, so I might lose some changes.
Can we use something else other than last_index_time? Maybe something like
last_pk or something.
Thanks in advance.
Maria
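
For context, a sketch of where last_index_time normally enters a delta import
(table and column names are made up apart from lastModifiedDate):

<entity name="item" pk="id"
        query="SELECT * FROM item"
        deltaQuery="SELECT id FROM item
                    WHERE lastModifiedDate &gt; '${dataimporter.last_index_time}'"
        deltaImportQuery="SELECT * FROM item
                          WHERE id = '${dataimporter.delta.id}'"/>

One common workaround when clocks differ is to subtract a safety margin from
that timestamp on the database side, at the cost of occasionally re-indexing
a few rows twice.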



RE: Out of memory

2011-09-13 Thread Rohit
Thanks Jaeger.

Actually I am storing twitter streaming data in the core, so the indexing rate
is about 12 tweets (docs)/second. The same Solr instance contains 3 other
cores, but these cores are not very heavy. Now the twitter core has become
very large (77516851 docs) and it's taking a long time to query (mostly facet
queries based on date and string fields).

After about 18-20 hours Solr goes out of memory, and the thread dump
doesn't show anything. How can I improve this besides adding more RAM to
the system?



Regards,
Rohit
Mobile: +91-9901768202
About Me: http://about.me/rohitg

-Original Message-
From: Jaeger, Jay - DOT [mailto:jay.jae...@dot.wi.gov] 
Sent: 13 September 2011 21:06
To: solr-user@lucene.apache.org
Subject: RE: Out of memory

numDocs is not the number of documents in memory.  It is the number of
documents currently in the index (which is kept on disk).  Same goes for
maxDocs, except that it is a count of all of the documents that have ever
been in the index since it was created or optimized (including deleted
documents).

Your subject indicates that something is giving you some kind of Out of
memory error.  We might better be able to help you if you provide more
information about your exact problem.

JRJ


-Original Message-
From: Rohit [mailto:ro...@in-rev.com] 
Sent: Tuesday, September 13, 2011 2:29 PM
To: solr-user@lucene.apache.org
Subject: Out of memory

I have Solr running on a machine with 18Gb RAM, with 4 cores. One of the
cores is very big, containing 77516851 docs; the stats for the searcher are
given below:

searcherName : Searcher@5a578998 main 
caching : true 
numDocs : 77516851 
maxDoc : 77518729 
lockFactory=org.apache.lucene.store.NativeFSLockFactory@5a9c5842 
indexVersion : 1308817281798 
openedAt : Tue Sep 13 18:59:52 GMT 2011 
registeredAt : Tue Sep 13 19:00:55 GMT 2011 
warmupTime : 63139

- Is there a way to reduce the number of docs loaded into memory for
this core?

- At any given time I don't need data more than 15 days old, unless
someone queries for it explicitly. How can this be achieved?

- Will it be better to go for Solr replication or distribution if
there is little option left?

Regards,

Rohit

Mobile: +91-9901768202

About Me:   http://about.me/rohitg

 



EofException with Solr in Jetty

2011-09-13 Thread Michael Szalay
Hi all

Sometimes we have this error in our system. We are running Solr 3.1.0 
on Jetty 7.2.2.

Does anyone have an idea how to tune this?

14:41:05,693 | ERROR | qtp283504850-36 | SolrDispatchFilter | 
apache.solr.common.SolrException 151 | 154 - 
mvn_ch.basis06.eld.indexer_ch.basis06.eld.indexer.solrserver_0.1-SNAPSHOT_war - 
0 | org.eclipse.jetty.io.EofException
at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:149)
at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:96)
at 
org.apache.solr.common.util.FastOutputStream.flushBuffer(FastOutputStream.java:184)
at org.apache.solr.common.util.JavaBinCodec.marshal(JavaBinCodec.java:89)
at 
org.apache.solr.response.BinaryResponseWriter.write(BinaryResponseWriter.java:46)
at 
org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:336)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:265)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1322)
at 
org.ops4j.pax.web.service.internal.WelcomeFilesFilter.doFilter(WelcomeFilesFilter.java:169)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1322)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:473)
at 
org.ops4j.pax.web.service.jetty.internal.HttpServiceServletHandler.doHandle(HttpServiceServletHandler.java:70)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:516)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:929)
at 
org.ops4j.pax.web.service.jetty.internal.HttpServiceContext.doHandle(HttpServiceContext.java:116)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:403)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:184)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:864)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
at 
org.ops4j.pax.web.service.jetty.internal.JettyServerHandlerCollection.handle(JettyServerHandlerCollection.java:72)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:114)
at org.eclipse.jetty.server.Server.handle(Server.java:352)
at 
org.eclipse.jetty.server.HttpConnection.handleRequest(HttpConnection.java:596)
at 
org.eclipse.jetty.server.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:1051)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:590)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:212)
at org.eclipse.jetty.server.HttpConnection.handle(HttpConnection.java:426)
at 
org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:508)
at 
org.eclipse.jetty.io.nio.SelectChannelEndPoint.access$000(SelectChannelEndPoint.java:34)
at 
org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:40)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:451)
at java.lang.Thread.run(Thread.java:662)

-- 
Michael Szalay
Senior Software Engineer

basis06 AG, Birkenweg 61, CH-3013 Bern - Fon +41 31 311 32 22
http://www.basis06.ch - source of smart business 



solr 1.4 highlighting issue

2011-09-13 Thread Dmitry Kan
Hello list,

Not sure how many of you are still using solr 1.4 in production, but here is
an issue with highlighting, that we've noticed:

The query is:

(drill AND ships) OR rigs


Excerpt from the highlighting list:

Within the fleet of 27 floating <em>rigs</em> (semisubmersibles and
drillships) are 21 deepwater <em>drilling</em>

Why did solr highlight "drilling" even though there is no "ships" in the
text?

--
Regards,

Dmitry Kan