How to map a database table for faceted search?

2011-09-23 Thread Chorherr Nikolaus
Hi All!

We are working with Solr for the first time and have a simple data model:

Entity Person (column surname) has 1:n Attribute (column name) has 1:n
Value (column text)

We need faceted search on the Values belonging to an Attribute's name, not on
Attribute:name itself. E.g. if an Attribute of a person has name=hobby, we would
like to have something like ... "facet=true&facet.name=hobby" and get back
all related Values with counts. (We do not need a "facet.name=name" that gets back
all distinct values of the name column of Attribute.)

How do we have to map our database, define our documents and/or define our schema?
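A sketch of one common mapping with SolrJ: flatten each Person into a single document and give every Attribute name its own multivalued field. The dynamic field attr_*_ss here is hypothetical and assumes a matching multivalued string dynamicField is declared in schema.xml:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrInputDocument;

public class AttributeFacetSketch {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

        // One document per Person; each Attribute becomes a multivalued
        // dynamic field holding all of its Values.
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "person-1");          // hypothetical unique key
        doc.addField("surname", "Mustermann");
        doc.addField("attr_hobby_ss", "chess");  // Attribute name=hobby, Value text
        doc.addField("attr_hobby_ss", "climbing");
        solr.add(doc);
        solr.commit();

        // The facet runs on the Values: facet=true&facet.field=attr_hobby_ss
        SolrQuery q = new SolrQuery("*:*");
        q.setFacet(true);
        q.addFacetField("attr_hobby_ss");
        QueryResponse rsp = solr.query(q);
        System.out.println(rsp.getFacetField("attr_hobby_ss").getValues());
    }
}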

Any help is highly appreciated - Thx in advance

Niki


How to delete all of the Indexed data?

2011-09-23 Thread ahmad ajiloo
Hi all
I sent my data from Nutch to Solr for indexing and searching. Now I want to
delete all of the indexed data sent from Nutch. Can anyone help me?
thanks


Re: Slow autocomplete(terms)

2011-09-23 Thread roySolr
Thanks for helping me so far,

Yes, I have seen the EdgeNGrams possibility. Correct me if I'm wrong, but I
thought it isn't possible to do infix searches with EdgeNGrams? Like "chest"
giving the suggestion "manchester".





Re: How to delete all of the Indexed data?

2011-09-23 Thread Héctor Trujillo
Hi, I suppose this isn't exactly what you mean, but I'll leave it here because
it could help you.

Is this what you need?



Using SolrJ, I delete all the rows of the index with this command:

solr.deleteByQuery("id:*");



But as you need to delete only the rows inserted from Nutch, maybe this still
helps you.
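For deleting everything in the index, the usual SolrJ idiom is the match-all query; a minimal sketch, assuming the default example URL:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class DeleteAll {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
        solr.deleteByQuery("*:*"); // match-all query; faster than the id:* wildcard
        solr.commit();             // the deletes are invisible until committed
    }
}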



Regards,

Hector

2011/9/23 ahmad ajiloo 

> Hi all
> I sent my data from Nutch to Solr for indexing and searching. Now I want to
> delete all of the indexed data sent from Nutch. Can anyone help me?
> thanks
>


Re: StopWords coming in Top 10 terms despite using StopFilterFactory

2011-09-23 Thread Pranav Prakash
> You've got CommonGramsFilterFactory and StopFilterFactory both using
> stopwords.txt, which is a confusing configuration.  Normally you'd want one
> or the other, not both ... but if you did legitimately have both, you'd want
> them to each use a different wordlist.
>

Maybe I am wrong, but my intention in using both of them is: first, I want
to use phrase queries, so I used CommonGramsFilterFactory. Secondly, I don't
want those stopwords in my index, so I have used StopFilterFactory to remove
them.



>
> The commongrams filter turns each found occurrence of a word in the file
> into two tokens - one prepended with the token before it, one appended with
> the token after it.  If it's the first or last term in a field, it only
> produces one token.  When it gets to the stopfilter, the combined terms no
> longer match what's in stopwords.txt, so no action is taken.
>
> If I had to guess, what you are seeing in the top 10 terms is the
> concatenation of your most common stopword with another word.  If it were
> English, I would guess that to be "of_the" or something similar.  If my
> guess is wrong, then I'm not sure what's going on, and some cut/paste of
> what you're actually seeing might be in order.


term    frequency
to      26164
and     25804
the     25566
of      25022
a       24918
in      24590
for     23646
n       23588
with    23055
is      22510



>  Did you delete and do a full reindex after you changed your schema?
>

Yup I did that a couple of times


>
> Thanks,
> Shawn
>
>
*Pranav Prakash*

"temet nosce"



RE: OutOfMemoryError coming from TermVectorsReader

2011-09-23 Thread Anand.Nigam
Thanks Otis,

I am able to show the results such that the last match (500 characters around 
the match) in the log file is shown highlighted. I can try creating multiple 
documents from one log file to see if it improves the performance.

Can anything else be done to reduce the heap size?

Anand Nigam
RBS Global Banking & Markets
Office: +91 124 492 5506   

-Original Message-
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] 
Sent: 23 September 2011 09:35
To: solr-user@lucene.apache.org
Subject: Re: OutOfMemoryError coming from TermVectorsReader

Anand,

But do you really want the whole log file to be a single Solr document (from a
cursory look at the thread it seems that is the case)?  Why not break up a log
file into multiple documents? E.g. each log message could be one Solr document.
Not only will that solve your memory issues, but I think it also makes more
sense if the intention is for a person to do a search and then look at the
matched log messages - much easier if you point a person to a short log doc
than a giant one through which the person then has to do a manual find.
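A minimal sketch of that split, assuming one log line per Solr document and hypothetical field names (id, file, line, message) that would need matching entries in schema.xml:

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class LogIndexer {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
        BufferedReader in = new BufferedReader(new FileReader(args[0]));
        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        String line;
        int n = 0;
        while ((line = in.readLine()) != null) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", args[0] + "#" + n); // file + line number as unique key
            doc.addField("file", args[0]);
            doc.addField("line", n++);
            doc.addField("message", line);
            batch.add(doc);
            if (batch.size() == 1000) {            // keep client-side memory bounded
                solr.add(batch);
                batch.clear();
            }
        }
        if (!batch.isEmpty()) solr.add(batch);
        solr.commit();
        in.close();
    }
}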

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem 
search :: http://search-lucene.com/


- Original Message -
> From: "anand.ni...@rbs.com" 
> To: solr-user@lucene.apache.org
> Cc: 
> Sent: Thursday, September 22, 2011 11:56 PM
> Subject: RE: OutOfMemoryError coming from TermVectorsReader
> 
> Hi,
> 
> I am trying to index application log files and some database tables.
> Sizes of the log files range from 1 MB to 100 MB. Database tables also
> have a few thousand rows.
> 
> I have used termvector highlighter for the content of the log files as 
> mentioned
> below:
> 
> Heap size : 10 GB
> OS: Linux, 64 bit
> Solr version : 3.4.0
> 
> Thanks & Regards
> Anand
> 
> 
> 
> Anand Nigam
> RBS Global Banking & Markets
> Office: +91 124 492 5506
> 
> -Original Message-
> From: Glen Newton [mailto:glen.new...@gmail.com]
> Sent: 19 September 2011 16:52
> To: solr-user@lucene.apache.org
> Subject: Re: OutOfMemoryError coming from TermVectorsReader
> 
> Please include information about your heap size (and other Java command line
> arguments) as well as platform OS (version, swap size, etc), Java version,
> and underlying hardware (RAM, etc) for us to better help you.
> 
> From the information you have given, increasing your heap size should help.
> 
> Thanks,
> Glen
> 
> http://zzzoot.blogspot.com/
> 
> 
> On Mon, Sep 19, 2011 at 1:34 AM,   wrote:
>>  Hi,
>> 
>>  I am new to solr. I am trying to index text documents of large size. On
>>  searching from indexed documents I am getting the following
>>  OutOfMemoryError. Please help me in resolving this issue.
>> 
>>  The field which stores file content is configured in schema.xml as below:
>> 
>>  <field name="Content" type="..." indexed="true" stored="true"
>>         omitNorms="true" termVectors="true" termPositions="true"
>>         termOffsets="true" />
>> 
>>  and Highlighting is configured as below:
>> 
>>  <str name="hl">on</str>
>>  <str name="hl.fl">${all.fields.list}</str>
>>  <str name="hl.fragsize">500</str>
>>  <str name="f.Content.hl.useFastVectorHighlighter">true</str>
>> 
>> 
>>  2011-09-16 09:38:45.763 [http-thread-pool-9091(5)] ERROR -
>>  java.lang.OutOfMemoryError: Java heap space
>>         at org.apache.lucene.index.TermVectorsReader.readTermVector(TermVectorsReader.java:503)
>>         at org.apache.lucene.index.TermVectorsReader.get(TermVectorsReader.java:263)
>>         at org.apache.lucene.index.TermVectorsReader.get(TermVectorsReader.java:284)
>>         at org.apache.lucene.index.SegmentReader.getTermFreqVector(SegmentReader.java:759)
>>         at org.apache.lucene.index.DirectoryReader.getTermFreqVector(DirectoryReader.java:510)
>>         at org.apache.solr.search.SolrIndexReader.getTermFreqVector(SolrIndexReader.java:234)
>>         at org.apache.lucene.search.vectorhighlight.FieldTermStack.<init>(FieldTermStack.java:83)
>>         at org.apache.lucene.search.vectorhighlight.FastVectorHighlighter.getFieldFragList(FastVectorHighlighter.java:175)
>>         at org.apache.lucene.search.vectorhighlight.FastVectorHighlighter.getBestFragments(FastVectorHighlighter.java:166)
>>         at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByFastVectorHighlighter(DefaultSolrHighlighter.java:509)
>>         at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:376)
>>         at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:116)
>>         at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
>>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>>         at org.apache.solr.core.SolrCore.execute(SolrCore.

Re: Solr wildcard searching

2011-09-23 Thread Doug McKenzie
I'm using EdgeNGrams to do the same thing rather than wildcard searches.
More info here:

http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/

Make sure your search phrase is enclosed in quotes as well so it's
treated as a phrase rather than 2 words.
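For example, building the quoted phrase on the client side with SolrJ, where customer_name_auto is a hypothetical field that would be analyzed with EdgeNGrams at index time:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.util.ClientUtils;

public class PrefixPhraseQuery {
    public static void main(String[] args) {
        String userInput = "John Do";
        // Quote the whole input so it is handled as one phrase instead of
        // two separate terms, and escape any query-syntax characters typed in.
        SolrQuery q = new SolrQuery(
                "customer_name_auto:\"" + ClientUtils.escapeQueryChars(userInput) + "\"");
        System.out.println(q.getQuery());
    }
}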


On 23/09/2011 03:08, jaystang wrote:

Hey guys,
Very new to solr.  I'm using the data import handler to pull customer data
out of my database and index it.  All works great so far.  Now I'm trying to
query against a specific field and I seem to be struggling with doing a
wildcard search. See below.

I have several indexed documents with a "customer_name" field containing
"John Doe".  I have a UI that contains a listing of this indexed data as
well as a keyword filter field (filter as you type).  So I would like, when
the user starts typing "J", "John Doe" to return, and for "Jo", "John Doe"
to return, "Joh"... etc, etc...

I've tried the following:

Search: customer_name:Joh*
Returns: The correct "John Doe" record

Search: customer_name:John Do*
Returns: No results (nothing returns w/ 2 words since I don't have the
string in quotes.)

Search: customer_name:"Joh*"
Returns: No results

Search: customer_name:"John Do*"
Returns: No results

Search: customer_NAME:"John Doe*"
Returns: The correct "John Doe" record

I feel like I'm close, only issue is when there are multiple words.

Any advice would be appreciated.

Thanks!



A fieldType for an address street

2011-09-23 Thread Nicolas Martin

Hi solR users!

I'd like to search my client database; in particular, I need
to find clients by their address (e.g. "100 avenue des champs élysée").

Does anyone know a good fieldType for storing my addresses that would let me
search clients by address easily?



thank you all






Re: Snippets and Boundaryscanner in Highlighter

2011-09-23 Thread O. Klein
The regex fragmenter showed that there was enough content to show multiple
snippets.

The amount of snippets has no effect on any of the types of breakIterator;
only fragsize has an effect.

Or is this highlighter not supporting multiple snippets?







Re: Snippets and Boundaryscanner in Highlighter

2011-09-23 Thread Koji Sekiguchi

(11/09/23 20:03), O. Klein wrote:

The regex fragmenter showed that there was enough content to show multiple
snippets.

The amount of snippets has no effect on any of the types of breakIterator;
only fragsize has an effect.

Or is this highlighter not supporting multiple snippets?


This highlighter supports multiple snippets (as I showed you in my first
reply).

koji
--
Check out "Query Log Visualizer" for Apache Solr
http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html
http://www.rondhuit.com/en/


Re: Snippets and Boundaryscanner in Highlighter

2011-09-23 Thread O. Klein
OK, I found the problem was in our new interface.

Your feedback made me look deeper. Thanx.



RE: Solritas issue in opening files

2011-09-23 Thread jagdish2011
Erik

I tried your solution, but it still does not open the files in the solr results. I
am pasting my files; take a look and see if something can be corrected:

data-config.xml:




 
 
 





 
 
 







Solrconfig.xml:

   
 synonyms.txt 
 anotherfile.txt 
   
 


Please suggest.
thanks 
Jagdish





Re: solr 1.4 facet.limit behaviour in merging from several shards

2011-09-23 Thread Dmitry Kan
Hi,

OK, if SOLR-2403, being related to the bug I described, has been fixed in
Solr 3.4, then we are safe, since we are in the process of migration. Is it
possible to verify this somehow? Is the FacetComponent class the one I should
start checking from? Can you give any other pointers?

OK, for the use case. Two things:

1. Regarding your confusion about docid and facets sometimes being 2: without
revealing the business details, I can tell you that we split our incoming
documents into parts and store them as separate documents in solr. But we use
a common docid for all of them in order to restore the original document when
needed. However, we would like to facet on them in the normal solr facet meaning.
2. We have shards because of the volume of data. As previously said, we use
logical sharding. Since each shard belongs to a certain period of time (not
intersecting with other shards), we know in advance that a user query most
likely hits only specific shards. So we didn't want the other shards searching
in vain with the solr merger waiting for them. We implemented a router by
extending the solr 1.4 source code. The router currently operates with 15
shards.

Let me know, if I can help with more details on our use case.

Dmitry

On Wed, Sep 21, 2011 at 8:56 AM, Chris Hostetter
wrote:

>
> : with the setup you describe, there's no way i can imagine executing a
> : search that results in constraints being returned that come from multiple
> : shards with some constraints being "missing" from the middle of the list,
> : near the border of values for that field that signify a change in shard.
>
> I take that back ... after replying to your email i noticed the "1.4" in
> the subject, and it occurred to me there might have been a bug in 1.4 that
> was since fixed.  doing a quick search i realized there *is* an open bug
> that relates to your problem that i just wasn't aware of...
>
> https://issues.apache.org/jira/browse/SOLR-2403
>
> there's some discussion in there that explains how the problem is
> happening -- the crux being that because the mincount can't be checked
> until the per-shard requests are merged, it's possible to miss some values
> when doing index (aka: "lex") ordering.
>
> the bad news is, a general fix sounds kind of hard.  the good news is,
> that for mincount=1, the solution seems pretty straight forward -- we just
> need someone to try working up a patch and some test cases.
>
> FWIW: I'd still like to hear more about your usecase, because i still
> think there might be a better alternative...
>
> : Furthermore: what you are doing is a *really* wacky use of faceting ... i
> : have honestly never seen anything like it, hence my question about the
> : significance of the "docid1" and "docid2" in your example field values --
> : can you elaborate on what these values mean, and how you are ultimately
> : using the facet results you get back?  because i am seriously curious as
> : to your use case, and more than a little suspicious that there might be a
> : simpler and more efficient way to solve whatever use case you have...
> :
> : https://people.apache.org/~hossman/#xyproblem
> : XY Problem
> :
> : Your question appears to be an "XY Problem" ... that is: you are dealing
> : with "X", you are assuming "Y" will help you, and you are asking about
> "Y"
> : without giving more details about the "X" so that we can understand the
> : full issue.  Perhaps the best solution doesn't involve "Y" at all?
> : See Also: http://www.perlmonks.org/index.pl?node_id=542341
>
>
> -Hoss
>



-- 
Regards,

Dmitry Kan


Re: DIH error when nested db datasource and file data source

2011-09-23 Thread Shalin Shekhar Mangar
On Sun, Sep 18, 2011 at 11:47 AM, abhayd  wrote:

> hi gora,
> Query works, and if i remove the xml data load, indexing works fine too
>
> Problem seem to be with this
>
>   baseDir="${solr.solr.home}" fileName=".xml"
>recursive="false" rootEntity="true"
> dataSource="video_datasource">
>
> forEach="/gvpVideoMetaData/mediaItem[@media_id='${topic_tree.topic_id}']"
>url="${f.fileAbsolutePath}"
>>
>
> Basically, how would i get details about an id fetched from the db using xpath
> from an xml file.
>
>
Is the following path mentioned in the error message correct?
C:\Projects\att\solr\catalogSOLRSearch.ear\SOLR-HOME\live_meta.xml

Also, the actual cause of the exception will be in the logs. Can you
paste the complete stack trace?

-- 
Regards,
Shalin Shekhar Mangar.


Solrj - when a request fails

2011-09-23 Thread Walter Closenfleight
I have a java program which sends thousands of Solr XML files up to Solr
using the following code. It works fine until there is a problem with one of
the Solr XML files. The code fails on the solrServer.request(up) line, but
it does not throw an exception my application can catch, so it cannot
recover and my whole application dies.

I've fixed this individual file that made it fail, but want to better trap
these so my application does not die.

Thanks for any insight you can provide. Java code and log below-


// ... start of a loop to process each file removed ...

try {

   String xml = read(filename);
   DirectXmlRequest up = new DirectXmlRequest( "/update", xml );

   solrServer.request( up );
   solrServer.commit();

} catch (SolrServerException e) {
   log.warn("Exception: "+ e.toString());
   throw new MyException(e);
} catch (IOException e) {
   log.warn("Exception: "+ e.toString());
   throw new MyException(e);
}
DEBUG >> "[\n]" - (Wire.java:70)
DEBUG Request body sent - (EntityEnclosingMethod.java:508)
DEBUG << "HTTP/1.1 400 Bad Request[\r][\n]" - (Wire.java:70)
DEBUG << "HTTP/1.1 400 Bad Request[\r][\n]" - (Wire.java:70)
DEBUG << "Server: Apache-Coyote/1.1[\r][\n]" - (Wire.java:70)
DEBUG << "Content-Type: text/html;charset=utf-8[\r][\n]" - (Wire.java:70)
DEBUG << "Content-Length: 1271[\r][\n]" - (Wire.java:70)
DEBUG << "Date: Fri, 23 Sep 2011 12:08:05 GMT[\r][\n]" - (Wire.java:70)
DEBUG << "Connection: close[\r][\n]" - (Wire.java:70)
DEBUG << "[\r][\n]" - (Wire.java:70)
DEBUG << "Apache Tomcat/6.0.29 - Error
report
HTTP Status 400 - Unexpected character 'x' (code 120) in
prolog; expected '<'[\n]" - (Wire.java:70)
DEBUG << " at [row,col {unknown-source}]: [3,1]type Status reportmessage
Unexpected character 'x' (code 120) in prolog; expected '<'[\n]" -
(Wire.java:70)
DEBUG << " at [row,col {unknown-source}]: [3,1]description
" - (Wire.java:84)
DEBUG << "The request sent by the client was syntactically incorrect
(Unexpected character 'x' (code 120) in prolog; expected '<'[\n]" -
(Wire.java:70)
DEBUG << " at [row,col {unknown-source}]: [3,1]).Apache Tomcat/6.0.29" -
(Wire.java:84)
DEBUG Should close connection in response to directive: close -
(HttpMethodBase.java:1008)


Re: DataImportHandler using new connection on each query

2011-09-23 Thread Shalin Shekhar Mangar
On Sat, Sep 3, 2011 at 1:29 AM, Chris Hostetter wrote:

>
> : I am not sure if current version has this, but  DIH used to reload
> : connections after some idle time
> :
> : if (currTime - connLastUsed > CONN_TIME_OUT) {
> :     synchronized (this) {
> :         Connection tmpConn = factory.call();
> :         closeConnection();
> :         connLastUsed = System.currentTimeMillis();
> :         return conn = tmpConn;
> :     }
> : }
> :
> : Where CONN_TIME_OUT = 10 seconds
>
> ...oh wow.  i saw the CONN_TIME_OUT constant but i thought (foolishly
> evidently) that CONN was "connect", as in a timeout on creating a
> connection, not a timeout on how long DIH is willing to use a perfectly
> good connection.
>
> I honestly can't make heads or tails of why that code would exist.
>
> Noble? Shalin?  what's the point of throwing away a connection that's been
> in use for more than 10 seconds?
>
>
Hoss, as others have noted, DIH throws away connections which have been idle
for more than the timeout value (10 seconds). The jdbc standard way of
checking for a valid connection is not implemented or incorrectly
implemented by many drivers. So, either you can execute a query and get an
exception and try to determine if the exception was a case of an invalid
connection (which again is sometimes different from driver to driver) or
take the easy way out and throw away connections idle for more than 10
seconds, which is what we went for.

-- 
Regards,
Shalin Shekhar Mangar.


Re: StopWords coming in Top 10 terms despite using StopFilterFactory

2011-09-23 Thread Shawn Heisey

On 9/23/2011 1:45 AM, Pranav Prakash wrote:
Maybe I am wrong. But my intentions of using both of them is - first I 
want to use phrase queries so used CommonGramsFilterFactory. Secondly, 
I dont want those stopwords in my index, so I have used 
StopFilterFactory to remove them. 


CommonGrams is not necessary for phrase queries.  If you have a
super-dense index with very large documents, it will reduce the amount
of memory used by Solr, which can make phrase queries faster.  It comes at a
large expense in disk space because your index gets considerably larger.  The
cost trade-off in index size vs. memory usage may not be worth it.  For
an index like the Hathi Trust, the tradeoff is worthwhile.



term    frequency
to      26164
and     25804
the     25566
of      25022
a       24918
in      24590
for     23646
n       23588
with    23055
is      22510


Is this typical of your production index size, or just a test?  With 
numbers this low, neither commongrams nor stopfilter is really 
necessary.  I suspect that these are probably test numbers, though.





  Did you delete and do a full reindex after you changed your schema?


Yup I did that a couple of times


I don't know what's going on here, but it sounds like your config might
not be saying what you think it's saying.  It might be a good idea to
include your entire schema.xml and the name of the field that you are
looking at for term frequency.


Thanks,
Shawn



RE: Solrj - when a request fails

2011-09-23 Thread Gunther, Andrew
All the solr methods look like they should throw those 2 exceptions.
Have you tried the DirectXmlRequest method?

up.process(solrServer);

  public UpdateResponse process( SolrServer server ) throws 
SolrServerException, IOException
  {
long startTime = System.currentTimeMillis();
UpdateResponse res = new UpdateResponse();
res.setResponse( server.request( this ) );
res.setElapsedTime( System.currentTimeMillis()-startTime );
return res;
  }

From: Walter Closenfleight [walter.p.closenflei...@gmail.com]
Sent: Friday, September 23, 2011 8:55 AM
To: solr-user@lucene.apache.org
Subject: Solrj - when a request fails

I have a java program which sends thousands of Solr XML files up to Solr
using the following code. It works fine until there is a problem with one of
the Solr XML files. The code fails on the solrServer.request(up) line, but
it does not throw an exception, my application therefore cannot catch it and
recover, and my whole application dies.

I've fixed this individual file that made it fail, but want to better trap
these so my application does not die.

Thanks for any insight you can provide. Java code and log below-


// ... start of a loop to process each file removed ...

try {

   String xml = read(filename);
   DirectXmlRequest up = new DirectXmlRequest( "/update", xml );

   solrServer.request( up );
   solrServer.commit();

} catch (SolrServerException e) {
   log.warn("Exception: "+ e.toString());
   throw new MyException(e);
} catch (IOException e) {
   log.warn("Exception: "+ e.toString());
   throw new MyException(e);
}
DEBUG >> "[\n]" - (Wire.java:70)
DEBUG Request body sent - (EntityEnclosingMethod.java:508)
DEBUG << "HTTP/1.1 400 Bad Request[\r][\n]" - (Wire.java:70)
DEBUG << "HTTP/1.1 400 Bad Request[\r][\n]" - (Wire.java:70)
DEBUG << "Server: Apache-Coyote/1.1[\r][\n]" - (Wire.java:70)
DEBUG << "Content-Type: text/html;charset=utf-8[\r][\n]" - (Wire.java:70)
DEBUG << "Content-Length: 1271[\r][\n]" - (Wire.java:70)
DEBUG << "Date: Fri, 23 Sep 2011 12:08:05 GMT[\r][\n]" - (Wire.java:70)
DEBUG << "Connection: close[\r][\n]" - (Wire.java:70)
DEBUG << "[\r][\n]" - (Wire.java:70)
DEBUG << "Apache Tomcat/6.0.29 - Error
report
HTTP Status 400 - Unexpected character 'x' (code 120) in
prolog; expected '<'[\n]" - (Wire.java:70)
DEBUG << " at [row,col {unknown-source}]: [3,1]type Status reportmessage
Unexpected character 'x' (code 120) in prolog; expected '<'[\n]" -
(Wire.java:70)
DEBUG << " at [row,col {unknown-source}]: [3,1]description
" - (Wire.java:84)
DEBUG << "The request sent by the client was syntactically incorrect
(Unexpected character 'x' (code 120) in prolog; expected '<'[\n]" -
(Wire.java:70)
DEBUG << " at [row,col {unknown-source}]: [3,1]).Apache Tomcat/6.0.29" -
(Wire.java:84)
DEBUG Should close connection in response to directive: close -
(HttpMethodBase.java:1008)


Re: Slow autocomplete(terms)

2011-09-23 Thread Otis Gospodnetic
Roy,

Use something other than Nabble, or quote the previous email, to help people
keep track of what your problem is/was about.
Yes, with edge ngrams you won't be able to do infix searches, but are you
sure you want that?  People typically don't miss/skip the beginning of a word...

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/


- Original Message -
> From: roySolr 
> To: solr-user@lucene.apache.org
> Cc: 
> Sent: Friday, September 23, 2011 3:15 AM
> Subject: Re: Slow autocomplete(terms)
> 
> Thanks for helping me so far,
> 
> Yes i have seen the edgeNGrams possiblity. Correct me if i'm wrong, but i
> thought it isn't possible to do infix searches with edgeNGrams? Like 
> "chest"
> gives suggestion "manchester".
> 
> 
> 


Re: How to delete all of the Indexed data?

2011-09-23 Thread Otis Gospodnetic
Hi Ahmad,

Ah, that's a FAQ! :)
http://search-lucene.com/?q=delete+all+documents&fc_project=Solr&fc_type=wiki


Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/


- Original Message -
> From: ahmad ajiloo 
> To: solr-user@lucene.apache.org
> Cc: 
> Sent: Friday, September 23, 2011 3:10 AM
> Subject: How to delete all of the Indexed data?
> 
> Hi all
> I sent my data from Nutch to Solr for indexing and searching. Now I want to
> delete all of the indexed data sent from Nutch. Can anyone help me?
> thanks
>


Re: DIH error when nested db datasource and file data source

2011-09-23 Thread abhayd
hi,
I am not getting the exception anymore - I had an issue with the database.

But now the real problem I always have:
now that I can fetch IDs from the database, how would I fetch the corresponding
data for an ID in an xml file?

So after getting DB info from jdbcsource I use xpath processor like this,
but it does not work.

   

I even tried using a script transformer, but "row" in the script transformer has
scope limited to entity "f". If this is nested under another entity, you can't
access top-level variables with "row".





Re: levenshtein ranked results

2011-09-23 Thread Otis Gospodnetic
Hi Roland,

I did this:
http://search-lucene.com/?q=sort+by+function&fc_project=Solr&fc_type=wiki


Which took me to this:
http://wiki.apache.org/solr/FunctionQuery#Sort_By_Function


And further on that page you'll find strdist function documented:
http://wiki.apache.org/solr/FunctionQuery#strdist


I hope this helps.
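Putting those pages together, a small SolrJ sketch; the field name title is hypothetical, and edit selects the Levenshtein distance (strdist returns a similarity, so descending order puts the closest matches first):

import org.apache.solr.client.solrj.SolrQuery;

public class StrdistSort {
    public static void main(String[] args) {
        SolrQuery q = new SolrQuery("*:*");
        // strdist(target, field, edit): higher value means more similar
        q.set("sort", "strdist(\"jaguar\", title, edit) desc");
        System.out.println(q);
    }
}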

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message -
> From: Roland Tollenaar 
> To: "solr-user@lucene.apache.org" 
> Cc: 
> Sent: Friday, September 23, 2011 1:50 AM
> Subject: levenshtein ranked results
> 
> Hi,
> 
> I tried an internet search to find out how to query solr to get the results 
> ranked (ordered) by levenshtein distance.
> 
> This appears to be possible, but I could not find a concrete example of how I
> would have to formulate the query, or, if it's a schema setting on a particular
> field, how to set up the schema.
> 
> I am new to solr, any help appreciated.
> 
> tia.
> 
> Roland.
>


Re: A fieldType for a address street

2011-09-23 Thread Otis Gospodnetic
Nicolas,

A text or ngram field should do it.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/


- Original Message -
> From: Nicolas Martin 
> To: solr-user@lucene.apache.org
> Cc: 
> Sent: Friday, September 23, 2011 5:55 AM
> Subject: A fieldType for a address street
> 
> Hi solR users!
> 
> I'd like to make research on my client database, in particular, i need to 
> find client by their address (ex : "100 avenue des champs élysée")
> 
> Does anyone know a good fieldType to store my addresses to enable me to 
> search 
> client by address easily ?
> 
> 
> thank you all
>


Re: autosuggest combination of data from documents and popular queries

2011-09-23 Thread abhayd
hi

My requirement is:
I have a list of popular search terms in a database:

searchterm | count
------------------
mango      | 100

Consider that I have only one term in that table, mango. I use edgengram and
put it in an auto_complete field in the solr index with the count.

If the user starts typing "m", I will show "mango" as a suggestion. The other
suggestions should come from the document titles in the index. So if I have a
document in the index with title "Man ..", the suggestions would be
"mango"
"man"

Now say the user starts typing "sa"; now I don't have a popular search term,
so it should show suggestions from the index data.

Is this doable? Any options?



Re: Slow autocomplete(terms)

2011-09-23 Thread abhayd
Yes, it is possible:
http://www.medihack.org/2011/03/01/autocompletion-autosuggestion-using-solr/

Since I'm looking into autosuggest, I came across that info while doing
research.
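To see why plain NGrams allow infix matching while EdgeNGrams only cover prefixes, here is a minimal Lucene 3.x sketch (assuming the Lucene analysis jars that ship with Solr) printing the 5-grams of "manchester", among which "chest" appears:

import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.ngram.NGramTokenFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class InfixGramsDemo {
    public static void main(String[] args) throws Exception {
        // EdgeNGramTokenFilter would only emit prefixes (m, ma, man, ...);
        // NGramTokenFilter also emits interior substrings, enabling infix matches.
        TokenStream ts = new NGramTokenFilter(
                new WhitespaceTokenizer(Version.LUCENE_34, new StringReader("manchester")),
                5, 5); // 5-grams only, to keep the output short
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        while (ts.incrementToken()) {
            System.out.println(term.toString()); // manch, anche, nches, chest, ...
        }
        ts.close();
    }
}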




Re: Solrj - when a request fails

2011-09-23 Thread Walter Closenfleight
I tried that with the same results. You would think I would get the
exception back from Solr so I could trap it; instead I lose all the other
requests after it.
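One likely explanation, offered as an assumption to verify: on an HTTP 400 the SolrJ client throws the unchecked org.apache.solr.common.SolrException, which neither checked catch block sees. A sketch of the loop body with an extra catch so one bad file does not kill the whole run:

// import org.apache.solr.common.SolrException;
try {
    String xml = read(filename);
    DirectXmlRequest up = new DirectXmlRequest("/update", xml);
    solrServer.request(up);
    solrServer.commit();
} catch (SolrServerException e) {
    log.warn("Exception: " + e.toString());
} catch (IOException e) {
    log.warn("Exception: " + e.toString());
} catch (SolrException e) {
    // unchecked; covers HTTP 4xx/5xx responses such as the 400 above
    log.warn("Skipping bad document " + filename + ": " + e.getMessage());
    // fall through to the next file instead of rethrowing
}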

On Fri, Sep 23, 2011 at 8:33 AM, Gunther, Andrew  wrote:

> All the solr methods look like they should throw those 2 exceptions.
> Have you tried the DirectXmlRequest method?
>
> up.process(solrServer);
>
>  public UpdateResponse process( SolrServer server ) throws
> SolrServerException, IOException
>  {
>long startTime = System.currentTimeMillis();
>UpdateResponse res = new UpdateResponse();
>res.setResponse( server.request( this ) );
>res.setElapsedTime( System.currentTimeMillis()-startTime );
>return res;
>  }
> 
> From: Walter Closenfleight [walter.p.closenflei...@gmail.com]
> Sent: Friday, September 23, 2011 8:55 AM
> To: solr-user@lucene.apache.org
> Subject: Solrj - when a request fails
>
>  I have a java program which sends thousands of Solr XML files up to Solr
> using the following code. It works fine until there is a problem with one
> of
> the Solr XML files. The code fails on the solrServer.request(up) line, but
> it does not throw an exception, my application therefore cannot catch it
> and
> recover, and my whole application dies.
>
> I've fixed this individual file that made it fail, but want to better trap
> these so my application does not die.
>
> Thanks for any insight you can provide. Java code and log below-
>
>
> // ... start of a loop to process each file removed ...
>
> try {
>
>   String xml = read(filename);
>   DirectXmlRequest up = new DirectXmlRequest( "/update", xml );
>
>   solrServer.request( up );
>   solrServer.commit();
>
> } catch (SolrServerException e) {
>   log.warn("Exception: "+ e.toString());
>   throw new MyException(e);
> } catch (IOException e) {
>   log.warn("Exception: "+ e.toString());
>   throw new MyException(e);
> }
> DEBUG >> "[\n]" - (Wire.java:70)
> DEBUG Request body sent - (EntityEnclosingMethod.java:508)
> DEBUG << "HTTP/1.1 400 Bad Request[\r][\n]" - (Wire.java:70)
> DEBUG << "HTTP/1.1 400 Bad Request[\r][\n]" - (Wire.java:70)
> DEBUG << "Server: Apache-Coyote/1.1[\r][\n]" - (Wire.java:70)
> DEBUG << "Content-Type: text/html;charset=utf-8[\r][\n]" - (Wire.java:70)
> DEBUG << "Content-Length: 1271[\r][\n]" - (Wire.java:70)
> DEBUG << "Date: Fri, 23 Sep 2011 12:08:05 GMT[\r][\n]" - (Wire.java:70)
> DEBUG << "Connection: close[\r][\n]" - (Wire.java:70)
> DEBUG << "[\r][\n]" - (Wire.java:70)
> DEBUG << "Apache Tomcat/6.0.29 - Error
> report
> HTTP Status 400 - Unexpected character 'x' (code 120) in
> prolog; expected '<'[\n]" - (Wire.java:70)
> DEBUG << " at [row,col {unknown-source}]: [3,1] noshade="noshade">type Status reportmessage
> Unexpected character 'x' (code 120) in prolog; expected '<'[\n]" -
> (Wire.java:70)
> DEBUG << " at [row,col {unknown-source}]:
> [3,1]description
> " - (Wire.java:84)
> DEBUG << "The request sent by the client was syntactically incorrect
> (Unexpected character 'x' (code 120) in prolog; expected '<'[\n]" -
> (Wire.java:70)
> DEBUG << " at [row,col {unknown-source}]: [3,1]). noshade="noshade">Apache Tomcat/6.0.29" -
> (Wire.java:84)
> DEBUG Should close connection in response to directive: close -
> (HttpMethodBase.java:1008)
>


Re: strategy for post-processing answer set

2011-09-23 Thread Fred Zimmerman
This seems to be out of date. I am running Solr 3.4

* the file structure of apachehome/contrib is different and I don't see
velocity anywhere underneath
* the page referenced below only talks about Solr 1.4 and 4.0

?

On Thu, Sep 22, 2011 at 19:51, Markus Jelsma wrote:

> Hi,
>
> Solr supports the Velocity template engine and support is very good. Ideal
> for generating properly formatted output from the search engine. There's a
> clustering example and it's easy to format documents indexed by Nutch.
>
> http://wiki.apache.org/solr/VelocityResponseWriter
>
> Cheers
>
> > > Hi,
> >
> > I would like to take the HTML documents that are the result of a Solr
> > search and combine them into a single HTML document that combines the
> body
> > text of each individual document.  What is a good strategy for this? I am
> > crawling with Nutch and Carrot2 for clustering.
> > Fred
>


Solr 3.4 Problem with integrating Query Parser Plug In

2011-09-23 Thread Ahson Iqbal
Hi

I have indexed some 1M documents, just for performance testing. I have written
a query parser plugin; when I add it to the solr lib folder under the tomcat
webapps folder and try to load the solr admin page, it keeps on loading, and
when I delete the query parser plugin's jar file from lib it works fine. The
jar file works fine with solr 3.3 and also with solr 1.4.

please help.

Regards
Ahsan


Re: strategy for post-processing answer set

2011-09-23 Thread Fred Zimmerman
OK, answered my own question: found the Velocity response writer in
solrconfig.xml.  Next question:

where does velocity look for its templates?

-
Subscribe to the Nimble Books Mailing List  http://eepurl.com/czS- for
monthly updates



On Fri, Sep 23, 2011 at 11:57, Fred Zimmerman  wrote:

> This seems to be out of date. I am running Solr 3.4
>
> * the file structure of apachehome/contrib is different and I don't see
> velocity anywhere underneath
> * the page referenced below only talks about Solr 1.4 and 4.0
>
> ?
>
> On Thu, Sep 22, 2011 at 19:51, Markus Jelsma 
> wrote:
>
>> Hi,
>>
>> Solr supports the Velocity template engine and support is very good. Ideal
>> for generating properly formatted output from the search engine. There's a
>> clustering example and it's easy to format documents indexed by Nutch.
>>
>> http://wiki.apache.org/solr/VelocityResponseWriter
>>
>> Cheers
>>
>> > > Hi,
>> >
>> > I would like to take the HTML documents that are the result of a Solr
>> > search and combine them into a single HTML document that combines the
>> body
>> > text of each individual document.  What is a good strategy for this? I
>> am
>> > crawling with Nutch and Carrot2 for clustering.
>> > Fred
>>
>
>


RE: JdbcDataSource and threads

2011-09-23 Thread Vazquez, Maria (STM)
Thanks Rahul.
Are you using 3.3 or 3.4? I'm on 3.3 right now
I will try the patch today
Thanks again,
Maria


-Original Message-
From: Rahul Warawdekar [mailto:rahul.warawde...@gmail.com] 
Sent: Thursday, September 22, 2011 12:46 PM
To: solr-user@lucene.apache.org
Subject: Re: JdbcDataSource and threads

Hi,

Have you applied the patch that is provided with the Jira you mentioned
?
https://issues.apache.org/jira/browse/SOLR-2233

Please apply the patch and check if you are getting the same exceptions.
It has worked well for me till now.

On Thu, Sep 22, 2011 at 3:17 PM, Vazquez, Maria (STM) <
maria.vazq...@dexone.com> wrote:

> Hi!
>
> So as of 3.4 JdbcDataSource doesn't work with threads, correct?
>
>
>
> https://issues.apache.org/jira/browse/SOLR-2233
>
>
>
> I'm using Microsoft SQL Server, my data-config.xml has a lot of very
> complex SQL queries and it takes a long time to index.
>
> I'm migrating from Lucene to Solr and the Lucene code uses threads so
it
> takes little time to index, now in Solr if I add threads=xx to my
> rootEntity I get lots of errors about connections being closed.
>
>
>
> Thanks a lot,
>
> Maria
>
>


-- 
Thanks and Regards
Rahul A. Warawdekar


Re: How to delete all of the Indexed data?

2011-09-23 Thread tamanjit.bin...@yahoo.co.in
Just another point worth mentioning here, though it's related to Nutch and
not Solr:

If you want to re-crawl and try to get new data into the index, you have to
remove data from nutch's crawl folder (the default for nutch) too. Only then
will you get freshly crawled data (not to be confused with re-crawled data).



Re: DIH error when nested db datasource and file data source

2011-09-23 Thread Pulkit Singhal
A few thoughts:

1) If you place the script transformer method on the entity named "x"
and then pass the ${topic_tree.topic_id} to that as an argument, then
shouldn't you have everything you need to work with x's row? Even if
you can't look up at the parent, all you needed to know was the
topic_id and based on that you can edit or not edit x's row ...
shouldn't that be sufficient to get you what you need to do?

2) Regarding the manner in which you are trying to use the following
xpath syntax:
forEach="/gvpVideoMetaData/mediaItem[@media_id='${topic_tree.topic_id}']"
There are two other closely related thread that I've come across:
(a) 
http://lucene.472066.n3.nabble.com/DIH-Enhance-XPathRecordReader-to-deal-with-body-FLATTEN-true-and-body-h1-td2799005.html
(b) 
http://lucene.472066.n3.nabble.com/using-DIH-with-mets-alto-file-sets-td1926642.html

They both seemed to want to use the full power of XPath like you do
and I think that in a roundabout way they were told utilize the xsl
attribute to make up for what the XPath was lacking by default.

Here are some choice words by Lance that I've extracted out for you:

"XPathEntityProcessor parses a very limited XPath syntax. However, you
can add an XSL script as an attribute, and this somehow gets called
instead."

- Lance


There is an option somewhere to use the full XML DOM implementation
for using xpaths. The purpose of the XPathEP is to be as simple and
dumb as possible and handle most cases: RSS feeds and other open
standards.
Search for xsl(optional)
http://wiki.apache.org/solr/DataImportHandler#Configuration_in_data-config.xml-1

- Lance

I hope you can make some sense of this, I'm no expert, but just
thought I'd offer my 2 cts.

On Fri, Sep 23, 2011 at 9:21 AM, abhayd  wrote:
> hi
> I am not getting exception anymore.. I had issue with database
>
> But now real problem i always have ...
> Now that i can fetch ID's from database how would i fetch correcponding data
> from ID in xm file
>
> So after getting DB info from jdbcsource I use xpath processor like this,
> but it does not work.
>  baseDir="${solr.solr.home}" fileName=".xml"
>                recursive="false" rootEntity="true"
> dataSource="video_datasource">
>           
> forEach="/gvpVideoMetaData/mediaItem[@media_id='${topic_tree.topic_id}']"
>            url="${f.fileAbsolutePath}"
>                    >
>
> I even tried using script transformer but "row" in script transformer has
> scope limited to entity "f"  If this is nested under another entity u cant
> access top level variables with "row" .
>
>
>


Re: JdbcDataSource and threads

2011-09-23 Thread Rahul Warawdekar
I am using Solr 3.1.
But you can surely try the patch with 3.3.

On Fri, Sep 23, 2011 at 1:35 PM, Vazquez, Maria (STM) <
maria.vazq...@dexone.com> wrote:

> Thanks Rahul.
> Are you using 3.3 or 3.4? I'm on 3.3 right now
> I will try the patch today
> Thanks again,
> Maria
>
>
> -Original Message-
> From: Rahul Warawdekar [mailto:rahul.warawde...@gmail.com]
> Sent: Thursday, September 22, 2011 12:46 PM
> To: solr-user@lucene.apache.org
> Subject: Re: JdbcDataSource and threads
>
> Hi,
>
> Have you applied the patch that is provided with the Jira you mentioned
> ?
> https://issues.apache.org/jira/browse/SOLR-2233
>
> Please apply the patch and check if you are getting the same exceptions.
> It has worked well for me till now.
>
> On Thu, Sep 22, 2011 at 3:17 PM, Vazquez, Maria (STM) <
> maria.vazq...@dexone.com> wrote:
>
> > Hi!
> >
> > So as of 3.4 JdbcDataSource doesn't work with threads, correct?
> >
> >
> >
> > https://issues.apache.org/jira/browse/SOLR-2233
> >
> >
> >
> > I'm using Microsoft SQL Server, my data-config.xml has a lot of very
> > complex SQL queries and it takes a long time to index.
> >
> > I'm migrating from Lucene to Solr and the Lucene code uses threads so
> it
> > takes little time to index, now in Solr if I add threads=xx to my
> > rootEntity I get lots of errors about connections being closed.
> >
> >
> >
> > Thanks a lot,
> >
> > Maria
> >
> >
>
>
> --
> Thanks and Regards
> Rahul A. Warawdekar
>



-- 
Thanks and Regards
Rahul A. Warawdekar


Solr Size Estimator (JIRA#3435) . . .

2011-09-23 Thread CRB

Hi,

In working through some updates for the Solr Size Estimator, I have 
found a number of gaps in the Solr Wiki. I've Google'd to a fair degree 
on each of these and either found nothing or an insufficient explanation.


In particular, for each of the following I'm looking for:
A) An explanation of what it is
B) How to use it or estimate its size

Topics:
1) fieldValueCache
2) RamBufferSize
3) Transient Factor
4) Average number of Bytes per Term
5) Cache Key Average Size (Bytes)
6) Average QueryResultKey size (in bytes)

Appreciate any input, so I can update the Solr Wiki as needed.

C


Re: strategy for post-processing answer set

2011-09-23 Thread Erik Hatcher
conf/velocity by default.  See Solr's example configuration.  

   Erik

On Sep 23, 2011, at 12:37, Fred Zimmerman  wrote:

> ok, answered my own question, found velocity rw in solrconfig.xml.  next
> question:
> 
> where does velocity look for its templates?
> 
> -
> Subscribe to the Nimble Books Mailing List  http://eepurl.com/czS- for
> monthly updates
> 
> 
> 
> On Fri, Sep 23, 2011 at 11:57, Fred Zimmerman  wrote:
> 
>> This seems to be out of date. I am running Solr 3.4
>> 
>> * the file structure of apachehome/contrib is different and I don't see
>> velocity anywhere underneath
>> * the page referenced below only talks about Solr 1.4 and 4.0
>> 
>> ?
>> 
>> On Thu, Sep 22, 2011 at 19:51, Markus Jelsma 
>> wrote:
>> 
>>> Hi,
>>> 
>>> Solr supports the Velocity template engine and support is very good. Ideal
>>> for generating properly formatted output from the search engine. There's a
>>> clustering example and it's easy to format documents indexed by Nutch.
>>> 
>>> http://wiki.apache.org/solr/VelocityResponseWriter
>>> 
>>> Cheers
>>> 
> Hi,
 
 I would like to take the HTML documents that are the result of a Solr
 search and combine them into a single HTML document that combines the
>>> body
 text of each individual document.  What is a good strategy for this? I
>>> am
 crawling with Nutch and Carrot2 for clustering.
 Fred
>>> 
>> 
>> 


RE: JdbcDataSource and threads

2011-09-23 Thread Vazquez, Maria (STM)
I tried the patch
(https://issues.apache.org/jira/secure/attachment/12481497/SOLR-2233-001.patch)
and now I get these errors. Am I doing something wrong? I am using MS SQL
Server.

23 Sep 2011 12:26:14,418 [org.apache.solr.handler.dataimport.ThreadedEntityProcessorWrapper] Exception in entity : keyword_atts_expedite
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: SELECT String AS keyword_atts_expedite FROM tbObjectProperty WITH (NOLOCK) WHERE FKObject = '97F67CC9-B25D-416F-801C-863B5D5E4911' AND state = 0 AND FKProperty = 4226 ORDER BY String
        at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
        at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:251)
        at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:206)
        at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39)
        at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
        at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
        at org.apache.solr.handler.dataimport.ThreadedEntityProcessorWrapper.nextRow(ThreadedEntityProcessorWrapper.java:84)
        at org.apache.solr.handler.dataimport.DocBuilder$EntityRunner.runAThread(DocBuilder.java:445)
        at org.apache.solr.handler.dataimport.DocBuilder$EntityRunner.run(DocBuilder.java:398)
        at org.apache.solr.handler.dataimport.DocBuilder$EntityRunner.runAThread(DocBuilder.java:465)
        at org.apache.solr.handler.dataimport.DocBuilder$EntityRunner.run(DocBuilder.java:398)
        at org.apache.solr.handler.dataimport.DocBuilder$EntityRunner.runAThread(DocBuilder.java:465)
        at org.apache.solr.handler.dataimport.DocBuilder$EntityRunner.access$000(DocBuilder.java:352)
        at org.apache.solr.handler.dataimport.DocBuilder$EntityRunner$1.run(DocBuilder.java:405)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.sql.SQLException: Network error IOException: Address already in use: connect
        at net.sourceforge.jtds.jdbc.ConnectionJDBC2.<init>(ConnectionJDBC2.java:410)
        at net.sourceforge.jtds.jdbc.ConnectionJDBC3.<init>(ConnectionJDBC3.java:50)
        at net.sourceforge.jtds.jdbc.Driver.connect(Driver.java:184)
        at org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:157)
        at org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:124)
        at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:238)
        ... 17 more

-Original Message-
From: Rahul Warawdekar [mailto:rahul.warawde...@gmail.com] 
Sent: Friday, September 23, 2011 11:21 AM
To: solr-user@lucene.apache.org
Subject: Re: JdbcDataSource and threads

I am using Solr 3.1.
But you can surely try the patch with 3.3.

On Fri, Sep 23, 2011 at 1:35 PM, Vazquez, Maria (STM) <
maria.vazq...@dexone.com> wrote:

> Thanks Rahul.
> Are you using 3.3 or 3.4? I'm on 3.3 right now
> I will try the patch today
> Thanks again,
> Maria
>
>
> -Original Message-
> From: Rahul Warawdekar [mailto:rahul.warawde...@gmail.com]
> Sent: Thursday, September 22, 2011 12:46 PM
> To: solr-user@lucene.apache.org
> Subject: Re: JdbcDataSource and threads
>
> Hi,
>
> Have you applied the patch that is provided with the Jira you
mentioned
> ?
> https://issues.apache.org/jira/browse/SOLR-2233
>
> Please apply the patch and check if you are getting the same
exceptions.
> It has worked well for me till now.
>
> On Thu, Sep 22, 2011 at 3:17 PM, Vazquez, Maria (STM) <
> maria.vazq...@dexone.com> wrote:
>
> > Hi!
> >
> > So as of 3.4 JdbcDataSource doesn't work with threads, correct?
> >
> >
> >
> > https://issues.apache.org/jira/browse/SOLR-2233
> >
> >
> >
> > I'm using Microsoft SQL Server, my data-config.xml has a lot of very
> > complex SQL queries and it takes a long time to index.
> >
> > I'm migrating from Lucene to Solr and the Lucene code uses threads
so
> it
> > takes little time to index, now in Solr if I add threads=xx to my
> > rootEntity I get lots of errors about connections being closed.
> >
> >
> >
> > Thanks a lot,
> >
> > Maria
> >
> >
>
>
> --
> Thanks and Regards
> Rahul A. Warawdekar
>



-- 
Thanks and Regards
Rahul A. Warawdekar


Search query doesn't work in solr/browse panel

2011-09-23 Thread hadi
When I create a query like "something&fl=content" in solr/browse, the "&" and
"=" in the URL are converted to %26 and %3D and no results come back, but it
works in solr/admin advanced search and also directly in the URL bar. How can
I solve this problem?  Thanks
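Most likely the /browse search box submits everything typed into it as the single q parameter, so the "&" and "=" are percent-encoded as part of the query text. Passed as separate parameters directly on the URL (as observed above), they work, e.g. against the default example setup:

http://localhost:8983/solr/select?q=something&fl=content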



Re: two cores but have single result set in solr

2011-09-23 Thread Ken Krugler

On Sep 23, 2011, at 2:03pm, hadi wrote:

> I have two cores with separate schema and index, but i want to have a single
> result set in solr/browse.

If they have different schemas, how would you combine results from the two?

If they have the same schemas, then you can define a third core with a
different conf dir, and in that separate conf/solrconfig.xml you can set up a
request handler that just dispatches to the two real cores.

-- Ken

--
Ken Krugler
+1 530-210-6378
http://bixolabs.com
custom big data solutions & training
Hadoop, Cascading, Mahout & Solr





Re: two cores but have single result set in solr

2011-09-23 Thread hadi
I index my files with solrj and crawl my sites with nutch 1.3. As you
know, I have to overwrite the nutch schema over the solr schema in order to
view the results in solr/browse; in this case I should define two
cores, but I want a single result set, or for the user to be able to search
both cores' indexes at the same time.



what are the disadvantages of using dynamic fields?

2011-09-23 Thread Jason Toy
Hi all,

 I'd like to know what the specific disadvantages are of using dynamic
fields in my schema. About half of my fields are dynamic, but I could
move all of them to be static fields. Will my searches run faster? If there
are no disadvantages, can I just make all my fields dynamic?

Jason


Re: two cores but have single result set in solr

2011-09-23 Thread Yury Kats
On 9/23/2011 6:00 PM, hadi wrote:
> I index my files with solrj and crawl my sites with nutch 1.3 ,as you
> know, i have to overwrite the nutch schema on solr schema in order to
> have view the result in solr/browse, in this case i should define two
> cores,but i want have single result or the user can search into both
> core indexes at the same time

Can you not use the 'shards' parameter and specify both cores there?
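A sketch of that, with hypothetical host, port, and core names; note that distributed search assumes compatible schemas (at least a shared uniqueKey), matching the caveat above:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class TwoCoreSearch {
    public static void main(String[] args) throws Exception {
        // Query one core, but fan the search out across both.
        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr/core0");
        SolrQuery q = new SolrQuery("jaguar");
        q.set("shards", "localhost:8983/solr/core0,localhost:8983/solr/core1");
        System.out.println(solr.query(q).getResults().getNumFound());
    }
}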



Re: Troubleshooting OOM in DIH w/ FileListEntityProcessor and XPathEntityProcessor

2011-09-23 Thread Erick Erickson
The first thing I'd try is just tweaking the Xmx parameter on the invocation,
java -Xmx2048M -jar start.jar

Second option: Play with your <ramBufferSizeMB> setting in solrconfig.xml
and lower it substantially, although I'm not quite sure how DIH interacts
with that.
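
For example, in solrconfig.xml (a sketch; 8 MB is only an illustrative value,
down from the 32 MB default):

  <indexDefaults>
    <ramBufferSizeMB>8</ramBufferSizeMB>
  </indexDefaults>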

Gotta rush, so sorry this is so terse.

Best
Erick

On Tue, Sep 20, 2011 at 2:45 PM, Pulkit Singhal  wrote:
> Hello Everyone,
>
> I need help in:
> (a) figuring out the causes of OutOfMemoryError (OOM) when I run Data
> Import Handler (DIH),
> (b) finding workarounds and fixes to get rid of the OOM issue per cause.
>
> The stacktrace is at the very bottom to avoid having your eyes glaze
> over and to prevent you from skipping this thread ;)
>
> 1) Based on the documentation so far, I would say that "batchSize"
> based control does not exist for FileListEntityProcessor or
> XPathEntityProcessor. Please correct me if I'm wrong about this.
>
> 2) The files being processed by FileListEntityProcessor range from
> 90.9 to 2.8 MB in size.
> 2.1) Is there some way to let FileListEntityProcessor bring in only
> one file at a time? Or is that the default already?
> 2.2) Is there some way to let FileListEntityProcessor stream the file
> to its nested XPathEntityProcessor?
> 2.3) If streaming a file is something that should be configured
> directly on XPathEntityProcessor, then please let me know how to do
> that as well.
>
> 3) Where are the default xms and xmx for Solr configured? Please let
> me know so I may try tweaking them for startup.
>
> 
> STACKTRACE:
> 
> SEVERE: Exception while processing: bbyopenProductsArchive document : null:
> org.apache.solr.handler.dataimport.DataImportHandlerException:
> java.lang.OutOfMemoryError: Java heap space
>        at 
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:718)
> ...
> Caused by: java.lang.OutOfMemoryError: Java heap space
>        at java.util.Arrays.copyOf(Arrays.java:2734)
>        at java.util.ArrayList.toArray(ArrayList.java:275)
>        at java.util.ArrayList.<init>(ArrayList.java:131)
>        at 
> org.apache.solr.handler.dataimport.XPathRecordReader$Node.getDeepCopy(XPathRecordReader.java:586)
> ...
> INFO: start rollback
> Sep 20, 2011 4:22:26 PM org.apache.solr.handler.dataimport.SolrWriter rollback
> SEVERE: Exception while solr rollback.
> java.lang.NullPointerException
>        at 
> org.apache.solr.update.DefaultSolrCoreState.rollbackIndexWriter(DefaultSolrCoreState.java:73)
>


Getting facet counts for 10,000 most relevant hits

2011-09-23 Thread Burton-West, Tom
If relevance ranking is working well, in theory it doesn't matter how many hits 
you get, as long as the best results show up in the first page of results.  
However, the default behavior is to show the facet values with the highest 
counts across the entire result set.  Is there a way to issue some kind of 
filter query or facet query that would show only the facet counts for the 
10,000 most relevant search results?

As an example, if you search in our full-text collection for "jaguar" you get 
170,000 hits.  If I am looking for the car rather than the OS or the animal, I 
might expect to be able to click on a facet and limit my results to the car.  
However, facets containing the word car or automobile are not in the top 5 
facets that we show.  If you click on "more" you will see "automobile 
periodicals" but not the rest of the facets containing the word automobile.  
This occurs because the facet counts are for all 170,000 hits.  The facet 
counts for at least 160,000 irrelevant hits are included (assuming only the 
top 10,000 hits are relevant).

What we would like to do is get the facet counts for the N most relevant 
documents and select the 5 or 30 facet values with the highest counts for those 
relevant documents.

Is this possible or would it require writing some lucene or Solr code?

Tom Burton-West
http://www.hathitrust.org/blogs/large-scale-search


Re: DIH error when nested db datasource and file data source

2011-09-23 Thread abhayd
Hi,
Thanks for the details. I will look into the XSL suggestion.

Any idea how I would send a parameter to the script?
As I understand it, that's the syntax for the script transformer.


--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-error-when-nested-db-datasource-and-file-data-source-tp3345664p3363762.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: JdbcDataSource and threads

2011-09-23 Thread pulkitsinghal
Seems to be a rather innocent network issue based on your stacktrace:

Caused by: java.sql.SQLException: Network error IOException: Address
already in use: connect

Can you recheck connections and retry?

Sent from my iPhone

On Sep 23, 2011, at 3:34 PM, "Vazquez, Maria (STM)" wrote:

> I tried the patch
> (https://issues.apache.org/jira/secure/attachment/12481497/SOLR-2233-001
> .patch)
> 
> And now I get these errors. Am I doing something wrong? Using MS SQL
> Server
> 
> 23 Sep 2011 12:26:14,418
> [org.apache.solr.handler.dataimport.ThreadedEntityProcessorWrapper]
> Exception in entity : keyword_atts_expedite
> org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
> execute query: SELECT String AS keyword_atts_expedite FROM
> tbObjectProperty WITH (NOLOCK) WHERE FKObject =
> '97F67CC9-B25D-416F-801C-863B5D5E4911' AND state = 0 AND FKProperty =
> 4226 ORDER BY String
>at
> org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThr
> ow(DataImportHandlerException.java:72)
>at
> org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:251)
>at
> org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource
> .java:206)
>at
> org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource
> .java:39)
>at
> org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntit
> yProcessor.java:59)
>at
> org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityP
> rocessor.java:73)
>at
> org.apache.solr.handler.dataimport.ThreadedEntityProcessorWrapper.nextRo
> w(ThreadedEntityProcessorWrapper.java:84)
>at
> org.apache.solr.handler.dataimport.DocBuilder$EntityRunner.runAThread(Do
> cBuilder.java:445)
>at
> org.apache.solr.handler.dataimport.DocBuilder$EntityRunner.run(DocBuilde
> r.java:398)
>at
> org.apache.solr.handler.dataimport.DocBuilder$EntityRunner.runAThread(Do
> cBuilder.java:465)
>at
> org.apache.solr.handler.dataimport.DocBuilder$EntityRunner.run(DocBuilde
> r.java:398)
>at
> org.apache.solr.handler.dataimport.DocBuilder$EntityRunner.runAThread(Do
> cBuilder.java:465)
>at
> org.apache.solr.handler.dataimport.DocBuilder$EntityRunner.run(DocBuilde
> r.java:398)
>at
> org.apache.solr.handler.dataimport.DocBuilder$EntityRunner.runAThread(Do
> cBuilder.java:465)
>at
> org.apache.solr.handler.dataimport.DocBuilder$EntityRunner.access$000(Do
> cBuilder.java:352)
>at
> org.apache.solr.handler.dataimport.DocBuilder$EntityRunner$1.run(DocBuil
> der.java:405)
>at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecuto
> r.java:886)
>at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.ja
> va:908)
>at java.lang.Thread.run(Thread.java:662)
> Caused by: java.sql.SQLException: Network error IOException: Address
> already in use: connect
>at
> net.sourceforge.jtds.jdbc.ConnectionJDBC2.<init>(ConnectionJDBC2.java:410)
>at
> net.sourceforge.jtds.jdbc.ConnectionJDBC3.<init>(ConnectionJDBC3.java:50)
>at net.sourceforge.jtds.jdbc.Driver.connect(Driver.java:184)
>at
> org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.
> java:157)
>at
> org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.
> java:124)
>at
> org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:238)
>... 17 more
> 
> -Original Message-
> From: Rahul Warawdekar [mailto:rahul.warawde...@gmail.com]
> Sent: Friday, September 23, 2011 11:21 AM
> To: solr-user@lucene.apache.org
> Subject: Re: JdbcDataSource and threads
> 
> I am using Solr 3.1.
> But you can surely try the patch with 3.3.
> 
> On Fri, Sep 23, 2011 at 1:35 PM, Vazquez, Maria (STM) <
> maria.vazq...@dexone.com> wrote:
> 
>> Thanks Rahul.
>> Are you using 3.3 or 3.4? I'm on 3.3 right now
>> I will try the patch today
>> Thanks again,
>> Maria
>> 
>> 
>> -Original Message-
>> From: Rahul Warawdekar [mailto:rahul.warawde...@gmail.com]
>> Sent: Thursday, September 22, 2011 12:46 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: JdbcDataSource and threads
>> 
>> Hi,
>> 
>> Have you applied the patch that is provided with the Jira you
> mentioned
>> ?
>> https://issues.apache.org/jira/browse/SOLR-2233
>> 
>> Please apply the patch and check if you are getting the same
> exceptions.
>> It has worked well for me till now.
>> 
>> On Thu, Sep 22, 2011 at 3:17 PM, Vazquez, Maria (STM) <
>> maria.vazq...@dexone.com> wrote:
>> 
>>> Hi!
>>> 
>>> So as of 3.4 JdbcDataSource doesn't work with threads, correct?
>>> 
>>> 
>>> 
>>> https://issues.apache.org/jira/browse/SOLR-2233
>>> 
>>> 
>>> 
>>> I'm using Microsoft SQL Server, my data-config.xml has a lot of very
>>> complex SQL queries and it takes a long time to index.
>>> 
>>> I'm migrating from Lucene to Solr and the Lucene code uses threads
> so
>> it
>>> takes little time to index, now in Solr if I add threads=xx to

Re: how to work with rss in solr?

2011-09-23 Thread Gora Mohanty
On Fri, Sep 23, 2011 at 11:59 AM, nagarjuna  wrote:
> Yes, Gora, I set up an RSS feed for my blog and I have the following URL for
> the RSS feed of my blog:

It would be best if you stated your exact problem up front,
rather than having to dig through to find where exactly the
issue lies.

> http://nagarjunaavula.blogspot.com/feeds/posts/default?alt=rss (you can
> check this URL). Then how do I use this URL in my Solr application? I am
> not sure about the changes needed in rss-data-config.xml. Can you please
> list the changes I need to make in the schema, solrconfig, and
> rss-data-config files?

This would depend on the structure of your blog. The only changes
you should need to make are in the DIH configuration file,
rss-data-config.xml. Please take a look at the input XML that the example
indexes, i.e., the feed from
http://twitter.com/statuses/user_timeline/ctg_ualbany.atom, at the
XPathEntityProcessor, and at the way that the xpath for various fields
is set to pick up different entries from the XML input. You will have
to set up the xpath to pick up the portions of your feed that you are
interested in. Please do follow up if you run into issues in doing that,
but please be more specific about how you modified the DIH configuration,
and what errors you are running into.
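
For instance, a minimal rss-data-config.xml sketch for an RSS 2.0 feed such
as the one above might look like the following (the column names are
assumptions and must match fields declared in your schema.xml):

  <dataConfig>
    <dataSource type="HttpDataSource" />
    <document>
      <entity name="blog"
              pk="link"
              url="http://nagarjunaavula.blogspot.com/feeds/posts/default?alt=rss"
              processor="XPathEntityProcessor"
              forEach="/rss/channel/item">
        <!-- each xpath picks one element out of every /rss/channel/item -->
        <field column="title" xpath="/rss/channel/item/title" />
        <field column="link" xpath="/rss/channel/item/link" />
        <field column="description" xpath="/rss/channel/item/description" />
      </entity>
    </document>
  </dataConfig>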

Regards,
Gora