Re: DataImportHandler dynamic fields clarification

2010-09-30 Thread David Stuart
Two things: one, are your DB columns uppercase, as this would affect the output?

Second, what does your db-data-config.xml look like?

Regards,

Dave

On 30 Sep 2010, at 03:01, harrysmith wrote:

> 
> Looking for some clarification on DIH to make sure I am interpreting this
> correctly.
> 
> I have a wide DB table with 100 columns. I'd rather not have to add 100
> entries to schema.xml and data-config.xml. I was under the impression that
> if the column name matched a dynamicField name, it would be added. I am not
> finding this to be the case; it only works when the column name is
> explicitly listed as a static field.
> 
> Example: 100 column table, columns named 'COLUMN_1, COLUMN_2 ... COLUMN_100'
> 
> If I add an explicit static field, something like:
> 
> <field name="COLUMN_1" type="string" indexed="true" stored="true"/>
> 
> to schema.xml, and don't reference the column in a data-config entity/field
> tag, it gets imported, as expected.
> 
> However, if I use a dynamic field instead:
> 
> <dynamicField name="COLUMN_*" type="string" indexed="true" stored="true"/>
> 
> it does not get imported into Solr; I would expect it would.
> 
> 
> Is this the expected behavior?
> -- 
> View this message in context: 
> http://lucene.472066.n3.nabble.com/DataImportHandler-dynamic-fields-clarification-tp1606159p1606159.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Solr Cluster Indexing data question

2010-09-30 Thread ZAROGKIKAS,GIORGOS
Hi there solr experts


I have a Solr cluster with two nodes and separate index files for each
node.

Node1 is the master.
Node2 is the slave.


Node1 is the one that I index my data on and replicate to Node2.

How can I index my data at both nodes simultaneously?
Is there any specific setup?


The problem is that when Node1 is down and I index the data from Node2,
Solr creates backup index folders like "index.20100929060410",
which reduces the free space on my hard disk.

Thanks in advance








General hardware requirements?

2010-09-30 Thread Nicholas Swarr

Our index is about 10 gigs in size with about 3 million documents.  The
documents range in size from dozens to hundreds of kilobytes.  Per week, we
only get about 50k queries.
Currently, we use Lucene and have one box for our indexer that has 32 gigs of
memory and an 8-core CPU.  We have a pair of search boxes that have about 16
gigs of RAM apiece and 8-core CPUs.  They hardly break a sweat.
We're looking to adopt Solr.  Should we consider changing our configuration at
all?  Are there any other hardware considerations for adopting Solr?
Thanks,
Nick

Re: General hardware requirements?

2010-09-30 Thread Gora Mohanty
On Thu, Sep 30, 2010 at 8:09 PM, Nicholas Swarr  wrote:
>
> Our index is about 10 gigs in size with about 3 million documents.  The 
> documents range in size from dozens to hundreds of kilobytes.  Per week, we 
> only get about 50k queries.
> Currently, we use lucene and have one box for our indexer that has 32 gigs of 
> memory and an 8 core CPU.  We have a pair of search boxes that have about 16 
> gigs of ram a piece and 8 core CPUs.  They hardly break a sweat.
> We're looking to adopt Solr.  Should we consider changing our configuration 
> at all?  Are there any other hardware considerations for adopting Solr?
[...]

On the face of it, your machines should easily be adequate for the
search volume you are looking at. However, there are other things
that you should consider:
* How are you indexing? What are acceptable times for this?
* Are there any new Solr-specific features that you are considering
  using, e.g., faceting? What performance benchmarks are you looking
  to achieve?
* What is your front-end for the search? Where is it running?

Regards,
Gora


Multiple Indexes and relevance ranking question

2010-09-30 Thread Valli Indraganti
I am new to Solr and search technologies. I am playing around with
multiple indexes. I configured Solr for Tomcat and created two Tomcat
fragments so that two Solr webapps listen on port 8080 in Tomcat. I have
created two separate indexes using each webapp successfully.

My documents are very primitive. Below is the structure. I have four such
documents, with different doc ids and an increasing number of occurrences of
the word "Hello" corresponding to the name of the document (this is only to
make my analysis of the results easier). Documents one and two are in shard
1, and three and four are in shard 2. Obviously, document two is ranked
higher when queried against that index (for the word Hello), and document
four is ranked higher when queried against the second index. When using the
shards parameter, the scores remain unaltered.
My question is: if distributed search does not consider IDF, how is it able
to rank these documents correctly? Or do I not have the indexes truly
distributed? Is something wrong with my term distribution?


<doc>
  <field name="id">Valli1</field>
  <field name="name">One</field>
  <field name="text">Hello! This is a test document testing relevancy scores.</field>
</doc>
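To illustrate, the kind of distributed request meant by "using the shards
parameter" looks like this (host and webapp names here are illustrative, not
from the original post):

http://localhost:8080/solr1/select?q=Hello&shards=localhost:8080/solr1,localhost:8080/solr2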



Re: Solr Cluster Indexing data question

2010-09-30 Thread Jak Akdemir
If you want to use both of your nodes for building the index (which means
two masters), it makes them unified and collapses the master-slave
relation.

Would you take a look at the link below for the index snapshot problem?
http://wiki.apache.org/solr/SolrCollectionDistributionScripts

On Thu, Sep 30, 2010 at 11:03 AM, ZAROGKIKAS,GIORGOS
 wrote:
> Hi there solr experts
>
>
> I have an solr cluster with two nodes  and separate index files for each
> node
>
> Node1 is master
> Node2 is slave
>
>
> Node1 is the one that I index my data and replicate them to Node2
>
> How can I index my data at both nodes simultaneously ?
> Is there any specific setup 
>
>
> The problem is when my Node1 is down and I index the data from Node2 ,
> Solr creates backup index folders like this "index.20100929060410"
> and reduce the space of my hard disk
>
> Thanks in advance
>
>
>
>
>
>
>


can i have more update processors with solr

2010-09-30 Thread Dilshod Temirkhodjaev
I don't know if this is a bug or not, but when I'm writing this in
solrconfig.xml:

<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
  <lst name="defaults">
    <str name="update.processor">CustomRank</str>
    <str name="update.processor">dedupe</str>
  </lst>
</requestHandler>

only the first update.processor works. Why is the second not working?


RE: Tuning Solr caches with high commit rates (NRT)

2010-09-30 Thread Bruce Ritchie
> One strategy that I like, but haven't found in discussion lists is
> auto-limiting cache size/warming based on available resources (similar
> to the way file system caches use free memory). This would allow
> caches to adjust to their memory environment as indexes grow.

I've written such a cache for use as a Voldemort store in the past. I'm going 
to rewrite it in the near future to improve the code; however, the general idea 
can be seen at http://code.google.com/p/project-voldemort/issues/detail?id=225

The trickiest part of doing an auto-limiting cache based on available memory
is making sure that it works nicely with the garbage collector. Getting that 
balance right so that the GC doesn't churn needlessly took me more time than 
writing the cache.

Bruce 


Upgrade to Solr 1.4, very slow at start up when loading all cores

2010-09-30 Thread Renee Sun

Hi -
I posted this problem but got no response; I guess I need to post this in the
Solr-User forum. Hopefully you will help me with this.

We had been running Solr 1.3 for a long time, with 130 cores. We just upgraded
to Solr 1.4, and now when we start Solr, it takes about 45 minutes. The
catalina.log shows Solr is very slowly loading all the cores.

We did optimize; it did not help at all.

I ran JConsole to monitor the memory. I noticed the first 70 cores were
loaded pretty fast, in 3-4 minutes.

But after that, the memory went all the way up to about 15GB (we allocated
16GB to Solr), and it slows down right there, getting slower and slower. We
use the concurrent GC. JConsole shows only ParNew GCs kicking off, but they
don't bring the memory down.

With Solr 1.3, all 130 cores loaded in 5-6 minutes.

Please let me know if there is a known memory issue with Solr 1.4, or is
there something (configuration) we need to tweak to make it work efficiently
in 1.4?

thanks
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Upgrade-to-Solr-1-4-very-slow-at-start-up-when-loading-all-cores-tp1608728p1608728.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Is Solr right for my business situation ?

2010-09-30 Thread Dennis Gearon
You need to be able to query the database with the 'Mother of all Queries', 
i.e. one that completely flattens all tables into each row. 

In other words, the JOIN section of the query will have EVERY table in it,
and, depending on your schema, some of them twice or more.

Trying to do that with CSVs of separate tables would require you to put those
into your OWN database, then query against that, as above.
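As a sketch (the tables and columns here are hypothetical, not from your
data), such a flattening query looks like:

SELECT o.id, o.order_date, c.name AS customer_name, p.title AS product_title
FROM orders o
JOIN customers c ON c.id = o.customer_id
JOIN order_items i ON i.order_id = o.id
JOIN products p ON p.id = i.product_id;

Each resulting row then becomes one Solr document.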

Dennis Gearon

Signature Warning

EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Wed, 9/29/10, Sharma, Raghvendra  wrote:

> From: Sharma, Raghvendra 
> Subject: RE: Is Solr right for my business situation ?
> To: "solr-user@lucene.apache.org" 
> Date: Wednesday, September 29, 2010, 9:40 AM
> Some questions.  
> 
> 1. I have about 3-5 tables. Now designing schema.xml for a
> single table looks ok, but whats the direction for handling
> multiple table structures is something I am not sure about.
> Would it be like a big huge xml, wherein those three tables
> (assuming its three) would show up as three different
> tag-trees, nullable. 
> 
> My source provides me a single flat file per table (tab
> delimited).
> 
> Do you think having multiple indexes could be a solution
> for this case ?? or do I really need to spend effort in
> denormalizing the data ?
> 
> 2. Further, loading into solr can use some perf tuning..
> any tips ? best practices ?
> 
> 3. Also, is there a way to specify a xslt at the server
> side, and make it default, i.e. whenever a response is
> returned, that xslt is applied to the response
> automatically...
> 
> 4. And last question for the day - :) there was one post
> saying that the spatial support is really basic in solr and
> is going to be improved in next versions... Can you ppl help
> me get a definitive yes or no on spatial support... in the
> current form, does it work on not ? I would store lat and
> long, and would need to make them searchable...
> 
> --raghav..
> 
> -Original Message-
> From: Sharma, Raghvendra [mailto:sraghven...@corelogic.com]
> 
> Sent: Tuesday, September 28, 2010 11:45 AM
> To: solr-user@lucene.apache.org
> Subject: RE: Is Solr right for my business situation ?
> 
> Thanks for the responses people.
> 
> @Grant  
> 
> 1. can you show me some direction on that.. loading data
> from an incoming stream.. do I need some third party tools,
> or need to build something myself...
> 
> 4. I am basically attempting to build a very fast search
> interface for the existing data. The volume I mentioned is
> more like static one (data is already there). The sql
> statements I mentioned are daily updates coming. The good
> thing is that the history is not there, so the overall
> volume is not growing, but I need to apply the update
> statements. 
> 
> One workaround I had in mind is, (though not so great
> performance) is to apply the updates to a copy of rdbms, and
> then feed the rdbms extract to solr.  Sounds like
> overkill, but I don't have another idea right now. Perhaps
> business discussions would yield something.
> 
> @All -
> 
> Some more questions guys.  
> 
> 1. I have about 3-5 tables. Now designing schema.xml for a
> single table looks ok, but whats the direction for handling
> multiple table structures is something I am not sure about.
> Would it be like a big huge xml, wherein those three tables
> (assuming its three) would show up as three different
> tag-trees, nullable. 
> 
> My source provides me a single flat file per table (tab
> delimited).
> 
> 2. Further, loading into solr can use some perf tuning..
> any tips ? best practices ?
> 
> 3. Also, is there a way to specify a xslt at the server
> side, and make it default, i.e. whenever a response is
> returned, that xslt is applied to the response
> automatically...
> 
> 4. And last question for the day - :) there was one post
> saying that the spatial support is really basic in solr and
> is going to be improved in next versions... Can you ppl help
> me get a definitive yes or no on spatial support... in the
> current form, does it work on not ? I would store lat and
> long, and would need to make them searchable...
> 
> Looks like I m close to my solution.. :)
> 
> --raghav
> 
> -Original Message-
> From: Grant Ingersoll [mailto:gsing...@apache.org]
> 
> Sent: Tuesday, September 28, 2010 1:05 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Is Solr right for my business situation ?
> 
> Inline.
> 
> On Sep 27, 2010, at 1:26 PM, Walter Underwood wrote:
> 
> > When do you need to deploy?
> > 
> > As I understand it, the spatial search in Solr is
> being rewritten and is slated for Solr 4.0, the release
> after next.
> 
> It will be in 3.x, the next release
> 
> > 
> > The existing spatial search has some serious problems
> and is deprecated.
> > 
> > Right now, I think the only way to get spatial search
> in Solr is to deploy a nightly snapshot from the active
> development on trunk. 

Re: spatial sorting

2010-09-30 Thread dan sutton
Hi All,

This is more of an FYI for those wanting to filter and sort by distance and
have the values returned in the result set, after determining a way to do
this with existing code.

Using Solr 4.0, an example query would contain the following parameters:

/select?q=stevenage^0.0 +_val_:"ghhsin(6371,geohash(52.0274,-0.4952),location)"^1.0

Make the boost on all parts of the query other than the ghhsin distance
value function 0, and 1 on the function; this is so that the score is then
equal to the distance. (52.0274,-0.4952) here is the query point, and
'location' is the geohash field to search against.



sort=score asc

Basically, sort by distance ascending (closest first).



fq={!sfilt%20fl=location}&pt=52.0274,-0.4952&d=30

This is the spatial filter to limit the necessary distance calculations.



fl=*,score

Return all fields (if required) but include the score (which contains the
distance calculation)
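Assembled into a single request (spaces left unencoded for readability), the
whole thing looks something like:

/select?q=stevenage^0.0 +_val_:"ghhsin(6371,geohash(52.0274,-0.4952),location)"^1.0&sort=score asc&fq={!sfilt fl=location}&pt=52.0274,-0.4952&d=30&fl=*,score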


Does anyone know if it's possible to return the distance and score
separately?  I know there has been a patch to sort by value function, but
how can one return the values from this?

Cheers,
Dan


On Fri, Sep 17, 2010 at 2:45 PM, dan sutton  wrote:

> Hi,
>
> I'm trying to filter and sort by distance with this URL:
>
>
> http://localhost:8080/solr/select/?q=*:*&fq={!sfilt%20fl=loc_lat_lon}&pt=52.02694,-0.49567&d=2&sort={!func}hsin(52.02694,-0.49567,loc_lat_lon_0_d,%20loc_lat_lon_1_d,3963.205)asc
>
> Filtering is fine but it's failing in parsing the sort with :
>
> "The request sent by the client was syntactically incorrect (can not sort
> on undefined field or function: {!func}(52.02694,-0.49567,loc_lat_lon_0_d,
> loc_lat_lon_1_d, 3963.205))."*
>
> *I'm using the solr/lucene trunk to try this out ... does anyone know what
> is wrong with the syntax?
>
> Additionally am I able to return the distance sort values e.g. with param
> fl ? ... else am I going to have to either write my own component (which
> would also look up the filtered cached values rather than re-calculating
> distance) or use an alternative like localsolr ?
>
> Dan
>


Re: Memory usage

2010-09-30 Thread Jeff Moss
There are 14,696,502 documents, and we are doing a lot of funky stuff, but I'm
not sure which is most likely to cause an impact. We're sorting on a dynamic
field; there are about 1000 different variants of this field that look like
"priority_sort_for_", which is an integer field. I've heard that
sorting can have a big impact on memory consumption; could that be it?

How do I find out the number of unique words in a field? These aren't
very large documents, but there is a text field that contains user input,
which may be HTML and JavaScript, so there are lots of symbols in there.

Thanks,

-Jeff

On Wed, Sep 29, 2010 at 9:07 PM, Lance Norskog  wrote:

> How many documents are there? How many unique words are in a text
> field? Both of these numbers can have a non-linear effect on the
> amount of space used.
>
> But, usually a 22Gb index (on disk) might need 6-12G of ram total.
> There is something odd going on here.
>
> Lance
>
> On Wed, Sep 29, 2010 at 4:34 PM, Jeff Moss  wrote:
> > My server has 128GB of ram, the index is 22GB large. It seems the memory
> > consumption goes up on every query and the garbage collector will never
> free
> > up as much memory as I expect it to. The memory consumption looks like a
> > curve, it eventually levels off but the old gen is always 60 or 70GB. I
> have
> > tried adjusting the cache settings but it doesn't seem to make any
> > difference.
> >
> > Is there something I'm doing wrong or is this expected behavior?
> >
> > Here is a screenshot of what I see in jconsole after running for a few
> > minutes:
> > http://i51.tinypic.com/2qntca1.png
> >
> > Here is a 24 hour period of the same data taken from a custom jmx
> monitor:
> > http://i51.tinypic.com/2vcu9u8.png
> >
> > The server performs pretty much as good at the beginning of this cycle as
> it
> > does at the end so all of this memory accumulation seems to not be doing
> > anything useful.
> >
> > I am running the 1.4 war but I was having this problem with 1.3 also.
> Tomcat
> > 6.0.18, Java 1.6.0. I haven't gone as far as doing any memory profiling
> or
> > java debugging because I'm inexperienced, but that will be the next thing
> I
> > try. Any help would be appreciated.
> >
> > Thanks,
> >
> > -Jeff
> >
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>


Local Solr, Spatial Search, and LatLonType clarification

2010-09-30 Thread webdev1977

I have been reading through all the jira issues and patches, as well as the
wikis and I still have a few things that are not clear to me. 

I am currently running with Solr 1.4.1 and using Nutch for my crawling. 
Everything is working great, I am using a Nutch plugin to add lat long
information, I just don't know if it is possible to do what I am wanting to
do. 

1.  I noticed that it said that the type of LatLongType can not be
mulitvalued. Does that mean that I can not have multiple lat/lon values for
one document.  If so, that would be quite a limitation.  I have an average
of 10 geotags per document. 

2. Is LocalSolr and SpatialSearch the same thing?   

3. If I did want to use the LatLonType with the BBOX filter,  where would I
go to get a patch for 1.4.1? Is it even possible to patch 1.4.1 or do I have
to go to an entirely different dev version of Solr? 

Thanks for your input!!! 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Local-Solr-Spatial-Search-and-LatLonType-clarification-tp1609570p1609570.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: DataImportHandler dynamic fields clarification

2010-09-30 Thread harrysmith

>
>Two things: one, are your DB columns uppercase, as this would affect the output?
>
>

Interesting, I was under the impression that case does not matter.

From http://wiki.apache.org/solr/DataImportHandler#A_shorter_data-config :
"It is possible to totally avoid the field entries in entities if the names
of the fields are same (case does not matter) as those in Solr schema"

I confirmed that matching the schema.xml field case to the database table is
needed for dynamic fields, so the wiki statement above is incorrect, or at
the very least confusing; possibly a bug.

My database is Oracle 10g, and the column names have been created in all
uppercase in the database.

In Oracle:
Table name: wide_table
Column names: COLUMN_1 ... COLUMN_100 (yes, uppercase)

Please see the following scenarios and results I found:

data-config.xml:

<entity name="wide_table" query="select column_1, ..., column_100 from wide_table">
  <field column="column_100" name="id"/>
</entity>

schema.xml:

<dynamicField name="column_*" type="string" indexed="true" stored="true"/>

Result:
Nothing imported.

=

data-config.xml:

<entity name="wide_table" query="select COLUMN_1, ..., COLUMN_100 from wide_table">
  <field column="column_100" name="id"/>
</entity>

schema.xml:

<dynamicField name="column_*" type="string" indexed="true" stored="true"/>

Result:
Note the query column names changed to uppercase.
Nothing imported.

=

data-config.xml:

<entity name="wide_table" query="select COLUMN_1, ..., COLUMN_100 from wide_table">
  <field column="COLUMN_100" name="id"/>
</entity>

schema.xml:

<dynamicField name="column_*" type="string" indexed="true" stored="true"/>

Result:
Note ONLY the field entry was changed to caps.

All records imported, with only the COLUMN_100 id field.

=

data-config.xml:

<entity name="wide_table" query="select COLUMN_1, ..., COLUMN_100 from wide_table">
  <field column="COLUMN_100" name="id"/>
</entity>

schema.xml:

<dynamicField name="COLUMN_*" type="string" indexed="true" stored="true"/>

Result:
Note BOTH the field entry was changed to caps in data-config.xml, and the
dynamicField wildcard in schema.xml.

All records imported, with all fields specified. This is the behavior
desired.

=

>
>Second, what does your db-data-config.xml look like?
>
>

The relevant data-config.xml is as follows:

<document>
  <entity name="wide_table" query="select column_1, ..., column_100 from wide_table">
    <field column="COLUMN_100" name="id"/>
  </entity>
</document>

Ideally, I would rather have the query be 'select * from wide_table', with
the fields being dynamically matched by column name against the
dynamicField wildcard in schema.xml.


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/DataImportHandler-dynamic-fields-clarification-tp1606159p1609578.html
Sent from the Solr - User mailing list archive at Nabble.com.


Automatic xslt to responses ??

2010-09-30 Thread Sharma, Raghvendra
Is there a way to specify an XSLT stylesheet at the server side and make it
the default, i.e., whenever a response is returned, that XSLT is applied to
the response automatically?


Faster loading to solr...

2010-09-30 Thread Sharma, Raghvendra
I have been able to load around a million rows/docs in around 5+ minutes. The
schema contains around 250+ fields. For the moment, I have kept everything as
strings.
I am sure there are ways to get better loading speeds than this.

Will the data type matter for loading speed? Or anything else?

Can someone help me with any tips? Perhaps a best-practices kind of
document/article..
Anything..

--raghav..



Re: Local Solr, Spatial Search, and LatLonType clarification

2010-09-30 Thread Yonik Seeley
On Thu, Sep 30, 2010 at 1:09 PM, webdev1977  wrote:
> 1.  I noticed that it said that the type of LatLongType can not be
> mulitvalued. Does that mean that I can not have multiple lat/lon values for
> one document.

That means that if you want to have multiple points per document, each
point must be in a different field.
This often makes sense anyway, when the points have different
semantics - i.e. "work" and "home" locations.
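For illustration, a sketch of what that could look like in schema.xml (the
fieldType line follows the trunk example schema; the field names are made
up):

<fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>

<field name="home" type="location" indexed="true" stored="true"/>
<field name="work" type="location" indexed="true" stored="true"/>
<!-- LatLonType keeps its lat/lon components in these subfields -->
<dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>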

-Yonik
http://lucenerevolution.org  Lucene/Solr Conference, Boston Oct 7-8


RE: General hardware requirements?

2010-09-30 Thread Nicholas Swarr


I think the indexing will be fine.  We are looking to use multi-select 
faceting, spelling suggestions, and highlighting to name a few.  On the front 
end (and on separate machines) are .NET web applications that issue queries via 
HTTP requests to our searchers.
I can't think of anything else that will require extra processing.  Thanks for 
bringing those considerations to my attention.  Is there anything there that 
significantly impacts the hardware needs?

 

-Original Message-
From: Gora Mohanty [mailto:g...@mimirtech.com] 
Sent: Thursday, September 30, 2010 10:47 AM
To: solr-user@lucene.apache.org
Subject: Re: General hardware requirements?

On Thu, Sep 30, 2010 at 8:09 PM, Nicholas Swarr
 wrote:
>
> Our index is about 10 gigs in size with about 3 million documents.  The
> documents range in size from dozens to hundreds of kilobytes.  Per week, we
> only get about 50k queries.
> Currently, we use lucene and have one box for our indexer that has 32 gigs
> of memory and an 8 core CPU.  We have a pair of search boxes that have
> about 16 gigs of ram a piece and 8 core CPUs.  They hardly break a sweat.
> We're looking to adopt Solr.  Should we consider changing our configuration
> at all?  Are there any other hardware considerations for adopting Solr?
[...]

On the face of it, your machines should easily be adequate for the
search volume you are looking at. However, there are other things
that you should consider:
* How are you indexing? What are acceptable times for this?
* Are there any new Solr-specific features that you are considering
  using, e.g., faceting? What performance benchmarks are you looking
  to achieve?
* What is your front-end for the search? Where is it running?

Regards,
Gora

Re: Local Solr, Spatial Search, and LatLonType clarification

2010-09-30 Thread webdev1977

So it is still possible to do this in the index:

<field name="geotag1_latlon">x,y</field>
<field name="geotag2_latlon">x,y</field>

But not this:

<field name="geotag_latlon">x,y</field>
<field name="geotag_latlon">x,y</field>

If the statement directly above is true (I hope that it is not), how does
one dynamically create fields when adding geotags? They would have to be
given unique names, one field per geotag.

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Local-Solr-Spatial-Search-and-LatLonType-clarification-tp1609570p1609765.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Local Solr, Spatial Search, and LatLonType clarification

2010-09-30 Thread Yonik Seeley
On Thu, Sep 30, 2010 at 1:40 PM, webdev1977  wrote:
> Or.. do you mean each field must have a unique name, but both be of type
> latLon (solr.LatLonType)?
> <field name="geotag1_latlon">x,y</field>
> <field name="geotag2_latlon">x,y</field>

Yes.

> If the statement directly above is true (I hope that it is not), how does
> one dynamically create fields when adding geotags?

Dynamic field types.  You can configure it such that anything ending
with _latlon is of type LatLonType.
Perhaps we should do this in the example schema.
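A sketch of such a declaration (assuming a field type named latLon based on
solr.LatLonType is defined):

<dynamicField name="*_latlon" type="latLon" indexed="true" stored="true"/>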

-Yonik
http://lucenerevolution.org  Lucene/Solr Conference, Boston Oct 7-8


error sending a delete all request

2010-09-30 Thread Christopher Gross
I'm writing some code that pushes data into a Solr instance.  I have my
Tomcat (5.5.28) set up to use 2 indexes, and I'm hitting the second one for
this.
I try to issue the basic command to clear out the index
(<delete><query>*:*</query></delete>), and I get the error posted
below.

Does anyone have an idea of what I'm missing or what could cause this
error?  I can clip in more from the logs if need be.

Thanks!

Logs:
2010-09-30 13:21:35,078 [pool-2-thread-1] DEBUG httpclient.wire.header - >>
"POST /solr2/update HTTP/1.1[\r][\n]"
2010-09-30 13:21:35,078 [pool-2-thread-1] DEBUG
org.apache.commons.httpclient.HttpMethodBase - Adding Host request header
2010-09-30 13:21:35,093 [pool-2-thread-1] DEBUG httpclient.wire.header - >>
"User-Agent: Jakarta Commons-HttpClient/3.0[\r][\n]"
2010-09-30 13:21:35,093 [pool-2-thread-1] DEBUG httpclient.wire.header - >>
"Host: localhost:8080[\r][\n]"
2010-09-30 13:21:35,093 [pool-2-thread-1] DEBUG httpclient.wire.header - >>
"Content-Length: 35[\r][\n]"
2010-09-30 13:21:35,093 [pool-2-thread-1] DEBUG httpclient.wire.header - >>
"Content-Type: text/xml; charset=UTF-8[\r][\n]"
2010-09-30 13:21:35,093 [pool-2-thread-1] DEBUG httpclient.wire.header - >>
"[\r][\n]"
2010-09-30 13:21:35,093 [pool-2-thread-1] DEBUG httpclient.wire.content - >>
"<delete><query>*:*</query></delete>"
2010-09-30 13:21:35,093 [pool-2-thread-1] DEBUG
org.apache.commons.httpclient.methods.EntityEnclosingMethod - Request body
sent
2010-09-30 13:21:35,093 [pool-2-thread-1] DEBUG httpclient.wire.header - <<
"HTTP/1.1 500 Internal Server Error[\r][\n]"
2010-09-30 13:21:35,093 [pool-2-thread-1] DEBUG httpclient.wire.header - <<
"Server: Apache-Coyote/1.1[\r][\n]"
2010-09-30 13:21:35,093 [pool-2-thread-1] DEBUG httpclient.wire.header - <<
"Content-Type: text/html;charset=utf-8[\r][\n]"
2010-09-30 13:21:35,093 [pool-2-thread-1] DEBUG httpclient.wire.header - <<
"Content-Length: 7117[\r][\n]"
2010-09-30 13:21:35,093 [pool-2-thread-1] DEBUG httpclient.wire.header - <<
"Date: Thu, 30 Sep 2010 17:21:35 GMT[\r][\n]"
2010-09-30 13:21:35,093 [pool-2-thread-1] DEBUG httpclient.wire.header - <<
"Connection: close[\r][\n]"
2010-09-30 13:21:35,093 [pool-2-thread-1] DEBUG
org.apache.commons.httpclient.HttpMethodBase - Buffering response body
2010-09-30 13:21:35,093 [pool-2-thread-1] DEBUG httpclient.wire.content - <<
"Apache Tomcat/5.5.28 -Error report
HTTP Status 500 - null[\n]"
2010-09-30 13:21:35,093 [pool-2-thread-1] DEBUG httpclient.wire.content - <<
"[\n]"
2010-09-30 13:21:35,093 [pool-2-thread-1] DEBUG httpclient.wire.content - <<
"java.lang.AbstractMethodError[\r][\n]"
2010-09-30 13:21:35,093 [pool-2-thread-1] DEBUG httpclient.wire.content - <<
"[0x9]at
org.apache.lucene.search.Searcher.search(Searcher.java:150)[\r][\n]"

Then the actual stack trace in case that helps:
java.lang.AbstractMethodError
at org.apache.lucene.search.Searcher.search(Searcher.java:150)
at
org.apache.solr.update.DirectUpdateHandler2.deleteByQuery(DirectUpdateHandler2.java:343)
at
org.apache.solr.handler.XmlUpdateRequestHandler.update(XmlUpdateRequestHandler.java:260)
at
org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:84)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:77)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:658)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:191)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:159)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:204)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
at
org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:433)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
at
org.apache.coyote.http11.Http11Proce

Re: Local Solr, Spatial Search, and LatLonType clarification

2010-09-30 Thread Yonik Seeley
On Thu, Sep 30, 2010 at 1:48 PM, Yonik Seeley
 wrote:
> Dynamic field types.  You can configure it such that anything ending
> with _latlon is of type LatLonType.
> Perhaps we should do this in the example schema.

Looks like we already have it:

   <dynamicField name="*_p" type="location" indexed="true" stored="true"/>

So you should be able to add stuff like home_p and work_p w/o defining
them ahead of time.  Anything ending in _p is of type location.

-Yonik
http://lucenerevolution.org  Lucene/Solr Conference, Boston Oct 7-8


Re: SEVERE: Unable to move index file

2010-09-30 Thread wojtekpia

Hi,
I ran into this problem again the other night. I've looked through my log
files in more detail, and nothing seems out of place (I stripped user
queries out and included it below). I have the following setup:
1. Indexer has 2 cores. One core gets incremental updates, the other is for
full re-syncs with a database. The last step in my full re-sync process is
to swap cores (so that the searchers don't have to change their replication
master URLs).
2. Searcher that is subscribed to a constant indexer URL.

I noticed this replication error occurred right after I swapped my indexer's
cores. Since the index version and generation numbers are independent across
the 2 cores, could the searcher's index clean up be pre-emptively deleting
the active searcher index? When the error occurred, index.20100921053730 did
not exist, but index.properties was pointing to it. Previous entries in the
log make it seem like the directory did exist a few minutes earlier
(replication + warmup succeeded pointing at that directory). 

I've tried to reproduce this in a development environment, but haven't been
able to so far. 
https://issues.apache.org/jira/browse/SOLR-1822?focusedCommentId=12845175
SOLR-1822  seems to address a similar issue. I suspect that it would solve
what I'm seeing, but it treats the symptom rather than the cause (and I'd
like to be able to repro before trying it). Any insight/theories are
appreciated.

Thanks,

Wojtek

Sep 21, 2010 5:35:00 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave in sync with master.
Sep 21, 2010 5:37:30 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Master's version: 1271723727936, generation: 18616
Sep 21, 2010 5:37:30 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave's version: 1271723727935, generation: 18615
Sep 21, 2010 5:37:30 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Starting replication process
Sep 21, 2010 5:37:30 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Number of files in latest index in master: 118
Sep 21, 2010 5:37:30 PM org.apache.solr.handler.SnapPuller
downloadIndexFiles
INFO: Skipping download for /solr/data/index.20100921053730/_13n9.prx
Sep 21, 2010 5:37:30 PM org.apache.solr.handler.SnapPuller
downloadIndexFiles
INFO: Skipping download for /solr/data/index.20100921053730/_13nx.fnm
...
Sep 21, 2010 5:37:30 PM org.apache.solr.handler.SnapPuller
downloadIndexFiles
INFO: Skipping download for /solr/data/index.20100921053730/_13m5.fnm
...
Sep 21, 2010 5:37:30 PM org.apache.solr.handler.SnapPuller
downloadIndexFiles
INFO: Skipping download for /solr/data/index.20100921053730/_13n9.frq
Sep 21, 2010 5:37:30 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Total time taken for download : 0 secs
Sep 21, 2010 5:37:31 PM org.apache.solr.update.DirectUpdateHandler2
__AW_commit
INFO: start
commit(optimize=false,waitFlush=true,waitSearcher=true,expungeDeletes=false)
Sep 21, 2010 5:37:31 PM org.apache.solr.search.SolrIndexSearcher <init>
INFO: Opening Searcher@61080339 main
Sep 21, 2010 5:37:31 PM org.apache.solr.update.DirectUpdateHandler2
__AW_commit
INFO: end_commit_flush
Sep 21, 2010 5:37:31 PM org.apache.solr.search.SolrIndexSearcher __AW_warm
INFO: autowarming Searcher@61080339 main from Searcher@26aebd8c main

fieldValueCache{lookups=866,hits=866,hitratio=1.00,inserts=0,evictions=0,size=11,warmupTime=0,cumulative_lookups=493365,cumulative_hits=493351,cumulative_hitratio=0.99,cumulative_inserts=7,cumulative_evictions=0,item_FeaturesFacet={field=FeaturesFacet,memSize=51896798,tindexSize=56,time=988,phase1=936,nTerms=50,bigTerms=9,termInstances=5403271,uses=146},...}
...
Sep 21, 2010 5:37:31 PM org.apache.solr.search.SolrIndexSearcher __AW_warm
INFO: autowarming result for Searcher@61080339 main

documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=2036931,cumulative_hits=836191,cumulative_hitratio=0.41,cumulative_inserts=1200740,cumulative_evictions=1103563}
Sep 21, 2010 5:37:31 PM org.apache.solr.core.QuerySenderListener
__AW_newSearcher
INFO: QuerySenderListener sending requests to Searcher@61080339 main
Sep 21, 2010 5:37:31 PM org.apache.solr.request.UnInvertedField uninvert
INFO: UnInverted multi-valued field
{field=BedFacet,memSize=48178130,tindexSize=42,time=313,phase1=261,nTerms=6,bigTerms=4,termInstances=328351,uses=0}
...
INFO: [] webapp=null path=null params={*:*} hits=11546888 status=0
QTime=20687 
Sep 21, 2010 5:37:58 PM org.apache.solr.core.QuerySenderListener
__AW_newSearcher
INFO: QuerySenderListener done.
Sep 21, 2010 5:37:58 PM org.apache.solr.core.SolrCore registerSearcher
INFO: [] Registered new searcher Searcher@61080339 main
Sep 21, 2010 5:37:58 PM org.apache.solr.search.SolrIndexSearcher __AW_close
INFO: Closing Searcher@26aebd8c main

fieldValueCache{lookups=950,hits=950,hitratio=1.00,inserts=0,evictions=0,size=11,warmupTime=0,cumulative_lookups=493449,cumulative_hits=493435,cumulative_hitratio=0.99,cumulat

Re: Upgrade to Solr 1.4, very slow at start up when loading all cores

2010-09-30 Thread Yonik Seeley
On Thu, Sep 30, 2010 at 10:41 AM, Renee Sun  wrote:
>
> Hi -
> I posted this problem but no response, I guess I need to post this in the
> Solr-User forum. Hopefully you will help me on this.
>
> We were running Solr 1.3 for long time, with 130 cores. Just upgrade to Solr
> 1.4, then when we start the Solr, it took about 45 minutes. The catalina.log
> shows Solr is very slowly loading all the cores.

Have you tried 1.4.1 yet?
Could you open a JIRA issue for this and give whatever info you can?
Info like:
  - do you have any warming queries configured?
  - do the cores have documents already, and if so, how many per core?
  - are you using the same schema & solrconfig, or did you upgrade?
  - have you tried finding out what is taking up all the memory (or
all the CPU time)?

-Yonik
http://lucenerevolution.org  Lucene/Solr Conference, Boston Oct 7-8


RE: Is Solr right for my business situation ?

2010-09-30 Thread Sharma, Raghvendra
Thanks for the ideas.

I think after reading enough documentation and articles around Solr and XML
indexing in general, I have come around to understanding that there is no
escaping denormalization.

However, one tiny thought remains... perhaps my last shot at avoiding
denormalization (of course it's going to be a costly affair)..

I was reading about how Solr can handle multiple cores and therefore
multiple indexes. Can there be a single search interface sending queries to
these three cores? In that case, who would do the load balancing? The
merging of the results? And would I be running three instances of Solr on my
system(s), or can only one handle that?



-Original Message-
From: Dennis Gearon [mailto:gear...@sbcglobal.net] 
Sent: Thursday, September 30, 2010 9:25 PM
To: solr-user@lucene.apache.org
Subject: RE: Is Solr right for my business situation ?

You need to be able to query the database with the 'Mother of all Queries', 
i.e. one that completely flattens all tables into each row. 

In other words, the JOIN section of the query will have EVERY table in it, and 
depending on your schema, some of them twice or more.

Trying to do that with CSV, separate tables would require you to put those into 
your OWN database, then query against that, as above.

Dennis Gearon

Signature Warning

EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Wed, 9/29/10, Sharma, Raghvendra  wrote:

> From: Sharma, Raghvendra 
> Subject: RE: Is Solr right for my business situation ?
> To: "solr-user@lucene.apache.org" 
> Date: Wednesday, September 29, 2010, 9:40 AM
> Some questions.  
> 
> 1. I have about 3-5 tables. Now designing schema.xml for a
> single table looks ok, but whats the direction for handling
> multiple table structures is something I am not sure about.
> Would it be like a big huge xml, wherein those three tables
> (assuming its three) would show up as three different
> tag-trees, nullable. 
> 
> My source provides me a single flat file per table (tab
> delimited).
> 
> Do you think having multiple indexes could be a solution
> for this case ?? or do I really need to spend effort in
> denormalizing the data ?
> 
> 2. Further, loading into solr can use some perf tuning..
> any tips ? best practices ?
> 
> 3. Also, is there a way to specify a xslt at the server
> side, and make it default, i.e. whenever a response is
> returned, that xslt is applied to the response
> automatically...
> 
> 4. And last question for the day - :) there was one post
> saying that the spatial support is really basic in solr and
> is going to be improved in next versions... Can you ppl help
> me get a definitive yes or no on spatial support... in the
> current form, does it work on not ? I would store lat and
> long, and would need to make them searchable...
> 
> --raghav..
> 
> -Original Message-
> From: Sharma, Raghvendra [mailto:sraghven...@corelogic.com]
> 
> Sent: Tuesday, September 28, 2010 11:45 AM
> To: solr-user@lucene.apache.org
> Subject: RE: Is Solr right for my business situation ?
> 
> Thanks for the responses people.
> 
> @Grant  
> 
> 1. can you show me some direction on that.. loading data
> from an incoming stream.. do I need some third party tools,
> or need to build something myself...
> 
> 4. I am basically attempting to build a very fast search
> interface for the existing data. The volume I mentioned is
> more like static one (data is already there). The sql
> statements I mentioned are daily updates coming. The good
> thing is that the history is not there, so the overall
> volume is not growing, but I need to apply the update
> statements. 
> 
> One workaround I had in mind is, (though not so great
> performance) is to apply the updates to a copy of rdbms, and
> then feed the rdbms extract to solr.  Sounds like
> overkill, but I don't have another idea right now. Perhaps
> business discussions would yield something.
> 
> @All -
> 
> Some more questions guys.  
> 
> 1. I have about 3-5 tables. Now designing schema.xml for a
> single table looks ok, but whats the direction for handling
> multiple table structures is something I am not sure about.
> Would it be like a big huge xml, wherein those three tables
> (assuming its three) would show up as three different
> tag-trees, nullable. 
> 
> My source provides me a single flat file per table (tab
> delimited).
> 
> 2. Further, loading into solr can use some perf tuning..
> any tips ? best practices ?
> 
> 3. Also, is there a way to specify a xslt at the server
> side, and make it default, i.e. whenever a response is
> returned, that xslt is applied to the response
> automatically...
> 
> 4. And last question for the day - :) there was one post
> saying that the spatial support is really basic in solr and
> is going to be improved in next versions... Can you ppl help
> me get a definitive yes or no on spatial support... in the
> current f

Re: Automatic xslt to responses ??

2010-09-30 Thread Gora Mohanty
On Thu, Sep 30, 2010 at 10:47 PM, Sharma, Raghvendra
 wrote:
> Is there a way to specify a xslt at the server side, and make it default, 
> i.e. whenever a response is returned, that xslt is applied to the response 
> automatically...

This should be of help: http://wiki.apache.org/solr/XsltResponseWriter
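In short, with the xslt response writer registered (it is in the example
solrconfig.xml), a request like the following applies conf/xslt/example.xsl
to the response (the stylesheet name is illustrative):

http://localhost:8983/solr/select?q=*:*&wt=xslt&tr=example.xsl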

Regards,
Gora


RE: Grouping in solr ?

2010-09-30 Thread Papp Richard
I'm really sorry - thank you for the note.

-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: Tuesday, September 28, 2010 05:12
To: solr-user@lucene.apache.org
Subject: Re: Grouping in solr ?

: References:
: 
: In-Reply-To:
: 
: Subject: Grouping in solr ?

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is "hidden" in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.
See Also:  http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking



-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss  ...  Stump The Chump!
 




Re: Solr Cluster Indexing data question

2010-09-30 Thread Steve Cohen
So how would one set it up to use multiple nodes for building an index? I
see a document for Solr + Hadoop (http://wiki.apache.org/solr/HadoopIndexing)
and it says it has an example, but the example is missing.

Thanks,
Steve Cohen

On Thu, Sep 30, 2010 at 10:58 AM, Jak Akdemir  wrote:

> If you want to use both of your nodes for building index (which means
> two master), it makes them unified and collapses master slave
> relation.
>
> Would you take a look the link below for index snapshot problem?
> http://wiki.apache.org/solr/SolrCollectionDistributionScripts
>
> On Thu, Sep 30, 2010 at 11:03 AM, ZAROGKIKAS,GIORGOS
>  wrote:
> > Hi there solr experts
> >
> >
> > I have an solr cluster with two nodes  and separate index files for each
> > node
> >
> > Node1 is master
> > Node2 is slave
> >
> >
> > Node1 is the one that I index my data and replicate them to Node2
> >
> > How can I index my data at both nodes simultaneously ?
> > Is there any specific setup 
> >
> >
> > The problem is when my Node1 is down and I index the data from Node2 ,
> > Solr creates backup index folders like this "index.20100929060410"
> > and reduce the space of my hard disk
> >
> > Thanks in advance
> >
> >
> >
> >
> >
> >
> >
>


parsedquery is different from querystrin

2010-09-30 Thread abhayd

hi
I am searching for "blackberry", and for some reason the parsed query shows
up as "blackberri".

I checked the synonyms, but I don't see it anywhere.

<str name="rawquerystring">text:blackberry</str>
<str name="querystring">text:blackberry</str>
<str name="parsedquery">text:blackberri</str>
<str name="parsedquery_toString">text:blackberri</str>

Not sure if it's related, but query results are showing up when matched with
"black".

Any help or directions for knowing why a document is showing up in the
results, and which word in the doc hit the search term? I am seeing docs in
the results which do not have the search term at all.

thanks

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/parsedquery-is-different-from-querystrin-tp1610081p1610081.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Memory usage

2010-09-30 Thread Chris Hostetter

: There are 14,696,502 documents, we are doing a lot of funky stuff but I'm
: not sure which is most likely to cause an impact. We're sorting on a dynamic
: field there are about 1000 different variants of this field that look like
: "priority_sort_for_", which is an integer field. I've heard that
: sorting can have a big impact on memory consumption, could that be it?

sorting on a field requires that an array of the corresponding type be 
constructed for that field - the size of the array is the size of maxDoc 
(ie: the number of documents in your index, including deleted documents).

If you are using TrieInts, and have an index with no deletions, sorting 
~14.7Mil docs on 1000 diff int fields will take up about ~55GB.
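(For reference, the arithmetic: 14,696,502 docs x 4 bytes per int x 1000
fields comes to roughly 5.9e10 bytes, i.e. about 55GB.)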

That's a minimum just for the sorting of those int fields (SortableIntField, 
which keeps a string version of the field value, will be significantly 
bigger) and doesn't take into consideration any other data structures used 
for searching.

I'm not a GC expert, but based on my limited understanding your graph 
actually seems fine to me .. particularly the part where it says 
you've configured a Max heap of ~122GB of RAM, and it's 
never spent any time doing ConcurrentMarkSweep.  My uneducated 
understanding of those two numbers is that you've told the JVM it can use 
an ungodly amount of RAM, so it is.  It's done some basic cleanup of 
young gen (ParNew) but because the heap size has never gone above 50GB, 
it hasn't found any reason to actually start a CMS GC to look for dead 
objects in Old Gen that it can clean up.


(Can someone who understands GC and JVM tuning better than me please 
sanity check me on that?)


-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss  ...  Stump The Chump!



Re: Memory usage

2010-09-30 Thread Jeff Moss
I think you've probably nailed it, Chris; thanks for that. I think I can get
by with a different approach than this.

Do you know if I will get the same memory consumption using the
RandomFieldType vs the TrieInt?

-Jeff

On Thu, Sep 30, 2010 at 12:36 PM, Chris Hostetter
wrote:

>
> : There are 14,696,502 documents, we are doing a lot of funky stuff but I'm
> : not sure which is most likely to cause an impact. We're sorting on a
> dynamic
> : field there are about 1000 different variants of this field that look
> like
> : "priority_sort_for_", which is an integer field. I've heard
> that
> : sorting can have a big impact on memory consumption, could that be it?
>
> sorting on a field requires that an array of the corrisponding type be
> constructed for that field - the size of the array is the size of maxDoc
> (ie: the number of documents in your index, including deleted documents).
>
> If you are using TrieInts, and have an index with no deletions, sorting
> ~14.7Mil docs on 1000 diff int fields will take up about ~55GB.
>
> Thats a minimum just for the sorting of those int fields (SortablIntField
> which keeps a string version of the field value will be signifcantly
> bigger) and doesn't take into consideration any other data structures used
> for searching.
>
> I'm not a GC expert, but based on my limited understanding your graph
> actually seems fine to me .. particularly the part where it says
> you've configured a Max heap of ~122GB or ram, and it's
> never spend anytime doing ConcurrentMarkSweep.  My uneducated
> understanding of those two numbers is that you've told the JVM it can use
> an ungodly amount of RAM, so it is.  It's done some basic cleanup of
> young gen (ParNew) but because the heap size has never gone above 50GB,
> it hasn't found any reason to actualy start a CMS GC to look for dea
> objects in Old Gen that it can clean up.
>
>
> (Can someone who understands GC and JVM tunning better then me please
> sanity check me on that?)
>
>
> -Hoss
>
> --
> http://lucenerevolution.org/  ...  October 7-8, Boston
> http://bit.ly/stump-hoss  ...  Stump The Chump!
>
>


SolrJ

2010-09-30 Thread Christopher Gross
Where can I get SolrJ?  The wiki makes reference to it, and says that it is
a part of the Solr builds that you download, but I can't find it in the jars
that come with it.  Can anyone shed some light on this for me?

Thanks!

-- Chris


updating the solr index

2010-09-30 Thread Vicedomine, James (TS)
Sometimes when I update the Solr index (for example, posting new DOCs with
the same id), old DOC ATTRIBUTE VALUES appear to be available to queries,
but are not visible when the DOC ATTRIBUTE VALUES are listed. In other
words, queries sometimes return results based upon old attribute values?

Thank you in advance.



James Vicedomine 
Software Development Analyst 4 
Northrop Grumman, Integrated Data and Software Solutions 
978-247-7842




Re: SolrJ

2010-09-30 Thread Allistair Crossley
it's in the dist folder with the name provided by the wiki page you refer to

On Sep 30, 2010, at 3:01 PM, Christopher Gross wrote:

> Where can I get SolrJ?  The wiki makes reference to it, and says that it is
> a part of the Solr builds that you download, but I can't find it in the jars
> that come with it.  Can anyone shed some light on this for me?
> 
> Thanks!
> 
> -- Chris





Re: SolrJ

2010-09-30 Thread Christopher Gross
Now I feel dumb, it was right there.  Thanks! :)

-- Chris


On Thu, Sep 30, 2010 at 3:04 PM, Allistair Crossley wrote:

> it's in the dist folder with the name provided by the wiki page you refer
> to
>
> On Sep 30, 2010, at 3:01 PM, Christopher Gross wrote:
>
> > Where can I get SolrJ?  The wiki makes reference to it, and says that it
> is
> > a part of the Solr builds that you download, but I can't find it in the
> jars
> > that come with it.  Can anyone shed some light on this for me?
> >
> > Thanks!
> >
> > -- Chris
>
>


RE: updating the solr index

2010-09-30 Thread Markus Jelsma
Updates will not show up if they weren't committed, either through a manual 
commit or auto commit. 
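For example (URL and paths illustrative), an explicit commit can be posted
to the update handler:

curl 'http://localhost:8983/solr/update' -H 'Content-Type: text/xml' --data-binary '<commit/>'

or auto commit can be enabled in solrconfig.xml:

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>10000</maxDocs> <!-- commit after this many added docs -->
    <maxTime>60000</maxTime> <!-- or at most this many ms after an add -->
  </autoCommit>
</updateHandler>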
 
-Original message-
From: Vicedomine, James (TS) 
Sent: Thu 30-09-2010 21:04
To: solr-user@lucene.apache.org; 
Subject: updating the solr index

Sometimes when I update the Solr index (for example, posting new DOCs with
the same id), old DOC ATTRIBUTE VALUES appear to be available to queries,
but are not visible when the DOC ATTRIBUTE VALUES are listed. In other
words, queries sometimes return results based upon old attribute values?

Thank you in advance.



James Vicedomine 
Software Development Analyst 4 
Northrop Grumman, Integrated Data and Software Solutions 
978-247-7842




Re: Memory usage

2010-09-30 Thread Lance Norskog
You can also sort on a field by using a function query instead of the
"sort=field+desc" parameter. This will not eat up memory; instead, it
will be slower. In short, it is a classic speed vs. space trade-off.

You'll have to benchmark and decide which you want; maybe some
fields need the fast sort and some can get away with the slow one.

http://www.lucidimagination.com/search/?q=function+query
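As a sketch (the field name's suffix here is hypothetical), making the
function the entire query puts its value into the score, which can then be
sorted on:

/select?q={!func}priority_sort_for_42&fl=*,score&sort=score desc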

On Thu, Sep 30, 2010 at 11:47 AM, Jeff Moss  wrote:
> I think you've probably nailed it Chris, thanks for that, I think I can get
> by with a different approach than this.
>
> Do you know if I will get the same memory consumption using the
> RandomFieldType vs the TrieInt?
>
> -Jeff
>
> On Thu, Sep 30, 2010 at 12:36 PM, Chris Hostetter
> wrote:
>
>>
>> : There are 14,696,502 documents, we are doing a lot of funky stuff but I'm
>> : not sure which is most likely to cause an impact. We're sorting on a
>> dynamic
>> : field there are about 1000 different variants of this field that look
>> like
>> : "priority_sort_for_", which is an integer field. I've heard
>> that
>> : sorting can have a big impact on memory consumption, could that be it?
>>
>> sorting on a field requires that an array of the corrisponding type be
>> constructed for that field - the size of the array is the size of maxDoc
>> (ie: the number of documents in your index, including deleted documents).
>>
>> If you are using TrieInts, and have an index with no deletions, sorting
>> ~14.7Mil docs on 1000 diff int fields will take up about ~55GB.
>>
>> Thats a minimum just for the sorting of those int fields (SortablIntField
>> which keeps a string version of the field value will be signifcantly
>> bigger) and doesn't take into consideration any other data structures used
>> for searching.
>>
>> I'm not a GC expert, but based on my limited understanding your graph
>> actually seems fine to me .. particularly the part where it says
>> you've configured a Max heap of ~122GB or ram, and it's
>> never spend anytime doing ConcurrentMarkSweep.  My uneducated
>> understanding of those two numbers is that you've told the JVM it can use
>> an ungodly amount of RAM, so it is.  It's done some basic cleanup of
>> young gen (ParNew) but because the heap size has never gone above 50GB,
>> it hasn't found any reason to actualy start a CMS GC to look for dea
>> objects in Old Gen that it can clean up.
>>
>>
>> (Can someone who understands GC and JVM tunning better then me please
>> sanity check me on that?)
>>
>>
>> -Hoss
>>
>> --
>> http://lucenerevolution.org/  ...  October 7-8, Boston
>> http://bit.ly/stump-hoss      ...  Stump The Chump!
>>
>>
>



-- 
Lance Norskog
goks...@gmail.com


RE: Automatic xslt to responses ??

2010-09-30 Thread Markus Jelsma
You can add a default setting to your request handler. Read about defaults, 
appends, and invariants in the request handlers defined in your solrconfig.xml. 
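A minimal sketch of that (handler and stylesheet names illustrative), with
the stylesheet placed under conf/xslt/:

<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="wt">xslt</str>
    <str name="tr">example.xsl</str>
  </lst>
</requestHandler>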
 
-Original message-
From: Sharma, Raghvendra 
Sent: Thu 30-09-2010 19:17
To: solr-user@lucene.apache.org; 
Subject: Automatic xslt to responses ??

Is there a way to specify a xslt at the server side, and make it default, i.e. 
whenever a response is returned, that xslt is applied to the response 
automatically...


RE: can i have more update processors with solr

2010-09-30 Thread Markus Jelsma
Almost; you can define an updateRequestProcessorChain that houses multiple
update processors:

<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">title_signature</str>
    <bool name="overwriteDupes">true</bool>
    <str name="fields">title</str>
    <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">content_signature</str>
    <bool name="overwriteDupes">true</bool>
    <str name="fields">content</str>
    <str name="signatureClass">org.apache.solr.update.processor.TextProfileSignature</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
 
-Original message-
From: Dilshod Temirkhodjaev 
Sent: Thu 30-09-2010 17:12
To: solr-user@lucene.apache.org; 
Subject: can i have more update processors with solr

I don't know if this is a bug or not, but when I write this in
solrconfig.xml:

<lst name="defaults">
  <str name="update.processor">CustomRank</str>
  <str name="update.processor">dedupe</str>
</lst>

only the first update.processor works. Why is the second one not working?


Re: error sending a delete all request

2010-09-30 Thread Christopher Gross
I have also tried using SolrJ to hit my index, and I get this error:

2010-09-30 16:23:14,406 [pool-2-thread-1] DEBUG
org.apache.commons.httpclient.params.DefaultHttpParams - Set parameter
http.useragent = Jakarta Commons-HttpClient/3.0
2010-09-30 16:23:14,406 [pool-2-thread-1] DEBUG
org.apache.commons.httpclient.params.DefaultHttpParams - Set parameter
http.protocol.version = HTTP/1.1
2010-09-30 16:23:14,406 [pool-2-thread-1] DEBUG
org.apache.commons.httpclient.params.DefaultHttpParams - Set parameter
http.connection-manager.class = class
org.apache.commons.httpclient.SimpleHttpConnectionManager
2010-09-30 16:23:14,406 [pool-2-thread-1] DEBUG
org.apache.commons.httpclient.params.DefaultHttpParams - Set parameter
http.protocol.cookie-policy = rfc2109
2010-09-30 16:23:14,406 [pool-2-thread-1] DEBUG
org.apache.commons.httpclient.params.DefaultHttpParams - Set parameter
http.protocol.element-charset = US-ASCII
2010-09-30 16:23:14,406 [pool-2-thread-1] DEBUG
org.apache.commons.httpclient.params.DefaultHttpParams - Set parameter
http.protocol.content-charset = ISO-8859-1
2010-09-30 16:23:14,406 [pool-2-thread-1] DEBUG
org.apache.commons.httpclient.params.DefaultHttpParams - Set parameter
http.method.retry-handler =
org.apache.commons.httpclient.defaulthttpmethodretryhand...@1a082e2
2010-09-30 16:23:14,406 [pool-2-thread-1] DEBUG
org.apache.commons.httpclient.params.DefaultHttpParams - Set parameter
http.dateparser.patterns = [EEE, dd MMM  HH:mm:ss zzz, , dd-MMM-yy
HH:mm:ss zzz, EEE MMM d HH:mm:ss , EEE, dd-MMM- HH:mm:ss z, EEE,
dd-MMM- HH-mm-ss z, EEE, dd MMM yy HH:mm:ss z, EEE dd-MMM- HH:mm:ss
z, EEE dd MMM  HH:mm:ss z, EEE dd-MMM- HH-mm-ss z, EEE dd-MMM-yy
HH:mm:ss z, EEE dd MMM yy HH:mm:ss z, EEE,dd-MMM-yy HH:mm:ss z,
EEE,dd-MMM- HH:mm:ss z, EEE, dd-MM- HH:mm:ss z]
2010-09-30 16:23:14,421 [pool-2-thread-1] DEBUG
org.apache.commons.httpclient.params.DefaultHttpParams - Set parameter
http.connection-manager.max-per-host = {HostConfiguration[]=32}
2010-09-30 16:23:14,421 [pool-2-thread-1] DEBUG
org.apache.commons.httpclient.params.DefaultHttpParams - Set parameter
http.connection-manager.max-total = 128
2010-09-30 16:23:14,421 [pool-2-thread-1] DEBUG
org.apache.commons.httpclient.params.DefaultHttpParams - Set parameter
http.socket.timeout = 2
2010-09-30 16:23:14,421 [pool-2-thread-1] DEBUG
org.apache.commons.httpclient.params.DefaultHttpParams - Set parameter
http.connection-manager.timeout = 4
2010-09-30 16:23:14,453 [pool-2-thread-1] DEBUG
org.apache.commons.httpclient.params.DefaultHttpParams - Set parameter
http.connection-manager.max-per-host = {HostConfiguration[]=100}
2010-09-30 16:23:14,453 [pool-2-thread-1] DEBUG
org.apache.commons.httpclient.params.DefaultHttpParams - Set parameter
http.connection-manager.max-total = 100
2010-09-30 16:23:14,484 [pool-2-thread-1] DEBUG
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager -
HttpConnectionManager.getConnection:  config = HostConfiguration[host=
http://localhost:8080], timeout = 4
2010-09-30 16:23:14,484 [pool-2-thread-1] DEBUG
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager -
Allocating new connection, hostConfig=HostConfiguration[host=
http://localhost:8080]
2010-09-30 16:23:14,500 [pool-2-thread-1] DEBUG
org.apache.commons.httpclient.HttpConnection - Open connection to
localhost:8080
2010-09-30 16:23:14,515 [pool-2-thread-1] DEBUG
org.apache.commons.httpclient.HttpMethodBase - Adding Host request header
2010-09-30 16:23:14,515 [pool-2-thread-1] DEBUG
org.apache.commons.httpclient.methods.EntityEnclosingMethod - Request body
sent
2010-09-30 16:23:14,515 [pool-2-thread-1] DEBUG
org.apache.commons.httpclient.HttpMethodBase - Should close connection in
response to directive: close
2010-09-30 16:23:14,515 [pool-2-thread-1] DEBUG
org.apache.commons.httpclient.HttpConnection - Releasing connection back to
connection manager.
2010-09-30 16:23:14,515 [pool-2-thread-1] DEBUG
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager - Freeing
connection, hostConfig=HostConfiguration[host=http://localhost:8080]
2010-09-30 16:23:14,515 [pool-2-thread-1] DEBUG
org.apache.commons.httpclient.util.IdleConnectionHandler - Adding connection
at: 1285878194515
2010-09-30 16:23:14,515 [pool-2-thread-1] DEBUG
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager - Notifying
no-one, there are no waiting threads
2010-09-30 16:23:14,515 [pool-2-thread-1] WARN
gov.dni.search.intelsync.exporter.SyncExporter -
org.apache.solr.common.SolrException: Internal Server Error

Internal Server Error

request: http://localhost:8080/solr2/update?wt=xml&version=2.2
at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:343)
at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:183)
at
org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:217)
at
org.apache.solr
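
For what it's worth, assuming the delete-all is issued as a delete-by-query,
the message SolrJ posts to /update is equivalent to:

<delete><query>*:*</query></delete>

followed by a <commit/>. Posting that XML directly (e.g. with curl) against
/solr2/update can help narrow down whether the 500 comes from the client
request or from the server itself.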

DataImportHandler Error CHARBytesToJavaChars

2010-09-30 Thread harrysmith

Anyone ever see this error on an import? 

Caused by: java.lang.NullPointerException
at
oracle.jdbc.driver.DBConversion._CHARBytesToJavaChars(DBConversion.java:1015)

The Oracle column being converted is VARCHAR2(4000 Char) and there are NULLs
present in the record set.

Environment: Solr 1.4, Windows, Jetty 


Full stack trace below:

at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
Caused by: java.lang.NullPointerException
at oracle.jdbc.driver.DBConversion._CHARBytesToJavaChars(DBConversion.java:1015)
at oracle.jdbc.driver.DBConversion.CHARBytesToJavaChars(DBConversion.java:892)
at oracle.jdbc.driver.T4CVarcharAccessor.unmarshalOneRow(T4CVarcharAccessor.java:282)
at oracle.jdbc.driver.T4CTTIrxd.unmarshal(T4CTTIrxd.java:919)
at oracle.jdbc.driver.T4CTTIrxd.unmarshal(T4CTTIrxd.java:843)
at oracle.jdbc.driver.T4C8Oall.receive(T4C8Oall.java:630)
at oracle.jdbc.driver.T4CStatement.doOall8(T4CStatement.java:210)
at oracle.jdbc.driver.T4CStatement.executeForRows(T4CStatement.java:961)
at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1072)
at oracle.jdbc.driver.T4CStatement.executeMaybeDescribe(T4CStatement.java:845)
at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1154)
at oracle.jdbc.driver.OracleStatement.executeInternal(OracleStatement.java:1726)
at oracle.jdbc.driver.OracleStatement.execute(OracleStatement.java:1696)
at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:246)
... 32 more
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/DataImportHandler-Error-CHARBytesToJavaChars-tp1611016p1611016.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: parsedquery is different from querystrin

2010-09-30 Thread Markus Jelsma
We cannot really give an answer without knowing your fieldType and query. We 
can see that blackberry => blackberri is caused by a stemmer you have 
configured, perhaps a Porter or Snowball stemmer. Anyway, that's normal.
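
For illustration, a stemming filter configured like this (a sketch, not
necessarily your schema) reduces blackberry to blackberri at both index and
query time, which is why the match still works:

<fieldType name="text" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>

The analysis page (admin/analysis.jsp) shows exactly which terms your
documents and queries produce, which also helps explain why a particular
document matched.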
 
-Original message-
From: abhayd 
Sent: Thu 30-09-2010 20:32
To: solr-user@lucene.apache.org; 
Subject: parsedquery is different from querystrin


hi 
I am searching for blackberry and for some reason the parsed query shows up
as blackberri.

I checked the synonyms but I don't see it anywhere.

<str name="rawquerystring">text:blackberry</str>
<str name="querystring">text:blackberry</str>
<str name="parsedquery">text:blackberri</str>
<str name="parsedquery_toString">text:blackberri</str>

Not sure if it's related, but query results are showing up when matched with
"black".

Any help or directions for finding out why a document shows up in the
results, and which word in the doc hit the search term? I am seeing docs in
the results which do not have the search term at all.

thanks

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/parsedquery-is-different-from-querystrin-tp1610081p1610081.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Is Solr right for my business situation ?

2010-09-30 Thread Markus Jelsma
Recent versions support sharding and handle distribution of your query and 
merging of the result sets. The problem is, it won't help you join across 
separate `tables`. The fields you query need to be present in each shard or 
you'll end up with an HTTP 400 - undefined field error.
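
For example, a distributed query is an ordinary request plus a shards
parameter listing each core (hostnames here are illustrative):

http://localhost:8983/solr/select?q=name:foo&shards=host1:8983/solr,host2:8983/solr

Each field referenced in q, fq or sort must be defined in every listed
shard's schema.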

 

Indeed, there is no escape.
 
-Original message-
From: Sharma, Raghvendra 
Sent: Thu 30-09-2010 20:07
To: solr-user@lucene.apache.org; 
Subject: RE: Is Solr right for my business situation ?

Thanks for the ideas.

I think after reading enough documentation and articles around Solr and XML 
indexing in general, I have come around to understand that there is no escaping 
denormalization.

However, one tiny thought remains... perhaps my last shot at avoiding 
denormalization (of course it's going to be a costly affair)..

I was reading about how Solr can handle multiple cores and therefore multiple 
indexes. Can there be a single search interface sending queries to these three 
cores? In that case, who would do the load balancing? Who would merge the 
results? And would I be running three instances of Solr on my system(s), or 
can one handle all of that?



-Original Message-
From: Dennis Gearon [mailto:gear...@sbcglobal.net] 
Sent: Thursday, September 30, 2010 9:25 PM
To: solr-user@lucene.apache.org
Subject: RE: Is Solr right for my business situation ?

You need to be able to query the database with the 'Mother of all Queries', 
i.e. one that completely flattens all tables into each row. 

In other words, the JOIN section of the query will have EVERY table in it, and 
depending on your schema, some of them twice or more.

Trying to do that with CSV files or separate tables would require you to put 
those into your OWN database, then query against that, as above.

Dennis Gearon

Signature Warning

EARTH has a Right To Life,
 otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Wed, 9/29/10, Sharma, Raghvendra  wrote:

> From: Sharma, Raghvendra 
> Subject: RE: Is Solr right for my business situation ?
> To: "solr-user@lucene.apache.org" 
> Date: Wednesday, September 29, 2010, 9:40 AM
> Some questions.  
> 
> 1. I have about 3-5 tables. Now designing schema.xml for a
> single table looks ok, but whats the direction for handling
> multiple table structures is something I am not sure about.
> Would it be like a big huge xml, wherein those three tables
> (assuming its three) would show up as three different
> tag-trees, nullable. 
> 
> My source provides me a single flat file per table (tab
> delimited).
> 
> Do you think having multiple indexes could be a solution
> for this case ?? or do I really need to spend effort in
> denormalizing the data ?
> 
> 2. Further, loading into solr can use some perf tuning..
> any tips ? best practices ?
> 
> 3. Also, is there a way to specify a xslt at the server
> side, and make it default, i.e. whenever a response is
> returned, that xslt is applied to the response
> automatically...
> 
> 4. And last question for the day - :) there was one post
> saying that the spatial support is really basic in solr and
> is going to be improved in next versions... Can you ppl help
> me get a definitive yes or no on spatial support... in the
> current form, does it work on not ? I would store lat and
> long, and would need to make them searchable...
> 
> --raghav..
> 
> -Original Message-
> From: Sharma, Raghvendra [mailto:sraghven...@corelogic.com]
> 
> Sent: Tuesday, September 28, 2010 11:45 AM
> To: solr-user@lucene.apache.org
> Subject: RE: Is Solr right for my business situation ?
> 
> Thanks for the responses people.
> 
> @Grant  
> 
> 1. can you show me some direction on that.. loading data
> from an incoming stream.. do I need some third party tools,
> or need to build something myself...
> 
> 4. I am basically attempting to build a very fast search
> interface for the existing data. The volume I mentioned is
> more like static one (data is already there). The sql
> statements I mentioned are daily updates coming. The good
> thing is that the history is not there, so the overall
> volume is not growing, but I need to apply the update
> statements. 
> 
> One workaround I had in mind is, (though not so great
> performance) is to apply the updates to a copy of rdbms, and
> then feed the rdbms extract to solr.  Sounds like
> overkill, but I don't have another idea right now. Perhaps
> business discussions would yield something.
> 
> @All -
> 
> Some more questions guys.  
> 
> 1. I have about 3-5 tables. Now designing schema.xml for a
> single table looks ok, but whats the direction for handling
> multiple table structures is something I am not sure about.
> Would it be like a big huge xml, wherein those three tables
> (assuming its three) would show up as three different
> tag-trees, nullable. 
> 
> My source provides me a single flat file per table (tab
> delimited).
> 
> 2. Further, loading into solr can use some perf tuning.

Re: tomcat, solr and dismax syntax

2010-09-30 Thread Chris Hostetter

: it turns the plus(es) into spaces. Is this a tomcat setting or a solr 
: one to stop this happening? How can I get the plus into solr so it 
: actually means a required word.

It's part of the URL specification -- all of your query params (not just 
the query string) need to be properly URL escaped regardless of what 
QParser you use...

http://wiki.apache.org/solr/SolrQuerySyntax#NOTE:_URL_Escaping_Special_Characters
http://en.wikipedia.org/wiki/Percent-encoding
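
For example, to require the word java with dismax you can't send q=+java
as-is; encode the plus, e.g. q=%2Bjava+solr (a bare + in a URL decodes to a
space, while %2B decodes to a literal +).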


-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss  ...  Stump The Chump!



PHP Solr API

2010-09-30 Thread Scott Yeadon

 Hi,

I have inherited an application which uses Solr search and the PHP Solr 
API (http://pecl.php.net/package/solr). While the list of search results 
with appropriate highlighting is all good, when selecting a result that 
navigates to an individual article the users want to have all the hits 
highlighted in the full text.


The problem is that the article text is HTML and Solr appears to strip 
the HTML by default. The highlight snippets contain no formatting and 
neither does the "stored" version of the text. This means that using a 
large snippet size and using the returned text as the article text is 
not satisfactory, nor is using the stored version returned in the 
response.


Obtaining offset information from the search and applying the 
highlighting myself within the webapp using the HTML version would be 
fine, but the offsets will be wrong due to the stripping of the tags. 
Does anyone have any advice on how I might get this to work? It doesn't 
seem to be a particularly unusual use case, yet I could not find 
information on how to achieve it. It's likely I'm overlooking something 
simple.


Thanks.

Scott.


Re: PHP Solr API

2010-09-30 Thread Neil Lunn
On Fri, 2010-10-01 at 12:00 +1000, Scott Yeadon wrote:
> Hi,
> 

> The problem is that the article text is HTML and Solr appears to strip 
> the HTML by default.

I think what you need to look at is how the fields are defined by
default in your schema. If data sent as HTML is being added to the
standard HTML-stripping text type and stored, then the HTML is stripped
and the words indexed by default. If you want to store the raw HTML,
then maybe you should be doing that, and not storing the stripped
version; just index it.
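
As a sketch of that idea (the type and field names here are made up):
analysis only changes the indexed terms, never the stored value, so a type
like this indexes stripped text while the stored copy keeps the original
markup:

<fieldType name="html_text" class="solr.TextField">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="article" type="html_text" indexed="true" stored="true"/>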

-- 


Regards,

Neil Lunn




Re: PHP Solr API

2010-09-30 Thread Scott Yeadon
 Thanks, but I still need to "store" text at any rate in order to get 
the highlighted snippets for the search results list. This isn't a 
problem. The issue is how to obtain correct offsets or other mechanisms 
for being able to display the original HTML text plus term highlighting 
when navigating to an individual search result.


Scott.

On 1/10/10 12:53 PM, Neil Lunn wrote:

On Fri, 2010-10-01 at 12:00 +1000, Scott Yeadon wrote:

Hi,

The problem is that the article text is HTML and Solr appears to strip
the HTML by default.

I think what you need to look at is how the fields are defined by
default in your schema. If data sent as HTML is being added to the
standard HTML-stripping text type and stored, then the HTML is stripped
and the words indexed by default. If you want to store the raw HTML,
then maybe you should be doing that, and not storing the stripped
version; just index it.





Re: Faster loading to solr...

2010-09-30 Thread Gora Mohanty
On Thu, Sep 30, 2010 at 10:49 PM, Sharma, Raghvendra
 wrote:
> I have been able to load around a million rows/docs in around 5+ minutes.  
> The schema contains around 250+ fields.  For the moment, I have kept 
> everything as string.
> I am sure there are ways to get better loading speeds than this.

A million documents with 250 fields in 5 minutes sounds fast to
me. As a comparison, we do a million documents with about 60 fields
in an hour, using multiple Solr cores. However, this is very likely an
apples to oranges comparison, as we are pulling large amounts of
data from a database over a network. What indexing times are you
aiming for?

If you can shard your data, using multiple cores on a single Solr
instance, and/or multiple Solr instances will speed up your indexing.
However, if you want a complete, non-sharded index, you will need
to merge the sharded ones.

> Will the data type matter in loading speeds ?? or anything else ?

Data type might matter if there is a lot of processing involved for
that data type. E.g., the text type has several analyzers and tokenizers.

> Can someone help me with any tips ? perhaps any best practices  kind of 
> document/article..
> Anything ..
[...]

The Solr Wiki has many suggestions, e.g., look at the documentation
on the DataImportHandler. In our experience, XML import has been
very fast. A generic document is difficult as the speed is dependent
on many things, such as the data source, number and type of fields,
size of data, etc. Your best bet is to try out several approaches.
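
If the XML route is what you try first, the shape of Solr's XML update
message is as below (field names illustrative); batching many docs per <add>
and committing once at the end, rather than per document, is the usual advice:

<add>
  <doc>
    <field name="id">1</field>
    <field name="name">example</field>
  </doc>
  <doc>
    <field name="id">2</field>
    <field name="name">another example</field>
  </doc>
</add>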

Regards,
Gora


TermVector filter

2010-09-30 Thread Scott Yeadon

 Hi,

With the TermVector component, is there a means of limiting/filtering 
the returned information to only those terms found in a query?


Scott.


Re: Upgrade to Solr 1.4, very slow at start up when loading all cores

2010-09-30 Thread Renee Sun

Hi Yonik,
thanks for your reply.

I entered a bug for this at :
https://issues.apache.org/jira/browse/SOLR-2138

to answer your questions here:
  - do you have any warming queries configured? 
> no, all autowarmingcount are set to 0 for all caches
  - do the cores have documents already, and if so, how many per core? 
> yes, 130 cores in total; 2-3 of them already have 1-2.4 million
documents, the others have about 50,000 documents
  - are you using the same schema & solrconfig, or did you upgrade? 
> yes, absolutely no change
  - have you tried finding out what is taking up all the memory (or 
all the CPU time)? 
> yes, JConsole shows that after the first 70 cores are loaded in about 4
minutes, all 16GB of memory are taken and the rest of the cores load
extremely slowly. The memory remains high and never drops.

We are in the process of upgrading to 1.4.1

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Upgrade-to-Solr-1-4-very-slow-at-start-up-when-loading-all-cores-tp1608728p1611030.html
Sent from the Solr - User mailing list archive at Nabble.com.


Highlighting match term in bold rather than italic

2010-09-30 Thread efr...@gmail.com
Hi all -

Does anyone know how to produce solr results where the match term is
highlighted in bold rather than italic?

thanks in advance,

Brad


Re: Highlighting match term in bold rather than italic

2010-09-30 Thread Scott Gonyea
Your solrconfig has a highlighting section.  You can make that CDATA
thing whatever you want; I changed it to <b>.
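
In the Solr 1.4 example solrconfig.xml the relevant bit looks roughly like
this (a sketch from memory, with the markers switched from <em> to <b>):

<formatter name="html" class="org.apache.solr.highlight.HtmlFormatter" default="true">
  <lst name="defaults">
    <str name="hl.simple.pre"><![CDATA[<b>]]></str>
    <str name="hl.simple.post"><![CDATA[</b>]]></str>
  </lst>
</formatter>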

On Thu, Sep 30, 2010 at 2:54 PM, efr...@gmail.com  wrote:
> Hi all -
>
> Does anyone know how to produce solr results where the match term is
> highlighted in bold rather than italic?
>
> thanks in advance,
>
> Brad
>


Re: Highlighting match term in bold rather than italic

2010-09-30 Thread Scott Yeadon

 Check out
http://wiki.apache.org/solr/HighlightingParameters
and the hl.simple.pre/hl.simple.post options

You may also be able to control the display of the default <em> via CSS, 
but it will depend on your rendering context whether this is feasible.
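
You can also override the markers per request rather than in solrconfig,
e.g. (URL-encoded): &hl=true&hl.simple.pre=%3Cb%3E&hl.simple.post=%3C%2Fb%3E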


Scott.

On 1/10/10 7:54 AM, efr...@gmail.com wrote:

Hi all -

Does anyone know how to produce solr results where the match term is
highlighted in bold rather than italic?

thanks in advance,

Brad