Indexing Problem with SOLR multicore

2010-06-09 Thread seesiddharth
Hi, I am using SOLR with a Tomcat server. I have configured two cores inside the SOLR home directory. The solr.xml file looks like [...]. I am also using DIH to load the data into these two cores separately, and the document counts in the two cores are different. However wheneve

3.1-dev spatial search problem

2010-06-09 Thread nickdos
I'm running the 3.x branch and I'm trying to implement spatial searching. I am able to sort results by distance from a given lat/long using a query like: http://localhost:8080/solr/select/?q=_val_:"recip(dist(2, lat_long, vector(-66.5,75.1)),1,1,0)"&fl=*,score which gives me the expected resul
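A minimal sketch of what that sort function computes, assuming Solr's documented definitions: recip(x, m, a, b) evaluates a / (m*x + b), and dist(power, a, b) is the p-norm distance (power 2 being Euclidean). The Python helper names are illustrative, not Solr APIs. With m=1, a=1, b=0 the score is simply 1/distance, so a document sitting exactly on the query point divides by zero.

```python
import math


def dist(power, a, b):
    """p-norm distance between two equal-length points (power=2 is Euclidean)."""
    return sum(abs(x - y) ** power for x, y in zip(a, b)) ** (1.0 / power)


def recip(x, m, a, b):
    """Reciprocal function a / (m*x + b), as described for Solr's recip()."""
    return a / (m * x + b)


def distance_score(point, origin=(-66.5, 75.1)):
    """Mirror of recip(dist(2, lat_long, vector(-66.5,75.1)),1,1,0).

    This treats lat/long degrees as planar coordinates, so it ranks by
    Euclidean distance in degrees, not by great-circle distance.
    """
    return recip(dist(2, point, origin), 1, 1, 0)


# Nearer points get higher scores, so sorting by score descending sorts by distance.
points = {"near": (-66.0, 75.1), "far": (-60.0, 75.1)}
ranked = sorted(points, key=lambda name: distance_score(points[name]), reverse=True)
print(ranked)  # ['near', 'far']
```

Because the score is monotonic in distance, sorting by score descending gives nearest-first ordering.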

Re: Indexing HTML

2010-06-09 Thread Ken Krugler
On Jun 9, 2010, at 8:38pm, Blargy wrote: What is the preferred way to index html using DIH (my html is stored in a blob field in our database)? I know there is the built in HTMLStripTransformer but that doesn't seem to work well with malformed/incomplete HTML. I've created a custom tra

Re: Indexing HTML

2010-06-09 Thread Blargy
Wait... do you mean I should try the HTMLStripCharFilterFactory analyzer at index time? http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.HTMLStripCharFilterFactory -- View this message in context: http://lucene.472066.n3.nabble.com/Indexing-HTML-tp884497p884592.html Sent from

Re: Indexing HTML

2010-06-09 Thread Blargy
Does the HTMLStripChar apply at index time or query time? Would it matter to use one over the other? As a side question, if I want to perform highlighter summaries against this field, do I need to store the whole field or just index it with TermVector.WITH_POSITIONS_OFFSETS?

Re: how to have "shards" parameter by default

2010-06-09 Thread Scott Zhang
I tried putting "shards" into the default request handler, but now every search makes Solr hang forever. So what's the correct solution? Thanks. explicit 10 * 2.1 localhost:7500/solr,localhost:7501/solr,localhost:7502/solr,localhost:7503/solr,localhost

how to have "shards" parameter by default

2010-06-09 Thread Scott Zhang
Hi. I am running distributed search on Solr with 70 Solr instances, so each time I want to search I need to use ?shards=localhost:7500/solr,localhost..7620/solr. It is a very long URL. How can I put the shards into the config file so that I don't need to type them each time? Thanks. Scott
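If the goal is only to avoid typing the list by hand, the value can be generated. A minimal sketch, assuming all shards run on one host and the caller supplies the port list (70 instances spread between 7500 and 7620 need not be contiguous). The helper name is hypothetical, and it only builds the string; it does not address the hang reported when shards is placed in the default handler.

```python
def shards_param(host, ports, core_path="solr"):
    """Hypothetical helper: comma-separated value for Solr's shards parameter."""
    return ",".join(f"{host}:{port}/{core_path}" for port in ports)


# Example: the first four instances from the post.
print(shards_param("localhost", range(7500, 7504)))
# localhost:7500/solr,localhost:7501/solr,localhost:7502/solr,localhost:7503/solr
```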

Re: Indexing HTML

2010-06-09 Thread Lance Norskog
The HTMLStripChar variants are newer and might work better. On Wed, Jun 9, 2010 at 8:38 PM, Blargy wrote: > > What is the preferred way to index html using DIH (my html is stored in a > blob field in our database)? > > I know there is the built in HTMLStripTransformer but that doesn't seem to > w

Re: Diagnosing solr timeout

2010-06-09 Thread Lance Norskog
Every time you reload the index, it has to rebuild the cached facet data. Could that be it? Also, how big are the fields being highlighted? And are they indexed with term vectors? (If not, the text is re-analyzed in flight with term vectors.) How big are the caches? Are they growing and growing? On

Can query boosting be used with a custom request handlers?

2010-06-09 Thread Andy
I want to try out the bobo plugin for Solr, which is a custom request handler (http://code.google.com/p/bobo-browse/wiki/SolrIntegration). At the same time I want to use BoostQParserPlugin to boost my queries, something like {!boost b=log(popularity)}foo Can I use the {!boost} feature in conj
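As a rough illustration of what {!boost b=log(popularity)}foo does: BoostQParserPlugin multiplies the score of the wrapped query by the function value, and Solr's log() is base 10. This sketch only reproduces that arithmetic with a hypothetical helper; it says nothing about whether the bobo request handler honours local params.

```python
import math


def boosted_score(query_score, popularity):
    """Hypothetical helper: query score times log10(popularity)."""
    return query_score * math.log10(popularity)


print(boosted_score(1.5, 100))  # 3.0
print(boosted_score(1.5, 1))    # 0.0: popularity 1 wipes out the score
```

Note that a popularity of 1 zeroes the score and a popularity of 0 is undefined, so the function may need an offset for sparse popularity data.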

Indexing HTML

2010-06-09 Thread Blargy
What is the preferred way to index html using DIH (my html is stored in a blob field in our database)? I know there is the built in HTMLStripTransformer but that doesn't seem to work well with malformed/incomplete HTML. I've created a custom transformer to first tidy up the html using JTidy then
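For comparison with the JTidy approach, here is a minimal sketch of lenient HTML-to-text cleanup using Python's standard html.parser, which tolerates unclosed tags. It is a stand-in to show the idea of stripping markup before indexing, not Solr's HTMLStripCharFilter or HTMLStripTransformer, and it does not insert spaces between adjacent elements such as "a<br>b".

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text content, skipping anything inside <script> or <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; etc. in text
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def strip_html(markup):
    """Return the visible text of possibly malformed HTML, whitespace-normalized."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text left after an unclosed tag
    return " ".join("".join(parser.parts).split())


print(strip_html("<p>Hello <b>world"))  # Hello world
```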

Re: Need help with document format

2010-06-09 Thread Lance Norskog
This is what Field Collapsing does. It is a complex feature and is not in the Solr trunk yet. On Tue, Jun 8, 2010 at 9:15 AM, Moazzam Khan wrote: > How would I do a facet search if I did this and not get duplicates? > > Thanks, > Moazzam > > On Mon, Jun 7, 2010 at 10:07 AM, Israel Ekpo wrote: >>

Re: Index-time vs. search-time boosting performance

2010-06-09 Thread Lance Norskog
Is it necessary that a document 1 year old be more relevant than one that's 1 year and 1 hour old? In other words, can the boosting be logarithmic wrt time instead of linear? A schema design tip: you can store a separate date field which is rounded down to the hour. This will make for a much small
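A minimal sketch of both ideas with hypothetical helper names: rounding a timestamp down to the hour (the same idea as Solr date math such as NOW/HOUR), so the date field holds far fewer unique values, and a boost that decays with the logarithm of age, so one extra hour barely matters for a year-old document. The decay formula is an illustrative choice, not a Solr built-in.

```python
import math
from datetime import datetime, timezone


def round_down_to_hour(ts):
    """Drop minutes, seconds and microseconds, like Solr's /HOUR rounding."""
    return ts.replace(minute=0, second=0, microsecond=0)


def age_boost(age_hours):
    """Hypothetical recency boost that decays with log(age) rather than linearly."""
    return 1.0 / (1.0 + math.log1p(age_hours))


ts = datetime(2010, 6, 9, 10, 47, 12, tzinfo=timezone.utc)
print(round_down_to_hour(ts).isoformat())  # 2010-06-09T10:00:00+00:00
one_year = 365 * 24
print(age_boost(one_year) - age_boost(one_year + 1))  # a very small difference
```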

Re: Faceted Search Slows Down as index gets larger

2010-06-09 Thread Lance Norskog
The Distributed Search feature assumes that a document only exists in one core. Updating a doc in a small core will fail because it may be found twice. If you are only updating a popularity score, and only need it for boosting (but not for searching on a value), there is a feature called the Exter

Master master?

2010-06-09 Thread Glen Stampoultzis
Does Solr handle having two masters that are also slaves to each other (i.e. in a cycle)? Regards, Glen

Re: How Solr Manages Connected Database Updates

2010-06-09 Thread Lance Norskog
The DataImportHandler has a tool for fetching recent updates in the database and indexing only those new and changed records. It has no scheduler. You would set up the DIH configuration and then write a cron job to run it at regular intervals. Lance On Wed, Jun 9, 2010 at 7:51 AM, Sumit Arora wrote
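A minimal sketch of the script a cron job could run, assuming the DataImportHandler is registered at /dataimport under the Solr base URL (adjust to your solrconfig.xml) and that the DIH configuration already defines a deltaQuery using ${dataimporter.last_index_time}. The function names are hypothetical; only the command=delta-import request itself is standard DIH.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def dih_command_url(handler_url, command="delta-import", **params):
    """Hypothetical helper: build a DataImportHandler command URL."""
    query = {"command": command, **params}
    return f"{handler_url}?{urlencode(query)}"


def trigger_delta_import(handler_url, timeout=30):
    """Send the delta-import request and return the raw response body."""
    with urlopen(dih_command_url(handler_url), timeout=timeout) as response:
        return response.read()
```

A crontab entry calling trigger_delta_import("http://localhost:8983/solr/dataimport"), or simply `*/15 * * * * curl -s "http://localhost:8983/solr/dataimport?command=delta-import"`, would then pick up changed rows every 15 minutes.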

Re: general debugging techniques?

2010-06-09 Thread Lance Norskog
https://issues.apache.org/jira/browse/LUCENE-2387 There is a "memory leak" that causes the last PDF binary file image to stick around while working on the next binary image. When you commit after every extraction, you clear up this "memory leak". This is fixed in trunk and should make it into a '

Re: AW: how to get multicore to work?

2010-06-09 Thread xdzgor
Thanks for the comments. I still can't get this multicore thing to work! Here is my directory structure: d: __apachesolr lucidworks __lucidworks solr __bin __conf __lib tomcat There is no solr.xml, and solr.solr.home points to d:\apachesolr\lucidw

Re: Diagnosing solr timeout

2010-06-09 Thread Jean-Sebastien Vachon
I use the following article as a reference when dealing with GC-related issues: http://www.petefreitag.com/articles/gctuning/ I suggest you activate the verbose option and send GC stats to a file. I don't remember exactly what the option was, but you should find the information easily. Good luck

Re: general debugging techniques?

2010-06-09 Thread Jim Blomo
On Fri, Jun 4, 2010 at 3:14 PM, Chris Hostetter wrote: > : That is still really small for 5MB documents. I think the default solr > : document cache is 512 items, so you would need at least 3 GB of memory > : if you didn't change that and the cache filled up. > > that assumes that the extracted te
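The estimate in the quoted text, written out: a full document cache of 512 entries, each assumed to hold about 5 MB of stored field data, pins roughly 2.5 GB of heap before anything else is counted. The function name is illustrative.

```python
def document_cache_gb(entries=512, mb_per_doc=5.0):
    """Worst-case heap (GB) held by a full document cache, 1 GB = 1024 MB."""
    return entries * mb_per_doc / 1024


print(document_cache_gb())  # 2.5
```

The bound scales linearly, so a smaller cache or less stored text per document reduces it proportionally.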

Some questions about ability of solr.

2010-06-09 Thread Vitaliy Avdeev
I am keeping some data in JSON format in an HBase table. I would like to index this data with Solr. Are there any examples of indexing an HBase table? Every node in HBase has an attribute that saves the date when it was written into the table. Is there any option to search not only by text but also to search the da
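On the date part: Solr date fields expect UTC timestamps in the form 2010-06-09T00:00:00Z, and HBase cell timestamps are milliseconds since the epoch. A minimal sketch of the conversion, assuming the timestamp is copied into a date field of your own; the field name modified and the helper name are hypothetical.

```python
from datetime import datetime, timezone


def hbase_ts_to_solr_date(millis):
    """Convert an HBase cell timestamp (ms since epoch) to a Solr date string."""
    dt = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")  # sub-second part is dropped


doc = {"id": "row-1", "modified": hbase_ts_to_solr_date(1276041600000)}
print(doc["modified"])  # 2010-06-09T00:00:00Z
```

A range clause such as modified:[2010-06-01T00:00:00Z TO NOW] can then be combined with the text query.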

Re: Tomcat startup script

2010-06-09 Thread Sixten Otto
On Tue, Jun 8, 2010 at 4:18 PM, wrote: > The following should work on centos/redhat, don't forget to edit the paths, > user, and java options for your environment. You can use chkconfig to add it > to your startup. Thanks, Colin. Sixten

TrieRange for storage of dates

2010-06-09 Thread Jason Rutherglen
What is the best practice? Perhaps we can amend the article at http://www.lucidimagination.com/blog/2009/05/13/exploring-lucene-and-solrs-trierange-capabilities/ to include the recommendation (ie, dates are commonly unique). I'm assuming using a long is the best choice.
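To make the trade-off concrete: Solr's TrieDateField indexes a date as a long of milliseconds, and a trie field writes each value several times at decreasing precision, one term per precisionStep bits. A simplified sketch with a hypothetical helper (it ignores Lucene's actual term encoding and sign handling):

```python
def trie_terms(value, precision_step=8, bits=64):
    """(shift, prefix) pairs: the value with its lowest `shift` bits dropped."""
    return [(shift, value >> shift) for shift in range(0, bits, precision_step)]


millis = 1276041600000  # 2010-06-09T00:00:00Z
terms = trie_terms(millis)
print(len(terms))  # 8 terms per date with precisionStep=8
print(len(trie_terms(millis, precision_step=4)))  # 16 terms with precisionStep=4
```

The number of terms per document depends only on precisionStep, not on how unique the dates are; Lucene's documentation notes that lower steps use more index space but speed up range queries.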

Re: Diagnosing solr timeout

2010-06-09 Thread Paul
>Have you looked at the garbage collector statistics? I've experienced this >kind of issues in the past and I was getting huge spikes when the GC was doing its job. I haven't, and I'm not sure what a good way to monitor this is. The problem occurs maybe once a week on a server. Should I run jstat

Re: Diagnosing solr timeout

2010-06-09 Thread Jean-Sebastien Vachon
Have you looked at the garbage collector statistics? I've experienced this kind of issue in the past and I was getting huge spikes when the GC was doing its job. On 2010-06-09, at 10:52 AM, Paul wrote: > Hi all, > > In my app, it seems like solr has become slower over time. The index > has gro

Dataimport in debug mode store a last index date

2010-06-09 Thread Marc Emery
Hi, When using the data import handler and clicking on 'Debug now', it stores the current date as 'last_index_time' in the dataimport.properties file. Is this the right behaviour, given that debug mode doesn't do a commit? Thanks marc

Diagnosing solr timeout

2010-06-09 Thread Paul
Hi all, In my app, it seems like solr has become slower over time. The index has grown a bit, and there are probably a few more people using the site, but the changes are not drastic. I notice that when a solr search is made, the amount of cpu and ram spike precipitously. I notice in the solr lo

How Solr Manages Connected Database Updates

2010-06-09 Thread Sumit Arora
Hey all, I am new to the Solr area; I have just started exploring it and done some basic things, and now I am stuck on the logic of how Solr manages connected database updates. Scenario: I wrote an indexing program which runs on Tomcat; when run, it reads data from a connected MySQL database a

Re: AW: XSLT for JSON

2010-06-09 Thread stockii
help me please =(

Re: Anyone using Solr spatial from trunk?

2010-06-09 Thread Rob Ganly
>>... but decided not to use it anyway? that's pretty much correct. the huge commercial scale of the project dictates that we need as much system stability as possible from the outset; thus the tools we use must be established, community-tested and trusted versions. we also noticed that some

Re: Issue with response header in SOLR running on Linux instance

2010-06-09 Thread Markus Jelsma
Hi, Check your requestHandler. It may preset some values that you don't see. Your echoParams setting may be explicit instead of all [1]. Alternatively, you could add the echoParams parameter to your query if it isn't set as an invariant in your requestHandler. [1]: http://wiki.apache.org/solr

custom scorer in Solr

2010-06-09 Thread Fornoville, Tom
Hi all, We are currently working on a proof-of-concept for a client using Solr and have been able to configure all the features they want except the scoring. Problem is that they want scores that make results fall in buckets: * Bucket 1: exact match on category (score = 4) * Bu

Solr Core Unload

2010-06-09 Thread abhatna...@vantage.com
Referring to http://lucene.472066.n3.nabble.com/unloading-a-solr-core-doesn-t-free-any-memory-td501246.html#a501246: do we have any solution to free up memory after a Solr core unload? Ankit

Issue with response header in SOLR running on Linux instance

2010-06-09 Thread bbarani
Hi, I have been using SOLR for some time now and had no issues while I was using it on Windows. Yesterday I moved the SOLR code to Linux servers and started to index the data. Indexing completed successfully on the Linux servers, but when I queried the index, the response header returned (by the SOLR

Solr spellcheck config

2010-06-09 Thread Bogdan Gusiev
Hi everyone, I am trying to build the spellcheck index with *IndexBasedSpellChecker* default text ./spellchecker And I want to specify the dynamic field "*_text" as the field option: How can it be done? Thanks, Bogdan -- Bogdan Gusiev. agre...@gmail.com

Re: question about the fieldCollapseCache

2010-06-09 Thread Martijn v Groningen
I agree. I'll add this information to the wiki. On 9 June 2010 14:32, Jean-Sebastien Vachon wrote: > ok great. > > I believe this should be mentioned in the wiki. > > Later > > On 2010-06-09, at 4:06 AM, Martijn v Groningen wrote: > >> The fieldCollapseCache should not be used as it is now, it us

Re: question about the fieldCollapseCache

2010-06-09 Thread Jean-Sebastien Vachon
ok great. I believe this should be mentioned in the wiki. Later On 2010-06-09, at 4:06 AM, Martijn v Groningen wrote: > The fieldCollapseCache should not be used as it is now, it uses too > much memory. It stores any information relevant for a field collapse > search. Like document collapse cou

AW: how to get multicore to work?

2010-06-09 Thread Markus.Rietzler
- solr.xml has to reside in the solr.home dir. You can set this up with the Java option -Dsolr.solr.home= - admin is per core, so solr/CORENAME/admin will work. It is quite simple to set up. > -----Original Message----- > From: xdzgor [mailto:p...@alphasolutions.dk] > Sent: Wednesday

how to test solr's performance?

2010-06-09 Thread Li Li
Are there any built-in tools for performance testing? Thanks
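Not a built-in Solr tool, but as a starting point: a minimal timing harness that calls a query function repeatedly and reports median and 95th-percentile latency (nearest-rank method). The harness name is hypothetical, and run_query is a placeholder for whatever issues the request, for example an HTTP GET against /select.

```python
import math
import statistics
import time


def measure_latency(run_query, repetitions=100):
    """Call run_query() repeatedly; return (median_ms, p95_ms) using nearest rank."""
    samples = []
    for _ in range(repetitions):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    p95 = samples[max(0, math.ceil(0.95 * len(samples)) - 1)]
    return statistics.median(samples), p95


# Placeholder workload standing in for a real Solr request.
median_ms, p95_ms = measure_latency(lambda: sum(range(10_000)), repetitions=50)
print(f"median {median_ms:.3f} ms, p95 {p95_ms:.3f} ms")
```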

RE: Making my QParserPlugin the default one, with cores

2010-06-09 Thread Yuval Feinstein
Thanks again Ahmet and Erik. Turns out that this was calling the correct query parser all along. The real problem was a combination of the query cache and my hacking the query to enable BM25 scoring. When I use a standard BooleanQuery, this behaved as published. Now I have to understand how to twe

requesthandler, variable ...

2010-06-09 Thread stockii
Hello. I want to call the TermsComponent with this request: http://host/solr/app/select/?q=har and get the same result as when I use this request: http://host/solr/app/terms/?q=har&terms.prefix=har --> 9 9 9 ... . This is my solrconfig.xml requestHandler
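A minimal client-side sketch, assuming the TermsComponent is reachable at /terms as in the second URL: it maps the user's q value onto terms.prefix. terms.fl names the field whose indexed terms are listed; "name" below is only a placeholder, since the post doesn't say which field is used, and the helper name is hypothetical.

```python
from urllib.parse import urlencode


def terms_url(core_url, field, user_query, limit=10):
    """Hypothetical helper: map a plain query string onto a TermsComponent prefix lookup."""
    params = {
        "terms": "true",
        "terms.fl": field,           # field whose indexed terms are listed
        "terms.prefix": user_query,  # e.g. "har" matches terms starting with "har"
        "terms.limit": limit,
    }
    return f"{core_url}/terms?{urlencode(params)}"


print(terms_url("http://host/solr/app", "name", "har"))
# http://host/solr/app/terms?terms=true&terms.fl=name&terms.prefix=har&terms.limit=10
```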

Copyfield multi valued to single value

2010-06-09 Thread Marc Ghorayeb
Hello, Is there a way to copy a multivalued field to a single-valued field by taking, for example, the first value of the multivalued field? I am actually trying to sort my index by Title, and my index contains Tika-extracted titles which come in as multivalued, which is why my title field is multivalued.
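Since copyField copies every value of the source field, one workaround is to fill a separate single-valued sort field on the client before the document is sent to Solr. A minimal sketch with hypothetical field names (title, title_sort); title_sort would need to be declared single-valued and untokenized, for example as a string field, for sorting to work.

```python
def add_sort_value(doc, source="title", target="title_sort"):
    """Copy the first value of a multivalued field into a single-valued field."""
    value = doc.get(source)
    if isinstance(value, (list, tuple)):
        value = value[0] if value else None
    if value is not None:
        doc[target] = value
    return doc


doc = add_sort_value({"id": "1", "title": ["Annual Report", "Report 2009"]})
print(doc["title_sort"])  # Annual Report
```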

RE: Making my QParserPlugin the default one, with cores

2010-06-09 Thread Ahmet Arslan
> Thanks, Ahmet. > Yes, my solrconfig.xml file is very similar to what you > wrote. > When I use &echoparams=all and defType=myqp, I get: > > > hi > all > myqp > > > However, when I do not use the defType (hoping it will be > automatically > Inserted from solrconfig),  I get: > > > hi > all

Re: how to get multicore to work?

2010-06-09 Thread Chris Rode
If you take a look in the examples directory, there is a directory called multicore. This is an example of the Solr home of a multicore setup. Otherwise, take a look at the logged output of Solr itself. It should tell you what is wrong with the setup. On 9 June 2010 11:08, xdzgor wrote: > > Hi - I

Re: Filtering near-duplicates using TextProfileSignature

2010-06-09 Thread Neeb
Thanks guys. I will try this with some test documents, fingers crossed. And by the way, I got the minTokenLen parameter from one of the thread replies (from Erik). Cheerz, Ali

how to get multicore to work?

2010-06-09 Thread xdzgor
Hi - I can't seem to get "multicores" to work. I have a solr installation which does not have a "solr.xml" file - I assume this means it is not multicore. If I create a solr.xml, as described on http://wiki.apache.org/solr/CoreAdmin, my solr installation fails - for example I get 404 errors when t
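A small sanity check for the layout described in the replies: solr.xml must sit directly in the Solr home, and each core needs its own conf directory. This hypothetical helper assumes each core's instanceDir is a subdirectory named after the core, as in the example multicore layout (core0, core1), and only checks that solrconfig.xml and schema.xml exist.

```python
from pathlib import Path


def check_solr_home(solr_home, cores):
    """List problems with a multicore Solr home; an empty list means the files exist."""
    home = Path(solr_home)
    problems = []
    if not (home / "solr.xml").is_file():
        problems.append("missing solr.xml in solr home")
    for core in cores:
        # Assumption: instanceDir == core name, relative to the Solr home.
        conf_dir = home / core / "conf"
        for name in ("solrconfig.xml", "schema.xml"):
            if not (conf_dir / name).is_file():
                problems.append(f"missing {core}/conf/{name}")
    return problems
```

Calling it with the path that solr.solr.home points to and the core names listed in solr.xml gives a quick list of missing files before looking at the Solr log.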

Re: Index search optimization for fulltext remote streaming

2010-06-09 Thread Danyal Mark
We have the following Solr configuration: java -Xms512M -Xmx1024M -Dsolr.solr.home= -jar start.jar in SolrConfig.xml false 4 20 1024 1 1000 1 native false 1024 4 false true

Re: Filtering near-duplicates using TextProfileSignature

2010-06-09 Thread Andrew Clegg
Markus Jelsma wrote: > > Well, it got me too! KMail didn't properly order this thread. Can't seem > to > find Hatcher's reply anywhere. ??!!? > Whole thread here: http://lucene.472066.n3.nabble.com/Filtering-near-duplicates-using-TextProfileSignature-tt479039.html

Re: Making my QParserPlugin the default one, with cores

2010-06-09 Thread Erik Hatcher
Yuval - my only hunch is that you're hitting a different request handler than where you configured the default defType. Send us the URL you're hitting Solr with, and the full request handler mapping. And you're sure you're the exact core you're hitting (since you mention multicore) you th

Re: [Blacklight-development] facet data cleanup

2010-06-09 Thread Erik Hatcher
On Jun 8, 2010, at 1:57 PM, Naomi Dushay wrote: Missing Facet Values: --- to find how many documents are missing values: facet.missing=true&facet.mincount=really big http://your.solr.baseurl/select?rows=0&facet.field=ffldname&facet.mincount=1000&facet.missing=true

Re: Filtering near-duplicates using TextProfileSignature

2010-06-09 Thread Markus Jelsma
Well, it got me too! KMail didn't properly order this thread. Can't seem to find Hatcher's reply anywhere. ??!!? On Tuesday 08 June 2010 22:00:06 Andrew Clegg wrote: > Andrew Clegg wrote: > > Re. your config, I don't see a minTokenLength in the wiki page for > > deduplication, is this a recent a

Re: Filtering near-duplicates using TextProfileSignature

2010-06-09 Thread Markus Jelsma
Here's my config for the updateProcessor. It uses a different signature method, but I've used TextProfileSignature as well and it works, sort of. true sig true content org.apache.solr.update.processor.Lookup3Signature Of course, you must
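Why TextProfileSignature can catch near-duplicates while a hash such as Lookup3Signature only catches exact ones: the former hashes a coarse token-frequency profile instead of the raw text. A simplified sketch of that idea with a hypothetical function: count tokens of at least minTokenLen characters, quantize counts by quantRate relative to the most frequent token, drop tokens below one quantum, and hash the profile. It follows the general approach but is not a port of Solr's class, so its signatures will not match Solr's.

```python
import hashlib
import re
from collections import Counter


def text_profile_signature(text, min_token_len=2, quant_rate=0.01):
    """Simplified sketch of a TextProfileSignature-style near-duplicate hash."""
    # Letters and digits only (underscore excluded), case-insensitive.
    tokens = [t for t in re.findall(r"[^\W_]+", text.lower()) if len(t) >= min_token_len]
    counts = Counter(tokens)
    if not counts:
        return hashlib.md5(b"").hexdigest()
    max_freq = max(counts.values())
    quant = round(max_freq * quant_rate)
    if quant < 2:
        quant = 2 if max_freq > 1 else 1
    # Quantize counts so small wording differences collapse together,
    # and drop tokens that fall below one quantum.
    profile = sorted(
        ((tok, (cnt // quant) * quant) for tok, cnt in counts.items() if cnt >= quant),
        key=lambda item: (-item[1], item[0]),
    )
    payload = "\n".join(f"{tok} {cnt}" for tok, cnt in profile)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


a = "solr solr solr index index search"
b = a + " extra"  # one extra rare word
print(text_profile_signature(a) == text_profile_signature(b))  # True
```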

Re: question about the fieldCollapseCache

2010-06-09 Thread Martijn v Groningen
The fieldCollapseCache should not be used as it is now, it uses too much memory. It stores any information relevant for a field collapse search. Like document collapse counts, collapsed document ids / fields, collapsed docset and uncollapsed docset (everything per unique search). So the memory usag

RE: Making my QParserPlugin the default one, with cores

2010-06-09 Thread Yuval Feinstein
Thanks, Ahmet. Yes, my solrconfig.xml file is very similar to what you wrote. When I use &echoparams=all and defType=myqp, I get: hi all myqp However, when I do not use the defType (hoping it will be automatically inserted from solrconfig), I get: hi all Can you see what I am doing wrong?