Re: Need help with StackOverflowError

2010-04-07 Thread Jon Baer
You should maybe scan your db for bad data ... This bit ... at sun.nio.cs.UTF_8$Decoder.decodeLoop(UTF_8.java:324) at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:561) is probably happening on a specific record somewhere; in the query, limit the id range and try to narrow down which
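
A minimal sketch of the narrowing-down idea above, in Python. It assumes numeric ids and that the failure reproduces whenever the bad record falls inside the imported range. `import_ok` is a hypothetical callback (not part of DIH) that runs the import restricted to ids in [lo, hi] and reports whether it succeeded.

```python
from typing import Callable


def find_bad_id(lo: int, hi: int, import_ok: Callable[[int, int], bool]) -> int:
    """Bisect the id range [lo, hi] to locate one record whose import fails.

    Assumes importing the full range fails, i.e. at least one bad record
    lies in [lo, hi]. import_ok(a, b) is a hypothetical helper that imports
    only ids a..b (inclusive) and returns True on success.
    """
    while lo < hi:
        mid = (lo + hi) // 2
        if import_ok(lo, mid):
            lo = mid + 1  # lower half imported cleanly; look above mid
        else:
            hi = mid      # failure reproduced in the lower half
    return lo
```

Each step halves the range, so even a few million ids need only a couple of dozen restricted imports.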

Re: Is there any other tool other than DIH to index a database

2010-04-07 Thread Jon Baer
There is the LuSQL tool which I've used a few times. http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql http://www.slideshare.net/eby/lusql-quickly-and-easily-getting-your-data-from-your-dbms-into-lucene - Jon On Apr 7, 2010, at 11:26 PM, bbarani wrote: > > Hi, > > I am curr

Re: Need help with StackOverflowError

2010-04-07 Thread Blargy
If it helps at all to mention, I manually updated the last_index_time in conf/dataimport.properties so I could select a smaller subset, and the delta-import worked, which leads me to believe there is nothing wrong with my DIH delta queries themselves. There must be something wrong with my dataset th

Re: Is there any other tool other than DIH to index a database

2010-04-07 Thread Lance Norskog
SolrJ goes through the Solr stack. It talks to the Solr HTTP service or, in Embedded mode, to the top-level Solr code. All documents are processed just the same as if you uploaded them with 'curl'. You have to write JDBC code and submit the fields. There is no special code involved. On Wed, Apr

Is there any other tool other than DIH to index a database

2010-04-07 Thread bbarani
Hi, I am currently using DIH to index the data from a database. I am just trying to figure out if there are any other open source tools which I can use just for indexing purposes, and use SOLR for querying. I also thought of writing custom code for retrieving the data from the database and use SOLRJ

Re: Multi-core memory problem

2010-04-07 Thread Lance Norskog
Sorting takes memory. What data types are the fields sorted on? If they're strings, that could be a space-eater. If they are ints or dates, not a problem. Do the queries pull all of the documents found? Or do they just fetch the, for example, first 10 documents? What are the cache statistics like

Re: Handling missing date fields in a date-oriented function query

2010-04-07 Thread Lance Norskog
> Since min(a,b) == -1*max(-1*a, -1*b), you could rewrite the previous > expression using this more complicated logic and it would work. But > that's ugly. > > Also, it would crash anyway. It looks like max currently requires one > of its arguments to be a float constant, and neither of our args wo
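
The identity quoted above can be checked with a few lines of plain Python. This is only a sketch of the arithmetic, not Solr function-query syntax, and it says nothing about which argument types Solr's max accepts; `min_via_max` is a name made up for the illustration.

```python
def min_via_max(a, b):
    """Compute min(a, b) using only max and negation: -1 * max(-1*a, -1*b)."""
    return -1 * max(-1 * a, -1 * b)
```

Negating both values reverses their order, so the larger of the negated values is the negation of the smaller original.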

Re: solr.WordDelimiterFilterFactory problem with hyphenated terms?

2010-04-07 Thread Erick Erickson
Well, for a quick trial using trunk, I had to remove the UnicodeNormalizationFactory, is that yours? But with that removed, I get the results you do, ASSUMING that you've set your default operator to AND in schema.xml... Believe it or not, it all changes and all your queries return a hit if you d

Re: Suggestion for cachedSQLentityprocessor

2010-04-07 Thread Lance Norskog
The point of the cached table is that we don't know where interesting rows are. Loading from a DB is much faster when you grab the first N rows, the next N rows, etc. So, some strategy which switches back and forth between searching for a requested ID vs. grabbing blocks would be very efficient.

Re: Best practice to handle misspellings

2010-04-07 Thread Lance Norskog
See the SpellCheckComponent: http://wiki.apache.org/solr/SpellCheckComponent Also, the phoneme filters like DoubleMetaphone turn a word into a series of phonemes. Misspellings that are in the right order will become the same series. I don't know how to build a spelling dictionary from a phoneme-fi

Re: solr best practice to submit many documents

2010-04-07 Thread Lance Norskog
Stream XML input (or CSV if you can make that happen) works fine. If the file is local, you can do a curl that would normally upload a file via POST, but give this parameter: stream.file=/full/path/name.xml Solr will read the file locally instead of through HTTP. On Wed, Apr 7, 2010 at 9:18 AM, W
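
A small sketch of the request URL described above, in Python. The update URL `http://localhost:8983/solr/update` used in the usage note is an assumption (a common default), and `stream_file_url` is a helper name invented here; only the `stream.file` parameter comes from the message.

```python
from urllib.parse import urlencode


def stream_file_url(update_url: str, path: str) -> str:
    """Build an update URL asking Solr to read a local file via stream.file.

    update_url is an assumed endpoint such as http://localhost:8983/solr/update;
    path is the full path of the file as seen by the Solr server.
    """
    # safe="/" keeps the path readable instead of encoding every slash.
    return update_url + "?" + urlencode({"stream.file": path}, safe="/")
```

For example, `stream_file_url("http://localhost:8983/solr/update", "/full/path/name.xml")` yields a URL that can be passed to curl or any HTTP client. The path has to be readable by the Solr process itself, since the client never sends the file contents.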

Re: Backup/restore strategies for Solr cores and "legacy" Lucene applications

2010-04-07 Thread Lance Norskog
The NFS mount has to be done with distributed file locking. I don't know what DFL features are available. OS Native file locking is the default in solrconfig.xml, and I think this should be used in your scenario. But doing this over NFS is not likely to work well. On Tue, Apr 6, 2010 at 6:42 AM,

Multi-core memory problem

2010-04-07 Thread Victoria Kagansky
Hi, We are using Solr 1.4 running 2 cores each containing ~90M documents. Each core index size on the disk is ~ 120 G. The machine is a 64-bit quad-core 64G RAM running Windows Server 2008. Max heap size is set to 9G for the Tomcat process. Default caches are used. Our queries are complex and in

Handling missing date fields in a date-oriented function query

2010-04-07 Thread Chris Harris
I'm using function queries to boost more recent documents, using something like the recip(ms(NOW,mydatefield),3.16e-11,1,1) approach described on the wiki: http://wiki.apache.org/solr/FunctionQuery#Date_Boosting What I'd like to do is figure out the best way to tweak how documents with missi
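
For readers unfamiliar with the boost above: recip(x,m,a,b) evaluates a/(m*x+b), and ms(NOW,mydatefield) is the document's age in milliseconds. The Python below is only a sketch of that arithmetic; `date_boost` and `MS_PER_YEAR` are names made up for the illustration, and it does not model how Solr treats documents with a missing date.

```python
MS_PER_YEAR = 365 * 24 * 60 * 60 * 1000  # milliseconds in a 365-day year


def recip(x: float, m: float, a: float, b: float) -> float:
    """Reciprocal function a / (m*x + b), as in the function query above."""
    return a / (m * x + b)


def date_boost(age_ms: float) -> float:
    """Boost for a document age_ms milliseconds old: recip(age, 3.16e-11, 1, 1)."""
    return recip(age_ms, 3.16e-11, 1, 1)
```

Since 3.16e-11 is roughly 1 divided by the number of milliseconds in a year, a brand-new document gets a boost of 1.0, a one-year-old document about 0.5, and the boost keeps shrinking with age.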

Re: If you could have one feature in Solr...

2010-04-07 Thread Ingo Renner
On 25.02.2010 at 02:07, Andy wrote: > 1) Built-in hierarchical faceting > Right now there're 2 patches, SOLR-64 and SOLR-792. SOLR-64 seems to be > slated for 1.5 release but according to the wiki seems to have poor > performance. SOLR-792 has better performance according to the wiki but it's

Re: If you could have one feature in Solr...

2010-04-07 Thread Ingo Renner
On 24.02.2010 at 14:42, Grant Ingersoll wrote: > What would it be? Remote administration/editing/filling of synonyms.txt, stopwords.txt, ... through a request handler, maybe a JSON interface or similar best Ingo -- Ingo Renner TYPO3 Core Developer, Release Manager TYPO3 4.2, Admin Google S

Need help with StackOverflowError

2010-04-07 Thread Blargy
My last few delta-imports via DIH have been failing with a StackOverflowError. Has anyone else encountered this while trying to import? I don't even see any relevant information in the stack trace. Can anyone offer some suggestions? Thanks... Apr 7, 2010 2:13:34 PM org.apache.solr.handler.dataimp

Re: Some help for folks trying to get new Solr/Lucene up in Eclipse

2010-04-07 Thread Chris Hostetter
: I had a slight hiccup that I just ignored. Even when I used Java 1.6 : JDK mode, Eclipse did not know this method. I had to comment out the : three places that use this method. : : javax.xml.parsers.DocumentBuilderFactory.setXIncludeAware(true) That method has existed since Java 1.5, so if you

Suggestion for cachedSQLentityprocessor

2010-04-07 Thread bbarani
Hi, I just thought I'd share a suggestion for overcoming OOM issues with CachedSQLEntityProcessor. Consider the scenario below. If we have sub-entities in DIH, ---> object --> object properties cachedSqlEntityprocessor works as below: • First enti

Best practice to handle misspellings

2010-04-07 Thread Blargy
What is the best way to handle misspellings? Completely ignore them and suggest alternative searches, or some sort of fuzzy matching? Also, is it possible to use fuzzy matching using the dismax request handler? Thanks -- View this message in context: http://n3.nabble.com/Best-practice-to-handl

Re: including external files in config by corename

2010-04-07 Thread Shawn Heisey
On 4/7/2010 9:16 AM, Shawn Heisey wrote: On 4/5/2010 8:12 PM, Chris Hostetter wrote: what you can do however, is have a distinct solrconfig.xml for each core, which is just a thin shell that uses XInclude to include big chunks of frequently reused declarations, and some cores can exclude some

Re: Minimum Should Match the other way round

2010-04-07 Thread MitchK
Erik, thank you for responding. I will check the code to get some ideas for implementation. I do need some cached resources like the CharArraySet of protected words for a WordDelimiterFilter (for the MAX_LEN-parameter mentioned by Hoss) or a SynonymFilter. I think it would consume too much t

RE: solr best practice to submit many documents

2010-04-07 Thread Wawok, Brian
I don't think I want to stream from Java, text munging in Java is a PITA. Would rather stream from a script, so need a more general solution. The Streaming document interface looks interesting, let me see if I can figure out how to achieve the same thing without a Java client.. Brian -Ori

Re: solr best practice to submit many documents

2010-04-07 Thread Paolo Castagna
Hi Brian, I had similar questions when I began to try and evaluate Solr. If you use Java and SolrJ you might find these useful: - http://wiki.apache.org/solr/Solrj#Streaming_documents_for_an_update - http://lucene.apache.org/solr/api/org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer

Is it possible to have Lucene and Solr (or two Solr instances) pointing at the same index directory?

2010-04-07 Thread Paolo Castagna
Hi, (I know that this is probably not recommended and not a common scenario, but...) Is it possible to have an application using Lucene and a separate (i.e. different JVM) instance of Solr both pointing at the same index and read/write to the index from both applications? I am trying (separately

solr best practice to submit many documents

2010-04-07 Thread Wawok, Brian
Hello, I am using SOLR for some proof of concept work, and was wondering if anyone has some guidance on a best practice. Background: Nightly get a delivery of a few 1000 reports. Each report is between 1 and 500,000 pages. For my proof of concept I am using a single 100,000 page report. I want

Short Question: Fills this entity multiValued Fields (DIH)?

2010-04-07 Thread MitchK
Hello all, I read through the wiki to find a solution on filling multiValued fields with, you guessed it, multiple values. :) What I have found was a short excerpt of code and I am not really sure, whether this fills a multiValued-field with multiple values. The code (not everything is relevant

Re: including external files in config by corename

2010-04-07 Thread Shawn Heisey
On 4/5/2010 8:12 PM, Chris Hostetter wrote: what you can do however, is have a distinct solrconfig.xml for each core, which is just a thin shell that uses XInclude to include big chunks of frequently reused declarations, and some cores can exclude some of these includes. (ie: turn the problem in

Re: Bucketing a price field

2010-04-07 Thread Blargy
Duh, didn't even think of that. This will probably be the easy way for now since we are only using a small number of predefined ranges. Thanks for the reply -- View this message in context: http://n3.nabble.com/Bucketing-a-price-field-tp701801p703169.html Sent from the Solr - User mailing list a

Re: including external files in config by corename

2010-04-07 Thread Shawn Heisey
On 4/5/2010 8:43 PM, Mark Miller wrote: On 04/05/2010 10:12 PM, Chris Hostetter wrote: : The best you have to work with at the moment is Xincludes: : : http://wiki.apache.org/solr/SolrConfigXml#XInclude : : and System Property Substitution: : : http://wiki.apache.org/solr/SolrConfigXml#System_pr

RE: solr.WordDelimiterFilterFactory problem with hyphenated terms?

2010-04-07 Thread Demian Katz
Hello. It has been a few weeks, and I haven't gotten any responses. Perhaps my question is too complicated -- maybe a better approach is to try to gain enough knowledge to answer it myself. My gut feeling is still that it's something to do with the way term positions are getting handled by th

Re: Bucketing a price field

2010-04-07 Thread gwk
Oops, the new patch only works on Trie fields, other stuff I said should still be valid. (One extra thing to be aware of is double counting, see http://n3.nabble.com/Date-Faceting-and-Double-Counting-td502014.html for example) Regards, gwk On 4/7/2010 4:03 PM, gwk wrote: Hi, A while back I

Re: Bucketing a price field

2010-04-07 Thread gwk
Hi, A while back I created a patch for Solr (http://issues.apache.org/jira/browse/SOLR-1240) to do range faceting on numbers. I haven't uploaded an updated patch for Solr 1.4 yet, I'll try to do that shortly. I haven't tested it on a floating point field but in theory it should work on most n

Re: deploying nightly updates to slaves

2010-04-07 Thread Lukas Kahwe Smith
On 07.04.2010, at 14:24, Lukas Kahwe Smith wrote: > For Solr the idea is also to just copy the index files into a new directory and > then use http://wiki.apache.org/solr/CoreAdmin#RELOAD after updating the > config file (I assume it's not possible to hot swap like with MySQL). Since I want to ke
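
A sketch of the CoreAdmin RELOAD request mentioned above, in Python. The base URL, the core name `core0`, and the `/admin/cores` path are assumptions (the admin path is configurable in solr.xml), and `core_reload_url` is a helper name invented for the example.

```python
from urllib.parse import urlencode


def core_reload_url(base_url: str, core: str) -> str:
    """Build a CoreAdmin RELOAD URL for one core.

    base_url is an assumed Solr root such as http://localhost:8983/solr;
    the /admin/cores suffix assumes the commonly used adminPath setting.
    """
    query = urlencode({"action": "RELOAD", "core": core})
    return base_url.rstrip("/") + "/admin/cores?" + query
```

Calling `core_reload_url("http://localhost:8983/solr", "core0")` gives a URL that a nightly deployment script could request on each slave once the new index files are in place.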

Re: Minimum Should Match the other way round

2010-04-07 Thread Erik Hatcher
On Apr 7, 2010, at 7:40 AM, MitchK wrote: I can't believe that Solr isn't caching data like the synonym.txt's etc. Solr does cache these, look at the implementation of SynonymFilterFactory where it keeps SynonymMap. Are there no ideas how to access them? There is a public getSynonymMap

Re: Bucketing a price field

2010-04-07 Thread Erik Hatcher
On Apr 6, 2010, at 8:44 PM, Blargy wrote: What would be the best way to do range bucketing on a price field? I'm sort of taking the example from the Solr 1.4 book and I was thinking about using a PatternTokenizerFactory with a SynonymFilterFactory. Is there a better way? For faceting..

deploying nightly updates to slaves

2010-04-07 Thread Lukas Kahwe Smith
Hi, For a project I am running a LAMP cluster (master and multiple slaves). Solr is running inside Jetty. To make things easy in terms of server management, all servers are configured the same way, and one server just acts as the MySQL master. As for Solr the only data changes happen over nigh

Re: Searching Lucene Indexes with Solr

2010-04-07 Thread Paolo Castagna
Erick Erickson wrote: It is possible but you have to take care to match Solr's schema with the structure of documents in the Lucene index. The correct field names and query-analyzers should be configured in schema.xml Is it possible to use Solr v1.4 together with a legacy Lucene (v2.1.0 and/or

Re: Minimum Should Match the other way round

2010-04-07 Thread MitchK
I can't believe that Solr isn't caching data like the synonym.txt's etc. Are there no ideas how to access them? - Mitch -- View this message in context: http://n3.nabble.com/Minimum-Should-Match-the-other-way-round-tp694867p702761.html Sent from the Solr - User mailing list archive at Nabble.c

Re: Searching Lucene Indexes with Solr

2010-04-07 Thread Erick Erickson
Copying from another answer to this question on the list (See "how to deploy index on SOLR")... It is possible but you have to take care to match Solr's schema with the structure of documents in the Lucene index. The correct field names and query-analyzers should be configured in schema.xml HTH

Re: no of cfs files are more that the mergeFactor

2010-04-07 Thread Michael McCandless
I like that name! That's a good way to think of it, assuming the available coins/bill denominations grow exponentially with a base roughly of mergeFactor :) It's also like the odometer on a car. Mike On Tue, Apr 6, 2010 at 10:51 PM, Lance Norskog wrote: > Ok, thanks. I'm studying the RAM buffe
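
The odometer analogy above can be made concrete with an idealized model in Python. This is a sketch under strong assumptions (every flush produces an equal-sized segment, and a merge fires as soon as mergeFactor segments exist at a level); it is not Lucene's actual merge policy code, and `segments_per_level` is a name made up here.

```python
def segments_per_level(flushes: int, merge_factor: int) -> list:
    """Count segments at each level after a number of equal-sized flushes.

    Idealized model: each flush adds one level-0 segment, and whenever a
    level reaches merge_factor segments they merge into one segment at the
    next level. The result is the flush count written in base merge_factor,
    least significant digit first, like an odometer.
    """
    levels = []
    for _ in range(flushes):
        level = 0
        while True:
            if level == len(levels):
                levels.append(0)
            levels[level] += 1
            if levels[level] < merge_factor:
                break
            levels[level] = 0  # these segments were merged away...
            level += 1         # ...into one segment at the next level
    return levels
```

In this model the total segment count is the sum of the digits, so with mergeFactor 10 and 99 flushes there are 18 segments, which shows how an index can hold more segments than mergeFactor.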

Re: Search results based on priority

2010-04-07 Thread MitchK
Doni, have a look at DisMaxRequestHandler. For more information consider the Solr-Wiki. Kind regards - Mitch -- View this message in context: http://n3.nabble.com/Search-results-based-on-priority-tp701487p702350.html Sent from the Solr - User mailing list archive at Nabble.com.