Re: Using DIH to import 10 million records

2012-03-04 Thread Mikhail Khludnev
On Mon, Mar 5, 2012 at 5:56 AM, Lance Norskog wrote: > You can run the DIH with multiple threads feeding from the same query. > FWIW, https://issues.apache.org/jira/browse/SOLR-3011 > Depends also on the size of the document: large documents may index > faster if they have their own threads. Th

Re: Modify Standalone solr server to use it application without http request

2012-03-04 Thread Neel
Hi Erick, Sorry for confusing you. My concern was that consuming a standalone Solr server through my application could lead to additional HTTP requests. I tried to access the standalone server using CommonsHttpSolrServer in the app and found it works the same as the embedded Solr server, with no code change needed and not mu

Re: Using DIH to import 10 million records

2012-03-04 Thread Sonali Sambhus
Thanks for the info, Shawn! On Mon, Mar 5, 2012 at 6:49 AM, Shawn Heisey wrote: > On 3/4/2012 3:31 AM, Sphene Software wrote: > >> Folks, >> >> I am planning to use DIH for an index of size 10 million records. >> >> I would like to know the following; >> - Can DIH scale for this size of an index

Re: Fw:how to make fdx file

2012-03-04 Thread Li Li
Lucene never modifies old segment files; it just flushes into a new segment or merges old segments into a new one. After merging, the old segments are deleted. Once a file (such as fdt or fdx) is generated, it is never regenerated. The only possibility is that in the generating stage, there is

Re: date queries too slow

2012-03-04 Thread veerene
Erick, thank you very much, it's really helpful!

Re: [SoldCloud] Slow indexing

2012-03-04 Thread Mark Miller
On Mar 4, 2012, at 5:43 PM, Markus Jelsma wrote: > everything stalls after it lists all segment files and that a ZK state change > has occurred. Can you get a stack trace here? I'll try to respond to more tomorrow. What version of trunk are you using? We have been making fixes and improvements

Re: Using DIH to import 10 million records

2012-03-04 Thread Lance Norskog
You can run the DIH with multiple threads feeding from the same query. Depends also on the size of the document: large documents may index faster if they have their own threads. This may then interact with the new NRT multi-commit code. On Sun, Mar 4, 2012 at 5:19 PM, Shawn Heisey wrote: > On 3/4

Re: Using DIH to import 10 million records

2012-03-04 Thread Shawn Heisey
On 3/4/2012 3:31 AM, Sphene Software wrote: Folks, I am planning to use DIH for an index of size 10 million records. I would like to know the following; - Can DIH scale for this size of an indexes - If DIH is a bottleneck, what is the specific issue and how it can be addressed My entire index

Re: Fw:how to make fdx file

2012-03-04 Thread Erick Erickson
No, updating and optimizing will not delete and regenerate files. At most, they'll create the new segments and, only after everything is written, delete obsolete files. So whatever happened (and I haven't seen anything like this on the user's list), you're really out of luck. But the question is

Fwd: Re: [SoldCloud] Slow indexing

2012-03-04 Thread Markus Jelsma
Hi. Perhaps, but my indexers just failed and received a Service Unavailable error. The nodes themselves started to report multiple exceptions after many hours of indexing. One for an external file field which usually is not a problem and errors for not being able to talk to ZK. %2012-03-04 16:37

Re: [SoldCloud] Slow indexing

2012-03-04 Thread eks dev
hmm, looks like you are facing exactly the phenomenon I asked about. See my question here: http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/61326 On Sun, Mar 4, 2012 at 9:24 PM, Markus Jelsma wrote: > Hi, > > With auto-committing disabled we can now index many millions of documents in

Re: Geonames/spatial stuff usable in 3.5?

2012-03-04 Thread Chris A Mattmann
(sorry for cross post -- if you are only interested in spatial portion, head over to discuss on sis-dev@) Hi Javi, Ultimately I'd love these patches to be applied to Solr trunk, or to the current 3.x stable branch. At the time I posted there was little interest in doing that and rather than have

Re: date queries too slow

2012-03-04 Thread Erick Erickson
You might find this of interest re: filter queries and NOW (the current time) http://www.lucidimagination.com/blog/2012/02/23/date-math-now-and-filter-queries/ Best Erick On Fri, Mar 2, 2012 at 3:56 PM, veerene wrote: > thanks for responding. we will try the trie fields. > the reason we are not
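The linked post is about how NOW interacts with filter queries. As a minimal sketch of the usual concern (the field name `timestamp` and the helper `build_date_filter` are hypothetical, not taken from the thread): a filter query built on bare NOW resolves to a different instant on every request, so it seldom matches a cached filter, whereas Solr date-math rounding such as NOW/DAY produces the same range for the whole day.

```python
from urllib.parse import urlencode


def build_date_filter(field, days_back, rounded=True):
    """Build a Solr range filter over the last `days_back` days.

    With rounded=True the bounds use NOW/DAY, so the filter is identical
    for every request made on the same day. With rounded=False the bounds
    use bare NOW, which Solr resolves to the current millisecond.
    """
    if rounded:
        # Round down to midnight, then extend to the end of today.
        return f"{field}:[NOW/DAY-{days_back}DAYS TO NOW/DAY+1DAY]"
    return f"{field}:[NOW-{days_back}DAYS TO NOW]"


# Hypothetical field name, for illustration only.
fq = build_date_filter("timestamp", 7)
params = urlencode({"q": "*:*", "fq": fq})
print(fq)
print(params)
```

Whether day-level rounding is acceptable depends on how fresh the results need to be; coarser or finer units (NOW/HOUR, NOW/MINUTE) trade cache reuse against precision.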

Re: Help with duplicate unique IDs

2012-03-04 Thread Erick Erickson
Thomas: It's *vaguely* possible that you have, say, a space in one of those keys. the string type is pretty stupid about things like that, it takes the raw input and does *nothing* to it. I'm assuming that you don't have shards. If you have shards it's possible you indexed the same uniqueKey to d
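To illustrate the point about stray whitespace: because the string type keeps the raw input, two keys that look the same on screen but differ by a space are distinct values. A minimal sketch, using made-up key values and a hypothetical helper name, of how one might check source data for such variants before indexing:

```python
from collections import defaultdict


def find_whitespace_variants(keys):
    """Group raw keys by their stripped form and return only those groups
    that contain more than one distinct raw spelling."""
    groups = defaultdict(set)
    for key in keys:
        groups[key.strip()].add(key)
    return {norm: sorted(raw) for norm, raw in groups.items() if len(raw) > 1}


# Made-up sample keys; "doc-1 " carries a trailing space.
sample_keys = ["doc-1", "doc-1 ", "doc-2"]
variants = find_whitespace_variants(sample_keys)
for normalized, raw_forms in variants.items():
    print(repr(normalized), "appears as", [repr(r) for r in raw_forms])
```

This only detects leading and trailing whitespace; other invisible differences (for example non-breaking spaces inside the key) would need a broader normalization.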

Re: Indexing and mapping multiple files to a unique solr id

2012-03-04 Thread Erick Erickson
Sounds like a fine application for using SolrJ. Here's a blog on the topic: http://www.lucidimagination.com/blog/2012/02/14/indexing-with-solrj/ In your case, just replace the tika bit with the PDFBox extraction you're using (or, you could just let Tika do it instead) and combine that with whatev

Indexing and mapping multiple files to a unique solr id

2012-03-04 Thread nitinkhosla79
My use case is to index 2 files: metadata file and a binary pdf file to a unique solr id. Metadata file has content in form of xml file and schema fields are mapped to elements in that file. What I do: Extract content from pdf files(using pdftotext), process that content and retrieve specific info

Geonames/spatial stuff usable in 3.5?

2012-03-04 Thread jmlucjav
Hi, I was looking for a way to use spatial search given a location name (like 'dallas,tx'), and also given an IP, and I found http://lucene.472066.n3.nabble.com/Spatial-Geonames-and-extension-to-Spatial-Solution-for-Solr-tc1311813.html this post by Chris Mattmann mentioning some work with Geona

[SoldCloud] Slow indexing

2012-03-04 Thread Markus Jelsma
Hi, With auto-committing disabled we can now index many millions of documents in our test environment on a 5-node cluster with 5 shards and a replication factor of 2. The documents are uploaded from map/reduce. No significant changes were made to solrconfig and there are no update processors

Re: General question on understanding Solr log output

2012-03-04 Thread Loren Siebert
Mikhail- That makes sense about the commits. Solr's reporting that it has 2 commits (the new one and the prior one), and then it deletes the old one and reports that there is 1 remaining commit in the directory (the one just applied). Thanks for explaining that. My next question is on all the logg

Re: Securing solr

2012-03-04 Thread Em
Hi, if you run Apache in front of your Tomcat-Instance/Servlet-Container, you can do that by specifying access-rules in your .htaccess-file (either password-based or IP-based). However there also exist Tomcat, JBoss, xyz-specific methods to do that. Try to search for it specific to your servlet-

Re: Securing solr

2012-03-04 Thread Gora Mohanty
On 4 March 2012 19:51, Ramo Karahasan wrote: [...] > i'm somehow unable to "secure" my  solr instance that runs on a dedicated > server. I have a webapplication that needs this solr instance, but the > webserver is running on another dedicated server. Is it possible to somehow > secure the solr in

org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.dataimport.DataImportHandler'

2012-03-04 Thread PeterKerk
I wanted to upgrade from version apache-solr-4.0-2010-10-12_08-05-48 to apache-solr-3.5.0. I installed apache-solr-3.5.0 and then copied all stuff from \example-DIH\solr of old installation to \example-DIH\solr but then I get the following error after I restart my server and try to do a full-imp

Securing solr

2012-03-04 Thread Ramo Karahasan
Hi, i'm somehow unable to "secure" my solr instance that runs on a dedicated server. I have a webapplication that needs this solr instance, but the webserver is running on another dedicated server. Is it possible to somehow secure the solr instance, e.g. with a web authentication mechanism or

Re: Need tokenization that finds part of stringvalue

2012-03-04 Thread Ahmet Arslan
> @iorixxx > I tried making my title_search of type text_rev and tried > adding the > ReversedWildcardFilterFactory to my existing "text" type, > but in both cases > no luck. I was able to perform *query* types of searches with solr 3.5 distro. Here is what I did: Download apache-solr-3.5.0 Edit

RE: Retrieving multiple levels with hierarchical faceting in Solr

2012-03-04 Thread adrian.strin...@holidaylettings.co.uk
At the moment, I'm just using a multi-valued string field. I was previously using a text field that was defined as follows: I've tried to have a look on the net, but I can't seem to find any d

Re: Need tokenization that finds part of stringvalue

2012-03-04 Thread PeterKerk
@iorixxx I tried making my title_search of type text_rev and tried adding the ReversedWildcardFilterFactory to my existing "text" type, but in both cases no luck. @Erick Erickson "One frequent method of doing leading and trailing wildcards is to use ngrams (as distinct from edgengrams). That in com

Using DIH to import 10 million records

2012-03-04 Thread Sphene Software
Folks, I am planning to use DIH for an index of size 10 million records. I would like to know the following; - Can DIH scale for this size of an indexes - If DIH is a bottleneck, what is the specific issue and how it can be addressed I also read about solrnet. Any experience using this and it's

Re: nutch log

2012-03-04 Thread alessio crisantemi
Thanks Koji, but I don't understand how to do that.. On 4 March 2012 06:31, Koji Sekiguchi wrote: > It is not solr error. Consult nutch/hadoop mailing list. > > > koji > -- > Query Log Visualizer for Apache Solr > http://soleami.com/ > > (12/03/04 2:38), alessio crisantemi wrote: > >> no