Re: Questions on compound file format
Yonik Seeley wrote:
> Compound was a *lot* slower indexing in past versions of Lucene...

With Lucene 2.4.1 and Solr 1.3 on a RHEL 5.1 system, I've seen a ~40% indexing speed improvement from turning off the compound file format while building a fresh index of ~500,000 files. However, if you process a lot of files, you will inevitably hit the FileNotFoundException ("Too many open files").

-Kenny
-- 
View this message in context: http://old.nabble.com/Questions-on-compound-file-format-tp19318855p26903854.html
Sent from the Solr - User mailing list archive at Nabble.com.
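The open-files trade-off can be checked on a real index: without compound files, each Lucene segment is written as several separate files instead of one .cfs file. The sketch below is a rough diagnostic, assuming a Unix system (it uses Python's `resource` module) and an index directory path you supply; `files_per_segment` and `report` are hypothetical helpers. It groups files by segment name and prints the count next to the process's open-file limit. That limit belongs to the Python process, so it only matches the Solr JVM's limit if both are started under the same `ulimit` settings.

```python
import os
import re
import resource
from collections import Counter

# Lucene segment names are "_" plus a base-36 counter, e.g. "_0", "_a", "_1f".
SEGMENT_PREFIX = re.compile(r"_[0-9a-z]+")

def files_per_segment(index_dir):
    """Count index files per segment: '_0.fdt', '_0.fdx' and '_0_1.del' all count toward '_0'.

    segments_N and segments.gen are commit bookkeeping files and are skipped.
    Exact file extensions vary between Lucene versions.
    """
    counts = Counter()
    for name in os.listdir(index_dir):
        match = SEGMENT_PREFIX.match(name)
        if match:
            counts[match.group(0)] += 1
    return dict(counts)

def report(index_dir):
    """Print segment/file totals next to this process's open-file limit (Unix only)."""
    counts = files_per_segment(index_dir)
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"{len(counts)} segments, {sum(counts.values())} segment files; "
          f"open-file limit: soft={soft}, hard={hard}")
```

For example, `report("/path/to/solr/data/index")` (hypothetical path) while indexing; if the file count approaches the soft limit, either raise the limit or keep compound files on.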
Solr configuration with Text files
I am trying to figure out how to configure Solr. I have worked with the example and have been reading over the wiki, and I am having some difficulty figuring out how to set up this simple scenario: index a large number of text files (not the CSV files Solr can ingest directly) that are named using an id. I have schema.xml set up like this:

I want to be able to define the id and subname fields in an XML file like this:

  id: 11012121200023232323
  subname: 11012121200023232323_SYSTEM_OUT_.data

But I want the 'content' field for each entry to be filled in with the contents of one of these id-named files. I looked into setting up a DataImportHandler, but it seemed to be targeted at database use (except for the PlainTextEntityProcessor, which isn't available in the latest release [1.3]). What is the best way to go about doing this in Solr?
Re: Solr configuration with Text files
This functionality is possible 'out of the box', right? Or will I need to code up something that reads in the id-named files and generates the XML file?
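If it does come down to scripting, the script only has to read each id-named file and wrap its text in Solr's XML update format (`<add><doc><field name="...">`). A minimal sketch, assuming the field names `id`, `subname` and `content` from the question and a hypothetical naming convention where each data file is called `<id>_SYSTEM_OUT_.data` (inferred from the example subname); `read_data_files` and `build_add_xml` are illustrative names, so adjust them and the suffix to the real schema.

```python
import os
import xml.etree.ElementTree as ET

# Hypothetical naming convention, inferred from the example subname in the question.
SUFFIX = "_SYSTEM_OUT_.data"

def read_data_files(data_dir):
    """Yield (file name, text) for every '<id>_SYSTEM_OUT_.data' file in data_dir."""
    for name in sorted(os.listdir(data_dir)):
        if name.endswith(SUFFIX):
            with open(os.path.join(data_dir, name), encoding="utf-8") as f:
                yield name, f.read()

def build_add_xml(files):
    """Build one Solr <add> message with a <doc> per (file name, text) pair."""
    add = ET.Element("add")
    for name, text in files:
        doc_id = name[: -len(SUFFIX)] if name.endswith(SUFFIX) else name
        doc = ET.SubElement(add, "doc")
        for field, value in (("id", doc_id), ("subname", name), ("content", text)):
            ET.SubElement(doc, "field", name=field).text = value
    # ElementTree escapes &, < and > in the file contents.
    return ET.tostring(add, encoding="unicode")
```

The result of `build_add_xml(read_data_files("/path/to/files"))` can then be POSTed to the core's update handler (`http://localhost:8983/solr/update` in the stock example) with `Content-Type: text/xml`, followed by a `<commit/>`. For a very large number of files, sending batches rather than one huge message is kinder to memory on both sides.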
Re: Solr configuration with Text files
Thanks for the responses, guys! I looked around the wiki for an example of using DataImportHandler to iterate over a list of files and read their content into a field, and didn't find anything. I agree it would be useful!

Erik Hatcher wrote:
> Using Solr Cell (ExtractingRequestHandler), which is now built into
> trunk and thus into an eventual Solr 1.4 release, indexing a directory of
> text (or even Word, PDF, etc.) files is mostly 'out of the box'.
>
> It still requires scripting an iteration over all files and sending
> them. Here's an example of doing that scripting using Ant and the
> ant-contrib and tasks:
>
> [Ant build file not preserved in the archive; surviving fragments:
> Processing @{filename} / failonerror="true"]
>
> And it also should be possible, perhaps slightly easier and more
> built-in, to do the entire iteration using DataImportHandler's ability
> to iterate over a list of files and read their contents into a field.
> [an example of this on the wiki would be handy, or a pointer to it if
> it doesn't already exist]
>
> Erik
>
> On Mar 10, 2009, at 2:01 PM, KennyN wrote:
>> This functionality is possible 'out of the box', right? Or am I going
>> to need to code up something that reads in the id named files and
>> generates the xml file?
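The per-file loop Erik describes does not have to be Ant. A minimal Python sketch, assuming Solr Cell's ExtractingRequestHandler is mapped at `/update/extract` on a local Solr (as in the trunk example config), using `literal.id` to set the unique key. The base URL, the helper names and the id-from-file-name rule are all assumptions. It sends each file's raw bytes as the POST body with an explicit content type instead of a multipart upload.

```python
import os
import urllib.parse
import urllib.request

# Assumed location of the ExtractingRequestHandler (Solr Cell); adjust host/port/core.
SOLR_EXTRACT = "http://localhost:8983/solr/update/extract"

def extract_url(doc_id, commit=False):
    """URL that sets the uniqueKey via literal.id and optionally commits."""
    params = {"literal.id": doc_id}
    if commit:
        params["commit"] = "true"
    return SOLR_EXTRACT + "?" + urllib.parse.urlencode(params)

def post_file(path, doc_id, content_type="text/plain", commit=False):
    """POST one file's raw bytes to Solr Cell; returns the HTTP status code."""
    with open(path, "rb") as f:
        body = f.read()
    req = urllib.request.Request(extract_url(doc_id, commit), data=body,
                                 headers={"Content-Type": content_type}, method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.status

def index_directory(data_dir):
    """Send every regular file in data_dir, committing once after the last one."""
    paths = [os.path.join(data_dir, n) for n in sorted(os.listdir(data_dir))]
    paths = [p for p in paths if os.path.isfile(p)]
    for i, path in enumerate(paths):
        # Hypothetical rule: the document id is the file name without its extension.
        doc_id = os.path.splitext(os.path.basename(path))[0]
        post_file(path, doc_id, commit=(i == len(paths) - 1))
```

Committing only once, after the last file, avoids reopening a searcher for every document. Word or PDF files would need the matching `content_type` instead of `text/plain`.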
Multicore Solr not returning expected results from search
I have a two-core multicore setup which currently has identical indices (just for testing; they will have different data when I deploy the system). I based this on the example, so the cores are core0 and core1. I have customized the schema.xml and solrconfig.xml files (core0 and core1 are identical except that they have different data dirs), and have indexed data in both cores.

I can start them without any errors showing up using:

java -Dsolr.solr.home=multicore -jar start.jar

I can search against each core correctly. For example, a search for 'bob' using
http://192.168.55.101:8983/solr/core0/select/?version=2.1&q=contents:bob&start=0&rows=2
returns
0 169 0 contents:bob 2.1 100 etc

The other core:
http://192.168.55.101:8983/solr/core1/select/?version=2.1&q=contents:bob&start=0&rows=2
returns
0 169 0 contents:bob 2.1 100 etc

So both cores return the same results, as I expect (793). However, when I combine the search I get strange results. Doing the same search with the shards param:
http://192.168.55.101:8983/solr/core0/select/?shards=192.168.55.101:8983/solr/core0,192.168.55.101:8983/solr/core1&version=2.1&q=contents:bob&start=0&rows=2
returns
0 378 192.168.55.101:8983/solr/core0,192.168.55.101:8983/solr/core1 0 contents:bob 2.1 2 ... etc.

So why don't I get the sum of the two cores' results? If it's any help, when I put the shards param in the SearchHandler configuration, Solr never returns any results (perhaps that is a different issue, though?).
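For reference, the distributed request above has this shape: it is sent to one core, and `shards` lists every core to search (here including the receiving one) as comma-separated `host:port/path` entries without `http://`. A small Python sketch that rebuilds the same URL from the host and core names in the post; `sharded_select_url` is a hypothetical helper, and the `version` parameter is left out.

```python
from urllib.parse import urlencode

def sharded_select_url(host, cores, query, start=0, rows=10):
    """Build a distributed /select URL, sent to the first core, searching all listed cores."""
    shards = ",".join(f"{host}/solr/{core}" for core in cores)
    params = {"shards": shards, "q": query, "start": start, "rows": rows}
    # Keep ':', '/' and ',' unescaped so the URL stays readable.
    return f"http://{host}/solr/{cores[0]}/select/?" + urlencode(params, safe=":/,")
```

Calling `sharded_select_url("192.168.55.101:8983", ["core0", "core1"], "contents:bob", rows=2)` reproduces the sharded query from the post, minus `version=2.1`.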
Re: Multicore Solr not returning expected results from search
Not in this case. I literally copied the same index to the two shards; in my non-test environment the ids will be unique, however. If that is the issue, wouldn't the number of results always be 793 in that case? 379 is fewer than either index contains.

markrmiller wrote:
> Do you have unique ids across shards?
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
> KennyN wrote:
>> I have a two core multicore setup which currently has identical indices
>> (just for testing, they will have different data when i deploy the system).
>> [...]
>> So why would I not get the sum of the two cores results?
>>
>> If it's any help, when I place the shards param in the SearchHandler, Solr
>> never returns any results(perhaps that is a different issue though?).
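Mark's question matters because distributed search merges shard results by the schema's uniqueKey: documents sharing a key are treated as one document, and the Solr wiki's distributed search page warns that duplicate keys across shards give unreliable results. The toy model below is not Solr's merge code. It is a simplified illustration, assuming each shard returns `(doc_id, score)` pairs, of why two identical indices cannot simply add up: every id collides, so the merged set is no larger than one shard. It does not by itself explain a count smaller than a single core.

```python
def merge_shards(*shard_results):
    """Toy model of a distributed merge keyed on uniqueKey.

    Each shard result is a list of (doc_id, score). The first entry seen for an id
    wins; later entries with the same id are dropped as duplicates.
    """
    merged = {}
    for results in shard_results:
        for doc_id, score in results:
            merged.setdefault(doc_id, score)
    # Highest score first, like a relevance-sorted result page.
    return sorted(merged.items(), key=lambda item: item[1], reverse=True)
```

With `a = [("1", 0.9), ("2", 0.5)]`, `merge_shards(a, a)` gives back just the two entries of `a`, while merging `a` with a shard holding a different id yields three.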
Re: Multicore Solr not returning expected results from search
I am still trying to figure this out... I am thinking maybe I have the shards set up wrong? I have core0 and core1 with indices, and I run the query on core0, specifying shards of core0 and core1. Is this how I should be doing it? Or should I have another core just to specify the other shards?
Re: Multicore Solr not returning expected results from search
Thanks for the reply, ahammad, that helps. Are you specifying them both in a URL, or as localhost:8983/solr/core0,localhost:8983/solr/core1 like I have?

I should add that I now have two indices with different data in them. That is to say, the ids are unique across both shards, and I am still seeing this issue... I should also note that this is Solr 1.3; I don't think I mentioned that before.

ahammad wrote:
> I have a multicore setup as well, and when I query something, I do it
> through core0, then specify both core0 and core1 in the "shards"
> parameter.
>
> However, I don't have identical indices. The results I get back are
> basically an addition of both cores' results.
>
> Good luck, please reply to this message if you have it figured out, I am
> curious to know what's going on.
>
> Regards
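One way to narrow this down is to compare numFound directly: query each core on its own, then the sharded URL, and check whether the distributed count equals the sum. A minimal sketch that reads numFound from Solr's standard XML response (`<result name="response" numFound="..." ...>`); fetching the responses is left out, and `num_found` and `counts_add_up` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

def num_found(response_xml):
    """Return numFound from a Solr XML select response."""
    root = ET.fromstring(response_xml)
    result = root.find("result[@name='response']")
    if result is None:
        raise ValueError("no <result name='response'> element in response")
    return int(result.get("numFound"))

def counts_add_up(per_core_xml, sharded_xml):
    """True if the sharded numFound equals the sum of the per-core numFound values."""
    return num_found(sharded_xml) == sum(num_found(x) for x in per_core_xml)
```

With unique ids across cores, as ahammad describes, the check should pass; if it fails, comparing which ids come back from each core versus the sharded query is the next step.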