How to index/search without whitespace but highlight with whitespace?
Hey everyone! I'm trying to set up a Solr instance on some free-text clinical data. This data has a lot of whitespace formatting; for example, a document might contain unstructured bulleted lists or section titles, like:

    blah blah blah...
    MEDICATIONS:
    * Xanax
    * Phenobritrol
    DIAGNOSIS:
    blah blah blah...

When indexing (and thus querying) this document, I use a text field with tokenization, stemming, etc.; let's call it "text". Unfortunately, when I try to print highlighted results, the newlines and whitespace are obviously not preserved. In an attempt to get around this, I created a second field in the index, called "raw_text", that stores the full content of each document as a string, thus preserving the whitespace. If I set up the search page to search on the "text" field but highlight on the "raw_text" field, then the highlighted matches don't always line up. Is there a way to somehow project the stemmed matches from the "text" field onto the "raw_text" field when displaying highlighting? Thank you for your time, Travis
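For reference, a minimal SolrJ sketch of the setup described above, under these assumptions: server is an already-constructed HttpSolrServer, the field names are the "text"/"raw_text" pair from the post, and the query term is hypothetical:

    import java.util.List;
    import java.util.Map;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.response.QueryResponse;

    SolrQuery query = new SolrQuery("text:seizure"); // search the analyzed field
    query.setHighlight(true);
    query.addHighlightField("raw_text");             // ...but ask for snippets from the stored field
    QueryResponse rsp = server.query(query);
    // highlighting comes back keyed by document id, then by field name
    Map<String, Map<String, List<String>>> hl = rsp.getHighlighting();

The highlighter builds snippets from the field named in hl.fl using that field's stored text and its own analysis, so highlighting a field analyzed differently from the one searched is consistent with the misalignment described above.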
Schema Design/Data Import
[Apologies if this is a duplicate -- I have sent several messages from my work email and they just vanish, so I subscribed with my personal email]

Greetings. I am struggling to design a schema and a data import/update strategy for some semi-complicated data. I would appreciate any input.

What we have is a bunch of database records that may or may not have files attached. Sometimes no files, sometimes 50. The requirement is to index the database records AND the documents, and the search results would be just links to the database records.

I'd love to crawl the site with Nutch and be done with it, but we have a complicated search form with various codes and attributes for the database records, so we need a detailed schema that will loosely correspond to boxes on the search form. I don't think we could easily do that if we just crawled the site. But with a detailed schema, I'm having trouble understanding how we could import and index from the database, also index the related files, and have the same schema be populated, especially with the number of related documents being variable (maybe index them all into one field?).

We have a lot of flexibility in how we can build this, so I'm open to any suggestions or pointers for further reading. I've spent a fair amount of time on the wiki but I didn't see anything that seemed directly relevant.

An additional difficulty, which I am willing to overlook for the first cut, is that some of these files are zipped, and some of the zip files may contain other zip files, to maybe 3 or 4 levels deep.

Help, please? cheers, Travis
Re: Solr Server Add causes java.net.SocketException: No buffer space available
If it's a windows box, then you may be experiencing a kernel sockets leak problem. http://support.microsoft.com/kb/2577795 On Fri, Jun 14, 2013 at 1:20 PM, Shawn Heisey wrote: > On 6/14/2013 8:57 AM, Snubbel wrote: > >> Hello, >> >> I am upgrading from Solr 4.0 to 4.3 and a Testcase that worked fine is >> failing since. >> >> I do commit 1 Documents to Solr, then reload them and add a value to a >> multi-valued field with Atomic Update. >> I do commit every 50 Documents, so it's not so many at once, because the >> multi-valued field contains many values already. >> >> And at some point, I get this exception: >> >> java.net.SocketException: No buffer space available (maximum connections >> reached?): connect >> > > Looks like a client-side problem, either not enough java heap or you are > running out of connections because you're using a lot of connections at > once. This is happening on the client side, not the server side. That may > be an indication that you are doing something not quite right, but if you > actually do intend to create a lot of connections and you are using > HttpSolrServer, use code similar to this to bump up the max connections: > > ModifiableSolrParams params = new ModifiableSolrParams(); > params.set(HttpClientUtil.PROP_MAX_CONNECTIONS, 1000); > params.set(HttpClientUtil.PROP_MAX_CONNECTIONS_PER_HOST, 200); > HttpClient client = HttpClientUtil.createClient(params); > String url = "http://localhost:8983/solr/collection1"; > SolrServer server = new HttpSolrServer(url, client); > > Thanks, > Shawn
capacity planning
Greetings. I have a paltry 23,000 database records that point to a voluminous 300GB worth of PDF, Word, Excel, and other documents. We are planning on indexing the records and the documents they point to. I have no clue on how we can calculate what kind of server we need for this. I imagine the index isn't going to be bigger than the documents (is it?) so I suppose 1TB is a starting point for disk space. But what kind of processing power and memory might we need? Can anyone please point me in the right direction? cheers, Travis
Re: capacity planning
Thanks, Erik! We probably won't use highlighting. Also, documents are added but *never* deleted. Does anyone have comments about memory and CPU resources required for indexing the 300GB of documents in a "reasonable" amount of time? It's okay if the initial indexing takes hours or maybe even days, but not too many days. Do we need 16GB of memory? 32GB? 8-core processor? I have zero sense of server requirements and I would appreciate any guidance. Do I need to be concerned about performance/resources later, when adding documents to an existing (large) index? cheers, Travis On Tue, Oct 11, 2011 at 9:49 AM, Erik Hatcher wrote: > Travis - > > Whether the index is bigger than the original content depends on what you > need to do with it in Solr. One of the primary deciding factors is if you > need to use highlighting, which currently requires that the fields to be > highlighted be stored. Stored fields will take up about the same space as > the original documents (text-wise, likely a bit smaller than, say, the > actual Word doc itself). If you don't need highlighting or the contents > stored for other purposes, then you'll have a dramatically smaller index > than the original (roughly 35% the size, generally). > > Erik > > > On Oct 11, 2011, at 08:36 , Travis Low wrote: > > > Greetings. I have a paltry 23,000 database records that point to a > > voluminous 300GB worth of PDF, Word, Excel, and other documents. We are > > planning on indexing the records and the documents they point to. I have > > no clue on how we can calculate what kind of server we need for this. I > > imagine the index isn't going to be bigger than the documents (is it?) so > > I suppose 1TB is a starting point for disk space. But what kind of > > processing power and memory might we need? Can anyone please point me in the right > > direction?
Re: capacity planning
Toke, thanks. Comments embedded (hope that's okay): On Tue, Oct 11, 2011 at 10:52 AM, Toke Eskildsen wrote: > > Greetings. I have a paltry 23,000 database records that point to a > > voluminous 300GB worth of PDF, Word, Excel, and other documents. We are > > planning on indexing the records and the documents they point to. I have > > no clue on how we can calculate what kind of server we need for this. I > > imagine the index isn't going to be bigger than the documents (is it?) > > Sanity check: Let's say your average document is 200 pages with 1000 > words of 5 characters each. That gives you 200 * 1000 * 5 * 23,000 ~= > 21GB of raw text, which is a far cry from the 300GB. > > Either your documents are extremely text heavy or they contain > illustrations and other elements that are not to be indexed. Is it > possible for you to estimate the number of characters in your corpus? > Yes. We estimate each of the 23K DB records has 600 pages of text for the combined documents, 300 words per page, 5 characters per word. Which coincidentally works out to about 21GB, so good guessing there. :) > > But what kind of processing power and memory might we need? > > I am not well-versed in Tika and other PDF/Word/etc. analyzing > frameworks, so I'll just focus on the search part here. Guessing wildly, > you're aiming for a low number of running updates or even just a nightly > batch update. Response times should be below 200 ms and the number of > concurrent searches is 2 to 4 at most. > The way it works is we have researchers modifying the DB records during the day, and they may upload documents at that time. We estimate 50-60 uploads throughout the day. If possible, we'd like to index them as they are uploaded, but if that would negatively affect the search, then we can rebuild the index nightly. Which is better? > Bold claim: Assuming that your corpus is more 20GB of raw text than > 300GB, you'll get by just fine with an i7 machine with 8GB of RAM, a 1TB > 7200 RPM drive for storage and a 256GB consumer SSD for search. That is > more or less what we use for our 10M documents/60GB+ index, with a load > as I described above. > > I've always been wary of having to dictate hardware up front for such > projects. It is a lot easier and cheaper to just build the software, > then measure and buy hardware after that. > We have a very beefy VM server that we will use for benchmarking, but your specs provide a starting point. Thanks very much for that. cheers, Travis
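As a sanity check on that estimate, the arithmetic in a two-line Java sketch (figures are the ones quoted above):

    // 23,000 records x 600 pages/record x 300 words/page x 5 chars/word
    long chars = 23000L * 600 * 300 * 5;
    System.out.println(chars); // 20,700,000,000 -- about 21 GB of raw text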
Re: capacity planning
Our plan for the VM is just benchmarking, not production. We will turn off all guest machines, then configure a Solr VM. Then we'll tweak memory and see what effect it has on indexing and searching. Then we'll reconfigure the number of processors used and see what that does. Then again with more disk space. And so on. We'll try to start with a reasonable configuration and then make intelligent guesses for our changes so we don't spend a year on this. What we are trying to avoid is configuring a brand new box at the hoster, only to find we need a bigger and better box. Or, paying too much for something we don't need. Thanks everyone for your input, it was very helpful. cheers, Travis On Tue, Oct 11, 2011 at 2:19 PM, eks dev wrote: > Re. "I have little experience with VM servers for search." > > We had huge performance penalty on VMs, CPU was bottleneck. > We couldn't freely run measurements to figure out what the problem really > was (hosting was contracted by customer...), but it was something pretty > scary, kind of 8-10 times slower than advertised dedicated equivalent. > Whatever its worth, if you can afford it, keep lucene away from it. Lucene > is highly optimized machine, and someone twiddling with context switches is > not welcome there. > > Of course, if you get IO bound, it makes no big diff anyhow. > > This is just my singular experience, might be the hosting team did not > configure it right, or something changed in meantime (~ 4 Years old > experience), but we burnt our fingers that hard I still remember it > > > > > On Tue, Oct 11, 2011 at 7:49 PM, Toke Eskildsen >wrote: > > > Travis Low [t...@4centurion.com] wrote: > > > Toke, thanks. Comments embedded (hope that's okay): > > > > Inline or top-posting? Long discussion, but for mailing lists I clearly > > prefer the former. > > > > [Toke: Estimate characters] > > > > > Yes. We estimate each of the 23K DB records has 600 pages of text for > > the > > > combined documents, 300 words per page, 5 characters per word. Which > > > coincidentally works out to about 21GB, so good guessing there. :) > > > > Heh. Lucky Guess indeed, although the factors were off. Anyway, 21GB does > > not sound scary at all. > > > > > The way it works is we have researchers modifying the DB records during > > the > > > day, and they may upload documents at that time. We estimate 50-60 > > uploads > > > throughout the day. If possible, we'd like to index them as they are > > > uploaded, but if that would negatively affect the search, then we can > > > rebuild the index nightly. > > > > > > Which is better? > > > > The analyzing part is only CPU and you're running multi-core so as long > as > > you only analyze using one thread you're safe there. That leaves us with > > I/O: Even for spinning drives, a daily load of just 60 updates of 1MB of > > extracted text each shouldn't have any real effect - with the usual > caveat > > that large merges should be avoided by either optimizing at night or > > tweaking merge policy to avoid large segments. With such a relatively > small > > index, (re)opening and warm up should be painless too. > > > > Summary: 300GB is a fair amount of data and takes some power to crunch. > > However, in the Solr/Lucene end your index size and your update rates are > > nothing to worry about. Usual caveat for advanced use and all that > applies. > > > > [Toke: i7, 8GB, 1TB spinning, 256GB SSD] > > > > > We have a very beefy VM server that we will use for benchmarking, but > > your > > > specs provide a starting point. 
Thanks very much for that. > > > > I have little experience with VM servers for search. Although we use a lot > > of virtual machines, we use dedicated machines for our searchers, primarily > > to ensure low latency for I/O. They might be fine for that too, but we > > haven't tried it yet. > > > > Glad to be of help, > > Toke >
Multivalued fields question
Greetings. We're finally kicking off our little Solr project. We're indexing a paltry 25,000 records but each has MANY documents attached, so we're using Tika to parse those documents into a big long string, which we use in a call to solrj.addField("relateddoccontents", bigLongStringOfDocumentContents). We don't care about search results pointing back to a particular document, just one of the 25K records, so this should work. Now my question. Many of these records have related records in other tables, and there are several types of these related records. For example, we have record #100 that may have blue records with numbers , , , and , and red records with numbers , , , . Currently we're just handling these the same way as related document contents -- we concatenate them, separated by spaces, into one long string, then we do solrj.addField("redRecords", stringOfRedRecordNumbers). That is, stringOfRedRecordNumbers is " ". We have no need to show these records to the user in Solr search results, because we're going to use the database for displaying detailed information for any records found. Is there any reason to specify redRecords and blueRecords as multivalued fields in schema.xml? And if we did that, we'd call solrj.addField() once for each value, would we not? cheers, Travis
Re: Multivalued fields question
Thanks much, Erick. Between your explanation, and what I read at http://lucene.472066.n3.nabble.com/positionIncrementGap-in-schema-xml-td488338.html, the utility of multiValued fields is clear. On Thu, Nov 3, 2011 at 8:26 AM, Erick Erickson wrote: > multiValued has nothing to do with how many tokens are in the field, > it's just whether you can call document.add("field1", val1) more than > once on the same field. Or, equivalently, an input document in XML > has two entries with the same name="field". So it > strictly depends upon whether you want to take it upon yourself > to make these long strings or call document.add once for each > value in the field. > > The field is returned as an array if it's multiValued > > Just to make your life interesting: If you define your increment gap as 0, > there is no difference between how multiValued fields are searched > as opposed to single-valued fields. > > FWIW > Erick > > On Tue, Nov 1, 2011 at 1:26 PM, Travis Low wrote: > > Greetings. We're finally kicking off our little Solr project. We're > > indexing a paltry 25,000 records but each has MANY documents attached, so > > we're using Tika to parse those documents into a big long string, which > > we use in a call to solrj.addField("relateddoccontents", > > bigLongStringOfDocumentContents). We don't care about search results > > pointing back to a particular document, just one of the 25K records, so > > this should work. > > > > Now my question. Many of these records have related records in other > > tables, and there are several types of these related records. For example, > > we have record #100 that may have blue records with numbers , , > > , and , and red records with numbers , , , . > > Currently we're just handling these the same way as related document > > contents -- we concatenate them, separated by spaces, into one long string, > > then we do solrj.addField("redRecords", stringOfRedRecordNumbers). That > > is, stringOfRedRecordNumbers is " ". > > > > We have no need to show these records to the user in Solr search results, > > because we're going to use the database for displaying detailed > > information for any records found. Is there any reason to specify > > redRecords and blueRecords as multivalued fields in schema.xml? And if > > we did that, we'd call solrj.addField() once for each value, would we not? > > > > cheers, > > > > Travis
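To make the two approaches concrete, a minimal SolrJ sketch (assuming redRecords is declared multiValued="true" in schema.xml, server is an existing SolrServer, and the record numbers are made up, since the originals were lost from the archive):

    import org.apache.solr.common.SolrInputDocument;

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "100");
    // multivalued style: one addField call per related record number
    doc.addField("redRecords", "1001");
    doc.addField("redRecords", "1002");
    doc.addField("redRecords", "1003");
    // single-valued alternative: one concatenated string, as in the original post
    // doc.addField("redRecords", "1001 1002 1003");
    server.add(doc);
    server.commit();

With the multivalued form, the field comes back in search results as an array of the individual values instead of one long string.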
Problems installing Solr PHP extension
I know this isn't strictly Solr, but I've been at this for hours and I'm at my wits' end. I cannot install the Solr PECL extension ( http://pecl.php.net/package/solr), either by command line "pecl install solr" or by downloading and using phpize. Always the same error, which I see here: http://www.lmpx.com/nav/article.php/news.php.net/php.qa.reports/24197/read/index.html It boils down to this: PHP Warning: PHP Startup: Unable to load dynamic library '/root/solr-0.9.11/modules/solr.so' - /root/solr-0.9.11/modules/solr.so: undefined symbol: curl_easy_getinfo in Unknown on line 0 I am using the current Solr PECL extension. PHP 5.3.8. Curl 7.21.3. Yes, libcurl and libcurl-dev are both installed, also 7.21.3. Fedora Core 15, patched to current levels. Please help! cheers, Travis
Re: Problems installing Solr PHP extension
Thanks so much for responding. I tried your suggestion and the pecl build *seems* to go okay, but after restarting Apache, I get this again in the error_log: > PHP Warning: PHP Startup: Unable to load dynamic library > '/usr/lib64/php/modules/solr.so' - /usr/lib64/php/modules/solr.so: > undefined symbol: curl_easy_getinfo in Unknown on line 0 I'm baffled by this because the undefined symbol is in libcurl.so, and I've specified the path to that library. If I can't solve this problem then we'll basically have to write our own PHP Solr client, which would royally suck. cheers, Travis On Wed, Nov 16, 2011 at 7:11 AM, Adolfo Castro Menna < adolfo.castrome...@gmail.com> wrote: > Pecl installation is kinda buggy. I installed it ignoring pecl dependencies > because I already had them. > > Try: pecl install -n solr (-n ignores dependencies) > And when it prompts for curl and libxml, point the path to where you have > installed them, probably in /usr/lib/ > > Cheers, > Adolfo. > > On Tue, Nov 15, 2011 at 7:27 PM, Travis Low wrote: > > I know this isn't strictly Solr, but I've been at this for hours and I'm at > > my wits' end. I cannot install the Solr PECL extension ( > > http://pecl.php.net/package/solr), either by command line "pecl install > > solr" or by downloading and using phpize. Always the same error, which I > > see here: > > http://www.lmpx.com/nav/article.php/news.php.net/php.qa.reports/24197/read/index.html > > > > It boils down to this: > > PHP Warning: PHP Startup: Unable to load dynamic library > > '/root/solr-0.9.11/modules/solr.so' - /root/solr-0.9.11/modules/solr.so: > > undefined symbol: curl_easy_getinfo in Unknown on line 0 > > > > I am using the current Solr PECL extension. PHP 5.3.8. Curl 7.21.3. Yes, > > libcurl and libcurl-dev are both installed, also 7.21.3. Fedora Core 15, > > patched to current levels. > > > > Please help! > > > > cheers, > > > > Travis
Re: Problems installing Solr PHP extension
Ah, excellent, thank you Kuli! We'll just use that. On Wed, Nov 16, 2011 at 11:35 AM, Michael Kuhlmann wrote: > On 16.11.2011 17:11, Travis Low wrote: > >> If I can't solve this problem then we'll basically have to write our own >> PHP Solr client, which would royally suck. >> > > Oh, if you really can't get the library to work, no problem - there are > several PHP clients out there that don't need a PECL installation. > > Personally, I have used http://code.google.com/p/solr-php-client/, it works well. > > -Kuli
setting up schema (newbie question)
I have a large database table with many document records, and I plan to use SOLR to improve the searching for the documents. The twist here is that perhaps 50% of the records will originate from outside sources, and sometimes those records may be updated versions of documents we already have. Currently, a human visually examines the incoming information, performs a few document searches, and decides whether a new document must be created or an existing one should be updated. We would like to automate the matching to some extent, and it occurs to me that SOLR might be useful for this as well.

Each document has many attributes that can be used for matching. The attributes are all in lookup tables. For example, there is a "location" field that might be something like "Central Public Library, Crawford, NE" for the row with id #. The incoming document might have something like "Crawford Central Public Library, Nebraska", which ideally would map to # as well.

I'm currently thinking that a two-phase import might work. First, we use SOLR to try to get a list of attribute ids for the incoming document. Those can be used for ordinary database queries to find primary keys of potential matches. Then we use SOLR again to search the reduced list for the unstructured information, essentially by including those primary keys as part of the search.

I was looking at the example for DIH here: http://wiki.apache.org/solr/DataImportHandler and it is clear, but it is obviously slanted toward finding the products. I need to find the categories so that I can *then* find the products, if that makes sense. Any suggestions on how to proceed? My first thought is that I should set up two SOLR instances, one for indexing only attributes, and one for the documents themselves. Thanks in advance for any help. cheers, Travis
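A sketch of that two-phase idea in SolrJ, purely illustrative: the two-core layout, core names, field names, and query strings below are all assumptions, not a tested design:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;

    SolrServer attrs = new HttpSolrServer("http://localhost:8983/solr/attributes");
    SolrServer docs  = new HttpSolrServer("http://localhost:8983/solr/documents");

    // phase 1: fuzzy-match the incoming attribute text against the attribute index
    QueryResponse r1 = attrs.query(
        new SolrQuery("location:(crawford central public library nebraska)"));
    List<String> ids = new ArrayList<String>();
    for (SolrDocument d : r1.getResults()) {
        ids.add((String) d.getFieldValue("id"));
    }

    // phase 2: search the document index, restricted to candidates carrying those ids
    SolrQuery q2 = new SolrQuery("body:(text of the incoming document)");
    StringBuilder fq = new StringBuilder("location_id:(");
    for (int i = 0; i < ids.size(); i++) {
        if (i > 0) fq.append(" OR ");
        fq.append(ids.get(i));
    }
    q2.addFilterQuery(fq.append(")").toString());
    QueryResponse r2 = docs.query(q2);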
Re: stream.url problem
"Connection refused" (in any context) almost always means that nothing is listening on the TCP port that you are trying to connect to. So either the process you are connecting to isn't running, or you are trying to connect to the wrong port. On Tue, Aug 17, 2010 at 6:18 AM, satya swaroop wrote: > hi all, > i am indexing the documents to solr that are in my system. now i need > to index the files that are in remote system, i enabled the remote > streaming > to true in solrconfig.xml and when i use the stream.url it shows the error > as ""connection refused"" and the detail of the error is::: > > when i sent the request in my browser as:: > > > http://localhost:8080/solr/update/extract?stream.url=http://remotehost/home/san/Desktop/programming_erlang_armstrong.pdf&literal.id=schb2 > > i get the error as > > HTTP Status 500 - Connection refused java.net.ConnectException: Connection > refused at sun.reflect.GeneratedConstructorAccessor11.newInstance(Unknown > Source) at > > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) > at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at > [snip] > > > if any body know > please help me with this > > regards, > satya >
Re: Solr, c/s type ?
I'll guess he means client/server. On Tue, Sep 7, 2010 at 5:52 PM, Chris Hostetter wrote: > > : Subject: Solr, c/s type ? > : > : i'm wondering c/s type is possible (not http web type). > : if possible, could i get the material about it? > > You're going to need to provide more info explaining what it is you are > asking about -- i don't know about anyone else, but i honestly have > absolutely no idea what you might possibly mean by "c/s type is possible > (not http web type)" > > -Hoss > > -- > http://lucenerevolution.org/ ... October 7-8, Boston > http://bit.ly/stump-hoss ... Stump The Chump! > >
Re: DIH fails after processing roughly 10million records
What you describe sounds right to me and seems consistent with the error stacktrace. I would increase the MySQL wait_timeout to 3600 and, depending on your server, you might want to also increase max_connections. cheers, Travis On Tue, Jan 8, 2013 at 4:10 AM, vijeshnair wrote: > Solr version : 4.0 (running with 9GB of RAM) > MySQL : 5.5 > JDBC : mysql-connector-java-5.1.22-bin.jar > > I am trying to run the full import for my catalog data, which is roughly > 13 million products. The DIH ran smoothly for 18 hours, and processed > roughly 10 million records. But all of a sudden it broke due to the jdbc > exception, i.e. communication failure with the server. I did an extensive > googling on this topic, and there are multiple recommendations to use > "readonly=true", "autocommit=true" etc. If I understand it correctly, the > possible reason is when DIH stops indexing due to segment merging, and > then tries to reconnect with the server. When the index is slightly large > and multiple merges are happening at the same time, DIH stops indexing for some > time, and by the time it re-starts, MySQL would have already dropped the > connection. So I am going to increase the wait timeout on the MySQL side from > the default 120 to something slightly larger, to see if that solves the issue > or not. I would know the result of that approach only after completing one > full run, which I will update you on tomorrow. Meantime I thought of > validating my approach, and checking with you for any other fix that > might exist. > > Here is the error stack > > Jan 8, 2013 12:44:00 PM org.apache.solr.handler.dataimport.JdbcDataSource > closeConnection > SEVERE: Ignoring Error when closing connection > java.sql.SQLException: Streaming result set > com.mysql.jdbc.RowDataDynamic@32d051c1 is still active. No statements may > be issued when any streaming result sets are open and in use on a given > connection. Ensure that you have called .close() on any active streaming > result sets before attempting more queries.
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:926) > at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:923) > at > com.mysql.jdbc.MysqlIO.checkForOutstandingStreamingData(MysqlIO.java:3234) > at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2399) > at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2651) > at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2728) > at > com.mysql.jdbc.ConnectionImpl.rollbackNoChecks(ConnectionImpl.java:4908) > at com.mysql.jdbc.ConnectionImpl.rollback(ConnectionImpl.java:4794) > at > com.mysql.jdbc.ConnectionImpl.realClose(ConnectionImpl.java:4403) > at com.mysql.jdbc.ConnectionImpl.close(ConnectionImpl.java:1594) > at > > org.apache.solr.handler.dataimport.JdbcDataSource.closeConnection(JdbcDataSource.java:400) > at > > org.apache.solr.handler.dataimport.JdbcDataSource.close(JdbcDataSource.java:391) > at > > org.apache.solr.handler.dataimport.DocBuilder.closeEntityProcessorWrappers(DocBuilder.java:291) > at > org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:280) > at > > org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:382) > at > > org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:448) > at > > org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:429) > Jan 8, 2013 12:44:00 PM org.apache.solr.handler.dataimport.JdbcDataSource > closeConnection > SEVERE: Ignoring Error when closing connection > com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: > Communications link failure during rollback(). Transaction resolution > unknown. > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) > at > > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) > at java.lang.reflect.Constructor.newInstance(Constructor.java:513) > at com.mysql.jdbc.Util.handleNewInstance(Util.java:411) > at com.mysql.jdbc.Util.getInstance(Util.java:386) > at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1014) > at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:988) > at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:974) > at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:919) > at com.mysql.jdbc.ConnectionImpl.rollback(ConnectionIm
Re: solr query
Yes. Write a program to consume the result xml and then spit it back out the way you'd like to see it. cheers, Travis On Tue, Jan 22, 2013 at 1:23 PM, hassancrowdc wrote: > ? > > > On Tue, Jan 22, 2013 at 12:24 PM, hassancrowdc [via Lucene] < > ml-node+s472066n4035390...@n3.nabble.com> wrote: > > > thnx. One quick question, can I control the way resultset of the query is > > shown: I mean if i want displayName to be shown first and then the id and > > then manufacturer and model? is there any way i can do that?
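For example, a minimal SolrJ sketch of that idea, printing fields in the order asked about in the quoted message (server is an assumed HttpSolrServer and the query is a placeholder):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;

    QueryResponse rsp = server.query(new SolrQuery("*:*"));
    String[] displayOrder = {"displayName", "id", "manufacturer", "model"};
    for (SolrDocument d : rsp.getResults()) {
        for (String f : displayOrder) {
            System.out.println(f + ": " + d.getFieldValue(f)); // your order, not Solr's
        }
    }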
Re: Benefits of Solr over Lucene?
http://lucene.apache.org/solr/ On Tue, Feb 12, 2013 at 10:40 AM, JohnRodey wrote: > I know that Solr web-enables a Lucene index, but I'm trying to figure out > what other things Solr offers over Lucene. On the Solr features list it > says "Solr uses the Lucene search library and extends it!", but what exactly > are the extensions from the list and what did Lucene give you? Also if I > have an index built through Solr is there a non-HTTP way to search that > index? Because solr4j essentially just makes HTTP requests correct? > > Some features Im particularly interested in are: > Geospatial Search > Highlighting > Dynamic Fields > Near Real-Time Indexing > Multiple Search Indices > > Thanks!
querying multivalue fields
If a query matches one or more values of a multivalued field, is it possible to get the indexes back for WHICH values? For example, for a document with a multivalued field having ["red", "redder", "reddest", "yellow", "blue"] as its value, if "red" is the query, could we know that values 0, 1, and 2 matched? Against all hope, if that's "yes", then the next question is, would the values be listed in the order they were specified when adding the document? The idea here is that each document may have any number of external (e.g. Word) documents associated with it, and for any match, we not only want to provide a link to the Solr document, but also be able to tell the user which external documents matched. The contents of these documents would populate the multivalued field (a very big field). If that can't be done, I think what we'll do is use some kind of prefixed hash of the document name and embed that in each multivalued field value (each document content). The prefix would contain (or be another hash of) the document id. Then we could find which documents matched, could we not? Sorry if this is a dumb question. I've asked about this before, and received some *very* useful input (thanks!) but nothing that has yet led me to a robust solution for indexing a set of records along with their associated documents and being able to identify the matching record AND the matching document(s). Thanks for your help! cheers, Travis
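A sketch of that fallback, with loud caveats: the Attachment type, the attachments collection, and the marker scheme are all hypothetical, and DigestUtils is from Apache Commons Codec:

    import org.apache.commons.codec.digest.DigestUtils;
    import org.apache.solr.common.SolrInputDocument;

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", recordId);                          // the DB record's id
    for (Attachment a : attachments) {
        // unique, searchable marker token derived from the external document's name
        String marker = "extdoc" + DigestUtils.md5Hex(a.getName());
        doc.addField("relateddoccontents", marker + " " + a.getText());
        // persist marker -> document name elsewhere (e.g. the DB) for later lookup
    }

You can then AND a marker token into a query to test whether the hit came from a record containing a particular external document.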
Re: UTF-8 support during indexing content
Are you sure the input document is in UTF-8? That looks like classic mojibake: UTF-8 bytes decoded as ISO-8859-1 (or windows-1252) somewhere along the way. How did you confirm the document contains the right quote marks immediately prior to uploading? If you just visually inspected it, then use whatever tool you viewed it in to see what the character set is. cheers, Travis On Wed, Feb 1, 2012 at 9:17 AM, Van Tassell, Kristian < kristian.vantass...@siemens.com> wrote: > Hello everyone, > > I have a question that I imagine has been asked many times before, so I > apologize for the repeat. > > I have a basic text field with the following text: > the word ”stemming” in quotes > > Uploading the data yields no errors, however when it is indexed, the text > looks like this: > > the word â€�stemmingâ€� in quotes > > > Searching for the word stemming, without quotes or otherwise, does not > return any hits. > > Just some basic facts: > > - I included the solr.CollationKeyFilterFactory filter on the fieldType. > - Updating the index is done via a "solr xml" document. I've confirmed > that the document contains the right quote marks immediately prior to > uploading. > - Updating the index is done via solrj, essentially: > DirectXmlRequest up = new DirectXmlRequest( "/update", xml ); > solrServer.request( up ); > solrServer.commit(); > - In solr admin, the characters look like garbage, surrounding the word > stemming (as shown above) > > > Thanks in advance for any details you can provide! > -Kristian > **
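To see that failure mode in isolation, a small Java sketch (windows-1252 stands in for whatever single-byte charset mis-decoded the bytes; strict ISO-8859-1 behaves similarly for the printable positions):

    import java.nio.charset.Charset;
    import java.nio.charset.StandardCharsets;

    String original = "\u201Dstemming\u201D";                 // ”stemming” with curly quotes
    byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);  // 0xE2 0x80 0x9D per quote mark
    String mojibake = new String(utf8, Charset.forName("windows-1252"));
    System.out.println(mojibake);                             // the â€... garbage pattern shown above

If the index shows that pattern, correct UTF-8 bytes were decoded with a single-byte charset somewhere between the file and Solr.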
Re: Why my email always been rejected?
I received it... sometimes it just needs some time. 2012/3/20 怪侠 <87863...@qq.com> > I send email to solr-user@lucene.apache.org, but I always receive a > rejection email. It can't send successfully.
Re: UPDATE query in deltaquery
If you are getting a null pointer exception here: colNames = readFieldNames(resultSet.getMetaData()); then that implies the DIH code is written to expect a select statement. You might be able to fool it with some SQL injection: update blah set foo=bar where id=1234; select id from blah But if that doesn't work then you may be out of luck. cheers, Travis On Thu, Dec 30, 2010 at 8:26 AM, Juan Manuel Alvarez wrote: > Erick: > > Thanks for the quick response. > > I can't use the timestamp for doing DIH, so I need to use a custom > field that I need to update once for each delta-import, so that is why > I need to execute an UPDATE on the deltaQuery. > > Cheers! > Juan M. > > On Thu, Dec 30, 2010 at 10:07 AM, Erick Erickson > wrote: > > WARNING: DIH isn't my strong suit, I generally prefer doing things > > in SolrJ. Mostly I asked for clarification so someone #else# who > > actually knows DIH details could chime in... > > > > That said, I'm a bit confused. As I understand it, you shouldn't > > be UPDATEing anything in DIH, it's a select where documents > > then get added to Solr "by magic". Your post leads me to believe > > that you're trying to change the database via DIH, is that at > > all true? > > > > This is based in part on > > "The ids are returned ok, but the UPDATE has no effect on the database" > > Or do you mean "effect on the index"? If the latter, then the select > > would only have a chance of updating the IDs of the Solr documents... > > > > At least I think that's close to reality... > > > > Best > > Erick > > > > On Thu, Dec 30, 2010 at 7:52 AM, Juan Manuel Alvarez wrote: > > > >> Hi Erick! > >> > >> Here is my DIH configuration: > >> > >> <dataSource > >> url="jdbc:postgresql://${dataimporter.request.dbHost}:${dataimporter.request.dbPort}/${dataimporter.request.dbName}" > >> user="${dataimporter.request.dbUser}" > >> password="${dataimporter.request.dbPassword}" autoCommit="false" > >> transactionIsolation="TRANSACTION_READ_UNCOMMITTED" > >> holdability="CLOSE_CURSORS_AT_COMMIT"/> > >> > >> <entity > >> query=' . ' > >> deltaImportQuery=' . ' > >> deltaQuery=' . '> > >> > >> I have tried two options for the deltaQuery: > >> UPDATE "Global"."Projects" SET "prj_lastSync" = now() WHERE "prj_id" = > >> '2'; < Throws a null pointer exception as described in the > >> previous email > >> > >> The second option is a DB function that I am calling this way: > >> SELECT "get_deltaimport_items" AS "id" FROM > >> project.get_deltaimport_items(2, 'project'); > >> > >> The function inside executes the UPDATE query shown above and a SELECT > >> query for the ids. > >> The ids are returned ok, but the UPDATE has no effect on the database. > >> > >> Cheers! > >> Juan M. > >> > >> > >> On Thu, Dec 30, 2010 at 1:32 AM, Erick Erickson < erickerick...@gmail.com> > >> wrote: > >> > Well, let's see the queries you're sending, and your DIH configuration. > >> > > >> > Otherwise, we're just guessing... > >> > > >> > Best > >> > Erick > >> > > >> > On Wed, Dec 29, 2010 at 9:58 PM, Juan Manuel Alvarez < naici...@gmail.com >wrote: > >> > > >> >> Hi! I would like to ask you a question about using a deltaQuery in DIH. > >> >> I am syncing with a PostgreSQL database. > >> >> > >> >> At first I was calling a function that made two queries: an UPDATE and a > >> >> SELECT. > >> >> The select result was properly returned, but the UPDATE query did not > >> >> make any changes, > >> >> so I tried calling the same function from a PostgreSQL client and > >> >> everything went OK.
> >> >> So I tried calling a simple UPDATE query directly in the deltaQuery > >> >> and I receive a > >> >> NullPointerException that I traced to the line 251 of the
Why does the StatsComponent only work with indexed fields?
Is there a reason why the StatsComponent only deals with indexed fields? I just updated the wiki: http://wiki.apache.org/solr/StatsComponent to call this fact out since it was not apparent previously. I've briefly skimmed the source of StatsComponent, but am not familiar enough with the code or Solr yet to understand if it was omitted for performance reasons or some other reason. Any information would be appreciated. Thanks, Travis
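For reference, requesting stats in SolrJ looks like this (a sketch; the price field is hypothetical and, per the wiki note above, must be indexed):

    import org.apache.solr.client.solrj.SolrQuery;

    SolrQuery q = new SolrQuery("*:*");
    q.set("stats", true);
    q.set("stats.field", "price"); // only works if price is indexed="true" in schema.xml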
use case: structured DB records with a bunch of related files
Greetings. I have a bunch of highly structured DB records, and I'm pretty clear on how to index those. However, each of those records may have any number of related documents (Word, Excel, PDF, PPT, etc.). All of this information will change over time. Can someone point me to a use case or some good reading to get me started on configuring Solr to index the DB records and files in such a way as to relate the two types of information? By "relate", I mean that if there's a hit in a related file, then I need to show the user a link to the DB record as well as a link to the file. Thanks in advance. cheers, Travis
Re: Schema Design/Data Import
Thanks so much Erick (and Stefan). Yes, I did some reading on SolrJ and Tika and you are spot-on. We will write our own importer using SolrJ and then we can grab the DB records and parse any attachments along the way.

Now it comes down to a schema design question. The issue I'm struggling with is what kind of field or fields to use for the attachments. The reason for the difficulty is that the documents we're most interested in are the DB records, not the attachments, and there could be 0 or 3 or 50 attachments for a single DB record. Should we:

(1) Just add fields called "attachment_0", "attachment_1", ..., "attachment_100" to the schema?
(2) Somehow index all attachments to a single field? (Is this even possible?)
(3) Use dynamic fields?
(4) None of the above?

The idea is that if there is a hit in one of the attachments, then we need to show a link to the DB record. It would be nice to show a link to the document as well, but that's less important. cheers, Travis

On Mon, Jul 25, 2011 at 9:49 AM, Erick Erickson wrote: > I'd seriously consider going with SolrJ as your indexing strategy, it allows > you to do anything you need to do in Java code. You can call the Tika > library yourself on the files pointed to by your rows as you see fit, indexing > them as you choose, perhaps one Solr doc per attachment, perhaps one > per row, whatever. > > Best > Erick > > On Wed, Jul 20, 2011 at 3:27 PM, wrote: > > > > [Apologies if this is a duplicate -- I have sent several messages from my work email and they just vanish, so I subscribed with my personal email] > > > > Greetings. I am struggling to design a schema and a data import/update strategy for some semi-complicated data. I would appreciate any input. > > > > What we have is a bunch of database records that may or may not have files attached. Sometimes no files, sometimes 50. > > > > The requirement is to index the database records AND the documents, and the search results would be just links to the database records. > > > > I'd love to crawl the site with Nutch and be done with it, but we have a complicated search form with various codes and attributes for the database records, so we need a detailed schema that will loosely correspond to boxes on the search form. I don't think we could easily do that if we just crawl the site. But with a detailed schema, I'm having trouble understanding how we could import and index from the database, and also index the related files, and have the same schema being populated, especially with the number of related documents being variable (maybe index them all to one field?). > > > > We have a lot of flexibility on how we can build this, so I'm open to any suggestions or pointers for further reading. I've spent a fair amount of time on the wiki but I didn't see anything that seemed directly relevant. > > > > An additional difficulty, that I am willing to overlook for the first cut, is that some of these files are zipped, and some of the zip files may contain other zip files, to maybe 3 or 4 levels deep. > > > > Help, please? > > > > cheers, > > > > Travis
If you are not the intended recipient, any use or dissemination of this communication, including attachments, is strictly prohibited. If you received this email message in error, please delete it and immediately notify the sender. This email message and any attachments have been scanned and are believed to be free of malicious software and defects that might affect any computer system in which they are received and opened. No responsibility is accepted by Centurion Research Solutions, LLC for any loss or damage arising from the content of this email.
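Option (2) is in fact possible, because a Solr field can be declared multiValued, with each attachment's extracted text indexed as one more value of the same field on the DB-record document. A minimal schema.xml sketch of that shape (field and type names here are illustrative, not taken from the thread):

    <field name="db_record_id" type="string" indexed="true" stored="true"/>
    <!-- One value per attachment; indexed for search but not stored,
         since results only need to link back to the DB record. -->
    <field name="attachment_text" type="text" indexed="true" stored="false" multiValued="true"/>

A match in any attachment value then surfaces the parent DB-record document, which is all that is needed to render the link back to the record.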
Re: SOLR 4.0 DataImport frozen or fails with WARNING: Unable to read: dataimport.properties?
Change your data-config.xml connection element to add batchSize="-1", then try again. This keeps the driver from trying to fetch the entire result set at the same time.

cheers, Travis

On Fri, Sep 7, 2012 at 4:17 AM, deniz wrote:
> Hi all,
>
> I have been trying to index my data from mysql db, but somehow i cant index
> anything, and dont see any exception / error in logs, except a warning which
> is highlighted below...
>
> Here is my db-config's connection string:
>
> <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://dbhost:3396/myDB"
> user="XXX" password="XXX" />
>
> (I can connect to the db from command line by using the above settings)
>
> and after i start dataimport i see these in the log:
>
> INFO: Starting Full Import
> Sep 07, 2012 4:08:21 PM org.apache.solr.core.SolrCore execute
> INFO: [collection1] webapp=/solr path=/dataimport params={command=status} status=0 QTime=0
> Sep 07, 2012 4:08:21 PM org.apache.solr.handler.dataimport.SimplePropertiesWriter readIndexerProperties
> *WARNING: Unable to read: dataimport.properties*
> Sep 07, 2012 4:08:21 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
> INFO: Creating a connection for entity user with URL: jdbc:mysql://10.60.1.157:3396/poppen
> Sep 07, 2012 4:08:22 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
> INFO: Time taken for getConnection(): 802
> Sep 07, 2012 4:08:23 PM org.apache.solr.core.SolrCore execute
> INFO: [collection1] webapp=/solr path=/dataimport params={command=status} status=0 QTime=1
> [snip -- the same command=status line repeats every two seconds with no further progress]
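For reference, a sketch of the connection element with that fix applied (attribute values echo the quoted question; batchSize="-1" is the addition -- it makes the DataImportHandler request a streaming fetch size so MySQL rows are read incrementally instead of being buffered in memory all at once):

    <dataSource type="JdbcDataSource"
                driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://dbhost:3396/myDB"
                user="XXX" password="XXX"
                batchSize="-1"/>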
Accidental multivalued fields?
Greetings. I am using Solr 3.4.0 with Tomcat 7.0.22. I've been using these versions successfully for a while, but on my latest project I cannot sort on ANY field without getting this exception:

SEVERE: org.apache.solr.common.SolrException: can not sort on multivalued field: id
at org.apache.solr.schema.SchemaField.checkSortability(SchemaField.java:161)
at org.apache.solr.schema.TrieField.getSortField(TrieField.java:126)
at org.apache.solr.schema.SchemaField.getSortField(SchemaField.java:144)
at org.apache.solr.search.QueryParsing.parseSort(QueryParsing.java:385)
at org.apache.solr.search.QParser.getSort(QParser.java:251)
at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:82)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:173)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
[snip]

The thing is, I have only one multivalued field in my schema -- at least, I thought so. I even tried sorting on id, which is the unique key, and got the same error. I can post the entire schema.xml if need be. Can anyone please tell me what's going on?

cheers, Travis
Re: Accidental multivalued fields?
Thanks much! It was the schema version attribute -- the recycled schema.xml I used did not contain that very useful comment. Everything works great now!

On Fri, Sep 14, 2012 at 1:56 PM, Chris Hostetter wrote:
>
> : Greetings. I am using Solr 3.4.0 with Tomcat 7.0.22. I've been using
> : these versions successfully for a while, but on my latest project I
> : cannot sort on ANY field without getting this exception:
> :
> : SEVERE: org.apache.solr.common.SolrException: can not sort on multivalued
>
> ...
>
> : The thing is, I have only one multivalued field in my schema -- at least, I
> : thought so. I even tried sorting on id, which is the unique key, and got
> : the same error.
>
> a) multiValued can be set on the fieldType and is then inherited by the fields
>
> b) Check the "version" property on your <schema> tag. If the value is
> "1.0" then all fields are assumed to be multiValued.
>
> Here's the comment from the example schema included with Solr 3.4...
>
> -Hoss
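The attribute in question sits on the root element of schema.xml. A sketch, assuming the version declared in the Solr 3.x example schema (a schema that omits the attribute, or declares "1.0", is treated as if every field were multivalued):

    <!-- version="1.0" predates the multiValued attribute, so Solr assumes
         every field is multivalued; later versions restore per-field control. -->
    <schema name="example" version="1.4">
      ...
    </schema>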
Re: Items disappearing from Solr index
That makes sense on the surface, but Kissue makes a good point. Shouldn't the delete match the same documents as the search? He said no documents come back when he searches on the phrase, but documents are deleted when he uses the same phrase.

cheers, Travis

On Wed, Sep 26, 2012 at 9:37 AM, Jack Krupansky wrote:
> It is looking for documents with "Emory" in the specified field OR "Labs"
> in the default search field.
>
> -- Jack Krupansky
>
> -----Original Message----- From: Kissue Kissue
> Sent: Wednesday, September 26, 2012 7:47 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Items disappearing from Solr index
>
> I have just solved this problem.
>
> We have a field called catalogueId. One possible value for this field could
> be "Emory Labs". I found out that when the following delete by query is
> sent to solr:
>
> getSolrServer().deleteByQuery(catalogueId + ":" + Emory Labs) [Notice that
> there are no quotes surrounding the catalogueId value - Emory Labs]
>
> For some reason this delete by query ends up deleting the contents of some
> other random catalogues too, which is the reason why we are losing items
> from the index. When the query is changed to:
>
> getSolrServer().deleteByQuery(catalogueId + ":" + "Emory Labs"), then it
> starts to correctly delete only items in the Emory Labs catalogue.
>
> So my first question is, what exactly does deleteByQuery do in the first
> query without the quotes? How is it determining which catalogues to delete?
>
> Secondly, shouldn't the correct behaviour be not to delete anything at all
> in this case, since a search for the same catalogueId without the quotes
> simply returns no results?
>
> Thanks.
>
> On Mon, Sep 24, 2012 at 3:12 PM, Kissue Kissue wrote:
>> Hi Erick,
>>
>> Thanks for your reply. Yes i am using delete by query. I am currently
>> logging the number of items to be deleted before handing off to solr, and
>> from the solr logs i can see it deleted exactly that number. I will verify
>> further.
>>
>> Thanks.
>>
>> On Mon, Sep 24, 2012 at 1:21 PM, Erick Erickson wrote:
>>> How do you delete items? By ID or by query?
>>>
>>> My guess is that one of two things is happening:
>>> 1> your delete process is deleting too much data.
>>> 2> your index process isn't indexing what you think.
>>>
>>> I'd add some logging to the SolrJ program to see what
>>> it thinks it has deleted or added to the index and go from there.
>>>
>>> Best
>>> Erick
>>>
>>> On Mon, Sep 24, 2012 at 6:55 AM, Kissue Kissue wrote:
>>>> Hi,
>>>>
>>>> I am running Solr 3.5, using SolrJ and StreamingUpdateSolrServer to
>>>> index and delete items from solr.
>>>>
>>>> I basically index items from the db into solr every night. Existing items
>>>> can be marked for deletion in the db and a delete request sent to solr to
>>>> delete such items.
>>>>
>>>> My process runs as follows every night:
>>>>
>>>> 1. Check if items have been marked for deletion and delete them from solr.
>>>> I commit and optimize after the entire solr deletion runs.
>>>> 2. Index any new items to solr. I commit and optimize after all the new
>>>> items have been added.
>>>>
>>>> Recently i started noticing that huge chunks of items that have not been
>>>> marked for deletion are disappearing from the index. I checked the solr
>>>> logs and the logs indicate that it is deleting exactly the number of items
>>>> requested, but still a lot of other items disappear from the index from
>>>> time to time. Any ideas what might be causing this or what i am doing wrong?
>>>>
>>>> Thanks.
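What happens without the quotes follows from Lucene query syntax: the parser splits on the space, so catalogueId:Emory Labs becomes catalogueId:Emory OR defaultField:Labs, and that second clause is free to match (and delete) documents in entirely different catalogues. A small SolrJ sketch of the safe variants (the server URL and field name are illustrative, and HttpSolrServer is used here for brevity):

import java.io.IOException;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.util.ClientUtils;

public class SafeDeleteByQuery {
    public static void main(String[] args) throws SolrServerException, IOException {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        String value = "Emory Labs";

        // Unquoted, this parses as catalogueId:Emory OR <defaultField>:Labs,
        // so the second clause can delete documents in unrelated catalogues:
        // server.deleteByQuery("catalogueId:" + value);   // dangerous

        // Phrase-quoting binds the whole value to the field:
        server.deleteByQuery("catalogueId:\"" + value + "\"");

        // Escaping achieves the same for arbitrary values (spaces, colons, etc.):
        server.deleteByQuery("catalogueId:" + ClientUtils.escapeQueryChars(value));

        server.commit();
    }
}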
Re: PHP client for a web application
Hi Esteban, A year ago, we tried to use the Apache Solr PECL extension but we were unable to install it, much less use it. We ended up using the Solr-PHP-client. It has worked perfectly for a year and we have had zero problems with it. We haven't tried using Solarium. Good luck!

cheers, Travis

On Tue, Oct 2, 2012 at 3:37 PM, Esteban Cacavelos <estebancacave...@gmail.com> wrote:
> Hi, I'm starting a web application using solr as a search engine. The web
> site will be developed in PHP (maybe I'll use a framework also).
>
> I would like to know some thoughts and opinions about the clients (
> http://wiki.apache.org/solr/SolPHP). I didn't like very much the PHP
> extension option because I think this is a limitation. So, I would like to
> read opinions about SOLARIUM and SOLR-PHP-CLIENT.
>
> Thanks in advance!
>
> --
> Esteban L. Cacavelos de Amoriza
> Cel: 0981 220 429
Re: Urgent Help Needed: Solr Data import problem
Like Amit said, this appears not to be a Solr problem. From the command line of your machine, try this:

mysql -u'readonly' -p'readonly' -h'10.86.29.32' hpcms_db_new

If that works, and 10.86.29.32 is the server referenced by the URL in your data-config.xml, then at least you know you have database connectivity, and to the right server.

Also, if your unix server (presumably your mysql server) is 10.86.29.32, then the URL in your data-config.xml is pointing to the wrong machine. If the one in data-config.xml is correct, you need to test for connectivity to that machine instead.

cheers, Travis

On Tue, Oct 30, 2012 at 5:15 AM, kunal sachdeva wrote:
> Hi,
>
> This is my data-config file:
>
> <entity name="package" query="select concat('pckg', id) as id,pkg_name,updated_time
> from hp_package_info;">
>
> <entity name="destination" query="select name,id from hp_city">
>
> and password is not null. and 10.86.29.32 is my unix server ip.
>
> regards,
> kunal
>
> On Tue, Oct 30, 2012 at 2:42 PM, Dave Stuart wrote:
>> It looks as though you have a password set on your unix server. You will
>> need to either remove this or to add the password into the connection string,
>> e.g. readonly:[yourpassword]@'10.86.29.32'
>>
>>> 'readonly'@'10.86.29.32'
>>> (using password: NO)"
>>
>> On 30 Oct 2012, at 09:08, kunal sachdeva wrote:
>>> Hi,
>>>
>>> I'm not getting this error while running in local machine. Please Help
>>>
>>> Regards,
>>> Kunal
>>>
>>> On Tue, Oct 30, 2012 at 10:32 AM, Amit Nithian wrote:
>>>> This looks like a MySQL permissions problem and not a Solr problem.
>>>> "Caused by: java.sql.SQLException: Access denied for user
>>>> 'readonly'@'10.86.29.32'
>>>> (using password: NO)"
>>>>
>>>> I'd advise reading your stack traces a bit more carefully. You should
>>>> check your permissions or if you don't own the DB, check with your DBA
>>>> to find out what user you should use to access your DB.
>>>>
>>>> - Amit
>>>>
>>>> On Mon, Oct 29, 2012 at 9:38 PM, kunal sachdeva wrote:
>>>>> Hi,
>>>>>
>>>>> I have tried using data-import in my local system. I was able to execute
>>>>> it properly, but when I tried to do it on the unix server I got the
>>>>> following error:
>>>>>
>>>>> INFO: Starting Full Import
>>>>> Oct 30, 2012 9:40:49 AM org.apache.solr.handler.dataimport.SimplePropertiesWriter readIndexerProperties
>>>>> WARNING: Unable to read: dataimport.properties
>>>>> Oct 30, 2012 9:40:49 AM org.apache.solr.update.DirectUpdateHandler2 deleteAll
>>>>> INFO: [core0] REMOVING ALL DOCUMENTS FROM INDEX
>>>>> [snip]
>>>>> Oct 30, 2012 9:40:49 AM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
>>>>> INFO: Creating a connection for entity destination with URL: jdbc:mysql://172.16.37.160:3306/hpcms_db_new
>>>>> Oct 30, 2012 9:40:50 AM org.apache.solr.common.SolrException log
>>>>> SEVERE: Exception while processing: destination document :
>>>>> SolrInputDocument[{}]:org.apache.solr.handler.dataimport.DataImportHandlerException:
>>>>> Unable to execute query: select name,id from hp_city Processing Document # 1
>>>>> at org.apache.solr.handler.dataimport.DocBu
Re: Urgent Help Needed: Solr Data import problem
We're getting a little far afield... but here is the incantation:

mysql> grant all on DBNAME.* to 'USER'@'IP-ADDRESS' identified by 'PASSWORD';
mysql> flush privileges;

cheers, Travis

On Tue, Oct 30, 2012 at 2:40 PM, Amit Nithian wrote:
> This error is typically because of a mysql permissions problem. These
> are usually resolved by a GRANT statement on your DB to allow
> users to connect remotely to your database server.
>
> I don't know the full syntax but a quick search on Google should yield
> what you are looking for. If you don't control access to this DB, talk
> to your sys admin who does maintain this access and s/he should be
> able to help resolve this.
>
> On Tue, Oct 30, 2012 at 7:13 AM, Travis Low wrote:
>> [snip]
Re: Dynamic core selection
If I understand you correctly, you would use a multicore setup and send the request to http://server.com/solr/core0 in one case, and http://server.com/solr/core1 in the other. Is there something else that makes this complicated?

cheers, Travis

On Thu, Nov 1, 2012 at 12:08 PM, Dzmitry Petrushenka wrote:
> Hi All!
>
> I need to be able to send requests to 2 different cores based on the value
> of some request parameter.
>
> The first core (active) contains the most recent docs. This core is used in
> 99% of cases.
>
> The second core (it has 100-1000 times more docs than the active core) is
> used in 0.1% of cases.
>
> We wrote our own search handler (mostly based on the standard one, but
> handling our own custom params) and I wonder if there is a way to customize
> Solr so we could direct calls to the required core based on request params
> the user passes?
>
> Any help would be appreciated.
>
> Thanx,
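On the client side, the routing described above is a one-liner. A minimal SolrJ sketch, assuming the two cores are named core0 and core1 and that the archive flag arrives as a request parameter (class and variable names are illustrative):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class CoreRouter {
    // One server handle per core; both core names are assumptions.
    private final SolrServer activeCore = new HttpSolrServer("http://server.com/solr/core0");
    private final SolrServer archiveCore = new HttpSolrServer("http://server.com/solr/core1");

    /** Route the query to the big archive core only when the caller asks for it. */
    public QueryResponse search(String queryString, boolean searchArchive)
            throws SolrServerException {
        SolrServer target = searchArchive ? archiveCore : activeCore;
        return target.query(new SolrQuery(queryString));
    }
}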
Dynamically augment search with data
So my need is this: I have a site in which a user queries for other users, filtering by different parameters that limit the result set. Users can also "like" different objects (Products, Services, etc.). When a user's search returns a list of users, I want to be able to calculate the "shared likes" between the searching user and each user in the returned result set. I would then like to append that calculation to each result and sort by the greatest number of shared likes, thereby making the results more relevant to the user. This calculation needs to run before the paging process kicks in, so it is applied to the result set right before paging. I am using Solr 1.4 and have read just a little on FunctionQuery. Is this what I need to perform this task?

Travis Chase