How to index/search without whitespace but highlight with whitespace?

2015-06-11 Thread Travis
Hey everyone!

I'm trying to set up a Solr instance on some free-text clinical data.
This data has a lot of whitespace formatting; for example, I might have a
document that contains unstructured bulleted lists or section titles.

For example,

blah blah blah...
MEDICATIONS:
* Xanax
* Phenobritrol

DIAGNOSIS:
blah blah blah...

When indexing (and thus querying) this document, I use a text field with
tokenization, stemming, etc.; let's call it "text".

Unfortunately, when I try to print highlighted results, the newlines and
whitespace are obviously not preserved. In an attempt to get around this, I
created a second field in the index, called "raw_text", that stores the full
content of each document as a string, thus preserving the whitespace.

If I set up the search page to search on the text field but highlight on
the raw_text field, the highlighted matches don't always line up. Is there
a way to somehow project the stemmed matches from the text field onto the
raw_text field when displaying highlighting?
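
To make the setup concrete, here is a rough SolrJ sketch of what the search
page does (the field names are the ones above; the query and server setup
are assumptions for illustration):

    SolrQuery q = new SolrQuery("text:xanax");   // search the analyzed field
    q.setHighlight(true);
    q.set("hl.fl", "raw_text");                  // highlight the raw field
    QueryResponse rsp = server.query(q);
    // rsp.getHighlighting() returns the fragments, keyed by document id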

Thank you for your time,
Travis


Schema Design/Data Import

2011-07-20 Thread travis

[Apologies if this is a duplicate -- I have sent several messages from my work 
email and they just vanish, so I subscribed with my personal email]
 
Greetings.  I am struggling to design a schema and a data import/update
strategy for some semi-complicated data.  I would appreciate any input.

What we have is a bunch of database records that may or may not have files
attached.  Sometimes no files, sometimes 50.

The requirement is to index the database records AND the documents, and the
search results would be just links to the database records.

I'd love to crawl the site with Nutch and be done with it, but we have a
complicated search form with various codes and attributes for the database
records, so we need a detailed schema that will loosely correspond to boxes
on the search form.  I don't think we could easily do that if we just crawl
the site.  But with a detailed schema, I'm having trouble understanding how
we could import and index from the database, and also index the related
files, and have the same schema being populated, especially with the number
of related documents being variable (maybe index them all to one field?).

We have a lot of flexibility on how we can build this, so I'm open to any
suggestions or pointers for further reading.  I've spent a fair amount of
time on the wiki but I didn't see anything that seemed directly relevant.

An additional difficulty, that I am willing to overlook for the first cut,
is that some of these files are zipped, and some of the zip files may
contain other zip files, to maybe 3 or 4 levels deep.

Help, please?
 
cheers,

Travis

Re: Solr Server Add causes java.net.SocketException: No buffer space available

2013-06-14 Thread Travis Low
If it's a windows box, then you may be experiencing a kernel sockets leak
problem.

http://support.microsoft.com/kb/2577795


On Fri, Jun 14, 2013 at 1:20 PM, Shawn Heisey  wrote:

> On 6/14/2013 8:57 AM, Snubbel wrote:
>
>> Hello,
>>
>> I am upgrading from Solr 4.0 to 4.3 and a Testcase that worked fine is
>> failing since.
>>
>> I do commit 1 Documents to Solr, then reload them and add a value to a
>> multi-valued field with Atomic Update.
>> I do commit every 50 Documents, so it's not so many at once, because the
>> multi-valued field contains many values already.
>>
>> And at some point, I get this exception:
>>
>> java.net.SocketException: No buffer space available(maximum connections
>> reached?): connect
>>
>
> Looks like a client-side problem, either not enough java heap or you are
> running out of connections because you're using a lot of connections at
> once.  This is happening on the client side, not the server side. That may
> be an indication that you are doing something not quite right, but if you
> actually do intend to create a lot of connections and you are using
> HttpSolrServer, use code similar to this to bump up the max connections:
>
> ModifiableSolrParams params = new ModifiableSolrParams();
> params.set(HttpClientUtil.PROP_MAX_CONNECTIONS, 1000);
> params.set(HttpClientUtil.PROP_MAX_CONNECTIONS_PER_HOST, 200);
> HttpClient client = HttpClientUtil.createClient(params);
> String url = "http://localhost:8983/solr/collection1";
> SolrServer server = new HttpSolrServer(url, client);
>
> Thanks,
> Shawn
>
>




capacity planning

2011-10-11 Thread Travis Low
Greetings.  I have a paltry 23,000 database records that point to a
voluminous 300GB worth of PDF, Word, Excel, and other documents.  We are
planning on indexing the records and the documents they point to.  I have no
clue on how we can calculate what kind of server we need for this.  I
imagine the index isn't going to be bigger than the documents (is it?) so I
suppose 1TB is a starting point for disk space.  But what kind of processing
power and memory might we need?  Can anyone please point me in the right
direction?

cheers,

Travis



Re: capacity planning

2011-10-11 Thread Travis Low
Thanks, Erik!  We probably won't use highlighting.  Also, documents are
added but *never* deleted.

Does anyone have comments about memory and CPU resources required for
indexing the 300GB of documents in a "reasonable" amount of time?  It's okay
if the initial indexing takes hours or maybe even days, but not too many
days.  Do we need 16GB of memory?  32GB?  8-core processor?  I have zero
sense of server requirements and I would appreciate any guidance.

Do I need to be concerned about performance/resources later, when adding
documents to an existing (large) index?

cheers,

Travis

On Tue, Oct 11, 2011 at 9:49 AM, Erik Hatcher wrote:

> Travis -
>
> Whether the index is bigger than the original content depends on what you
> need to do with it in Solr.  One of the primary deciding factors is if you
> need to use highlighting, which currently requires that the fields to be
> highlighted be stored.  Stored fields will take up about the same space as
> the original documents (text-wise, likely a bit smaller than, say, the
> actual Word doc itself).  If you don't need highlighting or the contents
> stored for other purposes, then you'll have a dramatically smaller index
> than the original (roughly 35% the size, generally).
>
>Erik
>
>
> On Oct 11, 2011, at 08:36 , Travis Low wrote:
>
> > Greetings.  I have a paltry 23,000 database records that point to a
> > voluminous 300GB worth of PDF, Word, Excel, and other documents.  We are
> > planning on indexing the records and the documents they point to.  I have
> no
> > clue on how we can calculate what kind of server we need for this.  I
> > imagine the index isn't going to be bigger than the documents (is it?) so
> I
> > suppose 1TB is a starting point for disk space.  But what kind of
> processing
> > power and memory might we need?  Can anyone please point me in the right
> > direction?
>
>




Re: capacity planning

2011-10-11 Thread Travis Low
Toke, thanks.  Comments embedded (hope that's okay):

On Tue, Oct 11, 2011 at 10:52 AM, Toke Eskildsen wrote:

> > Greetings.  I have a paltry 23,000 database records that point to a
> > voluminous 300GB worth of PDF, Word, Excel, and other documents.  We are
> > planning on indexing the records and the documents they point to.  I have
> no
> > clue on how we can calculate what kind of server we need for this.  I
> > imagine the index isn't going to be bigger than the documents (is it?)
>
> Sanity check: Let's say your average document is 200 pages with 1000
> words of 5 characters each. That gives you 200 * 1000 * 5 * 23,000 ~=
> 21GB of raw text, which is a far cry from the 300GB.
>
> Either your documents are extremely text heavy or they contain
> illustrations and other elements that are not to be indexed. Is it
> possible for you to estimate the number of characters in your corpus?
>

Yes.  We estimate each of the 23K DB records has 600 pages of text for the
combined documents, 300 words per page, 5 characters per word.  That is
600 x 300 x 5 = 0.9MB of text per record, and times 23,000 records it
works out to about 21GB, so good guessing there. :)

>  But what kind of processing power and memory might we need?

> I am not well-versed in Tika and other PDF/Word/etc analyzing
> frameworks, so I'll just focus on the search part here. Guessing wildly,
> you're aiming for a low number of running updates or even just a nightly
> batch update. Response times should be below 200 ms and the number of
> concurrent searches is 2 to 4 at most.
>

The way it works is we have researchers modifying the DB records during the
day, and they may upload documents at that time.  We estimate 50-60 uploads
throughout the day.  If possible, we'd like to index them as they are
uploaded, but if that would negatively affect the search, then we can
rebuild the index nightly.

Which is better?


> Bold claim: Assuming that your corpus is more 20GB of raw text than
> 300GB, you'll get by just fine with an i7 machine with 8GB of RAM, a 1TB
> 7200 RPM drive for storage and a 256GB consumer SSD for search. That is
> more or less what we use for our 10M documents/60GB+ index, with a load
> as I described above.
>
> I've always been wary of having to dictate hardware up front for such
> projects. It is a lot easier and cheaper to just build the software,
> then measure and buy hardware after that.
>

We have a very beefy VM server that we will use for benchmarking, but your
specs provide a starting point.  Thanks very much for that.

cheers,

Travis


Re: capacity planning

2011-10-11 Thread Travis Low
Our plan for the VM is just benchmarking, not production.  We will turn off
all guest machines, then configure a Solr VM.  Then we'll tweak memory and
see what effect it has on indexing and searching.  Then we'll reconfigure
the number of processors used and see what that does.  Then again with more
disk space.  And so on.  We'll try to start with a reasonable configuration
and then make intelligent guesses for our changes so we don't spend a year
on this.

What we are trying to avoid is configuring a brand new box at the hosting
provider, only to find we need a bigger and better box.  Or paying too much
for something we don't need.

Thanks everyone for your input, it was very helpful.

cheers,
Travis

On Tue, Oct 11, 2011 at 2:19 PM, eks dev  wrote:

> Re. "I have little experience with VM servers for search."
>
> We had a huge performance penalty on VMs; CPU was the bottleneck.
> We couldn't freely run measurements to figure out what the problem really
> was (hosting was contracted by the customer...), but it was something
> pretty scary, on the order of 8-10 times slower than the advertised
> dedicated equivalent.
> For whatever it's worth, if you can afford it, keep Lucene away from VMs.
> Lucene is a highly optimized machine, and someone twiddling with context
> switches is not welcome there.
>
> Of course, if you get I/O bound, it makes no big difference anyhow.
>
> This is just my singular experience; it might be that the hosting team did
> not configure it right, or something has changed in the meantime (the
> experience is ~4 years old), but we burnt our fingers so badly that I
> still remember it.
>
>
>
>
> On Tue, Oct 11, 2011 at 7:49 PM, Toke Eskildsen wrote:
>
> > Travis Low [t...@4centurion.com] wrote:
> > > Toke, thanks.  Comments embedded (hope that's okay):
> >
> > Inline or top-posting? Long discussion, but for mailing lists I clearly
> > prefer the former.
> >
> > [Toke: Estimate characters]
> >
> > > Yes.  We estimate each of the 23K DB records has 600 pages of text for
> > the
> > > combined documents, 300 words per page, 5 characters per word.  Which
> > > coincidentally works out to about 21GB, so good guessing there. :)
> >
> > Heh. Lucky Guess indeed, although the factors were off. Anyway, 21GB does
> > not sound scary at all.
> >
> > > The way it works is we have researchers modifying the DB records during
> > the
> > > day, and they may upload documents at that time.  We estimate 50-60
> > uploads
> > > throughout the day.  If possible, we'd like to index them as they are
> > > uploaded, but if that would negatively affect the search, then we can
> > > rebuild the index nightly.
> > >
> > > Which is better?
> >
> > The analyzing part is only CPU and you're running multi-core so as long
> as
> > you only analyze using one thread you're safe there. That leaves us with
> > I/O: Even for spinning drives, a daily load of just 60 updates of 1MB of
> > extracted text each shouldn't have any real effect - with the usual
> caveat
> > that large merges should be avoided by either optimizing at night or
> > tweaking merge policy to avoid large segments. With such a relatively
> small
> > index, (re)opening and warm up should be painless too.
> >
> > Summary: 300GB is a fair amount of data and takes some power to crunch.
> > However, in the Solr/Lucene end your index size and your update rates are
> > nothing to worry about. Usual caveat for advanced use and all that
> applies.
> >
> > [Toke: i7, 8GB, 1TB spinning, 256GB SSD]
> >
> > > We have a very beefy VM server that we will use for benchmarking, but
> > your
> > > specs provide a starting point.  Thanks very much for that.
> >
> > I have little experience with VM servers for search. Although we use a
> lot
> > of virtual machines, we use dedicated machines for our searchers,
> primarily
> > to ensure low latency for I/O. They might be fine for that too, but we
> > haven't tried it yet.
> >
> > Glad to be of help,
> > Toke
>





Multivalued fields question

2011-11-01 Thread Travis Low
Greetings.  We're finally kicking off our little Solr project.  We're
indexing a paltry 25,000 records but each has MANY documents attached, so
we're using Tika to parse those documents into a big long string, which we
use in a call to solrj.addField("relateddoccontents",
bigLongStringOfDocumentContents).  We don't care about search results
pointing back to a particular document, just one of the 25K records, so
this should work.

Now my question.  Many of these records have related records in other
tables, and there are several types of these related records.  For example,
we have record #100 that may have blue records with numbers , ,
, and , and red records with numbers , , , .
Currently we're just handling these the same way as related document
contents -- we concatenate them, separated by spaces, into one long string,
then we do solrj.addField("redRecords", stringOfRedRecordNumbers).  That
is, stringOfRedRecordNumbers is "   ".

We have no need to show these records to the user in Solr search results,
because we're going to use the database for displaying of detailed
information for any records found.  Is there any reason to specify
redRecords and blueRecords as multivalued fields in schema.xml?  And if we
did that, we'd call solrj.addField() once for each value, would we not?

cheers,

Travis


Re: Multivalued fields question

2011-11-03 Thread Travis Low
Thanks much, Erick.  Between your explanation, and what I read at
http://lucene.472066.n3.nabble.com/positionIncrementGap-in-schema-xml-td488338.html,
the utility of multiValued fields is clear.

On Thu, Nov 3, 2011 at 8:26 AM, Erick Erickson wrote:

> multiValued has nothing to do with how many tokens are in the field,
> it's just whether you can call document.add("field1", val1) more than
> once on the same field. Or, equivalently, in input document in XML
> has two  entries with the same name="field" entries. So it
> strictly depends upon whether you want to take it upon yourself
> to make these long strings or call document.add once for each
> value in the field.
>
> The field is returned as an array if it's multiValued
>
> Just to make your life interesting: if you define your increment gap as
> 0,
> there is no difference between how multiValued fields are searched
> as opposed to single-valued fields.
>
> FWIW
> Erick
>
> On Tue, Nov 1, 2011 at 1:26 PM, Travis Low  wrote:
> > Greetings.  We're finally kicking off our little Solr project.  We're
> > indexing a paltry 25,000 records but each has MANY documents attached, so
> > we're using Tika to parse those documents into a big long string, which
> we
> > use in a call to solrj.addField("relateddoccontents",
> > bigLongStringOfDocumentContents).  We don't care about search results
> > pointing back to a particular document, just one of the 25K records, so
> > this should work.
> >
> > Now my question.  Many of these records have related records in other
> > tables, and there are several types of these related records.  For
> example,
> > we have record #100 that may have blue records with numbers , ,
> > , and , and red records with numbers , , , .
> > Currently we're just handling these the same way as related document
> > contents -- we concatenate them, separated by spaces, into one long
> string,
> > then we do solrj.addField("redRecords", stringOfRedRecordNumbers).  That
> > is, stringOfRedRecordNumbers is "   ".
> >
> > We have no need to show these records to the user in Solr search results,
> > because we're going to use the database for displaying of detailed
> > information for any records found.  Is there any reason to specify
> > redRecords and blueRecords as multivalued fields in schema.xml?  And if
> we
> > did that, we'd call solrj.addField() once for each value, would we not?
> >
> > cheers,
> >
> > Travis
> >
>
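
A minimal SolrJ sketch of the two options Erick describes, using the
redRecords field from the original question (redRecordNumbers is a
hypothetical List<String>):

    SolrInputDocument doc = new SolrInputDocument();
    // either: one long space-separated string in a single-valued field
    doc.addField("redRecords", String.join(" ", redRecordNumbers));
    // or, with multiValued="true" in schema.xml: one addField call per value
    for (String n : redRecordNumbers) {
        doc.addField("redRecords", n);
    }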





Problems installing Solr PHP extension

2011-11-15 Thread Travis Low
I know this isn't strictly Solr, but I've been at this for hours and I'm at
my wits' end.  I cannot install the Solr PECL extension (
http://pecl.php.net/package/solr), either by command line "pecl install
solr" or by downloading and using phpize.  Always the same error, which I
see here:
http://www.lmpx.com/nav/article.php/news.php.net/php.qa.reports/24197/read/index.html

It boils down to this:
PHP Warning: PHP Startup: Unable to load dynamic library
'/root/solr-0.9.11/modules/solr.so' - /root/solr-0.9.11/modules/solr.so:
undefined symbol: curl_easy_getinfo in Unknown on line 0

I am using the current Solr PECL extension.  PHP 5.3.8.  Curl 7.21.3.  Yes,
libcurl and libcurl-dev are both installed, also 7.21.3.  Fedora Core 15,
patched to current levels.

Please help!

cheers,

Travis


Re: Problems installing Solr PHP extension

2011-11-16 Thread Travis Low
Thanks so much for responding.  I tried your suggestion and the pecl build
*seems* to go okay, but after restarting Apache, I get this again in the
error_log:

> PHP Warning: PHP Startup: Unable to load dynamic library
> '/usr/lib64/php/modules/solr.so' - /usr/lib64/php/modules/solr.so:
> undefined symbol: curl_easy_getinfo in Unknown on line 0

I'm baffled by this because the undefined symbol is in libcurl.so, and I've
specified the path to that library.

If I can't solve this problem then we'll basically have to write our own
PHP Solr client, which would royally suck.

cheers,

Travis

On Wed, Nov 16, 2011 at 7:11 AM, Adolfo Castro Menna <
adolfo.castrome...@gmail.com> wrote:

> Pecl installation is kinda buggy. I installed it ignoring pecl dependencies
> because I already had them.
>
> Try: pecl install -n solr  (-n ignores dependencies)
> And when it prompts for curl and libxml, point the path to where you have
> installed them, probably in /usr/lib/
>
> Cheers,
> Adolfo.
>
> On Tue, Nov 15, 2011 at 7:27 PM, Travis Low  wrote:
>
> > I know this isn't strictly Solr, but I've been at this for hours and I'm
> > at my wits' end.  I cannot install the Solr PECL extension (
> > http://pecl.php.net/package/solr), either by command line "pecl install
> > solr" or by downloading and using phpize.  Always the same error, which I
> > see here:
> >
> >
> http://www.lmpx.com/nav/article.php/news.php.net/php.qa.reports/24197/read/index.html
> >
> > It boils down to this:
> > PHP Warning: PHP Startup: Unable to load dynamic library
> > '/root/solr-0.9.11/modules/solr.so' - /root/solr-0.9.11/modules/solr.so:
> > undefined symbol: curl_easy_getinfo in Unknown on line 0
> >
> > I am using the current Solr PECL extension.  PHP 5.3.8.  Curl 7.21.3.
>  Yes,
> > libcurl and libcurl-dev are both installed, also 7.21.3.  Fedora Core 15,
> > patched to current levels.
> >
> > Please help!
> >
> > cheers,
> >
> > Travis
> >
>





Re: Problems installing Solr PHP extension

2011-11-16 Thread Travis Low
Ah, excellent! Thank you, Kuli!  We'll just use that.

On Wed, Nov 16, 2011 at 11:35 AM, Michael Kuhlmann  wrote:

> Am 16.11.2011 17:11, schrieb Travis Low:
>
>
>> If I can't solve this problem then we'll basically have to write our own
>> PHP Solr client, which would royally suck.
>>
>
> Oh, if you really can't get the library work, no problem - there are
> several PHP clients out there that don't need a PECL installation.
>
> Personally, I have used http://code.google.com/p/solr-php-client/;
> it works well.
>
> -Kuli
>





setting up schema (newbie question)

2010-07-20 Thread Travis Low
I have a large database table with many document records, and I plan to use
SOLR to improve the searching for the documents.

The twist here is that perhaps 50% of the records will originate from
outside sources, and sometimes those records may be updated versions of
documents we already have.  Currently, a human visually examines the
incoming information and performs a few document searches, and decides if a
new document must be created, or an existing one should be updated.  We
would like to automate the matching to some extent, and it occurs to me that
SOLR might be useful for this as well.

Each document has many attributes that can be used for matching.  The
attributes are all in lookup tables.  For example, there is a "location"
field that might be something like "Central Public Library, Crawford, NE"
for row with id #.  The incoming document might have something like
"Crawford Central Public Library, Nebraska", which ideally would map to
# as well.

I'm currently thinking that a two-phase import might work.  First, we use
SOLR to try and get a list of attribute ids for the incoming document.
Those can be used for ordinary database queries to find primary keys of
potential matches.  Then we use SOLR again to search the reduced list for
the unstructured information, essentially by including those primary keys as
part of the search.
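
For the second phase, I am picturing something like this SolrJ filter query
(candidateIds being a hypothetical list of the primary keys found in the
first phase):

    SolrQuery q = new SolrQuery(unstructuredText);
    // restrict the full-text search to records already matched on attributes
    q.addFilterQuery("id:(" + String.join(" OR ", candidateIds) + ")");
    QueryResponse rsp = server.query(q);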

I was looking at the example for DIH here:
http://wiki.apache.org/solr/DataImportHandler and it is clear, but it is
obviously slanted toward finding the products.  I need to find the
categories so that I can *then* find the products, if that makes sense.

Any suggestions on how to proceed?  My first thought is that I should set up
two SOLR instances, one for indexing only attributes, and one for the
documents themselves.

Thanks in advance for any help.

cheers,

Travis


Re: stream.url problem

2010-08-17 Thread Travis Low
"Connection refused" (in any context) almost always means that nothing is
listening on the TCP port that you are trying to connect to. So either the
process you are connecting to isn't running, or you are trying to connect to
the wrong port.
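
A quick way to verify, independent of Solr (host and port below are
placeholders):

    try (java.net.Socket s = new java.net.Socket("remotehost", 8080)) {
        System.out.println("something is listening");
    } catch (java.net.ConnectException e) {
        // this is exactly the "connection refused" case
        System.out.println("connection refused: nothing is listening");
    } catch (java.io.IOException e) {
        System.out.println("some other I/O problem: " + e);
    }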

On Tue, Aug 17, 2010 at 6:18 AM, satya swaroop  wrote:

> hi all,
>   i am indexing documents to solr that are on my system. now i need
> to index files that are on a remote system. i enabled remote streaming
> to true in solrconfig.xml, and when i use stream.url it shows the error
> "connection refused". the details of the error are:
>
> when i sent the request in my browser as::
>
>
> http://localhost:8080/solr/update/extract?stream.url=http://remotehost/home/san/Desktop/programming_erlang_armstrong.pdf&literal.id=schb2
>
> i get the error as
>
> HTTP Status 500 - Connection refused java.net.ConnectException: Connection
> refused at sun.reflect.GeneratedConstructorAccessor11.newInstance(Unknown
> Source) at
>
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at
> [snip]
>
>
> if any body know
> please help me with this
>
> regards,
> satya
>


Re: Solr, c/s type ?

2010-09-08 Thread Travis Low
I'll guess he means client/server.

On Tue, Sep 7, 2010 at 5:52 PM, Chris Hostetter wrote:

>
> : Subject: Solr, c/s type ?
> :
> : i'm wondering c/s type is possible (not http web type).
> : if possible, could i get the material about it?
>
> You're going to need to provide more info explaining what it is you are
> asking about -- i don't know about anyone else, but i honestly have
> absolutely no idea what you might possibly mean by "c/s type is possible
> (not http web type)"
>
> -Hoss
>
> --
> http://lucenerevolution.org/  ...  October 7-8, Boston
> http://bit.ly/stump-hoss  ...  Stump The Chump!
>
>


Re: DIH fails after processing roughly 10million records

2013-01-08 Thread Travis Low
What you describe sounds right to me and seems consistent with the error
stack trace.  I would increase the MySQL wait_timeout to 3600 and,
depending on your server, you might also want to increase max_connections.

cheers,

Travis

On Tue, Jan 8, 2013 at 4:10 AM, vijeshnair  wrote:

> Solr version : 4.0 (running with 9GB of RAM)
> MySQL : 5.5
> JDBC : mysql-connector-java-5.1.22-bin.jar
>
> I am trying to run the full import for my catalog data, which is roughly
> 13 million products. The DIH ran smoothly for 18 hours, and processed
> roughly 10 million records. But all of a sudden it broke due to a JDBC
> exception, i.e. a communication failure with the server. I did some
> extensive googling on this topic, and there are multiple recommendations
> to use "readonly=true", "autocommit=true" etc. If I understand it
> correctly, the possible reason is that DIH stops indexing due to segment
> merging, and then tries to reconnect with the server. When the index is
> slightly large and multiple merges are happening at the same time, DIH
> stops indexing for some time, and by the time it restarts MySQL will have
> already dropped the connection. So I am going to increase the wait timeout
> on the MySQL side from the default 120 to something slightly larger, to
> see if that solves the issue or not. I will know the result of that
> approach only after completing one full run, which I will update you on
> tomorrow. Meantime I thought of validating my approach, and checking with
> you for any other fix that exists.
>
> Here is the error stack
>
> Jan 8, 2013 12:44:00 PM org.apache.solr.handler.dataimport.JdbcDataSource
> closeConnection
> SEVERE: Ignoring Error when closing connection
> java.sql.SQLException: Streaming result set
> com.mysql.jdbc.RowDataDynamic@32d051c1 is still active. No statements may
> be
> issued when any streaming result sets are open and in use on a given
> connection. Ensure that you have called .close() on any active streaming
> result sets before attempting more queries.
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:926)
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:923)
> at
> com.mysql.jdbc.MysqlIO.checkForOutstandingStreamingData(MysqlIO.java:3234)
> at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2399)
> at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2651)
> at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2728)
> at
> com.mysql.jdbc.ConnectionImpl.rollbackNoChecks(ConnectionImpl.java:4908)
> at com.mysql.jdbc.ConnectionImpl.rollback(ConnectionImpl.java:4794)
> at
> com.mysql.jdbc.ConnectionImpl.realClose(ConnectionImpl.java:4403)
> at com.mysql.jdbc.ConnectionImpl.close(ConnectionImpl.java:1594)
> at
>
> org.apache.solr.handler.dataimport.JdbcDataSource.closeConnection(JdbcDataSource.java:400)
> at
>
> org.apache.solr.handler.dataimport.JdbcDataSource.close(JdbcDataSource.java:391)
> at
>
> org.apache.solr.handler.dataimport.DocBuilder.closeEntityProcessorWrappers(DocBuilder.java:291)
> at
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:280)
> at
>
> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:382)
> at
>
> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:448)
> at
>
> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:429)
> Jan 8, 2013 12:44:00 PM org.apache.solr.handler.dataimport.JdbcDataSource
> closeConnection
> SEVERE: Ignoring Error when closing connection
> com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException:
> Communications link failure during rollback(). Transaction resolution
> unknown.
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> Method)
> at
>
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> at
>
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
> at com.mysql.jdbc.Util.getInstance(Util.java:386)
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1014)
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:988)
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:974)
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:919)
> at com.mysql.jdbc.ConnectionImpl.rollback(ConnectionIm

Re: solr query

2013-01-22 Thread Travis Low
Yes.  Write a program to consume the result XML and then spit it back out
the way you'd like to see it.
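
If you use SolrJ, you do not even have to touch the XML; a sketch, using
the field names from the question quoted below:

    QueryResponse rsp = server.query(new SolrQuery("*:*"));
    for (SolrDocument d : rsp.getResults()) {
        // print the fields in whatever order you like
        System.out.println(d.getFieldValue("displayName"));
        System.out.println(d.getFieldValue("id"));
        System.out.println(d.getFieldValue("manufacturer"));
        System.out.println(d.getFieldValue("model"));
    }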

cheers,

Travis

On Tue, Jan 22, 2013 at 1:23 PM, hassancrowdc wrote:

> ?
>
>
> On Tue, Jan 22, 2013 at 12:24 PM, hassancrowdc [via Lucene] <
> ml-node+s472066n4035390...@n3.nabble.com> wrote:
>
> > thnx. One quick question, can I control the way resultset of the query is
> > shown: I mean if i want displayName to be shown first and then the id and
> > then manufacturer and model? is there any way i can do that?
> >
>
>
>
>






Re: Benefits of Solr over Lucene?

2013-02-12 Thread Travis Low
http://lucene.apache.org/solr/

On Tue, Feb 12, 2013 at 10:40 AM, JohnRodey  wrote:

> I know that Solr web-enables a Lucene index, but I'm trying to figure out
> what other things Solr offers over Lucene.  On the Solr features list it
> says "Solr uses the Lucene search library and extends it!", but what
> exactly are the extensions from that list, and what did Lucene give you?
> Also, if I have an index built through Solr, is there a non-HTTP way to
> search that index?  Because SolrJ essentially just makes HTTP requests,
> correct?
>
> Some features I'm particularly interested in are:
> Geospatial Search
> Highlighting
> Dynamic Fields
> Near Real-Time Indexing
> Multiple Search Indices
>
> Thanks!
>
>
>
>





querying multivalue fields

2012-01-27 Thread Travis Low
If a query matches one or more values of a multivalued field, is it
possible to get back the indexes of WHICH values matched?  For example, for
a document with a multivalued field having ["red", "redder", "reddest",
"yellow", "blue"] as its value, if "red" is the query, could we know that
values 0, 1, and 2 matched?

Against all hope, if that's "yes", then the next question is, would the
values be listed in the order they were specified when adding the document?

The idea here is that each document may have a variable number of external
(e.g. Word) documents associated with it, and for any match, we not only
want to provide a link to the Solr document, but also to be able to tell
the user which external documents matched.  The contents of these documents
would populate the multivalued field (a very big field).

If that can't be done, I think what we'll do is compute some kind of
prefixed hash of the document name and embed it in each multivalued field
value (each document's content).  The prefix would contain (or be another
hash of) the document id.  Then we could find which documents matched,
could we not?
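
Roughly, the tagging I have in mind looks like this (the hash scheme and
the names are invented for illustration):

    // tag derived from the external document's name, embedded in the value
    String tag = "xdoc_" + Integer.toHexString(fileName.hashCode());
    doc.addField("relateddoccontents", tag + " " + extractedText);
    // a highlighted fragment containing "xdoc_..." then identifies the file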

Sorry if this is a dumb question.  I've asked about this before, and
received some *very* useful input (thanks!), but nothing that has yet led
me to a robust solution for indexing a set of records along with their
associated documents and being able to identify the matching record AND the
matching document(s).

Thanks for your help!

cheers,
Travis



Re: UTF-8 support during indexing content

2012-02-01 Thread Travis Low
Are you sure the input document is in UTF-8?  That looks like classic
ISO-8859-1-treated-as-UTF-8.

How did you confirm the document contains the right quote marks immediately
prior to uploading?  If you just visually inspected it, then use whatever
tool you viewed it in to see what the character set is.
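
One way to check without trusting the editor (the path is a placeholder;
U+201D is the curly closing quote from your example):

    byte[] bytes = java.nio.file.Files.readAllBytes(
            java.nio.file.Paths.get("update.xml"));
    String asUtf8 = new String(bytes, java.nio.charset.StandardCharsets.UTF_8);
    // if the file really is UTF-8, the curly quote survives the round trip
    System.out.println(asUtf8.indexOf('\u201D') >= 0);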

cheers,
Travis

On Wed, Feb 1, 2012 at 9:17 AM, Van Tassell, Kristian <
kristian.vantass...@siemens.com> wrote:

> Hello everyone,
>
> I have a question that I imagine has been asked many times before, so I
> apologize for the repeat.
>
> I have a basic text field with the following text:
>the word ”stemming” in quotes
>
> Uploading the data yields no errors, however when it is indexed, the text
> looks like this:
>
> the word �stemming� in quotes
>
>
> Searching for the word stemming, without quotes or otherwise, does not
> return any hits.
>
> Just some basic facts:
>
> - I included the solr.CollationKeyFilterFactory filter on the fieldType.
> - Updating the index is done via a "solr xml" document. I've confirmed
> that the document contains the right quote marks immediately prior to
> uploading.
> - Updating the index is done via solrj, essentially:
>DirectXmlRequest up = new DirectXmlRequest( "/update", xml );
>solrServer.request( up );
>solrServer.commit();
> - In solr admin, the characters look like garbage, surrounding the word
> stemming (as shown above)
>
>
> Thanks in advance for any details you can provide!
> -Kristian
>


Re: Why does my email always get rejected?

2012-03-20 Thread Travis Low
I received it...sometimes it just needs some time.

2012/3/20 怪侠 <87863...@qq.com>

> I send email to solr-user@lucene.apache.org, but I always receive a
> rejection notice. It never sends successfully.






Re: UPDATE query in deltaquery

2010-12-30 Thread Travis Low
If you are getting a null pointer exception here:

  colNames = readFieldNames(resultSet.getMetaData());

Then that implies the DIH code is written to expect a select statement.  You
might be able to fool it with some SQL injection:

  update blah set foo=bar where id=1234; select id from blah

But if that doesn't work then you may be out of luck.

cheers,

Travis

On Thu, Dec 30, 2010 at 8:26 AM, Juan Manuel Alvarez wrote:

> Erick:
>
> Thanks for the quick response.
>
> I can't use the timestamp for doing DIH, so I need to use a custom
> field that I have to update once for each delta-import, and that is why
> I need to execute an UPDATE in the deltaQuery.
>
> Cheers!
> Juan M.
>
> On Thu, Dec 30, 2010 at 10:07 AM, Erick Erickson
>  wrote:
> > WARNING: DIH isn't my strong suit, I generally prefer doing things
> > in SolrJ. Mostly I asked for clarification so someone #else# who
> > actually knows DIH details could chime in...
> >
> > That said, I'm a bit confused. As I understand it, you shouldn't
> > be UPDATEing anything in DIH, it's a select where documents
> > then get added to Solr "by magic". Your post leads me to believe
> > that you're trying to change the database via DIH, is that at
> > all true?
> >
> > This is based in part on
> > "The ids are returned ok, but the UPDATE has no effect on the database"
> > Or do you mean "effect on the index"? If the latter, then the select
> > would only have a chance of updating the IDs of the Solr documents...
> >
> > At least I think that's close to reality...
> >
> > Best
> > Erick
> >
> > On Thu, Dec 30, 2010 at 7:52 AM, Juan Manuel Alvarez wrote:
> >
> >> Hi Erick!
> >>
> >> Here is my DIH configuration:
> >>
> >> <dataConfig>
> >>   <dataSource
> >>     url="jdbc:postgresql://${dataimporter.request.dbHost}:${dataimporter.request.dbPort}/${dataimporter.request.dbName}"
> >>     user="${dataimporter.request.dbUser}"
> >>     password="${dataimporter.request.dbPassword}" autoCommit="false"
> >>     transactionIsolation="TRANSACTION_READ_UNCOMMITTED"
> >>     holdability="CLOSE_CURSORS_AT_COMMIT"/>
> >>   <document>
> >>     <entity query='  . '
> >>       deltaImportQuery='  . '
> >>       deltaQuery=' . '>
> >>     </entity>
> >>   </document>
> >> </dataConfig>
> >>
> >> I have tried two options for the deltaQuery:
> >> UPDATE "Global"."Projects" SET "prj_lastSync" = now() WHERE "prj_id" =
> >> '2';  <-- throws a null pointer exception as described in the
> >> previous email
> >>
> >> The second option is a DB function that I am calling this way:
> >> SELECT "get_deltaimport_items" AS "id" FROM
> >> project.get_deltaimport_items(2, 'project');
> >>
> >> The function inside executes the UPDATE query shown above and a SELECT
> >> query for the ids.
> >> The ids are returned ok, but the UPDATE has no effect on the database.
> >>
> >> Cheers!
> >> Juan M.
> >>
> >>
> >> On Thu, Dec 30, 2010 at 1:32 AM, Erick Erickson <
> erickerick...@gmail.com>
> >> wrote:
> >> > Well, let's see the queries you're sending, and your DIH
> configuration.
> >> >
> >> > Otherwise, we're just guessing...
> >> >
> >> > Best
> >> > Erick
> >> >
> >> > On Wed, Dec 29, 2010 at 9:58 PM, Juan Manuel Alvarez
> >> > <naici...@gmail.com> wrote:
> >> >
> >> >> Hi! I would like to ask you a question about using a deltaQuery in
> DIH.
> >> >> I am syncing with a PostgreSQL database.
> >> >>
> >> >> At first I was calling a function that made two queries: an UPDATE
> and a
> >> >> SELECT.
> >> >> The select result was properly returned, but the UPDATE query did not
> >> >> made any changes,
> >> >> so I tried calling the same function from a PostgreSQL client and
> >> >> everything went OK.
> >> >>
> >> >> So I tried calling a simple UPDATE query directly in the deltaQuery
> >> >> and I receive a
> >> >> NullPointerException that I traced to the line 251 of the

Why does the StatsComponent only work with indexed fields?

2011-02-09 Thread Travis Truman
Is there a reason why the StatsComponent only deals with indexed fields?

I just updated the wiki: http://wiki.apache.org/solr/StatsComponent to call
this fact out since it was not apparent previously.

I've briefly skimmed the source of StatsComponent, but am not familiar
enough with the code or Solr yet to understand if it was omitted for
performance reasons or some other reason.

Any information would be appreciated.

Thanks,
Travis


use case: structured DB records with a bunch of related files

2011-07-19 Thread Travis Low
Greetings.  I have a bunch of highly structured DB records, and I'm pretty
clear on how to index those.  However, each of those records may have any
number of related documents (Word, Excel, PDF, PPT, etc.).  All of this
information will change over time.

Can someone point me to a use case or some good reading to get me started on
configuring Solr to index the DB records and files in such a way as to
relate the two types of information?  By "relate", I mean that if there's a
hit in a related file, then I need to show the user a link to the DB record
as well as a link to the file.

Thanks in advance.

cheers,

Travis



Schema design/data import

2011-07-20 Thread Travis Low
Greetings.  I am struggling to design a schema and a data import/update
strategy for some semi-complicated data.  I would appreciate any input.

What we have is a bunch of database records that may or may not have files
attached.  Sometimes no files, sometimes 50.

The requirement is to index the database records AND the documents, and the
search results would be just links to the database records.

I'd love to crawl the site with Nutch and be done with it, but we have a
complicated search form with various codes and attributes for the database
records, so we need a detailed schema that will loosely correspond to boxes
on the search form.  I don't think we could easily do that if we just crawl
the site.  But with a detailed schema, I'm having trouble understanding how
we could import and index from the database, and also index the related
files, and have the same schema being populated, especially with the number
of related documents being variable (maybe index them all to one field?).

We have a lot of flexibility on how we can build this, so I'm open to any
suggestions or pointers for further reading.  I've spent a fair amount of
time on the wiki but I didn't see anything that seemed directly relevant.

An additional difficulty, that I am willing to overlook for the first cut,
is that some of these files are zipped, and some of the zip files may
contain other zip files, to maybe 3 or 4 levels deep.

Help, please?

cheers,

Travis





Re: Schema Design/Data Import

2011-07-25 Thread Travis Low
Thanks so much Erick (and Stefan).  Yes, I did some reading on SolrJ and
Tika and you are spot-on.  We will write our own importer using SolrJ and
then we can grab the DB records and parse any attachments along the way.

Now it comes down to a schema design question.  The issue I'm struggling
with is what kind of field or fields to use for the attachments.  The reason
for the difficulty is that the documents we're most interested in are the DB
records, not the attachments, and there could be 0 or 3 or 50 attachments
for a single DB record.  Should we:

(1) Just add fields called "attachment_0", "attachment_1", ... ,
"attachment_100" to the schema?
(2) Somehow index all attachments to a single field? (Is this even
possible?)
(3) Use dynamic fields?
(4) None of the above?

The idea is that if there is a hit in one of the attachments, then we need
to show a link to the DB record.  It would be nice to show a link to the
document as well, but that's less important.
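
To make option (2) concrete, here is a minimal SolrJ/Tika sketch; the field
name attachment_text and the method signature are assumptions:

    import java.io.File;
    import java.util.List;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.common.SolrInputDocument;
    import org.apache.tika.Tika;

    void indexRecord(SolrServer server, String recordId, List<File> attachments)
            throws Exception {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", recordId);
        Tika tika = new Tika();
        for (File f : attachments) {
            // each attachment becomes one value of a single multivalued field
            doc.addField("attachment_text", tika.parseToString(f));
        }
        server.add(doc);
    }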

cheers,

Travis


On Mon, Jul 25, 2011 at 9:49 AM, Erick Erickson wrote:

> I'd seriously consider going with SolrJ as your indexing strategy, it
> allows
> you to do anything you need to do in Java code. You can call the Tika
> library yourself on the files pointed to by your rows as you see fit,
> indexing
> them as you choose, perhaps one Solr doc per attachment, perhaps one
> per row, whatever.
>
> Best
> Erick
>
> On Wed, Jul 20, 2011 at 3:27 PM,   wrote:
> >
> > [Apologies if this is a duplicate -- I have sent several messages from my
> work email and they just vanish, so I subscribed with my personal email]
> >
> > Greetings.  I am struggling to design a schema and a data import/update
>  strategy for some semi-complicated data.  I would appreciate any input.
> >
> > What we have is a bunch of database records that may or may not have
> files attached.  Sometimes no files, sometimes 50.
> >
> > The requirement is to index the database records AND the documents,  and
> the search results would be just links to the database records.
> >
> > I'd  love to crawl the site with Nutch and be done with it, but we have a
>  complicated search form with various codes and attributes for the  database
> records, so we need a detailed schema that will loosely  correspond to boxes
> on the search form.  I don't think we could easily  do that if we just crawl
> the site.  But with a detailed schema, I'm  having trouble understanding how
> we could import and index from the  database, and also index the related
> files, and have the same schema  being populated, especially with the number
> of related documents being  variable (maybe index them all to one field?).
> >
> > We have a lot of flexibility on how we can build this, so I'm open  to
> any suggestions or pointers for further reading.  I've spent a fair  amount
> of time on the wiki but I didn't see anything that seemed  directly
> relevant.
> >
> > An additional difficulty, that I am willing to overlook for the  first
> cut, is that some of these files are zipped, and some of the zip  files may
> contain other zip files, to maybe 3 or 4 levels deep.
> >
> > Help, please?
> >
> > cheers,
> >
> > Travis
>





Re: SOLR 4.0 DataImport frozen or fails with WARNING: Unable to read: dataimport.properties?

2012-09-07 Thread Travis Low
Change your data-config.xml connection XML to this:

  <dataSource batchSize="-1"
      url="jdbc:mysql://dbhost:3396/myDB" user="XXX" password="XXX" />

Then try again.  Setting batchSize="-1" keeps the driver from trying to fetch
the entire result set at the same time.
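
For the curious: with batchSize="-1", Solr's JdbcDataSource asks MySQL
Connector/J for row-by-row streaming, which in plain JDBC looks roughly like
this sketch (URL and credentials are the placeholders from your config):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class StreamingFetch {
        public static void main(String[] args) throws Exception {
            Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://dbhost:3396/myDB", "XXX", "XXX");
            Statement st = conn.createStatement(
                    ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
            // Connector/J only streams rows one at a time when fetchSize is
            // Integer.MIN_VALUE; otherwise it buffers the whole result set.
            st.setFetchSize(Integer.MIN_VALUE);
            ResultSet rs = st.executeQuery("select name,id from hp_city");
            while (rs.next()) {
                System.out.println(rs.getString("name"));
            }
            conn.close();
        }
    }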

cheers,

Travis


On Fri, Sep 7, 2012 at 4:17 AM, deniz  wrote:

> Hi all,
>
> I have been trying to index my data from mysql db, but somehow i can't
> index anything, and don't see any exception / error in logs, except a
> warning which is highlighted below...
>
> Here is my db-config's connection string:
>
> <dataSource url="jdbc:mysql://dbhost:3396/myDB" user="XXX" password="XXX" />
>
> (I can connect to the db from command line by using the above settings)
>
> and after i start dataimport i see these in the log:
>
> INFO: Starting Full Import
> Sep 07, 2012 4:08:21 PM org.apache.solr.core.SolrCore execute
> INFO: [collection1] webapp=/solr path=/dataimport params={command=status}
> status=0 QTime=0
> Sep 07, 2012 4:08:21 PM
> org.apache.solr.handler.dataimport.SimplePropertiesWriter
> readIndexerProperties
> *WARNING: Unable to read: dataimport.properties*
> Sep 07, 2012 4:08:21 PM org.apache.solr.handler.dataimport.JdbcDataSource$1
> call
> INFO: Creating a connection for entity user with URL:
> jdbc:mysql://10.60.1.157:3396/poppen
> Sep 07, 2012 4:08:22 PM org.apache.solr.handler.dataimport.JdbcDataSource$1
> call
> INFO: Time taken for getConnection(): 802
> Sep 07, 2012 4:08:23 PM org.apache.solr.core.SolrCore execute
> INFO: [collection1] webapp=/solr path=/dataimport params={command=status}
> status=0 QTime=1
> Sep 07, 2012 4:08:25 PM org.apache.solr.core.SolrCore execute
> INFO: [collection1] webapp=/solr path=/dataimport params={command=status}
> status=0 QTime=1
> Sep 07, 2012 4:08:27 PM org.apache.solr.core.SolrCore execute
> INFO: [collection1] webapp=/solr path=/dataimport params={command=status}
> status=0 QTime=0
> Sep 07, 2012 4:08:29 PM org.apache.solr.core.SolrCore execute
> INFO: [collection1] webapp=/solr path=/dataimport params={command=status}
> status=0 QTime=1
> Sep 07, 2012 4:08:31 PM org.apache.solr.core.SolrCore execute
> INFO: [collection1] webapp=/solr path=/dataimport params={command=status}
> status=0 QTime=0
> Sep 07, 2012 4:08:33 PM org.apache.solr.core.SolrCore execute
> INFO: [collection1] webapp=/solr path=/dataimport params={command=status}
> status=0 QTime=0
> Sep 07, 2012 4:08:36 PM org.apache.solr.core.SolrCore execute
> INFO: [collection1] webapp=/solr path=/dataimport params={command=status}
> status=0 QTime=1
> Sep 07, 2012 4:08:38 PM org.apache.solr.core.SolrCore execute
> INFO: [collection1] webapp=/solr path=/dataimport params={command=status}
> status=0 QTime=1
> Sep 07, 2012 4:08:40 PM org.apache.solr.core.SolrCore execute
> INFO: [collection1] webapp=/solr path=/dataimport params={command=status}
> status=0 QTime=1
> Sep 07, 2012 4:08:42 PM org.apache.solr.core.SolrCore execute
> INFO: [collection1] webapp=/solr path=/dataimport params={command=status}
> status=0 QTime=0
> Sep 07, 2012 4:08:44 PM org.apache.solr.core.SolrCore execute
> INFO: [collection1] webapp=/solr path=/dataimport params={command=status}
> status=0 QTime=0
> Sep 07, 2012 4:08:46 PM org.apache.solr.core.SolrCore execute
> INFO: [collection1] webapp=/solr path=/dataimport params={command=status}
> status=0 QTime=1
> Sep 07, 2012 4:08:49 PM org.apache.solr.core.SolrCore execute
> INFO: [collection1] webapp=/solr path=/dataimport params={command=status}
> status=0 QTime=0
> Sep 07, 2012 4:08:51 PM org.apache.solr.core.SolrCore execute
> INFO: [collection1] webapp=/solr path=/dataimport params={command=status}
> status=0 QTime=0
> Sep 07, 2012 4:08:53 PM org.apache.solr.core.SolrCore execute
> INFO: [collection1] webapp=/solr path=/dataimport params={command=status}
> status=0 QTime=0
> Sep 07, 2012 4:08:55 PM org.apache.solr.core.SolrCore execute
> INFO: [collection1] webapp=/solr path=/dataimport params={command=status}
> status=0 QTime=0
> Sep 07, 2012 4:08:58 PM org.apache.solr.core.SolrCore execute
> INFO: [collection1] webapp=/solr path=/dataimport params={command=status}
> status=0 QTime=0
> Sep 07, 2012 4:09:00 PM org.apache.solr.core.SolrCore execute
> INFO: [collection1] webapp=/solr path=/dataimport params={command=status}
> status=0 QTime=0
> Sep 07, 2012 4:09:02 PM org.apache.solr.core.SolrCore execute
> INFO: [collection1] webapp=/solr path=/dataimport params={command=status}
> status=0 QTime=0
> Sep 07, 2012 4:09:06 PM org.apache.solr.core.SolrCore execute
> INFO: [collection1] webapp=/solr path=/dataimport params={command=status}
> status=0 QTime=0
> Sep 07, 2012 4:09:08 PM org.apache.solr.core.SolrCore execute
> INFO: [collection1] webapp=/solr path=/dataimport params={command=status}
> status=0 QTime=0
> S

Accidental multivalued fields?

2012-09-14 Thread Travis Low
Greetings.  I am using Solr 3.4.0 with tomcat 7.0.22.  I've been using
these versions successfully for a while, but on my latest project, I cannot
sort ANY field without getting this exception:

SEVERE: org.apache.solr.common.SolrException: can not sort on multivalued
field: id
at
org.apache.solr.schema.SchemaField.checkSortability(SchemaField.java:161)
at org.apache.solr.schema.TrieField.getSortField(TrieField.java:126)
at
org.apache.solr.schema.SchemaField.getSortField(SchemaField.java:144)
at
org.apache.solr.search.QueryParsing.parseSort(QueryParsing.java:385)
at org.apache.solr.search.QParser.getSort(QParser.java:251)
at
org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:82)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:173)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
[snip]

The thing is, I have only one multivalued field in my schema, or at least I
thought so.  I even tried sorting on id, which is the unique key, and got
the same error.  Here are the fields in my schema:

[field definitions stripped by the list archive]

<uniqueKey>id</uniqueKey>

I can post the entire schema.xml if need be.  Can anyone please tell me
what's going on?

cheers,

Travis



Re: Accidental multivalued fields?

2012-09-14 Thread Travis Low
Thanks much!  It was the schema version attribute -- the recycled
schema.xml I used did not contain that very useful comment.  Everything
works great now!

On Fri, Sep 14, 2012 at 1:56 PM, Chris Hostetter
wrote:

>
> : Greetings.  I am using Solr 3.4.0 with tomcat 7.0.22.  I've been using
> : these versions successfully for a while, but on my latest project, I
> cannot
> : sort ANY field without getting this exception:
> :
> : SEVERE: org.apache.solr.common.SolrException: can not sort on multivalued
>
> ...
>
> : The thing is, I have only one multivalued field in my schema, at least, I
> : thought so.  I even tried sorting on id, which is the unique key, and got
> : the same error.  Here are the fields in my schema:
>
> a) multiValued can be set on fieldType and is then inherited by the fields
>
> b) Check the "version" property on your  tag.  If the value is
> "1.0" then all fields are assumed to be multiValued.
>
> Here's the comment from the example schema included with Solr 3.4...
>
> <!-- version="x.y" is Solr's version number for the schema syntax and
>      semantics:
>      1.0: multiValued attribute did not exist, all fields are multiValued
>           by nature
>      1.1: multiValued attribute introduced, false by default -->
>
>
> -Hoss
>





Re: Items disappearing from Solr index

2012-09-26 Thread Travis Low
That makes sense on the surface, but Kissue makes a good point.  Shouldn't
the delete match the same documents as the search?  He said no documents
come back when he searches on the phrase, but documents are deleted when he
uses the same phrase.
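
If the goal is simply to make the delete parse the same way as the search, a
safe habit is to escape the value before sending it -- a sketch using SolrJ's
ClientUtils (the catalogueId field and "Emory Labs" value are from Kissue's
example; the helper below is hypothetical):

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.util.ClientUtils;

    public class SafeDelete {
        // Delete every doc in one catalogue, treating the name as a literal.
        public static void deleteCatalogue(SolrServer solr, String name)
                throws Exception {
            // escapeQueryChars escapes Lucene metacharacters and whitespace,
            // so "Emory Labs" stays one term in catalogueId instead of
            // parsing as catalogueId:Emory OR <defaultField>:Labs.
            solr.deleteByQuery("catalogueId:"
                    + ClientUtils.escapeQueryChars(name));
        }
    }

Called as deleteCatalogue(getSolrServer(), "Emory Labs"), this should delete
exactly the documents that a quoted search for the same value matches.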

cheers,
Travis

On Wed, Sep 26, 2012 at 9:37 AM, Jack Krupansky wrote:

> It is looking for documents with "Emory" in the specified field OR "Labs"
> in the default search field.
>
> -- Jack Krupansky
>
> -Original Message- From: Kissue Kissue
> Sent: Wednesday, September 26, 2012 7:47 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Items disappearing from Solr index
>
> I have just solved this problem.
>
> We have a field called catalogueId. One possible value for this field could
> be "Emory Labs". I found out that when the following delete by query is
> sent to solr:
>
> getSolrServer().deleteByQuery(catalogueId + ":" + Emory Labs)  [Notice that
> there are no quotes surrounding the catalogueId value - Emory Labs]
>
> For some reason this delete by query ends up deleting the contents of some
> other random catalogues too, which is why we are losing items from the
> index. When the query is changed to:
>
> getSolrServer().deleteByQuery(catalogueId + ":" + "Emory Labs"), then it
> starts to correctly delete only items in the Emory Labs catalogue.
>
> So my first question is, what exactly does deleteByQuery do in the first
> query without the quotes? How is it determining which catalogues to delete?
>
> Secondly, shouldn't the correct behaviour be not to delete anything at all
> in this case since when a search is done for the same catalogueId without
> the quotes it just simply returns no results?
>
> Thanks.
>
>
> On Mon, Sep 24, 2012 at 3:12 PM, Kissue Kissue 
> wrote:
>
>  Hi Erick,
>>
>> Thanks for your reply. Yes i am using delete by query. I am currently
>> logging the number of items to be deleted before handing off to solr. And
>> from solr logs i can it deleted exactly that number. I will verify
>> further.
>>
>> Thanks.
>>
>>
>> On Mon, Sep 24, 2012 at 1:21 PM, Erick Erickson wrote:
>>
>>  How do you delete items? By ID or by query?
>>>
>>> My guess is that one of two things is happening:
>>> 1> your delete process is deleting too much data.
>>> 2> your index process isn't indexing what you think.
>>>
>>> I'd add some logging to the SolrJ program to see what
>>> it thinks is has deleted or added to the index and go from there.
>>>
>>> Best
>>> Erick
>>>
>>> On Mon, Sep 24, 2012 at 6:55 AM, Kissue Kissue 
>>> wrote:
>>> > Hi,
>>> >
>>> > I am running Solr 3.5, using SolrJ and using StreamingUpdateSolrServer
>>> to
>>> > index and delete items from solr.
>>> >
>>> > I basically index items from the db into solr every night. Existing
>>> items
>>> > can be marked for deletion in the db and a delete request sent to solr
>>> to
>>> > delete such items.
>>> >
>>> > My process runs as follows every night:
>>> >
>>> > 1. Check if items have been marked for deletion and delete from solr. I
>>> > commit and optimize after the entire solr deletion runs.
>>> > 2. Index any new items to solr. I commit and optimize after all the new
>>> > items have been added.
>>> >
>>> > Recently i started noticing that huge chunks of items that have not
>>> > been
>>> > marked for deletion are disappearing from the index. I checked the solr
>>> > logs and the logs indicate that it is deleting exactly the number of
>>> items
>>> > requested but still a lot of other items disappear from the index from
>>> time
>>> > to time. Any ideas what might be causing this or what i am doing wrong.
>>> >
>>> >
>>> > Thanks.
>>>
>>>
>>
>>
>




Re: PHP client for a web application

2012-10-03 Thread Travis Low
Hi Esteban,

A year ago, we tried to use the Apache Solr PECL extension but we were
unable to install it, much less use it.  We ended up using the
Solr-PHP-client.  It has worked perfectly for a year and we have had zero
problems with it.  We haven't tried using Solarium.  Good luck!

cheers,

Travis

On Tue, Oct 2, 2012 at 3:37 PM, Esteban Cacavelos <
estebancacave...@gmail.com> wrote:

> Hi, I'm starting a web application using solr as a search engine. The web
> site will be developed in PHP (maybe I'll use a framework also).
>
> I would like to know some thoughts and opinions about the clients (
> http://wiki.apache.org/solr/SolPHP). I didn't like very much the PHP
> extension option because I think this is a limitation. So, I would like to
> read opinions about SOLARIUM and SOLR-PHP-CLIENT.
>
>
> Thanks in advance!
>
>
> --
> Esteban L. Cacavelos de Amoriza
> Cel: 0981 220 429
>





Re: Urgent Help Needed: Solr Data import problem

2012-10-30 Thread Travis Low
Like Amit said, this appears not to be a Solr problem. From the command
line of your machine, try this:

mysql -u'readonly' -p'readonly' -h'10.86.29.32' hpcms_db_new

If that works, and 10.86.29.32 is the server referenced by the URL in your
data-config.xml, then at least you know you have database connectivity, and
to the right server.

Also, if your unix server (presumably your mysql server) is 10.86.29.32,
then the URL in your data-config.xml is pointing to the wrong machine.  If
the one in the data-config.xml is correct, you need to test for
connectivity to that machine instead.
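
And if the command-line client works but Solr still can't connect, it's worth
testing the exact JDBC URL from a throwaway Java class, since that is what
DataImportHandler actually uses (the URL and credentials below are
placeholders -- substitute the ones from your data-config.xml):

    import java.sql.Connection;
    import java.sql.DriverManager;

    public class JdbcPing {
        public static void main(String[] args) throws Exception {
            // Same driver and URL that DataImportHandler will use.
            Class.forName("com.mysql.jdbc.Driver");
            Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://10.86.29.32:3306/hpcms_db_new",
                    "readonly", "readonly");
            System.out.println("connected: " + !conn.isClosed());
            conn.close();
        }
    }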

cheers,

Travis

On Tue, Oct 30, 2012 at 5:15 AM, kunal sachdeva wrote:

> Hi,
>
> This is my data-config file:-
>
> <dataConfig>
>   <dataSource ... />
>   <document>
>     <entity name="package"
>             query="select concat('pckg', id) as id,pkg_name,updated_time
>                    from hp_package_info;">
>       ...
>     </entity>
>     <entity name="destination"
>             query="select name,id from hp_city">
>       ...
>     </entity>
>   </document>
> </dataConfig>
>
>
> and password is not null. and 10.86.29.32 is my unix server ip.
>
> regards,
> kunal
>
> On Tue, Oct 30, 2012 at 2:42 PM, Dave Stuart  wrote:
>
> > It looks as though you have a password set on your unix server. you will
> > need to either remove this or to add the password into the connection
> > string
> >
> > e.g. readonly:[yourpassword]@'10.86.29.32'
> >
> >
> >
> > >> 'readonly'@'10.86.29.32'
> > >> (using password: NO)"
> > On 30 Oct 2012, at 09:08, kunal sachdeva wrote:
> >
> > > Hi,
> > >
> > > I'm not getting this error while running in local machine. Please Help
> > >
> > > Regards,
> > > Kunal
> > >
> > > On Tue, Oct 30, 2012 at 10:32 AM, Amit Nithian 
> > wrote:
> > >
> > >> This looks like a MySQL permissions problem and not a Solr problem.
> > >> "Caused by: java.sql.SQLException: Access denied for user
> > >> 'readonly'@'10.86.29.32'
> > >> (using password: NO)"
> > >>
> > >> I'd advise reading your stack traces a bit more carefully. You should
> > >> check your permissions or if you don't own the DB, check with your DBA
> > >> to find out what user you should use to access your DB.
> > >>
> > >> - Amit
> > >>
> > >> On Mon, Oct 29, 2012 at 9:38 PM, kunal sachdeva
> > >>  wrote:
> > >>> Hi,
> > >>>
> > >>> I have tried using data-import in my local system. I was able to
> > execute
> > >> it
> > >>> properly. but when I tried to do it unix server I got following
> error:-
> > >>>
> > >>>
> > >>> INFO: Starting Full Import
> > >>> Oct 30, 2012 9:40:49 AM
> > >>> org.apache.solr.handler.dataimport.SimplePropertiesWriter
> > >>> readIndexerProperties
> > >>> WARNING: Unable to read: dataimport.properties
> > >>> Oct 30, 2012 9:40:49 AM org.apache.solr.update.DirectUpdateHandler2
> > >>> deleteAll
> > >>> INFO: [core0] REMOVING ALL DOCUMENTS FROM INDEX
> > >>> Oct 30, 2012 9:40:49 AM org.apache.solr.core.SolrDeletionPolicy
> onInit
> > >>> INFO: SolrDeletionPolicy.onInit: commits:num=1
> > >>>
> > >>>
> > >>
> >
> commit{dir=/opt/testsolr/multicore/core0/data/index,segFN=segments_1,version=1351490646879,generation=1,filenames=[segments_1]
> > >>> Oct 30, 2012 9:40:49 AM org.apache.solr.core.SolrDeletionPolicy
> > >>> updateCommits
> > >>> INFO: newest commit = 1351490646879
> > >>> Oct 30, 2012 9:40:49 AM
> > >> org.apache.solr.handler.dataimport.JdbcDataSource$1
> > >>> call
> > >>> INFO: Creating a connection for entity destination with URL:
> > >> jdbc:mysql://
> > >>> 172.16.37.160:3306/hpcms_db_new
> > >>> Oct 30, 2012 9:40:50 AM org.apache.solr.common.SolrException log
> > >>> SEVERE: Exception while processing: destination document :
> > >>>
> > >>
> >
> SolrInputDocument[{}]:org.apache.solr.handler.dataimport.DataImportHandlerException:
> > >>> Unable to execute query: select name,id from hp_city Processing
> > Document
> > >> # 1
> > >>>at
> > >>>
> > >>
> >
> org.apache.solr.handler.dataimport.DocBu

Re: Urgent Help Needed: Solr Data import problem

2012-10-30 Thread Travis Low
We're getting a little far afield...but here is the incantation:

mysql> grant all on DBNAME.* to 'USER'@'IP-ADDRESS' identified by
'PASSWORD';
mysql> flush privileges;

cheers,

Travis

On Tue, Oct 30, 2012 at 2:40 PM, Amit Nithian  wrote:

> This error is typically because of a mysql permissions problem. These
> are usually resolved by a GRANT statement on your DB to allow for
> users to connect remotely to your database server.
>
> I don't know the full syntax but a quick search on Google should yield
> what you are looking for. If you don't control access to this DB, talk
> to your sys admin who does maintain this access and s/he should be
> able to help resolve this.
>
> On Tue, Oct 30, 2012 at 7:13 AM, Travis Low  wrote:
> > Like Amit said, this appears not to be a Solr problem. From the command
> > line of your machine, try this:
> >
> > mysql -u'readonly' -p'readonly' -h'10.86.29.32' hpcms_db_new
> >
> > If that works, and 10.86.29.32 is the server referenced by the URL in
> your
> > data-config.xml problem, then at least you know you have database
> > connectivity, and to the right server.
> >
> > Also, if your unix server (presumably your mysql server) is 10.86.29.32,
> > then the URL in your data-config.xml is pointing to the wrong machine.
>  If
> > the one in the data-config.xml is correct, you need to test for
> > connectivity to that machine instead.
> >
> > cheers,
> >
> > Travis
> >
> > On Tue, Oct 30, 2012 at 5:15 AM, kunal sachdeva <
> kunalsachde...@gmail.com>wrote:
> >
> >> Hi,
> >>
> >> This is my data-config file:-
> >>
> >> <dataConfig>
> >>   <dataSource ... />
> >>   <document>
> >>     <entity name="package"
> >>             query="select concat('pckg', id) as id,pkg_name,updated_time
> >>                    from hp_package_info;">
> >>       ...
> >>     </entity>
> >>     <entity name="destination"
> >>             query="select name,id from hp_city">
> >>       ...
> >>     </entity>
> >>   </document>
> >> </dataConfig>
> >>
> >>
> >> and password is not null. and 10.86.29.32 is my unix server ip.
> >>
> >> regards,
> >> kunal
> >>
> >> On Tue, Oct 30, 2012 at 2:42 PM, Dave Stuart 
> wrote:
> >>
> >> > It looks as though you have a password set on your unix server. you will
> >> > need to either remove this or to add the password into the connection
> >> > string
> >> >
> >> > e.g. readonly:[yourpassword]@'10.86.29.32'
> >> >
> >> >
> >> >
> >> > >> 'readonly'@'10.86.29.32'
> >> > >> (using password: NO)"
> >> > On 30 Oct 2012, at 09:08, kunal sachdeva wrote:
> >> >
> >> > > Hi,
> >> > >
> >> > > I'm not getting this error while running in local machine. Please
> Help
> >> > >
> >> > > Regards,
> >> > > Kunal
> >> > >
> >> > > On Tue, Oct 30, 2012 at 10:32 AM, Amit Nithian 
> >> > wrote:
> >> > >
> >> > >> This looks like a MySQL permissions problem and not a Solr problem.
> >> > >> "Caused by: java.sql.SQLException: Access denied for user
> >> > >> 'readonly'@'10.86.29.32'
> >> > >> (using password: NO)"
> >> > >>
> >> > >> I'd advise reading your stack traces a bit more carefully. You
> should
> >> > >> check your permissions or if you don't own the DB, check with your
> DBA
> >> > >> to find out what user you should use to access your DB.
> >> > >>
> >> > >> - Amit
> >> > >>
> >> > >> On Mon, Oct 29, 2012 at 9:38 PM, kunal sachdeva
> >> > >>  wrote:
> >> > >>> Hi,
> >> > >>>
> >> > >>> I have tried using data-import in my local system. I was able to
> >> > execute
> >> > >> it
> >> > >>> properly. but when I tried to do it unix server I got following
> >> error:-
> >> > >>>
> >> > >>>
> >> > >>> INFO: Starting Full Import
> >> > >>> Oct 30, 2012 9:40:49 AM
> &

Re: Dynamic core selection

2012-11-01 Thread Travis Low
If I understand you correctly, you would use a multicore setup and send the
request to http://server.com/solr/core0 in one case, and
http://server.com/solr/core1 in the other.
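
In SolrJ terms that routing is a one-liner -- a minimal sketch, with the base
URL and the request flag both made up:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class CoreRouter {
        // Hypothetical flag: true only for the rare full-archive search.
        static SolrServer pick(boolean searchArchive) {
            String core = searchArchive ? "core1" : "core0";
            return new HttpSolrServer("http://server.com/solr/" + core);
        }

        public static void main(String[] args) throws Exception {
            QueryResponse rsp = pick(false).query(new SolrQuery("*:*"));
            System.out.println(rsp.getResults().getNumFound());
        }
    }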

Is there something else that makes this complicated?

cheers,

Travis

On Thu, Nov 1, 2012 at 12:08 PM, Dzmitry Petrushenka wrote:

> Hi All!
>
> I need to be able to send requests to 2 different cores based on the value
> of some request parameter.
>
> First core (active) contains most recent docs. This core is used in 99% of
> cases.
>
> The second core (which has 100-1000 times more docs than the active core) is
> used in 0.1% of cases.
>
> We wrote our own search handler (mostly based on the standard one but
> handling our own custom params) and I wonder if there is a way to customize
> Solr so we could direct calls to the required core based on the request
> params the user passes?
>
> Any help would be helpful.
>
> Thanx,
>





Dynamically augment search with data

2010-05-24 Thread Travis Chase
So my need is this:

I have a site in which a user does a query for other users. The user can filter
the query by different parameters that will limit the result set. One of the
things about the system is that users can like different objects (Products,
Services, etc.). When a user's search returns a list of users, I want to
calculate the "shared likes" between the searching user and each user in the
result set, append that count to each result, and sort by the greatest number
of "shared likes", thereby making the results more relevant to the user. This
calculation needs to run before the paging process kicks in, so that it applies
to the whole result set rather than to a single page.

I am using Solr 1.4 and have read just a little on FunctionQuery. Is this what
I need to perform this task?
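
Even without a custom FunctionQuery, a dismax boost query might approximate
this at query time. A sketch of what I'm imagining, assuming each indexed user
carries a multivalued "likes" field of liked object IDs (the field name and
the boost factor are my own guesses):

    import org.apache.solr.client.solrj.SolrQuery;

    public class SharedLikesQuery {
        // Build a query where every like shared with the searching user
        // raises the score, so the default score sort surfaces users with
        // the most shared likes -- applied to all hits, before paging.
        static SolrQuery build(String userQuery, String[] myLikes) {
            StringBuilder bq = new StringBuilder("likes:(");
            for (int i = 0; i < myLikes.length; i++) {
                if (i > 0) bq.append(" OR ");
                bq.append('"').append(myLikes[i]).append('"');
            }
            bq.append(")^2.0");

            SolrQuery q = new SolrQuery(userQuery);
            // qf and the other dismax params omitted; configure as usual.
            q.set("defType", "dismax");
            q.set("bq", bq.toString());
            return q;
        }
    }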



Travis Chase