Please help - Solr Cell using 'stream.url'

2011-10-07 Thread Tod
I'd be happy to provide any more detail that is needed. Thanks - Tod

Re: Please help - Solr Cell using 'stream.url'

2011-10-10 Thread Tod
On 10/07/2011 6:21 PM, wrote: Hi, What Solr version? Solr Implementation Version: 1.4.1 955763M - mark - 2010-06-17 18:06:42. It's running on a SUSE Linux VM. How often do you do commits, or do you use autocommit? I had been doing commits every 100 documents (the entire set is about 3

Re: Please help - Solr Cell using 'stream.url'

2011-10-12 Thread Tod
nd stack limit. I will try this - thanks. And you should also consider upgrading to latest Solr... Is there a clearly defined migration path? - Tod

Instructions for Multiple Server Webapps Configuring with JNDI

2011-10-13 Thread Tod
"true"/> An empty /tomcat/webapps/solr0 directory exists. I expected to fire up Tomcat and have it unpack the war file contents into the solr home directory specified in the context fragment, but it's empty, as is the webapps directory. What am I doing wrong? I'm running Apache Tomcat/6.0.29. TIA - Tod

Re: Instructions for Multiple Server Webapps Configuring with JNDI

2011-10-18 Thread Tod
On 10/14/2011 2:44 PM, Chris Hostetter wrote: : modified the solr/home accordingly. I have an empty directory under : tomcat/webapps named after the solr home directory in the context fragment. if that empty directory has the same base name as your context fragment (ie: "tomcat/webapps/solr0"

java.lang.NoSuchMethodError: org.slf4j.spi.LocationAwareLogger.log

2011-10-19 Thread Tod
act. TIA - Tod

can solr follow and index hyperlinks embedded in rich text documents (pdf, doc, etc)?

2011-10-21 Thread Tod
hought I would make sure. Thanks - Tod

Re: java.lang.NoSuchMethodError: org.slf4j.spi.LocationAwareLogger.log

2011-10-21 Thread Tod
On 10/19/2011 2:58 PM, wrote: Hi Tod, I had similar issue with slf4j, but it was NoClassDefFound. Do you have some other dependencies in your application that use some other version of slf4j? You can use mvn dependency:tree to get all dependencies in your application. Or maybe there's

Batch indexing documents using ContentStreamUpdateRequest

2011-11-04 Thread Tod
nd without a .close() call - neither works. Is there a way to do this that I'm missing? Thanks - Tod

Re: Batch indexing documents using ContentStreamUpdateRequest

2011-11-04 Thread Tod
won't budge it. On 11/04/2011 12:36 PM, Tod wrote: This is a code fragment of how I am doing a ContentStreamUpdateRequest using CommonsHttpSolrServer: ContentStreamBase.URLStream csbu = new ContentStreamBase.URLStream(url); InputStream is = csbu.getStream(); FastInputStream fis

Help! - ContentStreamUpdateRequest

2011-11-14 Thread Tod
Could someone take a look at this page: http://wiki.apache.org/solr/ContentStreamUpdateRequestExample ... and tell me what code changes I would need to make to be able to stream a LOT of files at once rather than just one? It has to be something simple like a collection of some sort but I jus

Re: Help! - ContentStreamUpdateRequest

2011-11-15 Thread Tod
fer to get documents it needs to index in chunks rather than one at a time as I'm doing now. The one at a time approach is locking up the Solr server at around 700 entries. My thought was if I could chunk them in a batch at a time the lockup will stop and indexing performance would improve. T
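A minimal sketch of the chunking idea described in this message, assuming the documents are identified by a list of URLs. The class name `BatchPlanner` and the batch size are hypothetical; sending each batch to Solr and committing after it are left to the caller.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Solr or SolrJ.
public class BatchPlanner {

    // Split items into consecutive batches holding at most batchSize entries each.
    public static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            int end = Math.min(i + batchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(i, end)));
        }
        return batches;
    }
}
```

A caller could loop over the returned batches, index every document in one batch, and commit once per batch rather than once per document.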

Re: Help! - ContentStreamUpdateRequest

2011-11-16 Thread Tod
hing extraordinarily dumb. I'll be happy to share any information about my environment or configuration if it will help find my error. Thanks for all of your help. - Tod On 11/15/2011 8:08 PM, Erick Erickson wrote: That's odd. What are your autocommit parameters? And are you eith

Indexing Using XML Message

2012-01-25 Thread Tod
most appropriate way to accomplish this? I could use the Tika CLI to generate XML but I'm not sure it would work or that it's the most efficient way to handle things. Can anyone offer some suggestions? Thanks - Tod

Data Import Handler Rich Format Documents

2010-06-18 Thread Tod
any coding and maybe without even needing to use Nutch. I'm using the current release version of Solr. Thanks in advance. - Tod

Re: Data Import Handler Rich Format Documents

2010-06-18 Thread Tod
On 6/18/2010 9:12 AM, Otis Gospodnetic wrote: Tod, You didn't mention Tika, which makes me think you are not aware of it... You could implement a custom Transformer that uses Tika to perform rich doc text extraction, just like ExtractingRequestHandler does it (see http://wiki.apache.org

Re: Data Import Handler Rich Format Documents

2010-06-18 Thread Tod
On 6/18/2010 11:24 AM, Otis Gospodnetic wrote: Tod, I don't think DIH can do that, but who knows, let's see what others say. Yes, Nutch uses TIKA, too. Otis Looks like the ExtractingRequestHandler uses Tika as well. I might just use this but I'm wondering if there

Re: Data Import Handler Rich Format Documents

2010-06-18 Thread Tod
database and utilize cURL and the Solr ExtractingRequestHandler to push everything into the index. I just wanted to see what everybody else is doing and what my other options might be. Thanks - Tod Ref: http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Searching-rich-format-documents-stored-DBMS

Re: Data Import Handler Rich Format Documents

2010-06-22 Thread Tod
"my_database_url" section to an existing (working) database entity to be able to have Tika index the content pointed to by the content_url. Is there anything obviously wrong with what I've tried so far? Thanks - Tod

Indexing Rich Format Documents using Data Import Handler (DIH) and the TikaEntityProcessor

2010-06-23 Thread Tod
so far because this is not working, it keeps rolling back with the error above. Thanks - Tod

Re: Data Import Handler Rich Format Documents

2010-07-06 Thread Tod
guess, would be after I checked out and built from trunk? Thanks - Tod

Supplementing already indexed data

2010-07-11 Thread Tod
I'm getting metadata from a RDB but the actual content is stored somewhere else. I'd like to index the content too but I don't want to overlay the already indexed metadata. I know this can be done but I just can't seem to dig up the correct docs, can anyone point me in the right direction?

Solrj ContentStreamUpdateRequest Slow

2010-08-04 Thread Tod
local pdf file, there are no firewall issues, solr is running on the same machine, and I tried the actual host name in addition to localhost but nothing helps. Thanks - Tod http://wiki.apache.org/solr/ContentStreamUpdateRequestExample

Re: Solrj ContentStreamUpdateRequest Slow

2010-08-06 Thread Tod
SOLR server I am using is really a workstation class machine, plus I am still learning. I have a feeling I'm doing something dumb but just can't seem to pinpoint the exact problem. Thanks - Tod code--- import java.io.File; import java.io.IOException; import

Re: Solrj ContentStreamUpdateRequest Slow

2010-08-13 Thread Tod
a way to tell Solr where the document lived so it could go out and stream it into the index for me. That's where I thought StreamingUpdateSolrServer would help. - Tod

Re: Solrj ContentStreamUpdateRequest Slow

2010-08-18 Thread Tod
server.mydomain.com/test.pdf&stream.contentType=application/pdf&literal.content_id=12342&commit=true' ... works fine - I just want to do it a LOT and as efficiently as possible. If I have to I can wrap it in a perl script and run a cURL or LWP loop but I'd prefer to use SolrJ if I can. Thanks for all your help. - Tod
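The request above can be generated for many documents by building the URL in code. This is a sketch under assumptions: the Solr base URL and the `/update/extract` handler path are not visible in the truncated message and are assumed here, and `ExtractUrlBuilder` is a hypothetical name. Only the parameter names (`stream.url`, `stream.contentType`, `literal.content_id`, `commit`) come from the message. As far as I know, `stream.url` also requires remote streaming to be enabled in solrconfig.xml.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of Solr or SolrJ.
public class ExtractUrlBuilder {

    // Build one extract request URL; solrBase is assumed, e.g. "http://localhost:8983/solr".
    public static String build(String solrBase, String docUrl, String contentType,
                               String contentId, boolean commit) {
        return solrBase + "/update/extract"
                + "?stream.url=" + encode(docUrl)
                + "&stream.contentType=" + encode(contentType)
                + "&literal.content_id=" + encode(contentId)
                + "&commit=" + commit;
    }

    // Percent-encode a parameter value as UTF-8.
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }
}
```

Passing `commit=false` for every document and committing once at the end of a batch avoids a commit per request.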

Re: Solrj ContentStreamUpdateRequest Slow

2010-08-19 Thread Tod
olrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243) at CommonTest.indexFilesSolrCell(CommonTest.java:59) at CommonTest.main(CommonTest.java:26) ... which is pointing to the solr.request(req) line. Thanks - Tod

Re: Data Import Handler Rich Format Documents

2010-09-24 Thread Tod
ill work I just needed something quickly and don't have the seasoned experience the other developers do. - Tod

UpdateXmlMessage

2010-10-01 Thread Tod
I can do this using GET: http://localhost:8983/solr/update?stream.body=%3Cdelete%3E%3Cquery%3Eoffice:Bridgewater%3C/query%3E%3C/delete%3E http://localhost:8983/solr/update?stream.body=%3Ccommit/%3E ... but can I pass a stream.url parameter using an UpdateXmlMessage? I looked at the schema and
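The GET requests above carry a percent-encoded XML update message in `stream.body`. A small sketch of producing that encoding in code follows; `StreamBodyUrl` is a hypothetical name, and the update URL is taken from the message. `URLEncoder` also encodes `/` and `:`, so its output differs textually from the hand-written URLs above but decodes to the same XML.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of Solr or SolrJ.
public class StreamBodyUrl {

    // Append an XML update message to the update URL as a stream.body parameter.
    public static String forXml(String updateUrl, String xml) {
        try {
            return updateUrl + "?stream.body=" + URLEncoder.encode(xml, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }
}
```

For example, `StreamBodyUrl.forXml("http://localhost:8983/solr/update", "<commit/>")` yields the commit request shown in the message, with the slash encoded as `%2F`.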

Re: UpdateXmlMessage

2010-10-04 Thread Tod
can't or that I can? If I can, I'm doing something wrong. I'm specifying stream.url as its own field in the XML like: I am the author I am the title http://www.test.com/myOfficeDoc.doc . . . The wiki docs were a little sparse on this one. - Tod Tod wrote:

Overriding Tika's field processing

2010-10-28 Thread Tod
ry returns more than I have in the CMS. Am I understanding the 'literal.title' processing correctly? Does anybody have experience/suggestions on how to handle this? Thanks - Tod

Facet count of zero

2010-11-01 Thread Tod
ith a count of zero. All the other foo's show up with valid counts. Can I do this? Is my syntax incorrect? Thanks - Tod

Re: Facet count of zero

2010-11-01 Thread Tod
On 11/1/2010 1:03 PM, Yonik Seeley wrote: On Mon, Nov 1, 2010 at 12:55 PM, Tod wrote: I'm trying to exclude certain facet results from a facet query. It seems to work but rather than being excluded from the facet list it's returned with a count of zero. If you don't want to see 0 c

Phrase Query Problem?

2010-11-01 Thread Tod
Standards)OR(mykeywords:All)OR(mykeywords:ALL)))&start=0&indent=true&wt=json Should, with an exact match, return only one entry but it returns five some of which don't have any of the fields I've specified. I've tried this both with and without quotes. What could I be doing wrong? Thanks - Tod

Re: Phrase Query Problem?

2010-11-02 Thread Tod
On 11/1/2010 11:14 PM, Ken Stanley wrote: On Mon, Nov 1, 2010 at 10:26 PM, Tod wrote: I have a number of fields I need to do an exact match on. I've defined them as 'string' in my schema.xml. I've noticed that I get back query results that don't have all of the

Re: Phrase Query Problem?

2010-11-02 Thread Tod
ance\+With\+Conduct\+Standards)OR(mykeywords:All)OR(mykeywords:ALL))) If I tried q=(((mykeywords:"Compliance+With+Conduct+Standards")OR(mykeywords:All)OR(mykeywords:ALL))) ... it didn't work. Once I removed the quotes and escaped spaces it worked as expected. This seems odd since I would have expected the quotes to have triggered a phrase query. Thanks for your help. - Tod
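The workaround reported above, escaping the spaces instead of quoting the value, can be sketched as a small helper. `TermEscaper` is a hypothetical name, and this sketch only handles whitespace. Other query syntax characters are left untouched, and SolrJ's `ClientUtils.escapeQueryChars` is the more complete option if SolrJ is on the classpath.

```java
// Hypothetical helper, not part of Solr or SolrJ.
public class TermEscaper {

    // Put a backslash in front of every whitespace character so the value stays one term.
    public static String escapeWhitespace(String value) {
        StringBuilder sb = new StringBuilder(value.length() + 8);
        for (char c : value.toCharArray()) {
            if (Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Build a field:value clause for an exact match on a 'string' field.
    public static String termQuery(String field, String value) {
        return field + ":" + escapeWhitespace(value);
    }
}
```

The resulting clause still has to be URL-encoded before it is placed in the `q` parameter, which is where the `+` characters in the message come from.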

Chinese characters - a little OT

2010-11-10 Thread Tod
optionObj.setAttribute('value',menuVal); optCnt++; selectObj.appendChild(optionObj); } My hunch is I should UTF-8 encode the title and then try to display the result but it's not working. I am still seeing the Unicode characters. Does anyone see what I could be doing wrong? TIA - Tod

Re: Any Copy Field Caveats?

2010-11-11 Thread Tod
I've noticed that using camelCase in field names causes problems. On 11/5/2010 11:02 AM, Will Milspec wrote: Hi all, we're moving from an old lucene version to solr and plan to use the "Copy Field" functionality. Previously we had "rolled our own" implementation, sticking title, description,

Retrieving indexed content containing multiple languages

2010-11-11 Thread Tod
tailed tutorial on how to handle these types of language challenges? Thanks in advance - Tod

Upgrading Tika "in place"

2013-02-05 Thread Tod
ssary Tika jars without needing to rebuild or upgrade Solr. Is that a possibility and if so how would I go about accomplishing it? I see tika-core and tika-parsers in the 3.6.2 Solr build distro; are those the only two files I need? Thanks - Tod

Solr 3.6 parsing and extraction files

2012-04-18 Thread Tod
? Thanks - Tod

Re: Retrieving indexed content containing multiple languages

2010-11-16 Thread Tod
he appropriate document using English and Chinese. If someone could check my math I would appreciate it. If it looks reasonable and there is nothing else written about it on the wiki I'll create a tutorial to give everybody else a leg up. - Tod - Original Message

Opensearch Format Support

2011-01-20 Thread Tod
Does Solr support the Opensearch format? If so could someone point me to the correct documentation? Thanks - Tod

Term Vector Query on Single Document

2011-02-16 Thread Tod
? I'm thinking 'yes' - How expensive is setting the termVector on a field? Thanks - Tod

Can ExtractingRequestHandler ignore documents metadata

2011-05-09 Thread Tod
I'm indexing content from a CMS' database of metadata. The client would prefer that Solr exclude the properties (metadata) of any documents being indexed. Is there a way to tell Tika to only index a document's text and not its properties? Thanks - Tod

Indexing Mediawiki

2011-06-07 Thread Tod
e and would be better off dumping and indexing the wiki instead? Thanks - Tod

Tika Jax-RS and DIH

2011-06-22 Thread Tod
you to get something out to the group. Thanks - Tod

Default schema - 'keywords' not multivalued

2011-06-27 Thread Tod
This was a little curious to me and I wondered what the thought process was behind it before I decide to change it. Thanks - Tod

Re: Default schema - 'keywords' not multivalued

2011-06-28 Thread Tod
On 06/27/2011 11:23 AM, lee carroll wrote: Hi Tod, A list of keywords would be fine in a non multi valued field: keywords : "xxx yyy sss aaa " multi value field would allow you to repeat the field when indexing keywords: "xxx" keywords: "yyy" keywords: "

Re: Default schema - 'keywords' not multivalued

2011-06-29 Thread Tod
upgrading to 3.2? I have a pretty straightforward Tomcat install; would just dropping in the new war suffice? - Tod

multiple webapps vs multi-core vs distributed

2011-06-30 Thread Tod
lack of a unified schema might throw a monkey wrench into the mix, limiting the available solutions. Does anyone have a similar experience that they would be willing to share? It's early enough in the project life cycle that alternative ideas can be considered. I'd be interested to hear others' opinions. TIA - Tod

tika.parser.AutoDetectParser

2011-07-01 Thread Tod
Solr build now but this message seems to contradict that unless I'm missing a jar somewhere. I've got both dataimporthandler jar files in my WEB-INF/lib dir so not sure what I could be missing. Any ideas? Thanks - Tod

Re: tika.parser.AutoDetectParser

2011-07-01 Thread Tod
On 07/01/2011 12:59 PM, Shawn Heisey wrote: On 7/1/2011 9:23 AM, Tod wrote: I'm working on upgrading to v3.2 from v1.4.1. I think I've got everything working but when I try to do a data import using dataimport.jsp I'm rolling back and getting a class not found exception on the a

ContentStreamLoader Problem

2011-07-12 Thread Tod
nder Tomcat. I already have an existing 1.4.1 instance running, could that be causing the problem? Thanks - Tod Jul 12, 2011 1:11:31 PM org.apache.solr.update.processor.LogUpdateProcessor finish INFO: {} 0 1 Jul 12, 2011 1:11:31 PM org.apache.solr.common.SolrExcep

Re: ContentStreamLoader Problem

2011-07-13 Thread Tod
't have 1.4.1 on it? If that works, then it's likely a classpath issue Best Erick I'll give it a shot and report back. Thanks - Tod

Most current Tika jar files that work with Solr 1.4.1

2011-08-17 Thread Tod
What is the latest version of Tika that I can use with Solr 1.4.1? It comes packaged with 0.4. I tried 0.8 and it doesn't work.

Solr read timeout

2011-08-18 Thread Tod
the Solr config but would like to focus them first on resolving this problem rather than blanket tweaking the entire config. Is there anything in particular I should look at? Can I provide any more information? Thanks - Tod

JSON formatted response from SOLR question....

2010-05-10 Thread Tod
I apologize, this is such a JSON/JavaScript question but I'm stuck and am not finding any resources that address this specifically. I'm doing a faceted search and getting back in my facet_counts.facet_fields response an array of countries. I'm gathering the count of the array elements retur

Re: JSON formatted response from SOLR question....

2010-05-11 Thread Tod
Jon, Yes!!! rsp.facet_counts.facet_fields.['var'].length to rsp.facet_counts.facet_fields[var].length and voila. Tripped up on a syntax error, how special. Just needed another set of eyes - thanks. VelocityResponse duly noted, it will come in handy later. - Tod On 5/10/2010 4:

Compile problems with anonymous SimpleCollector in custom request handler

2017-11-29 Thread Tod Olson
Java 1.8 and Solr 6.4.2. There are two things I do not understand. First: [javac] /Users/tod/src/vufind-browse-handler/browse-handler/java/org/vufind/solr/handler/BrowseRequestHandler.java:445: error: is not abstract and does not override abstract method setNextReader(AtomicReaderContex

Re: Compile problems with anonymous SimpleCollector in custom request handler

2017-11-30 Thread Tod Olson
build.xml: Classpath: ${classpathProp} -Tod On Nov 29, 2017, at 6:00 PM, Shawn Heisey <apa...@elyograg.org> wrote: On 11/29/2017 2:27 PM, Tod Olson wrote: I'm modifying an existing custom request handler for an open source project, and am looking for some help with a co

Debugging custom RequestHandler: spinning up a core for debugging

2017-12-15 Thread Tod Olson
logger.info("Solr core loaded!"); } @AfterClass public static void cleanUpClass() { core.close(); container.shutdown(); logger.info("Solr core shut down!"); } } The test, run through ant, fails as follows: [juni

Re: Debugging custom RequestHandler: spinning up a core for debugging

2017-12-22 Thread Tod Olson
Thanks, that pointed me in the right direction! The problem was an ancient ICU library in the distributed code. -Tod On Dec 15, 2017, at 5:15 PM, Erick Erickson <erickerick...@gmail.com> wrote: My guess is this isn't a Solr issue at all; you are somehow using