Unknown query parser 'terms' with TermsComponent defined

2015-08-25 Thread P Williams
Hi,

We've encountered a strange situation, and I'm hoping someone might be able to
shed some light. We're using Solr 4.9 deployed in Tomcat 7.

We build a query that has these params:

'params'=>{
  'fl'=>'id',
  'sort'=>'system_create_dtsi asc',
  'indent'=>'true',
  'start'=>'0',
  'q'=>'_query_:"{!raw f=has_model_ssim}Batch" AND ({!terms
f=id}ft849m81z)',
  'qt'=>'standard',
  'wt'=>'ruby',
  'rows'=>['1',
'1000']}},

And it responds with an error message:
'error'=>{

'msg'=>'Unknown query parser \'terms\'',
'code'=>400}}

The terms component is defined in solrconfig.xml:

  <searchComponent name="termsComponent" class="solr.TermsComponent"/>

  <requestHandler name="/terms" class="solr.SearchHandler" startup="lazy">
    <lst name="defaults">
      <bool name="terms">true</bool>
    </lst>
    <arr name="components">
      <str>termsComponent</str>
    </arr>
  </requestHandler>

And the Standard Response Handler is defined:

  <requestHandler name="standard" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="defType">lucene</str>
    </lst>
  </requestHandler>

In case it's useful, we have:

  <luceneMatchVersion>4.9</luceneMatchVersion>

Why would we be getting the "Unknown query parser \'terms\'" error?

Thanks,
Tricia


Re: Unknown query parser 'terms' with TermsComponent defined

2015-08-25 Thread P Williams
Thanks Hoss! It's obvious what the problem(s) are when you lay it all out
that way.
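
For the archive, the fix amounts to swapping the parser name in the q
parameter of the original request (everything else stays as it was),
roughly:

  'q'=>'_query_:"{!raw f=has_model_ssim}Batch" AND ({!term f=id}ft849m81z)',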

On Tue, Aug 25, 2015 at 12:14 PM, Chris Hostetter 
wrote:

>
> 1) The "terms" Query Parser (TermsQParser) has nothing to do with the
> "TermsComponent" (the first is for quering many distinct terms, the
> later is for requesting info about low level terms in your index)
>
>
> https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-TermsQueryParser
> https://cwiki.apache.org/confluence/display/solr/The+Terms+Component
>
> 2) TermsQParser (which is what you are trying to use with the "{!terms..."
> query syntax) was not added to Solr until 4.10
>
> 3) based on your example query, i'm pretty sure what you want is the
> TermQParser: "term" (singular, no "s") ...
>
>
> https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-TermQueryParser
>
> {!term f=id}ft849m81z
>
>
> : We've encountered a strange situation, I'm hoping someone might be able
> to
> : shed some light. We're using Solr 4.9 deployed in Tomcat 7.
> ...
> :   'q'=>'_query_:"{!raw f=has_model_ssim}Batch" AND ({!terms
> f=id}ft849m81z)',
> ...
> : 'msg'=>'Unknown query parser \'terms\'',
> : 'code'=>400}}
>
> ...
>
> : The terms component is defined in solrconfig.xml:
> :
> :   
>
> -Hoss
> http://www.lucidworks.com/
>


How to sync lib directory in SolrCloud?

2014-07-31 Thread P Williams
Hi,

I have an existing collection that I'm trying to add to a new SolrCloud.
 This collection has all the normal files in conf but also has a lib
directory to support the filters schema.xml uses.

wget
https://github.com/projectblacklight/blacklight-jetty/archive/v4.9.0.zip
unzip v4.9.0.zip

I add the configuration to Zookeeper

cd /solr-4.9.0/example/scripts
cloud-scripts/zkcli.sh -cmd upconfig -confname blacklight -zkhost
zk1:2181,zk2:2181,zk3:2181 -confdir
~/blacklight-jetty-4.9.0/solr/blacklight-core/conf/

I try to create the collection
curl "
http://solr1:8080/solr/admin/collections?action=CREATE&name=blacklight&numShards=3&collection.configName=blacklight&replicationFactor=2&maxShardsPerNode=2
"

but it looks like the jars in the lib directory aren't available and this
is what is causing my collection creation to fail.  I guess this makes
sense because it's not one of the files that I added to Zookeeper to share.
 How do I share the lib directory via Zookeeper?

Thanks,
Tricia

[pjenkins@solr1 scripts]$ cloud-scripts/zkcli.sh -cmd upconfig -zkhost
zk1:2181,zk2:2181,zk3:2181 -confdir
~/blacklight-jetty-4.9.0/solr/blacklight-core/conf/ -confname blacklight
INFO  - 2014-07-31 09:28:06.289; org.apache.zookeeper.Environment; Client
environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
INFO  - 2014-07-31 09:28:06.292; org.apache.zookeeper.Environment; Client
environment:host.name=solr1.library.ualberta.ca
INFO  - 2014-07-31 09:28:06.295; org.apache.zookeeper.Environment; Client
environment:java.version=1.7.0_65
INFO  - 2014-07-31 09:28:06.295; org.apache.zookeeper.Environment; Client
environment:java.vendor=Oracle Corporation
INFO  - 2014-07-31 09:28:06.295; org.apache.zookeeper.Environment; Client
environment:java.home=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre
INFO  - 2014-07-31 09:28:06.295; org.apache.zookeeper.Environment; Client
environment:java.class.path=cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/hppc-0.5.2.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/hadoop-auth-2.2.0.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/asm-commons-4.1.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/lucene-queries-4.9.0.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/lucene-memory-4.9.0.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/commons-codec-1.9.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/lucene-join-4.9.0.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/joda-time-2.2.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/lucene-codecs-4.9.0.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/lucene-analyzers-common-4.9.0.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/hadoop-common-2.2.0.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/httpmime-4.3.1.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/hadoop-hdfs-2.2.0.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/noggit-0.5.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/lucene-analyzers-kuromoji-4.9.0.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/guava-14.0.1.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/commons-configuration-1.6.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/lucene-expressions-4.9.0.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/lucene-highlighter-4.9.0.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/hadoop-annotations-2.2.0.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/asm-4.1.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/dom4j-1.6.1.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/commons-io-2.3.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/zookeeper-3.4.6.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/spatial4j-0.4.1.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/httpcore-4.3.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/protobuf-java-2.5.0.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/lucene-spatial-4.9.0.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/lucene-grouping-4.9.0.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/lucene-misc-4.9.0.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/lucene-suggest-4.9.0.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/lucene-analyzers-phonetic-4.9.0.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/lucene-core-4.9.0.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/commons-cli-1.2.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/solr-core-4.9.0.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/solr-solrj-4.9.0.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/antlr-runtime-3.5.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/concurrentlinkedhashmap-lru-1.2.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/lucene-queryparser-4.9.0.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/org.restlet.ext.servlet-2.1.1.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/commons-fileupload-1.2.1.jar:cloud-scri

Using data-config.xml from DIH in SolrJ

2013-11-13 Thread P Williams
Hi All,

I'm building a utility (Java jar) to create SolrInputDocuments and send
them to a HttpSolrServer using the SolrJ API.  The intention is to find an
efficient way to create documents from a large directory of files (where
multiple files make one Solr document) and be sent to a remote Solr
instance for update and commit.
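
For context, the SolrJ side of the utility is essentially just this (a
minimal sketch; the URL and field names are placeholders, not our real
schema):

  import org.apache.solr.client.solrj.SolrServerException;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.common.SolrInputDocument;
  import java.io.IOException;

  public class BatchIndexer {
    public static void main(String[] args) throws IOException, SolrServerException {
      // placeholder URL for the remote Solr instance
      HttpSolrServer solr = new HttpSolrServer("http://remotehost:8983/solr/collection1");

      // one SolrInputDocument assembled from several source files (field names made up)
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "example-0001");
      doc.addField("title_tesim", "title taken from the metadata file");
      doc.addField("full_text_ts", "text taken from the OCR file");

      solr.add(doc);   // buffered server-side until the commit
      solr.commit();
      solr.shutdown();
    }
  }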

I've already solved the problem using the DataImportHandler (DIH) so I have
a data-config.xml that describes the templated fields and cross-walking of
the source(s) to the schema.  The original data won't always be able to be
co-located with the Solr server which is why I'm looking for another option.

I've also already solved the problem using Ant and XSLT to create a
temporary (and unfortunately potentially very large) document which the
UpdateHandler will accept.  I couldn't think of a solution that took
advantage of the XSLT support in the UpdateHandler because each document is
created from multiple files.  Our current, dated, Java-based solution
significantly outperforms this approach in terms of disk use and time, so
I've rejected it and gone back to the drawing board.

Does anyone have any suggestions on how I might be able to reuse my DIH
configuration in the SolrJ context without re-inventing the wheel (or DIH
in this case)?  If I'm doing something ridiculous I hope you'll point that
out too.

Thanks,
Tricia


Re: Using data-config.xml from DIH in SolrJ

2013-11-14 Thread P Williams
Hi,

I just discovered UpdateProcessorFactory
<http://lucene.apache.org/solr/4_5_1/solr-core/org/apache/solr/update/processor/package-summary.html>
in a big way.  How did this completely slip by me?

Working on two ideas.
1. I have used the DIH in a local EmbeddedSolrServer previously.  I could
write a ForwardingUpdateProcessorFactory to take that local update and send
it to a HttpSolrServer.
2. I have code which walks the file-system to compose rough documents but
haven't yet written the part that handles the templated fields and
cross-walking of the source(s) to the schema.  I could configure the update
handler on the Solr server side to do this with the RegexReplace
<http://lucene.apache.org/solr/4_5_1/solr-core/org/apache/solr/update/processor/RegexReplaceProcessorFactory.html>
and DefaultValue
<http://lucene.apache.org/solr/4_5_1/solr-core/org/apache/solr/update/processor/DefaultValueUpdateProcessorFactory.html>
UpdateProcessorFactory(-ies).
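
For idea 2, the solrconfig.xml side would look roughly like this (the chain
name, field names, pattern and default value below are invented
placeholders):

  <updateRequestProcessorChain name="crosswalk">
    <!-- supply a default where the source files gave us nothing (placeholder field/value) -->
    <processor class="solr.DefaultValueUpdateProcessorFactory">
      <str name="fieldName">collection_ssim</str>
      <str name="value">default-collection</str>
    </processor>
    <!-- regex clean-up of a templated field (placeholder pattern/replacement) -->
    <processor class="solr.RegexReplaceProcessorFactory">
      <str name="fieldName">title_tesim</str>
      <str name="pattern">\s+</str>
      <str name="replacement"> </str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

The client would then select the chain with update.chain=crosswalk on its
update requests.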

Any thoughts on the advantages/disadvantages of these approaches?

Thanks,
Tricia



On Thu, Nov 14, 2013 at 7:49 AM, Erick Erickson wrote:

> There's nothing that I know of that takes a DIH configuration and
> uses it through SolrJ. You can use Tika directly in SolrJ if you
> need to parse structured documents though, see:
> http://searchhub.org/2012/02/14/indexing-with-solrj/
>
> Yep, you're going to be kind of reinventing the wheel a bit I'm
> afraid.
>
> Best,
> Erick
>
>
> On Wed, Nov 13, 2013 at 1:55 PM, P Williams
> wrote:
>
> > Hi All,
> >
> > I'm building a utility (Java jar) to create SolrInputDocuments and send
> > them to a HttpSolrServer using the SolrJ API.  The intention is to find
> an
> > efficient way to create documents from a large directory of files (where
> > multiple files make one Solr document) and be sent to a remote Solr
> > instance for update and commit.
> >
> > I've already solved the problem using the DataImportHandler (DIH) so I
> have
> > a data-config.xml that describes the templated fields and cross-walking
> of
> > the source(s) to the schema.  The original data won't always be able to
> be
> > co-located with the Solr server which is why I'm looking for another
> > option.
> >
> > I've also already solved the problem using ant and xslt to create a
> > temporary (and unfortunately a potentially large) document which the
> > UpdateHandler will accept.  I couldn't think of a solution that took
> > advantage of the XSLT support in the UpdateHandler because each document
> is
> > created from multiple files.  Our current dated Java based solution
> > significantly outperforms this solution in terms of disk and time.  I've
> > rejected it based on that and gone back to the drawing board.
> >
> > Does anyone have any suggestions on how I might be able to reuse my DIH
> > configuration in the SolrJ context without re-inventing the wheel (or DIH
> > in this case)?  If I'm doing something ridiculous I hope you'll point
> that
> > out too.
> >
> > Thanks,
> > Tricia
> >
>


Re: DataImport Handler, writing a new EntityProcessor

2013-12-18 Thread P Williams
Hi Mathias,

I'd recommend testing one thing at a time.  See if you can get it to work
for one image before you try a directory of images.  Also try testing using
the solr-testframework using your ide (I use Eclipse) to debug rather than
your browser/print statements.  Hopefully that will give you some more
specific knowledge of what's happening around your plugin.

I also wrote an EntityProcessor plugin to read from a properties
file.
 Hopefully that'll give you some insight about this kind of Solr plugin and
testing them.

Cheers,
Tricia




On Wed, Dec 18, 2013 at 3:03 AM, Mathias Lux wrote:

> Hi all!
>
> I've got a question regarding writing a new EntityProcessor, in the
> same sense as the Tika one. My EntityProcessor should analyze jpg
> images and create document fields to be used with the LIRE Solr plugin
> (https://bitbucket.org/dermotte/liresolr). Basically I've taken the
> same approach as the TikaEntityProcessor, but my setup just indexes
> the first of 1000 images. I'm using a FileListEntityProcessor to get
> all JPEGs from a directory and then I'm handing them over (see [2]).
> My code for the EntityProcessor is at [1]. I've tried to use the
> DataSource as well as the filePath attribute, but it ends up all the
> same. However, the FileListEntityProcessor is able to read all the
> files according to the debug output, but I'm missing the link from the
> FileListEntityProcessor to the LireEntityProcessor.
>
> I'd appreciate any pointer or help :)
>
> cheers,
>   Mathias
>
> [1] LireEntityProcessor http://pastebin.com/JFajkNtf
> [2] dataConfig http://pastebin.com/vSHucatJ
>
> --
> Dr. Mathias Lux
> Klagenfurt University, Austria
> http://tinyurl.com/mlux-itec
>


Changing Cache Properties after Indexing

2014-01-13 Thread P Williams
Hi,

I've gone through steps for tuning my cache sizes and I'm very happy with
the results of load testing.  Unfortunately the cache settings for querying
are not optimal for indexing - and in fact slow it down quite a bit.

I've made the caches small by default for the indexing stage and then want
to override the values using properties when used for querying.  That's
easy enough to do and is described in SolrConfigXml.
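
Concretely, the substitution I have in mind looks something like this
(sizes are illustrative only):

  <!-- solrconfig.xml: small default for the indexing stage -->
  <filterCache class="solr.FastLRUCache"
               size="${solr.filterCache.size:512}"
               initialSize="512"
               autowarmCount="0"/>

  # solrcore-querying.properties, to become conf/solrcore.properties for query time
  solr.filterCache.size=2003232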

I store these properties in a solrcore-querying.properties file.  When
indexing is complete I could unload the Solr core, move (mv) this file to
conf/solrcore.properties and then load the Solr core and it would pick up
the new properties.  The only problem with that is in production I won't
have access to the machine to make changes to the file system.  I need to
be able to do this using the Core Admin API.

I can see that I can specify individual properties with the CREATE command,
for instance property.solr.filterCache.size=2003232.  Great!  So this is
possible but I still have two questions:

   1. Is there a way to specify a conf/solrcore-querying.properties file to
   the admin/cores handler instead of each property individually?
   2. The same functionality doesn't seem to be available when I call the
   RELOAD command.  Is this expected behaviour?  Should it be?

Is there a better way?

Thanks,
Tricia


Re: Changing Cache Properties after Indexing

2014-01-17 Thread P Williams
You're both completely right.  There isn't any issue with indexing with
large cache settings.

I ran the same indexing job five times, twice with large cache and twice
with the default values. I threw out the first job because no matter if
it's cached or uncached it runs ~2x slower. This must have been the
observation I based my incorrect caching notion on.

I unloaded with delete of the data directory and reloaded the core each
time.  I'm using DIH with the FileEntityProcessor and
PlainTextEntityProcessor to index ~11000 fulltext books.

w/ cache
0:13:14.823
0:12:33.910

w/o cache
0:12:13.186
0:15:56.566

There is variation, but not anything that could be explained by the cache
settings. Doh!

Thanks,
Tricia


On Mon, Jan 13, 2014 at 6:08 PM, Shawn Heisey  wrote:

> On 1/13/2014 4:44 PM, Erick Erickson wrote:
>
>> On the face of it, it's somewhat unusual to have the cache settings
>> affect indexing performance. What are you seeing and how are you indexing?
>>
>
> I think this is probably an indirect problem.  Cache settings don't
> directly affect indexing speed, but when autoWarm values are high and NRT
> indexing is happening, new searchers are requested frequently and the
> autoWarm makes that happen slowly with a lot of resources consumed.
>
> Thanks,
> Shawn
>
>


Re: Advice on highlighting

2014-09-12 Thread P Williams
Hi Craig,

Have you seen SOLR-4722 (https://issues.apache.org/jira/browse/SOLR-4722)?
 This was my attempt at something similar.

Regards,
Tricia

On Fri, Sep 12, 2014 at 2:23 PM, Craig Longman  wrote:

> In order to take our Solr usage to the next step, we really need to
> improve its highlighting abilities.  What I'm trying to do is to be able
> to write a new component that can return the fields that matched the
> search (including numeric fields) and the start/end positions for the
> alphanumeric matches.
>
>
>
> I see three different approaches to take; any of them will require making
> some modifications to the lucene/solr parts, as it just does not appear
> to be doable as a completely stand alone component.
>
>
>
> 1) At initial search time.
>
> This seemed like a good approach.  I can follow IndexSearcher creating
> the TermContext that parses through AtomicReaderContexts to see if it
> contains a match and then adds it to the contexts available for later.
> However, at this point, inside SegmentTermsEnum.seekExact() it seems
> like Solr is not really looking for matching terms as such, it's just
> scanning what looks like the raw index.  So, I don't think I can easily
> extract term positions at this point.
>
>
>
> 2) Write a modified HighlighterComponent.  We have managed to get phrases
> to highlight properly, but it seems like getting the full field matches
> would be more difficult in this module, however, because it does its
> highlighting oblivious to any other criteria, we can't use it as is.
> For example, this search:
>
>
>
>   (body:large+AND+user_id:7)+OR+user_id:346
>
>
>
> Will highlight "large" in records that have user_id = 346 when
> technically (for our purposes at least) it should not be considered a
> hit because the "large" was accompanied by the user_id = 7 criteria.
> It's not immediately clear to me how difficult it would be to change
> this.
>
>
>
> 3) Make a modified DebugComponent and enhance the existing explain()
> methods (in the query types we require it at least) to include more
> information such as the start/end positions of the term that was hit.
> I'm exploring this now, but I don't easily see how I can figure out what
> those positions might be from the explain() information.  Any pointers
> on how, at the point that TermQuery.explain() is being called that I can
> figure out which indexed token was the actual hit on?
>
>
>
>
>
> Craig Longman
>
> C++ Developer
>
> iCONECT Development, LLC
> 519-645-1663
>
>
>
>
>
> This message and any attachments are intended only for the use of the
> addressee and may contain information that is privileged and confidential.
> If the reader of the message is not the intended recipient or an authorized
> representative of the intended recipient, you are hereby notified that any
> dissemination of this communication is strictly prohibited. If you have
> received this communication in error, notify the sender immediately by
> return email and delete the message and any attachments from your system.
>
>


Re: Total Term Frequency per ResultSet in Solr 4.3 ?

2013-07-04 Thread P Williams
Hi Tony,

Have you seen the
TermVectorComponent?
 It will return the TermVectors for the documents in your result set (note
that the rows parameter matters if you want results for the whole set, the
default is 10).  TermVectors also must be stored for each field that you
want term frequency returned for.  Suppose you have the query
http://localhost:8983/solr/collection1/tvrh?q=cable&fl=includes&tv.tf=true on
the example that comes packaged with Solr.  Then part of the response is:


  [term vector excerpt: for each document in the result set (IW-02, 9885A004,
  3007WFP and MA147LL/A) the response lists the terms of its 'includes' field,
  each with an <int name="tf"> count; the surrounding XML markup was lost in
  the mail archive]

Then you can use an XPath query like
sum(//lst[@name='cable']/int[@name='tf']) where 'cable' was the term, to
calculate the term frequency in the 'includes' field for the whole result
set.  You could extend this to get the term frequency across all fields for
your result set with some alterations to the query and schema.xml
configuration.  Alternately you could get the response as json (wt=json)
and use javascript to sum. I know this is not terribly efficient but, if
I'm understanding your request correctly, it's possible.
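
If it's useful, here's a small sketch of evaluating that sum with the JDK's
built-in XPath support (the URL and rows value are from the example above;
adjust for your own term and collection):

  import java.net.URL;
  import javax.xml.parsers.DocumentBuilderFactory;
  import javax.xml.xpath.XPathConstants;
  import javax.xml.xpath.XPathFactory;
  import org.w3c.dom.Document;

  public class TermFreqSum {
    public static void main(String[] args) throws Exception {
      String url = "http://localhost:8983/solr/collection1/tvrh"
          + "?q=cable&fl=includes&tv.tf=true&rows=1000";  // rows must cover the whole result set
      Document response = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder().parse(new URL(url).openStream());
      // sum the per-document tf entries for the term 'cable'
      Number total = (Number) XPathFactory.newInstance().newXPath().evaluate(
          "sum(//lst[@name='cable']/int[@name='tf'])", response, XPathConstants.NUMBER);
      System.out.println("total tf for 'cable': " + total.intValue());
    }
  }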

Cheers,
Tricia


On Thu, Jul 4, 2013 at 10:24 AM, Tony Mullins wrote:

> So what is the workaround for this problem ?
> Can it be done without changing any source code ?
>
> Thanks,
> Tony
>
>
> On Thu, Jul 4, 2013 at 8:01 PM, Yonik Seeley  wrote:
>
> > Ah, sorry - I thought you were after docfreq, not termfreq.
> > -Yonik
> > http://lucidworks.com
> >
> > On Thu, Jul 4, 2013 at 10:57 AM, Tony Mullins 
> > wrote:
> > > Hi Yonik,
> > >
> > > With facet it didn't work.
> > >
> > > Please see the result set doc below
> > >
> > >
> >
> http://localhost:8080/solr/collection2/select?fl=*,amazing_freq:termfreq%28product,%27amazing%27%29,spider_freq:termfreq%28product,%27spider%27%29&fq=id%3A27&q=spider&fl=*&df=product&wt=xml&indent=true&facet=true&facet.query=product:spider&facet.query=product:amazing&rows=20
> > >
> > > 
> > >  27
> > >  Movies
> > >   dvd
> > >   The amazing spider man is amazing spider the
> > > spider
> > >   1
> > >   1439641369145507840
> > >
> > >   2
> > >   3
> > >   
> > >   
> > >   1
> > >1
> > > 
> > >
> > > As you can see facet is actually just returning the no. of docs found
> > > against those keywords, not the actual frequency.
> > > Actual frequency is returned by the field 'amazing_freq' &
> 'spider_freq'
> > !
> > >
> > > So is there any workaround for this to get the total of term-frequency
> in
> > > resultset without any modification to Solr source code ?
> > >
> > >
> > > Thanks,
> > > Tony
> > >
> > >
> > > On Thu, Jul 4, 2013 at 7:05 PM, Yonik Seeley 
> > wrote:
> > >
> > >> If you just want to retrieve those counts, this seems like simple
> > faceting.
> > >>
> > >> q=something
> > >> facet=true
> > >> facet.query=product:hunger
> > >> facet.query=product:games
> > >>
> > >> -Yonik
> > >> http://lucidworks.com
> > >>
> > >> On Thu, Jul 4, 2013 at 9:45 AM, Tony Mullins <
> tonymullins...@gmail.com>
> > >> wrote:
> > >> > Hi ,
> > >> >
> > >> > I have lots of crawled data, indexed in my Solr (4.3.0) and lets say
> > user
> > >> > creates a search criteria 'X1' and he/she wants to know the
> occurrence
> > >> of a
> > >> > specific term in the result set of that 'X1' search criteria.
> > >> > And then again he/she creates another search criteria 'X2' and
> he/she
> > >> wants
> > >> > to know the occurrence of that same term in the result set of that
> > 'X2'
> > >> > search criteria.
> > >> >
> > >> > At the moment if I give termfreq(field,term) then it gives me the
> term
> > >> > frequency per document and if I use totaltermfreq(field,term), it
> > gives
> > >> me
> > >> > the total term frequency in entire index not in the result set of my
> > >> search
> > >> > criteria.
> > >> >
> > >> > So what I need is your help to find how to how to get total
> occurrence
> > >> of a
> > >> > term in query's result set.
> > >> >
> > >> > If this is my result set
> > >> >
> > >> > 
> > >> > Movies
> > >> > dvd
> > >> > The Hunger Games
> > >> >
> > >> >   
> > >> > Books
> > >> > paperback
> > >> > The Hunger Book
> > >> >
> > >> > And I am looking for term 'hunger' in product field then I want to
> get
> > >> > value = '2' , and if I am searching for term 'games' in product
> field
> > I
> > >> > want to get value = '1' .
> > >> >
> > >> > Thanks,
> > >> > Tony
> > >>
> >
>


Re: How to Manage RAM Usage at Heavy Indexing

2013-09-09 Thread P Williams
Hi,

I've been seeing the same thing on CentOS with high physical memory use
with low JVM-Memory use.  I came to the conclusion that this was expected
behaviour.  Using top I noticed that my solr user's java process has
Virtual memory allocated of about twice the size of the index, actual is
within the limits I set when jetty starts.  I infer from this that 98% of
Physical Memory is being used to cache the index.  Walter, Erick and others
are constantly reminding people on list to have RAM the size of the index
available -- I think 98% physical memory use is exactly why.  Here is an
excerpt from Uwe Schindler's well-written piece, which explains in greater
detail:

*"Basically mmap does the same like handling the Lucene index as a swap
file. The mmap() syscall tells the O/S kernel to virtually map our whole
index files into the previously described virtual address space, and make
them look like RAM available to our Lucene process. We can then access our
index file on disk just like it would be a large byte[] array (in Java this
is encapsulated by a ByteBuffer interface to make it safe for use by Java
code). If we access this virtual address space from the Lucene code we
don’t need to do any syscalls, the processor’s MMU and TLB handles all the
mapping for us. If the data is only on disk, the MMU will cause an
interrupt and the O/S kernel will load the data into file system cache. If
it is already in cache, MMU/TLB map it directly to the physical memory in
file system cache. It is now just a native memory access, nothing more! We
don’t have to take care of paging in/out of buffers, all this is managed by
the O/S kernel. Furthermore, we have no concurrency issue, the only
overhead over a standard byte[] array is some wrapping caused by
Java’s ByteBuffer
interface (it is still slower than a real byte[] array, but that is the
only way to use mmap from Java and is much faster than all other directory
implementations shipped with Lucene). We also waste no physical memory, as
we operate directly on the O/S cache, avoiding all Java GC issues described
before."*
*
*
Is it odd that my index is ~16GB but top shows 30GB in virtual memory?
 Would the extra be for the field and filter caches I've increased in size?

I went through a few Java tuning steps relating to OutOfMemoryErrors when
using DataImportHandler with Solr.  The first issue is that the
FileEntityProcessor creates an entry on the heap for every file to be
indexed before any indexing actually occurs.  When I started pointing it at
very large directories I started running out of heap.  One work-around is to
divide the job up into smaller batches, but I was able to allocate more
memory so that everything fit.  With more memory allocated, the next
limiting factor was too many open files.  After allowing the solr user to
open more files I was able to get past this as well.  There was a sweet spot
where indexing with just enough memory was slow enough that I didn't hit the
too-many-open-files error, but why go slow?  Now I'm able to index ~4M
documents (newspaper articles and fulltext monographs) in about 7 hours.
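
For reference, the adjustments amounted to something like the following
(the values are illustrative, not a recommendation):

  # let the solr user keep more files open (make it permanent in /etc/security/limits.conf)
  ulimit -n 8192

  # start jetty with enough heap for the FileEntityProcessor's full file list
  java -Xmx8g -jar start.jar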

I hope someone will correct me if I'm wrong about anything I've said here
and especially if there is a better way to do things.

Best of luck,
Tricia



On Wed, Aug 28, 2013 at 12:12 PM, Dan Davis  wrote:

> This could be an operating systems problem rather than a Solr problem.
> CentOS 6.4 (linux kernel 2.6.32) may have some issues with page flushing
> and I would read-up up on that.
> The VM parameters can be tuned in /etc/sysctl.conf
>
>
> On Sun, Aug 25, 2013 at 4:23 PM, Furkan KAMACI  >wrote:
>
> > Hi Erick;
> >
> > I wanted to get a quick answer that's why I asked my question as that
> way.
> >
> > Error is as follows:
> >
> > INFO  - 2013-08-21 22:01:30.978;
> > org.apache.solr.update.processor.LogUpdateProcessor; [collection1]
> > webapp=/solr path=/update params={wt=javabin&version=2}
> > {add=[com.deviantart.reachmeh
> > ere:http/gallery/, com.deviantart.reachstereo:http/,
> > com.deviantart.reachstereo:http/art/SE-mods-313298903,
> > com.deviantart.reachtheclouds:http/,
> com.deviantart.reachthegoddess:http/,
> > co
> > m.deviantart.reachthegoddess:http/art/retouched-160219962,
> > com.deviantart.reachthegoddess:http/badges/,
> > com.deviantart.reachthegoddess:http/favourites/,
> > com.deviantart.reachthetop:http/
> > art/Blue-Jean-Baby-82204657 (1444006227844530177),
> > com.deviantart.reachurdreams:http/, ... (163 adds)]} 0 38790
> > ERROR - 2013-08-21 22:01:30.979; org.apache.solr.common.SolrException;
> > java.lang.RuntimeException: [was class org.eclipse.jetty.io.EofException]
> > early EOF
> > at
> >
> >
> com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
> > at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
> > at
> >
> >
> com.ctc.wstx.sr.BasicStreamRead

DIH field defaults or re-assigning field values

2013-09-18 Thread P Williams
Hi All,

I'm using the DataImportHandler to import documents to my index.  I assign
one of my document's fields by using a sub-entity from the root to look for
a value in a file.  I've got this part working.  If the value isn't in the
file or the file doesn't exist I'd like the field to be assigned a default
value.  Is there a way to do this?

I think I'm looking for a way to re-assign the value of a field.  If this
is possible then I can assign the default value in the root entity and
overwrite it if the value is found in the sub-entity. Ideas?

Thanks,
Tricia


Re: DIH field defaults or re-assigning field values

2013-09-24 Thread P Williams
I discovered how to use the ScriptTransformer
<http://wiki.apache.org/solr/DataImportHandler#ScriptTransformer>, which
worked to solve my problem.  I had to make use
of context.setSessionAttribute(...,...,'global') to store a flag for the
value in the file because the script is only called if there are rows to
transform and I needed to know when the default was appropriate to set in
the root entity.
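
In case it helps anyone else, the data-config ended up shaped roughly like
this (the entity, path and field names below are placeholders, not my real
configuration):

  <dataConfig>
    <dataSource name="files" type="FileDataSource" encoding="UTF-8"/>
    <script><![CDATA[
      function readValue(row, context) {
        // remember that the sub-entity found a value, so the root entity
        // knows whether its default should stand ('global' survives across entities)
        context.setSessionAttribute('valueFound', 'true', 'global');
        return row;
      }
    ]]></script>
    <document>
      <entity name="doc" processor="FileListEntityProcessor"
              baseDir="/data/import" fileName=".*\.xml" rootEntity="false">
        <entity name="props" processor="LineEntityProcessor"
                url="${doc.fileAbsolutePath}.properties" dataSource="files"
                transformer="script:readValue" onError="skip">
          <field column="rawLine" name="collection_ssi"/>
        </entity>
      </entity>
    </document>
  </dataConfig>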

Thanks for your suggestions Alex.

Cheers,
Tricia


On Wed, Sep 18, 2013 at 1:19 PM, P Williams
wrote:

> Hi All,
>
> I'm using the DataImportHandler to import documents to my index.  I assign
> one of my document's fields by using a sub-entity from the root to look for
> a value in a file.  I've got this part working.  If the value isn't in the
> file or the file doesn't exist I'd like the field to be assigned a default
> value.  Is there a way to do this?
>
> I think I'm looking for a way to re-assign the value of a field.  If this
> is possible then I can assign the default value in the root entity and
> overwrite it if the value is found in the sub-entity. Ideas?
>
> Thanks,
> Tricia
>


Re: XPathEntityProcessor nested in TikaEntityProcessor query null exception

2013-09-26 Thread P Williams
Hi,

Haven't tried this myself but maybe try leaving out the
FieldReaderDataSource entirely.  From my quick searching looks like it's
tied to SQL.  Did you try copying the
http://wiki.apache.org/solr/TikaEntityProcessor Advanced Parsing example
exactly?  What happens when you leave out FieldReaderDataSource?

Cheers,
Tricia


On Thu, Sep 26, 2013 at 4:17 AM, Andreas Owen  wrote:

> i'm using solr 4.3.1 and the dataimporter. i am trying to use
> XPathEntityProcessor within the TikaEntityProcessor for indexing html-pages
> but i'm getting this error for each document. i have also tried
> dataField="tika.text" and dataField="text" to no avail. the nested
> XPathEntityProcessor "detail" creates the error, the rest works fine. what
> am i doing wrong?
>
> error:
>
> ERROR - 2013-09-26 12:08:49.006;
> org.apache.solr.handler.dataimport.SqlEntityProcessor; The query failed
> 'null'
> java.lang.ClassCastException: java.io.StringReader cannot be cast to
> java.util.Iterator
> at
> org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
> at
> org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
> at
> org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
> at
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:465)
> at
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:491)
> at
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:491)
> at
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:404)
> at
> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:319)
> at
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:227)
> at
> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:422)
> at
> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:487)
> at
> org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:179)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1820)
> at
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
> at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
> at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
> at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
> at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
> at
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
> at
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
> at
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
> at org.eclipse.jetty.server.Server.handle(Server.java:365)
> at
> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
> at
> org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
> at
> org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:937)
> at
> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:998)
> at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:856)
> at
> org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
> at
> org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
> at
> org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
> at java.lang.Threa

Re: XPathEntityProcessor nested in TikaEntityProcessor query null exception

2013-09-27 Thread P Williams
essor.testTikaHTMLMapperSubEntity(TestTikaEntityProcessor.java:124)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:601)
at
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
 at
com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
at
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
 at
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
at
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
 at
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
 at
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
 at
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
 at
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
 at
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
 at
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
at
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
 at
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
at
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
 at
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
at
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
 at
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
 at
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
 at
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
 at
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 at
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
 at
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
 at
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
 at java.lang.Thread.run(Thread.java:722)



On Fri, Sep 27, 2013 at 3:55 AM, Andreas Owen  wrote:

> i removed the FieldReaderDataSource and dataSource="fld" but it didn't
> help. i get the following for each document:
> DataImportHandlerException: Exception in invoking url null
> Processing Document # 9
> nullpointerexception
>
>
> On 26. Sep 2013, at 8:39 PM, P Williams wrote:
>
> > Hi,
> >
> > Haven't tried this myself but maybe try leaving out the
> > FieldReaderDataSource entirely.  From my quick searching looks like it's
> > tied to SQL.  Did you try copying the
> > http://wiki.apache.org/solr/TikaEntityProcessor Advanced Parsing example
> > exactly?  What happens when you leave out FieldReaderDataSource?
> >
> > Cheers,
> > Tricia
> >
> >
> > On Thu, Sep 26, 2013 at 4:17 AM, Andreas Owen  wrote:
> >
> >> i'm using solr 4.3.1 and the dataimporter. i am trying to use
> >> XPathEntityProcessor within the TikaEntityProcessor fo

Re: XPathEntityProcessor nested in TikaEntityProcessor query null exception

2013-09-30 Thread P Williams
Hi Andreas,

When using XPathEntityProcessor
<http://wiki.apache.org/solr/DataImportHandler#XPathEntityProcessor> your
DataSource must be of type DataSource<Reader>.  You shouldn't be using
BinURLDataSource; it's giving you the cast exception.  Use URLDataSource
<https://builds.apache.org/job/Solr-Artifacts-4.x/javadoc/solr-dataimporthandler/org/apache/solr/handler/dataimport/URLDataSource.html>
or FileDataSource
<https://builds.apache.org/job/Solr-Artifacts-4.x/javadoc/solr-dataimporthandler/org/apache/solr/handler/dataimport/FileDataSource.html>
instead.

I don't think you need to specify namespaces, at least you didn't used to.
 The other thing that I've noticed is that the anywhere xpath expression //
doesn't always work in DIH.  You might have to be more specific.
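
A bare-bones shape of that, with a placeholder URL and xpaths, would be:

  <dataConfig>
    <dataSource name="pages" type="URLDataSource" encoding="UTF-8"/>
    <document>
      <entity name="page" processor="XPathEntityProcessor"
              dataSource="pages"
              url="http://example.com/doc.xml"
              forEach="/doc">
        <field column="title" xpath="/doc/title"/>
      </entity>
    </document>
  </dataConfig>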

Cheers,
Tricia





On Sun, Sep 29, 2013 at 9:47 AM, Andreas Owen  wrote:

> how dum can you get. obviously quite dum... i would have to analyze the
> html-pages with a nested instance like this:
>
>  url="file:///C:\ColdFusion10\cfusion\solr\solr\tkbintranet\docImportUrl.xml"
> forEach="/docs/doc" dataSource="main">
>
>  url="${rec.urlParse}" forEach="/xhtml:html" dataSource="dataUrl">
> 
> 
> 
> 
> 
> 
>
> but i'm pretty sure the foreach is wrong and the xpath expressions. in the
> moment i getting the following error:
>
> Caused by: java.lang.RuntimeException:
> org.apache.solr.handler.dataimport.DataImportHandlerException:
> java.lang.ClassCastException:
> sun.net.www.protocol.http.HttpURLConnection$HttpInputStream cannot be cast
> to java.io.Reader
>
>
>
>
>
> On 28. Sep 2013, at 1:39 AM, Andreas Owen wrote:
>
> > ok i see what your getting at but why doesn't the following work:
> >
> >   
> >   
> >
> > i removed the tiki-processor. what am i missing, i haven't found
> anything in the wiki?
> >
> >
> > On 28. Sep 2013, at 12:28 AM, P Williams wrote:
> >
> >> I spent some more time thinking about this.  Do you really need to use
> the
> >> TikaEntityProcessor?  It doesn't offer anything new to the document you
> are
> >> building that couldn't be accomplished by the XPathEntityProcessor alone
> >> from what I can tell.
> >>
> >> I also tried to get the Advanced
> >> Parsing<http://wiki.apache.org/solr/TikaEntityProcessor>example to
> >> work without success.  There are some obvious typos (
> >> instead of ) and an odd order to the pieces ( is
> >> enclosed by ).  It also looks like
> >> FieldStreamDataSource<
> http://lucene.apache.org/solr/4_3_1/solr-dataimporthandler/org/apache/solr/handler/dataimport/FieldStreamDataSource.html
> >is
> >> the one that is meant to work in this context. If Koji is still around
> >> maybe he could offer some help?  Otherwise this bit of erroneous
> >> instruction should probably be removed from the wiki.
> >>
> >> Cheers,
> >> Tricia
> >>
> >> $ svn diff
> >> Index:
> >>
> solr/contrib/dataimporthandler-extras/src/test/org/apache/solr/handler/dataimport/TestTikaEntityProcessor.java
> >> ===
> >> ---
> >>
> solr/contrib/dataimporthandler-extras/src/test/org/apache/solr/handler/dataimport/TestTikaEntityProcessor.java
> >>(revision 1526990)
> >> +++
> >>
> solr/contrib/dataimporthandler-extras/src/test/org/apache/solr/handler/dataimport/TestTikaEntityProcessor.java
> >>(working copy)
> >> @@ -99,13 +99,13 @@
> >>runFullImport(getConfigHTML("identity"));
> >>assertQ(req("*:*"), testsHTMLIdentity);
> >>  }
> >> -
> >> +
> >>  private String getConfigHTML(String htmlMapper) {
> >>return
> >>"" +
> >>"  " +
> >>"  " +
> >> -" >> processor='TikaEntityProcessor' " +
> >> +" >> processor='TikaEntityProcessor' " +
> >>"   url='" +
> >> getFile("dihextras/structured.html").getAbsolutePath() + "' " +
> >>((htmlMapper == null) ? "" : (" htmlMapper='" + h

Results Order When Performing Wildcard Query

2013-04-09 Thread P Williams
Hi,

I wrote a test of my application which revealed a Solr oddity (I think).
 The test, which I wrote on Windows 7 and which makes use of the
solr-test-framework <http://lucene.apache.org/solr/4_1_0/solr-test-framework/index.html>,
fails under Ubuntu 12.04 because the Solr results I expected for a wildcard
query of the test data are ordered differently under Ubuntu than Windows.  On
both Windows and Ubuntu all items in the result set have a score of 1.0 and
appear to be ordered by docid (which looks like it corresponds to
alphabetical unique id on Windows but not Ubuntu).  I'm guessing that the
root of my issue is that a different docid was assigned to the same
document on each operating system.

The data was imported using a DataImportHandler configuration during a
@BeforeClass step in my JUnit test on both systems.

Any suggestions on how to ensure a consistently ordered wildcard query
result set for testing?

Thanks,
Tricia


Re: Results Order When Performing Wildcard Query

2013-04-09 Thread P Williams
Hey Shawn,

My gut says the difference in assignment of docids has to do with how the
FileListEntityProcessor <http://wiki.apache.org/solr/DataImportHandler#FileListEntityProcessor>
works on the two operating systems. The documents are updated/imported in a
different order is my guess, but I haven't tested that theory. I still
think it's kind of odd that there would be a difference.

Indexes are created from scratch in my test, so it's not that. java -version
reports the same values on both machines:
java version "1.7.0_17"
Java(TM) SE Runtime Environment (build 1.7.0_17-b02)
Java HotSpot(TM) Client VM (build 23.7-b01, mixed mode)

The explicit (arbitrary non-score) sort parameter will work as a
work-around to get my test to pass in both environments while I think about
this some more. Thanks!
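
In the test that just means adding an arbitrary, deterministic sort to the
wildcard query, e.g.:

  q=*:*&sort=id asc

(any stable field will do as the tiebreaker).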

Cheers,
Tricia


On Tue, Apr 9, 2013 at 2:13 PM, Shawn Heisey  wrote:

> On 4/9/2013 12:08 PM, P Williams wrote:
>
>> I wrote a test of my application which revealed a Solr oddity (I think).
>>   The test which I wrote on Windows 7 and makes use of the
>> solr-test-framework <http://lucene.apache.org/solr/4_1_0/solr-test-framework/index.html>
>>
>> fails
>> under Ubuntu 12.04 because the Solr results I expected for a wildcard
>> query
>> of the test data are ordered differently under Ubuntu than Windows.  On
>> both Windows and Ubuntu all items in the result set have a score of 1.0
>> and
>> appear to be ordered by docid (which looks like in corresponds to
>> alphabetical unique id on Windows but not Ubuntu).  I'm guessing that the
>> root of my issue is that a different docid was assigned to the same
>> document on each operating system.
>>
>
> It might be due to differences in how Java works on the two platforms, or
> even something as simple as different Java versions.  I don't know a lot
> about the underlying Lucene stuff, so this next sentence may not be
> correct: If you have are not starting from an index where the actual index
> directory was deleted before the test started (rather than deleting all
> documents), that might produce different internal Lucene document ids.
>
>
>  The data was imported using a DataImportHandler configuration during a
>> @BeforeClass step in my JUnit test on both systems.
>>
>> Any suggestions on how to ensure a consistently ordered wildcard query
>> result set for testing?
>>
>
> Include an explicit sort parameter.  That way it will depend on the data,
> not the internal Lucene representation.
>
> Thanks,
> Shawn
>
>


Re: How do I recover the position and offset a highlight for solr (4.1/4.2)?

2013-04-16 Thread P Williams
Hi,

It doesn't have the offset information, but check out my patch
https://issues.apache.org/jira/browse/SOLR-4722 which outputs the position
of each term that's been matched.  I'm eager to get some feedback on this
approach and any improvements that might be suggested.

Cheers,
Tricia


On Wed, Mar 27, 2013 at 8:28 AM, Skealler Nametic wrote:

> Hi,
>
> I would like to retrieve the position and offset of each highlighting
> found.
> I searched on the internet, but I have not found the exact solution to my
> problem...
>


SolrEntityProcessor doesn't grok responseHeader tag in Ancient Solr 1.2 source

2013-04-23 Thread P Williams
Hi,

I'd like to use the SolrEntityProcessor to partially migrate an old index
to Solr 4.1.  The source is pretty old (dated 2006-06-10 16:05:12Z)...
maybe Solr 1.2?  My data-config.xml is based on the SolrEntityProcessor
example <http://wiki.apache.org/solr/DataImportHandler#SolrEntityProcessor>
and wt="xml".  I'm getting an error from SolrJ complaining about

<responseHeader>
<status>0</status>
<QTime>1</QTime>
</responseHeader>

in the response.  Does anyone know of a work-around?

Thanks,
Tricia

1734 T12 C0 oasc.SolrException.log SEVERE Exception while processing: sep
document :
SolrInputDocument[]:org.apache.solr.handler.dataimport.DataImportHandlerException:
org.apache.solr.common.SolrException: parsing error
Caused by: org.apache.solr.common.SolrException: parsing error
Caused by: java.lang.RuntimeException: this must be known type! not:
responseHeader
at
org.apache.solr.client.solrj.impl.XMLResponseParser.readNamedList(XMLResponseParser.java:222)
 at
org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:128)
... 43 more


Re: SolrEntityProcessor doesn't grok responseHeader tag in Ancient Solr 1.2 source

2013-04-23 Thread P Williams
Thanks Erik.  I remember Solr Flare :)


On Tue, Apr 23, 2013 at 11:56 AM, Erik Hatcher wrote:

> You might be out of luck with the SolrEntityProcessor I'd recommend
> writing a simple little script that pages through /select?q=*:* from the
> source Solr and write to the destination Solr.   Back in the day there was
> this fun little beast <
> https://github.com/erikhatcher/solr-ruby-flare/blob/master/solr-ruby/lib/solr/importer/solr_source.rb>
> where you could do something like this:
>
>Solr::Indexer.new(SolrSource.new(...), mapping).index
>
>     Erik
>
>
> On Apr 23, 2013, at 13:41 , P Williams wrote:
>
> > Hi,
> >
> > I'd like to use the SolrEntityProcessor to partially migrate an old index
> > to Solr 4.1.  The source is pretty old (dated 2006-06-10 16:05:12Z)...
> > maybe Solr 1.2?  My data-config.xml is based on the SolrEntityProcessor
> > example <
> http://wiki.apache.org/solr/DataImportHandler#SolrEntityProcessor>
> > and wt="xml".
> > I'm getting an error from SolrJ complaining about
> > 
> > 0
> > 1
> > 
> > in the response.  Does anyone know of a work-around?
> >
> > Thanks,
> > Tricia
> >
> > 1734 T12 C0 oasc.SolrException.log SEVERE Exception while processing: sep
> > document :
> >
> SolrInputDocument[]:org.apache.solr.handler.dataimport.DataImportHandlerException:
> > org.apache.solr.common.SolrException: parsing error
> > Caused by: org.apache.solr.common.SolrException: parsing error
> > Caused by: java.lang.RuntimeException: this must be known type! not:
> > responseHeader
> > at
> >
> org.apache.solr.client.solrj.impl.XMLResponseParser.readNamedList(XMLResponseParser.java:222)
> > at
> >
> org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:128)
> > ... 43 more
>
>
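
For the archive: a minimal sketch of the kind of script Erik describes,
assuming the old /select returns the usual <doc><str name="...">...</str></doc>
XML and using SolrJ only against the destination (the hosts, page size and
flat-field handling are placeholders and simplifications):

  import java.net.URL;
  import javax.xml.parsers.DocumentBuilderFactory;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.common.SolrInputDocument;
  import org.w3c.dom.Document;
  import org.w3c.dom.Element;
  import org.w3c.dom.Node;
  import org.w3c.dom.NodeList;

  public class LegacyIndexCopier {
    public static void main(String[] args) throws Exception {
      String source = "http://oldhost:8983/solr/select";                   // old Solr 1.2 core
      HttpSolrServer dest = new HttpSolrServer("http://newhost:8983/solr/collection1");
      int rows = 500;
      for (int start = 0; ; start += rows) {
        Document page = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new URL(source + "?q=*:*&start=" + start + "&rows=" + rows).openStream());
        NodeList docs = page.getElementsByTagName("doc");
        if (docs.getLength() == 0) break;                                  // ran off the end
        for (int i = 0; i < docs.getLength(); i++) {
          SolrInputDocument out = new SolrInputDocument();
          NodeList fields = docs.item(i).getChildNodes();
          for (int j = 0; j < fields.getLength(); j++) {
            Node f = fields.item(j);
            if (f.getNodeType() != Node.ELEMENT_NODE) continue;
            // single-valued fields only; <arr> elements would need their children unpacked
            out.addField(((Element) f).getAttribute("name"), f.getTextContent());
          }
          dest.add(out);
        }
        dest.commit();
      }
      dest.shutdown();
    }
  }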


Re: DIH doesn't handle bound namespaces?

2011-11-03 Thread P Williams
Hi Gary,

From
http://wiki.apache.org/solr/DataImportHandler#Usage_with_XML.2BAC8-HTTP_Datasource

*It does not support namespaces, but it can handle xmls with namespaces.
When you provide the xpath, just drop the namespace and give the rest (eg
if the tag is '<dc:subject>' the mapping should just
contain 'subject'). Easy, isn't it? And you didn't need to write one line of
code! Enjoy*
You should be able to use xpath="//titleInfo/title" without making any
modifications (removing the namespace) to your xml.

I hope that answers your question.

Regards,
Tricia

On Mon, Oct 31, 2011 at 9:24 AM, Moore, Gary wrote:

> I'm trying to import some MODS XML using DIH.  The XML uses bound
> namespacing:
>
> http://www.w3.org/2001/XMLSchema-instance";
>  xmlns:mods="http://www.loc.gov/mods/v3";
>  xmlns:xlink="http://www.w3.org/1999/xlink";
>  xmlns="http://www.loc.gov/mods/v3";
>  xsi:schemaLocation="http://www.loc.gov/mods/v3
> http://www.loc.gov/mods/v3/mods-3-4.xsd";
>  version="3.4">
>   
>  Malus domestica: Arnold
>   
> 
>
> However, XPathEntityProcessor doesn't seem to handle xpaths of the type
> xpath="//mods:titleInfo/mods:title".
>
> If I remove the namespaces from the source XML:
>
> http://www.w3.org/2001/XMLSchema-instance";
>  xmlns:mods="http://www.loc.gov/mods/v3";
>  xmlns:xlink="http://www.w3.org/1999/xlink";
>  xmlns="http://www.loc.gov/mods/v3";
>  xsi:schemaLocation="http://www.loc.gov/mods/v3
> http://www.loc.gov/mods/v3/mods-3-4.xsd";
>  version="3.4">
>   
>  Malus domestica: Arnold
>   
> 
>
> then xpath="//titleInfo/title" works just fine.  Can anyone confirm that
> this is the case and, if so, recommend a solution?
> Thanks
> Gary
>
>
> Gary Moore
> Technical Lead
> LCA Digital Commons Project
> NAL/ARS/USDA
>
>


Re: Stream still in memory after tika exception? Possible memoryleak?

2011-11-03 Thread P Williams
Hi All,

I'm experiencing a similar problem to the others in the thread.

I've recently upgraded from apache-solr-4.0-2011-06-14_08-33-23.war to
apache-solr-4.0-2011-10-14_08-56-59.war and then
apache-solr-4.0-2011-10-30_09-00-00.war to index ~5300 pdfs, of various
sizes, using the TikaEntityProcessor.  My indexing would run to completion
and was completely successful under the June build.  The only error was
readability of the fulltext in highlighting.  This was fixed in Tika 0.10
(TIKA-611).  I chose to use the October 14 build of Solr because Tika 0.10
had recently been included (SOLR-2372).

On the same machine without changing any memory settings my initial problem
is a Perm Gen error.  Fine, I increase the PermGen space.

I've set the "onError" parameter to "skip" for the TikaEntityProcessor.
 Now I get several (6)

SEVERE: Exception thrown while getting data
java.net.SocketTimeoutException: Read timed out
SEVERE: Exception in entity :
tika:org.apache.solr.handler.dataimport.DataImportHandlerException: Exception in invoking url  # 2975

pairs.  And after ~3881 documents, with auto commit set unreasonably
frequently I consistently get an Out of Memory Error

SEVERE: Exception while processing: f document :
null:org.apache.solr.handler.dataimport.DataImportHandlerException:
java.lang.OutOfMemoryError: Java heap space

The stack trace points
to 
org.apache.pdfbox.io.RandomAccessBuffer.expandBuffer(RandomAccessBuffer.java:151)
and org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilde
r.java:718).

The October 30 build performs identically.

Funny thing is that monitoring via JConsole doesn't reveal any memory
issues.

Because the out of Memory error did not occur in June, this leads me to
believe that a bug has been introduced to the code since then.  Should I
open an issue in JIRA?

Thanks,
Tricia

On Tue, Aug 30, 2011 at 12:22 PM, Marc Jacobs  wrote:

> Hi Erick,
>
> I am using Solr 3.3.0, but with 1.4.1 the same problems.
> The connector is a homemade program in the C# programming language and is
> posting via http remote streaming (i.e.
>
> http://localhost:8080/solr/update/extract?stream.file=/path/to/file.doc&literal.id=1
> )
> I'm using Tika to extract the content (comes with the Solr Cell).
>
> A possible problem is that the filestream needs to be closed, after
> extracting, by the client application, but it seems that there is going
> something wrong while getting a Tika-exception: the stream never leaves the
> memory. At least that is my assumption.
>
> What is the common way to extract content from officefiles (pdf, doc, rtf,
> xls etc) and index them? To write a content extractor / validator yourself?
> Or is it possible to do this with the Solr Cell without getting a huge
> memory consumption? Please let me know. Thanks in advance.
>
> Marc
>
> 2011/8/30 Erick Erickson 
>
> > What version of Solr are you using, and how are you indexing?
> > DIH? SolrJ?
> >
> > I'm guessing you're using Tika, but how?
> >
> > Best
> > Erick
> >
> > On Tue, Aug 30, 2011 at 4:55 AM, Marc Jacobs  wrote:
> > > Hi all,
> > >
> > > Currently I'm testing Solr's indexing performance, but unfortunately
> I'm
> > > running into memory problems.
> > > It looks like Solr is not closing the filestream after an exception,
> but
> > I'm
> > > not really sure.
> > >
> > > The current system I'm using has 150GB of memory and while I'm indexing
> > the
> > > memoryconsumption is growing and growing (eventually more then 50GB).
> > > In the attached graph I indexed about 70k of office-documents
> > (pdf,doc,xls
> > > etc) and between 1 and 2 percent throws an exception.
> > > The commits are after 64MB, 60 seconds or after a job (there are 6
> evenly
> > > divided jobs).
> > >
> > > After indexing the memoryconsumption isn't dropping. Even after an
> > optimize
> > > command it's still there.
> > > What am I doing wrong? I can't imagine I'm the only one with this
> > problem.
> > > Thanks in advance!
> > >
> > > Kind regards,
> > >
> > > Marc
> > >
> >
>


Re: avoid overwrite in DataImportHandler

2011-12-07 Thread P Williams
Hi,

I've wondered the same thing myself.  I feel like the "clean" parameter has
something to do with it but it doesn't work as I'd expect either.  Thanks
in advance to anyone who can answer this question.

*clean* : (default 'true'). Tells whether to clean up the index before the
indexing is started.
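For example, I would have expected a full-import invoked like this to leave the
existing documents in place (assuming the handler is registered at /dataimport):

http://localhost:8983/solr/dataimport?command=full-import&clean=false&commit=true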

Tricia

On Wed, Dec 7, 2011 at 12:49 PM, sabman  wrote:

> I have a unique ID defined for the documents I am indexing. I want to avoid
> overwriting the documents that have already been indexed. I am using
> XPathEntityProcessor and TikaEntityProcessor to process the documents.
>
> The DataImportHandler does not seem to have the option to set
> overwrite=false. I have read some other forums to use deduplication instead
> but I don't see how it is related to my problem.
>
> Any help on this (or explanation on how deduplication would apply to my
> probelm ) would be great. Thanks!
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/avoid-overwrite-in-DataImportHandler-tp3568435p3568435.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: avoid overwrite in DataImportHandler

2011-12-08 Thread P Williams
Ah.  Thanks Erick.

I see now that my question is different from sabman's.

Is there a way to use the DataImportHandler's "full-import" command so that
it does not delete the existing material before it begins?

Thanks,
Tricia

On Thu, Dec 8, 2011 at 6:35 AM, Erick Erickson wrote:

> This is all controlled by Solr via the <uniqueKey> field in your schema.
> Just
> remove that entry.
>
> But then it's all up to you to handle the fact that there will be multiple
> documents with the same ID all returned as a result of querying. And
> it won't matter what program adds data, *nothing* will be overwritten,
> DIH has no part in that decision.
>
> Deduplication is about defining some fields in your record and avoiding
> adding another document if the contents are "close", where close is a
> slippery concept. I don't think it's related to your problem at all.
>
> Best
> Erick
>
> On Wed, Dec 7, 2011 at 3:27 PM, P Williams
>  wrote:
> > Hi,
> >
> > I've wondered the same thing myself.  I feel like the "clean" parameter
> has
> > something to do with it but it doesn't work as I'd expect either.  Thanks
> > in advance to anyone who can answer this question.
> >
> > *clean* : (default 'true'). Tells whether to clean up the index before
> the
> > indexing is started.
> >
> > Tricia
> >
> > On Wed, Dec 7, 2011 at 12:49 PM, sabman  wrote:
> >
> >> I have a unique ID defined for the documents I am indexing. I want to
> avoid
> >> overwriting the documents that have already been indexed. I am using
> >> XPathEntityProcessor and TikaEntityProcessor to process the documents.
> >>
> >> The DataImportHandler does not seem to have the option to set
> >> overwrite=false. I have read some other forums to use deduplication
> instead
> >> but I don't see how it is related to my problem.
> >>
> >> Any help on this (or explanation on how deduplication would apply to my
> >> probelm ) would be great. Thanks!
> >>
> >> --
> >> View this message in context:
> >>
> http://lucene.472066.n3.nabble.com/avoid-overwrite-in-DataImportHandler-tp3568435p3568435.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >>
>


Re: Solr - Tika(?) memory leak

2012-01-16 Thread P Williams
Hi,

I'm not sure which version of Solr/Tika you're using but I had a similar
experience which turned out to be the result of a design change to PDFBox.

https://issues.apache.org/jira/browse/SOLR-2886

Tricia

On Sat, Jan 14, 2012 at 12:53 AM, Wayne W wrote:

> Hi,
>
> we're using Solr running on tomcat with 1GB in production, and of late
> we've been having a huge number of OutOfMemory issues. It seems from
> what I can tell this is coming from the tika extraction of the
> content. I've processed the java dump file using a memory analyzer and
> its pretty clean at least the class involved. It seems like a leak to
> me, as we don't parse any files larger than 20M, and these objects are
> taking up ~700M
>
> I've attached 2 screen shots from the tool (not sure if you receive
> attachments).
>
> But to summarize (class, number of objects, Used heap size, Retained Heap
> Size):
>
>
> org.apache.xmlbeans.impl.store.Xob$ElementXObj   838,993   80,533,728   604,606,040
> org.apache.poi.openxml4j.opc.ZipPackage          2         112          87,009,848
> char[]                                           58732,216,960   38,216,950
>
>
> We're really desperate to find a solution to this - any ideas or help
> is greatly appreciated.
> Wayne
>


JSON and DataImportHandler

2010-07-16 Thread P Williams

Hi All,

Has anyone gotten the DataImportHandler to work with JSON as
input?  Is there an even easier alternative to DIH?  Could you show me
an example?


Many thanks,
Tricia


Re: Highlighting data stored outside of Solr

2012-12-17 Thread P Williams
Your problem seems really similar to "It should be possible to highlight
external text" (https://issues.apache.org/jira/browse/SOLR-1397) in JIRA.

Tricia

On Tue, Dec 11, 2012 at 12:48 PM, Michael Ryan  wrote:

> Has anyone ever attempted to highlight a field that is not stored in Solr?
>  We have been considering not storing fields in Solr, but still would like
> to use Solr's built-in highlighting.  On first glance, it looks like it
> would be fairly simply to modify DefaultSolrHighlighter to get the stored
> fields from an external source.  We already do not use term vectors, so no
> concerns there.  Any gotchas that I am not seeing?
>
> -Michael
>


Re: Using

2012-10-15 Thread P Williams
Hi,

Thanks for the suggestions.  Didn't work for me :(

I'm calling


which depends on org.eclipse.jetty:jetty-server
which depends on org.eclipse.jetty.orbit:jettty-servlet

I think I'm experiencing https://jira.codehaus.org/browse/JETTY-1493.

The pom file for
http://repo1.maven.org/maven2/org/eclipse/jetty/orbit/javax.servlet/3.0.0.v201112011016/javax.servlet-3.0.0.v201112011016.pom
contains <packaging>orbit</packaging>, so ivy looks for
http://repo1.maven.org/maven2/org/eclipse/jetty/orbit/javax.servlet/3.0.0.v201112011016/javax.servlet-3.0.0.v201112011016.orbit
rather than
http://repo1.maven.org/maven2/org/eclipse/jetty/orbit/javax.servlet/3.0.0.v201112011016/javax.servlet-3.0.0.v201112011016.jar,
hence my troubles.

I'm an IVY newbie so maybe there is something I'm missing here?  Is there
another 'conf' value other than 'default' I can use?

Thanks,
Tricia



On Fri, Oct 12, 2012 at 4:32 PM, P Williams
wrote:

> Hi,
>
> Has anyone tried using <dependency org="org.apache.solr" name="solr-test-framework"
> rev="4.0.0" conf="test->default"/> with Apache IVY in their project?
>
> rev 3.6.1 works but any of the 4.0.0 ALPHA, BETA and release result in:
> [ivy:resolve] :: problems summary ::
> [ivy:resolve]  WARNINGS
> [ivy:resolve]   [FAILED ]
> org.eclipse.jetty.orbit#javax.servlet;3.0.0.v201112011016!javax.servlet.orbit:
>  (0ms)
> [ivy:resolve]    shared: tried
> [ivy:resolve]
> C:\Users\pjenkins\.ant/shared/org.eclipse.jetty.orbit/javax.servlet/3.0.0.v201112011016/orbits/javax.servlet.orbit
> [ivy:resolve]    public: tried
> [ivy:resolve]
> http://repo1.maven.org/maven2/org/eclipse/jetty/orbit/javax.servlet/3.0.0.v201112011016/javax.servlet-3.0.0.v201112011016.orbit
> [ivy:resolve]   ::
> [ivy:resolve]   ::  FAILED DOWNLOADS::
> [ivy:resolve]   :: ^ see resolution messages for details  ^ ::
> [ivy:resolve]   ::
> [ivy:resolve]   ::
> org.eclipse.jetty.orbit#javax.servlet;3.0.0.v201112011016!javax.servlet.orbit
> [ivy:resolve]   ::
> [ivy:resolve]
> [ivy:resolve]
> [ivy:resolve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
>
> Can anybody point me to the source of this error or a workaround?
>
> Thanks,
> Tricia
>


Re: Using

2012-10-15 Thread P Williams
Apologies, there was a typo in my last message.

org.eclipse.jetty.orbit:jettty-servlet  should have been
org.eclipse.jetty.orbit:javax.servlet


On Mon, Oct 15, 2012 at 11:19 AM, P Williams  wrote:

> Hi,
>
> Thanks for the suggestions.  Didn't work for me :(
>
> I'm calling
> <dependency org="org.apache.solr" name="solr-test-framework" rev="4.0.0" conf="test->default"/>
>
> which depends on org.eclipse.jetty:jetty-server
> which depends on org.eclipse.jetty.orbit:jettty-servlet
>
> I think I'm experiencing https://jira.codehaus.org/browse/JETTY-1493.
>
> The pom file for
> http://repo1.maven.org/maven2/org/eclipse/jetty/orbit/javax.servlet/3.0.0.v201112011016/javax.servlet-3.0.0.v201112011016.pom
>  contains orbit, so ivy looks for
> http://repo1.maven.org/maven2/org/eclipse/jetty/orbit/javax.servlet/3.0.0.v201112011016/javax.servlet-3.0.0.v201112011016.orbit
>  rather
> than
> http://repo1.maven.org/maven2/org/eclipse/jetty/orbit/javax.servlet/3.0.0.v201112011016/javax.servlet-3.0.0.v201112011016.jar
>  hence
> my troubles.
>
> I'm an IVY newbie so maybe there is something I'm missing here?  Is there
> another 'conf' value other than 'default' I can use?
>
> Thanks,
> Tricia
>
>
>
> On Fri, Oct 12, 2012 at 4:32 PM, P Williams <
> williams.tricia.l...@gmail.com> wrote:
>
>> Hi,
>>
>> Has anyone tried using <dependency org="org.apache.solr" name="solr-test-framework"
>> rev="4.0.0" conf="test->default"/> with Apache IVY in their project?
>>
>> rev 3.6.1 works but any of the 4.0.0 ALPHA, BETA and release result in:
>> [ivy:resolve] :: problems summary ::
>> [ivy:resolve]  WARNINGS
>> [ivy:resolve]   [FAILED ]
>> org.eclipse.jetty.orbit#javax.servlet;3.0.0.v201112011016!javax.servlet.orbit:
>>  (0ms)
>> [ivy:resolve]    shared: tried
>> [ivy:resolve]
>> C:\Users\pjenkins\.ant/shared/org.eclipse.jetty.orbit/javax.servlet/3.0.0.v201112011016/orbits/javax.servlet.orbit
>> [ivy:resolve]    public: tried
>> [ivy:resolve]
>> http://repo1.maven.org/maven2/org/eclipse/jetty/orbit/javax.servlet/3.0.0.v201112011016/javax.servlet-3.0.0.v201112011016.orbit
>> [ivy:resolve]   ::
>> [ivy:resolve]   ::  FAILED DOWNLOADS::
>> [ivy:resolve]   :: ^ see resolution messages for details  ^ ::
>> [ivy:resolve]   ::
>> [ivy:resolve]   ::
>> org.eclipse.jetty.orbit#javax.servlet;3.0.0.v201112011016!javax.servlet.orbit
>> [ivy:resolve]   ::
>> [ivy:resolve]
>> [ivy:resolve]
>> [ivy:resolve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
>>
>> Can anybody point me to the source of this error or a workaround?
>>
>> Thanks,
>> Tricia
>>
>
>


Re: Using

2012-10-16 Thread P Williams
Hi,

Just wanted to update with a workaround.





Works for me to test my configs and project code with SolrTestCaseJ4 using
IVY as a dependency manager.
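In essence it pins down the jetty orbit servlet dependency and tells Ivy that
the "orbit" artifact is really a jar, along these lines (simplified; your conf
mapping, and whether you need the exclude at all, may differ):

<dependency org="org.apache.solr" name="solr-test-framework" rev="4.0.0" conf="test->default">
  <exclude org="org.eclipse.jetty.orbit" module="javax.servlet"/>
</dependency>
<dependency org="org.eclipse.jetty.orbit" name="javax.servlet" rev="3.0.0.v201112011016" conf="test->default">
  <!-- override the artifact type so Ivy fetches the .jar instead of a non-existent .orbit file -->
  <artifact name="javax.servlet" type="orbit" ext="jar"/>
</dependency>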

Does anyone else think it's odd that the directory structure
solr.home/collection1 is hard coded into the test-framework?

Regards,
Tricia

On Mon, Oct 15, 2012 at 11:19 AM, P Williams  wrote:

> Hi,
>
> Thanks for the suggestions.  Didn't work for me :(
>
> I'm calling
> <dependency org="org.apache.solr" name="solr-test-framework" rev="4.0.0" conf="test->default"/>
>
> which depends on org.eclipse.jetty:jetty-server
> which depends on org.eclipse.jetty.orbit:jettty-servlet
>
> I think I'm experiencing https://jira.codehaus.org/browse/JETTY-1493.
>
> The pom file for
> http://repo1.maven.org/maven2/org/eclipse/jetty/orbit/javax.servlet/3.0.0.v201112011016/javax.servlet-3.0.0.v201112011016.pom
>  contains orbit, so ivy looks for
> http://repo1.maven.org/maven2/org/eclipse/jetty/orbit/javax.servlet/3.0.0.v201112011016/javax.servlet-3.0.0.v201112011016.orbit
>  rather
> than
> http://repo1.maven.org/maven2/org/eclipse/jetty/orbit/javax.servlet/3.0.0.v201112011016/javax.servlet-3.0.0.v201112011016.jar
>  hence
> my troubles.
>
> I'm an IVY newbie so maybe there is something I'm missing here?  Is there
> another 'conf' value other than 'default' I can use?
>
> Thanks,
> Tricia
>
>
>
> On Fri, Oct 12, 2012 at 4:32 PM, P Williams <
> williams.tricia.l...@gmail.com> wrote:
>
>> Hi,
>>
>> Has anyone tried using <dependency org="org.apache.solr" name="solr-test-framework"
>> rev="4.0.0" conf="test->default"/> with Apache IVY in their project?
>>
>> rev 3.6.1 works but any of the 4.0.0 ALPHA, BETA and release result in:
>> [ivy:resolve] :: problems summary ::
>> [ivy:resolve]  WARNINGS
>> [ivy:resolve]   [FAILED ]
>> org.eclipse.jetty.orbit#javax.servlet;3.0.0.v201112011016!javax.servlet.orbit:
>>  (0ms)
>> [ivy:resolve]    shared: tried
>> [ivy:resolve]
>> C:\Users\pjenkins\.ant/shared/org.eclipse.jetty.orbit/javax.servlet/3.0.0.v201112011016/orbits/javax.servlet.orbit
>> [ivy:resolve]    public: tried
>> [ivy:resolve]
>> http://repo1.maven.org/maven2/org/eclipse/jetty/orbit/javax.servlet/3.0.0.v201112011016/javax.servlet-3.0.0.v201112011016.orbit
>> [ivy:resolve]   ::
>> [ivy:resolve]   ::  FAILED DOWNLOADS::
>> [ivy:resolve]   :: ^ see resolution messages for details  ^ ::
>> [ivy:resolve]   ::
>> [ivy:resolve]   ::
>> org.eclipse.jetty.orbit#javax.servlet;3.0.0.v201112011016!javax.servlet.orbit
>> [ivy:resolve]   ::
>> [ivy:resolve]
>> [ivy:resolve]
>> [ivy:resolve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
>>
>> Can anybody point me to the source of this error or a workaround?
>>
>> Thanks,
>> Tricia
>>
>
>


Re: How does Solr know which relative paths to use?

2012-10-16 Thread P Williams
Hi Dotan,

It seems that the examples now use Multiple Cores by default.  If your test
server is based on the stock example, you should see a solr.xml file under
your solr home (./solr relative to your CWD), which is how Solr knows about
the relative paths.  There should also be a README.txt file that will tell
you more about how the directory is expected to be organized.
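For reference, the stock solr.xml is tiny; it looks roughly like this (the
exact attributes vary a little between releases):

<solr persistent="true">
  <cores adminPath="/admin/cores" defaultCoreName="collection1">
    <!-- instanceDir is resolved relative to solr home, hence solr/collection1 -->
    <core name="collection1" instanceDir="collection1" />
  </cores>
</solr>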

Cheers,
Tricia

On Tue, Oct 16, 2012 at 3:50 PM, Dotan Cohen  wrote:

> I have just installed Solr 4.0 on a test server. I start it like so:
> $ pwd
> /some/dir
> $ java -jar start.jar
>
> The Solr Instance now looks like this:
> CWD
> /some/dir
> Instance
> /some/dir/solr/collection1
> Data
> /some/dir/solr/collection1/data
> Index
> /some/dir/solr/collection1/data/index
>
> From where did the additional relative paths 'collection1',
> 'collection1/data', and 'collection1/data/index' come from? I know
> that I can change the value of CWD with the -Dsolr.solr.home flag, but
> what affects the relative paths mentioned?
>
> Thanks.
>
>
> --
> Dotan Cohen
>
> http://gibberish.co.il
> http://what-is-what.com
>