Can the solr dataimporthandler consume an atom feed?

2014-03-21 Thread eShard
Good afternoon,
I'm using solr 4.0 Final.
I have an IBM Atom feed I'm trying to index, but it won't work.
There are no errors in the log.
All the other DIHs I've created consume RSS 2.0.
Does the DIH NOT work with an Atom feed?

here's my configuration:




(The configuration's XML markup was stripped by the archive. The surviving fragments show a DIH entity with url="https://[redacted]", processor="XPathEntityProcessor", forEach="/atom:feed/atom:entry", and transformer="DateFormatTransformer,TemplateTransformer". The field definitions did not survive.)
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Can-the-solr-dataimporthandler-consume-an-atom-feed-tp4126134.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Can the solr dataimporthandler consume an atom feed?

2014-03-24 Thread eShard
The only message I get is:
 Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.
Requests: 1, Skipped: 0

And there are no errors in the log.

Here's what the ibm atom feed looks like:


(The feed's XML markup was stripped by the archive. The surviving fragments show an Atom feed declaring the namespaces
  xmlns="http://www.w3.org/2005/Atom"
  xmlns:wplc="http://www.ibm.com/wplc/atom/1.0"
  xmlns:age="http://purl.org/atompub/age/1.0"
  xmlns:snx="http://www.ibm.com/xmlns/prod/sn"
  xmlns:lconn="http://www.ibm.com/lotus/connections/seedlist/atom/1.0"
with a rel="next" paging link
(https://[redacted]/files/seedlist/myserver?Action=GetDocuments&Range=2&Start=1000&Format=ATOM&Locale=en_US&State=U0VDT05EXzIwMTQtMDMtMTMgMTY6MjM6NTguODRfMjAxMS0wNi0wNiAwODowNDoxNC42MjJfNmQ1YzQ3MWMtYTM3ZS00ZjlmLWE0OGEtZWZjYjMyZjU2NDgzXzEwMDBfZmFsc2U%3D),
a feed title "Seedlist Service Backend" (author "System", subtitle "Files : 1,000 entries of Seedlist", category FILES),
and entries containing a document id (e.g. 7dd904f1-698d-4180-aa5a-a5b7d96405c9), a rel="via" link to the file
(title "Connections Install Draft.doc", type application/msword), the author's name and UUID, timestamps
(2010-07-13T10:06:34-04:00), a size (1491968), library and user UUIDs, and a visibility value (personalFiles).)



Re: Can the solr dataimporthandler consume an atom feed?

2014-03-24 Thread eShard
I confirmed the xpath is correct with a third party XPath visualizer.
/atom:feed/atom:entry parses the xml correctly.

Can anyone confirm or deny that the dataimporthandler can handle an atom
feed?





Re: Can the solr dataimporthandler consume an atom feed?

2014-03-24 Thread eShard
Ok, I found one typo:
the links need to be this: /atom:feed/atom:entry/atom:link/@href
But the import still doesn't work... :(

I guess I have to convert the feed over to RSS 2.0





Re: Can the solr dataimporthandler consume an atom feed?

2014-03-25 Thread eShard
Gora! It works now! 
You are amazing! thank you so much!
I dropped the atom: prefix from the xpath and everything is working.
I did have a typo that might have been causing issues, too.
thanks again!
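For anyone hitting the same wall: DIH's XPathEntityProcessor uses a streaming record reader that ignores XML namespaces, so Atom elements are addressed without the atom: prefix. A minimal sketch of the working shape (the url and field names here are hypothetical, not the poster's actual config):

```xml
<dataConfig>
  <dataSource type="URLDataSource" />
  <document>
    <!-- No namespace prefixes: the XPath record reader is namespace-unaware -->
    <entity name="ibmFeed"
            url="https://example.com/files/seedlist/myserver"
            processor="XPathEntityProcessor"
            forEach="/feed/entry"
            transformer="DateFormatTransformer,TemplateTransformer">
      <field column="id"    xpath="/feed/entry/id" />
      <field column="title" xpath="/feed/entry/title" />
      <field column="link"  xpath="/feed/entry/link/@href" />
    </entity>
  </document>
</dataConfig>
```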





How to exclude a mimetype in tika?

2014-03-26 Thread eShard
Good afternoon,
I'm using solr 4.0 Final
I need to exclude movies "hidden" in zip files from the index.
I can't filter movies on the crawler, because then I would have to exclude
all zip files.
I was told I can have tika skip the movies.
the details are escaping me at this point.
How do I exclude a file in the tika configuration?
I assume it's something I add in the update/extract handler but I'm not
sure.
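One commonly suggested direction is a custom Tika config that maps the video MIME types to Tika's EmptyParser, then pointing the extracting handler at it via its tika.config init parameter. A sketch, assuming a Tika version that supports the composite default-parser config (newer than the one bundled with Solr 4.0, so treat this as a direction rather than a drop-in):

```xml
<!-- tika-config.xml (hypothetical): keep defaults, but parse videos as empty -->
<properties>
  <parsers>
    <parser class="org.apache.tika.parser.DefaultParser"/>
    <parser class="org.apache.tika.parser.EmptyParser">
      <mime>video/mp4</mime>
      <mime>video/quicktime</mime>
      <mime>video/x-msvideo</mime>
    </parser>
  </parsers>
</properties>
```

Then in solrconfig.xml the /update/extract handler would reference it with something like `<str name="tika.config">tika-config.xml</str>` in its init args.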

Thanks,






How to build Solr4.0 Final?

2014-05-30 Thread eShard
Good morning,
My company uses Solr 4.0 Final and I need to add some code to it and
recompile.
However, when I rebuild, all of the jars and the war file say Solr 5.0!
I'm using the old build.xml file from 4.0, so I don't know why it's
automatically upgrading.

How do I force it to build the older version of Solr?

Thank you,






Re: How to build Solr4.0 Final?

2014-05-30 Thread eShard
Ok, I think I figured it out.
Somehow my Solr 4.0 Final project was accidentally updated to 5.0.
The solr/build.xml was fine;
the build.xml file at the top level was pointed at a 5.0-SNAPSHOT.

I need to pull down the 4.0 source and start from scratch.






Can the elevation component work with synonyms?

2014-06-06 Thread eShard
Good morning Solr compatriots,
I'm using Solr 4.0 Final and I have a SynonymFilterFactory referencing
synonyms.txt in my schema, applied at query time only.
(The field type's analyzer XML was stripped by the archive.)

However, when I try to call my /elevate handler, the synonyms are factored
in but none of the results say [elevated]=true.
I'm assuming this is because the elevation must be an exact match, and the
synonyms expand the query beyond that, so elevation is thwarted.
For example, if I have TV elevated and TV is also in synonyms.txt, then the
query gets expanded to text:TV text:television.

Is there any way to get the elevation to work correctly with synonyms?
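Since elevation keys on the raw query text, one workaround consistent with the behavior described above is to list each synonym variant as its own entry in elevate.xml. A sketch with hypothetical doc ids:

```xml
<elevate>
  <!-- One entry per surface form the user might type -->
  <query text="TV">
    <doc id="https://example.com/docs/tv-guide" />
  </query>
  <query text="television">
    <doc id="https://example.com/docs/tv-guide" />
  </query>
</elevate>
```

This duplicates maintenance effort but keeps the exact-match elevation and the query-time synonym expansion independent of each other.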

BTW
(I did find a custom synonym handler that works, but it will require
significant changes to the front end, and I'm not sure whether it will break if and
when we finally upgrade Solr.)
Here's the custom synonym filter (I had to drop the code in and rebuild
solr.war to get it to work):
https://github.com/healthonnet/hon-lucene-synonyms 






I need a replacement for the QueryElevation Component

2014-07-08 Thread eShard
Good morning to one and all,
I'm using Solr 4.0 Final and I've been struggling mightily with the
elevation component.
It is too limited for our needs; it doesn't handle phrases very well and I
need to have more than one doc with the same keyword or phrase.
So, I need a better solution. One that allows us to tag the doc with
keywords that clearly identify it as a promoted document would be ideal.
I tried using an external file field, but that only allows numbers, not
strings (please correct me if I'm wrong).
An EFF would be ideal if there were a way to make it take strings.
I also need an easy way to add these tags to specific docs.
If possible, I would like to avoid creating a separate elevation core but it
may come down to that...
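One commonly used alternative to the elevation component, sketched here with hypothetical field and keyword names: tag promoted documents with a multiValued keyword field at index time, then boost matches on that field at query time with edismax's bq parameter.

```xml
<!-- Hypothetical schema addition: a keyword field for tagging promoted docs -->
<field name="promo_keywords" type="string" indexed="true" stored="true"
       multiValued="true" />
```

A query would then look something like `/select?q=widgets&defType=edismax&qf=text&bq=promo_keywords:"widgets"^100`. Unlike elevate.xml, this handles phrases, allows many docs per keyword, and the tags can be updated per document, though it boosts rather than guarantees top placement.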

Thank you, 






Configuration and specs to index a 1 terabyte (TB) repository

2013-10-29 Thread eShard
Good morning,
I have a 1 TB repository with approximately 500,000 documents (that will
probably grow from there) that needs to be indexed.  
I'm limited to Solr 4.0 final (we're close to beta release, so I can't
upgrade right now) and I can't use SolrCloud because work currently won't
allow it for some reason.

I found this configuration from this link:
http://lucene.472066.n3.nabble.com/Can-Apache-Solr-Handle-TeraByte-Large-Data-td3656484.html#a3657056
 
He said he was able to index 1 TB on a single server with 40 cores and 128
GB of RAM with 10 shards.

Is this my only option? Or is there a better configuration?
Is there some formula for calculating server specifications (this much data
and documents equals this many cores, RAM, hard disk space etc)?

Thanks,






Re: Configuration and specs to index a 1 terabyte (TB) repository

2013-10-29 Thread eShard
Wow, thanks for your response.
You raise a lot of great questions; I wish I had the answers!
We're still trying to get enough resources to finish crawling the
repository, so I don't even know what the final size of the index will be.
I've thought about excluding the videos and other large files and using a
data import handler to just send the meta data but there are problems no
matter where I turn.  
I'm taking what you said back to the server team for deliberation.
Thanks again for your insights





Re: Configuration and specs to index a 1 terabyte (TB) repository

2013-10-29 Thread eShard
P.S. 
Offhand, how do I control how much of the index is held in RAM?
Can you point me in the right direction?
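For what it's worth, with Lucene's default memory-mapped directory there is no Solr knob for "how much of the index is in RAM": the OS page cache holds recently read index pages, so the effective cache is whatever physical memory is left after the JVM heap. The relevant (stock) solrconfig.xml setting, shown as a sketch:

```xml
<!-- solrconfig.xml: MMapDirectory lets the OS page cache hold the index.
     Sizing "index in RAM" therefore means leaving free memory outside the
     JVM heap, not raising the heap. -->
<directoryFactory name="DirectoryFactory"
                  class="${solr.directoryFactory:solr.MMapDirectoryFactory}"/>
```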
Thanks,





Re: Configuration and specs to index a 1 terabyte (TB) repository

2013-10-30 Thread eShard
Wow again! 
Thank you all very much for your insights.  
We will certainly take all of this under consideration.

Erik: I want to upgrade but unfortunately, it's not up to me. You're right,
we definitely need to do it.  
And SolrJ sounds interesting, thanks for the suggestions.

By the way, is there a Solr upgrade guide out there anywhere?


Thanks again!







Can I combine standardtokenizer with solr.WordDelimiterFilterFactory?

2013-11-01 Thread eShard
Good morning,
Here's the issue: 
I have an ID that consists of two letters and a number.
The whole user title looks like this: Lastname, Firstname (LA12345).
Now, with my current configuration, I can search for LA12345 and find the
user. 
However, when I type in just the number I get zero results.
If I put a wildcard in (*12345) I find the correct record.  
The problem is I changed that user title field to use the
WordDelimiterFilterFactory, and it seems to work.
However, I also copy that field into the text field, which just uses the
StandardTokenizer, and there I lose the ability to search for 12345 without a
wildcard.
My question is: can (or should) I put the WordDelimiterFilterFactory in with the
StandardTokenizer in the text field?
Or should I just use one or the other?
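Yes, the two can be combined in one analyzer chain. A sketch (field type name and attribute choices are illustrative, not from the poster's schema):

```xml
<fieldType name="text_id" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- splitOnNumerics breaks "LA12345" into "LA" + "12345";
         preserveOriginal keeps the whole token searchable as well -->
    <filter class="solr.WordDelimiterFilterFactory"
            splitOnNumerics="1" preserveOriginal="1"
            generateWordParts="1" generateNumberParts="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

One caveat: WordDelimiterFilter is more often paired with WhitespaceTokenizer, because StandardTokenizer already strips much of the punctuation the filter would otherwise split on; test both in the analysis page before committing to one.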
Thank you,






How to get phrase recipe working?

2014-01-21 Thread eShard
Good morning,
In the Apache Solr 4 Cookbook (p. 112) there is a recipe for setting up
phrase searches.
(The field type XML from the recipe was stripped by the archive.)
I ran a sample query q=text_ph:"a-z index" and it didn't work very well at
all.
Is there a better way to do phrase searches? 
I need a specific configuration to follow/use.
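A minimal sketch of the usual shape for a phrase field, assuming the goal is exact phrase matching (names here are illustrative; the cookbook's exact recipe differs in details): no stemming or stopword removal, so token positions line up with what the user typed.

```xml
<fieldType name="text_ph" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- no stemmer, no stopwords: positions must match the literal phrase -->
  </analyzer>
</fieldType>
<field name="text_ph" type="text_ph" indexed="true" stored="false"/>
<copyField source="text" dest="text_ph"/>
```

Note that with a whitespace tokenizer, "a-z" stays one token, so q=text_ph:"a-z index" should match documents containing that literal pair.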
Thanks,





Re: ODP: How to get phrase recipe working?

2014-01-21 Thread eShard
Thanks, I'll remove the snowball filter and give it try.
I guess I'm looking for an exact phrase match to start. (Is that the
standard phrase search?)
Is there something better or more versatile?
Btw, great job on the book!





Is there a way to get Solr to delete an uploaded document after its been indexed?

2014-01-30 Thread eShard
Hi,
My crawler uploads all the documents it sends to Solr for indexing into a
tomcat/temp folder.
Over time this folder grows so large that I run out of disk space.
So, I wrote a bash script to delete the files and put it in the crontab.
However, if I delete the docs too soon, they don't get indexed; too late, and
I run out of disk.
I'm still trying to find the right window...
So, (and this is probably a long shot)  I'm wondering if there's anything in
Solr that can delete these docs from /temp after they've been indexed...

Thank you,






SEVERE: org.apache.solr.common.SolrException: no field name specified in query and no default specified via 'df' param

2014-03-02 Thread eShard
Hi,
I'm using Solr 4.0 Final (yes, I know I need to upgrade)

I'm getting this error:
SEVERE: org.apache.solr.common.SolrException: no field name specified in
query and no default specified via 'df' param

And I applied this fix: https://issues.apache.org/jira/browse/SOLR-3646 
And unfortunately, the error persists.
I'm using a multi shard environment and the error is only happening on one
of the shards.
I've already updated about half of the other shards with the missing default
text in /browse but the error persists on that one shard.
Can anyone tell me how to make the error go away?
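For reference, the df default lives in each handler's defaults in solrconfig.xml; a sketch assuming the catch-all field is named text:

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- default field used when a query term has no field prefix -->
    <str name="df">text</str>
  </lst>
</requestHandler>
```

Every handler that can receive a field-less query needs the entry, and the core must be reloaded (or the container restarted) before it takes effect.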

Thanks,





RegexTransformer and xpath in DataImportHandler

2014-03-03 Thread eShard
Good afternoon,
I have this DIH:




(The DIH XML was stripped by the archive. The surviving fragments show an entity with url="https://redacted/", processor="XPathEntityProcessor", forEach="/rss/channel/item", and transformer="DateFormatTransformer,TemplateTransformer,RegexTransformer". The field definitions did not survive.)

I can't seem to populate BOTH blogtitle and short_blogtitle with the same
xpath.
I can only do one or the other; why can't I put the same xpath in 2
different fields?
I removed the short_blogtitle (with the xpath statement) and left in the
regex statement and blogtitle gets populated and short_blogtitle goes to my
update.chain (to the auto complete index) but the field itself is blank in
this index.

If I leave the dih as above, then blogtitle doesn't get populated but
short_blogtitle does.

What am I doing wrong here? Is there a way to populate both? 
And I CANNOT use copyfield here because then the update.chain won't work
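One commonly suggested pattern for this: read the xpath into one column, then derive the second column with RegexTransformer's sourceColName attribute instead of repeating the xpath. A sketch with hypothetical xpath and regex:

```xml
<!-- First field reads the xpath; second field is derived from it, so the
     same source value ends up in two DIH columns without copyField -->
<field column="blogtitle"       xpath="/rss/channel/item/title"/>
<field column="short_blogtitle" sourceColName="blogtitle" regex="^(.{0,50})"/>
```

Since short_blogtitle is a real field on the document, it should still flow through the update.chain to the autocomplete index.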

Thanks,







Re: SEVERE: org.apache.solr.common.SolrException: no field name specified in query and no default specified via 'df' param

2014-03-05 Thread eShard
Hi Erick,
  Let me make sure I understand you:
I'm NOT running SolrCloud; so I just have to put the default field in ALL of
my solrconfig.xml files and then restart and that should be it?
Thanks for your reply,







Re: SEVERE: org.apache.solr.common.SolrException: no field name specified in query and no default specified via 'df' param

2014-03-05 Thread eShard
Ok, I updated all of my solrconfig.xml files and I restarted the tomcat
server
AND the errors are still there on 2 out of 10 cores
Am I not reloading correctly?

Here's my /browse handler:
 
 
(The XML element tags were stripped by the archive; only parameter values survive. They appear to match the stock Solritas /browse example: echoParams explicit; VelocityResponseWriter settings (velocity, browse, layout, Solritas); defType edismax with the boosts "text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4" for both qf and pf, df text, mm 100%, q.alt *:*, rows 10, fl *,score; highlighting over text,features,name,sku,id,manu,cat; faceting on cat, manu_exact and cat,inStock, with ranges over price (0 to 600, gap 50), popularity (0 to 10, gap 3), and manufacturedate_dt (NOW/YEAR-10YEARS to NOW, gap +1YEAR); spellcheck defaults; and last-components spellcheck and manifoldCFSecurity.)





Why does the q parameter change?

2014-09-25 Thread eShard
Good afternoon all,
I just implemented a phrase search and the parsed query gets changed from
rapid prototyping to rapid prototype. 
I used the Solr analysis page and "prototyping" was unchanged, so I think I've
ruled out the tokenizer.
So can anyone tell me what's going on?
Here's the query:
q=rapid prototyping&defType=edismax&qf=text&pf2=text^40&ps=0

here's the debugger:
as you can see; prototyping gets changed to just prototype. What's causing
this and how do I turn it off?
Thanks,



(The response XML tags were stripped by the archive; the surviving debug values are:)
  q / querystring: rapid prototyping
  parsedquery: (+((DisjunctionMaxQuery((text:rapid))
    DisjunctionMaxQuery((text:prototype)))~2) DisjunctionMaxQuery((text:"rapid
    prototype"^40.0)))/no_coord
  parsedquery_toString: +(((text:rapid) (text:prototype))~2)
    (text:"rapid prototype"^40.0)
  QParser: ExtendedDismaxQParser





Re: Why does the q parameter change?

2014-09-25 Thread eShard
Ok, I think I'm on to something.
I omitted this parameter which means it is set to false by default on my
text field.
I need to set it to true and see what happens...
autoGeneratePhraseQueries="true"
If I'm reading the wiki right, this parameter if true will preserve phrase
queries...







Re: Why does the q parameter change?

2014-09-25 Thread eShard
No, apparently it's the KStemFilter.
should I turn this off at query time?
I'll put this in another question...






Best practice for KStemFilter query or index or both?

2014-09-25 Thread eShard
Good afternoon,
Here's my configuration for a text field.
I have the same configuration for index and query time.
Is this valid? 
What's the best practice for these query or index or both?
for synonyms; I've read conflicting reports on when to use it but I'm
currently changing it over to at indexing time only.
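The conventional arrangement (a sketch, not the poster's actual schema): stemming stays in BOTH the index and query chains so that query terms match what was indexed, while synonym expansion moves to index time only.

```xml
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- stemming must mirror the index chain, or queries miss stemmed terms -->
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
</fieldType>
```

Stemming only one side creates mismatches (an indexed "prototype" never matches a query-time "prototyping" left unstemmed), which is why it is normally symmetric. Index-only synonyms avoid multi-word query-expansion quirks at the cost of reindexing when synonyms.txt changes.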

Thanks,

(The text field's analyzer XML was stripped by the archive; per the message above, the index-time and query-time chains were identical.)



recip function error

2014-10-23 Thread eShard
Good evening,
I'm using solr 4.0 Final.
I tried using this function
boost=recip(ms(NOW/HOUR,startdatez,3.16e-11.0,0.08,0.05))
but it fails with this error:
org.apache.lucene.queryparser.classic.ParseException: Expected ')' at
position 29 in 'recip(ms(NOW/HOUR,startdatez,3.16e-11.0,0.08,0.05))'

I applied this patch https://issues.apache.org/jira/browse/SOLR-3522 
Rebuilt and redeployed AND I get the exact same error.
I only copied over the new jars and war file; none of the other libraries
seemed to have changed.
The patch is in Solr core, so I figured I was safe.

Does anyone know how to fix this?

Thanks,







Re: recip function error

2014-10-23 Thread eShard
Thanks; we're planning on going to 4.10.1 in a few months.
I discovered that recip only works with dismax; I use edismax by default.
does anyone know why I can't use recip with edismax??

I hope this is fixed in 4.10.1...


Thanks,





Re: recip function error

2014-10-24 Thread eShard
Thank you very much for your replies.
I discovered there was a typo in the function I was given:
one of the parentheses was in the wrong spot.
It should be this:
boost=recip(ms(NOW/HOUR,general_modifydate),3.16e-11,0.08,0.05)

And now it works with edismax! Strange...
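To make the misplaced parenthesis concrete: recip(x, m, a, b) computes a / (m*x + b), so the three constants belong to recip, not to ms. With ms() returning document age in milliseconds, the working boost can be checked numerically:

```python
def recip(x, m, a, b):
    """Solr's recip() function source: a / (m * x + b)."""
    return a / (m * x + b)

# ms(NOW/HOUR, general_modifydate) yields the document age in milliseconds
MS_PER_YEAR = 365.25 * 24 * 3600 * 1000

# boost=recip(ms(NOW/HOUR,general_modifydate),3.16e-11,0.08,0.05)
fresh    = recip(0, 3.16e-11, 0.08, 0.05)            # just-modified doc
year_old = recip(MS_PER_YEAR, 3.16e-11, 0.08, 0.05)  # one-year-old doc
```

In the broken version, all three constants were passed to ms(), which only accepts date arguments, hence the parse error at the stray comma.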

Thanks again,





Is there a way to capture div tag by id?

2013-06-25 Thread eShard
Let's say I have a div with id="myDiv".
Is there a way to set up the Solr update/extract handler to capture just that
particular div?





how to improve (keyword) relevance?

2013-07-22 Thread eShard
Good morning,
I'm currently running Solr 4.0 final (multi core) with manifoldcf v1.3 dev
on tomcat 7.
Early on, I used copyfield to put the meta data into the text field to
simplify solr queries (i.e. I only have to query one field now.)
However, a lot of people are concerned about improving relevance.
I found a relevancy solution on page 298 of the Apache Solr 4.0 Cookbook;
however is there a way to modify it so it only uses one field? (i.e. the
text field?) 

(Note well: I have multi cores and the schemas are all somewhat different;
If I can't get this to work with one field then I would have to build
complex queries for all the other cores; this would vastly over complicate
the UI. Is there another way?)
here's the requesthandler in question:

(Most element tags were stripped by the archive. The surviving fragments, with the run-together local-param spaces restored, are:)

  <lst name="defaults">   (rendered as "<1st ...>" in the archive)
  true
  _query_:"{!edismax qf=$qfQuery mm=$mmQuery pf=$pfQuery bq=$boostQuery v=$mainQuery}"
  name^10 description
  1
  name description
  _query_:"{!edismax qf=$boostQuerQf mm=100% v=$mainQuery}"^10



Re: how to improve (keyword) relevance?

2013-07-22 Thread eShard
Sure, let's say the user types in test pdf;
we need the results with all the query words to be near the top of the
result set.
the query will look like this: /select?q=text%3Atest+pdf&wt=xml

How do I ensure that the top resultset contains all of the query words?
How can I boost the first (or second) term when they are both the same field
(i.e. text)?

Does this make sense?

Please bear with me; I'm still new to the solr query syntax so I don't even
know if I'm asking the right question. 
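A sketch of the usual edismax answer to "all query words near the top" over a single field (parameter values here are illustrative): mm controls how many terms must match, and pf boosts documents where the terms occur together as a phrase in the same field.

```
/select?q=test pdf&defType=edismax&qf=text&mm=100%&pf=text^10&wt=xml
```

With mm=100% only documents containing every term are returned at all; relaxing mm (e.g. mm=1) returns partial matches too, while the pf boost still floats the docs containing the whole phrase toward the top. Since only the text field is involved, this works unchanged across cores whose schemas differ elsewhere.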

Thanks,





How to parse multivalued data into single valued fields?

2013-08-07 Thread eShard
Hi,
I'm currently using solr 4.0 final with Manifoldcf v1.3 dev.
I have multivalued titles (the names are all the same so far) that must go
into a single valued field.
Can a transformer do this?
Can anyone show me how to do it?

And this has to fire off before an update chain takes place.
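One option worth checking (assuming field names; the chain name is hypothetical): Solr 4.0 ships FirstFieldValueUpdateProcessorFactory, which collapses a multivalued input to its first value, and update processors run in order, so it can fire before the autocomplete step in the same chain.

```xml
<updateRequestProcessorChain name="community-chain">
  <!-- keep only the first of the repeated, identical titles -->
  <processor class="solr.FirstFieldValueUpdateProcessorFactory">
    <str name="fieldName">community_title</str>
  </processor>
  <!-- ...autocomplete/SolrAC processor would go here, after the cleanup... -->
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```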

Thanks,





Re: How to parse multivalued data into single valued fields?

2013-08-08 Thread eShard
Ok, I have one index called Communities from an RSS feed.
each item in the feed has multiple titles (which are all the same for this
feed) 
So, the title needs to be cleaned up before it is put into the community
index
let's call the field community_title;
And then an UpdateProcessorChain needs to fire and it takes community_title
and puts it into another index for auto completion suggestions called
SolrAC.

Does that make sense?





Indexoutofbounds size: 9 index: 8 with data import handler

2013-08-15 Thread eShard
Good morning,
I'm using solr 4.0 final on tomcat 7.0.34 on linux
I created 3 new data import handlers to consume 3 RSS feeds.
They seemed to work perfectly.
However, today, I'm getting these errors:
10:42:17  SEVERE  SolrCore            java.lang.IndexOutOfBoundsException: Index: 9, Size: 8
10:42:17  SEVERE  SolrDispatchFilter  null:java.lang.IndexOutOfBoundsException: Index: 9, Size: 8
10:42:17  SEVERE  SolrCore            org.apache.solr.common.SolrException: Server at
https://search:7443/solr/Communities returned non ok status:500, message:Internal Server Error
10:42:17  SEVERE  SolrDispatchFilter  null:org.apache.solr.common.SolrException: Server at
https://search/solr/Communities returned non ok status:500, message:Internal Server Error

I read that this error can mean the index is corrupt, so I deleted the index and
restarted, and then the same errors jumped to the next core with a DIH for an RSS feed.

How do I fix this?

Here's my DIH registration in solrconfig.xml (element tags stripped by the archive; the surviving values are the config file dih-comm-feed.xml and an update.chain of SemaAC).

Here's the DIH config (likewise stripped; the surviving fragments show an entity with url="https://search/C3CommunityFeedDEV/", processor="XPathEntityProcessor", forEach="/rss/channel/item", and transformer="DateFormatTransformer").

Here's a partial of my schema (the field definitions were stripped by the archive).





Re: Indexoutofbounds size: 9 index: 8 with data import handler

2013-08-15 Thread eShard
Ok, these errors seem to be caused by passing incorrect parameters in a
search query.
Such as: spellcheck=extendedResults=true 
instead of 
spellcheck.extendedResults=true

Thankfully, it seems to have nothing to do with the DIH at all.





Re: DIH : Unexpected character '=' (code 61); expected a semi-colon after the reference for entity 'st'

2013-08-25 Thread eShard
I just resolved this same error.
The problem was that I had a lot of ampersands (&) that were un-escaped in
my XML doc
There was nothing wrong with my DIH; it was the xml doc it was trying to
consume.
I just used StringEscapeUtils.escapeXml from Apache Commons Lang to resolve it...
Another big help was the Eclipse XML validation engine. 
Just add your doc to an existing project and right click anywhere on the doc
and select validate from the menu.
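For anyone pre-processing feeds in Python rather than Java, the standard library has an equivalent to the escapeXml fix (a sketch; it covers the predefined XML entities, including quotes for attribute values):

```python
from xml.sax.saxutils import escape

def escape_xml(raw):
    """Escape &, <, > plus double/single quotes for safe embedding in XML."""
    return escape(raw, {'"': "&quot;", "'": "&apos;"})
```

Run the raw field values through this before assembling the XML the DIH will consume; an unescaped & followed by text and "=" is exactly what produces the "expected a semi-colon after the reference" parse error.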







Can a data import handler grab all pages of an RSS feed?

2013-08-26 Thread eShard
Good morning,
I have an IBM Portal atom feed that spans multiple pages.
Is there a way to instruct the DIH to grab all available pages?
I can put a huge range in but that can be extremely slow with large amounts
of XML data.
I'm currently using Solr 4.0 final.

Thanks,





QueryElevationComponent results only show up with debug = true

2013-08-30 Thread eShard
Hi,
I'm using solr 4.0 final built around Dec 2012.
I was initially told that the QEC didn't work for distributed search but
apparently it was fixed.
Anyway, I use the /elevate handler with [elevated] in the field list and I
don't get any elevated results.
elevated=false in the result block.
however, if I turn on debugQuery; the elevated result appears in the debug
section under queryBoost.
Is this the only way you can get elevated results?
Because before (and I can't remember if this was before or after I went to
4.0 Final) I would get the elevated results mixed in with the "regular"
results in the result block.
elevated=true was the only way to tell them apart.
I also tried forceElevation, enableElevation, exclusive but there is still
no elevated results in the result block.
What am I doing wrong?
query:
http://localhost:8080/solr/Profiles/elevate?q=gangnam+style&fl=*,[elevated]&wt=xml&start=0&rows=100&enableElevation=true&forceElevation=true&df=text&qt=edismax&debugQuery=true
Here's my config:
  

(The XML was stripped by the archive. The surviving fragments show a queryElevationComponent configured with queryFieldType text_general and config-file elevate.xml; an /elevate request handler with defaults echoParams explicit and df text, plus a last-components entry for elevator; and an elevate.xml entry pointing at the doc id https://server/profiles/html/profileView.do?key=ec6de388-fb7a-4397-9ed5-51760e4333d3.)



Re: QueryElevationComponent results only show up with debug = true

2013-08-30 Thread eShard
Sure,
Here are the results with the debugQuery=true; with debugging off, there are
no results.
The elevated result appears in the queryBoost section but not in the result
section:


  
0
0

  true
  xml
  100
  *,[elevated]
  text
  true
  0
  gangnam
  true
  edismax

  
  
  

  gangnam
  

   
https://server/profiles/html/profileView.do?key=ec6de388-fb7a-4397-9ed5-51760e4333d3
  

rawquerystring: gangnam
querystring: gangnam
parsedquery: (text:gangnam
((id:https://server/profiles/html/profileView.do?key=ec6de388-fb7a-4397-9ed5-51760e4333d3)^0.0))/no_coord
parsedquery_toString: text:gangnam
((id:https://server/profiles/html/profileView.do?key=ec6de388-fb7a-4397-9ed5-51760e4333d3)^0.0)
QParser: LuceneQParser

  (debug timing breakdown elided: the time, prepare, and process entries for
  every search component were 0.0)

  

  





--
View this message in context: 
http://lucene.472066.n3.nabble.com/QueryElevationComponent-results-only-show-up-with-debug-true-tp4087531p4087554.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: QueryElevationComponent results only show up with debug = true

2013-08-30 Thread eShard
I can guarantee you that the ID is unique and it exists in that index.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/QueryElevationComponent-results-only-show-up-with-debug-true-tp4087531p4087565.html
Sent from the Solr - User mailing list archive at Nabble.com.


how to manually update a field in the index without re-crawling?

2013-10-01 Thread eShard
Good morning,
I'm currently using Solr 4.0 Final.
I indexed a website and it took over 24 hours to crawl.
I just realized I need to rename one of the fields (or add a new one),
so I added the new field to the schema.
But how do I copy the data over from the old field to the new field without
recrawling everything?

Is this possible?

I was thinking about maybe putting an update chain processor in the /update
handler but I'm not sure that will work.
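For the update-chain idea, a minimal sketch using Solr 4.x's stock CloneFieldUpdateProcessorFactory might look like the following. Note the caveats: old_field/new_field are placeholder names, and the chain only acts on documents as they pass through /update, so the existing documents would still have to be re-fed (e.g. exported from the index, which is only possible if all fields are stored):

```xml
<updateRequestProcessorChain name="clone-old-field">
  <!-- copy the value of old_field into new_field on each incoming document -->
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">old_field</str>
    <str name="dest">new_field</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<!-- attach the chain to the /update handler -->
<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">clone-old-field</str>
  </lst>
</requestHandler>
```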

Thanks,




--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-manually-update-a-field-in-the-index-without-re-crawling-tp4092955.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 4.0 is stripping XML format from RSS content field

2013-10-01 Thread eShard
If anyone is interested, I managed to resolve this a long time ago.
I used a Data Import Handler instead and it worked beautifully.
DIHs are very forgiving: they take whatever XML data is there and inject
it into the Solr index.
It's a lot faster than crawling, too.
You use XPath to map the fields to your schema.
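A minimal DIH data-config of the kind described here might look like this — the feed URL and field names are placeholders, not the poster's actual configuration:

```xml
<dataConfig>
  <dataSource type="URLDataSource" encoding="UTF-8"/>
  <document>
    <entity name="rss"
            url="http://example.com/feed.xml"
            processor="XPathEntityProcessor"
            forEach="/rss/channel/item">
      <!-- XPath expressions map feed elements onto schema fields -->
      <field column="id"    xpath="/rss/channel/item/link"/>
      <field column="title" xpath="/rss/channel/item/title"/>
      <field column="text"  xpath="/rss/channel/item/description"/>
    </entity>
  </document>
</dataConfig>
```

The file is registered under a DataImportHandler request handler in solrconfig.xml and run with /dataimport?command=full-import.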



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-0-is-stripping-XML-format-from-RSS-content-field-tp4039809p4092961.html
Sent from the Solr - User mailing list archive at Nabble.com.


detailed Error reporting in Solr

2013-04-04 Thread eShard
Good morning,
I'm currently running Solr 4.0 Final with Tika v1.2 and ManifoldCF v1.2 dev,
and I'm battling Tika XML parse errors again.
Solr reports this error: org.apache.solr.common.SolrException:
org.apache.tika.exception.TikaException: XML parse error, which is too vague.
I had to manually run the link through the Tika app and got a much more
detailed error:
Caused by: org.xml.sax.SAXParseException; lineNumber: 4; columnNumber: 105;
The entity "nbsp" was referenced, but not declared.
So there are old-school non-breaking spaces in the HTML that Tika can't handle,
for example: Cyber Systems and Technology ›

My question is twofold:
1) how do I get Solr to report more detailed errors, and
2) how do I get Tika to accept (or ignore) nbsp?

thanks,




--
View this message in context: 
http://lucene.472066.n3.nabble.com/detailed-Error-reporting-in-Solr-tp4053821.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: detailed Error reporting in Solr

2013-04-04 Thread eShard
Ok, one possible fix is to declare the XML equivalent of nbsp in the
document's DOCTYPE, which is:

<!DOCTYPE html [
<!ENTITY nbsp "&#160;">
]>

but how do I add this into the Tika configuration?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/detailed-Error-reporting-in-Solr-tp4053821p4053823.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: detailed Error reporting in Solr

2013-04-04 Thread eShard
Yes, that's it exactly.
I crawled a link with these (›) in each list item; Solr couldn't
handle it, threw the XML parse error, and the crawler terminated the job.

Is this fixable? Or do I have to submit a bug to the Tika folks?

Thanks,




--
View this message in context: 
http://lucene.472066.n3.nabble.com/detailed-Error-reporting-in-Solr-tp4053821p4053882.html
Sent from the Solr - User mailing list archive at Nabble.com.


How to configure shards with SSL?

2013-04-09 Thread eShard
Good morning everyone,
I'm running Solr 4.0 Final with ManifoldCF v1.2dev on Tomcat 7.0.37. I
had shards up and running over HTTP, but when I migrated to SSL it stopped
working.
First I got an IOException, but then I changed my configuration in
solrconfig.xml to this:
   
 
   explicit
   xml
   true
   *:*

id, solr.title, content, category, link, pubdateiso

dev:7443/solr/ProfilesJava/|dev:7443/solr/C3Files/|dev:7443/solr/Blogs/|dev:7443/solr/Communities/|dev:7443/solr/Wikis/|dev:7443/solr/Bedeworks/|dev:7443/solr/Forums/|dev:7443/solr/Web/|dev:7443/solr/Bookmarks/
 

 
 
https://
1000
5000
   

  

And now I'm getting this error:
org.apache.solr.client.solrj.SolrServerException: No live SolrServers
available to handle this request
How do I configure shards with SSL?
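The XML tags of the configuration above were stripped by the archive. Reconstructing its intent, a distributed handler with explicit shard-handler timeouts can be sketched like this — the handler name and abbreviated shard list are illustrative; connTimeout and socketTimeout are the HttpShardHandlerFactory parameters documented for Solr 4.x:

```xml
<requestHandler name="/shardsearch" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="wt">xml</str>
    <str name="q">*:*</str>
    <str name="fl">id, solr.title, content, category, link, pubdateiso</str>
    <!-- shards are pipe-separated host:port/core entries -->
    <str name="shards">dev:7443/solr/ProfilesJava/|dev:7443/solr/C3Files/|dev:7443/solr/Web/</str>
  </lst>
  <shardHandlerFactory class="HttpShardHandlerFactory">
    <int name="connTimeout">1000</int>
    <int name="socketTimeout">5000</int>
  </shardHandlerFactory>
</requestHandler>
```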
Thanks,



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-configure-shards-with-SSL-tp4054735.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to configure shards with SSL?

2013-04-10 Thread eShard
Ok, 
We figured it out:
The cert wasn't in the trusted CA keystore. I know we put it in there
earlier; I don't know why it was missing.
But we added it in again and everything works as before.

Thanks,




--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-configure-shards-with-SSL-tp4054735p4055064.html
Sent from the Solr - User mailing list archive at Nabble.com.


relevance when merging results

2013-04-26 Thread eShard
Hi,
I'm currently using Solr 4.0 Final on Tomcat v7.0.3x.
I have 2 cores (let's call them A and B) and I need to combine them as one
for the UI.
However, we're having trouble deciding how to best merge these two result
sets. Currently, I'm using relevancy to do the merge.
For example, I search for "red" in both cores.
Core A has a max score of .919856 with 87 results.
Core B has a max score of .6532563 with 30 results.

I would like to simply merge numerically, but I don't know if that's valid.
If I merge in numerical order, then Core B results won't appear until element
25 or later.

I initially thought about just taking the top 5 results from each and
layering one on top of the other.

Is there a best practice out there for merging relevancy?
Please advise...
Thanks,




--
View this message in context: 
http://lucene.472066.n3.nabble.com/relevance-when-merging-results-tp4059275.html
Sent from the Solr - User mailing list archive at Nabble.com.


How to store the document folder path in solr?

2013-05-15 Thread eShard
Good afternoon,
I'm using solr 4.0 final with manifoldcf v1.2dev on tomcat 7.0.34
Today, a user asked a great question: what if you only know the name of the
folder that the documents are in?
Can you just search on the folder name?
Currently, I'm only indexing documents; how do I capture the folder name (or
full path) and store it?
I have a variety of repositories: web, RSS, Livelink (I can get the folder
hierarchy for this). I guess indexing a file share would be straightforward,
with the path readily available, but I haven't been asked to index those yet.
I'll try to run some tests on network file shares...

Can anyone point me in the right direction?

Thanks,




--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-store-the-document-folder-path-in-solr-tp4063581.html
Sent from the Solr - User mailing list archive at Nabble.com.


How to aggregate data in solr 4.0?

2013-05-15 Thread eShard
Good afternoon,
Does anyone know of a good tutorial on how to perform SQL-like aggregation
in Solr queries?
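For context, the closest built-in analogues to SQL aggregation in Solr 4.0 are faceting (roughly GROUP BY / COUNT) and the StatsComponent (SUM/MIN/MAX/AVG over a numeric field). The core and field names below are placeholders:

```text
# GROUP BY category, COUNT(*) per value:
http://localhost:8080/solr/mycore/select?q=*:*&rows=0&facet=true&facet.field=category

# SUM/MIN/MAX/AVG over a numeric field:
http://localhost:8080/solr/mycore/select?q=*:*&rows=0&stats=true&stats.field=price
```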

Thanks,




--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-aggregate-data-in-solr-4-0-tp4063584.html
Sent from the Solr - User mailing list archive at Nabble.com.


how do I capture tags?

2013-06-24 Thread eShard
I'm currently running Solr 4.0 Final with ManifoldCF 1.3 dev on Tomcat 7.
I need to capture the "h1" tags on each web page, as that is the true
"title", for lack of a better word.
I can't seem to get it to work at all.
I read the instructions, used the capture option, and mapped it to
a field named h1 in the schema.
Here's my update/extract handler:



  text
  solr.title
  solr.name
  h1
  h1
  
  comments
  
  last_modified
  attr_
  true
  

Can anyone tell me what I'm doing wrong?
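The handler above lost its XML tags in the archive. Using the standard ExtractingRequestHandler (Solr Cell) parameters, an h1-capture setup can be sketched like this — the field mappings are illustrative, not the poster's exact file:

```xml
<requestHandler name="/update/extract"
                class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- map the extracted body text onto the schema's text field -->
    <str name="fmap.content">text</str>
    <!-- capture h1 elements into their own field -->
    <str name="capture">h1</str>
    <str name="fmap.h1">h1</str>
    <str name="captureAttr">true</str>
    <str name="lowernames">true</str>
    <!-- unknown fields land in attr_* instead of failing -->
    <str name="uprefix">attr_</str>
  </lst>
</requestHandler>
```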



--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-do-I-capture-h1-tags-tp4072792.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how do I capture tags?

2013-06-24 Thread eShard
Ok, I figured it out:
you need to add this too:

true



--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-do-I-capture-h1-tags-tp4072792p4072798.html
Sent from the Solr - User mailing list archive at Nabble.com.


Too many Tika errors

2012-12-11 Thread eShard
I'm running Solr 4.0 on Tomcat 7.0.8 and I'm running the solr/example single
core as well with manifoldcf v1.1
I had everything working, but then the crawler stopped and I found Tika
errors in the Solr log.
With Tika 1.1 I got these errors:
org.apache.solr.common.SolrException:
org.apache.tika.exception.TikaException: Unexpected RuntimeException from
org.apache.tika.parser.microsoft.OfficeParser@17bc9c03

So I upgraded to Tika 1.2 and again everything seemed to be working (I
indexed 24,000 files); then I recrawled the repository and again it stopped.
This time the Tika errors are:
null:java.lang.RuntimeException: java.lang.NoClassDefFoundError:
org/mozilla/universalchardet/CharsetListener at
org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:456)

What's going on here? What version of tika should I use?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Too-many-Tika-errors-tp4026126.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Too many Tika errors

2012-12-12 Thread eShard
Ok, I managed to fix it: the universal charset error is caused by a missing
dependency. Just download universalchardet-1.0.3.jar and put it in your
extraction lib.

The Microsoft errors will probably be fixed in a future release of the POI
jars (v3.9 didn't fix this error).



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Too-many-Tika-errors-tp4026126p4026347.html
Sent from the Solr - User mailing list archive at Nabble.com.


solr invalid date string

2013-01-08 Thread eShard
I'm currently running Solr 4.0 alpha with ManifoldCF v1.1 dev.
ManifoldCF is sending Solr the datetime as milliseconds elapsed since
1970-01-01.
I've tried setting several date.formats in the extraction handler, but I
always get this error, and the ManifoldCF crawl aborts:
SolrCore org.apache.solr.common.SolrException: Invalid Date
String:'134738361' at
org.apache.solr.schema.DateField.parseMath(DateField.java:174) at
org.apache.solr.schema.TrieField.createField(TrieField.java:540)

here's my extraction handler:
<requestHandler name="/update/extract"
class="org.apache.solr.handler.extraction.ExtractingRequestHandler">

  text
  solr.title
  solr.name
  link
  pubdate
  summary
  comments
  published
  
  last_modified
  attr_
  true
  ignored_

 
  yyyy-MM-dd
  yyyy-MM-dd'T'HH:mm:ss.SSS'Z'

 
-->
  


here's pubdate in the schema


The dates are already in UTC; they're just in milliseconds...

What am I doing wrong?
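For reference, the date.formats block in the stock extraction handler defaults looks like the fragment below. These are SimpleDateFormat patterns, and a bare epoch-millisecond value is not a pattern match for either of them, which is consistent with the error above — adding formats here won't parse a raw millisecond count:

```xml
<lst name="date.formats">
  <str>yyyy-MM-dd</str>
  <str>yyyy-MM-dd'T'HH:mm:ss.SSS'Z'</str>
</lst>
```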




--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-invalid-date-string-tp4031661.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr invalid date string

2013-01-08 Thread eShard
I'll certainly ask the ManifoldCF folks if they can send the date in the
correct format.
Meanwhile, how would I create an update processor to change the format of
a date?
Are there any decent examples out there?

thanks,



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-invalid-date-string-tp4031661p4031669.html
Sent from the Solr - User mailing list archive at Nabble.com.


is there an easy way to upgrade from Solr 4 alpha to 4.0 final?

2013-01-08 Thread eShard
I just found out I must upgrade to Solr 4.0 final (from 4.0 alpha)
I'm currently running Solr 4.0 alpha on Tomcat 7.
Is there an easy way to surgically replace files and upgrade? 
Or should I completely start over with a fresh install?
Ideally, I'm looking for a set of steps...
Thanks,




--
View this message in context: 
http://lucene.472066.n3.nabble.com/is-there-an-easy-way-to-upgrade-from-Solr-4-alpha-to-4-0-final-tp4031682.html
Sent from the Solr - User mailing list archive at Nabble.com.


ivy errors trying to build solr from trunk

2013-01-10 Thread eShard
I downloaded the latest Solr source, applied a patch, cd'd to the solr
directory, and ran "ant dist".
I get these Ivy errors:
ivy-availability-check:
 [echo] Building analyzers-phonetic...
ivy-fail:
 [echo]  This build requires Ivy and Ivy could not be found in your
ant classpath.
 [echo]  (Due to classpath issues and the recursive nature of the
Lucene/Solr 
 [echo]  build system, a local copy of Ivy can not be used an loaded
dynamically 
 [echo]  by the build.xml)
 [echo]  You can either manually install a copy of Ivy 2.2.0 in your
ant classpath:
 [echo]http://ant.apache.org/manual/install.html#optionalTasks
 [echo]  Or this build file can do it for you by running the Ivy
Bootstrap target:
 [echo]ant ivy-bootstrap 
 [echo]  
 [echo]  Either way you will only have to install Ivy one time.
 [echo]  'ant ivy-bootstrap' will install a copy of Ivy into your
Ant User Library:
 [echo]C:\Users\da24005/.ant/lib
 [echo]  
 [echo]  If you would prefer, you can have it installed into an
alternative 
 [echo]  directory using the
"-Divy_install_path=/some/path/you/choose" option, 
 [echo]  but you will have to specify this path every time you build
Lucene/Solr 
 [echo]  in the future...
 [echo]ant ivy-bootstrap
-Divy_install_path=/some/path/you/choose
 [echo]...
 [echo]ant -lib /some/path/you/choose clean compile
 [echo]...
 [echo]ant -lib /some/path/you/choose clean compile
 [echo]  If you have already run ivy-bootstrap, and still get this
message, please 
 [echo]  try using the "--noconfig" option when running ant, or
editing your global
 [echo]  ant config to allow the user lib to be loaded.  See the
wiki for more details:
 [echo]http://wiki.apache.org/lucene-java/HowToContribute#antivy
 [echo] 

BUILD FAILED

I tried the ivy-bootstrap target but I still get the same error, and I have
the Ivy jar in the Ant lib directory.

What am I doing wrong? Also, the message says to use "--noconfig" if
ivy-bootstrap didn't work, but --noconfig is not a valid ant option here.
Where/how do I use it?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/ivy-errors-trying-to-build-solr-from-trunk-tp4032300.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: ivy errors trying to build solr from trunk

2013-01-10 Thread eShard
Ok, the old problem was that Eclipse was using a different version of Ant
(1.8.3).
I dropped the Ivy jar into the build path and now I get these errors:
[ivy:retrieve]  ERRORS
[ivy:retrieve]  Server access Error: Connection timed out: connect
url=http://repo1.maven.org/maven2/commons-codec/commons-codec/1.7/commons-codec-1.7.pom
[ivy:retrieve]  Server access Error: Connection timed out: connect
url=http://repo1.maven.org/maven2/commons-codec/commons-codec/1.7/commons-codec-1.7.jar
[ivy:retrieve]  Server access Error: Connection timed out: connect
url=http://oss.sonatype.org/content/repositories/releases/commons-codec/commons-codec/1.7/commons-codec-1.7.pom
[ivy:retrieve]  Server access Error: Connection timed out: connect
url=http://oss.sonatype.org/content/repositories/releases/commons-codec/commons-codec/1.7/commons-codec-1.7.jar
[ivy:retrieve]  Server access Error: Connection timed out: connect
url=http://mirror.netcologne.de/maven2/commons-codec/commons-codec/1.7/commons-codec-1.7.pom
[ivy:retrieve]  Server access Error: Connection timed out: connect
url=http://mirror.netcologne.de/maven2/commons-codec/commons-codec/1.7/commons-codec-1.7.jar

Apparently, I can't get to Maven Central since I'm behind a firewall.
Are the Solr dependencies available for manual download somewhere?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/ivy-errors-trying-to-build-solr-from-trunk-tp4032300p4032332.html
Sent from the Solr - User mailing list archive at Nabble.com.


Tutorial for Solr query language, dismax and edismax?

2013-01-15 Thread eShard
Does anyone have a great tutorial for learning the Solr query language,
dismax, and edismax?
I've searched endlessly but haven't been able to locate one that is
comprehensive enough and has a lot of examples (that actually work!).
I also tried to use wildcards, logical operators, and a phrase search, and
it either didn't work or didn't behave the way I thought it would.

For example, I tried to search a multivalued field solr.title and a content
field that contains phone numbers (and a lot of other data).
So, from the Solr admin query page, in the q field I tried lots of
variations of this: solr.title:*Costa, Julie* AND content:tel=
And I either got 0 results or ALL the results.
solr.title would only work if I put in solr.title:*Costa*, but not anything
longer than that, even though there are plenty of Costa, J's (John, Julie,
Julia, Jerry, etc.).
I should be able to do a phrase search out of the box, shouldn't I?
I also read on one site that only edismax can use logical operators, but I
couldn't get that to work either.
Can anyone point me in the right direction?
I'm currently using Solr 4.0 Final with ManifoldCF v1.2 dev.
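As a concrete illustration of a phrase search, quoting the value and URL-encoding it behaves differently from the wildcard attempts above. The core name below is a placeholder, and whether either query matches depends on how solr.title is analyzed (a plain string field only matches exact full values, which would be consistent with the wildcard-only behavior described):

```text
# phrase query against a field: solr.title:"Costa, Julie"
http://localhost:8080/solr/mycore/select?q=solr.title%3A%22Costa%2C%20Julie%22&wt=xml

# the same phrase with edismax and an explicit query field:
http://localhost:8080/solr/mycore/select?q=%22Costa%2C%20Julie%22&defType=edismax&qf=solr.title&wt=xml
```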

Thank you,





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Tutorial-for-Solr-query-language-dismax-and-edismax-tp4033465.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr multicore aborts with socket timeout exceptions

2013-01-17 Thread eShard
I'm currently running Solr 4.0 final on tomcat v7.0.34 with ManifoldCF v1.2
dev running on Jetty.

I have Solr multicore set up with 10 cores (is this too much?), so I also
have at least 10 connections set up in ManifoldCF (1 per core, 10 JVMs per
connection).
From the look of it, Solr couldn't handle all the data ManifoldCF was
sending it, and the connection would abort with socket timeout exceptions.
I tried increasing maxThreads to 200 on Tomcat and it didn't work.
In the ManifoldCF throttling section, I decreased the number of JVMs per
connection from 10 down to 1, and not only did the crawl speed up
significantly, the socket exceptions went away (for the most part).
Here's the ticket for this issue:
https://issues.apache.org/jira/browse/CONNECTORS-608

My question is this: how do I increase the number of connections on the Solr
side so I can run multiple ManifoldCF jobs concurrently without aborts or
timeouts?

The ManifoldCF team did mention that there was a committer who had socket
timeout exceptions in a newer version of Solr and he fixed it by increasing
the timeout window. I'm looking for that patch if available.
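On the Solr/Tomcat side, the relevant knobs live on the HTTP connector in Tomcat's server.xml; a sketch with illustrative values (these are standard Connector attributes, but the right numbers depend on the workload):

```xml
<!-- server.xml: more worker threads, a deeper accept queue,
     and a longer connection timeout (in milliseconds) -->
<Connector port="8080" protocol="HTTP/1.1"
           maxThreads="400"
           acceptCount="100"
           connectionTimeout="60000"
           redirectPort="8443" />
```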

Thanks,



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-multicore-aborts-with-socket-timeout-exceptions-tp4034250.html
Sent from the Solr - User mailing list archive at Nabble.com.


Why do I keep seeing org.apache.solr.core.SolrCore execute in the tomcat logs

2013-01-17 Thread eShard
I keep seeing these in the tomcat logs:
Jan 17, 2013 3:57:33 PM org.apache.solr.core.SolrCore execute
INFO: [Lisa] webapp=/solr path=/admin/logging
params={since=1358453312320&wt=json} status=0 QTime=0

I'm just curious:
What is getting executed here? I'm not running any queries against this core
or using it in any way currently.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Why-do-I-keep-seeing-org-apache-solr-core-SolrCore-execute-in-the-tomcat-logs-tp4034353.html
Sent from the Solr - User mailing list archive at Nabble.com.


error initializing QueryElevationComponent

2013-01-21 Thread eShard
Hi,
I'm trying to test out the queryelevationcomponent.
elevate.xml is in the solrconfig.xml and it's in the conf directory.
I left the defaults.
I added this to the elevate.xml

 
  https://opentextdev/cs/llisapi.dll?func=ll&objID=577575&objAction=download";
/>
 


id is a string setup as the uniquekey

And I get this error:
16:25:48 SEVERE Config Exception during parsing file:
elevate.xml: org.xml.sax.SAXParseException; systemId: solrres:/elevate.xml;
lineNumber: 28; columnNumber: 77; The reference to entity "objID" must end
with the ';' delimiter.
16:25:48 SEVERE SolrCore java.lang.NullPointerException
16:25:48 SEVERE CoreContainer Unable to create core: Lisa
16:25:48 SEVERE CoreContainer null:org.apache.solr.common.SolrException:
Error initializing QueryElevationComponent.

what am I doing wrong?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/error-initializing-QueryElevationComponent-tp4035194.html
Sent from the Solr - User mailing list archive at Nabble.com.


SolrException: Error loading class 'org.apache.solr.response.transform.EditorialMarkerFactory'

2013-01-21 Thread eShard
Hi,
This is related to my earlier question regarding the elevationcomponent.
I tried turning on the section of the stock solrconfig.xml whose comment
says: "If you are using the QueryElevationComponent, you may wish to mark
documents that get boosted. The EditorialMarkerFactory will do exactly
that" — i.e. the transformer registration for
org.apache.solr.response.transform.EditorialMarkerFactory —
but it fails to load this class.

I'm using solr 4.0 final.
How do I get this to load?

thanks,




--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrException-Error-loading-class-org-apache-solr-response-transform-EditorialMarkerFactory-tp4035203.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrException: Error loading class 'org.apache.solr.response.transform.EditorialMarkerFactory'

2013-01-22 Thread eShard
Good morning,
I can't seem to figure out how to load this class
Can someone please point me in the right direction?
Thank you,




--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrException-Error-loading-class-org-apache-solr-response-transform-EditorialMarkerFactory-tp4035203p4035330.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrException: Error loading class 'org.apache.solr.response.transform.EditorialMarkerFactory'

2013-01-23 Thread eShard
Thanks,
That worked.
So the documentation needs to be fixed in a few places (the Solr wiki and
the default solrconfig.xml in Solr 4.0 Final; I didn't check any other
versions).
I'll either open a new ticket in JIRA to request a fix or reopen the old
one...

Furthermore, I tried using the ElevatedMarkerFactory and it didn't behave
the way I thought it would.

This:
http://localhost:8080/solr/Lisa/elevate?q=foo+bar&wt=xml&defType=dismax
got me all the doc info but no elevated marker.

I ran this:
http://localhost:8080/solr/Lisa/elevate?q=foo+bar&fl=[elevated]&wt=xml&defType=dismax
and all I got was response = 1 and elevated = true.

I had to run this to get all of the above info:
http://localhost:8080/solr/Lisa/elevate?q=foo+bar&fl=*,[elevated]&wt=xml&defType=dismax



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrException-Error-loading-class-org-apache-solr-response-transform-EditorialMarkerFactory-tp4035203p4035621.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: error initializing QueryElevationComponent

2013-01-25 Thread eShard
In case anyone was wondering, the solution is to XML-escape the URL.
Solr didn't like the &'s; just convert each one to &amp; and it works!
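In other words, inside elevate.xml (or any XML config) a raw ampersand in a URL must be escaped, using the URL from earlier in this thread as the example:

```xml
<!-- broken: the raw & makes the XML parser look for an entity named "objID" -->
<doc id="https://opentextdev/cs/llisapi.dll?func=ll&objID=577575&objAction=download" />

<!-- fixed: each & escaped as &amp; -->
<doc id="https://opentextdev/cs/llisapi.dll?func=ll&amp;objID=577575&amp;objAction=download" />
```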



--
View this message in context: 
http://lucene.472066.n3.nabble.com/error-initializing-QueryElevationComponent-tp4035194p4036261.html
Sent from the Solr - User mailing list archive at Nabble.com.


Multicore search with ManifoldCF security not working

2013-01-28 Thread eShard
Good morning,
I used this post here to join to search 2 different cores and return one
data set.
http://stackoverflow.com/questions/2139030/search-multiple-solr-cores-and-return-one-result-set
The good news is that it worked!
The bad news is that one of the cores is OpenText, and the ManifoldCF
security check isn't firing!
So users could see documents that they aren't supposed to see.
The OpenText security works if I call the core's handler individually; it
fails for the merged result.
I need to find a way to get the AuthenticatedUserName parameter to the
OpenText core.
Here's my /query handler for the merged result
  
  
  
*:*

id, attr_general_name, attr_general_owner,
attr_general_creator, attr_general_modifier, attr_general_description,
attr_general_creationdate, attr_general_modifydate, solr.title, 
content, category, link, pubdateiso

localhost:8080/solr/opentext/,localhost:8080/solr/Profiles/
   
  
manifoldCFSecurity
  
  

As you can see, I tried calling manifoldCFSecurity first and it didn't work.
I was thinking perhaps I could call the shards directly in the URL and put
the AuthenticatedUserName on the OpenText shard, but I'm getting pulled in
different directions currently.

Can anyone point me in the right direction?
Thanks,






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multicore-search-with-ManifoldCF-security-not-working-tp4036776.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Multicore search with ManifoldCF security not working

2013-01-28 Thread eShard
I'm sorry, I don't know what you mean.
I clicked on the hidden email link, filled out the form and when I hit
submit; 
I got this error:
Domain starts with dot
Please fix the error and try again.

Who exactly am I sending this to and how do I get the form to work?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multicore-search-with-ManifoldCF-security-not-working-tp4036776p4036829.html
Sent from the Solr - User mailing list archive at Nabble.com.


How to use SolrAjax with multiple cores?

2013-01-28 Thread eShard
Hi,
I need to build a UI that can access multiple cores and combine them all on
an "Everything" tab.
The solrajax example only has 1 core.
How do I set up multicore with solrajax?
Do I set up 1 manager per core? How much of a performance hit will I take
with multiple managers running?
Is there a better way to do this?
Is there a better UI to use?

Can anyone point me in the right direction?

Thanks,




--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-use-SolrAjax-with-multiple-cores-tp4036840.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Searching for field that contains multiple values

2013-01-28 Thread eShard
All I had to do was put a wildcard before and after the search term and it
would succeed (*Maritime*).
Searching multivalued fields wouldn't work any other way,
like so:
http://localhost:8080/solr/Blogs/select?q=title%3A*Maritime*&wt=xml

but I'll check out those other suggestions...

Thanks,



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Searching-for-field-that-contains-multiple-values-tp4033944p4036854.html
Sent from the Solr - User mailing list archive at Nabble.com.


Can you call the elevation component in another requesthandler?

2013-02-07 Thread eShard
Good day,
I got my elevation component working with the /elevate handler.
However, I would like to add the elevation component to my main search
handler, which is currently /query, so that one handler returns everything
(elevated items mixed with the "regular" search results; one-stop shopping,
so to speak).
This is what I tried:
  
 
   explicit
   xml
   true
   text
 
 
elevator
manifoldCFSecurity 
 
  

I also tried it in first-components as well.
Is there any way to combine these? Otherwise the UI will have to make
separate Ajax calls, and we're trying to minimize that.
Thanks,








--
View this message in context: 
http://lucene.472066.n3.nabble.com/Can-you-call-the-elevation-component-in-another-requesthandler-tp4039054.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Can you call the elevation component in another requesthandler?

2013-02-07 Thread eShard
Update:
Ok, If I search for gangnam style in /query handler by itself, elevation
works!
If I search with gangnam style and/or something else the elevation component
doesn't work but the rest of the query does.

here's the examples:
works:
/query?q=gangnam+style&fl=*,[elevated]&wt=xml&start=0&rows=50&debugQuery=true&dismax=true

elevation fails:
/query?q=gangnam+style+OR+title%3A*White*&fl=*,[elevated]&wt=xml&start=0&rows=50&debugQuery=true&dismax=true

So I guess I have to do separate queries at this point.
Is there a way to combine these 2 request handlers?

Thanks,





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Can-you-call-the-elevation-component-in-another-requesthandler-tp4039054p4039076.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr 4.0 is stripping XML format from RSS content field

2013-02-11 Thread eShard
Hi,
I'm running Solr 4.0 Final with ManifoldCF 1.1, and I verified via Fiddler
that ManifoldCF is indeed sending the content field from an RSS feed that
contains XML data.
However, when I query the index, the content field is there with just the
data; the XML structure is gone.
Does anyone know how to stop Solr from doing this?
I'm using Tika, but I don't see it in the update/extract handler.
Can anyone point me in the right direction?

Thanks,




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-0-is-stripping-XML-format-from-RSS-content-field-tp4039809.html
Sent from the Solr - User mailing list archive at Nabble.com.


query builder for solr UI?

2013-02-27 Thread eShard
Good day,
Currently we are building a front end for Solr (in jQuery, HTML, and CSS),
and I'm struggling with making a query builder that can handle pretty much
whatever the end user types into the search box.
Does something like this already exist in JavaScript/jQuery?

Thanks,



--
View this message in context: 
http://lucene.472066.n3.nabble.com/query-builder-for-solr-UI-tp4043481.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: query builder for solr UI?

2013-02-28 Thread eShard
Sorry; the easiest way to describe it is that we specifically want a
"Google-like" experience:
if the end user types in a phrase, quotes, +, - (for AND, NOT), etc.,
the UI should be flexible enough to build the correct Solr query syntax.

How will edismax help?

I also tried simplifying queries by using copyField to copy all of the
metadata into the text field.
So now the only field we have to query is the text field, but I doubt that
will be a panacea.

Does that make sense?
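The copyField approach mentioned above is a standard schema.xml pattern; a sketch with illustrative source field names:

```xml
<!-- schema.xml: funnel searchable metadata into one catch-all field -->
<field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
<copyField source="title"   dest="text"/>
<copyField source="author"  dest="text"/>
<copyField source="content" dest="text"/>
<!-- or copy everything: <copyField source="*" dest="text"/> -->
```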

Thanks,



--
View this message in context: 
http://lucene.472066.n3.nabble.com/query-builder-for-solr-UI-tp4043481p4043643.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: query builder for solr UI?

2013-02-28 Thread eShard
Good question,
if the user types in special characters like the dash - 
How will I know to treat it like a dash or the NOT operator? The first one
will need to be URL encoded the second one won't be resulting in very
different queries.

So I apologize for not being more clear, so really what I'm after is making
it easy for the user to communicate what exactly they are looking for and to
URL encode their input correctly. that's what I meant by "query building"

Thanks,





--
View this message in context: 
http://lucene.472066.n3.nabble.com/query-builder-for-solr-UI-tp4043481p4043659.html
Sent from the Solr - User mailing list archive at Nabble.com.


how to get solr (tika?) to capture more metadata from RSS feed?

2013-03-01 Thread eShard
Hi,
I have a lot of non standard IBM RSS feeds that needs to be crawled (via
ManifoldCF v1.1.1) and put into solr 4.0 final.
The problem is that we need to put the additional non standard metadata into
solr.
I've confirmed via fiddler that manifoldcf is indeed sending all the
appropriate metadata but something in solr is removing all of it. It's
either tika, rome or something else in solr.
see this link for more details  tika post

  

So, is there a way to configure tika (or rome which handles RSS parsing) to
capture the additional metadata?
I read that the tika config file is deprecated or obsolete. Is that true?
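I can't speak to a ROME-specific switch, but one common reason incoming metadata "disappears" is that the schema has no field for it and the extract handler drops or ignores unknown names. A sketch of keeping them instead, in the style of the stock Solr 4.0 example config (the attr_ prefix and field type are assumptions):

```
<!-- solrconfig.xml: prefix unknown incoming fields with attr_ instead of dropping them -->
<requestHandler name="/update/extract" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="uprefix">attr_</str>
    <str name="lowernames">true</str>
  </lst>
</requestHandler>

<!-- schema.xml: catch-all dynamic field so the attr_* fields are accepted and stored -->
<dynamicField name="attr_*" type="text_general" indexed="true" stored="true" multiValued="true"/>
```

With this in place, metadata ManifoldCF sends under unrecognized names should land as attr_* fields rather than vanishing silently.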

Thanks,





--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-get-solr-tika-to-capture-more-metadata-from-RSS-feed-tp4044015.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr not getting OpenText document name and metadata

2012-07-27 Thread eShard
Hi,
I'm currently using ManifoldCF (v0.5.1) to crawl OpenText (v10.5), with the
output sent to Solr (4.0 alpha).
All I see in the index is an id equal to the OpenText download URL and a
version (a big integer value).
What I don't see is the document name from OpenText or any of the OpenText
metadata.
Does anyone know how I can get this data? I can't even search by document
name or by document extension!
Only a few of the documents actually have a title in the Solr index, and the
OpenText name of the document is nowhere to be found.
If I know some text within the document, I can search for that.
I'm using the default schema with Tika as the extraction handler.
I'm also using uprefix=attr to get all of the ignored properties, but most
of those are useless.
Please advise...
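When metadata silently disappears like this, it helps to dump every stored field for a single document and compare against what the crawler actually sent (per Fiddler). A hedged sketch — the host, core name, and id value are assumptions:

```python
from urllib.parse import urlencode

def inspect_doc_url(doc_id, host="http://localhost:8983"):
    """Build a query URL that returns every stored field for one
    document, to see which crawler metadata actually survived."""
    params = {
        "q": 'id:"%s"' % doc_id,
        "fl": "*",     # all stored fields, including any attr_* catch-alls
        "wt": "json",
    }
    return "%s/solr/collection1/select?%s" % (host, urlencode(params))

url = inspect_doc_url("http://opentext.example/download/123")
```

If the document name isn't in that output at all, the field was dropped on the way in (schema/extract-handler mapping) rather than hidden by the query.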



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-not-getting-OpenText-document-name-and-metadata-tp3997786.html
Sent from the Solr - User mailing list archive at Nabble.com.


How do you get the document name from Open Text?

2012-08-02 Thread eShard
I'm using Solr 4.0 with ManifoldCF 0.5.1 crawling OpenText v10.5.
I have the categories/attributes turned on in OpenText and I can see them
all in the Solr index.
However, the id is just the URL to download the doc from OpenText, and the
document name (either from OpenText or from the document properties) is
nowhere to be found.
I tried using resourceName in solrconfig.xml as described in the manual, but
it doesn't work.
I used this:
<requestHandler name="/update/extract" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.content">text</str>
    <str name="fmap.Last-Modified">last_modified</str>
    <str name="uprefix">attr_</str>
    <str name="resourceName">File Name</str>
    <str name="lowernames">true</str>
  </lst>
</requestHandler>
but all I get is the literal string "File Name" in resourceName. Should I
leave the value blank, or is there some other field I should use?
Please advise.
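For comparison, resourceName is normally supplied per request by whatever posts the document (it mainly helps Tika guess the MIME type), not set to a constant in the handler defaults — which is why a fixed default just echoes back literally. A sketch of what the per-document request parameters could carry; the parameter values here are made up:

```python
from urllib.parse import urlencode

def extract_request_params(filename, doc_id):
    """Parameters for one /update/extract call: resourceName carries the
    real file name for this document instead of a fixed default."""
    return urlencode({
        "literal.id": doc_id,
        "resourceName": filename,  # per-document name, aids type detection
        "commit": "true",
    })

qs = extract_request_params("Connections Install Draft.doc", "doc-42")
```

To make the name searchable you would also need it stored in an indexed field (e.g. via a literal.* parameter mapped to a schema field), since resourceName by itself is a hint, not a stored field.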



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-do-you-get-the-document-name-from-Open-Text-tp3998908.html
Sent from the Solr - User mailing list archive at Nabble.com.