Good afternoon,
I'm using solr 4.0 Final.
I have an IBM Atom feed I'm trying to index, but it won't work.
There are no errors in the log.
All the other DIHs I've created consumed RSS 2.0 feeds.
Does it NOT work with an Atom feed?
Here's (part of) my configuration:
  url="https://[redacted]"
  processor="XPathEntityProcessor"
The only message I get is:
Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.
Requests: 1, Skipped: 0
And there are no errors in the log.
Here's what the IBM Atom feed looks like:
  <feed xmlns="http://www.w3.org/2005/Atom"
        xmlns:wplc="http://www.ibm.com/wplc/atom/1.0"
        xmlns:age="http://p
I confirmed the xpath is correct with a third party XPath visualizer.
/atom:feed/atom:entry parses the xml correctly.
Can anyone confirm or deny that the dataimporthandler can handle an atom
feed?
Ok, I found one typo:
the links need to be this: /atom:feed/atom:entry/atom:link/@href
But the import still doesn't work... :(
I guess I have to convert the feed over to RSS 2.0
Gora! It works now!
You are amazing! thank you so much!
I dropped the atom: prefix from the XPath expressions and everything is working.
I did have a typo that might have been causing issues too.
thanks again!
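For anyone finding this later, a minimal sketch of a working Atom DIH config (the URL and field names here are placeholders, not the actual feed): since the DIH XPath parser ignores namespaces, the prefixes are simply dropped.
  <dataConfig>
    <dataSource type="URLDataSource"/>
    <document>
      <entity name="ibmAtom"
              url="https://example.com/feed.atom"
              processor="XPathEntityProcessor"
              forEach="/feed/entry"
              transformer="DateFormatTransformer">
        <field column="id"        xpath="/feed/entry/id"/>
        <field column="title"     xpath="/feed/entry/title"/>
        <field column="link"      xpath="/feed/entry/link/@href"/>
        <field column="published" xpath="/feed/entry/published"
               dateTimeFormat="yyyy-MM-dd'T'HH:mm:ss'Z'"/>
      </entity>
    </document>
  </dataConfig>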
Good afternoon,
I'm using solr 4.0 Final
I need movies "hidden" in zip files that need to be excluded from the index.
I can't filter movies on the crawler because then I would have to exclude
all zip files.
I was told I can have tika skip the movies.
the details are escaping me at this point.
How d
Good morning,
My company uses Solr 4.0 Final and I need to add some code to it and
recompile.
However, when I rebuild, all of the jars and the war file say Solr 5.0!
I'm using the old build.xml file from 4.0 so I don't know why it's
automatically upgrading.
How do I force it to build the older versi
Ok, I think I figured it out.
Somehow my Solr 4.0 Final project was accidentally updated to 5.0.
The solr/build.xml was fine.
The build.xml file at the top level was pointed at 5.0-SNAPSHOT.
I need to pull down the 4.0 and start from scratch.
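If it helps anyone else, the clean way back to a 4.0 build would be roughly this (tag name and paths assumed from the standard ASF layout, so double-check them):

  svn checkout http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_0_0 lucene_solr_4_0_0
  cd lucene_solr_4_0_0/solr
  ant dist    # the jars and the war end up under solr/dist with the 4.0.0 version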
Good morning Solr compatriots,
I'm using Solr 4.0 Final and I have synonyms.txt in my schema (only at query
time) like so:
Good morning to one and all,
I'm using Solr 4.0 Final and I've been struggling mightily with the
elevation component.
It is too limited for our needs; it doesn't handle phrases very well and I
need to have more than one doc with the same keyword or phrase.
So, I need a better solution. One that all
Good morning,
I have a 1 TB repository with approximately 500,000 documents (that will
probably grow from there) that needs to be indexed.
I'm limited to Solr 4.0 final (we're close to beta release, so I can't
upgrade right now) and I can't use SolrCloud because work currently won't
allow it for
Wow, thanks for your response.
You raise a lot of great questions; I wish I had the answers!
We're still trying to get enough resources to finish crawling the
repository, so I don't even know what the final size of the index will be.
I've thought about excluding the videos and other large files and
P.S.
Offhand, how do I control how much of the index is held in RAM?
Can you point me in the right direction?
Thanks,
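In case it helps anyone else reading this: with the stock directory factory the index is memory-mapped, so how much of it sits in RAM is mostly whatever the OS page cache can spare after the JVM heap; the only related knob in solrconfig.xml is the indexing buffer. A sketch of the relevant bits (values are just examples):

  <directoryFactory name="DirectoryFactory"
                    class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>
  <indexConfig>
    <!-- bounds the in-memory buffer used while indexing, not how much of the index is cached -->
    <ramBufferSizeMB>100</ramBufferSizeMB>
  </indexConfig>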
Wow again!
Thank you all very much for your insights.
We will certainly take all of this under consideration.
Erik: I want to upgrade but unfortunately, it's not up to me. You're right,
we definitely need to do it.
And SolrJ sounds interesting, thanks for the suggestions.
By the way, is ther
Good morning,
Here's the issue:
I have an ID that consists of two letters and a number.
The whole user title looks like this: Lastname, Firstname (LA12345).
Now, with my current configuration, I can search for LA12345 and find the
user.
However, when I type in just the number I get zero results.
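One common way to make the bare number searchable is to split on letter/number transitions at index time, so LA12345 also produces the tokens LA and 12345. A sketch (the field type name is made up):

  <fieldType name="text_userid" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.WordDelimiterFilterFactory"
              splitOnNumerics="1" preserveOriginal="1"
              generateWordParts="1" generateNumberParts="1" catenateAll="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>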
Good morning,
In the Apache Solr 4 Cookbook, p. 112, there is a recipe for setting up
phrase searches, like so:
I ran a sample query q=text_ph:"a-z index" and it didn't work very well at
all.
Is there a bet
Thanks, I'll remove the snowball filter and give it a try.
I guess I'm looking for an exact phrase match to start. (Is that the
standard phrase search?)
Is there something better or more versatile?
Btw, great job on the book!
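For reference, an unstemmed field for exact-phrase matching could look roughly like this (the type name is made up, and I'm assuming text is the source field for the copyField); the point is simply that no stemming filter is applied:

  <fieldType name="text_exact" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
  <field name="text_ph" type="text_exact" indexed="true" stored="false" multiValued="true"/>
  <copyField source="text" dest="text_ph"/>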
Hi,
My crawler uploads all the documents to Solr for indexing, and they get staged
in a tomcat/temp folder.
Over time this folder grows so large that I run out of disk space.
So, I wrote a bash script to delete the files and put it in the crontab.
However, if I delete the docs too soon, they don't get indexed; too
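The workaround I'd sketch (the path and the age threshold are assumptions) is to have the cron job delete only files older than some safe window instead of wiping the folder:

  # crontab entry: every night at 3 a.m., remove temp files not modified in the last 24 hours
  0 3 * * * find /opt/tomcat/temp -type f -mmin +1440 -delete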
Hi,
I'm using Solr 4.0 Final (yes, I know I need to upgrade)
I'm getting this error:
SEVERE: org.apache.solr.common.SolrException: no field name specified in
query and no default specified via 'df' param
And I applied this fix: https://issues.apache.org/jira/browse/SOLR-3646
And unfortunately, t
Good afternoon,
I have this DIH:
https://redacted/";
processor="XPathEntityProcessor"
forEach="/rss/channel/item"
transformer="DateFormatTransformer,TemplateTransformer,RegexTransformer">
Hi Erick,
Let me make sure I understand you:
I'm NOT running SolrCloud; so I just have to put the default field in ALL of
my solrconfig.xml files and then restart and that should be it?
Thanks for your reply,
Ok, I updated all of my solrconfig.xml files and I restarted the Tomcat
server,
AND the errors are still there on 2 out of 10 cores.
Am I not reloading correctly?
Here's my /browse handler (defaults):
  <str name="echoParams">explicit</str>
  <str name="wt">velocity</str>
  <str name="v.template">browse</str>
  <str name="v.layout">layout</str>
  <str name="title">Solritas</str>
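For completeness, the change that has to land in every core is a df entry in the defaults of each request handler that receives queries, and each core then needs a reload (or a Tomcat restart). A sketch, assuming the catch-all field is called text:

  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="df">text</str>
    </lst>
  </requestHandler>

Then each core can be reloaded without restarting Tomcat, e.g.:

  http://localhost:8080/solr/admin/cores?action=RELOAD&core=YourCore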
Good afternoon all,
I just implemented a phrase search and the parsed query gets changed from
"rapid prototyping" to "rapid prototype".
I used the Solr analyzer and "prototyping" was unchanged, so I think I've ruled
out the tokenizer.
So can anyone tell me what's going on?
Here's the query:
q=rapid prototypin
Ok, I think I'm on to something.
I omitted this parameter, which means it is set to false by default on my
text field.
I need to set it to true and see what happens...
autoGeneratePhraseQueries="true"
If I'm reading the wiki right, setting this parameter to true will preserve phrase
queries...
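For reference, the attribute goes on the fieldType element itself; a sketch with an assumed type name:

  <fieldType name="text_general" class="solr.TextField"
             positionIncrementGap="100" autoGeneratePhraseQueries="true">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>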
No, apparently it's the KStemFilter.
Should I turn this off at query time?
I'll put this in another question...
Good afternoon,
Here's my configuration for a text field.
I have the same configuration for index and query time.
Is this valid?
What's the best practice for these: query time, index time, or both?
For synonyms, I've read conflicting reports on when to use them, but I'm
currently changing them over to indexing time.
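A sketch of that direction, with synonyms expanded at index time only and nothing at query time (the type name and synonyms.txt location follow the stock schema, so treat them as assumptions):

  <fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
              ignoreCase="true" expand="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>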
Good evening,
I'm using solr 4.0 Final.
I tried using this function
boost=recip(ms(NOW/HOUR,startdatez,3.16e-11.0,0.08,0.05))
but it fails with this error:
org.apache.lucene.queryparser.classic.ParseException: Expected ')' at
position 29 in 'recip(ms(NOW/HOUR,startdatez,3.16e-11.0,0.08,0.05))'
I a
Thanks, we're planning on going to 4.10.1 in a few months.
I discovered that recip only works with dismax; I use edismax by default.
Does anyone know why I can't use recip with edismax?
I hope this is fixed in 4.10.1...
Thanks,
Thank you very much for your replies.
I discovered there was a typo in the function I was given.
One of the parentheses was in the wrong spot.
It should be this:
boost=recip(ms(NOW/HOUR,general_modifydate),3.16e-11,0.08,0.05)
And now it works with edismax! Strange...
Thanks again,
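So, for anyone searching the archive, the working request ends up looking roughly like this (the handler and q here are placeholders):

  /select?defType=edismax&q=*:*&boost=recip(ms(NOW/HOUR,general_modifydate),3.16e-11,0.08,0.05)&wt=xml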
let's say I have a div with id="myDiv"
Is there a way to set up the Solr update/extract handler to capture just that
particular div?
Good morning,
I'm currently running Solr 4.0 final (multi core) with manifoldcf v1.3 dev
on tomcat 7.
Early on, I used copyField to put the metadata into the text field to
simplify Solr queries (i.e., I only have to query one field now).
However, a lot of people are concerned about improving relevance
Sure, let's say the user types in test pdf;
we need the results with all the query words to be near the top of the
result set.
the query will look like this: /select?q=text%3Atest+pdf&wt=xml
How do I ensure that the top resultset contains all of the query words?
How can I boost the first (or secon
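One hedged sketch of this with edismax: require all terms via mm and reward proximity/phrase matches via pf (the field names and boosts below are made up):

  /select?defType=edismax&q=test+pdf&qf=title^5+text&pf=title^10+text^2&mm=100%25&wt=xml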
Hi,
I'm currently using solr 4.0 final with Manifoldcf v1.3 dev.
I have multivalued titles (the names are all the same so far) that must go
into a single valued field.
Can a transformer do this?
Can anyone show me how to do it?
And this has to fire off before an update chain takes place.
Thanks,
Ok, I have one index called Communities from an RSS feed.
Each item in the feed has multiple titles (which are all the same for this
feed).
So, the title needs to be cleaned up before it is put into the community
index.
Let's call the field community_title.
And then an UpdateProcessorChain needs to
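A possible sketch using the field-mutating update processors, which keep only the first value of a field (the chain name is made up; wire the chain into whichever handler receives the documents):

  <updateRequestProcessorChain name="first-title-only">
    <processor class="solr.FirstFieldValueUpdateProcessorFactory">
      <str name="fieldName">community_title</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>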
Good morning,
I'm using solr 4.0 final on tomcat 7.0.34 on linux
I created 3 new data import handlers to consume 3 RSS feeds.
They seemed to work perfectly.
However, today, I'm getting these errors:
10:42:17 SEVERE SolrCore java.lang.IndexOutOfBoundsException: Index: 9, Size: 8
10:
Ok, these errors seem to be caused by passing incorrect parameters in a
search query.
Such as: spellcheck=extendedResults=true
instead of
spellcheck.extendedResults=true
Thankfully, it seems to have nothing to do with the DIH at all.
I just resolved this same error.
The problem was that I had a lot of ampersands (&) that were un-escaped in
my XML doc.
There was nothing wrong with my DIH; it was the XML doc it was trying to
consume.
I just used StringEscapeUtils.escapeXml from Apache Commons to resolve it...
Another big help was the Eclipse
Good morning,
I have an IBM Portal atom feed that spans multiple pages.
Is there a way to instruct the DIH to grab all available pages?
I can put a huge range in but that can be extremely slow with large amounts
of XML data.
I'm currently using Solr 4.0 final.
Thanks,
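The only mechanism I'm aware of inside the DIH itself is the $hasMore / $nextUrl convention set from a transformer. A rough, unverified sketch (the XPath for the next link, the column names, and the script are all assumptions):

  <dataConfig>
    <script><![CDATA[
      function paginate(row) {
        var next = row.get('nextLink');     // rel="next" link from the current page
        if (next != null) {
          row.put('$hasMore', 'true');      // tells DIH another page exists
          row.put('$nextUrl', next);        // URL DIH should fetch next
        }
        return row;
      }
    ]]></script>
    <document>
      <entity name="ibmAtom"
              url="https://example.com/feed.atom?page=1"
              processor="XPathEntityProcessor"
              forEach="/feed/entry"
              transformer="script:paginate">
        <field column="nextLink" xpath="/feed/link[@rel='next']/@href" commonField="true"/>
        <field column="title"    xpath="/feed/entry/title"/>
      </entity>
    </document>
  </dataConfig>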
Hi,
I'm using solr 4.0 final built around Dec 2012.
I was initially told that the QEC didn't work for distributed search but
apparently it was fixed.
Anyway, I use the /elevate handler with [elevated] in the field list and I
don't get any elevated results.
elevated=false in the result block.
howeve
Sure,
Here are the results with debugQuery=true; with debugging off, there are
no results.
The elevated result appears in the queryBoost section but not in the result
section:
(response header and echoed request params: 0, 0, true, xml, 100, *,[elevated], text, true, 0, gangnam)
I can guarantee you that the ID is unique and it exists in that index.
Good morning,
I'm currently using Solr 4.0 FINAL.
I indexed a website and it took over 24 hours to crawl.
I just realized I need to rename one of the fields (or add a new one).
so I added the new field to the schema,
But how do I copy the data over from the old field to the new field without
recra
If anyone is interested, I managed to resolve this a long time ago.
I used a Data Import Handler instead and it worked beautifully.
DIHs are very forgiving; they take whatever XML data is there and inject
it into the Solr index.
It's a lot faster than crawling too.
You use XPath to map the field
Good morning,
I'm currently running Solr 4.0 final with tika v1.2 and Manifoldcf v1.2 dev.
And I'm battling Tika XML parse errors again.
Solr reports this error: org.apache.solr.common.SolrException:
org.apache.tika.exception.TikaException: XML parse error, which is too vague.
I had to manu
Ok, one possible fix is to declare the XML equivalent of &nbsp; with a DOCTYPE entity declaration, i.e. something like:
  <!DOCTYPE html [ <!ENTITY nbsp "&#160;"> ]>
but how do I add this into the Tika configuration?
Yes, that's it exactly.
I crawled a page with these (›) in each list item, and Solr
couldn't handle it; it threw the XML parse error and the crawler terminated the
job.
Is this fixable? Or do I have to submit a bug to the Tika folks?
Thanks,
Good morning everyone,
I'm running solr 4.0 Final with ManifoldCF v1.2dev on tomcat 7.0.37 and I
had shards up and running over HTTP, but when I migrated to SSL it wouldn't work
anymore.
First I got an IO Exception but then I changed my configuration in
solrconfig.xml to this:
explicit
Ok,
We figured it out:
The cert wasn't in the trusted CA keystore. I know we put it in there
earlier; I don't know why it was missing.
But we added it in again and everything works as before.
Thanks,
Hi,
I'm currently using Solr 4.0 final on tomcat v7.0.3x
I have 2 cores (let's call them A and B) and I need to combine them as one
for the UI.
However, we're having trouble figuring out how best to merge these two result sets.
Currently, I'm using relevancy to do the merge.
For example,
I search for "red"
Good afternoon,
I'm using solr 4.0 final with manifoldcf v1.2dev on tomcat 7.0.34
Today, a user asked a great question: what if I only know the name of the
folder that the documents are in?
Can I just search on the folder name?
Currently, I'm only indexing documents; how do I capture the folder nam
Good afternoon,
Does anyone know of a good tutorial on how to perform SQL like aggregation
in solr queries?
Thanks,
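The closest built-in tools I know of are faceting (GROUP BY–style counts) and the StatsComponent (SUM/AVG/MIN/MAX, optionally per facet). A sketch with made-up field names:

  /select?q=*:*&rows=0&facet=true&facet.field=category&stats=true&stats.field=price&stats.facet=category&wt=xml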
I'm currently running solr 4.0 final with manifoldcf 1.3 dev on tomcat 7.
I need to capture the "h1" tags on each web page as that is the true "title"
for the lack of a better word.
I can't seem to get it to work at all.
I read the instructions and used the capture component and then mapped it to
Ok, I figured it out:
you need to add this too:
true
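For anyone else wiring this up, my understanding of the shape of the config is roughly the following; the target field name is made up, and I'm assuming the "true" above refers to captureAttr:

  <requestHandler name="/update/extract"
                  class="solr.extraction.ExtractingRequestHandler" startup="lazy">
    <lst name="defaults">
      <str name="capture">h1</str>          <!-- pull h1 elements out of the Tika XHTML -->
      <str name="fmap.h1">page_title</str>  <!-- hypothetical target field -->
      <str name="captureAttr">true</str>
    </lst>
  </requestHandler>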
I'm running Solr 4.0 on Tomcat 7.0.8 and I'm running the solr/example single
core as well with manifoldcf v1.1
I had everything working, but then the crawler stopped and I had Tika errors
in the Solr log.
I had Tika 1.1 and that produces these errors:
org.apache.solr.common.SolrException:
org.apache.
Ok, I managed to fix the universal charset error; it is caused by a missing
dependency.
Just download universalchardet-1.0.3.jar and put it in your extraction lib.
The Microsoft errors will probably be fixed in a future release of the POI
jars (v3.9 didn't fix this error).
I'm currently running solr 4.0 alpha with manifoldCF v1.1 dev
Manifold is sending Solr the datetime as milliseconds elapsed since
1-1-1970.
I've tried setting several date.formats in the extraction handler, but I
always get this error and the ManifoldCF crawl aborts:
SolrCore org.apache.sol
I'll certainly ask manifold if they can send the date in the correct format.
Meanwhile;
How would I create an updater to change the format of a date?
Are there any decent examples out there?
thanks,
I just found out I must upgrade to Solr 4.0 final (from 4.0 alpha)
I'm currently running Solr 4.0 alpha on Tomcat 7.
Is there an easy way to surgically replace files and upgrade?
Or should I completely start over with a fresh install?
Ideally, I'm looking for a set of steps...
Thanks,
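A sketch of the surgical route (paths are assumptions; back up the index and configs first):

  # 1. stop Tomcat
  $CATALINA_BASE/bin/shutdown.sh
  # 2. swap the deployed war for the 4.0.0 final one and clear the old expansion
  cp apache-solr-4.0.0/dist/apache-solr-4.0.0.war $CATALINA_BASE/webapps/solr.war
  rm -rf $CATALINA_BASE/webapps/solr
  # 3. refresh any Solr contrib jars (extraction, DIH, etc.) copied into your cores' lib dirs
  # 4. start Tomcat and check the admin page reports 4.0.0; reindexing is the safest route
  $CATALINA_BASE/bin/startup.sh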
I downloaded the latest Solr source, applied a patch, cd'd into the solr dir,
and ran ant dist.
I get these ivy errors
ivy-availability-check:
[echo] Building analyzers-phonetic...
ivy-fail:
[echo] This build requires Ivy and Ivy could not be found in your
ant classpath.
[echo] (Due
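The build has a target that installs Ivy into ~/.ant/lib, which is usually the missing piece (run it from the checkout root):

  ant ivy-bootstrap   # downloads the Ivy jar into ~/.ant/lib
  cd solr
  ant dist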
Ok, the old problem was that Eclipse was using a different version of Ant
(1.8.3).
I dropped the Ivy jar into the build path and now I get these errors:
[ivy:retrieve] ERRORS
[ivy:retrieve] Server access Error: Connection timed out: connect
url=http://repo1.maven.org/maven2/commons-codec/commons-
Does anyone have a great tutorial for learning the solr query language,
dismax and edismax?
I've searched endlessly for one but I haven't been able to locate one that
is comprehensive enough and has a lot of examples (that actually work!).
I also tried to use wildcards, logical operators, and a phr
I'm currently running Solr 4.0 final on tomcat v7.0.34 with ManifoldCF v1.2
dev running on Jetty.
I have Solr multicore set up with 10 cores. (Is this too many?)
So I also have at least 10 connectors set up in ManifoldCF (1 per core, 10
JVMs per connection).
From the look of it, Solr couldn't ha
I keep seeing these in the tomcat logs:
Jan 17, 2013 3:57:33 PM org.apache.solr.core.SolrCore execute
INFO: [Lisa] webapp=/solr path=/admin/logging params={since=1358453312320&wt=json} status=0 QTime=0
I'm just curious:
What is getting executed here? I'm not running any queries against this core
Hi,
I'm trying to test out the QueryElevationComponent.
elevate.xml is referenced in solrconfig.xml and it's in the conf directory.
I left the defaults.
I added this entry to elevate.xml (the doc id is the OpenText download URL):
  <doc id="https://opentextdev/cs/llisapi.dll?func=ll&objID=577575&objAction=download" />
id is a string set up as the uniq
Hi,
This is related to my earlier question regarding the elevationcomponent.
I tried turning this on (the commented-out example in solrconfig.xml):
  <!-- If you are using the QueryElevationComponent, you may wish to mark
       documents that get boosted. The EditorialMarkerFactory will do exactly that: -->
but it fails to loa
Good morning,
I can't seem to figure out how to load this class
Can someone please point me in the right direction?
Thank you,
Thanks,
That worked.
So the documentation needs to be fixed in a few places (the solr wiki and
the default solrconfig.xml in Solr 4.0 final; I didn't check any other
versions)
I'll either open a new ticket in JIRA to request a fix or reopen the old
one...
Furthermore,
I tried using the ElevatedMar
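If anyone needs it, the transformer registration that matches the fix above presumably looks roughly like this in solrconfig.xml (the transformer name is whatever you want to reference in fl, e.g. fl=*,[elevated]):

  <transformer name="elevated"
               class="org.apache.solr.response.transform.ElevatedMarkerFactory"/>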
In case anyone was wondering, the solution is to HTML encode the URL.
Solr didn't like the &'s; just convert them to &amp; and it works!
Good morning,
I used this post here to search 2 different cores and return one
data set.
http://stackoverflow.com/questions/2139030/search-multiple-solr-cores-and-return-one-result-set
The good news is that it worked!
The bad news is that one of the cores is Opentext and the ManifoldCF
secu
I'm sorry, I don't know what you mean.
I clicked on the hidden email link, filled out the form, and when I hit
submit,
I got this error:
Domain starts with dot
Please fix the error and try again.
Who exactly am I sending this to and how do I get the form to work?
Hi,
I need to build a UI that can access multiple cores and combine them all on
an "Everything" tab.
The solrajax example only has 1 core.
How do I setup multicore with solrajax?
Do I setup 1 manager per core? How much of a performance hit will I take
with multiple managers running?
Is there a bett
All I had to do was put a wildcard before and after the search term (*Maritime*)
and it would succeed.
Searching multi-valued fields wouldn't work any other way.
Like so:
http://localhost:8080/solr/Blogs/select?q=title%3A*Maritime*&wt=xml
but I'll check out those other suggestions...
Thanks,
Good day,
I got my elevation component working with the /elevate handler.
However, I would like to add the elevation component to my main search
handler, which is currently /query,
so I can have one handler return everything (elevated items with "regular"
search results; i.e., one-stop shopping, so
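For reference, the shape of that change is roughly this, assuming the component is registered under the stock name "elevator" and /query is a plain SearchHandler:

  <searchComponent name="elevator" class="solr.QueryElevationComponent">
    <str name="queryFieldType">string</str>
    <str name="config-file">elevate.xml</str>
  </searchComponent>

  <requestHandler name="/query" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
    </lst>
    <arr name="last-components">
      <str>elevator</str>
    </arr>
  </requestHandler>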
Update:
Ok, if I search for gangnam style in the /query handler by itself, elevation
works!
If I search with gangnam style and/or something else, the elevation component
doesn't work but the rest of the query does.
Here are the examples:
works:
/query?q=gangnam+style&fl=*,[elevated]&wt=xml&start=0&rows=5
Hi,
I'm running Solr 4.0 Final with ManifoldCF 1.1, and I verified via Fiddler
that Manifold is indeed sending the content field from an RSS feed that
contains XML data.
However, when I query the index, the content field is there with just the
data; the XML structure is gone.
Does anyone know how to st
Good day,
Currently we are building a front end for Solr (in jQuery, HTML, and CSS),
and I'm struggling with making a query builder that can handle pretty much
whatever the end user types into the search box.
Does something like this already exist in JavaScript/jQuery?
Thanks,
Sorry,
The easiest way to describe it is that we want a "Google-like"
experience:
if the end user types in a phrase, quotes, or +/- (for AND/NOT), etc.,
the UI will be flexible enough to build the correct Solr query syntax.
How will edismax help?
How will edismax help?
And I tried simplifying queries
Good question:
if the user types in a special character like the dash (-),
how will I know whether to treat it as a dash or as the NOT operator? The first one
needs to be URL encoded and the second one doesn't, resulting in very
different queries.
So I apologize for not being clearer; really, what I'm af
Hi,
I have a lot of non-standard IBM RSS feeds that need to be crawled (via
ManifoldCF v1.1.1) and put into Solr 4.0 Final.
The problem is that we need to put the additional non-standard metadata into
Solr.
I've confirmed via Fiddler that ManifoldCF is indeed sending all the
appropriate metadata b
Hi,
I'm currently using ManifoldCF (v.5.1) to crawl OpenText (v10.5), and the
output is sent to Solr (4.0 alpha).
All I see in the index is an id equal to the OpenText download URL and a version
(a big integer value).
What I don't see is the document name from OpenText or any of the OpenText
metadata.
D
I'm using Solr 4.0 with ManifoldCF .5.1 crawling Open Text v10.5.
I have the cats/atts turned on in Open Text and I can see them all in the
Solr index.
However, the id is just the URL to download the doc from Open Text, and the
document name, either from Open Text or the document properties, is nowhe