Hi,
From what I have read, I think I have to use Tika (?) to index PDF,
XLS, DOC, etc. files. How do I start? Do I run mvn clean install in the
source directory to get all the jar files? CentOS doesn't provide mvn.
How do I build Tika after getting Maven from
http://maven.apache.org ?
On 05/07/2012 10:35 PM, Jack Krupansky wrote:
Try SolrCell (ExtractingRequestHandler).
See:
http://wiki.apache.org/solr/ExtractingRequestHandler
-- Jack Krupansky
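A minimal sketch of what a SolrCell upload URL could look like, assuming a Solr example server at http://localhost:8983/solr with the handler mapped at /update/extract, as on the wiki page above. The document id, the file name, and the helper name extract_url are made up for illustration.

```python
from urllib.parse import urlencode

# Assumed base URL of a local Solr example server; adjust to your setup.
SOLR_URL = "http://localhost:8983/solr"


def extract_url(doc_id, commit=True):
    """Build the URL for posting one file to the ExtractingRequestHandler.

    literal.id supplies the unique key for the document, because a PDF or
    DOC file does not carry one itself.
    """
    params = {"literal.id": doc_id}
    if commit:
        params["commit"] = "true"
    return SOLR_URL + "/update/extract?" + urlencode(params)


if __name__ == "__main__":
    # The file itself goes in the POST body, for example with curl:
    #   curl "<url>" -F "myfile=@report.pdf"
    print(extract_url("doc1"))
```

Tika ships inside Solr for this handler, so with this route a separate Maven build of Tika should not be needed.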
-Original Message- From: Tolga
Sent: Monday, May 07, 2012 3:24 PM
To: solr-user@lucene.apache.org
Subject: PDF indexing
Hi,
Probably off-topic, but what directory should I export to CLASSPATH
environment variable so that I can begin using nutch?
Regards,
Otis,
I've just subscribed to nutch mailing list, however it's a very
low-volume one (at least that's what I came across), so can't I ask here?
Regards,
On 5/8/12 11:54 PM, Otis Gospodnetic wrote:
Tolga - you should ask on the Nutch mailing list, not Solr one. :)
Oti
Hi,
Apache servers are returning my post with the status messages
HTML_FONT_SIZE_HUGE,HTML_MESSAGE,HTTP_ESCAPED_HOST,NORMAL_HTTP_TO_IP,RCVD_IN_DNSWL_LOW,SPF_NEUTRAL,URI_HEX,WEIRD_PORT.
I've tried clearing all formatting and a re-post, but the same thing
occurred. What to do?
Regards,
Hi,
I've been reading
http://lucene.apache.org/solr/api/doc-files/tutorial.html and in the
section "Deleting Data", I've edited schema.xml to include a field named
id, issued the command for f in *; do java -Ddata=args -Dcommit=no -jar
post.jar "$f"; done, went on to the stats page
only to find no
Sorry, commit=no should have been commit=yes in my previous post.
Regards,
Anyone at all?
Original Message
Subject:Delete documents
Date: Thu, 10 May 2012 22:59:49 +0300
From: Tolga
To: solr-user@lucene.apache.org
Hi,
I've been reading
http://lucene.apache.org/solr/api/doc-files/tutorial.html and in the
section "Del
r/FAQ#How_can_I_delete_all_documents_from_my_index.3F
-- Jack Krupansky
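For reference, a small sketch of the XML update messages involved in deleting documents. It only builds the message bodies; sending them to the /update handler (with post.jar or curl) is left out. The helper names are invented for this example, and SP2514N is the sample id used in the Solr tutorial.

```python
import xml.etree.ElementTree as ET


def delete_by_id(doc_id):
    """Return the update message that deletes one document by its unique key."""
    root = ET.Element("delete")
    ET.SubElement(root, "id").text = doc_id
    return ET.tostring(root, encoding="unicode")


def delete_by_query(query="*:*"):
    """Return the update message that deletes every document matching query.

    The default query *:* matches all documents in the index.
    """
    root = ET.Element("delete")
    ET.SubElement(root, "query").text = query
    return ET.tostring(root, encoding="unicode")


# A delete only becomes visible to searches after a commit message is sent.
COMMIT = "<commit/>"

if __name__ == "__main__":
    print(delete_by_id("SP2514N"))
    print(delete_by_query())
    print(COMMIT)
```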
-Original Message- From: Tolga
Sent: Friday, May 11, 2012 12:31 AM
To: solr-user@lucene.apache.org
Subject: Fwd: Delete documents
Anyone at all?
Original Message
Subject: Delete documents
Date: Th
Hi,
I have a few questions, please bear with me:
1- I have a theory. nutch may be used to index to solr when we don't
have access to URL's file system, while we can use curl when we do have
access. Am I correct?
2- A tutorial I have been reading is talking about different levels of
id. Is the
Hi,
I have been trying for a week. I really want to get a start, so what
should I use? curl or nutch? I want to be able to index pdf, xml etc.
and search within them as well.
Regards,
16, 2012 at 1:13 PM, Tolga wrote:
Hi,
I have been trying for a week. I really want to get a start, so what
should I use? curl or nutch? I want to be able to index pdf, xml etc. and
search within them as well.
Regards,
Hi,
Is there a way to know which fields to add to schema.xml prior to crawling with
nutch, rather than crawling over and over again and fixing the fields
one by one?
Regards,
Hi,
I have 96 documents added to index, and I would like to be able to
search in them in plain text, without using complex search queries. How
can I do that?
Regards,
?
Besides, keywords and quoted phrases?
The dismax query parser may be good enough.
-- Jack Krupansky
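A rough sketch of a dismax request, so that plain keywords are searched across several fields without field prefixes. The field names title and content are assumptions about the schema, and dismax_query is a made-up helper; defType and qf are the standard request parameters.

```python
from urllib.parse import urlencode


def dismax_query(user_text, fields=("title", "content")):
    """Build the path and query string for a dismax search.

    user_text is passed through as typed; qf lists the fields to search.
    """
    params = [
        ("q", user_text),
        ("defType", "dismax"),
        ("qf", " ".join(fields)),
    ]
    return "/select?" + urlencode(params)


if __name__ == "__main__":
    print(dismax_query("my keyword"))
```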
-Original Message----- From: Tolga
Sent: Friday, May 18, 2012 6:27 AM
To: solr-user@lucene.apache.org
Subject: Search plain text
Hi,
I have 96 documents added to index, and I would like
Hi,
I've put the line indexed="true"/> in my schema.xml and restarted Solr, crawled my
website, and indexed (I've also committed but do I really have to
commit?). But I still have to search with content:mykeyword at the admin
interface. What do I have to do so that I can search only with mykey
or that hadn't
> changed since the last crawl, leaving the old index data as it was before the
> change.
>
> -- Jack Krupansky
>
> -Original Message- From: Tolga
> Sent: Friday, May 18, 2012 9:54 AM
> To: solr-user@lucene.apache.org
> Subject: copyFie
Default field? I'm not sure but I think I do. Will have to look.
Sent from myPhone
On 18 May 2012, at 18:11, Yury Kats wrote:
> On 5/18/2012 9:54 AM, Tolga wrote:
>> Hi,
>>
>> I've put the line > indexed="true"/
Oh this one. Yes I have it.
Sent from myPhone
On 18 May 2012, at 23:14, Yury Kats wrote:
> On 5/18/2012 4:02 PM, Tolga wrote:
>> Default field? I'm not sure but I think I do. Will have to look.
>
> http://wiki.apache.org/solr/SchemaXml#The_Default_Search_Field
Hi,
I am getting this error:
[doc=null] missing required field: id
request: http://localhost:8983/solr/update?wt=javabin&version=2
at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
at
org.apache.solr.client.solrj.impl.CommonsHttpSolrSer
How do I verify it exists? I've been crawling the same site and it
wasn't giving an error on Thursday.
Regards,
On 5/21/12 1:20 PM, Michael Kuhlmann wrote:
On 21.05.2012 12:07, Tolga wrote:
Hi,
I am getting this error:
[doc=null] missing required field: id
[...]
I've
Yes.
On 5/21/12 1:49 PM, Michael Kuhlmann wrote:
On 21.05.2012 12:40, Tolga wrote:
How do I verify it exists? I've been crawling the same site and it
wasn't giving an error on Thursday.
It depends on what you're doing.
Are you using nutch?
-Kuli
Hi,
Can you recommend a good PHP UI to search? Is SolrPHPClient good?
Hi,
Two separate things asked in one thread...
I am crawling my websites with nutch. When I index them, I'd like to be
able to highlight my keyword and display an excerpt containing that
keyword. I found a solution for highlighting, but what can I do about the excerpt?
Thanks and regards,
Krupansky
-Original Message----- From: Tolga
Sent: Thursday, May 31, 2012 4:55 AM
To: solr-user@lucene.apache.org
Subject: Hightlighting and excerpt
Hi,
Two separate things asked in one thread...
I am crawling my websites with nutch. When I index them, I'd like to be
able to highlight
that you need?
Try "/browse" in the Solr example. It does exactly what your example
shows. So, what else is it that you are trying to do? Or if something
isn't working, what specifically isn't working?
-- Jack Krupansky
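As an illustration of where the excerpt comes from: when a request sets hl=true and hl.fl to a stored field, the response carries a "highlighting" section keyed by document id, and each snippet there is already a short excerpt around the matched term. The sketch below assumes a JSON response (wt=json) already parsed into a dict, a field named content, and a made-up sample response; excerpts is a hypothetical helper.

```python
def excerpts(response, field="content"):
    """Map each document id to its first highlighted snippet for field.

    Documents without a snippet for that field are skipped.
    """
    result = {}
    for doc_id, fields in response.get("highlighting", {}).items():
        snippets = fields.get(field, [])
        if snippets:
            result[doc_id] = snippets[0]
    return result


if __name__ == "__main__":
    # Invented sample data in the shape of a Solr highlighting section.
    sample = {
        "highlighting": {
            "http://www.example.com/": {
                "content": ["a short text around the <em>keyword</em> here"]
            },
            "http://www.example.com/empty": {},
        }
    }
    for doc_id, snippet in excerpts(sample).items():
        print(doc_id, "->", snippet)
```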
-Original Message- From: Tolga
Sent: Thur
Hi,
When I started Solr, I got the following errors. The same are at
http://www.example.com:8983/solr
SEVERE: Exception during parsing file:
schema:org.xml.sax.SAXParseException: Open quote is expected for
attribute "{1}" associated with an element type "source".
at
com.sun.org.apache
Hi,
I'm trying to crawl my website with Nutch, and I think Nutch completed
properly. However, I got these errors when the results were being
indexed. As far as I can tell, they give no information except "Severe
errors in the configuration". What is the problem? Or is there a tool to
test my
Most probably I found out. I closed the XML tag with />. :S
Thanks anyway,
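A well-formedness check can catch this kind of mistake before Solr is started. The sketch below uses Python's standard XML parser; it only verifies that the text is well-formed XML, not that it is a valid Solr schema. check_well_formed and the sample snippets are made up for illustration.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return None if xml_text parses as XML, else the parser's error message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # The message includes the line and column of the first problem.
        return str(err)
    return None


if __name__ == "__main__":
    good = '<schema><copyField source="content" dest="text"/></schema>'
    bad = '<schema><copyField source=content dest="text"/></schema>'
    print(check_well_formed(good))
    print(check_well_formed(bad))
```

To test a real file, read its contents and pass them in, for example check_well_formed(open("schema.xml", encoding="utf-8").read()).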
Original Message
Subject:Error while indexing with Nutch
Date: Mon, 10 Sep 2012 11:55:02 +0300
From: Tolga
To: solr-user@lucene.apache.org
Hi,
I'm trying to crawl my webs
Hi,
I installed Solr and Nutch on a server, crawled with Nutch, and searched
at http://localhost:8983/solr/, to no avail. I mean it turns up no
results. What to do?
Regards,
Nope. Nutch says "Adding x documents" then "Error adding title 'Sabancı
University'".
On 10/04/2012 03:59 PM, Otis Gospodnetic wrote:
Hi
Search for *:* to retrieve all docs. Got anything?
Otis
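A small sketch of that check, assuming a Solr server on localhost and the JSON response writer (wt=json). The URL and the sample response text are assumptions for illustration; num_found is a made-up helper that reads the total hit count from a response.

```python
import json
from urllib.parse import urlencode

# Query for all documents, asking for zero rows so only the count comes back.
ALL_DOCS_URL = "http://localhost:8983/solr/select?" + urlencode(
    {"q": "*:*", "rows": 0, "wt": "json"}
)


def num_found(response_text):
    """Return the total number of matching documents from a JSON response."""
    return json.loads(response_text)["response"]["numFound"]


if __name__ == "__main__":
    print(ALL_DOCS_URL)
    # Invented response body in the shape Solr returns for wt=json.
    sample = '{"response": {"numFound": 0, "start": 0, "docs": []}}'
    print(num_found(sample))
```

A count of zero would suggest that nothing was indexed or that no commit has happened yet.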
--
Performance Monitoring - http://sematext.com/spm
On Oct 4, 2012
"Indexing
224 documents"
Regards,
On 10/04/2012 01:57 PM, Erick Erickson wrote:
I'm at a complete loss here, you've provided no
information at all to help diagnose your issues. Please
review:
http://wiki.apache.org/solr/UsingMailingLists
Best
Erick
On Thu, Oct 4, 2012 a
If Solr is still
running, you could manually send a commit yourself.
-- Jack Krupansky
-Original Message----- From: Tolga
Sent: Friday, October 05, 2012 12:14 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr search
Nope. Nutch says "Adding x documents" then "Error adding
Hi,
There are two servers with the same configuration. I crawl the same URL.
One of them is giving the following error:
Caused by: org.apache.solr.common.SolrException: ERROR:
[doc=http://bilgisayarciniz.org/] multiple values encountered for non
multiValued copy field text: bilgisayarciniz w
's schema with its own.
On 10/08/2012 01:33 PM, Jan Høydahl wrote:
Hi,
Please describe your environment better
* How do you "crawl", using which crawler?
* To which RequestHandler do you send the docs?
* Which version of Solr
* Can you share your schema and other relevant config wit
's schema with its own.
Regards,
On 10/08/2012 01:33 PM, Jan Høydahl wrote:
Hi,
Please describe your environment better
* How do you "crawl", using which crawler?
* To which RequestHandler do you send the docs?
* Which version of Solr
* Can you share your schema and other relevan
Hi,
My previous schema didn't have the body defined as field, so I did and
searched for "body:Smyrna", and no results turned up. What am I doing wrong?
Regards,
I had no idea I had to index again, thanks for the heads up.
On 10/09/2012 02:58 PM, Rafał Kuć wrote:
Hello!
After altering your schema.xml, have you indexed your documents again?
It would be nice to see what your schema.xml looks like and an example of
the data, because otherwise we can only guess
I've just indexed again, and no luck.
Below is my schema
sortMissingLast="true"
omitNorms="true"/>
precisionStep="0"
omitNorms="true" positionIncrementGap="0"/>
I was expecting to be able to search in the body, but apparently I don't
need it according to Markus.
Regards,
On 10/09/2012 03:27 PM, Rafał Kuć wrote:
Hello!
I assume you've added the body field, but you don't populate it. As
far as I remember Nutch doesn't fill the body field by default. What
Hi,
I use nutch to crawl my website and index to solr. However, how can I
search for a piece of content in a specific website? I use multiple URLs.
Regards,
Hi again,
In Nutch list, I was told to use "url:example\.net AND content:some
keyword" and so I did. However, I get results from both my URLs. Why
this behaviour?
Regards,
PS: I've re(crawl|index)ed my data.
On 10/12/2012 05:07 PM, Otis Gospodnetic wrote:
Hi Tolga,
You
Hello,
I was wondering if there was any facility to directly manipulate search results
based on business criteria to place documents at a fixed position in those
results. For example, when I issue a query, the first four results would be
based on natural search relevancy, then the fifth result
... This is not based
on query term, but appears to be based on a "document type" meta-data field. We
can certainly create the meta-data in Solr, but I can't seem to figure out how
to manipulate the search results to the extent I need.
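One way to picture the requirement is as a post-processing step on the client side, after the naturally ranked results come back. This is only a sketch of that idea under the assumption that the application controls the final ordering; it is not a built-in Solr feature, and place_fixed and the sample ids are invented.

```python
def place_fixed(natural, pinned):
    """Merge pinned documents into a ranked list at fixed 0-based positions.

    natural: document ids in relevancy order
    pinned:  dict mapping position -> document id to force at that position
    """
    pinned_ids = set(pinned.values())
    # Drop pinned documents from the natural list so they appear only once.
    result = [doc for doc in natural if doc not in pinned_ids]
    # Insert in ascending position order so earlier pins are not shifted.
    for pos in sorted(pinned):
        result.insert(min(pos, len(result)), pinned[pos])
    return result


if __name__ == "__main__":
    natural = ["a", "b", "c", "d", "e", "f"]
    # Force document "X" into the fifth slot (index 4).
    print(place_fixed(natural, {4: "X"}))
```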
On 2/24/09 9:12 AM, "Steven A Row
Hi,
I've started Solr as usual, and when I browsed to
http://www.example.com:8983/solr/admin, I got
HTTP ERROR 404
Problem accessing /solr/admin/index.jsp. Reason:
missing core name in path
Powered by Jetty://
Also, below are the lines I got when starting it:
SEVERE: org.apache.solr.co
anges have
occurred. Maybe you were viewing it in some editor and accidentally
hit some keys that corrupted the format.
And, tell us what release of Solr you are using.
-- Jack Krupansky
-Original Message- From: Muzaffer Tolga Özses
Sent: Thursday, August 16, 2012 6:57 AM
To: solr