PDF indexing

2012-05-07 Thread Tolga
Hi, From what I have read, I think I have to use Tika (?) to index PDF, xls, doc, etc. files. How do I start? Do I use mvn clean install in the source directory to get all the jar files to begin? CentOS doesn't provide mvn, how do I build Tika after getting it from http://maven.apache.org ?

Re: PDF indexing

2012-05-07 Thread Tolga
On 05/07/2012 10:35 PM, Jack Krupansky wrote: Try SolrCell (ExtractingRequestHandler). See: http://wiki.apache.org/solr/ExtractingRequestHandler -- Jack Krupansky -----Original Message----- From: Tolga Sent: Monday, May 07, 2012 3:24 PM To: solr-user@lucene.apache.org Subject: PDF indexing
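
For reference, a minimal sketch of how a request URL for the ExtractingRequestHandler suggested above might be built. The base URL, the id value doc1, and the helper name extract_url are assumptions for illustration only; the handler path /update/extract and the literal.id and commit parameters are the ones described on the wiki page linked above.

```python
from urllib.parse import urlencode


def extract_url(base, doc_id, commit=True):
    """Build a URL for Solr's ExtractingRequestHandler (/update/extract).

    base and doc_id are placeholders chosen for this sketch.
    """
    params = {
        "literal.id": doc_id,                      # unique key for the extracted document
        "commit": "true" if commit else "false",   # make the document searchable right away
    }
    return f"{base}/update/extract?{urlencode(params)}"


url = extract_url("http://localhost:8983/solr", "doc1")
print(url)
```

The file itself would then be sent to that URL as a multipart POST, for example with curl -F "myfile=@some.pdf"; the file name here is a placeholder.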

CLASSPATH

2012-05-08 Thread Tolga
Hi, Probably off-topic, but what directory should I export to CLASSPATH environment variable so that I can begin using nutch? Regards,

Re: CLASSPATH

2012-05-09 Thread Tolga
Otis, I've just subscribed to nutch mailing list, however it's a very low-volume one (at least that's what I came across), so can't I ask here? Regards, On 5/8/12 11:54 PM, Otis Gospodnetic wrote: Tolga - you should ask on the Nutch mailing list, not Solr one. :) Oti

Error messages

2012-05-10 Thread Tolga
Hi, Apache servers are returning my post with the status messages HTML_FONT_SIZE_HUGE,HTML_MESSAGE,HTTP_ESCAPED_HOST,NORMAL_HTTP_TO_IP,RCVD_IN_DNSWL_LOW,SPF_NEUTRAL,URI_HEX,WEIRD_PORT. I've tried clearing all formatting and a re-post, but the same thing occurred. What to do? Regards,

Delete documents

2012-05-10 Thread Tolga
Hi, I've been reading http://lucene.apache.org/solr/api/doc-files/tutorial.html and in the section "Deleting Data", I've edited schema.xml to include a field named id, issued the command for f in *; do java -Ddata=args -Dcommit=no -jar post.jar "$f"; done, went on to the stats page only to find no

Delete data

2012-05-10 Thread Tolga
Sorry, commit=no should have been commit=yes in my previous post. Regards,

Fwd: Delete documents

2012-05-10 Thread Tolga
Anyone at all? -------- Original Message -------- Subject: Delete documents Date: Thu, 10 May 2012 22:59:49 +0300 From: Tolga To: solr-user@lucene.apache.org Hi, I've been reading http://lucene.apache.org/solr/api/doc-files/tutorial.html and in the section "Del

Re: Fwd: Delete documents

2012-05-11 Thread Tolga
r/FAQ#How_can_I_delete_all_documents_from_my_index.3F -- Jack Krupansky -----Original Message----- From: Tolga Sent: Friday, May 11, 2012 12:31 AM To: solr-user@lucene.apache.org Subject: Fwd: Delete documents Anyone at all? -------- Original Message -------- Subject: Delete documents Date: Th
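
For reference, a minimal sketch of the delete-by-query message that the FAQ entry referenced above is about. The helper name delete_by_query_xml is hypothetical; the XML shape and the match-all query *:* are standard Solr update syntax.

```python
from xml.sax.saxutils import escape


def delete_by_query_xml(query="*:*"):
    """Return the XML body of a Solr delete-by-query update message.

    The default query *:* matches every document in the index.
    """
    # escape() protects &, < and > in case the query contains them
    return f"<delete><query>{escape(query)}</query></delete>"


body = delete_by_query_xml()
print(body)
```

The body would be POSTed to the /update handler with an XML content type and followed by a commit (a <commit/> message or commit=true on the URL) before the deletion becomes visible to searches.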

Index a URL

2012-05-15 Thread Tolga
Hi, I have a few questions, please bear with me: 1- I have a theory. nutch may be used to index to solr when we don't have access to URL's file system, while we can use curl when we do have access. Am I correct? 2- A tutorial I have been reading is talking about different levels of id. Is the

curl or nutch

2012-05-16 Thread Tolga
Hi, I have been trying for a week. I really want to get a start, so what should I use? curl or nutch? I want to be able to index pdf, xml etc. and search within them as well. Regards,

Re: curl or nutch

2012-05-16 Thread Tolga
16, 2012 at 1:13 PM, Tolga wrote: Hi, I have been trying for a week. I really want to get a start, so what should I use? curl or nutch? I want to be able to index pdf, xml etc. and search within them as well. Regards,

Unknown field

2012-05-17 Thread Tolga
Hi, Is there a way to find out which fields to add to schema.xml prior to crawling with nutch, rather than crawling over and over again and fixing the fields one by one? Regards,

Search plain text

2012-05-18 Thread Tolga
Hi, I have 96 documents added to index, and I would like to be able to search in them in plain text, without using complex search queries. How can I do that? Regards,

Re: Search plain text

2012-05-18 Thread Tolga
? Besides, keywords and quoted phrases? The dismax query parser may be good enough. -- Jack Krupansky -----Original Message----- From: Tolga Sent: Friday, May 18, 2012 6:27 AM To: solr-user@lucene.apache.org Subject: Search plain text Hi, I have 96 documents added to index, and I would like
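
For reference, a minimal sketch of a plain-text search through the dismax query parser mentioned above. The base URL, the field names content and title, and the helper name dismax_url are assumptions for illustration; defType and qf are the standard dismax request parameters.

```python
from urllib.parse import urlencode


def dismax_url(base, user_text, fields=("content", "title")):
    """Build a /select URL that sends plain user text through dismax.

    fields lists the (assumed) schema fields that dismax should search.
    """
    params = {
        "q": user_text,            # raw keywords, no field: prefixes needed
        "defType": "dismax",       # select the dismax query parser
        "qf": " ".join(fields),    # query fields, space separated
    }
    return f"{base}/select?{urlencode(params)}"


url = dismax_url("http://localhost:8983/solr", "solr tutorial")
print(url)
```

With qf set, the user can type bare keywords and dismax searches them across the listed fields, which is the behaviour asked for in the original post.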

copyField

2012-05-18 Thread Tolga
Hi, I've put the line indexed="true"/> in my schema.xml and restarted Solr, crawled my website, and indexed (I've also committed but do I really have to commit?). But I still have to search with content:mykeyword at the admin interface. What do I have to do so that I can search only with mykey

Re: copyField

2012-05-18 Thread Tolga
or that hadn't > changed since the last crawl, leaving the old index data as it was before the > change. > > -- Jack Krupansky > > -----Original Message----- From: Tolga > Sent: Friday, May 18, 2012 9:54 AM > To: solr-user@lucene.apache.org > Subject: copyFie

Re: copyField

2012-05-18 Thread Tolga
Default field? I'm not sure but I think I do. Will have to look. Sent from myPhone. On 18 May 2012, at 18:11, Yury Kats wrote: > On 5/18/2012 9:54 AM, Tolga wrote: >> Hi, >> >> I've put the line > indexed="true"/

Re: copyField

2012-05-18 Thread Tolga
Oh this one. Yes I have it. Sent from myPhone. On 18 May 2012, at 23:14, Yury Kats wrote: > On 5/18/2012 4:02 PM, Tolga wrote: >> Default field? I'm not sure but I think I do. Will have to look. > > http://wiki.apache.org/solr/SchemaXml#The_Default_Search_Field

org.apache.solr.common.SolrException: ERROR: [doc=null] missing required field: id

2012-05-21 Thread Tolga
Hi, I am getting this error: [doc=null] missing required field: id request: http://localhost:8983/solr/update?wt=javabin&version=2 at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430) at org.apache.solr.client.solrj.impl.CommonsHttpSolrSer

Re: org.apache.solr.common.SolrException: ERROR: [doc=null] missing required field: id

2012-05-21 Thread Tolga
How do I verify it exists? I've been crawling the same site and it wasn't giving an error on Thursday. Regards, On 5/21/12 1:20 PM, Michael Kuhlmann wrote: On 21.05.2012 12:07, Tolga wrote: Hi, I am getting this error: [doc=null] missing required field: id [...] I've

Re: org.apache.solr.common.SolrException: ERROR: [doc=null] missing required field: id

2012-05-21 Thread Tolga
Yes. On 5/21/12 1:49 PM, Michael Kuhlmann wrote: On 21.05.2012 12:40, Tolga wrote: How do I verify it exists? I've been crawling the same site and it wasn't giving an error on Thursday. It depends on what you're doing. Are you using nutch? -Kuli

UI

2012-05-21 Thread Tolga
Hi, Can you recommend a good PHP UI to search? Is SolrPHPClient good?

Highlighting and excerpt

2012-05-31 Thread Tolga
Hi, Two separate things asked in one thread... I am crawling my websites with nutch. When I index them, I'd like to be able to highlight my keyword and display an excerpt containing that keyword. I found a solution for highlighting, but what can I do about the excerpt? Thanks and regards,

Re: Highlighting and excerpt

2012-05-31 Thread Tolga
Krupansky -----Original Message----- From: Tolga Sent: Thursday, May 31, 2012 4:55 AM To: solr-user@lucene.apache.org Subject: Highlighting and excerpt Hi, Two separate things asked in one thread... I am crawling my websites with nutch. When I index them, I'd like to be able to highlight

Re: Highlighting and excerpt

2012-05-31 Thread Tolga
that you need? Try "/browse" in the Solr example. It does exactly what your example shows. So, what else is it that you are trying to do? Or if something isn't working, what specifically isn't working? -- Jack Krupansky -----Original Message----- From: Tolga Sent: Thur
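
For reference, a minimal sketch of a query that asks Solr for highlighted excerpts. The base URL, the field name content, the fragment size, and the helper name highlight_url are assumptions for illustration; hl, hl.fl, hl.snippets and hl.fragsize are the standard highlighting parameters.

```python
from urllib.parse import urlencode


def highlight_url(base, query, field="content", snippets=1, fragsize=150):
    """Build a /select URL that requests highlighted fragments.

    field is the (assumed) stored field to take excerpts from.
    """
    params = {
        "q": query,
        "hl": "true",             # turn highlighting on
        "hl.fl": field,           # field(s) to highlight
        "hl.snippets": snippets,  # number of fragments per document
        "hl.fragsize": fragsize,  # approximate fragment length in characters
    }
    return f"{base}/select?{urlencode(params)}"


url = highlight_url("http://localhost:8983/solr", "content:keyword")
print(url)
```

The fragments returned in the highlighting section of the response already are short excerpts around the matched keyword, so highlighting and excerpt can usually be served by the same request.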

Start up errors

2012-09-04 Thread Tolga
Hi, When I started Solr, I got the following errors. The same are at http://www.example.com:8983/solr SEVERE: Exception during parsing file: schema:org.xml.sax.SAXParseException: Open quote is expected for attribute "{1}" associated with an element type "source". at com.sun.org.apache

Error while indexing with Nutch

2012-09-10 Thread Tolga
Hi, I'm trying to crawl my website with Nutch, and I think Nutch completed properly. However, I got these errors when the results were being indexed. As far as I can tell, it provides no information except "Severe errors in the configuration". What is the problem? Or is there a tool to test my

Fwd: Error while indexing with Nutch

2012-09-10 Thread Tolga
Most probably I found out. I closed the XML tag with />. :S Thanks anyway, -------- Original Message -------- Subject: Error while indexing with Nutch Date: Mon, 10 Sep 2012 11:55:02 +0300 From: Tolga To: solr-user@lucene.apache.org Hi, I'm trying to crawl my webs

Solr search

2012-10-04 Thread Tolga
Hi, I installed Solr and Nutch on a server, crawled with Nutch, and searched at http://localhost:8983/solr/, to no avail. I mean it turns up no results. What to do? Regards,

Re: Solr search

2012-10-04 Thread Tolga
Nope. Nutch says "Adding x documents" then "Error adding title 'Sabancı University'". On 10/04/2012 03:59 PM, Otis Gospodnetic wrote: Hi Search for *:* to retrieve all docs. Got anything? Otis -- Performance Monitoring - http://sematext.com/spm On Oct 4, 2012

Re: Solr search

2012-10-04 Thread Tolga
"Indexing 224 documents" Regards, On 10/04/2012 01:57 PM, Erick Erickson wrote: I'm at a complete loss here, you've provided no information at all to help diagnose your issues. Please review: http://wiki.apache.org/solr/UsingMailingLists Best Erick On Thu, Oct 4, 2012 a

Re: Solr search

2012-10-04 Thread Tolga
If Solr is still running, you could manually send a commit yourself. -- Jack Krupansky -----Original Message----- From: Tolga Sent: Friday, October 05, 2012 12:14 AM To: solr-user@lucene.apache.org Subject: Re: Solr search Nope. Nutch says "Adding x documents" then "Error adding
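
For reference, a minimal sketch of the two requests suggested in this thread: a manual commit and a match-all search with *:*. The base URL and the helper names commit_url and match_all_url are assumptions for illustration; /update?commit=true and q=*:* are standard Solr usage.

```python
from urllib.parse import urlencode


def commit_url(base):
    """URL that asks a running Solr instance to commit pending updates."""
    return f"{base}/update?commit=true"


def match_all_url(base, rows=10):
    """URL that retrieves documents with the match-all query *:*."""
    # urlencode percent-encodes * and :, which Solr decodes back to *:*
    return f"{base}/select?{urlencode({'q': '*:*', 'rows': rows})}"


commit = commit_url("http://localhost:8983/solr")
search = match_all_url("http://localhost:8983/solr")
print(commit)
print(search)
```

Requesting the commit URL first and the search URL second shows whether any documents reached the index at all, independent of the field being searched.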

I don't understand

2012-10-08 Thread Tolga
Hi, There are two servers with the same configuration. I crawl the same URL. One of them is giving the following error: Caused by: org.apache.solr.common.SolrException: ERROR: [doc=http://bilgisayarciniz.org/] multiple values encountered for non multiValued copy field text: bilgisayarciniz w

Re: I don't understand

2012-10-08 Thread Tolga
's schema with its own. On 10/08/2012 01:33 PM, Jan Høydahl wrote: Hi, Please describe your environment better * How do you "crawl", using which crawler? * To which RequestHandler do you send the docs? * Which version of Solr * Can you share your schema and other relevant config wit

Re: I don't understand

2012-10-08 Thread Tolga
's schema with its own. Regards, On 10/08/2012 01:33 PM, Jan Høydahl wrote: Hi, Please describe your environment better * How do you "crawl", using which crawler? * To which RequestHandler do you send the docs? * Which version of Solr * Can you share your schema and other relevan

Search in body

2012-10-09 Thread Tolga
Hi, My previous schema didn't have the body defined as a field, so I added it and searched for "body:Smyrna", and no results turned up. What am I doing wrong? Regards,

Re: Search in body

2012-10-09 Thread Tolga
I had no idea I had to index again, thanks for the heads up. On 10/09/2012 02:58 PM, Rafał Kuć wrote: Hello! After altering your schema.xml have you indexed your documents again? It would be nice to see what your schema.xml looks like and an example of the data, because otherwise we can only guess

Re: Search in body

2012-10-09 Thread Tolga
I've just indexed again, and no luck. Below is my schema sortMissingLast="true" omitNorms="true"/> precisionStep="0" omitNorms="true" positionIncrementGap="0"/>

Re: Search in body

2012-10-09 Thread Tolga
I was expecting to be able to search in the body, but apparently I don't need it according to Markus. Regards, On 10/09/2012 03:27 PM, Rafał Kuć wrote: Hello! I assume you've added the body field, but you don't populate it. As far as I remember Nutch doesn't fill the body field by default. What

Search in specific website

2012-10-11 Thread Tolga
Hi, I use nutch to crawl my website and index to solr. However, how can I search for a piece of content in a specific website? I use multiple URLs. Regards,

Re: Search in specific website

2012-10-16 Thread Tolga
Hi again, In Nutch list, I was told to use "url:example\.net AND content:some keyword" and so I did. However, I get results from both my URLs. Why this behaviour? Regards, PS: I've re(crawl|index)ed my data. On 10/12/2012 05:07 PM, Otis Gospodnetic wrote: Hi Tolga, You'

Direct control over document position in search results

2009-02-23 Thread Ercan, Tolga
Hello, I was wondering if there was any facility to directly manipulate search results based on business criteria to place documents at a fixed position in those results. For example, when I issue a query, the first four results would be based on natural search relevancy, then the fifth result

Re: Direct control over document position in search results

2009-02-25 Thread Ercan, Tolga
... This is not based on query term, but appears to be based on a "document type" meta-data field. We can certainly create the meta-data in Solr, but I can't seem to figure out how to manipulate the search results to the extent I need. On 2/24/09 9:12 AM, "Steven A Row

missing core name in path

2012-08-16 Thread Muzaffer Tolga Özses
Hi, I've started Solr as usual, and when I browsed to http://www.example.com:8983/solr/admin, I got HTTP ERROR 404 Problem accessing /solr/admin/index.jsp. Reason: missing core name in path Powered by Jetty:// Also, below are the lines I got when starting it: SEVERE: org.apache.solr.co

Re: missing core name in path

2012-08-16 Thread Muzaffer Tolga Özses
anges have occurred. Maybe you were viewing it in some editor and accidentally hit some keys that corrupted the format. And, tell us what release of Solr you are using. -- Jack Krupansky -----Original Message----- From: Muzaffer Tolga Özses Sent: Thursday, August 16, 2012 6:57 AM To: solr