How to make Solr log navigation and searches on a Tomcat server? Please help!

2012-05-16 Thread blacknumber2009
Hi guys, I have a Tomcat server running my website, which was developed with the Spring framework. I have installed Solr successfully on Tomcat. I'm just wondering if there is any way to make Solr log all the navigation and searches made within the website by users who visit it. I'm a newbie. Any help wo

Re: Dismax query results vary on Solr1.4 and 3.6.

2012-05-16 Thread Katsuyoshi NOGUCHI
I get the same result now! Thanks! 2012/5/17 Shinichiro Abe > If you want to treat test.pdf as a phrase "test pdf", > it might work by setting text_sen autoGeneratePhraseQueries="true". > > Regards, > Shinichiro Abe > > On 2012/05/17, at 10:39, Katsuyoshi NOGUCHI wrote: > > > OK, I understand ho

Re: Dismax query results vary on Solr1.4 and 3.6.

2012-05-16 Thread Jack Krupansky
I just noticed that you used "dismax" in 1.4 vs. "edismax" in 3.6. There may be other differences that I have not yet noticed. Also, you should have separate index and query analyzers so that catenateWords="0" catenateNumbers="0" for the query analyzer. It could be that the catenateWords="1" c

highlighter not respecting sentence boundary

2012-05-16 Thread abhayd
Hi, I am using the highlighter component with hl.fragmenter=regex&hl.regex.pattern=[-\w ,/\n\"']{20,200}, which is basically the regex fragmenter configuration that comes with the highlighting component in the example solrconfig.xml file. My snippets don't begin at the start of a sentence. I also tried the boundary scanner &q=iphon

Re: Quering Solr

2012-05-16 Thread Lance Norskog
Yes, 'text_gr' in solr/example/conf/schema.xml is (I think) the Greek text type. It is commented out. This has someone's idea of how Greek text analysis should work. It may not be right for your use case. On Tue, May 15, 2012 at 1:32 PM, anarchos78 wrote: > Hello guys! > > I ha

Re: curl or nutch

2012-05-16 Thread Otis Gospodnetic
It can, as can ManifoldCF. But you should ask on the nutch-user list (this may also be documented on the wiki). Otis, Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm > > From: Tolga >To: solr-user@lucene.apache.org >Sent: W

Re: Dismax query results vary on Solr1.4 and 3.6.

2012-05-16 Thread Shinichiro Abe
If you want to treat test.pdf as the phrase "test pdf", it might work if you set autoGeneratePhraseQueries="true" on the text_sen field type. Regards, Shinichiro Abe On 2012/05/17, at 10:39, Katsuyoshi NOGUCHI wrote: > OK, I understand how those words are tokenized by different tokenizer > factories. > My question

Re: Update JSON not working for me

2012-05-16 Thread Lance Norskog
This is my JSON variant of solr/example/exampledocs/post.sh. It takes a URL as the first parameter. #!/bin/sh # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regardi

Posting JSON Data to Solr using XHR?

2012-05-16 Thread rjain15
Hi, I am trying to post JSON data to Solr using XHR / jQuery and it doesn't seem to work. I don't get any exception on the Jetty console. Has anyone tried this before, and are there any obvious gotchas in my code? Here is my code snippet $(document).ready(function(){ var url='http://localhost:89

Re: Dismax query results vary on Solr1.4 and 3.6.

2012-05-16 Thread Katsuyoshi NOGUCHI
OK, I understand how those words are tokenized by different tokenizer factories. My question is how I can have Solr analyze and search for "test" AND "pdf". Solr 1.4 gives results for "test" AND "pdf", and I want Solr 3.6 to do the same (Solr 3.6 gives results for "test" OR "pdf"). Any idea? 2012/

Re: Must match and terms with only one letter

2012-05-16 Thread Jack Krupansky
Ah, sorry. I meant to add that you should have a stop filter in the query analyzer, but not in the index analyzer. -- Jack Krupansky -Original Message- From: Walter Underwood Sent: Wednesday, May 16, 2012 8:52 PM To: solr-user@lucene.apache.org Subject: Re: Must match and terms with o

Re: Must match and terms with only one letter

2012-05-16 Thread Walter Underwood
Except you can never match "a", so that is a bad idea. So much for the query "vitamin a". wunder On May 16, 2012, at 5:47 PM, Jack Krupansky wrote: > Add "a" (and maybe other single letters) to the stopwords file. Then it won't > show up in the query at all. > > And with edismax, enable PF2 a

Re: Must match and terms with only one letter

2012-05-16 Thread Jack Krupansky
Add "a" (and maybe other single letters) to the stopwords file. Then it won't show up in the query at all. And with edismax, enable PF2 and maybe PF3 so that instances of "a cole" would get boosted. -- Jack Krupansky -Original Message- From: roySolr Sent: Wednesday, May 16, 2012 10

Re: Dismax query results vary on Solr1.4 and 3.6.

2012-05-16 Thread Jack Krupansky
The query may be the same, but your analyzers are radically different. Just a hunch, but maybe GosenTokenizerFactory is treating the "." as a space. In 1.4 you were using SenTokenizerFactory. Or maybe GosenBasicFormFilterFactory is treating the "." as a space. In any case, my hunch is that "te

Re: Solr Single Core vs Multiple Cores installation for localization

2012-05-16 Thread Jack Krupansky
First you have to answer the twin questions of what you want the user experience to be and what expectations users may have independent of your "intentions". Do you intend to have a separate, language-specific search UI? That would match up with separate cores, but can be done with a language ty

Re: PermGen OOM Error

2012-05-16 Thread Jack Krupansky
PermGen memory has to do with the number of classes loaded, rather than documents. Here are a couple of pages that help explain Java PermGen issues. The bottom line is that you can increase the PermGen space, or enable unloading of classes, or at least trace class loading to see why the problem oc

Re: Solr query and double quotes

2012-05-16 Thread Jack Krupansky
Change "blah blah" to "blah" "blah", two separate strings, two separate query terms. -- Jack Krupansky -Original Message- From: anarchos78 Sent: Wednesday, May 16, 2012 1:28 PM To: solr-user@lucene.apache.org Subject: Solr query and double quotes Hello friends, When I am passing que
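A minimal Python sketch of Jack's suggestion, assuming the client builds the q parameter itself: each word is quoted separately so the parser sees separate terms instead of one phrase, and urlencode percent-encodes Greek text as UTF-8. The helper names are hypothetical. Whether both terms must then match depends on the default operator, and on Tomcat the connector typically also needs URIEncoding="UTF-8" for non-ASCII query strings to arrive intact.

```python
from urllib.parse import urlencode


def as_separate_terms(text):
    """Quote each whitespace-separated word on its own.

    'blah blah' becomes '"blah" "blah"': two quoted strings, two query
    terms, rather than one quoted phrase.
    """
    # Escape any embedded double quote so it cannot end the term early.
    return " ".join('"%s"' % word.replace('"', '\\"') for word in text.split())


def select_query_string(text):
    """Build the q=... part of a /select URL (hypothetical helper)."""
    # urlencode uses quote_plus, which encodes non-ASCII text as UTF-8.
    return urlencode({"q": as_separate_terms(text)})
```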

Re: Solr 4.0 commit parameter 'waitFlush'

2012-05-16 Thread Jack Krupansky
As the doc says: "In Solr 4.0 it will be removed." See: http://wiki.apache.org/solr/UpdateXmlMessages But, the UpdateJSON doc certainly needs to be updated as well. -- Jack Krupansky -Original Message- From: rjain15 Sent: Wednesday, May 16, 2012 5:08 PM To: solr-user@lucene.apache.or

Solr 4.0 commit parameter 'waitFlush'

2012-05-16 Thread rjain15
I am using the commit parameter waitFlush, and it seems to throw an exception in 4.0. I am not sure what the purpose of this parameter is or whether it is required. SEVERE: org.apache.solr.common.SolrException: Unknown commit parameter 'waitFlush' at org.apache.solr.handler.RequestHan

Re: Update JSON not working for me

2012-05-16 Thread rjain15
Yonik, you are the best! Yes, as soon as I changed the header to "Content-type:application/json", it worked. Now I can see all my updates to the book category. I am ready to roll; thanks for the patience and help. Regards, Rajesh -- View this message in context: http://lucene.472066.n3.nabble.com/U
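For reference, a minimal Python sketch (standard library only) of the kind of request that ended up working: a POST of a JSON array of documents with the Content-type header set to application/json. The URL follows the UpdateJSON wiki example (/update/json?commit=true on the default Jetty port) and is an assumption about your setup; build_json_update is a hypothetical helper, and the request is only built here, not sent.

```python
import json
from urllib.request import Request


def build_json_update(docs, base_url="http://localhost:8983/solr"):
    """Build (but do not send) a POST of docs to Solr's JSON update handler."""
    body = json.dumps(docs).encode("utf-8")
    return Request(
        base_url + "/update/json?commit=true",
        data=body,
        # The header that made the difference in this thread.
        headers={"Content-type": "application/json"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen() would send it to a running Solr instance.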

Re: Update JSON not working for me

2012-05-16 Thread Yonik Seeley
On Wed, May 16, 2012 at 4:10 PM, rjain15 wrote: > Hi > > Firstly, apologies for the long post, I changed the quote to double quote > (and sometimes it is messy copying from DOS windows) > > Here is the command and the output on the Jetty Server Window. I am > highlighting some important pieces, >

Re: Update JSON not working for me

2012-05-16 Thread rjain15
Hi Firstly, apologies for the long post, I changed the quote to double quote (and sometimes it is messy copying from DOS windows) Here is the command and the output on the Jetty Server Window. I am highlighting some important pieces, I have enabled the LOG LEVEL to DEBUG on the JETTY window.

Re: Update JSON not working for me

2012-05-16 Thread Michael Della Bitta
Look out, the first end quote is in the wrong spot. Michael On Wed, May 16, 2012 at 3:29 PM, Yonik Seeley wrote: > On Wed, May 16, 2012 at 2:36 PM, rjain15 wrote: >> No. Changing to name:monsters didn't work > > OK, but you'll have to do that if you get the other part working. > >> Here is my gu

Re: Update JSON not working for me

2012-05-16 Thread Yonik Seeley
On Wed, May 16, 2012 at 2:36 PM, rjain15 wrote: > No. Changing to name:monsters didn't work OK, but you'll have to do that if you get the other part working. > Here is my guess, the UpdateJSON is not adding any new documents to the > existing index. If that's true, the most likely culprit is yo

Re: CloudSolrServer not working with standalone Zookeeper

2012-05-16 Thread Daniel Brügge
OK, it's also not working with an internally started Zookeeper. On Wed, May 16, 2012 at 8:29 PM, Daniel Brügge < daniel.brue...@googlemail.com> wrote: > Hi, > > I am just playing around with SolrCloud and have read in articles like > > http://www.lucidimagination.com/blog/2012/03/05/scaling-solr-in

Re: Update JSON not working for me

2012-05-16 Thread rjain15
Hi, no. Changing to name:monsters didn't work. Here is my guess: the UpdateJSON is not adding any new documents to the existing index. The document count remains the same after I call the UpdateJSON. I am new to Solr; my guess is that if there is some underlying schema that dictates what can

CloudSolrServer not working with standalone Zookeeper

2012-05-16 Thread Daniel Brügge
Hi, I am just playing around with SolrCloud and have read in articles like http://www.lucidimagination.com/blog/2012/03/05/scaling-solr-indexing-with-solrcloud-hadoop-and-behemoth/ that it is sufficient to create the connection to the Zookeeper instance and not to the Solr instance. When I try to c

Re: Update JSON not working for me

2012-05-16 Thread Yonik Seeley
On Wed, May 16, 2012 at 1:43 PM, rjain15 wrote: > http://localhost:8983/solr/select?q=title:monsters&wt=json&indent=true Try switching title:monsters to name:monsters https://issues.apache.org/jira/browse/SOLR-2598 Looks like the data was changed to use the name field instead and the docs were n

Re: - Solr 4.0 - How do I enable JSP support ? ...

2012-05-16 Thread Stefan Matheis
That will just enable support for rendering JSPs, nothing more. For SolrCloud you may want to read the Wiki: http://wiki.apache.org/solr/SolrCloud On Wednesday, May 16, 2012 at 8:07 PM, rjain15 wrote: > java -jar start.jar -OPTIONS=jsp > > What is SolrCloud...sorry newbie to Solr. > >

Re: - Solr 4.0 - How do I enable JSP support ? ...

2012-05-16 Thread rjain15
java -jar start.jar -OPTIONS=jsp What is SolrCloud? Sorry, I'm new to Solr. Thanks, Rajesh -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-0-How-do-I-enable-JSP-support-tp3983763p3984195.html Sent from the Solr - User mailing list archive at Nabble.com.

Solr request tracking

2012-05-16 Thread Rahul Warawdekar
Hi, Is there any mechanism by which we can track and trend the incoming Solr search requests? For example, logging all incoming Solr requests to a log file separate from Tomcat's, and having a tool to trend the patterns? -- Thanks and Regards Rahul A. Warawdekar
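One way to trend requests is to post-process Solr's own request log. The sketch below assumes each request is logged on one line in the Solr 3.x style shown in the comment (path, params, hits, status, QTime); that format is an assumption to check against your own logs, and parse_request and top_queries are hypothetical names.

```python
import re
from collections import Counter

# Assumed shape of a request log line (verify against your own logs):
# INFO: [] webapp=/solr path=/select params={q=ipod&rows=10} hits=12 status=0 QTime=3
REQUEST_RE = re.compile(
    r"path=(?P<path>\S+) params=\{(?P<params>[^}]*)\}"
    r"(?: hits=(?P<hits>\d+))? status=(?P<status>\d+) QTime=(?P<qtime>\d+)"
)


def parse_request(line):
    """Return a dict describing one request line, or None if it does not match."""
    m = REQUEST_RE.search(line)
    if m is None:
        return None
    # Params are assumed to be logged as URL-style k=v pairs joined by '&'.
    params = dict(p.split("=", 1) for p in m.group("params").split("&") if "=" in p)
    return {
        "path": m.group("path"),
        "q": params.get("q"),
        "hits": int(m.group("hits")) if m.group("hits") else None,
        "qtime_ms": int(m.group("qtime")),
    }


def top_queries(lines, n=10):
    """Count the most frequent q values among /select requests."""
    counts = Counter()
    for line in lines:
        req = parse_request(line)
        if req and req["path"] == "/select" and req["q"] is not None:
            counts[req["q"]] += 1
    return counts.most_common(n)
```

Grouping these counts by day or hour gives a simple trend; sending Solr's log to its own file is a logging-configuration matter, independent of this parsing step.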

Re: - Solr 4.0 - How do I enable JSP support ? ...

2012-05-16 Thread Stefan Matheis
And you're running SolrCloud and not just 'java -jar start.jar', right Rajesh? On Wednesday, May 16, 2012 at 7:39 PM, rjain15 wrote: > http://localhost:8983/solr/#/~cloud > > I get the 404 error > > Loading of undefined failed with HTTP-Status 404 > > I am using the nightly build, apache-so

Re: Update JSON not working for me

2012-05-16 Thread rjain15
Hi, I have tried with the latest nightly build, apache-solr-4.0-2012-05-15_08-20-37. I am trying on a 64-bit Windows OS; I believe you have tested this on a Linux box (based on the shell script). Not sure what I am missing, but the doesn't seem to work: I have changed the URL to just call the upd

Re: - Solr 4.0 - How do I enable JSP support ? ...

2012-05-16 Thread rjain15
http://localhost:8983/solr/#/~cloud I get the 404 error Loading of undefined failed with HTTP-Status 404 I am using the nightly build, apache-solr-4.0-2012-05-15_08-20-37 Thanks Rajesh -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-0-How-do-I-enable-JSP-support-tp

Solr query and double quotes

2012-05-16 Thread anarchos78
Hello friends, When I pass queries to Solr, I pass them as quoted strings (“blah blah”). I am doing this because I have encoding problems with Greek (my input field accepts Greek characters only as a string). But Solr treats the characters inside the quotes as an “exact match” term. Is there a way to rem

RE: Sort by length percentage match

2012-05-16 Thread Steven A Rowe
Hi Alejandro, N-grams might be a good fit. Using bigrams (n-grams of length 2) for "london", you'd get tokens "lo", "on", "nd", "do", "on". This should provide the hit ordering you want. Although it's not listed on Solr's analysis factories wiki page
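The bigram idea can be illustrated outside Solr. This Python sketch generates the same bigrams Steven lists and, as a hypothetical stand-in for Solr's scoring, ranks each city by the fraction of its bigrams that also occur in the query, so fields that are mostly made of the query rank higher. Solr's actual scores come from its similarity and length norms, so the numbers will differ; char_ngrams and overlap_ratio are illustrative names.

```python
def char_ngrams(text, n=2):
    """All overlapping character n-grams of text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def overlap_ratio(query, field, n=2):
    """Fraction of the field's n-grams that also appear in the query.

    A rough proxy for "length percentage match": extra characters in the
    field add n-grams that do not match, which lowers the ratio.
    """
    query_grams = set(char_ngrams(query, n))
    field_grams = char_ngrams(field, n)
    if not field_grams:
        return 0.0
    hits = sum(1 for g in field_grams if g in query_grams)
    return hits / len(field_grams)
```

With the cities from the original question, this orders london (1.0), londonderry (0.6), south west london (0.3125), oxford (0.0).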

Sort by length percentage match

2012-05-16 Thread Alejandro Cuesta
Hi, I have a field containing "cities" and I'd like to sort the results based on length percentage match. Example: Assuming I've got these cities in the index: london, south west london, londonderry, oxford. And I search for "london", I'd like to get a list sorted like this: london

Re: FrenchLightStemFilterFactory : normalizing tokens longer than 4 characters and having repeated characters in it

2012-05-16 Thread Tanguy Moal
Thank you! JIRA issue filed : https://issues.apache.org/jira/browse/SOLR-3463 -- Tanguy 2012/5/16 Steven A Rowe > Hi Tanguy, > > I looked at the code, and I can see where the problem you describe is > happening. > > I think it's a bug: if numbers are search terms, "stemming" them by > compress

Must match and terms with only one letter

2012-05-16 Thread roySolr
Hello, I use the mm (minimum should match) parameter on my edismax request handler (70%). This works great, but I have one problem: when I search for "A Cole", only one term has to match (mm = 70%). The problem is the "A": it returns 9200 documents with an "A" in them. Is there a possibility to skip terms with onl
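How a percentage mm turns into a clause count, as a small Python sketch: for a positive percentage Solr rounds down, so 70% of the two optional clauses in "A Cole" is 1.4, which becomes 1. The function name is hypothetical, and it covers only plain positive percentages, not negative values or conditional specs such as 2<70%.

```python
def min_should_match(optional_clauses, percent):
    """Optional clauses that must match for a plain positive percentage mm.

    The percentage result is rounded down, so integer division is used.
    """
    if not 0 <= percent <= 100:
        raise ValueError("this sketch only handles 0..100 percent")
    return optional_clauses * percent // 100
```

So with two query terms only one needs to match, which is why the lone "A" is enough to return a document.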

Re: slave index not cleaned

2012-05-16 Thread Jasper Floor
Btw, confirmed that this doesn't happen on our development stage with 3.6. On Wed, May 16, 2012 at 3:59 PM, Jasper Floor wrote: > The slave index does indeed grow over a period of time regardless of > restarts. We do run on 1.4 however. We will be updating to 3.6 very > soon however so I will see

RE: FrenchLightStemFilterFactory : normalizing tokens longer than 4 characters and having repeated characters in it

2012-05-16 Thread Steven A Rowe
Hi Tanguy, I looked at the code, and I can see where the problem you describe is happening. I think it's a bug: if numbers are search terms, "stemming" them by compressing repeated digits makes little sense. Could you file a bug in JIRA? Please include the examples you gave in your earlier em

Re: FrenchLightStemFilterFactory : normalizing tokens longer than 4 characters and having repeated characters in it

2012-05-16 Thread Robert Muir
On Wed, May 16, 2012 at 8:28 AM, Tanguy Moal wrote: > Any idea someone ? > > I think this is important since this could produce weird results on > collections with numbers mixed in text. I agree, I think we should just add '&& Character.isLetter(ch)' to the undoublet check? Thanks for bringing t
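A simplified Python model of the proposed fix, not the Lucene FrenchLightStemmer source: in tokens longer than 4 characters, a repeated character is collapsed only when it is a letter, so digit runs such as 33000 survive. The name undouble is hypothetical.

```python
def undouble(token, min_len=5):
    """Collapse runs of a repeated letter in tokens longer than 4 characters.

    Non-letters (digits, punctuation) are never collapsed, mirroring the
    proposed isLetter() guard.
    """
    if len(token) < min_len:
        return token
    out = [token[0]]
    for ch in token[1:]:
        if ch == out[-1] and ch.isalpha():
            continue  # drop the repeated letter
        out.append(ch)
    return "".join(out)
```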

Re: Language analyzers

2012-05-16 Thread Robert Muir
On Wed, May 16, 2012 at 10:17 AM, anarchos78 wrote: > Hello, > > Is it possible to use two language analyzers for one fieldtype. Let's say > Greek and English (for indexing and querying) > For Greek and English it's easy: they use totally different characters, so none of their token filters will con

Re: Language analyzers

2012-05-16 Thread Sven Maurmann
Hi! Could you explain this in a little more detail? Thanks, Sven On 16.05.2012 at 16:17, anarchos78 wrote: > Hello, > > Is it possible to use two language analyzers for one fieldtype. Let's say > Greek and English (for indexing and querying) > > Thanks > > -- > View this message in context

Language analyzers

2012-05-16 Thread anarchos78
Hello, Is it possible to use two language analyzers for one fieldtype? Let's say Greek and English (for indexing and querying). Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Language-analyzers-tp3984116.html Sent from the Solr - User mailing list archive at Nabble.com

Re: SolrJ 4, soft commit

2012-05-16 Thread crive
I'll have a go at it in a bit; in the meantime I've kind of worked around it by setting autoSoftCommit maxDocs to 1. On Wed, May 16, 2012 at 3:08 PM, Ahmet Arslan wrote: > > You can still access the raw params for the update request > > though - and then just look at > http://wiki.apache.org/solr/Upda

Re: SolrJ 4, soft commit

2012-05-16 Thread Ahmet Arslan
> You can still access the raw params for the update request > though - and then just look at > http://wiki.apache.org/solr/UpdateXmlMessages#A.22commit.22_and_.22optimize.22 > > Just get the modifiable params from the request and set the > soft commit. Does this code work? SolrServer s

Facing Problem while testing solr 3.6 with Tomcat 6

2012-05-16 Thread Amit Handa
Hi all, Kindly guide me in resolving the following issue, which occurs while testing Apache Solr 3.6 with Tomcat 6 when trying to access "http://localhost:8080/solr-example/": HTTP Status 500 - type Exception report message description The server en

Re: slave index not cleaned

2012-05-16 Thread Jasper Floor
The slave index does indeed grow over a period of time regardless of restarts. We do run on 1.4, however. We will be updating to 3.6 very soon, so I will see how that works out. Actually, we should be able to see this on our staging platform. Thanks, everyone. Kind regards, Jasper On Mon, May 14, 201

Re: commit question

2012-05-16 Thread Mark Miller
On May 16, 2012, at 5:23 AM, marco crivellaro wrote: > Hi all, > this might be a silly question but I've found different opinions on the > subject. > > When a search is run after a commit is performed will the result include all > document(s) committed until last commit? > > use case (sync): >

Re: SolrJ 4, soft commit

2012-05-16 Thread Mark Miller
On May 16, 2012, at 6:07 AM, marco crivellaro wrote: > Hi all, > I am evaluating Solr 4.0 for its NRT capabilities. > How can you perform a soft commit with solrj 4.0? > > HttpSolrServer.commit method doesn't have softCommit option which appears to > be an option available for the commit command

Re: Adding config to SolrCloud without creating any shards/slices

2012-05-16 Thread Mark Miller
k On May 16, 2012, at 5:35 AM, Per Steffensen wrote: > Hi > > We want to create a Solr config in ZK during installation of our product, but > we don't want to create any shards in that phase. We will create shards from > our application when it starts up and also automatically maintain the set o

Re: FrenchLightStemFilterFactory : normalizing tokens longer than 4 characters and having repeated characters in it

2012-05-16 Thread Tanguy Moal
Any ideas, anyone? I think this is important since this could produce weird results on collections with numbers mixed in text. From my understanding, there are a few options to address the issue: 1) Make *LightStemmer token type aware and don't try to stem on things that are not text (alpha/alp

Re: curl or nutch

2012-05-16 Thread Tolga
Can nutch crawl/index files as well? On 5/16/12 12:29 PM, findbestopensource wrote: You could very well use Solr. It has support to index the PDF and XML files. If you want to index websites and search using page rank then choose Nutch. Regards Aditya www.findbestopensource.com On Wed, May 16

Re: PermGen OOM Error

2012-05-16 Thread SH
So you have to increase the memory available to the JVM. What servlet container are you using? SH On 05/16/2012 01:50 PM, richard.pog...@holidaylettings.co.uk wrote: When running Solr we are experiencing PermGen OOM exceptions, this problem gets worse and worse the more documents are added and co

Re: curl or nutch

2012-05-16 Thread Tirthankar Chatterjee
If you use curl, you will need to track every document and recurse inside folders, etc. If you use Nutch, it takes care of incremental crawling in the configured locations and submits the docs that changed since its previous run. The lack of a simple file system crawler around Solr is a big disadv
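What "track every document and recurse inside folders" means for a curl-based setup can be sketched in Python: walk a directory tree and keep only the PDF/XML files modified since the previous run. The function name and the idea of storing the last run time are assumptions for illustration; each returned path would still have to be posted to Solr (for example, to the ExtractingRequestHandler for PDFs).

```python
import os


def changed_files(root, since, extensions=(".pdf", ".xml")):
    """Recursively list files under root with a wanted extension modified after `since`.

    `since` is a Unix timestamp, e.g. the time the previous indexing run started.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Case-insensitive extension check (a.PDF and a.pdf both qualify).
            if name.lower().endswith(extensions):
                path = os.path.join(dirpath, name)
                if os.path.getmtime(path) > since:
                    found.append(path)
    return sorted(found)
```

This is the bookkeeping Nutch does for you; with curl you would also need to handle deletions, which this sketch ignores.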

PermGen OOM Error

2012-05-16 Thread richard.pog...@holidaylettings.co.uk
When running Solr we are experiencing PermGen OOM exceptions; this problem gets worse and worse the more documents are added and committed. Stopping the Java process does not seem to free the memory. Has anyone experienced issues like this? Kind regards, Richard

Solr Single Core vs Multiple Cores installation for localization

2012-05-16 Thread Ivan Hrytsyuk
Hello, We are going to add multi-language support to our Solr-based project. We are considering the following Solr installation types: 1. Single core - all fields for all languages reside in a single core, i.e. title_en, description_en, title_de, description_de, title_fr, description_fr 2. Multi

indexing Dublin core xml files

2012-05-16 Thread ggggGuys
Hello, I'd like to index XML files in the Dublin Core format in Solr. I'd like to know which files I should modify and how. Thank you :) -- View this message in context: http://lucene.472066.n3.nabble.com/indexing-Dublin-core-xml-files-tp3984060.html Sent from the Solr - User mailing list archive

SolrJ 4, soft commit

2012-05-16 Thread marco crivellaro
Hi all, I am evaluating Solr 4.0 for its NRT capabilities. How can you perform a soft commit with SolrJ 4.0? The HttpSolrServer.commit method doesn't have a softCommit option, which appears to be available for the commit command: http://wiki.apache.org/solr/UpdateXmlMessages#A.22commit.22_and_.
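The commit can also be expressed as plain update request parameters, in line with the thread's suggestion to set the soft commit on the raw update params. A minimal Python sketch, assuming the default /update handler on localhost:8983: it sets commit=true together with softCommit and waitSearcher, and leaves out waitFlush, which Solr 4.0 rejects as an unknown commit parameter. commit_url is a hypothetical helper; treat the exact URL as an assumption about your setup.

```python
from urllib.parse import urlencode


def commit_url(base_url="http://localhost:8983/solr", soft=True, wait_searcher=True):
    """Build an /update URL that triggers a (soft) commit via request parameters.

    waitFlush is deliberately omitted: Solr 4.0 no longer accepts it.
    """
    params = {
        "commit": "true",
        "softCommit": "true" if soft else "false",
        "waitSearcher": "true" if wait_searcher else "false",
    }
    return base_url + "/update?" + urlencode(params)
```

In SolrJ the same parameters could be set on the update request's params, as the reply in this thread suggests.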

Dismax query results vary on Solr1.4 and 3.6.

2012-05-16 Thread Katsuyoshi NOGUCHI
Hi, guys! I need some advice. When sending the same dismax query to Solr 1.4 and 3.6, the results for search words analyzed by WordDelimiterFilterFactory differ, as below: [Search Word] test.pdf [Result] Solr1.4: Search results are analyzed as "test" AND "pdf" Solr3.6: Search results are

Adding config to SolrCloud without creating any shards/slices

2012-05-16 Thread Per Steffensen
Hi, We want to create a Solr config in ZK during installation of our product, but we don't want to create any shards in that phase. We will create shards from our application when it starts up and also automatically maintain the set of shards from our application (which uses SolrCloud). The onl

Re: Boosting score by Geo distance

2012-05-16 Thread Mikhail Khludnev
http://wiki.apache.org/solr/FunctionQuery#recip you are welcome On Wed, May 16, 2012 at 12:25 PM, roySolr wrote: > Hello, > > I want to boost the score of the founded documents by geo distance. I use > this: > > bf=recip(geodist(),2,1000,30) > > It works but i don't know what the parameters mea
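recip(x,m,a,b) computes a / (m*x + b). With geodist() returning kilometres, the parameters in Roy's bf give a boost of about 33.3 at distance 0, 20 at 10 km, and 1 at 485 km, decaying smoothly in between. A one-line Python version for checking values:

```python
def recip(x, m, a, b):
    """Same formula as Solr's recip(x,m,a,b) function query: a / (m*x + b)."""
    return a / (m * x + b)
```

Raising a scales every boost up; raising m makes the boost fall off faster with distance; b controls the value at distance 0 (a/b).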

Re: First query to find meta data, second to search. How to group into one?

2012-05-16 Thread Mikhail Khludnev
Your approach sounds like the well-known old-school one: http://nlp.stanford.edu/IR-book/html/htmledition/pseudo-relevance-feedback-1.html I believe you can hack MLT and do what you need. I'm working on something like this, and there are a number of approaches. One of the simple ones is to build a custom co

Re: curl or nutch

2012-05-16 Thread findbestopensource
You could very well use Solr. It supports indexing PDF and XML files. If you want to index websites and search using page rank, then choose Nutch. Regards Aditya www.findbestopensource.com On Wed, May 16, 2012 at 1:13 PM, Tolga wrote: > Hi, > > I have been trying for a week. I really wan

commit question

2012-05-16 Thread marco crivellaro
Hi all, this might be a silly question but I've found different opinions on the subject. When a search is run after a commit is performed will the result include all document(s) committed until last commit? use case (sync): 1- add document 2- commit 3- search (faceted) will faceted search on poi

Re: First query to find meta data, second to search. How to group into one?

2012-05-16 Thread Samarendra Pratap
Thanks Sujit, Mikhail, for your suggestions. Sujit - Continuing to do it on the client side adds an extra round trip between the server and the client. Moreover, it does not remain centralized, so I may have to repeat the client-side logic in multiple places, depending on how it is implemented. Mikhail - More

Re: Problem with AND clause in multi core search query

2012-05-16 Thread ravicv
Hi Eric, For this scenario I wrote a custom request handler that gets the individual results from each core and then applies the AND clause to the results. Please let me know whether this approach could cause any other issues later, or can you suggest another approach?
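For reference, the AND step described here amounts to intersecting the document ids returned by each core. A minimal Python sketch with a hypothetical helper name; it is only correct if each core returns every matching id, not just its top rows, otherwise documents that match in all cores can be missed.

```python
def and_across_cores(*id_lists):
    """Ids present in every core's result list, in the first list's order."""
    if not id_lists:
        return []
    common = set(id_lists[0]).intersection(*id_lists[1:])
    return [doc_id for doc_id in id_lists[0] if doc_id in common]
```

Because the combined result is ordered by the first core only, relevance from the other cores is ignored; that is a design choice to revisit if ranking matters.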

Boosting score by Geo distance

2012-05-16 Thread roySolr
Hello, I want to boost the score of the found documents by geo distance. I use this: bf=recip(geodist(),2,1000,30) It works, but I don't know what the parameters (2,1000,30) mean. Thanks Roy -- View this message in context: http://lucene.472066.n3.nabble.com/Boosting-score-by-Geo-distance

curl or nutch

2012-05-16 Thread Tolga
Hi, I have been trying for a week. I really want to get started, so what should I use: curl or Nutch? I want to be able to index PDF, XML, etc. and search within them as well. Regards,