Actually, something changed: I managed to crawl and index some pages (the
others must have to do with regex-urls). Thank you!

Was this always necessary? Any pointers to a discussion of why it's needed?
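
For reference, the "unknown field" error quoted below is what Solr reports when an incoming document names a field that the schema does not declare. Here is a minimal sketch of checking that up front, by comparing the dest fields in Nutch's solrindex-mapping.xml with the fields declared in Solr's schema.xml. The function names and the inline sample XML are hypothetical, trimmed for illustration and not the files shipped with either project, and the check ignores dynamicField patterns.

```python
import xml.etree.ElementTree as ET


def declared_fields(schema_xml):
    """Return the names of all <field name="..."> elements in a Solr schema."""
    root = ET.fromstring(schema_xml)
    return {f.get("name") for f in root.iter("field") if f.get("name")}


def mapped_fields(mapping_xml):
    """Return the dest names of all <field dest="..."> elements in a mapping."""
    root = ET.fromstring(mapping_xml)
    return {f.get("dest") for f in root.iter("field") if f.get("dest")}


def missing_fields(schema_xml, mapping_xml):
    """Fields the mapping writes to that the schema does not declare."""
    return sorted(mapped_fields(mapping_xml) - declared_fields(schema_xml))


# Hypothetical, heavily trimmed samples (assumption: not the real shipped files).
SOLR_STOCK_SCHEMA = """<schema name="example" version="1.3">
  <fields>
    <field name="id" type="string" indexed="true" stored="true"/>
    <field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
  </fields>
</schema>"""

NUTCH_MAPPING = """<mapping>
  <fields>
    <field dest="content" source="content"/>
    <field dest="url" source="url"/>
  </fields>
</mapping>"""

if __name__ == "__main__":
    # Any name printed here would trigger an "unknown field" error on indexing.
    print(missing_fields(SOLR_STOCK_SCHEMA, NUTCH_MAPPING))
```

To run it against real files, read $SOLR_HOME/conf/schema.xml and $NUTCH_HOME/conf/solrindex-mapping.xml into strings and pass them to missing_fields.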

On Tue, May 10, 2011 at 5:40 PM, Gabriele Kahlout
<gabri...@mysimpatico.com> wrote:

> You mean that I should copy it from nutch into solr?
>
> $ cp $NUTCH_HOME/conf/schema.xml $SOLR_HOME/conf/schema.xml
>
> After restarting Tomcat and re-executing the script, nothing changed.
>
>
> On Tue, May 10, 2011 at 5:35 PM, Markus Jelsma
> <markus.jel...@openindex.io> wrote:
>
>> You need to use the schema.xml shipped with Nutch in Solr. It provides most
>> fields that you need.
>>
>> On Tuesday 10 May 2011 17:31:33 Gabriele Kahlout wrote:
>> > I don't get you, are you talking about conf/schema.xml? That's what I'm
>> > referring to. Am I supposed to do something with Nutch's
>> > conf/schema.xml?
>> >
>> > On Tue, May 10, 2011 at 4:46 PM, Markus Jelsma
>> > <markus.jel...@openindex.io> wrote:
>> > > There is a working example schema in Nutch's conf directory.
>> > >
>> > > On Tuesday 10 May 2011 16:40:02 Gabriele Kahlout wrote:
>> > > > From solr logs:
>> > > >
>> > > > May 10, 2011 4:33:20 PM org.apache.solr.common.SolrException log
>> > > > *SEVERE: org.apache.solr.common.SolrException: ERROR:unknown field 'content'*
>> > > >     at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:321)
>> > > >     at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
>> > > >     at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:147)
>> > > >     at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
>> > > >     at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)
>> > > >     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>> > > >     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
>> > > >     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
>> > > >     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
>> > > >     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:244)
>> > > >     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
>> > > >     at org.netbeans.modules.web.monitor.server.MonitorFilter.doFilter(MonitorFilter.java:393)
>> > > >     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:244)
>> > > >     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
>> > > >     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:240)
>> > > >     at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:161)
>> > > >     at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:164)
>> > > >     at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100)
>> > > >     at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:550)
>> > > >     at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
>> > > >     at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:380)
>> > > >     at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:243)
>> > > >     at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:188)
>> > > >     at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:166)
>> > > >     at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:288)
>> > > >     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>> > > >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>> > > >     at java.lang.Thread.run(Thread.java:680)
>> > > >
>> > > > in conf/schema.xml:
>> > > >    <!-- fields for index-basic plugin -->
>> > > >         <field name="host" type="url" stored="false" indexed="true"/>
>> > > >         <field name="site" type="string" stored="false" indexed="true"/>
>> > > >         <field name="url" type="url" stored="true" indexed="true"
>> > > >             required="true"/>
>> > > > *        <field name="content" type="text" stored="false" indexed="true"/>*
>> > >
>> > > > in conf/solrindex-mapping.xml:
>> > > >     <fields>
>> > > >
>> > > >         <field dest="content" source="content"/>
>> > > >
>> > > > In recent Solr I think this has been renamed to text?
>> > > >
>> > > > Solr's conf/schema.xml:
>> > > >         via copyField further on in this schema  -->
>> > > >
>> > > > *   <field name="text" type="text" indexed="true" stored="false" multiValued="true"/>*
>> > > >
>> > > > On Tue, May 10, 2011 at 4:30 PM, Gabriele Kahlout
>> > > > <gabri...@mysimpatico.com> wrote:
>> > > > > It apparently is normal, and my issue is indeed with nutch.
>> > > > >
>> > > > > I've modified post.sh from the example docs to use the Solr at
>> > > > > http://localhost:8080/apache-solr-3.1-SNAPSHOT and now data finally
>> > > > > made it to the index.
>> > > > > $ post.sh solr.xml monitor.xml
>> > > > >
>> > > > > With nutch I'm at:
>> > > > >
>> > > > > $ svn info
>> > > > > Path: .
>> > > > > URL: http://svn.apache.org/repos/asf/nutch/branches/branch-1.3
>> > > > > Repository Root: http://svn.apache.org/repos/asf
>> > > > > Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
>> > > > > Revision: *1101459*
>> > > > > Node Kind: directory
>> > > > > Schedule: normal
>> > > > > Last Changed Author: markus
>> > > > > Last Changed Rev: 1101280
>> > > > > Last Changed Date: 2011-05-10 02:46:04 +0200 (Tue, 10 May 2011)
>> > > > >
>> > > > > Does this work for you? All I've done is svn co nutch 1.3 and execute
>> > > > > my script, which up to now worked.
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Tue, May 10, 2011 at 4:11 PM, Gabriele Kahlout
>> > > > > <gabri...@mysimpatico.com> wrote:
>> > > > >> Hello,
>> > > > >>
>> > > > >> I'm having trouble getting Solr 3.1 to work with nutch-1.3. I'm not
>> > > > >> sure where the problem is, but I'm wondering why the solrHome path
>> > > > >> ends with /./.
>> > > > >>
>> > > > >> cwd=/Applications/NetBeans/apache-tomcat-7.0.6/bin
>> > > > >> SolrHome=/Users/simpatico/apache-solr-3.1.0/solr/./
>> > > > >>
>> > > > >> In the web.xml of solr:
>> > > > >>    <env-entry>
>> > > > >>        <env-entry-name>solr/home</env-entry-name>
>> > > > >>        <env-entry-value>${user.home}/apache-solr-3.1.0/solr</env-entry-value>
>> > > > >>        <env-entry-type>java.lang.String</env-entry-type>
>> > > > >>    </env-entry>
>> > > > >>
>> > > > >> --
>> > > > >> Regards,
>> > > > >> K. Gabriele
>> > > > >>
>> > > > >> --- unchanged since 20/9/10 ---
>> > > > >> P.S. If the subject contains "[LON]" or the addressee acknowledges
>> > > > >> the receipt within 48 hours then I don't resend the email.
>> > > > >> subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧
>> > > > >> time(x) < Now + 48h) ⇒ ¬resend(I, this).
>> > > > >>
>> > > > >> If an email is sent by a sender that is not a trusted contact or the
>> > > > >> email does not contain a valid code then the email is not received.
>> > > > >> A valid code starts with a hyphen and ends with "X".
>> > > > >> ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧
>> > > > >> y ∈ L(-[a-z]+[0-9]X)).
>> > > > >
>> > > > > --
>> > > > > Regards,
>> > > > > K. Gabriele
>> > >
>> > > --
>> > > Markus Jelsma - CTO - Openindex
>> > > http://www.linkedin.com/in/markus17
>> > > 050-8536620 / 06-50258350
>>
>>
>
>
>
> --
> Regards,
> K. Gabriele
>
>


-- 
Regards,
K. Gabriele
