Securing Solr 5.0.0

2015-03-22 Thread Frederik Arnold
I followed the "Taking Solr to Production" tutorial and I now have a
Solr 5.0.0 instance up and running.

What is the recommended way for securing solr?
Searching should be available for everyone but I want authentication for
the Solr Admin UI and also for posting and deleting files.


schemaless slow indexing

2015-03-22 Thread Mike Murphy
I'm trying out schemaless in solr 5.0, but the indexing seems quite a
bit slower than it did in the past on 4.10.  Any pointers?

--Mike


Re: Securing Solr 5.0.0

2015-03-22 Thread Erick Erickson
Have you looked at https://wiki.apache.org/solr/SolrSecurity?

Best,
Erick

On Sun, Mar 22, 2015 at 4:20 AM, Frederik Arnold  wrote:
> I followed the "Taking Solr to Production" tutorial and I now have a
> Solr 5.0.0 instance up and running.
>
> What is the recommended way for securing solr?
> Searching should be available for everyone but I want authentication for
> the Solr Admin UI and also for posting and deleting files.


Re: schemaless slow indexing

2015-03-22 Thread Erick Erickson
Please review: http://wiki.apache.org/solr/UsingMailingLists

You haven't quantified the slowdown. Or given any details on how
you're measuring the "slowdown". Or how you've configured your setups
in 4.10 and 5.0. Or... As Hossman would say, "details matter".

Best,
Erick

On Sun, Mar 22, 2015 at 8:35 AM, Mike Murphy  wrote:
> I'm trying out schemaless in solr 5.0, but the indexing seems quite a
> bit slower than it did in the past on 4.10.  Any pointers?
>
> --Mike


Re: schemaless slow indexing

2015-03-22 Thread Mike Murphy
I start up solr schemaless and index a bunch of data, and it takes a
lot longer to finish indexing.
No configuration changes, just straight schemaless.

--Mike

On Sun, Mar 22, 2015 at 12:27 PM, Erick Erickson
 wrote:
> Please review: http://wiki.apache.org/solr/UsingMailingLists
>
> You haven't quantified the slowdown. Or given any details on how
> you're measuring the "slowdown". Or how you've configured your setups
> in 4.10 and 5.0. Or... As Hossman would say, "details matter".
>
> Best,
> Erick
>
> On Sun, Mar 22, 2015 at 8:35 AM, Mike Murphy  wrote:
>> I'm trying out schemaless in solr 5.0, but the indexing seems quite a
>> bit slower than it did in the past on 4.10.  Any pointers?
>>
>> --Mike


Re: Need help using DIH with FileListEntityProcessor with XPathEntityProcessor

2015-03-22 Thread Martin Wunderlich
Hi Alex, 

Thanks a lot for the reply and apologies for being unclear. The 
XPathEntityProcessor provides an option to specify an XSLT file that should be 
applied to the XML input prior to the actual data import. I am including my 
current configuration below, with the respective attribute highlighted. 

I have checked various forums and documentation bits, but the config XML seems 
ok to me. And yet, nothing gets imported. 

Cheers, 

Martin

<dataConfig>
<dataSource type=„FileDataSource />
<document>
<entity name="pickupdir"
processor="FileListEntityProcessor"
rootEntity="true"
fileName=".*xml"
baseDir=„/abs/path/to/source/dir/for/import/"
recursive="true"
newerThan="${dataimporter.last_index_time}"
dataSource="null">

<entity name="xml"
processor="XPathEntityProcessor"
stream="false"
useSolrAddSchema="true"
url="${pickupdir.fileAbsolutePath}"
xsl="/abs/path/to/xslt/file/in/myCore/conf/transform.xsl">
</entity>
</entity>
</document>
</dataConfig>
 
> On 22.03.2015 at 01:18, Alexandre Rafalovitch wrote:
> 
> What do you mean using DIH with XSLT together? DIH uses a basic XPath
> parser, but not full XSLT.
> 
> So, it's not very clear what the question actually means. How did you
> configure it all?
> 
> Regards,
>   Alex.
> 
> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> http://www.solr-start.com/ 
> 
> 
> On 21 March 2015 at 14:14, Martin Wunderlich  wrote:
>> Hi all,
>> 
>> I am trying to create a data import handler (DIH) to import XML files. The 
>> source XML should be transformed using XSLT into the standard Solr import 
>> format. I have tested the XSLT and successfully imported data using the 
>> Java-based simple import tool. However, when I try to import the same XML 
>> files with the same XSLT pre-processing using a DIH configured in 
>> solrconfig.xml, it doesn’t work. I can execute the DIH from the admin 
>> interface, but no documents get imported. The logging console doesn’t give 
>> any errors.
>> 
>> Could someone who has managed to successfully set up a similar configuration 
>> (XML import via DIH with XSL pre-processing), provide with the basic 
>> configuration, so that I can check what might be wrong in mine?
>> 
>> Thanks a lot.
>> 
>> Cheers,
>> 
>> Martin
>> 
>> 



Re: Securing Solr 5.0.0

2015-03-22 Thread Frederik Arnold
I have and I tried all sorts of things and they didn't work.
But I figured it out now. I setup Apache as a reverse proxy and it works.
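
In case it helps anyone who finds this thread later, the setup is roughly the
following (a minimal httpd 2.4 sketch; the core name, port and file paths are
placeholders for my environment, so adjust them):

# mod_proxy, mod_proxy_http and mod_auth_basic need to be enabled.
# Everything under /solr requires a login...
<Location "/solr">
    AuthType Basic
    AuthName "Solr"
    AuthBasicProvider file
    AuthUserFile "/etc/apache2/solr.htpasswd"
    Require valid-user
</Location>

# ...except the public search handler, which stays open to everyone.
<Location "/solr/mycore/select">
    Require all granted
</Location>

ProxyPass        "/solr" "http://localhost:8983/solr"
ProxyPassReverse "/solr" "http://localhost:8983/solr"

Solr itself should then only listen on localhost (or be firewalled), so that
nobody can bypass the proxy.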

2015-03-22 17:25 GMT+01:00 Erick Erickson :

> Have you looked at https://wiki.apache.org/solr/SolrSecurity?
>
> Best,
> Erick
>
> On Sun, Mar 22, 2015 at 4:20 AM, Frederik Arnold 
> wrote:
> > I followed the "Taking Solr to Production" tutorial and I now have a
> > Solr 5.0.0 instance up and running.
> >
> > What is the recommended way for securing solr?
> > Searching should be available for everyone but I want authentication for
> > the Solr Admin UI and also for posting and deleting files.
>


Re: Solr hangs / LRU operations are heavy on cpu

2015-03-22 Thread Umesh Prasad
We use filters very heavily because we run an e-commerce site which has a
lot of faceting and drill-downs configured at different paths on the store.
We are using master-slave replication, and we use the slaves to support
higher qps.

filterCache:
 Concurrent LFU Cache(maxSize=1, initialSize=4000, minSize=9000,
acceptableSize=9500, cleanupThread=true, timeDecay=true).

We see a 95-99% hit ratio on the filter cache, and most of our evictions
are on the filter cache.

These are figures from one of our prod boxes ..

   - size:9260
   - warmupTime:272007
   - timeDecay:true
   - cumulative_lookups:9220776
   - cumulative_hits:9048703
   - cumulative_hitratio:0.98


We had the default (untuned) cache settings two years back, and our perf
numbers were really bad. We got roughly a 25% latency improvement by tuning
our caches properly, so tuning the caches was well worth the effort.
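
For reference, the kind of solrconfig.xml entry we ended up with looks
roughly like this (the sizes here are illustrative, not our exact prod
values):

<!-- LFU filter cache; the minSize/acceptableSize shown in the stats are
     derived from size by the cache itself (roughly 90% and 95% of size). -->
<filterCache class="solr.LFUCache"
             size="10000"
             initialSize="4000"
             autowarmCount="128"
             cleanupThread="true"
             timeDecay="true"/>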




On 21 March 2015 at 02:16, Erick Erickson  wrote:

> Are you faceting? That can sometimes use one of the caches
> (just glanced at stack trace...) as entries are pushed into and
> removed from the cache during the same request. Shot
> in the dark.
>
> Best,
> Erick
>
> On Fri, Mar 20, 2015 at 12:17 PM, Yonik Seeley  wrote:
> > The document cache is not really going to be taking up time here.
> > How many concurrent requests (threads) are you testing with here?
> >
> > One thing I've seen over the years is a false sense of what is taking
> > up time when benchmarks with a lot of threads are used.  The reason is
> > that when there are a lot more threads than CPUs, it's natural for
> > context switches to happen where synchronizations happen.  You look at
> > a profiler or thread dumps, and you see a bunch of threads piled up on
> > synchronization.  This does not mean that removing that
> > synchronization will really help anything... the threads can't all run
> > at once.
> >
> > -Yonik
> >
> >
> > On Thu, Mar 19, 2015 at 6:35 PM, Sergey Shvets 
> wrote:
> >> Hi,
> >>
> >> we have quite a problem with Solr. We are running it in a config 6x3, and
> >> suddenly solr started to hang, taking all the available cpu on the nodes.
> >>
> >> In the thread dumps we noticed that things like this can eat a lot of CPU time
> >>
> >>
> >> - org.apache.solr.search.LRUCache.put(LRUCache.java:116)
> >> - org.apache.solr.search.SolrIndexSearcher.doc(SolrIndexSearcher.java:705)
> >> - org.apache.solr.response.BinaryResponseWriter$Resolver.writeResultsBody(BinaryResponseWriter.java:155)
> >> - org.apache.solr.response.BinaryResponseWriter$Resolver.writeResults(BinaryResponseWriter.java:183)
> >> - org.apache.solr.response.BinaryResponseWriter$Resolver.resolve(BinaryResponseWriter.java:88)
> >> - org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:158)
> >> - org.apache.solr.common.util.JavaBinCodec.writeNamedList(JavaBinCodec.java:148)
> >> - org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:242)
> >> - org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:153)
> >> - org.apache.solr.common.util.JavaBinCodec.marshal(JavaBinCodec.java:96)
> >> - org.apache.solr.response.BinaryResponseWriter.write(BinaryResponseWriter.java:52)
> >> - org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:758)
> >> - org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:426)
> >> - org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
> >> - org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
> >> - org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
> >> - org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
> >> - org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
> >> - org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
> >> - org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
> >> - org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
> >> - org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
> >>
> >>
> >> The cache itself is very minimalistic
> >>
> >>
> >>>> autowarmCount="0"/>
> >>  >> initialSize="512" autowarmCount="0"/>
> >>  >> autowarmCount="0"/>
> >>  >> autowarmCount="256" showItems="10" />
> >>  >> initialSize="0" autowarmCount="10"
> >> regenerator="solr.NoOpRegenerator"/>
> >> true
> >> 20
> >> 200
> >>
> >> Solr version is 4.10.3
> >>
> >> Any of help is appreciated!
> >>
> >> sergey
>



-- 
Thanks & Regards
Umesh Prasad
Tech Lead @ flipkart.com

 in.linkedin.com/pub/umesh-prasad/6/5bb/580/


Re: Need help using DIH with FileListEntityProcessor with XPathEntityProcessor

2015-03-22 Thread Alexandre Rafalovitch
I am not entirely sure your problem is at the XSL level yet?

*) I see problems with quotes in two places (in the dataSource, and in
the outer entity). Did you paste the definitions from MS Word by any chance?
*) I see that you declare the outer entity with rootEntity="true", so you
will not get anything from the inner documents.
*) I don't see any XPath definitions in the inner entity, so the
processor does not know how to actually map to the fields (that's
different from SqlEntityProcessor, which auto-maps).

I would step back from the inner DIH entity and make sure your outer
entity actually captures something, maybe by enabling a dynamicField "*"
with stored="true". See what you get into the schema. Then, add XPath
mappings against the original XML, just to make sure you capture
_something_. Then, add the XSLT and XPath.
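
For instance (the field and path names below are only placeholders), a
catch-all in the schema:

<dynamicField name="*" type="text_general" indexed="true" stored="true"
              multiValued="true"/>

and then, once the outer entity produces documents, an explicit mapping in
the inner entity:

<field column="title" xpath="/record/title"/>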

Regards,
   Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 22 March 2015 at 12:36, Martin Wunderlich  wrote:
> Hi Alex,
>
> Thanks a lot for the reply and apologies for being unclear. The 
> XPathEntityProcessor provides an option to specify an XSLT file that should 
> be applied to the XML input prior to the actual data import. I am including 
> my current configuration below, with the respective attribute highlighted.
>
> I have checked various forums and documentation bits, but the config XML 
> seems ok to me. And yet, nothing gets imported.
>
> Cheers,
>
> Martin
>
>
> <dataConfig>
> <dataSource type=„FileDataSource />
> <document>
> <entity name="pickupdir"
> processor="FileListEntityProcessor"
> rootEntity="true"
> fileName=".*xml"
> baseDir=„/abs/path/to/source/dir/for/import/"
> recursive="true"
> newerThan="${dataimporter.last_index_time}"
> dataSource="null">
>
> <entity name="xml"
> processor="XPathEntityProcessor"
> stream="false"
> useSolrAddSchema="true"
> url="${pickupdir.fileAbsolutePath}"
> xsl="/abs/path/to/xslt/file/in/myCore/conf/transform.xsl">
> </entity>
> </entity>
> </document>
> </dataConfig>
>
>
>
>
>> On 22.03.2015 at 01:18, Alexandre Rafalovitch wrote:
>>
>> What do you mean using DIH with XSLT together? DIH uses a basic XPath
>> parser, but not full XSLT.
>>
>> So, it's not very clear what the question actually means. How did you
>> configure it all?
>>
>> Regards,
>>   Alex.
>> 
>> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
>> http://www.solr-start.com/ 
>>
>>
>> On 21 March 2015 at 14:14, Martin Wunderlich  wrote:
>>> Hi all,
>>>
>>> I am trying to create a data import handler (DIH) to import XML files. The 
>>> source XML should be transformed using XSLT into the standard Solr import 
>>> format. I have tested the XSLT and successfully imported data using the 
>>> Java-based simple import tool. However, when I try to import the same XML 
>>> files with the same XSLT pre-processing using a DIH configured in 
>>> solrconfig.xml, it doesn’t work. I can execute the DIH from the admin 
>>> interface, but no documents get imported. The logging console doesn’t give 
>>> any errors.
>>>
>>> Could someone who has managed to successfully set up a similar 
>>> configuration (XML import via DIH with XSL pre-processing), provide with 
>>> the basic configuration, so that I can check what might be wrong in mine?
>>>
>>> Thanks a lot.
>>>
>>> Cheers,
>>>
>>> Martin
>>>
>>>
>


Re: schemaless slow indexing

2015-03-22 Thread Alexandre Rafalovitch
Same data, same version of Solr, with the only difference being schema
vs. schemaless? How much longer: 10%, 2x, 20x?

Schemaless mode has a much more complex UpdateRequestProcessor chain;
that's partly what makes it schemaless. But I hesitate to point fingers
at that without any real details.
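
For context, the schemaless chain in the stock 5.0 configs looks roughly
like this (abbreviated from memory, so check your own solrconfig.xml; the
parse processors normally carry extra nested configuration):

<updateRequestProcessorChain name="add-unknown-fields-to-the-schema">
  <processor class="solr.RemoveBlankFieldUpdateProcessorFactory"/>
  <processor class="solr.ParseBooleanFieldUpdateProcessorFactory"/>
  <processor class="solr.ParseLongFieldUpdateProcessorFactory"/>
  <processor class="solr.ParseDoubleFieldUpdateProcessorFactory"/>
  <processor class="solr.ParseDateFieldUpdateProcessorFactory"/>
  <processor class="solr.AddSchemaFieldsUpdateProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

Every document pays for all of that type guessing, which a fixed schema
skips entirely.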

Notice I am still asking the same questions as Erick!

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 22 March 2015 at 12:32, Mike Murphy  wrote:
> I start up solr schemaless and index a bunch of data, and it takes a
> lot longer to finish indexing.
> No configuration changes, just straight schemaless.
>
> --Mike
>
> On Sun, Mar 22, 2015 at 12:27 PM, Erick Erickson
>  wrote:
>> Please review: http://wiki.apache.org/solr/UsingMailingLists
>>
>> You haven't quantified the slowdown. Or given any details on how
>> you're measuring the "slowdown". Or how you've configured your setups
>> in 4.10 and 5.0. Or... As Hossman would say, "details matter".
>>
>> Best,
>> Erick
>>
>> On Sun, Mar 22, 2015 at 8:35 AM, Mike Murphy  wrote:
>>> I'm trying out schemaless in solr 5.0, but the indexing seems quite a
>>> bit slower than it did in the past on 4.10.  Any pointers?
>>>
>>> --Mike


Re: schemaless slow indexing

2015-03-22 Thread Yonik Seeley
I took a quick look at the stock schemaless configs... unfortunately
they contain a performance trap.
There's a copyField by default that copies *all* fields to a catch-all
field called "_text".

IMO, that's not a great default.  Double the index size (well, the
"index" portion of it at least... not stored fields), and slower
indexing performance.
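
The culprit is the catch-all field plus the copy rule in the managed
schema, something like this (paraphrasing from memory):

<field name="_text" type="text_general" indexed="true" stored="false"
       multiValued="true"/>
<copyField source="*" dest="_text"/>

So every field you index gets analyzed and indexed a second time into _text.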

The other unfortunate thing is the name.  Nowhere else in Solr (that
I know of) do we have a single-underscore field name.  _text looks
more like a dynamicField pattern.  Our other fields with underscores
look like _version_ and _root_.  If we're going to start a new naming
convention (or expand the naming conventions) we need to have some
consistency and logic behind it.

-Yonik

On Sun, Mar 22, 2015 at 12:32 PM, Mike Murphy  wrote:
> I start up solr schemaless and index a bunch of data, and it takes a
> lot longer to finish indexing.
> No configuration changes, just straight schemaless.
>
> --Mike
>
> On Sun, Mar 22, 2015 at 12:27 PM, Erick Erickson
>  wrote:
>> Please review: http://wiki.apache.org/solr/UsingMailingLists
>>
>> You haven't quantified the slowdown. Or given any details on how
>> you're measuring the "slowdown". Or how you've configured your setups
>> in 4.10 and 5.0. Or... As Hossman would say, "details matter".
>>
>> Best,
>> Erick
>>
>> On Sun, Mar 22, 2015 at 8:35 AM, Mike Murphy  wrote:
>>> I'm trying out schemaless in solr 5.0, but the indexing seems quite a
>>> bit slower than it did in the past on 4.10.  Any pointers?
>>>
>>> --Mike


Re: How to use ConcurrentUpdateSolrServer for Secured Solr?

2015-03-22 Thread Ramkumar R. Aiyengar
Not a direct answer, but Anshum just created this..

https://issues.apache.org/jira/browse/SOLR-7275
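
That said, an untested sketch of a possible workaround:
ConcurrentUpdateSolrServer has a constructor that accepts a pre-built
HttpClient, so you may be able to configure the auth on the client yourself
before handing it over. The URL, credentials, queue size and thread count
below are placeholders; check the constructor signatures for your SolrJ
version.

import org.apache.http.client.HttpClient;
import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
import org.apache.solr.client.solrj.impl.HttpClientUtil;
import org.apache.solr.common.params.ModifiableSolrParams;

// Build an HttpClient that already carries the basic-auth credentials.
ModifiableSolrParams params = new ModifiableSolrParams();
params.set(HttpClientUtil.PROP_BASIC_AUTH_USER, "user");
params.set(HttpClientUtil.PROP_BASIC_AUTH_PASS, "pass");
HttpClient httpClient = HttpClientUtil.createClient(params);

// Hand it to ConcurrentUpdateSolrServer instead of letting it build its own.
ConcurrentUpdateSolrServer server = new ConcurrentUpdateSolrServer(
    "http://localhost:8983/solr/collection1", httpClient, 10, 4);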
 On 20 Mar 2015 23:21, "Furkan KAMACI"  wrote:

> Is there any way to use ConcurrentUpdateSolrServer for secured Solr, as with
> CloudSolrServer:
>
> HttpClientUtil.setBasicAuth(cloudSolrServer.getLbServer().getHttpClient(),
> , );
>
> I see that there is no way to access the HttpClient for
> ConcurrentUpdateSolrServer?
>
> Kind Regards,
> Furkan KAMACI
>


Re: schemaless slow indexing

2015-03-22 Thread Mike Murphy
That's it!
I hand-edited the file that says you are not supposed to edit it and
removed that copyField.
Indexing performance is now back to expected levels.
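
For anyone else hitting this, the line I deleted from managed-schema was the
copyField Yonik mentioned, i.e. something like:

<copyField source="*" dest="_text"/>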

I created an issue for this, https://issues.apache.org/jira/browse/SOLR-7284

--Mike

On Sun, Mar 22, 2015 at 3:29 PM, Yonik Seeley  wrote:
> I took a quick look at the stock schemaless configs... unfortunately
> they contain a performance trap.
> There's a copyField by default that copies *all* fields to a catch-all
> field called "_text".
>
> IMO, that's not a great default.  Double the index size (well, the
> "index" portion of it at least... not stored fields), and slower
> indexing performance.
>
> The other unfortunate thing is the name.  Nowhere else in Solr (that
> I know of) do we have a single-underscore field name.  _text looks
> more like a dynamicField pattern.  Our other fields with underscores
> look like _version_ and _root_.  If we're going to start a new naming
> convention (or expand the naming conventions) we need to have some
> consistency and logic behind it.
>
> -Yonik
>
> On Sun, Mar 22, 2015 at 12:32 PM, Mike Murphy  wrote:
>> I start up solr schemaless and index a bunch of data, and it takes a
>> lot longer to finish indexing.
>> No configuration changes, just straight schemaless.
>>
>> --Mike
>>
>> On Sun, Mar 22, 2015 at 12:27 PM, Erick Erickson
>>  wrote:
>>> Please review: http://wiki.apache.org/solr/UsingMailingLists
>>>
>>> You haven't quantified the slowdown. Or given any details on how
>>> you're measuring the "slowdown". Or how you've configured your setups
>>> in 4.10 and 5.0. Or... As Hossman would say, "details matter".
>>>
>>> Best,
>>> Erick
>>>
>>> On Sun, Mar 22, 2015 at 8:35 AM, Mike Murphy  wrote:
 I'm trying out schemaless in solr 5.0, but the indexing seems quite a
 bit slower than it did in the past on 4.10.  Any pointers?

 --Mike


Error trying to index files to Solr

2015-03-22 Thread Majisha Parambath
Hello,

As part of an assignment, we initially crawled and collected NSF and NASA
Polar Datasets using Nutch. We used the nutch dump command to dump out the
segments that were created as part of the crawl.
Now we have to index this data into Solr. I am using java -jar post.jar
filename to post to Solr; however, after the execution I do not see my file
indexed, and checking the log I found the exceptions that I am attaching to
this mail.
Could you please let me know if I am missing something?

Thanks and regards,
*Majisha Namath Parambath*
*Graduate Student, M.S in Computer Science*
*Viterbi School of Engineering*
*University of Southern California, Los Angeles*
org.apache.solr.common.SolrException: Invalid UTF-8 middle byte 0x0 (at char #10, byte #-1)
    at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:176)
    at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:99)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1976)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:368)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
    at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)


SolrCloud on Hadoop (Hortonworks Data Platform)

2015-03-22 Thread Vijay Bhoomireddy
Hi, 

 

I am trying to set up a SolrCloud cluster on top of a Hadoop cluster using the
Hortonworks Data Platform. I understood how to configure Solr to store its
index data in HDFS (configuration given below). However, I could not
understand how to enable Solr to set up the cluster using the Zookeeper
already available with HDP.

As per my understanding, if I make only the HDFS-related change below, Solr
index data will be stored in HDFS; however, only the machine from which the
Solr application is run will act as a Solr server. Can anyone please let me
know how to configure Solr to use an external Zookeeper ensemble on HDP, so
that the complete Hadoop cluster can be used as a SolrCloud cluster?

 

<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">

  <str name="solr.hdfs.home">hdfs://Hadoop_namenode:8020/user/solr</str>

  <bool name="solr.hdfs.blockcache.enabled">true</bool>

  <int name="solr.hdfs.blockcache.slab.count">1</int>

  <bool name="solr.hdfs.blockcache.direct.memory.allocation">true</bool>

  <int name="solr.hdfs.blockcache.blocksperbank">16384</int>

  <bool name="solr.hdfs.blockcache.read.enabled">true</bool>

  <bool name="solr.hdfs.blockcache.write.enabled">true</bool>

  <bool name="solr.hdfs.nrtcachingdirectory.enable">true</bool>

  <int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">16</int>

  <int name="solr.hdfs.nrtcachingdirectory.maxcachedmb">192</int>

</directoryFactory>

 

Also, please let me know if there are other activities that need to be
performed to make SolrCloud work on Hadoop, apart from these HDFS and
Zookeeper changes.
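
In case it helps frame the question: I assume the SolrCloud side would be
something like starting every Solr node against the existing ensemble (host
names below are made up):

bin/solr start -c -z hdpzk1:2181,hdpzk2:2181,hdpzk3:2181/solr

but I am not sure whether HDP's bundled Zookeeper needs any special handling,
or where the /solr chroot should be created.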

 

 

Thanks & Regards

Vijay




Re: schemaless slow indexing

2015-03-22 Thread Erick Erickson
I think you mean https://issues.apache.org/jira/browse/SOLR-7290?

Erick

On Sun, Mar 22, 2015 at 2:30 PM, Mike Murphy  wrote:
> That's it!
> I hand-edited the file that says you are not supposed to edit it and
> removed that copyField.
> Indexing performance is now back to expected levels.
>
> I created an issue for this, https://issues.apache.org/jira/browse/SOLR-7284
>
> --Mike
>
> On Sun, Mar 22, 2015 at 3:29 PM, Yonik Seeley  wrote:
>> I took a quick look at the stock schemaless configs... unfortunately
>> they contain a performance trap.
>> There's a copyField by default that copies *all* fields to a catch-all
>> field called "_text".
>>
>> IMO, that's not a great default.  Double the index size (well, the
>> "index" portion of it at least... not stored fields), and slower
>> indexing performance.
>>
>> The other unfortunate thing is the name.  Nowhere else in Solr (that
>> I know of) do we have a single-underscore field name.  _text looks
>> more like a dynamicField pattern.  Our other fields with underscores
>> look like _version_ and _root_.  If we're going to start a new naming
>> convention (or expand the naming conventions) we need to have some
>> consistency and logic behind it.
>>
>> -Yonik
>>
>> On Sun, Mar 22, 2015 at 12:32 PM, Mike Murphy  wrote:
>>> I start up solr schemaless and index a bunch of data, and it takes a
>>> lot longer to finish indexing.
>>> No configuration changes, just straight schemaless.
>>>
>>> --Mike
>>>
>>> On Sun, Mar 22, 2015 at 12:27 PM, Erick Erickson
>>>  wrote:
 Please review: http://wiki.apache.org/solr/UsingMailingLists

 You haven't quantified the slowdown. Or given any details on how
 you're measuring the "slowdown". Or how you've configured your setups
 in 4.10 and 5.0. Or... As Hossman would say, "details matter".

 Best,
 Erick

 On Sun, Mar 22, 2015 at 8:35 AM, Mike Murphy  wrote:
> I'm trying out schemaless in solr 5.0, but the indexing seems quite a
> bit slower than it did in the past on 4.10.  Any pointers?
>
> --Mike


Re: Error trying to index files to Solr

2015-03-22 Thread Shawn Heisey
On 3/22/2015 5:04 PM, Majisha Parambath wrote:
> As part of an assignment, we initially crawled and collected NSF and
> NASA Polar Datasets using Nutch. We used the nutch dump command to dump
> out the segments that were created as part of the crawl.
> Now we have to index this data into Solr. I am using java -jar post.jar
> filename to post to Solr; however, after the execution I do not see my
> file indexed, and checking the log I found the exceptions that I am
> attaching to this mail.

Here's the first part of your exception:

org.apache.solr.common.SolrException: Invalid UTF-8 middle byte 0x0 (at char #10, byte #-1)
    at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:176)

Solr is expecting UTF-8 characters, but the info you are sending it is
in another character set, and includes characters outside the normal
ASCII set.  The error message indicates that it is XML data.

If you know what character set the data actually uses for encoding, you
can use XML methods to indicate the character set, and the XML libraries
that Solr is utilizing can probably convert to UTF-8 automatically.
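
For example, if the dump were actually in ISO-8859-1 (an assumption, purely
for illustration), declaring that in the XML prolog would let the parser
transcode it:

<?xml version="1.0" encoding="ISO-8859-1"?>
<add>
  <doc>
    <field name="id">doc1</field>
  </doc>
</add>

A 0x0 byte is also a common sign of UTF-16 or binary content, so it is worth
checking exactly what the nutch dump produced.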

http://www.w3schools.com/xml/xml_encoding.asp

Thanks,
Shawn