Re: ngrams with position

2016-03-08 Thread Emir Arnautovic

Hi Elisabeth,
I don't think there is such a token filter, so you would have to create 
your own token filter that takes a token and emits ngram tokens of a 
specific length. It should not be too hard to create such a filter - you 
can take a look at how the ngram filter is coded - yours should be simpler 
than that.
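
A minimal sketch of what such a filter could look like (against the Lucene
4.x TokenFilter API; the class and variable names are made up, and position
increments are left untouched since the logical position is carried in the
token text itself):

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public final class PositionedNGramFilter extends TokenFilter {
  private final int n;   // fixed ngram length, e.g. 3
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private char[] padded; // current token with n-1 leading and 1 trailing space
  private int pos;       // start offset of the next ngram within the token

  public PositionedNGramFilter(TokenStream input, int n) {
    super(input);
    this.n = n;
  }

  @Override
  public boolean incrementToken() throws IOException {
    while (true) {
      if (padded != null && pos + n <= padded.length) {
        // emit the ngram with its start position appended, e.g. "ams2"
        termAtt.setEmpty().append(new String(padded, pos, n)).append(Integer.toString(pos));
        pos++;
        return true;
      }
      if (!input.incrementToken()) {
        return false;
      }
      StringBuilder sb = new StringBuilder();
      for (int i = 0; i < n - 1; i++) sb.append(' '); // "  amsterdam" for n=3
      sb.append(termAtt).append(' ');                 // one trailing space
      padded = sb.toString().toCharArray();
      pos = 0;
    }
  }

  @Override
  public void reset() throws IOException {
    super.reset();
    padded = null;
  }
}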


Regards,
Emir

On 08.03.2016 08:52, elisabeth benoit wrote:

Hello,

I'm using Solr 4.10.1. I'd like to index words as ngrams of fixed length,
with a position at the end.

For instance, with fixed length 3, Amsterdam would be something like:


a0 (two spaces added at beginning)
am1
ams2
mst3
ste4
ter5
erd6
rda7
dam8
am9 (one more space in the end)

The number at the end being the position.

Does anyone have a clue how to achieve this?

Best regards,
Elisabeth



--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



Re: ngrams with position

2016-03-08 Thread elisabeth benoit
Thanks for your answer Emir,

I'll check that out.

Best regards,
Elisabeth

2016-03-08 10:24 GMT+01:00 Emir Arnautovic :

> Hi Elisabeth,
> I don't think there is such a token filter, so you would have to create your
> own token filter that takes a token and emits ngram tokens of a specific
> length. It should not be too hard to create such a filter - you can take a
> look at how the ngram filter is coded - yours should be simpler than that.
>
> Regards,
> Emir
>
>
> On 08.03.2016 08:52, elisabeth benoit wrote:
>
>> Hello,
>>
>> I'm using Solr 4.10.1. I'd like to index words as ngrams of fixed length,
>> with a position at the end.
>>
>> For instance, with fixed length 3, Amsterdam would be something like:
>>
>>
>> a0 (two spaces added at beginning)
>> am1
>> ams2
>> mst3
>> ste4
>> ter5
>> erd6
>> rda7
>> dam8
>> am9 (one more space in the end)
>>
>> The number at the end being the position.
>>
>> Does anyone have a clue how to achieve this?
>>
>> Best regards,
>> Elisabeth
>>
>>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>


solr simple query search

2016-03-08 Thread Mugeesh Husain
Hello,

I am implementing a simple search demo.
I have a field abc, and I insert "iphone" into abc.

If I search "iphone" the result is returned, but if I search "i phone",
no result is returned. What should I implement in the analyzer to handle
this case?

input==abc:iphone
search query= i phone




--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-simple-query-searh-tp4262402.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Stopping Solr JVM on OOM

2016-03-08 Thread Binoy Dalal
Hi Shawn,
I've just finished writing a batch OOM killer script and it seems to work
fine.

I couldn't try it on the actual Solr process since I'm a bit stumped on how
I can make Solr throw an OOM at will.
I did, however, write another program that throws an OOM, upon which this
script is called and the running Solr process is killed.

I would like to know how I should proceed from here with submitting the
code for review etc.

Thanks.

On Tue, 8 Mar 2016, 00:56 Shawn Heisey,  wrote:

> On 2/25/2016 2:06 PM, Fuad Efendi wrote:
> > The best practice: do not ever try to catch Throwable or its descendants
> Error, VirtualMachineError, OutOfMemoryError, and etc.
> >
> > Never ever.
> >
> > Also, do not swallow InterruptedException in a loop.
> >
> > Few simple rules to avoid hanging application. If we follow these, there
> will be no question "what is the best way to stop Solr when it gets in OOM”
> (or just becomes irresponsive because of swallowed exceptions)
>
> As I understand from SOLR-8539, if an OOM is thrown by a Java program
> and there is a properly configured OOM script, regardless of what
> happens with exception rewrapping, the script *should* kick in.  Here's
> an issue where this behavior was verified by a Jetty developer on a
> small-scale test program which catches and swallows the OOM:
>
> https://issues.apache.org/jira/browse/SOLR-8539
>
> Solr 5.x, when started on Linux/UNIX systems with the included shell
> scripts, comes default with an "oom killer" script that is supposed to
> stop Solr when OOM occurs.
>
> Recently it was discovered that the OnOutOfMemoryError option in the
> start script for Linux/UNIX was being incorrectly specified on the
> command line -- it doesn't actually work.  Here's the issue for that
> problem:
>
> https://issues.apache.org/jira/browse/SOLR-8145
>
> The fix for the incorrect OnOutOfMemoryError usage will be in version
> 6.0 when that version is finally released, which I think will make the
> OOM killer actually work on Linux/UNIX.  There is currently no concrete
> information on when 6.0 is expected.  If any plans for future 5.x
> versions come up, that fix will likely make it into those versions as well.
>
> There is no OOM killer script for Windows, so this feature is not
> present when running on Windows.  If somebody can come up with a way for
> Windows to find and kill the Solr process, I'd be happy to include it.
>
> Thanks,
> Shawn
>
> --
Regards,
Binoy Dalal


Re: solr simple query search

2016-03-08 Thread John Blythe
What does your current analyzer look like?
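
If the goal is for the query "i phone" to match the indexed token "iphone",
one common option is a query-side ShingleFilter that glues adjacent query
tokens back together. A sketch (one option among several; the type name and
analyzer choices are made up):

<fieldType name="text_join" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- emit the original tokens plus adjacent pairs joined with no separator -->
    <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="2"
            outputUnigrams="true" tokenSeparator=""/>
  </analyzer>
</fieldType>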

-- 
*John Blythe*
Product Manager & Lead Developer

251.605.3071 | j...@curvolabs.com
www.curvolabs.com

58 Adams Ave
Evansville, IN 47713

On Tue, Mar 8, 2016 at 6:42 AM, Mugeesh Husain  wrote:

> Hello,
>
> I am implementing a simple search demo.
> I have a field abc, and I insert "iphone" into abc.
>
> If I search "iphone" the result is returned, but if I search "i phone",
> no result is returned. What should I implement in the analyzer to handle
> this case?
>
> input==abc:iphone
> search query= i phone
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/solr-simple-query-searh-tp4262402.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Stopping Solr JVM on OOM

2016-03-08 Thread Shawn Heisey
On 3/8/2016 5:13 AM, Binoy Dalal wrote:
> I've just finished writing a batch OOM killer script and it seems to work
> fine.
>
> I couldn't try it on the actual Solr process since I'm a bit stumped on how
> I can make Solr throw an OOM at will.
> I did, however, write another program that throws an OOM, upon which this
> script is called and the running Solr process is killed.
>
> I would like to know how I should proceed from here with submitting the
> code for review etc.

Open an Improvement issue on the SOLR project in Apache's Jira with a
title like "OOM killer for Windows" and a useful description.  Clone the
source code from git, make your changes/additions.  Create a patch using
"git diff" and upload it using SOLR-.patch as the filename -- the
same name as the Jira issue.
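
For example (hypothetical issue number and file name; note that brand-new
files have to be git-added before they show up in the diff):

git add bin/oom_windows.cmd
git diff HEAD > SOLR-9999.patch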

Making Solr OOM on purpose is possible, but it is usually better to
write a small test program with an intentional memory leak.
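
For example, something this small will do it when run with a tiny heap
(say, java -Xmx16m OomTest; the class name is made up):

import java.util.ArrayList;
import java.util.List;

public class OomTest {
  public static void main(String[] args) {
    List<byte[]> hog = new ArrayList<byte[]>();
    while (true) {
      hog.add(new byte[1024 * 1024]); // keep allocating 1 MB blocks until the heap runs out
    }
  }
}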

I wonder if we can write a test for OOM death.

Thanks,
Shawn



Re: High Cpu sys usage

2016-03-08 Thread YouPeng Yang
Hi all
  Thanks for your reply. I have been investigating this for some time, and I
will post some logs of 'top' and IO in a few days when the crash comes again.

2016-03-08 10:45 GMT+08:00 Shawn Heisey :

> On 3/7/2016 2:23 AM, Toke Eskildsen wrote:
> > How does this relate to YouPeng reporting that the CPU usage increases?
> >
> > This is not a snark. YouPeng mentions kernel issues. It might very well
> > be that IO is the real problem, but that it manifests in a non-intuitive
> > way. Before memory-mapping it was easy: Just look at IO-Wait. Now I am
> > not so sure. Can high kernel load (Sy% in *nix top) indicate that the IO
> > system is struggling, even if IO-Wait is low?
>
> It might turn out to be not directly related to memory, you're right
> about that.  A very high query rate or particularly CPU-heavy queries or
> analysis could cause high CPU usage even when memory is plentiful, but
> in that situation I would expect high user percentage, not kernel.  I'm
> not completely sure what might cause high kernel usage if iowait is low,
> but no specific information was given about iowait.  I've seen iowait
> percentages of 10% or less with problems clearly caused by iowait.
>
> With the available information (especially seeing 700GB of index data),
> I believe that the "not enough memory" scenario is more likely than
> anything else.  If the OP replies and says they have plenty of memory,
> then we can move on to the less common (IMHO) reasons for high CPU with
> a large index.
>
> If the OS is one that reports load average, I am curious what the 5
> minute average is, and how many real (non-HT) CPU cores there are.
>
> Thanks,
> Shawn
>
>


Different scores depending on cloud node

2016-03-08 Thread Robert Brown

Hi,

I have 2 shards, each with 1 replica.

When sending the same request to the cluster, I'm seeing the same 
results, but ordered differently, and with different scores.


Does this highlight an issue with my index, or is this an accepted anomaly?

Example of 8 results:

1st call:

160.2047
160.2047
157.86732
157.86732
157.86732
157.86732
152.6514
152.6514

2nd call:

157.86732
157.86732
157.86732
157.86732
157.64246
157.64246
150.39238
150.39238



Thanks,
Rob



Re: Different scores depending on cloud node

2016-03-08 Thread Shawn Heisey
On 3/8/2016 6:56 AM, Robert Brown wrote:
> I have 2 shards, each with 1 replica.
>
> When sending the same request to the cluster, I'm seeing the same
> results, but ordered differently, and with different scores.
>
> Does this highlight an issue with my index, or is this an accepted
> anomaly?

SolrCloud's method of operation can result in a different number of
deleted documents on different replicas.  Deleted documents that still
exist within the index can affect scores.  Because SolrCloud picks an
available replica at random to satisfy queries, different requests will
use different replicas.

Distributed IDF, available starting in version 5.0 and described on the
following documentation page, can help even these differences out, but
will not completely eliminate them:

https://cwiki.apache.org/confluence/display/solr/Distributed+Requests

The only way that I know of to completely wipe out these anomalies is to
optimize your collection.  This will completely rewrite the index,
getting rid of deleted documents as it runs, which tends to be very slow
and can be very disruptive to Solr's performance.  It will also block
deleteByQuery requests until it is finished.
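
If you do decide to optimize, it is just an update request, e.g. (with a
hypothetical collection name):

curl "http://localhost:8983/solr/mycollection/update?optimize=true"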

Thanks,
Shawn



IllegalArgumentException: Seeking to negative position

2016-03-08 Thread Yago Riveiro
I saw this exception in my log. What could have caused this?

java.lang.IllegalArgumentException: Seeking to negative position:
MMapIndexInput(path="/opt/solr/node/collections/2016_shard9_replica2/data/index/_0.fdx")
at
org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl.seek(ByteBufferIndexInput.java:407)
at 
org.apache.lucene.codecs.CodecUtil.retrieveChecksum(CodecUtil.java:400)
at 
org.apache.solr.handler.IndexFetcher.compareFile(IndexFetcher.java:843)
at 
org.apache.solr.handler.IndexFetcher.isIndexStale(IndexFetcher.java:914)
at
org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:376)
at
org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:254)
at
org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:380)
at
org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:162)
at
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:437)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:227)
Caused by: java.lang.IllegalArgumentException
at java.nio.Buffer.position(Buffer.java:244)
at
org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl.seek(ByteBufferIndexInput.java:404)
... 9 more



-
Best regards
--
View this message in context: 
http://lucene.472066.n3.nabble.com/IllegalArgumentException-Seeking-to-negative-position-tp4262463.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Different scores depending on cloud node

2016-03-08 Thread Markus Jelsma
Hi - see inline.
Markus

-Original message-
> From:Shawn Heisey 
> Sent: Tuesday 8th March 2016 15:11
> To: solr-user@lucene.apache.org
> Subject: Re: Different scores depending on cloud node
> 
> On 3/8/2016 6:56 AM, Robert Brown wrote:
> > I have 2 shards, each with 1 replica.
> >
> > When sending the same request to the cluster, I'm seeing the same
> > results, but ordered differently, and with different scores.
> >
> > Does this highlight an issue with my index, or is this an accepted
> > anomaly?
> 
> SolrCloud's method of operation can result in a different number of
> deleted documents on different replicas.  Deleted documents that still
> exist within the index can affect scores.  Because SolrCloud picks an
> available replica at random to satisfy queries, different requests will
> use different replicas.

This is indeed a problem if your similarity relies on maxDoc. DocCount does not 
suffer from this problem. It becomes much more stable, although we still 
sometimes see tiny anomalies.

> 
> Distributed IDF, available starting in version 5.0 and described on the
> following documentation page, can help even these differences out, but
> will not completely eliminate them:
> 
> https://cwiki.apache.org/confluence/display/solr/Distributed+Requests
> 
> The only way that I know of to completely wipe out these anomalies is to
> optimize your collection.  This will completely rewrite the index,
> getting rid of deleted documents as it runs, which tends to be very slow
> and can be very disruptive to Solr's performance.  It will also block
> deleteByQuery requests until it is finished.
> 
> Thanks,
> Shawn
> 
> 


Re: Solrcloud Batch Indexing

2016-03-08 Thread Cassandra Targett
There is an open source Hive -> Solr SerDe available that might be worth
checking out: https://github.com/lucidworks/hive-solr. I'm not sure how it
would work with the source table being rebuilt every day since it uses
Hive's external tables, but it might be something you could extend.
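
For reference, the "index directly to Solr" approach referred to below as <2>
is typically a SolrJ loop along these lines (a sketch only; the ZooKeeper
address, collection name, and makeDoc() are made up):

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BatchIndexer {
  public static void main(String[] args) throws Exception {
    CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181/solr");
    server.setDefaultCollection("mycollection");
    List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
    for (int i = 0; i < 300000000; i++) {  // 300M+ rows from the source
      batch.add(makeDoc(i));
      if (batch.size() == 1000) {          // send documents in batches of 1,000
        server.add(batch);
        batch.clear();
      }
    }
    if (!batch.isEmpty()) {
      server.add(batch);
    }
    server.commit();
    server.shutdown();
  }

  private static SolrInputDocument makeDoc(int i) {
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", Integer.toString(i));
    return doc;
  }
}

Several such clients can run in parallel to keep the Solr CPUs busy.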

On Mon, Mar 7, 2016 at 4:40 PM, Erick Erickson 
wrote:

> Bin:
>
> The MRIT/Morphlines only makes sense if you have lots more
> nodes devoted to the M/R jobs than you do Solr shards since the
> actual work done to index a given doc is exactly the same either
> with MRIT/Morphlines or just sending straight to Solr.
>
> A bit of background here. I mentioned that MRIT/Morphlines uses
> EmbeddedSolrServer. This is exactly Solr as far as the actual indexing
> is concerned. So using --go-live is not buying you anything and, in fact,
> is costing you quite a bit over just using <2> to index directly to Solr
> since
> the index has to be copied around. I confess I'm surprised that --go-live
> is taking that long. Basically it's just copying your index up to Solr so
> perhaps there's an I/O problem or some such.
>
> OK, I'm lying a little bit here, _if_ you have more than one replica per
> shard, then indexing straight to Solr will cost you (anecdotally)
> 10-15% in indexing speed. But if this is a single replica/shard (i.e.
> leader-only), then it's near enough to being the exact same.
>
> Anyway, at the end of the day, the index produced is self-contained.
> You could even just copy it to your shards (with Solr down), and then
> bring up your Solr nodes on a non-HDFS-based Solr.
>
> But frankly I'd avoid that and benchmark on <2> first. My expectation
> is that you'll be fine there and see indexing roughly on par with your
> MRIT/Morphlines.
>
> Now, all that said, indexing 300M docs in 'a few minutes' is a bit
> surprising.
> I'm really wondering if you're not being fooled by something "odd". Have
> you compared the identical runs with and without --go-live?
>
> _Very_ often, the bottleneck isn't Solr at all, it's the data acquisition,
> so be
> careful when measuring that the Solr CPU's are pegged... otherwise
> you're bottlenecking upstream of Solr. A super-simple way to figure that
> out is to comment out the solrServer.add(list, 1) line in <2> or just
> run MRIT/Morphlines without the --go-live switch.
>
> BTW, with <2> you could run with as many jobs as you wanted to run
> the Solr servers flat-out.
>
> FWIW,
> Erick
>
> On Mon, Mar 7, 2016 at 1:14 PM, Bin Wang  wrote:
> > Hi Eric,
> >
> > Thanks for your quick response.
> >
> > From the data's perspective, we have 300+ million rows and believe it or
> > not, the source data is from relational database (Hive) and the database
> is
> > rebuilt every day (I am as frustrated as most of you who read this but it
> > is what it is) and potentially need to store actually all of the fields.
> > In this case, I have to figure out a solution to quickly index 300+
> million
> > rows as fast as I can.
> >
> > I am still at a stage evaluating all the different solutions, and I am
> > sorry that I haven't really benchmarked the second approach yet.
> > I will find a time to run some benchmark and share the result with the
> > community.
> >
> > Regarding the approach that I suggested - mapreduce Lucene indexes, do
> you
> > think it is feasible and does that worth the effort to dive into?
> >
> > Best regards,
> >
> > Bin
> >
> >
> >
> > On Mon, Mar 7, 2016 at 1:57 PM, Erick Erickson 
> > wrote:
> >
> >> I'm wondering if you need map reduce at all ;)...
> >>
> >> The achilles heel with M/R viz: Solr is all the copying around
> >> that's done at the end of the cycle. For really large bulk indexing
> >> jobs, that's a reasonable price to pay..
> >>
> >> How many docs and how would you characterize them as far
> >> as size, fields, etc? And what are your time requirements? What
> >> kind of docs?
> >>
> >> I'm thinking this may be an "XY Problem". You're asking about
> >> a specific solution before explaining the problem.
> >>
> >> Why do you say that Solr is not really optimized for bulk loading?
> >> I took a quick look at <2> and the approach is sound. It batches
> >> up the docs in groups of 1,000 and uses CloudSolrServer as it should.
> >> Have you tried it? At the end of the day, MapReduceIndexerTool does
> >> the same work to index a doc as a regular Solr server would via
> >> EmbeddedSolrServer so if the number of tasks you have running is
> >> roughly equal to the number of shards, it _should_ be roughly
> >> comparable.
> >>
> >> Still, though, I have to repeat my question about how many docs you're
> >> talking here. Using M/R inevitably adds complexity, what are you trying
> >> to gain here that you can't get with several threads in a SolrJ client?
> >>
> >> Best,
> >> Erick
> >>
> >> On Mon, Mar 7, 2016 at 12:28 PM, Bin Wang  wrote:
> >> > Hi there,
> >> >
> >> > I have a fairly big data set that I need to quick index into
> Solrcloud.
> >> >
> >> > I have done some research

RE: Multiple custom Similarity implementations

2016-03-08 Thread Markus Jelsma
Hello, you cannot change similarities per request, and this is likely never 
going to be supported, for good reasons. You need multiple cores, or multiple 
fields with a different similarity defined in the same core.
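
The per-field route looks roughly like this in schema.xml (a sketch; the
global SchemaSimilarityFactory is what enables per-field-type <similarity>
declarations, and the type name here is made up):

<!-- global similarity that delegates to each field type -->
<similarity class="solr.SchemaSimilarityFactory"/>

<fieldType name="text_bm25" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- this type scores with BM25; other types can declare other similarities -->
  <similarity class="solr.BM25SimilarityFactory"/>
</fieldType>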
Markus
 
-Original message-
> From:Parvesh Garg 
> Sent: Tuesday 8th March 2016 5:36
> To: solr-user@lucene.apache.org
> Subject: Multiple custom Similarity implementations
> 
> Hi,
> 
> We have a requirement where we want to run an A/B test over multiple
> Similarity implementations. Is it possible to define multiple similarity
> tags in schema.xml file and chose one using the URL parameter? We are using
> solr 4.7
> 
> Currently, we are planning to have different cores with different
> similarity configured and split traffic based on core names. This is
> leading to index duplication and un-necessary resource usage.
> 
> Any help is highly appreciated.
> 
> Parvesh Garg,
> 
> http://www.zettata.com
> 


Re: Warning and Error messages in Solr's log

2016-03-08 Thread Steven White
Re-posting.  Does anyone have any idea about this question?  Thanks.

Steve

On Mon, Mar 7, 2016 at 5:15 PM, Steven White  wrote:

> Hi folks,
>
> In Solr's solr-8983-console.log I see the following (about 50 in a span of
> 24 hours when index is on going):
>
> WARNING: Couldn't flush user prefs:
> java.util.prefs.BackingStoreException: Couldn't get file lock.
>
> What does it mean?  Should I be worried about it?
>
> What about this one:
>
> 118316292 [qtp114794915-39] ERROR org.apache.solr.core.SolrCore  [
> test_idx] ? java.lang.IllegalStateException: file: 
> MMapDirectory@/b/vin291f1/vol/vin291f1v3/idx/solr_index/test/data/index
> lockFactory=org.apache.lucene.store.NativeFSLockFactory@169f6ad3 appears
> both in delegate and in cache: cache=[_2omj.fnm, _2omg_Lucene50_0.doc,
>  _2omg.nvm],delegate=[write.lock, _1wuk.si,  segments_2b]
> at
> org.apache.lucene.store.NRTCachingDirectory.listAll(NRTCachingDirectory.java:103)
>
> What does it mean?
>
> I _think_ the error log is due to the NAS drive being disconnected before
> shutting down Solr, but I need a Solr expert to confirm.
>
> Unfortunately, I cannot find anything in solr.log files regarding this
> because those files have rotated.
>
> Thanks in advance.
>
> Steve
>


Re: ngrams with position

2016-03-08 Thread Alessandro Benedetti
Elisabeth,
out of curiosity, could we know what you are trying to solve with that
complex way of tokenisation?
Solr is really good at storing positions along with tokens, so I am curious
to know why you are mixing things up.

Cheers

On 8 March 2016 at 10:08, elisabeth benoit 
wrote:

> Thanks for your answer Emir,
>
> I'll check that out.
>
> Best regards,
> Elisabeth
>
> 2016-03-08 10:24 GMT+01:00 Emir Arnautovic :
>
> > Hi Elisabeth,
> > I don't think there is such a token filter, so you would have to create
> > your own token filter that takes a token and emits ngram tokens of a
> > specific length. It should not be too hard to create such a filter - you
> > can take a look at how the ngram filter is coded - yours should be
> > simpler than that.
> >
> > Regards,
> > Emir
> >
> >
> > On 08.03.2016 08:52, elisabeth benoit wrote:
> >
> >> Hello,
> >>
> >> I'm using Solr 4.10.1. I'd like to index words as ngrams of fixed length,
> >> with a position at the end.
> >>
> >> For instance, with fixed length 3, Amsterdam would be something like:
> >>
> >>
> >> a0 (two spaces added at beginning)
> >> am1
> >> ams2
> >> mst3
> >> ste4
> >> ter5
> >> erd6
> >> rda7
> >> dam8
> >> am9 (one more space in the end)
> >>
> >> The number at the end being the position.
> >>
> >> Does anyone have a clue how to achieve this?
> >>
> >> Best regards,
> >> Elisabeth
> >>
> >>
> > --
> > Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> > Solr & Elasticsearch Support * http://sematext.com/
> >
> >
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Indexing Twitter - Hypothetical

2016-03-08 Thread Joseph Obernberger
Thank you for the links and explanation.  We are using GATE (General
Architecture for Text Engineering) and parts of the Stanford NER/Parser for
the data that we ingest, but we do not apply it to the queries - only the
data.  We've been concentrating on the back-end, and analytics, not so much
what comes in for queries; something that we need to address.  For this
hypothetical, I wanted to get ideas on what questions would need to be
asked, and how large the system would need to be.  Thank you all very much
for the information so far!
Jack - I want to be a guru-level Solr expert.  :)

-Joe

On Sun, Mar 6, 2016 at 1:29 PM, Walter Underwood 
wrote:

> This is a very good presentation on using entity extraction in query
> understanding. As you’ll see from the preso, it is not easy.
>
>
> http://www.slideshare.net/dtunkelang/better-search-through-query-understanding
> <
> http://www.slideshare.net/dtunkelang/better-search-through-query-understanding
> >
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
> > On Mar 6, 2016, at 7:27 AM, Jack Krupansky 
> wrote:
> >
> > Back to the original question... there are two answers:
> >
> > 1. Yes - for guru-level Solr experts.
> > 2. No - for anybody else.
> >
> > For starters, (as always), you would need to do a lot more upfront work
> on
> > mapping out the forms of query which will be supported. For example, is
> > your focus on precision or recall. And, are you looking to analyze all
> > matching tweets or just a sample. And, the load, throughput, and latency
> > requirements. And, any spatial search requirements. And, any entity
> search
> > requirements. Without a clear view of the query requirements it simply
> > isn't possible to even begin defining a data model. And without a data
> > model, indexing is a fool's errand. In short, no focus, no progress.
> >
> > -- Jack Krupansky
> >
> > On Sun, Mar 6, 2016 at 7:42 AM, Susheel Kumar 
> wrote:
> >
> >> Entity Recognition means you may want to recognize different entities
> >> name/person, email, location/city/state/country etc. in your
> >> tweets/messages with goal of  providing better relevant results to
> users.
> >> NER can be used at query or indexing (data enrichment) time.
> >>
> >> Thanks,
> >> Susheel
> >>
> >> On Fri, Mar 4, 2016 at 7:55 PM, Joseph Obernberger <
> >> joseph.obernber...@gmail.com> wrote:
> >>
> >>> Thank you all very much for all the responses so far.  I've enjoyed
> >> reading
> >>> them!  We have noticed that storing data inside of Solr results in
> >>> significantly worse performance (particularly faceting); so we store
> the
> >>> values of all the fields elsewhere, but index all the data with Solr
> >>> Cloud.  I think the suggestion about splitting the data up into blocks
> of
> >>> date/time is where we would be headed.  Having two Solr-Cloud clusters
> -
> >>> one to handle ~30 days of data, and one to handle historical.  Another
> >>> option is to use a single Solr Cloud cluster, but use multiple
> >>> cores/collections.  Either way you'd need a job to come through and
> clean
> >>> up old data. The historical cluster would have much worse performance,
> >>> particularly for clustering and faceting the data, but that may be
> >>> acceptable.
> >>> I don't know what you mean by 'entity recognition in the queries' -
> could
> >>> you elaborate?
> >>>
> >>> We would want to index and potentially facet on any of the fields - for
> >>> example entities_media_url, username, even background color, but we do
> >> not
> >>> know a-priori what fields will be important to users.
> >>> As to why we would want to make the data searchable; well - I don't
> make
> >>> the rules!  Tweets is not the only data source, but it's certainly the
> >>> largest that we are currently looking at handling.
> >>>
> >>> I will read up on the Berlin Buzzwords - thank you for the info!
> >>>
> >>> -Joe
> >>>
> >>>
> >>>
> >>> On Fri, Mar 4, 2016 at 9:59 AM, Jack Krupansky <
> jack.krupan...@gmail.com
> >>>
> >>> wrote:
> >>>
>  As always, the initial question is how you intend to query the data -
> >>> query
>  drives data modeling. How real-time do you need queries to be? How
> fast
> >>> do
>  you need archive queries to be? How many fields do you need to query
> >> on?
>  How much entity recognition do you need in queries?
> 
> 
>  -- Jack Krupansky
> 
>  On Fri, Mar 4, 2016 at 4:19 AM, Charlie Hull 
> >> wrote:
> 
> > On 03/03/2016 19:25, Toke Eskildsen wrote:
> >
> >> Joseph Obernberger  wrote:
> >>
> >>> Hi All - would it be reasonable to index the Twitter 'firehose'
> >>> with Solr Cloud - roughly 500-600 million docs per day indexing
> >>> each of the fields (about 180)?
> >>>
> >>
> >> Possible, yes. Reasonable? It is not going to be cheap.
> >>
> >> Twitter index the tweets themselves and have been quite open about
> >> how they do it. I would suggest looking 

Failed to set SSL solr 5.2.1 Windows OS

2016-03-08 Thread Ilan Schwarts
Hi all, I am trying to integrate solr with SSL on Windows 7 OS
I followed the enable ssl guide at
https://cwiki.apache.org/confluence/display/solr/Enabling+SSL

I created the keystore and placed in on etc folder. I un-commented the
lines and set:
SOLR_SSL_KEY_STORE=C:\solr-5.2.1\server\etc\solr-ssl.keystore.jks
SOLR_SSL_KEY_STORE_PASSWORD=password
SOLR_SSL_TRUST_STORE=C:\solr-5.2.1\server\etc\solr-ssl.keystore.jks
SOLR_SSL_TRUST_STORE_PASSWORD=password
SOLR_SSL_NEED_CLIENT_AUTH=false

When I test the keystore using
keytool -list -alias solr-ssl -keystore
C:\solr-5.2.1\server\etc\solr-ssl.keystore.jks -storepass password -keypass
password
it is OK, and it prints that there is 1 entry in the keystore.

When I run it from Solr, it writes:
"Keystore was tampered with, or password was incorrect"
I get this exception after JavaKeyStore.engineLoad(JavaKeyStore.java:780)


If i replace
SOLR_SSL_KEY_STORE=C:\solr-5.2.1\server\etc\solr-ssl.keystore.jks with
SOLR_SSL_KEY_STORE=NOTHING_REALISTIC
it writes the same error, so I suspect I am not passing the path
correctly.

Any suggestions ?

Thanks


-- 


-
Ilan Schwarts


Re: Indexing Twitter - Hypothetical

2016-03-08 Thread Jack Krupansky
You have my permission... and blessing... and... condolences!

BTW, our usual recommendation is to do a subset proof of concept to see how
all the pieces come together and then calculate the scaling from there.
IOW, go ahead and index a day, a week, a month from the firehose and see
how many nodes, RAM, and SSD that takes and scale from there, although
estimating by more than a factor of ten is problematic given nonlinear
effects.


-- Jack Krupansky

On Tue, Mar 8, 2016 at 11:50 AM, Joseph Obernberger <
joseph.obernber...@gmail.com> wrote:

> Thank you for the links and explanation.  We are using GATE (General
> Architecture for Text Engineering) and parts of the Stanford NER/Parser for
> the data that we ingest, but we do not apply it to the queries - only the
> data.  We've been concentrating on the back-end, and analytics, not so much
> what comes in for queries; something that we need to address.  For this
> hypothetical, I wanted to get ideas on what questions would need to be
> asked, and how large the system would need to be.  Thank you all very much
> for the information so far!
> Jack - I want to be a guru-level Solr expert.  :)
>
> -Joe
>
> On Sun, Mar 6, 2016 at 1:29 PM, Walter Underwood 
> wrote:
>
> > This is a very good presentation on using entity extraction in query
> > understanding. As you’ll see from the preso, it is not easy.
> >
> >
> >
> http://www.slideshare.net/dtunkelang/better-search-through-query-understanding
> > <
> >
> http://www.slideshare.net/dtunkelang/better-search-through-query-understanding
> > >
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
> >
> > > On Mar 6, 2016, at 7:27 AM, Jack Krupansky 
> > wrote:
> > >
> > > Back to the original question... there are two answers:
> > >
> > > 1. Yes - for guru-level Solr experts.
> > > 2. No - for anybody else.
> > >
> > > For starters, (as always), you would need to do a lot more upfront work
> > on
> > > mapping out the forms of query which will be supported. For example, is
> > > your focus on precision or recall. And, are you looking to analyze all
> > > matching tweets or just a sample. And, the load, throughput, and
> latency
> > > requirements. And, any spatial search requirements. And, any entity
> > search
> > > requirements. Without a clear view of the query requirements it simply
> > > isn't possible to even begin defining a data model. And without a data
> > > model, indexing is a fool's errand. In short, no focus, no progress.
> > >
> > > -- Jack Krupansky
> > >
> > > On Sun, Mar 6, 2016 at 7:42 AM, Susheel Kumar 
> > wrote:
> > >
> > >> Entity Recognition means you may want to recognize different entities
> > >> name/person, email, location/city/state/country etc. in your
> > >> tweets/messages with goal of  providing better relevant results to
> > users.
> > >> NER can be used at query or indexing (data enrichment) time.
> > >>
> > >> Thanks,
> > >> Susheel
> > >>
> > >> On Fri, Mar 4, 2016 at 7:55 PM, Joseph Obernberger <
> > >> joseph.obernber...@gmail.com> wrote:
> > >>
> > >>> Thank you all very much for all the responses so far.  I've enjoyed
> > >> reading
> > >>> them!  We have noticed that storing data inside of Solr results in
> > >>> significantly worse performance (particularly faceting); so we store
> > the
> > >>> values of all the fields elsewhere, but index all the data with Solr
> > >>> Cloud.  I think the suggestion about splitting the data up into
> blocks
> > of
> > >>> date/time is where we would be headed.  Having two Solr-Cloud
> clusters
> > -
> > >>> one to handle ~30 days of data, and one to handle historical.
> Another
> > >>> option is to use a single Solr Cloud cluster, but use multiple
> > >>> cores/collections.  Either way you'd need a job to come through and
> > clean
> > >>> up old data. The historical cluster would have much worse
> performance,
> > >>> particularly for clustering and faceting the data, but that may be
> > >>> acceptable.
> > >>> I don't know what you mean by 'entity recognition in the queries' -
> > could
> > >>> you elaborate?
> > >>>
> > >>> We would want to index and potentially facet on any of the fields -
> for
> > >>> example entities_media_url, username, even background color, but we
> do
> > >> not
> > >>> know a-priori what fields will be important to users.
> > >>> As to why we would want to make the data searchable; well - I don't
> > make
> > >>> the rules!  Tweets is not the only data source, but it's certainly
> the
> > >>> largest that we are currently looking at handling.
> > >>>
> > >>> I will read up on the Berlin Buzzwords - thank you for the info!
> > >>>
> > >>> -Joe
> > >>>
> > >>>
> > >>>
> > >>> On Fri, Mar 4, 2016 at 9:59 AM, Jack Krupansky <
> > jack.krupan...@gmail.com
> > >>>
> > >>> wrote:
> > >>>
> >  As always, the initial question is how you intend to query the data
> -
> > >>> query
> >  drives data modeling. How real-time do you need queries to be? How
> > 

Can termfreq count stemmed forms of terms?

2016-03-08 Thread Aki Balogh
Hi All,

We're using solr termfreq to count raw term frequencies (i.e. the tf in
tf-idf).

This works fine on a regular text field.

However, we have a field where we've added the snowball stemmer.

Should termfreq also work on a stemmed field?

Right now, we're only getting data back on terms where the stemmed form is
the same as the non-stemmed form.

Thanks,
Aki


Re: Can termfreq count stemmed forms of terms?

2016-03-08 Thread Aki Balogh
Doh! I think I had answered my own question back last year:

http://qnalist.com/questions/6147365/term-frequency-with-stemming




*The only trick is, each term in a phrase has to be stemmed separately
(i.e. "end-user experience" has to be broken down into "end-user" ->
"end-us" and "experience" -> "experi") before being passed, i.e.
termfreq(body, "end-us experi").*
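
In practice that means asking for the stemmed forms directly, e.g. (with
hypothetical core and field names):

http://localhost:8983/solr/collection1/select?q=*:*&fl=id,tf1:termfreq(body,'end-us'),tf2:termfreq(body,'experi')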


Akos (Aki) Balogh
Co-Founder / Chief Product Officer
https://www.MarketMuse.com

On Tue, Mar 8, 2016 at 1:14 PM, Aki Balogh  wrote:

> Hi All,
>
> We're using solr termfreq to count raw term frequencies (i.e. the tf in
> tf-idf).
>
> This works fine on a regular text field.
>
> However, we have a field where we've added the snowball stemmer.
>
> Should termfreq also work on a stemmed field?
>
> Right now, we're only getting data back on terms where the stemmed form is
> the same as the non-stemmed form.
>
> Thanks,
> Aki
>


Re: XJoin, a way to use external data sources with Solr

2016-03-08 Thread Zisis Tachtsidis
Hi Charlie, 

This looks like an interesting feature, but I have a couple of questions
before giving it a try. 

I had similar needs - filtering results based on information outside of the
queried Solr collection - and I went down the post-filtering path.
More specifically I've implemented a *PostFilter* which gets the info
outside of the current Solr collection and based on that it filters the
normal search results in the *collect(int docNumber)* method later on. It's
something similar to the approach described at 
http://qaware.blogspot.com.tr/2014/11/how-to-write-postfilter-for-solr-49.html
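
For context, the rough shape of that kind of post filter (a sketch against
Solr's PostFilter/DelegatingCollector API; isAllowed() stands in for whatever
external lookup decides the match, and a real implementation also needs
equals()/hashCode()):

import java.io.IOException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.solr.search.DelegatingCollector;
import org.apache.solr.search.ExtendedQueryBase;
import org.apache.solr.search.PostFilter;

public class ExternalDataPostFilter extends ExtendedQueryBase implements PostFilter {

  @Override
  public boolean getCache() {
    return false;             // post filters must not be cached
  }

  @Override
  public int getCost() {
    return 100;               // cost >= 100 makes Solr run this after the main query
  }

  @Override
  public DelegatingCollector getFilterCollector(IndexSearcher searcher) {
    return new DelegatingCollector() {
      @Override
      public void collect(int doc) throws IOException {
        if (isAllowed(doc)) {
          super.collect(doc); // keep the document; otherwise it is filtered out
        }
      }
    };
  }

  private boolean isAllowed(int doc) {
    return true;              // placeholder for the external data source check
  }
}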


Did you consider such an approach? Do you think there are some downsides of
the post-filtering approach? And what extra functionality can I get from
XJoin (if they are doing the same thing more or less)?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/XJoin-a-way-to-use-external-data-sources-with-Solr-tp4254055p4262540.html
Sent from the Solr - User mailing list archive at Nabble.com.


Duplicate Document IDs when updateing parent document with child document

2016-03-08 Thread Sebastian Riemer
Hi,

I have created a simple Java application which illustrates this issue.

I am using Solr-Version 5.5.0 and SolrJ.

Here is a link to the github repository: 
https://github.com/sebastianriemer/SolrDuplicateTest

The issue I am facing is also described by another person on stackoverflow: 
http://stackoverflow.com/questions/34253178/solr-doesnt-overwrite-duplicated-uniquekey-entries

I would love it if any of you could run the test at your place and give me 
feedback.

If you have any questions do not hesitate to write me.

Many thanks in advance and best regards,

Sebastian Riemer





Re: Stopping Solr JVM on OOM

2016-03-08 Thread Binoy Dalal
Hi Shawn,
The JIRA issue is SOLR-8803 (https://issues.apache.org/jira/browse/SOLR-8803
).
I've used "git diff" and created a patch but it only has the changes that I
made to the solr.cmd file under bin to add the -XX:OnOutOfMemoryError
option.
There's the entire file of the actual OOM kill script that does not show in
the patch.
Do I upload this file along with the patch or is there something else I've
to do to put in the new file.
Please advise.

Thanks.


On Tue, Mar 8, 2016 at 7:03 PM Shawn Heisey  wrote:

> On 3/8/2016 5:13 AM, Binoy Dalal wrote:
> > I've just finished writing a batch OOM killer script and it seems to work
> > fine.
> >
> > I couldn't try it on the actual Solr process since I'm a bit stumped on
> > how I can make Solr throw an OOM at will.
> > I did, however, write another program that throws an OOM, upon which this
> > script is called and the running Solr process is killed.
> >
> > I would like to know how I should proceed from here with submitting the
> > code for review etc.
>
> Open an Improvement issue on the SOLR project in Apache's Jira with a
> title like "OOM killer for Windows" and a useful description.  Clone the
> source code from git, make your changes/additions.  Create a patch using
> "git diff" and upload it using SOLR-.patch as the filename -- the
> same name as the Jira issue.
>
> Making Solr OOM on purpose is possible, but it is usually better to
> write a small test program with an intentional memory leak.
>
> I wonder if we can write a test for OOM death.
>
> Thanks,
> Shawn
>
> --
Regards,
Binoy Dalal


Re: Stopping Solr JVM on OOM

2016-03-08 Thread Binoy Dalal
I've uploaded both files.
Please review and advise.

On Wed, Mar 9, 2016 at 12:46 AM Binoy Dalal  wrote:

> Hi Shawn,
> The JIRA issue is SOLR-8803 (
> https://issues.apache.org/jira/browse/SOLR-8803).
> I've used "git diff" and created a patch but it only has the changes that
> I made to the solr.cmd file under bin to add the -XX:OnOutOfMemoryError
> option.
> There's the entire file of the actual OOM kill script that does not show
> in the patch.
> Do I upload this file along with the patch or is there something else I've
> to do to put in the new file.
> Please advise.
>
> Thanks.
>
>
> On Tue, Mar 8, 2016 at 7:03 PM Shawn Heisey  wrote:
>
>> On 3/8/2016 5:13 AM, Binoy Dalal wrote:
>> > I've just finished writing a batch OOM killer script and it seems to
>> > work fine.
>> >
>> > I couldn't try it on the actual Solr process since I'm a bit stumped on
>> > how I can make Solr throw an OOM at will.
>> > I did, however, write another program that throws an OOM, upon which this
>> > script is called and the running Solr process is killed.
>> >
>> > I would like to know how I should proceed from here with submitting the
>> > code for review etc.
>>
>> Open an Improvement issue on the SOLR project in Apache's Jira with a
>> title like "OOM killer for Windows" and a useful description.  Clone the
>> source code from git, make your changes/additions.  Create a patch using
>> "git diff" and upload it using SOLR-.patch as the filename -- the
>> same name as the Jira issue.
>>
>> Making Solr OOM on purpose is possible, but it is usually better to
>> write a small test program with an intentional memory leak.
>>
>> I wonder if we can write a test for OOM death.
>>
>> Thanks,
>> Shawn
>>
>> --
> Regards,
> Binoy Dalal
>
-- 
Regards,
Binoy Dalal


Re: Warning and Error messages in Solr's log

2016-03-08 Thread Shawn Heisey
On 3/7/2016 3:15 PM, Steven White wrote:
> In Solr's solr-8983-console.log I see the following (about 50 in a span of
> 24 hours when index is on going):
>
> WARNING: Couldn't flush user prefs:
> java.util.prefs.BackingStoreException: Couldn't get file lock.

This is not directly related to Solr.  It's a problem in something that
looks like it's part of Java.  I have very little idea about what causes
it, but google finds other people with this log message, in connection
with software other than Solr.

Some information I found suggests one thing that *MIGHT* explain both
problems you mentioned:  Trying to share an index directory between
multiple Solr instances or cores.  Don't try to do that.  Lucene and
Solr are designed around exclusive access to the index directory.

> 118316292 [qtp114794915-39] ERROR org.apache.solr.core.SolrCore  [
> test_idx] ? java.lang.IllegalStateException: file:
> MMapDirectory@/b/vin291f1/vol/vin291f1v3/idx/solr_index/test/data/index
> lockFactory=org.apache.lucene.store.NativeFSLockFactory@169f6ad3 appears
> both in delegate and in cache: cache=[_2omj.fnm, _2omg_Lucene50_0.doc,
>  _2omg.nvm],delegate=[write.lock, _1wuk.si,  segments_2b]
> at
> org.apache.lucene.store.NRTCachingDirectory.listAll(NRTCachingDirectory.java:103)
>
> What does it mean?
>
> I _think_ the error log is due to the NAS drive being disconnected before
> shutting down Solr, but I need a Solr expert to confirm.

This is a low-level Lucene error in the caching Directory implementation
that Solr uses by default.  Basically, segment files should either be
sitting in memory (the cache) or on the disk -- never both.  This should
not happen if the index directory is only being used by one Solr
instance and core.

I did find some issues filed against Solr that showed this problem in
relation to frequent checking of Solr's informational APIs:

https://issues.apache.org/jira/browse/SOLR-7785
https://issues.apache.org/jira/browse/SOLR-8630

This error message has also turned up in relation to Solr indexes stored
on HDFS, but I don't think that applies here.

HDFS is the only network filesystem with explicit support in Solr, but
even when using HDFS, one directory should not be shared by more than
one Solr instance or core.  Other network filesystems like NFS and SMB
(which I bring up because you mentioned NAS) are not well-supported by
Lucene/Solr, and definitely should not be used to share index directories.

Thanks,
Shawn



Re: Failed to set SSL solr 5.2.1 Windows OS

2016-03-08 Thread Steve Rowe
Hi Ilan,

Looks like you’re modifying solr.in.sh instead of solr.in.cmd?

FYI running under Cygwin is not supported.

--
Steve
www.lucidworks.com

> On Mar 8, 2016, at 11:51 AM, Ilan Schwarts  wrote:
> 
> Hi all, I am trying to integrate solr with SSL on Windows 7 OS
> I followed the enable ssl guide at
> https://cwiki.apache.org/confluence/display/solr/Enabling+SSL
> 
> I created the keystore and placed in on etc folder. I un-commented the
> lines and set:
> SOLR_SSL_KEY_STORE=C:\solr-5.2.1\server\etc\solr-ssl.keystore.jks
> SOLR_SSL_KEY_STORE_PASSWORD=password
> SOLR_SSL_TRUST_STORE=C:\solr-5.2.1\server\etc\solr-ssl.keystore.jks
> SOLR_SSL_TRUST_STORE_PASSWORD=password
> SOLR_SSL_NEED_CLIENT_AUTH=false
> 
> When I test the keystore using
> keytool -list -alias solr-ssl -keystore
> C:\solr-5.2.1\server\etc\solr-ssl.keystore.jks -storepass password -keypass
> password
> It is OK, and it prints that there is 1 entry in the keystore.
> 
> When I run it from Solr, it writes:
> "Keystore was tampered with, or password was incorrect"
> I get this exception after JavaKeyStore.engineLoad(JavaKeyStore.java:780)
> 
> 
> If i replace
> SOLR_SSL_KEY_STORE=C:\solr-5.2.1\server\etc\solr-ssl.keystore.jks with
> SOLR_SSL_KEY_STORE=NOTHING_REALISTIC
> it writes the same error, so I suspect I am not passing the path
> correctly.
> 
> Any suggestions ?
> 
> Thanks
> 
> 
> -- 
> 
> 
> -
> Ilan Schwarts



Re: Failed to set SSL solr 5.2.1 Windows OS

2016-03-08 Thread Ilan Schwarts
Hi, thanks for the reply.
I am using solr.in.cmd.
I even put some pauses in the cmd with echo to check that the parameters are OK.
This is the original file as found in
https://www.apache.org/dist/lucene/solr/5.2.1/solr-5.2.1.zip


On Tue, Mar 8, 2016 at 10:25 PM, Steve Rowe  wrote:

> Hi Ilan,
>
> Looks like you’re modifying solr.in.sh instead of solr.in.cmd?
>
> FYI running under Cygwin is not supported.
>
> --
> Steve
> www.lucidworks.com
>
> > On Mar 8, 2016, at 11:51 AM, Ilan Schwarts  wrote:
> >
> > Hi all, I am trying to integrate solr with SSL on Windows 7 OS
> > I followed the enable ssl guide at
> > https://cwiki.apache.org/confluence/display/solr/Enabling+SSL
> >
> > I created the keystore and placed in on etc folder. I un-commented the
> > lines and set:
> > SOLR_SSL_KEY_STORE=C:\solr-5.2.1\server\etc\solr-ssl.keystore.jks
> > SOLR_SSL_KEY_STORE_PASSWORD=password
> > SOLR_SSL_TRUST_STORE=C:\solr-5.2.1\server\etc\solr-ssl.keystore.jks
> > SOLR_SSL_TRUST_STORE_PASSWORD=password
> > SOLR_SSL_NEED_CLIENT_AUTH=false
> >
> > When I test the keystore using
> > keytool -list -alias solr-ssl -keystore
> > C:\solr-5.2.1\server\etc\solr-ssl.keystore.jks -storepass password
> > -keypass password
> > It is OK, and it prints that there is 1 entry in the keystore.
> >
> > When I run it from Solr, it writes:
> > "Keystore was tampered with, or password was incorrect"
> > I get this exception after JavaKeyStore.engineLoad(JavaKeyStore.java:780)
> >
> >
> > If i replace
> > SOLR_SSL_KEY_STORE=C:\solr-5.2.1\server\etc\solr-ssl.keystore.jks with
> > SOLR_SSL_KEY_STORE=NOTHING_REALISTIC
> > it writes the same error, so I suspect I am not passing the path
> > correctly.
> >
> > Any suggestions ?
> >
> > Thanks
> >
> >
> > --
> >
> >
> > -
> > Ilan Schwarts
>
>


-- 


-
Ilan Schwarts


Re: Failed to set SSL solr 5.2.1 Windows OS

2016-03-08 Thread Steve Rowe
Hmm, not sure what’s happening.  Have you tried converting the backslashes in 
your paths to forward slashes?
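
For example, the relevant solr.in.cmd lines would look like this (a sketch;
note the cmd-style "set" prefix on each line):

set SOLR_SSL_KEY_STORE=C:/solr-5.2.1/server/etc/solr-ssl.keystore.jks
set SOLR_SSL_KEY_STORE_PASSWORD=password
set SOLR_SSL_TRUST_STORE=C:/solr-5.2.1/server/etc/solr-ssl.keystore.jks
set SOLR_SSL_TRUST_STORE_PASSWORD=password
set SOLR_SSL_NEED_CLIENT_AUTH=false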

--
Steve
www.lucidworks.com

> On Mar 8, 2016, at 3:39 PM, Ilan Schwarts  wrote:
> 
> Hi, thanks for the reply.
> I am using solr.in.cmd.
> I even put some pauses in the cmd with echo to check that the parameters are OK.
> This is the original file as found in 
> https://www.apache.org/dist/lucene/solr/5.2.1/solr-5.2.1.zip
> 
> 
> 
> On Tue, Mar 8, 2016 at 10:25 PM, Steve Rowe  wrote:
> Hi Ilan,
> 
> Looks like you’re modifying solr.in.sh instead of solr.in.cmd?
> 
> FYI running under Cygwin is not supported.
> 
> --
> Steve
> www.lucidworks.com
> 
> > On Mar 8, 2016, at 11:51 AM, Ilan Schwarts  wrote:
> >
> > Hi all, I am trying to integrate solr with SSL on Windows 7 OS
> > I followed the enable ssl guide at
> > https://cwiki.apache.org/confluence/display/solr/Enabling+SSL
> >
> > I created the keystore and placed in on etc folder. I un-commented the
> > lines and set:
> > SOLR_SSL_KEY_STORE=C:\solr-5.2.1\server\etc\solr-ssl.keystore.jks
> > SOLR_SSL_KEY_STORE_PASSWORD=password
> > SOLR_SSL_TRUST_STORE=C:\solr-5.2.1\server\etc\solr-ssl.keystore.jks
> > SOLR_SSL_TRUST_STORE_PASSWORD=password
> > SOLR_SSL_NEED_CLIENT_AUTH=false
> >
> > When I test the keystore using
> > keytool -list -alias solr-ssl -keystore
> > C:\solr-5.2.1\server\etc\solr-ssl.keystore.jks -storepass password -keypass
> > password
> > It is OK, and it prints that there is 1 entry in the keystore.
> >
> > When I run it from Solr, it writes:
> > "Keystore was tampered with, or password was incorrect"
> > I get this exception after JavaKeyStore.engineLoad(JavaKeyStore.java:780)
> >
> >
> > If i replace
> > SOLR_SSL_KEY_STORE=C:\solr-5.2.1\server\etc\solr-ssl.keystore.jks with
> > SOLR_SSL_KEY_STORE=NOTHING_REALISTIC
> > it writes the same error, so I suspect I am not passing the path
> > correctly.
> >
> > Any suggestions ?
> >
> > Thanks
> >
> >
> > --
> >
> >
> > -
> > Ilan Schwarts
> 
> 
> 
> 
> -- 
> 
> 
> -
> Ilan Schwarts



Re: DataImportHandler - Automatic scheduling of delta imports in Solr in windows 7

2016-03-08 Thread B Weber
harshrossi  gmail.com> writes:

> 
> I am using *DeltaImportHandler* for indexing data in Solr. Currently I am
> manually indexing the data into Solr by selecting the commands full-import or
> delta-import from the Solr Admin screen.
> 
> I am using Windows 7 and would like to automate the process by specifying a
> certain time interval for executing the commands through the Windows task
> scheduler or something, e.g. every two minutes it should index data
> into Solr.
> 
> From a few sites I came to know that I need to create a *batch file* with some
> command to run the imports, and the batch file is run using the *Windows
> scheduler*. But there were no examples regarding this.
> 
> I am not sure what to code in the batch file and how to link it with the
> scheduler.
> 
> Can someone provide me the code and the steps to accomplish it?
> 
> Thanks a lot in advance.
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/DataImportHandler-Automatic-scheduling-of-delta-imports-in-Solr-in-windows-7-tp4130565.html
> Sent from the Solr - User mailing list archive at Nabble.com.
> 
> 

I created a WPF Application and Windows Service to deal with this, since 
using batch files and Windows Task Scheduler was not an acceptable 
solution for me.

https://github.com/systemidx/SolrScheduler/
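
For anyone who still wants the batch-file route, a minimal sketch (hypothetical
paths and core name, and assuming curl is available; any HTTP client works):

REM delta-import.bat -- trigger a DIH delta-import on core "mycore"
curl "http://localhost:8983/solr/mycore/dataimport?command=delta-import&clean=false&commit=true"

Registered with the Windows task scheduler to run every two minutes:

schtasks /create /tn "SolrDeltaImport" /tr "C:\scripts\delta-import.bat" /sc minute /mo 2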




Retrieving of Field Type

2016-03-08 Thread Zheng Lin Edwin Yeo
Hi,

Is there any way to retrieve the field type of a field, either by
using SolrJ or via a URL?
I mean the field type assigned in schema.xml, like int, float, tdate.
I would like to see if it is possible to retrieve it without going back to
schema.xml.
I'm using Solr 5.4.0.

Regards,
Edwin


Re: Retrieving of Field Type

2016-03-08 Thread Alexandre Rafalovitch
The Admin UI does and it uses Javascript. So you know it is possible.

Admin UI uses Luke for technical-level info:
http://localhost:8983/solr/techproducts/admin/luke
You can use Schema API for slightly better one:
http://localhost:8983/solr/techproducts/schema
You can also use Schema API to get just one field's info too:
http://localhost:8983/solr/techproducts/schema/fields/text
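
From SolrJ (5.3 and later) the Schema API also has a typed client; a minimal
sketch (the base URL and field name are just examples):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.schema.SchemaRequest;
import org.apache.solr.client.solrj.response.schema.SchemaResponse;

public class FieldTypeLookup {
  public static void main(String[] args) throws Exception {
    SolrClient client = new HttpSolrClient("http://localhost:8983/solr/techproducts");
    SchemaRequest.Field request = new SchemaRequest.Field("price");
    SchemaResponse.FieldResponse response = request.process(client);
    // "type" holds the field type name declared in schema.xml, e.g. "float"
    System.out.println(response.getField().get("type"));
    client.close();
  }
}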

Regards,
   Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 9 March 2016 at 16:19, Zheng Lin Edwin Yeo  wrote:
> Hi,
>
> Is there any way to retrieve the field type of a field, either by
> using SolrJ or via a URL?
> I mean the field type assigned in schema.xml, like int, float, tdate.
> I would like to see if it is possible to retrieve it without going back to
> schema.xml.
> I'm using Solr 5.4.0.
>
> Regards,
> Edwin


Re: Retrieving of Field Type

2016-03-08 Thread Zheng Lin Edwin Yeo
Hi Alex,

Thanks for the information. That was helpful.

Regards,
Edwin

On 9 March 2016 at 13:31, Alexandre Rafalovitch  wrote:

> The Admin UI does and it uses Javascript. So you know it is possible.
>
> Admin UI uses Luke for technical-level info:
> http://localhost:8983/solr/techproducts/admin/luke
> You can use Schema API for slightly better one:
> http://localhost:8983/solr/techproducts/schema
> You can also use Schema API to get just one field's info too:
> http://localhost:8983/solr/techproducts/schema/fields/text
>
> Regards,
>Alex.
> 
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
>
>
> On 9 March 2016 at 16:19, Zheng Lin Edwin Yeo 
> wrote:
> > Hi,
> >
> > Is there any way to retrieve the field type of a field, either by
> > using SolrJ or via a URL?
> > I mean the field type assigned in schema.xml, like int, float, tdate.
> > I would like to see if it is possible to retrieve it without going back to
> > schema.xml.
> > I'm using Solr 5.4.0.
> >
> > Regards,
> > Edwin
>


Disable hyper-threading for better Solr performance?

2016-03-08 Thread Avner Levy
I have a machine with 16 real cores (32 with HT enabled).
I'm running a Solr server on it and trying to reach maximum performance for 
indexing and queries (indexing 20k documents/sec from a number of threads).
I've read in multiple places that in some scenarios / products disabling 
hyper-threading may result in better performance.
I'm looking for inputs / insights about HT on Solr setups.
Thanks in advance,
  Avner


Query behavior.

2016-03-08 Thread Modassar Ather
Hi,

Kindly help me understand the parsing of the following query. I am using
the edismax parser and Solr 5.5.0.
q.op is set to AND and there is no explicit mm value set.

fl:(java OR book) => "boost(+((fl:java fl:book)~2),int(val))"

When the query has explicit OR then why the ~2 is present in the parsed
query?

How can I achieve following?
"boost(+((fl:java fl:book)),int(val))"

The reason is that the ANDed and ORed queries both return the same number of
documents, but the expectation is that the ORed query should return more
documents.

Thanks,
Modassar


Re: Disable hyper-threading for better Solr performance?

2016-03-08 Thread Ilan Schwarts
What is the Solr version and shard config? Standalone? Multiple cores?
Spread over RAID?
On Mar 9, 2016 9:00 AM, "Avner Levy"  wrote:

> I have a machine with 16 real cores (32 with HT enabled).
> I'm running on it a Solr server and trying to reach maximum performance
> for indexing and queries (indexing 20k documents/sec by a number of
> threads).
> I've read on multiple places that in some scenarios / products disabling
> the hyper-threading may result in better performance results.
> I'm looking for inputs / insights about HT on Solr setups.
> Thanks in advance,
>   Avner
>


Re: Failed to set SSL solr 5.2.1 Windows OS

2016-03-08 Thread Ilan Schwarts
How would one try to solve this issue? What would you suggest I do?
Debug that module? I will first try just installing a clean Jetty with SSL.

Another question: are the files jetty.xml/jetty-ssl.xml and the rest of the
files in /etc being used in Solr 5.2.1?
On Mar 9, 2016 12:08 AM, "Steve Rowe"  wrote:

> Hmm, not sure what’s happening.  Have you tried converting the backslashes
> in your paths to forward slashes?
>
> --
> Steve
> www.lucidworks.com
>
> > On Mar 8, 2016, at 3:39 PM, Ilan Schwarts  wrote:
> >
> > Hi, thanks for the reply.
> > I am using solr.in.cmd.
> > I even put some pauses in the cmd with echo to check that the parameters
> > are OK. This is the original file as found in
> https://www.apache.org/dist/lucene/solr/5.2.1/solr-5.2.1.zip
> >
> > 
> >
> > On Tue, Mar 8, 2016 at 10:25 PM, Steve Rowe  wrote:
> > Hi Ilan,
> >
> > Looks like you’re modifying solr.in.sh instead of solr.in.cmd?
> >
> > FYI running under Cygwin is not supported.
> >
> > --
> > Steve
> > www.lucidworks.com
> >
> > > On Mar 8, 2016, at 11:51 AM, Ilan Schwarts  wrote:
> > >
> > > Hi all, I am trying to integrate solr with SSL on Windows 7 OS
> > > I followed the enable ssl guide at
> > > https://cwiki.apache.org/confluence/display/solr/Enabling+SSL
> > >
> > > I created the keystore and placed in on etc folder. I un-commented the
> > > lines and set:
> > > SOLR_SSL_KEY_STORE=C:\solr-5.2.1\server\etc\solr-ssl.keystore.jks
> > > SOLR_SSL_KEY_STORE_PASSWORD=password
> > > SOLR_SSL_TRUST_STORE=C:\solr-5.2.1\server\etc\solr-ssl.keystore.jks
> > > SOLR_SSL_TRUST_STORE_PASSWORD=password
> > > SOLR_SSL_NEED_CLIENT_AUTH=false
> > >
> > > When I test the keystore using
> > > keytool -list -alias solr-ssl -keystore
> > > C:\solr-5.2.1\server\etc\solr-ssl.keystore.jks -storepass password
> > > -keypass password
> > > It is OK, and it prints that there is 1 entry in the keystore.
> > >
> > > When I run it from Solr, it writes:
> > > "Keystore was tampered with, or password was incorrect"
> > > I get this exception after
> JavaKeyStore.engineLoad(JavaKeyStore.java:780)
> > >
> > >
> > > If i replace
> > > SOLR_SSL_KEY_STORE=C:\solr-5.2.1\server\etc\solr-ssl.keystore.jks with
> > > SOLR_SSL_KEY_STORE=NOTHING_REALISTIC
> > > it writes the same error, so I suspect I am not passing the path
> > > correctly.
> > >
> > > Any suggestions ?
> > >
> > > Thanks
> > >
> > >
> > > --
> > >
> > >
> > > -
> > > Ilan Schwarts
> >
> >
> >
> >
> > --
> >
> >
> > -
> > Ilan Schwarts
>
>