On 6/26/2013 11:25 PM, Sandeep Gupta wrote:
> To implement the singleton design pattern for SolrServer object creation,
> I found that there are many ways described in
> http://en.wikipedia.org/wiki/Singleton_pattern
> So which is the best one, out of the 5 examples mentioned at the above URL, for a web
> applicat
Thanks Shawn.
To implement the singleton design pattern for SolrServer object creation,
I found that there are many ways described in
http://en.wikipedia.org/wiki/Singleton_pattern
So which is the best one, out of the 5 examples mentioned at the above URL, for a web
application in general practice?
I am sure lots
If Hibernate Search is like regular Hibernate ORM, I'm not sure I'd
trust it to pick the optimal solution...
Otis
Solr & ElasticSearch Support
http://sematext.com/
On Jun 26, 2013 4:44 PM, "Guido Medina" wrote:
> Never heard of embedded Solr server; isn't it better to just use Lucene alone
From https://wiki.apache.org/solr/SolrReplication I understand that the index
dir and any files under the conf dir can be replicated to slaves. I want to
know if there is any way the files under the data dir containing external
file fields can be replicated. These are not replicated by default.
Curren
Thanks for the feedback Daniel ... For now, I've opted to just kill
the JVM with System.exit(1) in the SolrDispatchFilter code and will
restart it with a Linux supervisor. Not elegant but the alternative of
having a zombie Solr instance walking around my cluster is much worse
;-) Will try to dig in
On Wed, Jun 26, 2013 at 4:43 PM, Guido Medina wrote:
> Never heard of embedded Solr server,
I guess that's the exciting part about Solr. Always more nuances to learn:
https://wiki.apache.org/solr/EmbeddedSolr :-)
Regards,
Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http:
Is it possible to configure Solr to automatically grab documents in a
specified directory, without having to use the post command?
I've not found any way to do this, though admittedly, I'm not terribly
experienced with config files of this type.
Thanks!
-
<| A.Spielman |>
"In theory there
Yonik,
Thanks, your answer works!
On Wed, Jun 26, 2013 at 2:07 PM, Yonik Seeley wrote:
> On Wed, Jun 26, 2013 at 4:02 PM, Arun Rangarajan
> wrote:
> >
> http://docs.lucidworks.com/display/solr/Working+with+External+Files+and+Processes
> > says
> > this about external file fields:
> > "They can
On Wed, Jun 26, 2013 at 4:02 PM, Arun Rangarajan
wrote:
> http://docs.lucidworks.com/display/solr/Working+with+External+Files+and+Processes
> says
> this about external file fields:
> "They can be used only for function queries or display".
> I understand how to use them in function queries, but h
The only way is to use a frange (function range) query:
q={!frange l=0 u=10}my_external_field
This will pull out documents whose external field has a value
between zero and 10.
Upayavira
On Wed, Jun 26, 2013, at 09:02 PM, Arun Rangarajan wrote:
> http://docs.lucidworks.com/display/solr/Wor
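For reference, the same frange query can be issued from SolrJ; a minimal sketch, assuming a SolrJ 4.x client and an illustrative core URL:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class FrangeExample {
        public static void main(String[] args) throws Exception {
            // Illustrative URL; point this at your own core.
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
            // Match docs whose external field value lies between 0 and 10.
            SolrQuery query = new SolrQuery("{!frange l=0 u=10}my_external_field");
            QueryResponse rsp = server.query(query);
            System.out.println("Hits: " + rsp.getResults().getNumFound());
            server.shutdown();
        }
    }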
Oh this is good!
On Wed, Jun 26, 2013 at 12:05 PM, Shawn Heisey wrote:
> On 6/25/2013 6:15 PM, Jack Krupansky wrote:
> > Are you using Tomcat?
> >
> > See:
> > http://wiki.apache.org/solr/SolrTomcat#Enabling_Longer_Query_Requests
> >
> > Enabling Longer Query Requests
> >
> > If you try to subm
On 6/26/2013 1:36 PM, Mike L. wrote:
Here's the scrubbed version of my DIH: http://apaste.info/6uGH
It contains everything I'm more or less doing... pretty straightforward. One thing to note, and I
don't know if this is a bug or not: the batchSize="-1" streaming feature doesn't seem
to wor
Never heard of embedded Solr server; isn't it better to just use Lucene
alone for that purpose, using a helper like Hibernate? Since most
applications that require indexes will have a relational DB behind the
scenes, it would not be a bad idea to use an ORM combined with Lucene
annotations (aka hibe
Ooh, I guess Jetty is trapping that java.lang.OutOfMemoryError and
re-throwing/packaging it as a java.lang.RuntimeException. The -XX option
assumes that the application doesn't handle Errors, so they would
reach the JVM and thus invoke the handler.
Since Jetty has an exception handler that
AFAIK SolrJ is just the network client that connects to a Solr server
using Java. Now, if you just need to index your data on your local HDD,
you might want to step back to Lucene. I'm assuming you are using Java,
so you could also annotate your POJOs with Lucene annotations; google
hibernate-se
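For what it's worth, the hibernate-search style alluded to above looks roughly like this; a sketch only, assuming Hibernate Search 4.x annotations, with made-up entity and field names:

    import javax.persistence.Entity;
    import javax.persistence.GeneratedValue;
    import javax.persistence.Id;
    import org.hibernate.search.annotations.Field;
    import org.hibernate.search.annotations.Indexed;

    // @Indexed asks Hibernate Search to maintain a Lucene index for this entity;
    // @Field marks the properties that become searchable index fields.
    @Entity
    @Indexed
    public class Product {
        @Id
        @GeneratedValue
        private Long id;

        @Field
        private String name;

        @Field
        private String description;
    }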
Thanks, Shawn & Jack. I will go with the wiki and use autoCommit with
openSearcher set to false.
On Wed, Jun 26, 2013 at 10:23 AM, Jack Krupansky wrote:
> You need to do occasional hard commits, otherwise the update log just
> grows and grows and gets replayed on each server start.
>
> -- Jack K
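For reference, a hard autoCommit with openSearcher=false goes in solrconfig.xml; a minimal sketch, with an illustrative interval:

    <updateHandler class="solr.DirectUpdateHandler2">
      <updateLog/>
      <autoCommit>
        <!-- hard commit at most every 60 seconds (illustrative value) -->
        <maxTime>60000</maxTime>
        <!-- flush segments and truncate the update log, but don't open a new searcher -->
        <openSearcher>false</openSearcher>
      </autoCommit>
    </updateHandler>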
http://docs.lucidworks.com/display/solr/Working+with+External+Files+and+Processes
says
this about external file fields:
"They can be used only for function queries or display".
I understand how to use them in function queries, but how do I retrieve the
values for display?
If I want to fetch only t
A little more to this ...
Just on the chance this was a weird Jetty issue or something, I tried with
the latest Jetty 9 and the problem still occurs :-(
This is on Java 7 on debian:
java version "1.7.0_21"
Java(TM) SE Runtime Environment (build 1.7.0_21-b11)
Java HotSpot(TM) 64-Bit Server VM (build 23
Thanks for the response.
Here's the scrubbed version of my DIH: http://apaste.info/6uGH
It contains everything I'm more or less doing... pretty straightforward. One
thing to note, and I don't know if this is a bug or not: the batchSize="-1"
streaming feature doesn't seem to work, at leas
Yes, it is possible by running an embedded Solr inside SolrJ process.
The nice thing is that the index is portable, so you can then access
it from the standalone Solr server later.
I have an example here:
https://github.com/arafalov/solr-indexing-book/tree/master/published/solrj
, which shows Solr
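For anyone following along, indexing through an embedded server looks roughly like this; a sketch assuming Solr 4.3-era APIs and an illustrative solr home path:

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
    import org.apache.solr.common.SolrInputDocument;
    import org.apache.solr.core.CoreContainer;

    public class EmbeddedIndexer {
        public static void main(String[] args) throws Exception {
            // Illustrative solr home containing solr.xml and the core's conf/
            // (solrconfig.xml, schema.xml).
            CoreContainer container = new CoreContainer("/path/to/solr/home");
            container.load();
            SolrServer server = new EmbeddedSolrServer(container, "collection1");

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "1");
            server.add(doc);
            server.commit();

            // Shuts down the embedded core container as well.
            server.shutdown();
        }
    }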
Take a look at LucidWorks Search for automated crawler scheduling:
http://docs.lucidworks.com/display/help/Create+or+Edit+a+Schedule
http://docs.lucidworks.com/display/lweug/Data+Source+Schedules
ManifoldCF also has crawler job scheduling:
http://manifoldcf.apache.org/release/trunk/en_US/end-user
I currently have a SolrJ program which I am using for indexing data in
Solr. I am trying to figure out a way to build the index without depending on
a running instance of Solr. I should be able to supply the solrconfig and
schema.xml to the indexing program, which in turn creates index files that I
can
Recently upgraded to 4.3.1 but this problem has persisted for a while now ...
I'm using the following configuration when starting Jetty:
-XX:OnOutOfMemoryError="/home/solr/oom_killer.sh 83 %p"
If an OOM is triggered during Solr web app initialization (such as by
me lowering -Xmx to a value that
This kind of text processing is called entity extraction. I'm not up to date on
what is available in Solr, but search on that.
wunder
On Jun 26, 2013, at 10:26 AM, Warren H. Prince wrote:
> We receive about 100 documents a day of various sizes. The documents
> could pertain to any of 40
Thanks Erick, that's a very helpful answer.
Regarding the grouping option, does that require all the docs to be put
into a single collection, or could it be done across N collections
(assuming each collection had a common "type" field for grouping on)?
Chris
On Wed, Jun 26, 2013 at 7:01 AM
Thank you Erick!
Will look at all these suggestions.
-Vinay
On Wed, Jun 26, 2013 at 6:37 AM, Erick Erickson wrote:
> Right, unfortunately this is a gremlin lurking in the weeds, see:
> http://wiki.apache.org/solr/DistributedSearch#Distributed_Deadlock
>
> There are a couple of ways to deal wit
You need to do occasional hard commits, otherwise the update log just grows
and grows and gets replayed on each server start.
-- Jack Krupansky
-----Original Message-----
From: Arun Rangarajan
Sent: Wednesday, June 26, 2013 1:18 PM
To: solr-user@lucene.apache.org
Subject: Solr 4.2.1 - master
We receive about 100 documents a day of various sizes. The documents
could pertain to any of 40,000 contacts stored in our database, and could
include more than one. For each file we have, we maintain a list of contacts
that are related to or involved in that file. I know it will nev
On 6/26/2013 11:18 AM, Arun Rangarajan wrote:
> Upgraded from Solr 3.6.1 to 4.2.1. Since we wanted to use atomic updates,
> we enabled updateLog and made the few unstored int and boolean fields as
> "stored". We have a single master and a single slave and all the queries go
> only to the slave. We
Upgraded from Solr 3.6.1 to 4.2.1. Since we wanted to use atomic updates,
we enabled updateLog and made the few unstored int and boolean fields as
"stored". We have a single master and a single slave and all the queries go
only to the slave. We make only max. 50 atomic update requests/hour to the
m
On 6/26/2013 10:58 AM, Mike L. wrote:
>
> Hello,
>
>I'm trying to execute a parallel DIH process and running into heap
> related issues, hoping somebody has experienced this and can recommend some
> options..
>
>Using Solr 3.5 on CentOS.
>Currently have JVM heap 4GB
Hi Mike,
Have you considered trying something like jhat or visualvm to see what's
taking up room on the heap?
http://docs.oracle.com/javase/6/docs/technotes/tools/share/jhat.html
http://visualvm.java.net/
Michael Della Bitta
Applications Developer
o: +1 646 532 3062 | c: +1 917 477 7906
app
Hello,
I'm trying to execute a parallel DIH process and am running into
heap-related issues, hoping somebody has experienced this and can recommend some
options.
Using Solr 3.5 on CentOS.
Currently have JVM heap 4GB min, 8GB max
When executing the entities in a se
Hi,
I have the current workflow, which works fine:
- User enters search text
- Text is sent to Solr as a query. Quite some faceting is also included in the
request.
- Result comes back and extensive facet information is displayed.
Now I want to allow my user to enter a whole reference text as searc
On 6/26/2013 8:51 AM, Furkan KAMACI wrote:
> If I get a document that has a "lang" field that holds "*tr*", I want that:
>
> ...
>
>
Changing the TYPE of a field based on the contents of another field
isn't possible. The language detection that has been mentioned in your
other replies makes it possi
On 6/25/2013 6:15 PM, Jack Krupansky wrote:
> Are you using Tomcat?
>
> See:
> http://wiki.apache.org/solr/SolrTomcat#Enabling_Longer_Query_Requests
>
> Enabling Longer Query Requests
>
> If you try to submit too long a GET query to Solr, then Tomcat will
> reject your HTTP request on the ground
On Wed, Jun 26, 2013 at 11:46 AM, Jack Krupansky
wrote:
> But there are also built-in "language identifier" update processors that can
> simultaneously identify what language is used in the input value for a field
> AND do the redirection to a language-specific field AND store the language
> code.
You can certainly do redirection of input values in an update processor,
even in a JavaScript script.
But there are also built-in "language identifier" update processors that can
simultaneously identify what language is used in the input value for a field
AND do the redirection to a language-
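For reference, one of those built-in processors can be wired into an update chain in solrconfig.xml; a sketch, with illustrative field names:

    <updateRequestProcessorChain name="langid">
      <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
        <!-- detect the language of the "text" field -->
        <str name="langid.fl">text</str>
        <!-- store the detected language code in "lang" -->
        <str name="langid.langField">lang</str>
        <!-- redirect the input into a language-specific field, e.g. text_en / text_tr -->
        <bool name="langid.map">true</bool>
      </processor>
      <processor class="solr.LogUpdateProcessorFactory"/>
      <processor class="solr.RunUpdateProcessorFactory"/>
    </updateRequestProcessorChain>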
On 6/25/2013 11:52 PM, Sandeep Gupta wrote:
> Also, on the application development side,
> as I said, I am going to use the HTTPSolrServer API and I found that we
> shouldn't create this object multiple times
> (as per the wiki document http://wiki.apache.org/solr/Solrj#HttpSolrServer)
> So I am plannin
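A minimal sketch of that plan, assuming SolrJ 4.x and an illustrative core URL — the initialization-on-demand holder is one of the simpler variants from that wiki page, and HttpSolrServer is thread-safe, so one instance can be shared:

    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    public final class SolrServerHolder {
        private SolrServerHolder() {}

        // Initialization-on-demand holder idiom: the JVM guarantees lazy,
        // exactly-once initialization without explicit synchronization.
        private static class Holder {
            static final HttpSolrServer INSTANCE =
                    new HttpSolrServer("http://localhost:8983/solr/collection1"); // illustrative URL
        }

        public static HttpSolrServer getInstance() {
            return Holder.INSTANCE;
        }
    }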
Erick, thanks for the response.
I think the stats component works with strings.
In StatsValuesFactory, I see the following code:
public static StatsValues createStatsValues(SchemaField sf) {
  ...
  else if (StrField.class.isInstance(fieldType)) {
    return new StringStatsValues(sf);
  }
Obviously I messed up the email thread... However, I found a problem
indexing my document via post.sh.
This is basically my schema.xml:
url
and this is the document I tried to upload via post.sh:
http://test.example.org/first.html
1000
1000
1000
5000
See Mark's comments on the Jira when I asked that question.
My take: If 4.4 happens real soon (which some people have proposed), then it
may not make it into 4.4. But if a 4.4 RC doesn't happen for another couple
of weeks (my inclination), then the HDFS support could well make it into
4.4. If
Your other best friend is &debug=query on the URL, you might
be seeing different parsed queries than you expect, although that
doesn't really hold water given you say SolrJ fixes things.
I'd be surprised if posting the xml was the culprit, but you never
know. Did you re-index after schema changes
Pardon my unfamiliarity with the Solr development process.
Now that it's in the trunk, will it appear in the next 4.X release?
--
David
On Wed, Jun 26, 2013 at 9:42 AM, Erick Erickson wrote:
> Well, it's been merged into trunk according to the comments, so
>
> Try it on trunk, help with
Yes! A rather extreme difference and you probably want it in both.
The admin/analysis page is your friend.
Basically, putting stuff in the <analyzer type="index"> section dictates what
goes into the index, and that is _all_ that is searchable. The result
of the full analysis chain is what's in the index and
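As a concrete illustration, here is a sketch of an NGram filter applied only at index time — the field type name and gram sizes are made up:

    <fieldType name="text_ngram" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- grams are written to the index, so partial terms become searchable -->
        <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- query terms are not gram-ized; they match the grams stored at index time -->
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>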
If there is a bug... we should identify it. What's a sample post command
that you issued?
-- Jack Krupansky
-----Original Message-----
From: Flavio Pompermaier
Sent: Wednesday, June 26, 2013 10:53 AM
To: solr-user@lucene.apache.org
Subject: Re: URL search and indexing
I was doing exactly tha
I was doing exactly that and, thanks to the administration page and
explanation/debugging, I checked whether the results were those expected.
Unfortunately, the results were not correct when submitting updates through the
post.sh script (which uses curl in the end).
Probably, if it finds the same tag (same value for the s
I use Solr 4.3.1 as SolrCloud. I know that I can define an analyzer in
schema.xml. Let's assume that I have specialized my analyzer for Turkish.
However, I want to have another analyzer too, e.g. for English. I have these
fields in my schema:
...
...
I have a field type text_tr that is combined fo
From the stats component page:
"The stats component returns simple statistics for indexed numeric
fields within the DocSet"
So string, text, anything non-numeric won't work. You can declare it
multiValued but then
you have to add multiple values for the field when you send the doc to
Solr or imp
Flavio:
You mention that you're new to Solr, so I thought I'd make sure
you know that the admin/analysis page is your friend! I flat
guarantee that as you try to index/search following the suggestions
you'll scratch your head at your results and you'll discover that
the analysis process isn't
bq: Would the above setup qualify as "multiple compatible collections"
No. While there may be enough fields in common to form a single query,
the TF/IDF calculations will not be "compatible" and the scores from the
various collections will NOT be comparable. So simply getting the list of
top N doc
Yes, the LimitTokenCountFilterFactory will do the trick.
I have some examples in the book, showing, for a given input string, what the
output tokens will be.
Otherwise, the Solr Javadoc does give one generic example, but without
showing how it actually works:
http://lucene.apache.org/core/4_
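For reference, a sketch of such a field type — the name and maxTokenCount value are illustrative:

    <fieldType name="text_firstN" class="solr.TextField">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- keep only the first 20 tokens; the rest are discarded post-tokenization -->
        <filter class="solr.LimitTokenCountFilterFactory" maxTokenCount="20"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>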
On the lengthy TODO list is making SolrCloud nodes "rack aware";
that should help with this, but it's not real high in the priority queue
as I recall. The current architecture sends updates and requests
all over the cluster, so there are lots of messages that go
across the presumably expensive pipe
The field I am grouping on is a single-valued string.
It looks like in non-distributed mode if I use group=true, sort,
group.sort, and
group.limit=1, it will..
- group the results
- sort with in each group
- limit down to 1 result per group
- apply the sort between groups using the single result
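In query-parameter form, that combination is roughly (field names illustrative):

    q=*:*&group=true&group.field=my_group_field&group.sort=date+desc&group.limit=1&sort=score+desc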
Well, it's been merged into trunk according to the comments, so
Try it on trunk, help with any bugs, buy Mark beer.
And, most especially, document up what it takes to make it work.
Mark is juggling a zillion things and I'm sure he'd appreciate any
help there.
Erick
On Tue, Jun 25, 2013 at 1
Hi,
We will have two categories of data, where one category will be the list of
primary data (for example products) and the other collection (it could be
spread across shards) holds the transaction data (for example product sales
data).
We have a search scenario where we need to show the products
Right, unfortunately this is a gremlin lurking in the weeds, see:
http://wiki.apache.org/solr/DistributedSearch#Distributed_Deadlock
There are a couple of ways to deal with this:
1> go ahead and up the limit and re-compile, if you look at
SolrCmdDistributor the semaphore is defined there.
2> http
Hello,
What are the criteria for putting an analyzer at query or index time? E.g. I
want to use NGramFilterFactory; is there a difference whether I put it
under <analyzer type="index"> or <analyzer type="query">?
Thanks.
Mugoma
You could use an update processor to turn the text string into multiple
string values. A short snippet of JavaScript in a
StatelessScriptUpdateProcessor could do the trick. The field could then be a
multivalued string field.
-- Jack Krupansky
-----Original Message-----
From: Elran Dvir
Sen
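For reference, a sketch of wiring that up in solrconfig.xml — the script file name is made up, and the script itself would define processAdd(cmd) to split the text into multiple values:

    <updateRequestProcessorChain name="split-strings">
      <processor class="solr.StatelessScriptUpdateProcessorFactory">
        <str name="script">split-field.js</str>
      </processor>
      <processor class="solr.LogUpdateProcessorFactory"/>
      <processor class="solr.RunUpdateProcessorFactory"/>
    </updateRequestProcessorChain>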
On 06/25/2013 01:17 PM, eShard wrote:
let's say I have a div with id="myDiv"
Is there a way to set up the Solr update/extract handler to capture just that
particular div?
I tried to play a little with the tools you suggested. However, I probably
missed something, because the term frequency is not what I expected.
My itemid field is defined (in schema.xml) as:
I was supposing that, indexing via post.sh the xml mentioned in the first
mail, the term frequency of itemid 1
Hi all,
StatsComponent doesn't work if the field's type is TextField.
I get the following message:
"Field type
textstring{class=org.apache.solr.schema.TextField,analyzer=org.apache.solr.analysis.TokenizerChain,args={positionIncrementGap=100,
sortMissingLast=true}} is not currently supported".
My fie
Hi.
I ran into this issue a while ago.
In my case, the div I was trying to extract was the main content of the
page.
If that is your case, boilerpipe may help.
There is a patch at https://issues.apache.org/jira/browse/SOLR-3808 that
worked for me.
Arcadius.
On 25 June 2013 18:17, eShard wrote
I mentioned two features, [explain] and termfreq(field, 'value').
Neither of these requires anything special, as they are using stuff
central to Lucene's scoring mechanisms. I think you can turn off the
storage of term frequencies, obviously that would spoil things, but
that's certainly not on my de
So, in order to achieve that feature, I have to declare my fields (authorid
and itemid) with termVectors="true" termPositions="true"
termOffsets="false"?
Would that be enough?
On Wed, Jun 26, 2013 at 10:42 AM, Upayavira wrote:
> Add fl=[explain],* to your query, and review the output in the new
>
Add fl=[explain],* to your query, and review the output in the new
field. It will tell you how the score was calculated. Look at the TF or
termfreq values, as this is the number of times the term appears.
Also, you could add this to your fl= param: count:termfreq(authorid,
'1000') which would give
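Put together as request parameters, using the example field and value from above, that is roughly:

    q=authorid:1000&fl=*,[explain],count:termfreq(authorid,'1000')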
Hi to everybody,
I have some multiValued (single-token) fields, for example authorid and
itemid, and what I'd like to know is whether there's a way to find out how
many times a match was found in that document for some field, and whether the
score is higher when multiple matches are found. For example, my doc
What type of field are you grouping on? What happens when you distribute
it? I.e. what specifically goes wrong?
Upayavira
On Tue, Jun 25, 2013, at 09:12 PM, Bryan Bende wrote:
> I was reading this documentation on Result Grouping...
> http://docs.lucidworks.com/display/solr/Result+Grouping
>
> w
We have a requirement to grab the first N words in a particular field and
weight them differently for scoring purposes. So I thought to use a
<copyField> and have some extra filter on the destination to truncate it
down (post-tokenization).
Did a quick search and found both a LimitTokenCountAnalyzer
and Lim
Ok thank you all for the great help!
Now I'm ready to start playing with my index!
Best,
Flavio
On Tue, Jun 25, 2013 at 11:40 PM, Jack Krupansky wrote:
> Yeah, URL Classify does only do so much. That's why you need to combine
> multiple methods.
>
> As a fourth method, you could code up a short
Hi all,
I have some memory problems (OOM) with Solr 3.5.0 and I suppose that it has
something to do with the fieldCache. The entry count of the fieldCache
grows and grows; why is it not rebuilt after a commit? I commit every 60
seconds, but the memory consumption of Solr increased within one day
When you say you move to different machines, did you copy the zoo_data from
your old setup, or did you just start up zookeeper and your shards one by
one? Also did you use collection API to create the collection or just
start up your cores and let them attach to ZK. I believe the ZK rules for
ass
Hello again!
The missing pivot facet when sorting by index can also be reproduced in Solr
4.3.1.
Does anyone have an idea how to debug this?
Best regards, Johannes
-- Forwarded message --
From: jotpe
Date: 2013/6/25
Subject: facet.pivot and facet.sort does not work with fq
To: solr