Any updates on this?
Hi,
we are currently facing a new problem while reindexing one of our Solr 4.4 instances:
We are using Solr 4.4, getting data via DIH out of a MySQL server.
The data is constantly growing.
We have reindexed our data a lot of times without any trouble.
The problem can be reproduced.
There is an
Hi all,
this is my first message on this mailing list, so I hope I'm doing everything correctly.
My problem is: I have to create a search engine of dealers that are within a well-defined routing distance from the address entered by the user. I have already used Solr for some previous work, but I never neede
Hello Alex,
I saw your example and took it as a template for my needs.
I tried with the aliasing, but, maybe because I did it wrong, it does not
work...
"error": {
"msg": "undefined field all",
"code": 400
}
Here is a snippet of my solrconfig.xml:
...
Thanks Hoss, with the filter queries it works. I was trying to use a normal
query from Mikhail's blog that looked like this:
q={!parent which=type_s:parent}+search_t:item1 +search_t:item2
-search_t:item3
That query doesn't work for me but the filter query does just what I want.
ps last years stu
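For readers following along, a minimal SolrJ sketch of the filter-query formulation that worked, assuming the type_s/search_t field names from Mikhail's example:

    import org.apache.solr.client.solrj.SolrQuery;

    SolrQuery q = new SolrQuery("{!parent which=type_s:parent}search_t:item1");
    q.addFilterQuery("{!parent which=type_s:parent}search_t:item2");
    q.addFilterQuery("-{!parent which=type_s:parent}search_t:item3");

Each child-level condition gets its own {!parent} wrapper, so the filters intersect at the parent level.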
Sorry, found the problem myself...
I used the /select handler, where edismax was not defined.
The other two, /selectEN and /selectDE, worked.
Adding edismax to /select made it work too.
Ciao
Francesco
-----Original Message-----
From: Croci Francesco Luigi (ID SWS) [mailto:fcr...@id.ethz.ch
Hello,
I have a problem with bad requests during indexing data.
I have four nodes with SolrCloud. The architecture is this:
10.0.0.86      10.0.0.87
  NODE1          NODE2
    |              |
    |              |
    +------+-------+
           |
         NODE3
Furkan,
I haven't worked with the boundary scanner before, but one thing I had to tweak with position increments was the highlighter component itself, because it started to throw exceptions. The solution is described in this thread (a conversation with myself :) ):
http://mail-archives.apache.org/
For the sake of completeness, here is the same query w/o fq:
q=+{!parent which=type_s:parent}search_t:item1 +{!parent
which=type_s:parent}search_t:item2 -{!parent
which=type_s:parent}search_t:item3
Here is more detail about the first-symbol magic:
http://www.mail-archive.com/solr-user@lucene.apache.org
Hello,
In my index I am using the LatLonType, using geodist to calculate the distance, and I am calling it like geodist(lat, lon, location). Can anybody tell me what value geodist will return if I pass geodist(0, 0, location)?
Thanks
Aman Tandon
Yeah, that works also for me. Thanks Mikhail.
On Mon, Apr 7, 2014 at 12:42 PM, Mikhail Khludnev [via Lucene] <
ml-node+s472066n4129604...@n3.nabble.com> wrote:
> for sake of completeness, here is the same query w/o fq
>
> q=+{!parent which=type_s:parent}search_t:item1 +{!parent
> which=type_s:pa
Do you mean to tell me that the people on this list that are indexing 100s of
millions of documents are doing this over http? I have been using custom
Lucene code to index files, as I thought this would be faster for many
documents and I wanted some non-standard OCR and index fields. Is there
Dear list,
We have been generating solr indices with the solr-hadoop contrib module
(SOLR-1301). Our current solr in use is of 4.3.1 version. Is there any tool
that could do the backward conversion, i.e. 4.7->4.3.1? Or is the upgrade
the only way to go?
--
Dmitry
Blog: http://dmitrykan.blogspot.
It works well. Now why does the search only find something when the field name is added to the query with stopwords?
"cug" -> 9 hits
"mit cug" -> 0 hits
"plain_text:mit cug" -> 9 hits
Why is this so? Could it be a problem that stopwords aren't used in the query because not all fields that are
I've defined an elevator like that:
When I send a query it gives an error
of: org.apache.solr.common.SolrException: Boosting query defined twice for
query
When I check the source code it says:
map.containsKey( elev.analyzed )
What I want is that:
when a user e
You say you see the commit happen in the log, is openSearcher
specified? This sounds like you're somehow getting a commit
with openSearcher=false...
Best,
Erick
On Sun, Apr 6, 2014 at 5:37 PM, Jamie Johnson wrote:
> I'm running solr 4.6.0 and am noticing that commitWithin doesn't seem to
> work
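For reference, a minimal SolrJ sketch of commitWithin, assuming an existing HttpSolrServer named server; the 10-second window and field names are placeholders:

    import org.apache.solr.client.solrj.request.UpdateRequest;
    import org.apache.solr.common.SolrInputDocument;

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "doc-1");
    UpdateRequest req = new UpdateRequest();
    req.add(doc);
    req.setCommitWithin(10000); // milliseconds
    req.process(server);

If the resulting commit is logged with openSearcher=false, the new documents are durable but not yet visible, which matches the symptom Erick describes.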
See: https://tika.apache.org/1.4/formats.html
Short answer: "yes".
Longer answer: It would be a lot easier to reply meaningfully if you
told us what you were trying to do.
You might want to review:
http://wiki.apache.org/solr/UsingMailingLists
Best,
Erick
On Sun, Apr 6, 2014 at 11:20 PM, Алекс
Could I define a pattern for hl.bs.chars? I mean *$* marks the start or end of a string in my documents and I want to define it as a regex for hl.bs.chars?
On the other hand, I do not currently use termVectors=on, termPositions=on and termOffsets=on on my fields. Does it cause a performance issue or b
On Mon, 2014-04-07 at 13:52 +0200, Jonathan Varsanik wrote:
> Do you mean to tell me that the people on this list that are indexing
> 100s of millions of documents are doing this over http?
Some of us do. Our net archive indexer runs a lot of Tika processes that send their analysed documents thro
Hi,
This is definitely not possible with Solr. Use GraphHopper.
~ David
On Mon, Apr 7, 2014 at 5:09 AM, Matteo Tarantino wrote:
> Hi all,
> this is my first message on this mailing list, so I hope I'm doing everything correctly.
>
> My problem is: I have to create a search engine of dealers that ar
Hi,
I'm not sure why you are asking or maybe I'm not getting what you *really*
want to know. You'll get the geodesic distance (i.e. the "great circle
distance", the distance on the surface of a sphere) from 0,0 (off the coast
of Africa), to each point indexed in your "location" field.
~ David
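To make the answer concrete, a minimal SolrJ sketch that returns and sorts by the distance from 0,0, assuming a LatLonType field named location:

    import org.apache.solr.client.solrj.SolrQuery;

    SolrQuery q = new SolrQuery("*:*");
    q.set("sfield", "location"); // geodist() reads sfield and pt when called without arguments
    q.set("pt", "0,0");
    q.setFields("*", "dist:geodist()");
    q.addSort("geodist()", SolrQuery.ORDER.asc);

The dist pseudo-field then carries the great-circle distance in kilometers for each hit.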
You can use SolrJ: https://wiki.apache.org/solr/Solrj
Anyway, even using HTTP the performance is good.
André
On 2014-04-07 13:52, Jonathan Varsanik wrote:
Do you mean to tell me that the people on this list that are indexing 100s of
millions of documents are doing this over http? I have been
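For anyone who has only indexed via custom Lucene code, a minimal SolrJ indexing sketch; the URL and field names are placeholders:

    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "doc-1");
    doc.addField("title", "hello world");
    server.add(doc);
    server.commit();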
Thanks Ahmet and Jack for replying.
I found another way to solve the problem, by using a filter query:
fq=RuleA:*+OR+RuleC:*
but due to the development platform, query parsing got stuck somewhere else. Hopefully after the platform fix it will work for me.
I will get back to you if any other issue occurs.
On 4/7/2014 3:00 AM, Ralf Matulat wrote:
we are currently facing a new problem while reindexing one of our SOLR
4.4 instances:
We are using SOLR 4.4 getting data via DIH out of a MySQL Server.
The data is constantly growing.
We have reindexed our data a lot of times without any trouble.
The pr
What does the call look like? Are you opening a new searcher or not? That should be in the log line where the commit is recorded...
FWIW,
Erick
On Sun, Apr 6, 2014 at 5:37 PM, Jamie Johnson wrote:
> I'm running solr 4.6.0 and am noticing that commitWithin doesn't seem to
> work when I am
Hi all,
I know someone has posted a similar question before, but my case is a little different: I don't have the schema setup issue mentioned in those posts but still get duplicate records.
My unique key in the schema is id$.
Searching in the Solr admin UI for id$:1,
I got two documents
{
On 4/7/2014 5:52 AM, Jonathan Varsanik wrote:
Do you mean to tell me that the people on this list that are indexing 100s of
millions of documents are doing this over http? I have been using custom
Lucene code to index files, as I thought this would be faster for many
documents and I wanted so
Hi;
I tried that but it does not work; am I missing anything?
q=portu&hl.regex.pattern=.*\*\|\*.*&hl.fragsize=120&hl.regex.slop=0.2
My aim is to check whether it includes *|* or not (that's why I've put .* at the beginning and end of the regex, to match whatever surrounds it).
How can I fix it?
Thanks;
Furkan KA
That was my first attempt, but it's much trickier than I anticipated.
A filter that calls HttpServletRequest#getParameter() before
SolrDispatchFilter will trigger an exception -- see
getParameterIncompatibilityException [1] -- if the request is a POST. It
seems that Solr depends on the configured
Hmmm, that's odd. I just tried it (admittedly with post.jar rather
than SolrJ) and it works just fine.
What server are you using (e.g. CloudSolrServer)? And can you create a
self-contained program that illustrates the problem?
Best,
Erick
On Mon, Apr 7, 2014 at 8:50 AM, Simon wrote:
> Hi all,
>
One more question: does that regex work on the analyzed field or on raw data?
2014-04-07 19:21 GMT+03:00 Furkan KAMACI :
> Hi;
>
> I try that but it does not work do I miss anything:
>
> q=portu&hl.regex.pattern=.*\*\|\*.*&hl.fragsize=120&hl.regex.slop=0.2
>
> My aim is to check whether it includes *|
Hi, does anybody know where the ranking code is held? Which file in Solr stores it, the schema.xml or the solrconfig.xml file?
Maybe you should try a more recent release of Luke:
https://github.com/DmitryKey/luke/releases
François
On Apr 7, 2014, at 12:27 PM, azhar2007 wrote:
> Hi All,
>
> I have a Solr index which was indexed in Solr 4.7.0.
>
> I've attempted to open the index with Luke 4.0.0 and also other v
Hi All,
I have a Solr index which was indexed in Solr 4.7.0.
I've attempted to open the index with Luke 4.0.0 and also other versions, with no luck.
It gives me an error message.
Is there a way of reading the data?
I would like to convert the file to a readable format where I can see the terms it hol
I have to agree with Shawn. We have a SolrCloud setup with 256 shards,
~400M documents in total, with 4-way replication (so it's quite a big
setup!) I had thought that HTTP would slow things down, so we recently
trialed a JNI approach (clients are C++) so we could call SolrJ and get the
benefits o
So to rephrase:
Solr will barf at unknown parameters, so we cannot currently send them in band.
And the out-of-band approach does not work due to POST body handling complexity.
You are effectively proposing a dynamic parameter set with a common prefix to stop the complaints, plus the code to propagate those params.
I think it was not just rootEntity="true".
We need to add transformer="TemplateTransformer" and make sure that each entity has some kind of unique column across all entities; e.g. in this case doc_id is a made-up column and its values should be unique across all entities. temp
I had to grapple with something like this problem when I wrote Lux's
app-server. I extended SolrDispatchFilter and handle parameter
swizzling to keep everything nicey-nicey for Solr while being able to
play games with parameters of my own. Perhaps this will give you some
ideas:
https://gith
On 4/7/2014 10:29 AM, azhar2007 wrote:
Hi, does anybody know where the ranking code is held? Which file in Solr stores it, the schema.xml or the solrconfig.xml file?
Your question is very generic. It needs to be more specific -- what are
you actually trying to do?
The generic answer is "both
Erick,
It's indeed quite odd. After I triggered re-indexing of all documents (via the normal process of the existing program), the duplication was gone. It cannot be reproduced easily, but it did occur occasionally, and that makes it a frustrating task to troubleshoot.
Thanks,
Simon
Yonik,
Requesting
fl=unique_key:field(unique_key),secondary_key:field(secondary_key),score vs
fl=unique_key,secondary_key,score was a nice performance win, as unique_key
and secondary_key were both already in the fieldCache. We removed our
documentCache, in fact, as it got very little use.
W
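For context, a sketch of the two fl variants being compared, with placeholder field names:

    // baseline: stored-field retrieval
    q.setFields("unique_key", "secondary_key", "score");
    // function-query variant: values are served from the fieldCache instead of stored fields
    q.setFields("unique_key:field(unique_key)", "secondary_key:field(secondary_key)", "score");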
Michael,
Thanks! Unfortunately, as we use POSTs, that approach would trigger the
getParameterIncompatibilityException call due to the Enumeration of
getParameterNames before SolrDispatchFilter has a chance to access the
InputStream.
I opened https://issues.apache.org/jira/browse/SOLR-5969 to disc
I wanted to take a moment and say thank you for your help. We haven't
solved the problem yet but it seems like we may be on the path.
Responses to your questions below:
1) We are using settings of 6GB for -Xmx and -Xms on a production server
where this process is failing on about 30 million rel
Oh my yes! I feel a great sense of relief every time an intermittent
problem becomes reproducible... The problem is not solved, but at
least I have a good feeling that once I don't see it any more it's
_really_ gone!
One possibility is index merging, see:
https://wiki.apache.org/solr/MergingSolrIn
Tom,
You should be using JapaneseAnalyzer (kuromoji).
Neither CJK nor ICU tokenize at word boundaries.
On 04/02/2014 10:33 AM, Tom Burton-West wrote:
Hi Shawn,
I'm not sure I understand the problem and why you need to solve it at the
ICUTokenizer level rather than at the CJKBigramFilter level.
Can you pe
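For anyone comparing tokenizers, a minimal Lucene 4.x sketch of kuromoji's word-boundary tokenization; the sample text is arbitrary:

    import java.io.StringReader;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.ja.JapaneseAnalyzer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    import org.apache.lucene.util.Version;

    JapaneseAnalyzer analyzer = new JapaneseAnalyzer(Version.LUCENE_44);
    TokenStream ts = analyzer.tokenStream("text", new StringReader("日本語の形態素解析"));
    CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
    ts.reset();
    while (ts.incrementToken()) {
      System.out.println(term.toString()); // one token per detected word boundary
    }
    ts.end();
    ts.close();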
On 4/7/2014 2:07 PM, T. Kuro Kurosaka wrote:
Tom,
You should be using JapaneseAnalyzer (kuromoji).
Neither CJK nor ICU tokenize at word boundaries.
Is JapaneseAnalyzer configurable with regard to what it does with non-Japanese text? If it's not, it won't work for me.
We use a combination of
The speed of ingest via HTTP improves greatly once you do two things:
1. Batch multiple documents into a single request.
2. Index with multiple threads at once.
Michael Della Bitta
Applications Developer
o: +1 646 532 3062
appinions inc.
"The Science of Influence Marketing"
18 East 41st Stre
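A minimal SolrJ sketch combining both points, using ConcurrentUpdateSolrServer for client-side batching and threading; the URL, queue size, thread count, and batch size are placeholders:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    ConcurrentUpdateSolrServer server =
        new ConcurrentUpdateSolrServer("http://localhost:8983/solr/collection1", 10000, 4);
    List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
    for (int i = 0; i < 100000; i++) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "doc-" + i);
      batch.add(doc);
      if (batch.size() == 500) { // send documents in batches, not one per request
        server.add(batch);
        batch.clear();
      }
    }
    if (!batch.isEmpty()) server.add(batch);
    server.blockUntilFinished(); // drain the internal queue
    server.commit();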
Hi,
I had similar problems before. We were trying to do the same thing as you, fetching too many small documents from Oracle with DIH. We were getting
Caused by: java.sql.SQLException: ORA-01652: unable to extend temp segment by
128 in tablespace TS_TEMP ORA-06512: at "IZCI.GET_FEED_KEYWORDS", lin
Yes, I did restart Solr, but did not re-index. Is that necessary? We've got 80G of indexed data; is there a "preferred" way of doing it without impacting performance?
On Sat, Apr 5, 2014 at 9:44 AM, Ahmet Arslan wrote:
> Hi,
>
> Did restart solr and you re-index after schema change?
>On Sa
Yes, I see. SolrDispatchFilter is not really written with extensibility in mind.
-Mike
On 4/7/14 3:50 PM, Gregg Donovan wrote:
Michael,
Thanks! Unfortunately, as we use POSTs, that approach would trigger the
getParameterIncompatibilityException call due to the Enumeration of
getParameterN
I have had this exact same use case, and we ended up just setting a header value; then, in a servlet filter, we read the header value and set the MDC property within the filter. By reading the header value, it didn't complain about reading the request before making it to the SolrDispatchFilter. We u
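A minimal sketch of that approach; the header name, MDC key, and filter class name are assumptions, not anything Solr-specific:

    import java.io.IOException;
    import javax.servlet.*;
    import javax.servlet.http.HttpServletRequest;
    import org.slf4j.MDC;

    public class RequestIdFilter implements Filter {
      public void init(FilterConfig config) {}
      public void destroy() {}

      public void doFilter(ServletRequest req, ServletResponse resp, FilterChain chain)
          throws IOException, ServletException {
        // reading a header never touches the POST body, so SolrDispatchFilter stays happy
        String requestId = ((HttpServletRequest) req).getHeader("X-Request-Id");
        if (requestId != null) {
          MDC.put("requestId", requestId);
        }
        try {
          chain.doFilter(req, resp);
        } finally {
          MDC.remove("requestId");
        }
      }
    }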
This. And so much this. As much this as you can muster.
On Apr 7, 2014, at 1:49 PM, Michael Della Bitta
wrote:
> The speed of ingest via HTTP improves greatly once you do two things:
>
> 1. Batch multiple documents into a single request.
> 2. Index with multiple threads at once.
>
> Michael
Below is the log showing what I believe to be the commit
07-Apr-2014 23:40:55.846 INFO [catalina-exec-5]
org.apache.solr.update.processor.LogUpdateProcessor.finish [forums]
webapp=/solr path=/update/extract
params={uprefix=attr_&literal.source_id=e4bb4bb6-96ab-4f8f-8a2a-1cf37dc1bcce&literal.conten
The regex pattern should match the text of the fragment. IOW, exclude
whatever delimiters are not allowed in the fragment.
The default is:
[-\w ,\n"']{20,200}
-- Jack Krupansky
-----Original Message-----
From: Furkan KAMACI
Sent: Monday, April 7, 2014 10:21 AM
To: solr-user@lucene.apache.or
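Putting Jack's answer into request form, a minimal SolrJ sketch of the regex fragmenter with the default pattern he quotes; the query string is from Furkan's earlier message:

    import org.apache.solr.client.solrj.SolrQuery;

    SolrQuery q = new SolrQuery("portu");
    q.setHighlight(true);
    q.set("hl.fragmenter", "regex");
    q.set("hl.regex.pattern", "[-\\w ,\\n\"']{20,200}"); // default pattern; backslashes doubled for Java
    q.set("hl.fragsize", "120");
    q.set("hl.regex.slop", "0.2");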
Thanks, François.
azhar2007: remember to set the perm gen size:
java -XX:MaxPermSize=512m -jar luke-with-deps.jar
Dmitry
On Mon, Apr 7, 2014 at 7:29 PM, François Schiettecatte <
fschietteca...@gmail.com> wrote:
> Maybe you should try a more recent release of Luke:
>
> https://github.c
Hi,
Do all of your nodes have the same configuration?
2014-04-07 12:45 GMT+03:00 Gastone Penzo :
> Hello,
> I have a problem with bad requests during indexing data.
> I have four nodes with SolrCloud. The architecture is this:
>
> 10.0.0.86 10.0.0.87
> NODE1 NODE 2
> |