I don't know whether this was discussed previously, but if the SynonymFilter
breaks your synonyms apart (which might be the default), the parts of the
synonyms get new word positions. You could use a KeywordTokenizer to avoid that behaviour:
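For example, a sketch of such a field type (assuming a synonyms.txt in your
conf directory; the tokenizerFactory attribute controls how the entries of the
synonyms file are tokenized):

  <fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <!-- KeywordTokenizerFactory keeps multi-word synonym entries whole
           instead of splitting them into parts with new word positions -->
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
              ignoreCase="true" expand="true"
              tokenizerFactory="solr.KeywordTokenizerFactory"/>
    </analyzer>
  </fieldType>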
with regards,
kon
There is no way to do it within DataImportHandler, but you can configure Solr
in solrconfig.xml to automatically commit pending updates by
time or number of documents.
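For example, something like this in solrconfig.xml (a sketch; tune the
thresholds to your load):

  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <!-- commit after this many pending documents... -->
      <maxDocs>10000</maxDocs>
      <!-- ...or after this many milliseconds, whichever comes first -->
      <maxTime>60000</maxTime>
    </autoCommit>
  </updateHandler>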
On Tue, Aug 14, 2012 at 4:11 PM, ravicv wrote:
> Hi,
>
> Is there any way for intermediate commits while indexing data using the
> dataimport handler?
> When I send a scanned pdf to extraction request
> handler, below icon appears in my Dock.
>
> http://tinypic.com/r/2mpmo7o/6
> http://tinypic.com/r/28ukxhj/6
I found that text-extractable pdf files trigger the above weird icon too.
curl
"http://localhost:8983/solr/update/extract?literal.id=solr-
Ahmet,
The dock icon appears when AWT starts, e.g. when a font is loaded.
You can prevent it using headless mode, but this is likely to trigger an
exception.
The same applies if your user is not UI-logged-in.
Hope it helps.
Paul
On 15 August 2012 at 01:30, Ahmet Arslan wrote:
> Hi All,
>
> I have set
> the dock icon appears when AWT starts, e.g. when a font is
> loaded.
> You can prevent it using the headless mode but this is
> likely to trigger an exception.
> Same if your user is not UI-logged-in.
Hi Paul, thanks for the explanation. So is it nothing to worry about?
Hi Erick,
You are so right on the memory calculations. I am happy to now know that I
was doing something wrong. Yes, I am getting confused with SQL.
I will back up and let you know the use case. I am tracking file versions, and
I want to give an option to browse your system for the latest file
On 15 August 2012 at 13:03, Ahmet Arslan wrote:
> Hi Paul, thanks for the explanation. So is it nothing to worry about?
It is nothing to worry about, except to remember that you can't run this step in
a daemon-like process.
(On Linux, I had to set up a VNC server for similar tasks.)
paul
> Because I have posted this on Stack Overflow, I don't want there to be
> duplicate questions. Can you please read this post:
>
> http://stackoverflow.com/questions/11956608/sphinx-user-is-switching-to-solr
Your questions require Sphinx knowledge. I suggest you read these book(s):
http://lucene
Hi iorixxx, thanks for the reply.
Well, you don't need Sphinx knowledge to answer my questions.
I will write down what I want:
1. I need to have 2 separate indexes. On Stack Overflow I got the answer that I
need to start 2 cores, for example. How many cores can I run for Solr? I have
for example over 1
Hi solr-users
I have a case where I need to build an index from a database.
***Data structure***
The data is spread across multiple tables and in each table the
records are versioned - this means that one "real" record can exist
multiple times in a table, each with different validFrom/validUntil
> 1. I need to have 2 separate indexes. On Stack Overflow I
> got the answer that I
> need to start 2 cores, for example. How many cores can I run
> for Solr?
Please see : http://search-lucene.com/m/6rYti2ehFZ82
> I have for example jobs from country A, jobs from country B
> and so on until
> 100 c
Hi, Lance,
Thanks for your reply!
It seems as if RAMDirectoryFactory is being passed the correct path to
the index, as it's being logged correctly. It just doesn't recognize
it as an index.
Michael Della Bitta
Appinions | 18 East 41st St., Suite
The date checking can be implemented using a range query as a filter query,
such as
&fq=startDate:[* TO NOW] AND endDate:[NOW TO *]
(You can also use a "frange" query.)
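A full request might look like this (core and field names are hypothetical;
startDate/endDate are assumed to be date-typed fields):

  curl "http://localhost:8983/solr/select?q=*:*&fq=startDate:[*+TO+NOW]+AND+endDate:[NOW+TO+*]"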
Then you will have to flatten the database tables. Your Solr schema would
have a single "merged" record type. You will have t
You can try passing -Djava.awt.headless=true as one of the arguments
when you start Jetty to see if you can get this to go away with no ill
effects.
Michael Della Bitta
Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
www.appinions.com
> You can try passing
> -Djava.awt.headless=true as one of the arguments
> when you start Jetty to see if you can get this to go away
> with no ill
> effects.
I started Jetty using 'java -Djava.awt.headless=true -jar start.jar' and
successfully indexed two pdf files. That icon didn't appear.
On Aug 14, 2012, at 4:34 PM, Michael Della Bitta
wrote:
> Hi everyone,
>
> It looks like I found a bug with RAMDirectoryFactory (I know, I know...)
>
Fair warning - RAMDir use in Solr is like a third-class citizen. You probably
should be using the mmap dir anyway.
See http://blog.thetaphi.d
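For what it's worth, the switch is a one-line change in solrconfig.xml (a
sketch; assumes your Solr version ships solr.MMapDirectoryFactory, otherwise
solr.StandardDirectoryFactory lets Lucene pick a sensible default on 64-bit JVMs):

  <!-- replaces e.g. solr.RAMDirectoryFactory -->
  <directoryFactory name="DirectoryFactory" class="solr.MMapDirectoryFactory"/>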
Yes, moving to mmap was on our roadmap. I'm in the middle of moving
our infrastructure from 1.4 to 3.6.1, and didn't want to make too many
changes at the same time. However, this bug might push us over the
edge to mmap and away from ram.
I'll file a bug regardless.
Thanks!
Michael Della Bitta
-
You would index rectangles of zero height whose left edge 'x' is the
start time and right edge 'x' is the end time. You can index a variable
number of these per Solr document and then query by either a point or
another rectangle to find documents which intersect your query shape. It
can
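To make this concrete, a hypothetical sketch (syntax borrowed from the newer
Solr 4 spatial field type; the details in your LSP iteration may differ),
mapping times onto the x axis with y fixed at 0:

  indexed value, a rectangle written as "minX minY maxX maxY":
    timeRange: 200 0 350 0
  query for everything alive at time 300, as a degenerate point rectangle:
    fq=timeRange:"Intersects(300 0 300 0)"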
On Tue, Aug 14, 2012 at 5:37 PM, Jonatan Fournier
wrote:
> On Tue, Aug 14, 2012 at 10:25 AM, Erick Erickson
> wrote:
>> This is quite odd, it really sounds like you're not
>> actually committing. So, some questions.
>>
>> 1> What happens if you search before you shut
>> down your tomcat? Do you s
These do require some Sphinx knowledge. I could answer them on StackOverflow
because I converted Chegg from Sphinx to Solr this year.
As I said there, read about Solr cores. They are independent search
configurations and indexes within one Solr server:
http://wiki.apache.org/solr/CoreAdmin
Fo
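As a sketch, a solr.xml defining one core per index might look like this
(names hypothetical):

  <solr persistent="true">
    <cores adminPath="/admin/cores">
      <core name="jobs_country_a" instanceDir="jobs_country_a"/>
      <core name="jobs_country_b" instanceDir="jobs_country_b"/>
      <!-- ...one <core> entry per independent index -->
    </cores>
  </solr>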
No problem, and thanks for posting the resolution.
If you have the time and energy, anyone can edit the Wiki if you
create a logon, so any clarification you'd like to provide to keep
others from having this problem would be most welcome!
Best
Erick
On Tue, Aug 14, 2012 at 6:13 PM, Buttler, Da
The problem you're running into is that lexical ordering of
numeric data != numeric ordering. If you have mixed
alpha and numeric data, you may not care if the alpha
stuff is first, i.e.
asdb456
asdf490
sorts fine. Problems happen with
9jsdf
100ukel
the 100ukel comes first.
So if you have a m
Please attach the results of adding &debugQuery=on
to your query in both the success and failure case; there's
very little information to go on here. You might review:
http://wiki.apache.org/solr/UsingMailingLists
Best
Erick
On Wed, Aug 15, 2012 at 12:57 AM, chethan wrote:
> Hi,
>
> I'm trying
No, sharding into multiple cores on the same machine is still
limited by the physical memory available. It's still lots
of stuff on a limited box.
But try backing up and re-thinking the problem a bit.
Some possibilities off the top of my head:
1> have a new field "current". When you update a d
Hey solr-user, are you by chance indexing LineStrings? That is something I
never tried with this spatial index. Depending on which iteration of LSP
you are using, I figure you'd either end up indexing a vast number of points
along the line which would be slow to index and make the index quite big
I am pulling some fields from a MySQL database using DataImportHandler and
some of them have invalid XML in them. Does DataImportHandler do any kind
of filtering/sanitizing to ensure that the data will go in OK, or is it all on me?
Example bad data: orphaned ampersands ("Peanut Butter & Jelly"), curly
Hi, Jon,
As far as I know, DataImportHandler doesn't transfer data to the rest
of Solr via XML so it shouldn't be a problem...
Michael Della Bitta
Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
www.appinions.com
Where Influence Isn’
Hi,
I have to index a tuple like ('blah', 'more blah info') in a multivalued
field type.
I have read about the PolyField type and it seems the best solution so far,
but I can't find documentation on how to use or implement a custom
field.
Any help is appreciated.
--
Leonardo S Souza
Hello,
I created an index => all the schema.xml & solrconfig.xml files are
created with content (I checked that they have content in the xml files).
But if I power off the system & restart again, the contents of the files
are gone. It's like they are 0-byte files.
Even the solr.xml file which got up
Just guessing: disk full?
--
Regards,
Leonardo S Souza
2012/8/15 vempap
> Hello,
>
> I created an index => all the schema.xml & solrconfig.xml files are
> created with content (I checked that they have contents in the xml files).
> But, if I poweroff the system & restart again - the conte
Nope, there is a good amount of space left on the disk.
It's happening when I'm not doing a clean shutdown. Are there any other
scenarios where it might happen?
You are not putting these files in /tmp, are you? That is sometimes wiped by
different OSes on shutdown.
-Original Message-
From: vempap [mailto:phani.vemp...@emc.com]
Sent: Wednesday, August 15, 2012 3:31 PM
To: solr-user@lucene.apache.org
Subject: Re: solr.xml entries got deleted when
No, I'm not keeping them in /tmp
: 2> Use external file fields (EFF) for the same purpose, that
: won't require you to re-index the doc. The trick
: here is you use the value in the EFF as a multiplier
: for the score (that's what function queries do). So older
: versions of the doc have scores of 0 and just d
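For reference, a minimal EFF sketch (the field name is hypothetical; modeled
on the stock example schema):

  <fieldType name="versionBoost" class="solr.ExternalFileField"
             keyField="id" defVal="0" stored="false" indexed="false"
             valType="pfloat"/>
  <field name="versionBoost" type="versionBoost"/>

The values live outside the index in a file named external_versionBoost in the
index directory, one "id=value" line per document, so flipping a version's
multiplier is a matter of rewriting that file and committing.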
Haven't managed to find a good way to do this yet. Does anyone have any
ideas on how I could implement this feature?
I really need to move docs across from one core to another atomically.
Many thanks,
Nicholas
On Mon, 02 Jul 2012 04:37:12 -0600, Nicholas Ball
wrote:
> That could work, but then ho
On 2012-7-2 at 6:37 PM, "Nicholas Ball" wrote:
>
>
> That could work, but then how do you ensure commit is called on the two
> cores at the exact same time?
That may need something like two-phase commit as in a relational DB. Lucene
has prepareCommit, but to implement 2PC, many things need to be done.
> Also, any w
Do you really need this?
Distributed transactions are a difficult problem. In 2PC, every node can
fail, including the coordinator; something like leader election is needed to make
sure it works. You could try ZooKeeper.
But if the transaction is not critically important, like transferring money in a
bank, you can
http://zookeeper.apache.org/doc/r3.3.6/recipes.html#sc_recipes_twoPhasedCommit
On Thu, Aug 16, 2012 at 7:41 AM, Nicholas Ball
wrote:
>
> Haven't managed to find a good way to do this yet. Does anyone have any
> ideas on how I could implement this feature?
> Really need to move docs across from on
Awesome, thanks a lot, I am already on it with option 1. We need to track deletes
to flip the previous version to current.
Erick Erickson wrote:
No, sharding into multiple cores on the same machine is still
limited by the physical memory available. It's still lots
of stuff on a limited box.
But.
If you want to sanitize them during indexing, the regular expression
tools can do this. You would create a regular expression that matches
bogus elements. There is a regular expression transformer in the DIH,
and a regular expression CharFilter inside the Lucene text analysis
stack.
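For instance, in a DIH config you could escape stray ampersands with something
like this (a sketch; the entity and column names are hypothetical, and the
regex is XML-escaped):

  <entity name="item" transformer="RegexTransformer"
          query="select id, title from items">
    <!-- rewrite '&' that doesn't already start an entity as '&amp;' -->
    <field column="title" regex="&amp;(?!\w+;)" replaceWith="&amp;amp;"/>
  </entity>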
On Wed, Aug 15
Hi all:
I'm using DataImportHandler to load data from MySQL.
It works fine on my development machine and in the online environment.
But I got an exception in the test environment:
> Caused by: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException:
>> Communications link failure
>
>
>> The last packet sent success
I see the problem, but there is no possibility of normalization, as the
upper limit could be anything in different cases (hard to explain).
I think it is better for me to just apply the correct type of sorting to
an array/list with some script. This is just for getting the facet values to
look