Hello Friend,
I am working on delta-import, which I have configured according to the article
http://wiki.apache.org/solr/DataImportHandler#head-9ee74e0ad772fd57f6419033fb0af9828222e041.
Every time I execute a delta-import through DIH it picks up only the
changed data, which is fine, but rather th
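For reference, a minimal delta-import setup in DIH's data-config.xml looks roughly like the sketch below; the table name "item" and the columns "id", "name" and "last_modified" are hypothetical placeholders, not from the original message.

```xml
<!-- data-config.xml sketch: deltaQuery finds changed primary keys,
     deltaImportQuery re-fetches each changed row by key -->
<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
  <document>
    <entity name="item" pk="id"
            query="SELECT id, name FROM item"
            deltaQuery="SELECT id FROM item
                        WHERE last_modified &gt; '${dataimporter.last_index_time}'"
            deltaImportQuery="SELECT id, name FROM item
                              WHERE id = '${dataimporter.delta.id}'">
      <field column="id" name="id"/>
      <field column="name" name="name"/>
    </entity>
  </document>
</dataConfig>
```

The delta-import is then triggered with the /dataimport handler's command=delta-import parameter.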
Thanks Mikhail. What I mean is that when I index an instance of my POJO
that has a List-typed property with a @Field annotation, and its element is a
complex type rather than a primitive type,
such as my own Contact class, can Solr index this instance successfully?
If so, how can I retrieve via
Hi, Erick,
I get your point. Thank you so much.
Best Regards,
Bing
--
View this message in context:
http://lucene.472066.n3.nabble.com/TikaLanguageIdentifierUpdateProcessorFactory-since-Solr3-5-0-to-be-used-in-Solr3-3-0-tp3771620p3782938.html
Sent from the Solr - User mailing list archive at Nabble.com.
As I understand it (and I'm just getting into SolrCloud myself), you can
essentially forget about the master/slave stuff. If you're using NRT,
the soft commit will make the docs visible; you don't need to do a hard
commit (unlike the master/slave days). Essentially, the update is sent
to each shard lead
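For reference, the soft/hard commit split described above is typically configured in solrconfig.xml along these lines; the interval values here are arbitrary examples, not recommendations from the original message.

```xml
<!-- solrconfig.xml sketch: soft-commit often for visibility,
     hard-commit occasionally for durability -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoSoftCommit>
    <maxTime>1000</maxTime>      <!-- new docs become searchable within ~1s -->
  </autoSoftCommit>
  <autoCommit>
    <maxTime>60000</maxTime>     <!-- flush the transaction log to disk every minute -->
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
```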
You might be able to do something with the XSL Transformer step in DIH.
It might also be easier to just write a SolrJ program to parse the XML and
construct a SolrInputDocument to send to Solr. It's really pretty
straightforward.
Best
Erick
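The SolrJ approach above amounts to: parse the source XML, turn each record into a Solr document, and send it to Solr. A minimal sketch of the same idea, shown here via Solr's XML update format rather than the SolrJ API (the <record> input structure is a made-up example):

```python
# Sketch: parse a source XML string and build a Solr XML "add" payload.
# The input layout (<records><record>...</record></records>) is hypothetical;
# adapt the traversal to your actual XML.
import xml.etree.ElementTree as ET

def records_to_solr_add(source_xml):
    """Convert each <record> child element into a <doc> of <field> entries."""
    root = ET.fromstring(source_xml)
    add = ET.Element("add")
    for rec in root.findall("record"):
        doc = ET.SubElement(add, "doc")
        for child in rec:
            field = ET.SubElement(doc, "field", name=child.tag)
            field.text = child.text or ""
    return ET.tostring(add, encoding="unicode")

payload = records_to_solr_add(
    "<records><record><id>1</id><title>hello</title></record></records>")
print(payload)
# The payload would then be POSTed to your update handler, e.g.
# http://localhost:8983/solr/update (URL is an assumption about your setup).
```

With SolrJ the equivalent step is building a SolrInputDocument per record and calling add() on the server object.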
On Sun, Feb 26, 2012 at 11:31 PM, Anupam Bhattacharya
You *probably* can update the Tika libraries in Solr, but it'll be "interesting"
to get all the right ones updated; there are a bunch of them in Tika. And I
make no guarantees.
If it proves difficult, it's not too hard to write a SolrJ program that does
the Tika extraction and run it on a client to
It runs any place that has access to the raw files and an HTTP connection
to the Solr server, which is another way of saying "sounds good to me".
Erick
On Mon, Feb 27, 2012 at 9:18 PM, bing wrote:
> Hi, Erick,
>
> I can write SolrJ client to call Tika, but I am not certain where to invoke
> the
I'll have to check on the commit situation. We have been pushing data from
SharePoint the last week or so. Would that somehow block the documents
moving between the solr instances?
I'll try another version tomorrow. Thanks for the suggestions.
On Mon, Feb 27, 2012 at 5:34 PM, Mark Miller wrote:
Hi, Erick,
I can write a SolrJ client to call Tika, but I am not certain where to invoke
the client. In my case, I work on Dspace to call Solr, and I suppose the
client should be invoked in between Dspace and Solr. That is, Dspace invokes
the SolrJ client when doing index/query, which calls Tika and So
Hi,
We are already using embedded Solr in our application. In production we have
3 app servers, and each app server has a copy of the index of each type. These
indexes are built externally once a week and replaced.
We now want to allow incremental indexing and automatic updates to the other
servers rather than bui
Exactly, I'm using a tint field type and it works really well. The only problem
is when I have a set of very wide ranges, which makes Solr throw fireworks out
of the blue.
Thank you a lot Michael, I appreciate your help on this one :)
I don't know if this would help with OOM conditions, but are you using a
tint type field for this? That should be more efficient to search than
a regular int or string.
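For reference, a trie-based int type is typically declared in schema.xml roughly as below; the "days" field name is a made-up example for the apartment use case, not from the original message.

```xml
<!-- schema.xml sketch: a nonzero precisionStep indexes extra precision
     buckets, which makes numeric range queries much faster -->
<fieldType name="tint" class="solr.TrieIntField"
           precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
<field name="days" type="tint" indexed="true" stored="false"
       multiValued="true"/>
```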
-Mike
On 02/27/2012 05:27 PM, federico.wachs wrote:
Yeah that's what I'm doing right now.
But whenever I try to index an ap
Hmmm... all of that looks pretty normal...
Did a commit somehow fail on the other machine? When you view the stats for the
update handler, are there a lot of pending adds for one of the nodes? Do the
commit counts match across nodes?
You can also query an individual node with distrib=false to che
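A non-distributed per-node sanity check looks roughly like this (the host, port and core name are assumptions about your setup); comparing numFound across nodes shows whether one node is behind:

```
http://host1:8983/solr/collection1/select?q=*:*&rows=0&distrib=false
```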
Yeah, that's what I'm doing right now.
But whenever I try to index an apartment that has many wide ranges, my
master Solr server throws OutOfMemoryError (I have set max heap to 1024m).
So I thought this could be a good workaround, but, puf, it is a lot harder than
it seems!
Yes, I see - I think your best bet is to index every day as a distinct
value. Don't worry about having hundreds of values.
-Mike
On 02/27/2012 05:11 PM, federico.wachs wrote:
This is used on an apartment booking system, and what I store as solr
documents can be seen as apartments. These apartmen
This is used on an apartment booking system, and what I store as solr
documents can be seen as apartments. These apartments can be booked for a
certain amount of days with a check in and a check out date hence the ranges
I was speaking of before.
What I want to do is to filter off the apartments t
Actually, the "use the raw parser unless the query has dismax syntax" approach
doesn't fit, because it kills a lot of useful dismax-related functionality,
described here: http://wiki.apache.org/solr/DisMaxQParserPlugin#Parameters.
However, there is a slightly cleaner solution than what I originally had in
mind:
No; contiguous means there are no gaps between them.
You need something like what you described initially.
Another approach is to de-normalize your data so that you have a single
document for every range. But this might or might not suit your
application. You haven't said anything about the
Oh no, I think I misunderstood when you said that my ranges were
contiguous.
I could have ranges like this:
1 TO 15
5 TO 30
50 TO 60
And so on... I'm not sure that what you proposed would work, right?
I think your example case would end up like this:
...
<field name="range-start">1</field>   <!-- single-valued range field -->
<field name="range-end">15</field>
...
On 02/27/2012 04:26 PM, federico.wachs wrote:
Michael thanks a lot for your quick answer, but I'm not exactly sure I
understand your solution.
How would the document you are proposin
Michael thanks a lot for your quick answer, but I'm not exactly sure I
understand your solution.
What would the document you are proposing look like? Do you mind
showing me a simple XML example?
Again, thank you for your cooperation. And yes, the ranges are contiguous!
If your ranges are always contiguous, you could index two fields:
range-start and range-end and then perform queries like:
range-start:[* TO 30] AND range-end:[5 TO *]
If you have multiple ranges which could have gaps in between then you
need something more complicated :)
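The interval test behind that query can be sanity-checked with a tiny sketch (pure illustration of the overlap logic, not Solr code):

```python
# Equivalent of: range-start:[* TO qe] AND range-end:[qs TO *]
# A stored range [s, e] matches a query window [qs, qe]
# exactly when s <= qe and e >= qs.
def overlaps(s, e, qs, qe):
    return s <= qe and e >= qs

# stored range 5..15 vs. query window 5..30
print(overlaps(5, 15, 5, 30))   # True: the intervals intersect
print(overlaps(50, 60, 5, 30))  # False: no intersection
```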
On 02/27/2012 04:09
Hi all !
Here's my dreadful case, thank you for helping out! I want to have a
document like this:
...
<!-- multivalued range field -->
<field name="range">1 TO 10</field>
<field name="range">5 TO 15</field>
...
And the reason why I want to do this is because it's so much lighter than
having all the numbers in there, of
Here is most of the cluster state:
Connected to Zookeeper
localhost:2181, localhost: 2182, localhost:2183
/(v=0 children=7) ""
/CONFIGS(v=0, children=1)
/CONFIGURATION(v=0 children=25)
< all the configuration files, velocity info, xslt, etc.
/NODE_STATES(v=0 child
For what it's worth, I run Solr 3.5 on Ubuntu using the OpenJDK packages and I
haven't run into any problems. I do realize that sometimes the Sun JDK has
features that are missing from other Java implementations, but so far it hasn't
affected my use of Solr.
- Demian
On 2/27/2012 at 3:16 PM, Alexey Verkhovsky wrote:
> By the way, I'm not sure that edismax interpreting 'wal mart' as 'wal' OR
> 'mart' is really a bug that should be fixed. It's a counter-intuitive
> behavior, for sure, but - per my understanding - edismax is supposed to
> treat consecutive words a
I am looking at two different options to filter results in Solr, basically
a per-user access control list. Our index is about 2.5 million documents.
The first option is to use ExternalFileField. It seems pretty
straightforward: just put the necessary data in the files and query against
that data.
On Mon, Feb 27, 2012 at 12:36 PM, Steven A Rowe wrote:
> Separately, do you know about the "raw" query parser[2]? I'm not sure if
> it would help, but you may be able to use it in an alternate solution.
>
And explicitly route to edismax when dismax syntax is detected in the
query? That would make
I was trying to use the new interface. I see it using the old admin page.
Is there a piece of it you're interested in? I don't have access to the
Internet where it exists so it would mean transcribing it.
On Mon, Feb 27, 2012 at 2:47 PM, Mark Miller wrote:
>
> On Feb 27, 2012, at 2:22 PM, Matth
I'm not an Ubuntu user, but I think I read somewhere that Sun's JDK
packages have been removed from the repositories. I don't know more details,
but you should be able to install them yourself: download and install the
appropriate RPMs. That's the way I did it on Fedora 14-16.
On Mon, Feb 27, 2012 at
On Feb 27, 2012, at 2:22 PM, Matthew Parker wrote:
> Thanks for your reply Mark.
>
> I believe the build was towards the beginning of the month. The
> solr.spec.version is 4.0.0.2012.01.10.38.09
>
> I cannot access the clusterstate.json contents. I clicked on it a couple of
> times, but nothing
Hi,
I have internships open for this summer for students interested in working on
search and machine learning. Description is below.
-Grant
Research Engineer Internship
DESCRIPTION
Lucid Imagination, the leading commercial company for Apache Lucene and Solr,
is looking for interns to work on
A quick add on to this -- we have over 30 million documents.
I take it that we should be looking at Distributed Solr?
as in
http://www.lucidimagination.com/content/scaling-lucene-and-solr#d0e344
Thanks.
On Mon, Feb 27, 2012 at 2:33 PM, Memory Makers wrote:
> Many thanks for the response.
>
> H
Hi Alexey,
Lucene's QueryParser, and at least some of Solr's query parsers - I'm not
familiar with all of them - have the problem you mention: analyzers are fed
queries word-by-word, instead of whole strings between operators. There is a
JIRA issue for fixing this, but no work done yet:
Many thanks for the response.
Here are the revised questions:
For example, if I have N processes that are producing documents to index:
1. Should I have them submit documents to Solr simultaneously (will this
improve the indexing throughput)?
2. Is there anything I can do Solr-configuration-wise th
Thanks for your reply Mark.
I believe the build was towards the beginning of the month. The
solr.spec.version is 4.0.0.2012.01.10.38.09
I cannot access the clusterstate.json contents. I clicked on it a couple of
times, but nothing happens. Is that stored on disk somewhere?
I configured a custom r
Hi all!
I installed Ubuntu 10.04 LTS, added the 'partner' repository to
my sources list, and updated it, but I can't find a sun-java6-* package:
root@ubuntu:~# apt-cache search java6
default-jdk - Standard Java or Java compatible Development Kit
default-jre - Standard Java or Java compati
My two cents:
- pulling is better than pushing -
http://wiki.apache.org/solr/Solrj#Streaming_documents_for_an_update
- DIH is not thread safe: https://issues.apache.org/jira/browse/SOLR-3011. But
there are a few patches for trunk which fix it.
Regards
On Mon, Feb 27, 2012 at 10:46 PM, Erik Hatcher
Anyone up for providing an answer?
The idea is to have a kind of CustomInteger composed of an array of
timestamps. The value shown in this field would be based on the date range
that you're sending.
The biggest problem is that this field would be in all the documents in
your Solr index, so you need to
Yes, absolutely. Parallelizing indexing can make a huge difference. How you
do so will depend on your indexing environment. Most crudely, running multiple
indexing scripts on different subsets of data, up to the limitations of your
operating system and hardware, is how many do it. SolrJ h
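The crude "multiple indexers over subsets of data" approach can be sketched as below; send_batch is a stand-in for whatever actually posts to Solr (SolrJ, pysolr, plain HTTP), so here it just counts documents:

```python
# Sketch: split documents into batches and index them with parallel workers.
from concurrent.futures import ThreadPoolExecutor

def send_batch(batch):
    # Placeholder: a real implementation would POST this batch to Solr.
    return len(batch)

def index_parallel(docs, workers=4, batch_size=2):
    """Batch the docs, fan the batches out to a worker pool,
    and return the total number of documents sent."""
    batches = [docs[i:i + batch_size] for i in range(0, len(docs), batch_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(send_batch, batches))

docs = [{"id": i} for i in range(10)]
print(index_parallel(docs))  # 10: every document accounted for
```

For real indexing, worker count is tuned against CPU, disk and Solr's own capacity rather than fixed at 4.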
Say, there is an index of business names (fairly short text snippets),
containing: Walmart, Walmart Bakery and Mini Mart. And say we need a query
for 'wal mart' to match all three, with an appropriate ranking order. Also
need 'walmart', 'walmart bakery' and 'bakery' to find the right things in
the
Thanks for clarifying Yonik.
On Sat, Feb 25, 2012 at 3:57 PM, Yonik Seeley
wrote:
> On Sat, Feb 25, 2012 at 3:39 PM, Jamie Johnson wrote:
>> "Unfortunately, Apache Solr still uses this horrible code in a lot of
>> places, leaving us with a major piece of work undone. Major parts of
>> Solr’s fac
perfect, thanks Yonik!
On Sat, Feb 25, 2012 at 11:41 PM, Yonik Seeley
wrote:
> On Sat, Feb 25, 2012 at 11:30 PM, Jamie Johnson wrote:
>> How large will the transaction log grow, and how long should it be kept
>> around?
>
> We keep around enough logs to satisfy a minimum of 100 updates
> lookba
Hey Matt - is your build recent?
Can you visit the cloud/zookeeper page in the admin and send the contents of
the clusterstate.json node?
Are you using a custom index chain or anything out of the ordinary?
- Mark
On Feb 27, 2012, at 12:26 PM, Matthew Parker wrote:
> TWIMC:
>
> Environment
>
Hello,
From what you are saying, I can conclude you need something like
http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html
The news is not really great for you; it is work in progress:
https://issues.apache.org/jira/browse/SOLR-3076
I've heard that ElasticSearch has some sort of
TWIMC:
Environment
=
Apache SOLR rev-1236154
Apache Zookeeper 3.3.4
Windows 7
JDK 1.6.0_23.b05
I have built a SOLR Cloud instance with 4 nodes using the embedded Jetty
servers.
I created a 3 node zookeeper ensemble to manage the solr configuration data.
All the instances run on one serve
Hello Loren,
I suppose you are confused by the list of *present* commits printed by
SolrDeletionPolicy:
Feb 27, 2012 6:22:37 AM org.apache.solr.core.*SolrDeletionPolicy onCommit*
INFO: SolrDeletionPolicy.*onCommit: commits:num=2*
commit{dir=/home/search/solr/solr/data/index,segFN=segments_141z,versio
Yes, per-doc. I mentioned TermsComponent but meant TermVectorComponent,
where we get back all the terms in the doc. Just wondering if there was a
way to only get back the terms that matched the query.
Thanks EE,
-Jay
On Sat, Feb 25, 2012 at 2:54 PM, Erick Erickson wrote:
> Jay:
>
> I've seen th
Check in the data directory to make sure that they are present. If so, you
just need to load the cores again.
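Cores can be loaded or reloaded through the CoreAdmin handler, along these lines (host, port and core name are assumptions about your setup):

```
http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0
```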
On Mon, Feb 27, 2012 at 11:30 AM, Wouter de Boer <
wouter.de.b...@springest.nl> wrote:
> Hi,
>
> I run SOLR on Jetty. After a restart of Jetty, the indices are empty.
> Anyone
> an idea
Hi,
I run SOLR on Jetty. After a restart of Jetty, the indices are empty. Does
anyone have an idea what the reason could be?
Regards,
Wouter.
Yes! Thank you! I also got this this morning from the Sematext blog.
Edismax
" Supports the “boost” parameter.. like the dismax bf param, but multiplies
the function query instead of adding it in"
http://blog.sematext.com/2010/01/20/solr-digest-january-2010/
Thanks Mark. I'll pull the latest trunk today and run with that.
On Sun, Feb 26, 2012 at 10:37 AM, Mark Miller wrote:
>>
>>
>>
>> Are there any outstanding issues that I should be aware of?
>>
>>
> Not that I know of - we were trying to track down an issue around peer
> sync recovery that our C
I am very excited to announce the availability of Solr 4.0 with
RankingAlgorithm 1.4 (NRT support) (Early Access Release).
RankingAlgorithm 1.4 supports the entire Lucene Query Syntax, ± and/or
boolean queries and is much faster than 1.3 and is compatible with
Lucene 4.0.
You can get more in
I've run some tests on both of the Solr versions we are testing... one is the
2010.12.10 build and the other is the 2012.02.16 build. The latter one is
where we were initially seeing poor response performance. I've attached 4
text files which have the results of a few runs against each of the buil
I will run some queries today, both with lazyfield loading on and off (for
the 2010 build we're using and the 2012 build we're using) and get you some
of the debug data.
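For reference, lazy field loading is toggled in solrconfig.xml:

```xml
<!-- solrconfig.xml: when true, stored fields that are not requested
     are loaded lazily instead of eagerly -->
<enableLazyFieldLoading>true</enableLazyFieldLoading>
```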
On Sun, Feb 26, 2012 at 4:13 PM, Yonik Seeley-2-2 [via Lucene] <
ml-node+s472066n318...@n3.nabble.com> wrote:
> On Sun, F
My *real* suggestion would be to not do it. Write a SolrJ
program that uses whatever version of Tika you want
to download and use *that* to index rather than try to
sort through the various jar dependencies in Solr. It'd be
safer.
Otherwise, you're on your own here.
Here's some example code:
htt
Now, everything works!
I have another problem if I use a connector with my Solr-Nutch setup.
This is the error:
Grave: java.lang.RuntimeException:
org.apache.lucene.index.CorruptIndexException: Unknown format version: -11
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068)
at org.apache.solr.cor
Hi,
I am getting a junk value in a dynamic field in Solr.
I am using the SQL Server driver (net.sourceforge.jtds.jdbc.Driver) to
connect to the database, and the driver's name shows up as a junk value in my
dynamic field values. Below is a sample junk value:
-
net.sourceforge.jtds.jdbc.ClobImpl@55
--- On Mon, 2/27/12, Xiao wrote:
> From: Xiao
> Subject: Customizing Solr score with DisMax query
> To: solr-user@lucene.apache.org
> Date: Monday, February 27, 2012, 5:59 AM
> In my application logic, I want to
> implement the ranking (scoring) logic as
> follows:
>
> score = "Solr relecenc
Hi All,
I am trying to understand features of Solr Cloud, regarding commits and
scaling.
- If I am using Solr Cloud, do I need to explicitly call commit
(hard commit)? Or is a soft commit okay, and Solr Cloud will do the job of
writing to disk?
- Do we still need to use Master