I have been a satisfied DIH user for a long time.
The project I use Solr for runs on MySQL 5.1. There are 6
Solr cores in total with a combined index size of 12G. The database design
is as relational as it can get, and writing SQL queries to fetch the data
has always been a prob
On Fri, Aug 7, 2009 at 11:15 AM, Amit Nithian wrote:
> All,
> An off and on project of mine has been to work on refactoring the way we
> load data from MySQL into Solr. Our current approach is fairly hard coded
> and not configurable as I would like. I was curious of people who have used
> the DIH
All,
An off and on project of mine has been to work on refactoring the way we
load data from MySQL into Solr. Our current approach is fairly hard coded
and not configurable as I would like. I was curious of people who have used
the DIH and/or LuSQL to load data into Solr, how much data you typicall
Have you tried setting solr home via the JNDI? I think you can set it via
solr/home but that would require adding this to your servlet context
configuration.
Another option is to trace the startup scripts for Glassfish and see what
environment variables are passed in. JAVA_OPTS would make sense but
Then you should consider replicating the index to the local intranet
and still run it as a separate app.
Will it be the same master-slave replication? If the master is
multicore, can I specifically replicate the index of a certain core? Thanks
for the help.
2009/8/7 Noble Paul നോബിള് नोब
Then you should consider replicating the index to the local intranet
and still run it as a separate app.
On Fri, Aug 7, 2009 at 10:53 AM, Ninad Raut wrote:
> The remote web app will be accessing the Solr server via the internet. It's not
> an intranet setup.
>
> On Fri, Aug 7, 2009 at 10:19 AM, Walt
The remote web app will be accessing the Solr server via the internet. It's not
an intranet setup.
On Fri, Aug 7, 2009 at 10:19 AM, Walter Underwood wrote:
> About the first option, caches are more effective with more traffic, so ten
> front end servers using three Solr servers will have better caching
About the first option, caches are more effective with more traffic,
so ten front end servers using three Solr servers will have better
caching and probably better overall performance than having separate
search on all ten servers. You can even put an HTTP cache in there and
get better cach
>
> params.setQuery(queryString);
>
The query string is "*:*", right?
Your id field is sortable, right?
Cheers
Avlesh
On Fri, Aug 7, 2009 at 5:58 AM, Reuben Firmin wrote:
> I'm using SolrJ. When I attempt to set up a query to retrieve the maximum
> id
> in the index, I'm getting an exception.
Bradford,
If I may:
Have a look at http://www.sematext.com/products/language-identifier/index.html
And/or http://www.sematext.com/products/multilingual-indexer/index.html
Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP,
Hi Noble,
Can you explain a bit more how to use Solr "out of the box"? I am
looking at ways to design the UI for the remote application quickly and with
fewer problems.
Also, could you elaborate more on what can go wrong with the first option?
Thanks.
2009/8/6 Noble Paul നോബിള് नोब्ळ्
> On Thu, A
Chris Hostetter wrote:
: I need to tokenize my field on whitespace, html, punctuation, apostrophes
: but if I use HTMLStripStandardTokenizerFactory it strips only html
: and not apostrophes
you might consider using one of the HTML Tokenizers, and then use a
PatternReplaceFilterFactory ...
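A sketch of what such a chain could look like in schema.xml (the field type name and the pattern are assumptions, not a tested configuration):

```xml
<!-- Hypothetical field type: the tokenizer strips HTML markup, then a
     PatternReplaceFilterFactory removes apostrophes from each token. -->
<fieldType name="text_html_noapos" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.HTMLStripStandardTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="'" replacement="" replace="all"/>
  </analyzer>
</fieldType>
```

With replace="all" every apostrophe in a token is removed, so "don't" would index as "dont"; adjust the pattern if you want to split on the apostrophe instead.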
It looks like you export JAVA_OPTS in your .profile, but I bet Tomcat also sets
and thus overrides this same JAVA_OPTS in its own startup script. So that is
what you should edit and modify. I'm a Jetty user, so I don't have a Tomcat
startup script to check for you.
Otis
--
Sematext is hiring
Dynamic fields might be an answer. If you had a field called "product_*" and
these were populated with the corresponding values during indexing then
faceting on these fields will give you the desired behavior.
The only catch here is that the product names have to be known upfront. A
wildcard suppo
I have tried that, but it did not work either!
The goal of setting solr.home in Tomcat 6 is to start Solr when Tomcat 6
starts.
So I think the problem is that Solr cannot start because solr.home is not
set when Glassfish starts.
Chantal Ackermann wrote:
>
>
> You have to quote values that i
I'm using SolrJ. When I attempt to set up a query to retrieve the maximum id
in the index, I'm getting an exception.
My setup code is:
final SolrQuery params = new SolrQuery();
params.addSortField("id", ORDER.desc);
params.setRows(1);
params.setQuery(queryString);
fi
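For reference, a sketch of the HTTP request the setup above amounts to (host, core name, and the helper class are illustrative; plain JDK, no SolrJ needed) -- the point being that q must be a real query such as *:*, and rows=1 with sort=id desc returns only the max-id document:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class MaxIdQuery {
    // Builds the query-string equivalent of the SolrJ setup above:
    // match all documents, sort by id descending, return a single row.
    static String buildQuery(String host, String core) {
        String q = URLEncoder.encode("*:*", StandardCharsets.UTF_8);
        String sort = URLEncoder.encode("id desc", StandardCharsets.UTF_8);
        return host + "/" + core + "/select?q=" + q + "&sort=" + sort + "&rows=1";
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("http://localhost:8983/solr", "core0"));
    }
}
```
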
There is a patch for it:
https://issues.apache.org/jira/browse/SOLR-64
Koji
Jón Helgi Jónsson wrote:
Did a bit more creative searching for a solution and came up with this:
http://www.mail-archive.com/solr-user@lucene.apache.org/msg15027.html
I'm using a couple-of-days-old nightly build, so u
Google Translate just released (last week) its language API with translation
and LANGUAGE DETECTION.
:)
It's very simple to use, and you can query it with some text to detect which
language it is.
Here is a simple example using groovy, but all you need is the url to
query: http://groovyconsole.ap
FYI, you can use the block property, but I think even better is to use
the unicode script property: http://unicode.org/reports/tr24/ . This
is easier because some characters are common across different scripts.
Also, some scripts span multiple unicode blocks.
This is the direction I was heading LUC
Are those 'blocks of text' (Unicode) Java strings? I don't think
this is the case, but if so, use Character.UnicodeBlock to identify the
language of the text.
And, is that just text files with unknown character encoding? Then ICU
has a 'charset detector' that you can use. This feature 'suggests'
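A minimal sketch of the Character.UnicodeBlock idea (plain Java; the class and method names are made up for illustration). Note it really identifies the script rather than the language -- Arabic, Farsi, and Urdu text will all come back as ARABIC:

```java
import java.lang.Character.UnicodeBlock;
import java.util.HashMap;
import java.util.Map;

public class ScriptGuesser {
    // Counts which Unicode block each letter falls into and returns the
    // most frequent one -- a crude script (not language) guess.
    static UnicodeBlock dominantBlock(String text) {
        Map<UnicodeBlock, Integer> counts = new HashMap<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.isLetter(cp)) {
                counts.merge(UnicodeBlock.of(cp), 1, Integer::sum);
            }
            i += Character.charCount(cp);
        }
        return counts.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(null);
    }

    public static void main(String[] args) {
        System.out.println(dominantBlock("مرحبا بالعالم")); // Arabic-script text
        System.out.println(dominantBlock("hello world"));   // Latin-script text
    }
}
```
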
I can't reindex because the aggregated/grouped result should change as
the query changes... in other words, the result must be dynamic.
We've been thinking about a new handler for it, something like:
/select?q=laptop&rows=0&itemfacet=on&itemfacet.field=product_name,min(price),max(price)
Does i
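Until something like that handler exists, the aggregation can be done client-side over the documents the query returns. A sketch (plain Java; the Doc type and field names are hypothetical):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ItemFacet {
    // One matching document: product name plus its price field.
    record Doc(String product, double price) {}

    // Client-side stopgap for the proposed itemfacet handler: group the
    // documents returned by the current query, keeping min/max price per product.
    static Map<String, double[]> minMaxByProduct(List<Doc> docs) {
        Map<String, double[]> out = new LinkedHashMap<>();
        for (Doc d : docs) {
            double[] mm = out.computeIfAbsent(d.product(),
                    k -> new double[] { d.price(), d.price() });
            mm[0] = Math.min(mm[0], d.price());
            mm[1] = Math.max(mm[1], d.price());
        }
        return out;
    }

    public static void main(String[] args) {
        var docs = List.of(new Doc("product_name1", 10.0),
                           new Doc("product_name1", 25.0),
                           new Doc("product_name2", 7.5));
        minMaxByProduct(docs).forEach((p, mm) ->
            System.out.println(p + " min=" + mm[0] + " max=" + mm[1]));
    }
}
```

This obviously transfers every matching document to the client, which is exactly the cost an itemfacet-style handler would avoid.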
By the way, I was using command=indexversion to verify whether replication is on
or off. Since it seems unreliable, is there a better way to do it?
Thanks,
On Thu, Aug 6, 2009 at 8:43 AM, solr jay wrote:
> You are right. Replication was disabled after the server was restarted, and
> then I saw the behavi
Did a bit more creative searching for a solution and came up with this:
http://www.mail-archive.com/solr-user@lucene.apache.org/msg15027.html
I'm using a couple-of-days-old nightly build, so unless there is
something new I should know about, I'm going with that method :)
2009/8/6 Jón Helgi Jónsson
You should stand to benefit from concurrent loading. Certainly the text
analysis would end up being done concurrently; I'm not sure what else benefits
from it but I think there are other things. Ideally you could try a
configurable number of concurrent loads and pick the one that gets the job
If you can reindex, simply rebuild the index with fields replaced by
combining existing fields.
-Yao
-Original Message-
From: David Lojudice Sobrinho [mailto:dalss...@gmail.com]
Sent: Thursday, August 06, 2009 4:17 PM
To: solr-user@lucene.apache.org
Subject: Item Facet
Hi...
Is there
For first-time loads I currently post to
/update/csv?commit=false&separator=%09&escape=\&stream.file=workfile.txt&map=NULL:&keepEmpty=false",
this works well and finishes in about 20 minutes for my workload.
This is mostly CPU bound; I have an 8-core box and it seems one core takes
the brunt of the wo
Hello,
I think you are confusing the size of the data you want to index with the
size of the index. For our indexes (large full text documents) the Solr
index is about 1/3 of the size of the documents being indexed. For 3 TB of
data you might have an index of 1 TB or less. This depends on many
Hi...
Is there any way to group values like shopping.yahoo.com or shopper.cnet.com do?
For instance, I have documents like:
doc1 - product_name1 - value1
doc2 - product_name1 - value2
doc3 - product_name1 - value3
doc4 - product_name2 - value4
doc5 - product_name2 - value5
doc6 - product_name2 -
Bradford, there is an Arabic analyzer in trunk. For Farsi there is
currently a patch available:
http://issues.apache.org/jira/browse/LUCENE-1628
One option is not to detect languages at all.
It could be hard for short queries due to the languages you mentioned
borrowing from each other.
But you do
Hey there,
We're trying to add foreign language support into our new search
engine -- languages like Arabic, Farsi, and Urdu (that don't work with
standard analyzers). But our data source doesn't tell us which
languages we're actually collecting -- we just get blocks of text. Has
anyone here worke
As soon as I started reading your message I started thinking "common
grams", so that is what I would try first, esp. since somebody already
did the work of porting that from Nutch to Solr (see Solr JIRA).
Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Ka
I'm investigating a problem I bet some of you have hit before, and exploring
several options to address it. I suspect that this specific IDF scenario is
common enough that it even has a name, though I'm not sure what it would be
called.
The scenario:
Suppose you have a search application focused on t
>
> does DIH call commit periodically, or are things done in one big batch?
>
AFAIK, one big batch.
Cheers
Avlesh
On Thu, Aug 6, 2009 at 11:23 PM, Yonik Seeley wrote:
> On Mon, Aug 3, 2009 at 12:32 PM, Chantal
> Ackermann wrote:
> > avg-cpu: %user %nice %sys %iowait %idle
> > 1
Hi, would really appreciate some help on this.
I'm doing a category browser for companies. Kind of like a yellow pages.
For each company I store each category the company is in like this:
Example for Boeing would be
03.03.02
which is a fictional id for 'Jets'
The beginning point I display all c
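One way to make such a hierarchy facetable level by level (a sketch; the depth-prefix encoding is a common convention, not something Solr requires) is to index every ancestor path of the id in a multi-valued field:

```java
import java.util.ArrayList;
import java.util.List;

public class CategoryPaths {
    // Hypothetical helper: expands a dotted category id like "03.03.02" into
    // depth-prefixed tokens ("0/03", "1/03.03", "2/03.03.02"). Indexing all of
    // them in one multi-valued field lets you facet one level at a time.
    static List<String> expand(String categoryId) {
        List<String> out = new ArrayList<>();
        String[] parts = categoryId.split("\\.");
        StringBuilder path = new StringBuilder();
        for (int depth = 0; depth < parts.length; depth++) {
            if (depth > 0) path.append('.');
            path.append(parts[depth]);
            out.add(depth + "/" + path);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(expand("03.03.02")); // [0/03, 1/03.03, 2/03.03.02]
    }
}
```

Faceting with facet.prefix=0/ then lists the top-level categories; after the user picks '03', facet.prefix=1/03. narrows the facet to its children.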
On Mon, Aug 3, 2009 at 12:32 PM, Chantal
Ackermann wrote:
> avg-cpu: %user %nice %sys %iowait %idle
> 1.23 0.00 0.03 0.03 98.71
>
> Basically, it is doing very little? *scratch*
How often is commit being called? (a Lucene commit syncs all of the
index files, so a cra
>
> Do you think it's possible to return (in the nested entity) rows
> independent of the unique id, and let the processor decide when a document
> is complete?
>
I don't think so.
In my case, I had 9 (JDBC) entities for each document. Most of these
entities returned a single column and limited nu
Hi all,
to keep this thread up to date... ;-)
d) jdbc batch size
changed to 10. (Was default: 500, then 1000)
The problem with my dih setup is that the root entity query returns a
huge set (all ids that shall be indexed). A larger fetchsize would be
good for that query.
The nested entity, ho
Design so that you can handle the load with one server down (N+1
sizing), then take one server out for any maintenance. Simple and
works fine.
wunder
On Aug 6, 2009, at 9:25 AM, Robert Petersen wrote:
Here is another idea. With solr multicore you can dynamically spin up
extra cores and br
Here is another idea. With solr multicore you can dynamically spin up
extra cores and bring them online. I'm not sure how well this would
work for us since we have hard coded the names of the cores we are
hitting in our config files.
-Original Message-
From: Brian Klippel [mailto:br...@t
Something similar has been discussed earlier. Go through this thread -
http://www.lucidimagination.com/search/document/b5977650557f50cb/problem_with_query_parser
PS: Solr is pronounced as "Solar" but written without the "a".
Cheers
Avlesh
On Thu, Aug 6, 2009 at 7:18 PM, Deepak VSVK wrote:
> Hi
You could create a new "working" core, then call the swap command once
it is ready. Then remove the work core and delete the appropriate index
folder at your convenience.
-Original Message-
From: Robert Petersen [mailto:rober...@buy.com]
Sent: Wednesday, August 05, 2009 6:41 PM
To: solr
You are right. Replication was disabled after the server was restarted, and
then I saw the behavior. After I added some data, command "indexversion"
returns the right values. So it seems Solr behaved correctly.
Thanks,
2009/8/5 Noble Paul നോബിള് नोब्ळ्
> how is the replicationhandler configure
Hi everyone,
I'm indexing several documents that contain words that the StandardTokenizer
cannot detect as tokens. These are words like
C#
.NET
C++
which are important for users to be able to search for, but get treated as
"C", "NET", and "C".
How can I create a list of words that should be
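One common approach (sketched here; the field type name and file name are assumptions) is to tokenize on whitespace only and protect the special terms from further splitting via a protected-words file listing C#, .NET, C++, etc.:

```xml
<!-- Hypothetical field type: whitespace tokenization keeps "C#" and ".NET"
     whole; protwords.txt exempts them from word-delimiter splitting. -->
<fieldType name="text_code" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            protected="protwords.txt"
            generateWordParts="1" catenateWords="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Since the WordDelimiterFilter runs before lowercasing, the entries in the protected-words file need to match the tokens as they appear before the LowerCaseFilter.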
I'm guessing it is because you have your Spell checker mapped to the
"spellchecker" request handler, but you are asking the standard
request handler to build the spell checker. Unless you've modified
the Standard Req Handler, it is not spell check aware.
Try http://localhost:8983/solr/sele
Great! *bow*
Thanks,
Chantal
this should do the trick
On Thu, Aug 6, 2009 at 6:41 PM, Chantal
Ackermann wrote:
> Hi again,
>
> 1.4 runs fine for me now, but I'm still struggling with the correct delete
> query. There is little to no documentation for the new special commands,
> and I have problems guessing the correct setup from reading through th
Hi,
In my application I am trying to search with some special characters like $ and #,
and Solr is returning all the available search results. Some of the characters
like _ and . are not being encoded in the search URL. Does anyone have any idea
what the root cause of this could be?
I am using the Jetty server
Hi again,
1.4 runs fine for me now, but I'm still struggling with the correct
delete query. There is little to no documentation for the new
special commands, and I have problems guessing the correct setup from
reading through the code. SOLR-1060 is not enough help.
I've come up with a se
Hi,
Solr is fine out of RAM if you don't change it (build and then let it cache
what it needs). The RAM is needed when you constantly pepper it with updates
and commits. If you can have the logs update certain shards and then merge
those indexes periodically to machines you can leave alone - this
>
> if I know my fields and there are not many in the lucene index, I should not
> face any problem creating a schema, or are there any pitfalls which I should
> be aware of?
Nothing specific. The creation of schema should be very straightforward.
Just make sure you use the right field types.
Chee
But getting the schema right would be the challenge.
If I know my fields and there are not many in the lucene index, I should not
face any problem creating a schema, or are there any pitfalls which I should
be aware of?
Thanks for such quick replies guys.
2009/8/6 Noble Paul നോബിള് नोब्ळ्
> yea
Yeah, the big part was missed. You need to set up a schema.xml matching
the field names and types, and you would need a solrconfig.xml. But
getting the schema right would be the challenge.
On Thu, Aug 6, 2009 at 5:23 PM, Mark Miller wrote:
> You're kidding, right? :)
>
> Noble Paul നോബിള് नोब्ळ् wrote
>
> What about the schema and querying? There should be some changes to the
> Solr schema, I think. Correct me if I am wrong.
>
Of course! You have to create your own schema inside the schema.xml and
adjust values inside solrconfig.xml at the bare minimum to get started.
Cheers
Avlesh
On Thu, Aug
I am also interested in knowing! Does it work?
Cheers
Avlesh
On Thu, Aug 6, 2009 at 5:23 PM, Mark Miller wrote:
> You're kidding, right? :)
>
> Noble Paul നോബിള് नोब्ळ् wrote:
>
>> just copy the whole index into /index and start Solr. That
>> should work just fine
>>
>> On Thu, Aug 6, 2009 at 5:17 PM,
What about the schema and querying? There should be some changes to the
Solr schema, I think. Correct me if I am wrong.
2009/8/6 Noble Paul നോബിള് नोब्ळ्
> just copy the whole index into /index and start Solr. That
> should work just fine
>
> On Thu, Aug 6, 2009 at 5:17 PM, Ninad Raut
> wrote:
> > H
You're kidding, right? :)
Noble Paul നോബിള് नोब्ळ् wrote:
just copy the whole index into /index and start Solr. That
should work just fine
On Thu, Aug 6, 2009 at 5:17 PM, Ninad Raut wrote:
Hi,
Is there a way to import existing Lucene indexes into Solr? I have a huge
Lucene index which I want to impo
just copy the whole index into /index and start Solr. That
should work just fine
On Thu, Aug 6, 2009 at 5:17 PM, Ninad Raut wrote:
> Hi,
> Is there a way to import existing Lucene indexes into Solr? I have a huge
> Lucene index which I want to import into the Solr server.
> Regards,
> Ninad Raut.
>
--
-
Hi,
Is there a way to import existing Lucene indexes into Solr? I have a huge
Lucene index which I want to import into the Solr server.
Regards,
Ninad Raut.
You have to quote values that include whitespace:
export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/home/huenzhao/search/solr"
or to make it accessible for other paths as well:
export SOLR_HOME=/home/huenzhao/search/solr
export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=$SOLR_HOME"
Cheers,
Chantal
Go through this thread first - http://markmail.org/message/bannl2fpblt5sqlw
If it still does not help, post back your field type definition in
schema.xml
Cheers
Avlesh
On Thu, Aug 6, 2009 at 3:46 PM, Radha C. wrote:
> Hi,
>
>
> I have documents contain word "healthcare articles". I need to matc
Hi,
I have documents containing the word "healthcare articles". I need to match the
"healthcare articles" documents
for the query strings "helath", "articles"...
I tried q="health*", q=helath*, q="heath*articles" but everything returns
empty results. When I try q="healthcare artilces", the search re
Hi all,
I know how to configure solr.home using Tomcat 6, but I don't know how to
set solr.home using Glassfish (V2.1). I have tried to set solr.home in
.profile as follows:
export solr.home=/home/huenzhao/search/solr
export solr/home=/home/huenzhao/search/solr
export solr.solr.home=/hom
On Thu, Aug 6, 2009 at 12:24 PM, Ninad Raut wrote:
> Hi,
> I have a search engine on Solr. Also I have a remote web application which
> will be using the Solr Indexes for search.
> I have three scenarios:
> 1) Transfer the Indexes to the Remote Application.
>
> - This will reduce load on the actu