Hi,
I'm using Solr 1.3, MySQL and Tomcat 5.5; can you please help me sort this out?
How can I index data in UTF-8? I tried adding the parameter encoding="UTF-8"
to the dataSource in data-config.xml.
| character_set_client| latin1
| characte
Have you confirmed Java's -Xmx setting? (Max memory)
e.g. java -Xmx2000m -jar start.jar
-Nick
On Wed, Oct 22, 2008 at 3:24 PM, Mark Miller <[EMAIL PROTECTED]> wrote:
> How much RAM in the box total? How many sort fields and what types? Sorts on
> each core?
>
> Willie Wong wrote:
>>
>> Hello,
>>
Actually, most XML parsers don't require you to escape such characters in
attributes. You are welcome to try this out; just look at the example-DIH :)
On Tue, Oct 21, 2008 at 11:11 PM, Steven A Rowe <[EMAIL PROTECTED]> wrote:
> Wow, I really should read more closely before I respond - I see now,
Hi,
The best way to manage international characters is to keep everything in
UTF-8. Otherwise it will be difficult to figure out the source of the
problem.
1. Make sure the program which writes data into MySQL is using UTF-8
2. Make sure the MySQL tables are using UTF-8.
3. Make sure the MySQL client connection is using UTF-8.
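For the DIH JdbcDataSource, the standard Connector/J URL parameters usually
take care of the client side; something like this in data-config.xml (the
database name and credentials here are only placeholders):

<dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
    url="jdbc:mysql://localhost/mydb?useUnicode=true&amp;characterEncoding=UTF-8"
    user="solr" password="secret"/>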
I have a function to convert strings that are in latin1 to UTF-8. Where
exactly should I put it in the Java code to clean up the strings before
indexing?
Thanks a lot for this information,
Sunny
I'm using Solr 1.3, MySQL and Tomcat 5.5.
Hi Shalin
Thanks for your answer, but it doesn't work with just -Dfile.encoding;
I was hoping it would.
I definitely can't change the database, so I guess I must change the Java code.
I have a function to convert latin-1 strings to UTF-8, but I don't really know
where I should put it.
Thanks for your
I am seeing odd behavior where a query such as:
http://localhost:8983/solr/select/?q=moss&version=2.2&start=0&rows=10&indent=on&fq=docType%3AFancy+Doc
works until I add q.op=AND
http://localhost:8983/solr/select/?q=moss&q.op=AND&version=2.2&start=0&rows=10&indent=on&fq=docType%3AFancy+Doc
whic
:so you can send your updates anytime you want, and as long as you only
:commit every 5 minutes (or commit on a master as often as you want, but
:only run snappuller/snapinstaller on your slaves every 5 minutes) your
:results will be at most 5minutes + warming time stale.
This is what I do as w
Hi,
See
http://java.sun.com/j2se/1.3/docs/guide/intl/encoding.doc.html
and
http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html#getBytes(java.lang.String)
Also note that you cannot transform a latin1 string into a UTF-8
string directly. What you can do
is decode a latin1 octet array
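For example (a minimal sketch; the class name is only for illustration, and
it assumes the octets really are Latin-1):

import java.io.UnsupportedEncodingException;

public class Latin1ToUtf8 {
    // Decode octets that were written as Latin-1, then re-encode them as UTF-8.
    public static byte[] convert(byte[] latin1Octets) throws UnsupportedEncodingException {
        String text = new String(latin1Octets, "ISO-8859-1"); // decode with the original charset
        return text.getBytes("UTF-8");                        // encode the characters as UTF-8
    }
}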
By the way, the fq parameter is being used to apply a facet value as a
refinement, which is why it is not tokenized and is a string.
jayson.minard wrote:
>
> I am seeing odd behavior where a query such as:
>
> http://localhost:8983/solr/select/?q=moss&version=2.2&start=0&rows=10&indent=on&fq=do
Hi Willie,
Are you using highlighting?
If the answer is yes, you need to know that for each document retrieved,
Solr highlighting loads into memory the entire field that is used for this
functionality. If the field is too long, you can run into memory problems.
You can solve the problem using th
you can try out a Transformer to translate that
On Wed, Oct 22, 2008 at 2:00 PM, sunnyfr <[EMAIL PROTECTED]> wrote:
>
> I've a function to clear up string which are in latin1 to UTF8, I would like
> to know where exactly should I put it in the java code to clear up string
> before indexing ?
>
> T
Thinking about this, I could work around it by quoting the facet value so
that the AND isn't inserted between tokens in the fq parameter.
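For example, the filter query would become something like this (URL-encoded,
where %22 is the double quote):
http://localhost:8983/solr/select/?q=moss&q.op=AND&version=2.2&start=0&rows=10&indent=on&fq=docType%3A%22Fancy+Doc%22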
jayson.minard wrote:
>
> BY the way, the fq parameter is being used to apply a facet value as a
> refinement which is why it is not tokenized and is a stri
Can you tell me more about it?
Noble Paul നോബിള് नोब्ळ् wrote:
>
> you can try out a Transformer to translate that
>
> On Wed, Oct 22, 2008 at 2:00 PM, sunnyfr <[EMAIL PROTECTED]> wrote:
>>
>> I've a function to clear up string which are in latin1 to UTF8, I would
>> like
>> to know where e
I am very new to Solr, but I have played with Nutch and Lucene.
Has anybody used Solr for a whole web indexing application?
Which Spider did you use?
How does it compare to Nutch?
Thanks in advance for all of the info.
-John
http://wiki.apache.org/solr/DataImportHandler#head-eb523b0943596587f05532f3ebc506ea6d9a606b
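A rough sketch of such a transformer (the class and column names are only
examples, and it assumes the values are UTF-8 bytes that were decoded as
Latin-1):

import java.io.UnsupportedEncodingException;
import java.util.Map;

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;

public class Latin1FixTransformer extends Transformer {
    public Object transformRow(Map<String, Object> row, Context context) {
        Object value = row.get("description"); // example column name
        if (value instanceof String) {
            try {
                // Recover the raw octets, then re-decode them as UTF-8.
                byte[] octets = ((String) value).getBytes("ISO-8859-1");
                row.put("description", new String(octets, "UTF-8"));
            } catch (UnsupportedEncodingException e) {
                // Both charsets are guaranteed by the JVM, so this should not happen.
            }
        }
        return row;
    }
}

You would then reference it from the entity's transformer attribute in
data-config.xml, using the fully qualified class name.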
On Wed, Oct 22, 2008 at 4:41 PM, sunnyfr <[EMAIL PROTECTED]> wrote:
>
> Can you tell me more about it ?
>
>
> Noble Paul നോബിള് नोब्ळ् wrote:
>>
>> you can try out a Transformer to translate that
>>
>> On
Not quite yet; there is the IndexReader.clone patch that Ocean depends on,
which still needs to be completed:
https://issues.apache.org/jira/browse/LUCENE-1314. I had it completed,
but then things changed in IndexReader, so now it doesn't work and I
have not had time to complete it again. Otherwise the Ocea
I should mention that I have already added this tag to the solrconfig.xml
of all cores,
and it works in single-core mode but unfortunately doesn't work in multi-core.
We're seeing strange behavior on one of our slave nodes after replication.
When the new searcher is created we see FileNotFoundExceptions in the log
and the index is strangely invalid/corrupted.
We may have identified the root cause but wanted to run it by the community.
We figure there is a bu
Hi,
I am working on a usecase where I want to boost a document if
there are certain group of words near the keywords searched by the user.
For example, if the user is searching for the keyword "pool", I want to boost the
documents which have words like "excellent pool", "nice pool", "awesome
pool",
On Oct 22, 2008, at 7:57 AM, John Martyniak wrote:
I am very new to Solr, but I have played with Nutch and Lucene.
Has anybody used Solr for a whole web indexing application?
Which Spider did you use?
How does it compare to Nutch?
There is a patch that combines Nutch + Solr. Nutch is used
On Tue, Oct 21, 2008 at 3:59 PM, Sachit P. Menon
<[EMAIL PROTECTED]> wrote:
> Hi,
>
> I have gone through the archive in search of Hierarchical Faceting but was
> not clear as what should I exactly do to achieve that.
>
> Suppose, I have 3 categories like politics, science and sports. In the
> sc
Thanks Yonik,
I have more information...
1. We do indeed have large indexes: 40GB on disk, 30M documents - and this is
just a test server; we have 8 of these in parallel.
2. The performance problem I was seeing followed replication and the first
query on a new searcher. It turns out we didn't configur
Grant thanks for the response.
A couple of other people have recommended trying the Nutch + Solr
approach, but I am not sure what the real benefit of doing that is.
Since Nutch provides most of the same features as Solr and Solr has
some nice additional features (like spell checking, incre
Jim,
This is an off-topic question.
But for your 30M documents, did you fetch those from external web
sites (Whole Web Search)? Or are they internal documents? If they
are external what method did you use to fetch them and which spider?
I am in the process of deciding between using Nutch
We index RSS content using our own home grown distributed spiders - not using
Nutch. We use Ruby processes to do the feed fetching and XML shredding, and
Amazon SQS to queue up work packets to insert into our Solr cluster.
Sorry can't be of more help.
Hi,
Without changing any of the internals, a simple approach might be to take the
query "pool" and expand it with those other keywords, forming query phrases
in addition to the plain "pool" keyword, and boosting those expanded phrases
to make them bubble up - if they exist.
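For example, something like (the boost values are arbitrary):
q=pool OR "excellent pool"^5 OR "nice pool"^5 OR "awesome pool"^5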
Otis
--
Sematext
Hello.
I have field "description" in my schema. And I want make a filed
"suggestion" with the same content. So I added following line to my
schema.xml:
But I also want to modify "description" string before copying it to
"suggestion" field. I want to remove all comas, dots and slashes. Here
Hi,
You probably lower-case tokens during indexing (LowerCaseFilterFactory).
Wildcard queries are not analyzed the way non-wildcard ones are (this is
explained in the Lucene FAQ, I believe), so your capitalized Robert doesn't
match the
lower-cased robert in your index.
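(For example, name:Robert* won't match a term indexed as "robert", while
name:robert* will; the field name is just an illustration.)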
Otis
--
Sematext -- http://sematext
Hi Jayson,
That's exactly what I was going to suggest: fq="docType:Fancy Doc"
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
> From: jayson.minard <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Wednesday, October 22, 2008 5:26:03
The filters and tokenizer that are applied to the copy field are
determined by its type in the schema. Simply create a new field type in
your schema with the filters you would like, and use that type for your
copy field. So, the field description would have its old type, but the
field suggestion
Thank you, that is good information, as that is the way I am
leaning.
So when you fetch the content from RSS, does that get rendered to an
XML document that Solr indexes?
Also, what were a couple of the decision points for using Solr as opposed
to using Nutch, or even straight Lucene?
Hi,
Without knowing the details, I suspect it's just that a 1.5GB heap is not
enough. Yes, sorting will use your heap, as will various Solr caches and
norms, so double-check your schema to make sure you are using field types
like string where you can, not text, for example. If you sort by tim
Hi Shalin,
I wasn't talking about the behavior of parsers in the wild, but rather about
the XML specification (paraphrasing):
1. An XML document is not well-formed unless it matches the production labeled
document.
2. Violations of well-formedness constraints are fatal errors.
3. Once a fatal e
Thanks for the reply. I want to make your point more exact, because I'm not
sure that I understood you correctly :)
As far as I know (please correct me if I'm wrong), the type defines the way
in which the field is indexed and queried. But I don't want to index or
query the "suggestion" field in a different way, I
We shred the RSS into individual items, then create Solr XML documents to
insert. Solr is an easy choice for us over straight Lucene since it adds
the server infrastructure that we would mostly be writing ourselves - caching,
data types, master/slave replication.
We looked at Nutch too - but that
Yes, using fieldType, you can have Solr run the PatternReplaceFilter for
you.
So, for example, you can declare something like this, putting the
PatternReplaceFilter in the index analyzer (and maybe for query as well);
the type name and pattern below are only an illustration:
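<fieldType name="text_suggest" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- strip commas, dots and slashes; adjust the pattern to your needs -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="[,./]" replacement="" replace="all"/>
  </analyzer>
</fieldType>
<field name="suggestion" type="text_suggest" indexed="true" stored="true"/>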
I would suggest doing this i
On 10/22/08 8:57 AM, "Steven A Rowe" <[EMAIL PROTECTED]> wrote:
> Telling people that it's not a problem (or required!) to write non-well-formed
> XML, because a particular XML parser can't accept well-formed XML is kind of
> insidious.
I'm with you all the way on this.
A parser which accepts no
Here is what I am doing to check the memory status.
1. Run the servlet container and the Solr application.
2. At a command prompt, run jstat -gc <pid> 5s (5s means it samples the data
every 5 seconds).
3. Watch it or pipe the output to a file.
4. Analyze the data gathered.
Jae
On Tue, Oct 21, 2008 at 9:48 PM, Willie Wong <[EMAIL P
FT> I would suggest doing this in your schema, then starting up Solr and
FT> using the analysis admin page to see if it will index and search the way
FT> you want. That way you don't have to pay the cost of actually indexing
FT> the data to find out.
Thanks. I did it exactly like you said.
I cr
My bad. I misunderstood what you wanted.
The example I gave was for the searching side of things. Not the data
representation in the document.
-Todd
-Original Message-
From: Aleksey Gogolev [mailto:[EMAIL PROTECTED]
Sent: Wednesday, October 22, 2008 11:14 AM
To: Feak, Todd
Subject: Re[
It doesn't need to be a copy field, right? Could you create a new field
"ex", extract the value from description, delete the digits, and set it on
the "ex" field before adding/indexing to the Solr server?
-Original Message-
From: Feak, Todd [mailto:[EMAIL PROTECTED]
Sent: Wednesday, October 22, 2008 11:25 Joe
JN> It doesn't need to be a copy field, right? Could you create a new field
JN> "ex", extract value from description, delete digits, and set to "ex"
JN> field before add/index to solr server?
Yes, I can. I was just wondering whether I can use Solr for this purpose or
not.
JN> -Original Message-
Could you post the fieldType specification for "ex"? What does your regex
look like?
-Original Message-
From: Aleksey Gogolev [mailto:[EMAIL PROTECTED]
Sent: Wednesday, October 22, 2008 11:39 Joe
To: Joe Nguyen
Subject: Re[6]: Question about copyField
JN> It doesn't need to be a copy field, r
URI encoding turns a space into a plus, then (maybe) Lucene takes that as a
space. Also you want a + in front of first_name.
A AND B -> +first_name:joe++last_name:smith
B AND maybe A -> first_name:joe++last_name:smith
Some of us need sample use cases to understand these things; documenta
Here it is, the regex is very simple:
But the problem is not about the field type. The problem is: how to retrieve
the final token and put it into the field. Theoretically I can re
To pass a plus sign in a URL parameter, use %2B.
This query:
foo +bar
Looks like this in a URL:
q=foo+%2Bbar
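If you are building the URL from Java, java.net.URLEncoder takes care of
this; roughly:

import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class EncodeQuery {
    public static void main(String[] args) throws UnsupportedEncodingException {
        // Spaces become '+', and a literal '+' becomes %2B.
        System.out.println("q=" + URLEncoder.encode("foo +bar", "UTF-8")); // prints q=foo+%2Bbar
    }
}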
wunder
On 10/22/08 11:52 AM, "Lance Norskog" <[EMAIL PROTECTED]> wrote:
> URI encoding turns a space into a plus, then (maybe) Lucene takes that as a
> space. Also you want a + in
If you want your indexed value changed, you can use an analyzer (either
PatternReplaceFilter or a custom one). If you want the stored value changed,
you can use a custom UpdateRequestProcessor. However, taking care of this in
your application may be easier than bothering with the two particularly i
On Tue, Oct 21, 2008 at 06:57:03AM -0700, prerna07 wrote:
>
> Hi,
>
> On using Facet in solr query I am facing various issues.
>
> Scenario 1:
> I have 11 Index with tag : productIndex
>
> my search query is appended by facet parameters :
> facet=true&facet.field=Index_Type_s&qt=dismaxrequest
Hello,
It looks like we might have lost SolrSharp:
http://wiki.apache.org/solr/SolrSharp
It looks like its home is http://www.codeplex.com/solrsharp , but the site is
no longer available.
Does anyone know its status?
There is also http://code.google.com/p/deveel-solr/ , but this seems brand new
Folks:
I have two instances of Solr running, one on the master (U) and the other on
the slave (Q). Q is used for queries only, while U is where updates/deletes
are done. I am running on Windows, so unfortunately I cannot use the
distribution scripts.
Every N hours when changes are committed
On Oct 22, 2008, at 4:17 PM, Otis Gospodnetic wrote:
Hello,
It looks like we might have lost SolrSharp:
http://wiki.apache.org/solr/SolrSharp
It looks like its home is http://www.codeplex.com/solrsharp , but
the site is no longer available.
Does anyone know its status?
looks like it is
Normally you don't have to restart Q, but only "reload" the Solr searcher when
the index has been copied.
However, you are on Windows, and its FS has the tendency not to let you
delete/overwrite files that another app (Solr/java) has opened. Are you able
to copy the index from U to Q? How are you do
Hi,
I'm using a couple of Solr 1.1 powered indexes and have relied on my "old"
Solr installation for more than two years now. I'm working on a new project
that is a bit more complex than my previous ones, and I thought I'd have a look
at all the new goodies in Solr. One item that caught my attention is
DataImportHandler is only a way to get data into your index, from a
relational database of some sort. It won't affect your Solr reads in
any way - so everything that Solr normally does will still work the
same.
(I have not had a chance to look at it in depth, but searching the
index would
Otis,
Yes, I had forgotten that Windows will not permit me to overwrite files
currently in use. So my copy scripts are failing. Windows will not even
allow a rename of a folder containing a file in use, so I am not sure how to
do this.
I am going to dig around and see what I can come u
If that is the case, you should look at the DataImportHandler examples,
as it can already index RSS; I'm doing it now for about a dozen feeds on
an hourly basis. (This also works for any XML-based feed: XHTML, XML,
etc.) I find Nutch more useful for plain vanilla HTML (something that
was built
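A data-config.xml entity for an RSS 2.0 feed looks roughly like this (the
feed URL and field names are only examples):

<dataConfig>
  <dataSource type="HttpDataSource"/>
  <document>
    <entity name="rss" pk="link" processor="XPathEntityProcessor"
            url="http://example.com/feed.rss" forEach="/rss/channel/item">
      <field column="title" xpath="/rss/channel/item/title"/>
      <field column="link" xpath="/rss/channel/item/link"/>
      <field column="description" xpath="/rss/channel/item/description"/>
    </entity>
  </document>
</dataConfig>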
Thanks, it helped.
We were using *_s fields, which had an analyzer section.
We copied all the fields into another field type and used
this new type for faceting. It is working fine now.
Thanks,
Prerna
prerna07 wrote:
>
> Hi,
>
> On using Facet in solr query I am facing various issues.
>
> Sc
The case in point is DIH. DIH uses the standard DOM parser that comes
with the JDK. If it reads the XML properly, do we need to complain? I guess
that data-config.xml may not be used for any other purposes.
On Wed, Oct 22, 2008 at 10:10 PM, Walter Underwood
<[EMAIL PROTECTED]> wrote:
> On 10/22/08 8:5
If you are using a nightly you can try the new SolrReplication feature
http://wiki.apache.org/solr/SolrReplication
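Roughly, per that wiki page, the master exposes a replication handler in
solrconfig.xml and each slave polls it (host, port and interval here are
only examples):

On the master:
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

On the slave:
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:05:00</str>
  </lst>
</requestHandler>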
On Thu, Oct 23, 2008 at 4:32 AM, William Pierce <[EMAIL PROTECTED]> wrote:
> Otis,
>
> Yes, I had forgotten that Windows will not permit me to overwrite files
> currently in use.