Hi there,
You should use LowerCaseTokenizerFactory as you point out yourself. As far
as I know, the StandardTokenizer "recognizes email addresses and internet
hostnames as one token". In your case, I guess you want an email, say
"[EMAIL PROTECTED]" to be split into four tokens: average joe
Thanks for the quick reply!
It is supposed to work a little like the Google Suggest or field
autocompletion.
I know I mentioned email and userid, but the problem lies with the name
field, because of the whitespaces in combination with the wildcard.
I looked at the solr.WordDelimiterFilterFactor
Glen,
The thing is, Solr has a database integration built-in with the new
DataImportHandler. So I'm not sure how much interest Solr users
would have in LuSql by itself.
Maybe there are LuSql features that DIH could borrow from? Or vice
versa?
Erik
On Nov 17, 2008, at 11:03
Hi Guys
I have timestamp fields in my database in the format,
ddmmyyhhmmss.Z AM
eg: 26-05-08 10:45:53.66100 AM
But I think that since the Solr date format is different, I am unable to
index the document with solr.DateField.
So is there any option by which I can give my timestamp format
How are you indexing the data ? by posting xml? or using DIH?
On Tue, Nov 18, 2008 at 3:53 PM, con <[EMAIL PROTECTED]> wrote:
>
> Hi Guys
> I have timestamp fields in my database in the format,
> ddmmyyhhmmss.Z AM
> eg: 26-05-08 10:45:53.66100 AM
> But I think the since the solr date form
Hey there, I've been testing and checking the source of the
TextProfileSignature.java to avoid similar entries at indexing time.
What I understood is that it is useful for huge text where the frequency of
the tokens (the words in lowercase, with just numbers and letters in that case)
is important. If
Ah, okay!
Well, then I suggest you index the field in two different ways if you want
both possible ways of searching. One, where you treat the entire name as
one token (in lowercase) (then you can search for avera* and match on for
instance "average joe" etc.) And then another field where yo
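A minimal sketch of the two-field idea, written in plain Python rather than as Solr configuration. The field names `name_exact` and `name_words` and both helper functions are hypothetical and only illustrate why each field answers a different kind of prefix query; in Solr the same effect would presumably come from two field types in schema.xml.

```python
def index_name(name):
    """Build two hypothetical index fields for one name value."""
    lowered = name.lower()
    return {
        "name_exact": [lowered],        # the whole name as a single lowercase token
        "name_words": lowered.split(),  # one lowercase token per word
    }


def prefix_match(doc, field, prefix):
    """Return True if any token in the field starts with the lowercased prefix."""
    prefix = prefix.lower()
    return any(token.startswith(prefix) for token in doc[field])


doc = index_name("Average Joe")
print(prefix_match(doc, "name_exact", "avera"))      # prefix of the whole name
print(prefix_match(doc, "name_exact", "average j"))  # prefix that spans the space
print(prefix_match(doc, "name_words", "jo"))         # prefix of the second word only
```

The single-token field handles prefixes that contain whitespace, while the per-word field handles prefixes that start in the middle of the name.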
Have you tried the tuning params for TextProfileSignature? I probably
have to update the dedupe wiki.
You can set the quantRate and the minTokenLength. Those are the
variable names, and you set them right alongside signatureClass,
signatureField, fields, etc.
Whether or not you can tune it to me
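To give a feel for what the two parameters influence, here is a rough Python sketch of a frequency-profile signature. It is an approximation written for illustration, not the TextProfileSignature source: the function name, the default values, the tie-breaking order, and the use of MD5 here are all assumptions.

```python
import hashlib
import re
from collections import Counter


def text_profile_signature(text, quant_rate=0.01, min_token_len=2):
    """Hash a quantized token-frequency profile (illustrative approximation)."""
    # Keep lowercase runs of letters and digits, dropping very short tokens.
    tokens = [t for t in re.findall(r"[a-z0-9]+", text.lower())
              if len(t) >= min_token_len]
    counts = Counter(tokens)
    if not counts:
        return hashlib.md5(b"").hexdigest()

    # The quantization step grows with the frequency of the most common token.
    max_freq = max(counts.values())
    quant = round(max_freq * quant_rate)
    if quant < 2:
        quant = 2 if max_freq > 1 else 1

    # Round each frequency down to a multiple of quant; rare tokens fall out.
    profile = []
    for token, freq in counts.items():
        quantized = (freq // quant) * quant
        if quantized >= quant:
            profile.append((token, quantized))

    # Sort by frequency, then token, so the hash input is deterministic.
    profile.sort(key=lambda item: (-item[1], item[0]))
    joined = "\n".join(f"{token} {freq}" for token, freq in profile)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


a = text_profile_signature("solr solr solr lucene lucene index")
b = text_profile_signature("solr solr solr lucene lucene search")
c = text_profile_signature("tomcat tomcat jetty jetty")
print(a == b, a == c)
```

In this sketch the first two texts differ only in a word that occurs once, so it is quantized away and both get the same signature. A larger quant rate or minimum token length makes the signature more tolerant of small differences.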
I have my own duplication system to detect that but I use String
comparison
so it works really slow...
What are you doing for the String comparison? Not exact right?
Hello,
I have some questions regarding the use of the EmbeddedSolrServer in order to
embed a solr instance into a Java application.
1°) Is an instance of the EmbeddedSolrServer class thread-safe when used by
several concurrent threads?
2°) Regarding transactions, can an instance of the Embedd
>>
>> I have my own duplication system to detect that but I use String
>> comparison
>> so it works really slow...
>>
What are you doing for the String comparison? Not exact right?
hey,
My comparison method looks for similar (not just exact)... what I do is to
compare two text word to word. Wh
Marc Sturlese wrote:
Hey there, I've been testing and checking the source of the
TextProfileSignature.java to avoid similar entries at indexing time.
What I understood is that it is useful for huge text where the frequency of
the tokens (the words in lowercase, with just numbers and letters in that
Erik,
Right now there is no real abstraction like DIH in LuSql. But as
indicated in the TODO section of the documentation, I was planning on
implementing or straight borrowing DIH in the near future.
I am assuming that Solr is all multi-threaded & as performant as it
can be. Is there a test SQL d
Hi Glen,
There is an issue open for making DIH API friendly. Take a look and let us
know what you think.
https://issues.apache.org/jira/browse/SOLR-853
On Tue, Nov 18, 2008 at 8:26 PM, Glen Newton <[EMAIL PROTECTED]> wrote:
> Erik,
>
> Right now there is no real abstraction like DIH in LuSql. B
Has anyone else experienced a deadlock when the DirectUpdateHandler2
does an autocommit?
I'm using a recent snapshot from hudson (apache-
solr-2008-11-12_08-06-21), and quite often when I'm loading data the
server (tomcat 6) gets stuck at line 469 of DirectUpdateHandler2:
// Check if t
Toby Cole wrote:
Has anyone else experienced a deadlock when the DirectUpdateHandler2
does an autocommit?
I'm using a recent snapshot from hudson
(apache-solr-2008-11-12_08-06-21), and quite often when I'm loading
data the server (tomcat 6) gets stuck at line 469 of
DirectUpdateHandler2:
Better yet, does anyone know where the method that writes the score lives?
For instance, a getScore() method that writes the score out that I could
override and truncate? Thanks!
-Derek
On Mon, Nov 17, 2008 at 9:59 PM, Derek Springer <[EMAIL PROTECTED]> wrote:
> Thanks for the heads up. Can anyo
Marc Sturlese wrote:
Hey there, I've been testing and checking the source of the
TextProfileSignature.java to avoid similar entries at indexing time.
What I understood is that it is useful for huge text where the frequency of
the tokens (the words in lowercase, with just numbers and letters in that
Mark Miller wrote:
Toby Cole wrote:
Has anyone else experienced a deadlock when the DirectUpdateHandler2
does an autocommit?
I'm using a recent snapshot from hudson
(apache-solr-2008-11-12_08-06-21), and quite often when I'm loading
data the server (tomcat 6) gets stuck at line 469 of
DirectU
I'm using Perl LWP which has a default 30 sec timeout on the http
request. I can set it to a larger number like 24 hours :-) I guess.
How do you set your timeout?
Phil
Lance Norskog wrote:
The 'optimize' http command blocks. If you script your automation, you can
just call the http and then
Hi Noble,
I am using DIH.
Noble Paul നോബിള് नोब्ळ् wrote:
>
> How are you indexing the data ? by posting xml? or using DIH?
>
>
> On Tue, Nov 18, 2008 at 3:53 PM, con <[EMAIL PROTECTED]> wrote:
>>
>> Hi Guys
>> I have timestamp fields in my database in the format,
>> ddmmyyhhmmss.Z AM
Take a look at the DateFormatTransformer. You can find documentation on the
DataImportHandler wiki.
http://wiki.apache.org/solr/DataImportHandler
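For reference, the conversion that has to happen can be sketched in plain Python. This is not the DateFormatTransformer itself; the function name is hypothetical, the day-month-year order is taken from the "ddmmyy" description in the question, and treating the value as UTC is an assumption. The target is the ISO 8601 form that solr.DateField accepts, such as 1995-12-31T23:59:59Z.

```python
from datetime import datetime

# Pattern for values like "26-05-08 10:45:53.66100 AM" (12-hour clock).
SOURCE_FORMAT = "%d-%m-%y %I:%M:%S.%f %p"


def to_solr_date(raw):
    """Convert the database timestamp string to Solr's date syntax."""
    parsed = datetime.strptime(raw, SOURCE_FORMAT)
    millis = parsed.microsecond // 1000
    # Assumption: the stored time is already UTC, so "Z" is appended as-is.
    return parsed.strftime("%Y-%m-%dT%H:%M:%S") + f".{millis:03d}Z"


print(to_solr_date("26-05-08 10:45:53.66100 AM"))
print(to_solr_date("22-10-08 03:57:11.63700 PM"))
```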
On Tue, Nov 18, 2008 at 10:41 PM, con <[EMAIL PROTECTED]> wrote:
>
>
> Hi Noble,
> I am using DIH.
>
>
>
> Noble Paul നോബിള് नोब्ळ् wrote:
> >
> > Ho
: Is there a way to specify sort criteria through Solr admin ui. I tried
: doing it through the query statement box but it did not work.
the search box on the admin gui is fairly limited ... it's just a quick
and dirty way to run test queries. other options like sorting, filtering, and
faceting n
You don't need to hack the code since you can virtually treat these
scores 2.3518934 and 2.2173865 as if they were both equal (ignoring
digits after the decimal point).
Score = original score(2.3518934) + function(date_created)
You can scale the value of function(date_created) so that digits af
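One possible reading of this idea, sketched in Python outside of Solr: ignore the digits after the decimal point of the relevance score and let a date-based value below 1 break the ties. The function name and the linear recency scaling are assumptions made for illustration; how function(date_created) would really be scaled is not shown here.

```python
import math
from datetime import datetime


def combined_score(score, created, oldest, newest):
    """Integer part of the relevance score plus a recency fraction below 1."""
    span = (newest - oldest).total_seconds()
    recency = 0.0 if span <= 0 else (created - oldest).total_seconds() / span
    # Cap the fraction so it can never change the integer part.
    return math.floor(score) + min(max(recency, 0.0), 0.999999)


oldest = datetime(2008, 1, 1)
newest = datetime(2008, 12, 31)
older_doc = combined_score(2.3518934, datetime(2008, 3, 1), oldest, newest)
newer_doc = combined_score(2.2173865, datetime(2008, 11, 1), oldest, newest)
better_doc = combined_score(3.01, oldest, oldest, newest)
print(older_doc, newer_doc, better_doc)
```

Both example scores fall into the same integer bucket, so the more recent document ranks first, while a document with a clearly higher score still wins regardless of its date.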
I am using EmbeddedSolrServer and simply have a queue that documents
are sent to, and a listener on that queue that writes them to the
index.
Or just keep it simple and put a synchronized block around the
method in the write server that writes the document to the index.
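The queue-plus-listener pattern can be sketched generically in Python. Nothing here touches Solr: the list named index is a stand-in for the real index, and the function and variable names are hypothetical. The point is only that many threads may enqueue documents while a single thread writes them.

```python
import queue
import threading


def index_with_single_writer(batches):
    """Several producer threads enqueue documents; one writer thread stores them."""
    pending = queue.Queue()
    index = []  # stand-in for the real index; touched only by the writer thread

    def writer():
        while True:
            doc = pending.get()
            if doc is None:  # sentinel: no more documents will arrive
                break
            index.append(doc)

    def producer(batch):
        for doc in batch:
            pending.put(doc)

    writer_thread = threading.Thread(target=writer)
    writer_thread.start()
    producers = [threading.Thread(target=producer, args=(b,)) for b in batches]
    for thread in producers:
        thread.start()
    for thread in producers:
        thread.join()
    pending.put(None)  # all producers are done, so tell the writer to stop
    writer_thread.join()
    return index


written = index_with_single_writer([["doc1", "doc2"], ["doc3"], ["doc4", "doc5"]])
print(sorted(written))
```

The simpler alternative, a lock around the write method, serializes writes just as well but makes each caller wait for the write to finish.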
Jeryl Cook
/^\ Pharaoh /
Hi,
I assume there is a schema definition or DTD for XML response but could not
find it anywhere.
Is there one?
thanks
-Simon
--
View this message in context:
http://www.nabble.com/Is-there-a-DTD-XSD-for-XML-response--tp20565773p20565773.html
Sent from the Solr - User mailing list archiv
Does anyone know if the solr-ruby gem is compatible with Solr 1.3?
Also, is anyone using the acts_as_solr plugin? Of late the website is down and
I can't find any recent activity on it.
-Raghu
On Nov 18, 2008, at 2:41 PM, Kashyap, Raghu wrote:
Does anyone know if the solr-ruby gem is compatible with Solr 1.3?
Yes, the gem at rubyforge is compatible with 1.3. Also, the library
itself is distributed with the binary release of Solr, in client/ruby/
solr-ruby/lib
Also anyone using ac
I've been using solr-ruby with 1.3 for quite a while now. It's powering our
"experimental", open-source OPAC, "Blacklight":
blacklight.rubyforge.org
I've got a custom query builder and response wrapper, but it's using
solr-ruby underneath.
Matt
On Tue, Nov 18, 2008 at 2:57 PM, Erik Hatcher <[EM
On 18-Nov-08, at 8:54 AM, Mark Miller wrote:
Mark Miller wrote:
Toby Cole wrote:
Has anyone else experienced a deadlock when the
DirectUpdateHandler2 does an autocommit?
I'm using a recent snapshot from hudson (apache-
solr-2008-11-12_08-06-21), and quite often when I'm loading data
the s
Mike Klaas wrote:
autoCommitCount is written in a CommitTracker.synchronized block
only. It is read to print stats in an unsynchronized fashion, which
perhaps could be fixed, though I can't see how it could cause a problem
lastAddedTime is only written in a call path within a
DirectUpdate
On 18 Nov 2008, at 20:18, Mark Miller wrote:
Mike Klaas wrote:
autoCommitCount is written in a CommitTracker.synchronized block
only. It is read to print stats in an unsynchronized fashion,
which perhaps could be fixed, though I can't see how it could cause
a problem
lastAddedTime is
Hello,
We are working with a very large index and with large documents (300+
page books.) It appears that the bottleneck on our system is the disk
IO involved in reading position information from the prx file for
commonly occurring terms.
An example slow query is "the new economics".
To pr
Very cool :-)
Both suggestions work fine! But only with solr version 1.4:
https://issues.apache.org/jira/browse/SOLR-823
Use a nightly build (e.g. 2008-11-17 works):
http://people.apache.org/builds/lucene/solr/nightly/
See below for examples for both solutions...
((( 1 )))
> There may be one
Rather than attempt an answer to your questions directly, I'll mention
how other projects have dealt with the very-common-word issue. Nutch,
for example, has a list of high frequency terms and concatenates them
with the successive word in order to form less-frequent aggregate
terms. The o
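A small Python sketch of the concatenation idea described above. It is not Nutch's actual analysis code; the function name, the underscore joiner, and the tiny common-term list are assumptions for illustration.

```python
def common_grams(tokens, common_terms, joiner="_"):
    """Join each common term with the word that follows it."""
    result = []
    for i, token in enumerate(tokens):
        following = tokens[i + 1] if i + 1 < len(tokens) else None
        if token in common_terms and following is not None:
            # A very frequent term becomes a rarer aggregate term.
            result.append(token + joiner + following)
        else:
            result.append(token)
    return result


terms = common_grams("the new economics".split(), {"the", "of", "and"})
print(terms)  # ['the_new', 'new', 'economics']
```

A phrase query then looks up the much shorter postings for the aggregate term instead of reading position data for the common word alone.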
On 18-Nov-08, at 12:18 PM, Mark Miller wrote:
Mike Klaas wrote:
autoCommitCount is written in a CommitTracker.synchronized block
only. It is read to print stats in an unsynchronized fashion,
which perhaps could be fixed, though I can't see how it could cause
a problem
lastAddedTime i
Yes, I've found it.
Do you want my comments here or in solr-dev or on jira?
Glen
2008/11/18 Shalin Shekhar Mangar <[EMAIL PROTECTED]>:
> Hi Glen,
>
> There is an issue open for making DIH API friendly. Take a look and let us
> know what you think.
>
> https://issues.apache.org/jira/browse/SOLR-
On 18-Nov-08, at 6:56 AM, Glen Newton wrote:
Erik,
Right now there is no real abstraction like DIH in LuSql. But as
indicated in the TODO section of the documentation, I was planning on
implementing or straight borrowing DIH in the near future.
I am assuming that Solr is all multi-threaded & a
Was wondering if anyone can fill me in on the when and why I would set
waitFlush and waitSearcher to false when sending a commit command? I
think I understand what they do technically (I've looked at the code),
but I am not clear about why I would want to do it. Is there a risk
in setting
waitFlush I'm not sure...
waitSearcher=true: it will wait until a new searcher is opened after
your commit, that way the client is guaranteed to have the results
that were just sent in the index. If waitSearcher=false, a query could
hit a searcher that does not have the new documents in the
Does waitFlush do anything now? I only see it being set if eclipse is
not missing a reference...
Ryan McKinley wrote:
waitFlush I'm not sure...
waitSearcher=true it will wait until a new searcher is opened after
your commit, that way the client is guaranteed to have the results
that were ju
Hi Glen ,
You can post all the queries first on solr-dev and all the valid ones
can be moved to JIRA
thanks,
Noble
On Wed, Nov 19, 2008 at 3:26 AM, Glen Newton <[EMAIL PROTECTED]> wrote:
> Yes, I've found it.
>
> Do you want my comments here or in solr-dev or on jira?
>
> Glen
>
> 2008/11/18 Sha
Thanks gistolero.
I have added this to the FAQ
http://wiki.apache.org/solr/DataImportHandlerFaq
On Wed, Nov 19, 2008 at 2:34 AM, <[EMAIL PROTECTED]> wrote:
> Very cool :-)
>
> Both suggestions work fine! But only with solr version 1.4:
> https://issues.apache.org/jira/browse/SOLR-823
>
> Use a ni
That explains true, but what about false? Why would I ever set it to
false? If I don't wait, how will I ever know when the new searcher
is ready?
On Nov 18, 2008, at 10:27 PM, Ryan McKinley wrote:
waitFlush I'm not sure...
waitSearcher=true it will wait until a new searcher is opened a
I am using waitSearcher=false with a crawler. The crawling thread
finishes a set of stuff, and calls . It does not want to
search, it gets back to crawling ASAP
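For illustration, a commit message with these flags can be built like this in Python. The helper name is hypothetical and the sketch only constructs the XML update message; posting it to the update handler is left out.

```python
import xml.etree.ElementTree as ET


def build_commit(wait_flush=True, wait_searcher=True):
    """Build a Solr XML commit message with the two wait flags set explicitly."""
    commit = ET.Element("commit")
    commit.set("waitFlush", str(wait_flush).lower())
    commit.set("waitSearcher", str(wait_searcher).lower())
    return ET.tostring(commit, encoding="unicode")


# A crawler that does not search right away can skip waiting for the new searcher.
message = build_commit(wait_searcher=False)
print(message)
```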
On Nov 18, 2008, at 11:35 PM, Grant Ingersoll wrote:
That explains true, but what about false? Why would I ever set it
to fal
Hi
Thanks for your quick reply Shalin
I have updated my data-config like:
This is an example of the date in my database: 22-10-08 03:57:11.63700
PM
In th
Do you have a stacktrace?
On Wed, Nov 19, 2008 at 10:24 AM, con <[EMAIL PROTECTED]> wrote:
>
> Hi
> Thanks for your quick reply Shalin
>
> I have updated my data-config like:
> transformer="TemplateTransformer,DateFormatTransformer" pk="EMP_ID"
> query="select EMP_ID, CREATED_DATE, CUST_ID FROM E
Hi Shalin
Please find the log data.
10:18:30,819 ERROR [STDERR] 19 Nov, 2008 10:18:30 AM
org.apache.solr.servlet.SolrDispatchFilter init
INFO: SolrDispatchFilter.init()
10:18:30,838 ERROR [STDERR] 19 Nov, 2008 10:18:30 AM
org.apache.solr.core.SolrResourceLoader locateInstanceDir
INFO: No /solr/ho
Hi Noble
I have cross checked. This is my copy field of schema.xml
I am still getting that error.
thanks
con
Noble Paul നോബിള് नोब्ळ् wrote:
>
> your copyField has the wrong source field name. The field name is not
> "date" it is 'CREATED_DATE'
>
> On Wed, Nov 19, 2008 at 11:49 AM, co
nope... solr does not have a DTD.
On Nov 18, 2008, at 1:44 PM, Simon Hu wrote:
Hi,
I assume there is a schema definition or DTD for XML response but
could not
find it anywhere.
Is there one?
thanks
-Simon