Here is how I did it (the code is from memory so it might not be 100%
correct):
private boolean hasAccents;
private Token filteredToken;

public final Token next() throws IOException {
    if (hasAccents) {
        hasAccents = false;
        return filteredToken;
    }
    Token t = input.next();
    String filte
Firstly, my apologies for being off topic. I'm asking this question because
I think there are some machine learning and text processing experts on this
mailing list.
Basically, my task is to normalize a fairly unstructured set of short texts
using a dictionary. We have a pre-defined list of produc
tml. Looks
interesting but the implementation is in Python. I think they use a
Hidden Markov Model to label training data and then match records
probabilistically.
On Fri, Jun 27, 2008 at 10:12 PM, Grant Ingersoll <[EMAIL PROTECTED]>
wrote:
> below
>
>
>
> On Jun 27, 2008,
Hi Bram,
You can use a filter query (fq) to limit your results:
fq=tag:sometag&q=user_input_here
Have a look at dismax and standard query documentation on the wiki.
On Sun, Jun 29, 2008 at 6:49 PM, Bram de Jong <[EMAIL PROTECTED]> wrote:
> hello all,
>
> I would like to combine the DisMaxReques
Hi all,
Porter stemmer in general is really good. However, there are some cases
where it doesn't work. For example, "accountant" matches "Accountant" as
well as "Account Manager" which isn't desirable. Is it possible to use this
analyser for plural words only? For example:
+Accountant -> accountant
Ok, it looks like step 1a of the Porter algorithm does what I need.
On Mon, Jun 30, 2008 at 6:39 PM, climbingrose <[EMAIL PROTECTED]>
wrote:
> Hi all,
> Porter stemmer in general is really good. However, there are some cases
> where it doesn't work. For example, "accountant"
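A plural-only field type along these lines can be declared in schema.xml — a
sketch, assuming a later Solr release that ships
solr.EnglishMinimalStemFilterFactory (on Solr 1.x you would write a custom
filter factory implementing Porter step 1a instead):

```xml
<!-- Sketch: plural-only ("minimal") stemming field type.
     Assumes solr.EnglishMinimalStemFilterFactory, which is only available
     in later Solr releases; field type name is illustrative. -->
<fieldType name="text_en_plural" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
  </analyzer>
</fieldType>
```

With this, "accountants" and "accountant" match each other, but "accountant"
no longer collapses to "account".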
4:12 AM, Mike Klaas <[EMAIL PROTECTED]> wrote:
> If you find a solution that works well, I encourage you to contribute it
> back to Solr. Plural-only stemming is probably a common need (I've
> definitely wanted to use it before).
>
> cheers,
> -Mike
>
>
> O
temmer ready, write something similar to
EnglishPorterFilterFactory to use it within Solr.
Hope this helps.
Cheers,
Cuong
On Tue, Jul 1, 2008 at 6:07 PM, Guillaume Smet <[EMAIL PROTECTED]>
wrote:
> Hi Cuong,
>
> On Tue, Jul 1, 2008 at 4:45 AM, climbingrose <[EMAIL PROTECTED]>
You do, I think. Have a look at DirectUpdateHandler2 class.
On Thu, Jul 10, 2008 at 9:16 PM, Gudata <[EMAIL PROTECTED]> wrote:
>
> Hi,
> I want (if possible) to dedicate one machine only for indexing and to be
> optimized only for that.
>
> In solrconfig.xml, I have:
> - commented all cache state
Hi all,
Has anyone tried to factor rating/popularity into Solr scoring? For example,
I want documents with more page views to be ranked higher in the search
results. From what I can see, the most difficult thing is that we have to
update the number of page views for each document. With Solr-139, do
, Jul 12, 2008 at 1:58 AM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> See ExternalFileField and BoostedQuery
>
> -Yonik
>
> On Fri, Jul 11, 2008 at 11:47 AM, climbingrose <[EMAIL PROTECTED]>
> wrote:
> > Hi all,
> > Has anyone tried to factor rating/populari
Hi Yonik,
I have had a look at ExternalFileField. However, I couldn't figure out how
to include the externally referenced field in the search results. Also,
sorting on this type of field isn't possible, right?
Thanks.
On Sat, Jul 12, 2008 at 2:28 AM, climbingrose <[EMAIL PROT
Hi all,
I've been trying to return a field of type ExternalFileField in the search
result. Upon examining XMLWriter class, it seems like Solr can't do this out
of the box. Therefore, I've tried to hack Solr to enable this behaviour.
The goal is to call to ExternalFileField.getValueSource(SchemaFie
You would need to modify schema.xml to change these names.
On Thu, Jul 24, 2008 at 8:06 AM, anshuljohri <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> I need to change the field names in schema.xml, e.g. the default names are
> id, sku, name, text etc. But I want to use my own names instead of these names.
> Lets
Hi all,
Has anyone tried to use CollapseFilter with the latest version of Solr in
trunk? It looks like Solr 1.4 doesn't allow calling setFilterList()
and setFilter() on one instance of the QueryCommand. I modified the code in
QueryCommand to allow this:
public QueryCommand setFilterL
works at
> all.
>
> --
> Jeff Newburn
> Software Engineer, Zappos.com
> jnewb...@zappos.com - 702-943-7562
>
>
> > From: climbingrose
> > Reply-To:
> > Date: Fri, 17 Apr 2009 16:53:00 +1000
> > To: solr-user
> > Subject: CollapseFilter with the
Hi all,
I'm puzzling over how to boost a date field in a DisMax query. At the moment,
my qf is "title^5 summary^1". However, what I really want is to allow
documents with the latest "listedDate" to score higher. For example, documents
with listedDate:[NOW-1DAY TO *] have additional score over documen
ECTED]> wrote:
> I think in this case you can use a "bq" (Boost Query) so you can apply
this
> boost to the range you want.
>
> your_date_field:[NOW/DAY-24HOURS TO NOW]^10.0
>
> This example will boost your documents with date within the last 24h.
>
> Regard
Just tried the bq approach and it works beautifully. Exactly what I was
looking for. Still, I'd like to know which approach is preferred? Thanks
again guys.
On 7/20/07, climbingrose <[EMAIL PROTECTED]> wrote:
Thanks for both answers. Which one is better in terms of performanc
Thanks for the answer Chris. The DisMax query handler is just amazing!
On 7/20/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:
: Just tried the bq approach and it works beautifully. Exactly what I was
: looking for. Still, I'd like to know which approach is the preferred?
Thanks
: again guys.
I think I have the same question as Arnaud. For example, my dismax query has
qf=title^5 description^2. Now if I search for "Java developer", I want to
make sure that the results have at least "java" or "developer" in the title.
Is this possible with dismax query?
On 7/30/07, Chris Hostetter <[EMAI
Hi all,
I think there might be something wrong with the date time rounding up. I
tried this query: "q=*:*&fq=listedDate:[NOW/DAY-1DAY TO *]" which I think
should return results since yesterday. So if today is 9th of August, it
should return all results from the 8th of August. However, Solr returns
> > > ignoreCase="true" expand="true"/>
> > If you want this field to be automatical
cker index and "location" part with "location" field in
the index. Otherwise I might have irrelevant suggestions for the "location"
part since the number of terms in "location" is generally much smaller
compared with that of "description". Any ideas?
Thanks
OK, I just need to define 2 spellcheckers in solrconfig.xml for my purpose.
On 8/11/07, climbingrose <[EMAIL PROTECTED]> wrote:
>
> After looking the SpellChecker code, I realised that it only supports
> single-word. I made a very naive modification of SpellCheckerHandler to g
spellchecker:
https://issues.apache.org/jira/browse/LUCENE-550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
.
On 8/11/07, Pieter Berkel <[EMAIL PROTECTED]> wrote:
>
> On 11/08/07, climbingrose <[EMAIL PROTECTED]> wrote:
> >
> > The spellchecker handl
I'm having trouble with the date boosting function as well. I'm using this
function: F = recip(rord(creationDate),1,1000,1000)^10. However, since I have
around 10,000 documents added in one day, rord(createDate) returns very
different values for the same createDate. For example, the last document
added with
Yeah. How stable is the patch, Karl? Is it possible to use it in a production
environment?
On 8/12/07, karl wettin <[EMAIL PROTECTED]> wrote:
>
>
> 11 aug 2007 kl. 10.36 skrev climbingrose:
>
> > There is an issue on
> > Lucene issue tracker regarding mult
We add around 10,000 docs during week days and 5,000 during weekends.
On 8/12/07, Pieter Berkel <[EMAIL PROTECTED]> wrote:
>
> Do you consistently add 10,000 documents to your index every day or does
> the
> number of new documents added per day vary?
>
>
> On 11/
I'm happy to contribute code for the SpellCheckerRequestHandler. I'll post
the code once I strip off stuff related to our product.
On 8/12/07, Pieter Berkel <[EMAIL PROTECTED]> wrote:
>
> <http://issues.apache.org/jira/browse/LUCENE-626>On 11/08/07,
> climbingr
Thanks Karl. I'll check it out!
On 8/18/07, karl wettin <[EMAIL PROTECTED]> wrote:
>
> I updated LUCENE-626 last night. It should now run smooth without
> LUCENE-550, but smoother with.
>
> Perhaps it is something you can use.
>
>
> 12 aug 2007 kl. 14.24 skr
Haven't tried the embedded server but I think I have to agree with Mike.
We're currently sending 2,000-job batches to the Solr server and the amount of
time required to transfer documents over http is insignificant compared with
the time required to index them. So I do think unless you are sending
docum
take up more memory for the StringBuilder to store the much
> > > larger XML. For 10,000 it was much slower. For that size I would
> > > need
> > > to XML streaming or something to make it work.
> > >
> > > The solr war was on the same machine, so network o
I think you can use the CollapseFilter to collapse on "version" field.
However, I think you need to modify the CollapseFilter code to sort by
"version" and get the latest version returned.
On 9/13/07, Adrian Sutton <[EMAIL PROTECTED]> wrote:
>
> Hi all,
> The document's we're indexing are versione
Hi all,
I've been struggling to find a good way to synchronize Solr with a large
number of records. We collect our data from a number of sources and each
source produces around 50,000 docs. Each of these document has a "sourceId"
field indicating the source of the document. Now assuming we're inde
Hi Erik,
>>So in your case #1, documents are reindexed with this scheme - so if you
>>truly need to skip a reindexing for some reason (why, though?) you'll
>>need to come up with some other mechanism. [perhaps update could be
>>enhanced to allow ignoring a duplicate id rather than reindexing?]
I
I don't think you can with the current Solr because each instance runs in a
separate web app.
On 9/25/07, James liu <[EMAIL PROTECTED]> wrote:
>
> if use multi solr with one index, it will cache individually.
>
> so i think can it share their cache.(they have same config)
>
> --
> regards
> jl
>
1)On solr.master:
+Edit scripts.conf:
solr_hostname=localhost
solr_port=8983
rsyncd_port=18983
+Enable and start rsync:
rsyncd-enable; rsyncd-start
+Run snapshooter:
snapshooter
After running this, you should be able to see a new folder named snapshot.*
in data/index folder.
You can configure solrconfig.
ks like a charm. Thanks very much.
> >
> >cheers
> >Y.
> >
> >Original message
> >>Date: Mon, 1 Oct 2007 21:55:30 +1000
> >>De: climbingrose
> >>A: solr-user@lucene.apache.org
> >>Sujet: Re: Solr replication
> >>
I think searching for "*:*" is the optimal way to do it. I don't think you can
do anything faster.
On 10/11/07, Stefan Rinner <[EMAIL PROTECTED]> wrote:
>
> Hi
>
> for some tests I need to know how many documents are stored in the
> index - is there a fast & easy way to retrieve this number (instead
Hi all,
I've been so busy the last few days that I haven't replied to this email. I
modified SpellCheckerHandler a while ago to include support for multiword
query. To be honest, I didn't have time to write unit test for the code.
However, I deployed it in a production environment and it has been wo
viour
configurable.
On 10/11/07, climbingrose <[EMAIL PROTECTED]> wrote:
>
> Hi all,
>
> I've been so busy the last few days so I haven't replied to this email. I
> modified SpellCheckerHandler a while ago to include support for multiword
> query. To be honest, I
The easiest solution I know is:
id:1 OR id:2 OR ...
If you know that all of these ids can be found by issuing a query, you
can do delete by query:
<delete><query>YOUR_DELETE_QUERY_HERE</query></delete>
Cheers
On Nov 19, 2007 4:18 PM, Norberto Meijome <[EMAIL PROTECTED]> wrote:
> Hi everyone,
>
> I'm trying to issue, via curl to
Hi David,
Do you use one of the Solr clients listed at
http://wiki.apache.org/solr/IntegratingSolr? These clients should
handle all the XML parsing for you. I speak from
Solrj experience.
IMO, your approach is probably most commonly used when it comes to
pagination. Solr caching mecha
One approach is to extend SynonymFilter so that it reads synonyms from
database instead of a file. SynonymFilter is just a Java class so you
can do whatever you want with it :D. From what I remember, the filter
initialises a list of all input synonyms and stores them in memory.
Therefore, you need t
bably need to do is to add more parameters
such as database host, username, password and the actual database in
init() method.
On Nov 20, 2007 3:18 PM, climbingrose <[EMAIL PROTECTED]> wrote:
> One approach is to extend SynonymFilter so that it reads synonyms from
> database
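The extended factory described above might then be registered in schema.xml
along these lines — a sketch; the class name and all connection parameters
are hypothetical:

```xml
<!-- Sketch: a hypothetical SynonymFilterFactory subclass that loads
     synonyms from a database in its init() method instead of from a file.
     The class name and every connection parameter here are made up. -->
<filter class="com.example.DatabaseSynonymFilterFactory"
        host="localhost"
        username="solr"
        password="secret"
        database="synonyms"
        ignoreCase="true"
        expand="true"/>
```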
The duplication detection mechanism in Nutch is quite primitive. I
think it uses an MD5 signature generated from the content of a field.
The generation algorithm is described here:
http://lucene.apache.org/nutch/apidocs-0.8.x/org/apache/nutch/crawl/TextProfileSignature.html.
The problem with this a
Make sure you have the JDK installed, not just the JRE. Also try setting the
JAVA_HOME environment variable.
apt-get install sun-java5-jdk
On Nov 21, 2007 5:50 PM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> Phillip,
>
> I won't go into details, but I'll point out that the Java compiler is called
> javac and if mem
Hi Ken,
It's correct that uncommon words are most likely not showing up in the
signature. However, I was trying to say that if two documents have 99%
common tokens and differ in one token with frequency > quantised
frequency, the two resulting hashes are completely different. If you
want true near d
Assuming that you have the timestamp field defined:
q=*:*&sort=timestamp desc
On Nov 23, 2007 10:43 PM, Thorsten Scherler
<[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I need to ask solr to return me the id of the last committed document.
>
> Is there a way to archive this via a standard lucene query o
Hi all,
I'm trying to implement a custom UpdateProcessor which requires access to
SolrIndexSearcher. However, I'm constantly running into "Too many open
files" exception. I'm confused about which is the correct way to get access
to SolrIndexSearcher in UpdateProcessor:
1) req.getSearcher()
2) req
I don't think you have to. Just try the query on the REST interface and you
will know.
On Dec 5, 2007 9:56 AM, Kasi Sankaralingam <[EMAIL PROTECTED]> wrote:
> Do I need to select the fields in the query that I am trying to sort on?,
> for example if I want sort on update date then do I need to se
Hi Ryan,
I'm using Solr with Maven 2 in our project. Here is what my pom.xml looks
like:
<dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr-solrj</artifactId>
  <version>1.3.0</version>
</dependency>
Since I have all solrj dependencies declared by other artifacts, I don't
need to declare any of solrj dependenci
I think there is an event listener interface for hooking into Solr events
such as post commit, post optimise and opening a new searcher. I can't
remember it off the top of my head, but if you do a search for *EventListener
in Eclipse, you'll find it.
The Wiki shows how to trigger snapshooter after each commit and
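The hook the Wiki describes is a listener in solrconfig.xml; a sketch (the
dir value depends on where the Solr bin scripts live in your installation):

```xml
<!-- Sketch: run snapshooter after every commit via the event-listener
     interface mentioned above. Paths are installation-specific. -->
<listener event="postCommit" class="solr.RunExecutableListener">
  <str name="exe">snapshooter</str>
  <str name="dir">solr/bin</str>
  <bool name="wait">true</bool>
</listener>
```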
Make sure that the user running Solr has permission to execute snapshooter.
Also, try ./snapshooter instead of snapshooter.
Good luck.
On Dec 18, 2007 10:57 AM, Sunny Bassan <[EMAIL PROTECTED]> wrote:
> I've set up solrconfig.xml to create a snap shot of an index after doing
> a optimize, but th
Good day all Solr users & developers,
May I wish you and your family a merry Xmas and happy new year. Hope that
new year brings you all health, wealth and peace. It's been my pleasure to
be on this mailing list and working with Solr. Thank you all!
--
Cheers,
Cuong Hoang
Hi all,
Here is my situation:
I'm implementing some geographical search functions that allow users to
search for documents close to a location. Because not all documents have
proper location information that can be converted to a (latitude, longitude)
coordinate, I also have to use normal full text
I don't think they (the Solr developers) have a time frame for the 1.3 release.
However, I've been using the latest code from trunk and I can tell you
it's quite stable. The only problem is that the documentation sometimes doesn't
cover the latest changes in the code. You'll probably have to dig into the code
I'm using code pulled directly from Subversion.
On Jan 21, 2008 12:34 PM, anuvenk <[EMAIL PROTECTED]> wrote:
>
> Thanks. Would this be the latest code from the trunk that you mentioned?
> http://people.apache.org/builds/lucene/solr/nightly/solr-2008-01-19.zip
>
>
>
Hi guys,
I'm running into some problems with accented (UTF-8) languages. I'd love to
hear some ideas about how to use Solr with those languages. Basically, I
want to achieve what Google does with UTF-8 languages.
My requirements including:
1) Accent insensitive search and proper highlighting:
For ex
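For requirement 1), one common approach is to fold accents away at both index
and query time; a sketch, assuming Solr's ISOLatin1AccentFilterFactory (which
only covers Latin-1 accents — other scripts would need a custom filter like
the one earlier in this thread):

```xml
<!-- Sketch: accent-insensitive text field using Latin-1 accent folding.
     Only handles Latin-1 accents; the field type name is illustrative. -->
<fieldType name="text_folded" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
  </analyzer>
</fieldType>
```

Because the same folding runs at index and query time, accented and
unaccented queries match the same documents, while the stored (accented) text
is what gets highlighted.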
>
> Peter
>
> Peter Binkley
> Digital Initiatives Technology Librarian
> Information Technology Services
> 4-30 Cameron Library
> University of Alberta Libraries
> Edmonton, Alberta
> Canada T6G 2J8
> Phone: (780) 492-3743
> Fax: (780) 492-9243
> e-mail: [EMAIL PROTEC
Hi all,
I thought many people would encounter the situation I'm having here.
Basically, we'd like to have a PhraseQuery with "minimum should match"
property similar to BooleanQuery. Consider the query "Senior Java
Developer":
1) I'd like to do a PhraseQuery on "Senior Java Developer" with a slop
Thanks Chris. I probably have to repost this on the Lucene mailing list.
On Sun, Mar 23, 2008 at 9:49 AM, Chris Hostetter <[EMAIL PROTECTED]>
wrote:
>
> the topic has come up before on the lucene java lists (allthough i can't
> think of any good search terms to find the old threads .. I can't reall
Agree. I've been using Solrj on a production site for 9 months without any
problems at all. You should probably give it a try instead of dealing with
all those low-level details.
On Sun, May 11, 2008 at 4:14 AM, Chris Hostetter <[EMAIL PROTECTED]>
wrote:
>
> : please post a snippet of Java code to add
Probably the easiest way to do this is keep track of the number of items
yourself then retrieve it later on.
On Wed, May 21, 2008 at 7:57 AM, Brian Whitman <[EMAIL PROTECTED]>
wrote:
> Any way to query how many items are in a multivalued field? (Or use a
> functionquery against that # or anything
Hi Matthias,
How would you prevent Solr server from being exposed to outside world with
this javascript client? I prefer running Solr behind firewall and access it
from server side code.
Cheers.
On Mon, May 26, 2008 at 7:27 AM, Matthias Epheser <[EMAIL PROTECTED]>
wrote:
> Hi users,
>
> As init
Hi all,
I'm trying to implement "sponsored results" in Solr search results similar
to that of Google. We index products from various sites and would like to
allow certain sites to promote their products. My approach is to query a
slave instance to get sponsored results for user queries in addition
n help you but I will expose it and let you decide.
>
> I have an index containing product entries, where I created a field called
> sponsored words. What I do is boost this field, so when these words are
> matched in the query, those products appear first in my results.
>
> 2008/
Hi Sachit,
I think what you could do is to create all the "core fields" of your models
such as username, role, title, body, images... You can name them with prefixes
like user.username, user.role, article.title, article.body... If you want to
dynamically add more fields to your schema, you can use d
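The per-model prefixes can be paired with dynamic fields in schema.xml; a
sketch (the field names and the "string"/"text" types are illustrative):

```xml
<!-- Sketch: catch-all dynamic fields keyed by model prefix, so new
     user.* or article.* fields need no schema change. Names and types
     are illustrative. -->
<dynamicField name="user.*"    type="string" indexed="true" stored="true"/>
<dynamicField name="article.*" type="text"   indexed="true" stored="true"/>
```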
It depends on your query. The second query is better if you know that the
fieldb:bar filter query will be reused often, since it will be cached
separately from the main query. The first query occupies one cache entry while
the second one occupies two cache entries, one in queryCache and one in
filteredCa
Just to correct myself: as per the last sentence, the first query is better
if fieldb:bar isn't reused often.
On Thu, Jun 12, 2008 at 2:02 PM, climbingrose <[EMAIL PROTECTED]>
wrote:
> It depends on your query. The second query is better if you know that
> fieldb:bar filtered query wil
Hi all,
I've been watching the development of Solr over the last few months. I've
started building a mobile phone shop with faceted browsing to allow users to
filter the catalogue in a friendly manner. Since the site will probably expand
in a year or two, I need some advice regarding the design and impl
Because the mobile phone info has many fields (>40), I don't want to
repeatedly submit it to Solr.
On 9/13/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
On 9/12/06, climbingrose <[EMAIL PROTECTED]> wrote:
> Obviously, I need to publish the mobile phone
> catalog
I probably need to visualise my models:
MobileInfo (1)(1...*) SellingItem
MobileInfo has many fields to describe the characteristics of a mobile phone
model (color, size..). SellingItem is an "instance" of MobileInfo that is
currently sold by a user. So in the
Hi all,
Am I right that we can only have one schema per solr server? If so, how
would you deal with the issue of submitting completely different data models
(such as clothes and cars)?
Thanks.
--
Regards,
Cuong Hoang
Hi all,
Is it true that Solr is mainly used for applications that rarely change the
underlying data? As I understand, if you submit new data or modify existing
data on Solr server, you would have to "refresh" the cache somehow to
display the updated data. If my application frequently gets new dat
Hi all,
I'm developing an application that potentially creates thousands of dynamic
fields. Does anyone know if a large number of dynamic fields will degrade
Solr performance?
Thanks.
--
Regards,
Cuong Hoang
Thanks Yonik. I think both of the conditions hold true for our application
;).
On 3/27/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:
On 3/26/07, climbingrose <[EMAIL PROTECTED]> wrote:
> I'm developing an application that potentially creates thousands of
dynamic
> field
Coincidentally, I have a very similar use case. Thanks for the advice.
On 7/8/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:
On 7/7/07, Brian Whitman <[EMAIL PROTECTED]> wrote:
> I have been trying to plan out a history function for Solr. When I
> update a document with an existing unique key, I would li
Hi Tristan,
Is this spellchecker available in the 1.2 release, or do I have to build trunk?
I tried your instructions but Solr returns nothing:
http://localhost:8984/solr/select/?q=title_text:java&qt=spellchecker&cmd=rebuild
Result:
0
3
rebuild
Thanks.
On 7/8/07, Tristan Vittorio <[EMAIL P
ttp://wiki.apache.org/solr/SpellCheckerRequestHandler
cheers,
Tristan
On 7/9/07, climbingrose <[EMAIL PROTECTED]> wrote:
>
> Hi Tristan,
>
> Is this spellchecker available in 1.2 release or I have to build the
> trunk.
> I tried your instructions but Solr returns nothing:
&
Hi all,
I've been using Solr for the last few projects and the experience has been
great. I'll post the link to the website once it's finished. Just have a few
questions regarding synonyms and parameters encoding:
1) Are multi-word synonyms possible now in Solr? For example, can I have
things like
Hi all,
My facet browsing performance has been decent on my system until I added my
custom Analyser. Initially, I faceted on the "title" field, which is of the
default string type (no analysers, tokenisers...), and got quick responses
(the first query is just under 1s, subsequent queries are < 0.1s). I created a c
the fieldCache except
for large facets)
2) expand the size of the fieldcache to 100 if you have the memory
Optimizing your index should also speed up faceting (but that is a lot
of facets).
-Yonik
On 7/16/07, climbingrose <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> My facet b
7, Yonik Seeley <[EMAIL PROTECTED]> wrote:
On 7/16/07, climbingrose <[EMAIL PROTECTED]> wrote:
> Thanks Yonik. In my case, there is only one "title" field per document
so is
> there a way to force Solr to work the old way? My analyser doesn't break
up
> the &
Thanks for the suggestion Chris. I modified SimpleFacets to check for
[f.foo.]facet.field.type==(single|multi)
and the performance has been improved significantly.
On 7/17/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:
: > ...but i don't understand why both checking isTokenized() ...
shouldn't