Hi all,
I'm implementing Solr for a course and book search service for college
students, and I'm running into some issues with the highlighting plugin.
After a few minutes of tinkering, searching on Google, searching the group
archives and not finding anything, I thought I would see if anyone else
> I'm interested in what the part in the query 'q={!.}' is. Is that a filter
> query?
It is local params syntax: http://wiki.apache.org/solr/LocalParams
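For example, local params look something like this (a sketch; the parser names are standard Solr query parsers, but the field and term values here are just illustrations, not the query from the thread):

```text
q={!dismax qf=title}history        switches the main query to the dismax parser inline
fq={!term f=category}textbooks     parses the filter value with the raw term parser
```

The `{!...}` prefix sets per-query options (parser type, parameters) right inside the q or fq parameter value.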
Awesome, thank you so much! That did the trick.
On Mon, Jan 10, 2011 at 10:02 PM, Ahmet Arslan wrote:
Replacing EdgeNGramTokenizerFactory with a WhitespaceTokenizer + EdgeNGramFilterFactory
combination should solve your problem, preserving your search within words.
Searching "histo" will return: African American History
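A field type along these lines is what is being suggested (a sketch on my part; the factory class names are standard Solr analysis factories, but the type name and minGramSize are assumptions — the maxGramSize of 114 is the value quoted in the thread):

```xml
<!-- sketch: tokenize on whitespace, then build edge n-grams at index time only,
     so "histo" matches the indexed prefix grams of "History" -->
<fieldType name="text_prefix" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="114"/>
  </analyzer>
  <analyzer type="query">
    <!-- no n-gramming at query time: the user's term is matched as typed -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Applying the n-grams in a filter after a whitespace tokenizer (rather than in the tokenizer itself) keeps word boundaries intact, which also tends to play better with highlighting.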
--- On Tue, 1/11/11, Dan Loewenherz wrote:
> From: Dan Loewenherz
> Subject: Re: Solr highlighting is botching
On Mon, Jan 10, 2011 at 9:19 PM, Ahmet Arslan wrote:
Not sure about your Solr version, but it can probably be
https://issues.apache.org/jira/browse/LUCENE-2266
Is there a special reason for using EdgeNGramTokenizerFactory?
Replacing this tokenizer with WhitespaceTokenizer should solve this.
Or upgrade your Solr version.
And I don't see either in your
I am switching between building the query to a Solr instance by hand and doing
it with the PHP Solr Extension.
I have this query that my dev partner said to insert before all the other
column searches. What kind of query is it, and how do I get it into the query
in an 'OOP' style using the PHP Solr
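For what it's worth, with the PECL Solr extension the query is built through SolrQuery/SolrClient objects, roughly like this (a sketch; the host/port/path options and the query strings are placeholders, not the actual query from the thread):

```php
<?php
// sketch: 'OOP'-style query building with the PECL Solr extension
$client = new SolrClient(array(
    'hostname' => 'localhost',   // placeholder connection settings
    'port'     => 8983,
    'path'     => '/solr',
));

$query = new SolrQuery();
// a local-params prefix goes inside the main query string itself
$query->setQuery('{!dismax qf=title}history');
// filter queries are added separately, one call per fq
$query->addFilterQuery('in_stock:true');
$query->setStart(0);
$query->setRows(10);

$response = $client->query($query);
$results  = $response->getResponse();
```

The local params `{!...}` part is not a filter query; it travels inside the q parameter, so with the extension you pass it through setQuery() rather than addFilterQuery().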
Are there any chatrooms or ICQ rooms for asking questions late at night, for
people who stay up or are on the other side of the planet?
Dennis Gearon
Signature Warning
It is always a good idea to learn from your own mistakes. It is usually a
better idea to learn from others' mistakes, so
On Mon, Jan 10, 2011 at 6:51 PM, Ahmet Arslan wrote:
> > That's really strange. Can you provide us the field type
> > definition of the text field, the full search URL that caused
> > that output, and the Solr version.
> >
>
> Also, did you enable term vectors on text field?
>
>
Not sure what those are, s
On Mon, Jan 10, 2011 at 6:48 PM, Ahmet Arslan wrote:
> That's really strange. Can you provide us the field type definition of the text
> field, the full search URL that caused that output, and the Solr version.
>
Sure. Full search URL:
/solr/select?indent=on&version=2.2&q=history&fq=&start=0&rows=10&fl
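For comparison, a minimal highlighting request usually carries the hl parameters explicitly, something like this (the field name `text` is an assumption, not from the URL above):

```text
/solr/select?q=history&hl=true&hl.fl=text&hl.snippets=2&hl.fragsize=100
```

If hl.fl doesn't name the field being searched (or the field isn't stored), the highlighter has nothing to build snippets from.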
Can you give an example, like something that is currently being used? I'm an
engineering student, and my project is to index all the real-time log files
from different devices, use some artificial intelligence, and produce
useful data out of it. I'm doing this for my college. I'm struggling
Hi,
I'm not sure if this question is better posted in Solr - User or Solr - Dev,
but I'll start here.
I'm interested in finding some documentation that describes in detail how
synonym expansion is handled at index time.
http://www.lucidimagination.com/blog/2009/03/18/exploring-lucenes-indexing-co
Thanks!
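For reference, index-time expansion is typically wired up with a SynonymFilterFactory in the index analyzer, along these lines (a sketch; the file name and flag values shown are the common defaults, not taken from this thread):

```xml
<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <!-- expand="true" emits every synonym of a matched token into the index,
       so all variants become searchable without query-time rewriting -->
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
          ignoreCase="true" expand="true"/>
</analyzer>
```

With expand="false" and an explicit mapping (a => b), tokens are collapsed to one canonical form at index time instead.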
On Mon, Jan 10, 2011 at 9:27 PM, Ahmet Arslan wrote:
> > Is there a max POST size limit when
> > sending documents over to Solrs update
> > handler to be indexed? Right now I've self imposed a
> > limit of sending a max
> > of 50 docs per request to solr in my PHP code..and that
> > see
> That's really strange. Can you provide us the field type
> definition of the text field, the full search URL that caused
> that output, and the Solr version.
>
Also, did you enable term vectors on text field?
Is there a max POST size limit when sending documents over to Solr's update
handler to be indexed? Right now I've self-imposed a limit of sending a max
of 50 docs per request to Solr in my PHP code, and that seems to work fine.
I was just curious as to whether there was a limit somewhere at which Solr w
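Request-size caps generally come from the servlet container (e.g. Tomcat's maxPostSize) or from solrconfig.xml rather than from a document count; one place such a cap can live is the requestDispatcher section, roughly like this (a sketch; the attribute values are examples, not recommendations):

```xml
<requestDispatcher handleSelect="true">
  <!-- multipartUploadLimitInKB caps the size of an uploaded request body, in KB -->
  <requestParsers enableRemoteStreaming="false" multipartUploadLimitInKB="2048"/>
</requestDispatcher>
```

Batching (as with the 50-docs-per-request approach above) mostly matters for memory on both ends, not for any per-document limit in Solr itself.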
Anyone know why this would not be working in Solr? Just to recap, we are
trying to exclude documents which have fields missing values from the search
results. I have tried these and none of them seems to be working:
1. *:* -field:[* TO *]
2. -field:[* TO *]
3. field:""
The fields are either typed string or
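For what it's worth, the usual forms are these (`field` is a placeholder for the actual field name):

```text
fq=field:[* TO *]         keep only documents where the field has a value
q=*:* -field:[* TO *]     pure negative clauses need the *:* base query to match against
```

Since the goal is to exclude documents that are missing the value, a filter query requiring the field's presence (the first form) is normally the simplest fix, and it caches well as an fq.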
On Mon, Jan 10, 2011 at 05:58:42PM -0500, François Schiettecatte said:
> http://www.oracle.com/technetwork/java/gc-tuning-5-138395.html (you
> need to read this one)
>
> http://java.sun.com/performance/reference/whitepapers/tuning.html (and
> this one).
Yeah, I have these two pages b
This reminded me of a situation I ran into in the past where the JVM was being
rendered useless because it was running full GCs repeatedly. Effectively what was
going on is that a very large array was allocated which swamped the JVM memory
and caused it to thrash, much like an OS.
Here are some links
I've noticed that the spellcheck component also seems to tokenize by itself
on question marks, not only on periods.
Based on the spellcheck definition above, does anyone know how to stop Solr
from tokenizing strings on queries such as
www.sometest.com
(which causes suggestions of the form ww
Any sources to cite for this statement? And are you talking about RAM
allocated to the JVM or available for OS cache?
> Not sure if this was mentioned yet, but if you are doing slave/master
> replication you'll need 2x the RAM at replication time. Just something to
> keep in mind.
>
> -mike
>
>
And I don't think I've seen anyone suggest a separate core just for
Access Control Lists. I'm not sure what that would get you.
Perhaps a separate store that isn't Solr at all, in some cases.
On 1/10/2011 5:36 PM, Jonathan Rochkind wrote:
Access Control Lists
On 1/10/2011 5:03 PM, Dennis Gearon wrote:
Sharding has nothing to do with that scenario at all.
On Mon, Jan 10, 2011 at 01:56:27PM -0500, Brian Burke said:
> This sounds like it could be garbage collection related, especially
> with a heap that large. Depending on your jvm tuning, a FGC could
> take quite a while, effectively 'pausing' the JVM.
>
> Have you looked at something like jstat
Not sure if this was mentioned yet, but if you are doing slave/master
replication you'll need 2x the RAM at replication time. Just something to
keep in mind.
-mike
On Mon, Jan 10, 2011 at 5:01 PM, Toke Eskildsen wrote:
> On Mon, 2011-01-10 at 21:43 +0100, Paul wrote:
> > > I see from your other
Hi Grant,
It's a search relevancy problem. For example:
a document about London reads like
"London is not very good for a peaceful break."
We analyse this at the (I can't remember the technical term) is it the lexical
level? (Bloody hell, I think you may have written the book!) Anyway, which
produces tok
What I seem to see suggested here is to use different cores for the things you
suggested:
different types of documents
Access Control Lists
I wonder how sharding would work in that scenario?
Me, I plan on:
For security:
Using a permissions field
For different schemas:
Dynamic fie
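The permissions-field approach mentioned above usually boils down to a filter query at search time, something like this (a sketch; the field name and the group tokens are assumptions, not from this thread):

```text
fq=permissions:(group_a OR group_b OR public)
```

The application resolves the current user's groups and appends them as an fq, so the index stays shared while each user only sees documents tagged with at least one of their tokens.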
On Mon, 2011-01-10 at 21:43 +0100, Paul wrote:
> > I see from your other messages that these indexes all live on the same
> > machine.
> > You're almost certainly I/O bound, because you don't have enough memory for
> > the
> > OS to cache your index files. With 100GB of total index size, you'll
StatsComponent, like many things, relies on FieldCache (and the related
uninverted version in Solr for multivalued fields), which takes up memory and
is related to the number of documents in the index. Strings in FieldCache can
also be expensive.
-Grant
On Jan 10, 2011, at 4:10 PM, Jonathan R
> most of the Solr sites I know of
> have much larger indexes than ram and expect everything to work
> smoothly
Hmm... In that case, throttling the merges would probably help most,
though, yes, that's not available today. In lieu of that, I'd run
large merges during off-peak hours, or better yet,
I found StatsComponent to be slow only when I didn't have enough RAM
allocated to the JVM. I'm not sure exactly what was causing it, but it
was pathologically slow -- and then adding more RAM to the JVM made it
incredibly fast.
On 1/10/2011 4:58 AM, Gora Mohanty wrote:
On Mon, Jan 10, 2011 a
I see a lot of people using shards to hold "different types of
documents", and it almost always seems to be a bad solution. Shards are
intended for distributing a large index over multiple hosts -- that's
it. Not for some kind of federated search over multiple schemas, not
for access control.
No, it also depends on the queries you execute (sorting is a big consumer) and
the number of concurrent users.
> Is that a general rule of thumb? That it is best to have about the
> same amount of RAM as the size of your index?
>
> So, with a 5GB index, I should have between 4GB and 8GB of RAM
>
> I see from your other messages that these indexes all live on the same
> machine.
> You're almost certainly I/O bound, because you don't have enough memory for
> the
> OS to cache your index files. With 100GB of total index size, you'll get best
> results with between 64GB and 128GB of total R
One other possibility is that the OS or BIOS is doing that, at least on a
laptop.
There is a new feature where, if the load is low enough, non-multi-threaded
applications can be assigned to one processor and that processor has its clock
boosted so the older software will run faster on the new p
: What I mean is that when you have publicly exposed search that bots crawl,
they
: issue all kinds of crazy "queries" that result in errors, that add noise to
Solr
: caches, increase Solr cache evictions, etc. etc.
I dealt with this type of thing a few years back by having my front-end app
e
This sounds like it could be garbage collection related, especially with a heap
that large. Depending on your jvm tuning, a FGC could take quite a while,
effectively 'pausing' the JVM.
Have you looked at something like jstat -gcutil or similar to monitor the
garbage collection?
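That monitoring command is along these lines (the PID is a placeholder for the Solr JVM's process id):

```shell
# sample GC counters every 1000 ms; watch the FGC (full-GC count)
# and FGCT (cumulative full-GC time) columns for long pauses
jstat -gcutil <solr-jvm-pid> 1000
```

A steadily climbing FGC during the slow queries would point at the heap/GC-tuning explanation rather than at Solr itself.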
On Jan 10,
I have a fairly classic master/slave set up.
Response times on the slave are generally good with blips periodically,
apparently when replication is happening.
Occasionally however the process will have one incredibly slow query and
will peg the CPU at 100%.
The weird thing is that it will rema
On Jan 10, 2011, at 12:42 PM, lee carroll wrote:
> Hi
>
> I'm indexing a set of documents which have a conversational writing style.
> In particular the authors are very fond
> of listing facts in a variety of ways (this is to keep a human reader
> interested) but its causing my index trouble.
>
Faceting will do this for you. Check out:
http://wiki.apache.org/solr/SimpleFacetParameters#facet.field
This param allows you to specify a field which should be treated as a facet.
> It will iterate over each Term in the field and generate a facet count using
> that Term as the constraint.
>
>
For
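Concretely, a term-count request of that shape looks something like this (the field name `keywords` is a placeholder):

```text
/solr/select?q=*:*&rows=0&facet=true&facet.field=keywords&facet.limit=20
```

rows=0 suppresses the document list, so the response is just the per-term counts for the matching result set.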
As I understand, a faceted search would be useful if keywords is a
multivalued field and its field value is just a token.
I want to display the occurrences of the tokens which appear in an indexed
(and stored) text field.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Tok
Hi
I'm indexing a set of documents which have a conversational writing style.
In particular the authors are very fond
of listing facts in a variety of ways (this is to keep a human reader
interested) but its causing my index trouble.
For example instead of listing facts like: the house is white,
--- On Mon, 1/10/11, taimurAQ wrote:
> From: taimurAQ
> Subject: Help needed in handling plurals
> To: solr-user@lucene.apache.org
> Date: Monday, January 10, 2011, 6:35 PM
>
> Hi,
>
> I am currently facing the following problematic scenario:
>
> At index time, i index a field by the value o
Hm, so if someone says they have SEO skills on their resume, they COULD be
talking about optimizing the SEARCH engine at some site, not just a web site to
be crawled by search engines?
- Original Message
From: Ken Krugler
To: solr-user@lucene.apache.org
Sent: Mon, January 10, 2011
Yeah, it doesn't look like an easy, CRUD based interface.
- Original Message
From: Lukas Kahwe Smith
To: solr-user@lucene.apache.org
Sent: Sun, January 9, 2011 11:33:16 PM
Subject: Re: PHP PECL solr API library
On 10.01.2011, at 08:16, Dennis Gearon wrote:
> Anyone have any experien
- Original Message
From: lee carroll
To: solr-user@lucene.apache.org
Sent: Mon, January 10, 2011 6:48:12 AM
Subject: Re: How to let crawlers in, but prevent their damage?
Sorry not an answer but a +1 vote for finding out best practice for this.
Related to it is DOS attacks. We have rew
On Jan 10, 2011, at 7:02am, Otis Gospodnetic wrote:
Hi Ken, thanks Ken. :)
The problem with this approach is that it exposes very limited
content to
bots/web search engines.
Take http://search-lucene.com/ for example. People enter all kinds
of queries
in web search engines and end up on
I don't know about stopping problems with the issues that you've raised.
But I do know that web sites that aren't idempotent with GET requests are in a
hurt locker. That seems to be WAY too many of them.
This means: don't do anything with GET that changes the contents of your web
site.
Regard
On 1/10/2011 8:38 AM, supersoft wrote:
Hello,
I would like to know if there is a trivial procedure/tool for displaying the
number of appearances of each token from query results.
Thanks
Unless I'm misunderstanding what you mean, this sounds exactly like facets.
http://wiki.apache.org/solr/So
Hi,
I am currently facing the following problematic scenario:
At index time, I index a field with the value "Laptop".
At index time, I index another field with the value "Laptops".
At query time, I search for "Laptops".
What is happening right now is that I am only getting back "Laptops" in t
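One common way to collapse singular and plural forms to the same term is a stemming filter in both the index and query analyzers, for example (a suggestion on my part, not taken from the truncated replies in this thread):

```xml
<!-- sketch: a stemmer reduces "Laptops" and "Laptop" to a shared stem,
     so either query form matches either indexed form -->
<filter class="solr.SnowballPorterFilterFactory" language="English"/>
```

The filter has to appear in both the index-time and query-time analyzer chains, otherwise the stems produced at the two stages won't line up.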
Stefan,
You're right. I was attempting to post some quick pseudo-code, but that
is pretty misleading; they should have been elements, like /abc/def/ghi/123.xml,
or something to that effect.
Thanks,
Walter
On Mon, Jan 10, 2011 at 10:08 AM, Stefan Matheis <
matheis.ste...@googlemail.com> w
Hey Walter,
what's against just putting your db-location in a 'string' field, and using it
like any other value?
There is no special field-type for something like a
path/directory/location-information, afaik.
Regards
Stefan
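That is, something like this in schema.xml (the field name is a placeholder):

```xml
<!-- sketch: a plain string field holds the path verbatim, untokenized -->
<field name="db_location" type="string" indexed="true" stored="true"/>
```

Since the string type is not analyzed, the full path round-trips exactly as indexed, which is usually what you want for identifiers and locations.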
On Mon, Jan 10, 2011 at 4:50 PM, Walter Closenfleight <
walter.p.closenfle
I'm very unclear on how to associate what I need to a Solr index entry.
Based on what I've read thus far, you can extract data from text files and
store that in a Solr document.
I have hundreds of thousands of documents in a database/svn type system.
When I index a file, it is likely going to be l
Hi Koji,
I'm using apache-solr-4.0-2010-11-24_09-25-17 from trunk.
A grep for "SOLR-1973" in CHANGES.txt says that it should have been fixed.
Strange...
Regards,
Bernd
Am 10.01.2011 16:14, schrieb Koji Sekiguchi:
> (11/01/10 23:26), Bernd Fehling wrote:
>> Dear list,
>>
>> while trying differ
Hello,
I would like to know if there is a trivial procedure/tool for displaying the
number of appearances of each token from query results.
Thanks
--
View this message in context:
http://lucene.472066.n3.nabble.com/Token-Counter-tp2227795p2227795.html
Sent from the Solr - User mailing list ar
Otis,
The reason I ask is that I run a number of sites on Solr, some with 10
million+ docs faceting on similar types of data, and have not seen anywhere
near this length of initial delay. The main difference is that these sites
facet on single value fields rather that multivalued and that this site
(11/01/10 23:26), Bernd Fehling wrote:
Dear list,
while trying different options with DIH and ScriptTransformer I also
tried using the "required=true" option for a field.
I have 3 records:
first title
identifier_01
http://www.foo.com/path/bar.html
Hi Ken, thanks Ken. :)
The problem with this approach is that it exposes very limited content to
bots/web search engines.
Take http://search-lucene.com/ for example. People enter all kinds of queries
in web search engines and end up on that site. People who visit the site
directly don't nece
Hi Howard,
This is normal. Your first query is reading a bunch of index data from disk
and
your RAM is then caching it. If your first query involves sorting, some more
data for FieldCache is being read and stored. If there are multiple sort
fields, one such thing for each. If facets are in
Sorry, not an answer, but a +1 vote for finding out best practice for this.
Related to it is DOS attacks. We have rewrite rules in between the proxy
server and Solr which attempt to filter out undesirable stuff, but would it
be better to have a query app doing this?
any standard rewrite rules whic
Hi Otis,
From what I learned at Krugle, the approach that worked for us was:
1. Block all bots on the search page.
2. Expose the target content via statically linked pages that are
separately generated from the same backing store, and optimized for
target search terms (extracted from your o
Dear list,
while trying different options with DIH and SciptTransformer I also
tried using the "required=true" option for a field.
I have 3 records:
first title
identifier_01
http://www.foo.com/path/bar.html
second title
identifier_02
Gora,
Thanks for the response. After taking another look, you are correct about
the hasnext() closing the ResultSet object (1.4.1 as well as 1.4.0). I
didn't recognize the case difference in the two function calls, so missed
it. I'll keep looking into the original issue and reply if I find a
ca
Hi,
I'd appreciate some explanation on what may be going on in the following
scenario using multivalued fields and facets.
Solr version: 1.5
Our index contains 35 million docs, and our search is using 2 multivalued
fields as facets. There are approx 5 million different values in one field
and 50
Hi,
How do people with public search services deal with bots/crawlers?
And I don't mean to ask how one bans them (robots.txt) or slow them down (Delay
stuff in robots.txt) or prevent them from digging too deep in search results...
What I mean is that when you have publicly exposed search that bo
Any thoughts on this one? Should I add a ticket?
On Tuesday 04 January 2011 20:08:40 Markus Jelsma wrote:
> Hi,
>
> It seems abort-fetch nicely removes the index directory which i'm
> replicating to, which is fine. Restarting, however, does not trigger
> the same feature as the abort-fetch com
Hello,
Are people using Solr trunk in serious production environments? I suspect the
answer is yes, just want to see if there are any gotchas/warnings.
Thanks,
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
Thanks Alexander
2011/1/3 Alexander Kanarsky
> Joan,
>
> current version of the patch assumes the location and names for the
> schema and solrconfig files ($SOLR_HOME/conf), it is hardcoded (see
> the SolrRecordWriter's constructor). Multi-core configuration with
> separate configuration locatio
When I start the StatsComponent I get this message:
INFO: UnInverted multi-valued field
{field=product,memSize=4336,tindexSize=46,time=0,phase1=0,nTerms=1,bigTerms=1,termInstances=0,uses=0}
What does this mean?
--
View this message in context:
http://lucene.472066.n3.nabble.com/Tuning-StatsComponent-
Check your libraries for Tika-related JAR files. Tika-related files must be on
the classpath of Solr.
-
Grijesh
--
View this message in context:
http://lucene.472066.n3.nabble.com/Internal-Server-Error-when-indexing-a-pdf-file-tp2214617p2226374.html
Sent from the Solr - User mailing list archive
Hi,
We are using :
Solr Specification Version: 1.4.1
Solr Implementation Version: 1.4.1 955763M - mark - 2010-06-17 18:06:42
Lucene Specification Version: 2.9.3
Lucene Implementation Version: 2.9.3 951790 - 2010-06-06 01:30:55
# java -version
java version "1.6.0_20"
Java(TM) SE Runtime Environme
Oh, thanks for your fast reply.
I will try the suggestions.
In the meanwhile, more information about my index:
I have 2 Solr instances with 6 cores. Each core has its own index, and one
core's index is about 30 million documents.
Each document has (stats-relevant):
amount
amount_euro
currency_id
user
Hello,
You could try taking advantage of Solr's facetization feature: provided
that you have the amount stored in the amount field and the currency stored
in the currency field, try the following request:
http://host:port/solr/select?q=YOUR_QUERY&stats=on&stats.field=amount&f.amount.stats.facet
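Spelled out with the field names the poster mentions later in the thread (amount, currency_id), a single request of that shape could look like this (a sketch; adapt the names to your schema):

```text
# one request instead of one per currency: per-currency sums come back as stats facets
/solr/select?q=*:*&rows=0&stats=true&stats.field=amount&stats.facet=currency_id
```

stats.facet breaks the stats.field computation down by each value of the facet field, so all five currency sums arrive in one response.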
On Mon, Jan 10, 2011 at 2:28 PM, stockii wrote:
>
> Hello.
>
> i`m using the StatsComponent to get the sum of amounts. but solr
> statscomponent is very slow on a huge index of 30 Million documents. how can
> i tune the statscomponent ?
Not sure about this problem.
> the problem is, that i have
Hi Gora,
thanks a lot, very nice solution, works perfectly.
I will dig more into ScriptTransformer, seems to be very powerful.
Regards,
Bernd
Am 08.01.2011 14:38, schrieb Gora Mohanty:
> On Fri, Jan 7, 2011 at 12:30 PM, Bernd Fehling
> wrote:
>> Hello list,
>>
>> is it possible to load only sel
Hello.
I'm using the StatsComponent to get the sum of amounts, but the Solr
StatsComponent is very slow on a huge index of 30 million documents. How can
I tune the StatsComponent?
The problem is that I have 5 currencies and I need to send a new request for
each currency. That makes the Solr search
Chris,
Our solr conf folder is in read-only file system. But the data directory
(index) is not in read-only file system. As per our production environment
guidelines, the configuration files should be in read-only file system.
Thanks,
SRD
--
View this message in context:
http://lucene.472066.