Thank you very much, Shawn. I had understood that Zookeeper was a mandatory
component for Solr 4, and it is immensely useful to know that it is
possible to do without.
/Martin Koch
On Fri, Mar 1, 2013 at 3:58 PM, Shawn Heisey wrote:
> On 3/1/2013 7:34 AM, Martin Koch wrote:
>
>> Most of the ti
: In the explain tag (debugQuery=true)
: what does the *fieldWeight* value refer to?,
fieldWeight is just a label being put on the product of the tf, idf,
and fieldNorm for that term. (I don't remember why it's referred to as the
"fieldWeight" ... I think it may just be historical, since
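For the archives, a sketch of how those pieces multiply out in Lucene's stock DefaultSimilarity (the tf/idf shapes below are the defaults and will differ if you've plugged in a custom Similarity):

```latex
\mathrm{fieldWeight}(t,d) = \mathrm{tf}(t,d)\cdot\mathrm{idf}(t)\cdot\mathrm{fieldNorm}(f,d),
\qquad \mathrm{tf}(t,d)=\sqrt{\mathrm{freq}(t,d)},
\qquad \mathrm{idf}(t)=1+\ln\!\left(\frac{\mathrm{numDocs}}{\mathrm{docFreq}(t)+1}\right)
```

The explain output shows each of these three factors separately right under the fieldWeight line, so you can check them against your index stats.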
Hi,
We are going about solving this problem by splitting an N-page document
into N separate documents (one per page, type=Page) + 1 additional combined
document (that has all the pages, type=Combined). All the N+1 documents
have the same doc_id.
The search is initially performed against the combi
Hi,
Sure, lots of things could be done with creative curl usage, but there
is still something to be said for having an ecosystem of nice
devops-friendly tools...
Otis
--
Solr & ElasticSearch Support
http://sematext.com/
On Wed, Feb 27, 2013 at 8:01 AM, Upayavira wrote:
> I took the ch
Hi Chris,
I started a discussion on this topic on the ElasticSearch mailing list the
other day. As soon as SolrCloud gets index alias functionality (JIRA for it
exists) I believe the same approach to cluster expansion will be applicable
to SolrCloud as what can be done with ES today:
http://searc
Hi Mike,
Doesn't exist as far as I know, but would be a nice contribution.
Otis
--
Solr & ElasticSearch Support
http://sematext.com/
On Fri, Mar 1, 2013 at 11:30 AM, Mike Hugo wrote:
> Does anyone know if a version of ConcurrentUpdateSolrServer exists that
> would use the size in memory of
Yes, the SolrEntityProcessor can be used for this,
provided you stored the original document bodies in the Solr index.
You can also download the documents in JSON or CSV format and re-upload
those to the old Solr. I don't know if CSV will work for your docs. If CSV
works, you can directly upload what you
: We are strongly considering opening the source of our DMP (Data Management
: Platform), if it proves to be technically interesting to other developers /
: companies.
:
: More details: http://www.s1mbi0se.com/s1mbi0se_DMP.html
If you do decide you want to open source your platform, an important
: For full reindexes (DIH full-import), I use build cores, then swap them with
: the live cores. I don't do this for performance reasons, I do it because I
: want to continue making incremental updates to the live cores while the
: rebuild is underway. The rebuild takes four hours.
that's kind
>From your response, I gather that there's no way to maintain a single set of
fields for multiple languages i.e. I can't use a field "text" for the body
text. Instead, I would have to define text_en, text_fr, text_ru etc each
mapped to their specific languages.
Didn't know! Thank you Shawn :)
On 03/01/2013 09:23 PM, Shawn Heisey wrote:
On 3/1/2013 1:50 PM, Jilal Oussama wrote:
You can also specify in your schema that the default query operator is
AND.
This is deprecated as of Solr 4.0, so I don't mention it.
--
Oussama Jilal
Hi,
Q1. You use langid for the detection, and your chosen field(s) can be mapped to
new names such as title->title_en or title_de. Thus you need to configure
your schema with a separate fieldType for every language you want to support
if you'd like to use language specific stemming and stopwords e
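A minimal solrconfig.xml sketch of that kind of langid setup (the field names, language list, and chain name here are placeholders of mine, not something from the original mail):

```xml
<updateRequestProcessorChain name="langid">
  <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
    <!-- detect the language from these fields -->
    <str name="langid.fl">title,text</str>
    <str name="langid.langField">language</str>
    <!-- map title -> title_en / title_de etc. -->
    <bool name="langid.map">true</bool>
    <str name="langid.whitelist">en,de,fr,ru</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

Remember to hook the chain into your update handler (e.g. with update.chain=langid) or it won't run at all.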
On 3/1/2013 1:50 PM, Jilal Oussama wrote:
You can also specify in your schema that the default query operator is AND.
This is deprecated as of Solr 4.0, so I don't mention it.
Thanks Shawn.
Jeremy D. Branham
Performance Technologist II
Sprint University Performance Support
Fort Worth, TX | Office: +1 (972) 405-2970 | Mobile: +1 (817) 791-1627
http://JeremyBranham.Wordpress.com
http://www.linkedin.com/in/jeremybranham
-Original Message-
From: Shawn Heisey [ma
You can also specify in your schema that the default query operator is AND.
On Mar 1, 2013 5:35 PM, "Jack Park" wrote:
> I found a tiny notice about just using quotes; tried it in the admin
> query console and it works. e.g. label:"car house" would fetch any
> document for which the label field co
As I understand, SOLR allows us to plug in language detection
processors: http://wiki.apache.org/solr/LanguageDetection
Given that our use case involves a collection of mixed-language documents,
Q1: Assume that we plug in language detection, will this affect the
stemming and other language specifi
On 3/1/2013 12:01 PM, Branham, Jeremy [HR] wrote:
I've read this...
http://stackoverflow.com/questions/5154093/solr-requests-time-out-during-index-update-perhaps-replication-a-possible-solut
[Using SOLR 1.4]
We are doing intraday full re-index because we aren't sure if the document has
been mo
It was a subset of HTML, yes, and it appears to work for my needs, thank you!
-Original Message-
From: Walter Underwood [mailto:wun...@wunderwood.org]
Sent: Friday, March 01, 2013 11:31 AM
To: solr-user@lucene.apache.org
Subject: Re: Defining tokenizer pattern with < character
Are you tr
Here is my config now:
And my initial heap allocation is now set to 4 GB and a max of 8GB as per
Shawn's recommendation.
Thanks Jack, Walter and Shawn for your suggestions.
I will post the results on this forum for others to
Jack,
No. It is a simple search; I cannot limit what the search will be like. As
I mentioned to Walter, a search could be for "*@gmail.com" or "*yahoo*".
Most of the time it is the dreaded and expensive "contains" search.
Regards,
Giri
That is a good start. Use the Analysis page in the admin UI to see what the
tokenizer does.
wunder
On Mar 1, 2013, at 11:02 AM, girish.gopal wrote:
> Hello Wunder,
> I see your point. Will this help if I search for "giri", "giri@",
> "giri@gmail", "@gmail.com" and other combinations.
> So, if
On 3/1/2013 11:49 AM, girish.gopal wrote:
My Specs are:
Windows Server 2008 64 bit Dual Quad Core CPUs with 64 GB of RAM.
I have allocated 55GB of memory to Tomcat in its config.
In addition to the advice you've gotten about wildcards, your memory
allocation needs some tweaking. It is highly
It sounds like you have enough raw memory. How big is the index (GB)?
Are you doing anything like ngrams that generate zillions of terms?
-- Jack Krupansky
-Original Message-
From: girish.gopal
Sent: Friday, March 01, 2013 1:49 PM
To: solr-user@lucene.apache.org
Subject: Re: Email Sea
Hello Wunder,
I see your point. Will this help if I search for "giri", "giri@",
"giri@gmail", "@gmail.com" and other combinations.
So, if I use a StandardTokenizer, I will get the ALPHANUM without the "@"
and the '.'. So my phrases would be "giri","gmail","com". And I should do a
phrase search on t
I've read this...
http://stackoverflow.com/questions/5154093/solr-requests-time-out-during-index-update-perhaps-replication-a-possible-solut
[Using SOLR 1.4]
We are doing intraday full re-index because we aren't sure if the document has
been modified.
Currently we are re-indexing to a second cor
Don't use wildcards. A leading wildcard matches against every token in the
index. This is the search equivalent of a full table scan in a relational
database.
Instead, create a field type that tokenizes e-mail addresses into pieces, then
use phrase search against that.
The address "f...@yahoo.
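One possible sketch of such a field type for schema.xml (the split pattern is my assumption and untested; check it on the Analysis page first):

```xml
<fieldType name="email" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- splits "giri@gmail.com" into giri / gmail / com -->
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[@.]"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Then email:"gmail com" behaves like the *@gmail.* wildcard but runs as a cheap phrase query instead of a term scan.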
Thanks Jack. The search is slow only when it is issued for the first time.
Ex. querying for *@gmail* takes 20+ seconds for the first time; when I
re-issue the same search, it returns pretty quickly (possibly reading out
of the cache).
But when I issue a new search *@yahoo.* then this too takes abou
Using Chrome (latest) on Mac OSX 10.8.2. In 4.1.0, I accessed cores
via something like:
http://machine:port/solr/#/corename
and got to the Ping, Query, Schema, etc. I attempted a similar URL
with my local installation and got the error I mentioned. I have one
core locally named "master", and m
Make sure you have enough heap space for your JVM and the most if not all of
your index fits in OS system memory.
After you start Solr and issue a couple of queries, how much JVM heap is
available?
-- Jack Krupansky
-Original Message-
From: girish.gopal
Sent: Friday, March 01, 2013
Hello,
I have over 40 million records/documents and I need to retrieve them using
wildcard searches on email and / or firstname and / or lastname.
The firstname, lastname and blank search (*:*) all return results within 3
seconds. But my Email search alone takes more than 20-25 secs.
I would like
Hey Neal
We changed the Navigation after 4.1 was released
(https://issues.apache.org/jira/browse/SOLR-4284) But you're the first one
reporting problems with this change (if it's related).
1) What Browser on which OS are you using?
2) And what is/are the Name of your core(s)?
3) When you're talk
On 3/1/2013 11:00 AM, Amit Nithian wrote:
But does that mean that in SolrCloud, slave nodes are busy indexing
documents?
With SolrCloud, there is no such thing as master or slave. When you
index documents, all applicable shard replicas are indexing the
documents independently. I think the s
But does that mean that in SolrCloud, slave nodes are busy indexing
documents?
On Fri, Mar 1, 2013 at 5:37 AM, Michael Della Bitta <
michael.della.bi...@appinions.com> wrote:
> Amit,
>
> NRT is not possible in a master-slave setup because of the necessity
> of a hard commit and replication, both
Just pulled down the latest SVN and built it in place to try and
upgrade from 4.1 (having replication issues). However, bringing it up
with my current core (copied) presents me with the admin page, but an
empty drop-down to select specific cores. If I go to "Core Admin" I
can see the core, I can
I found a tiny notice about just using quotes; tried it in the admin
query console and it works. e.g. label:"car house" would fetch any
document for which the label field contained that phrase.
Jack
On Fri, Mar 1, 2013 at 9:17 AM, Shawn Heisey wrote:
> On 3/1/2013 8:50 AM, vsl wrote:
>>
>> I wou
Are you trying to strip out HTML tags? There are built-in classes that do that.
Or you might want to parse the XML or HTML before you pass it to Solr. An XML
parser will interpret CDATA so that you never have to think about it. The
parsed data is just text.
wunder
On Mar 1, 2013, at 9:21 AM, S
Kristian,
I think what you want is pattern="<[^>]>" (untested) - that is, you
probably don't want to regex-escape the character class brackets "[" and "]",
and you should html-escape the angle brackets.
Steve
On Mar 1, 2013, at 11:42 AM, "Van Tassell, Kristian"
wrote:
> I'm trying to defin
On 3/1/2013 8:50 AM, vsl wrote:
I would like to send query like "car house". My expectation is to have
resulting documents that contains both car and house. Unfortunately Apache
Solr out of the box returns documents as if the whitespace between was
treated as OR. Does anybody know how to fix this
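For anyone finding this thread in the archives, the two usual fixes, sketched (handler and field names are whatever your setup uses):

```xml
<!-- schema.xml: the classic approach, deprecated as of Solr 4.0 -->
<solrQueryParser defaultOperator="AND"/>
```

or pass it per request instead, e.g. select?q=car house&q.op=AND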
Hi,
I would like to send query like "car house". My expectation is to have
resulting documents that contains both car and house. Unfortunately Apache
Solr out of the box returns documents as if the whitespace between was
treated as OR. Does anybody know how to fix this?
BR
Pawos
Hi Erick,
Thanks for the response. The terms are indexed. I will need to dig
deeper and see where the issue is; it might be correct syntax but I may
be using it incorrectly. In the meantime I have added a new column that
is end_time - order_prep_time and validate if a session is available
ag
Thanks Eric,
Well, the index was built from scratch with 4.1. Our IT engineer was able to
take a CPU sample, and on analyzing it he mentioned that
"While running I noticed that the method:
java.util.AbstractList$ltr.hasNext() (called by
org.apache.lucene.document.Document.getFields() ) taking a
Does anyone know if a version of ConcurrentUpdateSolrServer exists that
would use the size in memory of the queue to decide when to send documents
to the solr server?
For example, if I set up a ConcurrentUpdateSolrServer with 4 threads and a
batch size of 200 that works if my documents are small.
Worked! Thanks :)
R
- Original message -
>
> Of course! My excuse is called Friday afternoon :) Will test when I'm
> in front of a computer :)
>
> Thanks!
>
> Remi
>
> Sent from my HTC
>
> - Reply message -
> From: "Ahmet Arslan"
> To:
> Subject: NorwegianLightStemFilterFa
Hi,
I have a lot of non-standard IBM RSS feeds that need to be crawled (via
ManifoldCF v1.1.1) and put into Solr 4.0 final.
The problem is that we need to put the additional non-standard metadata into
Solr.
I've confirmed via fiddler that manifoldcf is indeed sending all the
appropriate metadata b
Hi wunder,
Great advice!
As a matter of fact, I chose to use upper case due to the document I
indexed, but it is really a pain when typing the field names all in
upper case.
I thought there would probably be a way to make field names case-insensitive.
I was wrong, wasn't I?
Thanks,
Hyrax
I'm a little confused here, because if you are searching q=jeap OR denim, then
you should be getting both documents back. Having spellcheck configured does
not affect your search results at all. Having it in your request will sometimes
result in spelling suggestions, usually if one or more term
Hi Shawn,
Thanks for your reply.
So you mean the field name can't be case-insensitive when specified in a
query?
I'm gonna stop doing research on this issue if this is confirmed...
Thanks,
Hyrax
Is there an easy (enough) way to do this, storing the page number as a payload
on each term?
James Dyer
Ingram Content Group
(615) 213-4311
-Original Message-
From: Michael Della Bitta [mailto:michael.della.bi...@appinions.com]
Sent: Thursday, February 28, 2013 3:33 PM
To: solr-user@luc
Of course! My excuse is called Friday afternoon :) Will test when I'm in front
of a computer :)
Thanks!
Remi
Sent from my HTC
- Reply message -
From: "Ahmet Arslan"
To:
Subject: NorwegianLightStemFilterFactory and protected words
Date: Fri., March 1, 2013 15:50
Hi Remi,
You need
On 3/1/2013 7:34 AM, Martin Koch wrote:
Most of the time things run just fine; however, we see this error every so
often, and fix it as described.
How do I run solr in non cloud mode? Could you point me to a description?
The zookeeper options are required for cloud mode - zkHost to tell it
ab
Hi Remi,
You need to use *Factory class.
Ahmet
--- On Fri, 3/1/13, Remi Mikalsen wrote:
> From: Remi Mikalsen
> Subject: Re: NorwegianLightStemFilterFactory and protected words
> To: solr-user@lucene.apache.org
> Date: Friday, March 1, 2013, 4:38 PM
> Thanks for such a quick response!
>
>
Thanks for such a quick response!
I tried out the suggestion, but I'm struggling with actually making it work:
schema.xml:
Produces an instantiation error:
SEVERE: org.apache.solr.common.SolrException: Error instantiating class:
'org.apache.lucene.analysis.KeywordMarkerFilter
...
Caused b
Hi Otis
Thanks for the info. I tried 2 different ways that both seem to work okay.
I added to the in the solrconfig.xml
And I tried adding the
To the section, in the Schema.xml file.
Both ways work ok.
Cheers Mark
On 28/02/2013 08:05, "Otis Gospodnetic" wrote:
> Mark,
>
> Look at
> ht
Most of the time things run just fine; however, we see this error every so
often, and fix it as described.
How do I run solr in non cloud mode? Could you point me to a description?
Thanks,
/Martin
On Fri, Mar 1, 2013 at 3:30 PM, Mark Miller wrote:
> It sounds like you have some sort of config
It sounds like you have some sort of configuration issue perhaps. When things
are setup right, you should not be seeing anything like this.
Whether or not you can do without ZooKeeper depends on what your requirements
are and what you want to support. You can use SolrCloud mode and non SolrCloud
On a host that is running two separate solr (jetty) processes and a single
zookeeper process, we're often seeing solr complain that it can't find a
particular core. If we restart the solr process, when it comes back up, it
has lost all information about its cores
Feb 28, 2013 10:26:47 PM org.apach
Hi Remi,
The filter does not support protwords but does support the KeywordAttribute.
Use the KeywordMarkerFilter to mark a list of words and protect them from
stemming.
http://lucene.apache.org/core/4_1_0/analyzers-common/org/apache/lucene/analysis/miscellaneous/KeywordMarkerFilter.html
Cheer
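A sketch of where that marker sits in the schema.xml analyzer chain (the tokenizer and surrounding filters are assumptions of mine; what matters is that the marker runs before the stemmer):

```xml
<analyzer>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <!-- marks every word listed in protwords.txt so stemmers skip it -->
  <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
  <filter class="solr.NorwegianLightStemFilterFactory"/>
</analyzer>
```

Note it's the factory (solr.KeywordMarkerFilterFactory) that goes in the schema, not the filter class from the javadoc link.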
Check to see that you use compatible analyzers for index and query for that
field type. You can use the Solr Admin UI Analyzer page to enter the same
text and see how it gets analyzed for both index and query.
Also, if you do change the analyzer for a field type, make sure to fully
reindex you
While the NorwegianLightStemFilterFactory generally works very well, I have
come across a few words I'd very much like not to stem.
The following words:
- lærere (teachers)
- lærer (teacher)
- lære (teach)
all match :
- lær (leather)
I tried adding protected="protwords.txt" to my NorwegianL
Amit,
NRT is not possible in a master-slave setup because of the necessity
of a hard commit and replication, both of which add considerable
delay.
Solr Cloud sends each document for a given shard to each node hosting
that shard, so there's no need for the hard commit and replication for
visibilit
You will want to start zookeeper independently of Solr. If you want
fault tolerance with regard to Zookeeper, you should have three of them.
Hosting at Amazon, for example, I'd have one Zookeeper in each of three
availability zones, meaning that any one AZ can go down, leaving me with
two functioni
What about SolrEntityProcessor in DIH?
https://wiki.apache.org/solr/DataImportHandler#SolrEntityProcessor
Regards,
Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
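A minimal data-config sketch for that (the url, query, and rows values are placeholders):

```xml
<dataConfig>
  <document>
    <!-- pull every document out of the old core in pages of 500 -->
    <entity name="source" processor="SolrEntityProcessor"
            url="http://oldhost:8983/solr/oldcore"
            query="*:*" rows="500"/>
  </document>
</dataConfig>
```

Only stored fields come across, of course, which is the usual caveat with any reindex-from-Solr approach.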
Hi Upayavira,
Sorry if my question is out of solr subject.
Thanks or this information,
Bruno
On 01/03/2013 13:33, Upayavira wrote:
This really is not a Solr question, rather it is a Tomcat one.
You can configure alternative/additional ports in your conf/server.xml
file. However, if you ar
Hi
Thanks for this info !
On 01/03/2013 13:42, Miguel wrote:
Hi
You could do IP routing using the Linux iptables command to
redirect requests from port 80 to the Tomcat port.
This page explains how to do it:
http://forum.slicehost.com/index.php?p=/discussion/2497/iptables-redirect-port-80-to
Hi
You could do IP routing using the Linux iptables command to redirect
requests from port 80 to the Tomcat port.
This page explains how to do it:
http://forum.slicehost.com/index.php?p=/discussion/2497/iptables-redirect-port-80-to-port-8080/p1
On 01/03/2013 12:43, Bruno Mannina wrote:
Dear
This really is not a Solr question, rather it is a Tomcat one.
You can configure alternative/additional ports in your conf/server.xml
file. However, if you are running on Linux, only root can run processes
on ports below 1024 so that might not help you.
You might find it just as easy to run Apach
Dear Users,
Currently we use Solr 3.6/Tomcat 6 on a specific port like 1234.
We connected our software to Solr on this specific port,
but several users have a lot of trouble opening this specific port on
their company network.
I would like to know if I can define two ports at the same time
Hello, I'm new to Solr, and I need to create the smallest possible Solr
cluster, supporting replication and fault tolerance, with no single point of
failure.
I've followed the instructions at http://wiki.apache.org/solr/SolrCloud, and
that's fine, but since my dataset is not that large, I don't r
Hi Jan, and thanks for last time :)
More stuff is coming! Paging and facets are the next things...
Fergus
On Mon, Feb 25, 2013 at 4:11 PM, Jan Høydahl wrote:
> Great Fergus,
>
> You have really been working on this since the MeetUp in Oslo! Impressive
> how much you can do with little code.
>
> Hav
Can you use a checkout from SVN? Does that resolve your issues? That is
what will become 4.2 when it is released soon:
https://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x/
Upayavira
On Fri, Mar 1, 2013, at 10:51 AM, Dotan Cohen wrote:
> On Fri, Mar 1, 2013 at 12:22 PM, Rafał Kuć wrot
On Fri, Mar 1, 2013 at 12:22 PM, Rafał Kuć wrote:
> Hello!
>
> As far as I know you have to re-index using external tool.
>
Thank you Rafał. That is what I figured.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
Hello!
As far as I know you have to re-index using external tool.
--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch
> On Fri, Mar 1, 2013 at 11:59 AM, Rafał Kuć wrote:
>> Hello!
>>
>> I assumed that re-indexing can be painful in your case, if it
On Fri, Mar 1, 2013 at 11:59 AM, Rafał Kuć wrote:
> Hello!
>
> I assumed that re-indexing can be painful in your case, if it wouldn't
> you probably would re-index by now :) I guess (didn't test it myself),
> that you can create another collection inside your cluster, use the
> old codec for Lucen
Fantastic! Thanks Erick.
Tim
-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Thursday, February 28, 2013 6:16 PM
To: solr-user@lucene.apache.org
Subject: Re: Repartition solr cloud
In the works, high priority:
https://issues.apache.org/jira/browse/SOLR-3755
Is it possible to write a plugin that is converting each page
separately with Tika and saving all pages in one document (maybe in a
dynamic field like "page_*")? I would like to have only one document
stored in SOLR for each pdf (it fit's better to the way my web
application is managing the
Hello!
I assumed that re-indexing can be painful in your case, if it wouldn't
you probably would re-index by now :) I guess (didn't test it myself),
that you can create another collection inside your cluster, use the
old codec for Lucene 4.0 (setting the version in solrconfig.xml should
be enough)
On Fri, Mar 1, 2013 at 11:28 AM, Rafał Kuć wrote:
> Hello!
>
> I suppose the only way to make this work will be reindexing the data.
> Solr 4.1 uses Lucene 4.1 as you know, which introduced new default
> codec with stored fields compression and this is one of the reasons
> you can't read that inde
Hello!
I suppose the only way to make this work will be reindexing the data.
Solr 4.1 uses Lucene 4.1 as you know, which introduced new default
codec with stored fields compression and this is one of the reasons
you can't read that index with 4.0.
--
Regards,
Rafał Kuć
Sematext :: http://semat
Solr 4.1 has been giving us much trouble, rejecting documents being indexed.
While I try to work my way through this, I would like to move our
application back to Solr 4.0. However, now when I try to start Solr
with the same index that was created with Solr 4.0 but has been running on
4.1 for a few days, I get
Hi, liwei.
I have met this problem before, and my solution is to expand synonyms first,
then normalize. For example, '北京市' and '北京' are synonyms; in the indexing
process my program converts them to '北京市', and the searching process applies
the same logic.
In solr.SynonymFilterFactory, posIncrement between synon
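A sketch of that normalize-on-both-sides approach in schema.xml (the file name is a placeholder; the same filter must appear in both the index and query analyzer chains):

```xml
<!-- in BOTH <analyzer type="index"> and <analyzer type="query"> -->
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="false"/>
```

with a synonyms.txt mapping line such as: 北京 => 北京市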