Re: Search for FirstName with first Char uppercase followed by * not giving result; getting result with all lowercase and *
Hi Ahmet,

Thanks for the reply. I had attached the analysis report of the query George*. It is split into the terms "George*" and "George" by the WordDelimiterFilterFactory, and the LowerCaseFilterFactory converts them to "george*" and "george". When I indexed "George" it was likewise analyzed and stored as "george". Then why is it that I don't get a match, as per the analysis report I attached in my previous mail? Or am I missing something basic here?

Many Thanks.
M

On Sun, Jan 30, 2011 at 4:34 AM, Ahmet Arslan wrote:
> > :When i try george* I get results. Whereas George* fetches no results.
>
> Wildcard queries are not analyzed by QueryParser.
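A quick way to see the asymmetry Ahmet describes (host, core and field names here are hypothetical):

    # Wildcard terms bypass the analyzer chain, so the indexed lowercase
    # "george" is only matched by a prefix that is already lowercase:
    curl 'http://localhost:8983/solr/select?q=FirstName:george*'   # matches
    curl 'http://localhost:8983/solr/select?q=FirstName:George*'   # no results: "George*" is never lowercased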
SOLR clustering ant code not compiling
Hi,

I downloaded the latest version of SOLR. From the contrib/clustering directory I ran "ant get-libraries". It is not building! Finally I manually downloaded the colt, nni, pcj, simple-xml and solr-common-1.3 jars, put them in the lib directory and restarted SOLR. It is giving me the following error:-

Error loading class org.apache.solr.handler.clustering.ClusteringComponent at org.apache.solr.core.SolrResourceLoader.findClass

Could some one pls help on how I can proceed.

BR
Mark.
Re: SOLR clustering ant code not compiling
Hi Koji,

Thank you so much for the reply. I am not much familiar with the open source "trunk", so I downloaded solr1.4 from the following location: http://www.apache.org/dyn/closer.cgi/lucene/solr/

On the browser I can see this error:-

org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.clustering.ClusteringComponent'
at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:373)
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:413)
at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:435)
at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1498)
at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1492)
at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1525)
at org.apache.solr.core.SolrCore.loadSearchComponents(SolrCore.java:833)
at org.apache.solr.core.SolrCore.(SolrCore.java:551)
at ...

Please find my ANT screen attached herewith. I used ANT 1.7.1. I separately downloaded the colt, nni, pcj, simple-xml jars into contrib/clustering/lib/downloads and again ran the ANT command, but the error is still there. Any early suggestions would be a great help.

In a separate try I deployed the solr.war in tomcat webapps and put all the jars in webapps/solr/WEB-INF/lib/ {all jars including the downloaded ones} ... still I get the error on the browser that the clustering class could not be found.

Without running the ant command, is it fine if we just download those 4 extra jars and put them in lib in addition to the clustering jars and the usual solr jars?

Thanks!
Mark.

On Tue, Feb 23, 2010 at 9:55 AM, Koji Sekiguchi wrote:
> Mark Fletcher wrote:
>> Hi,
>>
>> I downloaded the latest version of SOLR. From the contrib/clustering
>> directory ran "ant get-libraries". It is not building!
>>
> I've just tried ant get-libraries under contrib/clustering without
> any problems. I used trunk. What was your error message?
>
>> Finally I manually downloaded colt, nni, pcj, simple xml and solr-common 1.3
>> jars and put them in the lib and restarted SOLR. It is giving me the
>> following err:-
>>
> What's solr-common 1.3?
> Did you put colt, nni, ... jars under contrib/clustering/lib/downloads/
> directory?
>
> Koji
>
> --
> http://www.rondhuit.com/en/
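In case it helps others hitting the same error, a sketch of the manual fallback under Tomcat (jar and path names are assumptions; the key point is that ClusteringComponent itself lives in the clustering contrib jar built into dist/, not in the downloaded dependency jars):

    # dependencies fetched by "ant get-libraries" (or downloaded by hand)
    cp contrib/clustering/lib/downloads/*.jar $TOMCAT_HOME/webapps/solr/WEB-INF/lib/
    cp contrib/clustering/lib/*.jar           $TOMCAT_HOME/webapps/solr/WEB-INF/lib/
    # the contrib jar that actually contains ClusteringComponent
    cp dist/apache-solr-clustering-1.4.0.jar  $TOMCAT_HOME/webapps/solr/WEB-INF/lib/
    # ...then restart Tomcat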
merge indexes command
Hi,

Can someone pls suggest how to use this command as a part of a linux script:

http://localhost:8983/solr/admin/cores?action=mergeindexes&core=core0&indexDir=/opt/solr/core1/data/index&indexDir=/opt/solr/core2/data/index

Will just adding curl at the beginning help? I tried this but it gives the error:-

Missing required parameter: core

Any help is deeply appreciated.

Thanks and Rgds,
Mark.
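One likely cause, assuming the URL was passed to curl unquoted: the shell treats each bare '&' as a command separator, so everything from core=... onward never reaches curl, which matches the "Missing required parameter: core" error. Quoting the whole URL delivers every parameter:

    # single quotes keep the '&'-separated parameters together
    curl 'http://localhost:8983/solr/admin/cores?action=mergeindexes&core=core0&indexDir=/opt/solr/core1/data/index&indexDir=/opt/solr/core2/data/index'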
test mail... my mails to solr-user@lucene.apache.org are bouncing ... sorry for any inconvenience
Hi,

Users pls ignore this mail. I am just sending a test mail to check whether my user id is okay. The mails I am sending to this group have been bouncing since yesterday. Pls excuse me for any inconvenience.

Thanks and Rgds,
Mark
index merge
Hi,

I have a doubt regarding index merging:-

I have set up 2 cores, COREX and COREY.
COREX - always serves user requests
COREY - gets updated with the latest values (dataDir is in a different location from COREX)

I tried merging coreX and coreY at the end of COREY getting updated with the latest data values, so that COREX and COREY both have the latest data and the user who always queries COREX gets the latest data. Pls find the various approaches I followed and the commands used.

I tried these merges:-

COREX = COREX and COREY merged
curl 'http://localhost:8983/solr/admin/cores?action=mergeindexes&core=coreX&indexDir=/opt/solr/coreX/data/index&indexDir=/opt1/solr/coreY/data/index'

COREX = COREY and COREY merged
curl 'http://localhost:8983/solr/admin/cores?action=mergeindexes&core=coreX&indexDir=/opt/solr/coreY/data/index&indexDir=/opt1/solr/coreY/data/index'

COREX = COREY and COREA merged (COREA just contains the initial 2 seed segments.. a dummy core)
curl 'http://localhost:8983/solr/admin/cores?action=mergeindexes&core=coreX&indexDir=/opt/solr/coreY/data/index&indexDir=/opt1/solr/coreA/data/index'

When I check the record count in COREX and COREY, COREX always contains about double of what COREY has. Is everything fine here and just the record count is different, or is there something wrong?

Note:- I have only 2 cores here and I tried the X=X+Y approach, X=Y+Y and X=Y+A, where A is a dummy index. Never have the record counts matched after the merging is done.

Can someone please help me understand why this record count difference occurs and whether there is anything fundamentally wrong in my approach.

Thanks and Rgds,
Mark.
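A quick way to compare the counts after each merge (core names as above; numFound in the response header is the document count):

    curl 'http://localhost:8983/solr/coreX/select?q=*:*&rows=0'
    curl 'http://localhost:8983/solr/coreY/select?q=*:*&rows=0'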
Fwd: index merge
Hi,

I have created 2 identical cores, coreX and coreY (both have different dataDir values, but their index is the same).

coreX - always serves the request when a user performs a search.
coreY - the updates happen to this core, and then I need to synchronize it with coreX after the update process so that coreX also has the latest data in it. After coreX and coreY are synchronized, both should be identical again.

For this purpose I tried core merging of coreX and coreY once coreY is updated with the latest set of data, but I find coreX to contain double the record count of coreY (coreX = coreX+coreY).

Is there a problem in using the MERGE concept here? If it is wrong, can someone pls suggest the best approach. I tried the various merges explained in my previous mail.

Any help is deeply appreciated.

Thanks and Rgds,
Mark.

-- Forwarded message --
From: Mark Fletcher
Date: Sat, Mar 6, 2010 at 9:17 AM
Subject: index merge
To: solr-user@lucene.apache.org
Cc: goks...@gmail.com
Re: index merge
Hi Shalin,

Thank you for the reply. I got your point. So I understand merge will just duplicate things.

I ran the SWAP command. Now:-
COREX has the dataDir pointing to the updated dataDir of COREY, so COREX has the latest.
Again, COREY (on which the update regularly runs) is pointing to the old index of COREX, so it now doesn't have the most updated index.

Now shouldn't I update the index of COREY (now pointing to the old COREX) so that it has the same footprint as COREX (which has the latest COREY index), so that when the update again happens to COREY it has the latest and I again do the SWAP?

Is physically copying the index named COREY (the latest, and now the dataDir of COREX after the SWAP) over the index COREX (now the dataDir of COREY.. the original non-updated index of COREX) the best way to do this, or is there a better option?

Once again, later when COREY is again updated with the latest, I will run the SWAP again and it will be fine, with COREX again pointing to its original dataDir (now the updated one). So every even SWAP command run will point COREX back to its original dataDir (same case with COREY).

My only concern is, after the SWAP is done, updating the old index (which was serving previously and is now replaced by the new index). What is the best way to do that? Physically copy the latest index over the old one and bring it in sync with the latest one, so that by the time it is to get the latest updates it has the latest in it, the new ones can be added to it, and it becomes the latest and is again swapped?

Please share your opinion. Once again your help is appreciated. I have been going in circles with multiple indexes for some days!

Thanks and Rgds,
Mark.

On Mon, Mar 8, 2010 at 7:45 AM, Shalin Shekhar Mangar <shalinman...@gmail.com> wrote: > Hi Mark, > > On Sun, Mar 7, 2010 at 6:20 PM, Mark Fletcher > wrote: > > > > > I have created 2 identical cores coreX and coreY (both have different > > dataDir values, but their index is same). > > coreX - always serves the request when a user performs a search. > > coreY - the updates will happen to this core and then I need to > synchronize > > it with coreX after the update process, so that coreX also has the > > latest data in it. After coreX and coreY are synchronized, > both > > should again be identical again. > > > > For this purpose I tried core merging of coreX and coreY once coreY is > > updated with the latest set of data. But I find coreX to be containing > > double the record count as in coreY. > > (coreX = coreX+coreY) > > > > Is there a problem in using MERGE concept here. If it is wrong can some > one > > pls suggest the best approach. I tried the various merges explained in my > > previous mail. > > > Index merge happens at the Lucene level which has no idea about uniqueKeys. > Therefore when you merge two indexes containing exactly the same documents > (by uniqueKey), you get double the document count. > > Looking at your scenario, it seems to me that what you want to do is a swap > operation. coreX is serving the requests, coreY is updated and now you can > swap coreX with coreY so that new requests hit the updated index. I suggest > you look at the swap operation instead of index merge. > > -- > Regards, > Shalin Shekhar Mangar.
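For reference, a sketch of the swap call itself (core names from this thread; CoreAdmin SWAP exchanges the two registered cores atomically, so both names keep pointing at live indexes):

    curl 'http://localhost:8983/solr/admin/cores?action=SWAP&core=coreX&other=coreY'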
Re: index merge
Hi Shalin,

Thank you for the mail. My main purpose of having 2 identical cores
COREX - always serves user requests
COREY - once every day, takes the updates/latest data and passes it on to COREX
is:-

Say I have only one core, COREY, and a request comes to COREY while the update of the latest data is happening on it. Wouldn't that degrade the performance of the requests at that point in time?

So I was planning to keep COREX and COREY always identical. Once COREY has the latest, it should somehow sync with COREX so that COREX also now has the latest. COREY keeps getting the updates at a particular time of day and will again pass them on to COREX. This process continues every day.

What is the best possible way to implement this?

Thanks,
Mark.

On Mon, Mar 8, 2010 at 9:53 AM, Shalin Shekhar Mangar <shalinman...@gmail.com> wrote: > Hi Mark, > > On Mon, Mar 8, 2010 at 7:38 PM, Mark Fletcher <mark.fletcher2...@gmail.com> wrote: > >> >> I ran the SWAP command. Now:- >> COREX has the dataDir pointing to the updated dataDir of COREY. So COREX >> has the latest. >> Again, COREY (on which the update regularly runs) is pointing to the old >> index of COREX. So this now doesnt have the most updated index. >> >> Now shouldn't I update the index of COREY (now pointing to the old COREX) >> so that it has the latest footprint as in COREX (having the latest COREY >> index) so that when the update again happens to COREY, it has the latest and >> I again do the SWAP. >> >> Is a physical copying of the index named COREY (the latest and now dataDir >> of COREX after SWAP) to the index COREX (now the dataDir of COREY.. the >> original non-updated index of COREX) the best way for this or is there any >> other better option. >> >> Once again, later when COREY is again updated with the latest, I will run >> the SWAP again and it will be fine with COREX again pointing to its original >> dataDir (now the updated one). So every even SWAP command run will point >> COREX back to its original dataDir. (same case with COREY). >> >> My only concern is after the SWAP is done, updating the old index (which >> was serving previously and now replaced by the new index). What is the best >> way to do that? Physically copy the latest index to the old one and make it >> in sync with the latest one so that by the time it is to get the latest >> updates it has the latest in it so that the new ones can be added to this >> and it becomes the latest and is again swapped? >> > > Perhaps it is best if we take a step back and understand why you need two > identical cores? > > -- > Regards, > Shalin Shekhar Mangar.
Re: index merge
Hi All,

Thank you for the very valuable suggestions. I am planning to try using the Master - Slave configuration.

Best Rgds,
Mark.

On Mon, Mar 8, 2010 at 11:17 AM, Mark Miller wrote: > On 03/08/2010 10:53 AM, Mark Fletcher wrote: > >> Hi Shalin, >> >> Thank you for the mail. >> My main purpose of having 2 identical cores >> COREX - always serves user request >> COREY - every day once, takes the updates/latest data and passes it on to >> COREX. >> is:- >> >> Suppose say I have only one COREY and suppose a request comes to COREY >> while >> the update of the latest data is happening on to it. Wouldn't it degrade >> performance of the requests at that point of time? >> >> > Yes - but you're not going to help anything by using two indexes - the best you > can do is use two boxes. 2 indexes on the same box will actually > be worse than one if they are identical and you are swapping between them. > Writes on an index will not affect reads in the way you are thinking - only > in that it uses IO and CPU that the read process can't. That's going to > happen with 2 indexes on the same box too - except now you have way more > data to cache and flip between, and you can't take any advantage of things > just being written possibly being in the cache for reads. > > Lucene indexes use a write-once strategy - when writing new segments, you > are not touching the segments being read from. Lucene is already doing the > index juggling for you at the segment level. > > > So I was planning to keep COREX and COREY always identical. Once COREY has >> the latest it should somehow sync with COREX so that COREX also now has >> the >> latest. COREY keeps on getting the updates at a particular time of day and >> it will again pass it on to COREX. This process continues everyday. >> >> What is the best possible way to implement this? >> >> Thanks, >> >> Mark. >> >> >> On Mon, Mar 8, 2010 at 9:53 AM, Shalin Shekhar Mangar< >> shalinman...@gmail.com> wrote: >> >> >> >>> Hi Mark, >>> >>> On Mon, Mar 8, 2010 at 7:38 PM, Mark Fletcher< >>> mark.fletcher2...@gmail.com> wrote: >>> >>> >>> >>>> I ran the SWAP command. Now:- >>>> COREX has the dataDir pointing to the updated dataDir of COREY. So COREX >>>> has the latest. >>>> Again, COREY (on which the update regularly runs) is pointing to the old >>>> index of COREX. So this now doesnt have the most updated index. >>>> >>>> Now shouldn't I update the index of COREY (now pointing to the old >>>> COREX) >>>> so that it has the latest footprint as in COREX (having the latest COREY >>>> index)so that when the update again happens to COREY, it has the latest >>>> and >>>> I again do the SWAP. >>>> >>>> Is a physical copying of the index named COREY (the latest and now >>>> datDir >>>> of COREX after SWAP) to the index COREX (now the dataDir of COREY.. the >>>> orginal non-updated index of COREX) the best way for this or is there >>>> any >>>> other better option. >>>> >>>> Once again, later when COREY is again updated with the latest, I will >>>> run >>>> the SWAP again and it will be fine with COREX again pointing to its >>>> original >>>> dataDir (now the updated one).So every even SWAP command run will point >>>> COREX back to its original dataDir. (same case with COREY). >>>> >>>> My only concern is after the SWAP is done, updating the old index (which >>>> was serving previously and now replaced by the new index). What is the >>>> best way to do that?
Physically copy the latest index to the old one and make >>>> it >>>> in sync with the latest one so that by the time it is to get the latest >>>> updates it has the latest in it so that the new ones can be added to >>>> this >>>> and it becomes the latest and is again swapped? >>>> >>>> >>>> >>> Perhaps it is best if we take a step back and understand why you need two >>> identical cores? >>> >>> -- >>> Regards, >>> Shalin Shekhar Mangar. >>> >>> >>> >> >> > > > -- > - Mark > > http://www.lucidimagination.com > > > >
create core with separate solrconfig.xml
Hi,

I wanted to configure one core as master and one core as slave. This is my existing configuration:-

In my SOLR_HOME I have conf/schema.xml, conf/solrconfig.xml and the other config files, from when no core was present. Also in my SOLR_HOME are solr.xml and coreA, created using the CREATE command for cores. I have my other core coreB's index in a different dataDir.

I believe in this configuration both cores share the same schema.xml and solrconfig.xml. I added the master/slave replication code in my {SOLR_HOME}/conf/solrconfig.xml: the master set to replicate after optimize (with 00:00:10 as the interval), and just below that I specified the slave, with the masterUrl set to {specified the instanceDir}/coreA/replication, internal compression, timeouts of 5000 and 1, and a username and password.

When I optimize coreA, replication to coreB doesn't happen. CoreA (my supposed-to-be master here) gets the new values but not coreB. When I tried the startup option in the first block of the replication config, it gave a lucene write error in the index, so I went for optimize.

Is there something wrong here, or do I need to have separate solrconfig.xml files for coreA and coreB to clearly indicate who is master and who is slave, by including only one of the replication sections in the corresponding solrconfig.xml, rather than have a common solrconfig.xml and specify both in that?

If I need to specify separate solrconfig.xml for both cores, how do I do that?

Any help is appreciated.

Thanks and Rgds,
Mark
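For context, a minimal sketch of what a ReplicationHandler section with both roles in one shared solrconfig.xml typically looks like; the parameter names are from the Solr replication wiki, the values are the ones mentioned above, and the URL is a placeholder (the masterUrl must be an HTTP URL pointing at the master core, not an instanceDir, which is the point Shalin raises in the reply):

    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="master">
        <str name="replicateAfter">optimize</str>
      </lst>
      <lst name="slave">
        <!-- an HTTP URL to the master core, not a filesystem path -->
        <str name="masterUrl">http://localhost:8983/solr/coreA/replication</str>
        <str name="pollInterval">00:00:10</str>
        <str name="compression">internal</str>
        <str name="httpConnTimeout">5000</str>
        <str name="httpBasicAuthUser">username</str>
        <str name="httpBasicAuthPassword">password</str>
      </lst>
    </requestHandler>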
Re: create core with separate solrconfig.xml
Hi Shalin,

Thank you for your reply. I think I mixed 2 matters in my prev mail (replication and core creation). So let me first get help for my CORES set up.

My current set up is:- In my SOLR_HOME I have the conf/ config files (like schema.xml, solrconfig.xml etc...).

I created my new core, say coreA, using the command:-

http://localhost:8983/solr/admin/cores?action=CREATE&name=coreA&instanceDir={SOLR_HOME}&config=solrconfig.xml&schema=schema.xml&dataDir={SOLR_HOME}/core3/solr/data/

Note:- the place-holder {SOLR_HOME} stands for my SOLR_HOME, which I have fully substituted there.

As a result of running this query, I had a new directory coreA created in my SOLR_HOME. But inside coreA I didn't find a new conf directory with the configuration files specific to my new coreA (otherwise, where should I be looking to see the schema.xml and solrconfig.xml specific to coreA?). It seems it is still referring to the common schema.xml and solrconfig.xml inside my {SOLR_HOME}/conf. Where do I see the solrconfig.xml and schema.xml specific to coreA? For now only the dataDir has been created for coreA, in {SOLR_HOME}/core3/solr/data/.

Thanks and Rgds,
Mark

On Mon, Mar 15, 2010 at 4:15 AM, Shalin Shekhar Mangar <shalinman...@gmail.com> wrote:
> On Mon, Mar 15, 2010 at 6:12 AM, Mark Fletcher wrote:
>>
>> I wanted to configure one core as Master and one core as slave.
>> This is my existing configuration:-
>>
>> In my SOLR_HOME I have conf/schema.xml, conf/solrconfig.xml and the others
>> when no core was present. Also in my SOLR_HOME are solr.xml and coreA
>> created using the CREATE command for cores.
>>
>> I have my other coreB's index in a different dataDir.
>>
>> I believe in this configuration both the cores share the same schema.xml
>> and solrconfig.xml. I added the master slave replication code in my
>> {SOLR_HOME}/conf/solrconfig.xml: the master set to replicate after
>> optimize, with 00:00:10 as the interval.
>>
>> Just below that I specified the slave, with the masterUrl
>> {specified the instanceDir}/coreA/replication
>
> The masterUrl is a HTTP URL. I'm not sure if you have specified a HTTP URL here.
>
>> internal compression, timeouts of 5000 and 1, and a username and password.
>>
>> When I optimize coreA, replication to coreB doesn't happen. CoreA (my
>> supposed to be master here) gets the new values but not coreB. When I tried
>> the startup option in the first block of replication it gave lucene write
>> error in the index so I went for optimize.
>
> What was the lucene write error that you saw? Can you paste the stack trace?
>
>> Is there something wrong here or do I need to have separate solrconfig.xml
>> for coreA and coreB to clearly indicate who is master and who is slave by
>> including only one of the replication codes in the corresponding
>> solrconfig.xml rather than have a common solrconfig.xml and specify both in
>> that.
>>
>> If I need to specify separate solrconfig.xml for both cores, how do I do that?
>
> When you create the core using the CoreAdmin, you can specify an alternate
> solrconfig.xml through the "config" parameter.
>
> --
> Regards,
> Shalin Shekhar Mangar.
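Building on Shalin's pointer, a sketch of creating each core with its own instanceDir and config files (paths and file names here are hypothetical; each instanceDir needs its own conf/ directory containing the files named by the config and schema parameters):

    curl 'http://localhost:8983/solr/admin/cores?action=CREATE&name=coreA&instanceDir=/opt/solr/coreA&config=solrconfig-master.xml&schema=schema.xml'
    curl 'http://localhost:8983/solr/admin/cores?action=CREATE&name=coreB&instanceDir=/opt/solr/coreB&config=solrconfig-slave.xml&schema=schema.xml'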
some synonym clarifications
Hi,

Just needed some help to understand the following synonym mappings:-

1. aaa => aaaa
does it mean:- if the user queries for aaa it is replaced with aaaa and documents matching aaaa are searched for, or does it mean if the user queries for aaa, documents with aaa as well as aaaa are looked for?

2. bbb => bbbb1 bbbb2
does it mean that if the user queries for bbb, SOLR will look for documents that contain bbbb1 bbbb2?

3. ccc => cccc1,cccc2
does it mean that if the user queries for ccc, SOLR will look for documents that contain cccc1 or cccc2?

4. a\=>a => b\=>b
First of all my doubt is what does the "\" do there. Does it have any special significance? Can someone help me interpret the above.

5. a\,a => b\,b
Can someone help me with this also.

6. fooaaa,baraaa,bazaaa
does this mean that if any of fooaaa or baraaa or bazaaa comes as the search keyword, SOLR will look for documents that contain fooaaa?

7. abc def rose\, atlas method, NY.GNP.PCAP.PP.CD
does this mean a query for any of the above 3 will always be replaced by a query for abc def rose\?

Can someone pls extend some help at your earliest convenience.

Thank you.
Mark.
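For reference, a short sketch of the two rule forms in a synonyms.txt, following the SynonymFilterFactory semantics from the wiki (the terms are the example ones above):

    # Expansion: no "=>". Each term is expanded to all of them at analysis
    # time, so a query for any one also matches documents containing the others.
    fooaaa,baraaa,bazaaa

    # Replacement: with "=>", the term on the left is rewritten to the term(s)
    # on the right, and the original left-hand term does not survive analysis.
    aaa => aaaa

    # "\" escapes characters that are otherwise meaningful to this file's
    # syntax: a literal "=>" or a literal "," inside a term.
    a\=>a => b\=>b
    a\,a => b\,b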
Re: some synonym clarifications
Hi,

Thanks for the mail. I had tried the WIKI. My remaining doubts were mainly:-

1. If we have synonyms specified and they replace your search keyword with the ones specified, wouldn't we face the risk of our original keyword being missed out? What I meant is: if I have a search keyword, say "agriculture", and I replace it with some synonyms, will I never again be able to search directly for "agriculture"? i.e. suppose I have a document which has the term agriculture and none of the synonyms in it. Will that document be retrieved when I search for agriculture, as I have now mapped it to other terms?

2. I am still a bit confused about the interpretation of:-
a\=>a => b\=>b
a\,a => b\,b
abc def rose\, my cap , rose flower
Can you pls give a one-liner explanation for the above. There are some sample entries in the synonyms.txt.

3. If I get some help with the above it will also help me understand the backslash "\" better.

Thanks,
Mark.

On Thu, Mar 18, 2010 at 12:19 PM, Markus Jelsma wrote: > Hi, > > Check out the wiki page on the SynonymFilterFactory. It gives a decent > explanation on the subject. The backslash is just for escaping otherwise > meaningful characters. > > [1]: > http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory > > Cheers, > > On Thursday 18 March 2010 17:10:56 Mark Fletcher wrote: > > Hi, > > > > Just needed some help to understand the following synonym mappings:- > > > > 1. aaa => aaaa > > does it mean:- > > if the user queries for aaa it is replaced with aaaa and documents > > matching aaaa are searched for > > or does it mean > > if the user queries for aaa, documents with aaa as well as aaaa > > are looked for > > > > 2. bbb => bbbb1 bbbb2 > > does it mean that if the user queries for bbb, SOLR will look for > > documents that contain bbbb1 bbbb2 > > > > 3. ccc => cccc1,cccc2 > > does it mean that if the user queries for ccc, SOLR will look for > > documents that contain cccc1 or cccc2 > > > > 4. a\=>a => b\=>b > > First of all my doubt is what does the "\" do there. Does it have > > any special significance. > > Can someone help me interpret the above > > > > 5. a\,a => b\,b > > Can some one help me with this also > > > > 6. fooaaa,baraaa,bazaaa > > does this mean that if any of fooaaa or baraaa or bazaaa comes > > as the search keyword, SOLR will look for documents that contain > > fooaaa > > > > 7. abc def rose\, my cap , rose flower > > does this mean a query for any of the above 3 will always be > > replaced by a query for abc def rose\ > > > > Can some one pls extend some help at your earliest convenience. > > > > Thank you. > > Mark. > > Markus Jelsma - Technisch Architect - Buyways BV > http://www.linkedin.com/in/markus17 > 050-8536620 / 06-50258350
Re: some synonym clarifications
Thanks Markus! I got it.

BR,
Mark.

On Fri, Mar 19, 2010 at 5:50 AM, Markus Jelsma wrote: > > On Thursday 18 March 2010 17:47:45 Mark Fletcher wrote: > > Hi, > > > > Thanks for the mail. I had tried the WIKI. > > > > My remaining doubts were mainly:- > > > > 1. > > If we have synonyms specified and they replace your search keyword with the > > ones specified wouldn't we face a risk of our original keyword missed out. > > What I meant is if I have a keyword for search say "agriculture" and I > > replace it with some synonyms, will I never again be able to search > > directly for "agriculture". ie suppose I have a document which has the > > term agriculture and none of the synonyms in it. Will that document be > > retrieved when I search for agriculture as I have now mapped it to other > > terms. > > It depends whether you let them be replaced. If you omit the => sign, the > terms simply will be expanded to whatever synonyms you specified. I could not > explain it any better than the wiki. > > > 2. > > I am still a bit confused about the interpretation of:- > > a\=>a => b\=>b > > > > a\,a => b\,b > > > > abc def rose\, my cap , rose flower > > > > Can you pls give a one-liner explanation for the above. There are some > > sample entries in the synonyms.txt > > This is escaping otherwise meaningful characters. The , and => are meaningful > to the SynonymFilterFactory and therefore need to be escaped, as you also would > escape certain characters in any language or whatever. You need to escape > quotes in many languages and you must escape the : sign a.o. in your Lucene > queries. > > > 3. If I get some help with the above 3 it will help me understand the > backslash "\" also better. > > Thanks, > > Mark. > > > > On Thu, Mar 18, 2010 at 12:19 PM, Markus Jelsma > wrote: > > Hi, > > > > Check out the wiki page on the SynonymFilterFactory. It gives a decent > > explanation on the subject. The backslash is just for escaping otherwise > > meaningful characters. > > > > [1]: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory > > > > Cheers, > > > > On Thursday 18 March 2010 17:10:56 Mark Fletcher wrote: > > > Hi, > > > > > > Just needed some help to understand the following synonym mappings:- > > > > > > 1. aaa => aaaa > > > does it mean:- > > > if the user queries for aaa it is replaced with aaaa and > > > documents matching aaaa are searched for > > > or does it mean > > > if the user queries for aaa, documents with aaa as well as aaaa > > > are looked for > > > > > > 2. bbb => bbbb1 bbbb2 > > > does it mean that if the user queries for bbb, SOLR will look for > > > documents that contain bbbb1 bbbb2 > > > > > > 3. ccc => cccc1,cccc2 > > > does it mean that if the user queries for ccc, SOLR will look for > > > documents that contain cccc1 or cccc2 > > > > > > 4. a\=>a => b\=>b > > > First of all my doubt is what does the "\" do there. Does it > > > have any special significance. > > > Can someone help me interpret the above > > > > > > 5. a\,a => b\,b > > > Can some one help me with this also > > > > > > 6. fooaaa,baraaa,bazaaa > > > does this mean that if any of fooaaa or baraaa or bazaaa > > > comes as the search keyword, SOLR will look for documents that contain > > > fooaaa > > > > > > 7. abc def rose\, my cap , rose flower > > > does this mean a query for any of the above 3 will always be > > > replaced by a query for abc def rose\ > > > > > > Can some one pls extend some help at your earliest convenience. > > > > > > Thank you. > > > Mark.
> > > > Markus Jelsma - Technisch Architect - Buyways BV > > http://www.linkedin.com/in/markus17 > > 050-8536620 / 06-50258350 > > > > Markus Jelsma - Technisch Architect - Buyways BV > http://www.linkedin.com/in/markus17 > 050-8536620 / 06-50258350 > >
dismax and q.op
Hi,

I am using the dismax handler. I have it set up in my solrconfig.xml. I have *not* used default="true" while setting it up (the standard RH still has default="true"). I haven't specified a value for mm. In my schema.xml I have set the default operator to AND.

When I query, I use the following in my query URL, where my query is, say for example, international monetary fund:-

.../select?q.alt=international+monetary+fund&qt=dismax

My result:- No results; but each of the terms individually gave me results!

I appreciate any help on the following queries:-

1. Will the query look for documents that have international AND monetary AND fund, or is it some other behavior, based on the settings I have mentioned above?

2. Does the default operator specified in schema.xml take effect when we use dismax also, or is it only for the *standard* request handler? If it has an effect, and we specify a value for mm like say 90%, will it override the schema.xml default operator set up?

3. How do q.alt and q differ in behavior in the above case? I found q.alt to be giving me the results which I got when I used the standard RH also, hence used it.

4. When I make a change to the dismax set up I have in solrconfig.xml, I believe I just have to bounce the SOLR server. Do I need to re-index again for the change to take effect?

5. If I use dismax, how do I see the ANALYSIS feature on the admin console otherwise used for the *standard* RH?

Thanks for your patience.

Best Rgds,
Mark.
Re: dismax and q.op
Hi Hoss,

Thank you so much for your time. Regarding the last one, I myself got confused when I posed the question; I got it after your reply. I think I was actually looking for something like the debugQuery=on option, which I found later.

Best Regards,
Mark.

On Tue, Mar 23, 2010 at 6:56 PM, Chris Hostetter wrote:
>
> : *I haven't mentioned value for mm*
> ...
> : My result:- No results; but each of the terms individually gave me results!
>
> http://wiki.apache.org/solr/DisMaxRequestHandler#mm_.28Minimum_.27Should.27_Match.29
>
> "The default value is 100% (all clauses must match)"
>
> : 2. Does the default operator specified in schema.xml take effect when we use
> : dismax also or is it only for the *standard* request handler. If it has an
>
> dismax doesn't look at the default operator, or q.op.
>
> : 3. How does q.alt and q differ in behavior in the above case. I found q.alt
> : to be giving me the results which I got when I used the standard RH also.
> : Hence used it.
>
> q.alt is used if and only if there is no q param (or the q param is blank)
> ... the number of matches "q" gets, or the value of "mm", make no
> difference.
>
> : 4. When I make a change to the dismax set up I have in solrconfig.xml I
> : believe i just have to bounce the SOLR server. Do i need to re-index again
> : for the change to take effect
>
> no ... changes to "query" time options like your SearchHandler configs
> don't require reindexing .. changes to your schema.xml *may* require
> reindexing.
>
> : 5. If I use the dismax how do I see the ANALYSIS feature on the admin
> : console otherwise used for *standard* RH.
>
> I'm afraid i don't understand this question ... analysis.jsp just shows
> you the index and query time analysis that is performed when certain
> fields are used -- it doesn't know/care about your choice of parser ... it
> knows nothing about query parser syntax.
>
> -Hoss
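For reference, a sketch of relaxing mm per request, following the wiki page Hoss cites (host and handler name are placeholders; dismax accepts its parameters in the URL as well as in solrconfig.xml):

    # mm=1: at least one of the three terms must match, instead of the 100% default
    curl 'http://localhost:8983/solr/select?qt=dismax&q=international+monetary+fund&mm=1&debugQuery=on'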
Re: Read Time Out Exception while trying to upload a huge SOLR input xml
Hi Erick, Shawn,

Thank you for your reply. Luckily, just on the second attempt my 13GB SOLR XML (more than a million docs) went into SOLR fine without any problem, and I uploaded another 2 sets of 1.2 million+ docs without any hassle. I will try more, smaller-sized XMLs next time, as well as the auto commit suggestion.

Best Rgds,
Mark.

On Thu, Apr 1, 2010 at 6:18 PM, Shawn Smith wrote:
> The error might be that your http client doesn't handle really large
> files (32-bit overflow in the Content-Length header?) or something in
> your network is killing your long-lived socket? Solr can definitely
> accept a 13GB xml document.
>
> I've uploaded large files into Solr successfully, including recently a
> 12GB XML input file with ~4 million documents. My Solr instance had
> 2GB of memory and it took about 2 hours. Solr streamed the XML in
> nicely. I had to jump through a couple of hoops, but in my case it
> was easier than writing a tool to split up my 12GB XML file...
>
> 1. I tried to use curl to do the upload, but it didn't handle files
> that large. For my quick and dirty testing, netcat (nc) did the
> trick--it doesn't buffer the file in memory and it doesn't overflow
> the Content-Length header. Plus I could pipe the data through pv to
> get a progress bar and estimated time of completion. Not recommended
> for production!
>
> FILE=documents.xml
> SIZE=$(stat --format %s $FILE)
> (echo "POST /solr/update HTTP/1.1
> Host: localhost:8983
> Content-Type: text/xml
> Content-Length: $SIZE
> " ; cat $FILE ) | pv -s $SIZE | nc localhost 8983
>
> 2. Indexing seemed to use less memory if I configured Solr to auto
> commit periodically in solrconfig.xml. This is what I used:
>
> <autoCommit>
>   <maxDocs>25000</maxDocs>
>   <maxTime>30</maxTime>
> </autoCommit>
>
> Shawn
>
> On Thu, Apr 1, 2010 at 10:10 AM, Erick Erickson wrote:
> > Don't do that. For many reasons. By trying to batch so many docs
> > together, you're just *asking* for trouble. Quite apart from whether it'll
> > work once, having *any* HTTP-based protocol work reliably with 13G is
> > fragile...
> >
> > For instance, I don't want to have my app know whether the XML parsing in
> > SOLR parses the entire document into memory before processing or
> > not. But I sure don't want my application to change behavior if SOLR
> > changes its mind and wants to process the other way. My perfectly
> > working application (assuming an event-driven parser) could
> > suddenly start requiring over 13G of memory... Oh my aching head!
> >
> > Your specific error might even be dependent upon GCing, which will
> > cause it to break differently, sometimes, maybe..
> >
> > So do break things up and transmit multiple documents. It'll save you
> > a world of hurt.
> >
> > HTH
> > Erick
> >
> > On Thu, Apr 1, 2010 at 4:34 AM, Mark Fletcher wrote:
> >
> >> Hi,
> >>
> >> For the first time I tried uploading a huge input SOLR xml having about 1.2
> >> million *docs* (13GB in size).
> >> After some time I get the following exception:-
> >>
> >> The server encountered an internal error ([was class java.net.SocketTimeoutException] Read timed out
> >> java.lang.RuntimeException: [was class java.net.SocketTimeoutException] Read timed out
> >> at com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
> >> at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
> >> at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
> >> at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
> >> at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:279)
> >> at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:138)
> >> at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
> >> at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
> >> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
> >> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
> >> at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
> >> at org.apache.solr.servlet.SolrDispatchFilter.doFilter
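Following Erick's advice to batch documents rather than post one 13GB file, a sketch of posting pre-split files (file names are hypothetical; Solr's XML is not line-oriented, so the smaller <add> files should be produced at export time rather than with split(1)):

    for f in docs-part-*.xml; do
      # --data-binary preserves the XML exactly; -H sets the content type Solr expects
      curl 'http://localhost:8983/solr/update' -H 'Content-Type: text/xml' --data-binary @"$f"
    done
    # commit once at the end (or rely on autoCommit as Shawn suggests above)
    curl 'http://localhost:8983/solr/update' -H 'Content-Type: text/xml' --data-binary '<commit/>'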
one particular doc in results should always come first for a particular query
Hi,

Suppose I search for the word international. A particular record (say recordX) that I am looking for currently comes as the Nth result. I have a requirement that when a user queries for international, recordX must always be the first result. How can I achieve this?

Note:- When a user searches with a *different* keyword, recordX need not be the expected first result record; it may be a different record that has to come first in the results for that keyword.

Is there a way to achieve this requirement? I am using dismax.

Thanks in advance.
BR,
Mark
exact match coming as second record
Hi,

I am using the dismax handler. I have a field named myfield which has a value, say, XXX.YYY.ZZZ. I have boosted myfield^20.0. Even with such a high boost (in fact, among the qf fields specified this field has the max boost given), when I search for XXX.YYY.ZZZ I see my record as the second one in the results, and a record of the form XXX.YYY.ZZZ.AAA.BBB appears as the first one.

Can anyone help me understand why this is so, as I thought an exact match on a heavily boosted field would give the exact-match record first in dismax.

Thanks and Rgds,
Mark
Re: exact match coming as second record
Hi Erick,

Many thanks for your mail! Please find attached the debugQuery results.

Thanks!
Mark

On Mon, Apr 5, 2010 at 7:38 PM, Erick Erickson wrote: > What do you get back when you specify &debugQuery=on? > > Best > Erick > > On Mon, Apr 5, 2010 at 7:31 PM, Mark Fletcher > wrote: > > > Hi, > > > > I am using the dismax handler. > > I have a field named *myfield* which has a value say XXX.YYY.ZZZ. I have > > boosted myfield^20.0. > > Even with such a high boost (in fact among the qf fields specified this > > field has the max boost given), when I search for XXX.YYY.ZZZ I see my > > record as the second one in the results and a record of the form > > XXX.YYY.ZZZ.AAA.BBB is appearing as the first one. > > > > Can anyone help me understand why this is so, as I thought an exact match > > on a heavily boosted field would give the exact match record first in > > dismax. > > > > Thanks and Rgds, > > Mark

A personal note:- I have boosted the id field to the highest among the qf values specified in my dismax. Even then, when I search for an id, say XX.YYY.ZZZ, instead of pushing the record with id=XX.YYY.ZZZ to the first place, it displays another record, XX.YYY.ZZZ.ME.PK, as the first one... There are 4 results in total, but I have included details of only the first and second. I am surprised why XX.YYY.ZZZ doesn't come as the first record even after an exact match is found in it.

My qf fields in dismax:-

name^10.0 id^20.0 subtopic1^1.0 indicator_value^1.0 country_name^1.0 country_code^1.0 source^0.8 database^1.4 definition^1.2 dr_report_name^1.0 dr_header^1.0 dr_footer^1.0 dr_mdx_query^1.0 dr_reportmetadata^1.0 content^1.0 aag_indicators^1.0 type^1.0 text^.3

id^6.0 type:Timeseries^1000.0

Debug Report:-

xx.yyy. xx.yyy.

+DisjunctionMaxQuery((text:"(xx.yyy.zzz xx) yyy "^0.3 | definition:"(xx.yyy.zzz xx) yyy "^0.2 | indicator_value:"(xx.yyy.zzz xx) yyy " | subtopic1:"(xx.yyy.zzz xx) yyy " | dr_report_name:"(xx.yyy.zzz xx) yyy " | dr_reportmetadata:"(xx.yyy.zzz xx) yyy " | dr_footer:"(xx.yyy.zzz xx) yyy " | type:"(xx.yyy.zzz xx) yyy " | country_code:"(xx.yyy.zzz xx) yyy "^2.0 | country_name:"(xx.yyy.zzz xx) yyy "^2.0 | database:"(xx.yyy.zzz xx) yyy "^1.4 | aag_indicators:"(xx.yyy.zzz xx) yyy " | content:"(xx.yyy.zzz xx) yyy " | id:xx.yyy.^1000.0 | dr_mdx_query:"(xx.yyy.zzz xx) yyy " | source:"(xx.yyy.zzz xx) yyy "^0.2 | name:"(xx.yyy.zzz xx) yyy "^10.0 | dr_header:"(xx.yyy.zzz xx) yyy ")~0.01) DisjunctionMaxQuery((id:xx.yyy.^6.0)~0.01) type:timeseries^1000.0

+(text:"(xx.yyy.zzz xx) yyy "^0.3 | definition:"(xx.yyy.zzz xx) yyy "^0.2 | indicator_value:"(xx.yyy.zzz xx) yyy " | subtopic1:"(xx.yyy.zzz xx) yyy " | dr_report_name:"(xx.yyy.zzz xx) yyy " | dr_reportmetadata:"(xx.yyy.zzz xx) yyy " | dr_footer:"(xx.yyy.zzz xx) yyy " | type:"(xx.yyy.zzz xx) yyy " | country_code:"(xx.yyy.zzz xx) yyy "^2.0 | country_name:"(xx.yyy.zzz xx) yyy "^2.0 | database:"(xx.yyy.zzz xx) yyy "^1.4 | aag_indicators:"(xx.yyy.zzz xx) yyy " | content:"(xx.yyy.zzz xx) yyy " | id:xx.yyy.^1000.0 | dr_mdx_query:"(xx.yyy.zzz xx) yyy " | source:"(xx.yyy.zzz xx) yyy "^0.2 | name:"(xx.yyy.zzz xx) yyy "^10.0 | dr_header:"(xx.yyy.zzz xx) yyy ")~0.01 (id:xx.yyy.^6.0)~0.01 type:timeseries^1000.0

0.15786289 = (MATCH) sum of:
  6.086512E-4 = (MATCH) max plus 0.01 times others of:
    6.086512E-4 = (MATCH) weight(text:"(xx.yyy. sp) yyy "^0.3 in 1004), product of:
      7.562088E-4 = queryWeight(text:"(xx.yyy. xx) yyy "^0.3), product of:
        0.3 = boost
        20.604721 = idf(text:"(xx.yyy. xx) yyy "^0.3)
        1.2233584E-4 = queryNorm
      0.8048719 = (MATCH) fieldWeight(text:"(xx.yyy. xx) yyy "^0.3 in 1004), product of:
        1.0 = tf(phraseFreq=1.0)
        20.604721 = idf(text:"(xx.yyy. xx) yyy "^0.3)
        0.0390625 = fieldNorm(field=text, doc=1004)
  0.15725423 = (MATCH) weight(type:timeseries^1000.0 in 1004), product of:
    0.1387005 = queryWeight(type:timeseries^1000.0), product of:
      1000.0 = boost
      1.1337683 = idf(docFreq=1054, maxDocs=1206)
      1.2233584E-4 = queryNorm
    1.1337683 = (MATCH) fieldWeight(type:timeseries in 1004), product of:
      1.0 = tf(termFreq(type:timeseries)=1)
      1.1337683 = idf(
Elevate query and standard RH
Hi,

I found the elevate query working fine with the dismax handler when I added the searchComponent to my dismax RH. I couldn't get the desired results when trying it with the standard RequestHandler. I hope it works just like that with the standard RH also.

Thanks and Rgds,
Mark.
Re: one particular doc in results should always come first for a particular query
Thanks Erick, Chris! I tried the Query Elevation and it seems to be working fine for me.

Best Rgds,
Mark.

On Mon, Apr 5, 2010 at 7:40 PM, Chris Hostetter wrote:
>
> : If that's the case, you could copy the magic keyword to a different field
> : (say magic_keyword) and boost it right into orbit as an OR clause
> : (magic_keyword:bonkers ^1). This kind of assumes that a magic keyword
> : corresponds to one and only one document
> :
> : If this is way off base, perhaps you could characterize how keywords map to
> : specific documents you want at the top.
>
> This smells like...
>
> http://wiki.apache.org/solr/QueryElevationComponent
>
> -Hoss
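For the archive, a sketch of the elevation setup this corresponds to, based on the QueryElevationComponent wiki page (the component name, field type and doc id here are hypothetical placeholders, not the exact config used). In solrconfig.xml:

    <searchComponent name="elevator" class="solr.QueryElevationComponent">
      <str name="queryFieldType">string</str>
      <str name="config-file">elevate.xml</str>
    </searchComponent>

    <requestHandler name="dismax" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="defType">dismax</str>
      </lst>
      <arr name="last-components">
        <str>elevator</str>
      </arr>
    </requestHandler>

And in elevate.xml, pinning recordX to the top for the query "international" (as in the original question):

    <elevate>
      <query text="international">
        <doc id="recordX" />
      </query>
    </elevate>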
dismax and qf
Hi,

I use dismax and have specified my fields to be boosted in the qf parameter in solrconfig.xml. What I understand is that in the search URL I can also specify these qf values, by adding &qf=field1^100 field2^200, which should override the boost specified for each field in solrconfig.xml.

But when I change the boosts like this in my URL for the various fields, I don't find any difference at all in the order in which the results come. I have many fields specified in qf. I changed their boost, say from field1^2.0 to field1^1000, in the query URL, but I find no change in the order in which the results come.

Is there any problem in using the qf parameter like this as part of the query URL, varying the boosts (relevancy) to check how the order of results varies? My aim is to see how the results would change if I boosted some fields more than others, or vice versa decreased the boost of some fields compared to others.

Could someone pls help!

Thanks.
Mark.
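Two things worth checking, sketched below: the qf value must be URL-encoded (an unencoded space ends the parameter), and if qf is set under "invariants" rather than "defaults" in the handler's solrconfig.xml section it cannot be overridden per request. Host and field names are hypothetical:

    # space -> + (or %20) and ^ -> %5E inside the qf parameter
    curl 'http://localhost:8983/solr/select?qt=dismax&q=international&qf=field1%5E1000+field2%5E200&debugQuery=on'
    # debugQuery=on shows the parsed boosts actually applied to each field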