Solr update document issue

2013-12-26 Thread mohammad
ection; }
    }

    public void addDocument(SolrInputDocument doc) throws SolrServerException, IOException {
        solrServer.add(doc);
        solrServer.commit();
    }

    public void updateDocument(SolrInputDocument doc) throws SolrServerException, IOException {
        solrServer.add(doc);
        solrServer.commit();
    }
}

Best Thanks,
Mohammad yaseen



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-update-document-issue-tp4108214.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr update document issue

2013-12-26 Thread mohammad
Hello all,

In our last project we use Solr as the search engine to search for assets.

We have a feature to search for a product by its summary text. The product
itself is a "container for a set of products" (the parent), so each time we
add a new product under it, the summary of the parent product should be
updated to include the new text.

So in this case, each time we add a new child product, the parent product's
summary text should be updated.
Sometimes the added summary text list is empty and sometimes not, but in the
case of an empty list, all fields of the document are deleted except _version_
and id.

To avoid this problem we skip the update in the case of an empty list.
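The symptom above — every field except id and _version_ disappearing — is what a full-document add produces when the submitted document carries only a few fields, since a plain add replaces the stored document wholesale. Solr 4.x (including 4.4) also supports atomic updates that modify a single field and keep the rest. A minimal sketch of the two payload shapes, in Python for brevity (the thread's client code is SolrJ; ids and values here are made up):

```python
# Sketch of the difference between a full-document add and a Solr 4.x
# atomic update. "prod-42" and the summary values are illustrative.

def full_replace_doc(doc_id, summaries):
    # A plain add replaces the stored document wholesale: if `summaries`
    # is empty, every other field of the old document is lost.
    return {"id": doc_id, "summary": summaries}

def atomic_add_doc(doc_id, new_summaries):
    # An atomic update touches only the named field: "add" appends values
    # to a multiValued field and leaves all other stored fields intact.
    return {"id": doc_id, "summary": {"add": new_summaries}}

payload = atomic_add_doc("prod-42", ["child product text"])
```

In SolrJ the same shape is expressed by setting a field's value to a Map whose key is the modifier (e.g. "add" or "set"). Note that atomic updates require the affected fields to be stored.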

*A. in case of update with an empty list:*

   1. added document is:

   2. after update:

*B. in case of a non-empty list in the update request:*

   1. same as in A.1.
   2.
 


I use SolrJ and Solr 4.4.0.

My schema document:
  

My Java code to test this scenario is as follows:

//TestingSolrUpdateDoc.java

//SolrConnection .java

Best Thanks,
Mohammad yaseen



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-update-document-issue-tp4108214p4108215.html
Sent from the Solr - User mailing list archive at Nabble.com.


problem: zooKeeper Integration with solr

2011-06-06 Thread Mohammad Shariq
Hi folks,
I am using Solr to index around 100mn docs.
Now I am planning to move to a cluster-based Solr setup, so that I can scale
the indexing and searching process.
Since SolrCloud is still in the development stage, I am trying to index in a
shard-based environment using ZooKeeper.

I followed the steps from
http://wiki.apache.org/solr/ZooKeeperIntegration but I am still not
able to do distributed search.
Once I index the docs in one shard, I am not able to query them from the other
shard, and vice-versa (using the query
http://localhost:8180/solr/select/?q=itunes&version=2.2&start=0&rows=10&indent=on
)

I am running Solr 3.1 on Ubuntu 10.10.

Please help me.


-- 
Thanks and Regards
Mohammad Shariq


Re: problem: zooKeeper Integration with solr

2011-06-07 Thread Mohammad Shariq
How is this method
(http://localhost:8983/solr/select?shards=*:/,**:/*&indent=true&q=)
better than ZooKeeper? Could you please point me to a performance doc?
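For context, the shards approach in the quoted reply works by listing every participating core on each request; a core queried without the parameter searches only its own index, which matches the symptom in the original post. A small sketch of building such a request (hostnames are placeholders):

```python
from urllib.parse import urlencode

def shard_query_url(host, shards, q, rows=10):
    # Every core that should take part in the distributed search is
    # listed in the shards parameter, including the core receiving the
    # request; without it, a core only searches its own index.
    params = {"shards": ",".join(shards), "q": q, "rows": rows}
    return "http://%s/solr/select?%s" % (host, urlencode(params))

url = shard_query_url("localhost:8180",
                      ["localhost:8180/solr", "otherhost:8180/solr"],
                      "itunes")
```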


On 7 June 2011 08:18, bmdakshinamur...@gmail.com  wrote:

> Instead of integrating zookeeper, you could create shards over multiple
> machines and specify the shards while you are querying solr.
> Eg: http://localhost:8983/solr/select?shards=*:/ Path>,*
> *:/*&indent=true&q=
>
>
>
> On Mon, Jun 6, 2011 at 5:59 PM, Mohammad Shariq  >wrote:
>
> > Hi folk,
> > I am using solr to index around 100mn docs.
> > now I am planning to move to cluster based solr, so that I can scale the
> > indexing and searching process.
> > since solrCloud is in development  stage, I am trying to index in shard
> > based environment using zooKeeper.
> >
> > I followed the steps from
> > http://wiki.apache.org/solr/ZooKeeperIntegrationthen also I am not
> > able to do distributes search.
> > Once I index the docs in one shard, not able to query from other shard
> and
> > vice-versa, (using the query
> >
> >
> http://localhost:8180/solr/select/?q=itunes&version=2.2&start=0&rows=10&indent=on
> > )
> >
> > I am running solr3.1 on ubuntu 10.10.
> >
> > please help me.
> >
> >
> > --
> > Thanks and Regards
> > Mohammad Shariq
> >
>
>
>
> --
> Thanks and Regards,
> DakshinaMurthy BM
>



-- 
Thanks and Regards
Mohammad Shariq


Re: Re: Can I update a specific field in solr?

2011-06-08 Thread Mohammad Shariq
Solr doesn't support partial updates.

On 8 June 2011 16:04, ZiLi  wrote:

>
> Thanks very much , I'll re-index a whole document : )
>
>
>
>
> From: Chandan Tamrakar
> Sent: 2011-06-08  18:25:37
> To: solr-user
> Cc:
> Subject: Re: Can I update a specific field in solr?
>
> I think You can do that but you need to re-index a whole document again.
> note that there is nothing like "update"  , its usually delete and then
> add.
> thanks
> On Wed, Jun 8, 2011 at 4:00 PM, ZiLi  wrote:
> > Hi, I try to update a specific field in solr , but I didn't find anyway
> to
> > implement this .
> > Anyone who knows how to ?
> > Any suggestions will be appriciate : )
> >
> >
> > 2011-06-08
> >
> >
> >
> > ZiLi
> >
> --
> Chandan Tamrakar
> *
> *
>



-- 
Thanks and Regards
Mohammad Shariq


Re: solr speed issues..

2011-06-08 Thread Mohammad Shariq
How frequently do you optimize your Solr index?
Optimization also helps in reducing search latency.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-speed-issues-tp2254823p3038794.html
Sent from the Solr - User mailing list archive at Nabble.com.


how to Index and Search non-English Text in solr

2011-06-08 Thread Mohammad Shariq
Hi,
I had set up Solr (Solr 1.4 on Ubuntu 10.10) for indexing news articles in
English, but my requirements now extend to indexing news in other languages too.

This is how my schema looks :



And the "text" Field in schema.xml looks like :



   
   
   
   
   


   
   
   
   
   
   




My problem is:
Now I want to index news articles in other languages too, e.g.
Chinese and Japanese.
How can I modify my text field so that I can index news in other languages
and make it searchable?

Thanks
Shariq





--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-Index-and-Search-non-Eglish-Text-in-solr-tp3038851p3038851.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how to Index and Search non-English Text in solr

2011-06-09 Thread Mohammad Shariq
Can I specify multiple languages in the filter tag in schema.xml? Like below:


   
  
  
  









  


On 8 June 2011 18:47, Erick Erickson  wrote:

> This page is a handy reference for individual languages...
> http://wiki.apache.org/solr/LanguageAnalysis
>
> But the usual approach, especially for Chinese/Japanese/Korean
> (CJK) is to index the content in different fields with language-specific
> analyzers then spread your search across the language-specific
> fields (e.g. title_en, title_fr, title_ar). Stemming and stopwords
> particularly give "surprising" results if you put words from different
> languages in the same field.
>
> Best
> Erick
>
> On Wed, Jun 8, 2011 at 8:34 AM, Mohammad Shariq 
> wrote:
> > Hi,
> > I had setup solr( solr-1.4 on Ubuntu 10.10) for indexing news articles in
> > English, but my requirement extend to index the news of other languages
> too.
> >
> > This is how my schema looks :
> >  > required="false"/>
> >
> >
> > And the "text" Field in schema.xml looks like :
> >
> > 
> >
> >   
> >> words="stopwords.txt" enablePositionIncrements="true"/>
> >generateWordParts="1"
> > generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> > catenateAll="0" splitOnCaseChange="1"/>
> >   
> >> protected="protwords.txt"/>
> >
> >
> >   
> >> ignoreCase="true" expand="true"/>
> >> words="stopwords.txt" enablePositionIncrements="true"/>
> >generateWordParts="1"
> > generateNumberParts="1" catenateWords="0" catenateNumbers="0"
> > catenateAll="0" splitOnCaseChange="1"/>
> >   
> >> protected="protwords.txt"/>
> >
> > 
> >
> >
> > My Problem is :
> > Now I want to index the news articles in other languages to e.g.
> > Chinese,Japnese.
> > How I can I modify my text field so that I can Index the news in other
> lang
> > too and make it searchable ??
> >
> > Thanks
> > Shariq
> >
> >
> >
> >
> >
> > --
> > View this message in context:
> http://lucene.472066.n3.nabble.com/how-to-Index-and-Search-non-Eglish-Text-in-solr-tp3038851p3038851.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>



-- 
Thanks and Regards
Mohammad Shariq


Re: how to Index and Search non-English Text in solr

2011-06-09 Thread Mohammad Shariq
Thanks Erick for your help.
I have another silly question.
Suppose I create multiple fieldTypes, e.g. news_English, news_Chinese,
news_Japanese, etc.
After creating these fields, can I copy all of them into a copyField
"defaultquery", like below:



And my "defaultquery" looks like:


Is this the right way to deal with multiple-language indexing and searching?

On 9 June 2011 19:06, Erick Erickson  wrote:

> No, you'd have to create multiple fieldTypes, one for each language
>
> Best
> Erick
>
> On Thu, Jun 9, 2011 at 5:26 AM, Mohammad Shariq 
> wrote:
> > Can I specify multiple language in filter tag in schema.xml ???  like
> below
> >
> > 
> >   
> >  
> >   > words="stopwords.txt" enablePositionIncrements="true"/>
> >   generateWordParts="1"
> > generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> > catenateAll="0" splitOnCaseChange="1"/>
> >
> > 
> > 
> > 
> > 
> > 
> >
> >
> >
> >   > class="solr.SnowballPorterFilterFactory" language="Hungarian" />
> >
> >
> > On 8 June 2011 18:47, Erick Erickson  wrote:
> >
> >> This page is a handy reference for individual languages...
> >> http://wiki.apache.org/solr/LanguageAnalysis
> >>
> >> But the usual approach, especially for Chinese/Japanese/Korean
> >> (CJK) is to index the content in different fields with language-specific
> >> analyzers then spread your search across the language-specific
> >> fields (e.g. title_en, title_fr, title_ar). Stemming and stopwords
> >> particularly give "surprising" results if you put words from different
> >> languages in the same field.
> >>
> >> Best
> >> Erick
> >>
> >> On Wed, Jun 8, 2011 at 8:34 AM, Mohammad Shariq 
> >> wrote:
> >> > Hi,
> >> > I had setup solr( solr-1.4 on Ubuntu 10.10) for indexing news articles
> in
> >> > English, but my requirement extend to index the news of other
> languages
> >> too.
> >> >
> >> > This is how my schema looks :
> >> >  >> > required="false"/>
> >> >
> >> >
> >> > And the "text" Field in schema.xml looks like :
> >> >
> >> >  positionIncrementGap="100">
> >> >
> >> >   
> >> >>> > words="stopwords.txt" enablePositionIncrements="true"/>
> >> >>> generateWordParts="1"
> >> > generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> >> > catenateAll="0" splitOnCaseChange="1"/>
> >> >   
> >> >language="English"
> >> > protected="protwords.txt"/>
> >> >
> >> >
> >> >   
> >> >synonyms="synonyms.txt"
> >> > ignoreCase="true" expand="true"/>
> >> >>> > words="stopwords.txt" enablePositionIncrements="true"/>
> >> >>> generateWordParts="1"
> >> > generateNumberParts="1" catenateWords="0" catenateNumbers="0"
> >> > catenateAll="0" splitOnCaseChange="1"/>
> >> >   
> >> >language="English"
> >> > protected="protwords.txt"/>
> >> >
> >> > 
> >> >
> >> >
> >> > My Problem is :
> >> > Now I want to index the news articles in other languages to e.g.
> >> > Chinese,Japnese.
> >> > How I can I modify my text field so that I can Index the news in other
> >> lang
> >> > too and make it searchable ??
> >> >
> >> > Thanks
> >> > Shariq
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > --
> >> > View this message in context:
> >>
> http://lucene.472066.n3.nabble.com/how-to-Index-and-Search-non-Eglish-Text-in-solr-tp3038851p3038851.html
> >> > Sent from the Solr - User mailing list archive at Nabble.com.
> >> >
> >>
> >
> >
> >
> > --
> > Thanks and Regards
> > Mohammad Shariq
> >
>
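Erick's field-per-language approach quoted above can be sketched roughly as follows (the field names, language codes, and English fallback are illustrative; language detection is assumed to happen before indexing):

```python
# Index each document's title into a language-specific field, then
# spread a search across all of those fields.

LANG_FIELDS = {"en": "title_en", "zh": "title_zh", "ja": "title_ja"}

def make_doc(doc_id, title, lang):
    # Route the title into the field analyzed for its language.
    field = LANG_FIELDS.get(lang, "title_en")  # fall back to English
    return {"id": doc_id, field: title}

def query_all_languages(q):
    # Equivalent of searching across every language-specific field.
    return " OR ".join("%s:(%s)" % (f, q) for f in sorted(LANG_FIELDS.values()))

doc = make_doc("n1", "breaking news", "en")
```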



-- 
Thanks and Regards
Mohammad Shariq


Re: SolrCloud questions

2011-06-09 Thread Mohammad Shariq
I am also planning to move to SolrCloud;
since it is still under development, I am not sure about its behavior in
production.
Please update us once you find it stable.


On 10 June 2011 03:56, Upayavira  wrote:

> I'm exploring SolrCloud for a new project, and have some questions based
> upon what I've found so far.
>
> The setup I'm planning is going to have a number of multicore hosts,
> with cores being moved between hosts, and potentially with cores merging
> as they get older (cores are time based, so once today has passed, they
> don't get updated).
>
> First question: The solr/conf dir gets uploaded to Zookeeper when you
> first start up, and using system properties you can specify a name to be
> associated with those conf files. How do you handle it when you have a
> multicore setup, and different configs for each core on your host?
>
> Second question: Can you query collections when using multicore? On
> single core, I can query:
>
>  http://localhost:8983/solr/collection1/select?q=blah
>
> On a multicore system I can query:
>
>  http://localhost:8983/solr/core1/select?q=blah
>
> but I cannot work out a URL to query collection1 when I have multiple
> cores.
>
> Third question: For replication, I'm assuming that replication in
> SolrCloud is still managed in the same way as non-cloud Solr, that is as
> ReplicationHandler config in solrconfig? In which case, I need a
> different config setup for each slave, as each slave has a different
> master (or can I delegate the decision as to which host/core is its
> master to zookeeper?)
>
> Thanks for any pointers.
>
> Upayavira
> ---
> Enterprise Search Consultant at Sourcesense UK,
> Making Sense of Open Source
>
>


-- 
Thanks and Regards
Mohammad Shariq


Search failed even if it has the keyword.

2011-06-17 Thread Mohammad Shariq
Hello,
Solr search fails even though the indexed text has the keyword.
I am using Solr (Solr 3.1 on Ubuntu 10.10) for indexing tweets.
I am indexing certain tweets, but Solr doesn't return any result when I search
for any keyword from a tweet.

In Solr, the tweet is stored as 'text'.
Below is the tweet which I indexed:
"RT @Khan_KK: DescribeYourImageWithAMovieTitle Khan Rais"

Once this tweet is indexed, I search with:
http://127.0.0.1:8983/solr/select/?q=DescribeYourImageWithAMovieTitle&version=2.2&start=0&rows=10&indent=on

and nothing is returned from Solr, even though the tweet is there in the index.
I tried searching many keywords, e.g. describe, image, movie, but nothing is
returned from Solr.
I am using the 'text' field of Solr 3.1.
Am I using the right tokenizer?
Please help me.




-- 
Thanks and Regards
Mohammad Shariq


Re: Search failed even if it has the keyword.

2011-06-17 Thread Mohammad Shariq
Hi Pravesh,
this is how my schema looks for 'text' field :


















My default search field is 'defaultquery',
and my copyField is:



And my tweet is indexed into 'title'.



On 17 June 2011 15:46, pravesh  wrote:

> First check, in your schema.xml, which is your default search field. Also
> look if you are using WordDelimiterFilterFactory in your schema.xml for the
> specific field. This would tokenize your words on every capital letter, so,
> for the word "DescribeYourImageWithAMovieTitle" will be broken into
> multiple
> tokens and each will be searchable.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Search-failed-even-if-it-has-the-keyword-tp3075626p3075644.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
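Pravesh's point above — that WordDelimiterFilterFactory with splitOnCaseChange="1" breaks the word on every capital letter — can be simulated roughly (this is an approximation, not the real Lucene filter, which also handles catenation, number parts, etc.):

```python
import re

def split_on_case_change(token):
    # Rough approximation of WordDelimiterFilterFactory's
    # splitOnCaseChange behavior: break at uppercase boundaries,
    # keeping digit runs as separate tokens.
    return re.findall(r"[A-Z][a-z]*|[a-z]+|[0-9]+", token)

tokens = split_on_case_change("DescribeYourImageWithAMovieTitle")
```

A lowercase filter later in the chain would then make each part ("describe", "movie", ...) searchable on its own.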



-- 
Thanks and Regards
Mohammad Shariq


Re: Search failed even if it has the keyword.

2011-06-17 Thread Mohammad Shariq
Thanks very much, Pravesh.
My 'title' field was of type 'text', whereas 'defaultquery' was 'query_text'.

I changed my 'defaultquery' to type 'text' and the problem is solved.

Thanks again.

On 17 June 2011 16:57, pravesh  wrote:

> What is the type for the field's  defaultquery & title in your schema.xml ?
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Search-failed-even-if-it-has-the-keyword-tp3075626p3075797.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Thanks and Regards
Mohammad Shariq


Re: Solr and Tag Cloud

2011-06-18 Thread Mohammad Shariq
I am also looking for the same. Is there any way to build a tag cloud from
all the documents matching a specific query?
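Solr can supply per-query term counts (e.g. by faceting on a text field over the matching document set, or via the TermsComponent for the whole index); what remains for a tag cloud is only scaling raw counts to display sizes. A small sketch of that scaling step (the counts are made up):

```python
from collections import Counter

def tag_cloud(term_counts, buckets=5):
    # Maps raw term frequencies (e.g. Solr facet counts) onto a small
    # range of display sizes, 1..buckets, proportional to the maximum.
    if not term_counts:
        return {}
    top = Counter(term_counts)
    most = max(top.values())
    return {t: 1 + (c * (buckets - 1)) // most for t, c in top.items()}

cloud = tag_cloud({"solr": 50, "lucene": 25, "shard": 5})
```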


On 18 June 2011 09:42, Jamie Johnson  wrote:

> Does anyone have details of how to generate a tag cloud of popular terms
> across an entire data set and then also across a query?
>



-- 
Thanks and Regards
Mohammad Shariq


Re: Is it true that I cannot delete stored content from the index?

2011-06-18 Thread Mohammad Shariq
I have defined a uniqueKey in my Solr and I am deleting docs from Solr using
this uniqueKey,
and then doing optimization once a day.
Is this the right way to delete?
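For reference, the delete-by-uniqueKey request body is simple to build (a sketch only; the commit, and any optimize, are separate requests, and optimize mainly reclaims the space held by deleted docs):

```python
def delete_by_id_xml(doc_ids):
    # Body for a POST to the update handler: deletes each document
    # whose uniqueKey matches one of the given ids.
    ids = "".join("<id>%s</id>" % i for i in doc_ids)
    return "<delete>%s</delete>" % ids

xml = delete_by_id_xml(["doc-1", "doc-2"])
```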

On 19 June 2011 05:14, Erick Erickson  wrote:

> Yep, you've got to delete and re-add. Although if you have a
>  defined you
> can just re-add that document and Solr will automatically delete the
> underlying
> document.
>
> You might have to optimize the index afterwards to get the data to really
> disappear since the deletion process just marks the document as
> deleted.
>
> Best
> Erick
>
> On Sat, Jun 18, 2011 at 1:20 PM, Gabriele Kahlout
>  wrote:
> > Hello,
> >
> > I've indexing with the content field stored. Now I'd like to delete all
> > stored content, is there how to do that without re-indexing?
> >
> > It seems not from lucene
> > FAQ<
> http://wiki.apache.org/lucene-java/LuceneFAQ#How_do_I_update_a_document_or_a_set_of_documents_that_are_already_indexed.3F
> >
> > :
> > How do I update a document or a set of documents that are already
> > indexed? There
> > is no direct update procedure in Lucene. To update an index incrementally
> > you must first *delete* the documents that were updated, and *then
> > re-add*them to the index.
> >
> > --
> > Regards,
> > K. Gabriele
> >
> > --- unchanged since 20/9/10 ---
> > P.S. If the subject contains "[LON]" or the addressee acknowledges the
> > receipt within 48 hours then I don't resend the email.
> > subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧
> time(x)
> > < Now + 48h) ⇒ ¬resend(I, this).
> >
> > If an email is sent by a sender that is not a trusted contact or the
> email
> > does not contain a valid code then the email is not received. A valid
> code
> > starts with a hyphen and ends with "X".
> > ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
> > L(-[a-z]+[0-9]X)).
> >
>



-- 
Thanks and Regards
Mohammad Shariq


Re: fq vs adding to query

2011-06-19 Thread Mohammad Shariq
fq is a filter query: search based on category, timestamp, language, etc. But
I don't see any performance improvement if I use a 'keyword' in fq.

Use cases:
fq=lang:English&q=camera AND digital
OR
fq=time:[13023567 TO 13023900]&q=camera AND digital
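The split can be sketched as URL construction: relevance keywords go in q (they affect scoring and highlighting), while structured constraints go in separate fq parameters, each cached independently by the filterCache (the hostname is a placeholder):

```python
from urllib.parse import urlencode

def build_search_url(host, q, filters):
    # q carries the scored keywords; each fq is a separate, cacheable
    # constraint that narrows the result set without affecting scores.
    params = [("q", q)] + [("fq", f) for f in filters]
    return "http://%s/solr/select?%s" % (host, urlencode(params))

url = build_search_url("localhost:8983", "camera AND digital",
                       ["lang:English", "time:[13023567 TO 13023900]"])
```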


On 19 June 2011 20:17, Jamie Johnson  wrote:

> Are there any hard and fast rules about when touse fq vs adding to the
> query?  For instance if I started with a search of
> camera
>
> then wanted to add another keyword say digital, is it better to do
>
> q=camera AND digital
>
> or
>
> q=camera&fq=digital
>
> I know that fq isn't taken into account when doing highlighting, so what I
> am currently doing is when there are facet based queries I am doing fqs but
> everything else is being added to the query, so in the case above I would
> have done q=camera AND digital.  If however there was a field called
> category with values standard or digital I would have done
> q=camera&fq=category:digital.  Any guidance would be appreciated.
>



-- 
Thanks and Regards
Mohammad Shariq


Re: Weird optimize performance degradation

2011-06-19 Thread Mohammad Shariq
I also have a Solr setup with around 100mn docs.
I optimize once a week, and it takes around 1 hour 30 mins to
complete.


On 19 June 2011 20:02, Santiago Bazerque  wrote:

> Hello Erick, thanks for your answer!
>
> Yes, our over-optimization is mainly due to paranoia over these strange
> commit times. The long optimize time persisted in all the subsequent
> commits, and this is consistent with what we are seeing in other production
> indexes that have the same problem. Once the anomaly shows up, it never
> commits quickly again.
>
> I combed through the last 50k documents that were added before the first
> slow commit. I found one with a larger than usual number of fields (didn't
> write down the number, but it was a few thousands).
>
> I deleted it, and the following optimize was normal again (110 seconds). So
> I'm pretty sure a document with lots of fields is the cause of the
> slowdown.
>
> If that would be useful, I can do some further testing to confirm this
> hypothesis and send the document to the list.
>
> Thanks again for your answer.
>
> Best,
> Santiago
>
> On Sun, Jun 19, 2011 at 10:21 AM, Erick Erickson  >wrote:
>
> > First, there's absolutely no reason to optimize this often, if at all.
> > Older
> > versions of Lucene would search faster on an optimized index, but
> > this is no longer necessary. Optimize will reclaim data from
> > deleted documents, but is generally recommended to be performed
> > fairly rarely, often at off-peak hours.
> >
> > Note that optimize will re-write your entire index into a single new
> > segment,
> > so following your pattern it'll take longer and longer each time.
> >
> > But the speed change happening at 500,000 documents is suspiciously
> > close to the default mergeFactor of 10 X 50,000. Do subsequent
> > optimizes (i.e. on the 750,000th document) still take that long? But
> > this doesn't make sense because if you're optimizing instead of
> > committing, each optimize should reduce your index to 1 segment and
> > you'll never hit a merge.
> >
> > So I'm a little confused. If you're really optimizing every 50K docs,
> what
> > I'd expect to see is successively longer times, and at the end of each
> > optimize I'd expect there to be only one segment in your index.
> >
> > Are you sure you're not just seeing successively longer times on each
> > optimize and just noticing it after 10?
> >
> > Best
> > Erick
> >
> > On Sun, Jun 19, 2011 at 6:04 AM, Santiago Bazerque 
> > wrote:
> > > Hello!
> > >
> > > Here is a puzzling experiment:
> > >
> > > I build an index of about 1.2MM documents using SOLR 3.1. The index has
> a
> > > large number of dynamic fields (about 15.000). Each document has about
> > 100
> > > fields.
> > >
> > > I add the documents in batches of 20, and every 50.000 documents I
> > optimize
> > > the index.
> > >
> > > The first 10 optimizes (up to exactly 500k documents) take less than a
> > > minute and a half.
> > >
> > > But the 11th and all subsequent commits take north of 10 minutes. The
> > commit
> > > logs look identical (in the INFOSTREAM.txt file), but what used to be
> > >
> > >   Jun 19, 2011 4:03:59 AM IW 13 [Sun Jun 19 04:03:59 EDT 2011; Lucene
> > Merge
> > > Thread #0]: merge: total 50 docs
> > >
> > > Jun 19, 2011 4:04:37 AM IW 13 [Sun Jun 19 04:04:37 EDT 2011; Lucene
> Merge
> > > Thread #0]: merge store matchedCount=2 vs 2
> > >
> > >
> > > now eats a lot of time:
> > >
> > >
> > >   Jun 19, 2011 4:37:06 AM IW 14 [Sun Jun 19 04:37:06 EDT 2011; Lucene
> > Merge
> > > Thread #0]: merge: total 55 docs
> > >
> > > Jun 19, 2011 4:46:42 AM IW 14 [Sun Jun 19 04:46:42 EDT 2011; Lucene
> Merge
> > > Thread #0]: merge store matchedCount=2 vs 2
> > >
> > >
> > > What could be happening between those two lines that takes 10 minutes
> at
> > > full CPU? (and with 50k docs less used to take so much less?).
> > >
> > >
> > > Thanks in advance,
> > >
> > > Santiago
> > >
> >
>



-- 
Thanks and Regards
Mohammad Shariq


Search is taking long-long time.

2011-06-22 Thread Mohammad Shariq
I am running two Solr shards. I have indexed 100 million docs in each shard (
each is 50 GB and only 'id' is stored).
My searches have become very slow, taking around 2-3 seconds.
Below is my query:

http://solrHost1:8080/solr/select?shards=solrHost1:8080/solr,solrHost2:8080/solr&q=
QUERY&fq=FilterQuery&fl=id&start=0&rows=100&indent=on&sort=time desc

QUERY and FilterQuery are below:

QUERY = Online Shopping AND ( Amex OR Am ex OR American express OR
americanexpress )
FilterQuery = time:[1308659371 TO 1308745771] AND category:news AND
lang:English

How can I boost the query performance?
The default search field is title (text).
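For reference, the range filter above covers exactly one day of Unix-epoch seconds; a small sketch of how such a window can be built (and over 100M docs, a trie-based field type such as tint/tlong makes this kind of range much cheaper to evaluate):

```python
import time

def last_24h_filter(now=None):
    # Builds the time:[start TO end] range used in the filter query;
    # timestamps are Unix-epoch seconds, matching the example values
    # (1308745771 - 1308659371 == 86400, i.e. exactly one day).
    end = int(now if now is not None else time.time())
    start = end - 86400
    return "time:[%d TO %d]" % (start, end)

fq = last_24h_filter(1308745771)
```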


-- 
Thanks and Regards
Mohammad Shariq


Re: Search is taking long-long time.

2011-06-22 Thread Mohammad Shariq
This is how my 'time' field looks in the schema:


Also, I am updating Solr frequently (every 5 minutes).


On 22 June 2011 18:41, Ahmet Arslan  wrote:

> > I am running two solrShards. I have
> > indexed 100 million docs in each shard (
> > each are 50 GB and only 'id' is stored).
> > My search have became very slow. Its taking around 2-3
> > seconds.
> > below is my query :
> >
> >
> http://solrHost1:8080/solr/select?shards=solrHost1:8080/solr,solrHost2:8080/solr&q=
> > QUERY&fq=FilterQuery&fl=id&start=0&rows=100&indent=on&sort=time
> > desc
> >
> > QUERY and FilterQuery is below :
> >
> > QUERY = Online Shopping AND ( Amex OR Am ex OR American
> > express OR
> > americanexpress )
> > FilterQuery = time:[1308659371 TO 1308745771] AND
> > category:news AND
> > lang:English
> >
> > How to boost the query perfomance.
> > default search filed is title( text).
>
> If fieldType of time is not trie-based, you can change it to tdate, tint
> etc. For range queries.
>
> If you don't update your index frequently, you can use separate filter
> queries (fq) for your clauses. To benefit from caching.
> &fq=category:news&fq=lang:English
>
> http://wiki.apache.org/solr/SolrPerformanceFactors
> http://wiki.apache.org/lucene-java/ImproveSearchingSpeed
>



-- 
Thanks and Regards
Mohammad Shariq


Re: how to index data in solr form database automatically

2011-06-24 Thread Mohammad Shariq
First write a script in Python (or Java, PHP, or any language) which reads
the data from the database and indexes it into Solr.

Now set up this script as a cron job to run automatically at a certain interval.
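A minimal sketch of such a script, using an in-memory SQLite table as a stand-in for the real database (the table name, columns, and update endpoint path are assumptions); the HTTP POST itself is omitted here:

```python
import json
import sqlite3

def rows_to_solr_json(conn):
    # Reads not-yet-indexed rows and shapes them as the JSON body you
    # would POST to Solr's JSON update handler.
    cur = conn.execute("SELECT id, title FROM articles WHERE indexed = 0")
    return json.dumps([{"id": r[0], "title": r[1]} for r in cur])

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id TEXT, title TEXT, indexed INTEGER)")
conn.execute("INSERT INTO articles VALUES ('1', 'hello solr', 0)")
payload = rows_to_solr_json(conn)
```

A crontab entry such as `*/30 * * * * python index_db.py` (script name hypothetical) would then run it every 30 minutes.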





On 24 June 2011 17:23, Romi  wrote:

> would you please tell me how can i use Cron for auto index my database
> tables
> in solr
>
> -
> Thanks & Regards
> Romi
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/how-to-index-data-in-solr-form-database-automatically-tp3102893p3103768.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Thanks and Regards
Mohammad Shariq


Solr PhraseSearch and ExactMatch

2011-06-27 Thread Mohammad Shariq
Hello,
I am using Solr 1.4 on Ubuntu 10.10.
Currently I have a requirement to do an exact match for a phrase search.
I tried googling but didn't find an exact solution.

I am doing the search on the 'text' field.
If I give the search query:
http://localhost:8983/solr/select/?q="the search agency"

it applies the stopword filter and removes the word 'the' from the query, but
my requirement is an exact match.
Please suggest the right solution to this problem.
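A common approach is a copyField into a second field whose analyzer omits the stop filter (or a plain string field), and to run exact phrase queries against that field. A rough simulation of the two behaviors (the stopword list here is a tiny sample; Solr's default list is larger):

```python
STOPWORDS = {"the", "a", "an", "of"}  # tiny illustrative sample

def analyzed_tokens(phrase):
    # Approximates what a text field with a stop filter indexes:
    # lowercased tokens minus stopwords, so "the" never reaches the
    # index and cannot be matched there.
    return [t for t in phrase.lower().split() if t not in STOPWORDS]

def exact_match(stored, phrase):
    # A parallel unanalyzed field keeps the phrase intact for
    # exact matching.
    return stored == phrase

tokens = analyzed_tokens("the search agency")
```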

-- 
Thanks and Regards
Mohammad Shariq


Re: Solr PhraseSearch and ExactMatch

2011-06-27 Thread Mohammad Shariq
I could use 'string' instead of 'text' for exact matches, but I need the exact
match only on phrase searches.

On 27 June 2011 16:29, Gora Mohanty  wrote:

> On Mon, Jun 27, 2011 at 3:42 PM, Mohammad Shariq 
> wrote:
> > Hello,
> > I am using solr1.4 on ubuntu 10.10.
> > Currently I got the requirement to do the ExactMatch  for PhraseSearch.
> > I tried googling but I did'nt got the exact solution.
> >
> > I am doing the search on 'text' field.
> > if I give the search query :
> > http://localhost:8983/solr/select/?q="the search agency" <
> http://localhost>
> >
> > It apply the stopWordsFilter and remove the 'the' word from query, but my
> > requirement is to do the ExactMatch.
> > please suggest me the right solution to this problem.
> [...]
>
> Use a "string" field which does not have any analyzers, and
> tokenizers.
>
> Regards,
> Gora
>



-- 
Thanks and Regards
Mohammad Shariq


Re: Analyzer creates PhraseQuery

2011-06-27 Thread Mohammad Shariq
I guess 'to' may be listed in your stopwords.
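For reference, the PhraseQuery in the debug output quoted below ("t o to") is what the query parser builds when the analyzer emits several tokens for a single word; an NGram analyzer with gram sizes 1 to 2 (the sizes are assumed, since the schema snippet was stripped) produces exactly those tokens:

```python
def ngrams(term, min_size=1, max_size=2):
    # Approximates what an NGram tokenizer emits: every substring of
    # length min_size..max_size. The query parser sees multiple tokens
    # and turns them into a phrase query, which can fail to match.
    out = []
    for n in range(min_size, max_size + 1):
        out.extend(term[i:i + n] for i in range(len(term) - n + 1))
    return out

grams = ngrams("to")
```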

On 28 June 2011 08:27, entdeveloper  wrote:

> I have an analyzer setup in my schema like so:
>
>  
>
>
> maxGramSize="2"/>
>  
>
> What's happening is if I index a term like "toys and dolls", if I search
> for
> "to", I get no matches. The debug output in solr gives me:
>
> to
> to
> PhraseQuery(autocomplete:"t o to")
> autocomplete:"t o to"
>
> Which means it looks like the lucene query parser is turning it into a
> PhraseQuery for some reason. The explain seems to confirm that this
> PhraseQuery is what's causing my document to not match:
>
> 0.0 = (NON-MATCH) weight(autocomplete:"t o to" in 82), product of:
>  1.0 = queryWeight(autocomplete:"t o to"), product of:
>6.684934 = idf(autocomplete: t=60 o=68 to=14)
>0.1495901 = queryNorm
>  0.0 = fieldWeight(autocomplete:"t o to" in 82), product of:
>0.0 = tf(phraseFreq=0.0)
>6.684934 = idf(autocomplete: t=60 o=68 to=14)
>0.1875 = fieldNorm(field=autocomplete, doc=82)
>
> But why? This seems like it should match to me, and indeed the Solr
> analysis
> tool highlights the matches (see image), so something isn't lining up
> right.
>
>
> http://lucene.472066.n3.nabble.com/file/n3116288/Screen_shot_2011-06-27_at_7.55.49_PM.png
>
> In case you're wondering, I'm trying to implement a semi-advanced
> autocomplete feature that goes beyond using what a simple EdgeNGram
> analyzer
> could do.
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Analyzer-creates-PhraseQuery-tp3116288p3116288.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Thanks and Regards
Mohammad Shariq


Re: Removing duplicate documents from search results

2011-06-28 Thread Mohammad Shariq
I also have the problem of duplicate docs.
I am indexing news articles, and every news article has a source URL.
If two news articles have the same URL, only one should be indexed:
duplicates should be removed at index time.



On 23 June 2011 21:24, simon  wrote:

> have you checked out the deduplication process that's available at
> indexing time ? This includes a fuzzy hash algorithm .
>
> http://wiki.apache.org/solr/Deduplication
>
> -Simon
>
> On Thu, Jun 23, 2011 at 5:55 AM, Pranav Prakash  wrote:
> > This approach would definitely work is the two documents are *Exactly*
> the
> > same. But this is very fragile. Even if one extra space has been added,
> the
> > whole hash would change. What I am really looking for is some %age
> > similarity between documents, and remove those documents which are more
> than
> > 95% similar.
> >
> > *Pranav Prakash*
> >
> > "temet nosce"
> >
> > Twitter <http://twitter.com/pranavprakash> | Blog <
> http://blog.myblive.com> |
> > Google <http://www.google.com/profiles/pranny>
> >
> >
> > On Thu, Jun 23, 2011 at 15:16, Omri Cohen  wrote:
> >
> >> What you need to do, is to calculate some HASH (using any message digest
> >> algorithm you want, md5, sha-1 and so on), then do some reading on solr
> >> field collapse capabilities. Should not be too complicated..
> >>
> >> *Omri Cohen*
> >>
> >>
> >>
> >> Co-founder @ yotpo.com | o...@yotpo.com | +972-50-7235198 |
> +972-3-6036295
> >>
> >>
> >>
> >>
> >> My profiles: [image: LinkedIn] <http://www.linkedin.com/in/omric>
> [image:
> >> Twitter] <http://www.twitter.com/omricohe> [image:
> >> WordPress]<http://omricohen.me>
> >>  Please consider your environmental responsibility. Before printing this
> >> e-mail message, ask yourself whether you really need a hard copy.
> >> IMPORTANT: The contents of this email and any attachments are
> confidential.
> >> They are intended for the named recipient(s) only. If you have received
> >> this
> >> email by mistake, please notify the sender immediately and do not
> disclose
> >> the contents to anyone or make copies thereof.
> >> Signature powered by
> >> <
> >>
> http://www.wisestamp.com/email-install?utm_source=extension&utm_medium=email&utm_campaign=footer
> >> >
> >> WiseStamp<
> >>
> http://www.wisestamp.com/email-install?utm_source=extension&utm_medium=email&utm_campaign=footer
> >> >
> >>
> >>
> >>
> >> -- Forwarded message --
> >> From: Pranav Prakash 
> >> Date: Thu, Jun 23, 2011 at 12:26 PM
> >> Subject: Removing duplicate documents from search results
> >> To: solr-user@lucene.apache.org
> >>
> >>
> >> How can I remove very similar documents from search results?
> >>
> >> My scenario is that there are documents in the index which are almost
> >> similar (people submitting same stuff multiple times, sometimes
> different
> >> people submitting same stuff). Now when a search is performed for
> >> "keyword",
> >> in the top N results, quite frequently, same document comes up multiple
> >> times. I want to remove those duplicate (or possible duplicate)
> documents.
> >> Very similar to what Google does when they say "In order to show you
> most
> >> relevant result, duplicates have been removed". How can I achieve this
> >> functionality using Solr? Does Solr has an implied or plugin which could
> >> help me with it?
> >>
> >>
> >> *Pranav Prakash*
> >>
> >> "temet nosce"
> >>
> >> Twitter <http://twitter.com/pranavprakash> | Blog <
> http://blog.myblive.com
> >> >
> >> |
> >> Google <http://www.google.com/profiles/pranny>
> >>
> >
>



-- 
Thanks and Regards
Mohammad Shariq


Re: Removing duplicate documents from search results

2011-06-28 Thread Mohammad Shariq
I am computing the hash from the URL, but I can't use it as the uniqueKey because
I am already using a UUID as the uniqueKey.
Since I use Solr as the index engine only, with Riak (a key-value store) as the
storage engine, I don't want to overwrite on duplicates;
I just need to discard the duplicates.



2011/6/28 François Schiettecatte 

> Create a hash from the url and use that as the unique key, md5 or sha1
> would probably be good enough.
>
> Cheers
>
> François
>
> On Jun 28, 2011, at 7:29 AM, Mohammad Shariq wrote:
>
> > I also have the problem of duplicate docs.
> > I am indexing news articles, Every news article will have the source URL,
> > If two news-article has the same URL, only one need to index,
> > removal of duplicate at index time.
> >
> >
> >
> > On 23 June 2011 21:24, simon  wrote:
> >
> >> have you checked out the deduplication process that's available at
> >> indexing time ? This includes a fuzzy hash algorithm .
> >>
> >> http://wiki.apache.org/solr/Deduplication
> >>
> >> -Simon
> >>
> >> On Thu, Jun 23, 2011 at 5:55 AM, Pranav Prakash 
> wrote:
> >>> This approach would definitely work is the two documents are *Exactly*
> >> the
> >>> same. But this is very fragile. Even if one extra space has been added,
> >> the
> >>> whole hash would change. What I am really looking for is some %age
> >>> similarity between documents, and remove those documents which are more
> >> than
> >>> 95% similar.
> >>>
> >>> *Pranav Prakash*
> >>>
> >>> "temet nosce"
> >>>
> >>> Twitter <http://twitter.com/pranavprakash> | Blog <
> >> http://blog.myblive.com> |
> >>> Google <http://www.google.com/profiles/pranny>
> >>>
> >>>
> >>> On Thu, Jun 23, 2011 at 15:16, Omri Cohen  wrote:
> >>>
> >>>> What you need to do, is to calculate some HASH (using any message
> digest
> >>>> algorithm you want, md5, sha-1 and so on), then do some reading on
> solr
> >>>> field collapse capabilities. Should not be too complicated..
> >>>>
> >>>> *Omri Cohen*
> >>>>
> >>>>
> >>>>
> >>>> Co-founder @ yotpo.com | o...@yotpo.com | +972-50-7235198 |
> >> +972-3-6036295
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> My profiles: [image: LinkedIn] <http://www.linkedin.com/in/omric>
> >> [image:
> >>>> Twitter] <http://www.twitter.com/omricohe> [image:
> >>>> WordPress]<http://omricohen.me>
> >>>>
> >>>>
> >>>>
> >>>> -- Forwarded message --
> >>>> From: Pranav Prakash 
> >>>> Date: Thu, Jun 23, 2011 at 12:26 PM
> >>>> Subject: Removing duplicate documents from search results
> >>>> To: solr-user@lucene.apache.org
> >>>>
> >>>>
> >>>> How can I remove very similar documents from search results?
> >>>>
> >>>> My scenario is that there are documents in the index which are almost
> >>>> similar (people submitting same stuff multiple times, sometimes
> >> different
> >>>> people submitting same stuff). Now when a search is performed for
> >>>> "keyword",
> >>>> in the top N results, quite frequently, same document comes up
> multiple
> >>>> times. I want to remove those duplicate (or possible duplicate)
> >> documents.
> >>>> Very similar to what Google does when they say "In order to show you
> >> most
> >>>> relevant result, duplicates have been removed". How can I achieve this
> >>>> functionality using Solr? Does Solr has an implied or plugin which
> could
> >>>> help me with it?
> >>>>
> >>>>
> >>>> *Pranav Prakash*
> >>>>
> >>>> "temet nosce"
> >>>>
> >>>> Twitter <http://twitter.com/pranavprakash> | Blog <
> >> http://blog.myblive.com
> >>>>>
> >>>> |
> >>>> Google <http://www.google.com/profiles/pranny>
> >>>>
> >>>
> >>
> >
> >
> >
> > --
> > Thanks and Regards
> > Mohammad Shariq
>
>


-- 
Thanks and Regards
Mohammad Shariq


Re: Removing duplicate documents from search results

2011-06-28 Thread Mohammad Shariq
Hey François,
thanks for your suggestion, I followed the same link (
http://wiki.apache.org/solr/Deduplication)

They offer two solutions: either make the hash the uniqueKey, or overwrite on
duplicate. I don't need either.

I need discard-on-duplicate.
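For context, the dedup chain from the wiki page under discussion looks roughly like the sketch below (field names such as `url` and `url_signature` are illustrative assumptions, not from the thread). Note that with `overwriteDupes="true"` Solr deletes older docs carrying the same signature before adding the new one, so the net effect is "keep one copy", which is overwrite rather than the discard-new-document behavior asked for here:

```xml
<!-- solrconfig.xml: deduplication chain (sketch, per the Solr wiki).
     Field names url / url_signature are assumptions for illustration. -->
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <!-- store the hash in its own field instead of the uniqueKey -->
    <str name="signatureField">url_signature</str>
    <!-- true: older docs with the same signature are deleted on add -->
    <bool name="overwriteDupes">true</bool>
    <str name="fields">url</str>
    <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

A true discard-on-duplicate (keep the old doc, silently drop the new one) would likely need a custom UpdateRequestProcessor, or an existence check against the store before indexing, as suggested elsewhere in this thread.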

>
>
> I have not used it but it looks like it will do the trick.
>
> François
>
> On Jun 28, 2011, at 8:44 AM, Pranav Prakash wrote:
>
> > I found the deduplication thing really useful. Although I have not yet
> > started to work on it, as there are some other low hanging fruits I've to
> > capture. Will share my thoughts soon.
> >
> >
> > *Pranav Prakash*
> >
> > "temet nosce"
> >
> > Twitter <http://twitter.com/pranavprakash> | Blog <
> http://blog.myblive.com> |
> > Google <http://www.google.com/profiles/pranny>
> >
> >
> > 2011/6/28 François Schiettecatte 
> >
> >> Maybe there is a way to get Solr to reject documents that already exist
> in
> >> the index but I doubt it, maybe someone else with can chime here here.
> You
> >> could do a search for each document prior to indexing it so see if it is
> >> already in the index, that is probably non-optimal, maybe it is easiest
> to
> >> check if the document exists in your Riak repository, it no add it and
> index
> >> it, and drop if it already exists.
> >>
> >> François
> >>
> >> On Jun 28, 2011, at 8:24 AM, Mohammad Shariq wrote:
> >>
> >>> I am making the Hash from URL, but I can't use this as UniqueKey
> because
> >> I
> >>> am using UUID as UniqueKey,
> >>> Since I am using SOLR as  index engine Only and using Riak(key-value
> >>> storage) as storage engine, I dont want to do the overwrite on
> duplicate.
> >>> I just need to discard the duplicates.
> >>>
> >>>
> >>>
> >>> 2011/6/28 François Schiettecatte 
> >>>
> >>>> Create a hash from the url and use that as the unique key, md5 or sha1
> >>>> would probably be good enough.
> >>>>
> >>>> Cheers
> >>>>
> >>>> François
> >>>>
> >>>> On Jun 28, 2011, at 7:29 AM, Mohammad Shariq wrote:
> >>>>
> >>>>> I also have the problem of duplicate docs.
> >>>>> I am indexing news articles, Every news article will have the source
> >> URL,
> >>>>> If two news-article has the same URL, only one need to index,
> >>>>> removal of duplicate at index time.
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 23 June 2011 21:24, simon  wrote:
> >>>>>
> >>>>>> have you checked out the deduplication process that's available at
> >>>>>> indexing time ? This includes a fuzzy hash algorithm .
> >>>>>>
> >>>>>> http://wiki.apache.org/solr/Deduplication
> >>>>>>
> >>>>>> -Simon
> >>>>>>
> >>>>>> On Thu, Jun 23, 2011 at 5:55 AM, Pranav Prakash 
> >>>> wrote:
> >>>>>>> This approach would definitely work is the two documents are
> >> *Exactly*
> >>>>>> the
> >>>>>>> same. But this is very fragile. Even if one extra space has been
> >> added,
> >>>>>> the
> >>>>>>> whole hash would change. What I am really looking for is some %age
> >>>>>>> similarity between documents, and remove those documents which are
> >> more
> >>>>>> than
> >>>>>>> 95% similar.
> >>>>>>>
> >>>>>>> *Pranav Prakash*
> >>>>>>>
> >>>>>>> "temet nosce"
> >>>>>>>
> >>>>>>> Twitter <http://twitter.com/pranavprakash> | Blog <
> >>>>>> http://blog.myblive.com> |
> >>>>>>> Google <http://www.google.com/profiles/pranny>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Thu, Jun 23, 2011 at 15:16, Omri Cohen  wrote:
> >>>>>>>
> >>>>>>>> What you need to do, is to calculate some HASH (using any message
> >>>> digest
> >>>>>>>> algorithm you want, md5, sha-1 and so on), then do some read

How to disable Phonetic search

2011-06-29 Thread Mohammad Shariq
I am using Solr 1.4.
When I search for the keyword "ansys" I get lots of posts,
but when I search for "ansys NOT ansi" I get nothing.
I guess it's because of phonetic analysis: "ansys" is converted into "ansi"
(which is the NOT keyword), so nothing is returned.

How do I handle this kind of problem?

-- 
Thanks and Regards
Mohammad Shariq


Re: How to disable Phonetic search

2011-06-29 Thread Mohammad Shariq
I was using SnowballPorterFilterFactory for stemming, and that stemmer was
reducing the words.
I added the keyword "ansys" to the file "protwords.txt".
Now "ansys" is no longer stemmed, and it works as expected.
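For reference, a minimal sketch of how the protected-words list plugs into the stemmer in schema.xml (the surrounding analyzer chain is assumed, not quoted from this thread):

```xml
<!-- schema.xml: the stemmer skips any term listed in protwords.txt -->
<filter class="solr.SnowballPorterFilterFactory"
        language="English"
        protected="protwords.txt"/>
<!-- protwords.txt is a plain-text file, one term per line, e.g.:
     ansys -->
```

The same `protected` attribute must appear in both the index-time and query-time analyzer chains, or the two sides will disagree on whether "ansys" is stemmed.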

On 29 June 2011 17:12, Ahmet Arslan  wrote:

> > I am using solr1.4
> > When I search for keyword "ansys" I get lot of posts.
> > but when I search for "ansys NOT ansi" I get nothing.
> > I guess its because of Phonetic search, "ansys" is
> > converted into "ansi" (
> > that is NOT keyword) and nothing returns.
> >
> > How to handle this kind of problem.
>
> Find and remove occurrences of "solr.PhoneticFilterFactory" from your
> schema.xml file.
>



-- 
Thanks and Regards
Mohammad Shariq


Re: what s the optimum size of SOLR indexes

2011-07-04 Thread Mohammad Shariq
There are Solutions for Indexing huge data. e.g.  SolrCloud,
ZooKeeperIntegration, MultiCore, MultiShard.
depending on your requirement you can choose one or other.
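As a concrete illustration of the multi-core option, a minimal solr.xml sketch for the Solr versions of this era (core names and paths are assumptions):

```xml
<!-- solr.xml: two cores hosted by one Solr instance (illustrative) -->
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="shard1" instanceDir="shard1"/>
    <core name="shard2" instanceDir="shard2"/>
  </cores>
</solr>
```

Distributed search across the cores is then done per-request with the `shards` parameter, e.g. `...?q=foo&shards=host:8983/solr/shard1,host:8983/solr/shard2`.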


On 4 July 2011 17:21, Jame Vaalet  wrote:

> Hi,
>
> What would be the maximum size of a single SOLR index file for resulting in
> optimum search time ?
> In case I have got to index all the documents in my repository  (which is
> in TB size) what would be the ideal architecture to follow , distributed
> SOLR ?
>
> Regards,
> JAME VAALET
> Software Developer
> EXT :8108
> Capital IQ
>
>


-- 
Thanks and Regards
Mohammad Shariq


how to do ExactMatch for PhraseQuery

2011-07-14 Thread Mohammad Shariq
I need an exact match on a PhraseQuery.
When I search for the phrase "call it spring" I get results for:
1) It's spring
2) The spring

but my requirement is an exact match for the PhraseQuery.
My search field is "text".
Along with the PhraseQuery I run a regular query too.
How can I tune Solr to do an exact match for the PhraseQuery without affecting
the regular query?
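One common approach (a sketch under assumed field names, not the only way): copy the text into a second field with a less aggressive analyzer and run the phrase query against that field, while the regular query still hits the stemmed "text" field:

```xml
<!-- schema.xml: a parallel field with no stemming or stopword removal,
     so phrases match verbatim; names are illustrative assumptions -->
<fieldType name="text_exact" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="text_exact" type="text_exact" indexed="true" stored="false"/>
<copyField source="text" dest="text_exact"/>
```

The phrase part of the query then becomes `text_exact:"call it spring"`, which no longer matches "It's spring" or "The spring", while unquoted terms continue to search the original "text" field.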

Below is the "text" field of my schema.xml:
Thanks
Shariq


Re: Need Suggestion

2011-07-15 Thread Mohammad Shariq
Below are some things you can do to reduce search latency:
1) Do bulk inserts.
2) Commit after every ~5000 docs.
3) Optimize once a day.
4) Use the "fq" (filter query) parameter in your search queries.

Also, what JVM heap size are you using?
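Points 1 and 2 can also be pushed server-side with autoCommit in solrconfig.xml instead of committing from the client; a sketch (the thresholds below are illustrative, tune them to your ingest rate):

```xml
<!-- solrconfig.xml, inside <updateHandler>: let Solr commit automatically -->
<autoCommit>
  <maxDocs>5000</maxDocs>   <!-- commit after ~5000 buffered docs -->
  <maxTime>60000</maxTime>  <!-- or after 60 s, whichever comes first -->
</autoCommit>
```

Fewer, larger commits mean fewer searcher reopens and cache rebuilds, which is usually where the memory and latency pressure comes from.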





On 15 July 2011 17:44, Rohit  wrote:

> I am facing some performance issues on my Solr Installation (3core server).
> I am indexing live twitter data based on certain keywords, as you can
> imagine, the rate at which documents are received is very high and so the
> updates to the core is very high and regular. Given below are the document
> size on my three core.
>
>
>
> Twitter  - 26874747
>
> Core2-  3027800
>
> Core3-  6074253
>
>
>
> My Server configuration has 8GB RAM, but now we are experiencing server
> performance drop. What can be done to improve this?  Also, I have a few
> questions.
>
>
>
> 1.  Does the number of commit takes high memory? Will reducing the
> number of commits per hour help?
> 2.  Most of my queries are field or date faceting based? how to improve
> those?
>
>
>
> Regards,
>
> Rohit
>
>
>
>
>
> Regards,
>
> Rohit
>
> Mobile: +91-9901768202
>
> About Me:  <http://about.me/rohitg> http://about.me/rohitg
>
>
>
>


-- 
Thanks and Regards
Mohammad Shariq


Re: How to find whether solr server is running or not

2011-07-19 Thread Mohammad Shariq
Check the HTTP response code; anything other than 200 means the service is not
OK.
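Solr also ships a ping handler intended for exactly this kind of health check; a sketch of the solrconfig.xml entry (the invariant query string is an assumption):

```xml
<!-- solrconfig.xml: GET /solr/admin/ping returns HTTP 200 with status "OK"
     only while the core is up and able to execute the configured query -->
<requestHandler name="/admin/ping" class="solr.PingRequestHandler">
  <lst name="invariants">
    <str name="q">solrpingquery</str>
  </lst>
</requestHandler>
```

Hitting this URL from the application (or a load balancer) distinguishes "server down" from "server up but core broken", which a plain connection attempt cannot.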

On 19 July 2011 14:39, Romi  wrote:

> I am running an application that get search results from solr server. But
> when server is not running i get no response from the server. Is there any
> way i can found that my server is not running so that i can give proper
> error message regarding it
>
>
> -
> Thanks & Regards
> Romi
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/How-to-find-whether-solr-server-is-running-or-not-tp3181870p3181870.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Thanks and Regards
Mohammad Shariq


Delete by range query

2011-07-27 Thread Mohammad Shariq
Hi,
I want to delete a bunch of docs from my Solr index using a range query.
I have one field called 'time', which is a tint.

I am deleting using the query:
time:[1296777600+TO+1296778000]

but Solr is returning an error saying "bad request".
However, I am able to delete docs one by one using the delete query:
time:1296777600

Please suggest a solution to this problem.

-- 
Thanks and Regards
Mohammad Shariq


Re: Delete by range query

2011-07-27 Thread Mohammad Shariq
Thanks Koji
Its working now.


On 27 July 2011 19:30, Koji Sekiguchi  wrote:

> > time:[1296777600+TO+1296778000]
>
> Should be time:[1296777600 TO 1296778000] ?
>
> koji
> --
> http://www.rondhuit.com/en/
>



-- 
Thanks and Regards
Mohammad Shariq


Indexing tweet and searching "@keyword" OR "#keyword"

2011-08-04 Thread Mohammad Shariq
I have indexed around 1 million tweets (using the "text" dataType).
When I search a tweet containing "#" or "@" I don't get the exact result,
e.g. when I search for "#ipad" OR "@ipad" I get results where ipad is
mentioned with the "#" and "@" skipped.
Please suggest how to tune this, and which filter factories to use, to get the
desired result.
I am indexing the tweets as "text"; below is the "text" type from my
schema.xml:
-- 
Thanks and Regards
Mohammad Shariq


Re: Indexing tweet and searching "@keyword" OR "#keyword"

2011-08-10 Thread Mohammad Shariq
I tried tweaking the WordDelimiterFilterFactory, but it won't accept the # or @
symbols; they are ignored entirely.
Please suggest a solution.

On 4 August 2011 21:08, Jonathan Rochkind  wrote:

> It's the WordDelimiterFactory in your filter chain that's removing the
> punctuation entirely from your index, I think.
>
> Read up on what the WordDelimiter filter does, and what it's settings are;
> decide how you want things to be tokenized in your index to get the behavior
> your want; either get WordDelimiter to do it that way by passing it
> different arguments, or stop using WordDelimiter; come back with any
> questions after trying that!
>
>
>
> On 8/4/2011 11:22 AM, Mohammad Shariq wrote:
>
>> I have indexed around 1 million tweets ( using  "text" dataType).
>> when I search the tweet with "#"  OR "@"  I dont get the exact result.
>> e.g.  when I search for "#ipad" OR "@ipad"   I get the result where ipad
>> is
>> mentioned skipping the "#" and "@".
>> please suggest me, how to tune or what are filterFactories to use to get
>> the
>> desired result.
>> I am indexing the tweet as "text", below is "text" which is there in my
>> schema.xml.
>>
>>
>> 
>> 
>> 
>> > minShingleSize="3" maxShingleSize="3" ignoreCase="true"/>
>> > generateWordParts="1"
>> generateNumberParts="1" catenateWords="1" catenateNumbers="1"
>> catenateAll="0" splitOnCaseChange="1"/>
>> 
>> > protected="protwords.txt" language="English"/>
>> 
>> 
>> 
>> > words="stopwords.txt"
>> minShingleSize="3" maxShingleSize="3" ignoreCase="true"/>
>> > generateWordParts="1" generateNumberParts="1" catenateWords="1"
>> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>> 
>> > protected="protwords.txt" language="English"/>
>> 
>> 
>>
>>


-- 
Thanks and Regards
Mohammad Shariq


Re: Indexing tweet and searching "@keyword" OR "#keyword"

2011-08-10 Thread Mohammad Shariq
> Do you really want a search on "ipad" to *fail* to match input of "#ipad"?
> Or vice-versa?

My requirement is: for q='ipad' I want to match both '#ipad' and 'ipad',
but for q='#ipad' I want to match ONLY '#ipad', excluding 'ipad'.
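A sketch of a field type that keeps "#" and "@" as part of the token, so '#ipad' and 'ipad' are indexed as distinct terms (type name is an assumption). WhitespaceTokenizer splits only on whitespace, so leading punctuation survives:

```xml
<!-- schema.xml: whitespace tokenization preserves leading # and @ -->
<fieldType name="text_tweet" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

To additionally let q='ipad' match tweets containing '#ipad', one option is to index the text twice, once with this type and once with the punctuation-stripping chain, and have plain-term queries search both fields while #-prefixed queries search only this one.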


On 10 August 2011 19:49, Erick Erickson  wrote:

> Please look more carefully at the documentation for WDDF,
> specifically:
>
> split on intra-word delimiters (all non alpha-numeric characters).
>
> WordDelimiterFilterFactory will always throw away non alpha-numeric
> characters, you can't tell it do to otherwise. Try some of the other
> tokenizers/analyzers to get what you want, and also look at the
> admin/analysis page to see what the exact effects are of your
> fieldType definitions.
>
> Here's a great place to start:
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
>
> You probably want something like WhitespaceTokenizerFactory
> followed by LowerCaseFilterFactory or some such...
>
> But I really question whether this is what you want either. Do you
> really want a search on "ipad" to *fail* to match input of "#ipad"? Or
> vice-versa?
>
> KeywordTokenizerFactory is probably not the place you want to start,
> the tokenization process doesn't break anything up, you happen to be
> getting separate tokens because of WDDF, which as you see can't
> process things the way you want.
>
>
> Best
> Erick
>
> On Wed, Aug 10, 2011 at 3:09 AM, Mohammad Shariq 
> wrote:
> > I tried tweaking "WordDelimiterFactory" but I won't accept # OR @ symbols
> > and it ignored totally.
> > I need solution plz suggest.
> >
> > On 4 August 2011 21:08, Jonathan Rochkind  wrote:
> >
> >> It's the WordDelimiterFactory in your filter chain that's removing the
> >> punctuation entirely from your index, I think.
> >>
> >> Read up on what the WordDelimiter filter does, and what it's settings
> are;
> >> decide how you want things to be tokenized in your index to get the
> behavior
> >> your want; either get WordDelimiter to do it that way by passing it
> >> different arguments, or stop using WordDelimiter; come back with any
> >> questions after trying that!
> >>
> >>
> >>
> >> On 8/4/2011 11:22 AM, Mohammad Shariq wrote:
> >>
> >>> I have indexed around 1 million tweets ( using  "text" dataType).
> >>> when I search the tweet with "#"  OR "@"  I dont get the exact result.
> >>> e.g.  when I search for "#ipad" OR "@ipad"   I get the result where
> ipad
> >>> is
> >>> mentioned skipping the "#" and "@".
> >>> please suggest me, how to tune or what are filterFactories to use to
> get
> >>> the
> >>> desired result.
> >>> I am indexing the tweet as "text", below is "text" which is there in my
> >>> schema.xml.
> >>>
> >>>
> >>>  positionIncrementGap="100">
> >>> 
> >>> 
> >>>  words="stopwords.txt"
> >>> minShingleSize="3" maxShingleSize="3" ignoreCase="true"/>
> >>>  >>> generateWordParts="1"
> >>> generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> >>> catenateAll="0" splitOnCaseChange="1"/>
> >>> 
> >>>  >>> protected="protwords.txt" language="English"/>
> >>> 
> >>> 
> >>> 
> >>>  >>> words="stopwords.txt"
> >>> minShingleSize="3" maxShingleSize="3" ignoreCase="true"/>
> >>>  >>> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> >>> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
> >>> 
> >>>  >>> protected="protwords.txt" language="English"/>
> >>> 
> >>> 
> >>>
> >>>
> >
> >
> > --
> > Thanks and Regards
> > Mohammad Shariq
> >
>



-- 
Thanks and Regards
Mohammad Shariq