Multi Core indexed using SolrJ
Hello all, I have gone through the SolrJ tutorials. Now I want to create multi-core indexes through SolrJ, but I have no clue how to go about it. Can anybody post some example code? Regards, Dhaivat -- View this message in context: http://lucene.472066.n3.nabble.com/Multi-Core-indexed-using-SolrJ-tp3496830p3496830.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Multi Core indexed using SolrJ
Thanks Ivan. Is there a specific method I can use to create a core and add documents to it? Regards, Dhaivat
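A minimal sketch of the two addresses involved, in case it helps readers of this thread: Solr's CoreAdmin handler accepts a CREATE action over HTTP, and each core is then reachable under its own base URL, which is where a per-core client would add documents. The host, port, and core names below are assumptions for illustration, and the sketch only builds URLs with the JDK; on the SolrJ side, a later message in this archive mentions `CoreAdminRequest.createCore(coreName, instanceDir, serverObj)` for the same purpose.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class CoreUrls {

    /** Builds the CoreAdmin CREATE request URL for a new core. */
    static String createCoreUrl(String solrBase, String coreName, String instanceDir) {
        return solrBase + "/admin/cores?action=CREATE"
                + "&name=" + encode(coreName)
                + "&instanceDir=" + encode(instanceDir);
    }

    /** Base URL of one core (assumes a simple core name that needs no escaping). */
    static String coreBaseUrl(String solrBase, String coreName) {
        return solrBase + "/" + coreName;
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

For example, `CoreUrls.coreBaseUrl("http://localhost:8983/solr", "core1")` gives the base URL to hand to a client that indexes into `core1` only.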
Boosting Original Indexed Terms
Hello All,

I need help boosting the original indexed terms. I store multiple terms at the same position and I want to boost the original term. Consider the following scenario: I am indexing a document that contains the text "baby t-shirts". Here is the indexing analysis showing the terms I store:

term text   position   startOffset   endOffset
baby        1          0             4
t-shirts    2          5             13
babe        1          0             4
t-shirt     2          5             13
infant      1          0             4
child       1          0             4
kid         1          0             4

Now I want to boost results on the original terms, i.e. if a user searches for baby, the results that contain the original term baby should be returned first, and then the others. Please let me know how to achieve this.

Thanks
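One common way to approach this, offered as an assumption rather than anything confirmed in this thread, is to index the text a second time into a field whose analyser does not add the extra terms, and then query both fields with a higher boost on the unexpanded one. The field names `contents_exact` and `contents` below are hypothetical, and the sketch only builds a Lucene-syntax query string with the JDK.

```java
final class OriginalTermBoost {

    /**
     * Builds "exactField:"term"^boost OR expandedField:"term"".
     * The term is wrapped in quotes; backslashes and quotes inside it are escaped.
     */
    static String buildQuery(String exactField, String expandedField, String term, float boost) {
        String phrase = "\"" + term.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
        return exactField + ":" + phrase + "^" + boost
                + " OR " + expandedField + ":" + phrase;
    }
}
```

With this shape, a document whose original text contains "baby" matches both clauses, while a document that only carries "baby" as an added term matches the second clause alone, so the first kind should tend to score higher.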
Fast Vector Highlighter Working for some records only
Hi,

I am a newbie to Solr and I am using the SolrJ client to create the index and query the Solr data. When querying, I want to use Solr's highlighting feature, so I am using the Fast Vector Highlighter to highlight matched words. It works fine for some documents, but for other documents it does not return any highlighted words even though the document's field contains the word. I am setting the following parameters through the SolrJ client:

query.add("hl","true");
query.add("hl.q",term);
query.add("hl.fl","contents");
query.add("hl.snippets","100");
query.add("hl.fragsize","10");
query.add("hl.maxAnalyzedChars","10");
query.add("hl.useFastVectorHighlighter","true");
query.add("hl.highlightMultiTerm","true");
query.add("hl.regex.slop","0.5");
query.add("hl.regex.pattern","[-\\w ,/\n\\\"']*");
query.setHighlightSimplePre("*");
query.setHighlightSimplePost("*");

My solrconfig is pretty straightforward; I haven't specified anything related to the highlighter there. This is what my solrconfig looks like: solr

I have also enabled termVectors, termOffsets and termPositions on the field I am indexing. Can anyone tell me where I am going wrong?

Thanks in advance,
Dhaivat
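One parameter in the list above may be worth a second look, although nothing in this thread confirms it as the cause: for the standard (non-FVH) highlighter, `hl.maxAnalyzedChars` limits how many characters into a field Solr looks for snippets, and the Solr wiki lists its default as 51200. The sketch below is a deliberately simplified model of that limit written with the JDK only; it is not Solr's implementation, and the class and method names are made up for illustration.

```java
import java.util.Locale;

final class AnalyzedWindow {

    /**
     * Simplified model: a term can only be highlighted if its first
     * case-insensitive occurrence starts before maxAnalyzedChars.
     */
    static boolean termStartsWithinLimit(String fieldText, String term, int maxAnalyzedChars) {
        int start = fieldText.toLowerCase(Locale.ROOT).indexOf(term.toLowerCase(Locale.ROOT));
        return start >= 0 && start < maxAnalyzedChars;
    }
}
```

Under this model a limit of 10 would only ever find a match that begins within the first ten characters of the field, which would make highlighting appear to work for some documents and not for others.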
Re: Fast Vector Highlighter Working for some records only
Hi Koji,

Thanks for the quick reply. I am using Solr 1.4.1 and I am querying "camera". Here are examples of documents that match:

70 Electronics/Cell Phones /b/l/blackberry-8100-pearl-2.jpg 349.99 BlackBerry 8100 Pearl sports a large 240 x 260 screen that supports over 65,000 colors-- plenty of real estate to view your e-mails, Web browser content, messaging sessions, and attachments. Silver blackberry-8100-pearl.html Like the BlackBerry 7105t, the BlackBerry 8100 Pearl is The BlackBerry 8100 Pearl sports a large 240 x 260 screen that supports over 65,000 colors-- plenty of real estate to view your e-mails, Web browser content, messaging sessions, and attachments. The venerable BlackBerry trackwheel has been replaced on this model with an innovative four-way trackball placed below the screen. On the rear of the handheld, you'll find a 1.3-megapixel camera and a self portrait mirror. The handheld's microSD memory card slot is located inside the device, behind the battery. There's also a standard 2.5mm headset jack that can be used with the included headset, as well as a mini-USB port for data connectivity. BlackBerry 8100 Pearl <ul> <ul class="disc"> <li> 1.3 mega pixel camera to capture those special moments<br></li> <li> MP3 player lets you listen to your favorite music on the go<br></li> <li>Menu and escape keys on the front of the device for easier access<br></li> <li> Bluetooth technology lets you experience hands free and wire free features<br></li> <li>Package Contents: phone,AC adapter,software CD,headset,USB cable,sim-card,get started poster,reference guide<br></li> </ul>

89 Electronics/Cameras/Accessories /u/n/universal-camera-case-2.jpg 34.0 Universal Camera Case Green universal-camera-case.html A stylish digital camera demands stylish protection. This leather carrying case will defend your camera from the dings and scratches of travel and everyday use while looking smart all the time. Universal Camera Case

For the above documents I get a highlighting response for document id = 89 but not for document id = 70, even though the word "camera" occurs in document id = 70.

I have field called

For your information, I am using a custom analyser for indexing and querying.

Thanks
Dhaivat

Koji Sekiguchi wrote
> Dhaivat,
>
> Can you give us the concrete document that you are trying to search and make
> a highlight snippet? And what is your Solr version?
>
> koji
> --
> Query Log Visualizer for Apache Solr
> http://soleami.com/
Re: Fast Vector Highlighter Working for some records only
Koji Sekiguchi wrote
> (12/02/21 21:22), dhaivat wrote:
>> Hi Koji,
>>
>> Thanks for quick reply, i am using solr 1.4.1
>
> Uh, you cannot use FVH on Solr 1.4.1. FVH is available Solr 3.1 or later.
> So your hl.useFastVectorHighlighter=true flag is ignored.
>
> koji
> --
> Query Log Visualizer for Apache Solr
> http://soleami.com/

Thanks for the reply. But can you please tell me why it works for some documents and not for others?
Re: Fast Vector Highlighter Working for some records only
Koji Sekiguchi wrote
> (12/02/22 11:58), dhaivat wrote:
>> Thanks for reply,
>>
>> But can you please tell me why it's working for some documents and not
>> for other.
>
> As Solr 1.4.1 cannot recognize hl.useFastVectorHighlighter flag, Solr just
> ignore it, but due to hl=true is there, Solr tries to create highlight snippets
> by using (existing; traditional; I mean not FVH) Highlighter.
> Highlighter (including FVH) cannot produce snippets sometime for some reasons,
> you can use hl.alternateField parameter.
>
> http://wiki.apache.org/solr/HighlightingParameters#hl.alternateField
>
> koji
> --
> Query Log Visualizer for Apache Solr
> http://soleami.com/

Thank you so much for the explanation.

I have updated my Solr version and am now using 3.5. Since I am using a custom tokenizer on the field, do I need to make any changes related to the Solr highlighter?

Here is my custom analyser:

Here is the field info:

I create tokens using my custom analyser, and when I try to use the highlighter it does not work properly for the contents field. But when I use Solr's built-in tokeniser, the word is highlighted for the same query. Can you please help me out with this?

Thanks in advance,
Dhaivat
Re: Fast Vector Highlighter Working for some records only
Hi Koji,

Thanks for your guidance. I have looked at the Solr analysis page and the analysis works fine there, but highlighting still does not work for a few documents. Below is the highlighter configuration I am using, which I have specified in solrconfig.xml. Can you please tell me what I should change so that the highlighter works for all documents? For your information, I am not using any kind of filter on the custom field; I am just using my custom tokeniser.

1000 7 7 70 0.5 [-\w ,/\n\"']{20,200} 10 .,!? WORD en US

Koji Sekiguchi wrote
> Hi dhaivat,
>
> I think you may want to use analysis.jsp:
>
> http://localhost:8983/solr/admin/analysis.jsp
>
> Go to the URL and look into how your custom tokenizer produces tokens,
> and compare with the output of Solr's inbuilt tokenizer.
>
> koji
> --
> Query Log Visualizer for Apache Solr
> http://soleami.com/
>
> (12/02/22 21:35), dhaivat wrote:
>> Thank you so much explanation,
>>
>> I have updated my solr version and using 3.5, Could you please tell me when
>> i am using custom Tokenizer on the field,so do i need to make any changes
>> related Solr highlighter.
>>
>> here is my custom analyser
>>
>> > positionIncrementGap="100">
>> > class="ns.solr.analyser.CustomIndexTokeniserFactory"/>
>>
>> here is the field info:
>>
>> > multiValued="true" termPositions="true" termVectors="true" termOffsets="true"/>
>>
>> i am creating tokens using my custom analyser and when i am trying to use
>> highlighter it's not working properly for contents field.. but when i tried
>> to use Solr inbuilt tokeniser i am finding the word highlighted for
>> particular query.. Please can you help me out with this ?
>>
>> Thanks in advance
>> Dhaivat
Re: Fast Vector Highlighter Working for some records only
Hi Koji,

I am using Solr 3.5 and I want to highlight a multivalued field. When I supply a single value for the multivalued field, the highlighter works fine. But when I index multiple values for the field and try to highlight it, I get the following error with the Fast Vector Highlighter:

java.lang.StringIndexOutOfBoundsException: String index out of range: -1099

I have set the following parameters using SolrJ:

query.add("hl.q",term);
query.add("hl.fl","contents");
query.add("hl","true");
query.add("hl.useFastVectorHighlighter","true");
query.add("hl.snippets","100");
query.add("hl.fragsize","7");
query.add("hl.maxAnalyzedChars","7");

Can you please tell me the cause of this error?

Thanks in advance,
Dhaivat
Search speed issue on new core creation
Hello All, I am using a master-slave setup with hundreds of cores replicated between the master and slave servers. I am facing a very weird issue when creating a new core: whenever a new core is created (using CoreAdminRequest.createCore(coreName,instanceDir,serverObj)), all searches issued to the other cores are blocked. Any help or thoughts would be highly appreciated. Regards, Dhaivat
wildcard queries with custom analyzer
Hello everyone, I have written a custom analyzer for indexing and querying data in Solr indexes. Now I would like to enable wildcard search with this custom analyzer only. Please guide me on how to enable this feature. Many Thanks, Dhaivat
Indexing and Query time boosting together
Hello All,

I want to boost certain products on particular keywords. For this I am using Solr's index-time boosting feature. I have given all documents in my Solr indices an index-time boost of "1.0". When a user wants to boost a certain product, I increase the index-time boost of that particular product only to 10.0. The problem is that I also use query-time boosting (to boost documents when the searched term is found directly in the title field), so even though I have increased the index-time boost of the particular product, it appears after the product boosted at query time.

Consider the following example:

- I have indexed a couple of documents related to mobile phones (Nokia, Samsung and so on).
- All the documents contain the title field, with the following values:

Doc1:
122
Nokia Phone 2610
Suprb phone
..

Doc2:
123
Samsung smwer233
Samsung phone
..

- Now if someone searches for "Phone", it displays "Nokia Phone" first and "Samsung Phone" second (by searching in and field).
- To display "Samsung" before "Nokia" I have boosted the index-time value, something like below:

123
Samsung smwer233
Samsung phone
..

- I am also boosting at query time, to boost documents that contain the terms in the field: "titleName:phone^4"

Now, even though the Samsung mobile has the higher boost, the Nokia mobile is displayed first and then the Samsung mobile.

Can anyone please guide me on how I can boost a particular document using index-time boosting (it should appear first even though I am applying query-time boosting)?

Many Thanks,
Dhaivat Dave
Re: Indexing and Query time boosting together
Hi Erick,

Many thanks for your reply. I got your point. One question on this: is it possible to give documents with a higher index-time boost priority over query-time boosting? I am trying to implement product promotions this way. Can you please guide me on how I should implement this feature?

Many Thanks,
Dhaivat Dave

On Fri, Aug 2, 2013 at 5:34 PM, Erick Erickson wrote:
> Add &debug=all to your query, that'll show you exactly how the scores
> are calculated. But the most obvious thing is that you're boosting
> on the titleName field in your query, which for doc 123 does NOT
> contain "phone" so I suspect the fact that "phone" is in the titleName
> field for 122 is overriding the index-time boost, especially since "phone"
> appears in both title and description for 122.
>
> Best
> Erick

--
Regards
Dhaivat
Re: Indexing and Query time boosting together
Hey Jack,

Thank you so much for your reply. This is very useful.

Thanks again,
Dhaivat Dave

On Fri, Aug 2, 2013 at 8:04 PM, Jack Krupansky wrote:
> "product promotions" = "query elevation"
>
> See:
> http://wiki.apache.org/solr/QueryElevationComponent
> https://cwiki.apache.org/confluence/display/solr/The+Query+Elevation+Component
>
> Or, boost the query using a function query referencing an external file
> field that gets updated for promotions.
>
> -- Jack Krupansky

--
Regards
Dhaivat
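To make the effect of "query elevation" concrete, here is a small client-side illustration using the document ids from this thread (122 for the Nokia phone, 123 for the Samsung phone). It is only a sketch of the resulting order: promoted ids that matched the query are placed first regardless of score, and everything else keeps its relevance order. It is not how the QueryElevationComponent is implemented, and the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class PromotionOrder {

    /**
     * Returns rankedIds with any promoted ids that appear in it moved to the
     * front, in promotion order; the rest keep their original ranking.
     */
    static List<String> pinPromoted(List<String> rankedIds, List<String> promotedIds) {
        Set<String> ordered = new LinkedHashSet<String>();
        for (String id : promotedIds) {
            if (rankedIds.contains(id)) {
                ordered.add(id); // only promote documents that actually matched
            }
        }
        ordered.addAll(rankedIds); // duplicates are ignored, order is preserved
        return new ArrayList<String>(ordered);
    }
}
```

The point of the design is that promotion is decided outside the scoring formula, so a query-time boost such as titleName:phone^4 cannot push a promoted product back down.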
developing custom tokenizer
Hello All, I want to create a custom tokeniser in Solr 4.4. It would be very helpful if someone could share any tutorials or information on this. Many Thanks, Dhaivat Dave
Re: developing custom tokenizer
Hi Alex,

Thanks for your reply. I looked into the core analysers and created a custom tokeniser based on them; I have shared the code below. When I look at the Solr analysis page, the analyser works fine, but when I submit 100 docs together I can see in the logs (with custom message printing) that for some of the documents the "create" method of SampleTokeniserFactory is not called (please see the code below). Can you please help me find what is wrong in the following code? Am I missing something?

Here is the class which extends the TokenizerFactory class:

=== SampleTokeniserFactory.java

public class SampleTokeniserFactory extends TokenizerFactory {

    public SampleTokeniserFactory(Map<String, String> args) {
        super(args);
    }

    public SampleTokeniser create(AttributeFactory factory, Reader reader) {
        return new SampleTokeniser(factory, reader);
    }
}

Here is the class which extends the Tokenizer class:

package ns.solr.analyser;

import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

public class SampleTokeniser extends Tokenizer {

    private List<Token> tokenList = new ArrayList<Token>();

    int tokenCounter = -1;

    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

    /**
     * Object that defines the offset attribute
     */
    private final OffsetAttribute offsetAttribute = (OffsetAttribute) addAttribute(OffsetAttribute.class);

    /**
     * Object that defines the position attribute
     */
    private final PositionIncrementAttribute position = (PositionIncrementAttribute) addAttribute(PositionIncrementAttribute.class);

    public SampleTokeniser(AttributeFactory factory, Reader reader) {
        super(factory, reader);
        String textToProcess = null;
        try {
            textToProcess = readFully(reader);
            processText(textToProcess);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public String readFully(Reader reader) throws IOException {
        char[] arr = new char[8 * 1024]; // 8K at a time
        StringBuffer buf = new StringBuffer();
        int numChars;
        while ((numChars = reader.read(arr, 0, arr.length)) > 0) {
            buf.append(arr, 0, numChars);
        }
        return buf.toString();
    }

    public void processText(String textToProcess) {
        String wordsList[] = textToProcess.split(" ");
        int startOffset = 0, endOffset = 0;
        for (String word : wordsList) {
            endOffset = word.length();
            Token aToken = new Token("Token." + word, startOffset, endOffset);
            aToken.setPositionIncrement(1);
            tokenList.add(aToken);
            startOffset = endOffset + 1;
        }
    }

    @Override
    public boolean incrementToken() throws IOException {
        clearAttributes();
        tokenCounter++;
        if (tokenCounter < tokenList.size()) {
            Token aToken = tokenList.get(tokenCounter);
            termAtt.append(aToken);
            termAtt.setLength(aToken.length());
            offsetAttribute.setOffset(correctOffset(aToken.startOffset()),
                    correctOffset(aToken.endOffset()));
            position.setPositionIncrement(aToken.getPositionIncrement());
            return true;
        }
        return false;
    }

    /**
     * close object
     *
     * @throws IOException
     */
    public void close() throws IOException {
        super.close();
        System.out.println("Close method called");
    }

    /**
     * called when end method gets called
     *
     * @throws IOException
     */
    public void end() throws IOException {
        super.end();
        // setting final offset
        System.out.println("end called with final offset");
    }

    /**
     * method reset the record
     *
     * @throws IOException
     */
    public void reset() throws IOException {
        super.reset();
        System.out.println("Reset Called");
        tokenCounter = -1;
    }
}

Many Thanks,
Dhaivat

On Mon, Aug 12, 2013 at 7:03 PM, Alexandre Rafalovitch wrote:
> Have you tried looking at source code itself? Between simple organizer like
> keyword and complex language ones, you should be able to get an idea. Then
> ask specific follow up questions.
>
> Regards,
> Alex

--
Regards
Dhaivat
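A side note on the offsets in the posted processText method: it sets endOffset to word.length() and the next startOffset to endOffset + 1, so the offsets are not cumulative. For the text "baby t-shirts" the second token would get offsets 5 to 8 instead of 5 to 13. The sketch below shows the cumulative calculation with plain JDK classes only; the class and method names are made up for illustration and no Lucene API is involved.

```java
import java.util.ArrayList;
import java.util.List;

final class OffsetSketch {

    /** Returns a {startOffset, endOffset} pair for each space-separated word. */
    static List<int[]> wordOffsets(String text) {
        List<int[]> offsets = new ArrayList<int[]>();
        int start = 0;
        for (String word : text.split(" ")) {
            int end = start + word.length(); // end is relative to the whole text
            if (!word.isEmpty()) {
                offsets.add(new int[] {start, end});
            }
            start = end + 1; // skip the single space that split() consumed
        }
        return offsets;
    }
}
```

Offsets of this kind matter for highlighting in particular, because the highlighter uses them to cut the matched text out of the stored field value.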
issue with custom tokenizer
Hello All,

I am trying to develop a custom tokeniser (please find the code below) and found an issue when adding multiple documents one after another. It works fine when I add the first document, but when I add another document it does not call the "create" method of SampleTokeniserFactory.java; instead it directly calls the reset method and then incrementToken(). Does anyone have an idea what is wrong in the code below? Please share your thoughts on this.

Here is the class which extends the TokenizerFactory class:

=== SampleTokeniserFactory.java

public class SampleTokeniserFactory extends TokenizerFactory {

    public SampleTokeniserFactory(Map<String, String> args) {
        super(args);
    }

    public SampleTokeniser create(AttributeFactory factory, Reader reader) {
        return new SampleTokeniser(factory, reader);
    }
}

Here is the class which extends the Tokenizer class:

package ns.solr.analyser;

import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

public class SampleTokeniser extends Tokenizer {

    private List<Token> tokenList = new ArrayList<Token>();

    int tokenCounter = -1;

    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

    /**
     * Object that defines the offset attribute
     */
    private final OffsetAttribute offsetAttribute = (OffsetAttribute) addAttribute(OffsetAttribute.class);

    /**
     * Object that defines the position attribute
     */
    private final PositionIncrementAttribute position = (PositionIncrementAttribute) addAttribute(PositionIncrementAttribute.class);

    public SampleTokeniser(AttributeFactory factory, Reader reader) {
        super(factory, reader);
        String textToProcess = null;
        try {
            textToProcess = readFully(reader);
            processText(textToProcess);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public String readFully(Reader reader) throws IOException {
        char[] arr = new char[8 * 1024]; // 8K at a time
        StringBuffer buf = new StringBuffer();
        int numChars;
        while ((numChars = reader.read(arr, 0, arr.length)) > 0) {
            buf.append(arr, 0, numChars);
        }
        return buf.toString();
    }

    public void processText(String textToProcess) {
        String wordsList[] = textToProcess.split(" ");
        int startOffset = 0, endOffset = 0;
        for (String word : wordsList) {
            endOffset = word.length();
            Token aToken = new Token("Token." + word, startOffset, endOffset);
            aToken.setPositionIncrement(1);
            tokenList.add(aToken);
            startOffset = endOffset + 1;
        }
    }

    @Override
    public boolean incrementToken() throws IOException {
        clearAttributes();
        tokenCounter++;
        if (tokenCounter < tokenList.size()) {
            Token aToken = tokenList.get(tokenCounter);
            termAtt.append(aToken);
            termAtt.setLength(aToken.length());
            offsetAttribute.setOffset(correctOffset(aToken.startOffset()),
                    correctOffset(aToken.endOffset()));
            position.setPositionIncrement(aToken.getPositionIncrement());
            return true;
        }
        return false;
    }

    /**
     * close object
     *
     * @throws IOException
     */
    public void close() throws IOException {
        super.close();
        System.out.println("Close method called");
    }

    /**
     * called when end method gets called
     *
     * @throws IOException
     */
    public void end() throws IOException {
        super.end();
        // setting final offset
        System.out.println("end called with final offset");
    }

    /**
     * method reset the record
     *
     * @throws IOException
     */
    public void reset() throws IOException {
        super.reset();
        System.out.println("Reset Called");
        tokenCounter = -1;
    }
}
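The behaviour described in this question (no call to create for the second document, only reset and then incrementToken) matches how Lucene 4.x analyzers reuse tokenizer instances: as far as I know, the factory creates the tokenizer once per thread, and each later document is handed to the same instance through setReader before reset() is called. A tokenizer that reads its input in the constructor, as the posted one does, would therefore keep replaying the first document's tokens. The toy model below uses only JDK classes to show the reuse-safe shape, with all per-document work done in reset(); the class is hypothetical and is not a Lucene Tokenizer.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

final class ReusableSplitter {
    private Reader input;                        // swapped for every new document
    private final List<String> tokens = new ArrayList<String>();
    private int cursor;

    void setReader(Reader reader) {
        this.input = reader;
    }

    /** Per-document work lives here, so a reused instance starts clean. */
    void reset() throws IOException {
        tokens.clear();
        cursor = 0;
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[1024];
        int read;
        while ((read = input.read(buffer, 0, buffer.length)) > 0) {
            text.append(buffer, 0, read);
        }
        for (String word : text.toString().split(" ")) {
            if (!word.isEmpty()) {
                tokens.add(word);
            }
        }
    }

    /** Returns the next token, or null when the document is exhausted. */
    String next() {
        return cursor < tokens.size() ? tokens.get(cursor++) : null;
    }

    /** Drives one document through an existing instance, the way a reusing caller would. */
    static List<String> tokenize(ReusableSplitter splitter, String document) {
        try {
            splitter.setReader(new StringReader(document));
            splitter.reset();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        List<String> out = new ArrayList<String>();
        for (String t = splitter.next(); t != null; t = splitter.next()) {
            out.add(t);
        }
        return out;
    }
}
```

In the real Tokenizer class the current reader is available to subclasses as the protected `input` field, so the equivalent change would be to move the readFully and processText calls from the constructor into reset(), after super.reset(), and to clear tokenList there as well.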
Error while indexing data using Solr (Unexpected character 'F' (code 70) in prolog; expected '<')
Hello everyone, I am getting an error while indexing data into Solr. I am using the SolrJ APIs with the XML request handler to index documents, and I get this error: org.apache.solr.common.SolrException: Unexpected character 'F' (code 70) in prolog; expected '<' at [row,col {unknown-source}]: [1,1]. I have also escaped the content before sending it to Solr. Can anyone please tell me the reason behind this error? Regards, Dhaivat
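Reading the message literally, the XML parser found the character 'F' at row 1, column 1, where it expected the '<' that opens an XML document. That suggests the body reaching the XML handler did not start with markup at all, which escaping the field content would not change. What was actually sent is not shown in this thread, so the strings below are made-up examples. The sketch uses the JDK's built-in StAX parser, whose error text differs from the one quoted above, to show the same kind of failure.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class PrologCheck {

    /** Returns true if the whole body can be read as a well-formed XML document. */
    static boolean parsesAsXml(String body) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(body));
            while (reader.hasNext()) {
                reader.next();
            }
            reader.close();
            return true;
        } catch (XMLStreamException e) {
            // Raised, for example, when text appears before the root element.
            return false;
        }
    }
}
```

Logging the first few characters of the request body on the client side is a quick way to check whether it begins with an XML element such as an add command or with something else.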
Re: Load Testing in Solr
Thanks, Pravesh, for your reply. I'll use the JMeter tool.

On Thu, Aug 30, 2012 at 11:10 PM, pravesh wrote:
> Hi Dhaivat,
> JMeter is a nice tool. But it all depends what sort of load are you
> expecting, how complex queries are you expecting (sorting/filtering/textual
> searches). You need to consider all these to benchmark.
>
> Thanx
> Pravedsh

--
Regards
Dhaivat