timezone DIH and dataimport.properties
Hello. How can I set the Java timezone via system properties? My problem is that dataimport.properties contains the wrong timezone, and I don't know how to set the correct one. Thanks.
-
--- System
One server, 12 GB RAM, 2 Solr instances, 7 cores;
1 core with 31 million documents, the other cores < 100,000.
- Solr1 for search requests - commit every minute - 5 GB Xmx
- Solr2 for update requests - delta every minute - 4 GB Xmx
--
View this message in context: http://lucene.472066.n3.nabble.com/timezone-DIH-and-dataimport-properties-tp2864928p2864928.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: how to concatenate two nodes of xml with xpathentityprocessor
Vishal, I don't really understand what you're trying to achieve. Indexing what (complete/sample documents, valid if possible)? And getting what exactly as a result? Regards Stefan On Mon, Apr 25, 2011 at 5:01 PM, vrpar...@gmail.com wrote: > hello, > > I am using the XPathEntityProcessor to index xml files > > below is my xml file > > > >CustomerA > ThisB > AnyC > > > now I want to concatenate in the index, so that when I search it gives the below > result > > CData with id attribute--- like CustomerA id="2">ThisB or something like that > > is it possible with the RegexTransformer or TemplateTransformer? I googled > a little for both but could not find an exact/useful solution > > Thanks > > Vishal Parekh > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/how-to-concatenate-two-nodes-of-xml-with-xpathentityprocessor-tp2861260p2861260.html > Sent from the Solr - User mailing list archive at Nabble.com. >
Re: timezone DIH and dataimport.properties
java -Duser.timezone=UTC -jar start.jar ? On Tue, Apr 26, 2011 at 9:54 AM, stockii wrote: > Hello. > > How can i set the timezone oft java in my java properties ? > > my problem is, that in the dataimport-properties is a wrong timezone and i > dont know how to set the correct timezone ... !?!? thx > > - > --- System > > > One Server, 12 GB RAM, 2 Solr Instances, 7 Cores, > 1 Core with 31 Million Documents other Cores < 100.000 > > - Solr1 for Search-Requests - commit every Minute - 5GB Xmx > - Solr2 for Update-Request - delta every Minute - 4GB Xmx > -- > View this message in context: > http://lucene.472066.n3.nabble.com/timezone-DIH-and-dataimport-properties-tp2864928p2864928.html > Sent from the Solr - User mailing list archive at Nabble.com. >
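For context, a sketch of the moving parts (the timestamp value below is invented): the DataImportHandler records the time of the last import in conf/dataimport.properties, formatted in the JVM's default timezone, and substitutes it into delta queries as ${dataimporter.last_index_time}. Forcing the JVM timezone at startup, as suggested above, therefore controls what gets written:

```properties
# Start Solr with an explicit timezone:
#   java -Duser.timezone=UTC -jar start.jar
#
# conf/dataimport.properties - maintained by the DataImportHandler.
# The timestamp is rendered in the JVM's default timezone:
last_index_time=2011-04-26 08\:00\:00
```

If the database clock is in a different timezone than the JVM, the delta query may also need an explicit conversion on the database side (e.g. CONVERT_TZ in MySQL).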
Problem with autogeneratePhraseQueries
Hi, I'm new to Solr. My Solr instance version is: Solr Specification Version: 3.1.0 Solr Implementation Version: 3.1.0 1085815 - grantingersoll - 2011-03-26 18:00:07 Lucene Specification Version: 3.1.0 Lucene Implementation Version: 3.1.0 1085809 - 2011-03-26 18:06:58 Current Time: Tue Apr 26 08:01:09 CEST 2011 Server Start Time: Tue Apr 26 07:59:05 CEST 2011 I have the following definition for the textgen type: I'm using this type for the name field in my index. As you can see, I'm using autoGeneratePhraseQueries="false", but for the query sony vaio 4gb I'm getting the following query in debug: sony vaio 4gb sony vaio 4gb +name:sony +name:vaio +MultiPhraseQuery(name:"(4gb 4) gb") +name:sony +name:vaio +name:"(4gb 4) gb" Do you have any idea how I can avoid this MultiPhraseQuery? Best Regards, solr_beginner
Re: Problem with autogeneratePhraseQueries
What do you have in solrconfig.xml for luceneMatchVersion? If you don't set this, then it's going to default to "Lucene 2.9" emulation so that old Solr 1.4 configs work the same way. I tried your example and it worked fine here, and I'm guessing this is probably what's happening. The default in the example solrconfig.xml looks like this: <luceneMatchVersion>LUCENE_31</luceneMatchVersion> On Tue, Apr 26, 2011 at 6:51 AM, Solr Beginner wrote: > Hi, > > I'm new to solr. My solr instance version is: > > Solr Specification Version: 3.1.0 > Solr Implementation Version: 3.1.0 1085815 - grantingersoll - 2011-03-26 > 18:00:07 > Lucene Specification Version: 3.1.0 > Lucene Implementation Version: 3.1.0 1085809 - 2011-03-26 18:06:58 > Current Time: Tue Apr 26 08:01:09 CEST 2011 > Server Start Time:Tue Apr 26 07:59:05 CEST 2011 > > I have following definition for textgen type: > > autoGeneratePhraseQueries="false"> > > > words="stopwords.txt" enablePositionIncrements="true" /> > generateNumberParts="1" catenateWords="1" catenateNumbers="1" > preserveOriginal="1"/> > > side="front" preserveOriginal="1"/> > > > > ignoreCase="true" expand="true"/> > ignoreCase="true" > words="stopwords.txt" > enablePositionIncrements="true"/> > generateNumberParts="1" catenateWords="0" catenateNumbers="0" > catenateAll="0" preserveOriginal="1"/> > > > > > > I'm using this type for name field in my index. As you can see I'm > using autoGeneratePhraseQueries="false" but for query sony vaio 4gb I'm > getting following query in debug: > > > sony vaio 4gb > sony vaio 4gb > +name:sony +name:vaio +MultiPhraseQuery(name:"(4gb > 4) gb") > +name:sony +name:vaio +name:"(4gb 4) > gb" > > Do you have any idea how can I avoid this MultiPhraseQuery? > > Best Regards, > solr_beginner >
Re: Query regarding solr plugin.
Sorry, but there's too much here to debug remotely. I strongly advise you to back way up. Undo (but save) all your changes. Start by doing the simplest thing you can: just get a dummy class in place and get it called. Perhaps create a really dumb logger method that opens a text file, writes a message, and closes the file. Inefficient, I know, but this is just to find the problem. Debugging by println is an ancient technique... Once you're certain the dummy class is called, gradually build it up into the complex class you eventually want. One problem here is that you've changed a bunch of moving parts and copied jars around (it's unclear whether you have two copies of solr-core in your classpath, for instance). So knowing exactly which one of those is the issue is very difficult, especially since you may have forgotten one of the things you did. I know when I've been trying to do something for days, lots of details get lost. Try to avoid changing the underlying Solr code; can you do what you want by subclassing instead and calling your new class? That would avoid a bunch of problems. If you can't subclass, copy the whole thing, rename it to something new, and call *that* rather than re-using the SynonymFilterFactory. The only jar you should copy to the directory would be the one containing your new class. I can't emphasize strongly enough that you'll save yourself lots of grief if you start with a fresh install and build up gradually, rather than try to unravel the current code. It feels wasteful, but winds up being faster in my experience... Good Luck! Erick On Tue, Apr 26, 2011 at 12:41 AM, rajini maski wrote: > Thanks Erick. I have added my replies to the points you mentioned. I am > going wrong somewhere. Do I need to club both the jars together or something? > If yes, how do I do that? I do not have much idea about java and jar files. > Please guide me here. > > A couple of things to try.
> > 1> when you do a 'jar -tfv ", you should see > output like: > 1183 Sun Jun 06 01:31:14 EDT 2010 > org/apache/lucene/analysis/sinks/TokenTypeSinkTokenizer.class > and your statement may need the whole path, in this example... > (note, > this > is just an example of the pathing, this class has nothing to do with > your filter)... > > I could see this output.. > > 2> But I'm guessing your path is actually OK, because I'd expect to be > seeing a > "class not found" error. So my guess is that your class depends on > other jars that > aren't packaged up in your jar and if you find which ones they are and copy > them > to your lib directory you'll be OK. Or your code is throwing an error > on load. Or > something like that... > > There is jar - "apache-solr-core-1.4.1.jar" this has the > BaseTokenFilterFacotry class and the Synonymfilterfactory class..I made the > changes in second class file and created it as new. Now i created a jar of > that java file and placed this in solr home/lib and also placed > "apache-solr-core-1.4.1.jar" file in lib folder of solr home. [solr home - > c:\orch\search\solr lib path - c:\orch\search\solr\lib] > > 3> to try to understand what's up, I'd back up a step. Make a really > stupid class > that doesn't do anything except derive from BaseTokenFilterFacotry and see > if > you can load that. If you can, then your process is OK and you need to > find out what classes your new filter depend on. If you still can't, then we > can > see what else we can come up with.. > > > I am perhaps doing same. In the synonymfilterfactory class, there is a > function parse rules which takes delimiters as one of the input parameter. > Here i changed comma ',' to '~' tilde symbol and thats it. > > > Regards, > Rajani > > > On Mon, Apr 25, 2011 at 6:26 PM, Erick Erickson > wrote: > >> Looking at things more carefully, it may be one of your dependent classes >> that's not being found. >> >> A couple of things to try. 
>> >> 1> when you do a 'jar -tfv ", you should see >> output like: >> 1183 Sun Jun 06 01:31:14 EDT 2010 >> org/apache/lucene/analysis/sinks/TokenTypeSinkTokenizer.class >> and your statement may need the whole path, in this example... >> (note, >> this >> is just an example of the pathing, this class has nothing to do with >> your filter)... >> >> 2> But I'm guessing your path is actually OK, because I'd expect to be >> seeing a >> "class not found" error. So my guess is that your class depends on >> other jars that >> aren't packaged up in your jar and if you find which ones they are and copy >> them >> to your lib directory you'll be OK. Or your code is throwing an error >> on load. Or >> something like that... >> >> 3> to try to understand what's up, I'd back up a step. Make a really >> stupid class >> that doesn't do anything except derive from BaseTokenFilterFacotry and see >> if >> you can load that. If you can, then your process is OK and you need to >> find out what classes your new filter depend on. If you still can't, then >> we can >> see what else we can come up with.. >>
Re: Problem with autogeneratePhraseQueries
Thank you very much for answer. You were right. There was no luceneMatchVersion in solrconfig.xml of our dev core. We thought that values not present in core configuration are copied from main solrconfig.xml. I will investigate if our administrators did something wrong during upgrade to 3.1. On Tue, Apr 26, 2011 at 1:35 PM, Robert Muir wrote: > What do you have in solrconfig.xml for luceneMatchVersion? > > If you don't set this, then its going to default to "Lucene 2.9" > emulation so that old solr 1.4 configs work the same way. I tried your > example and it worked fine here, and I'm guessing this is probably > whats happening. > > the default in the example/solrconfig.xml looks like this: > > > LUCENE_31 > > On Tue, Apr 26, 2011 at 6:51 AM, Solr Beginner > wrote: > > Hi, > > > > I'm new to solr. My solr instance version is: > > > > Solr Specification Version: 3.1.0 > > Solr Implementation Version: 3.1.0 1085815 - grantingersoll - 2011-03-26 > > 18:00:07 > > Lucene Specification Version: 3.1.0 > > Lucene Implementation Version: 3.1.0 1085809 - 2011-03-26 18:06:58 > > Current Time: Tue Apr 26 08:01:09 CEST 2011 > > Server Start Time:Tue Apr 26 07:59:05 CEST 2011 > > > > I have following definition for textgen type: > > > > positionIncrementGap="100" > > autoGeneratePhraseQueries="false"> > > > > > > > words="stopwords.txt" enablePositionIncrements="true" /> > > > generateNumberParts="1" catenateWords="1" catenateNumbers="1" > > preserveOriginal="1"/> > > > > maxGramSize="15" > > side="front" preserveOriginal="1"/> > > > > > > > > > ignoreCase="true" expand="true"/> > > > ignoreCase="true" > > words="stopwords.txt" > > enablePositionIncrements="true"/> > > > generateNumberParts="1" catenateWords="0" catenateNumbers="0" > > catenateAll="0" preserveOriginal="1"/> > > > > > > > > > > > > I'm using this type for name field in my index. 
As you can see I'm > > using autoGeneratePhraseQueries="false" but for query sony vaio 4gb I'm > > getting following query in debug: > > > > > > sony vaio 4gb > > sony vaio 4gb > > +name:sony +name:vaio > +MultiPhraseQuery(name:"(4gb > > 4) gb") > > +name:sony +name:vaio +name:"(4gb 4) > > gb" > > > > Do you have any idea how can I avoid this MultiPhraseQuery? > > > > Best Regards, > > solr_beginner > > >
Re: how to concatenate two nodes of xml with xpathentityprocessor
Thanks Stefan. Currently my data-config file uses the XPathEntityProcessor, and when I search I get the following search result: CustomerA AnyC 1 3 But I want the following result: 1,CustomerA 3,AnyC OR CustomerA AnyC (or any other format - I just want both values combined). Thanks Vishal Parekh -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-concatenate-two-nodes-of-xml-with-xpathentityprocessor-tp2861260p2865508.html Sent from the Solr - User mailing list archive at Nabble.com.
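One possible approach (a sketch - the original data-config did not survive in the message, so the entity, column, and xpath names below are invented for illustration): the TemplateTransformer can build a combined column out of two already-extracted columns:

```xml
<entity name="rec" processor="XPathEntityProcessor"
        url="data.xml" forEach="/root/record"
        transformer="TemplateTransformer">
  <!-- hypothetical source columns -->
  <field column="id"    xpath="/root/record/@id"/>
  <field column="cdata" xpath="/root/record/CData"/>
  <!-- combined value, e.g. "1,CustomerA" -->
  <field column="id_cdata" template="${rec.id},${rec.cdata}"/>
</entity>
```

Searching and displaying the id_cdata field would then return both values together.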
What initializes a new searcher?
Hi, I'm reading the Solr cache documentation - http://wiki.apache.org/solr/SolrCaching - and found there: "The current Index Searcher serves requests and when a new searcher is opened...". Could you explain when a new searcher is opened? Does it have something to do with index commits? Best Regards, Solr Beginner
TermsComponent + Dist. Search + Large Index + HEAP SPACE
Hi! We've got one index split into 4 shards of roughly 70,000 records each, containing large full-text data from (very dirty) OCR. Thus we have a lot of "unique" terms. Now we are trying to obtain the 400 most common words for the CommonGramsFilter via the TermsComponent, but the request always runs out of memory. The VM is equipped with 32 GB of RAM, with 16-26 GB allocated to the Java VM. Any ideas how to get the most common terms without increasing the VM's memory? Thanks & best regards, Sebastian -- View this message in context: http://lucene.472066.n3.nabble.com/TermsCompoment-Dist-Search-Large-Index-HEAP-SPACE-tp2865609p2865609.html Sent from the Solr - User mailing list archive at Nabble.com.
org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.dataimport.DataImportHandler'
Hello, I get the following error: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.dataimport.DataImportHandler' at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:389) at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:423) at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:459) ... This error only occurs in Solr 3.1; in Solr 1.4.1 it works fine. How can I solve this problem? Thanks Vishal Parekh -- View this message in context: http://lucene.472066.n3.nabble.com/org-apache-solr-common-SolrException-Error-loading-class-org-apache-solr-handler-dataimport-DataImpo-tp2865625p2865625.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.dataimport.DataImportHandler'
http://www.lucidimagination.com/blog/2011/04/01/solr-powered-isfdb-part-8/ On Tue, Apr 26, 2011 at 3:34 PM, vrpar...@gmail.com wrote: > Hello, > > i got following source > > org.apache.solr.common.SolrException: Error loading class > 'org.apache.solr.handler.dataimport.DataImportHandler' at > org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:389) > at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:423) at > org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:459) . > > actually this error comes in solr 3.1 only in solr 1.4.1 it works fine > > how to solve this problem? > > Thanks > > Vishal Parekh > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/org-apache-solr-common-SolrException-Error-loading-class-org-apache-solr-handler-dataimport-DataImpo-tp2865625p2865625.html > Sent from the Solr - User mailing list archive at Nabble.com. >
Re: Automatic synonyms for multiple variations of a word
On Tue, Apr 26, 2011 at 12:24 AM, Otis Gospodnetic wrote: > But somehow this feels bad (well, so does sticking word variations in what's > supposed to be a synonyms file), partly because it means that the person > adding > new synonyms would need to know what they stem to (or always check it against > Solr before editing the file). when creating the synonym map from your input file, currently the factory actually uses your Tokenizer only to pre-process the synonyms file. One idea would be to use the tokenstream up to the synonymfilter itself (including filters). This way if you put a stemmer before the synonymfilter, it would stem your synonyms file, too. I haven't totally thought the whole thing through to see if theres a big reason why this wouldn't work (the synonymsfilter is complicated, sorry). But it does seem like it would produce more consistent results... and perhaps the inconsistency isnt so obvious since in the default configuration the synonymfilter is directly after the tokenizer.
WhitespaceTokenizer and scoring(field length)
Hello, I have a problem with the WhitespaceTokenizer and scoring. An example: id Titel 1 Manchester united 2 Manchester With the WhitespaceTokenizer, "Manchester united" will be split into "Manchester" and "united". When I search for "manchester" I get ids 1 and 2 in my results. What I want is for id 2 to score higher (field length). How can I fix this? -- View this message in context: http://lucene.472066.n3.nabble.com/WhitespaceTokenizer-and-scoring-field-length-tp2865784p2865784.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: TermsComponent + Dist. Search + Large Index + HEAP SPACE
Don't know your use case, but if you just want a list of the 400 most common words you can use the lucene contrib. HighFreqTerms.java with the - t flag. You have to point it at your lucene index. You also probably don't want Solr to be running and want to give the JVM running HighFreqTerms a lot of memory. http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/contrib/misc/src/java/org/apache/lucene/misc/HighFreqTerms.java?view=log Tom http://www.hathitrust.org/blogs/large-scale-search -Original Message- From: mdz-munich [mailto:sebastian.lu...@bsb-muenchen.de] Sent: Tuesday, April 26, 2011 9:29 AM To: solr-user@lucene.apache.org Subject: TermsCompoment + Dist. Search + Large Index + HEAP SPACE Hi! We've got one index splitted into 4 shards á 70.000 records of large full-text data from (very dirty) OCR. Thus we got a lot of "unique" terms. No we try to obtain the first 400 most common words for "CommonGramsFilter" via TermsComponent but the request runs allways out of memory. The VM is equipped with 32 GB of RAM, 16-26 GB alocated to the Java-VM. Any Ideas how to get the most common terms without increasing VMs Memory? Thanks & best regards, Sebastian -- View this message in context: http://lucene.472066.n3.nabble.com/TermsCompoment-Dist-Search-Large-Index-HEAP-SPACE-tp2865609p2865609.html Sent from the Solr - User mailing list archive at Nabble.com.
Apache Solr 3.1.0
I'm trying to tokenize email and IP addresses using the StandardTokenizerFactory. It correctly tokenizes IP addresses, but it divides an email address into two tokens: one with the value before the '@' and the other with the value after it. It works correctly under Solr 1.4.1. Has anybody else tried something similar on Solr 3.1.0 successfully, or is this a potential bug? Thanks, Wlodek S. -- View this message in context: http://lucene.472066.n3.nabble.com/Apache-Solr-3-1-0-tp2866007p2866007.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Apache Solr 3.1.0
Hi Wodek, UAX29URLEmailTokenizer includes all of StandardTokenizer's rules and adds rules to tokenize URLs and Emails: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.UAX29URLEmailTokenizerFactory Steve > -Original Message- > From: Wodek Siebor [mailto:siebor_wlo...@bah.com] > Sent: Tuesday, April 26, 2011 11:29 AM > To: solr-user@lucene.apache.org > Subject: Apache Solr 3.1.0 > > I'm trying to tokenize email and IP addresses using > StandardTokenizerFactory. > It does correctly tokenize IP address but it divides email address into > two > tokens one with value before '@' and the other with value after that. > > It works correctly under Solr 1.4.1 > > Has anybody else tried similar thing on Solr 3.1.0 successfully or is it > a > potential bug? > > Thanks, > Wlodek S. > > > > -- > View this message in context: http://lucene.472066.n3.nabble.com/Apache- > Solr-3-1-0-tp2866007p2866007.html > Sent from the Solr - User mailing list archive at Nabble.com.
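A minimal field type using that tokenizer might look like the following (a sketch; the filter chain is only an example):

```xml
<fieldType name="text_urlemail" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- keeps e-mail addresses and URLs as single tokens -->
    <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```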
Problems with Spellchecker in 3.1
Hi, all. Sorry for any duplication - it seems what I sent yesterday never made it through... We're having some trouble with the Solr spellcheck response. We're running version 3.1.

Overview: If we search for something really ugly like "kljhklsdjahfkljsdhf book rck", then the response contains a suggestions list for 'rck', but no suggestions list for the other two words. For 'book' that's fine, because it is 'spelled correctly' (i.e. we got hits on the word) and there shouldn't be any suggestions. For the ugly thing, though, there aren't any hits. The problem is that when we're handling the result, we can't tell the difference between no suggestions for a 'correctly spelled' term and no suggestions for something odd like this. (This also happens with searches that aren't so obviously garbage - i.e. real words that just don't show up in the index and have no suggestions - the example above is just to illustrate the point.)

Our setup: We're running multiple shards, which may be part of the issue. For example, 'book' might be found in one of the shards but not another. I don't *think* this has anything to do with our schema, since it's really about how the search suggestions are returned to us. But here are some bits and pieces: From schema.xml: From solrconfig.xml: textSpell default textSpell ./spellchecker

What we'd really like to see is the response coming back with an indication that a word wasn't found / had no suggestions. We've hacked around in the code a little bit to do this, but were wondering if anyone has come across this, and what approaches you've taken. We created new classes which extend IndexBasedSpellChecker and SpellCheckComponent, as follows (package and imports excluded for (sort of) brevity). The methods are as taken from the overridden classes, with changes noted by "SD" comments...

/**
 * This is a slight modification of Solr's AbstractLuceneSpellChecker.getSuggestions(SpellingOptions).
 * The modification allows correctly spelled words to be returned in the suggestions. This modification,
 * working in tandem with the SirsiDynixSpellCheckComponent, allows words with no suggestions to be
 * returned from the spell check component even in a sharded search.
 * Changes are marked with SD in the comments.
 */
public class SirsiDynixIndexBasedSpellChecker extends IndexBasedSpellChecker {

    @Override
    public SpellingResult getSuggestions(SpellingOptions options) throws IOException {
        boolean shardRequest = false;
        SolrParams params = options.customParams;
        if (params != null) {
            shardRequest = "true".equals(params.get(ShardParams.IS_SHARD));
        }
        SpellingResult result = new SpellingResult(options.tokens);
        IndexReader reader = determineReader(options.reader);
        Term term = field != null ? new Term(field, "") : null;
        float theAccuracy = (options.accuracy == Float.MIN_VALUE)
                ? spellChecker.getAccuracy() : options.accuracy;
        int count = Math.max(options.count, AbstractLuceneSpellChecker.DEFAULT_SUGGESTION_COUNT);

        for (Token token : options.tokens) {
            String tokenText = new String(token.buffer(), 0, token.length());
            String[] suggestions = spellChecker.suggestSimilar(tokenText, count,
                    field != null ? reader : null, // workaround LUCENE-1295
                    field, options.onlyMorePopular, theAccuracy);
            if (suggestions.length == 1 && suggestions[0].equals(tokenText)) {
                // These are spelled the same, continue on
                List suggList = Arrays.asList(suggestions); // SD added
                result.add(token, suggList); // SD added
                continue;
            }
            if (options.extendedResults == true && reader != null && field != null) {
                term = term.createTerm(tokenText);
                result.add(token, reader.docFreq(term));
                int countLimit = Math.min(options.count, suggestions.length);
                if (countLimit > 0) {
                    for (int i = 0; i < countLimit; i++) {
                        term = term.createTerm(suggestions[i]);
                        result.add(token, suggestions[i], reader.docFreq(term));
                    }
                } else if (shardRequest) {
                    List suggList = Collections.emptyList();
                    result.add(token, suggList);
                }
            } else {
                if (suggestions.length > 0) {
                    List suggList = Arrays.asList(suggestions);
                    if (suggestions.length > options.count) {
                        suggList = suggList.subList(0, options.count);
                    }
                    result.add(token, suggList);
                } else if (shardRequest) {
                    List suggList = Collections.emptyList();
                    result.add(token, suggList);
                }
            }
        }
        return result;
    }
}

/**
 * This is a
Ebay Kleinanzeigen and Auto Suggest
Hi, Someone told me that eBay is using Solr. I was looking at their auto-suggest implementation, and I guess they are using shingles and the TermsComponent. I managed to get a satisfactory implementation, but I have a problem with category-specific filtering. eBay's suggestions are sensitive to categories like Cars and Pets. As far as I understand, it is not possible to use filters with a terms query. Unless one uses multiple fields or special prefixes for the indexed words, I cannot think how to implement this. Is there perhaps a workaround for this limitation? Best Regards EricZ --- I have a shingle type like: and a query like http://localhost:8983/solr/terms?q=*%3A*&terms.fl=suggest_text&terms.sort=count&terms.prefix=audi
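Since the field type itself did not survive in the message above, here is a sketch of the kind of shingle chain being described (names and parameters are illustrative guesses):

```xml
<fieldType name="shingle_suggest" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="3" outputUnigrams="true"/>
  </analyzer>
</fieldType>
```

As for the category problem: one possible workaround (an idea, not a built-in feature) is to index the shingles with a category prefix, e.g. "audi a4" in the Cars category indexed as "cars|audi a4", and then query with terms.prefix=cars|audi - effectively smuggling the filter into the term prefix, at the cost of index size.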
Solr Newbie: Starting embedded server with multicore
I'm just starting with Solr. I'm using Solr 3.1.0, and I want to use EmbeddedSolrServer with a multicore setup, even though I currently have only one core (various documents I read suggest starting that way even if you have one core, to get the better administrative tools supported by multicore). I have two questions: 1. Does the first code sample below start the server with multicore or not? 2. Why does the first sample work while the second does not? My solr.xml looks like this: It's in a directory called solrhome in war/WEB-INF. I can get the server to come up cleanly if I follow an example in the Packt Solr book (p. 231), but I'm not sure whether this enables multicore or not:

File solrXML = new File("war/WEB-INF/solrhome/solr.xml");
String solrHome = solrXML.getParentFile().getAbsolutePath();
String dataDir = solrHome + "/data";
coreContainer = new CoreContainer(solrHome);
SolrConfig solrConfig = new SolrConfig(solrHome, "solrconfig.xml", null);
CoreDescriptor coreDescriptor = new CoreDescriptor(coreContainer, "mycore", solrHome);
SolrCore solrCore = new SolrCore("mycore", dataDir + "/" + "mycore", solrConfig, null, coreDescriptor);
coreContainer.register(solrCore, false);
embeddedSolr = new EmbeddedSolrServer(coreContainer, "mycore");

The documentation on the Solr wiki says I should configure the EmbeddedSolrServer for multicore like this:

File home = new File( "/path/to/solr/home" );
File f = new File( home, "solr.xml" );
CoreContainer container = new CoreContainer();
container.load( "/path/to/solr/home", f );
EmbeddedSolrServer server = new EmbeddedSolrServer( container, "core name as defined in solr.xml" );

When I try to do this, I get an error saying that it cannot find solrconfig.xml:

File solrXML = new File("war/WEB-INF/solrhome/solr.xml");
String solrHome = solrXML.getParentFile().getAbsolutePath();
coreContainer = new CoreContainer();
coreContainer.load(solrHome, solrXML);
embeddedSolr = new EmbeddedSolrServer(coreContainer, "mycore");

The message says it is looking in an odd place (I removed my user name from this). Why is it looking in solrhome/mycore/conf for solrconfig.xml? Both that and my schema.xml are in solrhome/conf. How can I point it at the right place? I tried adding "\workspace-Solr\institution-webapp\war\WEB-INF\solrhome\conf" to the classpath, but got the same result:

SEVERE: java.lang.RuntimeException: Can't find resource 'solrconfig.xml' in classpath or '\workspace-Solr\institution-webapp\war\WEB-INF\solrhome\mycore\conf/', cwd=\workspace-Solr\institution-webapp
at org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:268)
at org.apache.solr.core.SolrResourceLoader.openConfig(SolrResourceLoader.java:234)
at org.apache.solr.core.Config.(Config.java:141)
at org.apache.solr.core.SolrConfig.(SolrConfig.java:132)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:430)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:316)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:207)
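For what it's worth: CoreContainer.load() resolves each core's config relative to that core's instanceDir (instanceDir/conf/solrconfig.xml), which is why the error points at solrhome/mycore/conf. So either move conf/ under solrhome/mycore/, or declare the core with instanceDir="." so it shares solrhome/conf. A solr.xml along these lines (illustrative, since the original was stripped from the message) should make the second sample work:

```xml
<!-- war/WEB-INF/solrhome/solr.xml -->
<solr persistent="false">
  <cores adminPath="/admin/cores">
    <!-- instanceDir is resolved relative to solr home; with "." this core
         shares solrhome/conf for solrconfig.xml and schema.xml -->
    <core name="mycore" instanceDir="."/>
  </cores>
</solr>
```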
RE: TermsComponent + Dist. Search + Large Index + HEAP SPACE
Thanks for your suggestion. The problem seems to be the combination of shards and the TermsComponent. Now we simply request shard-by-shard, without the "shards" and "shards.qt" params, and merge the results via XSLT. Sebastian -- View this message in context: http://lucene.472066.n3.nabble.com/TermsCompoment-Dist-Search-Large-Index-HEAP-SPACE-tp2865609p2866499.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: What initializes a new searcher?
You're on the right track. In a system where the indexing process and search process are on the same machine, commits by the index process cause a new searcher to opened. In a master/slave situation (assuming you are indexing on the master and searching on the slave), then the searchers are reopened on the slaves after a replication. Replications happen after 1> a commit happens on the master and 2> the slave polls the master and pulls down the new commits. Hope that helps Erick On Tue, Apr 26, 2011 at 8:50 AM, Solr Beginner wrote: > Hi, > > I'm reading solr cache documentation - > http://wiki.apache.org/solr/SolrCaching I found there "The current > Index Searcher serves requests and when a new searcher is opened...". > Could you explain when new searcher is opened? Does it have something > to do with index commit? > > Best Regards, > Solr Beginner >
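To make the single-machine case concrete: besides explicit commits from the indexing client, a commit (and therefore a new searcher) can be triggered automatically from solrconfig.xml (the values here are examples only):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>10000</maxDocs>  <!-- commit after this many pending docs -->
    <maxTime>60000</maxTime>  <!-- ... or after this many milliseconds -->
  </autoCommit>
</updateHandler>
```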
Re: WhitespaceTokenizer and scoring(field length)
First, you can give us some more data to work with... In particular, attach &debugQuery=on to your http request and post the results. That will show how the documents got their score. Also, show us the field type definition and the field definition for the field in question. Best Erick On Tue, Apr 26, 2011 at 10:27 AM, roySolr wrote: > Hello, > > I have a problem with the whitespaceTokenizer and scoring. An example: > > id Titel > 1 Manchester united > 2 Manchester > > With the whitespaceTokenizer "Manchester united" will be splitted to > "Manchester" and "united". When > i search for "manchester" i get id 1 and 2 in my results. What i want is > that id 2 scores higher(field length). > How can i fix this? > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/WhitespaceTokenizer-and-scoring-field-length-tp2865784p2865784.html > Sent from the Solr - User mailing list archive at Nabble.com. >
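The request in question would look something like this (URL is illustrative):

```
http://localhost:8983/solr/select?q=name:manchester&fl=id,name,score&debugQuery=on
```

The "explain" section of the debug output breaks each document's score into factors; the fieldNorm factor is where field length enters, so comparing it between the two documents shows whether length normalization is being applied at all (norms are dropped if the field is indexed with omitNorms="true").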
Question on Batch process
I am sure that this question has been asked a few times, but I can't seem to find the sweet spot for indexing. I have about 100,000 files, each containing 1,000 XML documents ready to be posted to Solr. My desire is to have it index as quickly as possible; once that's completed, the daily stream of ADDs will be small in comparison. The individual documents are small - essentially web postings from the net: title, postPostContent, date. What would be the ideal configuration for ramBufferSizeMB, mergeFactor, maxBufferedDocs, etc.? My machine is a quad-core with hyper-threading, so it shows up as 8 CPUs in top. I have 16 GB of available RAM. Thanks in advance. Charlie
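There is no single right answer, but a common bulk-loading starting point in solrconfig.xml looks like this (values are illustrative; measure and tune):

```xml
<indexDefaults>
  <!-- flush by RAM consumption rather than by document count -->
  <ramBufferSizeMB>128</ramBufferSizeMB>
  <mergeFactor>10</mergeFactor>
  <!-- leave maxBufferedDocs unset so ramBufferSizeMB governs flushing -->
</indexDefaults>
```

Beyond the settings, the two things that usually matter most are posting from several client threads in parallel (to use all 8 CPUs) and committing once at the end of the bulk load rather than per batch.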
Re: Automatic synonyms for multiple variations of a word
Suppose your analysis stack includes lower-casing, but your synonyms are only supposed to apply to upper-case tokens. For example, "PET" might be a synonym of "positron emission tomography", but "pet" wouldn't be. -Mike On 04/26/2011 09:51 AM, Robert Muir wrote: On Tue, Apr 26, 2011 at 12:24 AM, Otis Gospodnetic wrote: But somehow this feels bad (well, so does sticking word variations in what's supposed to be a synonyms file), partly because it means that the person adding new synonyms would need to know what they stem to (or always check it against Solr before editing the file). when creating the synonym map from your input file, currently the factory actually uses your Tokenizer only to pre-process the synonyms file. One idea would be to use the tokenstream up to the synonymfilter itself (including filters). This way if you put a stemmer before the synonymfilter, it would stem your synonyms file, too. I haven't totally thought the whole thing through to see if theres a big reason why this wouldn't work (the synonymsfilter is complicated, sorry). But it does seem like it would produce more consistent results... and perhaps the inconsistency isnt so obvious since in the default configuration the synonymfilter is directly after the tokenizer.
Re: Automatic synonyms for multiple variations of a word
Mike, thanks a lot for your example: the idea here would be that you put the LowerCaseFilter after the SynonymFilter, and then you get this exact flexibility?

e.g.

WhitespaceTokenizer
SynonymFilter -> no lowercasing of tokens is done, as it "analyzes" your synonyms with just the tokenizer
LowerCaseFilter

but

WhitespaceTokenizer
LowerCaseFilter
SynonymFilter -> the synonyms are lowercased, as it "analyzes" synonyms with the tokenizer+filter

It's already inconsistent today, because if you do:

LowerCaseTokenizer
SynonymFilter

then your synonyms are in fact all being lowercased... it's just arbitrary that they are only being analyzed with the "tokenizer".

On Tue, Apr 26, 2011 at 4:13 PM, Mike Sokolov wrote: > Suppose your analysis stack includes lower-casing, but your synonyms are > only supposed to apply to upper-case tokens. For example, "PET" might be a > synonym of "positron emission tomography", but "pet" wouldn't be. > > -Mike > > On 04/26/2011 09:51 AM, Robert Muir wrote: >> >> On Tue, Apr 26, 2011 at 12:24 AM, Otis Gospodnetic >> wrote: >> >> >>> >>> But somehow this feels bad (well, so does sticking word variations in >>> what's >>> supposed to be a synonyms file), partly because it means that the person >>> adding >>> new synonyms would need to know what they stem to (or always check it >>> against >>> Solr before editing the file). >>> >> >> when creating the synonym map from your input file, currently the >> factory actually uses your Tokenizer only to pre-process the synonyms >> file. >> >> One idea would be to use the tokenstream up to the synonymfilter >> itself (including filters). This way if you put a stemmer before the >> synonymfilter, it would stem your synonyms file, too. >> >> I haven't totally thought the whole thing through to see if theres a >> big reason why this wouldn't work (the synonymsfilter is complicated, >> sorry). But it does seem like it would produce more consistent >> results... 
and perhaps the inconsistency isnt so obvious since in the >> default configuration the synonymfilter is directly after the >> tokenizer. >> >
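To make the ordering Robert describes concrete, here is a hypothetical schema.xml fragment; the field type names and attribute values are illustrative, not taken from the thread:

```xml
<!-- SynonymFilter before lower-casing: tokens reach it with their
     original case, so "PET" and "pet" can behave differently -->
<fieldType name="text_syn_first" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" expand="true" ignoreCase="false"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- LowerCaseFilter before SynonymFilter: all tokens are lowercased
     before synonym matching, so case distinctions are lost -->
<fieldType name="text_lc_first" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" expand="true" ignoreCase="false"/>
  </analyzer>
</fieldType>
```

As Robert notes, either way the synonyms *file* is currently pre-processed with only the tokenizer, which is where the inconsistency comes from.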
Re: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.dataimport.DataImportHandler'
I experienced the same issue. With Solr 1.x, I was copying out the 'example' directory to make my solr installation. However, for the Solr 3.x distributions, the DataImportHandler class exists in a directory that is at the same level as example: "dist", not a directory within. You'll either want to take the entire apache 3.1 directory, or modify solrconfig to point to the new place you've copied it: On Tue, Apr 26, 2011 at 6:38 AM, Stefan Matheis wrote: > http://www.lucidimagination.com/blog/2011/04/01/solr-powered-isfdb-part-8/ > > On Tue, Apr 26, 2011 at 3:34 PM, vrpar...@gmail.com > wrote: >> Hello, >> >> i got following source >> >> org.apache.solr.common.SolrException: Error loading class >> 'org.apache.solr.handler.dataimport.DataImportHandler' at >> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:389) >> at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:423) at >> org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:459) . >> >> actually this error comes in solr 3.1 only in solr 1.4.1 it works fine >> >> how to solve this problem? >> >> Thanks >> >> Vishal Parekh >> >> >> -- >> View this message in context: >> http://lucene.472066.n3.nabble.com/org-apache-solr-common-SolrException-Error-loading-class-org-apache-solr-handler-dataimport-DataImpo-tp2865625p2865625.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >
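The solrconfig.xml pointer mentioned above got stripped from the archived mail; it would look something like the following sketch. The `dir` path and jar-name regex depend on where you copied the 3.1 distribution relative to your Solr home, so treat them as placeholders:

```xml
<!-- load the DataImportHandler jars from the 3.1 "dist" directory,
     which sits next to (not inside) the example directory -->
<lib dir="../../dist/" regex="apache-solr-dataimporthandler-.*\.jar" />
```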
RE: term position question from analyzer stack for WordDelimiterFilterFactory
OK this is even more weird... everything is working much better except for one thing: I was testing use cases with our top query terms to make sure the below query settings wouldn't break any existing behavior, and got this most unusual result. The analyzer stack completely eliminated the word McAfee from the query terms! I'm like huh? Here is the analyzer page output for that search term:

Query Analyzer
org.apache.solr.analysis.WhitespaceTokenizerFactory {}
  term position: 1, term text: McAfee, term type: word, source start,end: 0,6

org.apache.solr.analysis.SynonymFilterFactory {synonyms=query_synonyms.txt, expand=true, ignoreCase=true}
  term position: 1, term text: McAfee, term type: word, source start,end: 0,6

org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt, ignoreCase=true}
  term position: 1, term text: McAfee, term type: word, source start,end: 0,6

org.apache.solr.analysis.WordDelimiterFilterFactory {preserveOriginal=0, generateNumberParts=0, catenateWords=0, generateWordParts=0, catenateAll=0, catenateNumbers=0}
  (no tokens)

org.apache.solr.analysis.LowerCaseFilterFactory {}
  (no tokens)

com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory {protected=protwords.txt}
  (no tokens)

org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}
  (no tokens)

-Original Message- From: Robert Petersen [mailto:rober...@buy.com] Sent: Monday, April 25, 2011 11:27 AM To: solr-user@lucene.apache.org; yo...@lucidimagination.com Subject: RE: term position question from analyzer stack for WordDelimiterFilterFactory Aha! I knew something must be awry, but when I looked at the analysis page output, well it sure looked like it should match. :) OK here is the query side WDF that finally works, I just turned everything off. 
(yay) First I tried just completely removing WDF from the query side analyzer stack, but that didn't work. So anyway I suppose I should turn off the catenate-all plus the preserve-original settings, reindex, and see if I still get a match, huh? (PS thank you very much for the help!!!) -Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley Sent: Monday, April 25, 2011 9:24 AM To: solr-user@lucene.apache.org Subject: Re: term position question from analyzer stack for WordDelimiterFilterFactory On Mon, Apr 25, 2011 at 12:15 PM, Robert Petersen wrote: > The search and index analyzer stack are the same. Ahhh, they should not be! Using both generate and catenate in WDF at query time is a no-no. Same reason you can't have multi-word synonyms at query time: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory I'd recommend going back to the WDF settings in the solr example server as a starting point. -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
Re: WhitespaceTokenizer and scoring(field length)
Hi, If you run your query with debugQuery=true you will see the explanation about how Lucene/Solr went about scoring your 2 docs. If you can't figure out what's going on from there, send the relevant part to the list, along with the parsed query (which you can also see from debugQuery=true output) and maybe we can help. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message > From: roySolr > To: solr-user@lucene.apache.org > Sent: Tue, April 26, 2011 10:27:44 AM > Subject: WhitespaceTokenizer and scoring(field length) > > Hello, > > I have a problem with the whitespaceTokenizer and scoring. An example: > > id Titel > 1 Manchester united > 2 Manchester > > With the whitespaceTokenizer "Manchester united" will be splitted to > "Manchester" and "united". When > i search for "manchester" i get id 1 and 2 in my results. What i want is > that id 2 scores higher(field length). > How can i fix this? > > > -- > View this message in context: >http://lucene.472066.n3.nabble.com/WhitespaceTokenizer-and-scoring-field-length-tp2865784p2865784.html > > Sent from the Solr - User mailing list archive at Nabble.com. >
Re: What initialize new searcher?
Hi, Yes, typically after your index has been replicated from master to a slave a commit will be issued and the new searcher will be opened. Before being exposed to regular clients it's a good practice to warm things up. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message > From: Solr Beginner > To: solr-user@lucene.apache.org > Sent: Tue, April 26, 2011 8:50:21 AM > Subject: What initialize new searcher? > > Hi, > > I'm reading solr cache documentation - > http://wiki.apache.org/solr/SolrCaching I found there "The current > Index Searcher serves requests and when a new searcher is opened...". > Could you explain when new searcher is opened? Does it have something > to do with index commit? > > Best Regards, > Solr Beginner >
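The warm-up Otis mentions is typically configured in solrconfig.xml with a newSearcher event listener; a minimal sketch (the queries themselves are placeholders you would replace with representative traffic):

```xml
<!-- fire a few representative queries against the new searcher
     before it is exposed to regular clients -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">some popular query</str>
      <str name="rows">10</str>
    </lst>
  </arr>
</listener>
```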
Re: Automatic synonyms for multiple variations of a word
Yes, I see. Makes sense. It is a bit hard to see a "bad" case for your proposal in that light. Here is one other example; I'm not sure whether it presents difficulties or not, and it may be a bit contrived, but hey, food for thought at least. Say you have set up synonyms between names and commonly-used pseudonyms or alternate names that should not be stemmed:

Malcolm X <=> Malcolm Little
Prince <=> Rogers Nelson Prince
Little Kim <=> Kimberly Denise Jones
Biggy Smalls
etc.

You don't want "Malcolm Littler" or "Littlest Kim" or "Big Small" to match anything. And Princely shouldn't bring up the artist. But you also have regular linguistic synonyms (not names) that *should* be stemmed (as in the original example). So little <=> small should imply littler <=> smaller and so on via stemming. Ideally you could put one SynonymFilter before the stemming and the other one after. In that case do the SynonymFilters get composed? I can't think of a believable example where that would cause a problem, but maybe you can?

-Mike

On 04/26/2011 04:25 PM, Robert Muir wrote: Mike, thanks a lot for your example: the idea here would be you would put the lowercasefilter after the synonymfilter, and then you get this exact flexibility? e.g. WhitespaceTokenizer SynonymFilter -> no lowercasing of tokens are done as it "analyzes" your synonyms with just the tokenizer LowerCaseFilter but WhitespaceTokenizer LowerCaseFilter SynonymFilter -> the synonyms are lowercased, as it "analyzes" synonyms with the tokenizer+filter its already inconsistent today, because if you do: LowerCaseTokenizer SynonymFilter then your synonyms are in fact all being lowercased... its just arbitrary that they are only being analyzed with the "tokenizer". On Tue, Apr 26, 2011 at 4:13 PM, Mike Sokolov wrote: Suppose your analysis stack includes lower-casing, but your synonyms are only supposed to apply to upper-case tokens. For example, "PET" might be a synonym of "positron emission tomography", but "pet" wouldn't be. 
-Mike On 04/26/2011 09:51 AM, Robert Muir wrote: On Tue, Apr 26, 2011 at 12:24 AM, Otis Gospodnetic wrote: But somehow this feels bad (well, so does sticking word variations in what's supposed to be a synonyms file), partly because it means that the person adding new synonyms would need to know what they stem to (or always check it against Solr before editing the file). when creating the synonym map from your input file, currently the factory actually uses your Tokenizer only to pre-process the synonyms file. One idea would be to use the tokenstream up to the synonymfilter itself (including filters). This way if you put a stemmer before the synonymfilter, it would stem your synonyms file, too. I haven't totally thought the whole thing through to see if theres a big reason why this wouldn't work (the synonymsfilter is complicated, sorry). But it does seem like it would produce more consistent results... and perhaps the inconsistency isnt so obvious since in the default configuration the synonymfilter is directly after the tokenizer.
Re: Ebay Kleinanzeigen and Auto Suggest
Hi Eric, Before using the terms component, allow me to point out: * http://sematext.com/products/autocomplete/index.html (used on http://search-lucene.com/ for example) * http://wiki.apache.org/solr/Suggester Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message > From: Eric Grobler > To: solr-user@lucene.apache.org > Sent: Tue, April 26, 2011 1:11:11 PM > Subject: Ebay Kleinanzeigen and Auto Suggest > > Hi > > Someone told me that ebay is using solr. > I was looking at their Auto Suggest implementation and I guess they are > using Shingles and the TermsComponent. > > I managed to get a satisfactory implementation but I have a problem with > category specific filtering. > Ebay suggestions are sensitive to categories like Cars and Pets. > > As far as I understand it is not possible to using filters with a term > query. > Unless one uses multiple fields or special prefixes for the words to index I > cannot think how to implement this. > > Is their perhaps a workaround for this limitation? > > Best Regards > EricZ > > --- > > I am have a shingle type like: > positionIncrementGap="100"> > > > maxShingleSize="4" /> > > > > > > > and a query like >http://localhost:8983/solr/terms?q=*%3A*&terms.fl=suggest_text&terms.sort=count&terms.prefix=audi >i >
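Eric's shingle field type lost its XML tags in the archive; a plausible reconstruction follows. Only positionIncrementGap="100" and maxShingleSize="4" are visible in the original mail, so the class names and other attributes here are assumptions:

```xml
<fieldType name="shingle_text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- maxShingleSize="4" is the one attribute preserved in the mail -->
    <filter class="solr.ShingleFilterFactory" maxShingleSize="4"/>
  </analyzer>
</fieldType>
```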
SynonymFilterFactory case changes
So if there is a hit in the synonym filter factory, do I need to put in the various case changes for a term so that the following WordDelimiterFilter analyzer can do its 'split on case changes' work? Here we see SynonymFilterFactory makes all terms lowercase, because this is what is in my synonyms.txt file and I have ignoreCase=true: "macafee, mcafee"

Index Analyzer
org.apache.solr.analysis.WhitespaceTokenizerFactory {}
  term position: 1, term text: McAfee, term type: word, source start,end: 0,6

org.apache.solr.analysis.SynonymFilterFactory {synonyms=index_synonyms.txt, expand=true, ignoreCase=true}
  term position: 1, term text: macafee / mcafee (stacked), term type: word / word, source start,end: 0,6 / 0,6
Re: Question on Batch process
Charlie, How's this:

* -Xmx2g
* ramBufferSizeMB 512
* mergeFactor 10 (default, but you could up it to 20 or 30 if ulimit -n allows)
* ignore/delete maxBufferedDocs - not used if you set ramBufferSizeMB
* use StreamingUpdateSolrServer (with params matching your number of CPU cores), or send batches of, say, 1000 docs with the other SolrServer impl using N threads (N = # of your CPU cores)

Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message > From: Charles Wardell > To: solr-user@lucene.apache.org > Sent: Tue, April 26, 2011 2:32:29 PM > Subject: Question on Batch process > > I am sure that this question has been asked a few times, but I can't seem to >find the sweetspot for indexing. > > I have about 100,000 files each containing 1,000 xml documents ready to be >posted to Solr. My desire is to have it index as quickly as possible and then >once completed the daily stream of ADDs will be small in comparison. > > The individual documents are small. Essentially web postings from the net. >Title, postPostContent, date. > > > What would be the ideal configuration? For RamBufferSize, mergeFactor, >MaxbufferedDocs, etc.. > > My machine is a quad core hyper-threaded. So it shows up as 8 cpu's in TOP > I have 16GB of available ram. > > > Thanks in advance. > Charlie
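The multi-threaded batching Otis suggests can be sketched roughly as follows. This is not Solr API code: `post_batch` is a hypothetical stand-in for the real client call (StreamingUpdateSolrServer in SolrJ, or an HTTP POST to /update); only the chunking and threading logic is shown:

```python
from concurrent.futures import ThreadPoolExecutor

def chunked(docs, size=1000):
    """Yield successive batches of `size` docs."""
    for i in range(0, len(docs), size):
        yield docs[i:i + size]

def index_all(docs, post_batch, threads=8, batch_size=1000):
    """Send batches concurrently; returns the number of batches sent.
    post_batch is a hypothetical callback that posts one batch to Solr."""
    batches = list(chunked(docs, batch_size))
    with ThreadPoolExecutor(max_workers=threads) as pool:
        list(pool.map(post_batch, batches))  # drain the iterator to surface errors
    return len(batches)

# dry run with a collecting stub instead of a live Solr connection
sent = []
n = index_all(list(range(2500)), sent.append, threads=4)
```

With 2,500 dummy docs and the default batch size of 1000, this sends three batches (1000, 1000, 500); the thread count would be matched to the 8 CPUs Charlie mentions.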
Re: term position question from analyzer stack for WordDelimiterFilterFactory
Hi Robert, I'm no WDFF expert, but all these zeros look suspicious:

org.apache.solr.analysis.WordDelimiterFilterFactory {preserveOriginal=0, generateNumberParts=0, catenateWords=0, generateWordParts=0, catenateAll=0, catenateNumbers=0}

A quick visit to http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory makes me think you want:

splitOnCaseChange=1 (if you want Mc Afee for some reason?)
generateWordParts=1 (if you want Mc Afee for some reason?)
preserveOriginal=1

Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message > From: Robert Petersen > To: solr-user@lucene.apache.org; yo...@lucidimagination.com > Sent: Tue, April 26, 2011 4:39:49 PM > Subject: RE: term position question from analyzer stack for >WordDelimiterFilterFactory > > OK this is even more weird... everything is working much better except > for one thing: I was testing use cases with our top query terms to make > sure the below query settings wouldn't break any existing behavior, and > got this most unusual result. The analyzer stack completely eliminated > the word McAfee from the query terms! I'm like huh? 
Here is the > analyzer page output for that search term: > > Query Analyzer > org.apache.solr.analysis.WhitespaceTokenizerFactory {} > term position 1 > term text McAfee > term type word > source start,end 0,6 > payload > org.apache.solr.analysis.SynonymFilterFactory > {synonyms=query_synonyms.txt, expand=true, ignoreCase=true} > term position 1 > term text McAfee > term type word > source start,end 0,6 > payload > org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt, > ignoreCase=true} > term position 1 > term text McAfee > term type word > source start,end 0,6 > payload > org.apache.solr.analysis.WordDelimiterFilterFactory {preserveOriginal=0, > generateNumberParts=0, catenateWords=0, generateWordParts=0, > catenateAll=0, catenateNumbers=0} > term position > term text > term type > source start,end > payload > org.apache.solr.analysis.LowerCaseFilterFactory {} > term position > term text > term type > source start,end > payload > com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory > {protected=protwords.txt} > term position > term text > term type > source start,end > payload > org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {} > term position > term text > term type > source start,end > payload > > > > -Original Message- > From: Robert Petersen [mailto:rober...@buy.com] > Sent: Monday, April 25, 2011 11:27 AM > To: solr-user@lucene.apache.org; yo...@lucidimagination.com > Subject: RE: term position question from analyzer stack for > WordDelimiterFilterFactory > > Aha! I knew something must be awry, but when I looked at the analysis > page output, well it sure looked like it should match. :) > > OK here is the query side WDF that finally works, I just turned > everything off. (yay) First I tried just completely removeing WDF from > the query side analyzer stack but that didn't work. So anyway I suppose > I should turn off the catenate all plus the preserve original settings, > reindex, and see if I still get a match huh? 
(PS thank you very much > for the help!!!) > > generateWordParts="0" > generateNumberParts="0" > catenateWords="0" > catenateNumbers="0" > catenateAll="0" > preserveOriginal="0" > /> > > > > -Original Message- > From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik > Seeley > Sent: Monday, April 25, 2011 9:24 AM > To: solr-user@lucene.apache.org > Subject: Re: term position question from analyzer stack for > WordDelimiterFilterFactory > > On Mon, Apr 25, 2011 at 12:15 PM, Robert Petersen > wrote: > > The search and index analyzer stack are the same. > > Ahhh, they should not be! > Using both generate and catenate in WDF at query time is a no-no. > Same reason you can't have multi-word synonyms at query time: > http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.Synonym > FilterFactory > > I'd recommend going back to the WDF settings in the solr example > server as a starting point. > > > -Yonik > http://www.lucenerevolution.org -- Lucene/Solr User Conference, May > 25-26, San Francisco >
RE: term position question from analyzer stack for WordDelimiterFilterFactory
Yeah I am about to try turning one on at a time and see what happens. I had a meeting so couldn't do it yet... (darn those meetings) (lol) -Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: Tuesday, April 26, 2011 2:37 PM To: solr-user@lucene.apache.org Subject: Re: term position question from analyzer stack for WordDelimiterFilterFactory Hi Robert, I'm no WDFF expert, but all these zero look suspicious: org.apache.solr.analysis.WordDelimiterFilterFactory {preserveOriginal=0, generateNumberParts=0, catenateWords=0, generateWordParts=0, catenateAll=0, catenateNumbers=0} A quick visit to http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDel imiterFilterFactory makes me think you want: splitOnCaseChange=1 (if you want Mc Afee for some reason?) generateWordParts=1 (if you want Mc Afee for some reason?) preserveOriginal=1 Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message > From: Robert Petersen > To: solr-user@lucene.apache.org; yo...@lucidimagination.com > Sent: Tue, April 26, 2011 4:39:49 PM > Subject: RE: term position question from analyzer stack for >WordDelimiterFilterFactory > > OK this is even more weird... everything is working much better except > for one thing: I was testing use cases with our top query terms to make > sure the below query settings wouldn't break any existing behavior, and > got this most unusual result. The analyzer stack completely eliminated > the word McAfee from the query terms! I'm like huh? 
Here is the > analyzer page output for that search term: > > Query Analyzer > org.apache.solr.analysis.WhitespaceTokenizerFactory {} > term position 1 > term text McAfee > term type word > source start,end 0,6 > payload > org.apache.solr.analysis.SynonymFilterFactory > {synonyms=query_synonyms.txt, expand=true, ignoreCase=true} > term position 1 > term text McAfee > term type word > source start,end 0,6 > payload > org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt, > ignoreCase=true} > term position 1 > term text McAfee > term type word > source start,end 0,6 > payload > org.apache.solr.analysis.WordDelimiterFilterFactory {preserveOriginal=0, > generateNumberParts=0, catenateWords=0, generateWordParts=0, > catenateAll=0, catenateNumbers=0} > term position > term text > term type > source start,end > payload > org.apache.solr.analysis.LowerCaseFilterFactory {} > term position > term text > term type > source start,end > payload > com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory > {protected=protwords.txt} > term position > term text > term type > source start,end > payload > org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {} > term position > term text > term type > source start,end > payload > > > > -Original Message- > From: Robert Petersen [mailto:rober...@buy.com] > Sent: Monday, April 25, 2011 11:27 AM > To: solr-user@lucene.apache.org; yo...@lucidimagination.com > Subject: RE: term position question from analyzer stack for > WordDelimiterFilterFactory > > Aha! I knew something must be awry, but when I looked at the analysis > page output, well it sure looked like it should match. :) > > OK here is the query side WDF that finally works, I just turned > everything off. (yay) First I tried just completely removeing WDF from > the query side analyzer stack but that didn't work. So anyway I suppose > I should turn off the catenate all plus the preserve original settings, > reindex, and see if I still get a match huh? 
(PS thank you very much > for the help!!!) > > generateWordParts="0" > generateNumberParts="0" > catenateWords="0" > catenateNumbers="0" > catenateAll="0" > preserveOriginal="0" > /> > > > > -Original Message- > From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik > Seeley > Sent: Monday, April 25, 2011 9:24 AM > To: solr-user@lucene.apache.org > Subject: Re: term position question from analyzer stack for > WordDelimiterFilterFactory > > On Mon, Apr 25, 2011 at 12:15 PM, Robert Petersen > wrote: > > The search and index analyzer stack are the same. > > Ahhh, they should not be! > Using both generate and catenate in WDF at query time is a no-no. > Same reason you can't have multi-word synonyms at query time: > http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.Synonym > FilterFactory > > I'd recommend going back to the WDF settings in the solr example > server as a starting point. > > > -Yonik > http://www.lucenerevolution.org -- Lucene/Solr User Conference, May > 25-26, San Francisco >
Reader per query request
Hi, I was wondering: does Solr open a new Lucene IndexReader for every query request? From a performance point of view, is there any problem with opening a lot of IndexReaders concurrently, or should the application have some logic to reuse the same IndexReader? Thanks, cy -- View this message in context: http://lucene.472066.n3.nabble.com/Reader-per-query-request-tp2867778p2867778.html Sent from the Solr - User mailing list archive at Nabble.com.
Field Length and Highlight
Hi, I've been using Solr with ColdFusion 9. I've made a couple of adjustments to it in order to fulfill the needs of my client. I'm using Solr as a document search engine for an online library which has documents larger than 20MB, and some of them have more than 20 pages. The thing is that... at first Solr didn't index all the text; I already fixed that by changing the maxFieldLength number in the collections. Now, when I search for some word at the end of a document that has like 150 pages, it shows me the document but won't highlight the words that are almost at the end. Any ideas?
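For reference, two separate limits are involved here, and raising only the indexing one leaves the highlighter capped. A hedged solrconfig.xml sketch (values are illustrative, and hl.maxAnalyzedChars would go in the search handler's defaults or on the request itself):

```xml
<!-- indexing side: index the whole document instead of truncating -->
<maxFieldLength>2147483647</maxFieldLength>

<!-- highlighting side: the highlighter only analyzes the first
     hl.maxAnalyzedChars characters of a field, so matches near the end
     of very long documents won't be highlighted unless this is raised -->
<str name="hl.maxAnalyzedChars">2147483647</str>
```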
Re: SynonymFilterFactory case changes
Yes, order does matter. You're right, putting, say, lowercase in front of WordDelimiter... will mess up the operations of WDFF. The admin/analysis page is *extremely* useful for understanding what happens in the analysis of input. Make sure to check the "verbose" checkbox. Best Erick On Tue, Apr 26, 2011 at 5:10 PM, Robert Petersen wrote: > So if there is a hit in the synonym filter factory, do I need to put the > various case changes for a term so that the following > WordDelimiterFilter analyzer can do its 'split on case changes' work? > Here we see SynonymFilterFactory makes all terms lowercase because this > is what is in my synonmyms.txt file and I have ignoreCase=true: > "macafee, mcafee" > > Index Analyzer > org.apache.solr.analysis.WhitespaceTokenizerFactory {} > term position 1 > term text McAfee > term type word > source start,end 0,6 > payload > org.apache.solr.analysis.SynonymFilterFactory > {synonyms=index_synonyms.txt, expand=true, ignoreCase=true} > term position 1 > term text macafee > mcafee > term type word > word > source start,end 0,6 > 0,6 > payload > >
Re: term position question from analyzer stack for WordDelimiterFilterFactory
I second Otis' comments. Is it possible that you've gotten twisted around by trying to modify these settings and would be better off going back to the WDF settings in the example schema? I've sometimes found that to be very useful. Also (although I don't think it applies in this case) be aware that the analysis page may introduce its own errors, so when you see something really wonky, try a query with &debugQuery=on and see if the parsed query squares with the results on the analysis page... Best Erick On Tue, Apr 26, 2011 at 5:44 PM, Robert Petersen wrote: > Yeah I am about to try turning one on at a time and see what happens. I > had a meeting so couldn't do it yet... (darn those meetings) (lol) > > > -Original Message- > From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] > Sent: Tuesday, April 26, 2011 2:37 PM > To: solr-user@lucene.apache.org > Subject: Re: term position question from analyzer stack for > WordDelimiterFilterFactory > > Hi Robert, > > I'm no WDFF expert, but all these zero look suspicious: > > org.apache.solr.analysis.WordDelimiterFilterFactory {preserveOriginal=0, > generateNumberParts=0, catenateWords=0, generateWordParts=0, > catenateAll=0, catenateNumbers=0} > > A quick visit to > http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDel > imiterFilterFactory > makes me think you want: > > splitOnCaseChange=1 (if you want Mc Afee for some reason?) > generateWordParts=1 (if you want Mc Afee for some reason?) > preserveOriginal=1 > > > Otis > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch > Lucene ecosystem search :: http://search-lucene.com/ > > > > - Original Message >> From: Robert Petersen >> To: solr-user@lucene.apache.org; yo...@lucidimagination.com >> Sent: Tue, April 26, 2011 4:39:49 PM >> Subject: RE: term position question from analyzer stack for >>WordDelimiterFilterFactory >> >> OK this is even more weird... 
everything is working much better except >> for one thing: I was testing use cases with our top query terms to > make >> sure the below query settings wouldn't break any existing behavior, > and >> got this most unusual result. The analyzer stack completely > eliminated >> the word McAfee from the query terms! I'm like huh? Here is the >> analyzer page output for that search term: >> >> Query Analyzer >> org.apache.solr.analysis.WhitespaceTokenizerFactory {} >> term position 1 >> term text McAfee >> term type word >> source start,end 0,6 >> payload >> org.apache.solr.analysis.SynonymFilterFactory >> {synonyms=query_synonyms.txt, expand=true, ignoreCase=true} >> term position 1 >> term text McAfee >> term type word >> source start,end 0,6 >> payload >> org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt, >> ignoreCase=true} >> term position 1 >> term text McAfee >> term type word >> source start,end 0,6 >> payload >> org.apache.solr.analysis.WordDelimiterFilterFactory > {preserveOriginal=0, >> generateNumberParts=0, catenateWords=0, generateWordParts=0, >> catenateAll=0, catenateNumbers=0} >> term position >> term text >> term type >> source start,end >> payload >> org.apache.solr.analysis.LowerCaseFilterFactory {} >> term position >> term text >> term type >> source start,end >> payload >> com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory >> {protected=protwords.txt} >> term position >> term text >> term type >> source start,end >> payload >> org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {} >> term position >> term text >> term type >> source start,end >> payload >> >> >> >> -Original Message- >> From: Robert Petersen [mailto:rober...@buy.com] >> Sent: Monday, April 25, 2011 11:27 AM >> To: solr-user@lucene.apache.org; yo...@lucidimagination.com >> Subject: RE: term position question from analyzer stack for >> WordDelimiterFilterFactory >> >> Aha! 
I knew something must be awry, but when I looked at the > analysis >> page output, well it sure looked like it should match. :) >> >> OK here is the query side WDF that finally works, I just turned >> everything off. (yay) First I tried just completely removeing WDF > from >> the query side analyzer stack but that didn't work. So anyway I > suppose >> I should turn off the catenate all plus the preserve original > settings, >> reindex, and see if I still get a match huh? (PS thank you very > much >> for the help!!!) >> >> > generateWordParts="0" >> generateNumberParts="0" >> catenateWords="0" >> catenateNumbers="0" >> catenateAll="0" >> preserveOriginal="0" >> /> >> >> >> >> -Original Message- >> From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik >> Seeley >> Sent: Monday, April 25, 2011 9:24 AM >> To: solr-user@lucene.apache.org >> Subject: Re: term position qu
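The example-schema query-side WDF settings Erick points back to look roughly like this; this is recalled from the 1.4/3.1 example schema rather than quoted from it, so verify against your own distribution:

```xml
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1"
        catenateWords="0" catenateNumbers="0" catenateAll="0"
        splitOnCaseChange="1"/>
```

The key point from Yonik's reply is that the query side does not use the catenate options, avoiding the multi-token-at-query-time problem.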
Re: Reader per query request
See below On Tue, Apr 26, 2011 at 6:15 PM, cyang2010 wrote: > Hi, > > I was wondering if solr open a new lucene IndexReader for every query > request? > no, absolutely not. Solr only opens a reader when the underlying index has changed, say a commit or a replication happens. > From performance point of view, is there any problem of opening a lot of > IndexReaders concurrently, or application shall have some logic to reuse the > same IndexReader? Every time you open a reader, a whole new set of caches are initiated. I have a hard time imagining a situation in which opening a new searcher for each request would be a good idea. Opening a new reader, especially for a large index is a very expensive operation and should be done as rarely as possible. But Solr will do this automatically for you, by and large you don't have to think about it. Best Erick > > > Thanks, > > > cy > > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Reader-per-query-request-tp2867778p2867778.html > Sent from the Solr - User mailing list archive at Nabble.com. >
Re: Too many open files exception related to solrj getServer too often?
Just bumping the topic and looking for answers. -- View this message in context: http://lucene.472066.n3.nabble.com/Too-many-open-files-exception-related-to-solrj-getServer-too-often-tp2808718p2867976.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Reader per query request
Thanks a lot. That makes sense. -- CY -- View this message in context: http://lucene.472066.n3.nabble.com/Reader-per-query-request-tp2867778p2867995.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: SynonymFilterFactory case changes
But in this case lowercase is after WDF. The question is: when you get a hit in the SynonymFilter, and the entries in the synonyms.txt file are all lower case, do I need to add the case-changing versions to make WDF work on case changes? It appears the synonym text is replaced verbatim by what is in the txt file, and that defeats the WDF filter. In fact, adding the case-changing versions of this term to the synonyms.txt file makes this use case work. (yay) -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Tuesday, April 26, 2011 3:39 PM To: solr-user@lucene.apache.org Subject: Re: SynonymFilterFactory case changes Yes, order does matter. You're right, putting, say, lowercase in front of WordDelimiter... will mess up the operations of WDFF. The admin/analysis page is *extremely* useful for understanding what happens in the analysis of input. Make sure to check the "verbose" checkbox. Best Erick On Tue, Apr 26, 2011 at 5:10 PM, Robert Petersen wrote: > So if there is a hit in the synonym filter factory, do I need to put the > various case changes for a term so that the following > WordDelimiterFilter analyzer can do its 'split on case changes' work? > Here we see SynonymFilterFactory makes all terms lowercase because this > is what is in my synonyms.txt file and I have ignoreCase=true: > "macafee, mcafee" > > Index Analyzer > org.apache.solr.analysis.WhitespaceTokenizerFactory {} > term position 1 > term text McAfee > term type word > source start,end 0,6 > payload > org.apache.solr.analysis.SynonymFilterFactory > {synonyms=index_synonyms.txt, expand=true, ignoreCase=true} > term position 1 > term text macafee > mcafee > term type word > word > source start,end 0,6 > 0,6 > payload > >
Re: Field Length and Highlight
(11/04/27 7:35), Alejandro Delgadillo wrote: Hi, I've been using Solr with ColdFusion 9. I've made a couple of adjustments to it in order to fulfill the needs of my client. I'm using Solr as a document search engine for an online library which has documents larger than 20 MB, and some of them have more than 20 pages. The thing is that... at first Solr didn't index all the text; I already fixed that by changing the maxFieldLength number in the collections. Now when I search for some word at the end of a document that has like 150 pages, it shows me the document but won't highlight the words that are almost at the end. Any ideas? So your maxAnalyzedChars is too small? http://wiki.apache.org/solr/HighlightingParameters#hl.maxAnalyzedChars Koji -- http://www.rondhuit.com/en/
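Koji's pointer can be applied per request or, more conveniently, as a handler default. A sketch, assuming the body field is named `text` (the field name and the size value are illustrative; size the limit to the largest document you index):

```xml
<!-- solrconfig.xml: raise the highlighter's analysis window so terms
     near the end of very large documents are still highlighted. The
     default is far smaller than a 20 MB document. -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="hl">true</str>
    <str name="hl.fl">text</str>
    <str name="hl.maxAnalyzedChars">51200000</str>
  </lst>
</requestHandler>
```

The same parameter can also be passed ad hoc on the query string as `hl.maxAnalyzedChars`.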
Re: Question on Batch process
Thank you Otis. Without trying to appear too stupid: when you refer to having the params match my # of CPU cores, are you talking about the # of threads I can spawn with the StreamingUpdateSolrServer object? Up until now, I have been just utilizing post.sh or post.jar. Are these capable of that, or do I need to write some code to collect a bunch of files into the buffer and send it off? Also, do you have a sense for how long it should take to index 100,000 files, or in my case 100,000,000 documents? StreamingUpdateSolrServer public StreamingUpdateSolrServer(String solrServerUrl, int queueSize, int threadCount) throws MalformedURLException Thanks again, Charlie -- Best Regards, Charles Wardell Blue Chips Technology, Inc. www.bcsolution.com On Tuesday, April 26, 2011 at 5:12 PM, Otis Gospodnetic wrote: > Charlie, > > How's this: > * -Xmx2g > * ramBufferSizeMB 512 > * mergeFactor 10 (default, but you could up it to 20, 30, if ulimit -n allows) > * ignore/delete maxBufferedDocs - not used if you ran ramBufferSizeMB > * use SolrStreamingUpdateServer (with params matching your number of CPU > cores) > or send batches of say 1000 docs with the other SolrServer impl using N > threads > (N=# of your CPU cores) > > Otis > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch > Lucene ecosystem search :: http://search-lucene.com/ > > > > - Original Message > > From: Charles Wardell > > To: solr-user@lucene.apache.org > > Sent: Tue, April 26, 2011 2:32:29 PM > > Subject: Question on Batch process > > > > I am sure that this question has been asked a few times, but I can't seem > > to > > find the sweetspot for indexing. > > > > I have about 100,000 files each containing 1,000 xml documents ready to be > > posted to Solr. My desire is to have it index as quickly as possible and > > then > > once completed the daily stream of ADDs will be small in comparison. > > > > The individual documents are small. Essentially web postings from the net. 
> > Title, postPostContent, date. > > > > > > What would be the ideal configuration? For RamBufferSize, mergeFactor, > > MaxbufferedDocs, etc.. > > > > My machine is a quad core hyper-threaded. So it shows up as 8 cpu's in TOP > > I have 16GB of available ram. > > > > > > Thanks in advance. > > Charlie >
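Otis's tuning suggestions map onto the indexing section of solrconfig.xml roughly as follows. This is a sketch for Solr 1.4/3.x-era configs; the values are the ones from his reply, not numbers tuned for any particular hardware:

```xml
<!-- solrconfig.xml, <indexDefaults> (or <mainIndex>) section -->
<indexDefaults>
  <!-- flush the in-memory buffer at 512 MB; when ramBufferSizeMB is
       set, remove (or ignore) maxBufferedDocs entirely -->
  <ramBufferSizeMB>512</ramBufferSizeMB>
  <!-- 10 is the default; raising it to 20-30 means fewer, larger
       merges, but needs a higher "ulimit -n" for the extra open files -->
  <mergeFactor>10</mergeFactor>
</indexDefaults>
```

The threading advice (N threads, N = # of CPU cores) then corresponds to the `threadCount` argument of the StreamingUpdateSolrServer constructor quoted above; post.sh/post.jar are single-threaded convenience tools, so a SolrJ client is needed to exploit it.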
Re: SynonymFilterFactory case changes
Ahhh, I misread your post. First, it's not the SynonymFilterFactory that's lowercasing anything. The ignoreCase="true" affects the matching, not the output. The output is probably lowercased because you have it that way in the synonyms.txt file. At least that's what I just saw using the analysis page from the Solr admin page. So yes, if you want the WDF to do anything on tokens put into the input stream by SynonymFilterFactory, you need to make the replacement be the accurate case. But I think you already figured all that out. Best Erick On Tue, Apr 26, 2011 at 7:19 PM, Robert Petersen wrote: > But in this case lowercase is after WDF. The question is that when you get a > hit in the SynonymFilter on a synonym and where the entries in the synonyms.txt > file are all in lower case do I need to add the case changing versions to > make WDF work on case changes because it appears the synonym text is replaced > verbatim by what is in the txt file and so that defeats the WDF filter. In > fact, adding the case changing versions of this term to the synonyms.txt file > makes this use case work. (yay) > > -Original Message- > From: Erick Erickson [mailto:erickerick...@gmail.com] > Sent: Tuesday, April 26, 2011 3:39 PM > To: solr-user@lucene.apache.org > Subject: Re: SynonymFilterFactory case changes > > Yes, order does matter. You're right, putting, say, lowercase in front > of WordDelimiter... will mess up the operations of WDFF. > > The admin/analysis page is *extremely* useful for understanding what > happens in the analysis of input. Make sure to check the "verbose" > checkbox. > > Best > Erick > > On Tue, Apr 26, 2011 at 5:10 PM, Robert Petersen wrote: >> So if there is a hit in the synonym filter factory, do I need to put the >> various case changes for a term so that the following >> WordDelimiterFilter analyzer can do its 'split on case changes' work? 
>> Here we see SynonymFilterFactory makes all terms lowercase because this >> is what is in my synonyms.txt file and I have ignoreCase=true: >> "macafee, mcafee" >> >> Index Analyzer >> org.apache.solr.analysis.WhitespaceTokenizerFactory {} >> term position 1 >> term text McAfee >> term type word >> source start,end 0,6 >> payload >> org.apache.solr.analysis.SynonymFilterFactory >> {synonyms=index_synonyms.txt, expand=true, ignoreCase=true} >> term position 1 >> term text macafee >> mcafee >> term type word >> word >> source start,end 0,6 >> 0,6 >> payload >> >> >
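The upshot of this thread can be sketched as a synonyms file (the entries are illustrative, built from the "macafee, mcafee" example in the post): because SynonymFilterFactory emits the replacement tokens verbatim from the file, any case-changing variant that a downstream WordDelimiterFilterFactory is supposed to split on must be listed explicitly.

```text
# synonyms.txt sketch. ignoreCase=true only affects *matching*; the
# emitted token is exactly what appears here. To let a later
# WordDelimiterFilterFactory split on case changes, include the
# cased form as well as the lowercase forms:
macafee, mcafee, McAfee
```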
Suggester or spellcheck return stored fields
Hello all, I am trying to build an autocomplete solution for a website that I run. The current implementation is going to be used for choosing whom you want to send PMs to. I have it basically working up to this point: the UI is done, and the suggester is returning possible completions without any major problems. The problem I am currently running into is that the suggestions it returns are not necessarily unique. To solve this, I would like to return the user ID (a stored field) along with the suggestion. This would help in other areas as well, but would ensure things are unique. Is it possible to make the suggester return these other fields, or does it strictly return text, as I assume is the case? I know I am likely stretching what the suggester is supposed to do, so I am OK rolling back to a different plan using normal queries. But I would prefer to be able to use the suggester if possible. Thanks for the help, Cameron
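The Suggester component only returns terms, not stored fields. If Cameron falls back to the "normal queries" plan, one common pattern is an edge-ngram field queried through the regular search handler, which returns whole documents, stored user ID included. A sketch, with field-type and attribute values invented for illustration:

```xml
<!-- schema.xml: prefix-matching field type for autocomplete. Querying
     it with /select returns full documents, so a stored userId field
     comes back alongside each suggested name, making results unique. -->
<fieldType name="autocomplete" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```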
Re: How to Update Value of One Field of a Document in Index?
My schema: id, name, checksum, body, notes, date I'd like for a user to be able to add notes to the notes field, and not have to re-index the document (since the body field may contain 100MB of text). Some ideas: 1) How about creating another core which only contains id, checksum, and notes? Then, "updating" (delete followed by add) wouldn't be that painful? 2) What about using a multiValued field? Could you just keep adding values as the user enters more notes? Pete On Sep 9, 2010, at 11:06 PM, Liam O'Boyle wrote: > Hi Savannah, > > You can only reindex the entire document; if you only have the ID, > then do a search to retrieve the rest of the data, then reindex. This > assumes that all of the fields you need to index are stored (so that > you can retrieve them) and not just indexed. > > Liam > > On Fri, Sep 10, 2010 at 3:29 PM, Savannah Beckett > wrote: >> >> I use nutch to crawl and index to Solr. My code is working. Now, I want to >> update the value of one of the fields of a document in the solr index after >> the >> document was already indexed, and I have only the document id. How do I do >> that? >> >> Thanks. >> >> >>
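Liam's point, that an "update" is really a full re-add keyed on the uniqueKey, can be sketched as the XML actually posted to Solr (field names follow Pete's schema; the values are made up):

```xml
<!-- Re-adding a document whose uniqueKey (id) already exists replaces
     the whole document, so every field must be re-sent, not just the
     new note. A multiValued notes field just gets repeated elements. -->
<add>
  <doc>
    <field name="id">doc-42</field>
    <field name="name">example.pdf</field>
    <field name="checksum">abc123</field>
    <field name="body">full body text, re-sent unchanged</field>
    <field name="notes">first note</field>
    <field name="notes">second note added later</field>
    <field name="date">2010-09-10T00:00:00Z</field>
  </doc>
</add>
```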
Re: What initialize new searcher?
Thank you for the answers. I'm moving forward and have a few more questions, but for separate threads. On Tue, Apr 26, 2011 at 10:47 PM, Otis Gospodnetic wrote: > Hi, > > Yes, typically after your index has been replicated from master to a slave a > commit will be issued and the new searcher will be opened. Before being > exposed > to regular clients it's a good practice to warm things up. > > Otis > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch > Lucene ecosystem search :: http://search-lucene.com/ > > > > - Original Message >> From: Solr Beginner >> To: solr-user@lucene.apache.org >> Sent: Tue, April 26, 2011 8:50:21 AM >> Subject: What initialize new searcher? >> >> Hi, >> >> I'm reading solr cache documentation - >> http://wiki.apache.org/solr/SolrCaching I found there "The current >> Index Searcher serves requests and when a new searcher is opened...". >> Could you explain when new searcher is opened? Does it have something >> to do with index commit? >> >> Best Regards, >> Solr Beginner >> >
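The replicate-then-commit sequence Otis describes is the slave side of Solr's built-in ReplicationHandler; a minimal sketch (the master URL and poll interval are placeholders): after each successful poll that copies new index files, a commit is issued on the slave, which is exactly what triggers the new searcher.

```xml
<!-- solrconfig.xml on the slave core -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/core0/replication</str>
    <!-- check the master for a newer index generation once a minute -->
    <str name="pollInterval">00:01:00</str>
  </lst>
</requestHandler>
```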
fieldCache only on stats page
Hi, I can see only fieldCache (nothing about the filter, query or document caches) on the stats page. What am I doing wrong? We have two servers with replication. There are two cores (prod, dev) on each server. Maybe I have to add something to the solrconfig.xml of the cores? Best Regards, Solr Beginner
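The writer's own guess is likely right: filterCache, queryResultCache and documentCache only appear on the stats page when they are declared in each core's solrconfig.xml, whereas fieldCache is a Lucene-internal cache that shows up regardless. A sketch using the stock example sizes (the numbers are the shipped defaults, not a tuning recommendation):

```xml
<!-- solrconfig.xml, inside the <query> section of each core -->
<query>
  <filterCache class="solr.FastLRUCache"
               size="512" initialSize="512" autowarmCount="0"/>
  <queryResultCache class="solr.LRUCache"
                    size="512" initialSize="512" autowarmCount="0"/>
  <documentCache class="solr.LRUCache"
                 size="512" initialSize="512" autowarmCount="0"/>
</query>
```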
DataImportHandler in Solr 3.1.0: not updating dataimport.properties last_index_time on delta-import?
Title pretty much says it all; I've configured the DIH in 3.1.0, and it works great, except the delta-imports are always from the last time a full-import happened, not a delta-import. After a delta-import, dataimport.properties is completely untouched. The documentation implies that the delta-import should update the last_index_time: "The DataImportHandler exposes a variable called last_index_time which is a timestamp value denoting the last time full-import 'or' delta-import was run" - http://wiki.apache.org/solr/DataImportHandler#Delta-Import_Example Is there a configuration preventing delta-import from updating dataimport.properties? It updates properly on each full-import.
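For reference, the delta setup that consumes last_index_time looks roughly like the fragment below (table and column names are invented for illustration; whether dataimport.properties is rewritten after a delta-import, the open question of this post, does not change the config shape):

```xml
<!-- DIH data-config.xml: deltaQuery selects the keys of rows changed
     since the timestamp DIH recorded in dataimport.properties, and
     deltaImportQuery fetches each changed row by that key. -->
<entity name="item" pk="id"
        query="SELECT * FROM item"
        deltaQuery="SELECT id FROM item
                    WHERE updated_at &gt; '${dataimporter.last_index_time}'"
        deltaImportQuery="SELECT * FROM item
                          WHERE id = '${dataimporter.delta.item.id}'"/>
```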
Re: Ebay Kleinanzeigen and Auto Suggest
Thanks for the links Otis, I will have a look. Regards Ericz On Tue, Apr 26, 2011 at 10:06 PM, Otis Gospodnetic < otis_gospodne...@yahoo.com> wrote: > Hi Eric, > > Before using the terms component, allow me to point out: > > * http://sematext.com/products/autocomplete/index.html (used on > http://search-lucene.com/ for example) > > * http://wiki.apache.org/solr/Suggester > > > Otis > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch > Lucene ecosystem search :: http://search-lucene.com/ > > > > - Original Message > > From: Eric Grobler > > To: solr-user@lucene.apache.org > > Sent: Tue, April 26, 2011 1:11:11 PM > > Subject: Ebay Kleinanzeigen and Auto Suggest > > > > Hi > > > > Someone told me that ebay is using solr. > > I was looking at their Auto Suggest implementation and I guess they are > > using Shingles and the TermsComponent. > > > > I managed to get a satisfactory implementation but I have a problem with > > category specific filtering. > > Ebay suggestions are sensitive to categories like Cars and Pets. > > > > As far as I understand it is not possible to use filters with a terms > > query. > > Unless one uses multiple fields or special prefixes for the words to index, I > > cannot think how to implement this. > > > > Is there perhaps a workaround for this limitation? > > > > Best Regards > > EricZ > > > > --- > > > > I have a shingle type like: > > > > <fieldType ... positionIncrementGap="100"> <analyzer> ... <filter class="solr.ShingleFilterFactory" maxShingleSize="4" /> ... </analyzer> </fieldType> > > > > and a query like > > http://localhost:8983/solr/terms?q=*%3A*&terms.fl=suggest_text&terms.sort=count&terms.prefix=audi > >
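Eric's own workaround idea, prefixing the indexed suggestion terms with a category token so that terms.prefix does the filtering, can be sketched as follows (the field name `suggest_text` comes from his post; the category codes and separator are invented for illustration):

```text
# Index each suggestion into suggest_text with its category prepended,
# e.g. "cars|audi a4", "pets|aquarium pump". A category-scoped
# suggestion request then becomes an ordinary TermsComponent prefix
# query; the client strips the "cars|" prefix before display:
http://localhost:8983/solr/terms?terms.fl=suggest_text&terms.sort=count&terms.prefix=cars|audi
```

This keeps a single field and a single request per keystroke, at the cost of indexing each suggestion once per category it belongs to.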