Re: Re: How to properly use Levenstein distance with ~ in Java

2014-10-22 Thread karsten-solr
Hi Aleksander, the fuzzy search operator '~' is not supported in dismax (defType=dismax): https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser You are using the SearchComponent "spellchecker". This does not change the query results. btw: It looks like you are using path "/select" wit

Re: Can Solr handle large text files?

2011-10-21 Thread karsten-solr
Hi Peter, highlighting in large text files cannot be fast without dividing the original text into small pieces. So take a look at http://xtf.cdlib.org/documentation/under-the-hood/#Chunking and at http://www.lucidimagination.com/blog/2010/09/16/2446/ Which means that you should divide your files a

Re: data-import problem

2011-10-24 Thread karsten-solr
Hi Radha Krishna, try the command "full-import" instead of "fullimport"; see http://wiki.apache.org/solr/DataImportHandler#Commands Best regards Karsten Original message > Date: Mon, 24 Oct 2011 11:10:22 +0530 > From: Radha Krishna Reddy > To: solr-user@lucene.apache.org > Bet

Re: indexing key value pair into lucene solr index

2011-10-24 Thread karsten-solr
Hi Jame, you can - generate one token for each pair (key, value) --> key_value - insert a gap between each pair and use phrase queries - use key as field-name (if you have a restricted set of keys) - wait for joins in Solr 4.0 (http://wiki.apache.org/solr/Join) - use position or payloads to co
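A minimal sketch of the first option (one token per pair), in plain Java without any Lucene dependency. The class and method names are hypothetical; only the key_value form comes from the message, and the sketch assumes that "_" does not occur inside keys.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of Solr or Lucene.
class KeyValueTokens {

    // Joins each (key, value) pair into a single token of the form key_value.
    static List<String> toTokens(Map<String, String> pairs) {
        List<String> tokens = new ArrayList<>();
        for (Map.Entry<String, String> pair : pairs.entrySet()) {
            tokens.add(pair.getKey() + "_" + pair.getValue());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the insertion order of the pairs.
        Map<String, String> pairs = new LinkedHashMap<>();
        pairs.put("color", "red");
        pairs.put("size", "xl");
        System.out.println(toTokens(pairs)); // [color_red, size_xl]
    }
}
```

Each token could then be indexed as one term in a non-tokenized field, so that a term query for color_red matches only documents that contain exactly this pair.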

Re: indexing key value pair into lucene solr index

2011-10-24 Thread karsten-solr
Hi Jame, preserve order in index fields: if you don't want to use phrase queries in key or value this order is "position". If you use phrase queries but no value has more than 50 tokens you could also use position and start each pair with position 100, 200, 300 ... Otherwise you could use paylo
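The position scheme above can be sketched in plain Java. This only computes the absolute positions; in a real analysis chain the same effect would come from position increments. The class and method names are hypothetical, and the gap of 100 is taken from the "100, 200, 300" in the message.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of Solr or Lucene.
class PairPositions {

    // Distance between the start positions of two consecutive pairs.
    static final int PAIR_GAP = 100;

    // Returns the absolute position of every token. Pair i (0-based) starts
    // at (i + 1) * PAIR_GAP, so a phrase query cannot match across two pairs
    // as long as every pair has fewer than PAIR_GAP tokens.
    static List<Integer> positions(List<List<String>> pairs) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < pairs.size(); i++) {
            int start = (i + 1) * PAIR_GAP;
            for (int j = 0; j < pairs.get(i).size(); j++) {
                result.add(start + j);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> pairs = Arrays.asList(
                Arrays.asList("color", "dark", "red"),
                Arrays.asList("size", "xl"));
        System.out.println(positions(pairs)); // [100, 101, 102, 200, 201]
    }
}
```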

Re: Limit by score? sort by other field

2011-10-27 Thread karsten-solr
Hi Robert, take a look at http://lucene.472066.n3.nabble.com/How-to-cut-off-hits-with-score-below-threshold-td3219064.html#a3219117 and http://lucene.472066.n3.nabble.com/Filter-by-relevance-td1837486.html So will sort=date+desc&q={!frange l=0.85}query($qq) qq= help? Best regards Karsten --
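A small sketch of how the parameters from the message could be assembled and URL-encoded on the client side, using only the JDK. The class and method names are hypothetical, and the user query is left as an argument because the message does not give one; the threshold 0.85 and the sort on "date" are the values from the message.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for illustration; not part of SolrJ.
class FrangeQueryParams {

    // Builds "sort=...&q=...&qq=..." so that only documents whose score for
    // the user query is at least minScore are returned, sorted by date.
    static String build(String userQuery, double minScore) {
        String q = "{!frange l=" + minScore + "}query($qq)";
        return "sort=" + encode("date desc")
                + "&q=" + encode(q)
                + "&qq=" + encode(userQuery);
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The query string passed here is only a placeholder.
        System.out.println(build(args.length > 0 ? args[0] : "solr", 0.85));
    }
}
```

Note that scores are not normalized across queries, so a fixed threshold such as 0.85 may behave very differently from one query to the next.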

Re: [Profiling] How to profile/tune Solr server

2011-11-04 Thread karsten-solr
Hi Spark, in 2009 there was a monitor from lucidimagination: http://www.lucidimagination.com/about/news/releases/lucid-imagination-releases-performance-monitoring-utility-open-source-apache-lucene A colleague of mine calls the sematext-monitor "trojan" because "SPM phone home": "Easy in, easy out -

Re: Unable to determine why query won't return results

2011-11-10 Thread karsten-solr
Hi Kurt, I took your fieldtype definition and could not reproduce your problem with Solr 3.4. But I think you have a problem with the ampersand in "A. J. Johnson & Co." Two comments: In your analysis html-example there is a gap of two positions between Johnson and Co. This must not be ("A. J.

sending a parsed query to solr (xml-query-parser, syntaxtree)

2011-03-21 Thread karsten-solr
Hi, I am working on a migration from verity k2 to solr. At this point I have a parser for the Verity Query Language (the subset we use) which generates a syntax tree. I translate this into a couple of filters and one query. This fragmentation is the reason why I cannot use my parser inside solr

Solr without Server / Search solutions with Solr on DVD (examples?)

2011-04-07 Thread karsten-solr
Hi folks, we want to migrate our search portal to Solr. But some of our customers search our information offline with a DVD version. So we want to estimate the complexity of a Solr DVD version. This means trimming Solr to work on small computers with the opposite of heavy loads. So no server-o

Re: Solr without Server / Search solutions with Solr on DVD (examples?)

2011-04-07 Thread karsten-solr
Hi Ezequiel, in Solr the performance of sorting and faceted search is mainly a question of main memory. E.g. Mike McCandless wrote in s.apache.org/OWK that sorting 5M Wikipedia documents by the title field needs 674 MB of RAM. But again: My main interest is an example of other companies/product wh

DIH: Enhance XPathRecordReader to deal with //body(FLATTEN=true) and //body/h1

2011-04-09 Thread karsten-solr
Hi Folks, has anyone improved the DIH XPathRecordReader to deal with nested xpaths? e.g. data-config.xml with and the XML stream contains /html/body/h1... will only fill field “alltext” but field “title” will be empty. This is a known issue from 2009 https://issues.apache.org/jira/browse/SOLR-

Re: DIH: Enhance XPathRecordReader to deal with //body(FLATTEN=true) and //body/h1

2011-04-11 Thread karsten-solr
Hi Lance, you are right: XPathEntityProcessor has the attribute "xsl", so I can use XSLT to generate an XML file "in the form of the standard Solr update schema". I will check the performance of this. Best regards Karsten btw. "flatten" is an attribute of the "field"-Tag, not of XPathEntityP

Re: DIH: Enhance XPathRecordReader to deal with //body(FLATTEN=true) and //body/h1

2011-04-11 Thread karsten-solr
Hi Lance, I used XPathEntityProcessor with the attribute "xsl" and generated an XML file "in the form of the standard Solr update schema". I lost a lot of performance; it is a pity that XPathEntityProcessor only uses one thread. My tests with a collection of 350k documents: 1. use of XPathRecordRea

Re: RE: Indexing Question for large dataset

2011-04-14 Thread karsten-solr
Hi Joshua, what is the use case? Do you need only the facets for one field (for each query)? Do you need all facet values or only the first 10 in .sort=index (FACET_SORT_INDEX / numeric order) / in .sort=count (FACET_SORT_COUNT)? How many different facet values do you have per field? Do you only

Re: Query on Synonyms feature in Solr

2011-06-13 Thread karsten-solr
Hi rajini, multi-word synonyms like "private schools" usually cause problems. See e.g. Solr-1-4-Enterprise-Search-Server Page 56: "For multi-word synonyms to work, the analysis must be applied at index-time and with expansion so that both the original words and the combined word get indexed. ..."

Re: AndQueryNode to NearSpanQuery

2011-06-14 Thread karsten-solr
Hi member of digitalsmiths, I also implemented SpanNearQueryNode and some QueryNodeProcessors. Most likely you can solve your problem by using QueryNode#setTag: In QueryNodeProcessor#preProcessNode you can set, remove and reset a tag to mark the AndNodes that should become SpanNodes; after t

Re: Showing facet of first N docs

2011-06-16 Thread karsten-solr
Hi Tommaso, the FacetComponent works with the DocListAndSet#docSet. It should be easy to switch to DocListAndSet#docList, which contains all documents of the result list (default: top 10, but possibly 15-25 if start=15, rows=11). This means changing the source code. Instead of changing the sour

Re: Solr Configuration with 404 error

2011-07-11 Thread karsten-solr
Hi rocco, you did not stop jetty after your first attempt. (You have to kill the task.) Best regards Karsten btw: How to change the port 8983: http://lucene.472066.n3.nabble.com/How-to-change-a-port-td490375.html Original message > Date: Sun, 10 Jul 2011 20:11:54 -0700 (P

solr-user@lucene.apache.org

2011-08-01 Thread karsten-solr
Hi lucene/solr-folk, Issue: Our documents are stable except for two fields which are used for linking between the docs. So we would like to update these two fields in a batch once a month (possibly once a week). We cannot reindex all docs once a month, because we are using XeLDA in some fields for s

Re: xpath expression not working

2011-08-02 Thread karsten-solr
Hi abhayd, XPathEntityProcessor only supports a subset of xpath, like div[@id="2"] but not [id=2]. Take a look at https://issues.apache.org/jira/browse/SOLR-1437#commentauthor_12756469_verbose I solve this problem by using XSLT as a preprocessor (with full xpath). The drawback is performance "wa

Re: Store complete XML record (DIH & XPathEntityProcessor)

2011-08-02 Thread karsten-solr
Hi g, Hi Chantal I had the same problem. You can use XPathEntityProcessor but you have to insert an xsl. The drawback is performance "wasting": See http://lucene.472066.n3.nabble.com/DIH-Enhance-XPathRecordReader-to-deal-with-body-FLATTEN-true-and-body-h1-td2799005.html Best regards Karsten -

Re: Matching queries on a per-element basis against a multivalued field

2011-08-02 Thread karsten-solr
Hi Suk-Hyun Cho, if "myFriend" is the unit of retrieval you should use this as lucene document with the fields "isCool" "gender" "bloodType" ... if you really want to insert all "myFriends" in one field like your myFriends = [ "isCool=true SOME_JUNK_HERE gender=female bloodType=O", "isCoo

Re: How to cut off hits with score below threshold?

2011-08-02 Thread karsten-solr
Hi Otis, is this the same question as http://lucene.472066.n3.nabble.com/Filter-by-relevance-td1837486.html ? If yes, perhaps something like (http://search-lucene.com/m/4AHNF17wIJW1/) q={!frange l=0.85}query($qq) qq= will help? (BTW, I also would like to specify a custom Collector via API in Sol

solr-user@lucene.apache.org

2011-08-03 Thread karsten-solr
Hi Erick, our two "changeable" fields are used for linking between documents on application level. From Lucene's point of view they are just two searchable fields with a stored term vector for one of them. Our queries will use one of these fields and a couple of fields from the "stable" fields. So

solr-user@lucene.apache.org

2011-08-04 Thread karsten-solr
Hi Erick, thanks a lot! This looks like a good idea: Our queries with the "changeable" fields fit the join idea from https://issues.apache.org/jira/browse/SOLR-2272 because - we do not need relevance ranking - we can separate in a conjunction of a query with the "changeable" fields and our oth

Re: string cut-off filter?

2011-08-08 Thread karsten-solr
Hi Bernd, I also searched for such a filter but did not find it. Best regards Karsten P.S. I am now using this filter: public class CutMaxLengthFilter extends TokenFilter { public CutMaxLengthFilter(TokenStream in) { this(in, DEFAULT_MAXLENGTH); } pu
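The filter code is cut off in this preview. As a rough sketch only, and assuming from the class name that the filter shortens every term to a maximum length, the core operation could look like this in plain Java. The class and method names are hypothetical, and nothing here is taken from the actual CutMaxLengthFilter beyond its name.

```java
// Hypothetical stand-alone sketch of the cut-off step; the real filter
// would apply this to each term inside a Lucene TokenFilter.
class CutMaxLength {

    // Returns the term unchanged if it is short enough, otherwise its
    // first maxLength chars. Counts UTF-16 chars, so a surrogate pair at
    // the boundary would be split; a production filter may want to avoid that.
    static String cut(String term, int maxLength) {
        if (maxLength < 0) {
            throw new IllegalArgumentException("maxLength must not be negative");
        }
        return term.length() <= maxLength ? term : term.substring(0, maxLength);
    }

    public static void main(String[] args) {
        System.out.println(cut("supercalifragilistic", 5)); // super
        System.out.println(cut("solr", 5));                 // solr
    }
}
```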

Re: Migration from Autonomy IDOL to SOLR

2011-08-16 Thread karsten-solr
Hi Arcadius, currently we have a migration project from verity k2 search server to solr. I do not know IDOL, but autonomy bought verity before IDOL was released, so possibly it is comparable? verity k2 works directly on XML files; as a result the query syntax is a little bit like xpath e.g. with "

Re: DataImport using last_indexed_id or getting max(id) quickly

2012-07-11 Thread karsten-solr
Hi Avenka, *DataImportHandler* 1.) there is no configuration to add the last uniqueKeyField-Values to dataimport.properties 2.) you can use LogUpdateProcessor to log all "schema.printableUniqueKey(doc)" to log.info( ""+toLog + " 0 " + (elapsed) ) 3.) you can write your own LogUpdateProcessor to

Re: DataImport using last_indexed_id or getting max(id) quickly

2012-07-12 Thread karsten-solr
Hi Avenka, you asked for a HowTo to add a field "inverseID" which allows you to calculate max(id) from its first term: If you do not use solr you have to calculate "1 - id" and store it in an extra field "inverseID". If you fill solr with your own code, add a TrieLongField "inverseID" and fi
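The idea behind "inverseID" can be illustrated in plain Java. A TreeSet stands in for the sorted term dictionary of the field, which is an assumption of this sketch: in a real index the term order depends on how the field encodes numbers. The class and method names are hypothetical; the formula "1 - id" is the one from the message.

```java
import java.util.TreeSet;

// Hypothetical illustration; not Solr or Lucene code.
class InverseIdDemo {

    // Value stored in the extra field: the largest id gets the smallest value.
    static long inverse(long id) {
        return 1 - id;
    }

    // The first (smallest) term of inverseID belongs to the largest id,
    // so max(id) is recovered by applying the same formula again.
    static long maxIdFromFirstTerm(TreeSet<Long> inverseTerms) {
        return 1 - inverseTerms.first();
    }

    public static void main(String[] args) {
        TreeSet<Long> inverseTerms = new TreeSet<>();
        for (long id : new long[] {3, 42, 17}) {
            inverseTerms.add(inverse(id));
        }
        System.out.println(maxIdFromFirstTerm(inverseTerms)); // 42
    }
}
```

Reading the first term is cheap compared with sorting all documents by id, which is the point of storing the inverted value.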

Re: NRT and multi-value facet - what is Solr's limit?

2012-07-12 Thread karsten-solr
Hi Andy, as long as the cache for faceting is not per segment there is no NRT together with faceting. This is what Jason told you in http://lucene.472066.n3.nabble.com/Nrt-and-caching-td3993612.html and I agree. Possibly you could use multicore. Best regards Karsten Original

Re: Nrt and caching

2012-07-12 Thread karsten-solr
Hi Andy, Multi-value faceting is a special case of taxonomy. So it is covered by the "org.apache.lucene.facet" package (lucene/facet). This is not per segment but works without "per IndexSearcher" cache. So imho the taxonomy faceting will work with NRT. Because of the new TermsEnum#ord() Method

Re: Best way to index Solr XML from w/in the same servlet container

2012-09-18 Thread karsten-solr
Hi Jay, I would like to see the Zookeeper Watcher as part of DIH in solr. Possibly you could extend org.apache.solr.handler.dataimport.DataSource. If you want to call solr without http you can use SolrJ: org.apache.solr.client.solrj.embedded.EmbeddedSolrServer Best regards Karsten