Re: Tokenize Sentence and Set Attribute

2013-05-08, Edward Garrett
I find UpdateRequestProcessors (http://wiki.apache.org/solr/UpdateRequestProcessor) a handy way to add NLP-related fields to a document, or remove them, as Solr processes it. This is also how UIMA integrates with Solr (http://wiki.apache.org/solr/SolrUIMA). You might want to take a look at UIMA as …
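As a rough illustration of the processor-chain idea, here is a minimal Python sketch that models a document as a plain dict. It is a conceptual analogue only, not Solr's Java UpdateRequestProcessor API. The two processors, the field names (`text`, `raw`, `tokens`, `token_count`) and the whitespace split standing in for a real NLP step are all hypothetical.

```python
from typing import Callable, Dict, List

# A "document" here is just a field-name -> value mapping. This is a
# stand-in for illustration, NOT Solr's Java API.
Doc = Dict[str, object]
Processor = Callable[[Doc], Doc]


def add_token_fields(doc: Doc) -> Doc:
    """Hypothetical NLP step: add a token list and a token count."""
    tokens = str(doc.get("text", "")).split()  # placeholder for a real tokenizer
    doc["tokens"] = tokens
    doc["token_count"] = len(tokens)
    return doc


def drop_raw_field(doc: Doc) -> Doc:
    """Hypothetical cleanup step: remove a field that is no longer needed."""
    doc.pop("raw", None)
    return doc


def run_chain(doc: Doc, chain: List[Processor]) -> Doc:
    # Each processor receives the document as left by the previous one,
    # so fields can be added early in the chain and removed later.
    for process in chain:
        doc = process(doc)
    return doc


doc = run_chain(
    {"id": "1", "text": "the cat sat", "raw": "<p>the cat sat</p>"},
    [add_token_fields, drop_raw_field],
)
print(doc)
# {'id': '1', 'text': 'the cat sat', 'tokens': ['the', 'cat', 'sat'], 'token_count': 3}
```

Keeping each step as a separate processor means an NLP stage can be swapped or reordered without touching the others, which is the property that makes the chain approach convenient.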

Re: indexing Text file in solr

2013-01-29, Edward Garrett
I don't have experience with this, but it looks like you could use the LineEntityProcessor from DIH: http://wiki.apache.org/solr/DataImportHandler#LineEntityProcessor

On Sun, Jan 27, 2013 at 10:23 AM, hadyelsahar wrote:
> i have a large Arabic Text File that contains Tweets each line contains one
> tweet , that i want …
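A minimal Python sketch of the one-document-per-line idea, to show the shape of the output before configuring DIH. It assumes a UTF-8 text file with one tweet per line. The `rawLine` field name is, as far as I recall, the one LineEntityProcessor emits; the `id` scheme, the blank-line skipping and the file name `tweets.txt` are hypothetical choices for this sketch, which does not talk to Solr at all.

```python
import io
from typing import Dict, Iterator, TextIO


def lines_to_docs(lines: TextIO) -> Iterator[Dict[str, str]]:
    """Yield one document per non-blank line.

    `rawLine` mirrors the field name DIH's LineEntityProcessor is
    assumed to use; the `id` scheme is hypothetical.
    """
    for line_no, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if not text.strip():
            continue  # this sketch skips blank lines
        yield {"id": f"line-{line_no}", "rawLine": text}


# For a real file: open("tweets.txt", encoding="utf-8")  (hypothetical name)
sample = io.StringIO("first tweet\n\nsecond tweet\n")
docs = list(lines_to_docs(sample))
print(docs)
# [{'id': 'line-1', 'rawLine': 'first tweet'}, {'id': 'line-3', 'rawLine': 'second tweet'}]
```

Opening the file with an explicit `encoding="utf-8"` matters for Arabic text, since the platform default encoding may differ.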

Re: Calculate a sum.

2013-01-14, Edward Garrett
I've had perfectly fine performance with StatsComponent, but I have only tested it with 50,000 documents. For example, I have a field syllables and a numeric field syllables_count, and I sum syllables_count over the documents matching any search query. How many documents are you working with?

On Mon, Jan 14, 2013 at 10:54 AM, …
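A minimal Python sketch of such a request, assuming a Solr instance at the hypothetical endpoint `http://localhost:8983/solr/select` and the `syllables_count` field above. `stats=true` and `stats.field` are StatsComponent parameters, and `rows=0` skips returning the matching documents. The response path `stats.stats_fields.<field>.sum` is the usual shape of the JSON output, but it is worth checking against your Solr version. The query `syllables:ka` is only an example.

```python
from urllib.parse import urlencode

# Hypothetical local Solr endpoint; adjust host, port and core to your setup.
SOLR_SELECT = "http://localhost:8983/solr/select"


def stats_sum_url(query: str, field: str = "syllables_count") -> str:
    """Build a select URL asking StatsComponent for statistics on `field`."""
    params = {
        "q": query,
        "rows": "0",           # only the stats are needed, not the documents
        "stats": "true",       # enable StatsComponent
        "stats.field": field,  # numeric field to aggregate
        "wt": "json",
    }
    return f"{SOLR_SELECT}?{urlencode(params)}"


def extract_sum(response: dict, field: str = "syllables_count") -> float:
    """Read the sum from an already-parsed JSON response."""
    return response["stats"]["stats_fields"][field]["sum"]


print(stats_sum_url("syllables:ka"))
# http://localhost:8983/solr/select?q=syllables%3Aka&rows=0&stats=true&stats.field=syllables_count&wt=json
```

The same response block also carries other statistics such as min, max and count, so one request can serve several aggregate views of the search results.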

Re: get a list of terms sorted by total term frequency

2012-11-07, Edward Garrett
> … HighFreqTerms tool.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Wed, Nov 7, 2012 at 1:15 PM, Edward Garrett wrote:
>> hi,
>>
>> is there a simple way to get a list of all terms that occur in a field
>> sorted by their total …

get a list of terms sorted by total term frequency

2012-11-07, Edward Garrett
Hi,

Is there a simple way to get a list of all terms that occur in a field, sorted by their total term frequency within that field? TermsComponent (http://wiki.apache.org/solr/TermsComponent) "provides fast field faceting over the whole index", but as counts it gives the number of documents that e…
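To make the distinction concrete, here is a minimal Python sketch over a toy in-memory corpus (not a Solr or Lucene API). Document frequency counts each document once per term, which is what TermsComponent reports, while total term frequency counts every occurrence. Lowercased whitespace splitting is an assumed stand-in for the field's real analyzer.

```python
from collections import Counter
from typing import List, Tuple


def term_stats(texts: List[str]) -> Tuple[Counter, Counter]:
    """Return (document frequency, total term frequency) for every term."""
    doc_freq: Counter = Counter()
    total_freq: Counter = Counter()
    for text in texts:
        tokens = text.lower().split()   # stand-in for a real analyzer
        total_freq.update(tokens)       # every occurrence counts
        doc_freq.update(set(tokens))    # each document counts once per term
    return doc_freq, total_freq


corpus = ["a b a", "a c", "b b b"]
df, ttf = term_stats(corpus)
# Terms sorted by total term frequency, ties broken alphabetically.
print(sorted(ttf.items(), key=lambda kv: (-kv[1], kv[0])))  # [('b', 4), ('a', 3), ('c', 1)]
print(df["b"], ttf["b"])  # 2 4
```

Here `a` and `b` tie on document frequency (2 each), yet `b` ranks first by total term frequency, which is exactly the ordering the question asks for.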

Re: How to tell the highlighter not to escape?

2007-01-04, Edward Garrett
Just to add a note on this: the whole idea of inserting "pseudo-markup" into XML text elements seems to be pretty much in disrepute, and it certainly caused many complaints about RSS 1.0; see, e.g., http://www.biglist.com/lists/xsl-list/archives/200505/msg00316.html

In XSL, you **can** use disable-ou…
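To see why escaped highlight tags are awkward for XSL, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree` on two hypothetical `<str>` snippets, one with escaped tags and one with real child elements. To a parser, the escaped `<em>` is just characters inside a text node, so a stylesheet cannot match it with a template the way it would match a real element.

```python
import xml.etree.ElementTree as ET

# Escaped pseudo-markup: the highlight tags are plain characters in the text.
escaped = ET.fromstring("<str>a &lt;em&gt;hit&lt;/em&gt; here</str>")

# Real markup: the highlight tag is a child element with its own text/tail.
real = ET.fromstring("<str>a <em>hit</em> here</str>")

print(repr(escaped.text))  # 'a <em>hit</em> here'
print(len(escaped))        # 0  (no child elements at all)
print((real.text, real[0].tag, real[0].text, real[0].tail))
# ('a ', 'em', 'hit', ' here')
```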

Re: How to tell the highlighter not to escape?

2007-01-03, Edward Garrett
…y for me so far.

On 1/3/07, Thorsten Scherler <[EMAIL PROTECTED]> wrote:
On Wed, 2007-01-03 at 02:16 +, Edward Garrett wrote:
> thorsten,
>
> see the following for discussion. your ca…

Re: How to tell the highlighter not to escape?

2007-01-02, Edward Garrett
> …o NOT escape the hl.simple.pre and hl.simple.post tag
> since it is horror to work with cdata sections in xsl.
>
> I had a look in the lucene highlighter and it seem that it does not
> escape the tags.
>
> Can somebody point me to code which is responsible for escaping and
> …

highlighting phrasal hits

2006-12-11, Edward Garrett
…ly tested the above against indexed English data, so it's possible that it's an artifact of the data and analysis procedures I am using.

--
Edward Garrett
Visiting Fellow (2006-07)
Endangered Languages Academic Programme
School of Oriental and African Studies
London, UK
0207 898 4536
Assis…