Re: Solr and UIMA

2009-07-24 Thread JCodina
On Jul 21, 2009, at 11:57 AM, JCodina wrote: Let me synthesize: we (well, I think Grant?) make changes in the DPTFF (DelimitedPayloadTokenFilterFactory) so that it is able to index, at the same position, different tokens that may have payloads. 1. token delimiter (#) 2. payload delimiter (|) We
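A minimal sketch of the kind of chain involved, using the stock Lucene 2.9-era DelimitedPayloadTokenFilter; the input text and encoder are illustrative assumptions, and the '#' same-position extension discussed above is only indicated in a comment:

    import java.io.StringReader;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.WhitespaceTokenizer;
    import org.apache.lucene.analysis.payloads.DelimitedPayloadTokenFilter;
    import org.apache.lucene.analysis.payloads.IdentityEncoder;
    import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;
    import org.apache.lucene.analysis.tokenattributes.TermAttribute;

    public class PayloadChainSketch {
      public static void main(String[] args) throws Exception {
        // Each token carries its payload after '|', e.g. a word plus a POS tag.
        TokenStream ts = new WhitespaceTokenizer(new StringReader("casa|NN blanca|JJ"));
        // Strips the "|payload" suffix and stores the bytes after '|' as the payload.
        ts = new DelimitedPayloadTokenFilter(ts, '|', new IdentityEncoder());
        // The proposed change would additionally split on '#' and emit the extra
        // tokens with positionIncrement = 0, i.e. at the same position.
        TermAttribute term = ts.addAttribute(TermAttribute.class);
        PayloadAttribute payload = ts.addAttribute(PayloadAttribute.class);
        while (ts.incrementToken()) {
          System.out.println(term.term() + " -> " + new String(payload.getPayload().getData()));
        }
      }
    }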

Build Solr to run SolrJS

2008-11-16 Thread JCodina
I downloaded solr/trunk and built it; everything seems to work except that the VelocityResponseWriter is not in the war file, and tomcat gives a configuration error when using the conf.xml of solrjs. Any suggestion on how to build Solr to work with solrjs?? Thanks Joan Codina -- V

Re: Build Solr to run SolrJS

2008-11-17 Thread JCodina
To give you more information. The error I get is this one: java.lang.NoClassDefFoundError: org/apache/solr/request/VelocityResponseWriter (wrong name: contrib/velocity/src/main/java/org/apache/solr/request/VelocityResponseWriter) at java.lang.ClassLoader.defineClass1(Native Method) at java.lang

Re: Build Solr to run SolrJS

2008-11-20 Thread JCodina
I could not manage to use it yet. :confused: My doubts are: - must I download solr from svn trunk? - then, must I apply the solrjs and velocity patches and unzip the files? Or is this already in trunk? Because trunk contains velocity and javascript in contrib, but does not find the ve

DataImportHandler JDBC case problems

2008-11-21 Thread JCodina
I tried to perform a DataImportHandler import where the column name "user" and the field name "User" differ only in the case of the first letter. When performing a full import, I was getting different sorts of errors on that field, depending on the case of the names; I tried the four possible combi

Re: Build Solr to run SolrJS

2008-11-22 Thread JCodina
orking better/cleaner as we go, so we appreciate your > early adopter help ironing out this stuff. > > Erik > > On Nov 20, 2008, at 5:44 PM, JCodina wrote: > >> >> I could not manage, yet to use it. :confused: >> My doubts are: >> - must I down

facets and stopwords

2009-06-09 Thread JCodina
I have a text field from which I remove stop words. As a first approximation I use facets to see the most common words in the text, but stopwords are there, and if I search for documents containing the stopwords, then there are no documents in the answer. You can test it at this address (using solrjs

version of lucene

2009-06-15 Thread JCodina
I have the solr-nightly build of last week, and in the lib folder I can find lucene-core-2.9-dev.jar. I need to make some changes to the shingle filter in order to remove stopwords from bigrams, but to do so I need to compile lucene. The problem is, lucene is at version 2.4, not 2.9. If I take, w
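For context, a minimal sketch of the chain being changed, against the Lucene 2.9-era API; the stock ShingleFilter has no notion of stopwords, so the behaviour the post asks for is only noted in a comment:

    import java.io.StringReader;
    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.Set;
    import org.apache.lucene.analysis.StopFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.WhitespaceTokenizer;
    import org.apache.lucene.analysis.shingle.ShingleFilter;

    public class ShingleSketch {
      public static void main(String[] args) {
        Set<String> stopwords = new HashSet<String>(Arrays.asList("de", "la", "que"));
        TokenStream ts = new WhitespaceTokenizer(new StringReader("la casa de la playa"));
        // Stock behaviour: removing stopwords first leaves position holes, and
        // the shingle filter still builds bigrams across them (with "_" fillers).
        ts = new StopFilter(true, ts, stopwords);
        ts = new ShingleFilter(ts, 2);
        // The change discussed above would instead teach the shingle filter
        // itself to skip bigrams that contain a stopword.
      }
    }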

Re: version of lucene

2009-06-15 Thread JCodina
Ok thanks, yes I found it. The jump from version 2.4 to 2.9 was really confusing me; I've seen the notes on svn, and it is clear now. Joan markrmiller wrote: > > > You want to build from svn trunk: > http://svn.apache.org/viewvc/lucene/java/ > > You want revision r779312, because as you can

Top tf_idf in TermVectorComponent

2009-06-25 Thread JCodina
In order to perform any further study of the result set, like clustering, the TermVectorComponent gives the list of words with the corresponding tf and idf, but this list can be huge for each document, and most of the terms may have a low tf or too high a df. Maybe it is useful to compare the relati
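A minimal SolrJ sketch of pulling those per-document term vectors with tf, df and tf/idf; the URL and field name are assumptions, and the TermVectorComponent has to be wired into the request handler in solrconfig.xml:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class TermVectorSketch {
      public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery("*:*");
        q.set("tv", true);            // enable the TermVectorComponent
        q.set("tv.fl", "content");    // field whose vector we want
        q.set("tv.tf", true);
        q.set("tv.df", true);
        q.set("tv.tf_idf", true);     // tf / df per term
        QueryResponse rsp = server.query(q);
        // The raw vectors come back in the "termVectors" section of the
        // response; dropping very low tf / very high df terms would then be
        // done client-side.
        System.out.println(rsp.getResponse().get("termVectors"));
      }
    }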

Re: facets and stopwords

2009-07-01 Thread JCodina
Sorry, I was too cryptic. If you follow this link http://projecte01.development.barcelonamedia.org/fonetic/ you will see a "Top Words" list (in Spanish and stemmed); in the list there is the word "si", which is in 20649 documents. If you click on this word, the system will perform the query

Re: facets and stopwords

2009-07-08 Thread JCodina
hossman wrote: > > > but are you sure that example would actually cause a problem? > i suspect if you index that exact sentence as is you wouldn't see the > facet count for "si" or "que" increase at all. > > If you do a query for "{!raw field=content}que" you bypass the query > parsers (whi
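A small SolrJ sketch of the difference being described (server URL is an assumption; the field name content comes from the quoted query): the analyzed query loses the stopword at query time, while the {!raw} query looks up the literal indexed term, which is what the facet counts are based on:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    public class RawQuerySketch {
      public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr");

        // Goes through the query-time analyzer: if "si" is a stopword there,
        // the query ends up empty and matches no documents.
        SolrQuery analyzed = new SolrQuery("content:si");

        // Bypasses the query parser and analyzer and looks up the literal
        // indexed term "si", so its hit count lines up with the facet count.
        SolrQuery raw = new SolrQuery("{!raw field=content}si");

        System.out.println(server.query(analyzed).getResults().getNumFound());
        System.out.println(server.query(raw).getResults().getNumFound());
      }
    }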

Solr and UIMA

2009-07-20 Thread JCodina
We are starting to use UIMA as a platform to analyze text. The result of analyzing a document is a UIMA CAS. A CAS is a generic data structure that can contain different data. UIMA processes single documents; it gets the documents from a CAS producer and processes them using a pipe that the user

Re: Lemmatisation support in Solr

2009-07-21 Thread JCodina
I think that to get the best results you need some kind of natural language processing. I'm trying to do so using UIMA, but I need to integrate it with Solr, as I explain in this post: http://www.nabble.com/Solr-and-UIMA-tc24567504.html prerna07 wrote: > > Hi, > > I am implementing Lemmatisation

Re: Solr and UIMA

2009-07-21 Thread JCodina
n three words and adds the trailing character that allows searching for the right semantic info, but gives them the same increment. Of course the full processing chain must be aware of this. But I must think about multiword tokens. Grant Ingersoll-6 wrote: > > > On Jul 20, 2009, at 6:43 AM,

Re: Solr and UIMA

2010-02-11 Thread JCodina
Things are done :-) We have now finished the UIMA CAS consumer for Solr and we are making it public; more news soon. We have also been developing some filters based on payloads: one of the filters removes the words whose payloads are in a list, and the other one keeps only those tokens
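The filters themselves are not shown in the post; a minimal sketch of the "keep only these payloads" variant against the Lucene 2.9/3.0-era API (the class name and the string comparison of the payload bytes are assumptions, not the released code; inverting the test gives the removal variant):

    import java.io.IOException;
    import java.util.Set;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;
    import org.apache.lucene.index.Payload;

    // Keeps only tokens whose payload (e.g. a POS tag) is in the given set.
    public class KeepByPayloadFilter extends TokenFilter {
      private final Set<String> keep;
      private final PayloadAttribute payloadAtt = addAttribute(PayloadAttribute.class);

      public KeepByPayloadFilter(TokenStream input, Set<String> keep) {
        super(input);
        this.keep = keep;
      }

      @Override
      public boolean incrementToken() throws IOException {
        while (input.incrementToken()) {
          Payload p = payloadAtt.getPayload();
          if (p != null && keep.contains(new String(p.getData()))) {
            return true;   // payload is in the list: keep the token
          }
          // otherwise skip it and look at the next token
        }
        return false;
      }
    }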

Re: Solr and UIMA

2010-03-02 Thread JCodina
You can test our UIMA-to-Solr CAS consumer. It is based on JulieLab Lucas and uses their CAS, but transformed to generate XML, which can be saved to a file or posted directly to Solr. In the map file you can define which information is generated for each token and how it is concatenated, allowing the gene
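A rough sketch of the "posted directly to solr" side, assuming the consumer flattens each token plus its annotation into a word|tag pair inside a single field; the field names, URL and token format are assumptions, since the real output is driven by the map file:

    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class CasToSolrSketch {
      public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr");

        // One document per CAS: each token is written as surface|POS so that a
        // payload-aware analyzer (e.g. DelimitedPayloadTokenFilterFactory) can
        // turn the suffix into a payload at index time.
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-0001");
        doc.addField("content", "casa|NN blanca|JJ");

        server.add(doc);
        server.commit();
      }
    }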

Clustering from analyzed text instead of raw input

2010-03-03 Thread JCodina
I'm trying to use carrot2 (for now I started with the workbench) and I can cluster any field, but the text used for clustering is the original raw text, the one that was indexed, without any of the processing performed by the tokenizer or filters. So I get stop words. I also did shingles (after fi

error in sum function

2010-03-03 Thread JCodina
The sum function or the map one is not parsed correctly. This sort works like a charm... sort=score+desc,sum(Num,map(Num,0,2000,42000))+asc but sort=score+desc,sum(map(Num,0,2000,42000),Num)+asc gives the following exception: SEVERE: org.apache.solr.common.SolrException: Must declare sort

Re: error in sum function

2010-03-03 Thread JCodina
Ok, solved!!! Joan Koji Sekiguchi-2 wrote: > > Can you try it latest trunk? I have just fixed it in a couple of days > > Koji Sekiguchi from mobile > > > On 2010/03/03, at 18:18, JCodina wrote: > >> >> the sum function or the map one are not parsed cor

Re: Clustering from analyzed text instead of raw input

2010-03-03 Thread JCodina
Thanks Staszek, I'll give the stopword treatment a try, but the problem is that we perform POS tagging and then use payloads to keep only nouns and adjectives, and we thought it could be interesting to perform clustering only with these elements, to avoid meaningless words. Of course this is a proble

Store input text after analyzers and token filters

2010-03-05 Thread JCodina
In a stored field, the content stored is the raw input text. But when the analyzers perform some cleaning or an interesting transformation of the text, it could be useful to store the text after the tokenizer/filter chain. Is there a way to do this? To be able to get back the text of the d

Re: Store input text after analyzers and token filters

2010-03-05 Thread JCodina
Thanks, it can be useful as a workaround, but I get a vector, not a "result" that I may use wherever I could use the stored text. I'm thinking of clustering. Ahmet Arslan wrote: > >> In an stored field, the content stored is the raw input >> text. >> But when the analyzers perform some cleani

Re: Store input text after analyzers and token filters

2010-03-09 Thread JCodina
Otis, I've been thinking about it, and trying to figure out the different solutions: - Try to solve it with a bridge between solr and clustering. - Try to solve it before/during indexing. The second option, of course, is better for performance, but how to do it?? I think a good option may be to crea

Re: Store input text after analyzers and token filters

2010-03-15 Thread JCodina
Ok, for solr 1.5: after looking around, analyzing the answers in this forum, and browsing the code, I think I could manage it. I had to write a few lines of code; the problem was to find which ones!!! So I made a new class, which is a subclass of CompressableField that includes a new parameter
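The class itself is not included in the post; a hypothetical sketch of the pattern being described, written here on top of TextField (itself a CompressableField subclass in Solr 1.4/1.5) so it compiles without the abstract FieldType methods; the class name and the sourceType parameter are invented for illustration:

    import java.util.Map;
    import org.apache.solr.schema.IndexSchema;
    import org.apache.solr.schema.TextField;

    // A field type that behaves like the normal compressable text field but
    // remembers one extra schema attribute, e.g. which analyzer chain the
    // stored value should be produced from.
    public class AnalyzedCopyField extends TextField {

      private String sourceType;   // value of the new <fieldType> attribute

      @Override
      protected void init(IndexSchema schema, Map<String, String> args) {
        // Pull the custom attribute out of the args map before handing the
        // rest to the superclass, which rejects unknown attributes.
        sourceType = args.remove("sourceType");
        super.init(schema, args);
      }

      public String getSourceType() {
        return sourceType;
      }
    }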

Re: Store input text after analyzers and token filters

2010-03-15 Thread JCodina
For solr 1.4 it is basically the same, but IndexSchema (org.apache.solr.schema.IndexSchema) needs to be updated to include the function getFieldTypeByName(String fieldTypeName), which is already in solr 1.5: /** * Given the name of a {@link org.apache.solr.schema.FieldType} (not to be confused
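For reference, the back-port amounts to one small lookup method inside IndexSchema; a sketch, assuming the 1.4 class keeps its field types in the same fieldTypes map as 1.5 (the javadoc quoted above belongs to this method):

      /**
       * Given the name of a {@link org.apache.solr.schema.FieldType} (not to be
       * confused with a field name), return that FieldType as registered in
       * the schema, or null if no such type exists.
       */
      public FieldType getFieldTypeByName(String fieldTypeName) {
        return fieldTypes.get(fieldTypeName);
      }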