On Jul 21, 2009, at 11:57 AM, JCodina wrote:
Let me synthesize:
We (well, I think Grant?) make changes to the DPTFF
(DelimitedPayloadTokenFilterFactory) so that it is able to index, at the same
position, different tokens that may have payloads, using:
1. token delimiter (#)
2. payload delimiter (|)
We
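For context, here is a minimal sketch of the payload-delimiter side of this, assuming Lucene 2.9's DelimitedPayloadTokenFilter and FloatEncoder (the token delimiter for stacking several tokens at one position is the new behavior under discussion and is not part of stock Lucene; the sample input is made up):

import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.payloads.DelimitedPayloadTokenFilter;
import org.apache.lucene.analysis.payloads.FloatEncoder;
import org.apache.lucene.analysis.payloads.PayloadHelper;
import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;

public class DelimitedPayloadDemo {
  public static void main(String[] args) throws Exception {
    // Each token carries a float payload after the '|' delimiter, e.g. "fox|0.9".
    TokenStream ts = new DelimitedPayloadTokenFilter(
        new WhitespaceTokenizer(new StringReader("quick|0.8 brown|0.5 fox|0.9")),
        '|', new FloatEncoder());
    TermAttribute term = ts.addAttribute(TermAttribute.class);
    PayloadAttribute payload = ts.addAttribute(PayloadAttribute.class);
    while (ts.incrementToken()) {
      // The emitted token text has the payload stripped off; the float is
      // stored as the token's payload and decoded here for display.
      float weight = PayloadHelper.decodeFloat(payload.getPayload().getData());
      System.out.println(term.term() + " -> " + weight);
    }
  }
}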
I downloaded solr/trunk and built it;
everything seems to work except that the VelocityResponseWriter is not in
the war file,
and Tomcat gives a configuration error when using the conf.xml of
solrjs.
Any suggestion on how to build Solr so it works with solrjs?
Thanks
Joan Codina
To give you more information, the error I get is this one:
java.lang.NoClassDefFoundError:
org/apache/solr/request/VelocityResponseWriter (wrong name:
contrib/velocity/src/main/java/org/apache/solr/request/VelocityResponseWriter)
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang
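A note on that error: the "wrong name" part of a NoClassDefFoundError generally means the bytecode ended up in the jar under its source-tree path instead of its package path. A minimal illustration (the layout shown is an assumption about what went wrong, not a confirmed diagnosis of this build):

// The class declares this package, so inside the war/jar its bytecode must
// sit at org/apache/solr/request/VelocityResponseWriter.class; packaging it
// under contrib/velocity/src/main/java/... produces exactly the
// "wrong name" error quoted above.
package org.apache.solr.request;

public class VelocityResponseWriter {
  // ... the actual implementation lives in Solr's contrib/velocity module ...
}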
I could not manage to use it yet. :confused:
My doubts are:
- must I download Solr from svn trunk?
- then, must I apply the patches for solrjs and velocity and unzip the files?
Or is this already in trunk?
Because trunk contains velocity and javascript in contrib,
but does not find the ve
I tried to perform a DataImportHandler import where the column name "user" and the
field name "User" are the same except for the case of the first letter.
When performing a full import, I was getting different sorts of errors on
that field depending on the case of the names; I tried the four possible
combi
> orking better/cleaner as we go, so we appreciate your
> early adopter help ironing out this stuff.
>
> Erik
>
> On Nov 20, 2008, at 5:44 PM, JCodina wrote:
>
>>
>> I could not manage to use it yet. :confused:
>> My doubts are:
>> - must I down
I have a text field from which I remove stop words. As a first approximation
I use facets to see the most common words in the text, but... the stopwords are
there, and if I search for documents containing the stopwords, then there are no
documents in the answer.
You can test it at this address (using solrjs
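Facet counts come from indexed terms, so if stopwords show up as top facets, the index-time chain is not actually dropping them. A quick standalone check, assuming Lucene 2.9's StopFilter (the stop set and sample text here are made up):

import java.io.StringReader;
import java.util.Set;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;

public class StopCheck {
  public static void main(String[] args) throws Exception {
    // Whatever this chain emits is exactly what faceting will count.
    Set stop = StopFilter.makeStopSet(new String[] {"si", "que"});
    TokenStream ts = new StopFilter(true,  // enable position increments
        new WhitespaceTokenizer(new StringReader("si que hay documentos")), stop);
    TermAttribute term = ts.addAttribute(TermAttribute.class);
    while (ts.incrementToken()) {
      System.out.println(term.term());  // prints only "hay" and "documentos"
    }
  }
}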
I have the solr-nightly build of last week, and in the lib folder I can find
lucene-core-2.9-dev.jar.
I need to make some changes to the shingle filter in order to remove stopwords
from bigrams, but to do so I need to compile Lucene;
the problem is, Lucene is at version 2.4, not 2.9.
If I take, w
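One way to do this without patching ShingleFilter itself is a small follow-on filter that discards any shingle containing a stopword. A sketch against the Lucene 2.9 attribute API (the class name is mine, and I assume ShingleFilter's default behavior of joining words with a space; it also ignores position increments for brevity):

import java.io.IOException;
import java.util.Set;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;

public final class StopwordShingleFilter extends TokenFilter {
  private final Set<String> stopwords;
  private final TermAttribute termAtt;

  public StopwordShingleFilter(TokenStream input, Set<String> stopwords) {
    super(input);
    this.stopwords = stopwords;
    this.termAtt = addAttribute(TermAttribute.class);
  }

  public boolean incrementToken() throws IOException {
    while (input.incrementToken()) {
      boolean keep = true;
      // ShingleFilter joins the words of a shingle with a space by default.
      for (String part : termAtt.term().split(" ")) {
        if (stopwords.contains(part)) { keep = false; break; }
      }
      if (keep) return true;  // emit only shingles free of stopwords
    }
    return false;
  }
}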
Ok thanks, yes, I found it; the jump from version 2.4 to 2.9 was really
disturbing me.
I've seen the notes on svn, and it is clear now.
Joan
markrmiller wrote:
>
>
> You want to build from svn trunk:
> http://svn.apache.org/viewvc/lucene/java/
>
> You want revision r779312, because as you can
In order to perform any further study of the result set, like clustering, the
TermVectorComponent
gives the list of words with the corresponding tf and idf,
but this list can be huge for each document, and most of the terms may have
a low tf or a too-high df;
maybe it is useful to compare the relati
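The pruning idea can be expressed as a plain filter over the term statistics the TermVectorComponent returns. A sketch in plain Java (the map layout term -> {tf, df} and the threshold parameters are illustrative assumptions, not the component's actual output format):

import java.util.HashMap;
import java.util.Map;

public class TermVectorPruner {
  /** Keep terms with tf >= minTf whose df stays below maxDfFraction of numDocs. */
  public static Map<String, int[]> prune(Map<String, int[]> termStats,  // term -> {tf, df}
                                         int numDocs, int minTf, double maxDfFraction) {
    Map<String, int[]> kept = new HashMap<String, int[]>();
    for (Map.Entry<String, int[]> e : termStats.entrySet()) {
      int tf = e.getValue()[0], df = e.getValue()[1];
      // Drop rare terms (low tf) and near-ubiquitous terms (too-high df).
      if (tf >= minTf && df < maxDfFraction * numDocs) {
        kept.put(e.getKey(), e.getValue());
      }
    }
    return kept;
  }
}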
Sorry, I was too cryptic.
If you follow this link
http://projecte01.development.barcelonamedia.org/fonetic/
you will see a "Top Words" list (in Spanish and stemmed); in the list there
is the word "si", which is in 20649 documents.
If you click on this word, the system will perform the query
hossman wrote:
>
>
> but are you sure that example would actually cause a problem?
> i suspect if you index that exact sentence as is you wouldn't see the
> facet count for "si" or "que" increase at all.
>
> If you do a query for "{!raw field=content}que" you bypass the query
> parsers (whi
We are starting to use UIMA as a platform to analyze the text.
The result of analyzing a document is a UIMA CAS. A CAS is a generic data
structure that can contain different kinds of data.
UIMA processes single documents; they get the documents from a CAS producer,
process them using a pipe that the user
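For readers unfamiliar with the glue involved, here is a rough sketch of a CAS consumer that pushes the document text into Solr, assuming UIMA's CasConsumer_ImplBase and the SolrJ client that ships with Solr 1.4 (the class name, URL, and "content" field are illustrative assumptions, not the poster's actual code):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.apache.uima.cas.CAS;
import org.apache.uima.collection.CasConsumer_ImplBase;
import org.apache.uima.resource.ResourceInitializationException;
import org.apache.uima.resource.ResourceProcessException;

public class SolrCasConsumer extends CasConsumer_ImplBase {
  private SolrServer server;

  public void initialize() throws ResourceInitializationException {
    try {
      server = new CommonsHttpSolrServer("http://localhost:8983/solr");  // assumed URL
    } catch (java.net.MalformedURLException e) {
      throw new ResourceInitializationException(e);
    }
  }

  public void processCas(CAS cas) throws ResourceProcessException {
    try {
      SolrInputDocument doc = new SolrInputDocument();
      // Index the raw document text; real code would walk the CAS annotations too.
      doc.addField("content", cas.getDocumentText());
      server.add(doc);
    } catch (Exception e) {
      throw new ResourceProcessException(e);
    }
  }
}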
I think that to get the best results you need some kind of natural language
processing.
I'm trying to do so using UIMA, but I need to integrate it with Solr, as I
explain in this post:
http://www.nabble.com/Solr-and-UIMA-tc24567504.html
prerna07 wrote:
>
> Hi,
>
> I am implementing Lemmatisation
n three words and adds the trailing character that
allows searching for the right semantic info, but gives them the same
increment. Of course, the full processing chain must be aware of this.
But I must think about multiword tokens.
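The "same increment" trick is just a position increment of zero on every token after the first at a given position. A generic illustration against the Lucene 2.9 attribute API (the filter and the "#X" suffix are hypothetical stand-ins for the semantic-info tagging described above):

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;

public final class StackedVariantFilter extends TokenFilter {
  private final TermAttribute termAtt = addAttribute(TermAttribute.class);
  private final PositionIncrementAttribute posAtt =
      addAttribute(PositionIncrementAttribute.class);
  private String pendingVariant;  // variant queued for the current position

  public StackedVariantFilter(TokenStream input) { super(input); }

  public boolean incrementToken() throws IOException {
    if (pendingVariant != null) {
      // Emit the variant at the SAME position as the preceding token;
      // offsets are deliberately left untouched so both share them.
      termAtt.setTermBuffer(pendingVariant);
      posAtt.setPositionIncrement(0);
      pendingVariant = null;
      return true;
    }
    if (!input.incrementToken()) return false;
    pendingVariant = termAtt.term() + "#X";  // "#X" is a placeholder tag
    return true;  // emit the original token first, with its own increment
  }
}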
Grant Ingersoll-6 wrote:
>
>
> On Jul 20, 2009, at 6:43 AM,
Things are done :-)
We have now finished the UIMA CAS consumer for Solr;
we are making it public, more news soon.
We have also been developing some filters based on payloads.
One of the filters removes words whose payloads are in the list;
the other one keeps only those tokens.
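A sketch of what such a payload-driven filter can look like, against Lucene 2.9's attribute API (class and parameter names are illustrative, not the actual filters being announced; payloads are compared as strings, e.g. POS tags like "NN"):

import java.io.IOException;
import java.util.Set;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;
import org.apache.lucene.index.Payload;

public final class PayloadSetFilter extends TokenFilter {
  private final Set<String> payloads;
  private final boolean keepMatching;  // true: keep listed payloads; false: drop them
  private final PayloadAttribute payloadAtt;

  public PayloadSetFilter(TokenStream input, Set<String> payloads, boolean keepMatching) {
    super(input);
    this.payloads = payloads;
    this.keepMatching = keepMatching;
    this.payloadAtt = addAttribute(PayloadAttribute.class);
  }

  public boolean incrementToken() throws IOException {
    while (input.incrementToken()) {
      Payload p = payloadAtt.getPayload();
      String tag = (p == null) ? "" : new String(p.getData());  // e.g. "NN", "JJ"
      // One flag covers both filters: keep-only-these or remove-these.
      if (payloads.contains(tag) == keepMatching) return true;
    }
    return false;
  }
}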
You can test our UIMA to Solr CAS consumer.
It is based on JulieLab Lucas and uses their CAS,
but transformed to generate XML, which can be saved to a file or posted
directly to Solr.
In the map file you can define which information is generated for each
token, and how it is concatenated, allowing the gene
I'm trying to use carrot2 (for now I started with the workbench) and I can
cluster any field, but the text used for clustering is the original raw
text, the one that was indexed, without any of the processing performed by
the tokenizer or filters.
So I get stop words.
I also did shingles (after fi
The sum function or the map one is not parsed correctly.
Doing this sort works like a charm...
sort=score+desc,sum(Num,map(Num,0,2000,42000))+asc
but
sort=score+desc,sum(map(Num,0,2000,42000),Num)+asc
gives the following exception:
SEVERE: org.apache.solr.common.SolrException: Must declare sort
Ok, solved!!!
Joan
Koji Sekiguchi-2 wrote:
>
> Can you try the latest trunk? I have just fixed it in the last couple of days
>
> Koji Sekiguchi from mobile
>
>
> On 2010/03/03, at 18:18, JCodina wrote:
>
>>
>> The sum function or the map one is not parsed cor
Thanks Staszek,
I'll give the stopword treatment a try, but the problem is that we perform
POS tagging and then use payloads to keep only nouns and adjectives, and we
thought it could be interesting to perform clustering only with these
elements, to avoid senseless words.
Of course it is a proble
In a stored field, the content stored is the raw input text.
But when the analyzers perform some cleaning or an interesting transformation
of the text, it could be useful to store the text after the
tokenizer/filter chain.
Is there a way to do this? To be able to get back the text of the d
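There is no built-in way to store the post-analysis text, but one workaround is to re-run the stored value through the field's analyzer and join the surviving tokens. A sketch assuming the Lucene 2.9 analysis API (the method and class names are mine):

import java.io.IOException;
import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;

public class AnalyzedTextDumper {
  /** Re-analyzes the stored text and joins the resulting tokens with spaces. */
  public static String analyzedText(Analyzer analyzer, String field, String stored)
      throws IOException {
    TokenStream ts = analyzer.tokenStream(field, new StringReader(stored));
    TermAttribute termAtt = ts.addAttribute(TermAttribute.class);
    StringBuilder sb = new StringBuilder();
    ts.reset();
    while (ts.incrementToken()) {
      if (sb.length() > 0) sb.append(' ');
      sb.append(termAtt.term());  // the token as it would be indexed
    }
    ts.close();
    return sb.toString();
  }
}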
Thanks,
it can be useful as a workaround,
but I get a vector, not a "result" that I can use wherever I could use the
stored text.
I'm thinking of clustering.
Ahmet Arslan wrote:
>
>> In a stored field, the content stored is the raw input
>> text.
>> But when the analyzers perform some cleani
Otis,
I've been thinking about it and trying to figure out the different solutions:
- Try to solve it by building a bridge between Solr and clustering.
- Try to solve it before/during indexing.
The second option, of course, is better for performance, but how to do it??
I think a good option may be to crea
Ok.
For Solr 1.5:
After looking around, analyzing the answers in this forum, and browsing the
code, I think I can manage it. I had to write a few lines of code;
the problem was finding which ones!!!
So I made a new class, which is a subclass of CompressableField and includes
a new parameter.
For Solr 1.4:
It is basically the same, but IndexSchema (org.apache.solr.schema.IndexSchema)
needs to be updated to include the function
getFieldTypeByName(String fieldTypeName), which is already in Solr 1.5:
/**
* Given the name of a {@link org.apache.solr.schema.FieldType} (not to be
confused
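The accessor itself is a one-liner once that javadoc is in place; a backport sketch, assuming IndexSchema keeps its types in a Map named fieldTypes as trunk does:

// Backport sketch for org.apache.solr.schema.IndexSchema (Solr 1.4).
public FieldType getFieldTypeByName(String fieldTypeName) {
  return fieldTypes.get(fieldTypeName);  // null if no such field type is defined
}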