Re: are stopwords indexed?

2012-07-16 Thread Giovanni Gherdovich
Hi all, thank you for your replies. Lance: > Look at the index with the Schema Browser in the Solr UI. This pulls > the terms for each field. I did it, and it was the first alarm I got. After the indexing, I went to the schema browser hoping not to see any stopwords among the top terms, but... they

are stopwords indexed?

2012-07-15 Thread Giovanni Gherdovich
Hi all, are stopwords from the stopwords.txt config file supposed to be indexed? I would say no, but this is the situation I am observing on my Solr instance: * I have a bunch of stopwords in stopwords.txt * my fields are of fieldType "text" from the example schema.xml, i.e. I have -- -- >8 -

Re: documentation on the pragmatics behind the example schema.xml

2012-06-30 Thread Giovanni Gherdovich
Hello Erick, 2012/7/1 Erick Erickson : > Your very best way of figuring this out is to use the admin/analysis > page. [...] thank you for this advice. I'll make myself comfortable with the admin/analysis page. cheers, GGhh

Re: more than one text corpus with solr?

2012-06-30 Thread Giovanni Gherdovich
Hi Gora, yes I was actually looking for a multi-core setup. thanks! GGhh 2012/6/30 Gora Mohanty > > Not quite sure what you mean by "more than one > corpus", and by "several independent indices" in > this context, but maybe multi-core Solr will meet > your needs: http://wiki.apache.org/solr/Cor

Re: difference between stored="false" and stored="true" ?

2012-06-30 Thread Giovanni Gherdovich
Thank you François and Jack for those explanations. Cheers, GGhh 2012/6/30 François Schiettecatte: > Giovanni > > means the data is stored in the index and [...] 2012/6/30 Jack Krupansky: > "indexed" and "stored" are independent [...]

Re: how do I trash a whole index and start over?

2012-06-30 Thread Giovanni Gherdovich
2012/6/30 Dmitry Kan: > Hello, > > The easiest way is to remove what's inside data/index directory; in case > you have a spell-checker index, remove it as well. This requires solr > instance restart. thanks dmitry, I'll go for this solution. cheers, GGhh
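
A minimal Python sketch of the cleanup Dmitry describes, assuming Solr is stopped first and restarted afterwards. The function name `clear_index_dir` and the path in the usage note are hypothetical; the actual location of `data/index` depends on the installation.

```python
import shutil
from pathlib import Path


def clear_index_dir(index_dir):
    """Delete everything inside index_dir, keeping the directory itself.

    Returns the sorted names of the removed entries.
    """
    index_path = Path(index_dir)
    removed = []
    for entry in index_path.iterdir():
        if entry.is_dir():
            # Remove nested directories recursively.
            shutil.rmtree(entry)
        else:
            entry.unlink()
        removed.append(entry.name)
    return sorted(removed)
```

Usage would look like `clear_index_dir("/path/to/solr/data/index")`, with the path replaced by the real one; a spell-checker index directory, if present, can be cleared the same way.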

difference between stored="false" and stored="true" ?

2012-06-30 Thread Giovanni Gherdovich
Hi all, when declaring a field in the schema.xml file you can set the attributes 'indexed' and 'stored' to "true" or "false". What is the difference between a field with stored="false" and a field with stored="true"? I guess understanding this would require me to have a closer look at Lucene's index data structures; what's the pointer to some

documentation on the pragmatics behind the example schema.xml

2012-06-30 Thread Giovanni Gherdovich
Hi all, in the example schema.xml I can find a wide variety of fieldType and field, already there to be used. I believe each of them has been designed for a specific usage case, with some pragmatics in mind. Where can I find documentation on what those field / fieldTypes were designed for? Is th

how do I trash a whole index and start over?

2012-06-30 Thread Giovanni Gherdovich
Hi all, how do I trash a whole index and start over with a fresh index of my corpus? I need this because I modified my schema.xml after my last indexing run, and I'd like the changes to be taken into account. Cheers, Giovanni

Re: how to retrieve a doc from its docID ?

2012-06-30 Thread Giovanni Gherdovich
Sascha: > You should also make sure that the field definition (in schema.xml) for 'text' > says stored="true", otherwise the field will not be returned. I guess you're hitting my problem. The field I want to search on is declared with stored="false" in the schema.xml: -- -- >8 -- -- >8 -- -- >8 -

Re: querying thru solritas gives me zero results

2012-06-30 Thread Giovanni Gherdovich
2012/6/30 Erik Hatcher: > Debugging this you can add &debugQuery=true&wt=xml to get > the full classic Solr XML output that drives it all. Thank you Erik, I'll see what I get from it. cheers, GGhh
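
A small sketch of what Erik's suggestion amounts to: appending `debugQuery=true` and `wt=xml` to the query string. The helper name `debug_query_url` and the base URL in the usage note are assumptions for illustration, not taken from the thread.

```python
from urllib.parse import urlencode


def debug_query_url(base_url, query):
    """Build a request URL that asks for debug output in the XML format."""
    params = {
        "q": query,
        "debugQuery": "true",  # include parsing and scoring details
        "wt": "xml",           # XML response writer instead of the Velocity one
    }
    return base_url + "?" + urlencode(params)
```

For example, `debug_query_url("http://localhost:8983/solr/select", "hello")` gives `http://localhost:8983/solr/select?q=hello&debugQuery=true&wt=xml`; host, port and handler path have to match the actual instance.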

Re: more than one text corpus with solr?

2012-06-30 Thread Giovanni Gherdovich
2012/6/30 Afroz Ahmad: > You can set up multiple cores, each core managing a different index. > See http://wiki.apache.org/solr/CoreAdmin > thank you very much Ahmad for this hint. cheers, Giovanni

Re: querying thru solritas gives me zero results

2012-06-30 Thread Giovanni Gherdovich
Hello Sascha, Sascha: > Solritas uses the dismax query parser. > The dismax config parameter 'qf' specifies > the index fields to be searched in. > Make sure that 'name' is your default search field. I am not sure I understand this; I have no field named 'name'. My documents are like -- -- >8 --

querying thru solritas gives me zero results

2012-06-30 Thread Giovanni Gherdovich
Hi all, this morning I was very proud of myself since I managed to set up solritas ( http://wiki.apache.org/solr/VelocityResponseWriter ) for the solr instance on my server (ubuntu natty). This joy lasted only half a minute, since the only query that gets more than zero results with solritas is t

how to retrieve a doc from its docID ?

2012-06-30 Thread Giovanni Gherdovich
Hi all, when querying my solr instance, the answers I get are the document IDs of my docs. Here is what one of my docs looks like: -- -- >8 -- -- >8 -- -- >8 -- -- >8 -- -- >8 -- -- hello solar! 123 -- -- >8 -- -- >8 -- -- >8 -- -- >8 -- -- >8 -- -- here is the respons

more than one text corpus with solr?

2012-06-30 Thread Giovanni Gherdovich
Hi all, I am experimenting with solr, and I feel the need to index more than just one corpus and search them with solr independently. Is it possible to have this setup? Several independent indices all managed by the same solr instance? cheers, Giovanni

return *all* words at Levenshtein distance = N from query word

2012-06-07 Thread Giovanni Gherdovich
Hi all, I am wondering if SOLR can return all words in my text corpus that are at a given Levenshtein distance from my query word. Possible? Difficult? Cheers, Giovanni
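
To make the question concrete, here is a brute-force Python sketch of what "all words at distance N" means. It runs over a plain word list outside Solr and says nothing about how Solr itself would answer such a query; the names `levenshtein` and `words_at_distance` are illustrative.

```python
def levenshtein(a, b):
    """Edit distance counting insertions, deletions and substitutions."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or match)
            ))
        prev = curr
    return prev[-1]


def words_at_distance(query, vocabulary, n):
    """Return the distinct words whose distance from query is exactly n."""
    return sorted(w for w in set(vocabulary) if levenshtein(query, w) == n)
```

With the vocabulary `["solar", "sold", "solr", "lucene"]`, the query word `"solr"` and `n = 1`, this returns `["solar", "sold"]`. The scan is linear in the vocabulary size, so it is only a reference for small corpora.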

Re: indexing unstructured text (tweets)

2012-05-28 Thread Giovanni Gherdovich
2012/5/28 Jack Krupansky : > Ah, okay. Here's some PHP regexp code for parsing a raw tweet to get user > names and hash tags: > > http://saturnboy.com/2010/02/parsing-twitter-with-regexp/ Awesome! thank you very much Jack. GGhh
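
The linked article uses PHP; a rough Python equivalent of the same idea might look like the sketch below. The patterns are deliberately simplified assumptions (a `#` or `@` followed by word characters) and are not the ones from the article or from Twitter's own rules.

```python
import re

# Simplified patterns: a marker character followed by letters, digits or "_".
HASHTAG_RE = re.compile(r"#(\w+)")
MENTION_RE = re.compile(r"@(\w+)")


def parse_tweet(text):
    """Extract hashtags and user mentions from the raw text of a tweet."""
    return {
        "hashtags": HASHTAG_RE.findall(text),
        "mentions": MENTION_RE.findall(text),
    }
```

For the text `"@jack thanks for the #solr tip, cc @anuj #lucene"` this yields the hashtags `solr` and `lucene` and the mentions `jack` and `anuj`. Note that the mention pattern would also match the domain part of an email address, so real input needs stricter rules.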

Re: indexing unstructured text (tweets)

2012-05-28 Thread Giovanni Gherdovich
Hello Jack and Anuj, 2012/5/28 Jack Krupansky : > The Twitter API extracts hash tag and user mentions for you, in addition to > giving you the full raw text. You'll have to read up on the Twitter API. That's what I thought just after hitting "send" on the message above ;-) I am pretty sure the Tw

Re: indexing unstructured text (tweets)

2012-05-28 Thread Giovanni Gherdovich
Hello Jack, hi all, 2012/5/28 Jack Krupansky : > Other obvious metadata from the Twitter API to index would be hashtags, user > mentions (both the user id/screen name and user name), date/time, urls > mentioned (expanded if a URL shortener is used), and possibly coordinates > for spatial search.

Re: indexing unstructured text (tweets)

2012-05-28 Thread Giovanni Gherdovich
Hello Dmitry and David, 2012/5/28 Dmitry Kan : > [...] If you just want to > index the text contents of tweets (including web links etc), using just > off-the-shelf Solr is enough. You'll have to wrap your text input (per each > tweet I would assume) into an xml [...] > So design your schema firs
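
As an illustration of "wrapping the text input into an xml", the sketch below builds a Solr XML update message (`<add><doc><field name="...">`) for one tweet. The field names `id` and `text` are assumptions; they have to match whatever the schema actually defines.

```python
import xml.etree.ElementTree as ET


def tweet_to_solr_xml(tweet_id, text):
    """Wrap one tweet into an <add><doc>...</doc></add> update message."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")

    # Field names are hypothetical and must exist in schema.xml.
    id_field = ET.SubElement(doc, "field", name="id")
    id_field.text = str(tweet_id)

    text_field = ET.SubElement(doc, "field", name="text")
    text_field.text = text  # ElementTree escapes &, < and > on output

    return ET.tostring(add, encoding="unicode")
```

Calling `tweet_to_solr_xml(123, "hello solar!")` produces `<add><doc><field name="id">123</field><field name="text">hello solar!</field></doc></add>`, which can then be posted to the update handler.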

indexing unstructured text (tweets)

2012-05-28 Thread Giovanni Gherdovich
explanation about the general picture? Can I index my tweets with Solr? Or do I need to put also Tika in my pipeline? Best regards, Giovanni Gherdovich