Re: Very basic questions: Indexing text

2010-06-28 Thread Ahmet Arslan
> Could you give an example? > E.g. lets say I have a field 'title' and a field 'fulltext' > and my > search term is 'solr'. What would be the right set of > parameters to get > back the whole title-field but only a sniplet of 50 words > (or three > sentences or whatever the unit) from the fulltext

Re: Very basic questions: Indexing text

2010-06-28 Thread Michael Lackhoff
On 28.06.2010 23:00 Ahmet Arslan wrote: >> 1) I can get my docs in the index, but when I search, it >> returns the entire document. I'd love to have it only >> return the line (or two) around the search term. > > Solr can generate Google-like snippets as you describe. > http://wiki.apache.org/s

AutoSuggest Question

2010-06-28 Thread Neil Lott
Hi, I've read some on the autosuggest and I would like to know if the following is possible with my current configuration. I'm using solr 1.4.

What is the proper procedure to reopen closed bugs?

2010-06-28 Thread Teruhiko Kurosaka
I'd like to reopen a bug SOLR-1960 https://issues.apache.org/jira/browse/SOLR-1960 "http://wiki.apache.org/solr/ : non-English users get generic MoinMoin page instead of the desired information" as I submitted a patch. But jira won't let me do it. Do I have to clone it? Teruhiko "Kuro" Kuro

Re: Very basic questions: Indexing text

2010-06-28 Thread Erick Erickson
try adding &hl.fl=text to specify your highlight field. I don't understand why you're only getting the ID field back though. Do note that the highlighting is after the docs, related by the ID. Try a (non highlighting) query of just * to verify that you're pointing at the index you think you are. I
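As a rough illustration of the suggestion above, the highlighting request can be assembled as a plain query string. This is only a sketch: the host, port, and the field name `text` are assumptions, while `hl`, `hl.fl`, and `hl.fragsize` are the standard highlighting parameters.

```python
from urllib.parse import urlencode


def build_highlight_url(base_url, query, highlight_field, fragment_size=100):
    """Build a Solr select URL that asks for highlighted snippets."""
    params = {
        "q": query,
        "hl": "true",                  # turn highlighting on
        "hl.fl": highlight_field,      # field to build snippets from
        "hl.fragsize": fragment_size,  # approximate snippet size in characters
    }
    return base_url + "?" + urlencode(params)


# Assumed local Solr instance; adjust host, port, and path to your setup.
url = build_highlight_url("http://localhost:8983/solr/select", "solr", "text")
print(url)
```

The snippets come back in a separate highlighting section of the response, keyed by document ID, so the client has to match them to the documents itself.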

Re: Very basic questions: Indexing text

2010-06-28 Thread Peter Spam
On Jun 28, 2010, at 2:00 PM, Ahmet Arslan wrote: >> 1) I can get my docs in the index, but when I search, it >> returns the entire document. I'd love to have it only >> return the line (or two) around the search term. > > Solr can generate Google-like snippets as you describe. > http://wiki.apa

Re: Too Many Open Files

2010-06-28 Thread Anderson vasconcelos
Another question: why doesn't SolrJ close the StringWriter and OutputStreamWriter? Thanks 2010/6/28 Anderson vasconcelos > Thanks for responses. > I instantiate one instance of per request (per delete query, in my case). > I have a lot of concurrency process. Reusing the same instance (to send, > d

Re: DIH and denormalizing

2010-06-28 Thread Alexey Serba
> It seems that ${ncdat.feature} is not being set. Try ${dataTable.feature} instead. On Tue, Jun 29, 2010 at 1:22 AM, Shawn Heisey wrote: > I am trying to do some denormalizing with DIH from a MySQL source.  Here's > part of my data-config.xml: > >      query="SELECT *,FROM_UNIXTIME(post_date)

Re: DIH and denormalizing

2010-06-28 Thread Shawn Heisey
On 6/28/2010 3:28 PM, caman wrote: In your query 'query="SELECT webtable as wt FROM ncdat_wt WHERE featurecode='${ncdat.feature}' .. instead of ${ncdat.feature} use ${dataTable.feature} where dataTable is your parent entity name. I knew it would be something stupid like that. I thought I

unknown handler dataimport

2010-06-28 Thread Lance Hill
Hi, I am trying to get db indexing up and running, but I am having trouble getting it working. In the solrconfig.xml file, I added data-config.xml I defined a couple of fields in schema.xml media_id is defined as the unique

RE: DIH and denormalizing

2010-06-28 Thread caman
In your query 'query="SELECT webtable as wt FROM ncdat_wt WHERE featurecode='${ncdat.feature}' .. instead of ${ncdat.feature} use ${dataTable.feature} where dataTable is your parent entity name. From: Shawn Heisey-4 [via Lucene] [mailto:ml-node+929151-1527242139-124...@n3.nabble.com]

Optimizing cache

2010-06-28 Thread Blargy
Here is a screen shot for our cache from New Relic. http://s4.postimage.org/mmuji-31d55d69362066630eea17ad7782419c.png Query cache: 55-65% Filter cache: 100% Document cache: 63% Cache size is 512 for above 3 caches. How do I interpret this data? What are some optimal configuration changes give

Re: Too Many Open Files

2010-06-28 Thread Anderson vasconcelos
Thanks for the responses. I instantiate one instance of per request (per delete query, in my case). I have a lot of concurrent processes. If I reuse the same instance (to send, delete and remove data) in Solr, will I run into trouble? My concern is that if I do this, Solr will commit documents with data from oth

DIH and denormalizing

2010-06-28 Thread Shawn Heisey
I am trying to do some denormalizing with DIH from a MySQL source. Here's part of my data-config.xml: query="SELECT *,FROM_UNIXTIME(post_date) as pd FROM ncdat WHERE did > ${dataimporter.request.minDid} AND did <= ${dataimporter.request.maxDid} AND (did % ${dataimporter.request.numShar

Re: Very basic questions: Indexing text

2010-06-28 Thread Peter Spam
Great, thanks for the pointers. Thanks, Peter On Jun 28, 2010, at 2:00 PM, Ahmet Arslan wrote: >> 1) I can get my docs in the index, but when I search, it >> returns the entire document. I'd love to have it only >> return the line (or two) around the search term. > > Solr can generate Google-

Re: Very basic questions: Indexing text

2010-06-28 Thread Ahmet Arslan
> 1) I can get my docs in the index, but when I search, it > returns the entire document.  I'd love to have it only > return the line (or two) around the search term. Solr can generate Google-like snippets as you describe. http://wiki.apache.org/solr/HighlightingParameters > 2) There are one or

Very basic questions: Indexing text

2010-06-28 Thread Peter Spam
Hi everyone, I'm looking for a way to index a bunch of (potentially large) text files. I would love to see results like Google, so I went through a few tutorials, but I've still got questions: 1) I can get my docs in the index, but when I search, it returns the entire document. I'd love to h

Re: solr data config questions

2010-06-28 Thread Alexey Serba
Hi, You can add additional commentreplyjoin entity to story entity, i.e. ... Thus, you will have multivalued field commentreply that contains list of related "comment_id, reply_id" ("comment_id," if you don't have any related replies for this entry

Re: Too Many Open Files

2010-06-28 Thread Michel Bottan
Hi Anderson, If you are using SolrJ, it's recommended to reuse the same instance per solr server. http://wiki.apache.org/solr/Solrj#CommonsHttpSolrServer But there are other scenarios which may cause this situation: 1. Other application running in the same Solr JVM which doesn't close properly

spellcheckcomponent and frequency thresholds

2010-06-28 Thread Matthew Goldfield
Hi, I'm adding the spellCheckComponent to my current configuration of solr, and I was wondering if there was a way to set a minimum frequency threshold for the IndexBasedSpellChecker through solr like there is in the deprecated Spell Check Request Handler. I know that you can fix most problems

Re: SweetSpotSimilarity

2010-06-28 Thread Blargy
iorixxx wrote: > > CustomSimilarityFactory that extends > org.apache.solr.schema.SimilarityFactory should do it. There is an example > CustomSimilarityFactory.java under src/test/org... > This is exactly what I was looking for... this is very similar (no pun intended ;)) to the updateProcess

Re: Spatial types and DIH

2010-06-28 Thread Eric Angel
Yes. For now, I've gone back to Lucene 1.4 and installed Local Lucene. I just couldn't get the sfilt to work. I'm sure I was probably missing something, but I think I'll just wait until 1.5 is ready to be shipped. On Jun 28, 2010, at 12:02 PM, Grant Ingersoll wrote: > > On Jun 24, 2010, at

Re: Spatial types and DIH

2010-06-28 Thread Grant Ingersoll
On Jun 24, 2010, at 12:32 AM, Eric Angel wrote: > I'm using solr 4.0-2010-06-23_08-05-33 and can't figure out how to add the > spatial types (LatLon, Point, GeoHash or SpatialTile) using > dataimporthandler. My lat/lngs from the database are in separate fields. > Does anyone know how to do h

Re: SweetSpotSimilarity

2010-06-28 Thread Ahmet Arslan
> How would you configure the tfBaselineTfFactors and > LengthNormFactors when > configuring via schema.xml? CustomSimilarityFactory that extends org.apache.solr.schema.SimilarityFactory should do it. There is an example CustomSimilarityFactory.java under src/test/org...

Re: SweetSpotSimilarity

2010-06-28 Thread Blargy
iorixxx wrote: > > it is in schema.xml: > > > How would you configure the tfBaselineTfFactors and LengthNormFactors when configuring via schema.xml? Do I have to create a subclass that hardcodes these values? -- View this message in context: http://lucene.472066.n3.nabble.com/SweetSpotSimi

Re: preside != president

2010-06-28 Thread Jan Høydahl / Cominvent
Hi, You might also want to check out the new Lucene-Hunspell stemmer at http://code.google.com/p/lucene-hunspell/ It uses OpenOffice dictionaries with known stems in combination with a large set of language specific rules. It handles your example, but it is an early release, so test it thoroughl

solr data config questions

2010-06-28 Thread Peng, Wei
Hi All, I am a new user of Solr. We are now trying to enable searching on the Digg dataset. It has story_id as the primary key, and each comment_id identifies a comment on a story_id, so story_id to comment_id is a one-to-many relationship. These comments can in turn be replied to by some repliers, so

Re: Too Many Open Files

2010-06-28 Thread Erick Erickson
This probably means you're opening new readers without closing old ones. But that's just a guess. I'm guessing that this really has nothing to do with the delete itself, but the delete is what's finally pushing you over the limit. I know this has been discussed before, try searching the mail archi

Too Many Open Files

2010-06-28 Thread Anderson vasconcelos
Hi all, when I send a delete query to Solr using SolrJ, I receive this exception: org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Too many open files 11:53:06,964 INFO [HttpMethodDirector] I/O exception (java.net.SocketException) caught when processing request: Too

Re: questions about Solr shards

2010-06-28 Thread Joe Calderon
there is a first pass query to retrieve all matching document ids from every shard along with relevant sorting information, the document ids are then sorted and limited to the amount needed, then a second query is sent for the rest of the documents metadata. On Sun, Jun 27, 2010 at 7:32 PM, Babak
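The two-pass flow described above can be sketched with in-memory dictionaries standing in for shards. This is a toy model, not Solr's implementation: the shard layout, the `score` field, and the function names are all made up for illustration.

```python
import heapq


def first_pass(shard, limit):
    """Each shard returns only (score, doc_id) pairs for its best `limit` hits."""
    hits = [(doc["score"], doc_id) for doc_id, doc in shard.items()]
    return heapq.nlargest(limit, hits)


def distributed_search(shards, limit):
    # Pass 1: collect ids plus sort information from every shard, merge, and cut.
    candidates = []
    for shard_index, shard in enumerate(shards):
        for score, doc_id in first_pass(shard, limit):
            candidates.append((score, doc_id, shard_index))
    top = heapq.nlargest(limit, candidates)
    # Pass 2: fetch the stored fields only for the documents that made the cut.
    return [dict(shards[shard_index][doc_id], id=doc_id)
            for _score, doc_id, shard_index in top]


# Hypothetical two-shard index.
shards = [
    {"a": {"score": 0.9, "title": "A"}, "b": {"score": 0.2, "title": "B"}},
    {"c": {"score": 0.7, "title": "C"}, "d": {"score": 0.5, "title": "D"}},
]
results = distributed_search(shards, 2)
print([r["id"] for r in results])
```

Only the ids and sort values cross the wire in the first pass, which is why the full documents are requested separately for the final, much smaller result set.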

Re: Strange query behavior

2010-06-28 Thread Joe Calderon
splitOnCaseChange is creating multiple tokens from 3dsMax; disable it or enable catenateAll. Use the analysis page in the admin tool to see exactly how your text will be indexed by analyzers without having to reindex your documents; once you have it right you can do a full reindex. On Mon, Jun 28,
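To see why the query fails, here is a deliberately simplified model of the behaviour described above. It is not the actual word delimiter filter (which also handles digits, punctuation, and more); the function and its arguments are invented for illustration and only mimic splitting on a lowercase-to-uppercase change and the effect of a catenate-all option.

```python
import re


def toy_tokens(word, split_on_case_change=True, catenate_all=False):
    """Return lowercased tokens for one word under a simplified delimiter model."""
    if split_on_case_change:
        # Insert a break wherever a lowercase letter is followed by an uppercase one.
        parts = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", word).split()
    else:
        parts = [word]
    tokens = [p.lower() for p in parts]
    if catenate_all and len(parts) > 1:
        tokens.append(word.lower())  # also index the rejoined form
    return tokens


print(toy_tokens("3dsMax"))                              # split only
print(toy_tokens("3dsMax", catenate_all=True))           # split plus rejoined form
print(toy_tokens("3dsMax", split_on_case_change=False))  # no splitting
```

In this model a search for `3dsmax` only matches when the rejoined token is present, which corresponds to the two options suggested above.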

Re: preside != president

2010-06-28 Thread Joe Calderon
the general consensus among people who run into the problem you have is to use a plurals only stemmer, a synonyms file or a combination of both (for irregular nouns etc) if you search the archives you can find info on a plurals stemmer On Mon, Jun 28, 2010 at 6:49 AM, wrote: > Thanks for the ti

Re: Search limit to the first 50 000 chars for one field

2010-06-28 Thread Erick Erickson
And note that those are tokens, not characters, not that it should make any difference if you've bumped it up that far Best Erick On Mon, Jun 28, 2010 at 9:18 AM, judauphant wrote: > > Ok thanks, it works. > > Best regards, > Julien > -- > View this message in context: > http://lucene.47206

Re: Chinese chars are not indexed ?

2010-06-28 Thread Andy
What if Chinese is mixed with English? I have text that is entered by users and it could be a mix of Chinese, English, etc. What's the best way to handle that? Thanks. --- On Mon, 6/28/10, Ahmet Arslan wrote: > From: Ahmet Arslan > Subject: Re: Chinese chars are not indexed ? > To: solr-use

Facet as Autosuggestion

2010-06-28 Thread stockii
Hello, I have a little question about faceting. When I use "facet.prefix=mau", I get these results: 49 23. That's fine, but I miss the last word for the result "bluetooth laser". I want "bluetooth laser maus". The problem is, when I use this as a suggestion and the user searches for "bluetooth lase
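For reference, a facet-prefix request of the kind mentioned above might be built like this. It is a sketch only: the host and the field name `suggest` are assumptions, and it does not address the multi-word issue raised in the message. `facet.prefix` restricts the returned facet values to indexed terms that start with the given prefix.

```python
from urllib.parse import urlencode


def build_facet_prefix_url(base_url, field, prefix, limit=10):
    """Build a Solr URL that returns facet values starting with `prefix`."""
    params = [
        ("q", "*:*"),             # match everything; only the facet counts matter
        ("rows", 0),              # no documents needed, just facets
        ("facet", "true"),
        ("facet.field", field),   # hypothetical field holding suggestion terms
        ("facet.prefix", prefix),
        ("facet.limit", limit),
    ]
    return base_url + "?" + urlencode(params)


suggest_url = build_facet_prefix_url(
    "http://localhost:8983/solr/select", "suggest", "mau")
print(suggest_url)
```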

custom core admin handler

2010-06-28 Thread Dave Hall
Hi all, I have been using Solr for quite a while, but I never really got into looking at the code. Last week that all changed, I decided to write a custom core admin handler. I've posted something on my blog about it, along with a Drupal centric howto. I'd be interested to know what people thin

DataImportHandler $deleteDocById question

2010-06-28 Thread André Maldonado
Hi all. I'm trying to get $deleteDocById working, but no document is being deleted from my index. I'm using Full-Import (withOUT cleaning) and a script with: row.put('$deleteDocById', row.get('codAnuncio')); The script passes through this line for every document it processes (for testing purpos

Re: preside != president

2010-06-28 Thread darren
Thanks for the tip. Yeah, I think the stemming confounds search results as it stands (porter stemmer). I was also thinking of using my dictionary of 500,000 words with their complete morphologies and conjugations and create a synonyms.txt to provide english accurate morphology. Is this a good ide

Re: preside != president

2010-06-28 Thread Brendan Grainger
Hi Darren, You might want to look at the KStemmer (http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters/Kstem) instead of the standard PorterStemmer. It essentially has a 'dictionary' of exception words where stemming stops if found, so in your case president won't be stemmed any furthe

Re: Search limit to the first 50 000 chars for one field

2010-06-28 Thread judauphant
Ok thanks, it works. Best regards, Julien -- View this message in context: http://lucene.472066.n3.nabble.com/Search-limit-to-the-first-50-000-chars-for-one-field-tp927635p927725.html Sent from the Solr - User mailing list archive at Nabble.com.

Strange query behavior

2010-06-28 Thread Marc Ghorayeb
Hello, I have a title that says "3DVIA Studio & Virtools Maya and 3dsMax Exporters". The analysis tool for this field gives me these tokens:3dviadviastudio&;virtoolmaya3dsmaxdssystèmmaxexport However, when i search for "3dsmax", i get no results :( Furthermore, if i search for "dsmax" i get t

Re: Data Import Handler Rich Format Documents

2010-06-28 Thread Alexey Serba
> Ok, I'm trying to integrate the TikaEntityProcessor as suggested.  I'm using > Solr Version: 1.4.0 and getting the following error: > > java.lang.ClassNotFoundException: Unable to load BinURLDataSource or > org.apache.solr.handler.dataimport.BinURLDataSource It seems that DIH-Tika integration is

Re: Search limit to the first 50 000 chars for one field

2010-06-28 Thread Ahmet Arslan
> I use solr 1.4 for search contents in documents (pdf, doc, > odt ...). I use > the module "/update/extract". > When I am researching, I am limited to the first 5 > characters > (approximately). > Any word or sentence after is not found (but the field has > more than 5 > characters when I

Search limit to the first 50 000 chars for one field

2010-06-28 Thread judauphant
Hi, I use Solr 1.4 to search the contents of documents (pdf, doc, odt ...). I use the module "/update/extract". When I search, I am limited to the first 50 000 characters (approximately). Any word or sentence after that is not found (but the field has more than 50 000 characters when I retrieved it

preside != president

2010-06-28 Thread Darren Govoni
Hi, It seems to me that because the stemming does not produce grammatically correct stems in many of the cases, search anomalies can occur like the one I am seeing where I have a document with "president" in it and it is returned when I search for "preside", a different word entirely. Is this co

RE: is there a "delete all" command in updateHandler?

2010-06-28 Thread Daniel Alheiros
Hi Li, Yes, you can issue a delete all by: curl http://your_solr_server:your_solr_port/solr/update -H "Content-Type: text/xml" --data-binary '<delete><query>*:*</query></delete>'; Hope it helps. Cheers, Daniel -Original Message- From: Li Li [mailto:fancye...@gmail.com] Sent: 28 June 2010 03:41 To: solr-user@lucene.a
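The same delete-all request can be prepared from Python's standard library. This is a minimal sketch assuming a Solr update handler at a local URL; it only builds the request object and does not send it, and the helper name is made up for illustration.

```python
from urllib.request import Request


def build_delete_all_request(update_url):
    """Prepare a POST that asks Solr to delete every document matching *:*."""
    body = b"<delete><query>*:*</query></delete>"
    return Request(
        update_url,
        data=body,
        headers={"Content-Type": "text/xml"},
        method="POST",
    )


# Assumed local update URL; replace with your own server and port.
req = build_delete_all_request("http://localhost:8983/solr/update")
print(req.get_method(), req.full_url)
```

Sending it would be a matter of passing `req` to `urllib.request.urlopen`; the deletion should only become visible to searches after a commit is issued.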

Question about the mailinglist (junk on my behalf)

2010-06-28 Thread MitchK
Hello community, for a few days now I have been receiving daily mails with suspicious content. They say that some of my mails were rejected because of the file types of the mails' attachments and other things. This surprises me a lot, because I didn't send any mails with attachments and even the eMail-

Re: Use of EmbeddedSolrServer

2010-06-28 Thread Antonio Calò
I think that this is the best way to use Solr. I've used EmbeddedSolrServer keeping it in a singleton manner (by using Spring framework). Also Solr is threadsafe, so you should not have any issue by using it directly in an Ejb. Antonio 2010/6/27 Robert Naczinski > Hello, > > there is a recom

one to many denormalization approach

2010-06-28 Thread Michael Delaney
Hi, I have an architectural question about using apache solr/lucene. I'm building a solr index for searching a CV database. Basically every CV on there will have some fields like: rate of pay, address, title; these fields are straightforward. The area I need advice on is skills and job history

Re: Chinese chars are not indexed ?

2010-06-28 Thread Ahmet Arslan
> oh yes, *...* works. thanks. > > I saw tokenizer is defined in schema.xml. There are a few > places that define the tokenizer. Wondering if it is enough > to define one for: It is better to define a brand new field type specific to Chinese. http://wiki.apache.org/solr/LanguageAnalysis?highlig

Re: Chinese chars are not indexed ?

2010-06-28 Thread go canal
oh yes, *...* works. thanks. I saw tokenizer is defined in schema.xml. There are a few places that define the tokenizer. Wondering if it is enough to define one for: