Re: Changing the default Fuzzy minSimilarity?

2010-12-15 Thread Jan Høydahl / Cominvent
>> A fuzzy query foo~ defaults to a similarity of 0.5, i.e. equal to foo~0.5 >> > > just as an FYI, this isn't true in trunk (4.0) any more. > > the defaults are changed so that it never enumerates the entire > dictionary (slow) like before, see: > https://issues.apache.org/jira/browse/LUCENE-26

Omitting tf but not positions

2010-12-15 Thread Jan Høydahl / Cominvent
Hi, I have a case where I use DisMax "pf" to boost on a phrase match in a field. I use omitNorms=true to keep length normalization from messing with my scores. However, in some documents the phrase "foo bar" occurs more than once in the same field, and I get an unintended TF boost for one of the

Re: [DIH] Example for SQL Server

2010-12-15 Thread Savvas-Andreas Moysidis
Hi Adam, we are using DIH to index off an SQL Server database (the freebie SQLExpress one ;) ). We have defined the following in our %TOMCAT_HOME%\solr\conf\data-config.xml: We downloaded a JDBC driver from here http://jtds.sourceforge.net/faq.html and found it to be a quite stab

Problem using curl in PHP to get Solr results

2010-12-15 Thread Dennis Gearon
I finally figured out how to use curl to GET results, i.e. just turn all spaces into '%20' in my type of queries. I'm using Solr spatial, and then searching in both the default text field and a couple of columns. Works fine in the browser. But if I query for it using curl in PHP, there's a
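Hand-replacing spaces with '%20' breaks as soon as a query contains quotes, colons, ampersands or other reserved characters. A minimal Python sketch that percent-encodes every parameter instead; the base URL is a hypothetical local Solr instance, and the code only builds the request URL without contacting a server. In PHP, rawurlencode() and http_build_query() serve the same purpose.

```python
from urllib.parse import quote, urlencode

# Hypothetical base URL; adjust host, port and path for your installation.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_solr_url(params):
    """Return a GET URL with every parameter value percent-encoded.

    quote_via=quote encodes spaces as %20; the default (quote_plus) would use '+',
    which Solr also decodes as a space.
    """
    return SOLR_SELECT + "?" + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    print(build_solr_url({"q": 'text:"foo bar"', "wt": "json"}))
```

The colon and double quotes are escaped along with the space, so field-qualified phrase queries survive the trip intact.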

Re: Problem using curl in PHP to get Solr results

2010-12-15 Thread pankaj bhatt
Hi, On Wed, Dec 15, 2010 at 2:52 PM, Dennis Gearon wrote: > I finally figured out how to use curl to GET results, i.e. just turn all > spaces > into '%20' in my type of queries. I'm using solar spatial, and then > searching in > both the default text field and a couple of columns. Works fine on

Re: Problem using curl in PHP to get Solr results

2010-12-15 Thread Stephen Weiss
Forgive me if this seems like a dumb question but have you tried the Apache_Solr_Service class? http://www.ibm.com/developerworks/library/os-php-apachesolr/index.html It's really quite good at handling the nuts and bolts of making the HTTP requests and decoding the responses for PHP. I almost

Re: Omitting tf but not positions

2010-12-15 Thread Robert Muir
On Wed, Dec 15, 2010 at 3:09 AM, Jan Høydahl / Cominvent wrote: > Any way to disable TF/IDF normalization without also disabling positions? > see Similarity.tf(float) and Similarity.tf(int) if you want to change this for both terms and phrases just override Similarity.tf(float), since by default
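A worked sketch of where the extra boost comes from, assuming Lucene's DefaultSimilarity, whose tf(float freq) returns the square root of the term or phrase frequency f. Overriding tf(float) to return 1 for any non-zero frequency removes the boost while keeping positions, and therefore phrase matching, intact. All else being equal, the ratios below are the score ratios for the "pf" clause when the phrase occurs twice versus once.

```latex
\begin{aligned}
\mathrm{tf}_{\text{default}}(f) &= \sqrt{f},
  & \frac{\mathrm{tf}_{\text{default}}(2)}{\mathrm{tf}_{\text{default}}(1)} &= \sqrt{2} \approx 1.414 \\
\mathrm{tf}_{\text{flat}}(f) &= \begin{cases} 1 & f > 0 \\ 0 & f = 0 \end{cases},
  & \frac{\mathrm{tf}_{\text{flat}}(2)}{\mathrm{tf}_{\text{flat}}(1)} &= 1
\end{aligned}
```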

French stemming / size of synonyms file

2010-12-15 Thread Emmanuel Bégué
Hello, According to the wiki http://wiki.apache.org/solr/LanguageAnalysis, the light stemmers for French (solr.FrenchLightStemFilterFactory and solr.FrenchMinimalStemFilterFactory) are only available for SOLR 3.1. Is there a way to make them work with 1.4.1? - - - Additionally, there is an "off

Re: Search with facet.pivot

2010-12-15 Thread Erik Hatcher
One oddity is the duplicated sections: root_category_name,parent_category_name,category root_category_id,parent_category_id,category_id That's in your responseHeader twice. Perhaps something fishy is caused by that? Is this hardcoded in your solrconfig.xml request handler mapping

Re: limit the search results to one category

2010-12-15 Thread Andrea Gazzarini
Did you try a filter query (fq)? Andrea Gazzarini -Original Message- From: sara motahari Date: Tue, 14 Dec 2010 17:34:52 To: Reply-To: solr-user@lucene.apache.org Subject: limit the search results to one category Hi all, I am using a dismax request handler with various fields that it sear
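A filter query narrows the result set without changing how the matching documents are scored, and Solr caches it separately from the main query. A minimal Python sketch of the request parameters, assuming a hypothetical `category` field and a dismax main query; it only builds the parameter string.

```python
from urllib.parse import urlencode


def category_search_params(user_query, category):
    """Build dismax request parameters restricted to one category via fq.

    The field name "category" is hypothetical; use the field from your schema.
    Quoting the value keeps multi-word categories together.
    """
    return {
        "q": user_query,
        "defType": "dismax",
        "fq": f'category:"{category}"',
    }


if __name__ == "__main__":
    print(urlencode(category_search_params("laptop", "electronics")))
```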

Re: French stemming / size of synonyms file

2010-12-15 Thread Robert Muir
2010/12/15 Emmanuel Bégué : > Hello, > > According to the wiki http://wiki.apache.org/solr/LanguageAnalysis, > the light stemmers for French (solr.FrenchLightStemFilterFactory and > solr.FrenchMinimalStemFilterFactory) are only available for SOLR 3.1. > > Is there a way to make them work with 1.4.1

Dataimport performance

2010-12-15 Thread Robert Gründler
Hi, we're looking for some comparison benchmarks for importing large tables from a MySQL database (full import). Currently, a full import of ~8 million rows from a MySQL database takes around 3 hours, on a quad-core machine with 16 GB of RAM and a RAID 10 storage setup. Solr is running on a apa

Re: [DIH] Example for SQL Server

2010-12-15 Thread Adam Estrada
Thanks All, Testing here shortly and will report back asap. w/r, Adam On Wed, Dec 15, 2010 at 4:10 AM, Savvas-Andreas Moysidis < savvas.andreas.moysi...@googlemail.com> wrote: > Hi Adam, > > we are using DIH to index off an SQL Server database(the freeby SQLExpress > one.. ;) ). We have defined

Problem with multicore

2010-12-15 Thread Jörg Agatz
Hello users, I have a problem with Solr 1.4.1 on Ubuntu 10.10. I downloaded the new version and extracted it. Then I copied solr.xml from example/multicore/solr.xml to /examples/solr/solr.xml, then I created the folders example/solr/core0 and example/solr/

Re: Dataimport performance

2010-12-15 Thread Adam Estrada
What version of Solr are you using? Adam 2010/12/15 Robert Gründler > Hi, > > we're looking for some comparison-benchmarks for importing large tables > from a mysql database (full import). > > Currently, a full-import of ~ 8 Million rows from a MySQL database takes > around 3 hours, on a QuadCo

Re: Dataimport performance

2010-12-15 Thread Erick Erickson
You're adding on the order of 750 rows (docs)/second, which isn't bad... have you profiled the machine as this runs? Even just with top (assuming unix)... because the very first question is always "what takes the time, getting the data from MySQL or indexing or I/O?". If you aren't maxing out you

Re: Dataimport performance

2010-12-15 Thread Robert Gründler
> What version of Solr are you using? Solr Specification Version: 1.4.1 Solr Implementation Version: 1.4.1 955763M - mark - 2010-06-17 18:06:42 Lucene Specification Version: 2.9.3 Lucene Implementation Version: 2.9.3 951790 - 2010-06-06 01:30:55 -robert > > Adam > > 2010/12/15 Robert Gründ

Re: Dataimport performance

2010-12-15 Thread Bernd Fehling
We are currently running Solr 4.x from trunk. -d64 -Xms10240M -Xmx10240M Total Rows Fetched: 24935988 Total Documents Skipped: 0 Total Documents Processed: 24568997 Time Taken: 5:55:19.104 24.5 million docs as XML from the filesystem in less than 6 hours. Maybe your MySQL is the bottleneck? Reg
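The rates quoted in this thread can be checked directly: about 8 million rows in about 3 hours for the MySQL import, versus the 24,568,997 documents in 5:55:19.104 reported here. The sources differ (database versus XML files), so this is only a rough comparison.

```latex
\begin{aligned}
\text{MySQL import:}\quad & \frac{8\,000\,000\ \text{docs}}{3 \times 3600\ \text{s}}
  = \frac{8\,000\,000}{10\,800} \approx 741\ \text{docs/s} \\
\text{XML from filesystem:}\quad & \frac{24\,568\,997\ \text{docs}}{(5 \cdot 3600 + 55 \cdot 60 + 19.104)\ \text{s}}
  = \frac{24\,568\,997}{21\,319.104} \approx 1152\ \text{docs/s}
\end{aligned}
```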

Re: Dataimport performance

2010-12-15 Thread Tim Heckman
2010/12/15 Robert Gründler : > The data-config.xml looks like this (only 1 entity): > ... name="sf_unique_id"/> ... So there's one track entity with an artist sub-entity. My (admittedly rather l

Re: Problem with multicore

2010-12-15 Thread Tommaso Teofili
Hi Jörg, I think the first thing you should check is your Ubuntu's encoding, second one is file permissions (BTW why are you sudoing?). Did you try using the bash script under example/exampledocs named "post.sh" (use it like this: 'sh post.sh *.xml') Cheers, Tommaso 2010/12/15 Jörg Agatz > Hall

Re: Dataimport performance

2010-12-15 Thread Robert Gründler
I've already benchmarked the import with 500k records, once without the artists subquery and once without the join in the main query: Without subquery: 500k in 3 min 30 sec. Without join and without subquery: 500k in 2 min 30 sec. With subquery and with left join: 320k in 6 min 30 sec. so t
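Converting these benchmark figures to rates shows the cost of the per-row artist subquery: without it, throughput is roughly three times higher.

```latex
\begin{aligned}
\text{no subquery:}\quad & \frac{500\,000}{3 \cdot 60 + 30\ \text{s}} = \frac{500\,000}{210} \approx 2381\ \text{docs/s} \\
\text{no join, no subquery:}\quad & \frac{500\,000}{2 \cdot 60 + 30\ \text{s}} = \frac{500\,000}{150} \approx 3333\ \text{docs/s} \\
\text{subquery and left join:}\quad & \frac{320\,000}{6 \cdot 60 + 30\ \text{s}} = \frac{320\,000}{390} \approx 821\ \text{docs/s}
\end{aligned}
```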

Lower level filtering

2010-12-15 Thread Michael Owen
Hi all, I'm currently using Solr and I've got a question about filtering on a lower level than filter queries. We want to be able to restrict the documents that can possibly be returned for a user's query. From another system we'll get a list of document unique ids for the user which is all the d

Re: Lower level filtering

2010-12-15 Thread Stephen Green
On Wed, Dec 15, 2010 at 9:49 AM, Michael Owen wrote: > I'm currently using Solr and I've got a question about filtering on a lower > level than filter queries. > We want to be able to restrict the documents that can possibly be returned to > a users query. From another system we'll get a list of

Re: Lower level filtering

2010-12-15 Thread Savvas-Andreas Moysidis
It might not be practical in your case, but is it possible to get from that other system a list of ids the user is *not* allowed to see and somehow invert the logic in the filter? Regards, -- Savvas. On 15 December 2010 14:49, Michael Owen wrote: > > Hi all, > I'm currently using Solr and I've g

Re: Dataimport performance

2010-12-15 Thread Tim Heckman
The custom import I wrote is a java application that uses the SolrJ library. Basically, where I had sub-entities in the DIH config I did the mappings inside my java code. 1. Identify a subset or "chunk" of the primary id's to work on (so I don't have to load everything into memory at once) and put
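The chunked approach described above can be sketched as follows: take the primary ids a chunk at a time, fetch the sub-entity rows for the whole chunk with one query, and attach them to their parents in memory. A minimal Python sketch, assuming hypothetical in-memory stand-ins for the track table and the artist query; the real importer used SolrJ, and sending the resulting documents to Solr is not shown.

```python
from collections import defaultdict
from itertools import islice


def chunked(ids, size):
    """Yield successive lists of at most `size` ids, so the whole table never sits in memory."""
    it = iter(ids)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def build_docs(parent_rows, child_rows, child_field):
    """Attach child values to their parent row, replacing a DIH sub-entity.

    parent_rows: dicts with an "id" key (one per primary record in the chunk)
    child_rows:  (parent_id, value) pairs fetched with ONE query for the whole chunk
    """
    children = defaultdict(list)
    for parent_id, value in child_rows:
        children[parent_id].append(value)
    docs = []
    for row in parent_rows:
        doc = dict(row)
        doc[child_field] = children.get(row["id"], [])
        docs.append(doc)
    return docs


# Hypothetical in-memory stand-ins for the track table and the artist sub-query.
TRACKS = {1: "Intro", 2: "Outro", 3: "Bonus"}
ARTISTS = [(1, "A"), (1, "B"), (3, "C")]

all_docs = []
for chunk in chunked(sorted(TRACKS), 2):
    parents = [{"id": i, "title": TRACKS[i]} for i in chunk]
    # In a real import this would be one "... WHERE track_id IN (...)" query per chunk.
    kids = [(pid, name) for pid, name in ARTISTS if pid in chunk]
    all_docs.extend(build_docs(parents, kids, "artist"))
```

The number of database round trips grows with the number of chunks rather than with the number of rows.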

Custom scoring for searching geographic objects

2010-12-15 Thread Pavel Minchenkov
Hi, Please give me advice on how to create custom scoring. I need the documents in the results to be ordered by how popular each term in the document is (popular = how many times it appears in the index) and by the length of the document (fewer terms = higher in the search results). For example, index contai

RE: Lower level filtering

2010-12-15 Thread Michael Owen
That was a quick response, Steve! Sounds great! Much appreciated. I definitely think specifying a bit filter is something that many people may find useful. I'll have a look at SOLR-2052 too. Thanks again, Mike > Date: Wed, 15 Dec 2010 09:57:54 -0500 > Subject: Re: Lower level filtering > From

RE: Lower level filtering

2010-12-15 Thread Michael Owen
Good point, though the inverse could be true, where only a few documents are allowed and a big list still exists. Even in the middle ground, it's still going to be a long list of thousands. Thanks Mike > Date: Wed, 15 Dec 2010 14:58:33 + > Subject: Re: Lower level filtering > From: sav

Re: Lower level filtering

2010-12-15 Thread Erick Erickson
Here's the problem with what you're outlining: Solr/Lucene doc ids are NOT invariant, so the doc IDs you get from "the other system" will not be directly usable in the filter. But assuming the other system stores what you've defined as you could walk the index and get the doc IDs from that (See
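Erick's point in code form: the permission list has to be expressed in your own uniqueKey values and translated into internal doc ids against the current index. A minimal Python sketch, assuming a hypothetical `key_to_docid` mapping obtained by walking the index; inside Solr this would be a DocSet or Lucene Filter rather than a Python int.

```python
def allowed_bitset(allowed_keys, key_to_docid):
    """Build a bitset (as a Python int) of internal doc ids the user may see.

    key_to_docid maps uniqueKey values to Lucene's internal doc ids. Those ids can
    change when segments merge or deletes are expunged, so the mapping must be
    rebuilt against the current index; keys missing from the index are skipped.
    """
    bits = 0
    for key in allowed_keys:
        docid = key_to_docid.get(key)
        if docid is not None:
            bits |= 1 << docid
    return bits


def is_allowed(bits, docid):
    """True if the internal doc id is set in the bitset."""
    return ((bits >> docid) & 1) == 1
```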

Copying the index from one solr instance to another

2010-12-15 Thread Robert Gründler
Hi again, let's say you have two Solr instances which both have exactly the same configuration (schema, solrconfig, etc.). Could it cause any trouble if we import an index from a SQL database on Solr instance A, and copy the whole index to the datadir of Solr instance B (both Solr instances run

Re: Copying the index from one solr instance to another

2010-12-15 Thread Shawn Heisey
On 12/15/2010 10:05 AM, Robert Gründler wrote: Hi again, let's say you have 2 solr Instances, which have both exactly the same configuration (schema, solrconfig, etc). Could it cause any troubles if we import an index from a SQL database on solr instance A, and copy the whole index to the dat

Re: Copying the index from one solr instance to another

2010-12-15 Thread Robert Gründler
Thanks for your feedback. We can shut down both Solr servers for the duration of the copy process, and both Solr instances run the same version, so we should be OK. I'll let you know if we encounter any trouble. -robert On Dec 15, 2010, at 18:11 , Shawn Heisey wrote: > On 12/15/2010 10:05 AM,

Parenthesis in query string

2010-12-15 Thread Tommaso Teofili
Hi all, I've just noticed a strange behavior (or, at least, one I didn't expect) when adding unnecessary parentheses to a query. Using the lucene query parser in Solr I get no results with the query: * ((( NOT (text:"something"))) AND date <= 2010-12-15) * while I get the expected results when the

Re: Problem using curl in PHP to get Solr results

2010-12-15 Thread Dennis Gearon
I want to just pass the JSON through after qualifying the user's access to the site. I didn't want to spend the horsepower to receive it as PHP array syntax, run the risk of someone putting bad stuff in the contents and running 'exec()' on it, and then spend the extra horsepower to output i

Dates BC

2010-12-15 Thread Agethle, Matthias
Hi everyone, does the solr.TrieDateField support dates BC? I indexed negative dates and I'm able to query them, but if I store them, they show up as positive dates. Thanks Matthias

search for a number within a range, where range values are mentioned in documents

2010-12-15 Thread Arunkumar Ayyavu
Hi! I have a typical case wherein an attribute (in a DB record) can contain different ranges of numeric values. Let us say the range values in this attribute for "record1" are (2-4,5000-8000,45000-5,454,231,1000). As you can see this attribute can also contain isolated numeric values

Transparent redundancy in Solr

2010-12-15 Thread Tommaso Teofili
Hi all, me, Upayavira and other guys at Sourcesense have collected some Solr architectural views inside the presentation at [1]. For sure one can set up an architecture for failover and resiliency on the "search face" (search slaves with coordinators and distributed search) but I'd like to ask how

Re: search for a number within a range, where range values are mentioned in documents

2010-12-15 Thread Jonathan Rochkind
I'm not sure you're right that it will result in an out-of-memory error if the range is too large. I don't think it will, I think it'll be fine as far as memory goes, because of how Lucene works. Or do you actually have reason to believe it was causing you memory issues? Or do you just mean me
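The matching rule itself is simple when each range is kept as a pair of bounds instead of being expanded into every value it contains. A minimal Python sketch, assuming the comma-separated `lo-hi` / single-value format from the question; it illustrates the semantics only, not how Solr would index the attribute.

```python
def parse_ranges(spec):
    """Parse "2-4,5000-8000,454" into [(2, 4), (5000, 8000), (454, 454)]."""
    ranges = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            ranges.append((int(lo), int(hi)))
        else:
            value = int(part)
            ranges.append((value, value))
    return ranges


def contains(ranges, number):
    """True if `number` lies inside any inclusive (lo, hi) range."""
    return any(lo <= number <= hi for lo, hi in ranges)


RECORD1 = parse_ranges("2-4,5000-8000,454,231,1000")
```

If the bounds were indexed as two separate multi-valued fields, a query like lo <= n AND hi >= n could pair the lower bound of one range with the upper bound of another, so each (lo, hi) pair needs to stay together, for example one Solr document per range.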

Re: Copying the index from one solr instance to another

2010-12-15 Thread Rob Casson
just making sure that you're aware of the built-in replication: http://wiki.apache.org/solr/SolrReplication can pull the indexes, along with config files. cheers, rob 2010/12/15 Robert Gründler : > Hi again, > > let's say you have 2 solr Instances, which have both exactly the same > confi

Re: facet.pivot for date fields

2010-12-15 Thread Adeel Qureshi
Thanks Pankaj, that was useful to know. I haven't used the query stuff for facets before, so that was good to know, but the problem is still there because I want the hierarchical counts, which is exactly what facet.pivot does. E.g. I want to count for fieldC within fieldB and even fieldB w

Re: Problem using curl in PHP to get Solr results

2010-12-15 Thread Andrew McCombe
Hi You could use Solr's php serialized object output (wt=phps) and then convert it to json in your php: Regards Andrew McCombe On 15 December 2010 17:49, Dennis Gearon wrote: > I want to just pass the JSON through after qualifying the user's access to > the > site. > > > Didn't want to spend

Re: Problem using curl in PHP to get Solr results

2010-12-15 Thread Markus Jelsma
The GeoDistanceComponent triggers the problem. It may be an issue in the component but it could very well be a Solr issue. It seems you missed a very recent thread on this one. https://issues.apache.org/jira/browse/SOLR-2278 > I finally figured out how to use curl to GET results, i.e. just turn

Re: [DIH] Example for SQL Server

2010-12-15 Thread Adam Estrada
I got it to work! This is an excellent article on importing SQL Server data into your index. http://www.chrisumbel.com/article/lucene_solr_sql_server Adam On Wed, Dec 15, 2010 at 8:43 AM, Adam Estrada wrote: > Thanks All, > > Testing h

Re: Exceptions in Embedded Solr

2010-12-15 Thread Antoniya Statelova
I experienced this on an EmbeddedSolrServer which was running behind a tomcat process. After restarting the tomcat process 2-3 times (implying this also recreates the SolrServer every time as well) this issue went away but I don't know why it ever started. It looked like the searcher shutdown was n

[Adding] Entities when indexing a DB

2010-12-15 Thread Adam Estrada
All, I have successfully indexed a single entity, but when I try multiple entities the second is skipped altogether. Is there something wrong with my config file?

Re: [Adding] Entities when indexing a DB

2010-12-15 Thread Allistair Crossley
If mission.id and event.id have the same value, one will overwrite the other's indexed document. Your ids need to be unique across all documents. I usually have a field id_original that I map the table id to, and then for the id per entity I usually prefix it with the entity name in the value mapped to the schem
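Why the prefix matters: Solr replaces any existing document that has the same uniqueKey, so rows from two tables that share a numeric id collide. A minimal Python sketch that simulates this overwrite behaviour with a dict, assuming hypothetical (entity, table_id, title) rows; the `id_original` field follows the convention described above.

```python
def index_rows(rows, prefix_ids=True):
    """Simulate Solr's overwrite-by-uniqueKey behaviour for rows from several entities.

    rows: (entity_name, table_id, title) tuples. With prefix_ids=False the raw table
    id becomes the Solr id, so equal ids from different tables collide.
    """
    index = {}
    for entity, table_id, title in rows:
        solr_id = f"{entity}-{table_id}" if prefix_ids else str(table_id)
        # A later document with the same id replaces the earlier one, as an add does in Solr.
        index[solr_id] = {"id": solr_id, "id_original": table_id, "title": title}
    return index


# Hypothetical rows: one mission and one event that happen to share id 7.
ROWS = [("mission", 7, "Apollo"), ("event", 7, "Launch")]
```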

Re: Dates BC

2010-12-15 Thread Chris Hostetter
: does the solr.TrieDateField support dates BC? : I indexed negative dates and I'm able to query them, : but if I store them, they show up as postitive dates. Hmm... definitely seems to be a bug. I *think* this is another manifestation of SOLR-1899 (because of how the hokey formatting code us

Re: Facet same field with different prefix

2010-12-15 Thread Chris Hostetter
: Can I facet the same field twice with a different prefix as per example : below? Not at the moment. It should be possible if/when someone gets around to working on SOLR-2251... https://issues.apache.org/jira/browse/SOLR-2251 -Hoss

Re: Custom scoring for searching geographic objects

2010-12-15 Thread Grant Ingersoll
Have a look at http://lucene.apache.org/java/3_0_2/scoring.html on how Lucene's scoring works. You can override the Similarity class in Solr as well via the schema.xml file. On Dec 15, 2010, at 10:28 AM, Pavel Minchenkov wrote: > Hi, > Please give me advise how to create custom scoring. I ne

[ANN] General Availability of LucidWorks Enterprise

2010-12-15 Thread Grant Ingersoll
Lucid Imagination is pleased to announce the general availability of our Apache Solr/Lucene powered LucidWorks Enterprise (LWE). LWE is designed to make it easier for people to get up to speed on search by providing easier management, integration with libraries commonly used in building search

Re: Viewing query debug explanation with dismax and multicore

2010-12-15 Thread Chris Hostetter
: I am trying to debug my queries and see how scoring is done. I have 6 cores and : send the query to 6 shards and it's a dismax handler (with search on various : fields with different boostings). I enable debug, and view source but I'm unable : to see the explanations. I'm returning ID and sco

Re: limit the search results to one category

2010-12-15 Thread Chris Hostetter
: Subject: limit the search results to one category : References: <427522.34555...@web52907.mail.re2.yahoo.com> : <930238.38683...@web51308.mail.re2.yahoo.com> : In-Reply-To: <930238.38683...@web51308.mail.re2.yahoo.com> http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mail

Re: Problem with multicore

2010-12-15 Thread Chris Hostetter
: SimplePostTool: FATAL: Solr returned an error: : Unexpected_character_m_code_109_in_prolog_expected___at_rowcol_unknownsource_11 if you look at your solr log (or the HTTP response body, SimplePostTool only gives you the status line) you'll see the more human readable form of that error which

Re: [ANN] General Availability of LucidWorks Enterprise

2010-12-15 Thread Andy
Congrats! A couple questions: 1) Which version of Solr is this based on? 2) How is LWE different from standard Solr? How should one choose between the two? Thanks. --- On Wed, 12/15/10, Grant Ingersoll wrote: > From: Grant Ingersoll > Subject: [ANN] General Availability of LucidWorks Enterp

Re: Parenthesis in query string

2010-12-15 Thread Ahmet Arslan
I think this is related to http://search-lucene.com/m/lM9CXH2Pl7 Also, as explained here http://search-lucene.com/m/g4JmKSGMaI/ it is better to use the + and - operators rather than AND, OR, NOT and parentheses. http://wiki.apache.org/lucene-java/BooleanQuerySyntax Just for your information: There is no such s
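A common cause of the empty result is the nested group that contains only a negative clause: on its own it has nothing to subtract from and matches no documents. A minimal Python sketch that rewrites such a clause with the match-all query *:* and joins clauses with explicit + operators, assuming a hypothetical date range clause in place of the original `date <= 2010-12-15` condition.

```python
def match_all_except(clause):
    """Wrap a negated clause so it can be nested safely in a Lucene-syntax query.

    A parenthesised group containing only a NOT/- clause matches nothing;
    pairing it with the match-all query *:* gives it a set to subtract from.
    """
    return f"(*:* -{clause})"


def and_all(*clauses):
    """Join clauses with explicit + (required) operators instead of AND."""
    return " ".join(f"+{c}" for c in clauses)


# Hypothetical date range clause standing in for the original date condition.
query = and_all(match_all_except('text:"something"'), "date:[* TO 2010-12-15T00:00:00Z]")
```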

Re: [Adding] Entities when indexing a DB

2010-12-15 Thread Adam Estrada
Ahhh...I found that I did not set a dataSource name and when I did that and then referred each entity to that dataSource all went according to plan ;-) Solr Rocks! Ad

Re: nexus of synonyms and stemming, take 2

2010-12-15 Thread Chris Hostetter
: This is a fairly basic synonyms question: how does synonyms handle stemming? It's all a question of how your analysis chain is configured for the field type. If you have your stemming filter before your synonyms filter, then the synonyms.txt file needs to map the *stems* of the synonyms. If
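The effect of filter order can be shown with a toy analysis chain. A minimal Python sketch, assuming a hypothetical stemmer that only strips a trailing "s" and plain dict synonym maps; real Solr synonym handling (multi-word entries, expand=true/false) is more involved.

```python
def toy_stem(token):
    """Hypothetical stand-in for a real stemmer: drop one trailing 's'."""
    if token.endswith("s") and len(token) > 3:
        return token[:-1]
    return token


def analyze(tokens, synonyms, stem_first):
    """Apply a synonym map and the toy stemmer in the chosen order."""
    out = []
    for token in tokens:
        if stem_first:
            # Stemming runs first, so the synonym map only ever sees stems.
            stem = toy_stem(token)
            out.extend(synonyms.get(stem, [stem]))
        else:
            # Synonyms match the surface form; their output is then stemmed.
            out.extend(toy_stem(t) for t in synonyms.get(token, [token]))
    return out


SURFACE_MAP = {"cars": ["cars", "automobiles"]}
STEM_MAP = {"car": ["car", "automobile"]}
```

With stemming first, the surface-form map never matches ("cars" has already become "car"), while the stem-keyed map does.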

Re: can solrj swap cores?

2010-12-15 Thread Chris Hostetter
: One of our developers had initially tried swapping solr cores (e.g. core0 : and core1) using the solrj api, but it failed. (don't have the exact error) : He subsequently replaced the call with straight http (i.e. http client). : : Unfortunately I don't have the exact error in front of me... off

Re: Problem using curl in PHP to get Solr results

2010-12-15 Thread Dennis Gearon
I will look into the security and processor power implications of that. Good idea, thx. Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yours

Re: can solrj swap cores?

2010-12-15 Thread Tim Heckman
It's been working for me. One thing to look out for might be the url you're using in SolrUtil.getSolrServer()? The url you use for reindexing won't be the same as the one you use to swap cores. Make sure it's using "admin/cores" and not "production/admin/cores" or "reindex/admin/cores". Sorry if t
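The URL point above can be made concrete: the CoreAdmin handler hangs off the Solr root, not off an individual core. A minimal Python sketch that builds a SWAP request, assuming the default adminPath="/admin/cores" from the example solr.xml and the core names core0/core1 mentioned in the question; it builds the URL without sending it.

```python
from urllib.parse import urlencode


def core_swap_url(solr_root, core, other):
    """Build a CoreAdmin SWAP request URL.

    solr_root is the container root (e.g. http://host:8983/solr), not a core URL
    such as http://host:8983/solr/production.
    """
    params = urlencode({"action": "SWAP", "core": core, "other": other})
    return f"{solr_root.rstrip('/')}/admin/cores?{params}"
```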

Memory use during merges (OOM)

2010-12-15 Thread Burton-West, Tom
Hello all, Are there any general guidelines for determining the main factors in memory use during merges? We recently changed our indexing configuration to speed up indexing but in the process of doing a very large merge we are running out of memory. Below is a list of the changes and part of t

Re: Problem using curl in PHP to get Solr results

2010-12-15 Thread Dennis Gearon
Well, there were three problems: 1/ I was unknowingly saving the file as a 'complete web page' from Firefox. 2/ I had a small troubleshooting message being spit out after the JSON. 3/ My partner had output all the spatial Solr 'tiers' information, and there's a binary value in there that stops

Re: Next Word - Any Suggestions?

2010-12-15 Thread Sean O'Connor
Hi Christopher, One option comes to mind: shingles? I have not done anything with them yet, but they are on my radar for about a month from now. Speaking unencumbered by experience or substantial understanding, my guess is that shingles would be great for you if you can select shing

Thank you!

2010-12-15 Thread Adam Estrada
I just want to say that this listserv has been invaluable to a newbie like me ;-) I posted a question earlier today and literally 10 minutes later I got an answer that helped me solve my problem. This is proof that there is an experienced and energetic community behind this FOSS group of projects

Re: Dataimport performance

2010-12-15 Thread Lance Norskog
Can you do just one join in the top-level query? The DIH does not have a batching mechanism for these joins, but your database does. On Wed, Dec 15, 2010 at 7:11 AM, Tim Heckman wrote: > The custom import I wrote is a java application that uses the SolrJ > library. Basically, where I had sub-enti
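With one LEFT JOIN in the top-level query, each track comes back once per artist, so the importer has to fold consecutive rows into a single document. A minimal Python sketch, assuming hypothetical (track_id, title, artist_name) rows already ordered by track_id, with None for tracks that have no artist.

```python
from itertools import groupby
from operator import itemgetter


def fold_joined_rows(rows):
    """Collapse rows of a single LEFT JOIN, ordered by track_id, into one doc per track.

    rows: (track_id, title, artist_name) tuples; artist_name is None when the
    LEFT JOIN found no artist for that track.
    """
    docs = []
    for track_id, group in groupby(rows, key=itemgetter(0)):
        group = list(group)
        artists = [row[2] for row in group if row[2] is not None]
        docs.append({"id": track_id, "title": group[0][1], "artist": artists})
    return docs


# Hypothetical result set of: SELECT t.id, t.title, a.name ... LEFT JOIN ... ORDER BY t.id
ROWS = [(1, "Intro", "A"), (1, "Intro", "B"), (2, "Outro", None), (3, "Bonus", "C")]
```

This replaces one sub-entity query per track with a single query, at the cost of requiring the ORDER BY so that groupby sees each track's rows together.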