performance of json vs xml?

2011-12-11 Thread Jason Toy
I'm thinking about modifying my index process to use json because all my docs are originally in json anyway. Are there any performance issues if I insert json docs instead of xml docs? A colleague recommended that I stay with xml because solr is highly optimized for xml.

how to implement per doc weighting

2011-12-07 Thread Jason Toy
I've been reading the solr source code and made modifications by implementing a custom Similarity class. I want to apply a weight to the score by multiplying by a number based on whether the current doc has a certain term in it. So if the query was q=data_text:foo then the Similarity class would apply

Re: joins and filter queries effecting scoring

2011-12-05 Thread Jason Toy
> In this example the fq produces a docset that contains all user > documents that are active. This docset is used as filter during the > execution of the main query (q param), > so it only returns posts that contain the text hello for active users. > > Martijn > > On

getting lots of errors doing bulk insertion

2011-11-15 Thread Jason Toy
I've written a script that does bulk insertion from my database; it grabs chunks of 500 docs (out of 100 million) and inserts them into solr over http. I have 5 threads that are inserting from a queue. After each insert I issue a commit. Every 20 or so inserts I get this error message: Error:
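One thing that may be worth trying, assuming the errors are related to how often commits are issued (the truncated error text above does not confirm this), is to post each 500-doc chunk without a commit and commit once at the end. A minimal sketch of the batching side only; `post_batch` and `commit` are hypothetical callables standing in for whatever HTTP calls the script already makes:

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def chunks(docs: Iterable[dict], size: int = 500) -> Iterator[List[dict]]:
    """Yield successive lists of at most `size` docs."""
    it = iter(docs)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def bulk_insert(
    docs: Iterable[dict],
    post_batch: Callable[[List[dict]], None],  # hypothetical: sends one chunk to solr
    commit: Callable[[], None],                # hypothetical: issues a single commit
    size: int = 500,
) -> int:
    """Post every chunk, then commit once. Returns the number of chunks posted."""
    count = 0
    for batch in chunks(docs, size):
        post_batch(batch)
        count += 1
    commit()
    return count
```

With several worker threads, the same idea would mean only one place in the script issues commits rather than every thread after every chunk.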

Re: joins and filter queries effecting scoring

2011-10-27 Thread Jason Toy
Does anyone have any idea on this issue? On Tue, Oct 25, 2011 at 11:40 AM, Jason Toy wrote: > Hi Yonik, > > Without a Join I would normally query user docs with: > q=data_text:"test"&fq=is_active_boolean:true > > With joining users with posts, I get no results

Re: Limit by score? sort by other field

2011-10-27 Thread Jason Toy
I have a similar problem except I need to filter scores that are too high. On Oct 27, 2011 7:04 AM, Robert Stewart wrote: > BTW, this would be a good standard feature for SOLR, as I've run into this > requirement more than once. > > > On Oct 27, 2011, at 9:49 AM, karsten-s...@gmx.de wrote: > >> H

Re: joins and filter queries effecting scoring

2011-10-25 Thread Jason Toy
be the same as I would get from my original "q=data_text:"test"&fq=is_active_boolean:true", but with the ability to join with the Posts docs. On Tue, Oct 25, 2011 at 11:30 AM, Yonik Seeley wrote: > Can you give an example of the request (URL) you are sending to Solr? > > -Yonik >

joins and filter queries effecting scoring

2011-10-24 Thread Jason Toy
I have 2 types of docs, users and posts. I want to view all the docs that belong to certain users by joining posts and users together. I have to filter the users with a filter query of "is_active_boolean:true" so that the score is not affected, but since I do a join, I have to move the filter query

how can I get the exact phrase count from solr?

2011-10-08 Thread Jason Toy
I know that solr has functions like termfreq and that works fine for single words. How can I do the same count but for a phrase? When solr does a full text search with a phrase, does it actually search for the phrase or does it break it down into single words? If it is broken down into single wo

Re: what is the recommended way to store locations?

2011-10-06 Thread Jason Toy
lds. > > You may also want to use ngram fields instead of text if you want to still > match that San Fransisco oops typo. > > Otis > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch > Lucene ecosystem search :: http://search-lucene.com/ > >

what is the recommended way to store locations?

2011-10-06 Thread Jason Toy
In our current system, we have 3 fields for location: city, state, and country. People in our system search for one of those 3 strings. So a user can search for "San Francisco" or "California". In solr I store those 3 fields as strings and when a search happens I search with an OR statement ac
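The OR-across-three-fields search described above can be sketched as a small query builder. The field names `city_s`, `state_s`, and `country_s` are assumptions for illustration; the original post does not name them:

```python
from typing import Sequence


def location_query(
    term: str,
    fields: Sequence[str] = ("city_s", "state_s", "country_s"),  # hypothetical names
) -> str:
    """Build one OR clause per field, quoting the term so spaces survive."""
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return " OR ".join(f'{field}:"{escaped}"' for field in fields)


print(location_query("San Francisco"))
```

Quoting the term keeps "San Francisco" together as one value instead of letting the query parser split it at the space.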

composite Unique Keys?

2011-10-04 Thread Jason Toy
I have several different document types that I store. I use a serialized integer that is unique to the document type. If I use id as the uniqueKey, then there is a possibility to have colliding docs on the id. What would be the best way to have a unique id given I am storing my unique identifier
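One common way to avoid such collisions, offered here only as a sketch and not as the answer given on the list, is to build the uniqueKey value on the client by prefixing the per-type integer with the document type. The function name and the `-` delimiter are assumptions:

```python
def composite_id(doc_type: str, serial: int) -> str:
    """Combine the document type and its per-type integer into one key."""
    return f"{doc_type}-{serial}"


# Two docs with the same integer but different types no longer collide.
print(composite_id("User", 42))
print(composite_id("Post", 42))
```

The original integer can still be stored in its own field so that it stays available for lookups within one document type.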

I think I've found a bug with filter queries and joins

2011-09-30 Thread Jason Toy
I'm testing out the join functionality on the svn revision 1175424. I've found when I add a single filter query to a join it works fine, but when I do more than 1 filter query, the query does not return results. This single function query with a join returns results: http://127.0.0.1:8983/solr/se

Re: dismax with AND/OR combination

2011-09-29 Thread Jason Toy
Can dismax understand that query in a translated form? On Sep 29, 2011 10:01 PM, yingshou guo wrote: > you can't use this kind of query syntax against dismax query parser. > your query can be understood by standard query parser or edismax query > parser. "qt" request parameter is used by solr to

resource to see which versions build from trunk?

2011-09-24 Thread Jason Toy
Hi all, I am testing various versions of solr from trunk. I am finding that often the example doesn't build and I can't test out the version. Is there a resource that shows which versions build correctly so that we can test them out?

what are the disdvantages of using dynamic fields?

2011-09-23 Thread Jason Toy
Hi all, I'd like to know what the specific disadvantages of using dynamic fields in my schema are. About half of my fields are dynamic, but I could move all of them to be static fields. Will my searches run faster? If there are no disadvantages, can I just set all my fields to be dynamic? J

Re: OOM errors and -XX:OnOutOfMemoryError flag not working on solr?

2011-09-21 Thread Jason Toy
I am running the sun version: java version "1.6.0_26" Java(TM) SE Runtime Environment (build 1.6.0_26-b03) Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode) I get multiple Out of memory exceptions looking at my application and the solr logs, but my script doesn't get called the first

how to perform joins with function queries?

2011-09-20 Thread Jason Toy
I had a join query that was originally written as: {!join from=self_id_i to=user_id_i}data_text:hello and that works fine. I later added an fq filter: {!frange l=0.05 }div(termfreq(data_text,'hello'),max_i) and the query doesn't work anymore. If I do the fq by itself without the join the query w

OOM errors and -XX:OnOutOfMemoryError flag not working on solr?

2011-09-16 Thread Jason Toy
I have solr issues where I keep running out of memory. I am working on solving the memory issues (this will take a long time), but in the meantime, I'm trying to be notified when the error occurs. I saw with the jvm I can pass the -XX:OnOutOfMemoryError= flag and pass a script to run. Every time t

Re: how would I use the new join feature given my schema.

2011-09-15 Thread Jason Toy
Anyone know the query I would do to get the join to work? I'm unable to get it to work. On Wed, Sep 14, 2011 at 10:49 AM, Jason Toy wrote: > I've been reading the information on the new join feature and am not quite > sure how I would use it given my schema structure. I have

how would I use the new join feature given my schema.

2011-09-14 Thread Jason Toy
I've been reading the information on the new join feature and am not quite sure how I would use it given my schema structure. I have "User" docs and "BlogPost" docs and I want to return all BlogPosts that match the fulltext title "cool" that belong to Users that match the description "solr". Here

Re: How to plug a new ANTLR grammar

2011-09-13 Thread Jason Toy
I'd love to see the progress on this. On Tue, Sep 13, 2011 at 10:34 AM, Roman Chyla wrote: > Hi, > > The standard lucene/solr parsing is nice but not really flexible. I > saw questions and discussion about ANTLR, but unfortunately never a > working grammar, so... maybe you find this useful: > >

Re: using a function query with OR and spaces?

2011-09-13 Thread Jason Toy
I wrote the title wrong, it's a filter query, not a function query, thanks for the correction. The field is a string, I had tried fq=stats_s:"New York" before and that did not work, I'm puzzled as to why this didn't work. I tried out your b suggestion and that worked, thanks! On Tue, Sep 13, 2011 at

using a function query with OR and spaces?

2011-09-13 Thread Jason Toy
I had queries breaking on me when there were spaces in the text I was searching for. Originally I had: fq=state_s:New York and that would break. I found a workaround by using: fq={!raw f=state_s}New York My problem now is doing this with an OR query, this is what I have now, but it doesn't w
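For the OR case, one option in standard Lucene query syntax is to quote each value and group them under the field, then URL-encode the whole parameter. This is a sketch and not necessarily the suggestion made in the reply; the value "California" is made up for illustration:

```python
from typing import Iterable
from urllib.parse import urlencode


def or_filter(field: str, values: Iterable[str]) -> str:
    """Build field:("a" OR "b") with each value quoted so spaces are kept."""
    quoted = [
        '"' + v.replace("\\", "\\\\").replace('"', '\\"') + '"'
        for v in values
    ]
    return f"{field}:({' OR '.join(quoted)})"


fq = or_filter("state_s", ["New York", "California"])
print(fq)                     # state_s:("New York" OR "California")
print(urlencode({"fq": fq}))  # fq=state_s%3A%28%22New+York%22+OR+%22California%22%29
```

Encoding the parameter matters as much as the quoting: an unencoded space or quote in the URL can be altered before the query parser ever sees it.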

syntax for functions used in the fq parameter

2011-08-26 Thread Jason Toy
I'm trying to limit my data to only docs that have the word 'foo' appear at least once. I am trying to use: fq=termfreqdata,'foo'):[1+TO+*] but I get the syntax error: Caused by: org.apache.lucene.queryparser.classic.ParseException: Encountered " ":" ": "" at line 1, column 33. Was expecting one o
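A function cannot be used as a field name in front of a `:[... TO ...]` range. The `{!frange}` parser, which appears in the 2011-09-20 post in this archive, is the usual way to filter on a function's value. A minimal sketch that builds such a filter, assuming the field is called `data` and the term contains no quote characters:

```python
def min_termfreq_filter(field: str, term: str, minimum: int = 1) -> str:
    """Build an frange filter keeping docs where termfreq(field, term) >= minimum."""
    return f"{{!frange l={minimum}}}termfreq({field},'{term}')"


print(min_termfreq_filter("data", "foo"))  # {!frange l=1}termfreq(data,'foo')
```

The resulting string still needs URL-encoding before it is sent as the `fq` parameter.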

automatically dealing with out of memory exceptions

2011-08-24 Thread Jason Toy
After running a combination of different queries, my solr server eventually is unable to complete certain requests because it runs out of memory, which means I need to restart the server as it's basically useless with some queries working and not others. I am moving to a distributed setting soon, bu

Re: solr keeps dying every few hours.

2011-08-17 Thread Jason Toy
What can I do temporarily in this situation? It seems like I must eventually move to a distributed setup. I am sorting on dynamic float fields. On Wed, Aug 17, 2011 at 3:01 PM, Yonik Seeley wrote: > On Wed, Aug 17, 2011 at 5:56 PM, Jason Toy wrote: > > I've only set minimum m

Re: solr keeps dying every few hours.

2011-08-17 Thread Jason Toy
I've only set minimum memory and have not set maximum memory. I'm doing more investigation and I see that I have 100+ dynamic fields for my documents, not the 10 fields I quoted earlier. I also sort against those dynamic fields often, I'm reading that this potentially uses a lot of memory.

Re: solr keeps dying every few hours.

2011-08-17 Thread Jason Toy
imply 'restart' and start serving queries again. > > -Original Message- > From: Jason Toy [mailto:jason...@gmail.com] > Sent: Wednesday, August 17, 2011 5:15 PM > To: solr-user@lucene.apache.org > Subject: solr keeps dying every few hours. > > I have a large ec2 instance(

solr keeps dying every few hours.

2011-08-17 Thread Jason Toy
I have a large ec2 instance (7.5 gb ram); it dies every few hours with out of heap memory issues. I started upping the min memory required, currently I use -Xms3072M. I insert about 50k docs an hour and I currently have about 65 million docs with about 10 fields each. Is this already too much data

Re: bug in termfreq? was Re: is it possible to do a sort without query?

2011-08-08 Thread Jason Toy
> > > > > To understand why you'd need to reindex, you might want to read up on how > > lucene actually works, to get a basic understanding of how different > > indexing choices affect what is possible at query time. Lucene In Action > > is a pretty good book. >

Re: bug in termfreq? was Re: is it possible to do a sort without query?

2011-08-08 Thread Jason Toy
sort specifically by termfreq of a phrase? > > You cannot. What you can do is index multiple terms as one term using the > shingle filter. Take care, it can significantly increase your index size > and > number of unique terms. > > > > > > > > > On Mon, Aug 8, 2011 at

bug in termfreq? was Re: is it possible to do a sort without query?

2011-08-08 Thread Jason Toy
ou can use the standard query parser and pass q=*:* > > 2011/8/8 Jason Toy > > > I am trying to list some data based on a function I run, > > specifically termfreq(post_text,'indie music') and I am unable to do it > > without passing in data to the q p

is it possible to do a sort without query?

2011-08-08 Thread Jason Toy
I am trying to list some data based on a function I run, specifically termfreq(post_text,'indie music') and I am unable to do it without passing in data to the q parameter. Is it possible to get a sorted list without searching for any terms?
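The reply in this thread suggests the standard query parser with q=*:*, which matches every document and leaves the ordering to the sort parameter. A minimal sketch of the request parameters, using a single term because the replies above note that termfreq cannot count a phrase; the `rows` default is an assumption:

```python
from urllib.parse import urlencode


def sorted_listing_params(field: str, term: str, rows: int = 10) -> str:
    """Match all docs and order them by termfreq(field, term), highest first."""
    return urlencode({
        "q": "*:*",                                  # match-all query
        "sort": f"termfreq({field},'{term}') desc",  # sort by function value
        "rows": rows,
    })


print(sorted_listing_params("post_text", "indie"))
```

The returned string is the query part of a /select request; the host and handler path depend on the installation.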

getting result count only

2011-08-06 Thread Jason Toy
How can I run a query to get the result count only? I only need the count and so I don't need solr to send me all the results back.
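Setting rows=0 asks solr to return no documents while still reporting the total match count in `numFound`. A minimal sketch that builds the parameters and reads the count from a JSON response; the sample response body below is made up for illustration:

```python
import json
from urllib.parse import urlencode


def count_params(query: str) -> str:
    """Request zero rows in JSON format so only the header and count come back."""
    return urlencode({"q": query, "rows": 0, "wt": "json"})


def extract_count(body: str) -> int:
    """Read the total number of matches from a JSON response body."""
    return json.loads(body)["response"]["numFound"]


# Made-up response body, shaped like a rows=0 JSON reply.
sample = '{"response": {"numFound": 1234, "start": 0, "docs": []}}'
print(count_params("data_text:foo"))
print(extract_count(sample))  # 1234
```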

dealing with so many different sorting options

2011-07-29 Thread Jason Toy
As I'm using solr more and more, I'm finding that I need to do searches and then order by new criteria. So I am constantly adding new fields into solr and then reindexing everything. I want to know if adding in all this data into solr is the normal way to deal with sorting. I'm finding that I have

Re: saving timestamps in trunk broken?

2011-07-22 Thread Jason Toy
Hi Chris, you were correct, the field was getting set as a double. Thanks for the help. On Fri, Jul 22, 2011 at 7:03 PM, Jason Toy wrote: > This is the document I am posting: > Post > 75004824785129473Post name="at_d">2011-05-30T01:05:18ZNew > YorkUnited States nam

Re: saving timestamps in trunk broken?

2011-07-22 Thread Jason Toy
This is the document I am posting: Post 75004824785129473Post2011-05-30T01:05:18ZNew YorkUnited Stateshello world! In my schema.xml file I have these date fields, do I need more? On Fri, Jul 22, 2011 at 5:00 PM, Jason Toy wrote: > I haven't modified my schema in the older

Re: saving timestamps in trunk broken?

2011-07-22 Thread Jason Toy
I haven't modified my schema in the older solr or trunk solr. Is it required to modify my schema to support timestamps? On Fri, Jul 22, 2011 at 4:45 PM, Chris Hostetter wrote: > : In Solr 1.3.1 I am able to store timestamps in my docs so that I query > them. > : > : In trunk when I try to store a

saving timestamps in trunk broken?

2011-07-22 Thread Jason Toy
In Solr 1.3.1 I am able to store timestamps in my docs so that I can query them. In trunk when I try to store a doc with a timestamp I get a server error. Is there a different way I should store this data or is this a bug? Jul 22, 2011 7:20:14 PM org.apache.solr.update.processor.LogUpdateProcessor fi

problem searching on non standard characters

2011-07-22 Thread Jason Toy
How does one search for words with characters like # and +? I have tried searching solr with "#test" and "\#test" but all my results always come up with "test" and not "#test". Is this some kind of configuration option I need to set in solr?

Re: I found a sorting bug in solr/lucene

2011-07-19 Thread Jason Toy
According to that bug list, there are other characters that break the sorting function. Is there a list of safe characters I can use as a delimiter? On Mon, Jul 18, 2011 at 1:31 PM, Chris Hostetter wrote: > > : When I try to sort by a column with a colon in it like > : "scores:rails_f", solr ha

searching for google+

2011-07-18 Thread Jason Toy
How does one search for the term "google+" with solr? I noticed on twitter I can search for google+: http://search.twitter.com/search?q=google%2B (which uses lucene, not sure about solr) but searching on my copy of solr, I can't search for google+
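One part of this is independent of solr: in a URL query string a bare `+` decodes to a space, which is why the twitter link above uses `%2B`. A short sketch of the encoding step only. It does not address whether the field's analyzer keeps the `+` at index time, which is a separate question:

```python
from urllib.parse import unquote_plus, urlencode

# A literal '+' in the term must be percent-encoded before it goes into the URL.
print(urlencode({"q": "google+"}))  # q=google%2B

# Left unencoded, the server side decodes the '+' as a space.
print(repr(unquote_plus("google+")))  # 'google '
```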

Re: I found a sorting bug in solr/lucene

2011-07-18 Thread Jason Toy
whether that's actually prohibited, but that could > be your problem. > > ---- Nick > > > On 7/18/2011 8:10 AM, Jason Toy wrote: > >> Hi all, I found a bug that exists in the 3.1 and in trunk, but not in >> 1.4.1 >> >> When I try to sort by a column wit

I found a sorting bug in solr/lucene

2011-07-18 Thread Jason Toy
Hi all, I found a bug that exists in 3.1 and in trunk, but not in 1.4.1 When I try to sort by a column with a colon in it like "scores:rails_f", solr cuts off the column name from the colon forward so "scores:rails_f" becomes "scores" To test, I inserted this doc: In 1.4.1 I was able to

sorting by termfreq on trunk doesn't work?

2011-06-22 Thread Jason Toy
I am trying to use sorting by the termfreq function using the trunk code since termfreq was added in the 4.0 code base. I run this query: http://127.0.0.1:8983/solr/select/?q=librarian&sort=termfreq(all_lists_text,librarian)%20desc but I get: HTTP ERROR 500 Problem accessing /solr/select/. Reaso

example doesnt run from source?

2011-06-19 Thread Jason Toy
I'm trying to run the example app from the svn source, but it doesn't seem to work. I am able to run: java -jar start.jar and Jetty starts with: INFO::Started SocketConnector@0.0.0.0:8983 But then when I go to my browser and go to this address: http://localhost:8983/solr/ I get a 404 error. What

can't determine sort order?

2011-06-11 Thread Jason Toy
I am trying to use sorting by function on solr 3.2 and it doesn't work with termfreq. I do this query: /solr/select?q=test&qf=all_lists_text&defType=dismax&sort=termfreq%28all_lists_text%2Ctest%29+desc&rows=50 I get this error: Can't determine Sort Order: 'termfreq(description_text,'test')

Re: how can I return function results in my query?

2011-06-10 Thread Jason Toy
function you get the results > > of the function back? > > > > Can you show me an example query you run ? > > > > > > > > //http://wiki.apache.org/solr/FunctionQuery#idf > > > > On Thu, Jun 9, 2011 at 9:23 AM, Jason Toy wrote: > > > I

Re: how can I return function results in my query?

2011-06-10 Thread Jason Toy
Ahmet, that doesn't return the idf data in my results, unless I am doing something wrong. When you run any function you get the results of the function back? Can you show me an example query you run? //http://wiki.apache.org/solr/FunctionQuery#idf On Thu, Jun 9, 2011 at 9:23 AM, Jason Toy

how can I return function results in my query?

2011-06-09 Thread Jason Toy
I want to be able to run a query like idf(text, 'term') and have that data returned with my search results. I've searched the docs, but I'm unable to find how to do it. Is this possible and how can I do that?

found a bug in query parser upgrading from 1.4.1 to 3.1

2011-06-03 Thread Jason Toy
" becomes "scores" I can see in the lucene index that the data for scores:rails_f is in the document. For that reason I believe the bug is in solr and not in lucene. Jason Toy socmetrics http://socmetrics.com @jtoy