I'm thinking about modifying my index process to use JSON, because all my
docs are originally in JSON anyway. Are there any performance issues if I
insert JSON docs instead of XML docs? A colleague recommended that I
stay with XML because Solr is highly optimized for XML.
I've been reading the Solr source code and have made modifications by
implementing a custom Similarity class.
I want to apply a weight to the score by multiplying it by a number
based on whether the current doc contains a certain term.
So if the query were q=data_text:foo,
then the Similarity class would apply
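In Solr this would live inside the custom Similarity class, but the intended adjustment can be sketched in isolation. The marker term and boost factor below are hypothetical placeholders, not values from the post:

```python
# Framework-independent sketch of the intended score adjustment:
# multiply the base relevance score when the doc contains a marker term.
# "special" and 2.0 are illustrative placeholders.

def adjusted_score(base_score, doc_terms, marker_term="special", boost=2.0):
    """Boost the score if the doc's term set contains the marker term."""
    return base_score * boost if marker_term in doc_terms else base_score

print(adjusted_score(1.5, {"foo", "special"}))  # boosted: 3.0
print(adjusted_score(1.5, {"foo"}))             # unchanged: 1.5
```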
> In this example the fq produces a docset that contains all user
> documents that are active. This docset is used as filter during the
> execution of the main query (q param),
> so it only returns posts that contain the text hello for active users.
>
> Martijn
>
> On
I've written a script that does bulk insertion from my database; it
grabs chunks of 500 docs (out of 100 million) and inserts them into
Solr over HTTP. I have 5 threads that are inserting from a queue.
After each insert I issue a commit.
Every 20 or so inserts I get this error message:
Error:
Does anyone have any idea on this issue?
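The elided error text isn't visible here, but issuing a commit after every batch from 5 concurrent threads is a common cause of overlapping-commit failures; the usual first fix is to batch the inserts and commit once at the end (or rely on autoCommit). A minimal sketch of the chunking, with the HTTP call and commit omitted:

```python
# Split an iterable of docs into batches of 500 (the size from the post);
# each batch would be POSTed to Solr, with a single commit after the loop.

from itertools import islice

def chunks(docs, size=500):
    """Yield successive lists of up to `size` docs from an iterable."""
    it = iter(docs)
    while batch := list(islice(it, size)):
        yield batch

batches = list(chunks(range(1200), 500))
print([len(b) for b in batches])  # [500, 500, 200]
```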
On Tue, Oct 25, 2011 at 11:40 AM, Jason Toy wrote:
> Hi Yonik,
>
> Without a Join I would normally query user docs with:
> q=data_text:"test"&fq=is_active_boolean:true
>
> With joining users with posts, I get no results
I have a similar problem, except I need to filter out scores that are too high.
On Oct 27, 2011, at 7:04 AM, Robert Stewart wrote:
> BTW, this would be good standard feature for SOLR, as I've run into this
> requirement more than once.
>
>
> On Oct 27, 2011, at 9:49 AM, karsten-s...@gmx.de wrote:
>
>> H
be the same as I would get from my original
"q=data_text:"test"&fq=is_active_boolean:true", but with the ability to join
with the Posts docs.
On Tue, Oct 25, 2011 at 11:30 AM, Yonik Seeley
wrote:
> Can you give an example of the request (URL) you are sending to Solr?
>
> -Yonik
>
I have 2 types of docs: users and posts.
I want to view all the docs that belong to certain users by joining posts
and users together. I have to filter the users with a filter query of
"is_active_boolean:true" so that the score is not affected, but since I do a
join, I have to move the filter query
I know that Solr has functions like termfreq, and that works fine for single
words.
How can I do the same count but for a phrase? When Solr does a full-text
search with a phrase, does it actually search for the phrase or does it
break it down into single words? If it is broken down into single words,
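Client-side, the phrase count being asked about can be sketched as follows. This assumes simple whitespace tokenization, which may differ from what Solr's analyzer actually produces:

```python
# Count occurrences of a token sequence (a "phrase") in a text,
# using a sliding window over whitespace tokens.

def phrase_freq(text, phrase):
    """Return how many times `phrase` occurs as a contiguous token run."""
    tokens = text.lower().split()
    target = phrase.lower().split()
    n = len(target)
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == target)

print(phrase_freq("indie music and more indie music", "indie music"))  # 2
```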
>
> You may also want to use ngram fields instead of text if you want to still
> match that San Fransisco oops typo.
>
> Otis
>
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>
In our current system, we have 3 fields for location: city, state, and
country. People in our system search for one of those 3 strings.
So a user can search for "San Francisco" or "California". In Solr I store
those 3 fields as strings, and when a search happens I search with an OR
statement ac
I have several different document types that I store. I use a serialized
integer that is unique to the document type. If I use id as the uniqueKey,
then there is a possibility of colliding docs on the id. What would be
the best way to have a unique id, given I am storing my unique identifier
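One common approach (sketched here, not taken from the thread) is to build the uniqueKey as a composite of the doc type and the per-type id, so ids from different types can never collide. The separator is arbitrary:

```python
# Composite uniqueKey: prefix the per-type id with the doc type so the
# same integer id in different types maps to different keys.

def composite_key(doc_type, type_id):
    return f"{doc_type}-{type_id}"

print(composite_key("user", 42))  # user-42
print(composite_key("post", 42))  # post-42
```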
I'm testing out the join functionality on the svn revision 1175424.
I've found that when I add a single filter query to a join it works fine, but
when I do more than 1 filter query, the query does not return results.
This single function query with a join returns results:
http://127.0.0.1:8983/solr/se
Can dismax understand that query in a translated form?
On Sep 29, 2011, at 10:01 PM, yingshou guo wrote:
> you can't use this kind of query syntax against the dismax query parser.
> your query can be understood by the standard query parser or the edismax
> query parser. the "qt" request parameter is used by solr to
Hi all, I am testing various versions of Solr from trunk, and I am finding
that often the example doesn't build and I can't test out the version. Is
there a resource that shows which versions build correctly so that we can
test them out?
Hi all,
I'd like to know what the specific disadvantages are of using dynamic
fields in my schema. About half of my fields are dynamic, but I could
move all of them to be static fields. Will my searches run faster? If there
are no disadvantages, can I just set all my fields to be dynamic?
J
I am running the sun version:
java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)
I get multiple Out of memory exceptions looking at my application and the
solr logs, but my script doesn't get called the first
I had a join query that was originally written as :
{!join from=self_id_i to=user_id_i}data_text:hello
and that works fine. I later added an fq filter:
{!frange l=0.05 }div(termfreq(data_text,'hello'),max_i)
and the query doesn't work anymore. If I do the fq by itself without the
join, the query w
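What that frange filter computes can be sketched client-side: a doc passes when its termfreq-to-max ratio is at least the lower bound. The field names follow the post; the sample values are illustrative:

```python
# Client-side sketch of {!frange l=0.05}div(termfreq(data_text,'hello'),max_i):
# keep docs whose termfreq/max_i ratio meets the lower bound.

def passes_frange(termfreq, max_i, lower=0.05):
    return max_i > 0 and termfreq / max_i >= lower

print(passes_frange(3, 20))   # True  (0.15 >= 0.05)
print(passes_frange(1, 100))  # False (0.01 <  0.05)
```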
I have solr issues where I keep running out of memory. I am working on
solving the memory issues (this will take a long time), but in the meantime,
I'm trying to be notified when the error occurs. I saw with the jvm I can
pass the -XX:OnOutOfMemoryError= flag and pass a script to run. Every time
t
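For reference, the flag mentioned above is typically combined with an explicit heap cap; a sketch of the invocation (the script path is a placeholder, and quoting matters because the flag value contains a space):

```shell
# Illustrative JVM flags: cap the heap and run a notification script on OOM.
# /path/to/notify.sh is a placeholder; %p is replaced with the JVM's pid.
java -Xms3072M -Xmx3072M \
     -XX:OnOutOfMemoryError="/path/to/notify.sh %p" \
     -jar start.jar
```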
Anyone know the query I would do to get the join to work? I'm unable to get
it to work.
On Wed, Sep 14, 2011 at 10:49 AM, Jason Toy wrote:
> I've been reading the information on the new join feature and am not quite
> sure how I would use it given my schema structure. I have
I've been reading the information on the new join feature and am not quite
sure how I would use it given my schema structure. I have "User" docs and
"BlogPost" docs and I want to return all BlogPosts that match the fulltext
title "cool" that belong to Users that match the description "solr".
Here
I'd love to see the progress on this.
On Tue, Sep 13, 2011 at 10:34 AM, Roman Chyla wrote:
> Hi,
>
> The standard lucene/solr parsing is nice but not really flexible. I
> saw questions and discussion about ANTLR, but unfortunately never a
> working grammar, so... maybe you find this useful:
>
>
I wrote the title wrong; it's a filter query, not a function query. Thanks
for the correction.
The field is a string. I had tried fq=stats_s:"New York" before and that
did not work; I'm puzzled as to why it didn't.
I tried out your (b) suggestion and that worked, thanks!
On Tue, Sep 13, 2011 at
I had queries breaking on me when there were spaces in the text I was
searching for. Originally I had :
fq=state_s:New York
and that would break, I found a work around by using:
fq={!raw f=state_s}New York
My problem now is doing this with an OR query; this is what I have now, but
it doesn't w
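Besides the {!raw} workaround, quoting the value usually handles embedded spaces, and it composes with OR clauses. A sketch of building such clauses (the field and values are from the post):

```python
# Quote field values containing whitespace so the query parser treats
# them as one term, then join clauses with OR.

def fq_clause(field, value):
    v = f'"{value}"' if any(c.isspace() for c in value) else value
    return f"{field}:{v}"

print(fq_clause("state_s", "New York"))  # state_s:"New York"
clause = " OR ".join(fq_clause("state_s", v) for v in ["New York", "California"])
print(clause)  # state_s:"New York" OR state_s:California
```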
I'm trying to limit my data to only docs that have the word 'foo' appear at
least once.
I am trying to use:
fq=termfreq(data,'foo'):[1+TO+*]
but I get the syntax error:
Caused by: org.apache.lucene.queryparser.classic.ParseException: Encountered
" ":" ": "" at line 1, column 33.
Was expecting one o
After running a combination of different queries, my Solr server eventually
is unable to complete certain requests because it runs out of memory, which
means I need to restart the server, as it's basically useless with some
queries working and not others. I am moving to a distributed setting soon,
bu
What can I do temporarily in this situation? It seems like I must eventually
move to a distributed setup. I am sorting on dynamic float fields.
On Wed, Aug 17, 2011 at 3:01 PM, Yonik Seeley wrote:
> On Wed, Aug 17, 2011 at 5:56 PM, Jason Toy wrote:
> > I've only set set minimum m
I've only set minimum memory and have not set maximum memory. I'm doing
more investigation and I see that I have 100+ dynamic fields for my
documents, not the 10 fields I quoted earlier. I also sort against those
dynamic fields often, I'm reading that this potentially uses a lot of
memory.
simply 'restart' and start serving queries again.
>
> -Original Message-
> From: Jason Toy [mailto:jason...@gmail.com]
> Sent: Wednesday, August 17, 2011 5:15 PM
> To: solr-user@lucene.apache.org
> Subject: solr keeps dying every few hours.
>
> I have a large ec2 instance(
I have a large EC2 instance (7.5 GB RAM); it dies every few hours with
out-of-memory (heap) errors. I started upping the minimum memory required; currently I
use -Xms3072M .
I insert about 50k docs an hour and I currently have about 65 million docs
with about 10 fields each. Is this already too much data
>
> >
> > To understand why you'd need to reindex, you might want to read up on how
> > lucene actually works, to get a basic understanding of how different
> > indexing choices affect what is possible at query time. Lucene In Action
> > is a pretty good book.
>
sort specifically by termfreq of a phrase?
>
> You cannot. What you can do is index multiple terms as one term using the
> shingle filter. Take care, it can significantly increase your index size
> and
> number of unique terms.
>
> >
> >
> >
> > On Mon, Aug 8, 2011 at
You can use the standard query parser and pass q=*:*
>
> 2011/8/8 Jason Toy
>
> > I am trying to list some data based on a function I run ,
> > specifically termfreq(post_text,'indie music') and I am unable to do it
> > without passing in data to the q p
I am trying to list some data based on a function I run,
specifically termfreq(post_text,'indie music'), and I am unable to do it
without passing in data to the q parameter. Is it possible to get a sorted
list without searching for any terms?
How can I run a query to get the result count only? I only need the count,
so I don't need Solr to send me all the results back.
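Setting rows=0 is the usual way to get numFound without any documents; a sketch of building such a request URL (host and field name are placeholders):

```python
# Count-only request: rows=0 makes Solr return numFound with no docs.
from urllib.parse import urlencode

params = urlencode({"q": "data_text:foo", "rows": 0})
url = "http://localhost:8983/solr/select?" + params
print(url)
```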
As I'm using Solr more and more, I'm finding that I need to do searches and
then order by new criteria. So I am constantly adding new fields into Solr
and then reindexing everything.
I want to know if adding all this data into Solr is the normal way to
deal with sorting. I'm finding that I have
Hi Chris, you were correct; the field was getting set as a double. Thanks
for the help.
On Fri, Jul 22, 2011 at 7:03 PM, Jason Toy wrote:
> This is the document I am posting:
> Post
> 75004824785129473Post name="at_d">2011-05-30T01:05:18ZNew
> YorkUnited States nam
This is the document I am posting:
Post
75004824785129473Post2011-05-30T01:05:18ZNew
YorkUnited Stateshello world!
In my schema.xml file I have these date fields, do I need more?
On Fri, Jul 22, 2011 at 5:00 PM, Jason Toy wrote:
> I haven't modified my schema in the older
I haven't modified my schema in the older Solr or trunk Solr; is it required
to modify my schema to support timestamps?
On Fri, Jul 22, 2011 at 4:45 PM, Chris Hostetter
wrote:
> : In Solr 1.3.1 I am able to store timestamps in my docs so that I query
> them.
> :
> : In trunk when I try to store a
In Solr 1.3.1 I am able to store timestamps in my docs so that I can query them.
In trunk, when I try to store a doc with a timestamp I get a server error. Is
there a different way I should store this data, or is this a bug?
Jul 22, 2011 7:20:14 PM org.apache.solr.update.processor.LogUpdateProcessor
fi
How does one search for words with characters like # and +? I have tried
searching Solr with "#test" and "\#test", but all my results always come up
with "test" and not "#test". Is this some kind of configuration option I
need to set in Solr?
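Escaping on the client side handles Lucene's query-syntax characters, but note that even a correctly escaped '#' or '+' may be stripped by a standard text analyzer at index time, so matching them can also require a different field type (e.g. a string or whitespace-tokenized field). A sketch of the escaping step:

```python
# Backslash-escape Lucene query-syntax special characters in a term.
# '#' is not a Lucene special character, so it passes through unchanged.

SPECIAL = set('+-&|!(){}[]^"~*?:\\/')

def escape_query(term):
    return "".join("\\" + c if c in SPECIAL else c for c in term)

print(escape_query("#test"))    # #test
print(escape_query("google+"))  # google\+
```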
--
- sent from my mobile
6176064373
According to that bug list, there are other characters that break the
sorting function. Is there a list of safe characters I can use as a
delimiter?
On Mon, Jul 18, 2011 at 1:31 PM, Chris Hostetter
wrote:
>
> : When I try to sort by a column with a colon in it like
> : "scores:rails_f", solr ha
How does one search for the term "google+" with Solr? I noticed on Twitter I
can search for google+: http://search.twitter.com/search?q=google%2B (which
uses Lucene; not sure about Solr), but searching on my copy of Solr, I can't
search for google+
whether that's actually prohibited, but that could
> be your problem.
>
> ---- Nick
>
>
> On 7/18/2011 8:10 AM, Jason Toy wrote:
>
>> Hi all, I found a bug that exists in the 3.1 and in trunk, but not in
>> 1.4.1
>>
>> When I try to sort by a column wit
Hi all, I found a bug that exists in the 3.1 and in trunk, but not in 1.4.1
When I try to sort by a column with a colon in it, like
"scores:rails_f", Solr cuts off the column name from the colon
forward, so "scores:rails_f" becomes "scores".
To test, I inserted this doc:
In 1.4.1 I was able to
I am trying to use sorting by the termfreq function using the trunk code
since termfreq was added in the 4.0 code base.
I run this query:
http://127.0.0.1:8983/solr/select/?q=librarian&sort=termfreq(all_lists_text,librarian)%20desc
but I get:
HTTP ERROR 500
Problem accessing /solr/select/. Reaso
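One thing worth checking (an assumption, not confirmed in the thread) is that the term argument to termfreq is quoted; unquoted, it may be read as a field name rather than a literal term. A sketch of building the sort parameter with the term quoted:

```python
# URL-encode a function sort with the termfreq term argument quoted.
from urllib.parse import urlencode

params = urlencode({
    "q": "librarian",
    "sort": "termfreq(all_lists_text,'librarian') desc",
})
print(params)
```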
I'm trying to run the example app from the svn source, but it doesn't seem
to work. I am able to run :
java -jar start.jar
and Jetty starts with:
INFO::Started SocketConnector@0.0.0.0:8983
But then when I go to my browser and go to this address:
http://localhost:8983/solr/
I get a 404 error. What
I am trying to use sorting by function on Solr 3.2 and it doesn't work
with termfreq. I do this query:
/solr/select?q=test&qf=all_lists_text&defType=dismax&sort=termfreq%28all_lists_text%2Ctest%29+desc&rows=50
I get this error:
Can't determine Sort Order: 'termfreq(description_text,'test')
function you get the results
> > of the function back?
> >
> > Can you show me an example query you run ?
> >
> >
> >
> > //http://wiki.apache.org/solr/FunctionQuery#idf
> >
> > On Thu, Jun 9, 2011 at 9:23 AM, Jason Toy wrote:
> > > I
Ahmet, that doesn't return the idf data in my results, unless I am
doing something wrong. When you run a function, do you get the results
of the function back?
Can you show me an example query you run?
//http://wiki.apache.org/solr/FunctionQuery#idf
On Thu, Jun 9, 2011 at 9:23 AM, Jason Toy
I want to be able to run a query like idf(text, 'term') and have that data
returned with my search results. I've searched the docs, but I'm unable to
find how to do it. Is this possible, and how can I do it?
t; becomes "scores"
I can see in the lucene index that the data for scores:rails_f is in
the document. For that reason I believe the bug is in solr and not in
lucene.
Jason Toy
socmetrics
http://socmetrics.com
@jtoy