Field Types Question
I was wondering what the differences are between certain field types. For instance, what's the difference between the following?

integer / sint
float / sfloat
text / textzh

Also, if I have two dynamic fields, for instance *_facet and *_facet_mv, which both have the type set to string, does it really matter which one I use?

Thanks,
- Jake
Re: Field Types Question
Thanks Erik!

On Tue, Aug 12, 2008 at 1:58 AM, Erik Hatcher <[EMAIL PROTECTED]> wrote:
>
> On Aug 11, 2008, at 9:28 PM, Jake Conk wrote:
>>
>> I was wondering what are the differences in certain field types? For
>> instance what's the difference between the following?
>>
>> integer / sint
>> float / sfloat
>
> The difference is the internal representation of the String value of the
> term representing the numbers. The "s" prefix means the terms are sortable,
> in that they are in ascending _numerical_ (not textual) order within the
> index.
>
>> text / textzh
>
> Looks like maybe you're picking up an acts_as_solr schema (which uses
> text_zh, though). "zh" is the language code for Chinese... and that field
> type is likely configured to use a Chinese-savvy analyzer.
>
>> Also, if I have two dynamic fields for instance *_facet and *_facet_mv
>> which both have the type set to string does it really matter which one
>> I use?
>
> If the field types are identical, then no it won't matter which you use -
> the same thing will happen internally.
>
> Erik
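Erik's point about sortable terms can be illustrated without Solr at all. The sketch below is only an illustration and is not Solr's actual term encoding: it compares how number strings order textually with how they order numerically. Zero-padding is an assumed stand-in for whatever a sortable type does internally, and it only works here for non-negative integers.

```python
# Illustration only: this is NOT how Solr encodes sint/sfloat terms.
values = [9, 10, 100, 25]

# Plain string terms sort textually, character by character.
textual = sorted(str(v) for v in values)

# A sortable representation keeps textual order equal to numeric order.
# Zero-padding to a fixed width is one simple way to get that property
# for non-negative integers.
padded = sorted(f"{v:010d}" for v in values)
numeric = [int(p) for p in padded]

print(textual)   # textual order puts "100" before "25" and "9" last
print(numeric)   # padded terms come back in numeric order
```

This is why range queries and sorting on a numeric field generally need a representation whose term order agrees with numeric order.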
Searching Questions
1) I want to search only within a specific field, for instance `category`. Is there a way to do this?

2) When searching for multiple results, are the following identical, since "*_facet" and "*_facet_mv" both have their types set to string?

/select?q=tag_facet:%22John+McCain%22+OR+tag_facet:%22Barack+Obama%22
/select?q=tag_facet_mv:%22John+McCain%22+OR+tag_facet_mv:%22Barack+Obama%22

3) If I'm searching for something that is in a text field but I specify it as a facet string rather than a text type, would it still search within text fields or would it just limit the search to string fields?

4) Is there a page that will show me different querying combinations, or can someone post some more examples?

5) Has anyone else noticed that the data returned in PHP format (&wt=phps) doesn't unserialize? I am using PHP 5.3 w/ a nightly copy of Solr from last week.

Thanks,
- Jake
Static Fields vs Dynamic Fields
Is there a performance difference when using fields that are defined in my schema vs dynamic fields?
Simple Searching Question
Hello,

I inserted the following documents into Solr:

---
124
Jake Conk

125
Jake Conk
---

id is the only required integer field.
foobar_facet is a dynamic string field.

When I try to search for anything with the word Jake in it the following ways I get no results.

select?q=Jake
select?q=Jake*

I thought one of those two should work, but the only way I got it to work was by specifying which field "Jake" is in along with a wildcard.

select?q=foobar_facet:Jake*

1) Does this mean that for each field I would like to search for Jake, I would have to add that field to the query like I did above?

2) How would I search if I want to find the name Jake anywhere in the string? The documentation (http://lucene.apache.org/java/docs/queryparsersyntax.html) states that I cannot use a wildcard as the first character, such as *Jake*

Thanks,
- Jake
Re: Simple Searching Question
Hi Shalin,

"foobar_facet" is a dynamic field. It's defined in my schema like this:

I have the default search field set to text. Can I use more than one
default search field?

text

Thanks,
- Jake

On Thu, Aug 14, 2008 at 2:48 PM, Shalin Shekhar Mangar <[EMAIL PROTECTED]> wrote:
> Hi Jake,
>
> What is the type of the foobar_facet field in your schema.xml ?
> Did you add foobar_facet as the default search field?
>
> --
> Regards,
> Shalin Shekhar Mangar.
Re: Simple Searching Question
Rob,

Actually I am copying *_facet to text. I have the following for
copyField in my schema:

This is my field configuration in my schema:

Thanks,
- Jake

On Thu, Aug 14, 2008 at 5:49 PM, Rob Casson <[EMAIL PROTECTED]> wrote:
> you're likely not copyField-ing *_facet to text, and we'd need to see
> what type of field it is to see how it will be analyzed at both
> search/index time.
>
> the default schema.xml file is pretty well documented, so you might
> want to spend some time looking thru it, and reading the
> comments... lots of good info in there.
>
> cheers,
> rob
Querying Question
Hello,

I'm having trouble using the + operator. According to the documentation, if I put that operator in front of any term then it should find that term anywhere within the field.

So if I want all the records that have the name "Jake" in them, I started with a simple query that works:

?q=Jake

Now if I wanted to build on that and require that the word "test" be in the category name, I thought I would add the following:

?q=%22Jake%22+AND+category_facet:+test

But that doesn't work. As a matter of fact, whenever I specify exactly which field I want to apply the + or - operator to, it never works. Going back to the first example, which does work, if I were to do this:

?q=fullname_facet:+Jake

... then that would not return any results either.

The closest I've gotten is the following query, which sort of works:

?q=Jake+AND+%22+test%22

For the time being that query works, but it is not what I want because I need to specify exactly which fields are allowed to have Jake and +test. I don't want results returned when another field has the word "test" in it.

Am I doing something wrong? Please help.

The two fields below are both string fields and I'm using copyField to copy them to text fields:

---------
124
Jake Conk
My test category

125
Jake Conk
My test category
--

Thanks,
- Jake
Re: Querying Question
I thought that if I used copyField to copy my string field to a text field
then I could search for words within it and not be limited to the entire
content. Did I misunderstand that?

Thanks,
- Jake

On Thu, Aug 21, 2008 at 5:53 PM, Erik Hatcher <[EMAIL PROTECTED]> wrote:
>
> On Aug 21, 2008, at 7:33 PM, Jake Conk wrote:
>>
>> I'm having trouble using the + operator. According to the
>> documentation if I put that operator in front of any term then it
>> should find that term anywhere within the field.
>
> Be sure to look at this documentation:
> <http://lucene.apache.org/java/2_3_2/queryparsersyntax.html>
>
>> Now if I wanted to grow on that and add that the name "test" must be
>> in the category name I thought I would add the following:
>>
>> ?q=%22Jake%22+AND+category_facet:+test
>
> A couple of things here... + goes in front of the field selector in this
> case. +category_facet:test, but also with AND in there the + is
> superfluous. AND automatically makes both sides of it required. Another
> thing to be careful of - category_facet is likely a "string" field, and thus
> it can only be queried for exactly the entire content of the field, not
> words within it.
>
>> The two fields below are both string fields and I'm using copyField to
>> copy them to text fields:
>
> Then in your example you'll want to be sure to query on the text fields, not
> the string ones.
>
> Erik
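One detail worth checking when such queries are pasted into a URL: in a URL query string a literal `+` is decoded as a space, so a required-term `+` has to be sent as `%2B`. The sketch below uses Python's standard `urllib.parse` to build the request path. `category_text` is a hypothetical name for the text copy of `category_facet`, and `/select` is taken from the queries earlier in this archive.

```python
from urllib.parse import urlencode, parse_qs

def select_url(query: str) -> str:
    """Return a /select path with the Lucene query safely URL-encoded."""
    return "/select?" + urlencode({"q": query})

# Fielded query on the (hypothetical) text copy of the category field.
url = select_url("Jake AND category_text:test")
print(url)

# A raw '+' in a URL decodes to a space, so the operator is lost...
raw_decoded = parse_qs("q=+category_text:test")["q"][0]
print(repr(raw_decoded))

# ...whereas an encoded '+' (%2B) survives the round trip.
encoded = select_url("+category_text:test")
round_trip = parse_qs(encoded.split("?", 1)[1])["q"][0]
print(repr(round_trip))
```

With AND between the two clauses the `+` is redundant anyway, as Erik notes; the encoding issue only matters when the operator is actually wanted.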
Querying Greater Than and Less Than
Hello,

I was trying to figure out how to query for values greater than and less than a given number. The closest solution I could find was using the range format:

field:[x TO z]

This works for "greater than" queries, but how would I query for all items less than 10, given that some items have a negative number and should be selected as well? The closest thing I've come up with is this:

field:[0 TO 10]

I don't know what the smallest negative number is, but I want to be able to get all of the items. Is there a way to do that?

Thanks,
- Jake
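The range syntax also accepts `*` as an open end, as in the `replies_i:[50 TO *]` query that appears later in this archive, so "up to 10" can be written without knowing the smallest value. A small helper, sketched here in Python with hypothetical function and field names, makes the two open-ended forms explicit. Whether negative numbers order correctly is likely to depend on the field using one of the sortable numeric types discussed in the field types thread.

```python
from typing import Optional

def range_clause(field: str, low: Optional[int] = None,
                 high: Optional[int] = None) -> str:
    """Build an inclusive Lucene range clause; None means an open end ('*')."""
    lo = "*" if low is None else str(low)
    hi = "*" if high is None else str(high)
    return f"{field}:[{lo} TO {hi}]"

print(range_clause("field", high=10))   # everything up to and including 10
print(range_clause("field", low=50))    # everything from 50 upwards
```

Square brackets are inclusive, so for integers a strict "less than 10" can be expressed as `range_clause("field", high=9)`.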
How does Solr search when a field is not specified?
Hello,

I was wondering how Solr searches when a field is not specified, just a query. Say for example I have the following:

?q="Jake" AND "Test"

I have a mixture of integer, string, and text columns. Some indexed, some stored, and some string fields copied to text fields.

Say I have a string field with the value "Jake is Testing" which is also copied to a text field. If I did not copyField that string field to a text field, would the above query return no results if the words "Jake" and "Test" are not found anywhere else, since we cannot do fulltext searches on string fields?

Lastly, is there a limit on how many characters can be in a string or text field?

Thanks,
- Jake
Re: How does Solr search when a field is not specified?
Thanks Otis! :)

On Tue, Aug 26, 2008 at 10:47 PM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> Jake,
>
> Yes, that field would have to be some kind of an analyzed field (e.g. text),
> not string if you wanted that query to match "Jake is Testing" input. There
> are no built-in Lucene or Solr-specific limits on field lengths. There is
> one parameter called maxFieldLength in Solr's solrconfig.xml, I think,
> which tells Lucene how many tokens to consider for indexing. If you don't
> want that limit, increase that parameter's value to the max.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
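The difference Otis describes between a string field and an analyzed field can be sketched with a toy model. This is not Solr's analyzer chain: the whitespace-and-lowercase "analyzer" below is an assumption made purely for illustration, and the function names are hypothetical.

```python
def string_field_matches(stored: str, term: str) -> bool:
    # A string field is indexed as one single term: the whole value.
    return stored == term

def text_field_matches(stored: str, term: str) -> bool:
    # Toy analyzer: split on whitespace and lowercase. Real analyzers
    # may also stem, remove stop words, and so on.
    tokens = {t.lower() for t in stored.split()}
    return term.lower() in tokens

value = "Jake is Testing"
print(string_field_matches(value, "Jake"))             # whole-value match only
print(string_field_matches(value, "Jake is Testing"))
print(text_field_matches(value, "Jake"))               # word-level match
print(text_field_matches(value, "Test"))               # no stemming in this toy
```

In this toy model `Test` does not match `Testing`; that match relies on a stemming filter being part of the real text field's analyzer.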
copyField: String vs Text Field
Hello,

I was wondering if there is an added advantage in using copyField to copy a string field to a text field?

If the field is copied to a text field, then why not just make the field a text field and eliminate copying its data?

If you are going to use full text searching on that field, which you can't do with string fields, wouldn't it make sense to just keep it a text field, since it has the same abilities as a string field and more?

... Or is the reason that string fields have better performance on matching exact strings than text fields?

Thanks,
- Jake
Re: copyField: String vs Text Field
Yonik,

Thanks for the reply. Does that mean that if I were to edit the data then
the field it was copied to will not be updated? I assume it does get
deleted if I delete the record, right?

I understand how it can make searching simpler by copying fields to one,
but would that really make it faster? How?

Thanks,
- Jake

On Wed, Aug 27, 2008 at 2:22 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> Jake, copyField exists to decouple document values (on the update
> side) from how they are indexed.
>
> From the example schema:
>
> -Yonik
Re: copyField: String vs Text Field
Hi Walter,

What do you mean when you say you stemmed and stopped your title field?

Thanks,
- Jake

On Wed, Aug 27, 2008 at 7:41 PM, Walter Underwood <[EMAIL PROTECTED]> wrote:
> On 8/27/08 5:54 PM, "Yonik Seeley" <[EMAIL PROTECTED]> wrote:
>>
>> That's really only one use case though... the other being to have a
>> single stored field that is analyzed multiple different ways.
>
> We are the other use case. We take a title and put it in three
> fields: one merely lowercased, one stemmed and stopped, and one
> phonetic. At query time, we search all three with decreasing
> weights. An exact match is weighted more than a stemmed and
> stopped match, and so on.
>
> wunder
> --
> Search Guy, Netflix
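Walter's "search all three with decreasing weights" approach can be sketched as a query built with the Lucene `^` boost operator. The field names and weights below are hypothetical, chosen only to illustrate the idea; the message does not say what Netflix actually used.

```python
def weighted_title_query(term: str, weights: dict) -> str:
    """OR together one clause per field, each with a Lucene '^' boost."""
    clauses = [f"{field}:{term}^{boost}" for field, boost in weights.items()]
    return " OR ".join(clauses)

# Hypothetical field names and weights, ordered from most to least exact.
weights = {"title_exact": 4, "title_stemmed": 2, "title_phonetic": 1}
query = weighted_title_query("halo", weights)
print(query)
```

Each of the three fields would be populated from the same source title with copyField, each with its own analyzer, so that a document matching in the more exact field contributes more to the score.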
Searching Question
Hello,

We are using Solr for our new forums search feature. If possible, when searching for the word "Halo" we would like threads that contain the word "Halo" the most, with the fewest posts in that thread, to have a higher score.

For instance, if we have a thread with 10 posts and the word "Halo" shows up 5 times, then that should have a lower score than a thread that has the word "Halo" 3 times within its posts and has 5 replies. Basically, the thread in which the search string appears most frequently relative to the number of posts in the thread should be the one with the highest score.

Is something like this possible?

Thanks,
- JC
Re: Searching Question
Grant,

Each post is its own document, but I can merge them all into a single
document under one thread if that will allow me to do what I want.
The number of replies is stored both in Solr and the DB.

Thanks,
- JC

On Fri, Sep 26, 2008 at 5:24 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
> Is a thread and all of its posts a single document? In other words, how
> are you modeling your posts as Solr documents? Also, where are you keeping
> track of the number of replies? Is that in Solr or in a DB?
>
> -Grant
>
> --
> Grant Ingersoll
> http://www.lucidimagination.com
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
Re: Searching Question
How would I write a custom Similarity factor that overrides the TF
function? Is there some documentation on that somewhere?

On Sat, Sep 27, 2008 at 5:14 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
>
> On Sep 26, 2008, at 2:10 PM, Otis Gospodnetic wrote:
>
>> It might be easiest to store the thread ID and the number of replies in
>> the thread in each post Document in Solr.
>
> Yeah, but that would mean updating every document in a thread every time a
> new reply is added.
>
> I still keep going back to the solution as putting all the replies in a
> single document, and then using a custom Similarity factor that overrides
> the TF function and/or the length normalization. Still, this suffers from
> having to update the document for every new reply.
>
> Let's take a step back...
>
> Can I ask why you want the scoring this way? What have you seen in your
> results that leads you to believe it is the correct way? Note, I'm not
> trying to convince you it's wrong, I just want to better understand what's
> going on.
>
>> Otherwise it sounds like you'll have to combine some search results or
>> data post-search.
>>
>> Otis
>> --
>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
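The ranking asked for in the original question comes down to simple arithmetic: occurrences of the term divided by the number of posts. The sketch below shows only that arithmetic, applied after the search as Otis suggests; it is not a Lucene Similarity implementation, and the thread names and the function are hypothetical.

```python
def term_density(term_count: int, post_count: int) -> float:
    """Occurrences of the search term per post in a thread."""
    if post_count <= 0:
        raise ValueError("post_count must be positive")
    return term_count / post_count

# The two threads from the original question as (occurrences, posts),
# treating the second thread's 5 replies as 5 posts.
threads = {"thread_a": (5, 10), "thread_b": (3, 5)}
ranked = sorted(threads, key=lambda name: term_density(*threads[name]),
                reverse=True)
print(ranked)
```

Here thread_b (3 / 5 = 0.6) ranks above thread_a (5 / 10 = 0.5), which matches the ordering the question asks for.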
Stored field question
Hello,

I have a field with the following definition...

I'm not storing the data because I never need to retrieve it, but each *_t_ns_mv field is indexed and has a specific boost value...

I added this field with the word "test" as the value, but when I search for "test" no results come up from my unstored field unless I put the word "test" in a field that is stored.

Do I have a misunderstanding of how to use stored/unstored fields? Can someone help me clarify it?

Thanks,
- Jake C
Querying Ranges Problem
I have the following query:

q=(+thread_title_t:test OR +posts_t_ns_mv:test) AND locked_i:0 AND replies_i:[50 TO *]

replies_i is an integer field, and the range is meant to return documents that have a value of 50 or greater, but the problem is I'm getting back results whose replies_i value is less than 50. I tried other things like "replies_i:[50 TO 1000]" but I'm still getting results with the replies_i field under 50.

Am I doing this wrong, or is the other part of my query somehow affecting the replies_i value? I tried removing the other part of the query but I still get unexpected results.

Please help!

Thanks,
- Jake C.
Trying to exclude integer field with certain numbers
Hello,

I am trying to exclude certain records from my search results by specifying in my query which ones I don't want back, but it's not working as expected. Here is my query:

+message:test AND (-thread_id:123 OR -thread_id:456 OR -thread_id:789)

So basically I just want anything back that has the word "test" anywhere in the message text field and does not have the thread id 123, 456, or 789.

When I execute that query I get no results back. When I just execute +message:test then I get results back, some of them with the thread ids I listed above, but when I try to exclude them like that it doesn't work.

Does anyone have any idea how to fix this?

Thanks,
- Jake C.
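A likely cause, assuming the standard Lucene query parser: a parenthesized group that contains only prohibited clauses has nothing positive to select from and so matches no documents, and ANDing it with +message:test then yields nothing. The usual form attaches the prohibited clauses directly to the positive one. The Python helper below is a hypothetical sketch of building such a query; the field names are taken from the message above.

```python
from typing import Iterable

def exclude_threads(base_query: str, thread_ids: Iterable[int]) -> str:
    """Append one prohibited (-) clause per thread id to a positive query."""
    exclusions = " ".join(f"-thread_id:{tid}" for tid in thread_ids)
    return f"{base_query} {exclusions}".strip()

query = exclude_threads("+message:test", [123, 456, 789])
print(query)
```

The resulting query requires "test" in message and prohibits each listed thread id, with no nested group of purely negative clauses.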