Field Types Question

2008-08-11 Thread Jake Conk
I was wondering what are the differences in certain field types? For
instance what's the difference between the following?

integer / sint
float / sfloat
text / textzh

Also, if I have two dynamic fields for instance *_facet and *_facet_mv
which both have the type set to string does it really matter which one
I use?

Thanks,
- Jake


Re: Field Types Question

2008-08-12 Thread Jake Conk
Thanks Erik!

On Tue, Aug 12, 2008 at 1:58 AM, Erik Hatcher
<[EMAIL PROTECTED]> wrote:
>
> On Aug 11, 2008, at 9:28 PM, Jake Conk wrote:
>>
>> I was wondering what are the differences in certain field types? For
>> instance what's the difference between the following?
>>
>> integer / sint
>> float / sfloat
>
> The difference is the internal representation of the String value of the
> term representing the numbers.  The "s" prefix means the terms are sortable,
> in that they are in ascending _numerical_ (not textual) order within the
> index.
>
>> text / textzh
>
> Looks like maybe you're picking up an acts_as_solr schema (which uses
> text_zh, though).  "zh" is the language code for Chinese... and that field type
> is likely configured to use a Chinese-savvy analyzer.
>
>> Also, if I have two dynamic fields for instance *_facet and *_facet_mv
>> which both have the type set to string does it really matter which one
>> I use?
>
> If the field types are identical, then no it won't matter which you use -
> the same thing will happen internally.
>
>Erik
>
>
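
Erik's point about sortable terms can be illustrated with a small Python sketch. This is a toy model, not Solr's actual sint encoding: zero-padding stands in for whatever representation the sortable types use internally, and the `encode_sortable` helper and its width are assumptions made up for illustration (it only handles non-negative integers).

```python
def encode_sortable(n, width=10):
    """Hypothetical stand-in for a sortable encoding: zero-pad so that
    string order matches numeric order (non-negative integers only)."""
    if n < 0:
        raise ValueError("this toy encoding only handles non-negative integers")
    return str(n).zfill(width)

values = [100, 9, 50, 7]

# Plain string terms sort textually, character by character.
plain_order = sorted(str(v) for v in values)

# Padded terms sort in the same order as the numbers they represent.
sortable_order = [int(t) for t in sorted(encode_sortable(v) for v in values)]

print(plain_order)
print(sortable_order)
```

The plain terms come out in textual order ("100" before "50" before "7"), while the padded terms come out in numeric order, which is the property the "s" types provide within the index.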


Searching Questions

2008-08-12 Thread Jake Conk
1) I want to search only within a specific field, for instance
`category`. Is there a way to do this?

2) When searching for multiple results, are the following identical
since "*_facet" and "*_facet_mv" have their types both set to string?

/select?q=tag_facet:%22John+McCain%22+OR+tag_facet:%22Barack+Obama%22
/select?q=tag_facet_mv:%22John+McCain%22+OR+tag_facet_mv:%22Barack+Obama%22

3) If I'm searching for something that is in a text field but I
specify it as a facet string rather than a text type would it still
search within text fields or would it just limit the search to string
fields?

4) Is there a page that will show me different querying combinations
or can someone post some more examples?

5) Anyone else notice returning back the data in php (&wt=phps)
doesn't unserialize? I am using PHP 5.3 w/ a nightly copy of Solr from
last week.

Thanks,
- Jake
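
For question 1, the Lucene query syntax restricts a term to one field with `field:term`. A minimal Python sketch of building such URLs with the standard library follows; the `select_url` helper and the value `books` are made up for illustration, and only the `/select` path and the `q` parameter come from the examples above.

```python
from urllib.parse import parse_qs, urlencode

def select_url(query, base="/select"):
    """Build a select URL with the query properly percent-encoded."""
    return base + "?" + urlencode({"q": query})

# Restrict a term to one field with the field:term syntax.
single_field = select_url("category:books")

# Two phrases in the same field, combined with OR.
two_tags = select_url('tag_facet:"John McCain" OR tag_facet:"Barack Obama"')

# Decoding the query string gives back the original query text.
round_trip = parse_qs(two_tags.split("?", 1)[1])["q"][0]

print(single_field)
print(two_tags)
print(round_trip)
```

Letting `urlencode` do the escaping avoids hand-writing `%22` and `+` and keeps quotes and spaces intact on the server side.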


Static Fields vs Dynamic Fields

2008-08-12 Thread Jake Conk
Is there a performance difference when using fields that are defined
in my schema vs dynamic fields?


Simple Searching Question

2008-08-14 Thread Jake Conk
Hello,

I inserted the following documents into Solr:

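---

<add>
<doc>
 <field name="id">124</field>
 <field name="foobar_facet">Jake Conk</field>
</doc>
<doc>
 <field name="id">125</field>
 <field name="foobar_facet">Jake Conk</field>
</doc>
</add>

---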

id is the only required integer field.
foobar_facet is a dynamic string field.

When I try to search for anything with the word Jake in it the
following ways I get no results.


select?q=Jake
select?q=Jake*


I thought one of those two should work but the only way I got it to
work was by specifying which field "Jake" is in along with a wild
card.


select?q=foobar_facet:Jake*


1) Does this mean for each field I would like to search if Jake exists
I would have to add each field like I did above to the query?

2) How would I search if I want to find the name Jake anywhere in the
string? The documentation
(http://lucene.apache.org/java/docs/queryparsersyntax.html) states
that I cannot use a wildcard as the first character such as *Jake*

Thanks,
- Jake


Re: Simple Searching Question

2008-08-14 Thread Jake Conk
Hi Shalin,

"foobar_facet" is a dynamic field. It's defined in my schema like this:



I have the default search field set to text. Can I use more than one
default search field?

<defaultSearchField>text</defaultSearchField>

Thanks,
- Jake


On Thu, Aug 14, 2008 at 2:48 PM, Shalin Shekhar Mangar
<[EMAIL PROTECTED]> wrote:
> Hi Jake,
>
> What is the type of the foobar_facet field in your schema.xml ?
> Did you add foobar_facet as the default search field?
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: Simple Searching Question

2008-08-14 Thread Jake Conk
Rob,

Actually I am copying *_facet to text. I have the following for
copyField in my schema:

 
 


This is my field configuration in my schema:

 
   
   
   
   


   
   
   
   
   
   
   
   
   

   
   
   
   
   
   
 

Thanks,
- Jake



On Thu, Aug 14, 2008 at 5:49 PM, Rob Casson <[EMAIL PROTECTED]> wrote:
> you're likely not copyField-ing *_facet to text, and we'd need to see
> what type of field it is to see how it will be analyzed at both
> search/index time.
>
> the default schema.xml file is pretty well documented, so you might
> want to spend some time looking thru it, and reading the
> comments... lots of good info in there.
>
> cheers,
> rob
>


Querying Question

2008-08-21 Thread Jake Conk
Hello,

I'm having trouble using the + operator. According to the
documentation if I put that operator in front of any term then it
should find that term anywhere within the field.

So if I want all the records that have the name "Jake" in them I
started with a simple query that works:


?q=Jake


Now if I wanted to grow on that and add that the name "test" must be
in the category name I thought I would add the following:


?q=%22Jake%22+AND+category_facet:+test


But that doesn't work. As a matter of fact, whenever I specify exactly
which field I want to use the + or - operator it never works. So going
back to the first example that does work but if I were to do this:


?q=fullname_facet:+Jake


... then that would not return back any results either. The closest
I've gotten was the following query which sorta works:


?q=Jake+AND+%22+test%22


For the time being that query works but is not what I want because I
need to specify exactly which fields are allowed to have Jake and
+test. I don't want results returned when another field has the word
"test" in it. Am I doing something wrong? Please help.

The two fields below are both string fields and I'm using copyField to
copy them to text fields:

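---------

<add>
<doc>
 <field name="id">124</field>
 <field name="fullname_facet">Jake Conk</field>
 <field name="category_facet">My test category</field>
</doc>
<doc>
 <field name="id">125</field>
 <field name="fullname_facet">Jake Conk</field>
 <field name="category_facet">My test category</field>
</doc>
</add>

---------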

Thanks,
- Jake


Re: Querying Question

2008-08-21 Thread Jake Conk
I thought if I used <copyField> to copy my string field to a text
field then I could search for words within it and not be limited to the
entire content. Did I misunderstand that?

Thanks,
- Jake

On Thu, Aug 21, 2008 at 5:53 PM, Erik Hatcher
<[EMAIL PROTECTED]> wrote:
>
> On Aug 21, 2008, at 7:33 PM, Jake Conk wrote:
>>
>> I'm having trouble using the + operator. According to the
>> documentation if I put that operator in front of any term then it
>> should find that term anywhere within the field.
>
> Be sure to look at this documentation:
> <http://lucene.apache.org/java/2_3_2/queryparsersyntax.html>
>
>> Now if I wanted to grow on that and add that the name "test" must be
>> in the category name I thought I would add the following:
>>
>>
>> ?q=%22Jake%22+AND+category_facet:+test
>
> A couple of things here... + goes in front of the field selector in this
> case.  +category_facet:test, but also with AND in there the + is
> superfluous.  AND automatically makes both sides of it required.   Another
> thing to be careful of - category_facet is likely a "string" field, and thus
> it can only be queried for exactly the entire content of the field, not
> words within it.
>
>> The two fields below are both string fields and I'm using copyField to
>> copy them to text fields:
>
> Then in your example you'll want to be sure to query on the text fields, not
> the string ones.
>
>Erik
>
>
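
A related pitfall when these queries are typed straight into a URL: in a query string a literal `+` is decoded as a space, so the required-term operator has to be sent as `%2B`. The Python sketch below shows this with the standard library. The names `fullname_text` and `category_text` are hypothetical; they stand for whatever text fields the string fields are copied to.

```python
from urllib.parse import parse_qs, urlencode

# What the server sees when "+" is typed directly into the URL:
as_typed = parse_qs("q=fullname_facet:+Jake")["q"][0]
print(repr(as_typed))  # the "+" has been decoded as a space

# Encoding the intended query keeps the "+" operators intact.
intended = "+fullname_text:Jake +category_text:test"
encoded = urlencode({"q": intended})
print(encoded)

# Decoding again returns exactly the query that was intended.
decoded = parse_qs(encoded)["q"][0]
print(decoded)
```

This follows Erik's advice in placing the `+` before the field name and in querying the text fields rather than the string ones.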


Querying Greater Than and Less Than

2008-08-26 Thread Jake Conk
Hello,

I was trying to figure out how to query ranges greater than and less
than. The closest solution I could find was using the range format:

field:[x TO z]

While this solution works for querying greater-than items, how would I
query all items less than 10, assuming I have some items with a
negative number that should be selected as well? The closest thing
I've come to was this:

field:[0 TO 10]

Given that I don't know the smallest negative number but want to
get all such items, is there a way to do this?

Thanks,

- Jake
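
Solr's range syntax accepts `*` as an open bound, so the lower end can be left open without knowing the smallest value, for example `field:[* TO 10]`. Square brackets include the end points and curly braces exclude them. Below is a small Python sketch of a helper that builds such clauses; the `range_query` name and its arguments are made up for illustration, and numeric ranges are assumed to be run against a sortable type such as sint or sfloat.

```python
def range_query(field, low=None, high=None, inclusive=True):
    """Build a range clause; None means an open bound, written as '*'."""
    lo = "*" if low is None else str(low)
    hi = "*" if high is None else str(high)
    left, right = ("[", "]") if inclusive else ("{", "}")
    return f"{field}:{left}{lo} TO {hi}{right}"

# Everything up to and including 10, negative values included.
at_most_ten = range_query("field", high=10)

# Strictly less than 10.
below_ten = range_query("field", high=10, inclusive=False)

# 50 or more, with no upper limit.
at_least_fifty = range_query("field", low=50)

print(at_most_ten)
print(below_ten)
print(at_least_fifty)
```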


How does Solr search when a field is not specified?

2008-08-26 Thread Jake Conk
Hello,

I was wondering how does Solr search when a field is not specified,
just a query? Say for example I got the following:

?q="Jake" AND "Test"

I have a mixture of integer, string, and text columns. Some indexed,
some stored, and some string fields copied to text fields.

Say I have a string field with the value "Jake is Testing" which is
also copied to a text field. If I did not copyField that string field
to a text field then would the above query not return any results if
the word "Jake" and "Test" are not found anywhere else since we cannot
do fulltext searches on string fields?

Lastly, is there a limit how many characters can be in a string and text field?

Thanks,
- Jake


Re: How does Solr search when a field is not specified?

2008-08-27 Thread Jake Conk
Thanks Otis! :)

On Tue, Aug 26, 2008 at 10:47 PM, Otis Gospodnetic
<[EMAIL PROTECTED]> wrote:
> Jake,
>
> Yes, that field would have to be some kind of an analyzed field (e.g. text), 
> not string if you wanted that query to match "Jake is Testing" input.  There 
> are no built-in Lucene or Solr-specific limits on field lengths.  There is 
> one parameter called maxFieldLength in Solr's solrconfig.xml, I think, 
> which tells Lucene how many tokens to consider for indexing.  If you don't 
> want that limit, increase that parameter's value to the max.
>
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
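
The string versus analyzed-field distinction Otis describes can be sketched with a toy Python model. This is not Solr's analysis chain: `string_terms` and `text_terms` are hypothetical helpers, the "analyzer" here only splits on whitespace and lowercases, and no stemming is applied.

```python
def string_terms(value):
    """A string field indexes the whole value as one single term."""
    return {value}

def text_terms(value):
    """Toy analyzer: split on whitespace and lowercase each token."""
    return {token.lower() for token in value.split()}

value = "Jake is Testing"

# The string field only matches a query for the exact full value.
string_has_jake = "Jake" in string_terms(value)
string_has_full = "Jake is Testing" in string_terms(value)

# The analyzed field matches individual words.
text_has_jake = "jake" in text_terms(value)

# Without stemming, "test" is a different term from "testing".
text_has_test = "test" in text_terms(value)

print(string_has_jake, string_has_full, text_has_jake, text_has_test)
```

Whether "Test" matches "Testing" in a real index depends on whether the field type's analyzer includes a stemmer, which this sketch deliberately leaves out.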


copyField: String vs Text Field

2008-08-27 Thread Jake Conk
Hello,

I was wondering if there was an added advantage in using <copyField>
to copy a string field to a text field?

If the field is copied to a text field then why not just make the
field a text field and eliminate copying its data?

If you are going to use full text searching on that field, which you
can't do with string fields, wouldn't it make sense to keep it a
text field since it has the same abilities as a string field and more?

... Or is the reason because string fields have better performance on
matching exact strings than text fields?

Thanks,

- Jake


Re: copyField: String vs Text Field

2008-08-27 Thread Jake Conk
Yonik,

Thanks for the reply. Does that mean that if I were to edit the data
then the field it was copied to will not be updated? I assume it does
get deleted if I delete the record right? I understand how it can make
searching simpler by copying fields to one but would that really make
it faster? How?

Thanks,
- Jake

On Wed, Aug 27, 2008 at 2:22 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> Jake, copyField exists to decouple document values (on the update
> side) from how they are indexed.
>
> From the example schema:
>  
>
> -Yonik
>


Re: copyField: String vs Text Field

2008-08-27 Thread Jake Conk
Hi Walter,

What do you mean when you say you stemmed and stopped your title field?

Thanks,
- Jake




On Wed, Aug 27, 2008 at 7:41 PM, Walter Underwood
<[EMAIL PROTECTED]> wrote:
> On 8/27/08 5:54 PM, "Yonik Seeley" <[EMAIL PROTECTED]> wrote:
>>
>> That's really only one use case though... the other being to have a
>> single stored field that is analyzed multiple different ways.
>
> We are the other use case. We take a title and put it in three
> fields: one merely lowercased, one stemmed and stopped, and one
> phonetic. At query time, we search all three with decreasing
> weights. An exact match is weighted more than a stemmed and
> stopped match, and so on.
>
> wunder
> --
> Search Guy, Netflix
>
>
>
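
Walter does not say how the weighted search over three fields is issued. One possible way to express it in Solr is the dismax query parser's `qf` parameter, which takes a list of fields with boosts. The Python sketch below assumes a Solr version that supports `defType=dismax`; the field names and the weights are hypothetical.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical fields fed from one title via copyField, with
# decreasing weights: exact, then stemmed/stopped, then phonetic.
weights = {"title_exact": 4.0, "title_stemmed": 2.0, "title_phonetic": 1.0}

# qf is a space-separated list of field^boost entries.
qf = " ".join(f"{field}^{boost}" for field, boost in weights.items())

params = urlencode({"defType": "dismax", "q": "halo", "qf": qf})

print(qf)
print(params)
```

With this setup a match in the exact field contributes more to the score than the same match in the stemmed or phonetic field.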


Searching Question

2008-09-25 Thread Jake Conk
Hello,

We are using Solr for our new forums search feature. If possible when
searching for the word "Halo" we would like threads that contain the
word "Halo" the most with the least amount of posts in that thread to
have a higher score.

For instance, if we have a thread with 10 posts and the word "Halo"
shows up 5 times then that should have a lower score than a thread
that has the word "Halo" 3 times within its posts and has 5 replies.
Basically the thread that shows the search string most frequently
amongst the number of posts in the thread should be the one with the
highest score.

Is something like this possible?

Thanks,

- JC


Re: Searching Question

2008-09-26 Thread Jake Conk
Grant,

Each post is its own document but I can merge them all into a single
document under one thread if that will allow me to do what I want.
The number of replies is stored both in Solr and the DB.

Thanks,

- JC

On Fri, Sep 26, 2008 at 5:24 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
> Is a thread and all of its posts a single document?  In other words, how
> are you modeling your posts as Solr documents?  Also, where are you keeping
> track of the number of replies?  Is that in Solr or in a DB?
>
> -Grant
>
> --
> Grant Ingersoll
> http://www.lucidimagination.com
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
>


Re: Searching Question

2008-09-30 Thread Jake Conk
How would I write a custom Similarity factor that overrides the TF
function? Is there some documentation on that somewhere?

On Sat, Sep 27, 2008 at 5:14 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
>
> On Sep 26, 2008, at 2:10 PM, Otis Gospodnetic wrote:
>
>> It might be easiest to store the thread ID and the number of replies in
>> the thread in each post Document in Solr.
>
> Yeah, but that would mean updating every document in a thread every time a
> new reply is added.
>
> I still keep going back to the solution as putting all the replies in a
> single document, and then using a custom Similarity factor that overrides
> the TF function and/or the length normalization.  Still, this suffers from
> having to update the document for every new reply.
>
> Let's take a step back...
>
> Can I ask why you want the scoring this way?  What have you seen in your
> results that leads you to believe it is the correct way?  Note, I'm not
> trying to convince you it's wrong, I just want to better understand what's
> going on.
>
>
>>
>>
>> Otherwise it sounds like you'll have to combine some search results or
>> data post-search.
>>
>> Otis
>> --
>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>
>>
>
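
Otis's suggestion of combining data post-search can be sketched in Python as a re-ranking step outside Solr. This is not a custom Similarity and not Lucene's scoring: it simply divides the number of matches by the number of posts, using the numbers from the original question. The second thread is treated here as having 5 posts, which is an assumption about what "5 replies" means, and the `density` helper is made up for illustration.

```python
# Per-thread match counts and post counts, as they might be collected
# from search results plus the stored number of posts.
threads = [
    {"id": "A", "posts": 10, "hits": 5},
    {"id": "B", "posts": 5, "hits": 3},
]

def density(thread):
    """Matches per post: higher means the term is more concentrated."""
    return thread["hits"] / thread["posts"]

# Highest density first.
ranked = sorted(threads, key=density, reverse=True)

for thread in ranked:
    print(thread["id"], density(thread))
```

Thread B ranks first because 3 matches in 5 posts is denser than 5 matches in 10 posts, which is the ordering asked for in the original question.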


Stored field question

2008-10-06 Thread Jake Conk
Hello,

I have a field with the following definition...



I'm not storing the data because I never need to retrieve it but each
*_t_ns_mv field is indexed and has a specific boost value... I added
this field with the word "test" as the value but when I search for
"test" no results come up in my unstored field unless I put the word
"test" in a field that is stored.

Do I have a misunderstanding of how to use stored/unstored fields? Can
someone help me clarify it?

Thanks,
- Jake C


Querying Ranges Problem

2008-11-24 Thread Jake Conk
I have the following query:


q=(+thread_title_t:test OR +posts_t_ns_mv:test) AND locked_i:0 AND
replies_i:[50 TO *]


I have replies_i, which is an integer field, set to return
documents that have a value of 50 or greater, but the problem is I'm
getting back results where the replies_i field is less than
50.

I tried other things like "replies_i:[50 TO 1000]" but I'm still
getting results with the replies_i field under 50.

Am I doing this wrong or is the other part of the query somehow
affecting the replies_i value? I tried removing the other part of the
query but I still get unexpected results.


Please help!

Thanks,

- Jake C.


Trying to exclude integer field with certain numbers

2008-12-02 Thread Jake Conk
Hello,

I am trying to exclude certain records from my search results in my
query by specifying which ones I don't want back but it's not working
as expected. Here is my query:

+message:test AND (-thread_id:123 OR -thread_id:456 OR -thread_id:789)

So basically I just want anything back that has the word "test"
anywhere in the message text field and does not contain the thread id
123, 456, or 789.

When I execute that query I get no results back. When I just execute
+message:test then I get results back and some of them with the thread
ids I listed above but when I try to exclude them like that it doesn't
work.

Does anyone have any idea how I can fix this?

Thanks,

- Jake C.
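
In Lucene, a boolean group that contains only prohibited clauses matches nothing on its own, so the parenthesized `(-thread_id:123 OR ...)` group yields no documents and the AND with it returns an empty result. A common fix is to put the exclusions at the same level as the required clause. The Python sketch below builds such a query; the `exclude_query` helper is made up for illustration.

```python
def exclude_query(required, field, excluded_values):
    """Required clause followed by one prohibited clause per value."""
    negatives = " ".join(f"-{field}:{value}" for value in excluded_values)
    return f"+{required} {negatives}".strip()

query = exclude_query("message:test", "thread_id", [123, 456, 789])
print(query)
```

When this goes into a URL, the `+` has to be sent as `%2B`, since a literal `+` in a query string is decoded as a space.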