word delimiter

2010-08-05 Thread j
I have UPPER12-lower and would like to be able to find it with queries
"UPPER" or "lower". What should break this up for the index? A
tokenizer or a filter such as WordDelimiterFilterFactory?

I have tried various combinations of parameters for
WordDelimiterFilterFactory and can't get it to split properly. Here are
the results from using the standard tokenizer followed directly by the
WordDelimiterFilterFactory markup (from analysis.jsp):

position:  1             | 2
-----------+--------------+------
terms:     UPPER12-lower | lower
           UPPER         |
           12            |




dismax debugging hyphens dashes

2010-08-07 Thread j
How does one debug index vs. dismax query parser?

I have a solr instance with 1 document whose title is "ABC12-def". I
am using dismax. While "abc", "12", and "def" do match, "abc12" and
"def" do not. Here is the parsedquery_toString; I'm having trouble
understanding it:

+(id:abc12^3.0 | title:"(abc12 abc) 12"^1.5) (id:abc12^3.0 |
title:"(abc12 abc) 12"^1.5)

Does anyone have advice for getting this to work?
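
(For reference, the parser's view of any query can be inspected directly with Solr's debug output; host and field list below are illustrative:)

http://localhost:8983/solr/select?q=abc12-def&defType=dismax&qf=title+id&debugQuery=on

The response then carries parsedquery, parsedquery_toString and a per-document score explanation, which is usually the quickest way to compare the standard and dismax parsers side by side.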


uniqueKey and custom fieldType

2010-08-13 Thread j
Does fieldType have any effect on the thing that I specify should be unique?

uniqueKey has been working for me up until recently. I changed the
field that is unique from type "string" to a fieldType that I have
defined. Now when I do an update I get a newly created document (so
that I have duplicates).

Has anyone else had this problem before?


Re: uniqueKey and custom fieldType

2010-08-15 Thread j
I guess another way to pose the question is: what could cause
<uniqueKey>id</uniqueKey> to no longer be respected?


The last change I made since I noticed the problem of non-unique docs
was by changing field "title" from "string" to "SplitUpStuff". But I
don't understand how that could affect the uniqueness of a different
field called "id".

<fieldType name="SplitUpStuff" ... positionIncrementGap="100">
  <analyzer>
    ...
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="0" .../>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="false"/>
    <filter class="..." protected="protwords.txt"/>
  </analyzer>
</fieldType>

In order to make even a guess, we'd have to see your new
field type. Particularly its field definitions and the analysis
chain...

Best
Erick

On Fri, Aug 13, 2010 at 5:16 PM, j  wrote:

> Does fieldType have any effect on the thing that I specify should be
> unique?
>
> uniqueKey has been working for me up until recently. I changed the
> field that is unique from type "string" to a fieldType that I have
> defined. Now when I do an update I get a newly created document (so
> that I have duplicates).
>
> Has anyone else had this problem before?
>


Re: uniqueKey and custom fieldType

2010-08-15 Thread j
Hi Erick, thanks - your explanation makes sense. But how, then, do I
make my unique field useful in terms of searching? If I have a unique
column id with value:

sometexthere-1234567

and want it to match the query '1234567', I need to use an analyzer to
split up the parts around the hyphen/dash. I guess I could make a copy
of that field in another field which gets analyzed?

Thanks for any advice.
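
(For reference, a sketch of that copyField approach; field names are assumed:)

<field name="id" type="string" indexed="true" stored="true"/>
<field name="id_split" type="SplitUpStuff" indexed="true" stored="false"/>
<copyField source="id" dest="id_split"/>

<uniqueKey>id</uniqueKey>

Updates then still dedupe on the exact string id, while a query like id_split:1234567 matches the split-out tokens.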



The short answer is that unique keys should be a single
term. String types are guaranteed to be single, since they
aren't analyzed. Your SplitUpStuff type *does* analyze
terms, and can make multiple tokens out of single strings
via WordDelimiterFilterFactory.

A common error when thinking about the "string" type is
not understanding that it is NOT analyzed. It's indexed as
a single term. So when you define uniqueKey of type string,
it behaves as you expect. That is, documents are updated if
the ID field matches exactly, case, spaces, order and all.

By introducing your "SplitUpStuff" type as uniqueKey, well,
I don't even know what behavior I'd expect. And whatever
behavior I happened to observe would not be guaranteed to
be the behavior of the next release.

Consider what you're asking for and you can see why you
don't want to analyze your uniquekey field. Consider
the following simple text type (where each word is a term).
You have two values from two different docs
doc1: "this is a nice unique key"
doc2: "My Keys are Unique and Nice"

It's quite possible, with combinations of analyzers and stemmers
to index the exact same tokens, namely "nice", "unique" and "key"
for each document. Are these equivalent? Does order count?
Capitalization? It'd just be a nightmare to try to
explain/predict/implement.

Likely whatever behavior you do get is just whatever falls out of the
code. I'm not even sure any attempt is made to enforce uniqueness
on an analyzed field.

HTH
Erick

On Sun, Aug 15, 2010 at 11:59 AM, j  wrote:

> I guess another way to pose the question is: what could cause
> <uniqueKey>id</uniqueKey> to no longer be respected?
>
>
> The last change I made since I noticed the problem of non-unique docs
> was by changing field "title" from "string" to "SplitUpStuff". But I
> don't understand how that could affect the uniqueness of a different
> field called "id".
>
> <fieldType name="SplitUpStuff" ... positionIncrementGap="100">
>   <analyzer>
>     ...
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="0" .../>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="false"/>
>     <filter class="..." protected="protwords.txt"/>
>   </analyzer>
> </fieldType>
>
>
>
>
>
>
> In order to make even a guess, we'd have to see your new
> field type. Particularly its field definitions and the analysis
> chain...
>
> Best
> Erick
>
> On Fri, Aug 13, 2010 at 5:16 PM, j  wrote:
>
> > Does fieldType have any effect on the thing that I specify should be
> > unique?
> >
> > uniqueKey has been working for me up until recently. I changed the
> > field that is unique from type "string" to a fieldType that I have
> > defined. Now when I do an update I get a newly created document (so
> > that I have duplicates).
> >
> > Has anyone else had this problem before?
> >
>


Getting unique key of a document inside of a Similarity class.

2015-02-19 Thread J-Pro

Good afternoon.

I need to uniquely identify a document inside of a Similarity class 
during scoring. Is it possible to get value of unique key of a document 
at this point?


For some time I thought I could use the internal docID to achieve that.
The method score(int doc, float freq) is called after every query execution
for each matched doc. For each indexed doc it equals 0, 1, 2, etc. But
this is only the case when documents are indexed in bulk, i.e. in a single HTTP
request. When docs are indexed in separate requests, these docIds
equal 0 for all documents.


To summarize, here are 2 final questions:

1. Is the docId behavior described above a bug or a feature? Obviously, if
it's a bug and I can use the docID to uniquely identify a document, then my
question is answered once this bug is fixed.
2. If the docId behavior described above is normal, then what is an
alternative way of uniquely identifying a document inside of a Similarity
class during scoring? Can I get the unique key of a scoring document in
Similarity?


FYI: I have asked the 1st question in the #solr IRC channel. The person named
hoss answered the following: "you're seeing the *internal* docIds ...
you can't assign any special meaning to them ... i believe that at the
level of the Similarity class, these may even be per segment, which
means that in the context of a SegmentReader they can be used to get
things like docValues, but they don't have any meaning compared to your
uniqueKey (for example)". This kinda makes me think that the answer to the
1st question is "it's a feature". But I am still not sure, and I don't know
the answer to the 2nd question. Please help.


Thank you very much in advance.


Re: Getting unique key of a document inside of a Similarity class.

2015-02-19 Thread J-Pro
Thank you for your answer, Chris. I will reply with inline comments as 
well. Please see below.



: I need to uniquely identify a document inside of a Similarity class during
: scoring. Is it possible to get value of unique key of a document at this
: point?

Can you tell us a bit more about your use case ... your problem description
is a bit vague, and sounds like it may be an "XY Problem"...


Sure, sorry I did not do it before, I just wanted to take a minimum of
your valuable time. In my custom Similarity class I am trying to
implement logic where the score calculation is based only on field
weight and a field match - that's it. In other words, if a field matches
the query, I want the "score" method to return this field's weight only,
regardless of factors like: norms; coord; doc frequencies; the fact that the
field was multivalued and more than one value matched; the fact that the field
was tokenized into multiple tokens and more than one token matched, etc.
As far as I know, there is no such similarity in the list of existing ones.
In order to implement this, I am trying to score only once per
combination of a specific field + doc unique identifier. And I don't
care what this unique doc identifier is - it can be the unique key or it can
be the internal doc ID.
I had my implementation working, but as I understood from your answer, I
had it working only for one segment. So now I need to add a segment ID or
something like it to my combination.




Assuming the method you are referring to (you didn't give a specific
class/interface name) is SimScorer.score(doc,freq) then the javadocs say...

 doc - document id within the inverted index segment
 freq - sloppy term frequency

...so for #1, yes this is definitely the per-segment docId.


Yes, it's ExactSimScorer.score(int doc, int freq). Ah! Per segment! Now
I understand why it's 0 after every new commit! The Solr docs say new
docs are written to a new segment. Question #1 is clear for me, then.
Thanks, Chris!




for #2: the method for providing a SimScorer to Lucene is by implementing
Similarity.simScorer(...) -- that method gets as an argument an
AtomicReaderContext context, which not only has an AtomicReader for the
individual segment, but also details about that segment's role in the
larger index.


Interesting details, that may be exactly what I need. If I can somehow
uniquely identify a document using its internal doc id + data from the
context (like a segment id or something), that would be awesome. I have
checked AtomicReaderContext; it has 'ord' (the reader's ord in the
top-level's leaves array) and 'docBase' (the reader's absolute doc base)
- probably what I need. Do you have any more information (maybe links to
wikis) about this AtomicReaderContext, DocValues, "low" and "top" levels
(other than the javadoc in the source code)? I have a high-level understanding,
but it's obviously not enough for the problem I am solving. I would be
more than happy to understand it.
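
(For reference, a minimal Java sketch of the bookkeeping discussed here: a (field, document) pair is scored only once per query, with the document identified index-wide as docBase + per-segment doc id. The class name is hypothetical, and the wiring into Similarity.simScorer(...) is left out:)

import java.util.HashSet;
import java.util.Set;

public final class ScoreOnceTracker {
    // Cleared at the start of each scoring session (each query execution).
    private final Set<String> seen = new HashSet<String>();

    public void reset() {
        seen.clear();
    }

    /** Returns true only the first time this (field, doc) pair is seen. */
    public boolean firstTime(String fieldName, int docBase, int segmentDocId) {
        // docBase comes from AtomicReaderContext.docBase; adding the
        // per-segment id yields an index-wide document id.
        return seen.add(fieldName + ":" + (docBase + segmentDocId));
    }
}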


Thank you very much for your time, Chris and other people who spend time 
on reading/answering this thread!


Re: Getting unique key of a document inside of a Similarity class.

2015-02-19 Thread J-Pro

how are you defining/specifying these field weights?


I define weights inside of a query (name:SomeName^7).



it would help if you could give a concrete example of some sample docs, a
sample query, and what results you would expect ... the sample input and
sample output of the system you are interested in.


Sure. Imagine we have 2 docs:

doc1
-
name:DocumentOne
place:34 High Street (StandardTokenizerFactory, i.e. 3 tokens created)

doc2
-
name:DocumentTwo
place:34 High Street (StandardTokenizerFactory, i.e. 3 tokens created)

I want the following queries return docs with scores:

1. name:DocumentOne^7 => doc1(score=7)
2. name:DocumentOne^7 AND place:notExist^3 => doc1(score=7)
3. place:(34\ High\ Street)^3 => doc1(score=3), doc2(score=3)
4. name:DocumentOne^7 OR place:(34\ High\ Street)^3 => doc1(score=10), 
doc2(score=3)



If you're curious about why I need it, i.e. about my very initial
"problem X": I need this scoring to be able to calculate a matching
percentage. That's a separate topic; I read a lot about it (including
http://wiki.apache.org/lucene-java/ScoresAsPercentages) and people say
it's either not doable or very, very complicated with SOLR. So I just
want to give it a try. For case #3 from above the matching percentage is
100% for both docs. For case #4 it's doc1:100% and doc2:30%.




it's not clear why you need any sort of unique document identification for
your scoring algorithm .. from what you described, matches on fieldA should
get score "A", matches on fieldB should get score "B" ... why does it matter
which doc is which?


For case #3, for example, the method SimScorer.score is called 3 times for
each of these documents, 6 times total for both. I have added a
ThreadLocal<HashSet<String>> to my custom similarity, which is cleared
every time before a new scoring session (after each query execution). This
HashSet stores strings consisting of fieldName + docID. Every time
score() is called, I check this HashSet - if fieldName + docID exists, I
return 0 as the score, otherwise the field weight.
If there were no docID in this string (only the field name), then case #3
would return the following: doc1(score=3), doc2(score=0). If there were
no HashSet at all, case #3 would return: doc1(score=9), doc2(score=9)
since the query matched all 3 tokens for every doc.


I know that what I'm doing is a "hack", but that's the only way I've 
found so far to implement percentage matching. I just want to play 
around with it, see how it performs and decide whether to use it or not. 
But for that I need to uniquely identify a document while scoring :)


Re: Getting unique key of a document inside of a Similarity class.

2015-02-20 Thread J-Pro

from all the examples of what you've described, i'm fairly certain all you
really need is a TFIDF based Similarity where coord(), idf(), tf() and
queryNorm() return 1 always, and you omitNorms from all fields.
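
(A minimal sketch of that suggestion, assuming the Lucene 4.x DefaultSimilarity API; the class name is hypothetical. Paired with omitNorms="true" on the fields, every matching clause then contributes exactly its boost:)

import org.apache.lucene.search.similarities.DefaultSimilarity;

public class FlatSimilarity extends DefaultSimilarity {
    // Flatten every TFIDF factor to 1 so only query boosts remain.
    @Override public float tf(float freq) { return 1f; }
    @Override public float idf(long docFreq, long numDocs) { return 1f; }
    @Override public float coord(int overlap, int maxOverlap) { return 1f; }
    @Override public float queryNorm(float sumOfSquaredWeights) { return 1f; }
}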


Yeah, that's what I did in the very first iteration. It works only for
cases #1 and #2. If you try queries 3 and 4 with such a Similarity, you'll get:


3. place:(34\ High\ Street)^3 => doc1(score=9), doc2(score=9)
4. name:DocumentOne^7 OR place:(34\ High\ Street)^3 => doc1(score=16), 
doc2(score=9)


That is not what I need. As I described above, when multiple tokens
match for a field, the method SimScorer.score is called X times,
where X is the number of matched tokens (in cases #3 and #4 there are 3
tokens), so the scores sum up. I need to score only once in this
case, regardless of the number of tokens.


How to do it? The first idea was a HashSet based on fieldName, so that after
scoring once, it doesn't score anymore. But in that case only the first
document was scored (since the second and subsequent documents have the same
field name). So I understood that I also need the docID. And it
worked fine until I found out (thank you for that) that the docID is
segment-specific. So now I need a segment ID as well (or something similar).




(You didn't give any examples of what you expect to happen with exclusion
clauses in your BooleanQueries.)


For my needs I won't need exclusion clauses, but in that case the same
thing would happen - it would score depending on the weight, because the
condition is true:


5. (NOT name:DocumentOne)^7 => doc2(score=7)


Data Import Handler - reading GET

2015-03-16 Thread Kiran J
Hi,

In data import handler, I can read the "clean" query parameter using
${dih.request.clean} and pass it on to the queries. Is it possible to read
any query parameter from the URL ? for eg ${foo} ?

Thanks
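
(For reference: DIH exposes arbitrary request parameters the same way, via ${dih.request.paramName}. A sketch, with entity, table and parameter names assumed:)

http://localhost:8983/solr/dataimport?command=full-import&foo=bar

<entity name="item"
        query="select * from item where category = '${dih.request.foo}'"/>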


Re: Solr custom component issue

2015-05-11 Thread j 90
unsubscribe

On Mon, May 11, 2015 at 6:58 PM, Upayavira  wrote:

> attaching them to each request, then just add qf= as a param to the URL,
> easy.
>
> On Mon, May 11, 2015, at 12:17 PM, nutchsolruser wrote:
> > These boosting parameters will be configured outside Solr and there is
> > a separate module from which these values get populated. I am reading
> > those values from an external datasource and I want to attach them to
> > each request.
> >
> >
> >
> > --
> > View this message in context:
> >
> http://lucene.472066.n3.nabble.com/Solr-custom-component-issue-tp4204799p4204832.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
>


DIH dataimport.properties Zulu time

2014-03-25 Thread Kiran J
Hi

Is it possible to set up the data import handler so that it keeps track of
the last imported time in Zulu time and not local time ?

It's not very clear from the documentation how to do it, or if it is even
possible to do it.

Ref:

http://wiki.apache.org/solr/DataImportHandler#Configuring_The_Property_Writer


Thanks


Re: DIH dataimport.properties Zulu time

2014-03-27 Thread Kiran J
Thank you for the response. This works if I invoke start.jar with java. In
my use case, however, I need to invoke start.jar directly (as a consoleless
service so that the user cannot close it accidentally). It doesn't pick up
the user.timezone property when done this way. Is it possible to do this using
the tag below somehow? I tried setting locale="UTC" and it didn't work.

<propertyWriter type="SimplePropertiesWriter" directory="data" filename="my_dih.properties" locale="en_US" />



On Tue, Mar 25, 2014 at 7:45 PM, Gora Mohanty  wrote:

> On 26 March 2014 02:44, Kiran J  wrote:
> >
> > Hi
> >
> > Is it possible to set up the data import handler so that it keeps track
> of
> > the last imported time in Zulu time and not local time ?
> [...]
>
> Start your JVM with the desired timezone, e.g.,
> java -Duser.timezone=UTC -jar start.jar
>
> Regards,
> Gora
>


Re: DIH dataimport.properties Zulu time

2014-03-27 Thread Kiran J
I figured it out. I use SQL Server, so this is my solution:

<propertyWriter ... />

In TSQL, this can be converted to a UTC date time using:

CONVERT(datetimeoffset, '${dih.last_index_time}', 127)

Refs:

http://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html
http://msdn.microsoft.com/en-us/library/ms187928.aspx




On Thu, Mar 27, 2014 at 2:17 PM, Kiran J  wrote:

> Thank you for the response. This works if I invoke start.jar with java. In
> my use case, however, I need to invoke start.jar directly (as a consoleless
> service so that the user cannot close it accidentally). It doesn't pick up
> the user.timezone property when done this way. Is it possible to do this using
> the tag below somehow? I tried setting locale="UTC" and it didn't work.
>
> <propertyWriter type="SimplePropertiesWriter" directory="data" filename="my_dih.properties" locale="en_US" />
>
>
>
> On Tue, Mar 25, 2014 at 7:45 PM, Gora Mohanty  wrote:
>
>> On 26 March 2014 02:44, Kiran J  wrote:
>> >
>> > Hi
>> >
>> > Is it possible to set up the data import handler so that it keeps track
>> of
>> > the last imported time in Zulu time and not local time ?
>> [...]
>>
>> Start your JVM with the desired timezone, e.g.,
>> java -Duser.timezone=UTC -jar start.jar
>>
>> Regards,
>> Gora
>>
>
>


Example for DIH data source through query string

2013-07-15 Thread Kiran J
Hi,

I want to dynamically specify the data source in the URL when invoking the data
import handler. I'm looking at this:

http://wiki.apache.org/solr/DataImportHandler#solrconfigdatasource

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">/home/username/data-config.xml</str>
    <lst name="datasource">
      <str name="driver">com.mysql.jdbc.Driver</str>
      <str name="url">jdbc:mysql://localhost/dbname</str>
      <str name="user">db_username</str>
      <str name="password">db_password</str>
    </lst>
  </lst>
</requestHandler>



Can anyone give me a good example ?

i.e. http://localhost:8983/solr/dataimport?datasource=

Your help is much appreciated.

Thanks


Re: Example for DIH data source through query string

2013-07-15 Thread Kiran J
Thank you Alex.


On Mon, Jul 15, 2013 at 12:37 PM, Alexandre Rafalovitch
wrote:

> I don't think you can get there from here.
>
> But you can specify config file on a query line. If you only have a couple
> of configurations, you could have them in different files and switch that
> way.
>
> Regards,
>Alex.
>
> Personal website: http://www.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all at
> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
>
>
> On Mon, Jul 15, 2013 at 2:56 PM, Kiran J  wrote:
>
> > Hi,
> >
> > I want to dynamically specify the data source in the URL when invoking
> > the data import handler. I'm looking at this:
> >
> > http://wiki.apache.org/solr/DataImportHandler#solrconfigdatasource
> >
> > <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
> >   <lst name="defaults">
> >     <str name="config">/home/username/data-config.xml</str>
> >     <lst name="datasource">
> >       <str name="driver">com.mysql.jdbc.Driver</str>
> >       <str name="url">jdbc:mysql://localhost/dbname</str>
> >       <str name="user">db_username</str>
> >       <str name="password">db_password</str>
> >     </lst>
> >   </lst>
> > </requestHandler>
> >
> >
> > Can anyone give me a good example ?
> >
> > i.e. http://localhost:8983/solr/dataimport?datasource=
> >
> > Your help is much appreciated.
> >
> > Thanks
> >
>


Solr Thrift APIs

2013-04-13 Thread Kiran J
Hi,

Is it possible to access Solr through thrift APIs ?

Thanks


Multi dimensional spatial search

2013-05-10 Thread Kiran J
Hi,

Does Solr support multi dimensional spatial search ?

http://en.wikipedia.org/wiki/K-d_tree

Thanks


Re: Multi dimensional spatial search

2013-05-24 Thread Kiran J
Thank you for the excellent explanation David.

My use case is in the signal processing area. I have a wave in the time
domain that is converted to the frequency domain over 8 different bands (FFT), i.e.
an 8D point. The question for me is: "If I have a set of waves (8D points)
in the database and I have a lookup wave, what is the best match?"




On Sat, May 11, 2013 at 10:42 PM, David Smiley (@MITRE.org) <
dsmi...@mitre.org> wrote:

> Hi Kiran.
>
> The often-forgotten PointType field type can be configured to hold a
> variable number of dimensions.  See the "dimension" attribute of the field
> type's configuration in the example schema.  This field type is really just
> a kind of a macro field type for a configurable number of numeric fields.
> To do a range search you could do:  myPointField:[x1,y1,z1 TO x2,y2,z2]  (3
> dimensional).  This works identically to AND'ing together a series of range
> searches on number fields you explicitly configure.  If the space you want
> to search isn't a simple set of ranges on the dimensions, then you might be
> stuck or at least forced to code a solution, but any such solution is
> unlikely to scale well (i.e. if you have a ton of
> data this may be a problem).  There are some function-queries (AKA
> ValueSources) that compute special distance calculations in N-dimensional
> space:
> http://wiki.apache.org/solr/FunctionQuery#dist -- dist(), hsine() and
> sqedist() can take a PointType, or you can configure each numeric field
> individually and reference them as seen on the wiki.  One key limitation of
> PointType (and LatLonType) is that it does not support multi-valued fields
> (no multiple values for any dimension per document).
>
> You linked to info on "K-D Trees".  Lucene internally at its essence has
> *one* index scheme, which is a sorted list of bytes (often tokenized words
> in text but can be any bytes) that map to a list of document ids.  There
> are
> a class of trees in computer science called a "Trie" AKA "PrefixTree" that
> are fundamentally based on just sorted keys.
> http://en.wikipedia.org/wiki/Trie  Lucene's single-dimensional numeric
> range
> fields are in fact tries; they can store a variable number of
> single-dimensional values per document.  For 2-dimensional spatial, there
> is
> the SpatialPrefixTree abstraction with Geohash and QuadTree
> implementations.
> These support not just indexing Points but any Spatial4j shape, which is
> approximated to the grid.  Supporting an N-dimensional Trie would require
> writing a custom SpatialPrefixTree.  The spatial
> RecursivePrefixTreeStrategy
> has spatial search algorithms that use a SpatialPrefixTree but makes no
> assumptions about the dimensionality of the tree or related shapes, so no
> change there (pretty cool, I think).  You would need to implement some
> N-dimensional Spatial4j shapes. An n-dimensional Point -- trivial.  A
> multi-dimensional range shape (i.e. a square or cube or ...) is not hard.
> Anything more complex in N dimensions is going to get hard fast.  Finally,
> there needs to be a way to parse this shape, which for a custom hack could
> simply be a Solr QParser but longer term would be Spatial4j's not yet
> finished extensible WKT parser.
>
> That was probably more info than you care for but someone else may read
> this
> and find it interesting.
>
> Kiran, I'm curious, what do you want an n-dimensional field for?
>
> p.s. A new SpatialPrefixTree implementation is slated to be developed this
> summer as part of the Google Summer of Code (GSOC).  I hadn't planned to
> add
> N-dimensionality to the feature list.  It could be a stretch-goal maybe.
> https://issues.apache.org/jira/browse/LUCENE-4922
>
> ~ David
>
>
>
> Kiran Jayakumar wrote
> > Hi,
> >
> > Does Solr support multi dimensional spatial search ?
> >
> > http://en.wikipedia.org/wiki/K-d_tree
> >
> > Thanks
>
>
>
>
>
> -
>  Author:
> http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Multi-dimensional-spatial-search-tp4062515p4062646.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
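
(For reference, a sketch of the PointType setup David describes, sized for the 8-band FFT use case above; all names are assumed:)

<fieldType name="point8d" class="solr.PointType" dimension="8" subFieldSuffix="_d"/>
<field name="bands" type="point8d" indexed="true" stored="true"/>

A box query then looks like bands:[v1,...,v8 TO w1,...,w8], and the dist() function query mentioned above can rank candidates by N-dimensional distance for best-match lookups.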


DIH Delete with Full Import

2013-02-12 Thread Kiran J
Hi,

I'm using this configuration:

http://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport

The wiki says: "In this case it means obviously that in case you also want
to use deletedPkQuery then when running the delta-import command is still
necessary."

In this link: http://wiki.apache.org/solr/DataImportHandler



- *postImportDeleteQuery*: after full-import this will be used to clean up
  the index. This is honored only on an entity that is an immediate
  sub-child of <document>. (Solr 1.4)

Is it possible for me to use full-import and postImportDeleteQuery? I have a
table that has the UUIDs of all the records that need to be deleted. Can I
define something like postImportDeleteQuery = "Select Id from
delete_log_table"? Can someone provide an example?

Any help is much appreciated.

Thank you.


Re: DIH Delete with Full Import

2013-02-13 Thread Kiran J
Thank you Ahmet.

I figured it out. I had to define a separate entity which takes care of
deletes.
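
(The entity definition itself did not survive archiving. A sketch of one common shape for it, using DIH's special $deleteDocById command; table and column names are assumed:)

<entity name="deletes"
        query="select Id as '$deleteDocById' from delete_log_table"/>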





On Wed, Feb 13, 2013 at 1:32 AM, Ahmet Arslan  wrote:

> > define something like postImportDeleteQuery = "Select Id
> > from
> > delete_log_table". Can someone provide me an example ?
>
> postImportDeleteQuery and preImportDeleteQuery queries are lucene/solr
> queries. For example I am using the following:
>
> preImportDeleteQuery="document_type:(photo OR news OR video OR audio)"
>


Re: Start solr from different folder / .Net code

2013-02-19 Thread Kiran J
Thanks guys.

I ended up setting the working folder in ProcessStartInfo.WorkingDirectory
and it works fine.

The batch file works too, but with it I am not able to kill the Solr process
from code if I need to.


On Sun, Feb 17, 2013 at 1:12 PM, d_k  wrote:

> Might as well do:
> cd /d H:\downloads\apache-solr-3.6.0\example
>
> or add solr and java to the PATH environment variable
>
> On Sat, Feb 16, 2013 at 11:28 AM, 林辉林灯  wrote:
> > cd  H:\downloads\apache-solr-3.6.0\example
> > H:
> > java -jar start.jar
> >
> >
> >
> > Save the above commands as a .bat file, then run the bat file and it
> > will work.
> >
> >
> > -- Original --
> > From:  "Kiran J";
> > Date:  Sat, Feb 16, 2013 06:55 AM
> > To:  "solr-user";
> >
> > Subject:  Start solr from different folder / .Net code
> >
> >
> >
> > Hi everyone,
> >
> > How can I start Solr from a different folder in Windows ? I tried
> >
> > *java -cp "c:\\start.jar" -jar start.jar*
> >
> > I get this error:
> >
> > *Unable to access jarfile start.jar*
> >
> > I need to be able to start Solr from .Net. I am able to do it if I wrap
> it
> > in a batch file by first CDing to that folder. Is there any other method
> to
> > do this ?
> >
> > Any help is much appreciated.
> >
> > Thanks
>


SOLR - Recommendation on architecture

2013-03-08 Thread Kobe J
We are planning to use SOLR 4.1 for full text indexing. Following is the
hardware configuration of the web server that we plan to install SOLR on:-

*CPU*: 2 x Dual Core (4 cores)

*R**AM:* 12GB

*Storage*: 212GB

*OS Version* – Windows 2008 R2



The dataset to be imported will have approx. 800k records, with 450 fields
per record. Query response time should be between 200 ms and 800 ms.



Please advise whether the planned single-server implementation will work fine
and whether the specified configuration is enough for the requirement.


Grouping and sorting results

2012-01-31 Thread Vijay J
Hi,

I'm running into some issues with solr scoring affecting ordering of query
results. Is it possible to run a Solr boolean query to check if
a document contains any search terms and get the results back without any
scoring mechanism besides presence or absence of any of the search
terms? Basically I'd like to turn off tf-idf/Vector space scoring for one
query and get the ordering by boolean instead of score.

As an example, suppose I have a collection of wholesale providers and some
of the wholesale providers are preferred over other providers. I'd like
Solr to get all the preferred providers,
apply an ordering, then get all the non-preferred providers and apply an
ordering (i.e. give me a group of preferred providers, apply a sort to
the preferred provider result set, then group the non-preferred providers
and sort that result set).
The preferred status of the provider is not known ahead of time until query
execution.


Can you give us a lead on this?

Thank you!


Regards,
Vijay
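
(For reference: if the preferred flag can be indexed as a boolean field per provider, ordering without relevance scoring can be had from a plain field sort; the names below are assumed:)

q=*:*&fq=provider_type:wholesale&sort=preferred desc, name asc

Since the thread notes that preferred status is only known at query time, the flag would have to be supplied at query time instead, but the sort-over-score idea is the same.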


Re: SolrIndex eats up lots of disk space for intermediate data

2012-06-23 Thread Harsh J
Hey Safdar,

This question is best asked on the Apache Solr mailing lists. I
believe you'll get better responses there, so I've redirected to
Solr's own list (solr-user[at]lucene.apache.org).

BCC'd common-user[at]hadoop.apache.org and CC'd you in case you
haven't subscribed to Solr.

On Sat, Jun 23, 2012 at 8:14 PM, Safdar Kureishy
 wrote:
> Hi,
>
> I couldn't find an answer to this question online, so I'm posting to the
> mailing list.
>
> I've got a crawl of about 10M *fetched* pages (crawl db has about 50 M
> pages, since it includes the fetched + failed + unfetched pages). I've also
> got a freshly updated linkdb and webgraphdb (having run linkrank). I'm
> trying to index the fetched pages (content + anchor links) using solrindex.
>
> When I launch the "bin/nutch solrindex <solr url> <crawldb> -linkdb <linkdb>
> -dir <segments>" command, the disk space utilization really jumps.
> Before running the solrindex stage, I had about 50% of disk space remaining
> for HDFS on my nodes (5 nodes) -- I had consumed about 100G and had about
> 100G left over. However, when running the solrindex phase, by the end of
> the map phase, the disk space utilization nears 100% and the available HDFS
> space drops below 1%. Running "hadoop dfsadmin -report" shows that the jump
> in storage is for non-DFS data (i.e. intermediate data) and it happens
> during the map phase of the IndexerMapReduce job (solrindex).
>
> What can I do to reduce the intermediate data being generated for
> solrindex? Any configuration settings I should change? I'm using all the
> defaults, for the indexing phase, and I'm not using any custom plugins
> either.
>
> Thanks,
> Safdar



-- 
Harsh J


Re: occasional exception

2010-11-17 Thread j...@nuatech.net
Thanks a million Robert.

On 17 November 2010 11:36, Robert Muir  wrote:

> Thank you,
>
> Looks like the problem was
> https://issues.apache.org/jira/browse/SOLR-1667. I backported it to
> the 1.4 branch:
> http://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.4/
>
> On Wed, Nov 17, 2010 at 4:48 AM, j...@nuatech.net 
> wrote:
> > Hi Richard,
> > My full schema.xml is below (and attached). Do you want me to raise this
> in
> > Jira?
> > Regards,
>


-- 
_
John G. Moylan


occasional exception

2010-11-19 Thread j...@nuatech.net
Hi,

I set up a Solr infrastructure a couple of months ago. So far the system has
worked well, but I occasionally get stuck in loops where I keep
getting 500s returned for commits.

my Tomcat Catalina logs on the Solr master show the following issue:

Nov 14, 2010 2:41:46 AM org.apache.solr.update.processor.LogUpdateProcessor
finish
INFO: {add=[http://www.rte.ie/news/2000/0428/sport.html,
http://www.rte.ie/news/1999/1019/moriarty.html,
http://www.rte.ie/news/2000/0216/sport.html,
http://www.rte.ie/news/2000/0715/explosion.html
, http://www.rte.ie/news/1999/0514/monk.html,
http://www.rte.ie/news/2001/0515/goodman.html,
http://www.rte.ie/news/2002/0415/easttimor.html,
http://www.rte.ie/news/2001/0901/u2.html, ... (8 added)
]} 0 181
Nov 14, 2010 2:41:46 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.IllegalArgumentException: Increment must be zero or
greater: -2147483648
at
org.apache.lucene.analysis.Token.setPositionIncrement(Token.java:322)
at
org.apache.lucene.analysis.TokenWrapper.setPositionIncrement(TokenWrapper.java:93)
at
org.apache.lucene.analysis.StopFilter.incrementToken(StopFilter.java:228)
at
org.apache.lucene.analysis.LowerCaseFilter.incrementToken(LowerCaseFilter.java:38)
at
org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:189)
at
org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:244)
at
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:828)
at
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:809)
at
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2683)
at
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2655)
at
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:241)
at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
at
org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:555)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:636)


I have a single RW solr master instance that has 2 indexers
updating/committing to it over HTTP. And a  third script deleting old
documents then committing over HTTP also.

I can't figure out what is wrong. A trawl through Google suggests issues
with custom tokenizers, but I am using the built-in ones in Solr 1.4.1. I am
running on the latest Tomcat 6, with java version "1.6.0_17",
OpenJDK Runtime Environment (IcedTea6 1.7.5) (rhel-1.16.b17.el5-x86_64),
OpenJDK 64-Bit Server VM (build 14.0-b16, mixed mode).

Any help or pointers would be appreciated.

Regards,
John

-- 
_
John G. Moylan


Re: command line parameters for solr

2010-12-11 Thread J O
That's cool, I am just looking to hire someone to do some Solr work for me.

Please advise: what's the best way to reach the Solr development community
for contract help?

/j

On Dec 11, 2010, at 2:59 PM, Erick Erickson  wrote:

In general, it's discouraged to send private e-mails unless invited since
the whole point of open source is to make source, solutions, etc available
to everyone. See: http://people.apache.org/~hossman/#private_q

Best
Erick

On Sat, Dec 11, 2010 at 1:08 AM, Jack O  wrote:

Tom,

I would like to reach out to you directly. What's your email address?

/j





From: Tom Hill 
To: solr-user@lucene.apache.org
Sent: Fri, December 10, 2010 9:43:08 PM
Subject: Re: command line parameters for solr

java -jar start.jar --help

More docs here
http://docs.codehaus.org/display/JETTY/A+look+at+the+start.jar+mechanism

Personally, I usually limit access to localhost by using whatever
firewall the machine uses.

Tom

On Fri, Dec 10, 2010 at 7:55 PM, Jack O  wrote:
Hello,

For starting Solr, where do I find the list of command line
parameters?

java -jar start.jar blahblah...

I am especially looking for how to specify my own jetty config file. I
want to allow access to Solr from localhost only.


I would really appreciate all your help.

/J


copyFields, multiple terms -- IDF?

2011-02-02 Thread Martin J
Hi, I'm having a weirdness with indexing multiple terms to a single field
using a copyField. An example:

For document A
field:contents_1 is a multivalued field containing "cat", "dog" and "duck"
field:contents_2 is a multivalued field containing "cat", "horse", and
"flower"

For document B
field:contents_1 is a multivalued field containing "cat" and "fish"
field:contents_2 is a multivalued field containing "bear" and "turkey"

I have a copyField in my schema:

<copyField source="contents_1" dest="combined"/>
<copyField source="contents_2" dest="combined"/>

A query like contents_1:cat contents_2:cat returns document A first, and
then document B. I think that is the way it should work.

But a query like combined:cat returns document B first. In my mind, when I
am doing a copyField I am copying each of the terms in the multivalued
fields of contents_1 and contents_2 into combined, so that combined
internally has "cat", "dog", "duck", "cat", "horse", "flower" for document
A.

An explain on the query says something like (this is from a real query not
the fake one above)



4.0687284 = (MATCH) fieldWeight(combined:cat in 1663089), product of: 1.0 =
tf(termFreq(combined:cat)=1) 4.0687284 = idf(docFreq=135688,
maxDocs=2919285) 1.0 = fieldNorm(field=combined, doc=1663089)


0.8509077 = (MATCH) fieldWeight(combined:cat in 913171), product of:
2.236068 = tf(termFreq(combined:cat)=5) 4.0590663 = idf(docFreq=143689,
maxDocs=3061697) 0.09375 = fieldNorm(field=combined, doc=913171)


If I am reading this right, it is finding the higher TF in A (5 in this
case) but still scoring B higher. Shouldn't idf be exactly the same?

(Both fields are a solr.TextField:

<fieldType ...>
  <analyzer>
    ...
    <filter class="..." ignoreCase="true"/>
    <filter class="..." protected="protwords.txt"/>
  </analyzer>
</fieldType>
)

Another piece of perhaps relevant information is that this a query over 16
shards using distributed solr.


Re: copyFields, multiple terms -- IDF?

2011-02-02 Thread Martin J
On closer review, I am noticing that the fieldNorm is what is killing
document A.
If I reindex with omitNorms=true, will this problem be "solved"?
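
(For reference, a sketch of what that looks like in schema.xml; everything but the omitNorms attribute is assumed:)

<field name="combined" type="text" indexed="true" stored="false"
       multiValued="true" omitNorms="true"/>

With norms omitted, fieldNorm becomes 1.0 for every document, so the length normalization visible in the explain output above drops out of the score.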


On Wed, Feb 2, 2011 at 4:54 PM, Martin J  wrote:

> Hi, I'm having a weirdness with indexing multiple terms to a single field
> using a copyField. An example:
>
> For document A
> field:contents_1 is a multivalued field containing "cat", "dog" and "duck"
> field:contents_2 is a multivalued field containing "cat", "horse", and
> "flower"
>
> For document B
> field:contents_1 is a multivalued field containing "cat" and "fish"
> field:contents_2 is a multivalued field containing "bear" and "turkey"
>
> I have a copyField in my schema:
>
> <copyField source="contents_1" dest="combined"/>
> <copyField source="contents_2" dest="combined"/>
>
> A query like contents_1:cat contents_2:cat returns document A first, and
> then document B. I think that is the way it should work.
>
> But a query like combined:cat returns document B first. In my mind, when I
> am doing a copyField I am copying each of the terms in the multivalued
> fields of contents_1 and contents_2 into combined, so that combined
> internally has "cat", "dog", "duck", "cat", "horse", "flower" for document
> A.
>
> An explain on the query says something like (this is from a real query not
> the fake one above)
>
> 
> 
> 4.0687284 = (MATCH) fieldWeight(combined:cat in 1663089), product of: 1.0 =
> tf(termFreq(combined:cat)=1) 4.0687284 = idf(docFreq=135688,
> maxDocs=2919285) 1.0 = fieldNorm(field=combined, doc=1663089)
> 
> 
> 0.8509077 = (MATCH) fieldWeight(combined:cat in 913171), product of:
> 2.236068 = tf(termFreq(combined:cat)=5) 4.0590663 = idf(docFreq=143689,
> maxDocs=3061697) 0.09375 = fieldNorm(field=combined, doc=913171)
> 
>
> If I am reading this right, it is finding the higher TF in A (5 in this
> case) but still scoring B higher. Shouldn't idf be exactly the same?
>
> (Both fields are a solr.TextField:
>
> <fieldType ...>
>   <analyzer>
>     ...
>     <filter class="..." ignoreCase="true"/>
>     <filter class="..." protected="protwords.txt"/>
>   </analyzer>
> </fieldType>
> )
>
> Another piece of perhaps relevant information is that this a query over 16
> shards using distributed solr.
>


Filter query not null or in list

2012-09-27 Thread Kiran J
Hi everyone,

I have a group field which restricts the permission for each user. A user
can belong to multiple groups. A document can belong to only Group (ie) non
multi valued. There are some documents which are unrestricted, hence group
id is null. How can I use the filter for a given user so that it includes
results from both Group=NULL and Group=(X or Y or Z)? I tried something like
this, but it doesn't work:

-Group:[* TO *] OR Group:(X OR Y OR Z)

Note that the Group is a UUID field. Is it possible to assign a default
UUID value ?

Any help is much appreciated.

Thanks
Kiran


Re: Filter query not null or in list

2012-09-28 Thread Kiran J
Thank you Jack, that works.

Kiran

On Thu, Sep 27, 2012 at 5:18 PM, Jack Krupansky wrote:

> Add a "*:*" before the negative query.
>
> (*:* -Group:[* TO *]) OR Group:(X OR Y OR Z)
>
> -- Jack Krupansky
>
> -Original Message- From: Kiran J Sent: Thursday, September 27,
> 2012 8:07 PM To: solr-user@lucene.apache.org Subject: Filter query not
> null or in list
> Hi everyone,
>
> I have a group field which restricts the permission for each user. A user
> can belong to multiple groups. A document can belong to only Group (ie) non
> multi valued. There are some documents which are unrestricted, hence group
> id is null. How can I use the filter for a given user so that it includes
> results from both Group=NULL and Group=(X or Y or Z) ? I try something like
> this, but doesnt work:
>
> -Group:[* TO *] OR Group:(X OR Y OR Z)
>
> Note that the Group is a UUID field. Is it possible to assign a default
> UUID value ?
>
> Any help is much appreciated.
>
> Thanks
> Kiran
>


DIH scheduling

2012-10-17 Thread Kiran J
Hi everyone,

Does Solr have out-of-the-box data import handler scheduling? This link
makes it look like I need to run an additional JAR:

http://wiki.apache.org/solr/DataImportHandler?highlight=%28%28DataImportHandler%29%29#Scheduling

I need to invoke the import from .Net environment, so I'd like to avoid any
non-Solr code. Any help is much appreciated.

Thanks
Kiran
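
(For reference: there is no built-in DIH scheduler, as the question suspects; the usual approach is to request the handler's HTTP endpoint on a timer from outside Solr. A sketch using the Windows task scheduler, with host, core and schedule assumed:)

schtasks /create /tn "solr-full-import" /sc daily /st 02:00 ^
  /tr "curl http://localhost:8983/solr/dataimport?command=full-import"

From .Net the same URL can simply be requested on a timer, e.g. with HttpWebRequest, keeping everything outside Solr itself.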


Solr Replication

2009-08-25 Thread J G

Hello,

We are running multiple slices in our environment. I have enabled JMX and I am 
inspecting the replication handler mbean to obtain some information about the 
master/slave configuration for replication. Is the replication handler mbean a 
singleton? I only see one mbean for the entire server and it's picking an 
arbitrary slice to report on. So I'm curious if every slice gets its own 
replication handler mbean? This is important because I have no way of knowing 
in this specific server any information about the other slices, in particular, 
information about the master/slave value for the other slices.

Reading through the Solr 1.4 replication strategy, I saw that a slice can be 
configured to be a master and a slave, i.e. a repeater. I'm wondering how 
repeaters work because let's say I have a slice named 'A' and the master is on 
server 1 and the slave is on server 2 then how are these two servers 
communicating to replicate? Looking at the jmx information I have in the MBean 
both the isSlave and isMaster is set to true for my repeater so how does this 
solr slice know if it's the master or slave? I'm a bit confused.

Thanks.





RE: Solr Replication

2009-08-26 Thread J G

Thanks for the response.

It's interesting because when I run jconsole all I can see is one
ReplicationHandler JMX MBean. It looks like it is defaulting to the first slice
it finds on its path. Is there any way to have multiple replication
handlers, or at least to obtain replication information per "slice"/"instance"
via JMX, like how you can see attributes for each "slice"/"instance" via each
replication admin JSP page?

Thanks again.

> From: noble.p...@corp.aol.com
> Date: Wed, 26 Aug 2009 11:05:34 +0530
> Subject: Re: Solr Replication
> To: solr-user@lucene.apache.org
> 
> The ReplicationHandler is not enforced as a singleton , but for all
> practical purposes it is a singleton for one core.
> 
> If an instance (a slice as you say) is set up as a repeater, it can
> act as both a master and a slave
> 
> in the repeater the configuration should be as follows
> 
> MASTER
>   |_SLAVE (I am a slave of MASTER)
>   |
> REPEATER (I am a slave of MASTER and master to my slaves )
>  |
>  |
> REPEATER_SLAVE( of REPEATER)
> 
> 
> the point is that REPEATER will have a slave section with a masterUrl
> which points to MASTER, and REPEATER_SLAVE will have a slave section
> with a masterUrl pointing to REPEATER
> 
> 
> 
> 
> 
> 
> On Wed, Aug 26, 2009 at 12:40 AM, J G wrote:
> >
> > Hello,
> >
> > We are running multiple slices in our environment. I have enabled JMX and I 
> > am inspecting the replication handler mbean to obtain some information 
> > about the master/slave configuration for replication. Is the replication 
> > handler mbean a singleton? I only see one mbean for the entire server and 
> > it's picking an arbitrary slice to report on. So I'm curious if every slice 
> > gets its own replication handler mbean? This is important because I have no 
> > way of knowing in this specific server any information about the other 
> > slices, in particular, information about the master/slave value for the 
> > other slices.
> >
> > Reading through the Solr 1.4 replication strategy, I saw that a slice can 
> > be configured to be a master and a slave, i.e. a repeater. I'm wondering 
> > how repeaters work because let's say I have a slice named 'A' and the 
> > master is on server 1 and the slave is on server 2 then how are these two 
> > servers communicating to replicate? Looking at the jmx information I have 
> > in the MBean both the isSlave and isMaster is set to true for my repeater 
> > so how does this solr slice know if it's the master or slave? I'm a bit 
> > confused.
> >
> > Thanks.
> >
> >
> >
> >
> > _
> > With Windows Live, you can organize, edit, and share your photos.
> > http://www.windowslive.com/Desktop/PhotoGallery
> 
> 
> 
> -- 
> -
> Noble Paul | Principal Engineer| AOL | http://aol.com


master/slave replication issue

2009-08-26 Thread J G







Hello,

I'm having an issue getting the master to replicate its index to the slave.
Below you will find my configuration settings. Here is what is happening: I can
access the replication dashboard for both the slave and master, and I can
successfully execute HTTP commands against both of these URLs through my
browser. Now, my slave is configured to use the same URL as the one I am using
in my browser when I query the master, yet when I do a tail -f /logs/catalina.out on the slave server all I see is:


Master - server1.xyz.com:

Aug 27, 2009 12:13:29 AM org.apache.solr.core.SolrCore execute

INFO: [] webapp=null path=null params={command=details} status=0 QTime=8

Aug 27, 2009 12:13:32 AM org.apache.solr.core.SolrCore execute

INFO: [] webapp=null path=null params={command=details} status=0 QTime=8

Aug 27, 2009 12:13:34 AM org.apache.solr.core.SolrCore execute

INFO: [] webapp=null path=null params={command=details} status=0 QTime=4

Aug 27, 2009 12:13:36 AM org.apache.solr.core.SolrCore execute

INFO: [] webapp=null path=null params={command=details} status=0 QTime=4

Aug 27, 2009 12:13:39 AM org.apache.solr.core.SolrCore execute

INFO: [] webapp=null path=null params={command=details} status=0 QTime=4

Aug 27, 2009 12:13:42 AM org.apache.solr.core.SolrCore execute

INFO: [] webapp=null path=null params={command=details} status=0 QTime=8

Aug 27, 2009 12:13:44 AM org.apache.solr.core.SolrCore execute

INFO: [] webapp=null path=null params={command=details} status=0 QTime=


For some reason, the webapp and the path are being set to null, and I "think"
this is affecting the replication?!? I am running Solr as the WAR file; it's
1.4 from a few weeks ago.

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="...">optimize</str>
    <str name="...">optimize</str>
  </lst>
</requestHandler>

Notice that I commented out the replication of the configuration files. I
didn't think this was important for the attempt to get replication
working. However, is it good to have these files replicated?


Slave - server2.xyz.com:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://server1.xyz.com:8080/jdoe/replication</str>
    <str name="pollInterval">00:00:20</str>
    <str name="compression">internal</str>
    <str name="httpConnTimeout">5000</str>
    <str name="httpReadTimeout">1</str>
    <str name="httpBasicAuthUser">username</str>
    <str name="httpBasicAuthPassword">password</str>
  </lst>
</requestHandler>




Thanks for your help!





RE: Solr Replication

2009-08-27 Thread J G

We have multiple solr webapps all running from the same WAR file. Each webapp 
is running under the same Tomcat container and I consider each webapp the same 
thing as a "slice" (or "instance"). I've configured the Tomcat container to 
enable JMX and when I connect using JConsole I only see the replication handler 
for one of the webapps in the server. I was under the impression each webapp 
gets its own replication handler. Is this not true? 

It would be nice to be able to have a JMX MBean for each replication handler in
the container so we can get all the same replication information through JMX as
we can through the replication admin page for each webapp.

Thanks.





> From: noble.p...@corp.aol.com
> Date: Thu, 27 Aug 2009 13:04:38 +0530
> Subject: Re: Solr Replication
> To: solr-user@lucene.apache.org
> 
> when you say a slice you mean one instance of solr? So your JMX
> console is connecting to only one solr?
> 
> On Thu, Aug 27, 2009 at 3:19 AM, J G wrote:
> >
> > Thanks for the response.
> >
> > It's interesting because when I run jconsole all I can see is one 
> > ReplicationHandler jmx mbean. It looks like it is defaulting to the first 
> > slice it finds on its path. Is there anyway to have multiple replication 
> > handlers or at least obtain replication on a per "slice"/"instance" via JMX 
> > like how you can see attributes for each "slice"/"instance" via each 
> > replication admin jsp page?
> >
> > Thanks again.
> >
> >> From: noble.p...@corp.aol.com
> >> Date: Wed, 26 Aug 2009 11:05:34 +0530
> >> Subject: Re: Solr Replication
> >> To: solr-user@lucene.apache.org
> >>
> >> The ReplicationHandler is not enforced as a singleton , but for all
> >> practical purposes it is a singleton for one core.
> >>
> >> If an instance  (a slice as you say) is setup as a repeater, It can
> >> act as both a master and slave
> >>
> >> in the repeater the configuration should be as follows
> >>
> >> MASTER
> >>   |_SLAVE (I am a slave of MASTER)
> >>   |
> >> REPEATER (I am a slave of MASTER and master to my slaves )
> >>  |
> >>  |
> >> REPEATER_SLAVE( of REPEATER)
> >>
> >>
> >> the point is that REPEATER will have a slave section has a masterUrl
> >> which points to master and REPEATER_SLAVE will have a slave section
> >> which has a masterurl pointing to repeater
> >>
> >>
> >>
> >>
> >>
> >>
> >> On Wed, Aug 26, 2009 at 12:40 AM, J G wrote:
> >> >
> >> > Hello,
> >> >
> >> > We are running multiple slices in our environment. I have enabled JMX 
> >> > and I am inspecting the replication handler mbean to obtain some 
> >> > information about the master/slave configuration for replication. Is the 
> >> > replication handler mbean a singleton? I only see one mbean for the 
> >> > entire server and it's picking an arbitrary slice to report on. So I'm 
> >> > curious if every slice gets its own replication handler mbean? This is 
> >> > important because I have no way of knowing in this specific server any 
> >> > information about the other slices, in particular, information about the 
> >> > master/slave value for the other slices.
> >> >
> >> > Reading through the Solr 1.4 replication strategy, I saw that a slice 
> >> > can be configured to be a master and a slave, i.e. a repeater. I'm 
> >> > wondering how repeaters work because let's say I have a slice named 'A' 
> >> > and the master is on server 1 and the slave is on server 2 then how are 
> >> > these two servers communicating to replicate? Looking at the jmx 
> >> > information I have in the MBean both the isSlave and isMaster is set to 
> >> > true for my repeater so how does this solr slice know if it's the master 
> >> > or slave? I'm a bit confused.
> >> >
> >> > Thanks.
> >> >
> >> >
> >> >
> >> >
> >> > _
> >> > With Windows Live, you can organize, edit, and share your photos.
> >> > http://www.windowslive.com/Desktop/PhotoGallery
> >>
> >>
> >>
> >> --
> >> -
> >> Noble Paul | Principal Engineer| AOL | http://aol.com
> >
> > _
> > Hotmail® is up to 70% faster. Now good news travels really fast.
> > http://windowslive.com/online/hotmail?ocid=PID23391::T:WLMTAGL:ON:WL:en-US:WM_HYGN_faster:082009
> 
> 
> 
> -- 
> -
> Noble Paul | Principal Engineer| AOL | http://aol.com


ExtractingRequestHandler commitWithin

2009-11-23 Thread j philoon

Any chance of getting the ExtractingRequestHandler to use the commitWithin
parameter?
-- 
View this message in context: 
http://old.nabble.com/ExtractingRequestHandler-commitWithin-tp26478144p26478144.html
Sent from the Solr - User mailing list archive at Nabble.com.



PatternTokenizer question

2009-11-24 Thread j philoon

I have defined a comma-delimited pattern tokenizer as follows:

<fieldType name="..." class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="..."/>
    ...
  </analyzer>
</fieldType>

This appears to work fine when adding documents, since if I add a field
commafld as "word1,WORD2,word 3" I see terms in the index as expected:
"word1", "word2", and "word 3".

When I query, I am expecting that the same tokenization would take place, so
a query that has 'commafld:(word 3)' would match term "word 3".  However, I
find I have to submit the query as 'commafld:("word 3")'.  That is, it seems
as if whitespace tokenization is taking place, not the comma-delimited
tokenization.

Am I misunderstanding what should be happening or making some basic mistake? 
Thanks. 
-- 
View this message in context: 
http://old.nabble.com/PatternTokenizer-question-tp26497675p26497675.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: PatternTokenizer question

2009-11-24 Thread j philoon

I think the answer to my question is contained in the wiki when discussing
the SynonymFilter, "The Lucene QueryParser tokenizes on white space before
giving any text to the Analyzer".  This would indeed explain what I am
getting.  Next question - can I avoid that behavior?
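
(For what it's worth, one way around the query-side whitespace split is to escape the space, so the QueryParser hands the whole value to the analyzer in one chunk, e.g.:

commafld:word\ 3

The quoted form 'commafld:("word 3")' already found above achieves the same thing.)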


j philoon wrote:
> 
> I have defined a comma-delimited pattern tokenizer as follows:
> <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="solr.PatternTokenizerFactory" pattern="..."/>
>     ...
>   </analyzer>
> </fieldType>
> This appears to work fine when adding documents, since if I add a field
> commafld as "word1,WORD2,word 3" I see terms in the index as expected:
> "word1", "word2", and "word 3".
> 
> When I query, I am expecting that the same tokenization would take place,
> so a query that has 'commafld:(word 3)' would match term "word 3". 
> However, I find I have to submit the query as 'commafld:("word 3")'.  That
> is, it seems as if whitespace tokenization is taking place, not the
> comma-delimited tokenization.
> 
> Am I misunderstanding what should be happening or making some basic
> mistake?  Thanks. 
> 

-- 
View this message in context: 
http://old.nabble.com/PatternTokenizer-question-tp26497675p26503324.html
Sent from the Solr - User mailing list archive at Nabble.com.



solr jmx connection

2009-07-10 Thread J G

 Hello,

I have a SOLR JMX connection issue. I am running my JMX MBeanServer through
Tomcat, meaning I am using Tomcat's MBeanServer rather than any other
MBeanServer implementation.
I am having a hard time trying to figure out the correct JMX service URL on my
localhost for accessing the SOLR MBeans. My current configuration consists
of the following:

JMX Service url = localhost:9000/jmxrmi

So I have configured JMX to run on port 9000 on Tomcat on my localhost, and
using the above service URL I can access the Tomcat JMX MBeanServer and get
related JVM object information (e.g. I can access the MemoryMXBean object).

However, I am having a harder time trying to access the SOLR MBeans. First, I 
could have the wrong service URL. Second, I'm confused as to which MBeans SOLR 
provides.

You might be asking why am I creating my own client rather than using JConsole, 
but JConsole doesn't provide the features I need.

Anyone with any knowledge or code snippets would be a huge help!

Thank you for your time!

Regards
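
(A minimal standalone client along the lines described might look like the
sketch below. The service URL matches the setup above; the "solr" domain
prefix is an assumption - with <jmx/> enabled in solrconfig.xml, Solr
registers its MBeans under a domain named after the core, e.g. "solr/corename".)

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class SolrJmxClient {
    public static void main(String[] args) throws Exception {
        // The RMI connector Tomcat publishes on port 9000, per the setup above
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://localhost:9000/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        MBeanServerConnection mbsc = connector.getMBeanServerConnection();
        // List every MBean whose domain starts with "solr" to see what is exposed
        for (ObjectName name : mbsc.queryNames(new ObjectName("solr*:*"), null)) {
            System.out.println(name);
        }
        connector.close();
    }
}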




JMX monitoring for multiple SOLR instances

2009-07-14 Thread J G

Hi,

If I want to run multiple SOLR war files in tomcat is it possible to monitor 
each of the SOLR instances individually through JMX? Has anyone attempted this 
before? Also, what are the implications (e.g. performance) of runnign mulitple 
SOLR instances in the same tomcat server?

Thanks.





Obtaining SOLR index size on disk

2009-07-17 Thread J G

Hello,

Is it possible to obtain the SOLR index size on disk through the SOLR API? I've 
read through the docs and mailing list questions but can't seem to find the 
answer.

Any help is appreciated.

Thanks.
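
(One place this is exposed is the replication handler's details command - a
sketch, assuming the handler is enabled in solrconfig.xml:

http://localhost:8983/solr/replication?command=details

whose response includes an indexSize entry.)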




Search Multiple indexes In Solr

2007-11-07 Thread j 90
Hi, I'm new to Solr but very familiar with Lucene.

Is there a way to have Solr search in more than one index, much like the
MultiSearcher in Lucene?

If so, how do I configure the location of the indexes?
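
(A sketch of how this ends up working with Solr's distributed search: each
index lives in its own core, and the shards parameter fans a query out and
merges the results, much like MultiSearcher - host and core names here are
invented:

http://localhost:8983/solr/core1/select?q=solr&shards=localhost:8983/solr/core1,localhost:8983/solr/core2

The shard list can also be set as a default on the request handler in
solrconfig.xml.)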


Re: Solr 1.3 expected release date

2007-12-19 Thread j 90
Thumbs up - w00t

On Dec 14, 2007 1:21 AM, Norberto Meijome <[EMAIL PROTECTED]> wrote:

> On Wed, 12 Dec 2007 20:04:00 -0500
> "Norskog, Lance" <[EMAIL PROTECTED]> wrote:
>
> > ... SOLR-303 (Distributed Search over HTTP)...
> >
> > Woo-hoo!
>
> hear hear!!!
> _
> {Beto|Norberto|Numard} Meijome
>
> Your reasoning is excellent -- it's only your basic assumptions that are
> wrong.
>
> I speak for myself, not my employer. Contents may be hot. Slippery when
> wet. Reading disclaimers makes you go blind. Writing them is worse. You have
> been Warned.
>


i think it is time to release new solr version

2008-01-27 Thread j . L
because Lucene 2.3.0 was released today..



-- 
regards
j.L


Re: question about fl=score

2008-03-20 Thread j . L
2008/3/20 李银松 <[EMAIL PROTECTED]>:

> 1. When I set fl=score, Solr returns the same as fl=*,score, not just scores.
> Is it a bug, or is it done on purpose?


You can set fl=id,score; Solr does not support the form fl=score on its own.


> My customer wants to get the 1st-10010th added docs,
> so I have to sort by timestamp to get the top 10010 docs' timestamps ……


 limit 1, 10010 order by timestamp?


-- 
regards
j.L
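
(In Solr terms that pagination is start/rows over a timestamp sort - a
sketch, assuming a field named timestamp:

  /select?q=*:*&sort=timestamp+asc&start=10000&rows=10

though deep paging to large start offsets gets expensive.)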


Re: Chinese Language + Solr

2008-05-14 Thread j . L
You can try je-analyzer - I am building a 17M-doc search site with Solr and
je-analyzer.

On Thu, May 15, 2008 at 6:44 AM, Walter Underwood <[EMAIL PROTECTED]>
wrote:

> N-gram works pretty well for Chinese, there are even studies to
> back that up.
>
> Do not use the N-gram matches for highlighting. They look really
> stupid to native speakers.
>
> wunder
>
> On 5/14/08 2:03 PM, "Otis Gospodnetic" <[EMAIL PROTECTED]> wrote:
>
> > There are no free morphological analyzers for Chinese (are there for any
> > language?) that I know.  People tend to use one of the n-gram analyzers
> from
> > Lucene contrib.  I've used them before and they do OK.
> >
> >
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >
> >
> > - Original Message 
> >> From: Francisco Sanmartin <[EMAIL PROTECTED]>
> >> To: solr-user@lucene.apache.org
> >> Sent: Wednesday, May 14, 2008 4:54:05 PM
> >> Subject: Chinese Language + Solr
> >>
> >> I have had successful experiences using Sorl with an English website,
> >> and now I am going to deploy Solr in a chinese site. I've been looking
> >> in the mailing list and there are some useful information in the old
> posts.
> >> But, we would like some kind of feedback of the people who already have
> >> deployed Solr in any CJK Language.
> >>
> >> Is there any free and good analyzer? (Preferible morphological)
> >> Among all the commercial analyzers, what would you recommend? Is there
> >> any of them that works ok out-of-the-box with Solr?
> >>
> >> Thanks in advance.
> >>
> >> Pako
> >
>
>


-- 
regards
j.L
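
(For reference, a bigram field of the kind discussed can be declared with
the CJK tokenizer from Lucene's contrib analyzers bundled with this era's
Solr - a sketch, the type name being arbitrary:

<fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.CJKTokenizerFactory"/>
  </analyzer>
</fieldType>
)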


Re: Chinese Language + Solr

2008-05-14 Thread j . L
For commercial analyzers, I recommend http://www.hylanda.com/ (it is the best
Chinese word analyzer).

On Thu, May 15, 2008 at 8:32 AM, j. L <[EMAIL PROTECTED]> wrote:

> u can try je-analyzer,,,i  building 17m docs search site by solr and
> je-analyzer
>
>
> On Thu, May 15, 2008 at 6:44 AM, Walter Underwood <[EMAIL PROTECTED]>
> wrote:
>
>> N-gram works pretty well for Chinese, there are even studies to
>> back that up.
>>
>> Do not use the N-gram matches for highlighting. They look really
>> stupid to native speakers.
>>
>> wunder
>>
>> On 5/14/08 2:03 PM, "Otis Gospodnetic" <[EMAIL PROTECTED]>
>> wrote:
>>
>> > There are no free morphological analyzers for Chinese (are there for any
>> > language?) that I know.  People tend to use one of the n-gram analyzers
>> from
>> > Lucene contrib.  I've used them before and they do OK.
>> >
>> >
>> > Otis
>> > --
>> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>> >
>> >
>> > - Original Message 
>> >> From: Francisco Sanmartin <[EMAIL PROTECTED]>
>> >> To: solr-user@lucene.apache.org
>> >> Sent: Wednesday, May 14, 2008 4:54:05 PM
>> >> Subject: Chinese Language + Solr
>> >>
>> >> I have had successful experiences using Sorl with an English website,
>> >> and now I am going to deploy Solr in a chinese site. I've been looking
>> >> in the mailing list and there are some useful information in the old
>> posts.
>> >> But, we would like some kind of feedback of the people who already have
>> >> deployed Solr in any CJK Language.
>> >>
>> >> Is there any free and good analyzer? (Preferible morphological)
>> >> Among all the commercial analyzers, what would you recommend? Is there
>> >> any of them that works ok out-of-the-box with Solr?
>> >>
>> >> Thanks in advance.
>> >>
>> >> Pako
>> >
>>
>>
>
>
> --
> regards
> j.L




-- 
regards
j.L


Re: Chinese Language + Solr

2008-05-14 Thread j . L
If you can read Chinese and want to write your own Chinese analyzer, maybe you
can see this: http://www.googlechinablog.com/2006/04/blog-post_10.html



2008/5/15 j. L <[EMAIL PROTECTED]>:

> if commercial analyzers, i recommend http://www.hylanda.com/ (it is the best
> analyzer in chinese word)
>
>
> On Thu, May 15, 2008 at 8:32 AM, j. L <[EMAIL PROTECTED]> wrote:
>
>> u can try je-analyzer,,,i  building 17m docs search site by solr and
>> je-analyzer
>>
>>
>> On Thu, May 15, 2008 at 6:44 AM, Walter Underwood <[EMAIL PROTECTED]>
>> wrote:
>>
>>> N-gram works pretty well for Chinese, there are even studies to
>>> back that up.
>>>
>>> Do not use the N-gram matches for highlighting. They look really
>>> stupid to native speakers.
>>>
>>> wunder
>>>
>>> On 5/14/08 2:03 PM, "Otis Gospodnetic" <[EMAIL PROTECTED]>
>>> wrote:
>>>
>>> > There are no free morphological analyzers for Chinese (are there for
>>> any
>>> > language?) that I know.  People tend to use one of the n-gram analyzers
>>> from
>>> > Lucene contrib.  I've used them before and they do OK.
>>> >
>>> >
>>> > Otis
>>> > --
>>> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>> >
>>> >
>>> > - Original Message 
>>> >> From: Francisco Sanmartin <[EMAIL PROTECTED]>
>>> >> To: solr-user@lucene.apache.org
>>> >> Sent: Wednesday, May 14, 2008 4:54:05 PM
>>> >> Subject: Chinese Language + Solr
>>> >>
>>> >> I have had successful experiences using Sorl with an English website,
>>> >> and now I am going to deploy Solr in a chinese site. I've been looking
>>> >> in the mailing list and there are some useful information in the old
>>> posts.
>>> >> But, we would like some kind of feedback of the people who already
>>> have
>>> >> deployed Solr in any CJK Language.
>>> >>
>>> >> Is there any free and good analyzer? (Preferible morphological)
>>> >> Among all the commercial analyzers, what would you recommend? Is there
>>> >> any of them that works ok out-of-the-box with Solr?
>>> >>
>>> >> Thanks in advance.
>>> >>
>>> >> Pako
>>> >
>>>
>>>
>>
>>
>> --
>> regards
>> j.L
>
>
>
>
> --
> regards
> j.L




-- 
regards
j.L


Re: Chinese Language + Solr

2008-05-14 Thread j . L
I don't know the cost.

I know the bigger Chinese search engines use it.

Most Chinese people who study and use full-text search think it is the best
Chinese analyzer you can buy.

Baidu (www.baidu.com) is the biggest Chinese search engine, and Google China
is No. 2.

Baidu does not use it (http://www.hylanda.com/);
they use their own Chinese analyzer.




On Thu, May 15, 2008 at 8:45 AM, Otis Gospodnetic <
[EMAIL PROTECTED]> wrote:

> Out of curiosity, what's the cost (the site is in Chinese, so I can't tell
> :( )?
> BasisTech are the main people for this type of stuff.  Expensive, though, I
> believe.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>


-- 
regards
j.L


anyone use hadoop+solr?

2008-05-19 Thread j . L
can u talk about it ?

maybe i will use hadoop + solr.

thks for ur advice.



-- 
regards
j.L


Re: Chinese Language + Solr

2008-05-28 Thread j . L
On Thu, May 15, 2008 at 11:25 PM, Walter Underwood <[EMAIL PROTECTED]>
wrote:

> I've worked with the Basis products. Solid, good support.
> Last time I talked to them, they were working on hooking
> them into Lucene.
>

I don't know the Basis product, but I know Google uses it, and in China
google.cn is not better than Baidu.
We always use baidu.com to search for Chinese information.


>
> For really good quality results from any of these, you need
> to add terms to the user dictionary of the segmenter. These
> may be local jargon, product names, personal names, place
> names, etc.
>

Yes, I agree with your point.

Baidu's analyzer works that way, from what I have learned on the Internet.


>
> Baidu has different problems than the rest of us, because
> their code has to be scary fast. They might even trade
> lower quality for more speed.
>

Can you say more about that?
I think Baidu uses more cache servers and has an effective cache strategy.


>
> wunder
>
>


-- 
regards
j.L


Re: Deleting Solr index

2008-06-18 Thread j . L
just rm -r SOLR_DIR/data/index.
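
(Alternatively, a delete-by-query plus a commit clears the index without
touching the filesystem - a sketch, assuming the default /update handler:

curl http://localhost:8983/solr/update -H 'Content-type:text/xml' --data-binary '<delete><query>*:*</query></delete>'
curl http://localhost:8983/solr/update -H 'Content-type:text/xml' --data-binary '<commit/>'
)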


2008/6/18 Mihails Agafonovs <[EMAIL PROTECTED]>:

> How can I clear the whole Solr index?
>  Ar cieņu, Mihails




-- 
regards
j.L


Importing a csv file encapsulated by " creates a large copyField field of all fields combined.

2019-10-21 Thread rhys J
I am trying to import a csv file to my solr core.

It looks like this:

"user_id","name","email","client","classification","default_client","disabled","dm_password","manager"
"A2M","Art Morse","amo...@morsemoving.com","Morse
Moving","Morse","","X","blue0show",""
"ABW","Amy Wiedner","amy.wied...@pyramid-logistics.com","Pyramid","","","
","shawn",""
"J2P","Joan Padal","jo...@bergerallied.com","Berger","","","
","skew3cues",""
"ALB","Anna Bachman","an...@bergerallied.com","Berger","","","
","wary#scan",""
"B1B","Bridget Baker","bba...@reliablevan.com","Reliable","","","
","laps,hear",""
"B1K","Bev Klein"," ","Nor-Cal","",""," ","pipe3hour",""
"B1L","Beverly Leonard","bleon...@reliablevan.com","Reliable","","","
","gail6copy",""
"CMD","Christal Davis","christalda...@smmoving.com","SMMoving","","","
","risk-pair",""
"BEB","Bob Barnum","b...@bergerts.com","Berger","",""," ","mets=pol",""

I have set up the schema via the API, and have all the fields that are
listed on the top line of the csv file.

When I finish the import, it returns no errors. But when I go to look at
the schema, it has created 2 fields in the managed-schema file:

<field name="_user_id___name___email___client___classification___default_client___disabled___dm_password___manager_" type="text_general"/>

and

<copyField source="_user_id___name___email___client___classification___default_client___disabled___dm_password___manager_" dest="_user_id___name___email___client___classification___default_client___disabled___dm_password___manager__str" maxChars="256"/>


Re: Importing a csv file encapsulated by " creates a large copyField field of all fields combined.

2019-10-21 Thread rhys J
I am using this command:

curl '
http://localhost:8983/solr/users/update/csv?commit=true&separator=%09&encapsulator=%20&escape=\&stream.file=/tmp/users.csv
'

On Mon, Oct 21, 2019 at 1:22 PM Alexandre Rafalovitch 
wrote:

> What command do you use to get the file into Solr? My guess that you
> are somehow not hitting the correct handler. Perhaps you are sending
> it to extract handler (designed for PDF, MSWord, etc) rather than the
> correct CSV handler.
>
> Solr comes with the examples of how to index CSV command.
> See for example:
>
> https://github.com/apache/lucene-solr/blob/master/solr/example/films/README.txt#L39
> Also reference documentation:
>
> https://lucene.apache.org/solr/guide/8_1/uploading-data-with-index-handlers.html
>
> Regards,
>Alex.
>
> On Mon, 21 Oct 2019 at 13:04, rhys J  wrote:
> >
> > I am trying to import a csv file to my solr core.
> >
> > It looks like this:
> >
> >
> "user_id","name","email","client","classification","default_client","disabled","dm_password","manager"
> > "A2M","Art Morse","amo...@morsemoving.com","Morse
> > Moving","Morse","","X","blue0show",""
> > "ABW","Amy Wiedner","amy.wied...@pyramid-logistics.com
> ","Pyramid","","","
> > ","shawn",""
> > "J2P","Joan Padal","jo...@bergerallied.com","Berger","","","
> > ","skew3cues",""
> > "ALB","Anna Bachman","an...@bergerallied.com","Berger","","","
> > ","wary#scan",""
> > "B1B","Bridget Baker","bba...@reliablevan.com","Reliable","","","
> > ","laps,hear",""
> > "B1K","Bev Klein"," ","Nor-Cal","",""," ","pipe3hour",""
> > "B1L","Beverly Leonard","bleon...@reliablevan.com","Reliable","","","
> > ","gail6copy",""
> > "CMD","Christal Davis","christalda...@smmoving.com","SMMoving","","","
> > ","risk-pair",""
> > "BEB","Bob Barnum","b...@bergerts.com","Berger","",""," ","mets=pol",""
> >
> > I have set up the schema via the API, and have all the fields that are
> > listed on the top line of the csv file.
> >
> > When I finish the import, it returns no errors. But when I go to look at
> > the schema, it's created a 2 fields in the managed-schema file:
> >
> > <field name="_user_id___name___email___client___classification___default_client___disabled___dm_password___manager_" type="text_general"/>
> >
> > and
> >
> > <copyField source="_user_id___name___email___client___classification___default_client___disabled___dm_password___manager_" dest="_user_id___name___email___client___classification___default_client___disabled___dm_password___manager__str" maxChars="256"/>
>


Re: Importing a csv file encapsulated by " creates a large copyField field of all fields combined.

2019-10-21 Thread rhys J
Thank you, that worked perfectly. I can't believe I didn't notice the
separator was a tab.
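
For the record, the corrected command looks like this - a sketch, with %2C
for the comma separator and %22 for the quote encapsulator:

curl 'http://localhost:8983/solr/users/update/csv?commit=true&separator=%2C&encapsulator=%22&stream.file=/tmp/users.csv'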


using the df parameter to set a default to search all fields

2019-10-22 Thread rhys J
 How do I make Solr search on all fields in a document?

I read the documentation about the df parameter, and added the following to my
solrconfig.xml:

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="df">_text_</str>
  </lst>
</requestHandler>

In my managed-schema file I have the following:

<field name="_text_" type="text_general" indexed="true" stored="false" multiValued="true"/>

I have deleted the documents, and re-indexed the csv file.

When I do a search in the api for: _text_:amy - which should return 2
documents, I get nothing.

If I do a search for 'amy' in the q field, I still get nothing.

If I do an explicit search for name:amy, I get 2 documents returned.


Re: using the df parameter to set a default to search all fields

2019-10-22 Thread rhys J
> Solr does not have a way to ask for all fields on a search.  If you use
> the edismax query parser, you can specify multiple fields with the qf
> parameter, but there is nothing you can put in that parameter as a
> shortcut for "all fields."  Using qf with multiple fields is the
> cleanest way to do this.
>
>
How would I enter qf parameters in the solrconfig.xml?
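
(For reference, a sketch of one way to do that - defType and qf set as
handler defaults, with field names borrowed from this thread:

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">name email client</str>
  </lst>
</requestHandler>
)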


> Probably what you are looking for here is to set up one or more
> copyField definitions in your schema, which are configured to copy one
> or more of your other fields to _text_ so it can be searched as a
> catchall field.  I find it useful to name that field "catchall" rather
> than something like _text_ which seems like a special field name, but
> isn't.
>

I did as you suggested, and created a field called 'all_fields' and added
copyFields too. I re-indexed, and this works when i do the search.

Thanks

Rhys


Parts of the Json response to a curl query are arrays, and parts are hashes

2019-10-25 Thread rhys J
Is there some reason that text_general fields are returned as arrays, and
other fields are returned as hashes in the json response from a curl query?

Here's my curl query:

curl "http://10.40.10.14:8983/solr/dbtr/select?indent=on&q=debtor_id:393291";

Here's the response:

response":{"numFound":1,"start":0,"docs":[
  {
"agent":[" "],
"assign_id":["587"],
"client_group":[" "],
"credit_hold":false,
"credit_limit":0.0,
"credit_terms":["N30"],
"currency":["USD"],
"debtor_id":"393291",
"dl1":["165743"],
"dl2":["Great Plains"],
"do_not_call":false,
"do_not_report":false,
"in_aris_date":"2009-10-19T00:00:00Z",
"name1":["CRATE & BARREL"],
"name2":[" "],
"next_contact_date":"2019-10-17T00:00:00Z",
"parent_customer_number":["215976"],
"potential_bad_debt":true,
"priority_followup":false,
"reference_no":["165743"],
"report_as":"CRATE & BARREL",
"report_status":[" "],
"risk":["Low"],
"rms_acct_id":["Berger"],
"salesperson":["Corp House"],
"ssn1":["32"],
"ssn2":["EXEMPT"],
"status_code":["173"],
"status_date":"2016-05-12T00:00:00Z",
"watch_list":[0],
"_version_":1648384727255613441,
"data_signature":"f020b831dd6e553eed217125de13de850d1f4bbc"}]
  }}

As you can see, dates and booleans are hashes, and the text_general fields
(the only thing I can think of that is different) are arrays.

Why is this, and how can i make it return just a hash for the code to
handle?

One thing I did notice in the schema API is that even though I did not
choose MultiValued, it's set to true.

Is this a bug?

Thanks,

Rhys


Re: Parts of the Json response to a curl query are arrays, and parts are hashes

2019-10-25 Thread rhys J
> 
>
> >  "dl2":["Great Plains"],
> >  "do_not_call":false,
>
> There are no hashes inside the document.  If there were, they would be
> surrounded by {} characters.  The whole document is a hash, which is why
> it has {} characters.  Referring to the snippet that I included above,
> dl2 is mapped in the hash to an array, and do_not_call is mapped to a
> boolean, not a hash.
>
> When there is an array in search results, it happens because the field
> is multiValued ... even if there is only one value, it is placed in an
> array for consistency.
>

So I went back to one of the fields that is multi-valued, which I
explicitly did not choose when I created the field, and I re-created it.

It still made the field multi-valued as true.

Why is this?

Thanks,

Rhys

>
> Thanks,
> Shawn
>


Re: Parts of the Json response to a curl query are arrays, and parts are hashes

2019-10-28 Thread rhys J
> Did you reload the core/collection or restart Solr so the new schema
> would take effect? If it's SolrCloud, did you upload the changes to
> zookeeper and then reload the collection?  SolrCloud does not use config
> files on disk.
>

So I have not done this part yet, but I noticed some things in the
managed-schema.

The first was this (I did verify that the version of the schema is
up-to-date; I am doing an out-of-the-box install of the latest Solr release).

I checked all the fields that I created (I will paste them below), and they
are NOT multi-valued. However, text_general is set to multi-valued as a
default?

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.SynonymGraphFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Here are some of the fields I created through the API. When I created them,
I did NOT check the multi-valued box at all. However, when I then go to
look at the field through the API, it is marked Multi-valued. I am assuming
this is because of the fieldType definition above? Why is this set to
default to Multi-valued?

Will I break Solr if i change this to default to not multi-valued?

Thanks,

Rhys
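
(On the last question: a field-level attribute overrides the fieldType
default, so a declaration like the sketch below keeps the text_general
analysis but makes that one field single-valued, without touching the type
itself - field name borrowed from this thread:

<field name="name1" type="text_general" multiValued="false" indexed="true" stored="true"/>
)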


Re: Parts of the Json response to a curl query are arrays, and parts are hashes

2019-10-28 Thread rhys J
I forgot to include the fields created through the API:

[field definitions stripped by the mailing list archive]

Thanks,

Rhys

On Mon, Oct 28, 2019 at 11:30 AM rhys J  wrote:

>
>
>> Did you reload the core/collection or restart Solr so the new schema
>> would take effect? If it's SolrCloud, did you upload the changes to
>> zookeeper and then reload the collection?  SolrCloud does not use config
>> files on disk.
>>
>
> So I have not done this part yet, but I noticed some things in the
> managed-schema.
>
>  the first was this (I did verify that the version of the schema is
> up-to-date. I am doing an out of the box install of the latest Solr release.
>
> I checked all the fields that I created (I will paste them below), and
> they are NOT multi-valued. However, text_general is set to multi-valued as
> a default?
>
> <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
>     <filter class="solr.SynonymGraphFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> Here are some of the fields I created through the API. When I created
> them, I did NOT check the multi-valued box at all. However, when I then go
> to look at the field through the API, it is marked Multi-valued. I am
> assuming this is because of the fieldType definition above? Why is this set
> to default to Multi-valued?
>
> Will I break Solr if i change this to default to not multi-valued?
>
> Thanks,
>
> Rhys
>


creating a core with a custom managed-schema

2019-11-04 Thread rhys J
I have created a tmp directory where I want to have reside custom
managed-schemas to use when creating cores.

/tmp/solr_schema/CORENAME/managed-schema

Based on this page:
https://lucene.apache.org/solr/guide/7_0/coreadmin-api.html#coreadmin-create
, I am running the following command:

sudo -u solr /opt/solr/bin/solr create -c dbtrphon -schema
/tmp/solr_schemas/dbtrphon/managed-schema

I get this error:

ERROR: Unrecognized or misplaced argument: -schema!

How can I create a core with a custom managed-schema?

I'm trying to implement solr in a development environment, but I would like
to have custom schemas, so that when we move it to live, we don't have to
recreate the schemas by hand again.

Thanks,

Rhys




Re: creating a core with a custom managed-schema

2019-11-04 Thread rhys J
On Mon, Nov 4, 2019 at 1:36 PM Erick Erickson 
wrote:

> Well, just what it says. -schema isn’t a recognized parameter, where did
> you get it? Did you try bin/solr create -help and follow the instructions
> there?
>
> I am confused.

This page:
https://lucene.apache.org/solr/guide/7_0/coreadmin-api.html#coreadmin-create

says that schema is a valid parameter, and it explains how to use it.

But when I use the command create, I get an error.

Is there no way to use a custom schema to create a core from the command
line? Will I always have to either hand edit the managed-schema, or use the
API?

Thanks,

Rhys
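
(What does work from the command line is handing create a whole config
directory with -d; the directory contains solrconfig.xml and the
managed-schema - a sketch against the paths above:

sudo -u solr /opt/solr/bin/solr create -c dbtrphon -d /tmp/solr_schemas/dbtrphon/

The schema parameter on the page referenced above belongs to the CoreAdmin
HTTP API, not to the bin/solr script.)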


Using solr API to return csv results

2019-11-07 Thread rhys J
If I am using the Solr API to query the core, is there a way to tell how
many documents are found if I use wt=csv?

Thanks,

Rhys
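
(The CSV writer returns only the rows themselves, so one workaround - a
sketch - is to ask twice: once with rows=0&wt=json just to read numFound,
then with wt=csv for the data:

curl 'http://localhost:8983/solr/dbtr/select?q=...&rows=0&wt=json'
curl 'http://localhost:8983/solr/dbtr/select?q=...&rows=1000&wt=csv'
)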


different results in numFound vs using the cursor

2019-11-11 Thread rhys J
I am using this logic in Perl:

# $decoded/$numFound here come from an initial query issued earlier
my $decoded = decode_json( $solrResponse->{_content} );
my $numFound = $decoded->{response}{numFound};

$cursor = "*";
$prevCursor = '';

# loop until the returned cursorMark stops changing
while ( $prevCursor ne $cursor )
{
    my $solrURI = "\"http://[SOLR URL]:8983/solr/";
    $solrURI .= $fdat{core};

    $solrSort    = ( $fdat{core} eq 'dbtr' ) ? "debtor_id+asc" : "id+asc";
    $solrOptions = "/select?indent=on&rows=$getrows&sort=$solrSort&q=";
    $solrURI .= $solrOptions;
    $solrURI .= $query;

    $solrURI .= ( $prevCursor eq '' ) ? "&cursorMark=*\"" : "&cursorMark=$cursor\"";

    print STDERR "solrURI '$solrURI'\n";
    my $solrResponse = $ua->post( $solrURI );
    my $decoded  = decode_json( $solrResponse->{_content} );
    my $numFound = $decoded->{response}{numFound};

    foreach my $d ( $decoded->{response}{docs} )
    {
        my @docs = @$d;
        print STDERR "size of docs '" . scalar( @docs ) . "'\n";
        foreach my $r ( @docs )
        {
            if ( $fdat{cust_num} and $fdat{core} eq 'dbtr' )
            {
                push( @solrResults, $r->{debtor_id} );
            }
            elsif ( $fdat{cust_num} and $fdat{core} eq 'debt' )
            {
                push( @solrResults, $r->{debt_id} );
            }
        }
    }

    $prevCursor = ( $prevCursor eq '' ) ? "*" : $cursor;
    $cursor     = $decoded->{nextCursorMark};
    print STDERR "cursor '$cursor'\n";
    print STDERR "prevCursor '$prevCursor'\n";
    print STDERR "size of solrResults '" . scalar( @solrResults ) . "'\n";
}

print out:

http://[SOLR URL]:8983/solr/debt/select?indent=on&rows=1000&sort=id+asc&q=debt_id: 608384 OR debt_id: 393291&cursorMark=AoEmMzkzMjkx

The numFound: 35008
final size of solrResults: 22006

Am I missing something I should be using with cursorMark? Or is this
expected?

I've checked my logic, and I'm using the cursors the way this page is using
them in examples:

https://lucene.apache.org/solr/guide/6_6/pagination-of-results.html

Thanks

Rhys


Re: different results in numFound vs using the cursor

2019-11-12 Thread rhys J
On Mon, Nov 11, 2019 at 8:32 PM Chris Hostetter 
wrote:

>
> Based on the info provided, it's hard to be certain, but reading between
> the lines here are hte assumptions i'm making...
>
> 1) your core name is "dbtr"
> 2) the uniqueId field for the "dbtr" core is "debtor_id"
>
> ..are those assumptions correct?
>

Yes they are. Sorry I didn't provide that from the beginning.


> Two key pieces of information that doesn't seem to be assumable from the
> imfo you've provided:
>
> a) What is the fieldType of the uniqueKey field in use?
>

It is a textField


> b) how are you determining that "The numFound: 35008"
>
>
I do a preliminary query to the solr core and print out the numFound from
this:

 my $solrResponse = $ua->post( $solrURI );

 my $decoded = decode_json( $solrResponse->{_content} );
 my $numFound = $decoded->{response}{numFound};


> ...
>
> You show the code that prints out "size of solrResults: 22006" but nothing
> in your code ever prints $numFound.  there is a snippet of code at the top
>

I am printing numFound every time it loops. This should remain constant,
because it is the total of all documents found. It's not really necessary
that I am printing it.

The number of docs is the size that I also print, and that is 1000 every
time, until the last little bit, and then it is 6 docs found.


> of your perl logic that seems disconnected from the rest of the code which
> makes me think that before you do anything with a cursor you are already
> parsing some *other* query response to get $numFound that way...
>
>
I am running this query first, to get the cursor set:

"http://10.40.10.14:8983/solr/debt/select?indent=on&rows=1000&sort=id
asc&q=debt_id: 608384 OR debt_id: 393291&cursorMark=*"

This sets the cursor, and then returns a cursorMark that I start using in
order to grab 1000 documents at a time.



> ...what exactly does all the code *before* this look like? what is the
> request that you are using to get that initial '$solrResponse' that you
> are parsing to extract '$numFound'  are you sure it's exactly the same as
> the query whose cursor you are iterating over?
>
>
query from before the loop:

"http://10.40.10.14:8983/solr/debt/select?indent=on&rows=1000&sort=id
asc&q=debt_id: 608384 OR debt_id: 393291&cursorMark=*"

query in the loop:

http://10.40.10.14:8983/solr/debt/select?indent=on&rows=1000&sort=id+asc&q=debt_id:
608384 OR debt_id: 393291&cursorMark=AoElMTg1MzE=

I do have some logic to make sure i grab the first 1000 from the first
query, but other than that, it's a simple loop.


> It looks like you are (also) extracting 'my $numFound =
> $decoded->{response}{numFound};' on every (cusor) request ... what do you
> get if add this to your cursor loop...
>
>print STDERR "numFound = $numFound at '$cursor'";
>
> numFound is always 35008 because that is how many total documents are
found. The number of docs in the response is the number that I care about,
because that shows me how many came back for this slice.


> ...because unless documents are being added/deleted as you iterate over
> hte cursor, the numFound value should be consistent on each request.
>
>
numFound is consistently 35008.

Thanks

Rhys


using fq means no results

2019-11-12 Thread rhys J
If I do this query in the browser:

http://10.40.10.14:8983/solr/debt/select?q=(clt_ref_no:+owl-2924-8)^=1.0+clt_ref_no:owl-2924-8

I get 84662 results.

If I do this query:

http://10.40.10.14:8983/solr/debt/select?q=(clt_ref_no:+owl-2924-8)^=1.0+clt_ref_no:owl-2924-8&fq=clt_ref_no

I get 0 results.

Why does using fq do this?

What am I missing in my query?

Thanks,

Rhys


Re: using fq means no results

2019-11-12 Thread rhys J
On Tue, Nov 12, 2019 at 11:57 AM Erik Hatcher 
wrote:

> fq is a filter query, and thus narrows the result set provided by the q
> down to what also matches all specified fq's.
>
>
So this can be used instead of scoring? Or alongside scoring?


> You gave it a query, "clt_ref_no", which literally looks for that string
> in your default field.   Looking at your q parameter, clt_ref_no looks like
> a field name, and your fq should probably also have a value for that field
> (say fq=clt_ref_no:owl-2924-8)
>
> Use debug=true to see how your q and fq's are parsed, and that
> should shed some light on the issue.
>
>
Thank you for your help!

Rhys


Re: different results in numFound vs using the cursor

2019-11-12 Thread rhys J
On Tue, Nov 12, 2019 at 12:18 PM Chris Hostetter 
wrote:

>
> : > a) What is the fieldType of the uniqueKey field in use?
> : >
> :
> : It is a textField
>
> whoa... that's not normal .. what *exactly* does the fieldType declaration
> (with all analyzers) look like, and what does the <field> declaration
> look like?
>
>

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.SynonymGraphFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="id" type="text_general" indexed="true" stored="true"/>



> you should really never use TextField for a uniqueKey ... it's possible,
> but incredibly tricky to get "right".
>
>
I am going to adjust my schema, re-index, and try again. See if that
doesn't fix this problem. I didn't know that having the uniqueKey be a
textField was a bad idea.


> Independent from that, "sorting" on a TextField doesn't always do what you
> might think (again: depending on the analysis in use)
>
> With a cursorMark you have other factors to consider: i bet what's
> happening is that the post-analysis terms for your docs result it
> duplicate values, so the cursorMark is skipping all docs that have hte
> same (post analysis) sort value ... this could also manifest itself in
> other weird ways, like trying to deleteById.
>
> Step #1: switch to using a simple StrField for your uniqueKey field and
> see if htat solves all your problems.
>
>
Thanks, doing this now.

Rhys


Re: different results in numFound vs using the cursor

2019-11-12 Thread rhys J
> : I am going to adjust my schema, re-index, and try again. See if that
> : doesn't fix this problem. I didn't know that having the uniqueKey be a
> : textField was a bad idea.
>
>
> https://lucene.apache.org/solr/guide/8_3/other-schema-elements.html#OtherSchemaElements-UniqueKey
>
> "The fieldType of uniqueKey must not be analyzed"
>
> (hence my comment baout "possible, but hard to get right ... you can use
> something like the KeywordTokenizer, but at that point you might as well
> use StrField except in some really esoteric special situations)
>
>
Good news. I added a field called ID, and made it string. Then I deleted
documents, re-indexed my data, and tried the search again.

Now solrResults size and numFound size are exactly the same.

Thanks for your help.

Rhys
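
(For the archive, the working arrangement is the conventional one - a
sketch:

<field name="id" type="string" indexed="true" stored="true" required="true"/>
<uniqueKey>id</uniqueKey>
)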


date fields and invalid date string errors

2019-11-13 Thread rhys J
I have date fields in my documents that are just YYYY-MM-DD.

I set them as a pdate field in the schema as such:

<field name="..." type="pdate" indexed="true" stored="true"/>

and

<field name="..." type="pdate" indexed="true" stored="true"/>

When I use the API to do a search and try:

2018-01-01
[2018-01-01 TO NOW]

I get 'Invalid Date String'.

Did I type my data wrong in the schema? Is there something I'm missing from
the field itself?

According to this page, I should be able to query on just YYYY or YYYY-MM
or YYYY-MM-DD.

https://lucene.apache.org/solr/guide/6_6/working-with-dates.html

Thanks,

Rhys


Re: date fields and invalid date string errors

2019-11-13 Thread rhys J
> If you use DateRangeField instead of DatePointField for your field's
> class, then you can indeed use partial timestamps for both indexing and
> querying.  This only works with DateRangeField.
>
>
I don't see that as an option in the API? Do I need to change what pdate's
type is in the managed-schema for it to take effect?

As in:

<fieldType name="pdate" class="solr.DatePointField" docValues="true"/>

to

<fieldType name="pdate" class="solr.DateRangeField" docValues="true"/>

Thanks,

Rhys


Re: date fields and invalid date string errors

2019-11-13 Thread rhys J
> You could do it that way ... but instead, I'd create a new fieldType,
> not change an existing one.  The existing name is "pdate" which implies
> "point date".  I would probably go with "daterange" or "rdate" as the
> name, but that is completely up to you.
>
>
I did that, deleted docs, stopped, started solr, and then re-indexed. And
it's working like I expect it to.

Thanks for the help.

Rhys


Query More Than One Core

2019-11-13 Thread rhys J
I have more than one core. Each core represents one database table.

They are coordinated by debt_id/debtor_id, so we can do join statements on
them with Sybase/SQL.

Is there a way to query more than one core at a time, or do I need to do
separate queries per core, and then somehow with perl aggregate them into
one list?

Thanks,

Rhys


Re: Query More Than One Core

2019-11-13 Thread rhys J
On Wed, Nov 13, 2019 at 3:16 PM Jörn Franke  wrote:

> You can use nested indexing and Index both types of documents in one core.
>
> https://lucene.apache.org/solr/guide/8_1/indexing-nested-documents.html


I had read that, but it doesn't really fit our needs right now.

I figured out how to do a join like so:

http://localhost:8983/solr/debt/select?indent=on&rows=100&sort=id
asc&q=(debt_id:570856 OR reference_no: *570856*)&fq={!join from=debtor_id
to=debt_id fromIndex=dbtr}ssn1:12


However, what is the use case for Solr if you have already a database?
>

The use case is that we have an old search tool that uses the db, but it's
painfully slow, and it doesn't do fuzzy searches very well, or handle
things like searching for phone numbers without it relying on a lot of
regular expressions. A search engine speeds things up, and gets more
precise results.

Thanks,

Rhys


using gt and lt in a query

2019-11-14 Thread rhys J
I am trying to duplicate this line from a db query:

(debt.orig_princ_amt > 0 AND debt.princ_paid > 0 AND debt.orig_princ_amt >
debt.princ_paid)

I have the following, but it returns no results:

http://localhost:8983/solr/debt/select?q=orig_princ_amt: 0 TO * AND princ_paid: 0 TO * AND gt(orig_princ_amt, princ_paid)


I should have 1075459 results, but I get 0.

Thanks,

Rhys


Re: using gt and lt in a query

2019-11-14 Thread rhys J
> Range queries are done with brackets and/or braces.  A square bracket
> indicates that the range should include the precise value mentioned, and
> a curly brace indicates that the range should exclude the precise value
> mentioned.
>
>
> https://lucene.apache.org/solr/guide/8_2/the-standard-query-parser.html#TheStandardQueryParser-RangeSearches
>
>
But I'm not doing a range, I'm doing a query on whether one field is
greater than another field. Or am I missing something here?

Thanks,

Rhys


Re: using gt and lt in a query

2019-11-14 Thread rhys J
On Thu, Nov 14, 2019 at 1:28 PM Erick Erickson 
wrote:

> You might be able to make this work with function queries….
>
>
>
I managed to decipher something along the lines of this:

http://10.40.10.14:8983/solr/debt/select?q=orig_princ_amt: 0 TO * AND princ_paid:0 TO *&fq={!frange l=0}if( gt(orig_princ_amt, princ_paid),1, 0 )

but it's still not giving me the entire results that the database gives. So
I'm not sure what I am missing?

Thanks,

Rhys
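
(Two likely culprits, sketched: the range clauses need brackets or braces,
e.g. princ_paid:{0 TO *] for the SQL > 0, and {!frange l=0} has an inclusive
lower bound, so it also admits every document where the if() evaluates to 0.
A closer translation would be:

q=orig_princ_amt:{0 TO *] AND princ_paid:{0 TO *]&fq={!frange l=1}if(gt(orig_princ_amt,princ_paid),1,0)
)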


using NOT or - to exclude results with a textField type

2019-11-15 Thread rhys J
I'm trying to exclude results based on the documentation about the boolean
NOT symbol, but I keep getting errors.

I've tried:

http://localhost:8983/solr/debt/select?q=clt_ref_no:-”owl-2924-8”

and

http://localhost:8983/solr/debt/select?q=clt_ref_no:NOT”owl-2924-8”

I have tried with and without quotes too.

Am I not able to use the NOT with a textField?

Here are the errors I get from the browser:

"msg":"org.apache.solr.search.SyntaxError: Cannot parse
'clt_ref_no:-”=owl-2924-8”': Encountered \" \"-\" \"- \"\" at line 1,
column 11.\nWas expecting one of:\n ...\n\"(\" ...\n
   \"*\" ...\n ...\n ...\n ...\n
...\n ...\n\"[\" ...\n\"{\"
...\n ...\n\"filter(\" ...\n ...\n",

Thanks,

Rhys
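
(The giveaway in the error is the typographic quote character ” - the query
was pasted with curly quotes, which the parser cannot digest. With straight
quotes, both negation forms parse - a sketch:

q=-clt_ref_no:"owl-2924-8"
q=*:* NOT clt_ref_no:"owl-2924-8"
)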


attempting to get an exact match on a textField

2019-11-15 Thread rhys J
I am trying to use the API to get an exact match on clt_ref_no.

At one point, I was using ""s to enclose the text such as:

clt_ref_no: "OWL-2924-8", and I was getting 5 results. Which is accurate.

Now when I use it, I only get one match.

If I try to build the url in perl, and then post the url, my response is
this:

http://localhost:8983/solr/debt/select?indent=on&rows=1000&sort=id%20asc&q=(%20clt_ref_no:%22OWL-2924-8%E2%80%9D%20OR%20contract_number:%22OWL-2924-8%22%20)&fq={!join%20from=debtor_id%20to=debt_id%20fromIndex=dbtr}&cursorMark=*&debug=true


Breaking that down, I've got:

q=( clt_ref_no: "OWL-2924-8" OR contract_number: "OWL-2924-8" )
fq= {!join from=debtor_id to=debt_id fromIndex=dbtr}

"error":{
"trace":"java.lang.NullPointerException\n\tat
org.apache.solr.search.JoinQuery.hashCode(JoinQParserPlugin.java:584)\n\tat
java.base/java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936)\n\tat
org.apache.solr.util.ConcurrentLRUCache.get(ConcurrentLRUCache.java:130)\n\tat
org.apache.solr.search.FastLRUCache.get(FastLRUCache.java:165)\n\tat
org.apache.solr.search.SolrIndexSearcher.getPositiveDocSet(SolrIndexSearcher.java:815)\n\tat
org.apache.solr.search.SolrIndexSearcher.getProcessedFilter(SolrIndexSearcher.java:1026)\n\tat
org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1541)\n\tat
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1421)\n\tat
org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:568)\n\tat
org.apache.solr.handler.component.QueryComponent.doProcessUngroupedSearch(QueryComponent.java:1484)\n\tat
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:398)\n\tat
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:305)\n\tat
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)\n\tat
org.apache.solr.core.SolrCore.execute(SolrCore.java:2578)\n\tat
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:780)\n\tat
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:566)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:423)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:350)\n\tat
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)\n\tat
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1711)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1347)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1678)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1249)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)\n\tat
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)\n\tat
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:152)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
org.eclipse.jetty.server.Server.handle(Server.java:505)\n\tat
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:370)\n\tat
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:267)\n\tat
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)\n\tat
org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)\n\tat
org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)\n\tat
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)\n\tat
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)\n\tat
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProd

Re: attempting to get an exact match on a textField

2019-11-16 Thread rhys J
I figured it out. It was a combination of problems.

1. not fully indexing the data. that made the result set return smaller
than expected.
2. using the join statement without adding a field at the end of it to
search the other core on.
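
(Concretely, the {!join} local params only say how to join; they still need
a query to run on the "from" side, e.g.

fq={!join from=debtor_id to=debt_id fromIndex=dbtr}report_as:"Freeman"

An empty join query appears to be what produced the NullPointerException
above.)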

On Fri, Nov 15, 2019 at 1:39 PM rhys J  wrote:

>
> I am trying to use the API to get an exact match on clt_ref_no.
>
> At one point, I was using ""s to enclose the text such as:
>
> clt_ref_no: "OWL-2924-8", and I was getting 5 results. Which is accurate.
>
> Now when I use it, I only get one match.
>
> If I try to build the url in perl, and then post the url, my response is
> this:
>
>
> http://localhost:8983/solr/debt/select?indent=on&rows=1000&sort=id%20asc&q=(%20clt_ref_no:%22OWL-2924-8%E2%80%9D%20OR%20contract_number:%22OWL-2924-8%22%20)&fq={!join%20from=debtor_id%20to=debt_id%20fromIndex=dbtr}&cursorMark=*&debug=true
> <http://10.40.10.14:8983/solr/debt/select?indent=on&rows=1000&sort=id%20asc&q=(%20clt_ref_no:%22OWL-2924-8%E2%80%9D%20OR%20contract_number:%22OWL-2924-8%22%20)&fq=%7B!join%20from=debtor_id%20to=debt_id%20fromIndex=dbtr%7D&cursorMark=*&debug=true>
>
> Breaking that down, I've got:
>
> q=( clt_ref_no: "OWL-2924-8" OR contract_number: "OWL-2924-8" )
> fq= {!join from=debtor_id to=debt_id fromIndex=dbtr}
>
> "error":{
> "trace":"java.lang.NullPointerException\n\tat 
> org.apache.solr.search.JoinQuery.hashCode(JoinQParserPlugin.java:584)\n\tat 
> java.base/java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936)\n\tat
>  
> org.apache.solr.util.ConcurrentLRUCache.get(ConcurrentLRUCache.java:130)\n\tat
>  org.apache.solr.search.FastLRUCache.get(FastLRUCache.java:165)\n\tat 
> org.apache.solr.search.SolrIndexSearcher.getPositiveDocSet(SolrIndexSearcher.java:815)\n\tat
>  
> org.apache.solr.search.SolrIndexSearcher.getProcessedFilter(SolrIndexSearcher.java:1026)\n\tat
>  
> org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1541)\n\tat
>  
> org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1421)\n\tat
>  
> org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:568)\n\tat
>  
> org.apache.solr.handler.component.QueryComponent.doProcessUngroupedSearch(QueryComponent.java:1484)\n\tat
>  
> org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:398)\n\tat
>  
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:305)\n\tat
>  
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)\n\tat
>  org.apache.solr.core.SolrCore.execute(SolrCore.java:2578)\n\tat 
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:780)\n\tat 
> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:566)\n\tat 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:423)\n\tat
>  
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:350)\n\tat
>  
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)\n\tat
>  
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)\n\tat
>  
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)\n\tat
>  
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1711)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)\n\tat
>  
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1347)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)\n\tat
>  
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)\n\tat
>  
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1678)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)\n\tat
>  
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1249)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)\n\tat
>  
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:152)\n\tat
>  
> org.eclipse.jetty.server.

using scoring to find exact matches while using a cursormark

2019-11-18 Thread rhys J
I am trying to use scoring to get the expected results at the top of the
stack when doing a Solr query.

I am looking up clt_ref_no: OWL-2924-8^2 OR contract_number: OWL-2924-8^2

If I use the following query:in the browser, I get the expected results at
the top of the returned values from Solr.

{
  "responseHeader":{
"status":0,
"QTime":41,
"params":{
  "q":"( clt_ref_no:OWL-2924-8 ^2 OR contract_number:OWL-2924-8^2 )",
  "indent":"on",
  "fl":"clt_ref_no, score",
  "rows":"1000"}},
  "response":{"numFound":84663,"start":0,"maxScore":25.664566,"docs":[
  {
"clt_ref_no":"OWL-2924-8",
"score":25.664566},
  {
"clt_ref_no":"OWL-2924-8",
"score":25.664566},
  {
"clt_ref_no":"OWL-2924-8/73847",
"score":23.509575},
  {
"clt_ref_no":"OWL-2924-8/73847",
"score":23.509575},
  {
"clt_ref_no":"OWL-2924-8/73847",
"score":23.509575},
  {
"clt_ref_no":"U615-2924-8",
"score":19.244316},
  {
"clt_ref_no":"M1057-2924-8/88543",
"score":17.650301},

If I add the sorting needed for the cursor, my results change
dramatically, and the exact matches are not at the top of the stack.

Example:



{
  "responseHeader":{
"status":0,
"QTime":80,
"params":{
  "q":"( clt_ref_no:OWL-2924-8 ^2 OR contract_number:OWL-2924-8^2 )",
  "indent":"on",
  "fl":"clt_ref_no, score",
  "sort":"score asc, id asc",
  "rows":"1000"}},
  "response":{"numFound":84663,"start":0,"maxScore":25.664566,"docs":[
  {
"clt_ref_no":"MMRO-1258-13/MMRO-1258-13/8",
"score":1.3380225},
  {
"clt_ref_no":"MMMP-151-14/MMMP-151-14/8",
"score":1.3380225},
  {
"clt_ref_no":"MMRO-806-14/MMRO-806-14/8",
"score":1.3380225},
  {
"clt_ref_no":"MMMP-44-14/MMMP-44-14/8",
"score":1.3380225},
  {
"clt_ref_no":"MMRO-45-13/MMRO-45-13/8",
"score":1.3380225},
  {
"clt_ref_no":"MMIN-202-14/MMIN-202-14/8",
"score":1.3380225},
  {
"clt_ref_no":"MMTC-1457-14/MMTC-1457-14/8",
"score":1.3380225},
  {

Should I not be sorting on score? I thought sorting on score was how I
would get the exact matches to return?

If I add in sort=score asc to the first query, it does what the second
query does, and not have expected matches floating to the top of the
results.

Thanks,

Rhys


Re: using scoring to find exact matches while using a cursormark

2019-11-18 Thread rhys J
> ...so w/o a score param you're getting the default sort: score "desc"
> (descending)...
>
>
> https://lucene.apache.org/solr/guide/8_3/common-query-parameters.html#CommonQueryParameters-ThesortParameter
>
> "If the sort parameter is omitted, sorting is performed as though
> the
> parameter were set to score desc."
>
>
>
Oh my goodness, I didn't realize the default was desc! Thanks for pointing
that out. I adjusted my query, and now it's getting the sorting right.

Thanks so much,

Rhys


Attempting to do a join with 3 cores

2019-11-18 Thread rhys J
I was hoping to be able to do a join with 3 cores.

I found this page that seemed to indicate it's possible?

https://stackoverflow.com/questions/52380302/solr-how-to-join-three-cores

Here's my query:

http://localhost:8983/solr/dbtrphon/select?indent=on&rows=1000&sort=score
desc, id desc&q=(phone:*Meredith* OR descr:*Meredith*){!join from=debtor_id
to=debt_id fromIndex=debt}*&fq={!join from=debtor_id to=debtor_id
fromIndex=dbtr }(ssn1:12 OR ssn1:33) AND (assign_id:584 OR
assign_id:583)&cursorMark=*

{
  "responseHeader":{
"status":400,
"QTime":14,
"params":{
  "q":"(phone:*Meredith* OR descr:*Meredith*){!join from=debtor_id
to=debt_id fromIndex=debt}*",
  "indent":"on",
  "cursorMark":"*",
  "sort":"score desc, id desc",
  "fq":"{!join from=debtor_id to=debtor_id fromIndex=dbtr
}(ssn1:12 OR ssn1:33) AND (assign_id:584 OR assign_id:583)",
  "rows":"1000"}},
  "error":{
"metadata":[
  "error-class","org.apache.solr.common.SolrException",
  "root-error-class","org.apache.solr.common.SolrException"],
"msg":"undefined field: \"debtor_id\"",
"code":400}}


The schema for dbtrphon has debtor_id, the schema for debt has debt_id, and
the schema for dbtr has debtor_id. These fields all should be able to join,
but I get this error.

I've tried substituting debt_id for the debtor_id in the second join, but I
get the same error 'undefined field 'debt_id'.

I am unsure what I'm missing?

Thanks,

Rhys


exact matches on a join

2019-11-19 Thread rhys J
I am trying to do a join, which I have working properly on 2 cores.

One core has report_as, and the other core has debt_id.

If I enter 'report_as: "Freeman", I expect to get 272 results. But I get
557.

When I do a database search on the matched fields, it shows me that
report_as: "Freeman" is matching also on 'A-1 Freeman'.

I have tried boosting the score as report_as: "Freeman"^2, but I get the
same results from the API, and from the browser itself.

Here is my query:

{
  "responseHeader":{
"status":0,
"QTime":5,
"params":{
  "q":"( * )",
  "indent":"on",
  "fl":"debt_id, score",
  "cursorMark":"*",
  "sort":"score desc, id desc",
  "fq":"{!join from=debtor_id to=debt_id fromIndex=dbtr}(
report_as:\"Freeman\"^2)",
  "rows":"1000"}},
  "response":{"numFound":557,"start":0,"maxScore":1.0,"docs":[
  {
"debt_id":"485435",
"score":1.0},
  {
"debt_id":"485435",
"score":1.0},
  {
"debt_id":"482795",
"score":1.0},
  {
"debt_id":"482795",
"score":1.0},
  {
"debt_id":"482794",
"score":1.0},
  {
"debt_id":"482794",
"score":1.0},
  {
"debt_id":"482794",
"score":1.0},

SKIP



{
"debt_id":"396925",
"score":1.0},
  {
"debt_id":"396925",
"score":1.0},
  {
"debt_id":"396925",
"score":1.0},
  {
"debt_id":"396925",
"score":1.0},
  {
"debt_id":"396925",
"score":1.0},
  {
"debt_id":"396925",
"score":1.0},
  {
"debt_id":"396925",
"score":1.0},
  {
"debt_id":"396925",
"score":1.0},
  {
"debt_id":"396925",
"score":1.0},
  {
"debt_id":"396925",
"score":1.0},
  {
"debt_id":"396925",


These ones are the correct matches that I can verify with the
database, but their scores are the same as the ones matching on
'A1-Freeman'

Is my scoring set up wrong?

Thanks,

Rhys


Re: exact matches on a join

2019-11-21 Thread rhys J
On Thu, Nov 21, 2019 at 8:04 AM Jason Gerlowski 
wrote:

> Are these fields "string" or "text" fields?
>
> Text fields receive analysis that splits them into a series of terms.
> That's why the query "Freeman" matches the document "A-1 Freeman".
> "A-1 Freeman" gets split up into multiple terms, and the "Freeman"
> query matches one of those terms.  Text fields are what you use when
> you want matches to have some wiggle room based on your analyzers.
>
> String fields are much more geared towards exact matches.  No analysis
> is done, so a query for "Freeman" would only match docs who have that
> value identically.
>
>
Thanks, this was the conclusion I came to too. When I asked, they decided
that those matches were acceptable, and to keep the field a textField.

Rhys
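
(If exact matching is wanted later without giving up the analyzed field,
the usual pattern is a string-typed copy - a sketch with hypothetical
names:

<field name="report_as_exact" type="string" indexed="true" stored="false"/>
<copyField source="report_as" dest="report_as_exact"/>

Queries that need exact behavior then target report_as_exact.)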


Highlighting on typing in search box

2019-11-21 Thread rhys J
Are there any recommended APIs or code examples of using Solr and then
highlighting results below the search box?

I'm trying to implement a search box that will search solr as the user
types, if that makes sense?

Thanks,

Rhys


Re: Highlighting on typing in search box

2019-11-21 Thread rhys J
Thank you both! I've got an autocomplete working on a basic format right
now, and I'm working on implementing it to be smart about which core it
searches.

On Thu, Nov 21, 2019 at 11:43 AM Jörn Franke  wrote:

> It sounds like you look for a suggester.
>
> You can use the suggester of Solr.
>
> For the visualization part: Angular has a suggestion box that can ingest
> the results from Solr.
>
> > Am 21.11.2019 um 16:42 schrieb rhys J :
> >
> > Are there any recommended APIs or code examples of using Solr and then
> > highlighting results below the search box?
> >
> > I'm trying to implement a search box that will search solr as the user
> > types, if that makes sense?
> >
> > Thanks,
> >
> > Rhys
>
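
(For reference, a sketch of the suggester setup being described - a
component plus handler in solrconfig.xml, with the field name being this
thread's guess:

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">name1</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
    <str name="suggest.dictionary">mySuggester</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
)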


How to tell which core was used based on Json or XML response from Solr

2019-11-22 Thread rhys J
I'm implementing an autocomplete search box for Solr.

I'm using JSON as my response style, and this is the jquery code.


 var url='http://10.40.10.14:8983/solr/'+core+'/select/?q='+queryField +

query+'&version=2.2&hl=true&start=0&rows=50&indent=on&wt=json&callback=?&json.wrf=on_data';

 jQuery_3_4_1.getJSON(url);

___

function on_data(data)
{
  var docs = data.response.docs;
  jQuery_3_4_1.each(docs, function(i, item) {

    // build one table row per returned document
    var trLink = '<tr><td>' + item.debtor_id + '</td>';

    trLink += '<td>' + item.name1 + '</td>';
    trLink += '<td>' + item.dl1 + '</td>';
    trLink += '</tr>';

    jQuery_3_4_1('#resultsTable').prepend(jQuery_3_4_1(trLink));
  });
}

the jQuery_3_4_1 variable is replacing $ because I needed to have 2
different versions of jQuery running in the same document.

I'd like to know if there's something I'm missing that will indicate which
core I've used in Solr based on the response.

Thanks,

Rhys


Re: How to tell which core was used based on Json or XML response from Solr

2019-11-22 Thread rhys J
On Fri, Nov 22, 2019 at 1:39 PM David Hastings 
wrote:

> 2 things (maybe 3):
> 1.  dont have this code facing a client thats not you, otherwise anyone
> could view the source and see where the solr server is, which means they
> can destroy your index or anything they want.  put at the very least a
> simple api/front end in between the javascript page for the user and the
> solr server
>

Is there a way I can fix this?


> 2. i dont think there is a way, you would be better off indexing an
> indicator of sorts into your documents
>

Oh this is a good idea.

Thanks!

3. the jquery in your example already has the core identified, not sure why
> the receiving javascript wouldn't be able to read that variable unless im
> missing something.
>
>
There's another function on_data that is being called by the url, which
does not have any indication of what the core was, only the response from
the url.

Thanks,

Rhys


Re: How to tell which core was used based on Json or XML response from Solr

2019-11-25 Thread rhys J
On Mon, Nov 25, 2019 at 2:10 AM Erik Hatcher  wrote:

> add &core=&echoParams=all and the parameter will be in the response
> header.
>
>Erik
>

Thanks. I just tried this, and all I got was this response:

http://localhost:8983/solr/dbtr/select?q=debtor_id%3A%20393291&echoParams=all



{
  "responseHeader":{
"status":0,
"QTime":14,
"params":{
  "q":"debtor_id: 393291",
  "df":"_text_",
  "rows":"10",
  "echoParams":"all"}},
  "response":{"numFound":1,"start":0,"docs":[
  {

Rhys


Re: How to tell which core was used based on Json or XML response from Solr

2019-11-25 Thread rhys J
> if you are taking the PHP route for the mentioned server part then I would
> suggest
> using a client library, not plain curl.  There is solarium, for instance:
>
> https://solarium.readthedocs.io/en/stable/
> https://github.com/solariumphp/solarium
>
> It can use curl under the hood but you can program your stuff on a higher
> level,
> against an API.
>
>
I am using jquery, so I am using the json package to send and decode the
json that solr sends. I hope that makes sense?

Thanks for your tip!

Our pages are a combo of jquery, javascript, and perl.


Re: How to tell which core was used based on Json or XML response from Solr

2019-11-25 Thread rhys J
On Mon, Nov 25, 2019 at 1:10 AM Paras Lehana 
wrote:

> Hey rhys,
>
> What David suggested is what we do for querying Solr. You can figure out
> our frontend implementation of Auto-Suggest by seeing the AJAX requests
> fired when you type in the search box on www.indiamart.com.
>

 That is pretty cool.

I've ended up with something that highlights the match in a results table.
It's working, and the client seems happy with that implementation for now.


> Why are you using two jQuery files? If you have a web server, you already
> know that which core you queried from. Just convert the Solr JSON response
> and add the key "core" and return the modified JSON response. Keep your
> front-end query simple - just describe your query. All the other parameters
>

We are using 2 jQuery versions because this page embeds a tool that ships
with an old version of jQuery. Because of that, I'm doing the trick where
you load 2 different versions at the same time.


> can be added on the web server side. Anyways, why do you want to know the
> core name?
>

I need to know the core name, because each core has different values in the
documents, and I want to display those values based on which core was
queried.

This is kind of like an omnibox, where the user will just start typing
stuff into it. Based on what is typed, I will search a different core to
provide the right answer to them.
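
The routing itself can be small. A sketch (the patterns and core choices
here are illustrative guesses, not my real rules):

// pick a core based on the shape of what the user typed - sketch only
function pickCore(term) {
  if (/^\d+$/.test(term))          return 'debt';  // bare number: a debt id
  if (/^[A-Za-z]+-\d+/.test(term)) return 'debt';  // looks like a reference no
  return 'dbtr';                                   // otherwise assume a name
}

var core = pickCore(query);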

Thanks,

Rhys


Re: How to tell which core was used based on Json or XML response from Solr

2019-11-25 Thread rhys J
On Mon, Nov 25, 2019 at 10:43 AM David Hastings <
hastings.recurs...@gmail.com> wrote:

> you missed the part about adding &core= to the query:
> &echoParams=all&core=mega
>
> returns for me:
>
>  "responseHeader":{
> "status":0,
> "QTime":0,
> "params":{
>   "q":"*:*",
>   "core":"mega",
>   "df":"text",
>   "q.op":"AND",
>   "rows":"10",
>   "echoParams":"all"}},
>

You're right, I missed that. I added it, and it works perfectly.
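
So the round trip looks roughly like this (a sketch, with my URL trimmed
down to the relevant parameters):

// send the core name through as an extra echoed parameter...
var url = 'http://10.40.10.14:8983/solr/' + core + '/select/?q=' + query
        + '&echoParams=all&core=' + core
        + '&wt=json&json.wrf=on_data';

// ...and read it back out of the echoed params in the callback
function on_data(data) {
  var whichCore = data.responseHeader.params.core;  // e.g. "dbtr" or "debt"
  // dispatch on whichCore here
}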


>
> also we are a perl shop as well, you could implement something as
> simple as this in a cgi script or something:
>
>
> use LWP::UserAgent;
> use HTTP::Request::Common qw(POST);
>
> my $url = $searcher;                              # the Solr select URL
> my $agent = LWP::UserAgent->new;
> my $request = POST($url, $data);                  # $data holds the query params
> my $response = $agent->request($request)->decoded_content;
>
>
>
Thanks for this tip.

Rhys


Using an & in an indexed field and then querying for it.

2019-11-25 Thread rhys J
I have some fields that have text like so:

Reliable Van & Storage.

They indexed fine when I used curl and csv files to read them into the core.

Now when I try to query for them, I get errors.

If I try escaping it like so \&, I get the following error:

on_data({
  "responseHeader":{
"status":400,
"QTime":1,
"params":{
  "q":"name1:( reliable van \\",
  "core":"dbtr",
  "json.wrf":"on_data",
  "hl":"true",
  "indent":"on",
  "start":"0",
  "stor )":"",
  "callback":"?",
  "rows":"50",
  "version":"2.2",
  "wt":"json"}},
  "error":{
"metadata":[
  "error-class","org.apache.solr.common.SolrException",
  "root-error-class","org.apache.solr.parser.TokenMgrError"],
"msg":"org.apache.solr.search.SyntaxError: Cannot parse 'name1:(
reliable van \\': Lexical error at line 1, column 23.  Encountered:
 after : \"\"",
"code":400}})

If I try html encoding it like so: &amp; I get the following error:



on_data({
  "responseHeader":{
"status":400,
"QTime":3,
"params":{
  "q":"name1:( reliable van ",
  "core":"dbtr",
  "json.wrf":"on_data",
  "hl":"true",
  "indent":"on",
  "amp; stor )":"",
  "start":"0",
  "callback":"?",
  "rows":"50",
  "version":"2.2",
  "wt":"json"}},
  "error":{
"metadata":[
  "error-class","org.apache.solr.common.SolrException",
  "root-error-class","org.apache.solr.parser.ParseException"],
"msg":"org.apache.solr.search.SyntaxError: Cannot parse 'name1:(
reliable van ': Encountered \"\" at line 1, column 21.\nWas
expecting one of:\n ...\n ...\n ...\n
\"+\" ...\n\"-\" ...\n ...\n\"(\" ...\n\")\"
...\n\"*\" ...\n ...\n ...\n
...\n ...\n ...\n\"[\" ...\n
\"{\" ...\n ...\n\"filter(\" ...\n ...\n
  ",
"code":400}})


How can I search for a field that has an & without breaking the
parser, or is it not possible because & is used as a special
character?

Thanks,

Rhys


Re: Using an & in an indexed field and then querying for it.

2019-11-25 Thread rhys J
On Mon, Nov 25, 2019 at 2:36 PM David Hastings 
wrote:

> its breaking on the & because its in the url and you are most likely
> sending a get request to solr.  you should send it as post or as %26
>
>
The package I am using doesn't have a postJSON function available, so I'm
using their getJSON function.

I changed the & to %26, and that fixed things.
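
For what it's worth, encodeURIComponent does that replacement (along with
the other reserved URL characters) in one go, so nothing has to be
special-cased by hand. A sketch:

// escape the whole query value once instead of hand-replacing & with %26
var q = 'name1:( reliable van & storage )';
var url = 'http://10.40.10.14:8983/solr/dbtr/select/?q='
        + encodeURIComponent(q)
        + '&wt=json&json.wrf=on_data';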

Thanks,

Rhys


Search returning unexpected matches at the top

2019-12-06 Thread rhys J
I have a search box that is just searching every possible core, and every
possible field.

When I enter 'owl-2924-8', I expect the clt_ref_no of OWL-2924-8 to float
to the top, however it is the third result in my list.

Here is the code from the search:

on_data({
  "responseHeader":{
"status":0,
"QTime":31,
"params":{
  "hl":"true",
  "indent":"on",
  "fl":"debt_id, clt_ref_no",
  "start":"0",
  "sort":"score desc, id asc",
  "rows":"500",
  "version":"2.2",
  "q":"clt_ref_no:owl\\-2924\\-8 debt_descr:owl\\-2924\\-8
comments:owl\\-2924\\-8 reference_no:owl\\-2924\\-8 ",
  "core":"debt",
  "json.wrf":"on_data",
  "urlquery":"owl-2924-8",
  "callback":"?",
  "wt":"json"}},
  "response":{"numFound":85675,"start":0,"docs":[
  {
"clt_ref_no":"2924",
"debt_id":"574574"},
  {
"clt_ref_no":"2924",
"debt_id":"598663"},
  {
"clt_ref_no":"OWL-2924-8",
"debt_id":"624401"},
  {
"clt_ref_no":"OWL-2924-8",
"debt_id":"628157"},
  {
"clt_ref_no":"2924",
"debt_id":"584807"},
  {
"clt_ref_no":"U615-2924-8",
"debt_id":"628310"},
  {
"clt_ref_no":"OWL-2924-8/73847",
"debt_id":"596713"},
  {
"clt_ref_no":"OWL-2924-8/73847",
"debt_id":"624401"},
  {
"clt_ref_no":"OWL-2924-8/73847",
"debt_id":"628157"},
  {

I'm not interested in having a specific search with quotes around it,
because this is searching everything, so it's a fuzzy search. But I am
interested in understanding why 'owl-2924-8' doesn't come out on top of the
search.

As you can see, I'm sorting by score and then id, which should take care of
things, but it's not.

Thanks,

Rhys


Re: Search returning unexpected matches at the top

2019-12-06 Thread rhys J
On Fri, Dec 6, 2019 at 11:21 AM David Hastings  wrote:

> whats the field type for:
> clt_ref_no
>

It is a text_general field because it can have numbers or alphanumeric
characters.

> *_no isn't a default dynamic field, and owl-2924-8 usually translates
> into
> owl 2924 8
>
>
So it's matching on word breaks, am I understanding properly?

It's matching all things that match either 'owl' or '2924' or '8'?
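
To spell out what I think is happening (assuming text_general's default
StandardTokenizer, per your explanation):

  indexed:  OWL-2924-8               ->  owl | 2924 | 8
  queried:  clt_ref_no:owl\-2924\-8  ->  clt_ref_no:owl  clt_ref_no:2924  clt_ref_no:8

so a document containing just '2924' competes on score with the full
reference.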

Thanks,

Rhys

