from:"A. Steven Anderson"

leading and trailing wildcard query

2009-11-04 Thread A. Steven Anderson

I've scoured the archives and JIRA, but the answer to my question is just
not clear to me.

With all the new Solr 1.4 features, is there any way to do a leading and
trailing wildcard query on an *untokenized* field?

e.g. q=myfield:*abc* would return a doc with myfield=xxxabcxxx

Yes, I know how expensive such a query would be, but we have the user
requirement, nonetheless.

If not, any suggestions on how to implement a custom solution using Solr?
Using an external data structure?

-- 
A. Steven Anderson

Re: leading and trailing wildcard query

2009-11-05 Thread A. Steven Anderson

No thoughts on this? Really!?

I would hate to admit to my Oracle DBE that Solr can't be customized to do a
common query that a relational database can do. :-(


On Wed, Nov 4, 2009 at 6:01 PM, A. Steven Anderson <
a.steven.ander...@gmail.com> wrote:

> I've scoured the archives and JIRA, but the answer to my question is just
> not clear to me.
>
> With all the new Solr 1.4 features, is there any way to do a leading and
> trailing wildcard query on an *untokenized* field?
>
> e.g. q=myfield:*abc* would return a doc with myfield=xxxabcxxx
>
> Yes, I know how expensive such a query would be, but we have the user
> requirement, nonetheless.
>
> If not, any suggestions on how to implement a custom solution using Solr?
> Using an external data structure?
>
>
-- 
A. Steven Anderson

Re: leading and trailing wildcard query

2009-11-05 Thread A. Steven Anderson

>
> The guilt trick is not the best thing to try on public mailing lists. :)
>

Point taken, although that was not my intention.  I guess I have been spoiled
by quick replies and was starting to think it was a stupid question.

Plus, I'm literally gonna get trash talk from my Oracle DBE if I can't make
this work. ;-)

We've basically relegated Oracle to handling ingest; we index Solr from it,
and Solr provides all search features.  I'd hate to have to succumb to using
Oracle to service this one special query.


> The first thing that popped to my mind is to use 2 fields, where the second
> one contains the desrever string of the first one.
>

Please elaborate. What do you mean by *desrever* string?
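
For readers wondering the same thing: "desrever" is "reversed" spelled
backwards. Below is a rough sketch of the idea in plain Python, assuming a
second field that stores each value reversed. The helper names and the
myfield_rev field are hypothetical, and ordinary string checks stand in for
Solr's prefix queries. A leading-wildcard query such as *abc on the original
field becomes a trailing-wildcard (prefix) query cba* on the reversed field.

```python
def reversed_value(value: str) -> str:
    """Value to store in the hypothetical second, reversed field."""
    return value[::-1]


def matches_leading_wildcard(stored_reversed: str, suffix: str) -> bool:
    """Emulate the query *suffix as a prefix match on the reversed field."""
    return stored_reversed.startswith(suffix[::-1])


doc = {"myfield": "xxxabc"}
doc["myfield_rev"] = reversed_value(doc["myfield"])

# *abc on myfield is answered as cba* on myfield_rev
print(matches_leading_wildcard(doc["myfield_rev"], "abc"))
# *xyz does not match this document
print(matches_leading_wildcard(doc["myfield_rev"], "xyz"))
```

Note that this trick only covers a wildcard at one end. A query with
wildcards at both ends (*abc*) still needs another approach, such as n-grams.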


> The second idea is to use n-grams (if it's OK to tokenize), more
> specifically edge n-grams.
>

Well, that's the problem.  The field may contain non-Latin text that has
neither whitespace nor punctuation.


-- 
A. Steven Anderson

Re: leading and trailing wildcard query

2009-11-05 Thread A. Steven Anderson

Thanks for the solution, but could you elaborate on how it would find
something like *abc* in a field that contains abc?

Steve

On Thu, Nov 5, 2009 at 5:25 PM, Bernadette Houghton <
bernadette.hough...@deakin.edu.au> wrote:

> I've just set up something similar (much thanks to Avesh!)-
>
> [schema.xml excerpt: the XML tags were stripped by the list archive. The
> surviving fragments are two definitions with positionIncrementGap="100"
> (the first containing an element with maxGramSize="25"), two definitions
> with multiValued="true" (the second also stored="false"), and two groups
> of four elements whose content is lost.]
>
> bern

Re: leading and trailing wildcard query

2009-11-05 Thread A. Steven Anderson

> Doesn't it work to call SolrQueryParser.setAllowLeadingWildcard?


Good question.  Anyone?


> It can be really slow, what an RDBMS person would call a full table scan.


Understood.


> There is an open bug to make that settable in a config file, but this is a
> pretty tiny change to the source.
>   http://issues.apache.org/jira/browse/SOLR-218
>

Unfortunately, we can only use official releases (not even snapshots) since
it's a government-related project.

-- 
A. Steven Anderson

Re: leading and trailing wildcard query

2009-11-05 Thread A. Steven Anderson

> Hi Steve, a query such as *abc* would need the NGramFilterFactory, hence the
> doubleedgytext, and would be retrievable by a query such as contains:abc.
> Note that you can set the max and minimum size of strings that get indexed.
>

Excellent!  Just to clarify though, NGramFilterFactory is a Solr 1.4 feature
only, correct?
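
To make the n-gram idea concrete, here is a minimal sketch in plain Python.
It is not Solr code: the function names and the gram sizes 1 to 25 are
assumptions for illustration. Every substring of the field value whose length
lies between the minimum and maximum gram size is indexed as its own term, so
a search for abc becomes an exact term lookup instead of a wildcard scan.

```python
def ngrams(value: str, min_gram: int, max_gram: int) -> set:
    """All substrings of value with length in [min_gram, max_gram]."""
    grams = set()
    for size in range(min_gram, max_gram + 1):
        for start in range(len(value) - size + 1):
            grams.add(value[start:start + size])
    return grams


def contains(indexed_grams: set, query: str) -> bool:
    """A substring query is an exact lookup among the indexed grams."""
    return query in indexed_grams


terms = ngrams("xxxabcxxx", min_gram=1, max_gram=25)
print(contains(terms, "abc"))  # abc is a substring of the value
print(contains(terms, "abd"))  # abd is not

# A value that is exactly "abc" also yields the gram "abc"
print(contains(ngrams("abc", 1, 25), "abc"))
```

The trade-off is index size, plus the length limits: in this model a query
shorter than min_gram or longer than max_gram can never match.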

-- 
A. Steven Anderson

Re: leading and trailing wildcard query

2009-11-05 Thread A. Steven Anderson

> Ah. With that restriction, it is impossible.
> If it is OK to pay Lucid to make a one-line change, you might be able to do
> it. Otherwise, get ready to spend a lot of money for a search engine.
>

Well, now that Lucid is getting In-Q-Tel $$$, they will soon learn that
official releases are all that matter, and 12-18 month release cycles are
not acceptable. ;-)

-- 
A. Steven Anderson

Re: leading and trailing wildcard query

2009-11-05 Thread A. Steven Anderson

> Note that N-grams are limited to specific string lengths. I presume that
> you need to search for arbitrary strings, not just three-letter ones.
>

Understood, but that is a limitation that we can live with.

Thanks!
-- 
A. Steven Anderson

Re: leading and trailing wildcard query

2009-11-05 Thread A. Steven Anderson

> Not sure what version it was supported from, but we're on 1.3.


Really!? Great answer!

Thanks!
-- 
A. Steven Anderson

Retrieve docs with > 1 multivalue field hits

2009-07-02 Thread A. Steven Anderson

Greetings!

I thought I remembered seeing a thread related to retrieving only documents
that had more than one hit in a particular multivalue field, but I cannot
find it now.

Regardless, is this possible in Solr 1.3? Solr 1.4?

-- 
A. Steven Anderson
Independent Consultant

Re: Retrieve docs with > 1 multivalue field hits

2009-07-06 Thread A. Steven Anderson

I thought this would be a quick yes or no answer and/or reference to another
thread, but alas, I got no replies.

Is it safe to assume the answer is 'no' for both Solr 1.3 and 1.4?

On Thu, Jul 2, 2009 at 3:48 PM, A. Steven Anderson wrote:

> Greetings!
>
> I thought I remembered seeing a thread related to retrieving only documents
> that had more than one hit in a particular multivalue field, but I cannot
> find it now.
>
> Regardless, is this possible in Solr 1.3? Solr 1.4?
>

-- 
A. Steven Anderson
Independent Consultant

Re: Retrieve docs with > 1 multivalue field hits

2009-07-09 Thread A. Steven Anderson

>
> : I thought I remembered seeing a thread related to retrieving only
> documents
> : that had more than one hit in a particular multivalue field, but I cannot
> : find it now.
>
> not easily.
>
> for arbitrary queries there's no simple way to know whether that query
> matches a single document multiple times, in separate field values.
>
> there may be some tricks that are possible if you give us a more concrete
> example of what exactly your use case is.
>

Thanks for the reply. I was beginning to think it was a stupid question. ;-)


A simple example would be if a schema included a multivalued phoneNum field
and I wanted to return all docs that contained more than 1 phoneNum field
value.

It's sort of like a facet query, but with counts on a per-document basis.

-- 
A. Steven Anderson
Independent Consultant

Re: Retrieve docs with > 1 multivalue field hits

2009-07-10 Thread A. Steven Anderson

>
> all docs that contain more than one phone number - regardless of matching a
> particular query?
>

Exactly.


> knowing that was a useful query, i'd change my indexer to also provide
> either a field with the count of phone number values, or a boolean field
> saying whether there are more than one or not.
>

True, but I was hoping there would be a generic query way to do it.


> things are more complex if you're asking something different than my
> interpretation above.
>

No. That was it.  Basically, the functional equivalent of "select * from
docs where count(phone) > 1".

I was thinking there might be a way with function queries, but I guess not.

Thanks for the answer though.
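
The indexer-side workaround suggested above could look roughly like this. It
is a sketch only: the field names phoneNum_count and phoneNum_multiple are
hypothetical, and the document is modeled as a plain dict rather than any
particular Solr client object.

```python
def add_phone_count(doc: dict) -> dict:
    """Return a copy of doc with derived count and flag fields for phoneNum."""
    phones = doc.get("phoneNum", [])
    enriched = dict(doc)
    enriched["phoneNum_count"] = len(phones)
    enriched["phoneNum_multiple"] = len(phones) > 1
    return enriched


docs = [
    {"id": "1", "phoneNum": ["555-0100"]},
    {"id": "2", "phoneNum": ["555-0101", "555-0102"]},
    {"id": "3"},  # no phone numbers at all
]

indexed = [add_phone_count(d) for d in docs]

# Local stand-in for "select * from docs where count(phone) > 1"
multi = [d["id"] for d in indexed if d["phoneNum_multiple"]]
print(multi)
```

With such a field indexed, the selection becomes an ordinary query on the
flag, or a range query on the count if it is indexed with a numeric type
that supports range queries.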

-- 
A. Steven Anderson
Independent Consultant

performance question

2009-12-29 Thread A. Steven Anderson

Greetings!

Is there any significant negative performance impact of using a
dynamicField?

Likewise for multivalued fields?

The reason I ask is that our system basically aggregates data from many
disparate data sources (structured, unstructured, and semi-structured), and
managing schema.xml has become unwieldy; i.e. we currently have dozens of
fields, and that number grows every time we add a new data source.

I was considering redefining the domain model outside of Solr which would be
used to generate the fields for the indexing process and the metadata (e.g.
display names) for the search process.

Thoughts?
-- 
A. Steven Anderson
Independent Consultant
st...@asanderson.com

DIH optional fields?

2009-12-29 Thread A. Steven Anderson

Greetings!

I'm trying to index a MySQL database that has some invalid dates (e.g.
"0000-00-00"), which are causing my DIH to abort.

Ideally, I'd like DIH to skip this optional field but not the whole record.

I don't see any way to do this currently, but is there any work-around?

Should there be a JIRA for a field-level optional or onError parameter?

-- 
A. Steven Anderson
Independent Consultant
st...@asanderson.com

Re: DIH optional fields?

2009-12-29 Thread A. Steven Anderson

> Use &zeroDateTimeBehavior=convertToNull parameter in your SQL connection
> string.
>

That worked great!

Thanks!
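
For anyone hitting the same error, the parameter goes on the JDBC URL used by
the DIH data source. The sketch below only shows the string handling; the
host, port, and database name are placeholders, and the helper name is
hypothetical.

```python
def with_zero_date_as_null(jdbc_url: str) -> str:
    """Append zeroDateTimeBehavior=convertToNull to a MySQL JDBC URL."""
    # Use "?" for the first parameter and "&" for any later one.
    separator = "&" if "?" in jdbc_url else "?"
    return f"{jdbc_url}{separator}zeroDateTimeBehavior=convertToNull"


print(with_zero_date_as_null("jdbc:mysql://localhost:3306/mydb"))
print(with_zero_date_as_null("jdbc:mysql://localhost:3306/mydb?useUnicode=true"))
```

When the URL is written inside an XML config file, the & separator has to be
escaped as &amp;.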

-- 
A. Steven Anderson
Independent Consultant
A. S. Anderson & Associates LLC
P.O. Box 672
Forest Hill, MD  21050-0672
443-790-4269
st...@asanderson.com

Re: performance question

2009-12-30 Thread A. Steven Anderson

> There can be an impact if you are searching against a lot of fields or if
> you are indexing a lot of fields on every document, but for the most part in
> most applications it is negligible.
>

We index a lot of fields at one time, but we can tolerate the performance
impact at index time.

> It probably can't hurt to be more streamlined, but without knowing more
> about your model, it's hard to say.  I've built apps that were totally
> dynamic field based and they worked just fine, but these were more for
> discovery than just pure search.  In other words, the user was interacting
> with the system in a reflective model that selected which fields to search
> on.
>

Our application is as much about discovery as search, so this is good to
know.

Thanks for the feedback. It was very helpful.
-- 
A. Steven Anderson
Independent Consultant
st...@asanderson.com

Re: performance question

2010-01-03 Thread A. Steven Anderson

> Sorting and index norms have space penalties.
> Sorting on a field creates an array of Java ints, one for every
> document in the index. Index norms (used for boosting documents and
> other things) create an array of bytes in the Lucene index files, one
> for every document in the index.
> If you sort on many of your dynamic fields your memory use will
> explode, and the same with index norms and disk space.


Thanks for the info.  In general, I knew sorting was expensive, but I didn't
realize that dynamic fields made it worse.

-- 
A. Steven Anderson
Independent Consultant
st...@asanderson.com

Re: performance question

2010-01-03 Thread A. Steven Anderson

>
> dynamic fields don't make it worse ... the number of actual field names
> you sort on makes it worse.
>
> If you sort on 100 fields, the cost is the same regardless of whether all
> 100 of those fields exist because of a single <dynamicField/> declaration,
> or 100 distinct <field/> declarations.
>

Ahh...thanks for the clarification.

So, in general, there is no *significant* performance difference with using
dynamic fields. Correct?
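
A back-of-the-envelope version of the cost described in this thread, as a
sketch: one Java int (4 bytes) per document for each field sorted on, and one
norm byte per document for each field with norms. The document and field
counts below are made-up inputs, and real usage will also depend on field
type and Lucene version.

```python
INT_BYTES = 4   # size of a Java int
NORM_BYTES = 1  # one norm byte per document per field


def sort_memory_bytes(num_docs: int, num_sort_fields: int) -> int:
    """Memory for sort arrays: one int per document per sorted field."""
    return num_docs * num_sort_fields * INT_BYTES


def norms_disk_bytes(num_docs: int, num_fields_with_norms: int) -> int:
    """Space for norms: one byte per document per field with norms."""
    return num_docs * num_fields_with_norms * NORM_BYTES


# Hypothetical index: 10 million documents, 100 distinct field names
print(sort_memory_bytes(10_000_000, 100))
print(norms_disk_bytes(10_000_000, 100))
```

The estimate depends only on how many distinct field names are involved, not
on whether they were declared individually or through a dynamic field.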


-- 
A. Steven Anderson
Independent Consultant
st...@asanderson.com

Re: performance question

2010-01-06 Thread A. Steven Anderson

> Strictly speaking there are some insignificant distinctions in performance
> related to how a field name is resolved -- Grant alluded to this
> earlier in this thread -- but it only comes into play when you actually
> refer to that field by name and Solr has to "look them up" in the
> metadata.  So for example if your request referred to 100 different field
> names in the q, fq, and facet.field params there would be a small overhead
> for any of those 100 fields that existed because of <dynamicField/>
> declarations, that would not exist for any of those fields that were
> declared using <field/> -- but there would be no added overhead to that
> query if there were 999 other fields that existed in your index
> because of that same <dynamicField/> declaration.
>
> But frankly: we're talking about seriously ridiculous
> "pico-optimizing" at this point ... if you find yourself with performance
> concerns there are probably 500 other things worth worrying about before
> this should ever cross your mind.
>

Thanks for the follow up.

I've converted our schema to required fields only with every other field
being a dynamic field.

The only negative that I've found so far is that you lose the copyField
capability, so it makes my ingest a little bigger, since I have to manually
copy the values myself.

-- 
A. Steven Anderson
Independent Consultant
st...@asanderson.com

Re: performance question

2010-01-06 Thread A. Steven Anderson

> You don't lose copyField capability with dynamic fields.  You can copy
> dynamic fields into a fixed field name like *_s => text or dynamic fields
> into another dynamic field like  *_s => *_t


Ahhh...I missed that little detail.  Nice!

Ok, so there are no negatives to using dynamic fields then. ;-)

Thanks for all the info!
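
A toy model of the two copyField patterns mentioned above, in plain Python.
It only imitates the name mapping (the text matched by the source wildcard is
substituted into the destination wildcard); it is not Solr's implementation,
the function name is hypothetical, and only leading-wildcard patterns such as
*_s are handled.

```python
from typing import Optional


def copy_target(field_name: str, source: str, dest: str) -> Optional[str]:
    """Map a concrete field name through a copyField-style rule.

    Returns the destination field name, or None if the rule does not apply.
    """
    if not source.startswith("*"):
        # Plain source name: it applies only on an exact match.
        return dest if field_name == source else None
    suffix = source[1:]
    if not field_name.endswith(suffix):
        return None
    # Part of the name that the source wildcard matched.
    matched = field_name[: len(field_name) - len(suffix)]
    return dest.replace("*", matched, 1) if "*" in dest else dest


print(copy_target("title_s", "*_s", "text"))  # dynamic field into a fixed field
print(copy_target("title_s", "*_s", "*_t"))   # dynamic field into a dynamic field
print(copy_target("price_i", "*_s", "*_t"))   # rule does not apply
```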

-- 
A. Steven Anderson
Independent Consultant
st...@asanderson.com

StreamingUpdateSolrServer URL reset or flush?

2010-04-21 Thread A. Steven Anderson

Greetings!

I'm using StreamingUpdateSolrServer to index my daily Solr shards.

However, at midnight when I need to start indexing the next day's shard, is
there a way to reset the StreamingUpdateSolrServer URL to point to my new
shard, or is there a way to flush the current StreamingUpdateSolrServer just
before midnight?

-- 
A. Steven Anderson
Independent Consultant
