leading and trailing wildcard query
I've scoured the archives and JIRA, but the answer to my question is just not clear to me. With all the new Solr 1.4 features, is there any way to do a leading and trailing wildcard query on an *untokenized* field? For example, q=myfield:*abc* would return a doc with myfield=xxxabcxxx. Yes, I know how expensive such a query would be, but we have the user requirement nonetheless. If not, any suggestions on how to implement a custom solution using Solr? Using an external data structure? -- A. Steven Anderson
Re: leading and trailing wildcard query
No thoughts on this? Really!? I would hate to admit to my Oracle DBE that Solr can't be customized to do a common query that a relational database can do. :-(

-- A. Steven Anderson

On Wed, Nov 4, 2009 at 6:01 PM, A. Steven Anderson <a.steven.ander...@gmail.com> wrote:
> I've scoured the archives and JIRA, but the answer to my question is just not clear to me.
> [...]
Re: leading and trailing wildcard query
> The guilt trick is not the best thing to try on public mailing lists. :)

Point taken, although that was not my intention. I guess I have been spoiled by quick replies and was starting to think it was a stupid question. Plus, I'm literally gonna get trash talk from my Oracle DBE if I can't make this work. ;-) We've basically relegated Oracle to handling ingest, from which we index Solr and provide all search features. I'd hate to have to succumb to using Oracle to service this one special query.

> The first thing that popped to my mind is to use 2 fields, where the second one contains the desrever string of the first one.

Please elaborate. What do you mean by *desrever* string?

> The second idea is to use n-grams (if it's OK to tokenize), more specifically edge n-grams.

Well, that's the problem. The field may have non-Latin characters that may not have whitespace or punctuation.

-- A. Steven Anderson
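The "desrever" (reversed) two-field idea quoted above can be illustrated in plain Java. This is a minimal sketch that only simulates the matching logic, not Solr code; it assumes a hypothetical second field (called myfield_rev here) that holds the reversed value. It shows that a leading wildcard such as myfield:*abc becomes a cheap trailing wildcard myfield_rev:cba*, and also that reversal by itself does not cover the *abc* case from the original question.

```java
import java.util.List;

class ReversedFieldSketch {
    // StringBuilder.reverse() keeps surrogate pairs together, so non-BMP
    // characters are not split when the value is reversed.
    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    // Leading wildcard on the original field: myfield:*abc
    static boolean leadingWildcard(String value, String suffix) {
        return value.endsWith(suffix);
    }

    // Same test as a trailing wildcard on the (hypothetical) reversed field:
    // myfield_rev:cba*
    static boolean trailingWildcardOnReversed(String value, String suffix) {
        return reverse(value).startsWith(reverse(suffix));
    }

    public static void main(String[] args) {
        for (String d : List.of("xxxabc", "xxxabcxxx", "abcxxx")) {
            System.out.println(d + ": *abc=" + leadingWildcard(d, "abc")
                + ", cba* on reversed=" + trailingWildcardOnReversed(d, "abc"));
        }
    }
}
```

Only xxxabc matches, and it matches both forms; xxxabcxxx matches neither, which is why a reversed field alone does not give infix search.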
Re: leading and trailing wildcard query
Thanks for the solution, but could you elaborate on how it would find something like *abc* in a field that contains abc?

Steve

On Thu, Nov 5, 2009 at 5:25 PM, Bernadette Houghton <bernadette.hough...@deakin.edu.au> wrote:
> I've just set up something similar (much thanks to Avesh!)-
> [schema.xml excerpt, mostly lost in archiving: fieldType definitions with positionIncrementGap="100", a filter with maxGramSize="25", and field definitions with multiValued="true", one also with stored="false"]
> bern
Re: leading and trailing wildcard query
> Doesn't it work to call SolrQueryParser.setAllowLeadingWildcard?

Good question. Anyone?

> It can be really slow, what an RDBMS person would call a full table scan.

Understood.

> There is an open bug to make that settable in a config file, but this is a pretty tiny change to the source.
> http://issues.apache.org/jira/browse/SOLR-218

Unfortunately, we can only use official releases (not even snapshots), since it's a government-related project.

-- A. Steven Anderson
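setAllowLeadingWildcard(boolean) is a method of Lucene's QueryParser, which SolrQueryParser extends; turning it on makes a leading wildcard legal, not cheap. The sketch below is a plain-Java simplification (a TreeSet standing in for the sorted term dictionary, not Lucene's actual implementation) that counts how many terms a prefix query and an infix query have to examine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

class TermScanSketch {
    // Matching terms plus how many dictionary entries had to be examined.
    static final class Result {
        final List<String> hits = new ArrayList<>();
        int examined;
    }

    // abc*: seek to the first term >= prefix and stop at the first term that no
    // longer starts with it, so only a narrow slice of the dictionary is read.
    static Result prefixQuery(NavigableSet<String> terms, String prefix) {
        Result r = new Result();
        for (String t : terms.tailSet(prefix, true)) {
            r.examined++;
            if (!t.startsWith(prefix)) break;
            r.hits.add(t);
        }
        return r;
    }

    // *abc*: there is no fixed prefix to seek to, so every term is examined
    // (the "full table scan" case).
    static Result infixQuery(NavigableSet<String> terms, String infix) {
        Result r = new Result();
        for (String t : terms) {
            r.examined++;
            if (t.contains(infix)) r.hits.add(t);
        }
        return r;
    }

    public static void main(String[] args) {
        NavigableSet<String> terms = new TreeSet<>(List.of("aaa", "abc", "abcd", "xxxabcxxx", "zzz"));
        Result p = prefixQuery(terms, "abc");
        Result i = infixQuery(terms, "abc");
        System.out.println("abc*  -> " + p.hits + ", examined " + p.examined);
        System.out.println("*abc* -> " + i.hits + ", examined " + i.examined);
    }
}
```

With five terms the difference is small, but the infix count grows with the whole dictionary while the prefix count grows only with the number of matching terms.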
Re: leading and trailing wildcard query
> Hi Steve, a query such as *abc* would need the NGramFilterFactory, hence the doubleedgytext, and would be retrievable by a query such as contains:abc. Note that you can set the max and minimum size of strings that get indexed.

Excellent! Just to clarify though, NGramFilterFactory is a Solr 1.4 feature only, correct?

-- A. Steven Anderson
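Why an n-gram field can answer contains:abc can be sketched as follows. This is a simplified model, not Lucene's NGramTokenFilter: it assumes the whole untokenized value is turned into every substring between a hypothetical minGram and maxGram length, and that a query term matches when it equals one of those indexed grams. Because grams are cut from raw characters (code points here), no whitespace or punctuation is needed, which matters for the non-Latin values mentioned earlier in the thread.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class NGramSketch {
    // Every substring of length minGram..maxGram, roughly what an n-gram
    // filter would emit as index terms for one untokenized value.
    static Set<String> ngrams(String value, int minGram, int maxGram) {
        Set<String> grams = new LinkedHashSet<>();
        int[] cps = value.codePoints().toArray(); // split on code points, not whitespace
        for (int n = minGram; n <= maxGram; n++) {
            for (int start = 0; start + n <= cps.length; start++) {
                grams.add(new String(cps, start, n));
            }
        }
        return grams;
    }

    // A "contains" query matches only if the query string is itself one of the
    // indexed grams, so strings outside minGram..maxGram can never match.
    static boolean contains(Set<String> indexedGrams, String queryTerm) {
        return indexedGrams.contains(queryTerm);
    }

    public static void main(String[] args) {
        Set<String> grams = ngrams("xxxabcxxx", 2, 4);
        System.out.println(contains(grams, "abc"));   // true
        System.out.println(contains(grams, "xabcx")); // false: longer than maxGram
    }
}
```

The second call shows the limitation raised later in the thread: query strings longer than maxGram (or shorter than minGram) find nothing, and index size grows with the range of gram lengths.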
Re: leading and trailing wildcard query
> Ah. With that restriction, it is impossible. If it is OK to pay Lucid to make a one-line change, you might be able to do it. Otherwise, get ready to spend a lot of money for a search engine.

Well, now that Lucid is getting In-Q-Tel $$$, they will soon learn that official releases are all that matters, and 12-18 month release cycles are not acceptable. ;-)

-- A. Steven Anderson
Re: leading and trailing wildcard query
> Note that N-grams are limited to specific string lengths. I presume that you need to search for arbitrary strings, not just three-letter ones.

Understood, but that is a limitation that we can live with. Thanks!

-- A. Steven Anderson
Re: leading and trailing wildcard query
> Not sure what version it was supported from, but we're on 1.3.

Really!? Great answer! Thanks!

-- A. Steven Anderson
Retrieve docs with > 1 multivalue field hits
Greetings! I thought I remembered seeing a thread about retrieving only documents that had more than one hit in a particular multivalued field, but I cannot find it now. Regardless, is this possible in Solr 1.3? Solr 1.4? -- A. Steven Anderson Independent Consultant
Re: Retrieve docs with > 1 multivalue field hits
I thought this would be a quick yes or no answer and/or a reference to another thread, but alas, I got no replies. Is it safe to assume the answer is 'no' for both Solr 1.3 and 1.4?

-- A. Steven Anderson Independent Consultant

On Thu, Jul 2, 2009 at 3:48 PM, A. Steven Anderson wrote:
> Greetings!
> [...]
Re: Retrieve docs with > 1 multivalue field hits
> : I thought I remembered seeing a thread related to retrieving only documents
> : that had more than one hit in a particular multivalue field, but I cannot
> : find it now.
>
> not easily.
>
> for arbitrary queries there's no simple way to know that a query matches a single document multiple times, in separate field values.
>
> there may be some tricks that are possible if you give us a more concrete example of what exactly your use case is.

Thanks for the reply. I was beginning to think it was a stupid question. ;-) A simple example would be if a schema included a phoneNum multiValued field and I wanted to return all docs that contained more than one phoneNum value. It's sort of like a facet query, but on a per-document count basis.

-- A. Steven Anderson Independent Consultant
Re: Retrieve docs with > 1 multivalue field hits
> all docs that contain more than one phone number - regardless of matching a particular query?

Exactly.

> knowing that was a useful query, i'd change my indexer to also provide either a field with the count of phone number values, or a boolean field saying whether there are more than one or not.

True, but I was hoping there would be a generic query way to do it.

> things are more complex if you're asking something different than my interpretation above.

No. That was it. Basically, the functional equivalent of "select * from docs where count(phone) > 1". I was thinking there might be a way with function queries, but I guess not. Thanks for the answer though.

-- A. Steven Anderson Independent Consultant
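The index-time workaround suggested above (store a count or a boolean alongside the multivalued field) might look like this on the indexing side. The document is modeled as a plain Map rather than a SolrInputDocument, and the field names phoneNum_count and phoneNum_multi are hypothetical; the query strings in the final comment are the kind of filter that could then stand in for count(phone) > 1.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CountFieldSketch {
    // Derive extra fields from the multivalued phoneNum field before the
    // document is sent to Solr. Both derived field names are hypothetical.
    static Map<String, Object> enrich(Map<String, Object> doc) {
        Map<String, Object> out = new HashMap<>(doc);
        Object phones = doc.get("phoneNum");
        int count;
        if (phones instanceof Collection) {
            count = ((Collection<?>) phones).size();
        } else {
            count = (phones == null) ? 0 : 1; // single value or absent
        }
        out.put("phoneNum_count", count);
        out.put("phoneNum_multi", count > 1);
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("id", "doc1");
        doc.put("phoneNum", List.of("555-0100", "555-0101"));
        Map<String, Object> indexed = enrich(doc);
        System.out.println(indexed.get("phoneNum_count") + " " + indexed.get("phoneNum_multi"));
        // At query time, a filter such as
        //   fq=phoneNum_multi:true
        // or, assuming phoneNum_count uses a numeric type that supports ranges,
        //   fq=phoneNum_count:[2 TO *]
        // would select documents with more than one phone number.
    }
}
```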
performance question
Greetings! Is there any significant negative performance impact from using a dynamicField? Likewise for multivalued fields? The reason I ask is that our system aggregates data from many disparate data sources (structured, unstructured, and semi-structured), and managing the schema.xml has become unwieldy; i.e., we currently have dozens of fields, and the number grows every time we add a new data source. I was considering redefining the domain model outside of Solr; it would be used to generate the fields for the indexing process and the metadata (e.g., display names) for the search process. Thoughts? -- A. Steven Anderson Independent Consultant st...@asanderson.com
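One way to keep such a domain model outside Solr is sketched below, under assumptions: each attribute carries a hypothetical kind that maps to a dynamic-field suffix (the *_s / *_t / *_i style used in Solr's example schema), so the indexer can derive field names and the search UI can look up display names from the same definitions. All class, attribute, and suffix names here are illustrative, not taken from the poster's system.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DomainModelSketch {
    // Hypothetical value kinds, each mapped to a suffix that is assumed to be
    // declared once in schema.xml as a dynamic field (e.g. "*_s").
    enum Kind {
        STRING("_s"), TEXT("_t"), INT("_i");
        final String suffix;
        Kind(String suffix) { this.suffix = suffix; }
    }

    // One attribute of the domain model, maintained outside Solr.
    static final class Attr {
        final String name;
        final Kind kind;
        final String displayName;
        Attr(String name, Kind kind, String displayName) {
            this.name = name;
            this.kind = kind;
            this.displayName = displayName;
        }
        // Indexing side: the Solr field name this attribute is written to.
        String solrField() { return name + kind.suffix; }
    }

    // Search side: Solr field name -> display name, for rendering results.
    static Map<String, String> displayNames(List<Attr> model) {
        Map<String, String> m = new LinkedHashMap<>();
        for (Attr a : model) m.put(a.solrField(), a.displayName);
        return m;
    }

    public static void main(String[] args) {
        List<Attr> model = List.of(
            new Attr("source", Kind.STRING, "Data source"),
            new Attr("body", Kind.TEXT, "Body text"),
            new Attr("pageCount", Kind.INT, "Page count"));
        System.out.println(displayNames(model));
        // {source_s=Data source, body_t=Body text, pageCount_i=Page count}
    }
}
```

Adding a data source then means adding Attr entries rather than editing schema.xml, as long as every kind maps to an already-declared dynamic field.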
DIH optional fields?
Greetings! I'm trying to index a MySQL database that has some invalid dates (e.g., "0000-00-00"), which cause my DIH import to abort. Ideally, I'd like DIH to skip just this optional field, not the whole record. I don't see any way to do this currently; is there a workaround? Should there be a JIRA issue for a field-level optional or onError parameter? -- A. Steven Anderson Independent Consultant st...@asanderson.com
Re: DIH optional fields?
> Use the &zeroDateTimeBehavior=convertToNull parameter in your SQL connection string.

That worked great! Thanks!

-- A. Steven Anderson Independent Consultant A. S. Anderson & Associates LLC P.O. Box 672 Forest Hill, MD 21050-0672 443-790-4269 st...@asanderson.com
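zeroDateTimeBehavior is a MySQL Connector/J connection property, and convertToNull makes zero dates such as 0000-00-00 come back as NULL instead of raising an error. A small helper that appends it to an existing JDBC URL, choosing ? or & as appropriate, might look like this; the host and database names are hypothetical.

```java
class JdbcUrlSketch {
    // Append zeroDateTimeBehavior=convertToNull unless it is already present.
    // The first parameter after the database name starts with '?', later ones with '&'.
    static String withZeroDateAsNull(String jdbcUrl) {
        if (jdbcUrl.contains("zeroDateTimeBehavior=")) return jdbcUrl;
        char sep = jdbcUrl.contains("?") ? '&' : '?';
        return jdbcUrl + sep + "zeroDateTimeBehavior=convertToNull";
    }

    public static void main(String[] args) {
        // Hypothetical host and database.
        System.out.println(withZeroDateAsNull("jdbc:mysql://localhost:3306/mydb"));
        System.out.println(withZeroDateAsNull("jdbc:mysql://localhost:3306/mydb?useUnicode=true"));
    }
}
```

When the URL is placed in the url attribute of the DIH dataSource element in data-config.xml, the & separator has to be written as &amp;, since that file is XML.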
Re: performance question
> There can be an impact if you are searching against a lot of fields or if you are indexing a lot of fields on every document, but for the most part in most applications it is negligible.

We index a lot of fields at one time, but we can tolerate the performance impact at index time.

> It probably can't hurt to be more streamlined, but without knowing more about your model, it's hard to say. I've built apps that were totally dynamic field based and they worked just fine, but these were more for discovery than just pure search. In other words, the user was interacting with the system in a reflective model that selected which fields to search on.

Our application is as much about discovery as search, so this is good to know. Thanks for the feedback. It was very helpful.

-- A. Steven Anderson Independent Consultant st...@asanderson.com
Re: performance question
> Sorting and index norms have space penalties.
> Sorting on a field creates an array of Java ints, one for every document in the index. Index norms (used for boosting documents and other things) create an array of bytes in the Lucene index files, one for every document in the index.
> If you sort on many of your dynamic fields your memory use will explode, and the same with index norms and disk space.

Thanks for the info. In general, I knew sorting was expensive, but I didn't realize that dynamic fields made it worse.

-- A. Steven Anderson Independent Consultant st...@asanderson.com
Re: performance question
> dynamic fields don't make it worse ... the number of actual field names you sort on makes it worse.
>
> If you sort on 100 fields, the cost is the same regardless of whether all 100 of those fields exist because of a single <dynamicField/> declaration, or 100 distinct <field/> declarations.

Ahh...thanks for the clarification. So, in general, there is no *significant* performance difference with using dynamic fields. Correct?

-- A. Steven Anderson Independent Consultant st...@asanderson.com
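Using the rough figures quoted above (one 4-byte int per document for each field sorted on, one byte per document for each field with norms), the arithmetic below shows why the number of distinct field names drives the cost, whatever kind of declaration produced them. The document count is hypothetical, and the sort cache for string fields also holds the distinct values themselves, so treat these numbers as lower bounds rather than measurements.

```java
class SortMemorySketch {
    // One Java int (4 bytes) per document for every field that is sorted on.
    static long sortBytes(long maxDoc, int sortedFields) {
        return 4L * maxDoc * sortedFields;
    }

    // One byte per document for every field that keeps norms.
    static long normBytes(long maxDoc, int fieldsWithNorms) {
        return 1L * maxDoc * fieldsWithNorms;
    }

    public static void main(String[] args) {
        long maxDoc = 10_000_000L; // hypothetical index size
        System.out.println(sortBytes(maxDoc, 1));   // 40000000   (~40 MB)
        System.out.println(sortBytes(maxDoc, 100)); // 4000000000 (~4 GB)
        System.out.println(normBytes(maxDoc, 100)); // 1000000000 (~1 GB)
    }
}
```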
Re: performance question
> Strictly speaking there are some insignificant distinctions in performance related to how a field name is resolved -- Grant alluded to this earlier in this thread -- but it only comes into play when you actually refer to that field by name and Solr has to "look them up" in the metadata. So for example if your request referred to 100 different field names in the q, fq, and facet.field params there would be a small overhead for any of those 100 fields that existed because of <dynamicField/> declarations, that would not exist for any of those fields that were declared using <field/> -- but there would be no added overhead to that query if there were 999 other fields that existed in your index because of that same <dynamicField/> declaration.
>
> But frankly: we're talking about seriously ridiculous "pico-optimizing" at this point ... if you find yourself with performance concerns there are probably 500 other things worth worrying about before this should ever cross your mind.

Thanks for the follow-up. I've converted our schema to required fields only, with every other field being a dynamic field. The only negative that I've found so far is that you lose the copyField capability, which makes my ingest a little bigger, since I have to manually copy the values myself.

-- A. Steven Anderson Independent Consultant st...@asanderson.com
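A simplified model of the lookup being described: explicitly declared names resolve with one hash lookup, and only names that miss fall through to a scan of the dynamic-field patterns. This is not Solr's IndexSchema code; the class and method names are hypothetical, and only simple suffix ("*_s") and prefix ("attr_*") globs are handled, checked in declaration order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FieldLookupSketch {
    private final Map<String, String> explicitFields = new HashMap<>(); // name -> type
    private final List<String[]> dynamicPatterns = new ArrayList<>();   // {glob, type}

    void addField(String name, String type) { explicitFields.put(name, type); }

    void addDynamicField(String glob, String type) { dynamicPatterns.add(new String[] {glob, type}); }

    // Explicit names cost one hash lookup; other names fall through to a
    // linear scan of the declared patterns.
    String typeOf(String fieldName) {
        String t = explicitFields.get(fieldName);
        if (t != null) return t;
        for (String[] p : dynamicPatterns) {
            String glob = p[0];
            if (glob.startsWith("*") && fieldName.endsWith(glob.substring(1))) return p[1];
            if (glob.endsWith("*") && fieldName.startsWith(glob.substring(0, glob.length() - 1))) return p[1];
        }
        return null; // unknown field
    }

    public static void main(String[] args) {
        FieldLookupSketch schema = new FieldLookupSketch();
        schema.addField("id", "string");
        schema.addDynamicField("*_s", "string");
        schema.addDynamicField("*_t", "text");
        System.out.println(schema.typeOf("id"));       // string
        System.out.println(schema.typeOf("source_s")); // string
        System.out.println(schema.typeOf("body_t"));   // text
        System.out.println(schema.typeOf("other"));    // null
    }
}
```

The extra work is a short loop over patterns per referenced field name, which is consistent with the "pico-optimizing" point above.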
Re: performance question
> You don't lose copyField capability with dynamic fields. You can copy dynamic fields into a fixed field name like *_s => text or dynamic fields into another dynamic field like *_s => *_t

Ahhh...I missed that little detail. Nice! OK, so there are no negatives to using dynamic fields then. ;-) Thanks for all the info!

-- A. Steven Anderson Independent Consultant st...@asanderson.com
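The two copyField forms quoted above can be modeled like this: a source glob such as *_s copies either into a fixed field (text) or into another glob, where the part matched by * carries over (title_s to title_t). In schema.xml these would be copyField entries with source and dest attributes; the helper below is a hypothetical plain-Java illustration of that mapping, and it assumes the source glob always starts with *.

```java
class CopyFieldSketch {
    // Destination field for `field` under the rule sourceGlob => destPattern,
    // or null if the rule does not apply. Assumes sourceGlob looks like "*_s".
    static String destFor(String field, String sourceGlob, String destPattern) {
        String srcSuffix = sourceGlob.substring(1);
        if (!field.endsWith(srcSuffix)) return null;           // rule does not match
        if (!destPattern.startsWith("*")) return destPattern;  // fixed destination, e.g. "text"
        String stem = field.substring(0, field.length() - srcSuffix.length()); // part matched by *
        return stem + destPattern.substring(1);                 // e.g. "title" + "_t"
    }

    public static void main(String[] args) {
        System.out.println(destFor("title_s", "*_s", "text")); // text
        System.out.println(destFor("title_s", "*_s", "*_t"));  // title_t
        System.out.println(destFor("body_t", "*_s", "text"));  // null
    }
}
```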
StreamingUpdateSolrServer URL reset or flush?
Greetings! I'm using StreamingUpdateSolrServer to index my daily Solr shards. However, at midnight when I need to start indexing the next day's shard, is there a way to reset the StreamingUpdateSolrServer URL to point to my new shard, or is there a way to flush the current StreamingUpdateSolrServer just before midnight? -- A. Steven Anderson Independent Consultant
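One way to handle the midnight rollover, sketched under assumptions: give each day's shard its own client, and when the day changes, finish the old client before creating one for the new shard. SolrJ's StreamingUpdateSolrServer provides blockUntilFinished() for draining its queue, which a real finish step might call before a commit; here the client is hidden behind a hypothetical ShardWriter interface, and the shard_yyyyMMdd URL scheme is an assumption, so the routing logic runs without Solr.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.function.Function;

class DailyShardSketch {
    // Hypothetical stand-in for a real client such as a StreamingUpdateSolrServer.
    interface ShardWriter {
        void add(String doc);
        void finish(); // with SolrJ this might be blockUntilFinished() and then commit()
    }

    private final String baseUrl;
    private final Function<String, ShardWriter> factory; // builds a writer for a shard URL
    private LocalDate currentDay;
    private ShardWriter current;

    DailyShardSketch(String baseUrl, Function<String, ShardWriter> factory) {
        this.baseUrl = baseUrl;
        this.factory = factory;
    }

    // Assumed naming scheme: one shard per day, e.g. <baseUrl>/shard_20091105
    static String shardUrl(String baseUrl, LocalDate day) {
        return baseUrl + "/shard_" + day.format(DateTimeFormatter.BASIC_ISO_DATE);
    }

    // Route a document to the shard for its day, rolling over when the day changes.
    void add(String doc, LocalDate day) {
        if (!day.equals(currentDay)) {
            if (current != null) current.finish(); // drain the previous day's writer
            current = factory.apply(shardUrl(baseUrl, day));
            currentDay = day;
        }
        current.add(doc);
    }

    // Finish whatever shard is still open, e.g. at shutdown.
    void close() {
        if (current != null) current.finish();
        current = null;
        currentDay = null;
    }

    public static void main(String[] args) {
        DailyShardSketch router = new DailyShardSketch("http://localhost:8983/solr", url -> new ShardWriter() {
            public void add(String doc) { System.out.println("add " + doc + " -> " + url); }
            public void finish() { System.out.println("finish " + url); }
        });
        router.add("doc1", LocalDate.of(2009, 11, 5));
        router.add("doc2", LocalDate.of(2009, 11, 6));
        router.close();
    }
}
```

Creating a fresh client per day sidesteps the question of whether an existing client's URL can be changed mid-stream.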