Re: RegexQuery performance

2011-12-12 Thread Jay Luker
On Sat, Dec 10, 2011 at 9:25 PM, Erick Erickson wrote:
> My off-the-top-of-my-head notion is you implement a
> Filter whose job is to emit some "special" tokens when
> you find strings like this that allow you to search without
> regexes. For instance, in the example you give, you could
> index so

Re: RegexQuery performance

2011-12-10 Thread Erick Erickson
Hmmm, I don't know all that much about the universe you're searching (I'm *really* sorry about that, but I couldn't resist) but I wonder if you can't turn the problem on its head and do your regex stuff at index time instead. My off-the-top-of-my-head notion is you implement a Filter whose job is
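
To make the index-time idea concrete, here is a minimal sketch in plain Python rather than an actual Lucene/Solr filter. It assumes a whitespace tokenizer and a hypothetical pattern for dataset-style identifiers (modelled loosely on the "IRAS-A-FPA-3-RDR-IMPS-V6.0" example from elsewhere in this thread); the names `DATASET_ID`, `MARKER`, and `index_tokens` are invented for illustration and do not come from the thread.

```python
import re

# Hypothetical pattern: hyphen-separated uppercase/digit segments that end
# in a version such as "V6.0". An assumption for illustration only, not
# the regex discussed in the thread.
DATASET_ID = re.compile(r"^[A-Z0-9]+(?:-[A-Z0-9]+)+-V\d+\.\d+$")

# Hypothetical marker token emitted next to each matching identifier.
MARKER = "__dataset_id__"


def index_tokens(text):
    """Split text on whitespace and emit MARKER after identifier-like tokens."""
    out = []
    for raw in text.split():
        # Trim surrounding punctuation so "V6.0." at a sentence end still matches.
        tok = raw.strip(".,;:()\"'")
        if not tok:
            continue
        out.append(tok)
        if DATASET_ID.match(tok):
            out.append(MARKER)
    return out


tokens = index_tokens("Data from IRAS-A-FPA-3-RDR-IMPS-V6.0 were used.")
```

With tokens like these in the index, a search for the marker term is an ordinary term lookup, so the regex cost is paid once per document at index time instead of on every query.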

Re: RegexQuery performance

2011-12-10 Thread Jay Luker
Hi Erick,

On Fri, Dec 9, 2011 at 12:37 PM, Erick Erickson wrote:
> Could you show us some examples of the kinds of things
> you're using regex for? I.e. the raw text and the regex you
> use to match the example?

Sure! An example identifier would be "IRAS-A-FPA-3-RDR-IMPS-V6.0", which identifies

Re: RegexQuery performance

2011-12-09 Thread Erick Erickson
Could you show us some examples of the kinds of things you're using regex for? I.e. the raw text and the regex you use to match the example? The reason I ask is that perhaps there are other approaches, especially thinking about some clever analyzing at index time. For instance, perhaps NGrams are
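
For readers unfamiliar with the term, the sketch below shows what character n-grams are and how they can narrow a substring search. The preview above is cut off, so this is not a reconstruction of the approach being proposed; the function names and the choice of n=3 are assumptions made for illustration.

```python
def char_ngrams(token, n=3):
    """Return all contiguous character n-grams of token, in order."""
    return [token[i:i + n] for i in range(len(token) - n + 1)]


def may_contain(indexed_grams, fragment, n=3):
    """Cheap pre-check: a token can contain fragment only if it has every n-gram of fragment."""
    return all(g in indexed_grams for g in char_ngrams(fragment, n))


grams = set(char_ngrams("IRAS-A-FPA-3-RDR-IMPS-V6.0"))
hit = may_contain(grams, "RDR-IMPS")
miss = may_contain(grams, "RDR-XYZ")
```

The pre-check can produce false positives (all n-grams present but not adjacent), so a real system would still verify candidates; the gain is that only a small candidate set needs verifying.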

Re: RegexQuery performance

2011-12-08 Thread Robert Muir
On Thu, Dec 8, 2011 at 11:01 AM, Jay Luker wrote:
> Hi,
>
> I am trying to provide a means to search our corpus of nearly 2
> million fulltext astronomy and physics articles using regular
> expressions. A small percentage of our users need to be able to
> locate, for example, certain types of iden

RegexQuery performance

2011-12-08 Thread Jay Luker
Hi, I am trying to provide a means to search our corpus of nearly 2 million fulltext astronomy and physics articles using regular expressions. A small percentage of our users need to be able to locate, for example, certain types of identifiers that are present within the fulltext (grant numbers, d