indexing unique keys
I have a use case where we want to store unique keys (hashes) that would be compared against another set of keys (hashes). For example:

Index set = { h1, h2, h3, h4 }
Comparison set = { h1, h2 }
Result set = { h1, h2 }

Would it be an advantage to store the "index set" in Solr instead of in a traditional database?

Thanks in advance,
Nipen Mark
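The comparison described above is a plain set intersection. The in-memory sketch below pins down the expected result for the example sets. It does not use Solr, and the class name HashIntersection is hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class HashIntersection {
    // Returns the keys of the comparison set that also exist in the index set,
    // preserving the order in which they appear in the comparison set.
    static Set<String> intersect(Set<String> indexSet, List<String> comparisonSet) {
        Set<String> result = new LinkedHashSet<>();
        for (String key : comparisonSet) {
            if (indexSet.contains(key)) {
                result.add(key);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> index = Set.of("h1", "h2", "h3", "h4");
        System.out.println(intersect(index, List.of("h1", "h2"))); // prints [h1, h2]
    }
}
```

If the index set lived in Solr, the equivalent step would be a query on the key field that ORs the comparison hashes together. Either store answers the same exact-match question, so the sketch only fixes the semantics, not the choice of store.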
Two analyzers per field
Is it possible to specify two analyzers per field? For example, consider:

F1 (keyword analyzer) = "cheers mate"
F2 (keyword analyzer) = "hello world"

There is also a copy field TEXT (standard analyzer), which will store the terms { cheers, mate, hello, world }. When a user performs any search, we look at the copy field TEXT only, which uses the standard analyzer.

Suppose the user searches for the phrase "hello word". It will not return any result, because "hello" and "world" are tokenized as separate terms. Is it possible to also index "hello world" as-is into the TEXT field? That is, can I use both the keyword analyzer and the standard analyzer for the field TEXT?

What would be a better approach to handle this situation?

--
Nipen Mark
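A common way to get both behaviours is to use two copy fields instead of two analyzers on one field. TEXT keeps the standard analyzer, and a second copy field (called TEXT_EXACT here, a hypothetical name) receives the same F1/F2 values with the keyword analyzer. The query then searches both fields. The sketch below only builds such a query string and assumes those two field names.

```java
class DualFieldQuery {
    // Escapes backslashes first, then double quotes, so user input can sit
    // safely inside a quoted phrase.
    static String quote(String input) {
        return "\"" + input.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Searches the tokenized field and the untokenized copy field together.
    // TEXT_EXACT is a hypothetical keyword-analyzed copy of F1 and F2.
    static String build(String userInput) {
        String phrase = quote(userInput);
        return "TEXT:" + phrase + " OR TEXT_EXACT:" + phrase;
    }

    public static void main(String[] args) {
        System.out.println(build("hello world"));
        // prints TEXT:"hello world" OR TEXT_EXACT:"hello world"
    }
}
```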
question on wild card
I have a database field = "hello world", and I am indexing it into the text field with the standard analyzer (text is a copy field in Solr).

Now, when a user gives the query text:"hello world%", how is the query interpreted in the background? Are we actually searching text:hello OR text:world% (assuming the default operator is OR)?

--
Nipen Mark
Re: question on wild card
Thanks, Erick. One more question: when "the perfect world*" is passed as the search query, it is converted to "? perfect world". What does "?" mean? Since I am using the standard analyzer, I thought the stop word "the" was removed.

Thanks

On Thu, Jul 15, 2010 at 7:01 AM, Erick Erickson wrote:
> The best way to understand how things are parsed is to go to the solr admin
> page (Full interface link?) and click the "debug info" box and submit your
> query. That'll tell you exactly what happens.
>
> Alternatively, you can put &debugQuery=on on your URL...
>
> HTH
> Erick
>
> On Wed, Jul 14, 2010 at 8:48 AM, Mark N wrote:
>
> > I have a database field = hello world and i am indexing to text field
> > with standard analyzer ( text is a copy field of solr)
> >
> > Now when user gives a query text:"hello world%" , how does the query is
> > interpreted in the background
> >
> > are we actually searching text: hello OR text: world%( consider by
> > default operator is OR )
> >
> > --
> > Nipen Mark

--
Nipen Mark
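Erick's second suggestion is to append &debugQuery=on to the request URL. The sketch below builds such a URL so the response shows how the analyzer rewrote the query. The base URL (localhost:8983) and the /select path are assumptions.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class DebugQueryUrl {
    // Builds a /select URL whose response includes the parsed query
    // (debugQuery=on), showing how the input was analyzed and rewritten.
    static String build(String baseUrl, String query) {
        return baseUrl + "/select?q=" + URLEncoder.encode(query, StandardCharsets.UTF_8)
                + "&debugQuery=on";
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8983/solr", "text:\"the perfect world*\""));
        // prints http://localhost:8983/solr/select?q=text%3A%22the+perfect+world*%22&debugQuery=on
    }
}
```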
Re: wildcard and proximity searches
Hi,

Were you successful in trying SOLR-1604 to allow wildcard queries in phrases? Also, does this plugin allow us to use proximity with wildcards, e.g. "solr mail*"~10?

Is this the right approach to support these functionalities?

Thanks,
Mark

On Wed, Aug 4, 2010 at 2:24 PM, Frederico Azeiteiro <frederico.azeite...@cision.com> wrote:
> Thanks for you ideia.
>
> At this point I'm logging each query time. My ideia is to divide my
> queries into "normal queries" and "heavy queries". I have some heavy
> queries with 1 minute or 2mintes to get results. But they have for
> instance (*word1* AND *word2* AND word3*). I guess that this will be
> always slower (could be a little faster with
> "ReversedWildcardFilterFactory") but they never be ready in a few
> seconds. For now, I just increased the timeout for those :) (using
> solrnet).
>
> My priority at the moment is the queries phrases like "word1* word2*
> word3". After this is working, I'll try to optimize the "heavy queries"
>
> Frederico
>
>
> -----Original Message-----
> From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
> Sent: Wednesday, 4 August 2010 01:41
> To: solr-user@lucene.apache.org
> Subject: Re: wildcard and proximity searches
>
> Frederico Azeiteiro wrote:
> >
> >>> But it is unusual to use both leading and trailing * operator. Why are
> >>> you doing this?
> >
> > Yes I know, but I have a few queries that need this. I'll try the
> > "ReversedWildcardFilterFactory".
> >
>
> ReverseWildcardFilter will help leading wildcard, but will not help
> trying to use a query with BOTH leading and trailing wildcard. it'll
> still be slow. Solr/lucene isn't good at that; I didn't even know Solr
> would do it at all in fact.
>
> If you really needed to do that, the way to play to solr/lucene's way of
> doing things, would be to have a field where you actually index each
> _character_ as a seperate token. Then leading and trailing wildcard
> search is basically reduced to a "phrase search", but where the words
> are actually characters. But then you're going to get an index where
> pretty much every token belongs to every document, which Solr isn't that
> great at either, but then you can apply "commongram" stuff on top to
> help that out a lot too. Not quite sure what the end result will be,
> I've never tried it. I'd only use that weird special "char as token"
> field for queries that actually required leading and trailing wildcards.
>
> Figuring out how to set up your analyzers, and what (if anything) you're
> going to have to do client-app-side to transform the user's query into
> something that'll end up searching like a "phrase search where each
> 'word' is a character is left as an exersize for the reader. :)
>
> Jonathan

--
Nipen Mark
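Jonathan's "char as token" idea has two halves. At index time each character becomes its own token. At query time a leading-and-trailing wildcard such as *ail* becomes a phrase over single characters. The sketch below covers only the client-side query transformation he leaves as an exercise. The field name text_chars is hypothetical, and, as he says, the approach is untried.

```java
import java.util.ArrayList;
import java.util.List;

class CharTokens {
    // Index-side view: one token per character, e.g. "mail" -> [m, a, i, l].
    static List<String> tokenize(String value) {
        List<String> tokens = new ArrayList<>();
        value.codePoints().forEach(cp -> tokens.add(new String(Character.toChars(cp))));
        return tokens;
    }

    // Query side: turns an infix wildcard like "*ail*" into the character
    // phrase "a i l". Only patterns with both a leading and trailing '*' are accepted.
    static String infixToCharPhrase(String wildcard) {
        if (wildcard.length() < 3 || !wildcard.startsWith("*") || !wildcard.endsWith("*")) {
            throw new IllegalArgumentException("expected *text*: " + wildcard);
        }
        String core = wildcard.substring(1, wildcard.length() - 1);
        return "\"" + String.join(" ", tokenize(core)) + "\"";
    }

    // Prefixes the phrase with the (hypothetical) character-token field name.
    static String query(String field, String wildcard) {
        return field + ":" + infixToCharPhrase(wildcard);
    }

    public static void main(String[] args) {
        System.out.println(query("text_chars", "*ail*")); // prints text_chars:"a i l"
    }
}
```

Because characters from neighbouring words sit next to each other in such a field, a character phrase can also match across word boundaries. That is one reason to reserve the field for queries that truly need both wildcards.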
Re: wildcard and proximity searches
Thanks, Ahmet.

Is it also possible to search for documents having a field ENDING with "week*"? The query should return documents with a field ending with "week" and its derivatives, such as "weekly" and "weeks". So the above query should return "this week", "Past three weeks" and "Report weekly".

Thanks,
chandan

On Tue, Oct 5, 2010 at 5:04 PM, Ahmet Arslan wrote:
> > Also does this plugin allow us to use proximity with wild
> > card
> > "solr mail*"~10
>
> Yes it supports "solr mail*"~10 kind of queries without any problem.
>
> Currently it throws exception with "mail*" kind of queries, but they are
> not valid phrase queries. Because there is only one clause inside quotation
> marks.

--
Nipen Mark
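One possible approach, which is not from the thread and is only an assumption, is to copy the last word of the field into a separate field at index time (called last_word here, a hypothetical name). An ordinary prefix query such as last_word:week* can then run against it. The sketch below shows the extraction step and checks which of the example values would match.

```java
import java.util.List;
import java.util.Locale;

class LastWord {
    // Returns the final whitespace-separated word, lower-cased ("" for blank input).
    static String lastWord(String value) {
        String[] words = value.trim().split("\\s+");
        return words[words.length - 1].toLowerCase(Locale.ROOT);
    }

    // Mimics the prefix query last_word:week* against the extracted value.
    static boolean endsWithPrefix(String value, String prefix) {
        return lastWord(value).startsWith(prefix.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        for (String v : List.of("this week", "Past three weeks", "Report weekly", "weekly report")) {
            System.out.println(v + " -> " + endsWithPrefix(v, "week"));
        }
    }
}
```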
filtering footer information
Is it possible to filter out certain repeated footer information from text documents while indexing into Solr? Are there any built-in filters similar to stop word filters?

--
Thanks,
Nipen Mark
filtering number and repeated contents
Is it possible to filter out numbers and disclaimers (repeated content) while indexing into Solr? This is all surplus information, and we do not want to index it.

I have also tried using the boilerpipe algorithm to remove surplus information from web pages, such as navigational elements, templates and advertisements. I think it works well, but I would like to know whether I could also filter out "disclaimer" information, mainly in email texts.

--
Thanks,
Nipen Mark
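For the numbers part, the sketch below shows the kind of token filtering involved. It assumes the filtering runs as plain Java on the client before the text is sent to Solr; doing it inside Solr would need an analyzer filter instead. Any token made only of digits, optionally with "," or "." separators, is treated as a number.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

class NumberStripper {
    // Drops whitespace-separated tokens such as "2012", "1,500" or "3.14";
    // all other tokens are kept in their original order.
    static String stripNumbers(String text) {
        return Arrays.stream(text.trim().split("\\s+"))
                .filter(token -> !token.matches("[0-9]+([.,][0-9]+)*"))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(stripNumbers("Invoice 2012 total 1,500 paid")); // prints Invoice total paid
    }
}
```

Mixed tokens such as "v2" are kept on purpose. Widen the pattern if those should go too.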
Re: filtering number and repeated contents
Thanks, Jack, I will try an update processor.

By the way, does Solr store the tokenized "content" of a field if the field has the property stored="true"?

On Tue, Jun 5, 2012 at 8:23 PM, Jack Krupansky wrote:
> My (very limited) understanding of "boilerpipe" in Tika is that it strips
> out "short text", which is great for all the menu and navigation text, but
> the typical disclaimer at the bottom of an email is not very short and
> frequently can be longer than the email message body itself. You may have
> to resort to a custom update processor that is programmed with some
> disclaimer signature text strings to be removed from field values.
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: Mark , N
> Sent: Tuesday, June 05, 2012 8:28 AM
> To: solr-user@lucene.apache.org
> Subject: filtering number and repeated contents
>
> Is it possible to filter out numbers and disclaimer ( repeated contents)
> while indexing to SOLR?
> These are all surplus information and do not want to index it
>
> I have tried using boilerpipe algorithm as well to remove surplus
> infromation from web pages such as navigational elements, templates, and
> advertisements , I think it works well but looking forward to see If I
> could filter out "disclaimer" information too mainly in email texts.
> --
> Thanks,
>
> Nipen Mark

--
Thanks,
Nipen Mark
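Jack's suggestion comes down to removing known disclaimer strings from field values before they are indexed. The sketch below shows only that core logic in plain Java. It does not use Solr's update-processor API, and the disclaimer text in the example is a made-up placeholder.

```java
import java.util.List;

class DisclaimerStripper {
    // Removes every occurrence of each known disclaimer string from the value,
    // then trims the whitespace left behind.
    static String strip(String value, List<String> disclaimers) {
        String result = value;
        for (String disclaimer : disclaimers) {
            result = result.replace(disclaimer, "");
        }
        return result.trim();
    }

    public static void main(String[] args) {
        List<String> known = List.of("CONFIDENTIAL: for the addressee only.");
        System.out.println(strip("Meeting at 10.\n\nCONFIDENTIAL: for the addressee only.", known));
        // prints Meeting at 10.
    }
}
```

Exact-string removal is brittle when disclaimers vary slightly between senders. A variant could cut the value at the first known signature line instead, since disclaimers usually run to the end of the message.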
search hit on multivalued fields
I have a multivalued field "Text" which is indexed, for example:

F1: some value
F2: some value
Text = (content of F1, F2)

When a user searches, I check only the "Text" field, but I also need to show users which field (F1 or F2) produced the search hit. Is this possible in Solr?

--
Thanks,
Nipen Mark
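Assuming F1 and F2 are also stored and returned with each hit, one client-side workaround is to re-check the document's source fields for the query terms. This is an assumption, not something the post or Solr prescribes. Solr's highlighting with hl.fl listing F1 and F2 may be a more accurate server-side alternative. The check below is a crude case-insensitive substring test, not Solr's analysis chain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class HitFields {
    // Returns the names of the source fields whose value contains any query term.
    static List<String> matchingFields(Map<String, String> sourceFields, List<String> terms) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> field : sourceFields.entrySet()) {
            String value = field.getValue().toLowerCase(Locale.ROOT);
            for (String term : terms) {
                if (value.contains(term.toLowerCase(Locale.ROOT))) {
                    hits.add(field.getKey());
                    break; // one matching term is enough for this field
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new TreeMap<>(Map.of("F1", "cheers mate", "F2", "hello world"));
        System.out.println(matchingFields(doc, List.of("hello"))); // prints [F2]
    }
}
```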
nested solr queries
Is it possible to write nested queries in Solr, similar to an SQL query, where I can take the results of the first query and use one or more of its fields as an argument in the second query? For example:

field1:XYZ AND (_query_: field3:{value of field4})

This should search for all types of XYZ, then iterate over the result set and perform a query where field3 is equal to the value of field1 from each item of the first result set.

This is similar to an SQL query like:

select distinct ( fieldA ) from table where fieldA IN
Re: nested solr queries
Hi Shalin,

I am trying to achieve something like a JOIN. Previously I was doing this with two queries on Solr:

solr index = (field1, field2, field3)
query1 = (for example field1="ABC")

Suppose query1 returns result set1 = { 1, 2, 3, 4 }, which matches query1.

query2 = (get all records having field2="xyz" for each record, i.e. for set1 = { 1, 2, 3, 4 } returned by query1)

I am not sure if I could do something like this using the nested Solr query from this link:
http://www.lucidimagination.com/blog/2009/03/31/nested-queries-in-solr/

Thanks

On Mon, Nov 30, 2009 at 1:50 PM, Shalin Shekhar Mangar <shalinman...@gmail.com> wrote:
> On Mon, Nov 30, 2009 at 1:19 PM, Mark N wrote:
>
> > Is it possible to write nested queries in Solr similar to sql like query
> > where I can take results of the first query and use one or more of its
> > fields as an argument in the second query.
> >
>
> That sounds like a join. If so, the answer would be no.
>
> > For example:
> >
> > field1:XYZ AND (_query_: field3:{value of field4})
> >
> > This should search for all types of XYZ and then iterate over the result
> > set
> > and perform a query for where field3 is equal to the value of field1 from
> > each item of the first result set.
> >
>
> Your description is not consistent with the query you have given. If
> field:XYZ is specified, then what are "types" of XYZ? Also, if you want to
> perform a query where field3 is equal to the value of field1 then, what is
> field4 in the query you have given?
>
> > this is similar to SQL like query
> >
> > select distinct ( fieldA ) from table where fieldA IN
> >
>
> That sounds similar to faceting. See
> http://wiki.apache.org/solr/SimpleFacetParameters
>
> Perhaps you can give more details on what you want to achieve.
>
> --
> Regards,
> Shalin Shekhar Mangar.
Re: nested solr queries
We don't know field2="xyz" until we run query1.

To simplify: I was actually trying to do some kind of JOIN, similar to the following SQL query:

select * from table1 where field2 in
( select field2 from dbo.concept_db where field1='ABC' )

If this is not possible, then I will have to run the inner query (select field2 from dbo.concept_db where field1='ABC') first and only then run the outer query.

Thanks,
chandan

On Mon, Nov 30, 2009 at 2:25 PM, Shalin Shekhar Mangar <shalinman...@gmail.com> wrote:
> On Mon, Nov 30, 2009 at 2:02 PM, Mark N wrote:
>
> > hi shalin
> >
> > I am trying to achieve something like JOIN. Previously am doing this with
> > two queries on solr
> >
> > solr index = ( field1 ,field 2, field3)
> >
> > query1 = ( for example field1="ABC" )
> >
> > suppose query1 returns results set1= { 1, 2 ,3 ,4 } which matches query1
> >
> > query2 = ( get all records having field2="xyz" for each records i.e for
> > set1= {1,2,3,4} returned by query1 )
> >
>
> That sequence of queries will return documents which have field1="ABC" and
> field2="xyz". The same result can be obtained in one query with
> q=+field1:"ABC" +field2:"xyz"
>
> Have I misunderstood the problem?
>
> > Am not sure if I could do something like this using the nested solr query
> > from link
> >
> > http://www.lucidimagination.com/blog/2009/03/31/nested-queries-in-solr/
> >
>
> No, nested queries can only influence scores. They do not filter the
> results.
>
> --
> Regards,
> Shalin Shekhar Mangar.
Re: nested solr queries
Thanks for your help. So do you think I should execute the Solr queries twice, or are there any other workarounds?

On Mon, Nov 30, 2009 at 3:07 PM, Shalin Shekhar Mangar <shalinman...@gmail.com> wrote:
> On Mon, Nov 30, 2009 at 2:26 PM, Mark N wrote:
>
> > field2="xyz" we dont know until we run query1
> >
>
> Ah, ok. I thought xyz was a literal that you wanted to search.
>
> > To simply i was actually trying to do some kind of JOIN similar to
> > following
> > SQL query
> >
> > select * from table1 where field2 in
> > ( select field2 from dbo.concept_db where field1='ABC' )
> >
> > if this is not possible then i will have to search inner query (
> > select field2
> > from dbo.concept_db where field1='ABC' ) first and then only run the
> > outer query
> >
>
> No, there are no joins in Solr. Consider de-normalizing your schema, if you
> haven't.
>
> --
> Regards,
> Shalin Shekhar Mangar.

--
Nipen Mark
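Shalin confirms there are no joins, so the workaround is to run two queries. First collect the field2 values returned by query1, then build the second query from them. The sketch below covers only that second step. It assumes the field2 values have already been read from the first response; HTTP and response parsing are left out.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class TwoStepJoin {
    // Builds field2:("a" OR "b" ...) from the distinct field2 values of query1,
    // mirroring: select * from table1 where field2 in (select field2 ...).
    static String secondQuery(Collection<String> field2Values) {
        Set<String> distinct = new LinkedHashSet<>(field2Values);
        if (distinct.isEmpty()) {
            throw new IllegalArgumentException("query1 returned no field2 values");
        }
        return distinct.stream()
                .map(v -> "\"" + v.replace("\\", "\\\\").replace("\"", "\\\"") + "\"")
                .collect(Collectors.joining(" OR ", "field2:(", ")"));
    }

    public static void main(String[] args) {
        System.out.println(secondQuery(List.of("x1", "x2", "x1"))); // prints field2:("x1" OR "x2")
    }
}
```

With many distinct values, the OR list may exceed Solr's maxBooleanClauses limit (1024 by default). That is one more argument for Shalin's de-normalization advice.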
Enumerating wildcard terms
Is it possible to enumerate all terms that match a specified wildcard term, similar to the Lucene WildcardTermEnum API? For example, if I search abc*, I should be able to access all the terms abc1, abc2, abc3, ... that exist in the index.

What would be a better approach to meet this functionality?

--
Nipen Mark
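For a trailing-wildcard pattern like abc*, enumeration is a range scan over the sorted term dictionary. It starts at "abc" and stops at the first term that no longer has that prefix. The sketch below illustrates the mechanism in memory with a TreeSet. Within Solr, the TermsComponent's terms.prefix parameter is the usual way to ask the index for such terms, provided the component is enabled in solrconfig.xml.

```java
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

class PrefixTerms {
    // Returns every term starting with prefix, in sorted order, by scanning
    // forward from the prefix until a term no longer matches.
    static NavigableSet<String> withPrefix(NavigableSet<String> terms, String prefix) {
        NavigableSet<String> result = new TreeSet<>();
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break; // sorted order: no later term can match either
            }
            result.add(term);
        }
        return result;
    }

    public static void main(String[] args) {
        NavigableSet<String> terms = new TreeSet<>(List.of("abb", "abc1", "abc2", "abc3", "abd"));
        System.out.println(withPrefix(terms, "abc")); // prints [abc1, abc2, abc3]
    }
}
```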
Indexing large text documents
SolrInputDocument doc1 = new SolrInputDocument();
doc1.addField("Fulltext", strContent);

strContent is a String variable that contains the contents of a text file (assume the text file is located at c:\files\abc.txt).

In my case abc.txt (the text files) could be very large, around 2 GB, so it is not always possible to read them into String variables before indexing.

Can anyone suggest a better approach to index these huge text files?

--
Nipen Mark
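One hypothetical workaround, not from the thread, is to stream the file and split it into fixed-size chunks. Each chunk is then indexed as its own document that carries the file name, so no single String ever holds the whole file. The sketch below is JDK-only and covers just the chunking step. Turning each chunk into a SolrInputDocument is left out, and the chunk size is an arbitrary assumption.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;

class ChunkedText {
    // Reads from the reader in pieces of at most chunkSize chars and hands each
    // piece to the sink, so memory use stays bounded by chunkSize.
    static int forEachChunk(Reader reader, int chunkSize, Consumer<String> sink) throws IOException {
        char[] buffer = new char[chunkSize];
        int chunks = 0;
        int filled = 0;
        int read;
        while ((read = reader.read(buffer, filled, chunkSize - filled)) != -1) {
            filled += read;
            if (filled == chunkSize) { // buffer full: emit and start over
                sink.accept(new String(buffer, 0, filled));
                chunks++;
                filled = 0;
            }
        }
        if (filled > 0) { // trailing partial chunk
            sink.accept(new String(buffer, 0, filled));
            chunks++;
        }
        return chunks;
    }

    public static void main(String[] args) throws IOException {
        forEachChunk(new StringReader("abcdefghij"), 4, System.out::println); // abcd, efgh, ij
    }
}
```

For a real file, pass Files.newBufferedReader(Path.of("c:\\files\\abc.txt")) inside try-with-resources. Chunks can split words, so a phrase that spans a chunk boundary will not match. Overlapping chunks are one way around that.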
solr updateCSV
I am trying to use Solr's CSV updater to index the data. I am trying to specify the .dat format, which consists of a field separator, a text qualifier and a line separator, for example:

field 1<field separator>field 2
value for field 1 value for field 2

Can we specify the text qualifier and line separator as well? I have tested that we can specify a separator, and it works well.

--
Nipen Mark
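Solr's CSV handler accepts an encapsulator parameter, which plays the role of a text qualifier, in addition to separator. Whether a custom line separator can be configured is not covered here. Assuming it cannot, one workaround is to pre-convert the .dat records into standard CSV before posting them. The delimiters in the example ("##", "|", "~") are hypothetical, because the original ones did not survive in the message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class DatToCsv {
    // Splits raw .dat content on a custom record separator and field separator,
    // strips the text qualifier around each value, and re-emits RFC 4180-style
    // CSV (every value quoted, embedded quotes doubled, records joined by "\n").
    // Naive: separators inside qualified values are not handled.
    static String convert(String dat, String recordSep, String fieldSep, String qualifier) {
        List<String> lines = new ArrayList<>();
        for (String record : dat.split(Pattern.quote(recordSep), -1)) {
            if (record.isEmpty()) {
                continue; // e.g. a trailing record separator
            }
            List<String> cells = new ArrayList<>();
            for (String raw : record.split(Pattern.quote(fieldSep), -1)) {
                String value = raw;
                if (!qualifier.isEmpty() && value.length() >= 2 * qualifier.length()
                        && value.startsWith(qualifier) && value.endsWith(qualifier)) {
                    value = value.substring(qualifier.length(), value.length() - qualifier.length());
                }
                cells.add("\"" + value.replace("\"", "\"\"") + "\"");
            }
            lines.add(String.join(",", cells));
        }
        return String.join("\n", lines);
    }

    public static void main(String[] args) {
        System.out.println(convert("~a~|~b~##~c~|~d~", "##", "|", "~"));
    }
}
```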
Getting max/min dates from solr index
How can we get the max and min dates from the Solr index? I need these dates to draw a graph (for example, a timeline graph).

Also, can we use date faceting to show how many documents are indexed every month? Consider that I need to draw a timeline graph for the current year showing how many records are indexed each month, so I will have months on the X axis and the number of documents on the Y axis.

What would be a better approach to designing a schema to achieve this functionality? Any suggestions would be appreciated.

Thanks

--
Nipen Mark
Re: Getting max/min dates from solr index
Thanks. Is it possible to do date faceting across multiple Solr shards? I am using indexes created in two different shards to do date faceting on the field "DATE":

http://localhost:8983/solr/1_13_1_3/select?&shards=localhost:8983/solr/index1/,localhost_two:8983/solr/index/&start=0&rows=20&q=*&facet=true&facet.date=DATE&facet.date.start=2004-01-01T00:00:00Z&facet.date.end=2011-01-01T00:00:00Z&facet.date.gap=%2B1YEAR

On Fri, Feb 12, 2010 at 3:39 AM, Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote:
> Mark,
>
> Yes, facets will give you that information. Min/max StatsComponent?
> See http://www.search-lucene.com/?q=StatsComponent
>
> Otis
>
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Hadoop ecosystem search :: http://search-hadoop.com/
>
>
> ----- Original Message ----
> > From: Mark N
> > To: solr-user@lucene.apache.org
> > Sent: Wed, February 10, 2010 8:12:43 AM
> > Subject: Getting max/min dates from solr index
> >
> > How can we get the max and min date from the Solr index ? I would need these
> > dates to draw a graph ( for example timeline graph )
> >
> > Also can we use date faceting to show how many documents are indexed every
> > month .
> > Consider I need to draw a timeline graph for current year to show how many
> > records are indexed for every month .So i will have months in X axis and no
> > of document in Y axis.
> >
> > What should be the better approach to design a schema to achieve this
> > functionality ?
> >
> > Any suggestions would be appreciated
> >
> > thanks
> >
> > --
> > Nipen Mark

--
Nipen Mark
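The thread's pieces combine into two requests for the timeline. The first is a monthly date facet, using the same facet.date parameters as the URL above with a gap of +1MONTH. The second is a one-row query sorted on DATE, offered as an alternative to StatsComponent for the min date; the max date is the same request with "desc". The base URL is an assumption, and sorting requires DATE to be an indexed, single-valued field.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class DateTimelineUrls {
    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Document counts per month on DATE between the given start and end dates.
    static String monthlyFacet(String base, String start, String end) {
        return base + "/select?q=" + enc("*:*") + "&rows=0&facet=true"
                + "&facet.date=DATE"
                + "&facet.date.start=" + enc(start)
                + "&facet.date.end=" + enc(end)
                + "&facet.date.gap=" + enc("+1MONTH");
    }

    // Earliest ("asc") or latest ("desc") DATE value: a single row sorted on DATE.
    static String extremeDate(String base, String direction) {
        return base + "/select?q=" + enc("*:*") + "&rows=1&fl=DATE&sort=" + enc("DATE " + direction);
    }

    public static void main(String[] args) {
        String base = "http://localhost:8983/solr";
        System.out.println(monthlyFacet(base, "2010-01-01T00:00:00Z", "2011-01-01T00:00:00Z"));
        System.out.println(extremeDate(base, "asc"));
    }
}
```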
indexing a huge data
What would be the fastest way to index documents? I am indexing a huge collection of data after extracting certain metadata, for example the author and file name of each file. I extract this information and store it in XML format, for example:

1abc abc.doc 2abc abc1.doc

I cannot index these documents directly into Solr, because they are not in the format Solr requires (and I cannot change the format, as it is used in other modules).

Would converting these files to CSV be a better and faster approach than XML? Please suggest.

--
Nipen Mark
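If the CSV route is taken, the conversion step is small. The sketch below writes the extracted metadata as CSV with a header row that Solr's CSV handler can map to field names. The field names id, author and filename are assumptions based on the example, and parsing the existing XML is left out.

```java
import java.util.List;

class MetadataCsv {
    // Quotes a value and doubles any embedded quotes (RFC 4180 style).
    static String cell(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    // Each row is {id, author, filename}; the header names are assumptions.
    static String toCsv(List<String[]> rows) {
        StringBuilder sb = new StringBuilder("id,author,filename\n");
        for (String[] row : rows) {
            if (row.length != 3) {
                throw new IllegalArgumentException("expected 3 columns, got " + row.length);
            }
            sb.append(cell(row[0])).append(',')
              .append(cell(row[1])).append(',')
              .append(cell(row[2])).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(toCsv(List.of(
                new String[] {"1", "abc", "abc.doc"},
                new String[] {"2", "abc", "abc1.doc"})));
    }
}
```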
Solr DataImportHandler
Is it possible to use the Solr DataImportHandler when the database fields are not fixed? As far as I can tell, we need to configure which table (entity) the data is read from and which database fields map to which fields in the Solr schema.

Since in my case the database fields could be dynamic, can DIH be helpful? Please suggest.

--
Nipen Mark
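This does not settle whether DIH itself can map unknown columns. A hedged alternative is to pair Solr dynamic fields (the example schema defines patterns such as *_s, *_i, *_l and *_dt) with a small client-side mapper. The mapper derives a field name from each column's name and JDBC type code. The suffix choices below are assumptions and must match the dynamicField patterns your schema actually declares.

```java
import java.sql.Types;
import java.util.Locale;

class DynamicFieldNames {
    // Maps a database column to a Solr field name ending in a dynamic-field
    // suffix chosen from its java.sql.Types code; unknown types fall back to _s.
    static String fieldFor(String column, int sqlType) {
        String base = column.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]+", "_");
        switch (sqlType) {
            case Types.INTEGER:
            case Types.SMALLINT:
            case Types.TINYINT:
                return base + "_i";
            case Types.BIGINT:
                return base + "_l";
            case Types.DATE:
            case Types.TIMESTAMP:
                return base + "_dt";
            default:
                return base + "_s";
        }
    }

    public static void main(String[] args) {
        System.out.println(fieldFor("Author Name", Types.VARCHAR)); // prints author_name_s
        System.out.println(fieldFor("CreatedOn", Types.TIMESTAMP)); // prints createdon_dt
    }
}
```

At runtime, the column names and type codes would come from ResultSetMetaData.getColumnName(i) and getColumnType(i), so new columns map to fields without schema or config changes.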