How to Index multiple tables using SOLR
Hi, even though I am new to SOLR, I was able to successfully index a single table in a very short span of time. Now we have a requirement where the search needs to happen on multiple tables (multiple table indexes) at the same time. I couldn't figure out a way to index more than one table in SOLR and search on that indexed data. I tried using the data-config format below, but it indexes only one of the two tables, not both. (My DB-config.xml and schema.xml snippets, which use an ElementPropertyID field, were stripped when posting.) Can anyone help me out with a solution, or pointers to a solution?

Thanks,
Barani
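A sketch of one way to index two tables with the DataImportHandler: declare two root entities under the same <document>, so a full-import walks both tables and each row becomes its own Solr document. Everything below is an assumption for illustration (TABLE_A, TABLE_B, the column names, and the id prefixes are hypothetical, not the poster's actual schema):

  <dataConfig>
    <dataSource driver="..." url="..." user="..." password="..."/>
    <document>
      <!-- First table: each row becomes one document. -->
      <entity name="tableA" query="select ID, NAME from TABLE_A"
              transformer="TemplateTransformer">
        <!-- Prefix the key so documents from the two tables never collide on uniqueKey. -->
        <field column="id" template="A-${tableA.ID}"/>
        <field column="NAME" name="name"/>
      </entity>
      <!-- Second table: indexed in the same full-import run. -->
      <entity name="tableB" query="select ID, TITLE from TABLE_B"
              transformer="TemplateTransformer">
        <field column="id" template="B-${tableB.ID}"/>
        <field column="TITLE" name="name"/>
      </entity>
    </document>
  </dataConfig>

If only one table ever shows up in the index, a common culprit is both entities producing the same uniqueKey values, so documents from the second entity silently overwrite the first; the template prefixes above guard against that.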
How to Create dynamic field names using script transformers
Hi, I am trying to generate a dynamic field name using custom transformers but couldn't achieve the expected results. My requirement is that I do not want to hardcode some of the field names used by SOLR for indexing; instead, the field name should be generated from the data retrieved from a table. Any help in this regard is greatly appreciated.

Thanks,
Barani
Re: How to Create dynamic field names using script transformers
Hey Erik, thanks a lot for your reply.. I am a newbie to SOLR. I am just trying to use the example present in the Apache wiki to understand how the ScriptTransformer works. I want to know how to pass the data from table.field to the transformer, get the data back from the transformer, and set the value on any field.

<![CDATA[
function f1(row) {
  row.put('Hello', 'Test');
  return row;
}
]]>

Basically I want a field built like that (the sample field markup was stripped when posting), and to index this field so that users can search on this dynamic field and get the corresponding data back.

Thanks,
Barani

Erik Hatcher-4 wrote:
> Barani -
>
> Give us some details of what you tried, what you expected to happen,
> and what actually happened.
>
>     Erik
>
> [...]
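For context, a sketch of how a ScriptTransformer is wired into a DIH config; the function name f1 comes from the wiki example, while the table and column names are placeholders. The row argument is a map of the current row's columns, so row.get(...) reads a column and row.put(...) creates or overwrites a field:

  <dataConfig>
    <script><![CDATA[
      function f1(row) {
        // Read a column fetched by the entity query and write it back
        // under a different field name.
        row.put('Hello', row.get('SOME_COLUMN'));
        return row;
      }
    ]]></script>
    <document>
      <!-- transformer="script:f1" runs the function once per row. -->
      <entity name="e" query="select SOME_COLUMN from SOME_TABLE"
              transformer="script:f1"/>
    </document>
  </dataConfig>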
Re: How to Create dynamic field names using script transformers
To add some more details, this is what I am trying to achieve... There are 2 fields present in a database table and I am trying to make those 2 fields a key-value pair. E.g.: consider 2 fields associated with each other (PropertyId and PropertyValue). I want the property id as the field name and the property value as its field value, something like <111>Test</111>.

Thanks,
Barani

JavaGuy84 wrote:
> [...]
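A sketch of that key-value idea with a ScriptTransformer, under the assumption (not confirmed in the thread) that the columns are named PROPERTYID and PROPERTYVALUE. The generated names are given a prefix so a dynamicField rule can match them, since a raw numeric field name like 111 is awkward to declare in schema.xml:

  <script><![CDATA[
    function keyValue(row) {
      var id  = row.get('PROPERTYID');
      var val = row.get('PROPERTYVALUE');
      if (id != null) {
        // The field NAME comes from the data: prop_111 = "Test".
        row.put('prop_' + id, val);
        // Drop the raw columns so they are not indexed under their own names.
        row.remove('PROPERTYID');
        row.remove('PROPERTYVALUE');
      }
      return row;
    }
  ]]></script>

with a matching catch-all rule in schema.xml:

  <dynamicField name="prop_*" type="string" indexed="true" stored="true"/>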
How to index the fields as key value pair if a query returns multiple rows
Hi all, I have a scenario where a particular query returns multiple rows and I need to map those results as key-value pairs. Ex: (the example was stripped when posting).
Performance issue in indexing the data with DIH when using subqueries
Hi, I am facing a performance issue when I am trying to index the data using DIH. I have a model as below.

Tables: Object, ObjectProperty, ObjectRelationship

Object --> ObjectProperty: one-to-many relationship
Object --> ObjectRelationship: one-to-many relationship

We need to get the object and its related properties / relationships in a single document. So as of now I have an outer query in the DIH config which loops over the objectid, and for each objectid my inner query retrieves the data from the objectrelationship / objectproperty table.

The performance seems to be very bad (it took 4+ minutes to index 4000 rows / 590 documents) and I am trying to figure out a way to improve it. It would be great if someone could give me a suggestion on how to overcome or work around this problem, along with any general tips that can improve indexing performance.

Thanks,
Barani
CachedSqlEntityProcessor - Need help using multiple lookups
Hi, I am trying to use the CachedSqlEntityProcessor for one of my requirements, but I couldn't make the CachedSqlEntityProcessor accept more than one filter in the 'where' condition. I have a single-key configuration working (the snippet was stripped when posting). My question is: how do I make the CachedSqlEntityProcessor accept multiple filter conditions? Any pointers or tips would be of great help. Thanks a lot in advance.

Thanks,
Barani
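For reference, the single-key form the processor supports directly, followed by one commonly suggested workaround for multiple keys: build a composite key in SQL on both sides of the join, so the cache still has a single lookup column. All table and column names here are hypothetical, and the string-concatenation operator (|| below) depends on the database:

  <!-- Single lookup key: supported out of the box. -->
  <entity name="child" processor="CachedSqlEntityProcessor"
          query="select * from CHILD"
          cacheKey="PARENT_ID" cacheLookup="parent.ID"/>

  <!-- Two filter columns folded into one composite key. -->
  <entity name="parent"
          query="select ID, TYPE_ID, (ID || '_' || TYPE_ID) as CKEY from PARENT">
    <entity name="child" processor="CachedSqlEntityProcessor"
            query="select (PARENT_ID || '_' || TYPE_ID) as CKEY, NAME, VALUE from CHILD"
            cacheKey="CKEY" cacheLookup="parent.CKEY"/>
  </entity>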
Delta Query - DIH
Hi, my data config looks like below (the snippet was stripped when posting). I am able to successfully run the Full-Import query without any issue. I am not sure how I can implement a delta query, as each of the tables gets updated independently and I need the updates of each particular table to be reflected independently in the Solr document.

Thanks,
Barani
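A sketch of the usual DIH delta wiring, assuming each table carries a LASTUPDATEDDATE column (the table and column names follow the earlier threads but are otherwise assumptions). deltaQuery finds the primary keys changed since the last import, deltaImportQuery re-fetches each changed row, and parentDeltaQuery on a child entity maps a child-table change back to the parent documents that must be re-indexed, which is how independently updated tables each trigger their own updates:

  <entity name="object" pk="ID"
          query="select * from OBJECT"
          deltaQuery="select ID from OBJECT
                      where LASTUPDATEDDATE &gt; '${dataimporter.last_index_time}'"
          deltaImportQuery="select * from OBJECT where ID='${dataimporter.delta.ID}'">
    <entity name="property" pk="OBJECT_ID"
            query="select NAME, VALUE from OBJECTPROPERTY where OBJECT_ID='${object.ID}'"
            deltaQuery="select OBJECT_ID from OBJECTPROPERTY
                        where LASTUPDATEDDATE &gt; '${dataimporter.last_index_time}'"
            parentDeltaQuery="select ID from OBJECT where ID='${property.OBJECT_ID}'"/>
  </entity>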
Re: Performance issue in indexing the data with DIH when using subqueries
Thanks a lot Shalin.. this resolved my issue :).

Thanks,
Barani

Shalin Shekhar Mangar wrote:
> On Tue, Feb 23, 2010 at 1:01 AM, JavaGuy84 wrote:
>> [...]
>
> Have you tried using CachedSqlEntityProcessor?
>
> See http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor
>
> --
> Regards,
> Shalin Shekhar Mangar.
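For readers landing on this thread: the fix is to replace the per-objectid inner queries with cached child entities, so each child table is read once and joined from an in-memory cache instead of issuing one SQL query per outer row. A sketch using the table names from the question (the column lists are assumptions):

  <entity name="object" query="select OBJECTID, OBJECTTYPE, OBJECTNAME from OBJECT">
    <!-- Each cached entity runs its query ONCE; later parent rows hit the cache. -->
    <entity name="property" processor="CachedSqlEntityProcessor"
            query="select OBJECTID, NAME, VALUE from OBJECTPROPERTY"
            cacheKey="OBJECTID" cacheLookup="object.OBJECTID"/>
    <entity name="relationship" processor="CachedSqlEntityProcessor"
            query="select OBJECTID, RELATEDID, RELTYPE from OBJECTRELATIONSHIP"
            cacheKey="OBJECTID" cacheLookup="object.OBJECTID"/>
  </entity>

The trade-off, which bites later in this archive, is that the whole child result set must fit in the JVM heap.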
Multi core Search is not working when used with SHARDS
Hi all, I am trying to search on multiple cores (distributed search) using shards, but I am not able to succeed. I am able to get results when I hit each core separately:

http://localhost:8981/solr/core1/select/?q=test
http://localhost:8981/solr/core0/select/?q=test

but when I try a distributed search using shards, as below:

http://localhost:8981/solr/core0/select?shards=localhost:8981/solr/core0,localhost:8981/solr/core1&indent=true&q=test

I am getting the error below:

HTTP ERROR: 500
null
java.lang.NullPointerException
    at org.apache.solr.handler.component.QueryComponent.createMainQuery(QueryComponent.java:372)
    at org.apache.solr.handler.component.QueryComponent.distributedProcess(QueryComponent.java:292)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:234)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:341)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:244)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
    at org.mortbay.jetty.Server.handle(Server.java:285)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
    at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

RequestURI=/solr/core0/select

Do I need to make any changes to make shards work?

Thanks,
Barani
Confused with Shards multicore search results
Hi, I finally got shards working with multicore, but now I am facing a different issue. I have 2 separate schema / data config files, one for each core. I also have a different unique id in each schema.xml file. I indexed both cores and was able to successfully search each core independently, but when I used shards, I didn't get what I expected. For example:

http://localhost:8990/solr/core0/select?q=1565 returned 1 row
http://localhost:8990/solr/core1/select?q=1565 returned 1 row

When I tried this:

http://localhost:8990/solr/core0/select/?q=1565&shards=localhost:8990/solr/core0,localhost:8990/solr/core1

it again returned just one row, but I would think it should return 2 rows if I have a different unique id for each document. Is there any configuration I need to do in order to make the search work across multiple indexes? Any primary / slave configuration? Any help would be of great help to me. Thanks a lot in advance.

Thanks,
Barani
Re: Confused with Shards multicore search results
Thanks a lot for your reply, I will surely try this.. I have a requirement to index 2 different schemas but need to do a search on both using a single URL. Is there a way I can have 2 different schemas / data config files and search on both indexes using a single URL (like using shards)?

Thanks,
Barani

Lance Norskog-2 wrote:
> "different unique id for each schema.xml file."
>
> All cores should have the same schema file with the same unique id
> field and type.
>
> Did you mean that the documents in both cores have a different value
> for the unique id field?
>
> [...]
>
> --
> Lance Norskog
> goks...@gmail.com
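For anyone following along: distributed search assumes every shard shares the same uniqueKey declaration and that key values never collide across shards, otherwise documents carrying the same key are collapsed into a single result. A sketch of the shared schema.xml fragment (the field name is illustrative):

  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <uniqueKey>id</uniqueKey>

  <!-- core0 indexes ids like "core0-1565" and core1 indexes "core1-1565",
       so the same source key appearing in both cores is not deduplicated. -->

That collapsing is one likely reason the shards query above returned one row instead of two.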
SOLR takes more than 9 hours to index 300000 rows
Hi, I am facing a performance issue in SOLR when indexing a large volume of data. Please find below the stats from the DIH status page:

Time elapsed: 8:57:17.334
Requests made to datasource: 42778
Rows fetched: 273725
Documents processed: 42775
Documents skipped: 0

Indexing 273725 rows is taking almost 9 hours. My data config file is below (the snippet was stripped when posting; a copy survives in the quoted reply later in this thread). Time taken to directly run the same SQL statements against the database:

select objectuid as uid, objectid, objecttype, objectname, repositoryname, a.lastupdateddate from MetaModel.POC.Object a, MetaModel.POC.Repository b where a.repositoryid = b.repositoryid --> 3 minutes

select ObjectUID, ObjectPropertyName as name, ObjectPropertyValue as value from MetaModel.POC.ObjectProperty --> 5 minutes

select OBJECT1uid, Object2name as rname, Object2type as rtype, relationshiptype as rship, b.RepositoryName as rrepname from MetaModel.POC.BinaryRelationShip a, MetaModel.POC.Repository b where a.Object2RepositoryId = b.repositoryId --> 3 seconds

As I am using CachedSqlEntityProcessor, I assume that SOLR first issues these select statements and then matches rows from the cache based on cacheKey, so SOLR should ideally take roughly the sum of the three query times, plus some time for the cacheKey lookups. But in my case it is taking hours and hours to index. Can someone please let me know if I am doing anything wrong that might cause this issue?

Thanks,
Barani
Re: SOLR takes more than 9 hours to index 300000 rows
Shawn, thanks a lot for your response. Yes, the DB connection is still active.. it is still fetching the data from the DB. I am using Red Hat MetaMatrix DB as the backend, and I am trying to find out the parameter for setting the JDBC fetch size. Do you think this problem is mostly due to the fetch size?

Thanks,
Barani

Shawn Heisey-4 wrote:
> At the 9+ hour mark, is your database server showing active connections
> that are sending data, or is all the activity local to SOLR?
>
> We have a 40 million row database in MySQL, with each row comprising
> more than 80 fields. I'm including the config from one of our shards.
> There are about 6.6 million rows in this shard, and it indexes to a 16GB
> index (9.6GB of which is the .fdt file) in 2-3 hours depending on how
> loaded the database server is at the time. I once indexed all 40
> million rows into a shard and that only took 11 hours to build a 91GB
> index.
>
> The batchSize parameter is necessary to have the jdbc driver stream the
> results instead of trying to cache them all before sending them to the
> application. The server doesn't have enough memory for that.
>
> <dataSource driver="com.mysql.jdbc.Driver"
>             encoding="UTF-8"
>             url="jdbc:mysql://[SERVER]:3306/[SCHEMA]?zeroDateTimeBehavior=convertToNull"
>             batchSize="-1"
>             user="[REMOVED]"
>             password="[REMOVED]"/>
>
> <entity ... query="select * from [TABLE] where (did mod 6) = 0">
>
> On 3/6/2010 9:36 AM, JavaGuy84 wrote:
>> [...]
>>
>> <entity ... query="select objectuid as uid, objectid, objecttype, objectname,
>>     repositoryname, a.lastupdateddate from MetaModel.POC.Object a,
>>     MetaModel.POC.Repository b where a.repositoryid = b.repositoryid"
>>     transformer="RegexTransformer,DateFormatTransformer,TemplateTransformer">
>>
>>   <entity ... processor="CachedSqlEntityProcessor"
>>       cacheKey="ObjectUID" cacheLookup="object.uid"
>>       transformer="RegexTransformer,DateFormatTransformer,TemplateTransformer">
>>
>>   <entity ... processor="CachedSqlEntityProcessor"
>>       cacheKey="OBJECT1uid" cacheLookup="object.uid"
>>       transformer="RegexTransformer,DateFormatTransformer,TemplateTransformer">
>>
>> [...]
Re: SOLR takes more than 9 hours to index 300000 rows
Shawn, please find below the result-set size of each query:

select objectuid as uid, objectid, objecttype, objectname, repositoryname, a.lastupdateddate from MetaModel.POC.Object a, MetaModel.POC.Repository b where a.repositoryid = b.repositoryid --> 30 rows --> Query 1

select ObjectUID, ObjectPropertyName as name, ObjectPropertyValue as value from MetaModel.POC.ObjectProperty --> 60 rows --> Query 2

select OBJECT1uid, Object2name as rname, Object2type as rtype, relationshiptype as rship, b.RepositoryName as rrepname from MetaModel.POC.BinaryRelationShip a, MetaModel.POC.Repository b where a.Object2RepositoryId = b.repositoryId --> 600 rows --> Query 3

I want my second and third queries to run for every row returned by my first query without hitting the DB on every loop, hence I started using CachedSqlEntityProcessor.

Please find below the RAM stats of my server running SOLR / MetaMatrix DB:

SOLR: 100,176 K
MetaMatrix DB: 1,106,048 K (almost 1080 MB)

Thanks,
Barani

Shawn Heisey-4 wrote:
> Do keep looking into the batchSize, but I think I might have found the
> issue. If I understand things correctly, you will need to add
> processor="CachedSqlEntityProcessor" to your first entity. It's only
> specified on the other two. Assuming you have enough RAM and heap space
> available in your JVM to load the results of all three queries, that
> ought to make it work very quickly.
>
> If I'm right, basically what it's doing is issuing a real SQL query
> against your first table for every entry it has read for the other two
> tables.
>
> Shawn
>
> [...]
Is it possible to use ODBC with DIH?
Hi, I have an ODBC driver for the MetaMatrix DB (Red Hat). I am trying to figure out a way to use DIH against the DSN that has been created on my machine with that ODBC driver. Is it possible to specify a DSN in DIH and index the DB? If it is possible, can you please let me know the ODBC URL that I need to enter for the dataSource in the DIH data-config.xml?

Thanks,
Barani
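No reply survives in the archive; one possibility, offered here as an untested sketch, is the JDBC-ODBC bridge driver that shipped with the JDK at the time (it was removed in Java 8), which lets a JDBC client such as DIH talk to a local DSN. MyDsnName stands for whatever the DSN is called in the ODBC administrator, and whether the bridge behaves well with MetaMatrix is an open question:

  <dataSource driver="sun.jdbc.odbc.JdbcOdbcDriver"
              url="jdbc:odbc:MyDsnName"
              user="..." password="..."/>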
Re: SOLR takes more than 9 hours to index 300000 rows
Shawn, increasing the fetch size, and increasing my heap to match, did the trick.. Thanks a lot for your help; your suggestions helped me a lot. I hope these suggestions will also be helpful to others facing a similar kind of issue.

Thanks,
Barani

Shawn Heisey-4 wrote:
> Do keep looking into the batchSize, but I think I might have found the
> issue. [...]
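For the record, "increasing the fetch size" in DIH terms means setting batchSize on the dataSource, which JdbcDataSource passes to the JDBC driver as the statement fetch size. A sketch for this thread's setup; the MetaMatrix driver class and URL form shown here are assumptions, so check the MetaMatrix JDBC documentation for the exact values:

  <dataSource driver="com.metamatrix.jdbc.MMDriver"
              url="jdbc:metamatrix:VDB@mm://dbhost:31000"
              batchSize="500"
              user="..." password="..."/>

(For MySQL, as in Shawn's config quoted above, the special value batchSize="-1" is what turns on row-by-row streaming.)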
Search on dynamic fields which contains spaces /special characters
Hi, we have some dynamic fields getting indexed using SOLR. Some of the dynamic field names contain spaces / special characters (something like: short name, Full Name, etc.). Is there a way to search on these fields (whose names contain spaces)? Can someone let me know the filter I need to pass to do this type of search? I tried short name:name1 --> this didn't work.

Thanks,
Barani
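No answer survives in the archive. One defensive approach, sketched here as an assumption rather than established practice, is to normalize the generated names at index time so they never contain spaces or punctuation, reusing the ScriptTransformer pattern from the dynamic-field thread above (the column names are hypothetical):

  <script><![CDATA[
    function cleanNames(row) {
      var name = row.get('PROPERTYNAME');
      if (name != null) {
        // "short name" becomes "short_name", which is safe to query directly.
        row.put(name.replaceAll('\\W+', '_'), row.get('PROPERTYVALUE'));
        row.remove('PROPERTYNAME');
        row.remove('PROPERTYVALUE');
      }
      return row;
    }
  ]]></script>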
How to edit / compile the SOLR source code
Hi, sorry for asking a very simple question, but I am very new to SOLR and I want to play with its source code. As an initial step I have a requirement to enable leading-wildcard search (*text) in SOLR. I am trying to figure out a way to import the complete SOLR build into Eclipse and edit the QueryParsing.java file, but I am not able to import it (I tried to import it as an Ant project in Eclipse, selected the build.xml file, and got an error stating that javac is not present in the build.xml file). Can someone help me out with the initial steps on how to import / edit / compile / test the SOLR source? Thanks a lot for your help!!!

Thanks,
B
Re: How to edit / compile the SOLR source code
Erick, thanks a lot for your reply. I was able to successfully hack the query parser and enable the leading-wildcard search. As of today I hacked the code for this reason only; I am not sure how to make leading-wildcard search work without hacking the code, and this type of search is the preferred type of search in our organization. I had previously searched all over the web to find out why that feature is disabled by default but couldn't find any solid answer stating the reason. In one posting on Nabble it was mentioned that enabling leading-wildcard search might take a performance hit; can you please let me know your comments on that? I am also very much interested in contributing some new stuff to the SOLR group, so I consider this a starting point.

Thanks,
Barani

Erick Erickson wrote:
> See Trey's comment, but before you go there.
>
> What about SOLR's wildcard searching capabilities aren't
> working for you now? There are a couple of tricks for making
> leading wildcard searches work quickly, but this is a solved
> problem. Although whether the existing solutions work in
> your situation may be an open question...
>
> Or do you have to hack into the parser for other reasons?
>
> Best
> Erick
>
> [...]
Re: How to edit / compile the SOLR source code
Erick, that was a wonderful explanation; I hope many folks in this forum will benefit from it. Actually, I Googled and found the solution when you mentioned earlier that I could do a leading wildcard without hacking the code. I found the patch that is already available to resolve this issue (using ReversedWildcardFilterFactory) and I have started to implement that idea. Thanks a lot for your valuable time.. SOLR rocks!

Thanks,
Barani

Erick Erickson wrote:
> Leaving aside some historical reasons, the root of
> the issue is that any search has to identify all the
> terms in a field that satisfy it. Let's take a normal
> non-leading wildcard case first.
>
> Finding all the terms like 'some*' will have to
> deal with many fewer terms than 's*'. Just dealing with
> that many terms will decrease performance, regardless
> of the underlying mechanisms used. Imagine you're
> searching down an ordered list of all the terms for
> a field, assembling a list, and then comparing that list
> with all the terms in that field.
>
> So pure wildcard searches, i.e. just *, would have to
> handle all the terms in the index for the field.
>
> The situation with leading wildcards is worse than
> trailing, since all the terms in the index have to be
> examined. Even doing something as bad as a* will examine
> only terms starting in 'a'. But looking for *a has to examine
> each and every term in the index, because 'australia' and
> 'zebra' both qualify; there aren't any good shortcuts if you
> think of having an ordered list of terms in a field.
>
> So performance can degrade pretty dramatically when
> you allow this kind of thing, and the original writers
> (my opinion here, I wasn't one of them) decided it was
> much better to disallow it by default and require users
> to dig around for the why, rather than have them
> crash and burn a lot on something that seems innocent
> if you aren't familiar with the issues involved.
>
> A better approach, and this isn't very obvious, is to
> index your terms reversed and do leading-wildcard
> searches on the *reversed* field as trailing wildcards.
> E.g. 'some' gets indexed as 'emos', and the wildcard
> search '*me' gets searched in the reversed field as 'em*'.
>
> There may still be performance issues if you allow
> single-letter wildcards, e.g. s* or *s, although a lot of
> work has been done in this area in the last few years.
> You'll have to measure in your situation. And beware
> that a really common problem when deciding how many
> real letters to allow is that it all works fine in your test
> data, but when you load your real corpus and suddenly
> SOLR/Lucene has to deal with 100,000 terms that
> might match rather than the 1,000 in your test set, response
> time changes for the worse.
>
> So I'd look around for the reversed idea (see SOLR-1321
> in JIRA); at least one of the schema examples has it.
>
> One hurdle for me was asking the question "does it
> really help the user to allow one or two leading
> characters in a wildcard search?". Surprisingly often,
> that's of no use to real users because so many
> terms match that it's overwhelming. YMMV, but it's
> a good question to ask if you find yourself in a
> quagmire because you allow a* types of queries.
>
> There are other strategies too, but that seems easiest.
>
> Now, all that said, SOLR has done significant work
> to make wildcards work well; these are just general
> things to look out for when thinking about wildcards...
>
> I really think hacking the parser will come back to bite
> you, both as a maintenance and a performance issue.
> I wouldn't go there without a pretty exhaustive look at
> other options.
>
> HTH
> Erick
>
> [...]
How does ReversedWildcardFilterFactory work?
Hi, I am just curious to know how this class works and how it should be implemented the right way. I got the details from JIRA, as below:

"This patch is an implementation of the 'reversed tokens' strategy for efficient leading-wildcard queries. ReversedWildcardsTokenFilter reverses tokens and returns both the original token (optional) and the reversed token (with positionIncrement == 0). Reversed tokens are prepended with a marker character to avoid collisions between legitimate tokens and the reversed tokens - e.g. 'DNA' would become 'and', thus colliding with the regular term 'and', but with the marker character it becomes '\u0001and'."

I just want to know: can I simply include a new fieldType that uses the ReversedWildcardFilterFactory class and start the indexing? Will that take care of reversing the strings for all the fields present in the schema? This is what I have done as of now: I added a field with fieldtype text_rev, which uses the ReversedWildcardFilterFactory class.. Is that it? I don't have to reverse each and every field that is getting indexed, right?

Thanks,
Barani
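To the question "is that it?": the filter applies per field type, not globally, so only fields whose fieldType's index-time analyzer contains the filter get reversed tokens; every field that needs leading wildcards must use (or be copied into) such a type. A sketch close to the Solr 1.4 example schema; the field and copyField names are illustrative:

  <fieldType name="text_rev" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- Indexes each token twice: once as-is, once reversed with a marker char. -->
      <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
              maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
    </analyzer>
    <analyzer type="query">
      <!-- No reversing at query time: the query parser detects that the field
           was indexed with reversed tokens and rewrites leading wildcards itself. -->
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

  <field name="name_rev" type="text_rev" indexed="true" stored="false"/>
  <copyField source="name" dest="name_rev"/>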
Need help in deploying the modified SOLR source code
Hi, I made some changes to SolrQueryParser.java using Eclipse, and I am able to do a leading-wildcard search using the Jetty plugin (I downloaded this plugin for Eclipse). Now I am not sure how I can package this code and redeploy it. Can someone help me out please?

Thanks,
B
DIH - Out of Memory error when using CachedsqlEntityProcessor
Hi, I am using CachedSqlEntityProcessor in my DIH data config to reduce the number of queries executed against the database (the config snippet was stripped when posting). I have more than 2 million rows returned for entity 2 and around 30 rows returned for entity 1. I have set the heap size to 1 GB, but even then I always get a heap-space out-of-memory error. I am not sure how to flush the documents in the buffer on a certain condition. I tried enabling autocommit and reducing the max doc buffer size, but to no avail.. Can someone let me know the best way to overcome this issue?

Thanks,
Barani
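When the cached child entity is this large, the whole child result set has to fit in the heap, because CachedSqlEntityProcessor materializes every row into its in-memory cache before lookups begin. If raising -Xmx well past 1 GB is not an option, one alternative is to drop the cache for the big entity and go back to a per-parent-row lookup, trading indexing speed for bounded memory. A sketch with hypothetical names:

  <entity name="x" query="select ID from PARENT">
    <!-- No CachedSqlEntityProcessor here: the query runs once per parent row,
         so only one small child result set is in memory at a time. -->
    <entity name="y"
            query="select NAME, VALUE from CHILD where PARENT_ID='${x.ID}'"/>
  </entity>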
Re: DIH - Out of Memory error when using CachedsqlEntityProcessor
[message truncated in the archive; the tail of the MetaMatrix error log follows]

    ...print(Unknown Source)
    at com.metamatrix.common.comm.platform.socket.PrintStreamSocketLog.log(PrintStreamSocketLog.java:169)
    at com.metamatrix.common.comm.platform.socket.PrintStreamSocketLog.log(PrintStreamSocketLog.java:175)
    at com.metamatrix.common.comm.platform.socket.PrintStreamSocketLog.logError(PrintStreamSocketLog.java:71)
    at com.metamatrix.common.comm.platform.socket.client.SocketServerInstanceImpl$1.run(SocketServerInstanceImpl.java:578)
    at java.lang.Thread.run(Unknown Source)

Mar 13, 2010 3:52:09 PM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: end_rollback

Thanks,
Barani

JavaGuy84 wrote:
> Hi,
>
> I am using CachedSqlEntityProcessor in my DIH dataconfig to reduce the
> number of queries executed against the database:
>
> <entity ... cachekey="id" cachelookup="x.id"/>
>
> I have more than 2 million rows returned for Entity 2 and around 30
> rows returned for entity 1. I have set the heap size to 1 GB but even
> then I always get a heap out-of-memory error. [...]
Re: DIH - Out of Memory error when using CachedsqlEntityProcessor
Erick, I have seen many posts regarding out-of-memory errors, but I am not sure whether they were using CachedSqlEntityProcessor. I want to know if there is a way to flush out the cache buffer instead of storing everything in the cache. I can clearly see the heap size growing like anything when I use CachedSqlEntityProcessor, so I am trying to figure out whether there is a way to resolve this other than using this processor.

Thanks,
Barani

Erick Erickson wrote:
> Have you searched the users' list? This question has come up multiple
> times and you'll find your question has probably already been answered.
> Let us know if you come up blank...
>
> Best
> Erick
>
> On Sat, Mar 13, 2010 at 3:56 PM, JavaGuy84 wrote:
>> Sorry, forgot to attach the error log.
>>
>> Error log:
>> ---------
>> org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.OutOfMemoryError: Java heap space
>>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:650)
>>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:605)
>>     at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:261)
>>     at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:185)
>>     at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:333)
>>     at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:391)
>>     at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:372)
>> Caused by: java.lang.OutOfMemoryError: Java heap space
>>     at java.util.HashMap.<init>(Unknown Source)
>>     at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.getARow(JdbcDataSource.java:281)
>>     at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.access$800(JdbcDataSource.java:228)
>>     at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.next(JdbcDataSource.java:266)
>>     at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.next(JdbcDataSource.java:269)
>>     at org.apache.solr.handler.dataimport.CachedSqlEntityProcessor.getAllNonCachedRows(CachedSqlEntityProcessor.java:70)
>>     at org.apache.solr.handler.dataimport.EntityProcessorBase.getIdCacheData(EntityProcessorBase.java:194)
>>     at org.apache.solr.handler.dataimport.CachedSqlEntityProcessor.nextRow(CachedSqlEntityProcessor.java:58)
>>     at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:233)
>>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:579)
>>     ... 6 more
>> Mar 13, 2010 3:52:09 PM org.apache.solr.handler.dataimport.DataImporter doFullImport
>> SEVERE: Full Import failed
>> org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.OutOfMemoryError: Java heap space
>>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:650)
>>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:605)
>>     at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:261)
>>     at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:185)
>>     at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:333)
>>     at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:391)
>>     at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:372)
>> Caused by: java.lang.OutOfMemoryError: Java heap space
>>     at java.util.HashMap.<init>(Unknown Source)
>>     at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.getARow(JdbcDataSource.java:281)
>>     at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.access$800(JdbcDataSource.java:228)
>>     at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.next(JdbcDataSource.java:266)
>>     at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.next(JdbcData [message truncated in the archive]
Re: Exception encountered during replication on slave... Any clues?
Hi William, we are facing the same issue as you.. just thought of checking whether you had already resolved it?

Thanks,
Barani

William Pierce-3 wrote:
> Folks:
>
> I am seeing this exception in my logs that is causing my replication to
> fail. I start with a clean slate (empty data directory). I index the
> data on the postingsmaster using the dataimport handler and it succeeds.
> When the replication slave attempts to replicate, it encounters this error:
>
> Dec 7, 2009 9:20:00 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
> SEVERE: Master at: http://localhost/postingsmaster/replication is not
> available. Index fetch failed. Exception: Invalid version or the data in
> not in 'javabin' format
>
> Any clues as to what I should look for to debug this further?
>
> Replication is enabled as follows. The postingsmaster solrconfig.xml looks
> like this (the XML was stripped when posting; reconstructed here from the
> surviving fragments):
>
> <requestHandler name="/replication" class="solr.ReplicationHandler">
>   <lst name="master">
>     <str name="replicateAfter">commit</str>
>   </lst>
> </requestHandler>
>
> The postings slave solrconfig.xml looks like this:
>
> <requestHandler name="/replication" class="solr.ReplicationHandler">
>   <lst name="slave">
>     <str name="masterUrl">http://localhost/postingsmaster/replication</str>
>     <str name="pollInterval">00:05:00</str>
>   </lst>
> </requestHandler>
>
> Thanks,
>
> - Bill
Replication failed due to HTTP PROXY?
Hi, one of my colleagues back in India is not able to replicate the index present on the servers (USA). I am now wondering whether this is due to a proxy-related issue? He is getting the error message below. Is there a way to configure a PROXY in the SOLR config files?

Server logs:

INFO: [] Registered new searcher searc...@edf730 main
Mar 17, 2010 8:38:06 PM org.apache.solr.handler.ReplicationHandler getReplicationDetails
WARNING: Exception while invoking 'details' method for replication on master
org.apache.commons.httpclient.ConnectTimeoutException: The host did not accept the connection within timeout of 5000 ms
    at org.apache.commons.httpclient.protocol.ReflectionSocketFactory.createSocket(ReflectionSocketFactory.java:155)
    at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:125)
    at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707)
    at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.open(MultiThreadedHttpConnectionManager.java:1361)
    at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387)
    at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
    at org.apache.solr.handler.SnapPuller.getNamedListResponse(SnapPuller.java:193)
    at org.apache.solr.handler.SnapPuller.getCommandResponse(SnapPuller.java:188)
    at org.apache.solr.handler.ReplicationHandler.getReplicationDetails(ReplicationHandler.java:581)
    at org.apache.solr.handler.ReplicationHandler.handleRequestBody(ReplicationHandler.java:180)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at org.apache.jsp.admin.replication.index_jsp.executeCommand(org.apache.jsp.admin.replication.index_jsp:50)
    at org.apache.jsp.admin.replication.index_jsp._jspService(org.apache.jsp.admin.replication.index_jsp:231)
    at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:80)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:373)
    at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:464)
    at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:358)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:487)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:367)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
    at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:268)
    at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:126)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:264)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
    at org.mortbay.jetty.Server.handle(Server.java:285)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
    at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
Caused by: java.net.Socket [message truncated in the archive]