IndexReaders cannot exceed 2 Billion
Hello,

I am facing an issue that is making me go crazy. I am running SOLR, saving data on HDFS, and I have a single-node setup with an index that had been running fine until today. I know that 2 billion documents is too much for a single node, but it had been running fine for my requirements and it was pretty fast.

I restarted SOLR today and I am getting an error stating "Too many documents, composite IndexReaders cannot exceed 2147483519". The last backup I have is two weeks old, and I really need the index to start so I can get the data out of it. I can delete data and create a separate shard, but I need the index to be up so I can take the data.

Please help!

--
Regards,
Wael
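For context, 2147483519 is Lucene's hard per-index document limit (IndexWriter.MAX_DOCS, slightly below Integer.MAX_VALUE), and the check applies to the sum of segment maxDocs, deleted documents included. Once an index has crossed it, it will not open as a whole, so one recovery approach sometimes suggested is splitting it offline at the segment level with Lucene's IndexSplitter from the lucene-misc module, which copies whole segments without opening a composite reader. This is a sketch only, to be run against a copy of the index; the jar names, paths, and segment names below are illustrative, and the tool's arguments should be checked against the javadoc of the exact Lucene version in use:

```shell
# List the segments of the over-sized index (paths are illustrative).
java -cp lucene-core-4.10.3.jar:lucene-misc-4.10.3.jar \
  org.apache.lucene.index.IndexSplitter /data/solr/collection1/index -l

# Copy a subset of segments into a new, smaller index that can be opened.
java -cp lucene-core-4.10.3.jar:lucene-misc-4.10.3.jar \
  org.apache.lucene.index.IndexSplitter /data/solr/collection1/index \
  /data/solr/recovered/index _1a _1b _1c
```

Working on a copy matters here: IndexSplitter manipulates segment metadata directly and offers no undo.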
Could not find configName error
Hi,

I had some issues with SOLR shutting down on a single-node setup on Hadoop. After starting it up, I got the error:

Could not find configName for collection XXX found.

I know the issue is that the configuration in Zookeeper is broken, but I would like to know how I can push this configuration back to get the index running.

--
Regards,
Wael
Re: Could not find configName error
I am using SOLR 4.10.3. I am not sure I have the configs in source control; I don't actually know what that is. I am using SOLR on a pre-setup VM.

On Tue, Sep 5, 2017 at 5:26 PM, Erick Erickson wrote:
> What version of Solr?
>
> bin/solr zk -help
>
> In particular, upconfig can be used to move configsets up to Zookeeper
> (or back down, or whatever) in relatively recent versions of Solr. You
> are keeping them in source control, right? ;)
>
> Best,
> Erick
>
> On Mon, Sep 4, 2017 at 11:27 PM, Wael Kader wrote:
> > Hi,
> >
> > I had some issues in SOLR shutting down on a single node application on
> > Hadoop.
> >
> > After starting up i got the error:
> > Could not find configName for collection XXX found.
> >
> > I know the issue is that the configs has issues in Zookeeper but I would
> > like to know how I can push this configuration back to get the index
> > running.
> >
> > --
> > Regards,
> > Wael

--
Regards,
Wael
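The bin/solr zk subcommands Erick mentions only exist in later Solr releases; on 4.10.3 the equivalent tool is zkcli.sh, shipped in Solr's cloud-scripts directory (Cloudera Search bundles it as well). A sketch of pushing a local config directory back into ZooKeeper and linking it to the collection; the ZooKeeper address, config path, and config-set name are placeholders to adapt:

```shell
# Upload the local config directory to ZooKeeper under the name "myconf".
# (Script location varies; in stock Solr 4.x it is under
# example/scripts/cloud-scripts/.)
./zkcli.sh -zkhost zk1:2181/solr -cmd upconfig \
  -confdir /path/to/collection/conf -confname myconf

# Point the collection at that config set, then reload the collection
# or restart Solr.
./zkcli.sh -zkhost zk1:2181/solr -cmd linkconfig \
  -collection XXX -confname myconf
```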
Faceting Word Count
Hello,

I have an index with around 100 million documents. I have a multivalued field in which I save big chunks of text data. The server has around 20 GB of RAM and 4 CPUs.

I was faceting on that field to build a word cloud, and it took around 1 second to retrieve when the index held 5-10 million documents. Now I have more data and it takes minutes to get the results (that is, if it returns at all and SOLR doesn't crash). What's the best way to make it run, or maybe it's not scalable on my current schema and design with news articles?

I am looking for the best solution for this. Maybe create another index to split the data while inserting it, or maybe it would perform better if I change some settings in SolrConfig or add some RAM.

--
Regards,
Wael
Re: Faceting Word Count
Hi,

I am using a custom field; below is the field definition. I am using this because I don't want stemming.

Regards,
Wael

On Mon, Nov 6, 2017 at 10:29 AM, Emir Arnautović <emir.arnauto...@sematext.com> wrote:
> Hi Wael,
> Can you provide your field definition and sample query?
>
> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
> > On 6 Nov 2017, at 08:30, Wael Kader wrote:
> >
> > Hello,
> >
> > I am having an index with around 100 Million documents.
> > I have a multivalued column that I am saving big chunks of text data in.
> > It has around 20 GB of RAM and 4 CPU's.
> >
> > I was doing faceting on it to get word cloud but it was taking around 1
> > second to retrieve when the data was 5-10 Million.
> > Now I have more data and its taking minutes to get the results (that is
> > if it gets it and SOLR doesn't crash).
> >
> > [...]

--
Regards,
Wael
Re: Faceting Word Count
Hi,

The whole index has 100M documents, but when I add the criteria it filters the data down to maybe 10K rows at most. The facet isn't working now that the total number of records in the index is 100M, but it was working at 5M.

I have social media & RSS data in the index, and I am trying to get the word count for a specific user on specific date intervals.

Regards,
Wael

On Mon, Nov 6, 2017 at 3:42 PM, Erick Erickson wrote:
> _Why_ do you want to get the word counts? Faceting on all of the
> tokens for 100M docs isn't something Solr is ordinarily used for. As
> Emir says, it'll take a huge amount of memory. You can use one of the
> function queries (termfreq IIRC) that will give you the count of any
> individual term you have and will be very fast.
>
> But getting all of the word counts in the index is probably not
> something I'd use Solr for.
>
> This may be an XY problem: you're asking how to do something specific
> (X) without explaining what the problem you're trying to solve is (Y).
> Perhaps there's another way to accomplish (Y) if we knew more about
> what it is.
>
> Best,
> Erick
>
> On Mon, Nov 6, 2017 at 4:15 AM, Emir Arnautović wrote:
> > Hi Wael,
> > You are faceting on an analyzed field. This results in the field being
> > uninverted - fieldValueCache being built - on the first call after every
> > commit. This is both time and memory consuming (you can check in the
> > admin console stats how much memory it took).
> > What you need to do is to create a multivalued string field (not text)
> > and parse values (do the analysis steps) on the client side and store it
> > like that. This will allow you to enable docValues on that field and
> > avoid building the fieldValueCache.
> >
> > HTH,
> > Emir
> >
> > > On 6 Nov 2017, at 13:06, Wael Kader wrote:
> > >
> > > Hi,
> > >
> > > I am using a custom field; below is the field definition.
> > > I am using this because I don't want stemming.
> > >
> > > <fieldType name="text_custom" class="solr.TextField" positionIncrementGap="100">
> > >   <analyzer type="index">
> > >     <charFilter class="solr.MappingCharFilterFactory"
> > >                 mapping="mapping-ISOLatin1Accent.txt"/>
> > >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > >     <filter class="solr.StopFilterFactory" ignoreCase="true"
> > >             words="stopwords.txt" enablePositionIncrements="true"/>
> > >     <filter class="solr.WordDelimiterFilterFactory" protected="protwords.txt"
> > >             generateWordParts="0" generateNumberParts="1" catenateWords="1"
> > >             catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"
> > >             preserveOriginal="1"/>
> > >   </analyzer>
> > >   <analyzer type="query">
> > >     <charFilter class="solr.MappingCharFilterFactory"
> > >                 mapping="mapping-ISOLatin1Accent.txt"/>
> > >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > >     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> > >             ignoreCase="true" expand="true"/>
> > >     <filter class="solr.StopFilterFactory" ignoreCase="true"
> > >             words="stopwords.txt" enablePositionIncrements="true"/>
> > >     <filter class="solr.WordDelimiterFilterFactory" protected="protwords.txt"
> > >             generateWordParts="0" catenateWords="0" catenateNumbers="0"
> > >             catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"/>
> > >   </analyzer>
> > > </fieldType>
> > >
> > > Regards,
> > > Wael
> > >
> > > [...]

--
Regards,
Wael
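Erick's termfreq() suggestion above can be tried without any schema change: it returns the raw term frequency of a single term per matching document, so it is fast, but it only helps when the words to count are already known. A hedged sketch; the collection name, field names, user id, and terms are placeholders:

```shell
# Return, per matching document, how often the term "solr" occurs in the
# "text" field, restricted to one user's documents in a date interval.
curl 'http://localhost:8983/solr/collection1/select' \
  --data-urlencode 'q=user_id:12345 AND date:[2017-01-01T00:00:00Z TO 2017-02-01T00:00:00Z]' \
  --data-urlencode 'fl=id,freq:termfreq(text,"solr")' \
  --data-urlencode 'rows=100' \
  --data-urlencode 'wt=json'
```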
Re: Faceting Word Count
Hi,

I want to know the best option for getting a word cloud in SOLR. Is it saving the data as multivalued strings, using term vectors, or JSON faceting (which didn't work for me)? The Terms component doesn't work because I can't provide any criteria with it. I don't mind changing the design, but I need to know the most feasible way that won't cause problems in the long run.

I want to be able to get word frequencies based on a criteria. Facets are taking around 1 minute to return data now.

Regards,
Wael

On Wed, Nov 8, 2017 at 11:06 AM, Emir Arnautović <emir.arnauto...@sematext.com> wrote:
> Hi Wael,
> You can try out JSON faceting - it's not just about req/resp format;
> it uses a different implementation as well. In any case you will have to
> index documents differently in order to be able to use docValues.
>
> HTH
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
> > On 7 Nov 2017, at 09:26, Wael Kader wrote:
> >
> > [...]
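Emir's advice in this thread boils down to moving tokenization out of Solr: add a second, multivalued string field with docValues, fill it with already-analyzed tokens at index time, and facet on that field instead of the analyzed text field. A sketch of what the schema addition might look like (the field name "words" is an assumption); note also that the JSON Facet API only arrived in Solr 5.x, which is likely why it "didn't work" on 4.10.3:

```xml
<!-- schema.xml: pre-tokenized words, one token per value; docValues
     avoids building the fieldValueCache on every commit. -->
<field name="words" type="string" indexed="true" stored="false"
       multiValued="true" docValues="true"/>
```

On 4.10.3, plain field faceting on this field (facet=true&facet.field=words&facet.limit=50, combined with the user/date filter query) would be the equivalent of the JSON terms facet.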
SOLR Data Backup
Hello,

What's the best way to do a backup of the SOLR data? I have a single-node SOLR server and I want to always keep a copy of the data I have.

Is replication an option for what I want?

I would like to get some tutorials and papers, if possible, on the method that should be used, whether it's backup, replication, or anything else.

--
Regards,
Wael
Re: SOLR Data Backup
Hi,

It's not possible for me to re-index: the data in some of my indexes is only saved in SOLR. I need this solution to make sure that in case the live index fails, I can move to the backup or replicated index.

Thanks,
Wael

On Thu, Jan 18, 2018 at 11:41 AM, Charlie Hull wrote:
> On 18/01/2018 09:21, Wael Kader wrote:
> >
> > [...]
>
> Hi Wael,
>
> Have you considered backing up the source data instead? You can always
> re-index to re-create the Solr data.
>
> Replication will certainly allow you to maintain a copy of the Solr data,
> either so you can handle more search traffic by load balancing between the
> two, or to provide a failover capability in the case of a server failure.
> But this isn't a backup in the traditional sense. You shouldn't consider
> Solr as your 'source of truth' unless for some reason it is impossible to
> re-index.
>
> Perhaps if you could let us know why you think you need a backup, we can
> suggest the best solution.
>
> Cheers
>
> Charlie
>
> --
> Charlie Hull
> Flax - Open Source Enterprise Search
>
> tel/fax: +44 (0)8700 118334
> mobile: +44 (0)7767 825828
> web: www.flax.co.uk

--
Regards,
Wael
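When Solr itself is the only copy of the data, the ReplicationHandler's backup command is worth knowing about: it takes an online, point-in-time snapshot of a core while it keeps serving queries. A sketch, assuming the handler is enabled; the host, core name, and backup location are placeholders (and for HDFS-backed indexes the snapshot lands wherever the data directory lives, so this should be tried on a test core first):

```shell
# Trigger an online snapshot of the core's index, keeping the last two.
curl 'http://localhost:8983/solr/collection1/replication?command=backup&location=/backups/solr&numberToKeep=2'

# Check the status and result of the last backup.
curl 'http://localhost:8983/solr/collection1/replication?command=details&wt=json'
```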
Re: SOLR Data Backup
Hi,

The data is always changing for me, so I think I can try the replication option. I am using Cloudera and the data is saved in HDFS. Is it possible for me to move the data while the index is running, without any problems? I would also like to know if it's possible to set up slave/master replication without rebuilding the index.

Thanks,
Wael

On Thu, Jan 18, 2018 at 12:06 PM, Wael Kader wrote:
> Hi,
>
> Its not possible for me to re-index: the data in some of my indexes is
> only saved in SOLR. I need this solution to make sure that in case the
> live index fails, I can move to the backup or replicated index.
>
> [...]

--
Regards,
Wael
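Classic master/slave index replication in Solr 4.x is configured purely in solrconfig.xml, so it can be enabled on an existing core without rebuilding the index; the slave pulls index files from the master on the schedule given. A sketch; the host name, core name, poll interval, and confFiles list are assumptions to adapt (and whether an HDFS-backed data directory replicates cleanly this way is worth verifying in a test environment first):

```xml
<!-- On the master (solrconfig.xml): publish the index after each commit. -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

<!-- On the slave (solrconfig.xml): poll the master every 5 minutes. -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/collection1</str>
    <str name="pollInterval">00:05:00</str>
  </lst>
</requestHandler>
```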
Solr Recommended setup
Hi,

I would like to get a recommendation for the SOLR setup I have. I have an index getting around 2 million records per day. The index is in Cloudera Search (Solr), and I am running everything on one node. I commit whatever data comes into the index every 5 minutes. The whole Cloudera VM has 64 GB of RAM.

It's working fine for now, with around 80 million records, but SOLR gets slow once a week, so I restart the VM to keep things working. I would like to get a recommendation on the setup; note that I can add VMs to my setup if needed.

I read somewhere that it's wrong to index and read data from the same place. I am doing this now, and I know I am doing things wrong. How can I set up SOLR on Cloudera to do indexing on one VM and reading on another, and what other recommendations would you make for my setup?

--
Regards,
Wael
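At around 2 million documents a day, commit settings are often the first thing to tune in a setup like the one described: a hard autoCommit with openSearcher=false keeps the transaction log bounded without throwing away caches, while a longer soft commit controls how often searchers (and their caches) are rebuilt. A sketch of solrconfig.xml settings to start from; the intervals are assumptions to tune against the actual query load:

```xml
<!-- solrconfig.xml: flush to disk every 60s without opening a searcher. -->
<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<!-- Make new documents visible to search every 5 minutes. -->
<autoSoftCommit>
  <maxTime>300000</maxTime>
</autoSoftCommit>
```

With this in place, explicit client-side commits every 5 minutes become unnecessary and can be dropped.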
Solr crashing StandardWrapperValve
Hello,

SOLR kept crashing today, over and over again. I am running a single-node SOLR instance on Cloudera with 140 GB of data. Things were working fine until today.

I have a replication server that I am replicating data to, but it wasn't working before and was fixed today, so I thought maybe it was causing the issue and stopped the replication. I am not sure this is the problem, as SOLR crashed once after I stopped the replication.

I need help identifying the problem. I tried to find it in the log and found the error below:

Feb 27, 2018 6:23:14 AM org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Servlet.service() for servlet default threw exception
java.lang.IllegalStateException
    at org.apache.catalina.connector.ResponseFacade.sendError(ResponseFacade.java:407)
    at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:962)
    at org.apache.solr.servlet.SolrDispatchFilter.httpSolrCall(SolrDispatchFilter.java:497)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:255)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.solr.servlet.SolrHadoopAuthenticationFilter$2.doFilter(SolrHadoopAuthenticationFilter.java:408)
    at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:622)
    at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:301)
    at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:574)
    at org.apache.solr.servlet.SolrHadoopAuthenticationFilter.doFilter(SolrHadoopAuthenticationFilter.java:413)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:861)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:612)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:503)
    at java.lang.Thread.run(Thread.java:745)

--
Regards,
Wael
Move SOLR from cloudera HDFS to SOLR on Docker
Hello,

I want to move data from my SOLR setup on Cloudera Hadoop to a Docker SOLR container. I don't need to run all the Hadoop services in my setup, as I am currently only using SOLR from the Cloudera distribution. My concern now is to know the best way to move the data and schema to a Docker container. I don't mind moving the data to an older SOLR container version to match the 4.10.3 SOLR version I have on Cloudera.

Much help is appreciated.

--
Regards,
Wael
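Since the Lucene index format is the same whether the files live on HDFS or on a local disk, one possible migration path is to copy the index and config out of HDFS and mount them into a local-filesystem Solr container. A sketch only: every path, core name, ZooKeeper address, and image tag here is an assumption, the target Solr version should match 4.10.3 (index formats are only forward-compatible), and official Docker images only start at Solr 5.x, so a 4.10.3 image would need to be custom-built:

```shell
# 1. Stop indexing, then copy the core's index out of HDFS and the
#    config set out of ZooKeeper (paths and names are illustrative).
hdfs dfs -get /solr/collection1/core_node1/data /tmp/solr-migrate/data
./zkcli.sh -zkhost zk1:2181/solr -cmd downconfig \
  -confdir /tmp/solr-migrate/conf -confname myconf

# 2. Lay the files out as a local core (core.properties or solr.xml entry,
#    conf/ and data/ directories) and mount it into the container.
docker run -d -p 8983:8983 \
  -v /tmp/solr-migrate:/opt/solr/example/solr/collection1 \
  my-solr:4.10.3
```

The solrconfig.xml would also need its HdfsDirectoryFactory settings replaced with the default local directory factory before the core will load from local disk.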