Re: Field exclusion from fl and hl.fl

2016-03-06 Thread Zheng Lin Edwin Yeo
Hi,

No, I tried that and I got the following error.

{
  "responseHeader":{
"status":500,
"QTime":0},
  "error":{
"msg":"For input string: \"-\"",
"trace":"java.lang.NumberFormatException: For input string:
\"-\"\r\n\tat 
java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)\r\n\tat
java.lang.Long.parseLong(Long.java:581)\r\n\tat
java.lang.Long.parseLong(Long.java:631)\r\n\tat
org.apache.solr.search.StrParser.getNumber(StrParser.java:124)\r\n\tat
org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:298)\r\n\tat
org.apache.solr.search.FunctionQParser.parse(FunctionQParser.java:80)\r\n\tat
org.apache.solr.search.QParser.getQuery(QParser.java:141)\r\n\tat
org.apache.solr.search.SolrReturnFields.add(SolrReturnFields.java:297)\r\n\tat
org.apache.solr.search.SolrReturnFields.parseFieldList(SolrReturnFields.java:113)\r\n\tat
org.apache.solr.search.SolrReturnFields.<init>(SolrReturnFields.java:99)\r\n\tat
org.apache.solr.search.SolrReturnFields.<init>(SolrReturnFields.java:75)\r\n\tat
org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:139)\r\n\tat
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:247)\r\n\tat
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:156)\r\n\tat
org.apache.solr.core.SolrCore.execute(SolrCore.java:2073)\r\n\tat
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:658)\r\n\tat
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:457)\r\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:222)\r\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:181)\r\n\tat
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)\r\n\tat
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)\r\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\r\n\tat
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)\r\n\tat
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)\r\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)\r\n\tat
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)\r\n\tat
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\r\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)\r\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\r\n\tat
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)\r\n\tat
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)\r\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)\r\n\tat
org.eclipse.jetty.server.Server.handle(Server.java:499)\r\n\tat
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)\r\n\tat
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)\r\n\tat
org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)\r\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)\r\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)\r\n\tat
java.lang.Thread.run(Thread.java:745)\r\n",
"code":500}}


Regards,
Edwin


On 6 March 2016 at 11:19, William Bell  wrote:

> it used to support
>
> fl=*,-field
>
> Does that not work now?
>
> On Sat, Mar 5, 2016 at 7:37 PM, Zheng Lin Edwin Yeo 
> wrote:
>
> > I have yet to find any workaround so far. Still have to list out all the
> > remaining fields one by one.
> >
> > Does anyone else have any suggestions?
> >
> > Regards,
> > Edwin
> >
> >
> > On 18 February 2016 at 17:07, Anil  wrote:
> >
> > > I am looking for the same. Please do let me know just in case you find
> > > a workaround.
> > >
> > > On 18 February 2016 at 14:18, Zheng Lin Edwin Yeo <
> edwinye...@gmail.com>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > I would like to find out whether there is already a way to exclude
> > > > fields from the Solr response. I did come across SOLR-3191, which was
> > > > created about 4 years ago, but could not find any workable solution
> > > > there.
> > > >
> > > > As my collections can have more than 100 fields, and I would need to
> > > > return the majority of them except for one or two, a way to exclude
> > > > fields would be good; if not, I have to list all the remaining fields
> > > > (which can be more than 100 for each collection).
> > > >
> > > > I am using Solr 5.4.0.
> > > >
> > > > Regards,
> > > > Edwin
> > > >
> > >
> >
>
>
>
> --
> Bill Bell
> billnb...@gmail.com
> cell 720-256-8076
>


Re: Field exclusion from fl and hl.fl

2016-03-06 Thread Erik Hatcher
I don't believe that was ever supported. But Scott and I have been working 
through SOLR-3191 recently to get that need addressed.  Maybe some time next 
week we will have a ready to commit patch up. 

Erik

> On Mar 5, 2016, at 22:19, William Bell  wrote:
> 
> it used to support
> 
> fl=*,-field
> 
> Does that not work now?
> 
> On Sat, Mar 5, 2016 at 7:37 PM, Zheng Lin Edwin Yeo 
> wrote:
> 
>> I have yet to find any workaround so far.Still have to list out all the
>> remaining fields one by one.
>> 
>> Does anyone else has any suggestions?
>> 
>> Regards,
>> Edwin
>> 
>> 
>>> On 18 February 2016 at 17:07, Anil  wrote:
>>> 
>>> I am looking for the same. please do let me know just in case you find
>>> workaround.
>>> 
>>> On 18 February 2016 at 14:18, Zheng Lin Edwin Yeo 
>>> wrote:
>>> 
 Hi,
 
 Would like to find out, is there already a way to exclude field from
>> the
 Solr response. I did came across SOLR-3191 which was created about 4
>>> years
 ago, but could not find any workable solutions from there.
 
 As my collections can have more than 100 fields, and I would need to
>>> return
 the majority of then except for one or two, so if there is a way to
>>> exclude
 the fields would be good, if not I have to list all the remaining
>> fields
 (which can be more than 100 for each collections).
 
 I am using Solr 5.4.0.
 
 Regards,
 Edwin
> 
> 
> 
> -- 
> Bill Bell
> billnb...@gmail.com
> cell 720-256-8076


Re: Indexing Twitter - Hypothetical

2016-03-06 Thread Susheel Kumar
Entity Recognition means you may want to recognize different entities
(name/person, email, location/city/state/country, etc.) in your
tweets/messages, with the goal of providing more relevant results to users.
NER can be used at query or indexing (data enrichment) time.
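
For example, enrichment at index time might mean storing extracted entities
alongside the raw text, roughly like this (field names below are only
placeholders using common dynamic-field suffixes):

  {
    "id": "tweet-123",
    "text": "Meeting Jane in Paris next week",
    "person_ss":   ["Jane"],
    "location_ss": ["Paris"]
  }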

Thanks,
Susheel

On Fri, Mar 4, 2016 at 7:55 PM, Joseph Obernberger <
joseph.obernber...@gmail.com> wrote:

> Thank you all very much for all the responses so far.  I've enjoyed reading
> them!  We have noticed that storing data inside of Solr results in
> significantly worse performance (particularly faceting); so we store the
> values of all the fields elsewhere, but index all the data with Solr
> Cloud.  I think the suggestion about splitting the data up into blocks of
> date/time is where we would be headed.  Having two Solr-Cloud clusters -
> one to handle ~30 days of data, and one to handle historical.  Another
> option is to use a single Solr Cloud cluster, but use multiple
> cores/collections.  Either way you'd need a job to come through and clean
> up old data. The historical cluster would have much worse performance,
> particularly for clustering and faceting the data, but that may be
> acceptable.
> I don't know what you mean by 'entity recognition in the queries' - could
> you elaborate?
>
> We would want to index and potentially facet on any of the fields - for
> example entities_media_url, username, even background color, but we do not
> know a-priori what fields will be important to users.
> As to why we would want to make the data searchable; well - I don't make
> the rules!  Tweets is not the only data source, but it's certainly the
> largest that we are currently looking at handling.
>
> I will read up on the Berlin Buzzwords - thank you for the info!
>
> -Joe
>
>
>
> On Fri, Mar 4, 2016 at 9:59 AM, Jack Krupansky 
> wrote:
>
> > As always, the initial question is how you intend to query the data -
> query
> > drives data modeling. How real-time do you need queries to be? How fast
> do
> > you need archive queries to be? How many fields do you need to query on?
> > How much entity recognition do you need in queries?
> >
> >
> > -- Jack Krupansky
> >
> > On Fri, Mar 4, 2016 at 4:19 AM, Charlie Hull  wrote:
> >
> > > On 03/03/2016 19:25, Toke Eskildsen wrote:
> > >
> > >> Joseph Obernberger  wrote:
> > >>
> > >>> Hi All - would it be reasonable to index the Twitter 'firehose'
> > >>> with Solr Cloud - roughly 500-600 million docs per day indexing
> > >>> each of the fields (about 180)?
> > >>>
> > >>
> > >> Possible, yes. Reasonable? It is not going to be cheap.
> > >>
> > >> Twitter index the tweets themselves and have been quite open about
> > >> how they do it. I would suggest looking for their presentations;
> > >> slides or recordings. They have presented at Berlin Buzzwords and
> > >> Lucene/Solr Revolution and probably elsewhere too. The gist is that
> > >> they have done a lot of work and custom coding to handle it.
> > >>
> > >
> > > As I recall they're not using Solr, but rather an in-house layer built
> on
> > > a customised version of Lucene. They're indexing around half a trillion
> > > tweets.
> > >
> > > If the idea is to provide a searchable archive of all tweets, my first
> > > question would be 'why': if the idea is to monitor new tweets for
> > > particular patterns there are better ways to do this (Luwak for
> example).
> > >
> > > Charlie
> > >
> > >
> > >> If I were to guess at a sharded setup to handle such data, and keep
> > >>> 2 years worth, I would guess about 2500 shards.  Is that
> > >>> reasonable?
> > >>>
> > >>
> > >> I think you need to think well beyond standard SolrCloud setups. Even
> > >> if you manage to get 2500 shards running, you will want to do a lot
> > >> of tweaking on the way to issue queries so that each request does not
> > >> require all 2500 shards to be searched. Prioritizing newer material
> > >> and only querying the older shards if there are not enough recent
> > >> results is an example.
> > >>
> > >> I highly doubt that a single SolrCloud is the best answer here. Maybe
> > >> one cloud for each month and a lot of external logic?
> > >>
> > >> - Toke Eskildsen
> > >>
> > >>
> > >
> > > --
> > > Charlie Hull
> > > Flax - Open Source Enterprise Search
> > >
> > > tel/fax: +44 (0)8700 118334
> > > mobile:  +44 (0)7767 825828
> > > web: www.flax.co.uk
> > >
> >
>


Re: Disk Usage anomaly across shards/replicas

2016-03-06 Thread Robert Brown
There was only a single index dir. After taking a node down, another
was created with the timestamp (so I know what you mean), but then the
original was removed.


I've since replaced the node and all is well again, just very odd.
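
For anyone else chasing the same thing, the layout Varun describes below
usually looks something like this (the timestamped directory names here are
invented for illustration):

  $ ls /home/s123/solr/data/de_shard2_replica1/data/
  index.20160301083015632/   index.20160305114502811/   index.properties   tlog/

index.properties points at the directory currently in use; a second index.*
directory that never gets cleaned up is what would explain the extra disk
usage.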




On 06/03/16 06:52, Varun Thacker wrote:

Hi Robert,

Within the shard directory there should be multiple directories - "tlog"
"index." . Do you see multiple "index.*" directories in there
for the shard which has more data on disk?

On Sat, Mar 5, 2016 at 6:39 PM, Robert Brown  wrote:


Hi,

I have an index with 65m docs spread across 2 shards, each with 1 replica.

The replica1 of shard2 is using up nearly double the amount of disk space
as the other shards/replicas.

Could there be a reason/fix for this?


/home/s123/solr/data/de_shard1_replica1 = 72G

numDocs:34,786,026
maxDoc:45,825,444
deletedDocs:11,039,418



/home/s123/solr/data/de_shard1_replica2 = 70G

numDocs:34,786,026
maxDoc:46,914,095
deletedDocs:12,128,069



/home/s123/solr/data/de_shard2_replica1 = 138G

numDocs:34,775,193
maxDoc:45,409,362
deletedDocs:10,634,169



/home/s123/solr/data/de_shard2_replica2 = 66G

numDocs:34,775,193
maxDoc:44,181,734
deletedDocs:9,406,541



Thanks,
Rob












Re: High Cpu sys usage

2016-03-06 Thread Shawn Heisey
On 3/5/2016 11:44 PM, YouPeng Yang wrote:
>   We have been using SolrCloud 4.6 in production for our search service
> for two years now. It holds 700GB in one cluster comprised of 3 machines
> with SSDs. At the beginning everything went well, but more and more
> business services have come to interfere with our search service, and a
> problem that haunts us like a nightmare is that CPU sys usage often grows
> to over 10% or even higher; as a result the machine hangs because system
> resources are drained, and we have to restart the machine manually.

One of the most common reasons for performance issues with Solr is not
having enough system memory to effectively cache the index.  Another is
running with a heap that's too small, or a heap that's really large with
ineffective garbage collection tuning.  All of these problems get worse
as query rate climbs.

Running on SSD can reduce, but not eliminate, the requirement for plenty
of system memory.

With 700GB of index data, you are likely to need somewhere between 128GB
and 512GB of memory for good performance.  If the query rate is high,
then requirements are more likely to land in the upper end of that
range.  There's no way for me to narrow that range down -- it depends on
a number of factors, and usually has to be determined through trial and
error.  If the data were on regular disks instead of SSD, I would be
recommending even more memory.

https://wiki.apache.org/solr/SolrPerformanceProblems
https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

If you want a single number recommendation for memory size, I would
recommend starting with 256GB, and being ready to add more.  It is very
common for servers to be incapable of handling that much memory,
though.  The servers that I use for Solr max out at 64GB.

You might need to split your index onto additional machines by sharding
it, and gain the additional memory that way.

Thanks,
Shawn



Re: Indexing Twitter - Hypothetical

2016-03-06 Thread Jack Krupansky
Back to the original question... there are two answers:

1. Yes - for guru-level Solr experts.
2. No - for anybody else.

For starters, (as always), you would need to do a lot more upfront work on
mapping out the forms of query which will be supported. For example, is
your focus on precision or recall. And, are you looking to analyze all
matching tweets or just a sample. And, the load, throughput, and latency
requirements. And, any spatial search requirements. And, any entity search
requirements. Without a clear view of the query requirements it simply
isn't possible to even begin defining a data model. And without a data
model, indexing is a fool's errand. In short, no focus, no progress.

-- Jack Krupansky

On Sun, Mar 6, 2016 at 7:42 AM, Susheel Kumar  wrote:

> Entity Recognition means you may want to recognize different entities
> name/person, email, location/city/state/country etc. in your
> tweets/messages with goal of  providing better relevant results to users.
> NER can be used at query or indexing (data enrichment) time.
>
> Thanks,
> Susheel
>
> On Fri, Mar 4, 2016 at 7:55 PM, Joseph Obernberger <
> joseph.obernber...@gmail.com> wrote:
>
> > Thank you all very much for all the responses so far.  I've enjoyed
> reading
> > them!  We have noticed that storing data inside of Solr results in
> > significantly worse performance (particularly faceting); so we store the
> > values of all the fields elsewhere, but index all the data with Solr
> > Cloud.  I think the suggestion about splitting the data up into blocks of
> > date/time is where we would be headed.  Having two Solr-Cloud clusters -
> > one to handle ~30 days of data, and one to handle historical.  Another
> > option is to use a single Solr Cloud cluster, but use multiple
> > cores/collections.  Either way you'd need a job to come through and clean
> > up old data. The historical cluster would have much worse performance,
> > particularly for clustering and faceting the data, but that may be
> > acceptable.
> > I don't know what you mean by 'entity recognition in the queries' - could
> > you elaborate?
> >
> > We would want to index and potentially facet on any of the fields - for
> > example entities_media_url, username, even background color, but we do
> not
> > know a-priori what fields will be important to users.
> > As to why we would want to make the data searchable; well - I don't make
> > the rules!  Tweets is not the only data source, but it's certainly the
> > largest that we are currently looking at handling.
> >
> > I will read up on the Berlin Buzzwords - thank you for the info!
> >
> > -Joe
> >
> >
> >
> > On Fri, Mar 4, 2016 at 9:59 AM, Jack Krupansky  >
> > wrote:
> >
> > > As always, the initial question is how you intend to query the data -
> > query
> > > drives data modeling. How real-time do you need queries to be? How fast
> > do
> > > you need archive queries to be? How many fields do you need to query
> on?
> > > How much entity recognition do you need in queries?
> > >
> > >
> > > -- Jack Krupansky
> > >
> > > On Fri, Mar 4, 2016 at 4:19 AM, Charlie Hull 
> wrote:
> > >
> > > > On 03/03/2016 19:25, Toke Eskildsen wrote:
> > > >
> > > >> Joseph Obernberger  wrote:
> > > >>
> > > >>> Hi All - would it be reasonable to index the Twitter 'firehose'
> > > >>> with Solr Cloud - roughly 500-600 million docs per day indexing
> > > >>> each of the fields (about 180)?
> > > >>>
> > > >>
> > > >> Possible, yes. Reasonable? It is not going to be cheap.
> > > >>
> > > >> Twitter index the tweets themselves and have been quite open about
> > > >> how they do it. I would suggest looking for their presentations;
> > > >> slides or recordings. They have presented at Berlin Buzzwords and
> > > >> Lucene/Solr Revolution and probably elsewhere too. The gist is that
> > > >> they have done a lot of work and custom coding to handle it.
> > > >>
> > > >
> > > > As I recall they're not using Solr, but rather an in-house layer
> built
> > on
> > > > a customised version of Lucene. They're indexing around half a
> trillion
> > > > tweets.
> > > >
> > > > If the idea is to provide a searchable archive of all tweets, my
> first
> > > > question would be 'why': if the idea is to monitor new tweets for
> > > > particular patterns there are better ways to do this (Luwak for
> > example).
> > > >
> > > > Charlie
> > > >
> > > >
> > > >> If I were to guess at a sharded setup to handle such data, and keep
> > > >>> 2 years worth, I would guess about 2500 shards.  Is that
> > > >>> reasonable?
> > > >>>
> > > >>
> > > >> I think you need to think well beyond standard SolrCloud setups.
> Even
> > > >> if you manage to get 2500 shards running, you will want to do a lot
> > > >> of tweaking on the way to issue queries so that each request does
> not
> > > >> require all 2500 shards to be searched. Prioritizing newer material
> > > >> and only query the older shards if there is not enough resent
> results
> > > >> is an example.
> > > >>
> > > >> I highl

Custom field using PatternCaptureGroupFilterFactory

2016-03-06 Thread Jay Potharaju
Hi,
I have a custom field for getting the first letter of a firstname. For
this I am using PatternCaptureGroupFilterFactory.
This is not working as expected; it is not able to parse the data and get
the first character of the string. Any suggestions on how to fix this?

 

  







   



-- 
Thanks
Jay


Re: Custom field using PatternCaptureGroupFilterFactory

2016-03-06 Thread Binoy Dalal
What do you see under the analysis screen in the solr admin UI?

On Sun, Mar 6, 2016 at 10:55 PM Jay Potharaju  wrote:

> Hi,
> I have a custom field for getting the first letter of an firstname. For
> this I am using PatternCaptureGroupFilterFactory.
> This is not working as expected, not able to parse the data and get the
> first character for the string. Any suggestions on how to fix this?
>
>  
>
>   
>
> 
>
> 
>
>  "^[a-zA-Z0-9]{0,1}" preserve_original="false"/>
>
>
>
> 
>
> --
> Thanks
> Jay
>
-- 
Regards,
Binoy Dalal


Re: Indexing Twitter - Hypothetical

2016-03-06 Thread Walter Underwood
This is a very good presentation on using entity extraction in query 
understanding. As you’ll see from the preso, it is not easy.

http://www.slideshare.net/dtunkelang/better-search-through-query-understanding 


wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Mar 6, 2016, at 7:27 AM, Jack Krupansky  wrote:
> 
> Back to the original question... there are two answers:
> 
> 1. Yes - for guru-level Solr experts.
> 2. No - for anybody else.
> 
> For starters, (as always), you would need to do a lot more upfront work on
> mapping out the forms of query which will be supported. For example, is
> your focus on precision or recall. And, are you looking to analyze all
> matching tweets or just a sample. And, the load, throughput, and latency
> requirements. And, any spatial search requirements. And, any entity search
> requirements. Without a clear view of the query requirements it simply
> isn't possible to even begin defining a data model. And without a data
> model, indexing is a fool's errand. In short, no focus, no progress.
> 
> -- Jack Krupansky
> 
> On Sun, Mar 6, 2016 at 7:42 AM, Susheel Kumar  wrote:
> 
>> Entity Recognition means you may want to recognize different entities
>> name/person, email, location/city/state/country etc. in your
>> tweets/messages with goal of  providing better relevant results to users.
>> NER can be used at query or indexing (data enrichment) time.
>> 
>> Thanks,
>> Susheel
>> 
>> On Fri, Mar 4, 2016 at 7:55 PM, Joseph Obernberger <
>> joseph.obernber...@gmail.com> wrote:
>> 
>>> Thank you all very much for all the responses so far.  I've enjoyed
>> reading
>>> them!  We have noticed that storing data inside of Solr results in
>>> significantly worse performance (particularly faceting); so we store the
>>> values of all the fields elsewhere, but index all the data with Solr
>>> Cloud.  I think the suggestion about splitting the data up into blocks of
>>> date/time is where we would be headed.  Having two Solr-Cloud clusters -
>>> one to handle ~30 days of data, and one to handle historical.  Another
>>> option is to use a single Solr Cloud cluster, but use multiple
>>> cores/collections.  Either way you'd need a job to come through and clean
>>> up old data. The historical cluster would have much worse performance,
>>> particularly for clustering and faceting the data, but that may be
>>> acceptable.
>>> I don't know what you mean by 'entity recognition in the queries' - could
>>> you elaborate?
>>> 
>>> We would want to index and potentially facet on any of the fields - for
>>> example entities_media_url, username, even background color, but we do
>> not
>>> know a-priori what fields will be important to users.
>>> As to why we would want to make the data searchable; well - I don't make
>>> the rules!  Tweets is not the only data source, but it's certainly the
>>> largest that we are currently looking at handling.
>>> 
>>> I will read up on the Berlin Buzzwords - thank you for the info!
>>> 
>>> -Joe
>>> 
>>> 
>>> 
>>> On Fri, Mar 4, 2016 at 9:59 AM, Jack Krupansky >> 
>>> wrote:
>>> 
 As always, the initial question is how you intend to query the data -
>>> query
 drives data modeling. How real-time do you need queries to be? How fast
>>> do
 you need archive queries to be? How many fields do you need to query
>> on?
 How much entity recognition do you need in queries?
 
 
 -- Jack Krupansky
 
 On Fri, Mar 4, 2016 at 4:19 AM, Charlie Hull 
>> wrote:
 
> On 03/03/2016 19:25, Toke Eskildsen wrote:
> 
>> Joseph Obernberger  wrote:
>> 
>>> Hi All - would it be reasonable to index the Twitter 'firehose'
>>> with Solr Cloud - roughly 500-600 million docs per day indexing
>>> each of the fields (about 180)?
>>> 
>> 
>> Possible, yes. Reasonable? It is not going to be cheap.
>> 
>> Twitter index the tweets themselves and have been quite open about
>> how they do it. I would suggest looking for their presentations;
>> slides or recordings. They have presented at Berlin Buzzwords and
>> Lucene/Solr Revolution and probably elsewhere too. The gist is that
>> they have done a lot of work and custom coding to handle it.
>> 
> 
> As I recall they're not using Solr, but rather an in-house layer
>> built
>>> on
> a customised version of Lucene. They're indexing around half a
>> trillion
> tweets.
> 
> If the idea is to provide a searchable archive of all tweets, my
>> first
> question would be 'why': if the idea is to monitor new tweets for
> particular patterns there are better ways to do this (Luwak for
>>> example).
> 
> Charlie
> 
> 
>> If I were to guess at a sharded setup to handle such data, and keep
>>> 2 years worth, I would guess about 2500 shards.  Is that
>>> reasonable?
>>> 
>> 
>

Solr Deserialize/Read .fdt file

2016-03-06 Thread Bin Wang
Hi there, I am interested in understanding all the files in the index
folder.

here  is
a stackoverflow question that I have tried to follow, but failed.

Can anyone provide some sample code to help me get started?

Best regards,
Bin


Re: Solr Deserialize/Read .fdt file

2016-03-06 Thread Jack Krupansky
Solr itself doesn't directly access index files - that is the
responsibility of Lucene. That's why you see "lucene" in the class names,
not "solr".

To be clear, no Solr user will ever have to read or deserialize a .fdt
file. Or any Lucene index file for that matter.

If you actually do want to work at the Lucene level (which no one here will
recommend), start with the Lucene doc:

https://lucene.apache.org/core/documentation.html
https://lucene.apache.org/core/5_5_0/index.html

For File Formats:
https://lucene.apache.org/core/5_5_0/core/org/apache/lucene/codecs/lucene54/package-summary.html#package_description

After that you will need to become much more familiar with the Lucene (not
Solr) source code.

If you want to trace through the code from Solr through Lucene, I suggest
you start with Solr unit tests in Eclipse.

But none of that will be an appropriate topic for users on this (Solr) list.
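
That said, purely as a minimal sketch of the Lucene route (using the Lucene
5.x API, and assuming the first program argument points at an index
directory), stored field values are read through a DirectoryReader rather
than by parsing the .fdt file yourself:

  import java.nio.file.Paths;

  import org.apache.lucene.document.Document;
  import org.apache.lucene.index.DirectoryReader;
  import org.apache.lucene.store.FSDirectory;

  public class DumpStoredFields {
    public static void main(String[] args) throws Exception {
      // Open the index directory (the folder holding segments_N, .fdt, .fdx, ...).
      try (DirectoryReader reader =
               DirectoryReader.open(FSDirectory.open(Paths.get(args[0])))) {
        for (int docId = 0; docId < reader.maxDoc(); docId++) {
          // Stored fields (the data the .fdt file holds) come back through
          // the reader; this sketch does not skip deleted documents.
          Document doc = reader.document(docId);
          System.out.println(doc);
        }
      }
    }
  }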



-- Jack Krupansky

On Sun, Mar 6, 2016 at 3:34 PM, Bin Wang  wrote:

> Hi there, I am interested in understanding all the files in the index
> folder.
>
> here 
> is
> a stackoverflow question that I have tried however failed.
>
> Can anyone provide some sample code to help me get started.
>
> Best regards,
> Bin
>


Re: Custom field using PatternCaptureGroupFilterFactory

2016-03-06 Thread Jay Potharaju
On the analysis screen I see the following. Not sure why the regex didn't
work. Any suggestions?
Thanks

KT     text: test   raw_bytes: [74 65 73 74]   start: 0   end: 4   positionLength: 1   type: word   position: 1
UCF    text: TEST   raw_bytes: [54 45 53 54]   start: 0   end: 4   positionLength: 1   type: word   position: 1
PCGTF  text: TEST   raw_bytes: [54 45 53 54]   start: 0   end: 4   positionLength: 1   type: word   position: 1

On Sun, Mar 6, 2016 at 9:56 AM, Binoy Dalal  wrote:

> What do you see under the analysis screen in the solr admin UI?
>
> On Sun, Mar 6, 2016 at 10:55 PM Jay Potharaju 
> wrote:
>
> > Hi,
> > I have a custom field for getting the first letter of an firstname. For
> > this I am using PatternCaptureGroupFilterFactory.
> > This is not working as expected, not able to parse the data and get the
> > first character for the string. Any suggestions on how to fix this?
> >
> >  
> >
> >   
> >
> > 
> >
> > 
> >
> >  > "^[a-zA-Z0-9]{0,1}" preserve_original="false"/>
> >
> >
> >
> > 
> >
> > --
> > Thanks
> > Jay
> >
> --
> Regards,
> Binoy Dalal
>



-- 
Thanks
Jay Potharaju


Re: Field exclusion from fl and hl.fl

2016-03-06 Thread William Bell
Can we get this over the goal line?

https://issues.apache.org/jira/browse/SOLR-3191

On Sun, Mar 6, 2016 at 3:16 AM, Zheng Lin Edwin Yeo 
wrote:

> Hi,
>
> No, I tried that and I got the following error.
>
> {
>   "responseHeader":{
> "status":500,
> "QTime":0},
>   "error":{
> "msg":"For input string: \"-\"",
> "trace":"java.lang.NumberFormatException: For input string:
> \"-\"\r\n\tat
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)\r\n\tat
> java.lang.Long.parseLong(Long.java:581)\r\n\tat
> java.lang.Long.parseLong(Long.java:631)\r\n\tat
> org.apache.solr.search.StrParser.getNumber(StrParser.java:124)\r\n\tat
>
> org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:298)\r\n\tat
>
> org.apache.solr.search.FunctionQParser.parse(FunctionQParser.java:80)\r\n\tat
> org.apache.solr.search.QParser.getQuery(QParser.java:141)\r\n\tat
>
> org.apache.solr.search.SolrReturnFields.add(SolrReturnFields.java:297)\r\n\tat
>
> org.apache.solr.search.SolrReturnFields.parseFieldList(SolrReturnFields.java:113)\r\n\tat
>
> org.apache.solr.search.SolrReturnFields.(SolrReturnFields.java:99)\r\n\tat
>
> org.apache.solr.search.SolrReturnFields.(SolrReturnFields.java:75)\r\n\tat
>
> org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:139)\r\n\tat
>
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:247)\r\n\tat
>
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:156)\r\n\tat
> org.apache.solr.core.SolrCore.execute(SolrCore.java:2073)\r\n\tat
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:658)\r\n\tat
> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:457)\r\n\tat
>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:222)\r\n\tat
>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:181)\r\n\tat
>
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)\r\n\tat
>
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)\r\n\tat
>
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\r\n\tat
>
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)\r\n\tat
>
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)\r\n\tat
>
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)\r\n\tat
>
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)\r\n\tat
>
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\r\n\tat
>
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)\r\n\tat
>
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\r\n\tat
>
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)\r\n\tat
>
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)\r\n\tat
>
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)\r\n\tat
> org.eclipse.jetty.server.Server.handle(Server.java:499)\r\n\tat
> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)\r\n\tat
>
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)\r\n\tat
>
> org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)\r\n\tat
>
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)\r\n\tat
>
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)\r\n\tat
> java.lang.Thread.run(Thread.java:745)\r\n",
> "code":500}}
>
>
> Regards,
> Edwin
>
>
> On 6 March 2016 at 11:19, William Bell  wrote:
>
> > it used to support
> >
> > fl=*,-field
> >
> > Does that not work now?
> >
> > On Sat, Mar 5, 2016 at 7:37 PM, Zheng Lin Edwin Yeo <
> edwinye...@gmail.com>
> > wrote:
> >
> > > I have yet to find any workaround so far.Still have to list out all the
> > > remaining fields one by one.
> > >
> > > Does anyone else has any suggestions?
> > >
> > > Regards,
> > > Edwin
> > >
> > >
> > > On 18 February 2016 at 17:07, Anil  wrote:
> > >
> > > > I am looking for the same. please do let me know just in case you
> find
> > > > workaround.
> > > >
> > > > On 18 February 2016 at 14:18, Zheng Lin Edwin Yeo <
> > edwinye...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Would like to find out, is there already a way to exclude field
> from
> > > the
> > > > > Solr response. I did came across SOLR-3191 which was created about
> 4
> > > > years
> > > > > ago, but could not find any workable solutions from there.
> > > > >
> > > > > As my collections can have more than 100 fields, and I would need
> to
> > > > return
> > > > > the majority of then except for one or two, so if there is a way to
> > > > exclude
> > > > > the fields would be good, if not I have to

Re: Custom field using PatternCaptureGroupFilterFactory

2016-03-06 Thread Alexandre Rafalovitch
I don't see the brackets that mark the group you actually want to
capture. As per:
http://www.solr-start.com/javadoc/solr-lucene/org/apache/lucene/analysis/pattern/PatternCaptureGroupTokenFilter.html

I am also not sure if you actually need the "{0,1}" part.

Regards,
   Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 7 March 2016 at 04:25, Jay Potharaju  wrote:
> Hi,
> I have a custom field for getting the first letter of an firstname. For
> this I am using PatternCaptureGroupFilterFactory.
> This is not working as expected, not able to parse the data and get the
> first character for the string. Any suggestions on how to fix this?
>
>  
>
>   
>
> 
>
> 
>
>  "^[a-zA-Z0-9]{0,1}" preserve_original="false"/>
>
>
>
> 
>
> --
> Thanks
> Jay


Re: Field exclusion from fl and hl.fl

2016-03-06 Thread Zheng Lin Edwin Yeo
Thank you.

Looking forward to this being resolved.

Regards,
Edwin


On 7 March 2016 at 07:41, William Bell  wrote:

> Can we get this over the goal line?
>
> https://issues.apache.org/jira/browse/SOLR-3191
>
> On Sun, Mar 6, 2016 at 3:16 AM, Zheng Lin Edwin Yeo 
> wrote:
>
> > Hi,
> >
> > No, I tried that and I got the following error.
> >
> > {
> >   "responseHeader":{
> > "status":500,
> > "QTime":0},
> >   "error":{
> > "msg":"For input string: \"-\"",
> > "trace":"java.lang.NumberFormatException: For input string:
> > \"-\"\r\n\tat
> >
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)\r\n\tat
> > java.lang.Long.parseLong(Long.java:581)\r\n\tat
> > java.lang.Long.parseLong(Long.java:631)\r\n\tat
> > org.apache.solr.search.StrParser.getNumber(StrParser.java:124)\r\n\tat
> >
> >
> org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:298)\r\n\tat
> >
> >
> org.apache.solr.search.FunctionQParser.parse(FunctionQParser.java:80)\r\n\tat
> > org.apache.solr.search.QParser.getQuery(QParser.java:141)\r\n\tat
> >
> >
> org.apache.solr.search.SolrReturnFields.add(SolrReturnFields.java:297)\r\n\tat
> >
> >
> org.apache.solr.search.SolrReturnFields.parseFieldList(SolrReturnFields.java:113)\r\n\tat
> >
> >
> org.apache.solr.search.SolrReturnFields.(SolrReturnFields.java:99)\r\n\tat
> >
> >
> org.apache.solr.search.SolrReturnFields.(SolrReturnFields.java:75)\r\n\tat
> >
> >
> org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:139)\r\n\tat
> >
> >
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:247)\r\n\tat
> >
> >
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:156)\r\n\tat
> > org.apache.solr.core.SolrCore.execute(SolrCore.java:2073)\r\n\tat
> >
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:658)\r\n\tat
> > org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:457)\r\n\tat
> >
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:222)\r\n\tat
> >
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:181)\r\n\tat
> >
> >
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)\r\n\tat
> >
> >
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)\r\n\tat
> >
> >
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\r\n\tat
> >
> >
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)\r\n\tat
> >
> >
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)\r\n\tat
> >
> >
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)\r\n\tat
> >
> >
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)\r\n\tat
> >
> >
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\r\n\tat
> >
> >
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)\r\n\tat
> >
> >
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\r\n\tat
> >
> >
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)\r\n\tat
> >
> >
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)\r\n\tat
> >
> >
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)\r\n\tat
> > org.eclipse.jetty.server.Server.handle(Server.java:499)\r\n\tat
> > org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)\r\n\tat
> >
> >
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)\r\n\tat
> >
> >
> org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)\r\n\tat
> >
> >
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)\r\n\tat
> >
> >
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)\r\n\tat
> > java.lang.Thread.run(Thread.java:745)\r\n",
> > "code":500}}
> >
> >
> > Regards,
> > Edwin
> >
> >
> > On 6 March 2016 at 11:19, William Bell  wrote:
> >
> > > it used to support
> > >
> > > fl=*,-field
> > >
> > > Does that not work now?
> > >
> > > On Sat, Mar 5, 2016 at 7:37 PM, Zheng Lin Edwin Yeo <
> > edwinye...@gmail.com>
> > > wrote:
> > >
> > > > I have yet to find any workaround so far.Still have to list out all
> the
> > > > remaining fields one by one.
> > > >
> > > > Does anyone else has any suggestions?
> > > >
> > > > Regards,
> > > > Edwin
> > > >
> > > >
> > > > On 18 February 2016 at 17:07, Anil  wrote:
> > > >
> > > > > I am looking for the same. please do let me know just in case you
> > find
> > > > > workaround.
> > > > >
> > > > > On 18 February 2016 at 14:18, Zheng Lin Edwin Yeo <
> > > edwinye...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Would like to find out, is there already a way to exclude field
> >

Re: Custom field using PatternCaptureGroupFilterFactory

2016-03-06 Thread Jack Krupansky
The filter name, "Capture Group", says it all - only pattern groups are
captured and you have not specified even a single group. See the example:
http://www.solr-start.com/javadoc/solr-lucene/org/apache/lucene/analysis/pattern/PatternCaptureGroupFilterFactory.html

Groups are each enclosed within parentheses, as shown in the Javadoc
example above.

Since no groups were found, the filter applied this documented rule:
"If none of the patterns match, or if preserveOriginal is true, the
original token will be preserved."
http://www.solr-start.com/javadoc/solr-lucene/org/apache/lucene/analysis/pattern/PatternCaptureGroupTokenFilter.html

That should probably also say "or if no pattern groups match".

To test regular expressions, try an interactive online tool, such as:
https://regex101.com/
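
As a rough sketch (the field type name is a placeholder and the rest of the
analyzer chain is only inferred from the KT/UCF/PCGTF analysis output earlier
in this thread), a definition with an actual capture group might look like:

  <fieldType name="first_letter" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.UpperCaseFilterFactory"/>
      <!-- the parentheses turn the first character into a capture group -->
      <filter class="solr.PatternCaptureGroupFilterFactory"
              pattern="(^[A-Za-z0-9])" preserve_original="false"/>
    </analyzer>
  </fieldType>

With that change, "test" should come out of the chain as the single token "T".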

-- Jack Krupansky

On Sun, Mar 6, 2016 at 7:51 PM, Alexandre Rafalovitch 
wrote:

> I don't see the brackets that mark the group you actually want to
> capture. As per:
>
> http://www.solr-start.com/javadoc/solr-lucene/org/apache/lucene/analysis/pattern/PatternCaptureGroupTokenFilter.html
>
> I am also not sure if you actually need "{0,1}" part.
>
> Regards,
>Alex.
> 
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
>
>
> On 7 March 2016 at 04:25, Jay Potharaju  wrote:
> > Hi,
> > I have a custom field for getting the first letter of an firstname. For
> > this I am using PatternCaptureGroupFilterFactory.
> > This is not working as expected, not able to parse the data and get the
> > first character for the string. Any suggestions on how to fix this?
> >
> >  
> >
> >   
> >
> > 
> >
> > 
> >
> >  > "^[a-zA-Z0-9]{0,1}" preserve_original="false"/>
> >
> >
> >
> > 
> >
> > --
> > Thanks
> > Jay
>


Re: Deciding on Solr Nodes and Configuration

2016-03-06 Thread sangs8788
Could you please provide some inputs/thoughts on how we can decide on the
configuration?

Thanks
Sangeetha



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Deciding-on-Solr-Nodes-and-Configuration-tp4261581p4262042.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to use geospatial search to find the locations within polygon

2016-03-06 Thread Pradeep Chandra
Thank you so much David & Jack for your responses.

I downloaded the JTS jar file and put it in the server/lib directory. Now it's
working and giving the results.

Once again, thank you both.
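
For anyone following along, with the JTS jar in place a polygon filter on an
RPT field can be written roughly like this (the field type, field name, and
coordinates below are placeholders for Solr 5.x):

  # schema.xml: an RPT field type wired to JTS
  <fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
             spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
             geo="true" distErrPct="0.025" maxDistErr="0.001" distanceUnits="kilometers"/>

  # query: WKT polygon, coordinates in lon lat order
  /solr/places/select?q=*:*&fq=geo:"Intersects(POLYGON((-91.2 35.0, -91.2 36.0, -90.0 36.0, -90.0 35.0, -91.2 35.0)))"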




--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-use-geospatial-search-to-find-the-locations-within-polygon-tp4261588p4262052.html
Sent from the Solr - User mailing list archive at Nabble.com.