Re: autocomplete_edge type split words

2013-09-30 Thread elisabeth benoit
In fact, I've removed autoGeneratePhraseQueries=true, and it doesn't
change anything. The behaviour is the same with or without it (i.e. the
request with debugQuery=on parses the same).

Thanks for your comments.

Best,
Elisabeth


2013/9/28 Erick Erickson 

> You've probably been doing this right along, but adding
> debug=query will show the parsed query.
>
> I really question, though, your apparent combination of
> autoGeneratePhraseQueries with what looks like an ngram field.
> I'm not at all sure how those would interact...
>
> Best,
> Erick
>
> On Fri, Sep 27, 2013 at 10:12 AM, elisabeth benoit
>  wrote:
> > Yes!
> >
> > What I've done is set autoGeneratePhraseQueries to true for my field, then
> > give it a boost (bq=myAutocompleteEdgeNGramField:"my query with spaces"^50).
> > This only worked with autoGeneratePhraseQueries=true, for a reason I didn't
> > understand.
> >
> > since when I did
> >
> > q=myAutocompleteEdgeNGramField:"my query with spaces", I didn't need
> > autoGeneratePhraseQueries set to true.
> >
> > and, another thing is when I tried
> >
> > q=myAutocompleteNGramField:(my query with spaces) OR
> > myAutocompleteEdgeNGramField:"my query with spaces"
> >
> > (with a request handler using edismax and a default operator of AND), the
> > request on myAutocompleteNGramField would OR the grams together, so I had
> > to put in explicit ANDs (myAutocompleteNGramField:(my AND query AND with
> > AND spaces)), which was pretty ugly.
> >
> > I don't always understand what exactly is going on. If you have a pointer
> > to some text I could read to get more insights about this, please let me
> > know.
> >
> > Thanks again,
> > Best regards,
> > Elisabeth
> >
> >
> >
> >
> > 2013/9/27 Erick Erickson 
> >
> >> Have you looked at "autoGeneratePhraseQueries"? That might help.
> >>
> >> If that doesn't work, you can always do something like add an OR clause
> >> like
> >> OR "original query"
> >> and optionally boost it high. But I'd start with the autoGenerate bits.
> >>
> >> Best,
> >> Erick
> >>
> >>
> >> On Fri, Sep 27, 2013 at 7:37 AM, elisabeth benoit
> >>  wrote:
> >> > Thanks for your answer.
> >> >
> >> > So I guess if someone wants to search on two fields, one with a phrase
> >> > query and one with a "normal" query (split into words), one has to find
> >> > a way to send the query twice: once with quotes and once without...
> >> >
> >> > Best regards,
> >> > Elisabeth
> >> >
> >> >
> >> > 2013/9/27 Erick Erickson 
> >> >
> >> >> This is a classic issue where there's confusion between
> >> >> the query parser and field analysis.
> >> >>
> >> >> Early in the process the query parser has to take the input
> >> >> and break it up. That's how, for instance, a query like
> >> >> text:term1 term2
> >> >> gets parsed as
> >> >> text:term1 defaultfield:term2
> >> >> This happens long before the terms get to the analysis chain
> >> >> for the field.
> >> >>
> >> >> So your only options are to either quote the string or
> >> >> escape the spaces.
> >> >>
> >> >> Best,
> >> >> Erick
> >> >>
> >> >> On Wed, Sep 25, 2013 at 9:24 AM, elisabeth benoit
> >> >>  wrote:
> >> >> > Hello,
> >> >> >
> >> >> > I am using Solr 4.2.1 and I have an autocomplete_edge type defined in
> >> >> > schema.xml:
> >> >> >
> >> >> > <fieldType name="autocomplete_edge" class="solr.TextField">
> >> >> >   <analyzer type="index">
> >> >> >     <charFilter class="solr.MappingCharFilterFactory"
> >> >> >                 mapping="mapping-ISOLatin1Accent.txt"/>
> >> >> >     <tokenizer class="solr.KeywordTokenizerFactory"/>
> >> >> >     <filter class="solr.PatternReplaceFilterFactory"
> >> >> >             replacement=" " replace="all"/>
> >> >> >     <filter class="solr.EdgeNGramFilterFactory"
> >> >> >             minGramSize="1"/>
> >> >> >   </analyzer>
> >> >> >   <analyzer type="query">
> >> >> >     <charFilter class="solr.MappingCharFilterFactory"
> >> >> >                 mapping="mapping-ISOLatin1Accent.txt"/>
> >> >> >     <tokenizer class="solr.KeywordTokenizerFactory"/>
> >> >> >     <filter class="solr.PatternReplaceFilterFactory"
> >> >> >             replacement=" " replace="all"/>
> >> >> >     <filter class="solr.PatternReplaceFilterFactory"
> >> >> >             pattern="^(.{30})(.*)?" replacement="$1" replace="all"/>
> >> >> >   </analyzer>
> >> >> > </fieldType>
> >> >> >
> >> >> > When I have a request with more than one word, for instance "rue de
> >> >> > la", my request doesn't match on my autocomplete_edge field unless I
> >> >> > use quotes around the query. In other words, q=rue de la doesn't work
> >> >> > and q="rue de la" works.
> >> >> >
> >> >> > I've checked the request with debugQuery=on, and I can see that in
> >> >> > the first case the query is split into words, and I don't understand
> >> >> > why, since my field type uses KeywordTokenizerFactory.
> >> >> >
> >> >> > Does anyone have a clue on how I can request my field without using
> >> >> quotes?
> >> >> >
> >> >> > Thanks,
> >> >> > Elisabeth
> >> >>
> >>
>
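Erick's two options (quote the whole string, or escape the spaces so the query parser hands the input to the field analyzer as one term) can be sketched as follows. The helpers and the field name `myField` are hypothetical, just to show the two query shapes:

```python
def escape_spaces(query: str) -> str:
    """Backslash-escape spaces so the Lucene query parser keeps
    the whole input as a single term for the field analyzer."""
    return query.replace(" ", r"\ ")

def quote(query: str) -> str:
    """Wrap the input in double quotes to form a phrase query."""
    return '"%s"' % query

raw = "rue de la"
print("q=myField:" + escape_spaces(raw))   # q=myField:rue\ de\ la
print("q=myField:" + quote(raw))           # q=myField:"rue de la"
```

Either form reaches the KeywordTokenizer-based analyzer as one string; the unquoted, unescaped form is split by the query parser before analysis ever happens.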


documents are not committed distributively in solr cloud tomcat with core discovery, range is null for shards in clusterstate.json

2013-09-30 Thread Liu Bo
Hi all

I'm trying out the SolrCloud tutorial, and I've managed to write my own
plugin to import data from our set of databases. I use SolrWriter from the
DataImport package, and the docs are distributed to the shards on commit.

Everything works fine using Jetty from the Solr example, but when I move
to Tomcat, SolrCloud doesn't seem to be configured right: the documents are
only committed to the shard the update request goes to.

The cause is probably that the range is null for the shards in
clusterstate.json. The router is "implicit" instead of "compositeId" as well.

Is there anything missing or configured wrong in the following steps? How
can I fix it? Your help would be much appreciated.

PS: the SolrCloud Tomcat wiki page isn't up to date for 4.4 with core
discovery; I'm trying this out after reading the SolrCloud, SolrCloudJboss,
and CoreAdmin wiki pages.

Here's what I've done and some useful logs:

1. start three ZooKeeper servers.
2. upload configuration files to ZooKeeper; the collection name is
"content_collection"
3. start three Tomcat instances on three servers with core discovery

a) core.properties:
 name=content
 loadOnStartup=true
 transient=false
 shard=shard1   (different on each server)
 collection=content_collection
b) solr.xml
<solr>
  <solrcloud>
    <str name="host">${host:}</str>
    <str name="hostContext">${hostContext:solr}</str>
    <int name="hostPort">8080</int>
    <int name="zkClientTimeout">${zkClientTimeout:15000}</int>
    <str name="zkHost">10.199.46.176:2181,10.199.46.165:2181,10.199.46.158:2181</str>
    <bool name="genericCoreNodeNames">${genericCoreNodeNames:true}</bool>
  </solrcloud>
  <shardHandlerFactory name="shardHandlerFactory" class="HttpShardHandlerFactory">
    <int name="socketTimeout">${socketTimeout:0}</int>
    <int name="connTimeout">${connTimeout:0}</int>
  </shardHandlerFactory>
</solr>


4. In solr.log I can see the three shards are recognized, and SolrCloud can
see that content_collection has three shards as well.
5. I write documents to content_collection using my update request; the
documents are only committed to the shard the request goes to. In the log I
can see that DistributedUpdateProcessorFactory is in the processor chain and
the distributed commit is triggered:

INFO  - 2013-09-30 16:31:43.205;
com.microstrategy.alert.search.solr.plugin.index.handler.IndexRequestHandler;
updata request processor factories:

INFO  - 2013-09-30 16:31:43.206;
com.microstrategy.alert.search.solr.plugin.index.handler.IndexRequestHandler;
org.apache.solr.update.processor.LogUpdateProcessorFactory@4ae7b77

INFO  - 2013-09-30 16:31:43.207;
com.microstrategy.alert.search.solr.plugin.index.handler.IndexRequestHandler;
org.apache.solr.update.processor.*DistributedUpdateProcessorFactory*
@5b2bc407

INFO  - 2013-09-30 16:31:43.207;
com.microstrategy.alert.search.solr.plugin.index.handler.IndexRequestHandler;
org.apache.solr.update.processor.RunUpdateProcessorFactory@1652d654

INFO  - 2013-09-30 16:31:43.283; org.apache.solr.core.SolrDeletionPolicy;
SolrDeletionPolicy.onInit: commits: num=1


commit{dir=/home/bold/work/tomcat/solr/content/data/index,segFN=segments_1,generation=1}

INFO  - 2013-09-30 16:31:43.284; org.apache.solr.core.SolrDeletionPolicy;
newest commit generation = 1

INFO  - 2013-09-30 16:31:43.440; *org.apache.solr.update.SolrCmdDistributor;
Distrib commit to*:[StdNode: http://10.199.46.176:8080/solr/content/,
StdNode: http://10.199.46.165:8080/solr/content/]
params:commit_end_point=true&commit=true&softCommit=false&waitSearcher=true&expungeDeletes=false

but the documents won't go to the other shards; the other shards only get a
commit request with no documents:

INFO  - 2013-09-30 16:31:43.841;
org.apache.solr.update.DirectUpdateHandler2; start
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}

INFO  - 2013-09-30 16:31:43.855; org.apache.solr.core.SolrDeletionPolicy;
SolrDeletionPolicy.onInit: commits: num=1

commit{dir=/home/bold/work/tomcat/solr/content/data/index,segFN=segments_1,generation=1}

INFO  - 2013-09-30 16:31:43.855; org.apache.solr.core.SolrDeletionPolicy;
newest commit generation = 1

INFO  - 2013-09-30 16:31:43.856;
org.apache.solr.update.DirectUpdateHandler2; No uncommitted changes.
Skipping IW.commit.

INFO  - 2013-09-30 16:31:43.865; org.apache.solr.search.SolrIndexSearcher;
Opening Searcher@3c74c144 main

INFO  - 2013-09-30 16:31:43.869; org.apache.solr.core.QuerySenderListener;
QuerySenderListener sending requests to
Searcher@3c74c144main{StandardDirectoryReader(segments_1:1:nrt)}

INFO  - 2013-09-30 16:31:43.869; org.apache.solr.core.QuerySenderListener;
QuerySenderListener done.

INFO  - 2013-09-30 16:31:43.869; org.apache.solr.core.SolrCore; [content]
Registered new searcher
Searcher@3c74c144main{StandardDirectoryReader(segments_1:1:nrt)}

INFO  - 2013-09-30 16:31:43.870;
org.apache.solr.update.DirectUpdateHandler2; end_commit_flush

INFO  - 2013-09-30 16:31:43.870;
org.apache.solr.update.processor.LogUpdateProcessor; [content] webapp=/solr
path=/update
params={waitSearcher=true&commit=true&wt=javabin&expungeDeletes=false&commit_end_point=true&version=2&softCommit=false}
{commit=} 0 42

6) later I found the range is null in clusterstate.json which might have
ca
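A quick way to confirm the symptom described above is to pull clusterstate.json from ZooKeeper and check each shard's hash range. A minimal sketch; the embedded JSON fragment is illustrative, not taken from the original cluster:

```python
import json

# Illustrative clusterstate.json fragment showing the symptom: an
# "implicit" router and shards whose hash range is null.
clusterstate = json.loads("""
{
  "content_collection": {
    "router": "implicit",
    "shards": {
      "shard1": {"range": null},
      "shard2": {"range": null},
      "shard3": {"range": null}
    }
  }
}
""")

def shards_missing_ranges(state, collection):
    """Return the shard names whose hash range is null for a collection."""
    shards = state[collection]["shards"]
    return sorted(name for name, shard in shards.items()
                  if shard.get("range") is None)

print(shards_missing_ranges(clusterstate, "content_collection"))
# ['shard1', 'shard2', 'shard3']
print(clusterstate["content_collection"]["router"])   # implicit
```

With null ranges and an "implicit" router, Solr cannot hash-route documents across shards, which matches the behaviour of documents staying on the shard that received the update.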

solr 4.4 config trouble

2013-09-30 Thread Marc des Garets

Hi,

I'm running Solr in Tomcat. I am trying to upgrade to Solr 4.4 but I
can't get it to work. Could someone point out what I'm doing wrong?


tomcat context:
<Context crossContext="true">
  <Environment name="solr/home" type="java.lang.String"
               value="/opt/solr4.4/solr_address" override="true" />
</Context>




core.properties:
name=address
collection=address
coreNodeName=address
dataDir=/opt/indexes4.1/address


solr.xml:
<solr>
  <solrcloud>
    <str name="host">${host:}</str>
    <int name="hostPort">8080</int>
    <str name="hostContext">solr_address</str>
    <int name="zkClientTimeout">${zkClientTimeout:15000}</int>
    <bool name="genericCoreNodeNames">false</bool>
  </solrcloud>
  <shardHandlerFactory name="shardHandlerFactory" class="HttpShardHandlerFactory">
    <int name="socketTimeout">${socketTimeout:0}</int>
    <int name="connTimeout">${connTimeout:0}</int>
  </shardHandlerFactory>
</solr>


In solrconfig.xml I have:
<luceneMatchVersion>4.1</luceneMatchVersion>

<dataDir>/opt/indexes4.1/address</dataDir>


And the log4j logs in catalina.out:
...
INFO: Deploying configuration descriptor solr_address.xml
0 [main] INFO org.apache.solr.servlet.SolrDispatchFilter – 
SolrDispatchFilter.init()
24 [main] INFO org.apache.solr.core.SolrResourceLoader – Using JNDI 
solr.home: /opt/solr4.4/solr_address
26 [main] INFO org.apache.solr.core.SolrResourceLoader – new 
SolrResourceLoader for directory: '/opt/solr4.4/solr_address/'
176 [main] INFO org.apache.solr.core.ConfigSolr – Loading container 
configuration from /opt/solr4.4/solr_address/solr.xml
272 [main] INFO org.apache.solr.core.SolrCoreDiscoverer – Looking for 
cores in /opt/solr4.4/solr_address
276 [main] INFO org.apache.solr.core.SolrCoreDiscoverer – Looking for 
cores in /opt/solr4.4/solr_address/conf
276 [main] INFO org.apache.solr.core.SolrCoreDiscoverer – Looking for 
cores in /opt/solr4.4/solr_address/conf/xslt
277 [main] INFO org.apache.solr.core.SolrCoreDiscoverer – Looking for 
cores in /opt/solr4.4/solr_address/conf/lang
278 [main] INFO org.apache.solr.core.SolrCoreDiscoverer – Looking for 
cores in /opt/solr4.4/solr_address/conf/velocity
283 [main] INFO org.apache.solr.core.CoreContainer – New CoreContainer 
991552899
284 [main] INFO org.apache.solr.core.CoreContainer – Loading cores into 
CoreContainer [instanceDir=/opt/solr4.4/solr_address/]
301 [main] INFO 
org.apache.solr.handler.component.HttpShardHandlerFactory – Setting 
socketTimeout to: 0
301 [main] INFO 
org.apache.solr.handler.component.HttpShardHandlerFactory – Setting 
urlScheme to: http://
301 [main] INFO 
org.apache.solr.handler.component.HttpShardHandlerFactory – Setting 
connTimeout to: 0
302 [main] INFO 
org.apache.solr.handler.component.HttpShardHandlerFactory – Setting 
maxConnectionsPerHost to: 20
302 [main] INFO 
org.apache.solr.handler.component.HttpShardHandlerFactory – Setting 
corePoolSize to: 0
303 [main] INFO 
org.apache.solr.handler.component.HttpShardHandlerFactory – Setting 
maximumPoolSize to: 2147483647
303 [main] INFO 
org.apache.solr.handler.component.HttpShardHandlerFactory – Setting 
maxThreadIdleTime to: 5
303 [main] INFO 
org.apache.solr.handler.component.HttpShardHandlerFactory – Setting 
sizeOfQueue to: -1
303 [main] INFO 
org.apache.solr.handler.component.HttpShardHandlerFactory – Setting 
fairnessPolicy to: false
320 [main] INFO org.apache.solr.client.solrj.impl.HttpClientUtil – 
Creating new http client, 
config:maxConnectionsPerHost=20&maxConnections=1&socketTimeout=0&connTimeout=0&retry=false
420 [main] INFO org.apache.solr.logging.LogWatcher – Registering Log 
Listener [Log4j (org.slf4j.impl.Log4jLoggerFactory)]
422 [main] INFO org.apache.solr.core.ZkContainer – Zookeeper 
client=192.168.10.206:2181
429 [main] INFO org.apache.solr.client.solrj.impl.HttpClientUtil – 
Creating new http client, 
config:maxConnections=500&maxConnectionsPerHost=16&socketTimeout=0&connTimeout=0
487 [main] INFO org.apache.solr.common.cloud.ConnectionManager – Waiting 
for client to connect to ZooKeeper
540 [main-EventThread] INFO 
org.apache.solr.common.cloud.ConnectionManager – Watcher 
org.apache.solr.common.cloud.ConnectionManager@7dc21ece 
name:ZooKeeperConnection Watcher:192.168.10.206:2181 got event 
WatchedEvent state:SyncConnected type:None path:null path:null type:None
541 [main] INFO org.apache.solr.common.cloud.ConnectionManager – Client 
is connected to ZooKeeper
562 [main] INFO org.apache.solr.common.cloud.SolrZkClient – makePath: 
/overseer/queue
578 [main] INFO org.apache.solr.common.cloud.SolrZkClient – makePath: 
/overseer/collection-queue-work
591 [main] INFO org.apache.solr.common.cloud.SolrZkClient – makePath: 
/live_nodes
595 [main] INFO org.apache.solr.cloud.ZkController – Register node as 
live in ZooKeeper:/live_nodes/192.168.10.206:8080_solr_address
600 [main] INFO org.apache.solr.common.cloud.SolrZkClient – makePath: 
/live_nodes/192.168.10.206:8080_solr_address
606 [main] INFO org.apache.solr.common.cloud.SolrZkClient – makePath: 
/collections
613 [main] INFO org.apache.solr.common.cloud.SolrZkClient – makePath: 
/overseer_elect/election
649 [main] INFO org.apache.solr.common.cloud.SolrZkClient – makePath: 
/overseer_elect/leader
654 [main] INFO org.apache.solr.cloud.Overseer – Overseer 
(id=90474615489036288-192.168.10.206:8080_solr_address-n_00) 
starting
675 [main] INFO org.apache.solr.common.cloud.SolrZkClient – makePath: 
/overseer/queue-work
690 
[Overseer-90474615489036288-192.168.10.206:8080_solr_address-n_00] 
INFO org
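The SolrCoreDiscoverer lines above show 4.4's core discovery at work: the container walks the solr.home tree looking for core.properties files. A rough Python sketch of that walk (not the actual Java implementation; the layout it reads is whatever exists under the given directory):

```python
import os

def discover_cores(solr_home: str):
    """Walk solr_home and collect every directory containing a
    core.properties file, parsing it as simple key=value pairs --
    roughly what the SolrCoreDiscoverer log lines report."""
    cores = {}
    for dirpath, _dirnames, filenames in os.walk(solr_home):
        if "core.properties" in filenames:
            props = {}
            with open(os.path.join(dirpath, "core.properties")) as f:
                for line in f:
                    line = line.strip()
                    # Skip blanks and comments; split on the first '='.
                    if line and not line.startswith("#") and "=" in line:
                        key, _, value = line.partition("=")
                        props[key.strip()] = value.strip()
            cores[dirpath] = props
    return cores
```

One consequence visible in the logs above: discovery descends into conf/, conf/xslt, conf/lang, and conf/velocity, so a core.properties accidentally placed anywhere under solr.home will be picked up as a core.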

Solr takes too long to start up

2013-09-30 Thread Zenith
Hi all and thanks in advance for any help with this issue I am having...

Loading halts here: 

Sep 30, 2013 9:38:04 AM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener sending requests to Searcher@687de17d 
main{StandardDirectoryReader(segments_1k:268 _2r(4.2):C12590)}

Once I flush the index and repopulate it, it loads up normally. I suspect
the index is somehow getting corrupted.
I also get the following errors on startup (these are related to the Tomcat
admin pages, which I do not use, and Solr has run fine in the past with them):


INFO: QuerySenderListener sending requests to Searcher@252ac42e 
main{StandardDirectoryReader(segments_1k:268 _2r(4.2):C12590)}
Sep 30, 2013 9:52:13 AM org.apache.coyote.AbstractProtocol init
INFO: Initializing ProtocolHandler ["http-bio-8080"]
Sep 30, 2013 9:52:13 AM org.apache.catalina.startup.Catalina load
INFO: Initialization processed in 1018 ms
Sep 30, 2013 9:52:13 AM org.apache.catalina.core.StandardService startInternal
INFO: Starting service Catalina
Sep 30, 2013 9:52:13 AM org.apache.catalina.core.StandardEngine startInternal
INFO: Starting Servlet Engine: Apache Tomcat/7.0.30
Sep 30, 2013 9:52:13 AM org.apache.catalina.startup.HostConfig deployDescriptor
INFO: Deploying configuration descriptor 
/etc/tomcat7/Catalina/localhost/host-manager.xml
Sep 30, 2013 9:52:13 AM org.apache.catalina.core.StandardContext resourcesStart
SEVERE: Error starting static Resources
java.lang.IllegalArgumentException: Document base 
/usr/share/tomcat7-admin/host-manager does not exist or is not a readable 
directory
at 
org.apache.naming.resources.FileDirContext.setDocBase(FileDirContext.java:140)
at 
org.apache.catalina.core.StandardContext.resourcesStart(StandardContext.java:4906)
at 
org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5086)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at 
org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:901)
at 
org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:877)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:618)
at 
org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:650)
at 
org.apache.catalina.startup.HostConfig$DeployDescriptor.run(HostConfig.java:1582)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)

Sep 30, 2013 9:52:13 AM org.apache.catalina.core.StandardContext startInternal
SEVERE: Error in resourceStart()
Sep 30, 2013 9:52:13 AM org.apache.catalina.core.StandardContext startInternal
SEVERE: Error getConfigured
Sep 30, 2013 9:52:13 AM org.apache.catalina.core.StandardContext startInternal
SEVERE: Context [/host-manager] startup failed due to previous errors
Sep 30, 2013 9:52:13 AM org.apache.catalina.startup.HostConfig deployDescriptor
INFO: Deploying configuration descriptor 
/etc/tomcat7/Catalina/localhost/manager.xml
Sep 30, 2013 9:52:13 AM org.apache.catalina.core.StandardContext resourcesStart
SEVERE: Error starting static Resources
java.lang.IllegalArgumentException: Document base 
/usr/share/tomcat7-admin/manager does not exist or is not a readable directory
at 
org.apache.naming.resources.FileDirContext.setDocBase(FileDirContext.java:140)
at 
org.apache.catalina.core.StandardContext.resourcesStart(StandardContext.java:4906)
at 
org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5086)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at 
org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:901)
at 
org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:877)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:618)
at 
org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:650)
at 
org.apache.catalina.startup.HostConfig$DeployDescriptor.run(HostConfig.java:1582)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)

Sep 30, 2013 9:52:13 AM org.apache.catalina.core.StandardContext startInternal
SEVERE: Error in resou

cookies sent by solrj to SOLR

2013-09-30 Thread Dmitry Kan
Hello!

We have recorded the TCP stream between a client using SolrJ to send
requests to Solr and arrived at the following header (body omitted):




POST /solr/core0/select HTTP/1.1

Content-Charset: UTF-8

Content-Type: application/x-www-form-urlencoded; charset=UTF-8

User-Agent: Solr[org.apache.solr.client.solrj.impl.HttpSolrServer] 1.0

Content-Length: 6163

Host: host:port

Connection: Keep-Alive

Cookie: visited=yes

Cookie2: $Version=1


Can someone please explain what effect these two cookies have on the
frontend Solr that has shards underneath it? This is Solr 4.3.1.

Thanks,

Dmitry


Re: Solr takes too long to start up

2013-09-30 Thread Zenith
As a follow-up, it looks like it is related to this thread:
http://lucene.472066.n3.nabble.com/spellcheck-causing-Core-Reload-to-hang-td4089866.html

Disabling spellcheck gave a normal restart. 
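If the hang comes from the spell checker rebuilding its dictionary while the first searcher is warming, a common mitigation (short of disabling spellcheck entirely) is to make sure automatic builds are off in solrconfig.xml. A sketch based on the stock example config; the component and dictionary names here are illustrative:

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <!-- Avoid rebuilding the spellcheck index during commits/startup warming -->
    <str name="buildOnCommit">false</str>
    <str name="buildOnOptimize">false</str>
  </lst>
</searchComponent>
```

The dictionary can then be built explicitly at a quiet time with a request such as spellcheck.build=true on the relevant handler.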




OpenJDK or OracleJDK

2013-09-30 Thread Raheel Hasan
Hi guys,

I am trying to setup a server.

Could someone tell me whether OpenJDK or OracleJDK would be best for Apache
Solr on CentOS?

Thanks a lot.

-- 
Regards,
Raheel Hasan


Re: solr 4.4 config trouble

2013-09-30 Thread Siegfried Goeschl
Hi Marc,

what exactly is not working? I see no obvious problems in the logs.

Cheers,

Siegfried Goeschl


Re: solr 4.4 config trouble

2013-09-30 Thread Kishan Parmar
http://www.coretechnologies.com/products/AlwaysUp/Apps/RunApacheSolrAsAService.html

Regards,

Kishan Parmar
Software Developer
+91 95 100 77394
Jay Shree Krishnaa !!



> > 420 [main] INFO org.apache.solr.logging.LogWatcher – Registering Log
> Listener [Log4j (org.slf4j.impl.Log4jLoggerFactory)]
> > 422 [main] INFO org.apache.solr.core.ZkContainer – Zookeeper client=
> 192.168.10.206:2181
> > 429 [main] INFO org.apache.solr.client.solrj.impl.HttpClientUtil –
> Creating new http client,
> config:maxConnections=500&maxConnectionsPerHost=16&socketTimeout=0&connTimeout=0
> > 487 [main] INFO org.apache.solr.common.cloud.ConnectionManager – Waiting
> for client to connect to ZooKeeper
> > 540 [main-EventThread] INFO
> org.apache.solr.common.cloud.ConnectionManager – Watcher
> org.apache.solr.common.cloud.ConnectionManager@7dc21ecename:ZooKeeperConnection
>  Watcher:
> 192.168.10.206:2181 got event WatchedEvent state:SyncConnected type:None
> path:null path:null type:None
> > 541 [main] INFO org.apache.solr.common.cloud.ConnectionManager – Client
> is connected to ZooKeeper
> > 562 [main] INFO org.apache.solr.common.cloud.SolrZkClient – makePath:
> /overseer/queue
> > 578 [main] INFO org.apache.solr.common.cloud.SolrZkClient – makePath:
> /overseer/collection-queue-work
> > 591 [main] INFO org.apache.solr.common.cloud.SolrZkClient – makePath:
> /live_nodes
> > 595 [main] INFO org.apache.solr.cloud.ZkController – Register nod

AW: Re: solr 4.4 config trouble

2013-09-30 Thread sgoeschl
Not sure if you are doing your company a favour ;-)

Cheers

Siegfried Goeschl


Sent from my Samsung Mobile

 Original message 
From: Kishan Parmar  
Date:  
To: solr-user@lucene.apache.org 
Subject: Re: solr 4.4 config trouble 
 
http://www.coretechnologies.com/products/AlwaysUp/Apps/RunApacheSolrAsAService.html

Regards,

Kishan Parmar
Software Developer
+91 95 100 77394
Jay Shree Krishnaa !!



On Mon, Sep 30, 2013 at 5:33 AM, Siegfried Goeschl  wrote:

> Hi Marc,
>
> what exactly is not working - no obvious problems in the logs as far as I see
>
> Cheers,
>
> Siegfried Goeschl
>
> On 30.09.2013 at 11:44, Marc des Garets wrote :
>
> > Hi,
> >
> > I'm running solr in tomcat. I am trying to upgrade to solr 4.4 but I
> can't get it to work. If someone can point me at what I'm doing wrong.
> >
> > tomcat context:
> >  crossContext="true">
> >  value="/opt/solr4.4/solr_address" override="true" />
> > 
> >
> >
> > core.properties:
> > name=address
> > collection=address
> > coreNodeName=address
> > dataDir=/opt/indexes4.1/address
> >
> >
> > solr.xml:
> > 
> > 
> > 
> > ${host:}
> > 8080
> > solr_address
> > ${zkClientTimeout:15000}
> > false
> > 
> >
> >  > class="HttpShardHandlerFactory">
> > ${socketTimeout:0}
> > ${connTimeout:0}
> > 
> > 
> >
> >
> > In solrconfig.xml I have:
> > 4.1
> >
> > /opt/indexes4.1/address
> >
> >
> > And the log4j logs in catalina.out:
> > ...
> > INFO: Deploying configuration descriptor solr_address.xml
> > 0 [main] INFO org.apache.solr.servlet.SolrDispatchFilter –
> SolrDispatchFilter.init()
> > 24 [main] INFO org.apache.solr.core.SolrResourceLoader – Using JNDI
> solr.home: /opt/solr4.4/solr_address
> > 26 [main] INFO org.apache.solr.core.SolrResourceLoader – new
> SolrResourceLoader for directory: '/opt/solr4.4/solr_address/'
> > 176 [main] INFO org.apache.solr.core.ConfigSolr – Loading container
> configuration from /opt/solr4.4/solr_address/solr.xml
> > 272 [main] INFO org.apache.solr.core.SolrCoreDiscoverer – Looking for
> cores in /opt/solr4.4/solr_address
> > 276 [main] INFO org.apache.solr.core.SolrCoreDiscoverer – Looking for
> cores in /opt/solr4.4/solr_address/conf
> > 276 [main] INFO org.apache.solr.core.SolrCoreDiscoverer – Looking for
> cores in /opt/solr4.4/solr_address/conf/xslt
> > 277 [main] INFO org.apache.solr.core.SolrCoreDiscoverer – Looking for
> cores in /opt/solr4.4/solr_address/conf/lang
> > 278 [main] INFO org.apache.solr.core.SolrCoreDiscoverer – Looking for
> cores in /opt/solr4.4/solr_address/conf/velocity
> > 283 [main] INFO org.apache.solr.core.CoreContainer – New CoreContainer
> 991552899
> > 284 [main] INFO org.apache.solr.core.CoreContainer – Loading cores into
> CoreContainer [instanceDir=/opt/solr4.4/solr_address/]
> > 301 [main] INFO
> org.apache.solr.handler.component.HttpShardHandlerFactory – Setting
> socketTimeout to: 0
> > 301 [main] INFO
> org.apache.solr.handler.component.HttpShardHandlerFactory – Setting
> urlScheme to: http://
> > 301 [main] INFO
> org.apache.solr.handler.component.HttpShardHandlerFactory – Setting
> connTimeout to: 0
> > 302 [main] INFO
> org.apache.solr.handler.component.HttpShardHandlerFactory – Setting
> maxConnectionsPerHost to: 20
> > 302 [main] INFO
> org.apache.solr.handler.component.HttpShardHandlerFactory – Setting
> corePoolSize to: 0
> > 303 [main] INFO
> org.apache.solr.handler.component.HttpShardHandlerFactory – Setting
> maximumPoolSize to: 2147483647
> > 303 [main] INFO
> org.apache.solr.handler.component.HttpShardHandlerFactory – Setting
> maxThreadIdleTime to: 5
> > 303 [main] INFO
> org.apache.solr.handler.component.HttpShardHandlerFactory – Setting
> sizeOfQueue to: -1
> > 303 [main] INFO
> org.apache.solr.handler.component.HttpShardHandlerFactory – Setting
> fairnessPolicy to: false
> > 320 [main] INFO org.apache.solr.client.solrj.impl.HttpClientUtil –
> Creating new http client,
> config:maxConnectionsPerHost=20&maxConnections=1&socketTimeout=0&connTimeout=0&retry=false
> > 420 [main] INFO org.apache.solr.logging.LogWatcher – Registering Log
> Listener [Log4j (org.slf4j.impl.Log4jLoggerFactory)]
> > 422 [main] INFO org.apache.solr.core.ZkContainer – Zookeeper client=
> 192.168.10.206:2181
> > 429 [main] INFO org.apache.solr.client.solrj.impl.HttpClientUtil –
> Creating new http client,
> config:maxConnections=500&maxConnectionsPerHost=16&socketTimeout=0&connTimeout=0
> > 487 [main] INFO org.apache.solr.common.cloud.ConnectionManager – Waiting
> for client to connect to ZooKeeper
> > 540 [main-EventThread] INFO
> org.apache.solr.common.cloud.ConnectionManager – Watcher
> org.apache.solr.common.cloud.ConnectionManager@7dc21ecename:ZooKeeperConnection
>  Watcher:
> 192.168.10.206:2181 got event WatchedEvent state:SyncConnected type:None
> path:null path:null type:None
> > 541 [main] INFO org.apache.solr.common.cloud.ConnectionManager – Client
> is connected to ZooKeeper
> > 562 [main] INFO org.apache.solr.common.cloud.SolrZkClient – makePath:
> /overseer/queue
> > 578 [m

Re: OpenJDK or OracleJDK

2013-09-30 Thread Bram Van Dam

On 09/30/2013 01:11 PM, Raheel Hasan wrote:

Could someone tell me if OpenJDK or OracleJDK will be best for Apache Solr
over CentOS?


If you're using Java 7 (or 8) then it doesn't matter. If you're using 
Java 6, stick with the Oracle version.




Re: Solr sorting situation!

2013-09-30 Thread Gustav
Anyone with any ideas?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-sorting-situation-tp4091966p4092688.html
Sent from the Solr - User mailing list archive at Nabble.com.


filterCache stats reported wrongly in solr admin?

2013-09-30 Thread Dmitry Kan
Hi!

Can it really be so that filterCache size is 63, inserts 103 and zero
evictions?

Is this a bug or am I misinterpreting the stats?

http://pasteboard.co/9Dmkc4H.png

Thanks,

Dmitry


Re: Atomic updates with solr cloud in solr 4.4

2013-09-30 Thread Sesha Sendhil Subramanian
The field variant_count is stored and is not the target of a copyField.


However I did notice that we were setting the same coreNodeName on both the
shards in core.properties. Removing this property fixed the issue and
updates succeed.

What role does this play in handling updates and why were other queries
using the select handler not failing?

Thanks
Sesha
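For anyone hitting the same symptom: presumably two cores registering under one coreNodeName look like a single replica to SolrCloud, which would confuse distributed update routing while plain /select queries still reach a valid core. A sketch of distinct per-core property files (all values illustrative, not taken from Sesha's setup):

```properties
# core.properties for the shard1 replica (illustrative names)
name=search_shard1_replica1
collection=search
shard=shard1
# Either give every core a unique coreNodeName, or simply omit the
# property and let Solr assign one automatically.
```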


On Sat, Sep 21, 2013 at 7:59 PM, Yonik Seeley  wrote:

> I can't reproduce this.
> I tried starting up a 2 shard cluster and then followed the example here:
> http://yonik.com/solr/atomic-updates/
>
> "book1" was on shard2 (port 7574) and everything still worked fine.
>
> > missing required field: variant_count
>
> Perhaps the problem is document specific... What can you say about
> this variant_count field?
> Is it stored?  Is it the target of a copyField?
>
>
> -Yonik
> http://lucidworks.com
>
>
>
>
> On Tue, Sep 17, 2013 at 12:56 PM, Sesha Sendhil Subramanian
>  wrote:
> > curl http://localhost:8983/solr/search/update -H
> > 'Content-type:application/json' -d '
> > [
> >  {
> >   "id":
> > "c8cce27c1d8129d733a3df3de68dd675!c8cce27c1d8129d733a3df3de68dd675",
> >   "link_id_45454" : {"set":"abcdegff"}
> >  }
> > ]'
> >
> > I have two collections search and meta. I want to do an update in the
> > search collection.
> > If i pick a document in same shard : localhost:8983, the update succeeds
> >
> > 15350327 [qtp386373885-19] INFO
> >  org.apache.solr.update.processor.LogUpdateProcessor  ? [search]
> > webapp=/solr path=/update params={}
> > {add=[6cfcb56ca52b56ccb1377a7f0842e74d!6cfcb56ca52b56ccb1377a7f0842e74d
> > (1446444025873694720)]} 0 5
> >
> > If i pick a document on a different shard : localhost:7574, the update
> fails
> >
> > 15438547 [qtp386373885-75] INFO
> >  org.apache.solr.update.processor.LogUpdateProcessor  ? [search]
> > webapp=/solr path=/update params={} {} 0 1
> > 15438548 [qtp386373885-75] ERROR org.apache.solr.core.SolrCore  ?
> > org.apache.solr.common.SolrException:
> > [doc=c8cce27c1d8129d733a3df3de68dd675!c8cce27c1d8129d733a3df3de68dd675]
> > missing required field: variant_count
> > at
> >
> org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:189)
> > at
> >
> org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:73)
> > at
> >
> org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:210)
> > at
> >
> org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
> > at
> >
> org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
> > at
> >
> org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:556)
> > at
> >
> org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:692)
> > at
> >
> org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:435)
> > at
> >
> org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
> > at
> >
> org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.handleAdds(JsonLoader.java:392)
> > at
> >
> org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.processUpdate(JsonLoader.java:117)
> > at
> >
> org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.load(JsonLoader.java:101)
> > at org.apache.solr.handler.loader.JsonLoader.load(JsonLoader.java:65)
> > at
> >
> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
> > at
> >
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> > at
> >
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
> > at
> >
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
> > at
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
> > at
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
> > at
> >
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
> > at
> >
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
> > at
> >
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
> > at
> >
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
> > at
> >
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
> > at
> >
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
> > at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
> > at
> >
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
> > at
> >
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
> > at
> >
> or

Re: filterCache stats reported wrongly in solr admin?

2013-09-30 Thread Dmitry Kan
Looking at the code reveals that the put (insert) operation increments the
counter regardless of duplicates. The size reflects unique entries only.

Thanks to ehatcher for the hint.

Dmitry
On 30 Sep 2013 16:23, "Dmitry Kan"  wrote:

> Hi!
>
> Can it really be so that filterCache size is 63, inserts 103 and zero
> evictions?
>
> Is this a bug or am I misinterpreting the stats?
>
> http://pasteboard.co/9Dmkc4H.png
>
> Thanks,
>
> Dmitry
>
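For the record, the stat semantics can be shown with a toy cache model (an illustration only, not Solr's actual cache implementation): the inserts counter is bumped on every put, including puts of a key that is already cached, while size counts distinct entries, so inserts can exceed size with zero evictions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of a cache whose "inserts" stat counts every put,
 *  while "size" reports only the distinct keys currently held. */
public class CacheStats {
    private final Map<String, Object> map = new LinkedHashMap<>();
    private long inserts = 0;
    private long evictions = 0;

    public void put(String key, Object value) {
        inserts++;           // incremented even when the key is already present
        map.put(key, value); // duplicate key overwrites; size is unchanged
    }

    public long inserts()   { return inserts; }
    public long size()      { return map.size(); }
    public long evictions() { return evictions; }

    public static void main(String[] args) {
        CacheStats cache = new CacheStats();
        // 63 distinct filters, 40 of which get inserted a second time
        for (int i = 0; i < 63; i++) cache.put("fq" + i, new Object());
        for (int i = 0; i < 40; i++) cache.put("fq" + (i % 63), new Object());
        System.out.println("size=" + cache.size()
                + " inserts=" + cache.inserts()
                + " evictions=" + cache.evictions());
        // prints: size=63 inserts=103 evictions=0
    }
}
```

This reproduces exactly the numbers from the screenshot: size 63, inserts 103, evictions 0.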


Re: Solr Autocomplete with "did you means" functionality handle misspell word like google

2013-09-30 Thread Alessandro Benedetti
It's really simple indeed. Solr provides the SpellCheck[1] feature that
allows you to do this.
You only have to configure the RequestHandler and the Search Component.
And of course develop a simple UI (you can find an example in the Velocity
response handler, Solritas[2]).

Cheers

[1] https://cwiki.apache.org/confluence/display/solr/Spell+Checking
[2] https://cwiki.apache.org/confluence/display/solr/Velocity+Search+UI
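A minimal solrconfig.xml sketch of those two pieces (the field name and the handler path are assumptions, adjust them to your schema):

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <!-- field to draw suggestions from: an assumption, use your own -->
    <str name="field">text</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">on</str>
    <!-- collate builds a rewritten query, e.g. cmputer => computer -->
    <str name="spellcheck.collate">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```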


2013/9/27 Otis Gospodnetic 

> Hi,
>
> Not sure if Solr suggester can do this (can it, anyone?), but...
> shameless plug... I know
> http://sematext.com/products/autocomplete/index.html can do that.
>
> Otis
> --
> Solr & ElasticSearch Support -- http://sematext.com/
> Performance Monitoring -- http://sematext.com/spm
>
>
>
> On Thu, Sep 26, 2013 at 8:26 AM, Suneel Pandey 
> wrote:
> > 
> >
> > Hi,
> >
> > I have implemented autocomplete and it's working fine, but I want to
> implement
> > autosuggestion like Google (see above screen): when someone types a
> misspelled
> > word, a suggestion should be shown, e.g. cmputer => computer.
> >
> >
> > Please help me.
> >
> >
> >
> >
> >
> >
> > -
> > Regards,
> >
> > Suneel Pandey
> > Sr. Software Developer
> > --
> > View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-Autocomplete-with-did-you-means-functionality-handle-misspell-word-like-google-tp4092127.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Solr and jvm Garbage Collection tuning

2013-09-30 Thread Alessandro Benedetti
I think this could help : http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning

Cheers


2013/9/27 ewinclub7 

>
> ด้วยที่แทงบอลแบบออนไลน์กำลังมาแรงทำให้พวกโต๊ะบอลเดี๋ยวนี้ก็เริ่มขยับขยายมาเปิดรับแทงบอลออนไลน์เอง
> download goldclub 
> เป้าหมายหลักในวิธีการเล่นคาสิโนนั้น มีเพื่อความเพลิดเพลินหรือความสนุก
> ไม่ใช่เพื่อมาหาเงินหรือหวังที่จะรวย
> เพราะนั้นเหมือนกับการที่เราเอาจิตใจของตนเองไปผูกติดกับวิธีการเล่นพนัน
> โปรโมชั่น goldclub slot    เล่นสนุก
> เล่นง่าย พร้อมบริการอย่างเป็นกันเอง กับทีมงาน  ผลบอลเมื่อคืนนี้
> 
>
> จากที่เราได้เห็นวิธีการเล่นการพนันที่เล่นกันง่ายนั่นก็เลยทำให้คนเรานั่นเกิดความคิดที่อยากจะลองเล่นการพนันลองดู
> สาเหตุที่ทำให้นักเล่นหน้าใหม่ได้หัดเล่นเกมส์ซะเป็นส่วนใหญ่  goldclub slot
> 
>
> เพราะแน่นอนว่าจากที่เคยไปเที่ยวประเทศไหนที่มีคาสิโนและเข้าไปลองเล่นดูก็คงจะได้สัมผัสถึงความคึกคักของคาสิโนนั้นๆ
> ถอนออกมาทั้ง 1200 บาทเลยก็ได้ หรือจะถอนมาแค่ 1000 บาท อีก 200 บาทเก็บไว้
> เล่นอีก แบบนี้ก็ได้เล่นกัน
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-and-jvm-Garbage-Collection-tuning-tp1455467p4092328.html
> Sent from the Solr - User mailing list archive at Nabble.com.




-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: OpenJDK or OracleJDK

2013-09-30 Thread Raheel Hasan
Hmm, why is that so?
Isn't Oracle's version a bit slow?


On Mon, Sep 30, 2013 at 5:56 PM, Bram Van Dam  wrote:

> On 09/30/2013 01:11 PM, Raheel Hasan wrote:
>
>> Could someone tell me if OpenJDK or OracleJDK will be best for Apache Solr
>> over CentOS?
>>
>
> If you're using Java 7 (or 8) then it doesn't matter. If you're using Java
> 6, stick with the Oracle version.
>
>


-- 
Regards,
Raheel Hasan


Re: XPathEntityProcessor nested in TikaEntityProcessor query null exception

2013-09-30 Thread P Williams
Hi Andreas,

When using XPathEntityProcessor your DataSource must be of type
DataSource<Reader>. You shouldn't be using BinURLDataSource, it's giving
you the cast exception. Use URLDataSource or FileDataSource instead.

I don't think you need to specify namespaces, at least you didn't used to.
 The other thing that I've noticed is that the anywhere xpath expression //
doesn't always work in DIH.  You might have to be more specific.

Cheers,
Tricia
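A minimal sketch of that setup, with URLDataSource feeding an XPathEntityProcessor (the URL and xpath values are placeholders; the processor parses the stream as XML, so the page must be well-formed XHTML):

```xml
<dataConfig>
  <!-- URLDataSource yields a Reader, which XPathEntityProcessor requires -->
  <dataSource name="dataUrl" type="URLDataSource" encoding="UTF-8"/>
  <document>
    <entity name="page" processor="XPathEntityProcessor"
            dataSource="dataUrl"
            url="http://example.com/page.html"
            forEach="/html">
      <field column="title" xpath="/html/head/title"/>
      <field column="body"  xpath="/html/body"/>
    </entity>
  </document>
</dataConfig>
```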





On Sun, Sep 29, 2013 at 9:47 AM, Andreas Owen  wrote:

> how dumb can you get? Obviously quite dumb... I would have to analyze the
> HTML pages with a nested instance like this:
>
>  url="file:///C:\ColdFusion10\cfusion\solr\solr\tkbintranet\docImportUrl.xml"
> forEach="/docs/doc" dataSource="main">
>
>  url="${rec.urlParse}" forEach="/xhtml:html" dataSource="dataUrl">
> 
> 
> 
> 
> 
> 
>
> but I'm pretty sure the forEach and the xpath expressions are wrong. At the
> moment I'm getting the following error:
>
> Caused by: java.lang.RuntimeException:
> org.apache.solr.handler.dataimport.DataImportHandlerException:
> java.lang.ClassCastException:
> sun.net.www.protocol.http.HttpURLConnection$HttpInputStream cannot be cast
> to java.io.Reader
>
>
>
>
>
> On 28. Sep 2013, at 1:39 AM, Andreas Owen wrote:
>
> > OK, I see what you're getting at, but why doesn't the following work:
> >
> >   
> >   
> >
> > I removed the Tika processor. What am I missing? I haven't found
> anything in the wiki.
> >
> >
> > On 28. Sep 2013, at 12:28 AM, P Williams wrote:
> >
> >> I spent some more time thinking about this.  Do you really need to use
> the
> >> TikaEntityProcessor?  It doesn't offer anything new to the document you
> are
> >> building that couldn't be accomplished by the XPathEntityProcessor alone
> >> from what I can tell.
> >>
> >> I also tried to get the Advanced
> >> Parsingexample to
> >> work without success.  There are some obvious typos (
> >> instead of ) and an odd order to the pieces ( is
> >> enclosed by ).  It also looks like
> >> FieldStreamDataSource<
> http://lucene.apache.org/solr/4_3_1/solr-dataimporthandler/org/apache/solr/handler/dataimport/FieldStreamDataSource.html
> >is
> >> the one that is meant to work in this context. If Koji is still around
> >> maybe he could offer some help?  Otherwise this bit of erroneous
> >> instruction should probably be removed from the wiki.
> >>
> >> Cheers,
> >> Tricia
> >>
> >> $ svn diff
> >> Index:
> >>
> solr/contrib/dataimporthandler-extras/src/test/org/apache/solr/handler/dataimport/TestTikaEntityProcessor.java
> >> ===
> >> ---
> >>
> solr/contrib/dataimporthandler-extras/src/test/org/apache/solr/handler/dataimport/TestTikaEntityProcessor.java
> >>(revision 1526990)
> >> +++
> >>
> solr/contrib/dataimporthandler-extras/src/test/org/apache/solr/handler/dataimport/TestTikaEntityProcessor.java
> >>(working copy)
> >> @@ -99,13 +99,13 @@
> >>runFullImport(getConfigHTML("identity"));
> >>assertQ(req("*:*"), testsHTMLIdentity);
> >>  }
> >> -
> >> +
> >>  private String getConfigHTML(String htmlMapper) {
> >>return
> >>"" +
> >>"  " +
> >>"  " +
> >> -" >> processor='TikaEntityProcessor' " +
> >> +" >> processor='TikaEntityProcessor' " +
> >>"   url='" +
> >> getFile("dihextras/structured.html").getAbsolutePath() + "' " +
> >>((htmlMapper == null) ? "" : (" htmlMapper='" + htmlMapper +
> >> "'")) + ">" +
> >>"  " +
> >> @@ -114,4 +114,36 @@
> >>"";
> >>
> >>  }
> >> +  private String[] testsHTMLH1 = {
> >> +  "//*[@numFound='1']"
> >> +  , "//str[@name='h1'][contains(.,'H1 Header')]"
> >> +  };
> >> +
> >> +  @Test
> >> +  public void testTikaHTMLMapperSubEntity() throws Exception {
> >> +runFullImport(getConfigSubEntity("identity"));
> >> +assertQ(req("*:*"), testsHTMLH1);
> >> +  }
> >> +
> >> +  private String getConfigSubEntity(String htmlMapper) {
> >> +return
> >> +"" +
> >> +"" +
> >> +"" +
> >> +"" +
> >> +" >> dataSource='bin' format='html' rootEntity='false'>" +
> >> +"" +
> >> +"" +
> >> +"" +
> >> +"" +
> >> +"" +
> >> +" forEach='/html'
> >> dataSource='fld' dataField='tika.text' rootEntity='true' >" +
> >> +"" +
> >> +"" +
> >> +"" +
> >> +"" +
> >> +  

Re: Cross index join query performance

2013-09-30 Thread Peter Keegan
Ah, got it now - thanks for the explanation.


On Sat, Sep 28, 2013 at 3:33 AM, Upayavira  wrote:

> The thing here is to understand how a join works.
>
> Effectively, it does the inner query first, which results in a list of
> terms. It then effectively does a multi-term query with those values.
>
> q=size:large {!join fromIndex=other from=someid
> to=someotherid}type:shirt
>
> Imagine the inner join returned values A,B,C. Your inner query is, on
> core 'other', q=type:shirt&fl=someid.
>
> Then your outer query becomes size:large someotherid:(A B C)
>
> Your inner query returns 25k values. You're having to do a multi-term
> query for 25k terms. That is *bound* to be slow.
>
> The pseudo-joins in Solr 4.x are intended for a small to medium number
> of values returned by the inner query, otherwise performance degrades as
> you are seeing.
>
> Is there a way you can reduce the number of values returned by the inner
> query?
>
> As Joel mentions, those other joins are attempts to find other ways to
> work with this limitation.
>
> Upayavira
>
> On Fri, Sep 27, 2013, at 09:44 PM, Peter Keegan wrote:
> > Hi Joel,
> >
> > I tried this patch and it is quite a bit faster. Using the same query on
> > a
> > larger index (500K docs), the 'join' QTime was 1500 msec, and the 'hjoin'
> > QTime was 100 msec! This was for true for large and small result sets.
> >
> > A few notes: the patch didn't compile with 4.3 because of the
> > SolrCore.getLatestSchema call (which I worked around), and the package
> > name
> > should be:
> >  > class="org.apache.solr.search.joins.HashSetJoinQParserPlugin"/>
> >
> > Unfortunately, I just learned that our uniqueKey may have to be an
> > alphanumeric string instead of an int, so I'm not out of the woods yet.
> >
> > Good stuff - thanks.
> >
> > Peter
> >
> >
> > On Thu, Sep 26, 2013 at 6:49 PM, Joel Bernstein 
> > wrote:
> >
> > > It looks like you are using int join keys so you may want to check out
> > > SOLR-4787, specifically the hjoin and bjoin.
> > >
> > > These perform well when you have a large number of results from the
> > > fromIndex. If you have a small number of results in the fromIndex the
> > > standard join will be faster.
> > >
> > >
> > > On Wed, Sep 25, 2013 at 3:39 PM, Peter Keegan  > > >wrote:
> > >
> > > > I forgot to mention - this is Solr 4.3
> > > >
> > > > Peter
> > > >
> > > >
> > > >
> > > > On Wed, Sep 25, 2013 at 3:38 PM, Peter Keegan <
> peterlkee...@gmail.com
> > > > >wrote:
> > > >
> > > > > I'm doing a cross-core join query and the join query is 30X slower
> than
> > > > > each of the 2 individual queries. Here are the queries:
> > > > >
> > > > > Main query:
> http://localhost:8983/solr/mainindex/select?q=title:java
> > > > > QTime: 5 msec
> > > > > hit count: 1000
> > > > >
> > > > > Sub query: http://localhost:8983/solr/subindex/select?q=+fld1:[0.1TO
> > > > 0.3]
> > > > > QTime: 4 msec
> > > > > hit count: 25K
> > > > >
> > > > > Join query:
> > > > >
> > > >
> > >
> http://localhost:8983/solr/mainindex/select?q=title:java&fq={!joinfromIndex=mainindextoIndex=subindexfrom=docidto=docid}fld1:[0.1
>  TO 0.3]
> > > > > QTime: 160 msec
> > > > > hit count: 205
> > > > >
> > > > > Here are the index spec's:
> > > > >
> > > > > mainindex size: 117K docs, 1 segment
> > > > > mainindex schema:
> > > > > > > > > required="true" multiValued="false" />
> > > > > > > > > stored="true" multiValued="false" />
> > > > >docid
> > > > >
> > > > > subindex size: 117K docs, 1 segment
> > > > > subindex schema:
> > > > > > > > > required="true" multiValued="false" />
> > > > > > > > > required="false" multiValued="false" />
> > > > >docid
> > > > >
> > > > > With debugQuery=true I see:
> > > > >   "debug":{
> > > > > "join":{
> > > > >   "{!join from=docid to=docid fromIndex=subindex}fld1:[0.1 TO
> > > 0.3]":{
> > > > > "time":155,
> > > > > "fromSetSize":24742,
> > > > > "toSetSize":24742,
> > > > > "fromTermCount":117810,
> > > > > "fromTermTotalDf":117810,
> > > > > "fromTermDirectCount":117810,
> > > > > "fromTermHits":24742,
> > > > > "fromTermHitsTotalDf":24742,
> > > > > "toTermHits":24742,
> > > > > "toTermHitsTotalDf":24742,
> > > > > "toTermDirectCount":24627,
> > > > > "smallSetsDeferred":115,
> > > > > "toSetDocsAdded":24742}},
> > > > >
> > > > > Via profiler and debugger, I see 150 msec spent in the outer
> > > > > 'while(term!=null)' loop in: JoinQueryWeight.getDocSet(). This
> seems
> > > > like a
> > > > > lot of time to join the bitsets. Does this seem right?
> > > > >
> > > > > Peter
> > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Joel Bernstein
> > > Professional Services LucidWorks
> > >
>
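The expansion Upayavira describes can be sketched in miniature (hypothetical data and field values, not Solr's actual JoinQueryWeight code): run the inner query, collect the join-key terms, then match outer documents whose join field holds one of those terms.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Naive model of {!join fromIndex=other from=someid to=someotherid}type:shirt */
public class JoinSketch {

    /** Inner query on core "other": q=type:shirt&fl=someid */
    public static Set<String> innerKeys(List<Map<String, String>> otherIndex) {
        Set<String> keys = new HashSet<>();
        for (Map<String, String> doc : otherIndex)
            if ("shirt".equals(doc.get("type")))
                keys.add(doc.get("someid"));
        return keys;
    }

    /** Outer query effectively becomes: size:large AND someotherid:(A B C ...) */
    public static List<Map<String, String>> join(List<Map<String, String>> mainIndex,
                                                 Set<String> keys) {
        List<Map<String, String>> hits = new ArrayList<>();
        for (Map<String, String> doc : mainIndex)
            if ("large".equals(doc.get("size")) && keys.contains(doc.get("someotherid")))
                hits.add(doc);
        return hits;
    }

    public static void main(String[] args) {
        List<Map<String, String>> other = List.of(
                Map.of("someid", "A", "type", "shirt"),
                Map.of("someid", "B", "type", "shirt"),
                Map.of("someid", "C", "type", "hat"));
        List<Map<String, String>> mainIndex = List.of(
                Map.of("someotherid", "A", "size", "large"),
                Map.of("someotherid", "B", "size", "small"),
                Map.of("someotherid", "C", "size", "large"));
        // Inner query yields keys {A, B}; only the size:large doc with key A joins.
        System.out.println(join(mainIndex, innerKeys(other)));
    }
}
```

With 25k keys coming back from the inner side, the outer match effectively becomes a 25k-term query, which is why performance degrades as the inner result set grows.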


Considerations about setting maxMergedSegmentMB

2013-09-30 Thread Isaac Hebsh
Hi,
Trying to solve a query performance issue, we suspect the number of index
segments, which might slow queries (due to I/O seeks, one per term in the
query, multiplied by the number of segments).
We are on Solr 4.3 (TieredMergePolicy with mergeFactor of 4).

We can reduce the number of segments by enlarging maxMergedSegmentMB, from
the default 5GB to something bigger (10GB, 15GB?).

What are the side effects that should be considered when doing this?
Has anyone changed this setting in production for a while?
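For reference, the setting lives in the indexConfig section of solrconfig.xml; a sketch with illustrative values (not recommendations):

```xml
<indexConfig>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">4</int>
    <int name="segmentsPerTier">4</int>
    <!-- raise the ~5GB default, e.g. to 10GB -->
    <double name="maxMergedSegmentMB">10240.0</double>
  </mergePolicy>
</indexConfig>
```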


Searching on (hyphenated/capitalized) word issue

2013-09-30 Thread Van Tassell, Kristian
I have a search term "multi-CAD" being issued against tokenized text.  The problem 
is that you cannot get any search results when you type "multicad" unless you 
add a hyphen (multi-cad) or type "multiCAD" (omitting the hyphen, but correctly 
adding the CAPS into the spelling).



However, for the similar but unhyphenated word AutoCAD, you can type "autocad" 
and get hits for AutoCAD, as you would expect. You can type "auto-cad" and get 
the same results.

The query seems to get parsed as separate words (resulting in hits) for 
multi-CAD, multiCAD, autocad, auto-cad and AUTOCAD, but not for multicad. In 
other words, the search terms  become "multi cad" and "auto cad" for all cases 
except for when the term is "multicad".

I'm guessing this may be due in part to "auto" being a more common word prefix, but 
I may be wrong. Can anyone provide some clarity (and maybe point me towards a 
potential solution)?

Thanks in advance!


Kristian Van Tassell
Siemens Industry Sector
Siemens Product Lifecycle Management Software Inc.
5939 Rice Creek Parkway
Shoreview, MN  55126 United States
Tel.  :+1 (651) 855-6194
Fax  :+1 (651) 855-6280
kristian.vantass...@siemens.com 
www.siemens.com/plm



Re: Data duplication using Cloud+HDFS+Mirroring

2013-09-30 Thread Isaac Hebsh
Hi Greg, Did you get an answer?
I'm interested in the same question.

More generally, what are the benefits of HdfsDirectoryFactory, besides the
transparent restore of the shard contents in case of a disk failure, and
the ability to rebuild index using MR?
Is the following statement exact: blocks of a particular shard which are
replicated to another node will never be queried, since there is no Solr
core configured to read them?


On Wed, Aug 7, 2013 at 8:46 PM, Greg Walters
wrote:

> While testing Solr's new ability to store data and transaction directories
> in HDFS I added an additional core to one of my testing servers that was
> configured as a backup (active but not leader) core for a shard elsewhere.
> It looks like this extra core copies the data into its own directory rather
> than just using the existing directory with the data that's already
> available to it.
>
> Since HDFS likely already has redundancy of the data covered via the
> replicationFactor is there a reason for non-leader cores to create their
> own data directory rather than doing reads on the existing master copy? I
> searched Jira for anything that suggests this behavior might change and
> didn't find any issues; is there any intent to address this?
>
> Thanks,
> Greg
>


Re: Doing time sensitive search in solr

2013-09-30 Thread Darniz
Thanks for the quick answers.
I have gone through the presentation, and that's what I was tilting towards:
using dynamic fields. I just want to run through an example so that it's clear
how to approach this issue.

Sept content : Honda is releasing the car this month 


Dec content : Toyota is releasing the car this month 

After adding dynamic fields like *_entryDate and *_entryText my solr doc
will look something like this.

2013-09-01T00:00:00Z
Sept content : Honda is releasing
the car this month 

2013-12-01T00:00:00Z
Dec content : Toyota is releasing
the car this month 

If someone searches with a query something like
*_entryDate:[* TO NOW] AND *_entryText:Toyota, Toyota won't show up
in the search results.

The only disadvantage of this approach is that we might end up with a lot of
dynamic fields, since we have thousands of entries which might be time-bound
in our CMS.
I might also investigate whether we can handle this at index time, indexing
data as its time arrives via a scheduler or something, because the above
approach might solve the issue but may make the queries very slow.


Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Doing-time-sensitive-search-in-solr-tp4092273p4092763.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Searching on (hyphenated/capitalized) word issue

2013-09-30 Thread Upayavira
You need to look at your analysis chain. The stuff you're talking about
there is all configurable.

There's different tokenisers available to split your fields differently,
then you might use the WordDelimiterFilterFactory to split existing
tokens further (e.g. WiFi might become "wi", "fi" and "WiFi"). So
really, you need to craft your own analysis chain to fit the kind of
data you are working with.
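For example, a fieldType along these lines (a sketch, not a drop-in config) would index "multi-CAD" as "multi", "cad" and "multicad", so all three query forms could match:

```xml
<fieldType name="text_wdf" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- split on hyphens and case changes, and also glue the parts
         back together so "multi-CAD" yields multi / cad / multicad -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            splitOnCaseChange="1"
            catenateWords="1"
            preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The same chain must be applied at both index and query time (or deliberately varied) for the matching to behave predictably.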

Upayavira

On Mon, Sep 30, 2013, at 06:50 PM, Van Tassell, Kristian wrote:
> I have a search term, "multi-CAD", being issued against tokenized text.  The
> problem is that you cannot get any search results when you type
> "multicad"; you only get results if you add the hyphen (multi-cad) or type
> "multiCAD" (omitting the hyphen but keeping the correct capitalization).
> 
> 
> 
> However, for the similar but unhyphenated word AutoCAD, you can type
> "autocad" and get hits for AutoCAD, as you would expect. You can type
> "auto-cad" and get the same results.
> 
> The query seems to get parsed as separate words (resulting in hits) for
> multi-CAD, multiCAD, autocad, auto-cad and AUTOCAD, but not for multicad.
> In other words, the search terms  become "multi cad" and "auto cad" for
> all cases except for when the term is "multicad".
> 
> I'm guessing this may be due in part to "auto" being a more common word
> prefix, but I may be wrong. Can anyone provide some clarity (and maybe
> point me towards a potential solution)?
> 
> Thanks in advance!
> 
> 
> Kristian Van Tassell
> Siemens Industry Sector
> Siemens Product Lifecycle Management Software Inc.
> 5939 Rice Creek Parkway
> Shoreview, MN  55126 United States
> Tel.  :+1 (651) 855-6194
> Fax  :+1 (651) 855-6280
> kristian.vantass...@siemens.com 
> www.siemens.com/plm
> 


Issue with Group By / Field Collapsing

2013-09-30 Thread Shamik Bandopadhyay
Hi,

  I'm trying to use the group-by option to remove duplicates from my search
results. I'm applying it to a field called TopicId, simply appending this at
the end of my query:

group=true&group.field=TopicId

Initially, the results looked great: the duplicates were removed and only the
document with the highest score among the duplicates was returned. But when I
started comparing against the results without the group-by option, something
didn't look right. For example, the search without group-by returned results
from sources "A", "B" and "C". Documents from source "A" have the TopicId
field, while it's not present in "B" or "C". When I add the group-by option,
the documents from "B" and "C" are completely ignored, even though some of
them score higher than "A".

I'm a little confused as to whether this is the intended behavior.  Does
group-by mean that it will only return results where the group-by field is
present? Do I need additional group-by parameters to address this?

Any pointers will be highly appreciated.

Thanks,
Shamik


Re: Hello and help :)

2013-09-30 Thread Marcelo Elias Del Valle
Upayavira,

First of all, thanks for the answers.

    We have considered the possibility of doing several queries; however, in
our case we want a count to show to the user (it should take less than 2
seconds), and we could have millions of rows (meaning millions of queries) to
get this count.
    Isn't there any way to filter by the count? Something like: get all
users where the number of corresponding documents in a join is less than
X.  Or all the users grouped by field F where the count of records for field F
is less than X... Or anything like that, regarding counts...

Best regards,
Marcelo Valle.


2013/9/30 Upayavira 

> If your app and solr aren't far apart, you shouldn't be afraid of
> multiple queries to solr per user request (I once discovered an app that
> did 36 hits to solr per user request, and despite such awfulness of
> design, no user ever complained about speed).
>
> You could do a query to solr for q=+user_id:X +date:[dateX TO dateY] to
> find out how many docs, then take the numFound value, if it is above Y,
> do a subsequent query to retrieve the docs, either all docs, or those in
> the relevant date range.
>
> Don't know if that helps.
>
> Upayavira
>
> On Sun, Sep 29, 2013, at 05:15 PM, Matheus Salvia wrote:
> > Thanks for the answer. Yes, you understood it correctly.
> > The method you proposed should work perfectly, except I do have one more
> > requirement that I forgot to mention earlier, and I apologize for that.
> > The true problem we are facing is:
> > * find all documents for userID=x, where userID=x has more than y
> >  documents in the index between dateA and dateB
> >
> > And since dateA and dateB can be any dates, its impossible to save the
> > count, since we cannot foresee what date and what count will be
> > requested.
> >
> >
> > 2013/9/28 Upayavira 
> >
> > > To phrase your need more generically:
> > >
> > >  * find all documents for userID=x, where userID=x has more than y
> > >  documents in the index
> > >
> > > Is that correct?
> > >
> > > If it is, I'd probably do some work at index time. First guess, I'd
> keep
> > > a separate core, which has a very small document per user, storing
> just:
> > >
> > >  * userID
> > >  * docCount
> > >
> > > Then, when you add/delete a document, you use atomic updates to either
> > > increase or decrease the docCount on that user doc.
> > >
> > > Then you can use a pseudo join between these two cores relatively
> > > easily.
> > >
> > > q=user_id:x {!join fromIndex=user from=user_id to=user_id}+user_id:x
> > > +doc_count:[y TO *]
> > >
> > > Worst case, if you don't want to mess with your indexing code, I wonder
> > > if you could use a ScriptUpdateProcessor to do this work - not sure if
> > > you can have one add an entirely new, additional, document to the list,
> > > but may be possible.
> > >
> > > Upayavira
> > >
> > > On Fri, Sep 27, 2013, at 09:50 PM, Matheus Salvia wrote:
> > > > Sure, sorry for the inconvenience.
> > > >
> > > > I'm having a little trouble trying to make a query in Solr. The
> problem
> > > > is:
> > > > I must be able retrieve documents that have the same value for a
> > > > specified
> > > > field, but they should only be retrieved if this value appeared more
> than
> > > > X
> > > > times for a specified user. In pseudosql it would be something like:
> > > >
> > > > select user_id from documents
> > > > where my_field="my_value"
> > > > and
> > > > (select count(*) from documents where my_field="my_value" and
> > > > user_id=super.user_id) > X
> > > >
> > > > I Know that solr return a 'numFound' for each query you make, but I
> dont
> > > > know how to retrieve this value in a subquery.
> > > >
> > > > My Solr is organized in a way that a user is a document, and the
> > > > properties
> > > > of the user (such as name, age, etc) are grouped in another document
> with
> > > > a
> > > > 'root_id' field. So lets suppose the following query that gets all
> the
> > > > root
> > > > documents whose children have the prefix "some_prefix".
> > > >
> > > > is_root:true AND _query_:"{!join from=root_id
> > > > to=id}requests_prefix:\"some_prefix\""
> > > >
> > > > Now, how can I get the root documents (users in some sense) that have
> > > > more
> > > > than X children matching 'requests_prefix:"some_prefix"' or any other
> > > > condition? Is it possible?
> > > >
> > > > P.S. It must be done in a single query, fields can be added at will,
> but
> > > > the root/children structure should be preserved (preferably).
> > > >
> > > >
> > > > 2013/9/27 Upayavira 
> > > >
> > > > > Mattheus,
> > > > >
> > > > > Given these mails form a part of an archive that are themselves
> > > > > self-contained, can you please post your actual question here?
> You're
> > > > > more likely to get answers that way.
> > > > >
> > > > > Thanks, Upayavira
> > > > >
> > > > > On Fri, Sep 27, 2013, at 04:36 PM, Matheus Salvia wrote:
> > > > > > Hello everyone,
> > > > > > I'm having a problem regarding how to make a solr query, I've
>
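Upayavira's two-core docCount bookkeeping quoted above could be sketched as an atomic update against a hypothetical "user" core (core and field names are assumptions):

```xml
<!-- POST to /solr/user/update whenever a document for user x is added:
     "inc" atomically bumps doc_count without reindexing the whole doc -->
<add>
  <doc>
    <field name="user_id">x</field>
    <field name="doc_count" update="inc">1</field>
  </doc>
</add>
```

Atomic updates need the other fields stored, and a uniqueKey on user_id, so that Solr can rebuild the document.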

Re: Hello and help :)

2013-09-30 Thread Marcelo Elias Del Valle
Socratees,

You wrote: "Or, What if you can facet by the field, and group by the field
count, then *apply facet filtering to exclude all filters with count less
than 5?*"
That's exactly what I want, I just couldn't figure how to do it! Any idea
how could I write this query?

Best regards,
Marcelo.
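For the "exclude counts below a threshold" part on its own, Solr's facet.mincount parameter gives a per-value cutoff (a sketch; whether it composes with the grouping requirement above is untested):

```text
q=*:*&facet=true&facet.field=FIELD&facet.mincount=5
```

Note this only trims the facet value list returned; it does not restrict which documents come back.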


2013/9/27 Socratees Samipillai 

> Hi Marcelo,
> I haven't faced this exact situation before so I can only try posting my
> thoughts.
> Since Solr allows Result Grouping and Faceting at the same time, and since
> you can apply filters on these facets, can you take advantage of that?
> Or, What if you can facet by the field, and group by the field count, then
> apply facet filtering to exclude all filters with count less than 5?
> These links might be helpful.
> http://architects.dzone.com/articles/facet-over-same-field-multiple
> https://issues.apache.org/jira/browse/SOLR-2898
> Thanks,
> — Socratees.
>
> > Date: Fri, 27 Sep 2013 20:32:22 -0300
> > Subject: Re: Hello and help :)
> > From: marc...@s1mbi0se.com.br
> > To: solr-user@lucene.apache.org
> >
> > Ssami,
> >
> > I work with Matheus and I am helping him to take a look at this
> > problem. We took a look at result grouping, thinking it could help us,
> but
> > it has two drawbacks:
> >
> >- We cannot have multivalued fields, if I understood it correctly. But
> >ok, we could manage that...
> >- Suppose some query like that:
> >   - select count(*) NUMBER group by FIELD where CONDITION AND NUMBER > 5
> >   - In this case, we are not just taking the count for each group as
> a
> >   result. The count actually makes part of the where clause.
> >   - AFAIK, result grouping doesn't allow that, although I would
> really
> >   love to be proven wrong :D
> >
> > We really need this, so I am trying to figure out what I could change in
> > Solr to make this work... Any hint on that? Would we need to write a custom
> > facet / search handler / search component? Of course we'd prefer a solution
> > that works with current Solr features, but we could consider writing some
> > custom code to do that.
> >
> > Thanks in advance!
> >
> > Best regards,
> > Marcelo Valle.
> >
> >
> > 2013/9/27 ssami 
> >
> > > If I understand your question right, Result Grouping in Solr might help
> > > you.
> > >
> > > Refer here.
> > >
> > >
> > >
> > >
> > >
> > > --
> > > View this message in context:
> > >
> http://lucene.472066.n3.nabble.com/Hello-and-help-tp4092371p4092439.html
> > > Sent from the Solr - User mailing list archive at Nabble.com.
> > >
>


multi core join and simple indexed join

2013-09-30 Thread Marcelo Elias Del Valle
Comparing indexed joins across multiple cores versus within a single core:
which one would be faster?
I am guessing doing it across multiple cores would be faster, as the index on
each core would be smaller... Any thoughts on that?
[]s


[JOB] Solr / Elasticsearch Engineer @ Sematext

2013-09-30 Thread Otis Gospodnetic
Hello,

If you are looking to work with Solr and Elasticsearch, among other
things, this may be for you:

http://blog.sematext.com/2013/09/26/solr-elasticsearch-job-engineering/

This role offers a healthy mix of Solr/ES consulting, support, and
product development.

Everything that might be of interest should be there, but I'll be
happy to answer any questions anyone may have off-list.

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm


Re: Considerations about setting maxMergedSegmentMB

2013-09-30 Thread Erick Erickson
Before going there, you can do a really simple test. Turn off
indexing and then issue an optimize/force-merge. After it
completes (and it may take quite some time), measure your
performance again to see if this is on the right track.
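The force-merge itself can be issued through the update handler, e.g. (hypothetical host and core name):

```xml
<!-- POST to http://localhost:8983/solr/collection1/update -->
<optimize maxSegments="1"/>
```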

Best,
Erick

On Mon, Sep 30, 2013 at 1:31 PM, Isaac Hebsh  wrote:
> Hi,
> Trying to solve a query performance issue, we suspect the number of index
> segments, which might slow queries (due to I/O seeks, which happen for each
> term in the query, multiplied by the number of segments).
> We are on Solr 4.3 (TieredMergePolicy with mergeFactor of 4).
>
> We can reduce the number of segments by enlarging maxMergedSegmentMB, from
> the default 5GB to something bigger (10GB, 15GB?).
>
> What are the side effects, which should be considered when doing it?
> Did anyone changed this setting in PROD for a while?
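For reference, the setting lives on the merge policy in solrconfig.xml; the values here are illustrative:

```xml
<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">4</int>
  <int name="segmentsPerTier">4</int>
  <!-- default is 5120 (5GB); raising it permits fewer, larger segments -->
  <double name="maxMergedSegmentMB">10240</double>
</mergePolicy>
```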


No longer allowed to store html in a 'string' type

2013-09-30 Thread Kevin Cunningham
We have been using Solr for a while now and went from 1.4 -> 3.6.  While running 
some tests on 4.4, we found we are no longer allowed to store raw HTML in a 
document field of type 'string', which we used to be able to do. Has something 
changed here?  Now we get the following error: Undeclared general entity 
\"nbsp\"\r\n at [row,col {unknown-source}]: [11,53]

I understand what it's saying and can change the way we store and extract the 
content if it's a must, but I would like to understand what changed.  It sounds 
like something just became stricter about adhering to the rules.
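The likely cause: &nbsp; is an HTML entity, not one of the five predefined XML entities, so a strict XML parser rejects it. Escaping the markup before it goes into the update message avoids the error, e.g.:

```xml
<!-- raw HTML stored in a string field must be XML-escaped in the
     update message; &nbsp; itself becomes &amp;nbsp; -->
<field name="body">&lt;p&gt;Testing&amp;nbsp;tag&lt;/p&gt;</field>
```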



Testing #bananas tag  
document document document document document document

blog





Re: Doing time sensitive search in solr

2013-09-30 Thread Darniz
Hello 
I just wanted to make sure: can we query dynamic fields using a wildcard? If
not, then I don't think this solution will work, since I don't know the
exact concrete name of the field.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Doing-time-sensitive-search-in-solr-tp4092273p4092830.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: [JOB] Solr / Elasticsearch Engineer @ Sematext

2013-09-30 Thread Ashwin Tandel
Hi,

I would like to apply for SEARCH CONSULTING & SEARCH SOLUTIONS ARCHITECT
position.

PFA my resume. You can reach me at 2019934403.

Thanks,
Ashwin
cell - 2019934403


On Mon, Sep 30, 2013 at 4:17 PM, Otis Gospodnetic <
otis.gospodne...@gmail.com> wrote:

> Hello,
>
> If you are looking to work with Solr and Elasticsearch, among other
> things, this may be for you:
>
> http://blog.sematext.com/2013/09/26/solr-elasticsearch-job-engineering/
>
> This role offers a healthy mix of Solr/ES consulting, support, and
> product development.
>
> Everything that might be of interest should be there, but I'll be
> happy to answer any questions anyone may have off-list.
>
> Otis
> --
> Solr & ElasticSearch Support -- http://sematext.com/
> Performance Monitoring -- http://sematext.com/spm
>


AshwinTandel.docx
Description: application/vnd.openxmlformats-officedocument.wordprocessingml.document


Re: OpenJDK or OracleJDK

2013-09-30 Thread Shawn Heisey
On 9/30/2013 9:28 AM, Raheel Hasan wrote:
> hmm why is that so?
> Isnt Oracle's version a bit slow?

For Java 6, the Sun JDK is the reference implementation.  For Java 7,
OpenJDK is the reference implementation.

http://en.wikipedia.org/wiki/Reference_implementation

I don't think Oracle's version could really be called slow.  Sun
invented Java.  Sun open sourced Java.  Oracle bought Sun.

The Oracle implementation is likely more conservative than some of the
other implementations, like the one by IBM.  The IBM implementation is
pretty aggressive with optimization, so aggressive that Solr and Lucene
have a history of revealing bugs that only exist in that implementation.

Thanks,
Shawn



Re: OpenJDK or OracleJDK

2013-09-30 Thread Otis Gospodnetic
Hi,

A while back I remember we noticed some SPM users were having issues
with OpenJDK.  Since then we've been recommending Oracle's
implementation to our Solr and to SPM users.  At the same time, we
haven't seen any issues with OpenJDK in the last ~6 months.  Oracle
JDK is not slow. :)

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm



On Mon, Sep 30, 2013 at 11:02 PM, Shawn Heisey  wrote:
> On 9/30/2013 9:28 AM, Raheel Hasan wrote:
>> hmm why is that so?
>> Isnt Oracle's version a bit slow?
>
> For Java 6, the Sun JDK is the reference implementation.  For Java 7,
> OpenJDK is the reference implementation.
>
> http://en.wikipedia.org/wiki/Reference_implementation
>
> I don't think Oracle's version could really be called slow.  Sun
> invented Java.  Sun open sourced Java.  Oracle bought Sun.
>
> The Oracle implementation is likely more conservative than some of the
> other implementations, like the one by IBM.  The IBM implementation is
> pretty aggressive with optimization, so aggressive that Solr and Lucene
> have a history of revealing bugs that only exist in that implementation.
>
> Thanks,
> Shawn
>


Re: Percolate feature?

2013-09-30 Thread Otis Gospodnetic
Just came across this "ancient" thread.  Charlie, did this end up
happening?  I suspect Wolfgang may be interested, but that's just a
wild guess.

I was curious about your feeling that what you were open-sourcing
might be a lot faster and more flexible than ES's percolator - can you
share more about why you have that feeling and whether you've
confirmed it?

Thanks,
Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm



On Mon, Aug 5, 2013 at 6:34 AM, Charlie Hull  wrote:
> On 03/08/2013 00:50, Mark wrote:
>>
>> We have a set number of known terms we want to match against.
>>
>> In Index:
>> "term one"
>> "term two"
>> "term three"
>>
>> I know how to match all terms of a user query against the index but we
>> would like to know how/if we can match a user's query against all the terms
>> in the index?
>>
>> Search Queries:
>> "my search term" => 0 matches
>> "my term search one" => 1 match  ("term one")
>> "some prefix term two" => 1 match ("term two")
>> "one two three" => 0 matches
>>
>> I can only explain this is almost a reverse search???
>>
>> I came across the following from ElasticSearch
>> (http://www.elasticsearch.org/guide/reference/api/percolate/) and it sounds
>> like this may accomplish the above but haven't tested. I was wondering if
>> Solr had something similar or an alternative way of accomplishing this?
>>
>> Thanks
>>
>>
> Hi Mark,
>
> We've built something that implements this kind of reverse search for our
> clients in the media monitoring sector - we're working on releasing the core
> of this as open source very soon, hopefully in a month or two. It's based on
> Lucene.
>
> Just for reference it's able to apply tens of thousands of stored queries to
> a document per second (our clients often have very large and complex Boolean
> strings representing their clients' interests and may monitor hundreds of
> thousands of news stories every day). It also records the positions of every
> match. We suspect it's a lot faster and more flexible than Elasticsearch's
> Percolate feature.
>
> Cheers
>
> Charlie
>
> --
> Charlie Hull
> Flax - Open Source Enterprise Search
>
> tel/fax: +44 (0)8700 118334
> mobile:  +44 (0)7767 825828
> web: www.flax.co.uk


Problem regarding queries enclosed in double quotes in Solr 3.4

2013-09-30 Thread Kunal Mittal
We have a Solr 3.4 setup. When we run queries with double quotes, like
"semantic web", the query takes a long time to execute.
One solution we are considering is to make the same query without the
quotes and set the phrase slop (ps) parameter to 0. That is considerably
quicker than the quoted query and gives similar results.
Is there a way to fix this by modifying the schema.xml file? Any suggestions
would be appreciated.
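With edismax, the slop-based variant might look like this (field names are assumptions):

```text
q=semantic web&defType=edismax&qf=text&pf=text&ps=0
```

Here pf boosts documents where the whole query appears as a phrase, and ps controls how much slop that implicit phrase allows.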




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-regarding-queries-enclosed-in-double-quotes-in-Solr-3-4-tp4092856.html
Sent from the Solr - User mailing list archive at Nabble.com.


Problem regarding queries with numbers with a decimal point

2013-09-30 Thread Kunal Mittal
We have a Solr 3.4 setup. When we run queries containing a decimal point,
like "web 2.0", the query takes a long time to execute.
One fix we did was to set generateNumberParts="0" in the
"solr.WordDelimiterFilterFactory"

This reduced the query time greatly but we want to reduce it further. Is
there a way to fix this by modifying the schema.xml file? Any suggestions
would be appreciated.
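For reference, the relevant filter settings might look like this (illustrative, not the exact config):

```xml
<!-- generateNumberParts="0" stops "2.0" from being split into separate
     "2" and "0" tokens; catenateNumbers joins number parts back together -->
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1"
        generateNumberParts="0"
        catenateNumbers="1"/>
```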



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-regarding-queries-with-numbers-with-a-decimal-point-tp4092857.html
Sent from the Solr - User mailing list archive at Nabble.com.


in Problem

2013-09-30 Thread PAVAN
Hi,

When I type a query string without "in", it gives proper results. But
when I try the same query string with "in", it does not display proper
results. May I know what the problem is?

I have "in" listed as a stopword. If I remove "in" from the stopwords, it
still does not show relevant results.

Ex :

used computers chennai -- showing good results

used computer in chennai -- not showing proper results 


Can anybody tell me what the problem is?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/in-Problem-tp4092866.html
Sent from the Solr - User mailing list archive at Nabble.com.