You should maybe scan your db for bad data ...
This bit ...
at sun.nio.cs.UTF_8$Decoder.decodeLoop(UTF_8.java:324)
at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:561)
is probably happening on a specific record somewhere. In the query, limit the id
range and try to narrow down which one it is.
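For example (the table and column names here are just placeholders for whatever your entity query actually selects), add something like "where id between 0 and 500000" to the query, re-run, and keep halving whichever range still throws the decode exception until you have isolated the offending row.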
There is the LuSQL tool, which I've used a few times.
http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql
http://www.slideshare.net/eby/lusql-quickly-and-easily-getting-your-data-from-your-dbms-into-lucene
- Jon
On Apr 7, 2010, at 11:26 PM, bbarani wrote:
>
> Hi,
>
> I am curr
If it helps at all to mention, I manually updated the last_index_time in
conf/dataimport.properties so I could select a smaller subset, and the
delta-import worked, which leads me to believe there is nothing wrong with my
DIH delta queries themselves. There must be something wrong with my dataset
th
SolrJ goes through the Solr stack. It talks to the Solr HTTP service
or, in Embedded mode, to the top-level Solr code. All documents are
processed just the same as if you uploaded them with 'curl'.
You have to write JDBC code and submit the fields. There is no special
code involved.
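A minimal sketch of what that looks like with SolrJ 1.4's CommonsHttpSolrServer (the JDBC URL, table, and field names below are invented for illustration and need to match your own schema):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class JdbcIndexer {
  public static void main(String[] args) throws Exception {
    // Plain HTTP connection to a running Solr instance.
    SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
    Connection conn =
        DriverManager.getConnection("jdbc:mysql://localhost/mydb", "user", "pass");
    Statement stmt = conn.createStatement();
    ResultSet rs = stmt.executeQuery("select id, title, body from docs");
    while (rs.next()) {
      // One SolrInputDocument per row; the field names must exist in schema.xml.
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", rs.getString("id"));
      doc.addField("title", rs.getString("title"));
      doc.addField("body", rs.getString("body"));
      solr.add(doc);
    }
    solr.commit();
    rs.close();
    stmt.close();
    conn.close();
  }
}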
On Wed, Apr
Hi,
I am currently using DIH to index the data from a database. I am just trying
to figure out if there are any other open source tools which I can use just
for indexing purposes, and use Solr for querying.
I also thought of writing custom code for retrieving the data from the
database and using SolrJ
Sorting takes memory. What data types are the fields sorted on? If
they're strings, that could be a space-eater. If they are ints or
dates, not a problem.
Do the queries pull all of the documents found? Or do they just fetch,
for example, the first 10 documents?
What are the cache statistics like
> Since min(a,b) == -1*max(-1*a, -1*b), you could rewrite the previous
> expression using this more complicated logic and it would work. But
> that's ugly.
>
> Also, it would crash anyway. It looks like max currently requires one
> of its arguments to be a float constant, and neither of our args wo
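A quick numeric check of that identity: min(3, 5) = 3, while -1*max(-1*3, -1*5) = -1*max(-3, -5) = -1*(-3) = 3, so the rewrite really is equivalent; the catch, as noted above, is that the function query parser's max() wants one of its arguments to be a constant.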
Well, for a quick trial using trunk, I had to remove the
UnicodeNormalizationFactory (is that yours?).
But with that removed, I get the results you do, ASSUMING that you've set
your default operator to AND in schema.xml...
Believe it or not, it all changes and all your queries return a hit if you
d
The point of the cached table is that we don't know where interesting
rows are. Loading from a DB is much faster when you grab the first N
rows, the next N rows, etc. So, some strategy which switches back and
forth between searching for a requested ID vs. grabbing blocks would
be very efficient.
See the SpellCheckComponent:
http://wiki.apache.org/solr/SpellCheckComponent
Also, the phoneme filters like DoubleMetaphone turn a word into a
series of phonemes. Misspellings that are in the right order will
become the same series. I don't know how to build a spelling
dictionary from a phoneme-fi
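As a rough illustration of the phoneme idea (this uses the Commons Codec DoubleMetaphone class, which is what Solr's DoubleMetaphoneFilterFactory wraps; the words are just examples):

import org.apache.commons.codec.language.DoubleMetaphone;

public class PhonemeDemo {
  public static void main(String[] args) {
    DoubleMetaphone dm = new DoubleMetaphone();
    // Both spellings should reduce to the same phoneme code, which is why
    // a misspelling with the sounds in the right order can still match.
    System.out.println(dm.doubleMetaphone("receive"));
    System.out.println(dm.doubleMetaphone("recieve"));
  }
}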
Stream XML input (or CSV if you can make that happen) works fine. If
the file is local, you can do a curl that would normally upload a file
via POST, but give this parameter: stream.file=/full/path/name.xml
Solr will read the file locally instead of through HTTP.
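For example, assuming remote streaming is enabled in solrconfig.xml (the host and path below are placeholders), the call looks something like:

curl 'http://localhost:8983/solr/update?commit=true&stream.file=/full/path/name.xml'

and Solr opens the file itself rather than receiving the bytes over the wire.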
On Wed, Apr 7, 2010 at 9:18 AM, W
The NFS mount has to be done with distributed file locking. I don't
know what DFL features are available.
OS Native file locking is the default in solrconfig.xml, and I think
this should be used in your scenario. But doing this over NFS is not
likely to work well.
On Tue, Apr 6, 2010 at 6:42 AM,
Hi,
We are using Solr 1.4 running 2 cores, each containing ~90M documents. Each
core's index size on disk is ~120 GB.
The machine is a 64-bit quad-core with 64 GB RAM, running Windows Server 2008.
Max heap size is set to 9 GB for the Tomcat process. Default caches are used.
Our queries are complex and in
I'm using function queries to boost more recent documents, using
something like the
recip(ms(NOW,mydatefield),3.16e-11,1,1)
approach described on the wiki:
http://wiki.apache.org/solr/FunctionQuery#Date_Boosting
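To make the numbers concrete: recip(x,m,a,b) computes a/(m*x + b), and 3.16e-11 is roughly 1 divided by the number of milliseconds in a year. So a document dated right now gets a boost near 1/(0 + 1) = 1, a one-year-old document gets about 1/(1 + 1) = 0.5, and a two-year-old one about 1/3.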
What I'd like to do is figure out the best way to tweak how documents
with missi
On 25.02.2010 at 02:07, Andy wrote:
> 1) Built-in hierarchical faceting
> Right now there're 2 patches, SOLR-64 and SOLR-792. SOLR-64 seems to be
> slated for 1.5 release but according to the wiki seems to have poor
> performance. SOLR-792 has better performance according to the wiki but it's
On 24.02.2010 at 14:42, Grant Ingersoll wrote:
> What would it be?
Remote administration/editing/filling of synonyms.txt, stopwords.txt, ...
through a request handler, maybe a JSON interface or similar
best
Ingo
--
Ingo Renner
TYPO3 Core Developer, Release Manager TYPO3 4.2, Admin Google S
My last few delta-imports via DIH have been failing with a StackOverflow
error. Has anyone else encountered this while trying to import? I don't
even see any relevant information in the stack trace. Can anyone lend some
suggestions? Thanks...
pr 7, 2010 2:13:34 PM org.apache.solr.handler.dataimp
: I had a slight hiccup that I just ignored. Even when I used Java 1.6
: JDK mode, Eclipse did not know this method. I had to comment out the
: three places that use this method.
:
: javax.xml.parsers.DocumentBuilderFactory.setXIncludeAware(true)
That method has existed since Java 1.5, so if you
Hi,
I just thought of sharing a suggestion for overcoming OOM issues with
CachedSQLEntityProcessor.
Consider a scenario as below. If we have sub-entities in DIH:
  ---> object
    --> object properties
CachedSqlEntityProcessor works as below:
• First enti
What is the best way to handle misspellings? Completely ignore them and
suggest alternative searches, or some sort of fuzzy matching?
Also, is it possible to use fuzzy matching using the dismax request handler?
Thanks
On 4/7/2010 9:16 AM, Shawn Heisey wrote:
On 4/5/2010 8:12 PM, Chris Hostetter wrote:
what you can do however, is have a distinct solrconfig.xml for each core,
which is just a thin shell that uses XInclude to include big chunks of
frequently reused declarations, and some cores can exclude some
Erik,
thank you for responding.
I will check the code to get some ideas for implementation.
I do need some cached resources like the CharArraySet of protected words
for a WordDelimiterFilter (for the MAX_LEN parameter mentioned by Hoss) or a
SynonymFilter.
I think it would consume too much t
I don't think I want to stream from Java; text munging in Java is a PITA. I'd
rather stream from a script, so I need a more general solution.
The streaming document interface looks interesting; let me see if I can figure
out how to achieve the same thing without a Java client...
Brian
-Ori
Hi Brian,
I had similar questions when I began to try and evaluate Solr.
If you use Java and SolrJ you might find these useful:
- http://wiki.apache.org/solr/Solrj#Streaming_documents_for_an_update
- http://lucene.apache.org/solr/api/org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer
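A rough sketch of the streaming approach from those links (the URL, queue size, thread count, and field names are just example values):

import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class StreamingFeed {
  public static void main(String[] args) throws Exception {
    // Buffers up to 50 documents and sends them on 4 background threads.
    StreamingUpdateSolrServer server =
        new StreamingUpdateSolrServer("http://localhost:8983/solr", 50, 4);
    for (int i = 0; i < 1000; i++) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "doc-" + i);
      doc.addField("title", "Example document " + i);
      server.add(doc);           // queued and sent asynchronously
    }
    server.blockUntilFinished(); // wait for the queue to drain
    server.commit();
  }
}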
Hi,
(I know that this is probably not recommended and not a common
scenario, but...)
Is it possible to have an application using Lucene and a separate
(i.e. different JVM) instance of Solr both pointing at the same
index and read/write to the index from both applications?
I am trying (separately
Hello,
I am using SOLR for some proof of concept work, and was wondering if anyone has
some guidance on a best practice.
Background:
We get a nightly delivery of a few thousand reports. Each report is between 1 and
500,000 pages.
For my proof of concept I am using a single 100,000 page report.
I want
Hello all,
I read through the wiki to find a solution for filling multiValued fields
with, you guessed it, multiple values. :)
What I found was a short excerpt of code, and I am not really sure
whether this fills a multiValued field with multiple values.
The code (not everything is relevant
On 4/5/2010 8:12 PM, Chris Hostetter wrote:
what you can do however, is have a distinct solrconfig.xml for each core,
which is just a thin shell that uses XInclude to include big chunks of
frequently reused declarations, and some cores can exclude some of these
includes. (ie: turn the problem in
Duh, didn't even think of that. This will probably be the easy way for now
since we are only using a small number of predefined ranges.
Thanks for the reply
On 4/5/2010 8:43 PM, Mark Miller wrote:
On 04/05/2010 10:12 PM, Chris Hostetter wrote:
: The best you have to work with at the moment is Xincludes:
:
: http://wiki.apache.org/solr/SolrConfigXml#XInclude
:
: and System Property Substitution:
:
: http://wiki.apache.org/solr/SolrConfigXml#System_pr
Hello. It has been a few weeks, and I haven't gotten any responses. Perhaps
my question is too complicated -- maybe a better approach is to try to gain
enough knowledge to answer it myself. My gut feeling is still that it's
something to do with the way term positions are getting handled by th
Oops, the new patch only works on Trie fields; other stuff I said should
still be valid. (One extra thing to be aware of is double counting, see
http://n3.nabble.com/Date-Faceting-and-Double-Counting-td502014.html for
example)
Regards,
gwk
On 4/7/2010 4:03 PM, gwk wrote:
Hi,
A while back I
Hi,
A while back I created a patch for Solr
(http://issues.apache.org/jira/browse/SOLR-1240) to do range faceting on
numbers. I haven't uploaded an updated patch for Solr 1.4 yet; I'll try
to do that shortly. I haven't tested it on a floating point field but in
theory it should work on most n
On 07.04.2010, at 14:24, Lukas Kahwe Smith wrote:
> For Solr the idea is also to just copy the index files into a new directory and
> then use http://wiki.apache.org/solr/CoreAdmin#RELOAD after updating the
> config file (I assume it's not possible to hot swap like with MySQL).
Since I want to ke
On Apr 7, 2010, at 7:40 AM, MitchK wrote:
> I can't believe that Solr isn't caching data like the synonym.txt's
> etc.

Solr does cache these; look at the implementation of
SynonymFilterFactory, where it keeps a SynonymMap.

> Are there no ideas how to access them?

There is a public getSynonymMap
On Apr 6, 2010, at 8:44 PM, Blargy wrote:
> What would be the best way to do range bucketing on a price field?
> I'm sort of taking the example from the Solr 1.4 book and I was thinking
> about using a PatternTokenizerFactory with a SynonymFilterFactory.
> Is there a better way?

For faceting..
Hi,
For a project I am running a LAMP cluster (master and multiple slaves). Solr is
running inside Jetty. To make things easy in terms of server management, all
servers are configured the same way, and one server just acts as the MySQL
master.
As for Solr, the only data changes happen over nigh
Erick Erickson wrote:
It is possible but you have to take care to match Solr's schema with the
structure of documents in the Lucene index. The correct field names and
query-analyzers should be configured in schema.xml
Is it possible to use Solr v1.4 together with a legacy Lucene (v2.1.0
and/or
I can't believe that Solr isn't caching data like the synonym.txt's etc.
Are there no ideas how to access them?
- Mitch
Copying from another answer to this question on the list (See "how to deploy
index on SOLR")...
It is possible but you have to take care to match Solr's schema with the
structure of documents in the Lucene index. The correct field names and
query-analyzers should be configured in schema.xml
HTH
I like that name! That's a good way to think of it, assuming the
available coin/bill denominations grow exponentially with a base of
roughly mergeFactor :)
It's also like the odometer on a car.
Mike
On Tue, Apr 6, 2010 at 10:51 PM, Lance Norskog wrote:
> Ok, thanks. I'm studying the RAM buffe
Doni,
have a look at DisMaxRequestHandler. For more information, see the
Solr wiki.
Kind regards
- Mitch