How to read an entire file into a string variable?


Cybersed(Posted 2003) [#1]
Hi,
I'm just beginning to code with Blitz+ and, as my first project, I would like to open an MS Publisher file and extract some text from it. Since I don't have the details of the .PUB file format, I plan to search within the file for a certain string that is always at the beginning of the text section I want to grab. For this, the INSTR command would be great, but:

1. How can I put the entire file into a string variable without reading it byte by byte, which would be too slow?
2. Is a 200K binary file too big to fit in a string?

Thanks in advance.


GfK(Posted 2003) [#2]
You'd be better off loading the file into a bank.


jhocking(Posted 2003) [#3]
Instead of reading the entire file into a single string, you should read the file in a line at a time, reusing the same string variable each time, and check each line until you encounter the text you are looking for.

Or use a bank like GfK suggests. I should really learn how to use banks for data; I haven't done much file I/O other than ReadLine/WriteLine.
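
A minimal sketch of the line-at-a-time approach described above. The function name FindLineWithText and its parameters are made up for illustration. ReadLine splits on line breaks, so with a binary file such as a .PUB the "lines" are arbitrary chunks, and a marker that happens to contain a line break would be missed; treat this as a starting point to test, not a proven method.

```blitzbasic
; Hypothetical helper: returns the 1-based number of the first line
; that contains Marker$, or 0 if no line contains it.
Function FindLineWithText(Filename$,Marker$)
	FileIn = ReadFile(Filename$)
	If FileIn = 0 Then Return 0        ; file could not be opened
	LineNo = 0
	Found = 0
	While Not Eof(FileIn)
		LineNo = LineNo + 1
		Row$ = ReadLine$(FileIn)       ; the same variable is reused for every line
		If Instr(Row$,Marker$) > 0
			Found = LineNo
			Exit
		EndIf
	Wend
	CloseFile FileIn
	Return Found
End Function
```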


Cybersed(Posted 2003) [#4]
Is there a way to do a fast search on the content of a bank to retrieve a specific byte sequence? It needs to be fast since I have hundreds of these files, all about 180-200K in size. Only one file will be loaded at a time. That's why I was counting on INSTR to do the search. But since the bank is in memory, that may be fast enough. I'll go do some more tests.


GfK(Posted 2003) [#5]
Try this:

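; Load the file into a bank, then search the bank for the text
Bank = LoadFileToBank("Filename.txt")
If Bank = 0
	Notify "Could not open file!"
Else
	Result = FindTextInBank("some text",Bank)
	If Result = 0
		Notify "String not found!"
	Else
		Notify "String found at position " + Result
	EndIf
	FreeBank Bank
EndIf

; Reads a whole file into a new bank; returns 0 if the file can't be opened
Function LoadFileToBank(Filename$)
	FILE = ReadFile(Filename$)
	If FILE = 0 Then Return 0
	BankID = CreateBank(FileSize(Filename$))
	ReadBytes BankID,FILE,0,BankSize(BankID)
	CloseFile FILE
	Return BankID
End Function

; Returns the 1-based position of Txt$ in the bank (byte offset + 1),
; or 0 (False) if the text is not found
Function FindTextInBank(Txt$,BankID,Start = 0)
	Ptr = Start
	T$ = ""
	While Ptr < BankSize(BankID) And T$ <> Txt$
		T$ = Chr$(PeekByte(BankID,Ptr))
		; Only read ahead when the whole string still fits inside the bank
		If T$ = Left$(Txt$,1) And Ptr + Len(Txt$) <= BankSize(BankID)
			For N = 1 To Len(Txt$)-1
				T$ = T$ + Chr$(PeekByte(BankID,Ptr+N))
				If T$ = Txt$ Then Exit
			Next
		EndIf
		Ptr = Ptr + 1
	Wend
	If T$ = Txt$ And Txt$ <> ""
		Return Ptr
	Else
		Return False
	EndIf
End Function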
It works. Don't know how fast it is with big files though...

[EDIT] Just did a test on a 170k file - took ~4s to find a text string right near the end of the file.

I haven't really used banks before - there's probably a way of optimising this so that it works in a fraction of the time it currently takes.

[EDIT 2] Try it with debug off! Same test as above takes about 0.3s. :))
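
On the remark that the search could probably be optimised: one possible direction, not timed here, is to compare bytes directly instead of building a string with Chr$ for every candidate position. The sketch below assumes the search text is plain ASCII; FindBytesInBank is a made-up name, and it returns a 1-based position (0 if not found) to stay consistent with FindTextInBank above.

```blitzbasic
; Hypothetical byte-comparing variant of FindTextInBank.
Function FindBytesInBank(Txt$,BankID,Start = 0)
	TxtLen = Len(Txt$)
	If TxtLen = 0 Then Return 0
	; Copy the search text into its own small bank once,
	; so the inner loop only needs PeekByte
	Pat = CreateBank(TxtLen)
	For I = 0 To TxtLen-1
		PokeByte Pat,I,Asc(Mid$(Txt$,I+1,1))
	Next
	Found = 0
	LastStart = BankSize(BankID) - TxtLen   ; last offset where a full match fits
	For Ptr = Start To LastStart
		N = 0
		While N < TxtLen
			If PeekByte(BankID,Ptr+N) <> PeekByte(Pat,N) Then Exit
			N = N + 1
		Wend
		If N = TxtLen                       ; every byte matched
			Found = Ptr + 1
			Exit
		EndIf
	Next
	FreeBank Pat
	Return Found
End Function
```

Because the loop stops at LastStart, it never peeks past the end of the bank. Whether it is actually faster than the string version would need to be measured with debug off.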


Floyd(Posted 2003) [#6]
Here's a trick that just occurred to me.

WriteString writes an integer, the length of the string, followed by the string data.
ReadString reads this back into a Blitz string.

So you could use Blitz to build a new file consisting of an integer ( the length of the original file ) followed by the original file.

Then use ReadString on this new file.
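
A rough sketch of the trick described above, assuming ReadString accepts a length prefix written with WriteInt in the same way as one written by WriteString. LoadFileToString$ and the temporary file name passed to it are invented for this example, and whether every byte of a binary file (zero bytes in particular) survives the round trip into a string is something to verify rather than assume.

```blitzbasic
; Hypothetical helper: builds "length + original data" in a temporary file,
; then reads that file back with ReadString.
Function LoadFileToString$(Filename$,TempName$)
	Size = FileSize(Filename$)
	Src = ReadFile(Filename$)
	If Src = 0 Then Return ""           ; source file could not be opened
	Bank = CreateBank(Size)
	ReadBytes Bank,Src,0,Size
	CloseFile Src

	Dst = WriteFile(TempName$)
	If Dst = 0
		FreeBank Bank
		Return ""                       ; temporary file could not be created
	EndIf
	WriteInt Dst,Size                   ; the integer length prefix
	WriteBytes Bank,Dst,0,Size          ; followed by the original file
	CloseFile Dst
	FreeBank Bank

	Src = ReadFile(TempName$)
	Result$ = ReadString$(Src)          ; length + data comes back as one string
	CloseFile Src
	DeleteFile TempName$
	Return Result$
End Function

; Example use (file names are placeholders):
Content$ = LoadFileToString$("Filename.pub","prefixed.tmp")
Found = Instr(Content$,"some text")
```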


Cybersed(Posted 2003) [#7]
Wow! Plenty of good ideas for me to work on...

At 0.3s, it will be more than fast enough. I may also try Floyd's idea in a slightly modified way: overwrite the first 4 bytes of the file with an integer equal to the length of the remaining bytes, and then read the file as a string. That should work, as I don't need those first 4 bytes anyway. I should work on a copy of the file, though, which may slow things down a bit.

Thanks a lot, guys.
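
The modified idea from this post could look roughly like the sketch below: copy the file, overwrite the first 4 bytes of the copy with the length of the rest, and read the copy back with ReadString. LoadCopyAsString$ and the copy's file name are assumptions made for illustration, and the same caveat applies about checking that binary data survives inside a string.

```blitzbasic
; Hypothetical helper: works on a copy so the original file stays untouched.
Function LoadCopyAsString$(Filename$,CopyName$)
	CopyFile Filename$,CopyName$
	Size = FileSize(CopyName$)
	If Size < 4
		DeleteFile CopyName$
		Return ""                       ; too small to hold a length prefix
	EndIf
	F = OpenFile(CopyName$)             ; OpenFile allows writing into an existing file
	If F = 0 Then Return ""
	WriteInt F,Size - 4                 ; bytes 0..3 become the length of the remainder
	CloseFile F

	F = ReadFile(CopyName$)
	Result$ = ReadString$(F)            ; reads the prefix, then the remaining bytes
	CloseFile F
	DeleteFile CopyName$
	Return Result$
End Function
```

Since the first 4 bytes are replaced by the prefix, a 1-based Instr position p in the returned string corresponds to zero-based file offset p + 3.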


Difference(Posted 2003) [#8]
You guys should check the code archives...

http://www.blitzbasic.com/codearcs/codearcs.php?code=685

and

http://www.blitzbasic.com/codearcs/codearcs.php?code=687